Professional Documents
Culture Documents
Abstract— False and nuisance alarms are major problems in in a large number of false alarms under fault-free operation
the process industry. Techniques like deadbands, delay-timers, or increase the missed alarms significantly once fault has
and filtering can significantly reduce these false and nuisance occurred.
alarms. The down-side, however, is that using these techniques
introduces some delay in raising the alarm (detection delay).
In this paper, detection delays caused by deadband and
In case of a fault occurrence, alarms may not be raised
delay-timer techniques are calculated using Markov processes. instantly due to different delays in the system. The delay can
A design procedure is also proposed that compromises between be caused by various reasons including network delays, bad
detection delay, false alarm rate and missed alarm rate for an implementation, hardware problems, sensor failure, and data
optimal configuration. loss. Also the alarm configuration (deadband, delay-timer,
Keywords:Alarm systems; Alarm management; Fault detec-
etc) can cause delay in raising the alarm. The difference
tion; Detection delay; Deadband; Delay-timer. in time between the actual moment of fault occurrence and
the moment an alarm is activated is defined as the detection
I. INTRODUCTION delay. For a reliable and effective alarm system, the false
Modern industries are monitored by hundreds and thou- alarm rate, missed alarm rate and detection delay should be
sands of sensors. These sensors are installed in different considered as three performance specifications.
areas and they communicate through a medium to monitor
physical or environmental conditions of the plant. Operators Since the limit checking technique is widely exercised in
are informed of any sensor measurement problem by alarms industries, in this paper a major design constraint of this
indicating abnormal behavior of the plant. To ensure cost technique namely detection delay will be analyzed and con-
efficiency, safety of the work force and plant, and quality of sidering other constraints such as false alarm rate and missed
products, faults must be identified promptly and appropriate alarm rate [6], [7], a recommendation on tuning method
actions should be taken as soon as possible. There are will be discussed. The detection delay has been discussed to
several ways for fault detection as discussed in [1], [2], [3]. some extent in [8] for some change detection algorithms, e.g.
These detection techniques can be broadly classified into CUSUM-type algorithm. But in this work alarm attributes
two categories: model-based, and signal processing based. (e.g. threshold limit, deadbands, delay-timers) described in
Compared to signal processing based, model-based fault the ISA 18.2 [9] or EEMUA 191 [10] standards for basic
detection is a more active field in the area of control theory alarm design are considered only to analyze for detection
and engineering [4]. However, for most practical systems delay; these were not addressed elsewhere earlier. This
it is very difficult to obtain precisely known mathematical knowledge is important to include preventive measures in
models [1] or they are highly nonlinear and not feasible for system design to compensate for activation delay.
implementation from economic point of view. Therefore the
In deadbands, two different limits are used for alarm
application of the model-based scheme is limited.
raising and clearing. A higher threshold is set for raising the
The most common and frequently used fault detection
alarm, and a lower one for clearing the alarm. Another very
method in industry is the simple limit checking method of
effective technique in alarm systems is delay-timers. With
a directly measured variable [1], [5]. This method can be
delay-timers configured, the system requires consecutive few
referred to as signal processing based fault detection. This
samples to cross the threshold before activating the alarm [7].
method has the advantages of simplicity and easy imple-
Both these technique (deadbands and delay-timers) introduce
mentability. However a problem with this simple technique
some delay in triggering the alarm. Markov processes are
is to properly select the threshold which directly affects the
used in this paper to model the alarm system and estimate
number of false and missed alarms. A f alse alarm is an
activation delay for these two techniques.
alarm that is raised without presence of any abnormality
in the process. A missed alarm is an alarm that is not In Section II, an overview of Markov processes with
raised in the presence of a fault. Due to incorrect threshold assumptions presented in the paper are given. In Section III,
setting, normal fluctuations of measured variables may result detection delays are discussed in simplest case of threshold
Naseeb Ahmed Adnan and Tongwen Chen are with Department of limit comparison. Section IV and V discuss detection delays
Electrical and Computer Engineering, University of Alberta, Edmon- due to deadbands and delay-timers, respectively. Section VI
ton, Alberta, Canada, T6G 2V4. naseeb@ece.ualberta.ca, presents an analysis for finding the optimum set level of
tchen@ece.ualberta.ca
Iman Izadi is with Matrikon Inc.,# 1800, 10405 Jasper Ave. Edmonton, threshold and a design procedure. In Section VII, concluding
Alberta, Canada, T5J 3N4. iman@ualberta.ca remarks and future work are discussed.
p1
q2
q1
for deadbands and delay-timers. Consider a Markov process 0 200 400 600 800 1000
0
−4 −2 0 2 4 6 8
with a limited number of states. Assume that the transitional Fig. 1. Process data with threshold and fault occurrence instance [Left];
probability, pij , is the probability of going from state ei at Corresponding probability density functions of fault-free and faulty data
time t to state ej at time t + 1 and define [Right]
··· p ···
p p
11 12
···
1j
···
probability of one sample exceeding limit. Therefore p2 =
p21
.
p22
. .
p2j
. .
1 − p1 and q2 = 1 − q1 as shown in Fig. 1.
P =
.. .
.
.
.
.
.
.
.
,
Probability of zero detection delay is given by the proba-
pi1 pi2 ··· pij ···
··· ··· ··· ··· ··· bility of an alarm being raised instantly at the time of fault;
probability of detection delay one means alarm activation is
The matrix P is known as transition probability matrix.
delayed by one sample from the fault instance. Similarly
A probability vector π is called invariant for the Markov
detection delay of z samples denotes alarm is raised z
process if π = πP . In other words, π is a left eigenvector of
samples later from the actual instance of fault. Assume that,
P with eigenvalue 1 [11]. It is a known fact that an invariant
abnormality occurred at time t = tf , A denotes an alarm
vector π exists if the Markov process satisfies the following
state and N A denotes a no alarm state. Probability of
two conditions:
P detection delays are
1) The sum of all entries in each row of P is 1 ( j pij = P (DD = 0) =P (A at t = tf ) = q2
1). P (DD = 1) =P (A at t = tf + 1 & NA at t = tf ) = q2 .q1
2) All the entries of P are non-negative (pij ≥ 0). .
.
.
To satisfy these conditions and the definition of Markov
P (DD = z) =P (A at t = tf + z & NA at t = tf + z − 1 & . . .
process, we make the following assumptions on the process
& NA at t = tf + 1 & NA at t = tf )
data: z
=q2 .q1
1) Process data is independent and identically distributed
(I.I.D.), i.e. at each sampling instant, the process data Here the detection delays are in terms of samples, though the
(random variable) has the same probability distribution detection delay is normally known as a measure of time. But
as the other instants and all are mutually independent. it will not affect the real scenario as in practice the sampling
2) Probability density functions of the fault free and faulty time is constant and these values can be converted to actual
data are known. These distributions can be estimated time measurements.
from historical data. From the above equations it can be seen that in the simple
case the probabilities of detection delay only depend on the
III. DETECTION DELAY probabilistic distribution of the faulty data. Fault-free region
If a process variable moves from fault-free region of of operation does not have any effect on raising or clearing
operation into faulty region of operation at time tf and alarm of alarm. The expected value of detection delay is
∞ ∞
is raised at time ta , then the detection delay (DD) is given E(DD) =
X
z.P (DD = z) =
X z
z.q2 .q1 = q1 /q2
by the number of samples in the interval ta − tf . In the z=0 z=0
ideal case fault should be detected instantly at the moment Expected value of detection delay (or average detection
of occurrence, which is hardly seen due to different delays. delay) is an important parameter in alarm design as it
In practical condition, the problem is to detect the occurrence indicates the average time it takes to raise an alarm once
of the change as soon as possible [8]. Techniques like delay- there is an abnormality in the system. Therefore, it is always
timers, deadbands, and filters are widely used in alarm desired to reduce average detection delay to ensure more
systems and enhance the effectiveness of limit checking reliable operation of the plant. A method of threshold limit
method but increase the detection delay. However, even if design will be discussed in details in Section VI.
no delay-timer, deadband and filter is configured, there may
still be some detection delays as it is directly related the IV. DETECTION DELAY FOR DEADBANDS
position of alarm threshold limit. In this section, we discuss Deadbands are widely used in industry to eliminate re-
detection delay for the simple threshold comparison alarm peating oscillations or chattering alarms. With deadband
configuration. configured, alarms are raised and cleared according to two
In the fault free operating region assume the probability of different limits, instead of the same limit in regular cases. For
one sample exceeding alarm limit is p1 and the probability example, for high alarms a limit is set, as usual, for raising
of one sample falling within the alarm limit is p2 . Similarly the alarm. However, once an alarm is activated it will not
under the faulty region of operation, q1 is the probability be cleared even if the variable falls below the limit. To clear
of one sample falling within the alarm limit and q2 is the the alarm, the variable must go below a lower threshold.
787
When a variable transits from the fault-free state to the p1
faulty state, due to presence of noise, it crosses the alarm 1 − p1
limit a few times before settling in the abnormal state. This NA A
oscillation results in subsequent raising and clearing of the 1 − p2
separating raising and clearing limits) is, then, helpful here Fig. 3. Markov diagram of a system with deadband in fault-free region of
in preventing alarm chattering [12]. operation
Deadbands should typically be configured based on the
normal operating range of the process variables, measure- steady-state invariant vector is unique [11]. Therefore the
ment noise, and type of the process variables [9], [12]. There Markov process in the faulty operation always have this
are certain standards for setting deadbands, e.g. in ISA 18.2 unique steady-state invariant vector as initial condition.
[9] or EEMUA [10]. The correct configuration of a deadband If the system transfers from fault-free to faulty state at time
(setting limits appropriately) is essential in maximizing the tf , using the forward Chapman-Kolmogorv equations [11],
benefits of the deadband. probabilities of alarm and no-alarm states can be calculated
When deadband is configured, as it can be seen in Fig. 2, as
[PN A (tf ) PA (tf )] = [PN A (tf − 1) PA (tf − 1)] Pf
the probabilities of one sample going over the raising limit
p1 q1 + p2 (1 − q2 ) p2 q2 + p1 (1 − q1 )
and below the clearing limit do not add up to one, i.e., p1 + = πn Pf =
p1 + p2 p1 + p2
p2 6= 1, q1 + q2 6= 1; where p1 , p2 , q1 , q2 are defined as
before in Section III. Therefore the probability of detection delay zero (proba-
To calculate the probabilities of false alarm and missed bility of alarm being raised immediately after transition from
alarm, notice that a process operating in either the fault- normal to abnormal) is
P (DD = 0) = P (A at t = tf )
free or faulty region, can be in two states from alarm
p2 q2 + p1 (1 − q1 )
point of view: alarm state (A) and no alarm state (N A). = PA (tf ) =
p1 + p2
These states can be modeled with a Markov chain [7]. A
Markov process for deadband is shown in Fig. 3 for fault- Probabilities of higher detection delays can be expressed in
free operating region. The Markov model in faulty region terms of conditional probabilities as
p1 q1 + p2 (1 − q2 )
is similar. Transitional probabilities from one state to other P (DD = 1) =P (A at t = tf + 1 & NA at t = tf ) = .q2
p1 + p2
are represented by transition probability matrix Pn , in the P (DD = 2) =P (A at t = tf + 2 & NA at t = tf + 1, NA at t = tf )
fault-free region of operation and by Pf in the faulty region p1 q1 + p2 (1 − q2 )
= .q2 .(1 − q2 )
of operation: p1 + p2
1 − p1 p1
1 − q2 q2
.
Pn = , Pf = .
p2 1 − p2 q1 1 − q1 .
P (DD = z) =P (A at t = tf + z & NA at t = tf + z − 1, ... ,
If the process remains in the fault-free operating region, NA at t = tf )
after a transient time the Markov process reaches its steady p1 q1 + p2 (1 − q2 ) z−1
state and the vector of state probabilities converges to the = .q2 .(1 − q2 ) , z≥1
p1 + p2
invariant vector. The steady state vector of probabilities The average detection delay (i.e., the expected value of
(invariant vector) for Pn is [7] the detection delay) is then given by
p2 p1
πn = ∞
p1 + p2 p1 + p2 X
E(DD) = z.P (DD = z)
When a fault occurs, the Markov process changes from z=0
∞
fault-free model (represented by Pn ) to the faulty model =
X
z.q2 (1 − q2 )
z−1
.
p1 q1 + p2 (1 − q2 )
=
p1 q1 + p2 (1 − q2 )
p1 + p2 q2 (p1 + p2 )
(represented by Pf ). Therefore, the steady-state probabilities z=1
788
p1
N A2
Pf 11 have dimension n × (n + 1), Pn21 and Pf 21 are m × n
N A1 and Pn22 and Pf 22 are m × m. Hence, Pn and Pf are of
p1
p1 1 − p1
dimension (n + m) × (n + m).
1 − p1
Similar to the deadband case, at the moment of fault
1 − p1
NA A occurrence, the Markov model of the system switches from
1 − p2 1 − p2
the fault-free model (represented by Pn ) to the faulty model
p2
(represented by Pf ). Therefore, to calculate the probabilities
p2
A1 of the state after fault occurrence, Pn is assumed to reach its
Fig. 4. Markov diagram of a system with delay-timer n = 3 and m = 2 steady state; and the steady state probabilities of the fault-
in fault-free region of operation free model, should be used as initial states for the faulty
model. The steady state vector of probabilities for the fault-
[10] and ISA [9]) recommend some values for delay-timers free state of operation (e.g., the invariant vector for Pn ) is
based on the nature of the process variable. Similar to the [7]
deadband case, we can model the alarm/no-alarm states of
1
a process variable with delay-timers by Markov chains [7]. πn = ×
n−1
X m−1
i n
X j
Fig. 4 shows the Markov model of a system in its fault-free pm
2 p1 + p1 p2
i=0 j=0
operation with n = 3 samples on-delay and m = 2 samples
pm p1 pm pn−1 pm pn p2 pn pm−1 pn
··· ···
off-delay. A similar Markov model with probabilities q1 and 2 2 1 2 1 1 2 1
q2 can be constructed for the faulty region of operation. To calculate the detection delay for delay-timers, we use
In Fig. 4, assuming N A or no alarm state is the initial the concept of hitting time [11]. In our context, hitting time
state for the process, 1 − p1 denotes the probability that it is the minimum time required for system currently in the
will remain in the same state. If the next sample exceeds no-alarm state to switch to the alarm state for the first time.
alarm threshold with probability p1 then it moves to state We divide the whole state space into two subspaces.
N A1 , which is the first intermediate state for 3−samples The first subspace, denoted by D, contains all the no-alarm
delay case before going to alarm state. Exceeding threshold state(s) (N A, ..., N An−1 ). The second subspace contains all
by consecutive 3−samples with probability p1 will take the the alarm state(s) (A, ..., Am−1 ) and is denoted by E. With
system to the alarm state. If in between the states N A these definitions, the detection delay is the same as the hitting
and N A2 , any sample falls below threshold with probability time: the time required to switch to states E, assuming the
1 − p1 , then system will go back to no alarm state. Only system is initially started in states D.
consecutive 3-sample crossing over threshold can raise the Let Q be the matrix of transitional probabilities from D to
alarm, taking the system to the alarm state. Once the system itself in the faulty region. Q is obtained from Pf by keeping
goes to alarm state, to clear the alarm same principal is all the probabilities corresponding to no-alarm states and
followed as raising. Consecutive 2 samples falling below replacing all other probabilities with zero. For the Markov
threshold with probability p2 (where p2 = 1 − p1 ) can process in Fig. 4, Q is
only clear the alarm. In intermediate states as output, the
Pf 11 0n×(m−1)
Q=
system will always provide either alarm or no alarm, de- 0m×n 0m×m
pending on from where these intermediate states started. It can be shown that probability of z−sample detection
Transitional probability matrices for n−samples on-delay delay for delay-timer (e.g., hitting time for switching from
and m−samples off-delay for both the fault-free (Pn ) and no-alarm states D, to alarm states E) is given by
faulty (Pf ) region of operations are given by P (DD = z) = πn .Pf .Q
z
0···0 1···1 T
Pn =
Pn11 0n×(m−1)
, Pf =
Pf 11 0n×(m−1) Here in the column vector there are n zeros and m ones.
Pn21 Pn22 Pf 21 Pf 22
The expected detection delay for delay-timer can then be
where, expressed as
1 − p1 p1 0 ··· 0
∞
−2
X T
1 − p1 0 ··· 0
p1 E(DD) = z.P (DD = z) = πn .Pf .Q.(I − Q) . 0···0 1···1
Pn11 = . . . . ,
. . . .. .
z=0
. . . . .
n−1
n−1 n−j−1 n−1
1 − p1 0 0 ··· p1 pn
X i
X j
X k n
X i
pm−1
2 1 q1 q2 + p2 p1 q2 − q2 p1
i=0 j=0 k=0 i=0
1 − p2 p2 0 ··· 0 =
0 0 ··· 0
n−1 m−1
!
1 − p2 0 p2 ··· 0 m
X i n
X i
.
.
.
.
.
.
.
.
. . .
. q2n p2 p1 + p1 p2
Pn21 = . . . . , Pn22 = . . . .. . ,
. i=0 i=0
. . . .
0 0 ··· 0
p2 0 ··· 0
1 − p
2 0 0 ··· p 2 Since Q is a substochastic matrix, i.e. a matrix with
1 − p2 0 0 ··· 0
nonnegative entries whose row sums are less than or equal
here, horizontal line in Pn is used to indicate, number of to 1 and Qn → 0 as n → ∞; therefore, all eigenvalues of
columns in Pn11 and Pn21 or in 0n×(m−1) and Pn22 are Q have absolute values strictly less than 1; and the series in
not equal. Same is true for Pf . Furthermore Pf 11 , Pf 21 the summation converges [11].
and Pf 22 have the same structure as Pn11 , Pn21 and Pn22 Detection delays for delay-timers are calculated so far for
respectively, only p1 , p2 are replaced by q2 , q1 . Pn11 and n−samples on-delay and m−samples off-delay. When n = 1
789
12
EDD by Monte Carlo Simulation
10
EDD by Monte Carlo Simulation
setting the threshold lower than the optimum point will result
EDD by Markov Model EDD by Markov Model
10
in smaller number of missed alarms but more false alarms.
8
In the rest of this section, we focus our design on delay-
8
6
timers only. For a given set of fault-free and faulty data, the
n=4
design parameters are then the threshold (t), the number of
EDD
EDD
6 n=4
790
TABLE I
40
FAR MAR D ESIGN PARAMETERS SELECTION CHART
35
n=1
Delay-timer (n) Threshold (t) FAR = MAR (%) EDD
30
1 0.67 25.26 0.21
25 2 0.67 13.76 1.26
FAR / MAR
n=2
20
3 0.67 6.37 2.89
n=3
4 0.67 2.63 5.04
15 n=4 5 0.67 1.01 7.66
10
n=5
5 detection delay is 4.92 samples, which are consistent with
0
0 0.5 1 1.5
the calculated values.
Threshold
Fig. 7. Estimation of threshold when FAR = MAR VII. CONCLUSIONS AND FUTURE WORKS
n ≥ n1 . The corresponding area for FAR and MAR (for The expected detection delay, as a measure of the time
≤ 3%) is shown by the shaded area in Fig. 6. The smallest it takes for the alarm system to respond to a fault, is an
delay-timer that satisfies the condition is n1 = 4. Though important parameter in the design of alarm systems. In this
design requirement was for FAR ≤ 4%, n1 is selected for paper, the expected detection delay is calculated for two com-
the one with lower percentage (≤ 3%) requirements among mon techniques in alarm systems, namely, deadbands and
FAR and MAR. Since n1 is selected for more conservative delay-timers. We also presented a simple design procedure,
range, the selected delay-timer does not violate the original based on three important performance measures of an alarm
requirements and is expected to provide better performance. system: false alarm rate, missed alarm rate and the expected
detection delay. As a future extension, a more systematic
Step 3 approach to the design of parameters of the alarm system
(on-delay timer, off-delay timer and the trip-point) is under
The EDD is taken into account in step 3 for the design of
investigation.
delay-timer. The largest value of delay-timer n2 is selected
such that EDD ≤ 6; where n ≤ n2 . From Fig. 8, n2 = 4. VIII. ACKNOWLEDGMENTS
Step 4 The authors would like to thank Dr. B. Schmuland, Depart-
ment of Mathematical and Statistical Sciences, University of
Once n1 and n2 are estimated, the next step is to select the
Alberta, for helpful discussions.
range of delay timers to finalize the design process. If n2 ≥
n1 , any n satisfying n1 ≤ n ≤ n2 is a solution of the delay- R EFERENCES
timer. Here the only delay-timer that satisfies the condition [1] R. Isermann, Fault-Diagnosis Systems: An Introduction from Fault
is, n = 4; from Table I the shaded row is the solution of Detection to Fault Tolerance, Springer-Verlag, Berlin, Heidelberg,
given design problem. n = 3 satisfies the condition of EDD 2006.
[2] J. Chen and R. Patton, Robust Model-Based Fault Diagnosis for
but does not satisfy the requirement of FAR / MAR; other Dynamic Systems, Kluwer Academic Publishers, Boston, 1999.
delay-timers also do not satisfy requirements. The optimum [3] J.J. Gertler, Fault Detection and Diagnosis in Engineering Systems,
threshold of operation is 0.67 and delay-timer is 4 samples, Marcel Dekker, New York, 1998.
[4] S. X. Ding, Model-based Fault Diagnosis Techniques: Design
and it will take 5.04 samples to raise the alarm. If such a Schemes, Algorithms, and Tools, Springer-Verlag, Berlin, Heidelberg,
range of delay-timer cannot be found; or in other words if no 2008.
such n exists to satisfy n1 ≤ n ≤ n2 ; design requirements [5] J. Ahnlund, T. Bergquist, and L. Spaanenburg, Rule-based reduction of
alarm signals in industrial control, Journal of Intelligent and Fuzzy
need adjustments. Less demanding design requirements are Systems, IOS press, vol. 14, no. 2, pp. 73-84, 2003.
recommended for such cases. [6] I. Izadi, S.L. Shah, D. S. Shook, and T. Chen, An introduction to
A Monte Carlo simulation is performed to check consis- alarm analysis and design, Proceedings of the 7th IFAC Symposium
on Fault Detection, Supervision and Safety of Technical Processes,
tency of the calculated values. Setting the threshold to 0.67, pp. 645-650, June 30 - July 3, 2009, Barcelona, Spain.
associated false alarm rates, missed alarm rates and detection [7] I. Izadi, S. L. Shah, D. S. Shook, S. R. Kondaveeti and T. Chen,
delays were calculated. From a Monte Carlo simulation of ”Optimal alarm design”, Proceedings of the 7th IFAC Symposium on
Fault Detection, Supervision and Safety of Technical Processes, pp.
5000 iterations (for Gaussian distributed fault-free data with 651-656, June 30 - July 3, 2009, Barcelona, Spain.
mean 0, variance 1 and faulty data with mean 2 and variance [8] M. Basseville, I.V. Nikiforov, Detection of Abrupt Changes: Theory
2), the calculated FAR is 2.58%, MAR is 2.84 %, and and Application, PTR Prentice-Hall Inc. 1993.
[9] International Society of Automation, Management of Alarm Systems
for the Process Industries, ANSI/ISA18.22009.
[10] Engineering Equipment and Materials Users Association (EEMUA),
Alarm Systems-A Guide to Design, Management and Procurement,
EEMUA Publication 191, 2007.
[11] G. F. Lawler, Introduction to Stochastic Processes, 2nd Edition,
Chapman & Hall/CRC, 2006.
[12] A. Hugo, ”Estimation of Alarm Deadbands”,Proceedings of the 7th
IFAC Symposium on Fault Detection, Supervision and Safety of
Technical Processes, pp. 663-667, June 30 - July 3, 2009, Barcelona,
Spain.
791