Journal of Loss Prevention in the Process Industries 16 (2003) 561–573
www.elsevier.com/locate/jlp
Risk-based maintenance (RBM): a quantitative approach for maintenance/inspection scheduling and planning
Faisal I. Khan, Mahmoud M. Haddara
Faculty of Engineering and Applied Science, Memorial University of Newfoundland, St. John's, Nfld, Canada A1B 3X5
Abstract
The overall objective of the maintenance process is to increase the profitability of the operation and optimize the total life cycle
cost without compromising safety or environmental issues. Risk assessment integrates reliability with safety and environmental
issues and therefore can be used as a decision tool for preventive maintenance planning. Maintenance planning based on risk
analysis minimizes the probability of system failure and its consequences (related to safety, economics, and the environment). It helps management make sound decisions concerning investment in maintenance or related fields. This will, in turn, result in better
asset and capital utilization.
This paper presents a new methodology for risk-based maintenance. The proposed methodology is comprehensive and quantitative.
It comprises three main modules: risk estimation module, risk evaluation module, and maintenance planning module. Details of
the three modules are given. A case study, which illustrates the application of the methodology to a heating, ventilation and air-conditioning
(HVAC) system, is also discussed.
© 2003 Elsevier Ltd. All rights reserved.
Keywords: Maintenance; Risk assessment; Risk-based maintenance; Risk-based inspection; Maintenance planning
1. Introduction
The last two decades witnessed major progress in the development of new maintenance strategies. Progress in the maintenance area has been motivated by the increase in the number, size, complexity, and variety of physical assets, and by a growing awareness of the impact of maintenance on the environment, the safety of personnel, the profitability of the business, and the quality of the products.
Unexpected failures usually have adverse effects on
the environment and may result in major accidents. Studies by Kletz (1994), Khan and Abbasi (1998), and
Kumar (1998) show the close relationship between
maintenance practices and the occurrence of major accidents. Profitability is closely related to availability and
reliability of the equipment, while product quality is very
much dependent on equipment condition. The major
challenge for a maintenance engineer is to implement a maintenance strategy which maximizes availability and efficiency of the equipment; controls the rate of equipment deterioration; ensures a safe and environmentally friendly operation; and minimizes the total cost of the operation. This can only be achieved by adopting a structured approach to the study of equipment failure and the design of an optimum strategy for inspection and maintenance.
Corresponding author. Tel.: +1-709-737-8939/7652; fax: +1-709-737-4042.
E-mail address: fkhan@engr.mun.ca (F.I. Khan).
0950-4230/$ - see front matter © 2003 Elsevier Ltd. All rights reserved.
doi:10.1016/j.jlp.2003.08.011
Maintenance management techniques have been through a major process of metamorphosis, from focusing on periodic overhauls to the use of condition monitoring, reliability-centered maintenance, and expert systems. Most recently, risk-based maintenance methodologies started to emerge.
Chen and Toyoda (1990) proposed a strategy for
maintenance scheduling based on equalizing incremental
risk. The risk-based inspection and maintenance strategy
developed by the American Society of Mechanical
Engineers (1991) was used as a basis for developing a
base resource document on risk-based inspection by
the American Petroleum Institute, API (1995).
Work by Aller, Horowitz, Reynolds, and Weber (1995) and Reynolds (1995) constituted the basis for the development of a risk-based inspection policy for equipment owned by Brunei Shell Petroleum (Hagemeijer and Kerkveld, 1998).
Nomenclature
A       system performance loss factor (dimensionless)
B       financial loss factor (dimensionless)
C       human health loss factor (dimensionless)
D       environmental loss factor (dimensionless)
AR      area under the damage radius (m²)
AD      asset density in the vicinity of the event (up to 500 m radius) ($/m²)
PDI     population density in the vicinity of the event (up to 500 m radius) (persons/m²)
IM      importance factor, which can be derived from Fig. 4 (dimensionless)
i       number of events (fire, explosion, toxic release, etc.)
t       failure time (h)
η       characteristic life of the component (scale of Weibull distribution) (h)
β       slope or shape factor of Weibull distribution
F(t)    failure probability function
PDF1    population distribution factor (dimensionless)
A risk-based approach has been applied successfully
to the maintenance of oil pipelines. Dey, Ogunlana,
Gupta, and Tabucanon (1998) discussed a simple risk-based model for the maintenance of a cross-country
pipeline. Nessim and Stephens (1998) proposed a quantitative risk analysis model, and recently Dey (2001)
described a more general model for risk-based inspection
and maintenance of cross-country pipelines.
The use of a risk-based policy in the maintenance of
medical devices has been tackled by Capuano and Koritko (1996) and Ridgway (2001).
The review of the literature indicates that there is a
new trend to use the level of risk as a criterion to plan
maintenance tasks. However, most of the previous studies focused on a particular equipment type. It seems that
there is a need for a more generalized methodology that
can be applied to all types of assets irrespective of
their characteristics.
There is also a need for a more realistic quantification
of risk factors. The quantitative description of risk is
affected by the quality of the consequence study and the
accuracy of the estimates of the probability of failure.
This study will focus, among other things, on these two
factors. It is hoped that this study will lead to a mathematical model that can be used to develop an optimum
maintenance strategy.
1.1. Concept of risk and its relevance in maintenance
One of the main objectives of a sound maintenance
strategy is the minimization of hazards, both to humans
and to the environment, caused by the unexpected failure
of the equipment. In addition, the strategy has to be cost effective. Using a risk-based approach yields a strategy that meets these objectives. Such an approach uses information obtained from the study of failure modes and their economic consequences.
Risk analysis is a technique for identifying, characterizing, quantifying, and evaluating the loss from an event. The risk analysis approach integrates probability and consequence analysis at various stages of the analysis and attempts to answer the following questions:
• What can go wrong that could lead to a system failure?
• How can it go wrong?
• How likely is its occurrence?
• What would be the consequences if it happens?
In this context, risk can be defined qualitatively/quantitatively as the following set of duplets for a particular failure scenario:
$$\text{Risk} = \text{probability of failure} \times \text{consequence of the failure}$$
Risk assessment can be quantitative or qualitative. The
output of a quantitative risk assessment will typically be
a number, such as cost impact ($) per unit time. The
number could be used to prioritize a series of items that
have been risk assessed. Quantitative risk assessment
requires a great deal of data both for the assessment of
probabilities and assessment of consequences. Fault trees or decision trees are often used to determine the probability that a certain sequence of events will result in a certain consequence. Qualitative risk assessment is less rigorous and the results are often shown in the form of a simple risk matrix where one axis of the matrix represents the probability and the other represents the consequences. If a value is assigned to both the probability and the consequence, a relative value for risk can be calculated. It is important to recognize that the qualitative risk value is a relative number that has little meaning outside the framework of the matrix. Within the framework of the matrix, it provides a natural prioritization of the items assessed. However, as these risk values are subjective, prioritizations based on these values are always debatable.
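As a minimal sketch of the risk-matrix idea described above, the relative risk value of each item can be taken as the product of an ordinal probability score and an ordinal consequence score, and the items ranked by that product. The item names and the 1 to 5 scoring scale below are hypothetical and are not taken from this paper.

```python
# Hypothetical ordinal scores (1 = lowest, 5 = highest); not taken from the paper.
items = {
    "pump seal": {"probability": 4, "consequence": 2},
    "relief valve": {"probability": 2, "consequence": 5},
    "level gauge": {"probability": 1, "consequence": 1},
}


def relative_risk(scores):
    """Relative risk value of one item: probability score times consequence score."""
    return scores["probability"] * scores["consequence"]


def prioritize(items):
    """Return item names ordered from highest to lowest relative risk."""
    return sorted(items, key=lambda name: relative_risk(items[name]), reverse=True)


ranking = prioritize(items)
print(ranking)
```

The resulting order is only meaningful relative to the chosen scoring scale, which is the limitation of qualitative risk values noted in the text.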
The proposed risk-based maintenance (RBM) strategy
aims at reducing the overall risk of failure of the
operating facilities. In areas of high and medium risk, a
focused maintenance effort is required, whereas in areas
of low risk, the effort is minimized to reduce the total
scope of work and cost of the maintenance program in
a structured and justifiable way. The quantitative value
of the risk is used to prioritize inspection and maintenance activities. RBM suggests a set of recommendations on how many preventive tasks (including the type, means, and timing) are to be performed. The implementation of RBM will reduce the likelihood of an unexpected failure. A detailed description of the methodology is presented in the subsequent sections.
2. Risk-based maintenance methodology
The risk-based maintenance methodology is broken down into three main modules, see Fig. 1:
(1) risk determination, which consists of risk identification and estimation,
(2) risk evaluation, which consists of risk aversion and risk acceptance analysis, and
(3) maintenance planning considering risk factors.
Fig. 1. Architecture of RBM methodology.
3. Module I: risk estimation
This module comprises four steps, which are logically linked as shown in Fig. 2. A detailed description of each step is presented below.
Fig. 2. Description of risk estimation module.
3.1. Step I.1. Failure scenario development
A failure scenario is a description of a series of events which may lead to a system failure. It may contain a single event or a combination of sequential events. Usually, a system failure occurs as a result of an interacting sequence of events. The expectation of a scenario does not mean it will indeed occur, but that there is a reasonable probability that it would occur. A failure scenario is the basis of the risk study; it tells us what may happen so that we can devise ways and means of preventing or minimizing the possibility of its occurrence. Such scenarios are generated based on the operational characteristics of the system; the physical conditions under which operation occurs; the geometry of the system; safety arrangements, etc. Recently, Khan (2001) has proposed a systematic procedure, the maximum credible accident scenario (MCAS), to evaluate failure (accident) scenarios in a process system. The procedure introduces the concept of maximum credible scenarios as an alternative to the current methodology based on the worst-case scenario as recommended by many regulatory agencies.
The developed failure scenarios are then screened to short list the ones that are more relevant to the system at hand. MCAS provides the criteria to form this short list.
3.2. Step I.2. Consequence assessment
The objective here is to prioritize equipment and their
components on the basis of their contribution to a system
failure. For example, in the case of a pressure containment, a pinhole leak on a process line may not lead to
a total loss of production. This is in contrast to a failure
of a pipe valve which may cause a shut down of the line.
Consequence analysis involves assessment of likely
consequences if a failure scenario does materialize.
Initially, consequences are quantified in terms of damage
radii (the radius of the area in which the damage would
readily occur), damage to property (shattering of window
panes, caving of buildings) and toxic effects
(chronic/acute toxicity, mortality). The calculated damage radii are later used to assess the effect on human
health, and environmental and production losses. Fig. 3
illustrates the procedure for this step. The assessment of
consequences involves a wide variety of mathematical
models. For example, source models are used to predict
the rate of release of hazardous material, the degree of
flashing, and the rate of evaporation. The models for explosions and fires are used to predict the characteristics of explosions and fires. The impact intensity models are used to predict the damage zones due to fires,
explosion and toxic load. Lastly, toxic gas models are
used to predict human response to different levels of
exposures to toxic chemicals. There are many tools
available to conduct this analysis such as WHAZAN,
MAXCRED, RISKIT, etc. (Khan & Abbasi, 1999a).
MAXCRED is one of the recent tools that is built upon
the latest models of fires, explosions and toxic release
and dispersion (Khan & Abbasi, 1999a). The total consequence assessment is a combination of four major categories as described below.
3.2.1. System performance loss
Factor A accounts for the system's performance loss due to component/unit failure. This is estimated semi-qualitatively based on expert opinion. In this work, the following procedure is suggested for determining the value of this parameter:
$$A_i = \text{function}(\text{performance}) \qquad (1)$$
Details of the function are given in Table 1.

3.2.2. Financial loss
Factor B accounts for the damage to the property or
assets and may be estimated for each accident scenario
using the following relations:
$$B_i = \frac{(AR)_i \, (AD)_i}{UFL} \qquad (2)$$
$$B = \sum_{i=1}^{n} B_i \qquad (3)$$
where i denotes the number of events, i.e. fire, explosion, toxic release, etc. The UFL in Eq. (2) signifies the level of an unacceptable loss. In the present study, we will use a value of 1000 for UFL. This value is subjective and may change from case to case as per an organization's criterion.
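The financial loss relations in Eqs. (2) and (3) can be sketched as follows. The UFL value of 1000 is the one suggested in the text; the damage areas and asset densities are hypothetical inputs chosen only to make the arithmetic easy to follow.

```python
UFL = 1000.0  # level of unacceptable loss suggested in the text


def financial_loss_factor(events, ufl=UFL):
    """Eqs. (2)-(3): B_i = AR_i * AD_i / UFL for each event, and B = sum of the B_i."""
    b_i = [area_m2 * asset_density / ufl for area_m2, asset_density in events]
    return b_i, sum(b_i)


# Hypothetical (AR in m^2, AD in $/m^2) pairs, e.g. one fire and one explosion event.
events = [(200.0, 10.0), (500.0, 4.0)]
b_i, b_total = financial_loss_factor(events)
print(b_i, b_total)
```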
3.2.3. Human health loss
A fatality factor is estimated for each accident scenario using the following equations:
$$PD1 = PDI \times PDF1 \qquad (4)$$
$$C_i = \frac{(AR)_i \, (PD1)_i}{UFR} \qquad (5)$$
$$C = \sum_{i=1}^{n} C_i \qquad (6)$$
where UFR denotes an unacceptable fatality rate. The suggested value for UFR is $10^{-3}$ (a subjective value that may change from case to case). PDF1 is the population distribution factor, which reflects heterogeneity of the population distribution. If the population is uniformly distributed in the region of study (~500 m radius), the factor is assigned a value of 1; if the population is localized and away from the point of accident, the lowest value, 0.2, is assigned. Values for this parameter have been adapted from the work of Hirst and Carter (2000).
Fig. 3. Consequence assessment chart.
Table 1
Quantification scheme for system performance function used in Eq. (1)
| Class | Description | Function (operation) |
|---|---|---|
| I | Very important for system operation. Failure would cause system to stop functioning | 8–10 |
| II | Important for good operation. Failure would cause impaired performance and adverse consequences | 6–8 |
| III | Required for good operation. Failure may affect the performance and may lead to subsequent failure of the system | 4–6 |
| IV | Optional for good performance. Failure may not affect the performance immediately but prolonged failure may cause system to fail | 2–4 |
| V | Optional for operation. Failure may not affect the system's performance | 0–2 |
3.2.4. Environment and/or ecological loss
The factor D signifies damage to the ecosystem, which can be estimated as:
$$D_i = \frac{(AR)_i \, (IM)_i}{UDA} \qquad (7)$$
$$D = \sum_{i=1}^{n} D_i \qquad (8)$$
UDA indicates a level for the unacceptable damaging area; the suggested value for this parameter is 1000 m² (a subjective value that may change from case to case). IM denotes the importance factor. IM is unity if the damage radius is higher than the distance between an accident and the location of the ecosystem. This parameter is quantified using Fig. 4, see Khan and Abbasi (1997).
Finally, these four factors are combined together to yield the factor Con:
$$Con = \left[0.25A^2 + 0.25B^2 + 0.25C^2 + 0.25D^2\right]^{0.5} \qquad (9)$$
Fig. 4. Quantification of importance factor (IM).
3.3. Step I.3. Probabilistic failure analysis
Probabilistic failure analysis is conducted using fault tree analysis (FTA). The use of FTA, together with component failure data and human reliability data, enables the determination of the frequency of occurrence of an accident. Developing probabilistic fault trees is made easier using a methodology called analytical simulation, see Khan and Abbasi (2000).
The key features of this step are:
(1) Fault tree development: The top event is identified based on the detailed study of the process, control arrangement, and behavior of components of the unit/plant. A logical dependency between the causes leading to the top event (failure) is developed.
(2) Boolean matrix creation: The fault tree developed is
transformed to a Boolean matrix. If the dimension
of the Boolean matrix is too large to be handled by
the available computer, a structural moduling technique may be applied (Shafaghi, 1988; Yllera,
1988). This technique proposes moduling of the fault
tree into a number of smaller submodules with
dependency relations among them. This reduces the
memory allocation problem as well as makes the
computation faster.
(3) Finding of minimum cutsets and optimization: Minimum cutsets are determined from the Boolean matrix (Greenberg & Slater, 1992). If the problem has been structurally moduled, then each module is solved independently, and the results are combined. The minimum cutsets are then optimized using an appropriate technique. Optimization is necessary in order to eliminate the unimportant paths (cutsets).
(4) Probability analysis: The optimized minimum cutsets are used to estimate probabilities. The present authors recommend the use of the Monte Carlo simulation method (Rauzy, 1993; Soon, Joo, & Myung, 1985) for this purpose. The simulation methods not only give the probability of the top event but also provide information on the sensitivity of the results. In addition, simulation is helpful in studying the impact of each of the initiating events. To increase the accuracy of the computations and reduce the margin of error due to inaccuracies in the reliability data of the basic events (initiating events), we recommend the use of a fuzzy probability set (Dubois & Prade, 1980; Lai, Shenoi, & Fan, 1988; Noma, Tankara, & Asai, 1981; Tanaka, Fan, Lai, & Toguchi, 1983). Fuzzy probability set theory is used in the analytical simulation algorithm and coded in the PROFAT software (Khan & Abbasi, 1999b).
(5) Improvement index estimation: The improvement
index provides a measure of the impact of each root
cause on the final failure event. Improvement indices
are estimated using the simulation results. To estimate the impact of a root cause, the simulation is
carried out twice: with and without the cause. The
improvement index is then obtained as a measure
of the change in the probability of occurrence of the
final event.
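The probability analysis and improvement index steps above can be sketched with a plain Monte Carlo simulation over minimum cutsets. This is an illustrative sketch only: it does not use fuzzy probability sets or the PROFAT software, the basic-event probabilities are the 1-year component values reported later in Table 5, and treating every component as a single-event cutset (a pure OR gate) is an assumption made here, not a structure confirmed by the paper.

```python
import random

# Basic-event probabilities: the 1-year component values reported in Table 5.
basic_events = {
    "fan_belt": 1.49e-1,
    "vortex_vanes": 3.42e-2,
    "fan_bearing": 7.88e-3,
    "fan_assembly": 4.74e-3,
    "fan_motor": 9.47e-3,
}
# Assumption: every component is a single-event minimum cutset (pure OR gate).
cut_sets = [{name} for name in basic_events]


def top_event_probability(probabilities, cut_sets, trials=20_000, seed=1):
    """Estimate P(top event) by sampling basic events and testing each cutset."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        failed = {name for name, p in probabilities.items() if rng.random() < p}
        # The top event occurs when all events of at least one cutset have failed.
        if any(cut <= failed for cut in cut_sets):
            hits += 1
    return hits / trials


def improvement_index(probabilities, cut_sets, cause, **kwargs):
    """Change in top-event probability when one root cause is removed."""
    without = dict(probabilities, **{cause: 0.0})
    return (top_event_probability(probabilities, cut_sets, **kwargs)
            - top_event_probability(without, cut_sets, **kwargs))


p_top = top_event_probability(basic_events, cut_sets)
indices = {name: improvement_index(basic_events, cut_sets, name) for name in basic_events}
print(p_top, max(indices, key=indices.get))
```

Both runs in `improvement_index` reuse the same seed, so the "with" and "without" estimates share their random draws and the difference is less noisy than it would be with independent samples.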
3.4. Step I.4. Risk estimation
The results of the consequence and the probabilistic failure analyses are then used to estimate the risk that may result from the failure of each unit. In the next module, we will show how the estimated risk is evaluated against the acceptance criteria.
4. Module II: risk evaluation
The objective of this module is to evaluate the risk estimated using the methodology explained above. The algorithm used is shown in Fig. 5. This evaluation algorithm comprises two steps as detailed below.
4.1. Step II.1. Setting up the acceptance criteria
In this step, we identify the specific risk acceptance criteria to be used in our study. To allow for different criteria for the acceptable level of risk depending on the system nature and type, an open-ended methodology has been used in this study. Different risk acceptance criteria are available in the literature, see ALARP (as low as reasonably practicable), Dutch acceptance criteria, and USEPA acceptance criteria (Lees, 1996).
4.2. Step II.2. Risk comparison against acceptance criteria
In this step, we apply the acceptance criteria to the estimated risk for each unit in the system. Units whose estimated risk exceeds the acceptance criteria are identified. These are the units that should have an improved maintenance plan.
Fig. 5. Description of risk evaluation module.
5. Module III: maintenance planning
Units whose level of estimated risk exceeds the acceptance criteria are studied in detail with the objective of reducing the level of risk through a better maintenance plan. The details of this analysis are given below, see Fig. 6.
5.1. Step III.1. Estimation of optimal maintenance duration
The individual failure causes are studied to determine which one affects the probability of failure adversely. A reverse fault analysis is carried out to determine the required value of the probability of failure of the root event. A maintenance plan is then completed.
5.2. Step III.2. Re-estimation and re-evaluation of risk
The last step in this methodology aims at verifying that the maintenance plan developed produces an acceptable total risk level for the system.
6. Case study: a maintenance plan for an HVAC system
6.1. System description
Heating, ventilation and air-conditioning (HVAC) systems control the temperature, humidity, and total air quality in residential, commercial, and industrial buildings. Efficient and failure free operation of the HVAC system is critical for the safety of patients. A typical HVAC system consists of various mechanical, electrical, and electronic components such as motors, compressors, pumps, fans, ducts, pipes, thermostats, and switches. A simplified block diagram of an HVAC system is shown in Fig. 7. Maintaining uninterrupted operation of an HVAC system requires a plan for early correction of anticipated problems. Further, planned maintenance ensures conservation, recovery, and recycling of the chlorofluorocarbon (CFC) and hydrochlorofluorocarbon (HCFC) refrigerants used in these systems. The release of CFCs and HCFCs contributes to the depletion of the stratospheric ozone layer, which protects plant and animal life from ultraviolet radiation.
The present case study deals with the analysis of an HVAC system (Wong, 2000) and the development of a maintenance plan to provide efficient and failure free operation. The process flow diagram of the HVAC is shown in Fig. 7.
6.2. Risk estimation
6.2.1. Failure scenarios
The complete HVAC system has been divided into 10
different functional units according to their operational
characteristics (Table 2). Two most probable failure
scenarios have been developed for most of the units and
listed in Table 2. These failure scenarios have been subjected to consequence analysis.
6.2.2. Consequence analysis
Consequence analysis has been carried out for envisaged failure scenarios for each of the 10 units. The operation of an HVAC system does not involve the processing of chemicals and the effect of its stoppage cannot
be measured in terms of the lost production. Thus, the
consequences related to financial and to human health
loss have been ignored. The focus in this study is on
the consequences related to system performance and the
effect on the environment. The consequences for these
two major classes are combined by applying Eq. (9). They are then normalized on a scale of 1–10 for each failure scenario. The results of consequence analysis for envisaged failure scenarios of the different units of the system are presented in Table 3. It is evident from Table 3 that the highest impact on the system performance results from three units: the air supply fan, the EP relay, and the freeze protection system.
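A minimal sketch of the combination in Eq. (9) is given below. The factor values are those listed for Scenario 1 of the damper motor in Appendix A (A = 10, B = 0, C = 6, D = 0); the rounding to an integer score is an assumption made here for comparison with Table 3, since the exact normalization procedure is not spelled out.

```python
import math


def combined_consequence(a, b=0.0, c=0.0, d=0.0):
    """Eq. (9): Con = sqrt(0.25*A^2 + 0.25*B^2 + 0.25*C^2 + 0.25*D^2)."""
    return math.sqrt(0.25 * (a**2 + b**2 + c**2 + d**2))


# Damper motor, Scenario 1 (factor values from Appendix A).
con = combined_consequence(a=10.0, b=0.0, c=6.0, d=0.0)
score = round(con)  # assumption: nearest-integer score for comparison with Table 3
print(con, score)
```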
6.2.3. Probabilistic failure analysis
An analysis has been carried out to determine the failure probability distribution for each unit listed in Table 3. For units having more than one failure scenario, the scenarios that have the maximum consequences are selected for subsequent analysis.
The Memorial University facility management division has been maintaining performance data for all of the components of the HVAC system. In the present study, 5 years of data (1993–1998) have been used (Wong, 2000). These data have been used to verify various failure functions (distributions), and it has been observed that the two-parameter Weibull distribution describes them best (Eq. (10)):
$$F(t) = 1 - \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right] \qquad (10)$$
The two parameters, η and β, are estimated for the different units and their subcomponents and are presented in Tables 4 and 5 (Wong, 2000).
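The Weibull failure function of Eq. (10) can be evaluated directly. The sketch below uses the damper motor parameters listed in Table 4 (scale 51,996.6 h, shape 3.85); taking one year as 8760 operating hours is an assumption made here.

```python
import math

HOURS_PER_YEAR = 8760.0  # assumption: continuous operation, 365 days x 24 h


def weibull_failure_probability(t_hours, eta_hours, beta):
    """Eq. (10): F(t) = 1 - exp(-(t / eta) ** beta)."""
    return 1.0 - math.exp(-((t_hours / eta_hours) ** beta))


# Damper motor parameters as listed in Table 4: eta = 51,996.6 h, beta = 3.85.
p_damper = weibull_failure_probability(HOURS_PER_YEAR, 51996.6, 3.85)
print(f"{p_damper:.2e}")
```

Under that assumption the result is close to the 1-year probability of failure reported for the damper motor in Table 4 (1.05E-03).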
Subsequently, fault trees have been developed for the
envisaged failure scenarios of the different units. Fig. 8
depicts the fault tree for the whole system, while Fig. 9
shows the fault tree for the air supply fan. These fault
trees are used to estimate the probability of occurrence
of failure according to the different scenarios. The
results of this analysis are shown in Tables 4 and 5. It
is evident from Table 4 that the failure of the air supply
fan, the humidifier, the EP relay system, and the damper
motor are the most probable causes for the failure of the
HVAC system.
6.2.4. Risk estimation
The results of the consequence and the probabilistic
analyses are combined to quantify the risk factors.
Tables 4 and 5 provide the values estimated for the risk
factors of the different units of the HVAC system.
The risk for the HVAC system failure is estimated at 1.01 (for 1 year duration), which is far above the acceptance level of 1.0E-02.
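The risk factors in Table 4 appear consistent with multiplying the 1-year probability of failure by the consequence score of Table 3. The sketch below reproduces that product for four units and compares it with an acceptance level. The threshold of 1.0E-03 is an assumption made here for illustration (it is the target risk value quoted for the air supply fan in Section 6.4); the acceptance criterion actually applied to each unit may differ.

```python
ACCEPTANCE_LEVEL = 1.0e-3  # assumption: illustrative unit-level acceptance criterion

# (probability of failure in 1 year from Table 4, consequence score from Table 3)
units = {
    "Damper motor": (1.05e-3, 6),
    "Humidification unit": (3.58e-3, 5),
    "EP relay unit": (1.86e-3, 8),
    "Heating unit": (1.0e-4, 4),
}


def risk_factor(probability, consequence):
    """Risk factor as probability of failure times consequence."""
    return probability * consequence


risk_factors = {name: risk_factor(p, c) for name, (p, c) in units.items()}
# Units whose risk exceeds the acceptance level need an improved maintenance plan.
flagged = [name for name, risk in risk_factors.items() if risk > ACCEPTANCE_LEVEL]
print(flagged)
```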
6.3. Risk evaluation
The results in Table 4 show that in order to reduce the risk of the HVAC system failure, we need to reduce the probability of failure of the air supply fan, the humidifier, the EP relay system, and the damper motor. This will be dealt with through the use of a more effective maintenance program. For illustration purposes, stepwise detailed risk calculations for the damper motor are shown in Appendix A.
Fig. 6. Description of maintenance planning module.
Fig. 7. Simplified block diagram of a typical HVAC system.
Table 2
Units in a typical HVAC system
| Unit number in Fig. 8 | Unit name | Failure scenarios |
|---|---|---|
| 1 | Outdoor louver | Louver is blocked by foreign material |
| | | Louver is damaged or removed |
| 2 | Damper motor | Failed to allow fresh air intake during system operation |
| | | Failed to stop fresh air to HVAC during system shut down |
| 3 | Air filter unit | Filter failed to remove particles from intake air |
| | | Pre-filter failed |
| | | Main filter failed |
| 4 | Freeze protection unit | Unit failed to operate on demand |
| 5 | Heating unit | Failed to provide adequate heating |
| | | Provided excess heating |
| 6 | Cooling unit | Cooling coil failed to provide adequate cooling |
| | | Cooling coil provided excess cooling |
| 7 | Humidification unit | Failed to supply adequate moisture |
| | | Supplied excess moisture |
| 8 | EP relay unit | Failed to provide enough control air |
| | | Failed to energize the final control element (electric control system) |
| 9 | Computer control unit | Control system failed |
| 10 | Air supply fan | Failed to supply adequate conditioned air with acceptable noise level |
6.4. Maintenance planning
One of the objectives of this study is to develop a technique to design maintenance plans for reducing the level of risk resulting from the failure of a system. The approach suggested in this work is explained briefly in this section using the air supply fan unit as an example.
• A value of the probability of the top event on the fault tree of the unit is determined. This value is chosen such that the resulting risk meets the risk acceptance criterion. In the case of the air supply fan system, the value of risk is 1.0E-03.
• Using the value of the probability of failure of the top event, a reverse fault tree analysis is conducted to determine the required probabilities of the root events. The probability of failure of a root event is then used to estimate the time interval between consecutive inspection/maintenance tasks. Using this analysis, we were able to determine the values for the time intervals between consecutive maintenance tasks: 41 days for external accessories such as fan belts, and 75 days for internal accessories such as the fan bearing.
This exercise was repeated for the other units of the system and the estimates determined for the maintenance intervals are given in Table 6. These values are then used to develop a maintenance plan using the RBM methodology as shown in Table 6.
Fig. 8. Fault tree for HVAC system failure scenario; numbers are explained in Table 2.
7. Summary and conclusion
Maintenance is aimed at increasing the availability of any system while taking account of safety and environmental issues and optimizing total life cycle cost. Risk assessment integrates reliability analysis with safety and
Table 3
Results of consequence analysis for different accident scenarios
| Unit name | Failure scenarios | Consequence analysis results |
|---|---|---|
| Outdoor louver | Louver is blocked by foreign material | 6 |
| | Louver is damaged or removed | 4 |
| Damper motor | Failed to allow fresh air intake during system operation | 6 |
| | Failed to stop fresh air to HVAC during system shut down | 4 |
| Air filter unit | Filter failed to remove objects/particles from intake air | |
| | Pre-filter failed | 4 |
| | Main filter failed | 4 |
| Freeze protection unit | Unit failed to operate on demand | 8 |
| Heating unit | Failed to provide adequate heating | 4 |
| | Provided excess heating | 3 |
| Cooling unit | Cooling coil failed to provide adequate cooling | 4 |
| | Cooling coil provided excess cooling | 3 |
| Humidification unit | Failed to supply adequate moisture | 5 |
| | Supplied excess moisture | 4 |
| EP relay unit | Failed to provide enough control air | 6 |
| | Failed to energize the final control element (electric control system) | 8 |
| Computer control unit | Control system failed | 6 |
| Air supply fan | Failed to supply adequate conditioned air with acceptable level of noise | 8 |
| HVAC system | Failed to perform as desired | 5 |
Table 4
Results of risk estimation module; italicized units exceed the acceptance level
| Unit name | η (h) | β | Probability of failure in 1 year | Risk factor |
|---|---|---|---|---|
| Outdoor louver (a) | Not available | Not available | 1.0E-04 | 6.0E-04 |
| Damper motor | 51,996.6 | 3.85 | 1.05E-03 | 6.3E-03 |
| Air filter unit (a) | Not available | Not available | 1.0E-04 | 5.0E-04 |
| Freeze protection unit (a) | Not available | Not available | 1.0E-04 | 6.0E-04 |
| Heating unit (a) | Not available | Not available | 1.0E-04 | 4.0E-04 |
| Cooling unit (a) | Not available | Not available | 1.0E-04 | 4.0E-04 |
| Humidification unit | 57,608.6 | 2.99 | 3.58E-03 | 1.8E-02 |
| EP relay unit | 69,366.5 | 3.04 | 1.86E-03 | 1.5E-02 |
| Computer control unit (a) | Not available | Not available | 1.0E-04 | 8.0E-04 |
| Air supply fan | See Table 5 | See Table 5 | 0.1965 | 1.57 |
| Overall HVAC system failure as per Fig. 8 | | | 0.2021 | 1.01 |
(a) Failure data for these units were not available, as they never failed in operation; the failure probability for these units is adopted from the literature (Lees, 1996).
Table 5
Details of air supply fan failure
| Component number used in Fig. 9 | Unit name | η (h) | β | Probability of failure in 1 year | Risk factor |
|---|---|---|---|---|---|
| 1 | Fan belt failure | 20,464.3 | 2.146 | 1.49E-01 | 1.192 |
| 2 | Vortex vanes failed | 68,043.1 | 1.638 | 3.42E-02 | 2.73E-01 |
| 3 | Fan bearing failed | 62,328.5 | 2.466 | 7.88E-03 | 6.3E-02 |
| 4 | Fan assembly failed | 121,417.4 | 2.035 | 4.74E-03 | 3.8E-02 |
| 5 | Fan motor failed | 132,780.2 | 1.712 | 9.47E-03 | 7.6E-02 |
| | Air supply fan failure as per Fig. 9 | | | 0.1965 | 1.572 |
environmental issues. Risk-based maintenance attempts to answer five important questions related to integrity and fault free operation of the system:
• What can cause the system to fail?
• How can it cause the system to fail?
• What would be the consequences if it fails?
• How probable is it to occur?
• How frequently should which components be inspected/maintained to avert such failure?
Having known the answers to these five questions, it is safe to say that maintenance planning based on risk analysis is expected to provide cost effective maintenance, which minimizes the consequences (related to safety, economics, and the environment) of a system outage/failure. This will, in turn, result in better asset and capital utilization. Risk-based maintenance strategies can be used to improve the existing maintenance policies through optimal decision procedures in different phases of the life cycle of a system.
This paper presents a new methodology for risk-based maintenance. The proposed methodology is more comprehensive and quantitative. It comprises three main modules: (i) risk estimation module, (ii) risk evaluation module, and (iii) maintenance planning module. Each module consists of several steps; e.g. the risk estimation module involves: (i) failure scenario development, (ii) consequence assessment, (iii) probabilistic failure analysis, and (iv) risk estimation.
This paper illustrates the applicability of the proposed methodology by applying it to an HVAC system. Initially, the complete HVAC system is divided into 10 different units. Among these units, four (damper motor, freeze protection unit, EP relay unit, and supply air fan) were identified as the most risky, contributing the most to the overall risk of HVAC failure. An inspection/maintenance schedule has been worked out for all four units. It is further demonstrated analytically that the implementation of this inspection/maintenance schedule would bring down the high level of unacceptable risk to an acceptable level.
Fig. 9. Fault tree for air supply fan failure scenario; numbers are explained in Table 5.
Appendix A. Detailed calculations for damper motor unit of HVAC system
Table 6
Results of optimal maintenance duration computations

| Unit name | Optimal maintenance duration (days) | Revised frequency of failure (year^-1) | Un-revised risk factor | Revised risk factor |
|---|---|---|---|---|
| Damper motor | 132 | 2.1E-05 | 6.2E-03 | 1.26E-04 |
| Humidification unit | 172 | 3.77E-04 | 1.8E-02 | 1.88E-03 |
| EP relay unit | 81 | 1.91E-05 | 1.5E-02 | 1.57E-04 |
| Air supply fan | | 3.42E-03 | 1.57 | 2.73E-02 |
| Fan belt and vortex | 41 | | | |
| Fan bearing, etc. | 75 | | | |

Overall HVAC risk prior to implementing maintenance plan: 1.01
Overall HVAC risk after implementing maintenance plan: 2.2E-02

A.1. Failure scenarios
Two scenarios are envisaged for this unit:
Scenario 1: failed to allow fresh air
Scenario 2: failed to stop fresh air to HVAC during shut down.
A.2. Consequence analysis
A.2.1. Scenario 1
System performance loss is 100%, A = 10
No financial loss, B = 0
Serious human health effects due to non-availability of fresh air, C = 6
No environmental and ecological loss, D = 0
$\mathrm{Con} = [0.25 \times 10^2 + 0.25 \times 6^2]^{0.5} = 5.83 \approx 6$
A.2.2. Scenario 2
Significant loss of system performance, A = 8
No financial loss, B = 0
Moderately serious human health effects, C = 4
No environmental and ecological loss, D = 0
$\mathrm{Con} = [0.25 \times 8^2 + 0.25 \times 4^2]^{0.5} = 4.47 \approx 4$
Final consequence result = maximum of 4 and 6 = 6
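The consequence scores for the two damper motor scenarios can be reproduced with a few lines of code. This is a minimal sketch that applies the weighted root-sum-square form Con = [0.25A² + 0.25B² + 0.25C² + 0.25D²]^0.5 to the A, B, C, D values listed in A.2; the function name and the choice to round each scenario before taking the maximum are illustrative assumptions.

```python
import math


def consequence(a, b, c, d, weights=(0.25, 0.25, 0.25, 0.25)):
    """Combine the four loss scores A, B, C, D into one consequence value."""
    return math.sqrt(sum(w * x ** 2 for w, x in zip(weights, (a, b, c, d))))


# Scenario 1: failed to allow fresh air
scenario_1 = consequence(a=10, b=0, c=6, d=0)
# Scenario 2: failed to stop fresh air to HVAC during shut down
scenario_2 = consequence(a=8, b=0, c=4, d=0)

# The worse of the two rounded scenario scores is carried forward.
final_consequence = max(round(scenario_1), round(scenario_2))

print(f"Scenario 1: {scenario_1:.2f}")
print(f"Scenario 2: {scenario_2:.2f}")
print(f"Final consequence: {final_consequence}")
```

Taking the maximum over scenarios is a conservative choice: the unit is rated by its most severe credible failure mode.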
A.3. Probabilistic failure analysis
Failure probability in 1 year of operation, with β = 3.85 and η = 51,966.6 h:
$\text{Failure probability} = 1 - e^{-(t/\eta)^{\beta}} = 1 - e^{-(365 \times 24/51966.6)^{3.85}} = 1.05 \times 10^{-3}$
A.4. Risk estimation
Risk factor due to damper motor = $6 \times 1.05 \times 10^{-3} = 6.3 \times 10^{-3}$
Total calculated risk of the HVAC system = 1.01
A.5. Risk evaluation and maintenance planning
HVAC target risk = $2.2 \times 10^{-2}$
Target risk calculated for damper motor based on HVAC target risk and reverse fault tree analysis = $2.1 \times 10^{-5}$
Based on this target risk, the preventive maintenance time is 132 days.
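The damper motor figures in A.3 to A.5 can be checked numerically. This is a sketch, assuming a two-parameter Weibull model with β = 3.85 and η = 51,966.6 h, and assuming that the 132-day interval is the time at which the failure probability reaches the damper motor target value of 2.1 × 10^-5. That inversion is one plausible reading of the planning step, not necessarily the authors' exact procedure, and all names are illustrative.

```python
import math

ETA_HOURS = 51966.6          # Weibull scale parameter (hours)
BETA = 3.85                  # Weibull shape parameter
CONSEQUENCE = 6              # final consequence score from A.2
TARGET_PROBABILITY = 2.1e-5  # damper motor target value from A.5


def failure_probability(t_hours):
    """Weibull probability of failure by time t_hours."""
    return 1.0 - math.exp(-((t_hours / ETA_HOURS) ** BETA))


def time_to_reach(probability):
    """Invert the Weibull expression: time at which the failure probability equals the target."""
    return ETA_HOURS * (-math.log(1.0 - probability)) ** (1.0 / BETA)


p_one_year = failure_probability(365 * 24)
risk_factor = CONSEQUENCE * p_one_year
interval_days = time_to_reach(TARGET_PROBABILITY) / 24.0

print(f"Failure probability in 1 year: {p_one_year:.3e}")
print(f"Risk factor: {risk_factor:.3e}")
print(f"Maintenance interval: {interval_days:.0f} days")
```

Under these assumptions the script returns a one-year failure probability and risk factor that match A.3 and A.4 to rounding, and an interval that rounds to the 132 days quoted in A.5.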
References
Aller, J. E., Horowitz, N. C., Reynolds, J. T., & Weber, B. J. (1995). Risk based inspection for petrochemical industry. In Risk and safety assessment: where is the balance? New York: American Society of Mechanical Engineers.
API (1995). Base resource document on risk based inspection for API committee on refinery equipment. Washington, DC: American Petroleum Institute.
ASME (1991). Research task force on risk based inspection guidelines, risk based inspection: development of guidelines. In General document CRTD 20-1. Washington, DC: American Society of Mechanical Engineers.
Capuano, M., & Koritko, S. (1996). Risk oriented maintenance. Biomedical Instrumentation and Technology, January/February, 25-37.
Chen, L. N., & Toyoda, J. (1990). Maintenance scheduling based on two level hierarchical structure to equalize incremental risk. IEEE Transactions on Power Systems, 5(4), 1510-1561.
Dey, P. M. (2001). A risk-based model for inspection and maintenance of cross-country petroleum pipeline. Journal of Quality in Maintenance Engineering, 7(1), 25-41.
Dey, K. P., Ogunlana, S. O., Gupta, S. S., & Tabucanon, M. T. (1998). A risk-based maintenance model for cross-country pipelines. Cost Engineering, 40(4), 24-31.
Dubois, D., & Prade, H. (1980). Fuzzy sets and systems: Theory and applications. New York: Academic Press.
Greenberg, H. R., & Slater, B. B. (1992). Fault tree and event tree analysis. New York: Van Nostrand Reinhold.
Hagemeijer, P. M., & Kerkveld, G. (1998). A methodology for risk-based inspection of pressurized systems. Proceedings of the Institute of Mechanical Engineers, Part E, 212, 37-47.
Hirst, I. L., & Carter, D. A. (2000). A Worst Case methodology for risk assessment of major accident installations. Process Safety Progress, 19(2), 78-82.
Khan, F. I. (2001). Maximum credible accident scenario for realistic and reliable risk assessment. Chemical Engineering Progress, November, 55-67.
Khan, F. I., & Abbasi, S. A. (1997). Accident hazard index: A multi-attribute scheme for process industry hazard rating. Institution of Chemical Engineers (IChemE) of UK (Environmental Protection and Safety), IChemE, UK, 75B, 217.
Khan, F. I., & Abbasi, S. A. (1998). Safe maintenance practice. Chemical Industry Digest, March, 91-105.
Khan, F. I., & Abbasi, S. A. (1999a). MAXCRED: a new software package for rapid risk assessment in chemical process industries. Environment Modeling and Software, 14, 11-25.
Khan, F. I., & Abbasi, S. A. (1999b). PROFAT: A user-friendly system for probabilistic fault tree analysis. Process Safety Progress, 18(1), 42-49.
Khan, F. I., & Abbasi, S. A. (2000). Analytical simulation and PROFAT II: A new methodology and a computer automated tool for fault tree analysis in chemical process industries. Journal of Hazardous Materials, 75, 1-27.
Kletz, T. A. (1994). What went wrong. Houston, TX: Gulf Publication House.
Kumar, U. (1998). Maintenance strategies for mechanized and automated mining systems: a reliability and risk analysis based approach. Journal of Mines, Metals and Fuels, Annual review, 343-347.
Lai, F. S., Shenoi, S., & Fan, L. T. (1988). Fuzzy fault tree analysis theory and applications. In Kandel, & Avni (Eds.), (pp. 139-167). Engineering risk and hazard assessment, vol. 1. Florida: CRC Press Inc.
Lees, F. P. (1996). Loss prevention in chemical process industries, vol. 1. London: Butterworths.
Nessim, M., & Stephens, M. (1998). Quantitative risk-analysis model guides maintenance budgeting. Pipe Line and Gas Industry, 81(6), 133.
Noma, K., Tankara, H., & Asai, K. (1981). Fault tree analysis with fuzzy probability. Journal of Ergonomics, 17, 291-297.
Rauzy, A. (1993). New algorithms for fault tree analysis. Reliability Engineering and System Safety, 40, 203-211.
Reynolds, J. T. (1995). Risk based inspection improves safety of pressure equipment. Oil and Gas Journal, special 16 January issue.
Ridgway, M. (2001). Classifying medical devices according to their maintenance sensitivity: A practical, risk-based approach to PM program management. Biomedical Instrumentation and Technology, May/June, 167-176.
Shafaghi, A. (1988). Structure modeling of process systems for risk and reliability analysis. In Kandel, & Avni (Eds.), (pp. 45-64). Engineering risk and hazard assessment, vol. 2. Florida: CRC Press Inc.
Soon, H. C., Joo, Y. P., & Myung, K. K. (1985). The Monte Carlo method without sorting for uncertainty propagation analysis in PRA. Reliability Engineering, 10, 233.
Tanaka, H., Fan, L. T., Lai, F. S., & Toguchi, K. (1983). Fault tree analysis by fuzzy probability. IEEE Transactions on Reliability, R-32, 453-456.
Wong, D. (2000). A knowledge-based decision support system in reliability-centered maintenance of HVAC systems. PhD thesis, Memorial University of Newfoundland, St. John's, Canada.
