
Maintenance and replacement policies for protective devices with imperfect repairs

A. Pak¹, R. Pascual², A.K.S. Jardine¹

¹ Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, Canada
² Department of Mechanical Engineering, Universidad de Chile, Santiago, Chile

Abstract

This work considers setting an optimal policy for inspection, repair, and replacement (IRR) of a protective system. Failures of protective systems are hidden and can only be discovered during inspections or in an actual emergency. The proposed model maximizes the expected interval availability of a protective system over a given time interval. At each decision epoch, dynamic programming is used to determine the optimal time to the next inspection and the type of action to be undertaken, depending on the observed state of the device. Inspections are perfect (all failures are detected). If a failure is detected, an immediate repair or replacement is performed and the optimal time to the next inspection is determined. If the system is found in operating condition, either an immediate repair or replacement can be performed, or the optimal time to the next action is found. Repairs are assumed to be imperfect, i.e. they change the distribution of times to failure. The IRR policy is constrained by a maximum number of repairs and a maximum operating age. The proposed model has been applied to a set of real industrial data, and the results are discussed.

Keywords: protective device, inspection interval, non-periodic inspection, imperfect repair, dormant failures

Introduction

Protective devices exist to operate in a system emergency and to help prevent undesired consequences from developing, or to minimize them. Depending on the application, failure of a protective system can endanger equipment, the environment, or human life. The importance of the availability of protective devices employed to ensure safety is often underestimated, which leads to unnecessary losses that could have been avoided (Dhillon [1]).

The condition of safety systems is often not monitored continuously, which leaves room for undetected, non-self-announcing (dormant) failures and for the risk associated with protective system downtime. Instead, inspections are performed to confirm working condition or to diagnose a failure to operate. The importance of inspections dedicated to finding dormant failures of protective devices is illustrated by the following passage from Moubray [2]: "If RCM is correctly applied to almost any modern, complex industrial system, it is not unusual to find that up to 40% of failure modes fall into the hidden category. Furthermore, up to 80% of these failure modes require failure-finding, so up to one third of the tasks generated by comprehensive, correctly applied maintenance strategy development programs are failure finding tasks."

Setting an inspection policy is one of the tools for controlling risk and balancing it against economics. Besides inspection policies, the class of actions considered in this work includes repair and replacement options for the decision maker to choose from. Combined in a single model, these three options can be coordinated under several constraints in order to optimize an objective function of interest, such as availability or maintenance cost.

The optimization of maintenance policies for protective devices has received attention in the literature. Jiang and Jardine [3] develop models to optimize a fixed-interval inspection schedule, and a fixed-interval schedule with a delayed first inspection, with respect to long-run interval availability. Jardine and Tsang [4] propose several models for optimizing replacement decisions, scheduling repairs, and setting inspection policies, which in particular can be applied to protective devices. Pascual et al. [5] address inspection programs for protective systems.
The authors propose a model for optimizing both the frequency of actions and the types of activities to be performed. Unavailability of protective systems is separated into known unavailability (due to maintenance actions) and unknown unavailability (due to undetected failures), and a distinction is made between two types of inspection: partial and full. In the model proposed here, all inspections are perfect, i.e. all failures are identified.

The quality of maintenance actions is an important consideration that should be taken into account when analyzing maintenance policies. Here we agree with Ascher and Feingold [6], who state: "In most probabilistic modeling, it is assumed that an overhaul restores a system to a same-as-new condition. Not only is this not necessarily true, there is evidence that some overhauls reduce reliability." In support, the authors cite case studies in which overhauls ([7], [8], [9]) and inspections ([10]) influence the lifetime distribution characteristics of the observed systems. A theoretical investigation of a model concerned with the degree of repair can be found in Brown and Proschan [11], who consider a probabilistic mechanism for selecting whether a repair is minimal or perfect. Kijima [12] uses the concept of virtual age to describe the degree of repair. Other modeling approaches can be found in the extensive literature review on imperfect maintenance by Pham and Wang [13]. In this work we assume that the $n$th repair changes the device's failure distribution to a specified distribution defined by its reliability function $R_{n+1}(t)$. Perfect repair is a particular case of our model and can be represented by $R_{n+1}(t) = R_1(t)$ for all $n$ and $t$, i.e. every repair restores the original lifetime distribution.

Dynamic programming (DP) is a versatile technique for modeling and solving sequential optimization problems. Early uses of dynamic programming for maintenance decision making can be found in Bellman [14], [15], White [16], and Jardine [17]. Ben-Ari and Gal [18] use DP to obtain an optimal replacement policy for multi-component systems, applying an algorithm that produces an approximate solution to DP problems with many state variables. Flynn et al. [19] present an optimal replacement model that uses stochastic dynamic programming to find the optimal balance between the cost of component replacement and the cost of component failure. One practical advantage of dynamic programming in decision making is that it can provide reliable numerical results where an analytical solution may be impossible to find.

Dynamic Programming Problem Formulation

The model proposed in this work maximizes the expected interval availability of a protective system over a given time interval $[0, T]$ by setting an optimal policy of inspection, repair, and replacement (IRR). Here, interval availability is the ratio of the total uptime of the protective system to the length $T$ of the time interval of interest. The model is subject to the following realistic constraints:

- an inspection (I) is not instantaneous and takes $T_I$ time units;
- a repair (r) takes $T_r$ time units;
- a replacement (R) takes $T_R$ time units;
- a repair changes the device's failure distribution; to maintain generality, we assume that a repair can be performed even if the device is in working condition;
- the lifetime of any device is limited by a maximum operating age $T_{\max}$;
- the maximum number of repairs of a single device is $n_{\max}$.

Another intrinsic assumption is that the protective device is treated as a black box that can be modeled as a single-component system. In some situations this assumption is too restrictive, e.g. when a system can be decomposed into subsystems and redundant components for which failure histories exist; the interested reader is referred to Hauge et al. [20] and Pascual et al. [5]. The above constraints, while making the model more realistic and general, also make the problem harder to formulate and an analytical solution virtually impossible. Therefore, a dynamic programming formulation is proposed in this work for performing numerical analysis and developing an optimal IRR policy.

The DP formulation assumes that at each inspection (starting from installation) decisions are made on:

- the time $t$ to the next inspection;
- the action $A_S$ to perform if the item is found in working (survived) condition, $A_S \in \{\text{Do nothing}, \text{Repair}, \text{Replace}\}$;
- the action $A_F$ to perform if the item is found failed, $A_F \in \{\text{Repair}, \text{Replace}\}$.

The optimal values $t^*$, $A_S^*$, $A_F^*$ are obtained dynamically by maximizing the expected interval availability over $[0, T]$. The process is illustrated by the flowchart in Figure 1.

2.1 Mathematical Formulation

To formalize the logic of the dynamic optimization, we introduce the following notation:

- $t_c$: time since installation of the protective device;
- $t_v$: time since the last repair;
- $n$: number of repairs undergone;
- $A^*(t_v, t_c, T, n)$: interval availability over $[0, T]$ under the optimal IRR policy, given the current conditions $t_v$, $t_c$, $n$;
- $R_{n+1}(t)$: reliability function of the protective device after the $n$th repair, with $f_{n+1}(t)$ the corresponding probability density function.

Figure 1: Model Flowchart

Under the current conditions $t_c$, $t_v$, $T$, $n$, the optimal expected interval availability $A^*(t_v, t_c, T, n)$ is given by the system of recursive equations (1)–(4):

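$$
A^*(t_v, t_c, T, n) =
\begin{cases}
0, & T \le 0, \\
\displaystyle \max_{t \le \min\{T_{\max} - t_c,\; T\}} A_n(t, t_v, t_c, T), & T > 0,
\end{cases}
\tag{1}
$$

$$
\begin{aligned}
A_n(t, t_v, t_c, T) ={}& \frac{R_{n+1}(t_v + t)}{R_{n+1}(t_v)} \cdot \frac{t + G_S(t, t_v, t_c, T)}{T} \\
&+ \frac{R_{n+1}(t_v) - R_{n+1}(t_v + t)}{R_{n+1}(t_v)} \cdot \frac{1}{T} \left( \frac{\int_{t_v}^{t_v + t} z f_{n+1}(z)\, dz}{R_{n+1}(t_v) - R_{n+1}(t_v + t)} - t_v + G_F(t, t_v, t_c, T) \right),
\end{aligned}
\tag{2}
$$

$$
G_S(t, t_v, t_c, T) = \max_{A_S \in \{\text{Do nothing},\, r,\, R\}} \left[ T - t - T_I - T_{A_S} \right] \times
\begin{cases}
A^*(t_v + t,\; t_c + t,\; T - t - T_I,\; n), & \text{(a)} \\
A^*(0,\; t_c + t,\; T - t - T_I - T_r,\; n + 1), & \text{(b)} \\
A^*(0,\; 0,\; T - t - T_I - T_R,\; 0), & \text{(c)}
\end{cases}
\tag{3}
$$

where (a) $A_S = \text{Do nothing}$, $t_c < T_{\max}$; (b) $A_S = r$, $n < n_{\max}$, $t_c < T_{\max}$; (c) $A_S = R$; and

$$
G_F(t, t_v, t_c, T) = \max_{A_F \in \{r,\, R\}} \left[ T - t - T_I - T_{A_F} \right] \times
\begin{cases}
A^*(0,\; t_c + t,\; T - t - T_I - T_r,\; n + 1), & \text{(d)} \\
A^*(0,\; 0,\; T - t - T_I - T_R,\; 0), & \text{(e)}
\end{cases}
\tag{4}
$$

where (d) $A_F = r$, $n < n_{\max}$, $t_c < T_{\max}$; (e) $A_F = R$. Here $T_A$ denotes the duration of action $A$: $T_r$ for a repair, $T_R$ for a replacement, and $0$ for Do nothing.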

Summarizing the above, at each inspection, given the conditions $t_v$, $t_c$, $T$, $n$, the three decision elements $t^*$, $A_S^*$, $A_F^*$ are found as follows:

- $t^*$ satisfies $A_n(t^*, t_v, t_c, T) = \max_{t \le \min\{T_{\max} - t_c,\, T\}} A_n(t, t_v, t_c, T)$;
- $A_S^*$ maximizes the expression defining $G_S(t^*, t_v, t_c, T)$;
- $A_F^*$ maximizes the expression defining $G_F(t^*, t_v, t_c, T)$.

Optimality of the obtained policy within the class of dynamic policies described above can be shown by discretizing time and performing induction on $T$. The limiting behaviour of $A^*(\cdot, \cdot, T, \cdot)$ as $T \to \infty$ can be studied to obtain the long-run expected availability. Note, however, that the dynamic solution of the problem is numerically intensive, and obtaining the long-run behaviour may be challenging.
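The recursion (1)–(4) can be prototyped directly. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: time is discretized into unit steps (the text already appeals to discretization when discussing optimality), the age conditions $t_c < T_{\max}$ in (a), (b) and (d) are checked on the device age after the interval, $t_c + t$, and the reliability functions and durations are those of the Numerical Example. The failure-branch term of (2) is evaluated through the integration-by-parts identity $\int_{t_v}^{t_v+t} z f_{n+1}(z)\,dz - t_v\left[R_{n+1}(t_v) - R_{n+1}(t_v+t)\right] = \int_{t_v}^{t_v+t} R_{n+1}(z)\,dz - t\,R_{n+1}(t_v+t)$, so only $R_{n+1}$ has to be integrated. All function and variable names (`A_star`, `A_n`, `G`, `CUM`, ...) are hypothetical.

```python
import math
from functools import lru_cache

# Parameters of the numerical example; an integer time grid (1 hour) is assumed.
T_I, T_r, T_R = 1, 2, 4        # inspection, repair, replacement durations
T_MAX, N_MAX = 100, 2          # maximum operating age, maximum number of repairs
SUBSTEPS = 20                  # trapezoid sub-steps per time unit


def R(n, t):
    """R_{n+1}(t): reliability after n repairs (R_1 exponential, R_2 = R_3 Weibull)."""
    if n == 0:
        return math.exp(-t / 100.0)
    return math.exp(-((t / 200.0) ** 3))


def _cumulative(n):
    """CUM[n][x] = integral of R_{n+1}(z) over [0, x], x = 0..T_MAX (trapezoid rule)."""
    h, table = 1.0 / SUBSTEPS, [0.0]
    for x in range(T_MAX):
        area = sum(0.5 * h * (R(n, x + k * h) + R(n, x + (k + 1) * h))
                   for k in range(SUBSTEPS))
        table.append(table[-1] + area)
    return table


CUM = [_cumulative(n) for n in range(N_MAX + 1)]


@lru_cache(maxsize=None)
def A_star(tv, tc, T, n):
    """Eq. (1): returns (optimal expected interval availability, optimal t)."""
    if T <= 0:
        return 0.0, None
    best_val, best_t = -1.0, None
    for t in range(1, min(T_MAX - tc, T) + 1):
        val = A_n(t, tv, tc, T, n)
        if val > best_val:
            best_val, best_t = val, t
    return best_val, best_t


def A_n(t, tv, tc, T, n):
    """Eq. (2); the failure-branch uptime uses the integration-by-parts identity."""
    Ra, Rb = R(n, tv), R(n, tv + t)
    # (Ra - Rb) * E[X - tv | tv < X <= tv + t] = int_{tv}^{tv+t} R(z) dz - t * Rb
    fail_uptime = CUM[n][tv + t] - CUM[n][tv] - t * Rb
    G_S = G(t, tv, tc, T, n, failed=False)
    G_F = G(t, tv, tc, T, n, failed=True)
    return (Rb * (t + G_S) + fail_uptime + (Ra - Rb) * G_F) / (Ra * T)


def G(t, tv, tc, T, n, failed):
    """Eqs. (3)-(4): best expected uptime after the inspection at time t."""
    rest, age = T - t - T_I, tc + t
    # (c)/(e) replace: always allowed
    options = [(rest - T_R) * A_star(0, 0, rest - T_R, 0)[0]]
    # (b)/(d) repair: limited by n_max and the age limit (checked on the new age)
    if n < N_MAX and age < T_MAX:
        options.append((rest - T_r) * A_star(0, age, rest - T_r, n + 1)[0])
    # (a) do nothing: only if the device survived and is below the age limit
    if not failed and age < T_MAX:
        options.append(rest * A_star(tv + t, age, rest, n)[0])
    return max(options)


value, t_first = A_star(0, 0, 30, 0)   # new device, short horizon T = 30
```

The call above uses a short horizon ($T = 30$) to keep the run fast; the optimal actions $A_S^*$ and $A_F^*$ could be recovered by recording which option attains the maximum in `G`. For the full example with $T = 500$, the memoized recursion visits many more states and nests much more deeply, so an iterative table over increasing $T$ (or a raised recursion limit) would likely be needed.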

2.2 Numerical Example

To illustrate the proposed algorithm, consider the following set of model parameters (times in hours):

$$
T = 500, \quad T_{\max} = 100, \quad T_I = 1, \quad T_r = 2, \quad T_R = 4, \quad n_{\max} = 2,
$$

$$
R_1(t) = e^{-t/100}, \qquad R_2(t) = R_3(t) = e^{-(t/200)^3}.
$$

Figure 2 shows the expected interval availability $A^*(0, 0, T, 0)$ under the optimal IRR policy obtained with the proposed dynamic algorithm.

Figure 2: Optimal Expected Interval Availability of a New Device (panel "Interval Availability under Optimal IRR Policy": expected interval availability versus length of consideration interval T, in hours)

The numerical solution shows that the optimal IRR policy for $T = 500$ is to perform a repair 8 time units after each new device is installed and then, 85 time units later, to perform a complete replacement, in both cases irrespective of the observed state of the device. The optimal expected interval availability over $[0, T]$ for a new device under this IRR policy is 91.1%, and the long-run expected availability is approximately 91%.

The influence of the model parameters can be seen in Figure 2 and in the optimal policy. For relatively short consideration intervals, the objective function varies considerably, because the durations $T_I$, $T_r$, and $T_R$ are significant relative to the length $T$ of the consideration interval. The structure of the optimal policy reveals that the life characteristics of a repaired device are preferable to those of a new one. Thus, to achieve better performance, it may be advisable to eliminate the major causes of first failures at the beginning of a device's lifetime. In this setting, the maximum operating age limit $T_{\max}$ forces replacements, which reduces availability; extending the maximum allowable lifetime should therefore be considered.
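A quick check of the hazard rates implied by the example's reliability functions is consistent with this interpretation (an added derivation for illustration, not part of the original analysis). Using $h(t) = -R'(t)/R(t)$:

$$
h_1(t) = \frac{1}{100}, \qquad h_2(t) = h_3(t) = \frac{3t^2}{200^3}, \qquad
h_2(t) < h_1(t) \iff t < \sqrt{\frac{200^3}{300}} \approx 163.3 .
$$

Since no device may exceed the age limit $T_{\max} = 100$, a repaired device has a lower hazard rate than a new one over its entire admissible life, which agrees with the optimal policy repairing a new device early.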

Statistical Analysis of Lifetime Data

In practice, the model chosen for mathematical analysis often depends on the available data. For protective devices, these data may consist of past histories of failures and suspensions, i.e. instances when the condition of a device is no longer monitored for whatever reason or reliable information is not available. Such data are called lifetime data. A wide range of methods for the statistical analysis of this type of data is described, for example, in Lawless [21]. In this work we give a brief outline of one approach that can be applied to extract, and subsequently use, the information contained in the lifetime data of protective devices with dormant failures: namely, a parametric approach, illustrated with the Weibull distribution, combined with the Maximum Likelihood method of parameter estimation.

Since protective devices are usually inspected at discrete times, failures are not detected immediately after they occur. Rather, when a device is found failed at an inspection, only the time interval in which the failure took place is known. Such data are classified as interval-censored, which can be considered the most general type of lifetime data. Particular cases of interval-censored data include right-censored and left-censored data, for which the intervals are of the form $[a, \infty)$ and $[0, b]$, respectively.

The two-parameter Weibull distribution is characterized by the so-called shape parameter $\beta$ and scale parameter $\eta$. This flexibility allows it to represent a wide range of data, including exponentially distributed data ($\beta = 1$) and approximately normally distributed data ($\beta \approx 3.44$). More generally, the shape parameter can be adjusted to data with decreasing ($\beta < 1$), constant ($\beta = 1$), and increasing ($\beta > 1$) hazard rates, reflecting infant-mortality, random, and ageing behaviour, respectively. The reliability function of the Weibull distribution is

$$
R(t, \beta, \eta) = e^{-(t/\eta)^{\beta}}
\tag{5}
$$
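For reference, the hazard rate corresponding to (5) follows from $h(t) = -R'(t)/R(t)$ and makes the role of the shape parameter explicit:

$$
h(t, \beta, \eta) = \frac{\beta}{\eta} \left( \frac{t}{\eta} \right)^{\beta - 1},
$$

which decreases in $t$ for $\beta < 1$, is constant and equal to $1/\eta$ for $\beta = 1$, and increases for $\beta > 1$.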

One of the most popular tools for estimating the parameters of an assumed distribution is the Maximum Likelihood method (Lawless [21]). The method searches for the combination of parameters that best describes the available data by maximizing the so-called likelihood function. The resulting estimates are termed Maximum Likelihood Estimators (MLEs) and possess several useful statistical properties.

In general, if a set of interval-censored data $\{[a_i, b_i],\ i = 1, \ldots, n\}$ is available, the likelihood function for the Weibull distribution with parameters $\beta$, $\eta$ can be written as

$$
L(\beta, \eta) = \prod_{i=1}^{n} \left[ R(a_i, \beta, \eta) - R(b_i, \beta, \eta) \right]
$$

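where, in particular, the contribution of left-censored data is $R(0, \beta, \eta) - R(b_i, \beta, \eta) = 1 - R(b_i, \beta, \eta) = F(b_i, \beta, \eta)$ and the contribution of right-censored data is $R(a_i, \beta, \eta) - R(\infty, \beta, \eta) = R(a_i, \beta, \eta)$. Using representation (5), we obtain

$$
L(\beta, \eta) = \prod_{i=1}^{n} \left[ e^{-(a_i/\eta)^{\beta}} - e^{-(b_i/\eta)^{\beta}} \right]
\tag{6}
$$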

Maximizing the likelihood function $L(\beta, \eta)$ in (6), or equivalently $\ln L(\beta, \eta)$, yields the MLEs $\hat{\beta}$, $\hat{\eta}$ of the Weibull parameters, which can be used for further modeling and analysis.
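A minimal sketch of maximizing $\ln L$ in (6) is shown below, assuming each record is an `(a, b)` pair, with `a = 0` for left-censored records and `b = math.inf` for suspensions (right-censored records). In place of a production optimizer (e.g. Nelder-Mead or Newton-Raphson), it uses a simple refining grid search over $(\ln\beta, \ln\eta)$; the search ranges, function names, and the synthetic check data are hypothetical.

```python
import math


def weibull_R(t, beta, eta):
    """Weibull reliability, Eq. (5): R(t) = exp(-(t/eta)**beta), with R(inf) = 0."""
    if math.isinf(t):
        return 0.0
    return math.exp(-((t / eta) ** beta))


def log_likelihood(beta, eta, intervals):
    """ln L(beta, eta) from Eq. (6): sum over records of ln[R(a_i) - R(b_i)]."""
    total = 0.0
    for a, b in intervals:
        p = weibull_R(a, beta, eta) - weibull_R(b, beta, eta)
        if p <= 0.0:                 # probability underflowed: treat as impossible
            return -math.inf
        total += math.log(p)
    return total


def fit_weibull_interval(intervals, beta_range=(0.2, 5.0),
                         eta_range=(0.1, 100.0), grid=40, rounds=5):
    """Approximate MLEs (beta_hat, eta_hat, ln L) by a refining grid search
    over (ln beta, ln eta); a stand-in for a proper numerical optimizer."""
    lo_b, hi_b = (math.log(v) for v in beta_range)
    lo_e, hi_e = (math.log(v) for v in eta_range)
    best = (-math.inf, lo_b, lo_e)   # (ln L, ln beta, ln eta)
    for _ in range(rounds):
        step_b, step_e = (hi_b - lo_b) / grid, (hi_e - lo_e) / grid
        for i in range(grid + 1):
            for j in range(grid + 1):
                lb, le = lo_b + i * step_b, lo_e + j * step_e
                ll = log_likelihood(math.exp(lb), math.exp(le), intervals)
                if ll > best[0]:
                    best = (ll, lb, le)
        # Zoom in: keep +/- 2 grid steps around the current best point.
        _, lb, le = best
        lo_b, hi_b = lb - 2 * step_b, lb + 2 * step_b
        lo_e, hi_e = le - 2 * step_e, le + 2 * step_e
    ll, lb, le = best
    return math.exp(lb), math.exp(le), ll
```

With yearly inspections, for example, a device found failed at the fourth inspection after installation contributes the pair `(3, 4)`, and a device still working at its last inspection in year 10 contributes `(10, math.inf)`.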

Case Study

The following study was performed using data provided by the maintenance department of one of North America's larger mining and refining companies. The company was interested in finding an optimal inspection schedule, with respect to availability, for its fleet of safety pressure valves. At the company's smelter, steam and process pipes are subject to variable pressure loads. Safety valves have been installed to protect the piping and downstream equipment from sudden pressure changes. The main function of a valve is to release pressure when it reaches a critical level; failure to do so may damage downstream equipment or cause a pipe failure, resulting in unplanned downtime and emergency replacement costs. To identify safety valve failures, inspections have been performed on a regular basis, each requiring on average 8 hours of valve downtime.

During each inspection the valve was cleaned and bench-tested at the desired pressure level. If the valve failed, it was disassembled, re-machined, and reinstalled. Such a repair took on average the same time as an inspection, and a replacement also took approximately 8 hours on average. A report was then created and saved in the database. In discussion with the company representative, a set of unacceptable events, or failures, was defined: any recorded action other than a successful bench pressure test was declared a failure. Any history not ending with a failure was considered a suspension. According to this definition, most of the valves had experienced more than one failure and had been subject to subsequent repairs. In total, around 300 events were processed, of which around 25% were suspensions.

The data on safety valves were collected as dated records indicating the event and the action taken. Records were made at each inspection, and inspections were typically performed once a year. Failure data were interpreted as interval-censored data, with each interval extending from the inspection preceding a failure to the inspection at which the failure was identified. Suspension data were treated as right-censored data. The distribution of lifetimes before the first repair was found to be significantly different from the distribution of all subsequent lifetimes. Weibull distributions were fitted to the data using the maximum likelihood method, yielding statistically acceptable fits with parameters $(\beta_1 = 1.03,\ \eta_1 = 15 \text{ years})$ for first lifetimes and $(\beta_2 = 0.87,\ \eta_2 = 4.4 \text{ years})$ for all subsequent lifetimes. The hypothesis that $\beta_1$ and $\beta_2$ do not differ significantly from 1 was tested and not rejected at the 95% confidence level.

During the preliminary study, it was noticed that for the provided data (i.e. $\beta_1 = \beta_2 = 1$, $\eta_1 > \eta_2$, $T_r = T_R$), the decision to replace a device always dominates the decision to repair it, so the optimal policy does not allow repairs. Moreover, owing to the specifics of the data, the influence of the parameters $T_{\max}$ and $n_{\max}$ was minimal, and their values in the model were set to 30 years and 0, respectively. This observation significantly improved the numerical efficiency of the algorithm and simplified the set of available policies. Numerical investigation showed that the optimal IRR policy for a new valve (Figure 3) tends to a fixed-interval inspection scheme with a period of approximately 2 months. In other words, in the long run one should inspect each device every 2 months until it is found failed, at which time it should be replaced.

Figure 3: Optimal Inspection Schedule and Optimal Expected Interval Availability of a New Device (panel "Optimal Inspection Schedule": times of the first to fifth inspections and the end of the consideration period, in days, plotted against the length of the consideration interval T, in days; panel "Interval Availability under Optimal Inspection Schedule": expected interval availability versus length of the consideration interval T, in days)

For the defined model setting with $\beta_1 = 1$ and large $T_{\max}$, it can be shown that, owing to the memoryless property of the exponential distribution, the long-run optimal inspection schedule is fixed-interval periodic, which is supported by the observed results. As an alternative to the dynamic algorithm, model (7) for the optimization of fixed-interval periodic policies, proposed by Jardine and Tsang [4], can be used to derive the long-run optimal inspection schedule in this setting:

$$
A(t) = \frac{t R_1(t) + \int_0^t z f_1(z)\, dz}{t + T_I + T_R \left( 1 - R_1(t) \right)}
\tag{7}
$$

According to Jardine's model (see Figure 4), and in agreement with the developed IRR policy, the long-run optimal expected interval availability is close to 99% and is achieved by performing inspections every 60 days.
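Model (7) is easy to evaluate numerically. The sketch below is a minimal illustration, assuming $\beta_1 = 1$ (as the hypothesis test above allows), $\eta_1 = 15$ years converted with a 365-day year, and $T_I = T_R = 8$ hours; it searches integer inspection intervals from 1 to 365 days. It uses the identity $t R_1(t) + \int_0^t z f_1(z)\,dz = \int_0^t R_1(z)\,dz$ (integration by parts). As a cross-check, for $\beta_1 = 1$ a small-$t/\eta$ expansion of (7) gives $1 - A(t) \approx T_I/t + t/(2\eta) + T_R/\eta$, which is minimized near $t^* \approx \sqrt{2\eta T_I} = \sqrt{2 \cdot 5475 \cdot 1/3} \approx 60$ days. Function and variable names are hypothetical.

```python
import math


def fixed_interval_availability(t, beta, eta, T_I, T_R, steps=500):
    """Eq. (7) with a Weibull R_1; all times in days.

    The numerator t*R_1(t) + int_0^t z f_1(z) dz equals int_0^t R_1(z) dz
    (integration by parts); it is evaluated with the trapezoid rule.
    """
    def R1(z):
        return math.exp(-((z / eta) ** beta))

    h = t / steps
    uptime = sum(0.5 * h * (R1(k * h) + R1((k + 1) * h)) for k in range(steps))
    return uptime / (t + T_I + T_R * (1.0 - R1(t)))


def best_fixed_interval(beta, eta, T_I, T_R, candidates):
    """Return (t*, A(t*)) over a collection of candidate inspection intervals."""
    return max(((t, fixed_interval_availability(t, beta, eta, T_I, T_R))
                for t in candidates), key=lambda pair: pair[1])


# Assumed case-study inputs: beta_1 = 1, eta_1 = 15 years of 365 days,
# 8-hour inspections and replacements.
ETA_DAYS = 15 * 365.0
T_I = T_R = 8.0 / 24.0
t_best, avail_best = best_fixed_interval(1.0, ETA_DAYS, T_I, T_R, range(1, 366))
```

Under these assumptions, the search lands near 60 days with an availability just below 0.989, in line with Figure 4.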

Figure 4: Expected Interval Availability A(t) as a Function of Inspection Interval t (panel "Optimal Availability for Fixed-Interval Policies": expected interval availability versus inspection interval, in days)

Figure 3 also shows that the short-term optimal inspection schedule is fixed-interval periodic, with inspection intervals of the form $\{T, T/2, T/3, \ldots\}$. This is consistent with the result obtained by Boland and Proschan [22] for optimizing the cost of replacement and repair policies over a finite time interval $[0, T]$. A slight modification of the cost model proposed in their work can be used to show that, in the defined setting, the optimal interval availability is achieved by a regular inspection schedule with intervals of the form $\{T, T/2, T/3, \ldots\}$.

Conclusion

Setting an inspection, repair, and replacement policy for a protective device plays an important role in keeping the system's safety at an optimal level. The model proposed in this work takes advantage of the robustness of the DP approach and allows IRR policies for protective devices to be optimized, under the specified conditions, with respect to the expected interval availability over a finite time interval. The numerical results obtained in the case study are shown to be consistent with previously developed models for the optimization of maintenance decisions.

The numerical difficulty of large-scale problems is one of the disadvantages of the proposed DP model; when appropriate, specifics of the data can be used to improve the efficiency of the algorithm. Further work on deriving and characterizing the types of optimal finite-interval and long-run policies under the proposed constraints is encouraged.

Acknowledgements
We wish to thank the Natural Sciences and Engineering Research Council of Canada and the Ontario Centres of Excellence for their continuous support. Appreciation is also expressed to the participants in the Condition-Based Maintenance Consortium for their effort in collaboration and for the data and funding provided for research. Rodrigo Pascual would like to acknowledge the financial support of Material and Manufacturing Ontario and the member companies of the Condition-Based Maintenance (CBM) Consortium, which allowed his research visit to the CBM Lab of the University of Toronto during 2006.

References
[1] Dhillon, B.S. (2003) Engineering safety, World Scientific.
[2] Moubray, J. (1997) Reliability-Centered Maintenance, 2nd Edition, Butterworth-Heinemann.
[3] Jiang, R. and Jardine, A.K.S. (2005) Two Optimization Models of the Optimum Inspection Problem, Journal of the Operational Research Society, 56: 1176–1183.
[4] Jardine, A.K.S., Tsang, A.H.S. (2005) Maintenance, replacement, and reliability: theory and applications, Boca Raton, FL: CRC Press.
[5] Pascual, R., Louit, D., Jardine, A.K.S. Optimal inspection intervals for safety systems with partial inspections, (in revision).
[6] Ascher, H.E., Feingold, H. (1984) Repairable Systems Reliability: Modelling, Inference, Misconceptions and Their Causes, New York, NY: Marcel Dekker.
[7] Lavalee, W. (1974) Aircraft Periodic Depot Level Maintenance Study, Center for Naval Analyses, Arlington, Virginia, Rep. No. CNS 1025.
[8] Lavalee, W. (1975) Aircraft Engine Maintenance Study, Center for Naval Analyses, Arlington, Virginia, Rep. No. CNS 1060.
[9] Lomnicki, Z.A. (1973) Some Aspects of the Statistical Approach to Reliability, J. Roy. Stat. Soc., Ser. A, 133: 395–419.
[10] Kamins, M. (1973) Quick Fix: Reducing Aircraft Inspection Redundancy Between Base and Depot, Rand Corp., Santa Monica, R-1177PR.
[11] Brown, M., Proschan, F. (1983) Imperfect repair, Journal of Applied Probability, 20: 851–859.
[12] Kijima, M. (1989) Some results for repairable systems with general repair, Journal of Applied Probability, 26: 89–102.
[13] Pham, H., Wang, H. (1996) Imperfect maintenance, European Journal of Operational Research, 94: 425–438.
[14] Bellman, R. (1955) Equipment replacement policy, Journal of the Society for the Industrial Applications of Mathematics, 3: 133–136.
[15] Bellman, R. (1957) Dynamic Programming, Princeton, NJ: Princeton University Press.
[16] White, D.J. (1969) Dynamic Programming, Oliver and Boyd/Holden-Day.
[17] Jardine, A.K.S. (1973) Maintenance, replacement, and reliability, Pitman Publishing.
[18] Ben-Ari, Y., Gal, S. (1986) Optimal replacement policy for multicomponent systems: an application to a dairy herd, European Journal of Operational Research, 23: 213–221.
[19] Flynn, J., Chung, C., and Chaing, D. (1988) Optimal replacement policies for a multicomponent reliability system, Operations Research Letters, 7: 167–172.
[20] Hauge, S., Hokstad, P., Langseth, H., Øien, K. (2006) Reliability Prediction Method for Safety Instrumented Systems, PDS Method Handbook, 2006 Edition, SINTEF Report STF50 A06031, SINTEF, Trondheim, Norway.
[21] Lawless, J.F. (2003) Statistical Models and Methods for Lifetime Data, Second edition, New Jersey: John Wiley & Sons, Inc.
[22] Boland, P.J., Proschan, F. (1982) Periodic Replacement with Increasing Minimal Repair Costs at Failure, Operations Research, 30: 1183–1189.
