
Reliability of Standby Systems

Salvatore Distefano
Dipartimento di Matematica, Università di Messina, C.da Di Dio, 98166 Messina, Italy, sdistefano@unime.it
Dipartimento di Elettronica e Informazione, Politecnico di Milano, Via Ponzio 34/5, 20133 Milano, Italy, distefano@elet.polimi.it

Abstract. Reliability theory is based on the concept of Boolean components, i.e. components that are either up (operating) or down (failed). However, this assumption is often inadequate for modeling specific behaviors of components, units, subsystems and systems. It cannot capture, for example, the different operating conditions of components caused by dependencies on other components or by environmental variations. The aim of this paper is to investigate a specific dynamic behavior, the standby phenomenon in reliability contexts, starting from its characterization from both internal and external perspectives. The problem is formally specified through dynamic reliability theory, which provides its analytical formulation.
Keywords: Standby, redundancy, standby redundant systems, k-out-of-n standby redundancy.

1 Introduction and Motivations

Standby is a hot topic in reliability, as also highlighted in the literature. Several techniques have been used specifically to evaluate the reliability of standby systems. For example, in [1] and [2] renewal theory and semi-Markov models are exploited to evaluate specific case studies such as three-state systems, systems with mixed constant repair time, systems with multi-phase repair, systems with non-regenerative states, two-component systems with cold standby and maintenance, and so on. The method of supplementary variables and the Laplace transform are instead used in [3,4] to evaluate the stationary availability of n-unit parallel redundant systems with correlated failures and a single repair facility. However, to the best of our knowledge, the specific literature only partially addresses, or overlooks, some aspects that arise when the problem is evaluated from different perspectives. In particular, the concept of standby is always conflated with that of redundancy. This can lead to misunderstandings, or even to approximations that may turn out to be wrong and dangerous in system design. The main aim of this paper is to fill these gaps, focusing on standby and investigating in depth the related phenomena in reliability contexts from both the internal and the external viewpoints. In the former case, the unit is observed in isolation, without taking into account its interactions with the external environment, in order to characterize and evaluate its internal behaviour. Then, the unit behaviour is also evaluated considering such interactions, in a larger system context. With the support of dynamic reliability theory, the characterization thus specified is formalized in terms of specific equations, starting from the conservation of reliability principle.

In order to achieve these goals, the remainder of the paper is organized as follows: section 2 provides background on the standby behaviour and related concepts; section 3 characterizes standby from both the internal and the external points of view, also introducing standby redundant systems; section 4 specifies the standby behaviour in analytical terms. Finally, section 5 summarizes and closes the paper.

D.-S. Huang et al. (Eds.): ICIC 2011, LNBI 6840, pp. 267–275, 2012. © Springer-Verlag Berlin Heidelberg 2012

2 Preliminary Concepts

Standby systems are characterized by a dual operating mode: active and sleep. In the active mode a standby system is fully operating and able to provide its services. In the sleep mode, instead, the standby system provides no services until a specific external call, signal or input switches it from the sleep to the active mode. Standby systems are widely used in modern technologies owing to their ability to optimize costs, reduce the environmental impact, optimize system reliability and availability, adequately manage redundant resources, and so on. In technological contexts, the concept of standby usually refers to the energy or power consumption of the system. In fact, standby devices such as standby generators, standby batteries and standby power systems are increasingly used in designs, projects, schemes, data sheets and technical notes. As a confirmation of this, several technical glossaries now include standby and related terms, such as sleep mode or standby mode, as in [10]: "a mode in which electronic appliances are turned off but under power and ready to activate on command". The attention attracted by standby and related issues has consolidated a research trend on the topic, especially in recent times, in which sensitivity to environmental, pollution and energy-related problems has grown strongly, giving rise to many government [7,8,9,11] and non-government [5,6,12] initiatives and projects. This has affected the design approach, which now favours low-power devices managed through standby policies, with the aim of reducing power consumption and optimizing costs, performance and reliability. In the ICT context, Green computing [16] was born to achieve such goals.

A good definition of standby is proposed in [11], in particular since it highlights the relationship between standby and energy/power, thus characterizing the active and sleep modes in terms of the load applied to the standby unit: in the former case a full load is applied, while sleep modes are characterized by partial or phantom loads. According to this viewpoint, hot and warm standby represent a fully and a partially powered sleep mode, respectively, while in the cold sleep mode the system is not powered.

3 Standby Characterization

Standby, in reliability contexts, is usually considered as a specific redundancy policy. However, as discussed in section 1, it can be interpreted as a more general and complex concept that has to be investigated at a higher level of abstraction, separately from redundancy. With this aim, in the following the standby behaviour is studied in depth from two different points of view: the internal one, observing the effects of standby from inside the unit, and the external/system one, taking into account the interactions from a system reliability viewpoint, thus considering the standby unit as a component.

3.1 Inside a Standby Unit

The goal of this section is to observe a standby unit from the inside, in order to identify the effects of standby on the unit and to characterize them as a specific state of the standby unit state machine. According to (static) system reliability theory, a component/unit can assume two states: up and down. A unit is therefore Boolean, i.e. it can only be either operating or failed, respectively. This classification cannot adequately represent standby, since the sleep mode cannot be clearly identified as either an up or a down state. It is thus necessary to revise the classification by taking the standby behaviour into account. A good starting point is the set of definitions provided in [13]:
Up - Pertaining to a system or component that is operable and in service. It can be:
  Operating - Pertaining to a system or component that is operable, in service, and in use.
  Idle - Pertaining to a system or component that is operable and in service, but not in use.
Down - Pertaining to a system or component that is not operable or has been taken out of service.
In this way, three features characterizing the states of a system can be identified: operable, if the system is ready for use, for example if it is physically intact; serviceable, if the system is ready to provide its service to the environment; in use, if the system is performing its service. The serviceable property mainly concerns the interaction between the unit and the external environment, and is therefore better considered and evaluated in the next subsection.


Table 1. Standby unit state machine

State     | Operable | Serviceable | In use
operating | Yes      | Yes         | Yes
idle      | Yes      | Yes         | No
dormant   | Yes      | No          | No
failed    | No       | No          | No

From this classification it is possible to obtain the four meaningful states reported in Table 1. In this way, the classification of [13] into operating, idle and failed states is enriched by a new state, the dormant one. The latter describes a condition in which an operable unit is not in service due to a particular condition or constraint applied to it, for example an external input switching the unit into the sleep mode, as occurs in a standby unit.
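As a minimal sketch, Table 1 can be encoded as a lookup from the three features of [13] to a state. The `is_up` helper anticipates the up/down labelling discussed in this section: from the inside, idle and dormant are both up, while from the system perspective of section 3.2 the dormant state is down. The enum, the dictionary and both function names are illustrative assumptions, not part of the paper.

```python
from enum import Enum


class State(Enum):
    OPERATING = "operating"
    IDLE = "idle"
    DORMANT = "dormant"
    FAILED = "failed"


# Table 1: (operable, serviceable, in_use) -> state.
# Feature combinations not listed in the table are treated as invalid.
TABLE_1 = {
    (True, True, True): State.OPERATING,
    (True, True, False): State.IDLE,
    (True, False, False): State.DORMANT,
    (False, False, False): State.FAILED,
}


def classify(operable: bool, serviceable: bool, in_use: bool) -> State:
    """Map the three features onto one of the four states of Table 1."""
    try:
        return TABLE_1[(operable, serviceable, in_use)]
    except KeyError:
        raise ValueError("combination not meaningful according to Table 1") from None


def is_up(state: State, external: bool) -> bool:
    """Up/down labelling of a state.

    Internal view (external=False): serviceability is ignored, so idle and
    dormant are both up (they merge into the standby state of Fig. 1).
    External view (external=True): a dormant unit is out of service, hence down.
    """
    if state is State.FAILED:
        return False
    if state is State.DORMANT:
        return not external
    return True


print(classify(True, False, False))          # State.DORMANT
print(is_up(State.DORMANT, external=False))  # True
print(is_up(State.DORMANT, external=True))   # False
```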

Fig. 1. State machine of a standby unit from inside (states: Operating, Standby, Failed; transitions: resume, sleep, failure, repair)
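The Fig. 1 state machine can be sketched as a transition function over the four events (failure, resume, sleep, repair) discussed in this subsection. Following that discussion, failures are allowed from both the operating and the standby state, except in cold standby, and a repair may lead either to the operating or to the standby state; hot standby is not modelled separately because, from the inside, it is indistinguishable from the operating state. The function name and the `cold` and `repair_to` parameters are hypothetical.

```python
# Internal (Fig. 1) state machine of a standby unit: a hypothetical sketch.
OPERATING, STANDBY, FAILED = "Operating", "Standby", "Failed"


def next_state(state: str, event: str, cold: bool = False,
               repair_to: str = OPERATING) -> str:
    """Return the state reached from `state` when `event` occurs.

    cold      -- True for cold standby: no failure event from STANDBY.
    repair_to -- target of a repair (OPERATING or STANDBY).
    Raises ValueError for events that are not enabled in `state`.
    """
    if event == "resume" and state == STANDBY:
        return OPERATING
    if event == "sleep" and state == OPERATING:
        return STANDBY
    if event == "failure" and state == OPERATING:
        return FAILED
    if event == "failure" and state == STANDBY and not cold:
        return FAILED
    if event == "repair" and state == FAILED:
        if repair_to not in (OPERATING, STANDBY):
            raise ValueError("repair must lead to Operating or Standby")
        return repair_to
    raise ValueError(f"event {event!r} not enabled in state {state!r}")


# Example: one sleep-active cycle, then a failure while operating and a repair.
s = OPERATING
for ev in ("sleep", "resume", "failure", "repair"):
    s = next_state(s, ev)
print(s)  # Operating
```

With `cold=True`, a `failure` event in the standby state raises `ValueError`, mirroring the statement that no failure event can be specified from a cold standby state.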

From an internal viewpoint, the dormant state cannot be distinguished from the idle one since, as discussed above, the serviceable property is not taken into account from the inside. Therefore, the dormant state has to be considered as an up state. As a consequence, from the internal perspective, the idle and dormant states can be merged into a single standby state, as shown in Fig. 1. Even though this standby state is an up state like the operating one, it differs from the latter since, as introduced in section 2, it is characterized by a different (phantom) load that can significantly affect its behaviour. This distinction is particularly meaningful for cold and warm standby, whereas it is meaningless for hot standby, since the load characterizing the hot standby state is the same as that characterizing the operating state. Therefore, from the internal point of view, the hot standby state is indistinguishable from the operating state and can be considered an operating state.

The resulting dynamics of the standby unit are regulated by four main events: failure, resume, sleep and repair. The failure event leads to the failed state. Since, in general, a unit can fail both in the operating and in the standby state, failures from both states are possible. The only exception is cold standby, in which the unit cannot fail, and therefore no failure event can be specified from the standby state. On the other hand, the repair event switches the unit from the failed state to the operating or the standby state. The resume event represents transitions from the standby to the operating state, while the sleep event represents the reverse transitions, thus implementing the sleep-active cycles of a standby unit.

3.2 The System Perspective

Having characterized the internal behaviour of a standby unit, the focus now moves to the system, observing the unit from the outside. The standby unit is now considered as part of the system, thus taking into account its relationships with the other components and with the external environment. In this way, standby can be characterized as a dynamic-dependent behaviour involving two parts: the driver/trigger side that drives the standby, and the standby unit that reacts to the inputs coming from the driver. From the system reliability point of view, the characterization discussed above and synthesized in Table 1 has to be revised accordingly. First of all, it is necessary to take into account the serviceable property identified above and so far neglected, since it is strictly related to the external viewpoint. This means that the state characterization specified in Fig. 1 has to be modified and, in particular, that it is no longer possible to merge the idle and dormant states into a single standby state. This point requires further explanation. Since the dormant state represents the condition in which the standby unit depends on an external event able to switch it into service, it can no longer be considered an up state as above; more properly, it has to be evaluated as a down state, out of service. In this way, from a system reliability perspective, both the dormant and the failed states are identified as down states. There is, however, an important difference between them: in case of failure, an external time-consuming action, such as a replacement or a repair, is required to restore the operating conditions of the unit; in the dormant state, instead, the standby unit is not physically failed and waits for a driver input that can immediately switch it into service. This further justifies the fact that the single-standby-state characterization of Fig. 1 does not adequately represent the behaviour of the standby component.
It therefore becomes necessary to split the single standby state into two different states corresponding to the idle and the dormant modes, as shown in Fig. 2. In this way, a unit can transition to the dormant state after a repair, or from either the operating or the idle state (the disable event); vice versa, a unit in the dormant state can be enabled by transitioning to either the idle or the operating state (the enable event), or it can fail. Thus, according to the standby unit characterization of Table 1, from an external point of view the operating and idle states are up states, while the dormant and failed states are down states.

Fig. 2. State machine of a standby unit from outside (states: Operating, Idle, Dormant, Failed; transition labels: resume, pause, sleep, enable, disable, failure, repair)
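Fig. 2 can likewise be sketched as a relation from (state, event) pairs to the set of states each event may lead to, so that both targets of the enable event are representable. The enable, disable and failure transitions follow the text above. Two points are assumptions: Operating and Idle are connected here by resume and pause (the figure also carries a sleep label whose exact placement cannot be recovered), and a repair may lead to Operating, Idle or Dormant, by analogy with the two repair targets of Fig. 1. All names are hypothetical.

```python
from typing import Iterable, Tuple

# External view of Fig. 2 (hypothetical encoding).
UP_STATES = {"Operating", "Idle"}
DOWN_STATES = {"Dormant", "Failed"}

# (state, event) -> set of states the event may lead to.
TRANSITIONS = {
    ("Operating", "pause"): {"Idle"},                         # assumed label
    ("Idle", "resume"): {"Operating"},
    ("Operating", "disable"): {"Dormant"},
    ("Idle", "disable"): {"Dormant"},
    ("Dormant", "enable"): {"Idle", "Operating"},
    ("Operating", "failure"): {"Failed"},
    ("Idle", "failure"): {"Failed"},
    ("Dormant", "failure"): {"Failed"},
    ("Failed", "repair"): {"Operating", "Idle", "Dormant"},   # assumed targets
}


def is_valid_trace(start: str, steps: Iterable[Tuple[str, str]]) -> bool:
    """Return True if each (event, target) step is allowed from the current state."""
    state = start
    for event, target in steps:
        if target not in TRANSITIONS.get((state, event), set()):
            return False
        state = target
    return True


def is_up(state: str) -> bool:
    """System viewpoint: only the operating and idle states are up."""
    return state in UP_STATES


trace = [("disable", "Dormant"), ("enable", "Operating"),
         ("failure", "Failed"), ("repair", "Dormant")]
print(is_valid_trace("Idle", trace))   # True
print([is_up(s) for _, s in trace])    # [False, True, False, False]
```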

4 Formal Aspects

In section 2, starting from [11], a classification into hot, warm and cold standby is introduced, while section 3 proposes a characterization of the standby behaviour in terms of specific states. In order to translate this characterization quantitatively into reliability terms, we can rely on the heuristic rule, introduced above, that relates the load applied to a generic standby system (subsystem or unit) to its (un)reliability. According to this rule, a greater load applied to the system corresponds to a lower reliability or, equivalently, a higher unreliability. This is because, under a greater load, the system performs more work and, consequently, its reliability decreases more quickly, i.e. its failure rate increases. The heuristic holds particularly well when the standby system is subjected to a phantom load, as in the cold standby case: since the unit does not work in standby, it cannot fail on its own, but only if external causes arise. Following this reasoning, it is possible to characterize the reliability of the standby state on the basis of the relationship between reliability and load. From a high-level point of view, the standby, idle and dormant states of a standby system (more generally, its standby states), as specified in section 3, can be further characterized as [14,15]: Cold - if the unit cannot fail autonomously; Warm - if the unit can fail autonomously, but with a lower failure rate, or a stochastically greater reliability distribution, than in the operating state; Hot - if the unit can fail autonomously as in the operating state, with the same failure rate or reliability distribution. A unit in cold standby is (intrinsically) reliable during its sleep mode, whereas in warm and hot standby the unit can also fail from the sleep mode.
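Under the additional assumption of constant failure rates (exponential lifetimes), which the paper does not require, the three cases reduce to comparing the standby failure rate with the operating one. A minimal sketch, with hypothetical function and parameter names:

```python
def standby_kind(rate_operating: float, rate_standby: float,
                 tol: float = 1e-12) -> str:
    """Classify a standby state as cold, warm or hot from constant failure rates.

    Assumes exponential lifetimes, rate_operating > 0, and
    0 <= rate_standby <= rate_operating (lighter load, lower failure rate).
    """
    if rate_operating <= 0:
        raise ValueError("operating failure rate must be positive")
    if not 0 <= rate_standby <= rate_operating + tol:
        raise ValueError("standby rate must lie in [0, operating rate]")
    if rate_standby <= tol:
        return "cold"   # cannot fail autonomously while in standby
    if abs(rate_standby - rate_operating) <= tol:
        return "hot"    # same failure rate as in the operating state
    return "warm"       # can fail, but at a lower rate


for lam_s in (0.0, 0.25, 1.0):
    print(lam_s, standby_kind(1.0, lam_s))
# 0.0 cold
# 0.25 warm
# 1.0 hot
```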
From the system point of view, as discussed in section 3.2, a standby state can be either an up state, if it is classified as idle, or a down state, if it is identified as dormant. However, in general, a standby state, whether idle or dormant, can be characterized by its own reliability function, which quantifies the impact of standby on the system. From the probability theory point of view, a standby system SS can be characterized by at least two reliability functions, $R^O_{SS}$ and $R^S_{SS}$. The former models the behaviour of the unit in the fully operating mode, while the latter characterizes the standby/sleep mode. Assuming that the unit is fully operating at time $t = 0$ and switches to the sleep mode at time $t = x > 0$, the standby system reliability function $R_{SS}(t, x)$ can be specified as follows:

$$
R_{SS}(t, x) =
\begin{cases}
R^O_{SS}(t) & t \le x \\
R^S_{SS}(t) & t > x
\end{cases}
\tag{1}
$$

where x is a realization of the random variable X associated with the trigger event that drives the standby. Thus, following the classification into cold, warm and hot standby, cold and warm standby can be more formally specified by eq. (1), in which at the change point x the reliability function changes from $R^O_{SS}(t)$ to $R^S_{SS}(t)$, with $R^O_{SS}(t) \le R^S_{SS}(t)$ for all $t \ge 0$, while in the hot standby case $R^O_{SS}(t) = R^S_{SS}(t)$. As stated above, the active-sleep cycles of a standby system are triggered by an external event; in probability terms, the standby system reliability $R_{SS}$ depends on two random variables, as in eq. (1): the standby unit lifetime T and the trigger event X. Assuming that $R^O_{SS}(t)$ and $R^S_{SS}(t)$, or equivalently the corresponding unreliability CDFs $F^O_{SS}(t)$ and $F^S_{SS}(t)$, are known, together with the distribution $F_X(x)$ of the conditioning event X, the aim is to obtain the reliability $R_{SS}(t)$ of the standby system of eq. (1), removing its dependency on x. Thus, by the theorem of total probability, $F_{SS}(t) = 1 - R_{SS}(t)$ can be obtained as follows:

$$
\begin{aligned}
F_{SS}(t) &= \int_0^{+\infty} \Pr(T \le t \mid X = x)\, f_X(x)\, dx \\
&= \int_0^{t} \Pr(T \le t \mid X = x)\, f_X(x)\, dx + \int_t^{+\infty} \Pr(T \le t \mid X = x)\, f_X(x)\, dx \\
&= \int_0^{t} \bigl(1 - \Pr(T > t \mid X = x)\bigr)\, f_X(x)\, dx + F^O_{SS}(t)\, \bigl[F_X(x)\bigr]_t^{+\infty}
\end{aligned}
\tag{2}
$$

where $F^O_{SS}(t) = 1 - R^O_{SS}(t)$, $F^S_{SS}(t) = 1 - R^S_{SS}(t)$ and $f_X(x) = F'_X(x)$. The last term of eq. (2) follows because, for $x > t$, the unit is still operating at time t, so that $\Pr(T \le t \mid X = x) = F^O_{SS}(t)$, and $[F_X(x)]_t^{+\infty} = 1 - F_X(t)$. In order to evaluate $\Pr(T > t \mid X = x)$, it can be observed that the dependent component follows $F^O_{SS}(t)$ for $t \le x$ and then, at $t = x$, switches into the sleep mode state characterized by $F^S_{SS}(t)$. It is therefore necessary to understand what happens at the change point x. Starting from the conservation of reliability principle [17], also known as the Markov additive property [18], the effect of the switch between the two distributions can be quantified in terms of time through $\tau$, thus obtaining the equivalent time $\tau$ such that, at the change point x:

$$
R^O_{SS}(x) = R^S_{SS}(x + \tau) \quad \Longrightarrow \quad \tau = \bigl(R^S_{SS}\bigr)^{-1}\bigl(R^O_{SS}(x)\bigr) - x
\tag{3}
$$

assuming that $R^S_{SS}(\cdot)$ is strictly decreasing and therefore invertible. In this way,

$$
\Pr(T > t \mid X = x) = R^S_{SS}(t + \tau) = R^S_{SS}\Bigl(t + \bigl(R^S_{SS}\bigr)^{-1}\bigl(R^O_{SS}(x)\bigr) - x\Bigr)
$$

since $x \le t$; substituting it into eq. (2) yields:

$$
F_{SS}(t) = F^O_{SS}(t)\bigl(1 - F_X(t)\bigr) + \int_0^t F^S_{SS}\Bigl(t + \bigl(R^S_{SS}\bigr)^{-1}\bigl(R^O_{SS}(x)\bigr) - x\Bigr) f_X(x)\, dx
\tag{4}
$$

Eq. (4) thus quantifies the unreliability of a standby system that switches from the operating to the sleep mode when triggered by an external event, stochastically represented by X.
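To illustrate how eq. (4) can be evaluated in practice, the following sketch computes $F_{SS}(t)$ by composite Simpson quadrature, obtaining $\tau$ from eq. (3) through a bisection inverse of $R^S_{SS}$. The exponential distributions, the rates and all function names are illustrative assumptions, not part of the paper. Exponentials are chosen because eq. (3) then gives $\tau = (\lambda_O/\lambda_S - 1)\,x$, so that $R^S_{SS}(t + \tau) = e^{-\lambda_O x - \lambda_S (t - x)}$; with $X$ exponential with rate $\mu$, eq. (4) integrates to $F_{SS}(t) = 1 - e^{-(\lambda_O + \mu)t} - \mu e^{-\lambda_S t}\bigl(e^{(\lambda_S - \lambda_O - \mu)t} - 1\bigr)/(\lambda_S - \lambda_O - \mu)$, which serves as a check.

```python
import math
from typing import Callable

Fn = Callable[[float], float]


def inverse_decreasing(f: Fn, p: float, hi: float = 1.0, iters: int = 100) -> float:
    """Solve f(y) = p by bisection, for f strictly decreasing with f(0) = 1, 0 < p <= 1."""
    lo = 0.0
    while f(hi) > p:                       # widen the bracket until f(hi) <= p
        hi *= 2.0
        if hi > 1e12:
            raise ValueError("f does not reach p: is it strictly decreasing?")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simpson(g: Fn, a: float, b: float, n: int = 1000) -> float:
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    s += 4.0 * sum(g(a + i * h) for i in range(1, n, 2))
    s += 2.0 * sum(g(a + i * h) for i in range(2, n, 2))
    return s * h / 3.0


def unreliability_eq4(t: float, r_op: Fn, r_sb: Fn, cdf_x: Fn, pdf_x: Fn) -> float:
    """F_SS(t) of eq. (4); tau comes from eq. (3) via a numerical inverse of r_sb."""
    def integrand(x: float) -> float:
        tau = inverse_decreasing(r_sb, r_op(x)) - x   # eq. (3)
        return (1.0 - r_sb(t + tau)) * pdf_x(x)       # F^S_SS(t + tau) * f_X(x)

    return (1.0 - r_op(t)) * (1.0 - cdf_x(t)) + simpson(integrand, 0.0, t)


# Hypothetical warm-standby example: exponential phases and exponential trigger X.
lam_o, lam_s, mu = 1.0, 0.2, 0.5
r_op = lambda u: math.exp(-lam_o * u)       # R^O_SS
r_sb = lambda u: math.exp(-lam_s * u)       # R^S_SS
cdf_x = lambda u: 1.0 - math.exp(-mu * u)   # F_X
pdf_x = lambda u: mu * math.exp(-mu * u)    # f_X


def closed_form(t: float) -> float:
    """Analytical F_SS(t) for the exponential case (requires lam_s - lam_o - mu != 0)."""
    a = lam_s - lam_o - mu
    r = math.exp(-(lam_o + mu) * t) + mu * math.exp(-lam_s * t) * (math.exp(a * t) - 1.0) / a
    return 1.0 - r


t = 2.0
print(f"eq. (4): {unreliability_eq4(t, r_op, r_sb, cdf_x, pdf_x):.6f}  "
      f"closed form: {closed_form(t):.6f}")
```

With these hypothetical rates the two printed values should coincide up to quadrature and bisection error. Passing `r_op` as `r_sb` (hot standby) makes $\tau = 0$, and eq. (4) collapses to $F^O_{SS}(t)$ regardless of X. Cold standby ($R^S_{SS} \equiv 1$) is not strictly decreasing, so eq. (3) does not apply as written and `inverse_decreasing` raises `ValueError`.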


5 Conclusions

Standby systems are of strategic importance in current technologies, being a way to reduce environmental impact and costs by extending the lifetime of systems. Focusing on reliability, this paper studies the standby behaviour in depth from different, complementary perspectives: the intrinsic one, investigating a standby unit from the inside, and the external/operational one, considering reliability interactions and dynamics among the standby components of a system. Starting from this characterization, the behaviour of a generic standby system is analytically investigated, providing the corresponding formal relationships and equations. Moreover, standby redundancy is formally evaluated, first considering a 2-unit standby redundant system.

References
1. Limnios, N., Oprisan, G.: Semi-Markov Processes and Reliability. Statistics for Industry and Technology. Birkhäuser, Boston (2001)
2. Janssen, J., Manca, R.: Semi-Markov Risk Models for Finance, Insurance and Reliability. Springer, Heidelberg (2007)
3. Itoi, T., Nishida, T., Kodama, M., Ohi, F.: N-unit Parallel Redundant System with Correlated Failure and Single Repair Facility. Microelectronics Reliability 17(2), 279–285 (1978)
4. Nikolov, A.V.: N-unit Parallel Redundant System with Multiple Correlated Failures. Microelectronics and Reliability 26(1), 31–34 (1986)
5. International Energy Agency (IEA): IEA Standby Power Initiative. Task Force 1: Definitions and Terminology of Standby Power, November 17-18, Washington, USA (1999)
6. International Electrotechnical Commission (IEC): IEC 62301 standard: Household electrical appliances - Measurement of standby power. Edition 2.0
7. Australian Government, Department of Environment, Water, Heritage and the Arts: Australian standby power program (September 2009), http://www.energyrating.gov.au/standby.html
8. U.S. Environmental Protection Agency and U.S. Department of Energy: ENERGY STAR program
9. The European Commission: The Directive 2005/32/EC on the Eco-Design of Energy-using Products (EuP)
10. Meier, A.: Standby Power Use - Definitions and Terminology. In: First Workshop on Reducing Standby Losses, Paris, France (January 1999)
11. Alliance for Telecommunications Industry Solutions (ATIS): American National Standard ATIS Telecom Glossary (2007)
12. Institute of Electrical and Electronics Engineers (IEEE): IEEE Std 446-1995 - IEEE Recommended Practice for Emergency and Standby Power Systems for Industrial and Commercial Applications
13. Institute of Electrical and Electronics Engineers (IEEE): IEEE 610-1991 - IEEE Standard Computer Dictionary. A Compilation of IEEE Standard Computer Glossaries (1991) ISBN: 1559370793
14. Dugan, J.B., Bavuso, S., Boyd, M.: Dynamic Fault Tree Models for Fault-Tolerant Computer Systems. IEEE Trans. Reliability 41(3), 363–377 (1992)
15. Distefano, S., Puliafito, A.: Dependability evaluation with dynamic reliability block diagrams and dynamic fault trees. IEEE Transactions on Dependable and Secure Computing 6(1), 4–17 (2009)
16. Murugesan, S.: Harnessing Green IT: Principles and Practices. IT Professional 10(1), 24–33 (2008), doi:10.1109/MITP.2008.10
17. Kececioglu, D.: Reliability Engineering Handbook, vol. 1 & 2. DEStech Publications (1991) ISBN Volume 1: 1932078002, ISBN Volume 2: 1932078010
18. Finkelstein, M.S.: Wearing-out of components in a variable environment. Reliability Engineering & System Safety 66(3), 235–242 (1999)
