
DP 471: ELECTRICAL SAFETY AND MAINTENANCE

LESSON 3: RELIABILITY ANALYSIS


3. Reliability theory
Maintenance engineers and managers should understand the concept of reliability and its various
implications for maintenance activities. Despite the increasing sophistication of the equipment used
in both the manufacturing and service sectors, it has been observed that many of these systems are
down most of the time, require extensive maintenance and consume a lot of financial and human
resources for their upkeep. This makes ensuring adequate reliability for our systems important. An
acceptable level of reliability depends on the conditions under which a system is used. For example,
a transistor designed for home stereo equipment may work for years without failure, yet the same
transistor may be quite useless as a component of a space satellite.
Reliability is defined as the probability that a system or facility will function satisfactorily within
specified limits for at least a given period of time under specified operating conditions. There
are therefore four important elements in the reliability of a given system: probability,
satisfactory performance, minimum time, and specified operating conditions.
The first element (probability) is straightforward: reliability is a number between 0 and 1, which
can be interpreted as the fraction of identical units expected to operate without failure for the
specified period. The element of satisfactory performance refers to meeting certain criteria that
management considers acceptable performance. The criteria may be quantitative and/or qualitative. The
element of time must be known in order to arrive at the probability that a given piece of equipment
functions as programmed. This specification of time limits the utilization of the component to the
specified period: if the equipment fails outside the specified time limit, that failure is not taken
into account in the reliability calculations. The last element, the specified operating conditions,
comprises the environmental standards, such as temperature and humidity, required for proper
operation of the system. To give an example, whereas the reliability of a standard automobile tyre
is very close to 100% for the first 10,000 km of normal operation, it is practically zero if the
tyre is used in racing car competitions.
3.1 Reliability function
The reliability, denoted by R(t), is a function of time with the property that $0 \le R(t) \le 1$
(see Figure 3.1).

Figure 3.1: Reliability R(t) as a function of time t
Let F(t) = the probability that the system will fail by time t. Then:

$$R(t) = 1 - F(t) = \int_t^{\infty} f(\tau)\,d\tau \qquad (3.1)$$

where F(t) is the probability that the system will fail by time t, also referred to as the failure
distribution function, and f(t) is the probability density function of the time to failure t.
Assuming that the time to failure is described by an exponential function:
$$f(t) = \lambda e^{-\lambda t} \qquad (3.2)$$
hence:
$$R(t) = \int_t^{\infty} \lambda e^{-\lambda \tau}\,d\tau = e^{-\lambda t} \qquad (3.3)$$

where λ is a constant (from the analysis in Lesson 2, for a negative exponential function λ represents
the instantaneous failure rate). Further analysis will show that λ = 1/MTBF.
The mean time between failures (MTBF) is the average of the lifetimes of a sample of n similar
items.
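The "further analysis" behind λ = 1/MTBF can be written out in a few lines. For a non-negative time to failure, the mean life equals the area under the reliability curve (integration by parts, using t R(t) → 0 as t → ∞), and for the exponential model of Equation (3.3) that area is 1/λ. The sample average of lifetimes described above is an estimate of this expected value.

```latex
\mathrm{MTBF} = \int_0^{\infty} t\, f(t)\,dt
              = \int_0^{\infty} R(t)\,dt
              = \int_0^{\infty} e^{-\lambda t}\,dt
              = \left[ -\frac{1}{\lambda} e^{-\lambda t} \right]_0^{\infty}
              = \frac{1}{\lambda}
```

One consequence worth noting: substituting t = MTBF into Equation (3.3) gives R = e^{-1} ≈ 0.368, so under the exponential model an item has only about a 37% chance of surviving to its MTBF.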
The illustration presented here focuses on the reliability function for the exponential distribution,
which is commonly assumed in many applications. In practice, however, the failure characteristics of
different items are not necessarily the same. A number of well-known probability distributions have
been found in practice to describe the failure characteristics of different equipment. These include
the binomial, exponential, normal (or Gaussian), Poisson, gamma, and Weibull distributions. Thus, one
should take care not to assume that the exponential distribution is applicable in all instances, that
the Weibull distribution is always the best, and so on.
The rate at which failures occur in a specified time interval is called the failure rate during that
interval. The failure rate (λ) is expressed as:

$$\lambda = \frac{\text{number of failures}}{\text{total operating hours}} \qquad (3.4)$$

The failure rate may be expressed in terms of failures per hour, percent failures per 1,000
hours, or failures per million hours. As an example, suppose that 10 components were tested
under specified operating conditions. The components (which are not repairable) failed as
follows:
-Component 1 failed after 75 hours.
-Component 2 failed after 125 hours.
-Component 3 failed after 130 hours.
-Component 4 failed after 325 hours.
-Component 5 failed after 525 hours.
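There were five failures. The five components that did not fail operated for the full 525 hours, so
the total operating time was 75 + 125 + 130 + 325 + 525 + 5(525) = 3,805 hours. Using Equation (3.4),
the calculated failure rate per hour is

$$\lambda = \frac{5}{3{,}805} = 0.001314$$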
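The calculation above can be sketched in a few lines of Python. This is a minimal illustration, assuming (as the totals imply) that the test stopped at 525 hours and that every unit which did not fail ran for the whole test; the function name `failure_rate` is hypothetical.

```python
def failure_rate(failure_times, n_units, test_duration):
    """Equation (3.4) for a non-repairable life test.

    Assumption: units that did not fail ran for the full test_duration.
    Returns (failures per hour, total operating hours).
    """
    failures = len(failure_times)
    survivors = n_units - failures
    # Failed units contribute their time to failure; survivors contribute the whole test.
    total_hours = sum(failure_times) + survivors * test_duration
    return failures / total_hours, total_hours


lam, hours = failure_rate([75, 125, 130, 325, 525], n_units=10, test_duration=525)
print(hours)           # 3805
print(round(lam, 6))   # 0.001314
```

Multiplying by 10^6 gives the same rate in failures per million hours (about 1,314).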

As a second example, suppose that the operating cycle for a given system is 169 hours, as
illustrated in Figure 2-2. During that time six failures occur at the points indicated. A failure is
defined as an instance when the system is not operating within a specified set of parameters. The
failure rate, or corrective maintenance frequency, per hour is given by:

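$$\lambda = \frac{\text{number of failures}}{\text{total mission time}} = \frac{6}{142} = 0.0422535$$

where 142 hours is the time the system actually operated within the 169-hour cycle (time spent in a
failed state is not counted).
Assuming an exponential distribution, the system mean life or the mean time between failures
(MTBF) is given by:

$$\mathrm{MTBF} = \frac{1}{\lambda} = \frac{1}{0.0422535} = 23.6667 \text{ hours}$$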
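As a quick check of the arithmetic, the same two steps (Equation (3.4), then MTBF = 1/λ under the exponential assumption) can be sketched as follows; the variable names are illustrative only.

```python
failures = 6
operating_hours = 142   # operating time within the 169-hour cycle

lam = failures / operating_hours   # Equation (3.4), failures per hour
mtbf = 1 / lam                     # exponential model: MTBF = 1/lambda

print(f"lambda = {lam:.7f} per hour")   # lambda = 0.0422535 per hour
print(f"MTBF   = {mtbf:.4f} hours")     # MTBF   = 23.6667 hours
```

Note that MTBF here is simply operating hours divided by failures (142/6), which is why the two forms agree.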

When determining the overall failure rate, particularly with regard to estimating corrective
maintenance actions (i.e., the frequency of corrective maintenance), one must address all
system failures, including failures due to primary defects, failures due to manufacturing defects,
failures due to operator and maintenance errors, and so on. The overall failure rate should cover
all factors that will cause the system to be inoperative at a time when satisfactory system
operation is required. A combined failure rate is presented in Table 3.1.
Table 3.1: Combined failure rate

Consideration                              Assumed factor (instances/hour)
a) Inherent reliability of the system      0.000392
b) Manufacturing defects                   0.000002
c) Wear-out rate                           0.000000
d) Dependent failure rate                  0.000072
e) Operator-induced failure rate           0.000003
f) Maintenance-induced failure rate        0.000012
g) Equipment damage rate                   0.000005
Total combined factor                      0.000486
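The total in Table 3.1 is the sum of the individual factors, which is how constant, independent failure causes combine. A minimal sketch of that sum follows; the implied-MTBF line is an illustration added here (not a figure from the table) and assumes the exponential model, so MTBF = 1/λ.

```python
# Factors from Table 3.1, in instances per hour.
factors = {
    "inherent reliability": 0.000392,
    "manufacturing defects": 0.000002,
    "wear-out": 0.000000,
    "dependent failures": 0.000072,
    "operator-induced": 0.000003,
    "maintenance-induced": 0.000012,
    "equipment damage": 0.000005,
}

# Constant, independent failure causes: the rates simply add.
total = sum(factors.values())
print(f"total combined factor = {total:.6f} per hour")  # 0.000486
# Assumption: exponential model, so MTBF = 1 / total rate.
print(f"implied MTBF = {1 / total:.0f} hours")          # 2058
```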

When assuming the negative exponential distribution, the failure rate is considered to be
relatively constant during normal system operation if the system design is mature. In practice, when
equipment is first produced and the system is initially distributed for operational use, there is
usually a higher number of failures due to component variations and mismatches, manufacturing
processes, and so on. The initial failure rate is higher than anticipated, but gradually decreases
and levels off during the "debugging" or "burn-in" period, as illustrated in Figure 3.3. Likewise,
when the system reaches a certain age, there is a "wear-out" period in which the failure rate
increases. The relatively level portion of the curve in Figure 3.3 is the constant failure rate
region, where the exponential failure law applies.
Figure 3.3 illustrates certain "relative" relationships. In practice, the curve may vary considerably
depending on the type of system and its operational profile. Further, if the system is continually
being modified for one reason or another, the failure rate may not be constant. In any event, the
illustration provides a good basis for considering failure-rate trends on a relative basis.
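The text does not say which model underlies Figure 3.3. One common way to reproduce its three regions, sketched here only as an assumption, is the hazard (instantaneous failure rate) of the Weibull distribution mentioned earlier: h(t) = (β/η)(t/η)^(β-1). A shape parameter β < 1 gives a decreasing rate (burn-in), β = 1 gives a constant rate (the exponential case), and β > 1 gives an increasing rate (wear-out). A single Weibull curve covers only one region at a time, so the full curve is treated piecewise. The function name and the characteristic life η = 1000 hours are hypothetical.

```python
def weibull_hazard(t, beta, eta):
    """Weibull hazard rate h(t) = (beta/eta) * (t/eta)**(beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1)


eta = 1000.0  # hypothetical characteristic life, hours
for beta, phase in [(0.5, "burn-in"), (1.0, "useful life"), (3.0, "wear-out")]:
    # Evaluate the rate early, mid and late in life to show the trend.
    rates = [weibull_hazard(t, beta, eta) for t in (100.0, 500.0, 900.0)]
    print(phase, [f"{r:.2e}" for r in rates])
```

With β = 1 every value equals 1/η, which is the constant λ of Equation (3.3).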

3.2 Reliability Component Relationship


The overall reliability of a system is a function of the relationships between its component parts.
It is usually appropriate and common to consider series networks, parallel networks, and
combinations of these as the underlying make-up of a system. These networks are used in reliability
block diagrams and in static models for predicting and analysing the reliability of systems.
3.2.1 Serial Networks
In a serial network, all components must be in working order for the overall system to function in
an acceptable way.

Fig. 3.4: Serial network (input → Component A → Component B → output)

If it is assumed that the failure behavior of each component is statistically independent of that of
the others, then the reliability of the system is the product of the reliabilities of the individual
components of the system:

$$R(t) = R_A(t)\,R_B(t) \quad \text{for a two-component system} \qquad (3.5)$$

and

$$R(t) = R_1(t)\,R_2(t)\,R_3(t) \cdots R_n(t) \quad \text{for } n \text{ series-connected components} \qquad (3.6)$$
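For example, if it is assumed that each component has an exponential p.d.f., with failure rates
λ1 and λ2 respectively, then:

$$R(t) = R_A(t)\,R_B(t) = e^{-\lambda_1 t}\,e^{-\lambda_2 t} = e^{-(\lambda_1 + \lambda_2) t} \qquad (3.7)$$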

It is important to note that since R_A(t) and R_B(t) are each less than unity, the combined R(t) of a
series-connected system will always be less than that of any individual component.
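Equations (3.6) and (3.7) translate directly into code. The sketch below assumes independent components; the function names are hypothetical, and `math.prod` requires Python 3.8 or later. It checks on a hypothetical two-component case that multiplying exponential reliabilities gives the same result as summing the failure rates.

```python
import math


def series_reliability(reliabilities):
    """Equation (3.6): product of independent component reliabilities."""
    return math.prod(reliabilities)


def exponential_series_reliability(failure_rates, t):
    """Equation (3.7) extended to n components: exp(-(sum of rates) * t)."""
    return math.exp(-sum(failure_rates) * t)


# Hypothetical two-component check (rates in failures/hour, t in hours).
lam_a, lam_b, t = 0.001, 0.002, 100.0
product_form = series_reliability([math.exp(-lam_a * t), math.exp(-lam_b * t)])
sum_form = exponential_series_reliability([lam_a, lam_b], t)
print(round(product_form, 6), round(sum_form, 6))  # 0.740818 0.740818
```

The sum-of-rates form is convenient because a whole series chain of exponential components behaves like a single exponential component with rate Σλi.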
Example:
1. As an example, suppose that an electronic system includes a transmitter, a receiver, and a
power supply. The transmitter reliability is 0.8521, the receiver reliability is 0.9712, and the
power supply reliability is 0.9357. The overall reliability of the electronic system is
R = (0.8521)(0.9712)(0.9357) = 0.7743

2. A small plant is required to operate for 1000 hrs. It has four series-connected subsystems
whose MTBFs (Mean Time Between Failures) are 6000 hrs, 4500 hrs, 10500 hrs and 3200 hrs.
What is the overall reliability of the plant? Assume exponential p.d.f. behavior.
Solution: $\lambda_i = 1/\mathrm{MTBF}_i$, i.e.,
$\lambda_1$ = 0.000167 failure/hr
$\lambda_2$ = 0.000222 failure/hr
$\lambda_3$ = 0.000095 failure/hr
$\lambda_4$ = 0.000313 failure/hr
Therefore

$$R = e^{-\left(\sum_i \lambda_i\right) \times 1000} = e^{-0.000797 \times 1000} = e^{-0.797} = 0.4507$$

i.e., the probability of the plant operating for at least 1000 hrs is about 45%. If the requirement
were reduced to 500 hours, the reliability would increase to about 67%.
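Both examples can be reproduced with a short script, sketched below using only the figures given above. Because it uses the unrounded λi, Example 2 comes out at about 0.4508 rather than 0.4507 (the text rounds each λi to six decimals first), and at about 0.67 for 500 hours.

```python
import math

# Example 1: transmitter, receiver and power supply in series (Equation 3.6).
r_system = math.prod([0.8521, 0.9712, 0.9357])
print(f"Example 1: R = {r_system:.4f}")  # Example 1: R = 0.7743

# Example 2: four series subsystems with exponential behavior.
mtbfs = [6000, 4500, 10500, 3200]        # hours
total_rate = sum(1 / m for m in mtbfs)   # sum of lambda_i = 1/MTBF_i
for mission in (1000, 500):
    r = math.exp(-total_rate * mission)
    print(f"Example 2, {mission} h: R = {r:.4f}")
```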

3.2.2 Parallel Networks


Parallel networks have similar components that have been arranged in parallel (see Fig. 3.5). In a
parallel network, all components must fail in order to cause total system failure. The system
depicted in Fig. 3.5 will function if A, B, or both are working.

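For the two-component system, assuming the components fail independently, the probability of system
failure by time t is $F(t) = F_A(t)\,F_B(t)$. Hence,

$$\begin{aligned}
R(t) &= 1 - F(t) = 1 - F_A(t)\,F_B(t) \\
     &= 1 - [1 - R_A(t)]\,[1 - R_B(t)] \\
     &= R_A(t) + R_B(t) - R_A(t)\,R_B(t) \quad \text{for a 2-component system} \qquad (3.8)
\end{aligned}$$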

For n components in parallel, each with the same reliability R,

$$R(t) = 1 - (1 - R)^n \qquad (3.9)$$

Fig. 3.5: A Parallel Network
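Equations (3.8) and (3.9) are both special cases of one rule: an independent parallel network fails only if every branch fails, so R = 1 - Π(1 - Ri). A minimal sketch of that rule follows; the function name and branch values are hypothetical, and `math.prod` requires Python 3.8 or later.

```python
import math


def parallel_reliability(reliabilities):
    """R = 1 - product of (1 - R_i) for independent parallel branches.

    Two branches reproduce Equation (3.8); n equal branches reproduce Equation (3.9).
    """
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


ra, rb = 0.9, 0.8  # hypothetical branch reliabilities
print(f"{parallel_reliability([ra, rb]):.4f} {ra + rb - ra * rb:.4f}")  # 0.9800 0.9800
```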


Parallel redundant networks are principally used to increase the overall reliability of a system. An
example is the use of a spare tyre in a car.
For instance, assume that a system includes two identical subsystems in parallel and that the
reliability of each subsystem is 0.95. The reliability of the system is found from Equation (3.8)
as:

$$R = 0.95 + 0.95 - (0.95)(0.95) = 0.9975$$
Suppose that the reliability of the system above needs improvement beyond 0.9975. By adding
a third identical subsystem in parallel, the system reliability is found from Equation (3.9):

$$R = 1 - (1 - 0.95)^3 = 1 - 0.000125 = 0.999875$$

Note that this is a reliability improvement of 0.002375 over the previous configuration, i.e. the
unreliability of the system was reduced from 0.0025 to 0.000125.
If the subsystems are not identical, Equation (3.8) can be used. For example, a parallel
redundant network with two subsystems of reliability $R_A = 0.75$ and $R_B = 0.82$ gives a system
reliability of

$$R = 0.75 + 0.82 - (0.75)(0.82) = 0.955$$
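The three parallel examples can be checked with a short script. The sketch below reuses the general rule R = 1 - Π(1 - Ri), assuming independent branches; the function name is hypothetical.

```python
import math


def parallel_reliability(reliabilities):
    # Independent redundant branches: the system fails only if every branch fails.
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


two = parallel_reliability([0.95, 0.95])     # Equation (3.8): 0.9975
three = parallel_reliability([0.95] * 3)     # Equation (3.9): 0.999875
mixed = parallel_reliability([0.75, 0.82])   # non-identical subsystems: 0.955

print(f"{two:.4f} {three:.6f} {mixed:.3f}")        # 0.9975 0.999875 0.955
print(f"gain from third unit: {three - two:.6f}")  # gain from third unit: 0.002375
```

Each added branch multiplies the unreliability by (1 - R), here 0.05, which is why the third unit cuts it from 0.0025 to 0.000125.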

3.2.3 Combined Series-Parallel Networks


Various levels of reliability can be achieved through the application of a combination of series
and parallel networks. Consider the three examples illustrated in Figure 3.6.

With combined series-parallel networks, the overall system reliability is computed by first
evaluating the reliability of the redundant (parallel) elements, and then finding the product of the
resulting "equivalent" series quantities.
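The two-step procedure above can be sketched in code. Since the networks of Figure 3.6 are not reproduced here, the example uses a hypothetical network (component A, then a redundant pair B and C, then component D) with illustrative reliabilities only; the helper names are also hypothetical.

```python
import math


def series(*rs):
    # Independent blocks in series: multiply reliabilities (Equation 3.6).
    return math.prod(rs)


def parallel(*rs):
    # Independent blocks in parallel: 1 - product of unreliabilities.
    return 1.0 - math.prod(1.0 - r for r in rs)


# Hypothetical network: A -> (B parallel C) -> D. Values are illustrative.
r_a, r_b, r_c, r_d = 0.95, 0.90, 0.90, 0.98

r_bc = parallel(r_b, r_c)          # step 1: 1 - (0.1)(0.1) = 0.99
r_system = series(r_a, r_bc, r_d)  # step 2: 0.95 * 0.99 * 0.98
print(f"{r_bc:.4f} {r_system:.4f}")  # 0.9900 0.9217
```

Nested networks are handled the same way, by collapsing the innermost parallel groups first.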
The calculation of reliability of series and/or parallel networks assumes that the components are
operated within their useful-life phases. However, for reasons of cost and technical limitations, many
components will have useful lifetimes much shorter than the expected life of the overall system. An
example is the clutch plate in a car. It has to be replaced before the end of its useful life $t_0$ in
order to maintain the desired reliability of the system over a time t much greater than $t_0$.
3.3 Reliability Planning
Because reliability is an intrinsic characteristic of the design of a system, it must be carefully
planned, determined, and specified as part of the overall planning of the system. The following
activities should be part of the reliability planning exercise:
a) Establishment of the quantitative and qualitative reliability requirements for the system, e.g.
the acceptable reliability level for a given plant over a given period of time. The figure can be
stated as a probability or in terms of MTBF.

b) The allocation or apportionment of the requirements stipulated in (a) to the subsystem level
and beyond. The starting point of reliability analysis is the transformation of the system
structure into a reliability block diagram. The block diagram consists of the logical
connection of the system components needed to fulfil the function of the system. Using the block
diagram, it is possible to calculate the overall system reliability from the reliability (or
probability of failure) of each component.

c) Using a variety of design procedures, techniques and practices to ensure that components
have the necessary reliability level (or MTBF). Such techniques could involve effective
component part-selection and control, derating (i.e. using components below their rated stress
conditions), redundancy, etc.

d) Analysis of the reliability of the resultant network of components with the help of block
diagrams, mathematical models, stress-strength analysis (i.e. the stress-strength relationship
under severe loading conditions such as dynamic loads, shock and high temperature), worst-case
analyses, etc.

e) Early in the study, establishment of the different ways in which components in the resultant
system can fail (failure modes analysis) and the effects of these failures on other elements
of the system (effects analysis).

f) Perform reliability predictions and assessments, as more and more engineering data become
available, to check the extent to which the system design has met the requirements and other
factors identified through the allocation process. Predictions can be based on analysis of similar
equipment, on estimation of the reliability of active elements (i.e. the smallest building blocks),
and also on stress analysis.

g) Consider the effects on reliability of storage, packaging, transportation, handling,
maintenance, etc.

h) Perform critical-useful-life analyses (CLA). A critical-useful-life item is one that, because
of its short life, is incapable of satisfying the functional requirements imposed by its
application unless corrective or preventive maintenance is performed. During the design
phase, critical items are listed along with their expected life in terms of calendar time,
operating cycles, or system operating hours. This listing specifies the requirements for
maintenance, personnel support and spare parts.

i) Formal design reviews, focusing on evaluating how well the characteristics of the system and
its elements meet the initially specified reliability requirements for the system. The reviews
cover each key stage of the system design process, including conceptual design, preliminary
system design, and detailed design. Checklists/questions are developed covering aspects
considered important from the reliability point of view.

j) Conduct reliability tests and evaluations to see whether the system meets the specified MTBF
requirements. This is achieved by operating the system in the prescribed manner for a
specified length of time while failures are recorded and evaluated as the testing is done.
The system is judged acceptable if a minimum acceptable operational life is obtained.

k) Once the system is put into commercial or routine operation, there is an opportunity to
evaluate its reliability in a realistic situation, hence the idea of "operational reliability
assessment". This consists of ongoing data collection, analysis and corrective action.

3.4 Reliability Centred Maintenance (RCM)


RCM is a systematic analysis approach whereby the system design is evaluated in terms of
possible failures, the consequences of these failures, and the recommended maintenance
procedures that should be implemented. The objective is to design a preventive maintenance
program by evaluating the maintenance requirements of each item according to the consequences of
its possible failures. The RCM analysis is very similar in many respects to the Failure Mode,
Effects and Criticality Analysis (FMECA), should be accomplished in conjunction with the FMECA, and
should constitute a major data input for the Logistic Support Analysis (LSA).