
RELIABILITY:

CONCEPTS & TRENDS

By: Carlos Mario Perez Jaramillo


STRATEGIES FOR LIFE CYCLE

ARTICLE


1. INTRODUCTION
Maintenance management has evolved dynamically and continuously. Maintaining assets
means keeping pace with new technological developments and with new challenges in the
industrial, service and agricultural sectors. These challenges relate to the need to optimize
efficiency and efficacy in the production of goods and the provision of services, as well as
to improve quality and ensure the integrity of people and their environment.
These requirements have a direct repercussion on maintenance management and have
driven an evolution in the definition of maintenance techniques and strategies, centered
not only on interventions on equipment but also on integral management that, from a
business and systemic perspective, addresses the proper relationship between the
strategic, administrative, technical and operative work of the maintenance area.
Like every evolving process, maintenance management has passed through a series of
chronological stages, each characterized by specific methodologies. It is worth noting that
reaching a more advanced stage doesn't necessarily mean totally abandoning former
methodologies, even if they lose relevance as newer ones arrive.
First generation
First generation covers the period up to World War II. In those days, companies were
not highly mechanized, and downtime didn't matter much. Assets were simple and mostly
designed for a specific purpose, which made them reliable and easy to maintain.
Complicated maintenance systems were not needed, and the need for qualified staff was
lower than it is now. Some of its characteristics were:
- Repairing in case of failure.
- Simple equipment.
Second generation
Things changed drastically during World War II. Wartime increased demand for all types
of products while the labor supply dropped markedly, which led to the need for greater
process automation. In the 1950s, assets of all types, and of increasing complexity, were
built, and companies started depending on them.
As this dependence grew, asset downtime became more evident and important. This
situation led to the idea that failures could and should be totally prevented, which gave
birth to the Preventive Maintenance concept; thus, in the 1960s, maintenance was
fundamentally based on complete intervention of assets at fixed intervals.
Maintenance costs started rising considerably relative to other operating costs; as a
result, maintenance planning and scheduling systems were introduced to keep them
under control. The main characteristics of this period were, and in some cases still are,
the following:

- Periodic interventions.
- Cost reduction.
- Shutdown reduction.
- Systems for intervention planning and scheduling.
- Computerization.
- Emphasis on statistics.
- Maintenance carried out by specialties.
- Orientation towards implementation.

Third generation
Since the mid-1980s, the pace of change in companies has reached dizzying speeds due
to the increasing demands of society, clients, employees and shareholders.
The continuous growth of automation in every field, together with these high demands,
showed that failures have an ever greater effect on business performance. This is clearly
evidenced in the trend towards responsive, flexible and timely systems, where optimum
inventory levels allow the impact of any failure on operation to be mitigated, based on
reducing downtime and the effects on quality and service.
Growth in the mechanization and complexity of business processes, along with greater
risks in the handling, control and disposal of materials, causes failures to have more
harmful consequences for safety and the environment, especially in a society that is
increasingly less tolerant.
The evolution of processes and the dynamism of business have changed the paradigms
and basic beliefs about maintenance. It is clear that nowadays what matters is not doing
a lot, but doing it well. It is now recognized that there's only a minor connection between
an asset's operating time and its likelihood of failure. Reliability is increasingly
recognized as a matter of user satisfaction rather than a statistical problem and, likewise,
results are emphasized as the primary objective instead of control.
Today there's intense and dynamic change in the concepts, strategies, methods and
techniques applied to maintenance. Some characteristics of maintenance in this century
are:

- Condition-based monitoring.
- Pursuit of reliability.
- Design for reliability and maintainability.
- Risk analysis.
- Cause/effect analysis.
- Modern decision-making systems.
- Integration of computing and automation systems.
- Integration with operations.
- Integrated human resources who implement, manage, direct and define strategies.
- Application of management models.
- Understanding different failure modes.

2. RELIABILITY CONCEPT
The word Reliability is frequently used now and, unfortunately, it is sometimes used with
no regard for its context and real implications; there are several asset improvement
techniques, and a constant advertising barrage has built up around this word.
The best-known definition of Reliability is: the probability that an asset or system
operates without failing during a given period of time, under previously established
operating conditions.
Sometimes this concept is wrongly used because of the particular meaning given to the
word Failure; for many, Failure only means shutdowns, and thus they build complex
mathematical constructs to calculate shutdown probability, without taking into account
that an asset also fails when it is inefficient, unsafe or costly, when it has a high
rejection level, and when it contributes to a bad image.
Another item to take into account is that shutdowns may occur for many reasons, and
mixing apples with pears should be avoided: for example, shutdowns due to bearing
lubrication with shutdowns due to errors in bearing mounting.

Some have coined the term operational reliability as the capacity of an installation or
system (integrated by processes, technology and people) to meet its function within its
design limits and under a specific operational context.
The term operational does not set a clear boundary with the reliability concept and, in
some companies, it is limited to measuring indexes in order to control "reliability."
For others, Reliability is the set of theories and mathematical methods, operative
practices and organizational procedures that, applied to the study of the laws of failure
occurrence, make it possible to solve problems of prevention, estimation, optimization
and improvement of survival probability, average duration and the percentage of time a
system operates properly. They use three ways to state it:
Desired Time Operation Percent:
Sometimes stated as: "The asset has 95% reliability in 720 hours of planned time," which
causes confusion with the well-known and widely used availability concept, or with the
efficiency of desired use of the system, equipment or asset.
Mean Time Between Failures (MTBF):
Sometimes stated as: "Mean Time Between Failures for the equipment is 3,000 hours."
The figure is an average (a measure of central tendency) and its value tries to describe
the behavior of a set of data or a sample (times and failures). This term is overvalued by
some, spreading the idea that reliability is improved if failure frequency is reduced in a
time interval. (Note that failure here means stoppage.)
Failure Rate:
Sometimes stated as the percentage of failures in the total number of elements, or as the
number of failures during a given time t. For example: "Batteries have a 1% failure rate
during the one-year guarantee period."
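
The three ways of stating reliability listed above reduce to simple ratios. The following is a minimal sketch, not a standard implementation: the function names are hypothetical, and the 36 hours of downtime, the 9,000 operating hours with 3 failures and the batch of 100 batteries are assumed inputs chosen only to reproduce the 95%, 3,000-hour and 1% figures quoted in the text.

```python
def operating_time_percent(planned_hours, downtime_hours):
    """Percentage of planned time during which the asset actually operated."""
    return 100.0 * (planned_hours - downtime_hours) / planned_hours


def mean_time_between_failures(operating_hours, failure_count):
    """Average operating time per recorded failure."""
    if failure_count == 0:
        raise ValueError("MTBF is undefined when no failures were recorded")
    return operating_hours / failure_count


def failure_rate_percent(failed_units, total_units):
    """Percentage of units that failed within the period under study."""
    return 100.0 * failed_units / total_units


# Assumed inputs (not from the article), chosen to match its quoted figures.
print(operating_time_percent(720, 36))       # 95.0
print(mean_time_between_failures(9000, 3))   # 3000.0
print(failure_rate_percent(1, 100))          # 1.0
```

Each figure depends entirely on what was counted as a failure and over what period, which is why the same asset can show very different numbers under different definitions.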

2.1 Is Reliability a Statistical Issue?


A very common discussion is whether or not reliability is a statistical issue. Managing
data is undeniably useful in a company's management and direction, but it is necessary
to distinguish whether statistics is used to manage real data and observe its behavior, or
to support forecasts and estimates that sometimes border on daring and irresponsible
speculation.
In maintenance, data of every type, quantity and quality is used, and the discussion
about using large volumes of information should focus on its responsible use and not on
its existence.

A real case of prudent application of information was carried out by the US aviation
industry in the 1960s, when a survey showed that different elements failed in different
ways and that even a particular element may fail in several ways. Put more simply:
changing an item because "it is going to fail," changing it "because it failed," and
changing it because a fixed interval was reached "before it failed" are not the same
thing. Specifically, an item that failed due to wear is not the same as one that failed due
to improper installation or one damaged by an accident.
Some authors insist on treating mathematical postulates as absolute truth about failures
and deny that counts of analyzed failures mix effects with causes; in addition, they deny
that having failure data to analyze means accepting that failures occur and that more
data means more failures.
The most common conception of Reliability is the average time between failure
occurrences. This statement has several connotations to be considered; the first is to
remember that the figure is an average and that the failure concept is associated more
with shutdowns than with nonconformities such as spills, non-conforming product or
increased risks, which are failures too.
The datum as such is an average figure; there's a great difference between probability
and reality, and so much confusion arises. A probable failure is a possible failure and a
failure that has occurred is a real failure, and no calculation algorithm necessarily
assures that it will occur at a given point.
For example, a calculation produces a 75% mathematical probability of failure for a
component that on average has lasted 1,200 days in a defined operational context; this
doesn't mean that it is not going to fail, or that failure is imminent. Moreover, if there's
another component with a 95% probability, the latter may fail later than the former, and it
doesn't mean that the maintenance strategy is necessarily different, especially when
causes have been mixed (failure due to lubrication or mounting error).
Therefore, figures that are calculated, desired, estimated, arbitrarily fixed, imagined,
recommended by manuals or even invented may carry error, inaccuracy and deficiencies
that require responsible handling.
For example, a boiler has the following failure causes:

Figure 1. Boiler example

If the failures are analyzed, the results are as follows:


No. | Failure Cause | Effect | Generates shutdown?
1 | Dirty casing. | Increases fuel consumption. | No
2 | Release valve is plugged closed. | In case of pressure increase, steam would not be released, thus increasing risk. | No
3 | Combustion gas relief is partially plugged. | Increases fuel consumption. Non-compliance with environmental legislation. | No
4 | Fuel system is badly adjusted. | Increases fuel consumption. Increases gas emissions, non-compliance with environmental legislation. | No
5 | Bearing of the burner fan is worn out. | Combustion air is not supplied and boiler turns OFF. | Yes
6 | Steam release piping features corrosion. | Piping is ruptured, there's a leak and someone may get burnt. There are associated damages. | No
7 | Pump's motor power cable has been bumped. | Pump stops, water is not supplied and boiler turns OFF. | Yes
8 | Water pump motor thermistor fails; it is closed. | Upon an overload, the motor would burn out. | No
9 | Forced temperature sensor (bridged). | Upon temperature increase, boiler would not signal turn OFF, increasing risk. | No
10 | Dirty boiler. | Company's standards are not met. | No

It is clear that not all failures affect availability; therefore, they would not all enter the
MTBF calculation as it is commonly performed.
Getting back to the boiler failures:

- Assume that 10 failure modes occur within 720 hours (1 month).
- Only 2 of the above failure modes produce a shutdown, generating a total of 20
shutdown hours.
- According to the traditional failure concept, the MTBF for the boiler would be:
MTBF = (720 hours - 20 hours) / 2 failures = 350 hours.
- If the company's MTBF goal is 300 hours, the goal would be met.
- The probability that the boiler does not fail before the MTBF goal would be calculated
this way: e^-(300/350) = 42.4%

Thus, analyzing numbers alone may give some people peace of mind; however, there
are other ways an asset may fail, such as:
- Non-compliance with cleaning standards;
- Inoperative protections;
- Harmful situations for safety and the environment;
- Greater fuel consumption, that is, greater cost.
Then, if the asset does not perform all required functions as desired, it is considered a
failure.
Moreover, if the real failure concept is applied, the calculations would be different:
- MTBF = (720 hours - 20 hours) / 10 failures = 70 hours.
- Since the company's MTBF goal is 300 hours, the goal would not be met.
- The probability that the boiler does not fail (under this failure concept) before the
MTBF goal would be calculated this way: Probability = e^-(300/70) = 1.38%
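
The two boiler calculations can be checked with a short script. This is a sketch under stated assumptions: the helper names are hypothetical, the failure-mode labels are abbreviated from the boiler table, and the survival formula e^-(t/MTBF) is valid only if the failure rate is assumed constant over time (an exponential model), which the text uses implicitly.

```python
import math

# (failure cause, generates shutdown?) abbreviated from the boiler table.
BOILER_FAILURE_MODES = [
    ("Dirty casing", False),
    ("Release valve plugged closed", False),
    ("Combustion gas relief partially plugged", False),
    ("Fuel system badly adjusted", False),
    ("Burner fan bearing worn out", True),
    ("Steam release piping corroded", False),
    ("Pump motor power cable bumped", True),
    ("Water pump motor thermistor failed", False),
    ("Temperature sensor bridged", False),
    ("Dirty boiler", False),
]

OBSERVED_HOURS = 720.0   # one month of planned operation
SHUTDOWN_HOURS = 20.0    # total downtime in that month
MTBF_GOAL_HOURS = 300.0  # company goal quoted in the text


def mtbf(observed_hours, shutdown_hours, failure_count):
    """Operating hours divided by the number of failures counted."""
    return (observed_hours - shutdown_hours) / failure_count


def survival_probability(hours, mtbf_hours):
    """P(no failure before `hours`), assuming a constant failure rate."""
    return math.exp(-hours / mtbf_hours)


# Traditional concept: only failure modes that stop the boiler are counted.
shutdown_failures = sum(1 for _, stops in BOILER_FAILURE_MODES if stops)
# Real concept: every failure mode is counted.
all_failures = len(BOILER_FAILURE_MODES)

traditional_mtbf = mtbf(OBSERVED_HOURS, SHUTDOWN_HOURS, shutdown_failures)
real_mtbf = mtbf(OBSERVED_HOURS, SHUTDOWN_HOURS, all_failures)

for label, value in (("traditional", traditional_mtbf), ("real", real_mtbf)):
    probability = survival_probability(MTBF_GOAL_HOURS, value)
    print(f"{label}: MTBF = {value:.0f} h, "
          f"P(no failure before goal) = {probability:.1%}")
```

With these inputs the script reports 350 hours and about 42.4% under the traditional concept, and 70 hours and about 1.4% when all ten failure modes are counted. The only thing that changes between the two runs is the definition of failure.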
Very few companies have data on MTBF; what they really have is a datum on mean time
between shutdowns.
Very few companies record failure occurrence at the failure mode level, and some others
do, but their information systems make MTBF calculation difficult.
Conclusion: the time spent on the mathematical calculation of MTBF or failure
probability would be better spent on defining failure consequences and an action plan to
mitigate them.
Data paradigm
Business processes usually have few assets of a single type; the trend is to put them
into operation in successive groups rather than all at once.
Sample sizes tend to be too small for statistical procedures to be really convincing.
Assets are in a continuous state of evolution and modification, in response to new
operational requirements and in an attempt to eliminate failures that have serious
consequences or are too expensive to prevent. This means that the time an asset
spends in any one configuration is relatively short; therefore, the database is very small
and constantly changing.
Due to asset complexity and diversity, for most companies it is not easy to develop a
complete analytic description of reliability characteristics, because many functional
failures are caused not by 2 or 3 but by 2 or 3 dozen failure modes.
It is easy to plot the incidence of functional failures, but statistically it is difficult to
separate and describe the failure pattern that applies to each failure mode.
Data gathering policy differs from one organization to another. An item may be removed
in one place because it is failing, while in another place it is removed because it has
failed; similar differences are caused by different performance expectations.
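
The point about small sample sizes can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the article: failure times are drawn from an exponential distribution, the "true" MTBF of 350 hours is borrowed from the boiler example, and the sample sizes (3 and 100 failures), the trial count and the random seed are arbitrary. The function name is hypothetical.

```python
import random
import statistics


def estimated_mtbf_spread(true_mtbf, failures_per_sample, trials, rng):
    """Standard deviation of MTBF estimates, each built from one small sample."""
    estimates = []
    for _ in range(trials):
        # Simulated times between failures (assumed exponential).
        times = [rng.expovariate(1.0 / true_mtbf)
                 for _ in range(failures_per_sample)]
        estimates.append(statistics.mean(times))
    return statistics.stdev(estimates)


rng = random.Random(42)  # fixed seed so the run is repeatable
spread_small = estimated_mtbf_spread(350.0, 3, 2000, rng)
spread_large = estimated_mtbf_spread(350.0, 100, 2000, rng)

print(f"spread of MTBF estimates from 3 failures:   {spread_small:.0f} h")
print(f"spread of MTBF estimates from 100 failures: {spread_large:.0f} h")
```

Under these assumptions the estimates built from 3 failures scatter several times more widely than those built from 100, so an MTBF computed from a handful of events says little about the asset, even before configuration changes and mixed failure modes are considered.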
Resnikoff's Riddle
"Gathering the information considered most needed by those who design the
maintenance policy (information on critical failures) is unacceptable in principle, and it
is evidence of the failure of the maintenance plan. This is because critical failures may
cause deaths, and no death rate is acceptable to any organization as the price of failure
information to be used to design a maintenance policy." H.L. Resnikoff.

2.2 How to Improve Reliability?


Currently, the issue faced by maintenance staff is not only learning what the new
techniques are, but also being able to decide which are useful for their own companies.
If properly chosen and used in an integrated manner, they will likely improve
maintenance practices and results and, likewise, optimize cost. If improperly chosen,
they will create more problems, which in turn will worsen existing ones.
Companies want to secure their future by defining strategies and by planning and
applying activities that lead to objectives related to availability, quality, safety,
environmental integrity and cost-effectiveness that satisfy owners, community,
employees and clients. To meet these objectives, companies have to overcome, control
or set challenges such as:
- Improving reliability: reducing failures in a time interval, understanding as failure any
event affecting asset performance.
- Reducing risk: applying measures to minimize the circumstances that affect the
possibility of a loss.
- Improving profitability: the capacity to generate profit or benefit; in other words, the
relationship between profits and the investment or resources used to achieve them.
- Using best practices: implementing methods, tools, methodologies, procedures and
processes that companies have used continuously and consistently and that have
contributed efficiently to achieving the best results in the performance of their assets.
- Meeting legislation: ensuring compliance with the set of laws and rules established by
a State regarding a specified subject or issue.
- Supporting growth: the growth of profits or of the value of goods and services produced
by a company; it is also related to certain indicators that, as a whole, show an
organization's progress.
- Assuring safety: ensuring the implementation of measures and actions that provide
protection against specific risks.
- Assuring sustainability: considering long-term consequences to make sure that
decisions take future requirements and obligations into account.
- Leadership: the influence exerted upon people that encourages them to work for a
common purpose and make the right decisions.
- Improving productivity: growth in the ratio between the quantity of goods and services
produced and the quantity of resources required.
- Reducing vulnerability: reducing the susceptibility of any system to the impact of a
hazard.
- Complying with environmental rules: adherence to the law governing the environment
that shapes life, including the natural, social and cultural elements existing in a specific
place and/or time.

The above-mentioned challenges result in greater demands on maintenance activities
and actions. New requirements and technological expectations have broadened the
tasks, responsibilities and requirements regarding strategies, plans, programs, response
times, competences, accuracy of implementation and organization of maintenance tasks.
In response to these challenges, there are the following strategies and processes to
manage assets:

- Prospecting
- Marketing Plans
- Purchasing
- Planning
- Reliability
- Maintenance
- Risk Analysis
- Good Governance and Social Responsibility

Some companies have gone beyond statistics, reviewed their internal practices and
carried out benchmarking against outstanding companies. These organizations came to
the conclusion that it is impossible to talk about reliability as a single figure; therefore, it
is necessary to use several measurements as fundamental indicators of the inputs and
outputs of processes.
The need for reliability in installations is as old as humanity, but undeniably the growing
relevance of environmental and safety issues has led some markets and niches to
change orientation, due to:

- More complex products.
- High pressure to reduce costs and remain competitive.
- A greater number of operational functions carried out by equipment and machines.
- Requirements to reduce product weight and volume while maintaining and improving
performance and safety standards.
- Requirements to increase or reduce the operating life of products in order to increase
or reduce demand.
- Greater difficulty in carrying out maintenance interventions due to increased asset
utilization.
- A trend towards using software, electronic, pneumatic or hydraulic components whose
wear behavior differs from that of components that fail as a function of age.
- Current legislation that is increasingly demanding and less tolerant.
- A greater impact of shutdowns and operational losses on sales and products.
- A growing level of demand on the quality parameters of services and products.
- New conceptions of a company's image and commitment.
- Commitment to reducing the risk of loss of human life.
- Demands to reduce the risk of spills or of equipment affecting the environment.

Successful companies have made a concerted effort to incorporate their maintenance
improvement strategies into other corporate initiatives, avoiding the syndrome of the
"campaign of the moment," the "crest of the wave" or the "promotion of the month." The
best indication that this effort pays off is that it turns into a durable and stable policy.
These new demands drive the use of strategies that have been successfully applied in
many companies, strengthening global performance, optimizing costs, reducing risks,
improving corporate image, lowering environmental impact and consolidating business
results.
Among the most successful and consistent tools in use are:

- Orientation towards reliability as a global concept, rather than towards cost or
downtime reduction.
- Carrying out diagnoses, audits and evaluations of maintenance practices.
- Defining and using a strategic development plan that describes and establishes a
corporate vision of reliability and good asset performance.
- Extensive use of performance measurements with appropriate goals.
- Using benchmarking to identify opportunities and barriers for improvement.
- Sharing knowledge and reaching consensus among typically separate areas, using
teams with different functions and specialties that work together for a specific period of
time, analyzing problems and opportunities with a common goal.

The challenges set by new generations of maintenance may be met by building a
step-by-step scheme that works through defined and verifiable stages.
The decision to take this path comes from recognizing the strengths and weaknesses of
maintenance management, in order to have a view of a desired status, expressed as
objectives. These goals are justifiable if they are related to the company's results and
not only to maintenance operative results.


Maintenance is not the only area responsible for reliability. It requires responsible
designs, consistent and trained operators, professional purchasers and stable policies.
In other words, several responsible actors take part during the life cycle.
Maintenance is considered an action, and a joint responsibility, more than a function:
maintenance starts with selecting equipment and continues with installation; it rests on
correct operation and good maintenance, with support provided by purchasing and
inventories.
For this reason, those who share responsibility for whether assets are reliable or not are:

- Design.
- Selection.
- Manufacturing.
- Suppliers.
- Installation.
- Environment.
- Operation.
- Maintenance.
- Stores.
- Purchases.

Conclusion: Improving MTBF is not enough.

3. PRESSURE FOR RESULTS


The commitment that sometimes turns most heroic is reducing maintenance costs,
generally motivated by pressure to increase productivity and reduce costs.
Many companies specialize in not spending and have become frequent users of a broad
range of tools and methodologies that have emerged in recent years; these are generally
focused on not increasing maintenance costs.
This situation has led to some poor decisions that bring several benefits in the short
term, but are rarely sustainable in the long term and may be harmful.
We have to be wary of the direct cost reduction concept, because it aims at saving and
is more closely related to cutting; therefore, it is necessary to know the real impact that
using appropriate and timely resources has on business activity.
For that reason, any change should aim at improving the company and not just
maintenance.

The results, then, are not isolated; they are integral for the entire organization,
achieving:

- Optimum maintenance, operation and power consumption costs.
- Better utilization times of the assets: quantity and results.
- Reduction of accidents and incidents.
- Less environmental impact.
- Improvement of the working environment.
- Growth in the percentage of satisfied customers.

In conclusion, a reliable asset is effective, efficient, profitable and safe; it does not
affect the environment and produces few nonconformities.
These results achieve direct cost reduction at the business level, instead of focusing on
economizing. Better practices may be applied regardless of the organizational structure
and the type of company.
Reliability is a powerful tool for gaining competitive advantages that may increase
profitability, safety, customer and user satisfaction, and respect.
Although the strategies and activities for achieving improvement may be very clear, and
action points are easily listed and prioritized, the final outcome (transformation towards
a corporate reliability culture) takes time.
The larger a company is, the longer cultural change takes.
Substantial changes may be made within 5 years, and results start to be noticed within
the first two or three years.
People naturally show some resistance to change, and the way to manage it is different
in each company; when employees assimilate and accept new schemes rather than
resisting them and doubting their usefulness, change is being achieved.
Optimization initiatives usually lose momentum; one of the reasons is that people get
used to the changed relationships and look for new cues on how to act.
If a communication plan has not been implemented as part of the change, those
carrying out the job have time to adjust to the new function and find no reason to start
something new.
Some maintenance organizations have encouraged the continuity of their traditional
processes even when operation and/or production and customer requirements change;
instead of encouraging change, they resist it. It is all too common for maintenance to
respond to changes in operating requirements reactively rather than proactively.

The responsibility of a real maintenance strategist is to accelerate evolution, involving
employees in constant progress. Organizational structures are changing and their size
is being reduced, but not their relevance.
Those responsible for making decisions on systems, equipment and assets need to fully
understand their responsibility and the implications of the decisions they make. Thus,
the arrangements established can be properly defended. In other words, maintenance
strategy should be fully auditable at the level of indicators, methods, tools and
processes.




AUTHOR
Carlos Mario Perez Jaramillo
Mechanical Engineer. Information Systems Specialist Engineer. Asset and Projects
Management Specialist.
MBA in Project Management and Physical Asset Administration.
RCM2 Professional of the Aladon Network. Certified as Endorsed Assessor and Endorsed
Trainer by the Institute of Asset Management.
Maintenance Management and Direction Adviser and Consultant. He has developed
and supported the application of Asset Management Models in food, mining, oil,
petrochemical, textile, utilities, training and power companies.
He has taught RCM, failure analysis, maintenance planning and scheduling, costs,
maintenance management indicators, life cycle cost analysis and the PAS 55 standard
for optimum asset management.
He has worked on RCM dissemination, training and application, maintenance
management and asset management in companies in Ecuador, Peru, Spain, Chile,
Argentina, Cuba, Mexico, Panama, Costa Rica, El Salvador, Guatemala and Colombia.
