
White Paper MTBF: The most misunderstood RAMS parameter

It may seem paradoxical that one of the indicators most widely used by reliability engineers is, at the same time, the most controversial. The MTBF (Mean Time Between Failures) is frequently used by manufacturers of electronic devices as a marketing tool to show how good their products are. Nevertheless, as this article outlines, an MTBF stated in isolation may lead to erroneous interpretations.

Failure rate

Before analyzing the MTBF parameter, the failure rate must be introduced. Let us take it as an immutable truth that any live element, that is, one that is active and working during an operating period, will fail sooner or later. In the first stage of its life, failures tend to occur more frequently because of infant mortality; the failure rate then gradually decreases until it reaches an almost constant level. This second period of the item's life, known as the service lifetime, ends when the component ages and its failure rate rises again. This failure pattern is commonly known as the bathtub curve because of the shape of the failure rate vs. time graph.

SILVER ATENA
Parque Empresarial San Fernando Avda. Castilla, 2 (Edificio Italia-Planta Baja) 28830 San Fernando de Henares, Madrid Tel: Fax: +34 91 659 11 04 +34 91 651 06 18

During the period of constant failure rate, the probability of failure-free operation can be approximated by an exponential curve, $R(t) = e^{-\lambda t}$, where $\lambda$ is the failure rate. Accordingly, and taking into account that the MTBF is defined as $\mathrm{MTBF} = \int_0^{\infty} R(t)\,dt$, it can be concluded that $\mathrm{MTBF} = 1/\lambda$.
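The step from the exponential curve to the MTBF can be written out explicitly. The following is a short worked derivation, assuming the usual textbook forms: a reliability function $R(t) = e^{-\lambda t}$ with constant failure rate $\lambda$, and the MTBF taken as the integral of $R(t)$ over all time. The symbols $R$ and $\lambda$ are the conventional ones and are assumed here.

```latex
\[
\mathrm{MTBF}
  = \int_0^{\infty} R(t)\,dt
  = \int_0^{\infty} e^{-\lambda t}\,dt
  = \left[ -\frac{1}{\lambda}\, e^{-\lambda t} \right]_0^{\infty}
  = \frac{1}{\lambda}
\]
\[
R(\mathrm{MTBF}) = e^{-\lambda \cdot \frac{1}{\lambda}} = e^{-1} \approx 0.368
\]
```

Under these assumptions, only about 37% of items are expected to still be working after a time equal to the MTBF, which already hints that the MTBF is not a typical lifetime.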

This equivalence, $\mathrm{MTBF} = 1/\lambda$, although mathematically correct, nevertheless produces the first headache for those entering the reliability engineering field for the first time, and even for others with broad experience.

How to get the failure rate

The failure rate of a component can be measured as the number of failures per unit of time. However, it is all too often mistaken for the mean failure rate of a single item. Knowing how components are tested in the laboratory to obtain their failure rate may help to understand the right unit to use. So let's consider a light bulb whose failure rate $\lambda$ we wish to calculate. We must imagine a board with a large number x of light bulbs switched on, and wait until one of them goes off, that is, fails. Once a given number z of light bulbs have failed, we can obtain the failure rate as $\lambda = z / (x \cdot T)$, where T is the on-time. This shows that the failure rate is not an indicator related to a single item; it is obtained for a population. Therefore, it does not appropriately represent the mean failure rate of a single item.
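The light-bulb test above can be sketched in a few lines of Python. This is a minimal illustration, assuming the point estimate is the number of failures z divided by the accumulated item-hours x times T (which in turn assumes failed bulbs are replaced, or that z is small compared with x). The function name and the test figures are hypothetical.

```python
def failure_rate(failures: int, items: int, on_time_hours: float) -> float:
    """Point estimate lambda = z / (x * T), in failures per item-hour.

    Assumes the accumulated test time is approximately items * on_time_hours,
    i.e. failed items are replaced or failures are few compared with items.
    """
    if items <= 0 or on_time_hours <= 0:
        raise ValueError("items and on_time_hours must be positive")
    if failures < 0:
        raise ValueError("failures must not be negative")
    return failures / (items * on_time_hours)


# Hypothetical test: 1000 bulbs lit for 500 hours, 5 of them fail.
lam = failure_rate(failures=5, items=1000, on_time_hours=500.0)
mtbf = 1.0 / lam  # item-hours per failure, not the lifetime of one bulb

print(f"failure rate: {lam:.2e} failures per item-hour")
print(f"MTBF:         {mtbf:,.0f} item-hours per failure")
```

Note that the resulting MTBF of 100,000 comes from a test in which no bulb ran for more than 500 hours, so it says nothing about how long a single bulb lasts.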

Fig. 1 Bathtub curve

SA/WHITE PAPER/0001

Ver: 1.0 SILVER ATENA

19-sep-13


A confidence relationship

From the foregoing, the accuracy of a failure rate measured in the laboratory depends mainly on the sample size and on the cumulative test time. The degree of uncertainty of the calculated failure rate is represented by a confidence interval. To evaluate the accuracy of the measured failure rate, the RAMS engineer should know both the probability, as a percentage, that the failure rate lies within the confidence interval and the width of that interval. The confidence level for a failure rate should not be less than 60%.

An all too common mistake

It follows from $\mathrm{MTBF} = x \cdot T / z$ (the inverse of the failure rate $\lambda$) that its unit of measurement is item-time units per failure. In practice, however, the number of items and the number of failures are often ignored and only the time units are considered. This, together with the rather unfortunate expansion of MTBF as mean time between failures, leads us to believe that the average time to the next failure equals the MTBF.

Just by consulting the datasheets of some of the major PC hard-drive manufacturers, we realize that this assumption cannot be right. The MTBFs quoted for these components range from one million to two million hours. Given that one year equals 8760 h, can anyone imagine a hard drive operating without failure for a mean time of 114 to 228 years? If that were so, no one would understand why the warranty period does not exceed 5 years in the best of cases.

What is going on, then? On the one hand, the MTBF's unit of measurement is misused. A hard drive with an MTBF of 1 million hours just means that, for each million hard drives operating for one hour, it can be stated with a confidence of x % that one of them will fail (item-time/failure). On the other hand, the bathtub curve is ignored: as shown above, beyond a certain time the failure rate is no longer constant, and therefore the relationship $\mathrm{MTBF} = 1/\lambda$ is no longer valid.
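The role of the confidence level can be illustrated with a minimal sketch. It assumes a constant failure rate and a test stopped at a fixed cumulative time, so that the number of failures follows a Poisson distribution; the one-sided upper bound on the failure rate is then found by bisection. The function names and the test figures (1,000,000 item-hours, one failure) are hypothetical and not taken from any datasheet.

```python
import math


def poisson_cdf(k: int, mu: float) -> float:
    """P(N <= k) for N ~ Poisson(mu)."""
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def upper_failure_rate(failures: int, item_hours: float, confidence: float) -> float:
    """One-sided upper confidence bound on lambda (failures per item-hour).

    Finds the expected failure count mu for which observing `failures` or
    fewer has probability 1 - confidence, then divides by the test time.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    if failures < 0 or item_hours <= 0:
        raise ValueError("failures must be >= 0 and item_hours > 0")
    target = 1.0 - confidence
    lo, hi = 0.0, 1.0
    while poisson_cdf(failures, hi) > target:  # widen until the root is bracketed
        hi *= 2.0
    for _ in range(200):  # bisection; the CDF decreases as mu grows
        mid = (lo + hi) / 2.0
        if poisson_cdf(failures, mid) > target:
            lo = mid
        else:
            hi = mid
    return hi / item_hours


# Hypothetical test: 1,000,000 accumulated item-hours with a single failure.
item_hours = 1_000_000.0
point_mtbf = item_hours / 1                      # point estimate
lam_upper = upper_failure_rate(1, item_hours, confidence=0.60)
print(f"point-estimate MTBF:     {point_mtbf:,.0f} h")
print(f"MTBF lower bound at 60%: {1.0 / lam_upper:,.0f} h")
```

In this hypothetical case the 60% lower bound on the MTBF comes out at roughly half the point estimate, which is why an MTBF quoted without its confidence level and interval is hard to interpret.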

Service Lifetime

The time-point referred to in the previous section is what in fact determines the end of the item's service lifetime (see the bathtub curve), yet manufacturers usually omit it from their specifications. Undoubtedly, it is much more effective, commercially speaking, to say that your equipment features an MTBF of 1 million hours (more than 100 years) than to say that its expected lifetime is 5 years, that is, the acceptable period of use in service before the failure rate soars because of the aging of the item.

What should we do with the MTBF?

At this point, we might think it better to stop mentioning the MTBF altogether and focus on the item's service lifetime. However, I do not think this would be a good idea: indicators such as these are perfectly adequate as long as their meaning is well understood and they are used correctly. Thus, from now on, it is advisable to keep using the MTBF, but with awareness of its true meaning.

For an item with a high MTBF and a service lifetime x, the survival probability over that service lifetime, $R(x) = e^{-x/\mathrm{MTBF}}$, will be high as well. This figure is really useful for estimating spare parts correctly, with the aim of guaranteeing the availability level that satisfies the final customer's requirements. However, if what we need to know is when we must replace our hardware so that the whole system does not break down, the item's service lifetime is the more useful figure.

To conclude, it would be advisable for manufacturers to make any MTBF value more meaningful by stating the values that seldom accompany it: the probability of lying within the confidence interval, the width of that confidence interval and the service lifetime. These parameters, if well used, are very valuable data that will prevent us from making mistakes in the management and planning of systems in operation.
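The survival probability and the spare-parts estimate discussed above can be sketched as follows. This is a minimal illustration, assuming a constant failure rate throughout the service lifetime, $R(x) = e^{-x/\mathrm{MTBF}}$, and a fleet in which failed units are replaced so that the number of units in operation stays constant. The function names and the fleet size of 1000 units are hypothetical; the MTBF of 1 million hours and the 5-year lifetime are the figures used in the text.

```python
import math

HOURS_PER_YEAR = 8760


def survival_probability(service_hours: float, mtbf_hours: float) -> float:
    """R(x) = exp(-x / MTBF): chance one item survives x hours without failing."""
    return math.exp(-service_hours / mtbf_hours)


def expected_failures(units: int, service_hours: float, mtbf_hours: float) -> float:
    """Expected number of failures in a fleet kept at constant size."""
    return units * service_hours / mtbf_hours


mtbf_hours = 1_000_000
service_hours = 5 * HOURS_PER_YEAR  # 43,800 h of service lifetime

p_survive = survival_probability(service_hours, mtbf_hours)
p_at_mtbf = survival_probability(mtbf_hours, mtbf_hours)
spares = expected_failures(1000, service_hours, mtbf_hours)

print(f"P(survive 5 years):          {p_survive:.3f}")
print(f"P(survive MTBF hours):       {p_at_mtbf:.3f}")
print(f"expected failures, 1000 units over 5 years: {spares:.1f}")
```

Read this way, the MTBF is a planning figure: it indicates how many replacements to stock for a fleet over the service lifetime, while the service lifetime itself indicates when the hardware should be replaced.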
Author: Juan Ramón Ruiz Jiménez (RAMS Senior Consultant at Silver Atena)
