
Practical Statistical Tools for the Reliability Engineer

Reliability Analysis Center


RAC is a DoD Information Analysis Center sponsored by the Defense Technical Information Center and operated by IIT Research Institute, Rome, NY

Ordering No.: STAT

Practical Statistical Tools for the Reliability Engineer


Reliability Analysis Center



The information and data contained herein have been compiled from government and nongovernment technical reports and from material supplied by various manufacturers and are intended to be used for reference purposes. Neither the United States Government nor IIT Research Institute warrant the accuracy of this information and data. The user is further cautioned that the data contained herein may not be used in lieu of other contractually cited references and specifications. Publication of this information is not an expression of the opinion of The United States Government or of IIT Research Institute as to the quality or durability of any product mentioned herein and any use for advertising or promotional purposes of this information in conjunction with the name of The United States Government or IIT Research Institute without written permission is expressly prohibited.


REPORT DOCUMENTATION PAGE
Form Approved OMB No. 0704-0188

Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188), Washington, DC 20503.

1.  Agency Use Only (Leave Blank)
2.  Report Date: September 1999
3.  Report Type and Dates Covered:
4.  Title and Subtitle: Practical Statistical Tools for the Reliability Engineer
5.  Funding Numbers: 65802S
6.  Author(s): Anthony Coppola
7.  Performing Organization Name and Address: Reliability Analysis Center, 201 Mill Street, Rome, NY 13440-6916
8.  Performing Organization Report Number: STAT
9.  Sponsoring/Monitoring Agency Name and Address: Defense Technical Information Center (DTIC-AI), 8725 John J. Kingman Road, Suite 0944, Ft. Belvoir, VA 22060-6218
10. Sponsoring/Monitoring Agency Report Number:
11. Supplementary Notes: SP0700-97-D-4006. Hard copies available from the Reliability Analysis Center, 201 Mill Street, Rome, NY 13440-6916 (Price: $75.00 U.S., $85.00 Non-U.S.)
12a. Distribution/Availability Statement: Approved for public release; distribution unlimited.
12b. Distribution Code:
13. Abstract (Maximum 200 words): This report provides basic instruction in statistics and its applications to reliability engineering. General probability and statistical concepts are explained and specific statistical tools are introduced by considering their applications to measuring reliability, demonstrating reliability, reliability growth testing, sampling, statistical quality control, and process improvement.
14. Subject Terms: Probability; Statistics; Reliability
15. Number of Pages: 120
16. Price Code: $75.00
17.-19. Security Classification (Report / This Page / Abstract): Unclassified
20. Limitation of Abstract:

NSN 7540-01-280-5500
Standard Form 298 (Rev. 2-89), Prescribed by ANSI Std. Z39-18, 298-102


The Reliability Analysis Center (RAC) is a Department of Defense Information Analysis Center sponsored by the Defense Technical Information Center, managed by the Air Force Research Laboratory (formerly Rome Laboratory), and operated by IIT Research Institute (IITRI). RAC is chartered to collect, analyze and disseminate reliability, maintainability and quality information pertaining to systems and products, as well as the components used in them. The RAC addresses both military and commercial perspectives and includes such reliability-related topics as testability, Total Quality Management and lifetime extension.

The data contained in the RAC databases are collected on a continuous basis from a broad range of sources, including testing laboratories, device and equipment manufacturers, government laboratories and equipment users (government and industry). Automatic distribution lists, voluntary data submittals and field failure reporting systems supplement an intensive data solicitation program. Users of RAC are encouraged to submit their reliability, maintainability and quality data to enhance these data collection efforts.

RAC publishes documents for its users in a variety of formats and subject areas. While most are intended to meet the needs of reliability practitioners, many are also targeted to managers and designers. RAC also offers reliability consulting, training, and responses to technical and bibliographic inquiries.

REQUESTS FOR TECHNICAL ASSISTANCE AND INFORMATION ON AVAILABLE RAC SERVICES AND PUBLICATIONS MAY BE DIRECTED TO:

Reliability Analysis Center
201 Mill Street
Rome, NY 13440

General Information: (888) RAC-USER ((888) 722-8737)
Product Ordering: (800) 526-4802
Training Inquiries: (800) 526-4803
TQM Inquiries: (800) 526-4804
Technical Inquiries: (315) 337-9933
TeleFax: (315) 337-9932
DSN: 587-4151
E-Mail: rac@iitri.org
Internet: http://rac.iitri.org

ALL OTHER REQUESTS SHOULD BE DIRECTED TO:

Air Force Research Laboratory
AFRL Information Directorate
Attn: R. Hyle
525 Brooks Road
Rome, NY 13441-4505

Telephone: (315) 330-4857
DSN: 587-4857
TeleFax: (315) 330-7647
E-Mail: hyler@rl.af.mil

© 1999, IIT Research Institute. This material may be reproduced by or for the US Government pursuant to the copyright license under the clause at DFARS 252.227-7013 (Oct. 1988).


PREFACE
Statistical tools are powerful aids to reliability engineering and related disciplines. However, many people, including engineers, consider the process of mastering statistics to be painful. This book is an attempt to provide the reliability practitioner with a reasonable capability in the use of statistical tools without the pain. For this reason, discussion of statistical theory is kept to a minimum, and useful tools are demonstrated by showing their practical application to various reliability engineering tasks.

Reliability Analysis Center (RAC) 201 Mill Street, Rome, NY 13440-6916 1-888-RAC-USER



TABLE OF CONTENTS

1.0 WHAT YOU NEED TO KNOW ABOUT PROBABILITY
    1.1 When Events are Independent
    1.2 When Events are Mutually Exclusive
    1.3 When Events are Not Independent
        1.3.1 Bayes' Theorem
    1.4 In Summary
2.0 INTRODUCTION TO STATISTICS
    2.1 Many Ways to be "Average"
    2.2 Ways to Measure Spread
    2.3 Introduction to Distributions
    2.4 Testing Hypotheses
    2.5 For Further Study
3.0 SOME DISTRIBUTIONS AND THEIR USES
    3.1 Discrete Distributions
        3.1.1 The Binomial Distribution
        3.1.2 The Poisson Distribution
        3.1.3 The Hypergeometric Distribution
    3.2 Continuous Distributions
        3.2.1 The Normal Distribution
            3.2.1.1 The Standard Normal Distribution
            3.2.1.2 The Normal Distribution's Role in Sampling
        3.2.2 Various Other Useful Distributions, In Brief
            3.2.2.1 The Lognormal
            3.2.2.2 The Exponential
            3.2.2.3 The Weibull
            3.2.2.4 The Student t
            3.2.2.5 The F Distribution
            3.2.2.6 The Chi-Square Distribution
    3.3 In Summary
4.0 MEASURING RELIABILITY
    4.1 General Principles
    4.2 The Versatile Weibull Distribution
        4.2.1 Caveats
    4.3 Measuring Reliability of Repairable Systems
        4.3.1 Testing for Trends
        4.3.2 Confidence Limits when the Failure Rate is Constant
    4.4 Measuring Reliability of "One-Shot" Products
5.0 DEMONSTRATING RELIABILITY
    5.1 Zero Failure Tests
    5.2 Tests Allowing Failures
        5.2.1 Controlling the Producer's Risks
    5.3 Testing Under the Exponential Distribution
        5.3.1 Sequential Tests: A Short Cut
    5.4 Other Test Considerations

TABLE OF CONTENTS (CONT'D)

6.0 RELIABILITY GROWTH TESTING
    6.1 Duane Growth Analysis
        6.1.1 Least Square Regression
    6.2 AMSAA Growth Analysis
7.0 SAMPLING (POLLING) AND STATISTICAL QUALITY CONTROL
    7.1 Measuring Quality from Samples
        7.1.1 Caveats
    7.2 Demonstrating Acceptability Through Sampling
    7.3 Statistical Quality Control
        7.3.1 Control Charts
        7.3.2 Control Charts for Variables
        7.3.3 Range Charts
        7.3.4 Interpreting Control Charts
        7.3.5 Controlling Attributes
            7.3.5.1 Proportions
            7.3.5.2 Rates
        7.3.6 Caveat: "In Control" May Not Be "In-Spec"
            7.3.6.1 Measuring Process Capability
            7.3.6.2 Measuring Process Performance
8.0 USING STATISTICS TO IMPROVE PROCESSES
    8.1 Designing Experiments
        8.1.1 Saturated Arrays: Economical, but Risky
        8.1.2 Testing for Robustness
    8.2 Is There Really a Difference?
    8.3 How Strong is the Correlation?
9.0 CLOSING COMMENTS
Appendix A: Poisson Probabilities
Appendix B: Cumulative Poisson Probabilities
Appendix C: The Standard Normal Distribution
Appendix D: The Chi-Square Distribution
Appendix E: The Student t Distribution
Appendix F: Critical Values of the F Distribution for Tests of Significance


LIST OF FIGURES

Figure 2-1: Distribution of Heads in Two Tosses of a Coin
Figure 3-1: Distribution of Heights
Figure 3-2: More Detailed Distribution of Heights
Figure 3-3: Continuous Distribution Curve for Height
Figure 3-4: Standard Normal Distribution
Figure 4-1: Probability of Failure as Represented by the Area Under the Probability Density Function
Figure 4-2: Weibull Plot
Figure 5-1: Devising a Reliability Test
Figure 5-2: Typical Sequential Test
Figure 6-1: Typical Duane Plot
Figure 7-1: Ideal O-C Curve
Figure 7-2: Practical O-C Curve
Figure 7-3: Run Chart
Figure 7-4: Control Chart
Figure 7-5: X-bar and R Chart Combination
Figure 7-6: "p chart" for Different Sample Sizes
Figure 7-7: Process Capability (Cp) Chart
Figure 7-8: Process Performance (Cpk) Chart
Figure 8-1: Scattergram
Figure 8-2: Scattergram of Data in Table 8-10


LIST OF TABLES

Table 1-1: Known Data
Table 1-2: Converted Data
Table 1-3: Summary of Section 1
Table 2-1: Salary Data
Table 2-2: Spread Analysis
Table 2-3: Experimental Data
Table 2-4: Comparison of Results
Table 3-1: Extracts from Appendix A
Table 3-2: Standard Normal Distribution Data
Table 3-3: Critical Values of z
Table 3-4: Summary of Distributions
Table 4-1: Life Data
Table 4-2: Ordered Data
Table 4-3: Completed Data Table
Table 4-4: Failure Data
Table 4-5: Critical Values for the Laplace Statistic
Table 4-6: Chi-Square Values
Table 4-7: Confidence Interval Formulas
Table 5-1: Fixed-Time Reliability Tests
Table 5-2: Sequential Tests
Table 5-3: Sequential Test Plan for 10% Risks, 2.0 Discrimination Ratio
Table 6-1: Growth Data
Table 6-2: Growth Data Revisited
Table 6-3: Comparison of Estimates
Table 7-1: Critical Values of z
Table 7-2: Statistical Constants
Table 8-1: Two Factor Orthogonal Array
Table 8-2: Expanded Test Matrix
Table 8-3: Orthogonal Array
Table 8-4: Sample Test Results
Table 8-5: Three Factor Full-Factorial Array
Table 8-6: Saturated Array (Table 8-2 Modified)
Table 8-7: Testing for Robustness
Table 8-8: Defect Data
Table 8-9: Critical Values for F at 0.05 Significance
Table 8-10: Paired Data
Table 8-11: Data Analysis


INTRODUCTION

This book presents some basic material on probability and statistics and provides examples of how they are used in reliability engineering. To keep the book short and uncomplicated, not all subjects are treated in detail, and many topics are omitted entirely. Nevertheless, this text should help the novice reliability engineer understand the utility of probability and statistics, and can serve as a quick reference and refresher for the experienced engineer.

It is important to remember that reliability engineering is not just the application of probability and statistics, and probability and statistics are not exclusively dedicated to reliability engineering. Reliability engineering is the science of designing products and processes to be reliable. Probability and statistics are simply tools that can help evaluate, predict, and measure reliability, among other uses. However, it is important for every reliability engineer to be able to use these tools effectively.


1.0 WHAT YOU NEED TO KNOW ABOUT PROBABILITY

Statistical inferences will often be expressed as probabilities. Probability can be defined as our degree of belief that an event will occur (e.g., I think I have an even chance of finishing this book on schedule). In statistical analysis, however, probability is usually defined as the expected frequency with which an outcome will occur (e.g., nine times out of ten the actual value of a product's failure rate will be within a range calculated by a particular method). This frequency may be stated as a percentile (e.g., 90% of the time the failure rate will actually be within the calculated range) or as its decimal equivalent (e.g., there is a probability of 0.90 that the failure rate is really in the calculated range). Probability may also be referred to as "confidence" (e.g., there is a 90% confidence that the failure rate is within the calculated range).

In dealing with probabilities, some useful relationships can be applied, depending on certain assumptions.

1.1 When Events are Independent

If we know the probabilities of two events happening, and can assume that the events are independent (i.e., the occurrence of one does not increase or decrease the probability that the other will occur), then the probability of both events happening is:

P(a and b) = P(a)P(b)  (1-1)

where:
P(a and b) = probability of both event "a" and event "b" happening
P(a) = probability of event "a" happening
P(b) = probability of event "b" happening

For example, suppose an airplane uses a satellite receiver to track its position from the Global Positioning System (GPS) and can also track its position from a radio direction finder (RDF) receiver. If the GPS receiver fails once in 100 flights, the probability of losing the GPS tracking capability is 0.01 per flight. If the RDF receiver fails once every fifty flights, the probability of losing the RDF tracking capability is 0.02 per flight.

If we can assume that failure of one does not affect the other (this is not a trivial assumption: both could fail simultaneously from some common cause, for example, lightning hitting the aircraft), then the probability of losing both position tracking systems on the same flight is:

P = (0.01)(0.02) = 0.0002  (1-2)

Or, two times in 10,000 flights both position tracking systems will be out of service.
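The product rule is easy to check numerically. A minimal Python sketch (an illustration added here, not part of the original text; the values are the GPS/RDF failure probabilities from the example above):

```python
# Probability that two independent units both fail on the same flight.
p_gps_fail = 0.01  # GPS receiver fails once in 100 flights
p_rdf_fail = 0.02  # RDF receiver fails once in 50 flights

# Equation 1-1: P(a and b) = P(a)P(b), valid only when the failures are
# independent (no common cause such as a lightning strike).
p_both_fail = p_gps_fail * p_rdf_fail
print(round(p_both_fail, 6))
```

The printed value, 0.0002, matches Equation 1-2: two flights in 10,000 lose both tracking systems.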


1.2 When Events are Mutually Exclusive

When events are mutually exclusive (i.e., the occurrence of one event precludes the other), the probability of either of two events happening is:

P(a or b) = P(a) + P(b)  (1-3)

A useful fact is that the sum of the probabilities of all possible outcomes of an event must equal unity. Further, the probability that an event will occur (P) plus the probability that it will not occur (Q) must also equal one, since there are no other possibilities. Thus:

P + Q = 1, or P = (1 - Q), or Q = (1 - P)  (1-4)

If the probability of a GPS receiver failure is 0.01, then the probability of no failure is (1 - 0.01) or 0.99. Often, it is much easier to calculate one of the parameters (P or Q) than the other. Since P + Q = 1, one parameter can always be found from the other.

Continuing the position tracking example, and the assumption of independence, the probability of a flight without total loss of position tracking capability would be:

P(s) = P(g)P(r) + P(g)Q(r) + Q(g)P(r)  (1-5)

where:
P(s) = probability of success (no total loss of all position tracking systems)
P(g) = probability of no failure in GPS receiver = (1 - 0.01) = 0.99
Q(g) = probability of failure in GPS receiver = 0.01
P(r) = probability of no failure in RDF receiver = (1 - 0.02) = 0.98
Q(r) = probability of failure in RDF receiver = 0.02

Hence:

P(s) = (0.99)(0.98) + (0.99)(0.02) + (0.01)(0.98) = 0.9702 + 0.0198 + 0.0098 = 0.9998  (1-6)

Each of these events is mutually exclusive and they constitute all the "successful" situations. The other possibility is Q(g)Q(r), the probability that both the GPS and RDF receivers will fail, which was computed by Equation 1-2 as equal to 0.0002. From Equation 1-4:

P(s) = 1.0 - Q(g)Q(r) = 1.0 - 0.0002 = 0.9998  (1-7)
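The bookkeeping above can be verified by enumerating all four mutually exclusive outcomes of a flight. A Python sketch (an illustration added here; the probabilities are the ones from the running example):

```python
from itertools import product

# Failure probabilities for the two (assumed independent) receivers.
p_g_fail, p_r_fail = 0.01, 0.02

# Enumerate the four mutually exclusive outcomes of a flight:
# each receiver either operates (True) or fails (False).
outcomes = {}
for g_ok, r_ok in product([True, False], repeat=2):
    p = (1 - p_g_fail if g_ok else p_g_fail) * (1 - p_r_fail if r_ok else p_r_fail)
    outcomes[(g_ok, r_ok)] = p

# The outcomes are exhaustive and mutually exclusive, so they sum to 1 (Eq. 1-4).
assert abs(sum(outcomes.values()) - 1.0) < 1e-12

# Equation 1-5: success means at least one receiver is still operating.
p_success = sum(p for (g, r), p in outcomes.items() if g or r)

# Equation 1-7: the complement shortcut gives the same answer.
p_success_shortcut = 1.0 - outcomes[(False, False)]
assert abs(p_success - p_success_shortcut) < 1e-12

print(round(p_success, 6))
```

Both routes print 0.9998, illustrating why it is often easier to compute Q and use P = 1 - Q.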

Note: in this example, an event can be defined either as the occurrence of a failure or as the lack of a failure. Hence P(i) can be the probability of no failure or the probability of failure in the component identified as (i). By convention, P(i) is usually the probability of success (no failure), and Q(i) the probability of failure, when both notations are used in one formula. P(i) is generally used when only one notation is needed, whether it refers to a failure event or a non-failure event.

Reliability Analysis Center (RAC) 201 Mill Street, Rome, NY 13440-6916 1-888-RAC-USER

STAT

Section 1: What You Need to Know About Probability

We are following this convention, even though this reverses the meaning of P(i) from the previous example. Q(g)Q(r) in Equation 1-7 is identical in meaning to P(a)P(b) in Equation 1-1.

Finally, consideration of mutually exclusive events leads to another solution for the case of independent events, shown in Equation 1-8:

P(s) = P(g) + P(r) - P(g)P(r)  (1-8)

where all terms are as defined for Equation 1-5. The rationale for this is that P(g) includes all of the cases in which the GPS receiver is operating, including both the times that the RDF receiver is operating and the times that it has failed. Similarly, P(r) includes all of the cases in which the RDF receiver is operating, including both the times the GPS receiver is operating and the times it fails. Thus, P(g) + P(r) twice counts the times that both the GPS and RDF receivers are operating, and so these times must be subtracted to yield P(s). This is easily proven by decomposing P(g) and P(r) into mutually exclusive events:

P(g) = P(g)P(r) + P(g)Q(r)  (1-9)

P(r) = P(r)P(g) + P(r)Q(g) = P(g)P(r) + Q(g)P(r)  (1-10)

Substituting Equations 1-9 and 1-10 into Equation 1-8 we get:

P(s) = P(g)P(r) + P(g)Q(r) + P(g)P(r) + Q(g)P(r) - P(g)P(r)
     = P(g)P(r) + P(g)Q(r) + Q(g)P(r)  (1-11)

which is the same result as Equation 1-5.

1.3 When Events are Not Independent

In the examples given in Sections 1.1 and 1.2, a failure of the GPS receiver does not change the probability that the RDF receiver will also fail. This is not always true. Suppose one-tenth of all failures in the GPS are due to external events, like lightning strikes, which also take out the RDF. Then, our calculation of the probability of both receivers failing becomes more complicated.

First, we need a new term: P(b|a), defined as the conditional probability that event "b" will occur, given that event "a" has occurred. Then:

P(a and b) = P(a)P(b|a)  (1-12)

where:
P(a and b) = the probability that both events "a" and "b" will occur
P(a) = the probability that event "a" will occur
P(b|a) = the probability that event "b" will occur, given that event "a" occurs
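Before moving on, the equivalence of the mutually exclusive decomposition (Equation 1-5) and the inclusion-exclusion form (Equation 1-8) can be checked numerically. A short Python sketch (an illustration added here, using the receiver probabilities from the running example):

```python
# P(g), P(r): probabilities that each receiver operates without failure.
P_g, P_r = 0.99, 0.98
Q_g, Q_r = 1 - P_g, 1 - P_r

# Equation 1-5: sum of the three mutually exclusive "success" outcomes.
p_s_exclusive = P_g * P_r + P_g * Q_r + Q_g * P_r

# Equation 1-8: adding P(g) and P(r) double-counts the cases where both
# receivers operate, so subtract P(g)P(r) once.
p_s_incl_excl = P_g + P_r - P_g * P_r

# The two forms agree, as the algebra of Equations 1-9 to 1-11 predicts.
assert abs(p_s_exclusive - p_s_incl_excl) < 1e-12
print(round(p_s_incl_excl, 6))
```

Either form prints 0.9998, matching Equations 1-6 and 1-7.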


Since "a" and "b" are arbitrary labels, Equation 1-12 can also be written: P(a and b) = P(b)P(a|b) (1-13)

If the events were independent, P(b|a) = P(b) and P(a|b) = P(a), and Equations 1-12 and 1-13 would be identical in form to Equation 1-1. Since the events are not independent, we must do a little more work.

If P(a and b) is the probability of both the GPS and RDF receivers failing, and P(a) is the probability of the GPS failing, then P(b|a) is the probability of the RDF failing on a flight when the GPS failed. We know that one-tenth (10% or 0.10) of all GPS failures are caused by factors that also kill the RDF (probability of RDF failure = 1.0). This means that for the other 90% of GPS failures, any RDF failures must be from other causes, which have some probability of occurrence whether or not the GPS has failed. To determine this probability, we could search our records using just those flights when there was no failure of the GPS, thus eliminating any effects of GPS failure. Suppose we found that the failure rate for the RDF, using the censored data, was 19 failures in 1,000 flights = 0.019. Hence:

P(b|a) = 0.10(1.0) + 0.90(0.019)  (1-14)

Using Equations 1-12 and 1-14, the probability of both receivers failing is:

P(a and b) = P(a)P(b|a) = 0.01[0.10(1.0) + 0.90(0.019)] = 0.01[0.10 + 0.0171] = 0.001 + 0.000171 ≈ 0.00117  (1-15)
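The common-cause calculation of Equations 1-14 and 1-15 follows mechanically from the stated inputs. A Python sketch (an illustration added here; the inputs are the assumptions given in the text):

```python
p_gps_fail = 0.01             # P(a): probability the GPS fails on a flight
common_cause_fraction = 0.10  # fraction of GPS failures that also kill the RDF
p_rdf_indep = 0.019           # RDF failure rate from flights with no GPS failure

# Equation 1-14: conditional probability of an RDF failure, given a GPS failure.
# With probability 0.10 the cause is common (RDF fails for certain); otherwise
# the RDF fails only from its independent causes.
p_rdf_given_gps = common_cause_fraction * 1.0 + (1 - common_cause_fraction) * p_rdf_indep

# Equations 1-12 and 1-15: probability that both receivers fail on one flight.
p_both = p_gps_fail * p_rdf_given_gps
print(round(p_both, 6))
```

This prints the value of Equation 1-15, which exceeds the 0.0002 obtained under the independence assumption.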

This result is higher than the 0.0002 found by Equation 1-2, where we assumed independence, even though the RDF failure rate, exclusive of simultaneous failures, is higher in Equation 1-2 than it is in Equation 1-15. When some failure mechanisms take out both units simultaneously, the overall probability of both units failing must go up.

1.3.1 Bayes Theorem

A noted derivation from conditional probabilities is Bayes Theorem. For our discussion, let us consider a radar installed in an aircraft. Assume we have gathered some statistics as shown in Table 1-1.

Table 1-1: Known Data

Mission Profile   Percent of Sorties Using   Probability of Radar
                  Mission Profile            Failure During Mission
Combat            0.20                       0.20
Training          0.20                       0.10
Transport         0.60                       0.05

Letting event ai represent the probability of a sortie using a specific mission profile, and event b represent a radar failure, we can convert the data in Table 1-1 to terms of probabilities and conditional probabilities. Table 1-2 shows the converted data.


Section 1: What You Need to Know About Probability

Table 1-2: Converted Data


i   Mission Profile   P(ai)   P(b|ai)
1   Combat            0.20    0.20
2   Training          0.20    0.10
3   Transport         0.60    0.05

Suppose we are given the information that an aircraft came back from a sortie with a failed radar. We can use the information to calculate the probability that the sortie was a combat mission. To do so, we must derive Bayes Theorem. Since P(a and b) = P(a|b)P(b) = P(b|a)P(a), it follows that, for each event ai:

P(ai and b) = P(ai|b)P(b) = P(b|ai)P(ai)   (1-16)

Therefore:

P(ai|b) = P(b|ai)P(ai)/P(b)   (1-17)

Since P(a1) is the probability of a combat mission, the solution we seek is:

P(a1|b) = P(b|a1)P(a1)/P(b)   (1-18)

Since event "a" is a set of mutually exclusive events, and there is a different conditional probability of event "b" happening for each event in set "a", the total probability of event "b" happening is:

P(b) = Σi P(b|ai)P(ai)   (1-19)

Substituting Equation 1-19 into Equation 1-18:

P(a1|b) = P(b|a1)P(a1) / Σi P(b|ai)P(ai)   (1-20)

In the equation, P(a1|b) is the probability of the returned airplane having flown a combat profile, given that it came back with a failed radar. The other terms are quantified in Table 1-2. Equation 1-20 is Bayes Theorem. Substituting the data from Table 1-2 into Equation 1-20:

P(a1|b) = (0.20)(0.20) / [(0.20)(0.20) + (0.20)(0.10) + (0.60)(0.05)]
        = 0.04 / (0.04 + 0.02 + 0.03)
        = 0.04 / 0.09 = 0.44   (1-21)
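The Bayes calculation of Equations 1-19 through 1-21 is easy to mechanize; a Python sketch (the list names are ours, not from the text):

```python
# Bayes Theorem (Equation 1-20) applied to the radar data of Table 1-2.
profiles = ["Combat", "Training", "Transport"]
p_a = [0.20, 0.20, 0.60]          # P(ai): fraction of sorties per profile
p_b_given_a = [0.20, 0.10, 0.05]  # P(b|ai): radar failure probability per profile

# Equation 1-19: total probability of a radar failure on a random sortie
p_b = sum(pa * pb for pa, pb in zip(p_a, p_b_given_a))

# Posterior probability of each profile, given that the radar failed
posterior = [pa * pb / p_b for pa, pb in zip(p_a, p_b_given_a)]
print(round(posterior[0], 2))  # → 0.44 (combat, up from the 0.20 prior)
```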


Without the knowledge of the radar failure, we would have estimated the probability of the aircraft having just flown a combat profile at 0.20, based on the data in the second column of Table 1-1. The information that the radar failed raises our estimate to 0.44. The 0.20 figure is called the "prior" estimate, because it comes before the gathering of additional data (i.e., that the radar failed on the mission), and the 0.44 figure is called the "posterior" estimate, because it comes after the new data is considered.

Bayes Theorem is the foundation of both useful and dubious analyses combining "prior" data with information from statistical sampling to produce a "posterior" estimate. The theorem itself is quite correct. Some applications that depend on "subjective priors" (i.e., the known information is an opinion or assumption rather than a conclusion from a set of statistics) can be questionable.

1.4 In Summary

Table 1-3 summarizes the material presented in this section.

Table 1-3: Summary of Section 1

When events are:     And the following apply:                    You can use:
Independent          The occurrence of one event has no          P(a and b) = P(a)P(b)
                     effect on the occurrence of the other       P(a or b) = P(a) + P(b) - P(a and b)
Mutually exclusive   The occurrence of one event precludes       P(a or b) = P(a) + P(b)
                     the other                                   P(a) + Q(a) = 1
                     P(a) = probability event "a" occurs         P(a) = 1 - Q(a)
                     Q(a) = probability event "a" does not occur Q(a) = 1 - P(a)
Not independent      The occurrence of one event may affect      P(a and b) = P(a)P(b|a) = P(b)P(a|b)
                     the other; one event may have several
                     different outcomes, each affecting the      P(a1|b) = P(b|a1)P(a1) / Σi P(b|ai)P(ai)
                     other event differently


Section 2: Introduction to Statistics

2.0 INTRODUCTION TO STATISTICS

"Statistics" as a mathematical discipline is concerned with describing something in useful numerical terms (e.g., the average salary for a reliability engineer) or with providing the means for making useful inferences about something (e.g., the percent of a product that will be outside specified limits). These products are based on measured values, such as the number of defects in a sample of parts. One measured value is called a "statistic" and more than one are called "statistics".

The term random variable refers to the function that describes the values the statistics can take. For example, the outcome of a toss of a coin is a random variable that can take the value "heads" or "tails". The number of heads in two tosses of a coin is a random variable that can take the values 0, 1, or 2. (A case could be made that the constraints on a random variable make it neither truly random nor greatly variable, but we shall have to take the term as given.)

A variety of statistical tools are used to convert statistics (i.e., measurements) into the desired descriptions or inferences. These will be discussed in the following sections, organized by the use of the results. Common to all of these tools will be the terminology and concepts discussed in this section.

2.1 Many Ways to be "Average"

The word "average" is used to describe a central point for the values taken by a random variable. However, more specific measures are needed in statistical analysis. To illustrate, consider the following (completely contrived) data shown in Table 2-1, which we will say represents the salaries of reliability engineers in a mythical company, ordered from smallest to largest.

Table 2-1: Salary Data
$20,000 20,000 20,000 40,000 50,000 75,000 125,000

What is the average salary for a reliability engineer in the company? (We suggest you form your own opinion before going on.) The company claims its reliability engineers earn an average of $50,000. The engineers usually figure the average salary as $40,000, and the company union claims it's $20,000. And they are all right! The differences arise from different definitions of "average", and the different definitions are based on the different uses of the measure. The company is interested in what it pays for reliability engineers and calculates the "average" using the arithmetic mean (sum the values and divide by the number of data points). This gives their cost per capita for reliability engineers.
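The three competing "averages" can be verified directly; a sketch using Python's standard `statistics` module (Python itself is our choice, not the handbook's):

```python
import statistics

# Salary data from Table 2-1
salaries = [20_000, 20_000, 20_000, 40_000, 50_000, 75_000, 125_000]

mean = statistics.mean(salaries)      # company's "average": arithmetic mean
median = statistics.median(salaries)  # engineers' "average": 50th percentile
mode = statistics.mode(salaries)      # union's "average": most frequent value

print(mean, median, mode)  # → 50000 40000 20000
```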


The engineers are interested in their standing vis-à-vis each other and compute the "average" based on the median (the value for which there are as many data points above as below, i.e., the 50th percentile). The union uses the mode (the value most frequently measured) as its "average" because it represents the most people. Thus the definition of "average" depends on its use. In most of the methods described below, the arithmetic mean will be the definition of choice. We will call this simply "the mean," ignoring the existence of the geometric mean, harmonic mean, and others, which do exist but will not be needed for our purposes. Any "average" measure not using the arithmetic mean will be noted when it occurs.

2.2 Ways to Measure Spread

Besides measuring central tendency, we often want to measure variation, or spread. For example, if we were producing rods designed to be one inch long, we would want the mean length of a production sample to be close to one inch. However, if the mean length were one inch, but the individual rods varied from one-half inch to one and one-half inches in length, we would probably not be happy. Hence, in specifying or measuring product parameters, we are usually concerned with measures of spread.

The most obvious measure of spread is, perhaps, the range, defined as the difference between the highest and lowest values of the parameter of interest. Specified values most often include a stated range or "tolerance" around the design value (e.g., "rods shall be one inch long plus or minus 0.001 inch"). This establishes a desired limit on spread, if a somewhat arbitrary one (is 1.0009 inches always good and 1.0011 inches always bad?). We seldom reject a group (or a "lot") of products because some units measure outside the specified range. We do reject groups of products that we feel have too many units outside the specified range.
To determine what proportion of the product group is outside the specified range, we need another measure of spread. One way of measuring spread would be to take a sample of the product, measure the parameter of interest in each unit of the sample and compare these measurements to the mean value of the measurements. However, we could not merely subtract each measurement from the mean (or the mean from each measurement), sum these results and divide by the number of units measured. Because some measurements will be higher than the mean and others lower, our results would tend to zero. For example, suppose we measured the length of a sample of rods, and calculated the difference between each measurement and the mean, as shown in Table 2-2.


Table 2-2: Spread Analysis


Sample            Rod Length in Inches   Deviation from Mean Value
1                 0.50                   -0.50
2                 0.90                   -0.10
3                 1.00                    0
4                 1.10                   +0.10
5                 1.50                   +0.50
Sum of all Data   5.00                    0
Mean Value        1.00                    0

We therefore need a better way to measure spread. We could use the absolute values of the differences between the individual measurements and the mean, but a more common alternative is to use squared values. One measurement of spread, called the variance, is formulated by subtracting the mean value from each measurement (x - x̄), squaring the results (x - x̄)², summing the squares, and dividing by the number of measurements: Σ(x - x̄)²/n. A more convenient measure is the standard deviation, which is merely the square root of the variance (Equation 2-1). The larger the standard deviation, the more spread there is to the data.

standard deviation = √[ Σ(x - x̄)² / n ]   (2-1)
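Equation 2-1 applied to the rod data of Table 2-2 (a sketch; this is the population form of the standard deviation, dividing by n as the text does):

```python
import math

# Rod lengths from Table 2-2
lengths = [0.50, 0.90, 1.00, 1.10, 1.50]

mean = sum(lengths) / len(lengths)  # 1.00 inch

# Naive signed deviations cancel to zero, as Table 2-2 shows
signed_sum = sum(x - mean for x in lengths)

# Equation 2-1: variance and standard deviation
variance = sum((x - mean) ** 2 for x in lengths) / len(lengths)
std_dev = math.sqrt(variance)
```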

2.3 Introduction to Distributions

We have discussed the random variable for the number of heads occurring in two tosses of a coin. Suppose we are interested in wagering on the outcome of two tosses. We would want to know the probabilities of all the possible outcomes, so we could establish appropriate odds. (Note: the use of gambling examples is common in statistics, which was actually created to analyze gaming odds. Gaming examples are quite analogous to some engineering applications, where we are interested in the odds of making a wrong decision based on a set of statistics.)

Returning to our example, we could experiment by tossing a coin twice and counting the number of heads. One such experiment would not be any help, but many replications, say 1,000, would give a useful set of statistics. We would then have to organize these statistics into a useful format. There are several ways to do this. One is simply as shown in Table 2-3.

Table 2-3: Experimental Data

Outcome (Number of Heads)   Frequency (in 1,000 Experiments)
0                           238
1                           509
2                           253

A better way, because it graphically shows the relative frequencies, is shown in Figure 2-1, called a histogram.


[Histogram: frequencies of 238, 509, and 253 (in 1,000 trials) for outcomes of 0, 1, and 2 heads]

Figure 2-1: Distribution of Heads in Two Tosses of a Coin

The histogram of Figure 2-1 shows a frequency distribution. The next step towards determining the odds on each outcome would be to convert the frequency distribution to a probability distribution. This could be done by dividing the value of each column by the number of experiments. This makes the total area enclosed by the data equal to one, and the area of each column equal to the probability of the outcome it represents. The odds for a fair bet are then simply the ratio of the probability of the outcome to one minus the probability. For example, the probability of no heads is 0.238, so the odds are 0.238 to (1 - 0.238), or 0.238 to 0.762. One betting on the outcome could put up $2.38 against $7.62 and should break even in the long run, if the computed probability holds.

Now, for comparison, we shall compute the same probabilities from a theoretical approach, using probability theory. Let us assume our coin is fair, meaning that it has no bias towards falling either heads or tails, and that our method of tossing is also unbiased. We would therefore expect heads or tails to be equally likely. This means the probability of a head is 0.5 on each toss, and the probability of not getting a head is (1 - 0.5), which is the same as 0.5, but the longer expression shall be used in the following formulas so that it will be clear which probability is meant.

From probability theory, the probability of no heads in two tosses is (1 - 0.5)(1 - 0.5) = 0.25. The probability of one head in two tosses is the sum of the probability of getting heads on the first toss and not on the second and the probability of getting heads on the second toss and not the first: (0.5)(1 - 0.5) + (1 - 0.5)(0.5) = 0.50. The probability of getting two heads is (0.5)(0.5) = 0.25. Let us now compare these to our experimental results, as shown in Table 2-4.

Table 2-4: Comparison of Results
Outcome (Number of Heads)   Experimental Results   Theoretical Results
0                           0.238                  0.25
1                           0.509                  0.50
2                           0.253                  0.25
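The coin-toss experiment is easy to replicate in simulation; a sketch (the seed and trial count are arbitrary choices, not from the text):

```python
import random

random.seed(1)  # fixed seed so the hypothetical experiment is reproducible
trials = 10_000

# Count heads in two tosses, repeated many times
counts = [0, 0, 0]
for _ in range(trials):
    heads = random.randint(0, 1) + random.randint(0, 1)
    counts[heads] += 1

observed = [c / trials for c in counts]
theoretical = [0.25, 0.50, 0.25]
# The observed proportions scatter around the theoretical values;
# the residual difference is experimental error.
```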

The experimental results show a little bias towards getting more heads than tails, but would you bet on it? The difference could just be experimental error, which always exists. Many of the


methods described in this book will be concerned with separating experimental error from true indications of bias. Instead of trying to decide whether or not we have a fair coin, we will try to decide such things as how many failures it will take to convince us that a product is not as reliable as we expected. Many different statistical distributions will be useful in making such decisions. Some, like the one shown in our example, will be discrete distributions, having only integers (e.g., number of failures) as possible outcomes. Others will be continuous distributions, where an infinite delineation in outcomes is possible (for example, times between failures). We will sort these out in Section 3.

2.4 Testing Hypotheses

One of the specialties of statistics is called hypothesis testing, which has some applications to reliability engineering that we will cover later in Sections 4, 5 and 7. It involves using statistical analysis to come to a conclusion to accept or reject a stated hypothesis with a known risk of being in error. A hypothesis might be that a product has a given mean time between failures (which can be one we want it to have or one we don't want). This is called the null hypothesis. With each null hypothesis is an associated alternate hypothesis. This is usually merely the negation of the null hypothesis. If the null hypothesis is that a product has a certain MTBF, the alternate hypothesis is that it does not have that MTBF. However, there are other ways. In sequential reliability testing (Section 5.3.1) the null hypothesis is that a product has a specific MTBF that is considered desirable and the alternate hypothesis is that it has a specific MTBF that is considered undesirable. In any event, each hypothesis test can have any of four results:

1. Null hypothesis is true and is accepted. This is a correct result, which may be the desired result or not. If the null hypothesis is that a product has a poor MTBF, we would probably prefer it to be refuted, but whether we like it or not, we do want a correct conclusion.

2. Null hypothesis is true and is rejected. This is called a Type I error in hypothesis testing. The probability of it occurring is conventionally called "α". If the null hypothesis is that the product has a parameter (e.g., MTBF, failure rate, number of defects, et al.) which is acceptable, "α" is the probability that we would conclude the product is not acceptable. In this case, the "producer's risk" defined in Section 5 and Section 7 would be equal to α, since it is the probability that a customer will not accept a good product.

3. Alternate hypothesis is true and is accepted (null hypothesis rejected). This is another correct result, and whether we prefer the null hypothesis or the alternate hypothesis, we want a correct result.

4. Alternate hypothesis is true and is rejected (null hypothesis accepted). This is called a Type II error and its probability of occurrence is conventionally called "β". The "consumer's risk" discussed in Sections 5 and 7 represents the probability of accepting a bad product, and is equal to β when the rejected alternate hypothesis is that the product is bad (i.e., has an unacceptable MTBF, failure rate, number of defects, etc.).
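As a concrete, hypothetical illustration of setting α: suppose a fixed-length test is expected to produce a = 2.0 failures if the null hypothesis (an acceptable failure rate) is true, and we reject the product only when the observed failure count is improbably high. The smallest rejection threshold that keeps the Type I risk at or below α = 0.05 can be found from the Poisson distribution (introduced in Section 3.1.2); the numbers here are our own assumptions:

```python
import math

def poisson_pmf(x, a):
    # Probability of exactly x events when a events are expected
    return a ** x * math.exp(-a) / math.factorial(x)

alpha = 0.05  # chosen Type I risk (assumed for this sketch)
a = 2.0       # expected failures under the null hypothesis (assumed)

# Find the smallest c such that P(c or more failures) <= alpha
c = 0
while 1 - sum(poisson_pmf(x, a) for x in range(c)) > alpha:
    c += 1
# Rejecting only when c or more failures occur keeps the probability
# of rejecting a good product at or below alpha.
```

With these assumed numbers the rule becomes "reject at 6 or more failures": rejecting at 5 would carry a Type I risk of about 0.053, just over the chosen α.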


Basically, the hypothesis test is performed by defining an acceptable value of "α" and comparing sample data to a distribution representing the expected results when the null hypothesis is true. If the distribution would produce the sample data with a probability of α or less, the data is considered sufficiently unlikely to have come from the distribution, and the null hypothesis is rejected. Otherwise, the null hypothesis is accepted. Hypothesis tests can also be defined based on "β" and on both risks, as we shall see in Sections 5 and 7.

2.5 For Further Study

Some recommended references in probability and statistics, ranked by increasing level of difficulty, are:

Introductory:
- Probability and Statistics, Stephen S. Willoughby, Silver Burdett Company, Morristown, NJ, 1968.
- The Cartoon Guide to Statistics, Larry Gonick & Woollcott Smith, HarperCollins Publishers, New York, NY, 1993.

Intermediate:
- Basic Statistics, M.J. Kiemele & S.R. Schmidt, Air Academy Press, Colorado Springs, CO, 1990.
- Statistical Methods in Engineering and Manufacturing, John E. Brown, Quality Press, Milwaukee, WI, 1990.

Advanced:
- Methods for Statistical Analysis of Reliability & Life Data, N.R. Mann, R.E. Schafer & N.D. Singpurwalla, John Wiley & Sons, New York, NY, 1974.
- Military Handbook 338-1B, Electronic Reliability Design Handbook, U.S. Department of Defense, Washington, DC, 1997.


Section 3: Some Distributions and Their Uses


3.0 SOME DISTRIBUTIONS AND THEIR USES

Statistical analysis of practical problems often requires consideration of the distribution of the data. There are non-parametric methods that do not assume any distribution for the data. Analysis using a distribution is invariably advantageous, however, either because it is simpler or because it gives more precise results. This, of course, requires the assumption of a distribution for the data, and the usefulness of the results depends on the assumption being reasonable. No one distribution fits all data, and more than one distribution may describe a set of data, depending on the problem addressed. As an example, the distribution of the lengths of a lot of rods may follow a normal distribution, but the number of rods with lengths out of specified limits is described by a binomial distribution. So there are a variety of distributions we may find useful.

3.1 Discrete Distributions

Discrete distributions are concerned with random variables which are integers. We will use these distributions to determine the probabilities that certain outputs will be experienced, such as the probability of no failures, of less than "x" failures, of more than "y" failures, etc. The most useful discrete distributions are the binomial and the Poisson. The hypergeometric distribution is also of interest to reliability engineers.

3.1.1 The Binomial Distribution

The binomial distribution, as the name implies, is concerned with "yes-no" outcomes. Either a person is over six feet tall or he is not. Of more interest, either a product has failed or it has not. It assumes that the probability of an event is the same in every trial. Then:

f(x) = [n! / (x!(n - x)!)] p^x q^(n - x)   (3-1)

where:
f(x) = probability of exactly "x" events occurring in "n" trials
p = probability of a successful trial (success = the event happened)
q = probability of an unsuccessful trial = (1 - p)

The product p^x times q^(n - x) is simply the probability of "x" successes and "n - x" failures in "n" trials. (Note: success of a trial could be the occurrence of a product failure and failure of a trial the non-occurrence of a product failure, or vice-versa.) However, there is more than one way for this to happen. There could be "x" successful trials followed by "n - x" failures, or one success, "n - x" failures, then "x - 1" successes, etc. Each of these is a mutually exclusive way of getting the result of interest. The term

n! / (x!(n - x)!)   (3-2)

is a counting formula giving the total number of different ways one can have exactly "x" successes in "n" trials. Since each way is presumably equally likely and its probability of


occurrence equal to p^x times q^(n - x), multiplying the two expressions gives the total probability of exactly "x" successes in "n" trials.

For example, to use Equation 3-1 to determine the probability of an aircraft with two engines getting through a flight with no engine failures, we can define p as the probability that a randomly selected engine will fail during a flight (hence, q = probability the engine will not fail during a flight). Let us assume that experience indicates that the frequency of engine failures is 10% per flight. Therefore, p = 0.10 and q = 0.90. Noting that n = number of engines = 2, and x = number of failures we are interested in = 0, Equation 3-1 becomes:

P(0) = [2! / (0!(2 - 0)!)] (0.10)^0 (0.90)^(2 - 0) = (1)(0.90)^2 = 0.81   (3-3)

Quite often we are interested in the probability of getting "x or less", or "x or more", successes, rather than merely "x". For example, in testing one-shot devices, such as missiles, we may want to know the probability of passing a test of 50 firings with no more than two failures allowed. Or, we may have six radios in a cockpit and only need four for a successful mission. We may then want to know the probability of having more than two failures during a mission. To answer such questions, we note that we can have from zero to "n" successes in "n" trials, and the probabilities of these events are mutually exclusive. Hence, the probability of "x or less" events is:

P(x or less) = Σ (i = 0 to x) [n! / (i!(n - i)!)] p^i q^(n - i)   (3-4)

And the probability of "x or more" events is:

P(x or more) = Σ (i = x to n) [n! / (i!(n - i)!)] p^i q^(n - i)   (3-5)

As always, it is useful to remember:

P(x or less) = 1 - P(x + 1 or more)   (3-6)

To illustrate, suppose an airplane with two engines could still fly safely with one engine failed. The probability of a successful flight would then be equal to the probability of one or less failures occurring. Using Equation 3-4, and letting p = probability of engine failure = 0.10, as before, we get:

P(1 or less) = [2! / (0!(2 - 0)!)] (0.10)^0 (0.90)^(2 - 0) + [2! / (1!(2 - 1)!)] (0.10)^1 (0.90)^(2 - 1)
             = (1)(0.90)^2 + 2(0.10)(0.90)
             = 0.81 + 0.18 = 0.99   (3-7)
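The two-engine example can be checked with a small helper; a sketch in Python (the function name `binomial_pmf` is ours, not from the text):

```python
from math import comb

def binomial_pmf(x, n, p):
    # Equation 3-1: probability of exactly x successes in n trials,
    # using math.comb for the counting formula of Equation 3-2
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

p_fail = 0.10   # per-flight engine failure probability from the example
n_engines = 2

p_no_failures = binomial_pmf(0, n_engines, p_fail)                       # Eq. 3-3
p_one_or_less = sum(binomial_pmf(x, n_engines, p_fail) for x in (0, 1))  # Eq. 3-4
```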


The same result would have been obtained by finding the probability of two engine failures, using Equation 3-5 and subtracting this from one, per Equation 3-6, as the reader may wish to verify.

3.1.2 The Poisson Distribution

The Poisson distribution can be considered an extension of the binomial distribution when "n" is infinite. It assumes events occur at a constant average rate. These events could be product failures, or roses sold in a flower shop, etc. The number of events occurring in any interval is independent of the number in any other interval (e.g., a recent failure does not make another failure more or less likely). Under these assumptions, we can determine an expected number of events in any given interval (e.g., the expected number of failures in a mission is the failure rate "λ" times the mission length "t"). If there is any doubt that events are occurring at a constant rate, the trend test in Section 4.3.1 may be used to test for constancy. Should a non-constant rate apply, the methods in Section 6, Reliability Growth Testing, may be more appropriate.

When the assumptions of a constant failure rate and independent failures do hold, for an expected number of events "a" the probability of getting exactly "x" events is:

f(x) = a^x e^(-a) / x!   (3-8)

When a = λt, the probability of zero failures is:

P(0) = e^(-λt)   (3-9)

Equation 3-9 is a basic equation in reliability engineering. It calculates the probability of a product completing a mission of time "t" without failure (under the assumptions that λ is constant with time, and that failures are independent events). We can use the Poisson distribution to determine the probability of passing a test run for a fixed time with an allowable number of failures by the expression:

P(n or less) = Σ (x = 0 to n) a^x e^(-a) / x!   (3-10)

where: n = the number of failures allowed

The probability of failing the test would be:

P(n + 1 or more) = Σ (x = n + 1 to ∞) a^x e^(-a) / x!   (3-11)

or more practically:

P(n + 1 or more) = 1 - Σ (x = 0 to n) a^x e^(-a) / x!   (3-12)

It is not necessary to compute Equations 3-8, 3-10 or 3-11. Tables of Poisson probabilities (probability of "x" events when "a" are expected; i.e., solutions to Equation 3-8) have been tabulated. One such table is presented in Appendix A. Also tabulated are cumulative Poisson probabilities giving the probabilities of "x" or less events when "a" are expected (solutions to Equation 3-10), and even cumulative probabilities of "x" or more events (solutions to Equation 3-11). Appendix B provides a table with solutions to Equation 3-10. The reader may use Equation 3-12 to convert these to solutions of Equation 3-11.

As an example, suppose we tested repairable products and would consider the product acceptable if no more than two failures occurred during the test. We assume the failure rate is constant. If the product had a failure rate (λ) and was tested for the test time (t), the expected number of failures for the product would be (λt). If we assume (λt) = 0.3, what is the probability of the product passing the test? The solution can be found from Equation 3-10, where (a) = 0.3 and (x) = 2. However, it is easier to use the tables. Table 3-1 is extracted from Appendix A.

Table 3-1: Extracts from Appendix A

Number of Failures (x)   Probability of (x) Failures When (a) = 0.3
0                        0.7408
1                        0.2222
2                        0.0333
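The table lookup can also be reproduced directly from Equation 3-8; a sketch (the function name is our own):

```python
import math

def poisson_pmf(x, a):
    # Equation 3-8: probability of exactly x events when a are expected
    return a ** x * math.exp(-a) / math.factorial(x)

a = 0.3  # expected number of failures (lambda * t) from the example

# Equation 3-10: probability of passing with 2 or fewer failures
p_pass = sum(poisson_pmf(x, a) for x in range(3))
```

The exact sum is about 0.9964; the 0.9963 in the text reflects the four-place rounding of the table entries.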

Hence, the probability of passing the test (i.e., having 2 or less failures) = 0.7408 + 0.2222 + 0.0333 = 0.9963, which we could also have found directly using Appendix B.

It may be useful to note that the Poisson and binomial are related in that the expected value "a" in the Poisson distribution can be expressed as the probability of an event "p" times the number of opportunities or trials "n", where "p" and "n" are terms of the binomial. For any given value of "a", as "p" approaches zero and "n" approaches infinity, the result of the binomial formula (Equation 3-1) approaches the result of the Poisson formula (Equation 3-8).

3.1.3 The Hypergeometric Distribution

The binomial and Poisson distributions assume independent events. For example, the occurrence of a defective unit in a sample does not increase or decrease the probability that the next unit in the sample will be defective. Where the population being sampled is large in relation to the sample size, or when sample units are replaced after selection and can be drawn again, this assumption is reasonable. However, when the population is relatively small compared to the sample size (say no larger than 10 times the sample size) and units sampled are not replaced, the assumption may not be reasonable. For example, assume an office with six employees, two of

whom are women. Suppose we wish to select two at random for a survey of job satisfaction. The probability that the first selected will be a woman is two in six, or one-third. If the first selection is a woman, the probability of the second also being a woman is one in five, or one-fifth. If the first selection were a man, the probability of the second being a woman would be two in five, or two-fifths. Hence, the samples are not independent. To handle such situations, we can use the hypergeometric distribution.

The distribution is essentially a combination of three counting formulas like that of Equation 3-2. The probability of "x" events happening in our sample (number of women selected, number of green apples in a bag, etc.) is:

P(x) = { [D! / (x!(D - x)!)] × [(N - D)! / ((n - x)!((N - D) - (n - x))!)] } / [N! / (n!(N - n)!)]   (3-13)

where:
N = number of units in the population being sampled
D = number of units with the characteristic of interest (event happens) in the population being sampled
n = number of units in the sample (sample size)
x = number of events in the sample (units with the characteristic of interest)

The numerator of Equation 3-13 is the number of ways we can select "x" units from the "D" units in the population that have the characteristic of interest, times the number of ways we can select (n - x) of the (N - D) units without the characteristic of interest. The denominator is the number of ways we can select a sample of "n" units from a population of "N" units.

For our office example, N = six people, D = two women, and n = two people selected for the survey. From Equation 3-13, the probability of selecting no women for the survey would be:

P(0) = { [2! / (0!(2 - 0)!)] × [4! / (2!(4 - 2)!)] } / [6! / (2!(6 - 2)!)] = (1)(6) / 15 = 6/15 = 0.4   (3-14)

Four times out of ten we will conduct the survey with no women in the sample.
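Equation 3-13 maps directly onto the three counting formulas; a sketch in Python (`hypergeom_pmf` is our own name):

```python
from math import comb

def hypergeom_pmf(x, N, D, n):
    # Equation 3-13: probability of x units of interest in a sample of n
    # drawn WITHOUT replacement from a population of N containing D such units
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)

# Office example: N = 6 employees, D = 2 women, n = 2 selected
p_no_women = hypergeom_pmf(0, 6, 2, 2)  # → 0.4
```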


3.2 Continuous Distributions

Suppose we wanted to describe the height of male adults in a community. We could measure the heights of a random sample of adult males (it would not do to measure only basketball players, for example). After enough statistics had been collected, we would have to organize the data in a meaningful format, and might elect to use a histogram as shown in Figure 3-1.

Figure 3-1: Distribution of Heights

If our sample size increases, and our measurements become more precise, our histogram will become more like Figure 3-2.

Figure 3-2: More Detailed Distribution of Heights

As our sample size approaches infinity and our measurement intervals become infinitely small, the histogram approaches a smooth curve, as shown in Figure 3-3.


Section 3: Some Distributions and Their Uses


Figure 3-3: Continuous Distribution Curve for Height

While many continuous distributions exist, the distribution of heights, and many other parameters, usually follow a symmetrical bell-shaped curve called the Normal or Gaussian distribution (to be discussed in Section 3.2.1). Other distributions of interest in reliability engineering, such as distributions of times to failure, are usually not symmetrical. In either case, we shall be dealing with probability distributions, meaning that the distribution is normalized so that the total area under the curve is equal to one. The utility of this is that we can then calculate the probability of a measurement being in any given range of values. For symmetrical continuous distributions such as the normal, the mean and standard deviation (Equation 2-1) of the distribution provide all the information we need. More generally, for example in considering distributions of times to failure, we shall be concerned with three functions:

The probability density function, designated by f(x), is the height of the curve at the value "x". This in itself is of little value, except as the foundation for the other two functions.

The cumulative density function, designated by F(x), is the integral of f(x). This provides the area under the curve from minus infinity to x. For example, let f(x) = f(t), the distribution of times to failure. Then F(t) = the proportion of products which will fail before time = t (see Section 3.2.2.2 for an example).

The reliability function, designated by R(t), is the proportion of products which have not failed by time = t, when f(t) is a distribution of failure times. It is simply 1 - F(t). This is equivalent to the probability that a randomly selected product will operate without failure for time = t.

3.2.1 The Normal Distribution

While many continuous distributions are of interest for specific applications, the normal distribution is undoubtedly the most useful overall. Many parameters of interest, such as the physical dimensions of a product, can be described by it. In addition, the binomial distribution can be approximated by a normal distribution when the number of binomial trials (n) is high (30 or more). Since the Poisson becomes approximately equal to the binomial when the number of trials (n) is high and the probability of an event (p) is low, it can also be approximated by the normal distribution under these conditions. A particularly important characteristic of the normal


distribution is that it applies to data from samples, even when the population sampled is not normally distributed (see Section 3.2.1.2). A normal distribution is symmetrical; the mean, median and mode (see Section 2.1) are identical, and the distribution theoretically extends from minus infinity to plus infinity. The latter obviously could not apply to a distribution of heights, but the probability of a value being far in the tails of the curve is so small that it may be neglected, making the fit adequate. The normal distribution is completely defined by its mean and standard deviation. The percent of the population of interest under any portion of the curve can be determined from the mean and standard deviation: for example, the percent of adult males who are over six feet tall, or, of more interest to reliability engineers, the percent of a product which falls outside specified limits. Since the mean and standard deviation of a population are not usually directly measurable, they are estimated from sample data. The mean of the sampled data is the estimated distribution mean. The standard deviation of the distribution can be estimated from the standard deviation of the sample (Equation 2-1). However, when the number of samples is low, this gives a biased estimate. In such cases, the population standard deviation is estimated from the unbiased expression:
S = √[Σ(x - x̄)² / (n - 1)]   (3-15)
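The difference between the biased estimate (divide the sum of squared deviations by n) and the unbiased estimate of Equation 3-15 (divide by n - 1) can be seen in a small sketch (the data values are hypothetical):

```python
import math

data = [9.8, 10.2, 10.1, 9.9, 10.0]  # hypothetical sample measurements

n = len(data)
mean = sum(data) / n

# Biased estimate: divide the sum of squared deviations by n.
s_biased = math.sqrt(sum((x - mean) ** 2 for x in data) / n)

# Unbiased estimate (Equation 3-15): divide by n - 1 instead.
s_unbiased = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

print(s_biased < s_unbiased)  # the n - 1 divisor always gives a larger value
```

Python's statistics.stdev uses the n - 1 divisor, so it reproduces the Equation 3-15 value directly.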

Since both the mean and the standard deviation can take any value, there can be an infinite number of normal distributions. However, analysis is facilitated by converting data of interest to a standard normal curve, for which the relation of the area under the curve to the distance from the mean (in standard deviations) is available in tables, such as the one presented in Appendix C.

3.2.1.1 The Standard Normal Distribution

The standard normal distribution is one in which the mean is zero and the standard deviation is 1.0. Data is converted from the actual distribution to the standard normal by the formula:
z = (x - μ)/σ   (3-16)

where:
μ = the mean of the distribution being converted
σ = the standard deviation of the distribution being converted
x = a data point from the distribution being converted
z = the corresponding data point on the standard normal distribution


An abbreviated tabulation of data (extracted from Appendix C) for a standard normal curve is given in Table 3-2. Figure 3-4 shows the area between zero and z quantified in column two of Table 3-2.

Table 3-2: Standard Normal Distribution Data

  z     Area Between 0 and z (or Between 0 and -z)     Area Between -z and z
 0.5                     0.1915                               0.3830
 1.0                     0.3413                               0.6826
 1.5                     0.4332                               0.8664
 2.0                     0.4772                               0.9544
 3.0                     0.4987                               0.9974
  ∞                      0.5000                               1.000

Figure 3-4: Standard Normal Distribution

Suppose we estimate the mean length of a rod at one inch with a standard deviation of 0.001 inch, following a normal distribution, and want to find the proportion of rods that are more than 1.002 inches. From Equation 3-16:

z = (1.002 - 1.0)/0.001 = 2   (3-17)

Where z is the point on the standard normal distribution equivalent to 1.002 inches. The solution to our problem is simply the area under a standard normal distribution from z = 2 to ∞. Table 3-2 does not directly list this area. However, we can calculate it by subtracting the areas we don't want from 1.0, the total area under the curve. The areas we don't want are the area between -∞ and zero (0.5000 from Table 3-2) and the area from zero to 2.0 (0.4772).

Area wanted = 1.0 - (0.5000 + 0.4772) = 1 - 0.9772 = 0.0228   (3-18)

Thus, 0.0228 or 2.28% of the rods we produce will be longer than 1.002 inches.
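The table lookup can be sidestepped for spot checks: the standard normal area is available through the error function. A sketch of the rod calculation (std_normal_cdf is our helper built on math.erf, not something from the text):

```python
import math

def std_normal_cdf(z):
    """Area under the standard normal curve from -infinity to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma, limit = 1.0, 0.001, 1.002  # rod example from the text
z = (limit - mu) / sigma              # Equation 3-16
tail = 1.0 - std_normal_cdf(z)        # area from z to infinity
print(round(z, 3), round(tail, 4))
```

The tail area agrees with the 0.0228 obtained from Table 3-2.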


If we specified that the rods made should be one inch long plus or minus 0.002 inches, the proportion of rods "in spec" would be given by the area under the standard normal from z = -2 to +2. From the table, 0.9544 or 95.44% of the rods will meet the specified tolerance.

Table 3-3 is another set of values for the standard normal distribution, but one that solves for z given areas of interest, rather than one which solves for areas given values of z, like Table 3-2 or Appendix C. This table defines "critical values" of z marking the ends of some specific areas of the standard normal distribution that are often used in determining confidence intervals when measuring and demonstrating values of parameters that conform to a normal distribution. We shall use this table in Section 7.

Table 3-3: Critical Values of z

 Critical Value of z     Area Between z and -z     Area From z to ∞
        1.28                     0.80                   0.10
        1.645                    0.90                   0.05
        1.96                     0.95                   0.025
        2.33                     0.98                   0.01
        2.58                     0.99                   0.005
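The critical values in Table 3-3 can be reproduced by numerically inverting the standard normal CDF; a sketch using simple bisection (the helper names are ours):

```python
import math

def std_normal_cdf(z):
    # Area under the standard normal curve from -infinity to z
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def critical_z(tail_area, lo=0.0, hi=10.0):
    """Find z such that the area from z to infinity equals tail_area."""
    for _ in range(100):          # bisection: halve the bracket each pass
        mid = (lo + hi) / 2.0
        if 1.0 - std_normal_cdf(mid) > tail_area:
            lo = mid              # tail still too large: z must be bigger
        else:
            hi = mid
    return (lo + hi) / 2.0

# Reproduce a few critical values from Table 3-3.
for tail in (0.10, 0.05, 0.025):
    print(round(critical_z(tail), 3))
```

The results (about 1.28, 1.645 and 1.96) match the tabulated critical values.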

3.2.1.2 The Normal Distribution's Role in Sampling

One of the reasons for the usefulness of the normal distribution is that means of samples from any distribution with a finite mean and variance, if the sample is large enough, fit well to a normal distribution whose mean is the mean of the sample means and is equal to the mean of the parent distribution. For example, the times to failure of a rod under stress may follow a Weibull distribution (to be discussed later). The mean time to failure of the rods may be estimated from a sample. If many samples are taken, the means of the samples will follow a normal distribution with its mean the same value as the mean time to failure of the parent population of rods. In addition, the standard deviation of the distribution of sample means is equal to the standard deviation of the parent distribution divided by the square root of the sample size (σ/√n). This phenomenon, called the central limit theorem, means the normal distribution can be applied to samples from almost any distribution to provide information on the parent distribution. We shall see this in action later.

3.2.2 Various Other Useful Distributions, In Brief

3.2.2.1 The Lognormal

The lognormal distribution is one in which the logarithms of measurements from a population are distributed normally. The distribution of the measurements themselves will be skewed to the right. The lognormal is descriptive of such things as the time it takes to get to work or the time to repair a failure. In these cases, there are often circumstances which lengthen the time (accidents on the road, troubleshooting difficulties, etc.), but few opportunities to shorten the time significantly. The lognormal is handled by taking the logarithms of the measurements of interest, and analyzing the resultant normal distribution as discussed in Section 3.2.1. Since there is a one-to-


one transformation of the points on the lognormal to the points on the normal, we can translate the results back to the original data. For example, if a measured value from a lognormal distribution of repair times "y" = 20 minutes, then the corresponding value on a normal distribution "x" = ln 20 = 2.9957. If analysis (which probably involves further transformation of the data to the standard normal and back, as discussed in Section 3.2.1.1) finds that 90% of the area of the normal is below x = 2.9957, then 90% of the area on the lognormal (i.e., 90% of repair times) will be below y = e^x = e^2.9957 = 20 minutes. A practical application is found in MIL-HDBK-470A, Designing and Developing Maintainable Products and Systems, which includes maintainability demonstration tests based on lognormal distributions of repair time.

3.2.2.2 The Exponential

This distribution describes times to failure of a repairable system, when the failure rate is reasonably constant with time. Its probability density function is:

f(t) = λe^(-λt)   (3-19)
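The back-and-forth transformation can be sketched in a few lines (the lognormal parameters here, μ = 2.0 and σ = 0.6 for the logged repair times, are hypothetical, not from the text):

```python
import math

def std_normal_cdf(z):
    # Area under the standard normal curve from -infinity to z
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical lognormal repair times: ln(y) is normal with these parameters.
mu, sigma = 2.0, 0.6

# 90th percentile on the normal side (z = 1.28 from Table 3-3), then
# translate back to the lognormal side with y = e^x.
x_90 = mu + 1.28 * sigma
y_90 = math.exp(x_90)

# Check: the area below z = 1.28 should be about 0.90.
print(round(std_normal_cdf(1.28), 3), round(y_90, 1))
```

So under these assumed parameters, about 90% of repairs would finish within roughly 16 minutes.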

resulting in a cumulative density function (percent of units failed at time = t, or probability of a unit failing by time = t) of:

F(t) = 1 - e^(-λt)   (3-20)

Since the probability of a unit not failing by time t = R(t) = 1 - F(t), R(t) = e^(-λt), which is the widely used reliability expression encountered earlier in Equation 3-9.

3.2.2.3 The Weibull

When a constant failure rate cannot be assumed, the Weibull is often the distribution of choice, because it can accommodate increasing, decreasing and constant failure rates. Weibull analysis assumes no repair of failed units. We will become more familiar with this distribution in Section 4.

3.2.2.4 The Student t

Using the central limit theorem (see Section 3.2.1.2), one must assume a large sample size and that σ, the standard deviation of the parent population, is known. When the sample size is small, and σ is estimated from the sample using the unbiased estimator of Equation 3-15, the normal distribution does not really apply. Instead of the standard normal random variable:

z = (x̄ - μ)/(σ/√n)   (3-21)


where:
x̄ = the mean of a sample of measurements
μ = the mean of the sample means (and of the parent population)
σ = the standard deviation of the parent population
n = the sample size
σ/√n = the standard deviation of the sample means (the distribution being converted to the standard normal)

we have:

t = (x̄ - μ)/(S/√n)   (3-22)

where:
S = the estimated standard deviation of the parent population (per Equation 3-15)
S/√n = the estimated standard deviation of the sample means
other terms as above
There is a family of "t" distributions, one for each value of "n". As "n" increases, values of "t" become close to the values of "z". Appendix E provides a tabulation of Student t distribution data. To compare the two distributions, we will compute the values of "z" and "t" which cover 95% of the area under the curves starting from minus infinity (i.e., only 5% of the area is excluded). From Appendix C, we note that the area from the mean (z = 0) to "z" is 0.45 when "z" is between 1.6 and 1.7. We will interpolate this to z = 1.65. Hence, as discussed in the text above the table in Appendix C, the area in the tail of the curve from 1.65 to ∞ = 1 - 0.5 - 0.45 = 0.05. Hence, when z = 1.65, 95% of the area under the curve is between -∞ and z. Since "z" is measured in standard deviations, 95% of the curve is below +1.65 standard deviations from the mean.

The tabulated Student t in Appendix E gives the values of "t" for defined areas from -∞ to "t", directly. However, the values differ with "degrees of freedom", roughly representing the amount of information available in a sample and equal to the sample size minus one. For a sample of ten (nine degrees of freedom), the value of "t" marking the edge of 95% of the area under the curve is 1.833. Since "t" is also measured in standard deviations, 95% of the curve is below +1.833 standard deviations from the mean. This shows that the Student t distribution is wider than the standard normal for relatively small samples. However, as sample size increases, the value of "t" approaches the value of "z" for the same area under the curve, as the reader can verify by comparing values derived from Appendix C (the Standard Normal distribution) to those given in Appendix E (the Student t distribution). For our example, the value of "t" for 95% goes


to a limiting value of 1.645 as sample size increases, which agrees with our interpolated value of z = 1.65.

3.2.2.5 The F Distribution

The F distribution describes the ratio of the variances of two independent samples. It is a family of distributions, dependent on the sample sizes, and is used to test whether or not the samples are from the same population. When the samples are from the same population, the value of the F statistic will be distributed about 1.0. By measuring the value of F derived from two samples and determining the probability that the value measured would occur if the samples were from the same population, we can accept or reject the hypothesis that the samples are from the same population within a specified risk of error. Appendix F is a tabulation of "critical values" of the F distribution. These are the values of F which cannot be exceeded if we are to conclude there is no difference in the variance of our samples, for stated risks of error. We will discuss this in Section 8.

3.2.2.6 The Chi-Square Distribution

The Chi-square distribution describes the relation of the true mean of a population to the mean of a sample. It is also a family of distributions, dependent on sample size. It may be used to determine the confidence limits around a measured failure rate or to determine if a failure rate is stationary (i.e., does not change with time). We will use the Chi-square distribution in Section 4.

3.3 In Summary

Table 3-4 presents a brief summary of the distributions discussed in this section, by type and use in reliability engineering.

Table 3-4: Summary of Distributions

 Distribution      Type         Main Uses in Reliability Engineering
 Binomial          Discrete     Finding probability of "x" events in "n" trials
 Poisson           Discrete     Finding probability of "x" events when "a" are expected
 Hypergeometric    Discrete     Replaces binomial for samples from small populations, when samples are not replaced
 Normal            Continuous   Describes many parameters, including mean values of samples from any distribution with a finite mean and variance
 Standard Normal   Continuous   All normal distributions can be converted to the standard normal for ease of analysis
 Lognormal         Continuous   Describes some parameters of interest, such as repair times
 Exponential       Continuous   Describes distribution of failures when failure rate is constant
 Weibull           Continuous   Describes distribution of failures for constant or changing failure rate, no repair
 Student t         Continuous   Replaces standard normal for small samples
 F distribution    Continuous   Testing for significance of differences in the variances of two samples
 Chi-square        Continuous   Estimating confidence intervals, and testing for a constant failure rate



4.0 MEASURING RELIABILITY

To select the statistical tools useful in measuring reliability, we have to be more specific about what we are measuring. Are we concerned with the reliability of products which are discarded on failure or of products repaired on failure? Do we know if the reliability is constant, decreasing or increasing? Is reliability better expressed as a probability (e.g., of a successful missile launch), as an expected life (mean time to failure), or as a frequency of failures (failure rate or its reciprocal, mean time between failures)? The answers to these questions determine the statistical tools of interest. We shall first consider some general principles.

4.1 General Principles

We can crudely measure reliability from test data by dividing the number of failures seen by the total hours of operation of our test sample (for a failure rate) or the number of products tested (for a probability of failure). However, these are rarely sufficient. Most often, we need more information, and can get it by finding (or assuming) a failure distribution and determining its parameters.


Figure 4-1: Probability of Failure as Represented by the Area Under the Probability Density Function

Figure 4-1 represents a probability density function, showing the relative probability of a random variable occurring, in this case a failure, plotted against time. The area under the curve is unity. The curve may be defined by the parameter f(t) which describes its height against time. From f(t) we can obtain three useful measurements:

1. The integral of f(t) from t1 to t2 = the percent of all failures occurring between t1 and t2. When t1 = 0, the integral is the cumulative density function, F(t), defined as the percent of the population failed in the interval ending at t = t2. It is also the probability that any given unit will fail in a mission of length t2. Note that the integral of f(t) from minus infinity to plus infinity (or from zero to plus infinity) equals unity.


2. The reliability function, R(t), the probability of a unit not failing in a given period of time (t), is simply 1 - F(t).

3. The failure rate in an interval from t1 to t2 is given by:

[R(t1) - R(t2)] / [R(t1)(t2 - t1)]

where the numerator represents the proportion of a population failed in the interval, and the denominator represents the proportion surviving at the start of the interval times the length of the interval. Using the relationships between F(t), f(t) and R(t), it can be shown that as the interval becomes infinitely small the failure rate becomes:

f(t)/R(t), or equivalently, f(t)/(1 - F(t))

This defines an instantaneous failure rate at the instant "t". This is also occasionally called the hazard rate, force of mortality, or ROCOF (Ascher's acronym for the rate of occurrence of failures for a repairable system). With these concepts understood, we are ready to discuss measuring reliability. We shall first consider a useful tool when repair is not a consideration: the Weibull distribution.

4.2 The Versatile Weibull Distribution

When we are interested in the reliability of a part, or a non-repairable assembly, or the first failure of a repairable assembly (e.g., an automobile drive train), we can often make use of a versatile statistical distribution invented in 1937 by Waloddi Weibull, which can describe constant, increasing or decreasing failure rates. It is often used to describe the reliability of mechanical items that are subject to a wearout failure mechanism. The Weibull Probability Density Function is described by the formula:

f(t) = (β/η)(t/η)^(β-1) e^(-(t/η)^β)   (4-1)

where:
η = characteristic life (> 0)
β = shape parameter (> 0)
t = time

The reliability function (surviving portion of the population at time = t) would be:

R(t) = e^(-(t/η)^β)   (4-2)


Instead of time, "t" could represent cycles, miles, or any other parameter appropriate to the failure mechanism of interest. Although it has been used to model mixed failure modes, such as infant mortality due to defects, the Weibull is really a model for a single failure mechanism. Equations 4-1 and 4-2 apply to the two-parameter Weibull. There is also a three-parameter Weibull, obtained by replacing "t" in Equations 4-1 and 4-2 with "t - t0". This is used in the case where the failure mechanism cannot result in a failure until a certain time "t0" is reached. An example might be cracks in a crystal which do not grow large enough to cause a failure before t0 is reached. The two-parameter form of the Weibull is more common than the three-parameter form.

When β = 1.0, the Weibull formula reduces to the exponential formula, showing a constant failure rate whose reciprocal, the MTBF, equals η. A value of β > 1 indicates an increasing failure rate (i.e., wearout), and β < 1 shows a decreasing failure rate (i.e., infant mortality). It is possible for the failure distribution of a product to be described by three different Weibull functions at different times in its life: first by a Weibull function with β < 1 reflecting improved failure rates as initial quality defects are eliminated; then by a Weibull function with β = 1 reflecting a relatively constant failure rate during the product's useful life; and finally by a Weibull with β > 1 as wearout mechanisms act to increase failure rate with time.

Test data following a Weibull distribution will plot as a straight line when the cumulative percent failed is plotted against time to failure on special graph paper. On this paper the X-axis is scaled as the natural logarithm of time. The Y-axis is scaled as:

Y = ln ln (1/(1 - F(t)))   (4-3)

where:
F(t) = the estimated percent of the population failed

Weibull analysis paper is designed to permit the values of β and η to be found by graphical methods. Essentially, the value of η is that point on the time axis marking the point where the plot crosses the 63rd percentile on the Y-axis (actually, 63.2% cumulative failures). The slope of the plot, β, can be calculated by:

β = [ln ln (1/(1 - F2)) - ln ln (1/(1 - F1))] / (ln t2 - ln t1) = ΔY/ΔX   (4-4)

where:
F2 = the percentage failed at time = t2
F1 = the percentage failed at t1

To determine the parameters of a Weibull distribution, therefore, one needs to plot cumulative percent failures against time on Weibull analysis paper and either use the graphical


solutions or the calculations described above. One could even forgo the graph paper by knowing the times corresponding to two percentiles. From these, one could calculate β. With β and the known time for any cumulative percent failed, the Weibull reliability function (Equation 4-2) could be used to solve for η. The use of the graph is easier, and provides some verification that the data is indeed Weibull. If the plot is not a straight line, something is amiss.

However, there are some subtleties involved in determining the cumulative percent failed. The most obvious way to estimate the cumulative percent failed is to divide the number of failures by the sample size. If we had ten samples and three failed, the time of the third failure would represent 30% cumulative failures. This, however, is considered a biased estimator. (Consider a sample of one: its time of failure would be counted as the 100th percentile, but, intuitively, one would expect the failure of one sample to better represent the 50th percentile.) Accordingly, there are various schemes to determine the cumulative percent failure represented by each failure in a sample. One is to determine the "median rank" by Bernard's formula:

Median rank (cumulative percent failed) = [(I - 0.3)/(N + 0.4)] × 100   (4-5)

where:
I = rank order of a given failure (I = 1 for the shortest time to failure, etc.)
N = sample size

Also, it should be noted that the rank order is determined by the operating times to failure on the individual units, not the sum of times among the samples, or any measure of calendar time. The unit that has accrued the lowest operating time when it fails is the first failure in the rank order, regardless of the operating time on the other units or when it went on test.

To illustrate these points, we will use an example from RAC Publication NPS, Mechanical Applications in Reliability Engineering. Let us assume we have tested six items to failure and have measured the life to failure in terms of operating cycles, with the results shown in Table 4-1.

Table 4-1: Life Data

 Part Number     Life (10^5 Cycles)
      1                 6.6
      2                 1.3
      3                 4.0
      4                 2.7
      5                 5.2
      6                 9.8

The first step would be to rank order the data as shown in Table 4-2.


Table 4-2: Ordered Data

 Rank     Part Number     Life (10^5 Cycles)
   1           2                 1.3
   2           4                 2.7
   3           3                 4.0
   4           5                 5.2
   5           1                 6.6
   6           6                 9.8

Next, we need to determine the median rank of each failure by Bernard's formula, to estimate the cumulative percent failure it represents. For part number 2, the first in rank:

Median rank = [(1 - 0.3)/(6 + 0.4)] × 100 = (0.7/6.4) × 100 ≈ 10.9%   (4-6)

After computing the median rank for all six parts, we have the data shown in Table 4-3.

Table 4-3: Completed Data Table

 Rank     Part Number     Median Rank (%)     Life (10^5 Cycles)
   1           2               10.91                 1.3
   2           4               26.55                 2.7
   3           3               42.18                 4.0
   4           5               57.82                 5.2
   5           1               73.45                 6.6
   6           6               89.09                 9.8
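The median-rank column can be reproduced in a few lines (a sketch using Bernard's approximation, Equation 4-5; it agrees with the tabulated values, which appear to come from exact median-rank tables, to within a few hundredths of a percent):

```python
def bernard_median_rank(i, n):
    """Equation 4-5: median rank (%) for the i-th ordered failure of n."""
    return (i - 0.3) / (n + 0.4) * 100.0

n = 6  # sample size from Table 4-2
ranks = [bernard_median_rank(i, n) for i in range(1, n + 1)]
print([round(r, 2) for r in ranks])
```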

We now have all the information we need to plot the data on Weibull analysis paper, as shown in Figure 4-2. On this Weibull paper, a line parallel to the plot is drawn from the circle to the arc at the left edge at the 60% failure line. The slope of the curve is read from the scale on the arc at the point of intersection. This is equal to β in the Weibull reliability function. In this case, β is about 1.5. The characteristic life, η, is found by moving horizontally from the circle (across the 63.2 percentile) to the plot, and then down to the corresponding number of cycles, here about 5.8 × 10^5.

A Weibull reliability function can be used for many purposes, such as to determine an effective burn-in time, calculate the probability of success for a mission, determine the expected number of spare parts needed for a set of products, help establish appropriate warranty terms, etc. Weibull analysis paper is available commercially. Two sources are: Team Graph Papers, Box 25, Tamworth, NH 03886 (Tel: 603 323-8843), and Chartwell Technical Papers, H.W. Peel & Co., Jeymer Drive, Greenford, Middlesex, England (Tel: 01-578-6861). There are also PC software packages for Weibull analysis. These include SuperSMITH (TM) by Fulton Findings (on the world wide web at http://www.weibullnews.com) and Weibull++ (TM) by Reliasoft (on the web at http://www.Weibull.com).
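The graphical solution can also be checked numerically. A least-squares line through the transformed points (X = ln t, Y = ln ln(1/(1 - F)), per Equations 4-3 and 4-4) gives the slope β directly, and η from the point where the line crosses the 63.2% level (Y = 0). This is a sketch, not the procedure described in the text, but it reproduces the graphical estimates:

```python
import math

# Data from Table 4-3: (life in 10^5 cycles, median rank as a fraction)
data = [(1.3, 0.1091), (2.7, 0.2655), (4.0, 0.4218),
        (5.2, 0.5782), (6.6, 0.7345), (9.8, 0.8909)]

xs = [math.log(t) for t, _ in data]                      # X = ln t
ys = [math.log(math.log(1 / (1 - f))) for _, f in data]  # Y = ln ln(1/(1 - F))

n = len(data)
xbar, ybar = sum(xs) / n, sum(ys) / n
beta = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
        / sum((x - xbar) ** 2 for x in xs))  # slope = shape parameter
eta = math.exp(xbar - ybar / beta)           # Y = 0 crossing = 63.2% life

print(round(beta, 2), round(eta, 2))  # eta in units of 10^5 cycles
```

Both values agree with the graphical reading (β about 1.5 and η about 5.8 × 10^5 cycles).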


Figure 4-2: Weibull Plot

4.2.1 Caveats

Weibull analysis is not always as easy as the example shown. The data may not plot as a straight line on Weibull paper, for many reasons. First of all, the data may not be distributed according to the Weibull distribution. Some data are better described by the lognormal, for example, a distribution which is not convertible into a Weibull (it is handled by converting it to a

normal, as discussed in Section 3.2.2.1). There may be suspensions, which are parts on test which are removed from test before failing or which fail from mechanisms other than the primary one. There may be more than one significant failure mode. Test data may not include all time on the parts (i.e., time zero is some unknown time before the test starts). Ways to handle these situations are discussed in the New Weibull Handbook, by Dr. Robert B. Abernethy. This authoritative reference is available from the Reliability Analysis Center under RAC order code WHDK.

4.3 Measuring Reliability of Repairable Systems

Our failure data may not represent the times to failure of a number of parts, but rather the times between failures of a system which is repaired after each failure. For such a system, it is often assumed that the failures occur at a reasonably constant rate, which has many advantages in analysis. This assumption is reasonable in that a system is a collection of parts with differing failure mechanisms, which, in aggregate, can appear to fail at a constant rate. This is referred to as a stationary stochastic process ("stochastic" meaning involving a random variable, and "stationary" meaning that the expected number of failures in a given interval will be the same regardless of the age of the system). It is also called a homogeneous Poisson process (HPP) because the number of failures in an interval follows a Poisson distribution that is "homogeneous" or stationary (i.e., unchanging with time). However, there are also nonhomogeneous Poisson processes (NHPPs), which describe systems where the number of failures in an interval may follow a Poisson distribution, but the distribution will change with time, because the expected number of failures in a given interval depends on the system age at the start of the interval. There are practical cases where reliability improves with time and also practical cases where reliability degrades with time.
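An HPP is easy to simulate: the times between failures are independent exponential draws, so the expected number of failures in any interval of length T is λT regardless of system age. A sketch (the failure rate and seed are arbitrary choices of ours):

```python
import random

random.seed(1)  # arbitrary seed, for repeatability
lam = 0.01      # hypothetical constant failure rate (failures per hour)
n = 10_000      # number of simulated failures

# In an HPP the intervals between failures are exponential with mean 1/lambda.
intervals = [random.expovariate(lam) for _ in range(n)]
mean_interval = sum(intervals) / n
print(round(mean_interval, 1))  # should be close to 1/lam = 100 hours
```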
We will discuss this further later, but it seems reasonable to start this section with some statistical tools for determining what kind of process we have.

4.3.1 Testing for Trends

To illustrate the tool we will be discussing, we shall use some contrived data, borrowed from the text Repairable Systems Reliability, by Harold Ascher and Harry Feingold (Marcel Dekker Inc., 1984). The data compares three systems, each having seven failures, as shown in Table 4-4.

Table 4-4: Failure Data

 Failure     Time of Failure     Time of Failure     Time of Failure
 Number       in System A         in System B         in System C
    1              15                 177                  51
    2              42                 242                  94
    3              74                 293                 121
    4             117                 336                 298
    5             168                 368                 313
    6             233                 395                 378
    7             410                 410                 410

Reliability Analysis Center (RAC) 201 Mill Street, Rome, NY 13440-6916 1-888-RAC-USER

The data represent seven failures, which in system A arrive at intervals increasing with time, in system B arrive at intervals decreasing with time, and in system C arrive (pseudo)randomly. In each case the seven intervals between failures are identical except for the order of occurrence. In system A, for example, the first failure occurs 15 hours after turn-on, and the last occurs 177 hours after the second to last, while in system B, the first failure occurs 177 hours after turn-on, and the last occurs 15 hours after the penultimate. It is obvious that system A has an increasing time between failures, or a decreasing rate of occurrence of failure (ROCOF), and that system B has an increasing ROCOF. But real data is not so obligingly obvious. Hence, we find useful a statistical measure invented by Laplace. When our data is failure truncated (i.e., the records end at the occurrence of the last failure), as it is in the data given in Table 4-4, the statistic is calculated by the formula:

U = { [Σ(i=1 to n-1) t_i] / (n - 1) - t_n/2 } / { t_n √(1/[12(n - 1)]) }    (4-7)

where "n" is the number of failures, "t_i" is the time of the i-th failure in order of occurrence, and "t_n" is the time of the last failure. The result, "U", will be zero for perfectly random data, negative for increasing intervals between failures (ROCOF decreasing with time) and positive for decreasing intervals (increasing ROCOF). For time truncated data (i.e., the records end at a given time while the system is in a non-failed state):

U = { [Σ(i=1 to n) t_i] / n - T/2 } / { T √(1/[12n]) }    (4-8)

where: T = the time at the end of the data
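Equations 4-7 and 4-8 are easily scripted; the sketch below uses an assumed helper name (not from the text), with the failure times given in order of occurrence.

```python
# Sketch of Equations 4-7 and 4-8 (the helper name is assumed, not from the
# text). "times" are cumulative failure times in order of occurrence.
import math

def laplace_u(times, T=None):
    """Laplace trend statistic: failure-truncated if T is None (Eq. 4-7),
    time-truncated at total time T otherwise (Eq. 4-8)."""
    if T is None:                       # records end at the last failure
        tn, body = times[-1], times[:-1]
        m = len(body)                   # n - 1 terms in the sum
        return (sum(body) / m - tn / 2) / (tn * math.sqrt(1 / (12 * m)))
    n = len(times)
    return (sum(times) / n - T / 2) / (T * math.sqrt(1 / (12 * n)))
```

For the Table 4-4 data, `laplace_u([15, 42, 74, 117, 168, 233, 410])` gives approximately -2.0, matching the worked result for system A.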

The statistic approximates the distance (in standard deviations) of the data from the mean of a standard normal distribution. A system with no change in ROCOF would have an expected value of zero, the mean of the standard normal distribution. Statistical variation results in it taking other values, with probability decreasing as the distance from zero increases. Applying the principles of hypothesis testing, discussed in Section 2.4, we can reject the hypothesis that there is no trend when the statistic takes a value whose probability is satisfactorily small. "Satisfactorily small" can be defined quantitatively as an acceptable risk, i.e., a probability of making a wrong decision that is small enough to satisfy the decision maker. This probability can be directly translated into a distance from the mean of the standard normal distribution. As discussed in Section 2, we simply determine the area of the distribution which falls outside a given distance from the mean. This area is the risk of error (in this case the probability that a system with no trend would produce a data set falling in the tail of the curve cut off by the value
calculated for the distance from the mean). We can define a "critical value" for the Laplace statistic which represents the value the statistic must exceed for a risk of error satisfactory to us. Some common critical values are shown in Table 4-5. Table 4-5 relates to Appendix C in that "U" corresponds to "Z" and the probability of error corresponds to the tails of the standard normal distribution (the area under the curve outside of the range -Z to +Z).

Table 4-5: Critical Values for the Laplace Statistic
Critical Value of U       Probability of
(Absolute Value)          Error (%)
3.09                         0.2
2.576                        1.0
2.326                        2.0
1.960                        5.0
1.645                       10.0
1.282                       20.0
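The Table 4-5 entries can be checked with the standard normal CDF (via `math.erf`) instead of the Appendix C table; a minimal sketch, with an assumed function name:

```python
# Sketch: two-tailed error probability for a Laplace statistic U, using the
# standard normal CDF computed from math.erf (function name assumed).
import math

def two_tailed_error(u):
    """Probability that |Z| >= |u| for a standard normal Z (both tails)."""
    cdf = 0.5 * (1.0 + math.erf(abs(u) / math.sqrt(2.0)))  # P(Z <= |u|)
    return 2.0 * (1.0 - cdf)
```

For example, `two_tailed_error(1.960)` returns about 0.05, matching the 5% row of the table.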

For example, there is only 0.2% of the standard normal distribution outside the limits of ±3.09, so if the absolute value of "U" exceeds 3.09, we can assume there is a trend with only a 0.002 probability of error. Further, if the value exceeds +3.09, we can assume a trend of increasing ROCOF (reliability decreasing) with only 0.1% error, and a value less than -3.09 permits assuming an improving trend with only a 0.1% risk, because only one tail would be outside the limit. The refuted hypothesis would be that there is no trend in the direction of our conclusion. Applying Equation 4-7 to the data in Table 4-4 for system A:

U = [(15 + 42 + 74 + 117 + 168 + 233)/6 - 410/2] / [410 √(1/(12 x 6))] = -2.0    (4-9)

The negative result indicates a decreasing ROCOF (improving reliability). The value of U exceeds the critical value given in Table 4-5 for 5% risk (meaning that there is less than a 5% probability that there is really no trend), and indicates we can accept that reliability is improving with only a 2.5% risk. Doing the analysis for the data of system B:

U = [(177 + 242 + 293 + 336 + 368 + 395)/6 - 410/2] / [410 √(1/(12 x 6))] = +2.0    (4-10)

This is the same result as Equation 4-9, except for sign, which indicates deteriorating reliability at 97.5% confidence (1 - 0.025 risk).

Repeating the analysis using the data for system C:

U = [(51 + 94 + 121 + 298 + 313 + 378)/6 - 410/2] / [410 √(1/(12 x 6))] = +0.086    (4-11)

This does not permit the rejection of the hypothesis that there is no trend, even at a 20% risk of error (U = 1.282 for a 20% risk, as shown in Table 4-5). We can come up with an estimate of the risk of error by using Appendix C. In Appendix C, for z = 0.1, the lowest figure listed, the area from 0 to z is 0.0398. Using a linear interpolation, for z = 0.086, the area from 0 to z would be (0.086/0.1) x 0.0398 = 0.034. Since the area in the upper tail is 0.5 minus the area from 0 to z, as explained in the text of Appendix C, it is equal to 0.5 - 0.034 = 0.466. The area in both tails would be twice this, or 2 x 0.466 = 0.932. Hence, to reject the hypothesis that there is no trend, we would have to be willing to accept a 0.932 probability of error. A constant ROCOF is indicated.

4.3.2 Confidence Limits when the Failure Rate is Constant

If we assume a constant failure rate, the mean time between failures (MTBF) of a product can be simply determined from the total operating time of all the samples (i.e., the sum of the operating times accrued by each unit tested) divided by the total number of failures. For example, 100 hours of operation during which one failure occurred would give us an MTBF of 100 hours. So would 1,000 hours of operating time with 10 failures. Which data set would you prefer as a measure of your product? It should be obvious that the risk in making conclusions based on one failure is greater than the risk in making conclusions from many failures. Put another way, we have more confidence in conclusions reached from extensive data than in conclusions reached from skimpy data. Statistical methods quantify confidence by defining it as the probability of being correct when we state that the MTBF of a product is within a given range of values, called the confidence interval. The values marking the limits of the interval are called confidence limits.
The confidence interval can be two-sided (e.g., between confidence limits of 100 and 200 hours), or one-sided (e.g., extending from a confidence limit of 100 hours to infinity). As might be expected, the wider the confidence interval, the greater the confidence provided by a given set of data. For example, for any given set of data, we have a greater probability of being correct (i.e., confidence) that a product's true MTBF exceeds a one-sided lower confidence limit of 100 hours than we would have for a limit of 200 hours. Risk is defined as the probability that the true MTBF of the product is outside the confidence interval, and is equal to one minus the confidence. If we have a 10% probability of error, we have a 10% risk and a 90% confidence. To determine the width of the confidence interval for any given value of confidence, we need to relate the confidence interval to a portion of a probability distribution curve. Because we have assumed the constant failure rate, the times to failure are distributed in accordance with an

exponential distribution, and the probability of no failure in any given time is calculated by the familiar reliability function:
R = e^(-t/MTBF)

(4-12)

The reliability function is the portion of the failure probability distribution function extending from t to infinity, and represents the portion of a population still operating after time t, or the probability that a given unit will not fail between time zero and t. However, the distribution we need for determining confidence intervals is not the distribution of failures itself, but a dependent distribution describing the relation between the measured and the true MTBF of the product. When the exponential distribution of failures holds, the relation between the measured and the true MTBF of the product is described by a distribution called the chi-square. In actuality, the chi-square is a family of distributions, each member determined by a function of the number of failures recorded, called the degrees of freedom (roughly, degrees of freedom represents the amount of information at hand, which is a function of the number of failures, the exact function dependent on circumstances, as will be explained). Table 4-6 is an abbreviated chi-square table; a more extensive table is provided in Appendix D.

Table 4-6: Chi-Square Values
Degrees of              Chi-Square Value at α =
Freedom          95%        90%        10%         5%
2              0.103      0.211      4.605      5.991
3              0.352      0.584      6.251      7.815
4              0.711      1.064      7.779      9.488
5              1.145      1.610      9.236     11.070
10             3.940      4.865     15.987     18.307
20            10.851     12.443     28.412     31.410
22            12.338     14.041     30.813     33.924
30            18.493     20.599     40.256     43.773

In Table 4-6 and Appendix D, α = the selected risk (one minus the specified confidence), or the area in the tail of the curve from the listed value to infinity. The reader may have noted that we have used different ways of presenting statistical tables, when a more consistent format could have been used. This lack of consistency is intended to help the reader understand and deal with statistical tables in many different forms, since there is no standard format used by all references. To use Table 4-6, we select the chi-square value for the appropriate degrees of freedom, based on the number of failures accrued and whether the test was time-truncated or failure-truncated (to be explained later). From the selected value, the total test time and the specified risk, we can calculate the confidence limit, whether for a one-sided limit or for a two-sided interval. A time-truncated test is one which ends with a certain amount of time accrued, either by reaching a pre-designated limit or by simply ceasing to test. The most common scenario is that of a group of equipment on test for a period determined by the willingness of management to tie up the test units.
A failure-truncated test is one which ends at a given number of failures, the most common manifestation being the test of a number of units until all fail. Another way is the test of one unit, repaired after each failure, until a certain number of failures occur.

The one-sided confidence limit for a set of test data from a time-truncated test is calculated from the formula:

MTBF = 2T / (chi-square value for 2n + 2 degrees of freedom at percentile α)    (4-13)

where:
T = total test time
n = total failures accrued
α = acceptable risk (1 - desired confidence)

For example, if we test for 100 hours and have one failure, we have 2(1) + 2 = 4 degrees of freedom. For a 90% confidence, the risk (α) would be (1 - 0.90) or 0.10. The value listed in Table 4-6 for ten percent risk and four degrees of freedom is 7.779. Plugging this value into the formula we have:

MTBF = 2(100)/7.779 = 200/7.779 = 25.7 hours    (4-14)
Hence, we would be 90% confident that the "true" MTBF is actually 25 hours or more. If, instead, we had 1,000 hours of test time and 10 failures, we would have 2(10) + 2 = 22 degrees of freedom, and our 90% confidence limit is found by:
MTBF = 2(1,000)/30.813 = 2,000/30.813 = 65 hours    (4-15)

Hence, we would be 90% confident that the "true" MTBF is 65 hours or more.

If our test data were failure-truncated, rather than time-truncated, we would use the following formula to determine the one-sided confidence limit:

MTBF = 2T / (chi-square value for 2n degrees of freedom at percentile α)    (4-16)

where:
T = total test time
n = total failures accrued
α = acceptable risk (1 - desired confidence)

This formula is identical to the previous one, except that the degrees of freedom are reduced, resulting in a slightly higher MTBF limit. The degrees of freedom are fewer because we have less information; there is no operating time after the last failure as there was in the time-truncated case. For the one failure in 100 hours scenario:
MTBF = 2(100)/4.605 = 200/4.605 = 43 hours at 90% confidence,    (4-17)

as opposed to 25 hours for the time-truncated test. These formulas essentially cut off a portion of a probability density function equal to the desired risk. In our examples, we have found the value of MTBF below which there is only a 10% probability that the true MTBF would fall. To calculate a two-sided confidence interval, we would cut off portions of a density function at both ends. The lower end is determined by the same formulas as the one-sided limits, except that the chi-square value for α/2 is used. (If we want 90% confidence in a two-sided confidence interval, we want to lop off 5% of the distribution at each end, rather than 10% of the distribution at one end.) The upper limit, for either time-truncated or failure-truncated tests, is determined by the formula:

MTBF = 2T / (chi-square value for 2n degrees of freedom at percentile 1 - α/2)    (4-18)

For example, the upper 90% confidence limit would be determined by finding the chi-square value for 2n degrees of freedom at the 95th percentile (1 - 0.10/2). Note that the degrees of freedom remain the same for both time-truncated and failure-truncated tests. This is, roughly, because the additional information of time after the last failure does not significantly affect the estimation of the upper confidence limit as it does for the lower confidence limit. Hence, for a time-truncated test of 100 hours with one failure, the two-sided 90% confidence limits are:

Lower limit = 2(100) / (chi-square value for 4 degrees of freedom at α = 5%) = 200/9.488 = 21 hours    (4-19)

Upper limit = 2(100) / (chi-square value for 2 degrees of freedom at α = 95%) = 200/0.103 = 1,940 hours    (4-20)

Hence, the 90% confidence interval for the data is from 21 to 1,940 hours. We are 90% confident (have a 10% risk of error) that the true MTBF is between these limits. Table 4-7 summarizes the formulas discussed above.

Table 4-7: Confidence Interval Formulas
MTBF limit                         For time-truncated tests    For failure-truncated tests
One-sided interval (lower limit)   2T / X²(α, 2n + 2)          2T / X²(α, 2n)
Two-sided interval, lower limit    2T / X²(α/2, 2n + 2)        2T / X²(α/2, 2n)
Two-sided interval, upper limit    2T / X²(1 - α/2, 2n)        2T / X²(1 - α/2, 2n)
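The Table 4-7 formulas can be sketched with scipy's chi-square percent-point function (the helper name is assumed, not from the text). Note that `chi2.ppf` takes a lower-tail area, while α in Table 4-6 is an upper-tail area, so the table value at risk α corresponds to `chi2.ppf(1 - alpha, df)`.

```python
# Sketch of the Table 4-7 confidence-limit formulas (helper name assumed).
# scipy's chi2.ppf takes a lower-tail area; alpha here is an upper-tail area.
from scipy.stats import chi2

def mtbf_limits(T, n, confidence=0.90, time_truncated=True):
    """Return (one-sided lower, two-sided lower, two-sided upper) MTBF limits
    for total test time T with n failures."""
    alpha = 1.0 - confidence
    df = 2 * n + 2 if time_truncated else 2 * n
    one_sided_lower = 2 * T / chi2.ppf(1 - alpha, df)
    two_sided_lower = 2 * T / chi2.ppf(1 - alpha / 2, df)
    two_sided_upper = 2 * T / chi2.ppf(alpha / 2, 2 * n)  # 2n df either way
    return one_sided_lower, two_sided_lower, two_sided_upper
```

`mtbf_limits(100, 1)` reproduces the worked examples: a one-sided 90% lower limit near 25.7 hours and a two-sided interval of roughly 21 to 1,950 hours (the text's 1,940 reflects the rounded table value 0.103).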

These formulas apply only to the estimation of an MTBF based on a constant failure rate. To calculate other parameters or under other assumptions, the analyst must use the appropriate distribution, if he can identify or generate one. For example, MIL-HDBK-189, Reliability Growth Management, provides tables for a distribution (origin not identified) used to determine the confidence intervals of a failure rate that changes according to a nonhomogeneous Poisson process. See Section 6 for more on reliability growth and MIL-HDBK-189.

4.4 Measuring Reliability of "One-Shot" Products

The reliability of the electronics modules in the space shuttle could be measured in terms of mean time between failures (MTBF) and this result used (in Equation 4-12, or even 4-2) to determine the probability of these items operating satisfactorily for a particular mission. However, the reliability of the shuttle's booster rockets would not be appropriately measured in MTBF or any other measure of "life". Even if recoverable and re-usable, the rockets are subject to only one short demand per mission. This type of product is called a "one-shot" device. Either the booster will work properly on demand, or it will fail with no possibility of repair during the mission. Its proper figure of merit is the probability of success, and this must be found directly by dividing the number of successful uses by the total number of attempts to use the product. Naturally, the more data available, the better our confidence in the measured probability of success, and a quantitative means of describing confidence would be quite useful. Fortunately, we can determine confidence limits on the reliability of one-shot devices by making use of the fact that the successes and failures can be described by the binomial distribution. This is an exercise in measuring quality from samples and is described in detail in Section 7.1.

5.0 DEMONSTRATING RELIABILITY

Measuring reliability seeks to answer the question, "What is the true reliability of the product?" In contrast, reliability demonstration seeks to answer, "How sure can I be that the reliability is satisfactory?" Since these questions are different, the methods used to answer them are also different. We have covered the measurement of reliability, and now will examine some ways to demonstrate reliability. These are derived from a branch of statistics called hypothesis testing, a set of methods designed to accept or reject hypotheses (e.g., the reliability meets a specified number) within acceptable risks, as described in Section 2.4. Leaving the testing of "one-shot" devices to Section 7.2, we will consider here the testing of life measures (i.e., characteristic life and mean time between failures). To begin, let us consider the simplest reliability demonstration test, the zero failure test.

5.1 Zero Failure Tests

A zero failure test requires a given number of samples to be tested for a specified time. If no failures occur, the product is accepted as meeting reliability requirements. The determination of sample size and test length is accomplished through consideration of the product's reliability function. For example, the reliability function of a product following a Weibull distribution of failures is:

R = e^(-(t/θ)^β)    (5-1)

where:
R = probability of failure-free operation
t = operating time
θ = characteristic life (θ > 0)
β = shape parameter (β > 0)

When several products are tested together,

R = e^(-n(t/θ)^β)    (5-2)

where:
n = number of products tested
other terms as above

To demonstrate reliability, we would select a characteristic life which represents the minimum acceptable value (or, perhaps better, the highest value that we would call "bad"), and determine the risk we are willing to take of accepting a product with that characteristic life. That

risk is the probability of the "bad" product having no failures during the test. Then, assuming we have an estimate for β and a known sample size:

Risk = e^(-n(t/θ)^β)    (5-3)

The equation is solved for "t", the only unknown. Thus, if we test "n" samples for time "t" and have no failures, we can accept the product with the predetermined risk that we may have accepted products with the defined "bad" characteristic life. Products with higher characteristic lives will have a higher probability of passing the test, and products with lower lives will have a lesser probability of passing. For example, if we are willing to take a 10% risk of accepting products with a defined "bad" θ, know that β = 2, and can test 100 units:

0.10 = e^(-100(t/θ)^2)    (5-4)

Which gives:

ln(0.10) = -100(t/θ)^2
-2.3 = -100(t/θ)^2
-2.3 / -100 = (t/θ)^2 = 0.023
t/θ = √0.023 = 0.15
t = 0.15(θ)    (5-5)

Thus the samples are tested for a time (t) equal to 0.15(θ). If no failures occur, the product is accepted.

The zero failure test procedure will work with any distribution for which a reliability function can be defined and all parameters identified, though some distributions, such as the Weibull, are easier to work with than others. The exponential distribution is the easiest of all. Under the Weibull assumption, all samples are tested for the same operating time. When the exponential distribution of failures is assumed (i.e., the failure rate is assumed constant), equal test times are not necessary, as we will discuss in Section 5.3. However, we will first discuss the derivation of tests in which some failures are allowed, in Section 5.2.

It should be noted that the only risk considered in our discussion so far has been the "consumer's risk" (i.e., the probability of a "bad" product passing the test). There was no consideration of the probability of a "good" product failing the test (the "producer's risk"). We shall also discuss this in Section 5.2.
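The derivation of Equation 5-5 generalizes directly; a minimal sketch (the function name is assumed, not from the text):

```python
# Sketch (function name assumed): solving Equation 5-3 for the zero-failure
# test time, expressed as a multiple of the "bad" characteristic life theta.
import math

def zero_failure_test_time(risk, n, beta):
    """Return t/theta such that risk = exp(-n * (t/theta)**beta)."""
    return (-math.log(risk) / n) ** (1.0 / beta)
```

`zero_failure_test_time(0.10, 100, 2)` returns about 0.152, matching the 0.15θ of Equation 5-5.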

5.2 Tests Allowing Failures

If we test a number of units for the same test time on each unit, except for failed units, and do not allow a failed unit to be repaired and returned to the test, we can derive test plans allowing some failures by using the binomial distribution. As discussed in Section 3.1.1, Equation 3-4, repeated here as Equation 5-6, yields the probability of getting "x or less" events in "n" trials.

P(x or less) = Σ(0 to x) {n! / [x! (n - x)!]} p^x q^(n-x)    (5-6)

where:
p = probability of an event happening in one trial
q = probability of the event not happening in one trial
n = number of trials

If we let an event be a failure, then the probability of a failure, "p", is equal to one minus the reliability function (1 - e^(-(t/θ)^β)), and the probability of no failure, "q", is equal (by definition) to the reliability function (e^(-(t/θ)^β)). Letting "n" equal the number of units on test, Equation 5-7 gives the probability of "x" or less failures when all units are operated for "t" time units unless prevented by failure.

P(x or less) = Σ(0 to x) {n! / [x! (n - x)!]} [1 - e^(-(t/θ)^β)]^x [e^(-(t/θ)^β)]^(n-x)    (5-7)
We then set P(x or less) equal to the risk we are willing to take of accepting products with a characteristic life equal to the "bad" θ, and solve for "t" given any desired value of "x". This is obviously not a closed-form solution, and would ordinarily be done by trial-and-error iteration using a computer. It becomes even more difficult when the producer's risk is considered, as discussed in Section 5.2.1.

5.2.1 Controlling the Producer's Risks

The producer's risk is the probability of a test rejecting "good" products. A "good" product may be one with an MTBF clearly acceptable for the mission, or equal to the state-of-the-art for the product, or just something that is an arbitrary ratio higher than the "bad" MTBF that we want to reject. It should always be possible to achieve the "good" MTBF with reasonable effort. It warrants consideration because a test based solely on satisfactory consumer's risks may unintentionally provide a high risk of rejecting "good" products. Though this is called the producer's risk (for obvious reasons), it does the consumer no good to reject satisfactory products.

Equation 5-7 yields the consumer's risk when θ is equal to the "bad" MTBF. The producer's risk is calculated from Equation 5-8, when θ is equal to the "good" MTBF.

P(x + 1 or more) = Σ(x+1 to n) {n! / [x! (n - x)!]} [1 - e^(-(t/θ)^β)]^x [e^(-(t/θ)^β)]^(n-x)    (5-8)

To formulate a test with satisfactory producer's and consumer's risks, one defines the values of both risks, and solves for values of "n" and "t" that satisfy both risk equations. One way is the procedure shown in Figure 5-1.
Start
1. Calculate the time needed to satisfy the consumer's risk in a zero-failure test.
2. Calculate the probability that a value of life considered good would be rejected (producer's risk) using the zero-failure test time.
3. If the producer's risk is too high, assume a one-failure test and re-compute the test time for a satisfactory consumer's risk.
4. Calculate the probability that a value of life considered good would be rejected (producer's risk) using the one-failure test time.
5. If the producer's risk is still too high, allow another failure, recompute the test time and recalculate the probability until an acceptable producer's risk is achieved.
End

Figure 5-1: Devising a Reliability Test

5.3 Testing Under the Exponential Distribution

When the exponential distribution of failures applies (i.e., the failure rate is constant), the reliability function is:
R = e^(-t/θ)    (5-9)

where:
R = probability of zero failures in time (t)
θ = the mean time between failures

Since the failure rate does not change with time, it does not matter whether or not the test units have equal operating times. The test time "t" can therefore be set to the sum of the test time among all the units. In addition, unlike the tests based on the Weibull distribution, failures can be repaired and the failed unit returned to test. Equation 5-9 can then be used to determine a zero failure test as shown in Section 5.1. Since the assumptions we used in Section 5.2 do not hold, the binomial distribution cannot be used to derive tests allowing one or more failures. However, the assumption of a constant failure rate permits us to derive tests allowing failures by using the Poisson formula.

P(n) = (u)^n e^(-u) / n!    (5-10)

where:
P(n) = the probability of exactly n events
u = the expected number of events

Since the number of failures expected in time (t) = t/θ, the probability of exactly n failures in a test with time equal to t is:

P(n) = (t/θ)^n e^(-t/θ) / n!    (5-11)

If we establish a test where the sum of test time = t and up to n failures are allowed, the probability of passing the test (i.e., the consumer's risk, when θ = the "bad" MTBF) is:

P(n or less) = Σ(0 to n) (t/θ)^n e^(-t/θ) / n!    (5-12)

Thus, we can establish a set of tests with different numbers of allowable failures, all of which provide the same consumer's risk. We shall need this capability in order to formulate tests that have satisfactory producer's risks as well as satisfactory consumer's risks. Based on the Poisson formula, the producer's risk for a test time of "t" with "n" failures allowed is:

P(reject) = 1 - Σ(0 to n) (t/θ)^n e^(-t/θ) / n!    (5-13)

where:
θ = "good" MTBF
other terms as defined previously

We can use the procedure shown in Figure 5-1 to derive tests with satisfactory producer's and consumer's risks. However, there is no need to go through this routine. Table 5-1 tabulates test plans for the most reasonable combinations of risk and the ratio of the "good" MTBF to the "bad" MTBF (called the "discrimination ratio"). In the table, θ0 is the "good" MTBF and θ1 the "bad" MTBF. (Note: in some references θ0 designates the "bad" MTBF and θ1 the "good" MTBF.)
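Equations 5-12 and 5-13 can be sketched directly from the cumulative Poisson sum (function names assumed, not from the text):

```python
# Sketch of Equations 5-12 and 5-13 (function names assumed).
import math

def poisson_cdf(n, u):
    """Probability of n or fewer events when u are expected."""
    return sum(u**k * math.exp(-u) / math.factorial(k) for k in range(n + 1))

def fixed_time_risks(t, theta1, theta0, n_allowed):
    """(consumer's risk, producer's risk) for a fixed-time test of total time
    t allowing n_allowed failures; theta1 = "bad" MTBF, theta0 = "good" MTBF."""
    consumers = poisson_cdf(n_allowed, t / theta1)        # Eq. 5-12
    producers = 1.0 - poisson_cdf(n_allowed, t / theta0)  # Eq. 5-13
    return consumers, producers
```

For the Table 5-1 plan with a 3.0 discrimination ratio, 4.3 x θ1 duration and 2 failures allowed, this returns risks near 20% and 17%, consistent with that plan's nominal 20/20 risks.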

Table 5-1: Fixed Time Reliability Tests


Nominal Decision Risks        Discrimination    Test Duration    Accept-Reject Criteria
Producer's    Consumer's      Ratio (θ0/θ1)     (x θ1)           Reject              Accept
Risk (%)      Risk (%)                                           (Failures or More)  (Failures or Less)
10            10              1.5               45.0             37                  36
10            20              1.5               29.9             26                  25
20            20              1.5               21.5             18                  17
10            10              2.0               18.8             14                  13
10            20              2.0               12.4             10                   9
20            20              2.0                7.8              6                   5
10            10              3.0                9.3              6                   5
10            20              3.0                5.4              4                   3
20            20              3.0                4.3              3                   2
30            30              1.5                8.0              7                   6
30            30              2.0                3.7              3                   2
30            30              3.0                1.1              1                   0

Table 5-1 is taken from MIL-HDBK-781, Reliability Test Methods, Plans, and Environments for Engineering Development, Qualification and Production, which is the most comprehensive reference on reliability testing.

5.3.1 Sequential Tests: A Short Cut

As Table 5-1 shows, small values of risk and small discrimination ratios can result in long tests. Where this is unsatisfactory, a probability ratio sequential test may be used. The sequential test is based on the ratio of two probabilities: (1) the probability that a combination of failures and test time would occur when the test units actually have the "bad" MTBF, and (2) the probability of its occurrence when the test units have the "good" MTBF. If the former is satisfactorily higher than the latter, a reject decision can be made. Conversely, when a combination of failures and test time is a predetermined number of times more likely to have occurred from a test of "good" units than from a test of "bad" units, an accept decision can be made. Where the ratio of the probabilities is not great enough to make a decision, the test continues to an arbitrary truncation point (used to assure that the test ends in a reasonable time). Figure 5-2 shows the form of the test, which will permit more rapid decisions than the fixed time test when the true MTBF of the test units is much closer to one of the defined "good" or "bad" values than it is to the other. A summary of sequential test plans for the most common risks and discrimination ratios is presented in Table 5-2. More details (i.e., decision times for each failure) will be needed for application, as provided in Table 5-3 for a test with both risks approximately 10% and a discrimination ratio of 2.0. Details on other sequential tests are available in MIL-HDBK-781.

[Figure 5-2: a plot of cumulative failures (0 to 9) versus cumulative test time (0 to 800 hours), divided into "Reject", "Continue Test", and "Accept" regions by two parallel boundary lines; the test data path ends in a decision when it crosses either boundary.]

Figure 5-2: Typical Sequential Test

Table 5-2: Sequential Tests


Nominal Decision Risks        Discrimination    Time to Accept Decision (Multiples of θ1)
Producer's    Consumer's      Ratio (θ0/θ1)     Minimum    Expected(1)    Maximum(2)
Risk (%)      Risk (%)
10            10              1.5               6.6        25.95          49.5
20            20              1.5               4.19       11.4           21.9
10            10              2.0               4.40       10.2           20.6
20            20              2.0               2.80        4.8            9.74
10            10              3.0               3.75        6.0           10.35
20            20              3.0               2.67        3.42           4.5
30            30              1.5               3.15        5.1            6.8
30            30              2.0               1.72        2.6            4.5

Notes:
1. Expected time for a true MTBF equal to θ0
2. Arbitrary truncation point

Table 5-3: Sequential Test Plan for 10% Risks, 2.0 Discrimination Ratio
Number of    Reject if t ≤ θ1 Times:    Accept if t ≥ θ1 Times:
Failures
0            N/A                        4.40
1            N/A                        5.79
2            N/A                        7.18
3            0.70                       8.56
4            2.08                       9.94
5            3.48                       11.34
6            4.86                       12.72
7            6.24                       14.10
8            7.63                       15.49
9            9.02                       16.88
10           10.40                      18.26
11           11.79                      19.65
12           13.18                      20.60
13           14.56                      20.60
14           15.94                      20.60
15           17.34                      20.60
16           20.60                      N/A

As an illustration, assume we put a number of products on test using Table 5-3 for our accept-reject criteria, and θ1 = 100 hours. The first opportunity to accept the product would occur when we accrue 440 hours (4.4 x θ1) among all the units on test. If there were no failures at that time, we would accept the product. If we had one failure before 440 hours were accumulated, we would have to wait until 579 hours to accept the product, presuming it did not fail a second time. The first opportunity to reject the product would be when three failures occurred, if we had accrued 70 hours (0.70 x θ1) or less among the units on test. If we had more time than 70 hours accumulated at the third failure, we could not make a reject decision until the fourth failure, and then only if it occurred before we had more than 208 hours. When the failures and times do not permit a decision to accept or to reject, the test continues. An arbitrary truncation at 16 failures (reject) or 2,060 hours (accept), whichever comes first, prevents the test from continuing indefinitely.

5.4 Other Test Considerations

There is more to reliability testing (or, for that matter, any testing) than the statistics involved. Some other considerations are:

Definition of Failure: What is a failure? Is a transient event a problem? How much degradation is acceptable?

Test Environment: Are the temperature, shock, vibration, thermal and power cycling, and other test conditions a realistic representation of the environment the product will face in its expected use?

Product Configuration and Operation: Is the product on test truly representative of the product of interest? Is it being exercised to the same degree?

Monitoring: How often will it be checked for proper functioning, and how?

Failure Analysis: Will failures be analyzed for root cause and corrective action recommended?

ESS and Preventive Maintenance: Do the products on test have the same benefits of environmental stress screening (ESS) and preventive maintenance that the production units will have (and no better)?

Special Ground Rules: If the number of failures is acceptable, but a pattern of failures exists, must the cause for the pattern be found and rectified before the product is accepted?

These considerations are beyond the scope of this text. However, MIL-HDBK-781, Reliability Test Methods, Plans and Environments for Engineering Development, Qualification, and Production covers them in great detail.
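The accept-reject bookkeeping in the Table 5-3 illustration above is simple to automate. The sketch below (in Python, which is not part of the handbook) encodes only the handful of decision points quoted in the text; a real implementation would carry the full table, which has an entry for every failure count.

```python
# Sequential-test bookkeeping for the illustration above, with theta1 = 100
# hours.  Only the decision points quoted from Table 5-3 are encoded here.
THETA1 = 100.0
ACCEPT_AT = {0: 4.40, 1: 5.79}      # accept once this many multiples of theta1 accrue
REJECT_AT = {3: 0.70, 4: 2.08}      # reject if the k-th failure occurs at or before this
TRUNCATE_FAILURES, TRUNCATE_TIME = 16, 20.60

def decision(failures, hours):
    """Return 'accept', 'reject', or 'continue' for the current test state."""
    t = hours / THETA1              # accrued time in multiples of theta1
    if failures >= TRUNCATE_FAILURES:
        return "reject"
    if t >= TRUNCATE_TIME:
        return "accept"
    if failures in ACCEPT_AT and t >= ACCEPT_AT[failures]:
        return "accept"
    if failures in REJECT_AT and t <= REJECT_AT[failures]:
        return "reject"
    return "continue"

print(decision(0, 440))   # accept: 440 hours with no failures
print(decision(3, 65))    # reject: third failure with only 65 hours accrued
print(decision(2, 300))   # continue: no decision possible yet
```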

Section 6: Reliability Growth Testing

6.0 RELIABILITY GROWTH TESTING

Reliability growth is the improvement in reliability of a product as design defects are found and eliminated. This can be done using data from all operational tests of the product and/or by dedicated reliability growth testing. In either case, it is of interest to estimate when the reliability will have grown to a satisfactory value. Various models have been proposed, and the two most popular are based on fitting the data to a straight line on log-log scales. Both methods will be explained: the first, the Duane model, because its solution uses least square regression, a method of broad utility; the other, the AMSAA/Crow model, because it assumes an underlying distribution to the data. Both of these are described, with others, in MIL-HDBK-189, Reliability Growth Management.

6.1 Duane Growth Analysis

The first formal reliability growth model was created by James T. Duane, who noted that failure rate data plotted as a straight line against operating time on log-log scales. Predicting how much operating time would be required to achieve a desired failure rate was therefore a matter of fitting an equation to the data and solving the equation, finding time for a given failure rate. Duane's equations for reliability growth are:

λcum = KT^(-α)  (6-1)

where:
λcum = cumulative failure rate
K = initial failure rate
T = test time
α = growth rate

By finding the rate of change in the number of failures against time,

λinst = K(1 - α)T^(-α)  (6-2)

where λinst = instantaneous failure rate at time T (the failure rate expected if reliability growth stops at time = T).

λcum and λinst plot as parallel straight lines on log-log graph paper. One can, of course, obtain a solution by drawing a straight line through the data points, extending the instantaneous failure rate curve until it intersects with the desired failure rate, and reading the corresponding point on the time axis. However, given any spread in the data, placing the straight line becomes rather arbitrary. The parameters of the equations are therefore determined by a statistical analysis method called least square regression.
6.1.1 Least Square Regression

Least square regression is a way of finding a straight line fitting a set of data believed to be linear, but showing scatter. Basically, it minimizes the sum of the squares of the distances between the data points and the line. The basic equation for any straight line plot is:

Y = mX + b  (6-3)

To determine the equation from a set of paired data points using the method of least squares:

m = [ΣXiYi - (ΣXi)(ΣYi)/n] / [ΣXi² - (ΣXi)²/n]  (6-4)

and

b = (ΣYi - mΣXi)/n  (6-5)

Relating this general procedure to Duane's equations:

Y = log of λcum
b = log K
m = -α
X = log of cumulative time

As an example, we will perform the least squares regression on the set of data shown in Table 6-1.

Table 6-1: Growth Data

Cum.      Cum. Test    λcum        Log Cum.     Log λcum
Failures    Time     (Fail./Hr.)   Time (Xi)      (Yi)        Xi²       XiYi
   1           1       1.0         0             0           0          0
   2           4       0.500       0.60206      -0.30102     0.36248   -0.1812
   3           8       0.375       0.90309      -0.42597     0.81557   -0.3847
   4          13       0.308       1.1139       -0.51145     1.2408    -0.5697
   5          20       0.250       1.3010       -0.60206     1.6926    -0.7833
   6          30       0.200       1.4771       -0.69897     2.1819    -1.032
   7          42       0.167       1.6232       -0.77728     2.6348    -1.262
   8          57       0.140       1.7559       -0.85387     3.0832    -1.499
   9          78       0.115       1.8921       -0.93930     3.5797    -1.777
  10         104       0.0962      2.0170       -1.0168      4.0683    -2.051
  11         136       0.0809      2.1335       -1.0921      4.5518    -2.330
  12         177       0.0678      2.2480       -1.1688      5.0535    -2.627
  13         228       0.0570      2.3579       -1.2441      5.5597    -2.933
  14         292       0.0479      2.4654       -1.3197      6.0782    -3.253
  15         372       0.0403      2.5705       -1.3947      6.6075    -3.585
  16         473       0.0338      2.6749       -1.4711      7.1551    -3.935
  17         599       0.0284      2.7774       -1.5467      7.7140    -4.296
  18         757       0.0238      2.8791       -1.6234      8.2892    -4.674
  19         956       0.0199      2.9805       -1.7011      8.8834    -5.070
  20        1205       0.0166      3.0810       -1.7799      9.4926    -5.484
  21        1518       0.0138      3.1813       -1.8601     10.121     -5.918
  22        1879       0.0117      3.2739       -1.9318     10.718     -6.325
  23        2262       0.0102      3.3545       -1.9914     11.253     -6.680
  24        2668       0.00899     3.4262       -2.0462     11.739     -7.011
  25        3099       0.00807     3.4912       -2.0931     12.188     -7.307

ΣXi = 55.58    ΣYi = -30.39    ΣXi² = 145.1    ΣXiYi = -80.97

Using the values from Table 6-1:

m = -α = [-80.97 - (55.58)(-30.39)/25] / [145.1 - (55.58)²/25] = -0.62, or α = 0.62  (6-6)

b = log(K) = [(-30.39) - (-0.62)(55.58)]/25 = 0.16, or K = 10^0.16 = 1.45  (6-7)

A typical Duane plot is shown in Figure 6-1. The cumulative failure rate is plotted as a straight line on the log-log scale, and the instantaneous failure rate is plotted by another straight line parallel to the cumulative failure rate. Numerically, at the time of the last failure, the cumulative failure rate, calculated from the data using Equation 6-1, is:

λcum = KT^(-α) = 1.45(3,099)^(-0.62) = 1.45 x 0.0068 = 0.00986  (6-8)

[Figure 6-1: Typical Duane Plot. Failure rate (failures/hour) versus cumulative test time on log-log scales; the cumulative and instantaneous failure rates plot as parallel straight lines.]

The instantaneous failure rate, using Equation 6-2, is:

λinst = K(1 - α)T^(-α) = 1.45(1 - 0.62)(3,099)^(-0.62) = 0.0037  (6-9)
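For readers who would rather let a computer do the arithmetic, the entire Duane analysis of Sections 6.1 and 6.1.1 can be reproduced in a few lines. The following sketch (in Python, which is not part of the handbook) fits Equations 6-4 and 6-5 to the Table 6-1 failure times and then evaluates Equations 6-1 and 6-2; small differences from the text's numbers arise because the text rounds the slope to -0.62 before computing K.

```python
import math

# Cumulative test time at each of the 25 failures (Table 6-1)
times = [1, 4, 8, 13, 20, 30, 42, 57, 78, 104, 136, 177, 228, 292, 372,
         473, 599, 757, 956, 1205, 1518, 1879, 2262, 2668, 3099]

# Build the log-log data pairs: X = log(cum time), Y = log(cum failure rate)
x = [math.log10(t) for t in times]
y = [math.log10((i + 1) / t) for i, t in enumerate(times)]
n = len(times)

# Least squares slope and intercept (Equations 6-4 and 6-5)
sx, sy = sum(x), sum(y)
sxx = sum(v * v for v in x)
sxy = sum(a * b for a, b in zip(x, y))
m = (sxy - sx * sy / n) / (sxx - sx ** 2 / n)   # slope = -alpha
b = (sy - m * sx) / n                           # intercept = log K

alpha, K = -m, 10 ** b
T = times[-1]
lam_cum = K * T ** (-alpha)                     # Equation 6-1
lam_inst = K * (1 - alpha) * T ** (-alpha)      # Equation 6-2

print(round(alpha, 2), round(K, 2))             # 0.62 1.48 (the text's 1.45 rounds the slope first)
print(round(lam_cum, 4), round(lam_inst, 4))    # 0.0099 0.0037
```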

6.2 AMSAA Growth Analysis

Dr. Larry Crow, then at the Army Material Systems Analysis Agency (AMSAA), devised another approach to reliability growth analysis based on the assumption that reliability growth is a non-homogeneous Poisson process with a Weibull intensity function (Translation: a process in which the number of failures follows a Poisson distribution, but the Poisson distribution changes with time in such a manner that the instantaneous failure rate resembles a Weibull function). Under this assumption:

λcum = λT^(β-1)  (6-10)

λinst = λβT^(β-1)  (6-11)

where:
β = growth rate
λ = initial failure rate
T = test time
The parameters of the AMSAA/Crow model are determined by what is called a maximum likelihood method:

β = N / Σ(i=1 to N) ln(T/Xi)  (6-12)

where:
N = no. of recorded failures
T = total test time (= Xn when test ends at a failure)
Xi = time when an individual failure occurs

and:

λ = N/T^β  (6-13)

Using the AMSAA model on the same data that we applied the Duane model yields the results shown in Table 6-2.

Table 6-2: Growth Data Revisited

Cumulative       Cumulative Test
Failure Count    Time at Failure (Xi)     Xn/Xi       ln(Xn/Xi)
      1                   1              3099          8.0388
      2                   4               774.75       6.6525
      3                   8               387.38       5.9594
      4                  13               238.38       5.4739
      5                  20               154.95       5.0431
      6                  30               103.30       4.6376
      7                  42                73.786      4.3012
      8                  57                54.368      3.9958
      9                  78                39.731      3.6821
     10                 104                29.798      3.3944
     11                 136                22.787      3.1262
     12                 177                17.508      2.8627
     13                 228                13.592      2.6095
     14                 292                10.613      2.3621
     15                 372                 8.3306     2.1199
     16                 473                 6.5518     1.8797
     17                 599                 5.1736     1.6436
     18                 757                 4.0938     1.4095
     19                 956                 3.2416     1.1761
     20                1205                 2.5718     0.94460
     21                1518                 2.0415     0.71369
     22                1879                 1.6493     0.50034
     23                2262                 1.3700     0.31483
     24                2668                 1.1615     0.14975
     25                3099                 1.0000     0

Σ(i=1 to 24) ln(Xn/Xi) = 72.99
β = 25/72.99 = 0.34  (6-14)

λ = 25/(3,099)^0.34 = 1.625  (6-15)

At the end of test, using Equations 6-10 and 6-11:

λcum = λT^(β-1) = (1.625)(3,099)^(0.34-1) = 0.008066 failures/hour  (6-16)

λinst = βλT^(β-1) = (0.34)(1.625)(3,099)^(0.34-1) = 0.0027 failures/hour  (6-17)
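The AMSAA/Crow computation is even shorter to reproduce. A sketch (again in Python, not part of the handbook) using the same failure times; the small differences from Equations 6-15 and 6-17 arise because the text rounds β to 0.34 before computing λ.

```python
import math

# Cumulative test time at each of the 25 failures (Table 6-2); the test
# ends at the last failure, so T = Xn = 3,099 hours.
times = [1, 4, 8, 13, 20, 30, 42, 57, 78, 104, 136, 177, 228, 292, 372,
         473, 599, 757, 956, 1205, 1518, 1879, 2262, 2668, 3099]
N = len(times)
T = times[-1]

# Maximum likelihood estimates (Equations 6-12 and 6-13)
beta = N / sum(math.log(T / x) for x in times)
lam = N / T ** beta

# Failure rates at end of test (Equations 6-10 and 6-11)
lam_cum = lam * T ** (beta - 1)          # algebraically reduces to N/T
lam_inst = beta * lam * T ** (beta - 1)

print(round(beta, 2))                    # 0.34
print(round(lam, 2))                     # 1.59 (the text's 1.625 uses beta rounded to 0.34)
print(round(lam_cum, 6), round(lam_inst, 5))   # 0.008067 0.00276
```

Note that λcum at the end of a test that stops at a failure reduces algebraically to N/T, the total failure count divided by the total test time.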

The AMSAA/Crow equations will yield the same type of straight line plots on log-log paper as the Duane equations. However, though we used the same data for both, a comparison of Equations 6-8 and 6-9 to 6-16 and 6-17 shows that the Duane and AMSAA models can lead to different answers. Table 6-3 compares the estimates at the end of the test (3,099 hours).

Table 6-3: Comparison of Estimates

Parameter     Duane Estimate            AMSAA Estimate
Estimated     at 3,099 hrs.             at 3,099 hrs.
λcum          0.00986 failures/hour     0.008066 failures/hour
λinst         0.0037 failures/hour      0.0027 failures/hour

One reason for the differences shown in Table 6-3 might be that the least squares solution treats all the data points equally, and in growth data the early points are not as significant as the later points, because they are based on less data. Or, the assumption of a non-homogeneous Poisson process with a Weibull intensity function may not quite fit. There is no preferred solution. However, there are some advantages to the AMSAA/Crow model simply because a distribution is assumed. For example, confidence intervals can be drawn around the plot by finding the limits of specified areas of a distribution (as we did in Section 4.3.2, for example). Tables of the appropriate distribution for analyzing data from a non-homogeneous Poisson process are found in MIL-HDBK-189, Reliability Growth Management.

Section 7: Sampling (Polling) and Statistical Quality Control

7.0 SAMPLING (POLLING) AND STATISTICAL QUALITY CONTROL

Almost all statistical tools are designed to analyze the characteristics of a population of something from data on a sample taken from that population. In our discussion of measuring reliability, we used confidence limits to better understand the trust we could put into a measured value of life, failure rate or mean-time-between-failures. We also often want to measure quality, in terms of an attribute such as the proportion of products in a distribution containing defects. An analogous problem is to determine the proportion of voters in favor of some proposed government action, and, as we shall see, the same methods of solution apply.

In our discussion of demonstrations, we used tests of hypotheses to decide how confident we were that an accept or reject decision was correct, for measures of life. We also often want to demonstrate the acceptability of a product, in terms of the proportion defective, from sample data. A problem analogous to this, and solved by the same methods, is the demonstration of reliability for a "one-shot" product, as mentioned in Section 4.4. A key assumption in these applications is that the attribute of interest follows the binomial distribution (a product is defective or not, a voter is for or against, etc.).

7.1 Measuring Quality from Samples

The central limit theorem (discussed in Section 3.2.1.2) tells us that when we measure a parameter such as percent defective in an infinite number of samples, the mean of these measurements will be equal to the percent defective in the population from which we took the samples, and the measurements will be distributed normally. (Caveat: for proportions, the normal applies well only when p or (1 - p) > 0.1 and np > 5, where "p" is the proportion defective and "n" is the sample size.) Because we are sampling a parameter distributed binomially:

σ = √[p(1 - p)]  (7-1)

where:
σ = the standard deviation of the population
p = the proportion of interest (defects, yes votes, etc.)

The standard deviation of the measurements, "S", is a function of the standard deviation of the parent population, "σ", and the size of the sample, "n":

S = √[p(1 - p)/n]  (7-2)

where:
S = standard deviation of the distribution of sample measurements taken
n = sample size (no. of parts checked, people polled, etc.)

Since the measurements follow a normal distribution, we can convert it to a standard normal, as discussed in Section 3:

z = (p̄ - p)/S  (7-3)

where:
p̄ = a measurement taken on one sample (of size n)
p = the mean of the measurements on all samples (= the population mean)
S = standard deviation of the measurements

Our next step is to choose a critical value of "z" for the confidence we want. The confidence is equal to the area under the standard normal defined as between "z" and "-z", when "z" equals the critical value. For example, if we desire a 95% confidence, we need the critical value of "z" marking the limits to 95% of the area under the normal curve around the mean. Table 7-1 (a copy of Table 3-3) gives these values for some common areas of interest.

Table 7-1: Critical Values of z

Critical Value of z    Area Between z and -z    Area From z to Infinity
      1.28                     0.80                     0.10
      1.645                    0.90                     0.05
      1.96                     0.95                     0.025
      2.33                     0.98                     0.01
      2.58                     0.99                     0.005

As Table 7-1 shows, in the standard normal distribution 95% of all data fall between -1.96 and +1.96, or:

0.95 = probability that (p̄ - p)/S is between -1.96 and +1.96

where p̄ is the measured proportion in the sample. Put another way, we are 95% sure the true proportion is between p̄ - 1.96S and p̄ + 1.96S.

However, since we don't know p, we also don't know S. Fortunately, we can approximate it by substituting p̄ for p, yielding an expression called the standard error:

SE(p̄) = √[p̄(1 - p̄)/n]  (7-4)

To illustrate this, let us discuss a political poll. Suppose someone polled 1,000 people on some question and found 500 yes votes. This means p̄ = 500/1000 = 0.50. Our 95% confidence limits are therefore:

Lower limit = p̄ - 1.96S
Upper limit = p̄ + 1.96S

Using the Standard Error in place of S:

Lower limit = p̄ - 1.96√[p̄(1 - p̄)/n]  (7-5)

Upper limit = p̄ + 1.96√[p̄(1 - p̄)/n]  (7-6)

Lower limit = 0.5 - 1.96√[0.5(1 - 0.5)/1,000] = 0.469  (7-7)

Upper limit = 0.5 + 1.96√[0.5(1 - 0.5)/1,000] = 0.5309  (7-8)

Thus we are 95% sure that the true value (p) is between 0.469 and 0.5309, which is the measured value (p̄) of 50%, plus or minus about 3%. Note that the latter figure is not 3% of the measured value of p̄ (50%), but 3% of the total number of votes. In polling, this 3% is called the margin of error. The margin of error will change with the desired confidence. As an example, for a 99% confidence interval, the area under the standard normal curve (see Table 7-1) is between ±2.58 standard deviations. Using 2.58 in place of 1.96 in Equations 7-7 and 7-8 yields a margin of error of 4%, for 99% confidence. However, the selection of sample size can have a much more significant effect. We will leave it to the reader to verify, if he (or she) wishes, that a poll of 100 people with the same measured value (0.5), for a 95% confidence interval, would have about a 10% margin of error (0.098 to be more precise), and a poll of 10,000 people with the same result would have a margin of error of about 1%. The expense of polling 10,000 people instead of 1,000 or so is seldom considered worth the greater precision.

The margin of error is also dependent on the value of p̄, but is highest when p̄ = 0.5. For example, when p̄ = 0.1 (or 0.9), the margin of error for 1,000 samples at 95% confidence is 0.0185, or less than 2%, as compared to 3% at p̄ = 0.5.
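These verifications take only a few lines of code. A sketch (the function name is ours, not from the handbook):

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Half-width of the confidence interval for a sampled proportion (Equation 7-4 times z)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# 95% confidence, 500 "yes" out of 1,000 polled
print(round(margin_of_error(0.5, 1000), 3))        # 0.031, i.e., about 3%
# Same poll at 99% confidence (z = 2.58)
print(round(margin_of_error(0.5, 1000, 2.58), 3))  # 0.041, i.e., about 4%
# Sample size effect at 95% confidence
print(round(margin_of_error(0.5, 100), 3))         # 0.098
print(round(margin_of_error(0.5, 10000), 3))       # 0.01, i.e., about 1%
# Margin is smaller away from p_hat = 0.5
print(round(margin_of_error(0.1, 1000), 4))        # 0.0186
```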

7.1.1 Caveats

These conclusions assume that the sample is truly representative of the population of interest. Fortunately for us, making a random selection of products to measure defect rate is a lot easier than assuring that people polled (and their responses) really represent the population of interest.

This method presumes a large sample. If small samples are used (less than 30 or so), the Student t distribution (see Section 3.2.2) should be used instead of the standard normal. This is done by using Student t tables (see Appendix E) in lieu of Standard Normal tables in finding critical values for the desired confidence. For example, we have determined from Table 7-1 that the critical value for 95% confidence using the standard normal is 1.96. Using Appendix E, we would find a value of "t" under the column for 0.975 (when 0.975 of the area under the curve is from -∞ to "t", 1 - 0.975 = 0.025 is in one tail of the curve; hence, both tails contain 2 x 0.025 = 0.05, or the area between "t" and "-t" is 0.95). For a sample size of 4 (three degrees of freedom), this value (from Appendix E) would be 2.353. Using 2.353 instead of 1.96 in Equations 7-7 and 7-8 yields wider confidence limits, reflecting the greater uncertainty caused by small samples. As sample size increases, the differences between Table 7-1 and Appendix E vanish.

Neither the normal nor the Student t distribution apply well when p < 0.1 or np < 5. It is possible to estimate confidence limits using the Poisson distribution when p < 0.1 and the binomial when p < 0.1 and np < 5, but since these are discrete distributions, they are awkward to handle. The reliability engineer may not have enough interest to pursue this further, but quality engineers have done so. The usual practice is to approximate the distributions by a smooth curve (the higher the sample size, the better this works) and use graphical methods devised for finding the confidence limits. Some simple examples of this are presented in Statistical Methods in Engineering and Manufacturing, by John E. Brown, Quality Press, Milwaukee, 1990.

7.2 Demonstrating Acceptability Through Sampling

In Section 7.1, we were concerned with measuring an attribute of a population through a sample. Quite often, we are more concerned with assuring that the value of an attribute is acceptable than knowing the value itself. For this purpose, we are not concerned with measuring to some confidence, but rather testing a hypothesis (the product is acceptable, defined as having better than a stated defect rate, or the product is not acceptable, defined as having worse than a given defect rate) to some confidence (or equivalently, with some acceptable risk of error). Or, we may be concerned with the reliability of a "one-shot" device, and seek to verify that its probability of success when called upon is satisfactory to some desired confidence. The same methods will apply.

A sampling test is defined by sample size and an allowable number of samples with the undesired trait. For example, take a sample of 5 units and allow no defects. The probability of passing this test will depend on the true defect rate of the product. Intuitively, we would expect a product with a defect rate of 1% to pass this test most of the time and a product with a defect rate of 50% to fail most of the time. Statistical analysis is needed to determine what "most of the time" is in terms of probability of passing, especially for products with less extreme defect rates.
Plots of the probability of acceptance versus the actual defect rate of a product are called operating characteristic (OC) curves. An ideal OC curve would look like Figure 7-1, where all product below a specified defect rate would be accepted and all product above it would be rejected. Unfortunately, the real world is not so obliging and the typical shape of an operating characteristic curve is as shown in Figure 7-2.

[Figure 7-1: Ideal O-C Curve. Probability of acceptance versus percent defective, equal to 1 from 0% defective (high quality) up to the specified value and 0 beyond it, out to 100% defective (low quality).]

[Figure 7-2: Practical O-C Curve. Probability of acceptance of a batch declines gradually as percent defective increases from the process average; α = risk of rejecting product with defect rate p1, β = risk of accepting product with defect rate p2.]

Figure 7-2 illustrates two types of risks involved in sampling. A customer who considers p2 the worst defect rate that he can tolerate would like tests that keep β to an acceptably low value. For this reason β is called the consumer's risk. A manufacturer who considers p1 the best defect rate he can practically achieve would like tests that keep α to an acceptably low level. Hence, α is called the producer's risk. These terms have been discussed in Section 5, Reliability

Tests, where we derived tests that considered both risks. Typically, however, sampling plans consider only one risk. A customer doing incoming inspections of suppliers' products will use test plans based on β, while a supplier checking his quality control will use plans based on α. In either case the risks are computed from the binomial formula, Equation 3-1 in Section 3.1.1, reprinted here as Equation 7-9:

P(r) = [n!/(r!(n - r)!)] p^r (1 - p)^(n-r)  (7-9)

where:
p = the proportion defective
n = sample size
r = number defective
P(r) = probability of getting exactly r defective units in a sample of n units

If r or fewer failures are allowed in a test of n units, the probability of passing the test is:

P(accept) = Σ(i=0 to r) [n!/(i!(n - i)!)] p^i (1 - p)^(n-i)  (7-10)

The consumer can set P(accept) = β for p = worst acceptable defect rate and generate a set of plans by setting values of "r" and solving the equation to determine n for each value of "r". These plans are referred to as LTPD Plans. LTPD stands for Lot Tolerance Percent Defective. LTPD plans are designed to reject the designated LTPD value most of the time. The producer can set P(accept) = 1 - α for p = the defect rate he considers realistic for his product, and generate a set of test plans in a similar manner. These plans are referred to as AQL plans. AQL stands for Acceptable Quality Level and AQL plans are designed to accept the designated AQL most of the time.

In practice, rather than solve Equation 7-10, a value of "p" is selected and P(accept) determined for various values of "n" and "r" until a plan giving an acceptable value of P(accept) for a reasonable sample size is produced. This iterative process is an ideal job for a computer, but quite tedious when done manually. However, when "r" = 0 we can solve Equation 7-10 in closed form, and will use this fact to give an example. When "r" = 0, Equation 7-10 simplifies to Equation 7-11:

P(accept) = [n!/(0!(n - 0)!)] p^0 (1 - p)^(n-0) = (1 - p)^n  (7-11)

Taking logarithms of both sides, Equation 7-11 becomes:

log P(accept) = n log(1 - p)  (7-12)

or:

n = log P(accept)/log(1 - p)  (7-13)

Let us assume a consumer wants to be 90% confident that products he receives have an LTPD of 0.15, using a test with no failures allowed. For this case, P(accept) would be the consumer's risk, or one minus the confidence, or (1 - 0.90) = 0.10, and (1 - p) = 1 - 0.15 = 0.85. Using Equation 7-13:

n = log(0.10)/log(0.85) = -1.0/-0.0706 = 14.16  (7-14)

The consumer would therefore test 14 units and accept the product if no failures occurred. Let us suppose the producer of the same product wanted to be 90% confident that the products made had an AQL of 0.05, using a test with no failures allowed. In this case, P(accept) would equal one minus the producer's risk, or the desired confidence (0.90), and (1 - p) = (1 - 0.05) = 0.95. Again using Equation 7-13:

n = log(0.90)/log(0.95) = -0.04576/-0.02228 = 2.05  (7-15)
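Equation 7-13 and the underlying binomial sum of Equation 7-10 are easily evaluated by computer. A sketch of both calculations (the function names are ours, not from the handbook):

```python
import math

def n_zero_failures(p_accept, p):
    """Sample size for a zero-failure plan (Equation 7-13)."""
    return math.log(p_accept) / math.log(1 - p)

def p_accept(n, r, p):
    """Probability of passing when r or fewer defectives are allowed (Equation 7-10)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r + 1))

# LTPD plan: 90% confidence of rejecting product that is 15% defective
print(round(n_zero_failures(0.10, 0.15), 2))   # 14.17 -> test 14 units (the text's 14.16 uses rounded logs)
# AQL plan: 90% chance of accepting product that is 5% defective
print(round(n_zero_failures(0.90, 0.05), 2))   # 2.05 -> test 2 units
# Check: probability a 15%-defective product passes the 14-unit, zero-failure test
print(round(p_accept(14, 0, 0.15), 3))         # 0.103, i.e., about 90% confidence
```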

Hence, the producer would test 2 units and allow no failures.

There are published tabulations of test plans for use by quality professionals. LTPD plans are tabulated in MIL-PRF-19500K, Performance Specification, Semiconductor Devices, General Specification For, in Appendix C, and also in MIL-PRF-38535E, Performance Specification, Integrated Circuits (Microcircuits) Manufacturing, General Specification For, in Appendix D. AQL plans are found in ANSI/ASQC Z1.4-1993, American National Standard, Sampling Procedures and Tables for Inspection by Attributes, available from the American Society for Quality (ASQ). The ANSI/ASQC standard replaces MIL-STD-105, Sampling Procedures, which has been cancelled. The reader should note that these publications are not user-friendly to the neophyte and take a procedural approach which obscures the correlation of the table entries to a confidence limit.

7.3 Statistical Quality Control

A product is created by some process. There is always some variability among products due to inherent variability in the process. When the process is satisfactory, the variability will consistently be between acceptable limits about a target value. Special causes (e.g., an inadequately trained process operator, a bad lot of parts, tool wear, etc.) can change the variability among products in production so that it is no longer within satisfactory limits, is no longer centered about the target value, or both. To help maintain stable processes during production, the discipline of Statistical Quality Control was established. In this section we will discuss these statistical tools and also some statistical tools that can help define an acceptable process.

Statistical Quality Control (SQC) is also known as Statistical Process Control (SPC). The basic assumption is that as long as sample measurements taken periodically during production vary randomly within an expected variance, the process is in control and needs no adjustment. Non-random measurements or indications of variance outside the expected distribution show some special influence at work which can be found and corrected to restore the expected performance. The sample data are plotted and analyzed on control charts. An authoritative reference for SQC is Statistical Quality Control, by E.L. Grant and R.S. Leavenworth, McGraw-Hill Book Co., NY, 1989 (6th Edition).

7.3.1 Control Charts

If samples of a process are taken over time, and a parameter of interest measured, the results can be presented on a run chart, such as Figure 7-3.

[Figure 7-3: Run Chart. Measured values plotted in order for a series of data samples.]

To determine the significance of the data in Figure 7-3, we need to establish an expected variance for the data. Taking advantage of the central limit theorem, we assume that the data will be normally distributed about the process mean, and that 99.7% of the data points should be within plus or minus three standard deviations from the mean (from Appendix C, at z = 3.0, the area from 0 to z is 0.4987, or the area between "z" and "-z" is 2 x 0.4987 = 0.9974 = 99.7%). We can therefore establish control limits at three standard deviations from the mean with little chance that samples from a stable process will exceed them. Should the data plot exceed these limits, we can feel we have good evidence that the process has gone out of control (see Figure 7-4). With the limits in place, the run chart has become a control chart.

Note that in the automotive industry, where many thousands of control charts are used, using control limits encompassing even 99.7% of expected variation would result in too many false alarms. Hence, control limits are often set higher than three standard deviations in automotive plants. Most other applications are satisfied using three standard deviations.
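The 99.7% figure (and the values in Table 7-1) can be checked directly from the normal distribution, without tables. A minimal sketch:

```python
import math

def coverage(z):
    """Fraction of a normal distribution within +/- z standard deviations of the mean."""
    return math.erf(z / math.sqrt(2))

print(round(coverage(3.0), 4))   # 0.9973, the 99.7% used for 3-sigma control limits
print(round(coverage(1.96), 4))  # 0.95, the 95% figure from Table 7-1
```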

[Figure 7-4: Control Chart. The run chart of Figure 7-3 with a centerline at the process mean, an upper control limit (UCL) at mean + 3S, and a lower control limit (LCL) at mean - 3S; a point beyond a limit indicates the process is out of control.]

The control limits are determined from the standard deviation of the distribution of the sample mean measurements. This is obtained by dividing the standard deviation of the process by the square root of the size of one sample. The standard deviation of the process, in turn, can be estimated from the sample data. The procedure for doing this will vary dependent on the distribution of the parameter of interest.

7.3.2 Control Charts for Variables

Controlling variables, such as part diameter, tensile strength, etc., which ordinarily follow a normal distribution, can be done with the aid of a control chart called the X̄ chart. This looks like Figure 7-4. The center line is the process mean determined by the average measurement of a (hopefully large) sample, or the mean of the means of many equal sized samples. The process target can also be used as the centerline, for example a desired part diameter. The upper and lower control limits would be:

UCL = X̄ + 3S/√n  (7-16)

LCL = X̄ - 3S/√n  (7-17)

where:
X̄ = the centerline value
n = the number of units in a sample
S = the process standard deviation

The process standard deviation, S, is determined from sample data. One way to do this is to use the formula:

S = √[Σ(i=1 to n)(Xi - X̄)²/(n - 1)]  (7-18)

where:
Xi = mean of one sample
X̄ = mean of sample means
n = number of samples taken

7.3.3 Range Charts

If we are interested in the mean of a variable such as product diameter, we will also usually be interested in the variation about the mean. If the mean of a sample is on target, but the variation is too great, we do not have a good situation. For this reason, the X̄ chart is often accompanied by a Range chart, as shown in Figure 7-5.

[Figure 7-5: X̄ and R Chart Combination. An X̄ chart (centerline X̄, with UCL and LCL) tracking mean variation, stacked above an R chart (centerline R̄, with UCL and LCL) tracking range variation.]

Range is simply the difference between the highest and lowest measurements of the units in the sample. It is a measure of variation, and, as such, can be used instead of the process standard deviation in setting control limits for both the X̄ and R charts. It is necessary to determine the average range, preferably from the mean of many samples. This value becomes the centerline of the R chart. The centerline of the X̄ chart is determined by the mean of many sample means, as before. However, the control limits for both charts are determined from the following formulas:

UCL(X̄) = X̄ + A2R̄  (7-19)

LCL(X̄) = X̄ - A2R̄  (7-20)

UCL(R) = D4R̄  (7-21)

LCL(R) = D3R̄  (7-22)

where:
UCL(X̄) = upper control limit for X̄ chart
LCL(X̄) = lower control limit for X̄ chart
UCL(R) = upper control limit for Range chart
LCL(R) = lower control limit for Range chart
X̄ = process mean = mean of sample means
A2, D4, D3 = constants (see Table 7-2)

The constants in the control limit formulas have been worked out by statisticians, assuming the processes measured follow a normal distribution, and are presented in Table 7-2.

Table 7-2: Statistical Constants

Sample Size     A2      D3      D4
     2         1.88     0      3.27
     3         1.02     0      2.57
     4         0.73     0      2.28
     5         0.58     0      2.11
     6         0.48     0      2.00
     7         0.42    0.08    1.92
     8         0.37    0.14    1.86
     9         0.34    0.18    1.82
    10         0.31    0.22    1.78
    12         0.27    0.28    1.72
    15         0.22    0.35    1.65
    20         0.18    0.41    1.59

For example, if we are producing rods with a mean length of one inch and find our mean range in many samples is 0.002 inches, then X̄ = 1.000, R̄ = 0.002. For a sample size of three:

UCL(X̄) = X̄ + A2R̄ = 1.000 + 1.02(0.002) = 1.000 + 0.00204 = 1.00204  (7-23)

LCL(X̄) = X̄ - A2R̄ = 1.000 - 1.02(0.002) = 1.000 - 0.00204 = 0.99796  (7-24)

UCL(R) = D4R̄ = 2.57(0.002) = 0.00514  (7-25)

LCL(R) = D3R̄ = 0(0.002) = 0  (7-26)
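The rod example can be checked in a few lines. A sketch carrying an abridged copy of Table 7-2 (the dictionary and function names are ours, not from the handbook):

```python
# X-bar and R chart limits from the constants in Table 7-2.
A2_D3_D4 = {2: (1.88, 0, 3.27), 3: (1.02, 0, 2.57),
            4: (0.73, 0, 2.28), 5: (0.58, 0, 2.11)}   # abridged copy of Table 7-2

def control_limits(x_bar, r_bar, sample_size):
    """Return (UCL_X, LCL_X, UCL_R, LCL_R) per Equations 7-19 through 7-22."""
    a2, d3, d4 = A2_D3_D4[sample_size]
    return (x_bar + a2 * r_bar, x_bar - a2 * r_bar, d4 * r_bar, d3 * r_bar)

# The rod example: mean length 1.000 inch, mean range 0.002 inch, samples of 3
ucl_x, lcl_x, ucl_r, lcl_r = control_limits(1.000, 0.002, 3)
print(round(ucl_x, 5), round(lcl_x, 5))   # 1.00204 0.99796
print(round(ucl_r, 5), round(lcl_r, 5))   # 0.00514 0.0
```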


7.3.4 Interpreting Control Charts

A plot running outside the control limits is not the only sign of unexpected variation. Any plot that does not appear to be random is evidence of loss of control, and often provides clues about the problem. For example, a plot hugging the centerline (all points falling within one standard deviation of the centerline) shows some abnormal condition, probably the misrecording of data because of a fear of reporting measurements off target. In contrast, a plot hugging the control limits, going from close to the Upper Control Limit to close to the Lower Control Limit without any points in between, shows some factor switching the distribution from high to low. One example could be a process done on either of two machines that are adjusted differently. Some possible signs of non-random variation (i.e., trouble) are:

- One or more points outside the control limits
- 10 consecutive points above the center line or below the center line
- 7 consecutive points in a steadily increasing or steadily decreasing pattern
- 14 points alternating up and down
- 15 consecutive points within plus and minus one standard deviation of the center line
- 2 consecutive points in the band between plus two and plus three standard deviations, or in the band between minus two and minus three standard deviations
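Some of these signals are easy to screen for mechanically. The sketch below is a minimal illustration (Python; the function names and sample data are invented for the example, and only the out-of-limits and run-of-10 rules are implemented):

```python
def out_of_limits(points, ucl, lcl):
    """Indices of points falling outside either control limit."""
    return [i for i, y in enumerate(points) if y > ucl or y < lcl]

def run_on_one_side(points, centerline, n=10):
    """True if any n consecutive points fall on the same side of the centerline."""
    for start in range(len(points) - n + 1):
        window = points[start:start + n]
        if all(y > centerline for y in window) or all(y < centerline for y in window):
            return True
    return False

data = [5.1, 5.2, 5.1, 5.3, 5.2, 5.1, 5.2, 5.3, 5.1, 5.2, 4.9]
print(out_of_limits(data, ucl=5.5, lcl=4.95))  # the last point is below the LCL
print(run_on_one_side(data, centerline=5.0))   # the first ten points sit above 5.0
```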

In summary, if the plot does not look random, you have grounds for concern, even if the control limits are not exceeded. However, if the plot is random and between control limits, the process should not be adjusted. If, for example, a machine is making rods and every time a sample measures below the center line an adjustment is made to produce longer rods, and every time a sample measures above the center line an adjustment is made to produce shorter rods, the end result would be to double the natural variance of the process. If the process is in control, do not tinker with it. The only way to improve the results would be to change the process to a better one.

7.3.5 Controlling Attributes

Variables such as dimensions are not always the parameter of interest. We may be concerned with an attribute, which is something that a product either has or lacks, rather than has in some range of values. A ball is red or it is not, for example. Of more interest, a product is defective or it is not. This was discussed in Section 7.1, where we were concerned with measuring the attribute of interest, and in Section 7.2, where we were concerned with demonstrating the attribute of interest. Statistical Quality Control for attributes is most akin to the latter, taking an AQL approach. If defects are our concern, we could be interested in the proportion of defective product or in the defect rate. For the former, we would not care whether or not a product had more than one defect, while for the latter we are interested in such ratios as defects per product, defects per cubic foot, etc. The proportion of defective product is described by a binomial distribution (as discussed in Section 7.2) and the defect rate by a Poisson distribution. Hence, we use different
formulas in determining the control limits. It also makes a difference whether or not a constant sample size is used.

7.3.5.1 Proportions

The control chart for proportions (called a "p" chart) is derived from the value of the proportion of interest, estimated by the number of defective parts divided by the number of parts tested in a large sample. This serves as the centerline for the p chart. The process standard deviation would be the square root of p(1 - p), and the standard deviation of samples containing "n" units would be:

S = √[p(1 - p)/n]    (7-27)

Again setting control limits at plus and minus three standard deviations (of the distribution of sample measurements) from the center line, we have:

UCL = p + 3√[p(1 - p)/n]    (7-28)
LCL = p - 3√[p(1 - p)/n]    (7-29)

It is possible for the calculated LCL to be a negative number. This has no meaning, so the LCL in such cases is set to zero. If the sample size is not constant, the control limits would be different for every sample, as shown in Figure 7-6.
[Figure 7-6: "p" Chart for Different Sample Sizes — the control limits widen for smaller samples and narrow for larger samples, and the LCL is floored at a minimum of zero]
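Equations 7-28 and 7-29, with a negative LCL floored at zero, can be sketched as follows (Python; names are illustrative):

```python
import math

def p_chart_limits(p_bar, n):
    """p-chart control limits per Equations 7-28 and 7-29.

    A negative lower limit has no meaning, so it is floored at zero.
    """
    s = math.sqrt(p_bar * (1 - p_bar) / n)
    return p_bar + 3 * s, max(0.0, p_bar - 3 * s)

# Example: 5% defective on average, samples of 100 units.
ucl, lcl = p_chart_limits(0.05, 100)
print(round(ucl, 4), lcl)  # 0.1154 0.0
```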


7.3.5.2 Rates

If we are interested in the defect rate or defect density, the average defect rate, "u", estimated from a large sample, is the centerline. For rates, the standard deviation of the process would be the square root of the average rate, and the standard deviation of samples of size "n" would be the square root of u/n. Hence, setting the control limits as usual, we have:

UCL = u + 3√(u/n)    (7-30)
LCL = u - 3√(u/n)    (7-31)

For constant sample sizes, the average number of defects, "c", can be used for the centerline. The standard deviation of both the process and the samples' number of defects is the square root of c. Hence:

UCL = c + 3√c    (7-32)
LCL = c - 3√c    (7-33)
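Equations 7-30 through 7-33 can be sketched the same way (Python; names are illustrative, and a negative LCL is floored at zero as with the p chart):

```python
import math

def u_chart_limits(u_bar, n):
    """Defect-rate ("u") chart limits per Equations 7-30 and 7-31."""
    s = math.sqrt(u_bar / n)
    return u_bar + 3 * s, max(0.0, u_bar - 3 * s)

def c_chart_limits(c_bar):
    """Defect-count ("c") chart limits for constant sample sizes,
    per Equations 7-32 and 7-33."""
    s = math.sqrt(c_bar)
    return c_bar + 3 * s, max(0.0, c_bar - 3 * s)

print(u_chart_limits(4.0, 4))  # (7.0, 1.0)
print(c_chart_limits(9.0))     # (18.0, 0.0)
```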

These formulas assume a Poisson distribution for the process, and are most accurate when the sample size is no more than 10% of the size of the population sampled, to assure that the probability of finding a defect remains reasonably constant as samples are selected and removed from the population.

7.3.6 Caveat: "In Control" May Not Be "In-Spec"

It is important to note that a process that is "in control" is not necessarily one meeting specified limits. If it is in control, it is doing as well as can be expected, and no amount of tinkering will improve the product, short of changing the process itself. Necessary changes can be determined by the statistical design of experiments, described in Section 8, and, once a satisfactorily capable process is installed, SQC may be used to monitor its stability. We will now discuss some measures that can be used to determine the capability of a process to create products within specified limits. These measures assume the process is stable, but not necessarily satisfactory. As discussed in Section 3, when the product can be described by a normally distributed parameter, the standard deviation can be estimated from a sample and then used to calculate the proportion of the product outside any given limits. An extension to this procedure is the calculation of figures of merit describing the relationship between the distribution of the parameter of interest and the designated acceptable range for its values.

7.3.6.1 Measuring Process Capability

Process Capability (Cp) is one measure of the ability of a process to produce acceptable products. To compute Cp, it is necessary to determine the standard deviation of the parameter of
interest. This is done by measuring the parameter in a sample of the product and using the formula:

σ = √[Σᵢ₌₁ⁿ (xᵢ - x̄)² / (n - 1)]    (7-34)

where:
σ = standard deviation
xᵢ = the ith measurement
x̄ = mean of all measurements
n = number of measurements

Cp is then calculated by:

Cp = (USL - LSL)/6σ    (7-35)

where:
USL = Upper Specification Limit (highest value considered acceptable)
LSL = Lower Specification Limit (lowest value considered acceptable)
σ = standard deviation

Where the difference between the USL and LSL is 6σ, Cp = 1. This means that all the product from the target value to plus and minus 3σ is within the specified limits, as shown in Figure 7-7.

[Figure 7-7: Process Capability (Cp) Chart — a normal distribution centered on the target, with the LSL and USL at plus and minus 3σ]

As mentioned in Section 7.3.1, 99.7% of a product is distributed between plus and minus 3σ. Therefore, a Cp of 1.0 means that 99.7% of the product is in the range defined as acceptable. In current conditions, this is considered marginal. It is not until Cp equals or exceeds 1.3 that a process is considered good. The "6σ" quality program initiated by Motorola aims for a full 6σ
between the target value and either of the specification limits (a total spread of 12σ between the upper and lower specification limits), a Cp of 2.0. However, Motorola does not assume the process mean will stay centered on the target value, since the mean of a process is also subject to variation. Where the process mean is off target, Cp cannot be translated into percent "in-spec". For this reason, Motorola also utilizes a process performance figure of merit, described in Section 7.3.6.2.

7.3.6.2 Measuring Process Performance

Process Performance (Cpk) measures the capability of a process when the mean value of the parameter of interest is not necessarily on target, as shown in Figure 7-8.

[Figure 7-8: Process Performance (Cpk) Chart — the process mean is offset from the target value, between the lower and upper specification limits]

Cpk is calculated by the formula:
Cpk = Min{(USL - μ); (μ - LSL)}/3σ    (7-36)

where:
Min = smaller of the two values
μ = process mean
other terms as defined above

Motorola's "6σ" goal is a Cp of 2.0 and a Cpk of 1.5. This means that when the process mean is 1.5σ off target, the shortest distance from the process mean to either specification limit is 4.5σ, which equates to 3.4 parts per million "out of spec". The "6σ" philosophy can also be applied to process parameters not described by a normal distribution. For example, taxpayer questions to the IRS are answered either correctly or not, a binomial process. Such cases can be related to the others through the error rate. If the IRS answered incorrectly only 3.4 times in a million queries, their process would be equivalent to a "6σ" process (one having a Cpk of 1.5). Most industrial and commercial practices are, or are equivalent to, about "4σ" processes, having error rates around 6,200 per million opportunities.
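Both figures of merit reduce to one-line functions once the standard deviation is known. A Python sketch (the sample values are illustrative; `statistics.stdev` uses the n - 1 denominator of Equation 7-34):

```python
import statistics

def cp(usl, lsl, sigma):
    """Process capability, Equation 7-35."""
    return (usl - lsl) / (6 * sigma)

def cpk(usl, lsl, mu, sigma):
    """Process performance, Equation 7-36."""
    return min(usl - mu, mu - lsl) / (3 * sigma)

# Equation 7-34: sample standard deviation (n - 1 denominator).
sample = [0.98, 1.00, 1.02, 1.01, 0.99]
sigma_hat = statistics.stdev(sample)

# Motorola's goal: spec limits 6 sigma from the target, mean drifting 1.5 sigma.
sigma, target, shift = 1.0, 0.0, 1.5
usl, lsl = target + 6 * sigma, target - 6 * sigma
print(cp(usl, lsl, sigma))                   # 2.0
print(cpk(usl, lsl, target + shift, sigma))  # 1.5
```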

8.0 USING STATISTICS TO IMPROVE PROCESSES

If a process is in control (see Section 7), the only way to improve it is to change it. A new process may be created, or, more often, the parameters of the process are changed. For example, to improve a wave solder process, we might change the temperature of the solder, the height of the solder wave, or add flux to the mixture. We could test the effects of each possible change individually, but this is inefficient and would not show any interactions between factors (e.g., perhaps a higher solder wave works better with colder solder than with hotter solder). To most efficiently find desired improvements, and to examine interactions of factors if necessary, we can employ the statistical design of experiments (DOE). We can use DOE to find the optimum process parameters for a defined use environment, or adapt it to find a robust design (i.e., one well suited for a range of use environments). We can also evaluate the significance of our results using tools from a family of methods called the analysis of variance (ANOVA).

8.1 Designing Experiments

The first consideration in designing an experiment is the identification of the factors to be tested. These are the process parameters that we can control and that we believe will affect the product. Examples given above were the temperature, wave height, and use of flux in a wave solder process. It is a good idea to have the factors selected by a team of people involved with the process. The next step is to set factor levels to be used as test settings. We will need a high and a low setting for each factor. It is possible to use more than two levels of a factor, but much simpler to use only two. The two settings should be close enough together to assure that any difference in the outcome caused by the factors is reasonably linear between the high and low test settings, and far enough apart so any effects are noticeable.
"High" and "low" are arbitrary terms, and it does not matter if the "low" value (of temperature, for example) is actually greater than the "high" value. For parameters that are either present or absent, like the use of flux, one setting, "high" or "low", will represent the presence of the factor, and the other setting will represent its absence. For analysis purposes, the values are coded, each high setting given the value plus one (indicated on test matrices by "+") and each low setting given the value minus one ("-" on the test matrices). Hence, if the high value of temperature were 120 degrees and the low value were 80 degrees, a setting of 120 degrees would be valued as plus one and one of 80 degrees valued as minus one. A value of zero would correspond to a setting of 100 degrees, which is not a test setting, but may be a solution after analysis (to be discussed later). We also need a measure of the process output. This could be solder defects per card for the wave solder example, rod length for a rod manufacturing process, or miles per gallon for an automobile. Notice we may want to minimize, normalize, or maximize the output. We will do so by setting process parameters to values determined by the outcome of the experiment. The key to test efficiency is the use of "orthogonal arrays". These are matrices describing test settings that allow the effects of each factor to be separated from the others. An example for a two-factor test is given in Table 8-1.


Table 8-1: Two Factor Orthogonal Array

Run    A    B
 1     -    -
 2     +    -
 3     -    +
 4     +    +

In Table 8-1, A and B represent test factors, such as temperature and wave height. A plus in the matrix under a factor means the high setting is used during the corresponding test run, and a minus means that the low test setting is used. Each test run is a repetition of the established test (e.g., processing 100 cards through the wave solder machine) with the settings as shown in the matrix. The orthogonal array of Table 8-1 is a full factorial array in that all combinations of high and low settings for all factors are tested. Such an array will also permit the analysis of all possible interactions between factors as a by-product. Expanding Table 8-1 to include the interaction of the two factors, and providing a column for results, we get the matrix shown in Table 8-2.

Table 8-2: Expanded Test Matrix

Run    A    B    A*B    Results
 1     -    -     +
 2     +    -     -
 3     -    +     -
 4     +    +     +

(A and B are factors set in the test; A*B is the interaction by-product; Results are the measured outcomes.)

Setting both factors A and B to "high" or both to "low" is equivalent to setting their interaction to "high". For other combinations, the interaction is "low". The analysis will not differentiate between factors that are set in the experiment and factors that are defined as by-products. The actual running of the tests should be done so as to minimize the probability of any bias from factors that are not being tested. For example, if there were some concern that the workers involved may make a difference, all the tests would be run with the same workers. Environmental effects such as humidity could be compensated for by repeating the same test run on different days and averaging the outcomes. Ideally, several iterations of each run would be performed in a random order. Once the outcome for each run is determined, a linear regression is performed to determine the optimum settings for the parameters tested. The general form of this is shown in Equation (8-1) for the array shown in Table 8-3.


Table 8-3: Orthogonal Array

Run     A           B           A*B         Results (measured)
 1      -           -            +          y1
 2      +           -            -          y2
 3      -           +            -          y3
 4      +           +            +          y4
AVG-    (y1+y3)/2   (y1+y2)/2   (y2+y3)/2
AVG+    (y2+y4)/2   (y3+y4)/2   (y1+y4)/2
Δ       (AVG+) - (AVG-) for each column

Y = Ȳ + (ΔA/2)A + (ΔB/2)B + (Δ(A*B)/2)(A*B)    (8-1)

where:
Y = expected output
Ȳ = average output = (y1 + y2 + y3 + y4)/4
ΔA = (AVG+) - (AVG-) values from column A in the matrix
A = coded value of A (high setting = +1, low setting = -1)
ΔB, B, and Δ(A*B), (A*B) are similar to ΔA, A
AVG+ = average outcome when the factor in a column is at its high setting
AVG- = average outcome when the factor in a column is at its low setting

Equation 8-1 merely quantifies an assumption that the output will change from its mean value linearly with changes in the factors. The difference between "AVG+" and "AVG-" (Δ) for each factor is the change in Y as the factor varies from -1 to +1 (its low and high values). These values of Δ are used in Equation 8-1, multiplied by the coded values of the factors. If A were zero (the mid value between -1 and +1), it would have no impact in changing the output from its average value (Ȳ). When A is -1, it reduces the output by (ΔA)/2, and when A is +1, it increases the output by (ΔA)/2. Factors B and A*B have similar effects, based on the measured differences in the output, ΔB and Δ(A*B), as the factors vary from low to high. Hence, Equation 8-1 predicts the outcome of a process for any values of the factors between the high and low values tested. The regression equation is then used to find values of the factors (between plus and minus one) which give the desired output (maximum, minimum, or nominal). These values are then translated to settings (e.g., temperature) to be used in the new process. To illustrate, let us use some hypothetical values, as shown in Table 8-4.


Table 8-4: Sample Test Results

Run     A     B     A*B    Y
 1      -     -      +     10
 2      +     -      -     6
 3      -     +      -     8
 4      +     +      +     4
AVG-    9     8      7
AVG+    5     6      7
Δ      -4    -2      0

Ȳ = 7

Y = 7 - 2A - B + 0(A*B) = 7 - 2A - B

Using the equations given in Table 8-3, the outcomes listed in Table 8-4 result in the regression equation Y = 7 - 2A - B. If Y represented some undesirable quantity (e.g., a defect rate), we would want to minimize it. Hence, A and B would be set to their high values. In the equation, the high settings are represented by plus one, so Y = 7 - 2 - 1 = 4. We would expect the average defect rate to change from 7 to 4 by using the high settings of both factors. If the outcome were something we want to maximize, such as miles per gallon, we would maximize the regression equation by setting both A and B to their low settings (-1). In that case, Y = 7 - (-2) - (-1) = 7 + 2 + 1 = 10. We would raise the average gas mileage from 7 to 10 mpg by setting factors A and B to their low values. Suppose the outcomes were a measurement that we wanted to be 5.0. Then we would set the values of A and B between plus and minus one so that Y = 5. One way would be to set A at plus one and B at zero. Hence, Y = 7 - 2 - 0 = 5. A would be set at the high value represented by plus one, and B would be set at a value midway between the high value and the low value. This may not be possible. If B is a parameter which can only be present or absent, it can only take the values plus and minus one, and hence the above solution could not apply. In that case, B could be set to plus one and A set to 0.5 (a setting three quarters of the way from the low setting to the high setting) for Y = 7 - (2 x 0.5) - 1 = 7 - 1 - 1 = 5. The optimal setting should also consider the costs involved. It may, for example, be less costly to change A than to change B. Once the optimal settings are derived, it is good practice to perform a test at those settings to confirm that the expected new output is indeed achieved. If it is not, it indicates there is another factor at work that was not tested. This should be identified and made a factor in a new set of experiments.
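The Table 8-4 analysis is mechanical enough to code directly. A Python sketch (array layout as in Table 8-3; all names are illustrative):

```python
# Coded settings for the full-factorial array of Table 8-3: columns A, B, A*B.
runs = [(-1, -1, +1),
        (+1, -1, -1),
        (-1, +1, -1),
        (+1, +1, +1)]
y = [10, 6, 8, 4]  # outcomes from Table 8-4

y_bar = sum(y) / len(y)

def delta(col):
    """(AVG+) - (AVG-) for one column of the array."""
    hi = [yi for settings, yi in zip(runs, y) if settings[col] == +1]
    lo = [yi for settings, yi in zip(runs, y) if settings[col] == -1]
    return sum(hi) / len(hi) - sum(lo) / len(lo)

deltas = [delta(c) for c in range(3)]  # [-4.0, -2.0, 0.0] as in Table 8-4

def predict(a, b):
    """Equation 8-1 with coded factor values between -1 and +1."""
    return y_bar + deltas[0] / 2 * a + deltas[1] / 2 * b + deltas[2] / 2 * (a * b)

print(predict(+1, +1))   # 4.0: both factors high minimizes the outcome
print(predict(-1, -1))   # 10.0: both factors low maximizes it
print(predict(0.5, 1))   # 5.0: the mixed setting worked out in the text
```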
Once the improvements have been verified and initiated, new experiments can be devised to see if settings outside the range tested can produce further improvements. The simple array of our example can be expanded to handle any number of factors. There will always be 2ⁿ test runs, where n is the number of factors. A three factor, full-factorial matrix would be as shown in Table 8-5.


Table 8-5: Three Factor Full-Factorial Array

      Factors        Interactions (by-products)
Run   A    B    C    A*B   B*C   A*C   A*B*C   Outcomes
 1    -    -    -     +     +     +      -
 2    +    -    -     -     +     -      +
 3    -    +    -     -     -     +      +
 4    +    +    -     +     -     -      -
 5    -    -    +     +     -     -      +
 6    +    -    +     -     -     +      -
 7    -    +    +     -     +     -      -
 8    +    +    +     +     +     +      +

8.1.1 Saturated Arrays: Economical, but Risky

In our numerical example (Table 8-4), the interaction of A and B had no effect. This is often the case. When it is reasonably safe to assume there will be no interactions, a great economy can be achieved by using saturated arrays. In these arrays, the interaction columns are used to determine test settings for additional factors. For example, in the array of Table 8-2, a factor "C" can be tested using the settings in the column representing the by-product A*B, as shown in Table 8-6.

Table 8-6: Saturated Array (Table 8-2 Modified)

Run    A    B    C (replaces A*B)    Results
 1     -    -           +
 2     +    -           -
 3     -    +           -
 4     +    +           +

Thus, three factors can be tested in the same number of tests used for two factors in a full-factorial array. The matrix of Table 8-5 would permit the testing of seven factors in lieu of the three it is designed for. The risk is, of course, that some interaction is significant, and its effects will be confounded with those of the new factor using the same settings. Saturated arrays are also called Taguchi arrays after Genichi Taguchi, a leading proponent of their use. It is also possible to use hybrid arrays in which some of the columns of expected interactions are not used for additional factors, but other columns are. For example, triple interactions are quite rare, so the column in Table 8-5 for A*B*C can often be commandeered for a new factor, provided the new factor is not likely to interact with any of the others. The test will then show all the interactions of the other factors and the effects of the new factor without any additional runs.

8.1.2 Testing for Robustness

We have considered only one outcome per experiment. It is also possible to use multiple outcomes (dimensions, strength, defects) in order to observe the effects of the factors on all important considerations. The best solution would then be the one that provided the best overall results. Another variation is to perform the experiment under differing values of an uncontrollable factor, such as atmospheric pressure. Desired "settings" of uncontrolled factors
can be obtained either by waiting for the factor to assume a desired test value or by using special test equipment, such as environmental chambers, to simulate the factor. Again, the preferred solution is the set of settings for the controllable factors which gives the best overall results. For example, Table 8-7 shows outcomes for three controlled and two uncontrolled variables. If the desired result was the lowest output (it might represent defect rate, for example) and the uncontrolled factors were equally likely to be at their low and high values, the settings for test run 6 would be preferred, even though under some conditions other test settings produced better outcomes.

Table 8-7: Testing for Robustness

      Controlled Factor    Outcomes Under the Four "Settings"
      Settings             of Uncontrolled Factors D and E
Run   A    B    C
 1    -    -    -          9.0   1.8   1.6   7.8
 2    +    -    -          2.6   2.0   2.3   4.8
 3    -    +    -          2.7   2.0   1.5   3.4
 4    +    +    -          2.2   1.5   1.7   1.9
 5    -    -    +          2.4   1.6   1.5   2.9
 6    +    -    +          1.9   1.7   1.7   1.8
 7    -    +    +          3.3   1.6   1.6   3.3
 8    +    +    +          2.6   1.8   1.6   1.9

Finally, Taguchi uses outcomes expressed as signal-to-noise measures, which consider both the mean and the variation in the output. Interested readers are invited to pursue these avenues on their own. A comprehensive basic text on the subject is Understanding Industrial Designed Experiments, by S.R. Schmidt and R.G. Launsby, Air Academy Press, Colorado Springs, CO, 1989.

8.2 Is There Really a Difference?

Recognizing the existence of variance in all data, one should always consider that a difference in two measurements of a parameter, such as the measured outcome of an experiment, might be due to chance and not to other factors. For example, consider the data shown in Table 8-8.

Table 8-8: Defect Data (Defects per 100 Units Produced)

Day of the Week    Day Shift    Night Shift
Monday                 1            3
Tuesday                2            7
Wednesday              6            7
Thursday               4            6
Friday                 7            2
Total                 20           25
Average                4            5

There seems to be a difference between the defect rate of the day shift and that of the night shift. However, there is a wide range in the data for each shift, and we should question the validity of assuming there is really a difference between the shifts. To resolve the issue, we will use a statistical technique from a family of techniques known as ANOVA.

ANOVA stands for Analysis of Variance, and is intended to provide means to separate the influences of many different factors on a parameter of interest. The specific ANOVA application which can determine the significance of the difference between two sets of data requires the following assumptions:

1. The data points follow a normal distribution
2. The data points are independent from each other
3. Variability is about the same in each set of data
4. The data sets we are comparing have the same number of data points

The basic premise we shall use is that if the data sets were really different, there would be a wider variation between the data sets than within the sets. The variance within the data could be estimated from the variance in either data set, but under our assumptions, we can use both data sets and calculate a quantity called the Mean Square Error (MSE) from the formula:

MSE = Σᵢ₌₁ᵏ (n - 1)Sᵢ² / [k(n - 1)]    (8-2)

where:
k = number of data sets
n = number of measurements in each set
Sᵢ² = variance of one data set

and:

Sᵢ² = Σ(yᵢ - ȳᵢ)² / (n - 1)    (8-3)

where:
yᵢ = value of one data point in a data set
ȳᵢ = mean value of data in the set

The term k(n - 1) is called the degrees of freedom provided by the data. From the data in Table 8-8, the degrees of freedom = 2(5 - 1) = 8.


The variance between groups, called the mean square error between groups (MSB), is computed by:

MSB = n·Σᵢ₌₁ᵏ (ȳᵢ - ȳ)² / (k - 1)    (8-4)

where:
ȳ = mean of all data sets, all other terms as defined above
k - 1 = the associated degrees of freedom for MSB = (2 - 1) = 1

If there is a real difference between data sets, the MSB should be greater than the MSE; otherwise the ratio should be close to one. Under the assumptions listed, the ratio of MSB to MSE follows an F distribution. Since this is so, we can use tables of the F distribution, such as the one in Appendix F, to determine whether or not a ratio calculated from the measured data is compatible with a ratio of 1.0 (i.e., the hypothesis that there is no real difference), within a specified risk.

F = MSB/MSE    (8-5)

Using the data in Table 8-8:

MSB = 5[(4 - 4.5)² + (5 - 4.5)²] / (2 - 1) = 2.5    (8-6)

MSE = [(5 - 1)S²day + (5 - 1)S²night] / [2(5 - 1)]    (8-7)

S²day = Σ(yᵢ - ȳday)² / (5 - 1)
      = [(1 - 4)² + (2 - 4)² + (6 - 4)² + (4 - 4)² + (7 - 4)²] / 4
      = [(-3)² + (-2)² + (2)² + (0)² + (3)²] / 4
      = (9 + 4 + 4 + 0 + 9) / 4 = 26/4 = 6.5    (8-8)

S²night = Σ(yᵢ - ȳnight)² / (5 - 1)
        = [(3 - 5)² + (7 - 5)² + (7 - 5)² + (6 - 5)² + (2 - 5)²] / 4
        = [(-2)² + (2)² + (2)² + (1)² + (-3)²] / 4
        = (4 + 4 + 4 + 1 + 9) / 4 = 22/4 = 5.5    (8-9)


Thus:

MSE = [4(6.5) + 4(5.5)] / 8 = (26 + 22)/8 = 48/8 = 6    (8-10)

and:

F = MSB/MSE = 2.5/6 = 0.417
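The same computation can be scripted. A minimal Python sketch of Equations 8-2 through 8-5 applied to the Table 8-8 data (names are illustrative):

```python
def variance(data):
    """Sample variance with the n - 1 denominator, per Equation 8-3."""
    mean = sum(data) / len(data)
    return sum((y - mean) ** 2 for y in data) / (len(data) - 1)

day = [1, 2, 6, 4, 7]
night = [3, 7, 7, 6, 2]

k, n = 2, 5  # two data sets of five measurements each
grand_mean = (sum(day) + sum(night)) / (k * n)
set_means = [sum(day) / n, sum(night) / n]

msb = n * sum((m - grand_mean) ** 2 for m in set_means) / (k - 1)  # Equation 8-4
mse = ((n - 1) * variance(day) + (n - 1) * variance(night)) / (k * (n - 1))  # Eq. 8-2
f = msb / mse  # Equation 8-5

print(msb, mse, round(f, 3))  # 2.5 6.0 0.417
```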

Tables of the F distribution are organized by degrees of freedom and percentiles (see Appendix F). The degrees of freedom are a function of the data, and the percentile is equivalent to the risk we take (i.e., the probability of being wrong). As discussed many times earlier in this text, it marks the border of an area of the distribution equal to our acceptable risk (i.e., the probability of the statistic being in the area cut off is equal to our defined risk when the hypothesis that there is no difference in the data is true). If we are willing to be wrong no more than 5% of the time, we use the 0.05 percentile table. Excerpts from such a table are given in Table 8-9. (To avoid extrapolation, we extracted Table 8-9 from a more extensive table than the one in Appendix F, which does not have data points for eight degrees of freedom.)

Table 8-9: Critical Values for F at 0.05 Significance

Critical values of F for 0.05 risk

Degrees of Freedom    Degrees of Freedom for MSB
for MSE                 1      2      3      4      5
   1                   161    200    216    225    230
   2                   18.5   19.0   19.2   19.2   19.3
   4                   7.71   6.94   6.59   6.39   6.26
   8                   5.32*  4.46   4.07   3.84   3.69
* Critical value for data used in the example

The table lists what are called critical values. If our calculated F statistic exceeds the value in the table for the degrees of freedom provided by the data, we can reject the hypothesis that there is no real difference, with no more than a risk equal to the percentile (5% using the above table). In this case, the table value for 1,8 degrees of freedom is 5.32 and our calculated F statistic is 0.417. Hence, we cannot reject the hypothesis that there is no real difference between the data sets. This procedure can be used to test the significance of differences in outcomes of statistical experiments, to avoid making expensive process changes when the differences are caused by statistical variation rather than by the factors tested.

8.3 How Strong is the Correlation?

Often it is useful to know the relationship between two variables, such as the outcome of an experiment and one of the factors. A simple way is to plot paired measurements on a scatter diagram. For example, if we want to analyze the relationship between the defect rate of a wave
solder process and the solder temperature, we could measure defect rate at various temperatures and plot the results on a chart with temperature as one axis and defect rate as the other. Figure 8-1 shows such a plot.

[Figure 8-1: Scattergram — defect rate plotted against solder temperature]

In interpreting the scatter diagram, it is important to note that the slope of a line drawn through the cloud of data points is an artifact of the scales of the axes. Hence, unlike most charts, the slope of a scattergram is not an indicator of correlation. Rather, the width of the cloud is the indicator: the narrower the cloud, the better the correlation. Although an eyeball analysis may be sufficient for many uses, a quantitative evaluation of correlation is often quite useful. This is easily, if somewhat tediously, accomplished using the correlation coefficient:

r = Σ(Dx·Dy) / √[Σ(Dx)²·Σ(Dy)²]    (8-11)

where:
Dx = the difference between a value of x and the mean of the values of x = (xᵢ - x̄)
Dy = the difference between a value of y and the mean of the values of y = (yᵢ - ȳ)

Perfect correlation would be shown by a correlation coefficient equal to one. A negative result shows an inverse correlation (as one factor increases, the other decreases), and a minus one is a perfect inverse correlation. A figure of zero indicates no relationship between the variables. To illustrate, let us use the data shown in Table 8-10.


Table 8-10: Paired Data

Data Pair   x (Temperature)   y (Defect Rate)
    1             250              0.030
    2             255              0.025
    3             260              0.040
    4             265              0.030
    5             270              0.020
    6             275              0.010
    7             280              0.035
    8             285              0.020
    9             290              0.010
   10             295              0.015

The scattergram for this data is given in Figure 8-2.


[Figure 8-2 is a scatter plot of the Table 8-10 data: defect rate (0.010 to 0.040, vertical axis) against temperature (250 to 295, horizontal axis).]

Figure 8-2: Scattergram of Data in Table 8-10

This appears to show some correlation, but the data cloud is wide, and the correlation may or may not be significant. To resolve the issue, we will compute the correlation coefficient. Computing the terms we need is made easier by using a data table such as Table 8-11.


Table 8-11: Data Analysis

Data Pair    x      y       Dx       Dx²        Dy          Dy²         DxDy
    1       250   0.030   -22.5    506.25    +0.0065    0.00004225   -0.14625
    2       255   0.025   -17.5    306.25    +0.0015    0.00000225   -0.02625
    3       260   0.040   -12.5    156.25    +0.0165    0.00027225   -0.20625
    4       265   0.030    -7.5     56.25    +0.0065    0.00004225   -0.04875
    5       270   0.020    -2.5      6.25    -0.0035    0.00001225   +0.00875
    6       275   0.010    +2.5      6.25    -0.0135    0.00018225   -0.03375
    7       280   0.035    +7.5     56.25    +0.0115    0.00013225   +0.08625
    8       285   0.020   +12.5    156.25    -0.0035    0.00001225   -0.04375
    9       290   0.010   +17.5    306.25    -0.0135    0.00018225   -0.23625
   10       295   0.015   +22.5    506.25    -0.0085    0.00007225   -0.19125

Means:  x̄ = 272.5   ȳ = 0.0235
Sums:   ΣDx² = 2062.5   ΣDy² = 0.0009525   ΣDxDy = -0.8375

Solving Equation 8-11:


r = -0.8375 / √[(2062.5)(0.0009525)] = -0.5975                      (8-12)

The result shows a fair negative correlation between temperature and defect rate: as one goes up, the other tends to go down. However, there is a lot of noise in the data, and predicting one factor from the other can be done only roughly.

Caveat: Note that correlation does not necessarily mean causation. In the example, changes in defect rate may be caused by changes in temperature, but it is possible that both are changing in response to some other factor. For example, the speed of the solder flow might be the ultimate cause of both temperature changes and defects. Also, one can imagine scenarios where the defect rate could cause temperature changes (suppose the operator reacted to defects by adjusting the machine in a way that changes the operating temperature). The moral is that to improve the process, isolation of the truly critical factors is required.
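The hand computation above is easy to check by machine. The following sketch (variable and function names are ours, not from the text) recomputes r for the Table 8-10 data using Equation 8-11:

```python
import math

# Table 8-10 data: solder temperature (x) and defect rate (y)
temps = [250, 255, 260, 265, 270, 275, 280, 285, 290, 295]
defects = [0.030, 0.025, 0.040, 0.030, 0.020, 0.010, 0.035, 0.020, 0.010, 0.015]

def correlation(xs, ys):
    """Correlation coefficient per Equation 8-11."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_dxdy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_dx2 = sum((x - x_bar) ** 2 for x in xs)
    sum_dy2 = sum((y - y_bar) ** 2 for y in ys)
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)

r = correlation(temps, defects)  # roughly -0.6, agreeing with Equation 8-12
```

Any tiny discrepancy from the hand result comes only from rounding in the intermediate table entries.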


9.0 CLOSING COMMENTS

While both statistics and reliability engineering encompass far more than this book attempts to cover, the two disciplines intersect quite often in the problems encountered by the reliability engineer. The Reliability Analysis Center hopes the reader has found this book a relatively painless introduction to the world of statistics, and a useful reference for its practical application in reliability engineering.


Appendix A: Poisson Probabilities

The probability of "x" events occurring, when "a" are expected.
                                        Values of "a"
Values of "x"    0.1      0.2      0.3      0.4      0.5      0.6      0.7      0.8      0.9
      0        0.9048   0.8187   0.7408   0.6703   0.6065   0.5488   0.4966   0.4493   0.4066
      1        0.0905   0.1637   0.2222   0.2681   0.3033   0.3293   0.3476   0.3595   0.3659
      2        0.0045   0.0164   0.0333   0.0536   0.0758   0.0988   0.1217   0.1438   0.1647
      3        0.0002   0.0011   0.0033   0.0072   0.0126   0.0198   0.0284   0.0383   0.0494
      4        0.0000   0.0001   0.0003   0.0007   0.0016   0.0030   0.0050   0.0077   0.0111
      5        0.0000   0.0000   0.0000   0.0001   0.0002   0.0004   0.0006   0.0012   0.0020
      6        0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0001   0.0002   0.0003


                                   Values of "a"
 Values of "x"   1.0      2.0      3.0      4.0      5.0     10.0     15.0     20.0
       0       0.3679   0.1353   0.0498   0.0183   0.0067   0.0000   0.0000   0.0000
       1       0.3679   0.2707   0.1494   0.0733   0.0337   0.0005   0.0000   0.0000
       2       0.1839   0.2707   0.2240   0.1465   0.0842   0.0023   0.0000   0.0000
       3       0.0613   0.1804   0.2240   0.1954   0.1404   0.0076   0.0002   0.0000
       4       0.0153   0.0902   0.1680   0.1954   0.1755   0.0189   0.0006   0.0000
       5       0.0031   0.0361   0.1008   0.1563   0.1755   0.0378   0.0019   0.0001
       6       0.0005   0.0120   0.0504   0.1042   0.1462   0.0631   0.0048   0.0002
       7       0.0001   0.0034   0.0216   0.0595   0.1044   0.0901   0.0104   0.0005
       8       0.0000   0.0009   0.0081   0.0298   0.0653   0.1126   0.0194   0.0013
       9       0.0000   0.0002   0.0027   0.0132   0.0363   0.1251   0.0324   0.0029
      10       0.0000   0.0001   0.0008   0.0053   0.0181   0.1251   0.0486   0.0058
      11                0.0000   0.0002   0.0019   0.0082   0.1137   0.0663   0.0106
      12                0.0000   0.0001   0.0006   0.0034   0.0948   0.0829   0.0176
      13                         0.0001   0.0002   0.0013   0.0729   0.0956   0.0271
      14                         0.0000   0.0001   0.0005   0.0521   0.1024   0.0387
      15                         0.0000   0.0000   0.0002   0.0347   0.1024   0.0516
      16                                  0.0000   0.0001   0.0217   0.0960   0.0646
      17                                  0.0000   0.0000   0.0128   0.0847   0.0760
      18                                           0.0000   0.0071   0.0706   0.0844
      19                                                    0.0037   0.0557   0.0888
      20                                                    0.0019   0.0418   0.0888
      21                                                    0.0009   0.0299   0.0846
      22                                                    0.0004   0.0204   0.0769
      23                                                    0.0001   0.0133   0.0669
      24                                                    0.0001   0.0083   0.0557
      25                                                    0.0000   0.0050   0.0446
      26                                                    0.0000   0.0029   0.0343
      27                                                    0.0000   0.0016   0.0254
      28                                                             0.0009   0.0181
      29                                                             0.0004   0.0125
      30                                                             0.0003   0.0083
      31                                                             0.0001   0.0054
      32                                                             0.0001   0.0034
      33                                                             0.0000   0.0020
      34                                                             0.0000   0.0012
      35                                                             0.0000   0.0007
      36                                                                      0.0004
      37                                                                      0.0003
      38                                                                      0.0002
      39                                                                      0.0001
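Entries in these tables can be regenerated from the Poisson formula, P(x) = a^x e^(-a) / x!. A minimal sketch (the function name is ours):

```python
import math

def poisson_pmf(x, a):
    """Probability of exactly x events when a are expected."""
    return a ** x * math.exp(-a) / math.factorial(x)

# e.g., poisson_pmf(2, 0.5) rounds to 0.0758 and poisson_pmf(4, 3.0)
# rounds to 0.1680, matching the table entries above.
```

Computing values directly this way avoids the interpolation and rounding limits of a printed table.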


Appendix B: Cumulative Poisson Probabilities

The probability of "x or less" events occurring, when "a" are expected.
                                        Values of "a"
Values of "x"    0.1      0.2      0.3      0.4      0.5      0.6      0.7      0.8      0.9
      0        0.9048   0.8187   0.7408   0.6703   0.6065   0.5488   0.4966   0.4493   0.4066
      1        0.9953   0.9824   0.9630   0.9384   0.9098   0.8781   0.8442   0.8088   0.7725
      2        0.9998   0.9988   0.9963   0.9920   0.9856   0.9769   0.9659   0.9526   0.9372
      3        1.0000   0.9999   0.9997   0.9992   0.9982   0.9967   0.9943   0.9909   0.9866
      4        1.0000   1.0000   1.0000   0.9999   0.9998   0.9997   0.9993   0.9986   0.9977
      5        1.0000   1.0000   1.0000   1.0000   1.0000   1.0000   0.9999   0.9998   0.9997
      6        1.0000   1.0000   1.0000   1.0000   1.0000   1.0000   1.0000   1.0000   1.0000


                                   Values of "a"
 Values of "x"   1.0      2.0      3.0      4.0      5.0     10.0     15.0     20.0
       0       0.3679   0.1353   0.0498   0.0183   0.0067   0.0000   0.0000   0.0000
       1       0.7358   0.4060   0.1992   0.0916   0.0404   0.0005   0.0000   0.0000
       2       0.9197   0.6767   0.4232   0.2381   0.1246   0.0028   0.0000   0.0000
       3       0.9810   0.8571   0.6472   0.4335   0.2650   0.0104   0.0002   0.0000
       4       0.9963   0.9473   0.8152   0.6289   0.4405   0.0293   0.0008   0.0000
       5       0.9994   0.9834   0.9160   0.7852   0.6160   0.0671   0.0027   0.0001
       6       0.9999   0.9954   0.9664   0.8894   0.7622   0.1302   0.0075   0.0003
       7       1.0000   0.9988   0.9880   0.9489   0.8666   0.2203   0.0179   0.0008
       8       1.0000   0.9997   0.9961   0.9787   0.9319   0.3329   0.0373   0.0021
       9       1.0000   0.9999   0.9988   0.9919   0.9682   0.4580   0.0697   0.0050
      10                1.0000   0.9996   0.9972   0.9863   0.5831   0.1183   0.0108
      11                1.0000   0.9998   0.9991   0.9945   0.6968   0.1846   0.0214
      12                1.0000   0.9999   0.9997   0.9979   0.7916   0.2675   0.0390
      13                         1.0000   0.9999   0.9992   0.8645   0.3631   0.0661
      14                         1.0000   1.0000   0.9997   0.9166   0.4655   0.1048
      15                         1.0000   1.0000   0.9999   0.9513   0.5679   0.1564
      16                                  1.0000   1.0000   0.9730   0.6639   0.2210
      17                                                    0.9858   0.7486   0.2970
      18                                                    0.9929   0.8192   0.3814
      19                                                    0.9966   0.8749   0.4702
      20                                                    0.9985   0.9167   0.5590
      21                                                    0.9994   0.9466   0.6436
      22                                                    0.9998   0.9670   0.7205
      23                                                    0.9999   0.9803   0.7874
      24                                                    1.0000   0.9886   0.8431
      25                                                             0.9936   0.8877
      26                                                             0.9965   0.9220
      27                                                             0.9981   0.9474
      28                                                             0.9990   0.9655
      29                                                             0.9994   0.9780
      30                                                             0.9997   0.9863
      31                                                             0.9999   0.9917
      32                                                             1.0000   0.9951
      33                                                             1.0000   0.9971
      34                                                             1.0000   0.9983
      35                                                                      0.9990
      36                                                                      0.9994
      37                                                                      0.9997
      38                                                                      0.9999
      39                                                                      1.0000
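A cumulative entry is just the running sum of the individual Poisson probabilities, so the table can be regenerated with one line of code. A minimal sketch (the function name is ours):

```python
import math

def poisson_cdf(x, a):
    """Probability of x or fewer events when a are expected."""
    return sum(a ** k * math.exp(-a) / math.factorial(k) for k in range(x + 1))

# e.g., poisson_cdf(3, 2.0) rounds to 0.8571 and poisson_cdf(5, 5.0)
# rounds to 0.6160, matching the table entries above.
```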


Appendix C: The Standard Normal Distribution

Figures are the areas under the curve between 0 (the mean value of Z) and Z. By symmetry, the same figures apply to the areas from 0 to -Z, and the area from -Z to Z is simply twice the value shown. The area in the tail from Z to plus infinity is 0.5 (the area from 0 to plus infinity) minus the tabled figure; the same formula gives the area of the tail from minus infinity to -Z. To find the area in both tails outside the range -Z to Z, multiply the tabled figure by two (yielding the area from -Z to Z) and subtract the result from one (the total area under the curve).
  Z    Area 0 to Z        Z    Area 0 to Z
 0.1     0.0398          2.0     0.4772
 0.2     0.0793          2.1     0.4821
 0.3     0.1179          2.2     0.4861
 0.4     0.1554          2.3     0.4893
 0.5     0.1915          2.4     0.4918
 0.6     0.2257          2.5     0.4938
 0.7     0.2580          2.6     0.4953
 0.8     0.2881          2.7     0.4965
 0.9     0.3159          2.8     0.4974
 1.0     0.3413          2.9     0.4981
 1.1     0.3643          3.0     0.4987
 1.2     0.3849          3.1     0.4990
 1.3     0.4032          3.2     0.4993
 1.4     0.4192          3.3     0.4995
 1.5     0.4332          3.4     0.4997
 1.6     0.4452
 1.7     0.4554
 1.8     0.4641
 1.9     0.4713

For large Z the area approaches 0.5000.
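The tabled areas follow directly from the error function, since the area from 0 to Z equals erf(Z/√2)/2. A minimal sketch using the Python standard library (the function name is ours):

```python
import math

def area_0_to_z(z):
    """Area under the standard normal curve between 0 (the mean) and z."""
    return 0.5 * math.erf(z / math.sqrt(2))

# e.g., area_0_to_z(1.0) rounds to 0.3413 and area_0_to_z(3.4)
# rounds to 0.4997, matching the table entries above.
```

Unlike the table, this gives areas for any Z, not just multiples of 0.1.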


Appendix D: The Chi-Square Distribution


Degrees of      Chi-square value when the area under the curve from the value to infinity is:
 Freedom       0.99       0.98       0.95      0.90      0.80      0.70      0.50
    1        0.000157   0.000628   0.00393   0.0158    0.0642    0.148     0.455
    2        0.0201     0.0404     0.103     0.211     0.446     0.713     1.386
    3        0.115      0.185      0.352     0.584     1.005     1.424     2.366
    4        0.297      0.429      0.711     1.064     1.649     2.195     3.357
    5        0.554      0.752      1.145     1.610     2.343     3.000     4.351
    6        0.872      1.134      1.635     2.204     3.070     3.828     5.348
    7        1.239      1.564      2.167     2.833     3.822     4.671     6.346
    8        1.646      2.032      2.733     3.490     4.594     5.527     7.344
    9        2.088      2.532      3.325     4.168     5.380     6.393     8.343
   10        2.558      3.059      3.940     4.865     6.179     7.267     9.342
   11        3.053      3.609      4.575     5.578     6.989     8.148    10.341
   12        3.571      4.178      5.226     6.304     7.807     9.034    11.340
   13        4.107      4.765      5.892     7.042     8.634     9.926    12.340
   14        4.660      5.368      6.571     7.790     9.467    10.821    13.339
   15        5.229      5.985      7.261     8.547    10.307    11.721    14.339
   16        5.812      6.614      7.962     9.312    11.152    12.624    15.338
   17        6.408      7.255      8.672    10.085    12.002    13.531    16.338
   18        7.015      7.906      9.390    10.865    12.857    14.440    17.338
   19        7.633      8.567     10.117    11.651    13.716    15.352    18.338
   20        8.260      9.237     10.851    12.443    14.578    16.266    19.337
   21        8.897      9.915     11.591    13.240    15.445    17.182    20.337
   22        9.542     10.600     12.338    14.041    16.314    18.101    21.337
   23       10.196     11.293     13.091    14.848    17.187    19.021    22.337
   24       10.856     11.992     13.848    15.659    18.062    19.943    23.337
   25       11.524     12.697     14.611    16.473    18.940    20.867    24.337
   26       12.198     13.409     15.379    17.292    19.820    21.792    25.336
   27       12.879     14.125     16.151    18.114    20.703    22.719    26.336
   28       13.565     14.847     16.928    18.939    21.588    23.647    27.336
   29       14.256     15.574     17.708    19.768    22.475    24.577    28.336
   30       14.953     16.306     18.493    20.599    23.364    25.508    29.336


Degrees of      Chi-square value when the area under the curve from the value to infinity is:
 Freedom       0.30      0.20      0.10      0.05      0.02      0.01
    1         1.074     1.642     2.706     3.841     5.412     6.635
    2         2.408     3.219     4.605     5.991     7.824     9.210
    3         3.665     4.642     6.251     7.815     9.837    11.341
    4         4.878     5.989     7.779     9.488    11.668    13.277
    5         6.064     7.289     9.236    11.070    13.388    15.086
    6         7.231     8.558    10.645    12.592    15.033    16.812
    7         8.383     9.803    12.017    14.067    16.622    18.475
    8         9.524    11.030    13.362    15.507    18.168    20.090
    9        10.656    12.242    14.684    16.919    19.679    21.666
   10        11.781    13.442    15.987    18.307    21.161    23.209
   11        12.899    14.631    17.275    19.675    22.618    24.725
   12        14.011    15.812    18.549    21.026    24.054    26.217
   13        15.119    16.985    19.812    22.362    25.472    27.688
   14        16.222    18.151    21.064    23.685    26.873    29.141
   15        17.322    19.311    22.307    24.996    28.259    30.578
   16        18.418    20.465    23.542    26.296    29.633    32.000
   17        19.511    21.615    24.769    27.587    30.995    33.409
   18        20.601    22.760    25.989    28.869    32.346    34.805
   19        21.689    23.900    27.204    30.144    33.687    36.191
   20        22.775    25.038    28.412    31.410    35.020    37.566
   21        23.858    26.171    29.615    32.671    36.343    38.932
   22        24.939    27.301    30.813    33.924    37.659    40.289
   23        26.018    28.429    32.007    35.172    38.968    41.638
   24        27.096    29.553    33.196    36.415    40.270    42.980
   25        28.172    30.675    34.382    37.652    41.566    44.314
   26        29.246    31.795    35.563    38.885    42.856    45.642
   27        30.319    32.912    36.741    40.113    44.140    46.963
   28        31.391    34.027    37.916    41.337    45.419    48.278
   29        32.461    35.139    39.087    42.557    46.693    49.588
   30        33.530    36.250    40.256    43.773    47.962    50.892
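Any tabled entry can be cross-checked by numerically integrating the chi-square density. This sketch (our own helper functions, using Simpson's rule; not from the text) verifies the 0.05 entry for 10 degrees of freedom, 18.307:

```python
import math

def chi2_pdf(x, k):
    """Chi-square probability density with k degrees of freedom."""
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return total * h / 3

# Area to the LEFT of 18.307 for 10 d.o.f. should be about 0.95,
# i.e., about 0.05 remains in the right-hand tail, matching the table.
left_area = simpson(lambda x: chi2_pdf(x, 10), 1e-9, 18.307)
```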


Appendix E: The Student t Distribution

Degrees of Freedom = Sample Size - 1


Degrees of                  Value of t when the area from -∞ to t is:
 Freedom     0.75     0.90     0.95     0.975      0.99     0.995    0.9995
    1       1.000    3.078    6.314    12.706    31.821   63.657   636.619
    2       0.816    1.886    2.920     4.303     6.965    9.925    31.599
    3       0.765    1.638    2.353     3.182     4.541    5.841    12.924
    4       0.741    1.533    2.132     2.776     3.747    4.604     8.610
    5       0.727    1.476    2.015     2.571     3.365    4.032     6.869
    6       0.718    1.440    1.943     2.447     3.143    3.707     5.959
    7       0.711    1.415    1.895     2.365     2.998    3.499     5.408
    8       0.706    1.397    1.860     2.306     2.896    3.355     5.041
    9       0.703    1.383    1.833     2.262     2.821    3.250     4.781
   10       0.700    1.372    1.812     2.228     2.764    3.169     4.587
   11       0.697    1.363    1.796     2.201     2.718    3.106     4.437
   12       0.695    1.356    1.782     2.179     2.681    3.055     4.318
   13       0.694    1.350    1.771     2.160     2.650    3.012     4.221
   14       0.692    1.345    1.761     2.145     2.624    2.977     4.140
   15       0.691    1.341    1.753     2.131     2.602    2.947     4.073
   16       0.690    1.337    1.746     2.120     2.583    2.921     4.015
   17       0.689    1.333    1.740     2.110     2.567    2.898     3.965
   18       0.688    1.330    1.734     2.101     2.552    2.878     3.922
   19       0.688    1.328    1.729     2.093     2.539    2.861     3.883
   20       0.687    1.325    1.725     2.086     2.528    2.845     3.850
   21       0.686    1.323    1.721     2.080     2.518    2.831     3.819
   22       0.686    1.321    1.717     2.074     2.508    2.819     3.792
   23       0.685    1.319    1.714     2.069     2.500    2.807     3.768
   24       0.685    1.318    1.711     2.064     2.492    2.797     3.745
   25       0.684    1.316    1.708     2.060     2.485    2.787     3.725
   26       0.684    1.315    1.706     2.056     2.479    2.779     3.707
   27       0.684    1.314    1.703     2.052     2.473    2.771     3.690
   28       0.683    1.313    1.701     2.048     2.467    2.763     3.674
   29       0.683    1.311    1.699     2.045     2.462    2.756     3.659
   30       0.683    1.310    1.697     2.042     2.457    2.750     3.646
   40       0.681    1.303    1.684     2.021     2.423    2.704     3.551
   60       0.679    1.296    1.671     2.000     2.390    2.660     3.460
  120       0.677    1.289    1.658     1.980     2.358    2.617     3.373
  Inf.      0.674    1.282    1.645     1.960     2.326    2.576     3.291
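As with the chi-square table, a t table entry can be verified by numerically integrating the density. This sketch (our own helper functions, using Simpson's rule; not from the text) checks the 0.975 entry for 10 degrees of freedom, 2.228:

```python
import math

def t_pdf(x, v):
    """Student t probability density with v degrees of freedom."""
    c = math.gamma((v + 1) / 2) / (math.sqrt(v * math.pi) * math.gamma(v / 2))
    return c * (1 + x * x / v) ** (-(v + 1) / 2)

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return total * h / 3

# By symmetry, the area from -infinity to t is 0.5 plus the area from 0 to t.
# For 10 d.o.f. and t = 2.228 this should be about 0.975, matching the table.
p = 0.5 + simpson(lambda x: t_pdf(x, 10), 0.0, 2.228)
```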


Appendix F: Critical Values of the F Distribution for Tests of Significance

For 1% risk:

d.o.f.                      Degrees of Freedom for MSB
 MSE        1      2      3      4      5      10     20     40    inf.
   1      4052   5000   5403   5625   5764   6056   6210   6290   6370
   2      98.5   99.0   99.2   99.2   99.3   99.4   99.4   99.5   99.5
   3      34.1   30.8   29.5   28.7   28.2   27.2   26.7   26.4   26.1
   4      21.2   18.0   16.7   16.0   15.5   14.5   14.0   13.7   13.5
   5      16.3   13.3   12.1   11.4   11.0   10.1   9.55   9.29   9.02
  10      10.0   7.56   6.55   5.99   5.64   4.85   4.41   4.17   3.91
  20      8.10   5.85   4.94   4.43   4.10   3.37   2.94   2.69   2.42
  40      7.31   5.18   4.31   3.83   3.51   2.80   2.37   2.11   1.80
 inf.     6.63   4.61   3.78   3.32   3.02   2.32   1.88   1.59   1.00

For 5% risk:

d.o.f.                      Degrees of Freedom for MSB
 MSE        1      2      3      4      5      10     20     40    inf.
   1       161    200    216    225    230    242    248    251    254
   2      18.5   19.0   19.2   19.2   19.3   19.4   19.4   19.5   19.5
   3      10.1   9.55   9.28   9.12   9.01   8.79   8.66   8.59   8.53
   4      7.71   6.94   6.59   6.39   6.26   5.96   5.80   5.72   5.63
   5      6.61   5.79   5.41   5.19   5.05   4.74   4.56   4.46   4.36
  10      4.96   4.10   3.71   3.48   3.33   2.98   2.77   2.66   2.54
  20      4.35   3.49   3.10   2.87   2.71   2.35   2.12   1.99   1.84
  40      4.08   3.23   2.84   2.61   2.45   2.08   1.84   1.69   1.51
 inf.     3.84   3.00   2.60   2.37   2.21   1.83   1.57   1.39   1.00

For 10% risk:

d.o.f.                      Degrees of Freedom for MSB
 MSE        1      2      3      4      5      10     20     40    inf.
   1      39.9   49.5   53.6   55.8   57.2   60.2   61.7   62.5   63.3
   2      8.53   9.00   9.16   9.24   9.29   9.39   9.44   9.47   9.49
   3      5.54   5.46   5.39   5.34   5.31   5.23   5.18   5.16   5.13
   4      4.54   4.32   4.19   4.11   4.05   3.92   3.84   3.80   3.76
   5      4.06   3.78   3.62   3.52   3.45   3.30   3.21   3.16   3.10
  10      3.28   2.92   2.73   2.61   2.52   2.32   2.20   2.13   2.06
  20      2.97   2.59   2.38   2.25   2.16   1.94   1.79   1.71   1.61
  40      2.84   2.44   2.23   2.09   2.00   1.76   1.61   1.51   1.38
 inf.     2.71   2.30   2.08   1.94   1.85   1.60   1.42   1.30   1.00
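One handy cross-check, a well-known relationship rather than anything stated in the text: when the numerator (MSB) has 1 degree of freedom, the 5% critical value of F is the square of the two-sided 5% t value, i.e., the 0.975 column of Appendix E for the denominator degrees of freedom. A quick sketch:

```python
# F(1, v) 5% critical value = [t(v, 0.975)]^2.
# t values are taken from the 0.975 column of Appendix E.
t_975 = {5: 2.571, 8: 2.306, 10: 2.228}
f_crit_5pct = {v: round(t ** 2, 2) for v, t in t_975.items()}
# v = 5 gives 6.61 (the 5% table above, MSE = 5, MSB = 1);
# v = 8 gives 5.32 (the critical value used in Section 8);
# v = 10 gives 4.96 (the 5% table above, MSE = 10, MSB = 1).
```

This also lets you fill in denominator degrees of freedom, such as 8, that the printed table skips.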
