
w x ME FEATURE

PODstudy
Applying MIL-HDBK-1823 for POD Demonstration on a Fluorescent Penetrant System
by Jennifer Herberich
From Materials Evaluation, Vol. 67, No. 3, pp: 293-301. Copyright 2009 The American Society for Nondestructive Testing, Inc.

Probability of detection (POD) demonstration tests are required by the National Aeronautics and Space Administration (NASA) for programs under fracture control. To be qualified to perform fluorescent liquid penetrant testing (PT) on production hardware that is classified as fracture-critical, an inspector must be able to demonstrate the ability to reliably detect crack-like discontinuities equal to or greater than the critical initial discontinuity size. To determine the smallest size indication that an inspector can distinguish as linear with a specified level of reliability, our company periodically conducts POD demonstration exams of both in-house inspectors and approved external vendors' inspectors. Each inspector is tested on each production fluorescent PT system they perform work on, using a known specimen standard. The results are analyzed using established statistical techniques, the outcome of which is used to determine an inspector's capability. However, the validity of the statistical analysis and the associated uncertainty are highly dependent on the design of the specimen standard. MIL-HDBK-1823 (1999) provides only minimum guidelines for designing a specimen standard for POD demonstration tests on a fluorescent PT system. This paper provides a practical and statistical review of the guidelines stated in MIL-HDBK-1823 regarding the number of discontinuities, the distribution of discontinuity sizes and how the sizes are distributed across the specimen standard (note that MIL-HDBK-1823 uses the term flaw for discontinuity). A general overview of the accepted

MARCH 2009 MATERIALS EVALUATION


POD analysis technique for fluorescent PT systems and interpretation of results will also be given, since the results of the statistical analysis can be subject to misinterpretation even with a well-designed specimen standard.

Background
Nondestructive testing (NDT) techniques such as fluorescent PT are used for flight quality assurance of rocket engine hardware. Fluorescent PT may be performed at several points during the manufacturing process, after development testing, during overhaul or as part of an investigation. Fluorescent PT is used to evaluate rocket engine hardware that is classified as fracture-critical for discontinuities that are open to the surface. Fracture-critical is defined per NASA-STD-5009 (2008) as a classification that assumes that cracks in the hardware could lead to a failure mode that results in loss of life, serious personal injury, or loss of the manned flight system or national asset. The detection of linear indications is of primary interest when performing fluorescent PT on fracture-critical hardware. A linear indication may be indicative of a surface crack. A surface crack may occur as a result of in-process manufacturing or fatigue during service life. The objective of fluorescent PT on fracture-critical hardware is to detect cracks or crack-like discontinuities before they grow to a critical length, which could affect the structural integrity of a part.

We are required by NASA to conduct POD demonstration tests on fluorescent PT systems that are used to examine fracture-critical hardware. The results of a POD demonstration test are used to assess the capability of an inspector with respect to a specified performance objective. Per NASA-STD-5009 fracture control requirements for manned space flight, the objective is to estimate the crack length that an inspector is able to reliably detect with 90% probability and 95% confidence. Commonly referred to as an inspector's POD score, this result must be smaller than the critical initial discontinuity size for a given part in order for an inspector to be qualified to perform fluorescent PT on that part.

Probability of detection demonstration tests on fluorescent PT systems used to evaluate fracture-critical hardware are periodically conducted on both our company's inspectors and approved external vendors' inspectors. Each inspector is tested on each fluorescent PT system that is used to evaluate fracture-critical hardware. A system is defined as the complete fluorescent PT process (including application, washing, drying and developing) and inspector evaluation (Lively and Aljundi, 2003). The POD demonstration test is conducted using a set of flat panels with a known number, size distribution and location of low-cycle fatigue-induced cracks. There are typically 16 to 20 panels in a set. The panels have the same length, width, thickness and material. The known cracks are randomly distributed across the panel set with an unequal number of cracks per panel. At least one blank panel is also included in the set. The flat panels are processed through the fluorescent PT system and tested under ultraviolet light in the same manner and under the same conditions as production hardware would be. If the inspector determines that a linear indication is present, it is recorded as a find and its location on the panel is also recorded. The test results are then matched against the documented cracks. If a known crack was found, it is recorded as a hit. If a known crack was missed, it is recorded as a miss. If the inspector identified an area that does not have a known crack, it is recorded as a false call.

Because the results of the POD demonstration test are binary in nature, the hit/miss analysis described in MIL-HDBK-1823 is performed to estimate a POD curve, which describes how likely it is that the inspector will find a crack of a given size under standard fluorescent PT process operation. The resulting POD curve and associated statistical uncertainty are used to determine an inspector's POD score. With respect to designing (or obtaining) a specimen standard, MIL-HDBK-1823 provides recommendations on the number of discontinuities, the discontinuity size distribution and how the discontinuities should be distributed across the specimen set. Though the hit/miss analysis technique described in MIL-HDBK-1823 is based on well-known and well-established statistical techniques, the number of discontinuities, distribution of discontinuity sizes, and


how the discontinuities are distributed across the specimen set can affect the validity of the hit/miss analysis and resulting POD curve. The amount of statistical uncertainty associated with estimating an inspector's POD score is also highly dependent on these features. As indicated in MIL-HDBK-1823, the guidelines provided are only minimum considerations when designing a specimen standard. To understand the rationale behind the recommendations in MIL-HDBK-1823 and why the number of cracks and distribution of crack lengths are critical features of a fluorescent PT POD panel set, it is necessary to have a general understanding of the underlying statistics in the hit/miss analysis.

Hit/Miss Analysis
Crack size is characterized in terms of its surface length as measured under white light. Crack length is the easiest characteristic to quantify using nondestructive techniques, and thus has been established as the primary predictor variable for estimating POD. As crack length increases, POD is expected to increase. This is observable in the POD demonstration test results. As crack length increases, misses start to overlap with and transition to hits. Logistic regression models, which belong to a special subclass of non-linear models, are the traditional statistical models used to describe the relationship between continuous (or categorical) variables and binary outcomes, and are the basis for the hit/miss analysis method. The hit/miss binary outcomes of the POD demonstration test are what provide the information on POD and how it relates to crack size. Simply speaking, probability is calculated as the number of occurrences of an event over the total number of opportunities. In terms of POD, probability can be thought of as the number of hits over the number of cracks of a given size. However, the probability is not expected to remain constant for all crack sizes. Rather, it is expected to increase with crack size. Let $\pi_i$ be the POD for a crack of length $x_i$. Before considering a logistic regression model, suppose POD were to be modeled as a simple linear function of crack length, that is,

$$\pi_i = \beta_0 + \beta_1 x_i \qquad (1)$$

by regressing the hit/miss outcomes (that is, 1 = hit, 0 = miss) against crack length. Figure 1 illustrates a simple linear regression fit to simulated results from a POD demonstration test. Note that the Y axis coordinate for the simulated POD demonstration test results plotted in Figure 1 is 0 for a miss and 1 for a hit. Though perhaps a natural initial model consideration, a simple linear regression model is not an appropriate model to use when modeling POD versus crack length since the predicted probability is allowed to exceed 1 as crack length increases. To restrict the model predicted probabilities to be between 0 and 1, a function of $\pi_i$ will be modeled by $\beta_0 + \beta_1 x_i$ rather than modeling $\pi_i$ directly. This is the foundation of logistic regression. Let $g(\pi_i)$ represent some function of $\pi_i$. This function is called the link function. There are several established link functions. The two link functions that will be discussed are the logit and probit link functions, which are commonly used in general and are referenced in MIL-HDBK-1823. The logistic link function is

$$g(\pi_i) = \ln\left(\frac{\pi_i}{1 - \pi_i}\right) \qquad (2)$$

and the logistic regression model, referred to as the logit model, is

$$\ln\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 x_i \qquad (3)$$

Solving for $\pi_i$, the underlying non-linear regression function that is being used to describe the relationship between $\pi_i$ and the predictor variable $x_i$ is

$$\pi_i = \frac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)} \qquad (4)$$

where $\beta_0 + \beta_1 x_i$ = the linear component.

Figure 1 Simple linear regression relation between POD and crack length (probability versus crack length; simple linear model predictions, p = b0 + b1*x, plotted with simulated hit (1)/miss (0) data).
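The contrast between the simple linear model and the logit model can be checked numerically. The following is a minimal sketch, not taken from the paper: the function names and all four coefficient values are hypothetical and were chosen only so that the straight line leaves the 0 to 1 range over the plotted crack lengths.

```python
import math

def linear_pod(x, b0, b1):
    """Simple linear model: nothing keeps the prediction inside [0, 1]."""
    return b0 + b1 * x

def logit_pod(x, b0, b1):
    """Logit model solved for the probability.

    1 / (1 + exp(-eta)) is algebraically the same as exp(eta) / (1 + exp(eta)).
    """
    eta = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients, for illustration only.
LIN_B0, LIN_B1 = -0.10, 16.0
LOGIT_B0, LOGIT_B1 = -5.0, 170.0

crack_lengths = [0.00, 0.02, 0.04, 0.06, 0.08]
linear_preds = [linear_pod(x, LIN_B0, LIN_B1) for x in crack_lengths]
logit_preds = [logit_pod(x, LOGIT_B0, LOGIT_B1) for x in crack_lengths]

for x, lin, lgt in zip(crack_lengths, linear_preds, logit_preds):
    print(f"x={x:.2f}  linear={lin:+.3f}  logit={lgt:.3f}")
```

With these assumed values the linear predictions fall below 0 at the shortest length and exceed 1 at the longest, while every logit prediction stays strictly between 0 and 1.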


Figure 2 compares the fit of a logit model to the simulated POD demonstration test results with that of the simple linear regression model. Note that the logit model is a non-linear, S-shaped curve with natural lower and upper boundary conditions of 0 and 1, respectively, on the model predictions. The probit link function is

$$g(\pi_i) = \Phi^{-1}(\pi_i) \qquad (5)$$

where $\Phi$ = the cumulative standard normal distribution function. The logistic regression model, referred to as the probit model, is

$$\Phi^{-1}(\pi_i) = \beta_0 + \beta_1 x_i \qquad (6)$$

To understand the concept behind the probit model, consider how a crack responds to fluorescent penetrant. A given fluorescent penetrant has a specified sensitivity level, or range of crack sizes that are likely to be detected. Sizes below that range are not likely to be detected and sizes above that range are almost always likely to be detected. Let $S_i$ represent the size range associated with the penetrant sensitivity level. If the crack size is within or above the sensitivity range, then the crack is likely to respond to the penetrant. The probability of response for a crack with a given length can be expressed as

(7)

where $x_i$ = the crack length. Note that it is the distribution of the sensitivity S across the population of cracks that determines the shape of the curve f(x). It should also be noted that an individual crack's response is not considered random; rather it is the sensitivity $S_i$ that is random and determines the binary outcome (that is, hit or miss). Hence, it is a family of distributions that determines the POD curve. Assuming the distribution of S is the standard normal distribution (Equation 8, where $\Phi$ is the cumulative standard normal distribution), $\pi_i$ can be expressed as in Equation 9. This is equivalent to Equation 10, which is a probit model under the definitions in Equation 11.

The logit and probit models are very similar in shape, as illustrated in Figure 3. Because of this closeness, logit and probit models will yield similar fits to a given set of data and similar model predictions, provided that the fitted probabilities are not too extreme (Lloyd, 1999). In general, it is very difficult to discriminate between the probit and logit models using formal statistical tests (Lloyd, 1999). Except near the asymptotes, the logit and probit models agree closely in many cases (Neter et al., 1996). A probit model will tend to approach 0 and 1 faster than the logit, but will be similar in the middle (Lloyd, 1999). However, the probit model is not readily extendable to more than one predictor variable, which makes it less flexible than the logit model (Neter et al., 1996). The regression coefficients in a probit model are not as easily interpreted as those in a logit model (Lloyd, 1999). Formal statistical inference procedures are also more difficult to perform with the probit model (Neter et al., 1996). For these reasons, the logit model is generally preferred in practice over the probit model. However, the choice of a link function is ultimately driven by how well the model fits the observed data.

Figure 2 Comparison of simple linear regression model and logit model (probability versus crack length; simple linear and logit model predictions plotted with simulated hit (1)/miss (0) data).

Referring back to Figure 3, an overlap between misses and hits as crack length increases is observable in the simulated POD demonstration test results. It is important to note that the execution of a hit/miss analysis is dependent on having an overlapping transition range between misses and hits as crack length increases. Unlike with simple linear regression, the logistic regression model coefficients do not have a closed form solution. Hence, an iterative numerical procedure is required to solve the system of equations from which the estimates of the model coefficients are derived. The procedure iterates until a convergence criterion is met, at which point estimates of the model coefficients are obtained from the last iteration. If there is a complete separation of misses and hits, in other words, all cracks below a certain length are missed and all cracks above a certain length are found, then the procedure will not reach convergence, meaning no solution exists to the system of equations from which the model coefficients are derived. Though some statistical analysis software may produce estimates of the model coefficients even though the convergence criterion has not been met, these estimates are likely to be erroneous and should not be used.
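The dependence of the iterative fit on overlapping misses and hits can be illustrated with a small experiment. The sketch below is an assumption-laden illustration, not the procedure in MIL-HDBK-1823 or in any particular statistics package: it uses a plain Newton-Raphson iteration for the logit model, and the function names, crack lengths, outcomes, tolerance and iteration limit are all hypothetical.

```python
import math

def sigmoid(eta):
    """Numerically stable inverse of the logit link."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

def fit_logit(x, y, max_iter=25, tol=1e-8):
    """Newton-Raphson fit of logit(p) = b0 + b1 * x.

    Returns (b0, b1, converged). The convergence rule (both coefficient
    updates smaller than tol) is an assumption made for this sketch.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # score with respect to b0
            g1 += (yi - p) * xi     # score with respect to b1
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            # Information matrix is numerically singular: no usable update.
            return b0, b1, False
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) < tol and abs(d1) < tol:
            return b0, b1, True
    return b0, b1, False

# Hypothetical crack lengths and outcomes (1 = hit, 0 = miss).
lengths = [0.010, 0.015, 0.020, 0.025, 0.030, 0.035,
           0.040, 0.045, 0.050, 0.055, 0.060, 0.070]
overlapping = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]   # misses and hits overlap
separated = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]     # complete separation

b0, b1, ok_overlap = fit_logit(lengths, overlapping)
_, _, ok_separated = fit_logit(lengths, separated)
print("overlapping data converged:", ok_overlap)
print("separated data converged:", ok_separated)
```

With the overlapping outcomes the updates shrink and the fit settles on finite coefficients. With the completely separated outcomes the slope keeps growing from one iteration to the next, so the convergence test is never satisfied and the function reports failure rather than returning the last iterate as if it were an estimate.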

Figure 3 Comparison of probit model and logit model (probability versus crack length; logit and probit model predictions plotted with simulated hit (1)/miss (0) data).

Hit/Miss Analysis Objective
For fluorescent PT systems used to evaluate fracture-critical hardware, the objective is to determine the crack size, a, that can be detected with 90% probability, denoted a90. A true a90 value exists but is unknown. However, a90 can be estimated from the POD curve. Given the logistic regression function, this is achieved simply by solving for x given the desired probability p. If the logit model is used, then

$$a_p = \frac{\ln\left(\frac{p}{1 - p}\right) - \hat{\beta}_0}{\hat{\beta}_1} \qquad (12)$$

where $\hat{\beta}_0$ and $\hat{\beta}_1$ = the estimated model coefficients and p is the POD value of interest. If the probit model is used,

$$a_p = \frac{\Phi^{-1}(p) - \hat{\beta}_0}{\hat{\beta}_1} \qquad (13)$$

Consider the hit/miss data in Figure 2. The logit model fit to the data is estimated as

$$\ln\left(\frac{p}{1 - p}\right) = -5.219 + 168.526x \qquad (14)$$

where x = crack length. Then a90 is estimated as

$$a_{90} = \frac{\ln\left(\frac{0.90}{1 - 0.90}\right) + 5.219}{168.526} = 0.044 \qquad (15)$$
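The a90 arithmetic for the fitted logit model can be reproduced in a few lines. This is a minimal sketch: the function name is hypothetical, the coefficients are the fitted values quoted in the text, and the negative sign on the intercept is inferred from the worked a90 value of 0.044 in.

```python
import math

def crack_length_at_pod(p, b0, b1):
    """Invert the logit model: a_p = (ln(p / (1 - p)) - b0) / b1."""
    return (math.log(p / (1.0 - p)) - b0) / b1

# Fitted coefficients quoted in the text; intercept sign inferred from a90 = 0.044.
B0, B1 = -5.219, 168.526

a90 = crack_length_at_pod(0.90, B0, B1)   # crack length in inches
a90_mm = a90 * 25.4                       # converted to millimetres
print(f"a90 = {a90:.3f} in. = {a90_mm:.2f} mm")   # a90 = 0.044 in. = 1.12 mm
```

The same function gives any other point on the fitted curve, for example the length at 50% POD with `crack_length_at_pod(0.50, B0, B1)`.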

Note that a90 is a single, specific size (for example, 1.12 mm [0.044 in.]) that is related to 90% POD. In the above example, the chance of finding a crack smaller than 1.12 mm (0.044 in.) is less than 90% and the chance of finding a crack larger than that size is greater than 90%. Since POD is specific to a given crack size, finding a smaller crack does not guarantee that a larger crack will be found or vice versa. Just because a 1.12 mm (0.044 in.) crack is found doesn't guarantee that a 1.5 mm (0.060 in.) crack will be found. Also note that 90% POD equates to a 10% chance that a 1.12 mm (0.044 in.) long crack will be missed. Cracks smaller than 1.12 mm (0.044 in.) have a greater chance of being missed, and cracks above that size have a smaller chance of being missed. Hence, each crack has an independent POD and corresponding probability of being missed. However, a90 is only a single point estimate and does not account for random error resulting from naturally occurring variation in the fluorescent PT system. This random error results in some degree of


uncertainty associated with estimating the true a90. For example, if an inspector repeated the demonstration test one year later, he or she may not find exactly the same cracks as the year before. Though the fluorescent PT process is considered a mature, well-controlled process, there will always be inherent variability. A statistical uncertainty bound, often referred to as a confidence interval, can be put around any point estimate of interest to account for uncertainty due to naturally occurring variation. A confidence interval represents a range of plausible values in which the truth is thought to lie. For hardware under NASA fracture control, the interest is in estimating a 95% upper bound on the crack length that an inspector is able to reliably detect with 90% probability. When a 95% upper bound is used, the resulting crack length is larger than a90. In other words, the interest is in knowing how large the true a90 could be given the inherent variability in the fluorescent PT system. Denoted a90/95, this result is commonly referred to as an inspector's POD score. The 95% figure, which refers to the statistical confidence level, is a probability statement about the statistical method's ability to estimate the truth. With respect to confidence intervals, the confidence level refers to the probability that the interval captures the truth. It is often easier to think of the meaning of statistical confidence in terms of risk, for there will always be a risk of drawing the wrong conclusion based on a statistical analysis of sampled data. The risk in terms of POD is the risk of the interval yielding an upper bound on crack length that is actually smaller than the true a90, leading to the belief that the inspector's capability is better than it really is. If a 95% level of confidence is chosen, then a 5% risk of drawing the wrong conclusion based on the statistical analysis has been accepted. Statistical confidence is analogous in a sense to a safety factor in a design.
As a designer applies a safety factor on a critical feature of a design, a statistician applies a confidence bound (that is, statistical safety factor) on the estimate of a parameter of interest. (Details on the formula for calculating a 95% confidence bound on a90 can be found in MIL-HDBK-1823.)
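To make the idea of an upper bound on a90 concrete, the sketch below applies a generic delta-method (Wald) approximation. It is not the formula given in MIL-HDBK-1823, which is not reproduced here. The function names are hypothetical, and the covariance values for the fitted coefficients are invented purely for illustration; in practice they would come from the fitting software.

```python
import math

def a_p(p, b0, b1):
    """Crack length at POD = p under the logit model."""
    return (math.log(p / (1.0 - p)) - b0) / b1

def wald_upper_bound(p, b0, b1, cov, z=1.645):
    """One-sided upper bound on a_p by the delta method.

    cov = (var_b0, cov_b0_b1, var_b1); z = 1.645 is the one-sided 95%
    standard normal quantile. Generic approximation, assumed for this sketch.
    """
    a = a_p(p, b0, b1)
    d0 = -1.0 / b1        # partial derivative of a_p with respect to b0
    d1 = -a / b1          # partial derivative of a_p with respect to b1
    v00, v01, v11 = cov
    var_a = d0 * d0 * v00 + 2.0 * d0 * d1 * v01 + d1 * d1 * v11
    return a, a + z * math.sqrt(var_a)

B0, B1 = -5.219, 168.526           # fitted coefficients quoted in the text
HYPOTHETICAL_COV = (1.2, -30.0, 900.0)   # invented values, illustration only

a90, a90_95 = wald_upper_bound(0.90, B0, B1, HYPOTHETICAL_COV)
print(f"a90 = {a90:.4f} in., upper bound = {a90_95:.4f} in.")
```

Whatever covariance is supplied, the bound sits above the point estimate, and the gap between the two shrinks as the coefficient variances shrink.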

Effect of Crack Number and Length Distribution
In general, the range over which a logistic regression function is rising contains the maximum amount of information about the nature of the relationship between $\pi_i$ and the predictor variable $x_i$. Hence, the range over which the POD curve is rising contains the maximum amount of information about the capability of the fluorescent PT system. This transition range is observable in practice as the transition from missed responses by the inspector to hits as crack length increases. As noted in MIL-HDBK-1823, cracks outside this transition range, which are either almost always missed or almost always detected, provide limited information about the relationship between POD and crack length, and thus the capability of the fluorescent PT system. If the crack lengths in the panel set do not adequately cover the expected range over which the POD curve is rising, it may result in an invalid POD curve or an increase in model uncertainty. The rationale behind the recommendations in MIL-HDBK-1823 regarding the distribution of crack lengths and the number of cracks is to minimize the chance of inadequate coverage of or completely missing the transition range during the design phase of a panel set. Though addressed separately in MIL-HDBK-1823, the distribution of crack lengths and the number of cracks can also have an interactive effect on the results of a hit/miss analysis. The following discussion will focus on interpreting the recommendations in MIL-HDBK-1823 for application to fluorescent PT systems where the POD demonstration test results are binary in nature (that is, hit or miss). Each recommendation will be addressed for its own merit as well as the interactive role it plays.

Distribution of Crack Lengths
MIL-HDBK-1823 (1999) recommends that crack lengths be distributed uniformly on a log scale covering the range of crack lengths over which the POD curve is expected to rise. However, the new recommended practice per the upcoming revision of MIL-HDBK-1823 (2007 draft) is to distribute the crack lengths uniformly on a Cartesian scale covering the range of crack lengths over which the POD curve is expected to rise. Though practically easier and perhaps statistically beneficial (as discussed later), the resulting coverage in the transition range may not be adequate for higher sensitivity fluorescent penetrants. A higher sensitivity fluorescent penetrant is more effective in detecting smaller cracks. Thus, the transition range is more likely to begin at a smaller crack size compared to a lower sensitivity penetrant. However, if there are not enough small cracks between 0 and the beginning of the transition range, then there may be a risk of having a complete separation of misses and hits. Recall that the execution of a hit/miss analysis is dependent on having an overlapping transition range between misses and hits as crack length increases. If there is a complete separation of misses and hits, then the iterative procedure used to obtain estimates of the model coefficients will not reach convergence. Just because the transition range doesn't exist in the POD demonstration test


data doesn't mean it doesn't exist at all. When crack sizes are uniformly distributed on a Cartesian scale, the effect may be too few small cracks. However, when crack sizes are distributed uniformly on a log scale, the effect is a concentration of cracks in the smaller size range. The risk of having frequent problems with convergence in practice with higher sensitivity penetrants far outweighs the practical ease of distributing the crack lengths uniformly on a Cartesian scale. Thus for higher sensitivity penetrants, such as Level 3 and Level 4, it is recommended that the crack lengths be distributed uniformly on the log scale to help achieve adequate coverage of the beginning of the transition range. If the POD panel set is to be used with a lower sensitivity fluorescent penetrant, the crack lengths may be uniformly distributed on a Cartesian scale. Though a complete separation of misses and hits can happen with lower sensitivity penetrants given the same crack length distribution, it is less likely since the smaller crack lengths are more likely to be missed with a lower sensitivity penetrant. MIL-HDBK-1823 (1999) recommends that the distribution of crack lengths covers the range of lengths over which the POD curve is expected to rise. Though the exact location (in terms of crack size) of the transition range from misses to hits is ultimately unknown, careful selection of the crack size distribution based on historical information, if available, or engineering judgment can help maximize the chance of covering the transition range. Historical data from POD demonstration tests conducted on a similar fluorescent PT system may provide guidance as to where the range containing the maximum information about the capability of the system may lie. In general, the logit model is almost linear in the range where the probability is between 0.20 and 0.80 (Neter et al., 1996). 
Probability of detection demonstration test results from a similar fluorescent PT system, if available, may provide a rough estimate of where a20 and a80 may lie. If historical data are not available, then engineering judgment may be able to estimate the transition range by selecting a minimum and maximum value over which the lengths will be distributed. It is important to note that the selected minimum and maximum length should reflect the fluorescent PT system capability, not design requirements or critical initial discontinuity sizes. The panel set is designed to assess system capability, not design requirements. Requirements imposed on a system do not guarantee that the system is capable of meeting the requirements. The objective of a POD demonstration test on a fluorescent PT system using a known specimen standard is to assess whether the capability of the system meets its requirements. Hence, the

distribution of crack lengths should cover the range of lengths that are believed to represent the capability of the fluorescent PT system, which in theory is equivalent to the range of lengths over which the POD curve is expected to rise. Once the minimum and maximum values are selected, careful consideration needs to be given to the distribution of crack lengths covering the minimum and maximum values.
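The practical difference between the two spacing rules can be seen by counting how many cracks each one places at the small end of the range. This is a sketch under stated assumptions: the minimum and maximum lengths, the count of 60 and the 0.030 cutoff for a "small" crack are hypothetical, and evenly spaced values are used as a deterministic stand-in for a uniform distribution.

```python
import math

def cartesian_spaced(lo, hi, n):
    """n lengths evenly spaced on a Cartesian (linear) scale."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

def log_spaced(lo, hi, n):
    """n lengths evenly spaced on a log scale."""
    log_lo, log_hi = math.log(lo), math.log(hi)
    step = (log_hi - log_lo) / (n - 1)
    return [math.exp(log_lo + i * step) for i in range(n)]

# Hypothetical design values, for illustration only.
MIN_LEN, MAX_LEN, N_CRACKS = 0.010, 0.090, 60
SMALL_CUTOFF = 0.030

cartesian = cartesian_spaced(MIN_LEN, MAX_LEN, N_CRACKS)
log_scale = log_spaced(MIN_LEN, MAX_LEN, N_CRACKS)

n_small_cartesian = sum(1 for a in cartesian if a < SMALL_CUTOFF)
n_small_log = sum(1 for a in log_scale if a < SMALL_CUTOFF)
print("small cracks, Cartesian spacing:", n_small_cartesian)
print("small cracks, log spacing:", n_small_log)
```

For the same endpoints and the same number of cracks, log spacing puts more of the set below the cutoff, which is the concentration of small cracks described above.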

Number of Cracks
The minimum recommended number of cracks for a fluorescent PT POD panel set per MIL-HDBK-1823 is 60. Considering the economics of building a POD panel set, 60 is practically reasonable. However, more than 60 cracks should be planned during the design of a panel set and more than 60 cracks (if economically feasible) are recommended in the finished panel set. Practically speaking, more than 60 cracks should be planned during the design of a panel set to ensure the minimum recommended number is realized in the finished product. Damage to panels, which may result in a loss of cracks, should be anticipated during the fabrication of the set. Also, cracks may fail to grow during the fabrication process or may be inadvertently removed during final machining. Statistically speaking, more than 60 cracks are recommended to decrease statistical uncertainty and hence improve the precision in the estimate of a90. When more information is gathered in an unbiased manner about a population, the statistical uncertainty associated with estimating something of interest about that population decreases. The amount of statistical uncertainty associated with estimating the crack size that can be detected with 90% probability is represented by the difference between the estimated a90/95 and a90. A small difference is indicative of less uncertainty. If the uncertainty can be minimized, then the range of plausible values narrows, given the same level of confidence. If increasing the sample size helps minimize statistical uncertainty, it would seem logical to increase the number of cracks in the range where a90 is expected to lie. The potential benefit is more information in the primary region of interest, which in

theory should result in a reduction in statistical uncertainty. Distributing crack lengths on a Cartesian scale may result in more cracks in the range where a90 is expected to lie and is the rationale in the upcoming revision of MIL-HDBK-1823 for the change in philosophy regarding how discontinuity sizes are distributed. However, and as discussed earlier, caution should be used when the intent is to test fluorescent PT systems that use higher sensitivity penetrants. In general, having too many cracks in the region containing the expected a90 and too few cracks just before the beginning of the transition range, or too few cracks in the transition range, may result in frequent problems with convergence or high model uncertainty in practice. If the economic trade-off is fewer small cracks, then planning more cracks in the region containing the expected a90 is not recommended. The risk of frequent problems with lack of convergence in practice far outweighs even the potential statistical benefit of increased precision in a90 and narrower confidence bounds, particularly with higher sensitivity penetrants. A less risky approach to minimizing statistical uncertainty is having more than 60 cracks, the majority of which are uniformly distributed on a log scale covering the range over which the POD curve is expected to rise. The benefit in this case is more information about the nature of the relationship between POD and crack length while minimizing the risk of a complete separation of misses and hits at the same time. (Exactly how many more than 60 are needed to see a significant reduction in statistical uncertainty in practice and where the point of diminishing return lies is an area of research.) Another approach may be a combination of distributing smaller cracks on a log scale and larger cracks on a Cartesian scale in an attempt to reduce the risk of lack of convergence, while at the same time assigning more crack lengths in the region of interest.

if a long transition range is expected between lengths that are likely to be missed and lengths that are likely to be found, then more than 60 cracks distributed uniformly on a log scale are likely to be needed to adequately cover this range. If the panel set is to be used with multiple penetrant sensitivity levels, then more than 60 cracks distributed over a wider range is recommended. Multiple penetrant sensitivity levels are likely to result in different relationships between POD and crack length and hence different transition ranges. The panel set design needs to be robust to the different penetrant sensitivity levels. In other words, the number of cracks and distribution of crack lengths needs to adequately cover each possible transition range to avoid an increase in model uncertainty for a given fluorescent PT system.

Effect of Crack Length Distribution across the Panel Set


As noted in MIL-HDBK-1823, bias can be introduced into a POD demonstration test if an inspector starts to memorize the panel set. Any bias introduced into the demonstration test procedure can lead to seriously misleading hit/miss analysis results. To avoid bias in the results introduced by the design of the panel set, it is recommended to randomly distribute the crack lengths across the panel set and to have an unequal number of cracks per panel. Randomly distributing the crack lengths across the panel helps prevent, for example, all large cracks from being on one panel and all small cracks on another, which may enable an inspector to quickly memorize the set. If there are an equal number of cracks per panel, then the inspector is likely to eventually recognize that he or she should be finding the same number of cracks per panel. At least one blank panel (more for large panel sets) should also be included in the set so that an inspector does not expect there to be cracks in every panel. Randomly distributing the crack sizes across the set with an unequal number per panel (with blank panels also included in the set) is also more representative of what an inspector may encounter in a real part. If inspectors begin to memorize the specimen standard, MIL-HDBK1823 recommends obtaining a new panel set as such familiarity does not represent typical inspections.

Crack Number and Length Distribution Interaction
The distribution of crack lengths and the number of cracks can also have an interactive effect on the results of a hit/miss analysis. Even if the crack lengths cover the transition range, too few cracks in the transition range may result in a complete separation of misses and hits, very few misses or even no misses, especially for higher sensitivity penetrants. Though few or even no misses are desired during the actual testing of production hardware, this is not desirable when trying to estimate the system capability. If the cracks are too spread out in the transition range, there may be very few misses, resulting in greater model uncertainty due to the lack of information about the nature of the transition range.
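A quick screening check for the situations described here (complete separation, very few misses or no misses) might look like the sketch below. It assumes that POD increases with crack length, so that separation means every missed crack is shorter than every detected crack. The function name and the example data are hypothetical, and the check is only a screen applied before model fitting, not a substitute for the hit/miss analysis itself.

```python
def separation_check(results):
    """Summarize hit/miss data given as (crack_length, hit) pairs.

    `n_in_overlap` counts cracks whose length lies between the shortest
    hit and the longest miss, i.e., where hits and misses are intermixed.
    """
    hit_lengths = [length for length, hit in results if hit]
    miss_lengths = [length for length, hit in results if not hit]
    summary = {
        "n_hits": len(hit_lengths),
        "n_misses": len(miss_lengths),
        "one_outcome_only": not hit_lengths or not miss_lengths,
        "complete_separation": False,
        "n_in_overlap": 0,
    }
    if summary["one_outcome_only"]:
        return summary  # all hits or all misses: no transition information
    shortest_hit = min(hit_lengths)
    longest_miss = max(miss_lengths)
    summary["complete_separation"] = longest_miss < shortest_hit
    summary["n_in_overlap"] = sum(
        1 for length, _ in results if shortest_hit <= length <= longest_miss
    )
    return summary


# Hypothetical data (lengths in inches): misses and hits do not overlap.
separated = [(0.02, False), (0.03, False), (0.05, True), (0.08, True)]
# Hypothetical data: a miss at 0.04 lies above a hit at 0.03.
mixed = [(0.02, False), (0.03, True), (0.04, False), (0.05, True)]

separated_summary = separation_check(separated)
mixed_summary = separation_check(mixed)
```

A small `n_in_overlap` or a small `n_misses` suggests the data carry little information about the shape of the transition range, which is the source of the model uncertainty discussed above.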

Summary
The number of discontinuities, the distribution of discontinuity sizes, and how the discontinuities are distributed across a specimen standard can have a significant effect on the statistical analysis of POD demonstration test results. Thus, they should be treated as critical features and must be given careful consideration when designing or obtaining a fluorescent PT POD panel set. Though the results of a hit/miss analysis are highly dependent on the number of discontinuities and distribution of discontinuity sizes in the panel set, the converse is not true. That is, the number of discontinuities and distribution of discontinuity sizes cannot be manipulated to achieve a desired result. A true system capability exists but is unknown and cannot be altered by altering the specimen standard. A poorly designed specimen standard can only lead to increased uncertainty associated with estimating the true system capability, which in turn can lead to misleading conclusions about the system capability. A poorly designed specimen standard can also introduce bias in the POD demonstration test or lead to the inability to estimate the system capability at all. Hence, resources are best spent in carefully designing a specimen standard since only an appropriately designed specimen standard can yield meaningful data about the true fluorescent PT system capability.

Acknowledgments
This paper would not have been possible without the support and encouragement of Kon Haake and John Lively and their welcoming of a statistician into the world of NDT. Thanks also to Tommie Watkins for imparting his knowledge of POD methodology and introducing me to Chuck Annis. Finally, thanks to Charlie Haffey for his instruction in fluorescent PT techniques and allowing me to get my hands dirty.

AUTHORS
Jennifer Herberich: Pratt & Whitney Rocketdyne, M/S 73184, PO Box 109600, West Palm Beach, FL 33410-9600; (561) 796-2230; fax (860) 622-3453; e-mail jennifer.herberich@pw.utc.com.

REFERENCES
Agresti, A., Categorical Data Analysis, New York, Wiley, 1990.
Department of Defense, MIL-HDBK-1823: Non-destructive Evaluation System Reliability Assessment, 30 April 1999.
Department of Defense, MIL-HDBK-1823: Non-destructive Evaluation System Reliability Assessment, draft, 28 February 2007.
Lively, J.A. and T.L. Aljundi, Fluorescent Penetrant Probability of Detection Demonstrations Performed for Space Propulsion, CP657, Review of Quantitative Nondestructive Evaluation, Vol. 22B, D.O. Thompson and D.E. Chimenti, eds., Melville, New York, AIP, 2003, pp. 1891-1898.
Lloyd, C.J., Statistical Analysis of Categorical Data, New York, Wiley, 1999.
NASA, NASA-STD-5009: Nondestructive Evaluation Requirements for Fracture Critical Metallic Components, 2008.
NASA, NASA-STD-5019: Fracture Control Requirements for Space Flight Hardware, 2008.
Neter, John, Michael Kutner, Christopher Nachtsheim and William Wasserman, Applied Linear Statistical Models, fourth edition, Boston, McGraw-Hill, 1996.
