
From Materials Evaluation, Vol. 73, No. 1, pp. 55-61.
Copyright 2015 The American Society for Nondestructive Testing, Inc.

Developments in Probability of Detection Modeling and Simulation Studies
by Jeremy S. Knopp, Frank Ciarallo, and Ramana V. Grandhi


The purpose of this paper is to review recent developments in probability of detection (POD) modeling and demonstrate how simulation studies can be used to benchmark the effect of sample size on confidence bound calculations by means of evaluating probability of coverage.

Review of Recent Advances


The historical development of POD is well documented in previous papers
and will not be discussed in this paper (Berens, 1989; Knopp et al., 2012;
Olin and Meeker, 1996). In the last decade, the major development was the
use of the likelihood ratio technique for confidence bound calculations
pertaining to hit/miss data (Annis and Knopp, 2007; Harding and Hugo,
2003; Spencer, 2007). This was incorporated into the guidance for POD
studies provided in MIL-HDBK-1823A (DOD, 2010).
The other areas where significant advances have been made include the following:
● Exposition on the distinction between uncertainty and variability in nondestructive testing (NDT) (Li et al., 2012).
● Use of Markov chain Monte Carlo (MCMC) simulation for confidence bounds (Knopp and Zeng, 2013).
● Three- and four-parameter models (Knopp and Zeng, 2013; Moore and Spencer, 1998; Spencer, 2014).
● Bootstrapping for confidence bound calculations (Knopp et al., 2012).
● Nonparametric techniques for POD modeling (Spencer, 2011).
● Box-Cox transformations to mitigate violations of homoscedasticity (Knopp et al., 2012).
● Sample size determination (Annis, 2014; Safizadeh et al., 2004).
● Bayesian design of experiments (Koh and Meeker, 2014).

Uncertainty and Variability
It may be argued that the first item on the list is not an advance; however, it represents a very clear exposition and reminder of what a POD curve and associated confidence bounds actually provide (Li et al., 2012). The authors point out that the traditional way of performing a POD study determines the mean POD, which averages out variability from the inspection process. A common error in interpreting a POD analysis is to think of the confidence bounds as pertaining to variability in the inspection process. The confidence bounds only pertain to the error due to sampling for a single experimental run. Therefore, it is a serious error to assume that 95% of a90 values in future experiments will be within the confidence bounds computed from a single experiment. If one is interested in where 95% of a90 values in future experiments will lie, the appropriate interval is a tolerance interval, discussed elsewhere (Li et al., 2015). The technical details also appear in a tutorial in the context of linear regression (De Gryze et al., 2007). The concept of probability of coverage discussed in this paper looks at this issue from another perspective.

Figure 1. Conventional two-parameter probability of detection (POD) model: POD versus discontinuity size a (mm), showing the POD(a) curve, its upper confidence bound, and the a50, a90, and a90/95 values.


Three- and Four-parameter Models and Markov Chain Monte Carlo
Three- and four-parameter POD models have been proposed to address limitations of the conventional two-parameter models (Moore and Spencer, 1998; Spencer, 2014). Figure 1 shows the conventional two-parameter model. The quantities of interest are a50, a90, and a90/95. The mean 50% POD is a50, the mean 90% POD is a90, and the upper 95% confidence bound on a90 is known as a90/95. The two-parameter model forces the POD curve to zero as the discontinuity size approaches zero, and to one as the discontinuity size approaches infinity. There are many data sets that do not support a POD of one for any discontinuity size, and so a modified model that includes lower and upper asymptotes was developed to address this issue (Moore and Spencer, 1998; Spencer, 2014). The conventional model can be modified by adding additional parameters. Consider Equations 1 and 2, which show a four-parameter model for the logit and probit links, respectively.
(1) $p_i = \alpha + (\beta - \alpha)\,\frac{\exp(b_0 + b_1 \log a_i)}{1 + \exp(b_0 + b_1 \log a_i)}$ (logit)

(2) $p_i = \alpha + (\beta - \alpha)\,\Phi(b_0 + b_1 \log a_i)$ (probit)

The parameter α represents the fact that there is a finite probability of detecting cracks well below the size intended and can be thought of as a false call rate. A recent work points out that even though POD when no discontinuity is present might (and probably does) equal something other than zero, it is not POD per se, but leads to a function that models an independent false call process on detections (Spencer, 2014). There is also a finite probability that very large cracks will not be detected, and an upper asymptote can also be estimated for this. The parameter β represents this for large discontinuity sizes. In Figure 2a, the POD curve approaches one for very large discontinuities but has an α parameter that needs to be estimated as the discontinuity size approaches zero. Figure 2b shows the β parameter that needs to be estimated to represent detection capability at very large discontinuity sizes. Figure 2c shows a four-parameter model that requires both α and β to be calculated.
Depending on whether α and β are included in the model, there are four candidate models (a small sketch of this model family follows the list):
● Two-parameter model that does not include α or β.
● Three-parameter model with a lower asymptote, α.
● Three-parameter model with an upper asymptote, β.
● Four-parameter model that includes both α and β.
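To make the model family concrete, the following Python sketch (not from the cited works; the parameter values are made up for illustration) evaluates Equations 1 and 2, with α and β defaulting to 0 and 1 so that the same function covers all four candidate models.

```python
import numpy as np
from scipy.stats import norm

def pod_four_param(a, b0, b1, alpha=0.0, beta=1.0, link="logit"):
    """POD(a) = alpha + (beta - alpha) * F(b0 + b1 * log a).

    alpha is the lower asymptote (false call behavior) and beta the
    upper asymptote; alpha=0 and beta=1 recover the two-parameter
    model, and fixing only one of them gives the two three-parameter
    variants.
    """
    eta = b0 + b1 * np.log(a)
    f = 1.0 / (1.0 + np.exp(-eta)) if link == "logit" else norm.cdf(eta)
    return alpha + (beta - alpha) * f

sizes = np.array([0.5, 1.0, 2.0, 4.0])   # discontinuity sizes (mm)
print(pod_four_param(sizes, b0=-3.0, b1=3.0))                         # 2-parameter
print(pod_four_param(sizes, b0=-3.0, b1=3.0, alpha=0.02, beta=0.98))  # 4-parameter
```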
Historically, the logit and probit links have been found to fit POD data well, so there are a total of eight possible models. The question of which model is appropriate for a given data set can be answered by using the Bayes factor approach described in prior work (Kass and Raftery, 1995; Knopp and Zeng, 2013). Recently, an alternative technique of computing confidence intervals for POD models that include lower and upper asymptotes via MCMC was introduced (Knopp and Zeng, 2013). MCMC techniques are very similar to Monte Carlo techniques with one important distinction: the samples in Monte Carlo are statistically independent, which means an individual sample does not depend on a previous sample. In MCMC, the samples are correlated with each other. MCMC has proven to be an effective way to compute the multidimensional integrals that occur in Bayesian calculations. Since MCMC is the computational engine that enables Bayesian analysis, computing confidence bounds with non-informative priors is a graceful first step toward introducing Bayesian techniques, which are necessary for model-assisted POD.
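As an illustration of the idea, the sketch below fits a two-parameter logit hit/miss model with flat (non-informative) priors using a simple random-walk Metropolis sampler. The data, starting values, and tuning constants are hypothetical, and this is not the specific sampler used in the cited work; it only shows how posterior draws of b0 and b1 translate into an interval for a90.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical hit/miss data: sizes a (mm) and binary detections y
a = rng.uniform(0.5, 5.0, 60)
p_true = 1.0 / (1.0 + np.exp(-(-4.0 + 3.0 * np.log(a))))
y = rng.binomial(1, p_true)

def log_post(theta):
    """Log posterior for a two-parameter logit model with flat priors."""
    b0, b1 = theta
    eta = b0 + b1 * np.log(a)
    return np.sum(y * eta - np.log1p(np.exp(eta)))  # Bernoulli log likelihood

# Random-walk Metropolis: propose, accept with probability ratio
theta = np.array([-4.0, 3.0])
lp = log_post(theta)
draws = []
for _ in range(20000):
    prop = theta + rng.normal(0, 0.15, 2)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    draws.append(theta)
draws = np.array(draws[5000:])   # discard burn-in; samples are correlated

# Posterior draws of a90 = exp((logit(0.9) - b0) / b1); logit(0.9) = ln 9
a90 = np.exp((np.log(9.0) - draws[:, 0]) / draws[:, 1])
print("posterior median a90:", np.median(a90))
print("95th percentile (analogous to a90/95):", np.percentile(a90, 95))
```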

Figure 2. Probability of detection (POD) model options: (a) three-parameter with lower bound; (b) three-parameter with upper bound; and (c) four-parameter with lower and upper bound.

Bootstrapping
Current guidance for POD studies assumes that a linear relationship exists between the explanatory variable, such as discontinuity size, which is commonly designated with an a in POD literature, and the measurement response, designated â. Typically, a logarithmic transform will remedy cases where the linear relationship is not established, but this is not always the case. Models with additional complexity beyond a linear model are sometimes necessary for proper analysis of data. This was the case in the analysis of data from an inspection of subsurface cracks around fastener sites using eddy current (Knopp et al., 2012). The difficulty in these cases is that the procedures for confidence bound calculations are not developed; however, a flexible approach called bootstrapping was demonstrated. Bootstrapping is essentially sampling with replacement; it is a very easy technique to implement for more complicated models and is very useful for model-assisted POD.
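A minimal sketch of the idea, assuming hypothetical data and a two-parameter logit fit (the cited study used a more complex model): resample observation pairs with replacement, refit, and read an a90/95 analogue off the upper tail of the bootstrap distribution of a90.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

# Hypothetical inspection data: size a (mm) and signal response ahat
a = rng.uniform(0.5, 5.0, 80)
ahat = 0.1 + 0.05 * a + rng.normal(0, 0.02, 80)
hits = (ahat > 0.2).astype(int)       # hit/miss at a decision threshold

def fit_a90(a_s, y_s):
    """Fit a two-parameter logit model and return the estimated a90."""
    X = sm.add_constant(np.log(a_s))
    res = sm.GLM(y_s, X, family=sm.families.Binomial()).fit()
    b0, b1 = res.params
    return np.exp((np.log(9.0) - b0) / b1)

boot = []
for _ in range(2000):
    idx = rng.integers(0, len(a), len(a))   # resample pairs with replacement
    try:
        boot.append(fit_a90(a[idx], hits[idx]))
    except Exception:                       # skip degenerate resamples
        continue

print("bootstrap 95th percentile of a90 (a90/95 analogue):",
      np.percentile(boot, 95))
```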


Nonparametric Models
The POD model described in MIL-HDBK-1823A assumes an S-shaped curve described by two parameters (DOD, 2010). The three-parameter and four-parameter models discussed earlier are modified versions of that model. An entirely different idea is to not assume any particular model form, which is referred to in the literature as a nonparametric model. A nonparametric model was proposed for POD with the only assumption being that the POD function is monotonically increasing with respect to discontinuity size (Spencer, 2011). This model is useful for many reasons. First, it can be used as a screening model by comparing the form of the nonparametric model with the selected parametric model. For example, if the nonparametric model closely follows a three-parameter model with an upper asymptote, chances are that the three-parameter model is the best fit. It is generally useful to see what type of model form the data dictates before forcing a parametric model on it.
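One way to sketch such a monotone, model-free estimate (in the spirit of the monotonicity-only assumption, though not necessarily Spencer's exact estimator) is isotonic regression on the hit/miss indicators, shown below with synthetic data.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(3)

# Hypothetical hit/miss data ordered by discontinuity size (mm)
a = np.sort(rng.uniform(0.5, 5.0, 100))
p_true = 1.0 / (1.0 + np.exp(-(-4.0 + 3.0 * np.log(a))))
y = rng.binomial(1, p_true)

# Monotone (nondecreasing) fit to the binary detections; the only
# assumption is that POD does not decrease with discontinuity size
iso = IsotonicRegression(y_min=0.0, y_max=1.0, increasing=True)
pod_np = iso.fit_transform(a, y)

# Inspect the step-like nonparametric curve at a few sizes as a
# screening check against candidate parametric model forms
for size in (1.0, 2.0, 3.0, 4.0):
    print(f"a = {size} mm: nonparametric POD ~ {iso.predict([size])[0]:.2f}")
```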
Box-Cox Transformation
It is always advantageous to use the measured response data for POD evaluation, since there is more information contained in that form than in hit/miss data; however, real inspection data often violate core assumptions required to use a POD model fit. Another development is the use of the Box-Cox transformation to mitigate violations of homoscedasticity in data analysis. Homoscedasticity means that the scatter in the observations is constant over the discontinuity size range. For cases where there is a relationship between the mean response and the variance, the Box-Cox transformation is used to stabilize the variance. This technique assumes that the relationship between the error variance, σi², and the mean response, μi, can be described with a power transformation on â in the form of Equation 3. The new regression model in Equation 4 includes the additional parameter λ, which also needs to be estimated.

(3) $\hat{a}' = \hat{a}^{\lambda}$

(4) $\hat{a}_i^{\lambda} = \beta_0 + \beta_1 a_i + \varepsilon_i$

The technique described in an outside work was followed exactly (Kutner et al., 2004). A numerical search procedure was set up to estimate λ. The observations were first standardized so that the order of magnitude of the error sum of squares was not dependent on the value of λ.

The standardized observations were:

(5) $g_i = \frac{1}{\lambda c^{\lambda - 1}}\left(\hat{a}_i^{\lambda} - 1\right), \quad \lambda \neq 0$

(6) $g_i = c \ln \hat{a}_i, \quad \lambda = 0$

where n is the total number of observations and $c = \left(\prod_i \hat{a}_i\right)^{1/n}$, which happens to be the geometric mean of the observations.
Once these standardized observations are obtained, they are regressed on a, which in this case is crack length, and the sum of squares error (SSE) is obtained. The optimization problem is formulated such that the objective is to minimize SSE with λ as the single parameter to be adjusted. An example of how the Box-Cox transform is used was presented in the context of an eddy current inspection of cracks around fastener sites in an aircraft structure (Knopp et al., 2012).
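A sketch of that numerical search, assuming synthetic data and following the standardized transform of Equations 5 and 6:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(4)

# Hypothetical response data whose scatter grows with the mean
a = rng.uniform(1.0, 5.0, 60)                       # crack length (mm)
ahat = (0.1 + 0.08 * a) * np.exp(rng.normal(0, 0.15, 60))

def sse_for_lambda(lam):
    """SSE of the standardized Box-Cox transform (Kutner et al., 2004)."""
    c = np.exp(np.mean(np.log(ahat)))               # geometric mean
    if abs(lam) < 1e-8:
        g = c * np.log(ahat)                        # Equation 6
    else:
        g = (ahat**lam - 1.0) / (lam * c**(lam - 1.0))   # Equation 5
    # Regress the standardized g on crack length a; return SSE
    X = np.column_stack([np.ones_like(a), a])
    beta, *_ = np.linalg.lstsq(X, g, rcond=None)
    resid = g - X @ beta
    return resid @ resid

res = minimize_scalar(sse_for_lambda, bounds=(-2.0, 2.0), method="bounded")
print("estimated lambda:", res.x)
```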
Sample Size
One of the more common questions asked about POD studies is how many samples will be required. Both the sample size and the distribution of the samples will affect the POD evaluation. MIL-HDBK-1823A recommends at least 40 samples when the signal response data are used and 60 samples for hit/miss (DOD, 2010). The question of the range of discontinuity sizes and the distribution of discontinuity sizes is not discussed in MIL-HDBK-1823A, and has not been examined extensively in the literature except in a few cases (Berens and Hovey, 1985; Safizadeh et al., 2004). Recently, this question was investigated for hit/miss data via simulation, looking at the discontinuity size distribution and the effects of moving the center of the sample distribution relative to the true a50 value (Annis, 2014). The recommendations from this study include using a uniform distribution of discontinuity sizes and a range that spans from 3 to 97% POD. The number of specimens agrees with MIL-HDBK-1823A in that a minimum of 60 specimens should be used.
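As a sketch of these recommendations, the snippet below assumes a hypothetical planning curve, inverts it at 3% and 97% POD, and draws 60 uniformly distributed discontinuity sizes over that range.

```python
import numpy as np

# Assumed (hypothetical) two-parameter logit POD curve used for planning
b0, b1 = -4.0, 3.0

def size_at_pod(p):
    """Invert POD(a) = logistic(b0 + b1*log a) to get the size where POD = p."""
    return np.exp((np.log(p / (1.0 - p)) - b0) / b1)

lo, hi = size_at_pod(0.03), size_at_pod(0.97)   # 3% and 97% POD sizes
rng = np.random.default_rng(5)
sizes = rng.uniform(lo, hi, 60)   # 60 specimens, uniform in size per Annis (2014)
print(f"size range: {lo:.2f} to {hi:.2f} mm; n = {sizes.size}")
```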
Bayesian Design of Experiments
A Bayesian approach to planning hit/miss experiments has also been presented (Koh and Meeker, 2014). This allows engineers to use any prior information that may be known about a POD curve to assist in designing the experiment. The conclusion of this work was that optimal test plans developed purely from Bayesian techniques may not be practical, but they can be used to develop a compromise between the optimal test plan and a uniform distribution of discontinuity sizes. Another conclusion was that the recommendation of 60 observations for hit/miss analysis performs well for estimating a50, and may be slightly sub-optimal for estimating a90, but the uniform distribution recommendation for hit/miss experiments still performs quite well.

Simulation Studies
Going forward, the authors recommend simulation studies to provide the NDT practitioner with a connection between the intuition gained from inspection experience and the statistical techniques used to quantify the capability of an inspection. For example, simulation studies can be used to benchmark the effect of sample size on confidence bound calculations by means of evaluating probability of coverage. Probability of coverage is defined as the probability that the interval contains the quantity of interest. In this work, covering a90 with a 95% upper confidence bound (that is, a90/95) is of particular interest, so the probability that a90/95 is greater than the true value of a90, as defined in Equation 7, is what is meant by probability of coverage in this paper. The objective of simulation studies is to show how often (in terms of percent) a confidence interval contains the true parameter of interest.

(7) $P\{a_{90/95} > \text{true } a_{90}\}$



A simple case with a model that includes both additive and multiplicative noise is proposed as an approach for how to conduct a simulation study.




Simulation Study with Additive and Multiplicative Noise
This simulation study uses a noise model that includes both additive and multiplicative noise, as represented in Equation 8, where â is the signal response and a is the indication size. The additive noise component is designated by εadd, and the multiplicative noise component is designated by εmult. This additive and multiplicative noise model resembles realistic inspection data, and is used to generate a synthetic data set for simulation purposes only.

(8) $\hat{a} = \beta_0 + \beta_1 a \left(1 + \varepsilon_{\text{mult}}\right) + \varepsilon_{\text{add}}$

The linear model parameters for data used in prior work were used as the basis for this study (Knopp et al., 2013). This model and its associated parameters were used to create a data set with 100 000 observations to resemble a population from which samples were drawn, as shown in Figure 3. In this case, β0 = 0.13546, β1 = 0.043, σmult = 0.316, and σadd = 0.0316. The proportion of observations above the detection threshold of 0.195 was determined in intervals of 1000 observations, and the true POD curve is plotted in Figure 4. If the 100 000 simulated observations are designated the population for this inspection, then the interval of observations with a 90% proportion above 0.195 can be considered to locate the true a90 value. It was determined based on this technique that the true a90 is 2.907 mm (0.114 in.). The coverage probability of this value is investigated via simulation.

Figure 3. Simulated data with a linear model form that includes additive and multiplicative noise.

Figure 4. True probability of detection (POD) curve available from 100 000 simulated observations.
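The population construction can be sketched as follows. The size range and the binning by size are assumptions (the paper works in intervals of 1000 observations), but the model parameters are those quoted above, and the resulting true a90 lands near the reported 2.907 mm value.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000

# Noise model of Equation 8 with the parameters quoted in the text
b0, b1 = 0.13546, 0.043
s_mult, s_add = 0.316, 0.0316
a = rng.uniform(0.1, 6.0, n)                    # assumed size range (mm)
ahat = (b0 + b1 * a * (1.0 + rng.normal(0, s_mult, n))
        + rng.normal(0, s_add, n))

# True POD: proportion of responses above the 0.195 threshold, by size bin
hits = ahat > 0.195
bins = np.linspace(a.min(), a.max(), 60)
which = np.digitize(a, bins)
pod = np.array([hits[which == k].mean() for k in range(1, len(bins))])
centers = 0.5 * (bins[:-1] + bins[1:])

# Smallest size bin whose detection proportion reaches 90% ~ true a90
print("approximate true a90:", centers[np.argmax(pod >= 0.90)], "mm")
```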


The process of the simulation is simply to sample from the population of observations, compute a90 and a90/95, and repeat. In this case, this process was done for sample sizes of 30 and 100. Since the assumption of constant variance is violated, the sample data sets were converted to hit/miss for analysis and a two-parameter logit model was used. a90 and a90/95 were computed for each simulated data set, and the coverage probability was computed for both of the sample sizes. Based on experience, it is predicted that 100 observations of hit/miss data randomly sampled should provide appropriate coverage probability for POD (Annis, 2014). Appropriate coverage probability is defined as being close to the theoretical confidence value. For a 95% level of confidence, for example, the coverage probability should be approximately 95%. Hence, the percent of simulated intervals that cover the true a90 will be calculated and evaluated against 95%. Figure 5 shows box plots of the a90 and a90/95 values for the two-parameter logit model, illustrating their variation with respect to the true value of a90. As expected, the variance is higher for the estimated a90 and a90/95 values with only 30 observations. Though there does not appear to be much of a difference in the a90 values on average, the variance for the 30-observation case is approximately double that of the 100-observation case. The variance of the a90/95 values for the 30-observation case is approximately four times that of the 100-observation case. Coverage probability for the 30-observation case is approximately 85%, while coverage probability for the 100-observation case is approximately 94%. Hence, the 30-observation case more often yields overly optimistic results about the inspection capability (that is, a90/95 < true a90).

Figure 5. Box plots of a90 and a90/95 values for 30 and 100 observations (obs) for the two-parameter hit/miss model.
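A sketch of this simulation loop is shown below. For brevity it uses a Wald (delta-method) upper bound on a90 rather than the likelihood ratio bound recommended in MIL-HDBK-1823A, and it assumes a uniform size distribution, so the coverage numbers will differ somewhat from those reported.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(7)
b0, b1, s_mult, s_add = 0.13546, 0.043, 0.316, 0.0316
TRUE_A90 = 2.907                  # true value quoted in the text (mm)

def draw_sample(n):
    """Draw n observations from the Equation 8 population, as hit/miss."""
    a = rng.uniform(0.1, 6.0, n)
    ahat = (b0 + b1 * a * (1 + rng.normal(0, s_mult, n))
            + rng.normal(0, s_add, n))
    return a, (ahat > 0.195).astype(int)

def a90_and_bound(a, y):
    """Two-parameter logit fit; a90 plus a 95% upper bound via the
    delta method (a Wald-style stand-in for the likelihood ratio bound)."""
    X = sm.add_constant(np.log(a))
    res = sm.GLM(y, X, family=sm.families.Binomial()).fit()
    c0, c1 = res.params
    la90 = (np.log(9.0) - c0) / c1                 # log a90
    grad = np.array([-1.0 / c1, -la90 / c1])       # d(log a90)/d(c0, c1)
    se = np.sqrt(grad @ res.cov_params() @ grad)
    return np.exp(la90), np.exp(la90 + norm.ppf(0.95) * se)

for n in (30, 100):
    cover = []
    for _ in range(1000):
        try:
            _, upper = a90_and_bound(*draw_sample(n))
            cover.append(upper > TRUE_A90)
        except Exception:          # skip separated or non-converged fits
            continue
    print(f"n = {n}: coverage of true a90 ~ {np.mean(cover):.2f}")
```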


These are the types of investigations that can be
conducted via simulation studies. It is hoped that they
will be used more frequently as different techniques of
modeling POD are proposed.

Summary and Conclusion
There have been considerable contributions to POD modeling in the last decade. It is hoped that procedures for three-parameter and four-parameter models, and also the bootstrapping approach for confidence intervals, will be codified for more general use. Simulation studies were introduced as a way to quantify the effect of sample size on confidence bound calculations. In the future, such simulation studies can be used to quantify the effect of changes in model form or different techniques for calculating confidence bounds. Further simulation studies can investigate the effect of the choice of left censoring the data and of the detection threshold on the coverage of the true a90. Hence, it is recommended that future POD software projects enable simulation studies to facilitate such investigations.
AUTHORS
Jeremy S. Knopp: Air Force Research Laboratory
(AFRL/RXCA), Wright-Patterson AFB, Ohio 45433.
Frank Ciarallo: Wright State University, Dayton, Ohio 45435.
Ramana V. Grandhi: Wright State University, Dayton, Ohio
45435.
REFERENCES

Annis, C., and J.S. Knopp, "Comparing the Effectiveness of a90/95 Calculations," Review of Progress in Quantitative Nondestructive Evaluation, Vol. 26, 2007, pp. 1767-1774.

Annis, C., "Influence of Sample Characteristics on Probability of Detection Curves," Review of Progress in Quantitative Nondestructive Evaluation, Vol. 33, 2014, pp. 2039-2046.

Berens, A.P., and P.W. Hovey, "The Sample Size and Flaw Size Effects in NDI Reliability Experiments," Review of Progress in Quantitative Nondestructive Evaluation, Vol. 4, 1985, pp. 1327-1334.

Berens, A.P., "NDE Reliability Data Analysis," ASM Handbook, Vol. 17, 9th ed.: Nondestructive Evaluation and Quality Control, 1989, pp. 689-701.

De Gryze, S., I. Langhans, and M. Vandebroek, "Using the Correct Intervals for Prediction: A Tutorial on Tolerance Intervals for Ordinary Least-squares Regression," Chemometrics and Intelligent Laboratory Systems, Vol. 87, No. 2, 2007, pp. 147-154.

DOD, MIL-HDBK-1823A, Nondestructive Evaluation System Reliability Assessment, Department of Defense Handbook, U.S. Department of Defense, August 2010.

Harding, C.A., and G.R. Hugo, "Experimental Determination of the Probability of Detection Hit/Miss Data for Small Data Sets," Review of Progress in Quantitative Nondestructive Evaluation, Vol. 22, 2003, pp. 1823-1844.

Kass, R.E., and A.E. Raftery, "Bayes Factors," Journal of the American Statistical Association, Vol. 90, No. 430, 1995, pp. 773-795.

Knopp, J.S., and L. Zeng, "Statistical Analysis of Hit/miss Data," Materials Evaluation, Vol. 71, No. 3, 2013, pp. 323-329.

Knopp, J.S., R.V. Grandhi, J.C. Aldrin, and L. Zeng, "Considerations for Statistical Analysis of Nondestructive Evaluation Data: Hit/miss Analysis," E-Journal of Advanced Maintenance, Vol. 4, No. 3, 2012, pp. 105-115.

Knopp, J.S., R.V. Grandhi, J.C. Aldrin, and I. Park, "Statistical Analysis of Eddy Current Data from Fastener Site Inspections," Journal of Nondestructive Evaluation, Vol. 32, 2013, pp. 44-50.

Koh, Y.M., and W.Q. Meeker, "Bayesian Planning of Hit-Miss Inspection Tests," Review of Progress in Quantitative Nondestructive Evaluation, Vol. 33, 2014, pp. 2047-2054.

Kutner, M.H., C.J. Nachtsheim, J. Neter, and W. Li, Applied Linear Statistical Models, 5th ed., McGraw-Hill/Irwin, New York, New York, 2004.

Li, M., F.W. Spencer, and W.Q. Meeker, "Distinguishing Between Uncertainty and Variability in Nondestructive Evaluation," Review of Progress in Quantitative Nondestructive Evaluation, Vol. 31, 2012, pp. 1725-1732.

Li, M., F.W. Spencer, and W.Q. Meeker, "Quantile Probability of Detection: Distinguishing between Uncertainty and Variability in Nondestructive Testing," Materials Evaluation, Vol. 73, No. 1, 2015, pp. 89-95.

Moore, D.G., and F.W. Spencer, "Detection Reliability Study for Interlayer Cracks," Proceedings of the 1998 SAE Airframe/Engine Maintenance & Repair Conference, Society of Automotive Engineers, Inc., 1998.

Olin, B.D., and W.Q. Meeker, "Applications of Statistical Methods to Nondestructive Evaluation," Technometrics, Vol. 38, 1996, pp. 95-112.

Safizadeh, M.S., D.S. Forsyth, and A. Fahr, "The Effect of Flaw Size Distribution on the Estimation of POD," Insight, Vol. 46, No. 6, June 2004, pp. 15.

Spencer, F.W., "The Calculation and Use of Confidence Bounds in POD Models," Review of Progress in Quantitative Nondestructive Evaluation, Vol. 26, 2007, pp. 1791-1798.

Spencer, F.W., "Nonparametric POD Estimation for Hit/miss Data: A Goodness of Fit Comparison for Parametric Models," Review of Progress in Quantitative Nondestructive Evaluation, Vol. 30, 2011, pp. 1557-1564.

Spencer, F.W., "Curve Fitting for Probability of Detection Data: A 4-parameter Generalization," Review of Progress in Quantitative Nondestructive Evaluation, Vol. 33, 2014, pp. 2055-2062.
