
Part 8: Calibration Diagnostics

by David Coleman and Lynn Vanatta


The past seven articles have provided the fundamentals of calibration, including a protocol for designing a
calibration study. Once such a study has been performed (i.e., the chosen standards have been analyzed
in replicate), the data are available for constructing a
calibration curve. (See installments 5 and 6 in American Laboratory June 2003 and July 2003, respectively,
for details on calibration design.) It is tempting simply
to fit a straight line (SL) using ordinary least squares
(OLS) and to evaluate the SL model using only the
correlation coefficient, or its square, R². Note that R²
represents the proportion of the variation in the
response (y-value) that is explained by the calibration. If R² is high enough (typically 0.99 or better), the
curve is deemed adequate.
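
As an illustration of how R² is computed for an OLS straight line, the following minimal sketch uses only the Python standard library. The function names and the data set are hypothetical and are not taken from this series; the responses are deliberately generated from a purely quadratic relationship (y = x²) so that the R² of a straight-line fit to curved data can be inspected.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def r_squared(xs, ys, a, b):
    """Proportion of the variation in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_resid / ss_total


# Hypothetical data: a purely quadratic response, y = x**2, for x = 10..20.
conc = list(range(10, 21))
resp = [x ** 2 for x in conc]

a, b = fit_line(conc, resp)
r2 = r_squared(conc, resp, a, b)
print(f"a = {a:.1f}, b = {b:.1f}, R^2 = {r2:.4f}")
```

For these made-up numbers, R² comes out above 0.99 even though the underlying relationship is not a straight line, which is one reason a high R² by itself is a weak test of the model.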
However, more rigorous testing should be done to
ensure that the selected model and fitting technique
are adequate. Statistics can provide the necessary tools
for this task. This article is the first of three that will
discuss the process in detail. The evaluation involves
seven basic steps, all of which are easy to perform with
the help of statistical software. The steps are:
1. Plot response versus true concentration
2. Determine the behavior of the standard deviation of the response
3. Fit the proposed model and evaluate R²adj
4. Examine the residuals for nonrandomness
5. Evaluate the p-value for the slope (and any higher-order terms)
6. Perform a lack-of-fit test
7. Plot and evaluate the prediction interval.
Before these steps are discussed in detail, the concept
of p-value (p stands for probability) must be introduced. This statistic will be used throughout the diagnosis process to guide decisions about plausible
hypotheses. Most statistical evaluations of data involve
hypothesis testing. A proposed assumption (called the
null hypothesis) is made for the data; a contrasting
assumption is called the alternative hypothesis. When
the statistical test is performed, the question being
asked is, What are the odds of getting this set of data

NOVEMBER 2003 AMERICAN LABORATORY

(or data at least this unusual, as defined by the alternative hypothesis), purely by chance, if the null (or starting) hypothesis is true? The p-value is that probability.
If the odds are low (typically less than 1% or 5%,
depending on the test being used), then the null
hypothesis is rejected and the alternative is accepted.
Note that one can never prove a hypothesis to be false;
one can only decide based on the weight of the evidence, much as in a court of law.
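
One way to make the p-value idea concrete is an exact permutation test, sketched below with the Python standard library. This is a technique chosen here purely for illustration; it is not the test used in this series, and the data and names are hypothetical. The null hypothesis is that response is unrelated to concentration; the alternative is that the slope is positive. The p-value is the fraction of all possible reassignments of the responses that give a slope at least as large as the one observed.

```python
from itertools import permutations


def cross_product_sum(xs, ys):
    """Sum of centered cross products; proportional to the OLS slope."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


# Hypothetical data: six standards, one response each.
conc = [1, 2, 3, 4, 5, 6]
resp = [1.2, 1.9, 3.2, 3.8, 5.1, 6.0]

observed = cross_product_sum(conc, resp)

# Under the null hypothesis every assignment of the responses to the
# concentrations is equally likely, so all 6! arrangements are enumerated.
tol = 1e-9  # guards against floating-point round-off in the comparison
count = 0
total = 0
for shuffled in permutations(resp):
    total += 1
    if cross_product_sum(conc, shuffled) >= observed - tol:
        count += 1

p_value = count / total
print(f"{count} of {total} arrangements are at least as extreme; p = {p_value:.5f}")
```

Because the concentrations stay fixed while the responses are shuffled, comparing the cross-product sums is equivalent to comparing slopes. A small p-value here means that a slope this steep would rarely arise by chance alone.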

Step 1: Plot response versus true concentration
As might be expected, the first step in constructing a calibration curve is to plot the response data versus the
respective true concentrations. Even without having a
model fitted to the points, this plot is usually informative.
Any data that are possible outliers (or typos in the data
table) may be obvious. Suspect points should be investigated to see if they are correct and if they belong with the
rest of the data set. However, errant (or inconvenient)
data should not be permanently excluded unless a sound
physical reason can be found for taking such action. As
much as possible, any calibration experiment should capture the variability that will likely enter into the typical
analysis process. (The topic of outliers will be discussed in
more detail in later installments.) If there is a high degree
of curvature, this situation may also be detectable.

Step 2: Determine the behavior of the standard deviation of the response
The importance of this second step cannot be overemphasized. If the computed standard deviation (SD) of
the responses changes systematically (e.g., increases or
decreases) with concentration, then ordinary least
squares is not the appropriate fitting technique. Recall
from installment 3 (January 2003) that one of the
assumptions behind OLS is that "the standard deviation of the responses does not change over the range of
x values for which the model will be applied." However,
in analytical chemistry, this assumption does not
always hold; the variability of the response will often
increase with increasing concentration.

To perform this behavior analysis, the SD of the
responses is computed separately at each concentration
(hence, true replicates must be run). These values are
plotted versus true concentration. A straight line
(using, for example, OLS) is fitted through the points,
resulting in an equation of the form (g + hx). Note that
in this series of articles, the (a + bx) form of the
straight-line equation will be reserved for situations in
which the instrumental response is being plotted versus
true concentration. For modeling standard deviations,
(g + hx) will be used.
Associated with the slope (h) is a p-value, which
allows the analyst to decide if the slope is significant. In
general, if this p-value is less than 1% (possibly
reported by software as 0.01), then the slope is considered to be significant and the SD is declared to change
with concentration.
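
The SD analysis just described can be sketched as follows, using only the Python standard library. The replicate data and the function names are hypothetical. The p-value for the slope h is taken from the usual two-sided t-test on an OLS slope, and the t-distribution tail area is obtained here by simple numerical integration, which is an implementation choice of this sketch; statistical software would normally report this p-value directly.

```python
import math
from statistics import stdev


def fit_line_with_slope_test(xs, ys):
    """OLS fit of y = g + h*x; returns (g, h, t statistic for h, degrees of freedom)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    h = sxy / sxx
    g = mean_y - h * mean_x
    df = n - 2
    ss_resid = sum((y - (g + h * x)) ** 2 for x, y in zip(xs, ys))
    se_h = math.sqrt(ss_resid / df / sxx)  # standard error of the slope
    return g, h, h / se_h, df


def two_sided_t_p_value(t, df, steps=4000):
    """Two-sided p-value from Student's t, via Simpson integration of its density."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return coef * (1 + u * u / df) ** (-(df + 1) / 2)

    upper = abs(t)
    width = upper / steps  # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * width)
    area *= width / 3  # integral of the density from 0 to |t|
    return max(0.0, 1.0 - 2.0 * area)


# Hypothetical replicate responses, keyed by true concentration.
replicates = {
    1: [8.9, 10.0, 11.1],
    2: [18.2, 20.0, 21.8],
    5: [44.8, 50.0, 55.2],
    10: [90.3, 100.0, 109.7],
    20: [179.5, 200.0, 220.5],
}

conc = sorted(replicates)
sds = [stdev(replicates[x]) for x in conc]  # one SD per concentration

g, h, t_stat, df = fit_line_with_slope_test(conc, sds)
p_h = two_sided_t_p_value(t_stat, df)
sd_changes = p_h < 0.01  # the 1% threshold described in the text

print(f"SD model: {g:.3f} + {h:.3f}x, p-value for h = {p_h:.2g}, SD changes: {sd_changes}")
```

In this made-up data set the SD grows roughly in proportion to concentration, so the slope h is declared significant and OLS would not be the appropriate fitting technique.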
Because OLS assumes a constant SD, all data points are
allowed to influence the regression line equally; in
other words, each point carries a weight of 1. However, if some data are noisier than others (i.e., the standard deviation is not constant), then the more variable
points should not be allowed to have as much influence.
To incorporate this nonconstant SD into the regression
process, a generalization of OLS is used instead:
weighted least squares (WLS).
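
In symbols, the two fitting criteria can be written as shown below. The notation is an assumption of this sketch rather than the article's own: y_i is the response measured at true concentration x_i, w_i is the weight assigned to that point, and (a + bx) is the straight-line calibration model. Setting every w_i to 1 reduces WLS to OLS.

```latex
\text{OLS:}\quad \min_{a,\,b} \sum_{i} \left( y_i - a - b x_i \right)^2
\qquad\qquad
\text{WLS:}\quad \min_{a,\,b} \sum_{i} w_i \left( y_i - a - b x_i \right)^2
```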
As is indicated by the name, weighted least squares,
weights are involved. Various formulas have been used
by different authors to calculate these weights. However, a robust procedure uses the formula that results
from fitting a SL to the SD data (see above discussion).
The basic equation for the weight is the reciprocal
squared of the estimated standard deviation:
$$\text{weight at } x = (g + hx)^{-2}$$

To enable the calculation of root mean square error in
original response units, this weight should be normalized by dividing by the mean of all the reciprocal-squared values:
$$\frac{(g + hx)^{-2}}{\mathrm{Avg}\left[(g + hx)^{-2}\right]}$$
This formula is evaluated for each concentration and
applied to the corresponding data to weight them.
The result is that the noisy responses have less influence
on the calibration curve than do the precise values.
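
The weighting procedure can be sketched as follows with the Python standard library. The SD-model coefficients (g, h), the calibration data, and the function names are all hypothetical, and the sketch assumes that (g + hx) is positive at every concentration used. The weighted fit uses the standard closed-form solution for a weighted straight line.

```python
def normalized_weights(concs, g, h):
    """Weights (g + h*x)**-2, divided by their mean so that they average to 1."""
    raw = [(g + h * x) ** -2 for x in concs]
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]


def weighted_fit_line(xs, ys, ws):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / sw
    mean_y = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical SD model from step 2: estimated SD at x is g + h*x.
g, h = 0.1, 1.0

# Hypothetical calibration data, one entry per measurement.
conc = [1, 1, 1, 2, 2, 2, 5, 5, 5, 10, 10, 10, 20, 20, 20]
resp = [8.9, 10.0, 11.1, 18.2, 20.0, 21.8, 44.8, 50.0, 55.2,
        90.3, 100.0, 109.7, 179.5, 200.0, 220.5]

weights = normalized_weights(conc, g, h)
a, b = weighted_fit_line(conc, resp, weights)

print(f"weight at x = 1: {weights[0]:.3f}, weight at x = 20: {weights[-1]:.5f}")
print(f"WLS calibration line: {a:.3f} + {b:.3f}x")
```

With a positive h, the weights shrink as concentration rises, so the noisier high-concentration responses pull on the fitted line less than the more precise low-concentration ones.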
While step 2 may seem a computational annoyance,
ignoring a SD that changes with concentration will
have two negative results. First, the model's coefficient
estimates (g and h) will be noisy. Second (and possibly
more important), the prediction interval will be too
wide in the well-behaved-data region and too narrow
in the noisy-data region.
Steps 3 through 7 will be discussed in the next two
installments. Following the details of calibration diagnostics will be a series of articles using these procedures
to diagnose real calibration data sets.

Mr. Coleman is an Applied Statistician, Alcoa Technical Center, MSTC, 100 Technical Dr., Alcoa Center, PA 15069, U.S.A.; e-mail:
david.coleman@alcoa.com. Ms. Vanatta is an Analytical Chemist, Air
Liquide-Balazs Analytical Services, Box 650311, MS 301, Dallas,
TX 75265, U.S.A.; tel: 972-995-7541; fax: 972-995-3204; e-mail:
lynn.vanatta@airliquide.com.

