
Anal. Chem. 1998, 70, 1277-1280

Validation of Analytical Methods Using a Regression Procedure
Heinz W. Zwanziger and Costel Sarbu*

Fachbereich Chemie- und Umweltingenieurwesen, Fachhochschule Merseburg, Germany, and Faculty of Chemistry and
Chemical Engineering, Babeş-Bolyai University, RO-3400 Cluj-Napoca, Romania

The evaluation and validation of analytical methods and
instruments require comparison studies using sample
material for testing accuracy and precision. In analytical
chemistry, the commonly accepted means of analyzing
data from method comparison studies is least-squares
regression analysis, a model which has limitations. In this
paper, the results from ordinary least-squares and many
other regression approaches recommended in the literature were compared with a new regression procedure that
takes into account the errors in both variables (methods).
After a discussion of the properties of the regression
procedure, recommendations are given for carrying out
a method comparison study using informational analysis
of variance. The efficiency of the regression procedure
proposed is demonstrated by applying it to different data
sets from published literature.
Analytical measurements are of vital importance in many fields
of activity, including diagnosis and treatment of different diseases,
environmental protection, producing and marketing of some useful
materials, and the performance of many scientific studies. Therefore, experimental data resulting from chemical or instrumental
measurements should be reliable; i.e., they first should be
unbiased and precise.
From a chemometric point of view, the validation of a new
analytical method or an improved method, considering, for
example, time of analysis, price, and convenience, must ensure
first the integrity and quality of that method concerning bias,
precision, limits of detection and determination, range of linearity,
selectivity, and the transferability of the method.1-4 The great
variety of techniques used in analytical chemistry emphasizes the
need for establishing an efficient and reliable comparison methodology. The results of a large number of samples over the whole
measurement range can be evaluated by a paired t-test or by
regression. Regression procedures are preferred in many cases
because they not only are less subject to statistical problems but
also deliver more information, since the paired t-test is strictly
valid only for the detection of absolute systematic errors. Moreover, the t-test assumes that the errors are independent of the
concentration and have a normal distribution.5,6

(1) Youden, W. J.; Steiner, E. H. Statistical Manual of the AOAC; AOAC: Washington, DC, 1975.
(2) Cardone, J. M. J. Assoc. Off. Anal. Chem. 1983, 66, 1257.
(3) Miller, J. N. Analyst 1991, 116, 3.
(4) Alexandrov, Yu. I. Analyst 1996, 121, 1137.

S0003-2700(97)00926-8 CCC: $15.00
Published on Web 02/25/1998
© 1998 American Chemical Society
To date, different regression procedures have been used for
chemometric evaluation of data from method comparison studies.
Each one has specific theoretical requirements for the data. The
reliability of a procedure therefore depends largely on the
extent to which the data meet these requirements.7,9
Under the premise of a linear relationship between the two
methods in the form of

$$y = a_0 + a_1 x \tag{1}$$

the estimated values for a0 and a1 are tested against the null
hypothesis a0 = 0 and a1 = 1. If the estimated values differ only
by chance from 0 and 1 at a predefined significance level, then
the methods are considered to be equal. A slope significantly
different from unity indicates a proportional error (i.e., a matrix
effect), and an intercept different from zero implies a constant
error (i.e., a blank problem).
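The test described above can be sketched in Python as follows. This is a minimal illustration, assuming NumPy is available; the helper name `ols_bias_test` and the use of the classical OLS standard errors are assumptions made here, not part of the original procedure. The data are the eight Example 4 pairs of Table 1, as that table is read here. The returned t statistics would still have to be compared with a Student t critical value for n - 2 degrees of freedom at the chosen significance level.

```python
import numpy as np

def ols_bias_test(x, y):
    """Fit y = a0 + a1*x by ordinary least squares and return the
    estimates plus t statistics for the null hypotheses a0 = 0 and a1 = 1.
    Hypothetical helper, written only to illustrate the test."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    sxx = np.sum((x - x.mean()) ** 2)
    a1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    a0 = y.mean() - a1 * x.mean()
    resid = y - (a0 + a1 * x)
    s2 = np.sum(resid ** 2) / (n - 2)                   # residual variance
    se_a1 = np.sqrt(s2 / sxx)                            # standard error of the slope
    se_a0 = np.sqrt(s2 * (1.0 / n + x.mean() ** 2 / sxx))  # standard error of the intercept
    return a0, a1, a0 / se_a0, (a1 - 1.0) / se_a1

# Example 4 pairs (assumed reading of Table 1)
x4 = [7.32, 15.80, 4.60, 9.04, 7.16, 6.80, 9.90, 28.70]
y4 = [5.48, 13.00, 3.29, 6.84, 6.00, 5.84, 14.30, 18.80]

a0, a1, t_a0, t_a1 = ols_bias_test(x4, y4)
print(f"a0 = {a0:.3f}, a1 = {a1:.3f}")
print(f"t(a0 = 0) = {t_a0:.2f}, t(a1 = 1) = {t_a1:.2f}")
```

A large absolute t value for the slope points to a proportional error, and a large absolute t value for the intercept points to a constant error.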
The linear regression based on ordinary least-squares presumes that error(Y) ≫ error(X). For the measurements of
method Y, analytical errors are allowed, so that repeated measurements of one sample will scatter perpendicularly to the x-axis
around their expected value on the regression line. The parameters a0 and a1 of the regression equation are determined by
minimizing the sum of squared distances (residuals) between
measurement points and the regression line. The method of least squares is sensitive to extreme data points, which may result in
biased values of a0 and a1. A change in the assignment of the
methods to the variables of the regression procedure results in
new parameters which cannot be converted into the old ones by
the regression equation.
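The asymmetry of ordinary least squares can be checked numerically. The sketch below assumes NumPy and uses made-up data; the helper `ols_line` is hypothetical. It fits y on x and x on y and shows that the two slopes are not reciprocals of each other, so the two fits describe different lines.

```python
import numpy as np

def ols_line(u, v):
    """Return (intercept, slope) of the OLS regression of v on u."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    slope = np.sum((u - u.mean()) * (v - v.mean())) / np.sum((u - u.mean()) ** 2)
    return v.mean() - slope * u.mean(), slope

# Made-up illustrative data (not taken from the paper)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.3]

b0, b1 = ols_line(x, y)   # regression of y on x
c0, c1 = ols_line(y, x)   # regression of x on y

# If both fits described the same line, b1 would equal 1 / c1,
# i.e. the product b1 * c1 would be exactly 1. It equals r^2 instead.
print(f"slope of y on x: {b1:.4f}")
print(f"1 / slope of x on y: {1.0 / c1:.4f}")
print(f"product of slopes (r^2): {b1 * c1:.4f}")
```

The product of the two slopes equals the squared correlation coefficient, which is below 1 whenever the points do not lie exactly on a line.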
A more appropriate approach for method comparison seems
to be the structural relationship model, which allows error terms
for both variables. The orthogonal regression, also known as the
Deming procedure, is based on the assumption that the standard
deviation of the measurement errors is the same for both methods,
whereas the standardized principal component procedure assumes
that the standard deviations of the measurement errors are different
but proportional.10-12 The parameters a0 and a1 of the regression
function are calculated in these cases by minimizing the sum of
squared distances to the calibration line. Extreme data points
also have a strong influence on the values of a0 and a1 and can
lead to biased estimates. A change in the assignment of the
methods to the variables does not alter the results of the method
comparison. Linear regression and the principal component
procedures require normally distributed error terms and sample
populations.

(5) Massart, D. L.; Vandeginste, B. M. G.; Deming, S. N.; Michotte, Y.; Kaufman, L. Chemometrics: A Textbook; Elsevier: Amsterdam, 1988; p 88.
(6) Miller, J. C.; Miller, J. N. Statistics for Analytical Chemistry, 2nd ed.; Ellis Horwood: Chichester, 1988; p 101.
(7) Passing, H.; Bablok, W. J. Clin. Chem. Clin. Biochem. 1983, 21, 709.
(8) Hartmann, C.; Smeyers-Verbeke, J.; Massart, D. L. Analusis 1993, 21, 125.
(9) Kalantar, A. H.; Gelb, B. R.; Alper, J. S. Talanta 1995, 42, 597.

Analytical Chemistry, Vol. 70, No. 7, April 1, 1998 1277
The procedures of Theil17 and Passing and Bablok7,11 used
for method comparison studies are less sensitive to the underlying
distribution of the data. In particular, they are resistant to
deviating data points, which are likely to produce biased results
for procedures using least-squares estimators.
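The resistance to deviating points can be illustrated with Theil's estimator, which takes the slope as the median of all pairwise slopes. The following is a minimal sketch using only the Python standard library; the helper `theil_line`, the choice of the intercept as the median of y - m*x, and the data are assumptions made for illustration. The Passing and Bablok procedure adds further steps that are not reproduced here.

```python
from itertools import combinations
from statistics import median

def theil_line(x, y):
    """Return (intercept, slope) with the slope taken as the median of all
    pairwise slopes. The intercept rule, median of y - m*x, is a common
    convention and is assumed here."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)
              if x[j] != x[i]]                     # skip pairs with equal x
    m = median(slopes)
    n = median(yi - m * xi for xi, yi in zip(x, y))
    return n, m

# Made-up data: four points on y = x and one gross outlier
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.0, 1.0, 2.0, 3.0, 40.0]

intercept, slope = theil_line(x, y)
print(intercept, slope)   # prints 0.0 1.0
```

The single outlier changes four of the ten pairwise slopes but leaves their median, and hence the fitted line, unchanged.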
A NEW REGRESSION PROCEDURE
When two analytical methods are compared, both are affected
by errors to a greater or lesser extent, so in practice it does not matter
which is X and which is Y. In other words, we may equally well write
y = f(x) or x = f(y). In this situation, a more illuminating and
intuitive alternative is to consider the implicit form of the linear
function. It is well-known from analytical geometry that the
general linear function

$$Ax + By + C = 0 \tag{2}$$

Table 1. Relevant Data Sets Concerning Methods Comparison Studies Discussed in This Paper

Example 1 (30 pairs)
    x       y    |    x       y    |    x       y    |    x       y
   8.71    7.35      0.90    1.25      1.81    2.31      0.40    1.29
   7.01    7.92      9.39    6.58      1.27    1.88      0.00    0.37
   3.28    3.40      4.39    3.31      0.82    0.44      1.98    2.16
   5.60    5.44      3.69    2.72      1.88    1.37     10.21   12.53
   1.55    2.07      0.34    2.32      5.66    7.04      4.64    3.90
   1.75    2.29      1.94    1.50      0.00    0.00      5.66    4.66
   0.73    0.66      2.07    3.50      0.00    0.49     19.25   15.86
   3.66    3.43      1.38    1.17

Examples 2 and 3 (37 pairs)
    x       y    |    x       y    |    x       y    |    x       y
   2.34    2.48      1.35    1.04      0.38    0.35      1.96    1.80
   1.20    1.22      2.04    1.89      7.27    7.75      0.33    0.23
   1.88    2.14      1.97    1.90      0.28    0.14      5.54    5.21
   0.08    0.0026    1.02    0.75      1.55    1.61      1.85    1.76
   0.12    0.0023    1.45    1.16      0.06    0.013     3.40    3.62
   1.12    1.05     28.20   29.40      1.50    1.63      4.19    4.07
   1.60    1.42     22.60   23.70      1.06    1.05      0.04    0.048
  22.40   26.30     22.37   23.30      5.19    5.50      3.02    3.06
   2.16    1.99     27.00   29.50      0.33    0.054     1.33    1.35
   1.34    1.06

Example 4 (8 pairs)
    x       y    |    x       y    |    x       y    |    x       y
   7.32    5.48      4.60    3.29      7.16    6.00      9.90   14.30
  15.80   13.00      9.04    6.84      6.80    5.84     28.70   18.80

represents a straight line when the coefficients A and B are
different from zero. Considering B ≠ 0, (2) can be
formulated as follows:

$$y = -\frac{A}{B}x - \frac{C}{B} = mx + n \tag{3}$$

In this case, the distance of any point (xi, yi) to the line (2) will
have the following expression:

$$d_i^2 = \frac{(Ax_i + By_i + C)^2}{A^2 + B^2} \tag{4}$$

The resulting fitting problem will be defined by the condition

$$S = \frac{1}{A^2 + B^2}\sum_{i=1}^{n}(Ax_i + By_i + C)^2 = \min \tag{5}$$

The minimum of S follows from setting to zero the
derivatives of S with respect to A, B, and C. From the resulting
system, considering M(x) = 0, M(y) = 0, and C = 0, we obtain

$$\frac{M(y^2) - M(x^2)}{M(xy)} = \frac{B^2 - A^2}{AB} = \frac{B}{A} - \frac{A}{B} \tag{6}$$

Now, it is easy to observe that -A/B in (6) is the slope of the
linear equation

$$y = mx + n \tag{7}$$

resulting from (3). After substitution of m in (6) and taking
[M(y^2) - M(x^2)]/M(xy) = w, we obtain a quadratic equation,

$$m^2 - wm - 1 = 0 \tag{8}$$

Taking into account only the positive value of m and considering
that the centroid of x and y must satisfy the equation of the straight
line (7), we can calculate n from the following equation:

$$y = mx + \bar{y} - m\bar{x} \tag{9}$$

By comparing (2) and (7) and taking C = 1, we can obtain
immediately A and B.

(10) Feldmann, U.; Schneider, B.; Klinkers, H. J. Clin. Chem. Clin. Biochem. 1981, 19, 121.
(11) Bablok, W.; Passing, H. J. Clin. Lab. Aut. 1985, 7, 74.
(12) Ripley, B. D.; Thomson, M. Analyst 1987, 112, 377.
(13) Sarbu, C. Anal. Chim. Acta 1993, 271, 269.
(14) Sarbu, C. Anal. Lett. 1997, 30, 1051.
(15) Onicescu, O. C. R. Acad. Sci. Ser. A 1966, 263, 841.
(16) Onicescu, O.; Ştefanescu, V. Informational Statistics; Editura Tehnica: Bucureşti, 1979.
(17) Theil, H. Proc. K. Ned. Akad. Wet., Ser. A53 1950, 386.
(18) Maw, R.; Witry, L.; Emond, T. Spectroscopy 1994, 49, 39.
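The fitting steps in eqs 6-9 can be sketched in Python as shown below. This is an illustration under stated assumptions, not the authors' implementation: NumPy is assumed, M(·) is taken to be the arithmetic mean, the function name `ilf_fit` is hypothetical, and the data are the eight Example 4 pairs of Table 1 as that table is read here. The coefficients A and B are scaled so that C = 1, which requires a nonzero intercept n.

```python
import numpy as np

def ilf_fit(x, y):
    """Implicit linear function fit following eqs 6-9 (illustrative sketch).

    Returns the slope m, intercept n, and the coefficients A, B of
    A*x + B*y + 1 = 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()                 # centring gives M(x) = 0
    yc = y - y.mean()                 # centring gives M(y) = 0
    # w = [M(y^2) - M(x^2)] / M(xy), computed on the centred data
    w = (np.mean(yc ** 2) - np.mean(xc ** 2)) / np.mean(xc * yc)
    # positive root of m^2 - w*m - 1 = 0 (eq 8)
    m = (w + np.sqrt(w ** 2 + 4.0)) / 2.0
    # the line passes through the centroid (eq 9)
    n = y.mean() - m * x.mean()
    # m*x - y + n = 0 divided by n, so that C = 1
    A = m / n
    B = -1.0 / n
    return m, n, A, B

# Example 4 pairs (assumed reading of Table 1)
x4 = [7.32, 15.80, 4.60, 9.04, 7.16, 6.80, 9.90, 28.70]
y4 = [5.48, 13.00, 3.29, 6.84, 6.00, 5.84, 14.30, 18.80]

m, n, A, B = ilf_fit(x4, y4)
print(f"y = {n:.3f} + {m:.3f}x")
```

Because the quadratic (8) has roots whose product is -1, the positive root is chosen here exactly as the text prescribes; data with a negative association between the methods would need the other root, a case the text does not discuss.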
The ratio of A and B could indicate the (dis)similarity between
the analytical methods compared. Thus, it is now possible to
compare analytical methods by considering either the ratio A/B or
B/A (the equations are symmetrical) or, much more simply, their
values. Considering the last possibility, we have to stress that, in
the ideal case, when the two methods produce practically equal
results, A and B will have the same absolute value. Conversely,
the larger the difference between A
and B, the more the two methods differ. However, at
this point, one problem remains: how to test objectively and
rationally the significance of the difference between the absolute
values of the two coefficients A and B. This can be done using,
for example, informational analysis of variance, as demonstrated
in the next section.

Table 2. Regression Analysis Concerning the Comparison of the Methods in Table 1 (fitted lines)

regression method                 example 1                  example 2                  example 3                  example 4

linear regression (OLS)           y = 0.544 + 0.845x         y = -0.162 + 1.063x        y = -0.194 + 1.084x        y = 2.279 + 0.619x
                                  x = -0.299 + 1.089y        x = 0.203 + 0.899y         x = 0.193 + 0.920y         x = 0.413 + 1.402y

weighted regression (WLS)         y = 0.005 + 0.890x         y = -0.081 + 0.839x        y = -0.066 + 0.795x
                                  x = -0.167 + 0.631y        x = 0.171 + 0.918y         x = 0.148 + 0.937y

iterated weighted                 y = 0.109 + 0.904x         y = -0.135 + 1.048x        y = -0.153 + 1.062x
regression (IWLS)                 x = -0.083 + 0.871y        x = 0.161 + 0.925y         x = 0.152 + 0.936y

maximum likelihood functional     y = 0.106 + 0.973x         y = -0.167 + 1.076x        y = -0.161 + 1.068x
relationship (MLFR)

Deming method                     y = 0.412 + 0.881x         y = -0.194 + 1.087x        y = -0.202 + 1.085x        y = 1.402 + 0.698x

implicit linear function (ILF)    2.040x - 2.329y + 1 = 0    -5.545x + 5.090y + 1 = 0   -5.365x + 4.943y + 1 = 0   3.856x - 5.77y + 1 = 0
                                  y = 0.429 + 0.876x         y = -0.196 + 1.089x        y = -0.202 + 1.085x        y = 1.733 + 0.668x
                                  x = -0.490 + 1.141y        x = 0.180 + 0.918y         x = 0.186 + 0.921y         x = -2.593 + 1.496y

Passing and Bablok                y = 0.859 + 0.588x         y = -0.204 + 1.007x        y = 1.087 + 1.024x         y = -2.442 + 1.108x

INFORMATIONAL ANALYSIS OF VARIANCE
The informational analysis of variance (IANOVA) method,13,14
based on informational energy ($E = \sum_{i=1}^{n} p_i^2$),15,16 is a distribution-free procedure that is valid under minimal assumptions. It is not
influenced by the range of the data and has very satisfactory
robustness properties. The null hypothesis in this case is
equivalent to the hypothesis H0: pA = pB. The probabilities p are
calculated using the following relations, with A and B taken in absolute value:

$$p_A = \frac{A}{A + B} \quad \text{and} \quad p_B = \frac{B}{A + B} \tag{10}$$

The H0 hypothesis is true when E = 1/2, i.e., when the informational energy is minimal and the two coefficients A and B are equal
in their absolute values. As was shown above, the null hypothesis
is accepted if E = ε, where E = 1/2 represents the theoretical
informational energy and ε = (A^2 + B^2)/(A + B)^2 represents the
empirical informational energy. On the other hand, if E ≠ ε, the
null hypothesis is rejected, and the difference between the
two compared methods is taken as significant.
RESULTS AND DISCUSSION
Some relevant method comparison studies from
well-cited papers are considered to illustrate the advantages of
the new regression method described above.
Illustrative Example 1. To illustrate the performance
characteristics of the algorithm, which takes into account the errors in
both methods, in the case of small deviations from homoscedasticity or in the presence of outliers, we refer to data discussed by
Ripley and Thomson12 concerning a set of 30 pairs of determinations of arsenate(V) ion in natural river water (Table 1). The x
values were determined by selective reduction and atomic absorption spectrometry, whereas the y values came from cold trapping
and atomic emission spectrometry. The quoted values are
concentrations varying from 0 to 20 µg L-1. Considering the
results obtained (Table 2) using four different calibration methods,
namely ordinary least-squares (OLS), weighted regression (WLS),
iterated weighted least-squares (IWLS), and a maximum likelihood
functional relationship (MLFR) algorithm, Ripley and Thomson
concluded that the best results were produced by MLFR.
Comparing the results obtained by computation of the implicit
linear function method (ILF), it is easy to notice that ILF is closer
to the results obtained by the Deming method, also included in
Table 2. Passing and Bablok's linear regression algorithm gave
very biased parameters; in fact, it is the least sensitive to outliers.
The empirical informational energy associated with the probabilities pA and pB is given by

$$\varepsilon_1 = \sum_{i=1}^{n} p_i^2 = \frac{A^2 + B^2}{(A + B)^2} = \frac{9.584}{19.08} = 0.502 \tag{11}$$

As E = ε1, it is concluded that, according to this test, there is no
statistically significant difference between the results of the two
methods.
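The empirical informational energy of eq 11 is simple to evaluate. The sketch below is an illustration only: the function name `informational_energy` is hypothetical, the coefficients are the ILF values quoted in Table 2, and A and B are taken in absolute value, which is what the numbers in eq 11 imply. The text does not state how close the empirical value must be to 1/2 for the hypothesis to be accepted, so no decision threshold is coded here.

```python
def informational_energy(A, B):
    """Empirical informational energy p_A^2 + p_B^2 (eqs 10 and 11),
    with A and B taken in absolute value."""
    a, b = abs(A), abs(B)
    p_a = a / (a + b)
    p_b = b / (a + b)
    return p_a ** 2 + p_b ** 2

# ILF coefficients as quoted in Table 2
eps1 = informational_energy(2.040, -2.329)   # example 1
eps4 = informational_energy(3.856, -5.77)    # example 4

print(f"example 1: {eps1:.3f}")   # prints example 1: 0.502
print(f"example 4: {eps4:.3f}")   # prints example 4: 0.520
```

The value depends only on the ratio of A to B, so rescaling both coefficients by a common factor leaves it unchanged, and it reaches its minimum of 1/2 when the two coefficients are equal in absolute value.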
Illustrative Examples 2 and 3. The second example in the
paper of Ripley and Thomson is the determination of beryllium in rock
and soil reference samples. The x values were obtained by an
inductively coupled plasma atomic emission spectrometric (ICP-AES) method after fusion with lithium metaborate and dissolution
in dilute nitric acid; the y values were determined by an AAS
method after acid decomposition and solvent extraction (Table
2).
The first batch of specimens, with concentrations in the range
0.1-2.5 µg L-1, gave the results presented in Table 2. The authors
concluded from this sample that there is no difference between
the methods at these concentrations, and here the IWLS results
are a good approximation to MLFR. In this case, we have to
emphasize again a very good agreement between the Deming
method and ILF on one hand and between these methods and
MLFR on the other hand.
The techniques were then applied to the 37 specimens in Table
1, with concentrations in the range 0.0-30 µg L-1. Here, all the
results are in very good agreement. Weighted regression of y
on x is worse, but the remaining lines are practically the same.
Applying the informational analysis of variance, it can be
concluded that, also in these cases, there is no significant
difference between the two analytical methods because the
equality E = ε is obtained in each case (ε2 = 0.500 and ε3 = 0.500).
Illustrative Example 4. The last example reported in this
paper refers to the data discussed in ref 18 concerning the
effectiveness of traditional water bath digestion used in U.S. EPA
method 7471 and microwave digestion method 3051. The results
obtained in the determination of mercury (in the ppm range) in
solid wastes by AAS using the two sample preparation methods
(see Table 1) are compared through OLS, the Deming method,
the Passing and Bablok method, and ILF. By examining the
results obtained (see Table 2), we can conclude that ILF and the
Deming method are in very close agreement; again the Passing
and Bablok procedure provides very different results. As in this
case E ≠ ε4 (E = 0.500 and ε4 = 0.520), the difference between
the two methods of sample preparation is significant, and it is
concluded that there is an important method effect, which means
that one method shows a bias. This conclusion, i.e., the presence
of proportional errors introduced by the microwave digestion method
in the analysis of mercury in waste samples, was also
recently demonstrated.14
CONCLUSIONS
A new approach for the analysis of method comparison studies
over a wide range of concentrations was discussed and compared
with more or less common regression procedures. All the results
obtained show that the implicit linear function model (ILF) is an
effective method for parameter estimation and method comparison and can replace the least-squares method and other
proposed approaches. For any given set of data, an evaluation
based on an informational analysis of variance test allows a more
reliable bias detection.
Received for review August 25, 1997. Accepted January
7, 1998.
AC970926Y
