There are lies, damn lies, and statistics... (Anon.)

6.1 Introduction
6.2 Definitions
6.3 Basic Statistics
6.4 Statistical tests
6.1 Introduction
In the preceding chapters, basic elements for the proper execution of analytical work such as personnel, laboratory facilities, equipment, and reagents were discussed. Before embarking upon the actual analytical work, however, one more tool for the quality assurance of the work must be dealt with: the statistical operations necessary to control and verify the analytical procedures (Chapter 7) as well as the resulting data (Chapter 8).

It was stated before that making mistakes in analytical work is unavoidable. This is the reason why a complex system of precautions to prevent errors and traps to detect them has to be set up. An important aspect of quality control is the detection of both random and systematic errors. This can be done by critically looking at the performance of the analysis as a whole, and also of the instruments and operators involved in the job. For the detection itself, as well as for the quantification of the errors, statistical treatment of data is indispensable.

A multitude of different statistical tools is available, some of them simple, some complicated, and often very specific for certain purposes. In analytical work, the most important common operation is the comparison of data, or sets of data, to quantify accuracy (bias) and precision. Fortunately, with a few simple and convenient statistical tools, most of the information needed in regular laboratory work can be obtained: the "t-test", the "F-test", and regression analysis. Therefore, examples of these will be given in the ensuing pages.

Clearly, statistics are a tool, not an aim. Simple inspection of data, without statistical treatment, by an experienced and dedicated analyst may be just as useful as statistical figures on the desk of the disinterested. The value of statistics lies in organizing and simplifying data, to permit some objective estimate showing that an analysis is under control or that a change has occurred.
Equally important is that the results of these statistical procedures are recorded and can be retrieved.
6.2 Definitions
6.2.1 Error
6.2.2 Accuracy
6.2.3 Precision
6.2.4 Bias
Discussing Quality Control implies the use of several terms and concepts with a specific (and sometimes confusing) meaning. Therefore, some of the most important concepts will be defined first.
6.2.1 Error
Error is the collective noun for any departure of the result from the "true" value*. Analytical errors can be:

1. Random or unpredictable deviations between replicates, quantified with the "standard deviation".
2. Systematic or predictable regular deviation from the "true" value, quantified as "mean difference" (i.e. the difference between the true value and the mean of replicate determinations).
3. Constant, unrelated to the concentration of the substance analyzed (the analyte).
4. Proportional, i.e. related to the concentration of the analyte.

* The "true" value of an attribute is by nature indeterminate and often has only a very relative meaning. Particularly in soil science, for several attributes there is no such thing as the true value, as any value obtained is method-dependent (e.g. cation exchange capacity). Obviously, this does not mean that no adequate analysis serving a purpose is possible. It does, however, emphasize the need for the establishment of standard reference methods and the importance of external QC (see Chapter 9).
6.2.2 Accuracy
The "trueness" or the closeness of the analytical result to the "true" value. It is constituted by a combination of random and systematic errors (precision and bias) and cannot be quantified directly. The test result may be a mean of several values. An accurate determination produces a "true" quantitative value, i.e. it is precise and free of bias.
6.2.3 Precision
The closeness with which results of replicate analyses of a sample agree. It is a measure of dispersion or scattering around the mean value and usually expressed in terms of standard deviation, standard error or a range (difference between the highest and the lowest result).
6.2.4 Bias
The consistent deviation of analytical results from the "true" value caused by systematic errors in a procedure. Bias is the opposite of, but the most used measure for, "trueness", which is the agreement of the mean of analytical results with the true value, i.e. excluding the contribution of randomness represented in precision. There are several components contributing to bias:

1. Laboratory bias
The difference between the (mean) test result from a particular laboratory and the accepted reference value.

2. Method bias
The difference between the (mean) test result obtained from a number of laboratories using the same method and an accepted reference value. The method bias may depend on the analyte level.

3. Sample bias
The difference between the mean of replicate test results of a sample and the ("true") value of the target population from which the sample was taken. In practice, for a laboratory this refers mainly to sample preparation, subsampling and weighing techniques. Whether a sample is representative of the population in the field is an extremely important aspect but usually falls outside the responsibility of the laboratory (in some cases laboratories have their own field sampling personnel).

The relationship between these concepts can be expressed in the following equation:

total bias = laboratory bias + method bias + sample bias
The types of errors are illustrated in Fig. 6-1.

Fig. 6-1. Accuracy and precision in laboratory measurements. (Note that the qualifications apply to the mean of results: in c the mean is accurate but some individual results are inaccurate.)
6.3.1 Mean

The average of a set of n data xᵢ:

x̄ = Σxᵢ / n   (6.1)

6.3.2 Standard deviation

The standard deviation of the set is:

s = √[ Σ(xᵢ − x̄)² / (n − 1) ]   (6.2)

or, rewritten in a form more convenient for calculation:

s = √[ (Σxᵢ² − (Σxᵢ)²/n) / (n − 1) ]   (6.3)

or

s = √[ (nΣxᵢ² − (Σxᵢ)²) / (n(n − 1)) ]   (6.4)
The calculation of the mean and the standard deviation can easily be done on a calculator, but most conveniently on a PC with computer programs such as dBASE, Lotus 123, Quattro-Pro, Excel, and others, which have simple ready-to-use functions. (Warning: some programs use n rather than n − 1!)
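The n versus n − 1 distinction can be made explicit in a few lines; this is a minimal sketch (the helper functions are illustrative, not from the text), with Python's standard library shown for comparison:

```python
import statistics

def mean(data):
    """Arithmetic mean, Eq. (6.1)."""
    return sum(data) / len(data)

def stdev(data):
    """Sample standard deviation with n - 1 in the denominator, Eq. (6.2)."""
    m = mean(data)
    return (sum((x - m) ** 2 for x in data) / (len(data) - 1)) ** 0.5

data = [10.2, 10.7, 10.5, 9.9, 9.0, 11.2, 11.5, 10.9, 8.9, 10.6]
# statistics.stdev uses n - 1 (correct for samples); statistics.pstdev uses n.
print(mean(data), stdev(data), statistics.stdev(data), statistics.pstdev(data))
```

Note that the population formula (divisor n) always gives a smaller value, which is exactly the trap the warning above refers to.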
Note. When needed (e.g. for the F-test, see Eq. 6.11) the variance can, of course, be calculated by squaring the standard deviation:
V = s²   (6.7)
µ = x̄ ± t·s/√n   (6.8)

where
µ = "true" value (mean of large set of replicates)
x̄ = mean of subsamples
t = a statistical value which depends on the number of data and the required confidence (usually 95%)
s = standard deviation of the set of subsamples
n = number of subsamples

(The term s/√n is also known as the standard error of the mean.)

The critical values for t are tabulated in Appendix 1 (they are, therefore, here referred to as ttab). To find the applicable value, the number of degrees of freedom has to be established by: df = n − 1 (see also Section 6.4.2).

Example

For the determination of the clay content in the particle-size analysis, a semi-automatic pipette installation is used with a 20 mL pipette. This volume is approximate and the operation involves the opening and closing of taps. Therefore, the pipette has to be calibrated, i.e. both the accuracy (trueness) and precision have to be established. A tenfold measurement of the volume yielded the following set of data (in mL):
19.941 19.797
19.812 19.937
19.829 19.847
19.828 19.885
19.742 19.804
The mean is 19.842 mL and the standard deviation 0.0627 mL. According to Appendix 1, for n = 10 (df = 9) ttab = 2.26, and using Eq. (6.8) this calibration yields:

pipette volume = 19.842 ± 2.26 × (0.0627/√10) = 19.84 ± 0.04 mL

(Note that the pipette has a systematic deviation from 20 mL, as 20 mL lies outside the found confidence interval. See also bias.)

In routine analytical work, results are usually single values obtained in batches of several test samples. No laboratory will analyze a test sample 50 times to be confident that the result is reliable. Therefore, the statistical parameters have to be obtained in another way. Most usually this is done by method validation (see Chapter 7) and/or by keeping control charts, which is basically the collection of analytical results from one or more control samples in each batch (see Chapter 8). Equation (6.8) is then reduced to
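The pipette calibration above can be verified numerically; this sketch uses only the standard library and takes ttab = 2.26 from Appendix 1 as a given:

```python
import math
import statistics

volumes = [19.941, 19.797, 19.812, 19.937, 19.829, 19.847,
           19.828, 19.885, 19.742, 19.804]  # mL, tenfold calibration

mean = statistics.mean(volumes)
s = statistics.stdev(volumes)          # n - 1 in the denominator
t_tab = 2.26                           # Appendix 1, df = 9, 95%, two-sided
half_width = t_tab * s / math.sqrt(len(volumes))  # t*s/sqrt(n), Eq. (6.8)

print(f"{mean:.3f} +/- {half_width:.2f} mL")  # 19.842 +/- 0.04 mL
```

Since 20 mL lies above the upper confidence limit (19.842 + 0.045 ≈ 19.89), the systematic deviation of the pipette follows directly from the numbers.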
µ = x ± t·s   (6.9)

where
µ = "true" value
x = single measurement
t = applicable ttab (Appendix 1)
s = standard deviation of set of previous measurements.

In Appendix 1 it can be seen that if the set of replicated measurements is large (say > 30), t is close to 2. Therefore, the (95%) confidence of the result x of a single test sample (n = 1 in Eq. 6.8) is approximated by the commonly used and well-known expression
µ = x ± 2s   (6.10)

where s is the previously determined standard deviation of the large set of replicates (see also Fig. 6-2).

Note: This "method-s" or s of a control sample is not a constant and may vary for different test materials, analyte levels, and with analytical conditions.

Running duplicates will, according to Equation (6.8), increase the confidence of the (mean) result by a factor √2:

µ = x̄ ± 2s/√2

where
x̄ = mean of duplicates
s = known standard deviation of large set

Similarly, triplicate analysis will increase the confidence by a factor √3, etc. Duplicates are further discussed in Section 8.3.3.

Thus, in summary, Equation (6.8) can be applied in various ways to determine the size of errors (confidence) in analytical work or measurements: single determinations in routine work, determinations for which no previous data exist, certain calibrations, etc.
can be expected from reducing the largest individual contribution, in this case the exchangeable acidity.

2. Multiplication calculations

If the final result x is obtained from multiplication (or division) of (sub)measurements, e.g. according to

x = (a × b)/c
then the total error is expressed by the relative standard deviation obtained by taking the square root of the sum of the squares of the individual relative standard deviations (RSD or CV, as a fraction or as a percentage, see Eqs. 6.5 and 6.6):

RSDx = √(RSDa² + RSDb² + RSDc²)

If a (sub)measurement has a constant multiplication factor or coefficient, then this is included when calculating the effect of the RSD concerned, e.g. (2RSDb)².

Example

The calculation of Kjeldahl-nitrogen may be as follows:

%N = (a − b) × M × 1.4 × mcf / s
where
a = mL HCl required for titration of the sample
b = mL HCl required for titration of the blank
s = air-dry sample weight in gram
M = molarity of HCl
1.4 = 14 × 10⁻³ × 100% (14 = atomic weight of N)
mcf = moisture correction factor

Note that in addition to multiplications, this calculation also contains a subtraction (often, calculations contain both summations and multiplications).

Firstly, the standard deviation of the titration (a − b) is determined as indicated for summation calculations above. This is then transformed to RSD using Equation (6.5) or (6.6). Then the RSDs of the other individual parameters have to be determined experimentally. The found RSDs are, for instance: distillation: 0.8%, titration: 0.5%, molarity: 0.2%, sample weight: 0.2%, mcf: 0.2%.

The total calculated precision is:

RSDx = √(0.8² + 0.5² + 0.2² + 0.2² + 0.2²) = 1.0%

Here again, the highest RSD (that of the distillation) dominates the total precision. In practice, the precision of the Kjeldahl method is usually considerably worse (about 2.5%), probably mainly as a result of the heterogeneity of the sample. The present example does not take that into account. It would imply that 2.5% − 1.0% = 1.5%, or 3/5 of the total random error, is due to sample heterogeneity (or another overlooked cause). This implies that painstaking efforts to improve subprocedures such as the titration or the preparation of standard solutions may not be very rewarding. It would, however, pay to improve the homogeneity of the sample, e.g. by careful grinding and mixing in the preparatory stage.
Note. Sample heterogeneity is also represented in the moisture correction factor. However, the influence of this factor on the final result is usually very small.
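The Kjeldahl precision estimate above can be reproduced with a short error-propagation sketch; the RSD values are the example figures from the text:

```python
import math

def combined_rsd(rsds):
    """Total RSD of a multiplicative calculation:
    square root of the sum of the squared individual RSDs."""
    return math.sqrt(sum(r ** 2 for r in rsds))

# Example RSDs (in %): distillation, titration, molarity, sample weight, mcf
rsds = [0.8, 0.5, 0.2, 0.2, 0.2]
total = combined_rsd(rsds)
print(f"total RSD = {total:.1f}%")  # total RSD = 1.0%
```

Dropping the largest term (distillation, 0.8%) from the list shows immediately how dominant it is, while halving the smallest terms barely changes the total.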
is the routine comparison of a control chart with the previous one (see 8.3). However, when it is expected or suspected that the mean and/or the standard deviation will go only one way, e.g. after a change in an analytical procedure, the one-sided (or one-tailed) test is appropriate. In this case the probability that it goes the other way than expected is assumed to be zero and, therefore, the probability that it goes the expected way is doubled. Or, more correctly, the uncertainty in the two-sided test of 5% (or the probability of 5% that the critical value is exceeded) is divided over the two tails of the Gaussian curve (see Fig. 6-2), i.e. 2.5% at the end of each tail beyond 2s. If we perform the one-sided test with 5% uncertainty, we actually increase this 2.5% to 5% at the end of one tail. (Note that for the whole Gaussian curve, which is symmetrical, this is then equivalent to an uncertainty of 10% in two ways!)

This difference in probability between the tests is expressed in the use of two tables of critical values for both F and t. In fact, the one-sided table at the 95% confidence level is equivalent to the two-sided table at the 90% confidence level.

It is emphasized that the one-sided test is only appropriate when a difference in one direction is expected or aimed at. Of course it is tempting to perform this test after the results show a clear (unexpected) effect. In fact, however, a two times higher probability level is then used in retrospect. This is underscored by the observation that in this way even contradictory conclusions may arise: if in an experiment calculated values of F and t are found within the range between the two-sided and one-sided values of Ftab and ttab, the two-sided test indicates no significant difference, whereas the one-sided test says that the result of A is significantly higher (or lower) than that of B. What actually happens is that in the first case the 2.5% boundary in the tail was just not exceeded, and then, subsequently, this 2.5% boundary is relaxed to 5%, which is then obviously more easily exceeded. This illustrates that statistical tests differ in strictness and that for proper interpretation of results in reports, the statistical techniques used, including the confidence limits or probability, should always be specified.
The precision of two sets of data can be compared by the ratio of their variances:

F = s₁²/s₂²   (6.11)

where the larger s² must be in the numerator by convention. If the performances are not very different, then the estimates s₁ and s₂ do not differ much and their ratio (and that of their squares) should not deviate much from unity. In practice, the calculated F is compared with the applicable F value in the F-table (also called the critical value, see Appendix 2). To read the table it is necessary to know the applicable number of degrees of freedom for s₁ and s₂. These are calculated by:

df₁ = n₁ − 1
df₂ = n₂ − 1
If Fcal ≤ Ftab, one can conclude with 95% confidence that there is no significant difference in precision (the "null hypothesis" that s₁ = s₂ is accepted). Thus, there is still a 5% chance that we draw the wrong conclusion. In certain cases more confidence may be needed; then a 99% confidence table can be used, which can be found in statistical textbooks.

Example 1 (two-sided test)

Table 6-1 gives the data sets obtained by two analysts for the cation exchange capacity (CEC) of a control sample. Using Equation (6.11) the calculated F value is 1.62. As we had no particular reason to expect that the analysts would perform differently, we use the F-table for the two-sided test and find Ftab = 4.03 (Appendix 2, df₁ = df₂ = 9). This exceeds the calculated value and the null hypothesis (no difference) is accepted. It can be concluded with 95% confidence that there is no significant difference in precision between the work of Analyst 1 and Analyst 2.

Table 6-1. CEC values (in cmolc/kg) of a control sample determined by two analysts.
      Analyst 1   Analyst 2
      10.2        9.7
      10.7        9.0
      10.5        10.2
      9.9         10.3
      9.0         10.8
      11.2        11.1
      11.5        9.4
      10.9        9.2
      8.9         9.8
      10.6        10.2
x:    10.34       9.97
s:    0.819       0.644
n:    10          10

Fcal = 1.62   Ftab = 4.03
tcal = 1.12   ttab = 2.10
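The F value of Table 6-1 follows directly from the two standard deviations; in this sketch the critical value 4.03 is simply taken from Appendix 2 rather than computed:

```python
def f_ratio(s1, s2):
    """F-test statistic, Eq. (6.11): ratio of the variances,
    with the larger variance in the numerator by convention."""
    v1, v2 = s1 ** 2, s2 ** 2
    return max(v1, v2) / min(v1, v2)

# Standard deviations of the two analysts (Table 6-1)
F = f_ratio(0.819, 0.644)
F_tab = 4.03  # Appendix 2, two-sided, df1 = df2 = 9
print(f"Fcal = {F:.2f}, significant: {F > F_tab}")  # Fcal = 1.62, significant: False
```

Because `f_ratio` always puts the larger variance on top, the order of the arguments does not matter, mirroring the convention stated in the text.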
Example 2 (one-sided test)

The determination of the calcium carbonate content with the Scheibler standard method is compared with the simple and more rapid "acid-neutralization" method using one and the same sample. The results are given in Table 6-2. Because of the nature of the rapid method we suspect it to produce a lower precision than obtained with the Scheibler method and we can, therefore, perform the one-sided F-test. The applicable Ftab = 3.07 (App. 2, df₁ = 12, df₂ = 9), which is lower than Fcal (= 18.3), and the null hypothesis (no difference) is rejected. It can be concluded (with 95% confidence) that for this one sample the precision of the rapid titration method is significantly worse than that of the Scheibler method.

Table 6-2. Contents of CaCO₃ (in mass/mass %) in a soil sample determined with the Scheibler method (A) and the rapid titration method (B).
      A       B
      2.5     1.7
      2.4     1.9
      2.5     2.3
      2.6     2.3
      2.5     2.8
      2.5     2.5
      2.4     1.6
      2.6     1.9
      2.7     2.6
      2.4     1.7
              2.4
              2.2
              2.6
x:    2.51    2.13
s:    0.099   0.424
n:    10      13

Fcal = 18.3   Ftab = 3.07
tcal = 3.12   ttab* = 2.18
6.4.3.3 t-Test for large data sets (n ≥ 30)
6.4.3.4 Paired t-test

Depending on the nature of two sets of data (n, s, sampling nature), the means of the sets can be compared for bias by several variants of the t-test. The following most common types will be discussed:

1. Student's t-test for comparison of two independent sets of data with very similar standard deviations;
2. the Cochran variant of the t-test when the standard deviations of the independent sets differ significantly;
3. the paired t-test for comparison of strongly dependent sets of data.

Basically, for the t-tests Equation (6.8) is used but written in a different way:
tcal = |x̄ − µ|·√n / s   (6.12)
where
x̄ = mean of test results of a sample
µ = "true" or reference value
s = standard deviation of test results
n = number of test results of the sample.

To compare the mean of a data set with a reference value, normally the "two-sided t-table of critical values" is used (Appendix 1). The applicable number of degrees of freedom here is: df = n − 1.

If a value for t calculated with Equation (6.12) does not exceed the critical value in the table, the data are taken to belong to the same population: there is no difference and the "null hypothesis" is accepted (with the applicable probability, usually 95%). As with the F-test, when it is expected or suspected that the obtained results are higher or lower than the reference value, the one-sided t-test can be performed: if tcal > ttab, then the results are significantly higher (or lower) than the reference value.

More commonly, however, the "true" value of proper reference samples is accompanied by the associated standard deviation and the number of replicates used to determine these parameters. We can then apply the more general case of comparing the means of two data sets: the "true" value in Equation (6.12) is then replaced by the mean of a second data set. As is shown in Fig. 6-3, to test if two data sets belong to the same population it is tested whether the two Gauss curves sufficiently overlap; in other words, whether the difference between the means x̄₁ − x̄₂ is small. This is discussed next.

Similarity or non-similarity of standard deviations

When using the t-test for two small sets of data (n₁ and/or n₂ < 30), a choice of the type of test must be made depending on the similarity (or non-similarity) of the standard deviations of the two sets. If the standard deviations are sufficiently similar they can be "pooled" and the Student t-test can be used. When the standard deviations are not sufficiently similar, an alternative procedure for the t-test must be followed in which the standard deviations are not pooled. A convenient alternative is the Cochran variant of the t-test. The criterion for the choice is the passing or non-passing of the F-test (see 6.4.2), that is, whether or not the variances significantly differ. Therefore, for small data sets, the F-test should precede the t-test. For dealing with large data sets (n₁, n₂ ≥ 30) the "normal" t-test is used (see Section 6.4.3.3 and App. 3).
tcal = (x̄₁ − x̄₂) / (sp·√(1/n₁ + 1/n₂))   (6.13)

where
x̄₁ = mean of data set 1
x̄₂ = mean of data set 2
sp = "pooled" standard deviation of the sets
n₁ = number of data in set 1
n₂ = number of data in set 2.

The pooled standard deviation sp is calculated by:

sp = √[ ((n₁ − 1)s₁² + (n₂ − 1)s₂²) / (n₁ + n₂ − 2) ]   (6.14)

where
s₁ = standard deviation of data set 1
s₂ = standard deviation of data set 2
n₁ = number of data in set 1
n₂ = number of data in set 2.

To perform the t-test, the critical ttab has to be found in the table (Appendix 1); the applicable number of degrees of freedom df is here calculated by: df = n₁ + n₂ − 2.

Example

The two data sets of Table 6-1 can be used. With Equations (6.13) and (6.14), tcal is calculated as 1.12, which is lower than the critical value ttab of 2.10 (App. 1, df = 18, two-sided); hence the null hypothesis (no difference) is accepted and the two data sets are assumed to belong to the same population: there is no significant difference between the mean results of the two analysts (with 95% confidence).

Note. Another illustrative way to perform this test for bias is to calculate whether the difference between the means falls within or outside the range where this difference is still not significantly large; in other words, whether this difference is less than the least significant difference (lsd). This can be derived from Equation (6.13):
lsd = ttab·sp·√(1/n₁ + 1/n₂)   (6.15)

In the present example of Table 6-1 the calculation yields lsd = 0.69. The measured difference between the means is 10.34 − 9.97 = 0.37, which is smaller than the lsd, indicating that there is no significant difference between the performance of the analysts.

In addition, in this approach the 95% confidence limits of the difference between the means can be calculated (cf. Equation 6.8):

confidence limits = 0.37 ± 0.69 = −0.32 and 1.06

Note that the value 0 for the difference is situated within this confidence interval, which agrees with the null hypothesis of x̄₁ = x̄₂ (no difference) having been accepted.
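Using only the summary statistics of Table 6-1 (means, standard deviations, n), the pooled t-test and the lsd can be reproduced in a few lines; ttab = 2.10 is taken from Appendix 1 as given:

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation, Eq. (6.14)."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def t_pooled(m1, s1, n1, m2, s2, n2):
    """Student t statistic for two independent sets, Eq. (6.13)."""
    sp = pooled_sd(s1, n1, s2, n2)
    return abs(m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))

# Table 6-1: Analyst 1 vs Analyst 2
tcal = t_pooled(10.34, 0.819, 10, 9.97, 0.644, 10)
t_tab = 2.10  # Appendix 1, df = 18, two-sided, 95%
lsd = t_tab * pooled_sd(0.819, 10, 0.644, 10) * math.sqrt(1 / 10 + 1 / 10)
print(f"tcal = {tcal:.2f}, lsd = {lsd:.2f}")  # tcal = 1.12, lsd = 0.69
```

The two routes agree, as they must: tcal < ttab is exactly the same statement as |x̄₁ − x̄₂| < lsd.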
tcal = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂)   (6.16)

and the applicable critical value ttab* is calculated as a weighted mean of the two tabulated t values:

ttab* = (t₁·s₁²/n₁ + t₂·s₂²/n₂) / (s₁²/n₁ + s₂²/n₂)

where
t₁ = ttab at n₁ − 1 degrees of freedom
t₂ = ttab at n₂ − 1 degrees of freedom

Now the t-test can be performed as usual: if tcal < ttab*, then the null hypothesis that the means do not significantly differ is accepted.

Example

The two data sets of Table 6-2 can be used. According to the F-test, the standard deviations differ significantly, so that the Cochran variant must be used. Furthermore, in contrast to our expectation that the precision of the rapid test would be inferior, we have no idea about the bias, and therefore the two-sided test is appropriate. The calculations yield tcal = 3.12 and ttab* = 2.18; since tcal exceeds ttab*, the null hypothesis (no difference) is rejected and the mean of the rapid analysis deviates significantly from that of the standard analysis (with 95% confidence, and for this sample only). Further investigation of the rapid method would have to include the use of more different samples, and then comparison with the one-sided t-test would be justified (see 6.4.3.4, Example 1).
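The Cochran calculation for Table 6-2 can be sketched as follows; the tabulated values t₁ = 2.26 (df = 9) and t₂ = 2.18 (df = 12) are taken from Appendix 1:

```python
import math

def cochran_t(m1, s1, n1, m2, s2, n2, t1, t2):
    """Cochran variant of the t-test for sets with unequal variances:
    returns (tcal, weighted critical value ttab*)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    tcal = abs(m1 - m2) / math.sqrt(v1 + v2)       # Eq. (6.16)
    t_star = (t1 * v1 + t2 * v2) / (v1 + v2)       # weighted critical value
    return tcal, t_star

# Table 6-2: Scheibler (A) vs rapid titration (B)
tcal, t_star = cochran_t(2.51, 0.099, 10, 2.13, 0.424, 13, 2.26, 2.18)
print(f"tcal = {tcal:.2f}, ttab* = {t_star:.2f}, significant: {tcal > t_star}")
```

Because the rapid method's much larger variance dominates the weighting, ttab* lies close to t₂, the critical value belonging to that set.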
In the example above (6.4.3.2) the conclusion would have been the same if the Student's t-test with pooled standard deviations had been used. This is because the difference in result between the Student and Cochran variants of the t-test is largest when small sets of data are compared, and decreases with an increasing number of data: with more data a better estimate of the real distribution of the population is obtained (the flatter t-distribution then converges to the standardized normal distribution). When n ≥ 30 for both sets, e.g. when comparing Control Charts (see 8.3), for all practical purposes the difference between the Student and Cochran variants is negligible. The procedure is then reduced to the "normal" t-test by simply calculating tcal with Eq. (6.16) and comparing this with ttab at df = n₁ + n₂ − 2. (Note in App. 1 that the two-sided ttab is now close to 2.)

The proper choice of the t-test as discussed above is summarized in a flow diagram in Appendix 3.
[Table 6-3. CEC values of ten highly weathered soil samples determined with the ammonium acetate and the silver thiourea methods, with the difference d for each pair; the individual values are not recoverable here. Summary: d̄ = +2.19, sd = 2.395, tcal = 2.89.]
Using Equation (6.12) and noting that the hypothesis value of the mean difference is zero (i.e. no difference), the t-value can be calculated as:

tcal = |d̄|·√n / sd = 2.19 × √10 / 2.395 = 2.89

where
d̄ = mean of the differences within each pair of data
sd = standard deviation of the differences
n = number of pairs of data

The calculated t-value (= 2.89) exceeds the critical value of 1.83 (App. 1, df = n − 1 = 9, one-sided); hence the null hypothesis that the methods do not differ is rejected, and it is concluded that the silver thiourea method gives significantly higher results than the ammonium acetate method when applied to such highly weathered soils.

Note. Since such data sets do not have a normal distribution, the "normal" t-test, which compares means of sets, cannot be used here (the means do not constitute a fair representation of the sets). For the same reason no information about the precision of the two methods can be obtained, nor can the F-test be applied. For information about precision, replicate determinations are needed.

Example 2

Table 6-4 shows the data of total-P in four plant tissue samples obtained by a laboratory L and the median values obtained by 123 laboratories in a proficiency (round-robin) test.

Table 6-4. Total-P contents (in mmol/kg) of plant tissue as determined by 123 laboratories (Median) and Laboratory L.
Sample   d
1        −7.8
2        23
3        5.6
4        10

d̄ = 7.70   sd = 12.702

(The individual Median and Lab L values are not recoverable here; only the differences d are shown.)
To verify the performance of the laboratory a paired t-test can be performed. Using Eq. (6.12) and noting that the hypothesis value of the mean difference is zero (i.e. no difference), the t-value can be calculated as:

tcal = |d̄|·√n / sd = 7.70 × √4 / 12.702 = 1.21

The calculated t-value is below the critical value of 3.18 (Appendix 1, df = n − 1 = 3, two-sided); hence the null hypothesis that the laboratory does not significantly differ from the group of laboratories is accepted, and the results of Laboratory L seem to agree with those of "the rest of the world" (this is a so-called third-line control).
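The paired test of Table 6-4 can be reproduced from the differences alone; the critical value 3.18 is taken from Appendix 1 as given:

```python
import math
import statistics

def paired_t(differences):
    """Paired t statistic: t = |mean(d)| * sqrt(n) / sd(d), cf. Eq. (6.12),
    with the hypothesis value of the mean difference set to zero."""
    d_mean = statistics.mean(differences)
    d_sd = statistics.stdev(differences)   # n - 1 in the denominator
    return abs(d_mean) * math.sqrt(len(differences)) / d_sd

d = [-7.8, 23, 5.6, 10]  # Table 6-4: differences per sample
tcal = paired_t(d)
t_tab = 3.18  # Appendix 1, df = 3, two-sided, 95%
print(f"tcal = {tcal:.2f}, significant: {tcal > t_tab}")  # tcal = 1.21, significant: False
```

Note how only the column of differences is needed: the individual results of Laboratory L and the medians never enter the calculation.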
As was discussed in Section 6.4.3, such comparisons can often be done with the Student/Cochran or paired t-tests. However, correlation analysis is indicated:

1. When the concentration range is so wide that the errors, both random and systematic, are not independent (which is the assumption for the t-tests). This is often the case where concentration ranges of several orders of magnitude are involved.
2. When pairing is inappropriate for other reasons, notably a long time span between the two analyses (sample aging, change in laboratory conditions, etc.).

The principle is to establish a statistical linear relationship between two sets of corresponding data by fitting the data to a straight line by means of the "least squares" technique. Such data are, for example, analytical results of two methods applied to the same samples (correlation), or the response of an instrument to a series of standard solutions (regression).

Note: Naturally, non-linear higher-order relationships are also possible, but since these are less common in analytical work and more complex to handle mathematically, they will not be discussed here. Nevertheless, to avoid misinterpretation, always inspect the kind of relationship by plotting the data, either on paper or on the computer monitor.

The resulting line takes the general form:
y = bx + a (6.18)
where
a = intercept of the line with the y-axis
b = slope (tangent)

In laboratory work, ideally, when there is perfect positive correlation without bias, the intercept a = 0 and the slope b = 1. This is the so-called "1:1 line" passing through the origin (dashed line in Fig. 6-5). If the intercept a ≠ 0, then there is a systematic discrepancy (bias, error) between X and Y; when b ≠ 1, then there is a proportional response or difference between X and Y.

The correlation between X and Y is expressed by the correlation coefficient r, which can be calculated with the following equation:
r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[ Σ(xᵢ − x̄)²·Σ(yᵢ − ȳ)² ]   (6.19)

where
xᵢ = data X
x̄ = mean of data X
yᵢ = data Y
ȳ = mean of data Y

It can be shown that r can vary from 1 to −1:
r = 1: perfect positive linear correlation
r = 0: no linear correlation (maybe another type of correlation)
r = −1: perfect negative linear correlation
Often, the correlation coefficient r is expressed as r²: the coefficient of determination. The advantage of r² is that, when multiplied by 100, it indicates the percentage of variation in Y associated with variation in X. Thus, for example, when r = 0.71, about 50% (r² = 0.504) of the variation in Y is due to the variation in X. The line parameters b and a are calculated with the following equations:
b = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²   (6.20)

and

a = ȳ − b·x̄   (6.21)
It is worth noting that r is independent of the choice of which factor is the independent variable X and which is the dependent Y. However, the regression parameters a and b do depend on this choice, as the regression lines will be different (except when there is ideal 1:1 correlation).
and a = 0.350 − 0.313 = 0.037. Thus, the equation of the calibration line is:
y = 0.626x + 0.037 (6.22)
Fig. 6-4. Calibration graph plotted from data of Table 6-5. The dashed lines delineate the 95% confidence area of the graph. Note that the confidence is highest at the centroid of the graph.
During the calculation, the maximum number of decimals is used; rounding off to the last significant figure is done at the end (see instructions for rounding off in Section 8.2). Once the calibration graph is established, its use is simple: for each measured y value the corresponding concentration x can be determined either by direct reading or by calculation using Equation (6.22). The use of calibration graphs is further discussed in Section 7.2.2.

Note. A treatment of the error or uncertainty in the regression line is given at the end of this section.
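Reading a concentration back from the calibration line of Eq. (6.22) amounts to inverting the line; a minimal sketch (the helper names are illustrative):

```python
def calibration_line(x, b=0.626, a=0.037):
    """Calibration line of Eq. (6.22): y = 0.626x + 0.037."""
    return b * x + a

def concentration(y, b=0.626, a=0.037):
    """Invert the calibration line: x = (y - a) / b."""
    return (y - a) / b

# A reading of y = 0.350 maps back to x = (0.350 - 0.037) / 0.626:
print(round(concentration(0.350), 3))  # 0.5
```

The round trip `concentration(calibration_line(x)) == x` holds for any x, which is a quick sanity check on a transcribed pair of line parameters.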
6.4.4.2 Comparing two sets of data using many samples at different analyte levels
Although regression analysis assumes that one factor (on the x-axis) is constant, when certain conditions are met the technique can also successfully be applied to comparing two variables such as laboratories or methods. These conditions are:

- The most precise data set is plotted on the x-axis;
- At least 6, but preferably more than 10, different samples are analyzed;
- The samples should rather uniformly cover the analyte level range of interest.

To decide which laboratory or method is the most precise, multi-replicate results have to be used to calculate standard deviations (see 6.4.2). If these are not available, then the standard deviations of the present sets could be compared (note that we are now not dealing with normally distributed sets of replicate results). Another convenient way is to run the regression analysis on the computer, reverse the variables and run the analysis again. Observe which variable has the lowest standard deviation (or standard error of the intercept a, both given by the computer) and then use the results of the regression analysis where this variable was plotted on the x-axis.

If the analyte level range is incomplete, one might have to resort to spiking or standard additions, with the inherent drawback that the original analyte-sample combination may not be adequately reflected.

Example

In the framework of a performance verification programme, a large number of soil samples were analyzed by two laboratories X and Y (a form of "third-line control", see Chapter 9) and the data compared by regression. (In this particular case, the paired t-test might have been considered also.) The regression line of a common attribute, the pH, is shown here as an illustration. Figure 6-5 shows the so-called "scatter plot" of 124 soil pH-H₂O determinations by the two laboratories. The correlation coefficient r is 0.97, which is very satisfactory.
The slope (= 1.03) indicates that the regression line is only slightly steeper than the 1:1 ideal regression line. Very disturbing, however, is the intercept a of −1.18. This implies that laboratory Y measures the pH more than a whole unit lower than laboratory X at the low end of the pH range (the intercept −1.18 is at pHx = 0), a difference which decreases to about 0.8 unit at the high end.

Fig. 6-5. Scatter plot of pH data of two laboratories. Drawn line: regression line; dashed line: 1:1 ideal regression line.
The t-test for significance is as follows:

For intercept a: a = 0 (null hypothesis: no bias; the ideal intercept is then zero), standard error = 0.14 (calculated by the computer), and using Equation (6.12) we obtain:

tcal = |-1.18 - 0| / 0.14 = 8.4

Here, ttab = 1.98 (App. 1, two-sided, df = n - 2 = 122; n - 2 because an extra degree of freedom is lost as the data are used for both a and b). Since tcal > ttab, the laboratories have a significant mutual bias.

For slope b: b = 1 (ideal slope: the null hypothesis is no difference), standard error = 0.02 (given by the computer), and again using Equation (6.12) we obtain:

tcal = |1.03 - 1| / 0.02 = 1.5

Again, ttab = 1.98 (App. 1; two-sided, df = 122). Since tcal < ttab, the difference between the laboratories is not significantly proportional (or: the laboratories do not have a significant difference in sensitivity). These results suggest that, in spite of the good correlation, the two laboratories would have to look into the cause of the bias.

Note. In the present example, the scattering of the points around the regression line does not seem to change much over the whole range.
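The two significance tests above can be sketched in a few lines of Python. The values are those of the pH example (intercept -1.18, slope 1.03, standard errors from the regression output); only the arithmetic is shown, the critical value still comes from a t-table.

```python
# t-tests on regression intercept and slope (pH comparison example).
a, se_a = -1.18, 0.14   # intercept and its standard error
b, se_b = 1.03, 0.02    # slope and its standard error

t_a = abs(a - 0) / se_a   # H0: intercept = 0 (no mutual bias)
t_b = abs(b - 1) / se_b   # H0: slope = 1 (no proportional difference)

t_tab = 1.98              # two-sided, 95%, df = 122
print(t_a > t_tab)        # intercept significantly biased?  -> True
print(t_b > t_tab)        # slopes significantly different?  -> False
```

Note that the intercept test fails dramatically (t is about 8.4) while the slope test passes, matching the conclusion drawn in the text.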
This indicates that the precision of laboratory Y does not change very much over the range with respect to laboratory X. This is not always the case. In such cases, weighted regression (not discussed here) is more appropriate than the unweighted regression as used here. Validation of a method (see Section 7.5) may reveal that precision can change significantly with the level of analyte (and with other factors such as sample matrix).
where ŷ_i = the "fitted" y-value for each x_i (read from the graph or calculated with Eq. 6.22). Thus, (y_i - ŷ_i) is the (vertical) deviation of the found y-values from the line, and
n = number of calibration points.

Note: Only the y-deviations of the points from the line are considered. It is assumed that deviations in the x-direction are negligible. This is, of course, only the case if the standards are very accurately prepared. Now the standard deviations for the intercept a and slope b can be calculated with:
s_a = s_y · sqrt( Σx_i² / (n · Σ(x_i - x̄)²) )    (6.24)

and

s_b = s_y / sqrt( Σ(x_i - x̄)² )    (6.25)
To make this procedure clear, the parameters involved are listed in Table 6-6. The uncertainty about the regression line is expressed by the confidence limits of a and b according to Eq. (6.9): a ± t·s_a and b ± t·s_b.

Table 6-6. Parameters for calculating errors due to calibration graph (use also figures of Table 6-5).
x_i            0        0.2      0.4      0.6      0.8      1.0
y_i            0.05     0.14     0.29     0.43     0.52     0.67
ŷ_i            0.037    0.162    0.287    0.413    0.538    0.663
y_i - ŷ_i      0.013    -0.022   0.003    0.017    -0.018   0.007
(y_i - ŷ_i)²   0.0002   0.0005   0.0000   0.0003   0.0003   0.0001

Σ(y_i - ŷ_i)² = 0.001364
The applicable ttab is 2.78 (App. 1, two-sided, df = n - 2 = 4), hence, using Eq. (6.9):

a = 0.037 ± 2.78 × 0.0132 = 0.037 ± 0.037

and

b = 0.626 ± 2.78 × 0.0219 = 0.626 ± 0.061

Note that if s_a is large enough, a negative value for a is possible, i.e. a negative reading for the blank or zero-standard. (For a discussion about the error in x resulting from a reading in y, which is particularly relevant for reading a calibration graph, see Section 7.2.3.)
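The whole calculation, from the raw data of Table 6-5 to the confidence limits of a and b, can be sketched with the standard library only:

```python
from math import sqrt

# Calibration data: P standard series (Table 6-5)
x = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]        # concentrations (mg/L)
y = [0.05, 0.14, 0.29, 0.43, 0.52, 0.67]  # absorbance readings

n = len(x)
xm, ym = sum(x) / n, sum(y) / n
sxx = sum((xi - xm) ** 2 for xi in x)
sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))

b = sxy / sxx          # slope (Eq. 6.20)
a = ym - b * xm        # intercept (Eq. 6.21)

# Standard error of the y-estimate (Eq. 6.23): residuals about the line
ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s_y = sqrt(ss_res / (n - 2))

s_a = s_y * sqrt(sum(xi ** 2 for xi in x) / (n * sxx))  # Eq. 6.24
s_b = s_y / sqrt(sxx)                                   # Eq. 6.25

t = 2.78  # two-sided, 95%, df = n - 2 = 4
print(f"a = {a:.3f} ± {t * s_a:.3f}")  # -> a = 0.037 ± 0.037
print(f"b = {b:.3f} ± {t * s_b:.3f}")  # -> b = 0.626 ± 0.061
```

This reproduces the figures of Table 6-6 and the confidence limits derived above.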
The uncertainty about the line is somewhat decreased by using more calibration points (assuming s_y has not increased): one more point reduces ttab from 2.78 to 2.57 (see Appendix 1).
7.1 Introduction
In this chapter the actual execution of the jobs for which the laboratory is intended is dealt with. The most important part of this work is of course the analytical procedures, meticulously performed according to the corresponding SOPs. Relevant aspects include calibration, use of blanks, performance characteristics of the procedure, and reporting of results. An aspect of utmost importance in quality management, the quality control by inspection of the results, is discussed separately in Chapter 8. All activities associated with these aspects are aimed at one target: the production of reliable data with a minimum of errors. In addition, it must be ensured that reliable data are produced consistently. To achieve this an appropriate programme of quality control (QC) must be implemented. Quality control is the term used to describe the practical steps undertaken to ensure that errors in the analytical data are of a magnitude appropriate for the use to which the data will be put. This implies that the errors (which are unavoidably made) have to be quantified to enable a decision on whether they are of an acceptable magnitude, and that unacceptable errors are discovered so that corrective action can be taken. Clearly, quality control must detect both random and systematic errors. The procedures for QC primarily monitor the accuracy of the work by checking the bias of data with the help of (certified) reference samples and control samples, and the precision by means of replicate analyses of test samples as well as of reference and/or control samples.
7.2.1 Principle
Here, the construction and use of calibration graphs or curves in the daily practice of a laboratory will be discussed. Calibration of instruments (including adjustment) is in the present context also referred to as standardization. The confusion
about these terms is mainly semantic and the terms calibration curve and standard curve are generally used interchangeably. The term "curve" implies that the line is not straight. However, the best (parts of) calibration lines are linear and, therefore, the general term "graph" is preferred. For many measuring techniques calibration graphs have to be constructed. The technique is simple and consists of plotting the instrument response against a series of samples with known concentrations of the analyte (standards). In practice, these standards are usually pure chemicals dispersed in a matrix corresponding with that of the test samples (the "unknowns"). By convention, the calibration graph is always plotted with the concentration of the standards on the x-axis and the reading of the instrument response on the y-axis. The unknowns are determined by interpolation, not by extrapolation, so that a suitable working range for the standards must be selected. In addition, in the present discussion it is assumed that the working range is limited to the linear range of the calibration graphs, that the standard deviation does not change over the range (neither of which is always the case*), and that the data are normally distributed. Non-linear graphs can sometimes be linearized in a simple way, e.g. by using a log scale (in potentiometry), but usually imply statistical problems (polynomial regression) for which the reader is referred to the relevant literature. It should be mentioned, however, that in modern instruments which make and use calibration graphs automatically these aspects sometimes go unnoticed.

* This is the so-called "unweighted" regression line. Because normally the standard deviation is not constant over the concentration range (it is usually least in the middle range), this difference in error should be taken into account. This would then yield a "weighted regression line".
The calculation of this is more complicated and information about the standard deviation of the y-readings has to be obtained. The gain in precision is usually very limited, but sometimes the extra information about the error may be useful. Some common practices to obtain calibration graphs are:

1. The standards are made in a solution with the same composition as the extractant used for the samples (with the same dilution factor) so that all measurements are done in the same matrix. This technique is often practised when analyzing many batches where the same standards are used for some time. In this way an incorrectly prepared extractant or matrix may be detected (in the blank or control sample).

2. The standards are made in the blank extract. A disadvantage of this technique is that for each batch the standards have to be pipetted. Therefore, this type of calibration is sometimes favoured when only one or a few batches are analyzed or when the extractant is unstable. A seeming advantage is that the blank can be forced to zero. However, an incorrect extractant would then more easily go undetected. The disadvantage of pipetting does not apply in case of automatic dispensing of reagents when equal volumes of different concentration are added (e.g. with flow-injection).

3. Less common, but useful in special cases, is the so-called standard additions technique. This can be practised when a matrix mismatch between samples and standards needs to be avoided: the standards are
prepared from actual samples. The general procedure is to take a number of aliquots of the sample or extract, add different quantities of the analyte to each aliquot (spiking) and dilute to the final volume. One aliquot is used without addition of the analyte (blank). Thus, a standard series is obtained. If calibration is involved in an analytical procedure, the SOP for this should include a description of the calibration sub-procedure, if applicable including an optimization procedure (usually given in the instruction manual).
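The evaluation of a standard-additions series can be sketched as follows. The readings below are hypothetical, chosen only for illustration: the response is regressed against the added concentration, and the original concentration in the aliquot follows from the magnitude of the x-intercept, a/b.

```python
# Standard additions sketch (hypothetical data). The unspiked aliquot is
# the first point; the original concentration equals a/b (x-intercept).
added = [0.0, 0.2, 0.4, 0.6]       # analyte added per aliquot (mg/L)
resp  = [0.15, 0.27, 0.39, 0.51]   # instrument responses (illustrative)

n = len(added)
xm, ym = sum(added) / n, sum(resp) / n
b = sum((x - xm) * (y - ym) for x, y in zip(added, resp)) / \
    sum((x - xm) ** 2 for x in added)
a = ym - b * xm

conc = a / b   # original concentration in the aliquot (mg/L)
print(round(conc, 2))  # -> 0.25
```

The design choice here is that matrix effects cancel because every point in the series carries the full sample matrix; the price is that a separate series is needed for each sample.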
y = bx + a    (6.18; 7.1)

where:
a = intercept of the line with the y-axis
b = slope (tangent)

Ideally, the intercept a is zero. Namely, when the analyte is absent, no response of the instrument is to be expected. However, because of interactions, interferences, noise, contamination and other sources of bias, this is seldom the case. Therefore, a can be considered as the signal of the blank of the standard series. The slope b is a measure of the sensitivity of the procedure: the steeper the slope, the more sensitive the procedure, or: the stronger the instrument response y_i to a concentration change x_i (see also Section 7.5.3). The correlation coefficient r can be calculated by:
r = Σ(x_i - x̄)(y_i - ȳ) / sqrt( Σ(x_i - x̄)² · Σ(y_i - ȳ)² )    (6.19; 7.2)
where
x_i = concentrations of standards
x̄ = mean of concentrations of standards
y_i = instrument responses to standards
ȳ = mean of instrument responses to standards

The line parameters b and a are calculated with the following equations:
b = Σ(x_i - x̄)(y_i - ȳ) / Σ(x_i - x̄)²    (6.20; 7.3)
and
a = ȳ - b·x̄    (6.21; 7.4)
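Applied to the phosphate calibration data of Table 6-5 (used in the example below), the correlation coefficient of Eq. (6.19; 7.2) can be computed directly:

```python
from math import sqrt

# Calibration data of Table 6-5
x = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]        # standard concentrations (mg/L)
y = [0.05, 0.14, 0.29, 0.43, 0.52, 0.67]  # instrument responses

n = len(x)
xm, ym = sum(x) / n, sum(y) / n
sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
sxx = sum((xi - xm) ** 2 for xi in x)
syy = sum((yi - ym) ** 2 for yi in y)

r = sxy / sqrt(sxx * syy)   # Eq. (6.19; 7.2)
print(f"{r:.4f}")
```

The result, about 0.9976, agrees with the r = 0.997 quoted for this graph (differences in the last digit are rounding).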
Example of calibration graph As an example, we take the same calibration graph as discussed in Section 6.4.4.1, (Fig. 6-4): a standard series of P (0-1.0 mg/L) for the spectrophotometric determination of phosphate in a Bray-I extract ("available P"), reading in absorbance units. The data and calculated terms needed to determine the parameters of the calibration graph were given in Table 6-5. The calculations can be done on a (programmed) calculator or more conveniently on a PC using a home-made program or, even more conveniently, using an available regression program. The calculations yield the equation of the calibration line (plotted in Fig. 7-1):
y = 0.626x + 0.037 (6.22; 7.5)
with a correlation coefficient r = 0.997. As stated previously (6.4.3.1), such high values are common for calibration graphs. When the value is not close to 1 (say, below 0.98) this must be taken as a warning and it might then be advisable to repeat or review the procedure. Errors may have been made (e.g. in pipetting), or the range of the graph used may not be linear. Therefore, to make sure, the calibration graph should always be plotted, either on paper or on a computer monitor.

Fig. 7-1. Calibration graph plotted from data of Table 6-5.
If linearity is in doubt, the following test may be applied. Determine for two or three of the highest calibration points the relative deviation of the measured y-value from the calculated line:
deviation (%) = 100 · (y_i - ŷ_i) / ŷ_i    (7.6)
- If the deviations are < 5%, the curve can be accepted as linear.
- If a deviation is > 5%, the range is decreased by dropping the highest concentration.
- Recalculate the calibration line by linear regression.
- Repeat this test procedure until all deviations are < 5%.

When, as an exercise, this test is applied to the calibration curve of Fig. 7-1 (data in Table 6-5) it appears that the deviations of the three highest points are < 5%, hence the line is sufficiently linear. During calculation of the line, the maximum number of decimals is used; rounding off to the last significant figure is done at the end (see the instructions for rounding off in Section 8.2). Once the calibration graph is established, its use is simple: for each y-value measured for a test sample (the "unknown") the corresponding concentration x can be determined either by reading from the graph or by calculation using Equation (7.1), or x is produced automatically by the instrument.
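The linearity test above can be sketched as follows, using the fitted line y = 0.626x + 0.037 of Eq. (7.5) and the three highest points of Table 6-5:

```python
# Relative deviation (Eq. 7.6) of the highest calibration points
# from the fitted line y = 0.626x + 0.037 (Eq. 7.5).
a, b = 0.037, 0.626
points = [(0.6, 0.43), (0.8, 0.52), (1.0, 0.67)]  # (x, measured y)

for x, y in points:
    y_fit = a + b * x
    dev = 100 * (y - y_fit) / y_fit
    print(f"x = {x}: deviation = {dev:+.1f}%")

linear = all(abs(100 * (y - (a + b * x)) / (a + b * x)) < 5
             for x, y in points)
print("accept as linear:", linear)  # -> True
```

All three deviations come out below 5% (the largest, at x = 0.6, is about 4.2%), confirming the conclusion in the text.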
The "fitting" of the calibration graph is necessary because the actual response points y_i composing the line usually do not fall exactly on the line. Hence, random errors are implied. This is expressed by an uncertainty about the slope and intercept b and a defining the graph. A discussion of this uncertainty was given in Section 6.4.4. It was explained there that the error is expressed by s_y, the "standard error of the y-estimate" (see Eq. 6.23), a parameter automatically calculated by most regression computer programs. This uncertainty about the ŷ-values (the fitted y-values) is transferred to the corresponding concentrations of the unknowns on the x-axis by the calculation using Eq. (7.1) and can be expressed by the standard deviation of the obtained x-value. The exact calculation is rather complex but a workable approximation can be calculated with:
s_x = s_y / b    (7.7)
Example

For each value of the standards x_i the corresponding ŷ is calculated with Eq. (7.5); this yields (Eq. 6.23, Table 6-6) s_y = 0.0185, so that:

s_x = 0.0185 / 0.626 = 0.0296    (7.8)
Now, the confidence limits of the found results xf can be calculated with Eq. (6.9):
x_f ± t·s_x    (7.9)
For a two-sided interval and 95% confidence: ttab = 2.78 (see Appendix 1, df = n - 2 = 4). Hence all results in this example can be expressed as:

x_f ± 0.08 mg/L

Thus, for instance, the result of a reading y = 0.22, using Eq. (7.5) to calculate x_f = 0.29, can be reported as 0.29 ± 0.08 mg/L. (See also Note 2 below.) The s_x value used can only be approximate as it is taken as constant here, whereas in reality this is usually not the case. Yet, in practice, such an approximate estimation of the error may suffice. The general rule is that the measured signal is most precise (least standard deviation) near the centroid of the calibration graph (see Fig. 6-4). The confidence limits can be narrowed by increasing the number of calibration points. Therefore, the reverse is also true: with fewer calibration points the confidence limits of the measurements become wider. Sometimes only two or three points are used. This usually concerns the checking and restoring of previously established calibration graphs, including those in the microprocessor or computer of instruments. In such cases it is advisable to check the graph regularly with more standards. Make a record of this in the file or journal of the method.
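Converting a reading into a reported result then takes only a few lines. The values below are those of the example (calibration line of Eq. 7.5; s_y from Eq. 6.23):

```python
# Report a concentration with its 95% confidence interval (Eq. 7.9).
a, b = 0.037, 0.626   # calibration line y = 0.626x + 0.037 (Eq. 7.5)
s_y = 0.0185          # standard error of the y-estimate (Eq. 6.23)
t = 2.78              # two-sided, 95%, df = n - 2 = 4

y_reading = 0.22
x_f = (y_reading - a) / b   # Eq. (7.1) solved for x
s_x = s_y / b               # workable approximation (Eq. 7.7)
ci = t * s_x

print(f"{x_f:.2f} ± {ci:.2f} mg/L")  # -> 0.29 ± 0.08 mg/L
```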
Note 1. Where the determination of the analyte is part of a procedure with several steps, the error in precision due to this reading is added to the errors of the other steps and as such included in the total precision error of the whole procedure. The latter is the most useful practical estimate of confidence when reporting results. As discussed in Section 6.3.4 a convenient way to do this is by using Equations (6.8) or (6.9) with the mean and standard deviation obtained from several replicate determinations (n> 10) carried out on control samples or, if available, taken from the control charts (see 8.3.2: Control Chart of the Mean). Most generally, the 95% confidence for single values x of test samples is expressed by Equation (6.10):
x ± 2s    (6.10; 7.10)
where s is the standard deviation of the mentioned large number of replicate determinations.

Note 2. The confidence interval of ± 0.08 mg/L in the present example is clearly not satisfactory and calls for inspection of the procedure. Particularly the blank seems to be (much) too high. This illustrates the usefulness of plotting the graph and calculating the parameters. Other traps to catch this error are the Control Chart of the Blank and, of course, the technician's experience.
After calibration, a standard is measured at fixed places or intervals (after every 10, 15, 20, or more test samples). For this, often a standard near the middle of the working range is used (continuing calibration standard). When the drift is within acceptable limits, the measurement is continued. If the drift is unacceptable, the instrument is recalibrated ("resloped") and the previous interval of samples remeasured before continuing with the next interval. The extent of the "acceptable" drift depends on the kind of analysis but in soil and plant analysis usually does not exceed 5%. This procedure is very suitable for manual operation of measurements. When automatic sample changers are used, various options for recalibration and repeating intervals or whole batches are possible.

2. Linear correction or correction by interpolation

Here, too, standards are measured at intervals, usually together with a blank ("drift and wash"), and possible changes are processed by the computer software, which corrects the past readings of the batch back to the original calibration. Only in case of serious mishap are batches or intervals repeated. A disadvantage of this procedure is that drift is taken to be linear whereas this may not be so. Autoanalyzers, ICP and AAS with automatic sample changers often employ variants of this type of procedure. At present, instrument software is developing very rapidly. Many new features with respect to resloping, correction of carry-over, post-batch dilution and repeating are being introduced by manufacturers. Running ahead of this, many laboratories have developed their own interface software programs meeting their individual demands.
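A minimal sketch of the continuing-calibration check described above (the 5% limit is the soil and plant analysis practice mentioned in the text; the function name and readings are illustrative):

```python
def drift_acceptable(reading, expected, tolerance=0.05):
    """Return True if the continuing calibration standard has drifted
    less than `tolerance` (relative) from its value at calibration."""
    return abs(reading - expected) / expected <= tolerance

expected = 0.350   # mid-range standard reading at calibration
print(drift_acceptable(0.360, expected))  # ~2.9% drift -> True, continue
print(drift_acceptable(0.320, expected))  # ~8.6% drift -> False, reslope
```

When the check fails, the interval of test samples since the last good check is remeasured after resloping, as described above.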
7.3.1 Blanks
A blank or blank determination is an analysis of a sample without the analyte or attribute, or an analysis without a sample, i.e. going through all steps of the procedure with the reagents only. The latter type is the most common as samples without the analyte or attribute are often not available or do not exist. Another type of blank is the one used for calibration of instruments as discussed in the previous sections. Thus, we may have two types of blank within one analytical method or system: - a blank for the whole method or system and - a blank for analytical subprocedures (measurements) as part of the whole procedure or system. For instance, in the cation exchange capacity (CEC) determination of soils with the percolation method, two method or system blanks are included in each batch: two percolation tubes with cotton wool or filter pulp and sand or celite, but without sample. For the determination of the index cation (NH4 by colorimetry or
Na by flame emission spectroscopy) a blank is included in the determination of the calibration graph. If NH4 is determined by distillation and subsequent titration, a blank titration is carried out for correction of test sample readings. The proper analysis of blanks is very important because: 1. In many analyses sample results are calculated by subtracting blank readings from sample readings. 2. Blank readings can be excellent monitors in quality control of reagents, analytical processes, and proficiency. 3. They can be used to estimate several types of method detection limits. For blanks the same rule applies as for replicate analyses: the larger the number, the greater the confidence in the mean. The widely accepted rule in routine analysis is that each batch should include at least two blanks. For special studies where individual results are critical, more blanks per batch may be required (up to eight). For quality control, Control Charts are made of blank readings identically to those of control samples. The between-batch variability of the blank is expressed by the standard deviation calculated from the Control Chart of the Mean of Blanks; the precision can be estimated from the Control Chart of the Range of Duplicates of Blanks. The construction and use of control charts are discussed in detail in 8.3. One of the main control rules of the control charts, for instance, prescribes that a blank value beyond the mean blank value plus 3× the standard deviation of this mean (i.e. beyond the Action Limit) must be rejected and the batch be repeated, possibly with fresh reagents. In many laboratories, no control charts are made for blanks. Sometimes, analysts argue that 'there is never a problem with my blank, the reading is always close to zero'. Admittedly, some analyses are more prone to blank errors than others. This, however, is not a valid argument for not keeping control charts.
They are made to monitor procedures and to give an alarm when these are out of control (shift) or tend to become out of control (drift). This can happen in any procedure in any laboratory at any time. From the foregoing discussion it will be clear that signals of blank analyses generally are not zero. In fact, blanks may be found to be negative. This may point to an error in the procedure: e.g. for the zeroing of the instrument an incorrect or a contaminated solution was used, or the calibration graph was not linear. It may also be due to the matrix of the solution (e.g. extractant), and is then often unavoidable. For convenience, some analysts practice "forcing the blank to zero" by adjusting the instrument. Some instruments even invite or compel analysts to do so. This is equivalent to subtracting the blank value from the values of the standards before plotting the calibration graph. From the standpoint of Quality Control this practice must be discouraged. If zeroing of the instrument is necessary, the use of pure water for this is preferred. However, such general considerations may be overruled by specific instrument or method instructions. This is becoming more and more common practice with modern sophisticated hi-tech instruments. Whatever the case, a decision on how to deal with blanks must be made for each procedure and laid down in the SOP concerned.
offer excellent possibilities for this. For proper judgement (validation) and selection of a procedure or instrument it is important to have information about the lower limits at which analytes can be detected or determined with sufficient confidence. Several concepts and terms are used, e.g. detection limit, lower limit of detection (LLD), method detection limit (MDL). The latter applies to a whole method or system, whereas the two former apply to measurements as part of a method.

Note: In analytical chemistry, "lower limit of detection" is often confused with "sensitivity" (see 7.5.3).

Although various definitions can be found, the most widely accepted definition of the detection limit seems to be: 'the concentration of the analyte giving a signal equal to the blank plus 3× the standard deviation of the blank'. Because in the calculation of analytical results the value of the blank is subtracted (or the blank is forced to zero), the detection limit can be written as:
LLD, MDL = 3 × s_bl    (7.11)
At this limit it is 93% certain that the signal is not due to the blank but that the method has detected the presence of the analyte (this does not mean that below this limit the analyte is absent!). Obviously, although generally accepted, this is an arbitrary limit and in some cases the 7% uncertainty may be too high (for 5% uncertainty the LLD = 3.3 × s_bl). Moreover, the precision in that concentration range is often relatively low and the LLD must be regarded as a qualitative limit. For some purposes, therefore, a more elevated "limit of determination" or "limit of quantification" (LLQ) is defined as
LLQ = 2 × LLD = 6 × s_bl    (7.12)
or sometimes as
LLQ = 10 × s_bl    (7.13)
Thus, if one needs to know or report these limits of the analysis as quality characteristics, the mean of the blanks and the corresponding standard deviation must be determined (validation). The s_bl can be obtained by running a statistically sufficient number of blank determinations (usually a minimum of 10, and not excluding outliers). In fact, this is an assessment of the "noise" of a determination.

Note: Noise is defined as the 'difference between the maximum and minimum values of the signal in the absence of the analyte measured during two minutes' (or otherwise according to the instrument instructions). The noise of several instrumental measurements can be displayed by using a recorder (e.g. FES, AAS, ICP, IR, GC, HPLC, XRFS). Although this is not often used to actually determine the detection limit, it is used to determine the signal-to-noise ratio (a validation parameter not discussed here) and is particularly useful to monitor noise in case of troubleshooting (e.g. suspected power fluctuations).

If the analysis concerns a one-batch exercise, 4 to 8 blanks are run in this batch. If it concerns an MDL as a validation characteristic of a test procedure used for multiple batches in the laboratory, such as a routine analysis, the blank data are collected from different batches, e.g. the means of duplicates from the control charts.
For the determination of the LLD of measurements where a calibration graph is used, such replicate blank determinations are not necessary since the value of the blank as well as the standard deviation result directly from the regression analysis (see Section 7.2.3 and Example 2 below).

Examples

1. Determination of the Method Detection Limit (MDL) of a Kjeldahl-N determination in soils

Table 7-1 gives the data obtained for the blanks (means of duplicates) in 15 successive batches of a micro-Kjeldahl N determination in soil samples. Reported are the millilitres of 0.01 M HCl necessary to titrate the ammonia distillate and the conversion to results in mg N by: reading × 0.01 × 14.

Table 7-1. Blank data of 15 batches of a Kjeldahl-N determination in soils for the calculation of the Method Detection Limit.
ml HCl   mg N
0.12     0.0161
0.16     0.0217
0.11     0.0154
0.15     0.0203
0.09     0.0126
0.14     0.0189
0.12     0.0161
0.17     0.0238
0.14     0.0189
0.20     0.0273
0.16     0.0217
0.22     0.0308
0.14     0.0189
0.11     0.0154
0.15     0.0203

Mean blank: 0.0199
s_bl:       0.0048
MDL = 3 × s_bl = 0.014 mg N

The MDL reported in this way is an absolute value. Results are usually reported as relative figures such as % or mg/kg (ppm). In the present case, if 1 g of
sample is routinely used, then the MDL would be 0.014 mg/g or 14 mg/kg or 0.0014%. Note that if one were to use only 0.5 g of sample (e.g. because of a high N content), the MDL as a relative figure is doubled! When results are obtained below the MDL of this example they must be reported as: '< 14 mg/kg' or '< 0.0014%'. Reporting '0 %' or '0.0 %' may be acceptable for practical purposes, but may be interpreted as the element being absent, which is not justified.

Note 1. There are no strict rules for reporting figures below the LLD or LLQ. Most important is that data can be correctly interpreted and used. For this reason uncertainties (confidence limits) and detection limits should be known and reported to clients or users (if only upon request). The advantage of using the "<" sign for values below the LLD or LLQ is that the value 0 (zero) and negative values can be avoided as they are usually either impossible or improbable. A disadvantage of the "<" sign is that it is a non-numerical character and not suitable in spreadsheet programs for further calculation and manipulation. In such cases the actually found value will be required, but then the inherent confidence restrictions should be known to the user.

Note 2. Because a normal distribution of data is assumed, it can statistically be expected that zero and negative values for analytical results occur when blank values are subtracted from test values equal to or lower than the blank. Clearly, only in few cases are negative values possible (e.g. for adsorption) but for concentrations such values should normally not be reported. Exceptions to this rule are studies involving surveys of attributes or effects. Then it might be necessary to report the actually obtained low results as otherwise the mean of the survey would be biased.

2. Lower Limit of Detection derived from a calibration graph

We use the calibration graph of Figure 7-1.
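The MDL calculation of Table 7-1 can be reproduced with the standard library (data in mg N):

```python
from statistics import mean, stdev

# Blank values (mg N) of 15 successive batches, Table 7-1
blanks = [0.0161, 0.0217, 0.0154, 0.0203, 0.0126, 0.0189, 0.0161,
          0.0238, 0.0189, 0.0273, 0.0217, 0.0308, 0.0189, 0.0154, 0.0203]

m = mean(blanks)
s_bl = stdev(blanks)   # between-batch standard deviation of the blank
mdl = 3 * s_bl         # Eq. (7.11)

print(f"mean blank = {m:.4f} mg N")  # -> 0.0199
print(f"s_bl = {s_bl:.4f} mg N")     # -> 0.0048
print(f"MDL = {mdl:.3f} mg N")       # -> 0.014
# For a 1 g sample this corresponds to 14 mg/kg; for 0.5 g, to 28 mg/kg.
```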
Then, noting that s_bl = s_x = 0.0297 and using Equation (7.11), we obtain: LLD = 3 × 0.0297 = 0.089 mg/L. It is noteworthy that "forcing the blank to zero" does not affect the Lower Limit of Detection. Although the intercept a (the blank reading, see Fig. 7-1) may become zero, the uncertainty s_y of the calibration graph, and thus s_x and s_bl, is not changed by this: the only change is that the "forced" calibration line has shifted and now runs through the intersection of the axes (parallel to the "original" line).
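The derivation of the LLD from calibration data rather than from replicate blanks can be sketched as follows, using the data of Table 6-5:

```python
from math import sqrt

# Calibration data of Table 6-5
x = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
y = [0.05, 0.14, 0.29, 0.43, 0.52, 0.67]

n = len(x)
xm, ym = sum(x) / n, sum(y) / n
b = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / \
    sum((xi - xm) ** 2 for xi in x)
a = ym - b * xm
s_y = sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))

s_x = s_y / b    # Eq. (7.7): serves as s_bl for the measurement
lld = 3 * s_x    # Eq. (7.11)
print(f"LLD = {lld:.2f} mg/L")  # -> 0.09 mg/L
```

Because forcing the blank to zero changes a but not s_y, rerunning this with blank-corrected y-values gives the same LLD.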
Although several terms for different sample types have already freely been used in the previous sections, it seems appropriate to define the various types before the major Quality Control operations are discussed.
The sample is analyzed with and without the spike to test recovery (see 7.5.6). It should be a realistic surrogate with respect to matrix and concentration. The mixture should be well homogenized. The requirement "realistic surrogate" is the main problem with spikes. Often the analyte cannot be integrated in the sample in the same manner as the original analyte, and then treatments such as digestion or extraction may not necessarily reflect the behaviour of real samples.
Two main types of validation may be distinguished:

1. Validation of standard procedures. The validation of new or existing methods or procedures intended to be used in many laboratories, including procedures (to be) accepted by national or international standardization organizations.

2. Validation of own procedures. The in-house validation of methods or procedures by individual user-laboratories.

The first involves an interlaboratory programme of testing the method by a number (≥ 8) of selected renowned laboratories according to a protocol issued to all participants. The second involves in-house testing of a procedure to establish its performance characteristics or, more specifically, its suitability for a purpose. Since the former is a specialist task, usually (but not exclusively) performed by standardization organizations, the present discussion will be restricted to the second type of validation, which concerns every laboratory. Validation is not only relevant when non-standard procedures are used but just as well when validated standard procedures are used (to what extent does the laboratory meet the standard validation?) and even more so when variants of standard procedures are introduced. Many laboratories use their own versions of well-established methods or change a procedure for reasons of efficiency or convenience. Fundamentally, any change in a procedure (e.g. sample size, liquid:solid ratio in extractions, shaking time) may affect the performance characteristics and should be validated. For instance, in Section 7.3.2 we noticed that halving the sample size results in doubling the Lower Limit of Detection. Thus, inherent in generating quality analytical data is to support these with a quantification of the parameters of confidence. As such it is part of the quality control.
To specify the performance characteristics of a procedure, a selection (so not necessarily all) of the following basic parameters is determined:
- Trueness (accuracy), bias
- Precision
- Recovery
- Sensitivity
- Specificity and selectivity
- Working range (including MDL)
- Interferences
- Ruggedness or robustness
- Practicability

Before validation can be carried out it is essential that the detailed procedure is available as an SOP.
The direct method is by carrying out replicate analyses (n ≥ 10) with the method on a (certified) reference sample with a known content of the analyte. The indirect method is by comparing the results of the method with those of a reference method (or otherwise generally accepted method), both applied to the same sample(s). Another indirect way to verify bias is by having (some) samples analyzed by another laboratory and by participation in interlaboratory exchange programmes. This will be discussed in Chapter 9. It should be noted that the trueness of an analytical result may be sensitive to varying conditions (level of analyte, matrix, extract, temperature, etc.). If a method is applied to a wide range of materials, for proper validation different samples at different levels of analyte should be used. Statistical comparison of results can be done in several ways, some of which were described in Section 6.4. Numerically, the trueness (often less appropriately referred to as accuracy) can be expressed using the equation:
trueness (%) = (x/μ) × 100% (7.14)
where
x = mean of test results obtained for reference sample
μ = "true" value given for reference sample

Thus, the best trueness we can get is 100%. Bias, more commonly used than trueness, can be expressed as an absolute value by:
bias = x - μ (7.15)

or as a relative value by:

bias (%) = ((x - μ)/μ) × 100% (7.16)
Thus, the best bias we can get is 0 (in units of the analyte) or 0%, respectively.

Example

The Cu content of a reference sample is 34.0 ± 2.7 mg/kg (2.7 = s, n = 12). The results of 15 replicates with the laboratory's own method are the following: 38.0; 34.6; 29.1; 27.8; 40.4; 33.1; 40.9; 28.5; 36.1; 26.8; 30.6; 24.3; 31.6; 22.3; 29.9 mg/kg. With Equation (6.1) we calculate: x = 31.6. Using Equation (7.14) the trueness is (31.6/34.0) × 100% = 93%. Using Equation (7.16), the bias is (31.6 - 34.0) × 100% / 34.0 = -7%. These calculations suggest a systematic error. To see if this error is statistically significant a t-test can be done. For this, with Equation (6.2) we first calculate s = 5.6. The F-test (see 6.4.2 and 7.5.2) indicates a significant difference in standard deviation and we have to use the Cochran variant of the t-test (see 6.4.3). Using Equation (6.16) we find tcal = 1.46, and with Eq. (6.17) the critical value ttab* = 2.16, indicating that the results obtained by the laboratory are not significantly different from the reference value (with 95% confidence). Although a laboratory could be satisfied with this result, the fact remains that the mean of the test results is not equal to the "true" value but somewhat lower. As discussed in Sections 6.4.1 and 6.4.3 the one-sided t-test can be used to test if
this result is statistically on one side (lower or higher) of the reference value. In the present case the one-sided critical value is 1.77 (see Appendix 1), which also exceeds the calculated value of 1.46, indicating that the laboratory mean is not systematically lower than the reference value (with 95% confidence). At first sight a bias of -7% does not seem insignificant. In this case, however, the wide spread of the laboratory's own data causes the uncertainty about this. If the standard deviation of the results had been the same as that of the reference sample then, using Equations (6.13) and (6.14), tcal would be 2.58, and with ttab = 2.06 (App. 1) the difference would have been significant according to the two-sided t-test, and with ttab = 1.71 significantly lower according to the one-sided t-test (at 95% confidence).
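The worked example can be reproduced with a short script (Python here, purely for illustration). One assumption of the sketch is the exact form of the Cochran t statistic, taken as tcal = |x - μ| / √(s²/n + sref²/nref). Note that computing with the unrounded standard deviation gives tcal ≈ 1.47; the 1.46 in the text follows from rounding s to 5.6 first.

```python
import math
from statistics import mean, stdev

# Replicate Cu results of the laboratory's own method (mg/kg)
results = [38.0, 34.6, 29.1, 27.8, 40.4, 33.1, 40.9, 28.5,
           36.1, 26.8, 30.6, 24.3, 31.6, 22.3, 29.9]

mu, s_ref, n_ref = 34.0, 2.7, 12      # certified reference sample

x_bar = mean(results)                 # Eq. 6.1 -> 31.6 mg/kg
s = stdev(results)                    # Eq. 6.2 -> ~5.6 mg/kg

trueness = x_bar / mu * 100           # Eq. 7.14 -> ~93%
bias_pct = (x_bar - mu) / mu * 100    # Eq. 7.16 -> ~-7%

# Cochran variant of the t-test for unequal variances (assumed form of Eq. 6.16)
t_cal = abs(x_bar - mu) / math.sqrt(s**2 / len(results) + s_ref**2 / n_ref)

print(round(x_bar, 1), round(s, 1), round(trueness), round(bias_pct), round(t_cal, 2))
```

Since tcal stays below the critical value of about 2.16, the script reproduces the conclusion in the text: no significant difference at 95% confidence.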
7.5.2 Precision
7.5.2.1 Reproducibility
7.5.2.2 Repeatability
7.5.2.3 Within-laboratory reproducibility

Replicate analyses performed on a reference sample, yielding a mean to determine trueness or bias as described above, also yield a standard deviation of the mean as a measure for precision. However, for precision alone, control samples and even test samples can also be used. The statistical test for comparison is done with the F-test, which compares the obtained standard deviation with the standard deviation given for the reference sample (in fact, the variances are compared: Eq. 6.11). Numerically, precision is expressed either by the absolute value of the standard deviation or, more universally, by the relative standard deviation (RSD) or coefficient of variation (CV) (see Equations 6.5 and 6.6):
CV (%) = (s/x) × 100% (7.17)
where
x = mean of test results obtained for reference sample
s = standard deviation of x

If the attained precision is worse than that given for the reference sample, it can still be decided that the performance is acceptable for the purpose (which has to be reported as such); otherwise it has to be investigated how the performance can be improved. Like the bias, precision will not necessarily be the same at different concentrations of the analyte or in different kinds of materials. Comparison of precision at different levels of analyte can be done with the F-test: if the variances at a few different levels are similar, then precision is assumed to be constant over the range.

Example

The same example as above for bias is used. The standard deviation of the laboratory is 5.6 mg/kg which, according to Eq. (7.17), corresponds with a precision of (5.6/31.6) × 100% = 18%. (The precision of the reference sample can similarly be calculated as about 8%.)
Using Equation (6.11), Fcal = 5.6²/2.7² = 4.3; the critical value is 2.47 (App. 2, two-sided, df1 = 14, df2 = 11). Hence, the null hypothesis that the two standard deviations belong to the same population is rejected: there is a significant difference in precision (at 95% confidence level).

Types of precision

The above description of precision leaves some uncertainty about the actual execution of its determination. Because precision in particular is sensitive to the way it is determined, some specific types of precision are distinguished and it should therefore always be reported which type is involved.
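The CV and F-test arithmetic of this example can be sketched as follows (the critical value 2.47 is taken from the table in Appendix 2, not computed, since that would need a statistics library):

```python
# Precision as coefficient of variation (Eq. 7.17) and F-test comparison
# of two standard deviations (Eq. 6.11); numbers from the example above.
s_lab, x_lab = 5.6, 31.6     # laboratory: standard deviation and mean
s_ref = 2.7                  # reference sample standard deviation

cv_lab = s_lab / x_lab * 100                          # ~18%
f_cal = (max(s_lab, s_ref) / min(s_lab, s_ref)) ** 2  # larger variance on top

F_TABLE = 2.47   # two-sided critical value, df1 = 14, df2 = 11 (App. 2)
significant = f_cal > F_TABLE
print(round(cv_lab), round(f_cal, 1), significant)
```

With Fcal ≈ 4.3 above the critical 2.47, the difference in precision is confirmed as significant.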
7.5.2.1 Reproducibility
The measure of agreement between results obtained with the same method on identical test or reference material under different conditions (execution by different persons, in different laboratories, with different equipment and at different times). The measure of reproducibility R is the standard deviation of these results sR, and for a not too small number of data (n ≥ 8) R is defined by (with 95% confidence):
R = 2.8 sR (7.18)
(where 2.8 = 2√2 and is derived from the normal or Gaussian distribution; ISO 5725). Thus, reproducibility is a measure of the spread of results when a sample is analyzed by different laboratories. If a method is sensitive to different ways of execution or conditions (low robustness, see 7.5.7), then the reproducibility will reflect this. This parameter can obviously not be verified in daily practice. For that purpose the next two parameters are used (repeatability and within-laboratory reproducibility).
7.5.2.2 Repeatability
The measure of agreement between results obtained with the same method on identical test or reference material under the same conditions (job done by one person, in the same laboratory, with the same equipment, at the same time or with only a short time interval). Thus, this is the best precision a laboratory can obtain: the within-batch precision. The measure for the repeatability r is the standard deviation of these results sr, and for a not too small number of data (n ≥ 10) r is defined by (with 95% confidence):
r = 2.8 sr (7.19)
7.5.2.3 Within-laboratory reproducibility

The between-batch precision (within-laboratory reproducibility RL) can be estimated in three different ways:

1. As the standard deviation of a large number (n ≥ 50) of duplicate determinations carried out by two analysts:
RL = 2.8 × s (7.20)

with

s = √(Σdi²/2k) (7.21)
where
s = standard deviation of the duplicate determinations
k = number of pairs of duplicates
di = difference between duplicates within each pair

2. Empirically, as 1.6 × sr. Then RL = 2.8 × 1.6 × sr, or:
RL = 1.6 r (7.22)
where r is the repeatability as defined above. 3. The most practical and realistic expression of the within-laboratory reproducibility is the one based on the standard deviation obtained for control samples during routine work. The advantage is that no extra work is involved: control samples are analyzed in each batch, and the within-laboratory standard deviation is calculated each time a control chart is completed (or sooner if desired, say after 10 batches). The calculation is here:
RL = 2.8 scc (7.23)
where scc is the standard deviation obtained from a Control Chart (see 8.3.2). Clearly, the above three RL values are not identical and thus, whenever the within-laboratory reproducibility is reported, the way in which it was obtained should always be stated.

Note: Naturally, instead of reporting the derived validation parameters for precision R, r, or RL, one may prefer to report their primary measure: the standard deviation concerned.
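The three estimates can be put side by side in a small sketch; the duplicate differences di, the repeatability standard deviation sr and the control-chart standard deviation scc below are invented for illustration:

```python
import math

# Three estimates of the within-laboratory reproducibility RL.
# All input numbers are hypothetical.
d = [0.4, -0.7, 0.2, 1.1, -0.3, 0.6, -0.9, 0.5, 0.1, -0.4]  # duplicate differences
k = len(d)
s_dup = math.sqrt(sum(di ** 2 for di in d) / (2 * k))  # Eq. 7.21
RL1 = 2.8 * s_dup                                      # estimate 1 (Eq. 7.20)

s_r = 0.35                 # hypothetical repeatability standard deviation
RL2 = 2.8 * 1.6 * s_r      # estimate 2: RL = 1.6 r (Eq. 7.22)

s_cc = 0.5                 # hypothetical control-chart standard deviation
RL3 = 2.8 * s_cc           # estimate 3 (Eq. 7.23)
print(round(RL1, 2), round(RL2, 2), round(RL3, 2))
```

As the text notes, the three values need not coincide, so a report should state which route was taken.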
7.5.3 Sensitivity
This is a measure for the response y of the instrument or of a whole method to the concentration C of the analyte or property, e.g. the slope of the analytical calibration graph (see Section 7.2.2). It is the value that is required to quantify the analyte on the basis of the analytical signal. The sensitivity for the analyte in the final sample extract may not necessarily be equal to the sensitivity for the analyte in a simple standard solution. Matrix effects may cause improper calibration of the measuring step of the analytical method. As observed earlier for calibration graphs, the sensitivity may not be constant over a long range. It usually decreases at higher concentrations by saturation of the signal. This limits the working range (see next Section 7.5.4). Some of the most typical situations are exemplified in Figure 7-2.

Fig. 7-2. Examples of some typical response graphs. 1. Constant sensitivity. 2. Sensitivity constant over lower range, then decreasing. 3. Sensitivity decreasing over whole range. (See also 7.5.4.)
In general, at every point of the response graph the sensitivity can be expressed by

S = dy/dC (7.24)
The dimension of S depends on the dimensions of y and C. In atomic absorption, for example, y is expressed in absorbance units and C in mg/L. For pH and ion-selective electrodes the response of the electrode is expressed in mV and the concentration in mg/L or moles (plotted on a log scale). Often, for convenience, the signal is converted and amplified to a direct reading in arbitrary units, e.g. concentration. However, for proper expression of the sensitivity, this derived response should be converted back to the direct response. In practice, for instance, this is simply done by making a calibration graph in the absorbance mode of the instrument as exemplified in Figure 7-1, where slope b is the sensitivity of the P measurement on the spectrophotometer. If measured in the absorption (or transmission) mode, plotting should be done with a logarithmic y-axis.
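For the common linear case, Eq. (7.24) reduces to the least-squares slope of the calibration graph. A sketch of this calculation, with hypothetical absorbance readings:

```python
# Sensitivity S = dy/dC estimated as the slope of a linear calibration
# graph by least squares. The absorbance readings are hypothetical.
C = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]              # concentration, mg/L
y = [0.002, 0.102, 0.199, 0.305, 0.398, 0.501]  # absorbance

n = len(C)
c_mean = sum(C) / n
y_mean = sum(y) / n
slope = (sum((ci - c_mean) * (yi - y_mean) for ci, yi in zip(C, y))
         / sum((ci - c_mean) ** 2 for ci in C))  # S = dy/dC
print(round(slope, 3))   # ~0.5 absorbance units per mg/L
```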
several sample sizes, liquid:sample ratios, or by spiking samples (see 7.5.6, Recovery). This practice is particularly important to determine the upper limit of the working range (the lower limit of a working range corresponds with the Method Detection Limit and was discussed in Section 7.3.2). The upper limit is often determined by such factors as saturation of the extract (e.g. the "free" iron or gypsum determinations) or by depletion of a solution in case of adsorption procedures (e.g. phosphate adsorption; cobaltihexamine or silver thiourea adsorption in single-extraction CEC methods). In such cases the liquid:sample ratio has to be adapted. To determine the measuring range of solutions the following procedure can be applied:
- Prepare a standard solution of the analyte in the relevant matrix (e.g. extractant) at a concentration beyond the highest expected concentration.
- Measure this solution and determine the instrument response.
- Dilute this standard solution 10× with the matrix solution and measure again.
- Repeat dilution and measuring until the instrument gives no response.
- Plot the response vs. the concentration.
- Estimate the useful part of the response graph. (If the dilution steps are too large to obtain a reliable graph, they need to be reduced, e.g. to 5×.)

In Figure 7-2 the useful parts of graphs 1 and 2 are obviously the linear parts (and for graph 2 perhaps up to concentration 8 if necessary). Sometimes a built-in curve corrector for the linearization of curved calibration plots can extend the range of application (e.g. in AAS). Graph 3 has no linear part but must and can still be used. A logarithmic plotting may be considered and in some cases an equation may be calculated by non-linear (polynomial) regression. It has to be decided on practical grounds what concentration can be accepted before the decreasing sensitivity renders the method inappropriate (with the knowledge that flat or even downward-bending ranges are useless in any case).
spectrometric techniques (FES, AAS). The selectivity is no problem as the useful spectral lines can be selected exactly with a monochromator or filters. The mutual interference can be suppressed by adding an excess of an easily ionizable element, such as cesium, which keeps the electron concentration in the flame constant. In chromatographic techniques (GC, HPLC) specificity is sometimes a problem in the analysis of complex compounds. In the validation report, selectivity and specificity are usually described rather than quantitatively expressed.
7.5.6 Recovery
To determine the effectiveness of a method (and also of the working range), recovery experiments can be carried out. Recovery can be defined as the 'fraction of the analyte determined after addition of a known amount of the analyte to a sample'. In practice, control samples are most commonly used for spiking. The sample as well as the spikes are analyzed at least 10 times, the results averaged and the relative standard deviation (RSD) calculated. For in-house validation the repeatability (replicates in one batch, see 7.5.2.2) is determined, whereas for quality control the within-laboratory reproducibility (replicates in different batches, see 7.5.2.3) is determined and the data recorded on Control Charts. The concentration level of the spikes depends on the purpose: for routine control work the level(s) will largely correspond with those of the test samples (recoveries at different levels may differ); a concentration midway the working range is a convenient choice. For the determination of a working range a wide range may be necessary, at least to start with (see 7.5.4). An example is the addition of ammonium sulphate in the Kjeldahl nitrogen determination. Recovery tests may reveal a significant bias in the method used and may prompt a correction factor to be applied to the analytical results. The recovery is calculated with:
recovery (%) = ((xs - x)/xadd) × 100% (7.25)
where
xs = mean result of spiked samples
x = mean result of unspiked samples
xadd = amount of added analyte

If a blank (sample) is used for spiking, then the mean result of the unspiked sample will generally be close to zero. In fact, such replicate analyses could be used to determine or verify the method detection limit (MDL, see 7.3.2). As has been mentioned before (Section 7.4.5), the recovery obtained with a spike may not be the same as that obtained with real samples, since the analyte may not be integrated in the spiked sample in the same manner as in real samples. Also, the form of the analyte with which the spike is made may present a problem, as different compounds and grain sizes representing the analyte may behave differently in an analysis.
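Equation (7.25) in code form; the spiked and unspiked results and the added amount below are invented for illustration:

```python
# Recovery (Eq. 7.25) from spiked and unspiked control samples.
# All values are hypothetical, in the same concentration units.
x_spiked = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2, 12.3, 11.7, 12.1, 12.0]
x_plain  = [7.1, 6.9, 7.0, 7.2, 6.8, 7.0, 7.1, 6.9, 7.0, 7.0]
x_add = 5.0   # amount of analyte added to each spiked sample

mean_s = sum(x_spiked) / len(x_spiked)   # mean of spiked samples
mean_u = sum(x_plain) / len(x_plain)     # mean of unspiked samples
recovery = (mean_s - mean_u) / x_add * 100   # Eq. 7.25, in %
print(round(recovery, 1))
```

A recovery close to 100% suggests no appreciable bias at this spike level; a consistent shortfall would prompt the correction factor mentioned in the text.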
the ruggedness is first tested by the initiating laboratory and subsequently in an interlaboratory trial. The ruggedness test is conveniently done with the so-called "Youden and Steiner partial factorial design", in which seven factors can be varied and analyzed in only eight replicate analyses. This efficient technique can also be used for within-laboratory validation. As an example the ammonium acetate CEC determination of soil will be taken. The seven factors could be, for instance:
A: With (+) and without (-) addition of 125 mg CaCO3 to the sample (corresponding with 5% CaCO3 content)
B: Concentration of saturation solution: 1 M (+) and 0.5 M (-) NH4OAc
C: Extraction time: 4 hours (-) and 8 hours (+)
D: Admixture of sea-sand (or celite): with (+) and without (-) 1 teaspoon of sand
E: Washing procedure: 2× (-) or 3× (+) with ethanol 80%
F: Concentration of washing ethanol: 70% (-) or 80% (+)
G: Purity of NH4OAc: technical grade (-) and analytical grade (+)

The matrix of the design looks as shown in Table 7-2. The eight subsamples are analyzed basically according to the SOP of the method. The variations in the SOP are indicated by the + or - signs denoting the high or low level, presence or absence of a factor, or otherwise stated conditions to be investigated. The eight obtained analytical results are Yi. Thus, sample (experiment) no. 1 receives all treatments A to G indicated with (+), sample no. 2 receives treatments A, B and D indicated by (+) and C, E, F and G indicated by (-), etc.

Table 7-2. The partial factorial design (seven factors) for testing ruggedness of an analytical method
Factors   Experiment
          1   2   3   4   5   6   7   8
A         +   +   +   +   -   -   -   -
B         +   +   -   -   +   +   -   -
C         +   -   +   -   +   -   +   -
D         +   +   -   -   -   -   +   +
E         +   -   +   -   -   +   -   +
F         +   -   -   +   +   -   -   +
G         +   -   -   +   -   +   +   -
Results   Y1  Y2  Y3  Y4  Y5  Y6  Y7  Y8
The effect of a factor is calculated as the difference between the mean result at its + level and the mean result at its - level; for factor A:

Effect A = (YA+ - YA-)/4 (7.26)

where
YA+ = sum of results Yi where factor A has a + sign (i.e. Y1 + Y2 + Y3 + Y4; n = 4)
YA- = sum of results Yi where factor A has a - sign (i.e. Y5 + Y6 + Y7 + Y8; n = 4)

The test for significance of the effect can be done in two ways:

1. With a t-test (6.4.3), in principle using the table with "two-sided" critical t values (App. 1, n = 4). When an effect in one direction is clearly to be expected, the one-sided test is applicable.

2. By checking if the effect exceeds the precision of the original procedure (i.e. if the effect exceeds the noise of the procedure). Most realistic and practical in this case would be to use scc, the within-laboratory standard deviation taken from a control chart (see Sections 7.5.2.3 and 8.3.2). Now, the standard deviation of the mean of four measurements can be taken as scc/√4 = scc/2 (see 6.3.4), and the standard deviation of the difference between two such means (i.e. the standard deviation of the effect calculated with Eq. 7.26) as √(scc²/4 + scc²/4) = scc/√2 ≈ 0.7 scc. The effect of a factor can be considered significant if it exceeds 2× this standard deviation, i.e.:
Effect > 1.4 × scc (7.27)
where scc is the standard deviation of the original procedure taken from the last complete control chart.

Note: Obviously, when this standard deviation is not available, such as in the case of a new method, another type of precision has to be used, preferably the within-laboratory reproducibility (see 7.5.2).

It is not always possible or desirable to vary seven factors. However, the discussed partial factorial design does not allow a reduction of factors. At most, one (imaginary) factor can be considered in advance to have a zero effect (e.g. the position of the moon). In that case, the design is the same as given in Table 7-2 but omitting factor G. For studying only three factors a design is also available. This is given in Table 7-3.

Table 7-3. The partial factorial design (three factors) for testing ruggedness of an analytical method
Experiment   Factors        Results
             A    B    C
1            +    +    +    Y1
2            -    +    -    Y2
3            +    -    -    Y3
4            -    -    +    Y4
where
YA+ = sum of results Yi where factor A has a + sign (i.e. Y1 + Y3; n = 2)
YA- = sum of results Yi where factor A has a - sign (i.e. Y2 + Y4; n = 2)

The test for significance of the effect can be done in the same way as described above for the seven-factor design, with the difference that here n = 2. If the relative effect has to be calculated (for instance for use as a correction factor), this must be done relative to the result of the original factor. Thus, in the above example of the CEC determination, if one is interested in the effect of reducing the concentration of the saturating solution (Factor B), the "reference" values are those obtained with the 1 M solution (denoted with + in column B) and the relative effect can be calculated with:
Effect B (%) = ((YB+ - YB-)/YB+) × 100% (7.29)
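The effect calculation (Eq. 7.26), the significance check (Eq. 7.27) and a relative effect in the spirit of Eq. (7.29) can be combined in one sketch. The design matrix is the standard eight-run Youden and Steiner layout of Table 7-2; the results Y1..Y8 and the control-chart standard deviation scc are invented for illustration:

```python
# Factor effects from the eight-run partial factorial design (Table 7-2).
DESIGN = {  # +1 / -1 level of each factor in experiments 1..8
    "A": [+1, +1, +1, +1, -1, -1, -1, -1],
    "B": [+1, +1, -1, -1, +1, +1, -1, -1],
    "C": [+1, -1, +1, -1, +1, -1, +1, -1],
    "D": [+1, +1, -1, -1, -1, -1, +1, +1],
    "E": [+1, -1, +1, -1, -1, +1, -1, +1],
    "F": [+1, -1, -1, +1, +1, -1, -1, +1],
    "G": [+1, -1, -1, +1, -1, +1, +1, -1],
}
Y = [20.2, 19.8, 20.4, 20.0, 18.9, 19.2, 19.0, 19.3]  # hypothetical CEC results
s_cc = 0.4            # hypothetical control-chart standard deviation

def sums(factor):
    """Sum of results at the + level and at the - level of a factor."""
    plus = sum(y for y, lv in zip(Y, DESIGN[factor]) if lv > 0)
    minus = sum(y for y, lv in zip(Y, DESIGN[factor]) if lv < 0)
    return plus, minus

effects = {f: (sums(f)[0] - sums(f)[1]) / 4 for f in DESIGN}        # Eq. 7.26
significant = {f: abs(e) > 1.4 * s_cc for f, e in effects.items()}  # Eq. 7.27

b_plus, b_minus = sums("B")
rel_effect_B = (b_plus - b_minus) / b_plus * 100                    # Eq. 7.29, %
print(round(effects["A"], 2), significant["A"], round(rel_effect_B, 2))
```

With these invented numbers, only factor A would exceed the 1.4 × scc noise threshold.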
The confidence in the results of partial factorial experiments can be increased by running duplicates or triplicates as discussed in Section 6.3.4. This is particularly useful here, since possible outliers may erroneously be interpreted as a "strong effect". Often a laboratory wants to check the influence of one factor only. Temperature is a factor which is particularly difficult to control in some laboratories, or is sometimes needlessly controlled at high cost simply because it is prescribed in the original method (but perhaps never properly validated). The recently published standard procedure for determining the particle-size distribution (ISO 11277) has not been validated in an interlaboratory trial. The procedure prescribes the use of an end-over-end shaker for dispersion. If up to now a reciprocating shaker has been used and the laboratory decides to adopt the end-over-end shaker, then in-house validation is indicated and a comparison of the two shaker types must be made and documented. If it is decided, after all, to continue with the reciprocating shaking technique (e.g. for practical reasons), then the laboratory must be able to show users of the data the influence of this step. Such validation must include all soil types to which the method is applied. The effect of a single factor can simply be determined by conducting a number of replicate analyses (n ≥ 10) with and without the factor, or at two levels of the factor, and comparing the results with the F-test and t-test (see 6.4). Such a single effect may thus be expressed in terms of bias and precision.
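A single-factor check as described above might look like the following sketch; the two sets of replicate results are invented, and the pooled t-test (as in Eq. 6.13/6.14) is used because the F-test shows no significant difference between the variances:

```python
import math
from statistics import mean, stdev

# Single-factor check: replicate results with a reciprocating vs an
# end-over-end shaker (hypothetical clay percentages, n = 10 each).
recip = [24.1, 23.8, 24.5, 24.0, 23.9, 24.2, 24.4, 23.7, 24.1, 24.3]
endov = [25.0, 24.8, 25.3, 24.9, 25.1, 24.7, 25.2, 25.0, 24.9, 25.1]

m1, m2 = mean(recip), mean(endov)
s1, s2 = stdev(recip), stdev(endov)

f_cal = (max(s1, s2) / min(s1, s2)) ** 2   # F-test on the variances (Eq. 6.11)

# Pooled t-test, valid when the F-test shows comparable variances
n1, n2 = len(recip), len(endov)
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
t_cal = abs(m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))

T_TABLE = 2.10   # two-sided critical t, df = 18, 95% confidence
print(round(m2 - m1, 2), round(t_cal, 1), t_cal > T_TABLE)
```

Here the shaker type would show up as a significant bias of about 0.9 units, while the precision of the two series is comparable.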
7.5.8 Interferences
Many analytical methods are to a greater or lesser extent susceptible to interferences of various kinds. Proper validation should include documentation of such influences. Most prominent are matrix effects which may either reduce or enhance analytical results (and are thus a form of reduced selectivity). Ideally, such interferences are quantified as bias and corrected for, but often this is a
tedious affair or even impossible. Matrix effects can be quantified by conducting replicate analyses at various levels and with various compositions of (spiked) samples, or they can be nullified by imitating the test sample matrix in the standards, e.g. in X-ray fluorescence spectroscopy. However, the matrix of test samples is often unknown beforehand. A practical qualitative check in such a case is to measure the analyte at two levels of dilution: usually the signal of the analyte and of the interference are not proportional. Other well-known interferences are, for example, the dark colour of extracts in the colorimetric determination of phosphate, and the presence of salts, lime, or gypsum in the CEC determination. A colour interference may be avoided by measuring at another wavelength (in the case of phosphate: try 880 nm). Sometimes the only way to avoid interference is to use another method of analysis. If it is thought that an interference can be singled out and determined, it can be quantified as indicated for ruggedness in the previous section.
7.5.9 Practicability
When a new method is proposed or when there is a choice of methods for a determination, it may be useful if an indication or description of the ease or tediousness of the application is available. Usually the practicability can be derived from the detailed description of the procedure. The problems are in most cases related to the availability and maintenance of certain equipment and the required staff or skills. Also, the supply of required parts and reagents is not always assured, nor the uninterrupted supply of stable power. In some countries, for instance, high purity grades cannot always be obtained, some chemicals cannot be kept (e.g. sodium pyrophosphate in a hot climate) and even the supply of a seemingly common reagent such as ethanol can be a problem. If such limitations are known, it is useful if they are mentioned in the relevant SOPs or validation report.
type of SOP, directly determine the product of a laboratory, some specific aspects relating to them are discussed here. As was outlined in Chapter 2, instructions in SOPs should be written in such a way that no misunderstanding or ambiguity exists as to the execution of the procedure. Thus, much of the responsibility (though not all) lies with the author of the procedure. Even if the author and user are one and the same person, which should normally be the case (see 2.2), such misunderstanding may be propagated, since the author usually draws on the literature or documents written by someone else. Therefore, although instructions should be as brief as possible, they should at the same time be as extensive as necessary. As an example we take the weighing of a sample, a common instruction in many analytical procedures. Such an instruction could read:
1. Weigh 5.0 g of sample into a 250 ml bottle.
2. Add 100 ml of extracting solution and close bottle.
3. Shake overnight.
4. Etc., etc.

Comment 1

According to general analytical practice the amount of 5.0 g means "an amount between and including 4.95 g and 5.05 g" (4.95 ≤ weight ≤ 5.05), since less than 4.95 would round to 4.9 and more than 5.05 would round to 5.1 (note that 5.05 rounds to 5.0 and not to 5.1). Some analysts, particularly students and trainees, take the amount of 5.0 g too literally and set out on a lengthy process of adding and subtracting sample material until the balance reads "5.0" or perhaps even "5.00". Not only is this procedure tedious, the sample may also become biased as particles of different size tend to segregate during this process. To prevent such an interpretation, the prefixes "approximately", "approx." or "ca." (circa) are often used, e.g. "approx. 5.0 g". As this, in turn, introduces a seeming contradiction between "5.0" (with a decimal, so quite accurate) and "approx." ('it doesn't matter all that much'), the desired accuracy must be stated: "weigh approx. 5.0 g (accuracy ± 0.01 g) into a 250 ml bottle".
The notation 5.0 g can be replaced by 5 g when the sample size is less critical (in the present case, for instance, if the sample:liquid ratio is not very critical). Sometimes it may even be possible to use "weigh 3 - 5 g of sample (accuracy ± 0.1 g)". The accuracy needs to be stated when the actual sample weight is used in the calculation of the final result; otherwise it may be omitted.

Comment 2

The "sample" needs to be specified. A convenient and correct way is to make reference to a SOP in which the preparation of the sample material is described. This is the more formal version of the common practice in many laboratories where the use of the sample is implied and its preparation is described elsewhere in the laboratory manual of analytical procedures. In any case, there should be no doubt about the sample material to be used. When material other than the usual "laboratory sample" or "test sample" is used, the preparation must be described and the nature indicated, e.g. "field-moist fine earth", "fraction > 2 mm" or "nodules". When drafting a new procedure or an own version of a standard procedure, it must be considered whether the moisture content of the sample used is relevant for the final result. If so, a moisture correction factor should be part of the calculation step. In certain cases where the sample contains a considerable amount of
water (moist highly humic samples; andic material), this water will influence the soil:liquid ratio in certain extraction or equilibration procedures. Validation of such procedures is then indicated.

Comment 3

The "250 ml bottle" needs to be specified also. This is usually done in the section "Apparatus and glassware" of the SOP. If, in general, materials are not specified, then it is implied that the type is unimportant for the procedure. However, in shaking procedures, the kind, size and shape of bottles may have a significant influence on the results. In addition, the kind (composition) of glass is sometimes critical, e.g. for the boron determination.

Comment 4

To the instruction "Add 100 ml of extracting solution" the same considerations apply as discussed for the sample weighing. The accuracy needs to be specified, particularly when automatic dispensers are used. The accuracy may be implicit if the equipment to be used is stated, e.g. "add 100 ml solution by graduated pipette" or "volumetric pipette" or "with a 100 ml measuring cylinder". If another means of adding the solution is preferred, its accuracy should equal or exceed that of the stated equipment.

Comment 5

The instruction "shake overnight" is ambiguous. It must be known that "overnight" is equivalent to "approximately 16 hrs", namely from 5 p.m. till 9 a.m. the next morning. It is implied that this time-span is not critical, but generally the deviation should not be more than, say, two hours. In case of doubt, this should be validated with a ruggedness test. More critical in many cases is the term "shake", as this can be done in many different ways. In the section "Apparatus" of the SOP the type of shaking machine is stated, e.g. reciprocating shaker or end-over-end shaker.
For the reciprocating shaker the instruction should include the shaking frequency (in strokes per minute), the amplitude (in mm or cm) and the position of the bottles (standing up, lying length-wise or perpendicular to the shaking direction). For an end-over-end shaker usually only the frequency or speed (in rpm) is relevant.
- Descriptive title, purpose, and identification details
- Study director and further personnel
- Sponsor or client
- Work plan with starting date and duration
- Materials and methods to be used
- Study protocol and SOPs (including statistical treatment of data)
- Protocols for interim reporting and inspection
- Way of reporting and filing of results
- Authorization by the management (i.e. signature)

A work plan or subroutines can often be clarified by means of a flow diagram. Some of the most used symbols in flow diagrams for procedures in general, including analytical procedures, are given in Figure 7-3. An example of a flow sheet for a research plan is given in Fig. 7-4.

Fig. 7-3. Some common symbols for flow diagrams.
2. Execution of the work

The work must be carried out according to the plan, protocols and SOPs. All observations must be recorded, including errors and irregularities. Changes of plan have to be reported to the IA and, if there are budgetary implications, also to the management. The study leader must have control of and be informed about the progress of the work and, particularly in larger projects, be prepared for inspection by the IA.

Fig. 7-4. Design of flow diagram for study project.
3. Reporting As soon as possible after completion of the experimental work and verification of the quality control data the results are calculated. Together with a verification statement of the IA, possibly after corrections have been made, the results can be reported. The copyright and authorship of a possible publication would have been arranged in the plan. The report should contain all information relevant for the correct interpretation of the results. To keep a report digestible, used procedures may be given in abbreviated form with reference to the original protocols or SOPs. Sometimes, relevant information turns up afterwards (e.g. calculation errors). Naturally, this should be reported, even if the results have already been used. It is useful and often rewarding if after completion of a study project an evaluation is carried out by the study team. In this way a next job may be performed better.
SOPs
VAL 09-2 - Validation of CEC determination with NH4OAc
METH 006 - Determination of nitrogen in soil with micro-Kjeldahl
1 PURPOSE
To determine the performance characteristics of the CEC determination with ammonium acetate (pH 7) using the mechanical extractor. The following parameters have been considered: bias, precision, working range, ruggedness, interferences, practicability.

2 REQUIREMENTS
See SOP METH 09-2 (Cation Exchange Capacity and Exchangeable Bases with ammonium acetate and mechanical extractor).

3 PROCEDURES

3.1 Analytical procedure
The basic procedure followed is described in SOP METH 09-2, with variations and numbers of replicates as indicated below. Two Control Samples have been used: LABEX 6, a Nitisol (clay 65%, CEC 20 cmolc/kg) and LABEX 2, an Acrisol (clay 25%; CEC 7 cmolc/kg); further details of these control samples are given in SOP RF 031 (List of Control Samples).

3.2 Bias
The CEC was determined 10× on both control samples. Reference is the mean value for the CEC obtained on these samples by 19 laboratories in an interlaboratory study.

3.3 Precision
Obtained from the replicates of 3.2.

3.4 Working range
The Method Detection Limit (MDL) was calculated from 10 blank determinations. Determination of the upper limit is not relevant (percolates beyond the calibration range are rare and can be brought within range by dilution).

3.5 Ruggedness
A partial factorial design with seven factors was used. The experiments were carried out in duplicate and the factors varied are as follows:
A: With (+) and without (-) addition of 125 mg CaCO3 (corresponding with 5% CaCO3 content)
B: Concentration of saturating solution: 1 M (+) and 0.5 M (-) NH4OAc
C: Extraction time: 4 hours (-) and 8 hours (+)
D: Admixture of sea-sand (or celite): with (+) and without (-) 1 teaspoon of sand
E: Washing procedure: 2× (-) or 3× (+) with ethanol 80%
F: Concentration of ethanol for washing free of salt: 70% (-) or 80% (+)
G: Purity of NH4OAc: technical grade (-) and analytical grade (+)
3.6 Interferences
Two factors particularly interfere in this determination: 1. high clay content (problems with the efficiency of percolation) and 2. presence of CaCO3 (competing with the saturating index cation). The first was addressed by the difference in clay content of the two samples as well as by Factor D in the ruggedness test, the second by Factor A of the ruggedness test.
3.7 Practicability
The method is famous for its wide application and ill-famed for its limitations. Some of the most prominent aspects in this respect are considered.

4 RESULTS
As results may have to be produced as a document accompanying analytical results (e.g. on request of clients), they are presented here in a model format suiting this purpose. In the present example, where two different samples have been used, the results for both samples may be given on one form, or for each sample on a separate form. For practical reasons, abbreviated reports may be released omitting irrelevant information. (The full report should always be kept!)
LOGO   METHOD VALIDATION FORM   No.: VAL RES 09-2   Version: 1   Page: 1 of 1   Date: 96-11-23   File:
1 TITLE or DESCRIPTION Validation of cation exchange capacity determination with NH4OAc pH 7 method as described in VAL 09-2 dd. 96-09-19. 2 RESULTS
2.1 Bias (Accuracy): Result of calculation with Eq. (7.14) or (7.16) of Guidelines.
2.2 Precision
    Repeatability: Result of calculation with Eq. (7.17) or (7.19).
    Within-lab reproducibility: Result of calculation with Eq. (7.23) (if Control Charts are available).
2.3 Working range: Result of calculation as exemplified by Table 7-1 in Section 7.3.2 of Guidelines.
2.4 Ruggedness: Results of calculations with Eq. (7.26) or (7.29).
2.5 Interferences: In this case mainly drawn from Ruggedness test.
2.6 Practicability: Special equipment necessary: mechanical extractor; substantial amounts of ethanol required; washing procedure not always complete, particularly in high-clay samples, requiring thorough check.
2.7 General observations:

Author:                    Sign.:
QA Officer (sign.):        Date of Expiry:
1. SCOPE
This procedure describes the determination of nitrogen with the micro-Kjeldahl technique. It is intended to include all soil nitrogen (including adsorbed NH4+) except that in nitrates.

2. RELATED DOCUMENTS
2.1 Normative references
The following standards contain provisions referred to in the text.
ISO 3696 Water for analytical laboratory use. Specification and test methods.
ISO 11464 Soil quality - Pretreatment of samples for physico-chemical analysis.
3. PRINCIPLE
The micro-Kjeldahl procedure is followed. The sample is digested in sulphuric acid and hydrogen peroxide with selenium as catalyst, whereby organic nitrogen is converted to ammonium sulphate. The solution is then made alkaline and ammonia is distilled. The evolved ammonia is trapped in boric acid and titrated with standard acid.

4. APPARATUS AND GLASSWARE
4.1 Digester (Kjeldahl digestion tubes in heating block)
4.2 Steam-distillation unit (fitted to accept digestion tubes)
4.3 Burette 25 ml

5. REAGENTS
Use only reagents of analytical grade and deionized or distilled water (ISO 3696).
5.1 Sulphuric acid - selenium digestion mixture. Dissolve 3.5 g selenium powder in 1 L concentrated (96%, density 1.84 g/ml) sulphuric acid by mixing and heating at approx. 350°C on a hot plate. The dark colour of the suspension turns into clear light-yellow. When this is reached, continue heating for 2 hours.
5.2 Hydrogen peroxide, 30%.
5.3 Sodium hydroxide solution, 38%. Dissolve 1.90 kg NaOH pellets in 2 L water in a heavy-walled 5 L flask. Cool the solution with the flask stoppered to prevent absorption of atmospheric CO2. Make up the volume to 5 L with freshly boiled and cooled deionized water. Mix well.
5.4 Mixed indicator solution. Dissolve 0.13 g methyl red and 0.20 g bromocresol green in 200 ml ethanol.
5.5 Boric acid-indicator solution, 1%. Dissolve 10 g H3BO3 in 900 ml hot water, cool and add 20 ml mixed indicator solution. Make to 1 L with water and mix thoroughly.
5.6 Hydrochloric acid, 0.010 M standard. Dilute standard analytical concentrate ampoule according to instruction.
6. SAMPLE
Air-dry fine earth (<2 mm) obtained according to ISO 11464 (or refer to own procedure). Mill approx. 15 g of this material to pass a 0.25 mm sieve. Use part of this material for a moisture determination according to ISO 11465 and PROC 002.

7. PROCEDURE
7.1 Digestion
1. Weigh 1 g of sample (accuracy 0.01 g) into a digestion tube. Of soils rich in organic matter (>10%), 0.5 g is weighed in (see Remark 1). In each batch, include two blanks and a control sample.
2. Add 2.5 ml digestion mixture.
3. Add successively 3 aliquots of 1 ml hydrogen peroxide. The next aliquot can be added when frothing has subsided. If frothing is excessive, cool the tube in water. Note: in Steps 2 and 3 use a measuring pipette with balloon or a dispensing pipette.
4. Place the tubes on the heater and heat for about 1 hour at moderate temperature (200°C).
5. Turn up the temperature to approx. 330°C (just below boiling temp.) and continue heating until the mixture is transparent (this should take about two hours).
6. Remove tubes from heater, allow to cool and add approx. 10 ml water with a wash bottle while swirling.
7.2 Distillation
1. Add 20 ml boric acid-indicator solution with a measuring cylinder to a 250 ml beaker and place the beaker on the stand beneath the condenser tip.
2. Add 20 ml NaOH 38% with a measuring cylinder to the digestion tube and distil for about 7 minutes, during which approx. 75 ml distillate is produced. Note: the distillation time and amount of distillate may need to be increased for complete distillation (see Remark 2).
3. Remove the beaker from the distiller, rinse the condenser tip, and titrate the distillate with 0.01 M HCl until the colour changes from green to pink. Note: when using an automatic titrator, set the end-point pH at 4.60.

Remarks
1. The described procedure is suitable for soil samples with a nitrogen content of up to 10 mg N. This corresponds with a carbon content of roughly 10% C. Of soils with higher contents, less sample material is weighed in.
Sample sizes of less than 250 mg should not be used because of sample bias.
2. The capacity of the procedure with respect to the amount of N that can be determined depends to a large extent on the efficiency of the distillation assembly. This efficiency can be checked, for instance, with a series of increasing amounts of (NH4)2SO4 or NH4Cl containing 0-50 mg N. 8. CALCULATION
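Clause 8 computes total nitrogen from the titration volumes and the sample weight. A minimal sketch of this calculation, with illustrative input values (not data from the SOP):

```python
# Total nitrogen (mass %) from a micro-Kjeldahl titration.
# The constant 1.4 = 14 * 10^-3 * 100%, with 14 the atomic weight of N.

def total_nitrogen_percent(a, b, M, s, mcf):
    """
    a   : ml HCl used for the sample titration
    b   : ml HCl used for the blank titration
    M   : molarity of the HCl
    s   : air-dry sample weight in g
    mcf : moisture correction factor
    """
    return (a - b) * M * 1.4 * mcf / s

# Example: 14.2 ml titrant, 0.3 ml blank, 0.010 M HCl, 1 g sample, mcf 1.02
print(total_nitrogen_percent(14.2, 0.3, 0.010, 1.0, 1.02))
```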
%N = ((a - b) × M × 1.4 × mcf) / s

where
a = ml HCl required for titration of sample
b = ml HCl required for titration of blank
s = air-dry sample weight in gram
M = molarity of HCl
1.4 = 14 × 10^-3 × 100% (14 = atomic weight of nitrogen)
mcf = moisture correction factor

9. VALIDATION PARAMETERS
9.1 Bias: -3.1% rel. (sample ISE 921, x = 2.80 g/kg N, n = 5)
9.2 Within-lab reproducibility: RL = 2.8 × s = 2.5% rel. (sample LABEX 38, x = 2.59 g/kg N, n = 30)
9.3 Method Detection Limit: 0.014 mg N or 0.0014% N
10. TEST REPORT
The report of analytical results shall contain the following information:
- the result(s) of the determination with identification of the corresponding sample(s);
- a reference to this SOP (if requested, a brief outline such as given under clause 3: Principle);
- possible peculiarities observed during the test;
- all operations not mentioned in the SOP that can have affected the results.

11. REFERENCES
Hesse, P.R. (1971) A Textbook of Soil Chemical Analysis. John Murray, London.
Bremner, J.M. and Mulvaney, C.S. (1982) Nitrogen - Total. In: Page, A.L., Miller, R.H. and Keeney, D.R. (eds.) Methods of Soil Analysis, Part 2: Chemical and Microbiological Properties, 2nd ed. Agronomy Series 9, ASA-SSSA, Madison.
ISO 11261 Soil quality - Determination of total nitrogen - Modified Kjeldahl method.
8.1 Introduction
In the preceding chapters basic elements of quality assurance were discussed. All activities associated with these aspects have one aim: the production of reliable data with a minimum of errors. The present discussion is concerned with activities to verify that a laboratory produces such reliable data consistently. To this end an appropriate programme of quality control (QC) must be implemented. Quality control is the term used to describe the practical steps undertaken to ensure that errors in the analytical data are of a magnitude appropriate for the use to which the data will be put. This means that the (unavoidable) errors made are quantified to enable a decision on whether they are of an acceptable magnitude, and that unacceptable errors are discovered so that corrective action can be taken and erroneous data are not released. In short, quality control must detect both random and systematic errors. In principle, quality control for analytical performance consists of two complementary activities: internal QC and external QC. The internal QC involves the in-house procedures for continuous monitoring of operations and systematic day-to-day checking of the produced data to decide whether these are reliable enough to be released. The procedures primarily monitor the bias of data with the help of control samples and the precision by means of duplicate analyses of test samples and/or of control samples. These activities take place at batch level (second-line control). The external QC involves reference help from other laboratories and participation in national and/or international interlaboratory sample and data exchange programmes (proficiency testing; third-line control). The present chapter focuses mainly on the internal QC as this has to be organised by the laboratory itself. External QC, just as indispensable as the internal QC, is dealt with in Chapter 9.
Analytical results, either direct readings (e.g. pH) or results of one or more calculation steps associated with most analytical methods, are often reported with several digits after the decimal point. In many cases this suggests a higher significance than is warranted by the combination of procedure and test materials. Since clear rules for rounding and for determining the number of significant decimals are available, these will be given here.
8.2.1 Rounding
To allow a better overview and interpretation, to conserve paper (more columns per page), and to simplify subsequent calculations, figures should be rounded up or down, leaving out insignificant digits. To produce minimal bias, by convention rounding is done as follows:
- If the last digit is 4 or less, retain the preceding digit;
- if it is 6 or more, increase the preceding digit by 1;
- if the last digit is 5, the preceding digit is made even.
Examples:
pH = 5.72 rounds to 5.7
pH = 5.76 rounds to 5.8
pH = 5.75 rounds to 5.8
pH = 5.85 rounds to 5.8
When calculations and statistics have to be performed, rounding must be done at the end. Remark: Traditionally, and by most computer calculation programs, when the last number is 5, the preceding number is raised by 1. There is no objection to this practice as long as it causes no disturbing bias, e.g. in surveys of attributes or effects.
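The round-half-to-even convention described above can be reproduced with Python's decimal module. Values are passed as strings because binary floats cannot represent numbers such as 5.85 exactly:

```python
# Round-half-to-even ("banker's rounding") to one decimal place.
from decimal import Decimal, ROUND_HALF_EVEN

def round_half_even(value, unit="0.1"):
    """Round a numeric string to the given decimal unit, ties to even."""
    return Decimal(value).quantize(Decimal(unit), rounding=ROUND_HALF_EVEN)

for v in ["5.72", "5.76", "5.75", "5.85"]:
    print(v, "rounds to", round_half_even(v))
# reproduces the pH examples: 5.7, 5.8, 5.8, 5.8
```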
Then choose the decimal unit a equal to the largest decimal unit (...; 100; 10; 1; 0.1; 0.01; etc.) which does not exceed the calculated standard deviation s.
After having done this for each type of analysis at different concentration or intensity levels, it will become apparent what the last significant figure or decimal is which may be reported. This exercise has to be repeated regularly but is certainly indicated when a new technique is introduced or when analyses are performed in a non-routine way or on non-routine test materials.

Example

Table 8-1. A series of repeated CEC determinations (in cmolc/kg) on a control sample, each in a different batch.
Data    Rounded
6.55    6.6
7.01    7.0
7.25    7.2
7.83    7.8
6.95    7.0
7.16    7.2
7.83    7.8
7.05    7.0
6.83    6.8
7.63    7.6
x = (Σ xi) / n    (6.1)

s = √( Σ(xi - x)² / (n - 1) )    (6.2)

RSD = 100 × s / x (%)

where
x = mean of set of n results
s = standard deviation of set of results
RSD = relative standard deviation
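For the Table 8-1 data the whole exercise (mean, standard deviation, RSD, and the largest decimal unit not exceeding s) can be sketched as:

```python
# Significant-figure check for the repeated CEC determinations of Table 8-1.
import math
from statistics import mean, stdev

data = [6.55, 7.01, 7.25, 7.83, 6.95, 7.16, 7.83, 7.05, 6.83, 7.63]

x = mean(data)
s = stdev(data)                            # (n - 1) in the denominator
rsd = 100 * s / x
unit = 10 ** math.floor(math.log10(s))     # largest power of 10 <= s

print(f"mean = {x:.2f}, s = {s:.2f}, RSD = {rsd:.1f}%")
print("last significant decimal unit:", unit)
```

Here s is about 0.43, so the unit is 0.1: the results are reported with one decimal, as in the "Rounded" column of the table.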
8.3.1 Introduction
As stated in Section 8.1, an internal system for quality control is needed to ensure that valid data continue to be produced. This implies that systematic checks, e.g. per day or per batch, must show that the test results remain reproducible and that the methodology is actually measuring the analyte or attribute in each sample. An excellent and widely used system of such quality control is the application of (Quality) Control Charts. In analytical laboratories such as soil, plant and water laboratories, separate control charts can be used for analytical attributes, for instruments and for analysts. Although several types of control charts can be applied, the present discussion will be restricted to the two most usual types: 1. the Control Chart of the Mean for the control of bias; 2. the Control Chart of the Range of Duplicates for the control of precision. For the application of quality control charts it is essential that at least Control Samples are available, and preferably also (certified) Reference Samples. As the latter are very expensive and, particularly in the case of soil samples, still hard to obtain, laboratories usually have to rely largely on (home-made) control samples. The preparation of control samples is dealt with in Section 8.4.
8.3.2 Control Chart of the Mean

8.3.2.1 Principle
In each batch of test samples at least one control sample is analyzed and the result is plotted on the control chart of the attribute and the control sample concerned. The basic construction of this Control Chart of the Mean is presented in Fig. 8-1. (Other names are Mean Chart, x-Chart, Levey-Jennings, or Shewhart Control Chart). This shows the (assumed) relation with the normal distribution of the data around the mean. The interpretation and practical use of control charts is based on a number of rules derived from the probability statistics of the normal distribution. These rules are discussed in 8.3.2.3 below. The basic assumption is that when a control result falls within a distance of 2s from the mean, the system was under control and the results of the batch as a whole can be accepted. A control result beyond the distance of 2s from the mean (the "Warning Limit") signals that something may be wrong or tends to go wrong, while a control result beyond 3s (the "Control Limit" or "Action Limit") indicates that the system was statistically out of control and that the results have
to be rejected: the batch has to be repeated after sorting out what went wrong and after correcting the system. Fig. 8-1. The principle of a Control Chart of the Mean. UCL = Upper Control Limit (or Upper Action Limit). LCL = Lower Control Limit (or Lower Action Limit). UWL = Upper Warning Limit. LWL = Lower Warning Limit.
Apart from test results of control samples, control charts can be used for quite a number of other types of data that need to be controlled on a regular basis, e.g. blanks, recoveries, standard deviations, instrument response. A model for a Mean Chart is given. Note. The limits at 2s and 3s may be too strict or not strict enough for particular analyses used for particular purposes. A laboratory is free to choose other limits for analyses. Whatever the choice, this should always be identifiable on the control chart (and stated in the SOP or protocol for the use of control charts and consequent actions). Fig. 8-2. A filled-out control chart of the mean of a control sample.
Example

In ten consecutive batches of test samples the CEC of a control sample is determined. The results are: 10.4; 11.6; 10.8; 9.6; 11.2; 11.9; 9.1; 10.4; 10.3; 11.6 cmolc/kg respectively. Using Equations (6.1) and (6.2), the following parameters for this set of data are obtained: mean x = 10.7 cmolc/kg and standard deviation s = 0.91. These are the initial parameters for a new control chart (see Fig. 8-2) and are recorded in the second upper right box of this chart ("data previous chart"). The Mean is drawn as a dashed (nearly) central line. The Warning and Action Limits are calculated in the left lower box, and the corresponding lines drawn as dashed and continuous lines respectively (the Action Line may be drawn in red). The vertical scale is chosen such that the range x ± 3s is roughly 2.5 to 4 cm. It may turn out, in retrospect, that one (or more) of the initial data lies beyond an initial Action Limit. This result should not have been used for the initial calculations. The calculations then have to be repeated without this result. Therefore, it is advisable to have a few more than ten initial data. The procedure for starting a control chart should be laid down in a SOP.
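Initiating the chart from the ten results of the example and classifying new control results against the 2s/3s limits can be sketched as:

```python
# Mean Chart set-up from the initial CEC results of the example.
from statistics import mean, stdev

initial = [10.4, 11.6, 10.8, 9.6, 11.2, 11.9, 9.1, 10.4, 10.3, 11.6]

x = mean(initial)                  # about 10.7 cmolc/kg
s = stdev(initial)                 # about 0.91

UWL, UCL = x + 2 * s, x + 3 * s    # Upper Warning / Upper Control (Action) Limit
LWL, LCL = x - 2 * s, x - 3 * s    # lower counterparts

def check(result):
    """Classify a control result against the chart limits."""
    if not LCL <= result <= UCL:
        return "action: reject batch"
    if not LWL <= result <= UWL:
        return "warning"
    return "accept"

print(f"mean = {x:.1f}, s = {s:.2f}")
print(check(10.5), check(12.6), check(13.9))
```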
- 1. One control result beyond an Action Limit.
- 2. Two successive control results beyond the same Warning Limit.
- 3. Ten successive control results are on the same side of the Mean. (Some laboratories apply six results.)
- 4. Whenever results seem unlikely (plausibility check).
The Warning Rule is exceeded by mere chance in less than 5% of the cases. The chance that the Rejection Rules are violated on purely statistical grounds can be calculated as follows:
Rule 1: 0.3%
Rule 2: 0.5 × (0.05)² × 100% = 0.1%
Rule 3: (0.5)^10 × 100% = 0.1%
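These chance figures follow directly from the normal distribution (about 0.3% of results fall beyond ±3s and about 5% beyond ±2s); a minimal sketch:

```python
# Probability of violating each rejection rule by mere chance,
# assuming normally distributed control results.

p_rule1 = 0.003              # one result beyond an Action Limit (beyond +/- 3s)
p_rule2 = 0.5 * 0.05 ** 2    # two successive results beyond the same Warning Limit
p_rule3 = 0.5 ** 10          # ten successive results on the same side of the mean

for rule, p in (("Rule 1", p_rule1), ("Rule 2", p_rule2), ("Rule 3", p_rule3)):
    print(f"{rule}: {100 * p:.2f}%")

# The 'six results' variant of Rule 3 gives 0.5**6 = 1.6%, which brings
# the total chance of rejection close to the 2% mentioned in the text.
print(f"six-result variant of Rule 3: {100 * 0.5 ** 6:.1f}%")
```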
Thus, only less than 0.5% of the results will be rejected by mere chance. (This increases to 2% if in Rule 3 'six results on the same side of the mean' is applied.) If any of the four rejection rules is violated, the following actions should be taken:
- Repeat the analysis; if the next point is satisfactory, continue the analysis. If not, then
- Investigate the cause of the exceedance.
- Do not use the results of the batch, run, day or period concerned until the cause is traced. Only use the results if rectification is justified (e.g. when a calculation error was made).
- If no rectification is possible, after elimination of the source of the error, repeat the analysis of the batch(es) concerned. If this next point is satisfactory, the analysis can be continued.
Commonly, outliers are caused by simple errors such as calculation or dilution errors, use of wrong standard solutions or dirty glassware. If there is evidence of such a cause, then this outlier can be put on the chart but may not be used in calculating the statistical parameters of the control chart. These events should be recorded on the chart in the box "Remarks". If the parameters are calculated automatically, the outlier value is not entered. Rejection Rule 3 may pose a particular problem. If after the 10th successive result on one side of the mean it appears that a systematic error has entered the process, the acceptance of the previous batches has to be reconsidered. If they cannot be corrected, they may have to be repeated (if this is still possible: samples may have deteriorated). Also, the customer(s) may have to be informed. Most probably, however, problems of this type are discovered at an earlier stage by other Quality Control tools such as excessive blank readings, the use of independent standard solutions, instrument calibrations, etc.
In addition, by consistent inspection of the control chart three or four consecutive control results at the same side of the mean will attract attention and a shift (see below) may already then be suspected. Rejection Rule 4 is a special case. Unlike the other rules this is a subjective rule based on personal judgement of the analyst and the officer charged with the final screening of the results before release to the customer. Both general and specific knowledge about a sample and the attribute(s) may ring a bell when
certain test results are thought to be unexpectedly or impossibly high or low. Also, results may be contradictory, sometimes only noticed by a complaining client. Obviously, much of the success of the application of this rule depends on the available expertise.

Note. A very useful aspect of Quality Control of Data falling under Rejection Rule 4 is the cross-checking of analytical results obtained for one sample (or, sometimes, for a sequence or a group of samples belonging together, e.g. a soil profile or parts of one plant). Certain combinations of data can be considered impossible or highly suspect. For instance, a pH value of 8 and a carbonate content of zero is a highly unlikely combination in soils and should arouse enough suspicion for closer examination and possibly for rejection of either or both results. A number of such contradictions or improbabilities can be built into computer programs and used in automatic cross-checking routines after results are entered into a database. Ideally, these cross-checks are built into a LIMS (Laboratory Information Management System) used by the laboratory. While all LIMSes have options to set ranges within which results of attributes are acceptable, cross-checking of attributes is not a common feature. An example of a LIMS with cross-checks for soil attributes is SOILIMS.

Most models of control charts accommodate 30 entries. When a chart is full, a new chart must be started. On the new chart the parameters of the just completed old chart need to be filled in. This is shown in Fig. 8-2. Calculate the "Data this chart" of the old chart and fill these in on the old chart. Perform the two-sided F-test and t-test (see Section 6.4) to check if the completed chart agrees with the previous data. If this is the case, calculate "Data all charts" by adding the "Data this chart" to the "Data previous charts". These newly calculated "Data all charts" of the completed old chart are the "Data previous charts" of the new chart.
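The agreement check when a chart is full can be sketched as below. The two data sets are illustrative, and the quoted critical values are table values for these particular sizes at 95% confidence; for other sizes they must be looked up.

```python
# Two-sided F-test on the standard deviations, then a two-sample t-test
# on the means, of "Data this chart" against "Data previous charts".
from statistics import mean, stdev
import math

def charts_agree(old, new, f_crit, t_crit):
    s_old, s_new = stdev(old), stdev(new)
    f = max(s_old, s_new) ** 2 / min(s_old, s_new) ** 2
    if f > f_crit:
        return False                  # precision has changed
    n1, n2 = len(old), len(new)
    sp2 = ((n1 - 1) * s_old ** 2 + (n2 - 1) * s_new ** 2) / (n1 + n2 - 2)
    t = abs(mean(old) - mean(new)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t <= t_crit                # False signals a shift of the mean

# Illustrative data: previous-chart results vs the just-completed chart.
old = [10.4, 11.6, 10.8, 9.6, 11.2, 11.9, 9.1, 10.4, 10.3, 11.6]
new = [10.6, 10.9, 11.3, 10.2, 10.8, 11.5, 9.8, 10.6, 10.1, 11.0]

# Table values for n1 = n2 = 10: F(0.975; 9, 9) = 4.03, t(0.975; 18) = 2.10.
print(charts_agree(old, new, f_crit=4.03, t_crit=2.10))
```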
Using these data, the new chart can now be initiated by drawing the new control lines as described in 8.3.2.2. Shift In the rare case that the F-test and/or the t-test will not allow the data of a completed control chart to be incorporated in the set of previous data, there is a problem. This has to be resolved before the analysis of the attribute in question can be continued. As indicated above, such a change or shift may have various causes, e.g. introduction of new equipment, instability of the control sample, use of a wrong standard, wrong execution of the method by a substitute analyst. Also, when there is a considerable time interval between batches such a shift may occur (mind the expiry date of reagents!). However, when the control chart is inspected in a proper and consistent manner, usually such errors are discovered before they are revealed by the F and t-test. Drift A less conspicuous and therefore perhaps greater danger than incidental errors or shifts is a gradual change in accuracy or precision of the results. An upward or downward trend or drift of the mean or a gradual increase in the standard deviation may be too small to be revealed by the F or t-test but may be substantial over time. Such a drift could be discovered if a control chart were much longer, say some hundreds of observations. A way to imitate this extension of the horizontal scale is to make a "master" control chart with the values of x and s of the normal control charts. Such a compressed control chart
could be referred to as a "Control Chart of the Trend" and is particularly suitable for a visual inspection of the trend. An upward trend can be suspected in Figure 8-2. Indeed, the mean of the first fifteen entries is 10.59 vs. 10.97 cmolc/kg for the last fifteen entries, implying a relative increase of about 3.5%. This indicates that the further trend has to be watched closely. The main cause of drift is often instability of the control sample, but other causes such as deterioration of reagents and equipment must be taken into account. Whatever the cause, when discovered, it should be traced and rectified. And here too, if necessary, already released results may have to be corrected.

New Control Sample

When a control sample is about to run out, or must be replaced because of instability, or for any other reason, a new control sample must be prepared in good time so that it can be run concurrently with the old control sample for some time. This allows a smooth start without interrupting the analytical programme. As indicated previously, the more initial data are obtained the better (with a minimum of 10), but ideally a complete control chart should be made.
8.3.3 Control Chart of the Range of Duplicates

8.3.3.1 Principle
In each batch of test samples at least one sample is analyzed in duplicate and the difference between the results is plotted on the control chart of the attribute concerned. The basic construction of such a Control Chart of the Range of Duplicates is given in Figure 8-3. It shows similarities with the Control Chart of the Mean, in that now a mean of differences is calculated with a corresponding standard deviation. The warning line and control line can be drawn at 2s and 3s distance from the mean of differences. The graph is single-sided, as the lowest observable value of the difference is zero. Fig. 8-3. Control Chart of the Range of Duplicates. R = mean of the range of duplicates. WL = Warning Limit. CL = Control Limit (or Action Limit).
R = (Σ Ri) / m    (8.5)

where
R = mean difference between duplicates
Σ Ri = sum of (absolute) differences between duplicates
m = number of pairs of duplicates

and

sR = √( Σ Ri² / 2m )    (8.6)
where sR = standard deviation of the range of all pairs of duplicates.

Fig. 8-4. A filled-out control chart of the range of duplicates of a control sample.

Note 1. Equation (8.6) is equivalent to Equation (7.21). This standard deviation is somewhat different from the common standard deviation of a set of data (Eq. 6.2): it results from pooling the standard deviations of each pair, on the assumption that the duplicates of all pairs have the same population standard deviation.

Note 2. If it is decided to routinely run the control sample in duplicate in each batch as described here, a different situation arises with respect to the Mean Chart, since now two values for the control sample are obtained instead of one. These values are of equal weight and, therefore, their mean must be used as an entry. It is important to note that the parameters of the Mean Chart thus obtained, particularly the standard deviation, are not the same as those obtained using single values. Hence, these two types should not be mixed up and not be compared by means of the F-test!
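The chart parameters of Eq. (8.5) and (8.6) can be computed as below. The duplicate pairs are illustrative, and the limits follow the 2s/3s convention used for the Mean Chart.

```python
# Range-of-Duplicates chart parameters from a set of duplicate pairs.
import math

pairs = [(10.1, 10.5), (9.8, 10.4), (10.6, 10.2), (10.0, 10.9),
         (10.3, 10.3), (9.9, 10.6), (10.4, 10.0), (10.2, 10.7)]

ranges = [abs(a - b) for a, b in pairs]
m = len(pairs)

R = sum(ranges) / m                                      # Eq. (8.5)
s_R = math.sqrt(sum(r * r for r in ranges) / (2 * m))    # Eq. (8.6)

WL = R + 2 * s_R     # Warning Limit (single-sided)
CL = R + 3 * s_R     # Control Limit (Action Limit)

print(f"R = {R:.2f}, sR = {s_R:.2f}, WL = {WL:.2f}, CL = {CL:.2f}")
```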
Mean:   10.24    10.13
s:       0.85     0.74
R:       0.66
sR:      0.52
From the foregoing it must have become clear that the control sample has a crucial function in quality control activities. For most analyses a control sample is indispensable. In principle, its place can be taken by a (certified) reference sample, but these are expensive and for many soil and plant analyses not even available. Therefore, laboratories have to prepare control samples themselves or obtain them from other laboratories. Because the quality control systems rely so heavily on these control samples, their preparation should be done with great care so that the samples meet a number of criteria. The main criteria are:
1. The sample is homogeneous.
2. The material is stable.
3. The material has the correct particle size (i.e. passed a prescribed sieve).
4. The relevant information on properties and composition of the matrix, and the concentration of the analyte or attribute concerned, is available.
The preparation of a control sample is usually fairly easy and straightforward. As an example it will be described here for a "normal" soil sample (so-called "fine earth") and for a ground plant sample.
8.4.3 Stability
No general statement can be given about the stability of the material. Although dried soil and plant material can be kept for a very long time or even, in practice, indefinitely under favourable conditions, it must be realized that some natural attributes may still (slowly) change, that samples for certain analyses may not be dried and that certainly many "foreign" components such as petroleum products, pesticides or other pollutants change with time or disappear at varying unknown rates. Each sample and attribute has to be judged on this aspect individually. Control charts may give useful information about possible changes during storage (trends, shifts).
8.4.4 Homogeneity
For quality control it is essential that a control sample is homogeneous so that subsamples used in the batches are "identical". In practice this is impossible (except for solutions), and the requirement can be reduced to the condition that the (sub)samples statistically belong to the same population. This implies a test for homogeneity to prove that the daily-use sample containers (the laboratory control samples) into which the bulk sample was split up represent one and the same sample. This can be done in various ways. A relatively simple procedure is described here.

Check for homogeneity by duplicate analysis

For the check for homogeneity the statistical principles of the two control charts discussed in Section 8.3, i.e. for the Mean and for the Range of Duplicates, are used. The laboratory control samples, prepared by splitting the bulk sample, are analyzed in duplicate in one batch. The analysis used is arbitrary; usually a rapid, easy and/or cheap analysis suffices. Suitable analyses for soil material are, for example, carbon content, total nitrogen, and loss-on-ignition. For plant samples total nitrogen, phosphorus, or a metal (e.g. Zn) can be used. The organization of the test is schematically given in Fig. 8-6. As stated before, statistically this test only makes sense when a sufficient number of sample containers are involved (n ≥ 7). Do not use too small samples for the analysis, as this will adversely affect the representativeness, resulting in an unnecessarily high standard deviation.

Note. A sample may prove to be homogeneous for one attribute but not for another. Therefore, fundamentally, homogeneity of control samples should be tested with an analysis for each attribute for which the control sample is used. This is done for certified reference samples but is often considered too cumbersome for laboratory control samples. On the other hand, such an effort would have the additional advantage that useful information about the procedure and laboratory performance is obtained (repeatability). Also, such values can be used as initial values of control charts.

Check on the Mean (sample bias)

This is a check to establish if all samples belong to the same population. The means of the duplicates are calculated and treated as single values (xi) for the samples 1 to n. Then, using Equations (6.1) and (6.2), calculate x and s of the data set consisting of the means of duplicates (include all data, i.e. do not exclude outliers).

Fig. 8-6. Scheme for the preparation and homogeneity test of control samples.
The rules for interpretation may vary from one laboratory to another and from one attribute to another. In general, values beyond 2s from the mean are considered outliers and rejected. The sample container concerned may be discarded or analyzed again, after which the result may well fall within x ± 2s and be accepted or, otherwise, the subsample may now definitely be discarded.

Check on the Range (sample homogeneity)
This is a check to establish if all samples are homogeneous. The differences R between duplicates of each pair are calculated (include all data, i.e. do not exclude outliers). Then calculate R and sR of the data set using Equations (8.5) and (8.6) respectively. The interpretation is identical to that for the Check on the Mean as given in the previous paragraph. Thus, a laboratory control sample container may have to be discarded on two grounds: 1. because it does not sufficiently represent the level of the attribute in the control sample, and 2. because it is internally too heterogeneous. The preparation of a control sample, including a test for homogeneity, should be laid down in a SOP.

Example

In Table 8-3 an example is given of a check for homogeneity of a soil control sample of 5 kg which was split into ten equal laboratory control samples, of which the loss-on-ignition was determined in duplicate. The loss-on-ignition can be determined as follows:
1. Weigh approx. 5 g sample into a tared 30 ml porcelain crucible and dry overnight at 105°C.
2. Transfer crucible to desiccator to cool; then weigh crucible (accuracy 0.001 g).
3. Place crucible in furnace and heat at 900°C for 4 hours.
4. Allow furnace to cool to about 100°C, transfer crucible to desiccator to cool, then weigh crucible with residue (accuracy 0.001 g).
Now the weight loss between 105 and 900°C can be calculated and expressed in mass % or in g/kg (weight basis: material dried at 105°C).

Table 8-3. Results (in mass/mass %) of duplicate Loss-on-Ignition determinations (A and B) on representative subsamples of ten 500 g laboratory samples of a soil control sample.
Sample    A       B       Mean(A,B)   R
1         9.10    8.42    8.760       0.68
2         9.65    8.66    9.155       0.99
3         9.63    9.18    9.405       0.45
4         8.65    8.89    8.770       0.24
5         8.71    9.19    8.950       0.48
6         9.14    8.93    9.040       0.22
7         8.71    8.97    8.840       0.26
8         8.59    8.78    8.685       0.19
9                         8.990       0.26
10                        8.895       0.29
Mean:                     8.949       0.406
s:                        0.214*      **
(* using Eq. 6.2; ** using Eq. 8.6)

Tolerance range for mean of duplicates (x̄ ± 2s): 8.949 ± 2 × 0.214 = 8.52-9.38%
Tolerance range for difference R between duplicates:

In this example it appears that only the mean result of sample no. 3 (9.405%) falls outside the permissible range. However, since this is only marginally so (less than 0.3% relative), we may still decide to accept the sample without repeating the analysis. The measure R for internal homogeneity falls within the permissible range for all samples. (Should an R be found beyond the range, we may opt to repeat the duplicate analysis before deciding to discard that sample.)
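The Check on the Mean can be sketched in a few lines of Python; the duplicate means and R values are those of Table 8-3, and the acceptance rule is x̄ ± 2s. This is a minimal illustration, not a prescribed implementation:

```python
# Check on the Mean for a control sample, using the duplicate means of Table 8-3.
# A sample container is flagged when its duplicate mean falls outside xbar +/- 2s.
from statistics import mean, stdev

pair_means = [8.760, 9.155, 9.405, 8.770, 8.950,
              9.040, 8.840, 8.685, 8.990, 8.895]   # mean of duplicates A and B
ranges = [0.68, 0.99, 0.45, 0.24, 0.48,
          0.22, 0.26, 0.19, 0.26, 0.29]            # R = |A - B| per pair

xbar, s = mean(pair_means), stdev(pair_means)      # 8.949 and 0.214
lo, hi = xbar - 2 * s, xbar + 2 * s                # tolerance range 8.52-9.38 %

# flag the sample containers whose duplicate mean falls outside the range
outliers = [i + 1 for i, m in enumerate(pair_means) if not lo <= m <= hi]
print(f"tolerance range: {lo:.2f}-{hi:.2f} %; flagged sample(s): {outliers}")
```

Running this flags only sample no. 3, in line with the interpretation given in the example. The same pattern, applied to the R values with limits from Equation (8.6), gives the Check on the Range.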
8.5 Complaints
Errors that escaped detection by the laboratory may be detected or suspected by the customer. Although this particular type of quality control may not be popular, it should in no case be ignored and can sometimes even be useful. For dealing with complaints, a protocol must be drawn up with an accompanying Registration Form containing at least the following items:
- name of client, and date the complaint was received
- work order number
- description of complaint
- name of person who received the complaint (usually the head of laboratory)
- person charged with the investigation
- result of the investigation
- name of person(s) who dealt with the complaint
- an evaluation and possible action
- date when the report was sent to the client
A record of complaints should be kept; the documents involved may be kept in the work order file. The tracing of events (audit trailing) may sometimes not be easy, and particularly in such cases the proper registration of all laboratory procedures involved will prove to be of great value.
Note. Registration of procedures formally also applies to work that has been contracted out to other laboratories. When work is contracted out, the quality standards of the subcontractor should be (demonstrably) satisfactory, since the final responsibility towards the client lies with the laboratory that contracted out the work. If the credibility needs to be verified, this is usually done by inserting duplicate and blind samples.
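The items of such a Registration Form map naturally onto a simple record structure. The sketch below is purely illustrative: the field names are my own rendering of the items listed above, not prescribed by any protocol:

```python
# Hypothetical complaint Registration Form as a record; the field names mirror
# the items listed in the text but are otherwise illustrative assumptions.
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ComplaintRecord:
    client_name: str
    date_received: date
    work_order_no: str
    description: str
    received_by: str                          # usually the head of laboratory
    investigator: str = ""                    # person charged with the investigation
    investigation_result: str = ""
    handled_by: str = ""                      # person(s) who dealt with the complaint
    evaluation_and_action: str = ""
    date_report_sent: Optional[date] = None   # filled in when the report goes out
```

A record like this also supports the audit trail mentioned above, since each complaint stays linked to its work order number.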
8.6 Trouble-shooting
Whenever the quality control detects an error, corrective measures must be taken. As mentioned earlier, the error may be readily recognized as a simple calculation or typing error (decimal point!) which can easily be corrected. If this is not the case, then a systematic investigation must take place. This includes checking the sample identification, standards, chemicals, pipettes, dispensers, glassware, calibration procedure, and equipment. Standards may be old or wrongly prepared, adjustable pipettes may deliver a wrong volume, glassware may not be cleaned properly, and equipment may be dirty (e.g. a clogged burner in AAS) or faulty. Electrodes in particular can be a source of error: they may be dirty, and their lifetime must be watched closely. A pH electrode may seemingly respond well to calibration buffer solutions but still be faulty. Clearly, every analytical procedure and instrument has its own characteristic weaknesses. By experience these become known, and it is useful to make a list of such relevant check points for each procedure and attach it to the corresponding SOP or, if it concerns an instrument, to the maintenance logbook. Update this list whenever a new flaw is discovered. Trouble-shooting is further discussed in Section 9.4.
8.7 LIMS
8.7.1 Introduction 8.7.2 What is a LIMS? 8.7.3 How to select a LIMS
8.7.1 Introduction
The various activities in a laboratory produce a large number of data streams which have to be recorded and processed. Some of the main streams are:
- Sample registration
- Desired analytical programme
- Work planning and progress monitoring
- Calibration
- Raw data
- Data processing
- Data quality control
- Reporting
- Invoicing
- Archiving
Each of these aspects requires its own typical paperwork, most of which is done with the help of computers. As discussed in previous chapters, it is the responsibility of the laboratory manager to keep track of all aspects and tie them together for the proper functioning of the laboratory as a whole. To assist him in this task, the manager will have to develop a working system of records and journals. In laboratories of any appreciable size, but even in those with no more than a few analysts, this can be a tedious and error-prone job. Consequently, from about 1980, computer programs appeared on the market that could take over much of this work. Subsequently, the capability of Laboratory Information Management Systems (LIMS) has been further developed, and their price has increased likewise.
The main benefit of a LIMS is a drastic reduction of the paperwork and improved data recording, leading to higher efficiency and increased quality of reported analytical results. Thus, a LIMS can be a very important tool in Quality Management.
Data collection and subsequent calculations are usually done "outside" the LIMS, either with a pocket calculator or, more commonly, on a PC with a standard spreadsheet program (such as Lotus 123) or with one supplied with the analytical instrument. The data are then transferred manually or, preferably, by wire or diskette to the LIMS. The larger LIM systems usually have an internal module for this processing. A major problem with the application of a LIMS is the installation and the customization to the specific needs of a laboratory that this involves. One of the first questions asked (after asking for the price) is: 'can I directly connect my equipment to the LIMS?'. Invariably the vendor's answer is positive, but the problems involved are usually concealed or unjustly trivialized. It is not uncommon for installations to take more than a year before the system is operational (not to speak of complete failures), and sometimes the performance falls short of expectations because the operational complexity was underestimated. Mentioning these problems is certainly not meant to discourage the purchase of a LIMS; on the contrary, the use of a LIMS can in general be very rewarding. It is rather intended as a warning that the choice of a system must be considered very carefully.
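The transfer "by wire or diskette" amounts to moving a flat file between programs. A hedged sketch, assuming a CSV export from the spreadsheet with columns sample_id, attribute and result; both these column names and the tab-delimited import format are assumptions, since a real LIMS prescribes its own interface:

```python
# Sketch: move externally calculated results from a spreadsheet CSV export into
# a flat import file for a LIMS. Column names and output format are assumed.
import csv

def spreadsheet_to_lims(src_csv: str, dest_txt: str) -> int:
    """Validate each row and write a tab-delimited import file.
    Returns the number of records transferred."""
    n = 0
    with open(src_csv, newline="") as f, open(dest_txt, "w") as out:
        for row in csv.DictReader(f):
            value = float(row["result"])   # plausibility: must at least be numeric
            out.write(f"{row['sample_id']}\t{row['attribute']}\t{value}\n")
            n += 1
    return n
```

However modest, such a script already removes the manual retyping step, which is exactly where transcription errors (decimal point!) tend to creep in.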
- Identify LIMS vendors.
- Compare requirements with available systems.
- Identify suitable systems and make a shortlist of vendors.
- Ask vendors for a demonstration and discuss requirements, possible customization, installation problems, training, and after-sales support.
- If possible, contact user(s) of candidate systems.
After comparing the systems on the shortlist, the choice can be made. By way of precaution, it may be wise to start with a "pilot" LIMS: a relatively cheap single-user system in part of the laboratory, in order to gain experience and make a more considered decision for a larger system later. It is essential that all laboratory staff are involved and informed right from the start, as a LIMS may be considered meddlesome ('big brother is watching me'), possibly arousing a negative attitude. Like Quality Management, the success of a LIMS depends to a large extent on its acceptance by the technical staff.