
6 BASIC STATISTICAL TOOLS

There are lies, damn lies, and statistics... (Anon.)

6.1 Introduction
6.2 Definitions
6.3 Basic Statistics
6.4 Statistical tests

6.1 Introduction
In the preceding chapters basic elements for the proper execution of analytical work such as personnel, laboratory facilities, equipment, and reagents were discussed. Before embarking upon the actual analytical work, however, one more tool for the quality assurance of the work must be dealt with: the statistical operations necessary to control and verify the analytical procedures (Chapter 7) as well as the resulting data (Chapter 8).

It was stated before that making mistakes in analytical work is unavoidable. This is the reason why a complex system of precautions to prevent errors and traps to detect them has to be set up. An important aspect of the quality control is the detection of both random and systematic errors. This can be done by critically looking at the performance of the analysis as a whole and also of the instruments and operators involved in the job. For the detection itself as well as for the quantification of the errors, statistical treatment of data is indispensable.

A multitude of different statistical tools is available, some of them simple, some complicated, and often very specific for certain purposes. In analytical work, the most important common operation is the comparison of data, or sets of data, to quantify accuracy (bias) and precision. Fortunately, with a few simple and convenient statistical tools most of the information needed in regular laboratory work can be obtained: the "t-test", the "F-test", and regression analysis. Therefore, examples of these will be given in the ensuing pages.

Clearly, statistics are a tool, not an aim. Simple inspection of data, without statistical treatment, by an experienced and dedicated analyst may be just as useful as statistical figures on the desk of the disinterested. The value of statistics lies in organizing and simplifying data, to permit some objective estimate showing that an analysis is under control or that a change has occurred. Equally important is that the results of these statistical procedures are recorded and can be retrieved.

6.2 Definitions
6.2.1 Error
6.2.2 Accuracy
6.2.3 Precision
6.2.4 Bias

Discussing Quality Control implies the use of several terms and concepts with a specific (and sometimes confusing) meaning. Therefore, some of the most important concepts will be defined first.

6.2.1 Error
Error is the collective noun for any departure of the result from the "true" value*. Analytical errors can be:

1. Random or unpredictable deviations between replicates, quantified with the "standard deviation".

2. Systematic or predictable regular deviation from the "true" value, quantified as "mean difference" (i.e. the difference between the true value and the mean of replicate determinations).

3. Constant, unrelated to the concentration of the substance analyzed (the analyte).

4. Proportional, i.e. related to the concentration of the analyte.

* The "true" value of an attribute is by nature indeterminate and often has only a very relative meaning. Particularly in soil science, for several attributes there is no such thing as the true value as any value obtained is method-dependent (e.g. cation exchange capacity). Obviously, this does not mean that no adequate analysis serving a purpose is possible. It does, however, emphasize the need for the establishment of standard reference methods and the importance of external QC (see Chapter 9).

6.2.2 Accuracy
The "trueness" or the closeness of the analytical result to the "true" value. It is constituted by a combination of random and systematic errors (precision and bias) and cannot be quantified directly. The test result may be a mean of several values. An accurate determination produces a "true" quantitative value, i.e. it is precise and free of bias.

6.2.3 Precision
The closeness with which results of replicate analyses of a sample agree. It is a measure of dispersion or scattering around the mean value and usually expressed in terms of standard deviation, standard error or a range (difference between the highest and the lowest result).

6.2.4 Bias
The consistent deviation of analytical results from the "true" value caused by systematic errors in a procedure. Bias is the opposite but most used measure for "trueness", which is the agreement of the mean of analytical results with the true value, i.e. excluding the contribution of randomness represented in precision. There are several components contributing to bias:

1. Method bias

The difference between the (mean) test result obtained from a number of laboratories using the same method and an accepted reference value. The method bias may depend on the analyte level.

2. Laboratory bias

The difference between the (mean) test result from a particular laboratory and the accepted reference value.

3. Sample bias

The difference between the mean of replicate test results of a sample and the ("true") value of the target population from which the sample was taken. In practice, for a laboratory this refers mainly to sample preparation, subsampling and weighing techniques. Whether a sample is representative for the population in the field is an extremely important aspect but usually falls outside the responsibility of the laboratory (in some cases laboratories have their own field sampling personnel).

The relationship between these concepts can be expressed in the following equation:

total bias = method bias + laboratory bias + sample bias

The types of errors are illustrated in Fig. 6-1.

Fig. 6-1. Accuracy and precision in laboratory measurements. (Note that the qualifications apply to the mean of results: in c the mean is accurate but some individual results are inaccurate.)

6.3 Basic Statistics


6.3.1 Mean
6.3.2 Standard deviation
6.3.3 Relative standard deviation. Coefficient of variation
6.3.4 Confidence limits of a measurement
6.3.5 Propagation of errors

In the discussions of Chapters 7 and 8 basic statistical treatment of data will be considered. Therefore, some understanding of these statistics is essential and they will briefly be discussed here. The basic assumption to be made is that a set of data, obtained by repeated analysis of the same analyte in the same sample under the same conditions, has a normal or Gaussian distribution. (When the distribution is skewed, statistical treatment is more complicated.) The primary parameters used are the mean (or average) and the standard deviation (see Fig. 6-2) and the main tools the F-test, the t-test, and regression and correlation analysis.

Fig. 6-2. A Gaussian or normal distribution. The figure shows that (approx.) 68% of the data fall in the range x̄ ± s, 95% in the range x̄ ± 2s, and 99.7% in the range x̄ ± 3s.

6.3.1 Mean
The average of a set of n data xᵢ:

x̄ = Σxᵢ / n     (6.1)

6.3.2 Standard deviation


This is the most commonly used measure of the spread or dispersion of data around the mean. The standard deviation is defined as the square root of the variance (V). The variance is defined as the sum of the squared deviations from the mean, divided by n-1. Operationally, there are several ways of calculation:

s = √( Σ(xᵢ - x̄)² / (n - 1) )     (6.2)

or

s = √( (Σxᵢ² - (Σxᵢ)²/n) / (n - 1) )     (6.3)

or

s = √( (Σxᵢ² - n·x̄²) / (n - 1) )     (6.4)

The calculation of the mean and the standard deviation can easily be done on a calculator but most conveniently on a PC with computer programs such as dBASE, Lotus 123, Quattro-Pro, Excel, and others, which have simple ready-to-use functions. (Warning: some programs use n rather than n-1!)
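As a quick illustration of that warning, a minimal Python sketch (standard library only; the replicate values are hypothetical) contrasts the two divisors:

    # Mean and standard deviation of replicate results.
    # statistics.stdev divides by n-1 (the correct choice for a sample);
    # statistics.pstdev divides by n and underestimates s for small sets.
    import statistics

    data = [10.2, 10.7, 10.5, 9.9, 10.4]   # hypothetical replicate results

    print(statistics.mean(data))    # mean (Eq. 6.1)
    print(statistics.stdev(data))   # s with divisor n-1 (Eq. 6.2)
    print(statistics.pstdev(data))  # divisor n: smaller, do not use here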

6.3.3 Relative standard deviation. Coefficient of variation


Although the standard deviation of analytical data may not vary much over limited ranges of such data, it usually depends on the magnitude of such data: the larger the figures, the larger s. Therefore, for comparison of variations (e.g. precision) it is often more convenient to use the relative standard deviation (RSD) than the standard deviation itself. The RSD is expressed as a fraction, but more usually as a percentage and is then called coefficient of variation (CV). Often, however, these terms are confused.
RSD = s / x̄     (6.5)

CV = 100 · s / x̄  %     (6.6)

Note. When needed (e.g. for the F-test, see Eq. 6.11) the variance can, of course, be calculated by squaring the standard deviation:
V = s²     (6.7)

6.3.4 Confidence limits of a measurement


The more an analysis or measurement is replicated, the closer the mean x̄ of the results will approach the "true" value μ of the analyte content (assuming absence of bias). A single analysis of a test sample can be regarded as literally sampling the imaginary set of a multitude of results obtained for that test sample. The uncertainty of such subsampling is expressed by

μ = x̄ ± t·s/√n     (6.8)

where = "true" value (mean of large set of replicates) x = mean of subsamples t = a statistical value which depends on the number of data and the required confidence (usually 95%). s = standard deviation of mean of subsamples n = number of subsamples (The term is also known as the standard error of the mean.) The critical values for t are tabulated in Appendix 1 (they are, therefore, here referred to as ttab ). To find the applicable value, the number of degrees of freedom has to be established by: df = n -1 (see also Section 6.4.2). Example For the determination of the clay content in the particle-size analysis, a semiautomatic pipette installation is used with a 20 mL pipette. This volume is approximate and the operation involves the opening and closing of taps. Therefore, the pipette has to be calibrated, i.e. both the accuracy (trueness) and precision have to be established. A tenfold measurement of the volume yielded the following set of data (in mL):

19.941   19.797
19.812   19.937
19.829   19.847
19.828   19.885
19.742   19.804

The mean is 19.842 mL and the standard deviation 0.0627 mL. According to Appendix 1, for n = 10, ttab = 2.26 (df = 9) and using Eq. (6.8) this calibration yields:

pipette volume = 19.842 ± 2.26 × (0.0627/√10) = 19.84 ± 0.04 mL

(Note that the pipette has a systematic deviation from 20 mL as this is outside the found confidence interval. See also bias.)

In routine analytical work, results are usually single values obtained in batches of several test samples. No laboratory will analyze a test sample 50 times to be confident that the result is reliable. Therefore, the statistical parameters have to be obtained in another way. Most usually this is done by method validation (see Chapter 7) and/or by keeping control charts, which is basically the collection of analytical results from one or more control samples in each batch (see Chapter 8). Equation (6.8) is then reduced to
μ = x ± t·s     (6.9)

where = "true" value x = single measurement t = applicable ttab (Appendix 1) s = standard deviation of set of previous measurements. In Appendix 1 can be seen that if the set of replicated measurements is large (say > 30), t is close to 2. Therefore, the (95%) confidence of the result x of a single test sample (n = 1 in Eq. 6.8) is approximated by the commonly used and well known expression
μ = x ± 2s     (6.10)

where s is the previously determined standard deviation of the large set of replicates (see also Fig. 6-2).

Note: This "method-s" or s of a control sample is not a constant and may vary for different test materials, analyte levels, and with analytical conditions.

Running duplicates will, according to Equation (6.8), increase the confidence of the (mean) result by a factor √2:

μ = x̄ ± t·s/√2

where

x̄ = mean of duplicates
s = known standard deviation of large set

Similarly, triplicate analysis will increase the confidence by a factor √3, etc. Duplicates are further discussed in Section 8.3.3.

Thus, in summary, Equation (6.8) can be applied in various ways to determine the size of errors (confidence) in analytical work or measurements: single determinations in routine work, determinations for which no previous data exist, certain calibrations, etc.
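As an illustration, the pipette calibration of this section can be reproduced with a short Python sketch (standard library only; 2.26 is ttab for df = 9 from Appendix 1):

    # 95% confidence limits of the pipette volume according to Eq. (6.8).
    import math
    import statistics

    data = [19.941, 19.797, 19.812, 19.937, 19.829,
            19.847, 19.828, 19.885, 19.742, 19.804]   # mL, Section 6.3.4

    n = len(data)
    mean = statistics.mean(data)            # 19.842 mL
    s = statistics.stdev(data)              # 0.0627 mL (divisor n-1)
    t_tab = 2.26                            # Appendix 1, df = 9, two-sided 95%
    half_width = t_tab * s / math.sqrt(n)   # 0.04 mL

    print(f"pipette volume = {mean:.2f} +/- {half_width:.2f} mL")
    # 20 mL lies outside 19.80-19.89: the pipette is biased.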

6.3.5 Propagation of errors


6.3.5.1 Propagation of random errors
6.3.5.2 Propagation of systematic errors

The final result of an analysis is often calculated from several measurements performed during the procedure (weighing, calibration, dilution, titration, instrument readings, moisture correction, etc.). As was indicated in Section 6.2, the total error in an analytical result is an adding-up of the sub-errors made in the various steps. For daily practice, the bias and precision of the whole method are usually the most relevant parameters (obtained from validation, Chapter 7; or from control charts, Chapter 8). However, sometimes it is useful to get an insight into the contributions of the subprocedures (which then have to be determined separately), for instance if one wants to change (part of) the method. Because the "adding-up" of errors is usually not a simple summation, this will be discussed here. The main distinction to be made is between random errors (precision) and systematic errors (bias).

6.3.5.1. Propagation of random errors


In estimating the total random error from factors in a final calculation, the treatment of summation or subtraction of factors is different from that of multiplication or division.

1. Summation calculations

If the final result x is obtained from the sum (or difference) of (sub)measurements a, b, c, etc.:

x = a + b + c + ...

then the total precision is expressed by the standard deviation obtained by taking the square root of the sum of the individual variances (squares of standard deviations):

s_x = √(s_a² + s_b² + s_c² + ...)

If a (sub)measurement has a constant multiplication factor or coefficient (such as an extra dilution), then this is included to calculate the effect of the variance concerned, e.g. (2s_b)².

Example

The Effective Cation Exchange Capacity of soils (ECEC) is obtained by summation of the exchangeable cations:

ECEC = Exch. (Ca + Mg + Na + K + H + Al)

Standard deviations experimentally obtained for exchangeable Ca, Mg, Na, K and (H + Al) on a certain sample, e.g. a control sample, are: 0.30, 0.25, 0.15, 0.15, and 0.60 cmolc/kg respectively. The total precision is:

s_ECEC = √(0.30² + 0.25² + 0.15² + 0.15² + 0.60²) = 0.75 cmolc/kg

It can be seen that the total standard deviation is larger than the highest individual standard deviation, but (much) less than their sum. It is also clear that if one wants to reduce the total standard deviation, qualitatively the best result can be expected from reducing the largest individual contribution, in this case the exchangeable acidity.

2. Multiplication calculations

If the final result x is obtained from multiplication (or division) of (sub)measurements according to

x = a · b / c · ...

then the total error is expressed by the relative standard deviation obtained by taking the square root of the sum of the squares of the individual relative standard deviations (RSD or CV, as a fraction or as a percentage, see Eqs. 6.5 and 6.6):

RSD_x = √(RSD_a² + RSD_b² + RSD_c² + ...)

If a (sub)measurement has a constant multiplication factor or coefficient, then this is included to calculate the effect of the RSD concerned, e.g. (2RSD_b)².

Example

The calculation of Kjeldahl-nitrogen may be as follows:

%N = (a - b) × M × 1.4 × mcf / s

where

a = mL HCl required for titration of sample
b = mL HCl required for titration of blank
s = air-dry sample weight in gram
M = molarity of HCl
1.4 = 14 × 10⁻³ × 100% (14 = atomic weight of N)
mcf = moisture correction factor

Note that in addition to multiplications, this calculation contains a subtraction also (often, calculations contain both summations and multiplications).

Firstly, the standard deviation of the titration (a - b) is determined as indicated under summation calculations above. This is then transformed to RSD using Equations (6.5) or (6.6). Then the RSDs of the other individual parameters have to be determined experimentally. The found RSDs are, for instance:

distillation: 0.8%, titration: 0.5%, molarity: 0.2%, sample weight: 0.2%, mcf: 0.2%.

The total calculated precision is:

RSD_total = √(0.8² + 0.5² + 0.2² + 0.2² + 0.2²) = 1.0%

Here again, the highest RSD (of distillation) dominates the total precision. In practice, the precision of the Kjeldahl method is usually considerably worse (≈ 2.5%), probably mainly as a result of the heterogeneity of the sample. The present example does not take that into account. It would imply that 2.5% - 1.0% = 1.5% or 3/5 of the total random error is due to sample heterogeneity (or other overlooked cause). This implies that painstaking efforts to improve subprocedures such as the titration or the preparation of standard solutions may not be very rewarding. It would, however, pay to improve the homogeneity of the sample, e.g. by careful grinding and mixing in the preparatory stage.

Note. Sample heterogeneity is also represented in the moisture correction factor. However, the influence of this factor on the final result is usually very small.
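Both propagation rules reduce to a few lines of code. A sketch in Python (standard library only), using the ECEC and Kjeldahl figures above:

    # Propagation of random errors (Section 6.3.5.1).
    import math

    # Summation rule: s of ECEC from the s of the exchangeable cations (cmolc/kg)
    s_cations = [0.30, 0.25, 0.15, 0.15, 0.60]
    s_ecec = math.sqrt(sum(s ** 2 for s in s_cations))
    print(f"s(ECEC) = {s_ecec:.2f} cmolc/kg")       # 0.75

    # Multiplication rule: total RSD of Kjeldahl-N from the subprocedure RSDs (%)
    rsds = [0.8, 0.5, 0.2, 0.2, 0.2]   # distillation, titration, molarity, weight, mcf
    rsd_total = math.sqrt(sum(r ** 2 for r in rsds))
    print(f"RSD(N) = {rsd_total:.1f} %")            # 1.0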

6.3.5.2 Propagation of systematic errors


Systematic errors of (sub)measurements contribute directly to the total bias of the result since the individual parameters in the calculation of the final result each carry their own bias. For instance, the systematic error in a balance will cause a systematic error in the sample weight (as well as in the moisture determination). Note that some systematic errors may cancel out, e.g. weighings by difference may not be affected by a biased balance. The only way to detect or avoid systematic errors is by comparison (calibration) with independent standards and outside reference or control samples.

6.4 Statistical tests


6.4.1 Two-sided vs. one-sided test
6.4.2 F-test for precision
6.4.3 t-Tests for bias
6.4.4 Linear correlation and regression
6.4.5 Analysis of variance (ANOVA)

In analytical work a frequently recurring operation is the verification of performance by comparison of data. Some examples of comparisons in practice are:

- performance of two instruments,
- performance of two methods,
- performance of a procedure in different periods,
- performance of two analysts or laboratories,
- results obtained for a reference or control sample with the "true", "target" or "assigned" value of this sample.

Some of the most common and convenient statistical tools to quantify such comparisons are the F-test, the t-tests, and regression analysis. Because the F-test and the t-tests are the most basic tests, they will be discussed first. These tests examine if two sets of normally distributed data are similar or dissimilar (belong or do not belong to the same "population") by comparing their standard deviations and means respectively. This is illustrated in Fig. 6-3.

Fig. 6-3. Three possible cases when comparing two sets of data (n1 = n2). A. Different mean (bias), same precision; B. Same mean (no bias), different precision; C. Both mean and precision are different. (The fourth case, identical sets, has not been drawn.)

6.4.1 Two-sided vs. one-sided test


These tests for comparison, for instance between methods A and B, are based on the assumption that there is no significant difference (the "null hypothesis"). In other words, when the difference is so small that a tabulated critical value of F or t is not exceeded, we can be confident (usually at 95% level) that A and B are not different. Two fundamentally different questions can be asked concerning both the comparison of the standard deviations s1 and s2 with the F-test, and of the means x̄1 and x̄2 with the t-test:

1. are A and B different? (two-sided test)
2. is A higher (or lower) than B? (one-sided test)

This distinction has an important practical implication as statistically the probabilities for the two situations are different: the chance that A and B are only different ("it can go two ways") is twice as large as the chance that A is higher (or lower) than B ("it can go only one way"). The most common case is the two-sided (also called two-tailed) test: there are no particular reasons to expect that the means or the standard deviations of two data sets are different. An example is the routine comparison of a control chart with the previous one (see 8.3). However, when it is expected or suspected that the mean and/or the standard deviation will go only one way, e.g. after a change in an analytical procedure, the one-sided (or one-tailed) test is appropriate. In this case the probability that it goes the other way than expected is assumed to be zero and, therefore, the probability that it goes the expected way is doubled. Or, more correctly, the uncertainty in the two-way test of 5% (or the probability of 5% that the critical value is exceeded) is divided over the two tails of the Gaussian curve (see Fig. 6-2), i.e. 2.5% at the end of each tail beyond 2s. If we perform the one-sided test with 5% uncertainty, we actually increase this 2.5% to 5% at the end of one tail. (Note that for the whole Gaussian curve, which is symmetrical, this is then equivalent to an uncertainty of 10% in two ways!)

This difference in probability in the tests is expressed in the use of two tables of critical values for both F and t. In fact, the one-sided table at 95% confidence level is equivalent to the two-sided table at 90% confidence level. It is emphasized that the one-sided test is only appropriate when a difference in one direction is expected or aimed at. Of course it is tempting to perform this test after the results show a clear (unexpected) effect. In fact, however, a two times higher probability level is then used in retrospect. This is underscored by the observation that in this way even contradictory conclusions may arise: if in an experiment calculated values of F and t are found within the range between the two-sided and one-sided values of Ftab and ttab, the two-sided test indicates no significant difference, whereas the one-sided test says that the result of A is significantly higher (or lower) than that of B. What actually happens is that in the first case the 2.5% boundary in the tail was just not exceeded, and then, subsequently, this 2.5% boundary is relaxed to 5%, which is then obviously more easily exceeded. This illustrates that statistical tests differ in strictness and that for proper interpretation of results in reports, the statistical techniques used, including the confidence limits or probability, should always be specified.

6.4.2 F-test for precision


Because the result of the F-test may be needed to choose between the Student's t-test and the Cochran variant (see next section), the F-test is discussed first. The F-test (or Fisher's test) is a comparison of the spread of two sets of data to test if the sets belong to the same population, in other words if the precisions are similar or dissimilar. The test makes use of the ratio of the two variances:

F = s1² / s2²     (6.11)

where the larger s² must be the numerator by convention. If the performances are not very different, then the estimates s1 and s2 do not differ much and their ratio (and that of their squares) should not deviate much from unity. In practice, the calculated F is compared with the applicable F value in the F-table (also called the critical value, see Appendix 2). To read the table it is necessary to know the applicable number of degrees of freedom for s1 and s2. These are calculated by:

df1 = n1 - 1
df2 = n2 - 1

If Fcal ≤ Ftab one can conclude with 95% confidence that there is no significant difference in precision (the "null hypothesis" that s1 = s2 is accepted). Thus, there is still a 5% chance that we draw the wrong conclusion. In certain cases more confidence may be needed; then a 99% confidence table can be used, which can be found in statistical textbooks.

Example 1 (two-sided test)

Table 6-1 gives the data sets obtained by two analysts for the cation exchange capacity (CEC) of a control sample. Using Equation (6.11) the calculated F value is 1.62. As we had no particular reason to expect that the analysts would perform differently, we use the F-table for the two-sided test and find Ftab = 4.03 (Appendix 2, df1 = df2 = 9). This exceeds the calculated value and the null hypothesis (no difference) is accepted. It can be concluded with 95% confidence that there is no significant difference in precision between the work of Analyst 1 and 2.

Table 6-1. CEC values (in cmolc/kg) of a control sample determined by two analysts.
         1        2
      10.2      9.7
      10.7      9.0
      10.5     10.2
       9.9     10.3
       9.0     10.8
      11.2     11.1
      11.5      9.4
      10.9      9.2
       8.9      9.8
      10.6     10.2

x̄:   10.34     9.97
s:    0.819    0.644
n:    10       10

Fcal = 1.62    Ftab = 4.03
tcal = 1.12    ttab = 2.10
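The F-test of Example 1 can be verified in Python; a sketch assuming scipy is available (f.ppf at 0.975 gives the two-sided 95% critical value, i.e. 2.5% in the upper tail):

    # Two-sided F-test on the data of Table 6-1 (Eq. 6.11).
    import statistics
    from scipy.stats import f

    analyst1 = [10.2, 10.7, 10.5, 9.9, 9.0, 11.2, 11.5, 10.9, 8.9, 10.6]
    analyst2 = [9.7, 9.0, 10.2, 10.3, 10.8, 11.1, 9.4, 9.2, 9.8, 10.2]

    v1 = statistics.variance(analyst1)
    v2 = statistics.variance(analyst2)
    F_cal = max(v1, v2) / min(v1, v2)                            # 1.62; larger variance on top
    F_tab = f.ppf(0.975, len(analyst1) - 1, len(analyst2) - 1)   # 4.03

    print(F_cal <= F_tab)   # True: no significant difference in precision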

Example 2 (one-sided test)

The determination of the calcium carbonate content with the Scheibler standard method is compared with the simple and more rapid "acid-neutralization" method using one and the same sample. The results are given in Table 6-2. Because of the nature of the rapid method we suspect it to produce a lower precision than obtained with the Scheibler method and we can, therefore, perform the one-sided F-test. The applicable Ftab = 3.07 (App. 2, df1 = 12, df2 = 9), which is lower than Fcal (= 18.3), and the null hypothesis (no difference) is rejected. It can be concluded (with 95% confidence) that for this one sample the precision of the rapid titration method is significantly worse than that of the Scheibler method.

Table 6-2. Contents of CaCO3 (in mass/mass %) in a soil sample determined with the Scheibler method (A) and the rapid titration method (B).
        A       B
      2.5     1.7
      2.4     1.9
      2.5     2.3
      2.6     2.3
      2.5     2.8
      2.5     2.5
      2.4     1.6
      2.6     1.9
      2.7     2.6
      2.4     1.7
              2.4
              2.2
              2.6

x̄:   2.51    2.13
s:    0.099   0.424
n:    10      13

Fcal = 18.3    Ftab = 3.07
tcal = 3.12    ttab* = 2.18

(ttab* = Cochran's "alternative" ttab)

6.4.3 t-Tests for bias


6.4.3.1 Student's t-test
6.4.3.2 Cochran's t-test

6.4.3.3 t-Test for large data sets (n ≥ 30)
6.4.3.4 Paired t-test

Depending on the nature of two sets of data (n, s, sampling nature), the means of the sets can be compared for bias by several variants of the t-test. The following most common types will be discussed:

1. Student's t-test for comparison of two independent sets of data with very similar standard deviations;
2. the Cochran variant of the t-test when the standard deviations of the independent sets differ significantly;
3. the paired t-test for comparison of strongly dependent sets of data.

Basically, for the t-tests Equation (6.8) is used but written in a different way:

tcal = |x̄ - μ| · √n / s     (6.12)

where

x̄ = mean of test results of a sample
μ = "true" or reference value
s = standard deviation of test results
n = number of test results of the sample

To compare the mean of a data set with a reference value, normally the "two-sided t-table of critical values" is used (Appendix 1). The applicable number of degrees of freedom here is: df = n - 1. If a value for t calculated with Equation (6.12) does not exceed the critical value in the table, the data are taken to belong to the same population: there is no difference and the "null hypothesis" is accepted (with the applicable probability, usually 95%). As with the F-test, when it is expected or suspected that the obtained results are higher or lower than that of the reference value, the one-sided t-test can be performed: if tcal > ttab, then the results are significantly higher (or lower) than the reference value.

More commonly, however, the "true" value of proper reference samples is accompanied by the associated standard deviation and number of replicates used to determine these parameters. We can then apply the more general case of comparing the means of two data sets: the "true" value μ in Equation (6.12) is then replaced by the mean of a second data set. As is shown in Fig. 6-3, to test if two data sets belong to the same population it is tested if the two Gauss curves sufficiently overlap. In other words, if the difference between the means x̄1 - x̄2 is small. This is discussed next.

Similarity or non-similarity of standard deviations

When using the t-test for two small sets of data (n1 and/or n2 < 30), a choice of the type of test must be made depending on the similarity (or non-similarity) of the standard deviations of the two sets. If the standard deviations are sufficiently similar they can be "pooled" and the Student t-test can be used. When the standard deviations are not sufficiently similar, an alternative procedure for the t-test must be followed in which the standard deviations are not pooled. A convenient alternative is the Cochran variant of the t-test. The criterion for the choice is the passing or non-passing of the F-test (see 6.4.2), that is, whether or not the variances significantly differ. Therefore, for small data sets, the F-test should precede the t-test. For dealing with large data sets (n1, n2 ≥ 30) the "normal" t-test is used (see Section 6.4.3.3 and App. 3).

6.4.3.1. Student's t-test


To be applied to small data sets (n1, n2 < 30) where s1 and s2 are similar according to the F-test. When comparing two sets of data, Equation (6.12) is rewritten as:

tcal = |x̄1 - x̄2| / (sp · √(1/n1 + 1/n2))     (6.13)

where

x̄1 = mean of data set 1
x̄2 = mean of data set 2
sp = "pooled" standard deviation of the sets
n1 = number of data in set 1
n2 = number of data in set 2

The pooled standard deviation sp is calculated by:

sp = √( ((n1 - 1)·s1² + (n2 - 1)·s2²) / (n1 + n2 - 2) )     (6.14)

where

s1 = standard deviation of data set 1
s2 = standard deviation of data set 2
n1 = number of data in set 1
n2 = number of data in set 2

To perform the t-test, the critical ttab has to be found in the table (Appendix 1); the applicable number of degrees of freedom df is here calculated by: df = n1 + n2 - 2.

Example

The two data sets of Table 6-1 can be used. With Equations (6.13) and (6.14) tcal is calculated as 1.12, which is lower than the critical value ttab of 2.10 (App. 1, df = 18, two-sided), hence the null hypothesis (no difference) is accepted and the two data sets are assumed to belong to the same population: there is no significant difference between the mean results of the two analysts (with 95% confidence).

Note. Another illustrative way to perform this test for bias is to calculate if the difference between the means falls within or outside the range where this difference is still not significantly large. In other words, if this difference is less than the least significant difference (lsd). This can be derived from Equation (6.13):

lsd = ttab · sp · √(1/n1 + 1/n2)     (6.15)

In the present example of Table 6-1 the calculation yields lsd = 0.69. The measured difference between the means is 10.34 - 9.97 = 0.37, which is smaller than the lsd, indicating that there is no significant difference between the performance of the analysts. In addition, in this approach the 95% confidence limits of the difference between the means can be calculated (cf. Equation 6.8):

confidence limits = 0.37 ± 0.69 = -0.32 and 1.06

Note that the value 0 for the difference is situated within this confidence interval, which agrees with the null hypothesis of x̄1 = x̄2 (no difference) having been accepted.
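A sketch of the same Student's t-test in Python, with the pooled formulas of Eqs. (6.13) and (6.14) written out (scipy's ttest_ind with equal_var=True gives the same t-value):

    # Student's t-test with pooled standard deviation (Eqs. 6.13, 6.14).
    import math
    import statistics

    set1 = [10.2, 10.7, 10.5, 9.9, 9.0, 11.2, 11.5, 10.9, 8.9, 10.6]
    set2 = [9.7, 9.0, 10.2, 10.3, 10.8, 11.1, 9.4, 9.2, 9.8, 10.2]
    n1, n2 = len(set1), len(set2)

    sp = math.sqrt(((n1 - 1) * statistics.variance(set1) +
                    (n2 - 1) * statistics.variance(set2)) / (n1 + n2 - 2))
    t_cal = abs(statistics.mean(set1) - statistics.mean(set2)) / (
        sp * math.sqrt(1 / n1 + 1 / n2))

    print(round(t_cal, 2))   # 1.12 < ttab = 2.10 (df = 18): no significant bias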

6.4.3.2 Cochran's t-test


To be applied to small data sets (n1, n2 < 30) where s1 and s2 are dissimilar according to the F-test. Calculate t with:

tcal = |x̄1 - x̄2| / √(s1²/n1 + s2²/n2)     (6.16)

Then determine an "alternative" critical t-value:


6.17

where

t1 = ttab at n1 - 1 degrees of freedom
t2 = ttab at n2 - 1 degrees of freedom

Now the t-test can be performed as usual: if tcal < ttab* then the null hypothesis that the means do not significantly differ is accepted.

Example

The two data sets of Table 6-2 can be used. According to the F-test, the standard deviations differ significantly so that the Cochran variant must be used. Furthermore, in contrast to our expectation that the precision of the rapid test would be inferior, we have no idea about the bias and therefore the two-sided test is appropriate. The calculations yield tcal = 3.12 and ttab* = 2.18, meaning that tcal exceeds ttab*, which implies that the null hypothesis (no difference) is rejected and that the mean of the rapid analysis deviates significantly from that of the standard analysis (with 95% confidence, and for this sample only). Further investigation of the rapid method would have to include the use of more different samples and then comparison with the one-sided t-test would be justified (see 6.4.3.4, Example 1).
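A sketch of the Cochran variant in Python, using only the summary statistics of Table 6-2 (t1 and t2 are the two-sided ttab values from Appendix 1):

    # Cochran's t-test (Eqs. 6.16 and 6.17) on the summary data of Table 6-2.
    import math

    x1, s1, n1 = 2.51, 0.099, 10    # Scheibler
    x2, s2, n2 = 2.13, 0.424, 13    # rapid titration
    t1, t2 = 2.26, 2.18             # ttab at df = 9 and df = 12

    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t_cal = abs(x1 - x2) / math.sqrt(v1 + v2)        # 3.12
    t_tab_star = (t1 * v1 + t2 * v2) / (v1 + v2)     # 2.18

    print(t_cal > t_tab_star)   # True: the means differ significantly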

6.4.3.3 t-Test for large data sets (n ≥ 30)

In the example above (6.4.3.2) the conclusion happens to have been the same if the Student's t-test with pooled standard deviations had been used. This is caused by the fact that the difference in result between the Student and Cochran variants of the t-test is largest when small sets of data are compared, and decreases with increasing number of data. Namely, with increasing number of data a better estimate of the real distribution of the population is obtained (the flatter t-distribution then converges to the standardized normal distribution). When n ≥ 30 for both sets, e.g. when comparing Control Charts (see 8.3), for all practical purposes the difference between the Student and Cochran variants is negligible. The procedure is then reduced to the "normal" t-test by simply calculating tcal with Eq. (6.16) and comparing this with ttab at df = n1 + n2 - 2. (Note in App. 1 that the two-sided ttab is now close to 2.) The proper choice of the t-test as discussed above is summarized in a flow diagram in Appendix 3.

6.4.3.4 Paired t-test


When two data sets are not independent, the paired t-test can be a better tool for comparison than the "normal" t-test described in the previous sections. This is for instance the case when two methods are compared by the same analyst using the same sample(s). It could, in fact, also be applied to the example of Table 6-1 if the two analysts used the same analytical method at (about) the same time.

As stated previously, comparison of two methods using different levels of analyte gives more validation information about the methods than using only one level. Comparison of results at each level could be done by the F and t-tests as described above. The paired t-test, however, allows for different levels provided the concentration range is not too wide. As a rule of thumb, the results should be within the same order of magnitude. If the analysis covers a longer range, i.e. several powers of ten, regression analysis must be considered (see Section 6.4.4). In intermediate cases, either technique may be chosen.

The null hypothesis is that there is no difference between the data sets, so the test is to see if the mean of the differences between the data deviates significantly from zero or not (two-sided test). If it is expected that one set is systematically higher (or lower) than the other set, then the one-sided test is appropriate.

Example 1

The "promising" rapid single-extraction method for the determination of the cation exchange capacity of soils using the silver thiourea complex (AgTU, buffered at pH 7) was compared with the traditional ammonium acetate method (NH4OAc, pH 7). Although for certain soil types the difference in results appeared insignificant, for other types differences seemed larger. Such a suspect group were soils with ferralic (oxic) properties (i.e. highly weathered sesquioxide-rich soils). In Table 6-3 the results of ten soils with these properties are grouped to test if the CEC methods give different results. The difference d within each pair and the parameters needed for the paired t-test are given also.

Table 6-3. CEC values (in cmolc/kg) obtained by the NH4OAc and AgTU methods (both at pH 7) for ten soils with ferralic properties.
Sample    NH4OAc    AgTU      d
  1          7.1     6.5    -0.6
  2          4.6     5.6    +1.0
  3         10.6    14.5    +3.9
  4          2.3     5.6    +3.3
  5         25.2    23.8    -1.4
  6          4.4    10.4    +6.0
  7          7.8     8.4    +0.6
  8          2.7     5.5    +2.8
  9         14.3    19.2    +4.9
 10         13.6    15.0    +1.4

d̄ = +2.19     sd = 2.395
tcal = 2.89    ttab = 2.26

Using Equation (6.12) and noting that μd = 0 (hypothesis value of the differences, i.e. no difference), the t-value can be calculated as:

tcal = |d̄| · √n / sd = 2.19 × √10 / 2.395 = 2.89

where

d̄ = mean of differences within each pair of data
sd = standard deviation of the differences
n = number of pairs of data

The calculated t-value (= 2.89) exceeds the critical value of 1.83 (App. 1, df = n - 1 = 9, one-sided), hence the null hypothesis that the methods do not differ is rejected and it is concluded that the silver thiourea method gives significantly higher results as compared with the ammonium acetate method when applied to such highly weathered soils.

Note. Since such data sets do not have a normal distribution, the "normal" t-test which compares means of sets cannot be used here (the means do not constitute a fair representation of the sets). For the same reason no information about the precision of the two methods can be obtained, nor can the F-test be applied. For information about precision, replicate determinations are needed.

Example 2

Table 6-4 shows the data of total-P in four plant tissue samples obtained by a laboratory L and the median values obtained by 123 laboratories in a proficiency (round-robin) test.

Table 6-4. Total-P contents (in mmol/kg) of plant tissue as determined by 123 laboratories (Median) and Laboratory L.
Sample    Median    Lab L      d
  1         93.0     85.2    -7.8
  2        201      224     +23
  3         78.9     84.5    +5.6
  4        175      185     +10

d̄ = +7.70     sd = 12.702
tcal = 1.21    ttab = 3.18

To verify the performance of the laboratory a paired t-test can be performed. Using Eq. (6.12) and noting that μd = 0 (hypothesis value of the differences, i.e. no difference), the t-value can be calculated as:

tcal = |d̄| · √n / sd = 7.70 × √4 / 12.702 = 1.21

The calculated t-value is below the critical value of 3.18 (Appendix 1, df = n - 1 = 3, two-sided), hence the null hypothesis that the laboratory does not significantly differ from the group of laboratories is accepted, and the results of Laboratory L seem to agree with those of "the rest of the world" (this is a so-called third-line control).
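Both paired examples can be checked with a few lines of Python; scipy's ttest_rel gives the same t-value (its p-value is two-sided). A sketch using the data of Table 6-3:

    # Paired t-test on the CEC data of Table 6-3.
    import math
    import statistics

    nh4oac = [7.1, 4.6, 10.6, 2.3, 25.2, 4.4, 7.8, 2.7, 14.3, 13.6]
    agtu   = [6.5, 5.6, 14.5, 5.6, 23.8, 10.4, 8.4, 5.5, 19.2, 15.0]

    d = [b - a for a, b in zip(nh4oac, agtu)]
    d_mean = statistics.mean(d)                  # +2.19
    s_d = statistics.stdev(d)                    # 2.395
    t_cal = d_mean * math.sqrt(len(d)) / s_d     # 2.89

    print(t_cal > 1.83)   # True: AgTU reads significantly higher (one-sided, df = 9)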

6.4.4 Linear correlation and regression


6.4.4.1 Construction of calibration graph
6.4.4.2 Comparing two sets of data using many samples at different analyte levels

These also belong to the most common useful statistical tools to compare effects and performances X and Y. Although the technique is in principle the same for both, there is a fundamental difference in concept: correlation analysis is applied to independent factors: if X increases, what will Y do (increase, decrease, or perhaps not change at all)? In regression analysis a unilateral response is assumed: changes in X result in changes in Y, but changes in Y do not result in changes in X. For example, in analytical work, correlation analysis can be used for comparing methods or laboratories, whereas regression analysis can be used to construct calibration graphs. In practice, however, comparison of laboratories or methods is usually also done by regression analysis. The calculations can be performed on a (programmed) calculator or more conveniently on a PC using a home-made program. Even more convenient are the regression programs included in statistical packages such as Statistix, Mathcad, Eureka, Genstat, Statcal, SPSS, and others. Also, most spreadsheet programs such as Lotus 123, Excel, and Quattro-Pro have functions for this.

Laboratories or methods are in fact independent factors. However, for regression analysis one factor has to be the independent or "constant" factor (e.g. the reference method, or the factor with the smallest standard deviation). This factor is by convention designated X, whereas the other factor is then the dependent factor Y (thus, we speak of "regression of Y on X").

As was discussed in Section 6.4.3, such comparisons can often be done with the Student/Cochran or paired t-tests. However, correlation analysis is indicated:

1. When the concentration range is so wide that the errors, both random and systematic, are not independent (which is the assumption for the t-tests). This is often the case where concentration ranges of several magnitudes are involved.

2. When pairing is inappropriate for other reasons, notably a long time span between the two analyses (sample aging, change in laboratory conditions, etc.).

The principle is to establish a statistical linear relationship between two sets of corresponding data by fitting the data to a straight line by means of the "least squares" technique. Such data are, for example, analytical results of two methods applied to the same samples (correlation), or the response of an instrument to a series of standard solutions (regression).

Note: Naturally, non-linear higher-order relationships are also possible, but since these are less common in analytical work and more complex to handle mathematically, they will not be discussed here. Nevertheless, to avoid misinterpretation, always inspect the kind of relationship by plotting the data, either on paper or on the computer monitor.

The resulting line takes the general form:
y = bx + a (6.18)

where

a = intercept of the line with the y-axis
b = slope (tangent)

In laboratory work ideally, when there is perfect positive correlation without bias, the intercept a = 0 and the slope b = 1. This is the so-called "1:1 line" passing through the origin (dashed line in Fig. 6-5). If the intercept a ≠ 0 then there is a systematic discrepancy (bias, error) between X and Y; when b ≠ 1 then there is a proportional response or difference between X and Y. The correlation between X and Y is expressed by the correlation coefficient r which can be calculated with the following equation:

r = Σ(xᵢ - x̄)(yᵢ - ȳ) / √( Σ(xᵢ - x̄)² · Σ(yᵢ - ȳ)² )     (6.19)

where

xᵢ = data X
x̄ = mean of data X
yᵢ = data Y
ȳ = mean of data Y

It can be shown that r can vary from 1 to -1:

r = 1: perfect positive linear correlation
r = 0: no linear correlation (maybe other correlation)
r = -1: perfect negative linear correlation

Often, the correlation coefficient r is expressed as r²: the coefficient of determination. The advantage of r² is that, when multiplied by 100, it indicates the percentage of variation in Y associated with variation in X. Thus, for example, when r = 0.71 about 50% (r² = 0.504) of the variation in Y is due to the variation in X. The line parameters b and a are calculated with the following equations:

b = Σ(xᵢ - x̄)(yᵢ - ȳ) / Σ(xᵢ - x̄)²     (6.20)

and

a = ȳ - b·x̄     (6.21)

It is worth noting that r is independent of the choice of which factor is the independent factor X and which is the dependent factor Y. However, the regression parameters a and b do depend on this choice as the regression lines will be different (except when there is ideal 1:1 correlation).

6.4.4.1 Construction of calibration graph


As an example, we take a standard series of P (0-1.0 mg/L) for the spectrophotometric determination of phosphate in a Bray-I extract ("available P"), reading in absorbance units. The data and calculated terms needed to determine the parameters of the calibration graph are given in Table 6-5. The line itself is plotted in Fig. 6-4. Table 6-5 is presented here to give an insight into the steps and terms involved.

The calculation of the correlation coefficient r with Equation (6.19) yields a value of 0.997 (r² = 0.995). Such high values are common for calibration graphs. When the value is not close to 1 (say, below 0.98) this must be taken as a warning and it might then be advisable to repeat or review the procedure. Errors may have been made (e.g. in pipetting) or the used range of the graph may not be linear. On the other hand, a high r may be misleading as it does not necessarily indicate linearity. Therefore, to verify this, the calibration graph should always be plotted, either on paper or on the computer monitor.

Using Equations (6.20) and (6.21) we obtain:

b = 0.438 / 0.70 = 0.626 and a = 0.350 - 0.626 × 0.5 = 0.350 - 0.313 = 0.037

Thus, the equation of the calibration line is:

y = 0.626x + 0.037     (6.22)

Table 6-5. Parameters of calibration graph in Fig. 6-4.


  xᵢ     yᵢ     xᵢ-x̄   (xᵢ-x̄)²   yᵢ-ȳ    (yᵢ-ȳ)²   (xᵢ-x̄)(yᵢ-ȳ)
  0.0    0.05   -0.5    0.25     -0.30    0.090      0.150
  0.2    0.14   -0.3    0.09     -0.21    0.044      0.063
  0.4    0.29   -0.1    0.01     -0.06    0.004      0.006
  0.6    0.43    0.1    0.01      0.08    0.006      0.008
  0.8    0.52    0.3    0.09      0.17    0.029      0.051
  1.0    0.67    0.5    0.25      0.32    0.102      0.160

Σ:  3.0    2.10    0      0.70      0       0.2754     0.438

x̄ = 0.5    ȳ = 0.35

Fig. 6-4. Calibration graph plotted from data of Table 6-5. The dashed lines delineate the 95% confidence area of the graph. Note that the confidence is highest at the centroid of the graph.

During calculation, the maximum number of decimals is used; rounding off to the last significant figure is done at the end (see instructions for rounding off in Section 8.2). Once the calibration graph is established, its use is simple: for each y value measured, the corresponding concentration x can be determined either by direct reading or by calculation using Equation (6.22). The use of calibration graphs is further discussed in Section 7.2.2.

Note. A treatise of the error or uncertainty in the regression line is given at the end of this chapter (after Section 6.4.5).
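A sketch of the whole fit in Python (standard library only), reproducing b, a and r for the data of Table 6-5 and reading a hypothetical sample absorbance back to a concentration:

    # Least-squares calibration line (Eqs. 6.19-6.21) for Table 6-5.
    import math

    x = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]          # standards, mg/L P
    y = [0.05, 0.14, 0.29, 0.43, 0.52, 0.67]    # absorbance readings

    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))   # 0.438
    sxx = sum((xi - xm) ** 2 for xi in x)                      # 0.70
    syy = sum((yi - ym) ** 2 for yi in y)                      # 0.2754

    b = sxy / sxx                     # 0.626
    a = ym - b * xm                   # 0.037
    r = sxy / math.sqrt(sxx * syy)    # 0.997

    absorbance = 0.38                 # hypothetical sample reading
    conc = (absorbance - a) / b       # interpolated concentration, mg/L
    print(f"y = {b:.3f}x + {a:.3f}, r = {r:.3f}, sample = {conc:.2f} mg/L")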

6.4.4.2 Comparing two sets of data using many samples at different analyte levels

Although regression analysis assumes that one factor (on the x-axis) is constant, when certain conditions are met the technique can also successfully be applied to comparing two variables such as laboratories or methods. These conditions are:

- The most precise data set is plotted on the x-axis.
- At least 6, but preferably more than 10 different samples are analyzed.
- The samples should rather uniformly cover the analyte level range of interest.

To decide which laboratory or method is the most precise, multi-replicate results have to be used to calculate standard deviations (see 6.4.2). If these are not available, then the standard deviations of the present sets could be compared (note that we are now not dealing with normally distributed sets of replicate results). Another convenient way is to run the regression analysis on the computer, reverse the variables and run the analysis again. Observe which variable has the lowest standard deviation (or standard error of the intercept a, both given by the computer) and then use the results of the regression analysis where this variable was plotted on the x-axis.

If the analyte level range is incomplete, one might have to resort to spiking or standard additions, with the inherent drawback that the original analyte-sample combination may not adequately be reflected.

Example

In the framework of a performance verification programme, a large number of soil samples were analyzed by two laboratories X and Y (a form of "third-line control", see Chapter 9) and the data compared by regression. (In this particular case, the paired t-test might have been considered also.) The regression line of a common attribute, the pH, is shown here as an illustration. Figure 6-5 shows the so-called "scatter plot" of 124 soil pH-H2O determinations by the two laboratories. The correlation coefficient r is 0.97, which is very satisfactory. The slope (= 1.03) indicates that the regression line is only slightly steeper than the 1:1 ideal regression line. Very disturbing, however, is the intercept a of -1.18. This implies that laboratory Y measures the pH more than a whole unit lower than laboratory X at the low end of the pH range (the intercept -1.18 is at pHx = 0), which difference decreases to about 0.8 unit at the high end.

Fig. 6-5. Scatter plot of pH data of two laboratories. Drawn line: regression line; dashed line: 1:1 ideal regression line.

The t-test for significance is as follows:

For the intercept a: null hypothesis a = 0 (no bias; the ideal intercept is zero), standard error = 0.14 (calculated by the computer), and using Equation (6.12) we obtain:

tcal = |-1.18 - 0| / 0.14 = 8.4

Here, ttab = 1.98 (App. 1, two-sided, df = n - 2 = 122; n - 2 because an extra degree of freedom is lost as the data are used for both a and b), hence the laboratories have a significant mutual bias.

For the slope: null hypothesis b = 1 (the ideal slope; no proportional difference), standard error = 0.02 (given by the computer), and again using Equation (6.12) we obtain:

tcal = |1.03 - 1| / 0.02 = 1.5

Again, ttab = 1.98 (App. 1, two-sided, df = 122), hence the difference between the laboratories is not significantly proportional (or: the laboratories do not have a significant difference in sensitivity). These results suggest that in spite of the good correlation, the two laboratories would have to look into the cause of the bias.

Note. In the present example, the scattering of the points around the regression line does not seem to change much over the whole range. This indicates that the precision of laboratory Y does not change very much over the range with respect to laboratory X. This is not always the case. In such cases, weighted regression (not discussed here) is more appropriate than the unweighted regression as used here. Validation of a method (see Section 7.5) may reveal that precision can change significantly with the level of analyte (and with other factors such as sample matrix).

6.4.5 Analysis of variance (ANOVA)


When results of laboratories or methods are compared where more than one factor can be of influence and must be distinguished from random effects, then ANOVA is a powerful statistical tool to be used. Examples of such factors are: different analysts, samples with different pre-treatments, different analyte levels, different methods within one of the laboratories. Most statistical packages for the PC can perform this analysis. As a treatise of ANOVA is beyond the scope of the present Guidelines, for further discussion the reader is referred to statistical textbooks, some of which are given in the list of Literature.

Error or uncertainty in the regression line

The "fitting" of the calibration graph is necessary because the response points yᵢ composing the line do not fall exactly on the line. Hence, random errors are implied. This is expressed by an uncertainty about the slope and intercept b and a defining the line. A quantification can be found in the standard deviation of these parameters. Most computer programmes for regression will automatically produce figures for these. To illustrate the procedure, the example of the calibration graph in Section 6.4.4.1 is elaborated here. A practical quantification of the uncertainty is obtained by calculating the standard deviation of the points on the line: the "residual standard deviation" or "standard error of the y-estimate", which we assumed to be constant (but which is only approximately so, see Fig. 6-4):

sy = √( Σ(yᵢ - ŷᵢ)² / (n - 2) )     (6.23)

where = "fitted" y-value for each x i, (read from graph or calculated with Eq. 6.22). Thus, the line. is the (vertical) deviation of the found y-values from

n = number of calibration points. Note: Only the y-deviations of the points from the line are considered. It is assumed that deviations in the x -direction are negligible. This is, of course, only the case if the standards are very accurately prepared. Now the standard deviations for the intercept a and slope b can be calculated with:

sa = sy · √( Σxᵢ² / (n · Σ(xᵢ - x̄)²) )     (6.24)

and
sb = sy / √( Σ(xᵢ - x̄)² )     (6.25)

To make this procedure clear, the parameters involved are listed in Table 6-6. The uncertainty about the regression line is expressed by the confidence limits of a and b according to Eq. (6.9): a ± t·sa and b ± t·sb.

Table 6-6. Parameters for calculating errors due to calibration graph (use also figures of Table 6-5).

  xᵢ     yᵢ     ŷᵢ      yᵢ-ŷᵢ    (yᵢ-ŷᵢ)²
  0.0    0.05   0.037    0.013    0.0002
  0.2    0.14   0.162   -0.022    0.0005
  0.4    0.29   0.287    0.003    0.0000
  0.6    0.43   0.413    0.017    0.0003
  0.8    0.52   0.538   -0.018    0.0003
  1.0    0.67   0.663    0.007    0.0001

Σ(yᵢ - ŷᵢ)² = 0.001364

In the present example, using Eq. (6.23), we calculate:

sy = √(0.001364 / 4) = 0.0185

and, using Eq. (6.24) and Table 6-5:

sa = 0.0132

and, using Eq. (6.25) and Table 6-5:

sb = 0.0219

The applicable ttab is 2.78 (App. 1, two-sided, df = n - 2 = 4), hence, using Eq. (6.9):

a = 0.037 ± 2.78 × 0.0132 = 0.037 ± 0.037

and

b = 0.626 ± 2.78 × 0.0219 = 0.626 ± 0.061

Note that if sa is large enough, a negative value for a is possible, i.e. a negative reading for the blank or zero-standard. (For a discussion about the error in x resulting from a reading in y, which is particularly relevant for reading a calibration graph, see Section 7.2.3.)

The uncertainty about the line is somewhat decreased by using more calibration points (assuming sy has not increased): one more point reduces ttab from 2.78 to 2.57 (see Appendix 1).
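The uncertainty calculation can be appended to the fit sketch given under Section 6.4.4.1; a Python sketch of Eqs. (6.23) to (6.25):

    # Uncertainty of the calibration line (Eqs. 6.23-6.25), continuing
    # the fit of Table 6-5 (b = 0.626, a = 0.037); small rounding
    # differences from the worked example are to be expected.
    import math

    x = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
    y = [0.05, 0.14, 0.29, 0.43, 0.52, 0.67]
    b, a, n = 0.626, 0.037, len(x)

    y_fit = [b * xi + a for xi in x]
    ss_res = sum((yi - yf) ** 2 for yi, yf in zip(y, y_fit))
    s_y = math.sqrt(ss_res / (n - 2))                          # ~0.018

    xm = sum(x) / n
    sxx = sum((xi - xm) ** 2 for xi in x)                      # 0.70
    s_b = s_y / math.sqrt(sxx)                                 # ~0.022
    s_a = s_y * math.sqrt(sum(xi ** 2 for xi in x) / (n * sxx))  # ~0.013

    t_tab = 2.78                                               # App. 1, df = 4
    print(f"a = {a} +/- {t_tab * s_a:.3f}, b = {b} +/- {t_tab * s_b:.3f}")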

7 QUALITY OF ANALYTICAL PROCEDURES


7.1 Introduction
7.2 Calibration graphs
7.3 Blanks and Detection limit
7.4 Types of sample material
7.5 Validation of own procedures
7.6 Drafting an analytical procedure
7.7 Research plan SOPs

7.1 Introduction
In this chapter the actual execution of the jobs for which the laboratory is intended is dealt with. The most important part of this work is of course the analytical procedures meticulously performed according to the corresponding SOPs. Relevant aspects include calibration, use of blanks, performance characteristics of the procedure, and reporting of results. An aspect of utmost importance of quality management, the quality control by inspection of the results, is discussed separately in Chapter 8.

All activities associated with these aspects are aimed at one target: the production of reliable data with a minimum of errors. In addition, it must be ensured that reliable data are produced consistently. To achieve this an appropriate programme of quality control (QC) must be implemented. Quality control is the term used to describe the practical steps undertaken to ensure that errors in the analytical data are of a magnitude appropriate for the use to which the data will be put. This implies that the errors (which are unavoidably made) have to be quantified to enable a decision whether they are of an acceptable magnitude, and that unacceptable errors are discovered so that corrective action can be taken. Clearly, quality control must detect both random and systematic errors. The procedures for QC primarily monitor the accuracy of the work by checking the bias of data with the help of (certified) reference samples and control samples, and the precision by means of replicate analyses of test samples as well as of reference and/or control samples.

7.2 Calibration graphs


7.2.1 Principle
7.2.2 Construction and use
7.2.3 Error due to the regression line
7.2.4 Independent standards
7.2.5 Measuring a batch

7.2.1 Principle
Here, the construction and use of calibration graphs or curves in daily practice of a laboratory will be discussed. Calibration of instruments (including adjustment) in the present context is also referred to as standardization. The confusion about these terms is mainly semantic and the terms calibration curve and standard curve are generally used interchangeably. The term "curve" implies that the line is not straight. However, the best (parts of) calibration lines are linear and, therefore, the general term "graph" is preferred.

For many measuring techniques calibration graphs have to be constructed. The technique is simple and consists of plotting the instrument response against a series of samples with known concentrations of the analyte (standards). In practice, these standards are usually pure chemicals dispersed in a matrix corresponding with that of the test samples (the "unknowns"). By convention, the calibration graph is always plotted with the concentration of the standards on the x-axis and the reading of the instrument response on the y-axis. The unknowns are determined by interpolation, not by extrapolation, so that a suitable working range for the standards must be selected. In addition, in the present discussion it is assumed that the working range is limited to the linear range of the calibration graphs, that the standard deviation does not change over the range (neither of which is always the case*), and that data are normally distributed. Non-linear graphs can sometimes be linearized in a simple way, e.g. by using a log scale (in potentiometry), but usually imply statistical problems (polynomial regression) for which the reader is referred to the relevant literature. It should be mentioned, however, that in modern instruments which make and use calibration graphs automatically these aspects sometimes go by unnoticed.

* This is the so-called "unweighted" regression line. Because normally the standard deviation is not constant over the concentration range (it is usually least in the middle range), this difference in error should be taken into account. This would then yield a "weighted regression line". The calculation of this is more complicated and information about the standard deviation of the y-readings has to be obtained. The gain in precision is usually very limited, but sometimes the extra information about the error may be useful.

Some common practices to obtain calibration graphs are:

1. The standards are made in a solution with the same composition as the extractant used for the samples (with the same dilution factor) so that all measurements are done in the same matrix. This technique is often practised when analyzing many batches where the same standards are used for some time. In this way an incorrectly prepared extractant or matrix may be detected (in blank or control sample).

2. The standards are made in the blank extract. A disadvantage of this technique is that for each batch the standards have to be pipetted. Therefore, this type of calibration is sometimes favoured when only one or few batches are analyzed or when the extractant is unstable. A seeming advantage is that the blank can be forced to zero. However, an incorrect extractant would then more easily go undetected. The disadvantage of pipetting does not apply in case of automatic dispensing of reagents when equal volumes of different concentration are added (e.g. with flow-injection).

3. Less common, but useful in special cases, is the so-called standard additions technique. This can be practised when a matrix mismatch between samples and standards needs to be avoided: the standards are prepared from actual samples. The general procedure is to take a number of aliquots of sample or extract, add different quantities of the analyte to each aliquot (spiking) and dilute to the final volume. One aliquot is used without the addition of the analyte (blank). Thus, a standard series is obtained (a calculation sketch is given below).

If calibration is involved in an analytical procedure, the SOP for this should include a description of the calibration sub-procedure and, if applicable, an optimization procedure (usually given in the instruction manual).

7.2.2 Construction and use


In several laboratories calibration graphs for some analyses are still adequately plotted manually and the straight line (or sometimes a curved line) is drawn with a visual "best fit", e.g. for flame atomic emission spectrometry or colorimetry. However, this practice is only legitimate when the random errors in the measurements of the standards are small: when the scattering is appreciable, the line-fitting becomes subjective and unreliable. Therefore, if a calibration graph is not made automatically by a microprocessor of the instrument, the following more objective and also quantitatively more informative procedure is generally favoured.

The proper way of constructing the graph is essentially the performance of a regression analysis, i.e. the statistical establishment of a linear relationship between the concentration of the analyte and the instrument response, using at least six points. This regression analysis (of reading y on concentration x) yields a correlation coefficient r as a measure for the fit of the points to a straight line (by means of Least Squares).

Warning. Some instruments can be calibrated with only one or two standards. Linearity is then implied but may not necessarily be true. It is useful to check this with more standards.

Regression analysis was introduced in Section 6.4.4 and the construction of a calibration graph was given as an example. The same example is taken up here (and repeated in part) but focused somewhat more on the application. We saw that a linear calibration graph takes the general form:
y = bx + a (6.18; 7.1)

where:
a = intercept of the line with the y-axis
b = slope (tangent)

Ideally, the intercept a is zero: when the analyte is absent, no response of the instrument is to be expected. However, because of interactions, interferences, noise, contaminations and other sources of bias, this is seldom the case. Therefore, a can be considered as the signal of the blank of the standard series. The slope b is a measure for the sensitivity of the procedure: the steeper the slope, the more sensitive the procedure, or the stronger the instrument response yi to a change in concentration x (see also Section 7.5.3). The correlation coefficient r can be calculated by:

r = Σ(xi - x̄)(yi - ȳ) / √[Σ(xi - x̄)² · Σ(yi - ȳ)²]    (6.19; 7.2)

where:
xi = concentrations of standards
x̄ = mean of concentrations of standards
yi = instrument responses to standards
ȳ = mean of instrument responses to standards

The line parameters b and a are calculated with the following equations:
b = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²    (6.20; 7.3)

and
a = ȳ - b·x̄    (6.21; 7.4)

Example of calibration graph

As an example, we take the same calibration graph as discussed in Section 6.4.4.1 (Fig. 6-4): a standard series of P (0-1.0 mg/L) for the spectrophotometric determination of phosphate in a Bray-I extract ("available P"), reading in absorbance units. The data and calculated terms needed to determine the parameters of the calibration graph were given in Table 6-5. The calculations can be done on a (programmed) calculator or, more conveniently, on a PC using a home-made program or, even more conveniently, an available regression program. The calculations yield the equation of the calibration line (plotted in Fig. 7-1):
y = 0.626x + 0.037 (6.22; 7.5)

with a correlation coefficient r = 0.997. As stated previously (6.4.3.1), such high values are common for calibration graphs. When the value is not close to 1 (say, below 0.98) this must be taken as a warning, and it might then be advisable to repeat or review the procedure. Errors may have been made (e.g. in pipetting) or the range used may not be linear. Therefore, to make sure, the calibration graph should always be plotted, either on paper or on the computer monitor.

Fig. 7-1. Calibration graph plotted from data of Table 6-5.

If linearity is in doubt the following test may be applied. Determine for two or three of the highest calibration points the relative deviation of the measured y-value from the calculated line:
deviation (%) = 100 × (yi - ŷi) / ŷi    (7.6)

where ŷi is the response calculated from the regression line.

- If the deviations are < 5%, the curve can be accepted as linear.
- If a deviation is > 5%, the range is decreased by dropping the highest concentration.
- Recalculate the calibration line by linear regression.
- Repeat this test procedure until the deviations are < 5%.

When, as an exercise, this test is applied to the calibration curve of Fig. 7-1 (data in Table 6-5), it appears that the deviations of the three highest points are < 5%; hence the line is sufficiently linear. During calculation of the line the maximum number of decimals is used; rounding off to the last significant figure is done at the end (see instructions for rounding off in Section 8.2).

Once the calibration graph is established, its use is simple: for each y value measured for a test sample (the "unknown") the corresponding concentration x can be determined either by reading from the graph or by calculation using Equation (7.1), or x is automatically produced by the instrument.
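The calculations above can be illustrated with a minimal Python sketch. The absorbance readings below are hypothetical (Table 6-5 is not reproduced here) and are merely chosen to give a line of roughly the form of Eq. (7.5).

```python
# Sketch: least-squares calibration line, correlation coefficient and the
# 5% linearity check of Eq. (7.6). Readings are illustrative assumptions.
import numpy as np

conc = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])        # standards, mg P/L
resp = np.array([0.03, 0.17, 0.29, 0.41, 0.53, 0.66])  # absorbance (hypothetical)

b, a = np.polyfit(conc, resp, 1)       # slope and intercept (Eqs. 7.3, 7.4)
r = np.corrcoef(conc, resp)[0, 1]      # correlation coefficient (Eq. 7.2)
print(f"y = {b:.3f}x + {a:.3f}, r = {r:.4f}")

# Linearity test (Eq. 7.6) on the highest calibration points:
y_fit = b * conc + a
dev_pct = 100 * (resp - y_fit) / y_fit
for c, d in zip(conc[-3:], dev_pct[-3:]):
    print(f"standard {c:.1f} mg/L: deviation {d:+.1f}%",
          "OK" if abs(d) < 5 else "-> drop highest standard and refit")
```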

7.2.3 Error due to the regression line

The "fitting" of the calibration graph is necessary because the actual response points y i, composing the line usually do not fall exactly on the line. Hence, random errors are implied. This is expressed by an uncertainty about the slope and intercept b and a defining the graph. A discussion of this uncertainty is given. It was explained there that the error is expressed by s y , the "standard error of the y-estimate" (see Eq. 6.23, a parameter automatically calculated by most regression computer programs. This uncertainty about the -values (the fitted y -values) is transferred to the corresponding concentrations of the unknowns on the x-axis by the calculation using Eq. (7.1) and can be expressed by the standard deviation of the obtained x-value. The exact calculation is rather complex but a workable approximation can be calculated with:
sx ≈ sy / b    (7.7)

Example

For each value of the standards xi the corresponding ŷi is calculated with Eq. (7.5):

ŷi = 0.626 xi + 0.037    (7.8)

Then, sy is calculated using Eq. (6.23) or by computer, and sx is obtained with Eq. (7.7).

Now, the confidence limits of the found results xf can be calculated with Eq. (6.9):
xf ± t·sx    (7.9)

For a two-sided interval and 95% confidence: ttab = 2.78 (see Appendix 1, df = n - 2 = 4). Hence all results in this example can be expressed as:

xf ± 0.08 mg/L

Thus, for instance, the result of a reading y = 0.22, using Eq. (7.5) to calculate xf = 0.29, can be reported as 0.29 ± 0.08 mg/L. (See also Note 2 below.)

The sx value used can only be approximate as it is taken constant here, whereas in reality this is usually not the case. Yet, in practice, such an approximate estimation of the error may suffice. The general rule is that the measured signal is most precise (least standard deviation) near the centroid of the calibration graph (see Fig. 6-4). The confidence limits can be narrowed by increasing the number of calibration points. Therefore, the reverse is also true: with fewer calibration points the confidence limits of the measurements become wider. Sometimes only two or three points are used. This then usually concerns the checking and restoring of previously established calibration graphs, including those in the microprocessor or computer of instruments. In such cases it is advisable to check the graph regularly with more standards. Make a record of this in the file or journal of the method.
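As a numerical illustration of Eqs. (7.1) and (7.9), the sketch below converts a reading to a concentration with its confidence interval. The value sx ≈ 0.029 mg/L is back-calculated from t·sx = 0.08 as reported above (t = 2.78); the example's sy and sx are not re-derived here.

```python
# Sketch: sample reading -> concentration with 95% confidence interval.
b, a = 0.626, 0.037        # calibration line, Eq. (7.5)
s_x = 0.029                # approximate s of x, so that t * s_x = 0.08
t_tab = 2.78               # two-sided, 95%, df = n - 2 = 4

y_reading = 0.22
x_f = (y_reading - a) / b  # inverse of Eq. (7.1)
ci = t_tab * s_x
print(f"x_f = {x_f:.2f} +/- {ci:.2f} mg/L")   # 0.29 +/- 0.08 mg/L
```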

Note 1. Where the determination of the analyte is part of a procedure with several steps, the error in precision due to this reading is added to the errors of the other steps and as such included in the total precision error of the whole procedure. The latter is the most useful practical estimate of confidence when reporting results. As discussed in Section 6.3.4 a convenient way to do this is by using Equations (6.8) or (6.9) with the mean and standard deviation obtained from several replicate determinations (n> 10) carried out on control samples or, if available, taken from the control charts (see 8.3.2: Control Chart of the Mean). Most generally, the 95% confidence for single values x of test samples is expressed by Equation (6.10):
x ± 2s    (6.10; 7.10)

where s is the standard deviation of the mentioned large number of replicate determinations.

Note 2. The confidence interval of ± 0.08 mg/L in the present example is clearly not satisfactory and calls for inspection of the procedure. Particularly the blank seems to be (much) too high. This illustrates the usefulness of plotting the graph and calculating the parameters. Other traps to catch this error are the Control Chart of the Blank and, of course, the technician's experience.

7.2.4 Independent standards


It cannot be overemphasized that for QC a calibration should always include the measurement of an independent standard or calibration verification standard at about the middle of the calibration range. If the result of this measurement deviates alarmingly from the correct or expected value (say > 5%), then inspection is indicated.

Such an independent standard can be obtained in several ways. Usually it is prepared from pure chemicals by another person than the one who prepared the actual standards. Obviously, it should never be derived from the same stock or source as the actual standards. If necessary, a bottle from another laboratory could be borrowed. In addition, when new standards are prepared, the remainder of the old ones always has to be measured as a mutual check (include this in the SOP for the preparation of standards!).

7.2.5 Measuring a batch


After calibration of the instrument for the analyte, a batch of test samples is measured. Ideally, the response of the instrument should not change during measurement (drift or shift). In practice this is usually the case for only a limited period of time or number of measurements, and regular recalibration is necessary. The frequency of recalibration during measurement varies widely, depending on technique, instrument, analyte, solvent, temperature and humidity. In general, emission and atomizing techniques (AAS, ICP) are more sensitive to drift (or even sudden shift, e.g. by clogging) than colorimetric techniques. Also, the techniques of recalibration and possible subsequent action vary widely. The following two types are commonly practised.

1. Step-wise correction or interval correction

After calibration, at fixed places or intervals (after every 10, 15, 20, or more test samples) a standard is measured. For this, often a standard near the middle of the working range is used (continuing calibration standard). When the drift is within acceptable limits, the measurement is continued. If the drift is unacceptable, the instrument is recalibrated ("resloped") and the previous interval of samples remeasured before continuing with the next interval. The extent of the "acceptable" drift depends on the kind of analysis but in soil and plant analysis usually does not exceed 5%. This procedure is very suitable for manual operation of measurements. When automatic sample changers are used, various options for recalibration and repeating intervals or whole batches are possible.

2. Linear correction or correction by interpolation

Here, too, standards are measured at intervals, usually together with a blank ("drift and wash"), and possible changes are processed by the computer software, which converts the past readings of the batch to the original calibration. Only in case of serious mishap are batches or intervals repeated. A disadvantage of this procedure is that drift is taken to be linear whereas this may not be so. Autoanalyzers, ICP and AAS with automatic sample changers often employ variants of this type of procedure.

At present, instrument software is developing rapidly. Many new features with respect to resloping, correction of carry-over, post-batch dilution and repeating are being introduced by manufacturers. Running ahead of this, many laboratories have developed their own interface software programs meeting their individual demands.
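The decision logic of the step-wise correction can be sketched as follows. The 5% limit is the one mentioned above; the nominal value and the reading of the continuing calibration standard are hypothetical numbers.

```python
# Sketch: acceptance check of a continuing calibration standard.
def drift_check(measured, nominal, limit_pct=5.0):
    """Return True if the interval may be accepted; False if the
    instrument must be resloped and the interval remeasured."""
    drift_pct = 100 * abs(measured - nominal) / nominal
    return drift_pct <= limit_pct

# hypothetical 0.50 mg/L standard read back as 0.54 mg/L (8% drift):
if not drift_check(0.54, 0.50):
    print("drift > 5%: recalibrate and remeasure the previous interval")
```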

7.3 Blanks and Detection limit


7.3.1 Blanks 7.3.2 Detection limit

7.3.1 Blanks
A blank or blank determination is an analysis of a sample without the analyte or attribute, or an analysis without a sample, i.e. going through all steps of the procedure with the reagents only. The latter type is the most common, as samples without the analyte or attribute are often not available or do not exist.

Another type of blank is the one used for calibration of instruments, as discussed in the previous sections. Thus, we may have two types of blank within one analytical method or system:

- a blank for the whole method or system, and
- a blank for analytical subprocedures (measurements) as part of the whole procedure or system.

For instance, in the cation exchange capacity (CEC) determination of soils with the percolation method, two method or system blanks are included in each batch: two percolation tubes with cotton wool or filter pulp and sand or celite, but without sample. For the determination of the index cation (NH4 by colorimetry or Na by flame emission spectroscopy) a blank is included in the determination of the calibration graph. If NH4 is determined by distillation and subsequent titration, a blank titration is carried out for correction of test sample readings.

The proper analysis of blanks is very important because:

1. In many analyses sample results are calculated by subtracting blank readings from sample readings.
2. Blank readings can be excellent monitors in quality control of reagents, analytical processes, and proficiency.
3. They can be used to estimate several types of method detection limits.

For blanks the same rule applies as for replicate analyses: the larger the number, the greater the confidence in the mean. The widely accepted rule in routine analysis is that each batch should include at least two blanks. For special studies where individual results are critical, more blanks per batch may be required (up to eight).

For quality control, Control Charts are made of blank readings identically to those of control samples. The between-batch variability of the blank is expressed by the standard deviation calculated from the Control Chart of the Mean of Blanks; the precision can be estimated from the Control Chart of the Range of Duplicates of Blanks. The construction and use of control charts are discussed in detail in 8.3. One of the main control rules of the control charts, for instance, prescribes that a blank value beyond the mean blank value plus 3 × the standard deviation of this mean (i.e. beyond the Action Limit) must be rejected and the batch repeated, possibly with fresh reagents.

In many laboratories, no control charts are made for blanks. Sometimes, analysts argue that 'there is never a problem with my blank, the reading is always close to zero'. Admittedly, some analyses are more prone to blank errors than others. This, however, is not a valid argument for not keeping control charts. They are made to monitor procedures and to give an alarm when these are out of control (shift) or tend to become out of control (drift). This can happen in any procedure in any laboratory at any time.

From the foregoing discussion it will be clear that signals of blank analyses generally are not zero. In fact, blanks may be found to be negative. This may point to an error in the procedure: e.g. for the zeroing of the instrument an incorrect or a contaminated solution was used, or the calibration graph was not linear. It may also be due to the matrix of the solution (e.g. extractant), and is then often unavoidable.

For convenience, some analysts practice "forcing the blank to zero" by adjusting the instrument. Some instruments even invite or compel analysts to do so. This is equivalent to subtracting the blank value from the values of the standards before plotting the calibration graph. From the standpoint of Quality Control this practice must be discouraged. If zeroing of the instrument is necessary, the use of pure water for this is preferred. However, such general considerations may be overruled by specific instrument or method instructions. This is becoming more and more common practice with modern sophisticated hi-tech instruments. Whatever the case, a decision on how to deal with blanks must be made for each procedure and laid down in the SOP concerned.
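The Action Limit rule above can be sketched in a few lines; the chart mean and standard deviation used here are hypothetical numbers, not values from the text.

```python
# Sketch: check a batch blank against the Action Limit
# (mean of blanks + 3 * standard deviation, as described above).
mean_blank, s_blank = 0.020, 0.005   # hypothetical Control Chart values

def blank_ok(blank_reading):
    """False when the blank exceeds the Action Limit: reject the batch."""
    return blank_reading <= mean_blank + 3 * s_blank

print(blank_ok(0.024))   # True: batch accepted
print(blank_ok(0.040))   # False: repeat batch, possibly with fresh reagents
```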

7.3.2 Detection limit


In environmental analysis and in the analysis of trace elements there is a tendency to accurately measure low contents of analytes. Modern equipment offers excellent possibilities for this. For proper judgement (validation) and selection of a procedure or instrument it is important to have information about the lower limits at which analytes can be detected or determined with sufficient confidence. Several concepts and terms are used, e.g. detection limit, lower limit of detection (LLD), and method detection limit (MDL). The latter applies to a whole method or system, whereas the two former apply to measurements as part of a method.

Note: In analytical chemistry, "lower limit of detection" is often confused with "sensitivity" (see 7.5.3).

Although various definitions can be found, the most widely accepted definition of the detection limit seems to be: 'the concentration of the analyte giving a signal equal to the blank plus 3 × the standard deviation of the blank'. Because in the calculation of analytical results the value of the blank is subtracted (or the blank is forced to zero), the detection limit can be written as:
LLD, MDL = 3 × sbl    (7.11)

At this limit it is 93% certain that the signal is not due to the blank but that the method has detected the presence of the analyte (this does not mean that below this limit the analyte is absent!). Obviously, although generally accepted, this is an arbitrary limit and in some cases the 7% uncertainty may be too high (for 5% uncertainty the LLD = 3.3 × sbl). Moreover, the precision in that concentration range is often relatively low, and the LLD must be regarded as a qualitative limit. For some purposes, therefore, a more elevated "limit of determination" or "limit of quantification" (LLQ) is defined as
LLQ = 2 × LLD = 6 × sbl    (7.12)

or sometimes as
LLQ = 10 × sbl    (7.13)

Thus, if one needs to know or report these limits of the analysis as quality characteristics, the mean of the blanks and the corresponding standard deviation must be determined (validation). The sbl can be obtained by running a statistically sufficient number of blank determinations (usually a minimum of 10, and not excluding outliers). In fact, this is an assessment of the "noise" of a determination.

Note: Noise is defined as the 'difference between the maximum and minimum values of the signal in the absence of the analyte measured during two minutes' (or otherwise according to instrument instructions). The noise of several instrumental measurements can be displayed by using a recorder (e.g. FES, AAS, ICP, IR, GC, HPLC, XRFS). Although this is not often used to actually determine the detection limit, it is used to determine the signal-to-noise ratio (a validation parameter not discussed here) and is particularly useful to monitor noise in case of troubleshooting (e.g. suspected power fluctuations).

If the analysis concerns a one-batch exercise, 4 to 8 blanks are run in this batch. If it concerns an MDL as a validation characteristic of a test procedure used for multiple batches in the laboratory, such as a routine analysis, the blank data are collected from different batches, e.g. the means of duplicates from the control charts.

For the determination of the LLD of measurements where a calibration graph is used, such replicate blank determinations are not necessary, since the value of the blank as well as the standard deviation result directly from the regression analysis (see Section 7.2.3 and Example 2 below).

Examples

1. Determination of the Method Detection Limit (MDL) of a Kjeldahl-N determination in soils

Table 7-1 gives the data obtained for the blanks (means of duplicates) in 15 successive batches of a micro-Kjeldahl N determination in soil samples. Reported are the millilitres 0.01 M HCl necessary to titrate the ammonia distillate and the conversion to results in mg N by: reading × 0.01 × 14.

Table 7-1. Blank data of 15 batches of a Kjeldahl-N determination in soils for the calculation of the Method Detection Limit.
ml HCl   mg N
0.12     0.0161
0.16     0.0217
0.11     0.0154
0.15     0.0203
0.09     0.0126
0.14     0.0189
0.12     0.0161
0.17     0.0238
0.14     0.0189
0.20     0.0273
0.16     0.0217
0.22     0.0308
0.14     0.0189
0.11     0.0154
0.15     0.0203

Mean blank: 0.0199 mg N
sbl: 0.0048 mg N

MDL = 3 × sbl = 0.014 mg N

The MDL reported in this way is an absolute value. Results are usually reported as relative figures such as % or mg/kg (ppm). In the present case, if 1 g of sample is routinely used, the MDL would be 0.014 mg/g or 14 mg/kg or 0.0014%. Note that if one were to use only 0.5 g of sample (e.g. because of a high N content), the MDL as a relative figure is doubled! When results are obtained below the MDL of this example they must be reported as '< 14 mg/kg' or '< 0.0014%'. Reporting '0%' or '0.0%' may be acceptable for practical purposes, but may be interpreted as the element being absent, which is not justified.

Note 1. There are no strict rules for reporting figures below the LLD or LLQ. Most important is that data can be correctly interpreted and used. For this reason uncertainties (confidence limits) and detection limits should be known and reported to clients or users (if only upon request). The advantage of using the "<" sign for values below the LLD or LLQ is that the value 0 (zero) and negative values can be avoided, as they are usually either impossible or improbable. A disadvantage of the "<" sign is that it is a non-numerical character and not suitable in spreadsheet programs for further calculation and manipulation. In such cases the actually found value will be required, but then the inherent confidence restrictions should be known to the user.

Note 2. Because a normal distribution of data is assumed, it can statistically be expected that zero and negative values for analytical results occur when blank values are subtracted from test values equal to or lower than the blank. Clearly, only in few cases are negative values possible (e.g. for adsorption), but for concentrations such values should normally not be reported. Exceptions to this rule are studies involving surveys of attributes or effects. Then it might be necessary to report the actually obtained low results, as otherwise the mean of the survey would be biased.

2. Lower Limit of Detection derived from a calibration graph

We use the calibration graph of Figure 7-1. Then, noting that sbl = sx = 0.6097 and using Equation (7.11), we obtain: LLD = 3 × 0.6097 = 1.829 mg/L. It is noteworthy that "forcing the blank to zero" does not affect the Lower Limit of Detection. Although a (= yb, see Fig. 7-1) may become zero, the uncertainty sy of the calibration graph, and thus of sx and sbl, is not changed by this: the only change is that the "forced" calibration line has moved up and now runs through the intersection of the axes (parallel to the "original" line).
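As an illustration of Example 1, the following sketch recomputes the MDL from the mg N blank values of Table 7-1.

```python
# Sketch: Method Detection Limit from the 15 batch blanks of Table 7-1,
# following MDL = 3 * s_bl (Eq. 7.11).
import statistics

blanks_mg_N = [0.0161, 0.0217, 0.0154, 0.0203, 0.0126, 0.0189, 0.0161,
               0.0238, 0.0189, 0.0273, 0.0217, 0.0308, 0.0189, 0.0154,
               0.0203]

mean_bl = statistics.mean(blanks_mg_N)    # 0.0199 mg N
s_bl = statistics.stdev(blanks_mg_N)      # 0.0048 mg N
mdl = 3 * s_bl                            # 0.014 mg N (absolute)
print(f"mean = {mean_bl:.4f}, s_bl = {s_bl:.4f}, MDL = {mdl:.3f} mg N")
print(f"for a 1 g sample: {mdl * 1000:.0f} mg/kg")   # 14 mg/kg
```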

7.4 Types of sample material


7.4.1 Certified reference material (CRM) 7.4.2 Reference material (RM) 7.4.3 Control sample 7.4.4 Test sample 7.4.5 Spiked sample 7.4.6 Blind sample 7.4.7 Sequence-control sample

Although several terms for different sample types have already been used freely in the previous sections, it seems appropriate to define the various types before the major Quality Control operations are discussed.

7.4.1 Certified reference material (CRM)


A primary reference material or substance, accompanied by a certificate, one or more of whose property values are accurately determined by a number of selected laboratories (with a stated method), and for which each certified value is accompanied by an uncertainty at a stated level of confidence. These materials are usually very expensive and, particularly for soils, hard to come by or not available. For availability, a computerized databank containing information on about 10,000 reference materials can be consulted (COMAR, see Appendix 4).

7.4.2 Reference material (RM)


A secondary reference material or substance, one or more of whose property values are accurately determined by a number of laboratories (with a stated method), and whose values are accompanied by an uncertainty at a stated level of confidence. The origin of the material and the data should be traceable.

In soil and plant analysis RMs are very important, since for many analytes and attributes certified reference materials (CRMs) are not (yet) available. For certain properties a "true" value cannot even be established, as the result is always method-dependent, e.g. CEC and particle-size distribution of soil material. A very useful source of RMs are interlaboratory (round robin) sample and data exchange programmes. The material sent around is analyzed by a number of laboratories and the resulting data offer an excellent reference base, particularly if somehow there is a link with a primary reference material. Since this is often not the case, the data must be handled with care: it may well be that the mean or median value of 50 or more laboratories is "wrong" (e.g. because most use a method with an inadequate digestion step). In some cases different levels of analyte may be imitated by spiking a sample with the analyte (see 7.4.5). However, this is certainly not always possible (e.g. CEC, exchangeable cations, pH, particle-size distribution).

7.4.3 Control sample


An in-house reference sample for which one or more property values have been established by the user laboratory, possibly in collaboration with other laboratories. This is the material a laboratory needs to prepare for second-line (internal) control in each batch, the results of which are plotted on Control Charts. The sample should be sufficiently stable and homogeneous for the properties concerned. The preparation of control samples is discussed in Chapter 8.

7.4.4 Test sample


The material to be analyzed, the "unknown".

7.4.5 Spiked sample


A test material with a known addition of analyte.

The sample is analyzed with and without the spike to test recovery (see 7.5.6). It should be a realistic surrogate with respect to matrix and concentration, and the mixture should be well homogenized. The requirement of a "realistic surrogate" is the main problem with spikes: often the analyte cannot be integrated in the sample in the same manner as the original analyte, and treatments such as digestion or extraction may then not necessarily reflect the behaviour of real samples.

7.4.6 Blind sample


A sample with a known content of the analyte. This sample is inserted by the Head of Laboratory or the Quality Officer in batches at places and times unknown to the analyst. The frequency may vary but, as an indication, one sample in every 10 batches is given. Various types of sample material may serve as blind samples, such as control samples or sufficiently large leftovers of test samples (analyzed several times). In case of water analysis, a solution of the pure analyte, or a combination of analytes, may do. Essential is that the analyst is aware of the possible presence of a blind sample but does not recognize the material as such. Insertion of blind samples requires some attention regarding administration and camouflage. The protocol will depend on the organization of the sample and data stream in the laboratory.

7.4.7 Sequence-control sample


A sample with an extreme content of the analyte (but falling within the working range of the method). It is inserted at random in a batch to verify the correct order of samples. This is particularly useful for long batches in automated analyses. Very effective is the combination of two such samples: one with a high and one with a low analyte content.

7.5 Validation of own procedures


7.5.1 Trueness (accuracy), bias 7.5.2 Precision 7.5.3 Sensitivity 7.5.4 Working range 7.5.5 Selectivity and specificity 7.5.6 Recovery 7.5.7 Ruggedness, robustness 7.5.8 Interferences 7.5.9 Practicability 7.5.10 Validation report

Validation is the process of determining the performance characteristics of a method/procedure or process. It is a prerequisite for judgement of the suitability of produced analytical data for the intended use. This implies that a method may be valid in one situation and invalid in another. Consequently, the requirements for data may, or rather must, decide which method is to be used. When this is ill-considered, the analysis can be unnecessarily accurate (and expensive), inadequate if the method is less accurate than required, or useless if the accuracy is unknown.

Two main types of validation may be distinguished:

1. Validation of standard procedures: the validation of new or existing methods or procedures intended to be used in many laboratories, including procedures (to be) accepted by national or international standardization organizations.

2. Validation of own procedures: the in-house validation of methods or procedures by individual user-laboratories.

The first involves an interlaboratory programme of testing the method by a number (≥ 8) of selected renowned laboratories according to a protocol issued to all participants. The second involves in-house testing of a procedure to establish its performance characteristics or, more specifically, its suitability for a purpose. Since the former is a specialist task, usually (but not exclusively) performed by standardization organizations, the present discussion will be restricted to the second type of validation, which concerns every laboratory.

Validation is not only relevant when non-standard procedures are used but just as well when validated standard procedures are used (to what extent does the laboratory meet the standard validation?) and even more so when variants of standard procedures are introduced. Many laboratories use their own versions of well-established methods or change a procedure for reasons of efficiency or convenience. Fundamentally, any change in a procedure (e.g. sample size, liquid:solid ratio in extractions, shaking time) may affect the performance characteristics and should be validated. For instance, in Section 7.3.2 we noticed that halving the sample size results in doubling the Lower Limit of Detection. Thus, inherent in generating quality analytical data is to support these with a quantification of the parameters of confidence. As such it is part of the quality control.

To specify the performance characteristics of a procedure, a selection (so not necessarily all) of the following basic parameters is determined:

- Trueness (accuracy), bias
- Precision
- Recovery
- Sensitivity
- Specificity and selectivity
- Working range (including MDL)
- Interferences
- Ruggedness or robustness
- Practicability

Before validation can be carried out it is essential that the detailed procedure is available as a SOP.

7.5.1 Trueness (accuracy), bias


One of the first characteristics one would like to know about a method is whether the results reflect the "true" value for the analyte or property. And, if not, can the (un)trueness or bias be quantified and possibly corrected for? There are several ways to find this out, but essentially they are all based on the same principle: the use of an outside reference, directly or indirectly.

The direct method is by carrying out replicate analyses (n ≥ 10) with the method on a (certified) reference sample with a known content of the analyte.

The indirect method is by comparing the results of the method with those of a reference method (or another generally accepted method), both applied to the same sample(s). Another indirect way to verify bias is by having (some) samples analyzed by another laboratory and by participation in interlaboratory exchange programmes. This will be discussed in Chapter 9.

It should be noted that the trueness of an analytical result may be sensitive to varying conditions (level of analyte, matrix, extract, temperature, etc.). If a method is applied to a wide range of materials, for proper validation different samples at different levels of analyte should be used. Statistical comparison of results can be done in several ways, some of which were described in Section 6.4.

Numerically, the trueness (often less appropriately referred to as accuracy) can be expressed using the equation:
trueness (%) = 100 × x̄ / μ    (7.14)

where:
x̄ = mean of test results obtained for the reference sample
μ = "true" value given for the reference sample

Thus, the best trueness we can get is 100%. Bias, more commonly used than trueness, can be expressed as an absolute value by:
bias = x̄ - μ    (7.15)

or as a relative value by:

bias (%) = 100 × (x̄ - μ) / μ    (7.16)

Thus, the best bias we can get is 0 (in units of the analyte) or 0%, respectively.

Example

The Cu content of a reference sample is 34.0 ± 2.7 mg/kg (2.7 = s, n = 12). The results of 15 replicates with the laboratory's own method are the following: 38.0; 34.6; 29.1; 27.8; 40.4; 33.1; 40.9; 28.5; 36.1; 26.8; 30.6; 24.3; 31.6; 22.3; 29.9 mg/kg. With Equation (6.1) we calculate x̄ = 31.6. Using Equation (7.14), the trueness is (31.6/34.0) × 100% = 93%. Using Equation (7.16), the bias is (31.6 - 34.0) × 100% / 34.0 = -7%.

These calculations suggest a systematic error. To see if this error is statistically significant, a t-test can be done. For this, with Equation (6.2) we first calculate s = 5.6. The F-test (see 6.4.2 and 7.5.2) indicates a significant difference in standard deviation, and we have to use the Cochran variant of the t-test (see 6.4.3). Using Equation (6.16) we find tcal = 1.46, and with Eq. (6.17) the critical value ttab* = 2.16, indicating that the results obtained by the laboratory are not significantly different from the reference value (with 95% confidence).

Although a laboratory could be satisfied with this result, the fact remains that the mean of the test results is not equal to the "true" value but somewhat lower. As discussed in Sections 6.4.1 and 6.4.3, the one-sided t-test can be used to test if

this result is statistically on one side (lower or higher) of the reference value. In the present case the one-sided critical value is 1.77 (see Appendix 1), which also exceeds the calculated value of 1.46, indicating that the laboratory mean is not systematically lower than the reference value (with 95% confidence).

At first sight a bias of -7% does not seem insignificant. In this case, however, the wide spread of the laboratory's own data causes the uncertainty about this. If the standard deviation of the results had been the same as that of the reference sample then, using Equations (6.13) and (6.14), tcal would have been 2.58, and with ttab = 2.06 (App. 1) the difference would have been significant according to the two-sided t-test, and with ttab = 1.71 significantly lower according to the one-sided t-test (at 95% confidence).
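The Cu example can be reproduced with the sketch below; the unequal-variance (Cochran-type) t statistic used here is assumed to correspond to Eq. (6.16).

```python
# Sketch: trueness, bias and t-test for the Cu example above.
import math, statistics

ref_mean, ref_s, ref_n = 34.0, 2.7, 12
own = [38.0, 34.6, 29.1, 27.8, 40.4, 33.1, 40.9, 28.5,
       36.1, 26.8, 30.6, 24.3, 31.6, 22.3, 29.9]

m = statistics.mean(own)                     # 31.6 mg/kg
s = statistics.stdev(own)                    # 5.6 mg/kg
trueness = 100 * m / ref_mean                # 93% (Eq. 7.14)
bias_pct = 100 * (m - ref_mean) / ref_mean   # -7% (Eq. 7.16)

t_cal = abs(m - ref_mean) / math.sqrt(s**2 / len(own) + ref_s**2 / ref_n)
print(f"trueness {trueness:.0f}%, bias {bias_pct:.0f}%, t = {t_cal:.2f}")
# t ~ 1.47 here (1.46 in the text, which uses the rounded s = 5.6);
# below t_tab* = 2.16, so not significant at 95% confidence.
```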

7.5.2 Precision
7.5.2.1 Reproducibility 7.5.2.2 Repeatability 7.5.2.3 Within-laboratory reproducibility

Replicate analyses performed on a reference sample, yielding a mean to determine trueness or bias as described above, also yield a standard deviation of the mean as a measure for precision. However, for precision alone, control samples and even test samples can also be used. The statistical test for comparison is done with the F-test, which compares the obtained standard deviation with the standard deviation given for the reference sample (in fact, the variances are compared: Eq. 6.11). Numerically, precision is either expressed by the absolute value of the standard deviation or, more universally, by the relative standard deviation (RSD) or coefficient of variation (CV) (see Equations 6.5 and 6.6):
CV or RSD (%) = 100 × s / x̄    (7.17)

where:
x̄ = mean of test results obtained for the reference sample
s = standard deviation of the test results

If the attained precision is worse than given for the reference sample, it can still be decided that the performance is acceptable for the purpose (which has to be reported as such); otherwise it has to be investigated how the performance can be improved. Like the bias, precision will not necessarily be the same at different concentrations of the analyte or in different kinds of materials. Comparison of precision at different levels of analyte can be done with the F-test: if the variances at a few different levels are similar, then precision is assumed to be constant over the range.

Example

The same example as above for bias is used. The standard deviation of the laboratory is 5.6 mg/kg which, according to Eq. (7.17), corresponds with a precision of (5.6/31.6) × 100% = 18%. (The precision of the reference sample can similarly be calculated as about 8%.)

According to Equation (6.11), the calculated F-value is:

F = 5.6² / 2.7² = 4.3

The critical value is 2.47 (App. 2, two-sided, df1 = 14, df2 = 11); hence the null hypothesis that the two standard deviations belong to the same population is rejected: there is a significant difference in precision (at the 95% confidence level).
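A minimal sketch of this F-test, using the values of the Cu example:

```python
# Sketch of the F-test above (Eq. 6.11): larger variance in the numerator.
s_lab, s_ref = 5.6, 2.7
F = (s_lab / s_ref) ** 2          # 4.3
F_crit = 2.47                     # App. 2, two-sided, df1 = 14, df2 = 11
verdict = "significant" if F > F_crit else "not significant"
print(f"F = {F:.1f} vs {F_crit}: difference in precision {verdict}")
```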

Types of precision

The above description of precision leaves some uncertainty about the actual execution of its determination. Because precision in particular is sensitive to the way it is determined, some specific types of precision are distinguished, and it should therefore always be reported what type is involved.

7.5.2.1 Reproducibility
The measure of agreement between results obtained with the same method on identical test or reference material under different conditions (execution by different persons, in different laboratories, with different equipment and at different times). The measure of reproducibility R is the standard deviation of these results sR, and for a not too small number of data (n ≥ 8) R is defined by (with 95% confidence):
R = 2.8 sR (7.18)

(where 2.8 ≈ 2√2 and is derived from the normal or Gaussian distribution; ISO 5725). Thus, reproducibility is a measure of the spread of results when a sample is analyzed by different laboratories. If a method is sensitive to different ways of execution or conditions (low robustness, see 7.5.7), then the reproducibility will reflect this. This parameter can obviously not be verified in daily practice. For that purpose the next two parameters are used (repeatability and within-laboratory reproducibility).

7.5.2.2 Repeatability
The measure of agreement between results obtained with the same method on identical test or reference material under the same conditions (job done by one person, in the same laboratory, with the same equipment, at the same time or with only a short time interval). Thus, this is the best precision a laboratory can obtain: the within-batch precision. The measure for the repeatability r is the standard deviation of these results sr, and for a not too small number of data (n ≥ 10) r is defined by (with 95% confidence):
r = 2.8 sr (7.19)

7.5.2.3 Within-laboratory reproducibility


The measure of agreement between results obtained with the same method on identical test material under different conditions (execution by different persons, with the same or different equipment, in the same laboratory, at different times). This is a more realistic type of precision for a method over a longer span of time when conditions are more variable than defined for repeatability. The measure is the standard deviation of these results sL (also called between-batch precision). The within-laboratory reproducibility RL is calculated by:
RL = 2.8 sL (7.20)

The between-batch precision can be estimated in three different ways:

1. As the standard deviation of a large number (n ≥ 50) of duplicate determinations carried out by two analysts:

sL = √( Σdi² / 2k )    (7.21)

where:
si = the standard deviation of each pair of duplicates
k = number of pairs of duplicates
di = difference between duplicates within each pair

2. Empirically, as 1.6 × sr. Then RL = 2.8 × 1.6 × sr, or:
RL = 1.6 r (7.22)

where r is the repeatability as defined above.

3. The most practical and realistic expression of the within-laboratory reproducibility is the one based on the standard deviation obtained for control samples during routine work. The advantage is that no extra work is involved: control samples are analyzed in each batch, and the within-laboratory standard deviation is calculated each time a control chart is completed (or sooner if desired, say after 10 batches). The calculation is then:
RL = 2.8 scc (7.23)

where scc is the standard deviation obtained from a Control Chart (see 8.3.2).

Clearly, the above three RL values are not identical and thus, whenever the within-laboratory reproducibility is reported, the way in which it was obtained should always be stated.

Note: Naturally, instead of reporting the derived validation parameters for precision R, r, or RL, one may prefer to report their primary measure: the standard deviation concerned.
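A small sketch of estimates 1 and 3 above; the duplicate differences and the control-chart standard deviation are hypothetical numbers (and far fewer pairs are used than the n ≥ 50 recommended in the text).

```python
# Sketch: within-laboratory reproducibility via Eq. (7.21) and Eq. (7.23).
import math

d = [0.3, -0.1, 0.2, 0.4, -0.2, 0.1, 0.0, 0.3]   # duplicate differences (demo)
s_dup = math.sqrt(sum(x**2 for x in d) / (2 * len(d)))
RL_1 = 2.8 * s_dup                                # via Eqs. (7.21) and (7.20)

s_cc = 0.25                                       # hypothetical control-chart s
RL_3 = 2.8 * s_cc                                 # Eq. (7.23)
print(f"RL (duplicates) = {RL_1:.2f}, RL (control chart) = {RL_3:.2f}")
```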

7.5.3 Sensitivity
This is a measure for the response y of the instrument or of a whole method to the concentration C of the analyte or property, e.g. the slope of the analytical calibration graph (see Section 7.2.2). It is the value that is required to quantify the analyte on the basis of the analytical signal. The sensitivity for the analyte in the final sample extract may not necessarily be equal to the sensitivity for the analyte in a simple standard solution: matrix effects may cause improper calibration of the measuring step of the analytical method.

As observed earlier for calibration graphs, the sensitivity may not be constant over a long range. It usually decreases at higher concentrations by saturation of the signal. This limits the working range (see next Section 7.5.4). Some of the most typical situations are exemplified in Figure 7-2.

Fig. 7-2. Examples of some typical response graphs. 1. Constant sensitivity. 2. Sensitivity constant over the lower range, then decreasing. 3. Sensitivity decreasing over the whole range. (See also 7.5.4.)

In general, on every point of the response graph the sensitivity can be expressed by
S = dy/dC    (7.24)

The dimension of S depends on the dimensions of y and C. In atomic absorption, for example, y is expressed in absorbance units and C in mg/L. For pH and ion-selective electrodes, the response of the electrode is expressed in mV and the concentration in mg/L or moles (plotted on a log scale). Often, for convenience, the signal is converted and amplified to a direct reading in arbitrary units, e.g. concentration. However, for proper expression of the sensitivity, this derived response should be converted back to the direct response. In practice, for instance, this is simply done by making a calibration graph in the absorbance mode of the instrument as exemplified in Figure 7-1, where slope b is the sensitivity of the P measurement on the spectrophotometer. If measured in the absorption (or transmission) mode, plotting should be done with a logarithmic y-axis.
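A minimal numerical sketch of Eq. (7.24): the local slope of a hypothetical response graph of the saturating type in Fig. 7-2, estimated with finite differences.

```python
# Sketch: point-wise sensitivity S = dy/dC along a response graph.
import numpy as np

C = np.array([0.0, 2.0, 4.0, 8.0, 16.0])       # concentration (illustrative)
y = np.array([0.00, 0.25, 0.48, 0.80, 1.05])   # instrument response

S = np.gradient(y, C)                          # local slope at each point
for c, s in zip(C, S):
    print(f"C = {c:4.1f}: S = {s:.3f}")        # sensitivity decreases with C
```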

7.5.4 Working range


For most analytical methods the working range is known from previous experience. When introducing a new method or measuring technique this range may have to be determined. This can be done during validation by attempting to span a (too) wide range, for instance by using several sample sizes or liquid:sample ratios, or by spiking samples (see 7.5.6, Recovery). This practice is particularly important for determining the upper limit of the working range (the lower limit of a working range corresponds with the Method Detection Limit and was discussed in Section 7.3.2). The upper limit is often determined by such factors as saturation of the extract (e.g. the "free" iron or gypsum determinations) or by depletion of a solution in case of adsorption procedures (e.g. phosphate adsorption; cobaltihexamine or silver thiourea adsorption in single-extraction CEC methods). In such cases the liquid:sample ratio has to be adapted.

To determine the measuring range of solutions, the following procedure can be applied (a rough numerical sketch follows below):

- Prepare a standard solution of the analyte in the relevant matrix (e.g. extractant) at a concentration beyond the highest expected concentration.
- Measure this solution and determine the instrument response.
- Dilute this standard solution 10× with the matrix solution and measure again.
- Repeat dilution and measuring until the instrument gives no response.
- Plot the response vs. the concentration.
- Estimate the useful part of the response graph.

(If the dilution steps are too large to obtain a reliable graph, they need to be reduced, e.g. to 5×.)

In Figure 7-2 the useful parts of graphs 1 and 2 are obviously the linear parts (and for graph 2 perhaps up to concentration 8 if necessary). Sometimes a built-in curve corrector for the linearization of curved calibration plots can extend the range of application (e.g. in AAS). Graph 3 has no linear part but must and can still be used. A logarithmic plotting may be considered, and in some cases an equation may be calculated by non-linear (polynomial) regression. It has to be decided on practical grounds what concentration can be accepted until the decreasing sensitivity renders the method inappropriate (with the knowledge that flat or even downward bending ranges are useless in any case).
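The following rough sketch evaluates such a dilution series. The responses and the 50% sensitivity cut-off are illustrative assumptions only, not a prescribed criterion; the decision remains a practical one, as stated above.

```python
# Sketch: flag where the local sensitivity of a dilution series has
# dropped markedly below the initial slope (illustrative cut-off: 50%).
import numpy as np

C = np.array([0.1, 1.0, 10.0, 100.0, 1000.0])   # 10x dilution series
y = np.array([0.002, 0.02, 0.19, 1.2, 1.9])     # responses (hypothetical)

slopes = np.diff(y) / np.diff(C)                # segment sensitivities
rel = slopes / slopes[0]                        # relative to initial slope
for c_hi, f in zip(C[1:], rel):
    print(f"up to C = {c_hi:6.1f}: relative sensitivity {f:.2f}",
          "(useful)" if f > 0.5 else "(beyond working range)")
```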

7.5.5 Selectivity and specificity


The measurement of an analyte may be disturbed by the presence of other components. The measurement is then non-specific for the analyte under investigation. An analytical method is "fully specific" when it gives an analytical signal exclusively for one particular component, but is "dead" for all other components in the sample, e.g. when a reagent forms a coloured complex with only one analyte. A method is "fully selective" when it produces correct analytical results for various components of a mixture without any mutual interaction of the components, e.g. when a reagent forms several coloured complexes with components in the matrix but with a different colour for each component. A selective method is composed of a series of specific measurements.

Mutual influences are common in analytical techniques but can often easily be overcome. An example is ionization interference reducing the specificity in flame spectrometric techniques (FES, AAS). The selectivity is no problem, as the useful spectral lines can be selected exactly with a monochromator or filters. The mutual interference can be suppressed by adding an excess of an easily ionizable element, such as cesium, which keeps the electron concentration in the flame constant. In chromatographic techniques (GC, HPLC) specificity is sometimes a problem in the analysis of complex compounds. In the validation report, selectivity and specificity are usually described rather than quantitatively expressed.

7.5.6 Recovery
To determine the effectiveness of a method (and also of the working range), recovery experiments can be carried out. Recovery can be defined as the 'fraction of the analyte determined after addition of a known amount of the analyte to a sample'. In practice, control samples are most commonly used for spiking. The sample as well as the spikes are analyzed at least 10 times, the results are averaged and the relative standard deviation (RSD) calculated. For in-house validation the repeatability (replicates in one batch, see 7.5.2.2) is determined, whereas for quality control the within-laboratory reproducibility (replicates in different batches, see 7.5.2.3) is determined and the data recorded on Control Charts.

The concentration level of the spikes depends on the purpose: for routine control work the level(s) will largely correspond with those of the test samples (recoveries at different levels may differ); a concentration midway the working range is a convenient choice. For the determination of a working range, a wide range may be necessary, at least to start with (see 7.5.4). An example is the addition of ammonium sulphate in the Kjeldahl nitrogen determination. Recovery tests may reveal a significant bias in the method used and may prompt a correction factor to be applied to the analytical results. The recovery is calculated with:

recovery (%) = 100 × (xs - x) / xadd    (7.25)

where:
xs = mean result of spiked samples
x = mean result of unspiked samples
xadd = amount of added analyte

If a blank (sample) is used for spiking, the mean result of the unspiked sample will generally be close to zero. In fact, such replicate analyses could be used to determine or verify the method detection limit (MDL, see 7.3.2). As has been mentioned before (Section 7.4.5), the recovery obtained with a spike may not be the same as that obtained with real samples, since the analyte may not be integrated in the spiked sample in the same manner as in real samples. Also, the form of the analyte with which the spike is made may present a problem: different compounds and grain sizes representing the analyte may behave differently in an analysis.
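A one-line application of Eq. (7.25), with hypothetical means for a control sample spiked with 10 mg/kg of analyte:

```python
# Sketch of the recovery calculation (Eq. 7.25); numbers are illustrative.
x_spiked = 52.1    # mean of spiked sample (>= 10 replicates)
x_unspiked = 43.0  # mean of unspiked sample
x_added = 10.0     # amount of analyte added

recovery_pct = 100 * (x_spiked - x_unspiked) / x_added
print(f"recovery = {recovery_pct:.0f}%")   # 91%
```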

7.5.7 Ruggedness, robustness


An analytical method is rugged or robust if results are not (very) sensitive to variations in the experimental conditions. Such conditions can be temperature, extraction or shaking time, shaking technique, pH, purity of reagents, moisture content of sample, sample size, etc. Usually, when a new method is proposed, the ruggedness is first tested by the initiating laboratory and subsequently in an interlaboratory trial.

The ruggedness test is conveniently done with the so-called "Youden and Steiner partial factorial design", in which seven factors can be varied and analyzed in only eight replicate analyses. This efficient technique can also be used for within-laboratory validation. As an example the ammonium acetate CEC determination of soil will be taken. The seven factors could for instance be:

A: With (+) and without (-) addition of 125 mg CaCO3 to the sample (corresponding with 5% CaCO3 content)
B: Concentration of saturating solution: 1 M (+) and 0.5 M (-) NH4OAc
C: Extraction time: 4 hours (-) and 8 hours (+)
D: Admixture of sea-sand (or celite): with (+) and without (-) 1 teaspoon of sand
E: Washing procedure: 2× (-) or 3× (+) with ethanol 80%
F: Concentration of washing ethanol: 70% (-) or 80% (+)
G: Purity of NH4OAc: technical grade (-) and analytical grade (+)

The matrix of the design is shown in Table 7-2. The eight subsamples are analyzed basically according to the SOP of the method. The variations in the SOP are indicated by the + or - signs denoting the high or low level, presence or absence of a factor, or otherwise stated conditions to be investigated. The eight obtained analytical results are Yi. Thus, sample (experiment) no. 1 receives all treatments A to G indicated with (+), sample no. 2 receives treatments A, B and D indicated by (+) and C, E, F and G indicated by (-), etc.

Table 7-2. The partial factorial design (seven factors) for testing ruggedness of an analytical method
Experiment   A  B  C  D  E  F  G   Results
1            +  +  +  +  +  +  +   Y1
2            +  +  -  +  -  -  -   Y2
3            +  -  +  -  +  -  -   Y3
4            +  -  -  -  -  +  +   Y4
5            -  +  +  -  -  +  -   Y5
6            -  +  -  -  +  -  +   Y6
7            -  -  +  +  -  -  +   Y7
8            -  -  -  +  +  +  -   Y8

The absolute effect (bias) of each factor A to G can be calculated as follows:

Effect = (ΣYA+ - ΣYA-) / 4    (7.26)

where

YA+ = sum of results Yi where factor A has the + sign (i.e. Y1 + Y2 + Y3 + Y4; n = 4)
YA- = sum of results Yi where factor A has the - sign (i.e. Y5 + Y6 + Y7 + Y8; n = 4)

The test for significance of the effect can be done in two ways:

1. With a t-test (6.4.3), using in principle the table with "two-sided" critical t-values (App. 1, n = 4). When clearly an effect in one direction is to be expected, the one-sided test is applicable.

2. By checking if the effect exceeds the precision of the original procedure (i.e. if the effect exceeds the noise of the procedure). Most realistic and practical in this case would be to use scc, the within-laboratory standard deviation taken from a control chart (see Sections 7.5.2.3 and 8.3.2). The standard deviation of the mean of four measurements can then be taken as scc/√4 = scc/2 (see 6.3.4), and the standard deviation of the difference between two such means (i.e. the standard deviation of the effect calculated with Eq. 7.26) as √(scc²/4 + scc²/4) = scc/√2 ≈ 0.71 scc. The effect of a factor can be considered significant if it exceeds 2× the standard deviation of the effect. Therefore, the effect is significant when:
Effect > 1.4 × scc    (7.27)

where scc is the standard deviation of the original procedure taken from the last complete control chart (note that 2 × scc/√2 ≈ 1.4 × scc).

Note. Obviously, when this standard deviation is not available, such as in the case of a new method, another type of precision has to be used, preferably the within-laboratory reproducibility (see 7.5.2).

It is not always possible or desirable to vary seven factors. However, the discussed partial factorial design does not allow a reduction of the number of factors. At most, one (imaginary) factor can be considered in advance to have a zero effect (e.g. the position of the moon). In that case, the design is the same as given in Table 7-2 but omitting factor G. For studying only three factors a design is also available. This is given in Table 7-3.

Table 7-3. The partial factorial design (three factors) for testing ruggedness of an analytical method
Experiment   A  B  C   Results
1            +  +  +   Y1
2            -  +  -   Y2
3            +  -  -   Y3
4            -  -  +   Y4

The absolute effect of the factors A, B, and C can be calculated as follows:

Effect = (ΣYA+ - ΣYA-) / 2    (7.28)

where:
YA+ = sum of results Yi where factor A has the + sign (i.e. Y1 + Y3; n = 2)
YA- = sum of results Yi where factor A has the - sign (i.e. Y2 + Y4; n = 2)

The test for significance of the effect can be done similarly as described above for the seven-factor design, with the difference that here n = 2.

If the relative effect has to be calculated (for instance for use as a correction factor), this must be done relative to the result of the original factor. Thus, in the above example of the CEC determination, if one is interested in the effect of reducing the concentration of the saturating solution (Factor B), the "reference" values are those obtained with the 1 M solution (denoted with + in column B) and the relative effect can be calculated with:
Relative effect (%) = 100 × (ΣYB+ - ΣYB-) / ΣYB+    (7.29)

The confidence of the results of partial factorial experiments can be increased by running duplicates or triplicates, as discussed in Section 6.3.4. This is particularly useful here, since possible outliers may erroneously be interpreted as a "strong effect".

Often a laboratory wants to check the influence of one factor only. Temperature is a factor which is particularly difficult to control in some laboratories, or is sometimes needlessly controlled at high cost simply because it is prescribed in the original method (but perhaps never properly validated).

The very recently published standard procedure for determining the particle-size distribution (ISO 11277) has not been validated in an interlaboratory trial. The procedure prescribes the use of an end-over-end shaker for dispersion. If up to now a reciprocating shaker has been used and the laboratory decides to adopt the end-over-end shaker, then in-house validation is indicated and a comparison between the two shaking techniques must be made and documented. If it is decided, after all, to continue with the reciprocating shaking technique (e.g. for practical reasons), then the laboratory must be able to show the influence of this step to users of the data. Such validation must include all soil types to which the method is applied.

The effect of a single factor can simply be determined by conducting a number of replicate analyses (n ≥ 10) with and without the factor, or at two levels of the factor, and comparing the results with the F-test and t-test (see 6.4). Such a single effect may thus be expressed in terms of bias and precision.
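The effect calculation (Eq. 7.26) and significance check (Eq. 7.27) for the seven-factor design can be sketched as follows. The results Y1-Y8 and scc are hypothetical numbers; the +runs per factor follow Table 7-2 as reconstructed above.

```python
# Sketch: factor effects in the Youden & Steiner design of Table 7-2.
design = {                      # experiments where each factor has the + sign
    "A": [1, 2, 3, 4], "B": [1, 2, 5, 6], "C": [1, 3, 5, 7],
    "D": [1, 2, 7, 8], "E": [1, 3, 6, 8], "F": [1, 4, 5, 8],
    "G": [1, 4, 6, 7],
}
Y = {1: 24.1, 2: 23.8, 3: 24.5, 4: 23.9,    # hypothetical CEC results,
     5: 22.9, 6: 24.2, 7: 23.3, 8: 24.0}    # cmol(+)/kg
s_cc = 0.30                                 # hypothetical control-chart s

for factor, plus_runs in design.items():
    y_plus = sum(Y[i] for i in plus_runs) / 4
    y_minus = sum(v for i, v in Y.items() if i not in plus_runs) / 4
    effect = y_plus - y_minus               # Eq. (7.26)
    sig = "significant" if abs(effect) > 1.4 * s_cc else "not significant"
    print(f"factor {factor}: effect {effect:+.2f} ({sig})")   # Eq. (7.27)
```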

7.5.8 Interferences
Many analytical methods are to a greater or lesser extent susceptible to interferences of various kinds, and proper validation should include documentation of such influences. Most prominent are matrix effects, which may either reduce or enhance analytical results (and are thus a form of reduced selectivity). Ideally, such interferences are quantified as bias and corrected for, but often this is a tedious affair or even impossible.

Matrix effects can be quantified by conducting replicate analyses at various levels and with various compositions of (spiked) samples, or they can be nullified by imitating the test sample matrix in the standards, e.g. in X-ray fluorescence spectroscopy. However, the matrix of test samples is often unknown beforehand. A practical qualitative check in such a case is to measure the analyte at two levels of dilution: usually the signal of the analyte and that of the interference are not proportional.

Other well-known interferences are, for example, the dark colour of extracts in the colorimetric determination of phosphate, and the presence of salts, lime, or gypsum in the CEC determination. A colour interference may be avoided by measuring at another wavelength (in the case of phosphate: try 880 nm). Sometimes the only way to avoid interference is to use another method of analysis. If it is thought that an interference can be singled out and determined, it can be quantified as indicated for ruggedness in the previous section.

7.5.9 Practicability
When a new method is proposed, or when there is a choice of methods for a determination, it is useful to have an indication of how easy or tedious the method is to apply. Usually the practicability can be derived from the detailed description of the procedure. The problems are in most cases related to the availability and maintenance of certain equipment and to the required staff or skills. Also, the supply of required parts and reagents is not always assured, nor is an uninterrupted supply of stable power. In some countries, for instance, high-purity reagent grades cannot always be obtained, some chemicals cannot be kept (e.g. sodium pyrophosphate in a hot climate), and even the supply of a seemingly common reagent such as ethanol can be a problem. If such limitations are known, they should be mentioned in the relevant SOPs or validation report.

7.5.10 Validation report


The results of validation tests should be recorded in a validation report from which the suitability of a method for a certain purpose can be deduced. If (legal) requirements for specific analyses are known (e.g. in the case of toxic compounds), then such information may be included. Since validation is a kind of research project, the report should have a comparable format. A plan is usually initiated by the head of laboratory, drafted by the technician involved and verified by the head. The general layout of the report should include:
- Parameters to be validated
- Description of the procedures (with reference to relevant SOPs)
- Results
A model for a validation SOP is given (VAL 09-2).

7.6 Drafting an analytical procedure


For drafting an analytical procedure the general instructions for drafting SOPs as given in Chapter 2 apply. An example of an analytical procedure as it can be written in the form of a SOP is METH 006. A laboratory manual of procedures, the "cookery book", can be made by simply collecting the SOPs for all procedures in a ring binder. Because analytical procedures, more than any other type of SOP, directly determine the product of a laboratory, some specific aspects relating to them are discussed here.

As was outlined in Chapter 2, instructions in SOPs should be written in such a way that no misunderstanding or ambiguity exists as to the execution of the procedure. Thus, much of the responsibility (though not all) lies with the author of the procedure. Even if the author and user are one and the same person, which should normally be the case (see 2.2), such misunderstanding may be propagated since the author usually draws on the literature or on documents written by someone else. Therefore, although instructions should be as brief as possible, they should at the same time be as extensive as necessary. As an example we take the weighing of a sample, a common instruction in many analytical procedures. Such an instruction could read:

1. Weigh 5.0 g of sample into a 250 ml bottle.
2. Add 100 ml of extracting solution and close bottle.
3. Shake overnight.
4. Etc., etc.

Comment 1
According to general analytical practice the amount of 5.0 g means "an amount between and including 4.95 g and 5.05 g" (4.95 ≤ weight ≤ 5.05), since less than 4.95 would round to 4.9 and more than 5.05 would round to 5.1 (note that 5.05 rounds to 5.0 and not to 5.1). Some analysts, particularly students and trainees, take the amount of 5.0 g too literally and set out on a lengthy process of adding and subtracting sample material until the balance reads "5.0" or perhaps even "5.00". Not only is this procedure tedious, the sample may also become biased, as particles of different size tend to segregate during this process. To prevent such an interpretation, the prefixes "approximately", "approx." or "ca." (circa) are often used, e.g. "approx. 5.0 g". As this, in turn, introduces a seeming contradiction between "5.0" (with a decimal, so quite accurate) and "approx." ('it doesn't matter all that much'), the desired accuracy must be stated: "weigh approx. 5.0 g (accuracy 0.01 g) into a 250 ml bottle". The notation 5.0 g can be replaced by 5 g when the sample size is less critical (in the present case, for instance, if the sample : liquid ratio is not very critical). Sometimes it may even be possible to use "weigh 3 - 5 g of sample (accuracy 0.1 g)". The accuracy needs to be stated when the actual sample weight is used in the calculation of the final result; otherwise it may be omitted.

Comment 2
The "sample" needs to be specified. A convenient and correct way is to make reference to a SOP where the preparation of the sample material is described. This is the more formal version of the common practice in many laboratories where the use of the sample is implied and its preparation is described elsewhere in the laboratory manual of analytical procedures. In any case, there should be no doubt about the sample material to be used. When material other than the usual "laboratory sample" or "test sample" is used, the preparation must be described and the nature indicated, e.g. "field-moist fine earth", "fraction > 2 mm" or "nodules". When drafting a new procedure or one's own version of a standard procedure, it must be considered whether the moisture content of the sample used is relevant for the final result. If so, a moisture correction factor should be part of the calculation step. In certain cases where the sample contains a considerable amount of water (moist highly humic samples; andic material), this water will influence the soil : liquid ratio in certain extraction or equilibration procedures. Validation of such procedures is then indicated.

Comment 3
The "250 ml bottle" needs to be specified also. This is usually done in the section "Apparatus and glassware" of the SOP. If, in general, materials are not specified, then it is implied that the type is unimportant for the procedure. However, in shaking procedures the kind, size and shape of bottles may have a significant influence on the results. In addition, the kind (composition) of glass is sometimes critical, e.g. for the boron determination.

Comment 4
To the instruction "Add 100 ml of extracting solution" the same considerations apply as discussed for the sample weighing. The accuracy needs to be specified, particularly when automatic dispensers are used. The accuracy may be implicit if the equipment to be used is stated, e.g. "add 100 ml solution by graduated pipette" or "volumetric pipette" or "with a 100 ml measuring cylinder". If another means of adding the solution is preferred, its accuracy should equal or exceed that of the stated equipment.

Comment 5
The instruction "shake overnight" is ambiguous. It must be known that "overnight" is equivalent to "approximately 16 hrs", namely from 5 p.m. till 9 a.m. the next morning. It is implied that this time-span is not critical, but generally the deviation should not be more than, say, two hours. In case of doubt, this should be validated with a ruggedness test. More critical in many cases is the term "shake", as this can be done in many different ways. In the section "Apparatus" of the SOP the type of shaking machine is stated, e.g. reciprocating shaker or end-over-end shaker. For the reciprocating shaker the instruction should include the shaking frequency (in strokes per minute), the amplitude (in mm or cm) and the position of the bottles (standing up, lying length-wise or perpendicular to the shaking direction). For an end-over-end shaker usually only the frequency or speed (in rpm) is relevant.

7.7 Research plan


All laboratories, including those destined for routine work, carry out research in some form. For many laboratories it constitutes the main activity. Research may range from a simple test of an instrument or a change in procedure, to large projects involving many aspects and several departments of an institute, much staff and money, often commissioned by third parties (contract research, sponsors). For any project of appreciable size, according to GLP the management of the institute must appoint a study director before the study is initiated. This person is responsible for the planning and execution of the job. He/she is responsible to a higher Inspecting Authority (IA), which may be the institute's management, the Quality Assurance Unit, the Head of Research or the like, as established by the management. A study project can be subdivided into four phases: preparation, execution, reporting, and filing/archiving.

1. Preparation
In this phase the purpose and plan are formulated and approved by the IA. Any subsequent changes are documented and communicated to the IA. The plan must include:

- Descriptive title, purpose, and identification details
- Study director and further personnel
- Sponsor or client
- Work plan with starting date and duration
- Materials and methods to be used
- Study protocol and SOPs (including statistical treatment of data)
- Protocols for interim reporting and inspection
- Way of reporting and filing of results
- Authorization by the management (i.e. signature)

A work plan or subroutines can often be clarified by means of a flow diagram. Some of the most used symbols in flow diagrams for procedures in general, including analytical procedures, are given in Figure 7-3. An example of a flow sheet for a research plan is given in Fig. 7-4.

Fig. 7-3. Some common symbols for flow diagrams.

2. Execution of the work
The work must be carried out according to the plan, protocols and SOPs. All observations must be recorded, including errors and irregularities. Changes of plan have to be reported to the IA and, if there are budgetary implications, also to the management. The study director must have control of, and be informed about, the progress of the work and, particularly in larger projects, be prepared for inspection by the IA.

Fig. 7-4. Design of flow diagram for study project.

3. Reporting
As soon as possible after completion of the experimental work and verification of the quality control data, the results are calculated. Together with a verification statement of the IA, possibly after corrections have been made, the results can be reported. The copyright and authorship of a possible publication should have been arranged in the plan. The report should contain all information relevant for the correct interpretation of the results. To keep a report digestible, the procedures used may be given in abbreviated form with reference to the original protocols or SOPs. Sometimes relevant information turns up afterwards (e.g. calculation errors). Naturally, this should be reported, even if the results have already been used. It is useful and often rewarding if, after completion of a study project, an evaluation is carried out by the study team. In this way the next job may be performed better.

SOPs
VAL 09-2 - Validation of CEC determination with NH4OAc
METH 006 - Determination of nitrogen in soil with micro-Kjeldahl

VAL 09-2 - Validation of CEC determination with NH4OAc


LOGO STANDARD OPERATING PROCEDURE No.: VAL 09-2 Version: 1 Page: 1 of 2 Date: 96-09-19 File:

Title: Validation of CEC determination with NH4OAc (pH 7)

1 PURPOSE
To determine the performance characteristics of the CEC determination with ammonium acetate (pH 7) using the mechanical extractor. The following parameters have been considered: bias, precision, working range, ruggedness, interferences, practicability.

2 REQUIREMENTS
See SOP METH 09-2 (Cation Exchange Capacity and Exchangeable Bases with ammonium acetate and mechanical extractor).

3 PROCEDURES
3.1 Analytical procedure
The basic procedure followed is described in SOP METH 09-2, with variations and numbers of replicates as indicated below. Two Control Samples have been used: LABEX 6, a Nitisol (clay 65%; CEC 20 cmolc/kg) and LABEX 2, an Acrisol (clay 25%; CEC 7 cmolc/kg); further details of these control samples are given in SOP RF 031 (List of Control Samples).

3.2 Bias
The CEC was determined 10 times on both control samples. Reference is the mean value for the CEC obtained on these samples by 19 laboratories in an interlaboratory study.

3.3 Precision

Obtained from the replicates of 3.2.

3.4 Working range
The Method Detection Limit (MDL) was calculated from 10 blank determinations. Determination of the Upper Limit is not relevant (percolates beyond the calibration range are rare and can be brought within range by dilution).

3.5 Ruggedness
A partial factorial design with seven factors was used. The experiments were carried out in duplicate and the factors varied are as follows:
A: With (+) and without (-) addition of 125 mg CaCO3 (corresponding with 5% CaCO3 content)
B: Concentration of saturating solution: 1 M (+) and 0.5 M (-) NH4OAc
C: Extraction time: 4 hours (-) and 8 hours (+)
D: Admixture of seasand (or celite): with (+) and without (-) 1 teaspoon of sand
E: Washing procedure: 2 times (-) or 3 times (+) with ethanol 80%
F: Concentration of ethanol for washing free of salt: 70% (-) or 80% (+)
G: Purity of NH4OAc: technical grade (-) and analytical grade (+)

3.6 Interferences
Two factors particularly interfere in this determination: 1. high clay content (problems with efficiency of percolation), and 2. presence of CaCO3 (competing with the saturating index cation). The first was addressed by the difference in clay content of the two samples as well as by Factor D in the ruggedness test, the second by Factor A of the ruggedness test.

3.7 Practicability
The method is famous for its wide application and ill-famed for its limitations. Some of the most prominent aspects in this respect are considered.

4 RESULTS
As results may have to be produced as a document accompanying analytical results (e.g. on request of clients), they are presented here in a model format suiting this purpose. In the present example, where two different samples have been used, the results for both samples may be given on one form, or for each sample on a separate form. For practical reasons, abbreviated reports may be released omitting irrelevant information. (The full report should always be kept!)
LOGO METHOD VALIDATION FORM No.: VAL RES 09-2 Version: 1 Page: 1 of 1 Date: 96-11-23 File:

Title: Validation data CEC-NH4OAc (METH 09-2)

1 TITLE or DESCRIPTION
Validation of cation exchange capacity determination with the NH4OAc pH 7 method as described in VAL 09-2 dd. 96-09-19.

2 RESULTS
2.1 Bias (Accuracy): Result of calculation with Eq. (7.14) or (7.16) of Guidelines.
2.2 Precision:
- Repeatability: Result of calculation with Eq. (7.17) or (7.19).
- Within-lab reproducibility: Result of calculation with Eq. (7.23) (if Control Charts are available).
2.3 Working range: Result of calculation as exemplified by Table 7-1 in Section 7.3.2 of Guidelines.
2.4 Ruggedness: Results of calculations with Eq. (7.26) or (7.29).
2.5 Interferences: In this case mainly drawn from Ruggedness test.
2.6 Practicability: Special equipment necessary: mechanical extractor; substantial amounts of ethanol required; washing procedure not always complete, particularly in high-clay samples, requiring thorough check.

2.7 General observations:
Author: Sign.:
QA Officer (sign.): Date of Expiry:

METH 006 - Determination of nitrogen in soil with micro-Kjeldahl


LOGO STANDARD OPERATING PROCEDURE No.: METH 006 Version: 2 Page: 1 of 1 Date: 96-03-01 File:

Title: Determination of nitrogen in soil with micro-Kjeldahl

1. SCOPE
This procedure describes the determination of nitrogen with the micro-Kjeldahl technique. It is supposed to include all soil nitrogen (including adsorbed NH4+) except that in nitrates.

2. RELATED DOCUMENTS
2.1 Normative references
The following standards contain provisions referred to in the text:
ISO 3696 Water for analytical laboratory use. Specification and test methods.
ISO 11464 Soil quality - Pretreatment of samples for physico-chemical analysis.

2.2 Related SOPs


F 001 Administration of SOPs
APP 066 Operation of Kjeltec 1009 digester
APP 067 Operation of ammonia distillation unit
APP 072 Operation of Autoburette ABU 13 and Titrator TTT 60 (facultative)
RF 008 Reagent Book
METH 002 Moisture content determination

3. PRINCIPLE
The micro-Kjeldahl procedure is followed. The sample is digested in sulphuric acid and hydrogen peroxide with selenium as catalyst, whereby organic nitrogen is converted to ammonium sulphate. The solution is then made alkaline and ammonia is distilled. The evolved ammonia is trapped in boric acid and titrated with standard acid.

4. APPARATUS AND GLASSWARE
4.1 Digester (Kjeldahl digestion tubes in heating block)
4.2 Steam-distillation unit (fitted to accept digestion tubes)
4.3 Burette 25 ml

5. REAGENTS
Use only reagents of analytical grade and deionized or distilled water (ISO 3696).
5.1 Sulphuric acid - selenium digestion mixture. Dissolve 3.5 g selenium powder in 1 L concentrated (96%, density 1.84 g/ml) sulphuric acid by mixing and heating at approx. 350°C on a hot plate. The dark colour of the suspension turns into clear light-yellow. When this is reached, continue heating for 2 hours.
5.2 Hydrogen peroxide, 30%.
5.3 Sodium hydroxide solution, 38%. Dissolve 1.90 kg NaOH pellets in 2 L water in a heavy-walled 5 L flask. Cool the solution with the flask stoppered to prevent absorption of atmospheric CO2. Make up the volume to 5 L with freshly boiled and cooled deionized water. Mix well.
5.4 Mixed indicator solution. Dissolve 0.13 g methyl red and 0.20 g bromocresol green in 200 ml ethanol.
5.5 Boric acid-indicator solution, 1%. Dissolve 10 g H3BO3 in 900 ml hot water, cool and add 20 ml mixed indicator solution. Make to 1 L with water and mix thoroughly.
5.6 Hydrochloric acid, 0.010 M standard. Dilute standard analytical concentrate ampoule according to instruction.
Author: Sign.:
QA Officer (sign.): Date of Expiry:

6. SAMPLE
Air-dry fine earth (<2 mm) obtained according to ISO 11464 (or refer to own procedure). Mill approx. 15 g of this material to pass a 0.25 mm sieve. Use part of this material for a moisture determination according to ISO 11465 and PROC 002.

7. PROCEDURE
7.1 Digestion
1. Weigh 1 g of sample (accuracy 0.01 g) into a digestion tube. For soils rich in organic matter (>10%), 0.5 g is weighed in (see Remark 1). In each batch, include two blanks and a control sample.
2. Add 2.5 ml digestion mixture.
3. Add successively 3 aliquots of 1 ml hydrogen peroxide. The next aliquot can be added when frothing has subsided. If frothing is excessive, cool the tube in water. Note: in Steps 2 and 3 use a measuring pipette with balloon or a dispensing pipette.
4. Place the tubes on the heater and heat for about 1 hour at moderate temperature (200°C).
5. Turn up the temperature to approx. 330°C (just below boiling temp.) and continue heating until the mixture is transparent (this should take about two hours).
6. Remove tubes from heater, allow to cool and add approx. 10 ml water with a wash bottle while swirling.

7.2 Distillation
1. Add 20 ml boric acid-indicator solution with a measuring cylinder to a 250 ml beaker and place the beaker on the stand beneath the condenser tip.
2. Add 20 ml NaOH 38% with a measuring cylinder to the digestion tube and distil for about 7 minutes, during which approx. 75 ml distillate is produced. Note: the distillation time and amount of distillate may need to be increased for complete distillation (see Remark 2).
3. Remove beaker from distiller, rinse condenser tip, and titrate distillate with 0.01 M HCl until the colour changes from green to pink. Note: when using an automatic titrator, set the end-point pH at 4.60.

Remarks
1. The described procedure is suitable for soil samples with a nitrogen content of up to 10 mg N. This corresponds with a carbon content of roughly 10% C. For soils with higher contents, less sample material is weighed in. Sample sizes of less than 250 mg should not be used because of sample bias.

2. The capacity of the procedure with respect to the amount of N that can be determined depends to a large extent on the efficiency of the distillation assembly. This efficiency can be checked, for instance, with a series of increasing amounts of (NH4)2SO4 or NH4Cl containing 0-50 mg N.

8. CALCULATION

%N = (a - b) × M × 1.4 × mcf / s

where
a = ml HCl required for titration of the sample
b = ml HCl required for titration of the blank
s = air-dry sample weight in grams
M = molarity of HCl
1.4 = 14 × 10⁻³ × 100% (14 = atomic weight of nitrogen)
mcf = moisture correction factor
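A minimal sketch of this calculation in code, assuming the formula as reconstructed above (function name and titration values are hypothetical):

def percent_n(a_ml, b_ml, molarity, sample_g, mcf=1.0):
    """%N on oven-dry basis; titration volumes in ml, sample weight in g."""
    return (a_ml - b_ml) * molarity * 1.4 * mcf / sample_g

# e.g. 8.30 ml for the sample, 0.25 ml for the blank, 0.010 M HCl,
# 1.000 g sample, mcf = 1.02:
print(f"{percent_n(8.30, 0.25, 0.010, 1.000, 1.02):.3f} %N")   # 0.115 %N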
9. VALIDATION PARAMETERS
9.1 Bias: -3.1% rel. (sample ISE 921, x̄ = 2.80 g/kg N, n = 5)
9.2 Within-lab reproducibility: RL = 2.8 sL = 2.5% rel. (sample LABEX 38, x̄ = 2.59 g/kg N, n = 30)
9.3 Method Detection Limit: 0.014 mg N or 0.0014% N

10. TEST REPORT
The report of analytical results shall contain the following information:
- the result(s) of the determination with identification of the corresponding sample(s);
- a reference to this SOP (if requested, a brief outline such as given under clause 3: Principle);
- possible peculiarities observed during the test;
- all operations not mentioned in the SOP that can have affected the results.

11. REFERENCES
Hesse, P.R. (1971) A Textbook of Soil Chemical Analysis. John Murray, London.
Bremner, J.M. and Mulvaney, C.S. (1982) Nitrogen - Total. In: Page, A.L., Miller, R.H. and Keeney, D.R. (eds.) Methods of Soil Analysis, Part 2. Chemical and Microbiological Properties, 2nd ed. Agronomy Series 9, ASA-SSSA, Madison.
ISO 11261 Soil quality - Determination of total nitrogen - Modified Kjeldahl method.

8 INTERNAL QUALITY CONTROL OF DATA


8.1 Introduction
8.2 Rounding and Significant figures
8.3 Control charts
8.4 Preparation of a Control Sample
8.5 Complaints
8.6 Trouble-shooting
8.7 LIMS
SOPs

8.1 Introduction
In the preceding chapters basic elements of quality assurance were discussed. All activities associated with these aspects have one aim: the production of reliable data with a minimum of errors. The present discussion is concerned with activities to verify that a laboratory produces such reliable data consistently. To this end an appropriate programme of quality control (QC) must be implemented. Quality control is the term used to describe the practical steps undertaken to ensure that the errors in the analytical data are of a magnitude appropriate for the use to which the data will be put. This means that the (unavoidable) errors made are quantified to enable a decision whether they are of an acceptable magnitude, and that unacceptable errors are discovered so that corrective action can be taken and erroneous data are not released. In short, quality control must detect both random and systematic errors.

In principle, quality control for analytical performance consists of two complementary activities: internal QC and external QC. The internal QC involves the in-house procedures for continuous monitoring of operations and systematic day-to-day checking of the produced data to decide whether these are reliable enough to be released. The procedures primarily monitor the bias of data with the help of control samples and the precision by means of duplicate analyses of test samples and/or of control samples. These activities take place at batch level (second-line control). The external QC involves reference help from other laboratories and participation in national and/or international interlaboratory sample and data exchange programmes (proficiency testing; third-line control). The present chapter focuses mainly on the internal QC as this has to be organised by the laboratory itself. External QC, just as indispensable as the internal QC, is dealt with in Chapter 9.

8.2 Rounding and Significant figures


8.2.1 Rounding
8.2.2 Significant figures

At this point, before entering into the actual treatment of data, it is useful to consider the data themselves as they are treated and reported. Analytical data, either direct readings (e.g. pH) or results of one or more calculation steps associated with most analytical methods, are often reported with several numbers after the decimal point. In many cases this suggests a higher significance than is warranted by the combination of procedure and test materials. Since clear rules for rounding and for determining the number of significant decimals are available, these will be given here.

8.2.1 Rounding
To allow a better overview and interpretation, to conserve paper (more columns per page), and to simplify subsequent calculations, figures should be rounded up or down, leaving out insignificant numbers. To produce minimal bias, by convention rounding is done as follows:
- If the last number is 4 or less, retain the preceding number;
- if it is 6 or more, increase the preceding number by 1;
- if the last number is 5, the preceding number is made even.

Examples:
pH = 5.72 rounds to 5.7
pH = 5.76 rounds to 5.8
pH = 5.75 rounds to 5.8
pH = 5.85 rounds to 5.8

When calculations and statistics have to be performed, rounding must be done at the end. Remark: Traditionally, and by most computer calculation programs, when the last number is 5, the preceding number is raised by 1. There is no objection to this practice as long as it causes no disturbing bias, e.g. in surveys of attributes or effects.
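As a minimal sketch of the round-half-to-even convention: Python's decimal module implements it exactly (binary floats can misrepresent values such as 5.85, so the values are passed as strings):

from decimal import Decimal, ROUND_HALF_EVEN

def round_half_even(value, step="0.1"):
    """Round a value (given as string) to the decimal unit in `step`."""
    return Decimal(value).quantize(Decimal(step), rounding=ROUND_HALF_EVEN)

for ph in ("5.72", "5.76", "5.75", "5.85"):
    print(ph, "->", round_half_even(ph))   # 5.7, 5.8, 5.8, 5.8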

8.2.2 Significant figures


8.2.2.1 Rounding of test results
8.2.2.2 Rounding of means and standard deviations

8.2.2.1 Rounding of test results


The significance of the numbers of results is a function of the precision of the analytical method. The most practical figures for precision are obtained from the laboratory's own validation of the procedure, whereby the within-laboratory standard deviation sL (between-batch precision) for control samples is the most realistic parameter for routine procedures (see 7.5.2). For non-routine studies, the sr (within-batch precision) might have to be determined. To determine which number is still significant, the following rule is applied: calculate the upper boundary bt of the rounding interval a using the standard deviation s of the results (n ≥ 10):

bt = s/2   (8.1)

Then choose a equal to the largest decimal unit (...;100; 10; 1; 0.1; 0.01; etc.) which does not exceed the calculated bt

After having done this for each type of analysis at different concentration or intensity levels, it will become apparent what the last significant figure or decimal is which may be reported. This exercise has to be repeated regularly but is certainly indicated when a new technique is introduced or when analyses are performed in a non-routine way or on non-routine test materials.

Example
Table 8-1. A series of repeated CEC determinations (in cmolc/kg) on a control sample, each in a different batch.
Data    Rounded
6.55    6.6
7.01    7.0
7.25    7.2
7.83    7.8
6.95    7.0
7.16    7.2
7.83    7.8
7.05    7.0
6.83    6.8
7.63    7.6

The standard deviation of this set of (unrounded) data is:


s = 0.4298, hence bt = 0.2149 and a = 0.1

Therefore, these data should be reported with a maximum of one decimal.
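The rule can be scripted directly; a minimal sketch reproducing the example of Table 8-1 (standard library only):

import math
import statistics

data = [6.55, 7.01, 7.25, 7.83, 6.95, 7.16, 7.83, 7.05, 6.83, 7.63]
s = statistics.stdev(data)                # 0.4298
bt = s / 2                                # 0.2149, Eq. (8.1)
a = 10.0 ** math.floor(math.log10(bt))    # largest decimal unit <= bt: 0.1
print(f"s = {s:.4f}, bt = {bt:.4f}, a = {a}")   # report one decimal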

8.2.2.2 Rounding of means and standard deviations


When values for means, standard deviations, and relative standard deviation (RSD and CV) are to be rounded, b is calculated in a different way:
for x̄:  bx̄ = …   (8.2)
for s:  bs = …   (8.3)
for RSD:  brsd = …   (8.4)

where
x̄ = mean of a set of n results
s = standard deviation of the set of results
RSD = relative standard deviation.

8.3 Control charts


8.3.1 Introduction
8.3.2 Control Chart of the Mean (Mean Chart)
8.3.3 Control Chart of the Range of Duplicates (Range Chart)
8.3.4 Automatic preparation of control charts

8.3.1 Introduction
As stated in Section 8.1, an internal system for quality control is needed to ensure that valid data continue to be produced. This implies that systematic checks, e.g. per day or per batch, must show that the test results remain reproducible and that the methodology is actually measuring the analyte or attribute in each sample. An excellent and widely used system of such quality control is the application of (Quality) Control Charts. In analytical laboratories such as soil, plant and water laboratories, separate control charts can be used for analytical attributes, for instruments and for analysts. Although several types of control charts can be applied, the present discussion will be restricted to the two most usual types:
1. Control Chart of the Mean, for the control of bias;
2. Control Chart of the Range of Duplicates, for the control of precision.
For the application of quality control charts it is essential that at least Control Samples are available and preferably also (certified) Reference Samples. As the latter are very expensive and, particularly in the case of soil samples, still hard to obtain, laboratories usually have to rely largely on (home-made) control samples. The preparation of control samples is dealt with in Section 8.4.

8.3.2 Control Chart of the Mean (Mean Chart)


8.3.2.1 Principle
8.3.2.2 Starting with Mean Charts
8.3.2.3 Using a Mean Chart

8.3.2.1 Principle
In each batch of test samples at least one control sample is analyzed and the result is plotted on the control chart of the attribute and the control sample concerned. The basic construction of this Control Chart of the Mean is presented in Fig. 8-1 (other names are Mean Chart, x̄-Chart, Levey-Jennings Chart, or Shewhart Control Chart). This shows the (assumed) relation with the normal distribution of the data around the mean. The interpretation and practical use of control charts is based on a number of rules derived from the probability statistics of the normal distribution. These rules are discussed in 8.3.2.3 below. The basic assumption is that when a control result falls within a distance of 2s from the mean, the system was under control and the results of the batch as a whole can be accepted. A control result beyond the distance of 2s from the mean (the "Warning Limit") signals that something may be wrong or tends to go wrong, while a control result beyond 3s (the "Control Limit" or "Action Limit") indicates that the system was statistically out of control and that the results have to be rejected: the batch has to be repeated after sorting out what went wrong and after correcting the system.

Fig. 8-1. The principle of a Control Chart of the Mean. UCL = Upper Control Limit (or Upper Action Limit). LCL = Lower Control Limit (or Lower Action Limit). UWL = Upper Warning Limit. LWL = Lower Warning Limit.

Apart from test results of control samples, control charts can be used for quite a number of other types of data that need to be controlled on a regular basis, e.g. blanks, recoveries, standard deviations, instrument response. A model for a Mean Chart is given.

Note: The limits at 2s and 3s may be too strict or not strict enough for particular analyses used for particular purposes. A laboratory is free to choose other limits for analyses. Whatever the choice, this should always be identifiable on the control chart (and stated in the SOP or protocol for the use of control charts and consequent actions).

Fig. 8-2. A filled-out control chart of the mean of a control sample.

8.3.2.2 Starting with Mean Charts


A control chart can be started when a sufficient number of data of an attribute of the control sample are available (or data of the performance of an analyst in analyzing an attribute, or of the performance of an instrument on an analyte). Since we want the control chart to reflect the actual analytical practice, the data should be collected in the same manner. This is usually done by analyzing a control sample in each batch. Statistically, a sufficient number of data would be 7, but the more data available the better. It is generally recommended to start with at least 10 replicates.

Note: If duplicate determinations of the control sample are used in each batch to control within-batch precision (see 8.3.3), the mean of the duplicates can be used as entry. Although the principle of such a Mean Chart (called x̄-Chart, as opposed to x-Chart) is the same as for single values, the statistical background of the parameters obviously is not. These two systems may, therefore, not be mixed.

Example
In ten consecutive batches of test samples the CEC of a control sample is determined. The results are: 10.4; 11.6; 10.8; 9.6; 11.2; 11.9; 9.1; 10.4; 10.3; 11.6 cmolc/kg respectively. Using the equations, the following parameters for this set of data are obtained: mean x̄ = 10.7 cmolc/kg, and standard deviation s = 0.91. These are the initial parameters for a new control chart (see Fig. 8-2) and are recorded in the second upper right box of this chart ("data previous chart"). The Mean is drawn as a dashed (nearly) central line. The Warning and Action Limits are calculated in the left lower box, and the corresponding lines drawn as dashed and continuous lines respectively (the Action Line may be drawn in red). The vertical scale is chosen such that the range x̄ ± 3s covers roughly 2.5 to 4 cm. It may turn out, in retrospect, that one (or more) of the initial data lies beyond an initial Action Limit. Such a result should not be used for the initial calculations, which then have to be repeated without it. Therefore, it is advisable to have a few more than ten initial data. The procedure for starting a control chart should be laid down in a SOP.
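A minimal sketch reproducing these chart parameters from the example data (standard library only):

import statistics

cec = [10.4, 11.6, 10.8, 9.6, 11.2, 11.9, 9.1, 10.4, 10.3, 11.6]
mean = statistics.mean(cec)   # 10.69, reported as 10.7 cmolc/kg
s = statistics.stdev(cec)     # 0.91

print(f"Mean: {mean:.2f}")
print(f"Warning Limits: {mean - 2*s:.2f} / {mean + 2*s:.2f}")
print(f"Action Limits:  {mean - 3*s:.2f} / {mean + 3*s:.2f}")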

8.3.2.3 Using a Mean Chart


After calculating the mean and the standard deviation of the previous chart (or of the initial data set), five lines are drawn on the next control chart: one for the Mean, two Warning Limits and two Action Limits (see Fig. 8-2). Each time a result for the control sample is obtained in a batch of test samples, this result is recorded on the control chart of the attribute concerned. No rules are laid down for the size of a "batch" as this usually depends on the methods and equipment used. Some laboratories use one control sample in every 20 test samples, others use a minimum of 1 in 50.

Note: The level of the analyte in the control sample should match the level in the test samples as closely as possible. For this reason it is often necessary to have more than one control sample available for an attribute. To cope with the (expected) variation of the concentration of the analyte in the test samples, the use of more than one control sample in a batch must be considered. This would indeed increase the reliability of the obtained results, but at a price: an extra analysis is carried out and the chance of false rejection of a batch is also increased.

Quality control rules have been developed to detect excess bias and imprecision as well as shift and drift in the analysis. These rules are used to determine whether or not the results of a batch are to be accepted. Ideally, the quality control rules chosen should provide a high rate of error detection with a low rate of false rejection. The rules for quality control are not uniform: they may vary from laboratory to laboratory, and even within laboratories from analysis to analysis. The rules for the interpretation of quality control charts are not uniform either. Very detailed rules are sometimes applied, particularly when more than one control sample per batch is used. However, it should be realized that stricter rules generally result in (s)lower output of data and higher costs of analysis. The most convenient and commonly applied main rules are the following:

Warning rule (if occurring, then data require further inspection):
- One control result beyond Warning Limit.

Rejection rules (if occurring, then data are rejected):
- 1. One control result beyond Action Limit.

- 2. Two successive control results beyond the same Warning Limit.
- 3. Ten successive control results on the same side of the Mean. (Some laboratories apply six results.)
- 4. Whenever results seem unlikely (plausibility check).

The Warning Rule is exceeded by mere chance in less than 5% of the cases. The chance that the Rejection Rules are violated on purely statistical grounds can be calculated as follows:
Rule 1: 0.3%
Rule 2: 0.5 × (0.05)² × 100% = 0.1%
Rule 3: (0.5)¹⁰ × 100% = 0.1%

Thus, less than 0.5% of the results will be rejected by mere chance. (This increases to 2% if in Rule 3 'six results on the same side of the mean' is applied.) If any of the four rejection rules is violated, the following actions should be taken:
- Repeat the analysis; if the next point is satisfactory, continue the analysis. If not, then
- Investigate the cause of the exceeding.
- Do not use the results of the batch, run, day or period concerned until the cause is traced. Only use the results if rectification is justified (e.g. when a calculation error was made).
- If no rectification is possible, repeat the analysis of the batch(es) concerned after elimination of the source of the error. If this next point is satisfactory, the analysis can be continued.

Commonly, outliers are caused by simple errors such as calculation or dilution errors, use of wrong standard solutions or dirty glassware. If there is evidence of such a cause, then this outlier can be put on the chart but may not be used in calculating the statistical parameters of the control chart. These events should be recorded on the chart in the box "Remarks". If the parameters are calculated automatically, the outlier value is not entered.

Rejection Rule 3 may pose a particular problem. If after the 10th successive result on one side of the mean it appears that a systematic error has entered the process, the acceptance of the previous batches has to be reconsidered. If they cannot be corrected, they may have to be repeated (if this is still possible: samples may have deteriorated). Also, the customer(s) may have to be informed. Most probably, however, problems of this type are discovered at an earlier stage by other Quality Control tools such as excessive blank readings, the use of independent standard solutions, instrument calibrations, etc. In addition, by consistent inspection of the control chart, three or four consecutive control results on the same side of the mean will attract attention and a shift (see below) may already then be suspected.

Rejection Rule 4 is a special case. Unlike the other rules, this is a subjective rule based on personal judgement of the analyst and of the officer charged with the final screening of the results before release to the customer. Both general and specific knowledge about a sample and the attribute(s) may ring a bell when certain test results are thought to be unexpectedly or impossibly high or low. Also, results may be contradictory, sometimes only noticed by a complaining client. Obviously, much of the success of the application of this rule depends on the available expertise.

Note: A very useful aspect of Quality Control of Data falling under Rejection Rule 4 is the cross-checking of analytical results obtained for one sample (or, sometimes, for a sequence or a group of samples belonging together, e.g. a soil profile or parts of one plant). Certain combinations of data can be considered impossible or highly suspect. For instance, a pH value of 8 and a carbonate content of zero is a highly unlikely combination in soils and should arouse enough suspicion for closer examination and possibly for rejection of either or both results. A number of such contradictions or improbabilities can be built into computer programs and used in automatic cross-checking routines after results are entered into a database. Ideally, these cross-checks are built into a LIMS (Laboratory Information Management System) used by the laboratory. While all LIMSes have options to set ranges within which results of attributes are acceptable, cross-checking of attributes is not a common feature. An example of a LIMS with cross-checks for soil attributes is SOILIMS.

Most models of control charts accommodate 30 entries. When a chart is full, a new chart must be started. On the new chart the parameters of the just completed old chart need to be filled in. This is shown in Fig. 8-2. Calculate the "Data this chart" of the old chart and fill these in on the old chart. Perform the two-sided F-test and t-test (see 6.4) to check if the completed chart agrees with the previous data. If this is the case, calculate "Data all charts" by adding the "Data this chart" to the "Data previous charts". These newly calculated "Data all charts" of the completed old chart are the "Data previous charts" of the new chart. Using these data, the new chart can now be initiated by drawing the new control lines as described in 8.3.2.2.

Shift
In the rare case that the F-test and/or the t-test does not allow the data of a completed control chart to be incorporated in the set of previous data, there is a problem. This has to be resolved before the analysis of the attribute in question can be continued. As indicated above, such a change or shift may have various causes, e.g. introduction of new equipment, instability of the control sample, use of a wrong standard, or wrong execution of the method by a substitute analyst. Also, when there is a considerable time interval between batches such a shift may occur (mind the expiry date of reagents!). However, when the control chart is inspected in a proper and consistent manner, usually such errors are discovered before they are revealed by the F-test and t-test.

Drift
A less conspicuous and therefore perhaps greater danger than incidental errors or shifts is a gradual change in the accuracy or precision of the results. An upward or downward trend or drift of the mean, or a gradual increase in the standard deviation, may be too small to be revealed by the F-test or t-test but may be substantial over time. Such a drift could be discovered if a control chart were much longer, say some hundreds of observations. A way to imitate this extension of the horizontal scale is to make a "master" control chart with the values of x̄ and s of the normal control charts. Such a compressed control chart could be referred to as a "Control Chart of the Trend" and is particularly suitable for a visual inspection of the trend. An upward trend can be suspected in Figure 8-2: the mean of the first fifteen entries is 10.59 vs. 10.97 cmolc/kg for the last fifteen entries, implying a relative increase of about 3.5%. This indicates that the further trend has to be watched closely. The main cause of drift is often instability of the control sample, but other causes such as deterioration of reagents and equipment must be taken into account. Whatever the cause, when discovered, it should be traced and rectified. And here too, if necessary, already released results may have to be corrected.

New Control Sample
When a control sample is about to run out, or must be replaced because of instability, or for any other reason, a new control sample must be prepared in good time so that it can be run concurrently with the old control sample for some time. This allows a smooth start without interrupting the analytical programme. As indicated previously, the more initial data are obtained the better (with a minimum of 10), but ideally a complete control chart should be made.
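The warning and rejection rules discussed above are simple enough to automate. A minimal sketch (thresholds as in the text; Rule 4, the plausibility check, is subjective and therefore left to the analyst):

def check_rules(results, mean, s, run=10):
    """Return 'accept', a warning, or a rejection message for the newest result.

    results: successive control-sample values, oldest first.
    mean, s: parameters of the current control chart.
    """
    dev = results[-1] - mean
    if abs(dev) > 3 * s:
        return "reject: one result beyond Action Limit (Rule 1)"
    if len(results) >= 2:
        prev = results[-2] - mean
        if (dev > 2 * s and prev > 2 * s) or (dev < -2 * s and prev < -2 * s):
            return "reject: two successive results beyond same Warning Limit (Rule 2)"
    last = results[-run:]
    if len(last) == run and (all(r > mean for r in last) or all(r < mean for r in last)):
        return f"reject: {run} successive results on same side of mean (Rule 3)"
    if abs(dev) > 2 * s:
        return "warning: result beyond Warning Limit"
    return "accept"

# e.g. with the chart of Fig. 8-2: check_rules([10.4, 11.6, 13.8], 10.7, 0.91)
# -> 'reject: one result beyond Action Limit (Rule 1)'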

8.3.3 Control Chart of the Range of Duplicates (Range Chart)


8.3.3.1 Principle
8.3.3.2 Range chart of Control Sample
8.3.3.3 Starting the first chart
8.3.3.4 R-chart of Test Samples

Between-batch precision (within-laboratory reproducibility, see 7.5.2.3) can be inspected visually on the Control Chart of the Mean; a "noisy" graph with frequent and large fluctuations indicates a lower precision than a smooth graph. Information about the within-batch precision (repeatability, see 7.5.2.2) can only be obtained by running duplicate analyses in the same batch. For this purpose both test samples and control samples can be used, but the latter are somewhat more convenient. The obtained data are plotted on a Control Chart of the Range of Duplicates (also called Range Chart or R-chart).

8.3.3.1 Principle
In each batch of test samples at least one sample is analyzed in duplicate and the difference between the results is plotted on the control chart of the attribute concerned. The basic construction of such a Control Chart of the Range of Duplicates is given in Figure 8-3. It shows similarities with the Control Chart of the Mean in that now a mean of differences is calculated, with a corresponding standard deviation. The warning line and control line can be drawn at 2s and 3s distance from the mean of differences. The graph is single-sided, as the lowest observable value of the difference is zero.

Fig. 8-3. Control Chart of the Range of Duplicates. R̄ = mean of the range of duplicates. WL = Warning Limit. CL = Control Limit (or Action Limit).

8.3.3.2 Range chart of Control Sample


The simplest way of controlling precision is by running duplicates of a control sample in each batch. The advantage is that this can be directly connected to the use of single values as applied for the Control Chart of the Mean, by simply running two subsamples of the same control sample simultaneously. A disadvantage is that precision is measured at one concentration level only (unless more than one control sample is used). The duplicates should be placed at random positions in the batch, not adjacent to each other. The necessary statistical parameters for the Range Chart, R̄ and sR, can be determined as follows:
R̄ = (ΣRi) / m    (8.5)

where
R̄ = mean difference between duplicates
ΣRi = sum of the (absolute) differences between duplicates
m = number of pairs of duplicates
and
sR = √(ΣRi² / 2m)    (8.6)

where sR = standard deviation of the range of all pairs of duplicates.

Fig. 8-4. A filled-out control chart of the range of duplicates of a control sample.

Note 1: Equation (8.6) is equivalent to Equation (7.21). This standard deviation is somewhat different from the common standard deviation of a set of data (Eq. 6.2) and results from pooling the standard deviation of each pair: namely, the duplicates of all pairs are assumed to have the same population standard deviation.

Note 2: If it is decided to routinely run the control sample in duplicate in each batch as described here, a different situation arises with respect to the Mean Chart, since now two values for the control sample are obtained instead of one. These values are of equal weight and, therefore, their mean must be used as an entry. It is important to note that the parameters of the thus obtained Mean Chart, particularly the standard deviation, are not the same as those obtained using single values. Hence, these two types should not be mixed up and not be compared by means of the F-test.

8.3.3.3 Starting the first chart


Initiating a Control Chart of the Range of Duplicates is identical to initiating a Control Chart of the Mean as discussed in Section 8.3.2.2. Also, the model of the chart is virtually identical, with x̄ replaced by R̄. The parameters R̄ and sR are determined for at least 10 initial pairs of duplicates, as exemplified in Table 8-2. A control chart with these initial parameters is given in Fig. 8-4. The interpretation rules of the Range Chart are very similar to those of the Mean Chart:

Warning rule:
- One control observation beyond Warning Limit.

Rejection rules:
- One control observation beyond Control (or Action) Limit.
- Two successive control observations beyond Warning Limit.
- Ten successive control observations beyond R̄. (Some apply six.)

The response to violation of the rejection rules is also similar: repeat the analysis and investigate the problem if the repeat is not satisfactory. The procedure to initiate a new chart when the present one is full is identical to that described for the Control Chart of the Mean.

Example
Table 8-2. CEC values (in cmolc/kg) of a control sample determined in duplicate, used to calculate initial values of R̄ and sR for the control chart of duplicates.
Duplicate 1   Duplicate 2   R
10.1          9.7           0.4
10.7          10.2          0.5
10.5          11.1          0.6
9.8           10.3          0.5
9.0           10.1          1.1
11.0          10.6          0.4
11.5          10.7          0.8
10.9          9.5           1.4
8.9           9.4           0.5
10.0          9.6           0.4

Mean:  10.24        10.13
s:     0.85         0.74
R̄ = 0.66;  sR = 0.52
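A minimal sketch reproducing the parameters of Table 8-2 with Equations (8.5) and (8.6):

import math

dup1 = [10.1, 10.7, 10.5, 9.8, 9.0, 11.0, 11.5, 10.9, 8.9, 10.0]
dup2 = [9.7, 10.2, 11.1, 10.3, 10.1, 10.6, 10.7, 9.5, 9.4, 9.6]

R = [abs(x - y) for x, y in zip(dup1, dup2)]
m = len(R)
R_mean = sum(R) / m                                 # Eq. (8.5): 0.66
s_R = math.sqrt(sum(r * r for r in R) / (2 * m))    # Eq. (8.6): 0.52

print(f"R = {R_mean:.2f}, sR = {s_R:.2f}")
print(f"WL = {R_mean + 2 * s_R:.2f}, CL = {R_mean + 3 * s_R:.2f}")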

8.3.3.4 R-chart of Test Samples


A limitation of the use of duplicates of a control sample to verify precision is that this may not fully reflect the precision of the analysis of test samples, as these may deviate appreciably from the control sample both in matrix and in concentration or capacity of the attribute concerned. The most convenient way to meet this problem is to use more than one control sample with different concentrations of the attribute, each with its own control chart as described above. Another way is to use test samples instead of control samples. However, also in this case duplicates may be chosen at non-representative analyte levels unless the level per batch is rather uniform. Alternatively, all samples are run in duplicate, but this is not commonly done in routine analysis and is usually only affordable in special research cases. When test sample duplicates are preferred, two situations can be distinguished: 1. analyses with a (near-)constant relative standard deviation; 2. analyses with a non-constant relative standard deviation. Although commonly occurring, the second case is rather complicated for routine work and will therefore not be treated here.

Constant Relative Standard Deviation
If a constant relative standard deviation (CV or RSD) can be assumed, which may often be the case over certain limited working ranges of concentration, one (or more) test sample(s) in a batch can be analyzed in duplicate instead of a control sample. A constant RSD would result in a control chart as schematically given in Figure 8-5, which is very similar to Fig. 8-3. Because the standard deviation is assumed proportional to the analytical result, this applies to the difference between duplicates as well. Therefore, the vertical scale must be normalized, i.e. the (absolute) value found for R of each pair of duplicates has to be divided by the mean of the two duplicates (and multiplied by 100% if a % scale is used rather than a fraction scale). The interpretation rules and the calculation of parameters when a chart is full are again identical to those discussed above for the Control Chart of the Mean.

Fig. 8-5. Control Chart of the Normalized Range of Duplicates. CV = coeff. of variation; other symbols as in Fig. 8-3.
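A small helper for the normalized entry, with hypothetical duplicate values:

def normalized_range_pct(d1, d2):
    """|difference| of a duplicate pair divided by the pair mean, in %."""
    return 100.0 * abs(d1 - d2) / ((d1 + d2) / 2.0)

print(f"{normalized_range_pct(14.2, 13.6):.1f}%")   # -> 4.3%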

8.3.4 Automatic preparation of control charts


Obviously, in large laboratories with hundreds of analyses per day, much (if not all) of the above discussed control work is usually done automatically by computer. This can be programmed by the laboratory personnel but commercial programs are available which are usually connected to or incorporated in the LIMS (Laboratory Information Management System, see 8.7). For small and medium-sized laboratories (and also for large laboratories starting with control work of new tests or analyses), the manual use of charts, where possible with computerized calculations, is recommended.

8.4 Preparation of a Control Sample


8.4.1 Collection and treatment of soil material
8.4.2 Collection and treatment of plant material
8.4.3 Stability
8.4.4 Homogeneity

In the previous sections reference was frequently made to the "Control Sample". It was defined as: "An in-house reference sample for which one or more property values have been established by the user laboratory, possibly in collaboration with other laboratories." This is the material a laboratory needs to prepare for second-line (internal) control in each batch, the results of which are plotted on Control Charts. The sample should be sufficiently stable and homogeneous for the properties concerned.

From the foregoing it must have become clear that the control sample has a crucial function in quality control activities. For most analyses a control sample is indispensable. In principle, its place can be taken by a (certified) reference sample, but these are expensive and for many soil and plant analyses not even available. Therefore, laboratories have to prepare control samples themselves or obtain them from other laboratories. Because the quality control systems rely so heavily on these control samples, their preparation should be done with great care so that the samples meet a number of criteria. The main criteria are:
1. The sample is homogeneous.
2. The material is stable.
3. The material has the correct particle size (i.e. passed a prescribed sieve).
4. The relevant information on properties and composition of the matrix, and the concentration of the analyte or attribute concerned, is available.
The preparation of a control sample is usually fairly easy and straightforward. As an example it will be described here for a "normal" soil sample (so-called "fine earth") and for a ground plant sample.

8.4.1 Collection and treatment of soil material


Select a location for the collection of suitable and sufficient material. The amount of material to be collected depends on the turnover of the sample material, the expected stability and the amount that can be handled during preparation. Thus, amounts may range from a few kilos to a hundred kilos or more. The material is collected in plastic bags and spread out on plastic foil or in large plastic trays in the institute for air-drying (do not expose to direct sunlight; forced drying up to 40°C is permitted). Remove large plant residues. After drying, pass the sample through a 2 mm sieve. Clods not passing through the sieve are carefully crushed (not ground!) with a pestle and mortar or in a mechanical breaker. Gravel, rock fragments etc. not passing through the sieve are removed and discarded. The material passing through the sieve is collected in a bin or vessel for mechanical homogenization. If the whole sample has to be ground to a finer particle size, this can be done at this stage. If only a part has to be ground finer, this should be done after homogenization. Homogenization may be done with a shovel or any other instrument suitable for this purpose. Some laboratories use a concrete mixer. Mixing should be intensive and complete. After that, the bulk sample is divided into subsamples of 0.5 to 1 kg to be used in the laboratory. For this, riffle samplers and sample splitters may be used. The subsamples can be kept in glass or plastic containers. The latter have the advantage that they are unbreakable. Both have the disadvantage that fine particles may be electrostatically attracted to the container walls, thus causing segregation. The rule about labelling is that it should preferably be done on both the container and the lid. If only one label is used, this should always be stuck on the container and not on the lid!

Note: The suggestion has been made to have a useful control sample prepared by an interlaboratory sample-exchange organization.

8.4.2 Collection and treatment of plant material


Select plant material with the desired or expected composition. Realize that the composition of different parts of a plant (leaf, stem, flower, fruit) may differ considerably and that, in general, the control sample should match the test samples as closely as possible. If the fresh material is contaminated (e.g. by soil, salts, dust), it needs to be washed with tap water or dilute (0.1 M) hydrochloric acid followed by deionized water. For test samples, to minimize the change in concentration of components, this washing should be done in a minimum of time, say within half a minute. For the preparation of a control sample this is less critical. The sample is dried at 70°C in a ventilated drying oven for 24 hours. The sample is then cut and ground to pass a 1 mm sieve. Storage can be done as described for soil samples.

Note: During the pretreatment (drying, milling, sieving) both soil and plant material may be contaminated by the tools used. In this way the concentration of certain elements (Cu, Fe, Al, etc., see 9.4) may be increased. Like the washing procedure, this problem is less critical for control samples than for test samples (unless the contamination is present as large particles).

8.4.3 Stability
No general statement can be given about the stability of the material. Although dried soil and plant material can be kept for a very long time or even, in practice, indefinitely under favourable conditions, it must be realized that some natural attributes may still (slowly) change, that samples for certain analyses may not be dried and that certainly many "foreign" components such as petroleum products, pesticides or other pollutants change with time or disappear at varying unknown rates. Each sample and attribute has to be judged on this aspect individually. Control charts may give useful information about possible changes during storage (trends, shifts).

8.4.4 Homogeneity
For quality control it is essential that a control sample is homogeneous, so that the subsamples used in the batches are "identical". In practice this is impossible (except for solutions), and the requirement can be reduced to the condition that the (sub)samples statistically belong to the same population. This implies a test for homogeneity to prove that the daily-use sample containers (the laboratory control samples) into which the bulk sample was split up represent one and the same sample. This can be done in various ways. A relatively simple procedure is described here.

Check for homogeneity by duplicate analysis
For the check for homogeneity the statistical principles of the two control charts discussed in Section 8.3, i.e. for the Mean and for the Range of Duplicates, are used. The laboratory control samples, prepared by splitting the bulk sample, are analyzed in duplicate in one batch. The analysis used is arbitrary; usually a rapid, easy and/or cheap analysis suffices. Suitable analyses for soil material are, for example, carbon content, total nitrogen, and loss-on-ignition. For plant samples total nitrogen, phosphorus, or a metal (e.g. Zn) can be used. The organization of the test is schematically given in Fig. 8-6. As stated before, statistically this test only makes sense when a sufficient number of sample containers are involved (n ≥ 7). Do not use too small samples for the analysis, as this will adversely affect the representativeness and result in an unnecessarily high standard deviation.

Note: A sample may prove to be homogeneous for one attribute but not for another. Therefore, fundamentally, homogeneity of control samples should be tested with an analysis for each attribute for which the control sample is used. This is done for certified reference samples but is often considered too cumbersome for laboratory control samples. On the other hand, such an effort would have the additional advantage that useful information about the procedure and laboratory performance is obtained (repeatability). Also, such values can be used as initial values of control charts.

Check on the Mean (sample bias)
This is a check to establish whether all samples belong to the same population. The means of the duplicates are calculated and treated as single values (xi) for the samples 1 to n. Then, using Equations (6.1) and (6.2), calculate x̄ and s of the data set consisting of the means of duplicates (include all data, i.e. do not exclude outliers).

Fig. 8-6. Scheme for the preparation and homogeneity test of control samples.

The rules for interpretation may vary from one laboratory to another and from one attribute to another. In general, values beyond x̄ ± 2s are considered outliers and rejected. The sample container concerned may be discarded or analyzed again, after which the result may well fall within x̄ ± 2s and be accepted; otherwise, the subsample may now definitely be discarded.

Check on the Range (sample homogeneity)

This is a check to establish if all samples are homogeneous. The differences R between duplicates of each pair are calculated (include all data, i.e. do not exclude outliers). Then calculate R̄ and sR of the data set using Equations (8.5) and (8.6) respectively. The interpretation is identical to that for the Check on the Mean as given in the previous paragraph.

Thus, a laboratory control sample container may have to be discarded on two grounds: 1. because it does not sufficiently represent the level of the attribute in the control sample, and 2. because it is internally too heterogeneous. The preparation of a control sample, including a test for homogeneity, should be laid down in a SOP.

Example

In Table 8-3 an example is given of a check for homogeneity of a soil control sample of 5 kg which was split into ten equal laboratory control samples, of which the loss-on-ignition was determined in duplicate. The loss-on-ignition can be determined as follows:

1. Weigh approx. 5 g sample into a tared 30 mL porcelain crucible and dry overnight at 105°C.
2. Transfer crucible to desiccator to cool; then weigh crucible (accuracy 0.001 g).
3. Place crucibles in furnace and heat at 900°C for 4 hours.
4. Allow furnace to cool to about 100°C, transfer crucible to desiccator to cool, then weigh crucible with residue (accuracy 0.001 g).

Now the weight loss between 105 and 900°C can be calculated and expressed in mass % or in g/kg (weight basis: material dried at 105°C), e.g. loss (%) = 100 × (m105 - m900) / (m105 - mtare), where m105 and m900 are the crucible weights after drying and ignition, and mtare is the weight of the empty crucible.

Table 8-3. Results (in mass/mass %) of duplicate loss-on-ignition determinations (A and B) on representative subsamples of ten 500 g laboratory samples of a soil control sample.
Sample      A       B     Mean(A,B)      R
   1      9.10    8.42      8.760      0.68
   2      9.65    8.66      9.155      0.99
   3      9.63    9.18      9.405      0.45
   4      8.65    8.89      8.770      0.24
   5      8.71    9.19      8.950      0.48
   6      9.14    8.93      9.040      0.22
   7      8.71    8.97      8.840      0.26
   8      8.59    8.78      8.685      0.19
   9      8.86    9.12      8.990      0.26
  10      9.04    8.75      8.895      0.29

Mean:                       8.949      R̄: 0.406
s:                          0.214*     sR: 0.334**

(* using Eq. 6.2; ** using Eq. 8.6)

Tolerance range for the mean of duplicates (x̄ ± 2s): 8.949 ± 2 × 0.214 = 8.52 to 9.38%.

Tolerance range for the difference R between duplicates (R̄ + 2sR): 0.406 + 2 × 0.334 = max. 1.07% (since R cannot be negative, only the upper limit matters).

In this example it appears that only the mean result of sample no. 3 (9.405%) falls outside the permissible range. However, since this is only marginally so (less than 0.3% relative), we may still decide to accept the sample without repeating the analysis. The measure R for internal homogeneity falls for all samples within the permissible range. (Should an R be found beyond the range, we may opt for repeating the duplicate analysis before deciding to discard that sample.)
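This check is easily scripted. The following minimal sketch in Python (not part of the original procedure) reproduces the Table 8-3 calculations. It assumes that Eq. (8.6) computes sR as sqrt(ΣR²/2n), which reproduces the value 0.334 found above; small last-digit differences with the printed table may occur due to rounding in the source.

from math import sqrt

# Duplicate loss-on-ignition results (mass %), samples 1..10 (Table 8-3)
A = [9.10, 9.65, 9.63, 8.65, 8.71, 9.14, 8.71, 8.59, 8.86, 9.04]
B = [8.42, 8.66, 9.18, 8.89, 9.19, 8.93, 8.97, 8.78, 9.12, 8.75]

n = len(A)
means = [(a + b) / 2 for a, b in zip(A, B)]   # means of duplicates (xi)
ranges = [abs(a - b) for a, b in zip(A, B)]   # differences R between duplicates

# Check on the Mean (sample bias), Eqs. (6.1) and (6.2)
x_bar = sum(means) / n
s = sqrt(sum((x - x_bar) ** 2 for x in means) / (n - 1))

# Check on the Range (sample homogeneity), Eqs. (8.5) and (8.6);
# sR taken here as sqrt(sum(R^2) / 2n), which reproduces 0.334 above
R_bar = sum(ranges) / n
s_R = sqrt(sum(r ** 2 for r in ranges) / (2 * n))

print(f"x-bar = {x_bar:.3f}, s = {s:.3f}, "
      f"tolerance {x_bar - 2*s:.2f} to {x_bar + 2*s:.2f} %")
print(f"R-bar = {R_bar:.3f}, sR = {s_R:.3f}, upper limit {R_bar + 2*s_R:.2f} %")

for i, (m, r) in enumerate(zip(means, ranges), start=1):
    if abs(m - x_bar) > 2 * s:
        print(f"sample {i}: mean {m:.3f} outside tolerance (sample bias)")
    if r > R_bar + 2 * s_R:
        print(f"sample {i}: range {r:.2f} outside tolerance (heterogeneous)")

Run on the data above, the script flags only sample no. 3 for the mean check and no sample for the range check, in agreement with the interpretation given in the example.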

8.5 Complaints
Errors that escaped detection by the laboratory may be detected or suspected by the customer. Although this particular type of quality control may not be popular, it should in no case be ignored and can sometimes even be useful. For dealing with complaints a protocol must be made, with an accompanying Registration Form containing at least the following items (a structured sketch follows below):

- name of client, and date the complaint was received
- work order number
- description of complaint
- name of person who received the complaint (usually the head of laboratory)
- person charged with the investigation
- result of the investigation
- name of person(s) who dealt with the complaint
- an evaluation and possible action
- date when the report was sent to the client

A record of complaints should be kept; the documents involved may be kept in the work order file. The trailing of events (audit trailing) may sometimes not be easy, and particularly in such cases the proper registration of all laboratory procedures involved will prove to be of great value.

Note. Registration of procedures formally also applies to work that has been contracted out to other laboratories. When work is contracted out, the quality standards of the subcontractor should be (demonstrably) satisfactory, since the final responsibility towards the client lies with the laboratory that contracted out the work. If the credibility needs to be verified, this is usually done by inserting duplicate and blind samples.
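The Registration Form above lends itself to a simple structured record, for instance in a LIMS (Section 8.7). A minimal sketch in Python follows; the field names are illustrative assumptions, not prescribed by this guide.

from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Complaint:
    client: str                        # name of client
    date_received: date                # date the complaint was received
    work_order: str                    # work order number
    description: str                   # description of the complaint
    received_by: str                   # usually the head of laboratory
    investigator: str = ""             # person charged with the investigation
    result: str = ""                   # result of the investigation
    handled_by: str = ""               # person(s) who dealt with the complaint
    evaluation: str = ""               # evaluation and possible action
    date_reported: Optional[date] = None  # date report was sent to client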

8.6 Trouble-shooting

Whenever the quality control detects an error, corrective measures must be taken. As mentioned earlier, the error may be readily recognized as a simple calculation or typing error (decimal point!) which can easily be corrected. If this is not the case, then a systematic investigation must take place. This includes the checking of sample identification, standards, chemicals, pipettes, dispensers, glassware, calibration procedure, and equipment. Standards may be old or wrongly prepared, adjustable pipettes may indicate a wrong volume, glassware may not be cleaned properly, and equipment may be dirty (e.g. clogged burner in AAS) or faulty. Electrodes in particular can be a source of error: they may be dirty, and their life-time must be observed closely. A pH electrode may seemingly respond well to calibration buffer solutions but still be faulty. Clearly, every analytical procedure and instrument has its own characteristic weaknesses; by experience these become known, and it is useful to make a list of such relevant check points for each procedure and attach this to the corresponding SOP or, if it concerns an instrument, to the maintenance logbook. Update this list whenever a new flaw is discovered. Trouble-shooting is further discussed in Section 9.4.

8.7 LIMS
8.7.1 Introduction 8.7.2 What is a LIMS? 8.7.3 How to select a LIMS

8.7.1 Introduction
The various activities in a laboratory produce a large number of data streams which have to be recorded and processed. Some of the main streams are:

- Sample registration
- Desired analytical programme
- Work planning and progress monitoring
- Calibration
- Raw data
- Data processing
- Data quality control
- Reporting
- Invoicing
- Archiving

Each of these aspects requires its own typical paperwork, most of which is done with the help of computers. As discussed in previous chapters, it is the responsibility of the laboratory manager to keep track of all aspects and tie them up for the proper functioning of the laboratory as a whole. To assist him in this task, the manager will have to develop a working system of records and journals. In laboratories of any appreciable size, but even with only two analysts, this can be a tedious and error-sensitive job. Consequently, from about 1980, computer programs appeared on the market that could take over much of this work. Subsequently, the capability of Laboratory Information Management Systems (LIMS) has been further developed and their price has increased likewise.

The main benefit of a LIMS is a drastic reduction of the paperwork and improved data recording, leading to higher efficiency and increased quality of reported analytical results. Thus, a LIMS can be a very important tool in Quality Management.

8.7.2 What is a LIMS?


The essential element of a LIMS is a relational database in which laboratory data are logically organized for rapid storage and retrieval. In principle, a LIMS plans, guides and records the passage of a sample through the laboratory: from its registration, through the programme of analyses and the validation of data (acceptance or rejection), to the presentation and/or filing of the analytical results.

Hardware

Originally, LIMSes were installed on mainframe and minicomputers in combination with terminals. However, with the advent of more powerful PCs, programs were developed that could run on a single PC (single-user system) or on several PCs with a central one acting as server (network, multi-user system). The more expensive systems allow advanced automation of a laboratory by direct coupling of analytical instruments to the system. Printers are essential parts of the system, for label and bar-code printing as well as for graphs and reports.

Software

The LIMS software consists of two elements: the routines for the functional parts, and the database. For the latter usually a standard database program is used (e.g. dBase, Oracle), which can also be done for certain functional parts such as the production of graphs and report generation. The database is subdivided into a static and a dynamic part. The static part comprises the elements that change only little with time, such as the definition of analytical methods, whereas the dynamic part relates to clients, samples, planning, and results. (A minimal sketch of such a database follows the list below.)

Function features

A number of common main features of a LIMS are the following:

- Registration of samples and assigned jobs with unique numbers and automatic label production.
- Production of work lists for daily and long-term planning.
- Allows rapid insight into the status of work (pending jobs, backlog).
- Informs about laboratory productivity (per analysis, whole laboratory).
- Production of control charts and signalling of violation of control rules (results beyond Action Limit, etc.).
- Flagging of results beyond preset specifications.
- Generation of reports and invoices.
- Archiving facility.
- Audit trailing (search for data, errors, etc.).
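To make the division into a static and a dynamic part concrete, the following minimal sketch sets up the core of such a relational database with Python's built-in sqlite3 module. All table and column names are illustrative assumptions, not those of any particular commercial LIMS.

import sqlite3

con = sqlite3.connect("lims.db")
con.executescript("""
-- static part: changes only little with time
CREATE TABLE IF NOT EXISTS method (
    method_id   INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,           -- e.g. 'Loss-on-ignition'
    unit        TEXT,                    -- e.g. 'mass %'
    spec_low    REAL,                    -- preset specification limits,
    spec_high   REAL                     -- used for flagging results
);

-- dynamic part: clients, samples, planning, results
CREATE TABLE IF NOT EXISTS client (
    client_id   INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS sample (
    sample_id   INTEGER PRIMARY KEY,     -- unique number (label, bar code)
    client_id   INTEGER REFERENCES client(client_id),
    registered  TEXT                     -- registration date
);
CREATE TABLE IF NOT EXISTS result (
    sample_id   INTEGER REFERENCES sample(sample_id),
    method_id   INTEGER REFERENCES method(method_id),
    value       REAL,
    flagged     INTEGER DEFAULT 0        -- set when beyond preset specs
);
""")
con.commit()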

Data collection and subsequent calculations are usually done "outside" the LIMS: either with a pocket calculator or, more commonly, on a PC with a standard spreadsheet program (such as Lotus 123) or with one supplied with the analytical instrument. The data are then transferred manually or, preferably, by wire or diskette to the LIMS (a sketch of such a transfer is given below). The larger LIMS packages usually have an internal module for this processing.

A major problem with the application of a LIMS is the installation and the customizing involved to meet the specific needs of a laboratory. One of the first questions asked (after asking for the price) is: 'can I directly connect my equipment to the LIMS?'. Invariably the answer of the vendor is positive, but the problems involved are usually concealed or unjustly trivialized. It is not uncommon that installations take more than a year before the systems are operational (not to speak of complete failures), and sometimes the performance falls short of expectations because the operational complexity was underestimated. Mentioning these problems is certainly not meant to discourage the purchase of a LIMS; on the contrary, the use of a LIMS can in general be very rewarding. It is rather intended as a warning that the choice of a system must be very carefully considered.
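As an illustration of such a transfer, the following minimal sketch reads results exported from a spreadsheet as a CSV file into the hypothetical database sketched in 8.7.2, flagging values beyond the preset specification limits. The file layout (columns sample_id, method_id, value) is an assumption for the example.

import csv
import sqlite3

con = sqlite3.connect("lims.db")
with open("results.csv", newline="") as f:
    for row in csv.DictReader(f):       # columns: sample_id, method_id, value
        value = float(row["value"])
        # look up the specification limits of the method (may be absent)
        limits = con.execute(
            "SELECT spec_low, spec_high FROM method WHERE method_id = ?",
            (row["method_id"],)).fetchone()
        low, high = limits if limits else (None, None)
        flagged = int((low is not None and value < low) or
                      (high is not None and value > high))
        con.execute(
            "INSERT INTO result (sample_id, method_id, value, flagged) "
            "VALUES (?, ?, ?, ?)",
            (row["sample_id"], row["method_id"], value, flagged))
con.commit()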

8.7.3 How to select a LIMS


When it is considered that a computerized system might improve the management of the laboratory information data flow, a plan for its procurement must be made. The most important activities prior to the introduction of a LIMS are the following:

- Set up a LIMS project team. Include a senior laboratory technician, the future system manager and someone from the computer department.
- Review present procedures and workload.
- Consider if a LIMS can be useful. Define what the system must do and may cost (make a cost/benefit assessment).

The cost/benefit assessment is not always straightforward, as certain benefits are difficult to assess or to express in money (e.g. improved data quality; changing work attitude). Also, a LIMS may be needed as a training facility for students.

When a decision is made that a LIMS project is viable, the team must define the requirements and consider the two ways to acquire a LIMS: either by building a system in-house or by purchasing one. Many in-house systems are not premeditated but result from a gradual build-up of small programs written for specific laboratory tasks, such as the preparation of work lists or data reports. The advantage is that these programs are fully customized. The disadvantage is that, lacking an initial master plan, they are often not coupled or integrated into an overall system, which then takes extra effort. Yet, many laboratories employ such "systems". The general rule is that if a suitable commercial package can be found, it is not economical to build a system from scratch, as this is both a complicated and time-consuming process.

The purchase of a commercial LIMS should be a well-structured exercise, particularly if a large and expensive system is considered. Depending on the capabilities, prices for commercial systems range from roughly USD 25,000 to 100,000 or even higher. The next steps to be taken are:

- Identify LIMS vendors.
- Compare requirements with available systems.
- Identify suitable systems and make a shortlist of vendors.
- Ask vendors for a demonstration and discuss requirements, possible customization, installation problems, training, and after-sales support.
- If possible, contact user(s) of candidate systems.

After comparing the systems on the shortlist, the choice can be made. By way of precaution, it may be wise to start with a "pilot" LIMS: a relatively cheap single-user system in part of the laboratory, in order to gain experience and make a more considered decision for a larger system later. It is essential that all laboratory staff are involved and informed right from the start, as a LIMS may be considered meddlesome ('big brother is watching me'), possibly arousing a negative attitude. Like Quality Management, the success of a LIMS depends to a large extent on its acceptance by the technical staff.
