ANALYST
FULL PAPER
A new test for sufficient homogeneity
Certified reference materials and materials distributed in proficiency testing need to be sufficiently
homogeneous, that is, the variance in the mean composition of the distributed portions of the material must be
negligibly small in relation to the variance of the analytical result produced when the material is in normal use.
The requirement for sufficient homogeneity suggests the use of a formal test. Such tests as have been formulated
rely on the duplicated analysis of the material from a number of portions, followed by analysis of variance.
However, the outcome is not straightforward. If the analytical method used is very precise, then an undue
proportion of the materials will be found to be significantly heterogeneous. If it is too imprecise, the test may be
unable to detect heterogeneity. Moreover, the Harmonised Protocol Procedure (M. Thompson and R. Wood, Pure
Appl. Chem., 1993, 65, 2123) seems to be unduly prone to the rejection of material that is in fact satisfactory. We
present a simple new statistical approach that overcomes some of these problems.
Testing for sufficient homogeneity

With the exception of well-mixed true solutions, materials prepared for proficiency tests and other interlaboratory studies are, despite our best efforts, heterogeneous. When such a bulk material is split for distribution to various laboratories, the units produced vary slightly in composition among themselves. Usually the variation is negligible, but we want to be sure of this. When we test for so-called sufficient homogeneity in such materials, we are seeking to show that this variation in composition among the distributed units (characterised by the sampling standard deviation $\sigma_{sam}$) is negligible in relation to variation introduced by the measurements conducted by the participants in the proficiency test.

As we expect the standard deviation of interlaboratory variation in proficiency tests to be approximated to by $\sigma_p$, the target standard deviation, it is natural to use this criterion as a reference value. The ISO/IUPAC/AOAC Harmonised Protocol for Proficiency Testing (ref. 1) requires that the estimated sampling standard deviation $s_{sam}$ should be less than 30% of the target standard deviation $\sigma_p$, that is $s_{sam}/\sigma_p < 0.3$.

This condition, when fulfilled, is called sufficient homogeneity in the Harmonised Protocol. At this limit, the standard deviation of the resultant z-scores would be inflated by the heterogeneity by somewhat less than 5% relative, for example from 2.0 to 2.1, which is deemed to be acceptable. If the condition were not fulfilled, the z-scores would reflect, to an unacceptable degree, variation in the material as well as variation in laboratory performance. Participants in proficiency testing schemes need to be reassured that the distributed units of the test material are sufficiently similar, and this requirement usually calls for testing.

The test specified in the Harmonised Protocol calls for the selection of 10 or more units at random after the putative homogenised material has been split and packaged into discrete samples for distribution. The material from each sample is then analysed in duplicate, under randomised repeatability conditions (that is, all in one run), using a method with sufficient analytical precision. The value of $\sigma_{sam}$ is then estimated from the mean squares after one-way analysis of variance (ANOVA), and a statistical test is carried out.

Tests for sufficient homogeneity are never likely to be wholly satisfactory. The main problem is that, because of the high cost of the analysis, the number of samples taken for testing will be small. This makes the power of the statistical test (that is, the probability of rejecting the material when it is in fact heterogeneous) relatively low. A further problem is that heterogeneity is inherently likely to be patchy, and discrepant distribution units might be under-represented among those selected for test.

However, given that sufficient homogeneity is a reasonable prior assumption, and that the cost of testing for it is often high, it seems sensible to make the main emphasis the avoidance of Type 1 errors (that is, false rejection of a satisfactory material). Homogeneity tests should be regarded as essential, but not foolproof, safeguards. We argue below that the test suggested in the Harmonised Protocol may be too prone to the rejection of good samples, and we suggest an alternative test.

Analytical precision required for homogeneity tests

To test for sufficient homogeneity, we have to estimate $\sigma_{sam}$ from the results of a randomised replicated experiment using ANOVA. In the experiment, each selected sample is separately homogenised and analysed in duplicate. Much depends on the quality of the analytical results. If the analytical method is sufficiently precise, $\sigma_{sam}$ can be reliably estimated, and a lack of sufficient homogeneity can be detected with reasonably high probability when it is present. If the analytical standard deviation $\sigma_{an}$ is not small, however, important sampling variation may be obscured by analytical variation. We may obtain a non-significant result when testing for excess sampling variation, not because it is not present, but because the test has no power to detect it when the analytical variance is high.

The Harmonised Protocol does not specify any limits on the analytical variance, but it seems desirable to do so. There has to be a trade off between the cost of specifying very precise analytical methods and the risk of failing to detect important sampling variation. Based on an informal consideration of this
Other pathologies of data sets

All of the above considerations depend on the laboratory carrying out the test for sufficient homogeneity correctly and, in particular, selecting the samples for test at random, homogenising them before analysis, analysing the duplicated test portions under strictly randomised repeatability conditions, and recording the results with sufficient digit resolution to allow the analysis of the variation. In the authors' experience, data sets where at least some of these requirements have not been met are common (25/139 instances in our study). Such infringements may invalidate the outcome of the test. We therefore recommend that: (i) detailed instructions be issued to the laboratory conducting the homogeneity test; and (ii) the data be checked for discrepancies as a matter of routine. Such a check could be made visually on a simple plot of the data, searching for such diagnostic features as: (i) trends or discontinuities; (ii) non-random distribution of differences between first and second test results; (iii) excessive rounding; and (iv) outlying results within samples.

The new procedure

Rather than express the criterion for sufficient homogeneity in terms of the estimated sampling variance $s_{sam}^2$, as does the Harmonised Protocol, it would seem more logical to impose a limit on the true sampling variance $\sigma_{sam}^2$. It is this quantity that is more relevant to the variability in the (untested) samples sent out to laboratories. Thus our criterion for sufficient homogeneity is that the sampling variance $\sigma_{sam}^2$ must not exceed an

Detailed procedure

It is assumed that the data comprise m pairs of duplicate analyses. The first step is to use these to estimate the analytical and sampling variances. If a program to perform a one-way ANOVA is available, this may be used. Alternatively, a full calculation scheme is given below.

(i) Calculate the sum $S_i$ and the difference $D_i$ of each pair of duplicates for $i = 1, \ldots, m$.

(ii) Calculate the sum of squares of the differences $\sum D_i^2$, this sum and all those below being over the range $i = 1, \ldots, m$.

(iii) Cochran's test statistic is the ratio of $D_{max}^2$, the largest squared difference, to this sum of squared differences:

$C = D_{max}^2 / \sum D_i^2$.

Calculate the ratio and compare it with critical values from tables.

(iv) Now use the same sum of squared differences to calculate

$\mathrm{MSW} = \left(\sum D_i^2\right) / (2m)$.

(v) Calculate the variance of the sums $S_i$,

$v_S = \sum (S_i - \bar{S})^2 / (m - 1)$,

where

$\bar{S} = (1/m) \sum S_i$

is the mean of the $S_i$, and use this to find

$\mathrm{MSB} = v_S / 2$.

(vi) Then estimate the analytical variance as

$s_{an}^2 = \mathrm{MSW}$
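Steps (i) to (vi) of the calculation scheme can be sketched in Python as follows. This is a minimal illustration and not the authors' own code: the function name `duplicate_anova`, the dictionary keys and the example data are hypothetical, and the sketch stops at the analytical variance, as the steps listed above do. Cochran's statistic is only computed here; it still has to be compared with a critical value from tables.

```python
from statistics import variance


def duplicate_anova(pairs):
    """Apply steps (i)-(vi) to m pairs of duplicate results (hypothetical helper)."""
    m = len(pairs)
    if m < 2:
        raise ValueError("need at least two pairs of duplicates")

    # (i) sum S_i and difference D_i of each pair of duplicates
    sums = [a + b for a, b in pairs]
    diffs = [a - b for a, b in pairs]

    # (ii) sum of squared differences
    sum_d2 = sum(d * d for d in diffs)

    # (iii) Cochran's statistic: largest squared difference over the sum
    # (undefined if every pair of duplicates is identical)
    if sum_d2 > 0:
        cochran = max(d * d for d in diffs) / sum_d2
    else:
        cochran = float("nan")

    # (iv) within-sample mean square
    msw = sum_d2 / (2 * m)

    # (v) variance of the sums (divisor m - 1), then between-sample mean square
    v_s = variance(sums)
    msb = v_s / 2

    # (vi) the analytical variance is estimated by MSW
    return {"cochran_C": cochran, "MSW": msw, "MSB": msb, "s2_an": msw}


if __name__ == "__main__":
    # Invented duplicate results, for illustration only
    example = [(10.0, 10.2), (9.8, 9.8), (10.4, 10.0), (10.1, 10.3)]
    for name, value in duplicate_anova(example).items():
        print(f"{name}: {value:.4f}")
```

Working from sums and differences keeps the arithmetic close to the hand calculation in the text; a general one-way ANOVA routine should give the same two mean squares for duplicate data.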
m     20    19    18    17    16    15    14    13    12    11    10    9     8     7
F1    1.59  1.60  1.62  1.64  1.67  1.69  1.72  1.75  1.79  1.83  1.88  1.94  2.01  2.10
F2    0.57  0.59  0.62  0.64  0.68  0.71  0.75  0.80  0.86  0.93  1.01  1.11  1.25  1.43

(a) m is the number of samples that have been measured in duplicate. The two constants are derived from standard statistical tables as
Fig. 1 Example data for the homogeneity testing procedure.

Fig. 2 Probability of rejecting the hypothesis of sufficient homogeneity as a function of $q = \sigma_{sam}^2/\sigma_p^2$. The three curves, from left to right, are for values of $r = \sigma_{an}^2/\sigma_p^2$ of 0, 0.125 and 0.25.

When used on real data from homogeneity tests, the new procedure was found to be only slightly less likely than the Harmonised Protocol procedure to reject materials when the analytical precision was satisfactory and no other data pathologies were detected. The respective failure rates were 0/114 and 3/114. All of these materials were thought a priori to be sufficiently homogeneous. However, when the analytical data were defective in some way (and this is sometimes unavoidable), the new procedure was much less likely than the Harmonised Protocol to reject materials, the respective failure rates being 2/139 and 22/139.

Appendix: Example instructions for the analyst in testing for sufficient homogeneity

(i) Select 10 (or more) of the packaged units strictly at random. This must be done in a formal way, by assigning a sequential number to the units, either explicitly (by labelling them) or implicitly (e.g., by their position in a linear sequence). The selection is made by use of random numbers from a table or generated by a computer package (e.g., Excel). It is not acceptable to select the units in any other way (e.g., by shuffling them). A new random sequence should be generated for each experiment.

(ii) Homogenise each selected sample in an appropriate manner (e.g., in a blender) and from each weigh out two test portions. Label the test portions as shown below.

Sample   Labels
1        1.1    1.2
2        2.1    2.2
3        3.1    3.2
...      ...    ...
10       10.1   10.2

(iii) Sort the 20 test portions into a random order and carry out all analytical operations on them in that order. Again, random number tables or a computer package must be used to generate a new random sequence. An example random sequence (not to be copied) is: 7.1, 3.1, 5.2, 5.1, 10.2, 1.1, 2.1, 9.2, 8.2, 1.2, 4.1, 2.2, 9.1, 10.1, 7.2, 3.2, 8.1, 6.1, 4.2, 6.2.

(iv) The analysis should be conducted if at all possible under repeatability conditions (i.e., in one run) or, if that is impossible, in successive runs with as little change as possible, using a method that has a repeatability standard deviation of less than $0.5\sigma_p$.

(v) Return the 20 analytical results, including the labels, in the run order used.
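The two randomisation steps in the appendix instructions, selecting the units in step (i) and ordering the test portions in step (iii), could be carried out with a computer package along the following lines. This is a sketch only: the function names `select_units` and `run_order` and the batch size of 200 units are hypothetical. The `seed` argument exists only to make the sketch testable. In routine use it would be left unset, so that a new random sequence is generated for each experiment.

```python
import random


def select_units(n_units, n_select=10, seed=None):
    """Pick n_select distinct unit numbers at random from 1..n_units."""
    rng = random.Random(seed)  # seed=None draws fresh entropy on each call
    return sorted(rng.sample(range(1, n_units + 1), n_select))


def run_order(n_samples=10, seed=None):
    """Return the test-portion labels (e.g. '7.1') in a random run order."""
    rng = random.Random(seed)
    labels = [f"{sample}.{portion}"
              for sample in range(1, n_samples + 1)
              for portion in (1, 2)]
    # Sampling every label without replacement gives a random permutation
    return rng.sample(labels, len(labels))


if __name__ == "__main__":
    # Hypothetical batch of 200 packaged units, numbered 1 to 200
    print("Units to test:", select_units(200))
    print("Run order:", ", ".join(run_order()))
```

Recording the generated sequence alongside the results makes it possible to return the data in the run order used, as step (v) asks.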
Recommendations for tests for sufficient homogeneity

(i) The precision of the analytical method used in the test should satisfy

$\sigma_{an}/\sigma_p < 0.5$

if at all possible.

Acknowledgement

This work was completed with financial support from the Food Standards Agency.

References

1 M. Thompson and R. Wood, Pure Appl. Chem., 1993, 65, 2123.
2 J. S. Williams, Biometrika, 1962, 49, 278.