Statistical Power in Comparative Aquaculture Studies: E. Nigel Ling, Deirdre Cotter

Aquaculture 224 (2003) 159-168
www.elsevier.com/locate/aqua-online
Statistical power in comparative aquaculture studies

E. Nigel Ling (a,*), Deirdre Cotter (b)
(a) School of CIS, Kingston University, Kingston upon Thames KT1 2EE, UK
(b) Marine Institute, Furnace, Newport, Co. Mayo, Ireland
Received 14 August 2002; received in revised form 24 February 2003; accepted 26 February 2003
Abstract
A formula is derived for the statistical power in a nested or hierarchical analysis of variance (ANOVA). Data from several salmonid growth trials are analysed to obtain values for the variance between tanks and within tanks in a hierarchical experimental design. These figures are used to calculate the minimum numbers of fish sampled and tanks required to achieve a statistical power of 80% in a comparative experiment, and the most cost-effective design is estimated. For a comparative trial using Atlantic salmon from first feeding for a period of 3 months, a design to detect a difference of at least 1.2 g between treatments requires no fewer than five tanks and a sample of 5000 individuals from each tank. Using cost estimates from the Marine Institute, an optimal design requires eight tanks and a sample of 77 fish per tank. Smaller values for the variances or a less sensitive experiment will reduce the minimum number of tanks and fish. The calculations are applicable to any hierarchical experiment with appropriate values for the parameters.
© 2003 Elsevier Science B.V. All rights reserved.
Keywords: Statistical power; Experimental design; Nested ANOVA; Atlantic salmon; Triploid
* Corresponding author. Tel.: +44-20-8547-7921; fax: +44-20-8547-7972. E-mail address: e.ling@kingston.ac.uk (E.N. Ling).
0044-8486/03/$ - see front matter © 2003 Elsevier Science B.V. All rights reserved.
doi:10.1016/S0044-8486(03)00225-4

1. Introduction

Experiments requiring nested designs occur frequently in aquaculture and other fields of biological research. This paper was motivated by the question of how comparative studies on triploid salmon should be structured and analysed. The potential environmental benefits of rearing sterile triploid Atlantic salmon (Salmo salar L.) in commercial farming have been widely discussed (Cotter et al., 2000; Hansen and Youngson, 1998; Hindar et al., 1991). Several growth experiments have investigated the comparative performance of triploid and diploid salmon. Most recently, an extensive trial compared performance on a commercial scale over the full production cycle (Cotter et al., 2002). Triploid fish were found to perform as well as diploids in freshwater against commercial criteria. There is, however, some uncertainty in the results of previous freshwater studies: some conclude that triploid salmon perform better than diploids, while others find the opposite (see, for example, the summary in Galbreath et al., 1994). The interpretation of such data depends on statistical considerations which may have been overlooked.

Aquaculture growth trials are expensive to carry out and are subject to large natural variations in the fish populations. Careful design of the experiments is, therefore, especially important, and the statistical techniques on which the data analysis depends must be correctly chosen and applied. In the past, the application of unsuitable procedures, or the misuse of appropriate ones, was reported to be widespread in marine biology and indeed in other fields of the biological sciences (Underwood, 1981). From our examination of the literature, it is clear that some difficulties remain in the field of aquaculture.

Analysis of variance (ANOVA) is the most common technique used to analyse experimental data in biology because it is readily adaptable to complex multifactor designs. In aquaculture experiments, it is usual to replicate tanks of fish so that any additional variation due to random environmental factors or hierarchical feeding may be partitioned in the analysis. The variation between tanks represents a random nuisance factor that should be nested within the treatment factor. Hence, the correct method with which to analyse this design is a nested or hierarchical ANOVA. Although many growth trials do use replicate tanks, a nested analysis is not always applied to the data.
In experiments on salmon triploids, the tanks within each ploidy are sometimes compared with a t-test to assess whether they belong to the same population before the ploidies are compared using the pooled data. Others erroneously cross the tank factor with ploidy and, in some cases, the variation between tanks is not even measured. All of these practices can lead to invalid conclusions. Pooling tank data, for example, may bias the test for a difference between ploidies; at worst, it will give a false significant result. The statistical power of a test is an important consideration in growth trials but, in common with many biological experiments, it is often ignored. Power is the probability that a test yields a significant result when a real effect exists. Experiments often inadvertently have low power, which means that a null result is unreliable: a false null hypothesis may have been accepted. Power in ANOVA has received some attention (see Searcy-Bernal, 1994¹, and references therein), but its analysis in nested designs is relatively little discussed. Calculating the power of a proposed experiment requires some knowledge of the expected variances, but there appear to be no published data available. This paper shows how to calculate the power of a nested ANOVA and, using data from other growth trials, discusses the optimisation of experimental design. Although what follows is illustrated with data from experiments in triploid salmon rearing, the general principles apply equally to other areas of aquaculture and to other fields with similar experimental designs.
¹ It should be noted that Eq. (3) in the Searcy-Bernal paper is in error, owing to an invalid assumption in that paper's Appendix B: the sample and population means cannot be assumed equal. Hence, that Eq. (3) is biased.
2. Power of a nested ANOVA

In a single-factor ANOVA, the test statistic $F = MS_{group}/MS_{error}$ follows the central F distribution when the null hypothesis $H_0: \mu_1 = \mu_2 = \dots = \mu_a$ is true. $MS_{group}$ is the mean square between groups and $MS_{error}$ the mean square within groups, or error mean square. If $H_0$ is false, F follows the noncentral F distribution, which depends on the two degrees of freedom of F, $\nu_1$ and $\nu_2$, and a noncentrality parameter, $\lambda$. This parameter is defined as the ratio of the sum of squared deviations of the population treatment means to the variance of a sample treatment mean:

$$\lambda = \frac{\sum_{j=1}^{a} (\mu_j - \mu)^2}{\sigma_e^2 / n} \qquad (1)$$
where $\mu_j - \mu$ is the deviation of the jth treatment mean from the overall mean $\mu$, n is the number of data items per treatment level (or group), a the number of levels and $\sigma_e^2$ the error variance, estimated by $MS_{error}$. The parameter $\lambda$ may be used to determine the power of the F test via Tang's (1938) related parameter $\phi$, given by

$$\phi = \sqrt{\frac{\lambda}{a}}$$

The sum of squared deviations of the treatment means is $\sum_{j=1}^{a} (\mu_j - \mu)^2 = \delta^2/2$, where $\delta$ is the separation of the means (the maximum separation in the case of more than two treatments). Hence,

$$\phi = \sqrt{\frac{\delta^2}{2a\sigma_e^2/n}} \qquad (2)$$

The power of the test, $1 - \beta$, may then be read off the nomographs of $\phi$ reproduced in statistical tables. Power increases with $\phi$. Thus, power increases with $\delta$, but falls as the variance grows.

In a nested ANOVA with a random factor B nested within a fixed factor A, the denominator in Eq. (1) becomes $\sigma_{B(A)}^2/(nb)$, where $\sigma_{B(A)}^2$ is the variance among subgroups, b the number of subgroups nested in each group and n the number of data items per subgroup. The variance $\sigma_{B(A)}^2$ is estimated by the ANOVA mean square among subgroups, $MS_{B(A)}$, which is the correct denominator to use to calculate F in a nested model. Its expected value is

$$E(MS_{B(A)}) = \sigma_e^2 + n\sigma_b^2 \qquad (3)$$
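(If $\sigma_b^2$, the variance of level B within level A, is 0, then $MS_{B(A)}$ estimates the same quantity as the error mean square, $MS_{error}$.) An estimate of the variance of the group means $\bar{Y}$ is, therefore,

$$s_{\bar{Y}}^2 = \frac{MS_{B(A)}}{nb}, \qquad E\left(s_{\bar{Y}}^2\right) = \frac{\sigma_e^2}{nb} + \frac{\sigma_b^2}{b} \qquad (4)$$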
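A minimal Python sketch of Eqs. (1), (2) and (4) may help make the definitions concrete. It computes $\lambda$ from a list of treatment means, converts it to Tang's $\phi = \sqrt{\lambda/a}$, checks that this agrees with the separation form of Eq. (2) for two means, and then evaluates the expected nested variance of a group mean. The helper names and the example inputs (two means 1.2 g apart, $\sigma_e^2 = 17.6$ g², $\sigma_b^2 = 0.36$ g², n = 100, b = 3, borrowed from the worked example in Section 4) are illustrative assumptions, not code from the paper.

```python
import math


def noncentrality(means, var_e, n):
    """Eq. (1): sum of squared deviations of the treatment means over sigma_e^2 / n."""
    grand = sum(means) / len(means)
    return sum((m - grand) ** 2 for m in means) / (var_e / n)


def tang_phi(lam, a):
    """Tang's parameter: phi = sqrt(lambda / a)."""
    return math.sqrt(lam / a)


def phi_from_separation(delta, a, var_of_group_mean):
    """Eq. (2), or Eq. (7) in the nested case: phi = sqrt(delta^2 / (2 a Var(group mean)))."""
    return math.sqrt(delta ** 2 / (2 * a * var_of_group_mean))


def nested_var_of_mean(n, b, var_e, var_b):
    """Expected value in Eq. (4): sigma_e^2 / (n b) + sigma_b^2 / b."""
    return var_e / (n * b) + var_b / b


# Single-factor design: two treatment means 1.2 g apart, n = 100 fish per group.
phi_eq1 = tang_phi(noncentrality([11.4, 12.6], var_e=17.6, n=100), a=2)
phi_eq2 = phi_from_separation(1.2, a=2, var_of_group_mean=17.6 / 100)
print(f"phi via Eq. (1): {phi_eq1:.3f}, via Eq. (2): {phi_eq2:.3f}")

# Nested design: b = 3 tanks per treatment, n = 100 fish per tank.
s2 = nested_var_of_mean(n=100, b=3, var_e=17.6, var_b=0.36)
print(f"s2 = {s2:.3f}, phi = {phi_from_separation(1.2, a=2, var_of_group_mean=s2):.2f}")
```

Both single-factor routes give the same $\phi$ (about 1.43), and the nested case reproduces the $s_{\bar{Y}}^2 \approx 0.18$ and $\phi \approx 1.42$ used in Section 4.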
Fig. 1. Plots of the relative reduction in the variance of the mean with increasing n and b. The smaller the value of R, the greater the statistical power of the experiment.
Hence, the power of the test depends on the error variance and on the variance of the random factor at level B, according to Eq. (4). Estimates of $\sigma_e^2$ and $\sigma_b^2$ may be obtained from previous experiments or a pilot study. Calculations of power are most useful during the planning stage of an experiment: by stipulating a difference of interest between means, one can design an experiment with sufficient power to detect such a difference with acceptable confidence. Clearly, an increase in either the number of fish sampled per tank (subgroup), n, or the number of tanks, b, will reduce $s_{\bar{Y}}^2$ and thus yield a gain in power. The question is which combination of the two is the most efficient. The derivatives of $s_{\bar{Y}}^2$ with respect to n and b, divided by $s_{\bar{Y}}^2$, give expressions for the relative improvement (reduction) in $s_{\bar{Y}}^2$ with increasing n and b, respectively:

$$R_n = \frac{1/n^2}{1/n + \sigma_b^2/\sigma_e^2}, \qquad R_b = \frac{1}{b}$$
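The two expressions can be checked numerically. The sketch below, a set of hypothetical Python helpers, compares the closed forms for $R_n$ and $R_b$ with a central finite-difference estimate of $-(\partial s_{\bar{Y}}^2/\partial x)/s_{\bar{Y}}^2$, using the expected value in Eq. (4) and illustrative variances ($\sigma_e^2 = 17.6$, $\sigma_b^2 = 0.36$ g²). It also checks that going from two to three tanks cuts the variance of the mean by one-third for two quite different variance pairs.

```python
def var_of_mean(n, b, var_e, var_b):
    """Expected variance of a group mean, Eq. (4)."""
    return var_e / (n * b) + var_b / b


def r_n(n, var_e, var_b):
    """Closed-form relative reduction in Var(mean) with increasing n."""
    return (1 / n**2) / (1 / n + var_b / var_e)


def r_b(b):
    """Closed-form relative reduction in Var(mean) with increasing b."""
    return 1 / b


def relative_derivative(f, x, h=1e-4):
    """-(df/dx) / f(x), estimated by central differences."""
    return -(f(x + h) - f(x - h)) / (2 * h) / f(x)


var_e, var_b, n, b = 17.6, 0.36, 100.0, 3.0
num_rn = relative_derivative(lambda x: var_of_mean(x, b, var_e, var_b), n)
num_rb = relative_derivative(lambda x: var_of_mean(n, x, var_e, var_b), b)
print(f"R_n: closed form {r_n(n, var_e, var_b):.6f}, numeric {num_rn:.6f}")
print(f"R_b: closed form {r_b(b):.6f}, numeric {num_rb:.6f}")

# Going from 2 to 3 tanks: the reduction does not depend on the variances.
for ve, vb in [(17.6, 0.36), (1.0, 5.0)]:
    drop = 1 - var_of_mean(n, 3, ve, vb) / var_of_mean(n, 2, ve, vb)
    print(f"var_e={ve}, var_b={vb}: relative reduction {drop:.4f}")
```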
Note that $R_n$ is independent of b. Similarly, $R_b$ is independent of n, and also of the variances $\sigma_e^2$ and $\sigma_b^2$. Thus, an increase in the number of tanks from, say, two to three will give a relative improvement of one-third regardless of the variances or the number of fish in each tank. Unless $\sigma_b^2$ is negligible, increasing b always gives a greater relative improvement in $s_{\bar{Y}}^2$, and thus in power, than increasing n. The larger $\sigma_b^2$, the more pronounced the effect. Fig. 1 displays curves for $R_n$ and $R_b$. Both flatten with increasing n and b, respectively, though this happens more quickly for $R_n$, especially for larger values of $\sigma_b^2$. This suggests that adding tanks, rather than adding fish to tanks, is the best strategy for gaining power in a nested ANOVA. However, tanks are much more expensive than fish, and a more practicable approach is to examine the efficiency of a design in relation to its cost. The total cost C of an experiment may be formulated as

$$C = nb\,c_{fish} + b\,c_{tank} \qquad (5)$$

where $c_{fish}$ and $c_{tank}$ are the unit costs of each fish and tank, respectively. Cost rises as n and b increase, but this brings a fall in the variance according to Eq. (4) and, hence, a gain in power. At some stage, however, the extra cost will not produce a worthwhile improvement. The optimal design and sampling strategy is the point at which the product $C s_{\bar{Y}}^2$ is a minimum. Since b cancels from this product, leaving $(n c_{fish} + c_{tank})(\sigma_e^2/n + \sigma_b^2)$, the minimum is where its derivative with respect to n is zero. This gives (Snedecor and Cochran, 1967)

$$n_{opt} = \sqrt{\frac{\sigma_e^2/\sigma_b^2}{c_{fish}/c_{tank}}} \qquad (6)$$

This is the value of n that leads to the most efficient sampling strategy. To achieve the optimal design for a given cost, the number of tanks b can be calculated from Eq. (5). Alternatively, Eq. (4) may be solved for b to achieve a desired sensitivity. Fig. 2 helps to clarify the idea. The graph shows curves of constant $s_{\bar{Y}}^2$ and constant C. In general, these intersect at two points, the roots of the simultaneous solution of Eqs. (4) and (5). The optimal design is at $n = n_{opt}$, where the curves meet at a single point. This is the solution with equal roots, an alternative derivation of Eq. (6).

Fig. 2. Graph of n vs. b showing curves of constant variance of the mean, $s_{\bar{Y}}^2$, and the cost function C (bold line). The optimal design is where the two curves meet at a single point, at $n = n_{opt}$: the simultaneous solution with equal roots.
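Eq. (6) and the cost trade-off can be sketched in Python as follows. Because b cancels from $C s_{\bar{Y}}^2$, the optimum depends on n alone, so a brute-force search over integer n should land next to $n_{opt}$. The unit costs (1 and 120, i.e. the 1/120 ratio quoted in Section 4), the budget of 2000 units and all function names are hypothetical choices for illustration.

```python
import math


def n_opt(var_e, var_b, c_fish, c_tank):
    """Eq. (6): fish per tank that minimises C * Var(mean)."""
    return math.sqrt((var_e / var_b) / (c_fish / c_tank))


def cost_variance_product(n, var_e, var_b, c_fish, c_tank):
    """C * Var(mean) from Eqs. (4) and (5); b cancels, so only n matters."""
    return (n * c_fish + c_tank) * (var_e / n + var_b)


def tanks_for_budget(budget, n, c_fish, c_tank):
    """Eq. (5) solved for b, rounded down to whole tanks."""
    return math.floor(budget / (n * c_fish + c_tank))


var_e, var_b = 17.6, 0.36    # g^2, as in Section 4
c_fish, c_tank = 1.0, 120.0  # hypothetical units; only the ratio 1/120 matters
best = n_opt(var_e, var_b, c_fish, c_tank)
print(f"n_opt = {best:.1f}")

# Brute-force check over integer n: the minimiser should sit next to n_opt.
n_best = min(range(1, 1001),
             key=lambda n: cost_variance_product(n, var_e, var_b, c_fish, c_tank))
print(f"best integer n = {n_best}")
print(f"tanks affordable with a budget of 2000 units: {tanks_for_budget(2000.0, n_best, c_fish, c_tank)}")
```

The closed form gives about 76.6, and the integer search picks 77, matching the optimum used in Section 4.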
3. Experimental variability

The optimal design of an experiment with a defined power is only achievable with prior knowledge of the variances $\sigma_e^2$ and $\sigma_b^2$. Although there have been many fish growth trials with Atlantic salmon and other species, few figures have been published with which to estimate these parameters. Some idea of the size of $\sigma_b^2$ to expect in a given trial is critical to designing sufficient power into an experiment, because the variability between tanks is sensitive to several factors, such as tank location, tank size and the flow rates through the tanks. Even apparently innocuous events such as people walking past or bumping into the tanks can have an impact (Webster, 2003, personal communication). Using previous experimental data, $\sigma_e^2$ may be measured directly and a value for $\sigma_b^2$ calculated with the help of Eq. (3).

Table 1 lists figures derived from several growth trials, mostly diploid-triploid comparisons, but some on diet with diploids. The data in the table show a wide range of values for the coefficients of variation. In the trials on diet (data sets 8-11) and the pre-smolts (data set 6), the figures are generally lower because the fish have been graded. This is a common practice in fish rearing and should be taken into account when designing experiments for later stages in the growth cycle. In data sets 1-5 and 7, where fish are monitored on different dates from the commencement of feeding for 3 months, the variation between fish ranges from about 41% in data set 5 to 27% in set 1. The variation between tanks is more difficult to assess, given that $CV_b$ is estimated with very few degrees of freedom, two in most cases. This explains why $MS_{error}$ is greater than $MS_{within}$ in sets 2 and 11. Values for $CV_b$ in the other trials range from just over 1% to 7%.

The range of variances in Table 1 indicates the value of pilot trials. Local conditions clearly have considerable bearing on the variability in a particular experiment, and figures obtained in different conditions may not be a reliable guide. Values for the variances are likely to be most useful in the context of the mean. The end stages of the three SRA freshwater trials, sets 2, 4 and 5, have mean weights of about 12 g, and their values of $CV_{error}$ are fairly consistent at around 35-40%. Figures for $CV_b$, however, can only be considered to indicate an order of magnitude for the tank variance.

Table 1
Data from 11 growth trials on Atlantic salmon

Set  Origin                                 N       n      Tanks  Levels  Mean (g)  MSwithin  MSerror (σe²)  σb² (g²)  CVerror  CVb
1    SRA(a) AF ploidy, June 7, 1995         600     150    4      2       0.553     0.103     0.022          0.0005    0.266    0.042
2    SRA AF ploidy, August 29, 1995         600     150    4      2       11.94     9.761     19.5           ≈0        0.370    ≈0
3    SRA AF ploidy, July 9, 1996            1080    180    6      2       1.073     0.091     0.066          0.0001    0.239    0.009
4    SRA AF ploidy, September 4, 1996       1080    180    6      2       11.83     68.04     20.7           0.263     0.384    0.043
5    SRA MS ploidy, August 29, 1995         600     150    4      2       12.31     144.2     25.2           0.793     0.408    0.072
6    SRA pre-smolt ploidy, April 24, 1996   200     50     4      2       80.1      841.4     342.3          9.982     0.231    0.039
7    Stolte AF ploidy, August 1992          391.5   65.25  6      2       11.15     15.56     11.2           0.067     0.300    0.023
8    SRA diet, September 9, 1999            900     150    6      2       11.9      12.7      3.72           0.060     0.162    0.021
9    SRA diet, January 11, 2000             900     150    6      2       42.1      300       36.5           1.757     0.144    0.031
10   Norwegian diet, May                    1508.4  125.7  12     4       158.2     3001      2452           4.361     0.313    0.013
11   Norwegian diet, July                   1524    127    12     4       302.5     5407      6713           ≈0        0.271    ≈0

Column 2 shows the origin of the data. Columns 3 to 6 show the experimental arrangement: the total number of fish sampled (N), the number of fish sampled per tank (n), the total number of tanks and the number of levels. Column 7 is the grand mean of fish weights in grams over all tanks. Columns 8 and 9 show the within-group and error mean squares, and column 10 shows the tank variance (grams squared). Columns 11 and 12 are the coefficients of variation: the standard deviations σe and σb as a proportion of the grand mean. Noninteger values for N and n are adjusted figures for unbalanced data (see, e.g., Sokal and Rohlf, 1995a, p. 297). Data sets 1 to 6 come from the full-cycle trial conducted at the Marine Institute in Ireland (formerly the Salmon Research Agency), comprising all-female (AF) and mixed-sex (MS) data; all are first-year freshwater experiments (Cotter et al., 2002). Set 7 is an all-female, first-year trial from Stolte Sea Farms in the USA (Galbreath et al., 1994). All sets are comparative trials of diploid and triploid Atlantic salmon, except sets 8 and 9 and sets 10 and 11, which are, respectively, Marine Institute (in freshwater) and Norwegian (in seawater) diet growth trials on diploids (unpublished data). Sets 1 and 2 are dates from the same trial; likewise sets 3 and 4, sets 8 and 9, and sets 10 and 11.
(a) Salmon Research Agency.
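The tank-variance column can be recovered from Eq. (3) as $\sigma_b^2 = (MS_{within} - MS_{error})/n$, assuming, as the tabulated numbers indicate, that $MS_{within}$ is the among-tanks-within-groups mean square $MS_{B(A)}$ and $MS_{error}$ estimates $\sigma_e^2$. The Python sketch below applies this to an excerpt of Table 1; truncating negative estimates to zero is an assumption of the sketch, chosen to match the ≈0 entries for sets 2 and 11. Small third-decimal differences from the table can arise because the tabulated mean squares are themselves rounded.

```python
import math

# Excerpt of Table 1: (data set, n fish per tank, grand mean in g, MS_within, MS_error).
TABLE_ROWS = [
    (1, 150, 0.553, 0.103, 0.022),
    (4, 180, 11.83, 68.04, 20.7),
    (5, 150, 12.31, 144.2, 25.2),
    (11, 127, 302.5, 5407, 6713),
]


def variance_components(n, grand_mean, ms_within, ms_error):
    """Solve Eq. (3) for sigma_b^2 and return (sigma_b^2, CV_error, CV_b)."""
    var_e = ms_error                               # within-tank (error) variance
    var_b = max((ms_within - ms_error) / n, 0.0)   # negative estimates set to zero
    return var_b, math.sqrt(var_e) / grand_mean, math.sqrt(var_b) / grand_mean


for data_set, n, mean, ms_w, ms_e in TABLE_ROWS:
    var_b, cv_e, cv_b = variance_components(n, mean, ms_w, ms_e)
    print(f"set {data_set:2d}: var_b = {var_b:.4f}  CV_error = {cv_e:.3f}  CV_b = {cv_b:.3f}")
```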
4. An example power calculation

We now show an example using the above formulae to calculate the values of n and b that will achieve a power of 80% (a conventional choice) in a comparative salmon rearing experiment, accepting a significance level of 5% ($\alpha = 0.05$, the probability of a Type I error). For a nested ANOVA, the formula for $\phi$ is

$$\phi = \sqrt{\frac{\delta^2}{2a s_{\bar{Y}}^2}} \qquad (7)$$

where the variance of group means $s_{\bar{Y}}^2$ is given by Eq. (4). We take the case of a trial using first-feeding salmon fry where the overall mean weight of the fish is expected to be 12 g at the end of the first-year growth period (lasting about 3 months). The consideration of practical significance is very important, and a figure is needed to design a useful experiment. A reasonable supposition is that a difference of interest should be at least as large as the natural variation one expects between salmon cohorts. This can be very large, as much as 30% in first-year fish (Webster, 2003, personal communication). However, much depends on local conditions and one cannot assume a universal figure. For this illustration, let us suppose we require an experiment sensitive enough to detect a difference of 10% between the means of the two ploidies, so that $\delta = 1.2$ g, the minimum detectable difference. For the coefficients of variation amongst fish and between tanks, we assume figures of 35% and 5%, respectively (which lie in the range of those in Table 1), giving values for the variances $\sigma_e^2$ and $\sigma_b^2$ of 17.6 and 0.36 g², respectively. The number of treatments a is 2. Using Eq. (4) to calculate $s_{\bar{Y}}^2$ for given n and b, a value for $\phi$ may be obtained from Eq. (7), which can be used to read off power from the nomograph with degrees of freedom $\nu_1 = 1$ in statistical tables (e.g., Sokal and Rohlf, 1995b, p. 188). Fig. 3 shows power curves for several values of b, illustrating how the power varies with n, assuming the above parameter values.

Fig. 3. Graph of experimental power vs. number of fish sampled, n, for numbers of tanks b from 2 to 8. The minimum detectable difference is 1.2 g; variances within tanks and between tanks are 17.6 and 0.36 g², respectively.

Previous experiments have typically used three tanks (or fewer) per ploidy, sampling of the order of 100 fish from each tank. Substituting these figures into Eq. (4) yields a value for $s_{\bar{Y}}^2$ of 0.18; from Eq. (7), we thus obtain $\phi = 1.42$. The degrees of freedom $\nu_2$ of the F distribution are $a(b - 1) = 4$. Referring to the nomograph for $\nu_1 = 1$, the power of such a design is somewhat less than 40%. If we require a significance level of 1% ($\alpha = 0.01$), the power will be even less. By substituting a range of n and b in Eq. (4), it becomes apparent that, for the above variances, a minimum of five tanks per ploidy is required to achieve a power of 80%, and then only with a sample of 5000 fish per tank (Fig. 3). Increasing the number of tanks to six brings the required sample size down to 200 fish per tank.

The most cost-effective arrangement can be obtained by first calculating n from Eq. (6). Using the same figures as above, the value of the numerator, $\sigma_e^2/\sigma_b^2$, is 49. Figures supplied by the Marine Institute based on current tank costs suggest a value for the ratio of fish cost to tank cost of 1/120 (this figure takes no account of economies of scale, minimum stocking densities or depreciation). These values give an optimal value of n of approximately 77, implying, from Eqs. (4) and (7), a figure for b of 8. To illustrate the sensitivity of power to the minimum detectable difference, we can repeat the calculation with $\delta = 2$ g in Eq. (7). This reduces the number of tanks required to achieve 80% power to three, with a sample of a little over 200 fish per tank.
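Reading power off Tang's nomographs is the method used here. As a cross-check, the following Python sketch computes the power for two treatments ($\nu_1 = 1$) directly from the noncentral F distribution, using only the standard library. For $\nu_1 = 1$ the statistic is $(Z + \sqrt{\lambda})^2 / (W/\nu_2)$, with Z standard normal, W chi-square on $\nu_2 = a(b - 1)$ degrees of freedom and $\lambda = a\phi^2 = \delta^2/(2 s_{\bar{Y}}^2)$, so the rejection probability can be integrated numerically over $\sqrt{W}$. The function names, the integration settings and the restriction to a = 2 are assumptions of this sketch, and exact values may differ slightly from nomograph readings.

```python
import math


def _norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _chi_density(u, nu):
    """Density of U = sqrt(W), where W is chi-square with nu degrees of freedom."""
    if u <= 0.0:
        return math.sqrt(2.0 / math.pi) if nu == 1 else 0.0
    log_c = math.log(2.0) - 0.5 * nu * math.log(2.0) - math.lgamma(0.5 * nu)
    return math.exp(log_c + (nu - 1) * math.log(u) - 0.5 * u * u)


def _tail(k, shift, nu, steps=1000):
    """P(|Z + shift| > k * U), Z ~ N(0, 1) independent of U; Simpson's rule over U."""
    upper = math.sqrt(nu) + 12.0
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        u = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        p = _norm_cdf(shift - k * u) + _norm_cdf(-shift - k * u)
        total += weight * _chi_density(u, nu) * p
    return total * h / 3.0


def nested_power(delta, b, n, var_e, var_b, alpha=0.05):
    """Power of the nested F test for two treatments (nu1 = 1), b tanks each, n fish per tank."""
    s2 = var_e / (n * b) + var_b / b   # expected variance of a group mean, Eq. (4)
    lam = delta ** 2 / (2.0 * s2)      # noncentrality: lambda = a * phi**2, phi from Eq. (7)
    nu2 = 2 * (b - 1)                  # denominator degrees of freedom a(b - 1)
    # Bisection for k = sqrt(F_crit / nu2), the point where the central tail equals alpha.
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if _tail(mid, 0.0, nu2) > alpha:
            lo = mid
        else:
            hi = mid
    return _tail(hi, math.sqrt(lam), nu2)


VAR_E, VAR_B, DELTA = 17.6, 0.36, 1.2   # g^2, g^2, g (Section 4 inputs)
print(f"b=3, n=100: power = {nested_power(DELTA, 3, 100, VAR_E, VAR_B):.2f}")
for tanks in (7, 8):
    print(f"b={tanks}, n=77: power = {nested_power(DELTA, tanks, 77, VAR_E, VAR_B):.2f}")
```

With these inputs the sketch gives roughly one-third for three tanks of 100 fish, consistent with "somewhat less than 40%", and at n = 77 the power passes 80% between b = 7 and b = 8. Numerical integration keeps the sketch free of third-party dependencies; where SciPy is available, its noncentral F distribution (`scipy.stats.ncf`) is the more conventional route.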
5. Conclusions

We have derived a formula for the statistical power of a nested ANOVA and shown how to achieve a desired power at the lowest cost. The power of a comparative test is rarely considered in salmonid experiments, and the above calculations suggest that many would have had only sufficient tanks to detect very large differences. However, the impact of this undersampling depends on the size of the effect that the experiment measures. If the observed difference between the ploidy means is too small to be of practical interest, a hypothesis test becomes irrelevant. We note that the issue of practical significance is rarely addressed in comparative trials. Given that natural differences can be large, it could be that only large treatment effects are typically of scientific interest.

The data used here show that at least five tanks per ploidy are required to ensure that a nested experiment has acceptable power, regardless of the number of fish sampled per tank. Most past experiments have used only three tanks. Normally, there are constraints on the number of tanks and fish available for an experiment. Such an arrangement may have sufficient power to detect only large effects, although smaller differences may not be of interest. The calculations here depend on the expected variances between tanks and within tanks; at present, insufficient data are available to determine the between-tank variance with any confidence. If $\sigma_b^2$ is substantially smaller than the figure used here, fewer tanks could be required. Conversely, a more sensitive experiment, able to detect smaller separations between means than in the above examples, could demand many more.

We conclude that future comparative experiments should pay closer attention to the issue of statistical power. The alternative is to risk obtaining results that are misleading or completely false. It is quite possible that more tanks are required than are routinely used at present. Further experimental work is necessary to establish the size of the variation between tanks, which may depend on local factors.
Acknowledgements
We would like to thank Peter Galbreath and Frode Oppedal for providing their experimental data on which some of the figures in Table 1 are based. We are also grateful to Nasrollah Saebi for reading the manuscript and for his helpful comments and suggestions for improvement.
References
Cotter, D., O'Donovan, V., Ó Maoiléidigh, N., Rogan, G., Roche, N., Wilkins, N.P., 2000. An evaluation of the use of triploid Atlantic salmon (Salmo salar L.) in minimising the impact of escaped farmed salmon on wild populations. Aquaculture 186, 61-75.
Cotter, D., O'Donovan, V., Drumm, A., Roche, N., Ling, E.N., Wilkins, N.P., 2002. Comparison of freshwater and marine performances of all-female diploid and triploid Atlantic salmon (Salmo salar L.). Aquaculture Research 33, 43-53.
Galbreath, P.F., St. Jean, W., Anderson, V., Thorgaard, G., 1994. Freshwater performance of all-female diploid and triploid Atlantic salmon. Aquaculture 128, 41-49.
Hansen, L.P., Youngson, A.F., 1998. Interactions between farmed and wild salmon and options for reducing their impact. In: Youngson, A.F., Hansen, L.P., Windsor, M.L. (Eds.), Interactions Between Salmon Culture and Wild Stocks of Atlantic Salmon: The Scientific and Management Issues. Report of a Symposium Organised by ICES/NASCO. NINA, Trondheim, pp. 80-89.
Hindar, K., Ryman, N., Utter, F., 1991. Genetic effects of aquaculture on natural fish populations. Aquaculture 48, 945-957.
Searcy-Bernal, R., 1994. Statistical power and aquaculture research. Aquaculture 127, 371-388.
Snedecor, G.W., Cochran, W.G., 1967. Statistical Methods, 6th ed. University of Iowa Press, Ames, IA. 593 pp.
Sokal, R.R., Rohlf, F.J., 1995a. Biometry, 3rd ed. Freeman, New York. 887 pp.
Sokal, R.R., Rohlf, F.J., 1995b. Statistical Tables, 3rd ed. Freeman, New York. 199 pp.
Tang, P.C., 1938. The power function of the analysis of variance tests with tables and illustrations of their use. Statistics Research Memorandum 2, 126-149.
Underwood, A.J., 1981. Techniques of analysis of variance in experimental marine biology and ecology. Oceanographic and Marine Biology Annual Review 19, 513-605.
Webster, J., 2003. Personal communication with the author.
