1. INTRODUCTION
In a paper three pages in length published in Biometrics Bulletin,
Wilcoxon [1] proposed two rank-based statistical tests for
analyzing data from experiments comparing two treatments. The
one for the matched-pairs design has come to be known as
the Wilcoxon signed-rank test (WSR) (or simply signed-rank
test). According to David [2, p.213], this clever and helpful term
was coined by Tukey [3] in an unpublished but repeatedly cited
technical report [the simplest signed-rank tests]. Clever and
helpful indeed is the compound adjective signed-rank. However,
nowadays, it is not clear whether the same can be said about the
phrase signed-rank test as a statistical term, because it has been
used to refer to two different statistical tests having completely
different null hypotheses and thus can give rise to ambiguity.
To make matters more tangled, one of those tests assumes
symmetry of distribution, while the null hypothesis of the other
test is that the distribution is symmetric. This intricate case of
terminological ambiguity (cf. Becker [4]) has practical ramifications: misconceptions, misinterpretations, and misapplications
can often trace their origins to it. As part of this short note, we
put forward a simple solution for this problem, namely, to avoid
using the phrase signed-rank test altogether and start a new terminological convention under which the adjective signed-rank
applies exclusively to the noun statistic. We believe that
the aforementioned convention would better serve statistical
practice and pedagogy.
The rest of the paper is organized as follows. Section 2 provides
background material taken from Lehmann and Romano [5].
Section 3 contains the formulation of the two distinct statistical tests
that have both been called the signed-rank test. Sections 4 and 5 deal
with the familiar topic of competition among tests based on the
signed-rank statistic, one-sample t-test, and sign test (Dixon and
Mood [6]), partly to illustrate the advantage of the proposed new
terminological convention, whose rationale will have been made
clear in Section 3. Section 6 provides some summary remarks. For
the purpose of this paper, an absolutely continuous distribution is
assumed for all random variables and random vectors.
Published 2014. This article is a U.S. Government work and is in the public domain in the USA.
H. Li and T. Johnson
symmetric about a known constant m0 (Lehmann and Romano
[5]). For the test procedure including test statistic and rejection
region, see Wilcoxon [1] and Lehmann and Romano [5].
H0: m = m0    (1)
H0: μ = μ0    (2)
where μ denotes the mean of the distribution and μ0 denotes
a known constant, with the assumption that the distribution
is normal. Because all normal distributions are symmetric, the
assumption of symmetry for TM is weaker than the assumption of
normality for the one-sample t-test. Thus, we have the following
ranking of assumptions from the strongest to the weakest: (i) (for
the one-sample t-test) the distribution is normal (and hence symmetric); (ii) (for TM) the distribution may or may not be normal but
is symmetric; and (iii) the shape of the distribution is unknown.
Under condition (i), where the normality assumption holds,
the one-sample t-test would be the test of choice, as it controls
type 1 error rate accurately and has various optimality properties and overall reasonable performance with regard to power
(e.g., Neyman [19] and King [20]). Under condition (ii), where
the symmetry assumption holds but the normality assumption is in
doubt, TM is more attractive than the one-sample t-test given
the exactitude and efficiency of the former (e.g., Hodges and
Lehmann [21], Chernoff and Savage [22]) and the possibly unsatisfactory control of type 1 error rate of the latter (e.g., Bahadur and
Savage [23], Romano [24]), especially when sample size is small.
Arguably, however, in practice, it is rarely the case that
the normality assumption is not accepted but the symmetry
assumption is acceptable (note that only situations other than
those in Section 4.1 are under discussion here). Although there
is a wide selection of parametric families of non-normal symmetric distributions, situations in which their use is fully justified are
relatively few. It seems to be far more often the case that when
the normality assumption is not accepted, the symmetry assumption
cannot be accepted either. Without the assumption of symmetry
for the distribution, the mean and the median are not necessarily the same, so TM and the one-sample t-test do not necessarily
test the same hypotheses. Nevertheless, with the aforementioned
caveat in mind, to illustrate what happens when the assumption
of symmetry is violated, we conduct a small simulation to contrast
their operating characteristics, in particular, type 1 error rates.
For the aforementioned simulation, we decided to choose a
bounded distribution, considering that in the real world, all data
generating distributions are bounded; otherwise, our choice of
distribution is mostly arbitrary. It turns out that the asymmetric beta distribution beta (2, 5) (see Gupta and Nadarajah [25]
for comprehensive coverage of beta distribution in theory and
practice) serves the objectives of our illustrative simulation quite
well. Note that the simulation results are applicable not only
to a distribution with the interval [0,1] as its range but also
to a family of distributions with a range of any magnitude, via
location-scale transformations.
Table I contains the results of a simulation of 10,000 runs with
various sample sizes generated from beta (2, 5) (for its shape, see
the asymmetric distributions displayed in Figure 1). Examine the
type 1 error rate inflation for TM in Table I and how it goes up with
sample size. This phenomenon makes perfect sense. For an asymmetric distribution, even if the median equals the value specified
in the null hypothesis for TM (1), the whole distribution is still
in the territory of the alternative hypothesis for TS (Figure 1), so the
probability of rejection is expected to trend up for TS as sample
size increases and therefore also for TM, because it has the same
test statistic and rejection region as TS. For the same reason, when
the distribution does not satisfy the assumption of symmetry, the
type 1 error rate for TM (1) is practically always inflated, and grossly
so when the sample size is large. On the other hand, the type 1
error rate for the one-sample t-test (2) is relatively well contained
and tends to be closer to the nominal 5% as sample size increases,
which is not surprising given the central limit theorem. The issue
of ties does not arise in this paper because an absolutely continuous distribution is assumed; where it does, TM (1) would in all likelihood
compare even less favorably to the one-sample t-test (2).

Table I. Type 1 error rates under beta (2, 5), 10,000 runs, nominal level 5%.

        Null value = median         Null value = mean
n       t-test (%)   TM (%)         t-test (%)   TM (%)
10      5.4          5.3            5.6          5.1
30      8.7          6.8            5.8          6.5
50      13.0         8.0            5.7          7.4
100     24.8         10.8           5.1          8.4
1000    98.8         63.2           4.8          37.3
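The inflation described for TM under beta (2, 5) can be reproduced in miniature. The following is a minimal Python sketch under stated assumptions, not the simulation code used for this paper: it draws samples from beta (2, 5), applies the two-sided signed-rank procedure with the null value set to the true median, and reports the rejection rate. The function names, the seed, the use of 2,000 rather than 10,000 runs, and the use of exact critical values are all illustrative choices. The t-test is left out because the Python standard library offers no t quantile.

```python
import random


def median_beta_2_5():
    """Median of beta(2, 5), found by bisection on its CDF.

    For integer shape parameters the CDF has a closed form:
    CDF(x) = P(Binomial(6, x) >= 2) = 1 - (1 - x)^6 - 6 x (1 - x)^5.
    """
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        cdf = 1 - (1 - mid) ** 6 - 6 * mid * (1 - mid) ** 5
        if cdf < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def lower_critical_value(n, alpha=0.05):
    """Largest c with P(W+ <= c) <= alpha / 2 under the null (-1 if none).

    W+ is the sum of the ranks of the positive differences. Its exact null
    distribution (no ties) is obtained by counting subsets of {1, ..., n}
    by their sum.
    """
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    total = 2 ** n
    cumulative, critical = 0, -1
    for s in range(max_sum + 1):
        cumulative += counts[s]
        if cumulative / total > alpha / 2:
            break
        critical = s
    return critical


def signed_rank_statistic(sample, m0):
    """Sum of the ranks of |x - m0| over observations with x > m0."""
    ordered = sorted((x - m0 for x in sample), key=abs)
    return sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)


def tm_rejection_rate(n, runs=2000, alpha=0.05, seed=2014):
    """Monte Carlo rejection rate of the two-sided test when the null
    value equals the true median of beta(2, 5)."""
    rng = random.Random(seed)  # arbitrary seed, for reproducibility only
    m0 = median_beta_2_5()
    c = lower_critical_value(n, alpha)
    max_sum = n * (n + 1) // 2
    rejections = 0
    for _ in range(runs):
        sample = [rng.betavariate(2, 5) for _ in range(n)]
        w = signed_rank_statistic(sample, m0)
        if w <= c or w >= max_sum - c:  # symmetric two-sided rejection region
            rejections += 1
    return rejections / runs


if __name__ == "__main__":
    for n in (10, 30, 50, 100):
        print(n, tm_rejection_rate(n))
```

With these settings the rejection rate should sit near the nominal 5% for n = 10 and climb as n grows, in line with the pattern discussed for Table I; exact figures vary with the seed and the number of runs.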
Some textbooks and tutorials do not mention the assumption
of symmetry when introducing TM to the reader. Some contain
remarks from which the reader may get the wrong impression
that TM is in general preferable to the one-sample t-test when the
normality assumption is in doubt. The aforementioned illustrative
simulation should serve as a warning of the perils of handling the
assumption of symmetry casually. Considering the serious consequences
of its violation, the assumption of symmetry ought to
be emphasized, rather than glossed over or brushed aside, at the
point where TM is introduced to the reader in elementary textbooks.
If TM is covered in a first course in statistics, it should be
instructive to describe its behavior under an asymmetric distribution,
in particular, the runaway type 1 error inflation as sample size
increases, and contrast it with that of the t-test under the same
distribution. After such an exercise (such as the simulation performed in
this paper), it should be clear that TM does not possess the kind
of robustness that the t-test does. Therefore, while the one-sample
t-test (2) (for mean) enjoys wide applicability, routine use of TM (1)
(for median) is not advisable. Finally, the shaded regions of Table I
remind us that TM should never be used as a test for mean, and
the t-test should never be used as a test for median.
treatment effects outside the null hypothesis of sign test, often,
the power of TS is higher than that of sign test but not always
(Section 5.2). All in all, there do not seem to be many reasons to
prefer sign test over TS unless one has absolutely no interest in
detecting those non-null treatment effects that happen to have a
median difference of 0.
n       Shift    TM (%)    Sign test (%)
10      0.00     4.6       2.0
        0.05     7.2       2.6
        0.10     14.0      5.0
        0.15     25.9      8.6
        0.20     42.0      15.1
30      0.00     4.9       4.1
        0.05     14.6      7.9
        0.10     39.6      17.7
        0.15     71.1      35.5
        0.20     92.0      58.7
50      0.00     5.1       3.3
        0.05     20.6      7.7
        0.10     61.5      24.2
        0.15     91.4      50.2
        0.20     99.2      78.7
100     0.00     5.1       3.8
        0.05     37.1      13.4
        0.10     89.1      46.6
        0.15     99.7      81.9
        0.20     >99.9     97.8
1000    0.00     5.1       4.5
        0.05     99.9      88.3
        0.10     >99.9     >99.9
        0.15     >99.9     >99.9
        0.20     >99.9     >99.9
Table II. Power: null distribution beta (4, 4), 10,000 runs,
H0: m = 0.5, two-sided α = 0.05.

n       Shift    TM (%)    Sign test (%)
10      0.00     5.0       2.2
        0.025    6.4       3.0
        0.05     12.6      5.2
        0.075    22.3      9.8
        0.10     35.7      16.7
30      0.00     4.7       4.1
        0.025    10.7      8.0
        0.05     31.2      19.3
        0.075    60.4      39.6
        0.10     85.1      63.4
50      0.00     5.0       3.3
        0.025    16.1      8.9
        0.05     49.4      27.9
        0.075    83.2      56.6
        0.10     97.4      82.5
100     0.00     4.7       3.6
        0.025    28.5      15.5
        0.05     79.3      52.5
        0.075    98.8      88.0
        0.10     >99.9     98.9
1000    0.00     5.2       4.5
        0.025    99.3      88.3
        0.05     >99.9     >99.9
        0.075    >99.9     >99.9
        0.10     >99.9     >99.9
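The rows with shift 0.00 in the power tables estimate the actual type 1 error rate rather than power. Because both test statistics are discrete, an exact two-sided test at nominal level 5% generally has a size somewhat below 5%, and the gap is larger for the sign test. The following Python sketch computes those exact sizes; it assumes exact (non-randomized) rejection regions and no ties, and the function names are hypothetical.

```python
from math import comb


def sign_test_size(n, alpha=0.05):
    """Exact size of the two-sided sign test at nominal level alpha.

    The lower rejection region is {B <= c}, where B ~ Binomial(n, 1/2) and
    c is the largest value with P(B <= c) <= alpha / 2; the upper region
    mirrors it.
    """
    total = 2 ** n
    cumulative, size = 0, 0.0
    for k in range(n + 1):
        cumulative += comb(n, k)
        if cumulative / total > alpha / 2:
            break
        size = 2 * cumulative / total
    return size


def signed_rank_size(n, alpha=0.05):
    """Exact size of the two-sided test based on the signed-rank statistic.

    The null distribution of W+ (sum of ranks of positive differences) is
    obtained by counting subsets of {1, ..., n} by their sum.
    """
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    total = 2 ** n
    cumulative, size = 0, 0.0
    for s in range(max_sum + 1):
        cumulative += counts[s]
        if cumulative / total > alpha / 2:
            break
        size = 2 * cumulative / total
    return size


if __name__ == "__main__":
    # For n = 10: sign test 22/1024 (about 2.1%), signed-rank 50/1024 (about 4.9%).
    for n in (10, 30, 50, 100):
        print(n, round(100 * signed_rank_size(n), 1), round(100 * sign_test_size(n), 1))
```

For n = 10 the sign test can reject only when at most one or at least nine differences are positive, which has probability 22/1024 under the null hypothesis; this is consistent with the values near 2% reported at shift 0.00 for that sample size.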
6. SUMMARY
A statistical test procedure proposed in Wilcoxon [1] has been
used for two different purposes: to test whether a distribution
is symmetric about a known constant and to test whether the
median of a symmetric distribution equals a known constant.
In the current literature, those two tests share the same name
(Section 1), which can be a source of confusion for students
and practitioners alike. To distinguish between those two tests,
we propose to refer to them as test for symmetry based on
signed-rank statistic (denoted by TS) and test for median based on
signed-rank statistic (denoted by TM), respectively. Note that the
proposed nomenclature does not introduce any new terms into
the statistical lexicon.
Equipped with our terminological improvement, we embarked
on a discussion of two topics of general interest: comparisons with the one-sample t-test and with the sign test. We hope we
have demonstrated that informative discussion of those topics is challenging without a clear distinction between TS
and TM. Here is a short recapitulation of our discussion. We
first considered the randomized matched-pairs design. In this
setting, TS is distribution-free as a test of the null hypothesis
of no treatment effect (Section 2). Its power does not go much
less than that of the t-test and seems to be higher than the sign
REFERENCES
Table IV. Power: null distribution standard Cauchy, 10,000
runs, H0: m = 0, two-sided α = 0.05.

n       Shift    TM (%)    Sign test (%)
10      0        4.8       2.2
        0.2      6.7       3.7
        0.5      12.6      8.1
        1.0      28.6      24.8
        1.5      41.8      41.8
        2.0      50.8      54.7
30      0        4.8       4.4
        0.2      9.2       9.5
        0.5      29.4      34.3
        1.0      69.6      80.7
        1.5      89.2      96.1
        2.0      95.8      99.2
50      0        4.7       3.0
        0.2      12.7      10.9
        0.5      44.5      49.1
        1.0      89.4      94.5
        1.5      98.5      99.7
        2.0      99.7      >99.9
100     0        5.0       3.5
        0.2      19.0      18.6
        0.5      73.4      81.4
        1.0      99.5      >99.9
        1.5      >99.9     >99.9
        2.0      >99.9     >99.9
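A power comparison of this kind under a shifted standard Cauchy distribution can be sketched in a few lines of Python. This is an illustrative sketch, not the code behind Table IV: it assumes exact two-sided rejection regions for both tests, uses 2,000 runs instead of 10,000, and all function names and the seed are hypothetical.

```python
import math
import random


def signed_rank_critical(n, alpha=0.05):
    """Largest c with P(W+ <= c) <= alpha / 2 under the null (-1 if none)."""
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    total = 2 ** n
    cumulative, critical = 0, -1
    for s in range(max_sum + 1):
        cumulative += counts[s]
        if cumulative / total > alpha / 2:
            break
        critical = s
    return critical


def sign_critical(n, alpha=0.05):
    """Largest c with P(B <= c) <= alpha / 2 for B ~ Binomial(n, 1/2)."""
    total = 2 ** n
    cumulative, critical = 0, -1
    for k in range(n + 1):
        cumulative += math.comb(n, k)
        if cumulative / total > alpha / 2:
            break
        critical = k
    return critical


def power_under_cauchy(n, shift, runs=2000, alpha=0.05, seed=2014):
    """Rejection rates (signed-rank, sign test) for H0: m = 0 when the data
    are standard Cauchy shifted by `shift`."""
    rng = random.Random(seed)  # arbitrary seed, for reproducibility only
    c_rank = signed_rank_critical(n, alpha)
    c_sign = sign_critical(n, alpha)
    max_sum = n * (n + 1) // 2
    reject_rank = reject_sign = 0
    for _ in range(runs):
        # Inverse-CDF sampling of a Cauchy variate, then the shift.
        sample = [shift + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
        ordered = sorted(sample, key=abs)  # null value is 0, so differences are the data
        w = sum(rank for rank, x in enumerate(ordered, start=1) if x > 0)
        positives = sum(1 for x in sample if x > 0)
        if w <= c_rank or w >= max_sum - c_rank:
            reject_rank += 1
        if positives <= c_sign or positives >= n - c_sign:
            reject_sign += 1
    return reject_rank / runs, reject_sign / runs


if __name__ == "__main__":
    for shift in (0.0, 0.5, 1.0):
        print(shift, power_under_cauchy(30, shift))
```

Both tests are applied to the same simulated samples, so the two rejection rates can be compared directly. For n = 30 and moderate shifts, the sign test should come out ahead of the signed-rank procedure, as in Table IV; the exact figures depend on the seed and the number of runs.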
Acknowledgements
The authors would like to thank Ted Peterson for his assistance with generating the figure for the manuscript. In addition,
the authors would like to express their appreciation to two
colleagues, Drs. Vandana Mukhi and Jessica Kim, for providing
helpful comments.