When T-Tests or Wilcoxon-Mann-Whitney Tests Won't Do

Adv Physiol Educ 34: 128–133, 2010;
Staying Current doi:10.1152/advan.00017.2010.

Fiona McElduff,1 Mario Cortina-Borja,1 Shun-Kai Chan,2 and Angie Wade1
1Medical Research Council Centre of Epidemiology for Child Health and 2Nephro-Urology Unit, Institute of Child Health,
University College London, London, United Kingdom
Submitted 1 February 2010; accepted in final form 9 July 2010
McElduff F, Cortina-Borja M, Chan SK, Wade A. When t-tests or Wilcoxon-Mann-Whitney tests won’t do. Adv Physiol Educ 34: 128–133, 2010; doi:10.1152/advan.00017.2010. t-Tests are widely used by researchers to compare the average values of a numeric outcome between two groups. If there are doubts about the suitability of the data for the requirements of a t-test, most notably the distribution being non-normal, the Wilcoxon-Mann-Whitney test may be used instead. However, although often applied, both tests may be invalid when discrete and/or extremely skewed data are analyzed. In medicine, extremely skewed data having an excess of zeroes are often observed, representing a numeric outcome that does not occur for a large percentage of cases (so is often zero) but which also sometimes takes relatively large values. For data such as this, application of the t-test or Wilcoxon-Mann-Whitney test could lead researchers to draw incorrect conclusions. A valid alternative is regression modeling to quantify the characteristics of the data. The increased availability of software has simplified the application of these more complex statistical analyses and hence makes it easier for researchers to use them. In this article, we illustrate the methodology applied to a comparison of cyst counts taken from control and steroid-treated fetal mouse kidneys.

Poisson; negative binomial; zero inflation

Address for reprint requests and other correspondence: F. McElduff, Medical Research Council Centre of Epidemiology for Child Health, UCL Institute of Child Health, 30 Guilford St., London WC1N 1EH, UK (email: f.mcelduff@ich.ucl.ac.uk).

WHEN FACED WITH THE TASK of comparing a numeric outcome between two groups, most clinicians perform either a t-test or, if the data set is small and/or the assumptions for parametric testing are not met, a Wilcoxon-Mann-Whitney test (WMW; also called the Mann-Whitney-Wilcoxon test, Wilcoxon rank-sum test, Wilcoxon test, or Mann-Whitney U-test). Statistical methods may be divided according to the assumptions they demand from the data. Parametric methods suppose that the data follow a specific probability distribution whose parameters can be estimated, whereas the basic principle of nonparametric tests, such as the WMW test, is that the observed pooled values are ranked and no particular distributional assumptions are made. Statistical analyses of highly skewed distributions can be especially problematic, particularly when the data are discrete and exhibit value inflation.

The t-test is a parametric test of the difference in mean between two groups that assumes that the data are normally distributed and the groups have equal variances. Welch’s approximation (13) can be used when the assumption of equal variances is not satisfied. While it is commonly believed that the WMW test compares medians in the same way that a t-test compares means, this does not always hold. In some cases, a WMW test indicates a significant difference between groups when the medians are equal (2, 3, 6), in contrast to what many researchers would expect. There are two versions of the WMW test: a general form and the location-shift model, both of which can be problematic when skewed discrete data are analyzed.

The general form of the WMW test assesses the probability (P) that any observation drawn at random from the first group (X) will exceed any observation from the second group (Y), i.e., it tests P(X > Y) = 0.5 against P(X > Y) ≠ 0.5 or a one-sided alternative, P(X > Y) > 0.5 or P(X > Y) < 0.5. It assumes that all the observations are independent and are either continuous or ordinal with no possibility of ties (6). This is often an unrealistic assumption for ordinal and discrete data. Although the variance of the WMW test statistic can be adjusted in the presence of ties, this adjustment always results in an increase in the test statistic and smaller P values, and this version of the WMW test becomes less conservative (12) for increasing proportions of tied observations in the pooled sample.

The location-shift form of the WMW test is more restrictive and assumes that the distributions of values in the two groups differ only in location. Using this form, the test assesses the significance of any difference in medians (6). Fagerland and Sandvik (5) showed that the WMW test in its location-shift form is not robust in the presence of skewed data. Hence, when the distributions are skewed and dispersions are unequal, this form of the WMW test is inappropriate, and, consequently, incorrect interpretations may be made. If all necessary assumptions are made, then it is unlikely that the t-test of means would be invalid (3), and, therefore, the location-shift form has very limited applicability (6).

In this article, we analyzed counts of cysts in two groups of mouse kidneys: one treated with a steroid and a control group. Kidney cyst counts were often zero and had a highly skewed distribution. When comparisons were made between the two groups, a WMW test detected a significant difference, although the group medians were both equal to zero. We used this data set to illustrate alternative and valid methods for quantifying and formally testing the significance of differences for data of this form. The standard practice of applying WMW tests to overcome the usual assumptions of the t-test without consideration of the data’s distribution is not a panacea for every data set where a t-test is inappropriate. In this example, the assumptions of the location-shift form of the hypothesis of the WMW test are invalid, and the cyst data set is inappropriate for the correct application of this test. While the general version of the WMW test is not incorrect in this instance, it provides less information about the difference in location than other methods. We showed how considering several regression models offers a viable and more informative alternative means of analysis for data with an excess of zeroes and extreme skewness. Regardless of the issue of validity of the WMW test, a regression approach has some advantages over simple tests of comparisons: 1) model parameters can be estimated to allow for greater interpretation of the data and 2) goodness-of-fit statistics allow for comparisons between regression models.
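The statement that a WMW test can indicate a difference when the medians are equal can be illustrated with a minimal sketch. The two samples below are hypothetical and were invented for illustration (they are not the cyst data), and `prob_index` is an illustrative helper name. It estimates P(X > Y) over all pairs, counting each tie as one half, which is the usual way the quantity compared with 0.5 by the general form of the test is handled when ties occur.

```python
from statistics import median


def prob_index(x, y):
    """Estimate P(X > Y) + 0.5 * P(X = Y) over all (x, y) pairs."""
    wins = sum(1 for xi in x for yi in y if xi > yi)
    ties = sum(1 for xi in x for yi in y if xi == yi)
    return (wins + 0.5 * ties) / (len(x) * len(y))


# Hypothetical zero-heavy counts, invented for illustration only.
group_x = [0, 0, 0, 0, 0, 0, 1, 3, 5, 9]
group_y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

median_x = median(group_x)
median_y = median(group_y)
index = prob_index(group_x, group_y)

print("medians:", median_x, median_y)
print("estimated P(X > Y), ties counted as half:", index)
```

Both hypothetical samples have a median of 0, yet the estimated index is 0.665 rather than 0.5, so a test of the general hypothesis can respond to a difference that a comparison of medians cannot show.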
1043-4046/10 Copyright © 2010 The American Physiological Society
This is a training article to illustrate common misconceptions about the t-test and WMW test and to propose the use of discrete regression modeling as an alternative. The data set we used represents data resulting from a typical experiment found in research. It provides a simple example of a situation where the assumptions of the commonly applied t-test and WMW test are violated. Although these tests are clearly inappropriate for the real-life data presented here, we include them for illustrative purposes.

METHODS

Data. The example data set originated from a study (3a) investigating the effect of a corticosteroid on cyst formation in mouse fetuses undertaken within the Department of Nephro-Urology at the Institute of Child Health of University College London. Embryonic mouse kidneys were cultured, and a random sample was subjected to steroids (n = 111), whereas the remainder acted as controls (n = 103). Six days later, cyst counts were compared between the groups.

Summary descriptive statistics and bar charts were used to describe the data. Normality was formally assessed using the Shapiro-Wilk test (12), and the control and steroid-treated groups were compared using a t-test and WMW test. Mean and median differences are presented with 95% confidence intervals (CIs). A χ²-test (4) assessed the difference in the percent zero counts between the two groups.

Regression modeling. A series of different regression models was used to describe the relationship between cyst counts and steroid status. First, a normal linear regression model was used to compare the means between the two groups, which for a single binary predictor (steroid/control) is equivalent to the t-test. The assumptions of linear regression are the same as for the t-test, i.e., that the data are continuously numeric and normally distributed with equal variances in each group. The purpose of presenting this regression model is to allow comparisons of model fit diagnostics with other, less commonly used, models. Regression diagnostics are not available when the t-test is used.

Second, a Poisson regression model was fitted to the data. This model is appropriate when the outcome is a count or a rate and quantifies the mean number of events (or counts). Within each of the two groups (steroid/control), the mean and variance are assumed to be equal (1). If the variance of the distribution is significantly greater than the mean, there is overdispersion, and the Poisson regression model will correctly estimate the covariate parameters but will underestimate their SEs (8). An extension of the Poisson regression model is the negative binomial regression model (1), which incorporates an extra parameter to allow for overdispersion in the data. The improvement in model fit obtained using the negative binomial rather than the Poisson model can be formally tested.

Allowing for zero inflation. Zero inflation occurs where there is an excess number of zeroes in the population, often arising from a separate process in the population. For example, some individuals are infertile and will never have children, and others are fertile but may be childless; the total number of individuals with no children depends on both processes combined.

Both the Poisson and negative binomial regression models can be extended to incorporate an additional parameter that allows for zero inflation in the data. The resulting models are known as the zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) (1), and the extra parameter quantifies the excess of zeroes. These models result in two parameter estimates for each covariate entered into the model. For example, in our data set, one parameter will estimate the increased probability of a kidney being unable to produce cysts. The other parameter will estimate the difference in cyst counts between the steroid-treated and untreated groups of those kidneys that can/do produce cysts (which can take values of 0, 1, 2, etc.). Note that the total number of zeroes is the sum of kidneys unable to produce cysts and those kidneys that can produce cysts but haven’t during this experiment.

Model parameters. Parameter estimates in linear regression give the difference in means between the groups. In contrast, the Poisson and negative binomial models yield rate ratios (RRs), which estimate the relative (rather than absolute) change in the mean number of events between the groups. RRs can be expressed in different ways. For example, RR = 1.25 indicates that the mean in one group is, on average, 1.25 times the mean in the other or, alternatively, that there is a 25% increase in one group compared with the other. On the other hand, RR = 0.83 indicates a 17% decrease in one group compared with the other.

Zero-inflated regression models additionally yield the relative odds of the proportions of zeros in each group. The odds and probability are calculated by dividing the number of events by the number without the event and by the total number of observations, respectively. For example, an odds value of 2 (for every two individuals that experience the event, one individual does not) equates to a probability of 0.67 [equal to 2/(1 + 2) = 2/3]; odds of 0.25 (for every individual for whom the event occurs, there are four individuals for whom it does not) equates to a probability of 0.2 (equal to 1/5). Odds ratios are calculated as the odds in one group divided by the odds in another group. Rate ratios and odds ratios are always positive, and a value of 1 means that there is no difference between the groups. All estimates (mean differences and rate and odds ratios) are presented with 95% CIs.

Comparison of regression models. There are a variety of ways of deciding which regression model best describes the data. A more complex model, with additional parameters, generally improves the goodness of fit (i.e., how closely the model follows the data) but requires the estimation of extra parameters. Ideally, our chosen optimal model should describe the data’s main features without losing information or being overly complex. It is important that model selection procedures avoid including redundant parameters (overfitting); we do not want to have more parameters than necessary.

The regression models described above (linear, Poisson, negative binomial, ZIP, and ZINB) are not all direct extensions of each other. Hence, we need to use comparison methods that do not rely on the models being nested, i.e., that all the parameters of the smaller model of the two being compared are included in the larger model. Therefore, we compared models using the Bayesian information criterion (BIC), which is a well-established measure of goodness of fit (10) that also applies to nonnested models. The model’s likelihood is a measure of how well it fits the data set. The BIC is based on the model’s log likelihood, with larger values of the log likelihood indicating better goodness of fit. To account for the fact that more complex models always lead to the likelihood remaining the same or increasing (adding parameters cannot make the overall fit worse), BIC makes an adjustment penalizing for the number of estimated parameters and is more conservative against overfitting, i.e., it leads to models with a smaller number of parameters than other measures of goodness of fit. This is particularly important when the potential number of covariates to be adjusted for is large. Models with lower BIC values indicate a better penalized goodness of fit; however, the absolute value of the BIC is meaningless by itself, and only differences in BIC matter.

All analyses were performed using the R environment for statistical computing (version 2.10.1) (11) on a Windows platform. The glm function in the base package of R was used to fit normal, Poisson, and negative binomial models, and the pscl library (15) was used to fit zero-inflated models, both of which use maximum likelihood estimation methods.

RESULTS

Exploratory data analysis. Bar charts of the cyst frequencies (Fig. 1) showed that most kidneys had no cysts (74.3% overall), although there are a few kidneys in each group with large
Advances in Physiology Education • VOL 34 • SEPTEMBER 2010


Fig. 1. Bar graphs of the frequencies of cysts in the kidneys for the two groups [steroid-treated group (top) and control group (bottom)].
cyst counts; hence, the distributions were highly skewed to the right and zero inflated. The overall mean number of cysts was 0.87 (SD: 2.28), indicating overdispersion since the variance (2.28² ≈ 5.20) is about six times the mean. The values showed a larger number of kidneys with low counts in the control group. The steroid-treated group had 1 kidney with 19 cysts, which was much higher than the maximum number of cysts found in the group of control kidneys (maximum: 3). Not surprisingly, the Shapiro-Wilk test for normality gave P < 0.01, indicating that the counts did not follow a normal distribution. Similarly, Bartlett’s F-test for homogeneity of variances yielded P < 0.01, which implied a significant difference in the variances between the two groups. Note that this test assumes normality. The Ansari-Bradley test for equal scale parameters (9) also gave P < 0.01, indicating a significant difference in dispersion between the groups.

Table 1 shows summary statistics for the steroid-treated and control groups. The means were 1.55 and 0.15 for the steroid-treated and control groups, with SDs of 2.98 and 0.51, respectively. The medians for the two groups were both 0, with the steroid-treated group having an interquartile range of 2 and the control group an interquartile range of 0. The mean and SD in the steroid-treated group were much higher than those in the control group, although the medians were equal.

Table 1. Summary statistics for counts of cysts for the steroid-treated and control groups

                                              Steroid-Treated Group  Control Group
Mean                                          1.55                   0.15
SD                                            2.98                   0.51
Median                                        0                      0
Interquartile range (quartile 1, quartile 3)  2 (0, 2)               0 (0, 0)
Minimum                                       0                      0
Maximum                                       19                     3
Percent zeroes                                58.56                  91.26

Tests of comparisons for the two groups. Clearly, the assumptions for the application of a two-sample t-test were not met, but we present the results purely for comparative purposes. A t-test yielded a highly significant difference between the means of 1.40 [95% CI: (0.82, 1.99), P < 0.01]. The WMW test under the general null hypothesis P(X > Y) = 0.5 produced P < 0.01, similarly indicating a significant difference. A median difference of 0 and a value of 0 for both the upper and lower 95% CI for the median difference were also calculated, and this contrasts with the results from the WMW test, where a significant difference was detected. A comparison of the number of kidneys with zero cysts showed that there were significantly more (χ² statistic = 28.23, 1 degree of freedom, P < 0.01) kidneys with zero cysts in the control group than in the steroid-treated group [percent difference: 32.7%, 95% CI: (21.1%, 44.3%)].

Regression models. Table 2 shows the parameter estimates, CIs, P values, and BIC for each of the regression models described above. As expected, linear regression gave the same information as the t-test but additionally yielded a BIC value that could be compared with those obtained from other regression models. The Poisson model showed that the average number of cysts in kidneys in the steroid-treated group was almost 11 times [RR: 10.64, 95% CI: (6.28, 18.03), P < 0.01] higher than in kidneys not treated with steroids.

The negative binomial model gave the same rate ratio as the Poisson model but with larger SEs, resulting in a wider CI than the Poisson model. As expected, the overdispersion parameter in the negative binomial model was significant (P < 0.01), and the BIC values clearly indicated that the negative binomial model resulted in an improvement in fit to the data (450.68 compared with 667.07 for the Poisson model).

For the ZIP model, the zero inflation parameter was significant (model 4: P = 0.04), and the odds of a kidney having zero cysts in the steroid-treated group were 0.20 times those in the control group [95% CI (0.08, 0.51), P < 0.01] (model 5). Given that there are cysts, the rate was over three times [RR: 3.23, 95% CI: (1.51, 6.94), P < 0.01] higher on average for kidneys in the steroid-treated group compared with the control group. The additional parameter to model the zero-inflated component in the Poisson model reduced the BIC value greatly (507.94), and it fell even further when the variation in zero inflation between the steroid-treated and control groups was included (505.98), with both providing a much better fit than the simple Poisson model (BIC: 667.07).
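The exploratory figures and the χ² comparison of zero counts reported in this section can be re-derived from the observed frequencies. The sketch below is in Python rather than the R environment the authors used, and it rests on two assumptions that are not stated in the source: that the steroid-group observed frequencies in Table 3 occupy cyst counts 0 to 11 plus one kidney at 19 (an arrangement that matches the reported group means and, to within rounding, the SDs), and that the χ² statistic was computed with the continuity (Yates) correction. The helper names `describe` and `yates_chi2` are illustrative.

```python
from math import sqrt

# Kidneys per cyst count, read from the observed rows of Table 3.
# ASSUMPTION: steroid-group frequencies sit at counts 0-11 plus one kidney at 19.
control = {0: 94, 1: 4, 2: 4, 3: 1}
steroid = {0: 65, 1: 14, 2: 10, 3: 6, 4: 4, 5: 2, 6: 2, 7: 2,
           8: 1, 9: 1, 10: 1, 11: 2, 19: 1}


def describe(freq):
    """Return (n, mean, sample SD, proportion of zeroes) for a frequency table."""
    n = sum(freq.values())
    mean = sum(x * k for x, k in freq.items()) / n
    var = sum(k * (x - mean) ** 2 for x, k in freq.items()) / (n - 1)
    return n, mean, sqrt(var), freq.get(0, 0) / n


def yates_chi2(a, b, c, d):
    """Continuity-corrected chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (abs(a * d - b * c) - n / 2) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


pooled = {x: control.get(x, 0) + steroid.get(x, 0) for x in set(control) | set(steroid)}
n_all, mean_all, sd_all, zero_all = describe(pooled)
dispersion_ratio = sd_all ** 2 / mean_all  # variance-to-mean ratio; 1 under a Poisson model

n_c, mean_c, sd_c, zero_c = describe(control)
n_s, mean_s, sd_s, zero_s = describe(steroid)

# Kidneys with zero cysts versus kidneys with at least one cyst, by group.
chi2 = yates_chi2(steroid[0], n_s - steroid[0], control[0], n_c - control[0])

print(f"overall mean {mean_all:.2f}, SD {sd_all:.2f}, variance/mean {dispersion_ratio:.2f}")
print(f"zeroes: steroid {100 * zero_s:.2f}%, control {100 * zero_c:.2f}%, chi-square {chi2:.2f}")
```

Working from a frequency table rather than raw observations keeps the example short; a variance-to-mean ratio well above 1 is the overdispersion that motivates the negative binomial model.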
Table 2. Estimates, 95% confidence intervals, P values, and BIC values for the various models for the control and steroid-treated groups

                                              Estimate  95% CI          P Value  BIC Value
Model 1: linear (raw estimate)
  Control group (constant)                    0.15      -0.27, 0.57     0.50
  Steroid-treated group                       1.40      0.82, 1.99      <0.01    953.41
Model 2: Poisson (rate)
  Control group (constant)                    0.15      0.09, 0.24      <0.01
Model 3: negative binomial (rate)
  Control group (constant)                    0.15      0.08, 0.27      <0.01
  Steroid-treated group                       10.64     5.09, 22.24     <0.01
  Dispersion                                  1.31      1.17, 1.47      <0.01    450.68
Model 4: zero-inflated Poisson (3 parameters)
  Count model (rate)
  Zero inflation (odds)
  Control group (constant)                    1.52      1.02, 2.25      0.04     507.94
Model 5: zero-inflated Poisson (4 parameters)
  Count model (rate)
  Control group (constant)                    1.13      0.53, 2.38      0.76
Model 6: zero-inflated negative binomial (4 parameters)
  Count model (rate)
  Dispersion                                  5.49      0.96, 31.30     0.33
  Control group (constant)                    0.31      0.01, 17.44     0.57     455.20
Model 7: zero-inflated negative binomial (5 parameters)
  Count model (rate)
  Steroid-treated group                       4.38      1.33, 14.40     0.02
  Dispersion                                  1.67      0.41, 6.89      0.48
  Steroid-treated group                       0.18      0.03, 1.02      0.05     457.80

Parameter estimates for the control (constant) and steroid-treated groups are given as either raw estimates, rates, or odds ratios for the regression models; dispersion parameters are also given for the negative binomial models. Upper and lower confidence limits for all the parameter estimates are shown as well as P values and Bayesian information criterion (BIC) values.
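For a Poisson model with a single binary predictor, the maximum likelihood rate in each group is simply the group mean, so the Poisson rate ratio, its Wald confidence interval, and the BIC can be computed by hand. The sketch below is Python, not the R `glm` call the authors used, and it assumes that the steroid-group observed frequencies in Table 3 occupy cyst counts 0 to 11 plus one kidney at 19. The BIC is computed as −2ℓ + k × log(n), the formula given in the Discussion, with k = 2; `poisson_fit` is an illustrative helper name.

```python
from math import exp, factorial, lgamma, log, sqrt
from statistics import NormalDist

# Kidneys per cyst count, read from the observed rows of Table 3.
# ASSUMPTION: steroid-group frequencies sit at counts 0-11 plus one kidney at 19.
control = {0: 94, 1: 4, 2: 4, 3: 1}
steroid = {0: 65, 1: 14, 2: 10, 3: 6, 4: 4, 5: 2, 6: 2, 7: 2,
           8: 1, 9: 1, 10: 1, 11: 2, 19: 1}


def poisson_fit(freq):
    """Return (n, total events, rate MLE, maximised log-likelihood) for one group."""
    n = sum(freq.values())
    events = sum(x * k for x, k in freq.items())
    rate = events / n  # the Poisson MLE of the rate is the group mean
    loglik = sum(k * (x * log(rate) - rate - lgamma(x + 1)) for x, k in freq.items())
    return n, events, rate, loglik


n_c, events_c, rate_c, ll_c = poisson_fit(control)
n_s, events_s, rate_s, ll_s = poisson_fit(steroid)

# Rate ratio (steroid vs. control) with a Wald 95% CI built on the log scale.
rr = rate_s / rate_c
se_log_rr = sqrt(1 / events_s + 1 / events_c)
z = NormalDist().inv_cdf(0.975)
ci_low = exp(log(rr) - z * se_log_rr)
ci_high = exp(log(rr) + z * se_log_rr)

# BIC = -2 * loglik + k * log(n), with k = 2 (control rate and steroid effect).
bic = -2 * (ll_c + ll_s) + 2 * log(n_c + n_s)

# Expected number of control kidneys with 0, 1, and 2 cysts under the fitted Poisson.
fitted_control = [round(n_c * exp(-rate_c) * rate_c ** x / factorial(x)) for x in range(3)]

print(f"RR {rr:.2f}, 95% CI ({ci_low:.2f}, {ci_high:.2f}), BIC {bic:.2f}")
print("fitted control frequencies for 0, 1, 2 cysts:", fitted_control)
```

Under these assumptions the results are close to the Poisson figures reported in the text (RR 10.64, CI 6.28 to 18.03, BIC 667.07). The negative binomial and zero-inflated models have no such closed form and need an iterative fit.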

The basic negative binomial model (model 3) gave a better fit than any of the Poisson models (BIC: 450.68), with significant overdispersion, as expected. The constant zero inflation term (model 6) was nonsignificant, and the model that allowed zero inflation to vary according to group showed the difference to be almost significant [odds ratio: 0.18, 95% CI (0.03, 1.02), P = 0.05]. However, increasing the complexity of the negative binomial model resulted in higher BIC values (455.20 and 457.80 for ZINB without and with zero inflation varying by group, respectively) as it was penalized by the incorporation of the additional parameters, indicating models that do not provide a more parsimonious fit to the data.

Comparison of regression models. The observed and predicted numbers of cysts for models 1–3, 5, and 7 are shown in Table 3. In the distributions for the control group, the predicted distributions for negative binomial, ZIP, and ZINB were all similar to the observed data. However, for the steroid-treated group, there were differences between the distributions. The normal distribution provided the worst fit to the data, and this was reflected in the BIC value in Table 2. The Poisson distribution also yielded a poor fit, as indicated by both the fitted values and BIC value. The additional zero inflation parameter provided a better fit to the data in the ZIP model but did not fit as well as the negative binomial or ZINB models, in terms of both the BIC value and fitted values shown in Table 3 (steroid-treated group). The BIC values shown in Table 2 indicate that the negative binomial model was the most parsimonious, and therefore preferred, model.

Table 3. Observed and fitted numbers of cysts for the control and steroid-treated groups

Frequency of Cysts
                                            0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
Control group
  Observed numbers                         94  4  4  1
  Fitted numbers
  Model 1: normal                          77 20
  Model 2: Poisson                         89 13  1
  Model 3: negative binomial               94  4  2  1  1
  Model 5: zero-inflated Poisson           94  5  3  1
  Model 7: zero-inflated negative binomial 94  6  2  1
Steroid-treated group
  Observed numbers                         65 14 10  6  4  2  2  2  1  1  1  2                       1
  Fitted numbers
  Model 1: normal                          13 15 15 13 11  8  5  3  1  1
  Model 2: Poisson                         24 37 28 15  6  2
  Model 3: negative binomial               64 17  9  6  4  3  2  2  1  1  1  1
  Model 5: zero-inflated Poisson           65  5  8 10  9  7  4  2  1
  Model 7: zero-inflated negative binomial 65 14  9  6  4  3  2  2  1  1  1  1

n = 103 mice in the control group and 111 mice in the steroid-treated group.

DISCUSSION

The most commonly used methods for comparing location in two groups are the t-test or WMW test. The advantage of permutation tests (7) is that they provide a distribution-free alternative to test differences of group means; however, they are disadvantaged as they are difficult to generalize to a regression setup such as discrete regression modeling. The normality assumption of the t-test is inappropriate for highly skewed discrete data, where overdispersion and/or zero inflation can result in a misleading difference between means being detected. Value inflation (namely, zero inflation) can lead to a high proportion of tied values in the location-shift version of the WMW test, resulting in incorrect conclusions being drawn from P values. In these cases, tests may lead to conflicting results and may not agree with observations made from exploratory data analyses where the medians do not indicate a difference between groups. While an advantage of t-tests and WMW tests is their simplicity and ease of application, they are greatly disadvantaged when applied to count data due to these erroneous assumptions (14). It can be shown using simulation that the WMW test is particularly vulnerable to zero inflation, especially when the probability of belonging to the zero class is >0.75, and less so to overdispersion. More details on the simulations performed are available upon request from the corresponding author.

Here, we present a number of regression models for discrete data as an alternative to the t-tests and WMW tests. Poisson regression models can be used to model the mean number of events, and the negative binomial model includes an extra parameter to allow for overdispersion. A further extension of these models is to model zero inflation in the data, resulting in the ZIP and ZINB models, each with corresponding additional parameters.

While regression analyses using distributions such as the Poisson, negative binomial, ZIP, and ZINB models might be more complex, they model discrete counts, allowing for correct assumptions, and also provide more valid and useful estimates of differences between groups. Software for fitting the models is now widespread, for example, in R and STATA. BIC values are provided alongside parameter estimates and SEs in R, whereas in other packages such as STATA, the log likelihood (ℓ) is given instead, from which the BIC value can easily be calculated as follows: −2ℓ + k × log(n), where k is the number of parameters in the model and n is the sample size.

A further advantage is that the regression modeling approach is very flexible. Models can be extended to quantify the associations with additional covariates, which may be continuous (as opposed to the single binary covariate, steroid/control, used in our example data set). In this example, the number of groups could be extended to include more than one type of steroid or a continuous variable, e.g., the growth of the kidneys over the number of days cultured. Interactions between covariates can also be investigated in the models, e.g., we could look at the interaction between the growth of kidneys and steroid-treated groups. A further use of discrete regression is in modeling rates. For counts of events measured in a certain time period, where the time period may vary across observations, dividing the counts of events by the length of a time period gives a rate. In discrete regression modeling, an offset, which has no parameter estimate or P value, can be included to adjust for the different time periods.

In the example presented in this article, both the t-test and WMW test indicated that a significant difference exists in the number of cysts between the control and steroid-treated groups. While exploratory data analysis revealed that there was a difference between the means of the two groups, there was no difference in the medians. Of the seven regression models fitted, the BIC values indicated that the negative binomial model provided the best fit to the data. This model showed that the mean number of cysts was significantly higher in the steroid-treated group than in the control group and that there was significant overdispersion present in the data.

The cyst counts on a given day provide a snapshot of each kidney’s ability to produce cysts. The underlying capacities of the cells may or may not be identical. The Poisson model assumes that they are the same, whereas the negative binomial model allows for variation. Therefore, the data suggest that the more complex model was appropriate. However, the preference of the negative binomial model over the ZINB model showed that there did not appear to be a subset of kidneys that cannot produce cysts.

This article demonstrates how commonly used tests for the comparison of two groups, namely, the t-test and WMW test, can be problematic when dealing with discrete counts. Erroneous assumptions may need to be made for the application of the t-test and linear regression to skewed discrete data, hence invalidating results. Less obviously, nonparametric tests such as the WMW test, where value inflation leads to an increased number of ties, may also lead to incorrect conclusions. We present the use of regression modeling as an alternative method for comparing groups with skewed observations, using the Poisson distribution, which models the mean number of events and can be extended to allow for overdispersion and value inflation in the negative binomial, ZIP, and ZINB models. The approach outlined in this article provides a flexible tool for analyzing discrete data that avoids making erroneous assumptions and facilitates comparisons between models.

ACKNOWLEDGMENTS

The authors are grateful to Adrian Woolf and David Long for the helpful comments on a previous version of this article.

GRANTS

The Institute of Child Health of University College London receives a portion of funding from the Department of Health’s National Institute of Health Research Biomedical Research Centres funding scheme. The Centre for Paediatric Epidemiology and Biostatistics also benefits from funding support from the Medical Research Council (MRC) in its capacity as the MRC Centre of Epidemiology for Child Health (G0400546). F. McElduff acknowledges the support of an MRC Capacity Building Studentship. S.-K. Chan is funded by Kidney Research UK.

DISCLOSURES

No conflicts of interest, financial or otherwise, are declared by the author(s).

REFERENCES

1. Afifi AA, Kotlerman JB, Ettner SL, Cowan M. Methods for improving regression analysis for skewed continuous or count responses. Annu Rev Public Health 28: 95–111, 2007.
2. Bergmann R, Ludbrook J, Spooren WP. Different outcomes of the Wilcoxon-Mann-Whitney test from different statistical packages. Am Statistician 54: 72–77, 2000.
3. Bland M. An Introduction to Medical Statistics (3rd ed.). New York: Oxford Univ. Press, 2000.
3a. Chan SK, Riley PR, Price KL, McElduff F, Winyard PJ, Welham SJ, Woolf AS, Long DA. Corticosteroid-induced dysmorphogenesis is associated with deregulated expression of known cystogenic molecules, as well as Indian hedgehog. Am J Physiol Renal Physiol 298: 346–356, 2010.
4. Chernoff H, Lehmann EL. The use of maximum likelihood estimates in chi squared tests for goodness-of-fit. Ann Math Stat 25: 579–586, 1954.
5. Fagerland MW, Sandvik L. The Wilcoxon-Mann-Whitney test under scrutiny. Stat Med 28: 1487–1497, 2009.
6. Fay MP, Proschan MA. Wilcoxon-Mann-Whitney or t-test? On assumptions for hypothesis tests and multiple interpretations of decision rules. Stat Surveys 4: 1–39, 2010.
7. Good PI. Permutation, Parametric, and Bootstrap Tests of Hypotheses (3rd ed.). New York: Springer, 2005.
8. Hilbe JM. Negative Binomial Regression. Cambridge: Cambridge Univ. Press, 2007.
9. Hollander M, Wolfe DA. Nonparametric Statistical Methods (2nd ed.). New York: Wiley-Interscience, 1999.
10. Kuha J. AIC and BIC: comparisons of assumptions and performance. Sociol Methods Res 33: 188–228, 2004.
11. R Development Core Team. R: a Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing, 2007.
12. Shapiro SS, Francia RS. An approximate analysis of variance test for normality. J Am Stat Assoc 67: 215–216, 1972.
13. Welch BL. The generalization of “Student’s” problem when several different population variances are involved. Biometrika 34: 28–35, 1947.
14. Whitley E, Ball J. Statistics review 6: nonparametric methods. Crit Care 6: 509–513, 2002.
15. Zeileis A, Kleiber C, Jackman S. Regression models for count data in R. J Stat Software 27: 2008.
