
RESEARCH METHODS AND STATISTICS

Parametric Versus Nonparametric Statistical Tests: The Length of Stay Example
Munirih Qualls, MD, MPH, Daniel J. Pallin, MD, MPH, and Jeremiah D. Schuur, MD, MHS

Abstract
Objectives: This study examined selected effects of the proper use of nonparametric inferential
statistical methods for analysis of nonnormally distributed data, as exemplified by emergency
department length of stay (ED LOS). The hypothesis was that parametric methods have been used
inappropriately for evaluation of ED LOS in most recent studies in leading emergency medicine (EM)
journals. To illustrate why such a methodologic flaw should be avoided, a demonstration, using data
from the National Hospital Ambulatory Medical Care Survey (NHAMCS), is presented. The demonstration
shows how inappropriate analysis of ED LOS increases the probability of type II errors.
Methods: Five major EM journals were reviewed, January 1, 2004, through December 31, 2007, and all
studies with ED LOS as one of the reported outcomes were reviewed. The authors determined whether
ED LOS was analyzed correctly by ascertaining whether nonparametric tests were used when indicated.
An illustrative analysis of ED LOS was constructed using 2006 NHAMCS data, to demonstrate how
inferential testing for statistical significance can deliver differing conclusions, depending on
whether nonparametric methods are used when indicated.
Results: Forty-nine articles were identified that studied ED LOS; 80% did not perform a test of
normality on the ED LOS data. Data were not normally distributed in all 10 of the studies that did
perform such tests. Overall, 43% failed to use appropriate nonparametric methods. Analysis of NHAMCS
data confirmed that failure to use nonparametric bivariate tests results in type II statistical
error and in multivariate models with less explanatory power (a smaller R² value).
Conclusions: ED LOS, a key ED operational metric, is frequently analyzed incorrectly in the EM
literature. Applying parametric statistical tests to such nonnormally distributed data reduces power
and increases the probability of a type II error, which is the failure to find true associations.
Appropriate use of nonparametric statistics should be a core component of statistical literacy
because such use increases the validity of ED research and quality improvement projects.
ACADEMIC EMERGENCY MEDICINE 2010; 17:1113–1121 © 2010 by the Society for Academic Emergency Medicine
Keywords: length of stay, parametric, nonparametric, methodology, statistics, quality

From the Department of Emergency Medicine, Brigham and Women’s Hospital (MQ, DJP, JDS), Boston, MA;
Harvard Medical School (DJP, JDS), Boston, MA; and the Division of Emergency Medicine, Children’s
Hospital Boston (DJP), Boston, MA.
Received December 15, 2009; revisions received March 21 and April 7, 2010; accepted April 11, 2010.
Presented at the Society for Academic Emergency Medicine annual meeting, New Orleans, LA, May 14–17,
2009.
Disclosures: Part of Dr. Schuur’s time is supported by a Jahnigen Career Development Award, funded
by the Atlantic Philanthropies and the Hartford Foundation.
Supervising Editor: Gary Gaddis, MD, PhD.
Address for correspondence: Munirih Qualls, MD, MPH; e-mail: mqualls@partners.org. Reprints will not
be available.

Statistical literacy is a critical skill for users of the medical literature, for clinicians and
researchers alike. A key premise of evidence-based medicine (EBM) is that clinicians should depend
on primary medical literature to inform patient care decisions.1 To practice EBM well requires the
ability to understand and recognize sources of bias in the medical literature. Biased studies are
more likely to derive incorrect conclusions that can mislead practitioners of EBM. Bias can occur
due to faulty study design and/or methodology, as well as inappropriate choice of inferential
statistical tests. Consequently, biostatistical training has become more emphasized in medical
schools2 and emergency medicine (EM) residency programs.3
Despite this recent increased emphasis on statistical literacy, many physicians cannot demonstrate
competence with basic biostatistical concepts.1,4,5 This leads to mistakes in statistical analysis6
and reluctance to conduct research.7,8 This limitation is particularly germane to EM, which has been
highly self-critical regarding research methodology.9,10 Although the rigor of EM research has
increased with the maturation of the

© 2010 by the Society for Academic Emergency Medicine. ISSN 1069-6563; PII ISSN 1069-6563583
doi: 10.1111/j.1553-2712.2010.00874.x
1114 Qualls et al. • PARAMETRIC VS. NONPARAMETRIC STATISTICAL TESTS

specialty, and the creation of dedicated research training activities such as research
fellowships,11 there is still room for improvement. Appropriate use of nonparametric statistical
analyses for nonnormally distributed data represents just such an opportunity for improvement.6,12
EM researchers must understand how to choose appropriate statistical methods, and clinicians and
peer reviewers who read their studies should be able to recognize basic errors.
Emergency department length of stay (ED LOS) is a continuously distributed interval variable with a
nonnormal distribution due to its frequent high degree of skew, as illustrated in Figure 1. ED LOS
is an important, frequently studied outcome variable in EM operations research because it is a key
indicator of operational efficiency. Parametric statistical tests are usually inappropriate13–17
for analysis of ED LOS. Since the Institute of Medicine identified ED crowding as an obstacle to
high-quality emergency care, and a series of studies have linked crowding to adverse safety and
quality outcomes, ED LOS has become a proxy for quality-of-care processes.18–23 The National Quality
Forum (NQF) has endorsed median ED LOS as an indicator of safety and efficiency,24 and the Joint
Commission announced that it will include the NQF’s ED LOS measures in its 2010 hospital
specifications manuals.25 These are the first steps toward inclusion of ED LOS in mandatory hospital
quality measures, making it likely that within a few years every hospital’s ED LOS will be available
for the public to view on the Medicare website, Hospital Compare.26
Our study has two goals. The first goal is to test the hypothesis that inferential statistical
analysis of ED LOS in published studies is usually performed inappropriately, because of a suspicion
that ED LOS is usually analyzed using parametric methods. The second goal is to demonstrate, by
example, that inappropriate use of parametric methods increases the probability of committing a
type II error. We present several analyses of ED LOS, using data from the National Hospital
Ambulatory Medical Care Survey (NHAMCS). We test the theory that type II errors, and not type I
errors, predominate when inappropriate parametric tests are used.17

METHODS

Study Design
We hand-reviewed all original research articles published between January 1, 2004, and December 31,
2007, in Academic Emergency Medicine, the American Journal of Emergency Medicine, Annals of
Emergency Medicine, the Canadian Journal of Emergency Medicine, and Emergency Medicine Journal. The
first three journals are the three highest ranked U.S. EM journals by impact factor.27 The others
are the two leading English language non-U.S. EM journals. Using the 2006 NHAMCS database, we
conducted illustrative analyses of ED LOS using both parametric and nonparametric methods.

Data Sources
Our literature review included any original research article using ED LOS as a primary or secondary
outcome. We used the ED Benchmarking Alliance consensus definition of ED LOS: time from ED arrival
to discharge, admission, or death in the ED.28 For our illustrative analyses, we used the ED
component of the 2006 NHAMCS. This is a multistage probability survey of U.S. ED visits in
institutional, general, and short-stay hospitals, whose methods have been detailed previously.29
NHAMCS defines ED LOS as “length of visit, calculated from arrival time to time of discharge from
the ED.”29

Study Protocol
We determined standards used to describe or test nonnormally distributed data and then compared the

Figure 1. Distribution of ED LOS in the NHAMCS (n = 26,618). The graph is truncated at ED LOS ≤ 1,800 minutes. ED LOS was
greater than 1,800 minutes at 123 visits. LOS = length of stay; NHAMCS = National Hospital Ambulatory Medical Care Survey.
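The skew shown in Figure 1 can be quantified with the two summary heuristics named in the Data Analysis section of this article, the mean-median difference and the SD-to-mean ratio. The Python sketch below is illustrative only and is not the authors' SAS analysis: the function names, the way the two rules of thumb are combined, and the simulated log-normal sample are assumptions made for demonstration. The values 229, 257, and 164 minutes are the mean, SD, and median reported in the Results.

```python
import math
import random
import statistics


def mean_median_difference_pct(mean, median):
    """Mean-median difference expressed as a percentage of the mean."""
    return abs(mean - median) / mean * 100


def sd_to_mean_ratio(sd, mean):
    """Ratio of the standard deviation to the mean."""
    return sd / mean


def looks_nonnormal(mean, median, sd, pct_cutoff=5.0, ratio_cutoff=0.5):
    """Flag a distribution as likely nonnormal if either heuristic trips.

    The cutoffs follow the rules of thumb quoted in the text (a difference
    above roughly 5% of the mean, or an SD above half the mean). Combining
    them with "or" is an assumption of this sketch.
    """
    return (mean_median_difference_pct(mean, median) > pct_cutoff
            or sd_to_mean_ratio(sd, mean) > ratio_cutoff)


# Summary values reported for adult ED LOS in the 2006 NHAMCS (minutes).
reported_pct = mean_median_difference_pct(229, 164)
reported_ratio = sd_to_mean_ratio(257, 229)

# Hypothetical right-skewed sample: log-normal with a median near 164 minutes.
rng = random.Random(0)
sample = [rng.lognormvariate(math.log(164), 0.8) for _ in range(5000)]
sample_mean = statistics.mean(sample)
sample_median = statistics.median(sample)
sample_sd = statistics.stdev(sample)

print(f"reported: difference = {reported_pct:.1f}% of mean, SD/mean = {reported_ratio:.2f}")
print(f"simulated: mean = {sample_mean:.0f}, median = {sample_median:.0f}, "
      f"flagged = {looks_nonnormal(sample_mean, sample_median, sample_sd)}")
```

Both heuristics need only published summary statistics, which is why they can be applied to articles that did not report a formal test of normality.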
ACAD EMERG MED • October 2010, Vol. 17, No. 10 • www.aemj.org 1115

selected articles against these standards (Table 1). Descriptor of central tendency was categorized
as mean, median, or both. For all other criteria, we determined a rating of “Yes” or “No” based on
the statistical techniques used. To assess whether a test of normality was performed, we recorded
whether the authors stated that they performed a test of normality on the ED LOS data or if they
explicitly mentioned the distribution of ED LOS. For articles that did not describe the
distribution of ED LOS data, we considered ED LOS to have a nonnormal distribution, unless a normal
distribution was reported or unless sufficient data were provided for us to determine that ED LOS
was normally distributed. When ED LOS was not normal, we deemed the study to have conducted
appropriate nonparametric analysis if it met three criteria: 1) reported a median as the
description of central tendency, 2) used appropriate nonparametric bivariate tests of significance
(e.g., Mann-Whitney, Wilcoxon, or Kruskal-Wallis),30 and 3) conducted appropriate
log-transformations of ED LOS data prior to regression analyses.31

Table 1
Standards Used to Describe or Test Nonnormally Distributed Data

Criterion                         Appropriate Standard
Test of normality                 LOS data must be evaluated for nonnormal distribution.
Descriptor of central tendency    LOS data must be described using median, or median and mean.
Aggregation of data               Aggregate LOS data, such as weekly mean ED LOS, should not be
                                  used in statistical analyses.
Bivariate statistical tests       Nonparametric tests of significance (Mann-Whitney, Wilcoxon, or
                                  Kruskal-Wallis) should be used to analyze data with a nonnormal
                                  distribution, rather than parametric tests such as Student’s
                                  t-test.
Transformation of LOS in          Nonnormal LOS data must be appropriately transformed prior to
regression analysis               regression analysis (e.g., log-transformation).

LOS = length of stay.

In contrast, we deemed a study with nonnormal ED LOS data not to have used appropriate methods if
it: 1) reported only the mean as a measure of central tendency for nonnormally distributed ED LOS
data, 2) used inappropriate bivariate tests of significance (such as Student’s t-test), and
3) failed to perform a log-transformation prior to linear regression analysis.
In our NHAMCS analyses, we included all adult (age ≥ 18 years) ED visits in the NHAMCS 2006 data
set and evaluated associations between ED LOS and three categories of variables, which were
hospital type, visit events, and patient demographic characteristics. To evaluate bivariate
predictors of ED LOS, we selected 10 dichotomous variables for which there existed a clinical or
administrative rationale to suggest an association with ED LOS. The hospital type variable was
urban hospital defined by location in a metropolitan statistical area (Yes/No). The visit event
variables were diagnostic tests performed (Yes/No), was imaging performed (Yes/No), was care by
midlevel provider (nurse practitioner or a physician assistant; Yes/No), and arrival by ambulance
(Yes/No). The patient demographic characteristics were Hispanic ethnicity (Yes/No), race (white vs.
nonwhite), sex (male/female), age (18–65 years, > 65 years), and presenting level of pain
(dichotomized to severe/not severe [none/mild/moderate]).
For the multivariate analysis, we included the 10 variables listed above, plus five additional
categorical variables that were not eligible for bivariate analysis because they could not easily
be dichotomized. The additional variables were as follows:

1. Hospital ownership (proprietary; voluntary nonprofit; government, nonfederal).
2. Hospital region (Northeast, Midwest, South, West).
3. Number of medications given in the ED.
4. Day of the week.
5. Number of procedures performed (e.g., wound care or IV fluids) during the visit.

Outcome Measures
The primary outcome measure in the literature review portion of this manuscript was the proportion
of published articles that inappropriately employed parametric methods for description or analysis
of ED LOS data.

Data Analysis
We analyzed the results of the literature review using descriptive statistics. We tested the NHAMCS
data for normality using three methods. The Anderson-Darling test is a formal test of normality; a
p-value less than 0.01 signifies a nonnormal distribution.32 The second method is “mean-median
difference,” which represents the degree of skewness by calculating the difference between the
median and the mean as a percentage of the mean. A small percentage difference (i.e., 1% to 5%)
suggests that the mean and the median are close to each other, and the data are likely to be
normally distributed. A larger difference suggests that the mean and the median are far from each
other and the data are not normally distributed.33 The third method is the “standard deviation to
mean ratio.” If the standard deviation (SD) is more than half of the mean, the distribution is
likely to be nonnormal.33
We examined the bivariate relationship between NHAMCS ED LOS and the 10 dichotomizable covariates
with parametric (t-test) and nonparametric (Wilcoxon rank sum test) bivariate tests.33
We created a multivariate regression model from the 15 independent variables (10 dichotomous and
five nondichotomous, as described above) using NHAMCS data. We constructed one model with raw ED
LOS as the dependent variable and another with log-transformed ED LOS. The R² values, the F-values,
and the p-values of the two models were compared. The meaning of these parameters is defined
elsewhere.33 In brief, the R² is a measure of the total amount of

variability described by the equation constituting a multivariate linear regression. The F-value
represents the proportion of the variance explained by each predictor variable. The p-value is a
means of interpreting the F-value in terms of statistical significance. For purposes of the
modeling exercise, we have focused on the change in R² and F-values that occurs when one
log-transforms existing explanatory variables to better normalize their distributions, without
addressing issues of collinearity, residual analysis, and outliers.
We considered a two-sided p < 0.01 to be significant, as recommended by the National Center for
Health Statistics for analysis of NHAMCS data, due to the large size of the data set and the
frequency with which it is queried.34 We performed bivariate parametric tests and multivariate
linear regression models using both standard and weighted survey techniques that account for the
design characteristics of NHAMCS. We performed nonparametric tests using standard techniques, as
survey weighting cannot be accounted for with nonparametric techniques. All statistical analyses
were performed with SAS 9.1 software (SAS Institute, Cary, NC).

RESULTS

Literature Review
We identified 49 articles with ED LOS as a primary or secondary outcome. Ten of the 49 (20%)
articles included a test of normality on the ED LOS data; all 10 of these articles reported that
the data were not normally distributed. Of the 39 articles that did not perform a test of
normality, 17 reported sufficient data to allow calculation of the distribution of the ED LOS data
set using two of the summary methods described above. Ten of these 17 articles (59%) had a
nonnormal distribution of ED LOS. The ED LOS of the remaining 22 articles was assumed to be
nonnormally distributed, consistent with the methodology described above.
Sixteen of 49 articles (33%) appropriately accounted for the nonparametric distribution of ED LOS
by reporting median ED LOS and analyzing ED LOS using a nonparametric bivariate test or
transforming ED LOS prior to multivariate regression. Twenty-one of the 49 articles (43%) failed to
account for the nonnormal distribution of ED LOS and used only parametric methods of analysis. The
remaining 12 articles used a combination of parametric and nonparametric methods and descriptive
statistics.
Among all 49 articles with ED LOS as an outcome, 47% exclusively reported mean LOS as the
description of central tendency. Of the 32 articles that performed bivariate tests of significance,
14 (44%) used only the parametric Student’s t-test. Of the 14 articles that created multivariate
regression models with ED LOS as the dependent variable, 10 (71%) used only raw ED LOS data.
Two articles (4%) used other appropriate techniques to account for skewed data: one included a
histogram, a graphic representation of normality, and another reported trimming the data of
outliers that contributed to a rightward skew of the data. Five articles (10%) performed
statistical analyses on aggregate mean LOS data. Three used daily mean, one used weekly mean, and
one used monthly mean. This is an inappropriate approach, as it does not change the underlying
nonparametric distribution of ED LOS data and does not have the power advantages of nonparametric
statistical techniques.

NHAMCS Analysis
Mean ± SD adult ED LOS for U.S. EDs was 229 ± 257 minutes, and median ED LOS was 164 minutes
(interquartile range [IQR] = 93–272 minutes). Adult ED LOS was nonnormally distributed in the 2006
NHAMCS data set by all three tests of normality. The Anderson-Darling test revealed nonnormality,
with p < 0.005. Mean-median difference was 30% of the value of the mean. The mean to SD ratio was
0.9. A histogram graphically illustrates the rightward skew of adult ED LOS compared to the normal
distribution (Figure 1). The distribution of ED LOS closely fits a log-normal distribution.
In our bivariate analysis, we found that three of the 10 variables evaluated as predictors of ED
LOS did not meet our a priori threshold for statistical significance using the parametric Student’s
t-test (Table 2). In contrast, all 10 variables were significant (p < 0.01) using the nonparametric
Wilcoxon rank sum test. The three variables that were significant in nonparametric analysis and not
significant in parametric analysis were: sex (p = 0.806), care by a midlevel provider (p = 0.021),
and pain (p = 0.091).
In our multivariate linear regression analysis, we found that log-transforming ED LOS resulted in a
better fitting model than raw ED LOS, as determined by larger F-value (with the exception of
“Hospital Ownership”) for the predictor variables, and a larger R² value for the regression model
(Table 3). Only one variable was significant in the transformed ED LOS model and not significant in
the raw ED LOS model (sex, p = 0.049). In both our bivariate and our multivariate analyses, we
found no type I errors resulting from inappropriate use of parametric tests.

Sensitivity Analysis: Weighted Survey Techniques
As the NHAMCS is a multistage probability sample, it is recommended that users account for the
visit weights when drawing inferences from the data. Although we used the data as an example of a
nonparametric data set and are not proposing conclusions based on relationships between variables,
we performed bivariate tests and multivariate linear regression using both survey and unweighted
techniques, as readers familiar with NHAMCS will expect. When the weighted techniques were used for
the bivariate tests (t-tests), one more variable was nonsignificant with the t-test (Hispanic
ethnicity, p = 0.02), but all other directions of effect and conclusions were unchanged from the
results reported above. When the multivariate models were run with survey weights applied, two more
variables were nonsignificant in the raw ED LOS model (hospital region, p = 0.02; and ownership,
p = 0.08) and one in the log-transformed ED LOS model (hospital ownership, p = 0.12). Most
importantly, no examples of type I

Table 2
Results of Bivariate Parametric and Nonparametric Analysis of ED LOS From the NHAMCS*

                                                                               Wilcoxon Rank
Variable                         Mean (95% CI)    t-test    Median (IQR)       Sum Test
Urban hospital                   239 (236–243)    <0.001    174 (100–286)      <0.001
Nonurban hospital                157 (149–166)              109 (62–173)
No diagnostic tests performed    133 (128–138)    <0.001    88 (51–154)        <0.001
Diagnostic tests performed       251 (247–255)              183 (111–294)
No imaging performed             197 (193–202)    <0.001    127 (70–222)       <0.001
Imaging performed                264 (259–268)              205 (129–315)
No care by midlevel provider     230 (227–233)    0.021†    165 (94–273)       0.006
Care by midlevel provider        216 (205–228)              154 (89–259)
No arrival in ambulance          208 (207–213)    <0.001    150 (105–355)      <0.001
Arrival in ambulance             291 (283–300)              214 (137–334)
Hispanic ethnicity               257 (248–267)    <0.001    184 (102–307)      <0.001
Non-Hispanic ethnicity           225 (222–228)              161 (91–268)
Nonwhite race                    248 (242–254)    <0.001    180 (104–304)      <0.001
White race                       222 (218–226)              158 (90–261)
Female                           229 (225–234)    0.806†    167 (95–276)       0.002
Male                             228 (224–234)              159 (90–266)
Age > 65 yr                      256 (249–263)    <0.001    200 (125–306)      <0.001
Age < 65 yr                      223 (219–226)              155 (88–262)
High pain                        225 (220–231)    0.091†    170 (100–280)      <0.001
Low pain                         220 (216–224)              159 (91–260)

IQR = interquartile range; LOS = length of stay; NHAMCS = National Hospital Ambulatory Medical Care Survey.
* Based on 26,618 adult ED visits in the 2006 NHAMCS. Some variables were calculated based on smaller sample sizes,
as missing values were excluded. Statistical calculations are based on unweighted techniques.
† Variables that differed significantly between parametric and nonparametric test.
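The pattern in Table 2, where a comparison of means misses differences that a rank-based test detects, can be reproduced in miniature by simulation. The sketch below is a hedged illustration and not the authors' analysis: it draws hypothetical log-normal "length of stay" samples in which one group truly has values 30% larger, then counts how often each test reaches p < 0.01. The function names, the simulation parameters, the large-sample normal approximations, and the use of an unequal-variance statistic in place of Student's t-test are all assumptions of this example.

```python
import math
import random
from statistics import NormalDist


def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    m = sum(values) / n
    return m, sum((v - m) ** 2 for v in values) / (n - 1)


def mean_test_pvalue(a, b):
    """Two-sided p-value for a difference in means.

    Unequal-variance statistic with a normal approximation (large samples),
    standing in for the t-test.
    """
    mean_a, var_a = mean_and_variance(a)
    mean_b, var_b = mean_and_variance(b)
    z = (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rank_sum_pvalue(a, b):
    """Two-sided Wilcoxon rank sum p-value (normal approximation, no tie correction)."""
    n1, n2 = len(a), len(b)
    pooled = sorted([(value, 0) for value in a] + [(value, 1) for value in b])
    # Sum of the 1-based ranks held by group a in the pooled ordering.
    w = sum(rank for rank, (_, group) in enumerate(pooled, start=1) if group == 0)
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return 2 * (1 - NormalDist().cdf(abs((w - expected) / sd)))


def estimate_power(n=200, ratio=1.3, sigma=1.0, alpha=0.01, trials=200, seed=0):
    """Share of simulated studies in which each test detects the true difference."""
    rng = random.Random(seed)
    mean_hits = rank_hits = 0
    for _ in range(trials):
        a = [rng.lognormvariate(5.0, sigma) for _ in range(n)]
        b = [ratio * rng.lognormvariate(5.0, sigma) for _ in range(n)]
        mean_hits += mean_test_pvalue(a, b) < alpha
        rank_hits += rank_sum_pvalue(a, b) < alpha
    return mean_hits / trials, rank_hits / trials


mean_test_power, rank_test_power = estimate_power()
print(f"power of mean-based test: {mean_test_power:.2f}")
print(f"power of rank sum test:   {rank_test_power:.2f}")
```

Each simulated study that fails to reach significance is a type II error, because the difference between groups is real by construction. The share of such failures is one minus the printed power.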

Table 3
ED LOS Regression Models Using Raw and Transformed ED LOS Data From the 2006 NHAMCS*

                                          Raw ED LOS              Log-transformed ED LOS
Predictor Variable                        F-value    p-value      F-value    p-value
Urban hospital                            132        <0.001       543        <0.001
Hospital ownership                        72.6       <0.001       44.4       <0.001
Hospital region                           28.1       <0.001       39.6       <0.001
Imaging performed                         87.9       <0.001       407        <0.001
Diagnostic tests performed                148        <0.001       817        <0.001
Total number of procedures                18.7       <0.001       61.2       <0.001
Day of week (weekday/weekend)             10.3       <0.001       37.8       <0.001
Care by midlevel provider (Yes/No)        0.390      0.531        2.40       0.121
Number of medications administered        712        <0.001       1163       <0.001
Arrival by ambulance (Yes/No)             126        <0.001       127        <0.001
Race (white/nonwhite)                     89.2       <0.001       106        <0.001
Ethnicity (Hispanic/non-Hispanic)         31.9       <0.001       63.4       <0.001
Sex (male/female)                         3.85       0.049†       8.66       0.003
Age, yr                                   69.8       <0.001       187        <0.001
Pain (high/low)                           0.81       0.369        3.27       0.071
R² for regression model                   0.12                    0.23

LOS = length of stay; NHAMCS = National Hospital Ambulatory Medical Care Survey.
*Models based on 23,842 of 26,618 adult ED visits in the 2006 NHAMCS with complete data for all predictor variables.
Statistical calculations are based on unweighted techniques.
† Variables which differed in significance between parametric and nonparametric test.
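The gain in R² from log-transforming the dependent variable, as reported in Table 3, can be illustrated with a one-predictor toy model. This is a minimal sketch under stated assumptions, not a reconstruction of the authors' 15-variable SAS models: the data are simulated so that LOS depends multiplicatively on a hypothetical "number of medications" predictor with log-normal noise, and R² is computed as the squared Pearson correlation, which equals the R² of a simple least-squares line.

```python
import math
import random


def r_squared(x, y):
    """R-squared of a one-predictor least-squares line (squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical data: LOS grows multiplicatively with the number of medications,
# with log-normal noise, so log(LOS) is linear in the predictor.
rng = random.Random(1)
medications = [rng.randint(0, 5) for _ in range(5000)]
los_minutes = [math.exp(4.5 + 0.3 * m + rng.gauss(0.0, 0.8)) for m in medications]

r2_raw = r_squared(medications, los_minutes)
r2_log = r_squared(medications, [math.log(v) for v in los_minutes])
print(f"R-squared, raw LOS: {r2_raw:.2f}")
print(f"R-squared, log LOS: {r2_log:.2f}")
```

The underlying relationship is identical in both fits. Only the scale of the dependent variable changes, which is why the raw-scale model understates the strength of the association.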

error were identified when data were analyzed with weighted survey techniques.

DISCUSSION

Statistical literacy is important for emergency physicians engaged in research or quality
improvement or interested in practicing EBM. To recognize sources of bias in research, readers
should be able to identify biases associated with inappropriate experimental design or
inappropriate statistical methods. “Parameter” and “parametric” are among the most frequently
misunderstood words in clinical research. A parameter is defined as the unknown value of a variable
in an entire population, which is derived by estimating the value of that variable from a random
sample derived

from that population. The value of the variable in the sample is called the “sample value” or the
“point estimate.”
The accuracy and validity of a “nonparametric” inferential statistical test result do not rely on
an assumption that the outcome and predictor variables are distributed normally in the source
population. Nonparametric tests, unlike parametric tests, do not have to be “robust” against
violations of the inherent mathematical assumptions of the test. An assumption of parametric tests
is normality of the distribution of the data analyzed by the inferential test. Nonparametric
statistical tests do not assume that the parameter is distributed normally, and thus they are
appropriate and accurate (robust) even when the sample value is not distributed normally.
The easiest example to illustrate the difference between parametric and nonparametric is a
comparison of mean versus median. The mean is inferior to the median as a summary of the central
tendency of the data because the mean is a misleading indicator of central tendency when the data
are skewed. In contrast, the median is said to be more robust, because it remains an accurate
description of the central tendency (50th percentile) of the sample, even when the values of the
individual data are skewed. In the case of income, for example, there is rightward skew, meaning
that there are a few values that are very high. The presence of one billionaire in a less-developed
country will change mean income significantly, without affecting median income. The mean is
accurate as a summary statistic of central tendency only when the distribution is normal, or
bell-shaped. On the other hand, the median is determined by identifying the middle value in the
list of incomes and is an accurate description of the central tendency regardless of whether the
data are skewed. In our NHAMCS example, sex provides a clear illustration of this principle. The
mean LOS is not statistically different between men and women (228.6 vs. 229.5), while the median
LOS differs significantly (159 vs. 167). This becomes apparent when examining the distribution of
ED LOS by percentile between men and women. Men had shorter LOS until approximately the 85th
percentile, above which their ED LOS exceeded that of women. Thus the rightward skew of these
outliers influences the mean enough to cause the populations to acquire a similar measure of
central tendency when comparing means. However, by comparing ranks, it becomes clear that for the
majority of the population (approximately 85% of patients), men have a shorter LOS than women.
Length of stay in health care settings tends to have a rightward skew due to the minority of
patients who have disproportionately long stays (Figure 1).13,14,35 Health care expenditures,
utilization of health services, and consumption of unhealthy commodities are other common skewed
variables.15 Distributions of data with heavy tails, extreme skews, or unknown population
characteristics should be analyzed with nonparametric tests.17 Table 4 displays common statistical
tests and explains why each is parametric or nonparametric.
When authors and readers of the medical literature express concern about inappropriate use of
parametric statistical tests, they often assume that inappropriate use of an inferential
statistical test can result in an erroneously low p-value or an erroneously narrow 95% confidence
interval (CI). This can potentiate reaching a conclusion that a difference between groups exists in
the source population, when in reality no difference exists. In other words, it is widely and
incorrectly assumed that inappropriate use of inferential tests potentiates making a type I error.
In fact, the true problem with parametric tests is that they lack power when their assumptions are
violated, as occurs when they are applied to analyze nonparametric data.17 When analyzing
nonnormally distributed data with parametric tests, the analyst is more likely to fail to detect a
difference in the source population when one truly exists. This is the definition of a type II
error.17 Choice of the wrong statistical test is unlikely to result in type I error.
Our demonstration elucidates this principle. The 2006 NHAMCS ED LOS data had a strong rightward
skew. If we had used the Student’s t-test to analyze the covariates, we would have concluded that
care by a midlevel provider, and patient sex, were not associated with ED LOS. However, using the
Wilcoxon rank sum test, we found that these variables were significantly associated with ED LOS. If
applied to actual ED LOS

Table 4
Commonly Used Parametric and Nonparametric Statistical Tests

Statistical Test                       What Is Being Compared                Why Parametric or Nonparametric

Parametric statistical tests
Student’s t-test                       Whether the mean is larger in one     Only accurate if the values are
                                       group vs. the other.                  distributed normally.
Linear regression                      Slope, which is a mean                Only accurate if four assumptions are
                                                                             met: normal distribution of dependent
                                                                             variable, homoscedasticity, lack of
                                                                             autocorrelation, and linearity.

Nonparametric statistical tests
Wilcoxon rank sum, Mann-Whitney        Whether values tend to be larger in   Result is accurate whether data are
U, and Kruskal-Wallis tests            one group vs. the other.              distributed normally or not.
(inferential tests for ordinal data)
Chi-square test (inferential test      Chance of falling into one group or   Result is accurate whether data are
for nominal data)                      another.                              distributed normally or not.
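The income example used in the Discussion to contrast the mean with the median can be worked through numerically. The figures below are hypothetical and chosen only to make the arithmetic easy to follow: 999 residents with evenly spaced incomes, to which a single billionaire is added.

```python
import statistics

# Hypothetical incomes for 999 residents, evenly spaced from 2,000 to 11,980.
incomes = [2000 + 10 * i for i in range(999)]
mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)

# Add a single billionaire to the population.
incomes_with_billionaire = incomes + [1_000_000_000]
mean_after = statistics.mean(incomes_with_billionaire)
median_after = statistics.median(incomes_with_billionaire)

print(f"mean:   {mean_before:,.0f} -> {mean_after:,.0f}")
print(f"median: {median_before:,.0f} -> {median_after:,.0f}")
```

One extreme value moves the mean from 6,990 to just over one million, while the median shifts only from 6,990 to 6,995. The same mechanism lets a small number of very long ED stays dominate a comparison of means.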

analyses, this error could affect quality improvement projects or interhospital comparisons.
Failure to transform a skewed dependent variable during linear regression is similar to using
parametric tests on a nonnormally distributed variable. The statistical methodology underlying
linear regression depends on four assumptions, one of which is a normal distribution of the
dependent variable. The violation of this assumption jeopardizes the fit of the model,36 resulting
in a decreased R² value. This can affect the result of a study because an ill-fitting model
underestimates the relationship between the dependent variable and the independent variables. It
may be that the dependent variable is very strongly correlated with the set of independent
variables, but if the dependent variable is highly skewed, the linear regression will not
completely capture the magnitude of the relationship. The solution is to transform the nonnormal
variable so that it approximates the normal distribution or use a nonlinear model equation that
better fits the data. Figure 1 visually illustrates how a log-normal distribution closely fits the
NHAMCS data, while a normal distribution does not. For highly skewed data, one can log-transform
the dependent variable or use variations of multiple regression that better fit skewed data, such
as Poisson regression and negative binomial regression.31
Statistical techniques that maximize power, therefore minimizing type II error, allow researchers
to find significant differences between groups with smaller sample sizes.15,37 This is particularly
salient for ED LOS, which is often analyzed in small single-site quality improvement projects. In
these small-scale projects, the inherently lower power of any selected statistical test, due to the
predictable effect of the likely small sample size, is likely to affect the results and
conclusions. If a study examines an operational intervention aimed to reduce median ED LOS,
comparison of means can cause the investigator to miss a true reduction in ED LOS. For example, a
reduction in lab test turnaround time that accounts for 5 minutes per patient may not appear
significant if a small number of patients are boarding in the ED with ED LOS of greater than

answer our study question, whether applying a parametric test to a nonparametric data set leads to
divergent results, we believe that the unweighted statistics provide a sufficiently accurate and
informative answer. First, we used NHAMCS as an illustrative data set to show the relationship
between nonparametric and parametric tests when applied to nonparametric data. We are not proposing
conclusions about the relationships between the actual variables (for example, patient race and ED
LOS). Alternatively, we could have generated a simulated data set for the same analysis,17 but as
ED LOS in NHAMCS is nonnormally distributed, it is a reasonable data set in which to test this
hypothesis. Second, the unweighted results are more easily comparable between techniques, as there
do not exist any straightforward nonparametric tests that account for survey weights. By design,
nonparametric tests assign value to each observation’s rank, not its value, so are not directly
applicable to weighted observations. Statistical models have been developed to apply nonparametric
statistics to weighted samples, but they are not in common use.38–40 Focusing on the use of survey
weights risks “losing the forest for the trees,” as we found no material difference in results
between standard techniques and complex survey techniques. The major finding of our analysis is
that parametric and nonparametric statistical tests will produce material differences in
statistical results if applied to nonparametric data.

LIMITATIONS

In analyzing the results of our literature review, we assumed that ED LOS data are nonnormally
distributed and that the use of nonparametric methods might have changed the result of the studies
in some cases. We cannot definitely prove this assumption without access to the data sets of all
articles. However, LOS in health care settings is known to be skewed,13,14 and our NHAMCS
simulation demonstrates that the use of nonparametric methods can quantitatively affect results at
alpha levels commonly reported in medical
24 hours, whereas it may actually affect the median ED journals. Additionally, all articles that performed tests
LOS and represent improved ED operations and care. of normality on their data elected to use nonpara-
Our NHAMCS demonstration had an extremely large metric tests, which implies a widespread belief by
sample (n = 26,618) and was therefore highly powered. these authors that including formal analysis of normal-
Nonetheless, we were able to demonstrate type II error ity before deciding which statistical test to use does
in both bivariate and multivariate tests conducted with affect the results of a study. Our review is also likely
inappropriate parametric methods. This effect would be affected by publication bias, since studies with a
amplified in a smaller study, which is inherently negative parametric result may not have reached
lower powered, due to the smaller sample size, and publication. Our study investigates the effect of data
thus likely to yield p-values closer to the traditional distributions on hypothesis testing as opposed to
‘‘cutoff’’ of 0.05. parameter estimates. Hypothesis testing is frequently
In our NHAMCS demonstration, we analyzed ED used in EM literature and quality improvement pro-
LOS in the 2006 NHAMCS data set, a chart review sur- jects, and an understanding of the strengths and limi-
vey designed to allow national estimates of ED visits by tations inherent in different statistical methods, and
multistage probability sampling. When analyzing NHA- their influence on type I and type II error rates, is
MCS to determine accurate estimates of national visits, important. An alternative strategy that is sometimes
it is important to use weighted survey techniques in sta- useful is graphical illustration of data (e.g., histograms)
tistical software. We performed analyses with both that will show distribution differences.41 However, this
weighted survey statistics and unweighted statistics. approach does not provide a standard way for readers
We did not find any material differences between these to draw inferences from the data displayed and is still
techniques, so we report the unweighted results. To relatively uncommon.
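The argument that comparing means on skewed ED LOS data costs power in small studies can be illustrated with a short simulation. The following is a minimal Python sketch, not the NHAMCS analysis reported in this article. Everything in it is an assumption made for illustration: LOS is drawn from a hypothetical log-normal distribution (median about 148 minutes), a hypothetical intervention multiplies every LOS by 0.6, each arm has 40 patients, and both tests use normal approximations (which, if anything, slightly favors the comparison of means).

```python
import math
import random
from statistics import NormalDist, mean, variance

STD_NORMAL = NormalDist()


def two_sided_p(z):
    """Two-sided p-value for a z statistic under the standard normal."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def welch_p(a, b):
    """Unequal-variance comparison of means (normal approximation to Welch's t)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return two_sided_p((mean(a) - mean(b)) / se)


def rank_sum_p(a, b):
    """Wilcoxon rank-sum test, normal approximation, no tie correction."""
    n1, n2 = len(a), len(b)
    pooled = sorted([(x, 0) for x in a] + [(x, 1) for x in b])
    rank_sum_a = sum(r for r, (_, g) in enumerate(pooled, start=1) if g == 0)
    u = rank_sum_a - n1 * (n1 + 1) / 2.0
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return two_sided_p((u - mu) / sigma)


def estimate_power(n_per_group=40, ratio=0.6, n_sims=500, alpha=0.05, seed=2010):
    """Share of simulated studies in which each test rejects at level alpha."""
    rng = random.Random(seed)
    hits = {"means": 0, "ranks": 0}
    for _ in range(n_sims):
        # Hypothetical LOS in minutes: log-normal, median exp(5) in the control arm.
        control = [rng.lognormvariate(5.0, 1.0) for _ in range(n_per_group)]
        treated = [ratio * rng.lognormvariate(5.0, 1.0) for _ in range(n_per_group)]
        hits["means"] += welch_p(control, treated) < alpha
        hits["ranks"] += rank_sum_p(control, treated) < alpha
    return {name: count / n_sims for name, count in hits.items()}


if __name__ == "__main__":
    print(estimate_power())
```

Because a true difference exists in every simulated study, each failure to reject is a type II error. Under these assumptions the rank-based test would be expected to reject more often than the comparison of means, although the size of the gap depends on the chosen skew, effect, and sample size.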

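The Discussion's point that regressing an untransformed, skewed dependent variable lowers R² can also be sketched on simulated data. The example below is an illustration under stated assumptions rather than a reproduction of the multivariate NHAMCS models: the single predictor (a hypothetical score from 0 to 4), the coefficients, and the noise level are all invented, and the outcome is constructed so that log(LOS) is linear in the predictor.

```python
import math
import random
from statistics import mean


def r_squared(x, y):
    """R^2 of the least-squares line y ~ x (squared Pearson correlation)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy * sxy / (sxx * syy)


def compare_fits(n=2000, seed=17):
    """Fit the same predictor to raw and log-transformed simulated LOS."""
    rng = random.Random(seed)
    # Hypothetical predictor and a log-normal outcome:
    # log(LOS) = 4.0 + 0.5 * x + normal noise with SD 0.8 (assumed values).
    x = [rng.uniform(0.0, 4.0) for _ in range(n)]
    los = [math.exp(4.0 + 0.5 * xi + rng.gauss(0.0, 0.8)) for xi in x]
    return {
        "raw": r_squared(x, los),
        "log": r_squared(x, [math.log(v) for v in los]),
    }


if __name__ == "__main__":
    print(compare_fits())
```

In this constructed setting the same predictor should explain a larger share of variance after the log transformation, which mirrors the direction of the effect described for the NHAMCS models. It does not show that a log transformation is the right choice for every data set.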
CONCLUSIONS

Understanding the descriptive and analytic statistics that underlie EM research is an essential part of the practice of EM.1,4,5 Emergency department length of stay has been identified as a key process metric of ED function, and numerous studies have shown an association between ED length of stay and patient satisfaction,21,22 quality of care,19,20 and departmental revenue.23 With the recent National Quality Forum approval and Joint Commission implementation of ED length of stay as a national metric of quality care, there will be pressure to reduce ED LOS, as there has previously been upon door-to-balloon time in ST-elevation myocardial infarction.25 However, in the current EM literature, most studies analyzing ED length of stay use inappropriate statistical methods, designed for parametric data. This can bias the conclusions regarding quality and efficiency and may affect reimbursement in the future. We cannot afford to miscalculate our own outcomes.

Contrary to popular belief, inappropriate use of parametric tests results in an increased probability of a type II error, not a type I error. If a study reports a statistically significant finding after using inappropriate parametric statistics, the finding would be statistically significant if derived via nonparametric methods. It should therefore be considered valid, because the result would be expected to have had a lower p-value, which is to say a lower likelihood to have occurred by chance alone, if the data were reanalyzed with appropriate nonparametric methods.17

References

1. West CP, Ficalora RD. Clinician attitudes toward biostatistics. Mayo Clin Proc. 2007; 82:939–43.
2. Association of American Medical Colleges. Contemporary Issues in Medicine: Basic Science and Clinical Research. Available at: https://services.aamc.org/publications. Accessed Aug 20, 2010.
3. Chapman DM, Hayden S, Sanders AB, et al. Integrating the Accreditation Council for Graduate Medical Education Core Competencies into the model of the Clinical Practice of Emergency Medicine. Ann Emerg Med. 2004; 43:756–69.
4. Windish DM, Huot SJ, Green ML. Medicine residents’ understanding of the biostatistics and results in the medical literature. JAMA. 2007; 298:1010–22.
5. Hack JB, Bakhtiari P, O’Brien K. Emergency medicine residents and statistics: what is the confidence? J Emerg Med. 2009; 37:313–8.
6. Gaddis ML, Gaddis GM. Introduction to biostatistics: part 1, basic concepts. Ann Emerg Med. 1990; 19:86–9.
7. Goodacre S. Critical appraisal for emergency medicine 2: statistics. Emerg Med J. 2008; 25:362–4.
8. Carley S, Lecky F. Statistical consideration for research. Emerg Med J. 2003; 20:258–62.
9. Biros MH. Emergency medicine research: where are we now and where do we need to be? Acad Emerg Med. 1997; 4:1101–3.
10. Lewis RJ. Statistical methodology and effective emergency medicine: what is the connection? [Commentary] Acad Emerg Med. 1996; 3:824.
11. Biros MH. SAEM emergency medicine research fellowship guidelines. Acad Emerg Med. 1999; 6:1067–8.
12. Barrett TW, Schriger DL. Measures of emergency department crowding, odds ratios, and the dangers of making continuous data categorical: answers to January 2008 Journal Club questions. Ann Emerg Med. 2008; 51:782–9.
13. Weissman C. Analyzing intensive care unit length of stay data: problems and possible solutions. Crit Care Med. 1997; 25:1594–600.
14. Kulinskaya E, Kornbrot D, Gao H. Length of stay as a performance indicator: robust statistical methodology. IMA J Manag Math. 2005; 16:369–81.
15. Manning WG, Mullahy J. Estimating log models: to transform or not to transform? J Health Econ. 2001; 20:461–94.
16. Altman DG. Practical Statistics for Medical Research. New York, NY: Chapman and Hall, 1991.
17. Vickers AJ. Parametric versus non-parametric statistics in the analysis of randomized trials with non-normally distributed data. BMC Med Res Methodol. 2005; 5:35.
18. Institute of Medicine (U.S.). Committee on the Future of Emergency Care in the United States Health System. Emergency Medical Services at the Crossroads. Washington, DC: National Academies Press, 2007.
19. Schull MJ, Vermeulen M, Slaughter G, Morrison L, Daly P. Emergency department crowding and thrombolysis delays in acute myocardial infarction. Ann Emerg Med. 2004; 44:577–85.
20. Diercks DB, Roe MT, Chen AY, et al. Prolonged emergency department stays of non–ST-segment elevation myocardial infarction patients are associated with worse adherence to the American College of Cardiology/American Heart Association Guidelines for Management and increased adverse events. Ann Emerg Med. 2007; 50:489–96.
21. Thompson DA, Yarnold PR, Williams DR, Adams SL. Effects of actual waiting time, perceived waiting time, information delivery, and expressive quality on patient satisfaction in the emergency department. Ann Emerg Med. 1996; 28:657–65.
22. Taylor C, Benger JR. Patient satisfaction in emergency medicine. Emerg Med J. 2004; 21:528–32.
23. Bayley MD, Schwartz JS, Shofer FS, et al. The financial burden of emergency department congestion and hospital crowding for chest pain patients awaiting admission. Ann Emerg Med. 2005; 45:110–7.
24. National Quality Forum. NQF Endorses Measures to Address Care Coordination and Efficiency in Hospital Emergency Departments: Measures Can Help Decrease Wait Time, Increase Physician Productivity, and Safety. Available at: http://www.Qualityforum.org. Accessed Aug 24, 2010.
25. Centers for Medicare and Medicaid Services. Medicare Program: Proposed Changes to the Hospital Inpatient Prospective Payment Systems and Fiscal
ACAD EMERG MED • October 2010, Vol. 17, No. 10 • www.aemj.org 1121

Year 2009 Rates; Proposed Changes to Disclosure of Physician Ownership in Hospitals and Physician Self-Referral Rules; Proposed Collection of Information Regarding Financial Relationships Between Hospitals and Physicians; Proposed Rule. Federal Register. 2008; 73(84):42. FR Parts 411, 412, 413.
26. U.S. Department of Health and Human Services. Medicare.gov - Hospital Compare. Available at: http://www.medicare.gov/Hospital/Search/Welcome.asp. Accessed Jul 14, 2010.
27. Thomson Reuters. ISI Web of Knowledge: ISI Thomson Impact Factor. Available at: http://thomsonreuters.com/products_services/science/free/essays/impact_factor/. Accessed Jul 14, 2010.
28. Welch S, Augustine J, Camargo CA Jr, Reese C. Emergency department performance measures and benchmarking summit. Acad Emerg Med. 2006; 13:1074–80.
29. U.S. Department of Health and Human Services: National Center for Health Statistics. National Hospital Ambulatory Medical Care Survey. Available at: http://www.cdc.gov/nchs/ahcd.htm. Accessed Jul 14, 2010.
30. Norman GR, Streiner DL. Biostatistics: The Bare Essentials. Hamilton, Ontario: B. C. Decker Inc., 2000.
31. Tabachnick BG, Fidell LS. Using Multivariate Statistics. Boston, MA: Allyn and Bacon, 2001.
32. U.S. Commerce Department Technology Administration. NIST/SEMATECH e-Handbook of Statistical Methods. Available at: http://www.itl.nist.gov/div898/handbook/. Accessed Jul 14, 2010.
33. Peat J, Barton B. Medical Statistics: A Guide to Data Analysis and Critical Appraisal. Malden, MA: Blackwell Publishing Ltd, 2005.
34. Hing E, Gousen G, Shimizu I, Burt C. Guide to using masked design variables to estimate standard errors in public use files of the National Ambulatory Medical Care Survey and the National Hospital Ambulatory Medical Care Survey. Inquiry. 2003; 40:401.
35. Fan J, Kao W, Yen DH, Wang L, Huang C, Lee C. Risk factors and prognostic predictors of unexpected intensive care unit admission within 3 days after ED discharge. Am J Emerg Med. 2007; 25:1009–14.
36. Miles J, Shevlin M. Applying Regression and Correlation: A Guide for Students and Researchers. Thousand Oaks, CA: Sage Publications, 2001.
37. Bridge PD, Sawilowsky SS. Increasing physicians’ awareness of the impact of statistics on research outcomes: comparative power of the t-test and Wilcoxon rank-sum test in small samples applied research. J Clin Epidemiol. 1999; 52:229–35.
38. Kassam SA, Thomas JB. Nonparametric weighted-signs tests for location. SIAM J Appl Math. 1977; 32:649–52.
39. Cristóbal J, Alcalá J. An overview of nonparametric contributions to the problem of functional estimation from biased data. TEST. 2001; 10:309–32.
40. Kang Q, Nelson PI. Nonparametric tests for the median from a size-biased sample. J Nonparametric Stat. 2008; 20:19–37.
41. Schriger DL, Cooper RJ. Achieving graphical excellence: suggestions and methods for creating high-quality visual displays of experimental data. Ann Emerg Med. 2001; 37:75–87.
