
Journal of Modelling in Management

An observatory note on tests for normality assumptions


Ahmed F. Siddiqi
Article information:
To cite this document: Ahmed F. Siddiqi, (2014), "An observatory note on tests for normality assumptions", Journal of Modelling in Management, Vol. 9 Iss 3, pp. 290-305.
Permanent link to this document: http://dx.doi.org/10.1108/JM2-04-2014-0032
Downloaded by Monash University on 06 December 2014, at 10:33 (PT).
References: this document contains references to 38 other documents.



An observatory note on tests for normality assumptions
Ahmed F. Siddiqi
School of Business & Economics, University of Management & Technology, Lahore, Pakistan

Abstract
Purpose – The purpose of this paper is to discuss how the numerous tests available in the statistical literature for assessing the normality of a given set of observations perform in normal and near-normal situations. Not all these tests are suitable for all situations; rather, each test has an exclusive area of application.
Design/methodology/approach – These tests are assessed for their power at varying degrees of skewness, kurtosis and sample size on the basis of simulated experiments.
Findings – It is observed that almost all these tests are indifferent to smaller values of skewness and kurtosis. Further, the power of accepting normality reduces with increasing sample size.
Originality/value – The article gives guidelines to researchers for applying normality-assessing tests in different situations.
Keywords Decision making, Data analysis, Normality assumptions, Skewness, Kurtosis, Normality test
Paper type Research paper

1. Introduction
The normality assumption is omnipresent in almost every statistical test of significance and statistical model. In essence, this assumption requires that the data to which a statistical test of significance or statistical model is to be applied must be, either exactly or at least approximately, normally distributed. Primarily, this is because almost all of these tests and models were developed with the normal distribution in mind, so a proper, apt and legitimate application requires this distribution as a primary building block. Student's t test, the χ² test and the F test, for example, all assume normality of the parent distribution.
Secondly, for theoretical reasons (such as the central limit theorem), any variable that is
the sum of a large number of independent factors is likely to be normally distributed. For
this reason, normal distribution is used throughout statistics, natural science and social
science as a simple model for complex phenomena. For example, the observational error
in an experiment is usually assumed to follow a normal distribution, and the
propagation of uncertainty is computed using this assumption. So, a violation of normality results in the inability of these tests to deliver a verdict on statistical significance.
Studies on assessing normality began probably at the dawn of the previous century, when Pearson (1900) introduced his test, which is based upon an estimated distribution that is then compared with the given set of observations using the χ² distribution. Certain conditions need to be satisfied for this test to be applied legitimately: the events must be mutually exclusive and have total probability 1, and the sample size, both overall and per event, should be fairly large (Yates, 1934). All these
prerequisites put a question mark on the application of this seminal test for normality assessment. After this attempt, Kolmogorov and Smirnov (Kolmogorov, 1933) proposed their test, which uses the cumulative distribution and its distance from the empirical distribution of the sample. This periodogram statistic, as labeled by Stephens (1970), is based upon the deviation of the ith order statistic from its expected value. The test is among the most used non-parametric methods, in the sense that the critical values do not depend on the specific distribution being tested, and it is sensitive to both the location and shape of the empirical distribution. However, the empirical distribution, being based upon a sample, belittles the power of the test, thus making the normality assessment less reliable. The test was modified, to a certain extent, by Lilliefors (1967), who made use of the minimum distance between the cumulative and the empirical distributions. However, both of these tests, being based upon sample estimates, have come under severe criticism. D'Agostino (1986) rendered these tests simply a historical curiosity and suggested that they not be used. Primarily, this test is meant for continuous distributions; however, versions exist for discontinuous data (Conover, 1972; Horn, 1977; Gleser, 1985).
Anderson and Darling (1954) suggested their test which is based upon the
cumulative distribution of order statistics of the given set of observations. In literature,
it is considered to be a modification of the Kolmogorov–Smirnov test, which gives more
weight to the tails. The test makes use of the specific distribution in calculating critical
values. This has the advantage of allowing a more sensitive test, especially at tails, and
the disadvantage that critical values must be calculated for each distribution. Kuiper
(1960) also suggested his V test, which uses the maximum differences between the empirical and cumulative distributions but has a different functional form. It is a
rotation-invariant Kolmogorov-type test statistic (Jammalamadaka and SenGupta,
2001, Section 7.2). It requires the knowledge of the two normal parameters, i.e. mean and
variance, without which it is not possible to apply it (Dyer, 1974). However, Louter and
Koerts (2008) devised a modification to apply this test even in case of composite
hypothesis.
Shapiro and Wilk (1965) suggested their test by introducing a ratio of a linear
combination of the order statistics of the sample to its variance estimate. This is in
contrast to previously discussed distance-based tests. The ratio is claimed to be both
scale and origin invariant. A modification of this ratio was suggested by D'Agostino (1972), who used the ratio of a linear unbiased estimator of the standard deviation, using
order statistics, to the usual mean square estimator. The test was originally proposed for
moderate sample sizes and can detect departures from normality both for skewness and
kurtosis. Ajne (1968) suggested a test which is primarily meant for circular uniformity
but is also used to assess normality. The test is locally most powerful and invariant for
circular rotation (Stephens, 1970). Vasicek (1976) introduced a test for normality, based
upon the property of the normal distribution that its entropy exceeds that of any other
distribution with a density that has the same variance. As the entropy of normal
distribution depends only on its variance and not upon the mean, so the test is meant
only for composite hypothesis. Arizono and Ohta (1989) extended this test for simple
hypotheses by using Kullback and Leibler's (1951) information, which is an extended
concept of entropy. Jarque and Bera (1987) proposed a goodness-of-fit measure of
departure from normality, based on the sample kurtosis and skewness. The null
hypothesis, for the test, is a joint hypothesis of the skewness being 0 and the excess kurtosis being 0. Urzúa (1996) warns, however, about the incorrect use of this Jarque and Bera (1987) (JB) test in the case of small- and medium-sized samples. He also introduces a modified version of the same test which is more suitable for smaller samples.
Arizono and Ohta (1989) presented a Monte Carlo simulation-based comparison of
different tests to establish the power-based superiority of their sample entropy-based
test. The study shows that this entropy-based test statistics is statistically more
powerful as compared to the Kolmogorov–Smirnov (KS) and Cramér–von Mises (CVM)
tests. However, the comparison is based upon a sample of size 20 only, and different
results are expected for smaller samples. D'Agostino et al. (1990) studied the usefulness of the symmetry and kurtosis measures, √β₁ and β₂, respectively, for assessing normality. As both of these measures are related to the graphical presentation of the
data, they recommended a combined use of graphical and numerical techniques in doing
so. Lee et al. (2005) developed some new tests to assess normality based on U-processes. But these are still based upon order statistics, which belittles the power of these tests (Yazici and Yolacan, 2007). Oztuna et al. (2006) establish the superiority of the Shapiro and Wilk (1965) test over four of its competitors when compared on the basis of type I error. Yazici and Yolacan (2007) attempted a comparison of 12 different normality tests for their statistical power to assess the normality assumption. They concluded that the tests based upon the cumulative distribution function are slightly more powerful than those based upon order statistics. The study is, however, restricted to samples of size up to 50 only, and no attempt was made to study the skewness and kurtosis of the sample under investigation. Asma (2008) developed computer algorithms in the Delphi programming language for most of these tests without discussing their pros and cons. Masuda (2010) derives consistent and asymptotically distribution-free test statistics for the normality of the driving Lévy process, based on the self-normalized partial sums of residuals. Akbilgi and Howe (2011) introduced a test based upon an identity transformation of the Gaussian function. The test is evaluated on the basis of its type I error and the associated power. Menu-driven statistical packages available in the software market, like SPSS, Statistica, etc., offer only a few of these normality-assessing tests, selected at the vendor's own discretion. Command-driven statistical packages like R, SAS, etc., on the other hand, offer a wide variety of such tests. This paper also uses built-in algorithms in R for the various tests.
All these studies that attempt to compare the relative performance of the different tests available to assess normality are stereotyped, in that no effort is made to study the behavior of these tests in nearly, or approximately, normal situations. These near-normal situations are usually judged by either symmetry or kurtosis measures. Some comparisons do exist, like Yazici and Yolacan (2007), which address the lack of symmetry or kurtosis, though not in terms of direct measures like √β₁ or β₂, but in terms of different distributions which are either skewed or show non-zero excess kurtosis. Such a comparison may have academic worth, but for practical purposes, and especially when the user is a non-statistician, these comparisons simply add to the confusion. The current study focuses exactly on near-normal situations, where the behavior of these tests is assessed with respect to skewness and kurtosis. Section 2 describes the tests of normality discussed in this study. Section 3 describes a Monte Carlo simulation-based comparison of these tests of normality with respect to skewness, kurtosis and sample size.
2. Testing tests of normality
As has been discussed earlier, there is a long list of tests available in the statistical literature to test the normality assumption. It is not possible to study all of them in a single study. The selected tests are Anderson and Darling (1954) (AD), Cramér (1928) and von Mises (1947) (CVM), D'Agostino (1972) (DAG), Kuiper (1960) (K), Kolmogorov and Smirnov (Kolmogorov, 1933) (KS), Lilliefors' (1967) modification to KS (KS-L), Pearson (1900) (P), Shapiro and Francia (1972) (SF) and, last but not least, Shapiro and Wilk (1965) (SW). A brief description of all these tests is given in Table I.
All these tests are developed for a composite null hypothesis of normality. The first column gives the symbol usually used in the statistical literature for the test, the second column gives the authors' names and the third column gives the respective expression for calculating the numerical value of the test statistic. The fourth and fifth columns give the Pearson correlation coefficients showing the sensitivity of these tests to skewness and kurtosis, respectively (details are in Section 3.1).
Not all these tests are equally powerful in all situations, and in similar situations different tests behave differently. It has also been observed that the behavior of these tests is more erratic in near, or approximately, normal situations. This article attempts to discriminate among these tests in near-normal situations. Normality is assessed through symmetry, kurtosis or a combination of the two. The comparison is based upon simulations using these same established characteristics of the normal distribution.
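As an illustration of the distance-type statistics listed in Table I, the KS statistic D = max(D⁺, D⁻) can be computed directly from the order statistics; a minimal sketch in Python (numpy/scipy; the function name is illustrative, not from the paper):

```python
import numpy as np
from scipy.stats import norm

def ks_statistic(x):
    """Kolmogorov-Smirnov distance D = max(D+, D-) against a fitted normal.

    Uses p_(i) = Phi((x_(i) - xbar) / s), with mean and standard deviation
    estimated from the sample, i.e. the composite-hypothesis form used
    throughout the paper.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    p = norm.cdf((x - x.mean()) / x.std(ddof=1))
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - p)           # empirical CDF above fitted CDF
    d_minus = np.max(p - (i - 1) / n)    # fitted CDF above empirical CDF
    return max(d_plus, d_minus)

rng = np.random.default_rng(0)
d = ks_statistic(rng.normal(size=40))
print(d)  # small for a normal sample
```

Note that when the mean and variance are estimated from the sample, the usual KS critical values no longer apply; this is precisely the situation that Lilliefors' (1967) modification addresses.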

3. A simulation-based comparison of these tests


Monte Carlo simulation experiments are conducted here to compare the behavior of these normality tests in near-normal situations. For a typical normal distribution, the coefficient of skewness should be 0 (Azzalini, 2005; Genton et al., 2001), while its kurtosis coefficient should be exactly 3 (D'Agostino et al., 1990; Chissom, 1970) or, equivalently, its excess kurtosis should be 0. Both of these characteristics are used here to investigate the sensitivity of the above-mentioned normality tests.

Assessing the effect of increasing skewness & kurtosis


The first simulation experiment starts with drawing 10,000 random samples of size 40 from a normal distribution with mean 0 and unit variance. For each of these samples, the coefficients of skewness and (excess) kurtosis are calculated using the moment-based classical expressions, as developed by Charlier (1905). Normality is assessed for each of these samples by applying all the tests discussed in Table I, and the corresponding p-values are recorded. Because all these tests are based upon a null hypothesis of normality, a higher p-value indicates the acceptance of normality and vice versa. To capture the behavior of the p-values with respect to the coefficients of skewness and kurtosis, the Pearson correlation coefficient is computed for each test between its p-values and these coefficients. A positive correlation would show an increase in p-value for an increase in these coefficients, while a negative correlation would show a decrease in p-value for increasing coefficients. A higher magnitude of the correlation indicates a higher sensitivity to these coefficients.
One would expect all these coefficients to be negative in direction and of identically high magnitude for the tests to be called equally proficient in assessing normality. However, this is not the case, and these coefficients are quite different in magnitude.
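The design of this first experiment can be reproduced in outline as follows; a hedged sketch in Python using `scipy.stats.shapiro` (SW) as the representative test and fewer replications than the paper's 10,000, for speed. Since the p-value is a symmetric function of skewness, the sketch correlates p-values with the absolute coefficients; this is an assumption about how the signed correlations in Table I were obtained:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_rep, n = 2000, 40          # the paper uses 10,000 replications of size 40

skews, kurts, pvals = [], [], []
for _ in range(n_rep):
    x = rng.normal(size=n)                   # N(0, 1) sample
    skews.append(abs(stats.skew(x)))         # |moment-based skewness|
    kurts.append(abs(stats.kurtosis(x)))     # |excess kurtosis|
    pvals.append(stats.shapiro(x)[1])        # SW p-value

# Pearson correlation of p-values with the shape coefficients
r_skew = np.corrcoef(skews, pvals)[0, 1]
r_kurt = np.corrcoef(kurts, pvals)[0, 1]
print(round(r_skew, 2), round(r_kurt, 2))    # both negative
```

Replacing `stats.shapiro` with the other tests of Table I would fill in the remaining rows of columns 4 and 5.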
Table I. Some tests of normality

Notation: x(1) ≤ … ≤ x(n) are the order statistics, x̄ and s the sample mean and standard deviation, and p(i) = Φ((x(i) − x̄)/s). Columns 4 and 5 give the Pearson correlation of each test's p-values with skewness and kurtosis (Section 3.1); all are negative.

Test   Authors                        Test statistic                                                          Skewness   Kurtosis
AD     Anderson–Darling test          A² = −n − (1/n) Σᵢ (2i−1)[ln p(i) + ln(1 − p(n+1−i))]                    −0.486     −0.245
CVM    Cramér–von Mises test          W² = 1/(12n) + Σᵢ (p(i) − (2i−1)/(2n))²                                  −0.415     −0.189
DAG    D'Agostino's test              D = Σᵢ (i − (n+1)/2) x(i) / (n² √(Σᵢ (xᵢ − x̄)²/n))                       −0.674     −0.395
JB     Jarque–Bera test               JB = (n/6)[b₁ + (b₂ − 3)²/4], b₁ the squared skewness, b₂ the kurtosis   −0.781     −0.475
K      Kuiper test                    V = D⁺ + D⁻                                                              −0.125     −0.178
KS     Kolmogorov–Smirnov test        D = max(D⁺, D⁻), D⁺ = maxᵢ (i/n − p(i)), D⁻ = maxᵢ (p(i) − (i−1)/n)      −0.119     −0.067
KS-L   Lilliefors test                KS statistic D with p(i) computed from the estimated mean and s          −0.378     −0.164
P      Pearson goodness-of-fit test   χ² = Σᵢ (Oᵢ − Eᵢ)²/Eᵢ                                                    −0.151     −0.078
SF     Shapiro–Francia test           W′ = (Σᵢ aᵢ x(i))² / Σᵢ (xᵢ − x̄)²                                        −0.673     −0.187
SW     Shapiro–Wilk test              W = (Σᵢ aᵢ x(i))² / Σᵢ (xᵢ − x̄)²                                         −0.597     −0.360
The results of this experiment are shown in columns 4 and 5 of Table I. Here is a brief commentary on these results:
• All correlation coefficients for skewness are negative (as shown in column 4). This shows that all the selected tests behave similarly for increasing values of skewness. However, their magnitudes are quite different, varying from 0.119 for KS to 0.781 for JB. The classical interpretation of the correlation coefficient says that a 1 per cent increase in the coefficient of skewness develops a 0.119 per cent change in the p-value of the KS test and a 0.781 per cent change for JB. In other words, KS is sensitive to a change in skewness, though not as much as JB, for which the correlation is the highest. As the correlation coefficient differs across tests, one may infer that their sensitivity to skewness differs, and one may rank these tests by their sensitivity to skewness.
• Column 5 shows these correlation coefficients for the kurtosis values. As in the case of skewness, all of these coefficients are negative, indicating an inverse relationship between the p-values and the coefficient of kurtosis. However, the magnitude differs across tests, which indicates their varying levels of sensitivity to kurtosis. The KS test is found to be the least sensitive to changes in kurtosis, JB the most, and in between these two the sensitivity index varies.

The more sensitive a test is to skewness or kurtosis, the better it is at discerning normality. The results of this simulation experiment are quite revealing: less sensitive tests are likely to accept the hypothesis of normality even for higher values of skewness and kurtosis, which should not be the case. However, as these two measures are not independent of each other (Doornik and Hansen, 2008), one may suspect that a joint effect of skewness and kurtosis is at work. A second Monte Carlo simulation experiment is conducted to study the behavior of these tests under skewness or kurtosis alone.

Assessing the effect of skewness after controlling kurtosis


The objective of the second Monte Carlo simulation experiment is to study the effect of skewness alone on these tests. The experiment again starts with drawing samples, each of size 40, from a normal distribution with mean 0 and unit variance. However, to study skewness alone, all these samples are filtered for higher values of kurtosis, and only those 10,000 samples are accepted whose kurtosis coefficient lies within a range of ±0.5. For each of these filtered samples, observations are made of their coefficients of skewness and the corresponding p-values for the tests defined in Table I. The results are shown in Figure 1, which is a scatter diagram with skewness measured along the horizontal axis and p-values along the vertical axis.
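The filtering step just described, keeping only samples whose excess kurtosis lies within ±0.5, amounts to simple rejection sampling; a sketch in Python (function name and the reduced sample count are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def draw_kurtosis_controlled(n_keep, n=40, bound=0.5):
    """Draw N(0,1) samples of size n, keeping only those whose excess
    kurtosis lies within +/- bound; the paper keeps 10,000 such samples."""
    kept = []
    while len(kept) < n_keep:
        x = rng.normal(size=n)
        if abs(stats.kurtosis(x)) <= bound:   # Fisher (excess) kurtosis
            kept.append(x)
    return kept

samples = draw_kurtosis_controlled(200)
print(len(samples), max(abs(stats.kurtosis(x)) for x in samples))
```

For each kept sample one would then record the skewness coefficient and the p-values of the tests in Table I; swapping the roles of skewness and kurtosis in the filter gives the third experiment below.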
Because all these tests are based upon a null hypothesis of normality, a higher p-value is always expected, especially for skewness close to zero. Resultantly, one expects an upside-down shape: a high concentration of points along both the horizontal and vertical axes, while hollow at and around the origin. However, this is not the case for many tests.
• The AD test, Figure 1(a), despite having a concentration along the vertical axis, accepts the normality hypothesis for samples with skewness well beyond 0. No hollowness is observed at the origin, which means it also rejects normality for samples with skewness at or around 0.
Figure 1. Performance of normality tests for skewness

• For the CVM test, Figure 1(b), the aberration of points away from the vertical axis for higher values of skewness is even more explicit than for the AD test. No hollowness is observed at the origin, which means it rejects normality for samples with skewness at or around 0.
• For the SF test, in Figure 1(c), a hollow area is observed, which indicates its power to reject the normality hypothesis at or around 0 skewness. However, at the same time, it also accepts the normality hypothesis for many samples with higher values of skewness. Still, it deals with skewness better than the AD or CVM tests.
• KS-L, in Figure 1(d), shows a similar behavior to CVM: accepting many samples with skewness well beyond 0 and, at the same time, rejecting normality of samples with skewness at or around 0.
• The P test, in Figure 1(e), and the KS test, in Figure 1(i), both show a really pathetic picture: no upside-down shape, but instead a pillar-like structure, which reflects their impotence in detecting, or rejecting, normality at or around 0 skewness. The comments by D'Agostino (1986) rendering the KS test simply a historical curiosity find their meaning here in Figures 1(e) and 1(i).
• The SW test, in Figure 1(g), shows slightly better behavior, at least accepting normality for zero skewness. However, it behaves equally badly in accepting normality for near-zero skewnesses.
• The DAG test, in Figure 1(h), gives higher p-values to samples with skewness around ±0.5. It has a tendency to reject normality at 0 skewness, while accepting it around the 0 value.
• The JB test, in Figure 1(f), is probably the best: it has a large hollow area at the center, meaning it rarely rejects normality for samples with skewness around 0.
Generally, all these tests, except the JB test, have a tendency either to accept normality for samples with skewness within the range of ±1 or to reject normality at 0 skewness. These were the results obtained with kurtosis controlled. Let us do a similar exercise with skewness controlled, to study the behavior of these tests at varying levels of the coefficient of kurtosis, in another simulation experiment.

Assessing the effect of kurtosis after controlling skewness


The third simulation experiment is conducted to study the effect of kurtosis alone on these tests. The experiment starts, as usual, with drawing samples, each of size 40, from a normal distribution with mean 0 and unit variance. However, to study kurtosis alone, all these samples are filtered for higher values of skewness, and only those 10,000 samples are accepted whose skewness coefficient lies within a range of ±0.5. For each of these filtered samples, observations are made of their coefficients of kurtosis and the corresponding p-values for the tests defined in Table I. The results are shown in Figure 2, which is a scatter diagram with kurtosis measured along the horizontal axis and p-values along the vertical axis.
Because all these tests are based upon a null hypothesis of normality, a higher p-value is always expected, especially for kurtosis coefficients close to 0. Resultantly, one expects an upside-down shape: a high concentration of points along both the horizontal and vertical axes, while hollow at and around the origin. However, this is not the case for many tests, which clearly show a different behavior. One of the most distinguishing features of all these scatter diagrams is that the points lie comparatively more towards their right side.
The results are somewhat similar to those in the previous simulation experiment:
• Almost all the tests are, unfortunately, accepting the hypothesis of normality for samples having a kurtosis coefficient well beyond 0. The classical tests, like the P test in Figure 2(e) and the KS test in Figure 2(i), are quite bad at assessing normality.
• The AD test, Figure 2(a), accepts the normality of samples with kurtosis coefficients touching even 3.
• The CVM test, Figure 2(b), again accepts the normality of samples with kurtosis coefficients greater than 3.
Figure 2. Performance of normality tests for kurtosis

• The SF test, Figure 2(c), does a comparatively better job, rejecting the hypothesis of normality for all samples having a kurtosis coefficient greater than 2.
• The behavior of the KS-L test in Figure 2(d) is almost similar to that of the CVM test.
• Both the JB test in Figure 2(f) and the DAG test in Figure 2(h) are better at assessing normality, like the SF test.
In short, the results are not very different from what we saw in the case of skewness.
In all these Monte Carlo simulation experiments, the sample size was kept constant. The sample size may play a critical role in the determination of normality, as observed by Bearden et al. (1982), among others. Another simulation experiment is needed to study the behavior of these tests for varying sample size.
Assessing the effect of sample size
A fourth simulation experiment is conducted to study the effect of sample size on these tests. The experiment is based on 500 random samples drawn from a normal distribution with both mean and standard deviation equal to 50, filtered for higher values of skewness and kurtosis in such a way that no sample has skewness or kurtosis beyond ±1. The sample size is allowed to vary from 20 to 350, and observations are made of the number of times, in percentage terms, these tests reject normality. Results are shown in Figure 3, which is a scatter diagram accentuated with a polynomial trend. Each point in these scatter diagrams shows the percentage of times a test rejects normality at a specific sample size. A downward trend is expected, as the behavior of these tests is expected to improve as the sample size increases.
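The sample-size experiment can be sketched as follows, again with SW as the representative test and, as a simplifying assumption, without the paper's skewness/kurtosis filter:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def rejection_rate(n, n_rep=300, alpha=0.05):
    """Share of N(50, 50^2) samples of size n that SW rejects at level alpha."""
    rejects = sum(stats.shapiro(rng.normal(50, 50, size=n))[1] < alpha
                  for _ in range(n_rep))
    return rejects / n_rep

# one point of the Figure 3 scatter for each sample size
for n in (20, 100, 350):
    print(n, rejection_rate(n))
```

Plotting `rejection_rate(n)` over a grid of n from 20 to 350 would reproduce one panel of Figure 3 in outline.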
However, the reality is quite different. Despite a low rejection rate, the trend is not downward for even a single test. For almost all these tests, the trend is a curve: the rejection rate rises sharply at the start, for sample sizes up to 100, and then becomes constant. The situation is a little different for the SF, JB and DAG tests, where the rejection rate becomes constant around a sample size of 150. The rejection rate for the KS test is the highest. The scatter of these plots also varies with the test; for some, like KS, JB, SF and SW, it is not as pronounced as for others.
The results in Figure 3 confirm that the diagnostic power of the normality tests gets better with increasing sample size. However, for sample sizes up to 100, the rejection rate for most of these tests increases, so their performance is not reliable for sizes less than 100.
Apart from these near-normal situations, the behavior of these tests in entirely non-normal situations is also worth a discussion. For the sake of the current article, only the Student's t, chi-square (χ²), binomial and Poisson probability distributions have been selected. Yazici and Yolacan (2007) have used many other distributions for a similar study.

Assessing the effect of non-normal situations


The fifth simulation experiment is conducted to appraise the behavior of these tests on samples drawn from non-normal probability distributions. These non-normal probability distributions include a central Student's t with 4 degrees of freedom, t(4), a central chi-square with 10 degrees of freedom, χ²(10), a binomial with n = 10 and p = 0.5, B(10, 0.5), and a Poisson distribution with mean value 7, P(7). Five hundred random samples, each of size 50, are drawn from each of these four distributions. Observations are made of the number of times these tests reject normality, which are then graphed as bar charts in Figure 4.
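A sketch of this experiment, with SW as the representative test and a smaller replication count than the paper's 500 (assumptions made here for brevity; the paper records rejection percentages for every test in Table I):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def reject_pct(sampler, n=50, n_rep=200, alpha=0.05):
    """Percentage of samples for which SW rejects normality at level alpha."""
    hits = sum(stats.shapiro(sampler(n))[1] < alpha for _ in range(n_rep))
    return 100.0 * hits / n_rep

# the four non-normal parent distributions used in the paper
sources = {
    "t(4)":       lambda n: rng.standard_t(4, size=n),
    "chisq(10)":  lambda n: rng.chisquare(10, size=n),
    "B(10, 0.5)": lambda n: rng.binomial(10, 0.5, size=n).astype(float),
    "P(7)":       lambda n: rng.poisson(7, size=n).astype(float),
}
for name, draw in sources.items():
    print(name, reject_pct(draw))   # ideally 100 for every distribution
```

The gaps between these percentages and 100 are exactly the power shortfalls discussed below.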
Ideally, these tests should reject each and every sample taken from the non-normal distributions, and each of these percentages should equal exactly 100. However, this is not the case, and many of these percentages are not equal to 100. The Student's t distribution, in Figure 4(a), being closer to a normal, remains a difficult task for most of the tests; the KS test remains pathetically poor, failing to reject normality. However, for the B(10, 0.5) distribution, which looks normal because of its similar symmetry and kurtosis coefficients, the performance of most of the tests remains very good, except for JB and DAG, which fall prey to the similarities in shape.

Figure 3. Performance of normality tests for different sample sizes

Results summary
These Monte Carlo experiments evaluate the relative efficacy, power and applicability of the widely used tests of normality assessment from five different perspectives. Tests like DAG, JB, SF and SW perform comparatively better in almost all situations, while tests like KS-L, CVM, AD and, especially, KS and P perform pathetically poorly.
Figure 4. Comparison for different tests of normality
4. Concluding remarks
One of the most common and crucial assumptions for the correct application of statistical tests is the normality of the given set of observations. The theory of statistics is very strict in this regard and has developed many yardsticks, benchmarks, graphs and diagnostics to assess normality. The relative efficacy and power of these assessment techniques vary, however, with the situation, and this is a point of concern for many data analysts.
The paper is written with the sole objective of appraising the performance of the different statistical diagnostic tests available for assessing the normality of a given set of observations, and of discussing the appropriateness of these tests for different situations. Although there are many normality tests available in the academic literature, none dominates under all conditions, and specific situations call for different tests. However, one may use the results of this paper to apply these tests more cognizantly in one's own situation.
The paper uses Monte Carlo simulation to generate random samples. Three different kinds of random samples have been generated:
(1) normal, with controlled sample size, to assess the effect of skewness and kurtosis;
(2) normal, with controlled skewness and kurtosis, to assess the effect of sample size; and
(3) non-normal, to assess the overall performance of these tests.
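The sampling schemes above can be sketched as a short Monte Carlo routine. The code below is an illustrative sketch, not the author's actual code: it assumes Python with NumPy and SciPy, uses the Shapiro-Wilk test as a stand-in for the six tests studied, and estimates a test's rejection rate under truly normal data (its size) and under clearly non-normal, exponential data (its power).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def rejection_rate(sampler, test, n=50, alpha=0.05, reps=2000):
    """Fraction of simulated samples for which the test rejects normality."""
    rejections = 0
    for _ in range(reps):
        p_value = test(sampler(n))
        rejections += p_value < alpha
    return rejections / reps

# Type I error rate ("size"): normal samples wrongly declared non-normal.
size = rejection_rate(lambda n: rng.normal(size=n),
                      lambda x: stats.shapiro(x).pvalue)

# Power: non-normal (exponential) samples correctly flagged.
power = rejection_rate(lambda n: rng.exponential(size=n),
                       lambda x: stats.shapiro(x).pvalue)
```

Swapping the test function for `stats.jarque_bera` or `stats.normaltest` repeats the same experiment for other tests, mirroring the per-test comparisons reported in the figures.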

It is believed that one manifestation of the normality of a set of observations is that the corresponding skewness coefficient is 0 and the kurtosis value, calculated through the moment ratio β₂ = μ₄/μ₂², exactly equals 3. So, the diagnostic tests for normality should reject normality for any deviation in these shape parameter values. However, this is not the case. Almost all of these tests fail, in one way or another, to discern non-normality for small deviations in these shape parameter values, accepting normality even when the skewness is not 0 or the kurtosis is not 3. For some, like KS and Lilliefors, this failure is even more pronounced. For smaller sample sizes, the tests do perform well; the bigger the sample size, the smaller the chances that the test correctly assesses normality. One may conclude that the performance of these tests is not reliable in near-normal situations.
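As a concrete illustration of these moment ratios (a sketch added for exposition, not taken from the paper), both shape quantities can be computed directly from central moments; for normal data the kurtosis μ₄/μ₂² should sit near 3 and the skewness near 0.

```python
import numpy as np

def moment_shape(x):
    """Return (skewness, kurtosis) computed from central moments."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)   # second central moment
    m3 = np.mean(d ** 3)   # third central moment
    m4 = np.mean(d ** 4)   # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2   # equals 3 for a normal distribution
    return skewness, kurtosis

rng = np.random.default_rng(0)
skew, kurt = moment_shape(rng.normal(size=100_000))
# For a large normal sample, skew lands close to 0 and kurt close to 3.
```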
These simulations indicate that tests like DAG, JB, SF and SW perform comparatively better in almost all situations, while tests like KS-L, CVM, AD and, especially, KS and P perform poorly. However, no verdict should be considered final. From a practical point of view, and especially for non-technical researchers, it is wise to start with a visual inspection of the data, either through QQ plots, density plots or other, more sophisticated graphical techniques, to obtain a sketch of the data. Such a sketch helps not only to appraise the correctness of the diagnostics but also to locate the abnormality, if one exists. Never rely on the results of a single diagnostic test; apply at least three such tests. Prefer tests like DAG and JB, which give better results in almost all situations. Further, the results of the numerical diagnostics should be read in conjunction with the graphical sketch.
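This recommended workflow, running several tests side by side and reading them together with a graphical check, might look like the following sketch. It assumes Python with SciPy; `normality_report` is a hypothetical helper name, and SciPy's `normaltest` implements a D'Agostino-type skewness/kurtosis omnibus test.

```python
import numpy as np
from scipy import stats

def normality_report(x, alpha=0.05):
    """Run three normality tests and return p-values with verdicts."""
    pvalues = {
        "Shapiro-Wilk": stats.shapiro(x).pvalue,
        "Jarque-Bera": stats.jarque_bera(x).pvalue,
        "D'Agostino K^2": stats.normaltest(x).pvalue,
    }
    verdicts = {name: ("looks normal" if p >= alpha else "non-normal")
                for name, p in pvalues.items()}
    return pvalues, verdicts

rng = np.random.default_rng(2)
pvals, verdicts = normality_report(rng.normal(size=500))
# Pair this with a graphical check, e.g. stats.probplot(x, dist="norm").
```

Agreement among the three p-values strengthens the conclusion; disagreement is itself a signal to go back to the QQ or density plot before deciding.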

References
Ajne, B. (1968), A simple test for uniformity of a circular distribution, Biometrika, Vol. 55 No. 2,
p. 343.
Akbilgic, O. and Howe, J.A. (2011), A novel normality test using an identity transformation of the Gaussian function, European Journal of Pure and Applied Mathematics, Vol. 4 No. 4, pp. 448-454.
Anderson, T.W. and Darling, D.A. (1954), A test of goodness of fit, Journal of the American Statistical Association, Vol. 49 No. 268, pp. 765-769.
Arizono, I. and Ohta, H. (1989), A test for normality based on Kullback-Leibler information, American Statistician, Vol. 43 No. 1, pp. 20-22.
Asma, S. (2008), Delphi programming of normality tests, Proceedings of the 7th WSEAS
International Conference on Application of Electrical Engineering, World Scientific and
Engineering Academy and Society (WSEAS), pp. 181-185.
Azzalini, A. (2005), The skew-normal distribution and related multivariate families,
Scandinavian Journal of Statistics, Vol. 32 No. 2, pp. 159-188.
Bearden, W., Sharma, S. and Teel, J. (1982), Sample size effects on chi square and other statistics used in evaluating causal models, Journal of Marketing Research, Vol. 19 No. 4, pp. 425-430.
Charlier, C. (1905), Über das Fehlergesetz, Ark. Math. Astr. och Phys., Vol. 2 No. 8, pp. 1-9.
Chissom, B. (1970), Interpretation of the kurtosis statistic, American Statistician, Vol. 24 No. 4,
pp. 19-22.
Conover, W. (1972), A Kolmogorov goodness-of-fit test for discontinuous distributions, Journal
of the American Statistical Association, Vol. 67 No. 339, pp. 591-596.
Cramer, H. (1928), On the composition of elementary errors, Skand Aktuarietids, Vol. 11,
pp. 13-74.
D'Agostino, R. (1972), Small sample probability points for the D test of normality, Biometrika, Vol. 59 No. 1, p. 219.
D'Agostino, R. (1986), Goodness-of-Fit Techniques, Marcel Dekker, New York, NY.
D'Agostino, R., Belanger, A. and D'Agostino, R. Jr (1990), A suggestion for using powerful and informative tests of normality, American Statistician, Vol. 44 No. 4, pp. 316-321.
Doornik, J. and Hansen, H. (2008), An omnibus test for univariate and multivariate normality,
Oxford Bulletin of Economics and Statistics, Vol. 70 No. S1, pp. 927-939.
Dyer, A. (1974), Comparisons of tests for normality with a cautionary note, Biometrika, Vol. 61
No. 1, pp. 185-189.
Genton, M., He, L. and Liu, X. (2001), Moments of skew-normal random vectors and their
quadratic forms, Statistics & Probability Letters, Vol. 51 No. 4, pp. 319-325.
Gleser, L. (1985), Exact power of goodness-of-fit tests of Kolmogorov type for discontinuous
distributions, Journal of the American Statistical Association, Vol. 80 No. 392, pp. 954-958.
Horn, S. (1977), Goodness-of-fit tests for discrete data: a review and an application to a health
impairment scale, Biometrics, Vol. 33 No. 1, pp. 237-247.
Jammalamadaka, S.R. and SenGupta, A. (2001), Topics in Circular Statistics, World Scientific
Press.
Jarque, C. and Bera, A. (1987), A test for normality of observations and regression residuals,
International Statistical Review/Revue Internationale de Statistique, Vol. 55 No. 2,
pp. 163-172.
Kolmogorov, A. (1933), Sulla determinazione empirica di una legge di distribuzione, Giornale dell'Istituto Italiano degli Attuari, Vol. 4, pp. 83-91.
Kuiper, N. (1960), Tests concerning random points on a circle, Proceedings of the Koninklijke
Nederlandse Akademie van Wetenschappen, Ser. A., Vol. 63, pp. 38-47.
Kullback, S. and Leibler, R. (1951), On information and sufficiency, The Annals of Mathematical Statistics, Vol. 22 No. 1, pp. 79-86.
Lee, W., Arcona, S., Thomas, S., Wang, Q., Hoffmann, M. and Pashos, C. (2005), Effect of comorbidities on medical care use and cost among refractory patients with partial seizure disorder, Epilepsy & Behavior, Vol. 7 No. 1, pp. 123-126.
Lilliefors, H. (1967), On the Kolmogorov-Smirnov test for normality with mean and variance unknown, Journal of the American Statistical Association, Vol. 62 No. 318, pp. 399-402.
Louter, A. and Koerts, J. (2008), On the Kuiper test for normality with mean and variance
unknown, Statistica Neerlandica, Vol. 24 No. 2, pp. 83-87.
Masuda, H. (2010), Approximate quadratic estimating function for discretely observed Lévy-driven SDEs with application to a noise normality test, MI Preprint Series, Kyushu University, pp. 1-18.
Mises, R. (1947), On the asymptotic distribution of differentiable statistical functions, The Annals of Mathematical Statistics, Vol. 18 No. 3, pp. 309-348.
Oztuna, D., Elhan, A. and Tuccar, E. (2006), Investigation of four different normality tests in
terms of type 1 error rate and power under different distributions, Turkish Journal of
Medical Sciences, Vol. 36 No. 3, p. 171.
Pearson, K. (1900), On the criterion that a given system of deviations from the probable in the case
of a correlated system of variables is such that it can be reasonably supposed to have arisen
from random sampling, Philosophical Magazine Series 5, Vol. 50 No. 302, pp. 157-175.
Shapiro, S. and Francia, R. (1972), An approximate analysis of variance test for normality,
Journal of the American Statistical Association, Vol. 67 No. 337, pp. 215-216.
Shapiro, S. and Wilk, M. (1965), An analysis of variance test for normality (complete samples),
Biometrika, Vol. 52 Nos 3/4, p. 591.
Stephens, M. (1970), Use of the Kolmogorov-Smirnov, Cramer-von Mises and related statistics
without extensive tables, Journal of the Royal Statistical Society. Series B (Methodological),
Vol. 32 No. 1, pp. 115-122.
Urzúa, C. (1996), On the correct use of omnibus tests for normality, Economics Letters, Vol. 53 No. 3, pp. 247-251.
Vasicek, O. (1976), A test for normality based on sample entropy, Journal of the Royal Statistical
Society. Series B (Methodological), Vol. 38 No. 1, pp. 54-59.
Yates, F. (1934), Contingency tables involving small numbers and the χ² test, Supplement to the Journal of the Royal Statistical Society, Vol. 1 No. 2, pp. 217-235.
Yazici, B. and Yolacan, S. (2007), A comparison of various tests of normality, Journal of
Statistical Computation and Simulation, Vol. 77 No. 2, pp. 175-183.

Corresponding author
Dr Ahmed F. Siddiqi can be contacted at: ahmedfsiddiqi@gmail.com
