
New Nonparametric Tests of Multivariate Locations and Scales Using Data Depth
Author(s): Jun Li and Regina Y. Liu
Source: Statistical Science, Vol. 19, No. 4 (Nov., 2004), pp. 686-696
Published by: Institute of Mathematical Statistics
Stable URL: http://www.jstor.org/stable/4144439

Statistical Science
2004, Vol. 19, No. 4, 686-696
DOI 10.1214/088342304000000594
© Institute of Mathematical Statistics, 2004

New Nonparametric Tests of Multivariate Locations and Scales Using Data Depth

Jun Li and Regina Y. Liu

Abstract. Multivariate statistics plays a role of ever increasing importance in the modern era of information technology. Using the center-outward ranking induced by the notion of data depth, we describe several nonparametric tests of location and scale differences for multivariate distributions. The tests for location differences are derived from graphs in the so-called DD plots (depth vs. depth plots) and are implemented through the idea of permutation tests. The proposed test statistics are scale-standardized measures for the location difference and they can be carried out without estimating the scale or variance of the underlying distributions. The test for scale differences introduced in Liu and Singh (2003) is a natural multivariate rank test derived from the center-outward depth ranking and it extends the Wilcoxon rank-sum test to the testing of multivariate scale. We discuss the properties of these tests, and provide simulation results as well as a comparison study under normality. Finally, we apply the tests to compare airlines' performances in the context of aviation safety evaluations.

Key words and phrases: Data depth, DD plot, multivariate location difference, multivariate scale difference, permutation test, multivariate rank test, Wilcoxon rank-sum test.

1. INTRODUCTION

Recent advances in computer technology have facilitated the collection of massive multivariate data in many industries. The demand for effective multivariate analyses has never been greater. Most existing multivariate analysis still relies on the assumption of normality, which is often difficult to justify in practice. Using the center-outward ranking induced by the notion of data depth, we describe several nonparametric tests for location and scale differences in multivariate distributions. These tests are completely nonparametric and have broader applicability than the existing tests. They can even be moment-free and thus valid for testing parameters which are not defined using moments, such as the locations of Cauchy distributions.
For testing location differences, our test statistics are constructed from the graphs observed in the DD plots (depth vs. depth plots). The DD plot, introduced by Liu, Parelius and Singh (1999), is a two-dimensional graph which can serve as a simple diagnostic tool for visual comparisons of two samples of any dimension. Different distributional differences, such as location, scale, skewness or kurtosis differences, are associated with different graphical patterns in DD plots. In this paper, we focus on the pattern associated with the location difference in the DD plots and we propose two tests for testing possible location differences between two samples. Since the data depth is affine-invariant, it provides a scale-standardized measure of the position of any data point relative to the center of the distribution. This property allows us to view our depth-based test statistics as scale-standardized measures for the location difference. Consequently, the tests can be carried out without the difficulty of estimating the variance of the null sampling distributions. Instead, we derive the decision rules by obtaining p-values using the idea of permutation.

For testing multivariate scales, we review a new rank test proposed by Liu and Singh (2003). This rank test is derived from the center-outward ranking induced by data depth assigned to the combined sample. It is constructed in a way similar to the Wilcoxon rank-sum test and can be carried out using either the Wilcoxon rank-sum table or simulations. It is a completely nonparametric test for testing the scale expansion or contraction. It includes the Ansari-Bradley and the Siegel-Tukey tests as special cases for testing the equality of variances in the univariate setting.

The tests discussed in this paper are appealing since they are guided visually by the DD plot and admit a full theoretical justification. Most importantly, they are easy to implement regardless of the dimensionality of the data.

For all the proposed tests, we present several simulation studies, including power comparisons between our proposed nonparametric tests and some existing parametric tests. The performance of our tests is generally comparable to the parametric tests under the multivariate normal setting, with only minor loss of efficiency. However, our tests dramatically outperform the parametric tests under the multivariate Cauchy setting. This is in part because our tests are moment-free and thus are more suitable for dealing with parameters not derived from moments, such as those in the case of Cauchy distributions.

The rest of the paper is organized as follows. In Section 2 we give a brief review of the notion of data depth, depth-induced multivariate rankings and DD plots. In Section 3 we describe two tests for location differences. These tests are referred to as the T-based test and the M-based test, and they are derived from the graphs in DD plots which reflect location changes between two distributions. We then carry out the tests by using Fisher's permutation test to determine p-values. We justify the validity of the tests by showing that the distribution of the obtained p-values follows approximately the uniform distribution $U[0, 1]$ under the null hypothesis and also that it decreases to 0 under the alternative hypothesis. Results from several simulation studies are presented. Under the normality assumption, our tests are comparable to the Hotelling $T^2$ test. We devote Section 4 to the scale comparison of two multivariate distributions, which includes a depth-induced multivariate rank test introduced in Liu and Singh (2003) and a graphical display of scale curve (Liu, Parelius and Singh, 1999). In Section 5 we apply our tests to compare the performance of 10 airlines using the airline performance data collected by the Federal Aviation Administration (FAA). Finally, we provide some concluding remarks in Section 6.

Jun Li is a Ph.D. candidate and Regina Y. Liu is Professor, Department of Statistics, Rutgers University, Piscataway, New Jersey 08854-8019, USA (e-mail: junli@stat.rutgers.edu, rliu@stat.rutgers.edu).

2. NOTATION AND BACKGROUND MATERIAL

We begin with a brief description of the notion of data depth and its properties.

2.1 Data Depth and Center-Outward Ranking of Multivariate Data

A data depth is a way to measure the "depth" or "outlyingness" of a given point with respect to a multivariate data cloud or its underlying distribution. It gives rise to a natural center-outward ordering of the sample points in a multivariate sample. This ordering gives rise to new and easy ways to quantify the many complex multivariate features of the underlying distribution, including location, quantiles, scale, skewness and kurtosis. This ordering in turn provides, in effect, a new nonparametric multivariate inference scheme (cf. Liu, Parelius and Singh, 1999), which includes several graphical methods for comparing multivariate distributions or samples. Some of the methods in Liu, Parelius and Singh (1999) motivated the comparison methods presented in this paper. Before we show how the depth and its ordering can be used to construct multivariate nonparametric tests, we first use the simplicial depth proposed by Liu (1990) as an example of depth measure (1) to describe the general concept of data depth and its corresponding center-outward ordering, and (2) to introduce necessary notation.

The word "depth" was first used by Tukey (1975) to picture data, and the far reaching ramifications of data depth in ordering and analyzing multivariate data were observed and elaborated by Liu (1990), Donoho and Gasko (1992), Liu, Parelius and Singh (1999) and others. Many existing notions of data depth were listed in Liu, Parelius and Singh (1999).

Let $\{Y_1, \dots, Y_m\}$ be a random sample from the distribution $G(\cdot)$ in $\mathbb{R}^k$, $k \ge 1$. We begin with the bivariate setting $k = 2$. Let $\Delta(a, b, c)$ denote a triangle with vertices $a$, $b$ and $c$. Let $I(\cdot)$ be the indicator function, that is, $I(A) = 1$ if $A$ occurs and $I(A) = 0$ otherwise. Given the sample $\{Y_1, \dots, Y_m\}$, the sample simplicial depth of $y \in \mathbb{R}^2$ is defined as

$$D_{G_m}(y) = \binom{m}{3}^{-1} \sum_{(*)} I\bigl(y \in \Delta(Y_{i_1}, Y_{i_2}, Y_{i_3})\bigr), \tag{2.1}$$

which is the fraction of the triangles generated from the sample that contain the point $y$. Here $(*)$ runs over all possible triplets of $\{Y_1, \dots, Y_m\}$. A large value of $D_{G_m}(y)$ indicates that $y$ is contained in many triangles generated from the sample, and thus it lies deep within the data cloud. On the other hand, a small $D_{G_m}(y)$ indicates an outlying position of $y$. Thus $D_{G_m}$ is a measure of the depth (or outlyingness) of $y$ w.r.t. the data cloud $\{Y_1, \dots, Y_m\}$. The above simplicial depth can be generalized to any dimension $k$ as

$$D_{G_m}(y) = \binom{m}{k+1}^{-1} \sum_{(*)} I\bigl(y \in s[Y_{i_1}, \dots, Y_{i_{k+1}}]\bigr), \tag{2.2}$$

where $(*)$ runs over all possible subsets of $\{Y_1, \dots, Y_m\}$ of size $k + 1$. Here $s[Y_{i_1}, \dots, Y_{i_{k+1}}]$ is the closed simplex whose vertices are $\{Y_{i_1}, \dots, Y_{i_{k+1}}\}$, that is, the smallest convex set determined by $\{Y_{i_1}, \dots, Y_{i_{k+1}}\}$.

When the distribution $G$ is known, then the simplicial depth of $y$ w.r.t. $G$ is defined as

$$D_G(y) = P_G\{y \in s[Y_1, \dots, Y_{k+1}]\}, \tag{2.3}$$

where $Y_1, \dots, Y_{k+1}$ are $k + 1$ random observations from $G$. Depth $D_G(y)$ measures how deep $y$ is w.r.t. $G$. A fuller motivation together with the basic properties of $D_G(\cdot)$ can be found in Liu (1990). In particular, it is shown that $D_G(\cdot)$ is affine-invariant and that $D_{G_m}(\cdot)$ converges uniformly and strongly to $D_G(\cdot)$. The affine invariance ensures that our proposed inference methods are coordinate-free, and the convergence of $D_{G_m}$ to $D_G$ allows us to approximate $D_G(\cdot)$ by $D_{G_m}(\cdot)$ when $G$ is unknown.

For the given sample $\{Y_1, Y_2, \dots, Y_m\}$, we calculate all the depth values $D_{G_m}(Y_i)$ and then order the $Y_i$'s according to their ascending depth values. Denoting by $Y_{[j]}$ the sample point associated with the $j$th smallest depth value, we obtain the sequence $\{Y_{[1]}, Y_{[2]}, \dots, Y_{[m]}\}$, which is the depth order statistics of the $Y_i$'s, where $Y_{[m]}$ is the deepest point and $Y_{[1]}$ is the most outlying point. Here, a smaller rank is associated with a more outlying position w.r.t. the underlying distribution $G$. Note that the order statistics derived from depth are different from the usual order statistics in the univariate case, since the latter are ordered from the smallest sample point to the largest, while the former start from the middle sample point and move outward in all directions. This property is illustrated in Figure 1, which shows the depth ordering of a random sample of 500 points drawn from a bivariate normal distribution. The plus (+) marks the deepest point, and the innermost convex hull encloses the deepest 20% of the sample points. The convex hull expands further to enclose the next deepest 20% by each expansion. Such nested convex hulls, determined by the decreasing depth value, also indicate that the depth ordering is from the center outward.

FIG. 1. Depth contours for a bivariate normal sample.
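As a rough illustration of the sample simplicial depth (2.1) and the resulting center-outward ordering, the sketch below enumerates all sample triangles in the bivariate case. The function names (`orient`, `in_triangle`, `simplicial_depth`, `depth_order`) are illustrative choices rather than anything from the paper, and the brute-force enumeration over all triplets is assumed to be acceptable only for small samples. Samples are assumed to come from a continuous distribution, so collinear triplets are not treated specially.

```python
from itertools import combinations


def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_triangle(y, a, b, c):
    """True if y lies in the closed triangle with vertices a, b, c."""
    d1, d2, d3 = orient(a, b, y), orient(b, c, y), orient(c, a, y)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    # y is outside exactly when it is on different sides of two edges.
    return not (has_neg and has_pos)


def simplicial_depth(y, sample):
    """Sample simplicial depth (2.1): fraction of sample triangles containing y."""
    triples = list(combinations(sample, 3))
    return sum(in_triangle(y, a, b, c) for a, b, c in triples) / len(triples)


def depth_order(sample):
    """Depth order statistics: most outlying point first, deepest point last."""
    return sorted(sample, key=lambda p: simplicial_depth(p, sample))
```

With this ordering, the last element of `depth_order(sample)` plays the role of $Y_{[m]}$ and the first element the role of $Y_{[1]}$.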

When the distribution $G$ is known, $D_G(y)$ leads to an ordering of all points in $\mathbb{R}^k$ from the deepest point outward. The deepest point here is the maximizer of $D_G(\cdot)$ (or the average of the maximizers if there is more than one), which is denoted by $\mu^*$. Clearly, $\mu^*$ can be viewed as a location parameter of the distribution $G$, and it coincides with the mean (and the center of symmetry) if $G$ is symmetric.

2.2 DD Plots for Graphical Comparisons of Multivariate Samples

Let $\{X_1, \dots, X_n\}$ $(\equiv X)$ and $\{Y_1, \dots, Y_m\}$ $(\equiv Y)$ be two random samples drawn, respectively, from $F$ and $G$, where $F$ and $G$ are two continuous distributions in $\mathbb{R}^k$. Comparisons of the two samples can be conveniently studied in the framework of testing the null hypothesis

$$H_0: F = G.$$

Depending on the specific difference we seek between $F$ and $G$, we can choose a proper alternative hypothesis to carry out the test. The so-called depth vs. depth plots were proposed by Liu, Parelius and Singh (1999) for graphical comparisons of two multivariate samples. Specifically, the DD plot is the plot of $DD(F_n, G_m)$, where

$$DD(F_n, G_m) = \{(D_{F_n}(x), D_{G_m}(x)),\ x \in \{X \cup Y\}\}. \tag{2.4}$$

This is the empirical version of

$$DD(F, G) = \{(D_F(x), D_G(x)),\ \text{for all } x \in \mathbb{R}^k\}. \tag{2.5}$$
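The empirical DD plot in (2.4) is simply a list of depth pairs, one per point of the pooled sample. A minimal sketch for bivariate data follows; it uses the simplicial depth with brute-force triangle enumeration, and the names `simplicial_depth` and `dd_plot` are illustrative assumptions, not identifiers from the paper.

```python
from itertools import combinations


def simplicial_depth(y, sample):
    """Sample simplicial depth in R^2: fraction of closed sample triangles containing y."""
    def orient(a, b, c):
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

    inside = total = 0
    for a, b, c in combinations(sample, 3):
        d = (orient(a, b, y), orient(b, c, y), orient(c, a, y))
        # y is outside the triangle only if the orientations have mixed signs.
        inside += not (min(d) < 0 < max(d))
        total += 1
    return inside / total


def dd_plot(xs, ys):
    """DD(F_n, G_m) of (2.4): one (D_Fn(z), D_Gm(z)) pair per pooled point z."""
    return [(simplicial_depth(z, xs), simplicial_depth(z, ys)) for z in xs + ys]
```

Plotting the first coordinate against the second gives the DD plot. Under identical samples every pair falls on the diagonal, while for two well-separated samples every pair has one coordinate equal to zero.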

FIG. 2. DD plots of (a) identical distributions and (b) location shift.

Note that $DD(F, G)$ as well as $DD(F_n, G_m)$ are always subsets of $\mathbb{R}^2$ no matter how large the dimension $k$ of the data is. The two-dimensional graphs of DD plots are easy to visualize and they turn out to be convenient tools for graphical comparison of multivariate samples. If $F = G$, then $D_F(x) = D_G(x)$ for all $x \in \mathbb{R}^k$, and thus the resulting $DD(F, G)$ is simply a line segment on the 45° line in the DD plot, from $(0, 0)$ to $(\max_t D_F(t), \max_t D_G(t))$. This is illustrated by the simulation result in Figure 2(a), which is the DD plot of two samples drawn from the bivariate normal distribution with mean $(0, 0)$. Deviations from the 45° line segment in DD plots would suggest that there are differences between the distributions $F$ and $G$. As it turns out, each particular pattern of deviation from the diagonal line can be attributed to a specific type of difference between the two distributions. For example, as shown in Figure 2(b), in the presence of a location shift in the two samples, the DD plot generally has a leaf-shaped figure, with the leaf stem anchoring at the lower left corner point $(0, 0)$ and the cusp lying on the diagonal line pointing toward the upper right corner. (The variations of the leaf shape reflect the magnitude of the location shift as well as the symmetry of the underlying distributions, as we discuss further in Section 3.) Figures 3(a) and (b) show yet different patterns of DD plots (the half-moon pattern and the wedge-like pattern) that are indicative, respectively, of scale and skewness differences.
FIG. 3. DD plots of (a) scale increase and (b) skewness difference.

FIG. 4. (a) Two distributions with a location shift. (b) DD plot of large location shift.

3. TESTS OF LOCATION DIFFERENCES USING DD PLOTS

As described above, DD plots can serve as diagnostic tools for detecting visually the difference between two samples of any dimension. To make DD plots rigorous testing tools, we need to construct test statistics which can capture deviation patterns in DD plots and establish the null distributions for those statistics. In this section, we focus specifically on deriving test statistics from DD plots for testing location differences. Some brief comments on testing other distributional differences are made in the concluding remarks.

Recall that $X \equiv \{X_1, \dots, X_n\} \sim F$ and $Y \equiv \{Y_1, \dots, Y_m\} \sim G$ are two given samples in $\mathbb{R}^k$. For convenience, we assume that $n = m$, although the inference methods described in this paper remain valid otherwise. Assume that $F$ and $G$ are identical except for a possible location shift $\theta$ [i.e., $G(\cdot) = F(\cdot - \theta)$]. The hypotheses of interest are then

$$H_0: \theta = 0 \quad \text{vs.} \quad H_a: \theta \ne 0.$$

Note that $F$ and $G$ are not required to be symmetric. If they are, their deepest points (i.e., the location parameters) coincide with the centers of symmetry (as well as the means). Under $H_0$, the DD plot $DD(F_n, G_n)$ should be clustered along the diagonal line, as seen in Figure 2(a). In case there is a location shift from $F$ to $G$, the DD plot exhibits a leaf shape with its tip pulling away from the upper right corner point along the diagonal line toward the lower left corner point $(0, 0)$. Note that, using the simplicial depth, $(\max_t D_F(t), \max_t D_G(t)) = (2^{-k}, 2^{-k})$, the maximum DD value achievable by the deepest point under the null hypothesis. For example, when $k = 2$, $1/4$ is the achievable maximum for $D_F(\cdot)$ and $D_G(\cdot)$. The larger the location shift is, the closer the tip of the leaf is pulled diagonally downward to $(0, 0)$. Figure 4(a) illustrates this phenomenon in a simple univariate setting, using two symmetric density functions with a location shift. The crossing of the two densities occurs at the point $t$. The cusp point of the leaf-shaped DD plot in Figure 2(b), marked by a solid dot, corresponds to $(D_F(t), D_G(t))$. If there are no location shifts, then the two densities coincide and $t$ is the deepest point for both $F$ and $G$. Hence, $D_F(t) = D_G(t) = 1/2$ and the cusp point hits exactly the right upper corner point. If the location shift widens, then the cusp point pulls further toward $(0, 0)$, as seen in the DD plots in Figures 2(b) and 4(b).

3.1 T-Based Test: Monitor Shrinking Cusp Point

The above observation suggests that the closer the cusp point is to $(0, 0)$, the more likely there is a location shift between the two underlying distributions. This suggests considering the distance between the cusp point and $(0, 0)$ as our testing statistic. Before we define this distance more precisely, we note that the sample versions of depth $D_{F_n}$ and $D_{G_n}$ are both discrete, and thus the cusp point of the DD plot may not fall exactly on the diagonal line, especially if $m \ne n$. Therefore, we first introduce the following notation to define the relative positions of any two points in $\mathbb{R}^2$ and to derive a convenient approximation of the cusp point.

For $(a_1, b_1)$ and $(a_2, b_2)$ in $\mathbb{R}^2$, we define

$$(a_1, b_1) \succ (a_2, b_2) \ \text{ if } a_1 > a_2 \text{ and } b_1 > b_2, \qquad (a_1, b_1) \prec (a_2, b_2) \ \text{ otherwise.}$$

Define the set

$$Q = \{Z \in X \cup Y: \text{there does not exist } W \in X \cup Y \text{ s.t. } (D_{F_n}(W), D_{G_n}(W)) \succ (D_{F_n}(Z), D_{G_n}(Z))\}.$$

Then the cusp point is identified or approximated by the point $(D_{F_n}(Z_c), D_{G_n}(Z_c))$ that satisfies $Z_c \in Q$ and $|D_{F_n}(Z_c) - D_{G_n}(Z_c)| \le |D_{F_n}(Z) - D_{G_n}(Z)|$ for all $Z \in Q$. Let

$$T_n = (D_{F_n}(Z_c) + D_{G_n}(Z_c))/2. \tag{3.1}$$

Then the distance from the cusp point to $(0, 0)$ is approximately equal to $\sqrt{2}\,T_n$, so using this distance is equivalent to working with $T_n$. Intuitively, the larger the location shift between the two distributions, the smaller the $T_n$ value and thus the stronger the evidence against $H_0$. To determine when $T_n$ is small enough to reject $H_0$ decisively, we need to derive the null distribution of $T_n$. The derivation of this null distribution turns out to be quite demanding and we plan to carry it out as a separate project in the future. Alternatively, we propose here to use Fisher's permutation test to determine the following p-value and complete our test procedure. (The idea of using Fisher's permutation test to obtain the p-value of a test is well known; for reference see, e.g., Chapter 15 of Efron and Tibshirani, 1993.) Let

$$p_n^T = P_{H_0}(T_n \le T_{\mathrm{obs}}), \tag{3.2}$$

where $T_{\mathrm{obs}}$ is the observed value of $T_n$ based on the given sample $X \cup Y$. The $p_n^T$ is also referred to as the achieved significance level.

REMARK 3.1. If the sample size is sufficiently large, the definition of $T_n$ in (3.1) can be approximated by

$$T_n = \max_{Z \in X \cup Y} \{D_{F_n}(Z): D_{F_n}(Z) = D_{G_n}(Z)\}. \tag{3.3}$$

REMARK 3.2. If the underlying distributions $F$ and $G$ are symmetric, the population version of $T_n$, denoted by $T$, is the depth (under either $F$ or $G$) of the midpoint of the line segment that connects the two centers of symmetry. In the setting of Figure 4(a), $T = D_F(t)$.

Without the null distribution of $T_n$, we can proceed and use the permutation method to approximate the T-based p-value defined in (3.2). The procedure is as follows.

1. Permute the combined sample $X \cup Y$ $B$ times. Here $B$ is sufficiently large. For each permutation, we treat the first $n$ elements as the $X$ sample and the remaining elements as the $Y$ sample. Denote the outcome of the $i$th permutation by $X_i^* = \{X_{i1}^*, \dots, X_{in}^*\}$ and $Y_i^* = \{Y_{i1}^*, \dots, Y_{in}^*\}$ for $i = 1, \dots, B$.
2. Obtain the DD plot for each $X_i^* \cup Y_i^*$ and evaluate the corresponding $T_n$ value [following (3.1) or (3.3)], which is then denoted by $T_i^*$, $i = 1, \dots, B$.

The empirical distribution of $T_i^*$, $i = 1, \dots, B$, can be used as an approximation of the null distribution of $T_n$. Consequently, $p_n^T$ defined in (3.2) can be approximated under $H_0$ by

$$p_{n,B}^T = \sum_{i=1}^{B} I\{T_i^* \le T_{\mathrm{obs}}\}/B. \tag{3.4}$$
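The cusp-point statistic (3.1) and the permutation approximation (3.4) can be sketched as follows. To keep the example short it uses the univariate simplicial depth (the fraction of closed intervals $[Y_i, Y_j]$ containing $y$) as a stand-in for a general depth; any other depth function with the same signature could be passed in. The names `depth_1d`, `t_statistic` and `permutation_p_value`, the default `b=200` and the fixed seed are assumptions made for this illustration, not choices from the paper.

```python
import random
from itertools import combinations


def depth_1d(y, sample):
    """Univariate simplicial depth: fraction of closed intervals [Yi, Yj] containing y."""
    pairs = list(combinations(sample, 2))
    return sum(min(a, b) <= y <= max(a, b) for a, b in pairs) / len(pairs)


def t_statistic(xs, ys, depth=depth_1d):
    """T_n of (3.1): average of the two depths at the approximate cusp point."""
    dd = [(depth(z, xs), depth(z, ys)) for z in xs + ys]
    # Q: DD points that no other DD point strictly exceeds in both coordinates.
    q = [p for p in dd if not any(w[0] > p[0] and w[1] > p[1] for w in dd)]
    # Cusp: the member of Q closest to the diagonal.
    cusp = min(q, key=lambda p: abs(p[0] - p[1]))
    return (cusp[0] + cusp[1]) / 2


def permutation_p_value(xs, ys, statistic=t_statistic, b=200, seed=0):
    """Approximate p-value as in (3.4): share of permuted statistics <= the observed one."""
    rng = random.Random(seed)
    observed = statistic(xs, ys)
    pooled = xs + ys
    hits = 0
    for _ in range(b):
        shuffled = rng.sample(pooled, len(pooled))  # one random permutation
        hits += statistic(shuffled[:len(xs)], shuffled[len(xs):]) <= observed
    return observed, hits / b
```

Because `statistic` is a parameter, the same permutation loop can be reused for any other statistic for which small values count as evidence against $H_0$.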

The above permutation procedure contains, in principle, all $n!$ permutations. If $n$ is not too large or computational speed is not a concern, then we let $B = n!$.

In theory, for any testing procedure, a valid p-value follows a uniform distribution on $[0, 1]$, $U[0, 1]$, under $H_0$. It is clear that our p-value, $p_{n,B}^T$, is valid, since it is derived from the permutation test. Our simulation results also show that, under $H_0$, the histograms of $p_{n,B}^T$ are reasonably close to $U[0, 1]$.

We have shown that the T-based test described above works well under the null hypothesis and allows us to control the type I error using $U[0, 1]$. We now proceed to evaluate the power of this test under the alternative hypothesis. Ideally, the power of the T-based test grows (or, equivalently, $p_{n,B}^T$ decreases) as the location shift increases. To this end, we conduct some simulation experiments under both bivariate normal and exponential distributions. Figure 5 shows the histograms of $p_{n,B}^T$ for the bivariate normal case, where $F = N((0, 0), I)$ and $G = N((\mu, \mu), I)$ with $(\mu, \mu)$ equal to $(0.1, 0.1)$, $(0.2, 0.2)$ and $(0.3, 0.3)$ for $G$, respectively, from top down. Clearly, as the location shift grows larger, the histograms from top down become more skewed to the right and the $p_{n,B}^T$ value leans more toward 0. This shows that the power of the T-based test grows as the location shift increases, which is a desirable property. Similar patterns of histograms of $p_{n,B}^T$ are observed for the bivariate exponential case, where the mean for $F$ is $(1, 1)$ and is $(0.9, 0.9)$, $(0.75, 0.75)$ and $(0.5, 0.5)$ for $G$. In both settings, we let $n = 100$, $B = 500$ and use the simplicial depth to obtain DD plots. For each case, the experiment is repeated 100 times to obtain 100 corresponding $p_{n,B}^T$'s.

FIG. 5. Histograms of $p_{n,B}^T$ under $H_a$, where $H_0$: $(\mu, \mu) = (0, 0)$ and $H_a$: $(\mu, \mu) = (0.1, 0.1)$, $(0.2, 0.2)$ and $(0.3, 0.3)$ (bivariate normal case).

3.2 M-Based Test: Monitor the Maximum Depth Points

We next propose another test based on the DD plot for detecting a location change in two multivariate distributions. This test statistic is more in the form of a location estimator. In the context of data depth, the location parameter of a distribution is defined as the deepest point (see, e.g., Liu, Parelius and Singh, 1999). If the two distributions $F$ and $G$ are identical, they should have the same deepest point. On the other hand, if there is a location change, the deepest point of the distribution $F$ would no longer be the deepest point of the distribution $G$ and thus it attains a smaller depth value w.r.t. $G$. The larger the location change is, the smaller this depth becomes. This trend can also be observed from the DD plots in Figures 2(b) and 4(b). This observation motivates our second test below, in which the test statistic monitors directly the depth values of the deepest points of the underlying distributions. Let

$$M_n = \min\{D_{F_n}(Z_{G_n}), D_{G_n}(Z_{F_n})\}, \tag{3.5}$$

where $Z_{G_n}$ and $Z_{F_n}$ are the deepest points among $X \cup Y$ with respect to $G_n$ and $F_n$, respectively.

REMARK 3.3. In theory, we may also consider other functions of the two depths such as the maximum in (3.5). However, we choose to work with the minimum of the two depths, because the minimum is more sensitive to the location change and it can achieve more power.

We now proceed and carry out the test by determining its achieved significance level, defined as

$$p_n^M = P_{H_0}(M_n \le M_{\mathrm{obs}}). \tag{3.6}$$

Again, we turn to Fisher's permutation test to estimate the p-value $p_n^M$. The procedure consists of the two steps outlined for the T-based test, except that each permutation replication is now used to evaluate the $M_n$ as defined in (3.5). Denote by $M_i^*$ the $M_n$ value obtained in the $i$th permutation. The p-value $p_n^M$ is then approximated by

$$p_{n,B}^M = \sum_{i=1}^{B} I\{M_i^* \le M_{\mathrm{obs}}\}/B. \tag{3.7}$$

Discussions of the validity and power of the T-based test in terms of the proposed p-value apply similarly to the proposed $p_{n,B}^M$ based on the M-based test described above. Again, histograms of the 100 simulated $p_{n,B}^M$'s under the standard bivariate normal and exponential distributions appear close to $U[0, 1]$ under $H_0$, and they skew more to the right as the location shift widens, as observed in Figure 5.

3.3 Power Comparisons: T and M Tests versus Hotelling $T^2$

Since both T- and M-based tests are completely nonparametric, it should be interesting to compare them to known parametric tests to see their loss of efficiency, if any. The first comparison is with the Hotelling (1947) $T^2$ test under the normality assumption where $F = N((0, 0), I)$ and $G = N((\mu, \mu), I)$. Each test is repeated 1000 times and the simplicial depth is used to compute needed depth values. The power of the T-based (or M-based) tests is estimated by the proportion of the simulated $p_{n,B}^T$'s (or $p_{n,B}^M$'s) which are less than the nominal type I error $\alpha = 0.05$. Table 1 lists the estimated power for $\mu = 0, 0.1, 0.2, 0.3, 0.4, 0.5$. The results clearly show that both T- and M-based tests perform comparably to the Hotelling $T^2$ test, even though the former are completely nonparametric and do not utilize the normality assumption.
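For the M-based test entering this comparison, the statistic (3.5) can be sketched in a few lines. As before, the univariate simplicial depth is used only as a compact stand-in for a general depth, and ties for the deepest point are broken by taking the first maximizer in the pooled order. Both choices, and the names `depth_1d` and `m_statistic`, are assumptions of this illustration.

```python
from itertools import combinations


def depth_1d(y, sample):
    """Univariate simplicial depth: fraction of closed intervals [Yi, Yj] containing y."""
    pairs = list(combinations(sample, 2))
    return sum(min(a, b) <= y <= max(a, b) for a, b in pairs) / len(pairs)


def m_statistic(xs, ys, depth=depth_1d):
    """M_n of (3.5): min{ D_Fn(Z_Gn), D_Gn(Z_Fn) } over the pooled sample."""
    pooled = xs + ys
    z_f = max(pooled, key=lambda z: depth(z, xs))  # deepest pooled point w.r.t. F_n
    z_g = max(pooled, key=lambda z: depth(z, ys))  # deepest pooled point w.r.t. G_n
    return min(depth(z_g, xs), depth(z_f, ys))
```

Evaluating `m_statistic` on each permuted split of the pooled sample and counting how often the permuted value is at most the observed one gives the approximation (3.7).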
TABLE 1
Power comparison under bivariate normal distributions

μ               0      0.1    0.2    0.3    0.4    0.5
T-based         0.054  0.109  0.373  0.714  0.933  0.993
M-based         0.060  0.113  0.386  0.710  0.921  0.988
Hotelling T^2   0.059  0.124  0.410  0.765  0.953  0.995
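For reference, the parametric benchmark in Table 1 is the two-sample Hotelling $T^2$ statistic. The sketch below computes it for bivariate data in its standard pooled-covariance form, together with the usual scaling that is referred to an $F(2, n + m - 3)$ distribution under normality. The function name `hotelling_t2` and the tuple-based data layout are illustrative assumptions; the paper does not describe its own implementation.

```python
def hotelling_t2(xs, ys):
    """Two-sample Hotelling T^2 for bivariate data with pooled covariance."""
    n, m = len(xs), len(ys)

    def mean(pts):
        return [sum(p[i] for p in pts) / len(pts) for i in range(2)]

    def scatter(pts, mu):
        # Sum of outer products of centred observations, stored as (s11, s12, s22).
        s11 = sum((p[0] - mu[0]) ** 2 for p in pts)
        s12 = sum((p[0] - mu[0]) * (p[1] - mu[1]) for p in pts)
        s22 = sum((p[1] - mu[1]) ** 2 for p in pts)
        return s11, s12, s22

    mx, my = mean(xs), mean(ys)
    a, b = scatter(xs, mx), scatter(ys, my)
    # Pooled covariance matrix entries.
    s11, s12, s22 = ((a[i] + b[i]) / (n + m - 2) for i in range(3))
    d0, d1 = mx[0] - my[0], mx[1] - my[1]
    det = s11 * s22 - s12 ** 2
    # d' S^{-1} d for a symmetric 2 x 2 matrix, written out explicitly.
    quad = (s22 * d0 ** 2 - 2 * s12 * d0 * d1 + s11 * d1 ** 2) / det
    t2 = n * m / (n + m) * quad
    f_stat = (n + m - 3) / (2 * (n + m - 2)) * t2  # ~ F(2, n + m - 3) under normality
    return t2, f_stat
```

Unlike the depth-based statistics, this one relies on sample means and covariances, so it is only meaningful when those moments exist.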

TABLE 2
Power comparison under bivariate Cauchy distributions

μ               0      0.1    0.2    0.3    0.4    0.5
T-based         0.052  0.060  0.114  0.154  0.214  0.350
M-based         0.046  0.072  0.118  0.214  0.324  0.522
Hotelling T^2   0.020  0.010  0.020  0.034  0.022  0.052

We also conducted the same comparison study for the bivariate Cauchy distributions with the location parameter $(\mu, \mu)$. Clearly, both T- and M-based tests outperform the Hotelling $T^2$. This can be attributed to the fact that the first two tests using the simplicial depth are moment-free approaches and thus more suitable for testing location parameters not derived from moments, such as in the case of Cauchy distributions. The results in Table 2 seem to suggest also that the M-based test is more powerful than the T-based test in the Cauchy case. We plan to investigate further the difference between the T- and M-based tests, including their robustness properties as well as their capability to cope with asymmetric underlying distributions.

4. RANK TESTS FOR SCALE EXPANSION OR CONTRACTION

Let, again, $X \equiv \{X_1, \dots, X_n\} \sim F$ and $Y \equiv \{Y_1, \dots, Y_m\} \sim G$ be two given samples in $\mathbb{R}^k$. Assume that $F$ and $G$ are identical except for a possible scale difference. For simplicity, assume that we are interested in testing if $G$ has a larger scale in the sense that the scale of $G$ is an expansion of that of $F$. In other words, the hypotheses of interest are

$$H_0: F \text{ and } G \text{ have the same scale} \quad \text{vs.} \quad H_a: G \text{ has a larger scale.} \tag{4.1}$$

Combine the two samples, that is, let $W \equiv \{W_1, W_2, \dots, W_{n+m}\} \equiv \{X_1, \dots, X_n, Y_1, \dots, Y_m\}$. If $G$ has a larger scale, then the $X_i$'s are more likely to cluster tightly around the center of the combined sample, while the $Y_i$'s are more likely to scatter at outlying positions. This outlyingness can be easily captured by data depth. Following this observation, Liu and Singh (2003) developed depth-induced rank tests to compare scales among two or multiple multivariate samples. In this paper we provide a brief review of their rank test for two samples. Detailed discussions and justifications can be found in Liu and Singh (2003).

4.1 Larger Scale - More Outlying Data - Smaller Ranks

Using any measure of depth, we can compute the depth values of the points in the combined sample $W$. We then assign ranks to the combined sample $W$ according to the ascending depth values, namely, lower ranks to the points with lower depth values. Specifically, we let $r(Y_i)$ be the center-outward rank of $Y_i$ within the combined sample, that is,

$$r(Y_i) = \#\{W_j \in W: D_{n+m}(W_j) \le D_{n+m}(Y_i),\ j = 1, 2, \dots, n + m\}, \tag{4.2}$$

and we let the sum of the ranks for the sample $Y$ be

$$R(Y) = \sum_{i=1}^{m} r(Y_i). \tag{4.3}$$

Here, $D_{n+m}(\cdot)$ is the sample depth value of $\cdot$ measured w.r.t. $\{W_1, W_2, \dots, W_{n+m}\}$. Under $H_0$, if there are no ties, $\{r(Y_1), \dots, r(Y_m)\}$ can be viewed as a random sample of size $m$ drawn without replacement from the set $\{1, \dots, n + m\}$. If $H_a$ is true, then the $Y_i$'s tend to be more outlying, and thus assume smaller depth values and thus smaller ranks. In other words, we should reject $H_0$ if the rank sum $R(Y)$ is too small. The critical values for carrying out this test can be obtained using the Wilcoxon rank-sum procedure as if one is testing a negative location shift in the univariate setting. For a review of the Wilcoxon rank-sum test and its tabulated distributions for different sample size combinations, see, for example, Hettmansperger (1984). When $m$ and $n$ are sufficiently large, following the large-sample approximation, we can reject $H_0$ if $R^* \le z_\alpha$ for an $\alpha$-level test. Here

$$R^* = \frac{R(Y) - m(n + m + 1)/2}{\{nm(n + m + 1)/12\}^{1/2}}. \tag{4.4}$$
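The rank sum (4.3) and its standardized form (4.4) can be sketched as below. The inputs `depths_x` and `depths_y` are hypothetical lists of depth values of the $X_i$'s and $Y_i$'s, already computed with respect to the combined sample by whatever depth is in use. The helper `lower_tail_p_value` assumes that $z_\alpha$ is the lower $\alpha$-quantile of the standard normal distribution, which matches rejecting for small $R^*$ but is an interpretation made for this sketch.

```python
import math


def depth_rank_sum(depths_x, depths_y):
    """R(Y) of (4.3): sum over Y of r(Y_i) = #{W_j : D(W_j) <= D(Y_i)}, as in (4.2)."""
    pooled = depths_x + depths_y
    return sum(sum(d <= dy for d in pooled) for dy in depths_y)


def standardized_rank_sum(r_y, n, m):
    """R* of (4.4): rank sum centred and scaled by its no-ties null mean and variance."""
    mean = m * (n + m + 1) / 2
    sd = math.sqrt(n * m * (n + m + 1) / 12)
    return (r_y - mean) / sd


def lower_tail_p_value(r_star):
    """Standard normal lower-tail probability Phi(r_star); small values favour H_a."""
    return 0.5 * (1 + math.erf(r_star / math.sqrt(2)))
```

The normal approximation is intended for large samples without many ties; for small samples or heavily tied ranks an exact permutation tabulation of the rank sum is the safer route.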

The depth ranking of sample points, due to its center-outward nature, often leads to ties, especially in high dimension cases. To use the tables provided for the Wilcoxon rank-sum test, we may consider the random tie-breaking scheme. However, we can actually carry out the test and obtain its exact p-value without breaking ties by the following approach. Since powerful computing facilities are easily available nowadays, we can use computers to obtain the exact distributions for the observed ranks, with or without ties. Specifically, we permute all the observed ranks (possibly including ties), calculate the sum of the first $m$ ranks in each permutation, and finally tabulate such rank sums and their corresponding frequencies in the total number of permutations. This distribution allows us to determine the exact p-value of our test, which is simply the proportion of the rank sums which are less than or equal to the observed rank sum in (4.3). As an illustrative example, we assume that $n = m = 2$ and that the ranks for the combined sample turn out to be $\{1, 2, 2, 4\}$ with a tie. The sampling distribution of the rank sum $R \equiv R(Y)$ is

$$P(R = 3) = 8/24, \quad P(R = 4) = 4/24, \quad P(R = 5) = 4/24, \quad P(R = 6) = 8/24. \tag{4.5}$$
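The tabulation in (4.5) can be reproduced by brute-force enumeration of all permutations of the observed ranks. The sketch below does this with exact fractions; the function names `rank_sum_distribution` and `exact_p_value` are illustrative, and full enumeration is assumed to be feasible only for small $n + m$ (for larger samples one would draw random permutations instead).

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def rank_sum_distribution(ranks, m):
    """Exact null distribution of the sum of the first m ranks over all permutations."""
    counts = Counter(sum(p[:m]) for p in permutations(ranks))
    total = sum(counts.values())
    return {r: Fraction(c, total) for r, c in sorted(counts.items())}


def exact_p_value(ranks, m, observed):
    """P(R <= observed) under the permutation distribution, ties included."""
    dist = rank_sum_distribution(ranks, m)
    return sum(p for r, p in dist.items() if r <= observed)
```

For the ranks {1, 2, 2, 4} with m = 2, the enumeration assigns probabilities 1/3, 1/6, 1/6 and 1/3 to R = 3, 4, 5 and 6, that is, 8/24, 4/24, 4/24 and 8/24 as in (4.5).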

Therefore, if the observed rank sum is 4, then the p-value is $P(R \le 4) = 0.5$. For large samples, the distribution of the rank sum can be approximated by considering large enough numbers of permutations.

In Table 3, we present some simulation results to examine the power of the rank test. Here the samples are from three bivariate distributions: Cauchy, normal and exponential, each with the component variance $\sigma^2$. We assume $n = m$, and consider $n = 20$ and $n = 30$. In each case 5000 random permutations of the observed ranks were used to approximate the sampling null distribution, and the rank test at significance level 0.05 was repeated 1000 times. The results in Table 3 show that the power achieved by the rank test for scale expansions is quite respectable, especially in the nonnormal cases. A power comparison between the above rank test and a $\chi^2$ test under the normality assumption can be found in Liu and Singh (2003). The results there show some minor loss of efficiency of the rank test in the normal case. Liu and Singh (2003) also discussed in detail the properties of this rank test as well as several approaches for dealing with large numbers of ties in the depth ranking. Moreover, they also generalized the rank test to the case of multiple samples.

TABLE 3

Note that the rank test described above can be viewed as the multivariate generalization of the Ansari-Bradley and Siegel-Tukey tests for testing the equality of variance in the univariate setting. Both tests try to assign smaller ranks to the data points which are more outlying toward the two tails, although the Siegel-Tukey test avoids ties by alternating ranks. If we are interested in testing whether or not $G$ has a smaller (contracted) scale, then we should reject the null when the rank sum is too large.

The rank test above is easily implementable and is completely nonparametric. Its p-value yields a decisive decision rule. The test result can be independently verified visually by two graphical tools: One is the DD plot [see Figure 3(a) and the discussion in Section 2.2]; the other is the scale curve introduced by Liu, Parelius and Singh (1999). The sample scale curve derived from a sample of size $n$ is defined as

$$S_n(p) = \text{volume}\{C_{n,p}\} \quad \text{for } 0 \le p \le 1. \tag{4.6}$$

Here Cn,p is the convex hull that contains the [npl deepest points. Roughly speaking, the scale curve measures the volume expansion of the nested depth contours,as seen in Figure 1, as the contours grow to enclose more probabilitymass. This plot of S,n (p) versus p shows the scale of the distributionas a simple curve in the plane, which is easily visualized and interpreted.When comparingthe scales of two samples, if one scale curve is consistently above the other,then the sample with the higher scale curve is more spread out and thus has a largeror expandedscale. TOAIRLINE 5. APPLICATION PERFORMANCE DATA We apply all tests described so far to an analysis of an airline performancedata set collected by the FAA. measuresof It consists of severalmonthlyperformance fromJuly 1993 to May 1998. The the top 10 air carriers measuresinclude the fractionsof nonconperformance in and operation surveillance. airworthiness formity A small nonconformityfractionis a desirablefeature. multivariate controlcharts(Liu, Severaldepth-induced Liu and 1995; Cheng, Luxh0j, 2000) have been used to monitor and compare the performancesof all 10 airlines. For illustration, the T- and M-based tests are used to determine whether there is a significant difference in location (referredto as expected target performancein the aviationsafety domain) in the distributionsthat underlietwo air carriers.In comparing air carriers 1 and 4, their scatterplots in Figure 6(a) show a clear location shift to the upper right in carrier4. The deepest point of carrier4, markedby a solid triangle,is moreto the upperrightthanthatof carrier1,

Simulatedpower of the ranktestfor scale expansions (a = 0.05) n = 30 a 1-1 1-1.2 1-2 Cauchy Normal 0.056 0.345 0.996 0.044 0.325 0.994 Exp 0.049 0.218 0.940 n =20 Cauchy Normal 0.054 0.261 0.966 0.051 0.242 0.940 Exp 0.043 0.188 0.813

FIG. 6. (a) Scatterplot and (b) DD plot for carriers 1 and 4.

marked by a solid circle. The DD plot for the two carriers in Figure 6(b) has the cusp point pulled down toward (0, 0) to the midrange of the plot and clearly indicates a location difference in the two distributions. Using both T- and M-based tests, we found the approximated p-values to be nearly 0, which confirms a significant location shift in the two distributions.

In judging airline performance, in addition to examining the expected target performance (i.e., the location of the distribution) of the airlines, the stability of the performance within the airlines is also a major concern. This measure of stability is simply the measure of scale or variation of the performance distribution. Thus, comparing performance stability amounts to comparing the scales of distributions. Larger or more expanded scales mean less stable performance. We proceed and compare the scales of carriers 1 and 4. The p-value is 0.00038 using the test statistic in (4.3), which clearly supports the conclusion that carrier 4 has a larger scale than carrier 1. In other words, the performances of carrier 4 are more scattered and hence less stable. This same conclusion can also be reached by examining the two graphs in Figure 7. Figure 7(a) is the DD plot of carriers 1 and 4 after centering the data respectively at their deepest points, removing the effect of location difference. It shows a pattern which combines Figure 3(a) and (b). This suggests that there are both scale and skewness differences between the two carriers. Figure 7(b) displays the scale curves, as defined in (4.6), of four carriers. Obviously, the scale curve of carrier 4 lies consistently above all others, including that of carrier 1. The findings are also supported by the scatterplots in Figure 6(a), which show more scattered data for carrier 4. In summary, the performance of carrier 4 is inferior to that of carrier 1, in that carrier 4 has significantly higher target nonconformity ratios and it is also much less stable overall. Possible causes should be identified and corrective measures should be taken.

FIG. 7. (a) DD plot for carriers 1 and 4 after centering. (b) Scale curves for air carriers.

6. CONCLUDING REMARKS

Although our illustrative examples are in R², all tests discussed in this paper apply to any dimension. The DD plot of the rank test in Section 3 can be constructed using any notion of data depth which is affine invariant. Some notions of depth may be more suitable than others in capturing a certain feature of a distribution. For example, if the underlying distribution is close to elliptical, then it is more efficient to use the Mahalanobis depth. Otherwise, more geometric depths such as the simplicial depth or the half-space depth (Tukey, 1975) may be more desirable since they do not require specific distributional structures or moment conditions. Details on some of these conditions for different depths can be found in Liu and Singh (1993) and Zuo and Serfling (2000). Note that it can be shown that the M-based test using Mahalanobis (1936) depth is asymptotically equivalent to the Hotelling T² test when comparing elliptical distributions. In other cases, the M-based test is more robust.

Concerning the issue of computational feasibility in computing depth, although the exact sample simplicial depth value in any dimension can be computed by solving a system of linear equations, more efficient algorithms are desirable. Rousseeuw and Ruts (1996) provided an efficient algorithm for computing both the simplicial and the half-space depths in R². Developing efficient algorithms in the case of higher dimensions has recently generated much interest in computational geometry. It is reasonable to expect rapid progress in this direction.

Some depth rank tests have been proposed by Liu and Singh (1993) for testing simultaneously location and scale changes.
It may be worthwhile to compare these rank tests separately with the T-based and M-based tests for testing location changes, and with the rank test described in (4.3) for testing scale changes.

Several graphical diagnostic tools stemming from DD plots for the two-sample problem have been proposed by Hettmansperger, Oja and Visuri (1999) and Liu, Parelius and Singh (1999). Their associated inferences need to be developed to make the graphical tools rigorous tests. Combining proper statistics derived from graphical tools with the permutation test idea may prove to be a helpful step in developing these tests.
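As a concrete instance of the depth notions discussed in these remarks, the sketch below computes the sample Mahalanobis depth in R², assuming its common form MD(x) = 1 / (1 + (x - x̄)' S⁻¹ (x - x̄)), where x̄ and S are the sample mean and sample covariance matrix. The function names are hypothetical, the restriction to two dimensions is only for brevity, and the tie convention in the ranking helper (tied points share the smallest rank, with more outlying points ranked first) is an assumption chosen to match the pattern {1, 2, 2, 4} of the illustrative example in Section 4.

```python
def mahalanobis_depth_2d(points):
    """Sample Mahalanobis depth of each point in a bivariate sample."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the sample covariance matrix (divisor n - 1).
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("sample covariance matrix is singular")
    depths = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # Squared Mahalanobis distance via the closed-form 2x2 inverse.
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        depths.append(1.0 / (1.0 + d2))
    return depths


def outlying_first_ranks(depths):
    """Center-outward ranks: lowest depth gets rank 1 (assumed tie rule)."""
    return [1 + sum(1 for other in depths if other < d) for d in depths]


if __name__ == "__main__":
    sample = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]
    depths = mahalanobis_depth_2d(sample)
    print(depths)
    print(outlying_first_ranks(depths))
```

Because this depth depends on the data only through the sample mean and covariance, it is cheap to compute, which is one reason it is attractive when the distribution is close to elliptical; the simplicial and half-space depths need the geometric algorithms cited above.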

ACKNOWLEDGMENTS

This research is supported in part by grants from the National Science Foundation, the National Security Agency and the Federal Aviation Administration.

REFERENCES

CHENG, A., LIU, R. and LUXHØJ, J. (2000). Monitoring multivariate aviation safety data by data depth: Control charts and threshold systems. IIE Trans. Operations Engineering 32 861-872.

DONOHO, D. and GASKO, M. (1992). Breakdown properties of location estimates based on half-space depth and projected outlyingness. Ann. Statist. 20 1803-1827.

EFRON, B. and TIBSHIRANI, R. (1993). An Introduction to the Bootstrap. Chapman and Hall, London.

HETTMANSPERGER, T. (1984). Statistical Inference Based on Ranks. Wiley, New York.

HETTMANSPERGER, T., OJA, H. and VISURI, S. (1999). Discussion of "Multivariate analysis by data depth: Descriptive statistics, graphics and inference," by R. Liu, J. Parelius and K. Singh. Ann. Statist. 27 845-854.

HOTELLING, H. (1947). Multivariate quality control: Illustrated by the air testing of sample bomb sight. In Selected Techniques of Statistical Analysis for Scientific and Industrial Research, and Production and Management Engineering (C. Eisenhart, M. Hastay and W. Wallis, eds.) 111-184. McGraw-Hill, New York.

LIU, R. (1990). On a notion of data depth based on random simplices. Ann. Statist. 18 405-414.

LIU, R. (1995). Control charts for multivariate processes. J. Amer. Statist. Assoc. 90 1380-1387.

LIU, R., PARELIUS, J. and SINGH, K. (1999). Multivariate analysis by data depth: Descriptive statistics, graphics and inference (with discussion). Ann. Statist. 27 783-858.

LIU, R. and SINGH, K. (1993). A quality index based on data depth and multivariate rank tests. J. Amer. Statist. Assoc. 88 252-260.

LIU, R. and SINGH, K. (2003). Rank tests for comparing multivariate scale using data depth: Testing for expansion or contraction. Unpublished manuscript.

LIU, R., SINGH, K. and TENG, J. (2004). DDMA-charts: Nonparametric multivariate moving average control charts based on data depth. Allg. Stat. Arch. 88 235-258.

MAHALANOBIS, P. (1936). On the generalized distance in statistics. Proc. Nat. Acad. Sci. India 12 49-55.

MOSLER, K. (2002). Multivariate Dispersion, Central Regions and Depth: The Lift Zonoid Approach. Lecture Notes in Statist. 165. Springer, New York.

ROUSSEEUW, P. and RUTS, I. (1996). Algorithm AS 307: Bivariate location depth. Appl. Statist. 45 516-526.

TUKEY, J. (1975). Mathematics and the picturing of data. Proc. International Congress of Mathematicians 2 523-531. Canadian Math. Congress, Montreal.

ZUO, Y. and SERFLING, R. (2000). Structural properties and convergence results for contours of sample statistical depth functions. Ann. Statist. 28 483-499.
