
The Ordering of Multivariate Data
Author(s): V. Barnett
Source: Journal of the Royal Statistical Society. Series A (General), Vol. 139, No. 3 (1976), pp. 318-355
Published by: Wiley-Blackwell for the Royal Statistical Society
Stable URL: http://www.jstor.org/stable/2344839
Accessed: 11/10/2012 19:22

J. R. Statist. Soc. A (1976), 139, Part 3, p. 318

The Ordering of Multivariate Data

By V. BARNETT
University of Sheffield†

[Read before the ROYAL STATISTICAL SOCIETY on Wednesday, April 28th, 1976, the President, Miss STELLA V. CUNLIFFE, in the Chair]

SUMMARY
In spite of the lack of a natural basis for ordering multivariate data, we encounter an extension of univariate order concepts such as medians, ranges and extremes to the higher dimensional situation. Also, much multivariate theory, and method, exploits order properties in the data or model. We examine the role of ordering in these descriptive and methodological aspects of multivariate analysis by means of a four-fold classification of sub-ordering principles.

Keywords: MULTIVARIATE DATA; ORDER STATISTICS; PARTIAL ORDERING; SUB-ORDERING; EXTREME; MEDIAN; RANGE

1. INTRODUCTION

KENDALL (1966) is not alone in observing that "order properties . . . exist only in one dimension". His remark was made in the context of multivariate discriminant analysis and classification. Bell and Haller (1969), in studying tests of bivariate symmetry, claim to show that "there is no 'natural' concept of rank" for bivariate data. In both cases, the lack of any obvious and unambiguous means of fully ordering, or ranking, observations in a multivariate sample appears as an obstacle to the development of statistical method: in particular to the extension to higher dimensions of areas of application, or general methodological advantages, of univariate order statistics.

This is not to say that the idea of order or rank is entirely absent from the multivariate scene. Indeed, a substantial effort has been directed to defining some sorts of higher dimensional analogues of univariate order concepts, and much of multivariate statistical method employs (perhaps only implicitly) various types of sub (less than total) ordering principle. This paper will review, and classify, the work in these two categories and thus has a wider brief than the thorny, possibly irresolvable, issue of how to define higher dimensional order statistics.

At the intuitive level we recognize some rough, primitive notion of order or rank in relation to, say, a bivariate scatter diagram. Consider the two random bivariate samples of 50 observations presented as Figs 1 and 2. Certain observations might appear to be "extreme": they "surprise us" by their apparent separation from the data mass. The interrelationships between the sample points in Fig. 2 (by appearing to be more "ordered" on a SW-NE axis than in Fig. 1) may suggest a greater degree of association in the second sample. Such ideas can to a limited extent be formalized and we shall propose (Section 2) a four-fold classification of sub-ordering principles which might serve in this respect, and one or more of which can be seen clearly represented in various attempts to extend univariate order concepts, or in particular results in multivariate analysis or multivariate distribution theory. We shall term these sub-ordering principles: marginal ordering, reduced (aggregate) ordering, partial ordering and conditional (sequential) ordering.

The material is presented as follows. A brief review of the nature and methods of application of univariate order statistics serves as a backcloth against which to set the higher dimensional scene. Section 2 presents a four-fold classification of sub-ordering principles for multivariate (or multi-sample) data. These various principles can be seen to be used (sometimes overtly, more often implicitly) in attempts that have been made to produce direct multivariate analogues of univariate order statistics concepts, and to underlie many results in multivariate analysis and distribution theory. The paper traces these links. In Section 3 we consider multivariate medians, ranges, extremes and order statistics. In Section 4 we proceed to examine the way in which the sub-ordering principles enter into a variety of multivariate (and multi-sample) analysis procedures. A fairly detailed review is given covering such topics as outliers,

† Work conducted in part in the Stat. Lab., SUNY, Buffalo, USA with the support of Public Health Service Grant No. CA-10810-08.

FIG. 1. Sample 1 (n = 50, p = 2). [Scatter diagram; axes x1, x2.]
FIG. 2. Sample 2 (n = 50, p = 2). [Scatter diagram; axes x1, x2.]

discriminant analysis, tests of symmetry, mixtures of multivariate distributions, tolerance regions, distribution theory for univariate samples of dependent observations, and multivariate multi-sample non-parametric methods, with brief comment on informal large-scale data screening ("data analysis") methods including probability plotting techniques, cluster analysis and multidimensional scaling. The final Section focuses on a single topic: the assessment of association (specifically correlation) between the components of a bivariate random variable. We see in particular how a certain conditional ordering principle using the idea of concomitants (David, 1973) leads to new and attractive estimates of the correlation coefficient in a bivariate normal distribution, in a "limited information" context.

Most of the work of the paper is expository; the classification and attribution of the different sub-ordering principles appears to be new. An extensive list of references provides a fairly comprehensive coverage of relevant work published in the international statistical journals over the past 20 years, together with earlier key references of historical or motivational interest.

1.1. Univariate Order Statistics
The ordering of a univariate random sample, for clear representation of the import of the sample, has long been an important principle. Such work has snowballed to the stage where we now find built up a vast statistical methodology, and associated distribution theory,


concerned with ordered samples. This is well described in the text by David (1970), and the earlier set of edited papers and tables by Sarhan and Greenberg (1962).

For a univariate sample the ordering principle is clear and unambiguous. If $x_1, x_2, \ldots, x_n$ is a random sample of observations of some continuous random variable X, we can place them in increasing order, as $x_{(1)} < x_{(2)} < \cdots < x_{(n)}$. The ordered values $x_{(i)}$ can be regarded as realized values of dependent random variables $X_{(i)}$ (i = 1, 2, ..., n) which are termed the order statistics of X for a sample of size n. Properties of the sample, or examination of the structure of X, can be approached in terms of the ordered observations $x_{(i)}$, or the order statistics $X_{(i)}$.

[An alternative method of ordering would be to order the observations in the sample in relation to their absolute deviation, or "distance", from some reference point, a. If $a < x_{(1)}$ the ordering is the same as that just described. If a is in the body of the data, e.g. at the median, quite a different ordering arises. Such "distance orderings" are seldom directly considered for univariate data, but we will see that they have an inevitable appeal in the multivariate context.]
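The two univariate orderings just described can be made concrete with a minimal Python sketch. It is offered only as an illustration and is not part of the original exposition: the function names `order_statistics` and `distance_ordering`, and the five sample values, are hypothetical.

```python
def order_statistics(sample):
    """Return the ordered values x_(1) <= x_(2) <= ... <= x_(n)."""
    return sorted(sample)


def distance_ordering(sample, a):
    """Order the observations by absolute deviation |x - a| from a reference point a."""
    return sorted(sample, key=lambda x: abs(x - a))


# Hypothetical sample of n = 5 observations.
sample = [4.1, 0.3, 2.7, 9.6, 5.0]
ordered = order_statistics(sample)

# A reference point below x_(1) reproduces the conventional ordering.
low_ref = distance_ordering(sample, a=min(sample) - 1.0)

# A reference point in the body of the data (here the sample median)
# gives a quite different ordering.
median = ordered[len(ordered) // 2]
med_ref = distance_ordering(sample, a=median)
```

With these values the conventional ordering is 0.3, 2.7, 4.1, 5.0, 9.6, whereas ordering by distance from the median 4.1 interleaves observations from either side of it.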

As a framework on which to consider potential (and actual) multivariate extensions of principle, application and method it is useful to distinguish different emphases in the univariate situation.

Natural interest in order

At the most basic level we find ordered observations used to express natural intrinsic features of a set of data, reflecting extremeness, contiguity, variability or the effect of external contamination. Extreme values can be crucially important as expressions of the worst or best that may be encountered (minimum or maximum temperatures in meteorological work; flood levels; minimum lifetimes for components in industrial reliability studies). They may pinpoint outliers, indicating foreign influences or errors in the data assembly process. Their separation (that is, the range) is a natural measure of variability. The median provides a simple assessment of location. The very form of a problem may naturally censor a set of data: a medical trial, or a reliability study, may have to be terminated before all patients, or pieces of equipment, have reached the critical stage of behaviour. Here we are forced to work with a set of lower order statistics. It is inevitable, natural.

Exploitation of order for speed, ease or efficiency

Alternatively, a set of observations may be deliberately ordered to give ease or advantage in the statistical analysis, rather than to reflect natural factors in the data. With this emphasis we encounter optimum linear order statistics estimation of scale and location parameters; short-cut methods for parameter estimation or testing based on range, mid-range, etc.; tests for outliers; methods based on contrived censoring, such as elimination or adjustment of extreme values as a robustness aid (trimmed, Winsorized, means); probability plotting methods for model validation, parameter estimation, use of ordered residuals in analysis of variance; and the whole range of non-parametric procedures based on ranks, or signs.

Distribution theory

Whether the ordering of the data serves a natural purpose, or is contrived for ease or efficiency, we need a vast battery of distribution theory results for order statistics. This aspect has been widely studied. Much is known of the forms of, and interrelationships between, distributions and moments of order statistics both for general and in specific cases. One particular area concerns limit law results for extreme values. In the univariate situation there are just three forms of limiting extreme value distribution; this is not so in higher dimensions. See Section 3.3. Distribution theory results for non-independent univariate samples can be viewed within a multivariate context and will be considered later (Section 1.2, Section 4.1) in some detail.
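A few of the order-based summaries mentioned under the "exploitation" emphasis can be sketched in Python as follows. This is an illustration under stated assumptions rather than a prescription from the paper: the function names, the symmetric count `k` of values treated at each end, and the data are all hypothetical.

```python
def sample_range(sample):
    """Separation of the two extremes, x_(n) - x_(1)."""
    xs = sorted(sample)
    return xs[-1] - xs[0]


def mid_range(sample):
    """Average of the two extremes, (x_(1) + x_(n)) / 2."""
    xs = sorted(sample)
    return (xs[0] + xs[-1]) / 2


def trimmed_mean(sample, k):
    """Mean after eliminating the k smallest and k largest ordered values."""
    xs = sorted(sample)
    if 2 * k >= len(xs):
        raise ValueError("k is too large for this sample size")
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)


def winsorized_mean(sample, k):
    """Mean after replacing the k extreme values at each end by their nearest retained neighbour."""
    xs = sorted(sample)
    n = len(xs)
    if 2 * k >= n:
        raise ValueError("k is too large for this sample size")
    adjusted = [xs[k]] * k + xs[k:n - k] + [xs[n - k - 1]] * k
    return sum(adjusted) / n


# Hypothetical data with one wild observation.
data = [2.0, 3.0, 4.0, 5.0, 100.0]
plain_mean = sum(data) / len(data)
```

For these data the ordinary mean is pulled towards the wild value, while the trimmed and Winsorized means with `k = 1` both equal 4.0; the range and mid-range, being functions of the extremes alone, are dominated by it.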


The broad classification of univariate order statistics study into two different emphases from the point of view of application, and the corresponding background distribution theory, will prove a useful basis for examining order concepts in multivariate data. Section 3, for example, examines direct parallels to the "natural" emphasis, and considers some of the associated distribution theory, whilst the later sections are more concerned with the exploitation aspect. But first we must consider what might be meant by "ordering" multivariate data, and to facilitate this discussion some comment is needed on notation and terminology.

1.2. Notation for Multivariate Samples or Distributions
We shall denote by $x_1, x_2, \ldots, x_n$ a random sample of n observations of a p-dimensional random variable, X. Corresponding with any component random variable, $X_i$ of X we have a component random sample $x_{i1}, x_{i2}, \ldots, x_{in}$ (i = 1, 2, ..., p). The component samples for different values of i are in general non-independent, reflecting the multivariate structure of X. Properties of the component random samples, or random variables, will be termed marginal. Thus moments of $X_i$ are the ith marginal moments of X; the set of ordered observations $x_{i(1)} < x_{i(2)} < \cdots < x_{i(n)}$ is the ith marginal ordered sample and the corresponding random variables $X_{i(1)}, X_{i(2)}, \ldots, X_{i(n)}$ are the ith marginal order statistics.

We can fit in this framework certain special cases.

(i) The $X_i$ are independent. Here the marginal samples are independent random samples of observations of the independent random variables, $X_i$. If the $X_i$ can be observed separately there is no reason why all marginal sample sizes need be the same. We shall denote them $n_1, n_2, \ldots, n_p$. Data of this type are better termed multi-sample rather than multivariate. The $X_i$ may on occasions be identically distributed.

(ii) Internal comparisons. Usually we will be interested in comparisons from one observation, $x_i$, to another, $x_j$. We may seek to "order" the $x_i$ (i = 1, 2, ..., n) or look for "extreme values" or "outliers". The corresponding marginal concepts are clear, and individually may be handled by univariate order statistics results. Interrelationships between marginal properties need, however, to exploit the multivariate structure of X. A special case arises with n = 1 (a single multivariate observation, x), when we order over the component observations; thus x can be ordered $x_{(1)} < x_{(2)} < \cdots < x_{(p)}$ with corresponding internal order statistics $X_{(1)}, X_{(2)}, \ldots, X_{(p)}$. If the $X_i$ are independent and identically distributed this specialization yields nothing more than order statistics for a univariate random sample of size p. If the $X_i$ are dependent, the internal order statistics are what have been termed "order statistics for dependent variables (processes)". Much has been written on these, and it is relevant to our study (if somewhat off the mainstream). See Section 4.1.

(iii) Several multivariate samples. Occasionally we shall wish to consider, simultaneously, samples from more than one multivariate distribution. To avoid confusion it is best to use distinct symbols for the different distributions and samples. Thus $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$ may be two random samples of different multivariate random variables X and Y. Within-sample (and particularly, marginal) factors will use the above designations based on x and y separately.
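The notation for marginal ordered samples and for internal ordering can be illustrated with a short Python sketch. It is a hypothetical example only: the function names and the three bivariate observations are assumptions, and observations are represented simply as tuples of length p.

```python
def marginal_ordered_samples(observations):
    """Given n observations (tuples of length p), return the p marginal ordered samples.

    The i-th returned list is x_i(1) <= x_i(2) <= ... <= x_i(n).
    """
    components = zip(*observations)  # transpose: one sequence per component
    return [sorted(component) for component in components]


def internal_order(x):
    """Order the p components of a single observation: x_(1) <= ... <= x_(p)."""
    return sorted(x)


# Hypothetical bivariate sample (n = 3, p = 2).
obs = [(3.0, 10.0), (1.0, 30.0), (2.0, 20.0)]
marginals = marginal_ordered_samples(obs)

# Vector of marginal minima; it need not coincide with any observed point.
marginal_minima = tuple(m[0] for m in marginals)
```

Here each margin is ordered separately, so the vector of marginal minima (1.0, 10.0) is not itself one of the three observations. This is one simple way of seeing that marginal ordering does not by itself order the multivariate observations.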

2. SUB-ORDERING PRINCIPLES FOR MULTIVARIATE AND MULTI-SAMPLE DATA

Accepting the futility of seeking any simple, unambiguous, universally agreeable, total ordering of the n sample values $x_1, x_2, \ldots, x_n$ among themselves, we limit our interest to ways in which some restricted form of ordering of multivariate (or multi-sample) data is feasible and advantageous. The end product of any restricted ordering principle, which we term sub-ordering, is an ordering or ranking of one or more summary features of the observations (usually quantitative and uni-dimensional) considered either individually or in combination. Sometimes we may merely achieve a relative order comparison (in declaring, say, that one observation is more, or less, extreme than another in some limited respect), or may conclude that some set of observations is of "different order" from another with no formal intra-set order comparison.

It appears convenient to distinguish four particular sub-ordering principles for multivariate data. They are not entirely mutually exclusive. Sometimes an order-based method of data study is to be found which clearly incorporates more than one of the sub-ordering principles; occasionally the method might seem to be classifiable under more than one heading, or possibly has dubious pedigree under any of the four. But the sub-division of sub-ordering principle does seem to partition the field of order-based multivariate study fairly well, either in terms of basic principle or in terms of practical interest.

The four sub-ordering principles for multivariate data are marginal ordering, reduced (aggregate) ordering, partial ordering and conditional (sequential) ordering. We shall refer to these as M-ordering, R-ordering, P-ordering and C-ordering, respectively. Separate consideration is given to some ordering methods in multi-sample data.
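Two of these ideas lend themselves to a small numerical sketch: ranking by a single uni-dimensional summary feature (the generalized distance taken up in Section 2.2) and dividing the sample into groups of "different order" with no internal comparison (the convex hull groups of Section 2.3). The Python below is a hedged illustration for bivariate data only. All function names and the eight data points are hypothetical, the caller is assumed to supply the inverse matrix `gamma_inv` directly, and the hull routine is the standard monotone-chain construction, which is an implementation choice of this sketch and not something the paper specifies.

```python
def generalized_distance(x, a, gamma_inv):
    """Quadratic form (x - a)' G (x - a), where G = gamma_inv is supplied by the caller."""
    d = [xi - ai for xi, ai in zip(x, a)]
    p = len(d)
    return sum(d[i] * gamma_inv[i][j] * d[j] for i in range(p) for j in range(p))


def reduced_order(points, a, gamma_inv):
    """R-ordering sketch: sort observations by generalized distance from a."""
    return sorted(points, key=lambda x: generalized_distance(x, a, gamma_inv))


def _cross(o, p, q):
    """Twice the signed area of triangle o, p, q (positive for a left turn)."""
    return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])


def convex_hull(points):
    """Vertices of the convex hull of bivariate points (monotone chain).

    Points lying strictly inside an edge are not returned as vertices;
    for continuous data such collinearity has probability zero.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def c_order_groups(points):
    """P-ordering sketch: repeatedly strip hull vertices; returns {point: group number}."""
    remaining = list(points)
    groups = {}
    group = 1
    while remaining:
        hull = set(convex_hull(remaining))
        for p in remaining:
            if p in hull:
                groups[p] = group
        remaining = [p for p in remaining if p not in hull]
        group += 1
    return groups


# Hypothetical sample: a square, a triangle inside it, and a central point.
points = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0),
          (1.0, 1.0), (3.0, 1.0), (2.0, 3.0), (2.0, 1.5)]
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]

by_distance = reduced_order(points, a=(2.0, 1.5), gamma_inv=IDENTITY)
groups = c_order_groups(points)
```

In this example the four corners of the square fall in group 1, the triangle in group 2 and the central point in group 3, so the lower the group number the more "extreme" the observation. The distance ordering, by contrast, assigns every point a numerical value and hence a rank (with ties possible), which the grouping does not attempt.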

2.1. Marginal Ordering (M-ordering)
As the name suggests, ordering or ranking here takes place within one or more of the marginal samples. Interest may centre on the individual ordered marginal samples as an aid to inference about the marginal distributions; certain order features on the marginal samples may be considered in combination (as in global, or component-based, concepts of median, range, extremes, etc.: see Section 3), or marginal ordering may serve as a prelude to some further sub-ordering principle (as in Bennett, 1966, on confidence intervals for ratios of marginal medians; or various correlation estimates, see Section 5.1). In addition, Singh (1960) considers marginal ordering, due to censoring, in the estimation of parameters of the multivariate normal distribution from truncated, and censored, samples. Studies of joint distributional properties of marginal order statistics include Mustafi (1969), who considers, for bivariate samples, recurrence relationships for the joint distribution function

$G_{r,s}(x_1, x_2) = P(X_{1(r)} < x_1, X_{2(s)} < x_2)$

of any two marginal order statistics from a sample of size n, and Galambos (1975) on the p-dimensional joint distribution of $(X_{1(r_1)}, X_{2(r_2)}, \ldots, X_{p(r_p)})$. More examples of the use of M-ordering will be encountered when we discuss (in Section 4) specific aspects of multivariate theory or method.

We frequently find M-ordering applied to transformations of the data set. The ordering of particular linear combinations (projections) of component values or of radial distances or angular deviations from some fixed point or direction is exploited (see, for example, Blumen, 1958; Weiss, 1960; Vincze, 1961; Bhattacharyya and Johnson, 1969; Andrews et al., 1972; Russell and Puri, 1974; for particular applications discussed in more detail later). The initial transformation may take the form of a component analysis (Gnanadesikan and Kettenring, 1972, in relation to outlier identification). Then again, the sample points may each be preliminarily reduced to a single value by some appropriate (non-linear) metric, perhaps of a generalized distance type. However, most examples of ordering after initial transformation of the data are best considered under the next heading of reduced (aggregate) ordering since their intention is not to represent marginal (joint) behaviour but to summarily express overall characteristics for the multivariate data set.

2.2. Reduced (Aggregate) Ordering (R-ordering)
With this type of ordering each multivariate observation is reduced to a single value by means of some combination of the component sample values. The metric employed is frequently of the "generalized distance" type: x being represented by a quadratic function


$(x - a)' \Gamma^{-1} (x - a)$ for some convenient choice of a and $\Gamma$; a may be the origin, the mean or the set of marginal medians (sample or population); $\Gamma$ may be the identity matrix, I, the population or sample dispersion matrix ($\Sigma$ or S) or perhaps the diagonal matrix of (sample or population) component variances. See, for example, Wilk and Gnanadesikan (1964) on informal graphical assessment of multi-response data.

In contrast to marginal ordering, the aim is to effect some sort of restricted overall ordering of the multivariate sample. This may be explicit (as, for example, in much of the work on extremes, described in Section 3.3, or outliers, Section 4.2) or merely implicit in a particular method of multivariate analysis, as we shall observe throughout Section 4.

Generalized distance measures figure widely in statistical analysis, going back to Pearson (1900). Their use as a basis for (reduced type) sub-ordering is but a small area of application. Setting $\Gamma = I$ has a primitive appeal in merely ordering the Euclidean distance of the sample points from some "centre" a, possibly the natural origin (a = 0). Its disadvantages include the disregard of second-order moment structure (and location), its frequent lack of appropriate probabilistic interpretation and its failure to reduce to the conventional ordering principle in one dimension (it yields here the type of distance ordering described at the beginning of Section 1.1).

If we knew the distributional form of X there would be some appeal in ordering in relation to probability concentration contours. For a normal distribution this adds respectability to ordering based on the generalized distance $(x - \mu)' \Sigma^{-1} (x - \mu)$ where $\mu$ is the mean vector of X (or the sample equivalent). For the data in Figs 1 and 2, the underlying distributions were in fact standardized normal with correlation 0 and 0.5, respectively. Thus concentric circles, or ellipses, would be the corresponding bases on which to assess order or extremeness. See Figs 3 and 4. But such an approach has limited utility. The distribution of X will typically be
FIG. 3. Probability circles: sample 1.
FIG. 4. Probability ellipses: sample 2.

unknown and cannot be used (except, perhaps, for identification of outliers) as a basis for expressing order or extremeness.

A slightly different version of R-ordering involves the accumulated (or aggregated) distance of each point from all the other points, rather than its distance from a single fixed point. This principle is implicit in Wilks' (1963) test for the identification of multivariate outliers (Section 4.2) and is indirectly applied in a particular definition of the multivariate sample median (see Section 3.1).

A novel possibility for reduced ordering might be found in the Fourier-type reduction of Andrews (1972), where x is represented by $x_1/\sqrt{2} + x_2 \sin t + x_3 \cos t + x_4 \sin 2t + x_5 \cos 2t + \cdots$ for some choice (or range) of values of t. More speculatively, we might even contemplate "ordering" Chernoff's faces? Chernoff (1973) suggests reducing a multivariate observation x to a caricature of the human face for ease of assimilation of x, or distinction between different x. He proposes no ordering (other than rough grouping) of the faces. But why not? Could we not imagine them subsequently ordered in terms of "beauty", or "intelligence" or "malevolence"? The scope for personal judgment is endless!

2.3. Partial Ordering (P-ordering)
The emphasis here moves away from consideration of the marginal samples or individual multivariate observations to consider overall interrelational properties in the total deployment of the sample. The way in which observations fall into different regions of the sample space, where such partitioning may be based on one of several possible principles, is used to distinguish between groups of observations with regard to order, rank or extremeness. The partitioning method in P-ordering may involve marginal properties or reduction metrics but the aim is usually restricted to limited order distinctions for the whole set of data. Specific forms are a basis for particular methods of multivariate analysis, for example, in cluster analysis or discriminant analysis (considered in more detail below: Section 4).

P-ordering results in a basis either for dividing the sample into distinct groups of different order, with no internal distinction one to another, or for making statements of relative order in which any one observation is ranked with respect to the others. We shall consider various examples of such a principle.

Figs 5 and 6 again show the random samples 1 and 2. In each case the convex hull has been constructed by drawing the minimum convex set which encloses all sample points. Those points on the perimeter are designated c-order group 1, and discarded. The convex hull of the residue is formed; those on the perimeter are c-order group 2. The process is repeated, thus providing an entirely sample-based method of dividing the data into order groups; the lower the group number, the more "extreme" the observation. Such a method of partial ordering is analogous to Tukey's proposal for "peeling" a multivariate sample (as the generalization of "trimming" a univariate sample) but it does not seem to have been discussed in detail, and is rather attractive. It might, for example, lead to a simple method for estimating correlation from a bivariate sample. Correlation estimates based merely on linear combinations of the differences in the number of line segments of positive and negative slope for different group perimeters (perhaps inversely weighted by group number) have intuitive appeal which is supported on empirical study.

Distributional results corresponding to convex hull ordering may not be too tractable but some studies have been made. Efron (1965) discusses the expected value of the area, perimeter, probability content and number of sides of the overall convex hull. His results, in the form of integrals, are manageable for uniform and normal distributions. Fisher (1969) considers all possible limiting shapes for the convex hull as $n \to \infty$, and refers to earlier work by himself (1966), Geffroy (1961, on the asymptotic behaviour of the convex hull for multivariate normal data), and Renyi and Sulanke (1963, 1964). See also Carnal (1970). Quesenberry and Gessaman (1968) refer to the use of tolerance regions to form convex hulls about sample points in their study of multivariate non-parametric discriminant analysis (see Section 4.3) extending the suggestion by Kendall (1966) that an observation be


assigned to one population or another on the basis of whether it falls in the convex hull of sample points from one or the other population.

Perhaps the simplest basis for P-ordering lies in examining the numbers of sample points which lie (in the bivariate case) within prescribed regions of different size or shape (rectangles or
FIG. 5. Convex hulls: sample 1.
FIG. 6. Convex hulls: sample 2.

circles, say). Cohen (1955) at one stage suggests manipulating a circle of fixed radius to include the maximum number of sample points, as a basis for estimating a bivariate mean. He recommends that the radius should be chosen with the aim of enclosing about 70-90 per cent of the sample. Loftsgaarden and Quesenberry (1965) present a non-parametric estimate of a multivariate density function using counts of observations in regions defined in terms of Euclidean distance; Elkins (1968), for a similar purpose, compares cubical and spherical regions and concludes that the optimal side or diameter is O(bn-4) where b depends on the true density function; Murthy (1966) uses "p-dimensional windows". Naus (1965) discusses the determination of $p(m \mid n, u, v)$: the probability that a rectangle with sides u, v, contains at least m out of n observations from a bivariate uniform distribution.

An important principle for P-ordering is found in the idea of layer ranks, due to Barndorff-Nielsen and Sobel (1966). Illustrating this idea for bivariate data, the observation $x_i$ is said to be rth layer, first quadrant-admissible (r = 1, 2, ...) if there are precisely r - 1 observations in the sample with both co-ordinates in excess of those of $x_i$. For any r, each observation is either rth layer, first quadrant-admissible, or rth layer, first quadrant-inadmissible. For the second, third and fourth quadrants, analogous attributions are defined for each $x_i$ in terms of other observations whose co-ordinates are (appropriately) greater or smaller than those of $x_i$. (For p > 2, the definitions extend naturally in terms of orthants, but only rth layer, first orthant-admissibility is considered.) For the special case where the marginal distributions are independent, Barndorff-Nielsen and Sobel discuss the distribution of the distribution-free quantities $A_r(q)$ (q = 1, 2, 3, 4; r = 1, 2, ..., n): the numbers of rth layer, qth quadrant-admissible points in a bivariate sample of size n. They also consider analogous results for p > 2 in the first orthant only.

Whereas Barndorff-Nielsen and Sobel are concerned only with distributional properties with no specific application in mind, the idea of layer ranks does seem implicit in a simple form in the work of Siddiqui (1960) on bivariate extremes, and it is used explicitly by Bhattacharyya and Johnson (1970) in a "layer rank test" for equality of two bivariate distributions where the alternative hypothesis is one-sided with more "large" observations likely from one distribution rather than the other. This distribution-free test for ordered shifts in bivariate or multivariate distributions uses the notion of "stochastic ordering for two-dimensional vectors". X, with distribution function F, is strongly (weakly) smaller than Y, with distribution function G, if $F \neq G$ and, for all (x, y), $F \geq G$ and $\bar{F} \leq \bar{G}$ ($F \geq G$ or $\bar{F} \leq \bar{G}$). The test statistic is based on sums, over each sample, of layer ranks in the combined sample.

2.4. Conditional (Sequential) Ordering (C-ordering)

The final sub-ordering principle for multivariate data is one in which ordering or ranking is conducted on one of the marginal sets of observations conditional on selection, or ordering or ranking, within the data in terms of other marginal sets of observations. Examples appear in the work of Kreimerman (1975), or the use of concomitants (David, 1973). (See Sections 3.4 and 5, respectively.) The marginal samples used may be the original ones, or those derived from some preliminary co-ordinate transformation. The process is often repeated sequentially throughout all the marginal sets of observations, or may be limited to a single stage of conditioning.

The principal example of C-ordering is found in the notion of statistically equivalent blocks. The term is due to Tukey (1947) but the concept is developed from ideas of Wilks (1941, 1942) and Wald (1943). Wilks (1948) surveys the work to that date; Anderson (1966) refines the notion and proposes many non-parametric procedures based on it; Kendall (1966), Gessaman (1970) and Richards (1972) propose rather informal applications for multivariate discriminant analysis and classification.

The idea stems from work by Wilks (1941) on the determination of distribution-free tolerance limits for a univariate distribution, based on the fact that the coverages for the intervals $I_i = [X_{(i-1)}, X_{(i)}]$ (i = 1, 2, ..., n + 1 with $X_{(0)} = -\infty$, $X_{(n+1)} = \infty$) are distribution-free. The coverage of $I_i$ is just $F(X_{(i)}) - F(X_{(i-1)})$. In higher dimensions, the multi-dimensional coverages produced by "slicing" with respect to any set of marginal order statistics will clearly also be distribution-free. This extends to hyper-rectangular coverages when the component variables are independent. Wald (1943), again concerned with tolerance regions, showed how to extend the construction of distribution-free rectangular coverages to the general (non-independent component) bivariate case. The rectangular regions were constructed by slicing with respect to chosen ordered values of the first component, then internally slicing each slice with respect to the ordered values of the second component within that slice. Tukey (1947) extended this to general multivariate situations and also relaxed the constraint of slicing parallel to the co-ordinate axes, to produce statistically equivalent blocks. Their construction is well described by Anderson (1966) in the following terms. Suppose $h_1(x), h_2(x), \ldots, h_n(x)$ are n one-dimensional functions of x, not necessarily different, and $k_1, k_2, \ldots, k_n$ is a permutation of 1, 2, ..., n. We use $h_{k_1}(x)$ to order the $x_i$ and use the $k_1$th ordered value of $h_{k_1}(x)$ to divide the sample into two blocks. One or other of these two blocks (depending on whether $k_2 > k_1$ or $k_1 > k_2$) is now divided into two in terms of the values of $h_{k_2}(x)$ for the sample points it contains. This process is continued until we finish up with n + 1 statistically equivalent blocks with distribution-free coverages.

Anderson (1966) shows how to use statistically equivalent blocks to test if the sample comes from a prescribed multivariate distribution, or that two multivariate samples come from the same distribution (using either the blocks defined by one sample, or by the combined sample, as a basis for ranking the observations). He also considers an application to discriminant


are of course non-parametric. Vincze (1961) implicitly analysis. The proposed procedures blocksforhis two-sample non-parametric Smirnov-type equivalent uses particular statistically implicit procedures have a similar non-parametric testsforbivariate samples,and manyother the use of basis. The "bivariate order statistics"of Kreimerman(1975) also illustrate statistically equivalent blocks. See Sections3.4 and 5. in Multi-sample Data and MarginalOrdering 2.5. Combined methodsfrequently use the ranks(or relative signs)of obserUnivariate non-parametric may take place within vations. When considering multi-sample comparisonsthe ordering above, multi-sample each sampleor possiblyoverthecombined set of samples. As described context. The "component"or "marginal" data can be viewedin the generalmultivariate and their samples Ordering of individual sizes maydiffer. samplesare of courseindependent overthe combinedsampleneeds to be above, but ordering is of the marginal typedescribed ordering.Many viewed as a distinct sub-ordering principlewhich we shall call combined statistical methodsemploy combined ordering. We shall not attemptto non-parametric non-parametric methods. However, reviewin detail orderconceptsin relationto univariate non-parametric methods(Section4.7) since consideration will be givento multivariate fuller ordering but also not onlythedistinction and combined, between within-sample, thesereflect natureof the data. the above fourtypesof sub-ordering for coping withthe multivariate samples.) as a set of multivariate multi-sample data is bestvieweddirectly (Multivariate data includeConover theory resultson orderedmulti-sample Some specific distribution equal sized, Each ofp independent, and marginal ordering. (1965). He employs bothcombined ordered;thep orderedsamplesare then is internally samplesfroma commondistribution properties are "ordered" in relationto theirmaximumobservations.Some distributional may investigated. See also David (1966). 
For a similar set-up, but where the distributions may possibly differ, Cohn et al. (1960) examine the probability that the p sample maxima are the p largest values in the combined sample, and the probability of non-empty intersection of the set of intervals $\{(x_{i(r)}, x_{i(r+1)}),\ i = 1, 2, \ldots, p\}$ (which is distribution-free and maximized if all distributions coincide). In the bivariate case they show that the distribution-free property disappears and probabilities of intersections of statistically equivalent blocks depend on the manner of their construction.

3. DIRECT ANALOGUES OF UNIVARIATE ORDER CONCEPTS FOR MULTIVARIATE DISTRIBUTIONS AND DATA

data is An obvious starting ordering principles applied to multivariate pointin studying of case in the namingof conceptsor expression to seek directparallelswiththe univariate contains ofthedata. The literature ideas from thepointofviewofthe"natural"representation extremes and order multivariate medians, ranges, including quite specifically manyexamples, of any directand total statistics.The authorsare not hampered by the obviousintangibility of ordering concept; theyseldom considerthe lack of formaldefinition higher-dimensional butwe have someidea of what an elephant, to define order. It woulddoubtless provedifficult The fact we mean by an "elephant"and readilyattachthe label to certainmanifestations. to themfromtrying fromanother'sdoes not prevent thatone man's "elephant"may differ of suchsubstantial communicate on a matter importance. 3.1. The Median thedetermination of themedian sampleas thatpoint Austin(1959) discusses ofa bivariate is a minimum.Known also of all observations from whichthesum of theabsolutedistances distance(travel)"it has been claimedto be the "proper as "the pointof minimum aggregate and economists and has interested ofthemedianconceptto higher dimension", generalization Weberproblem")with is also knownas the"generalized planners (to whomitsdetermination


regard to the optimum location of a storage depot to serve a set of industrial plants. Mathematicians have long been interested in the difficulty of its determination. Wide statistical interest in the 1930's (primarily expressed through the pages of the journal, Metron) has recently been revived by new proposals (Seymour, 1970; again in Metron) for numerical determination. Such a "median" will clearly reduce, in its univariate analogue, to the traditional sample median. The implicit form of sub-ordering concept is the aggregate R-ordering.

The term median is employed elsewhere quite differently: to represent merely the vector of marginal (population or sample) medians. Mood (1941) considers the joint distribution of the set of marginal sample medians; Olmsted and Tukey (1947) partition a bivariate sample about the marginal sample medians in their non-parametric "quadrant sum test" of association, and Gnanadesikan and Kettenring (1972) propose such a "median" as a robust estimator of multivariate location. Hoel and Scheuer (1961) and Bennett (1966, 1968) are concerned with joint confidence sets, and with confidence limits for ratios of marginal medians in a bivariate distribution. For example, Bennett (1966) employs the sampling distribution of the ordered values of $z_i(\theta) = x_{2i} - \theta x_{1i}$ in a bivariate sample $(x_{1i}, x_{2i})$ $(i = 1, 2, \ldots, n)$ to construct confidence intervals for the ratio $\eta/\xi$ of the marginal medians $(\xi, \eta)$. Later, in Bennett (1968), he considers two bivariate distributions whose median pairs $(\xi_1, \eta_1)$ and $(\xi_2, \eta_2)$ satisfy

$\eta_1/\xi_1 = \eta_2/\xi_2 = \alpha$,

that is, the marginal medians in the two bivariate distributions are in constant proportion to each other. By considering the signs of the differences $x_{2j} - \alpha x_{1j}$ and $y_{2j} - \alpha y_{1j}$ he constructs an estimator and confidence limits for $\alpha$. Blumen (1958) is concerned with a test for the value of the pair of marginal medians in the bivariate case.

This concept of a multivariate median involves just M-ordering but its subsequent investigation may also employ R-ordering as in the use of the ordered values of $z_i(\theta)$ by Bennett (1966).

We see for the median a basic distinction in use of order concepts which permeates multivariate studies. Generalization of a univariate concept either utilizes internal structure (perhaps through R-ordering) and is correspondingly remote from its univariate progenitor, or consists merely of the vector of marginal univariate equivalents (using M-ordering) and correspondingly loses information on the internal structure of X.

Haldane (1948) describes the marginal form of the median as the "arithmetic median" and declares it "the only reasonable generalisation ... obviously to be preferred in ordinary statistical work". He terms the aggregate distance form the "geometric median" with "certain advantages in problems of geometric probability".

These distinctions will be noticed in the following discussion of range and extremes.

3.2. The Range

It is natural to seek to represent the variability in a multivariate sample in terms of a generalization of range. Cacoullos and DeCicco (1967) consider the distribution of the "bivariate range". They propose and examine various candidates for this label, principally in relation to the circular bivariate normal distribution. In the estimation of the common standard deviation, $\sigma$, they consider, inter alia,

bivariate range: $\max_{i,j}\,[(x_{1i} - x_{1j})^2 + (x_{2i} - x_{2j})^2]^{1/2}$ (i.e. the maximum Euclidean separation between two sample points),
diagonal: $[R_1^2 + R_2^2]^{1/2}$,
figure of merit: $(R_1 + R_2)/2$,


where $R_1$ and $R_2$ are the two marginal ranges. These involve R-ordering (for the bivariate range), and M-ordering (for the diagonal, and the figure of merit). They refer also to the use of covering circles (Daniels, 1952) and of the area, or perimeter, of the convex hull of the data set, thus employing P-ordering. The detailed form of the distribution of the Studentized bivariate range, when sampling from a circular normal distribution, is discussed by Gentle et al. (1975).

Another approach to range is found in the definition of connected range by Tsukibayashi (1962). For p = 2, suppose $(X_1, X_2)$ have zero means, and the regression of $X_2$ on $X_1$ is linear and homoscedastic. A random sample $(x_{1i}, x_{2i})$ $(i = 1, 2, \ldots, n)$ has minimum and maximum x-values, $x_{(1)}$ and $x_{(n)}$. If $y_{[1]}$ and $y_{[n]}$ are the associated y-values, the "connected range" is $y_{[n]} - y_{[1]}$ and this is used in estimating the covariance, and the regression and correlation coefficients. The $y_{[i]}$ are the so-called concomitants of the ordered $x_i$ (David, 1973) and thus we also encounter the fourth type of sub-ordering principle (C-ordering) in the study of multivariate "range".

However, M-ordering is the predominant principle in the discussion of range. We saw the marginal ranges $R_1$, $R_2$ used in combination by Cacoullos and DeCicco (1967). Several authors have considered the correlation of the marginal ranges. For samples from a standardized bivariate normal distribution with correlation, $\rho$, Hartley (1950) showed how to determine $\rho_R(n, \rho)$; the correlation between $R_1$ and $R_2$ for a sample of size n. Mardia (1967) claimed to correct an error in Hartley's work and gives tabulated values for $\rho$ = 0 (0.05) 1, n = 2 (1) 10, 20. Smith and Hartley (1968) point out that there was no mistake in Hartley (1950); it was a misunderstanding. Kurtz et al. (1966) give an explicit expression for $\rho_R(n, \rho)$ when n = 2, 3, and its limiting form as $\rho \to 0$ for any n.

3.3. Extremes and Quantiles

The concept of an extreme of a multivariate sample is important intuitively and methodologically. Its role in identification of outliers and tests of multivariate structure is considered later. At this stage we examine attempts to define explicitly, in an overall or in a marginal sense, the idea of a multivariate extreme. We shall also review some relevant distribution theory.

Kudo (1957) claims that we are "justified to call the observation ... [$x_j$] which has maximum value of

$(x_j - \bar{x})' \Sigma^{-1} (x_j - \bar{x})$

... the extreme value" in a multivariate sample $\{x_j\}$ $(j = 1, 2, \ldots, n)$ with sample mean $\bar{x}$, from a distribution with known variance-covariance matrix $\Sigma$. We are, of course, unlikely to know $\Sigma$ in which case Kudo suggests replacing it with S, the sample moment estimator. He is primarily concerned with a normal distribution and the identification of outliers. (See also Section 4.2.)

Siotani (1959) adopts a similar principle of extremeness but bases it on more generalized "distance measures" $(x_j - a)' \Gamma^{-1} (x_j - a)$ where $\Gamma$ may be $\Sigma$ or S and a may be the natural origin or the true mean, as well as the sample mean. Both Kudo and Siotani are using R-ordering.

The M-ordering notion of an extreme in the sense of the set of marginal extreme values $(x_{1(1)}, x_{2(1)}, \ldots, x_{p(1)})$ or $(x_{1(n)}, x_{2(n)}, \ldots, x_{p(n)})$ has also been widely discussed. Sibuya (1960) claims that the pair of marginal extremes from a bivariate sample is the appropriate extension of the extremal concept to two dimensions. He shows that there is an infinity of possible limiting joint distributions for $(X_{1(n)}, X_{2(n)})$ (in contrast to the three which exist in the univariate case) and that for a large class of distributions, including the normal with imperfect correlation, $X_{1(n)}$ and $X_{2(n)}$ are asymptotically independent. Gumbel and Goldstein (1964) examine bivariate marginal extremes for two real-life samples: one describing oldest ages at death for the two sexes in a particular community over many years, the other maximum flood levels for various years at two points on a river. They conclude that the maximum ages are independent, the maximum flood levels, dependent. Gumbel and Mustafi (1967) examine the analytic properties of two forms of "stable bivariate extremal distributions" having marginal type I extreme value distributions. They examine some data on flood levels to determine which of the models provides the better fit. See also Gumbel (1961).
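The two notions of a bivariate extreme described in this section can be contrasted numerically. The sketch below is illustrative only: the function names, the data and the "known" covariance matrix are assumptions, and the generalized distance is taken about the sample mean as in the criterion attributed to Kudo (1957).

```python
def marginal_maxima(sample):
    """M-ordering: the vector of component-wise maxima."""
    return tuple(max(column) for column in zip(*sample))


def generalized_distance(x, centre, cov):
    """Quadratic form (x - a)' C^{-1} (x - a) for a bivariate point x."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    u, v = x[0] - centre[0], x[1] - centre[1]
    # The inverse of [[a, b], [c, d]] is (1/det) * [[d, -b], [-c, a]].
    return (d * u * u - (b + c) * u * v + a * v * v) / det


def distance_extreme(sample, cov):
    """R-ordering: the observation with the largest generalized
    distance from the sample mean."""
    n = len(sample)
    mean = tuple(sum(column) / n for column in zip(*sample))
    return max(sample, key=lambda x: generalized_distance(x, mean, cov))


# Hypothetical data: four points along the diagonal and one point off it.
sample = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (2.0, 0.0)]
cov = [[1.0, 0.8], [0.8, 1.0]]  # assumed known, strong positive correlation

print(marginal_maxima(sample))        # pair of marginal maxima
print(distance_extreme(sample, cov))  # most distant observation
```

With these assumed values the pair of marginal maxima coincides with the observation (3, 3), whereas the distance criterion picks the off-diagonal point (2, 0): the two sub-ordering principles need not single out the same observation.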

Extending Sibuya's results, Srivastava (1967) develops ideas by Finkelshtein (1953) to examine conditions for asymptotic independence of the minima $(X_{1(1)}, X_{2(1)})$, or of any pair of marginal order statistics, and Mardia (1964a) further extends this work to the set $(X_{1(1)}, X_{1(n)}, X_{2(1)}, X_{2(n)})$ of the four marginal minima and maxima. Other relevant papers are Geffroy (1959), Posner et al. (1969) and Galambos (1975). The asymptotic joint distribution of marginal quantiles (their independence, or joint normality) is considered by Siddiqui (1960); by Weiss (1964) building on earlier results of Siddiqui and of Mood (1941); and by Srivastava et al. (1964), including conditions for independence of distances between quantiles.

Other work in this area includes exact (non-asymptotic) joint distributions for marginal extrema, quantiles, or ranges for general distributions (Siddiqui 1960; Mardia, 1964b), or for particular bivariate distributions (Mardia, 1964c, on minima and ranges for multivariate normal and Pareto type 1, but only for samples of sizes 2 and 3 for normal marginal ranges).

Convergence of minima or maxima to specific forms of limiting joint distribution are also widely discussed. Berman (1962b) considers a limiting joint distribution having joint density of the form

$\phi(x_1, x_2) = \phi_2(x_2)\,[\phi_1(x_1)]^{\chi[\log \phi_2(x_2)/\log \phi_1(x_1)] + 1}$,

where $\chi(t)$ is a continuous, convex, function with $\max(-t, -1) \le \chi(t) \le 0$.

In a long and important series of papers over a period of 15 years or so, Tiago de Oliviera has considered asymptotic joint distributions for bivariate marginal extremes. In Tiago de Oliviera (1959, 1962) he shows that the asymptotic joint distribution must be a stable distribution with marginal distributions which (possibly after simple logarithmic transformations) are each univariate extreme value distributions of the Gumbel type, that is with distribution function of the form $\phi(x) = \exp(-e^{-x})$. More specifically the asymptotic joint distribution function must have a form

$\phi(x_1, x_2) = [\phi(x_1)\,\phi(x_2)]^{k(x_2 - x_1)} = \exp[-(e^{-x_1} + e^{-x_2})\,k(x_2 - x_1)]$,

where k(w), the dependence function expressing interdependence of the component random variables, must satisfy prescribed conditions. Asymptotic independence arises if k(w) = 1 and yields $\phi(x_1, x_2)$ as

$\phi_1(x_1, x_2) = \exp[-(e^{-x_1} + e^{-x_2})]$

whilst complete dependence, or diagonality, arises when $k(w) = \max(1, e^w)/(1 + e^w)$ in which case $\phi(x_1, x_2)$ is $\phi_2(x_1, x_2) = \exp\{-\exp[-\min(x_1, x_2)]\}$. In general

$\phi_1(x_1, x_2) \le \phi(x_1, x_2) \le \phi_2(x_1, x_2)$.

The distribution is exchangeable: $\phi(x_1, x_2) = \phi(x_2, x_1)$, if k(w) = k(-w). Tiago de Oliviera (1965) considers testing for independence, and estimation of $\phi(x_1, x_2)$. Subsequent papers (1968, 1970, 1971, 1974, 1975) consider the nature and properties of specific distributions in the family $\phi(x_1, x_2)$. These include the so-called logistic, mixed, Gumbel and biextremal distributions: the latter being related to the bivariate exponential distribution of Marshall and Olkin (1967). Posner et al. (1969) also consider estimation of parameters in bivariate extremal distributions.

This vast array of work on joint distributions of marginal extremes clearly uses no overall multivariate ordering concept. It is based almost exclusively on M-ordering. Occasionally, other sub-ordering principles are also present as, for example, in Siddiqui (1960) where P-ordering is used in a manner which appears to anticipate "layer ranks".
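The role of the dependence function k(w) in the family of limiting distributions above can be checked numerically. The sketch below is a hedged illustration: the independence and diagonal forms of k(w) are those quoted in the text, while the logistic form and its parameter m = 2 are an assumption supplied here (the text names the logistic model but does not give its formula).

```python
import math


def k_independence(w):
    """k(w) = 1: asymptotic independence."""
    return 1.0


def k_diagonal(w):
    """k(w) = max(1, e^w) / (1 + e^w): complete dependence."""
    return max(1.0, math.exp(w)) / (1.0 + math.exp(w))


def k_logistic(w, m=2.0):
    """Assumed logistic dependence function with parameter m >= 1;
    it gives exp[-(e^{-m x1} + e^{-m x2})^{1/m}] as the joint form."""
    return (1.0 + math.exp(-m * w)) ** (1.0 / m) / (1.0 + math.exp(-w))


def phi(x1, x2, k):
    """Joint distribution function exp[-(e^{-x1} + e^{-x2}) k(x2 - x1)]."""
    return math.exp(-(math.exp(-x1) + math.exp(-x2)) * k(x2 - x1))


for x1, x2 in [(-1.0, 0.5), (0.0, 0.0), (2.0, -0.3)]:
    lower = phi(x1, x2, k_independence)
    upper = phi(x1, x2, k_diagonal)
    middle = phi(x1, x2, k_logistic)
    # The diagonal case reduces to exp{-exp[-min(x1, x2)]}.
    assert abs(upper - math.exp(-math.exp(-min(x1, x2)))) < 1e-12
    # Independence and diagonality bound the intermediate case.
    assert lower <= middle <= upper + 1e-12
    # k(w) = k(-w) for this logistic form, so phi is exchangeable.
    assert abs(middle - phi(x2, x1, k_logistic)) < 1e-12
```

The assertions restate, for a few arbitrary points, the inequality and the exchangeability condition given in the text; they are a consistency check rather than a proof.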


3.4. Order Statistics

There seem to be few examples in the literature where a claim is advanced for a global principle of ordering the observations in a multivariate sample. One example is in Healy (1968) who suggests a "natural" ordering principle in terms of "distance" from the mean. Explicit use of the term "order statistics" in relation to bivariate distributions appears in the title of Galambos (1975, where only M-ordering is considered) and in a report by Kreimerman (1975) on a test of goodness of fit for a continuous bivariate distribution using gradually increasing numbers of order statistics. Kreimerman's "order statistics" for a bivariate sample are obtained as follows. He orders one component and selects certain ordered values to define "strip limits". For the points within each strip the values of the second component are ordered. He utilizes jointly the strip limits and a selection of ordered second-component values within each of them. Such a procedure employs C-ordering in a form akin to "statistically equivalent blocks".

4. MULTIVARIATE ANALYSIS OR THEORY INVOLVING SUB-ORDERING

We shall now review particular areas of multivariate or multi-sample analysis or theory where use is made of some form of data ordering or ranking. Several examples have already been quoted. The review in this section extends these, but must necessarily be brief and selective. In particular, we shall not attempt to survey cluster analysis or multidimensional scaling even though these can each in a sense be viewed as intrinsically concerned with ordering multidimensional data (in partial and reduced (aggregate) senses, respectively).

4.1. Distribution Theory for Ordered Dependent Observations

Suppose $X_1, X_2, \ldots, X_p$ are dependent or non-identically distributed random variables. A set $x_1, x_2, \ldots, x_p$ of one random observation from each is ordered to produce

$x_{(1)} \le x_{(2)} \le \ldots \le x_{(p)}$.

The corresponding random variables $\{X_{(i)}\}$ are referred to as "order statistics from a dependent sample (or process)". Equivalently $x = (x_1, x_2, \ldots, x_p)'$ can be regarded as a single observation of a p-dimensional multivariate random variable $X = (X_1, X_2, \ldots, X_p)'$. Internally ordering x, on a component basis, yields the $\{x_{(i)}\}$. For independent, identically distributed, $X_i$ the distributional properties of the $X_{(i)}$ have been widely studied, since the $X_{(i)}$ are just the usual univariate order statistics for a sample of size p. But there are some analogous results for dependent, or non-identically distributed, $X_i$. These are concerned in the main with exchangeable, m-dependent or equi-correlated $X_i$, or with particular assumed distributional forms for X. Most of the work in this area is concerned with the extreme value, $X_{(p)}$, but general $X_{(i)}$ and linear combinations of them (including the range, $X_{(p)} - X_{(1)}$) also feature.

The limiting form of the distribution of $X_{(p)}$ is studied by Watson (1954), by Newell (1964) for m-dependent $X_i$ and by Berman (1962) for exchangeable $X_i$ or where p is random. Gallot (1966) considers bounds on $P(X_{(p)} > c)$ for general $X_i$. David and Joshi (1968) extend to exchangeable $X_i$ some of the known results on the moments of the order statistics, $X_{(i)}$, for independent $X_i$; Young (1967) derives recurrence relationships between the density functions of the $X_{(i)}$ for exchangeable or equi-correlated $X_i$; Sen (1968) demonstrates asymptotic normality of sample quantiles for m-dependent $X_i$.

For equi-correlated standard normal $X_i$, Steck and Owen (1962), and Grieg (1967), consider the approximate distribution of the extreme values $X_{(p)}$ for any p; Gupta et al. (1973) present percentage points for $X_{(p)}$ extending tables of the probability integral by Gupta (1963); and Owen and Steck (1962) express moments and product moments for $\{X_{(i)}\}$ in terms of these for uncorrelated standard normal $X_i$. Teichroew (1955) and Kapur (1957) consider the distribution of $X_{(p)}$ when the $X_i$ are independent normal, but one has mean different from the others. Both are concerned with a p-sample slippage problem. Further study of $X_{(p)}$ includes Afonja (1972) for general correlated normal and t-variates, and Kozelka (1956) for multinomial sampling.


The range $X_{(p)} - X_{(1)}$ is considered by Ishii and Yamasaki (1961) for binomial $X_i$ and by Gupta et al. (1964) for correlated normal $X_i$. This latter paper, and Gupta and Pillai (1965), give limited consideration to linear combinations of the $X_{(i)}$, and ratios of such linear forms.
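The internal ordering of a single p-dimensional observation discussed in Section 4.1, and the extreme, range and linear combinations derived from it, can be sketched as follows. The function names are hypothetical, and the construction of equi-correlated standard normal components from one shared and p independent standard normal variables is a standard device assumed here for illustration (it requires a correlation between 0 and 1).

```python
import math
import random


def internal_order(x):
    """Order the components of one observation: x_(1) <= ... <= x_(p)."""
    return sorted(x)


def linear_combination(ordered, weights):
    """Linear combination of the ordered components."""
    return sum(w * value for w, value in zip(weights, ordered))


def equicorrelated_normal(p, rho, rng):
    """One draw of p standard normal components with common correlation
    rho (0 <= rho <= 1), built from a shared and an individual part."""
    shared = rng.gauss(0.0, 1.0)
    return [math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for _ in range(p)]


x = [0.5, -1.0, 2.5, 0.25]                           # illustrative observation
ordered = internal_order(x)
extreme = ordered[-1]                                # x_(p)
sample_range = linear_combination(ordered, [-1, 0, 0, 1])  # x_(p) - x_(1)
print(ordered, extreme, sample_range)

# A dependent observation with assumed common correlation 0.5.
draw = equicorrelated_normal(5, 0.5, random.Random(1))
print(internal_order(draw))
```

The range appears here as one particular linear combination of the ordered components, which is how it enters the work cited in the text.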
This suggests an interesting prospect. For a multivariate sample $x_1, x_2, \ldots, x_n$ we might contemplate a particular form of C-ordering: first ordering the component values within any $x_i$ and then ordering the $x_i$ in relation to values of some linear combination of the ordered component values.

4.2. Multivariate Outliers

One area of multivariate analysis which cannot avoid order relationships is the identification of outliers. The principal reference is still the work by Wilks (1963) on "multivariate statistical outliers". For a multivariate normal distribution Wilks considers in detail a test for a single outlying value and for two outliers; tests for more than two outliers are discussed in general terms but without any relevant tabulation. The identification procedure is based on outlier scatter ratios. For the t-th observation the "one outlier scatter ratio" is

$S_t = |a_{ij}^{(t)}| / |a_{ij}|$,

where $|a_{ij}|$ is the "internal scatter" (the determinant of the matrix of sample sums of squares and products about the component sample means) and $|a_{ij}^{(t)}|$ the analogous quantity with the observation $x_t$ omitted. The $S_t$ $(t = 1, 2, \ldots, n)$ are ordered as $S_{(1)}, S_{(2)}, \ldots, S_{(n)}$ and the observation yielding the smallest $S_t$, that is, $S_{(1)}$, is identified as an outlier if $S_{(1)}$ is sufficiently small. For two (or more) outliers the same principle is proposed except that the numerator term in the outlier scatter ratio is determined on omission of any two (or more) sample points. Tables of useful upper bounds for the lower tail probabilities of $S_{(1)}$ are presented for the cases of one, or two, possible outliers under the null hypothesis of an uncontaminated sample. The sub-ordering principle used is clearly R-ordering equivalent to ordering the sums of squares of the volumes of all possible simplexes which can be obtained on omission of different observations (either individually, or in pairs, etc., depending on whether we anticipate one, two, etc. outliers).

Guttman (1973) presents a Bayesian approach to the identification of a single outlier in a multivariate normal distribution. He assumes that all but one observation came from a $N(\mu, \Sigma)$ distribution, the anomalous observation comes from a $N(\mu + a, \Sigma)$ distribution. Identification of an outlier is based on the posterior distribution of a.

Healy (1968) proposes a graphical test based on R-ordering for the case of a bivariate normal distribution: plotting ordered values of $(x - \mu)' \Sigma^{-1} (x - \mu)$ against the means of the reduced order statistics for a univariate exponential distribution as a test for normality or outliers. Alternatively, he suggests a square-root, or cube-root, normalizing transformation of the generalized distances. Gnanadesikan and Kettenring (1972) extend such an approach by suggesting a range of graphical methods including marginal probability plots, plots of generalized distances or plots of low-order components after a preliminary component analysis. They explain that in higher dimensions an outlier does not retain its simple univariate expression as "the one that sticks out at the end" of the sample. Other multivariate outlier procedures based on ordering the principal components include Hawkins (1974) and Fellegi (1975, in the context of the automatic editing of quantitative data). Rohlf (1975) uses R-ordering to produce a "generalized gap test" for outliers based on lengths of the edges of minimum spanning trees in the matrix of Euclidean distances between all pairs of observations. See also Devlin et al. (1975).

Karlin and Truax (1960) and Ferguson (1961) discuss tests for multivariate outliers using Studentized distance measures (R-ordering) and a slippage-type alternative hypothesis.

4.3. Discriminant Analysis and Classification

Both Anderson (1966) and Kendall (1966) use sub-ordering of multivariate data in discriminant analysis. Having samples from each of two populations, the aim is to use the


information they portray to assist in the assignment of a new observation to the appropriate population. Anderson shows how statistically equivalent blocks (C-ordering) can promote a method of discriminant analysis. Kendall's proposals are less formal. They amount to sequentially considering the components of the multivariate observations in some sort of decreasing order of importance. The first component is that which effects "the best division" between the populations; as reflected by the ordered component values in the two samples (M-ordering). The new observation may be assigned on the basis of the chosen component alone; if this is not possible, the second most important component (similarly structured from ordered values of the second component for the two samples within the uncertainty range for the first component) is considered as a basis for assignment. This process is continued until the new observation has been assigned or until all components have been considered. ("We end with a residuum of cases which are undecided.") An alternative procedure proposed by Kendall employs P-ordering through the concept of sample convex hulls. If $A_1$, $A_2$ are the convex hulls for the two samples, a new observation x is assigned to population 1 or 2 if

$x \in A_1 \cap \bar{A}_2$ or $x \in \bar{A}_1 \cap A_2$;

otherwise it is not assigned to either. Discussion of these methods is qualitative with no consideration of statistical properties. A principle similar to Kendall's use of convex hulls is employed by Quesenberry and Gessaman (1968), except that the convex hull is determined from tolerance regions rather than from the basic sample configurations. Richards (1972) "refines and extends" Kendall's successive elimination approach by informally considering pairs of components at certain stages of the elimination process.

Gessaman and Gessaman (1972) review and compare multivariate non-parametric discrimination procedures, including Anderson's and Kendall's ideas and their own particularization of statistically equivalent blocks. See also Gessaman (1970).

4.4. Mixtures of Distributions

Suppose that independent observations arise from each of three distributions $F_1$, $F_2$ and $F_3$, and n such triplets are available. We wish to test if $F_3$ is a mixture of $F_1$ and $F_2$ in the form

$F_3 = \theta F_1 + (1 - \theta) F_2$.

The data can be regarded as a sample of n multivariate observations with p = 3. Thomas (1969) constructs a non-parametric test based on internal ordering of each triplet. If the ranks are (1), (2), (3) a score is assigned as zero if [(1), (2), (3)] is an even permutation of [1, 2, 3] or one otherwise. The test statistic is a combination of such rank scores over the n observation triplets. This is an example of the rather uncommon application of internal, or component-by-component, ordering, and also illustrates the widespread principle of "rank scoring" in non-parametric analysis. See also Sections 4.5 and 4.7. Chatterjee (1972) considers the multivariate extension of such a model. Concerned with estimating $\theta$, he uses "linearly compounded rank scores". Ranks are assigned for each component in the combined sample from all three distributions and the resulting "rank matrix" has scores attached to each element. Combination of the rank scores yields an estimator of $\theta$, which employs combined (marginal) ordering.

4.5. Tests of Symmetry

Several authors use ordering or ranking methods to construct tests of symmetry for a bivariate distribution. If $F(x_1, x_2)$ is the distribution function, they are concerned with testing the hypothesis that $F(x_1, x_2) = F(x_2, x_1)$. Sen (1967) combines the 2n component observations, ranks overall, and separates the components into a 2 x n "collection rank matrix". Here we see combined ordering applied to non-independent components in a multivariate sample, rather than its more usual application to independent univariate samples (multi-sample data).
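The construction of a 2 x n "collection rank matrix" described above (pool the 2n component values, rank them overall, then separate the ranks by component) can be sketched in a few lines. The function name and data are hypothetical, ties are assumed absent, and the sketch covers only the ranking step, not the test statistic of Sen (1967).

```python
def collection_rank_matrix(pairs):
    """Rank all 2n component values together and return the ranks as a
    2 x n matrix: row 0 for first components, row 1 for second."""
    n = len(pairs)
    pooled = [x1 for x1, _ in pairs] + [x2 for _, x2 in pairs]
    order = sorted(range(2 * n), key=lambda i: pooled[i])
    rank = [0] * (2 * n)
    for r, index in enumerate(order, start=1):
        rank[index] = r
    return [rank[:n], rank[n:]]


# Three bivariate observations (illustrative values only).
pairs = [(1.2, 0.3), (2.5, 2.0), (0.8, 3.1)]
for row in collection_rank_matrix(pairs):
    print(row)
```

Each column of the resulting matrix holds the two overall ranks belonging to one bivariate observation, so the pairing of the non-independent components is retained after the combined ordering.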


Hollander (1971) bases a test on the sample distribution function, implicitly using a P-ordering concept akin to layer ranks. Bell and Haller (1969) consider parametric and nonparametric tests and show that "all distribution-free ... procedures are based on ranks". They use M-ordering.

4.6. Some Multi-sample Comparisons

We have seen how sub-ordering principles in multi-sample data (for testing equality of, or interrelationships between, distributions) usually lead to consideration of rank order. The use of signs of relative differences in (component) observation values is also considered (see Section 4.7). However, some multi-sample, order-based methods retain quantitative features of the data.

Lewis (1972) considers the ranges of each of p univariate samples from distributions differing at most in their location and rejects equality if at least two ranges do not overlap.

For two bivariate distributions, Vincze (1961) constructs a Smirnov-type test of equality by using signed projections of the bivariate observations on a line of random angle, $\alpha$, relative to the $X_1$ axis. The signed projections are ordered over the two samples in combination. The ordering is of reduced, and combined, form. He refers to the unsolved problem of determining the distribution (or limiting distribution) of the Smirnov statistic as we let $\alpha$ vary over $(0, 2\pi)$.

Weiss (1960) proposes a test of equality of two multivariate distributions based on "distances" between any two observations within each multivariate sample and on numbers of observations in one sample within spheres of given radius about each observation in the other sample.

4.7. Non-parametric Procedures (Ranks and Signs)

Almost all non-parametric statistical methods involve some idea of ordering. This usually takes the form of rank order or signs of relative differences, either within samples or over the combination of samples for multi-sample data, expressed in terms of individual component variate values or of some one-dimensional metric on the sample space of the multivariate observations (one of the earliest references to use such R-order ranks is van Dantzig and Hemelrijk, 1954). Thus sub-ordering concepts are of marginal, reduced and combined forms.

Non-parametric multi-sample methods for univariate samples have extensive coverage in the literature and we cannot hope to provide a comprehensive survey even in the limited respect of their order-related basis. Some examples have been discussed above. Others include Mosteller (1948) and Mosteller and Tukey (1950) on a p-sample slippage test (based on picking the sample with the largest observation and counting the number of its observations which exceed all observations in the other samples) and tests of equality of distribution for several samples; for two samples with censoring, Saw (1966) and Young (1970, 1973) consider rank order in the combined sample, or numbers of times observations from one sample exceed those from the other sample, to obtain censored forms of Mann-Whitney, Smirnov, Wilcoxon, "median" or "precedence" tests; for p-samples, Breslow (1970) offers a generalized Kruskal-Wallis test for censored data and Bhapkar (1961) a test based on numbers of p-plets which can be formed (comprising one observation from each sample) such that the observation from the ith sample has the smallest value. See also Dwass (1960), Savage (1964) and Odeh (1967).

Non-parametric multivariate methods have been reviewed by Puri and Sen (1971). For multivariate multi-sample data we encounter all four possibilities for sub-ordering of the multivariate observations, within any sample, as well as over the combined set of samples. Ranks and signs are the usual expression of ordering for the development of non-parametric methods. Marginal ranks and combined marginal ranks feature in this work, expressed as rank matrices for each component or aggregated over the components to form rank scores. Various forms of scoring of ranks are employed. Shane and Puri (1969) proposed a family of tests based on sums of quadratic forms of linear signed rank order statistics; results are


extended by Russell and Puri (1974). Rank order multi-sample tests of location are discussed by Sen and Puri (1967); tests of independence of subsets of components based on a single sample are considered by Puri et al. (1970); tests of linear combinations of component means for p-samples (one of which is a control sample) are examined by Tamura (1969); Sen (1969) considers tests for parallelism of several regression lines, whilst Sen and Puri (1970) present a multivariate analysis of covariance based on marginal ranks. Puri and Sen (1966) offer other multivariate rank order non-parametric procedures, as do Karlin and Truax (1960), Bennett (1964, 1968), Bhapkar (1966), Bhattacharyya and Johnson (1970) and Johnson and Mehrotra (1972). Mardia (1969b, 1970) considers a test of homogeneity of bivariate distributions based on ranks (in the combined sample) of angles between the observations and the overall sample mean.

Some work on sign tests for multivariate data includes Bennett (1962, concerning equality of means in two correlated multivariate normal distributions, $p \le 4$); Hodges (1955) and Bhattacharyya and Johnson (1969) using numbers of positive projections of observations on an arbitrary line, maximized with respect to the choice of the line; Blumen (1958, a test for the value of a bivariate median based on ordering slopes of lines from the observations to the sample median).

4.8. Graphical Methods

There has been built up over recent years an array of informal procedures for screening and examining large-scale data sets. Lacking much knowledge of their statistical properties, such methods of "data analysis" nonetheless have high intuitive appeal and are finding widespread application.
They include methods of cluster analysis and of multidimensional scaling; also means for quick assimilation of certain multivariate characteristics through judicious choice of graphical plotting procedures. These extend the univariate methods for assessing the validity of a probability model, or estimating its parameters (see, for example, Barnett, 1975), based on plotting ordered sample values against a convenient choice of plotting positions; or the "half-normal plots" of Daniel (1959) in the interpretation of factorial experiments. For multivariate data the methods still depend on an assessed ordering of certain aspects of the data and are, therefore, relevant to this review of multidimensional ordering.

Andrews et al. (1972) propose plotting of ordered radii, or angles, for bivariate data after polar transformation, to assess bivariate normality. Many papers by Wilk and Gnanadesikan (and vice versa) including (1961, 1964, 1968, 1969), and Gnanadesikan and Kettenring (1972), propose graphical procedures for a variety of purposes such as examination of residuals, identification of outliers and the determination of significant effects in multiple response experimental data. Such work employs M-ordering, but more frequently R-ordering, based on "generalized distances" of various types and often leading to "gamma plots" in view of the ubiquity of the $\chi^2$ distribution for such distances resulting from an underlying normal error model.

5. CORRELATION AND ASSOCIATION

One area of multivariate analysis in which order plays an important implicit role is in the estimation and testing of association, or correlation, between the component variables in a bivariate distribution. Many measures of association, and tests of independence, are based on ranking the component observations or counting numbers of observations in different regions of the sample space after it has been partitioned in some manner. Thus we encounter M-ordering (sometimes C-ordering or P-ordering). We cannot fully survey the work on association or correlation but will make a brief review of estimators or tests in which ordering plays a notable part. References will be given only to work in which the order basis is particularly pronounced, or novel in some respect. A somewhat fuller description is given of some

336

BARNErr

- The Ordering of Multivariate Data

[Part3,

basedon the normal distribution, coefficient, p, in a bivariate estimators ofthecorrelation informawhere wehavelimited andrelevant to situations C-ordering concept ofconcomitants, samples. tionon oneofthemarginal counts of ofcorrelation basedon frequency many estimators or tests P-ordering underlies in different stem sample space. (Manysuchexamples regions ofthepartitioned observations of Karl Pearson and UdnyYule,and have often in thework from theturn ofthecentury, occasions.)This is trueof the on varioussubsequent been rediscovered or re-examined spaceis partiin which thesample or biserial measures ofcorrelation, tetrachoric, polychoric 2 x 2, rx s or2 x s tables or of andofcoefficients ofcolligation tioned to form offrequencies; possible only oftheir properties is often of suchmeasures and assessment Interpretation including the Tests of independence, normaldistribution. if we assumean underlying x2test, are manifold. Often is outside thecontrol of thebasisforcategorization ubiquitous whichare not readily in any or reflects factors interpretable the investigator qualitative inwhich sense.Ourcurrent interest is restricted tosituations orranking thecategories ordering reflect either orimplicit, ofthe twomarginal variables whether ofclassification ordering direct, or quantitative. be qualitative Herewe witness partialsub-ordering. they variables with respect be so if, for twoquantitative were dichotomized example, Thiswould values ofeachfor determination of(say)tetrachoric coefficient correlation, andsmall to large animproved orapplication ofa x2test ofindependence. Mosteller ofcontingency (1946)offers 4 p) bydichotomizing of p in N(p1, estimator x2about , but tetrachoric correlation p2, a2, in thefour corners themiddle forx3)improve on the region x > +kal. Counts (omitting k for estimator on p. The optimum tetrachoric (k = 0) forspecialchoiceof k dependent andvariances when hasasymptotic variance 1.939/n. 
Ifthemeans p = 0 is 0'612, theestimator accrue ifwe partition on thebasisof Mosteller shows thatsimilar areunknown, advantages ofthedataset:retaining mobservations with andmwith a certain highest, lowest, C-ordering in eachretained and dividing those observations intotwogroups ordered xl-group xl-values = 0-27is optimal.See also Ogawa to their ordered When x2-values. with p = 0, mln respect and Hamdan(1964)and Hamdan (1954,1959),Lancaster (1962). Goodmanand Kruskal ofmeasuring association from other tables. contingency aspects (1970)consider and tests forassociation havebeenproposed, and studied, by Hotelling Corresponding in thefour and Tukey test"basedon counts Pabst(1936)and byOlmsted (1947,a "corner in each aboutthemarginal medians.Countsare made inward corners after partitioning until themedian from thefour extremities forced to cross lines.Alternate + and quadrant in the successive and the accumulated to the counts signed quadrants signsare attached for is alsoknown as the a test statistic Thetest counts association. "quadrant assessing provide a deadregion abouteachmarginal mean ofwidth sumtest").Shahani leaving (1969)suggests suitable choice ofk or,if,u andai areunknown, aboutthemid40 percent 2kacfor omitting 40percent ofthe(marginal) andthe mid ordered ordered ofthe(marginal) x2-values. xl-values infour Elston and on tests basedon counts further work Mardia(1969)presents quadrants. a contingency a testof association basedon constructing tableofcounts Stuart (1970)offer order statistics. classboundaries defined with byparticular marginal observations are in estimating when use ofordering association arises Another or testing ror ranked correlation coefficients such as Kendall's as inthe ofrank determination marginally ofthe A detailed review order tests ofindependence. comparative Spearman's p, or in rank use of"ordinal measures ofassociation" is given (1964) byKruskal (1958). 
See also Aitkin to thecomparison of andAitkin andHume(1965,1966, ofrank correlation 1968).Extension with correlations between X1and X2,and X3 and X4,in a multivariate distribution p = 4, is considered byDavis and Quade(1968).
to x1< the xl-rangeinto threepartscorresponding dividing -kal,
^

Estimators, and Tests,ofAssociation 5.1. Order-based

contingency.

-kka <x1 <^ +kal,
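The corner-count partition described for Mosteller (1946) can be sketched in a few lines. The sketch below is illustrative only: the function names are hypothetical, the means and standard deviation are assumed known, and only the $k = 0$ case is turned into an estimate, using the standard bivariate normal relation P(same side of both means) $= 1/2 + \arcsin(\rho)/\pi$, which is not stated in the text above. Mosteller's estimator for $k > 0$ is not reproduced here because its form is not given in this review.

```python
import math


def corner_counts(pairs, mu1, sigma1, mu2, k):
    """Count observations in the four corner regions.

    x2 is dichotomized about mu2; the x1-range is cut at mu1 - k*sigma1 and
    mu1 + k*sigma1, and observations in the middle x1 band are omitted.
    """
    counts = {"low_low": 0, "low_high": 0, "high_low": 0, "high_high": 0}
    for x1, x2 in pairs:
        if x1 < mu1 - k * sigma1:
            side1 = "low"
        elif x1 > mu1 + k * sigma1:
            side1 = "high"
        else:
            continue  # middle region for x1 (or x1 exactly on a boundary)
        side2 = "high" if x2 > mu2 else "low"
        counts[f"{side1}_{side2}"] += 1
    return counts


def quadrant_rho(counts):
    """k = 0 estimate of rho from the proportion of 'agreeing' corners.

    Assumes bivariate normality and inverts
    P(agree) = 1/2 + arcsin(rho)/pi (an assumption of this sketch).
    """
    agree = counts["low_low"] + counts["high_high"]
    total = sum(counts.values())
    p_agree = agree / total
    return math.sin(math.pi * (p_agree - 0.5))


sample = [(-1.0, -1.0), (1.0, 1.0), (-2.0, -0.5),
          (2.0, 3.0), (0.1, -0.2), (-0.3, 0.4)]
c0 = corner_counts(sample, mu1=0.0, sigma1=1.0, mu2=0.0, k=0.0)
c_half = corner_counts(sample, mu1=0.0, sigma1=1.0, mu2=0.0, k=0.5)
```

With `k=0.5` the two observations nearest the $x_1$ mean fall in the omitted band, which is the mechanism by which the corner counts concentrate on the more informative extreme $x_1$-values.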


A basic order concept of relevance to rank correlation is the idea of order association, proposed by Kendall (1962) and having some similarity with the "layer rank" concept. Two bivariate observations $(x_{11}, x_{21})$ and $(x_{12}, x_{22})$ are said to be:

concordant, if $(x_{11} < x_{12},\ x_{21} < x_{22})$ or $(x_{11} > x_{12},\ x_{21} > x_{22})$;

discordant, if $(x_{11} < x_{12},\ x_{21} > x_{22})$ or $(x_{11} > x_{12},\ x_{21} < x_{22})$;

tied, otherwise.

The "quadrant dependence" concept described by Lehmann (1966) is relevant here, in providing a model for association between random variables $X_1$ and $X_2$ in which large (or small) values of $X_1$ and $X_2$ tend to be associated (either positively or negatively).

Daniels (1944, 1948) proposes a general class of correlation estimators ("coefficients of disarray") which include as special cases the familiar particular estimators (product-moment, Kendall's $\tau$ and Spearman's $\rho$). Farlie (1961) investigates the efficiency of Daniels' generalized correlation coefficients.

Other quick estimators of $\rho$ using alternative sub-ordering ideas have been proposed by Chown and Moran (1951) in the form
$$\sum_i \{\operatorname{sgn}(x_{1,i} - x_{1,i+1})\,\operatorname{sgn}(x_{2,i} - x_{2,i+1})\}/(n - 1)$$

with efficiency about 0.33 when $\rho = 0$ (but note how the estimator depends on the fortuitous order in which the observations arise), and by Leigh-Dugmore (1953) based on the "range of the deviations about the reduced major axis".
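The concordant, discordant and tied classification of pairs of bivariate observations translates directly into code. The following is a minimal sketch with hypothetical function names; the final function computes the usual tau-a form of Kendall's coefficient (concordant minus discordant pairs, divided by the number of pairs), which is a standard definition assumed here rather than one quoted from the text.

```python
from itertools import combinations


def classify_pair(a, b):
    """Classify two bivariate observations a = (x11, x21), b = (x12, x22)."""
    (x11, x21), (x12, x22) = a, b
    # The product of the component differences is positive exactly when the
    # two components are ordered the same way, negative when they disagree.
    s = (x11 - x12) * (x21 - x22)
    if s > 0:
        return "concordant"
    if s < 0:
        return "discordant"
    return "tied"


def pair_counts(sample):
    """Count concordant, discordant and tied pairs over all n(n-1)/2 pairs."""
    counts = {"concordant": 0, "discordant": 0, "tied": 0}
    for a, b in combinations(sample, 2):
        counts[classify_pair(a, b)] += 1
    return counts


def tau_a(sample):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    c = pair_counts(sample)
    n_pairs = len(sample) * (len(sample) - 1) // 2
    return (c["concordant"] - c["discordant"]) / n_pairs


sample = [(1, 1), (2, 3), (3, 2), (4, 4)]
counts = pair_counts(sample)
```

Unlike the Chown and Moran form, which uses only successive observations, this count runs over all pairs and so does not depend on the order in which the observations happen to arise.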

5.2. Correlation Estimators Based on Concomitants

Watterson (1959) examined linear estimation of the parameters of a multivariate normal distribution when different forms of censoring affect the observational data. To discuss the idea of multivariate censoring he suggested a method of ordering a multivariate sample $x_1, x_2, \ldots, x_n$ in terms of the marginal ordering of one component (say, $x_1$). The sample is then represented as

$$
\begin{array}{cccc}
x_{1(1)}, & x_{1(2)}, & \ldots, & x_{1(n)} \\
x_{2[1]}, & x_{2[2]}, & \ldots, & x_{2[n]} \\
\vdots & & & \vdots \\
x_{p[1]}, & x_{p[2]}, & \ldots, & x_{p[n]};
\end{array}
$$

where $x_{r[s]}$ $(r > 1)$ is the value of the $r$th component which has arisen in association with the $s$th ordered value of $x_1$. Thus we observe a C-ordering principle being used: the $x_{r[s]}$ are quasi-ordered component observations conditional on an ordering of the $x_1$-component values. Linear combinations of the $x_{r[s]}$ are then considered for estimation of first- and second-order moments.

David (1973) and David and Galambos (1974) reconsider such a principle for ordering a bivariate sample and term the $x_{2[s]}$ the concomitants of the order statistics of $X_1$. Likewise, we could define $x_{1[s]}$ as concomitants of $X_2$. Figs 7 and 8 illustrate the concomitants of $X_1$ for samples 1 and 2, and comparing the two figures makes intuitively clear the potential information the $x_{2[s]}$ convey about association between $X_1$ and $X_2$.

David and Galambos were not concerned with statistical inference problems. They extend the distribution theory results of Watterson. Recent work by Barnett et al. (1976) utilizes such results for an uncensored bivariate normal distribution $N(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$ where $\mu_2$ and $\sigma_2^2$ are known. Extending Watterson's results some useful new concomitant-based estimators of $\rho$ are described and examined.
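The C-ordering that produces concomitants amounts to sorting on one component and carrying the other along. A minimal sketch for the bivariate case follows; the function name is hypothetical, and ties in the $x_1$-values are simply left in their original relative order, which is an arbitrary choice made for this illustration.

```python
def concomitants(x1, x2):
    """Return the order statistics of x1 and the concomitants of x2.

    The s-th entry of the second list is x2[s]: the x2-value that arose
    together with the s-th smallest x1-value.
    """
    if len(x1) != len(x2):
        raise ValueError("x1 and x2 must have the same length")
    # Indices of the observations, arranged by increasing x1 (stable sort).
    order = sorted(range(len(x1)), key=lambda i: x1[i])
    ordered_x1 = [x1[i] for i in order]
    concomitant_x2 = [x2[i] for i in order]
    return ordered_x1, concomitant_x2


ordered_x1, conc_x2 = concomitants([2.0, 0.5, 1.5], [10.0, 30.0, 20.0])
```

In this toy sample the concomitants come out in decreasing order, the pattern one would expect to see when the two components are strongly negatively associated.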

FIG. 7. Concomitants of $X_1$: sample 1.

FIG. 8. Concomitants of $X_1$: sample 2.

In view of the linearity and homoscedasticity of the regression of $X_2$ on $X_1$ we can write

$$x_{2[s]} = \rho\,\frac{x_{1(s)} - \mu_1}{\sigma_1} + Z_s \qquad (s = 1, 2, \ldots, n)$$

(assuming, without loss of generality, that $\mu_2 = 0$ and $\sigma_2^2 = 1$) where the $Z_s$ are independent, $N[0, (1 - \rho^2)]$. Thus if $X_2$ denotes the vector of concomitants of the order statistics of $X_1$, its mean, and variance-covariance matrix are

$$E(X_2) = \rho\alpha, \qquad V(X_2) = \rho^2 V + (1 - \rho^2) I,$$

where $\alpha$, $V$ are the mean and variance-covariance matrix of the reduced order statistics $(X_{(i)} - \mu)/\sigma$ for a univariate sample of size $n$ from $N(\mu, \sigma^2)$, and $I$ is the $n \times n$ identity matrix. Both expressions follow directly from the regression form above, since the $Z_s$ are independent of the order statistics of $X_1$.

Asymptotically $\{x_{2[s]} - \rho\alpha_s\}$ $(s = 1, 2, \ldots, n)$ are independent $N(0, 1 - \rho^2)$. See David and Galambos (1974).

Exploiting the form of $E(X_2)$, $V(X_2)$ and of the asymptotic distribution of $X_2$, Barnett et al. (1976) consider various linear estimators of $\rho$ as well as generalized least squares, and maximum likelihood, estimators. Some readily calculable estimators, with reasonable efficiency characteristics, are presented.

An interesting feature of this approach is the irrelevance of the values of $\mu_1$ and $\sigma_1$ to the moment, and asymptotic distribution, properties of the concomitants, $X_2$. Thus lack of knowledge of $\mu_1$ and $\sigma_1^2$ is no obstacle to such estimation of $\rho$. Indeed, we can proceed even if we know only the ranks of observations in the marginal sample of $X_1$ values.

Such a limited information situation could arise in practice. Consider estimating the correlation between adjudged grades of a group of individuals and values on some subsequent performance measure. In educational, psychological or industrial testing we can encounter situations where the earlier results have been recorded (or even collected) as rank orders, but where we can realistically postulate a joint normal distribution for the inaccessible earlier values (on which the ranks are based) and the observable later values.
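To make the rank-only situation concrete, the relation $E(X_2) = \rho\alpha$ suggests regressing the concomitants on the expected normal order statistics. The sketch below is one simple possibility and is not claimed to be any of the estimators of Barnett et al. (1976): it uses an ordinary least-squares fit through the origin, assumes $\mu_2 = 0$ and $\sigma_2^2 = 1$ as in the text, and approximates $\alpha$ by Blom's formula $\Phi^{-1}\{(s - 0.375)/(n + 0.25)\}$, which is an assumption of this illustration. The function names are hypothetical.

```python
from statistics import NormalDist


def blom_scores(n):
    """Approximate expected standard normal order statistics (Blom's formula)."""
    nd = NormalDist()
    return [nd.inv_cdf((s - 0.375) / (n + 0.25)) for s in range(1, n + 1)]


def rho_from_ranks(ranks_x1, x2):
    """Least-squares slope of x2 on the scores attached to the x1 ranks.

    ranks_x1[i] is the rank (1..n, no ties) of the i-th individual on the
    inaccessible x1 variable; x2[i] is that individual's standardized x2
    value, so x2[i] is the concomitant of the ranks_x1[i]-th order statistic.
    """
    alpha = blom_scores(len(x2))
    numerator = sum(alpha[r - 1] * v for r, v in zip(ranks_x1, x2))
    denominator = sum(a * a for a in alpha)
    return numerator / denominator


# Noise-free check: concomitants lying exactly on rho * alpha recover rho.
ranks = [3, 1, 2, 5, 4]
alpha = blom_scores(5)
x2 = [0.6 * alpha[r - 1] for r in ranks]
estimate = rho_from_ranks(ranks, x2)
```

Only the ranks of the earlier variable enter the calculation, which mirrors the point made above that $\mu_1$ and $\sigma_1$ need not be known.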

6. CONCLUSION

No reasonable basis exists for fully ordering a set of multivariate observations. Notwithstanding this fact, ideas of ordering permeate the study of multivariate distributions and methods of multivariate or multi-sample analysis. We have reviewed the manners in which order relationships are introduced and applied, facilitated by a four-fold classification of sub-ordering principles. Ordering clearly plays an important implicit role in multivariate statistical theory and method; its more formal recognition and study might yield added benefits.

ACKNOWLEDGEMENT

The author is grateful to the referees for suggesting some additional references.


REFERENCES

AFONJA, B. (1972). The moments of the maximum of correlated normal and t-variates. J. R. Statist. Soc. B, 34, 251-262.
AITKIN, M. A. (1964). Correlation in a singly truncated bivariate normal distribution. Psychometrika, 29, 263-270.
- (1966). Correlation in a singly truncated bivariate normal distribution. III. Correlation between ranks and variate-values. Biometrika, 53, 278-281.
- (1968). Correlation in a singly truncated bivariate normal distribution. IV. Empirical variances of rank correlation coefficients. Biometrika, 55, 437-438.
AITKIN, M. A. and HUME, M. W. (1965). Correlation in a singly truncated bivariate normal distribution. II. Rank correlation. Biometrika, 52, 639-643.
ANDERBERG, M. R. (1973). Cluster Analysis for Applications. New York: Academic Press.
ANDERSON, T. W. (1966). Some nonparametric multivariate procedures based on statistically equivalent blocks. "Krishnaiah I", 5-27.
ANDREWS, D. F. (1972). Plots of high-dimensional data. Biometrics, 28, 125-136.
ANDREWS, D. F., GNANADESIKAN, R. and WARNER, J. L. (1972). Methods for assessing multivariate normality. "Krishnaiah III", 95-115.
AUSTIN, T. L., JR (1959). An approximation to the point of minimum aggregate distance. Metron, 19, 10-21.
BARNDORFF-NIELSEN, O. and SOBEL, M. (1966). On the distribution of the number of admissible points in a vector random sample. Theor. Probability Appl., 11, 249-269.
BARNETT, V. (1975). Probability plotting methods and order statistics. Appl. Statist., 24, 95-108.
BARNETT, V., GREEN, P. G. and ROBINSON, A. (1976). Concomitants and correlation estimates. Biometrika, 63, in the press.
BELL, C. B. and HALLER, H. S. (1969). Bivariate symmetry tests: parametric and nonparametric. Ann. Math. Statist., 40, 259-269.
BENNETT, B. M. (1962). On multivariate sign tests. J. R. Statist. Soc. B, 24, 159-161.
- (1964). A bivariate signed rank test. J. R. Statist. Soc. B, 26, 457-461.
- (1966). Note on confidence limits for a ratio of bivariate medians. Metrika, 10, 52-54.
- (1968). On estimation of a ratio of multivariate medians by non-parametric methods. Metrika, 12, 22-28.
BERMAN, S. M. (1962a). Limiting distribution of the maximum term in sequences of dependent random variables. Ann. Math. Statist., 33, 894-908.
- (1962b). Convergence to bivariate limiting extreme value distributions. Ann. Inst. Statist. Math., Tokyo, 13, 217-223.
BHAPKAR, V. P. (1961). A nonparametric test for the problem of several samples. Ann. Math. Statist., 32, 1108-1117.
- (1966). Some nonparametric tests for the multivariate several sample location problem. "Krishnaiah I", 29-41.
BHATTACHARYYA, G. K. and JOHNSON, R. A. (1969). On Hodges's bivariate sign test and a test for uniformity of a circular distribution. Biometrika, 56, 446-449.
- (1970). A layer rank test for ordered bivariate alternatives. Ann. Math. Statist., 41, 1296-1310.
BLUMEN, I. (1958). A new bivariate sign test. J. Amer. Statist. Ass., 53, 448-456.
BRESLOW, N. A. (1970). A generalized Kruskal-Wallis test for comparing k samples subject to marginal patterns of censorship. Biometrika, 57, 579-594.
CACOULLOS, T. and DECICCO, H. (1967). On the distribution of the bivariate range. Technometrics, 9, 476-480.
CARNAL, H. (1970). Die konvexe Hülle von n rotationssymmetrisch verteilten Punkten. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 15, 168-176.


mixture problem.J. Mult. two-population S. J. (1972). Rank approachto themultivariate CHATTERJEE, Anal.,2, 261-281. Statist. J. Amer. in k-dimensional space graphically. points H. (1973). Usingfacesto represent CHERNOFF, Ass.,68, 361-368. forestimating correlation coefficients. BioP. A. P. (1951). Rapid methods L. N. and MORAN, CHOWN, metrika, 38, 464-467. of thedispersion likelihood estimation of a chi-distributed parameter COHEN, A. C., JR(1955). Maximum Statist. to target analysis.J. Amer. applications sampleswith and censored truncated from radialerror Ass.,50, 1122-1135. adjacent theprobability that J.W. and TATSUOKA, M. (1960). Maximizing F., PRATT, R., MOSTELLER, COHN, 31, Ann.Math.Statist., intervals. form overlapping from several of samples populations orderstatistics 1095-1104. 36, 1223-1235. Ann.Math. Statist., modelin orderstatistics. CONOVER, W. J. (1965). A k-sample 1, 311-341. two-level Technometrics, experiments. plotsin factorial DANIEL, C. (1959). Use of half-normal in theuniverse ofsample permutations. ofcorrelation between measures DANIELS, H. E. (1944). The relation Biometrika, 33, 129-135. Biometrika, 35, 416-417. of rankcorrelations. (1948). A property 39,137-143. from normal distribution. Biometrika, ofa sample a circular circle (1952). Thecovering methods based on fewassumptions.Bull. Int. J. (1954). Statistical VANDANTZIG, D. and HEMELRLJK, Statist. Inst.,34(2), 239-267. by W. J. Conover. Ann.Math. modelin orderstatistics" DAVID, H. A. (1966). A note on "A k-sample Statist., 37, 287-288. New York: Wiley. Statistics. (1970). Order Bull. Int. Statist. Inst.,45, 295-300. of orderstatistics. (1973). Concomitants J.Appl. statistics. ofconcomitants of order theory J.(1974). The asymptotic DAVID, H. A. and GALAMBOS, Prob.,11, 762-770. forexrelations moments of orderstatistics P. C. (1968). Recurrence between DAVID, H. A. and Josm, 39, 272-274. changeable variates.Ann.Math.Statist., thecorrelations within twopairsofvariables.Biometrics, C. E. 
and QUADE, D. (1968). On comparing DAVIS, 24, 987-995. and outlier detection R. and KETrENRING, J. R. (1975). Robustestimation DEVUN,S. J.,GNANADESIKAN, coefficients. 62, 531-546. with correlation Biometrika, 17in Contributions toProbability andStatistics. rank-order tests.Chapter M. (1960). Somek-sample DWASS, Press. Stanford University (Olkinet al., eds). Stanford: EssaysinHonorofHaroldHotelling 52, 331-343. setof points. Biometrika, EFRON,B. (1965). The convexhullof a random Statist. density. J.Amer. probability ofmultivariate estimation T. A. (1968). Cubicaland spherical ELKINS, Ass., 63, 1495-1513. forcontinuous variables.Biometrics, J. (1970). A new testof association R. C. and STEWART, ELSTON, 26, 305-314. J. R. coefficients. of Daniels's generalized correlation efficiency D. J. G. (1961). The asymptotic FARLIE, Soc. B, 23, 128-141. Statist. Warsaw. of quantitative data. I.S.I. Conference, and imputation editing FELLEGI, I. P. (1975). Automatic Proc.4thBerkeley ofoutliers. Math.Stat.Prob.,1,253-287. Symp. FERGUSON, T. S. (1961). On therejection ofextreme distribution terms ofvariational series ofa two-dimensional FINKELSHTEIN, B. V. (1953). Limiting variable. Dokl. Ak. Nauk. S.S.S.R., 91, 000-0. (In Russian.) random Math. Soc., 72, 555-558. FISHER, L. (1966). The convexhull of a sample. Bull. Amer. 40, measures.Ann.Math. Statist., product setsand convexhullsof samplesfrom (1969). Limiting 1824-1832. Statist. J.Amer. ofsamples multivariate distributions. Ass.,70, from J.(1975). Orderstatistics GALAMBOS, 674-680. variables.J.Appl.Prob.,3, 556-558. ofa number ofrandom GALLOT,S. (1966). A boundforthemaximum des valeurs Publ.Inst.Stat.Paris,8, 123-184. extremes. Ala th6orie GEFFROY, J.(1959). Contribution LaplacienA K dimensions. du poly6dre d'appue d'un 6chantillon (1961). Localizationasymptotique Publ.Inst.Statist.Univ. Paris,10, 213-228. of the Studentised GENTLE, J. E. G., KODELL, R. L. K. and SMITH,P. L. S. (1975). On thedistribution bivariate 17, 501-505. 
range. Technometrics, estimator based on statistically multivariate density nonparametric GESSAMAN, M. P. (1970). A consistent blocks. Ann.Math.Statist., 41, 1344-1346. equivalent discrimination ofsomemultivariate procedures. GESSAMAN, M. P. and GESSAMAN,P. H. (1972). A comparison Statist. J. Amer. Ass.,67, 468-472. detection with residuals and outlier GNANADESIKAN, R. and KETrENRING, J. R. (1972). Robustestimates, data. Biometrics, 28, 81-124. multiresponse statistical in multivariate methods analysis. GNANADESIKAN, R. and WILK, M. B. (1969). Data analysis II", 593-638. "Krishnaiah


IsHI, G. and YAMASAKI, M. (1961). A note on the testingof homogeneityof k binomial experimentsbased on the range. Ann. Inst. Stat. Math. Tokyo, 12, 273-278. in thebivariate tests alternatives JOHNSON, R. A. and MEHROTRA, K. G. (1972). Nonparametric forordered

with of linear variables on range. Biometrika, functions of ordered correlated normal random emphasis 51, 143-151. in detecting outliers GUTTMAN, I. (1973). Care and handling of univariate or multivariate spuriosity-a Bayesianapproach. Technometrics, 15, 723-738. ordered HABERMAN, S. (1955). Distributions of Kendall'stau based on partially systems. Biometrika, 42, 417-424. HALDANE, J.B. S. (1948). Note on themedian ofa multivariate distribution. Biometrika, 35, 414-415. HAMDAN, M. A. (1970). The equivalence of tetrachoric and maximum likelihood estimates of p in 2 x 2 tables. Biometrika, 57, 212-215. HARTLEY, H. 0. (1950). The use of therangein analysis of variance.Biometrika, 37, 271-289. J.Amer. HAWKINS, D. M. (1974). The detection oferrors in multivariate data usingprincipal components. Statist. Ass.,69, 340-344. HEALY, M. J. R. (1968). Multivariate normal plotting. Appl.Statist., 17, 157-161. HODGES, J.L., JR (1955). A bivariate signtest. Ann.Math.Statist., 26, 523-527. medians.Ann.Math.Statist., HOEL, P. G. and ScHEuua, E. M. (1961). Confidencesetsformultivariate 32, 477-484. testforbivariate HOLLANDER, M. (1971). A nonparametric symmetry. Biometrika, 58, 203-212. ofsignificance no assumptions and tests HOTELLING, H. and PABST, M. R. (1936). Rankcorrelation involving Ann.Math.Statist., of normality. 7, 29-43.

GUPTA,S. S., PILLAT, K. C. S. and STECK, G. P. (1964). On the distributionof linear functionsand ratios

GOODMAN, L. A. and KRUSKAL, W. H. (1954, 1959). Measuresof associationforcross classifications. Statist. PartI. J. Amer. Statist. Ass.,49, 732-764. PartII. J. Amer. Ass.,54, 123-163. GREIG, M. (1967). Extremes in a randomassembly.Biometrika, 54, 273-282. Bull.Int. Statist. GUMBEL, E. J. (1961). Multivariate extremal distributions. Inst.,33a sess.,2? liv.,Paris. GUMBEL, E. J.and GOLDSTEIN, of empirical bivariate extremal distributions. J. Amer. N. (1964). Analysis Statist. Ass.,59, 794-816. GUMBEL, E. J. and MUSTAFI, C. K. (1967). Some analytical properties of bivariate extremal distributions. J. Amer. Statist. Ass.,62, 569-588. GUPTA, S. S. (1963). Probability integrals of multivariate normal and multivariate t. Ann.Math. Statist., 34, 792-828. from GUPTA,S. S., NAGEL, K. and PANCHA PAKESAN, S. (1973). On theorderstatistics equallycorrelated normal random variables.Biometrika, 60, 403-413. correlated GUPTA, S. S. and PILLAI, K. C. S. (1965). On linearfunctions of ordered randomvariables. Biometrika, 52, 367-379.

case. J. Mult.Anal.,2, 219-229. M. N. (1957). A property of theoptimum solution suggested byPaulsonforthek-sample slippage forthenormal problem distribution. Ind. Soc. Agric. Statist. 9, 179-190. KARLIN, S. and TRUAX, D. (1960). Slippageproblems.Ann.Math.Statist., 31, 296-324. KENDALL, M. G. (1962). RankCorrelation Methods, 3rded. New York: Hafner.
KAPUR,

Ann.Math.Statist., 27, 507-512. of order number offit testofgoodness based on a gradually A bivariate increasing Cornell statistics. of Operations Tech.Report No. 250, Department Research, Collegeof Engineering, University. KRISHNAIAH, P. R. (ed.) Multivariate Analysis, Vol. I (1966), Vol. II (1969), Vol. III (1972). New York:
KREIMERMAN, J.(1975). KRUSKAL, KUDO, A.

(1966). Discrimination and classification. "Krishnaiah Its, 165-184. KOZELKA,R. M. (1956). Approximate upper percentagepoints forextremevalues in multinomialsampling.

Academic Press.

143-156. KURTZ, T. E., LINK, R. F., TUKEY, J. W. and WALLACE, D. L. (1966). Correlation of rangesof correlated deviates.Biometrika, 53, 191-197. in contingency LANCASTER, H. 0. and HAMDAN, M. A. (1964). Estimation of the correlation coefficient tableswith possibly nonmetrical characters. Psychometrika, 29, 383-391. LEHMANN, E. L. (1966). Some concepts of dependence. Ann.Math.Statist., 37, 1137-1153. LEIGH-DUGMORE, C. H. (1953). A rapidmethod forestimating thecorrelation coefficient from therangeof thedeviations about thereduced majoraxis. Biometrika, 40, 218-219. J. L. (1972). A k-sample LEWIS, testbased on rangeintervals. Biometrika, 59, 155-160. estimate LOFTSGAARDEN, D. 0. and QUESENBERRY, C. P. (1965). A nonparametric of a multivariate density function. Ann.Math.Statist., 36, 1049-1051.

(1957). The extremevalue in a multivariatenormal sample. Mem. Fac. Sci. Kyushu Univ. (A), 11,

W. H. (1958). Ordinalmeasures of association.J. Amer. Statist. Ass.,53, 814-861. of outlying (1956). On thetesting observations. Sankhyd A, 17, 67-76.

MARDuA,


K. V. (1964a). Asymptotic independence of bivariate extremes. Ass. Bull.,13, CalcuttaStatist. 172-178. (1964b). Exact distributions of extremes, rangesand mid-ranges in samplesfromany multivariate population.J. of theIndianStat.Assn,2, 126-130. (1964c). Someresults on theorder statistics ofthemultivariate normal and Paretotype1 populations. Ann.Math.Statist., 35, 1815-1818. (1967). Correlation of therangesof correlated samples. Biometrika, 54, 529-539. (1969a). The performance of sometests of independence forcontingency-type bivariate distributions. Biometrika, 56, 449-451. (1969b). On the null distribution of a nonparametric testfor the bivariate two-sample problem. J. R. Statist. Soc. B, 31, 98-102. (1970). A bivariate non-parametric c-sample test.J.R. Statist. Soc. B, 32, 74-89. MARSHALL, A. W. and OLKIN, I. (1967). A multivariate exponential distribution. Ass., J. Amer.Statist. 62, 30-44. MOOD, A. M. (1941). On thejointdistribution of themedians in samplesfrom a multivariate population. Ann.Math.Statist., 12, 268-278. MOSTELLER, F. (1946). On someuseful "inefficient" statistics. Ann.Math.Statist., 17,377-408. (1948). A k-sample slippage test foran extreme population.Ann.Math.Statist., 19,58-65. MOSTELLER, F. and TUKEY, J. W. (1950). Significance levels fora k-sample slippage test.Ann. Math.Statist., 21, 120-123. MURTHY, V. K. (1966). Nonparametric estimation of multivariate densities with applications."Krishnaiah I", 43-56. MUSTAFI,C. K. (1969). A recurrence relation fordistribution. J. Amer. Statist. Ass.,64, 600-601. NAUS, J.I. (1965). Clustering of randompointsin twodimensions. Biometrika, 52, 263-267. NEWELL, G. F. (1964). Asymptotic extremes form-dependent randomvariables.Ann.Math. Statist., 35, 1322-1325. of themaximum sumof ranks. Technometrics, ODEH, R. E. (1967). The distribution 9, 271-278. to OrderStatistics OGAWA, J. (1962). ChapterIOF in Contributions (A. E. Sarhanand B. G. Greenberg, eds). New York: Wiley. testforassociation.Ann.Math. 
Statist., OLMSTEAD, P. S. and TUKEY, J. W. (1947). A corner 18, 495-513. of orderstatistics fromtheequicorrelated OWEN, D. B. and STECK, G. P. (1962). Moments multivariate Ann.Math.Statist., normal distribution. 33, 1286-1291. thata givensystem of deviations from theprobablein thecase of a PEARSON, K. (1900). On thecriterion of variables is such thatit can be reasonably correlated system supposedto have arisenfrom random sampling.Phil.Mag., 50, 157-172. S. (1969). Application of an estimator of POSNER,E. C., RODEMICH, E. R., ASHLOCK,J. C. and LURIE, in bivariate extreme value theory. J. Amer. Statist. highefficiency Ass.,64, 1403-1414. rankordertests. Sankhyd PURI,M. L. and Sen, P. K. (1966). On a class of multivariate multisample A, 28, 353-376. Methods in Multivariate (1971). Nonparametric Analysis.New York: Wiley. forindependence in PuRI,M. L., SEN, P. K. and GOKHALE, D. V. (1970). On a class of rankordertests distributions. multivariate Sankhyd A, 32, 271-298. discrimination QUESENBERRY, C. P. and GESSAMAN, M. P. (1968). Nonparametric usingtolerance regions. Ann.Math.Statist., 39, 664-673. I and II. Punkten RENYI,A. and SULANKE, R. (1963,1964). Uberdie konvexe Hullevonnzufallig gewahlten undVerw. Z. Wahrscheinlichkeitstheorie Gebiete, 2, 75-84 and 3, 138-148. ofdistribution-free discriminate L. E. (1972). Refinement and extension RICHARDS, analysis.Appl.Statist., 21, 174-176. of multivariate outliers.Biometrics, of thegap testforthedetection RoHLF, F. J. (1975). Generalisation 31, 93-101. in fora class of rankorderstatistics RUSSELL, C. T. and PURI,M. L. (1974). Joint asymptotic normality multivariate J. Mult.Anal.,4, 88-105. pairedcomparisons. to Order New York: Wiley. Statistics. SARHAN, A. E. and GREENBERG, B. G. (eds) (1962). Contributions to the theory of rankorderstatistics: of latticetheory. SAVAGE, I. R. (1964). Contributions applications Rev.Inst.Statist. Inst.,32, 52-64. of twosamples is censored.Biometrika, G. J.(1966). 
A nonparametric one of which SAW, comparison 53, 599-602. testsformultivariate Part 1: Problems of location SEN, P. K. (1967). Nonparametric interchangeability. and scalein bivariate distributions. Sankhyd A, 29, 351-372. form-dependent ofsamplequantiles (1968). Asymptotic normality processes.Ann.Math.Statist., 39, 1724-1730. Math.Statist., fortheparallelism ofseveral lines. Ann. tests (1969). On a classofrankorder regression 40, 1668-1683.


38, 1216-1228. sampleproblem.Ann.Math.Statist., models. in somemultivariate linear tests ratioand rankorder oflikelihood theory (1970). Asymptotic 41, 87-100. Ann.Math.Statist., distance". "An approximation to thepointofminimum aggregate SEYMOUR, D. R. (1970). Note on Austin's 28, 412-421. Metron, 18, 185-190. forlargesamples.Appl.Statist., graphical testofassociation SHAHANI, A. K. (1969). A simple formultivariate Ann.Math. pairedcomparisons. SHANE,H. D. and PuRI,M. L. (1969). Rank ordertests 40, 2101-2117. Statist., 11, 195-210. I. Ann.Inst.Stat.Math.,Tokyo, extremal statistics, SIBUYA,M. (1960). Bivariate in samples from a bivariate ofquantiles population.J.Res. Nat. Bur. SIDDIQUI, M. M. (1960). Distribution Stand.,64B, 145-150. and of parameters of a multivariate normalpopulationfromtruncated SINGH, N. (1960). Estimation Soc. B, 22, 307-311. samples.J. R. Statist. censored distances of theindividual pointsin themultivalue of thegeneralized M. (1959). The extreme SIOTANI, 10, 183-208. Math.,Tokyo, variate normal sample. Ann.Inst.Statist. samples. normal ofranges in correlated H. 0. (1968). A noteon thecorrelation SMITH,W. B. and HARTLEY, Biometrika, 55, 595-597. order of certain withtheextreme statistics connected 0. P. (1967). Asymptotic independence SRIVASTAVA, Sankhyd A, 29, 175-182. in a bivariate distribution. statistics between distribution ofdistances W. L. and BARTOO, J.B. (1964). Asymptotic 0. P., HARKNESS, SRIVASTAVA, bivariate from 35, 748-754. orderstatistics populations.Ann.Math.Statist., normaldistribution. multivariate STECK, G. P. and OWEN, D. B. (1962). A note on the equicorrelated Biometrika, 49, 269-271. 40, based on ranks. Ann.Math. Statist., procedures comparison R. (1969). Some multivariate TAMURA,
1486-1491. insamples from twonormal populations with order statistics associated D. (1955). Probabilities TEICHROEW,

of rankordertests forlocationin themultivariate one SEN, P. K. and PuRI,M. L. (1967). On thetheory

with equal variance. Army Chemical Center, Maryland, Chemical Corps Engineering Agency.
THOMAS, E. A. C. (1969). Distribution free tests for mixed probability distributions. Biometrika, 56, 475-484.
TIAGO DE OLIVEIRA, J. (1959). Extremal distributions. Rev. Fac. Ciencias Lisbon, Ser. 2, A, 7, 219-227.
-- (1962). Structure theory of bivariate extremes; extensions. Estudos de Mathematica, Estatistica e Econometria, 7, 165-195.
-- (1965). Statistical decision for bivariate extremes. Port. Math., 24, 145-154.
-- (1968). Extremal processes; definition and properties. Publ. Inst. Stat. Univ. Paris, 17, (2), 25-36.
-- (1970). Biextremal distributions: statistical decision. Trab. Estad. y Inv. Oper., 21, 107-117.
-- (1971). A new model for bivariate extremes: statistical decision. In Studi di Probabilità, Statistica e Ricerca Operativa in Onore de Giuseppe Pompilj, ed. Istituto de Calcolo delle Probabilità dell'Università degli Studi di Roma, pp. 437-449. Gubbio: Oderisi.
-- (1974). Regression in the nondifferentiable bivariate extreme models. J. Amer. Statist. Ass., 69, 816-818.
-- (1975). Bivariate extremes: extensions. I.S.I. Conference, Warsaw.
TSUKIBAYASHI, S. (1962). Estimation of bivariate parameters based on range. Rep. Statist. Appl. Res. JUSE, 9, 10-23.
TUKEY, J. W. (1947). Nonparametric estimation. II. Statistically equivalent blocks and tolerance regions in the continuous case. Ann. Math. Statist., 18, 529-539.
VINCZE, I. (1961). On two-sample tests based on order statistics. Proc. 4th Berkeley Symposium Math. Stat. Prob., 1, 695-705.
WALD, A. (1943). An extension of Wilks' method for setting tolerance limits. Ann. Math. Statist., 14, 45-55.
WATSON, G. S. (1954). Extreme values in samples from m-dependent stationary stochastic processes. Ann. Math. Statist., 25, 798-800.
WATTERSON, G. A. (1959). Linear estimation in censored samples from multivariate normal populations. Ann. Math. Statist., 30, 814-824.
WEISS, L. (1960). Two-sample tests for multivariate distributions. Ann. Math. Statist., 31, 159-164.
-- (1964). On the asymptotic joint normality of quantiles from a multivariate distribution. J. Res. Nat. Bur. Stand., 68B, 65-66.
WILK, M. B. and GNANADESIKAN, R. (1961). Graphical analysis of multiple response experimental data using ordered distances. Proc. Nat. Acad. Sci., USA, 47, 1209-1212.
-- (1964). Graphical methods for internal comparisons in multiresponse experiments. Ann. Math. Statist., 35, 613-631.
-- (1968). Probability plotting methods for the analysis of data. Biometrika, 55, 1-18.
WILKS, S. S. (1941). On the determination of sample sizes for setting tolerance limits. Ann. Math. Statist., 12, 91-96.
-- (1942). Statistical prediction with special reference to the problem of tolerance limits. Ann. Math. Statist., 13, 400-409.


WILKS, S. S. (1948). Order statistics. Bull. Amer. Math. Soc., 54, 6-50.
-- (1962). Mathematical Statistics. New York: Wiley.
-- (1963). Multivariate statistical outliers. Sankhyā A, 25, 407-426.
YOUNG, D. H. (1967). Recurrence relations between the P.D.F.'s of order statistics of dependent variables, and some applications. Biometrika, 54, 283-292.
-- (1970). Consideration of power for some two sample tests with censoring based on a given order statistic. Biometrika, 57, 595-604.
-- (1973). Distributions of some censored rank statistics under Lehmann alternatives for the two-sample case. Biometrika, 60, 543-549.

DISCUSSION OF PROFESSOR BARNETT'S PAPER

Professor R. L. PLACKETT (University of Newcastle upon Tyne): The order statistics of a random sample form an integral part of statistical methodology, and have been studied for just as long. Some of the salient historical features are worth noting, in view of the wider interest now taken in past developments. The second supplement to the Théorie Analytique des Probabilités is dated February 1818. Laplace is concerned with the classical problem of estimating $\beta$ from the linear model

$$E(Y_j) = \beta x_j \quad (j = 1, 2, \ldots, n).$$

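Suppose that $x_1, x_2, \ldots$ are positive, and that the ratios $y_1/x_1, y_2/x_2, \ldots$ are decreasing. His procedure is to estimate $\beta$ by $y_r/x_r$, where $r$ is such that

$$x_1 + x_2 + \cdots + x_{r-1} < x_r + x_{r+1} + \cdots + x_n$$

and

$$x_1 + x_2 + \cdots + x_r > x_{r+1} + x_{r+2} + \cdots + x_n.$$

Thus $y_r/x_r$ is the value of $\beta$ which minimizes

$$\sum_j |y_j - \beta x_j|.$$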
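Laplace's rule as described above can be checked numerically. The following is only a minimal sketch, not a reconstruction of Laplace's own calculation: the function names `laplace_slope` and `abs_loss` and the four data points are hypothetical, chosen so that the ratios $y_j/x_j$ are decreasing and no tie occurs in the two inequalities.

```python
def laplace_slope(x, y):
    """Slope through the origin chosen by the weighted-median rule.

    Assumes every x is positive. Points are taken in order of decreasing
    ratio y/x, and the ratio y_r/x_r is returned for the first r at which
    the cumulative sum of the x's reaches half of their total.
    """
    pairs = sorted(zip(x, y), key=lambda p: p[1] / p[0], reverse=True)
    total = sum(px for px, _ in pairs)
    running = 0
    for px, py in pairs:
        running += px
        # If equality holds here the minimizer is not unique; the value
        # returned is still one of the minimizing slopes.
        if 2 * running >= total:
            return py / px


def abs_loss(beta, x, y):
    """Sum of absolute deviations |y_j - beta * x_j|."""
    return sum(abs(yj - beta * xj) for xj, yj in zip(x, y))


# Hypothetical data: ratios 2, 1.5, 1, 0.5 are decreasing.
x = [1, 2, 3, 4]
y = [2.0, 3.0, 3.0, 2.0]
beta_hat = laplace_slope(x, y)
print(beta_hat, abs_loss(beta_hat, x, y))  # prints 1.0 4.0
```

Here the cumulative sums of the $x$'s are 1, 3, 6, 10, so $r = 3$, and the returned slope is no worse than any other value of $\beta$ tried on a fine grid.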
Laplace derives the asymptotic distribution of the estimator $y_r/x_r$, which is a generalized form of the sample median. More details of his work are given by Stigler (1973).

After 70 years, in which much that is relevant doubtless occurred, we find Francis Galton rhapsodizing in Natural Inheritance (1889) about the Normal Curve, and writing as follows.

"Whenever a large sample of chaotic elements are taken in hand and marshalled in the order of their magnitude, an unsuspected and most beautiful form of regularity proves to have been latent all along."

The first volume of Biometrika contains articles by both Galton and Karl Pearson on the mean value of the difference between successive order statistics. This interest of K. P. must be responsible for the attention given to order statistics at University College London, which has influenced many associated with that institution. For example, the use of range in place of standard deviation, when examining the stability of variation among a large number of small samples, was suggested by Tippett (1925) and "Student" (1927). Tables of the necessary factors for control charts were included by E. S. Pearson in a British standard on industrial standardization and quality control (1935). During the immediate postwar period, simple methods of inference based on order statistics were actively explored, but they are largely obsolete in a computer age. However, the sufficiency of the order statistics provides a basis for permutation tests. These tests will continue to be used in areas such as psychology, where methods based on models with more structure cannot be justified.

The question arises naturally: what generalizations of the concept of order can be made in two or more dimensions? Recent work is well described in Professor Barnett's paper. At the beginning, he gives a quotation from Sir Maurice Kendall to the effect that order properties exist only in one dimension, and at the end he concludes that no reasonable basis exists for a full ordering of multivariate data. I agree. We lose some attractive features of order statistics when the transition is made from one dimension to several, notably uniqueness and simplicity. Sufficiency remains, to support multivariate permutation tests.

The problem of ordering multivariate data can be expressed as follows. Given a sample $x_1, x_2, \ldots, x_n$ we require an arrangement of the form

$$\{x_i\} \preceq \{x_j\} \preceq \cdots \preceq \{x_k\},$$


where $\preceq$ is the symbol: not preferred to, and the subscripts $i, j, \ldots, k$ range over mutually exclusive and exhaustive subsets of the integers $1, 2, \ldots, n$. Professor Barnett has given four methods of interpreting the symbol $\preceq$, in which he describes the ordering as marginal, reduced, partial or conditional. I believe that there are really only two methods: analytical and geometrical. They are largely, but not altogether, distinct. The first method is to introduce a function $f(\cdot)$ with $p$ arguments, monotonic increasing in any one argument for fixed values of the others. We calculate the $n$ values of $f(\cdot)$ for the sample, and then order these values. This procedure includes both marginal and reduced ordering, between which there is no clearcut distinction. The second method is based on geometrical concepts such as the convex hull, which are not easily or usefully expressed in analytical terms. However, when the regions are prescribed, then this too is an analytical approach and can be expressed in terms of a function $f(\cdot)$.

Professor Barnett has given examples where $f(\cdot)$ is a linear combination or a quadratic form, so let us consider other functions which may be relevant. Examiners in schools and universities use the totals of marks from different questions and subjects to determine a provisional ranking of the candidates. However, the final ranking may depend on further considerations. Suppose that two candidates in a particular subject achieve the same total mark, but the marks on individual questions are in the first case widely dispersed, and in the second exactly equal. The first candidate is usually preferred, on the grounds that greater weight is given to completed questions than to a multiplicity of fragments. A higher ranking of the first candidate can be achieved by combining the marks on questions using a concave function, which must also be symmetrical whenever the questions are on the same footing, as is usual.

On the other hand, suppose that two candidates achieve the same total mark over all subjects, but the marks for individual subjects are in the first case exactly equal, and in the second widely dispersed. The first candidate is usually preferred on the grounds that a uniform performance on the different subjects is preferable to an uneven one. A higher ranking of the first candidate can be achieved by combining the marks on subjects using a symmetrical convex function. Both possibilities are included in the following proposal. Let $n_1, n_2, \ldots, n_r$ be the $r$ marks to be combined, and $N$ the score on which the ranking is based. Define $N$ as a function of $n_1, n_2, \ldots, n_r$ by

$$N^{\alpha} = \sum_i n_i^{\alpha} / r.$$
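As a rough numerical sketch of this definition, the score $N$ can be computed for two hypothetical candidates with the same total mark, one with equal marks and one with dispersed marks. The function name `combined_score`, the marks, and the particular values of $\alpha$ tried are illustrative assumptions only; $\alpha = 0$ is treated as the limiting geometric mean.

```python
import math


def combined_score(marks, alpha):
    """Score N defined by N**alpha = sum(n_i**alpha) / r.

    alpha = 0 is handled as the limiting case, the geometric mean,
    which requires every mark to be positive.
    """
    r = len(marks)
    if alpha == 0:
        return math.exp(sum(math.log(m) for m in marks) / r)
    return (sum(m ** alpha for m in marks) / r) ** (1.0 / alpha)


# Two hypothetical candidates, each with a total of 40 marks.
even = [10, 10, 10, 10]
uneven = [16, 16, 4, 4]

for alpha in (2, 1, 0.5):
    print(alpha, combined_score(even, alpha), combined_score(uneven, alpha))
```

With $\alpha = 1$ the two candidates tie; with $\alpha = 2$ the dispersed marks score higher, and with $\alpha = \tfrac{1}{2}$ the equal marks score higher.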

The function is convex if $0 < \alpha < 1$, and concave if $\alpha > 1$. As $\alpha \to 0$, $N$ tends to the geometric mean of $n_1, n_2, \ldots, n_r$. This proposal may appear radical, but would eliminate many of the present subjective adjustments after taking $\alpha = 1$ for both questions and subjects. The values $\alpha = 2$ for questions and $\alpha = \tfrac{1}{2}$ for subjects are suggested. They correspond respectively to finding the root mean square and the square mean root, and lie well within the scope of a pocket calculator.

There is much in Professor Barnett's paper to show that the ordering of multivariate data can help in the solution of statistical problems. The study of outliers is particularly important, arising as they do in meteorology and reliability, and the associated topic of robust estimation continues to deserve attention. On the other hand, I believe that coefficients of association should nearly always be avoided, on the grounds that they are seldom meaningful except as parameters in a statistical model. The opposite view is given wide currency at present in Statistical Package for the Social Sciences (Nie et al., 1975). Chapter 16 of this manual is concerned with contingency tables and related measures of association. Nearly all the figures reproduce printouts giving the values of 13 coefficients of association, which include "raw chi square" and functions thereof. This is the only method of analysis which is presented. Three-dimensional tables are treated as a sequence of two-dimensional tables, from which the same quantities are calculated. Consider, for example, the data in Fig. 16.1, reproduced below:

Race Income White Non-white

Less than $4,000
$4,000-7,999

396
526

98 64 40
70

$8,000-12,499
$12,500 and over

612 624

What seems to be called for here is a logit regression of the white/non-white ratio on income, possibly transformed, in which the relationship is expressed by a model with two parameters. This would lead to more understanding than any analysis based on coefficients of association.

I would like to conclude by thanking Professor Barnett for guiding us through the complexities of this important and difficult topic. By presenting so many results in such a clear and systematic pattern he has brought, so to speak, order out of chaos. I have much pleasure in proposing the vote of thanks on his paper.

Professor K. V. MARDIA (University of Leeds): I am very pleased to be able to second this vote of thanks. However, Professor Barnett need not speculate as to which referee's report was mine as I am acting as a replacement in performing this pleasant duty.

The paper is to be greatly welcomed for its efforts in the classification and unification of previous work. In fact, this area needed a fresh look. Unlike Professor Plackett, I agree with Professor Barnett's four basic ordering principles. The development of this subject has been somewhat haphazard and these principles give an underlying structure which connects previous studies. I think it would be advantageous to sub-classify the different types of R-ordering. Two types could be "distance-ordering" and "projection-ordering". The former uses any specific measure of distance with the idea of the form of the underlying population while the latter would include ordering the sample using the first principal component (or higher), seriation, etc. Another type could be "polar-ordering".

The paper gives me an opportunity to look at my old work. Looking back, the importance of my 1967 paper is also in giving a pleasing general formula for corr $(R_1, R_2)$ which gives a simplified expression for the normal case. This expression is also computationally more convenient. Further, the formula simplifies for $n = 2$ and $n = 3$ as given in my Ph.D. thesis of 1964 (Rajasthan University) and also obtained independently by Kurtz et al. (1966) as mentioned in Mardia (1967, p. 533). The exact expressions for marginal ranges for any $n$ for the multinormal case are also contained in the thesis. These results are published only in the form of abstracts in the Annals of Mathematical Statistics (1963, pp. 1131, 1627; 1964, p. 461) and therefore it is not surprising that these are not generally known.

In distance-ordering, it is convenient to call $D^2(x, \mu; \Sigma) = (x - \mu)' \Sigma^{-} (x - \mu)$ the "Mahalanobis distance" between $x$ and $\mu$ where each of the three quantities $x$, $\mu$, $\Sigma$ may be constant or stochastic, and $\Sigma^{-}$ is a g-inverse of $\Sigma$ (see Mardia, 1975a). If $D_i^2 = D^2(x_i, \bar{x}, S)$, it can be seen that the "outlier-ratio" $S_i = 1 - (n-1)^{-1} D_i^2$ and, as Professor Barnett already knows, the graphical procedures suggested by Healy (1968) (and also by Cox, 1968) look basically at these same quantities.

Of course, Gnanadesikan, Wilk and Kettenring give graphical methods to assess multinormality but I think specific analytical measures and test procedures were first given by Mardia (1970) which depend on Mahalanobis angles and distances (see, for example, Mardia, 1975b). The correspondence between these tests and graphical methods is similar to that between Wilks' outlier test and graphical procedures of Cox (1968), Healy (1968) and others. There are obvious advantages of distance-ordering once the appropriate distance for the data is known.

For multinomial populations, there has been some controversy on the appropriateness of a distance measure (see Edwards, 1971). I discover that surprisingly this can be resolved through its close connection with the von Mises-Fisher distribution $M_k(\mu, \kappa)$ with mean vector $\mu$ and concentration parameter $\kappa$. Let $n_1, \ldots, n_k$ be distributed as multinomial with parameters $(p_1, \ldots, p_k)$ and $n = n_1 + \cdots + n_k$. ($\sum p_i = 1$.)

(i) It can be shown that as $n \to \infty$,

$$n^{-1/2}(\sqrt{n_1}, \ldots, \sqrt{n_k}) \sim M_k\{(\sqrt{p_1}, \ldots, \sqrt{p_k}), 4n\}.$$

Hence, when $n$ is large, the distance measure for a von Mises population is suitable for a multinomial population.

(ii) If $I_1$ is the information matrix of $\sqrt{p_1}, \ldots, \sqrt{p_k}$ for the multinomial, and $I_2$ is the information matrix of $\mu_1, \ldots, \mu_k$ for $M_k(\mu, \kappa)$ with $\kappa$ fixed, then for $\mu_i = \sqrt{p_i}$

$$\phi_1(n) I_1 = \phi_2(\kappa) I_2,$$

where $\phi_1$ and $\phi_2$ are functions of $n$ and $\kappa$ only. Thus Rao's distance measure is the same for the two populations. Therefore, as in the von Mises-Fisher case, we are led to use the Bhattacharyya distance for the multinomial population which is the angle between $(\sqrt{n_1}, \ldots, \sqrt{n_k})$ and $(\sqrt{p_1}, \ldots, \sqrt{p_k})$.
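The angle described above is simple to compute. The sketch below assumes only that the distance is the angle between the two square-root vectors, as stated in the text; the function name `bhattacharyya_angle` and the example counts are hypothetical.

```python
import math


def bhattacharyya_angle(counts, probs):
    """Angle between (sqrt(n_1), ..., sqrt(n_k)) and (sqrt(p_1), ..., sqrt(p_k)).

    The first vector has length sqrt(n) and the second has length 1, so
    the cosine of the angle is sum(sqrt((n_i / n) * p_i)).
    """
    n = sum(counts)
    cos_angle = sum(math.sqrt(c / n) * math.sqrt(p)
                    for c, p in zip(counts, probs))
    # Guard against rounding slightly above 1 before taking acos.
    return math.acos(min(1.0, cos_angle))


# Hypothetical trinomial sample compared with two sets of cell probabilities.
counts = [25, 25, 50]
print(bhattacharyya_angle(counts, [0.25, 0.25, 0.5]))
print(bhattacharyya_angle(counts, [0.5, 0.25, 0.25]))
```

The angle is essentially zero when the observed proportions equal the cell probabilities, and reaches $\pi/2$ when the counts fall only in cells of probability zero.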

Professor Barnett indicates how an attractive estimate of $\rho$ in a bivariate normal population can be obtained. In quality control situations, a quick method to assess the extent of dependence is as follows. Let $r_w$ be the observed correlation between ranges and let $\rho_w(n, \rho) = f(|\rho|)$ for the normal case. One estimate of $\rho$ is $|\hat{\rho}| = f^{-1}(r_w)$ where $f$ is tabulated in Mardia (1967). In fact, using the approximation of Kurtz et al. (1966), it is found that
p2_

[-3n

+ {9L2 + 24r2(1 -Ln)2}]1/4(l -L),

where $L_n$ is a known constant, and this is much simpler to use.

Although historically (strict) unconditional non-parametric bivariate tests using circular ordering and circular non-parametric tests were developed independently, there is a one-to-one correspondence between tests in the two categories. In this sense, Hodges' test is related to Ajne's test (1968), and Mardia's (1967a) test is related to the uniform scores test. My bivariate test possesses an interesting invariance property, and the asymptotic efficiency of this test compared to $T^2$ amounts to about 80 per cent. However, on a hypersphere, we have again to rely on the four ordering principles. Consequently, as in the multivariate case ($p > 3$), there is a dearth of small-sample non-parametric tests of practical value on a hypersphere.

Broadly speaking, the types of ordering principles which will be more effective are determined by one's objectives, e.g. R-ordering is predominantly used in outlier problems. Perhaps Professor Barnett could have told us in Section 4 (possibly in tabular form?) which ordering principles in his opinion have been most effective in principal areas of multivariate analysis.

The paper elegantly leads us to the realistic view that fruitful research would emerge from the four ordering principles rather than trying to define higher-dimensional order statistics. It gives me the greatest pleasure to second the vote of thanks.

The vote of thanks was passed by acclamation.

Professor R. M. LOYNES (University of Sheffield): It is a great pleasure to see the Society holding one of its only too rare meetings outside London here in Sheffield today, and an equal pleasure that the speaker, who has given a useful and learned contribution, is also a colleague.

I should like to make just a few comments about various aspects of the paper. In the univariate case I see two reasons for the interest in order statistics: one, the common occurrence of censoring mechanisms in real life (the interest in linear combinations of order statistics as estimators surely depends on the existence of simple censoring processes which allow one to carry out ordering in practice, for example); and the other, the fact that within the model of a simple random sample, or more generally of exchangeable random variables, the order statistic is the minimal sufficient statistic. When we turn to the multivariate situation we find that both approaches run into rather similar difficulties. Of course the minimal sufficient statistic is in a sense easily described (it is the set of sample vectors, their labels no longer having any importance), but there seems no way of providing a simple canonical form for this. The difficulty can be illustrated rather differently: suppose $x_{(i)}$ are univariate order statistics of the sample $x_i$, and $f$ is a monotone increasing function; then $f(x_{(i)})$ are the order statistics of the transformed sample $f(x_i)$. In two dimensions, for simplicity, what would we mean by a monotone transformation? Any natural definition would exclude the possibility of rotating axes, and yet in many situations we would wish to regard the axes as arbitrary. Perhaps we should accept that the axes may not be rotated in such discussions. Some support for such a restriction is added by the observation that although on the line a point (the median) exists with equal numbers of points on either side, in the plane no point exists, in general, such that equal numbers of points lie either side of an arbitrary line through it; the vector of marginal medians possesses this property for lines drawn parallel to the axes, however.

The idea which appears in several guises, that we should order not every single observation, but rather sets of them (as, for example, in Figs 5 and 6), seems to me an appealing one. But it is rather difficult to see why the particular example of convex hulls should be of general interest: it makes no reference to an origin, and yet if the origin were not inside (or at least not much outside) the innermost polygon one would find interpretation difficult.

Finally, references: those for Section 4.1 are of course not complete. More recent ones are Galambos (1972) and O'Brien (1974).
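Professor Loynes's remarks on monotone transformations and on rotation of axes can be illustrated with a small sketch. Everything in it is a hypothetical choice made for illustration: the sample values, the three labelled points, the use of the exponential as the monotone function, and the 45 degree rotation.

```python
import math


def order_on_first_axis(points):
    """Labels arranged by increasing first coordinate (a marginal ordering)."""
    return sorted(points, key=lambda label: points[label][0])


def rotate(points, degrees):
    """Rotate every labelled point about the origin by the given angle."""
    t = math.radians(degrees)
    return {label: (x * math.cos(t) - y * math.sin(t),
                    x * math.sin(t) + y * math.cos(t))
            for label, (x, y) in points.items()}


# Univariate case: a monotone increasing f commutes with sorting.
sample = [2.5, 0.3, 1.7, 4.1]
same = sorted(math.exp(v) for v in sample) == [math.exp(v) for v in sorted(sample)]
print(same)

# Bivariate case: the marginal ordering changes when the axes are rotated.
points = {"A": (1.0, 5.0), "B": (2.0, 1.0), "C": (3.0, 3.0)}
print(order_on_first_axis(points))
print(order_on_first_axis(rotate(points, 45)))
```

In this example the ordering on the first axis is A, B, C before rotation and A, C, B afterwards, so an ordering tied to the axes is not preserved when the axes are regarded as arbitrary.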


Dr A. HUITSON (Sheffield Polytechnic): I am interested in practical applications of the techniques which have been outlined in this paper. I was trying to tie up my practical experience with the various techniques and became a little lost when it came to the P-ordering and the C-ordering techniques. Would Professor Barnett care to outline his thoughts on these particular aspects please?

Dr G. M. PADDLE (ICI): I recently had occasion to give a set of lectures to some medical colleagues at work and, at the end, one of them observed that the sophisticated techniques which I had put forward were all very well, but all that he and his colleagues really wanted was a simple technique for looking at a mass of multivariate data to pick out the salient features. I commend this, with tongue in cheek, to Professor Barnett as a target to which we should work in the area he has described today.

I should also like to thank Professor Barnett for pointing out to me in the course of his paper that two or three problems which I felt were fairly easy, but which I was unable to solve, are in fact quite difficult.

Two problems to which I should like to draw attention as being relevant to this area are as follows.

(i) For many medical data, such as observations of a battery of blood tests on various patients, we might well expect to find high positive correlations. The medical interpretation of such data is to use marginal ordering and to pick out the outliers by applying rectangular definitions of abnormality. The fact that statistically, people with $x_i > \bar{x}_i$ but $x_j < \bar{x}_j$ are in some respects more abnormal is apparently of no interest and it is difficult to persuade them to repeat these observations, no matter how improbable they may be. In relation to this, Levin et al. (1973) have done some very interesting work in which they took not single observations but a number of repeat observations on the same people, and calculated a set of abnormal observations after allowing for time of day and personal characteristics. They found, as Professor Barnett suggests in his paper, a completely different set of abnormal observations from the one usually defined as such.

(ii) In the measurement of physical properties of materials, we might fit regression equations to a set of observations, $Y_{ij}$, for a range of materials with constituents $X_{ik}$. We might then want to look at the residuals, $r_{ij}$, to decide which of the samples of material were unrepresentative, and which individual readings were unrepresentative. For instance, if we were testing hardness, tensile strength and elongation, we might decide that all three were inaccurate for one sample, but that only one was inaccurate for another sample. Inaccurate samples could be defined by a distance ordering, weighted sum of squares for instance, but definition of samples with one or two incorrect readings requires a rather sophisticated form of ranking.

I am not suggesting that Professor Barnett has presented solutions to these problems, but he has made me aware that there is a way forward.

Professor T. LEWIS (University of Hull): I should like to make a few rather random and off-the-cuff remarks.

It is all very well for Professor Barnett to quote Kendall and make the remark which Professor Plackett quoted in his conclusions. It is all very well too for Professor Plackett to say that he agrees that one cannot really order multivariate data, but the fact, which was implied by Professor Plackett in his example about the class list for students, is that the topic of Professor Barnett's splendid paper is one of the most important in the human context which one could imagine. This is because the ordering of multivariate points is what is going on all the time: the student class lists, the Cabinet deciding between the various policies in order (we hope that is what is done). In fiction, it is in its purest form in the sort of radio play in which there are, say, 10 people on a sinking boat (a judge, an actress and so on) and they have to go, one by one, for the survival of one final vector value, $x_1$.
This does exist in the real-life context, although Professor Barnett avoided using this notation in his paper.

It has also been said by everyone that ordering in the univariate case is perfectly obvious and natural. Professor Barnett says in his paper that univariate ordering is clear and unambiguous, and so it is, but it is not invariant because it depends on the specification of the variable. This is, indeed, a platitude, but the figures are as follows. If we want to order this univariate sample (and I put it down in order) there is A < B < C < D. If we like to use a transform of this variable (0.6, 0.8, 0.4, 0.7) this gives C < A < D < B. If we use 3x, then we multiply all these by three, giving us: 4.8 (and we will cross off the four); 5.4; and, if we cross off the integers, we get the following: D < C < B < A. Of course, we are only interested in this one because there is a reason for preferring, in Professor Plackett's happy world, large values of x to small values of x.

If we turn to the multivariate case, I agree with Professor Plackett in his remarks that we really want to do R-ordering, reducing it to some univariate sample, using some function of the x's. If we take the bivariate case, to make it easy to talk about, which is: $h(x, y)$, now $h$ is not completely arbitrary, which would leave us in a very vague situation, but, as Professor Plackett said, we are really only interested in functions, $h$, which are monotonic in each of the arguments. It is easy to show then that if the marginal ordering in the x's is the same as the marginal ordering in the y's, that ordering applies also to any $h$ satisfying these conditions. In that sense, for a sample with the same marginal orderings, it has a unique ordering in a reasonable way. If there is a sample of size 1,000, and the marginal ordering of the x's is not exactly the same as the y's, but there is perhaps only one reversal (the correlation is nearly one) it would seem reasonable to think that the ordering was nearly defined. If there is the situation in which x and y are independent, this will not happen very often, so it is not of much help, but if x and y are highly correlated, those samples will often be obtained and the ordering becomes a useful idea. This makes me think that perhaps the ordering is related, or that we should think of it as related, to the underlying distribution. Then I note that in Professor Barnett's discussion of various M-, R-, C- and P-orderings, somewhere at least one of his R-orderings uses the probability density function for the underlying distribution. I should like to ask him, therefore, whether along with his classification there is not tied up another, rather fundamental classification which is the following: orderings which are based purely on the relative sizes of the numbers which come into the sample, on the one hand, and orderings which bring in the presumed distribution, on the other hand.

Let me conclude by thanking Professor Barnett, adding my thanks to those of the other speakers, for his paper.

Professor G. A. BARNARD (University of Essex): I suppose we should not be surprised at the multiplicity of methods of ordering multivariate data which could be presented in the country which invented snobbery and one-upmanship as well as other forms of class distinction.

Since some of the requirements for ordering that have been mentioned this evening have been related to the question of outliers, I thought it would be worth while again drawing attention to Morven Gentleman's proposal for robust estimation in the multivariate case. This is one of rather few methods which do not require trimming or any ordering. It is to choose as the centre of location that point in the space from which the three-halves powers of the distances of the points are summed to a minimum. As far as I know, he has still not yet published details, but it is a method which is quite effective and efficient and does not require the isolation of a few outliers, allowing us therefore to progress perhaps along the road of recognizing that all of us excel in something, if only we can find out what it is.

Professor A. M. WALKER (University of Sheffield): I wish to ask Professor Barnett if he can make any comment about problems arising from ties in any of the variate values, that is, the occurrence of two or more sample points such that the $x_{ij}$ are not distinct for all $j$. In the univariate case the presence of ties can, in certain circumstances, be troublesome, particularly when the distribution sampled from is discrete. However, perhaps the decreased probability of ties for multivariate samples, especially if these occur in all the variate values, makes such problems much less important.
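Professor Lewis's numerical illustration, earlier in this discussion, of how a univariate ordering depends on the specification of the variable can be reproduced with a short sketch. The sample values are an assumption: the discussion quotes only the transformed figures, and the values 1.6, 1.8, 2.4 and 2.7 are one choice consistent with them. They are held in tenths as integers so that "crossing off the integers" is exact.

```python
# Hypothetical sample, in tenths: A = 1.6, B = 1.8, C = 2.4, D = 2.7.
sample_tenths = {"A": 16, "B": 18, "C": 24, "D": 27}


def order_by(sample, key):
    """Labels arranged in increasing order of key(value)."""
    return sorted(sample, key=lambda label: key(sample[label]))


def identity(v):
    return v


def fractional_part(v):
    # Cross off the integer part of x (remainder after whole units).
    return v % 10


def fractional_part_of_3x(v):
    # Multiply by three, then cross off the integer part.
    return (3 * v) % 10


print(order_by(sample_tenths, identity))               # ['A', 'B', 'C', 'D']
print(order_by(sample_tenths, fractional_part))        # ['C', 'A', 'D', 'B']
print(order_by(sample_tenths, fractional_part_of_3x))  # ['D', 'C', 'B', 'A']
```

The three orderings agree with those quoted: A < B < C < D for the variable itself, C < A < D < B for the fractional parts 0.6, 0.8, 0.4, 0.7, and D < C < B < A after multiplying by three.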


Professor F. DOWNTON (University of Birmingham): All of us must be grateful to Professor Barnett for his comprehensive survey of the different approaches which have been adopted in studying the ordering of multivariate data, and I regret not being able to be present to add my voice to the vote of thanks.

I would like to raise a point concerning the purposes of multivariate ordering and what they imply in practice. These purposes include, for example, quick estimating and inference procedures, assessment of outliers and problems involving censoring. Quick procedures for a particular mathematical model can be produced by studying the mathematical properties of data from that model only, but what constitutes an outlier and how data are censored are more likely to come from the nature of the experiment and the resulting data than from their mathematics. It is this which makes statistics such a difficult (but also fascinating) subject, but it does raise a question, which Professor Barnett may treat as rhetorical. How many of the methods discussed in the paper arose from a practical need or have been used in practice? And how many arose from thoughts such as the interesting one he has in the last sentence of Section 4.1? My imagination has failed me in trying to visualize an experiment for which the type of C-ordering described there would be appropriate; can Professor Barnett put me out of my misery and give me an example of what he has in mind?

The following contributions were received in writing, after the meeting.

Mr P. J. GREEN (University of Bath): I have two general comments to make on Professor Barnett's interesting paper.

Firstly, there are many alternative ways of defining "ordering" in one dimension. These may be broadly divided into those ordering the space in which the data lies, and those ordering the data directly. Among the latter we can distinguish the concepts of adjacency, orderliness and canonical representation of the set, rather than the sequence, of data values. These concepts give rise to the various different properties of order statistics, and it is the fact that they all coincide in one dimension that has led to the apparent unity of the subject. However, when we attempt to extend order concepts to higher dimensions, we find firstly that these do not necessarily coincide, and secondly that the concepts themselves may not be so uniquely defined.

Those methods classified by Professor Barnett as M- and R-ordering are largely concerned with ordering the sample space; but many order statistics seem more naturally defined in terms of a direct ordering of the data, as considered under P-ordering. The idea of adjacency seems most suitable for generalization, but in no unique manner. Possible definitions could be based upon those observations linked in the minimum spanning tree or path of the data, or upon some ranking of the inter-point distances. A definition that seems empirically attractive is to call $x_i$, $x_j$ adjacent if there is some point $x$ within the convex hull of the data for which

$$d(x, x_i) = d(x, x_j) = \min\{d(x, x_k): k = 1, 2, \ldots, n\}.$$

None of these concepts is likely to yield a very complete notion of order in more than one dimension, but this is perhaps unimportant when, for example, as limited a notion of order as the convex hull of the data is sufficient to discuss range, extremes and outliers.

My second comment is one which might be applied to much analysis of multivariate data. Before discussing such procedures it should surely be decided under what transformations of the data are the conclusions to be invariant. In the present context, presumably any useful "order" should be invariant under uniform change of scale, and possibly also under translation. But problems that are invariant under rotations or unequal scale change seem to form two disjoint (and possibly exhaustive) classes.

The suggestions made above refer principally to rotation-invariant data, as indeed do many modern "data-analysis" methods. But for the other type of problem, where the axes may represent incomparable variables, M- and C-ordering principles seem much more attractive. It is interesting to note, however, that the c-order groups of Section 2.3 are invariant under all the transformations mentioned above.

Sir MAURICE KENDALL (International Statistical Institute): I regret that absence abroad prevents me from attending this meeting. I congratulate the author on a most useful summary of a subject in which the literature is confused and scattered. I have one major comment. In one dimension order is invariant under any monotonic transformation of the scale. It seems to me desirable that any methods of extending the idea to more than one dimension, however imperfect, should also be scale invariant. Some of the methods described in the paper, such as the sequential ordering process, are so. Others, such as the use of principal components or distance metrics, are not. I would not discard them on that ground alone, but it has to be recognized that they are not metric-free as one would like to have a technique based on order alone.

Two minor points: (a) The convex-hull method of Section 2.3 is difficult and tedious to apply in more than two dimensions. The determination of the hull requires linear programming and if extra observations are added to the original sample the work has to be done again. (b) There are now programs which will project a multivariate complex, through a visual display unit, on to any assigned plane, not merely the co-ordinate planes, and this offers a relatively unsophisticated way of rejecting outliers, albeit by eye on a somewhat subjective basis.

Mr A. ROBINSON (University of Bath): Professor Barnett has offered us a fourfold classification of ordering methods applicable to multivariate data but he points out that the classes may not be mutually exclusive. Perhaps this difficulty arises because we should more relevantly be attempting to classify the types of problem to which we would like to apply "order-statistics" and ranking methods. The claim of David (1970) that order-statistics deserve to be treated as a unified subject is possibly a little overstated; that quite separate univariate problems find useful accommodation in the house of order-statistics does not imply that they were all born at that same address. One does not necessarily want the full battery of weapons which complement each other in the univariate case; for instance, in problems involving adjacency if we remove a particular point, then we do not necessarily require the "order" of the remaining points to be invariant. I agree that many methods employed in the analysis of multivariate data naturally result in some type of ordering but it is in the nature of the problem that it demands such a solution. Many univariate methods have an appealing simplicity which I feel cannot be carried over to the multivariate situation if one concentrates on the ordering principle rather than on the particular facet of order induced by the problem.

Dr ALLAN SEHEULT (University of Durham), Mr PETER DIGGLE and Dr DENNIS EVANS (University of Newcastle upon Tyne): We shall confine our attention to R- and P-ordering. We find the notions of probability contours (introduced in Section 2.2 on R-ordering) and of convex hulls (Section 2.3 on P-ordering) particularly interesting, and it is perhaps worth while to reflect, initially, on their possible utility in the univariate case. The figure below gives the pH values for 52 soil samples using J. W. Tukey's "stem-and-leaf" plot; thus, there are two samples with pH value 5.4, three with pH value 5.6, etc. (an explanation of the italic figures is given in due course).

    5 | 44
    5 | 6667777
    5 | 888888889
    6 | 01111
    6 | 2222
    6 | 44
    6 | 77
    6 | 999
    7 | 000
    7 | 3
    7 | 44
    7 |
    7 | 889
    8 | 0001
    8 | 3
    8 | 4445

For these data, we can define 26 P-order groups (a term which we prefer to c-order to avoid confusion with the concept of C-ordering introduced in Section 2.4); group 1 = {5.4, 8.5}, group 2 = {5.4, 8.4} and so on until group 26 = {6.2}, each group containing exactly two points. The above procedure suggests the value of 6.2 as a summary measure of location. This is, of course, the sample median, and we suggest that in the multivariate case the last convex hull should define the median set of the data; the median point may then be defined as some enclosed point such as the centroid. The samples depicted in Figs 5 and 6 of the paper suggest that this is a reasonable measure of location, both median sets being close to, or enclosing, the origin. The important feature here is that for a univariate sample the convex hull emphasizes that there is no essential difference between the values 5.4 and 8.5; they are both extreme: we ignore direction, or equivalently we do not distinguish between monotone increasing and monotone decreasing functions. Note in passing that in the multivariate case the P-ordering obtained using convex hulls remains invariant under linear transformations of the data.

In a similar way, the inter-quartile range as a measure of dispersion may be generalized to the multivariate case as follows: let $C_i$ denote the P-order group $i$ convex hull and find the integer $k$ such that $C_k$ contains at least 50 per cent of the data points and $C_{k+1}$ at most 50 per cent. The interquartile set, IQS, is then partially determined by the relationship $C_{k+1} \subset \mathrm{IQS} \subset C_k$ and, as with the median, may be uniquely determined by a suitable interpolation procedure.

One criticism of partial ordering and the resulting summary measures it produces is that the notion of ordering therein is metrical and does not depend directly on the density properties of the data configuration. In our example, the values 5.4 and 8.5 are clearly extremes with respect to both the partial ordering induced by the convex hull approach and the reduced ordering induced by the empirical density of the data. However, the median value 6.2 is also relatively extreme with respect to density. In fact, the data were collected as part of a discriminant analysis and can be divided into two identifiable sub-groups, the second of which is indicated by the italic figures in the stem-and-leaf plot.
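The univariate peeling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the frequency table `PH_COUNTS` is our reading of the stem-and-leaf plot, the function names `p_order_groups` and `interquartile_bracket` are hypothetical, and where more than one k satisfies the 50 per cent condition we take the largest.

```python
# pH values for the 52 soil samples (value -> frequency).
# Assumption: this table is our reading of the stem-and-leaf plot.
PH_COUNTS = {
    5.4: 2, 5.6: 3, 5.7: 4, 5.8: 8, 5.9: 1,
    6.0: 1, 6.1: 4, 6.2: 4, 6.4: 2, 6.7: 2, 6.9: 3,
    7.0: 3, 7.3: 1, 7.4: 2, 7.8: 2, 7.9: 1,
    8.0: 3, 8.1: 1, 8.3: 1, 8.4: 3, 8.5: 1,
}
PH = [value for value, count in PH_COUNTS.items() for _ in range(count)]


def p_order_groups(values):
    """Peel a univariate sample: group i holds the current minimum and maximum."""
    xs = sorted(values)
    lo, hi = 0, len(xs) - 1
    groups = []
    while lo < hi:
        groups.append((xs[lo], xs[hi]))
        lo, hi = lo + 1, hi - 1
    if lo == hi:  # odd sample size: a single point is left over
        groups.append((xs[lo],))
    return groups


def interquartile_bracket(groups, n):
    """Return (k, C_k, C_{k+1}); k is the last hull holding at least half the points.

    Points are counted by peel membership: hull C_i is taken to hold every
    point that belongs to groups i, i+1, ...
    """
    inside = n  # number of points on or inside hull C_1
    k = 0
    for i, group in enumerate(groups, start=1):
        if 2 * inside >= n:
            k = i
        inside -= len(group)  # peel group i to get the count for C_{i+1}
    next_hull = groups[k] if k < len(groups) else None
    return k, groups[k - 1], next_hull


groups = p_order_groups(PH)
print(groups[0], groups[1], groups[-1])
print(interquartile_bracket(groups, len(PH)))
```

In one dimension each hull is simply an interval, so the last group gives the sample median and the pair of hulls returned by `interquartile_bracket` brackets the interquartile set; the interpolation step mentioned in the text is left out.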
The paper by Loftsgaarden and Quesenberry cited by Professor Barnett gives a simple and workable non-parametric estimator for a multivariate density function which can be used to exploit the idea of probability contours even when the underlying distribution is unknown. The suggestion here is therefore that when analysing data, each point may usefully be assigned both its P-order group index and its density, or some estimate thereof. Such information might give useful insight into the configuration of the data and the possible existence of outliers and clusters. It also provides a set of bivariate data for analysis in its own right!

Dr D. H. YOUNG (Brunel University): In his interesting review of ordering procedures for multivariate data, Professor Barnett mentions the particular form of C-ordering in which the observations $x_1, \ldots, x_n$ are ordered by reference to linear combinations of the ordered component values. A possible use for this type of ordering occurs when the $\{x\}$ are independent multinomial observations and interest is centred on ranking the n multinomial distributions with common index on the basis of a linear function of their ordered cell probabilities. For example, one might wish to select the multinomial distribution with the smallest range of cell probabilities or the one with the maximum cell probability. The usual indifference zone or subset selection procedures could be considered and their use would require a study of the distribution properties of min max(xi, -xt) and other similar order statistics.

The author replied in writing, as follows:

I should like to thank all the contributors to the discussion for their kind remarks, helpful comments and interesting proposals. It is not possible to deal with all the points in detail, but some useful comment can be made.
Professor Plackett's summary observations on the historical development of order statistics are fascinating; it would be nice to hear more about this matter. I do not, however, share his view that inference procedures based on order statistics are rendered "largely obsolete" by the advent of the computer. The acclaimed computational simplicity of order statistics methods has always struck me as a delusion. Linear forms are simple in principle but the determination of the appropriate weighting factors often involves tedious calculation made feasible only by powerful computers. The representational role of order statistics, the primary use of order or rank in the construction of distribution-free statistical methods, probability plotting techniques (particularly in examination of large data sets) all seem to have continuing practical relevance, reflected in an unabating flow of publications.

In more than one dimension we do of course lose "uniqueness and simplicity". My aim was to demonstrate that, in spite of this fundamental obstacle, the order concept is widely and variously represented in multivariate work, extending far beyond multivariate permutation tests! It was not my intention to judge the propriety of such usage, merely to report it. I hoped that the fourfold classification provided a reasonable basis for indicating distinctions of attitude and objective. I feel that Professor Plackett's dichotomy of ordering principle (analytical, geometrical) is tidier and more mathematically objective, but it does not so easily distinguish basic emphases. For example, whilst M- and R-ordering are analytical they tend to distinguish approaches concerned with limited or overall ordering aims, respectively. Indeed, the extended sub-classification of R-ordering suggested by Professor Mardia is appealing as a framework for further distinguishing operational objectives.

Professor Mardia provides some interesting results with a useful set of additional references. The familiar relationship between $S_t$ and $D^2$ is important in what it implies about the Wilks multivariate outlier rejection test, at least for an underlying normal distribution. The proposal to test outliers by examining minimum scatter ratios was advanced by Wilks on an entirely intuitive argument. But any test of outliers must involve some concept of order as a basis for a declaration that certain observations are "extreme". In the normal case with a location slippage alternative hypothesis we can set up a maximum likelihood ratio test for outliers which leads to identifying as outliers observations for which the value $D^2$ is large; the outlier is adjudged discordant if $D^2$ is sufficiently large. Thus the implicit ordering basis is in terms of values of the distance metric $D^2$.
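As an illustration of ordering by a distance metric, the following Python sketch ranks a small bivariate sample by squared Mahalanobis distance from the sample mean. It is a minimal sketch under assumptions of ours, not the test discussed in the text: we take $D^2$ to be computed with the sample mean and the sample covariance matrix (divisor n - 1), the function names `mahalanobis_sq` and `d2_order` are hypothetical, and `SAMPLE` is made-up data with one planted outlier.

```python
def mahalanobis_sq(points):
    """Squared Mahalanobis distance of each bivariate point from the sample mean."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    dev = [(x - mx, y - my) for x, y in points]
    # Sample variances and covariance (divisor n - 1).
    sxx = sum(dx * dx for dx, _ in dev) / (n - 1)
    syy = sum(dy * dy for _, dy in dev) / (n - 1)
    sxy = sum(dx * dy for dx, dy in dev) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("sample covariance matrix is singular")
    # Quadratic form d' S^{-1} d, written out for the 2 x 2 case.
    return [
        (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        for dx, dy in dev
    ]


def d2_order(points):
    """Indices of the points, from least to most extreme in terms of D^2."""
    d2 = mahalanobis_sq(points)
    return sorted(range(len(points)), key=d2.__getitem__)


# Hypothetical data: five points near a line and one point well off it.
SAMPLE = [(0, 0), (1, 1.1), (2, 1.9), (3, 3.2), (4, 3.9), (2, 6)]

print(d2_order(SAMPLE))
```

The last index in the returned order is the candidate outlier; whether it is adjudged discordant would depend on a critical value for the largest $D^2$, which this sketch does not supply.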
But the relationship between $D^2$ and $S_t$ reveals that the Wilks test is also a maximum likelihood ratio test (for normal data and a location slippage alternative) and is again using $D^2$ as an ordering basis.

Professor Mardia, Dr Huitson and others essentially ask for some form of comparative assessment of the different sub-ordering principles. This is not feasible at the moment; individual principles figure in isolation (and often only implicitly) in different aspects of multivariate analysis. Part of the aim in airing this topic was to try to encourage more specific intercomparisons to be made; perhaps we might see some results arise in this area.

Dr Paddle's two practical examples well illustrate types of situation in which appropriate multivariate outlier methods need to be applied. Multivariate outliers often do not show up merely in the margins of the data.

Professor Barnard's description of the robust multivariate location estimator of Morven Gentleman is interesting. We must hope to see more of its credentials. In reply to Professor Walker's enquiry about tied values in discrete data, I can offer no useful information.

Many of the contributors to the discussion comment on the convex hull ordering proposed in Section 2.3. It has the disadvantages of probabilistic and manipulative complexity, as remarked by Sir Maurice Kendall, but it does have an appealing directness in its entirely data-oriented emphasis. The associated notions of median set and interquartile set, proposed by Dr Seheult, Mr Diggle and Dr Evans, are intriguing and surely merit further study. They comment, as does Dr Green, on the rather wide degree of invariance of c-order groups in respect of transformations of the co-ordinate basis.

Most of the discussion centres on the fundamental problems of transferring order concepts from one, to higher, dimension. Frequently the question of invariance of ordering arises. Professor Loynes points to many difficulties in extending order-invariance to the multidimensional situation, even at the level of expressing the notion of monotone transformations: a point taken up also by Professor Lewis. Sir Maurice Kendall and Dr Green also comment on the invariance issue, the former urging caution with respect to ordering principles which are not scale invariant, the latter insisting on an invariance under uniform change of scale and possibly also under translation. Most agree, however, that rotation-invariance can hardly be realistically demanded, and Professor Lewis points out that we do not experience a very wide degree of invariance even in one-dimensional ordering.

Other fundamental distinctions are drawn. Professor Downton, Dr Green, Professor Lewis and Mr Robinson all suggest that different interests dictate different ordering principles, depending on whether we assume some underlying probability model, wish to order in relation to the sample space or to directly order the data themselves. I entirely agree, and my writing of Section 2 was much influenced by such considerations but perhaps I did not sufficiently stress them. Dr Green's proposal that orderings might be based on an "adjacency" principle has some interest particularly in relation to outlier identification. Indeed Rohlf (1975) develops a "gap test" for outliers based on the minimum spanning tree.

Professor Downton's request for an example of the type of C-ordering proposed at the end of Section 4.1 has been met by the examples described by Dr Young. Additionally, we have the variety of non-parametric slippage tests. A simple case would involve ordering the values in each of a set of samples, and then ordering the samples in terms of their extreme values.

In the paper, and in the most interesting discussion, much has been said about both general principle, and detail, in multivariate ordering. A great many difficulties exist. But it seems that in the final resort we cannot avoid the point made by Professor Lewis. Whether we like it or not practical problems inevitably involve the ordering of multivariate data, and there remains the need for much more statistical investigation of this topic.
REFERENCES IN THE DISCUSSION

AJNE, B. (1968). A simple test for uniformity of a circular distribution. Biometrika, 55, 343-354.
COX, D. R. (1968). Notes on some aspects of regression analysis. J. R. Statist. Soc. A, 131, 265-279.
DAVID, H. A. (1970). Order Statistics. New York: Wiley.
EDWARDS, A. W. F. (1971). Distances between populations on the basis of gene frequencies. Biometrics, 27, 873-881.
GALAMBOS, J. (1972). On the distribution of the maximum of random variables. Ann. Math. Statist., 43, 516-521.
GALTON, F. (1889). Natural Inheritance. London: Macmillan.
LAPLACE, P. S. (1820). Théorie Analytique des Probabilités, 3rd edn. Paris: Courcier.
LEVIN, G. E., MCPHERSON, C. K., FRASER, P. M. and BARON, D. N. (1973). Long term variation in plasma activities of aspartate transaminase and alkaline phosphatase in health. Clin. Sci., 44, 185-196.
MARDIA, K. V. (1967a). A non-parametric test for the bivariate two-sample location problem. J. R. Statist. Soc. B, 29, 320-342.
MARDIA, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57, 519-530.
MARDIA, K. V. (1975a). Mahalanobis distances and angles. In Proc. 4th Int. Symp. Multivar. Anal. (P. R. Krishnaiah, ed.). Ohio: D. Reidel Publishing Co.
MARDIA, K. V. (1975b). Assessment of multinormality and the robustness of Hotelling's T2 test. Appl. Statist., 24, 163-171.
NIE, N. H., HULL, C. H., JENKINS, J. G., STEINBRENNER, K. and BENT, D. H. (1975). Statistical Package for the Social Sciences, 2nd edn. New York: McGraw-Hill.
O'BRIEN, G. L. (1974). Limit theorems for the maximum term of a stationary process. Ann. Prob., 2, 540-545.
PEARSON, E. S. (1935). The Application of Statistical Methods to Industrial Standardization and Quality Control. London: British Standards Institution.
STIGLER, S. M. (1973). Laplace, Fisher and the discovery of the concept of sufficiency. Biometrika, 60, 439-445.
"STUDENT" (1927). Errors of routine analysis. Biometrika, 19, 151-164.
TIPPETT, L. H. C. (1925). On the extreme individuals and the range of samples taken from a normal population. Biometrika, 17, 364-387.

As a result of the ballot held during the meeting, the following were elected Fellows of the Society.

ABBESS, Christopher R.; ADAMSON, Sir Campbell; ALIS, David Michael B.; BARRATT, Katherine M.; BEAMISH, Neil T.; BECKETT, James, III; CAPELIN, Howard J.; CHANG, Kar Y.; CLARKE, Ralph T.; COKER, Jonah B.; DAYKIN, Christopher D.; DELAHUNTY, Paul J.; DIMOU, Theodore; DUNN, Douglas M.; DUNSTAN, Frank D. J.; FENYO, Andrew J.; FERRIS, David; FISHER, William J.; GARNER, Janet P.; GARNSWORTHY, John;
FRAGIADAKI-SALTAVAREAS, Hellas-Maria; GIBSON, Robert F.; GIPPS, Peter G.; GOODWIN, Jennifer A. G.; GREEN, John L.; HIGGINS, Joseph; HOUSTON, Alastair G. S.; HUGHES, Philip G.; JONES, David H.; KADANE, Joseph B.; KENT, John T.; KNIGHT, John F.; KOKOLAKIS, George; LAVALLE, Irving H.; LEADBETER, Deana M.; MACFARLANE, Sarah B. J.; McGILL, Peter R.; MASTERS, Anthony W.; MATTHEWS, David E.; MERCHANT, John R.; MOHAMED, Abdalla E.;
NARAIN, Hugh H.; NAYLOR, John C.; NEWELL, Robert; OGLE, Ian F.; OKUSANYA, Adedayo; PATER, John R.; RIGBY, Michael J.; RILEY, Patrick H. B.; RUBIN, Donald; RUST, John N.;
SADAT, Ali N.; SAVAGE, I. R.; SANDILANDS, Douglas W.; SHAW, John E. H.; SKEGG, Joy L.; SPIEGELHALTER, David J.; STEIN, George J.; STEVENSON, Michael R.; STUBBS, Peter A.; SWAITHES, Gillian A.; SZYMANKIEWICZ, Jan Z.;
TANG, Victor K. T.; TEICHMAN, Robert; TRIGGS, Christopher M.; UDOFIA, Godwin A.; WEIR, Bruce S.; WHITE, Patricia A.; WHYNES, David K.; WINTER, Paul D.; ZAFAR YAB, Muhammad

0.

RUSTON, Paul K.
