
Estimates of the Regression Coefficient Based on Kendall's Tau

Author(s): Pranab Kumar Sen


Source: Journal of the American Statistical Association, Vol. 63, No. 324 (Dec., 1968), pp. 1379-1389
Published by: American Statistical Association

ESTIMATES OF THE REGRESSION COEFFICIENT BASED ON KENDALL'S TAU*

PRANAB KUMAR SEN
University of North Carolina, Chapel Hill

* Work supported by the Army Research Office, Durham, Grant DA-ARO-D-31-124-G746.
The least squares estimator of a regression coefficient β is vulnerable to gross errors, and the associated confidence interval is, in addition, sensitive to non-normality of the parent distribution. In this paper, a simple and robust (point as well as interval) estimator of β based on Kendall's [6] rank correlation tau is studied. The point estimator is the median of the set of slopes (Y_j - Y_i)/(t_j - t_i) joining pairs of points with t_i ≠ t_j, and is unbiased. The confidence interval is also determined by two order statistics of this set of slopes. Various properties of these estimators are studied and compared with those of the least squares and some other nonparametric estimators.

1. INTRODUCTION

Let Y_1, ..., Y_n be n independent random variables with distributions

P{Y_i ≤ x} = F_i(x) = F(x - α - βt_i),    i = 1, ..., n,    (1.1)

where F(x) is a continuous cumulative distribution function (cdf), t_1, ..., t_n are known constants (not all equal) and (α, β) are unknown parameters. Our purpose is to consider point as well as interval estimators of the regression coefficient β. If F(x) has a finite variance σ^2(F), the best (i.e., minimum variance unbiased) linear estimator of β is provided by the method of least squares. This estimator is vulnerable to gross errors and is also inefficient for distributions with 'heavy tails' (e.g., the double exponential or logistic cdf). Moreover, the associated confidence interval for β, being based on the assumed normality of F(x), is sensitive in small samples to any departure from this assumption. Alternative estimators of β based on suitable rank tests are proposed by Mood and Brown [8], Theil [12] and Adichie [1], among others. Mood and Brown propose to estimate α and β simultaneously from the two equations

Median(Y_i - α - βt_i) = 0    for t_i ≤ t_M,
Median(Y_i - α - βt_i) = 0    for t_i > t_M,    (1.2)

where t_M is the median of t_1, ..., t_n. The point estimate (α̂, β̂) is to be obtained by a trial and error solution and is subject to some arbitrariness when t_M is not uniquely defined (a case that may arise when t_1, ..., t_n are not all distinct). Moreover, β̂ is usually inefficient as compared to the other estimators (cf. [1]). A general class of point estimators of β (and also of α) is considered by Adichie [1]. However, his basic assumption that F(x) is an absolutely continuous and symmetric distribution function with an absolutely continuous and square integrable density function is more restrictive than what is really needed in this paper. Moreover, his point estimators of β also require trial and error solutions. Such a trial and error procedure may indeed be quite laborious when n is not very small. Finally, Adichie gives no confidence interval for β.


When t_1, ..., t_n are all distinct, Theil [12] proposes a very simple point estimator of β, viz., the median of the \binom{n}{2} slopes (Y_j - Y_i)/(t_j - t_i), 1 ≤ i < j ≤ n. He also obtains a corresponding confidence interval for β in terms of these slopes. However, the asymptotic properties of the estimators are not studied by him. The procedure to be considered in the present paper is quite analogous to Theil's, but is based on weaker assumptions and does not require t_1, ..., t_n to be all distinct. If N is the number of non-zero differences t_j - t_i (1 ≤ i < j ≤ n), the proposed point estimator is the median of the N slopes (Y_j - Y_i)/(t_j - t_i) for which t_i ≠ t_j. This is shown to be unbiased for β. The confidence interval for β is also obtained in terms of two order statistics of this set of N slopes. It is shown that the point and interval estimators of the location difference in the two-sample case based on the Wilcoxon test, proposed and studied by Hodges and Lehmann [4], Lehmann [7] and Sen [10, 11], are special cases of the estimators considered here. Properties of the estimators such as invariance, unbiasedness and asymptotic distribution are studied, and the asymptotic relative efficiency (A.R.E.) of the proposed procedure with respect to the least squares procedure and Adichie's [1] procedures is discussed. It is shown that for equally spaced values of t_1, ..., t_n, or for the two-sample problem (i.e., when the t_i's can take only two values), the proposed estimator has A.R.E. never less than 0.864 with respect to the least squares estimator, though such a conclusion is not necessarily true when t_1, ..., t_n are not equally spaced.
2. FORMULATION OF THE ESTIMATORS

Without any loss of generality we may assume that t_1 ≤ t_2 ≤ ... ≤ t_n; they are already assumed to be not all equal. We define c(u) to be 1, 0, or -1 according as u is >, =, or < 0. Let then

N = \sum_{1 ≤ i < j ≤ n} c(t_j - t_i),    (2.1)

i.e., N is the number of positive differences t_j - t_i, so that N ≤ \binom{n}{2}, where the equality sign holds only when t_1, ..., t_n are all distinct. For any real b, define Z_i(b) = Y_i - bt_i, i = 1, ..., n. We then consider the following statistic, basically related to Kendall's [6] tau between t_i and Z_i(b), i = 1, ..., n:

U_n(b) = \{N\binom{n}{2}\}^{-1/2} \sum_{1 ≤ i < j ≤ n} c(t_j - t_i)\,c(Z_j(b) - Z_i(b)).    (2.2)

Thus, \{N\binom{n}{2}\}^{1/2}U_n(b) is the difference-sign score that would appear in the numerator of the tau coefficient of correlation between the t_i and the (Y_i - bt_i), for some fixed b. Since t_j ≥ t_i for all i < j, Z_j(b) - Z_i(b) is non-increasing in b for all 1 ≤ i < j ≤ n. Hence, from (2.2) it follows that U_n(b) is also non-increasing in b. Now, by definition, Z_1(β), ..., Z_n(β) are n independent and identically distributed random variables having the cdf F(x - α), independent of t_n = (t_1, ..., t_n). Consequently, U_n(β) will be an estimator of 0, and will be stochastically small. In fact, U_n(β) is a strictly distribution-free statistic having a distribution symmetric about 0 (cf. [6]). Thus one way of estimating β is to make U_n(b) (by a proper choice of the estimator b) as close to zero as possible. Since U_n(b) is non-increasing in b, there will be a half-open interval


(in b) for which U_n(b) will be equal to zero. The mid-point of this interval suggests itself as a natural estimate of β. Mathematically, we define the estimator as follows. Let

β_1^* = Sup{b: U_n(b) > 0},    β_2^* = Inf{b: U_n(b) < 0}.    (2.3)

Then, our proposed estimator is

β^* = ½(β_1^* + β_2^*).    (2.4)
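For concreteness, the quantities in (2.1)-(2.2) can be computed directly from their definitions. The following minimal Python sketch is not part of the original paper; the function names are our own, and the data are assumed to be given as sequences t and y of equal length.

```python
from itertools import combinations
from math import comb, sqrt

def c(u):
    # c(u) of Section 2: 1, 0, or -1 according as u > 0, u = 0, or u < 0
    return (u > 0) - (u < 0)

def U_n(b, t, y):
    """Kendall-type statistic U_n(b) of (2.2), based on the residuals Z_i(b) = Y_i - b*t_i."""
    n = len(t)
    pairs = list(combinations(range(n), 2))
    N = sum(1 for i, j in pairs if t[i] != t[j])          # number of pairs with t_i != t_j, as in (2.1)
    z = [yi - b * ti for ti, yi in zip(t, y)]
    score = sum(c(t[j] - t[i]) * c(z[j] - z[i]) for i, j in pairs)   # the difference-sign score
    return score / sqrt(N * comb(n, 2))
```

Since each Z_j(b) - Z_i(b) with t_j > t_i is non-increasing in b, U_n(b) computed this way is a non-increasing step function of b, which is what makes the Sup and Inf in (2.3) well defined.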

It may be noted that if, instead of working with Kendall's tau, we work with the sample covariance of Z_i(b) and t_i, i = 1, ..., n, we will obtain the least squares estimator

β̂ = \sum_{i=1}^n (Y_i - Ȳ_n)(t_i - t̄_n) / \sum_{i=1}^n (t_i - t̄_n)^2,

where Ȳ_n = (1/n)\sum_{i=1}^n Y_i and t̄_n = (1/n)\sum_{i=1}^n t_i. An explicit formula for β^* will be considered in section 3.
To construct a confidence interval for β based on U_n(b), we again note that U_n(β) is a distribution-free statistic having a distribution symmetric about 0. Hence, depending on the sample size n, we can always select (U_n^*, ε_n) such that

P{-U_n^* < U_n(β) < U_n^* | β} = 1 - ε_n,    (2.5)

where 0 < ε_n < 1. For small values of n (say, n < 10), we may use Table 1 of Kendall [6, p. 171] to find appropriate values of U_n^* and ε_n. For large sample sizes, we adopt the following procedure. Let t_n be composed of a_n (≥ 2) distinct sets of elements, where in the ith set there are u_i elements which are all equal, for i = 1, ..., a_n. We define

V_n = (1/18)\{n(n - 1)(2n + 5) - \sum_{j=1}^{a_n} u_j(u_j - 1)(2u_j + 5)\}.    (2.6)

Thus V_n is the variance of \{N\binom{n}{2}\}^{1/2}U_n(β) with the standard correction for tied observations, in the form that applies when there are ties in only one variable (viz., t). Also, let τ_ε be the upper 100ε% point of the standard normal distribution. Then, from the results of Kendall [6] and Hoeffding [5], we obtain that

U_n^* ≅ τ_{ε_n/2}\{V_n/[N\binom{n}{2}]\}^{1/2},    where ε_n → ε as n → ∞.    (2.7)
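For large samples, (2.6) and (2.7) can be evaluated as in the following sketch (again with function names of our own; the normal quantile comes from the Python standard library, and the approximation is only as good as the normal limit behind (2.7)).

```python
from collections import Counter
from math import comb, sqrt
from statistics import NormalDist

def V_n(t):
    """Tie-corrected variance (2.6) of the raw difference-sign score {N*C(n,2)}**0.5 * U_n(beta)."""
    n = len(t)
    tie_sizes = Counter(t).values()                     # group sizes u_1, ..., u_{a_n}
    return (n * (n - 1) * (2 * n + 5)
            - sum(u * (u - 1) * (2 * u + 5) for u in tie_sizes)) / 18.0

def U_n_star(t, eps):
    """Large-sample U_n^* of (2.7): tau_{eps/2} * {V_n / [N * C(n,2)]}**0.5."""
    n = len(t)
    N = sum(1 for i in range(n) for j in range(i + 1, n) if t[i] != t[j])   # N of (2.1)
    tau = NormalDist().inv_cdf(1.0 - eps / 2.0)         # upper 100(eps/2)% standard normal point
    return tau * sqrt(V_n(t) / (N * comb(n, 2)))
```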

Let us now define

β_L^* = Sup{b: U_n(b) > U_n^*},    β_U^* = Inf{b: U_n(b) < -U_n^*}.    (2.8)

From (2.5) and (2.8), we arrive at the following:

P{β_L^* < β < β_U^* | β} = 1 - ε_n,    (2.9)


which is our proposed confidence interval for β, having the confidence coefficient 1 - ε_n (≅ 1 - ε for large n). (2.9) provides an exact confidence interval with confidence coefficient 1 - ε_n for all unknown (but continuous) F(x), no matter whether the normality and the finiteness of the variance of F(x) hold or not. The exact expressions for β_L^* and β_U^* are considered in the next section.

3. EXACT EXPRESSIONS FOR THE ESTIMATORS

We recall that among the \binom{n}{2} values of (t_j - t_i), 1 ≤ i < j ≤ n, only N (defined by (2.1)) values are non-zero, and only the corresponding values of Z_j(b) - Z_i(b) contribute to U_n(b) in (2.2). We now consider the set S of N distinct pairs (i, j) for which t_j > t_i, and define

X_{ij} = (Y_j - Y_i)/(t_j - t_i),    (i, j) ∈ S.    (3.1)

Thus, the X_{ij}'s are the slopes of the lines connecting each pair of points (t_i, Y_i) and (t_j, Y_j) where t_i ≠ t_j; the pairs of points for which t_i = t_j are not considered. It will be seen that the N quantities in (3.1) define both the point and interval estimators. To do this, we arrange the N values in (3.1) in ascending order of magnitude and denote the rth smallest value by X_{(r)} for r = 1, ..., N. Then, looking at (2.2), we observe that if we compute the value of U_n(X_{(r)}), then (r - 1) of the differences Z_j(X_{(r)}) - Z_i(X_{(r)}) (for which (i, j) ∈ S) will be negative, (N - r) will be positive and the remaining one will be exactly equal to 0. As such, U_n(X_{(r)}) will be equal to (N - 2r + 1)/\{N\binom{n}{2}\}^{1/2}. Similarly, \{N\binom{n}{2}\}^{1/2}U_n(X_{(r)}^+) will be equal to (N - 2r), where X^+ (or X^-) indicates that the value is just greater than (or less than) X. Now, we write N = 2M or 2M + 1 according as N is even or odd. For N = 2M + 1, we observe that U_n(X_{(M+1)}) = 0, while U_n(X_{(M+1)}^-) > 0 and U_n(X_{(M+1)}^+) < 0. Similarly, for N = 2M, it follows that U_n(b) = 0 for any b in the open interval (X_{(M)}, X_{(M+1)}), while it is positive or negative according as b is < X_{(M)} or > X_{(M+1)}. Hence, from (2.3) and (2.4), we obtain

β^* = X_{(M+1)}    for N = 2M + 1,
β^* = ½(X_{(M)} + X_{(M+1)})    for N = 2M.    (3.2)
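Equation (3.2) reduces the point estimator to the median of at most \binom{n}{2} pairwise slopes, so no search over b is required. A minimal sketch (the function name is ours; the paper itself only gives the formula):

```python
from itertools import combinations
from statistics import median

def sen_slope(t, y):
    """Point estimator beta* of (3.2): the median of the slopes X_ij of (3.1)."""
    slopes = [(y[j] - y[i]) / (t[j] - t[i])
              for i, j in combinations(range(len(t)), 2)
              if t[i] != t[j]]                  # the set S: pairs with tied t's are skipped
    return median(slopes)                       # statistics.median averages the two middle values
                                                # when N = 2M, exactly as required by (3.2)
```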

Thus β^* is the median of the N numbers {X_{ij}: (i, j) ∈ S}. To obtain the expressions for β_L^* and β_U^*, we let

N^* = \{N\binom{n}{2}\}^{1/2}U_n^*    and    M_i = ½\{N + (-1)^i N^*\}    for i = 1, 2,    (3.3)

where U_n^* is defined by (2.5). From (2.8), (3.3) and the observations made above, it follows that U_n(X_{(M_1)}) = (N^* + 1)/\{N\binom{n}{2}\}^{1/2} > U_n^*, but U_n(X_{(M_1)}^+) = N^*/\{N\binom{n}{2}\}^{1/2} = U_n^*. Hence β_L^* = X_{(M_1)}. Similarly, β_U^* = X_{(M_2+1)}. Hence,

P{X_{(M_1)} < β < X_{(M_2+1)} | β} = 1 - ε_n.    (3.4)
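In the same spirit, the interval (3.4) needs only the ordered slopes and the integer N^* of (3.3); U_n^* itself is taken from Kendall's table for small n or from (2.7) for large n. A sketch under these assumptions (names again our own):

```python
from itertools import combinations
from math import comb, sqrt

def sen_interval(t, y, u_star):
    """Confidence interval (X_(M1), X_(M2+1)) of (3.4), given U_n^* of (2.5)."""
    n = len(t)
    slopes = sorted((y[j] - y[i]) / (t[j] - t[i])
                    for i, j in combinations(range(n), 2) if t[i] != t[j])
    N = len(slopes)
    N_star = int(round(u_star * sqrt(N * comb(n, 2))))   # N* of (3.3)
    M1 = (N - N_star) // 2                               # M_1 of (3.3)
    M2 = (N + N_star) // 2                               # M_2 of (3.3)
    return slopes[M1 - 1], slopes[M2]                    # X_(M1) and X_(M2+1) (1-based order statistics)
```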

It may be noted that the classical least squares estimator β̂, defined just after (2.4), can also be expressed as a linear function of the slopes {X_{ij}: (i, j) ∈ S}. In fact, β̂ is a weighted mean of the variables X_{ij} with weights equal to (t_j - t_i)^2, whereas β^* is the median of the same set of variables. Since


the median is less affected by gross errors or outliers than a weighted average, it follows that β^* will be more robust than β̂.

We also note that the two-sample location problem (cf. [4, 7, 10, 11]) is a special case of the general regression problem studied here. In this case, t_1 = ... = t_{n_1} = 0 and t_{n_1+1} = ... = t_{n_1+n_2} = 1 (where n = n_1 + n_2, n_1 < n). Thus, N = n_1 n_2 and β^* is the median of the n_1 n_2 differences (Y_j - Y_i), j = n_1 + 1, ..., n_1 + n_2, i = 1, ..., n_1. Also, β_L^* and β_U^* are defined as the M_1th and (M_2 + 1)th order statistics of these n_1 n_2 differences, where M_1 and M_2 are defined by (3.3) and are based on the Wilcoxon two-sample test (cf. [7, 10, 11]).
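In this two-sample case the sketch given after (3.2) specializes to the familiar Hodges-Lehmann estimator: with t_i = 0 for the first sample and t_i = 1 for the second, the N = n_1 n_2 'slopes' are just the pairwise differences Y_j - Y_i. An illustrative usage (the numbers are made up, and sen_slope is the hypothetical function introduced earlier):

```python
sample_1 = [1.1, 2.3, 0.7, 1.9]                  # illustrative values only
sample_2 = [2.0, 3.1, 2.8, 3.5, 2.2]

t = [0] * len(sample_1) + [1] * len(sample_2)    # two-sample design: t takes only the values 0 and 1
y = sample_1 + sample_2

hodges_lehmann = sen_slope(t, y)                 # median of the n1*n2 differences Y_j - Y_i
```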

4. AN ILLUSTRATIVE EXAMPLE

We consider the following data from Graybill [3, pp. 119-120], also considered by Adichie [1]:

t_i:  1   2   3   4  10  12  18
y_i:  9  15  19  20  45  55  78

The least squares estimate of β is 4.02. Since all the t_i's are distinct, N = \binom{7}{2} = 21. The values of X_{ij} defined by (3.1) are obtained as (in ascending order)

1, 2.5, 2.88, 3.67, 3.71, 3.75, 3.88, 3.93, 3.94, 4, 4, 4, 4, 4.06, 4.14, 4.18, 4.25, 4.75, 5, 5, 6.

Thus the point estimate of β is X_{(11)} = 4, which is the same value obtained by Adichie [1]. He has, however, employed a trial and error procedure for the computation of his estimator, as the exact expression in (3.2) is not applicable in his case (cf. [1, section 3]).

Now, from Table 1 of Kendall [6, p. 171], we observe that, corresponding to a value of ε_n = 0.07, the value of U_n^* in (2.5) is equal to 11/21. Thus, from (3.3), we obtain that N^* = 11, M_1 = 5 and M_2 = 16. Consequently, from (3.4), the open interval (3.71, 4.18) provides a 93% confidence interval for β, valid for all continuous F(x).
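With the hypothetical functions sketched in Section 3, the point estimate for these data is reproduced directly, and the interval follows from the tabulated U_n^* = 11/21; since the slopes printed above are rounded, the last digits of the computed endpoints need not coincide exactly with the quoted values.

```python
t = [1, 2, 3, 4, 10, 12, 18]
y = [9, 15, 19, 20, 45, 55, 78]

beta_star = sen_slope(t, y)                      # 4.0, the point estimate quoted in the text
lower, upper = sen_interval(t, y, 11 / 21)       # 93% interval from the 5th and 17th ordered slopes
```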
5. REGULARITY PROPERTIES OF THE ESTIMATORS

I. Invariance. We note that if we define W_i = c_1 + c_2 Y_i and s_i = d_1 + d_2 t_i, i = 1, ..., n (where c_2 and d_2 are different from 0), the regression parameter of W on s will be equal to (c_2/d_2)β. It is easy to verify that, like the least squares estimator, the point estimator β^* in (2.4) satisfies this relation. The estimators β_L^* and β_U^* in (2.8) also satisfy this condition, and as a result the confidence interval in (2.9) may be regarded as invariant under linear transformations on the variables. Let us write the point estimator in (2.4) as β^*(Y_n, t_n) to denote its dependence on Y_n = (Y_1, ..., Y_n) and t_n = (t_1, ..., t_n). Then, it readily follows from (2.2), (2.3) and (2.4) that

β^*(Y_n + a t_n, t_n) = β^*(Y_n, t_n) + a    for all real a.    (5.1)

The same invariance relation is also satisfied by β_L^* and β_U^* in (2.8), and as a result the confidence interval in (2.9) is also invariant in the above sense. Again, by a straightforward generalization of the proof of Theorem 1 of [4],

it can be shown that if F(x) is continuous (or absolutely continuous) then so are the cdfs of all the statistics β_1^*, β_2^*, β^*, β_L^*, β_U^* and β_U^* - β_L^*.
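The regression-equivariance property (5.1) is easy to confirm numerically for any data set; the following check (illustrative values, with the hypothetical sen_slope of Section 3) holds up to floating-point error because every slope X_ij is shifted by exactly a.

```python
import random

random.seed(0)
t = [1, 2, 2, 5, 7, 11]                                   # ties in t are allowed
y = [0.4 + 0.8 * ti + random.gauss(0, 1) for ti in t]

a = 2.5
y_shifted = [yi + a * ti for yi, ti in zip(y, t)]
assert abs(sen_slope(t, y_shifted) - (sen_slope(t, y) + a)) < 1e-9    # property (5.1)
```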
II. Unbiasedness. We have the following theorem establishing this property of β^*.

Theorem 5.1. The distribution of β^* is symmetric about the true parameter β.

Proof. By virtue of (5.1), we may assume without any loss of generality that β = 0. Rewriting U_n(0) as U(Y_n, t_n), we have from (2.2) that U(-Y_n, t_n) = -U(Y_n, t_n). Also, for β = 0, U_n(0) has a distribution symmetric about 0 (cf. Kendall [6, p. 68]). Hence, U(Y_n, t_n) and U(-Y_n, t_n) have the same distribution. Also, from (2.3) and (2.4), we obtain that β^*(Y_n, t_n) = -β^*(-Y_n, t_n). Hence, the distribution of β^*(Y_n, t_n), being the same as that of -β^*(Y_n, t_n), is also symmetric about 0, the assumed value of β. Q.E.D.
III. Validity when both variables are subject to errors. We consider here the more general case, in which t_n is not observable and the observable (random) variable is W_n = (W_1, ..., W_n), where W_i = t_i + v_i, i = 1, ..., n. It is assumed that Y_i = α + βt_i + e_i, where (e_i, v_i) are stochastically independent, for i = 1, ..., n. Thus, having observed (Y_i, W_i), i = 1, ..., n, we want to estimate β. Theil [12] considered this problem under the assumptions that (i) P{|v_i| > g_i} = 0 for some finite g_i (> 0), (ii) |t_i - t_j| > g_i + g_j for all i ≠ j, and (iii) the random variables E_i = e_i - βv_i, i = 1, ..., n, are all independent and identically distributed. Under these assumptions, P{W_i ≠ W_j for all i ≠ j} = 1. Thus the W_i's occur in the same order as the t_i's, and we can take (with probability 1) W_1 < W_2 < ... < W_n, so that N, defined by (2.1), is equal to \binom{n}{2}. Hence, defining U_n(b) as in (2.2), with the t_i's replaced by the W_i's, we obtain that here, with probability 1,

U_n(β) = \binom{n}{2}^{-1} \sum_{1 ≤ i < j ≤ n} c(Y_j - Y_i - β(W_j - W_i)),    (5.2)

which is symmetrically distributed about 0. Consequently, proceeding as in Theorem 5.1, we may conclude that the estimate β^* in (3.2) (with the t_i's replaced by the W_i's) is unbiased for β. The invariance property also holds in this case; a small simulation illustrating this robustness to measurement error in t is sketched below. Other properties of the estimator are considered in the next section.
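The following simulation sketch is purely illustrative (our own parameter choices, not from the paper): the t_i are spaced more than 2g apart and the measurement errors v_i are bounded by g, so Theil's conditions (i)-(ii) hold and the observed W_i preserve the ordering of the t_i.

```python
import random

random.seed(1)
alpha, beta = 1.0, 2.0
g = 0.4                                           # bound on |v_i|; the spacing of t exceeds 2g
t_true = [3 * i for i in range(1, 11)]

estimates = []
for _ in range(2000):
    v = [random.uniform(-g, g) for _ in t_true]   # bounded errors in the regressor
    e = [random.gauss(0, 1) for _ in t_true]
    w = [ti + vi for ti, vi in zip(t_true, v)]    # observed regressor W_i = t_i + v_i
    y = [alpha + beta * ti + ei for ti, ei in zip(t_true, e)]
    estimates.append(sen_slope(w, y))             # hypothetical function from the Section 3 sketch

# The empirical distribution of `estimates` should be (nearly) symmetric about beta = 2.0,
# reflecting the unbiasedness argued above.
```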
6. ASYMPTOTIC PROPERTIES OF THE ESTIMATORS

Here we shall consider (i) the asymptotic normality of the point estimator in (2.4), (ii) asymptotic properties of the confidence interval in (2.9), and (iii) the asymptotic relative efficiencies of the point and interval estimators with respect to the corresponding estimators based on the least squares principle. For this purpose we define V_n as in section 2, and let

T_n^2 = \sum_{i=1}^n (t_i - t̄_n)^2,    A_n^2 = (1/12)\{n(n^2 - 1) - \sum_{j=1}^{a_n} u_j(u_j^2 - 1)\},    (6.1)


where a_n and the u_j's are defined just before (2.6), and also let

ρ_n = \sum_{i=1}^n \{i - ½(n + 1)\}(t_i - t̄_n)/(T_n A_n).    (6.2)

That is, ρ_n is the product moment correlation coefficient between (t_1, ..., t_n) and (1, ..., n), as adjusted for ties. Finally, we assume that F(x) is absolutely continuous, having a continuous density function f(x) satisfying

B(F) = \int_{-∞}^{∞} f^2(x)\,dx < ∞.    (6.3)
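The design quantities in (6.1) and (6.2) depend only on the t_i and are simple to evaluate; the following sketch (our own function name) returns T_n^2, A_n^2 and ρ_n.

```python
from collections import Counter
from math import sqrt

def design_quantities(t):
    """T_n^2 and A_n^2 of (6.1) and rho_n of (6.2) for a design t_1 <= ... <= t_n."""
    t = sorted(t)
    n = len(t)
    t_bar = sum(t) / n
    T2 = sum((ti - t_bar) ** 2 for ti in t)                               # T_n^2
    A2 = (n * (n * n - 1)
          - sum(u * (u * u - 1) for u in Counter(t).values())) / 12.0     # A_n^2 (tie-corrected)
    rho = sum((i + 1 - (n + 1) / 2.0) * (ti - t_bar)
              for i, ti in enumerate(t)) / sqrt(T2 * A2)                  # rho_n of (6.2)
    return T2, A2, rho
```

With ties, summing i - ½(n + 1) within a tied group is the same as using the mid-rank R_j of Theorem 6.3 below, so the sketch agrees with the tie-adjusted definition.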

Then, we have the following two theorems, whose proofs are supplied in the Appendix. (In order to take care of the asymptotic situation we conceive of a sequence of sample sizes and a corresponding sequence of estimators, defined by (2.4), (2.8) and (2.9). We shall attach the suffix n to these estimators to denote such a sequence.)

Theorem 6.1. If (i) ρ_n is strictly positive and (ii) T_n → ∞ as n → ∞, then ρ_n T_n (β_n^* - β) has asymptotically a normal distribution with zero mean and variance 1/(12B^2(F)).

Theorem 6.2. Under the conditions of Theorem 6.1, ρ_n T_n (β_{U,n}^* - β_{L,n}^*) converges in probability to τ_{ε/2}/(√3 B(F)), where ε is the limiting value of ε_n, defined by (2.9).

We denote the sequence of least squares estimators by β̂_n and the allied confidence intervals (corresponding to the same confidence coefficient 1 - ε) by β̂_{L,n} < β < β̂_{U,n}. Then it is well known that (i) T_n(β̂_n - β) has asymptotically a normal distribution with 0 mean and variance σ^2(F) (where σ^2(F) is the variance of the cdf F(x)), and (ii) T_n(β̂_{U,n} - β̂_{L,n}) converges in probability to 2τ_{ε/2}σ(F). (In this connection, the reader may be referred to Eicker [2].)
Now, to study the asymptotic relative efficiency (A.R.E.) of β_n^* with respect to β̂_n, we compare the reciprocals of their asymptotic variances and obtain, provided ρ_n^2 converges to a limit ρ^2 (> 0) as n → ∞,

A.R.E.(β^*, β̂) = 12σ^2(F)ρ^2B^2(F).    (6.4)

Similarly, as in [7, 11], we compare the reciprocals of the squares of the limiting values of T_n(β̂_{U,n} - β̂_{L,n}) and T_n(β_{U,n}^* - β_{L,n}^*) as a measure of their A.R.E., and arrive at (6.4) as the A.R.E. of the confidence interval in (2.9) with respect to the confidence interval derived from the least squares estimators. We shall now study (6.4) in more detail. For this, we recall that t_n is composed of a_n (≥ 2) distinct sets of elements, where in the jth set there are u_j elements which are all equal to t_j^*, say, for j = 1, ..., a_n, with t_1^* < ... < t_{a_n}^*. Let R_j = u_0 + ... + u_{j-1} + ½(u_j + 1) for j = 1, ..., a_n, where u_0 = 0. Then we have the following:

Theorem 6.3. 0 ≤ ρ_n ≤ 1, where the upper bound 1 is attained if and only if t_j^* = a + bR_j for all j = 1, ..., a_n, where b is positive.

Proof. Since t_1 ≤ t_2 ≤ ... ≤ t_n, the numerator on the right hand side of (6.2) is non-negative, and hence ρ_n ≥ 0. To prove that ρ_n ≤ 1, we rewrite T_n A_n ρ_n as \sum_{j=1}^{a_n} u_j(R_j - ½(n + 1))(t_j^* - t̄_n), which by the Cauchy-Schwarz inequality is less than or equal to T_n A_n, where the equality sign holds if and only if (t_j^* - t̄_n) = b(R_j - ½(n + 1)) for all j = 1, ..., a_n. This completes the proof of the theorem.


Two particular cases where ρ_n = 1 are of special interest. First, the general regression problem with equispaced independent variables, where t_i = t_1 + (i - 1)h, h > 0, i = 1, ..., n. The second case relates to the experimental design where all the observations are placed at the two end-points of an interval for the optimum least squares estimation of the slope, i.e., when t_1 = ... = t_{n_1} = t_1^* < t_{n_1+1} = ... = t_n = t_2^*, where n_1 < n. (As has been noted earlier, the second case also resembles the classical two-sample location problem.) In either case, we shall say that the independent variables are optimally designed if ρ_n = 1. We shall also say that the independent variables are asymptotically optimally designed if ρ_n → 1 as n → ∞. As an example, consider the following design:
t_j^*:  -2   -1    1    2
u_j:     1    m    m    1,        n = 2m + 2.    (6.5)

Here, clearly ρ_n → 1 as n → ∞. From Theorem 6.3 and the above discussion we readily arrive at the following theorem.
Theorem 6.4. A.R.E.(β^*, β̂) ≤ 12σ^2(F)B^2(F), where the equality sign holds if the independent variables are (at least asymptotically) optimally designed.

Thus, for optimal or asymptotically optimal designs, the A.R.E. of β^* relative to β̂ is the same as that of the Wilcoxon test with respect to Student's t-test (for the two-sample location problem). Thus, as in [9, p. 89], it follows that (i) when F(x) is normal, this A.R.E. is equal to 3/π ≈ 0.955, (ii) when F(x) is logistic or double exponential, it is greater than unity, (iii) for distributions with 'heavy tails' (such as the Cauchy, etc.), it may be indefinitely large, and (iv) for any continuous F(x), it cannot be less than 0.864. On the contrary, if t_1, ..., t_n are not optimally designed, so that ρ_n does not tend to 1 as n → ∞, this A.R.E. may not have any lower bound (such as 0.864 or so). In fact, if ρ_n → 0 as n → ∞, so also will this A.R.E. As an example of a bad design, consider the following:

t_j^*:  -m   -1    1    m        (m > 1)
u_j:     1    m    m    1,        n = 2m + 2.    (6.6)

By straightforward computations it follows that

ρ_n = m(3m + 1)/\{m(m + 1)(m^3 + 4m^2 + 4m + 1)\}^{1/2},  so that  ρ_n^2 = O(9/m) = O(n^{-1}),    (6.7)

and this converges to zero as n → ∞. In spite of such pathological examples, in actual practice ρ_n is usually well away from 0, and as a result (6.4) can be used to provide a reasonable idea about the efficiency of β^*. However, Theorem 6.1, (6.4) and Theorem 6.4 clearly indicate that if the choice of t_n is left to the experimenter, he should always try to select t_n in such a way that (i) ρ_n is either exactly or nearly equal to 1 and (ii) T_n is maximum over the practicable range of values of t_1, ..., t_n.
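As a concrete check of this discussion, the hypothetical design_quantities sketch above can be used to contrast the two designs displayed in (6.5) and (6.6), and to evaluate (6.4) for a normal F, for which σ^2(F)B^2(F) = 1/(4π) and hence 12σ^2(F)B^2(F) = 3/π.

```python
from math import pi

def expand_design(values, counts):
    return [v for v, u in zip(values, counts) for _ in range(u)]

m = 50
design_good = expand_design([-2, -1, 1, 2], [1, m, m, 1])    # the design of (6.5)
design_bad = expand_design([-m, -1, 1, m], [1, m, m, 1])     # the design of (6.6)

_, _, rho_good = design_quantities(design_good)              # very close to 1
_, _, rho_bad = design_quantities(design_bad)                # roughly 3/sqrt(m); tends to 0

are_good = 12 * (1 / (4 * pi)) * rho_good ** 2               # close to 3/pi = 0.955
are_bad = 12 * (1 / (4 * pi)) * rho_bad ** 2                 # far below 3/pi, and -> 0 as m grows
```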
It is also worth comparing the A.R.E. of β^* with respect to the estimators proposed by Adichie [1]. His estimates are in fact based on a class of 'mixed rank' statistics of the type \sum_{j=1}^n (t_j - t̄_n)ψ_n(R_j/(n + 1)), where R_j refers to the rank of Y_j among Y_1, ..., Y_n, and ψ_n is some suitable rank score. For general ψ_n, the expression for the A.R.E. of his estimator with respect to β̂ is given by (6.1) of [1].

Hence, the A.R.E. of β^* with respect to his estimator can be obtained from our (6.4) and his (6.1). A special case considered by him in section 3 [1, pp. 896-897] is the estimator β̂_W based on the Wilcoxon-scores statistic, i.e., on \sum_{j=1}^n (t_j - t̄_n)R_j, and in this case the A.R.E. of β̂_W with respect to β̂ comes out as 12σ^2(F)B^2(F). Thus, the A.R.E. of β^* with respect to β̂_W is equal to the limiting value of ρ_n^2, provided such a limit is different from 0. This means that for optimum or asymptotically optimum designs, β^* and β̂_W are asymptotically equally efficient, but, unlike β^*, β̂_W is not affected by bad design of t_n. However, this is not unexpected. β̂_W, like β̂, utilizes the exact values of t_1, ..., t_n in the mixed-rank statistic, whereas β^* only utilizes their ordering. On the other hand, β̂_W has to be obtained by a trial and error solution, whereas β^* can be obtained simply as the median of the slopes. So in actual practice, if ρ_n is close to unity, it may definitely be of some advantage to consider a (possibly) slightly inefficient but quick estimator rather than a computationally complicated one.
In passing, we may remark that, by virtue of Theorem 6.2,

τ_{ε/2}/\{√3 ρ_n T_n (β_{U,n}^* - β_{L,n}^*)\} → B(F)    as n → ∞,    (6.8)

for all absolutely continuous F(x). This result is an immediate generalization of a similar result (for the two-sample location problem) (cf. [7, 11]) to the more general regression problem.
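One practical consequence of (6.8), in the spirit of [7, 11], is that B(F) = ∫f^2 can itself be estimated from the realized length of the distribution-free confidence interval. The sketch below is our own rearrangement of (6.8), not an algorithm given in the paper, and re-uses the hypothetical design_quantities helper.

```python
from math import sqrt
from statistics import NormalDist

def B_from_interval_length(t, eps, lower, upper):
    """Estimate B(F) by solving (6.8): tau_{eps/2} / {sqrt(3) * rho_n * T_n * (upper - lower)}."""
    T2, _, rho = design_quantities(t)
    tau = NormalDist().inv_cdf(1.0 - eps / 2.0)       # tau_{eps/2}
    return tau / (sqrt(3.0) * rho * sqrt(T2) * (upper - lower))
```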
7. APPENDIX

The proofs of Theorems 6.1 and 6.2 are based on the following.

Theorem 7.1. If (i) ρ_n is strictly positive and (ii) T_n → ∞ as n → ∞, then under H_0: β = 0, [\{N\binom{n}{2}\}^{1/2}U_n(b/T_n) + 4bB(F)ρ_nA_n]/V_n^{1/2} has asymptotically a normal distribution with zero mean and unit variance, where N, U_n(b/T_n), V_n, T_n and A_n, ρ_n and B(F) are defined by (2.1), (2.2), (2.6), (6.1), (6.2) and (6.3), respectively.

Proof. We note that, for large T_n, E\{c(Z_j(b/T_n) - Z_i(b/T_n)) | H_0\} = 2P_0\{Y_j - Y_i > (b/T_n)(t_j - t_i)\} - 1 (where P_0 indicates that H_0 is assumed to be true) reduces to -2b(t_j - t_i)B(F)/T_n + o(T_n^{-1}). Also, we note that \sum_{i<j}(t_j - t_i) = 2ρ_nA_nT_n. Hence, it follows from (2.2) that E\{U_n(b/T_n) | H_0\} = -4bB(F)ρ_nA_n/\{N\binom{n}{2}\}^{1/2} + o(1). In a similar manner, it can be shown that \{N\binom{n}{2}\}\mathrm{Var}[U_n(b/T_n)]/V_n converges to one as n → ∞. Finally, the asymptotic normality of U_n(b/T_n) follows readily from Theorem 7.1 of Hoeffding [5], after noting that U_n(b/T_n) is a U-statistic for all real b. Q.E.D.
Proof of Theorem 6.1. Here also we assume without any loss of generality that β = 0. Then, it follows from (2.2), (2.3) and (2.4) that, for any real a,

lim_{n→∞} P_0\{ρ_nT_nβ_n^* ≤ a\} = lim_{n→∞} P_0\{U_n(a/(ρ_nT_n)) ≤ 0\} = lim_{n→∞} G(4aB(F)A_n/V_n^{1/2}),    (7.1)

by Theorem 7.1, where G(x) is the standard normal cdf. Now, it follows from (2.6) and (6.1) that A_n^2/V_n → 3/4 as n → ∞. Consequently, 4B(F)A_n/V_n^{1/2} tends to √12 B(F), and this completes the proof.


For the proof of Theorem 6.2, we note that for any two real and finite (b, b'), under H_0: β = 0, the covariance of \{N\binom{n}{2}/V_n\}^{1/2}U_n(b/\{ρ_nT_n\}) and \{N\binom{n}{2}/V_n\}^{1/2}U_n(b'/\{ρ_nT_n\}) can be shown to be asymptotically equal to unity. Hence, using the results of Theorem 7.1, we see that as n → ∞,

\{N\binom{n}{2}/V_n\}^{1/2}E\{U_n(b/\{ρ_nT_n\}) - U_n(b'/\{ρ_nT_n\})\} - 4(b' - b)B(F)A_n/V_n^{1/2} → 0,    (7.2)

\{N\binom{n}{2}/V_n\}\mathrm{Var}\{U_n(b/\{ρ_nT_n\}) - U_n(b'/\{ρ_nT_n\})\} → 0.    (7.3)

(7.2) and (7.3), along with Chebyshev's inequality, imply that

\{N\binom{n}{2}/V_n\}^{1/2}\{U_n(b/\{ρ_nT_n\}) - U_n(b'/\{ρ_nT_n\})\} - 4(b' - b)B(F)A_n/V_n^{1/2} → 0    (7.4)

in probability. Now, proceeding as in Theorem 7.1, it follows after some manipulations that ρ_nT_n(β_{U,n}^* - β) has asymptotically a normal distribution with mean τ_{ε/2}/\{√12 B(F)\} and variance 1/\{12B^2(F)\}. This implies that

|ρ_nT_n(β_{U,n}^* - β) - τ_{ε/2}/\{√12 B(F)\}| is bounded in probability,    (7.5)

and similarly, it can be shown that

|ρ_nT_n(β_{L,n}^* - β) + τ_{ε/2}/\{√12 B(F)\}| is bounded in probability.    (7.6)

From (7.4), (7.5) and (7.6), we may conclude (on noting that by assumption β = 0) that

\{N\binom{n}{2}/V_n\}^{1/2}[U_n(β_{L,n}^*) - U_n(β_{U,n}^*)] = 4ρ_nT_n(β_{U,n}^* - β_{L,n}^*)B(F)A_n/V_n^{1/2} + o_p(1).    (7.7)

Now, by (2.7) and (2.8), the left hand side of (7.7) converges to 2τ_{ε/2}, and also A_n/V_n^{1/2} → √3/2 as n → ∞. Hence, Theorem 6.2 follows from (7.7). Q.E.D.
ACKNOWLEDGMENT

The author is grateful to the editor, the associate editor and the referees for their valuable comments on the paper, and to Professor Herbert A. David for his careful reading of the manuscript.
REFERENCES

[1] Adichie, J. N., "Estimates of regression parameters based on rank tests," Annals of Mathematical Statistics, 38 (1967), 894-904.


[2] Eicker, F., "Asymptotic normality and consistency of least squares estimators for families of linear regressions," Annals of Mathematical Statistics, 34 (1963), 447-56.
[3] Graybill, F., Introduction to Linear Statistical Models, Volume 1. McGraw-Hill Book Company: New York, 1961.
[4] Hodges, J. L., Jr., and Lehmann, E. L., "Estimates of location based on rank tests," Annals of Mathematical Statistics, 34 (1963), 598-611.
[5] Hoeffding, W., "A class of statistics with asymptotically normal distribution," Annals of Mathematical Statistics, 19 (1948), 293-325.
[6] Kendall, M. G., Rank Correlation Methods. Charles Griffin and Company: London. Second edition, 1955.
[7] Lehmann, E. L., "Nonparametric confidence intervals for a shift parameter," Annals of Mathematical Statistics, 34 (1963), 1507-12.
[8] Mood, A. M., Introduction to the Theory of Statistics. McGraw-Hill Book Company: New York, 1950.
[9] Noether, G., Elements of Nonparametric Statistics. John Wiley: New York, 1967.
[10] Sen, P. K., "On the estimation of relative potency in dilution (-direct) assays by distribution-free methods," Biometrics, 19 (1963), 532-52.
[11] Sen, P. K., "On a distribution-free method of estimating asymptotic efficiency of a class of non-parametric tests," Annals of Mathematical Statistics, 37 (1966), 1759-70.
[12] Theil, H., "A rank-invariant method of linear and polynomial regression analysis, I, II, and III," Nederl. Akad. Wetensch. Proc., 53 (1950), 386-92, 521-5 and 1397-412.
