American Statistical Association is collaborating with JSTOR to digitize, preserve and extend access to Journal
of the American Statistical Association.
http://www.jstor.org
University of North Carolina, Chapel Hill
The least squares estimator of a regression coefficient $\beta$ is vulnerable to gross errors and the associated confidence interval is, in addition, sensitive to non-normality of the parent distribution. In this paper, a simple and robust (point as well as interval) estimator of $\beta$ based on Kendall's [6] rank correlation tau is studied. The point estimator is the median of the set of slopes $(Y_j - Y_i)/(t_j - t_i)$ joining pairs of points with $t_i \ne t_j$, and is unbiased. The confidence interval is also determined by two order statistics of this set of slopes. Various properties of these estimators are studied and compared with those of the least squares and some other nonparametric estimators.
1. INTRODUCTION
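Let $Y_1, \ldots, Y_n$ be independent random variables with
$P\{Y_i \le x\} = F(x - \alpha - \beta t_i), \quad i = 1, \ldots, n.$  (1.1)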
Median(Yi
= 0
for ti ? tM,
jti) = 0
At)
(1.2)
AMERICAN STATISTICAL ASSOCIATION JOURNAL, DECEMBER 1968
are all distinct, Theil [12] proposes a very simple point estimator of $\beta$, viz., the median of the $\binom{n}{2}$ slopes $(Y_j - Y_i)/(t_j - t_i)$, $1 \le i < j \le n$. He also obtains a
2. FORMULATION
Without any loss of generality we may assume that $t_1 \le t_2 \le \cdots \le t_n$; they are already assumed to be not all equal. We define $c(u)$ to be 1, 0, or $-1$ according as $u$ is $>$, $=$ or $< 0$. Let then
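$N = \sum_{1 \le i < j \le n} c(t_j - t_i),$  (2.1)
and, writing $Z_i(b) = Y_i - b t_i$ for $i = 1, \ldots, n$,
$U_n(b) = \{N\binom{n}{2}\}^{-1/2} \sum_{1 \le i < j \le n} c(t_j - t_i)\, c(Z_j(b) - Z_i(b)).$  (2.2)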
REGRESSION ESTIMATE BASED ON TAU
(in $b$) for which $U_n(b)$ will be equal to zero. The mid-point of this interval suggests itself as a natural estimate of $\beta$. Mathematically, we define the estimator as follows. Let
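$\beta_1^* = \sup\{b : U_n(b) > 0\}, \quad \beta_2^* = \inf\{b : U_n(b) < 0\}.$  (2.3)
Then
$\beta^* = \tfrac{1}{2}(\beta_1^* + \beta_2^*).$  (2.4)
The least squares estimator of $\beta$ is
$\hat\beta = \sum_{i=1}^{n} (Y_i - \bar{Y}_n)(t_i - \bar{t}_n) \Big/ \sum_{i=1}^{n} (t_i - \bar{t}_n)^2,$
where $\bar{Y}_n = (1/n)\sum_{i=1}^{n} Y_i$ and $\bar{t}_n = (1/n)\sum_{i=1}^{n} t_i$. An explicit formula for $\beta^*$ will be considered in section 3.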
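The statistic behind (2.3) and (2.4) can be sketched in a few lines. The block below is only an illustration: it assumes $U_n(b)$ is the sum of $c(t_j - t_i)\,c(Z_j(b) - Z_i(b))$ over $i < j$, with $Z_i(b) = Y_i - b t_i$, divided by $\{N\binom{n}{2}\}^{1/2}$, which is the form suggested by sections 2 and 3. The data and the names `sign` and `u_stat` are hypothetical.

```python
from itertools import combinations
from math import comb, sqrt


def sign(u):
    """c(u): 1, 0 or -1 according as u is >, = or < 0."""
    return (u > 0) - (u < 0)


def u_stat(b, t, y):
    """Assumed form of U_n(b); t must be sorted in ascending order."""
    n = len(t)
    pairs = list(combinations(range(n), 2))
    big_n = sum(sign(t[j] - t[i]) for i, j in pairs)  # N as in (2.1)
    total = sum(
        sign(t[j] - t[i]) * sign((y[j] - b * t[j]) - (y[i] - b * t[i]))
        for i, j in pairs
    )
    return total / sqrt(big_n * comb(n, 2))


# Hypothetical data (not from the paper), already sorted by t.
t = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 2.3, 2.9, 4.2, 4.8]

# U_n(b) is non-increasing in b, so it crosses zero around the slope estimate.
values = [u_stat(b, t, y) for b in (0.0, 0.5, 1.0, 1.5, 2.0)]
```

Because every pairwise slope in this toy data set lies strictly between 0 and 2, the statistic starts at its maximum and ends at its minimum over the grid.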
To construct a confidence interval for $\beta$ based on $U_n(b)$, we again note that $U_n(\beta)$ is a distribution-free statistic having a distribution symmetric about 0. Hence, depending on the sample size $n$, we can always select $(U_n^*, \epsilon_n)$ such that
$P\{-U_n^* \le U_n(\beta) \le U_n^* \mid \beta\} = 1 - \epsilon_n,$  (2.5)
where $0 < \epsilon_n < 1$. For small values of $n$ (say, $n \le 10$), we may use Table 1 of Kendall [6, p. 171] to find appropriate values of $U_n^*$ and $\epsilon_n$. For large sample sizes, we adopt the following procedure. Let $\mathbf{t}_n$ be composed of $a_n$ ($\ge 2$) distinct sets of elements, where in the $i$th set there are $u_i$ elements which are all equal, for $i = 1, \ldots, a_n$. We define
$V_n = (1/18)\{n(n-1)(2n+5) - \sum_{j=1}^{a_n} u_j(u_j - 1)(2u_j + 5)\}.$  (2.6)
Thus $V_n$ is the variance of $\{N\binom{n}{2}\}^{1/2} U_n(\beta)$ with the standard correction for tied observations, in the form that applies when there are ties in only one variable (viz., $t$). Also, let $\tau_\epsilon$ be the upper $100\epsilon\%$ point of a standard normal distribution.
Then, from the results of Kendall [6] and Hoeffding [5], we obtain that
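$U_n^* \cong \tau_{\epsilon/2}\{V_n / (N\binom{n}{2})\}^{1/2},$ where $\epsilon_n \to \epsilon$ as $n \to \infty$.  (2.7)
Let
$\beta_L^* = \inf\{b : U_n(b) \le U_n^*\}, \quad \beta_U^* = \sup\{b : U_n(b) \ge -U_n^*\}.$  (2.8)
Then, from (2.5),
$P\{\beta_L^* < \beta < \beta_U^* \mid \beta\} = 1 - \epsilon_n,$  (2.9)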
which is our proposed confidence interval for $\beta$ having the confidence coefficient $1 - \epsilon_n$ ($\cong 1 - \epsilon$ for large $n$). (2.9) provides an exact confidence interval with confidence coefficient $1 - \epsilon_n$ for all unknown (but continuous) $F(x)$, no matter whether the normality and the finiteness of the variance of $F(x)$ hold or not. The exact expressions for $\beta_L^*$ and $\beta_U^*$ are considered in the next section.
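The large-sample recipe of this section can be sketched as follows. The tie-corrected variance is (2.6) as printed. The critical value assumes that (2.7) reads $U_n^* \cong \tau_{\epsilon/2}\{V_n/(N\binom{n}{2})\}^{1/2}$, which is an interpretation of a damaged formula. The function names are hypothetical.

```python
from collections import Counter
from math import comb, sqrt
from statistics import NormalDist


def tie_corrected_variance(t):
    """V_n of (2.6): (1/18){n(n-1)(2n+5) - sum_j u_j(u_j-1)(2u_j+5)}."""
    n = len(t)
    tie_sizes = Counter(t).values()
    correction = sum(u * (u - 1) * (2 * u + 5) for u in tie_sizes)
    return (n * (n - 1) * (2 * n + 5) - correction) / 18


def approx_critical_value(t, eps):
    """Normal approximation to U_n* (assumed reading of (2.7))."""
    n = len(t)
    # N of (2.1): number of pairs with distinct t-values.
    big_n = comb(n, 2) - sum(comb(u, 2) for u in Counter(t).values())
    tau = NormalDist().inv_cdf(1 - eps / 2)  # upper 100(eps/2)% point
    return tau * sqrt(tie_corrected_variance(t) / (big_n * comb(n, 2)))


# Ten distinct design points and a nominal 95% interval.
u_star = approx_critical_value(list(range(10)), 0.05)
```

With no ties and $n = 10$, $V_n = 125$ and $N = \binom{10}{2} = 45$, so the approximate critical value is about 0.49 on the scale of $U_n$.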
3. EXACT EXPRESSIONS
We recall that among the $\binom{n}{2}$ values of $(t_j - t_i)$, $1 \le i < j \le n$, only $N$ (defined by (2.1)) values are non-zero, and only the corresponding values of $Z_j(b) - Z_i(b)$ contribute to $U_n(b)$ in (2.2). We now consider the set $S$ of $N$ distinct pairs $(i, j)$ for which $t_j > t_i$, and define
$X_{ij} = (Y_j - Y_i)/(t_j - t_i), \quad (i, j) \in S.$  (3.1)
Thus, the $X_{ij}$'s are the slopes of the lines connecting each pair of points $(t_i, Y_i)$ and $(t_j, Y_j)$ where $t_i \ne t_j$; the pairs of points for which $t_i = t_j$ are not considered. It will be seen that the $N$ quantities in (3.1) define both the point and interval estimators. To do this, we arrange the $N$ values in (3.1) in ascending order of magnitude and denote the $r$th smallest value by $X_{(r)}$ for $r = 1, \ldots, N$. Then, looking at (2.2), we observe that if we compute the value of $U_n(X_{(r)})$, $(r-1)$ of the differences $Z_j(X_{(r)}) - Z_i(X_{(r)})$ (for which $(i, j) \in S$) will be negative, $(N-r)$ will be positive and the remaining one will be exactly equal to 0. As such, $U_n(X_{(r)})$ will be equal to $(N - 2r + 1)/\{N\binom{n}{2}\}^{1/2}$. Similarly, $\{N\binom{n}{2}\}^{1/2} U_n(X_{(r)}^+)$ will be equal to $(N - 2r)$, where $X^+$ (or $X^-$) indicates that the value is just greater than (or less than) $X$. Now, we write $N = 2M$ or $2M+1$ according as $N$ is even or odd. For $N = 2M+1$, we observe that $U_n(X_{(M+1)}) = 0$, while $U_n(X_{(M+1)}^-) > 0$ and $U_n(X_{(M+1)}^+) < 0$. Similarly, for $N = 2M$, it follows that for any $b$ in the open interval $(X_{(M)}, X_{(M+1)})$, $U_n(b) = 0$, while it is positive or negative according as $b$ is $< X_{(M)}$ or $> X_{(M+1)}$. Hence, from (2.3) and (2.4), we obtain
$\beta^* = X_{(M+1)}$ if $N = 2M + 1$, and $\beta^* = \tfrac{1}{2}(X_{(M)} + X_{(M+1)})$ if $N = 2M$.  (3.2)
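Thus $\beta^*$ is the median of the $N$ numbers $\{X_{ij} : (i, j) \in S\}$. To obtain the expressions for $\beta_L^*$ and $\beta_U^*$, we let
$N^* = \{N\binom{n}{2}\}^{1/2} U_n^*$ and $M_i = \tfrac{1}{2}(N + (-1)^i N^*)$ for $i = 1, 2,$  (3.3)
where $U_n^*$ is defined by (2.5). From (2.8), (3.3) and the observations made above, it follows that $U_n(X_{(M_1)}) = (N^* + 1)/\{N\binom{n}{2}\}^{1/2} > U_n^*$ but $U_n(X_{(M_1)}^+) = N^*/\{N\binom{n}{2}\}^{1/2} = U_n^*$. Hence $\beta_L^* = X_{(M_1)}$. Similarly, $\beta_U^* = X_{(M_2+1)}$. Hence,
$P\{X_{(M_1)} < \beta < X_{(M_2+1)} \mid \beta\} = 1 - \epsilon_n.$  (3.4)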
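The interval in (3.4) only needs two order statistics of the sorted slopes. The sketch below takes $N^*$ as an input and assumes it is an integer with the same parity as $N$, so that $M_1$ and $M_2$ are integers. How $N^*$ is chosen (Kendall's table or the normal approximation) is left outside the function, and the name `slope_interval` is hypothetical.

```python
def slope_interval(slopes, n_star):
    """Return (X_(M1), X_(M2+1)) with M_i = (N + (-1)^i N*)/2, as in (3.3)-(3.4).

    `slopes` must be sorted ascending; `n_star` must share N's parity and
    satisfy 0 <= n_star <= N - 2 so that both order statistics exist.
    """
    big_n = len(slopes)
    if (big_n - n_star) % 2 != 0 or not 0 <= n_star <= big_n - 2:
        raise ValueError("n_star must have N's parity and lie in [0, N - 2]")
    m1 = (big_n - n_star) // 2
    m2 = (big_n + n_star) // 2
    # Convert the 1-based ranks M1 and M2 + 1 to 0-based list indices.
    return slopes[m1 - 1], slopes[m2]


# Toy check with N = 10 placeholder "slopes" 1..10 and N* = 4:
# M1 = 3 and M2 = 7, so the end points are X_(3) and X_(8).
lower, upper = slope_interval(list(range(1, 11)), 4)
```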
4. AN ILLUSTRATIVE EXAMPLE
We consider the following data from Graybill [3, pp. 119-120], also considered by Adichie [1].
$t_i$:  1  ...  10  12  18
$Y_i$:  9  15  19  20  45  55  78
The least squares estimate of $\beta$ is 4.02. Since all $t_i$'s are distinct, $N = \binom{7}{2} = 21$. The values of $X_{ij}$ defined by (3.1) are obtained as (in ascending order)
1, 2.5, 2.88, 3.67, 3.71
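The computation in this example can be retraced with a short script. The data table is incomplete in this copy, so the block below ASSUMES that the three missing $t$-values are 2, 3 and 4; that choice is not confirmed by the text, although it does give a least squares slope that rounds to the quoted 4.02.

```python
from itertools import combinations
from statistics import mean, median

# Y-values as printed. The t-values 2, 3, 4 are an ASSUMPTION (table incomplete).
t = [1, 2, 3, 4, 10, 12, 18]
y = [9, 15, 19, 20, 45, 55, 78]

# Least squares slope: sum (t_i - t_bar)(Y_i - Y_bar) / sum (t_i - t_bar)^2.
t_bar, y_bar = mean(t), mean(y)
ls_slope = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / sum(
    (ti - t_bar) ** 2 for ti in t
)

# All pairwise slopes (3.1) and their median (3.2).
slopes = sorted(
    (y[j] - y[i]) / (t[j] - t[i]) for i, j in combinations(range(len(t)), 2)
)
median_slope = median(slopes)

print(round(ls_slope, 2), len(slopes), median_slope)
```

Since $N = 21$ is odd, the median is the 11th smallest slope, in line with the first case of (3.2).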
$\beta^*(\mathbf{Y}_n + a\mathbf{t}_n, \mathbf{t}_n) = \beta^*(\mathbf{Y}_n, \mathbf{t}_n) + a$ for all real $a$.  (5.1)
The same invariance relation is also satisfied by $\beta_L^*$ and $\beta_U^*$ in (2.8), and as a result the confidence interval in (2.9) is also invariant in the above sense. Again, by a straightforward generalization of the proof of Theorem 1 of [4],
Theorem 5.1. The distribution of $\beta^*$ is symmetric about the true parameter $\beta$.
Proof. By virtue of (5.1), we may assume without any loss of generality that $\beta = 0$. Rewriting $U_n(0)$ as $U(\mathbf{Y}_n, \mathbf{t}_n)$, we have from (2.2) that $U(-\mathbf{Y}_n, \mathbf{t}_n) = -U(\mathbf{Y}_n, \mathbf{t}_n)$. Also, for $\beta = 0$, $U_n(0)$ has a distribution symmetric about 0 (cf. Kendall [6, p. 68]). Hence, $U(\mathbf{Y}_n, \mathbf{t}_n)$ and $U(-\mathbf{Y}_n, \mathbf{t}_n)$ have the same distribution. Also from (2.3) and (2.4), we obtain that $\beta^*(\mathbf{Y}_n, \mathbf{t}_n) = -\beta^*(-\mathbf{Y}_n, \mathbf{t}_n)$. Hence, the distribution of $\beta^*(\mathbf{Y}_n, \mathbf{t}_n)$, being the same as that of $\beta^*(-\mathbf{Y}_n, \mathbf{t}_n)$, is also symmetrical about 0, the assumed value of $\beta$. Q.E.D.
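Two algebraic facts used in this proof can be checked numerically: adding $a\,t_i$ to each $Y_i$ shifts the median slope by $a$, and changing the sign of every $Y_i$ changes the sign of the median slope. The sketch uses hypothetical data and a hypothetical helper `slope_estimate`.

```python
from itertools import combinations
from math import isclose
from statistics import median


def slope_estimate(t, y):
    """Median of the pairwise slopes over pairs with t_i != t_j."""
    return median(
        (y[j] - y[i]) / (t[j] - t[i])
        for i, j in combinations(range(len(t)), 2)
        if t[i] != t[j]
    )


# Hypothetical data and shift (not from the paper).
t = [0.0, 1.0, 2.5, 4.0, 7.0]
y = [0.3, 1.9, 2.2, 5.1, 6.0]
a = 1.75

base = slope_estimate(t, y)
shifted = slope_estimate(t, [yi + a * ti for ti, yi in zip(t, y)])
flipped = slope_estimate(t, [-yi for yi in y])

shift_ok = isclose(shifted, base + a, abs_tol=1e-12)  # Y -> Y + a t adds a
flip_ok = isclose(flipped, -base, abs_tol=1e-12)      # Y -> -Y flips the sign
```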
III. Validity when both variables are subject to errors. We consider here the more general case, in which $\mathbf{t}_n$ is not observable and the observable (random) variable is $\mathbf{W}_n = (W_1, \ldots, W_n)$, where $W_i = t_i + v_i$, $i = 1, \ldots, n$. It is assumed that $Y_i = \alpha + \beta t_i + e_i$, where $(e_i, v_i)$ are stochastically independent, for $i = 1, \ldots, n$. Thus, having observed $(Y_i, W_i)$, $i = 1, \ldots, n$, we want to estimate $\beta$. Theil [12] considered this problem under the assumptions that (i) $P\{|v_i| > g_i\} = 0$ for some finite $g_i$ ($> 0$), (ii) $|t_i - t_j| > g_i + g_j$ for all $i \ne j$, and (iii) the random variables $E_i = e_i - \beta v_i$, $i = 1, \ldots, n$, are all independent and identically distributed. Under these assumptions, $P\{W_i \ne W_j, \forall\, i \ne j\} = 1$. Thus the $W_i$'s occur in the same order as the $t_i$'s and we can consider (with probability 1) $W_1 < W_2 < \cdots < W_n$, so that $N$, defined by (2.1), is equal to $\binom{n}{2}$. Hence, defining $U_n(b)$ as in (2.2), with $t_i$'s replaced by $W_i$'s, we obtain that here with probability 1,
$U_n(\beta) = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} c(Y_j - Y_i - \beta(W_j - W_i)) = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} c(E_j - E_i).$  (5.2)
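A small simulation illustrates the two facts this argument rests on: with bounded errors $v_i$ and sufficiently separated $t_i$, the observed $W_i$ keep the ordering of the $t_i$, and $Y_j - Y_i - \beta(W_j - W_i)$ coincides with $E_j - E_i$. All numerical values below (parameters, spacing, error laws, seed) are hypothetical choices made for the sketch.

```python
import random

random.seed(0)
alpha, beta, g = 1.0, 2.0, 0.4          # hypothetical parameters, common bound g_i = g
t = [float(i) for i in range(8)]         # spacing 1 > g + g, so condition (ii) holds
v = [random.uniform(-g, g) for _ in t]   # |v_i| <= g, so condition (i) holds
e = [random.gauss(0.0, 1.0) for _ in t]

w = [ti + vi for ti, vi in zip(t, v)]                     # observed W_i = t_i + v_i
y = [alpha + beta * ti + ei for ti, ei in zip(t, e)]      # Y_i = alpha + beta t_i + e_i
big_e = [ei - beta * vi for ei, vi in zip(e, v)]          # E_i = e_i - beta v_i

# The W_i occur in the same order as the t_i.
order_kept = all(w[i] < w[i + 1] for i in range(len(w) - 1))

# Y_j - Y_i - beta (W_j - W_i) equals E_j - E_i up to rounding error.
max_gap = max(
    abs((y[j] - y[i] - beta * (w[j] - w[i])) - (big_e[j] - big_e[i]))
    for i in range(len(t))
    for j in range(i + 1, len(t))
)
```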
PROPERTIES OF THE ESTIMATORS
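$T_n^2 = \sum_{i=1}^{n} (t_i - \bar{t}_n)^2, \quad A_n^2 = (1/12)\{n(n^2 - 1) - \sum_{j=1}^{a_n} u_j(u_j^2 - 1)\},$  (6.1)
$\rho_n = \sum_{i=1}^{n} (i - \tfrac{1}{2}(n+1))(t_i - \bar{t}_n) / (T_n A_n),$  (6.2)
$B(F) = \int_{-\infty}^{\infty} f^2(x)\,dx,$  (6.3)
$12\sigma^2(F)\rho^2 B^2(F),$  (6.4)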
as $n \to \infty$. Similarly, as in [7, 11], we compare the reciprocals of the squares of the limiting values of $T_n(\hat\beta_{U,n} - \hat\beta_{L,n})$ and $T_n(\beta^*_{U,n} - \beta^*_{L,n})$ as a measure of their A.R.E., and arrive at (6.4) as the A.R.E. of the confidence interval in (2.9) with respect to the confidence interval derived from the least squares estimators. We shall now study (6.4) in more detail. For this, we recall that $\mathbf{t}_n$ is composed of $a_n$ distinct sets of elements, where in the $j$th set there are $u_j$ elements which are all equal to $t_j^*$, say, for $j = 1, \ldots, a_n$ ($\ge 2$), where $t_1^* < \cdots < t_{a_n}^*$. Let $R_j = u_0 + \cdots + u_{j-1} + \tfrac{1}{2}(u_j + 1)$, for $j = 1, \ldots, a_n$, where $u_0 = 0$. Then, we have the following:
Theorem 6.3. $0 \le \rho_n \le 1$, where the upper bound 1 is attained if and only if $t_j^* = a + bR_j$ for all $j = 1, \ldots, a_n$, where $b$ is positive.
Proof. Since $t_1 \le t_2 \le \cdots \le t_n$, the numerator on the right hand side of (6.2) is non-negative, and hence, $\rho_n \ge 0$. To prove that $\rho_n \le 1$, we rewrite $T_n A_n \rho_n$ as $\sum_{j=1}^{a_n} u_j (R_j - \tfrac{1}{2}(n+1))(t_j^* - \bar{t}_n)$, which by the Cauchy-Schwarz inequality is less than or equal to $T_n A_n$, where the equality sign holds if and only if $(t_j^* - \bar{t}_n) = b(R_j - \tfrac{1}{2}(n+1))$, for all $j = 1, \ldots, a_n$. This completes the proof of the theorem.
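The design coefficient $\rho_n$ is easy to compute for any design vector. The sketch below takes $\rho_n$ to be $\sum_i (i - \tfrac{1}{2}(n+1))(t_i - \bar{t}_n)/(T_n A_n)$ for sorted $t$, and takes $A_n^2$ to be $(1/12)\{n(n^2-1) - \sum_j u_j(u_j^2-1)\}$, the sum of squared centred midranks that the Cauchy-Schwarz step requires. The printed formula for $A_n$ is truncated in this copy, so that form is an assumption, as is the name `rho_n`.

```python
from collections import Counter
from math import sqrt


def rho_n(t):
    """Design coefficient rho_n under the assumed forms of (6.1) and (6.2)."""
    t = sorted(t)
    n = len(t)
    t_bar = sum(t) / n
    big_t = sqrt(sum((ti - t_bar) ** 2 for ti in t))                # T_n
    a_sq = (n * (n * n - 1)
            - sum(u * (u * u - 1) for u in Counter(t).values())) / 12  # A_n^2
    numerator = sum(
        (i - (n + 1) / 2) * (ti - t_bar) for i, ti in enumerate(t, start=1)
    )
    return numerator / (big_t * sqrt(a_sq))


equally_spaced = rho_n(range(10))           # t linear in the ranks
tied_but_linear = rho_n([0, 0, 1, 1, 2, 2])  # t_j* linear in the midranks R_j
one_outlier = rho_n([0, 1, 2, 3, 100])       # far from linear in the ranks
```

The first two designs satisfy the equality condition of the theorem, so $\rho_n$ equals 1 up to rounding; the third falls strictly between 0 and 1.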
provided $\rho_n^2$ converges to the limit $\rho^2$ ($> 0$)
-1
tn - 0
(6.5)
m m 1 n=2m+2.
Uj
1 2
Here, clearly $\rho_n \to 1$ as $n \to \infty$. From theorem 6.3 and the above discussion we readily arrive at the following theorem.
Theorem 6.4. A.R.E.$(\beta^*) \le 12\sigma^2(F)B^2(F)$, where the equality sign holds if the independent variables are (at least asymptotically) optimally designed.
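The bound $12\sigma^2(F)B^2(F)$ has simple closed forms for a few standard distributions. The sketch below assumes $B(F) = \int f^2(x)\,dx$ with $f$ the density of $F$ (the usual quantity in the Wilcoxon efficiency formula; the defining equation is damaged in this copy) and plugs in textbook values of the variance and of $\int f^2$. The function name `are_bound` is hypothetical.

```python
from math import pi, sqrt


def are_bound(variance, int_f_squared):
    """12 * sigma^2(F) * B(F)^2, with B(F) assumed to be the integral of f^2."""
    return 12.0 * variance * int_f_squared ** 2


# Standard normal: variance 1, integral of f^2 = 1 / (2 sqrt(pi)).
normal = are_bound(1.0, 1.0 / (2.0 * sqrt(pi)))
# Standard logistic: variance pi^2 / 3, integral of f^2 = 1/6.
logistic = are_bound(pi ** 2 / 3.0, 1.0 / 6.0)
# Double exponential with unit scale: variance 2, integral of f^2 = 1/4.
double_exp = are_bound(2.0, 0.25)

print(round(normal, 3), round(logistic, 3), round(double_exp, 3))
```

These evaluate to $3/\pi$, $\pi^2/9$ and $3/2$ respectively.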
Thus, for optimal or asymptotically optimal designs, the A.R.E. of $\beta^*$ relative to $\hat\beta$ is the same as that of the Wilcoxon test with respect to the Student's t-test (for the two sample location problem). Thus, as in [9, p. 89], it follows that (i) when $F(x)$ is normal, this A.R.E. is equal to $3/\pi = 0.955$, (ii) when $F(x)$ is logistic or double exponential, it is greater than unity, (iii) for distributions with 'heavy tails' (such as Cauchy etc.), it may be indefinitely large and (iv) for any continuous $F(x)$, it cannot be less than 0.864. On the contrary, if $t_1, \ldots, t_n$ are not optimally designed, so that $\rho_n$ does not tend to 1 as $n \to \infty$, this A.R.E. may not have any lower bound (such as 0.864 or so). In fact, if $\rho_n \to 0$ as $n \to \infty$, so also will this A.R.E. As an example of a bad design, consider the following
$t_j^*$:  $-m$   $-1$   $1$   $m$   ($m > 1$)
$u_j$:   $1$   $m$   $m$   $1$,   $n = 2m + 2$.  (6.6)
By straightforward computations it follows that
$\rho_n = O(3/m^{1/2}) = O(n^{-1/2}),$  (6.7)
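The rate in (6.7) can be checked numerically. The sketch below reads the design (6.6) as one point at $-m$, $m$ points at $-1$, $m$ points at $1$ and one point at $m$, and uses the same assumed forms of $T_n$, $A_n$ and $\rho_n$ as in (6.1) and (6.2). The helper names are hypothetical.

```python
from collections import Counter
from math import sqrt


def rho_n(t):
    """rho_n = sum_i (i - (n+1)/2)(t_i - t_bar) / (T_n A_n), for sorted t."""
    t = sorted(t)
    n = len(t)
    t_bar = sum(t) / n
    big_t = sqrt(sum((ti - t_bar) ** 2 for ti in t))
    a_sq = (n * (n * n - 1)
            - sum(u * (u * u - 1) for u in Counter(t).values())) / 12
    numerator = sum(
        (i - (n + 1) / 2) * (ti - t_bar) for i, ti in enumerate(t, start=1)
    )
    return numerator / (big_t * sqrt(a_sq))


def bad_design(m):
    """Assumed reading of (6.6): u_j = 1, m, m, 1 at t_j* = -m, -1, 1, m."""
    return [-m] + [-1] * m + [1] * m + [m]


# Ratio of rho_n to 3 / sqrt(m); it should approach 1 as m grows.
ratios = {m: rho_n(bad_design(m)) / (3 / sqrt(m)) for m in (100, 10000)}
```

Under these assumptions the ratio is already within a few percent of 1 at $m = 100$, while $\rho_n$ itself keeps shrinking as $m$ increases.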
tained from our (6.4) and his (6.1). A special case considered by him in section 3 [1, pp. 896-897] is the estimator $\hat\beta_W$ based on the Wilcoxon-scores statistic, i.e., on $\sum_{j=1}^{n} (t_j - \bar{t}_n) R_j$, and in this case, the A.R.E. of $\hat\beta_W$ with respect to $\hat\beta$ comes out as $12\sigma^2(F)B^2(F)$. Thus, the A.R.E. of $\beta^*$ with respect to $\hat\beta_W$ is equal to the limiting value of $\rho_n^2$, provided such a limit is different from 0. This means that for optimum or asymptotically optimum designs, $\beta^*$ and $\hat\beta_W$ are asymptotically equally efficient, but, unlike $\beta^*$, $\hat\beta_W$ is not affected by bad design of $\mathbf{t}_n$. However, this is not unexpected. $\hat\beta_W$, like $\hat\beta$, utilizes the exact values of $t_1, \ldots, t_n$ in the mixed-rank statistic, whereas $\beta^*$ only utilizes their ordering. On the other hand, $\hat\beta_W$ has to be obtained by a trial and error solution, whereas $\beta^*$ can be obtained simply as the median of the slopes. So in actual practice, if $\rho_n$ is close to unity, it may definitely be of some advantage to consider a (possibly) slightly inefficient but quick estimator rather than a computationally complicated one.
In passing, we may remark that by virtue of theorem 6.2,
$\tau_{\epsilon/2} \big/ \{\sqrt{3}\,\rho_n T_n(\beta^*_{U,n} - \beta^*_{L,n})\} \to B(F)$, as $n \to \infty$.  (6.8)
be true), reduces to $-2b(t_j - t_i)B(F)/T_n + o(T_n^{-1})$.
$\lim_{n \to \infty} \cdots = \lim_{n \to \infty} G(4aB(F)A_n/V_n^{1/2}),$  (7.1)
For the proof of theorem 6.2, we note that for any two real and finite $(b, b')$, under $H_0: \beta = 0$, the covariance of $\{N\binom{n}{2}/V_n\}^{1/2} U_n(b/\{\rho_n T_n\})$ and $\{N\binom{n}{2}/V_n\}^{1/2} U_n(b'/\{\rho_n T_n\})$ can be shown to be asymptotically equal to unity. Hence, using the results of theorem 7.1, we see that as $n \to \infty$,
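$\{N\binom{n}{2}/V_n\}\,\mathrm{Var}\{U_n(b/\{\rho_n T_n\}) - U_n(b'/\{\rho_n T_n\})\} \to 0.$  (7.2)
$\cdots\ 4(b' - b)B(F)A_n/V_n^{1/2},$  (7.3)
$\{N\binom{n}{2}/V_n\}^{1/2}\{U_n(b/\{\rho_n T_n\}) - U_n(b'/\{\rho_n T_n\})\} - 4(b' - b)B(F)A_n/V_n^{1/2} \to 0.$  (7.4)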
I PnTnu
{NQ)/~vf}[U nl(/3L
{N(2)
-U.(Qt3n}
/Tf
(737)-J(X )]
Now, by (2.7) and (2.8), the left hand side of (7.7) converges to $2\tau_{\epsilon/2}$, and also, $A_n/V_n^{1/2} \to \sqrt{3}/2$ as $n \to \infty$. Hence, theorem 6.2 follows from (7.7). Q.E.D.
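The limit of $A_n/V_n^{1/2}$ quoted above can be verified directly in the simplest case. The worked step below assumes there are no ties among the $t_i$, so that the correction terms vanish and $A_n^2 = n(n^2-1)/12$ (the untied form assumed for (6.1)) while $V_n = n(n-1)(2n+5)/18$ from (2.6).

```latex
\[
\frac{A_n^2}{V_n}
  = \frac{\tfrac{1}{12}\, n (n^2 - 1)}{\tfrac{1}{18}\, n (n - 1)(2n + 5)}
  = \frac{3 (n + 1)}{2 (2n + 5)}
  \;\longrightarrow\; \frac{3}{4}
  \quad (n \to \infty),
\qquad\text{so}\qquad
\frac{A_n}{V_n^{1/2}} \;\longrightarrow\; \frac{\sqrt{3}}{2}.
\]
```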
ACKNOWLEDGMENT