You are on page 1of 65

Lecture Slides by Dr.

Muhammad Hanif Mian for


Workshop on Recent Developments in Survey Sampling
(August 26-27, 06)

AN ALTERNATIVE ESTIMATOR FOR Y


Since repetition of the observation of a repeated unit in a
sample selected with srswr does not provide additional
information for estimating Y , the mean of the values of
the distinct units in a sample of n units may be
considered as an alternative estimator. That is, if
y1/ , y2/ ,..., yd/ denote the values of the distinct units in a
simple random sample of n units selected with
replacement ( d n ) , then the suggested alternative
estimator is
y/ =

1 d /
S yi
d i =1

(3.21)

This estimator is unbiased for Y and is more efficient than


the sample mean y , namely,
y=

1 n
1 n
y
=
ri yi ,
i n
n i =1
i =1

where ri of is the number of repetitions of the i-th distinct


d

unit and S ri = n.
i =1

The variance of y / can be obtained by nothing that in this


case two stages of randomization are involved: (i) d is a
random variable taking values 1 to n with certain
probabilities, and (ii) selection of the d distinct units from
1

N units with equal probability without replacement and


applying the formula of simple random sampling, we get
1 1 N
2,
(3.22)
Var ( y / ) = E

d
N
N
1

1
1n 1 + 2n 1 + ....... + N n 1
Where E =
. Neglecting terms of
Nn
d
2

1
degree greater than in (3.22), we get
N
n 1 N
1 1
Var y / =
2
(3.23)
+
2
n 2 N 12 N N 1
An unbiased estimator of Var y / is given by

( )

( )

1 1
N 1 2
var y / = + n
sd , (3.24)
d N N N
2
1 d /
S yi y / for d 2.
Where sd2 = 0 for d = 1 and sd2 =
d 1 i =1

( )

The second term in the curly brackets in (3.24), namely


( N 1) ( N n N ) , is likely to be negligibly small compared to
the first term and hence the variance estimator may be
taken as
1 1
var ( y / ) = sd2 .
d N

(3.25)

It may be noted that if N is considerably larger than n,


then the chance of repetition of a unit in the sample will
be small and hence the gain in using y / , instead of y will
be only marginal. The results mentioned in this section
have been discussed in detail by Basu (1958), Raj and
Khamis (1958) and Pathak (1962.)
2

UNBIASED RATIO ESTIMATOR


We have seen that under simple random sampling, classical (conventional) ratio
estimator is biased. Lahiri (1951) suggested that classical ratio estimator can be
made unbiased if the selection procedure is changed. Midzuno (1950) and Sen
(1951) proved the same result. Lahiri suggested that the first unit was

selected with probability proportional to the aggregate of the size (PPAS)


or with probability proportional to

X
i =1

, and the remaining n 1 units

with equal probability and without replacement. Midzuno (1951)


simplified this procedure as the first unit is selected with probability
proportional to Xi (measure of size), and the remaining (n 1) units like
Lahiri (1951). This idea was introduced by Ikeda (1950) reported by
Midzuno (1951). This sampling scheme has striking resemblance to the simple
random sampling without replacement. In fact, it may be viewed as a
generalization of the simple random sampling when extra information on the
population is available.
Let we have a population of N units. The probability that ith unit is first one to be
selected and subsequent (n 1) units with equal probability and without
replacement is

xi
N

X
i =1

1
.
N 1

n 1

The probability that jth unit is first one to be selected and subsequent (n 1)
draws with equal probability and without replacement

xj
N

X
i =1

1
,
N 1

n 1

and so on the probability P(s) for the two selections are therefore

P( s) =

xi + x j
N

X
i =1

1
.
N 1

n 1

Since there are n such selection therefore the probability of the selection of the
sample will be

P( s) =

1
.
X N 1
n 1

x
1
=
.
X N

n
i =1

(6.6.1)

(6.6.2)

The classical ratio estimator is


n

y =

y
i =1
n

x
i =1

(6.1.3)

THEOREM (6.2):
Classical ratio estimator is unbiased under Ikeda-Midzuno- Sen Lahiri selection
procedure with variance

N 1
Var ( y) = X

n 1

n 2
yi
i =n1 Y 2

xi
i =1

(6.6.3)

/ is the sum over all possible samples.


PROOF
Taking the expectation of (6.1.3) we have

yi

E ( y) = i =n1 P( s) X ,
x

i =1
where P(s) is the probability of the sample..Putting the value of P(s) from (6.6.1)
we will have

n
yi
E ( y) = i =n1

xi
i =1

1
X

N 1

n 1

x
i =1

On simplification we get

yi
N yi
= i =1
= y = Y
E ( y) = i =1
N 1
N
N

n


n 1
n
n
The variance expression of y may be derived as;

(6.6.4)

Var ( y) = E ( y) 2 [ Ey]

n 2
yi
i =1
P( s) X Y 2
Var( y ) =
2
n


xi
i =1
Substituting the value of P(s) from (6.6.1)

N 1
Var ( y) = X

n 1

n 2
yi
i =n1 Y 2 .

xi
i =1

Note that the Var ( y) = 0 if yi xi Yi =

Xi
Xi

(6.6.3)

. This is very strong

property and will be referred to as Ratio Estimator Property.


THEOREM ( 6.3)
The mean of ratio estimator is an unbiased with variance

N
Var ( y) = X
n

y2
Y 2
x

(6.6.5)

PROOF
Taking the expectation of (6.1.4) we get

y
E ( y) = XE = P( s ) X
x
x

Using (6.6.2) we have

y
y x 1

E ( y) =
X = N = Y
x X N
Cn

(6.6.5)

Proceeding by the same way as before we can derive the variance expression of
y , i.e.

N
Var ( y) = X
n

y2
Y 2
x

(6.6.6)

THEOREM (6.4)
An unbiased estimator of Var ( y) is

v ar( y) = y2

i =1

= y2

yi2 X n

x i =1

Nn
X
x

y y
j =1

N 1 X
Nn(n 1) x

2 N n 2
y Nn s y

(6.6.7)

(6.6.8)

PROOF
It may be proved that E[v ar( y)] = Var ( y) . For this

n y2 X
n yi2 X

= i
E
P( s)

i =1 Nn x
i =1 Nn x

1 N
x
1
/ yi X
= Yi 2
=
N x X N N i =1

(6.6.9)

and

n n

N 1
x
N 1

= n(n 1)
E yi y j
E ( yi y j )
i =1 j =1

nN (n 1) X
n(n 1) N

j i

YY
i

N 1
1
=
= 2
N
N ( N 1) N
i =1 j =1
j i

YY

i =1 j =1
j i

(6.6.10)

Hence

= E ( y2 ) Y 2 = Var ( y)
Similarly we can show that an unbiased estimator of population total will be

v ar( y) = y2

N X
x

2 N n 2
y Nn s y

RATIO ESTIMATOR AS MODEL-UNBIASED

Consider all estimators y of Y that are linear functions of sample values yi,
that are of the form
n

y = ci yi ,

(6.8.1)

i =1

where the ci does not depend on y i s though they may a function xi. The choice

of the cis restricted to those that give unbiased estimation of Y. The estimator
with the smallest variance is called best linear unbiased estimator. The model
is:

yi = xi + i ,
where E ( i ) = 0, Cov( i , j ) = 0,
and Var ( i ) = i2 = 2 X i2

1 1
2

(6.8.2)

where i are independent of the xi and xi are > 0. The xi (i = 1, 2, N) are


known. The model is the same that was employed by Cochran (1953), which
appears to have been originated by H.F. Smith (1938). Useful references to this
model are Cochran (1953, 63, 77), Brewer (1963b), Godambe and Joshi (1965),
Hanif (1969) Foreman and Brewer (1971), Royall (1970). (1975),Brewer and
Hanif(1983) Cassel, et al (1976), Isaki and Fuller (1982), Hansen, Madow and
Tepping (1983), Samiuddin et al (1992)and many others.

Brewer (1963b) defined an unbiased ratio estimator under model (6.8.2). He


used the concept of unbiased ness which was different from that given in
randomization (design - based) theory. Royall (1970) also used this model.
Brewer and Royall regarded an estimator y (estimated population total) is

unbiased if E ( y ) = E (Y ) in repeated selections of the finite population and


sampled under the model. Under model (6.8.2) Brewer (1963b) proved that the
classical ratio estimate was model unbiased and is best linear unbiased
estimator for any sample [random or not] selected solely according to the values
of the Xi. This result hold goods if the following line conditions are satisfied;
(i)
(ii)

The relation between estimated (yi) and benchmark (xi) is linear and
passes though the origin.
The Var(yi) about this line is proportional to xi.

THEOREM (6.6):
Under the model (6.8.2) classical ratio estimator is unbiased with variance

N
Var ( y) = X
n

y2
( X nx )
X
Y 2 = =
nx
x

(6.8.3)

PROOF:

We know that
n

y = ci yi

(8.6.4)

i =1

Using model (6.8.2) we have


n

y = ci [ xi + i ] =
i =1

i =1

i =1

ci xi + ci i

Since E(i) = 0 we then have

E ( y) =

ci xi
i =1

ci E ( i ) =
i =1

c x
i =1

(6.8.5)

We also know that

Yi = X i + i or E( Y ) = X

(6.8.6)

Now

E[ y Y ] =

i =1

i =1

i =1

ci xi + ci E ( i ) X E ( i )

= ci xi X = 0 If
i =1

c x
i =1

i i

=X

(6.8.7)

Therefore we say that y / is model unbiased if


n

i =1

xi = X

(6.8.8)

The variance expression of y / , i.e.

Var ( y) = E ( y2 ) [ E ( y) ]

E ( y 2 ) = 2

xi2 + ci2 E ( i2 ) + 2

2
i

i =1

(6.8.9)

i =1

2
i

xi E ( i )

Using the condition of model we will have:


n

i =1

i =1

ci2 xi2 + ci2Var ( i )

E ( y 2 ) = 2

(6.8.10)

Using (6.8.2), (6.8.5) and (6.8.9) in (6.8.9), we will have

Var ( y) =

c
i =1

2
i

Var ( i )

(6.8.11)

Let us for simplicity we assume Var ( i ) = xi then (6.8.11) will be:

Var ( y) =

c x
i =1

2
i i

(6.8.12)

We can minimize Var ( y / ) w.r.t. ci. For this the Lagranges multiplier will be

c x c x X
i =1

2
i i

i =1

Differentiating unconditionally with respect to ci, we get.

= 2 ci xi xi = 0
ai

or

ci =

= C (constant)
2
n

We know from (6.8.7) that

c x
i =1

=X

or
n

i =1

xi = X , or

c=

X
= ci
n x

Hence y =
/

i =1

ci

X
yi =
n x

yi =

i =1

i =1
n

x
i =1

X = y

The best linear unbiased estimator y / = y , which is a classical (conventional)


ratio estimator. For the derivation of Var ( y) we proceed as follows:

y Y =

i =1

Since

i =1

i =1

i =1

ci xi + ci i X i

X
X
ci xi = X and ci =
then y Y =
n x
n x

i
i =1

i =1

Divide i into sample and non-sample values we have


i =1

ai =

X
X
then y Y =
n x
n x
X

1
n x

or ( y Y ) =

i
i =1

i =1

N n


i =1

i =1

Squaring and taking the expectation


2

N n
2
X
n
1 E ( i2 ) + E ( i2 )
E ( y Y ) = Var ( y ) =
nx i =1
i =1
2 n
N n
X

1 var ( i ) + var ( i )
Var ( y ) =
nx i =1
i =1

Substituting the value of Var(xi), we have:


2

N n
X nx
Var ( y ) =
1 + xi
nx

i =1

n
N

X nx
x
X
=
+

i xi
i

nx
i =1
i =1

X nx
=
nx + ( X nx )
nx
Var ( y) =

( X nx )
( X nx ) 2
X
nx + ( X nx ) =
2
(n x )
n x

(6.8.3)

10

Using all these assumptions a model-unbiased estimator of from the


sample may be easily proved as

1 n 1
( yi r xi )2 .
n 1 i =1 xi

(6.8.13)

Putting this value of in (6.8.4) a model-unbiased variance estimator is

Var ( y) =

( X nx ) X 1 n 1
( yi rxi ) 2

n x
n 1 i =1 xi

(6.8.14)

This model based unbiased estimator is not only superior to y / but is the best of
a whole class of estimators. For details see Brewer (1963b, 1979), Royall (1970),
Royall and Herson (1973) and Samiuddin et.al. (1978).
6.9
COMPARISON y AND y / UNDER STOCHASTIC MODEL
It is an established fact that the choice of a suitable sample plan is central to the
design of a sample survey. Sample design can be regarded as comprising separate
selection and estimation procedures, but the choices of these are so
interdependent that they must be considered together for virtually all purposes.
Some times the nature of the sample plan is determined by circumstances, but
usually the designer is faced with a choice, and frequently it is obvious which of
a number of possible plan will be most efficient in terms of minimum sample
error for given cost( or vice versa). Standard sampling theory using imputed
values for such quantities as the means, variances, and correlation coefficient of
the (finite) population, or strata or clusters within it, can often indicate which
design is most efficient. Sometimes, however, this is not so. A well-known
example is the comparison between classical ratio estimation using unequal
probabilities. To obtain a straight forwarded answer in this case, Cochran (1953)
made use of a certain super population (6.8.2) which is intuitively attractive and
appears to have some empirical basis. The purpose here is to compare classical
ratio estimator and unbiased estimation method of estimation using equal
probabilities and using large scale sample result which can be obtained using
generalization of model. Comparison for probability proportional to size will be
discussed in Chapter 7, 8 and 9. The stochastic model used here for the purpose
of comparing efficiencies.
6.9.1. Unbiased Estimate for Population Total Based on Simple Random
Sampling
THEOREM (6.7).
Under linear stochastic model (6.8.2) ratio estimator will be more efficient than
N

unbiased estimator if

2 x2 > i2
i =1

PROOF

11

We know that:

N
n

y =

y
i =1

Putting the value of (6.8.2) we get

N
n

y =

n
n

i
x

i =1
i =1

n
N
= x + i
n i =1

( x + ) =
i

i =1

N
n

i =1

i =1

i =1

Y = Yi = [ X i + i ] = X + i

Also

or

i =1

(6.9.2)

=Y X

Var ( y ) = E ( y Y )

(6.9.1)

(6.9.3)
2

= E [ y X + X Y ]

= E ( y X ) (Y X )

= E ( y X ) E (Y X )
2

as cross product term is equal to 2E (Y X )

(6.9.4)

Now on first term of (6.9.4) using (6.8.2) will be

EM E D ( y X )

= EM E D x +
n

i X

i =1

= EM E D ( x X ) +
n

N
N
= 2 x2 + i2
n i =1

i =1

(6.9.5)

Similarly
2

2
N
EM i = EM ( Y X )
i =1

12

or

i =1

2
i

= E (Y X )

(6.9.6)

Using (6.9.5) and (6.9.6) in (6.9.4) we get:

Var ( y ) = y2 = 2 x2 +

N n N 2
i
n i =1

(6.9.7)

Ratio Estimator

y =
N
n

y
X
x

(6.1.3)

( X
i =1

+ i )

X
x
N n

+
x
i

n i =1

X
=
x
Now

(6.9.8)

Var ( y ) = E [ y Y ]

= E [ y X + X Y ]

2
2
= E ( y X ) E (Y X )

(6.9.9)

Now

EM ED [ y X ]

x + n
= EM E D
x

i =1
X X

N X n

i X
= EM E D X +

n x i =1

N X2
N X n
= EM E D
i = n x2
n x i =1

i =1

2
i

N
n

i =1

2
i

(6.9.10)
N

E (Y X ) 2 = i
i =1

Therefore

13

N
Var ( y ) = =
n

-
i =1

2
i

i =1

2
i

Comparing (6.9.7) and (6.9.11) we have:

Var ( y ) Var ( y ) = 2 x2 +
N

i =1

N
n

i =1

i =1

i2 i2

N
n

i =1

2
i

2
i
N

= 2 x2 So Ratio Estimator will always be more efficient if = 2 x2 i2 is


i =1

positive or
Foreman and Brewer (1971) used the following model

Yi = + X i + i

With the same assumption given in (6.8.2) they compared various method of
estimation and proved that ratio method of estimation is more efficient than
unbiased estimation method provided | | < | X | ..

SOME RECENT DEVELOPMENTS ON RATIO ESTIMATORS


Recently two benchmark variables have been used to increase the efficiency.
Some of them are given here
6.10 1 Modification of Classical Ratio Estimator I

Chands (1975) developed a chain ratio type estimator in the


context of two phase sampling. It seems sensible to study the possibility of
adapting it to the new situation although the force of its argument is
somewhat lost in the single phase case.
THREROM (6.8).
An estimator suggested by Samiuddin and Hanif (2006) by using two auxiliary
variables i.e, ratio cum ratio is

T2 = y

X Z
x z

(6.10.1)

With Mean square error is

14

MSE (T2 ) = 1Y 2 C y2 + C x2 + C z2 2C x C y yx 2C y C z yz + 2C x C z xz (6.10.2)


The construction of this estimator is made multiplying Classical Ratio estimator
by

Z
.
z

PROOF
Using the concept given in (6.2.23) we get

e e
T2 Y = Y + e y 1 x 1 z Y
X
Z

Ignoring send and higher order terms we get

Y
Y
ex ez
X
Z
The mean square error T3 will be
T2 Y ; ey

E T2 Y = = y X Z Y
x z

Using (6.10.4) in (6.10.5) we get


2

(6.10.3)

(6.10.4)
2

(6.10.5)
2

Y
Y
E (T2 Y ) ; E e y ex ez
X
Z

Y
Y2
Y
Y
Y2
; E ey2 + 2 ex2 + 2 ez2 2 ey ex 2 ey ez + 2
ex ez
X
Z
XZ
X
Z

Applying expectation we get

Y2
Y2
Y
MSE (T2 ) ; 1 Y 2 C y2 + 2 C x2 X 2 + 2 C z2 Z 2 2 Y XC x C y xy
X
X
Z

Y
Y2
Y ZC y C z yz + 2
XZC x C z xz
Z
XZ

On simplification we get
MSE (T2 ) = 1Y 2 C y2 + C x2 + C z2 2C x C y yx 2C y C z yz + 2C x C z xz (6.10.2)

MSE (T2 ) = 1Y 2 [C y 2 + C x 2 2 xy C x C y ] + 1Y 2 [C 2 z 2C y C z yz + 2C x C z xz ]

MSE (T2 ) = MSE (T1 ) + 1Y 2 [C 2 z 2C y Cz yz + 2C xCz xz ]

(6.10.2)

6.10.2 Revised Ratio Estimator( An Estimator with suitable a


involving two auxiliary variables)

15

THEOREM (6.9). A possible estimator with the involvement of suitable a


and with two auxiliary variable suggested by Samiuddin and Hanif (2006) is

X
Z
T3 = a y + (1 a ) y ,
x
z

(6.10.6)

with mean square error

Cx C y xy C y Cz yz Cx Cz xz + Cz2

2
2
2
MSE T3 = 1Y C y + Cz 2 yz C y Cz
Cx2 + Cz2 2Cx Cz xz

(6.10.7)
PROOF
Using the concept given in (6.2.23)

y
X;
x

(Y + ey ) X = Y 1 + ey 1 ex

Y
X

Expanding and ignoring second and higher order terms we get


X + ex

ey e

Y
= Y 1 + x + .... = Y + e y ex + ....
X
Y X

y
X
x

Similarly

y
Z;
z

(6.10.8).

(Y + ey ) Z = Y 1 + ey 1 ez

Z + ez

ey e

Y
= Y 1 + z + .... = Y + e y ez + .... (6.10.9)
Z
Y Z

Using (6.10.8) and (6.10.9) T3 will be

X
Z

= Y + ey
+ (1 ) Y + e y
X + ex
Z + ez

Y + e y 1 x + (1 ) Y + e y

Y
Y
Y + e y ex (1 ) ez
X
Z
The mean square error will be
T3

) 1 eZ

16

Y
Y
MSE (T3 ) = e y ez ex ez
Z
Z
X

This may be written as


X

MSE (T3 ) ; E a y Y + (1 a ) y Y

Y
Y
Y
= E ae y a ex + e y ez ae y + a ez
X
Z
Z

Y
Y
Y
= E e y a e x ez + a ez
X
Z
Z

(6.10.10)

Y
Y
Y
= E e y ez a ex ez
Z
Z
X

(6.10.11)

In order to get the optimum value of a we first find partial differentiating of


(6.10.11) w.r.t a and then equating to zero.
2
Y

Y Y
Y
Y
E ey ez ex ez Ea ex ez = 0
Z X
Z
Z
X

Therefore optimum value of a is

Y Y
Y
E e y ez ex ez
Z X
Z

a=
2
Y

Y
E ex ez
X
X
Y

Y
Y2
Y2
E e y ex e y ez
ez ex + 2 ez2
Z
ZX
Z
X

=
2
2
2
Y

Y
Y
E 2 ex2 + 2 ez2 2
ex ez
XZ
Z
X

Y 2 C y Cx xy Y 2 C y Cz yz Y 2 Cx Cz xz + Y 2 Cz2
Y 2 Cx2 + Y 2 Cz2 2Y 2 Cx Cz xz

17

Y 2 C y Cx xy C y Cz yz C x C z xz + Cz2
=
Y 2 C x2 + Cz2 2Cx Cz xz
a=

C y Cx xy C y C z yz C x C z xz + Cz2
Cx2 + Cz2 2Cx Cz xz

(6.10.12)

Taking the square of (6.10.11)


2

Y
Y
Y Y
Y
= E e y ez + a 2 E ex ez 2aE e y ez ex ez
Z
Z
Z X
Z

Y 2
Y2
Y
Y2
Y
Y
= E e y2 + 2 ez2 2 e y ez + a 2 E 2 ex2 + 2 ez2 2 ex ez
X
X
Z
X
Z

X
Y

Y
Y2
Y2
2aE e y ex e y ez
ez ex + 2 ez2
Z
ZX
Z
X

Applying expectation the mean square error will be


MST (T3 ) = Y 2 C y2 + C z2 2 yz C y C z + a 2 C x2 + C z2 2C x C z xz
2a C y C x yx C y C z yz C x C z xz + C z2
(6.10.13)
Putting the value of a from (6.10.12) in (6.10.13) and on simplification we

Cx C y xy C y Cz yz Cx Cz xz + Cz2
2
2
2
MSE (T3 ) = Y C y + Cz 2 yz C y Cz
Cx2 + Cz2 2Cx Cz xz

(6.10.7

Since = 0 and = 1 are special cases of T5 therefore we conclude that

Z
X
MSE (T5 ) MSE y and MSE y . In T5 , will have to be replaced
z
x
by its sample estimate .

SAMPLING WITH PROBABILITIES PROPORTIONAL


TO SIZE (WITH REPLACEMENT)
18

7.1.

INTRODUCTION.

In previous chapters equal probability sampling selection procedure and


estimation methods have been discussed. In this and subsequent chapters those
selection procedures will be considered in which probability of selection varies
from unit to unit (unequal probability) in the population. In equal probability
sampling, selection does not depend how large or small that unit is but in
probability proportionate (proportional) to size sampling these considerations are
made. The probabilities must be known for all units of the population.
The general theory of unequal probabilities in sampling was perhaps first
presented by Hansen and Hurwitz (1943). They demonstrated, however, that use
of unequal selection probabilities within a stratum frequently made far more
efficient estimator of total than did equal probability selection provided measure
N

of size ( Z i i.e.

Z
i =1

= Z ) is sufficiently correlated with estimand,( variable

under study) Yi. A method of selection in which the units are selected with
probability proportionate (proportional) to given measure of size, related to the
characteristic under study is called unequal probability sampling or the
probability proportional to size sampling, commonly known as PPS or PS
sampling.
7.2.

SAMPLING WITH UNEQUAL PROBABILITIES WITH


REPLACEMENT [PPS SAMPLING].

The use of unequal probabilities in sampling was first suggested by Hansen and
Hurwitz (1943). Prior to that date there had been substantial developments in
sampling theory and practice, but all these had been based on the assumption that
probabilities of selection within each stratum would be equal. They proposed a
two stage sampling scheme (will be discussed in Chapter 11). The first stage
selection took place in independent draws. At each draw, a single first-stage unit
is selected with probabilities proportional to a measure of size, the number of
second-stage sampling units within each first-stage units. At the second-stage, the
same number of second stage-units is selected from each sampled first-stage unit.
Because it is possible for the same first-stage unit to be selected more than once
therefore, this type of unequal probability sampling is generally known as
sampling with replacement. Since, however, the independence of the draws is not
necessary condition for the units to have a non-zero probability of being selected
more than once, another name first suggested by Hartley and Rao (1962) is

19

multinomial sampling, a term justified by the multinomial distribution of the


number of units in the sample.
Unequal probability can however be used in single stage design.
This scheme compared favorably with other two stage sampling schemes; these
used equal probabilities of selection at the first stage, and then took either a fixed
number or a constant proportion of sub-sampling units from each selected first
stage unit. This selection procedure is explained as:

A list of 523 villages of Multan district along with population of males and
females is given in Appendix-I. In order to understand the selection procedure of
probability proportional to size sampling, 5% sample has been selected from this
population. In order to select a sample we cumulate the measure of sizes (area)
under this selection procedure, 26(5% of total villages) random numbers are
selected from 001 to 956204. These random numbers along with the serial
number of villages, total population and initial probabilities of selection are
given(data is given on next page). If any unit is selected more than once it should
be included in the sample

7.3 EXPECTATION.
If the ith unit is selected from a population of N units with probability
N

or ypps of population total Y


Pi = Z i / Z i , than an unbiased estimator, yHH
i =1

as suggested by Hansen and Hurwitz (1943) is:

= yPPS
=
yHH

1 n yi
,
n i =1 pi

(7.3.1)

where HH denotes the Hansen and Hurwitz, and pps denotes probability
proportional to size.

THEOREM (7.1)
A sample of size n is drawn from a population of N units with probability
proportional to size and with replacement y HH is an unbiased estimator of
population total, Y.

PROOF
We know that

20

=
yHH

1 n yi
,
n i =1 pi

(7.3.1)

Taking the expectation

E ( yHH ) =

N
yi
yi
Yi
1 n
=
=
E
E
Pi = Y
(
)
(
)

n i =1
pi
pi
i =1 Pi

is an unbiased estimator of population total Y.


Therefore yHH
Random number
859677
74835
491741
285996
252541
287850
847258
410596
674344
727666
920794
291874
742201
37860
750855
91613
757074
213334
656265
843800
464793
598479
314161
820668
18504
32315

7.4.

Sr. number of Total population


villages
483
7346
50
9231
275
3713
131
2310
108
7261
133
10425
478
6978
221
399
397
737
423
3203
508
4039
135
5439
434
1373
33
8074
437
3416
54
5841
441
1316
92
6475
385
1261
478
6975
258
2513
360
3039
153
322
472
13056
19
593
28
2515

Probability of
Selection
.005946
.006511
.001335
.001337
.006127
.00353
.006409
.000316
.002414
.001396
.002813
.000906
.000885
.006968
.00166
.003874
.002297
.004451
.002064
.006409
.002781
.001128
.000697
.00613
.000998
.001936

VARIANCE AND UNBIASED VARIANCE ESTIMATOR

21

THEOREM (7.2)
A sample of size n is drawn from a population of N units with probability
proportional to size and with replacement, the variance of y HH is

Var ( yHH ) =

1 N Yi 2
Y 2

n i =1 Pi

(7.4.1)

PROOF.

We know that

)2 Y 2
Var ( yHH ) = E ( yHH
from (7.3.1), we have
Substituting the value, yHH
2

1 n y
) = E i Y 2
Var ( yHH
n i =1 pi

n n

2
yi y j
1 n yi

= 2 E 2 + E
Y 2

n
i =1 p i
i =1 jj =1i pi p j

2
N N
YY
1 N Yi
i j
= 2 n 2 Pi + n(n 1) Pij
Y 2 .

n
PP
i =1 Pi
i =1 j =1
i j
j i

Since the selection of population units are independent; therefore Pij =


PiPj, substituting the value of pij:
N
1 N Yi
N
2
2
+ (n 1) ( Yi ) Yi
n i =1 Pi
i =1
i =1
2

Var ( yHH ) =

2
Y .

On simplification we get:

Var ( yHH ) =

1 N Yi
2
Y
n i =1 Pi

This expression may alternatively be written as

22

1 N Y
) = Pi i Y .
Var ( yHH
n i =1 Pi

Yi Y j
1 N N
PP
=

i j
2n i =1 j =1
Pi Pj
=

(7.4.2)

1 N 1
(Yi Pi Y ) 2 .

n i =1 Pi

(7.4.3)

(7.4.4)

7.4.1 An Alternative proof( using Indicator Variable)

Let ai is defined as the number of times that the ith unit of the population
to be in the sample (Chapter 2), then the joint distribution of ai is

n!
P1a1 P2a2 K PNaN
a1 ! a2 !K aN !

(7.4.5)

Then

E (ai ) = nPi ; Var (ai ) = nPi (1 Pi ); Cov(ai , a j ) = nPP


i j (7.4.6)
An unbiased estimator of population total will be

=
yHH

1 N Yi
ai
n i =1 Pi

(7.4.7)

The unbiased ness can be proved easily as:Taking the expectation of (7.4.7) and putting E(ai) = nPi from (7.4.6) we
get

E ( yHH ) =

Y 1 N
Y
1 N
E (ai ) i = nPi i = Y

n i =1
Pi n i =1
Pi

23

The variance of y HH may be written (see chapter 2) as:

Y2
1 N
) = 2 Var (ai ) i 2 +
Var ( yHH
n i =1
Pi

Yi Y j
Cov(ai , a j )
(7.4.8)
Pi Pj

i =1

j i

j =1

Putting the values of Var (ai ) and Cov(ai , a j ) from (7.4.6) in (7.4.8) and on
simplification we get (7.4.1).
It follows that, if Pi = Yi /

Y
i =1

the variance is zero. In practice, this ideal

situation can of course not be realized as the probabilities cannot be chosen


proportional to Yi, which still has to be observed. But this situation can be
approximated if it is possible to choose Pi proportional to some measures of size
Zi, which is known for all units in the population and which may be assumed
approximately proportional to Yi . The Z i will then be called the size of the ith
unit and least possible variance may be obtained by choosing the probabilities
proportional to the sizes.

An analogous expression for the covariance of yHH and xHH in the case of
sampling with replacement and with probabilities proportional to size may
be written in a straight far warded manner, i.e.
Cov( y, x) =
7.4.1.

1 N Yi
Pi Y i X .

n i =1 Pi
Pi

(7.4.9)

Unbiased Variance Estimator

THEOREM (7.3)

A sample of size n is drawn from a population of N units with probability


proportional to size and with replacement then an unbiased variance
estimator of (7.4.1) is:
2

n
yi

1
)=
.
var( yHH
yHH

n(n 1) i =1 pi

(7.4.10)

24

PROOF.
Taking expectation of (7.4.10)
2
n
1
yi

,
E [ var( yHH )] = E
yHH
n(n 1) i =1 pi

Now
2

n
yi

yi

Y ) .

=
y

Y n E ( yHH

HH
i =1 pi
i =1 pi

Taking the expectation of the above equation


2
2
n y
n y


2
i
i
E y HH = E Y n E(y HH Y )
i =1 p i
i =1 p i

= n Pi i Y n Var ( yHH )
i =1
Pi

1 N Y
)
= n Pi i Y n var ( yHH ) = n ( n 1) var ( yHH
n i =1 Pi

Using (7.4.2) we get


2
n y

i
= n(n 1) Var ( yHH ) .
E yHH
i =1 pi

Using this result in (7.4.10), we get

) ] = Var ( yHH
)
E [ var( yHH
(7.4.10) may be written as
2

n
n
y yj
1
)= 2
var( yHH
i .

2n ( n 1) i =1 j =1 pi p j

(7.4.11)

For calculation purpose alternative form of (7.4.10) is

)=
var( yHH

1 n yi2
'2
2 n yPPS .
n(n 1 i =1 pi

(7.4.12)

25

An unbiased covariance expression may be written analogous to (7.4.9) as

Cov( y, x) =

n
x
1
y
).
( i yHH )( i xHH

n(n 1) i =1 pi
pi

(7.4.13)

Though this scheme is based on with replacement process but for the following
reasons, it is preferred to be used in large scale sample surveys;
(i)
selection of the sample is simple,
(ii)

can be used for any finite predetermined number of units in the


sample,

(iii)

an unbiased variance estimator is simple, and

(iv)

it is also comparatively easy to obtain unbiased variance


estimator of total in multistage designs.

This selection procedure may be more efficient than simple random


sampling if the measure of size is approximately proportional to estimated i.e. Yi
and Zi are linearly related and regression line passing through the origin.
EXAMPLE (7.2)

Select a sample of 26 villages using probability proportional to size and with


replacement selection procedure form the data given in Appendix-I. Estimate the
total number of person in 523 villages and compare this result with actual
number of population given in 523 villages. Estimate Var ( y PPS ) and calculate
standard error of this estimate.

Solution:
Sr.
No.
1
2
3
4
5
6
7
8

yi
7346
9231
3713
2310
7261
10425
6978
399

pi
0.005946
0.006511
0.001335
0.001337
0.006127
0.00353
0.006409
0.000316

yi
pi
1235452.405
1417754.569
2781273.408
1727748.691
1185082.422
2953257.79
1088781.401
1262658.228

yi

y
pi

137886694865.41
35731892260.18
1379426820568.15
14632605885.20
177831700028.54
1812993331051.22
268326052675.04
118422122062.40

yi2
pi2

yi2
pi

1.52634E+12
2.01003E+12
7.73548E+12
2.98512E+12
1.40442E+12
8.72173E+12
1.18544E+12
1.59431E+12

9075633367
13087292428
10326868165
3991099476
8604883467
30787712465
7597516617
503800632.9

26

9
10
11
12
13
14
15
16

737
3203
4039
5439
1373
8074
3416
5841

0.002414
0.001396
0.002813
0.000906
0.000885
0.006968
0.00166
0.003874

305302.403
2294412.607
1435833.63
6003311.258
1551412.429
1158725.603
2057831.325
1507743.934

1693852740901.42
472833951008.72
29223818023.56
19329457362517.20
3065942449.29
200755773987.28
203444246707.55
9808812374.83

93209557065
5.26433E+12
2.06162E+12
3.60397E+13
2.40688E+12
1.34265E+12
4.23467E+12
2.27329E+12

225007870.8
7349003582
5799332030
32652009934
2130089266
9355550517
7029551807
8806732318

27

Sr.
No.
17
18
19
20
21
22
23
24
25
26

yi
1316
6475
1261
6975
2513
3039
322
13056
593
2515
117850

yi

y
pi

yi
pi

pi
0.002297
0.004451
0.002064
0.006409
0.002781
0.001128
0.000697
0.00613
0.000998
0.001936
0.081318

(i)

572921.202
1454729.274
610949.612
1088313.309
903631.787
2694148.936
461979.910
2129853.181
594188.377
1299070.248
41776367.9425

1068871009157.62
23120451813.51
991684897660.22
268811216688.66
494422166072.03
1182363847314.18
1310574981674.20
273602014185.76
1025348645657.47
94687373182.90
32621180470772.50

= y PPS =

Estimated Total

yi2
pi2

yi2
pi

3.28239E+11
2.11624E+12
3.73259E+11
1.18443E+12
8.1655E+11
7.25844E+12
2.13425E+11
4.53627E+12
3.5306E+11
1.68758E+12
99746754265786.8

753964301.3
9419372051
770407461.2
7590985333
2270826681
8187518617
148757532.3
27807363132
352353707.4
3267161674
217890794431.5760

1 n yi

n i =1 p i

41776367.9425
= 1606783
26

whereas the actual/total for 523 villages is 1797841.


(ii)

Var ( y PPS ) =

n
yi

1
y PPS

n(n 1) i 1 pi

32621180470772.50
= 50186431493
25 26

28

S .E ( yPPS ) = 224023.2834

(iii)

) = 1606783 2 224023.2834
C.L ( yPPS

(iv)

This may also be calculated as:

)=
var ( yPPS
=

yi2
1
2
2 nypps
n ( n 1) pi

1
2
99746754265786.80 26(1606783.382)
25 26

= 50186431493
7.4.2.

Comparison of Simple Random Sampling with Replacement


and Probability Proportional to Size with Replacement

We know that

1 N Yi 2
) =
Var ( yHH
Y 2
n i =1 Pi

(7.4.1)

If Pi =1/N then (7.4.1) becomes

)=
Var ( yran

1 N 2
N N 2 Y2
2
N Yi Y = Yi
n
N
n i =1

(7.4.14)

which is a variance expression for simple random sampling with replacement.


Putting Pi = Zi/Z in (7.4.1) and subtracting from (7.4.14), we obtain

) Var ( yHH
)=
Var ( yran
where Z =

Z
i =1

N
n

Y
i =1

Z
1
Zi

(7.4.15)

/N .

Probability Proportional to size (PPS) sampling with replacement will be more efficient than
simple random sampling provided.
N

(Zi Z )
i =1

Yi 2
>0
Zi

(7.4.16)

i.e. If Zi and Yi 2 / Z i are positively correlated.


However, it was noted by Raj (1954) that estimator based on PPS sampling with replacement
turns out to be inefficient compared to unbiased estimate based on simple random sampling with
replacement if the regression line Yi on Zi is far from the origin.

29

7.4.3Comparison of Var( y ran ) and Var( y HH ) Using a Linear Stochastic Model

We have already shown in (7.4.2) thatt

) Var ( yHH
)=
Var ( yran

N
=
n

N
n

Y
i =1

Z
1
Zi

Yi 2
( Zi Z )

i =1 Z i

(7.4.15)

(7.4.17)

For the purpose of comparison, let us take the linear model as defined in (6.8.2) the Chapter 6, i.e.
Assuming that the finite population Y1, Y2, ., YN is a random sample from an infinite superpopulation in which

Yi = Z i + i

where E * ( i ) = 0, E * ( i j ) = 0,
E ( i2 ) = i2

1
2
2 2
1
and i = Z i , where
2

(6.8.2)

Substituting the value of Yi from the model in (7.4.17), we have

) Var ( yHH ) =
Var ( yran

N
n

(
N

i =1

Z i2 + i2 + +2 Z i i )

2
Z 2 Z i + i + 2 i
Zi
i =1

Using the condition of the model


N
N
i2
N 2 N 2 N 2
2
= Z i + i Z Z i + Z
n i =1
i =1
i =1
i =1 Z i

N
Z

Zi
i

N
N
Z 2
N 2
i =1

2
2
2
i =1

Zi
2 i
=
+ Zi
i =1

n
N
N
i =1
i =1 Z i

30

Zi N

2 Z2 + 2 Z i2 i =1 Z i2 1
N i =1

i =1

N
N

Z i Z i2 1

N
N

B 2 Z i2 + 2 Z i2 1Z i i =1 i =1

N
i =1
i =1

N2
n

N2
n

N2
2 Z2 + 2 Cov( Z i2 1 , Z i )
n

(7.4.18)

We conclude that PPS sampling with replacement is more efficient as compared to simple
random sampling, if

2 Z2 + 2 Cov( Z i2 1 , Z i )} > 0
i

or

2 Cov( Z i2 1 , Z i ) > 2 Z2
i

This satisfied only if , since 2 0 and Cov( Z i2 1 , Z i ) > 0.


Or

Z ,Z
i

2 1
i

>

2 Z

2 1
i

(7.4.19)

Or this may alternatively be solved by direct way


N

We know that

Yi = Z i + i , Summing over i we have, Y = Z + i


i =1

We know that variance for population total for simple random sampling with replacement
(ignoring fpc) is

Var ( y ran ) =

N N 2 Y2
Yi .
n i =1
N

Putting the value of Yi and Y from the model, taking expectation and applying the conditions of
model we have
2
N
N * N
1

2
E [Var ( y ran ) ] = E ( Z i + i ) Z + i
n
N
i =1

i =1
*

N
N
N
1

2
2
2
2 2
N

Z
N

Z
i2
+

i
n
i =1
i =1
i =1

31

Since

i2

N
N

1
Z2
2
2
N
Z
(
N
1)

i2

n
N
i =1
i =1

1 N 2 ( Zi )
Z
and S =
, therefore

i
N 1 i =1
N

N
N 1

=
N 2 S Z2i + 2 Z i2

n
i =1

i2

2
Zi

Now
2

1 N Yi
) =
Var ( yPPS
Y
n i =1 Pi

Putting Pi =

Zi

(7.4.1)

we get
2

1 N Yi
= Z
Y
n i =1 Z i

(7.4.20)

Putting the value Yi and Y from the model, taking expectation and applying the condition of
model we have
2
N
1 * N ( Zi + i )2

Z + i
E [Var ( y PPS ) ] = E Z
n i =1
Zi
i =1

1 N i2 N 2
i
Z
n i =1 Z i i =1

(7.4.21)

Since i2 = 2 Z i2 we have

E * [Var ( y PPS ) ] =

N 2 1 N 2
Z
i Zi Zi
n i =1 i =1
i =1

(7.4.22)

) ] and E * [Var ( yran


) ] we have
Comparing E * [Var ( yPPS

N ( N 1)

B 2 S Z2 + 2 or Z i , Z i2 1 =
i

n
) ] E * [Var ( yPPS ) ]
E [Var ( yran

N ( N 1)
n

B 2 S Z2 + 2 1 S Z S 2 1
Z i Zi
Zi
i
i

1 2 N 2 N N 2 1
2 2
N Z i Z i Z i + N ( N 1) S zi
n
i 1
i =1 i 1

32

1
N
2 2
2
(
1)
1

N
N

N
N
(
)

Zi
Zi
n
i 1

i =1

i 1

Zi2 Zi

Z
i =1

2 1
i

N ( N 1) 2 2
S Zi + 2Cov Z1 , Z i2 1

n
N ( N 1) 2 2
S Zi + z , z 2 1 S zi S z2i 1Cov Z1 , Z i2 1
=
1 i

n
=

We conclude that PPS estimator will be superior to equal probability if

2 1

( Zi , Zi

> 2 SZ i / 2 SZ

2 1
i

which is same as (7.4.19).


under the model (6.8.2) Brewer and Hanif (1983) proved that:

E [Var ( y PPS ) ] = Z 2
*

2
n

1
i =1

1
n

2 1

(7.4.23)

and

)
Var ( yHH

y Y
= i
n
i =1 i
n

In the expression, i is written for npi, so that i is the expected number of appearance of the ith
population unit in sample.

7.5.

GAIN DUE TO PPS SAMPLING (WITH REPLACEMENT) OVER


SIMPLE RANDOM SAMPLING

We know that the variance expression for simple random sampling with replacement is
2

Yi
N ( N 1) 1 N 2 i =1
)=
Var ( yran
Yi N
n
N 1 i =1

(2.5.2)

and

)=
var( yPPS

1 n yi2
'2
2 n yPPS
n(n 1) i =1 pi

(7.4.12)

We can prove that


N
1 n yi2
yi2 N Yi 2
=
=
=
E
P
Yi 2


i
i =1
n i =1 pi
pi i =1 Pi

(i) E

(7.5.1)

and

33

(ii)

1
1
'2
'2
) = E ( yPPS
)
var( yPPS
) E var ( yPPS
E yPPS
N
N
1
2
)}2 E var [ yPPS ]
= E ( ypps
) {E ( ypps )}2 + {E ( yPPS
N
1
Y2
) + Y 2 Var ( yPPS ) =
= Var ( yPPS
= N Y 2 . (7.5.2)
N
N
Using (7.5.1) and (7.5.2) in (2.5.2) we can have

N ( N 1) 1 1 n yi2 1 '2
) .
yPPS var( yPPS

n
N 1 n i =1 pi N

)=
varPPS ( yran

(7.5.3)

)=
varPPS ( yran

N
n2

2
i

p
i =1

1 '2
1
).
yPPS + var( yPPS
n
n

n
yi2
1
1
'2
).
= 2 N nyPPS
+ var( yPPS
n
i =1 pi
n
Subtracting var ( yPPS ) from (5.5.4) we get

) var( yPPS
)=
varPPS ( yran

1
n2

(7.5.4)

n yi2
1
'2
) var( yPPS
).
N nyPPS + var( yPPS
i =1 pi
n

n yi2
n 1
1 n yi2
1
2
2

= 2 N nyPPS +
2 nyPPS
n i =1 pi
n n ( n 1) i =1 pi

N
n2

N
n2

1
= 2
n

i =1

i =1

i =1

2
yi2 ypps 1

2
pi
n
n

yi2 1

pi n 2

i =1

i =1

yi2 1 2
.
+ yPPS
pi2 n

yi2
pi2

yi2
1
N .
pi
pi

(7.5.5)

34

Therefore
) v ar( yPPS ) =
v arPPS ( yran

1
n2

i =1

yi2
1
N . .
pi
pi

(7.5.6)

An estimate of the percentage gain in efficiency due to pps sampling is ainin


) v ar ( yPPS
)
varPPS ( yran
100 .
var ( yPPS )

(7.5.7)

EXAMPLE (7.3)

A sample of size 5 has been selected from a population of size 20 farms. Number of trees,
along with initial probability of selection is given
i)

Estimate the total number of trees in that area, calculate the estimated variance and
standard error of this estimator.

ii)

Estimate the gain in precession over simple random sampling. The actual number of
trees are 28443.
2

yi2 / pi

S.No. of
Villages

No. of
Trees
(yi)

Probability
of Selection
(pj)

yi
pi

yi


yPPS
pi

8
4
16
11
10

311
949
11799
2483
3044

0.014
0.036
0.275
0.121
0.212

22214.286
26361.111
42905.455
20520.661
14358.490

9349614.91
1186162.77
310938735.20
22575222.29
119104700.50

6908642.9
25016694.4
506241458.1
50952801.6
43707245.28

126360.003

463154435.5

632826842.1

35

(i)

=
Estimated Total = yPPS
=

126360.003
= 25272 trees
5

= Y

Actual Total

1 n yi
.
n i =1 pi

= 28443
2

(ii)

n
yi

1
)=
.
var ( yPPS
yPPS

n(n 1) i 1 pi

(iii)

1
. [463154435.5] = 23157721.77
5 4

S.E( y PPS ) = 4812.247

)=
varPPS ( yran

1
n2

n yi2
1
'2
) (7.5.4)
N n yPPS + var( yPPS
i =1 pi
n

1
1
20 (632826842.1) 5 (25272) 2 + (23157721.77)
25
5

= 383158187.5
) v arPPS ( yPPS )
v arpps ( yran
100
)
v ar( yPPS
383158187.5 23157721.77
=
100 = 1554.56%
23157721.77
7.6.

ALTERNATIVE ESTIMATOR TO HANSEN AND HURVITZ


ESTIMATOR
Pathak (1962) described an estimator for the sampling scheme suggested by Hansen

and Hurvitz (1963). For this let we have a sample of three units selected from a
population of N units. Llet the selected sample has yi, yi, yj observations with
probabilities pi, , pi, pj respectively, then Pathak (1962) defines an estimator:

y
y + yj
1y
yp = i + j + i
,
3 pi p j pi + p j

(7.6.1)

36

or for sample size n it may be written as:

yi

n 1

y
1
yp = i + in=1 ..
n i =1 pi
pi

i =1

(7.6.2)

This is more efficient than Hansen and Hurwitz (1993) estimator but more difficult to
calculate. The gain in precision is small unless the sampling fraction is large.
7.7.

RATIO ESTIMATION FOR PPS SAMPLING


We know that
=
yHH

1 n yi
1 n xi

=
and
x

HH
n i =1 pi
n i =1 pi

Therefore

1 n

n i =1
= n
yHH
1

n i =1

yi
pi
. X.
xi
pi

(7.7.1)

From Hansen, Hurwitz and Madow (1953), we have


) = Var ( yHH ) 2 R Cov ( yHH
, xHH
) + R 2 Var ( yHH
) .(6.2.19)
Var ( yHH

Using (7.4.2) and (7.4.9) and analogues expression

37

1 N X
) = Pi i X ,
Var ( xHH
n i =1 Pi

(7.7.2)

in (6.2.19) and on simplification


) =
Var ( yHH

N
N

Yi X i
Yi 2
1 N Yi 2
2
2
R
R

+
(Y RX ) 2 (7.7.3)

n i =1 Pi
i =1 Pi
i =1 Pi

1 N 1
= (Yi R X i .
n i =1 Pi

(7.7.4)

This may be put easily as


2

Var ( yHH ) =

X
1 N Yi
Pi R i .

n i =1 Pi
Pi

(7.7.5)

) may be written in a straight forward


An approximate unbiased estimator of Var( y HH

way or may be derived


2

n
yi
xi
1
) =
var( yHH
r ,

n(n 1) i =1 pi
pi

(7.7.6)

or
2

N
yi y xi
1
) =
var( yHH

.

n(n 1) i =1 pi x pi

(7.7.7)

CHAPTER-4
TWO-PHASE SAMPLING
1.1 Introduction
Consider the problem of estimating population mean of Y of a Study Variable Y from a finite population
on N units. When information on one or more auxiliary variable say X and Z which are correlated with the
variable Y are available or can be cheaply obtained ratio or regression type estimates can be used to
improve the efficiency. These cases may include knowledge of X or Z or both X and Z . These are

38

however situations where prior knowledge about these may be lacking and a census or complete count is
too costly. Two phase sampling is used to gain information about x & z cheaply from a first stage bigger
sample. A sub sample is then selected from the units selected at the first phase & Y is observed for the
selected units.
Useful references in this area are Mohanty (1967), Chand (1975), Ahmed (1977), Kiregyera
(1980, 1984), Sahoo et al (1993) and Roy (2003). We have used Linear models and the method of Least
Squares (L.S) following Roy (2003) to deal with different situations. The results as expected are
encouraging. We have also indicated how slight adjustments can be made in earlier works to improve the
efficiency of the estimates. An implication of this is that some of these earlier works do not fully utilize the
available information.
Let N be the size of the population, from which a sample of size n1 ( n1 < N ) is drawn using a
simple random sampling without replacement. The values of X and Z are noted for the quits selected. From
this sample a sub-sample of size n2 ( n2 < n1 ) is again selected using a simple random sampling with out
replacement observing as Y. S. Further let y2 , x2 and z2 be the sample means of y, x, and z variables
respectively based on the sample of size n1 and let x2 and z2 be the sample mean based on the first phase
sample of size n1 of variable x and z respectively.
Various situations of interest may arise depending on availability of information about X and Z .
We will deal with them separately.
To suit different situation we introduce the following notations. Let S y2 =
1 =

1 N
Yi Y
N 1 i =1

1 1
1
1
, 2 =
, C y2 = S y2 Y 2 with C x2 , C z2 similarly defined. Also xy , yz and xz denote
n1 N
n2 N

the population correlation coefficient between X and Y , Y and Z and X and Z respectively. We will
also write

39

( )

y1 = Y + e y1 , x1 = X + ex1 , z1 = Z + ez1 , E ex21 = 1 X 2 Cx2

( )

( )

E e y21 = 1Y 2 C y2 and E ez21 = 1 Z 2 C y2 . E (ex ey ) = X Y C y C x xy

( )=
ex22

x2 = X + ex2 , E

E e1 ex2

X 2 C x2 , E ex2 e y2 = 2 X YC x C y xy

= ( 2 1 ) X 2 C x2 ,

(4.1.1)

E e y2 ex1 ex2 = ( 2 1 ) Y X C y C x xy

E ex ez1 ez2 = 0

E e y2 ex1 = 1Y X C y Cx xy

with other terms similarly defines: Also we will assume that both e y1 , e y2 are much smaller in comparison
with Y with similar assumptions for auxiliary variables we will look into the following situations
separately.
i) In addition to the sample we are given the population means of X and Z which are X and Z
respectively. We may call this complete information case.
ii) In addition to the sample we are given X only, ( Z being unknown). We will call this partial
information case.
iii) Only the information on the sample is available i.e. X and Z are unknown. We will call this no
additional information case.
4.2 Ratio and Regression Estimators

In this section following estimator of ratio and regression alongwith mean square error have been
considered.
a)

T1( 2 ) =

y2
X
x2

[ X is known ]

b)

T2( 2 ) =

y2
x1
x2

[ no information ]

c)

T3( 2 ) = y2 + byx ( x1 x2 ) [ no information ]

4.2.1 Ratio Estimator with known information

Consider
T1( 2 ) =

y2
X
x2

(4.2.1)
Using (1.1.1) we get

40

T1( 2 )

Y + e y2
X + ex2

.X

ey

= Y 1 + 2

ey

= Y 1 + 2

= Y + e y2

(T Y ) = e y

ex2

1
X

ex
2
X
Y
ex2
X

Y
ex
X 2

The mean square error of T1( 2 ) will be

( ) (

MSE T1( 2 ) = E T1( 2 ) Y

Y
= E e y2 ex2
X

(4.2.2)
Taking the square R.H.S of (4.2.2.) we get

Y2
Y
= E e y22 + 2 ex22 2 e y2 ex2
X
X

Using (1.1.1)

( )

MSE T1( 2 ) = Y 2 2 C y2 +

Y2
X

2 X 2 C x2 2

Y
2Y X C y C x xy
X

On simplification we get

( )

V1( 2 ) = MSE T1( 2 ) = 2Y 2 C y2 + C x2 2 xy C x C y

(4.2.3)

4.2.2 Ratio Estimator with no information

Consider
T2( 2 ) =

y2
x1
x2

(4.2.4)
Using (1.1.1) in (4.2.4) we get
T2( 2 ) =

Y + e y2
X + ex2

( X + ex )
1

41

= Y + e y2

)( X + ex )( X + ex )
1

ex
ex

= Y + e y2 1 + 1 1 2

X
X

ex
ex

= Y + e y2 1 + 1 2

X
X

Y
= Y + e y2 +
ex ex2
X 1

or
T2( 2 ) Y = e y2 +

Y
ex ex2
X 1

The mean square error of T2( 2 )

( ) (

Y
MSE T2( 2 ) = E T2( 2 ) Y = E e y2 +
ex1 ex2
X

(4.2.5)

Y2
= E e y22 + 2 ex1 ex2
X

+2

Y
e y ex ex2
X 2 1

Using (1.1.1) we get

( )

MSE T2( 2 ) = 2Y 2 C y2 +
or

Y2
X2

Y
X
2 2
2
= 2Y C y + Y ( 1 2 ) C x2 2Y 2 ( 1 2 ) C x C y xy

( 1 2 ) X 2 Cx2 + 2 ( 2 1 ) Y X Cx C y xy

( )

V3( 2 ) = MSE T3( 2 ) = Y 2 2 C y2 + ( 2 1 ) C x2 2 C x C y xy

(4.2.6)
4.2.3 Regression Estimator with no information

Consider
T3( 2 ) = y2 + byx ( x1 x2 ) (4.2.7)

Using (1.1.1) we get

T3( 2 ) = Y + e y2 yx + er

) ( ex

= Y + e y2 + yx ex1 ex2

or

(T ( ) Y ) = e
2 2

y2

ex2

+ yx ex1 ex2

(4.2.8)
The mean square error of T3( 2 ) is

( ) (

MSE T3( 2 ) = T3( 2 ) Y


(4.2.9)

= E e y2 + yx ex1 ex2

or

42

( )

MSE T3( 2 ) = E e y22 + yx e y2 ex1 ex2

(4.2.10)

or

( )

MSE T3( 2 ) = 2Y 2 C y2 + yx ( 1 2 ) Y X C y C x xy

Substituting the value of yx =

Y Cy

xy

X Cx

xy YC y
MSE T3( 2 ) = 2Y 2 C y2 +
( 1 2 ) Y X C y Cx xy
XC x

( )

On simplification we get

( )

MSE T3( 2 ) = Y 2 C y2 2 + ( 1 2 ) 2xy

( )

V2( 2 ) = MSE T2( 2 ) = Y 2 C y2 2 1 2xy + 1 2xy

(4.2.11)

4.3 Mohantys [1967] Estimator and some modifications

In this section following estimators are mentioned.


a)

Z
T4( 2 ) = y2 + byx ( x1 x2 )
z2

b)

z
T5( 2 ) = y2 + byx ( x1 x2 ) 1
z2

c)

X
T6( 2 ) = y2 + byz ( z1 z2 )
x2

d)

T7( 2 ) =

z2
z1 + byx X x1

z2

4.3.1 Mohanty (1967) considered the estimation when Z is known

Z
T4( 2 ) = y2 + byx ( x1 x2 )
z2

(4.3.1)
Using (1.1.1) in (4.3.1) we get

T4( 2 ) = Y + ey2 + yx + er

) ( ex

Z
ex2
z + ez
2

On simplification we get

Y
ez
Z 2

Y
ez
Z 2

T4( 2 ) = Y + ey2 + yx ex1 ex2

or
T4( 2 ) Y = ey2 + yx ex1 ex2

The MSE of T3( 2 ) is

43

E T4( 2 ) Y

Y
= E ey2 + yx ex1 ex2 ez2
Z

(4.3.2)

= E ey22 + 2yx ex1 ex2

Y2
Z

ez22 + 2 yx ey2 ex1 ex2

Y
Y
e y2 ez2 2 yx ez2 ex1 ex2
Z
Z

Y2
MSE T4( 2 ) = 2Y 2 C y2 + ( 2 1 ) 2yx X 2 C x2 + 2 2 Z 2 Cz2

Z
+ ( 1 2 ) 2 yx Y X C y C x xy 2 2
2 ( 1 2 ) yx

Y
Y Z C y C z yz
Z

Y
Z X C z C x xz
Z

(4.3.3)
Putting the values of yx =

xy C y Y
X Cx

in (4.3.3) we get

( )

MSE T4( 2 ) = 2Y 2 C y2 + ( 1 1 )

2xy C y2Y 2
X 2 Cx2

X 2 C x2

+2 ( 1 2 )

2 Y 2 C y C z yz 2

xy YC y
X Cx

Y X C y C x xy

xy C y Y Y
( 1 2 ) Z X Cz Cx xz
Cx X Z

(4.3.4)

On simplification

( )

MSE T4( 2 ) = Y 2 2 C y2 + ( 2 1 ) C y2 2xy + 2 Cz2 + 2 ( 1 2 ) C y2 2xy

2 2 C y C z yz 2 ( 1 2 ) C y Cz xy xz

= Y 2 2C y2 ( 2 1 ) C y22xy + 2 Cz2 22 C y Cz yz 2 ( 1 2 ) C y Cz xy xz

= Y 2 2 C y2 2 C y2 2yz + 2 C y2 2yz ( 2 1 ) C y2 2xy + 2 C z2

22 C y C z 2 ( 1 2 ) C y Cz xy xz

or

= Y 2 2 C y2 1 2yz + 2 C z2 2C y Cz + C y2 2yz C y2 2yz

2 C y2 2yz ( 2 1 ) C y2 2xy 2 ( 1 2 ) C y C z xy xz

or

) (


= Y 2 2 C y2 1 2xz + C z C y yz

) C
2

2 2
y yz

44

+2 C y2 2yz ( 2 1 ) C y2 2xy 2 ( 1 2 ) C y C z xy xz

or

) (


= Y 2 2 C y2 1 2xz + C z C y yz

)
2

+ ( 2 1 ) C y2 2xy + 2C y C z xy xz C z2 2xz + C z2 2xz

or

( )

) (


V4( 2 ) = MSE T4( 2 ) = Y 2 2 C y2 1 2xz + C z C y yz

)}

+ ( 2 1 ) C z2 2xz C y xy C z 2xz

(4.3.5)
4.3.2 Mohantys Ratio-Cum-Regression Estimator with no Information

Mohanty (1967) constructed another Ratio-Cum-Regression estimator when sample


information are only given i.e.

z
T5( 2 ) = y2 + byx x1 x2 1

z2

(4.3.6)
Using (1.1.1) we get
Z + ez
1
T5( 2 ) = Y + ey2 + Byx + er ex1 ex2

Z + ez
2

or

ez ez
T5( 2 ) = Y + ey2 + Byx ex1 ex2 1 + 1 1 2


Z
Z

The mean square error of T5( 2 ) is

E T5( 2 ) Y

Y
= E ey2 + yx ex1 ex2 +
ez1 ez2
Z

(4.3.7)

( )

MSE T5( 2 ) = E ey22 + 2yx ex1 ex2

+2 yx ey2 ex1 ex2 + 2

Y2

( ez
Z2

ez2

Y
Y
e y2 ez1 ez2 + 2 yx
ex ex2
Z
Z 1

)( ez

ez2

Using (1.1.1) we get

45

( )

MSE T5( 2 ) = 2Y 2 C y2 + 2yx ( 2 1 ) X 2 Cx2 +

Y2
Z2

( 2 1 ) Z 2 Cz2

Y
( 2 1 ) Y Z C y Cz yz
Z
+2 yx ( 2 1 ) X Z C x C z yz

+2 ( 1 2 ) yx Y X C y C x xy + 2

(4.3.8)
Putting the value of yx =

YC y
XC x

( )

MSE T5( 2 ) = 2Y

+2 ( 2 1 )
+2 ( 2 1 )

xy C y Y
X Cx
xy C y Y
X Cx

xy

C y2

+ ( 2 1 )

2xy C y2 Y 2
C x2 X 2

X 2 C x2 +

Y2
Z

( 2 1 ) Z 2 Cz2

Y
( 2 1 ) Y Z C y Cz yz
Z

Y X C y Cx xy + 2
X Y Cx C z xy yz

On simplification we get

( )

MSE t5( 2 ) = Y 2 2 C y2 + ( 2 1 ) C y2 2xy C z2 2xz + 2C y Cz xy xz + Cz2 2xz

+Cz2 + C y2 2yz 2C y Cz yz + C y2 2yz

= Y 2 2 C y2 + ( 2 1 ) 2xy C y2 + C z2 2 2xy C y2

2 C y C z yz + 2 C y C z xy xz
= Y 2 2 C y2 + ( 2 1 ) 2xy C y2 C z2 2C y Cz yz + 2C y C z xy xz

( )

V5( 2 ) = MSE T5( 2 ) = Y 2 2 C y2 + ( 2 1 ) 2xz Cz2 xy C y xz Cz

+ C z C y yz

C y2 2xz

(4.3.9)
4.3.3 Modification of T4( 2 ) by Interchanges X and Z
X
T6( 2 ) = y2 + byz ( z1 z2 )
x2
(4.3.10)

Using (1.1.1) in (4.3.10) we get

T6( 2 ) = Y + e y2 + yz + ez

) ( ez

X
ez2
X + ex
2

ex

= Y + e y2 + yz ez1 ez2 1 2

46

T6( 2 ) = Y

Y
+ ex2 + e y2 + yz ez1 ez2
X

or

T6( 2 ) Y = e y2 + yz ez1 ez2

Y
ex
X 2

The mean square error of T6( 2 ) is

( ) ( )

MSE T6( 2 ) = E T7( 2 )

Y
= E e y2 + yz ez1 ez2 ex2
X

(4.3.11)
Squaring the R.H.S of (4.3.11)

( )

Y2
MSE T6( 2 ) = e y22 + 2yz ez1 ez2 + 2 e x22 + 2 yz e y2 ez1 ez2
X

Y
Y
e y2 ex2 2 yz ex2 ez1 ez2
X
X

Using (1.1.1)

( )

MSE T6( 2 ) = 2 Y 2 C y2 + ( 2 1 ) 2yz Z 2 + 2

22

Y2

X 2 C x2

X
+ ( 1 2 ) 2 yz Y ZC y C z C y C z yz

Y
Y
Y X C y C z xy ( 1 2 ) 2 yz
X Z Cx Cz xz
X
X

Putting the value of yz =

Y 2C y
Z 2 ez

2yz

( )

MSE T6( 2 ) = 2 Y 2 C y2 + ( 2 1 )

2yz Y 2 C y2
Z

C z2

+ 2

+ 2 ( 1 2 )

22

Y2
X

X 2 C x2

yz Y C y
Z Cz

Y Z C y Cz yz

yz Y C y Y
Y
. X Z C x C z C y C z xz
Y X C y C x xy 2 ( 1 2 )
X
Z Cz
X

or
= Y 2 2 C y2 + ( 1 1 ) C y2 2xz + 2 C x2 2 2 C y C x xy

2 ( 1 2 ) yz xz C y C x

= Y 2 2 C y2 + ( 2 1 ) C y2 2xz + 2 C y Cx xy 2 ( 1 2 ) yz xz C y C x

= Y 2 2 C y2 + ( 2 1 ) C y2 2xz + C x C y yz xz + 2 Cx2 2C y C x xy

47

= Y 2 2 C y2 + ( 2 1 ) C y2 2xz + Cx2 2yz 2Cx C y yz xz C x2 2yz

+2 Cx2 + C y2 2xy 2C y Cx xy C y2 2xy

MSE T6( 2 ) = Y 2 2 C y2 + ( 2 1 ) C x2 2xy C y xy C x yz

+2 Cx C y xy

(4.3.12)

C y2 xy

4.3.4 Modification of Mohanty when X is known


T7( 2 ) =

y2
z1 + byx X x1

z2
(4.3.13)

Using (1.1.1) in (4.3.13) we get


T7( 2 ) =

Y + e y2

Z + ez + yx 1 ex
1
1
Z + ez 2
=

( Y + e y ) 1 ez
2

ez1

Z 1 +
yx ex1
Z
Z

ez ez
ex
ez
= Y + ey2 1 + 1 1 2 yx 1 1 2

Z
Z
Z
Z

ez
ez

= Y + e y2 1 + 1 2 yx ex1
Z
Z

or
= Y + e y2
T7( 2 ) Y = e y2 +

Y
Y
ez1 ez2 yx ex1
Z
Z

Y
Y
ez ez2 yx ex1
Z 1
Z

Mean square error of T7( 2 ) is

( ) (

MSE T7( 2 ) = E T7( 2 ) Y

Y
Y
E e y2 +
ez1 ez2 yx ex1
Z
Z

(4.3.14)

Y2
= E e y22 + 2 ez1 ez2
Z

Y2
Z

2yx e x21

Y
Y
Y2
e y2 ez1 ez2 2 yx ey2 ex1 2 2 yx ex1 ez1 ez2
Z
Z
Z

= 2Y 2 C y2 + ( 2 1 ) +

Y2
Z

Z 2 C z2

Y2
Z

X 2 C x2 + 2 ( 1 2 )

Y
YC y ZCz yz
Z

48

Y
1 yz Y X C y C x yx
Z

On simplification we get

( )

X2
MSE T7( 2 ) = Y 2 2 C y2 + ( 2 1 ) Cz2 + 12yx 2 C x2
Z

+2 ( 1 2 ) C y C z 21

X
yz C y C z yx
Z

Putting the value of yx

2xy Y 2 C y2 X 2 2
MSE T7( 2 ) = Y 2 C y2 + ( 2 1 ) Cz2 + 1
Cx
X 2 Cx2 Z 2

( )

+2 ( 1 2 ) C y Cz yz 21

Y Cy
X
C y C x yz
xy
Z
X Cx

or

Y2
= Y 2 2 e y1 + ( 2 1 ) Cz2 + 1 2 C y2 2xy
Z

+2 ( 1 2 ) C y Cz yz 21

Y 2
C y xy yz
Z

or

= Y 2 2 C y2 + ( 2 1 )

{C

2
z

2C y C z xz + C y2 2yx C y2 2xy

2
Y
Y

+2 2 C y2 2xy 2 C y2 xy yz
Z
Z

or

= Y 2 2 C y2 + ( 2 1 ) Cz C y xy

C y2 2yx

Y 2

Y
+1 2 C y2 2xy + 2yz 2yz 2 C y2 xy yz
Z
Z

or

( )

MSE T7( 2 ) = Y 2 2 C y2 + ( 2 1 ) Cz C y xy

C y2 2xy

2
Y

+1 C y xy yz 2yz

(4.3.15)

4.4

Chands (1975) Estimators

In this section following estimator are derived


a)

T8( 2 ) = y2

x1 Z
x2 z1

49

b)

T9( 2 ) = y2

x2 z1
x1 Z

c)

T10( 2 ) = y2

z1 x1
z2 X

d)

T11( 2 ) = y2

z1 X
z2 x1

4.4.1 Chand (1975) suggested chain-based ratio and product estimator-I


T8( 2 ) = y2

x1 Z
x2 z1

(4.4.1)
Using (1.1.1) in (4.4.1) we get

T8( 2 ) = Y + e y2

X +e1

) X + ex

Z
Z + ez1

x2

ex
ex
ez

= Y + e y2 1 + 1 1 2 1 1

X
X
Z

ex
ex
ez

= Y + e y2 1 + 1 1 1

X
X
Z

Y
Y
=Y +
ex ex2 ez1 + e y2
X 1
Z

or
T8( 2 ) Y = e y2 +

Y
Y
ex ex2 ez1
X 1
Z

The mean square of T8( 2 ) will be

Y
Y
E T8( 2 ) Y = E e y2 +
ex1 ex2 ez1
X
Z

(4.4.2)

Squaring the R.H.S. of (4.4.2)

Y2
= E e y22 + 2 ex1 ex2
X

Y2
Z

ez21 + 2

Y
ey ex ex2
X 2 1

Y
Y2
e y2 ez1 2
ex1 ex2 ez1
Z
XZ

Using (1.1.1) we get

( )

MSE T8( 2 ) = Y 2 C y2 +
+2

Y2
X

( 2 1 ) X Cx2 + 1
2

Y2
Z

Z 2 Cz2

Y
Y
( 1 2 ) Y X C y Cx xy 2 1Y Z C y Cz yz 0
X
Z

( )

MSE T8( 2 ) = Y 2 2 C y2 + ( 2 1 ) Cx2 + 1 Cz2

50

+ 2 ( 1 2 ) C y C x xy 21 C y Cz yz

= Y 2 2 C y2 + ( 2 1 ) Cx2 + 2xy C y2 2 C y C x xy 2xy C y2

+ 1 C z2 21 C y C z yz

{(

( )

V8( 2 ) =MSE T8( 2 ) = Y 2 2 C y2 + ( 2 1 ) C x C y xy

{(

+1 C z C y yz

C y2 2xy

C y2 2yz

(4.4.3)
4.4.2 Chand (1975) Chain-based Ratio and Product Estimator-II

Chand (1975) considered another chain-based ratio estimator


T9( 2 ) = y2

x2 z1
x1 Z

(4.4.4)
Using (1.1.1) in (4.4.4) we get

T9( 2 ) = Y + e y2

X +e

) X + ex

Z + ez1
Z

x1

ex e x
ez

= Y + e y2 1 + 2 1 1 1 + 1

X
X
Z

Y
Y
= Y + e y2
ex1 ex3 + ez1
X
Z

T9( 2 ) Y = e y2

Y
Y
ex ex2 + ez1
X 1
Z

The mean square error of T9( 2 ) is

( ) (

MSE T9( 2 ) = E T9( 2 ) Y

Y
Y
= E e y2
ex1 ex2 + ez1
X
Z

(4.4.5)

Y2
= E e y22 + 2 ex1 ex2
X

+2

Y2
Z

ez21 2

Y
ey ex ex2
X 2 1

Y
Y2
e y2 ez1 2
ex1 ex2 ez1
Z
XZ

(4.4.6)
Using (1.1.1) in (4.4.6) we get

( )

MSE T9( 2 ) = 2Y 2 C y2 +
2

Y2

Y2

Z2

( 2 1 ) X 2 Cx2 +
2

1 Z 2 Cz2

Y
Y
( 1 2 ) Y X C y Cx xy + 2 2 C y Cz yz Y Z + 0
X
Z

or

51

= Y 2 2 C y2 + ( 2 1 ) C x2 + 1 C z2 + 2 ( 2 1 ) C y C x xy + 21 C y Cz yz

or

= Y 2 2 C y2 + ( 2 1 ) C x2 + 2 C y Cx xy + 1 C z2 + 2 C y C z yz

{(

( )

V9( 2 ) = MSE T9( 2 ) = Y 2 2 C y2 + ( 2 1 ) C x + xy C y

2
+1 C z + yz C z 2yz C z2

(4.4.7)

C y2 2xy

4.4.3 Modification of Chands T9( 2 )

Following additional estimator parallel to Chand (1975) estimator is considered


T10( 2 ) = y2

z1 x1
z2 X

(4.4.8)
Using (1.1.1) in (4.3.8) we get

T10( 2 ) = Y + e y2

Z +e1

) Z + ez

X
X + ex1

z2

ez
ez
ex

= Y + e y2 1 + 1 1 + 2 1 + 1

Z
Z
X

= Y + e y2

(T10 Y ) = ey

Y
Y
ez ez2 ex1
Z 1
Z

Y
Y
ez1 ez2 ex1
Z
Z

The mean square error of T10( 2 ) is

Y
Y
E T10( 2 ) Y = e y2 +
ez1 ez2 ex1
Z
Z

(4.4.9)

Y2
= E e y22 + 2 ez1 ez2
Z

Y2
Z2

e x21 +

Y
Y2
e y2 ex1 2 2 ex1 ez1 ez2
Z
Z

Y
ey ez ez2
Z 2 1

(4.4.10)
Using (1.1.1) in (4.4.10) we get

MSE T10( 2 ) = 2Y 2 C y2 +
+2

Y2

Y2

Z2

( 2 1 ) Z 2 Cz2 + 2
2

X 2 C x2

Y
Y
( 1 2 ) Y Z C y Cz yz 2 1Y XC y Cx xy + 0
Z
Z

52

= Y 2 2 C y2 + ( 2 1 ) C z2 + 1Cx2 + 2 ( 1 2 ) C y C z yz 21C y C x xy

= Y 2 2 C y2 + ( 2 1 ) C z2 2C y Cz yz + C y2 2yz C y2 2yz

1 C x2 2C y C x xy + C y2 2xy C y2 2xy

or

V10( 2 ) = MSE T10( 2 ) = Y 2 2 C y2 + ( 2 1 ) C z C y yz

+1 C x C y xy

(4.4.11)

C y2 2yz

C y2 2xy

4.4.4 Modification of Chands T8( 2 )


T11( 2 ) = y2

z1 x1
z2 X

(4.4.12)
Using (1.1.1) in (4.4.12) we get

= Y + e y2

Z + e 1 X + ex1
.
X
z2

) Z + ez

ez
ez
ex

= Y + e y2 1 + 1 1 + 2 1 + 1

Z
Z
X

T11( 2 ) = Y + e y2 +

Y
Y
ez1 ez2 + ex1
Z
Z

T11( 2 ) Y = e y2 +

Y
Y
ez ez2 + ex1
Z 1
Z

The mean square error of T11( 2 ) is

) (

MSE T11( 2 ) = T11( 2 ) Y

Y
Y
= E e y2 +
ez1 ez2 + ex1
Z
Z

(4.4.13)

Y2
Y2
MSE T11( 2 ) = E e y22 + 2 ez1 ez2 + 2 e x21
Z
Z

+2

Y
Y
Y2
ez1 ez2 + 2 ey2 + 2 ex1 ez1 ez2
Z
Z
Z

(4.4.14)

Using (1.1.1) in (4.4.14) we get

MSE T11( 2 ) = 2Y 2 C y2 +

Y2
Z

( 2 1 ) Z 2 Cz2 + 1
2

Y2
Z

X 2 C x2

53

Y
Y
( 2 1 ) Y Z C y Cz yz + 2 1Y X C y Cz xy + 0
Z
X

On simplification we get

MSE T11( 2 ) = Y 2 2 C y2 + ( 2 1 ) Cz2 + 1C x2

+2 ( 2 1 ) C y C z yz + 21C y C x x xy

= Y 2 2 C y2 + ( 2 1 ) Cz2 + 2C y Cz yz + 1 C x2 + 2C y C x xy

V11( 2 ) = MSE T10( 2 ) = Y 2 2 C y2 + ( 2 1 ) C z + C y yz

{(

C y2 yz

+1 C x + C y xy C y2 2xy

(4.4.15)
4.5

Kiregyeras Estimators and some modifications

Kiregyera (1980, 1984) suggested the following estimators:


a)
b)
c)

T12( 2 ) =

y2
x1 + bxz Z z1

x2

T13( 2 ) = y2 + byx 1 Z x2
z1

T14( 2 ) = y2 + byx ( x1 x2 ) + bxz z1 Z

The following estimators on the lines of Kiregyeras are also been suggested to meet the
requirements of this monographs:

y2
z1 + bzx X x1

z2

d)

T15( 2 ) =

e)

T16( 2 ) = y2 + byz 1 X z2
x1

4.5.1 Kiregyeras (1980) Estimator (Chand-Kiregyera Estimator)

This is a modification of Chand (1975). Kiregyera (1980) assumed that Z i is closely related to X i , but
compared to X i is remotely related to Yi . This assumption may not always be to realize in particular.
Therefore T8( 2 ) may not be effectively used in many situations
T12( 2 ) =

y2
x1 + bxz Z z1

x2

(4.5.1)
Using (1.1.1) in (4.5.1) we get
T12( 2 ) =

Y + e y2

X + ex + ( xz + er ) Z Z ez
1
1
X + ex2

54

= Y + e y2

ex2
ex1

1 +
X 1 +
xz ez1
X
X

1
X

ex
ex
ez

= Y + e y2 1 2 1 + 1 xz 1

X
X
X

ex
ez

ex
= Y Y 2 + e y2 1 + 1 xz 1
X
X
X


ex

Y
Y
= Y Y 2 + e y2 + ex1 xz ez1
X
X
X

T12( 2 ) = Y + e y2 +

Y
Y
ex ex2 xz ez1
X 1
X

T12( 2 ) Y = e y2 +

Y
Y
ex1 ex2 xz ez1
X
X

(4.5.2)
The mean square error of T12( 2 )

E T12( 2 ) Y

Y
Y
= E e y2 +
ex1 ex2 xz ez1
X
X

(4.5.3)

Y
= E e y22 +
ex ex2
X 1

Y2

2Y
e y ex ex2
X 2 1

Y
Y2
xz e y2 ez1 2 xz 2 ez1 ez1 ez2
X
X

+ 2xz

ez21 +

(4.5.4)
Using (1.1.1) in (4.5.4) we get

$MSE\left(T_{12}^{(2)}\right) = \theta_2\bar{Y}^2C_y^2+\dfrac{\bar{Y}^2}{\bar{X}^2}\left(\theta_2-\theta_1\right)\bar{X}^2C_x^2+\theta_1\beta_{xz}^2\dfrac{\bar{Y}^2}{\bar{X}^2}\bar{Z}^2C_z^2+2\left(\theta_1-\theta_2\right)\dfrac{\bar{Y}}{\bar{X}}\bar{Y}\bar{X}C_yC_x\rho_{xy}-2\theta_1\beta_{xz}\dfrac{\bar{Y}}{\bar{X}}\bar{Y}\bar{Z}C_yC_z\rho_{yz}+0$   (4.5.5)

Putting the value of $\beta_{xz} = \rho_{xz}\dfrac{\bar{X}C_x}{\bar{Z}C_z}$ in (4.5.5) we get

$MSE\left(T_{12}^{(2)}\right) = \bar{Y}^2\left[\theta_2C_y^2+\left(\theta_2-\theta_1\right)C_x^2+\theta_1\rho_{xz}^2C_x^2+2\left(\theta_1-\theta_2\right)C_yC_x\rho_{xy}-2\theta_1\rho_{xz}\rho_{yz}C_yC_x\right]$

or

$= \bar{Y}^2\left[\theta_2C_y^2+\left(\theta_2-\theta_1\right)\left\{C_x^2-2C_yC_x\rho_{xy}+C_y^2\rho_{xy}^2-C_y^2\rho_{xy}^2\right\}+\theta_1\left\{\rho_{xz}^2C_x^2-2\rho_{xz}\rho_{yz}C_yC_x+C_y^2\rho_{yz}^2-C_y^2\rho_{yz}^2\right\}\right]$

or

$V_{12}^{(2)} = MSE\left(T_{12}^{(2)}\right) = \bar{Y}^2\left[\theta_2C_y^2+\left(\theta_2-\theta_1\right)\left\{\left(C_x-C_y\rho_{xy}\right)^2-C_y^2\rho_{xy}^2\right\}+\theta_1\left\{\left(C_x\rho_{xz}-C_y\rho_{yz}\right)^2-C_y^2\rho_{yz}^2\right\}\right]$   (4.5.6)
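A convenient way to sanity-check a first-order formula such as (4.5.6) is a small Monte Carlo experiment. The sketch below (Python with NumPy; the population, design sizes and all numbers are assumed for illustration and are not from the monograph) repeatedly draws two-phase samples, computes $T_{12}^{(2)}$, and compares the empirical mean square error with the value given by (4.5.6).

```python
import numpy as np

rng = np.random.default_rng(7)

# Assumed synthetic population; none of these values come from the monograph.
N, n1, n2, reps = 5000, 500, 125, 3000
z = rng.gamma(8.0, 2.5, N)
x = 1.8 * z + rng.normal(0, 4, N)
y = 2.2 * x + rng.normal(0, 9, N)
Ybar, Zbar = y.mean(), z.mean()

def t12(sample1, sample2):
    """Kiregyera (1980) estimator computed from one two-phase draw, see (4.5.1)."""
    b_xz = np.cov(x[sample1], z[sample1])[0, 1] / np.var(z[sample1], ddof=1)
    x1, z1 = x[sample1].mean(), z[sample1].mean()
    x2, y2 = x[sample2].mean(), y[sample2].mean()
    return (y2 / x2) * (x1 + b_xz * (Zbar - z1))

errors = []
for _ in range(reps):
    s1 = rng.choice(N, n1, replace=False)
    s2 = rng.choice(s1, n2, replace=False)
    errors.append(t12(s1, s2) - Ybar)
empirical_mse = np.mean(np.square(errors))

# Theoretical value from (4.5.6).
th1, th2 = 1/n1 - 1/N, 1/n2 - 1/N
Cy, Cx, Cz = y.std(ddof=1)/Ybar, x.std(ddof=1)/x.mean(), z.std(ddof=1)/Zbar
r = np.corrcoef([y, x, z])
r_xy, r_yz, r_xz = r[0, 1], r[0, 2], r[1, 2]
v12 = Ybar**2 * (th2*Cy**2
                 + (th2 - th1)*((Cx - Cy*r_xy)**2 - (Cy*r_xy)**2)
                 + th1*((Cx*r_xz - Cy*r_yz)**2 - (Cy*r_yz)**2))
print(empirical_mse, v12)   # the two values should be of comparable magnitude
```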

4.5.2 Kiregyera's (1984) Estimator (Chand-Kiregyera Estimator)

$T_{13}^{(2)} = \bar{y}_2+b_{yx}\left(\dfrac{\bar{x}_1}{\bar{z}_1}\bar{Z}-\bar{x}_2\right)$   (4.5.7)

Using (1.1.1) in (4.5.7) we get

$T_{13}^{(2)} = \bar{Y}+e_{y_2}+\left(\beta_{yx}+e_r\right)\left[\dfrac{\bar{X}+e_{x_1}}{\bar{Z}+e_{z_1}}\bar{Z}-\bar{X}-e_{x_2}\right]$

$= \bar{Y}+e_{y_2}+\beta_{yx}\left[\left(\bar{X}+e_{x_1}\right)\left(1-\dfrac{e_{z_1}}{\bar{Z}}\right)-\bar{X}-e_{x_2}\right]$

$= \bar{Y}+e_{y_2}+\beta_{yx}\left(e_{x_1}-e_{x_2}\right)-\beta_{yx}\dfrac{\bar{X}}{\bar{Z}}e_{z_1}$

$T_{13}^{(2)}-\bar{Y} = e_{y_2}+\beta_{yx}\left(e_{x_1}-e_{x_2}\right)-\beta_{yx}\dfrac{\bar{X}}{\bar{Z}}e_{z_1}$   (4.5.8)

The mean square error of $T_{13}^{(2)}$ is

$E\left(T_{13}^{(2)}-\bar{Y}\right)^2 = E\left[e_{y_2}+\beta_{yx}\left(e_{x_1}-e_{x_2}\right)-\beta_{yx}\dfrac{\bar{X}}{\bar{Z}}e_{z_1}\right]^2$   (4.5.9)

$= E\left[e_{y_2}^2+\beta_{yx}^2\left(e_{x_1}-e_{x_2}\right)^2+\beta_{yx}^2\dfrac{\bar{X}^2}{\bar{Z}^2}e_{z_1}^2+2\beta_{yx}e_{y_2}\left(e_{x_1}-e_{x_2}\right)-2\beta_{yx}\dfrac{\bar{X}}{\bar{Z}}e_{y_2}e_{z_1}-2\beta_{yx}^2\dfrac{\bar{X}}{\bar{Z}}e_{z_1}\left(e_{x_1}-e_{x_2}\right)\right]$   (4.5.10)

Using (1.1.1) in (4.5.10) we get

$MSE\left(T_{13}^{(2)}\right) = \theta_2\bar{Y}^2C_y^2+\left(\theta_2-\theta_1\right)\beta_{yx}^2\bar{X}^2C_x^2+\theta_1\beta_{yx}^2\dfrac{\bar{X}^2}{\bar{Z}^2}\bar{Z}^2C_z^2+2\left(\theta_1-\theta_2\right)\beta_{yx}\bar{Y}\bar{X}C_yC_x\rho_{xy}-2\theta_1\beta_{yx}\dfrac{\bar{X}}{\bar{Z}}\bar{Y}\bar{Z}C_yC_z\rho_{yz}+0$   (4.5.11)

Putting the value of $\beta_{yx} = \rho_{xy}\dfrac{\bar{Y}C_y}{\bar{X}C_x}$ in (4.5.11) we get

$MSE\left(T_{13}^{(2)}\right) = \bar{Y}^2C_y^2\left[\theta_2+\left(\theta_2-\theta_1\right)\rho_{xy}^2+\theta_1\rho_{xy}^2\dfrac{C_z^2}{C_x^2}+2\left(\theta_1-\theta_2\right)\rho_{xy}^2-2\theta_1\rho_{xy}\rho_{yz}\dfrac{C_z}{C_x}\right]$

or

$= \bar{Y}^2C_y^2\left[\theta_2-\left(\theta_2-\theta_1\right)\rho_{xy}^2+\theta_1\left\{\rho_{xy}^2\dfrac{C_z^2}{C_x^2}-2\rho_{xy}\rho_{yz}\dfrac{C_z}{C_x}\right\}\right]$

or

$V_{13}^{(2)} = MSE\left(T_{13}^{(2)}\right) = \bar{Y}^2C_y^2\left[\theta_2-\left(\theta_2-\theta_1\right)\rho_{xy}^2+\theta_1\left\{\left(\rho_{xy}\dfrac{C_z}{C_x}-\rho_{yz}\right)^2-\rho_{yz}^2\right\}\right]$   (4.5.12)

4.5.3 Kiregyera's (1984) Regression-in-Regression Estimator

Kiregyera (1984) also developed a regression-in-regression estimator, i.e.

$T_{14}^{(2)} = \bar{y}_2+b_{yx}\left\{\left(\bar{x}_1-\bar{x}_2\right)+b_{xz}\left(\bar{Z}-\bar{z}_1\right)\right\}$   (4.5.13)

This may be written as

$T_{14}^{(2)} = \bar{y}_2+b_{yx}\left(\bar{x}_1-\bar{x}_2\right)+b_{yx}b_{xz}\left(\bar{Z}-\bar{z}_1\right) = \bar{Y}+e_{y_2}+\beta_{yx}\left(e_{x_1}-e_{x_2}\right)-\beta_{yx}\beta_{xz}e_{z_1}$

or

$T_{14}^{(2)}-\bar{Y} = e_{y_2}+\beta_{yx}\left(e_{x_1}-e_{x_2}\right)-\beta_{yx}\beta_{xz}e_{z_1}$

The mean square error of $T_{14}^{(2)}$ is

$E\left(T_{14}^{(2)}-\bar{Y}\right)^2 = E\left[e_{y_2}+\beta_{yx}\left(e_{x_1}-e_{x_2}\right)-\beta_{yx}\beta_{xz}e_{z_1}\right]^2$   (4.5.14)

Taking the square of the R.H.S. of (4.5.14) we get

$MSE\left(T_{14}^{(2)}\right) = E\left[e_{y_2}^2+\beta_{yx}^2\left(e_{x_1}-e_{x_2}\right)^2+\beta_{yx}^2\beta_{xz}^2e_{z_1}^2+2\beta_{yx}e_{y_2}\left(e_{x_1}-e_{x_2}\right)-2\beta_{yx}\beta_{xz}e_{y_2}e_{z_1}-2\beta_{yx}^2\beta_{xz}e_{z_1}\left(e_{x_1}-e_{x_2}\right)\right]$   (4.5.15)

Using (1.1.1) in (4.5.15) we get

$MSE\left(T_{14}^{(2)}\right) = \theta_2\bar{Y}^2C_y^2+\left(\theta_2-\theta_1\right)\beta_{yx}^2\bar{X}^2C_x^2+\theta_1\beta_{yx}^2\beta_{xz}^2\bar{Z}^2C_z^2+2\left(\theta_1-\theta_2\right)\beta_{yx}\bar{Y}C_y\bar{X}C_x\rho_{xy}-2\theta_1\beta_{yx}\beta_{xz}\bar{Y}C_y\bar{Z}C_z\rho_{yz}+0$   (4.5.16)

Putting the values of $\beta_{yx} = \rho_{xy}\dfrac{\bar{Y}C_y}{\bar{X}C_x}$ and $\beta_{xz} = \rho_{xz}\dfrac{\bar{X}C_x}{\bar{Z}C_z}$ we get

$MSE\left(T_{14}^{(2)}\right) = \bar{Y}^2C_y^2\left[\theta_2+\left(\theta_2-\theta_1\right)\rho_{xy}^2+\theta_1\rho_{xy}^2\rho_{xz}^2+2\left(\theta_1-\theta_2\right)\rho_{xy}^2-2\theta_1\rho_{xy}\rho_{xz}\rho_{yz}\right]$

or

$V_{14}^{(2)} = MSE\left(T_{14}^{(2)}\right) = \bar{Y}^2C_y^2\left[\theta_2-\left(\theta_2-\theta_1\right)\rho_{xy}^2+\theta_1\left(\rho_{xy}^2\rho_{xz}^2-2\rho_{xy}\rho_{xz}\rho_{yz}\right)\right]$   (4.5.17)
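To see how the three Kiregyera-type forms compare, the sketch below (Python; all correlations, coefficients of variation and design fractions are assumed values, not taken from the monograph) evaluates $V_{12}^{(2)}$, $V_{13}^{(2)}$ and $V_{14}^{(2)}$ from (4.5.6), (4.5.12) and (4.5.17) for one hypothetical configuration.

```python
# Hypothetical comparison of (4.5.6), (4.5.12) and (4.5.17); every value below is assumed.
Ybar, Cy, Cx, Cz = 50.0, 0.40, 0.50, 0.45
r_xy, r_yz, r_xz = 0.80, 0.55, 0.70
th1, th2 = 1/400 - 1/2000, 1/100 - 1/2000

v12 = Ybar**2 * (th2*Cy**2 + (th2-th1)*((Cx - Cy*r_xy)**2 - (Cy*r_xy)**2)
                 + th1*((Cx*r_xz - Cy*r_yz)**2 - (Cy*r_yz)**2))
v13 = Ybar**2 * Cy**2 * (th2 - (th2-th1)*r_xy**2
                         + th1*((r_xy*Cz/Cx - r_yz)**2 - r_yz**2))
v14 = Ybar**2 * Cy**2 * (th2 - (th2-th1)*r_xy**2
                         + th1*(r_xy**2*r_xz**2 - 2*r_xy*r_xz*r_yz))
print(v12, v13, v14)   # smaller value = more efficient estimator for this configuration
```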

4.5.4 Some Modifications of Kiregyera's Estimators

$T_{15}^{(2)} = \dfrac{\bar{y}_2}{\bar{z}_2}\left[\bar{z}_1+b_{zx}\left(\bar{X}-\bar{x}_1\right)\right]$   (4.5.18)

Using (1.1.1) in (4.5.18) we get

$T_{15}^{(2)} = \dfrac{\bar{Y}+e_{y_2}}{\bar{Z}+e_{z_2}}\left[\bar{Z}+e_{z_1}+\left(\beta_{zx}+e_r\right)\left(\bar{X}-\bar{X}-e_{x_1}\right)\right]$

On simplification we get

$T_{15}^{(2)} = \left(\bar{Y}+e_{y_2}\right)\left(1+\dfrac{e_{z_1}-e_{z_2}-\beta_{zx}e_{x_1}}{\bar{Z}}\right)$   (4.5.19)

or

$T_{15}^{(2)} = \bar{Y}+e_{y_2}+\dfrac{\bar{Y}}{\bar{Z}}\left(e_{z_1}-e_{z_2}\right)-\beta_{zx}\dfrac{\bar{Y}}{\bar{Z}}e_{x_1}$   (4.5.20)

Putting the value of $\beta_{zx} = \rho_{xz}\dfrac{\bar{Z}C_z}{\bar{X}C_x}$ in (4.5.20) we get

$T_{15}^{(2)}-\bar{Y} = e_{y_2}+\dfrac{\bar{Y}}{\bar{Z}}\left(e_{z_1}-e_{z_2}\right)-\rho_{xz}\dfrac{C_z}{C_x}\dfrac{\bar{Y}}{\bar{X}}e_{x_1}$   (4.5.21)

The mean square error of $T_{15}^{(2)}$ is

$MSE\left(T_{15}^{(2)}\right) = E\left(T_{15}^{(2)}-\bar{Y}\right)^2 = E\left[e_{y_2}+\dfrac{\bar{Y}}{\bar{Z}}\left(e_{z_1}-e_{z_2}\right)-\rho_{xz}\dfrac{C_z}{C_x}\dfrac{\bar{Y}}{\bar{X}}e_{x_1}\right]^2$   (4.5.22)

Taking the square of the R.H.S. of (4.5.22) we get

$MSE\left(T_{15}^{(2)}\right) = E\left[e_{y_2}^2+\dfrac{\bar{Y}^2}{\bar{Z}^2}\left(e_{z_1}-e_{z_2}\right)^2+\rho_{xz}^2\dfrac{C_z^2}{C_x^2}\dfrac{\bar{Y}^2}{\bar{X}^2}e_{x_1}^2+2\dfrac{\bar{Y}}{\bar{Z}}e_{y_2}\left(e_{z_1}-e_{z_2}\right)-2\rho_{xz}\dfrac{C_z}{C_x}\dfrac{\bar{Y}}{\bar{X}}e_{y_2}e_{x_1}-2\rho_{xz}\dfrac{C_z}{C_x}\dfrac{\bar{Y}^2}{\bar{X}\bar{Z}}e_{x_1}\left(e_{z_1}-e_{z_2}\right)\right]$   (4.5.23)

Using (1.1.1) in (4.5.23) we get

$MSE\left(T_{15}^{(2)}\right) = \theta_2\bar{Y}^2C_y^2+\left(\theta_2-\theta_1\right)\bar{Y}^2C_z^2+\theta_1\rho_{xz}^2C_z^2\bar{Y}^2+2\left(\theta_1-\theta_2\right)\bar{Y}^2C_yC_z\rho_{yz}-2\theta_1\rho_{xz}\bar{Y}^2C_yC_z\rho_{xy}+0$   (4.5.24)

On simplification we get

$MSE\left(T_{15}^{(2)}\right) = \bar{Y}^2\left[\theta_2C_y^2+\left(\theta_2-\theta_1\right)\left\{C_z^2-2C_yC_z\rho_{yz}\right\}+\theta_1\left\{\rho_{xz}^2C_z^2-2\rho_{xz}C_yC_z\rho_{xy}\right\}\right]$

$V_{15}^{(2)} = MSE\left(T_{15}^{(2)}\right) = \bar{Y}^2\left[\theta_2C_y^2+\left(\theta_2-\theta_1\right)\left\{\left(C_z-C_y\rho_{yz}\right)^2-C_y^2\rho_{yz}^2\right\}+\theta_1\left\{\left(C_z\rho_{xz}-C_y\rho_{xy}\right)^2-C_y^2\rho_{xy}^2\right\}\right]$   (4.5.25)

4.5.5 Second Modification of Kiregyera's Estimator

$T_{16}^{(2)} = \bar{y}_2+b_{yz}\left(\dfrac{\bar{z}_1}{\bar{x}_1}\bar{X}-\bar{z}_2\right)$   (4.5.26)

Using (1.1.1) in (4.5.26) we get

$T_{16}^{(2)} = \bar{Y}+e_{y_2}+\left(\beta_{yz}+e_r\right)\left[\dfrac{\bar{Z}+e_{z_1}}{\bar{X}+e_{x_1}}\bar{X}-\bar{Z}-e_{z_2}\right]$

or

$T_{16}^{(2)}-\bar{Y} = e_{y_2}+\beta_{yz}\left[\left(\bar{Z}+e_{z_1}\right)\left(1-\dfrac{e_{x_1}}{\bar{X}}\right)-\bar{Z}-e_{z_2}\right] = e_{y_2}+\beta_{yz}\left[\bar{Z}+e_{z_1}-\dfrac{\bar{Z}}{\bar{X}}e_{x_1}-\bar{Z}-e_{z_2}\right]$

or

$T_{16}^{(2)}-\bar{Y} = e_{y_2}+\beta_{yz}\left(e_{z_1}-e_{z_2}\right)-\beta_{yz}\dfrac{\bar{Z}}{\bar{X}}e_{x_1}$   (4.5.27)

The mean square error of $T_{16}^{(2)}$ is

$MSE\left(T_{16}^{(2)}\right) = E\left(T_{16}^{(2)}-\bar{Y}\right)^2 = E\left[e_{y_2}+\beta_{yz}\left(e_{z_1}-e_{z_2}\right)-\beta_{yz}\dfrac{\bar{Z}}{\bar{X}}e_{x_1}\right]^2$   (4.5.28)

Taking the square of the R.H.S. of (4.5.28) we get

$MSE\left(T_{16}^{(2)}\right) = E\left[e_{y_2}^2+\beta_{yz}^2\left(e_{z_1}-e_{z_2}\right)^2+\beta_{yz}^2\dfrac{\bar{Z}^2}{\bar{X}^2}e_{x_1}^2+2\beta_{yz}e_{y_2}\left(e_{z_1}-e_{z_2}\right)-2\beta_{yz}\dfrac{\bar{Z}}{\bar{X}}e_{y_2}e_{x_1}-2\beta_{yz}^2\dfrac{\bar{Z}}{\bar{X}}e_{x_1}\left(e_{z_1}-e_{z_2}\right)\right]$   (4.5.29)

Using (1.1.1) in (4.5.29) we get

$MSE\left(T_{16}^{(2)}\right) = \theta_2\bar{Y}^2C_y^2+\left(\theta_2-\theta_1\right)\beta_{yz}^2\bar{Z}^2C_z^2+\theta_1\beta_{yz}^2\dfrac{\bar{Z}^2}{\bar{X}^2}\bar{X}^2C_x^2+2\left(\theta_1-\theta_2\right)\beta_{yz}\bar{Y}\bar{Z}C_yC_z\rho_{yz}-2\theta_1\beta_{yz}\dfrac{\bar{Z}}{\bar{X}}\bar{Y}\bar{X}C_yC_x\rho_{xy}+0$   (4.5.30)

Putting the value of $\beta_{yz} = \rho_{yz}\dfrac{\bar{Y}C_y}{\bar{Z}C_z}$ in (4.5.30) we get

$MSE\left(T_{16}^{(2)}\right) = \bar{Y}^2C_y^2\left[\theta_2+\left(\theta_2-\theta_1\right)\rho_{yz}^2+\theta_1\rho_{yz}^2\dfrac{C_x^2}{C_z^2}+2\left(\theta_1-\theta_2\right)\rho_{yz}^2-2\theta_1\rho_{yz}\rho_{xy}\dfrac{C_x}{C_z}\right]$

On simplification we get

$= \bar{Y}^2C_y^2\left[\theta_2-\left(\theta_2-\theta_1\right)\rho_{yz}^2+\theta_1\left\{\rho_{yz}^2\dfrac{C_x^2}{C_z^2}-2\rho_{yz}\rho_{xy}\dfrac{C_x}{C_z}\right\}\right]$

or

$V_{16}^{(2)} = MSE\left(T_{16}^{(2)}\right) = \bar{Y}^2C_y^2\left[\theta_2-\left(\theta_2-\theta_1\right)\rho_{yz}^2+\theta_1\left\{\left(\rho_{yz}\dfrac{C_x}{C_z}-\rho_{xy}\right)^2-\rho_{xy}^2\right\}\right]$   (4.5.31)

4.6 Sahoo et al. Estimators

The following estimators will be considered:

a) $T_{17}^{(2)} = \bar{y}_2+b_{yx}\left(\bar{x}_1-\bar{x}_2\right)+b_{yz}\left(\bar{Z}-\bar{z}_1\right)$

b) $T_{18}^{(2)} = \bar{y}_2+b_{yz}\left\{\left(\bar{z}_1-\bar{z}_2\right)-b_{zx}\left(\bar{x}_1-\bar{X}\right)\right\}$

4.6.1 Sahoo et al. (1993) Estimator

Sahoo et al. (1993) developed another type of regression estimator, i.e.

$T_{17}^{(2)} = \bar{y}_2+b_{yx}\left(\bar{x}_1-\bar{x}_2\right)+b_{yz}\left(\bar{Z}-\bar{z}_1\right)$   (4.6.1)

Using (1.1.1) in (4.6.1) we get

$T_{17}^{(2)} = \bar{Y}+e_{y_2}+\left(\beta_{yx}+e_r\right)\left(e_{x_1}-e_{x_2}\right)+\left(\beta_{yz}+e_r\right)\left(-e_{z_1}\right) = \bar{Y}+e_{y_2}+\beta_{yx}\left(e_{x_1}-e_{x_2}\right)-\beta_{yz}e_{z_1}$

or

$T_{17}^{(2)}-\bar{Y} = e_{y_2}+\beta_{yx}\left(e_{x_1}-e_{x_2}\right)-\beta_{yz}e_{z_1}$   (4.6.2)

The mean square error of $T_{17}^{(2)}$ is

$MSE\left(T_{17}^{(2)}\right) = E\left(T_{17}^{(2)}-\bar{Y}\right)^2 = E\left[e_{y_2}+\beta_{yx}\left(e_{x_1}-e_{x_2}\right)-\beta_{yz}e_{z_1}\right]^2$   (4.6.3)

Taking the square of the R.H.S. of (4.6.3) we get

$= E\left[e_{y_2}^2+\beta_{yx}^2\left(e_{x_1}-e_{x_2}\right)^2+\beta_{yz}^2e_{z_1}^2+2\beta_{yx}e_{y_2}\left(e_{x_1}-e_{x_2}\right)-2\beta_{yz}e_{y_2}e_{z_1}-2\beta_{yx}\beta_{yz}e_{z_1}\left(e_{x_1}-e_{x_2}\right)\right]$   (4.6.4)

Using (1.1.1) in (4.6.4) we get

$MSE\left(T_{17}^{(2)}\right) = \theta_2\bar{Y}^2C_y^2+\left(\theta_2-\theta_1\right)\beta_{yx}^2\bar{X}^2C_x^2+\theta_1\beta_{yz}^2\bar{Z}^2C_z^2+2\left(\theta_1-\theta_2\right)\beta_{yx}\bar{Y}C_y\bar{X}C_x\rho_{xy}-2\theta_1\beta_{yz}\bar{Y}C_y\bar{Z}C_z\rho_{yz}+0$   (4.6.5)

Putting the values of $\beta_{yx} = \rho_{xy}\dfrac{\bar{Y}C_y}{\bar{X}C_x}$ and $\beta_{yz} = \rho_{yz}\dfrac{\bar{Y}C_y}{\bar{Z}C_z}$ in (4.6.5) we get

$MSE\left(T_{17}^{(2)}\right) = \bar{Y}^2C_y^2\left[\theta_2+\left(\theta_2-\theta_1\right)\rho_{xy}^2+\theta_1\rho_{yz}^2+2\left(\theta_1-\theta_2\right)\rho_{xy}^2-2\theta_1\rho_{yz}^2\right]$

or

$V_{17}^{(2)} = MSE\left(T_{17}^{(2)}\right) = \bar{Y}^2C_y^2\left[\theta_2-\left(\theta_2-\theta_1\right)\rho_{xy}^2-\theta_1\rho_{yz}^2\right]$   (4.6.6)
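In (4.6.6) the term $-\theta_1\rho_{yz}^2$ is the reduction attributable to the second auxiliary variable z. The short sketch below (Python; all parameter values are assumed for illustration, not taken from the monograph) evaluates $V_{17}^{(2)}$ with and without that term to show the size of the gain.

```python
# Hypothetical evaluation of (4.6.6); every value below is assumed.
Ybar, Cy = 50.0, 0.40
r_xy, r_yz = 0.80, 0.55
th1, th2 = 1/400 - 1/2000, 1/100 - 1/2000

v17 = Ybar**2 * Cy**2 * (th2 - (th2 - th1) * r_xy**2 - th1 * r_yz**2)
v17_without_z = Ybar**2 * Cy**2 * (th2 - (th2 - th1) * r_xy**2)   # drop the z contribution
print(v17, v17_without_z, "gain from z:", v17_without_z - v17)
```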


4.6.2 Modification of Sahoo et al. Estimator

$T_{18}^{(2)} = \bar{y}_2+b_{yz}\left\{\left(\bar{z}_1-\bar{z}_2\right)-b_{zx}\left(\bar{x}_1-\bar{X}\right)\right\}$   (4.6.7)

$T_{18}^{(2)} = \bar{y}_2+b_{yz}\left(\bar{z}_1-\bar{z}_2\right)-b_{yz}b_{zx}\left(\bar{x}_1-\bar{X}\right),$   (4.6.8)

where $b_{yz}$ is the sample estimate from the second phase and $b_{zx}$ the sample estimate from the first phase. Using (1.1.1) in (4.6.8) we get

$T_{18}^{(2)} = \bar{Y}+e_{y_2}+\left(\beta_{yz}+e_r\right)\left(e_{z_1}-e_{z_2}\right)-\left(\beta_{yz}+e_r\right)\left(\beta_{zx}+e_r\right)e_{x_1}$

or, on simplification,

$T_{18}^{(2)}-\bar{Y} = e_{y_2}+\beta_{yz}\left(e_{z_1}-e_{z_2}\right)-\beta_{yz}\beta_{zx}e_{x_1}$   (4.6.9)

The mean square error of $T_{18}^{(2)}$ is

$MSE\left(T_{18}^{(2)}\right) = E\left(T_{18}^{(2)}-\bar{Y}\right)^2 = E\left[e_{y_2}+\beta_{yz}\left(e_{z_1}-e_{z_2}\right)-\beta_{yz}\beta_{zx}e_{x_1}\right]^2$   (4.6.10)

Squaring the R.H.S. of (4.6.10) we get

$MSE\left(T_{18}^{(2)}\right) = E\left[e_{y_2}^2+\beta_{yz}^2\left(e_{z_1}-e_{z_2}\right)^2+\beta_{yz}^2\beta_{zx}^2e_{x_1}^2+2\beta_{yz}e_{y_2}\left(e_{z_1}-e_{z_2}\right)-2\beta_{yz}\beta_{zx}e_{y_2}e_{x_1}-2\beta_{yz}^2\beta_{zx}e_{x_1}\left(e_{z_1}-e_{z_2}\right)\right]$   (4.6.11)

Using (1.1.1) in (4.6.11) we get

$MSE\left(T_{18}^{(2)}\right) = \theta_2\bar{Y}^2C_y^2+\left(\theta_2-\theta_1\right)\beta_{yz}^2\bar{Z}^2C_z^2+\theta_1\beta_{yz}^2\beta_{zx}^2\bar{X}^2C_x^2+2\left(\theta_1-\theta_2\right)\beta_{yz}\bar{Y}\bar{Z}C_yC_z\rho_{yz}-2\theta_1\beta_{yz}\beta_{zx}\bar{Y}\bar{X}C_yC_x\rho_{xy}+0$   (4.6.12)

Putting the values of $\beta_{yz} = \rho_{yz}\dfrac{\bar{Y}C_y}{\bar{Z}C_z}$ and $\beta_{zx} = \rho_{xz}\dfrac{\bar{Z}C_z}{\bar{X}C_x}$ in (4.6.12) we get

$MSE\left(T_{18}^{(2)}\right) = \bar{Y}^2C_y^2\left[\theta_2+\left(\theta_2-\theta_1\right)\rho_{yz}^2+\theta_1\rho_{yz}^2\rho_{xz}^2+2\left(\theta_1-\theta_2\right)\rho_{yz}^2-2\theta_1\rho_{yz}\rho_{xz}\rho_{xy}\right]$   (4.6.13)

On simplification we get

$MSE\left(T_{18}^{(2)}\right) = \bar{Y}^2C_y^2\left[\theta_2-\left(\theta_2-\theta_1\right)\rho_{yz}^2+\theta_1\left(\rho_{yz}^2\rho_{xz}^2-2\rho_{yz}\rho_{xz}\rho_{xy}\right)\right]$

$V_{18}^{(2)} = MSE\left(T_{18}^{(2)}\right) = \bar{Y}^2C_y^2\left[\theta_2-\left(\theta_2-\theta_1\right)\rho_{yz}^2+\theta_1\rho_{yz}\rho_{xz}\left(\rho_{yz}\rho_{xz}-2\rho_{xy}\right)\right]$   (4.6.14)
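The two Sahoo-type arrangements can be compared directly. The sketch below (Python; the correlations and design fractions are assumed, not from the text) evaluates $V_{17}^{(2)}$ from (4.6.6) and $V_{18}^{(2)}$ from (4.6.14) and reports which arrangement would be preferred for that configuration.

```python
# Hypothetical comparison of (4.6.6) and (4.6.14); all parameter values are assumed.
Ybar, Cy = 50.0, 0.40
r_xy, r_yz, r_xz = 0.80, 0.55, 0.70
th1, th2 = 1/400 - 1/2000, 1/100 - 1/2000

v17 = Ybar**2 * Cy**2 * (th2 - (th2 - th1)*r_xy**2 - th1*r_yz**2)
v18 = Ybar**2 * Cy**2 * (th2 - (th2 - th1)*r_yz**2
                         + th1*r_yz*r_xz*(r_yz*r_xz - 2*r_xy))
print("V17 =", v17, "V18 =", v18,
      "-> prefer", "T17" if v17 < v18 else "T18", "here")
```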

4.7 Roy's (2003) Unbiased Regression Estimator

Roy (2003) proposed an unbiased regression-type estimator that uses partial information on the auxiliary variables:

$T_{19}^{(2)} = \bar{y}_2+k_1\left[\left\{\bar{x}_1+k_2\left(\bar{Z}-\bar{z}_1\right)\right\}-\left\{\bar{x}_2+k_3\left(\bar{Z}-\bar{z}_2\right)\right\}\right]$   (4.7.1)

$= \bar{y}_2+k_1\bar{x}_1+k_1k_2\left(\bar{Z}-\bar{z}_1\right)-k_1\bar{x}_2-k_1k_3\left(\bar{Z}-\bar{z}_2\right) = \bar{y}_2+k_1\left(\bar{x}_1-\bar{x}_2\right)+k_1k_2\left(\bar{Z}-\bar{z}_1\right)-k_1k_3\left(\bar{Z}-\bar{z}_2\right)$

$= \bar{y}_2+\alpha\left(\bar{x}_1-\bar{x}_2\right)+\beta\left(\bar{Z}-\bar{z}_1\right)+\gamma\left(\bar{Z}-\bar{z}_2\right)$   (4.7.2)

where $\alpha = k_1$, $\beta = k_1k_2$ and $\gamma = -k_1k_3$.

Using (1.1.1),

$T_{19}^{(2)} = \bar{Y}+e_{y_2}+\alpha\left(e_{x_1}-e_{x_2}\right)+\beta\left(\bar{Z}-\bar{Z}-e_{z_1}\right)+\gamma\left(\bar{Z}-\bar{Z}-e_{z_2}\right) = \bar{Y}+e_{y_2}+\alpha\left(e_{x_1}-e_{x_2}\right)-\beta e_{z_1}-\gamma e_{z_2}$

$T_{19}^{(2)}-\bar{Y} = e_{y_2}+\alpha\left(e_{x_1}-e_{x_2}\right)-\beta e_{z_1}-\gamma e_{z_2}$

The MSE of $T_{19}^{(2)}$ is

$E\left(T_{19}^{(2)}-\bar{Y}\right)^2 = E\left[e_{y_2}+\alpha\left(e_{x_1}-e_{x_2}\right)-\beta e_{z_1}-\gamma e_{z_2}\right]^2$

To obtain the optimum values of $\alpha$, $\beta$ and $\gamma$ we differentiate the MSE with respect to each constant in turn and equate the derivatives to zero. Differentiating with respect to $\alpha$,

$E\left[\left\{e_{y_2}+\alpha\left(e_{x_1}-e_{x_2}\right)-\beta e_{z_1}-\gamma e_{z_2}\right\}\left(e_{x_1}-e_{x_2}\right)\right] = 0$

$\left(\theta_1-\theta_2\right)\bar{Y}\bar{X}C_yC_x\rho_{xy}+\alpha\left(\theta_2-\theta_1\right)\bar{X}^2C_x^2-\beta\left(0\right)-\gamma\left(\theta_1-\theta_2\right)\bar{Z}\bar{X}C_zC_x\rho_{xz} = 0$

which, on dividing by $\left(\theta_2-\theta_1\right)\bar{X}C_x$, gives

$\alpha\bar{X}C_x+\gamma\bar{Z}C_z\rho_{xz}-\bar{Y}C_y\rho_{xy} = 0$   (4.7.3)

Now differentiating with respect to $\beta$,

$E\left[\left\{e_{y_2}+\alpha\left(e_{x_1}-e_{x_2}\right)-\beta e_{z_1}-\gamma e_{z_2}\right\}e_{z_1}\right] = 0$

$\theta_1\bar{Y}\bar{Z}C_yC_z\rho_{yz}+\alpha\left(0\right)-\beta\theta_1\bar{Z}^2C_z^2-\gamma\theta_1\bar{Z}^2C_z^2 = 0$

$\beta\bar{Z}C_z+\gamma\bar{Z}C_z-\bar{Y}C_y\rho_{yz} = 0$   (4.7.4)

Now differentiating with respect to $\gamma$,

$E\left[\left\{e_{y_2}+\alpha\left(e_{x_1}-e_{x_2}\right)-\beta e_{z_1}-\gamma e_{z_2}\right\}e_{z_2}\right] = 0$

$\theta_2\bar{Y}\bar{Z}C_yC_z\rho_{yz}+\alpha\left(\theta_1-\theta_2\right)\bar{Z}\bar{X}C_zC_x\rho_{xz}-\beta\theta_1\bar{Z}^2C_z^2-\gamma\theta_2\bar{Z}^2C_z^2 = 0$

or, dividing by $\bar{Z}C_z$,

$\alpha\left(\theta_1-\theta_2\right)\bar{X}C_x\rho_{xz}-\beta\theta_1\bar{Z}C_z-\gamma\theta_2\bar{Z}C_z+\theta_2\bar{Y}C_y\rho_{yz} = 0$   (4.7.5)
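Equations (4.7.3)-(4.7.5) form a linear system in $\alpha$, $\beta$ and $\gamma$. The sketch below (Python with NumPy; the population means, coefficients of variation, correlations and design fractions are assumed purely for illustration) sets the system up in matrix form and solves it for the optimum constants.

```python
import numpy as np

# Hypothetical inputs; every value below is assumed, not taken from the monograph.
Ybar, Xbar, Zbar = 50.0, 42.0, 18.0
Cy, Cx, Cz = 0.40, 0.50, 0.45
r_xy, r_yz, r_xz = 0.80, 0.55, 0.70
th1, th2 = 1/400 - 1/2000, 1/100 - 1/2000

# Coefficient matrix and right-hand side of (4.7.3)-(4.7.5) in the unknowns (alpha, beta, gamma).
A = np.array([
    [Xbar*Cx,                  0.0,          Zbar*Cz*r_xz],   # (4.7.3)
    [0.0,                      Zbar*Cz,      Zbar*Cz],        # (4.7.4)
    [(th1-th2)*Xbar*Cx*r_xz,  -th1*Zbar*Cz, -th2*Zbar*Cz],    # (4.7.5)
])
b = np.array([Ybar*Cy*r_xy, Ybar*Cy*r_yz, -th2*Ybar*Cy*r_yz])

alpha, beta, gamma = np.linalg.solve(A, b)
print(alpha, beta, gamma)
# The corresponding constants of (4.7.1) follow as k1 = alpha, k2 = beta/alpha, k3 = -gamma/alpha.
```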
