You are on page 1of 65

Lecture Slides by Dr.

Muhammad Hanif Mian for


Workshop on Recent Developments in Survey Sampling
(August 26-27, 06)

AN ALTERNATIVE ESTIMATOR FOR Y


Since repetition of the observation of a repeated unit in a
sample selected with srswr does not provide additional
information for estimating Y , the mean of the values of
the distinct units in a sample of n units may be
considered as an alternative estimator. That is, if
y1/ , y2/ ,..., yd/ denote the values of the distinct units in a
simple random sample of n units selected with
replacement ( d n ) , then the suggested alternative
estimator is
y/ =

1 d /
S yi
d i =1

(3.21)

This estimator is unbiased for Y and is more efficient than


the sample mean y , namely,
y=

1 n
1 n
y
=
ri yi ,
i n
n i =1
i =1

where ri of is the number of repetitions of the i-th distinct


d

unit and S ri = n.
i =1

The variance of y / can be obtained by nothing that in this


case two stages of randomization are involved: (i) d is a
random variable taking values 1 to n with certain
probabilities, and (ii) selection of the d distinct units from
1

N units with equal probability without replacement and


applying the formula of simple random sampling, we get
1 1 N
2,
(3.22)
Var ( y / ) = E

d
N
N
1

1
1n 1 + 2n 1 + ....... + N n 1
Where E =
. Neglecting terms of
Nn
d
2

1
degree greater than in (3.22), we get
N
n 1 N
1 1
Var y / =
2
(3.23)
+
2
n 2 N 12 N N 1
An unbiased estimator of Var y / is given by

( )

( )

1 1
N 1 2
var y / = + n
sd , (3.24)
d N N N
2
1 d /
S yi y / for d 2.
Where sd2 = 0 for d = 1 and sd2 =
d 1 i =1

( )

The second term in the curly brackets in (3.24), namely


( N 1) ( N n N ) , is likely to be negligibly small compared to
the first term and hence the variance estimator may be
taken as
1 1
var ( y / ) = sd2 .
d N

(3.25)

It may be noted that if N is considerably larger than n,


then the chance of repetition of a unit in the sample will
be small and hence the gain in using y / , instead of y will
be only marginal. The results mentioned in this section
have been discussed in detail by Basu (1958), Raj and
Khamis (1958) and Pathak (1962.)
2

UNBIASED RATIO ESTIMATOR


We have seen that under simple random sampling, classical (conventional) ratio
estimator is biased. Lahiri (1951) suggested that classical ratio estimator can be
made unbiased if the selection procedure is changed. Midzuno (1950) and Sen
(1951) proved the same result. Lahiri suggested that the first unit was

selected with probability proportional to the aggregate of the size (PPAS)


or with probability proportional to

X
i =1

, and the remaining n 1 units

with equal probability and without replacement. Midzuno (1951)


simplified this procedure as the first unit is selected with probability
proportional to Xi (measure of size), and the remaining (n 1) units like
Lahiri (1951). This idea was introduced by Ikeda (1950) reported by
Midzuno (1951). This sampling scheme has striking resemblance to the simple
random sampling without replacement. In fact, it may be viewed as a
generalization of the simple random sampling when extra information on the
population is available.
Let we have a population of N units. The probability that ith unit is first one to be
selected and subsequent (n 1) units with equal probability and without
replacement is

xi
N

X
i =1

1
.
N 1

n 1

The probability that jth unit is first one to be selected and subsequent (n 1)
draws with equal probability and without replacement

xj
N

X
i =1

1
,
N 1

n 1

and so on the probability P(s) for the two selections are therefore

P( s) =

xi + x j
N

X
i =1

1
.
N 1

n 1

Since there are n such selection therefore the probability of the selection of the
sample will be

P( s) =

1
.
X N 1
n 1

x
1
=
.
X N

n
i =1

(6.6.1)

(6.6.2)

The classical ratio estimator is


n

y =

y
i =1
n

x
i =1

(6.1.3)

THEOREM (6.2):
Classical ratio estimator is unbiased under Ikeda-Midzuno- Sen Lahiri selection
procedure with variance

N 1
Var ( y) = X

n 1

n 2
yi
i =n1 Y 2

xi
i =1

(6.6.3)

/ is the sum over all possible samples.


PROOF
Taking the expectation of (6.1.3) we have

yi

E ( y) = i =n1 P( s) X ,
x

i =1
where P(s) is the probability of the sample..Putting the value of P(s) from (6.6.1)
we will have

n
yi
E ( y) = i =n1

xi
i =1

1
X

N 1

n 1

x
i =1

On simplification we get

yi
N yi
= i =1
= y = Y
E ( y) = i =1
N 1
N
N

n


n 1
n
n
The variance expression of y may be derived as;

(6.6.4)

Var ( y) = E ( y) 2 [ Ey]

n 2
yi
i =1
P( s) X Y 2
Var( y ) =
2
n


xi
i =1
Substituting the value of P(s) from (6.6.1)

N 1
Var ( y) = X

n 1

n 2
yi
i =n1 Y 2 .

xi
i =1

Note that the Var ( y) = 0 if yi xi Yi =

Xi
Xi

(6.6.3)

. This is very strong

property and will be referred to as Ratio Estimator Property.


THEOREM ( 6.3)
The mean of ratio estimator is an unbiased with variance

N
Var ( y) = X
n

y2
Y 2
x

(6.6.5)

PROOF
Taking the expectation of (6.1.4) we get

y
E ( y) = XE = P( s ) X
x
x

Using (6.6.2) we have

y
y x 1

E ( y) =
X = N = Y
x X N
Cn

(6.6.5)

Proceeding by the same way as before we can derive the variance expression of
y , i.e.

N
Var ( y) = X
n

y2
Y 2
x

(6.6.6)

THEOREM (6.4)
An unbiased estimator of Var ( y) is

v ar( y) = y2

i =1

= y2

yi2 X n

x i =1

Nn
X
x

y y
j =1

N 1 X
Nn(n 1) x

2 N n 2
y Nn s y

(6.6.7)

(6.6.8)

PROOF
It may be proved that E[v ar( y)] = Var ( y) . For this

n y2 X
n yi2 X

= i
E
P( s)

i =1 Nn x
i =1 Nn x

1 N
x
1
/ yi X
= Yi 2
=
N x X N N i =1

(6.6.9)

and

n n

N 1
x
N 1

= n(n 1)
E yi y j
E ( yi y j )
i =1 j =1

nN (n 1) X
n(n 1) N

j i

YY
i

N 1
1
=
= 2
N
N ( N 1) N
i =1 j =1
j i

YY

i =1 j =1
j i

(6.6.10)

Hence

= E ( y2 ) Y 2 = Var ( y)
Similarly we can show that an unbiased estimator of population total will be

v ar( y) = y2

N X
x

2 N n 2
y Nn s y

RATIO ESTIMATOR AS MODEL-UNBIASED

Consider all estimators y of Y that are linear functions of sample values yi,
that are of the form
n

y = ci yi ,

(6.8.1)

i =1

where the ci does not depend on y i s though they may a function xi. The choice

of the cis restricted to those that give unbiased estimation of Y. The estimator
with the smallest variance is called best linear unbiased estimator. The model
is:

yi = xi + i ,
where E ( i ) = 0, Cov( i , j ) = 0,
and Var ( i ) = i2 = 2 X i2

1 1
2

(6.8.2)

where i are independent of the xi and xi are > 0. The xi (i = 1, 2, N) are


known. The model is the same that was employed by Cochran (1953), which
appears to have been originated by H.F. Smith (1938). Useful references to this
model are Cochran (1953, 63, 77), Brewer (1963b), Godambe and Joshi (1965),
Hanif (1969) Foreman and Brewer (1971), Royall (1970). (1975),Brewer and
Hanif(1983) Cassel, et al (1976), Isaki and Fuller (1982), Hansen, Madow and
Tepping (1983), Samiuddin et al (1992)and many others.

Brewer (1963b) defined an unbiased ratio estimator under model (6.8.2). He


used the concept of unbiased ness which was different from that given in
randomization (design - based) theory. Royall (1970) also used this model.
Brewer and Royall regarded an estimator y (estimated population total) is

unbiased if E ( y ) = E (Y ) in repeated selections of the finite population and


sampled under the model. Under model (6.8.2) Brewer (1963b) proved that the
classical ratio estimate was model unbiased and is best linear unbiased
estimator for any sample [random or not] selected solely according to the values
of the Xi. This result hold goods if the following line conditions are satisfied;
(i)
(ii)

The relation between estimated (yi) and benchmark (xi) is linear and
passes though the origin.
The Var(yi) about this line is proportional to xi.

THEOREM (6.6):
Under the model (6.8.2) classical ratio estimator is unbiased with variance

N
Var ( y) = X
n

y2
( X nx )
X
Y 2 = =
nx
x

(6.8.3)

PROOF:

We know that
n

y = ci yi

(8.6.4)

i =1

Using model (6.8.2) we have


n

y = ci [ xi + i ] =
i =1

i =1

i =1

ci xi + ci i

Since E(i) = 0 we then have

E ( y) =

ci xi
i =1

ci E ( i ) =
i =1

c x
i =1

(6.8.5)

We also know that

Yi = X i + i or E( Y ) = X

(6.8.6)

Now

E[ y Y ] =

i =1

i =1

i =1

ci xi + ci E ( i ) X E ( i )

= ci xi X = 0 If
i =1

c x
i =1

i i

=X

(6.8.7)

Therefore we say that y / is model unbiased if


n

i =1

xi = X

(6.8.8)

The variance expression of y / , i.e.

Var ( y) = E ( y2 ) [ E ( y) ]

E ( y 2 ) = 2

xi2 + ci2 E ( i2 ) + 2

2
i

i =1

(6.8.9)

i =1

2
i

xi E ( i )

Using the condition of model we will have:


n

i =1

i =1

ci2 xi2 + ci2Var ( i )

E ( y 2 ) = 2

(6.8.10)

Using (6.8.2), (6.8.5) and (6.8.9) in (6.8.9), we will have

Var ( y) =

c
i =1

2
i

Var ( i )

(6.8.11)

Let us for simplicity we assume Var ( i ) = xi then (6.8.11) will be:

Var ( y) =

c x
i =1

2
i i

(6.8.12)

We can minimize Var ( y / ) w.r.t. ci. For this the Lagranges multiplier will be

c x c x X
i =1

2
i i

i =1

Differentiating unconditionally with respect to ci, we get.

= 2 ci xi xi = 0
ai

or

ci =

= C (constant)
2
n

We know from (6.8.7) that

c x
i =1

=X

or
n

i =1

xi = X , or

c=

X
= ci
n x

Hence y =
/

i =1

ci

X
yi =
n x

yi =

i =1

i =1
n

x
i =1

X = y

The best linear unbiased estimator y / = y , which is a classical (conventional)


ratio estimator. For the derivation of Var ( y) we proceed as follows:

y Y =

i =1

Since

i =1

i =1

i =1

ci xi + ci i X i

X
X
ci xi = X and ci =
then y Y =
n x
n x

i
i =1

i =1

Divide i into sample and non-sample values we have


i =1

ai =

X
X
then y Y =
n x
n x
X

1
n x

or ( y Y ) =

i
i =1

i =1

N n


i =1

i =1

Squaring and taking the expectation


2

N n
2
X
n
1 E ( i2 ) + E ( i2 )
E ( y Y ) = Var ( y ) =
nx i =1
i =1
2 n
N n
X

1 var ( i ) + var ( i )
Var ( y ) =
nx i =1
i =1

Substituting the value of Var(xi), we have:


2

N n
X nx
Var ( y ) =
1 + xi
nx

i =1

n
N

X nx
x
X
=
+

i xi
i

nx
i =1
i =1

X nx
=
nx + ( X nx )
nx
Var ( y) =

( X nx )
( X nx ) 2
X
nx + ( X nx ) =
2
(n x )
n x

(6.8.3)

10

Using all these assumptions a model-unbiased estimator of from the


sample may be easily proved as

1 n 1
( yi r xi )2 .
n 1 i =1 xi

(6.8.13)

Putting this value of in (6.8.4) a model-unbiased variance estimator is

Var ( y) =

( X nx ) X 1 n 1
( yi rxi ) 2

n x
n 1 i =1 xi

(6.8.14)

This model based unbiased estimator is not only superior to y / but is the best of
a whole class of estimators. For details see Brewer (1963b, 1979), Royall (1970),
Royall and Herson (1973) and Samiuddin et.al. (1978).
6.9
COMPARISON y AND y / UNDER STOCHASTIC MODEL
It is an established fact that the choice of a suitable sample plan is central to the
design of a sample survey. Sample design can be regarded as comprising separate
selection and estimation procedures, but the choices of these are so
interdependent that they must be considered together for virtually all purposes.
Some times the nature of the sample plan is determined by circumstances, but
usually the designer is faced with a choice, and frequently it is obvious which of
a number of possible plan will be most efficient in terms of minimum sample
error for given cost( or vice versa). Standard sampling theory using imputed
values for such quantities as the means, variances, and correlation coefficient of
the (finite) population, or strata or clusters within it, can often indicate which
design is most efficient. Sometimes, however, this is not so. A well-known
example is the comparison between classical ratio estimation using unequal
probabilities. To obtain a straight forwarded answer in this case, Cochran (1953)
made use of a certain super population (6.8.2) which is intuitively attractive and
appears to have some empirical basis. The purpose here is to compare classical
ratio estimator and unbiased estimation method of estimation using equal
probabilities and using large scale sample result which can be obtained using
generalization of model. Comparison for probability proportional to size will be
discussed in Chapter 7, 8 and 9. The stochastic model used here for the purpose
of comparing efficiencies.
6.9.1. Unbiased Estimate for Population Total Based on Simple Random
Sampling
THEOREM (6.7).
Under linear stochastic model (6.8.2) ratio estimator will be more efficient than
N

unbiased estimator if

2 x2 > i2
i =1

PROOF

11

We know that:

N
n

y =

y
i =1

Putting the value of (6.8.2) we get

N
n

y =

n
n

i
x

i =1
i =1

n
N
= x + i
n i =1

( x + ) =
i

i =1

N
n

i =1

i =1

i =1

Y = Yi = [ X i + i ] = X + i

Also

or

i =1

(6.9.2)

=Y X

Var ( y ) = E ( y Y )

(6.9.1)

(6.9.3)
2

= E [ y X + X Y ]

= E ( y X ) (Y X )

= E ( y X ) E (Y X )
2

as cross product term is equal to 2E (Y X )

(6.9.4)

Now on first term of (6.9.4) using (6.8.2) will be

EM E D ( y X )

= EM E D x +
n

i X

i =1

= EM E D ( x X ) +
n

N
N
= 2 x2 + i2
n i =1

i =1

(6.9.5)

Similarly
2

2
N
EM i = EM ( Y X )
i =1

12

or

i =1

2
i

= E (Y X )

(6.9.6)

Using (6.9.5) and (6.9.6) in (6.9.4) we get:

Var ( y ) = y2 = 2 x2 +

N n N 2
i
n i =1

(6.9.7)

Ratio Estimator

y =
N
n

y
X
x

(6.1.3)

( X
i =1

+ i )

X
x
N n

+
x
i

n i =1

X
=
x
Now

(6.9.8)

Var ( y ) = E [ y Y ]

= E [ y X + X Y ]

2
2
= E ( y X ) E (Y X )

(6.9.9)

Now

EM ED [ y X ]

x + n
= EM E D
x

i =1
X X

N X n

i X
= EM E D X +

n x i =1

N X2
N X n
= EM E D
i = n x2
n x i =1

i =1

2
i

N
n

i =1

2
i

(6.9.10)
N

E (Y X ) 2 = i
i =1

Therefore

13

N
Var ( y ) = =
n

-
i =1

2
i

i =1

2
i

Comparing (6.9.7) and (6.9.11) we have:

Var ( y ) Var ( y ) = 2 x2 +
N

i =1

N
n

i =1

i =1

i2 i2

N
n

i =1

2
i

2
i
N

= 2 x2 So Ratio Estimator will always be more efficient if = 2 x2 i2 is


i =1

positive or
Foreman and Brewer (1971) used the following model

Yi = + X i + i

With the same assumption given in (6.8.2) they compared various method of
estimation and proved that ratio method of estimation is more efficient than
unbiased estimation method provided | | < | X | ..

SOME RECENT DEVELOPMENTS ON RATIO ESTIMATORS


Recently two benchmark variables have been used to increase the efficiency.
Some of them are given here
6.10 1 Modification of Classical Ratio Estimator I

Chands (1975) developed a chain ratio type estimator in the


context of two phase sampling. It seems sensible to study the possibility of
adapting it to the new situation although the force of its argument is
somewhat lost in the single phase case.
THREROM (6.8).
An estimator suggested by Samiuddin and Hanif (2006) by using two auxiliary
variables i.e, ratio cum ratio is

T2 = y

X Z
x z

(6.10.1)

With Mean square error is

14

MSE (T2 ) = 1Y 2 C y2 + C x2 + C z2 2C x C y yx 2C y C z yz + 2C x C z xz (6.10.2)


The construction of this estimator is made multiplying Classical Ratio estimator
by

Z
.
z

PROOF
Using the concept given in (6.2.23) we get

e e
T2 Y = Y + e y 1 x 1 z Y
X
Z

Ignoring send and higher order terms we get

Y
Y
ex ez
X
Z
The mean square error T3 will be
T2 Y ; ey

E T2 Y = = y X Z Y
x z

Using (6.10.4) in (6.10.5) we get


2

(6.10.3)

(6.10.4)
2

(6.10.5)
2

Y
Y
E (T2 Y ) ; E e y ex ez
X
Z

Y
Y2
Y
Y
Y2
; E ey2 + 2 ex2 + 2 ez2 2 ey ex 2 ey ez + 2
ex ez
X
Z
XZ
X
Z

Applying expectation we get

Y2
Y2
Y
MSE (T2 ) ; 1 Y 2 C y2 + 2 C x2 X 2 + 2 C z2 Z 2 2 Y XC x C y xy
X
X
Z

Y
Y2
Y ZC y C z yz + 2
XZC x C z xz
Z
XZ

On simplification we get
MSE (T2 ) = 1Y 2 C y2 + C x2 + C z2 2C x C y yx 2C y C z yz + 2C x C z xz (6.10.2)

MSE (T2 ) = 1Y 2 [C y 2 + C x 2 2 xy C x C y ] + 1Y 2 [C 2 z 2C y C z yz + 2C x C z xz ]

MSE (T2 ) = MSE (T1 ) + 1Y 2 [C 2 z 2C y Cz yz + 2C xCz xz ]

(6.10.2)

6.10.2 Revised Ratio Estimator( An Estimator with suitable a


involving two auxiliary variables)

15

THEOREM (6.9). A possible estimator with the involvement of suitable a


and with two auxiliary variable suggested by Samiuddin and Hanif (2006) is

X
Z
T3 = a y + (1 a ) y ,
x
z

(6.10.6)

with mean square error

Cx C y xy C y Cz yz Cx Cz xz + Cz2

2
2
2
MSE T3 = 1Y C y + Cz 2 yz C y Cz
Cx2 + Cz2 2Cx Cz xz

(6.10.7)
PROOF
Using the concept given in (6.2.23)

y
X;
x

(Y + ey ) X = Y 1 + ey 1 ex

Y
X

Expanding and ignoring second and higher order terms we get


X + ex

ey e

Y
= Y 1 + x + .... = Y + e y ex + ....
X
Y X

y
X
x

Similarly

y
Z;
z

(6.10.8).

(Y + ey ) Z = Y 1 + ey 1 ez

Z + ez

ey e

Y
= Y 1 + z + .... = Y + e y ez + .... (6.10.9)
Z
Y Z

Using (6.10.8) and (6.10.9) T3 will be

X
Z

= Y + ey
+ (1 ) Y + e y
X + ex
Z + ez

Y + e y 1 x + (1 ) Y + e y

Y
Y
Y + e y ex (1 ) ez
X
Z
The mean square error will be
T3

) 1 eZ

16

Y
Y
MSE (T3 ) = e y ez ex ez
Z
Z
X

This may be written as


X

MSE (T3 ) ; E a y Y + (1 a ) y Y

Y
Y
Y
= E ae y a ex + e y ez ae y + a ez
X
Z
Z

Y
Y
Y
= E e y a e x ez + a ez
X
Z
Z

(6.10.10)

Y
Y
Y
= E e y ez a ex ez
Z
Z
X

(6.10.11)

In order to get the optimum value of a we first find partial differentiating of


(6.10.11) w.r.t a and then equating to zero.
2
Y

Y Y
Y
Y
E ey ez ex ez Ea ex ez = 0
Z X
Z
Z
X

Therefore optimum value of a is

Y Y
Y
E e y ez ex ez
Z X
Z

a=
2
Y

Y
E ex ez
X
X
Y

Y
Y2
Y2
E e y ex e y ez
ez ex + 2 ez2
Z
ZX
Z
X

=
2
2
2
Y

Y
Y
E 2 ex2 + 2 ez2 2
ex ez
XZ
Z
X

Y 2 C y Cx xy Y 2 C y Cz yz Y 2 Cx Cz xz + Y 2 Cz2
Y 2 Cx2 + Y 2 Cz2 2Y 2 Cx Cz xz

17

Y 2 C y Cx xy C y Cz yz C x C z xz + Cz2
=
Y 2 C x2 + Cz2 2Cx Cz xz
a=

C y Cx xy C y C z yz C x C z xz + Cz2
Cx2 + Cz2 2Cx Cz xz

(6.10.12)

Taking the square of (6.10.11)


2

Y
Y
Y Y
Y
= E e y ez + a 2 E ex ez 2aE e y ez ex ez
Z
Z
Z X
Z

Y 2
Y2
Y
Y2
Y
Y
= E e y2 + 2 ez2 2 e y ez + a 2 E 2 ex2 + 2 ez2 2 ex ez
X
X
Z
X
Z

X
Y

Y
Y2
Y2
2aE e y ex e y ez
ez ex + 2 ez2
Z
ZX
Z
X

Applying expectation the mean square error will be


MST (T3 ) = Y 2 C y2 + C z2 2 yz C y C z + a 2 C x2 + C z2 2C x C z xz
2a C y C x yx C y C z yz C x C z xz + C z2
(6.10.13)
Putting the value of a from (6.10.12) in (6.10.13) and on simplification we

Cx C y xy C y Cz yz Cx Cz xz + Cz2
2
2
2
MSE (T3 ) = Y C y + Cz 2 yz C y Cz
Cx2 + Cz2 2Cx Cz xz

(6.10.7

Since = 0 and = 1 are special cases of T5 therefore we conclude that

Z
X
MSE (T5 ) MSE y and MSE y . In T5 , will have to be replaced
z
x
by its sample estimate .

SAMPLING WITH PROBABILITIES PROPORTIONAL


TO SIZE (WITH REPLACEMENT)
18

7.1.

INTRODUCTION.

In previous chapters equal probability sampling selection procedure and


estimation methods have been discussed. In this and subsequent chapters those
selection procedures will be considered in which probability of selection varies
from unit to unit (unequal probability) in the population. In equal probability
sampling, selection does not depend how large or small that unit is but in
probability proportionate (proportional) to size sampling these considerations are
made. The probabilities must be known for all units of the population.
The general theory of unequal probabilities in sampling was perhaps first
presented by Hansen and Hurwitz (1943). They demonstrated, however, that use
of unequal selection probabilities within a stratum frequently made far more
efficient estimator of total than did equal probability selection provided measure
N

of size ( Z i i.e.

Z
i =1

= Z ) is sufficiently correlated with estimand,( variable

under study) Yi. A method of selection in which the units are selected with
probability proportionate (proportional) to given measure of size, related to the
characteristic under study is called unequal probability sampling or the
probability proportional to size sampling, commonly known as PPS or PS
sampling.
7.2.

SAMPLING WITH UNEQUAL PROBABILITIES WITH


REPLACEMENT [PPS SAMPLING].

The use of unequal probabilities in sampling was first suggested by Hansen and
Hurwitz (1943). Prior to that date there had been substantial developments in
sampling theory and practice, but all these had been based on the assumption that
probabilities of selection within each stratum would be equal. They proposed a
two stage sampling scheme (will be discussed in Chapter 11). The first stage
selection took place in independent draws. At each draw, a single first-stage unit
is selected with probabilities proportional to a measure of size, the number of
second-stage sampling units within each first-stage units. At the second-stage, the
same number of second stage-units is selected from each sampled first-stage unit.
Because it is possible for the same first-stage unit to be selected more than once
therefore, this type of unequal probability sampling is generally known as
sampling with replacement. Since, however, the independence of the draws is not
necessary condition for the units to have a non-zero probability of being selected
more than once, another name first suggested by Hartley and Rao (1962) is

19

multinomial sampling, a term justified by the multinomial distribution of the


number of units in the sample.
Unequal probability can however be used in single stage design.
This scheme compared favorably with other two stage sampling schemes; these
used equal probabilities of selection at the first stage, and then took either a fixed
number or a constant proportion of sub-sampling units from each selected first
stage unit. This selection procedure is explained as:

A list of 523 villages of Multan district along with population of males and
females is given in Appendix-I. In order to understand the selection procedure of
probability proportional to size sampling, 5% sample has been selected from this
population. In order to select a sample we cumulate the measure of sizes (area)
under this selection procedure, 26(5% of total villages) random numbers are
selected from 001 to 956204. These random numbers along with the serial
number of villages, total population and initial probabilities of selection are
given(data is given on next page). If any unit is selected more than once it should
be included in the sample

7.3 EXPECTATION.
If the ith unit is selected from a population of N units with probability
N

or ypps of population total Y


Pi = Z i / Z i , than an unbiased estimator, yHH
i =1

as suggested by Hansen and Hurwitz (1943) is:

= yPPS
=
yHH

1 n yi
,
n i =1 pi

(7.3.1)

where HH denotes the Hansen and Hurwitz, and pps denotes probability
proportional to size.

THEOREM (7.1)
A sample of size n is drawn from a population of N units with probability
proportional to size and with replacement y HH is an unbiased estimator of
population total, Y.

PROOF
We know that

20

=
yHH

1 n yi
,
n i =1 pi

(7.3.1)

Taking the expectation

E ( yHH ) =

N
yi
yi
Yi
1 n
=
=
E
E
Pi = Y
(
)
(
)

n i =1
pi
pi
i =1 Pi

is an unbiased estimator of population total Y.


Therefore yHH
Random number
859677
74835
491741
285996
252541
287850
847258
410596
674344
727666
920794
291874
742201
37860
750855
91613
757074
213334
656265
843800
464793
598479
314161
820668
18504
32315

7.4.

Sr. number of Total population


villages
483
7346
50
9231
275
3713
131
2310
108
7261
133
10425
478
6978
221
399
397
737
423
3203
508
4039
135
5439
434
1373
33
8074
437
3416
54
5841
441
1316
92
6475
385
1261
478
6975
258
2513
360
3039
153
322
472
13056
19
593
28
2515

Probability of
Selection
.005946
.006511
.001335
.001337
.006127
.00353
.006409
.000316
.002414
.001396
.002813
.000906
.000885
.006968
.00166
.003874
.002297
.004451
.002064
.006409
.002781
.001128
.000697
.00613
.000998
.001936

VARIANCE AND UNBIASED VARIANCE ESTIMATOR

21

THEOREM (7.2)
A sample of size n is drawn from a population of N units with probability
proportional to size and with replacement, the variance of y HH is

Var ( yHH ) =

1 N Yi 2
Y 2

n i =1 Pi

(7.4.1)

PROOF.

We know that

)2 Y 2
Var ( yHH ) = E ( yHH
from (7.3.1), we have
Substituting the value, yHH
2

1 n y
) = E i Y 2
Var ( yHH
n i =1 pi

n n

2
yi y j
1 n yi

= 2 E 2 + E
Y 2

n
i =1 p i
i =1 jj =1i pi p j

2
N N
YY
1 N Yi
i j
= 2 n 2 Pi + n(n 1) Pij
Y 2 .

n
PP
i =1 Pi
i =1 j =1
i j
j i

Since the selection of population units are independent; therefore Pij =


PiPj, substituting the value of pij:
N
1 N Yi
N
2
2
+ (n 1) ( Yi ) Yi
n i =1 Pi
i =1
i =1
2

Var ( yHH ) =

2
Y .

On simplification we get:

Var ( yHH ) =

1 N Yi
2
Y
n i =1 Pi

This expression may alternatively be written as

22

1 N Y
) = Pi i Y .
Var ( yHH
n i =1 Pi

Yi Y j
1 N N
PP
=

i j
2n i =1 j =1
Pi Pj
=

(7.4.2)

1 N 1
(Yi Pi Y ) 2 .

n i =1 Pi

(7.4.3)

(7.4.4)

7.4.1 An Alternative proof( using Indicator Variable)

Let ai is defined as the number of times that the ith unit of the population
to be in the sample (Chapter 2), then the joint distribution of ai is

n!
P1a1 P2a2 K PNaN
a1 ! a2 !K aN !

(7.4.5)

Then

E (ai ) = nPi ; Var (ai ) = nPi (1 Pi ); Cov(ai , a j ) = nPP


i j (7.4.6)
An unbiased estimator of population total will be

=
yHH

1 N Yi
ai
n i =1 Pi

(7.4.7)

The unbiased ness can be proved easily as:Taking the expectation of (7.4.7) and putting E(ai) = nPi from (7.4.6) we
get

E ( yHH ) =

Y 1 N
Y
1 N
E (ai ) i = nPi i = Y

n i =1
Pi n i =1
Pi

23

The variance of y HH may be written (see chapter 2) as:

Y2
1 N
) = 2 Var (ai ) i 2 +
Var ( yHH
n i =1
Pi

Yi Y j
Cov(ai , a j )
(7.4.8)
Pi Pj

i =1

j i

j =1

Putting the values of Var (ai ) and Cov(ai , a j ) from (7.4.6) in (7.4.8) and on
simplification we get (7.4.1).
It follows that, if Pi = Yi /

Y
i =1

the variance is zero. In practice, this ideal

situation can of course not be realized as the probabilities cannot be chosen


proportional to Yi, which still has to be observed. But this situation can be
approximated if it is possible to choose Pi proportional to some measures of size
Zi, which is known for all units in the population and which may be assumed
approximately proportional to Yi . The Z i will then be called the size of the ith
unit and least possible variance may be obtained by choosing the probabilities
proportional to the sizes.

An analogous expression for the covariance of yHH and xHH in the case of
sampling with replacement and with probabilities proportional to size may
be written in a straight far warded manner, i.e.
Cov( y, x) =
7.4.1.

1 N Yi
Pi Y i X .

n i =1 Pi
Pi

(7.4.9)

Unbiased Variance Estimator

THEOREM (7.3)

A sample of size n is drawn from a population of N units with probability


proportional to size and with replacement then an unbiased variance
estimator of (7.4.1) is:
2

n
yi

1
)=
.
var( yHH
yHH

n(n 1) i =1 pi

(7.4.10)

24

PROOF.
Taking expectation of (7.4.10)
2
n
1
yi

,
E [ var( yHH )] = E
yHH
n(n 1) i =1 pi

Now
2

n
yi

yi

Y ) .

=
y

Y n E ( yHH

HH
i =1 pi
i =1 pi

Taking the expectation of the above equation


2
2
n y
n y


2
i
i
E y HH = E Y n E(y HH Y )
i =1 p i
i =1 p i

= n Pi i Y n Var ( yHH )
i =1
Pi

1 N Y
)
= n Pi i Y n var ( yHH ) = n ( n 1) var ( yHH
n i =1 Pi

Using (7.4.2) we get


2
n y

i
= n(n 1) Var ( yHH ) .
E yHH
i =1 pi

Using this result in (7.4.10), we get

) ] = Var ( yHH
)
E [ var( yHH
(7.4.10) may be written as
2

n
n
y yj
1
)= 2
var( yHH
i .

2n ( n 1) i =1 j =1 pi p j

(7.4.11)

For calculation purpose alternative form of (7.4.10) is

)=
var( yHH

1 n yi2
'2
2 n yPPS .
n(n 1 i =1 pi

(7.4.12)

25

An unbiased covariance expression may be written analogous to (7.4.9) as

Cov( y, x) =

n
x
1
y
).
( i yHH )( i xHH

n(n 1) i =1 pi
pi

(7.4.13)

Though this scheme is based on with replacement process but for the following
reasons, it is preferred to be used in large scale sample surveys;
(i)
selection of the sample is simple,
(ii)

can be used for any finite predetermined number of units in the


sample,

(iii)

an unbiased variance estimator is simple, and

(iv)

it is also comparatively easy to obtain unbiased variance


estimator of total in multistage designs.

This selection procedure may be more efficient than simple random


sampling if the measure of size is approximately proportional to estimated i.e. Yi
and Zi are linearly related and regression line passing through the origin.
EXAMPLE (7.2)

Select a sample of 26 villages using probability proportional to size and with


replacement selection procedure form the data given in Appendix-I. Estimate the
total number of person in 523 villages and compare this result with actual
number of population given in 523 villages. Estimate Var ( y PPS ) and calculate
standard error of this estimate.

Solution:
Sr.
No.
1
2
3
4
5
6
7
8

yi
7346
9231
3713
2310
7261
10425
6978
399

pi
0.005946
0.006511
0.001335
0.001337
0.006127
0.00353
0.006409
0.000316

yi
pi
1235452.405
1417754.569
2781273.408
1727748.691
1185082.422
2953257.79
1088781.401
1262658.228

yi

y
pi

137886694865.41
35731892260.18
1379426820568.15
14632605885.20
177831700028.54
1812993331051.22
268326052675.04
118422122062.40

yi2
pi2

yi2
pi

1.52634E+12
2.01003E+12
7.73548E+12
2.98512E+12
1.40442E+12
8.72173E+12
1.18544E+12
1.59431E+12

9075633367
13087292428
10326868165
3991099476
8604883467
30787712465
7597516617
503800632.9

26

9
10
11
12
13
14
15
16

737
3203
4039
5439
1373
8074
3416
5841

0.002414
0.001396
0.002813
0.000906
0.000885
0.006968
0.00166
0.003874

305302.403
2294412.607
1435833.63
6003311.258
1551412.429
1158725.603
2057831.325
1507743.934

1693852740901.42
472833951008.72
29223818023.56
19329457362517.20
3065942449.29
200755773987.28
203444246707.55
9808812374.83

93209557065
5.26433E+12
2.06162E+12
3.60397E+13
2.40688E+12
1.34265E+12
4.23467E+12
2.27329E+12

225007870.8
7349003582
5799332030
32652009934
2130089266
9355550517
7029551807
8806732318

27

Sr.
No.
17
18
19
20
21
22
23
24
25
26

yi
1316
6475
1261
6975
2513
3039
322
13056
593
2515
117850

yi

y
pi

yi
pi

pi
0.002297
0.004451
0.002064
0.006409
0.002781
0.001128
0.000697
0.00613
0.000998
0.001936
0.081318

(i)

572921.202
1454729.274
610949.612
1088313.309
903631.787
2694148.936
461979.910
2129853.181
594188.377
1299070.248
41776367.9425

1068871009157.62
23120451813.51
991684897660.22
268811216688.66
494422166072.03
1182363847314.18
1310574981674.20
273602014185.76
1025348645657.47
94687373182.90
32621180470772.50

= y PPS =

Estimated Total

yi2
pi2

yi2
pi

3.28239E+11
2.11624E+12
3.73259E+11
1.18443E+12
8.1655E+11
7.25844E+12
2.13425E+11
4.53627E+12
3.5306E+11
1.68758E+12
99746754265786.8

753964301.3
9419372051
770407461.2
7590985333
2270826681
8187518617
148757532.3
27807363132
352353707.4
3267161674
217890794431.5760

1 n yi

n i =1 p i

41776367.9425
= 1606783
26

whereas the actual/total for 523 villages is 1797841.


(ii)

Var ( y PPS ) =

n
yi

1
y PPS

n(n 1) i 1 pi

32621180470772.50
= 50186431493
25 26

28

S .E ( yPPS ) = 224023.2834

(iii)

) = 1606783 2 224023.2834
C.L ( yPPS

(iv)

This may also be calculated as:

)=
var ( yPPS
=

yi2
1
2
2 nypps
n ( n 1) pi

1
2
99746754265786.80 26(1606783.382)
25 26

= 50186431493
7.4.2.

Comparison of Simple Random Sampling with Replacement


and Probability Proportional to Size with Replacement

We know that

1 N Yi 2
) =
Var ( yHH
Y 2
n i =1 Pi

(7.4.1)

If Pi =1/N then (7.4.1) becomes

)=
Var ( yran

1 N 2
N N 2 Y2
2
N Yi Y = Yi
n
N
n i =1

(7.4.14)

which is a variance expression for simple random sampling with replacement.


Putting Pi = Zi/Z in (7.4.1) and subtracting from (7.4.14), we obtain

) Var ( yHH
)=
Var ( yran
where Z =

Z
i =1

N
n

Y
i =1

Z
1
Zi

(7.4.15)

/N .

Probability Proportional to size (PPS) sampling with replacement will be more efficient than
simple random sampling provided.
N

(Zi Z )
i =1

Yi 2
>0
Zi

(7.4.16)

i.e. If Zi and Yi 2 / Z i are positively correlated.


However, it was noted by Raj (1954) that estimator based on PPS sampling with replacement
turns out to be inefficient compared to unbiased estimate based on simple random sampling with
replacement if the regression line Yi on Zi is far from the origin.

29

7.4.3Comparison of Var( y ran ) and Var( y HH ) Using a Linear Stochastic Model

We have already shown in (7.4.2) thatt

) Var ( yHH
)=
Var ( yran

N
=
n

N
n

Y
i =1

Z
1
Zi

Yi 2
( Zi Z )

i =1 Z i

(7.4.15)

(7.4.17)

For the purpose of comparison, let us take the linear model as defined in (6.8.2) the Chapter 6, i.e.
Assuming that the finite population Y1, Y2, ., YN is a random sample from an infinite superpopulation in which

Yi = Z i + i

where E * ( i ) = 0, E * ( i j ) = 0,
E ( i2 ) = i2

1
2
2 2
1
and i = Z i , where
2

(6.8.2)

Substituting the value of Yi from the model in (7.4.17), we have

) Var ( yHH ) =
Var ( yran

N
n

(
N

i =1

Z i2 + i2 + +2 Z i i )

2
Z 2 Z i + i + 2 i
Zi
i =1

Using the condition of the model


N
N
i2
N 2 N 2 N 2
2
= Z i + i Z Z i + Z
n i =1
i =1
i =1
i =1 Z i

N
Z

Zi
i

N
N
Z 2
N 2
i =1

2
2
2
i =1

Zi
2 i
=
+ Zi
i =1

n
N
N
i =1
i =1 Z i

30

Zi N

2 Z2 + 2 Z i2 i =1 Z i2 1
N i =1

i =1

N
N

Z i Z i2 1

N
N

B 2 Z i2 + 2 Z i2 1Z i i =1 i =1

N
i =1
i =1

N2
n

N2
n

N2
2 Z2 + 2 Cov( Z i2 1 , Z i )
n

(7.4.18)

We conclude that PPS sampling with replacement is more efficient as compared to simple
random sampling, if

2 Z2 + 2 Cov( Z i2 1 , Z i )} > 0
i

or

2 Cov( Z i2 1 , Z i ) > 2 Z2
i

This satisfied only if , since 2 0 and Cov( Z i2 1 , Z i ) > 0.


Or

Z ,Z
i

2 1
i

>

2 Z

2 1
i

(7.4.19)

Or this may alternatively be solved by direct way


N

We know that

Yi = Z i + i , Summing over i we have, Y = Z + i


i =1

We know that variance for population total for simple random sampling with replacement
(ignoring fpc) is

Var ( y ran ) =

N N 2 Y2
Yi .
n i =1
N

Putting the value of Yi and Y from the model, taking expectation and applying the conditions of
model we have
2
N
N * N
1

2
E [Var ( y ran ) ] = E ( Z i + i ) Z + i
n
N
i =1

i =1
*

N
N
N
1

2
2
2
2 2
N

Z
N

Z
i2
+

i
n
i =1
i =1
i =1

31

Since

i2

N
N

1
Z2
2
2
N
Z
(
N
1)

i2

n
N
i =1
i =1

1 N 2 ( Zi )
Z
and S =
, therefore

i
N 1 i =1
N

N
N 1

=
N 2 S Z2i + 2 Z i2

n
i =1

i2

2
Zi

Now
2

1 N Yi
) =
Var ( yPPS
Y
n i =1 Pi

Putting Pi =

Zi

(7.4.1)

we get
2

1 N Yi
= Z
Y
n i =1 Z i

(7.4.20)

Putting the value Yi and Y from the model, taking expectation and applying the condition of
model we have
2
N
1 * N ( Zi + i )2

Z + i
E [Var ( y PPS ) ] = E Z
n i =1
Zi
i =1

1 N i2 N 2
i
Z
n i =1 Z i i =1

(7.4.21)

Since i2 = 2 Z i2 we have

E * [Var ( y PPS ) ] =

N 2 1 N 2
Z
i Zi Zi
n i =1 i =1
i =1

(7.4.22)

) ] and E * [Var ( yran


) ] we have
Comparing E * [Var ( yPPS

N ( N 1)

B 2 S Z2 + 2 or Z i , Z i2 1 =
i

n
) ] E * [Var ( yPPS ) ]
E [Var ( yran

N ( N 1)
n

B 2 S Z2 + 2 1 S Z S 2 1
Z i Zi
Zi
i
i

1 2 N 2 N N 2 1
2 2
N Z i Z i Z i + N ( N 1) S zi
n
i 1
i =1 i 1

32

1
N
2 2
2
(
1)
1

N
N

N
N
(
)

Zi
Zi
n
i 1

i =1

i 1

Zi2 Zi

Z
i =1

2 1
i

N ( N 1) 2 2
S Zi + 2Cov Z1 , Z i2 1

n
N ( N 1) 2 2
S Zi + z , z 2 1 S zi S z2i 1Cov Z1 , Z i2 1
=
1 i

n
=

We conclude that PPS estimator will be superior to equal probability if

2 1

( Zi , Zi

> 2 SZ i / 2 SZ

2 1
i

which is same as (7.4.19).


under the model (6.8.2) Brewer and Hanif (1983) proved that:

E [Var ( y PPS ) ] = Z 2
*

2
n

1
i =1

1
n

2 1

(7.4.23)

and

)
Var ( yHH

y Y
= i
n
i =1 i
n

In the expression, i is written for npi, so that i is the expected number of appearance of the ith
population unit in sample.

7.5.

GAIN DUE TO PPS SAMPLING (WITH REPLACEMENT) OVER


SIMPLE RANDOM SAMPLING

We know that the variance expression for simple random sampling with replacement is
2

Yi
N ( N 1) 1 N 2 i =1
)=
Var ( yran
Yi N
n
N 1 i =1

(2.5.2)

and

)=
var( yPPS

1 n yi2
'2
2 n yPPS
n(n 1) i =1 pi

(7.4.12)

We can prove that


N
1 n yi2
yi2 N Yi 2
=
=
=
E
P
Yi 2


i
i =1
n i =1 pi
pi i =1 Pi

(i) E

(7.5.1)

and

33

(ii)

1
1
'2
'2
) = E ( yPPS
)
var( yPPS
) E var ( yPPS
E yPPS
N
N
1
2
)}2 E var [ yPPS ]
= E ( ypps
) {E ( ypps )}2 + {E ( yPPS
N
1
Y2
) + Y 2 Var ( yPPS ) =
= Var ( yPPS
= N Y 2 . (7.5.2)
N
N
Using (7.5.1) and (7.5.2) in (2.5.2) we can have

N ( N 1) 1 1 n yi2 1 '2
) .
yPPS var( yPPS

n
N 1 n i =1 pi N

)=
varPPS ( yran

(7.5.3)

)=
varPPS ( yran

N
n2

2
i

p
i =1

1 '2
1
).
yPPS + var( yPPS
n
n

n
yi2
1
1
'2
).
= 2 N nyPPS
+ var( yPPS
n
i =1 pi
n
Subtracting var ( yPPS ) from (5.5.4) we get

) var( yPPS
)=
varPPS ( yran

1
n2

(7.5.4)

n yi2
1
'2
) var( yPPS
).
N nyPPS + var( yPPS
i =1 pi
n

n yi2
n 1
1 n yi2
1
2
2

= 2 N nyPPS +
2 nyPPS
n i =1 pi
n n ( n 1) i =1 pi

N
n2

N
n2

1
= 2
n

i =1

i =1

i =1

2
yi2 ypps 1

2
pi
n
n

yi2 1

pi n 2

i =1

i =1

yi2 1 2
.
+ yPPS
pi2 n

yi2
pi2

yi2
1
N .
pi
pi

(7.5.5)

34

Therefore
) v ar( yPPS ) =
v arPPS ( yran

1
n2

i =1

yi2
1
N . .
pi
pi

(7.5.6)

An estimate of the percentage gain in efficiency due to pps sampling is ainin


) v ar ( yPPS
)
varPPS ( yran
100 .
var ( yPPS )

(7.5.7)

EXAMPLE (7.3)

A sample of size 5 has been selected from a population of size 20 farms. Number of trees,
along with initial probability of selection is given
i)

Estimate the total number of trees in that area, calculate the estimated variance and
standard error of this estimator.

ii)

Estimate the gain in precession over simple random sampling. The actual number of
trees are 28443.
2

yi2 / pi

S.No. of
Villages

No. of
Trees
(yi)

Probability
of Selection
(pj)

yi
pi

yi


yPPS
pi

8
4
16
11
10

311
949
11799
2483
3044

0.014
0.036
0.275
0.121
0.212

22214.286
26361.111
42905.455
20520.661
14358.490

9349614.91
1186162.77
310938735.20
22575222.29
119104700.50

6908642.9
25016694.4
506241458.1
50952801.6
43707245.28

126360.003

463154435.5

632826842.1

35

(i)

=
Estimated Total = yPPS
=

126360.003
= 25272 trees
5

= Y

Actual Total

1 n yi
.
n i =1 pi

= 28443
2

(ii)

n
yi

1
)=
.
var ( yPPS
yPPS

n(n 1) i 1 pi

(iii)

1
. [463154435.5] = 23157721.77
5 4

S.E( y PPS ) = 4812.247

)=
varPPS ( yran

1
n2

n yi2
1
'2
) (7.5.4)
N n yPPS + var( yPPS
i =1 pi
n

1
1
20 (632826842.1) 5 (25272) 2 + (23157721.77)
25
5

= 383158187.5
) v arPPS ( yPPS )
v arpps ( yran
100
)
v ar( yPPS
383158187.5 23157721.77
=
100 = 1554.56%
23157721.77
7.6.

ALTERNATIVE ESTIMATOR TO HANSEN AND HURVITZ


ESTIMATOR
Pathak (1962) described an estimator for the sampling scheme suggested by Hansen

and Hurvitz (1963). For this let we have a sample of three units selected from a
population of N units. Llet the selected sample has yi, yi, yj observations with
probabilities pi, , pi, pj respectively, then Pathak (1962) defines an estimator:

y
y + yj
1y
yp = i + j + i
,
3 pi p j pi + p j

(7.6.1)

36

or for sample size n it may be written as:

yi

n 1

y
1
yp = i + in=1 ..
n i =1 pi
pi

i =1

(7.6.2)

This is more efficient than Hansen and Hurwitz (1993) estimator but more difficult to
calculate. The gain in precision is small unless the sampling fraction is large.
7.7.

RATIO ESTIMATION FOR PPS SAMPLING


We know that
=
yHH

1 n yi
1 n xi

=
and
x

HH
n i =1 pi
n i =1 pi

Therefore

1 n

n i =1
= n
yHH
1

n i =1

yi
pi
. X.
xi
pi

(7.7.1)

From Hansen, Hurwitz and Madow (1953), we have


) = Var ( yHH ) 2 R Cov ( yHH
, xHH
) + R 2 Var ( yHH
) .(6.2.19)
Var ( yHH

Using (7.4.2) and (7.4.9) and analogues expression

37

1 N X
) = Pi i X ,
Var ( xHH
n i =1 Pi

(7.7.2)

in (6.2.19) and on simplification


) =
Var ( yHH

N
N

Yi X i
Yi 2
1 N Yi 2
2
2
R
R

+
(Y RX ) 2 (7.7.3)

n i =1 Pi
i =1 Pi
i =1 Pi

1 N 1
= (Yi R X i .
n i =1 Pi

(7.7.4)

This may be put easily as


2

Var ( yHH ) =

X
1 N Yi
Pi R i .

n i =1 Pi
Pi

(7.7.5)

) may be written in a straight forward


An approximate unbiased estimator of Var( y HH

way or may be derived


2

n
yi
xi
1
) =
var( yHH
r ,

n(n 1) i =1 pi
pi

(7.7.6)

or
2

N
yi y xi
1
) =
var( yHH

.

n(n 1) i =1 pi x pi

(7.7.7)

CHAPTER-4
TWO-PHASE SAMPLING
1.1 Introduction
Consider the problem of estimating population mean of Y of a Study Variable Y from a finite population
on N units. When information on one or more auxiliary variable say X and Z which are correlated with the
variable Y are available or can be cheaply obtained ratio or regression type estimates can be used to
improve the efficiency. These cases may include knowledge of X or Z or both X and Z . These are

38

however situations where prior knowledge about these may be lacking and a census or complete count is
too costly. Two phase sampling is used to gain information about x & z cheaply from a first stage bigger
sample. A sub sample is then selected from the units selected at the first phase & Y is observed for the
selected units.
Useful references in this area are Mohanty (1967), Chand (1975), Ahmed (1977), Kiregyera
(1980, 1984), Sahoo et al (1993) and Roy (2003). We have used Linear models and the method of Least
Squares (L.S) following Roy (2003) to deal with different situations. The results as expected are
encouraging. We have also indicated how slight adjustments can be made in earlier works to improve the
efficiency of the estimates. An implication of this is that some of these earlier works do not fully utilize the
available information.
Let N be the size of the population, from which a sample of size n1 ( n1 < N ) is drawn using a
simple random sampling without replacement. The values of X and Z are noted for the quits selected. From
this sample a sub-sample of size n2 ( n2 < n1 ) is again selected using a simple random sampling with out
replacement observing as Y. S. Further let y2 , x2 and z2 be the sample means of y, x, and z variables
respectively based on the sample of size n1 and let x2 and z2 be the sample mean based on the first phase
sample of size n1 of variable x and z respectively.
Various situations of interest may arise depending on availability of information about X and Z .
We will deal with them separately.
To suit different situation we introduce the following notations. Let S y2 =
1 =

1 N
Yi Y
N 1 i =1

1 1
1
1
, 2 =
, C y2 = S y2 Y 2 with C x2 , C z2 similarly defined. Also xy , yz and xz denote
n1 N
n2 N

the population correlation coefficient between X and Y , Y and Z and X and Z respectively. We will
also write

39

( )

y1 = Y + e y1 , x1 = X + ex1 , z1 = Z + ez1 , E ex21 = 1 X 2 Cx2

( )

( )

E e y21 = 1Y 2 C y2 and E ez21 = 1 Z 2 C y2 . E (ex ey ) = X Y C y C x xy

( )=
ex22

x2 = X + ex2 , E

E e1 ex2

X 2 C x2 , E ex2 e y2 = 2 X YC x C y xy

= ( 2 1 ) X 2 C x2 ,

(4.1.1)

E e y2 ex1 ex2 = ( 2 1 ) Y X C y C x xy

E ex ez1 ez2 = 0

E e y2 ex1 = 1Y X C y Cx xy

with other terms similarly defines: Also we will assume that both e y1 , e y2 are much smaller in comparison
with Y with similar assumptions for auxiliary variables we will look into the following situations
separately.
i) In addition to the sample we are given the population means of X and Z which are X and Z
respectively. We may call this complete information case.
ii) In addition to the sample we are given X only, ( Z being unknown). We will call this partial
information case.
iii) Only the information on the sample is available i.e. X and Z are unknown. We will call this no
additional information case.
4.2 Ratio and Regression Estimators

In this section following estimator of ratio and regression alongwith mean square error have been
considered.
a)

T1( 2 ) =

y2
X
x2

[ X is known ]

b)

T2( 2 ) =

y2
x1
x2

[ no information ]

c)

T3( 2 ) = y2 + byx ( x1 x2 ) [ no information ]

4.2.1 Ratio Estimator with known information

Consider
T1( 2 ) =

y2
X
x2

(4.2.1)
Using (1.1.1) we get

40

T1( 2 )

Y + e y2
X + ex2

.X

ey

= Y 1 + 2

ey

= Y 1 + 2

= Y + e y2

(T Y ) = e y

ex2

1
X

ex
2
X
Y
ex2
X

Y
ex
X 2

The mean square error of T1( 2 ) will be

( ) (

MSE T1( 2 ) = E T1( 2 ) Y

Y
= E e y2 ex2
X

(4.2.2)
Taking the square R.H.S of (4.2.2.) we get

Y2
Y
= E e y22 + 2 ex22 2 e y2 ex2
X
X

Using (1.1.1)

( )

MSE T1( 2 ) = Y 2 2 C y2 +

Y2
X

2 X 2 C x2 2

Y
2Y X C y C x xy
X

On simplification we get

( )

V1( 2 ) = MSE T1( 2 ) = 2Y 2 C y2 + C x2 2 xy C x C y

(4.2.3)

4.2.2 Ratio Estimator with no information

Consider
T2( 2 ) =

y2
x1
x2

(4.2.4)
Using (1.1.1) in (4.2.4) we get
T2( 2 ) =

Y + e y2
X + ex2

( X + ex )
1

41

= Y + e y2

)( X + ex )( X + ex )
1

ex
ex

= Y + e y2 1 + 1 1 2

X
X

ex
ex

= Y + e y2 1 + 1 2

X
X

Y
= Y + e y2 +
ex ex2
X 1

or
T2( 2 ) Y = e y2 +

Y
ex ex2
X 1

The mean square error of T2( 2 )

( ) (

Y
MSE T2( 2 ) = E T2( 2 ) Y = E e y2 +
ex1 ex2
X

(4.2.5)

Y2
= E e y22 + 2 ex1 ex2
X

+2

Y
e y ex ex2
X 2 1

Using (1.1.1) we get

( )

MSE T2( 2 ) = 2Y 2 C y2 +
or

Y2
X2

Y
X
2 2
2
= 2Y C y + Y ( 1 2 ) C x2 2Y 2 ( 1 2 ) C x C y xy

( 1 2 ) X 2 Cx2 + 2 ( 2 1 ) Y X Cx C y xy

( )

V3( 2 ) = MSE T3( 2 ) = Y 2 2 C y2 + ( 2 1 ) C x2 2 C x C y xy

(4.2.6)
4.2.3 Regression Estimator with no information

Consider
T3( 2 ) = y2 + byx ( x1 x2 ) (4.2.7)

Using (1.1.1) we get

T3( 2 ) = Y + e y2 yx + er

) ( ex

= Y + e y2 + yx ex1 ex2

or

(T ( ) Y ) = e
2 2

y2

ex2

+ yx ex1 ex2

(4.2.8)
The mean square error of T3( 2 ) is

( ) (

MSE T3( 2 ) = T3( 2 ) Y


(4.2.9)

= E e y2 + yx ex1 ex2

or

42

( )

MSE T3( 2 ) = E e y22 + yx e y2 ex1 ex2

(4.2.10)

or

( )

MSE T3( 2 ) = 2Y 2 C y2 + yx ( 1 2 ) Y X C y C x xy

Substituting the value of yx =

Y Cy

xy

X Cx

xy YC y
MSE T3( 2 ) = 2Y 2 C y2 +
( 1 2 ) Y X C y Cx xy
XC x

( )

On simplification we get

( )

MSE T3( 2 ) = Y 2 C y2 2 + ( 1 2 ) 2xy

( )

V2( 2 ) = MSE T2( 2 ) = Y 2 C y2 2 1 2xy + 1 2xy

(4.2.11)

4.3 Mohantys [1967] Estimator and some modifications

In this section following estimators are mentioned.


a)

Z
T4( 2 ) = y2 + byx ( x1 x2 )
z2

b)

z
T5( 2 ) = y2 + byx ( x1 x2 ) 1
z2

c)

X
T6( 2 ) = y2 + byz ( z1 z2 )
x2

d)

T7( 2 ) =

z2
z1 + byx X x1

z2

4.3.1 Mohanty (1967) considered the estimation when Z is known

Z
T4( 2 ) = y2 + byx ( x1 x2 )
z2

(4.3.1)
Using (1.1.1) in (4.3.1) we get

T4( 2 ) = Y + ey2 + yx + er

) ( ex

Z
ex2
z + ez
2

On simplification we get

Y
ez
Z 2

Y
ez
Z 2

T4( 2 ) = Y + ey2 + yx ex1 ex2

or
T4( 2 ) Y = ey2 + yx ex1 ex2

The MSE of T3( 2 ) is

43

E T4( 2 ) Y

Y
= E ey2 + yx ex1 ex2 ez2
Z

(4.3.2)

= E ey22 + 2yx ex1 ex2

Y2
Z

ez22 + 2 yx ey2 ex1 ex2

Y
Y
e y2 ez2 2 yx ez2 ex1 ex2
Z
Z

Y2
MSE T4( 2 ) = 2Y 2 C y2 + ( 2 1 ) 2yx X 2 C x2 + 2 2 Z 2 Cz2

Z
+ ( 1 2 ) 2 yx Y X C y C x xy 2 2
2 ( 1 2 ) yx

Y
Y Z C y C z yz
Z

Y
Z X C z C x xz
Z

(4.3.3)
Putting the values of yx =

xy C y Y
X Cx

in (4.3.3) we get

( )

MSE T4( 2 ) = 2Y 2 C y2 + ( 1 1 )

2xy C y2Y 2
X 2 Cx2

X 2 C x2

+2 ( 1 2 )

2 Y 2 C y C z yz 2

xy YC y
X Cx

Y X C y C x xy

xy C y Y Y
( 1 2 ) Z X Cz Cx xz
Cx X Z

(4.3.4)

On simplification

( )

MSE T4( 2 ) = Y 2 2 C y2 + ( 2 1 ) C y2 2xy + 2 Cz2 + 2 ( 1 2 ) C y2 2xy

2 2 C y C z yz 2 ( 1 2 ) C y Cz xy xz

= Y 2 2C y2 ( 2 1 ) C y22xy + 2 Cz2 22 C y Cz yz 2 ( 1 2 ) C y Cz xy xz

= Y 2 2 C y2 2 C y2 2yz + 2 C y2 2yz ( 2 1 ) C y2 2xy + 2 C z2

22 C y C z 2 ( 1 2 ) C y Cz xy xz

or

= Y 2 2 C y2 1 2yz + 2 C z2 2C y Cz + C y2 2yz C y2 2yz

2 C y2 2yz ( 2 1 ) C y2 2xy 2 ( 1 2 ) C y C z xy xz

or

) (


= Y 2 2 C y2 1 2xz + C z C y yz

) C
2

2 2
y yz

44

+2 C y2 2yz ( 2 1 ) C y2 2xy 2 ( 1 2 ) C y C z xy xz

or

) (


= Y 2 2 C y2 1 2xz + C z C y yz

)
2

+ ( 2 1 ) C y2 2xy + 2C y C z xy xz C z2 2xz + C z2 2xz

or

( )

) (


V4( 2 ) = MSE T4( 2 ) = Y 2 2 C y2 1 2xz + C z C y yz

)}

+ ( 2 1 ) C z2 2xz C y xy C z 2xz

(4.3.5)
4.3.2 Mohantys Ratio-Cum-Regression Estimator with no Information

Mohanty (1967) constructed another Ratio-Cum-Regression estimator when sample


information are only given i.e.

z
T5( 2 ) = y2 + byx x1 x2 1

z2

(4.3.6)
Using (1.1.1) we get
Z + ez
1
T5( 2 ) = Y + ey2 + Byx + er ex1 ex2

Z + ez
2

or

ez ez
T5( 2 ) = Y + ey2 + Byx ex1 ex2 1 + 1 1 2


Z
Z

The mean square error of T5( 2 ) is

E T5( 2 ) Y

Y
= E ey2 + yx ex1 ex2 +
ez1 ez2
Z

(4.3.7)

( )

MSE T5( 2 ) = E ey22 + 2yx ex1 ex2

+2 yx ey2 ex1 ex2 + 2

Y2

( ez
Z2

ez2

Y
Y
e y2 ez1 ez2 + 2 yx
ex ex2
Z
Z 1

)( ez

ez2

Using (1.1.1) we get

45

( )

MSE T5( 2 ) = 2Y 2 C y2 + 2yx ( 2 1 ) X 2 Cx2 +

Y2
Z2

( 2 1 ) Z 2 Cz2

Y
( 2 1 ) Y Z C y Cz yz
Z
+2 yx ( 2 1 ) X Z C x C z yz

+2 ( 1 2 ) yx Y X C y C x xy + 2

(4.3.8)
Putting the value of yx =

YC y
XC x

( )

MSE T5( 2 ) = 2Y

+2 ( 2 1 )
+2 ( 2 1 )

xy C y Y
X Cx
xy C y Y
X Cx

xy

C y2

+ ( 2 1 )

2xy C y2 Y 2
C x2 X 2

X 2 C x2 +

Y2
Z

( 2 1 ) Z 2 Cz2

Y
( 2 1 ) Y Z C y Cz yz
Z

Y X C y Cx xy + 2
X Y Cx C z xy yz

On simplification we get

( )

MSE t5( 2 ) = Y 2 2 C y2 + ( 2 1 ) C y2 2xy C z2 2xz + 2C y Cz xy xz + Cz2 2xz

+Cz2 + C y2 2yz 2C y Cz yz + C y2 2yz

= Y 2 2 C y2 + ( 2 1 ) 2xy C y2 + C z2 2 2xy C y2

2 C y C z yz + 2 C y C z xy xz
= Y 2 2 C y2 + ( 2 1 ) 2xy C y2 C z2 2C y Cz yz + 2C y C z xy xz

( )

V5( 2 ) = MSE T5( 2 ) = Y 2 2 C y2 + ( 2 1 ) 2xz Cz2 xy C y xz Cz

+ C z C y yz

C y2 2xz

(4.3.9)
4.3.3 Modification of T4( 2 ) by Interchanges X and Z
X
T6( 2 ) = y2 + byz ( z1 z2 )
x2
(4.3.10)

Using (1.1.1) in (4.3.10) we get

T6( 2 ) = Y + e y2 + yz + ez

) ( ez

X
ez2
X + ex
2

ex

= Y + e y2 + yz ez1 ez2 1 2

46

T6( 2 ) = Y

Y
+ ex2 + e y2 + yz ez1 ez2
X

or

T6( 2 ) Y = e y2 + yz ez1 ez2

Y
ex
X 2

The mean square error of T6( 2 ) is

( ) ( )

MSE T6( 2 ) = E T7( 2 )

Y
= E e y2 + yz ez1 ez2 ex2
X

(4.3.11)
Squaring the R.H.S of (4.3.11)

( )

Y2
MSE T6( 2 ) = e y22 + 2yz ez1 ez2 + 2 e x22 + 2 yz e y2 ez1 ez2
X

Y
Y
e y2 ex2 2 yz ex2 ez1 ez2
X
X

Using (1.1.1)

( )

MSE T6( 2 ) = 2 Y 2 C y2 + ( 2 1 ) 2yz Z 2 + 2

22

Y2

X 2 C x2

X
+ ( 1 2 ) 2 yz Y ZC y C z C y C z yz

Y
Y
Y X C y C z xy ( 1 2 ) 2 yz
X Z Cx Cz xz
X
X

Putting the value of yz =

Y 2C y
Z 2 ez

2yz

( )

MSE T6( 2 ) = 2 Y 2 C y2 + ( 2 1 )

2yz Y 2 C y2
Z

C z2

+ 2

+ 2 ( 1 2 )

22

Y2
X

X 2 C x2

yz Y C y
Z Cz

Y Z C y Cz yz

yz Y C y Y
Y
. X Z C x C z C y C z xz
Y X C y C x xy 2 ( 1 2 )
X
Z Cz
X

or
= Y 2 2 C y2 + ( 1 1 ) C y2 2xz + 2 C x2 2 2 C y C x xy

2 ( 1 2 ) yz xz C y C x

= Y 2 2 C y2 + ( 2 1 ) C y2 2xz + 2 C y Cx xy 2 ( 1 2 ) yz xz C y C x

= Y 2 2 C y2 + ( 2 1 ) C y2 2xz + C x C y yz xz + 2 Cx2 2C y C x xy

47

= Y 2 2 C y2 + ( 2 1 ) C y2 2xz + Cx2 2yz 2Cx C y yz xz C x2 2yz

+2 Cx2 + C y2 2xy 2C y Cx xy C y2 2xy

MSE T6( 2 ) = Y 2 2 C y2 + ( 2 1 ) C x2 2xy C y xy C x yz

+2 Cx C y xy

(4.3.12)

C y2 xy

4.3.4 Modification of Mohanty when X is known


T7( 2 ) =

y2
z1 + byx X x1

z2
(4.3.13)

Using (1.1.1) in (4.3.13) we get


T7( 2 ) =

Y + e y2

Z + ez + yx 1 ex
1
1
Z + ez 2
=

( Y + e y ) 1 ez
2

ez1

Z 1 +
yx ex1
Z
Z

ez ez
ex
ez
= Y + ey2 1 + 1 1 2 yx 1 1 2

Z
Z
Z
Z

ez
ez

= Y + e y2 1 + 1 2 yx ex1
Z
Z

or
= Y + e y2
T7( 2 ) Y = e y2 +

Y
Y
ez1 ez2 yx ex1
Z
Z

Y
Y
ez ez2 yx ex1
Z 1
Z

Mean square error of T7( 2 ) is

( ) (

MSE T7( 2 ) = E T7( 2 ) Y

Y
Y
E e y2 +
ez1 ez2 yx ex1
Z
Z

(4.3.14)

Y2
= E e y22 + 2 ez1 ez2
Z

Y2
Z

2yx e x21

Y
Y
Y2
e y2 ez1 ez2 2 yx ey2 ex1 2 2 yx ex1 ez1 ez2
Z
Z
Z

= 2Y 2 C y2 + ( 2 1 ) +

Y2
Z

Z 2 C z2

Y2
Z

X 2 C x2 + 2 ( 1 2 )

Y
YC y ZCz yz
Z

48

Y
1 yz Y X C y C x yx
Z

On simplification we get

( )

X2
MSE T7( 2 ) = Y 2 2 C y2 + ( 2 1 ) Cz2 + 12yx 2 C x2
Z

+2 ( 1 2 ) C y C z 21

X
yz C y C z yx
Z

Putting the value of yx

2xy Y 2 C y2 X 2 2
MSE T7( 2 ) = Y 2 C y2 + ( 2 1 ) Cz2 + 1
Cx
X 2 Cx2 Z 2

( )

+2 ( 1 2 ) C y Cz yz 21

Y Cy
X
C y C x yz
xy
Z
X Cx

or

Y2
= Y 2 2 e y1 + ( 2 1 ) Cz2 + 1 2 C y2 2xy
Z

+2 ( 1 2 ) C y Cz yz 21

Y 2
C y xy yz
Z

or

= Y 2 2 C y2 + ( 2 1 )

{C

2
z

2C y C z xz + C y2 2yx C y2 2xy

2
Y
Y

+2 2 C y2 2xy 2 C y2 xy yz
Z
Z

or

= Y 2 2 C y2 + ( 2 1 ) Cz C y xy

C y2 2yx

Y 2

Y
+1 2 C y2 2xy + 2yz 2yz 2 C y2 xy yz
Z
Z

or

( )

MSE T7( 2 ) = Y 2 2 C y2 + ( 2 1 ) Cz C y xy

C y2 2xy

2
Y

+1 C y xy yz 2yz

(4.3.15)

4.4

Chands (1975) Estimators

In this section following estimator are derived


a)

T8( 2 ) = y2

x1 Z
x2 z1

49

b)

T9( 2 ) = y2

x2 z1
x1 Z

c)

T10( 2 ) = y2

z1 x1
z2 X

d)

T11( 2 ) = y2

z1 X
z2 x1

4.4.1 Chand (1975) suggested chain-based ratio and product estimator-I


T8( 2 ) = y2

x1 Z
x2 z1

(4.4.1)
Using (1.1.1) in (4.4.1) we get

T8( 2 ) = Y + e y2

X +e1

) X + ex

Z
Z + ez1

x2

ex
ex
ez

= Y + e y2 1 + 1 1 2 1 1

X
X
Z

ex
ex
ez

= Y + e y2 1 + 1 1 1

X
X
Z

Y
Y
=Y +
ex ex2 ez1 + e y2
X 1
Z

or
T8( 2 ) Y = e y2 +

Y
Y
ex ex2 ez1
X 1
Z

The mean square of T8( 2 ) will be

Y
Y
E T8( 2 ) Y = E e y2 +
ex1 ex2 ez1
X
Z

(4.4.2)

Squaring the R.H.S. of (4.4.2)

Y2
= E e y22 + 2 ex1 ex2
X

Y2
Z

ez21 + 2

Y
ey ex ex2
X 2 1

Y
Y2
e y2 ez1 2
ex1 ex2 ez1
Z
XZ

Using (1.1.1) we get

( )

MSE T8( 2 ) = Y 2 C y2 +
+2

Y2
X

( 2 1 ) X Cx2 + 1
2

Y2
Z

Z 2 Cz2

Y
Y
( 1 2 ) Y X C y Cx xy 2 1Y Z C y Cz yz 0
X
Z

( )

MSE T8( 2 ) = Y 2 2 C y2 + ( 2 1 ) Cx2 + 1 Cz2

50

+ 2 ( 1 2 ) C y C x xy 21 C y Cz yz

= Y 2 2 C y2 + ( 2 1 ) Cx2 + 2xy C y2 2 C y C x xy 2xy C y2

+ 1 C z2 21 C y C z yz

{(

( )

V8( 2 ) =MSE T8( 2 ) = Y 2 2 C y2 + ( 2 1 ) C x C y xy

{(

+1 C z C y yz

C y2 2xy

C y2 2yz

(4.4.3)
4.4.2 Chand (1975) Chain-based Ratio and Product Estimator-II

Chand (1975) considered another chain-based ratio estimator


T9( 2 ) = y2

x2 z1
x1 Z

(4.4.4)
Using (1.1.1) in (4.4.4) we get

T9( 2 ) = Y + e y2

X +e

) X + ex

Z + ez1
Z

x1

ex e x
ez

= Y + e y2 1 + 2 1 1 1 + 1

X
X
Z

Y
Y
= Y + e y2
ex1 ex3 + ez1
X
Z

T9( 2 ) Y = e y2

Y
Y
ex ex2 + ez1
X 1
Z

The mean square error of T9( 2 ) is

( ) (

MSE T9( 2 ) = E T9( 2 ) Y

Y
Y
= E e y2
ex1 ex2 + ez1
X
Z

(4.4.5)

Y2
= E e y22 + 2 ex1 ex2
X

+2

Y2
Z

ez21 2

Y
ey ex ex2
X 2 1

Y
Y2
e y2 ez1 2
ex1 ex2 ez1
Z
XZ

(4.4.6)
Using (1.1.1) in (4.4.6) we get

( )

MSE T9( 2 ) = 2Y 2 C y2 +
2

Y2

Y2

Z2

( 2 1 ) X 2 Cx2 +
2

1 Z 2 Cz2

Y
Y
( 1 2 ) Y X C y Cx xy + 2 2 C y Cz yz Y Z + 0
X
Z

or

51

= Y 2 2 C y2 + ( 2 1 ) C x2 + 1 C z2 + 2 ( 2 1 ) C y C x xy + 21 C y Cz yz

or

= Y 2 2 C y2 + ( 2 1 ) C x2 + 2 C y Cx xy + 1 C z2 + 2 C y C z yz

{(

( )

V9( 2 ) = MSE T9( 2 ) = Y 2 2 C y2 + ( 2 1 ) C x + xy C y

2
+1 C z + yz C z 2yz C z2

(4.4.7)

C y2 2xy

4.4.3 Modification of Chands T9( 2 )

Following additional estimator parallel to Chand (1975) estimator is considered


T10( 2 ) = y2

z1 x1
z2 X

(4.4.8)
Using (1.1.1) in (4.3.8) we get

T10( 2 ) = Y + e y2

Z +e1

) Z + ez

X
X + ex1

z2

ez
ez
ex

= Y + e y2 1 + 1 1 + 2 1 + 1

Z
Z
X

= Y + e y2

(T10 Y ) = ey

Y
Y
ez ez2 ex1
Z 1
Z

Y
Y
ez1 ez2 ex1
Z
Z

The mean square error of T10( 2 ) is

Y
Y
E T10( 2 ) Y = e y2 +
ez1 ez2 ex1
Z
Z

(4.4.9)

Y2
= E e y22 + 2 ez1 ez2
Z

Y2
Z2

e x21 +

Y
Y2
e y2 ex1 2 2 ex1 ez1 ez2
Z
Z

Y
ey ez ez2
Z 2 1

(4.4.10)
Using (1.1.1) in (4.4.10) we get

MSE T10( 2 ) = 2Y 2 C y2 +
+2

Y2

Y2

Z2

( 2 1 ) Z 2 Cz2 + 2
2

X 2 C x2

Y
Y
( 1 2 ) Y Z C y Cz yz 2 1Y XC y Cx xy + 0
Z
Z

52

= Y 2 2 C y2 + ( 2 1 ) C z2 + 1Cx2 + 2 ( 1 2 ) C y C z yz 21C y C x xy

= Y 2 2 C y2 + ( 2 1 ) C z2 2C y Cz yz + C y2 2yz C y2 2yz

1 C x2 2C y C x xy + C y2 2xy C y2 2xy

or

V10( 2 ) = MSE T10( 2 ) = Y 2 2 C y2 + ( 2 1 ) C z C y yz

+1 C x C y xy

(4.4.11)

C y2 2yz

C y2 2xy

4.4.4 Modification of Chands T8( 2 )


T11( 2 ) = y2

z1 x1
z2 X

(4.4.12)
Using (1.1.1) in (4.4.12) we get

= Y + e y2

Z + e 1 X + ex1
.
X
z2

) Z + ez

ez
ez
ex

= Y + e y2 1 + 1 1 + 2 1 + 1

Z
Z
X

T11( 2 ) = Y + e y2 +

Y
Y
ez1 ez2 + ex1
Z
Z

T11( 2 ) Y = e y2 +

Y
Y
ez ez2 + ex1
Z 1
Z

The mean square error of T11( 2 ) is

) (

MSE T11( 2 ) = T11( 2 ) Y

Y
Y
= E e y2 +
ez1 ez2 + ex1
Z
Z

(4.4.13)

Y2
Y2
MSE T11( 2 ) = E e y22 + 2 ez1 ez2 + 2 e x21
Z
Z

+2

Y
Y
Y2
ez1 ez2 + 2 ey2 + 2 ex1 ez1 ez2
Z
Z
Z

(4.4.14)

Using (1.1.1) in (4.4.14) we get

MSE T11( 2 ) = 2Y 2 C y2 +

Y2
Z

( 2 1 ) Z 2 Cz2 + 1
2

Y2
Z

X 2 C x2

53

Y
Y
( 2 1 ) Y Z C y Cz yz + 2 1Y X C y Cz xy + 0
Z
X

On simplification we get

MSE T11( 2 ) = Y 2 2 C y2 + ( 2 1 ) Cz2 + 1C x2

+2 ( 2 1 ) C y C z yz + 21C y C x x xy

= Y 2 2 C y2 + ( 2 1 ) Cz2 + 2C y Cz yz + 1 C x2 + 2C y C x xy

V11( 2 ) = MSE T10( 2 ) = Y 2 2 C y2 + ( 2 1 ) C z + C y yz

{(

C y2 yz

+1 C x + C y xy C y2 2xy

(4.4.15)
4.5

Kiregyeras Estimators and some modifications

Kiregyera (1980, 1984) suggested the following estimators:


a)
b)
c)

T12( 2 ) =

y2
x1 + bxz Z z1

x2

T13( 2 ) = y2 + byx 1 Z x2
z1

T14( 2 ) = y2 + byx ( x1 x2 ) + bxz z1 Z

The following estimators on the lines of Kiregyeras are also been suggested to meet the
requirements of this monographs:

y2
z1 + bzx X x1

z2

d)

T15( 2 ) =

e)

T16( 2 ) = y2 + byz 1 X z2
x1

4.5.1 Kiregyeras (1980) Estimator (Chand-Kiregyera Estimator)

This is a modification of Chand (1975). Kiregyera (1980) assumed that Z i is closely related to X i , but
compared to X i is remotely related to Yi . This assumption may not always be to realize in particular.
Therefore T8( 2 ) may not be effectively used in many situations
T12( 2 ) =

y2
x1 + bxz Z z1

x2

(4.5.1)
Using (1.1.1) in (4.5.1) we get
T12( 2 ) =

Y + e y2

X + ex + ( xz + er ) Z Z ez
1
1
X + ex2

54

= Y + e y2

ex2
ex1

1 +
X 1 +
xz ez1
X
X

1
X

ex
ex
ez

= Y + e y2 1 2 1 + 1 xz 1

X
X
X

ex
ez

ex
= Y Y 2 + e y2 1 + 1 xz 1
X
X
X


ex

Y
Y
= Y Y 2 + e y2 + ex1 xz ez1
X
X
X

T12( 2 ) = Y + e y2 +

Y
Y
ex ex2 xz ez1
X 1
X

T12( 2 ) Y = e y2 +

Y
Y
ex1 ex2 xz ez1
X
X

(4.5.2)
The mean square error of T12( 2 )

E T12( 2 ) Y

Y
Y
= E e y2 +
ex1 ex2 xz ez1
X
X

(4.5.3)

Y
= E e y22 +
ex ex2
X 1

Y2

2Y
e y ex ex2
X 2 1

Y
Y2
xz e y2 ez1 2 xz 2 ez1 ez1 ez2
X
X

+ 2xz

ez21 +

(4.5.4)
Using (1.1.1) in (4.5.4) we get

$MSE\left(T_{12}^{(2)}\right) = \theta_2\bar{Y}^2C_y^2+\dfrac{\bar{Y}^2}{\bar{X}^2}\left(\theta_2-\theta_1\right)\bar{X}^2C_x^2+\theta_1\beta_{xz}^2\dfrac{\bar{Y}^2}{\bar{X}^2}\bar{Z}^2C_z^2+2\left(\theta_1-\theta_2\right)\dfrac{\bar{Y}}{\bar{X}}\bar{Y}\bar{X}C_yC_x\rho_{xy}-2\theta_1\beta_{xz}\dfrac{\bar{Y}}{\bar{X}}\bar{Y}\bar{Z}C_yC_z\rho_{yz}+0$   (4.5.5)

Putting the value of $\beta_{xz} = \rho_{xz}\dfrac{\bar{X}C_x}{\bar{Z}C_z}$ in (4.5.5) we get

$MSE\left(T_{12}^{(2)}\right) = \bar{Y}^2\left[\theta_2C_y^2+\left(\theta_2-\theta_1\right)C_x^2+\theta_1\rho_{xz}^2C_x^2+2\left(\theta_1-\theta_2\right)C_yC_x\rho_{xy}-2\theta_1\rho_{xz}\rho_{yz}C_yC_x\right]$

or

$= \bar{Y}^2\left[\theta_2C_y^2+\left(\theta_2-\theta_1\right)\left\{C_x^2-2C_yC_x\rho_{xy}+C_y^2\rho_{xy}^2-C_y^2\rho_{xy}^2\right\}+\theta_1\left\{\rho_{xz}^2C_x^2-2\rho_{xz}\rho_{yz}C_yC_x+C_y^2\rho_{yz}^2-C_y^2\rho_{yz}^2\right\}\right]$

or

$V_{12}^{(2)} = MSE\left(T_{12}^{(2)}\right) = \bar{Y}^2\left[\theta_2C_y^2+\left(\theta_2-\theta_1\right)\left\{\left(C_x-C_y\rho_{xy}\right)^2-C_y^2\rho_{xy}^2\right\}+\theta_1\left\{\left(C_x\rho_{xz}-C_y\rho_{yz}\right)^2-C_y^2\rho_{yz}^2\right\}\right]$   (4.5.6)
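A convenient way to sanity-check a first-order formula such as (4.5.6) is a small Monte Carlo experiment. The sketch below (Python with NumPy; the population, design sizes and all numbers are assumed for illustration and are not from the monograph) repeatedly draws two-phase samples, computes $T_{12}^{(2)}$, and compares the empirical mean square error with the value given by (4.5.6).

```python
import numpy as np

rng = np.random.default_rng(7)

# Assumed synthetic population; none of these values come from the monograph.
N, n1, n2, reps = 5000, 500, 125, 3000
z = rng.gamma(8.0, 2.5, N)
x = 1.8 * z + rng.normal(0, 4, N)
y = 2.2 * x + rng.normal(0, 9, N)
Ybar, Zbar = y.mean(), z.mean()

def t12(sample1, sample2):
    """Kiregyera (1980) estimator computed from one two-phase draw, see (4.5.1)."""
    b_xz = np.cov(x[sample1], z[sample1])[0, 1] / np.var(z[sample1], ddof=1)
    x1, z1 = x[sample1].mean(), z[sample1].mean()
    x2, y2 = x[sample2].mean(), y[sample2].mean()
    return (y2 / x2) * (x1 + b_xz * (Zbar - z1))

errors = []
for _ in range(reps):
    s1 = rng.choice(N, n1, replace=False)
    s2 = rng.choice(s1, n2, replace=False)
    errors.append(t12(s1, s2) - Ybar)
empirical_mse = np.mean(np.square(errors))

# Theoretical value from (4.5.6).
th1, th2 = 1/n1 - 1/N, 1/n2 - 1/N
Cy, Cx, Cz = y.std(ddof=1)/Ybar, x.std(ddof=1)/x.mean(), z.std(ddof=1)/Zbar
r = np.corrcoef([y, x, z])
r_xy, r_yz, r_xz = r[0, 1], r[0, 2], r[1, 2]
v12 = Ybar**2 * (th2*Cy**2
                 + (th2 - th1)*((Cx - Cy*r_xy)**2 - (Cy*r_xy)**2)
                 + th1*((Cx*r_xz - Cy*r_yz)**2 - (Cy*r_yz)**2))
print(empirical_mse, v12)   # the two values should be of comparable magnitude
```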

4.5.2 Kiregyera's (1984) Estimator (Chand-Kiregyera Estimator)

$T_{13}^{(2)} = \bar{y}_2+b_{yx}\left(\dfrac{\bar{x}_1}{\bar{z}_1}\bar{Z}-\bar{x}_2\right)$   (4.5.7)

Using (1.1.1) in (4.5.7) we get

$T_{13}^{(2)} = \bar{Y}+e_{y_2}+\left(\beta_{yx}+e_r\right)\left[\dfrac{\bar{X}+e_{x_1}}{\bar{Z}+e_{z_1}}\bar{Z}-\bar{X}-e_{x_2}\right]$

$= \bar{Y}+e_{y_2}+\beta_{yx}\left[\left(\bar{X}+e_{x_1}\right)\left(1-\dfrac{e_{z_1}}{\bar{Z}}\right)-\bar{X}-e_{x_2}\right]$

$= \bar{Y}+e_{y_2}+\beta_{yx}\left(e_{x_1}-e_{x_2}\right)-\beta_{yx}\dfrac{\bar{X}}{\bar{Z}}e_{z_1}$

$T_{13}^{(2)}-\bar{Y} = e_{y_2}+\beta_{yx}\left(e_{x_1}-e_{x_2}\right)-\beta_{yx}\dfrac{\bar{X}}{\bar{Z}}e_{z_1}$   (4.5.8)

The mean square error of $T_{13}^{(2)}$ is

$E\left(T_{13}^{(2)}-\bar{Y}\right)^2 = E\left[e_{y_2}+\beta_{yx}\left(e_{x_1}-e_{x_2}\right)-\beta_{yx}\dfrac{\bar{X}}{\bar{Z}}e_{z_1}\right]^2$   (4.5.9)

$= E\left[e_{y_2}^2+\beta_{yx}^2\left(e_{x_1}-e_{x_2}\right)^2+\beta_{yx}^2\dfrac{\bar{X}^2}{\bar{Z}^2}e_{z_1}^2+2\beta_{yx}e_{y_2}\left(e_{x_1}-e_{x_2}\right)-2\beta_{yx}\dfrac{\bar{X}}{\bar{Z}}e_{y_2}e_{z_1}-2\beta_{yx}^2\dfrac{\bar{X}}{\bar{Z}}e_{z_1}\left(e_{x_1}-e_{x_2}\right)\right]$   (4.5.10)

Using (1.1.1) in (4.5.10) we get

$MSE\left(T_{13}^{(2)}\right) = \theta_2\bar{Y}^2C_y^2+\left(\theta_2-\theta_1\right)\beta_{yx}^2\bar{X}^2C_x^2+\theta_1\beta_{yx}^2\dfrac{\bar{X}^2}{\bar{Z}^2}\bar{Z}^2C_z^2+2\left(\theta_1-\theta_2\right)\beta_{yx}\bar{Y}\bar{X}C_yC_x\rho_{xy}-2\theta_1\beta_{yx}\dfrac{\bar{X}}{\bar{Z}}\bar{Y}\bar{Z}C_yC_z\rho_{yz}+0$   (4.5.11)

Putting the value of $\beta_{yx} = \rho_{xy}\dfrac{\bar{Y}C_y}{\bar{X}C_x}$ in (4.5.11) we get

$MSE\left(T_{13}^{(2)}\right) = \bar{Y}^2C_y^2\left[\theta_2+\left(\theta_2-\theta_1\right)\rho_{xy}^2+\theta_1\rho_{xy}^2\dfrac{C_z^2}{C_x^2}+2\left(\theta_1-\theta_2\right)\rho_{xy}^2-2\theta_1\rho_{xy}\rho_{yz}\dfrac{C_z}{C_x}\right]$

or

$= \bar{Y}^2C_y^2\left[\theta_2-\left(\theta_2-\theta_1\right)\rho_{xy}^2+\theta_1\left\{\rho_{xy}^2\dfrac{C_z^2}{C_x^2}-2\rho_{xy}\rho_{yz}\dfrac{C_z}{C_x}\right\}\right]$

or

$V_{13}^{(2)} = MSE\left(T_{13}^{(2)}\right) = \bar{Y}^2C_y^2\left[\theta_2-\left(\theta_2-\theta_1\right)\rho_{xy}^2+\theta_1\left\{\left(\rho_{xy}\dfrac{C_z}{C_x}-\rho_{yz}\right)^2-\rho_{yz}^2\right\}\right]$   (4.5.12)

4.5.3 Kiregyera's (1984) Regression-in-Regression Estimator

Kiregyera (1984) also developed a regression-in-regression estimator, i.e.

$T_{14}^{(2)} = \bar{y}_2+b_{yx}\left\{\left(\bar{x}_1-\bar{x}_2\right)+b_{xz}\left(\bar{Z}-\bar{z}_1\right)\right\}$   (4.5.13)

This may be written as

$T_{14}^{(2)} = \bar{y}_2+b_{yx}\left(\bar{x}_1-\bar{x}_2\right)+b_{yx}b_{xz}\left(\bar{Z}-\bar{z}_1\right) = \bar{Y}+e_{y_2}+\beta_{yx}\left(e_{x_1}-e_{x_2}\right)-\beta_{yx}\beta_{xz}e_{z_1}$

or

$T_{14}^{(2)}-\bar{Y} = e_{y_2}+\beta_{yx}\left(e_{x_1}-e_{x_2}\right)-\beta_{yx}\beta_{xz}e_{z_1}$

The mean square error of $T_{14}^{(2)}$ is

$E\left(T_{14}^{(2)}-\bar{Y}\right)^2 = E\left[e_{y_2}+\beta_{yx}\left(e_{x_1}-e_{x_2}\right)-\beta_{yx}\beta_{xz}e_{z_1}\right]^2$   (4.5.14)

Taking the square of the R.H.S. of (4.5.14) we get

$MSE\left(T_{14}^{(2)}\right) = E\left[e_{y_2}^2+\beta_{yx}^2\left(e_{x_1}-e_{x_2}\right)^2+\beta_{yx}^2\beta_{xz}^2e_{z_1}^2+2\beta_{yx}e_{y_2}\left(e_{x_1}-e_{x_2}\right)-2\beta_{yx}\beta_{xz}e_{y_2}e_{z_1}-2\beta_{yx}^2\beta_{xz}e_{z_1}\left(e_{x_1}-e_{x_2}\right)\right]$   (4.5.15)

Using (1.1.1) in (4.5.15) we get

$MSE\left(T_{14}^{(2)}\right) = \theta_2\bar{Y}^2C_y^2+\left(\theta_2-\theta_1\right)\beta_{yx}^2\bar{X}^2C_x^2+\theta_1\beta_{yx}^2\beta_{xz}^2\bar{Z}^2C_z^2+2\left(\theta_1-\theta_2\right)\beta_{yx}\bar{Y}C_y\bar{X}C_x\rho_{xy}-2\theta_1\beta_{yx}\beta_{xz}\bar{Y}C_y\bar{Z}C_z\rho_{yz}+0$   (4.5.16)

Putting the values of $\beta_{yx} = \rho_{xy}\dfrac{\bar{Y}C_y}{\bar{X}C_x}$ and $\beta_{xz} = \rho_{xz}\dfrac{\bar{X}C_x}{\bar{Z}C_z}$ we get

$MSE\left(T_{14}^{(2)}\right) = \bar{Y}^2C_y^2\left[\theta_2+\left(\theta_2-\theta_1\right)\rho_{xy}^2+\theta_1\rho_{xy}^2\rho_{xz}^2+2\left(\theta_1-\theta_2\right)\rho_{xy}^2-2\theta_1\rho_{xy}\rho_{xz}\rho_{yz}\right]$

or

$V_{14}^{(2)} = MSE\left(T_{14}^{(2)}\right) = \bar{Y}^2C_y^2\left[\theta_2-\left(\theta_2-\theta_1\right)\rho_{xy}^2+\theta_1\left(\rho_{xy}^2\rho_{xz}^2-2\rho_{xy}\rho_{xz}\rho_{yz}\right)\right]$   (4.5.17)
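To see how the three Kiregyera-type forms compare, the sketch below (Python; all correlations, coefficients of variation and design fractions are assumed values, not taken from the monograph) evaluates $V_{12}^{(2)}$, $V_{13}^{(2)}$ and $V_{14}^{(2)}$ from (4.5.6), (4.5.12) and (4.5.17) for one hypothetical configuration.

```python
# Hypothetical comparison of (4.5.6), (4.5.12) and (4.5.17); every value below is assumed.
Ybar, Cy, Cx, Cz = 50.0, 0.40, 0.50, 0.45
r_xy, r_yz, r_xz = 0.80, 0.55, 0.70
th1, th2 = 1/400 - 1/2000, 1/100 - 1/2000

v12 = Ybar**2 * (th2*Cy**2 + (th2-th1)*((Cx - Cy*r_xy)**2 - (Cy*r_xy)**2)
                 + th1*((Cx*r_xz - Cy*r_yz)**2 - (Cy*r_yz)**2))
v13 = Ybar**2 * Cy**2 * (th2 - (th2-th1)*r_xy**2
                         + th1*((r_xy*Cz/Cx - r_yz)**2 - r_yz**2))
v14 = Ybar**2 * Cy**2 * (th2 - (th2-th1)*r_xy**2
                         + th1*(r_xy**2*r_xz**2 - 2*r_xy*r_xz*r_yz))
print(v12, v13, v14)   # smaller value = more efficient estimator for this configuration
```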

4.5.4 Some Modifications of Kiregyera's Estimators

$T_{15}^{(2)} = \dfrac{\bar{y}_2}{\bar{z}_2}\left[\bar{z}_1+b_{zx}\left(\bar{X}-\bar{x}_1\right)\right]$   (4.5.18)

Using (1.1.1) in (4.5.18) we get

$T_{15}^{(2)} = \dfrac{\bar{Y}+e_{y_2}}{\bar{Z}+e_{z_2}}\left[\bar{Z}+e_{z_1}+\left(\beta_{zx}+e_r\right)\left(\bar{X}-\bar{X}-e_{x_1}\right)\right]$

On simplification we get

$T_{15}^{(2)} = \left(\bar{Y}+e_{y_2}\right)\left(1+\dfrac{e_{z_1}-e_{z_2}-\beta_{zx}e_{x_1}}{\bar{Z}}\right)$   (4.5.19)

or

$T_{15}^{(2)} = \bar{Y}+e_{y_2}+\dfrac{\bar{Y}}{\bar{Z}}\left(e_{z_1}-e_{z_2}\right)-\beta_{zx}\dfrac{\bar{Y}}{\bar{Z}}e_{x_1}$   (4.5.20)

Putting the value of $\beta_{zx} = \rho_{xz}\dfrac{\bar{Z}C_z}{\bar{X}C_x}$ in (4.5.20) we get

$T_{15}^{(2)}-\bar{Y} = e_{y_2}+\dfrac{\bar{Y}}{\bar{Z}}\left(e_{z_1}-e_{z_2}\right)-\rho_{xz}\dfrac{C_z}{C_x}\dfrac{\bar{Y}}{\bar{X}}e_{x_1}$   (4.5.21)

The mean square error of $T_{15}^{(2)}$ is

$MSE\left(T_{15}^{(2)}\right) = E\left(T_{15}^{(2)}-\bar{Y}\right)^2 = E\left[e_{y_2}+\dfrac{\bar{Y}}{\bar{Z}}\left(e_{z_1}-e_{z_2}\right)-\rho_{xz}\dfrac{C_z}{C_x}\dfrac{\bar{Y}}{\bar{X}}e_{x_1}\right]^2$   (4.5.22)

Taking the square of the R.H.S. of (4.5.22) we get

$MSE\left(T_{15}^{(2)}\right) = E\left[e_{y_2}^2+\dfrac{\bar{Y}^2}{\bar{Z}^2}\left(e_{z_1}-e_{z_2}\right)^2+\rho_{xz}^2\dfrac{C_z^2}{C_x^2}\dfrac{\bar{Y}^2}{\bar{X}^2}e_{x_1}^2+2\dfrac{\bar{Y}}{\bar{Z}}e_{y_2}\left(e_{z_1}-e_{z_2}\right)-2\rho_{xz}\dfrac{C_z}{C_x}\dfrac{\bar{Y}}{\bar{X}}e_{y_2}e_{x_1}-2\rho_{xz}\dfrac{C_z}{C_x}\dfrac{\bar{Y}^2}{\bar{X}\bar{Z}}e_{x_1}\left(e_{z_1}-e_{z_2}\right)\right]$   (4.5.23)

Using (1.1.1) in (4.5.23) we get

$MSE\left(T_{15}^{(2)}\right) = \theta_2\bar{Y}^2C_y^2+\left(\theta_2-\theta_1\right)\bar{Y}^2C_z^2+\theta_1\rho_{xz}^2C_z^2\bar{Y}^2+2\left(\theta_1-\theta_2\right)\bar{Y}^2C_yC_z\rho_{yz}-2\theta_1\rho_{xz}\bar{Y}^2C_yC_z\rho_{xy}+0$   (4.5.24)

On simplification we get

$MSE\left(T_{15}^{(2)}\right) = \bar{Y}^2\left[\theta_2C_y^2+\left(\theta_2-\theta_1\right)\left\{C_z^2-2C_yC_z\rho_{yz}\right\}+\theta_1\left\{\rho_{xz}^2C_z^2-2\rho_{xz}C_yC_z\rho_{xy}\right\}\right]$

$V_{15}^{(2)} = MSE\left(T_{15}^{(2)}\right) = \bar{Y}^2\left[\theta_2C_y^2+\left(\theta_2-\theta_1\right)\left\{\left(C_z-C_y\rho_{yz}\right)^2-C_y^2\rho_{yz}^2\right\}+\theta_1\left\{\left(C_z\rho_{xz}-C_y\rho_{xy}\right)^2-C_y^2\rho_{xy}^2\right\}\right]$   (4.5.25)

4.5.5 Second Modification of Kiregyera's Estimator

$T_{16}^{(2)} = \bar{y}_2+b_{yz}\left(\dfrac{\bar{z}_1}{\bar{x}_1}\bar{X}-\bar{z}_2\right)$   (4.5.26)

Using (1.1.1) in (4.5.26) we get

$T_{16}^{(2)} = \bar{Y}+e_{y_2}+\left(\beta_{yz}+e_r\right)\left[\dfrac{\bar{Z}+e_{z_1}}{\bar{X}+e_{x_1}}\bar{X}-\bar{Z}-e_{z_2}\right]$

or

$T_{16}^{(2)}-\bar{Y} = e_{y_2}+\beta_{yz}\left[\left(\bar{Z}+e_{z_1}\right)\left(1-\dfrac{e_{x_1}}{\bar{X}}\right)-\bar{Z}-e_{z_2}\right] = e_{y_2}+\beta_{yz}\left[\bar{Z}+e_{z_1}-\dfrac{\bar{Z}}{\bar{X}}e_{x_1}-\bar{Z}-e_{z_2}\right]$

or

$T_{16}^{(2)}-\bar{Y} = e_{y_2}+\beta_{yz}\left(e_{z_1}-e_{z_2}\right)-\beta_{yz}\dfrac{\bar{Z}}{\bar{X}}e_{x_1}$   (4.5.27)

The mean square error of $T_{16}^{(2)}$ is

$MSE\left(T_{16}^{(2)}\right) = E\left(T_{16}^{(2)}-\bar{Y}\right)^2 = E\left[e_{y_2}+\beta_{yz}\left(e_{z_1}-e_{z_2}\right)-\beta_{yz}\dfrac{\bar{Z}}{\bar{X}}e_{x_1}\right]^2$   (4.5.28)

Taking the square of the R.H.S. of (4.5.28) we get

$MSE\left(T_{16}^{(2)}\right) = E\left[e_{y_2}^2+\beta_{yz}^2\left(e_{z_1}-e_{z_2}\right)^2+\beta_{yz}^2\dfrac{\bar{Z}^2}{\bar{X}^2}e_{x_1}^2+2\beta_{yz}e_{y_2}\left(e_{z_1}-e_{z_2}\right)-2\beta_{yz}\dfrac{\bar{Z}}{\bar{X}}e_{y_2}e_{x_1}-2\beta_{yz}^2\dfrac{\bar{Z}}{\bar{X}}e_{x_1}\left(e_{z_1}-e_{z_2}\right)\right]$   (4.5.29)

Using (1.1.1) in (4.5.29) we get

$MSE\left(T_{16}^{(2)}\right) = \theta_2\bar{Y}^2C_y^2+\left(\theta_2-\theta_1\right)\beta_{yz}^2\bar{Z}^2C_z^2+\theta_1\beta_{yz}^2\dfrac{\bar{Z}^2}{\bar{X}^2}\bar{X}^2C_x^2+2\left(\theta_1-\theta_2\right)\beta_{yz}\bar{Y}\bar{Z}C_yC_z\rho_{yz}-2\theta_1\beta_{yz}\dfrac{\bar{Z}}{\bar{X}}\bar{Y}\bar{X}C_yC_x\rho_{xy}+0$   (4.5.30)

Putting the value of $\beta_{yz} = \rho_{yz}\dfrac{\bar{Y}C_y}{\bar{Z}C_z}$ in (4.5.30) we get

$MSE\left(T_{16}^{(2)}\right) = \bar{Y}^2C_y^2\left[\theta_2+\left(\theta_2-\theta_1\right)\rho_{yz}^2+\theta_1\rho_{yz}^2\dfrac{C_x^2}{C_z^2}+2\left(\theta_1-\theta_2\right)\rho_{yz}^2-2\theta_1\rho_{yz}\rho_{xy}\dfrac{C_x}{C_z}\right]$

On simplification we get

$= \bar{Y}^2C_y^2\left[\theta_2-\left(\theta_2-\theta_1\right)\rho_{yz}^2+\theta_1\left\{\rho_{yz}^2\dfrac{C_x^2}{C_z^2}-2\rho_{yz}\rho_{xy}\dfrac{C_x}{C_z}\right\}\right]$

or

$V_{16}^{(2)} = MSE\left(T_{16}^{(2)}\right) = \bar{Y}^2C_y^2\left[\theta_2-\left(\theta_2-\theta_1\right)\rho_{yz}^2+\theta_1\left\{\left(\rho_{yz}\dfrac{C_x}{C_z}-\rho_{xy}\right)^2-\rho_{xy}^2\right\}\right]$   (4.5.31)

4.6 Sahoo et al. Estimators

The following estimators will be considered:

a) $T_{17}^{(2)} = \bar{y}_2+b_{yx}\left(\bar{x}_1-\bar{x}_2\right)+b_{yz}\left(\bar{Z}-\bar{z}_1\right)$

b) $T_{18}^{(2)} = \bar{y}_2+b_{yz}\left\{\left(\bar{z}_1-\bar{z}_2\right)-b_{zx}\left(\bar{x}_1-\bar{X}\right)\right\}$

4.6.1 Sahoo et al. (1993) Estimator

Sahoo et al. (1993) developed another type of regression estimator, i.e.

$T_{17}^{(2)} = \bar{y}_2+b_{yx}\left(\bar{x}_1-\bar{x}_2\right)+b_{yz}\left(\bar{Z}-\bar{z}_1\right)$   (4.6.1)

Using (1.1.1) in (4.6.1) we get

$T_{17}^{(2)} = \bar{Y}+e_{y_2}+\left(\beta_{yx}+e_r\right)\left(e_{x_1}-e_{x_2}\right)+\left(\beta_{yz}+e_r\right)\left(-e_{z_1}\right) = \bar{Y}+e_{y_2}+\beta_{yx}\left(e_{x_1}-e_{x_2}\right)-\beta_{yz}e_{z_1}$

or

$T_{17}^{(2)}-\bar{Y} = e_{y_2}+\beta_{yx}\left(e_{x_1}-e_{x_2}\right)-\beta_{yz}e_{z_1}$   (4.6.2)

The mean square error of $T_{17}^{(2)}$ is

$MSE\left(T_{17}^{(2)}\right) = E\left(T_{17}^{(2)}-\bar{Y}\right)^2 = E\left[e_{y_2}+\beta_{yx}\left(e_{x_1}-e_{x_2}\right)-\beta_{yz}e_{z_1}\right]^2$   (4.6.3)

Taking the square of the R.H.S. of (4.6.3) we get

$= E\left[e_{y_2}^2+\beta_{yx}^2\left(e_{x_1}-e_{x_2}\right)^2+\beta_{yz}^2e_{z_1}^2+2\beta_{yx}e_{y_2}\left(e_{x_1}-e_{x_2}\right)-2\beta_{yz}e_{y_2}e_{z_1}-2\beta_{yx}\beta_{yz}e_{z_1}\left(e_{x_1}-e_{x_2}\right)\right]$   (4.6.4)

Using (1.1.1) in (4.6.4) we get

$MSE\left(T_{17}^{(2)}\right) = \theta_2\bar{Y}^2C_y^2+\left(\theta_2-\theta_1\right)\beta_{yx}^2\bar{X}^2C_x^2+\theta_1\beta_{yz}^2\bar{Z}^2C_z^2+2\left(\theta_1-\theta_2\right)\beta_{yx}\bar{Y}C_y\bar{X}C_x\rho_{xy}-2\theta_1\beta_{yz}\bar{Y}C_y\bar{Z}C_z\rho_{yz}+0$   (4.6.5)

Putting the values of $\beta_{yx} = \rho_{xy}\dfrac{\bar{Y}C_y}{\bar{X}C_x}$ and $\beta_{yz} = \rho_{yz}\dfrac{\bar{Y}C_y}{\bar{Z}C_z}$ in (4.6.5) we get

$MSE\left(T_{17}^{(2)}\right) = \bar{Y}^2C_y^2\left[\theta_2+\left(\theta_2-\theta_1\right)\rho_{xy}^2+\theta_1\rho_{yz}^2+2\left(\theta_1-\theta_2\right)\rho_{xy}^2-2\theta_1\rho_{yz}^2\right]$

or

$V_{17}^{(2)} = MSE\left(T_{17}^{(2)}\right) = \bar{Y}^2C_y^2\left[\theta_2-\left(\theta_2-\theta_1\right)\rho_{xy}^2-\theta_1\rho_{yz}^2\right]$   (4.6.6)
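In (4.6.6) the term $-\theta_1\rho_{yz}^2$ is the reduction attributable to the second auxiliary variable z. The short sketch below (Python; all parameter values are assumed for illustration, not taken from the monograph) evaluates $V_{17}^{(2)}$ with and without that term to show the size of the gain.

```python
# Hypothetical evaluation of (4.6.6); every value below is assumed.
Ybar, Cy = 50.0, 0.40
r_xy, r_yz = 0.80, 0.55
th1, th2 = 1/400 - 1/2000, 1/100 - 1/2000

v17 = Ybar**2 * Cy**2 * (th2 - (th2 - th1) * r_xy**2 - th1 * r_yz**2)
v17_without_z = Ybar**2 * Cy**2 * (th2 - (th2 - th1) * r_xy**2)   # drop the z contribution
print(v17, v17_without_z, "gain from z:", v17_without_z - v17)
```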


4.6.2 Modification of Sahoo et al. Estimator

$T_{18}^{(2)} = \bar{y}_2+b_{yz}\left\{\left(\bar{z}_1-\bar{z}_2\right)-b_{zx}\left(\bar{x}_1-\bar{X}\right)\right\}$   (4.6.7)

$T_{18}^{(2)} = \bar{y}_2+b_{yz}\left(\bar{z}_1-\bar{z}_2\right)-b_{yz}b_{zx}\left(\bar{x}_1-\bar{X}\right),$   (4.6.8)

where $b_{yz}$ is the sample estimate from the second phase and $b_{zx}$ the sample estimate from the first phase. Using (1.1.1) in (4.6.8) we get

$T_{18}^{(2)} = \bar{Y}+e_{y_2}+\left(\beta_{yz}+e_r\right)\left(e_{z_1}-e_{z_2}\right)-\left(\beta_{yz}+e_r\right)\left(\beta_{zx}+e_r\right)e_{x_1}$

or, on simplification,

$T_{18}^{(2)}-\bar{Y} = e_{y_2}+\beta_{yz}\left(e_{z_1}-e_{z_2}\right)-\beta_{yz}\beta_{zx}e_{x_1}$   (4.6.9)

The mean square error of $T_{18}^{(2)}$ is

$MSE\left(T_{18}^{(2)}\right) = E\left(T_{18}^{(2)}-\bar{Y}\right)^2 = E\left[e_{y_2}+\beta_{yz}\left(e_{z_1}-e_{z_2}\right)-\beta_{yz}\beta_{zx}e_{x_1}\right]^2$   (4.6.10)

Squaring the R.H.S. of (4.6.10) we get

$MSE\left(T_{18}^{(2)}\right) = E\left[e_{y_2}^2+\beta_{yz}^2\left(e_{z_1}-e_{z_2}\right)^2+\beta_{yz}^2\beta_{zx}^2e_{x_1}^2+2\beta_{yz}e_{y_2}\left(e_{z_1}-e_{z_2}\right)-2\beta_{yz}\beta_{zx}e_{y_2}e_{x_1}-2\beta_{yz}^2\beta_{zx}e_{x_1}\left(e_{z_1}-e_{z_2}\right)\right]$   (4.6.11)

Using (1.1.1) in (4.6.11) we get

$MSE\left(T_{18}^{(2)}\right) = \theta_2\bar{Y}^2C_y^2+\left(\theta_2-\theta_1\right)\beta_{yz}^2\bar{Z}^2C_z^2+\theta_1\beta_{yz}^2\beta_{zx}^2\bar{X}^2C_x^2+2\left(\theta_1-\theta_2\right)\beta_{yz}\bar{Y}\bar{Z}C_yC_z\rho_{yz}-2\theta_1\beta_{yz}\beta_{zx}\bar{Y}\bar{X}C_yC_x\rho_{xy}+0$   (4.6.12)

Putting the values of $\beta_{yz} = \rho_{yz}\dfrac{\bar{Y}C_y}{\bar{Z}C_z}$ and $\beta_{zx} = \rho_{xz}\dfrac{\bar{Z}C_z}{\bar{X}C_x}$ in (4.6.12) we get

$MSE\left(T_{18}^{(2)}\right) = \bar{Y}^2C_y^2\left[\theta_2+\left(\theta_2-\theta_1\right)\rho_{yz}^2+\theta_1\rho_{yz}^2\rho_{xz}^2+2\left(\theta_1-\theta_2\right)\rho_{yz}^2-2\theta_1\rho_{yz}\rho_{xz}\rho_{xy}\right]$   (4.6.13)

On simplification we get

$MSE\left(T_{18}^{(2)}\right) = \bar{Y}^2C_y^2\left[\theta_2-\left(\theta_2-\theta_1\right)\rho_{yz}^2+\theta_1\left(\rho_{yz}^2\rho_{xz}^2-2\rho_{yz}\rho_{xz}\rho_{xy}\right)\right]$

$V_{18}^{(2)} = MSE\left(T_{18}^{(2)}\right) = \bar{Y}^2C_y^2\left[\theta_2-\left(\theta_2-\theta_1\right)\rho_{yz}^2+\theta_1\rho_{yz}\rho_{xz}\left(\rho_{yz}\rho_{xz}-2\rho_{xy}\right)\right]$   (4.6.14)
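The two Sahoo-type arrangements can be compared directly. The sketch below (Python; the correlations and design fractions are assumed, not from the text) evaluates $V_{17}^{(2)}$ from (4.6.6) and $V_{18}^{(2)}$ from (4.6.14) and reports which arrangement would be preferred for that configuration.

```python
# Hypothetical comparison of (4.6.6) and (4.6.14); all parameter values are assumed.
Ybar, Cy = 50.0, 0.40
r_xy, r_yz, r_xz = 0.80, 0.55, 0.70
th1, th2 = 1/400 - 1/2000, 1/100 - 1/2000

v17 = Ybar**2 * Cy**2 * (th2 - (th2 - th1)*r_xy**2 - th1*r_yz**2)
v18 = Ybar**2 * Cy**2 * (th2 - (th2 - th1)*r_yz**2
                         + th1*r_yz*r_xz*(r_yz*r_xz - 2*r_xy))
print("V17 =", v17, "V18 =", v18,
      "-> prefer", "T17" if v17 < v18 else "T18", "here")
```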

4.7 Roy's (2003) Unbiased Regression Estimator

Roy (2003) proposed an unbiased regression-type estimator that uses partial information on the auxiliary variables:

$T_{19}^{(2)} = \bar{y}_2+k_1\left[\left\{\bar{x}_1+k_2\left(\bar{Z}-\bar{z}_1\right)\right\}-\left\{\bar{x}_2+k_3\left(\bar{Z}-\bar{z}_2\right)\right\}\right]$   (4.7.1)

$= \bar{y}_2+k_1\bar{x}_1+k_1k_2\left(\bar{Z}-\bar{z}_1\right)-k_1\bar{x}_2-k_1k_3\left(\bar{Z}-\bar{z}_2\right) = \bar{y}_2+k_1\left(\bar{x}_1-\bar{x}_2\right)+k_1k_2\left(\bar{Z}-\bar{z}_1\right)-k_1k_3\left(\bar{Z}-\bar{z}_2\right)$

$= \bar{y}_2+\alpha\left(\bar{x}_1-\bar{x}_2\right)+\beta\left(\bar{Z}-\bar{z}_1\right)+\gamma\left(\bar{Z}-\bar{z}_2\right)$   (4.7.2)

where $\alpha = k_1$, $\beta = k_1k_2$ and $\gamma = -k_1k_3$.

Using (1.1.1),

$T_{19}^{(2)} = \bar{Y}+e_{y_2}+\alpha\left(e_{x_1}-e_{x_2}\right)+\beta\left(\bar{Z}-\bar{Z}-e_{z_1}\right)+\gamma\left(\bar{Z}-\bar{Z}-e_{z_2}\right) = \bar{Y}+e_{y_2}+\alpha\left(e_{x_1}-e_{x_2}\right)-\beta e_{z_1}-\gamma e_{z_2}$

$T_{19}^{(2)}-\bar{Y} = e_{y_2}+\alpha\left(e_{x_1}-e_{x_2}\right)-\beta e_{z_1}-\gamma e_{z_2}$

The MSE of $T_{19}^{(2)}$ is

$E\left(T_{19}^{(2)}-\bar{Y}\right)^2 = E\left[e_{y_2}+\alpha\left(e_{x_1}-e_{x_2}\right)-\beta e_{z_1}-\gamma e_{z_2}\right]^2$

To obtain the optimum values of $\alpha$, $\beta$ and $\gamma$ we differentiate the MSE with respect to each constant in turn and equate the derivatives to zero. Differentiating with respect to $\alpha$,

$E\left[\left\{e_{y_2}+\alpha\left(e_{x_1}-e_{x_2}\right)-\beta e_{z_1}-\gamma e_{z_2}\right\}\left(e_{x_1}-e_{x_2}\right)\right] = 0$

$\left(\theta_1-\theta_2\right)\bar{Y}\bar{X}C_yC_x\rho_{xy}+\alpha\left(\theta_2-\theta_1\right)\bar{X}^2C_x^2-\beta\left(0\right)-\gamma\left(\theta_1-\theta_2\right)\bar{Z}\bar{X}C_zC_x\rho_{xz} = 0$

which, on dividing by $\left(\theta_2-\theta_1\right)\bar{X}C_x$, gives

$\alpha\bar{X}C_x+\gamma\bar{Z}C_z\rho_{xz}-\bar{Y}C_y\rho_{xy} = 0$   (4.7.3)

Now differentiating with respect to $\beta$,

$E\left[\left\{e_{y_2}+\alpha\left(e_{x_1}-e_{x_2}\right)-\beta e_{z_1}-\gamma e_{z_2}\right\}e_{z_1}\right] = 0$

$\theta_1\bar{Y}\bar{Z}C_yC_z\rho_{yz}+\alpha\left(0\right)-\beta\theta_1\bar{Z}^2C_z^2-\gamma\theta_1\bar{Z}^2C_z^2 = 0$

$\beta\bar{Z}C_z+\gamma\bar{Z}C_z-\bar{Y}C_y\rho_{yz} = 0$   (4.7.4)

Now differentiating with respect to $\gamma$,

$E\left[\left\{e_{y_2}+\alpha\left(e_{x_1}-e_{x_2}\right)-\beta e_{z_1}-\gamma e_{z_2}\right\}e_{z_2}\right] = 0$

$\theta_2\bar{Y}\bar{Z}C_yC_z\rho_{yz}+\alpha\left(\theta_1-\theta_2\right)\bar{Z}\bar{X}C_zC_x\rho_{xz}-\beta\theta_1\bar{Z}^2C_z^2-\gamma\theta_2\bar{Z}^2C_z^2 = 0$

or, dividing by $\bar{Z}C_z$,

$\alpha\left(\theta_1-\theta_2\right)\bar{X}C_x\rho_{xz}-\beta\theta_1\bar{Z}C_z-\gamma\theta_2\bar{Z}C_z+\theta_2\bar{Y}C_y\rho_{yz} = 0$   (4.7.5)
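Equations (4.7.3)-(4.7.5) form a linear system in $\alpha$, $\beta$ and $\gamma$. The sketch below (Python with NumPy; the population means, coefficients of variation, correlations and design fractions are assumed purely for illustration) sets the system up in matrix form and solves it for the optimum constants.

```python
import numpy as np

# Hypothetical inputs; every value below is assumed, not taken from the monograph.
Ybar, Xbar, Zbar = 50.0, 42.0, 18.0
Cy, Cx, Cz = 0.40, 0.50, 0.45
r_xy, r_yz, r_xz = 0.80, 0.55, 0.70
th1, th2 = 1/400 - 1/2000, 1/100 - 1/2000

# Coefficient matrix and right-hand side of (4.7.3)-(4.7.5) in the unknowns (alpha, beta, gamma).
A = np.array([
    [Xbar*Cx,                  0.0,          Zbar*Cz*r_xz],   # (4.7.3)
    [0.0,                      Zbar*Cz,      Zbar*Cz],        # (4.7.4)
    [(th1-th2)*Xbar*Cx*r_xz,  -th1*Zbar*Cz, -th2*Zbar*Cz],    # (4.7.5)
])
b = np.array([Ybar*Cy*r_xy, Ybar*Cy*r_yz, -th2*Ybar*Cy*r_yz])

alpha, beta, gamma = np.linalg.solve(A, b)
print(alpha, beta, gamma)
# The corresponding constants of (4.7.1) follow as k1 = alpha, k2 = beta/alpha, k3 = -gamma/alpha.
```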
