
TWO PHASE SAMPLING

1.0 INTRODUCTION
In sample surveys, information on an auxiliary variate x is often required, either for
estimation (e.g., ratio, difference and regression estimation), for selection (e.g., PPS), or for
stratification, to increase the efficiency of the estimator. When such information is lacking and
it is relatively cheap to obtain information on x, we can take a large preliminary sample of
size $n'$ to estimate $\bar{x}_N$ or the distribution of x, as the case may be, and only a small
sample (generally a sub-sample) to measure the variate y, the character of interest.
This means devoting part of the resources to the large preliminary sample and therefore
reducing the sample size for measuring the study variate y. This technique is known as
two-phase sampling and was first proposed by Neyman (1938).
Difference between two-phase and two-stage sampling:
The main difference is that in two-phase sampling it is necessary to have a complete sampling
frame of the units, whereas in two-stage sampling a sampling frame of the second-stage units
is necessary only for the sample units selected at the first stage.
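
The selection scheme described above can be sketched in Python. This is a minimal illustration only: the function name `two_phase_sample`, the frame of 500 unit labels and the seed are assumptions made for the example, and both phases are drawn by simple random sampling without replacement.

```python
import random

def two_phase_sample(frame, n_first, n_second, seed=None):
    """Draw a first-phase sample from the complete frame, then a sub-sample of it.

    frame    : list of unit labels (the complete sampling frame)
    n_first  : size n' of the large preliminary sample (x is observed)
    n_second : size n of the sub-sample (y is observed)
    """
    rng = random.Random(seed)
    first_phase = rng.sample(frame, n_first)          # SRSWOR from the frame
    second_phase = rng.sample(first_phase, n_second)  # SRSWOR from the first phase
    return first_phase, second_phase

# Hypothetical frame of 500 units, with n' = 40 and n = 20
first, second = two_phase_sample(list(range(1, 501)), 40, 20, seed=1)
print(len(first), len(second))
```

Every second-phase unit is also a first-phase unit, so x is available for all of the $n'$ units and y only for the $n$ sub-sampled ones.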
2.0 TWO PHASE SAMPLING FOR RATIO ESTIMATOR
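The two-phase sampling technique consists of taking a large preliminary sample of size $n'$ to
estimate the population mean $\bar{x}_N$, while a sub-sample of size $n$ is drawn from the $n'$
units to observe the character under study. The simplest biased ratio estimator based on a
sample of size $n'$ is given by

$$\bar{y}_{Rd} = \frac{\bar{y}_n}{\bar{x}_n}\,\bar{x}_{n'} = R_n\,\bar{x}_{n'} \qquad (1)$$

Its relative bias is

$$\text{Rel. Bias} = B_1(\bar{y}_{Rd}) = \frac{E(\bar{y}_{Rd}) - \bar{y}_N}{\bar{y}_N} = \left(\frac{1}{n} - \frac{1}{n'}\right)\left(C_x^2 - \rho\, C_x C_y\right) \qquad (2)$$

where $C_x$ and $C_y$ are the coefficients of variation of x and y and $\rho$ is the correlation
coefficient between them. The bias is negligible if the sample size $n$ is sufficiently large, and
it is zero to the first degree of approximation if the regression of y on x is linear and passes
through the origin.

The variance of the estimator is

$$V(\bar{y}_{Rd}) = \left(\frac{1}{n'} - \frac{1}{N}\right) S_y^2 + \left(\frac{1}{n} - \frac{1}{n'}\right)\left(S_y^2 + R_N^2 S_x^2 - 2 R_N S_{xy}\right) \qquad (3)$$

The estimate of variance is given as

$$\text{Est.}V(\bar{y}_{Rd}) = v(\bar{y}_{Rd}) = \left(\frac{1}{n'} - \frac{1}{N}\right) s_y^2 + \left(\frac{1}{n} - \frac{1}{n'}\right)\left(s_y^2 + R_n^2 s_x^2 - 2 R_n s_{xy}\right) \qquad (4)$$

For large N,

$$\text{Est.}V(\bar{y}_{Rd}) = v(\bar{y}_{Rd}) = \frac{s_y^2}{n'} + \left(\frac{1}{n} - \frac{1}{n'}\right)\left(s_y^2 + R_n^2 s_x^2 - 2 R_n s_{xy}\right) \qquad (5)$$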
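
As a hedged illustration of equations (1) and (5), the following Python sketch computes the two-phase ratio estimate and its large-N variance estimate. The function name `ratio_two_phase` and the small data set are hypothetical, chosen so that y is exactly proportional to x.

```python
from statistics import mean, variance

def ratio_two_phase(x_first, x_sub, y_sub):
    """Two-phase ratio estimate, eq. (1), and its large-N variance estimate, eq. (5).

    x_first      : x values on all n' first-phase units
    x_sub, y_sub : paired x and y values on the n sub-sampled units
    """
    n_first, n = len(x_first), len(x_sub)
    x_bar_first, x_bar, y_bar = mean(x_first), mean(x_sub), mean(y_sub)
    r_n = y_bar / x_bar                       # sample ratio R_n
    s_y2, s_x2 = variance(y_sub), variance(x_sub)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_sub, y_sub)) / (n - 1)
    estimate = r_n * x_bar_first
    v = s_y2 / n_first + (1 / n - 1 / n_first) * (
        s_y2 + r_n**2 * s_x2 - 2 * r_n * s_xy
    )
    return estimate, v

# Hypothetical data: y = x / 2 exactly, so the second term of eq. (5) vanishes
est, v = ratio_two_phase([2, 4, 6, 8, 10, 12], [2, 4, 6], [1, 2, 3])
print(est, v)
```

When y is exactly proportional to x, only the first-phase term $s_y^2/n'$ remains, which is the best case for the ratio estimator.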

Prepared by Dr. V. K. Dwivedi, Department of Statistics, UB for STA 453: Sampling Theory and Applications

3.0 TWO PHASE SAMPLING FOR REGRESSION ESTIMATOR


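The two-phase sampling technique consists of taking a large preliminary sample of size $n'$ to
estimate the population mean $\bar{x}_N$, while a sub-sample of size $n$ is drawn from the $n'$
units to observe the character under study. The simplest biased regression estimator based on
a sample of size $n'$ is given by

$$\bar{y}_{ld} = \bar{y}_n + b\,(\bar{x}_{n'} - \bar{x}_n) \qquad (6)$$

where $b$ is the sample regression coefficient of y on x computed from the sub-sample.

The variance of the estimate is given as

$$V(\bar{y}_{ld}) = \left(\frac{1}{n'} - \frac{1}{N}\right) S_y^2 + \left(\frac{1}{n} - \frac{1}{n'}\right)\left(1 - \rho^2\right) S_y^2 \qquad (7)$$

The first term is the variance of the mean of a one-phase sample in which y is measured on all
$n'$ units, so the variance in two-phase sampling is larger than that in one-phase sampling by
the second term. It should however be noted that collecting information on the main variate
(y) for all the $n'$ units in the first phase may be expensive. Therefore the cost of two-phase
sampling is expected to be less than that of one-phase sampling.

The estimate of variance is given as

$$\text{Est.}V(\bar{y}_{ld}) = v(\bar{y}_{ld}) = \left(\frac{1}{n'} - \frac{1}{N}\right) s_y^2 + \left(\frac{1}{n} - \frac{1}{n'}\right)\left(1 - r^2\right) s_y^2 \qquad (8)$$

For large N,

$$\text{Est.}V(\bar{y}_{ld}) = v(\bar{y}_{ld}) = \frac{s_y^2}{n'} + \left(\frac{1}{n} - \frac{1}{n'}\right)\left(1 - r^2\right) s_y^2 \qquad (9)$$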
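
A minimal Python sketch of equations (6) and (9) follows. The function name `regression_two_phase` and the small data set are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean, variance

def regression_two_phase(x_first, x_sub, y_sub):
    """Two-phase regression estimate, eq. (6), and large-N variance estimate, eq. (9)."""
    n_first, n = len(x_first), len(x_sub)
    x_bar_first, x_bar, y_bar = mean(x_first), mean(x_sub), mean(y_sub)
    s_y2, s_x2 = variance(y_sub), variance(x_sub)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_sub, y_sub)) / (n - 1)
    b = s_xy / s_x2                     # sample regression coefficient of y on x
    r2 = s_xy**2 / (s_x2 * s_y2)        # squared sample correlation coefficient
    estimate = y_bar + b * (x_bar_first - x_bar)
    v = s_y2 / n_first + (1 / n - 1 / n_first) * (1 - r2) * s_y2
    return estimate, v

# Hypothetical data: n' = 8 first-phase units, n = 4 sub-sampled units
est, v = regression_two_phase([1, 2, 3, 4, 5, 6, 7, 8], [1, 2, 3, 4], [2, 3, 5, 6])
print(est, v)
```

Here the sub-sample mean of x (2.5) is below the first-phase mean (4.5), so the estimator adjusts $\bar{y}_n$ upward by $b$ times the difference.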

NUMERICAL EXAMPLE (Hypothetical Values)


The following data, relating to the yield of maize, have been taken from a crop-cutting survey.
In all, 40 random cuts were taken and the yield of maize was observed on the day of harvest. On a
sub-sample of 20 cuts out of the 40, dry yield was also noted. The results are given below.
Sl. No.      Harvest yield (in 100 kg)    Dry yield (in 100 kg)
of cut       x                            y
1            16.8                         15.2
2            12.7                         11.8
3            18.8                         17.5
4            13.9                         12.5
5            11.3                         10.4
6            10.9                         10.1
7            12.5                         11.2
8            17.4                         15.8
9            14.1                         13.0
10           11.9                         10.8
11           13.4                         12.3
12           13.5                         12.4
13           8.3                          7.6
14           13.7                         12.5
15           14.6                         13.3
16           14.5                         13.5
17           17.1                         16.2
18           14.5                         13.5
19           11.4                         10.3
20           14.0                         12.8
Sub-total    $\sum x_i = 275.3$           $\sum y_i = 252.7$
21           8.7                          -
22           11.6                         -
23           11.5                         -
24           14.4                         -
25           17.8                         -
26           8.4                          -
27           8.7                          -
28           14.6                         -
29           12.1                         -
30           7.9                          -
31           8.9                          -
32           11.1                         -
33           13.0                         -
34           10.5                         -
35           14.2                         -
36           12.7                         -
37           11.9                         -
38           15.5                         -
39           17.1                         -
40           10.9                         -
Total        $\sum x'_i = 516.8$

(i)  Estimate the average dry yield per hectare of dry maize along with its sampling
     error, utilizing the information on both harvest yield and dry yield, using (a) the
     ratio estimator in two-phase sampling, and (b) the regression estimator in two-phase
     sampling.
(ii) What would have been the loss in precision had we neglected the additional
     information on harvest yield and used only the sub-sample of 20 cuts for which dry
     yield is available?

CALCULATION
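Given: $n = 20$; $n' = 40$.

Calculate the following:

$$\sum_{i=1}^{n} y_i^2 = 3296.49; \qquad \sum_{i=1}^{n} x_i^2 = 3906.53; \qquad \sum_{i=1}^{n} x_i y_i = 3588.19$$

$$\bar{x}_{n'} = \frac{1}{n'}\sum_{i=1}^{n'} x'_i = \frac{516.8}{40} = 12.92; \qquad \bar{x}_n = \frac{275.3}{20} = 13.77; \qquad \bar{y}_n = \frac{252.7}{20} = 12.64$$

$$R_n = \frac{\bar{y}_n}{\bar{x}_n} = 0.9179$$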

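Using the appropriate formulae, calculate the following:

$$s_y^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - n\,\bar{y}_n^2\right); \qquad s_x^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\,\bar{x}_n^2\right); \qquad s_{xy} = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i y_i - n\,\bar{x}_n \bar{y}_n\right)$$

$$s_y^2 = 5.45; \qquad s_x^2 = 6.16; \qquad s_{xy} = 5.78$$

$$b = \frac{s_{xy}}{s_x^2} = 0.9380; \qquad r = \frac{s_{xy}}{s_x s_y} = \frac{s_{xy}}{\sqrt{s_x^2\, s_y^2}} = 0.9968$$

Here $b$ and $r$ are evaluated before the sample moments are rounded to two decimals.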
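
The summary statistics can be recomputed directly from the 20 paired cuts in the table. The sketch below is one way to do it in Python; the variable names are chosen here for illustration.

```python
from math import sqrt

# Harvest yield x and dry yield y (in 100 kg) for cuts 1 to 20
x = [16.8, 12.7, 18.8, 13.9, 11.3, 10.9, 12.5, 17.4, 14.1, 11.9,
     13.4, 13.5, 8.3, 13.7, 14.6, 14.5, 17.1, 14.5, 11.4, 14.0]
y = [15.2, 11.8, 17.5, 12.5, 10.4, 10.1, 11.2, 15.8, 13.0, 10.8,
     12.3, 12.4, 7.6, 12.5, 13.3, 13.5, 16.2, 13.5, 10.3, 12.8]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sample variances and covariance with divisor (n - 1)
s_x2 = (sum(v * v for v in x) - n * x_bar**2) / (n - 1)
s_y2 = (sum(v * v for v in y) - n * y_bar**2) / (n - 1)
s_xy = (sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar) / (n - 1)

b = s_xy / s_x2                 # regression coefficient of y on x
r = s_xy / sqrt(s_x2 * s_y2)    # correlation coefficient

print(round(s_y2, 4), round(s_x2, 4), round(s_xy, 4), round(b, 4), round(r, 4))
```

Working from the raw data avoids compounding rounding error, which matters here because $1 - r^2$ is very small.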

(i) Estimate the average dry yield per hectare of dry maize

In the variance calculations the sample moments are carried to four decimals
($s_y^2 = 5.4540$, $s_x^2 = 6.1592$, $s_{xy} = 5.7776$) to limit rounding error. Yields are in
units of 100 kg, so variances are in units of $10^4\ (\text{kg/hectare})^2$.

(a) Estimate of the average dry yield per hectare of dry maize by the ratio method of
estimation in two-phase sampling:

$$\bar{y}_{Rd} = \frac{\bar{y}_n}{\bar{x}_n}\,\bar{x}_{n'} = R_n\,\bar{x}_{n'} = 0.9179 \times 12.92 = 11.86 \times 100\ \text{kg/hectare}$$

$$\text{Est.}V(\bar{y}_{Rd}) = v(\bar{y}_{Rd}) = \frac{s_y^2}{n'} + \left(\frac{1}{n} - \frac{1}{n'}\right)\left(s_y^2 + R_n^2 s_x^2 - 2 R_n s_{xy}\right)$$

$$= \frac{5.4540}{40} + \left(\frac{1}{20} - \frac{1}{40}\right)\left(5.4540 + 0.9179^2 \times 6.1592 - 2 \times 0.9179 \times 5.7776\right) = 0.1373 \times 10^4$$

(b) Estimate of the average dry yield per hectare of dry maize by the regression method of
estimation in two-phase sampling:

$$\bar{y}_{ld} = \bar{y}_n + b\,(\bar{x}_{n'} - \bar{x}_n) = 12.64 + 0.9380\,(12.92 - 13.77) = 11.84 \times 100\ \text{kg/hectare}$$

$$\text{Est.}V(\bar{y}_{ld}) = v(\bar{y}_{ld}) = \frac{s_y^2}{n'} + \left(\frac{1}{n} - \frac{1}{n'}\right)\left(1 - r^2\right) s_y^2$$

$$= \frac{5.4540}{40} + \left(\frac{1}{20} - \frac{1}{40}\right)\left(1 - 0.9968^2\right) \times 5.4540 = 0.1372 \times 10^4$$

(c) Estimate of the variance of the average dry yield per hectare of dry maize by SRS wor,
using only the sub-sample of 20 cuts:

$$\text{Est.}V(\bar{y}_n) = v(\bar{y}_n) = \frac{s_y^2}{n} = \frac{5.4540}{20} = 0.2727 \times 10^4$$

(ii) What would have been the loss in precision had we neglected the additional information
on harvest yield and used only the sub-sample of 20 cuts for which dry yield is available?
That is, estimate the % gain in efficiency of the ratio and regression estimators in two-phase
sampling with respect to the sample mean per element.

(a) % gain in efficiency of the ratio estimator over the mean per unit under SRS wor:

$$\%\ \text{Gain in efficiency} = \left(\frac{v(\bar{y}_n)}{v(\bar{y}_{Rd})} - 1\right) \times 100 = \left(\frac{0.2727 \times 10^4}{0.1373 \times 10^4} - 1\right) \times 100 \approx 98.6\%$$

(b) % gain in efficiency of the regression estimator over the mean per unit under SRS wor:

$$\%\ \text{Gain in efficiency} = \left(\frac{v(\bar{y}_n)}{v(\bar{y}_{ld})} - 1\right) \times 100 = \left(\frac{0.2727 \times 10^4}{0.1372 \times 10^4} - 1\right) \times 100 \approx 98.8\%$$
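
The whole calculation can be re-derived from the sums given in the CALCULATION section. The Python sketch below does this with unrounded intermediate values; the variable names are illustrative, and `phase_term` is simply shorthand introduced here for $(1/n - 1/n')$.

```python
# Sums from the worked example (yields in units of 100 kg)
n, n_first = 20, 40
sum_x_first = 516.8                 # sum of x over all 40 first-phase cuts
sum_x, sum_y = 275.3, 252.7         # sums over the 20 sub-sampled cuts
sum_x2, sum_y2, sum_xy = 3906.53, 3296.49, 3588.19

x_bar_first = sum_x_first / n_first
x_bar, y_bar = sum_x / n, sum_y / n
s_x2 = (sum_x2 - n * x_bar**2) / (n - 1)
s_y2 = (sum_y2 - n * y_bar**2) / (n - 1)
s_xy = (sum_xy - n * x_bar * y_bar) / (n - 1)

r_n = y_bar / x_bar                 # sample ratio
b = s_xy / s_x2                     # regression coefficient
r2 = s_xy**2 / (s_x2 * s_y2)        # squared correlation
phase_term = 1 / n - 1 / n_first

# Two-phase estimates and large-N variance estimates
y_rd = r_n * x_bar_first
y_ld = y_bar + b * (x_bar_first - x_bar)
v_rd = s_y2 / n_first + phase_term * (s_y2 + r_n**2 * s_x2 - 2 * r_n * s_xy)
v_ld = s_y2 / n_first + phase_term * (1 - r2) * s_y2
v_srs = s_y2 / n                    # mean per unit from the sub-sample alone

gain_rd = (v_srs / v_rd - 1) * 100
gain_ld = (v_srs / v_ld - 1) * 100

print(f"ratio:      {y_rd:.2f}  v = {v_rd:.4f}  gain = {gain_rd:.1f}%")
print(f"regression: {y_ld:.2f}  v = {v_ld:.4f}  gain = {gain_ld:.1f}%")
print(f"SRS mean:   {y_bar:.2f}  v = {v_srs:.4f}")
```

Because the bracketed term in the ratio variance is a small difference of large quantities, the fourth decimal of the variance is sensitive to how early the sample moments are rounded.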
EXERCISES
1. Explain two-phase sampling with an example.

2. Define (i) ratio, and (ii) regression estimators in two-phase sampling.

3. Give the variances and the estimates of variance of the (i) ratio, and (ii) regression estimators.

4. For estimating the total cow population, a survey was conducted in a district in two consecutive years.
   The district has 50 villages in all. A sample of 10 villages was selected with equal probability wor. In
   the second year the survey was confined to a sub-sample of 5 villages selected from the 10 villages. The
   numbers of cows in the selected villages are given below:

   Sl. No. of    No. of cows in first year    No. of cows in second year
   village       x_i                          y_i
   1             140                          235
   2             70                           90
   3             60                           88
   4             130                          210
   5             250                          305
   6             200                          -
   7             100                          -
   8             129                          -
   9             166                          -
   10            158                          -

   Estimate the total number of cows in the second year of the survey with and without using the figures
   for the first year, and compare their efficiencies.
