
Sampling Theory
MODULE VIII

LECTURE - 28
DOUBLE SAMPLING
(TWO PHASE SAMPLING)


DR. SHALABH
DEPARTMENT OF MATHEMATICS AND STATISTICS
INDIAN INSTITUTE OF TECHNOLOGY KANPUR

The ratio and regression methods of estimation require knowledge of the population mean $\bar{X}$ of the auxiliary variable $x$ to estimate the population mean $\bar{Y}$ of the study variable $y$. If information on the auxiliary variable is not available, then there are two options. One is to collect a sample only on the study variable and use the sample mean as an estimator of the population mean.
An alternative is to use a part of the budget for collecting information on the auxiliary variable, that is, to collect a large preliminary sample in which $x_i$ alone is measured. The purpose of this sampling is to furnish a good estimate of $\bar{X}$. This method is appropriate when the information about $x_i$ is on file cards that have not been tabulated. After collecting a large preliminary sample of size $n'$ units from the population, select a smaller sample of size $n$ from it and collect the information on $y$. These two estimates are then used to obtain an estimator of the population mean $\bar{Y}$. This procedure of selecting a large sample for collecting information on the auxiliary variable $x$ and then selecting a sub-sample from it for collecting the information on the study variable $y$ is called double sampling or two phase sampling. It is useful when it is considerably cheaper and quicker to collect data on $x$ than on $y$ and there is high correlation between $x$ and $y$.

In this sampling, the randomization is done twice. First a random sample of size $n'$ is drawn from a population of size $N$, and then again a random sample of size $n$ is drawn from the first sample of size $n'$.
So the sample mean in this sampling is a function of the two phases of sampling. If SRSWOR is utilized to draw the samples at both the phases, then
- the number of possible samples at the first phase, when a sample of size $n'$ is drawn from a population of size $N$, is $\binom{N}{n'} = M_0$, say;
- the number of possible samples at the second phase, where a sample of size $n$ is drawn from the first phase sample of size $n'$, is $\binom{n'}{n} = M_1$, say.
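A minimal sketch of the two-phase SRSWOR selection described above, assuming the population is represented only by its unit labels $0, \ldots, N-1$. The helper name `two_phase_srswor` and the sizes $N = 10$, $n' = 6$, $n = 3$ are hypothetical; $M_0$ and $M_1$ are computed with `math.comb`.

```python
import random
from math import comb


def two_phase_srswor(N, n_first, n_second, rng=None):
    """Hypothetical helper: two-phase SRSWOR over unit labels 0..N-1.

    Phase 1 draws n' = n_first units from the N population units;
    phase 2 draws n = n_second units from those n' units only.
    """
    if not 0 < n_second <= n_first <= N:
        raise ValueError("need 0 < n <= n' <= N")
    if rng is None:
        rng = random.Random()
    first = rng.sample(range(N), n_first)   # first phase: SRSWOR of n' from N
    second = rng.sample(first, n_second)    # second phase: SRSWOR of n from the n'
    return first, second


N, n_prime, n = 10, 6, 3
M0 = comb(N, n_prime)   # number of possible first-phase samples
M1 = comb(n_prime, n)   # number of possible second-phase samples per first-phase sample
first, second = two_phase_srswor(N, n_prime, n, random.Random(1))
print(M0, M1)                      # 210 20
print(set(second) <= set(first))   # True
```

In a survey, only $x$ would be recorded on the units in `first`, while both $x$ and $y$ would be recorded on the units in `second`.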

[Figure: Population of X (N units) → $M_0$ possible samples → Sample (large, $n'$ units) → $M_1$ possible samples → Subsample (small, $n$ units)]

Then the sample mean is a function of two variables. If $\theta$ is the statistic calculated at the second phase such that
$$\theta = \theta_{ij}, \quad i = 1, 2, \ldots, M_0, \; j = 1, 2, \ldots, M_1,$$
with $P_{ij}$ being the probability that the $i$th sample is chosen at the first phase and the $j$th sample is chosen at the second phase, then
$$E(\theta) = E_1[E_2(\theta)],$$
where $E_2(\theta)$ denotes the expectation over the second phase and $E_1$ denotes the expectation over the first phase. Thus, using $P(A \cap B) = P(A)\,P(B \mid A)$,
$$E(\theta) = \sum_{i=1}^{M_0}\sum_{j=1}^{M_1} P_{ij}\,\theta_{ij} = \sum_{i=1}^{M_0}\sum_{j=1}^{M_1} P_i\,P_{j|i}\,\theta_{ij} = \underbrace{\sum_{i=1}^{M_0} P_i}_{\text{1st stage}}\;\underbrace{\sum_{j=1}^{M_1} P_{j|i}\,\theta_{ij}}_{\text{2nd stage}} = E_1[E_2(\theta)].$$
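The identity $E(\theta) = E_1[E_2(\theta)]$ can be checked exactly on a small example. The sketch below assumes a hypothetical population of $N = 5$ values of $y$, takes $\theta$ to be the second-phase sample mean $\bar{y}$, and uses SRSWOR at both phases, so that $P_i = 1/M_0$, $P_{j|i} = 1/M_1$ and $P_{ij} = 1/(M_0 M_1)$. Exact rational arithmetic (`fractions.Fraction`) avoids rounding.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

y = [Fraction(v) for v in (3, 7, 8, 12, 15)]  # hypothetical toy population, N = 5
N, n_prime, n = len(y), 3, 2
M0, M1 = comb(N, n_prime), comb(n_prime, n)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# E(theta) as the double sum over (i, j) with P_ij = P_i * P_{j|i} = 1/(M0*M1)
double_sum = sum(
    Fraction(1, M0 * M1) * mean(sub)
    for first in combinations(y, n_prime)
    for sub in combinations(first, n)
)

# E1[E2(theta)]: inner expectation over the M1 subsamples of each first-phase sample
iterated = sum(
    Fraction(1, M0) * sum(Fraction(1, M1) * mean(sub) for sub in combinations(first, n))
    for first in combinations(y, n_prime)
)

print(double_sum == iterated == mean(y))  # True
```

With $\theta = \bar{y}$, both forms also reproduce the population mean $\bar{Y} = 9$ of this toy population.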

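Variance of $\theta$
$$\begin{aligned}
Var(\theta) &= E[\theta - E(\theta)]^2 \\
&= E[\{\theta - E_2(\theta)\} + \{E_2(\theta) - E(\theta)\}]^2 \\
&= E[\theta - E_2(\theta)]^2 + E[E_2(\theta) - E(\theta)]^2 + 0 \\
&= E_1E_2[\theta - E_2(\theta)]^2 + E_1[E_2(\theta) - E_1(E_2(\theta))]^2 \\
&= E_1[V_2(\theta)] + V_1[E_2(\theta)].
\end{aligned}$$
The cross-product term vanishes because $E_2(\theta) - E(\theta)$ is constant for the second phase, so that $2E[\{\theta - E_2(\theta)\}\{E_2(\theta) - E(\theta)\}] = 2E_1[\{E_2(\theta) - E(\theta)\}\,E_2\{\theta - E_2(\theta)\}] = 0$.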
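The decomposition $Var(\theta) = E_1[V_2(\theta)] + V_1[E_2(\theta)]$ can also be verified by complete enumeration. This sketch assumes the same kind of hypothetical toy population ($N = 5$, $n' = 3$, $n = 2$) and again takes $\theta = \bar{y}$; because every $(i, j)$ pair is equally likely under SRSWOR at both phases, each variance is an unweighted average over equally likely outcomes.

```python
from fractions import Fraction
from itertools import combinations

y = [Fraction(v) for v in (3, 7, 8, 12, 15)]  # hypothetical toy population, N = 5
n_prime, n = 3, 2


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pvar(values):
    """Variance over equally likely outcomes (divisor = number of outcomes)."""
    values = list(values)
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


# theta_ij = second-phase sample mean; one row per first-phase sample i
rows = [[mean(sub) for sub in combinations(first, n)]
        for first in combinations(y, n_prime)]

total_var = pvar(t for row in rows for t in row)   # Var(theta)
e1_v2 = mean(pvar(row) for row in rows)            # E1[V2(theta)]
v1_e2 = pvar(mean(row) for row in rows)            # V1[E2(theta)]

print(total_var == e1_v2 + v1_e2)  # True
print(total_var, e1_v2, v1_e2)     # 129/20 43/12 43/15
```

Here $E_1[V_2(\bar{y})] = (1/n - 1/n')S_y^2$ and $V_1[E_2(\bar{y})] = (1/n' - 1/N)S_y^2$ with $S_y^2 = 43/2$, and their sum is $(1/n - 1/N)S_y^2$.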
Note: Two phase sampling can be extended to more than two phases depending upon the need and objective of the experiment. The expectations and variances can also be extended along similar lines.
Double sampling in ratio method of estimation
If the population mean $\bar{X}$ is not known, then the double sampling technique is applied. Take a large initial sample of size $n'$ by SRSWOR to estimate the population mean $\bar{X}$ as
$$\hat{\bar{X}} = \bar{x}' = \frac{1}{n'}\sum_{i=1}^{n'} x_i.$$
Then a second sample is a subsample of size $n$ selected from the initial sample by SRSWOR. Let $\bar{y}$ and $\bar{x}$ be the means of $y$ and $x$ based on the subsample. Then
$$E(\bar{x}') = \bar{X}, \quad E(\bar{x}) = \bar{X}, \quad E(\bar{y}) = \bar{Y}.$$
The ratio estimator under double sampling now becomes
$$\hat{\bar{Y}}_{Rd} = \frac{\bar{y}}{\bar{x}}\,\bar{x}'.$$
The exact expressions for the bias and mean squared error of $\hat{\bar{Y}}_{Rd}$ are difficult to derive. So we find their approximate expressions using the same approach mentioned while describing the ratio method of estimation.
Let
$$\varepsilon_0 = \frac{\bar{y} - \bar{Y}}{\bar{Y}}, \quad \varepsilon_1 = \frac{\bar{x} - \bar{X}}{\bar{X}}, \quad \varepsilon_2 = \frac{\bar{x}' - \bar{X}}{\bar{X}}.$$
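A minimal sketch of computing $\hat{\bar{Y}}_{Rd} = (\bar{y}/\bar{x})\,\bar{x}'$ from the two samples, assuming the first-phase $x$ values and the paired subsample $(x, y)$ values are held in plain Python lists. The function name `ratio_double_sampling` and the numbers in the usage lines are hypothetical.

```python
from statistics import fmean


def ratio_double_sampling(y_sub, x_sub, x_first):
    """Ratio estimator under double sampling, (ybar / xbar) * xbar'.

    y_sub, x_sub: y and x measured on the second-phase subsample (size n)
    x_first:      x measured on the whole first-phase sample (size n')
    """
    if len(y_sub) != len(x_sub) or not y_sub:
        raise ValueError("y and x must be paired and non-empty on the subsample")
    ybar = fmean(y_sub)          # subsample mean of y
    xbar = fmean(x_sub)          # subsample mean of x
    xbar_prime = fmean(x_first)  # first-phase estimate of the population mean of x
    return ybar / xbar * xbar_prime


# hypothetical data: n' = 6 first-phase units; suppose the first three were drawn at phase 2
x_first = [10, 12, 9, 14, 11, 8]
x_sub, y_sub = [10, 12, 9], [21, 25, 18]
estimate = ratio_double_sampling(y_sub, x_sub, x_first)
print(round(estimate, 4))  # 22.0215
```

Note that `x_sub` must be the $x$ values of the same units whose $y$ values appear in `y_sub`, since only the subsample carries both measurements.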
















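Then
$$E(\varepsilon_0) = E(\varepsilon_1) = E(\varepsilon_2) = 0,$$
$$E(\varepsilon_1^2) = \left(\frac{1}{n} - \frac{1}{N}\right) C_x^2,$$
$$\begin{aligned}
E(\varepsilon_1\varepsilon_2) &= \frac{1}{\bar{X}^2}\,E[(\bar{x} - \bar{X})(\bar{x}' - \bar{X})] \\
&= \frac{1}{\bar{X}^2}\,E_1[E_2\{(\bar{x} - \bar{X})(\bar{x}' - \bar{X}) \mid n'\}] \\
&= \frac{1}{\bar{X}^2}\,E_1[(\bar{x}' - \bar{X})^2] \\
&= \left(\frac{1}{n'} - \frac{1}{N}\right)\frac{S_x^2}{\bar{X}^2} \\
&= \left(\frac{1}{n'} - \frac{1}{N}\right) C_x^2 \\
&= E(\varepsilon_2^2).
\end{aligned}$$
Note that only those values which are common to both the phases are used; the third step follows because $E_2(\bar{x} \mid n') = \bar{x}'$.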
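The result $E(\varepsilon_1\varepsilon_2) = E(\varepsilon_2^2) = (1/n' - 1/N)\,C_x^2$ is exact rather than an approximation, so it can be confirmed by enumerating every two-phase sample of a small hypothetical population of $x$ values. In this sketch $C_x^2 = S_x^2/\bar{X}^2$, with $S_x^2$ computed using the divisor $N - 1$.

```python
from fractions import Fraction
from itertools import combinations

x = [Fraction(v) for v in (2, 4, 5, 9, 10)]  # hypothetical toy values of x, N = 5
N, n_prime, n = len(x), 3, 2


def mean(values):
    values = list(values)
    return sum(values) / len(values)


X_bar = mean(x)
S2_x = sum((v - X_bar) ** 2 for v in x) / (N - 1)
C2_x = S2_x / X_bar**2

# every (first-phase sample, subsample) pair is equally likely under SRSWOR at both phases
pairs = [(f, s) for f in combinations(x, n_prime) for s in combinations(f, n)]

E_e1e2 = mean((mean(s) - X_bar) * (mean(f) - X_bar) for f, s in pairs) / X_bar**2
E_e2sq = mean((mean(f) - X_bar) ** 2 for f, s in pairs) / X_bar**2
formula = (Fraction(1, n_prime) - Fraction(1, N)) * C2_x

print(E_e1e2 == E_e2sq == formula)  # True
```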
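$$\begin{aligned}
E(\varepsilon_0\varepsilon_2) &= \frac{1}{\bar{X}\bar{Y}}\,Cov(\bar{y}, \bar{x}') \\
&= \frac{1}{\bar{X}\bar{Y}}\left[Cov\{E(\bar{y} \mid n'), E(\bar{x}' \mid n')\} + E\{Cov(\bar{y}, \bar{x}' \mid n')\}\right] \\
&= \frac{1}{\bar{X}\bar{Y}}\,Cov(\bar{y}', \bar{x}') \\
&= \left(\frac{1}{n'} - \frac{1}{N}\right)\frac{S_{xy}}{\bar{X}\bar{Y}} \\
&= \left(\frac{1}{n'} - \frac{1}{N}\right)\rho\,\frac{S_x S_y}{\bar{X}\bar{Y}} \\
&= \left(\frac{1}{n'} - \frac{1}{N}\right)\rho\, C_x C_y,
\end{aligned}$$
where $\bar{y}'$ is the sample mean of $y$ based on the sample of size $n'$. Here only those units which are common to both the phases are considered, and the second term of the covariance decomposition vanishes because $\bar{x}'$ is fixed once the first-phase sample is drawn.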


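$$\begin{aligned}
E(\varepsilon_0\varepsilon_1) &= \frac{1}{\bar{X}\bar{Y}}\,Cov(\bar{y}, \bar{x}) \\
&= \left(\frac{1}{n} - \frac{1}{N}\right)\frac{S_{xy}}{\bar{X}\bar{Y}} \\
&= \left(\frac{1}{n} - \frac{1}{N}\right)\rho\,\frac{S_x S_y}{\bar{X}\bar{Y}} \\
&= \left(\frac{1}{n} - \frac{1}{N}\right)\rho\, C_x C_y,
\end{aligned}$$
$$\begin{aligned}
E(\varepsilon_0^2) &= \frac{1}{\bar{Y}^2}\,Var(\bar{y}) \\
&= \frac{1}{\bar{Y}^2}\left[V_1\{E_2(\bar{y} \mid n')\} + E_1\{V_2(\bar{y} \mid n')\}\right] \\
&= \frac{1}{\bar{Y}^2}\left[V_1(\bar{y}') + E_1\left\{\left(\frac{1}{n} - \frac{1}{n'}\right)s_y'^2\right\}\right] \\
&= \frac{1}{\bar{Y}^2}\left[\left(\frac{1}{n'} - \frac{1}{N}\right)S_y^2 + \left(\frac{1}{n} - \frac{1}{n'}\right)S_y^2\right] \\
&= \left(\frac{1}{n} - \frac{1}{N}\right)\frac{S_y^2}{\bar{Y}^2} \\
&= \left(\frac{1}{n} - \frac{1}{N}\right) C_y^2,
\end{aligned}$$
where $s_y'^2$ is the mean sum of squares of $y$ based on the initial sample of size $n'$, so that $E_1(s_y'^2) = S_y^2$.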
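The moments involving $\varepsilon_0$ are also exact under SRSWOR at both phases. The following sketch, assuming a hypothetical paired population of $N = 5$ units, enumerates all two-phase samples and compares $E(\varepsilon_0^2)$, $E(\varepsilon_0\varepsilon_1)$ and $E(\varepsilon_0\varepsilon_2)$ with the closed forms, written with $S_{xy}$ in place of $\rho S_x S_y$.

```python
from fractions import Fraction
from itertools import combinations

# hypothetical paired toy population (x_k, y_k), k = 1..N
x = [Fraction(v) for v in (2, 4, 5, 9, 10)]
y = [Fraction(v) for v in (3, 7, 8, 12, 15)]
N, n_prime, n = len(x), 3, 2


def mean(values):
    values = list(values)
    return sum(values) / len(values)


X_bar, Y_bar = mean(x), mean(y)
S2_y = sum((v - Y_bar) ** 2 for v in y) / (N - 1)
S_xy = sum((a - X_bar) * (b - Y_bar) for a, b in zip(x, y)) / (N - 1)

eps0, eps1, eps2 = [], [], []
for first in combinations(range(N), n_prime):    # all M0 first-phase samples
    for sub in combinations(first, n):            # all M1 subsamples of each
        eps0.append(mean(y[k] for k in sub) / Y_bar - 1)
        eps1.append(mean(x[k] for k in sub) / X_bar - 1)
        eps2.append(mean(x[k] for k in first) / X_bar - 1)

f_n = Fraction(1, n) - Fraction(1, N)
f_np = Fraction(1, n_prime) - Fraction(1, N)
E00 = mean(a * a for a in eps0)
E01 = mean(a * b for a, b in zip(eps0, eps1))
E02 = mean(a * b for a, b in zip(eps0, eps2))

print(E00 == f_n * S2_y / Y_bar**2)            # True
print(E01 == f_n * S_xy / (X_bar * Y_bar))     # True
print(E02 == f_np * S_xy / (X_bar * Y_bar))    # True
```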






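Similarly, $E(\varepsilon_1\varepsilon_2)$ can be obtained from the covariance decomposition:
$$E(\varepsilon_1\varepsilon_2) = \frac{1}{\bar{X}^2}\,Cov(\bar{x}, \bar{x}') = \frac{1}{\bar{X}^2}\left[Cov\{E(\bar{x} \mid n'), E(\bar{x}' \mid n')\} + 0\right] = \frac{1}{\bar{X}^2}\,Var(\bar{x}'),$$
where $Var(\bar{x}')$ is the variance of the mean of $x$ based on the initial sample of size $n'$.

Estimation error of $\hat{\bar{Y}}_{Rd}$
Write $\hat{\bar{Y}}_{Rd}$ as
$$\begin{aligned}
\hat{\bar{Y}}_{Rd} &= \bar{Y}(1 + \varepsilon_0)\,\frac{\bar{X}(1 + \varepsilon_2)}{\bar{X}(1 + \varepsilon_1)} \\
&= \bar{Y}(1 + \varepsilon_0)(1 + \varepsilon_2)(1 + \varepsilon_1)^{-1} \\
&= \bar{Y}(1 + \varepsilon_0)(1 + \varepsilon_2)(1 - \varepsilon_1 + \varepsilon_1^2 - \ldots) \\
&\simeq \bar{Y}(1 + \varepsilon_0 + \varepsilon_2 + \varepsilon_0\varepsilon_2 - \varepsilon_1 - \varepsilon_0\varepsilon_1 - \varepsilon_1\varepsilon_2 + \varepsilon_1^2)
\end{aligned}$$
up to the terms of order two. Terms of degree greater than two are assumed to be negligible.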
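To see what "up to the terms of order two" means numerically, the sketch below compares the exact factor $(1 + \varepsilon_0)(1 + \varepsilon_2)/(1 + \varepsilon_1)$ with its second-order expansion for hypothetical small values of the $\varepsilon$'s. Halving all three values should shrink the error by a factor close to $2^3 = 8$, as expected when the neglected terms are of order three.

```python
def exact_factor(e0, e1, e2):
    """Exact (1 + e0)(1 + e2) / (1 + e1), i.e. the estimator divided by Ybar."""
    return (1 + e0) * (1 + e2) / (1 + e1)


def second_order_factor(e0, e1, e2):
    """Expansion retaining only terms up to order two."""
    return 1 + e0 + e2 - e1 + e0 * e2 - e0 * e1 - e1 * e2 + e1 ** 2


eps = (0.01, 0.02, -0.01)            # hypothetical (e0, e1, e2)
half = tuple(e / 2 for e in eps)     # the same deviations, halved
err = exact_factor(*eps) - second_order_factor(*eps)
err_half = exact_factor(*half) - second_order_factor(*half)
print(f"{err:.3e} {err_half:.3e} ratio={err / err_half:.2f}")
```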

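Bias of $\hat{\bar{Y}}_{Rd}$
$$E(\hat{\bar{Y}}_{Rd}) = \bar{Y}[1 + 0 + 0 + E(\varepsilon_0\varepsilon_2) - 0 - E(\varepsilon_0\varepsilon_1) - E(\varepsilon_1\varepsilon_2) + E(\varepsilon_1^2)]$$
$$\begin{aligned}
Bias(\hat{\bar{Y}}_{Rd}) &= E(\hat{\bar{Y}}_{Rd}) - \bar{Y} \\
&= \bar{Y}[E(\varepsilon_0\varepsilon_2) - E(\varepsilon_0\varepsilon_1) - E(\varepsilon_1\varepsilon_2) + E(\varepsilon_1^2)] \\
&= \bar{Y}\left[\left(\frac{1}{n'} - \frac{1}{N}\right)\rho C_x C_y - \left(\frac{1}{n} - \frac{1}{N}\right)\rho C_x C_y - \left(\frac{1}{n'} - \frac{1}{N}\right)C_x^2 + \left(\frac{1}{n} - \frac{1}{N}\right)C_x^2\right] \\
&= \bar{Y}\left(\frac{1}{n} - \frac{1}{n'}\right)(C_x^2 - \rho C_x C_y) \\
&= \bar{Y}\left(\frac{1}{n} - \frac{1}{n'}\right)C_x(C_x - \rho C_y).
\end{aligned}$$
The bias is negligible if $n$ is large, and the relative bias vanishes if $C_x^2 = \rho C_x C_y$, i.e., $C_x = \rho C_y$, which holds when the regression line passes through the origin.

Mean squared error of $\hat{\bar{Y}}_{Rd}$:
$$\begin{aligned}
MSE(\hat{\bar{Y}}_{Rd}) &= E(\hat{\bar{Y}}_{Rd} - \bar{Y})^2 \\
&= \bar{Y}^2 E(\varepsilon_0 + \varepsilon_2 - \varepsilon_1)^2 \quad \text{(retaining the terms up to order two)} \\
&= \bar{Y}^2 E[\varepsilon_0^2 + \varepsilon_1^2 + \varepsilon_2^2 + 2\varepsilon_0\varepsilon_2 - 2\varepsilon_0\varepsilon_1 - 2\varepsilon_1\varepsilon_2] \\
&= \bar{Y}^2\left[\left(\frac{1}{n} - \frac{1}{N}\right)C_y^2 + \left(\frac{1}{n} - \frac{1}{N}\right)C_x^2 + \left(\frac{1}{n'} - \frac{1}{N}\right)C_x^2 + 2\left(\frac{1}{n'} - \frac{1}{N}\right)\rho C_x C_y - 2\left(\frac{1}{n} - \frac{1}{N}\right)\rho C_x C_y - 2\left(\frac{1}{n'} - \frac{1}{N}\right)C_x^2\right] \\
&= \bar{Y}^2\left[\left(\frac{1}{n} - \frac{1}{N}\right)(C_x^2 + C_y^2 - 2\rho C_x C_y) + \left(\frac{1}{n'} - \frac{1}{N}\right)C_x(2\rho C_y - C_x)\right] \\
&= MSE(\text{ratio estimator}) + \bar{Y}^2\left(\frac{1}{n'} - \frac{1}{N}\right)(2\rho C_x C_y - C_x^2),
\end{aligned}$$
where $MSE(\text{ratio estimator}) = \bar{Y}^2\left(\frac{1}{n} - \frac{1}{N}\right)(C_x^2 + C_y^2 - 2\rho C_x C_y)$ is the approximate MSE of the ratio estimator based on a sample of size $n$ when $\bar{X}$ is known.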
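The approximate bias and MSE expressions can be evaluated directly once $\bar{Y}$, $C_x$, $C_y$, $\rho$, $n$, $n'$ and $N$ are known or guessed, for instance from a pilot survey. The helpers below are hypothetical and the parameter values are made up; for comparison, the sketch also evaluates $Var(\bar{y}) = \bar{Y}^2(1/n - 1/N)C_y^2$ for the sample mean that ignores $x$.

```python
def approx_bias_rd(Y_bar, C_x, C_y, rho, n, n_prime):
    """Second-order bias: Ybar * (1/n - 1/n') * C_x * (C_x - rho * C_y)."""
    return Y_bar * (1 / n - 1 / n_prime) * C_x * (C_x - rho * C_y)


def approx_mse_rd(Y_bar, C_x, C_y, rho, n, n_prime, N):
    """Second-order MSE: MSE(ratio estimator) + Ybar^2 (1/n' - 1/N)(2 rho C_x C_y - C_x^2)."""
    f_n, f_np = 1 / n - 1 / N, 1 / n_prime - 1 / N
    mse_ratio = Y_bar**2 * f_n * (C_x**2 + C_y**2 - 2 * rho * C_x * C_y)
    return mse_ratio + Y_bar**2 * f_np * (2 * rho * C_x * C_y - C_x**2)


def var_sample_mean(Y_bar, C_y, n, N):
    """Variance of ybar from an SRSWOR of size n: Ybar^2 (1/n - 1/N) C_y^2."""
    return Y_bar**2 * (1 / n - 1 / N) * C_y**2


# hypothetical parameter values
params = dict(Y_bar=50.0, C_x=0.2, C_y=0.25, rho=0.8)
print(approx_bias_rd(**params, n=40, n_prime=200))  # 0.0 here because C_x == rho * C_y
mse = approx_mse_rd(**params, n=40, n_prime=200, N=5000)
var_ybar = var_sample_mean(50.0, 0.25, 40, 5000)
print(round(mse, 3), round(var_ybar, 3))            # 1.875 3.875
```

With these values $\rho = 0.8$ exceeds $C_x/(2C_y) = 0.4$, and the approximate MSE of $\hat{\bar{Y}}_{Rd}$ is below $Var(\bar{y})$.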
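In the expression $MSE(\hat{\bar{Y}}_{Rd}) = MSE(\text{ratio estimator}) + \bar{Y}^2\left(\frac{1}{n'} - \frac{1}{N}\right)(2\rho C_x C_y - C_x^2)$, the second term is the additional contribution that arises because $\bar{X}$ is estimated by $\bar{x}'$ from the first-phase sample; it vanishes when $n' = N$. Writing $\frac{1}{n'} - \frac{1}{N} = \left(\frac{1}{n} - \frac{1}{N}\right) - \left(\frac{1}{n} - \frac{1}{n'}\right)$, the MSE can also be expressed as
$$MSE(\hat{\bar{Y}}_{Rd}) = Var(\bar{y}) - \bar{Y}^2\left(\frac{1}{n} - \frac{1}{n'}\right)(2\rho C_x C_y - C_x^2),$$
where $Var(\bar{y}) = \bar{Y}^2\left(\frac{1}{n} - \frac{1}{N}\right)C_y^2$ is the variance of the sample mean of $y$ based on the $n$ subsample units. Since $n < n'$, this method is preferred over the sample mean $\bar{y}$ (which ignores $x$) if
$$2\rho C_x C_y - C_x^2 > 0 \quad \text{or} \quad \rho > \frac{1}{2}\frac{C_x}{C_y}.$$

Choice of $n$ and $n'$
Write
$$MSE(\hat{\bar{Y}}_{Rd}) = \frac{V}{n} + \frac{V'}{n'},$$
where $V$ and $V'$ contain all the terms containing $n$ and $n'$, respectively (terms involving only $1/N$ do not depend on $n$ or $n'$ and are ignored).
The cost function is $C_0 = nC + n'C'$, where $C$ and $C'$ are the costs per unit for selecting the samples of sizes $n$ and $n'$, respectively.
Now we find the optimum sample sizes $n$ and $n'$ for a fixed cost $C_0$. The Lagrangian function is
$$\phi = \frac{V}{n} + \frac{V'}{n'} + \lambda(nC + n'C' - C_0),$$
$$\frac{\partial \phi}{\partial n} = 0 \;\Rightarrow\; \lambda C = \frac{V}{n^2},$$
$$\frac{\partial \phi}{\partial n'} = 0 \;\Rightarrow\; \lambda C' = \frac{V'}{n'^2}.$$
Thus $\lambda C n^2 = V$, or $n = \sqrt{\dfrac{V}{\lambda C}}$, or $\sqrt{\lambda}\, nC = \sqrt{VC}$.
Similarly, $\sqrt{\lambda}\, n'C' = \sqrt{V'C'}$.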
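Thus
$$\sqrt{\lambda} = \frac{\sqrt{VC} + \sqrt{V'C'}}{C_0}$$
and so
$$\text{Optimum } n = \frac{C_0}{\sqrt{VC} + \sqrt{V'C'}}\sqrt{\frac{V}{C}} = n_{opt}, \text{ say},$$
$$\text{Optimum } n' = \frac{C_0}{\sqrt{VC} + \sqrt{V'C'}}\sqrt{\frac{V'}{C'}} = n'_{opt}, \text{ say},$$
$$Var_{opt}(\hat{\bar{Y}}_{Rd}) = \frac{V}{n_{opt}} + \frac{V'}{n'_{opt}} = \frac{(\sqrt{VC} + \sqrt{V'C'})^2}{C_0}.$$

Comparison with SRS
If the auxiliary variable $x$ is ignored and all resources are used to estimate $\bar{Y}$ by $\bar{y}$, then the required sample size is $C_0/C$ and, ignoring the finite population correction,
$$Var(\bar{y}) = \frac{S_y^2}{C_0/C} = \frac{CS_y^2}{C_0}.$$
$$\text{Relative efficiency} = \frac{Var(\bar{y})}{Var_{opt}(\hat{\bar{Y}}_{Rd})} = \frac{CS_y^2}{(\sqrt{VC} + \sqrt{V'C'})^2}.$$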
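A minimal sketch of the optimum allocation and the relative efficiency, taking $V$, $V'$, $C$, $C'$, $C_0$ and $S_y^2$ as given inputs. If the $1/N$ terms of the approximate MSE are dropped, one gets $V = \bar{Y}^2(C_x^2 + C_y^2 - 2\rho C_x C_y)$ and $V' = \bar{Y}^2(2\rho C_x C_y - C_x^2)$, so the square roots require $V' > 0$. The function names and input values are hypothetical, and the closed form does not enforce $n \le n'$ or integer sample sizes.

```python
from math import sqrt


def optimum_two_phase(V, V_prime, C, C_prime, C0):
    """Minimize V/n + V'/n' subject to n*C + n'*C' = C0 (Lagrange multiplier solution)."""
    if V <= 0 or V_prime <= 0:
        raise ValueError("V and V' must be positive")
    total = sqrt(V * C) + sqrt(V_prime * C_prime)
    n_opt = C0 / total * sqrt(V / C)
    n_prime_opt = C0 / total * sqrt(V_prime / C_prime)
    var_opt = total**2 / C0
    return n_opt, n_prime_opt, var_opt


def relative_efficiency(S2_y, V, V_prime, C, C_prime):
    """Var(ybar) with n = C0/C (fpc ignored) divided by the optimum MSE; C0 cancels."""
    return C * S2_y / (sqrt(V * C) + sqrt(V_prime * C_prime)) ** 2


# hypothetical inputs: y-units cost C = 10 each, x-units cost C' = 1 each, budget C0 = 680
n_opt, n_prime_opt, var_opt = optimum_two_phase(V=90, V_prime=16, C=10, C_prime=1, C0=680)
print(n_opt, n_prime_opt, var_opt)  # 60.0 80.0 1.7
print(round(relative_efficiency(200, 90, 16, 10, 1), 3))
```

The allocation spends the whole budget ($60 \times 10 + 80 \times 1 = 680$); in practice the optimum sizes would be rounded to integers and checked against $n \le n'$.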
