
Sampling and Sampling Distribution

Prof G R C Nair
Objectives

• To become conversant with the terms used in sampling
• To understand the advantages of sampling
• To learn the types of sampling
• To understand sampling distributions
• To understand sampling errors
• To learn the Central Limit Theorem
Why Sample the Population?

• It may be physically impossible to check all the items (a census).
• The cost of studying all the items may be prohibitive.
• A census is highly time-consuming.
• Certain tests are destructive in nature.
• The sample results are usually adequate.
Probability Sampling

• A sample selected such that each item or person in the population being studied has a known, non-zero likelihood of being selected.
Methods of Probability Sampling
• Simple Random Sample:
A sample formed so that each item or person in the whole population has the same chance of being selected, e.g. a lottery or a housie draw.

• Systematic Random Sampling:
The items or individuals of the population are arranged in some order. A random starting point is selected, and then every kth member of the population is selected for the sample.
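A minimal sketch of the two methods above, assuming the population is a Python list already arranged in some order. The function names, the seed parameter and the 100-item population are hypothetical; the systematic version picks its random start within the first interval of length k = N // n.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n items without replacement; every item has the same chance."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Random start within the first interval, then every k-th item (k = N // n)."""
    k = len(population) // n            # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)            # random starting point in 0 .. k-1
    return population[start::k][:n]     # every k-th member from the start

# A hypothetical ordered population of 100 numbered items
items = list(range(1, 101))
print(simple_random_sample(items, 5, seed=1))
print(systematic_sample(items, 10, seed=1))
```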
• Cluster Sampling:
The population is first divided into primary units, or clusters, each of which is as heterogeneous as the population itself. A random sample of clusters is then selected for the survey.

• Stratified Random Sampling:
A heterogeneous population is first divided into homogeneous subgroups, or strata, whose members share some common feature.
A proportionate subsample is selected at random from each stratum, and the subsamples are combined to give a representative sample.
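Proportionate stratified sampling can be sketched as below, assuming the strata are given as a dict of lists. The strata names and sizes (600, 300, 100) are hypothetical. Each stratum's share is rounded, so for other sizes the subsamples may not add up exactly to n.

```python
import random

def proportionate_stratified_sample(strata, n, seed=None):
    """strata: dict mapping stratum name -> list of members.
    Allocate n in proportion to stratum size, draw a simple random
    sample within each stratum, and combine the subsamples."""
    rng = random.Random(seed)
    N = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        n_h = round(n * len(members) / N)     # proportional allocation (rounded)
        sample.extend(rng.sample(members, n_h))
    return sample

# Hypothetical strata of a population of 1000
population = {
    "urban": [f"U{i}" for i in range(600)],
    "semi-urban": [f"S{i}" for i in range(300)],
    "rural": [f"R{i}" for i in range(100)],
}
sample = proportionate_stratified_sample(population, 50, seed=7)
print(len(sample))
```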
• Non-Probability Sampling

Inclusion in the sample is based on judgment or convenience.

Many items may not get a chance of being selected.

Probability theory cannot be used to assess the reliability of the results.


Sampling Error
• The difference between the value of a sample statistic and the value of the corresponding population parameter.
• For the mean, sampling error = $\bar{x} - \mu$
• For the standard deviation, sampling error = $s - \sigma$
This assumes that the sample is random and that no non-sampling error has been made.
Non Sampling Error

• Errors due to deficiencies in the collection, recording and tabulation of data.
Sampling Distribution
• Many samples of size n can be drawn from a population of size N; without replacement there are ${}_NC_n$ of them.
• It is highly improbable that all these samples have the same mean; the mean varies from sample to sample.
• The probability distribution of the means of all these samples is called the sampling distribution of means.
• Sampling distributions of standard deviations, proportions, etc. can be formed in the same way.
• The law firm of Hoya and Associates has five partners. At their weekly partners' meeting, each reported the number of hours they worked for their clients in the last week.
Partner Hours
1. Dunn 22
2. Hardy 26
3. Kiers 30
4. Malinowski 26
5. Tillman 22
• If two partners are selected randomly, how many different samples are possible?
This is the number of combinations of 5 objects taken 2 at a time:
$${}_5C_2 = \frac{5!}{2!\,(5-2)!} = 10$$
There are a total of 10 different samples.
Partners   Total hours   Mean
1, 2       48            24
1, 3       52            26
1, 4       48            24
1, 5       44            22
2, 3       56            28
2, 4       52            26
2, 5       48            24
3, 4       56            28
3, 5       52            26
4, 5       48            24
• Organize the sample means into a sampling distribution.

Sample mean   Frequency   Relative frequency (probability)
22            1           1/10
24            4           4/10
26            3           3/10
28            2           2/10
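The enumeration behind the two tables above can be reproduced by listing every pair of partners with `itertools.combinations`; this sketch is only a check of the arithmetic, using the five partners' hours from the table.

```python
import math
from collections import Counter
from itertools import combinations

hours = {"Dunn": 22, "Hardy": 26, "Kiers": 30, "Malinowski": 26, "Tillman": 22}

# Every possible sample of 2 partners, drawn without replacement
samples = list(combinations(hours, 2))
print(len(samples), math.comb(5, 2))            # both give 5C2

means = [(hours[a] + hours[b]) / 2 for a, b in samples]
freq = Counter(means)

print("Mean  Frequency  Relative frequency")
for m in sorted(freq):
    print(f"{m:g}  {freq[m]}  {freq[m]}/{len(samples)}")
```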
• Compute the mean of the sample means and compare it with the population mean.
$$\mu_{\bar{x}} = \frac{22(1) + 24(4) + 26(3) + 28(2)}{10} = 25.2$$
$$\mu = \frac{22 + 26 + 30 + 26 + 22}{5} = 25.2$$

• The population mean is also 25.2 hours.

Note that the mean of the sample means is exactly equal to the population mean when all possible samples are taken. When only some of the samples are taken, the mean of their means will generally differ from the population mean.
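A short check of this note with the law firm data: the mean of all 10 sample means matches the population mean, while the mean of only the first 3 samples (an arbitrary subset chosen for illustration) does not.

```python
from itertools import combinations
from statistics import mean

hours = [22, 26, 30, 26, 22]          # Dunn, Hardy, Kiers, Malinowski, Tillman
sample_means = [mean(s) for s in combinations(hours, 2)]

print(mean(hours))                    # population mean
print(mean(sample_means))             # mean of all 10 sample means
print(mean(sample_means[:3]))         # only 3 of the samples: differs
```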
Standard Error
• The standard error (S.E.) of a statistic is the standard deviation of the sampling distribution of that statistic.
• S.E. of the mean: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$
• S.E. of the standard deviation: $\sigma_s = \sigma/\sqrt{2n}$
• S.E. of a proportion: $\sigma_p = \sqrt{pq/n}$
• The standard error is used to test whether the difference between a sample statistic and the population parameter is significant.
• A random sample of 500 oranges was taken from a large consignment and 65 were found to be defective. Find the standard error of the proportion of bad oranges in a sample of this size.

• p = 65/500 = 0.13
• q = 1 − p = 0.87, n = 500
• Standard error = $\sqrt{pq/n} = \sqrt{0.13 \times 0.87/500} \approx 0.015$
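The three standard-error formulas above written as small Python functions, with the orange example as usage. The function names are hypothetical; the formula for the standard deviation is the usual large-sample result for a normal population.

```python
from math import sqrt

def se_mean(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

def se_std_dev(sigma, n):
    """Standard error of the standard deviation (large samples): sigma / sqrt(2n)."""
    return sigma / sqrt(2 * n)

def se_proportion(p, n):
    """Standard error of a proportion: sqrt(p * q / n) with q = 1 - p."""
    return sqrt(p * (1 - p) / n)

# Orange example: 65 defective out of 500
p = 65 / 500
print(round(se_proportion(p, 500), 3))   # 0.015
```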
Finite Population Multiplier
• When samples are taken without replacement from a finite population, this correction factor should be used if n > 5% of N (i.e. n/N > 0.05).
• Factor = $\sqrt{(N-n)/(N-1)}$
• In such cases, the standard error of the mean is
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$
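As a sketch of why the multiplier is needed, the law firm example (N = 5, n = 2, so n/N = 40%) can be checked exactly: the standard deviation of all 10 sample means equals the corrected formula, not σ/√n alone. The helper names are hypothetical; σ here is the population standard deviation (dividing by N).

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

def finite_population_multiplier(N, n):
    return sqrt((N - n) / (N - 1))

def se_mean_finite(sigma, n, N):
    """Standard error of the mean for sampling without replacement."""
    return sigma / sqrt(n) * finite_population_multiplier(N, n)

hours = [22, 26, 30, 26, 22]
N, n = len(hours), 2
sigma = pstdev(hours)                    # population std deviation (divides by N)

sample_means = [mean(s) for s in combinations(hours, n)]
exact = pstdev(sample_means)             # std deviation of all 10 sample means

print(round(exact, 4))                           # 1.833
print(round(se_mean_finite(sigma, n, N), 4))     # 1.833
print(round(sigma / sqrt(n), 4))                 # 2.1166 (no correction)
```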
• A simple random sample of size 36 is drawn from a finite population consisting of 101 units. If the population standard deviation is 12.6, find the standard error of the sample mean when the sample is drawn (i) with replacement, (ii) without replacement.
• Ans: (i) $\sigma/\sqrt{n} = 12.6/6 = 2.1$
(ii) $\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = 2.1 \times \sqrt{65/100} \approx 1.693$
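The same arithmetic in Python, as a quick check of both answers.

```python
from math import sqrt

sigma, n, N = 12.6, 36, 101
se_with = sigma / sqrt(n)                            # (i) with replacement
se_without = se_with * sqrt((N - n) / (N - 1))       # (ii) finite population multiplier
print(round(se_with, 3), round(se_without, 3))       # 2.1 1.693
```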
Central Limit Theorem
• The sampling distribution of the mean of samples of size n drawn from a population tends towards a normal distribution as n increases.

• This holds irrespective of the distribution of the population itself, provided the population has a finite standard deviation.

• The mean of the sampling distribution of means equals the population mean.
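A small simulation sketch of the theorem, assuming a strongly skewed exponential population with μ = 1 and σ = 1 (its skewness is 2, and the skewness of $\bar{x}$ is $2/\sqrt{n}$). The seed and the 5000 repetitions are arbitrary choices. As n grows from 2 to 30, the mean of the sample means stays near μ, their standard deviation tracks σ/√n, and the skewness moves towards 0, as for a normal distribution.

```python
import random
from math import sqrt
from statistics import mean, pstdev, stdev

def skewness(xs):
    """Population skewness: 0 for a symmetric (e.g. normal) distribution."""
    m, s = mean(xs), pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

rng = random.Random(42)
REPS = 5000                     # number of samples drawn for each sample size

results = {}
for n in (2, 30):
    # Exponential population with mu = 1 and sigma = 1 (strongly right-skewed)
    xbars = [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(REPS)]
    results[n] = (mean(xbars), stdev(xbars), skewness(xbars))
    m, sd, sk = results[n]
    print(f"n={n:2d}  mean={m:.3f}  sd={sd:.3f}  sigma/sqrt(n)={1 / sqrt(n):.3f}  skew={sk:.2f}")
```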
[Figure: The Central Limit Theorem applies to sampling distributions from any population. Panels show normal, uniform, skewed and general populations, with the sampling distribution of $\bar{x}$ for n = 2 and n = 30.]
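• The mean of the sampling distribution of the sample means, $\mu_{\bar{x}}$, equals $\mu$, and its standard deviation $\sigma_{\bar{x}}$ equals $\sigma/\sqrt{n}$ (for an infinite population or sampling with replacement).
• For large samples (n > 30), the sampling distribution of $\bar{x}$ can be taken as normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.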
Purpose of sampling
• The main purpose of sampling is to infer a population parameter from a sample statistic, e.g. an exit poll.
• It can be used to identify the unknown population to which a given sample is likely to belong, e.g. an oil spill or a blood stain.
• It can be used to assess the probability that two or more samples belong to the same population, e.g. twin sisters.
• The Central Limit Theorem permits us to use the normal distribution for these tests, provided the sample size is large enough.
Example 1

• The weight of persons using a lift is normally distributed with a mean of 70 kg and a standard deviation of 9 kg. The lift has a maximum capacity of 300 kg. If four persons enter the lift, what is the probability of the load exceeding the limit?
• Hint: similar to the marks / first-class case. Find how far the sample mean lies from the mean of the sampling distribution of means.
• Population mean $\mu$ = 70 kg
• Sample mean at the limit = 300/4 = 75 kg
• The standard error of the sampling distribution of means is $\sigma/\sqrt{n} = 9/\sqrt{4} = 4.5$ kg
• $P(\bar{x} > 75)$ is to be found.
• $z = (\bar{x} - \mu)/\sigma_{\bar{x}} = (75 - 70)/4.5 = 1.11$
• From the normal table, $P(z > 1.11) = 0.5 - 0.3665 = 0.1335$
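The same calculation with Python's `statistics.NormalDist` in place of the printed table. Because the population is normal, $\bar{x}$ is exactly normal even for n = 4. The exact z of 1.111 gives about 0.1333, slightly below the table value because the table rounds z to 1.11.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n, capacity = 70, 9, 4, 300
se = sigma / sqrt(n)              # 4.5 kg
x_bar = capacity / n              # 75 kg: mean weight at which the load reaches the limit
z = (x_bar - mu) / se
p = 1 - NormalDist().cdf(z)       # P(z > 1.11)
print(round(z, 2), round(p, 4))
```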
Example 2

• From past experience it is known that the standard deviation of the number of days of absence of workers in an industry is 15. In a random sample of 100 workers, what is the probability that the sample mean differs from the actual mean by more than 3 days?
Using the normal distribution, the standard error of the sampling distribution is
$\sigma_{\bar{x}} = \sigma/\sqrt{n} = 15/\sqrt{100} = 1.5$
3 days = $2\sigma_{\bar{x}}$, so a deviation of more than 3 days in $\bar{x}$ corresponds to $|z| > 2$.
Hence $P(|\bar{x} - \mu| > 3) = P(z > 2 \text{ or } z < -2)$.
From the table, this is (0.5 − 0.4772) + (0.5 − 0.4772) = 0.0456.
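A two-tailed version of the same calculation, as a sketch with `statistics.NormalDist`; with n = 100 the Central Limit Theorem justifies the normal approximation even though the distribution of absences is not stated.

```python
from math import sqrt
from statistics import NormalDist

sigma, n, tolerance = 15, 100, 3
se = sigma / sqrt(n)                  # 1.5 days
z = tolerance / se                    # 3 days = 2 standard errors
p = 2 * (1 - NormalDist().cdf(z))     # both tails: z > 2 or z < -2
print(round(p, 4))
```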
Example 3

• In a normal distribution with mean 375 and standard deviation 48, how large a sample should be taken so that the probability of the sample mean falling between 370 and 380 is at least 0.95? Ans:
• Probability on either side of the mean = 0.475.
• From the table, the corresponding value of z = 1.96.
• $1.96 = (380 - 375)/(48/\sqrt{n})$, so $\sqrt{n} = 18.816$ and n = 354.04.
• So the sample size must be at least 355.
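The sample-size calculation as a sketch: solve $z\,\sigma/\sqrt{n} = 5$ for n, round up, and check the resulting probability directly. `inv_cdf(0.975)` gives z of about 1.96.

```python
from math import ceil, sqrt
from statistics import NormalDist

mu, sigma, half_width, conf = 375, 48, 5, 0.95
z = NormalDist().inv_cdf(0.5 + conf / 2)        # about 1.96
n_exact = (z * sigma / half_width) ** 2         # about 354.0
n = ceil(n_exact)                               # round up to a whole sample

se = sigma / sqrt(n)
check = NormalDist(mu, se).cdf(mu + half_width) - NormalDist(mu, se).cdf(mu - half_width)
print(round(n_exact, 2), n, round(check, 4))
```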
Example 4 / HW
• An oil refinery has backup monitors to keep track of the oil flow and prevent disruption in the process. A monitor has an average life of 4300 hrs and a standard deviation of 730 hrs. It has two identical standby units, which automatically come into operation if a monitor fails. What is the probability that a set of monitors will last (i) at least 13,000 hrs, (ii) at most 12,630 hrs?
Hint: Treat it as a sample of 3 monitors.
Ans: (i) 0.4685 (ii) 0.4154
• Average life of the population $\mu$ = 4300 hrs.
• $\sigma$ = 730 hrs, so S.E. = $730/\sqrt{3}$ = 421.5 hrs.
i. For a life of 13,000 hrs with 3 monitors working one after another, the sample mean should be 13000/3 = 4333.33 hrs.
z = (4333.33 − 4300)/421.5 = 0.079
P(z > 0.079) = 0.5 − 0.0315 = 0.4685
ii. For a life of at most 12,630 hrs, the sample mean should be at most 12630/3 = 4210 hrs.
z = (4210 − 4300)/421.5 = −0.2135
P(z < −0.2135) = 0.5 − 0.0846 = 0.4154
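A check of both answers, assuming (as the slide does) that the mean life of the three units is approximately normal with standard error σ/√3; with n = 3 this relies on the monitor lives themselves being close to normal, which the problem does not state.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 4300, 730, 3
se = sigma / sqrt(n)                        # about 421.5 hrs
dist = NormalDist(mu, se)                   # assumed distribution of the mean life

p_at_least = 1 - dist.cdf(13000 / n)        # set lasts at least 13,000 hrs
p_at_most = dist.cdf(12630 / n)             # set lasts at most 12,630 hrs
print(round(p_at_least, 4), round(p_at_most, 4))
```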
Example HW

• An underwater salvage team is planning to explore the site where 45 Spanish ships carrying gold sank. From records, it appears that the site can generate a revenue of $225,000 per ship, with an expected standard deviation of $39,000. The team's financier is skeptical and has said that if the expedition expenses of $2.1 million are not recovered from the first 9 ships, he will cancel the remainder of the exploration. What is the probability that the exploration continues beyond 9 ships? Ans: 0.2393
Example HW

• Sara Gorden is… (Problem 6-46, Levin)


Assignment - Sampling

• Sampling: page 311, problems 6-3, 6-19, 6-21 (Levin)

• Page 323: problem 6-39
• Problems 6-46, 6-54, 6-62, 6-64
• Page 339 for objective
