
Submitted by:

HIMANI KALRA
MBA (GENERAL)
35
SUBMITTED TO:
DR. SIMMI
UNIVERSITY SCHOOL OF MGT. ,
K.U.K
Chapter 11

Sampling
Concept of sampling
Aims of Sampling
Merits and demerits of sampling
Types of sampling methods
Sampling errors
Sampling Distributions
Probability Distributions
The Central Limit Theorem

Concept of sampling
The method of selecting a sample out of a given population is called sampling.
The sampling method has three main stages:
1. To select a sample.
2. To collect information from it.
3. To make inferences regarding the characteristics of the population.

Aims of sampling
Reduces the researcher's time and cost (e.g., political polls).
Allows generalizing about a larger population (e.g., sampling a city neighborhood).
In some cases (e.g., industrial production) analysis may be destructive, so sampling is needed.

Merits of sampling
Saving of time and money
Intensive study
Organizational convenience
More reliable results
More scientific methods

Demerits of sampling
Less accurate
Wrong conclusions
Less reliable
Need for specialized knowledge

Types of sampling methods

(A) Probability sampling methods
(1) Simple random sampling
(2) Stratified random sampling
(3) Systematic random sampling
(4) Cluster sampling

(B) Non-probability sampling methods
(1) Convenience sampling
(2) Quota sampling
(3) Judgment sampling
(4) Snowball sampling

Simple random sampling
Every subset of a specified size n from the population has an equal chance of being selected.

Math
Alliance
Project

1. Get a list or sampling frame.
   a. This is the hard part. It must not systematically exclude anyone.
2. Generate random numbers.
3. Select one person per random number.
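The three steps above can be sketched in a few lines of Python. This is only an illustration: the sampling frame of 50 person IDs and the function name `simple_random_sample` are assumptions made for the example, not part of the original slides.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)      # seeded generator, so the draw can be repeated
    return rng.sample(frame, n)    # sampling without replacement

# Hypothetical sampling frame: a complete list of 50 people.
frame = [f"person_{i:02d}" for i in range(1, 51)]
sample = simple_random_sample(frame, n=5, seed=42)
print(sample)
```

The quality of the result depends on the frame: if the list leaves out part of the population, no random number generator can repair that.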

Stratified random sampling
The population is divided into two or more groups called strata, according to some criterion, such as geographic location, grade level, age, or income, and subsamples are randomly selected from each stratum.
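A minimal sketch of stratified selection follows, assuming a made-up population of 60 students tagged with a grade level. The helper name `stratified_sample` and the equal allocation of four units per stratum are illustrative choices only; proportional allocation is another common option.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, per_stratum, seed=None):
    """Group units into strata, then draw a simple random subsample from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {
        name: rng.sample(members, min(per_stratum, len(members)))
        for name, members in strata.items()
    }

# Hypothetical population: (student id, grade level).
grades = ["grade_9", "grade_10", "grade_11"]
population = [(i, grades[i % 3]) for i in range(60)]

subsamples = stratified_sample(population, stratum_of=lambda u: u[1],
                               per_stratum=4, seed=1)
for name, members in subsamples.items():
    print(name, members)
```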


Systematic random sampling
a. Select a random number, which will be known as k.
b. Get a list of people, or observe a flow of people (e.g., pedestrians on a corner).
c. Select every kth person.
Be careful that there is no systematic rhythm to the flow or list of people.
If every 4th person on the list is, say, rich or sincere or follows some other consistent pattern, avoid this method.

Every kth member (for example, every 10th person) is selected from a list of all population members.
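Selecting every kth member can be sketched as below. The example assumes a common variant in which k is the sampling interval and the starting position is chosen at random among the first k entries; the list of 100 IDs and the name `systematic_sample` are hypothetical.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Pick a random start among the first k entries, then take every kth unit."""
    rng = random.Random(seed)
    start = rng.randrange(k)       # random start position in 0 .. k-1
    return frame[start::k]

# Hypothetical list of 100 population members, sampled at every 10th position.
frame = list(range(1, 101))
sample = systematic_sample(frame, k=10, seed=7)
print(sample)
```

If the list itself has a cycle of length k (the rhythm the slide warns about), every selected unit shares the same position in the cycle and the sample is biased.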


Cluster sampling
The population is divided into subgroups (clusters) like families. A simple random sample is taken of the subgroups and then all members of the selected clusters are surveyed.
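As a rough sketch of the cluster approach, the code below draws whole families at random and then surveys everyone in the chosen families. The four families and the name `cluster_sample` are invented for the illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then return every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # sample cluster names
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical clusters: families and their members.
families = {
    "family_A": ["A1", "A2", "A3"],
    "family_B": ["B1", "B2"],
    "family_C": ["C1", "C2", "C3", "C4"],
    "family_D": ["D1"],
}
chosen, surveyed = cluster_sample(families, n_clusters=2, seed=3)
print(chosen, surveyed)
```

Note that randomness enters only at the cluster level; no further selection happens inside a chosen family.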


Convenience sampling
Selection of whichever individuals are easiest to reach.
It is done at the convenience of the researcher.


Judgment sampling
Selection is based on the researcher's personal judgment that the element is representative of the population under study.
This is used primarily when there is a limited number of people who have expertise in the area being researched.

Snowball sampling
Selection of additional respondents is based on referrals from the initial respondents.

Quota sampling
Determine what the population looks like in terms of specific qualities.
Create quotas based on those qualities.
Select people for each quota.

Sampling errors
Sampling errors are those which arise due to the method of sampling. They occur due to:
1. Faulty selection of sampling methods.
2. Substituting one sample for another because of difficulties in collecting the samples.
3. Faulty demarcation of sampling units.
4. Variability of the population, which has different characteristics.

Non-sampling errors
These are errors which creep in due to human factors, which always vary from one investigator to another. These errors arise due to:
1. Faulty planning.
2. Faulty selection of the sample units.
3. Lack of trained staff.
4. Negligence on the part of respondents.
5. Errors in compilation.
6. Errors due to wrong statistical measures.
7. Framing of wrong questionnaires.
8. Incomplete investigation of the sample surveys.

Population - Entire group of items/individuals we want information about.

Sample - The part of the population we actually examine in order to gather information.

A parameter is a number that describes the population. It is fixed, but we don't know its value.

A statistic is a number that describes a sample. Its value is known, but it varies from sample to sample.

We often use statistics to estimate the unknown parameter.

Statistical inference draws conclusions about a population on the basis of data from a sample.

It also provides us with a statement of how much confidence we can place in our conclusions.

We are in many cases interested in the mean value a variable takes in the population.

Individual scores are random draws from a population.

The sample mean is a guess about the true population mean.

But how accurate (or efficient) is the sample mean?

Or, I could say, what is the standard deviation of the sample mean?

I want to estimate the SD of the mean of n observations, i.e., how much the mean is expected to vary from sample to sample.

But I only get to observe one sample.

Imagine that you could draw a sample and calculate a mean or median or SD or whatever statistic again and again from a population.

What would the distribution of this statistic look like?

You're conceptualizing a sampling distribution.

What is its expected value and standard deviation?

If you know this, you can answer how likely it is that a sample with a given mean (or median or SD) was drawn from a population with known mean (or median or SD).

A sampling distribution:

is a distribution of sample statistics (means, medians, etc.)

is a theoretical distribution that describes all possible means, medians, etc., and the probability of obtaining each value.

can be visualized using simulations, but must be imagined when collecting real data.
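One way to visualize a sampling distribution is to simulate it. The sketch below draws 5,000 samples of size 25 from a normal population and looks at the resulting sample means. The population values (mean 100, standard deviation 15), the number of repetitions, and the name `sample_means` are assumptions chosen for the illustration.

```python
import random
import statistics

def sample_means(mu, sigma, n, draws, seed=0):
    """Repeatedly draw samples of size n and record the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(draws)
    ]

# Assumed population: normal with mean 100 and standard deviation 15.
means = sample_means(mu=100, sigma=15, n=25, draws=5000)

center = statistics.fmean(means)   # should land close to the population mean
spread = statistics.stdev(means)   # should land close to 15 / sqrt(25) = 3
print(round(center, 2), round(spread, 2))
```

With real data only one of these 5,000 samples would ever be observed, which is why the distribution has to be reasoned about rather than seen.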

1. They are approximately normal.
   This holds when data in the population are normally distributed, and even if they are not, assuming large n.

2. They are centered at $\mu$ of the population they are drawn from.
   The mean is unbiased.

3. Their standard deviation equals the standard deviation of the individual scores divided by the square root of the sample size (standard error of the mean):
   $SEM = \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

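Assume IQ: $\mu = 100$; $\sigma = 15$

Sampling Distribution of Sample Means if n = 25
Normal; $E(\bar{X}) = 100$; $\sigma_{\bar{X}} = \frac{15}{\sqrt{25}} = 3$

Sampling Distribution of Sample Means if n = 100
Normal; $E(\bar{X}) = 100$; $\sigma_{\bar{X}} = \frac{15}{\sqrt{100}} = 1.5$

Sampling Distribution of Sample Means if n = 400
Normal; $E(\bar{X}) = 100$; $\sigma_{\bar{X}} = \frac{15}{\sqrt{400}} = 0.75$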
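The three standard errors in the IQ example come from the same formula, the population standard deviation divided by the square root of n. A small sketch, with the hypothetical helper name `standard_error`:

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean for samples of size n."""
    return sigma / math.sqrt(n)

# IQ example from the text: population standard deviation of 15.
sems = {n: standard_error(15, n) for n in (25, 100, 400)}
print(sems)  # {25: 3.0, 100: 1.5, 400: 0.75}
```

Quadrupling the sample size only halves the standard error, because n enters through its square root.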

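n = 25
$z_{103} = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{103 - 100}{3} = 1.00$; p = .1587

n = 100
$z_{103} = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{103 - 100}{1.5} = 2.00$; p = .0228

n = 400
$z_{103} = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{103 - 100}{.75} = 4.00$; p < .0001

P-value: 0.05 level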
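The z-scores and p-values for a sample mean of 103 can be reproduced with the standard library. The sketch assumes the p-values in the text are one-tailed (the probability of a sample mean of 103 or higher); the helper `upper_tail_p` is a name introduced here for illustration.

```python
import math

def upper_tail_p(z):
    """P(Z >= z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

mu, sigma, observed_mean = 100, 15, 103   # IQ example from the text

results = {}
for n in (25, 100, 400):
    sem = sigma / math.sqrt(n)            # standard error of the mean
    z = (observed_mean - mu) / sem        # distance from mu in standard errors
    results[n] = (z, upper_tail_p(z))
    print(n, z, round(results[n][1], 4))
```

The same 3-point difference is unremarkable with 25 observations and very unlikely with 400, purely because the standard error shrinks.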

[Figure: sampling distributions of sample means for n = 25, n = 100, and n = 400. x-axis: Sample Means (90.0 to 110.0); y-axis: Probability (0 to 0.6).]

What Z score in a normal distribution separates the most extreme 5% of the scores from the middle-most 95% of the scores?

$\pm 1.96$

n = 25; standard error of the mean = 3.00

$\pm 1.96 = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{\bar{X} - 100}{3}$

$\bar{X} = 100 - 1.96 \times 3.00 = 94.12$

$\bar{X} = 100 + 1.96 \times 3.00 = 105.88$

[Figure: sampling distribution of sample means for n = 25. x-axis: Sample Means (90.0 to 110.0); y-axis: Probability (0 to 0.6).]

What Z score in a normal distribution separates the most extreme 5% of the scores from the middle-most 95% of the scores?

$\pm 1.96$

n = 100; standard error of the mean = 1.50

$\pm 1.96 = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{\bar{X} - 100}{1.50}$

$\bar{X} = 100 - 1.96 \times 1.50 = 97.06$

$\bar{X} = 100 + 1.96 \times 1.50 = 102.94$

[Figure: sampling distribution of sample means for n = 100. x-axis: Sample Means (90.0 to 110.0); y-axis: Probability (0 to 0.6).]

What Z score in a normal distribution separates the most extreme 5% of the scores from the middle-most 95% of the scores?

$\pm 1.96$

n = 400; standard error of the mean = 0.75

$\pm 1.96 = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{\bar{X} - 100}{.75}$

$\bar{X} = 100 - 1.96 \times .75 = 98.53$

$\bar{X} = 100 + 1.96 \times .75 = 101.47$
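The cutoffs that bound the middle 95% of sample means, worked out above for each sample size, all follow one pattern: the population mean plus or minus 1.96 standard errors. A short sketch, using the hypothetical helper name `middle_95_bounds`:

```python
import math

def middle_95_bounds(mu, sigma, n, z=1.96):
    """Sample means that cut off the middle 95% of the sampling distribution."""
    sem = sigma / math.sqrt(n)
    return mu - z * sem, mu + z * sem

# IQ example from the text: mu = 100, sigma = 15.
bounds = {n: middle_95_bounds(100, 15, n) for n in (25, 100, 400)}
for n, (low, high) in bounds.items():
    print(n, round(low, 2), round(high, 2))
# 25 94.12 105.88
# 100 97.06 102.94
# 400 98.53 101.47
```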

[Figure: sampling distribution of sample means for n = 400. x-axis: Sample Means (90.0 to 110.0); y-axis: Probability (0 to 0.6).]

Deriving the standard error of the mean
This section is for your own edification regarding why SEM = SD/sqrt(n).
You will not be tested on it.

If X is a random variable, var(X) is its variance.
Sum of two variables X1 and X2 = X1 + X2
Variance sum law (for independent X1 and X2):
$\mathrm{var}(X_1 + X_2) = \mathrm{var}(X_1) + \mathrm{var}(X_2)$
$\mathrm{var}(X_1 - X_2) = \mathrm{var}(X_1) + \mathrm{var}(X_2)$
Constant multiplication rule:
If I multiply a random variable X by 2, I get 2X
$\mathrm{var}(2X) = 2^2 \cdot \mathrm{var}(X)$
$\mathrm{var}(aX) = a^2 \, \mathrm{var}(X)$
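Both rules can be checked exactly on a small case. The sketch below uses two independent rolls of a fair six-sided die, an example chosen here for convenience and not taken from the slides, and computes variances with exact fractions so no rounding is involved.

```python
from fractions import Fraction
from itertools import product

def variance(values):
    """Population variance of equally likely outcomes, as an exact fraction."""
    values = [Fraction(v) for v in values]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

die = range(1, 7)
pairs = list(product(die, die))   # two independent rolls: 36 equally likely pairs

var_x = variance(die)                             # variance of one roll
var_sum = variance(a + b for a, b in pairs)       # var(X1 + X2)
var_diff = variance(a - b for a, b in pairs)      # var(X1 - X2)
var_2x = variance(2 * x for x in die)             # var(2X)

print(var_x, var_sum, var_diff, var_2x)
```

The sum and the difference have the same variance, twice that of a single roll, while doubling a roll multiplies its variance by four.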

Imagine I measure two subjects x1 and x2.
They are drawn from random variables X1 and X2, respectively.
I assume they come from identical distributions.
Their mean, or the sample mean, is (X1 + X2) / 2.
What is the variance of that sample mean?
This tells me how accurate the sample mean is.
Why? Sqrt(var) = standard deviation = how far off the true mean I typically am.

Find: $\mathrm{var}\left(\frac{X_1 + X_2}{2}\right)$

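Variance of sampling distribution (for mean)

Assume independent X1 and X2!

$\mathrm{var}\left(\frac{X_1 + X_2}{2}\right) = \left(\frac{1}{2}\right)^2 \left(\mathrm{var}(X_1) + \mathrm{var}(X_2)\right)$

Assume X1 & X2 have identical distribution, with same variance!

$\mathrm{var}(X_1) = \mathrm{var}(X_2) = \mathrm{var}(X)$

$\mathrm{var}\left(\frac{X_1 + X_2}{2}\right) = \left(\frac{1}{2}\right)^2 \cdot 2 \cdot \mathrm{var}(X) = \frac{2\,\mathrm{var}(X)}{4}$

Define: $\mathrm{var}(X_1) = \mathrm{var}(X_2) = \sigma_X^2$

$\mathrm{var}\left(\frac{X_1 + X_2}{2}\right) = \frac{\sigma_X^2}{2}$

Variance of sampling distribution for mean of 2 subjects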

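Standard deviation of sampling distribution (for mean)

$\mathrm{var}\left(\frac{X_1 + X_2}{2}\right) = \frac{\sigma_X^2}{2}$

$SD\left(\frac{X_1 + X_2}{2}\right) = \sqrt{\frac{\sigma_X^2}{2}} = \frac{\sigma_X}{\sqrt{2}}$

Std. of mean of 2 variables

$\mathrm{var}\left(\frac{X_1 + X_2 + \dots + X_n}{n}\right) = \frac{1}{n^2}\,\mathrm{var}(X_1 + X_2 + \dots + X_n) = \frac{1}{n^2}\left(\mathrm{var}(X_1) + \mathrm{var}(X_2) + \dots + \mathrm{var}(X_n)\right) = \frac{n\,\sigma_X^2}{n^2} = \frac{\sigma_X^2}{n}$

Std. of mean of n variables

$SD\left(\frac{X_1 + X_2 + \dots + X_n}{n}\right) = \frac{\sigma_X}{\sqrt{n}}$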

Each subject is a random variable --> n subjects

This means:

If $\hat{\sigma}$, or s, is our estimate of the standard deviation from the sample (the typical deviation of an individual from the sample mean), then

$\frac{s}{\sqrt{n}}$

is our estimate of how far off the sample mean is, on average, from the true population mean.

This is the standard error of the mean: our estimate of the standard deviation of the sampling distribution of means.

The standard deviation of the sampling distribution is called the standard error.

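No matter what we are measuring, the distribution of a sample mean (or percentage) across all possible samples we could take approximates a normal distribution, as long as the number of cases in each sample is about 30 or larger. The figure of 30 is a rule of thumb; strongly skewed populations may need larger samples.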

If we repeatedly drew samples from a population and calculated the mean of a variable or a percentage, those sample means or percentages would be approximately normally distributed.
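The claim can be illustrated with a population that is clearly not normal. The sketch below assumes an exponential population (strongly right-skewed, mean 1), an arbitrary choice for the example, and compares the skewness of individual scores with the skewness of means of samples of size 30. The `skewness` helper is written here for the illustration.

```python
import random
import statistics

rng = random.Random(0)

def skewness(xs):
    """Average cubed z-score: 0 for a symmetric shape, positive for a right tail."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

# Individual scores from the assumed skewed population.
individual_scores = [rng.expovariate(1.0) for _ in range(20000)]

# Means of 5,000 repeated samples, each of size 30.
means_n30 = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(30))
    for _ in range(5000)
]

print(round(skewness(individual_scores), 2), round(skewness(means_n30), 2))
```

The means are far less skewed than the raw scores, though not perfectly symmetric at n = 30, which is why the text says "approximately".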

The Central Limit Theorem


Standard error can be estimated from a single sample:

$SE = \frac{s}{\sqrt{n}}$

where
s is the sample standard deviation (i.e., the sample-based estimate of the standard deviation of the population), and
n is the size (number of observations) of the sample.
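Estimating the standard error from one sample takes only the sample standard deviation and the sample size. In the sketch below the nine scores are made up for the example and `estimated_standard_error` is a hypothetical helper name.

```python
import math
import statistics

def estimated_standard_error(sample):
    """Sample standard deviation (n - 1 denominator) divided by sqrt(n)."""
    s = statistics.stdev(sample)
    return s / math.sqrt(len(sample))

# Hypothetical sample of nine scores.
scores = [96, 104, 110, 89, 101, 99, 107, 94, 100]

se = estimated_standard_error(scores)
print(statistics.fmean(scores), round(se, 3))
```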

Confidence intervals
Because we know that the sampling distribution is approximately normal, we know that 95.45% of sample means will fall within two standard errors of the population mean.
95% of sample means fall within 1.96 standard errors.
99% of sample means fall within 2.58 standard errors.
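The multipliers quoted above can be recovered from the standard normal distribution, and the same multipliers give a confidence interval around a sample mean. In this sketch the sample summary (mean 102, s = 15, n = 100) is invented for illustration, and the normal multiplier is used as in the text; for small samples a t-based multiplier is usually preferred.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()                    # mean 0, standard deviation 1

z95 = std_normal.inv_cdf(0.975)              # leaves 2.5% in each tail
z99 = std_normal.inv_cdf(0.995)              # leaves 0.5% in each tail
within_two = std_normal.cdf(2) - std_normal.cdf(-2)
print(round(z95, 2), round(z99, 2), round(within_two * 100, 2))

def confidence_interval(mean, s, n, z):
    """Sample mean plus or minus z estimated standard errors."""
    se = s / math.sqrt(n)
    return mean - z * se, mean + z * se

# Hypothetical sample summary: mean 102, s = 15, n = 100.
low, high = confidence_interval(102, 15, 100, z95)
print(round(low, 2), round(high, 2))
```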

