Sampling and Sampling Distributions
When every person or item in the population is studied, it is called a
complete enumeration, or the census method.

The process of selecting a sample, that is, a portion of the elements of a
population or a process, by a specific method is called sampling.

Statisticians use the word population to refer not only to people but to all
items that have been chosen for study.

Statisticians use the word sample to describe a portion chosen from the
population.

Mathematically, we can describe samples and populations by using
measures such as the mean, median, mode, and standard deviation.
When these measures describe the characteristics of a sample, they are
called statistics.
When they describe the characteristics of a population, they are called
parameters.
A statistic is a characteristic of a sample; a parameter is a
characteristic of a population.

For example: A political analyst selects a specific or random
set of people for interviews to estimate the proportion of votes
that each candidate may get from the population of voters.

An auditor selects a sample of vouchers and calculates the
sample mean to estimate the population average amount.
• Sampling Error and Non-Sampling Errors
• Any statistical inference based on sample statistics may not
always be correct, because such results may not truly
estimate population features. This error is referred to as
sampling error: compared with results obtained by a
one-to-one analysis of every member of the population, the
sample statistics may provide a different estimate of the
population characteristic.
• Whenever a sample is drawn, by definition, only that part of
the population that is included in the sample is measured, and
is used to represent the entire population.
• Hence, there would always be some error in the data, resulting
from those members of the population who were not
measured.
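The idea of sampling error can be illustrated with a minimal sketch. The voucher amounts below are made-up data (an assumption, not taken from any real audit), and the names `population`, `sample` and `sampling_error` are purely illustrative. The sketch computes a parameter from the whole population, the matching statistic from a sample, and the difference between the two.

```python
import random
import statistics

# Hypothetical population: amounts on 1,000 vouchers (made-up data).
rng = random.Random(42)
population = [round(rng.uniform(100, 5000), 2) for _ in range(1000)]

# Parameter: a characteristic of the whole population (census method).
population_mean = statistics.mean(population)

# Statistic: the same measure computed on a sample of 50 vouchers.
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

# Sampling error: how far the statistic falls from the parameter.
sampling_error = sample_mean - population_mean

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
print(f"sampling error:              {sampling_error:.2f}")
```

In practice the parameter is unknown, which is why the sampling error can only be estimated; here it is computable only because the sketch has access to the full population.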
• Non-sampling Errors
• Non-sampling errors arise due to biases and mistakes such as
• (i) incomplete coverage of population members
• (ii) non-random selection of samples
• (iii) use of a faulty questionnaire for data collection
• (iv) wrong editing, coding, and presenting of the responses
received through the questionnaire.
• Sources of Non-sampling Errors
• Non-sampling errors can occur because of problems in
coverage, response, non-response, data processing, estimation
and analysis. Each of these types of errors is explained below.
• Coverage errors
• An error in coverage occurs when there is an omission,
duplication or wrongful inclusion of the units in the population
or sample. Omissions are referred to as undercoverage, while
duplication and wrongful inclusions are called overcoverage.
These errors are caused by defects in the survey frame:
inaccuracy, incompleteness, duplication, inadequacy and
obsolescence. Coverage errors may also occur in field
procedures (e.g., a survey is conducted, but the interviewer
misses several households or persons).
• Response errors
• Response errors result from the data that have been requested,
provided, received or recorded incorrectly. The response errors may
occur because of inefficiencies with the questionnaire, the
interviewer, the respondent or the survey process.
• Poor questionnaire design
• It is essential that sample survey or census questions are worded
carefully in order to avoid introducing bias. If questions are
misleading or confusing, then the responses may end up being
distorted.
• Interview bias
• An interviewer can influence how a respondent answers the survey
questions. This may occur when the interviewer is too friendly or
aloof or prompts the respondent. To prevent this, interviewers must
be trained to remain neutral throughout the interview. They must
also pay close attention to the way they ask each question. If an
interviewer changes the way a question is worded, it may impact the
respondent's answer.
• Respondent errors
• Respondents can also provide incorrect answers. Faulty
recollections, tendencies to exaggerate or underplay events,
and inclinations to give answers that appear more 'socially
desirable' are several reasons why a respondent may provide a
false answer.
• Problems with the survey process
• Errors can also occur because of a problem with the actual
survey process. Using proxy responses (taking answers from
someone other than the respondent) or lacking control over the
survey procedures are just a few ways of increasing the
possibility for response errors.
• Non-response errors
• Non-response errors are the result of not having obtained
sufficient answers to survey questions. There are two types of
non-response errors: complete and partial.
• Complete non-response errors
• These errors can occur when the survey fails to measure some
of the units in the selected sample. Reasons for this type of
error may be that the respondent is unavailable or temporarily
absent, the respondent is unable or refuses to participate in the
survey, or the dwelling is vacant. If a significant number of
people do not respond to a survey, then the results may be
biased since the characteristics of the non-respondents may
differ from those who have participated.
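A small numeric sketch may help show why non-response can bias results. The figures are invented for illustration: six people are selected, and the two who do not respond happen to work the longest hours. The variable names are hypothetical.

```python
import statistics

# Hypothetical selected sample: (weekly hours worked, responded?).
# The two non-respondents work the longest hours (made-up data).
selected = [(35, True), (38, True), (40, True), (42, True), (60, False), (65, False)]

all_hours = [hours for hours, _ in selected]
respondent_hours = [hours for hours, responded in selected if responded]

# Mean we would get if everyone answered (not observable in a real survey).
mean_if_all_answered = statistics.mean(all_hours)
# Mean the survey actually observes, from respondents only.
mean_of_respondents = statistics.mean(respondent_hours)

print(f"mean over the whole selected sample: {mean_if_all_answered:.2f}")  # 46.67
print(f"mean over respondents only:          {mean_of_respondents:.2f}")   # 38.75
```

Because the non-respondents differ systematically from the respondents, the observed mean understates the true figure for the selected sample.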
Sampling Method
Simple random sampling
• Simple or unrestricted random sampling is a sampling technique in which
the sample is drawn such that each and every unit in the population has an
equal chance of being included in the sample. In simple random sampling,
which items are selected is purely a matter of chance. Here, the word
random does not mean haphazard or hit-or-miss; it implies that chance
alone determines the items that are to be included in the sample.
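As a minimal sketch of simple random sampling, the snippet below uses Python's standard `random` module to draw units without replacement. The function name `simple_random_sample` and the population of unit IDs 1 to 100 are assumptions made for illustration.

```python
import random

def simple_random_sample(units, n, seed=None):
    """Draw n units without replacement; every unit is equally likely to be chosen."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return rng.sample(list(units), n)

population = list(range(1, 101))  # hypothetical unit IDs 1..100
sample = simple_random_sample(population, 10, seed=1)
print(sample)
```

Fixing the seed is useful for reproducing an example; in a real survey the draw would normally be left unseeded or driven by a documented random-number procedure.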
• Systematic Sampling
• Systematic sampling, sometimes called interval sampling,
means that there is a regular gap, or interval, between each
selection. This method is often used in industry, where an item
is selected for testing from a production line (say, every fifteen
minutes) to ensure that machines and equipment are working
to specification.
• For example, the manufacturer might decide to select every
20th item on a production line to test for defects and quality.
This technique requires the first item to be selected at random
as a starting point for testing and, thereafter, every 20th item is
chosen. Another example arises when questioning people in a
sample survey. A market researcher might select every 10th
person who enters a particular store, after selecting a person at
random as a starting point; or interview occupants of every 5th
house in a street, after selecting a house at random as a starting
point.
• Systematic Sampling

• In systematic sampling, elements are selected from the
population at a uniform interval that is measured in time,
order, or space.

• If we want to interview every twentieth student on a college
campus, we would choose a random starting point in the first
20 names in the student directory and then pick every
twentieth name thereafter.

• Systematic sampling differs from simple random sampling in
that each element has an equal chance of being selected but
each sample does not have an equal chance of being selected.
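The student-directory example can be sketched as follows. The directory of 200 names and the function name `systematic_sample` are hypothetical; the sketch picks a random start among the first k names and then takes every k-th name.

```python
import random

def systematic_sample(units, k, seed=None):
    """Pick a random start among the first k units, then every k-th unit after it."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random index in 0..k-1
    return units[start::k]

# Hypothetical student directory with 200 names.
directory = [f"student_{i:03d}" for i in range(1, 201)]
sample = systematic_sample(directory, k=20, seed=7)
print(sample)  # 10 names, spaced 20 apart
```

With k = 20 only 20 distinct samples can ever be drawn (one per starting point), whereas simple random sampling could produce any group of 10 names. This is one way to see why each element has an equal chance of selection but each possible sample does not.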
• Stratified Sampling
• A general problem with random sampling is that you could, by
chance, miss out a particular group of units in the sample.
However, if you form the population into groups, and a sample
contains units from each group, you can make sure the sample
is representative.
• Stratified random sampling is a random sampling method
which uses the available information about the population to
design a more efficient sample. In stratified random
sampling, the population to be sampled is sub-divided into a
number of sub-groups or sub-populations known as strata.

• The units within each stratum are homogeneous, and the
strata differ as widely as possible from one another.
• Stratified Sampling
• To use stratified sampling, we divide the population into relatively
homogeneous groups, called strata.
• Then we use one of two approaches.
• Either we select at random from each stratum a specified number of
elements corresponding to the proportion of that stratum in the
population as a whole, or we draw an equal number of elements from
each stratum and give weight to the results according to the
stratum's proportion of the total population.
• Stratified Sampling is appropriate when the population is already
divided into groups of different sizes and we wish to acknowledge
this fact.
• The advantage of stratified samples is that when they are properly
designed, they more accurately reflect characteristics of the
population from which they were chosen than do other kinds of
samples.
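The first of the two approaches, selecting from each stratum in proportion to its share of the population, might be sketched like this. The strata (three year groups of 500, 300 and 200 students) and the function name `proportional_stratified_sample` are assumptions for illustration only.

```python
import random

def proportional_stratified_sample(strata, n, seed=None):
    """Sample from every stratum in proportion to its share of the population.

    strata: dict mapping stratum name -> list of units.
    Rounding may make the total differ slightly from n in general.
    """
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        size = max(1, round(n * len(units) / total))  # at least one unit per stratum
        sample[name] = rng.sample(units, size)
    return sample

# Hypothetical strata: students grouped by year of study.
strata = {
    "first_year": [f"fy_{i}" for i in range(500)],
    "second_year": [f"sy_{i}" for i in range(300)],
    "final_year": [f"ly_{i}" for i in range(200)],
}
sample = proportional_stratified_sample(strata, n=50, seed=3)
print({name: len(units) for name, units in sample.items()})
# {'first_year': 25, 'second_year': 15, 'final_year': 10}
```

Under the second approach one would instead draw the same number of units from every stratum and weight each stratum's result by its population share when combining them.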
• Cluster Sampling
• Sometimes it is too expensive to spread a sample across the
population as a whole. Travel costs can become expensive if
interviewers have to survey people from one end of the
country to the other. To reduce costs, statisticians may choose
a cluster sampling technique.
• Cluster sampling divides the population into groups or
clusters. A number of clusters are selected randomly to
represent the total population, and then all units within
selected clusters are included in the sample. No units from
non-selected clusters are included in the sample—they are
represented by those from selected clusters. This differs from
stratified sampling, where some units are selected from each
group.
• Cluster Sampling

• In cluster sampling, we divide the population into groups, or
clusters, and then select a random sample of these clusters.

• With both stratified and cluster sampling, the population is
divided into well-defined groups.

• We use stratified sampling when each group has small variation
within itself but there is wide variation between the groups.

• We use cluster sampling when there is considerable variation
within each group but the groups are similar to each other.
• Examples of clusters are factories, schools and geographic
areas such as electoral subdivisions. The selected clusters are
used to represent the population.
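A minimal sketch of cluster sampling, assuming six hypothetical schools of 30 pupils each as the clusters: whole clusters are chosen at random, and every unit inside a chosen cluster enters the sample. The function name `cluster_sample` is illustrative.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and include every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sample cluster names
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical clusters: six schools with 30 pupils each.
clusters = {
    f"school_{c}": [f"school_{c}_pupil_{i}" for i in range(30)]
    for c in "ABCDEF"
}
chosen, units = cluster_sample(clusters, n_clusters=2, seed=5)
print(chosen, len(units))  # two school names and 60 pupils
```

No pupil from a non-selected school appears in `units`; those schools are represented by the selected ones, which is reasonable only if the schools resemble each other.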
• Multistage Sampling
• Multi-stage sampling is like the cluster method, except that it
involves picking a sample from within each chosen cluster,
rather than including all units in the cluster. This type of
sampling requires at least two stages. In the first stage, large
groups or clusters are identified and selected. These clusters
contain more population units than are needed for the final
sample.
• In the second stage, population units are picked from within
the selected clusters (using any of the possible probability
sampling methods) for a final sample. If more than two stages
are used, the process of choosing population units within
clusters continues until there is a final sample.
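The two stages described above can be sketched as follows, using simple random sampling at both stages (one possible choice among the probability methods). The districts, households and the function name `two_stage_sample` are hypothetical.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Two-stage sampling: random clusters first, then random units within each."""
    rng = random.Random(seed)
    # Stage 1: select clusters at random.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Stage 2: simple random sample of units within each selected cluster.
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

# Hypothetical clusters: eight districts with 100 households each.
districts = {
    f"district_{d}": [f"district_{d}_household_{i}" for i in range(100)]
    for d in range(1, 9)
}
sample = two_stage_sample(districts, n_clusters=3, n_per_cluster=10, seed=11)
print({name: len(households) for name, households in sample.items()})
```

Compared with the cluster method, only 10 of the 100 households in each chosen district are surveyed, which keeps the final sample small while still concentrating fieldwork in a few areas.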
Non-probability Sampling
• Convenience sampling

• Judgment sampling

• Quota sampling
Principles of Sampling
• Law of Statistical Regularity
• Law of Inertia of Large Numbers.
• The law of statistical regularity owes its origin to the
mathematical theory of probability. According to Conner, “The
law of statistical regularity lays down that a group of objects
chosen at random from a larger group tends to possess the
characteristics of that large group (universe).”
• In simple words, the law states that a sample drawn at
random from the population is likely to have the same
characteristics as the population.
• So the two important points are:
• i. Large sample size: As the sample size increases, the
sample is more likely to reveal characteristics similar to those
of the population and to provide reliable estimates of the
population parameter.

• ii. Random selection: The sample is to be selected from the
population at random. By random selection, we mean a
selection where each and every item of the population has an
equal chance of being selected in the sample. A randomly
selected sample tends to be representative of the population.
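The effect of sample size can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is 20,000 made-up values drawn from a normal distribution, and `mean_abs_error` is a hypothetical helper that averages, over 200 repeated random samples, how far the sample mean lands from the population mean.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population: 20,000 values (made-up data, mean near 50).
population = [rng.gauss(50, 10) for _ in range(20_000)]
population_mean = statistics.fmean(population)

def mean_abs_error(n, repeats=200):
    """Average distance between sample mean and population mean over many random samples."""
    errors = [
        abs(statistics.fmean(rng.sample(population, n)) - population_mean)
        for _ in range(repeats)
    ]
    return statistics.fmean(errors)

errors = {n: mean_abs_error(n) for n in (10, 100, 1000)}
for n, err in errors.items():
    print(f"n={n:>5}  average |sample mean - population mean| = {err:.3f}")
```

On average the gap shrinks as n grows, in line with the two points above; any single sample may still land further from the population mean than a smaller one did, since selection is a matter of chance.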
