
Sampling and Sampling Distributions

When every person or item in the population is studied, it is called a complete enumeration, or the census method.
The process of selecting a sample, or a portion of elements, from a population or a process using a specific method is called sampling.

Statisticians use the word population to refer not only to people but to all items that have been chosen for study.
Statisticians use the word sample to describe a portion chosen from the population.
Mathematically, we can describe samples and populations by using measures such as the mean, median, mode and standard deviation.
When these terms describe the characteristics of a sample, they are called statistics.
When they describe the characteristics of a population, they are called parameters.

A statistic is a characteristic of a sample; a parameter is a characteristic of a population.

For example: A political analyst selects a specific or random set of people for interviews to estimate the proportion of votes that each candidate may get from the population of voters.

An auditor selects a sample of vouchers and calculates the sample mean to estimate the average amount of all the vouchers in the population.
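A minimal Python sketch of the auditor example, assuming a simulated population of 1,000 voucher amounts (the numbers are invented for illustration, not real audit data). The population mean is the parameter; the mean of a random sample of 50 vouchers is a statistic used to estimate it.

```python
import random
import statistics

# Hypothetical population: amounts on 1,000 vouchers (simulated, not real data).
pop_rng = random.Random(42)
population = [round(pop_rng.uniform(100, 5000), 2) for _ in range(1000)]

# Parameter: a fixed characteristic of the whole population.
population_mean = statistics.fmean(population)

# Statistic: the same measure computed on a random sample of 50 vouchers.
sample_rng = random.Random(7)
sample = sample_rng.sample(population, 50)
sample_mean = statistics.fmean(sample)

print(f"Population mean (parameter): {population_mean:.2f}")
print(f"Sample mean (statistic):     {sample_mean:.2f}")
```

Changing the seed of sample_rng changes the sample mean but leaves the population mean untouched, which is the practical difference between a statistic and a parameter.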

Sampling Error and Non-Sampling Errors


Any statistical inference based on sample statistics may not always be correct, because such results may not truly estimate the population features. This error is referred to as sampling error: compared with the results obtained by a complete enumeration of every member of the population, the sample statistics may provide a different estimate of the population characteristic.
Whenever a sample is drawn, by definition, only that part of the population that is included in the sample is measured, and it is used to represent the entire population.

Hence, there will always be some error in the estimates, arising from those members of the population who were not measured.
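The sketch below, assuming a small hypothetical population (the integers 1 to 100), draws several simple random samples and reports the sampling error of each, defined here as the sample mean minus the population mean. The function name sampling_errors is illustrative only.

```python
import random
import statistics

def sampling_errors(population, n, repeats, seed=0):
    """Return the sampling error (sample mean - population mean) for each of
    `repeats` simple random samples of size n drawn without replacement."""
    rng = random.Random(seed)
    mu = statistics.mean(population)  # the population parameter
    return [statistics.mean(rng.sample(population, n)) - mu for _ in range(repeats)]

population = list(range(1, 101))  # hypothetical population
errors = sampling_errors(population, n=10, repeats=5)
print([round(e, 2) for e in errors])

# A "sample" containing every unit is a census: its sampling error is zero.
print(sampling_errors(population, n=100, repeats=3))
```

The errors differ from sample to sample because each sample omits different members of the population; only the census case removes sampling error entirely.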
Non-sampling Errors
Non-sampling errors arise due to biases and mistakes such as
(i) incomplete coverage of population members
(ii) non-random selection of samples
(iii) use of a faulty questionnaire for data collection
(iv) wrong editing, coding and presentation of the responses received through the questionnaire.

Sources of Non-sampling Errors


Non-sampling errors can occur because of problems in
coverage, response, non-response, data processing, estimation
and analysis. Each of these types of errors is explained below.
Coverage errors
An error in coverage occurs when there is an omission,
duplication or wrongful inclusion of the units in the population
or sample. Omissions are referred to as undercoverage, while
duplication and wrongful inclusions are called overcoverage.
These errors are caused by defects in the survey frame:
inaccuracy, incompleteness, duplication, inadequacy and
obsolescence. Coverage errors may also occur in field
procedures (e.g., a survey is conducted, but the interviewer
misses several households or persons).
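As an illustration of how coverage errors can be detected when a reliable reference list exists, the sketch below compares a survey frame against the population it is meant to cover. It assumes both are lists of unit identifiers; the function name coverage_check and the household IDs are hypothetical.

```python
from collections import Counter

def coverage_check(frame, population):
    """Compare a survey frame with the true population list.

    Returns three sets:
      undercoverage       - population units omitted from the frame
      duplicates          - units listed more than once in the frame
      wrongful_inclusions - frame units that do not belong to the population
    """
    counts = Counter(frame)
    listed = set(counts)
    pop = set(population)
    undercoverage = pop - listed
    duplicates = {unit for unit, c in counts.items() if c > 1}
    wrongful_inclusions = listed - pop
    return undercoverage, duplicates, wrongful_inclusions

population = ["H1", "H2", "H3", "H4"]   # households that actually exist
frame = ["H1", "H2", "H2", "H5"]        # an out-of-date survey frame
under, dup, wrong = coverage_check(frame, population)
print(under, dup, wrong)
```

Here H3 and H4 are undercoverage, while the duplicated H2 and the non-existent H5 are overcoverage.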

Response errors
Response errors result from data that have been requested,
provided, received or recorded incorrectly. Response errors may
occur because of deficiencies in the questionnaire, the
interviewer, the respondent or the survey process.
Poor questionnaire design
It is essential that sample survey or census questions are worded
carefully in order to avoid introducing bias. If questions are
misleading or confusing, then the responses may end up being
distorted.
Interviewer bias
An interviewer can influence how a respondent answers the survey
questions. This may occur when the interviewer is too friendly or
aloof or prompts the respondent. To prevent this, interviewers must
be trained to remain neutral throughout the interview. They must
also pay close attention to the way they ask each question. If an
interviewer changes the way a question is worded, it may impact the
respondent's answer.

Respondent errors
Respondents can also provide incorrect answers. Faulty
recollections, tendencies to exaggerate or underplay events,
and inclinations to give answers that appear more 'socially
desirable' are several reasons why a respondent may provide a
false answer.
Problems with the survey process
Errors can also occur because of a problem with the actual
survey process. Using proxy responses (taking answers from
someone other than the respondent) or lacking control over the
survey procedures are just a few ways of increasing the
possibility for response errors.
Non-response errors
Non-response errors are the result of not having obtained
sufficient answers to survey questions. There are two types of
non-response errors: complete and partial.

Complete non-response errors


These errors can occur when the survey fails to measure some
of the units in the selected sample. Reasons for this type of
error may be that the respondent is unavailable or temporarily
absent, the respondent is unable or refuses to participate in the
survey, or the dwelling is vacant. If a significant number of
people do not respond to a survey, then the results may be
biased since the characteristics of the non-respondents may
differ from those who have participated.
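The simulation below is a minimal sketch of how complete non-response can bias results when non-respondents differ from respondents. The population, the two groups and the response rates (90% and 30%) are invented assumptions chosen only to make the effect visible.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population of 10,000 people: group "A" works about 40 hours a
# week; group "B" (often away from home) works about 55 hours a week.
population = ([("A", rng.gauss(40, 5)) for _ in range(6000)]
              + [("B", rng.gauss(55, 5)) for _ in range(4000)])
true_mean = statistics.fmean(hours for _, hours in population)

# Draw a simple random sample of 1,000 people.
sample = rng.sample(population, 1000)

# Assumed response rates: group B is much harder to reach.
response_rate = {"A": 0.9, "B": 0.3}
respondents = [hours for group, hours in sample if rng.random() < response_rate[group]]
respondent_mean = statistics.fmean(respondents)

print(f"True population mean: {true_mean:.1f}")
print(f"Respondents' mean:    {respondent_mean:.1f}")
```

Because the hard-to-reach group is under-represented among respondents, the respondents' mean falls well below the true mean; drawing a larger sample with the same response pattern would not remove this bias.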

Sampling Method

Simple random sampling

Simple or unrestricted random sampling is a sampling technique in which the sample is drawn in such a way that each and every unit in the population has an equal chance of being included in the sample. In simple random sampling, the selection of an item is purely a matter of chance. Here, the word random does not mean haphazard or hit-or-miss; it implies that chance alone determines which items are included in the sample.
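In Python, the standard library's random.sample draws units without replacement so that every unit has the same chance of selection, which matches the definition above. The wrapper name simple_random_sample and the population of 500 numbered units are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw a simple random sample of n units without replacement.
    Every unit has the same chance of being included; a seed makes the
    draw reproducible."""
    rng = random.Random(seed)
    return rng.sample(population, n)

units = list(range(1, 501))  # hypothetical population of 500 numbered units
print(simple_random_sample(units, 10, seed=7))
```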

Systematic Sampling
Systematic sampling, sometimes called interval sampling,
means that there is a regular gap, or interval, between each
selection. This method is often used in industry, where an item
is selected for testing from a production line (say, every fifteen
minutes) to ensure that machines and equipment are working
to specification.
For example, the manufacturer might decide to select every
20th item on a production line to test for defects and quality.
This technique requires the first item to be selected at random
as a starting point for testing and, thereafter, every 20th item is
chosen. Another example arises when questioning people in a
sample survey: a market researcher might select every 10th
person who enters a particular store, after selecting a person at
random as a starting point; or interview occupants of every 5th
house in a street, after selecting a house at random as a starting
point.
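A minimal sketch of systematic sampling, assuming the random starting point is chosen from among the first k units (a common convention; the text only says the first item is chosen at random). The function name systematic_sample and the production run of 200 numbered items are hypothetical.

```python
import random

def systematic_sample(population, k, seed=None):
    """Select every k-th unit after a random start.

    Assumption: the start is drawn at random from positions 0..k-1."""
    if k < 1:
        raise ValueError("interval k must be at least 1")
    rng = random.Random(seed)
    start = rng.randrange(k)       # random starting position in 0..k-1
    return population[start::k]    # then every k-th unit

# Hypothetical production run: items numbered 1..200, test every 20th item.
items = list(range(1, 201))
sample = systematic_sample(items, k=20, seed=3)
print(sample)
```

Only the starting point is random; once it is fixed, the rest of the sample is determined by the interval k.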

Stratified Sampling
A general problem with random sampling is that you could, by
chance, miss out a particular group of units in the sample.
However, if you form the population into groups, and a sample
contains units from each group, you can make sure the sample
is representative.
Stratified random sampling is a random sampling method which uses the available information about the population to design a more efficient sample. In stratified random sampling, the population to be sampled is sub-divided into a number of sub-groups, or sub-populations, known as strata. The units within each stratum should be as homogeneous as possible, while the strata should differ from one another as widely as possible. A random sample is then drawn from each stratum.
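The sketch below assumes proportional allocation, where each stratum's share of the sample matches its share of the population; the text does not prescribe an allocation rule, so treat this as one illustrative choice. The function name stratified_sample and the urban/semi-urban/rural strata are hypothetical.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Stratified random sampling with proportional allocation (assumed rule).

    strata: dict mapping stratum name -> list of units
    n: total sample size
    Returns a dict mapping stratum name -> simple random sample from it."""
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(n * len(units) / total)  # share proportional to stratum size
        sample[name] = rng.sample(units, n_h)
    return sample

strata = {
    "urban": [f"U{i}" for i in range(600)],
    "semi-urban": [f"S{i}" for i in range(300)],
    "rural": [f"R{i}" for i in range(100)],
}
sample = stratified_sample(strata, n=50, seed=11)
print({name: len(units) for name, units in sample.items()})
```

Because every stratum contributes units, no group can be missed by chance. Rounding can make the stratum sizes sum to slightly more or less than n, and other allocation rules exist.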

Cluster Sampling
Sometimes it is too expensive to spread a sample across the
population as a whole. Travel costs can become expensive if
interviewers have to survey people from one end of the
country to the other. To reduce costs, statisticians may choose
a cluster sampling technique.
Cluster sampling divides the population into groups or
clusters. A number of clusters are selected randomly to
represent the total population, and then all units within
selected clusters are included in the sample. No units from
non-selected clusters are included in the sample; they are
represented by those from the selected clusters. This differs from
stratified sampling, where some units are selected from each
group.

Examples of clusters are factories, schools and geographic areas such as electoral subdivisions. The selected clusters are used to represent the population.
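A minimal sketch of one-stage cluster sampling, assuming the population is stored as a dict from cluster name to its list of units. The function name cluster_sample and the eight schools of 30 students are hypothetical.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Randomly select m clusters and include every unit in each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)  # sorted() gives a stable order to draw from
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical population: 8 schools (clusters), each with 30 students.
schools = {f"school_{i}": [f"s{i}_{j}" for j in range(30)] for i in range(1, 9)}
sample = cluster_sample(schools, m=3, seed=5)
print(sorted(sample), sum(len(units) for units in sample.values()))
```

Randomness enters only at the cluster level; within a selected school every student is included.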
Multistage Sampling
Multi-stage sampling is like the cluster method, except that it
involves picking a sample from within each chosen cluster,
rather than including all units in the cluster. This type of
sampling requires at least two stages. In the first stage, large
groups or clusters are identified and selected. These clusters
contain more population units than are needed for the final
sample.
In the second stage, population units are picked from within
the selected clusters (using any of the possible probability
sampling methods) for a final sample. If more than two stages
are used, the process of choosing population units within
clusters continues until there is a final sample.
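A two-stage version of the previous idea can be sketched as follows, assuming simple random sampling at both stages (the text allows any probability method in the second stage). The function name two_stage_sample and the school data are hypothetical.

```python
import random

def two_stage_sample(clusters, m, n_per_cluster, seed=None):
    """Stage 1: randomly select m clusters.
    Stage 2: draw a simple random sample of n_per_cluster units within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

# Hypothetical population: 8 schools (clusters), each with 30 students.
schools = {f"school_{i}": [f"s{i}_{j}" for j in range(30)] for i in range(1, 9)}
sample = two_stage_sample(schools, m=3, n_per_cluster=5, seed=5)
for school, students in sample.items():
    print(school, students)
```

Compared with cluster sampling, only a subset of each selected cluster is surveyed, so the final sample here has 15 students rather than 90.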

Non-probability Sampling
Convenience sampling
Judgment sampling
Quota sampling

Principles of Sampling
Law of Statistical Regularity
Law of Inertia of Large Numbers
The law of statistical regularity owes its origin to the
mathematical theory of probability. According to Conner, "the
law of statistical regularity lays down that a group of objects
chosen at random from a larger group tends to possess the
characteristics of that large group (universe)."
In simple words, the law states that a sample drawn at random
from the population is likely to have the same characteristics
as the population.

So the two important points are:


i. Large sample size: As the sample size increases, the
sample is more likely to reveal characteristics similar to those
of the population and to provide reliable estimates of the
population parameters.
ii. Random selection: The sample is to be selected from the
population at random. By random selection, we mean a
selection where each and every item of the population has an
equal chance of being selected in the sample. A randomly
selected sample is likely to be representative of the population.
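The two points can be seen together in a short simulation, sketched under the assumption of a hypothetical, deliberately skewed population: random samples of increasing size are drawn, and the typical gap between the sample mean and the population mean shrinks as n grows. The function name mean_abs_error is illustrative.

```python
import random
import statistics

def mean_abs_error(population, n, repeats=500, seed=0):
    """Average absolute difference between the sample mean and the population
    mean over `repeats` random samples of size n (without replacement)."""
    rng = random.Random(seed)
    mu = statistics.fmean(population)
    return statistics.fmean(
        abs(statistics.fmean(rng.sample(population, n)) - mu) for _ in range(repeats)
    )

# Hypothetical skewed population of 20,000 values with a mean of about 50.
pop_rng = random.Random(123)
population = [pop_rng.expovariate(1 / 50) for _ in range(20000)]

for n in (10, 100, 1000):
    print(f"n = {n:4d}: average |sample mean - population mean| = "
          f"{mean_abs_error(population, n):.2f}")
```

Even though the population is far from symmetric, random samples of larger size track its mean more closely, which is the behaviour the law of statistical regularity and the law of inertia of large numbers describe.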