
TYBMS Prof. Hemant Kombrabail

SAMPLING

SOME BASIC TERMS


1. Population – In statistical usage the term population is applied to any finite or
infinite collection of individuals. It has displaced the older term universe, which is
derived from the universe of discourse of logic. It is practically synonymous with
aggregate and does not necessarily refer to a collection of living organisms.
2. Census - The complete enumeration of a population or groups at a point in time with
respect to well-defined characteristics such as population, production, traffic on
particular roads. In some contexts the term is associated with the data collected
rather than the extent of the collection, so that the term Sample Census has a distinct
meaning. The partial enumeration resulting from a failure to cover the whole
population, as distinct from a designed sample enquiry, may be referred to as an
'incomplete census'.
3. Sample - A part of a population, or a subset from a set of units, which is provided by
some process or other, usually by deliberate selection with the object of investigating
the properties of the parent population or set.
4. Sample survey – A survey which is carried out using a sampling method, i.e. one in
which only a portion, and not the whole population, is surveyed.
5. Sampling unit - One of the units into which an aggregate is divided or regarded as
divided for the purposes of sampling, each unit being regarded as individual and
indivisible when the selection is made. The definition of unit may be made on some
natural basis, for example, households, persons, units of product, tickets, etc., or on
some arbitrary basis, e.g. areas defined by grid coordinates on a map. In the case of
multi-stage sampling the units are different at different stages of sampling, being
'large' at the first stage and growing progressively smaller with each stage in the
process of selection. The term sample unit is sometimes used in a synonymous sense.
6. Sampling Frame - A list, map or other specification of the units, which constitute the
available information relating to the population designated for a particular sampling
scheme. There is a frame corresponding to each stage of sampling in a multi-stage
sampling scheme. The frame may or may not contain information about the size or
other supplementary information of the units, but it should have enough details so that
a unit, if included in the sample, may be located and taken up for inquiry. The nature
of the frame exerts a considerable influence over the structure of a sample survey. It is
rarely perfect, and may be inaccurate, incomplete, inadequately described, out of date
or subject to some degree of duplication. Reasonable reliability in the frame is a
desirable condition for the reliability of a sample survey based on it. In multi-stage
sampling it is sometimes possible to construct the frame at higher stages during the
progress of the sample survey itself. For example, certain first stage units may be
selected in the first instance, and then more detailed lists or maps be constructed by
compilation of available information or by direct observation only of the first-stage
units actually selected.
7. Sampling design - A sample design is a definite plan for obtaining a sample from the
sampling frame. It refers to the technique or the procedure the researcher would adopt
in selecting some sampling units from which inferences about the population are
drawn. Sampling design is determined before any data are collected.
8. Statistic(s) and parameter(s) - A statistic is a characteristic of a sample, whereas a
parameter is a characteristic of a population. Thus, when we work out certain
measures such as mean, median, mode, etc. from samples, they are called
statistic(s), for they describe the characteristics of a sample. But when such measures
describe the characteristics of a population, they are known as parameter(s). For
instance, the population mean (µ) is a parameter, whereas the sample mean (X̄) is a
statistic. To obtain the estimate of a parameter from a statistic constitutes the prime
objective of sampling analysis.
9. Sampling error - That part of the difference between a population value and an
estimate thereof, derived from a random sample, which is due to the fact that only a
sample of values is observed, as distinct from errors due to imperfect selection, bias
in response or estimation, errors of observation and recording, etc. The totality of
sampling errors in all possible samples of the same size generates the sampling
distribution of the statistic which is being used to estimate the parent value.
10. Precision - Precision is the range within which the population average (or other
parameter) will lie, in accordance with the reliability specified in the confidence level,
expressed either as a percentage of the estimate (±) or as a numerical quantity. For
instance, if the estimate is Rs. 4000 and the precision desired is ± 4%, then the true
value will be no less than Rs. 3840 and no more than Rs. 4160. This is the range
(Rs. 3840 to Rs. 4160) within which the true answer should lie. But if we desire that the
estimate should not deviate from the actual value by more than Rs. 200 in either
direction, then the range would be Rs. 3800 to Rs. 4200.
11. Confidence level and Significance level - The confidence level or reliability is the
expected percentage of times that the actual value will fall within the stated precision
limits. Thus, if we take a confidence level of 95%, then we mean that there are 95
chances in 100 (or .95 in 1) that the sample results represent the true condition of the
population within a specified precision range against 5 chances in 100 (or .05 in 1) that
it does not. Precision is the range within which the answer may vary and still be
acceptable; confidence level indicates the likelihood that the answer will fall within that
range, and the significance level indicates the likelihood that the answer will fall
outside that range. We can always remember that if the confidence level is 95%, then
the significance level will be (100 - 95), i.e. 5%; if the confidence level is 99%, the
significance level is (100 - 99), i.e. 1%, and so on. We should also remember that the
area of normal curve within precision limits for the specified confidence level
constitutes the acceptance region and the area of the curve outside these limits in either
direction constitutes the rejection regions.
12. Sampling distribution - We are often concerned with sampling distribution in
sampling analysis. If we take a certain number of samples and for each sample compute
various statistical measures such as mean, standard deviation, etc., then we find
that each sample may give its own value for the statistic under consideration. All such
values of a particular statistic, say the mean, together with their relative frequencies
constitute the sampling distribution of that statistic. Accordingly, we can have the
sampling distribution of the mean, the sampling distribution of the standard
deviation, or the sampling distribution of any other statistical measure. It may be noted
that each item in a sampling distribution is a particular statistic of a sample. The
sampling distribution of the mean tends towards the normal distribution as the size of
each sample becomes large. The significance of the sampling distribution follows from
the fact that, under random sampling, the mean of the sampling distribution of the
sample mean is the same as the mean of the universe. Thus, the mean of the sampling
distribution can be taken as the mean of the universe.
13. Bias - Generally, an effect which deprives a statistical result of representativeness by
systematically distorting it, as distinct from a random error which may distort on any
one occasion but balances out on the average.
14. Biased sample - A sample obtained by a biased sampling process, that is to say, a
process which incorporates a systematic component of error, as distinct from random
error which balances out on the average. Non-random sampling is often, though not
inevitably, subject to bias, particularly when entrusted to subjective judgment on the
part of human beings.
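
Terms 8, 9 and 12 above (statistic and parameter, sampling error, sampling distribution) can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population is made up (5,000 skewed values), the sample size of 40 and the 2,000 repetitions are arbitrary choices, and the names `population`, `mu` and `sample_means` are hypothetical.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical, deliberately skewed population of 5,000 values.
population = [random.expovariate(1 / 50) for _ in range(5_000)]
mu = statistics.mean(population)  # parameter: computed from every unit


def sample_means(units, n, repeats):
    """Draw `repeats` simple random samples of size n and return their means."""
    return [statistics.mean(random.sample(units, n)) for _ in range(repeats)]


# Each entry is one statistic (a sample mean); together they approximate
# the sampling distribution of the mean for samples of size 40.
means = sample_means(population, n=40, repeats=2_000)

print(f"population mean (parameter)    = {mu:.2f}")
print(f"first sample mean (statistic)  = {means[0]:.2f}")
print(f"sampling error of first sample = {means[0] - mu:.2f}")
print(f"mean of all 2,000 sample means = {statistics.mean(means):.2f}")
print(f"std. dev. of the sample means  = {statistics.stdev(means):.2f}")
```

Individual sample means differ from the population mean (sampling error), but their average should lie close to it, which is the property described in term 12.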

CENSUS SURVEY AND SAMPLE SURVEY:


Census survey means a survey or complete enumeration of a population with certain
objectives. The Government of India conducts such a census survey every ten years.
The entire geographical area and the entire population are covered in a census survey.
The data collected are tabulated and published as a census report. Such census data are
used for different purposes, including economic planning and policy decisions. A census
survey is a costly and time-consuming activity and also needs a huge organization and
manpower for its orderly conduct. In commercial research, such a census survey is not
conducted due to various constraints, particularly those relating to funds, time and manpower.

Census implies collection of information from each element of the group or population of
interest (e.g. a survey of industrial consumers). In many cases, complete enumeration is
not possible and the only alternative available is sampling.

Sample survey is the survey of a small representative part of the population taken up for
detailed scrutiny and study. A sample is a small representative part of the whole, and
conclusions drawn from a properly selected sample can be applied to the entire
population within the limits of sampling error. A sample survey gives the benefits of a
census survey but with less time, expenditure and manpower. It is a practical substitute
for a census survey. Sample surveys are commonly conducted in marketing research
projects and give promising results.

A survey which is carried out using a sampling method, i.e. using a representative portion
of the whole population, is called a sample survey. It is a short-cut alternative to a census
survey but gives similar benefits.

REASONS FOR IMPRACTICALITY OF CENSUS


There are certain reasons that make census impractical or even impossible. The reasons
are as follows:
1. Cost: Cost is an obvious constraint on the determination of whether a census should be
taken. If information is desired on grocery purchase and use behavior (frequencies and
amounts of purchase of each product category, average amount kept at home and the
like) and the population of interest is all households in a country, the cost will
preclude a census being taken. Thus a sample is the only logical way of obtaining new
data from a population of this size.
2. Time: The kind of cost we have just considered is an outlay cost. The time involved in
obtaining information from either a census or a sample involves the possibility of also
incurring an opportunity cost. That is, delaying the decision until information is obtained
may result in a smaller gain or a larger loss than would have been the case from making
the same decision earlier. The opportunity to make more (or save more, as the case may
be) is, therefore, foregone.
3. Accuracy: A study using a census, by definition, contains no sampling error. A study
using a sample may involve sampling error in addition to other types of error. Other
things being equal, a census will provide more accurate data than a sample. However,
it has been argued that a more accurate estimate of the population of a country could
be made from a sample than from a census. Taking a census of a population on a
"mail out - mail back" basis requires that the names and addresses of almost all
households be obtained, census questionnaires mailed, and interviews conducted of
those not responding. The questionnaires are sent to a population of which only about
half have completed high school. The potential for errors in a returned questionnaire
is therefore high. A smaller sample survey, by contrast, can be more closely supervised,
which may reduce such non-sampling errors.
4. Destructive nature of the measurement: Measurements are sometimes destructive in
nature. When they are, it is apparent that taking a census would usually defeat the
purpose of a measurement. If one were producing firecrackers, electrical fuses, or grass
seed, performing a functional use test on all products for quality control purposes
would not be considered from an economic standpoint. A sample is then the only
practical choice. On the other hand, if light bulbs, bicycles, or electrical appliances
are to be tested, a 100% sample (census) may be entirely reasonable.

According to Crisp R. D., the fundamental idea of sampling is that if a small number of
items or parts (called a sample) are chosen at random from a large number of items or a
whole (called a universe or population), the sample will tend to have the same
characteristics, and in approximately the same proportion, as the universe.

FEATURES OF SAMPLING
(1) A sample is a small representative part of the whole. Sampling is an effective
alternative to the census survey.
(2) Sampling reduces the time, effort and money spent by the researcher on data collection
without any adverse effect on its quality.
(3) The sampling technique is based on the assumption that a sample selected at random
from the universe possesses the same features and characteristics as the
universe.
(4) The findings of a sample survey are reasonably accurate and reliable. A larger sample
is generally better, as the results obtained are more accurate.
(5) Sampling is used in data collection as well as for different purposes in our daily life.
(6) The concept of sampling is quite common and popular in marketing research as it
helps researchers to finalize their findings and recommendations within a short period.

FEATURES / ATTRIBUTES OF A GOOD / RELIABLE SAMPLE


(1) Goal-oriented: A sample design should be goal oriented. It is a means to an end and
should be oriented to the research objectives and fitted to the survey conditions.
(2) Accurate representative of the universe: A sample should be an accurate
representative of the universe from which it is taken. There are different methods for
selecting a sample. It will be truly representative only when it represents all types of units
or groups in the total population in fair proportions. In brief sample should be selected
carefully as improper sampling is a source of error in the survey.
(3) Proportional: A sample should be proportional. It should be large enough to
represent the universe properly. The sample size should be sufficiently large to provide
statistical stability or reliability. The sample size should give accuracy required for the
purpose of particular study.
(4) Random selection: A sample should be selected at random. This means that any item
in the group has a full and equal chance of being selected and included in the sample.
This makes the selected sample truly representative in character.
(5) Economical: A sample should be economical. The objectives of the survey should be
achieved with minimum cost and effort.
(6) Practical: A sample design should be practical. The sample design should be simple
i.e. it should be capable of being understood and followed in the fieldwork.
(7) Actual information provider: A sample should be designed so as to provide actual
information required for the study and also provide an adequate basis for the
measurement of its own reliability.

In brief, a good sample should be truly representative in character. It should be selected at
random and should be adequately proportional. These, in fact, are the attributes of a good
sample.

ADVANTAGES OF SAMPLING METHOD:


(1) Saves time and money: Sampling facilitates primary data collection easily / quickly
and with less cost. It is time saving and economical method of survey for data collection.

(2) Provides reliable data: The conclusions drawn from the sample survey are reliable,
accurate and also applicable to the whole population/universe. Sampling has no adverse
effect on the quality of data collected. It gives quality results with lesser volume of work.
(3) Scientific base: The concept of sampling has scientific backing as it is based on the
law of statistical regularity and the law of inertia of large numbers.
(4) Facilitates better supervision on data collection: Sampling method is restricted to
limited number of respondents. Naturally effective monitoring and supervision on the
data collection work is possible. This improves the quality of data collected.

LIMITATIONS OF SAMPLING METHOD:


(1) Findings are not completely accurate: The findings of the sampling method are
reasonably accurate but not completely accurate. The findings and conclusions drawn
from a sample survey may be less accurate than those available
from the census technique, in which the entire population is covered.
(2) Findings may not be reliable: The findings may not be reliable if the sample
selected is too small or is not adequately representative in character. In such cases the
conclusions drawn may be misleading and this may affect the quality of research
work.
(3) Difficulties in the selection of representative sample: There are many practical
difficulties in the selection of a representative sample. This may defeat the very purpose
of sampling.
(4) Data collection difficult in the case of large sample: Data collection becomes
difficult when a large sample is chosen. This also requires more time and money
for data collection.

A sample survey is a better alternative to the census or complete investigation, which is
lengthy and also costly. For example, census reports are published by the Government
two or three years after the collection of data. However, survey reports (based on
samples) can be prepared and published within a few months. Thus, sampling is a widely
used methodology in marketing research (MR). It is one vital element of research design.

STEPS IN SAMPLING PROCESS:


Having looked into the major advantages and limitations of sampling, we now turn to the
sampling process. It is the procedure required right from defining a population to the
actual selection of sample elements. There are seven steps involved in this process.
Step 1: Define the population
It is the aggregate of all the elements defined prior to selection of the sample. It is
necessary to define population in terms of
(i) elements
(ii) sampling units
(iii) extent
(iv) time.

A few examples are given here.

If we were to conduct a survey on the consumption of tea in Gujarat, then these
specifications might be as follows:
(i) Element: Housewives
(ii) Sampling units: Households, then housewives
(iii) Extent: Gujarat State
(iv) Time: January 1-10, 1999

If we were to monitor the sales of a product recently introduced by us, the population
might be:
(i) Element: Our product
(ii) Sampling units: Retail outlets, super markets, then our product
(iii) Extent: Delhi and New Delhi
(iv) Time: January 7-14, 1999

It may be emphasized that all these four specifications must be contained in the
designated population. Omission of any of them would render the definition of population
incomplete.

Step 2: Identify the sampling frame


The sampling frame could be a telephone directory, a list of blocks and
localities of a city, a map or any other list consisting of all the sampling units. It may be
pointed out that if the frame is incomplete or otherwise defective, sampling will not be
able to overcome these shortcomings.

The question is: how can one ensure that the frame is perfect and free from any defect?
Leslie Kish has observed that a perfect frame is one where "every element appears on the
list separately, once, only once, and nothing else appears on the list". This type of perfect
frame would indicate one-to-one correspondence between frame units and sampling units.
But such perfect frames are rather rare. Accordingly, one has to use frames with one
deficiency or another, but one should ensure that the frame is not so deficient that it has
to be given up altogether.

This raises a pertinent question: what are the criteria for a suitable frame? In order to
examine the suitability or otherwise of a sampling frame, a number of questions need to be
asked. These are:
1. Does it adequately cover the population to be surveyed?
2. How complete is the frame? Is every unit that should be included represented?
3. Is it accurate? Is the information about each individual unit correct? Does the frame as a
whole contain units which no longer exist?
4. Is there any duplication? If so, then the probability of selection is disturbed as a unit can
enter the sample more than once.
5. Is the frame up-to-date? It could have met all the criteria when compiled but could well
be deficient when it came to be used. This could well be true of all frames involving the
human population as change is taking place continuously.
6. How convenient is it to use? Is it readily accessible? Is it arranged in a way suitable for
sampling? Can it easily be re-arranged so as to enable us to introduce stratification and
to undertake multi-stage sampling?
These are demanding criteria and it is most unlikely that any frame would meet them all.
Nevertheless, they are the factors to be borne in mind whenever we undertake random
sampling.
In marketing research most of the frames are from census reports, electoral registers, lists
of member units of trade and industry associations, lists of members of professional
bodies, lists of dwelling units maintained by local bodies, returns from an earlier survey
and large scale maps.
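
Some of the frame checks listed above (completeness, units that no longer exist, duplication) can be sketched in code. This is a minimal sketch that assumes an independent, up-to-date listing of the population is available for comparison, which in practice is often not the case. The function name `audit_frame` and the household identifiers are hypothetical.

```python
from collections import Counter


def audit_frame(frame_ids, known_population_ids):
    """Report duplicated, missing, and obsolete units in a sampling frame."""
    counts = Counter(frame_ids)
    frame_set = set(counts)
    population = set(known_population_ids)
    return {
        # Units listed more than once have an inflated chance of selection.
        "duplicates": sorted(u for u, c in counts.items() if c > 1),
        # Units in the population but absent from the frame cannot be selected.
        "missing": sorted(population - frame_set),
        # Units on the frame that no longer exist in the population.
        "obsolete": sorted(frame_set - population),
    }


# Hypothetical frame and current listing of households.
frame = ["H01", "H02", "H02", "H04", "H09"]
current_households = ["H01", "H02", "H03", "H04"]
print(audit_frame(frame, current_households))
```

For the made-up data, H02 is duplicated, H03 is missing from the frame and H09 is obsolete.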

Step 3: Specify the sampling unit


The sampling unit is the basic unit containing the elements of the target population. The
sampling unit may be different from the element. For example, if one wanted a sample of
housewives, it might be possible to have access to such a sample directly. However, it is
easier to select households as the sampling unit and then interview housewives in each of
the households.

As mentioned in the preceding step, the sampling frame should be complete and accurate;
otherwise the selection of the sampling unit might be defective. It is necessary to get a
further specification of the sampling unit both in personal interviews and in telephone
interviews. Thus, in personal interviews, a pertinent question is—of the several persons in
a household, who should be interviewed? If interviews were held during office timings
when the heads of families and other employed persons are away, interviewing would
under-represent employed persons and over-represent elderly persons, housewives and
the unemployed. In view of these considerations, it is necessary to have a random process
of selection of the adult residents of each household. One method that could be used for
this purpose is to list all the eligible persons living at a particular address and then select
one of them.
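
The listing method described above can be sketched as follows. The age threshold of 18 for an "eligible" adult, the record layout and the function name `pick_respondent` are assumptions made for this illustration only.

```python
import random


def pick_respondent(household_members, min_age=18, rng=random):
    """List the eligible persons at an address, then pick one by chance."""
    eligible = [m for m in household_members if m["age"] >= min_age]
    if not eligible:
        return None  # nobody eligible; the sampling plan must say what to do
    return rng.choice(eligible)


# Hypothetical household of four persons.
household = [
    {"name": "A", "age": 46},
    {"name": "B", "age": 43},
    {"name": "C", "age": 19},
    {"name": "D", "age": 12},
]
print(pick_respondent(household))
```

Because the choice is made by the random number generator and not by the interviewer, whoever happens to be at home during office hours has no special chance of selection.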

Step 4: Specify the sampling method


It indicates how the sample units are selected. One of the most important decisions in this
regard is to determine which of the two types of sample, probability or non-probability,
is to be chosen.

In case of a probability sample, the probability or chance of every unit in the population
being in the sample is known. Further, the selection of specific units in the sample
depends entirely on chance. No substitution of one unit for another is permissible. This
means that no human judgment is involved in the selection of a sample. In contrast, in a
non-probability sample, the probability of inclusion of any unit in the population in the
sample is not known. In addition, the selection of units within a sample involves human
judgment rather than pure chance.

In case of a probability sample, it is possible to measure the sampling error and thereby
determine the degree of precision in the estimates with the help of the theory of
probability. This theory also enables us to consider, from amongst the various possible
sample designs, the one that will give the maximum information per rupee. This is not
possible when a non-probability sample is used.
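
As a rough illustration of measuring sampling error from a probability sample, the sketch below estimates a mean together with its precision limits. It assumes a simple random sample that is large enough for the normal approximation, and it uses the usual standard error s/√n. The data are simulated and the function name `estimate_with_precision` is hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist


def estimate_with_precision(sample, confidence=0.95):
    """Return (mean, margin), where margin = z * s / sqrt(n)."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided z-value: the significance level is split across both tails.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return mean, z * std_error


random.seed(11)
# Hypothetical simple random sample of 400 household expenditures (Rs.).
sample = [random.gauss(4000, 1000) for _ in range(400)]
mean, margin = estimate_with_precision(sample)
print(f"estimate = Rs. {mean:.0f} ± {margin:.0f} "
      f"({100 * margin / mean:.1f}% of the estimate) at 95% confidence")
```

No comparable calculation is available for a non-probability sample, because the chance of each unit being selected is unknown.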
Probability sampling enables us to choose representative sample designs. It also enables
us to estimate the extent to which the results based on such a sample are likely to be
different from what we would have obtained had we covered the population in our study.
Conversely, the use of probability sampling enables us to determine the sample size for a
given degree of precision, indicating that our sample results do not differ by more than a
specified amount from those yielded by a study covering entire population.
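
One common way to turn a desired precision into a sample size, for estimating a mean from a simple random sample of a large population, is n = (z·σ/e)², where e is the permitted deviation and σ is the population standard deviation. The text does not give this formula, so treat the sketch below as an assumption-laden illustration: σ = Rs. 1000 is a made-up planning value and `sample_size_for_mean` is a hypothetical name.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n for which z * sigma / sqrt(n) does not exceed margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)


# Precision of ± Rs. 200 with an assumed sigma of Rs. 1000.
print(sample_size_for_mean(sigma=1000, margin=200, confidence=0.95))
print(sample_size_for_mean(sigma=1000, margin=200, confidence=0.99))
```

A higher confidence level or a tighter precision requirement both increase the required sample size.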

Although non-probability sampling does not yield these benefits, on account of its
convenience and economy, it is often preferred to probability sampling. If the researcher
is convinced that the risks involved in the use of a non-probability sample are more than
offset by its being relatively cheap and convenient, his choice should be in favor of non-
probability sampling.

There are various types of sample designs that can be covered under the two broad
groups, random or probability samples and non-random or non-probability samples.

Step 5: Determine the sample size


In other words, one has to decide how many elements of the target population are to be
chosen.
Step 6: Specify the sampling plan
This means that one should indicate how decisions made so far are to be implemented.
For example, if a survey of households is to be conducted, a sampling plan should define
a household, contain instructions to the interviewer as to how he should take a systematic
sample of households, advise him on what he should do when no one is available on his
visit to the household, and so on. These are some pertinent issues in a sampling survey to
which a sampling plan should provide answers.
Step 7: Select the sample
This is the final step in the sampling process. A good deal of office and fieldwork is
involved in the actual selection of the sampling elements. Most of the problems in this
stage are faced by the interviewer while contacting the sample-respondents.

SAMPLING METHODS/SAMPLING DESIGNS


Sample designs are different methods used for the conduct of sample survey. Quota
sampling, judgment sampling etc. are the non-probability sample designs while random
sampling, area sampling, etc. are the probability sample designs. In brief, the sample
designs are divided into the following two categories:
(a) Probability Sampling Methods
(b) Non-Probability Sampling Methods

Types of Sampling Methods

(a) Probability Sampling: Simple Random Sampling, Systematic Sampling, Stratified
Sampling, Cluster Sampling, Area Sampling, Multi-Stage Sampling, Multi-Phase
Sampling, Replicated Sampling, Sequential Sampling
(b) Non-Probability Sampling: Convenience Sampling, Judgment Sampling, Quota
Sampling, Master Samples, Panel Samples

(a) Probability/Random Sampling Methods


In the probability sampling methods, the sample units are selected at random. This means
the selection is left to chance and not to the judgment of the researcher. Every member in
the universe has a known (in simple random sampling, equal) chance of being selected as
the representative. The selection of sampling items is impartial and independent of the
person making the study. There is no scope for any biased selection of sample units.

Probability sampling methods include simple random sampling, stratified sampling, cluster sampling, etc.
Such methods are used extensively in marketing research. These methods provide
unbiased information. The probability sampling methods are objectively designed.
However, these methods are time consuming and also costly for use. Greater statistical
competence and time are required to plan and use probability sampling methods.

(b) Non-probability Sampling Methods

Here, sample units are selected in a non-random manner. The selection may be
purposive. It may be based on the convenience or the judgment of the researcher. The
selection is deliberate, not random. Every item is not given a definite chance of being
included in the sample. The non-probability sampling methods include convenience
sampling, judgment sampling, and quota sampling. In these methods, the sample is
selected in a subjective manner and the decision regarding the sample is taken by the
researcher himself. The sample selected may not be representative of the universe to
be studied. The selection of the sample may be influenced by the subjective consideration
of the person connected with the research work (researcher).

Non-probability sampling methods are also used in marketing research along with
probability methods. Such methods are sometimes preferred because they cost less per
observation, require less time and need relatively little statistical sophistication in
planning the sample design and in the selection of the respondents. Probability sampling
methods are more scientific and capable of yielding more representative samples than
non-probability sampling methods. However, there is no sampling method (probability or
non-probability) that can be considered to be best in all situations. Any suitable method
may be selected and used properly for promising results.

PROBABILITY SAMPLING V/S NON-PROBABILITY SAMPLING

Meaning
  Probability sampling: (i) Probability sampling provides an equal chance of being
  selected in the sample to each element of the population. (ii) A probability sample is
  one where the selected units have some specific chance of being included in the sample.
  Non-probability sampling: (i) Non-probability sampling does not provide an equal
  chance of being selected in the sample to each element of the population. (ii) A
  non-probability sample is arbitrarily selected.

Type of method
  Probability sampling: It is a systematic and modern method of sampling.
  Non-probability sampling: It is a traditional and rather outdated method of sampling.

Selection of sample
  Probability sampling: The sample is selected by chance or at random.
  Non-probability sampling: The sample is selected by choice.

Selection process
  Probability sampling: The selection process is controlled objectively so that the items
  will be chosen strictly at random.
  Non-probability sampling: The selection process is, at least partially, subjective.

Benefit
  Probability sampling: It helps to select a truly representative sample. Here, the
  selection of sample items is independent of the person making the study (researcher).
  Non-probability sampling: The sample selected may or may not be a true representative
  of the whole population as it is selected as per the convenience of the researcher.

Nature of process
  Probability sampling: It is a mechanical and mathematical process.
  Non-probability sampling: It is a mental process/exercise of the researcher.

(A) PROBABILITY SAMPLING METHODS


(1) SIMPLE RANDOM SAMPLING
Random sampling is one popular and extensively used sampling method. In this method,
each and every unit of the population has an equal chance of being selected or included in
the sample. Random selection does not mean haphazard selection. It is one type of
selection in which every item in the universe has an equal chance of being selected along
with all other items. In random sampling, the complete list of the universe is taken but the
selection is made 'at random' from this list. However, some uniform system is used for the
selection of the sample. Random sampling is useful for the conduct of telephone or mail
surveys. It is an ideal method in surveys of a specialized nature.

The process of randomness does not mean that it is 'haphazard', as a layman may be
inclined to think. What it means is that the process of selecting a sample is independent of
human judgment. To ensure this, there are two methods that are followed when drawing a
random sample. These are: (i) the lottery method and (ii) the use of random numbers.

In the lottery method, each unit of the population is numbered and shown on a chit of
paper or disc. The chits are folded and put in a box from which a sample of the requisite
size is to be drawn. In case discs are used, these are well mixed up before a draw is made
so that no particular unit can be identified before it gets selected. The sample is drawn in
the same manner as winning numbers in a lottery are drawn.
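
A software analogue of the lottery method is to shuffle the numbered units and draw the required number without replacement. The sketch below assumes a population numbered 1 to 500 and a sample of 50; the function name `lottery_draw` is hypothetical.

```python
import random


def lottery_draw(units, n, rng=random):
    """Mix the chits thoroughly, then draw n of them without replacement."""
    chits = list(units)
    rng.shuffle(chits)   # the "well mixed" box of chits or discs
    return chits[:n]     # the first n chits drawn


population = range(1, 501)   # units numbered 1 to 500
sample = lottery_draw(population, 50)
print(sorted(sample))
```

The standard library call `random.sample(population, 50)` would give an equivalent draw in one step; the shuffle is written out here only to mirror the physical procedure.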

In the second method, the tables of random numbers are used. The members of the
population are numbered from 1 to N from which n members are selected. This process is
explained below with the help of an illustration.

Suppose a sample of size 50 is to be selected from a population of 500. First, number the
500 units from 1 to 500, the order being quite immaterial. While numbering the units,
ensure that each unit in the population has uniform digits, in this case, three. Thus, 1st
unit would have a three-digit number 001, 2nd unit 002, 10th unit 010, 11th unit 011, and
so on. After the units have been given three-digit numbers, the table of random numbers
is to be used. One may start from the left-hand top corner of the table of random numbers
and proceed systematically down sets of three-digit columns, rejecting numbers over 500
and those that have occurred earlier.
Using the first thousand numbers from the table of random numbers (an excerpt from the
table is given below), a sample of 50 out of 500 will thus be chosen.
231 055 148 389 117 433 495 367 070 313
092 259 113 455 126 426 062 401 100 488
434 325 211 207 398 225 485 035 171 047
318 263 239 108 379 420 122 441 493 310
032 194 144 337 224 006 068 043 500 222
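
The random-number procedure described above can be sketched in Python. This is only an
illustration under stated assumptions: the function name draw_simple_random_sample and the
seed are hypothetical, and Python's pseudo-random generator stands in for the printed table
of random numbers.

```python
import random


def draw_simple_random_sample(population_size, sample_size, seed=None):
    """Read fixed-width random numbers, rejecting 000, numbers above the
    population size, and numbers that have already occurred."""
    rng = random.Random(seed)           # the seed is only for reproducibility
    width = len(str(population_size))   # 500 units -> labels 001 to 500
    chosen = []
    seen = set()
    while len(chosen) < sample_size:
        number = rng.randrange(10 ** width)  # 0..999, like one table entry
        if number == 0 or number > population_size or number in seen:
            continue                         # rejected: read the next entry
        seen.add(number)
        chosen.append(number)
    # Return the labels with uniform digits, e.g. 7 -> "007"
    return [str(n).zfill(width) for n in chosen]


sample = draw_simple_random_sample(500, 50, seed=1)
print(sample[:10])
```

In practice the standard library call random.sample(range(1, 501), 50) gives an equivalent
draw without replacement; the loop is spelled out only to mirror the table method.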


Advantages Of Simple Random Sampling Method


(1) Simplicity: Simple random sampling is the simplest method of probability sampling and
can be used for different types of surveys.
(2) Scientific: This method is scientific, as every unit has an equal opportunity of being
selected for the sample.
(3) Truly representative character: The samples selected by this method are truly
representative in character.
(4) Quality results: Random sampling can be used effectively (for quality results) when
the universe to be studied is small and can be listed accurately (e.g. motor car owners in
a city).

Limitations Of Simple Random Sampling Method


(1) Difficult when the universe is very large: In simple random sampling, the whole list
of the universe is taken up for selection. Obtaining a complete and up-to-date list of the
universe is difficult, particularly when the universe is very large.
(2) Costly: The cost of conducting a survey by this sampling method is high, as the
sample units are selected at random and it is obligatory to contact them and collect the
information.
(3) May prove inefficient: This method may prove to be statistically inefficient and
yield a larger standard error than other types of sampling designs.
(4) Administrative difficulties: Random sampling involves administrative difficulties
as regards the selection of the sample and follow-up measures for the collection of
information.
(5) May not be fully representative: The sample selected may not be fully representative,
as the selection is from the whole population and not from the groups that constitute the
population.

(2) STRATIFIED SAMPLING:


In stratified sampling, the units included in the sample are present in roughly the same
proportion in which they are present in the total population.

Stratified sampling is also called proportional random sampling. In this sampling, the
population is first subdivided into certain mutually exclusive groups or strata. Such groups
may be formed on the basis of geographical area, size of the household or income. After
stratification, a random sample of a given size is selected from each stratum of the total
population. In this way an attempt is made to make the sample more representative
in character. Here, each of the strata is represented in the sample in relation to its
importance.


The following example will make this clear.

Strata: income per    Population: number    Sample             Sample
month (Rs)            of households         (Proportionate)    (Disproportionate)
(1)                   (2)                   (3)                (4)
0-500                 5,000                 50                 75
501-1000              4,000                 40                 20
1001-2000             3,000                 30                 20
2001-3000             2,000                 20                 25
3001 +                1,000                 10                 10
Total                 15,000                150                150

In the above example, the population consists of 15,000 households, divided into five
strata on the basis of monthly income. Column (3) of the table shows the sample, i.e.,
number of households selected from each stratum. The sample constitutes one per cent of
the population. A sample of this type, where each stratum has a uniform sampling
fraction, is called a proportionate stratified sample. If, on the contrary, the strata have
variable sampling fractions, the sample is called a disproportionate stratified sample. The
figures given in column (4) of the above table show a disproportionate stratified sample.
It will be seen that the sampling fraction varies from one stratum to another. Thus, for
example, it is 0.015 for the monthly income Rs 0-500 and 0.01 for the stratum, Rs 3001+.
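
The figures in the table can be checked with a short Python sketch. The dictionary and
function names are hypothetical; the numbers are those of columns (2) and (4). Integer
division is used only because this example divides evenly; a real allocation would need a
rounding rule.

```python
# Column (2): households in each income stratum
population = {"0-500": 5000, "501-1000": 4000, "1001-2000": 3000,
              "2001-3000": 2000, "3001+": 1000}
# Column (4): the disproportionate sample
disproportionate = {"0-500": 75, "501-1000": 20, "1001-2000": 20,
                    "2001-3000": 25, "3001+": 10}


def proportionate_allocation(strata_sizes, total_sample):
    """Give each stratum the same sampling fraction (here 150 / 15,000 = 1%)."""
    total_population = sum(strata_sizes.values())
    return {stratum: total_sample * size // total_population
            for stratum, size in strata_sizes.items()}


proportionate = proportionate_allocation(population, 150)

# Sampling fraction of each stratum in the disproportionate sample
fractions = {stratum: disproportionate[stratum] / population[stratum]
             for stratum in population}

for stratum in population:
    print(stratum, proportionate[stratum], round(fractions[stratum], 4))
```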

It may be noted that a stratified random sample with a uniform sampling fraction results in
greater precision than a simple random sample. But this is possible only when the
selection within strata is made on a random basis. Further, a stratified proportionate
sample is generally convenient on account of practical considerations.

There are some other considerations in favor of the stratified random sample. The
researcher may be interested in the results for separate strata rather than for the entire
population. A simple random sample will not show results by strata as it presents only an
aggregative picture. Another consideration is that it may be administratively expedient to
split the population into strata. Yet another consideration is that one can use different
procedures for selecting samples from various strata. If the data are more variable in any
particular stratum, a larger sampling fraction should be taken in that stratum. This would
result in greater overall precision.

This method reduces the sampling error and it is a more accurate and representative
sampling method. Naturally, it is treated as an improvement over simple random
sampling. It provides information about different components of the total population. Use
of stratified sampling also leads to administrative conveniences. In order to use a stratified
sample, some information regarding the population and its strata should be available to
the researcher.

The process of stratified random sampling differs from simple random sampling. In
simple random sampling, sample items are chosen at random from the entire universe,
while in stratified random sampling, a separate random sample is chosen from each
stratum. Stratified random sampling is used in order to increase the precision of sampling
estimates.

(3) SYSTEMATIC RANDOM SAMPLING:


In the systematic random sampling method, the units of a population are first listed and the
sample is selected as per a well-defined system. The sample is drawn by selecting every
nth item in the sampling frame, where "n" (the sampling interval) is determined on the basis
of the desired size of the sample. The first item is fixed by drawing a number at random,
usually a number between 1 and 10. For example, suppose we have 50,000 items in the
universe and the sample size is decided as 5,000 items. In this case 'n' is equal to 10
(50,000 / 5,000). Naturally, we have to select every 10th item from the universe. However,
the first item is selected at random; let us say it is 3. The items selected are then those
numbered 3, 13, 23, 33, 43, etc.
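
The selection rule above can be sketched as follows. The function name systematic_sample
and its parameters are hypothetical; the sketch assumes the items are numbered from 1 and
that the random start lies between 1 and the sampling interval.

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Select every n-th item number, beginning at a random start."""
    interval = population_size // sample_size        # 50,000 // 5,000 = 10
    if start is None:
        # Random start between 1 and the interval, both inclusive
        start = random.Random(seed).randint(1, interval)
    selected = range(start, population_size + 1, interval)
    return list(selected)[:sample_size]


# With the start fixed at 3, as in the example in the text
print(systematic_sample(50000, 5000, start=3)[:5])
```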

Advantages of Systematic Random Sampling


(a) It is a simple and unbiased sampling method.
(b) It ensures speedy selection of sample.
(c) It is more efficient statistically than simple random sampling.
(d) It ensures more representative sample.

Disadvantages of Systematic Random Sampling


(a) It is time consuming and costly.
(b) It can go wrong if every sample is assumed to be similar.
(c) It can create more confusion if the selection of sample is reckless.

(4) CLUSTER SAMPLING:


In cluster sampling, individual units are not selected for the sample one by one but are
grouped together and selected group-wise for inclusion in the sample. Thus, groups are
selected on a random basis. For example, the total universe is divided into a number of
groups, each containing an equal number of items, and the sample is selected in
groups only. Similarly, if one family is selected for the sample, the information will be
collected from each member of the family. Such selection of the sample in group form is
called cluster sampling.

For example, if a survey is to be undertaken in a city to collect data from individual
households, then selection of households from all over the city would involve a
considerable amount of fieldwork and consequently, would cost more. Instead, a few
localities are first chosen. Then, all the households in these localities are covered in the
sample. Apart from reduction in cost, such a cluster sample would be desirable in the
absence of a suitable sampling frame for the whole population. If, on the other hand, a
sample of individual households from the entire city is to be chosen, it will be necessary
to first undertake the listing of all households. In view of the non-availability of a
satisfactory sampling frame, in the case of cluster sampling, such a listing could be
confined to only a few localities that are to be entirely covered in the sample.
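
A minimal sketch of the city example, assuming a made-up city of 10 localities with 20
households each (the names cluster_sample, city and the household labels are all
hypothetical). A few localities are drawn at random and every household in them is
covered.

```python
import random


def cluster_sample(households_by_locality, clusters_to_pick, seed=None):
    """Pick whole localities at random, then take all their households."""
    rng = random.Random(seed)
    picked = rng.sample(sorted(households_by_locality), clusters_to_pick)
    households = [h for locality in picked
                  for h in households_by_locality[locality]]
    return picked, households


# Hypothetical frame: only the localities need to be listed up front
city = {f"Locality {i}": [f"L{i}-H{j}" for j in range(1, 21)]
        for i in range(1, 11)}

localities, households = cluster_sample(city, 3, seed=7)
print(localities, len(households))
```

Only the selected localities need a household listing, which is the saving in fieldwork
that the paragraph describes.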

A few points regarding cluster sampling may be noted here. First, "whether or not a
particular aggregate of units should be called a cluster" will depend on the circumstances
of each case. In the foregoing example, localities were taken as clusters and households as
individual units. In another case, the households may be taken as a cluster and the
members of the households as individual units. Second, it is not necessary that clusters
should always be natural aggregates such as localities, constituencies, schools or classes.
Artificial clusters may be formed, as is generally done in area sampling where grids may
be determined on the maps. Third, several levels of clusters may be used in any one
sample design. Thus, in a city survey, localities or wards, streets and households may be
selected in which case localities or wards are the clusters at the first level and streets at
the second level and households would be the units.

The cluster sampling method is less costly as the expenditure on traveling of interviewers is
minimized. It is useful when the researcher desires to study the characteristics of certain
individuals or items of identical nature.

(5) AREA SAMPLING:


Area sampling is a form of multi-stage sampling in which maps, rather than lists or
registers, are used as the sampling frame. This method is more frequently used in those
countries that do not have a satisfactory sampling frame such as population lists.

In area sampling, the overall area to be covered in a survey is divided into several smaller
areas within which a random sample is selected. Thus, for example, a city map can be
used for area sampling. Various blocks can be identified on the map and this can provide a
suitable frame. The entire city area can be divided into these blocks, which are then
numbered and from which a random sample is finally drawn.

In sampling the blocks, stratification and sampling with probability proportional to a
measure of size are commonly employed. However, stratification in area sampling is
based on geographical considerations. Thus, when blocks are identified and numbered on
the map, they can be grouped into some meaningful strata representing the different
neighborhoods of the town. The point to emphasize is that these blocks must be
identifiable without any difficulty.

On the basis of the blocks thus identified, numbered and assigned to strata, a stratified
sample of dwellings can be selected. This can be done in either of two ways. First, a
sample of dwellings may be drawn from all the dwellings included in a selected block.
Second, blocks may be divided into segments of a more or less equal size, a sample
of these segments can be chosen, and finally all the dwellings from the selected segments
may be taken in the sample. It will thus be seen that the second method introduces another
stage of sampling, namely, segments.

Although the above discussion relates to area sampling with respect to a city or town, the
same approach is applicable to a large area, say, a state or a country, the only difference
being that one or more additional stages of sampling may have to be introduced.

Finally, it may be pointed out that area sampling is perhaps the only possibility if a
suitable sampling frame is not available.


(6) MULTI-STAGE SAMPLING


Multi-stage sampling, as the name implies, involves the selection of units in more than
one stage. In such a sampling, the population consists of a number of first stage units,
called primary sampling units (PSUs). Each of these PSUs consists of a number of
second-stage units. First, a sample is taken of the PSUs, and then a sample is taken of the
second-stage units. This process continues until the selection of the final sampling units.
It may be noted that at each stage of sampling, a sample can be selected with or without
stratification.

An illustration would make the concept of multi-stage sampling clear. Suppose a sample
of 5000 urban households from all over the country is to be selected. In such a case, the
first stage sample may involve the selection of districts. Suppose 25 districts out of say
500 districts are selected. The second stage may involve the selection of cities, say four
from each district. Finally, 50 households from each selected city may be chosen. Thus,
one would have a sample of 5000 urban households, arrived at in three stages. It is
obvious that the final sampling unit is the household.

In the absence of multi-stage sampling of this type, the process of the selection of 5000
urban households from all over the country would be extremely difficult. Besides, such a
sample would be very thinly spread over the entire country and if personal interviews are
to be conducted for collecting information, it would be an extremely costly affair. In view
of these considerations a sampling from a widely spread population is generally based on
multi-stage.

The number of stages in a multi-stage sampling varies depending on convenience and the
availability of suitable sampling frames at different stages. Often, one or more stages can
be further included in order to reduce cost. Thus, in our earlier example, the final stage of
sampling comprised 50 households from each of the four cities selected per district. Since
this would involve the selection of households all over the city, it would turn out to be quite
expensive and time consuming if personal interviews are to be conducted. In such a case,
it may be advisable to select two wards or localities in each of the selected cities and
then to select 25 households from each of the 2 selected wards or localities. Thus, the cost
of interviewing as also the time in carrying out the survey could be reduced considerably.

It will be seen that an additional stage comprising wards or localities has been introduced
here. Thus the sample has become a four-stage sample –
1st stage – districts
2nd stage – cities
3rd stage – localities
4th stage – households
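
The arithmetic of the two designs can be verified with a small Python sketch. The
dictionary keys and the function name are hypothetical labels; the counts are those given
in the illustration.

```python
from math import prod

# Units selected at each stage, per unit selected at the previous stage
three_stage = {"districts": 25,
               "cities per district": 4,
               "households per city": 50}
four_stage = {"districts": 25,
              "cities per district": 4,
              "wards per city": 2,
              "households per ward": 25}


def final_sample_size(design):
    """Multiply the per-stage counts to get the number of final units."""
    return prod(design.values())


print(final_sample_size(three_stage), final_sample_size(four_stage))
```

Both designs reach the same 5000 households; the extra stage only concentrates the
fieldwork in fewer wards.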

From the preceding discussion it should be clear that a multi-stage sample results in the
concentration of fieldwork. This in turn, leads to saving time, labor and money. There is
another advantage in its use. Where a suitable sampling frame covering the entire
population is not available, a multi-stage sample can be used.


(7) MULTI-PHASE SAMPLING


A multi-phase sample should not be confused with a multi-stage sample. The former
involves a design where some information is collected from the entire sample and
additional information is collected from only a part of the original sample. Suppose a
survey is undertaken to determine the nature and extent of health facilities available in a
city and the general opinion of the people. In the first phase, a general questionnaire can
be sent out to ascertain who amongst the respondents had at one time or other used the
hospital services. Then, in the second phase, a comprehensive questionnaire may be sent
to only these respondents to ascertain what they feel about the medical facilities in the
hospitals. This is two-phase or double sampling.
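
A rough sketch of the two-phase idea, using invented data: the 200 respondents and the 30
per cent chance of having used hospital services are assumptions made purely for
illustration. The sampling unit (the respondent) is the same in both phases; only the
amount of information collected differs.

```python
import random

rng = random.Random(3)  # fixed seed, only so the illustration is repeatable

# Phase 1: short general questionnaire sent to the entire sample
phase_one = [{"respondent": i, "used_hospital": rng.random() < 0.3}
             for i in range(1, 201)]

# Phase 2: detailed questionnaire goes only to those who used the services
phase_two_ids = [r["respondent"] for r in phase_one if r["used_hospital"]]

print(len(phase_one), len(phase_two_ids))
```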

The main point of distinction between a multi-stage and a multi-phase sampling is that in
the former each successive stage has a different unit of sample whereas in the latter the
unit of sample remains unchanged though additional information is obtained from a sub-
sample.

The main advantage of multi-phase sampling is that it effects economy in time, money
and effort. In our earlier example, if a detailed questionnaire were sent out to a large sample
of individuals, several difficulties would arise. First, many of them would not be able to
provide the necessary information. Second, more time would be required. Finally, it would
be far more expensive to carry out the survey, especially when personal interviews are
involved.

(8) REPLICATED SAMPLING


Replicated sampling implies a sample design in which "two or more sub-samples are
drawn and processed completely independent of each other". It was first introduced by
Mahalanobis in 1936, who used the term inter-penetrating sub-samples.

In replicated sampling, several random sub-samples are selected from the population
instead of one full sample. All the sub-samples have the same design and each one of
them is a self-contained sample of the population. For example, take the case of a random
sample of 100 households. This sample may be divided into, say, 10 equal sub-samples to
be assigned to 10 interviewers. Thus, each interviewer may be required to collect
information from 10 households.

A replicated sample is particularly chosen on account of the convenience it affords in the
calculation of standard error. In many complex sample designs, the calculation of
standard error becomes too laborious. Selecting a replicated sample design can
considerably reduce this difficulty. However, in modern times when computers are being
increasingly used, the ease in calculating standard error has made it somewhat less
important. Apart from this advantage, there are certain other advantages of replicated
sampling. First, if the size of a sample is too large, it may be advisable to split it up into
two or more sub-samples. One sub-sample may be used to get the advanced results of the
survey. Second, replicated sampling can indicate the non-sampling errors.
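
The convenience in calculating the standard error can be illustrated as follows. With k
independent sub-samples of the same design, the spread of the k sub-sample means gives a
direct estimate of the standard error of the overall mean. The ten means below are
invented figures, one per interviewer, and the function name is hypothetical.

```python
import statistics


def replicated_estimate(subsample_means):
    """Overall mean and its standard error from independent sub-sample means."""
    k = len(subsample_means)
    overall = statistics.mean(subsample_means)
    # Sample standard deviation of the k means, divided by sqrt(k)
    standard_error = statistics.stdev(subsample_means) / k ** 0.5
    return overall, standard_error


# Hypothetical mean values reported by 10 interviewers
means = [52, 48, 50, 51, 49, 47, 53, 50, 50, 50]
overall, standard_error = replicated_estimate(means)
print(overall, round(standard_error, 3))
```

No formula specific to the sample design is needed, which is why the method was attractive
before computers were widely available.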

However, replicated sampling would not be helpful in undertaking a detailed
investigation of bias, as the numbers in the separate sub-samples tend to be small. Further,
such samples do not reveal any systematic errors that may be more or less common to all
interviewers and the compensating errors which cancel each other out over an
interviewer's assignment.

Apart from the above limitations, replicated samples have other disadvantages. If personal
interviews are to be conducted, replicated samples turn out to be costlier. Likewise,
tabulation costs would be higher than in the case of a single large sample. Finally,
replicated samples are more complex to administer.

(9) SEQUENTIAL SAMPLING


In sequential sampling, a number of samples n1, n2, n3…nx are randomly drawn from
the population. It is not at all necessary that each sample should be of the same size.
Generally, the first sample is the largest, the second is smaller than the first, the third is
smaller than the second, and so on.

Sequential sampling is resorted to mainly to bring down the cost, and hence the smallest
possible sample is used. The desired statistics from the first sample, n1, are computed and
evaluated. If these statistics do not satisfy the criteria laid down, a second sample is drawn.
The results of the first and second samples are added and the statistics are recomputed.
This process is continued until the specified criteria are satisfied. The criteria are usually a
minimum significance level, a minimum cluster size, or a minimum confidence interval.

The main advantage of sequential sampling is that it obviates the need for determining a
fixed sample size before the commencement of the survey.
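
The draw, pool and re-evaluate loop can be sketched as below. Everything specific here is
an assumption made for illustration: the stopping criterion (the half-width of an
approximate 95 per cent confidence interval for a proportion), the batch sizes, the
simulated population with a true proportion of 0.2, and all function and variable names.

```python
import random
from math import sqrt


def sequential_sample(draw_batch, batch_sizes, max_half_width, z=1.96):
    """Pool successive samples until the interval is narrow enough."""
    pooled = []
    for size in batch_sizes:
        pooled.extend(draw_batch(size))          # add the next sample
        p = sum(pooled) / len(pooled)            # recompute the statistic
        half_width = z * sqrt(p * (1 - p) / len(pooled))
        if half_width <= max_half_width:         # criterion satisfied: stop
            break
    return p, half_width, len(pooled)


rng = random.Random(11)


def draw_batch(size):
    # Simulated yes/no answers; stands in for real fieldwork
    return [rng.random() < 0.2 for _ in range(size)]


# First sample largest, later ones progressively smaller
p, half_width, n_used = sequential_sample(draw_batch, [200, 150, 100, 50], 0.04)
print(round(p, 3), round(half_width, 3), n_used)
```

The total sample size is not fixed in advance; it is whatever has been pooled when the
criterion is first met, or all batches if it is never met.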

Suppose a firm is to decide whether a new product is to be introduced in the market or
not. It feels that if it is able to acquire 15 per cent market share in a country within a year,
it should introduce the new product. Further, it feels that if a market share of 10 per cent
in a few test markets is achieved, it would be possible to acquire a 15 per cent market
share in the country, say, within a period of six months. Now, when the firm has
undertaken test marketing, it actually achieved far more than 10 per cent, say, 20 per cent,
of the market share and that too within three months of test marketing. The firm may be
sure to achieve the 15 per cent national market share within one year even though it may
not be possible for it to accurately forecast the test market share at the end of four months.

(B) NON-PROBABILITY SAMPLING METHODS


(1) CONVENIENCE SAMPLING
In convenience sampling, the convenience of the researcher is given importance while
selecting the sample. The researcher as per his convenience decides inclusion of units in
the sample. The items that are easily accessible or easily measurable are included in the
sample. Specific plan/system/method is not used for the selection of items in sample. As a
result bias is likely to enter into the sample selected.
Interviewing respondents on the street, at the bus stop or at the railway station are
examples of convenience sampling. In this sense, convenience sampling is also called
accidental sampling, as the respondents in the sample are included merely on account of
their being available on the spot where the survey work is in progress. Convenience
sampling is more suitable in exploratory research, where the focus is mainly on getting
new ideas and insights into a given problem.

Advantages of Convenience Sampling


(a) It is profitably used in pre-testing of questionnaires.
(b) It keeps the researcher free of tension.
(c) It allows the respondents to answer questions in leisure.

Disadvantages of Convenience Sampling


(a) The sample could be non-representative of the population, e.g. students living in a
college town may not be representative of the student community.
(b) Problem of element of chance.
(c) It cannot rule out bias of respondents.

(2) QUOTA SAMPLING


Quota sampling is quite frequently used in marketing research. It involves the fixation of
certain quotas, which are to be fulfilled by the interviewers.

Suppose in a certain territory we want to conduct a survey of households. Their total
number is 2,00,000. It is required that a sample of 1 per cent, i.e. 2000 households, be
covered. We may fix certain controls, which can be either independent or inter-related.
These controls are shown in the following tables.

A sample of 2000 households has been chosen, subject to the condition that 1200 of these
should be from rural areas and 800 from the urban areas of the territory. Likewise, of the
2000 households, the rich households should number 150, the middle class ones 650 and
the remaining 1200 should be from the poor class. These are independent quota controls.
The second table shows the inter-related quota controls. As can be seen, inter-related
quota controls allow less freedom of selection of the units than that available in the case
of independent controls.

Independent Controls
Rural    1200    Rich            150
Urban     800    Middle class    650
                 Poor           1200
Total    2000    Total          2000

Inter-related Controls
                Rural    Urban    Total
Rich              100       50      150
Middle class      400      250      650
Poor              700      500     1200
Total            1200      800     2000
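
The relation between the two kinds of control can be checked with a short sketch: the row
and column totals of the inter-related table reproduce the independent quotas. The
variable names are hypothetical; the figures are those of the quota tables.

```python
# Inter-related controls: a quota for every class-by-area cell
interrelated = {
    "Rich": {"Rural": 100, "Urban": 50},
    "Middle class": {"Rural": 400, "Urban": 250},
    "Poor": {"Rural": 700, "Urban": 500},
}

# Row totals give the independent quota for each income class
class_quotas = {cls: sum(cells.values()) for cls, cells in interrelated.items()}

# Column totals give the independent quota for each area
area_quotas = {}
for cells in interrelated.values():
    for area, count in cells.items():
        area_quotas[area] = area_quotas.get(area, 0) + count

total = sum(class_quotas.values())
print(class_quotas, area_quotas, total)
```

Under independent controls an interviewer only has to match the two sets of totals; under
inter-related controls each of the six cells must be matched, which is why there is less
freedom of selection.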

There are certain advantages in both the schemes. Independent controls are much simpler,
especially from the viewpoint of interviewers. They are also likely to be cheaper, as
interviewers may cover their quotas within a small geographical area. In view of this,
independent controls may affect the representativeness of the quota sampling. Inter-related
quota controls are more representative, though such controls may involve more time and
effort on the part of interviewers. Also, they may be costlier than independent quota
controls.

In view of the non-random element of quota sampling, it has been severely criticized,
especially by statisticians, who consider it theoretically weak and unsound. There are
points both in favor of and against quota sampling. These are given below.

Advantages of quota sampling
(a) It is economical, as traveling costs can be reduced. An interviewer need not travel all
over a town to track down pre-selected respondents. However, if numerous controls
are employed in a quota sample, it will become more expensive, though it will have
less selection bias.
(b) It is administratively convenient. The labor of selecting a random sample can be
avoided by using quota sampling. Also, the problem of non-contacts and call-backs
can be dispensed with altogether.
(c) When the field work is to be done quickly, perhaps in order to minimize memory
errors, quota sampling is most appropriate and feasible.
(d) It is independent of the existence of sampling frames. Wherever a suitable sampling
frame is not available, quota sampling is perhaps the only choice available.

Limitations of Quota sampling


1. Since quota sampling is not based on random selection, it is not possible to calculate
estimates of standard errors for the sample results.
2. It may not be possible to get a 'representative' sample within the quota, as the selection
depends entirely on the mood and convenience of the interviewers.
3. Since too much latitude is given to the interviewers, the quality of work suffers if they
are not competent.
4. It may be extremely difficult to supervise the control and field investigation under
quota sampling.
(3) JUDGEMENT SAMPLING
The main characteristic of judgment sampling is that units or elements in the population
are purposively selected. It is because of this that judgment samples are also called
purposive samples. Since the process of selection is not based on the random method, a
judgment sample is considered to be non-probability sampling.


Occasionally it may be desirable to use judgment sampling. Thus, an expert may be asked
to select a sample of 'representative' business firms. The reliability of such a sample would
depend upon the judgment of the expert. The quota sample, discussed earlier, is in a way a
judgment sample where the actual selection of units within the earlier fixed quota
depends on the interviewer.

It may be noted that when a small sample of a few units is to be selected, a judgment
sample may be more suitable, as the errors of judgment are likely to be less than the
random errors of a probability sample. However, when a large sample is to be selected,
the element of bias in the selection could be quite large in the case of a judgment sample.
Further, it may be costlier than random sampling.
(4) MASTER SAMPLES
A master sample is one from which repeated sub-samples can be taken as and when
required from the same area or population. This was first used in the United States when
the US Master Sample of Agriculture was taken. In this sampling, the rural area of over
3000 US counties was divided into segments of about four farms each. "After selecting a
systematic sample of 1/8 of the segments, the materials were duplicated and made
available, with instruction, at low cost." The crucial point to note in respect of master
samples is that "the actual sample for each new survey is not selected directly from the
entire population but from a frame of segments and dwellings that was selected earlier
from the entire population."

The utility of the samples is limited to a relatively short period, for there may be changes
in the population which would distort the representative character of the master samples.
In view of this, master samples should be based on relatively permanent units, say,
dwellings rather than individuals or households, which frequently undergo changes on
account of births, deaths and migration. The main advantage of master samples is that
they can be expeditiously selected on account of their simplicity. Another advantage is
that they are economical,
because the same master frame is used for drawing samples for several surveys, as a
result of which the cost incurred on the preparation of the master frame is spread over
these surveys. Further, on account of this economy in each survey, one can initially spend
more to create a good master frame. Thus, economy may lead to improved quality in the
listing.
(5) PANEL SAMPLES
Panel samples are frequently used in marketing research. In panel samples, the same units
or elements are measured on subsequent occasions. To give an example: Suppose that one
is interested in knowing the change in the consumption pattern of households. A sample
of households is drawn. These households are contacted to gather information on the
pattern of consumption. Subsequently, say after a period of six months, the same
households are approached once again and the necessary information on their
consumption is obtained. A comparison of the results of the two sets of data would
indicate whether there has been any change, and, if so, to what extent. In fact, the
information is collected on a more or less continuous basis with the help of panel
samples.


Panel samples are extremely convenient and economical and the cost of drawing a second
sample is not incurred. But the main limitation of such samples is that it may be difficult
to sustain the interest of individuals included in the panel for a long period. Many
respondents on the panel may refuse to be interviewed twice or may give poor answers.
In either case the quality of the survey will suffer. Another limiting factor in panel
samples is that there may be bias on account of the continued participation in the panel. It
is felt that the individual is conditioned to some extent by the fact that data on purchases
are reported. In such a case the purchase behavior of panel members may become
different from others not covered by the panel. Furthermore, panel samples may turn out
to be more expensive while locating the same sample of respondents after a lapse of, say,
a year, when some of them might have migrated to other areas. This would involve travel
costs in addition to being difficult.

CHARACTERISTICS OF A GOOD SAMPLE DESIGN


Kish mentions that a good sample design requires the judicious balancing of four broad
criteria— goal orientation, measurability, practicality and economy.

Goal orientation
This suggests that a sample design "should be oriented to the research objectives, tailored
to the survey design, and fitted to the survey conditions" If this is done, it should
influence the choice of the population, the measurement as also the procedure of
choosing a sample
Measurability
A sample design should enable the computation of valid estimates of its sampling
variability Normally, this variability is expressed in the form of standard errors in surveys
However, this is possible only in the case of probability sampling In non-probability
samples, such as a quota sample, it is not possible to know the degree of precision of the
survey results
Practicality
This implies that the sample design can be followed properly in the survey, as envisaged
earlier It is necessary that complete, correct, practical and clear instructions should be
given to the interviewer so that no mistakes are made in the selection of sampling units
and the final selection in the field is not different from the original sample design
Practicality also refers to simplicity of the design, i.e. it should be capable of being
understood and followed in actual operation of the field work
Economy
Finally, economy implies that the objectives of the survey should be achieved with
minimum cost and effort. Survey objectives are generally spelt out in terms of precision,
i.e. the inverse of the variance of survey estimates. For a given degree of precision, the
sample design should give the minimum cost. Alternatively, for a given per unit cost, the
sample design should achieve maximum precision (minimum variance).
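The economy criterion can be read as a small constrained choice among candidate designs. The sketch below is only an illustration: the design names, costs and variances are hypothetical figures, and it uses a total cost ceiling as a stand-in for the "given per unit cost" mentioned above.

```python
from typing import NamedTuple, Optional, Sequence


class Design(NamedTuple):
    name: str
    cost: float      # total survey cost (hypothetical units)
    variance: float  # variance of the survey estimate (lower = more precise)


def cheapest_design(designs: Sequence[Design], max_variance: float) -> Optional[Design]:
    """Minimum-cost design among those meeting the required precision."""
    feasible = [d for d in designs if d.variance <= max_variance]
    return min(feasible, key=lambda d: d.cost) if feasible else None


def most_precise_design(designs: Sequence[Design], max_cost: float) -> Optional[Design]:
    """Minimum-variance design among those within the cost ceiling."""
    feasible = [d for d in designs if d.cost <= max_cost]
    return min(feasible, key=lambda d: d.variance) if feasible else None


# Hypothetical candidates, for illustration only.
candidates = [
    Design("simple random", 50_000, 4.0),
    Design("stratified", 60_000, 2.5),
    Design("cluster", 30_000, 6.0),
]

print(cheapest_design(candidates, max_variance=4.0))
print(most_precise_design(candidates, max_cost=55_000))
```

Neither function resolves the conflict with the other three criteria; it only makes the cost and precision trade-off explicit.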

It may be pointed out that these four criteria come into conflict with each other in most
cases, and the researcher should carefully balance the conflicting criteria in order to
select a really good sample design. As there is no unique method or procedure by
which one can select a good sample, one has to compare several sample designs that can
be used in a survey. This means that one has to weigh the pros and cons, the strong and
weak points of various sample designs in respect of these four criteria, before selecting
the best possible one.

METHODS OF DETERMINING SAMPLE SIZE


Six methods of determining sample size are used in market research; five of them are
described below.
1. Unaided Judgment – When no specific method is used to determine sample size it is
called unaided judgment. Such an approach gives no explicit consideration to either the
likely precision of the sample results or the cost of obtaining them (characteristics in
which the client should have an interest). It is an approach to be avoided.
2. All-You-Can-Afford - In this method, a budget for the project is set by some
(generally unspecified) process. The estimated fixed costs of designing the project,
preparing a questionnaire (if required), analyzing the data and preparing the report are
deducted, and the remainder of the budget is allocated to sampling. Dividing this
remaining amount by the estimated cost per sampling unit gives the sample size.
This method concentrates on the cost of the information and is not concerned about its
value. Although cost always has to be considered in any systematic approach to sample
size determination, one also needs to give consideration to how much the information
provided by the sample will be worth. This approach can produce sample sizes that are
larger than required as well as sizes that are smaller than optimal.
3. Required Size Per Cell - This method of determining sample size can be used on
simple random, stratified random, purposive and quota samples. For example, in a
study of attitudes with respect to fast food establishments in a local marketing area it
was decided that information was desired for two occupational groups and for each of
the four age groups. This resulted in 2 x 4 = 8 sample cells. A sample size of 30 was
needed per cell for the types of statistical analyses that were to be conducted. The
overall sample size was therefore 8 x 30 = 240.
4. Use of Bayesian Statistical Model - The Bayesian model involves finding the
difference between the expected value of the information to be provided by a sample
of a given size and the cost of that sample. This difference is known as the expected net
gain from sampling (ENG). The sample size with the largest positive ENG is chosen.

The procedure for finding the optimal value of ‘n’ or the size of sample under this
approach is as under:
1. Find the expected value of the sample information (EVSI) for every possible n.
2. Also work out a reasonable approximation of the cost of taking a sample of every
possible n.
3. Compare the EVSI and the cost of the sample for every possible n. In other
words, work out the expected net gain (ENG) for every possible n as stated below:

For a given sample size (n): (EVSI) - (Cost of sample) = (ENG)


4. From the above step, the optimal sample size can be determined: it is the value of n
that maximizes the difference between the EVSI and the cost of the sample.
The computation of EVSI for every possible n and then comparing the same with the
respective cost is often a very cumbersome task and is generally feasible only with
mechanized or computer help. Hence this approach, although theoretically
optimal, is rarely used in practice.

5. Use of Traditional Statistical Model - The formula for the traditional statistical model
depends upon the type of sample to be taken, and it always incorporates three common
variables:
• an estimate of the variance in the population from which the sample is to be drawn,
• the error from sampling that the researcher will allow,
• the desired level of confidence that the actual sampling error will be within the
allowable limits.
The statistical models for simple random sampling include estimation of means and
estimation of proportions.
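The arithmetic behind several of these methods can be sketched in a few lines. This is a minimal illustration, not the notes' own procedure: the function names are made up for this example, the EVSI and cost functions passed to the Bayesian helper are supplied by the caller (the notes do not say how to compute EVSI), and the two simple-random-sampling formulas, n = (z·σ/e)² for a mean and n = z²·p·(1−p)/e² for a proportion, are the standard textbook forms rather than formulas quoted from the text.

```python
import math
from statistics import NormalDist


def size_all_you_can_afford(budget, fixed_costs, cost_per_unit):
    """Method 2: units affordable once the fixed costs are deducted."""
    return int((budget - fixed_costs) // cost_per_unit)


def size_per_cell(cells, per_cell):
    """Method 3: number of cells times the size required in each cell."""
    return cells * per_cell


def size_max_eng(evsi, cost, candidates):
    """Method 4: the n with the largest positive ENG = EVSI(n) - cost(n)."""
    best = max(candidates, key=lambda n: evsi(n) - cost(n))
    return best if evsi(best) - cost(best) > 0 else None


def z_value(confidence):
    """Two-sided standard normal value for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def size_for_mean(sigma, error, confidence=0.95):
    """Method 5, estimating a mean: sigma is the estimated population s.d."""
    return math.ceil((z_value(confidence) * sigma / error) ** 2)


def size_for_proportion(p, error, confidence=0.95):
    """Method 5, estimating a proportion: p is the anticipated proportion."""
    z = z_value(confidence)
    return math.ceil(z * z * p * (1 - p) / error ** 2)


print(size_per_cell(2 * 4, 30))          # the fast food example: 240
print(size_for_proportion(0.5, 0.05))    # 385
```

The three inputs to the last two functions are exactly the three common variables listed above: a variance estimate, the allowable error and the confidence level.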

SAMPLING ERRORS
Whatever kind of sample is taken and whatever the sample size there will always be error
arising from the sampling process. The extent of such error may be defined as the
difference between a sample result, and the result that would have been achieved by
undertaking a complete census. Such errors arise because particular types of cases are
under-represented or over-represented in the sample compared with the population as a
whole. If, for example, the cases are individual consumers, then the under- or over-
representation of the sexes, ages or social classes will affect the measurement (and, more
importantly, the estimates made from them) of a large number of variables. Lack of
representation in the appropriate quantities may be a product of two factors: systematic
error (or bias) and random error (or variance).
Systematic error
Bias arises when the sampling procedures used bring about over- or under- representation
of types of cases in the sample, which is mostly in the same direction. This may happen
because:
• the selection procedures are not random,
• the selection is made from a list that does not cover the population, or uses a procedure
that excludes certain groups,
• non-respondents are not a cross-section of the population.

If the selection procedures are not random then it means that human judgement has
entered into the selection process. For example, interviewers may be asked to choose
respondents at some geographical location or to select households in specified streets.


The result is likely to be that certain kinds of people or households or organizations are
excluded from the sample. Thus choosing respondents in a shopping centre will miss out
people who seldom or never go shopping; the selection of households by an interviewer
may result in the omission of flats at the tops of stairs.

If the Electoral Register is used to select adults aged 16 or over, then, as indicated earlier,
16 and 17 year-olds and many of the 18 year-olds will be missing from the list and will
be under-represented in the final sample. The use of telephone directories will
under-represent certain social groups less likely to be in the telephone book (or those who are
ex-directory). Duplication in lists, for example in the Yellow Pages, may result in some
over-representation. If we try to estimate sales of soap from a sample of private
households, then all users in institutions of various kinds will be excluded.

Non-response is a problem for both censuses and samples. For censuses it means that the
enumeration will be incomplete. If large numbers are missing, it would be inappropriate
to treat those successfully contacted as a 'representative sample'. For samples, it means
that estimates made from the sample will be biased if non-respondents are not themselves
representative of the population. If they are representative, then non-response is not so
much of a problem; but it may still mean that analyses are made on the basis of too small
a sample.

Whatever the reason for the systematic error, the effect will be that all samples that could
be drawn from a population will tend to result in the same direction of over- or under-
representation. The average of all these samples will then not be the same as the real
population average or proportion. Thus if we took lots of samples using a procedure that
tended to omit working mothers with young children, then all the samples will manifest
such under-representation rather than some over-representing them and some under-
representing them so that the average of all samples was very close to the real population
proportion.
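A small simulation makes the point concrete. The sketch below is illustrative only: the population (30% of units belong to the group of interest), the sample size, the number of repetitions and the rule that a biased procedure keeps a group member only half the time are all assumptions chosen for the example, and units are drawn with replacement for simplicity.

```python
import random


def mean_sample_share(population, n, reps, keep_prob, rng):
    """Average, over many samples, of the share of group members per sample.

    keep_prob is the chance that a group member, once drawn, is kept.
    keep_prob=1.0 models an unbiased procedure; lower values model a
    procedure that tends to omit the group.
    """
    shares = []
    for _ in range(reps):
        sample = []
        while len(sample) < n:
            unit = rng.choice(population)
            if unit and rng.random() >= keep_prob:
                continue  # group member omitted by the biased procedure
            sample.append(unit)
        shares.append(sum(sample) / n)
    return sum(shares) / reps


rng = random.Random(42)
population = [True] * 3000 + [False] * 7000  # hypothetical: 30% in the group

unbiased = mean_sample_share(population, n=200, reps=500, keep_prob=1.0, rng=rng)
biased = mean_sample_share(population, n=200, reps=500, keep_prob=0.5, rng=rng)
print(f"unbiased procedure: {unbiased:.3f}, biased procedure: {biased:.3f}")
```

Under these assumptions the unbiased average settles near the true 0.30, while the biased average settles near 0.15 / 0.85, about 0.18, however many samples are averaged.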

Systematic errors cannot be reduced simply by increasing the sample size. If certain kinds
of people are not being selected, cannot be contacted or are not responding, it will not be
'solved' by taking a bigger sample. Indeed, some kinds of errors will increase with more
interviewers, more questionnaires and greater data-processing requirements. All the
researcher can do is minimize the likelihood of bias by using appropriate sample designs.
Biases for some variables can be checked, for example against Census data or data from
other sources. Sometimes attempts are made to discover the characteristics of non-
responders, for example by sending out interviewers to non-respondents to a postal
survey, taking 'late' responders as typical of non-responders, or gaining demographic data
from the results of another survey that the non-responders have taken part in.

Random error
If we took a number of random, unbiased samples from the same population there will
almost certainly be a degree of fluctuation from one sample to another. Over a large
number of samples such errors will tend to cancel out, so that the average of such
samples will be close to the real population value. However, we usually take only one
sample, and even a sample that has used unbiased selection procedures will seldom be
exactly representative of the population from which it was drawn. Each sample will, in
short, exhibit a degree of error. Such error is often called 'sampling error', but it would be
clearer to think of it as 'random sampling error' to distinguish it from bias (which some
statisticians and some textbooks, confusingly, categorize as 'non-sampling' error).

Unlike bias, which affects the general sample composition and relates to each variable
being measured in unknown ways, random sampling error will differ from variable to
variable. The reason for this is that the extent of such error will depend on two factors:
• the size of the sample - the bigger the sample, the less the random sampling error (but
by a declining amount),
• the variability in the population for that particular variable - a sample used to estimate
a variable that varies widely in the population will show more random sampling error
than for a variable that does not.

These two factors are used as a basis for calculating the likely degree of variability in a
sample of a given size for a particular variable. This, in turn, is used as an input for
establishing, with a specified probability, the range of accuracy of sample estimates, or
the probability that sample findings are merely random sampling fluctuations from a
population of cases in which the findings do not hold.
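For a sample mean, the two factors combine in the standard error, s.d. / √n. The sketch below assumes a simple random sample and a normal approximation for the confidence interval; neither formula is quoted in the notes, and the figures are illustrative.

```python
import math
from statistics import NormalDist


def standard_error(sd, n):
    """Standard error of a sample mean: population s.d. over root n."""
    return sd / math.sqrt(n)


def confidence_interval(sample_mean, sd, n, confidence=0.95):
    """Range of accuracy of the estimate at the given probability."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * standard_error(sd, n)
    return sample_mean - half_width, sample_mean + half_width


# Quadrupling the sample size only halves the error (a declining gain).
for n in (100, 400, 1600):
    print(n, standard_error(10.0, n))

# A more variable characteristic has a larger error at the same n.
print(standard_error(20.0, 100))
```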

NON-SAMPLING ERRORS
Not all errors in a piece of research are a result of the sampling process. Certain kinds of
error may arise even if a complete census is taken. There are four main categories of such
error:
• response errors,
• interviewer errors,
• non-response errors,
• processing errors.

Where research is based on asking people questions then response errors may arise
where, for one reason or another, respondents give wrong answers. This may be through
dishonesty, forgetfulness, faulty memories, unwillingness or misunderstanding of the
questions being asked. Many of these errors arise as a result of poor or inadequate
questionnaire design; putting it the other way round, the potential for such errors to arise
can be minimized by careful design of question-wording, question formulation and
questionnaire layout. In interview surveys, whether face-to-face or by telephone,
interviewers may themselves misunderstand questions or the instructions for filling them
in; they may be dishonest, inaccurate, make mistakes or ask questions in a non-standard
fashion. Interviewer training, along with field supervision and control can, to a large
extent, reduce the likelihood of such errors, but they will never be entirely eliminated,
and there is always the potential for systematic differences between the results obtained
by different interviewers.

In nearly all research there will be missing cases, but in survey research there will always
be a degree of non-response because some people will refuse to be interviewed or to
complete a questionnaire, some will be ineligible because they turn out not to be part of
the survey population, some will terminate the interview or refuse to answer some of the
questions, and some will be non-contactable, for example, because they have moved
away, died, or are on holiday at the time of the survey. Even where a census is attempted,
it will often remain incomplete. The extent of non-response will vary considerably
according to the type of research, the topic of the research, and, where based on face-to-
face interviews, on the experience and training of the interviewers. Calculating the
amount of non-response can be confusing since some researchers will, for example, take
the proportion of refusals in the sample drawn, others will take refusals and non-contacts
as a proportion of those found eligible, and so on.
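The two conventions mentioned above give noticeably different figures from the same fieldwork. The counts in this sketch are hypothetical, and the function names are made up for the illustration.

```python
def refusal_rate_of_drawn(refusals, drawn):
    """Refusals as a proportion of the whole sample drawn."""
    return refusals / drawn


def nonresponse_rate_of_eligible(refusals, non_contacts, drawn, ineligible):
    """Refusals plus non-contacts as a proportion of those found eligible."""
    return (refusals + non_contacts) / (drawn - ineligible)


# Hypothetical fieldwork outcome for a sample of 1,000 drawn addresses.
drawn, ineligible, refusals, non_contacts = 1000, 100, 150, 50

print(refusal_rate_of_drawn(refusals, drawn))                               # 0.15
print(nonresponse_rate_of_eligible(refusals, non_contacts, drawn, ineligible))
```

With these figures the first convention reports 15% and the second about 22%, which is why a quoted non-response rate needs its definition stated alongside it.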

Processing errors can arise back at the office, particularly at the stage of entering answers
to questions onto a computerized database via a keyboard and screen. Agencies
sometimes validate these entries by, in effect, entering them twice, and the computer
checks to see if the two entries are identical. Alternatively, some agencies check samples
of the entries. It is possible, in addition, to apply range checks and logical checks.
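The three kinds of check can be sketched as follows. The field names (age, owns_car, cars_owned), the permitted ranges and the single logical rule are hypothetical, chosen only to show the shape of each check.

```python
def double_entry_mismatches(first, second):
    """Return the fields whose two independently keyed values differ."""
    fields = first.keys() | second.keys()
    return sorted(f for f in fields if first.get(f) != second.get(f))


def range_errors(record, limits):
    """Return the fields whose value lies outside its (low, high) range."""
    return sorted(
        field
        for field, (low, high) in limits.items()
        if field in record and not low <= record[field] <= high
    )


def logical_errors(record):
    """Return descriptions of answers that contradict each other."""
    errors = []
    if record.get("owns_car") == "no" and record.get("cars_owned", 0) > 0:
        errors.append("owns_car is 'no' but cars_owned > 0")
    return errors


# Hypothetical questionnaire keyed twice; the age digits were transposed once.
first_pass = {"age": 34, "owns_car": "no", "cars_owned": 1}
second_pass = {"age": 43, "owns_car": "no", "cars_owned": 1}
limits = {"age": (16, 99), "cars_owned": (0, 10)}

print(double_entry_mismatches(first_pass, second_pass))
print(range_errors({"age": 134}, limits))
print(logical_errors(first_pass))
```

Double entry catches keying slips, a range check catches impossible values, and a logical check catches answers that are each valid but inconsistent with one another.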

There are, then, a number of sources of non-sampling error, and it is important to bear
these in mind when interpreting survey results, whether based on a sample or not. The
crucial point is that such errors can arise even if a census is taken.

Total survey error


Any research that is based on addressing questions to people and recording their answers
risks error resulting from the respondents themselves and from interviewers (where these
are used), in addition to those kinds of error that arise in any research from data handling
and from inadequacies of sampling. Total survey error is the sum of all these sources
of error, both sampling and non-sampling. It is difficult to estimate what the total survey
error is in any one survey, and it will tend to vary from question to question. What is
certainly true is that the error that results from random sampling fluctuations - which is
the only kind of error that is taken into account when confidence intervals are calculated
or tests are made against the null hypothesis - accounts for only a very small proportion
of the total survey error.

Errors of various kinds can always be reduced by spending more money, for example, on
more interviewer training and supervision, on random sampling techniques, on pilot
testing or on getting a higher response rate. However, the reduction in error has to be
traded off against the extra cost involved. Furthermore, errors are often interrelated so that
attempts to reduce one kind of error may actually increase another; for example,
minimizing the non-response errors by persuading more reluctant respondents may well
increase response error. Non-sampling errors tend to be pervasive, not well-behaved and
do not decrease - indeed may increase - with the size of the sample. It is sometimes even
difficult to see whether they cause under- or over-estimation of population characteristics.
There is, in addition, the paradox that the more efficient the sample design is in
controlling random sampling fluctuations, the more important in proportion become bias
and non-sampling error.

CONTROLLING NON-SAMPLING ERRORS


In practice, market research agencies make all reasonable attempts, within the limits
imposed by cost and time constraints, to minimize or at least measure the impact or make
some estimate of non-sampling errors and of bias in the sampling procedure. Thus, as far
as response errors are concerned, agencies may:
• pilot-test questionnaires in order to check for misunderstandings of questions,
• analyse tendencies to overclaim or underclaim for certain kinds of consumer behaviour,
for example, the tendency to underclaim the consumption of alcohol, or to overclaim
television watching,
• use aided-recall techniques (prompted lists) to help respondents remember products that
they may have purchased and forgotten about, or radio programs that they forgot they
had listened to,
• use questioning techniques that minimize the effort respondents need to make.

To minimize interviewer error, agencies will often:
• set rigorous training standards for interviewers,
• monitor the process of interviewing by doing 'back checks' - calling or telephoning
respondents who have already been interviewed to check that the interview was carried
out properly, or sending supervisors to accompany interviewers on a regular sample
basis,
• make computer analyses of questionnaire errors to identify interviewers who
may need retraining or reminding of particular points.

To minimize errors resulting from non-response, agencies do one or more of several
things:
• for interview surveys, interviewers may be asked to make a specified number of
callbacks if the respondent was not at home on the first call; three or four such
callbacks may be made, ideally at different times and days of the week,
• interviewers may make an appointment by telephone with the respondent,
• self-completion questionnaires may be left where no contact has been made,
• monetary incentives or gifts may sometimes help to improve the response rate,
• interviewers may get a 'foot-in-the-door' by having respondents comply with some
small request before presenting them with the larger survey,
• non-respondents to a postal survey may be sent interviewers to persuade them to
complete the questionnaire, or they may be sent further reminders.

Processing errors will be minimized by careful editing and checking of the questionnaires
in addition to the use of data entry validation procedures.

Market research agencies will try to minimize bias by using carefully constructed sample
designs that use random procedures wherever possible, or by imposing restrictions on
interviewer choices where it is not. These sample designs were described earlier. Biases
will still remain, however, and sometimes these are known. Thus it may be known that
there are too many women in the sample, or too few men aged 20-24, compared with
known population proportions. Many agencies will make corrections to the data to adjust
for these biases by 'weighting' them.
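One common way of weighting is to give each group a weight equal to its known population share divided by its share of the sample. The notes do not say which scheme agencies use, so the sketch below is an assumption, and the group sizes and answers are hypothetical.

```python
def group_weights(sample_counts, population_shares):
    """Weight per group = known population share / achieved sample share."""
    total = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / total)
        for group, count in sample_counts.items()
    }


def weighted_mean(group_means, sample_counts, weights):
    """Overall mean after every respondent is given their group's weight."""
    numerator = sum(
        weights[g] * sample_counts[g] * group_means[g] for g in sample_counts
    )
    denominator = sum(weights[g] * sample_counts[g] for g in sample_counts)
    return numerator / denominator


# Hypothetical sample with too many women relative to a 50/50 population.
sample_counts = {"women": 600, "men": 400}
population_shares = {"women": 0.5, "men": 0.5}
share_buying = {"women": 0.40, "men": 0.20}  # hypothetical survey answers

weights = group_weights(sample_counts, population_shares)
print(weights)
print(weighted_mean(share_buying, sample_counts, weights))
```

Here the unweighted estimate would be 0.6 × 0.40 + 0.4 × 0.20 = 0.32, while the weighted estimate is 0.30, the figure a correctly balanced sample would be expected to give.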

In the real world of market research agencies and their clients it is unfortunately true that
many clients do not understand or lack interest in the basics of sampling. In consequence
many clients do not ask for estimates of bias or calculations of random sampling error. At
the same time the agencies feel that producing calculations, for example of confidence
intervals for a large number of variables, will only add confusion and perhaps distrust of
the data. In consequence, sampling errors are often quietly ignored, and the estimates
given are taken to be the 'truth'. Agencies will instead try to assure their clients that the
occurrence and impact of non-sampling errors have been minimized by:
• demonstrating that the procedures for the collection, analysis and reporting of the
results are 'respectable', meticulous and thorough,
• showing that the research design features are such as to minimize sources of error
within the parameters set by time and cost,
• emphasizing the extent of quality control checks that will uncover, correct and minimize
the occurrence of 'mistakes',
• making corrections to the resulting data so that known biases are adjusted for.

Beyond these assurances, clients are sometimes given some indication of the extent of
random sampling error that remains. Clients may be given 'read-off' tables for groups of
products or types of variable, based on the 'average' variability for that group or type,
given a particular sample size.

___

Important Sampling Distributions


Some important sampling distributions, which are commonly used, are:
1. Sampling distribution of mean: The sampling distribution of the mean refers to the
probability distribution of all the possible means of random samples of a given size
that we take from a population. If samples are taken from a normal population,
$N(\mu, \sigma_p)$, the sampling distribution of the mean would also be normal, with
mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_p / \sqrt{n}$, where $\mu$ is
the mean of the population, $\sigma_p$ is the standard deviation of the population and
$n$ is the number of items in a sample. When sampling from a population which is not
normal (it may be positively or negatively skewed), even then, as per the central limit
theorem, the sampling distribution of the mean tends towards the normal distribution,
provided the number of sample items is large, i.e., more than 30. In case we want to
reduce the sampling distribution of the mean to the unit normal distribution, i.e.,
$N(0, 1)$, we can write the normal variate
$$z = \frac{\bar{x} - \mu}{\sigma_p / \sqrt{n}}$$
for the sampling distribution of the mean. This characteristic of the sampling
distribution of the mean is very useful in several decision situations for accepting or
rejecting hypotheses.

2. Sampling distribution of proportion: Like the sampling distribution of the mean, we can
as well have a sampling distribution of proportion. This happens in the case of statistics
of attributes. Assume that we have worked out the proportion of defective parts in a large
number of samples, each with, say, 100 items, that have been taken from an infinite
population. If we plot a probability distribution of the said proportions, we obtain what
is known as the sampling distribution of proportion. Usually the statistics of attributes
correspond to the conditions of a binomial distribution, which tends to become a normal
distribution as $n$ becomes larger and larger. If $p$ represents the proportion of
defectives, i.e., of successes, and $q$ the proportion of non-defectives, i.e., of failures
(so that $q = 1 - p$), and if $p$ is treated as a random variable, then the sampling
distribution of the proportion of successes has mean $p$ and standard deviation
$\sqrt{\frac{p \times q}{n}}$, where $n$ is the sample size. Presuming that the binomial
distribution approximates the normal distribution for large $n$, the normal variate of the
sampling distribution of proportion,
$$z = \frac{\hat{p} - p}{\sqrt{\frac{p \times q}{n}}}$$
where $\hat{p}$ is the sample proportion of successes, can be used for testing of
hypotheses.
3. Student's t-distribution: When the population standard deviation ($\sigma_p$) is not
known and the sample is of a small size (i.e., $n \le 30$), we use the t distribution for
the sampling distribution of the mean and work out the t variable as:
$$t = \frac{\bar{X} - \mu}{\sigma_s / \sqrt{n}}, \quad \text{where } \sigma_s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
i.e., the sample standard deviation. The t-distribution is also symmetrical and is very
close to the distribution of the standard normal variate, z, except for small values of n.
The variable t differs from z in the sense that we use the sample standard deviation
($\sigma_s$) in the calculation of t, whereas we use the standard deviation of the
population ($\sigma_p$) in the calculation of z. There is a different t distribution for
every possible sample size, i.e., for different degrees of freedom. The degrees of freedom
for a sample of size n is n - 1. As the sample size gets larger, the shape of the t
distribution becomes approximately equal to the normal distribution. In fact, for sample
sizes of more than 30, the t distribution is so close to the normal distribution that we
can use the normal to approximate the t-distribution. When n is small, the t-distribution
is far from normal, but as $n \to \infty$ the t-distribution becomes identical with the
normal distribution. t-distribution tables are available which give the values of t for
different degrees of freedom at various levels of significance. The table value of t for
given degrees of freedom at a certain level of significance is compared with the
calculated value of t from the sample data, and if the latter exceeds the former, we
infer that the null hypothesis cannot be accepted.
4. F distribution: If $(\sigma_{s1})^2$ and $(\sigma_{s2})^2$ are the variances of two
independent samples of size $n_1$ and $n_2$ respectively, taken from two independent
normal populations having the same variance, $(\sigma_{p1})^2 = (\sigma_{p2})^2$, the
ratio $F = (\sigma_{s1})^2 / (\sigma_{s2})^2$, where
$$(\sigma_{s1})^2 = \frac{\sum (X_{1i} - \bar{X}_1)^2}{n_1 - 1} \quad \text{and} \quad (\sigma_{s2})^2 = \frac{\sum (X_{2i} - \bar{X}_2)^2}{n_2 - 1}$$
has an F distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom. The F ratio is
computed in such a way that the larger variance is always in the numerator. Tables have
been prepared for the F distribution that give the value of F for various values of
degrees of freedom for the larger as well as the smaller variance. The calculated value of
F from the sample data is compared with the corresponding table value of F, and if the
former exceeds the latter, then we infer that the null hypothesis of the variances being
equal cannot be accepted.
5. Chi-square ($\chi^2$) distribution: The chi-square distribution is encountered when we
deal with collections of values that involve adding up squares. Variances of samples
require us to add a collection of squared quantities and thus have distributions that are
related to the chi-square distribution. If we take each one of a collection of sample
variances, divide it by the known population variance and multiply the quotient by
$(n - 1)$, where $n$ is the number of items in the sample, we shall obtain a chi-square
distribution. Thus
$$\frac{\sigma_s^2}{\sigma_p^2} \times (n - 1)$$
would have the same distribution as the chi-square distribution with $(n - 1)$ degrees of
freedom. The chi-square distribution is not symmetrical and all the values are positive.
One must know the degrees of freedom for using the chi-square distribution. This
distribution may also be used for judging the significance of the difference between
observed and expected frequencies and also as a test of goodness of fit. The generalized
shape of the $\chi^2$ distribution depends upon the degrees of freedom, and the $\chi^2$
value is worked out as under:
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ and $E_i$ are the observed and expected frequencies. Tables are available
that give the value of $\chi^2$ for given degrees of freedom, which may be compared with
the calculated value of $\chi^2$ for the relevant degrees of freedom at a desired level of
significance for testing hypotheses.
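The five test statistics described in this list can each be computed in a line or two. The sketch below is illustrative rather than a full testing procedure: the function names are made up, the sample data are hypothetical, and the comparison with table values at a chosen level of significance is left to the reader.

```python
import math
from statistics import mean, stdev, variance


def z_for_mean(sample_mean, mu, sigma_p, n):
    """z = (sample mean - mu) / (sigma_p / sqrt(n)), population s.d. known."""
    return (sample_mean - mu) / (sigma_p / math.sqrt(n))


def z_for_proportion(p_hat, p, n):
    """z = (p_hat - p) / sqrt(p * q / n), with q = 1 - p."""
    return (p_hat - p) / math.sqrt(p * (1 - p) / n)


def t_statistic(sample, mu):
    """Return (t, degrees of freedom) using the sample s.d. (divisor n - 1)."""
    n = len(sample)
    return (mean(sample) - mu) / (stdev(sample) / math.sqrt(n)), n - 1


def f_ratio(sample1, sample2):
    """Return (F, df numerator, df denominator), larger variance on top."""
    v1, v2 = variance(sample1), variance(sample2)
    df1, df2 = len(sample1) - 1, len(sample2) - 1
    if v1 >= v2:
        return v1 / v2, df1, df2
    return v2 / v1, df2, df1


def chi_square_for_variance(sample, sigma_p_squared):
    """(n - 1) * sample variance / known population variance."""
    return (len(sample) - 1) * variance(sample) / sigma_p_squared


def chi_square_goodness_of_fit(observed, expected):
    """Sum over the k classes of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical small sample
print(z_for_mean(52, 50, 10, 25))
print(t_statistic(data, mu=4))
print(f_ratio([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
print(chi_square_goodness_of_fit([18, 22, 20], [20, 20, 20]))
```

`statistics.stdev` and `statistics.variance` both use the n - 1 divisor, matching the sample standard deviation and sample variances defined above.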
