Hemant Kombrabail
SAMPLING
TYBMS Prof. Hemant Kombrabail
samples is large. The significance of sampling distribution follows from the fact that
the mean of a sampling distribution is the same as the mean of the universe. Thus, the
mean of the sampling distribution can be taken as the mean of the universe.
13. Bias - Generally, an effect which deprives a statistical result of representativeness by
systematically distorting it, as distinct from a random error which may distort on any
one occasion but balances out on the average.
14. Biased sample - A sample obtained by a biased sampling process, that is to say, a
process which incorporates a systematic component of error, as distinct from random
error which balances out on the average. Non-random sampling is often, though not
inevitably, subject to bias, particularly when entrusted to subjective judgment on the
part of human beings.
Census implies collection of information from each element of the group or population of
interest, (e.g. Survey of industrial consumers). In many cases, complete enumeration is
not possible and the only alternative available is sampling.
A sample survey is the survey of a small representative part of the population taken up for
detailed scrutiny and study. A sample is a small representative part of the whole, and
conclusions drawn from such a sample are taken to apply to the entire population.
A sample survey gives the benefits of a census survey but with less time, expenditure and
manpower. It is a better substitute for a census survey. Sample surveys are commonly
conducted in marketing research projects and give promising results.
A survey carried out using a sampling method, i.e. using a representative portion
of the whole population, is called a sample survey. It is a short-cut alternative to a census
survey that gives similar benefits.
amounts of purchase of each product category, average amount kept at home and the
like) and the population of interest is all households in a country, the cost will
preclude a census being taken. Thus a sample is the only logical way of obtaining new
data from a population of this size.
2. Time: The kind of cost we have just considered is an outlay cost. The time involved in
obtaining information from either a census or a sample involves the possibility of also
incurring an opportunity cost. That is, delaying the decision until information is obtained may
result in a smaller gain or a larger loss than would have been the case from making the
same decision earlier. The opportunity to make more (or save more, as the case may
be) is, therefore, foregone.
3. Accuracy: A study using a census, by definition, contains no sampling error. A study
using a sample may involve sampling error in addition to other types of error. Other
things being equal, a census will provide more accurate data than a sample. However,
it has been argued that a more accurate estimate of the population of a country could
be made from a sample than from a census. Taking a census of a population on a
"mail out - mail back" basis requires that the names and addresses of almost all
households be obtained, census questionnaires mailed, and interviews conducted of
those not responding. The questionnaires are sent to a population of which only about
half have completed high school. The potential for errors in a returned questionnaire
is therefore high.
4. Destructive nature of the measurement: Measurements are sometimes destructive in
nature. When they are, it is apparent that taking a census would usually defeat the
purpose of a measurement. If one were producing firecrackers, electrical fuses, or grass
seed, performing a functional use test on all products for quality control purposes
would not be considered from an economic standpoint. A sample is then the only
practical choice. On the other hand, if light bulbs, bicycles, or electrical appliances
are to be tested, a 100% sample (census) may be entirely reasonable.
According to Crisp R. D., the fundamental idea of sampling is that if a small number of
items or parts (called a sample) is chosen at random from a large number of items or a
whole (called a universe or population), the sample will tend to have the same
characteristics, and in approximately the same proportion, as the universe.
FEATURES OF SAMPLING
(1) Sampling is a small representative of the whole. It is an effective alternative to the
census survey.
(2) Sampling reduces the time, efforts and money of the researcher on data collection
without any adverse effect on its quality.
(3) The sampling technique is based on the assumption that a sample selected at random
from the universe possesses the same features and characteristics as the
universe.
(4) The findings of a sample survey are accurate and reliable. A larger sample is better, as
the results obtained are more accurate.
(5) Sampling is used in data collection as well as for different purposes in our daily life.
(6) The concept of sampling is quite common and popular in marketing research as it
helps researchers to finalize their findings and recommendations within a short period.
(2) Provides reliable data: The conclusions drawn from the sample survey are reliable,
accurate and also applicable to the whole population/universe. Sampling has no adverse
effect on the quality of data collected. It gives quality results with lesser volume of work.
(3) Scientific base: The concept of sampling has scientific backing as it is based on the
law of statistical regularity and the law of inertia of large numbers.
(4) Facilitates better supervision of data collection: The sampling method is restricted to
a limited number of respondents. Naturally, effective monitoring and supervision of the
data collection work is possible. This improves the quality of data collected.
If we were to monitor the sales of a product recently introduced by us, the population
might be
(i) Element: Our product
(ii) Sampling units: Retail outlets, supermarkets, then our product
(iii) Extent: Delhi and New Delhi
(iv) Time: January 7-14, 1999
It may be emphasized that all these four specifications must be contained in the
designated population. Omission of any of them would render the definition of population
incomplete.
The question is: how can one ensure that the frame is perfect and free from any defect? Leslie
Kish has observed that a perfect frame is one where "every element appears on the list
separately, once, only once, and nothing else appears on the list". This type of perfect
frame would indicate one-to-one correspondence between frame units and sampling units.
But such perfect frames are rather rare. Accordingly, one has to use frames with one
deficiency or another, but one should ensure that the frame is not so deficient that it has to be
given up altogether.
This raises a pertinent question: what are the criteria for a suitable frame? In order to
examine the suitability or otherwise of a sampling frame, a number of questions need to be
asked. These are:
1. Does it adequately cover the population to be surveyed?
2. How complete is the frame? Is every unit that should be included represented?
3. Is it accurate? Is the information about each individual unit correct? Does the frame as a
whole contain units which no longer exist?
4. Is there any duplication? If so, then the probability of selection is disturbed as a unit can
enter the sample more than once.
5. Is the frame up-to-date? It could have met all the criteria when compiled but could well
be deficient when it came to be used. This could well be true of all frames involving the
human population as change is taking place continuously.
6. How convenient is it to use? Is it readily accessible? Is it arranged in a way suitable for
sampling? Can it easily be re-arranged so as to enable us to introduce stratification and
to undertake multi-stage sampling?
These are demanding criteria and it is most unlikely that any frame would meet them all.
Nevertheless, they are the factors to be borne in mind whenever we undertake random
sampling.
In marketing research most of the frames are from census reports, electoral registers, lists
of member units of trade and industry associations, lists of members of professional
bodies, lists of dwelling units maintained by local bodies, returns from an earlier survey
and large scale maps.
As mentioned in the preceding step, the sampling frame should be complete and accurate
otherwise the selection of the sampling unit might be defective. It is necessary to get a
further specification of the sampling unit both in personal interviews and in telephone
interviews. Thus, in personal interviews, a pertinent question is—of the several persons in
a household, who should be interviewed? If interviews were held during office timings
when the heads of families and other employed persons are away, interviewing would
under-represent employed persons and over-represent elderly persons, housewives and
the unemployed. In view of these considerations, it is necessary to have a random process
of selection of the adult residents of each household. One method that could be used for
this purpose is to list all the eligible persons living at a particular address and then select
one of them.
In case of a probability sample, the probability or chance of every unit in the population
being in the sample is known. Further, the selection of specific units in the sample
depends entirely on chance. No substitution of one unit for another is permissible. This
means that no human judgment is involved in the selection of a sample. In contrast, in a
non-probability sample, the probability of inclusion of any unit in the population in the
sample is not known. In addition, the selection of units within a sample involves human
judgment rather than pure chance.
In case of a probability sample, it is possible to measure the sampling error and thereby
determine the degree of precision in the estimates with the help of the theory of
probability. This theory also enables us to consider, from amongst the various possible
sample designs, the one that will give the maximum information per rupee. This is not
possible when a non-probability sample is used.
Probability sampling enables us to choose representative sample designs. It also enables
us to estimate the extent to which the results based on such a sample are likely to be
different from what we would have obtained had we covered the population in our study.
Conversely, the use of probability sampling enables us to determine the sample size for a
given degree of precision, indicating that our sample results do not differ by more than a
specified amount from those yielded by a study covering entire population.
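As a rough sketch of how a sample size follows from a desired degree of precision, the standard textbook formula for simple random sampling, n = (z * sigma / e)^2, can be coded as below. The function name, the assumed population standard deviation and the tolerance are illustrative assumptions, not figures taken from these notes.

```python
import math

def required_sample_size(sigma, max_error, z=1.96):
    """Smallest n for which the sample mean is expected to lie within
    max_error of the population mean at the confidence implied by z
    (1.96 corresponds to about 95%). Assumes simple random sampling
    from a large population with known standard deviation sigma."""
    if sigma <= 0 or max_error <= 0:
        raise ValueError("sigma and max_error must be positive")
    return math.ceil((z * sigma / max_error) ** 2)

# Hypothetical figures: household spending has a standard deviation of
# Rs 400 and the estimate should be within Rs 50 of the true mean.
print(required_sample_size(sigma=400, max_error=50))   # 246
```

Halving the permitted error roughly quadruples the required sample, which is why the degree of precision has to be weighed against cost.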
Although non-probability sampling does not yield these benefits, on account of its
convenience and economy, it is often preferred to probability sampling. If the researcher
is convinced that the risks involved in the use of a non-probability sample are more than
offset by its being relatively cheap and convenient, his choice should be in favor of non-
probability sampling.
There are various types of sample designs that can be covered under the two broad
groups, random or probability samples and non-random or non-probability samples.
Probability sampling methods include simple random sampling, stratified sampling, cluster sampling, etc.
Such methods are used extensively in marketing research. These methods provide
unbiased information. The probability sampling methods are objectively designed.
However, these methods are time consuming and also costly for use. Greater statistical
competence and time are required to plan and use probability sampling methods.
Here, sample units are selected in a non-random manner. The selection may be
purposive. It may be based on the convenience or the judgment of the researcher. The
selection is deliberate, not random. Not every item is given a definite chance of being
included in the sample. The non-probability sampling methods include convenience
sampling, judgment sampling, and quota sampling. In these methods, the sample is
selected in a subjective manner and the decision regarding the sample is taken by the
researcher himself. The sample selected may not be representative of the universe to
be studied. The selection of the sample may be influenced by the subjective considerations of
the person connected with the research work (the researcher).
Non-probability sampling methods are also used in marketing research along with
probability methods. Such methods are sometimes preferred because they cost less per
observation, require less time and need relatively little statistical sophistication in
planning the sample design and in the selection of the respondents. Probability sampling
methods are more scientific and capable of yielding more representative samples than
non-probability sampling methods. However, there is no sampling method (probability or
non-probability) that can be considered to be best in all situations. Any suitable method
may be selected and used properly for promising results.
The process of randomness does not mean that it is 'haphazard', as a layman may be
inclined to think. What it means is that the process of selecting a sample is independent of
human judgment. To ensure this, there are two methods that are followed when drawing a
random sample. These are: (i) the lottery method and (ii) the use of random numbers.
In the lottery method, each unit of the population is numbered and shown on a chit of
paper or disc. The chits are folded and put in a box from which a sample of the requisite
size is to be drawn. In case discs are used, these are well mixed up before a draw is made
so that no particular unit can be identified before it gets selected. The sample is drawn in
the same manner as winning numbers in a lottery are drawn.
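The lottery method can be imitated on a computer. The sketch below is only an illustration: the function name and the seed are assumptions, and Python's pseudo-random generator stands in for the physical mixing of chits.

```python
import random

def lottery_draw(population_size, sample_size, seed=None):
    """Draw sample_size distinct unit numbers from 1..population_size,
    each unit having the same chance, like chits pulled from a box."""
    rng = random.Random(seed)
    chits = list(range(1, population_size + 1))  # one chit per unit
    rng.shuffle(chits)                           # mix the chits well
    return sorted(chits[:sample_size])           # draw without replacement

sample = lottery_draw(population_size=500, sample_size=50, seed=7)
print(len(sample), sample[:5])
```

Fixing the seed makes the draw repeatable for checking; leaving it as None gives a fresh draw each time.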
In the second method, the tables of random numbers are used. The members of the
population are numbered from 1 to N from which n members are selected. This process is
explained below with the help of an illustration.
Suppose a sample of size 50 is to be selected from a population of 500. First, number the
500 units from 1 to 500, the order being quite immaterial. While numbering the units,
ensure that each unit in the population has uniform digits, in this case, three. Thus, 1st
unit would have a three-digit number 001, 2nd unit 002, 10th unit 010, 11th unit 011, and
so on. After the units have been given three-digit numbers, the table of random numbers
is to be used. One may start from the left-hand top corner of the table of random numbers
and proceed systematically down sets of three-digit columns, rejecting numbers over 500
and those that have occurred earlier.
Using the first thousand numbers from the table of random numbers (an excerpt from the
table is given below), a sample of 50 out of 500 will thus be chosen.
231 055 148 389 117 433 495 367 070 313
092 259 113 455 126 426 062 401 100 488
434 325 211 207 398 225 485 035 171 047
318 263 239 108 379 420 122 441 493 310
032 194 144 337 224 006 068 043 500 222
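The rejection rule described above (discard numbers over 500 and numbers that have already occurred) can be sketched as follows. The digit string is made up for illustration and is not taken from any published table; for simplicity the sketch reads the digits as one continuous stream rather than down the columns of a printed table.

```python
def select_from_random_digits(digits, population_size, sample_size):
    """Read random digits in groups of equal width, rejecting groups that
    are zero, exceed population_size, or repeat an earlier selection.
    Returns fewer than sample_size numbers if the digits run out."""
    width = len(str(population_size))      # 500 -> three-digit numbers
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        number = int(digits[start:start + width])
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen

# Made-up digits standing in for an excerpt of a random number table.
digits = "231" "742" "055" "231" "000" "148" "613" "389"
print(select_from_random_digits(digits, population_size=500, sample_size=4))
# prints [231, 55, 148, 389]
```

Here 742 and 613 are rejected for exceeding 500, 000 matches no unit, and the second 231 is rejected as a repeat.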
Stratified sampling is also called proportional random sampling. In this sampling, the
population is first subdivided into certain mutually exclusive groups or strata. Such groups
may be formed on the basis of geographical area, size of the household, or income. After
stratification, a random sample of a given size is selected from each stratum of the total
population. This is how an attempt is made to make the sample more representative
in character. Here, each of the strata is represented in the sample in relation to its
importance.
In the above example, the population consists of 15,000 households, divided into five
strata on the basis of monthly income. Column (3) of the table shows the sample, i.e.,
number of households selected from each stratum. The sample constitutes one per cent of
the population. A sample of this type, where each stratum has a uniform sampling
fraction, is called a proportionate stratified sample. If, on the contrary, the strata have
variable sampling fractions, the sample is called a disproportionate stratified sample. The
figures given in column (4) of the above table show a disproportionate stratified sample.
It will be seen that the sampling fraction varies from one stratum to another. Thus, for
example, it is 0.015 for the monthly income stratum Rs 0-500 and 0.01 for the stratum Rs 3001+.
It may be noted that a stratified random sample with a uniform sampling fraction results in
greater precision than a simple random sample. But this is possible only when the
selection within strata is made on a random basis. Further, a stratified proportionate
sample is generally convenient on account of practical considerations.
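The distinction between proportionate and disproportionate stratified samples comes down to comparing sampling fractions across strata. The sketch below illustrates the check. The stratum labels and sizes are hypothetical, chosen only so that the population totals 15,000 households and the sample 150; they are not the figures of the table discussed above.

```python
def sampling_fractions(stratum_sizes, stratum_samples):
    """Sampling fraction n_h / N_h for every stratum."""
    return {name: stratum_samples[name] / size
            for name, size in stratum_sizes.items()}

def is_proportionate(stratum_sizes, stratum_samples, tol=1e-9):
    """True when every stratum has the same sampling fraction."""
    fractions = list(sampling_fractions(stratum_sizes, stratum_samples).values())
    return max(fractions) - min(fractions) <= tol

# Hypothetical strata (assumed for illustration only).
sizes = {"A": 6000, "B": 4000, "C": 3000, "D": 1500, "E": 500}
proportionate = {name: size // 100 for name, size in sizes.items()}  # 1 per cent each
disproportionate = {"A": 90, "B": 40, "C": 10, "D": 5, "E": 5}

print(sum(proportionate.values()))                  # 150
print(is_proportionate(sizes, proportionate))       # True
print(is_proportionate(sizes, disproportionate))    # False
```

Both allocations use 150 households in total; only the first keeps the one per cent fraction in every stratum.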
There are some other considerations in favor of the stratified random sample. The
researcher may be interested in the results for separate strata rather than for the entire
population. A simple random sample will not show results by strata as it presents only an
aggregative picture. Another consideration is that it may be administratively expedient to
split the population into strata. Yet another consideration is that one can use different
procedures for selecting samples from various strata. If the data are more variable in any
particular stratum, a larger sampling fraction should be taken in that stratum. This would
result in greater overall precision.
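One common way to give more variable strata a larger sampling fraction is Neyman (optimum) allocation, in which each stratum's share of the sample is proportional to its size multiplied by its standard deviation. The notes do not name this rule, so the sketch below is an illustration under that assumption, with hypothetical stratum sizes and standard deviations.

```python
def neyman_allocation(total_sample, sizes, std_devs):
    """Allocate total_sample across strata in proportion to N_h * sigma_h.
    Rounded values may not add up exactly to total_sample."""
    weights = {h: sizes[h] * std_devs[h] for h in sizes}
    total_weight = sum(weights.values())
    return {h: round(total_sample * w / total_weight)
            for h, w in weights.items()}

# Hypothetical strata: equal sizes, but "high" is three times as variable.
stratum_sizes = {"low": 5000, "high": 5000}
stratum_std_devs = {"low": 100.0, "high": 300.0}
print(neyman_allocation(200, stratum_sizes, stratum_std_devs))
# prints {'low': 50, 'high': 150}
```

With equal stratum sizes, the stratum that is three times as variable receives three times the sample.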
This method reduces the sampling error and it is a more accurate and representative
sampling method. Naturally, it is treated as an improvement over simple random
sampling. It provides information about different components of the total population. Use
of stratified sampling also leads to administrative conveniences. In order to use a stratified
sample, some information regarding the population and its strata should be available to
the researcher.
The process of stratified random sampling differs from simple random sampling. In
simple random sampling, sample items are chosen at random from the entire universe,
while in stratified random sampling, a separate random sample is chosen from each
stratum. Stratified random sampling is used in order to increase the precision of sampling
estimates.
A few points regarding cluster sampling may be noted here. First, "whether or not a
particular aggregate of units should be called a cluster" will depend on the circumstances
of each case. In the foregoing example, localities were taken as clusters and households as
individual units. In another case, the households may be taken as a cluster and the
members of the households as individual units. Second, it is not necessary that clusters
should always be natural aggregates such as localities, constituencies, schools or classes.
Artificial clusters may be formed, as is generally done in area sampling where grids may
be determined on the maps. Third, several levels of clusters may be used in any one
sample design. Thus, in a city survey, localities or wards, streets and households may be
selected in which case localities or wards are the clusters at the first level and streets at
the second level and households would be the units.
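A minimal sketch of single-level cluster sampling, with localities as clusters and households as units, is given below. The frame of six localities with four households each is hypothetical and serves only to make the example runnable.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick n_clusters clusters at random and take every unit in them."""
    rng = random.Random(seed)
    picked = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in picked for unit in clusters[name]]
    return picked, units

# Hypothetical frame: 6 localities, each holding 4 household ids.
localities = {f"locality_{i}": [f"L{i}-H{j}" for j in range(1, 5)]
              for i in range(1, 7)}
picked, households = cluster_sample(localities, n_clusters=2, seed=3)
print(picked, len(households))
```

Only the clusters are selected at random; once a locality is chosen, all of its households enter the sample.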
On the basis of the blocks thus identified, numbered and assigned to strata, a stratified
sample of dwellings can be selected. This can be done in either of two ways. First, a
sample of dwellings may be drawn from all the dwellings included in a selected block.
Second, blocks may be divided into segments of a more or less equal size, and a sample
of these segments can be chosen and finally all the dwellings from the selected segments
may be taken in the sample. It will thus be seen that the second method introduces another
stage of sampling, namely, segments.
Although the above discussion relates to area sampling with respect to a city or town, the
same approach is applicable to a large area, say, a state or a country, the only difference
being that one or more additional stages of sampling may have to be introduced.
Finally, it may be pointed out that area sampling is perhaps the only possibility if a
suitable sampling frame is not available.
An illustration would make the concept of multi-stage sampling clear. Suppose a sample
of 5000 urban households from all over the country is to be selected. In such a case, the
first stage sample may involve the selection of districts. Suppose 25 districts out of say
500 districts are selected. The second stage may involve the selection of cities, say four
from each district. Finally, 50 households from each selected city may be chosen. Thus,
one would have a sample of 5000 urban households, arrived at in three stages. It is
obvious that the final sampling unit is the household.
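The three stages of this illustration can be simulated as below. The numbers of districts drawn, cities per district and households per city follow the illustration; the frame sizes (10 cities per district, 1,000 households per city) are assumptions made only so that the sketch can run.

```python
import random

def three_stage_sample(n_districts=500, districts_drawn=25,
                       cities_per_district=4, households_per_city=50,
                       seed=None):
    """Return (district, city, household) triples chosen in three stages.
    Assumed frame: every district has 10 cities and every city has
    1,000 households, numbered from 1."""
    rng = random.Random(seed)
    sample = []
    for district in rng.sample(range(1, n_districts + 1), districts_drawn):
        for city in rng.sample(range(1, 11), cities_per_district):
            for household in rng.sample(range(1, 1001), households_per_city):
                sample.append((district, city, household))
    return sample

sample = three_stage_sample(seed=1)
print(len(sample))                          # 5000
print(len({d for d, _, _ in sample}))       # 25
```

The 5,000 households fall in only 25 of the 500 districts, which is what keeps the fieldwork concentrated.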
In the absence of multi-stage sampling of this type, the process of the selection of 5000
urban households from all over the country would be extremely difficult. Besides, such a
sample would be very thinly spread over the entire country and if personal interviews are
to be conducted for collecting information, it would be an extremely costly affair. In view
of these considerations, sampling from a widely spread population is generally based on
a multi-stage design.
The number of stages in a multi-stage sampling varies depending on convenience and the
availability of suitable sampling frames at different stages. Often, one or more stages can
be further included in order to reduce cost. Thus, in our earlier example, the final stage of
sampling comprised 50 households from each of the four selected cities. Since this would
involve the selection of households all over the city, it would turn out to be quite
expensive and time consuming if personal interviews are to be conducted. In such a case,
it may be advisable to select two wards or localities in each of the four selected cities and
then to select 25 households from each of the 2 selected wards or localities. Thus, the cost
of interviewing as also the time in carrying out the survey could be reduced considerably.
It will be seen that an additional stage comprising wards or localities has been introduced
here. Thus, the sample has become a four-stage sample:
1st stage – districts
2nd stage – cities
3rd stage – localities
4th stage – households
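As a quick arithmetic check, the three-stage and four-stage designs described above yield the same overall sample size. The dictionary keys below are merely descriptive labels.

```python
from math import prod

three_stage = {"districts": 25, "cities per district": 4,
               "households per city": 50}
four_stage = {"districts": 25, "cities per district": 4,
              "localities per city": 2, "households per locality": 25}

for name, design in (("three-stage", three_stage), ("four-stage", four_stage)):
    # Overall sample size is the product of the numbers taken at each stage.
    print(name, prod(design.values()))      # both print 5000
```

The extra stage leaves the total at 5,000 households but confines the interviews in each city to two localities.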
From the preceding discussion it should be clear that a multi-stage sample results in the
concentration of fieldwork. This in turn, leads to saving time, labor and money. There is
another advantage in its use. Where a suitable sampling frame covering the entire
population is not available, a multi-stage sample can be used.
The main point of distinction between a multi-stage and a multi-phase sampling is that in
the former each successive stage has a different unit of sample whereas in the latter the
unit of sample remains unchanged though additional information is obtained from a sub-
sample.
The main advantage of a multi-phase sampling is that it effects economy in time, money
and effort. In our earlier example, if a detailed questionnaire is sent out to a large sample
comprising individuals, they would not be able to provide the necessary information.
Second, more time will be required. Finally, it will be far more expensive to carry out the
survey, especially when personal interviews are involved.
In replicated sampling, several random sub-samples are selected from the population
instead of one full sample. All the sub-samples have the same design and each one of
them is a self-contained sample of the population. For example, take the case of a random
sample of 100 households. This sample may be divided into, say, 10 equal sub-samples to
be assigned to 10 interviewers. Thus, each interviewer may be required to collect
information from 10 households.
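The division of one random sample into equal sub-samples, one per interviewer, can be sketched as follows. The figures used (100 sampled households shared among 10 interviewers) are assumed for illustration.

```python
import random

def replicate(sample_units, n_subsamples, seed=None):
    """Randomly split a sample into n_subsamples equal, non-overlapping
    sub-samples (one per interviewer)."""
    if len(sample_units) % n_subsamples:
        raise ValueError("sample size must be a multiple of n_subsamples")
    rng = random.Random(seed)
    shuffled = list(sample_units)
    rng.shuffle(shuffled)
    size = len(shuffled) // n_subsamples
    return [shuffled[i * size:(i + 1) * size] for i in range(n_subsamples)]

# Assumed figures: 100 sampled households, 10 interviewers.
households = list(range(1, 101))
assignments = replicate(households, n_subsamples=10, seed=5)
print([len(a) for a in assignments])   # ten sub-samples of 10 households each
```

Because the split is random, each sub-sample is itself a random sample, so results can be compared across interviewers.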
such samples do not reveal any systematic errors that may be more or less common to all
interviewers and the compensating errors which cancel each other out over an
interviewer's assignment.
Apart from the above limitations, replicated samples have other disadvantages. If personal
interviews are to be conducted, replicated samples turn out to be costlier. Likewise,
tabulation costs would be higher than in the case of a single large sample. Finally,
replicated samples are more complex to administer.
Sequential sampling is resorted to mainly to bring down the cost, and hence the smallest
possible sample is used. The desired statistics from the first sample, n1, are computed and
evaluated. If these statistics do not satisfy the criteria laid down, a second sample is drawn.
The results of the first and second samples are added and the statistics are recomputed.
This process is continued until the specified criteria are satisfied. The criteria are usually a
minimum significance level, a minimum cluster size, or a minimum confidence interval.
The main advantage of sequential sampling is that it obviates the need for determining a
fixed sample size before the commencement of the survey.
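The draw, evaluate and repeat cycle of sequential sampling can be sketched as a loop. The stopping rule used here (an approximate 95% confidence interval no wider than a stated half-width) is one possible criterion, the simulated population is hypothetical, and the sketch ignores the adjustments a formal sequential test would make for looking at the data repeatedly.

```python
import random
import statistics

def sequential_sample(draw_batch, max_half_width, z=1.96, max_batches=100):
    """Keep adding batches until the approximate confidence interval for
    the mean is no wider than +/- max_half_width (or batches run out)."""
    data = []
    for _ in range(max_batches):
        data.extend(draw_batch())
        if len(data) > 1:
            half_width = z * statistics.stdev(data) / len(data) ** 0.5
            if half_width <= max_half_width:
                break
    return data

rng = random.Random(11)

def draw_25_households():
    # Hypothetical population: purchase amounts roughly normal around 200.
    return [rng.gauss(200, 40) for _ in range(25)]

data = sequential_sample(draw_25_households, max_half_width=5)
print(len(data), round(statistics.mean(data), 1))
```

The final sample size is not fixed in advance; it depends on how variable the data turn out to be.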
accidental sampling, as the respondents in the sample are included merely on account of
their being available on the spot where the survey work is in progress. Convenience
sampling is more suitable in exploratory research, where the focus is mainly on getting
new ideas and insights into a given problem.
A sample of 2000 households has been chosen, subject to the condition that 1200 of these
should be from rural areas and 800 from the urban areas of the territory. Likewise, of the
2000 households, the rich households should number 150, the middle class ones 650 and
the remaining 1200 should be from the poor class. These are independent quota controls.
Independent Controls
Rural     1200     Rich              150
Urban      800     Middle class      650
                   Poor             1200
Total     2000     Total            2000
Inter-related Controls
                 Rural     Urban     Total
Rich               100        50       150
Middle class       400       250       650
Poor               700       500      1200
Total             1200       800      2000
The second table shows the inter-related quota controls. As can be seen, inter-related
quota controls allow less freedom of selection of the units than that available in the case
of independent controls.
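The inter-related controls are consistent with the independent ones: summing the cells of the second table by row and by column reproduces the independent quotas. A short check, using the figures from the tables above (the variable names are illustrative):

```python
# Inter-related quota controls: income class -> (rural, urban) households.
interrelated = {
    "Rich": (100, 50),
    "Middle class": (400, 250),
    "Poor": (700, 500),
}

income_totals = {cls: sum(cells) for cls, cells in interrelated.items()}
rural_total = sum(cells[0] for cells in interrelated.values())
urban_total = sum(cells[1] for cells in interrelated.values())

print(income_totals)             # {'Rich': 150, 'Middle class': 650, 'Poor': 1200}
print(rural_total, urban_total)  # 1200 800
```

The margins match, but the inter-related scheme additionally fixes each cell, for example exactly 100 rich rural households, which is why it leaves the interviewer less freedom.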
There are certain advantages in both the schemes. Independent controls are much simpler,
especially from the viewpoint of interviewers. They are also likely to be cheaper as
interviewers may cover their quotas within a small geographical area. For the same reason,
however, independent controls may reduce the representativeness of the quota sample. Inter-related
quota controls are more representative, though such controls may involve more time and
effort on the part of interviewers. Also, they may be costlier than independent quota
controls.
In view of the non-random element of quota sampling, it has been severely criticized,
especially by statisticians, who consider it theoretically weak and unsound. There are
points both in favor of and against quota sampling. These are given below.
Advantages of quota sampling
(a) It is economical as traveling costs can be reduced. An interviewer need not travel all
over a town to track down pre-selected respondents. However, if numerous controls
are employed in a quota sample, it will become more expensive though it will have
less selection bias.
(b) It is administratively convenient. The labor of selecting a random sample can be
avoided by using quota sampling. Also, the problem of non-contacts and call-backs
can be dispensed with altogether.
(c) When the field work is to be done quickly, perhaps in order to minimize memory
errors, quota sampling is most appropriate and feasible.
(d) It is independent of the existence of sampling frames. Wherever a suitable sampling
frame is not available, quota sampling is perhaps the only choice available.
Occasionally it may be desirable to use judgment sampling. Thus, an expert may be asked
to select a sample of 'representative' business firms. The reliability of such a sample would
depend upon the judgment of the expert. The quota sample, discussed earlier, is in a way a
judgment sample where the actual selection of units within the earlier fixed quota
depends on the interviewer.
It may be noted that when a small sample of a few units is to be selected, a judgment
sample may be more suitable as the errors of judgment are likely to be less than the
random errors of a probability sample. However, when a large sample is to be selected,
the element of bias in the selection could be quite large in the case of a judgment sample.
Further, it may be costlier than random sampling.
(4) MASTER SAMPLES
A master sample is one from which repeated sub-samples can be taken as and when
required from the same area or population. This was first used in the United States when
the US Master Sample of Agriculture was taken. In this sampling, the rural area of over
3000 US counties was divided into segments of about four farms each. "After selecting a
systematic sample of 1/8 of the segments, the materials were duplicated and made
available, with instruction, at low cost." The crucial point to note in respect of master
samples is that "the actual sample for each new survey is not selected directly from the
entire population but from a frame of segments and dwellings that was selected earlier
from the entire population."
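A systematic sample of one segment in eight, as mentioned in the quotation above, takes every eighth segment after a random start. The sketch below assumes a hypothetical frame of 800 numbered segments; the function name and figures are illustrative only.

```python
import random

def systematic_sample(n_units, interval, seed=None):
    """Take every interval-th unit after a random start in 1..interval."""
    rng = random.Random(seed)
    start = rng.randint(1, interval)
    return list(range(start, n_units + 1, interval))

# Hypothetical master frame of 800 numbered segments, sampled 1 in 8.
segments = systematic_sample(n_units=800, interval=8, seed=2)
print(len(segments))   # 100
```

Only the starting point is random; every later selection follows from it at a fixed interval.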
The utility of master samples is limited to a relatively short period, for there may be changes
in the population which would distort the representative character of the master samples.
In view of this, master samples should be based on relatively permanent units, say, dwellings rather than
individuals or households, which frequently undergo changes on account of births, deaths
and migration. The main advantage of master samples is that they can be expeditiously
selected on account of their simplicity. Another advantage is that they are economical,
because the same master frame is used for drawing samples for several surveys, as a
result of which the cost incurred on the preparation of the master frame is spread over
these surveys. Further, on account of this economy in each survey, one can initially spend
more to create a good master frame. Thus, economy may lead to improved quality in the
listing.
(5) PANEL SAMPLES
Panel samples are frequently used in marketing research. In panel samples, the same units
or elements are measured on subsequent occasions. To give an example: Suppose that one
is interested in knowing the change in the consumption pattern of households. A sample
of households is drawn. These households are contacted to gather information on the
pattern of consumption. Subsequently, say after a period of six months, the same
households are approached once again and the necessary information on their
consumption is obtained. A comparison of the results of the two sets of data would
indicate whether there has been any change, and, if so, to what extent. In fact, the
information is collected on a more or less continuous basis with the help of panel
samples.
Panel samples are extremely convenient and economical and the cost of drawing a second
sample is not incurred. But the main limitation of such samples is that it may be difficult
to sustain the interest of individuals included in the panel for a long period. Many
respondents on the panel may refuse to be interviewed twice or may give poor answers.
In either case the quality of the survey will suffer. Another limiting factor in panel
samples is that there may be bias on account of the continued participation in the panel. It
is felt that the individual is conditioned to some extent by the fact that data on purchases
are reported. In such a case the purchase behavior of panel members may become
different from others not covered by the panel. Furthermore, panel samples may turn out
to be more expensive while locating the same sample of respondents after a lapse of, say,
a year, when some of them might have migrated to other areas. This would involve travel
costs in addition to being difficult.
Goal orientation
This suggests that a sample design "should be oriented to the research objectives, tailored
to the survey design, and fitted to the survey conditions". If this is done, it should
influence the choice of the population, the measurement, and also the procedure of
choosing a sample.
Measurability
A sample design should enable the computation of valid estimates of its sampling
variability. Normally, this variability is expressed in the form of standard errors in surveys.
However, this is possible only in the case of probability sampling. In non-probability
samples, such as a quota sample, it is not possible to know the degree of precision of the
survey results.
Practicality
This implies that the sample design can be followed properly in the survey, as envisaged
earlier. It is necessary that complete, correct, practical and clear instructions should be
given to the interviewer so that no mistakes are made in the selection of sampling units
and the final selection in the field is not different from the original sample design.
Practicality also refers to simplicity of the design, i.e. it should be capable of being
understood and followed in actual operation of the field work.
Economy
Finally, economy implies that the objectives of the survey should be achieved with
minimum cost and effort. Survey objectives are generally spelt out in terms of precision,
i.e. the inverse of the variance of survey estimates. For a given degree of precision, the
sample design should give the minimum cost. Alternatively, for a given per unit cost, the
sample design should achieve maximum precision (minimum variance).
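The two statements of the economy criterion can be illustrated with a small Python sketch. It assumes simple random sampling, where the variance of a sample mean is approximately sigma^2 / n (finite population correction ignored), and a linear cost model; the function names and the figures used are hypothetical and are not taken from these notes.

```python
import math

def min_cost_for_precision(sigma, target_variance, cost_per_unit, fixed_cost=0.0):
    """Smallest n (and its cost) whose variance sigma^2 / n meets the target."""
    n = math.ceil(sigma ** 2 / target_variance)
    return n, fixed_cost + cost_per_unit * n

def max_precision_for_budget(sigma, budget, cost_per_unit, fixed_cost=0.0):
    """Largest affordable n and the precision (1 / variance) it achieves."""
    n = int((budget - fixed_cost) // cost_per_unit)
    variance = sigma ** 2 / n
    return n, 1 / variance

# Hypothetical figures: population standard deviation 20, cost of 50 per interview.
n_needed, cost_needed = min_cost_for_precision(sigma=20, target_variance=4, cost_per_unit=50)
n_afforded, precision = max_precision_for_budget(sigma=20, budget=5000, cost_per_unit=50)
```

With these illustrative figures both routes lead to the same design: 100 interviews, a cost of 5000 and a variance of 4.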
It may be pointed out that these four criteria come into conflict with each other in most
cases, and the researcher should carefully balance the conflicting criteria so that he is
able to select a really good sample design. As there is no unique method or procedure by
which one can select a good sample, one has to compare several sample designs that can
be used in a survey. This means that one has to weigh the pros and cons, the strong and
weak points of various sample designs in respect of these four criteria, before selecting
the best possible one.
The procedure for finding the optimal value of ‘n’ or the size of sample under this
approach is as under:
1. Find the expected value of the sample information (EVSI) for every possible n
2. Also work out a reasonably approximated cost of taking a sample of every possible
n.
3. Compare the EVSI and the cost of the sample for every possible n. In other
words, work out the expected net gain (ENG) for every possible n as stated below:
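In the usual decision-theoretic treatment, the expected net gain for a sample of size n is the EVSI for that n less the cost of sampling, ENG(n) = EVSI(n) - Cost(n), and the optimal n is the one with the largest ENG. A minimal Python sketch under that assumption follows; the EVSI and cost functions are hypothetical placeholders chosen only to show the mechanics, not figures from these notes.

```python
import math

def expected_net_gain(evsi, cost):
    """ENG for one sample size: value of the sample information less its cost."""
    return evsi - cost

def best_sample_size(candidates, evsi_fn, cost_fn):
    """Return (n, ENG) for the candidate n with the largest expected net gain."""
    gains = {n: expected_net_gain(evsi_fn(n), cost_fn(n)) for n in candidates}
    n_best = max(gains, key=gains.get)
    return n_best, gains[n_best]

# Hypothetical inputs: EVSI rises with n at a diminishing rate,
# while cost is a fixed set-up charge plus a charge per interview.
def evsi(n):
    return 10000 * (1 - math.exp(-n / 200))

def cost(n):
    return 500 + 12 * n

n_best, gain = best_sample_size(range(50, 1001, 50), evsi, cost)
```

If the largest ENG turns out to be zero or negative, the sketch would suggest that no sample is worth taking at the assumed costs.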
5. Use of Traditional Statistical Model - The formula for the traditional statistical model
depends upon the type of sample to be taken, and it always incorporates three common
variables:
• an estimate of the variance in the population from which the sample is to be drawn,
• the error from sampling that the researcher will allow,
• the desired level of confidence that the actual sampling error will be within the
allowable limits.
The statistical models for simple random sampling include estimation of means and
estimation of proportions.
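The three variables listed above combine in the standard textbook formulas for simple random sampling: n = (z * sigma / e)^2 when estimating a mean and n = z^2 * p * (1 - p) / e^2 when estimating a proportion, where z is the normal deviate for the chosen confidence level and e is the allowable error. The Python sketch below assumes these formulas (no finite population correction); the function names and example inputs are illustrative only.

```python
import math
from statistics import NormalDist

def z_value(confidence):
    """Two-sided standard normal deviate for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def sample_size_for_mean(sigma, allowable_error, confidence=0.95):
    """n = (z * sigma / e)^2, rounded up to a whole unit."""
    z = z_value(confidence)
    return math.ceil((z * sigma / allowable_error) ** 2)

def sample_size_for_proportion(p, allowable_error, confidence=0.95):
    """n = z^2 * p * (1 - p) / e^2, rounded up to a whole unit."""
    z = z_value(confidence)
    return math.ceil(z ** 2 * p * (1 - p) / allowable_error ** 2)

# Hypothetical inputs: sigma = 15 with an allowable error of 2 units;
# a proportion guessed at 0.5 with an allowable error of 5 percentage points.
n_mean = sample_size_for_mean(sigma=15, allowable_error=2)
n_prop = sample_size_for_proportion(p=0.5, allowable_error=0.05)
```

Taking p = 0.5 gives the largest value of p(1 - p), so it is the cautious choice when no prior estimate of the proportion is available.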
SAMPLING ERRORS
Whatever kind of sample is taken and whatever the sample size there will always be error
arising from the sampling process. The extent of such error may be defined as the
difference between a sample result, and the result that would have been achieved by
undertaking a complete census. Such errors arise because particular types of cases are
under-represented or over-represented in the sample compared with the population as a
whole. If, for example, the cases are individual consumers, then the under- or over-
representation of the sexes, ages or social classes will affect the measurement (and, more
importantly, the estimates made from them) of a large number of variables. Lack of
representation in the appropriate quantities may be a product of two factors: systematic
error (or bias) and random error (or variance).
Systematic error
Bias arises when the sampling procedures used bring about over- or under- representation
of types of cases in the sample, which is mostly in the same direction. This may happen
because:
• the selection procedures are not random,
• the selection is made from a list that does not cover the population, or uses a procedure
that excludes certain groups,
• non-respondents are not a cross-section of the population.
If the selection procedures are not random then it means that human judgement has
entered into the selection process. For example, interviewers may be asked to choose
If the Electoral Register is used to select adults aged 16 or over, then, as indicated earlier,
16 and 17 year-olds and many of the 18 year-olds will be missing from the list and will
be under-represented in the final sample. The use of telephone directories will under-represent
certain social groups less likely to be in the telephone book (or those who are
ex-directory). Duplication in lists, for example in the Yellow Pages, may result in some
over-representation. If we try to estimate sales of soap from a sample of private
households, then all users in institutions of various kinds will be excluded.
Non-response is a problem for both censuses and samples. For censuses it means that the
enumeration will be incomplete. If large numbers are missing, it would be inappropriate
to treat those successfully contacted as a 'representative sample'. For samples, it means
that estimates made from the sample will be biased if non-respondents are not themselves
representative of the population. If they are representative, then non-response is not so
much of a problem; but it may still mean that analyses are made on the basis of too small
a sample.
Whatever the reason for the systematic error, the effect will be that all samples that could
be drawn from a population will tend to result in the same direction of over- or under-
representation. The average of all these samples will then not be the same as the real
population average or proportion. Thus if we took lots of samples using a procedure that
tended to omit working mothers with young children, then all the samples will manifest
such under-representation rather than some over-representing them and some under-
representing them so that the average of all samples was very close to the real population
proportion.
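The point can be illustrated with a small simulation. The Python sketch below assumes a hypothetical population in which 20 per cent of cases belong to the group of interest, and a selection procedure that reaches members of that group only half of the time; all figures are invented for illustration.

```python
import random

def draw_sample(population, n, rng, reach_prob=1.0):
    """Draw n units at random; group members (value 1) are only
    successfully included with probability reach_prob."""
    sample = []
    while len(sample) < n:
        unit = rng.choice(population)
        if unit == 1 and rng.random() > reach_prob:
            continue  # the selected group member is missed and replaced
        sample.append(unit)
    return sample

def average_estimate(population, n, reach_prob, draws=500, seed=1):
    """Average, over many samples, of the estimated group proportion."""
    rng = random.Random(seed)
    estimates = [sum(draw_sample(population, n, rng, reach_prob)) / n
                 for _ in range(draws)]
    return sum(estimates) / draws

population = [1] * 2000 + [0] * 8000            # true proportion is 0.20
unbiased = average_estimate(population, 100, reach_prob=1.0)
biased_small = average_estimate(population, 100, reach_prob=0.5)
biased_large = average_estimate(population, 400, reach_prob=0.5)
```

Under these assumptions the unbiased procedure averages close to 0.20, while the biased procedure averages close to 0.11 (0.10 / 0.90) whether each sample contains 100 or 400 cases.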
Systematic errors cannot be reduced simply by increasing the sample size. If certain kinds
of people are not being selected, cannot be contacted or are not responding, the problem will not be
'solved' by taking a bigger sample. Indeed, some kinds of errors will increase with more
interviewers, more questionnaires and greater data-processing requirements. All the
researcher can do is minimize the likelihood of bias by using appropriate sample designs.
Biases for some variables can be checked, for example against Census data or data from
other sources. Sometimes attempts are made to discover the characteristics of non-
responders, for example by sending out interviewers to non-respondents to a postal
survey, taking 'late' responders as typical of non-responders, or gaining demographic data
from the results of another survey that the non-responders have taken part in.
Random error
If we took a number of random, unbiased samples from the same population there will
almost certainly be a degree of fluctuation from one sample to another. Over a large
number of samples such errors will tend to cancel out, so that the average of such
samples will be close to the real population value. However, we usually take only one
sample, and even a sample that has used unbiased selection procedures will seldom be
exactly representative of the population from which it was drawn. Each sample will, in
short, exhibit a degree of error. Such error is often called 'sampling error', but it would be
clearer to think of it as 'random sampling error' to distinguish it from bias (which some
statisticians and some textbooks, confusingly, categorize as 'non-sampling' error).
Unlike bias, which affects the general sample composition and relates to each variable
being measured in unknown ways, random sampling error will differ from variable to
variable. The reason for this is that the extent of such error will depend on two factors:
• the size of the sample - the bigger the sample, the less the random sampling error (but
by a declining amount),
• the variability in the population for that particular variable - a sample used to estimate
a variable that varies widely in the population will show more random sampling error
than for a variable that does not.
These two factors are used as a basis for calculating the likely degree of variability in a
sample of a given size for a particular variable. This, in turn, is used as an input for
establishing, with a specified probability, the range of accuracy of sample estimates, or
for judging whether sample findings are only random sampling fluctuations from a
population of cases in which the findings are untrue.
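For a sample mean under simple random sampling, the two factors above combine in the standard error, sigma / sqrt(n), and a range of accuracy at a chosen probability is the estimate plus or minus z standard errors. The Python sketch below assumes that textbook formula and a normal approximation; the numbers are illustrative only.

```python
import math
from statistics import NormalDist

def standard_error(sigma, n):
    """Standard error of a sample mean: population variability over sqrt(n)."""
    return sigma / math.sqrt(n)

def confidence_interval(mean, sigma, n, confidence=0.95):
    """Range within which the population mean is expected to lie."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error(sigma, n)
    return mean - margin, mean + margin

# Quadrupling n only halves the error (the 'declining amount'), and a
# more variable population (sigma 40 rather than 10) gives a larger error.
errors = {n: (standard_error(10, n), standard_error(40, n))
          for n in (100, 400, 1600)}
low, high = confidence_interval(mean=50, sigma=10, n=100)
```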
NON-SAMPLING ERRORS
Not all errors in a piece of research are a result of the sampling process. Certain kinds of
error may arise even if a complete census is taken. There are four main categories of such
error:
• response errors,
• interviewer errors,
• non-response errors,
• processing errors.
Where research is based on asking people questions then response errors may arise
where, for one reason or another, respondents give wrong answers. This may be through
dishonesty, forgetfulness, faulty memories, unwillingness or misunderstanding of the
questions being asked. Many of these errors arise as a result of poor or inadequate
questionnaire design; putting it the other way round, the potential for such errors to arise
can be minimized by careful design of question-wording, question formulation and
questionnaire layout. In interview surveys, whether face-to-face or by telephone,
interviewers may themselves misunderstand questions or the instructions for filling them
in. They may be dishonest, inaccurate, make mistakes or ask questions in a non-standard
fashion. Interviewer training, along with field supervision and control can, to a large
extent, remove the likelihood of such errors, but they will never be entirely eliminated,
and there is always the potential for systematic differences between the results obtained
by different interviewers.
In nearly all research there will be missing cases, but in survey research there will always
be a degree of non-response because some people will refuse to be interviewed or to
complete a questionnaire, some will be ineligible because they turn out not to be part of
the survey population, some will terminate the interview or refuse to answer some of the
questions, and some will be non-contactable, for example, because they have moved
away, died, or are on holiday at the time of the survey. Even where a census is attempted,
it will often remain incomplete. The extent of non-response will vary considerably
according to the type of research, the topic of the research, and, where based on face-to-
face interviews, on the experience and training of the interviewers. Calculating the
amount of non-response can be confusing since some researchers will, for example, take
the proportion of refusals in the sample drawn, others will take refusals and non-contacts
as a proportion of those found eligible, and so on.
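The source of the confusion is easy to see with figures. The Python sketch below computes the two definitions mentioned above from the same fieldwork outcome; the counts are hypothetical.

```python
def nonresponse_rates(drawn, ineligible, refusals, non_contacts):
    """Two common, and differing, ways of stating non-response."""
    eligible = drawn - ineligible
    return {
        "refusals / sample drawn": refusals / drawn,
        "(refusals + non-contacts) / eligible": (refusals + non_contacts) / eligible,
    }

# Hypothetical fieldwork outcome for a sample of 1,000 addresses.
rates = nonresponse_rates(drawn=1000, ineligible=100, refusals=150, non_contacts=120)
```

With these counts the first definition reports 15 per cent non-response and the second 30 per cent, so a quoted rate means little unless its basis is stated.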
Processing errors can arise back at the office, particularly at the stage of entering answers
to questions onto a computerized database via a keyboard and screen. Agencies
sometimes validate these entries by, in effect, entering them twice, and the computer
checks to see if the two entries are identical. Alternatively, some agencies check samples
of the entries. It is possible, in addition, to apply range checks and logical checks.
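The three kinds of check described above can be sketched in Python as follows. The field names, permitted ranges and the consistency rule are hypothetical and serve only to show the idea.

```python
def double_entry_mismatches(first_entry, second_entry):
    """Fields whose value differs between the two independent keyings."""
    return sorted(f for f in first_entry if first_entry[f] != second_entry.get(f))

def range_errors(record, limits):
    """Fields whose value falls outside its permitted (low, high) range."""
    return sorted(f for f, (low, high) in limits.items()
                  if not low <= record[f] <= high)

def logical_errors(record):
    """Cross-field consistency check (hypothetical rule for illustration)."""
    errors = []
    if record["owns_car"] == "no" and record["cars_owned"] > 0:
        errors.append("cars_owned > 0 although owns_car is 'no'")
    return errors

first = {"age": 34, "owns_car": "no", "cars_owned": 1}
second = {"age": 43, "owns_car": "no", "cars_owned": 1}   # digits transposed
limits = {"age": (16, 99), "cars_owned": (0, 10)}

mismatches = double_entry_mismatches(first, second)
out_of_range = range_errors({"age": 134, "owns_car": "yes", "cars_owned": 1}, limits)
inconsistent = logical_errors(first)
```

Double entry catches keying slips such as the transposed age, range checks catch impossible values, and logical checks catch answers that contradict each other even though each is plausible on its own.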
There are, then, a number of sources of non-sampling error, and it is important to bear
these in mind when interpreting survey results, whether based on a sample or not. The
crucial point is that such errors can arise even if a census is taken.
Errors of various kinds can always be reduced by spending more money, for example, on
more interviewer training and supervision, on random sampling techniques, on pilot
testing or on getting a higher response rate. However, the reduction in error has to be
traded off against the extra cost involved. Furthermore, errors are often interrelated so that
attempts to reduce one kind of error may actually increase another, for example,
minimizing the non-response errors by persuading more reluctant respondents may well
increase response error. Non-sampling errors tend to be pervasive, not well-behaved and
do not decrease - indeed may increase - with the size of the sample. It is sometimes even
difficult to see whether they cause under- or over-estimation of population characteristics.
There is, in addition, the paradox that the more efficient the sample design is in
controlling random sampling fluctuations, the more important in proportion become bias
and non-sampling error.
Processing errors will be minimized by careful editing and checking of the questionnaires
in addition to the use of data entry validation procedures.
Market research agencies will try to minimize bias by using carefully constructed sample
designs that use random procedures wherever possible, or by imposing restrictions on
interviewer choices where it is not. These sample designs were described earlier. Biases
will still remain, however, and sometimes these are known. Thus it may be known that
there are too many women in the sample, or too few men aged 20-24, compared with
known population proportions. Many agencies will make corrections to the data to adjust
for these biases by 'weighting' them.
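One common form of weighting gives each group a weight equal to its known population share divided by its share of the achieved sample. The Python sketch below assumes that approach; the sample composition, the population shares and the spending figures are hypothetical.

```python
from collections import Counter

def cell_weights(sample_cells, population_shares):
    """Weight for each group = population share / achieved sample share."""
    counts = Counter(sample_cells)
    n = len(sample_cells)
    return {cell: population_shares[cell] / (counts[cell] / n) for cell in counts}

def weighted_mean(values, cells, weights):
    """Mean of the values with each case counted at its group's weight."""
    total_weight = sum(weights[c] for c in cells)
    return sum(v * weights[c] for v, c in zip(values, cells)) / total_weight

# Hypothetical sample: 60 women and 40 men, against a population split 50/50.
cells = ["F"] * 60 + ["M"] * 40
spend = [10.0] * 60 + [20.0] * 40
weights = cell_weights(cells, {"F": 0.5, "M": 0.5})

unweighted = sum(spend) / len(spend)
weighted = weighted_mean(spend, cells, weights)
```

Here the unweighted mean of 14 rises to 15 once the over-represented women are weighted down and the under-represented men weighted up. Weighting of this kind corrects only for the characteristics used to form the groups.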
In the real world of market research agencies and their clients it is unfortunately true that
many clients do not understand or lack interest in the basics of sampling. In consequence
many clients do not ask for estimates of bias or calculations of random sampling error. At
the same time the agencies feel that to produce calculations, for example of confidence
intervals for a large number of variables will only add confusion and perhaps distrust of
the data. In consequence, sampling errors are often quietly ignored, and the estimates
given are taken to be the 'truth'. Agencies will instead try to assure their clients that the
occurrence and impact of non-sampling errors have been minimized by:
• demonstrating that the procedures for the collection, analysis and reporting of the
results are 'respectable', meticulous and thorough,
• showing that the research design features are such as to minimize sources of error
within the parameters set by time and cost,
• emphasizing the extent of quality control checks that will uncover, correct and minimize
the occurrence of 'mistakes',
• making corrections to the resulting data so that known biases are adjusted for.
Beyond these assurances, clients are sometimes given some indication of the extent of
random sampling error that remains. Clients may be given 'read-off' tables for groups of
products or types of variable, based on the 'average' variability for that group or type,
given a particular sample size.
$(\sigma_{s1})^2 = \dfrac{\sum_i (X_{1i} - \bar{X}_1)^2}{n_1 - 1}$ and $(\sigma_{s2})^2 = \dfrac{\sum_i (X_{2i} - \bar{X}_2)^2}{n_2 - 1}$
has an F distribution
with $n_1 - 1$ and $n_2 - 1$ degrees of freedom. F ratio is computed in a way that the
larger variance is always in the numerator. Tables have been prepared for the F
distribution that give the value of F for various degrees of freedom for the larger as
well as the smaller variance. The calculated value of F from the sample data is compared
with the corresponding table value of F and, if the former exceeds the latter, then we
infer that the null hypothesis of the variances being equal cannot be accepted.
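The calculation described above can be sketched in Python as follows. The two samples are hypothetical; the table value is not computed by the code and has to be read from an F table for the relevant degrees of freedom and level of significance.

```python
from statistics import variance

def f_ratio(sample_1, sample_2):
    """F ratio with the larger sample variance in the numerator.

    Returns (F, numerator degrees of freedom, denominator degrees of freedom).
    """
    v1, v2 = variance(sample_1), variance(sample_2)   # divisor is n - 1
    n1, n2 = len(sample_1), len(sample_2)
    if v1 >= v2:
        return v1 / v2, n1 - 1, n2 - 1
    return v2 / v1, n2 - 1, n1 - 1

# Hypothetical data for two independent samples.
sample_a = [20, 22, 19, 24, 25]          # variance 6.5
sample_b = [10, 14, 12, 16, 18, 14]      # variance 8.0

f_value, df_num, df_den = f_ratio(sample_a, sample_b)

table_value = 6.26   # approximate 5% table value for (5, 4) degrees of freedom
reject_equal_variances = f_value > table_value
```

Here F = 8.0 / 6.5, about 1.23, with 5 and 4 degrees of freedom, which is well below the table value, so the hypothesis of equal variances would not be rejected for these illustrative data.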
5. Chi-square (χ²) distribution: Chi-square distribution is encountered when we deal with
collections of values that involve adding up squares. Variances of samples require us to
add a collection of squared quantities and thus have distributions that are related to the
chi-square distribution. If we take each one of a collection of sample variances, divide
it by the known population variance and multiply the quotient by (n - 1),
where n means the number of items in the sample, we shall obtain a chi-square
distribution. Thus, $\dfrac{\sigma_s^2}{\sigma_p^2}\,(n - 1)$ would have the same distribution as the chi-square
distribution with (n - 1) degrees of freedom. The chi-square distribution is not
symmetrical and all the values are positive. One must know the degrees of freedom for
using the chi-square distribution. This distribution may also be used for judging the
significance of the difference between observed and expected frequencies and also as a
test of goodness of fit. The generalized shape of the χ² distribution depends upon the
degrees of freedom, and the χ² value is worked out as under:
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ and $E_i$ are the observed and expected frequencies in the i-th of k categories.
Tables are available that give the value of χ² for given degrees of freedom, which may be used
with the calculated value of χ² for the relevant degrees of freedom at a desired level of
significance for testing hypotheses.
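As an illustration of the goodness-of-fit use, the Python sketch below works out the chi-square value for a hypothetical die thrown 132 times, where each face is expected 22 times if the die is fair. The observed counts are invented, and the table value must be looked up rather than computed.

```python
def chi_square(observed, expected):
    """Sum over categories of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical observed frequencies for faces 1 to 6 (132 throws in all).
observed = [16, 20, 25, 14, 29, 28]
expected = [22] * 6

value = chi_square(observed, expected)
degrees_of_freedom = len(observed) - 1

table_value = 11.07   # approximate 5% table value for 5 degrees of freedom
reject_fair_die = value > table_value
```

The calculated value is 9.0 on 5 degrees of freedom, which is below the table value, so these illustrative counts would not lead to rejecting the hypothesis that the die is fair.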