
Module 2: Sampling and Data Collection

Sampling: In statistics and survey methodology, sampling is the selection of a subset of individuals
from a statistical population in order to estimate characteristics of the whole population.

Methods of sampling: there are two main categories of sampling methods: probability sampling and
non-probability sampling.

Probability sampling: also called random sampling. The sample is chosen so that each member of the
universe has a known, non-zero chance of being selected. The most frequently used probability samples
are: simple random sampling, systematic sampling, stratified sampling and cluster sampling.
Simple random sampling: Under this method, each member of the population has a known and equal
chance of being selected. (Randomness does not mean that the process is “haphazard”, as a
layman may be inclined to think. It means that the process of selecting a sample is independent of
human judgment.) There are two methods of drawing a random sample:
• The lottery method
• The use of random numbers
In the lottery method, each unit of the population is numbered and the number is written on a chit of
paper. The chits are folded and put in a box, from which a sample of the requisite size is drawn. In the
second method, tables of random numbers are used. The members of the population are numbered
from 1 to N, from which n (the sample) are selected.

Suppose a sample of 50 is to be selected from a population of 500. First, number the 500 units
from 1 to 500. While numbering the units, ensure that every unit has the same number of digits,
in this case three (001, 002, ..., 500). After the units have been given three-digit numbers, the table of
random numbers is used. One may start from the top left-hand corner of the table and proceed
systematically down sets of three-digit columns, rejecting numbers over 500 and numbers that
have occurred earlier. Using the first thousand numbers from the table of random numbers in this way,
a sample of 50 out of 500 is chosen.
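
The random-number procedure above can be sketched in code. This is a minimal illustration, assuming Python's standard pseudo-random generator stands in for the printed table of random numbers. The function name and the seed value are hypothetical, chosen only for the example.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw unit numbers from 1..population_size without replacement."""
    rng = random.Random(seed)              # seed only makes the draw repeatable
    units = range(1, population_size + 1)  # units numbered 1 to N
    # rng.sample never returns the same unit twice, which mirrors
    # "rejecting numbers that have occurred earlier" in the table method.
    return sorted(rng.sample(units, sample_size))

sample = simple_random_sample(500, 50, seed=42)
# Show the first few selected units as three-digit numbers.
print([f"{u:03d}" for u in sample[:5]])
```

Every unit has the same chance of being drawn, and the choice does not depend on the investigator's judgment.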

Systematic sampling: In this case, the sample members are chosen in a systematic manner from the
entire population. Each member has a known chance of being selected.
Suppose we want to select a sample of 250 from a population of 2500 employees. The sampling
interval (k) is calculated as:

k = N / n = 2500 / 250 = 10

We then randomly select a number between one and ten, say, nine. Thus, we would select from our
list item numbers 9, 19, 29, 39 … up to 2499. This sampling technique is used in selecting names
from city directories or almost any type of list. Both of the above methods require an existing list of
the units of the population.
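
The systematic selection can be sketched as follows. This is a minimal illustration that assumes N is an exact multiple of n, as in the example of 2500 employees and a sample of 250. The function and parameter names are hypothetical.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Select every k-th unit after a random start between 1 and k."""
    k = population_size // sample_size          # sampling interval k = N / n
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start in 1..k
    return [start + i * k for i in range(sample_size)]

picked = systematic_sample(2500, 250, start=9)
print(picked[:4], picked[-1])  # [9, 19, 29, 39] 2499
```

Only the starting point is random. Once it is fixed, every other selected unit follows from the interval k.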

Stratified sampling:
Stratified random sampling is used when the researcher is interested in certain specific categories
within the total population. The population is divided into strata based on recognizable characteristics
of its members, e.g., age, income, education etc.

E.g.: classification of households based on monthly household income (MHI):

MHI              No. of families   % of total population   No. in sample
Below 10,000             250                10                   25
10,000-20,000           1250                50                  125
20,000-30,000            825                33                   82
Above 30,000             175                 7                   18
Total                   2500               100                  250

MHI = monthly household income.
There are two types of stratified sampling:
Proportionate stratified sampling: the number of items drawn from each stratum is proportionate to
the stratum’s share of the population. Since 10% of the households have an MHI below 10,000, this
group will comprise 10% of the sample. The same relation holds for the other strata.
Disproportionate stratified sampling: with proportionate allocation, very little data would be obtained
about some of the smaller strata. In such cases the strata are sampled at different rates, especially
when there are major variances in the values within certain strata.
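
Proportionate allocation can be sketched as below. This is a minimal illustration that takes the percentage shares from the household-income table and a total sample of 250. The function name is hypothetical, and simple rounding is an assumption: it happens to add up to 250 here but does not guarantee the exact total in general.

```python
def proportionate_allocation(shares_percent, sample_size):
    """Allocate the sample to strata in proportion to their population share."""
    # Python's round() sends exact halves to the nearest even integer,
    # so 82.5 becomes 82 and 17.5 becomes 18.
    return {stratum: round(sample_size * pct / 100)
            for stratum, pct in shares_percent.items()}

shares = {
    "Below 10,000": 10,
    "10,000-20,000": 50,
    "20,000-30,000": 33,
    "Above 30,000": 7,
}
allocation = proportionate_allocation(shares, 250)
print(allocation)
```

Within each stratum, the allocated number of households would then be drawn at random, for example by simple random sampling.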

Cluster sampling: In this method, the units comprising the population are grouped into clusters
and the sample is drawn in such a way that each cluster has a known chance of being selected.
This is used when there is incomplete data on the composition of the population, and when it is
desirable to save time and costs by limiting the study to specific geographical areas. For example, a
survey covering an entire city would involve a considerable amount of fieldwork and, consequently,
would cost more. Instead, a few localities from all over the city are first chosen. Then all the
households in these localities are covered in the sample. Apart from the reduction in cost, such a
cluster sample is desirable in the absence of a suitable sampling frame for the whole population.
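
The city example can be sketched as a one-stage cluster sample. This is a minimal illustration in which the localities and household labels are made-up data, and the function name is hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick clusters at random, then take every unit inside the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # random choice of localities
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical city: 10 localities with 20 households each.
city = {f"Locality {i}": [f"L{i}-H{j}" for j in range(1, 21)]
        for i in range(1, 11)}

chosen, households = cluster_sample(city, 3, seed=1)
print(chosen, len(households))
```

Only a list of localities is needed to draw the sample. A full list of households is required only for the localities that are chosen.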

Non-probability sampling:
In this sampling, the chance of any particular unit in the population being selected is unknown. Since
there is no randomness involved in the selection process, an estimate of the sampling error cannot be
made.
There are four commonly used non-probability sampling methods:
• Judgment sampling: a person knowledgeable about the population under study chooses the
sample members he feels would be the most appropriate for the particular study. Thus, the sample
is selected based on his judgment.
• Convenience sampling: the sample is selected based on convenience to the investigator. If 150
persons are to be chosen from Bangalore, the investigator goes to well-known localities such as
M.G. Road, Commercial Street etc., and picks 50 persons from each of these localities. The units
selected may be, for example, the first person the investigator comes across every 10 minutes.
• Quota sampling: this is similar to stratified sampling. The population is divided into strata
based on characteristics of the population, and sample units are chosen so that each stratum is
represented in proportion to its importance in the population. In this method the units are
selected in a non-random manner, whereas in stratified sampling they are selected on a random basis.
• Snowball sampling: this is a multistage sampling in which the first stage is selected using
probability methods and later stages are selected by non-probability methods. The initial set of
respondents may be randomly selected. Additional respondents are then included in the sample
based on referrals made by those initial respondents.

Limitations of sampling: sampling captures data about only a portion of the universe. When
precise data on each unit of the population are needed, sampling is unsuitable.
Even where sampling techniques are applicable, certain problems exist. These problems
concern how well the sample represents the population from which it is drawn.

Data Types: Primary Data, Secondary Data


Primary data: primary data are original data that the researcher collects directly and that have not
been collected previously. They are first-hand information collected through various methods such as
observation.

Types of primary data: primary data can be collected on the following:
• Demographic and socio-economic characteristics: these include features such as the respondent’s
age, qualifications, occupation, marital status, gender, income, social class etc. These variables are
typically used to cross-classify the collected data.
• Attitudes and opinions: marketers are often interested in a person’s attitude towards their
specific brands and towards specific features of those brands. Does the consumer have a
favorable or unfavorable attitude towards the company’s products? It is generally felt that a person’s
attitude is related to his or her purchasing behavior. Therefore, we try to measure the
respondent’s attitude using scaling techniques.
• Awareness and knowledge: this refers to what respondents do and do not know about some
brand or advertisement. Awareness is often used as one measure of effectiveness.

Methods of collecting primary data:

• Communication method: this involves questioning the respondents to secure the desired
information, using a data collection instrument called a questionnaire.
• Observation method: this does not involve questioning. Instead, the situation is observed or
checked and the relevant facts, actions or behavior are recorded. An observer is required,
and the data may also be gathered using mechanical devices.

Communication methods:
Structured-undisguised questionnaire with closed-end questions: this is the most commonly used
form. The questions are presented in exactly the same order to all respondents. The reason for this
standardization is to ensure that all respondents are replying to the same questions. In this method
the responses are standardized along with the questions. This is done by providing fixed alternative
responses from which the respondents select their answers.
Q: Do you drink orange juice?   Yes / No

Structured-undisguised questionnaire with open-ended questions: here the purpose of the study is
clear, but the responses to the questions are open-ended.
Q: How often do you watch TV?
Unstructured-disguised questionnaire: also known as motivation research. This is used when the
respondent is unwilling to answer or cannot find words to describe the answer. Here the
researcher uses techniques known as “projective techniques”. The underlying principle is that the
more unstructured and ambiguous a stimulus (question), the more a respondent projects his
emotions, needs, motivations, attitudes and values into his answers.
Structured-disguised questionnaire: these are used very rarely. Here the motive of the
interviewer is not revealed, so that the subconscious, hidden motives and attitudes of the
respondent can be found out. The structured format makes it easy to code, edit and tabulate the
data.

Survey techniques: the researcher has to decide which technique is to be used to contact the
respondent. The various survey techniques are:
• Personal interviews: the interviewer questions the respondent in a face-to-face meeting. This
can be done on a door-to-door basis or in public places like shopping malls.
• Telephone surveys: prospective respondents are telephoned, usually at home, and asked
to answer a series of questions.
• Mail surveys: the questionnaire is mailed to the respondent with complete instructions and a
self-addressed stamped envelope. There is no personal interaction with the respondent.
Questionnaires can also be distributed as newspaper and magazine inserts.

Questionnaire design:
A researcher is motivated to design a questionnaire when he finds that the data he needs are not
available in secondary sources. The term questionnaire refers to a self-administered
process whereby the respondent himself reads the questions and records his answers without the
assistance of an interviewer.
An interview schedule is technically a list of indicative questions that will be asked of the respondent
in person by an interviewer. A questionnaire is more structured and standardized than an interview
schedule.

Difference between questionnaire and interview schedule:

Questionnaire                                    | Interview schedule
A self-administered process where the            | Refers to a list of questions that will be
respondent himself reads questions and           | discussed with the respondent in person by an
records his answers.                             | interviewer who will record the answers given
                                                 | by the respondent.
• More highly structured and standardized.       | • Mostly unstructured and not standardized.
• Lacks flexibility in wording and sequencing.   | • Highly flexible.
• No rephrasing or rearranging.                  | • Allowed to rephrase or rearrange according
                                                 |   to the wish of the interviewer.

Designing the questionnaire: the steps:
• Specifying data requirements
• Determining the type of questions to be asked
• Deciding the number and sequence of questions
• Revising and pre-testing the questionnaire

• Specifying data requirements: the researcher’s first job is to ask himself certain questions and find
suitable answers for them.
E.g.: What specific data will be necessary to test the hypothesis or establish the relationships in which
he is interested? What variables are to be measured? What relationships among variables are important
in reaching the research objectives? What kind of analysis is to be used? Etc.

• Determining the type of questions to be asked: after specifying the data requirements, the
researcher must decide on the type of questions to be asked of the respondents to get this data.
Generally, questions can be classified into:
* Direct questions: these ask directly for the desired data. However, the directness of a
question also relates to the way a response is interpreted.
* Indirect questions: these are questions whose responses are used to indicate or suggest data about
the respondent other than the facts given in the answer. They can be used when the content of the
question is somehow threatening to the respondent’s ego, prestige or emotional defects. E.g., why do
you think most people buy Lux soap? Here the respondent uses the generalized “other” as a vehicle
for expressing his own thoughts.
* Open-end questions: these are also called free-answer questions. As the name implies, such a
question has no fixed alternatives to which the answer must conform.
E.g., what suggestions do you have for improving orange squashes? ….
* Closed-end questions: these are also called fixed-alternative questions. The respondent is given a
limited number of alternative responses from which he or she is to select the one that most closely
matches his/her opinion.
* Dichotomous questions: these offer the respondent a choice between only two
alternatives and reduce the issue to its simplest form. The fixed alternatives are yes/no,
agree/disagree, right/wrong etc.
* Multiple-choice questions: these provide several set alternatives for the answer.
Thus, they fall between open-end and dichotomous questions.
* Checklists: a checklist is a statement on a problem followed by a series of answers from which the
respondent can choose. Checklist questions can be put on show cards.
* Rating scales: these are used to measure attitudes. Such scales ask the respondent to rate a particular
object along specified dimensions, e.g., Likert scales, semantic differential scales, Stapel scales,
Thurstone scales etc. (Refer to module 6 for more details.)
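
The difference between open-end, dichotomous and multiple-choice questions can be sketched as a small data structure. This is only an illustration: the class, its field names and the answer options for the TV question are hypothetical, while the orange juice and orange squash questions are taken from these notes.

```python
from dataclasses import dataclass, field

@dataclass
class Question:
    text: str
    kind: str                                   # "open", "dichotomous" or "multiple_choice"
    options: list = field(default_factory=list)  # fixed alternatives, if any

    def accepts(self, answer):
        """Return True if the answer is admissible for this question type."""
        if self.kind == "open":
            # No fixed alternatives: any non-empty free answer is allowed.
            return isinstance(answer, str) and answer.strip() != ""
        if self.kind == "dichotomous":
            # Exactly two fixed alternatives.
            return len(self.options) == 2 and answer in self.options
        if self.kind == "multiple_choice":
            # Several fixed alternatives.
            return answer in self.options
        raise ValueError(f"unknown question kind: {self.kind}")

juice = Question("Do you drink orange juice?", "dichotomous", ["Yes", "No"])
squash = Question("What suggestions do you have for improving orange squashes?", "open")
tv = Question("How often do you watch TV?", "multiple_choice",
              ["Daily", "Weekly", "Rarely"])   # options are assumed for illustration

print(juice.accepts("Yes"), juice.accepts("Sometimes"), squash.accepts("Less sugar"))
```

Closed-end answers can be checked and tabulated mechanically, whereas open-end answers have to be read and coded afterwards.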

• Deciding the number and sequence of questions

The number of questions to be asked in a questionnaire depends on the nature of the research project
in hand. If the research project is complex, more questions are required. In sequencing
the questions there must be a clear beginning, middle and end. This ensures the respondent’s
involvement. Threatening or embarrassing questions must be put later. Personal questions
pertaining to income, education, prestige, social life etc. must be asked at the end for the same
reason. The researcher must start with general questions and slowly move to specific questions, thus
following a funneling approach.
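
The funneling approach can be sketched as a simple ordering rule. This is a minimal illustration in which the "specificity" and "sensitive" fields and the sample questions are hypothetical labels a researcher would assign by hand.

```python
def funnel_order(questions):
    """Order questions from general to specific, with sensitive ones last."""
    # False sorts before True, so non-sensitive questions come first.
    # Within each group, a lower specificity score means a more general question.
    return sorted(questions, key=lambda q: (q["sensitive"], q["specificity"]))

questions = [
    {"text": "What is your monthly household income?", "specificity": 1, "sensitive": True},
    {"text": "Which brand of orange juice did you buy last?", "specificity": 3, "sensitive": False},
    {"text": "Do you drink fruit juice?", "specificity": 1, "sensitive": False},
    {"text": "Do you drink orange juice?", "specificity": 2, "sensitive": False},
]

ordered = [q["text"] for q in funnel_order(questions)]
for text in ordered:
    print(text)
```

The income question moves to the end, and the remaining questions narrow from fruit juice in general to a specific brand.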
Preparing the first draft (wording the questions): avoid ambiguous questions. The vocabulary and
language used should leave no difficulty in interpreting the question.
• Revising and pre-testing the questionnaire: the questionnaire must be re-examined, revised and
pre-tested to bring it as close to perfection as possible. Pre-testing must be done with a small group
from the sample taken for the final study. The wording, the number of questions, the sequencing,
the clarity of the instrument etc. need to be checked.

Collecting primary data through observation method:

Observation involves recording events or actions as they take place in the environment. The
recording of selected activities or events may be done through personal or mechanical methods. When a
high degree of accuracy is required, the observation method should be employed.

Benefits of observation:
• More authentic information is obtained through observation because the researcher and the
respondent are close to each other.
• Indirect or disguised observation provides actual data on the respondent’s activities and behavior.
• The observation method is simple to implement.
• It is more economical than other methods.
• It is quicker to conduct.
Limitations of observation:
• The observer acts as a pure recorder rather than an interviewer.
• Perceptions and the underlying reasons for a purchase cannot be observed.
• The results obtained may differ from reality.

Secondary data: sources, advantages, limitations, types of data collections.


Secondary data “include those data which are collected for some earlier research work and are
applicable in the study the researcher has presently undertaken”.

Different types of Secondary data


Depending upon the source, secondary data can be divided into two categories:
1. Internal Secondary Data: internal data are procured by a researcher in the course of normal
operations within his own firm’s premises. These data may include credit records, orders, shipments,
sales results, advertising expenditures, detailed operating statements, raw material costs etc.
2. External Secondary Data: external data are generated and collected from a variety of events and
sources outside the firm’s premises.
The various external data sources can be divided into four categories:
1. Governmental sources. 2. Commercial sources. 3. Industrial sources. 4. Miscellaneous sources.
Governmental sources:
These sources encompass: (i) the Department of Census, (ii) the Central Government and (iii) State,
district, tehsil, block and Panchayat level government sources. The Department of Census carries
information on censuses of population, housing, agriculture, business, manufactures etc. The central
government has departments of health, education and social welfare, industries, agriculture, housing
etc. The data are maintained and regularly updated by each of these departments. Similarly, the
government of each state and union territory has its secretariat in its capital city.

Commercial sources:
Certain marketing research and advertising agencies, such as Indian Marketing Research Bureau of
Hindustan Thompson Associates Ltd., National Advertising Services (Pvt.) Ltd., ORG, MARG, etc., are
engaged in gathering information and providing it to researchers and other business firms at
nominal prices. Researchers can also request specific information from these agencies at
reasonable charges.
Miscellaneous sources:
In this category we include research completed by individual researchers, viz.,
dissertations, monographs, theses and others. Research abstracts are also published by some
associations, universities and institutes.
Merits of using Secondary Data
1. Economy and Time: using secondary data in research is more economical than using primary data.
Primary data collection involves preparing questionnaires, going to the field for the actual
collection, and editing, coding and tabulating the data. This is a complicated and tiresome exercise.
Collecting primary data may also require a much longer time. Thus, in the case of secondary data,
the cost advantage is accompanied by time and effort advantages vis-à-vis primary data.
2. Bias and Availability: secondary data are gathered by research agencies on the occurrence
of various events, e.g. census data. These agencies collect such data for their own purposes. The data
are therefore unlikely to be biased towards the researcher’s problem, because the purpose of their
collection differs from the objectives the researcher has in mind for a particular research problem.
Limitations of using Secondary Data
1. Limited Applicability
Finding data to suit a specific project is very cumbersome. Collecting and using secondary data
requires a lot of hard work on the part of the researcher. Secondary data may show three
types of variation that hinder their use for the project at hand: (i) the units of measurement may
be different, (ii) the definitions and data classes may be different, (iii) the data may be
outdated (obsolete).
2. Doubtful Accuracy
It is difficult to find data of the needed accuracy. Often, the available data are only distantly related
to the research problem at hand, and it is difficult to determine their accuracy for the present project.
Moreover, some secondary data may have been wrongly collected or fabricated by the agencies that
originally collected them. Such data cannot be used for the present research project, as their use
would distort the research results.
