Sampling

1.0 WHAT IS SAMPLING?
Sampling is the act, process, or technique of selecting a suitable sample, or a representative part of a
population for the purpose of determining parameters or characteristics of the whole population. The goal
is for the smaller study group to resemble the larger group as closely as possible. Sampling permits
the researcher to work with a more manageable group size. The study’s findings can be generalized back
to the total population with inferential statistics.
Sampling is used for gathering data from a population. It is a statistical practice by which observations
are made upon certain individuals of a population so as to derive certain conclusions about the population.
1.1 What is a sample?
A sample is a finite part of a statistical population whose properties are studied to gain information about
the whole (Webster, 1985). When dealing with people, it can be defined as a set of respondents (people)
selected from a larger population for the purpose of a survey.
1.2 Why do we use samples?

1. It is frequently too costly, or impracticable, to collect information for the whole population, i.e. to
conduct a census. For example, in business, quality control checks may involve destructive
testing, so not every component produced can be tested to destruction. Likewise, it would
be impractical to measure, for example, the diameter of every single ball bearing produced to
check for precision.
2. In some cases, for example the population of a country, there are obvious practical and
economic reasons which hinder the study of the whole population. A sample is used so that
conclusions drawn from the sample can be generalized to the population.
1.3 Advantages of sampling.
1. Reduced cost. If data are secured from only a small fraction of the aggregate, expenditures may
be expected to be smaller than if a complete census is attempted.
2. Greater speed. For the same reason, the data can be collected and summarized more quickly
with a sample than with a complete count. This may be a vital consideration when the
information is urgently needed.
3. Greater scope. In certain types of inquiry, highly trained personnel or specialized equipment,
limited in availability, must be used to obtain the data. A complete census may then be
impracticable: the choice lies between obtaining the information by sampling or not at all. Thus
surveys which rely on sampling have more scope and flexibility as to the types of information
that can be obtained. On the other hand, if information is wanted for many subdivisions or
segments of the population, it may be found that a complete enumeration offers the best solution.
4. Greater accuracy. Because personnel of higher quality can be employed and can be given
intensive training, a sample may actually produce more accurate results than the kind of complete
enumeration that it is feasible to take.
1.4 Disadvantages of sampling

1. Sampling does not give information on every person, business, etc.
2. Sampling does not provide information for action with respect to individual accounts.
3. Sampling produces results that contain sampling error. This is a disadvantage if the sampling
error is too large for the purpose one has in mind.
2.0 STEPS IN SAMPLING
1. The population (N) to be sampled is defined.
2. The sample size (n) is determined.
3. Bias and error are controlled for.
4. The sample is selected.
2.1 What is a sampling frame?

A sampling frame is the part of the population in which all the individuals can be identified and used in the
sampling exercise. It is the actual set from which the sample is to be taken. The sampling frame should be
representative of the population.
For example, the sampling frame in a household survey may be the people listed in the telephone
directory.
In most cases, due to time and size constraints, a representative subset of the population is taken for
observation. A sample is selected for data collection purposes from the sampling frame (and hence from the
population).
2.2 Definition of population

A population can be defined as including all people or items with the characteristic one wishes to
understand. The group of interest and its characteristics to which the findings of the study will be
generalized must be identified. It is also called the “target” population (the ideal selection). However, at
times the “accessible” or “available” population must be used (realistic selection).
For example,
1. The target population for a household survey may be the adult population of Mauritius.
2. A manufacturer needs to decide whether a batch of material from production is of high enough
quality to be released to the customer, or should be sentenced for scrap or rework due to poor
quality. In this case, the batch is the population.
2.3 Determination of the sample size.

There is no specific sample size for a research study. Sample sizes depend on the type of study being
conducted and the population being studied. The following is a list of examples to determine sample sizes
for different kinds of research studies.
1. Experimental/Causal Research: Most researchers recommend that the sample size of each group
in an experimental study be at least 30 participants. In some cases the group could be as small as
15 if tight controls are used in establishing the research groups.
2. Correlational Studies: The recommended number for relationship studies is also 30. Smaller
numbers make it difficult to obtain statistical significance.
3. Descriptive Studies: The number of participants in a descriptive study can vary significantly.
Usually the size of the population to be studied has more of an effect on the sample size than any
general sampling rule. Small populations require a larger percentage of the population to be
included in the study.
Sample size for a given population based on a 5% Level of significance
General Rules:
1. A smaller percentage is required for a larger population.
2. Studies using a population less than 100 should use the entire population.
3. Populations of approximately 300 should use a sample of 50%.
4. If the population size is around 1500, 20% should be sampled.
5. Populations of 100,000 or more require a sample of only about 384.
Population   Sample size     Population   Sample size
10           10              150          108
20           19              200          132
30           28              300          169
40           36              500          217
50           44              1000         278
60           52              2000         322
70           59              5000         357
80           66              10000        370
90           73              50000        381
100          80              100000       384
2.3.1 Calculating a Sample Size

In general, the larger the sample size, the more closely the sample data will match that from the
population. In practice, however, one must work out the number of responses that will give sufficient
precision at an affordable cost.
Calculation of an appropriate sample size depends upon a number of factors unique to each survey and it
is down to us to make the decision regarding these factors. The three most important are:
1. How accurate one wishes to be.
2. How confident one wishes to be in the results.
3. What budget one has available.
The temptation is to say all should be as high as possible. The problem is that an increase in either
accuracy or confidence (or both) will always require a larger sample and higher budget. Therefore a
compromise must be reached and the degree of inaccuracy and confidence one is prepared to accept must
be worked out.
For example, in a market research project, values such as mean income and mean height are estimated.
For a mean, the required formula is: s = (z / e)^2
Where:
s = the sample size
z = a number relating to the degree of confidence one wishes to have in the result. 95% confidence is
most frequently used and accepted. The value of ‘z’ should be 2.58 for 99% confidence, 1.96 for 95%
confidence, 1.64 for 90% confidence and 1.28 for 80% confidence.
e = the error that can be accepted, measured as a proportion of the standard deviation (accuracy)
Suppose mean income is being estimated and one wishes to know what sample size to aim for in order to
be 95% confident in the result. Assuming that an error of 10% of the population standard deviation
can be accepted, the following calculation can be used:
s = (1.96 / 0.1)^2
Therefore s = 384.16
In other words, 385 people would need to be sampled to meet our criterion.
If the whole population had been interviewed, then the confidence level would have been 100%. But
since only a sample has been interviewed, one is less confident. As the sample size calculation has been
based on the 95% confidence level, there is a 95% chance that the sample mean lies within the acceptable
error limit of the population mean. There is, of course, a 5% chance that it lies outside this limit. If
one wanted to be more confident, the sample size calculation should be based on a 99% confidence level,
and if a lower level of confidence can be accepted, then the calculation can be based on the 90%
confidence level.
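The calculation above can be sketched in a few lines of Python. This is only an illustration of the formula s = (z / e)^2 using the z values quoted in the text; the function name and the lookup table are assumptions made for the example.

```python
import math

# z values quoted in the text for common confidence levels
Z_VALUES = {0.99: 2.58, 0.95: 1.96, 0.90: 1.64, 0.80: 1.28}

def sample_size_for_mean(confidence, error):
    """Return s = (z / e)^2, rounded up to a whole respondent.

    `error` is the acceptable error as a proportion of the standard deviation.
    """
    z = Z_VALUES[confidence]
    return math.ceil((z / error) ** 2)

# Sample sizes for an acceptable error of 10% of the standard deviation
for level in sorted(Z_VALUES, reverse=True):
    print(level, sample_size_for_mean(level, 0.1))
```

Raising the confidence level or shrinking the acceptable error both increase the required sample, which is the trade-off against budget described earlier.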
2.4 Control for sampling bias and error
1. The sources of sampling bias must be known, and ways of avoiding them must be identified.
2. It must be decided whether the bias is so severe that the results of the study will be
seriously affected.
3. In the final report, awareness of bias, rationale for proceeding, and potential effects must
be documented.
2.5 Selection of the sample

It is the process by which the researcher attempts to ensure that the sample is
representative of the population from which it is to be selected. The sampling
method that will be used must be identified.
3.0 HOW TO SELECT THE MOST APPROPRIATE SAMPLING METHOD
A variety of sampling methods can be employed, individually or in combination. Factors
commonly influencing the choice between these designs include:
1. Nature and quality of frame.
2. Availability of auxiliary information about units on the frame.
3. Accuracy requirements and the need to measure accuracy.
4. Whether detailed analysis of the sample is expected.
5. Cost/operational concerns.
4.0 PROBABILITY SAMPLING
A probability sampling scheme is one in which every unit in the population has a chance (greater than
zero) of being selected in the sample, and this probability can be accurately determined. The combination
of these traits makes it possible to produce unbiased estimates of population totals, by weighting sampled
units according to their probability of selection.
Probability sampling includes:
1. Simple Random Sampling
2. Systematic Sampling
3. Stratified Sampling
4. Cluster or Multistage Sampling
These various ways of probability sampling have two things in common:
1. Every element has a known nonzero probability of being sampled.
2. Random selection is involved at some point.
4.1 SIMPLE RANDOM SAMPLING (SRS)

A simple random sample is one in which each member (person) in the total population has an
equal chance of being picked for the sample. In addition, the selection of one member should in
no way influence the selection of another. Simple random sampling should be used with a
homogeneous population, that is, one composed of members who all possess the same attribute
you are interested in measuring. In identifying the population to be surveyed, homogeneity can
be determined by asking the question, “What is (are) the common characteristic(s) that are of
interest?” These may include such characteristics as age, sex, rank/grade, position, income,
religious or political affiliation, etc.
Example: Placing names in a hat and drawing the sample is a method of using simple random
sampling.
4.1.1 Steps
The following steps are used to randomly select a sample.
1. The researcher identifies and defines the population.

2. The appropriate sample size is determined.
3. The population is listed and all members are assigned a number. Each assigned number must have
the same number of digits as the others.
4. The researcher then uses a table of random numbers to select each member of the sample.
Many statistics and research books contain random number tables similar to the sample shown below.
How to use a random number table.
1. Assume that there is a population of 185 students and that each student has been assigned a number
from 1 to 185. Suppose we wish to sample 5 students.
2. Since the population consists of 185 students and 185 is a three-digit number, the first three digits
of the numbers listed on the chart must be used.
3. We close our eyes and randomly point to a spot on the chart. For this example, 20631 in the first
column was selected.
4. That number is interpreted as 206 (first three digits). Since there is no member of the population
with that number, we go to the next number 899 (89990). Once again there is no one with that
number, so we continue at the top of the next column. As we work down the column, the first
number to match the population is 100 (actually 10005 on the chart). Student number 100 would
be in the sample. Continuing down the chart, the other four subjects in the sample would be
students 049, 082, 153, and 005.
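The table-reading procedure can be imitated in code. The sketch below is an illustration only: a pseudo-random generator stands in for the printed chart, and skipping repeated numbers is an assumption (the steps above do not say how repeats are handled). The function name is hypothetical.

```python
import random

def draw_from_number_stream(population_size, sample_size, rng):
    """Mimic reading a random number table.

    Numbers with as many digits as the population size are read one at a
    time; any number that matches no member, or repeats an earlier pick,
    is skipped.
    """
    digits = len(str(population_size))       # 185 -> three-digit numbers
    chosen = []
    while len(chosen) < sample_size:
        candidate = rng.randrange(10 ** digits)  # stands in for the next entry on the chart
        if 1 <= candidate <= population_size and candidate not in chosen:
            chosen.append(candidate)
    return chosen

rng = random.Random(2024)  # fixed seed only so the sketch is repeatable
students = draw_from_number_stream(185, 5, rng)
print(students)
```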
Microsoft Excel has a function to produce random numbers.
The function is simply
=RAND()
Type that into a cell and it will produce a random number in that cell. Copy the formula throughout a
selection of cells and it will produce random numbers between 0 and 1.
Whatever range that is required can be obtained if the formula is modified. For example, if random
numbers from 1 to 250 are needed, the following formula could be entered:
=INT(250*RAND())+1
The INT eliminates the digits after the decimal, the 250* creates the range to be covered, and the +1
sets the lowest number in the range.
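For readers working outside Excel, the same idea can be written in Python. This is a minimal sketch of the =INT(250*RAND())+1 formula; the function name is an assumption, and Python's standard random module is used in place of RAND().

```python
import random

def random_integer(upper, rng=random):
    """Python analogue of =INT(upper*RAND())+1: a whole number from 1 to upper."""
    # rng.random() returns a float in [0, 1), just as RAND() does
    return int(upper * rng.random()) + 1

values = [random_integer(250) for _ in range(1000)]
print(min(values), max(values))
```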
4.1.2 Example:
The University of Mauritius has decided to offer 10 books to the MBA group B class. So that the
books are given to 10 randomly chosen students, a simple random sample of 10 students is drawn
from the class. The names or roll numbers, available in an Excel sheet, are listed and a random
number is allocated to each student. The list is then sorted in ascending order of the random
numbers, and the first 10 students on the sorted list are given the books.
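The sort-by-random-number approach in this example can be sketched as follows. The class list of 40 students is hypothetical, since the text does not give the class size, and the function name is an assumption.

```python
import random

def pick_by_random_sort(names, k, rng):
    """Attach a random number to every name, sort by it, and keep the first k."""
    keyed = [(rng.random(), name) for name in names]
    keyed.sort()                      # ascending order of the random numbers
    return [name for _, name in keyed[:k]]

# Hypothetical class list of 40 students
class_list = [f"Student {i:02d}" for i in range(1, 41)]
winners = pick_by_random_sort(class_list, 10, random.Random(7))
print(winners)
```

Because every student receives an independent random number, each has the same chance of landing in the first 10 positions.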
4.1.3 Advantages:
1. It is easy to conduct.
2. It requires minimal knowledge of the population to be sampled.
3. It minimises bias and simplifies analysis of results. In particular, the variance between individual
results within the sample is a good indicator of variance in the overall population, which makes it
relatively easy to estimate the accuracy of results.
4.1.4 Disadvantages
1. SRS can be vulnerable to sampling error because the randomness of the selection may result in a
sample that doesn't reflect the makeup of the population.
For instance, a simple random sample of ten people from a given country will on average
produce five men and five women, but any given trial is likely to overrepresent one sex and
underrepresent the other. Systematic and stratified techniques, discussed below, attempt to
overcome this problem by using information about the population to choose a more representative
sample.
2. SRS may also be cumbersome and tedious when sampling from an unusually large target
population. In some cases, investigators are interested in research questions specific to subgroups
of the population.
For example, researchers might be interested in examining whether cognitive ability as a predictor of job
performance is equally applicable across racial groups. SRS cannot accommodate the needs of
researchers in this situation because it does not provide subsamples of the population.
3. It requires the names or a list of all the population members.
4. It can be difficult to reach all those selected in the sample.
4.2 SYSTEMATIC SAMPLING
Systematic sampling is a method of random sampling. The individuals to
be sampled are selected at a uniform interval that is measured in time,
order, or space.
4.2.1 Steps:
The following steps are used to systematically select a sample using every Kth name.
1. The researcher identifies and defines the total population.
2. The population is listed using the names of the members.
3. The researcher then determines the sample size.
4. The researcher next divides the total population by the sample size, producing the interval K.
5. The researcher then selects a random starting point on the population list within the first K
members. For example, if the population is 1,000 and we need a sample of 50, K is 20.
The researcher randomly selects a starting point (some number between 1 and 20).
6. The next member of the sample is chosen by adding 20 to the random starting point, and so on
until the sample is complete.
4.2.2 Example:
Let's assume that we have a population that has N=100 people in it.
We want to take a sample of n=20.
To use systematic sampling, the population must be listed in a random order.
The sampling fraction would be f = 20/100 = 20%.
The interval size, k, is equal to N/n = 100/20 = 5.
Now, select a random integer from 1 to 5.
Using a random number formula in Excel, suppose we get 4.
Now, to select the sample, we start with the 4th unit in the list and take every k-th unit (every 5th, because
k=5). We would be sampling units 4, 9, 14, 19, and so on up to 99, and we would end up with 20 units in
our sample.
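The worked example can be reproduced with a short Python sketch. It assumes, as in the example, that N is an exact multiple of n; the function name and the optional `start` argument are introduced here for illustration.

```python
import random

def systematic_sample(population_size, sample_size, start=None, rng=random):
    """Select every k-th unit, beginning at a random start between 1 and k."""
    k = population_size // sample_size      # interval size, k = N / n
    if start is None:
        start = rng.randint(1, k)           # random integer from 1 to k inclusive
    return [start + i * k for i in range(sample_size)]

# N = 100, n = 20, with the random start of 4 used in the example
units = systematic_sample(100, 20, start=4)
print(units)
```

With a start of 4 the selected units run 4, 9, 14, and so on, ending at 99.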
4.2.3 Advantages
1. It is easier to extract the sample than in simple random sampling.
2. It ensures that the individuals chosen are spread across the population.
3. It can be used in cases where a sampling frame does not exist.
4.2.4 Disadvantages
1. Not all possible samples have an equal chance of being selected: once the starting point is
chosen, the rest of the sample is fixed.
2. The Kth person may coincide with a periodic pattern in the population list, producing an
unrepresentative sample.
3. It may prove to be costly and time consuming if samples are not conveniently located.
4. Bias can occur where there are recurring sets in the population.
4.3 STRATIFIED SAMPLING
A second method of modifying the random sampling process is called stratified sampling. In this case the
population is divided into subgroups chosen by the researcher. Stratified sampling can be proportional or
non-proportional. In proportional sampling the participants are chosen in proportion to the number in each
subgroup. Non-proportional sampling occurs when the weight of the subgroup in the population is not a factor.
4.3.1 Steps:
Sampling is carried out as follows:
1. The population is divided into non-overlapping groups (strata) of sizes N1, N2, N3, ..., Ni, such that
N1 + N2 + N3 + ... + Ni = N, where N is the population size.
2. The proportion Ni/N is found for each stratum.
3. A simple random sample with sampling fraction f = n/N is drawn in each stratum.
4.3.2 Example
An example might be taken from the University of Mauritius. The opinions of students in the Faculty are to
be taken in connection with their grievances. Suppose that of 500 students, 300 are male and 200 are
female, and that of these, 60 males and 50 females are part-timers.
1. We are required to take a sample of 100 students, stratified according to the above categories.
2. The first step is to find the total number of students (500) and calculate the percentage in each
group.
% male, full time = (240 / 500) x 100 = 0.48 x 100 = 48
% male, part time = (60 / 500) x 100 = 0.12 x 100 = 12
% female, full time = (150 / 500) x 100 = 0.3 x 100 = 30
% female, part time = (50 / 500) x 100 = 0.1 x 100 = 10
3. This tells us that of our sample of 100,
48% should be male, full time.
12% should be male, part time.
30% should be female, full time.
10% should be female, part time.
48% of 100 is 48.
12% of 100 is 12.
30% of 100 is 30.
10% of 100 is 10.
4. Therefore the above numbers of people are randomly chosen within their strata.
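The proportional allocation in this example can be checked with a small Python sketch. The stratum counts come from the example; the function names are assumptions, and numbering the students within each stratum from 1 upwards is a simplification for illustration.

```python
import random

# Stratum sizes from the example (500 students in total)
strata_sizes = {
    "male, full time": 240,
    "male, part time": 60,
    "female, full time": 150,
    "female, part time": 50,
}

def proportional_allocation(sizes, sample_size):
    """Give each stratum a share of the sample equal to its share of the population."""
    total = sum(sizes.values())
    # Note: with less convenient numbers, rounding may not sum exactly to sample_size
    return {name: round(sample_size * size / total) for name, size in sizes.items()}

def draw_stratified_sample(sizes, allocation, rng):
    """Draw a simple random sample of the allocated size within each stratum."""
    return {name: rng.sample(range(1, sizes[name] + 1), allocation[name])
            for name in sizes}

allocation = proportional_allocation(strata_sizes, 100)
sample = draw_stratified_sample(strata_sizes, allocation, random.Random(1))
print(allocation)
```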
Example of stratified sampling appropriately used.
The state superintendent of schools wants to determine if geographic location has a significant effect on
teachers' support for a merit pay plan.
1. The researcher identifies and defines the population.
2. The variable and subgroups (strata) for which the researcher wants to guarantee appropriate,
equal representation are identified.
3. All members of the population are classified as members of one identified subgroup.
4. Using a table of random numbers, an “appropriate” number of individuals from each of the
subgroups is randomly selected.
Steps:
1. The total teacher population in the state is listed.
2. Teachers are stratified into geographic regions.
3. Proportional sampling is used to select study participants: 20% of population in each area.
4. Teachers are randomly placed into two study groups.

4.3.3 Advantages
1. It gives a more precise sample.
2. It can be used for both proportional and non-proportional sampling.
3. The sample represents the desired strata.
4. It focuses on important subpopulations but ignores irrelevant ones.
5. It improves the accuracy of estimation.
6. It is efficient.
7. Sampling equal numbers from strata varying widely in size may be used to equate the statistical
power of tests of differences between strata.
4.3.4 Disadvantages
1. It requires the names of all population members.
2. It can be difficult to reach all the selected members.
3. It can be difficult to select relevant stratification variables.
4. It is not useful when there are no homogeneous subgroups.
5. It can be expensive.
6. It requires accurate information about the population.

4.4 CLUSTER SAMPLING
The random sampling process can be modified by using the cluster sampling process. Cluster sampling
utilizes the convenience of naturally occurring groups. It is particularly useful in situations where no
list of the elements within a population is available, so that elements cannot be selected directly. As this form
of sampling is conducted by randomly selecting subgroups of the population, possibly in several stages, it
should produce results equivalent to a simple random sample.
The sampling is generally done by first sampling at the higher level(s), e.g. randomly sampling countries,
then sampling from subsequent levels in turn, e.g. sampling counties within the selected countries, then
postcodes within these, then households within these, until the final stage is reached, at which point the
sampling is done in a simple random manner, e.g. sampling people within the selected households. The
‘levels’ in question are defined by the subgroups into which it is appropriate to subdivide the population.
Cluster samples are generally used if:
1. No list of the population exists.
2. Well-defined clusters, which will often be geographic areas, exist.
3. A reasonable estimate of the number of elements in each level of clustering can be made.
4. Often the total sample size must be fairly large to enable cluster sampling to be used effectively.
4.4.1 Steps:
1. The population is identified and defined.
2. The desired sample size is determined.
3. A logical cluster is identified and defined.
4. All clusters that make up the population are listed (or a list of them is obtained).
5. The average number of population members per cluster is estimated.
6. The number of clusters needed is determined by dividing the sample size by the estimated size of
a cluster.
7. The needed number of clusters is randomly selected by using a table of random numbers.
8. All population members in each selected cluster are included in the study.
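The steps above can be sketched in Python as follows. The data are hypothetical (100 classrooms of 30 students each, with a desired sample of 900), and the function name is an assumption; the standard random module replaces the table of random numbers.

```python
import math
import random

def cluster_sample(clusters, desired_sample_size, rng):
    """clusters maps a cluster id to the list of its members.

    Returns the randomly chosen cluster ids and every member of those clusters.
    """
    # Step 5: estimate the average number of members per cluster
    average_size = sum(len(members) for members in clusters.values()) / len(clusters)
    # Step 6: clusters needed = sample size / estimated cluster size
    clusters_needed = math.ceil(desired_sample_size / average_size)
    # Step 7: select that many clusters at random
    chosen = rng.sample(sorted(clusters), clusters_needed)
    # Step 8: include all members of each selected cluster
    members = [m for cluster_id in chosen for m in clusters[cluster_id]]
    return chosen, members

# Hypothetical population: 100 classrooms of 30 students each
classrooms = {c: [f"class{c}-student{s}" for s in range(1, 31)] for c in range(1, 101)}
chosen, members = cluster_sample(classrooms, 900, random.Random(3))
print(len(chosen), len(members))
```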
4.4.2 Example of where cluster sampling would be appropriately used:
A large suburban school district wants to test the effect of a new integrated reading program on sixth
graders. The school division has a sixth grade population of 3,000 students based in 100 classrooms.
Using simple random sampling, the researcher would list all 3,000 students and use a table of random
numbers to select the study participants. This process would create a situation where every one of the 100
classes would have a few students represented in the sample.
Some of the problems that random sampling would create here are:
1. It is difficult to administer since each class would have only a few students in the sample.
2. It is difficult to set up a control and experimental group study since some students would be in the
same class.
3. Increased cost and time to train the participants in all 100 classrooms.
Steps:
1. The cluster to be used must be determined. The logical cluster to use in this study would be each
of the 100 individual classrooms.
2. The 100 classrooms are listed and the number of subjects needed is determined. In this case
30 classrooms are randomly chosen.
3. From the 30 chosen classrooms, random selection is used to assign 15 classes to each
of the experimental and control groups.
4. The treatment or independent variable is applied to the experimental classrooms.
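The selection and assignment of classrooms in this example might look like the following sketch. Classroom numbers 1 to 100 are hypothetical labels, and the function name is an assumption.

```python
import random

def select_and_assign(classroom_ids, n_selected, rng):
    """Pick n_selected classrooms at random, then split them evenly
    into an experimental group and a control group."""
    selected = rng.sample(classroom_ids, n_selected)
    rng.shuffle(selected)               # random order decides the group
    half = n_selected // 2
    return sorted(selected[:half]), sorted(selected[half:])

classroom_ids = list(range(1, 101))     # the 100 sixth-grade classrooms
experimental, control = select_and_assign(classroom_ids, 30, random.Random(11))
print(experimental)
print(control)
```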

4.4.3 Advantages
1. It is efficient.
2. The researcher doesn’t need the names of all population members.
3. It reduces travel to sites.
4. It is useful for educational research.
4.4.4 Disadvantages
Fewer sampling points make it less likely that the sample is representative.
A comparison between Stratified and Cluster Sampling processes

Stratified Sampling                          Cluster Sampling
Homogeneity within group                     Homogeneity between groups
Heterogeneity between groups                 Heterogeneity within groups
All groups are included                      Random selection of groups
Sampling efficiency improved by increasing   Sampling efficiency improved by decreasing
accuracy at a faster rate than cost.         cost at a faster rate than accuracy.
5.0 NON-PROBABILITY SAMPLING
Nonprobability sampling is any sampling method where some elements of the population have
no chance of selection (these are sometimes referred to as 'out of coverage'/'undercovered'), or
where the probability of selection can't be accurately determined. It involves the selection of
elements based on assumptions regarding the population of interest, which forms the criteria for
selection. Hence, because the selection of elements is nonrandom, nonprobability sampling does
not allow the estimation of sampling errors. These conditions place limits on how much
information a sample can provide about the population. Information about the relationship
between sample and population is limited, making it difficult to extrapolate from the sample to
the population.
5.1 CONVENIENCE SAMPLING
It is the process of including whoever happens to be available at the time. It is also called accidental or
haphazard sampling.
5.1.1 Steps:
The researcher simply interviews people wherever they happen to walk by, for example on the street. This is
easy because respondents are chosen without any random mechanism: the researcher takes whoever walks by.
Some people may decline to respond, so the outcome also depends on what is being surveyed.
5.1.2 Examples:
• The female moviegoers sitting in the first row of a movie theatre.
• A group of students in a high school do a study about teacher attitudes. They
interview teachers at the school, a couple of teachers in the family and a few
others who are known to their parents.
• In a class of 50 students, the teacher chooses the first 5 students who raise
their hands or who answer a question correctly.
5.1.3 Advantages:
1. Convenience sampling is often a preferred option to other methods of
sampling because it allows a researcher to pilot-test an experiment with
minimal resources and time.
2. It is also relatively inexpensive and allows the researcher to get a rough
estimate of the results.
3. It is perhaps the best way of getting some basic information quickly and
efficiently.
4. It can be used when it is impossible to access a wider population, for example
due to time constraints.
5.1.4 Disadvantages:
1. The sample is not an accurate representation of the population.
2. The findings from this sample are less definitive.
3. Results have to be extrapolated in order to fine-tune them.
4. It is a completely unstructured approach, which makes it difficult to determine how much of
the effect (dependent variable) results from the cause (independent variable).
5.2 PURPOSIVE SAMPLING
It is also called “judgement” sampling. Purposive sampling is non-random sampling in which the
selection of the sample is based on the researcher's expertise about the population. As purposive sampling is
not based on probability theory, no objective method can be used for measuring the reliability of
the sample results. This technique, being unscientific, always involves the likes and dislikes of the
enumerators. This method is useful only when the sample drawn is small, provided the selection of the
sample is representative and the investigator is thoroughly skilled, has experience in the field of
inquiry, and knows the drawbacks of deliberate selection.
5.2.1 Steps:
When taking the sample, reject people who do not fit a particular profile.
5.2.2 Example
A researcher wants to get opinions from non-working mothers. The researcher goes around an area knocking on
doors during the day, when children are likely to be at school, and asks to speak to the 'woman of the
house'. The first questions are then about whether there are children and whether the woman has a day
job.
5.2.3 Advantages:
1. The people who do not fit the requirements are eliminated.
2. The sample is an accurate or near to accurate representation of the population.
3. The results are expected to be more accurate.
4. It is less time consuming.
5. It is less expensive as it involves lower search costs.
5.2.4 Disadvantage
Potential for inaccuracy in the researcher’s criteria and resulting sample selections.
5.3 QUOTA SAMPLING
In a market research context, the most frequently adopted form of non-probability sampling is known as
quota sampling. In some ways this is similar to cluster sampling in that it requires the definition of key
subgroups. The main difference lies in the fact that quotas (i.e. the number of people to be surveyed)
within subgroups are set beforehand (e.g. 25% 16-24 yr olds, 30% 25-34 yr olds, 20% 35-55 yr olds, and
25% 56+ yr olds). Usually the proportions are set to match known population distributions. Interviewers then
select respondents according to these criteria rather than at random.
5.3.1 Steps:
Like stratified sampling, the researcher first identifies the strata and their proportions as they are
represented in the population. Then convenience or judgment sampling is used to select the required
number of subjects from each stratum. This differs from stratified sampling, where the strata are filled
by random sampling.
5.3.2 Example:
The student council at UOM wants to gauge student opinion on the quality of their extracurricular
activities. They decide to survey 100 of 1,000 students using the grade levels (7 to 12) as the
sub-populations.
The table below gives the number of students in each grade level.
Table 1. Number of students enrolled at UOM, by grade

Grade level   Number of students   Percentage of students (%)   Quota of students in sample of 100
7             150                  15                           15
8             220                  22                           22
9             160                  16                           16
10            150                  15                           15
11            200                  20                           20
12            120                  12                           12
Total         1,000                100                          100
The student council wants to make sure that the percentage of students in each grade level is reflected in
the sample. The formula is:
Percentage of students in Grade 10
= (number of students in Grade 10 ÷ total number of students) x 100
= (150 ÷ 1,000) x 100
= 15%
Since 15% of the school population is in Grade 10, 15% of the sample should consist of Grade 10 students.
Therefore, use the following formula to calculate the number of Grade 10 students that should be included
in the sample:
Sample of Grade 10 students
= 15% of 100
= 0.15 x 100
= 15 students
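The quota column of Table 1 can be reproduced with a short Python sketch, followed by one possible way of filling the quotas as respondents turn up. The enrolment figures are from Table 1; the function names and the tiny list of respondents are hypothetical.

```python
# Enrolment by grade level, from Table 1
enrolment = {7: 150, 8: 220, 9: 160, 10: 150, 11: 200, 12: 120}

def compute_quotas(enrolment, sample_size):
    """Quota for each grade = (students in grade / all students) x sample size."""
    total = sum(enrolment.values())
    return {grade: round(sample_size * count / total)
            for grade, count in enrolment.items()}

def fill_quotas(respondents, quotas):
    """Accept respondents in the order they arrive until each quota is full.

    No random mechanism is involved, which is what separates quota
    sampling from stratified sampling.
    """
    remaining = dict(quotas)
    accepted = []
    for name, grade in respondents:
        if remaining.get(grade, 0) > 0:
            accepted.append(name)
            remaining[grade] -= 1
    return accepted

quotas = compute_quotas(enrolment, 100)
print(quotas)

# Hypothetical arrivals with a quota of one student each for grades 7 and 8
print(fill_quotas([("a", 7), ("b", 7), ("c", 8)], {7: 1, 8: 1}))
```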
5.3.3 Advantages:
1. It is easier to organize than random sampling.
2. It is cheaper to collect samples in this form.
3. More reliable than random sampling.
4. Each group to be researched is included in the sample.
5.3.4 Disadvantages:
1. People who are less accessible (more difficult to contact, more reluctant to participate) are
underrepresented.
2. The subjective nature of this selection means that only a proportion of the population has a
chance of being selected in a typical quota sampling strategy.
3. It does not meet the basic requirement of randomness.
4. Some units may have no chance of selection or the chance of selection may
be unknown. Therefore, the sample may be biased.
5. It is not as representative of the population as a whole as other sampling
methods.
6. Because the sample is non-random, it is impossible to assess the possible
sampling error.
SNOWBALL SAMPLING
In snowball sampling, someone who meets the criteria for inclusion in the study is identified. The person
is then asked to recommend others who they may know who also meet the criteria. Although this method
would hardly lead to representative samples, there are times when it may be the best method available.
5.4.1 Steps:
1. Find people to study.
2. Ask them to refer you to other people who fit your study requirements, then
follow up with these new people.
3. Repeat this method of requesting referrals until you have studied enough
people.
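The steps above can be sketched as a walk through a referral network. The Python example below is an illustrative sketch only: the `referrals` dictionary, the names in it, and the function `snowball_sample` are hypothetical, and a real study would collect referrals from interviews rather than from a ready-made table.

```python
from collections import deque

# Hypothetical referral network: each person lists the people they can refer.
referrals = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": [],
    "E": ["F"],
    "F": [],
}


def snowball_sample(seeds, referrals, target_size):
    """Recruit people wave by wave through referrals until target_size is reached."""
    sampled = []
    seen = set(seeds)
    queue = deque(seeds)                 # step 1: the people found initially
    while queue and len(sampled) < target_size:
        person = queue.popleft()
        sampled.append(person)           # study this person
        for contact in referrals.get(person, []):
            if contact not in seen:      # step 2: follow up only with new referrals
                seen.add(contact)
                queue.append(contact)
    return sampled                       # step 3: stop once enough people are studied


sample = snowball_sample(["A"], referrals, target_size=4)
print(sample)
```

Because everyone in the sample is reached through the starting person, the result depends heavily on who the seeds are, which is one source of the bias listed under the disadvantages of this method.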
5.4.2 Example:
If the homeless are being studied, we are unlikely to find good lists of homeless people within a specific
geographical area. However, if we go to that area and identify one or two, we may find that they know
very well who the other homeless people in their vicinity are and how we can find them.
5.4.3 Advantages
Snowball sampling is especially useful when we are trying to reach populations that are
inaccessible or hard to find.
5.4.4 Disadvantages:
1. It yields good qualitative material but is poor at generating reliable data
that apply to the larger population.
2. Because the sample is chosen by the target people themselves, it is liable to various
forms of bias. People tend to associate not only with people who share the
study selection characteristic but also with people who share other characteristics.
Heterogeneity Sampling
Homogeneity Sampling
If all opinions or views need to be included, and there is no concern for representing these views
proportionately then heterogeneity sampling is performed. Another term for this is sampling for diversity.
In many brainstorming or nominal group processes (including concept mapping), some form of
heterogeneity sampling is used because the primary interest is in getting a broad spectrum of ideas, not
identifying the "average" or "modal instance" ones. In effect, what must be sampled is ideas and not
people. Here the universe is made up of all possible ideas relevant to some topic and a sampling of this
population is needed, not a sample of the people who have the ideas. Clearly, in order to get all of the
ideas, and especially the "outlier" or unusual ones, a broad and diverse range of participants must be
included. Heterogeneity sampling is, in this sense, almost the opposite of modal instance sampling.
How to ensure that the sample is representative of the population
An essential prerequisite is that any sample must be selected in such a way as to be representative of the
population from which it has been drawn. The fundamental consideration is that any sample should be a
random sample, i.e. every member of the population should have an equal chance of being selected.
However, a representative sample is not an exact replica, in miniature, of the population.
The results are subject to sampling error.
WHAT ARE RESPONSE RATES?
The percentage of people who respond to your survey is considered the response rate. A high survey
response rate helps to ensure that the survey results are representative of the survey population.
Sufficient response rates are important for surveys. A survey that collects very little data may not contain
substantial information. In order to collect successful responses, researchers must take into consideration
the audience, the quantity of online surveys in circulation, and the potential for surveys to be reported as spam.
These factors may result in lower respondent interest and acceptance of survey invitations. But there are
ways to increase response rates!
The Importance of Response Rates

A high response rate is the key to legitimizing a survey's results. When a survey receives responses from
a large percentage of its target population, the findings are seen as more accurate. Low response rates,
on the other hand, can damage the credibility of a survey's results, because the sample is less likely to
represent the overall target population.
Low response rates are a continuing problem for survey organizations. Some people simply refuse to
participate in surveys, while others, for a wide range of reasons, cannot participate. Still, a well-
designed survey, coupled with incentives and techniques to elicit response, can help guarantee a healthy
response rate.
Reasons for Non-Response
There are many reasons why people might choose not to respond to a survey. Sometimes time is a
factor. People may feel they can't spare the time to participate in a survey. Others may see a survey as a
nuisance, particularly telephone and mail surveys. However, some factors that can cause non-response
lie in the hands of the surveyors themselves, and can thus be avoided. The following list includes some
of the pitfalls that can lead to non-response:
• If potential respondents have trouble understanding the questions, the chance that they
will choose not to participate increases. Survey questions must be clear and concise.
• The survey format must be unambiguous and consistent. Question formats should also
remain consistent and not jump randomly from type to type (e.g. from multiple choice to short answer
and back again). Instructions should be as explicit as possible.
• People are much more likely to respond to a nicely designed survey. A form that looks
unprofessional or haphazardly constructed will undoubtedly lead to a lower response rate. Web
surveys that require too much scrolling or contain too many pages can also inhibit response.
• Telephone surveys can occur any time during the day, but the incredible growth of the
telemarketing industry has led many people to screen their calls, especially during the dinner
hour. If telephone interviewers identify themselves and their purpose up front, instances where
people assume they are telemarketers and screen them out can be minimized.
Response Issues
Not only do survey researchers have to be concerned about non-response errors, but they also have to
be concerned about the following potential response errors:
• Response bias occurs when respondents deliberately falsify their responses. This error greatly
jeopardizes the validity of a survey's measurements.
• Response order bias occurs when a respondent loses track of all options and picks one that
comes easily to mind rather than the most accurate.
• Response set bias occurs when respondents do not consider each question and just answer all the
questions with the same response. For example, they answer "disagree" or "no" to all questions.
These response errors can seriously distort a survey's results. Unfortunately, response bias is difficult to
eliminate; even if the same respondent is questioned repeatedly, he or she may continue to falsify
responses. Response order bias and response set errors, however, can be reduced through careful
development of the survey questionnaire.
Methods That Can Induce Response

Just as there are ways to avoid causing non-response, there are numerous proven methods that can
stimulate response. Some of the methods survey organizations use to help increase response rates
include the following:
• Incentives are perhaps the most effective method to ensure participation. Survey
organizations use many kinds of incentives to elicit response, such as offering to share the
survey's findings or awarding a certain number of 'points' for each survey taken that can then be
redeemed for prizes. Some survey organizations enter respondents in a sweepstakes or even pay
a modest stipend for participation.
• Although answering machines are generally viewed as a problem, they can also be used
to a survey organization's advantage. A simple message requesting a call back can be very
effective, especially if the organization uses an 800 number.
• Postcards or e-mails announcing upcoming surveys have been shown to increase
response.
• Successful survey organizations always follow up the initial invitation with a reminder to
those that have not yet responded.
• Establishing legitimacy can help convince potential respondents to participate in a
survey. A good survey tells potential respondents who is conducting the survey and what
credentials they hold. It also outlines procedures for asking questions and providing feedback.
• Surveying employees is a great way to gauge both opinion and workplace efficiency, but
these surveys only work if enough employees participate. Offering employees time to fill out a
survey not only ensures participation, it also sends a positive message that their opinions are
valued, leading to honest, more useful responses.
PART 2
Discuss how population parameters (mean, variance and proportion) are estimated from sample
statistics
ESTIMATION OF POPULATION PARAMETERS
Every member of a population cannot be examined so we use the data from a sample, taken from the
same population, to estimate some measure, such as the mean, of the population itself.
The sample will provide us with the best estimate of the exact 'truth' about the population. The method of
sampling depends on the data available but the ideal method, as every member of the population has an
equal chance of being selected, is random sampling.
We estimate limits within which we expect the 'truth' about the population to lie and state how
confident we are about this estimation.
There are therefore two types of estimate of a population parameter:
• Point estimate - one particular value;
• Interval estimate - an interval centred on the point estimate.

Point Estimate of Parameter (e.g. mean)
From the sample, a single value is calculated to serve as an estimate for the population parameter.
a) The best estimate of the population percentage, π, is the sample percentage, p.
b) The best estimate of the unknown population mean, µ, is the sample mean:
$$\bar{x} = \frac{\sum x}{n}$$
This estimate of µ is often written $\hat{\mu}$ and referred to as 'mu hat'.
µ (mu) is the symbol for the population mean.
c) The sample variance, $s^2$, is calculated using the formula in (d) without taking the square root.
d) The best estimate of the unknown population standard deviation, σ, is the sample standard deviation s,
where:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
This is from the [xσn−1] key.
N.B. The [xσn] key gives
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
which underestimates σ.
There is little difference between the two estimates when n is large.
Example 1: The Accountant wishes to obtain some information about all the invoices sent out to a
supermarket's account customers. In order to estimate this information, a sample of twenty invoices is
randomly selected from the whole population.
Values of Invoices (£):

32.53   22.27   33.38   41.47   38.05
31.47   38.00   43.16   29.05   22.20
25.27   26.78   30.97   38.07   38.06
25.11   24.11   43.48   32.93   42.04

1) The proportion in the population over £40, π:
$$p = \frac{4}{20} \times 100 = 20\%, \qquad \hat{\pi} = 20\%$$
2) The population mean, µ:
$$\bar{x} = \frac{658.44}{20} = \pounds 32.92, \qquad \hat{\mu} = \pounds 32.92$$
3) The population standard deviation, σ:
$$s \ (\text{from the } [x\sigma_{n-1}] \text{ key}) = \pounds 7.12, \qquad \hat{\sigma} = \pounds 7.12$$
Interval Estimate (Confidence interval)
Often it is more useful to quote two limits between which the parameter is expected to lie, together with
the probability of it lying in that range.
The limits are called the confidence limits and the interval between them the confidence interval.
e.g. We are 95% confident that the mean male height lies between 5' 9" and 5' 11".
The width of the confidence interval depends on three sensible factors:
• the degree of confidence we wish to have in it, i.e. the chance of it including the 'truth', e.g. 95%;
• the size of the sample, n;
• the amount of variation among the members of the sample, i.e. its standard deviation, s.
Confidence interval for the population mean (µ ) where σ is unknown (Usual case)
If the sample size is small (n < 30) and the population standard deviation is unknown, then the
t-tables are used.
These give a wider interval and so compensate for the probable error in estimating the value of the
population standard deviation from the sample standard deviation.
(If the sample size is large, the t-table and the normal table give similar results.)
Confidence Interval:
$$\mu = \bar{x} \pm t \, \frac{s}{\sqrt{n}}$$
where t is from the t-table with (n-1) degrees of freedom.
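As an illustration, the formula can be applied to the twenty invoices of Example 1. This is a sketch under stated assumptions: a 95% confidence level is chosen for the illustration, and the critical value 2.093 is the two-sided 95% entry for 19 degrees of freedom read from a standard t-table rather than computed.

```python
import math
import statistics

# Invoice values (£) from Example 1.
invoices = [
    32.53, 22.27, 33.38, 41.47, 38.05,
    31.47, 38.00, 43.16, 29.05, 22.20,
    25.27, 26.78, 30.97, 38.07, 38.06,
    25.11, 24.11, 43.48, 32.93, 42.04,
]

n = len(invoices)
x_bar = statistics.mean(invoices)
s = statistics.stdev(invoices)          # sample standard deviation, (n - 1) divisor

# Assumption: 95% confidence, so t is the two-sided 95% value for n - 1 = 19
# degrees of freedom, taken from a printed t-table.
T_95_DF19 = 2.093

margin = T_95_DF19 * s / math.sqrt(n)   # t * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"95% confidence interval for the mean: £{lower:.2f} to £{upper:.2f}")
```

Under these assumptions the interval runs from about £29.59 to £36.25. A different confidence level or sample size would need a different t value.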
The purpose of sampling is to draw inferences about a population parameter on the basis of sample
information.
Point estimators
A sample mean derived from a process of random sampling provides a good estimator of the population
mean in the sense that it is one that is near to the true population mean. A single sample mean may also be
regarded as a good estimator as it provides an unbiased estimate of the population mean. The probability
of a sample mean selected at random exceeding µ by a certain amount is exactly equal to the
probability of it falling below µ by the same amount.
We can write $\hat{\mu} = \bar{X}$, where the hat (^) on µ indicates that it is an estimate of µ, the unknown population
parameter. Thus the sample mean $\bar{X}$ may be used as an estimator, an unbiased estimator, of the population
mean, µ.
Since the value of the estimator, $\bar{X}$, computed from a single sample is a single value, it is referred to as a
point estimate of the unknown population mean because it represents a single point on the scale of
possible values.
Interval estimators
Interval estimation is, in statistics, the evaluation of a parameter (for example, the mean or average) of a
population by computing an interval, or range of values, within which the parameter is most likely to be
located. Intervals are commonly chosen such that the parameter falls within them with a 95 or 99
percent probability, called the confidence coefficient. Hence, the intervals are called confidence
intervals; the end points of such an interval are called upper and lower confidence limits.
The interval containing a population parameter is established by calculating that statistic from
values measured on a random sample taken from the population and by applying the knowledge
(derived from probability theory) of the fidelity with which the properties of a sample represent
those of the entire population.
The probability tells what percentage of the time the assignment of the interval will be correct, but not
what the chances are that it is true for any given sample. Of the intervals computed from many samples, a
certain percentage will contain the true value of the parameter being sought.
For example,
Suppose we want to estimate the mean summer income of a class of business students.
For n = 25 students, the sample mean $\bar{x}$ is calculated to be 400 $/week. (Point estimate)
An alternative statement is:
The mean income is between 380 and 420 $/week. (Interval estimate)
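The long-run reading of the confidence coefficient can be illustrated by simulation. The sketch below is not from the original notes and rests on loudly stated assumptions: a hypothetical normal population of weekly incomes with true mean 400 and a known standard deviation of 50, samples of n = 25, and a 95% interval based on the normal distribution. It counts how often the computed interval contains the true mean.

```python
import random
import statistics

TRUE_MEAN = 400.0   # hypothetical population mean ($/week)
SIGMA = 50.0        # hypothetical population standard deviation, assumed known
N = 25              # sample size, as in the example
TRIALS = 2000       # number of repeated samples

# Two-sided 95% critical value of the standard normal distribution (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)
half_width = z * SIGMA / N ** 0.5

rng = random.Random(1)  # fixed seed so the run is repeatable
hits = 0
for _ in range(TRIALS):
    sample = [rng.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    x_bar = statistics.fmean(sample)
    # Does this sample's interval contain the true mean?
    if x_bar - half_width <= TRUE_MEAN <= x_bar + half_width:
        hits += 1

coverage = hits / TRIALS
print(f"half-width = {half_width:.1f}, share of intervals containing the mean = {coverage:.3f}")
```

With these assumed figures the half-width is about 19.6, close to the 380 to 420 range quoted above, and the share of intervals that contain the true mean should come out near 95%. Any single interval either contains the mean or does not.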
Qualities of Estimators
Qualities desirable in estimators include unbiasedness, consistency, and relative efficiency:
• An unbiased estimator of a population parameter is an estimator whose expected value is equal
to that parameter.
• An unbiased estimator is said to be consistent if the difference between the estimator and the
parameter grows smaller as the sample size grows larger.
• If there are two unbiased estimators of a parameter, the one whose variance is smaller is said to
be relatively efficient.
Unbiasedness
An unbiased estimator of a population parameter is an estimator whose expected value is equal to
that parameter.
E.g. the sample mean $\bar{X}$ is an unbiased estimator of the population mean µ, since:
$$E(\bar{X}) = \mu$$
Consistency
An unbiased estimator is said to be consistent if the difference between the estimator and the
parameter grows smaller as the sample size grows larger.
E.g. $\bar{X}$ is a consistent estimator of µ because:
$$V(\bar{X}) = \frac{\sigma^2}{n}$$
That is, as n grows larger, the variance of $\bar{X}$ grows smaller.

Efficiency
If there are two unbiased estimators of a parameter, the one whose variance is smaller is said to
be relatively efficient.
E.g. both the sample median and the sample mean are unbiased estimators of the population
mean (for a symmetric population); however, the sample median has a greater variance than the sample mean,
so we choose $\bar{X}$ since it is relatively efficient when compared to the sample median.
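The comparison between the two estimators can be checked by simulation. The sketch below is illustrative only and assumes a hypothetical standard normal population (mean 0, standard deviation 1) with samples of size 25. It draws many samples and compares the spread of the sample means with the spread of the sample medians.

```python
import random
import statistics

MU, SIGMA = 0.0, 1.0   # hypothetical normal population
N = 25                 # sample size
TRIALS = 2000          # number of repeated samples

rng = random.Random(0)  # fixed seed so the run is repeatable
means, medians = [], []
for _ in range(TRIALS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

# Spread of each estimator across the repeated samples.
var_mean = statistics.pvariance(means)
var_median = statistics.pvariance(medians)

print(f"average of sample means   = {statistics.fmean(means):.3f}")
print(f"average of sample medians = {statistics.fmean(medians):.3f}")
print(f"variance of sample means   = {var_mean:.4f}")
print(f"variance of sample medians = {var_median:.4f}")
```

Both averages should sit close to the population mean, which is what unbiasedness means. The variance of the sample means should be close to σ²/n = 0.04, and the variance of the medians noticeably larger, which is why the mean is called relatively efficient.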
Four commonly used confidence levels are
Standard error of the mean
As sample size increases, the sample means cluster more and more around the true population mean. Thus
the variance and standard deviation of the sampling distribution decline as sample size is increased. This
standard deviation is formally referred to as the standard error of the mean:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
where σ is the population standard deviation and n is the sample size.
The standard error declines as the sample size is increased, but not proportionately: it declines in
proportion to 1/√n, not 1/n.
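A short numerical sketch makes the square-root effect visible. The population standard deviation of 10 used below is a hypothetical value chosen for illustration, and the function name `standard_error` is likewise an assumption.

```python
import math


def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)


SIGMA = 10.0  # hypothetical population standard deviation

for n in (25, 100, 400):
    print(f"n = {n:3d}  standard error = {standard_error(SIGMA, n)}")
```

Each fourfold increase in the sample size only halves the standard error, so gains in precision become progressively more expensive.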
Importance of confidence interval estimators for parameters.
In statistics, a confidence interval (CI) is a particular kind of interval estimate of a population parameter.
Instead of estimating the parameter by a single value, an interval likely to include the parameter is given.
Thus, confidence intervals are used to indicate the reliability of an estimate. How likely the interval is to
contain the parameter is determined by the confidence level or confidence coefficient. Increasing the
desired confidence level will widen the confidence interval.
A confidence interval is always qualified by a particular confidence level, usually expressed as a
percentage; thus one speaks of a "95% confidence interval". The end points of the confidence interval are
referred to as confidence limits. For a given estimation procedure in a given situation, the higher the
confidence level, the wider the confidence interval will be.
The calculation of a confidence interval generally requires assumptions about the nature of the estimation
process – it is primarily a parametric method – for example, it may depend on an assumption that the
distribution of the population from which the sample came is normal. As such, these confidence intervals
are not robust statistics, though modifications can be made to add robustness.
