
Chapter three

Sampling techniques and sample size determination

TADESSE AWOKE

Why are you here?


- You have an interest in statistical reasoning.
- You have a desire to learn to use statistics properly in experimental design and data analysis.
- You want to develop your ability to critically assess scientific (or pseudoscientific) arguments.

What is expected from you


- Attendance at most lectures
- Participation in class
- Feedback to me on what you like and dislike about the course, and especially how it might be improved
- Submission of assignments on time

History of statistics
Sir Ronald Aylmer Fisher (1890-1962)
- The theory of estimates and statistical inference
- Analysis of variance (Fisher's ANOVA)
- Maximum likelihood estimation

Thomas Bayes (c. 1701-1761)
- Probability theory
- Prior and posterior distributions

What Do Biostatisticians Do?


- Identify and develop treatments for disease and estimate their effects.
- Identify risk factors for diseases.
- Design, monitor, analyze, interpret, and report results of clinical studies.
- Develop statistical methodologies to address questions arising from medical/public health data.

Why Can it be Interesting?


- Combines the rigor of mathematics with the uncertainties of the real world.
- Can make a contribution to the advancement of science, statistics, medicine, and public health.
- Can study diseases/health problems in which you may have an interest (cancers, HIV, reproductive health, ...).
- Whether you apply statistics to biological or other processes, it is the art of decision making in the face of uncertainty.

How to properly use Biostatistics


- Develop an underlying question of interest
- Generate a hypothesis
- Design a study
- Collect data
- Analyze data
  - Descriptive statistics
  - Statistical inference
- Interpret the data and report the results

The Role of Statistics in the Scientific Method

[Figure: flowchart of the scientific method. Stages: define problems, questions, and research aims; review the literature; develop a hypothesis; design experiments or other tests; collect and record data; peer review; analyze and interpret data; disseminate results; public understanding of research; scientific impact of research. Side labels: statistical methods, measurement tools, and models; revise or modify protocol; replication of results; investigating research integrity.]

Sampling
Research and Sample:
Research means to search or investigate exhaustively. It is a careful or diligent search; a studious inquiry or examination; especially, investigation or experimentation aimed at the discovery and interpretation of facts, the revision of accepted theories or laws in the light of new facts, or the practical application of such new or revised theories or laws. It can also be the collection of information about a particular subject.

In research terms, a sample is a group of people, objects, or items taken from a larger population for measurement. The sample should be representative of the population so that we can generalize the findings from the research sample to the population as a whole.

Idea Behind Sampling


The term sampling refers to strategies that enable us to pick a subgroup from a larger group and then use this subgroup as a basis for making inferences about the larger group. The researcher's goal is always to generalize about the population based on observations of the sample. Sampling strategies not only make it possible to collect data from a smaller number of respondents; they also make it possible to go into greater depth with this smaller number, by asking more and deeper questions or by following up the structured questions with more open-ended or qualitative questions (see chapter 1), than would be possible with a larger group of respondents.


Sampling Strategy
The manner in which a sample is drawn is an important factor in determining how useful the sample will be for making inferences about the population from which it is drawn. It is quite possible to have a very large sample upon which no sound decision can be based. This occurs because the respondents in the sample are not really similar to the population about which we want to make generalizations. To be useful, the sample must be representative of the population about which we wish to make generalizations.


Basic Terms
- Population: a group of individuals (persons, objects, or items) from which samples are taken for measurement, for example a population of presidents or professors, books or students.
- Census: obtained by collecting information about each member of a population.
- Sample: obtained by collecting information about only some members of a population.
- Sampling frame: the list of people from which the sample is taken. It should be comprehensive, complete, and up to date. Examples of sampling frames: electoral register, postcode address file, telephone book, and so on.

- Probability samples: With probability sampling methods, each population element has a known (non-zero) chance of being chosen for the sample.
- Non-probability samples: With non-probability sampling methods, we do not know the probability that each population element will be chosen, and/or we cannot be sure that each population element has a non-zero chance of being chosen.


What is Sampling?
Sampling is the act, process, or technique of selecting a suitable sample, or a representative part of a population, for the purpose of determining parameters or characteristics of the whole population. When dealing with people, a sample can be defined as a set of respondents (people) selected from a larger population for the purpose of a survey. The purpose of sampling is to draw conclusions about populations from samples. To do so, we use inferential statistics, which enables us to determine a population's characteristics by directly observing only a portion (or sample) of the population.


What is Sampling?
We obtain a sample rather than a complete enumeration (a census) of the population for many reasons. Obviously, it is cheaper to observe a part rather than the whole, but we should prepare ourselves to cope with the dangers of using samples. Measuring whole populations is also difficult because of:
- the large size of many populations
- the inaccessibility of some of the population
- the destructiveness of the observation
- accuracy


Target and study population


Target population refers to the entire group of individuals or objects to which researchers are interested in generalizing the conclusions. Because of practicalities, the entire target population often cannot be studied. The target population usually has varying characteristics, and it is also known as the theoretical population. The study population is the part of the target population to which we have access and from which the sample is actually selected.


Steps in Sampling Design


To reach the respondents, we need to follow these steps:
- What is the target population? Define the target population and the study population.
- What are the parameters of interest? Define the parameters of interest of the study.
- What is the sampling frame? Select the sampling frame.
- What is the appropriate sampling method? Determine which sampling method to use, depending on the setting of the population and the purpose of the study.
- Plan procedures to select the sampling units.
- Determine the size of the sample to be selected from the population.
- Select the actual sampling units.
- Conduct the field work.

Error in Sampling
A sample is expected to mirror the population from which it comes; however, there is no guarantee that any sample will be precisely representative of the population. The uncertainty associated with an estimate that is based on data gathered from a sample of the population rather than the full population is known as sampling error. So why do sample estimates have uncertainty associated with them? There are two reasons:
- Estimates of characteristics from the sample data can differ from those that would be obtained if the entire population were surveyed.
- Estimates from one subset or sample of the population can differ from those based on a different sample from the same population.

Sampling Error
One of the most frequent causes of a sample being unrepresentative of the total population is sampling error. Sampling error comprises the differences between the sample and the population that are due solely to the particular participants that have been selected. Sampling error can make a sample unrepresentative of its population, and it is related to sample size.


The cause of sampling error


- Chance: the main cause of sampling error; it is the error that occurs just because of bad luck.
- Sampling bias: a tendency to favor the selection of participants that have particular characteristics.

The chance component (sometimes called random error) exists no matter how carefully the selection procedures are implemented, and the only way to minimize chance sampling errors is to select a sufficiently large sample.
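The link between chance error and sample size can be illustrated with a small simulation. This is only a sketch: the population of blood pressure readings is made up for illustration, and the function name `spread_of_sample_means` is hypothetical. It draws repeated simple random samples of different sizes and measures how much the sample mean varies from one sample to the next.

```python
import random
import statistics

def spread_of_sample_means(population, n, repeats, rng):
    """Standard deviation of the sample mean over repeated simple random samples."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

rng = random.Random(42)  # fixed seed so that the run is reproducible
# Hypothetical population: 10,000 systolic blood pressure readings (mmHg)
population = [rng.gauss(120, 15) for _ in range(10_000)]

# The spread of the sample means is the chance (random) sampling error
spreads = {n: spread_of_sample_means(population, n, repeats=200, rng=rng)
           for n in (10, 100, 1000)}
for n, s in spreads.items():
    print(f"n = {n:4d}: spread of sample means = {s:.2f}")
```

The spread shrinks as n grows, which is the sense in which a sufficiently large sample minimizes chance error. Sampling bias, by contrast, is not reduced by taking a larger sample.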


Non Sampling Error (Measurement Error)


A non-sampling error is an error that results solely from the manner in which the observations are made. It can occur whether the total study population or a sample is being used. It may either be produced by participants in the study or be an innocent by-product of the sampling plans and procedures. These biased observations can be innocent but very damaging to the findings of the study.


The Cause of Non-Sampling Error
- The interviewer's effect
- The respondent effect
- Knowing the study purpose
- Induced bias
- Non-response

In general, non-sampling error can be grouped into two main types: systematic and random. Systematic error makes survey results unrepresentative of the target population by distorting the survey estimates in one direction. Random error can distort the results in any given direction but tends to balance out on average. Thus, the total survey error is the sum of sampling error and non-sampling error.


Advantages and Disadvantages of Sampling

Advantages of Sampling:
- Sampling is a must in some situations.
- It saves time.
- Sampling reduces the study population to a reasonable size, so that expenses are greatly reduced.
- Sometimes experiments are done on a sample basis.
- Sampling saves the source of data from being entirely consumed.
- Sample data can also be used to check the accuracy of census data.


Disadvantages of Sampling
- If the sample is biased, not representative, or too small, the conclusions may not be valid and reliable.
- In research, the respondents of the study must have a common characteristic, which is the basis of the study.
- If the population is very large and there are many sections and subsections, the sampling procedure becomes very complicated.
- If the researcher does not possess the necessary skill and technical know-how in sampling procedures, the outcome may be seriously compromised.


Types of Sampling
There are many methods of sampling when doing research. One of the most important decisions that any researcher makes is how to obtain the type of participants needed for the study. The sample that we draw for our study determines the generalizability of our findings. When we draw our sample, we want to have a good representation of all of the kinds of people in the population. In general, there are two methods of sampling:
- Probability sampling method
- Non-probability sampling method

Types of Sampling Methods

Samples
- Probability samples: simple random, systematic, stratified, cluster
- Non-probability samples: purposive, judgemental, convenience

Probability Sampling Method


Probability sampling strategies typically use a random or chance process, although there are important exceptions to this rule. In random sampling, each member of the population has an equal chance of being selected, and members are selected independently of one another. What does it mean to be independent? The researchers select each person for the study separately. Suppose you were asked to participate in an experiment, enjoyed it, and told your friends to contact the researcher to volunteer for the study. This would be an example of non-independent sampling. The "equal chance" and "independent" components of random sampling are what make us confident that the sample has a reasonable chance of representing the population.


Sampling Frame
The sampling frame is the population as it is defined and available through records. There are a number of probability sampling techniques that can be used, depending on the type and complexity of the population we want to study. Where do we start? When we use probability sampling, we begin by defining our population.


1. Simple Random Sampling


Simple random sampling is the most straightforward of the random sampling strategies. We use this strategy when we believe that the population is relatively homogeneous for the characteristic of interest. This sampling method has the following assumptions:
- The population consists of N homogeneous subjects.
- There is a frame for the population.
- The sample consists of n subjects or objects.
- All possible samples of n subjects are equally likely to occur.


Procedures to select the sample


How do we actually take a random sample? The specific procedures may vary depending on your resources, but all involve some type of random process. Depending on the size and complexity of the population, we can use different tools to select n units from the frame: the lottery method, a table of random numbers (available in the appendix of many research methods and statistics textbooks), or computer-generated random numbers. The lottery method is appropriate if the total population is not too large; if the population is large, the lottery method becomes very difficult to use, and a table of random numbers or computer-generated random numbers is the feasible choice.


Steps in selecting a sample using a table of random numbers:
- Define the population.
- Determine the desired sample size.
- List the population from 1 to N.
- Assign each individual on the list a consecutive number with a fixed number of digits, such as 01-99 or 001-999.
- Decide whether to read the table row-wise or column-wise.
- Select an arbitrary starting number in the table of random numbers, at a defined row and column.
- Make sure that the selected number has the same number of digits as N.
- If the selected number corresponds to the number assigned to an individual in the population, that individual is in the sample; otherwise drop the number and proceed to the next number, row-wise or column-wise.
- Repeat until the desired sample size is reached.
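The same procedure can be mimicked with computer-generated random numbers. The sketch below is illustrative only: the function name `simple_random_sample` and the values N = 850 and n = 20 are hypothetical. Each call to the generator plays the role of reading the next entry in the table.

```python
import random

def simple_random_sample(N, n, rng):
    """Select n distinct labels from 1..N by reading random numbers one at a time."""
    if not 0 <= n <= N:
        raise ValueError("sample size must be between 0 and N")
    digits = len(str(N))              # read as many digits as N has
    chosen = []
    while len(chosen) < n:
        number = rng.randrange(10 ** digits)   # next "table entry", e.g. 000-999
        if 1 <= number <= N and number not in chosen:
            chosen.append(number)     # matches a listed individual: keep it
        # otherwise drop the number and read the next one
    return chosen

rng = random.Random(2024)             # fixed seed for a reproducible draw
sample = simple_random_sample(N=850, n=20, rng=rng)
print(sorted(sample))
```

In practice the standard library call `random.sample(range(1, N + 1), n)` gives an equivalent simple random sample in one step; the loop above is written out only to mirror the table-reading steps.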


Example
Assume that the total number of patients who visited Gondar University Hospital in the last six months is N. We want to estimate the prevalence of TB among those patients. If we think that the patients who visited the hospital within the specified period are homogeneous with respect to the variable of interest, and a list of the patients is available, then we can use simple random sampling to select the sample.


2. Systematic Random Sampling


Systematic sampling is a method of selecting sample members from a larger population according to a random starting point and a fixed, periodic interval. Typically, every kth member is selected from the total population for inclusion in the sample. Systematic sampling is still thought of as being random, as long as the periodic interval is determined beforehand and the starting point is random. It is frequently chosen by researchers for its simplicity and its periodic quality.

To use systematic sampling to select the study subjects, the population needs to be homogeneous; however, the method does not require a frame. Hence, in the absence of a frame, this method will be the best choice.


Steps in systematic sampling:
- Define the population.
- Determine the desired sample size n.
- List the population from 1 to N.
- Determine the sampling interval k, where k = N/n.
- Select a random number between 1 and k; denote this number by a.
- Starting at a, take every kth unit on the list until the desired sample size is obtained. The selected units will then be a, a+k, a+2k, a+3k, ...
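A minimal sketch of these steps follows. The function name `systematic_sample` and the values N = 1000 and n = 50 are hypothetical, and taking k as the integer part of N/n is an assumption made here for the case where N is not an exact multiple of n.

```python
import random

def systematic_sample(N, n, rng):
    """Return the positions a, a+k, a+2k, ... of a systematic sample of size n."""
    if not 1 <= n <= N:
        raise ValueError("sample size must be between 1 and N")
    k = N // n                  # sampling interval (integer part of N/n)
    a = rng.randint(1, k)       # random start between 1 and k, inclusive
    return [a + i * k for i in range(n)]

picked = systematic_sample(N=1000, n=50, rng=random.Random(7))
print(picked[:5], "...", picked[-1])
```

Only the starting point is random; every later position is fixed by the interval, which is why a periodic pattern in the list can distort the sample.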


Let's say that we lined up our population into a nice and neat sampling frame and selected every 3rd member. What would our sample look like? Does it look good? Since systematic random sampling is a type of probability sampling, the researcher must ensure that all members of the population have an equal chance of being selected as the starting point or initial subject. The researcher must also be certain that the chosen constant interval between subjects does not reflect a pattern of traits present in the population. If a pattern in the population exists and it coincides with the interval set by the researcher, the randomness of the sampling technique is compromised.


3. Stratified Random Sampling


Stratified random sampling is used when we have subgroups in our population that are likely to differ substantially in their responses or behavior (i.e., if the population is heterogeneous). In stratified random sampling, the population is first divided into a number of parts or 'strata' according to some characteristic, chosen to be related to the major variables being studied. For example, suppose you are interested in visual-spatial reasoning, and previous research suggests that men and women will perform differently on these types of task. You divide your population into male and female members and randomly select equal numbers within each subgroup (or "stratum"). With this technique, you are guaranteed to have enough of each subgroup for meaningful analysis. Often we use simple random sampling to select a sample from each stratum after stratification.

Steps involved in the stratified sampling method:

- Define the population.
- Determine the desired sample size.
- Identify the variable and subgroups (strata) for which you want to guarantee appropriate representation (either proportional or equal).
- Classify each member of the population into one of the identified subgroups.
- Randomly select (using simple random sampling) an appropriate number of individuals from each subgroup.

The total sample size is then the sum of the samples from all subgroups.
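The steps above can be sketched for the case of proportional representation, where each stratum contributes to the sample in proportion to its size. The strata names, their sizes, and the function name `stratified_sample` are all hypothetical.

```python
import random

def stratified_sample(strata, n, rng):
    """strata maps a stratum name to the list of its members.

    Each stratum receives a share of n proportional to its size, and that
    share is drawn by simple random sampling within the stratum.
    """
    N = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        n_h = round(n * len(members) / N)        # proportional share
        sample[name] = rng.sample(members, n_h)  # SRS within the stratum
    return sample

# Hypothetical population of 1,000 households in three strata
sizes = {"urban": 600, "semi_urban": 300, "rural": 100}
strata = {name: [f"{name}-{i}" for i in range(size)] for name, size in sizes.items()}

chosen = stratified_sample(strata, n=50, rng=random.Random(1))
print({name: len(members) for name, members in chosen.items()})
```

When the shares are not whole numbers, rounding can make the total differ from n by a unit or two, so the shares may need a small manual adjustment.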


There are two methods for getting the study subjects from each subgroup: proportional allocation and equal allocation. We use proportional allocation when the subgroups vary dramatically in size in the population: the larger the population in a subgroup, the larger its sample size will be. Equal allocation is used if the populations of the subgroups are approximately equal. Consider the following figure:


Advantages of stratified sampling over simple random sampling:
- It can provide greater precision than a simple random sample of the same size.
- Because it provides greater precision, a stratified sample often requires a smaller sample, which saves money.
- A stratified sample can guard against an "unrepresentative" sample.
- We can ensure that we obtain sufficient sample points to support a separate analysis of any subgroup.

The main disadvantage of a stratified sample is that it may require more administrative effort than a simple random sample.


4. Cluster Random Sampling


If the study covers a wide geographical area, using the other methods will be too costly. The idea is to divide the total population into different clusters, so that the unit of selection is the cluster. The total population in the selected clusters is then taken as the sample.


Steps in cluster sampling are:


- Define the population.
- Determine the desired sample size.
- Identify and define a logical cluster (for example kebele, Got, residence, and so on).
- Make a list of all clusters in the population.
- Estimate the average number of population members per cluster.
- Determine the number of clusters needed by dividing the sample size by the estimated cluster size.
- Randomly select the required number of clusters (a table of random numbers can be used, as the total number of clusters is manageable).
- Include in the sample all population members in the selected clusters.
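A minimal sketch of these steps, assuming a hypothetical district of 20 kebeles with 50 households each; the function name `cluster_sample` and all data are made up for illustration.

```python
import math
import random

def cluster_sample(clusters, n, rng):
    """clusters maps a cluster name to the list of its members."""
    avg_size = sum(len(m) for m in clusters.values()) / len(clusters)
    # Number of clusters needed: sample size divided by average cluster size
    n_clusters = min(len(clusters), math.ceil(n / avg_size))
    selected = rng.sample(sorted(clusters), n_clusters)   # random choice of clusters
    # Everyone in a selected cluster enters the sample
    members = [person for name in selected for person in clusters[name]]
    return selected, members

clusters = {f"kebele_{k:02d}": [f"hh_{k:02d}_{i:02d}" for i in range(50)]
            for k in range(1, 21)}
selected, members = cluster_sample(clusters, n=200, rng=random.Random(3))
print(selected, len(members))
```

Because whole clusters are taken, the achieved sample size equals the desired one only when the selected clusters happen to match the average size.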

Consider the following graphical display:


5. Multistage Random Sampling


This is the most complex sampling strategy. The researcher combines simpler sampling methods to address sampling needs in the most effective way possible. For example, the administrator might begin with a cluster sample of all schools in the district. Then he might set up a stratified sampling process within clusters. Within schools, the administrator could conduct a simple random sample of classes or grades. By combining various methods, researchers achieve a rich variety of results useful in different contexts.

Non-Probability Sampling Method


Most probability sampling strategies have a random or chance component, though there are some important exceptions. It is this random component that gives us confidence that our sample is a reasonably good representation of the population. However, this random component can be time-consuming and expensive. Hence, in the presence of constraints, the alternative is a non-probability sampling method. Non-probability sampling strategies are used when it is practically impossible to use probability sampling strategies.

1. Purposive Sampling
When the desired population for the study is rare or very difficult to locate and recruit for a study, purposive sampling may be the only option. For example, you are interested in studying cognitive processing speed of young adults who have suffered closed head brain injuries in automobile accidents. This would be a difficult population to find.


2. Convenience Sampling
Convenience sampling selects a particular group of people but it does not come close to sampling all of a population. The sample would generalize only to similar programs in similar cities. Convenience sampling looks just like cluster sampling. The major difference is that the clusters of research participants are selected by convenience rather than by a random process.

3. Judgment Sampling
The researcher selects the sample based on judgment.

4. Quota sampling
Quota sampling is the non-probability equivalent of stratified sampling. Unlike stratified sampling, where the strata are filled by random sampling, the quotas are filled by non-random selection.

5. Snowball sampling
Snowball sampling is a special non-probability method used when the desired sample characteristic is rare. It relies on referrals from initial subjects to generate additional subjects: first identify someone who meets the criteria, and then let him/her bring in others he/she knows. While this technique can dramatically lower search costs, it comes at the expense of introducing bias, because the technique itself reduces the likelihood that the sample will represent a good cross-section of the population.

Sample Size
Determining the sample size for a study is a crucial component of study design. The goal is to include sufficient numbers of subjects so that statistically significant results can be detected. Among the questions that a researcher should ask when planning a survey or study is "How large a sample do I need?" The answer depends on the aims, nature, and scope of the study and on the expected result, all of which should be carefully considered at the planning stage. In general, sample size depends on:
- the type of data analysis to be performed
- the desired precision of the estimates one wishes to achieve
- the kind and number of comparisons that will be made
- the number of variables that have to be examined simultaneously
- how heterogeneous the sampled population is


There are three possible categories of outcome variables. The first is where the variable of interest has only two alternative responses: yes/no, dead/alive, vaccinated/not vaccinated, and so on. The second category covers outcome variables with multiple, mutually exclusive alternative responses, such as marital status, religion, blood group, and so on. For these two categories of outcome variables, the data are generally expressed as percentages or rates, so we can use percentages to compute the sample size. The third category covers continuous response variables such as birth weight, age at first marriage, blood pressure, and serum uric acid level, for which numerical measurements are usually made. In this case the data are summarized in the form of means and standard deviations or their derivatives.


Sample Size Determination


There are several approaches to determining the sample size. Depending on the type of response variable, categorical or continuous, we have two sets of formulas. The sample size formulas come from the formulas for the maximum error of the estimates and are derived by solving for n.

Sample Size for Single Population Mean


This is the situation in which the research question is about a mean. Three quantities must be specified:

- Standard deviation of the population: It is rare that a researcher knows the exact standard deviation of the population. Typically, the standard deviation of the population is estimated:
  - from the results of a previous survey,
  - from a pilot study,
  - from secondary data,
  - from the judgment of the researcher.
- Maximum acceptable difference: This is the maximum amount of error that you are willing to accept.
- Desired confidence level: The confidence level is your level of certainty that the sample mean does not differ from the true population mean by more than the maximum acceptable difference. Commonly we use a 95% confidence level.

Then the sample size determination formula for a single population mean is:

$$ n = \left( \frac{z_{\alpha/2}\,\sigma}{w} \right)^2 $$

where
- $\alpha$ = the level of significance, obtained as 1 - confidence level
- $\sigma$ = standard deviation of the population
- $w$ = maximum acceptable difference
- $z_{\alpha/2}$ = the value from the standard normal table for the given confidence level
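As an illustration, the calculation for a single mean can be sketched in a few lines, assuming the usual precision-based expression n = (z σ / w)². The function name `n_for_mean` and the example values (σ = 15, w = 2) are hypothetical.

```python
import math
from statistics import NormalDist

def n_for_mean(sigma, w, confidence=0.95):
    """Sample size so that the sample mean is within w of the population mean."""
    alpha = 1 - confidence                      # level of significance
    z = NormalDist().inv_cdf(1 - alpha / 2)     # standard normal value, about 1.96 at 95%
    return math.ceil((z * sigma / w) ** 2)      # always round up

# Hypothetical inputs: standard deviation 15, maximum acceptable difference 2
print(n_for_mean(sigma=15, w=2))
```

Halving the maximum acceptable difference roughly quadruples the required sample size, because w enters the formula squared.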

Sample Size for Single Population Proportion


This is the situation in which the variable of interest is categorical. Three questions must be answered to determine the sample size for a single population proportion:

- Best estimate of the population proportion of the variable of interest: Make your best estimate of what the actual percentage of the survey characteristic is. Possible sources of this proportion are:
  - the results of a previous study,
  - a pilot study,
  - the judgment of the researcher,
  - simply taking 50%.
- Maximum acceptable difference
- Desired confidence level


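Then the formula for the sample size of a single population proportion is:

$$ n = \frac{z_{\alpha/2}^2 \, p(1-p)}{d^2} $$

where $p$ is the best estimate of the population proportion and $d$ is the maximum acceptable difference (margin of error).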


Example
An MPH student wants to conduct research on the prevalence of ANC utilization among mothers in Dabat district. Given that the prevalence found in a previous study was 45.7%, what sample size should he take to address his objective?

Solution: Take the margin of error as d = 5%. A confidence level of 95% gives $z_{\alpha/2} = 1.96$. Then, using the formula:

$$ n = \frac{z_{\alpha/2}^2 \, p(1-p)}{d^2} = \frac{(1.96)^2 (0.457)(0.543)}{(0.05)^2} \approx 381.3 $$

which is rounded up to n = 382.
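The arithmetic of this example can be checked with a short sketch, assuming the usual expression n = z² p(1 - p) / d². The function name `n_for_proportion` is hypothetical.

```python
import math
from statistics import NormalDist

def n_for_proportion(p, d, confidence=0.95):
    """Sample size to estimate a proportion p to within d (margin of error)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)   # always round up

print(n_for_proportion(p=0.457, d=0.05))   # ANC example: prevalence 45.7%, d = 5%
print(n_for_proportion(p=0.50, d=0.05))    # no prior estimate: take 50%
```

Taking p = 50% gives the largest value of p(1 - p), which is why it is the conservative choice when no previous estimate is available.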


Example: A new calcium channel blocker is to be tested for treatment of patients with unstable angina. The effect on heart rate is unknown. Suppose it is determined that a clinically important change in heart rate as a result of taking this medication is 5 beats per minute in either direction over the initial 48-hour period after taking the medication. What sample size is needed for a study of the change in heart rate if the study is to have 80% power, with $\alpha = 0.05$, for detecting a change in heart rate of 5 beats per minute in either direction, assuming that the standard deviation is 10 beats per minute?

$$ n = \frac{\sigma^2 \left( z_{1-\beta} + z_{1-\alpha/2} \right)^2}{\Delta^2} = \frac{10^2 \, (0.84 + 1.96)^2}{5^2} = 31.36, \quad \text{use } n = 32 $$

where $\sigma$ is the standard deviation, $\Delta$ is the change to be detected, and $1-\beta$ is the power.
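The same power-based calculation can be sketched in code, using exact normal quantiles rather than the rounded table values 0.84 and 1.96. The function name `n_for_mean_change` is hypothetical.

```python
import math
from statistics import NormalDist

def n_for_mean_change(sigma, delta, alpha=0.05, power=0.80):
    """Sample size to detect a mean change of delta with a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    return math.ceil(sigma ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2)

# Heart-rate example: SD 10 beats/minute, change of 5 beats/minute
print(n_for_mean_change(sigma=10, delta=5))
```

Asking for higher power or a smaller detectable change increases the required sample size.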

Some Considerations


Sample size for two populations


Equal Sample size for the Difference of proportions

Unequal Sample size for the Difference of proportions


Equal Sample size for the Difference of proportions


Here the objective of the study is to check whether there is a significant difference between two proportions coming from two different populations. The sample size to be taken from each group is assumed to be equal.


Note that the formula gives the sample size to be taken from each group; moreover, it does not take the continuity correction into account. The continuity correction brings normal curve probabilities into closer agreement with binomial probabilities. Applying the continuity correction increases n slightly.


Example
An investigator wants to determine whether the mortality rate in calves raised by farmers' wives differs from the mortality rate in calves raised by hired managers. He/she hypothesizes a calf mortality rate of 0.25 for calves raised by farmers' wives and 0.40 for calves raised by hired managers. The level of significance, alpha, is stated to be 0.01, and the desired power of the test is 0.95. How many calves should be included in the study? Solution: From the given information, the required sample size can be computed as follows:
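The worked solution is not reproduced in this text, so the sketch below uses one commonly taught normal-approximation formula for comparing two proportions with equal group sizes, together with a commonly used continuity correction. Whether these are exactly the expressions on the original slides is an assumption, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def n_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size (before rounding) for a two-sided test of p1 vs p2."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2                          # pooled proportion under H0
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return numerator / (p1 - p2) ** 2

def with_continuity_correction(n, p1, p2):
    """Inflate n slightly so the normal curve better matches the binomial."""
    return n / 4 * (1 + math.sqrt(1 + 4 / (n * abs(p1 - p2)))) ** 2

# Calf mortality example: 0.25 versus 0.40, alpha = 0.01, power = 0.95
n_raw = n_two_proportions(0.25, 0.40, alpha=0.01, power=0.95)
n_equal = math.ceil(n_raw)
n_corrected = math.ceil(with_continuity_correction(n_raw, 0.25, 0.40))
print(n_equal, n_corrected)
```

Under these assumptions the example needs roughly 344 calves per group, or roughly 358 per group once the continuity correction is applied.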


Unequal Sample size for the Difference of proportions:


Continuity correction


Example
The case-fatality rate among cancer patients undergoing standard therapy is 0.90, and it is 0.70 for cancer patients receiving a new treatment. Find the required sample size to test the hypothesis that the case-fatality rate differs between the groups at the stated level of significance, alpha = 0.05, and desired power of the test, 0.90. Assume that the multiplicative factor (the ratio between the two group sizes) is 2. Solution: From the given information, the required sample size can be computed as:

Compute the sample size using continuity correction
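The solution is not reproduced in this text. The sketch below assumes a commonly taught normal-approximation formula for unequal group sizes, in which the second group is r times as large as the first, and a matching continuity correction. Treating the "multiplicative factor" as this ratio r, and the choice of which group is the larger one, are assumptions; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def m_unequal(p1, p2, r, alpha=0.05, power=0.80):
    """Size m of group 1 (before rounding); group 2 then has r * m subjects."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + r * p2) / (r + 1)                # pooled proportion, weighted by size
    numerator = (z_a * math.sqrt((r + 1) * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(r * p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return numerator / (r * (p1 - p2) ** 2)

def corrected(m, p1, p2, r):
    """Continuity-corrected size of group 1."""
    return m / 4 * (1 + math.sqrt(1 + 2 * (r + 1) / (m * r * abs(p1 - p2)))) ** 2

# Case-fatality example: 0.90 versus 0.70, alpha = 0.05, power = 0.90, r = 2
m_raw = m_unequal(0.90, 0.70, r=2, alpha=0.05, power=0.90)
m_plain = math.ceil(m_raw)
m_corr = math.ceil(corrected(m_raw, 0.90, 0.70, r=2))
print(m_plain, 2 * m_plain)    # group sizes without the correction
print(m_corr, 2 * m_corr)      # group sizes with the correction
```

With r = 1 both functions reduce to the equal-size case, so the same code covers both designs.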



Design effects
The design effect is the loss of effectiveness that results from using cluster sampling instead of simple random sampling. It is basically the ratio of the actual variance under the sampling method actually used to the variance computed under the assumption of simple random sampling. A working definition is the factor by which the sample variance under the sampling plan exceeds that of a simple random sample of the same size; in other words, how much worse your sample is than a simple random sample.

Formula
- Two correlations are involved (within and between clusters).
- Homogeneity within a cluster is measured by the intra-class correlation.
- Intra-class correlation is the degree to which persons or households (hh) in the same cluster share the same characteristics, compared with another selected at random from the whole population.
- Hence deff is affected by cluster size and intra-class correlation:

$$ \text{deff} = 1 + (m - 1)\rho $$

  where $m$ = cluster size and $\rho$ = intra-class correlation.
- Rule of thumb: try for a deff of 2 or less.
- Sample size (clustered) = Sample size (unclustered) × deff.

Example
The cluster size used gave an ICC of approximately 0.015. Using this ICC as an approximation, and with a chosen cluster size of 80, this gives a design effect of 2.11. Hence the sample size will be given by: Sample size (clustered) = Sample size (unclustered) × 2.11.
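A short sketch of the calculation, assuming the standard expression deff = 1 + (m - 1)ρ, where m is the cluster size and ρ the intra-class correlation. The function names and the unclustered sample size of 382 are hypothetical. Under this expression an ICC of 0.015 with a cluster size of 80 gives about 2.19, whereas the quoted 2.11 corresponds to a cluster size of 75; which figure the example intended is not stated.

```python
import math

def design_effect(cluster_size, icc):
    """Design effect under the standard formula deff = 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc

def clustered_sample_size(n_unclustered, deff):
    """Inflate a simple-random-sample size by the design effect, rounding up."""
    return math.ceil(n_unclustered * deff)

print(round(design_effect(80, 0.015), 3))   # cluster size 80
print(round(design_effect(75, 0.015), 3))   # cluster size 75
print(clustered_sample_size(382, 2.11))     # hypothetical unclustered n of 382
```

Smaller clusters or a lower intra-class correlation bring the design effect closer to 1, which is how the rule of thumb of a deff of 2 or less is met in practice.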

Sample size using statistical software


As an alternative, we can use the EPI INFO statistical software to calculate the sample size required for the study. Let us assume that the target population has size N = 100,000. The proportion for the variable of interest is not known (no previous study has been done), so we use 50 percent as the estimate of the prevalence. The steps to follow to get the required sample size using EPI INFO are given below:

Steps to compute sample size


First make sure that EPI INFO is installed. If your computer has the software, go to the Start menu and open it. This is the window that you get when you open the software:


Start page

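As a cross-check on the EPI INFO scenario above (N = 100,000 and an assumed prevalence of 50%), the calculation can be approximated by hand with the single-proportion formula followed by a finite population correction. The 5% margin of error and 95% confidence level are assumptions, since the text does not state them, and whether EPI INFO uses exactly this expression is also an assumption. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_finite_population(N, p, d, confidence=0.95, deff=1.0):
    """Single-proportion sample size with a finite population correction."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z ** 2 * p * (1 - p) / d ** 2      # size for an effectively infinite population
    n = n0 / (1 + (n0 - 1) / N)             # finite population correction
    return math.ceil(deff * n)              # optional design effect, then round up

print(n_finite_population(N=100_000, p=0.5, d=0.05))
```

For a population this large the correction changes the answer only slightly; it matters much more when N is a few thousand or less.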

1. The main objective of sampling is to:
   a. get a representative sample
   b. compute summary measures from the sample
   c. draw conclusions about the population
   d. get information about the sample we selected
2. The population to which we have access for selecting the sample is known as the:
   a. target population
   b. theoretical population
   c. study population
   d. study subject
3. The error that arises due to bad luck is known as non-sampling error.
   a. True
   b. False
4. If the cases are too rare, which sampling technique is advisable?
   a. systematic
   b. quota
   c. snowball
   d. cluster


5. Assume that the population on which we want to conduct research is patients who are following ART in Gondar, and that these patients have similar characteristics with respect to the study variable. In order to select a sample of patients from all 5000 patients on follow-up, which sampling technique is more appropriate?
   a. simple random
   b. systematic
   c. stratified
   d. cluster
