
Chapter II STEPS OF SCIENTIFIC RESEARCH

Population and Sampling



DEFINITION OF TERMS

Population: Collection of units sharing a common characteristic. It might be:
* Finite: all units can be counted; e.g. students in a school
* Infinite: counting all units is not feasible; e.g. the RBCs of an individual

Sample: A subset of a population obtained to investigate properties of the parent population.

Target population: Population to which the results of the study will be generalized.


Sampling population: Population from which the sample was taken. Ideally, the sampling and target populations should be the same.

Sampling unit: Population unit used for sampling. In a population of individuals, the unit is not always a person; e.g.
* In a study of patients' satisfaction, a random sample of two days of the week was taken, and all the patients attending the hospital during these two days were interviewed. The sampling unit here is not the individual patient but the day of the week.


* In a study of the prevalence of smoking among school children in the country, a sample of five provinces was selected. From each province three cities were selected. From each city four schools were selected. From each school, three classes were selected. All the students of the selected classes were interviewed. The sampling units here are, in turn, the province, the city, the school and the class, not the individual student.


Analytic unit: This is the population unit used in data analysis. In the above examples, the analysis of patients' satisfaction and that of the prevalence of smoking pertain to the individual; the analytic unit is the person. Thus, regardless of the sampling unit, the unit of analysis relates more closely to the study questions and objectives.

Sampling frame: Listing of all the units that compose the sampling population.

Sampling:

Sampling is the process of selecting a number of units from a defined study population. In some quite uncommon situations, the study population is very small, and all its units can be studied. In this case there is no need for sampling, although inferential statistics then do not apply, since they are based on sampling. More often the population is infinite or too large to be included in the study in its entirety. Hence the importance of sampling.

Apart from saving time, money and effort, sampling can increase the accuracy of the collected data, since the resources are concentrated on a smaller number of units. More important is the possibility of inference or generalization from sample to population, which rests on the randomness of selection of the sample. Sampling involves the following steps:
1. Identification of study population
2. Determination of sampling population
3. Definition of sampling unit
4. Choice of sampling method
5. Estimation of sample size

1. Identification of study population. As defined above, the study or target population is the one to which the results of the study will be generalized. It is crucial that the investigator defines the target population clearly, since it is the most important determinant of the sampling population. On the one hand, if the subject of the study could be affected by local socio-demographic characteristics, as in prevalence studies, the target population is limited to the locality of the study.

If, on the other hand, the research question pertains to biological processes that are minimally affected by local or socio-demographic characteristics, generalization can be extended to a wider target population, e.g. clinical trials of drug therapy.

Examples:
A. In a study of the prevalence of hypertension in city "X", the prevalence rate was found to be 30% among males above 25 years of age. What is the target or study population?

This prevalence rate of 30% is only generalizable to "males above 25 years of age residing in city X". It does not apply to other gender or age categories, or to any other locality. Thus the study population is well defined by age, gender and residence.

B. In a study of therapy of hypertension, drug "A" was found to be more efficacious in the treatment of mild hypertension. What is the target population?

If the study is sound, it can be assumed that its results are applicable to any patient with mild hypertension, regardless of age, gender, race, etc. Hence, the target population is defined only by the level of hypertension.

2. Determination of sampling population. The sampling population is the one from which the sample is drawn. The definition of the sampling population by the investigator is governed by two factors:
* Feasibility: this leads the investigator to select a reachable sampling population, for convenience; e.g. hospital-based studies of prevalence, where the sampling population (patients attending the hospital) is quite different from the target population (the community). In this case, generalization from the sample to the target population is not justified, and the external validity of the study is jeopardized.
* External validity: the ability to generalize from the study results to the target population. This leads the investigator to identify the proper sampling population that allows him to generalize his results. In this case, the sampling and the target populations are identical, and the external validity of the study is high.

3. Definition of sampling unit. The sampling unit might be different from the analytic unit, as outlined above, but more often they are the same. The sampling unit is defined by setting:

Inclusion criteria: these specify the characteristics that make a unit eligible for inclusion in the study sample. These characteristics must be clear and well defined. Working definitions might be used, e.g. hypertension is BP > 140/90 mm Hg. An example of inclusion criteria in a study of oral contraceptives (OC) and deep venous thrombosis (DVT) is as follows:
* Married
* Female
* Using OC for > 2 yrs

Exclusion criteria: these are criteria that disqualify otherwise eligible units. They are not the opposite of the inclusion criteria; they pertain to conditions or factors that might affect the subject of the study. Thus, in the above example, the exclusion criteria are NOT:
* Unmarried
* Male
* Not using OC for > 2 yrs

They could rather be:
* Bed-ridden
* History of major surgery

since these two conditions are risk factors for DVT, and hence could affect the studied relationship between OC and DVT.

4. Choice of sampling method. There are two main types of sampling: non-probability and probability sampling. Non-probability sampling is not recommended in medical research, and hence will be discussed only briefly. The main emphasis will be on probability sampling.

4.1 Non-probability sampling. In this type of sampling, there is no known probability of selection for each unit. Thus, generalization from study results is not possible, since representativeness of the sample cannot be assumed. There are two methods of non-probability sampling:

4.1.1 Convenience sampling: The investigator selects a convenient sample; e.g. to assess the opinion of patients about service, the investigator decides to interview all the patients coming to his office today. The assumption that these patients represent all the patients attending the hospital cannot hold. The investigator cannot generalize the results.

4.1.2 Quota sampling: In the above study, the investigator wants to ensure that all types of patients are represented in his sample. He decides to interview 60 males and 60 females. Within each gender he will include 20 individuals <20 yrs of age, 20 >60 yrs, and 20 in between. Thus, he has six categories of age and gender, with a quota of 20 individuals in each. Although this sample might be better than the previous one, there is still no known probability of selection, and generalization remains risky.

4.2 Probability sampling: This is the more important sampling type. Here there is a known probability of selection for each sampling unit. However, this necessitates the presence of a sampling frame. There are various methods of probability sampling.

4.2.1 Simple random sampling.
- Most basic sampling method
- Requirements: complete sampling frame, numbering of sampling units, and sample size
- Methods:
  * lottery method (drawing numbers from a box)
  * table of random numbers
  * computer-generated random numbers
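The computer-generated variant of simple random sampling can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function name `simple_random_sample`, the frame of 1000 numbered units and the seed are all assumptions made for the example.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units from a complete sampling frame, without replacement.

    Every unit in the frame has the same probability of selection.
    """
    rng = random.Random(seed)      # seeded only so the draw can be repeated
    return rng.sample(frame, n)

# Hypothetical sampling frame: units numbered 1..1000.
frame = list(range(1, 1001))
sample = simple_random_sample(frame, n=100, seed=42)
print(len(sample), "units selected, all distinct:", len(set(sample)) == len(sample))
```

The three requirements listed above map directly onto the arguments: the complete, numbered frame is `frame`, and the sample size is `n`.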
4.2.2 Systematic random sampling.
- Also a basic sampling method
- Requirements:
  * complete sampling frame in the form of a list
  * numbering of sampling units
  * sample size
- Methods:
  * determination of the periodicity of sampling as follows: period = population size (N) / sample size (n); e.g. if N = 1000 and n = 100, then every 1000/100 = 10th sampling unit will be selected
  * determination of the starting point (in this example, from 1 to 10) by simple random sampling
  * compilation of the required sample according to the starting point, the periodicity and the sample size
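The three steps of systematic random sampling (period, random start, compilation) can be sketched as below. The function name `systematic_sample` and the numbered frame are hypothetical, and the sketch assumes that N is an exact multiple of n, as in the example with N = 1000 and n = 100.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every k-th unit of a listed frame after a random start."""
    N = len(frame)
    k = N // n                     # sampling period, e.g. 1000 // 100 = 10
    rng = random.Random(seed)
    start = rng.randint(1, k)      # random starting point between 1 and k
    # Units are numbered from 1, so unit number i sits at list index i - 1.
    return [frame[start - 1 + j * k] for j in range(n)]

# Hypothetical frame: units numbered 1..1000, sample size 100.
frame = list(range(1, 1001))
sample = systematic_sample(frame, n=100, seed=1)
print("first units:", sample[:5])
```

Only the starting point is random; once it is drawn, the rest of the sample is fully determined by the period.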

4.2.3 Stratified random sampling.
- Most suitable to ensure representation of certain subcategories of the population in the sample. These subcategories gain importance whenever they are suspected to affect the research question, or the relationship under study.

Example: In the study of the relationship between hypercholesterolemia and coronary artery disease (CAD), gender and smoking are important factors that might affect this association. We might need to study it in various subcategories or strata of gender and smoking:
* male smokers
* female smokers
* male non-smokers
* female non-smokers

Stratified random sampling is thus a process of dividing the population into mutually exclusive strata, and sampling from each of these strata.
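The division into mutually exclusive strata followed by sampling within each stratum can be sketched as follows. The population records, the function names and the choice of an equal number of units per stratum are assumptions made for illustration; the text does not specify how the sample is allocated across strata.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Split units into mutually exclusive strata, then draw a simple
    random sample of (at most) n_per_stratum units from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {
        name: rng.sample(members, min(n_per_stratum, len(members)))
        for name, members in strata.items()
    }

def stratum_of(person):
    """Label a (id, gender, smoker) record with its gender/smoking stratum."""
    _, gender, smoker = person
    return f"{gender} {'smoker' if smoker else 'non-smoker'}"

# Hypothetical population of 2000 records: (id, gender, smoker).
population = [
    (i, "male" if i % 2 == 0 else "female", (i // 2) % 2 == 0)
    for i in range(2000)
]
sample = stratified_sample(population, stratum_of, n_per_stratum=25, seed=0)
for name, members in sorted(sample.items()):
    print(name, len(members))
```

Each of the four strata is guaranteed to appear in the sample, which simple random sampling alone would not ensure for a rare subcategory.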

4.2.4 Multi-stage random sampling.
- Mostly used in surveys
- Requirements:
  * sampling frame of the first population and of subsequently selected sampling units
  * determination of the different stages of sampling
  * determination of the required sample size in each stage
  * compilation of the required sample by simple or systematic random methods

Example: determination of the prevalence of DM in the country
* sampling frames are prepared for the selected units only. In this example, the needed sampling frames are for:
  + provinces of the country
  + cities of the selected provinces
  + districts of the selected cities
  + households of the selected districts
  + individuals of the selected households

Compare these frames to the frame required for simple random sampling: enumeration of all the individual persons in the country.
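The stage-by-stage selection can be sketched with a small recursive function. Everything here is hypothetical: the function name `multistage_sample`, the three-stage frame (provinces, cities, households) and the sample sizes per stage. For brevity the whole nested frame is held in memory, whereas in a real survey the lower-level frames would be prepared only for the units selected at the stage above.

```python
import random

def multistage_sample(frame, sizes, rng):
    """Draw sizes[0] units at this stage, then recurse into each selected unit.

    frame: nested dicts for the upper stages, ending in a list of final units.
    sizes: number of units to draw at each stage, from top to bottom.
    """
    k = sizes[0]
    if isinstance(frame, dict):
        chosen = rng.sample(sorted(frame), k)   # frame of this stage only
        result = []
        for name in chosen:
            result.extend(multistage_sample(frame[name], sizes[1:], rng))
        return result
    return rng.sample(frame, k)                 # final stage: ultimate units

# Hypothetical country: 6 provinces x 5 cities x 20 households.
country = {
    f"province_{p}": {
        f"city_{p}_{c}": [f"household_{p}_{c}_{h}" for h in range(20)]
        for c in range(5)
    }
    for p in range(6)
}
rng = random.Random(3)
sample = multistage_sample(country, sizes=[3, 2, 5], rng=rng)
print(len(sample), "households from 3 provinces and 2 cities per province")
```

Only the selected provinces and cities are ever opened, which mirrors the point that frames are needed for the selected units only.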

4.2.5 Cluster sampling. In a study of the prevalence of schistosomiasis in a village, the sampling units were households. It was decided to select households by simple random sampling. The village had 10,000 households. The required sample of 300 households was found to be scattered over an area of 600 km². The time and resources would not allow the investigator to undertake this tedious field work. What would he do?

The investigator noticed that the village consisted of 100 hamlets of 80 to 120 households each. All the hamlets were quite similar regarding socio-demographic characteristics and factors related to the disease. Each hamlet contained various categories of age, gender, occupation, social class, education, etc. Each hamlet was considered as a cluster of households. A sampling frame of the 100 hamlets was prepared.

A simple random sample of "n" clusters was selected. All the households of the selected clusters were included in the sample. The assumption of representativeness of the sample is based on:
* the similarity of, or homogeneity among, all the clusters
* the heterogeneity within each cluster
[note the difference from stratified random sampling]
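The hamlet example can be sketched as follows. The hamlet and household labels are invented, and the choice of n = 3 clusters (roughly 300 households at 80 to 120 households per hamlet) is an assumption, since the text leaves "n" unspecified.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Select whole clusters at random and keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # frame lists clusters only
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical village: 100 hamlets of 80 to 120 households each.
size_rng = random.Random(0)
hamlets = {
    f"hamlet_{i:03d}": [
        f"hamlet_{i:03d}/house_{j}" for j in range(size_rng.randint(80, 120))
    ]
    for i in range(100)
}
chosen, households = cluster_sample(hamlets, n_clusters=3, seed=7)
print(chosen, len(households))
```

The sampling frame here is the list of 100 hamlets, not the list of 10,000 households, and field work is confined to the selected hamlets.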

4.2.6 Area sampling. Area sampling is very similar to cluster sampling. It is used whenever natural clusters are not present, and the investigator creates the required clusters. If the previous study had been done in a city, with no hamlets for clustering, a map of the city could be used to divide it into sections or clusters for sampling. The only condition to fulfill is similarity among the various sections. This kind of sampling is also used in counting the hairs of the scalp.

4.2.7 Multi-phase sampling. In certain studies, the outcome might be assessed by two or more diagnostic tools. One of these tools might be inexpensive, rapid, harmless and acceptable, while the other could be costly and have potential side effects. If, for example, the investigator wants to determine the prevalence of angina in the population of males above 30 yrs in a certain locality, his tools are the Rose questionnaire for angina and the exercise test. He might select a large random sample of the population for interviewing.

[Figure: MULTI-PHASE SAMPLING. POPULATION -> SAMPLE (Test 1) -> SUBSAMPLE (Test 2)]

5. Estimation of sample size. How many subjects (sampling units) should be studied? The answer to this question is often an empirical choice of a number. Such a number erroneously depends only on feasibility, i.e. the time allowed for the study, the available resources, the frequency of cases, etc.

There is also a common belief that the larger the sample size, the better the study. This is a misconception since, like a small sample size, a large sample size can lead to methodologic and statistical problems. The major problem with a small sample size is its inability to show a significant difference when one is actually present.

The size of the sample will depend on the following factors:

1. Magnitude of the difference to be detected (effect size): A large sample size is needed for detection of a minute difference. For example, an investigator is comparing the effect of two antihypertensive drugs. If the expected difference is in the order of 1-2 mm Hg, he will need a much larger sample size than that required to detect a difference of 5-10 mm Hg. Here, the difference to be detected should be governed by its clinical significance.

The same applies for the magnitude of risk in etiologic studies. Thus, the sample size is inversely related to the effect size.

2. Variability of the measurement: Variability of the measurement can be likened to background noise. The higher this noise is, the more difficult is the detection of the signal, the more effort is required, and the more subjects need to be studied. In the previous example, if the drugs are tested on homogeneous groups with low variability of blood pressure measurements, detection of the difference will be easier and will need a smaller number of subjects. The variability of measurements is reflected by the standard deviation or the variance. The higher the standard deviation, the larger is the required sample size. Thus, sample size is directly related to the standard deviation.

3. Level of significance: The significance level "α" pertains to the maximum risk or probability of rejecting a true null hypothesis. It is also known as α-error or type I error. Since it is an error, the investigator tends to keep it at a minimum. The maximum level of α is conventionally set at 5% or 0.05. To be more confident in his results, the investigator might want to reduce his α-error to 0.01 or 0.001. However, this is not without cost. The price is an increase in sample size. Thus, sample size is inversely related to the level of α-error.

4. Power of the study: The power of a study is the probability that it will yield a statistically significant result when a true difference exists. It is related to another type of error, type II or β-error. This error pertains to the risk of accepting the null hypothesis although it is false. The power is equal to (1 - β). The investigator tends to increase the power of his study by minimizing the level of β-error. There is no pre-set level of β-error as is the case for α-error. However, studies with a β-error of up to 0.2 (power of 0.8 or 80%) are acceptable. Thus, sample size is inversely related to β-error, or directly related to the desired power.

To summarize, sample size (n) is directly related to the standard deviation (s), and inversely related to the effect size (ES), α-error, and β-error.
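These four relationships can be illustrated with one common textbook approximation for comparing the means of two equal-sized groups: n per group = 2 (z(1-α/2) + z(1-β))² s² / ES². This formula is not given in the slides and is brought in here only as an assumption for illustration; the function name `n_per_group` and the blood pressure figures are likewise hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, sd, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to compare two means.

    Normal-approximation formula (an assumption, not taken from the slides):
        n = 2 * (z_{1-alpha/2} + z_{1-beta})^2 * sd^2 / effect_size^2
    where power = 1 - beta.
    """
    z = NormalDist()                       # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)     # two-sided significance level
    z_beta = z.inv_cdf(power)              # power = 1 - beta
    return math.ceil(2 * ((z_alpha + z_beta) * sd / effect_size) ** 2)

# Hypothetical antihypertensive trial, blood pressure SD of 10 mm Hg.
print("difference of 5 mm Hg:", n_per_group(effect_size=5, sd=10))
print("difference of 2 mm Hg:", n_per_group(effect_size=2, sd=10))
print("alpha lowered to 0.01:", n_per_group(effect_size=5, sd=10, alpha=0.01))
print("power raised to 0.90: ", n_per_group(effect_size=5, sd=10, power=0.90))
print("SD doubled to 20:     ", n_per_group(effect_size=5, sd=20))
```

Running the sketch shows the required number rising as the difference to be detected shrinks, as α or β is lowered, and as the standard deviation grows, in line with the summary above.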
