Sampling
A sample is a subset of a larger population of objects: individuals, households, businesses, organizations and so forth. Sampling enables researchers to estimate unknown characteristics of the population in question. A finite group is called a population, whereas a non-finite (infinite) group is called a universe. A census is an investigation of all the individual elements of a population.
[Diagram: Population, Sample, Conduct fieldwork]
Sampling Units
The sampling unit is a single element or group of elements subject to selection in a sample. Examples:
* Every student at the university whose first name begins with the letter F
* All child passengers under 18 years of age who are traveling on a train from destination X to destination Y
* All jewelry shops in Kapalıçarşı in Istanbul
Sample Size
How you sample is as important as how many you sample.
How: probability samples, nonprobability samples
How many: statistical precision, industry standards
Nonprobability sampling
Probability sampling
Snowball sampling relies on respondents referring others with like characteristics.
Convenience sampling
Elements are selected for convenience because they're available or easy to find. Selection is based on one's convenience, by accident, or in a haphazard way. Often, respondents are selected because they happen to be in the right place at the right time. This sampling method is therefore also known as a haphazard, accidental, or availability sample. Examples:
* Interviewing people on a street corner or at the mall
* Surveying students in a classroom
* Magazine surveys
* Observing conversations in an on-line chat room
Judgement Sampling
This is a sampling technique in which the business researcher selects the sample based on judgment about some appropriate characteristic of the sample members.
Example 1: The Consumer Price Index (CPI) is based on a judgment sample of market-based items, housing costs, and other selected goods and services which are representative of the consumption of most of the overall population.
Example 2: Selection of certain voting districts which serve as indicators of the national voting trend.
Quota Sampling
This is a sampling technique in which the business researcher ensures that certain characteristics of a population are represented in the sample to the extent that he or she desires.
Example: A business researcher wants to determine, through interviews, how the demand for Product X in a district depends on gender. If the sample is to consist of 100 units, the number of individuals interviewed from each gender should correspond to that group's percentage of the total population of the district.
Religion         Share     Quota (sample of 3000)
Christianity     76.4%     3000 x 76.4% = 2292
Islam            14.8%     3000 x 14.8% = 444
Hinduism          6.6%     3000 x 6.6% = 198
Others            2.2%     3000 x 2.2% = 66
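The quota arithmetic for the religion example (a sample of 3000 split by population share) can be reproduced with a few lines of Python. This is a minimal sketch: the function name `quota_allocation` is an illustrative assumption, and rounding each quota independently is a simplification that will not always sum exactly to the sample size for other inputs.

```python
def quota_allocation(sample_size, shares_percent):
    """Return the number of respondents to recruit in each group.

    shares_percent maps a group name to its population share in percent.
    Each quota is rounded to the nearest whole respondent.
    """
    return {
        group: round(sample_size * pct / 100)
        for group, pct in shares_percent.items()
    }


# Shares taken from the religion example in the slides.
shares = {"Christianity": 76.4, "Islam": 14.8, "Hinduism": 6.6, "Others": 2.2}
quotas = quota_allocation(3000, shares)
print(quotas)
```

Interviewers then fill each quota by whatever means is practical, which is why quota sampling remains a nonprobability method.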
Advantages include the speed of data collection, less cost, the element of convenience, and representativeness (if the subgroups in the sample are selected properly)
Disadvantages include the element of subjectivity: within each quota, units are picked by convenience rather than by a probability-based method, which can lead to improper selection of sampling units.
Snowball sampling
In snowball sampling, an initial group of respondents is selected, usually at random. After being interviewed, these respondents are asked to identify others who belong to the target population of interest. Subsequent respondents are selected based on the referrals. This rarely leads to a representative sample, but it is useful when the population is inaccessible or hard to find, e.g.
* the homeless
* forced sales properties
* wound-up companies
Snowball sampling
* Involves building a sample through referrals.
* Once an initial respondent is identified, you ask them to identify others who meet the study criteria. Each of those individuals is then asked for further recommendations.
* Often used when working with populations that are not easily identified or accessed, e.g. a population of homeless persons can be hard to identify, but by using referrals a sample can build quite quickly.
* Snowballing does not guarantee representativeness. An option here is to develop a population profile from the literature, and assess representativeness by comparing your sample to your profile.
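The referral process can be sketched as a breadth-first walk over a referral network. This is an illustration only: the `referrals` dictionary and the respondent labels are hypothetical, and real fieldwork would collect referrals interview by interview rather than from a known mapping.

```python
from collections import deque


def snowball_sample(seeds, referrals, target_size):
    """Build a sample by following referrals, wave by wave.

    seeds: initial respondents.
    referrals: hypothetical mapping respondent -> people they refer.
    target_size: stop once this many respondents have been interviewed.
    """
    sample = []
    queue = deque(seeds)
    seen = set(seeds)  # nobody is interviewed twice
    while queue and len(sample) < target_size:
        person = queue.popleft()
        sample.append(person)
        for referred in referrals.get(person, []):
            if referred not in seen:
                seen.add(referred)
                queue.append(referred)
    return sample


# Hypothetical referral network for illustration.
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": ["A"]}
sample = snowball_sample(["A"], referrals, target_size=4)
print(sample)
```

Because every respondent after the seeds is reached through someone else's network, the resulting sample reflects who knows whom, which is the reason representativeness is not guaranteed.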
Simple random sampling
Used when the population is rather uniform (e.g. school/college students, low-cost houses). Simplest, fastest, cheapest.
1. Select a suitable sampling frame
2. Assign each element a number from 1 to N (population size)
3. Generate n (sample size) different random numbers between 1 and N
4. The numbers generated denote the elements that should be included in the sample
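The numbered steps above can be sketched in Python, with a pseudo-random generator standing in for a printed random number table. The function name and the `seed` parameter are illustrative assumptions; `random.Random.sample` draws without replacement, so the n numbers are all different.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct element numbers between 1 and population_size."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    numbers = rng.sample(range(1, population_size + 1), sample_size)
    return sorted(numbers)


# N = 30 elements in the sampling frame, n = 6 to be selected.
selected = simple_random_sample(30, 6, seed=1)
print(selected)
```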
Example: N = 30, n = 6
Random number table:
9588658 9006029 4684051 3582985 7600775 8064879 7030610 9118495
6275365 1713653 4645089 5823150 7387846 3679658 7778939 3668444
7669768 5884786 5835533 2254847 9066800 7808907 9151599 6513395
9650515 3879994 9001997 0022470 9195026 4663092 3758477 4808861
4201291 7220648 5464882 3547316 1854054 6353694 1281049 8679613
Disadvantages
* Need a complete list of units
* Does not always achieve the best representativeness
* Units may be scattered and poorly accessible
* Heterogeneous population (minorities may be missed)
Example
In a face-to-face consumer survey, a sample of 500 shoppers is planned for a 7-day (Mon.-Sun.) period at a shopping complex. The sampling is planned for 3 time blocks: 12-3 p.m., 3-6 p.m. and 6-9 p.m. Respondents are sub-divided into 4 ethnic groups: Malays (30%), Chinese (30%), Indian (30%) and Others (10%). Finally, they are categorized into Family and Single. Repeat persons are not allowed in the sampling. Determine your sampling plan and the respondent pick-up interval.
500/7 = 71.4, rounded up to 72 shoppers per day
72/3 = 24 shoppers per time block
24/3 = 8 shoppers per hour
8/4 = 2 shoppers per ethnic group per hour (an equal split; weighting by the 30/30/30/10 shares would give about 2.4, 2.4, 2.4 and 0.8)
60/8 = 7.5 minutes pick-up interval
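The arithmetic of the shopper survey plan can be checked with a short script. The variable names are illustrative, and the equal split across the four ethnic groups follows the slide's own simplification.

```python
import math

total_sample = 500     # shoppers to interview over the whole survey
days = 7               # Monday to Sunday
blocks_per_day = 3     # 12-3 p.m., 3-6 p.m., 6-9 p.m.
hours_per_block = 3
ethnic_groups = 4

per_day = math.ceil(total_sample / days)       # 500/7 = 71.4, rounded up
per_block = per_day // blocks_per_day          # shoppers per time block
per_hour = per_block // hours_per_block        # shoppers per hour
per_group_per_hour = per_hour / ethnic_groups  # equal split across groups
interval_minutes = 60 / per_hour               # minutes between pick-ups

print(per_day, per_block, per_hour, per_group_per_hour, interval_minutes)
```

Rounding up to 72 per day yields 504 interviews over the week, slightly more than the 500 planned.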
Disadvantages
* Need a complete list of units
* If the ordering of the elements produces a cyclical pattern, systematic sampling may decrease the representativeness of the sample
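Systematic sampling itself, picking a random start among the first k cases and then every kth case, can be sketched as follows. The function name and the choice of k as the integer part of N/n are assumptions made for illustration.

```python
import random


def systematic_sample(frame, sample_size, seed=None):
    """Select every kth element of frame after a random start."""
    if not 0 < sample_size <= len(frame):
        raise ValueError("sample_size must be between 1 and len(frame)")
    rng = random.Random(seed)
    k = len(frame) // sample_size  # sampling interval
    start = rng.randrange(k)       # random position among the first k cases
    return [frame[start + i * k] for i in range(sample_size)]


# A frame of 100 numbered units and a sample of 10 gives an interval of k = 10.
frame = list(range(1, 101))
picked = systematic_sample(frame, 10, seed=3)
print(picked)
```

If the frame repeats with a period equal to k (for example, every 10th unit is a corner shop), every selected unit falls at the same point of the cycle, which is the representativeness risk listed above.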
Procedure
* Divide (stratify) the sampling frame into homogeneous subgroups (strata), e.g. age groups, urban/rural areas, regions, occupations
* Draw a random sample in each stratum
* If stratum population sizes are unequal: sample the same proportion of subjects from each stratum (the same sampling fraction is used, so each stratum's sample is proportional to its size)
Break the population into meaningful strata and take a random sample from each stratum. Sampling can be proportionate or disproportionate within strata.
When:
* population is not very uniform (e.g. shoppers, houses)
* key sub-groups need to be represented with more precision
* variability within groups affects research results
* sub-group inferences are needed
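A proportionate stratified sample, where the same sampling fraction is applied in every stratum, might look like the sketch below. The strata names, their sizes and the 10% fraction are hypothetical values chosen for illustration.

```python
import random


def proportionate_stratified_sample(strata, sampling_fraction, seed=None):
    """Draw a simple random sample of the same fraction from each stratum.

    strata maps a stratum name to the list of its units.
    """
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        n_stratum = round(len(units) * sampling_fraction)
        sample[name] = rng.sample(units, n_stratum)  # without replacement
    return sample


# Hypothetical frame: 200 urban and 100 rural households.
strata = {
    "urban": [f"U{i}" for i in range(200)],
    "rural": [f"R{i}" for i in range(100)],
}
stratified = proportionate_stratified_sample(strata, 0.10, seed=7)
print({name: len(units) for name, units in stratified.items()})
```

A disproportionate design would instead pass a separate fraction per stratum, for instance to over-sample a small but important sub-group.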
[Figure: stratified sampling example with strata Sole Proprietor and Partnership; values shown: 300 and 120]
Disadvantages
* Sampling error is difficult to measure
* Different strata can be difficult to identify
* Loss of precision if there are small numbers in individual strata (resolved by sampling proportional to stratum population)
Cluster sampling
Principle
* The whole population is divided into groups, e.g. neighbourhoods
* A random sample is taken of these groups (clusters)
* Within selected clusters: all units (e.g. households) are included, or a random sample of these units is taken
* Provides a logistical advantage
One-Stage Sampling
Two-Stage Sampling
Multistage Sampling
- Weighting of clusters: probability proportionate to size (PPS) sampling of clusters, followed by random sampling within the clusters
[Figure: area divided into sections (Section 3, Section 4, Section 5) containing villages numbered 1, 2, 3, 4, 5, 6, 7, etc.]
Multistage - An Example
The president of Supermarkets, Inc. decided to sample purchases at 150 stores in the US.
First stage: select 15 of the 150 stores on the basis of clustering (to save travel time).
Second stage: the researcher recommends that cash register files be randomly selected at each of the 15 stores.
Final stage: select every 20th purchase in a file, using a random start.
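The three stages of the supermarket example can be sketched as nested selections. Several things here are assumptions for illustration only: the stores are drawn at random (the example selects them by clustering to save travel time), two register files are taken per store, and the store, register and purchase identifiers are made up.

```python
import random


def multistage_sample(stores, n_stores=15, files_per_store=2, step=20, seed=None):
    """Stage 1: stores. Stage 2: register files. Stage 3: every step-th purchase."""
    rng = random.Random(seed)
    chosen = {}
    for store in rng.sample(sorted(stores), n_stores):        # stage 1
        files = stores[store]
        picked_files = rng.sample(sorted(files), files_per_store)  # stage 2
        chosen[store] = {}
        for file_name in picked_files:
            start = rng.randrange(step)                       # random start
            chosen[store][file_name] = files[file_name][start::step]  # stage 3
    return chosen


# Hypothetical data: 150 stores, 4 register files each, 200 purchases per file.
stores = {
    f"store_{s:03d}": {
        f"register_{r}": [f"s{s}-r{r}-p{p}" for p in range(200)]
        for r in range(4)
    }
    for s in range(150)
}
result = multistage_sample(stores, seed=11)
print(len(result), "stores sampled")
```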
Cluster sampling
Advantages
* Simple, as a complete list of sampling units within the population is not required
* Much more efficient; less costly
* Less travel/resources required
Disadvantages
* Members of a cluster may be more alike than members of different clusters (homogeneous), and this dependence needs to be taken into account in the sample size and in the analysis (design effect)
Population of C clusters
Summary
* Simple random sample: select each unit at random, e.g. by using a random number table.
* Systematic random sample: pick a random case from the first k cases of the list; select every kth case after that one.
* Stratified random sample: divide the population into groups (strata), then select a simple random sample from each stratum.
* Cluster sampling: divide the population into groups called clusters or primary sampling units (PSUs); take a random sample of the clusters.
* Multistage sampling: several levels of nested clusters, often including both stratified and cluster sampling techniques.
Sampling Frame
[Diagram: Total Population, Sampling Frame Error, Random Sampling Error, Non-Response Error]
Systematic error (or bias)
* Representativeness (validity)
* Information bias
Random error (sampling error)
* Precision
Representativeness (validity)
The sample should accurately reflect the distribution of the relevant variables in the population:
* Person (age, sex)
* Place, e.g. urban vs. rural
* Time, e.g. seasonality
Representativeness is essential to generalise.
Information bias
Systematic problem in collecting information:
* Inaccurate measuring: scales, ultrasound, lab tests
* Badly asked questions: ambiguous, not offering the right options
Precision
No sample is an exact mirror image of the population. The random difference between a sample and the population from which it is drawn is the sampling error; the smaller it is, the higher the precision.
Sampling error depends upon:
* the size of the sample
* the distribution of the characteristic of interest in the population
The size of the error can be measured in probability samples.
[Figure: no precision (random error)]
The required sample size depends on:
* Variability of the population characteristic under investigation
* Level of confidence desired in the estimate (z)
* Degree of precision desired in estimating the population characteristic (e)
The central limit theorem states that, given a variable with mean $\mu$ and variance $\sigma^2$, the sampling distribution of the sample mean approaches a normal distribution with mean $\mu$ and variance $\sigma^2/n$ as the sample size $n$ grows. This is true even when the distribution of the variable is not normal. The normal distribution is widely used. Part of its appeal is that it is well behaved and mathematically tractable.
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
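A small simulation can illustrate the central limit theorem. The sketch below repeatedly draws samples from an exponential distribution, which is clearly not normal and has mean 1 and variance 1, and examines the resulting sample means. The choice of distribution, the sample size of 50 and the 5000 repetitions are all assumptions made for illustration.

```python
import random
import statistics


def simulate_sample_means(n, repetitions, seed=0):
    """Return the means of `repetitions` samples of size n from Exponential(1)."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(repetitions)
    ]


n = 50
means = simulate_sample_means(n, repetitions=5000)

# Theory predicts a mean near 1 and a variance near 1/n = 0.02.
print("mean of sample means:", round(statistics.fmean(means), 3))
print("variance of sample means:", round(statistics.variance(means), 4))
```

A histogram of `means` would look close to a bell curve even though the individual observations are strongly skewed.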
Somebody calculated all the integrals for the standard normal and put them in a table! So we never have to integrate! Even better, computers now do all the integration.
Area is 93.45%
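As an illustration of letting the computer do the integration, Python's standard library can evaluate the area under the standard normal curve directly. The z value of 1.51 is an assumption: it is not stated on the slide, but it is the value whose cumulative area is approximately 93.45%.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Area under the standard normal curve to the left of z = 1.51 (assumed z).
area = standard_normal.cdf(1.51)
print(f"{area:.2%}")
```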
Problem 1
A study is to be performed to determine a certain parameter in a community. From a previous study, a standard deviation of 46 was obtained. If a sampling error of up to 4 is to be accepted, how many subjects should be included in this study at the 99% level of confidence?
$$n = \frac{Z^2 \sigma^2}{e^2}$$
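Problem 1 can be worked through with the usual sample size formula for estimating a mean, n = Z² σ² / e². The sketch below takes the z value from the exact normal quantile; the function name is illustrative, and rounding up to the next whole subject is an assumed convention.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sd, error, confidence):
    """Smallest n such that z * sd / sqrt(n) does not exceed the accepted error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided z value
    return math.ceil((z * sd / error) ** 2)


# Problem 1: sd = 46, accepted sampling error = 4, 99% confidence.
n_problem_1 = sample_size_for_mean(sd=46, error=4, confidence=0.99)
print(n_problem_1)
```

With the exact quantile (z of about 2.576) this gives 878 subjects; using the rounded table value z = 2.58 gives 881.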
Problem 2
It was desired to estimate the proportion of anaemic children in a certain preparatory school. In a similar study at another school, a proportion of 30% was detected. Compute the minimal sample size required at a confidence level of 95%, accepting a difference of up to 4% from the true population proportion.
$$n = \frac{Z^2 \, p(1 - p)}{e^2}$$
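Problem 2 can be worked through with the sample size formula for a proportion, n = Z² p(1 − p) / e². As before, this is a sketch: the function name is illustrative, the z value comes from the exact normal quantile, and the result is rounded up to a whole child.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p, error, confidence):
    """Smallest n such that z * sqrt(p * (1 - p) / n) does not exceed the error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided z value
    return math.ceil(z ** 2 * p * (1 - p) / error ** 2)


# Problem 2: expected proportion 30%, accepted difference 4%, 95% confidence.
n_problem_2 = sample_size_for_proportion(p=0.30, error=0.04, confidence=0.95)
print(n_problem_2)
```

This gives 505 children; the rounded table value z = 1.96 leads to the same answer.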