Practical Sampling For Impact Evaluations: Cyrus Samii, Columbia University

Introduction
How do we construct a sample to credibly detect a
meaningful effect?
Which populations or groups are we interested in and where do
we find them?
How many people/firms/units should be interviewed/observed
from that population?
How does this affect the evaluation budget?
Warning!
Goal of presentation is not to make you a sampling expert
Goal is also not to give you a headache.
Rather an overview: How do sampling features affect what it is
possible to learn from an impact evaluation?
Outline
1. Sampling frame
What populations or groups are we interested in?
How do we find them?
2. Sample size
Why it is so important: confidence in results
Determinants of appropriate sample size
Further issues
Examples
3. Budgets
Sampling frame
Who are we interested in?
a) All SMEs?
b) All formal SMEs?
c) All formal SMEs in a particular sector?
d) All formal SMEs in a particular sector in a particular region?
Need to keep in mind external validity

Can findings from population (c) inform appropriate programs to help
informal firms in a different sector?
Can findings from population (d) inform national policy?
But should also keep in mind feasibility and what you want to
learn
Might not be possible or desirable to pilot a very broadly defined
program or policy
Sampling frame:
Finding the units we're interested in
Depends on size and type of experiment
Lottery among applicants
Example: BDS program among informal firms in a particular area
Can use treatment and comparison units from applicant pool
If not feasible (50,000 get the treatment), need to draw a sample to
measure impact
Policy change
Example: A change in business registration rules in randomly selected
districts
To measure impact on profits, cannot sample all informal businesses in
treatment and comparison districts.
Will need to draw a sample of firms within districts.
Required information before sampling
Complete listing of all units of observation available for sampling in each area or group
Tricky for units like informal firms, but there are techniques to overcome this
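Once a complete listing exists, drawing the sample of firms within each district can be sketched as below. This is a minimal illustration, not the presenter's procedure: the function name, the district and firm identifiers, and the choice of an equal number of firms per district are all assumptions.

```python
import random


def sample_within_districts(frame, n_per_district, seed=0):
    """Draw a simple random sample of firms within each district.

    frame: dict mapping district name -> list of firm IDs (the complete listing).
    n_per_district: firms to draw per district (assumed equal across districts).
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    sample = {}
    for district, firms in sorted(frame.items()):
        k = min(n_per_district, len(firms))  # cannot draw more firms than are listed
        sample[district] = sorted(rng.sample(firms, k))  # sampling without replacement
    return sample


# Hypothetical listing: 3 districts with 40 listed firms each.
frame = {f"district_{d}": [f"d{d}_firm{i}" for i in range(40)] for d in range(3)}
sample = sample_within_districts(frame, n_per_district=10)
for district, firms in sample.items():
    print(district, len(firms), "firms sampled")
```

The sample is only as good as the listing: a firm missing from the frame has no chance of being drawn.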
Outline
1. Sampling frame
What populations or groups are we interested in?
2. Sample size
Further issues
Examples
3. Budgets
Sample size and confidence
Start with a simpler question than program impact
Say we wanted to know the average annual
profits of an SME in Dakar.
Option 1: We go out and track down 5 business
owners and take the average of their responses.
Option 2: We track down 1,000 business owners and
average their responses.
Which average is likely to be closer to the true average?
Sample size and confidence:
Profits              Firms (sample of 5)   Firms (sample of 1,000)
$0 - $1,000          1                     70
$1,001 - $5,000      2                     150
$5,001 - $10,000     1                     650
$10,001 - $15,000    0                     125
$15,001 +            1                     5
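The comparison between 5 and 1,000 firms can be checked with a small simulation. The sketch below uses an invented population of profits (a skewed lognormal distribution chosen purely for illustration, not data from Dakar) and measures how far sample averages typically fall from the true average.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of 100,000 firms with skewed annual profits.
population = [rng.lognormvariate(8.5, 0.8) for _ in range(100_000)]
true_mean = statistics.fmean(population)


def typical_error(n, draws=200):
    """Average absolute gap between the mean of n sampled firms and the true mean."""
    errors = []
    for _ in range(draws):
        sample = rng.sample(population, n)  # survey n firms at random
        errors.append(abs(statistics.fmean(sample) - true_mean))
    return statistics.fmean(errors)


error_5 = typical_error(5)
error_1000 = typical_error(1000)
print(f"true mean: {true_mean:,.0f}")
print(f"typical error with 5 firms:     {error_5:,.0f}")
print(f"typical error with 1,000 firms: {error_1000:,.0f}")
```

The typical error shrinks roughly with the square root of the sample size, which is why the larger sample lands much closer to the truth.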
Similarly, when determining program impact
Need many observations to say with confidence whether
average outcome of treatment group is higher/lower than
in comparison group
What do I mean by confidence?

Minimizing statistical error
Types of errors
Type 1 error: You say there is a program impact when
there really isn't one.
Type 2 error: There really is a program impact but you
cannot detect it.
Type 1 error: Find program impact when there's none
Error can be minimized after data collection, during statistical analysis
Need to adjust the significance levels of impact estimates (e.g. 99% or
95% confidence intervals)
Type 2 error: Cannot see that there really is a program impact

In jargon: statistical test has low power
Error must be minimized before data collection
Best method of doing this: ensuring you have a large enough sample
Whole point of an impact evaluation is to learn something

Ex ante: "We don't know how large the impact of this program is"
Low-powered, ex post: "This program might have increased firms'
profits by 50%, but we cannot distinguish a 50% increase from an
increase of zero with any confidence"
Calculating sample size
There's actually a formula. Don't get scared.

$$N = \frac{2\sigma^2 \left(z_{\alpha/2} + z_{\beta}\right)^2}{D^2}\,\left[1 + \rho\,(H - 1)\right]$$
Main things to be aware of:

1. Detectable effect size
2. Probability of type 1 and 2 errors
3. Variance of outcome(s)
4. Units (firms, banks) per treated area
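The four ingredients listed above can be combined in a few lines of code. This is a sketch under assumptions: it uses the standard two-group formula with a design effect for clustering, treats N as the number of units per group, and introduces an intra-cluster correlation (`icc`) that the list does not name explicitly. The function and parameter names are illustrative.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect, sd, alpha=0.05, power=0.80,
                          icc=0.0, units_per_area=1):
    """Units needed per group to detect `effect`, given outcome standard deviation `sd`.

    alpha: probability of a type 1 error (two-sided test).
    power: 1 minus the probability of a type 2 error.
    icc: assumed correlation of outcomes within a treated area.
    units_per_area: units (firms, banks) sampled per treated area.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n_simple = 2 * sd ** 2 * (z_alpha + z_beta) ** 2 / effect ** 2
    design_effect = 1 + icc * (units_per_area - 1)  # equals 1 when there is no clustering
    return ceil(n_simple * design_effect)


# Illustrative inputs: outcome sd of 50, individual-level randomization.
print(sample_size_per_group(effect=10, sd=50))  # detect a difference of 10
print(sample_size_per_group(effect=5, sd=50))   # half the effect needs about 4x the sample
print(sample_size_per_group(effect=10, sd=50, icc=0.05, units_per_area=20))
```

Halving the detectable effect roughly quadruples the required sample, and sampling many units from the same area adds less information than sampling the same number of independent units.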
Detectable effect size

Smallest effect you want to be able to distinguish from zero
A 30% increase in sales, a 25% decrease in bribes paid
Larger samples make it easier to detect smaller effects
Do female and male entrepreneurs work similar hours?

Claim: On average, women work 40 hours/week, men work 44
hours/week
If statistic came from sample of 10 women & 10 men
Hard to say if they are different
Would be easier to say they are different if women work 30 hours/week and men
work 80 hours/week
But if statistic came from sample of 500 women and 500 men
More likely that they truly are different
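The hours example can be made concrete with a rough two-sample test. The standard deviation of weekly hours is not given in the example, so the value of 10 hours below is purely an assumption, and the normal approximation is crude for groups of 10.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(diff, sd, n_per_group):
    """Approximate p-value for a difference in means between two equal-sized groups."""
    se = sd * sqrt(2 / n_per_group)  # standard error of the difference in means
    z = diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


ASSUMED_SD = 10  # hypothetical spread of weekly hours within each group

p_small_sample = two_sided_p_value(44 - 40, ASSUMED_SD, n_per_group=10)
p_large_sample = two_sided_p_value(44 - 40, ASSUMED_SD, n_per_group=500)
p_large_gap = two_sided_p_value(80 - 30, ASSUMED_SD, n_per_group=10)

print(f"4-hour gap, 10 per group:  p = {p_small_sample:.3f}")
print(f"4-hour gap, 500 per group: p = {p_large_sample:.2g}")
print(f"50-hour gap, 10 per group: p = {p_large_gap:.2g}")
```

Under this assumption, the 4-hour gap is indistinguishable from zero with 10 people per group but clearly detectable with 500, while a 50-hour gap is detectable even in the small sample.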
How do you choose the detectable effect size?
Smallest effect that would prompt a policy
response
Smallest effect that would allow you to say that a
program was not a failure
"This program significantly increased sales by 40%."
"Great, let's think about how we can scale this up."
"This program significantly increased sales by 10%."
"Great... uh... wait: we spent all of that money and it only increased
sales by that much?"
Type 1 and Type 2 errors

Type 1
Significance level of estimates usually set to 1% or 5%
1% or 5% probability that there is no effect but we think
we found one
Type 2
Power usually set to 80% or 90%
20% or 10% probability that there is an effect but we
cannot detect it
Larger samples give higher power
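The link between sample size and power can be sketched by turning the calculation around: fix the effect and compute power for different sample sizes. The inputs below (an effect of 10 on an outcome with a standard deviation of 50) are illustrative assumptions, and the function uses a normal approximation that ignores the negligible chance of rejecting in the wrong direction.

```python
from math import sqrt
from statistics import NormalDist


def power_two_groups(n_per_group, effect, sd, alpha=0.05):
    """Approximate power of a two-sided test comparing two equal-sized groups."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    se = sd * sqrt(2 / n_per_group)  # standard error of the difference in means
    return NormalDist().cdf(effect / se - z_alpha)


for n in (50, 100, 200, 393, 600):
    print(f"{n:>4} per group: power = {power_two_groups(n, effect=10, sd=50):.2f}")
```

With these inputs, power only reaches the conventional 80% at roughly 393 units per group; a study with 100 per group would miss a real effect most of the time.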
Variance of outcomes
Less underlying variance makes a difference easier to detect,
so the sample size can be lower
Variance of outcomes
How do we know this before we decide our
sample size and collect our data?
Ideal pre-existing data are often non-existent
Can use pre-existing data from a similar
population
Example: Enterprise Surveys, labor force surveys
Makes this a bit of guesswork, not an exact science
Further issues
1. Multiple treatment arms

2. Group-disaggregated results
3. Take-up
4. Data quality
Further issues
Multiple treatment arms
Straightforward to compare each treatment separately to
the comparison group
To compare treatment groups requires very large samples
Especially if treatments very similar, differences between the
treatment groups would be smaller
In effect, it's like fixing a very small detectable effect size
Group-disaggregated results
Are effects different for men and women? For different
sectors?
If genders/sectors expected to react in a similar way, then
estimating differences in treatment impact also requires
very large samples
Who is taller?
Detecting smaller differences is harder
Further issues
Group-disaggregated results
To ensure balance across treatment and comparison
groups, good to divide sample into strata before
assigning treatment
Strata
Sub-populations
Common strata: geography, gender, sector, initial
values of outcome variable
Treatment assignment (or sampling) occurs within
these groups
Why do we need strata?
Geography example
[Figure: units marked T (treatment) or C (comparison)]
What's the impact in a particular region?

Sometimes hard to say with any confidence
Random assignment to treatment within geographical units
Within each unit, some will be treatment and some will be comparison.
Similar logic for gender, industry, firm size, etc.
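Random assignment within strata can be sketched as follows. This is a minimal illustration that assumes an even split between treatment and comparison inside each stratum; the function name, firm identifiers, and regions are hypothetical.

```python
import random
from collections import defaultdict


def stratified_assignment(units, stratum_of, seed=0):
    """Assign units to treatment (T) or comparison (C) separately within each stratum."""
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    by_stratum = defaultdict(list)
    for unit in units:
        by_stratum[stratum_of(unit)].append(unit)

    assignment = {}
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum][:]
        rng.shuffle(members)  # random order within the stratum
        half = len(members) // 2  # assumed 50/50 split; an odd unit goes to comparison
        for i, unit in enumerate(members):
            assignment[unit] = "T" if i < half else "C"
    return assignment


# Hypothetical sample: 40 firms spread evenly over 4 regions.
regions = ["north", "south", "east", "west"] * 10
firms = [(f"firm{i}", region) for i, region in enumerate(regions)]
assignment = stratified_assignment(firms, stratum_of=lambda firm: firm[1])

for region in ("north", "south", "east", "west"):
    treated = sum(1 for firm, arm in assignment.items() if firm[1] == region and arm == "T")
    print(region, "treated:", treated, "of 10")
```

Because the split happens inside each region, every region is guaranteed to contain both treatment and comparison firms, which a single unstratified draw does not ensure.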
Further issues
Take-up
Low take-up increases detectable effect size
Can only find an effect if it is really large
Effectively decreases sample size
Example: Offering matching grants to SMEs for BDS services
Offer to 5,000 firms
Only 50 participate
Probably can only say there is an effect on sales with
confidence if they become Fortune 500 companies
Further issues
Data quality
Poor data quality effectively increases required
sample size
Missing observations
Increased noise
Can be partly addressed with field coordinator on
the ground monitoring data collection
Example from Ghana
Calculations can be made in many statistical packages, e.g. STATA or OD (Optimal Design)
Experiment in Ghana designed to increase the profits of microenterprise firms
Baseline profits: 50 cedi per month.
Profits data typically noisy, so a coefficient of variation >1 common.
Example STATA code to detect 10% increase in profits:

sampsi 50 55, p(0.8) pre(1) post(1) r1(0.5) sd1(50) sd2(50)
Having both a baseline and endline decreases required sample size (pre and post)
Results
10% increase (from 50 to 55): 1,178 firms in each group
20% increase (from 50 to 60): 295 firms in each group.
50% increase (from 50 to 75): 48 firms in each group (But this effect size not realistic)
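The three results above can be cross-checked outside STATA. The sketch below assumes a two-sided 5% significance level and a normal-approximation formula in which controlling for one baseline measure scales the required sample by (1 - r^2), where r is the baseline-endline correlation. Those assumptions reproduce the reported figures, but the function itself is an illustration rather than the `sampsi` implementation.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_with_baseline(mean_control, mean_treated, sd, baseline_corr,
                              alpha=0.05, power=0.80):
    """Firms per group when the endline comparison controls for one baseline measure."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    effect = mean_treated - mean_control
    n_endline_only = 2 * sd ** 2 * (z_alpha + z_beta) ** 2 / effect ** 2
    # Assumed adjustment: the baseline removes a share r^2 of the outcome variance.
    return ceil(n_endline_only * (1 - baseline_corr ** 2))


for target in (55, 60, 75):
    n = n_per_group_with_baseline(50, target, sd=50, baseline_corr=0.5)
    print(f"from 50 to {target}: {n} firms in each group")
```

Setting `baseline_corr=0` gives the endline-only sample size, which shows how much the baseline survey saves.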
What if take-up is only 50%?

Offer business training that increases profits by 20%, but only half the firms do it.
Mean for treated group = 0.5*50 + 0.5*60 = 55
Equivalent to detecting a 10% increase with 100% take-up: need 1,178 in each group
instead of 295 in each group
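The cost of low take-up can be sketched by scaling the effect on participants by the take-up rate before computing the sample size. The sketch keeps the Ghana inputs (standard deviation of 50, baseline correlation of 0.5) and assumes a two-sided 5% test and no effect on firms that do not participate; the function name and the baseline adjustment are illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_given_take_up(effect_on_participants, take_up, sd,
                              baseline_corr=0.0, alpha=0.05, power=0.80):
    """Firms per group when only a share `take_up` of the offered firms participate."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Average effect across everyone offered the program, assuming
    # non-participants are unaffected.
    diluted_effect = take_up * effect_on_participants
    n = 2 * sd ** 2 * (z_alpha + z_beta) ** 2 / diluted_effect ** 2
    return ceil(n * (1 - baseline_corr ** 2))


for take_up in (1.0, 0.5, 0.25, 0.01):
    n = n_per_group_given_take_up(10, take_up, sd=50, baseline_corr=0.5)
    print(f"take-up {take_up:>4.0%}: {n:,} firms in each group")
```

Required sample size grows with the inverse square of take-up: halving take-up quadruples it, and a 1% take-up rate (50 of 5,000 firms) multiplies it by about 10,000.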
Outline
1. Sampling frame
What populations or groups are we interested in?
2. Sample size
Further issues
Examples
3. Budgets
Budgets
What is required?
Data collection
Survey firm
Data entry
Field coordinator to ensure treatment follows randomization protocol
and to monitor data collection
Data analysis
Budgets
How much will all of this cost?
Huge range. Often depends on
Length of survey
Ease of finding respondents
Spatial dispersion of respondents
Security issues
Formal vs informal firms
Required human capital of enumerator
Et cetera.
Firm-level survey data: $40-$350/firm
Household survey data: $40+/household
Field coordinator: $10,000-$40,000/year
Depends on whether you can find a local hire
Administrative data: Usually free
Sometimes has limited outcomes, can miss most of the informal sector
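The cost ranges above can be combined with a sample size to get a rough budget range. This is a back-of-the-envelope sketch: it assumes the per-firm cost applies to each survey round, two groups, a baseline plus an endline, and one year of a field coordinator. It leaves out data entry and analysis, and every default below is an assumption to replace with local quotes.

```python
def survey_budget_range(firms_per_group, groups=2, rounds=2,
                        cost_per_firm=(40, 350),
                        coordinator_per_year=(10_000, 40_000), years=1):
    """Return (low, high) cost in dollars for firm surveys plus a field coordinator."""
    interviews = firms_per_group * groups * rounds  # every firm is surveyed in every round
    low = interviews * cost_per_firm[0] + coordinator_per_year[0] * years
    high = interviews * cost_per_firm[1] + coordinator_per_year[1] * years
    return low, high


# 295 firms per group: the Ghana example's requirement for a 20% increase in profits.
low, high = survey_budget_range(295)
print(f"295 firms per group:   ${low:,} to ${high:,}")

# 1,178 firms per group: the requirement for a 10% increase.
low_large, high_large = survey_budget_range(1178)
print(f"1,178 firms per group: ${low_large:,} to ${high_large:,}")
```

The choice of detectable effect size feeds directly into the budget through the number of interviews.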
Summing up
The sample size of your impact evaluation will
determine how much you can learn from your
experiment
Some judgment and guesswork in calculations but
important to spend time on them
If sample size is too low: waste of time and money
because you will not be able to detect a non-zero impact
with any confidence
If little effort put into sample design and data collection:
See above.
Questions?
