Probability and Statistics II

Karthik Sriram
Indian Institute of Management Ahmedabad
karthiks@iima.ac.in

August 12, 2017


Outline

1 About the Course

2 Introduction to Sampling

3 Census vs. Sampling


Nonsampling Errors
Sampling Bias and Sampling Error

4 Probability Sampling
Estimation of Proportion
Estimation of Population Mean


Course Content

WELCOME!
Statistics is the science (and art!) of data analysis.

PS I (QM1a) covered basic probability, which is the foundation required for basic Statistics.

PS II (QM1b) will introduce Statistics. We will cover basic ideas under 3 topics:

(a) Sampling and Estimation

(b) Testing of Hypothesis

(c) Linear Regression Modeling


Examples for the 3 topics

(a) Example: Sampling and Estimation

What is the average salary for a graduating IIMA student?

(b) Example: Hypothesis Testing

Is the average salary for boys statistically different from that for girls?

(c) Example: Linear Regression

Can salary be modeled as a function of CGPA, gender, and years of past experience?


Type of Data Studies

Such an enquiry typically involves design and collection of data followed by analysis.

Such studies can be of two types:

(a) Experimental Study

(b) Observational Study


Experimental Study

This typically involves an intervention (treatment) at the time of data collection.

e.g. To study the effect of a drug on headache, create a control group and a treatment group,
administer a placebo to the control group and the drug to the treatment group, and then
measure and compare the effect on both groups.

The experiment is usually designed in such a way (e.g. by ensuring that different types of
individuals are represented in both groups) that the impact can be attributed to the treatment.

An experimental study is desirable when we want to establish causality.


Observational Study

We observe the data as-is, without any intervention.

For example, to study the relation between salary, CGPA, and number of years of past
experience, we just collect data on these variables from a sample of students and analyze.

Drawing causal inferences from such data is challenging, so one should not fall into that
trap. e.g. Suppose you find that the average salary for girls is higher than that for boys.
That does not mean being a girl leads to a higher salary; there may be many other factors
at play!

In this course, we mainly focus on inference with observational data: we will look at
associations but not causality.

Causal inference with observational data needs advanced econometric techniques along with
good domain/contextual knowledge.

Course Evaluation

1. Quizzes: 40% (open book/notes)

2. Class Participation (CP): 10%

Attendance + answering my questions.

You are welcome to ask questions or present viewpoints, but that is not part of CP.

3. Exam: 50% (open book/notes)


What you need to do

1. Gain familiarity with topics from the text and slides before the class.

2. Pay attention in class and take notes of class discussions.

3. Slides will be shared with you.

Slides + notes from class will ensure a good understanding of the content.

4. Working through the exercises and practice problems given by the instructors will reinforce ideas.

5. Do not hesitate to seek help. Email is the easiest way (karthiks@iima.ac.in).


Needless to say...

1. Do not come late to class (you may be marked absent and/or asked to leave).

2. Do not leave class in between (you may be marked absent).

3. Ensure minimum attendance (13 of 15 sessions).

4. Avoid unnecessary distraction in class and do not disturb me or others.

No email, text messages, FB, WhatsApp, Twitter, etc.


Motivating Examples for Sampling

Consider the following questions:

1. What percentage of businesses in India enabled non-cash payments after November 8, 2016?

2. What was the extent of under-reporting of tax among taxpayers in India in FY 2016-17?

3. What was the average household expenditure during the Diwali month last year in
Ahmedabad?

Sampling and Estimation can help answer such questions.


Some terms

To formulate an approach for these questions, the following ideas are essential

1 Population

2 Parameter (also referred to as Population Parameter)

3 Sampling Unit

4 Sampling Frame

5 Element

6 Variable


Illustration of some terms


Definition of terms

1 Population: collection of elements of interest, e.g. residents of a city.

2 Element: entity from which data is taken, e.g. individual in a house.

3 Sampling Unit: A unit that can be sampled, e.g. household

4 Sampling Frame: List of sampling units, e.g. list of households i = 1, 2, . . . , N.

5 Variable: The quantity that is recorded for the sampling unit,
e.g. $X_i$ = expenditure of household $i$.

6 Parameter: A particular summary measure of the population,
e.g. Mean = average expenditure per household, usually denoted by
$\bar{X} = \frac{X_1 + X_2 + \dots + X_N}{N}$.

Question

Question on Variable
Suppose we are estimating the percentage of households with at least one senior citizen.

What would be $X_i$, and what would be $\bar{X}$?


Approaches to Estimate

Two approaches can be used to estimate the parameter of interest.

1. Census: Collect data from every sampling unit in the frame and compute the exact value.

Census is subject to possible Nonsampling Errors.

2. Sample Survey or Sampling: Collect data only on a representative subset of units.

Sampling is subject to Nonsampling Errors as well as Sampling Bias and Sampling Error.


Nonsampling Errors

Nonsampling Errors: Measurement Error

Question 1: Measurement Error

Which of these is a better approach to determine how many students in a government
school can read and write Gujarati?

(a) Ask the principal to provide the number of such students in the school.

(b) Conduct a reading and writing proficiency test for the students and determine the number.


Nonsampling Errors: Measurement Error

Example 2: Measurement Error

Is there anything wrong with this type of question?

Would you like the Congress to win the next election and Rahul Gandhi to become the PM?


Nonsampling Errors: due to Bias in Response

Question: Bias
Is there any problem with this question?

Do you think the Bangalore city corporation is justified in cutting down beautiful
green trees to construct the high speed expressway that will perhaps only cater to
VIPs?


Nonsampling Errors: Bias due to Nonresponse

Question: Bias due to nonresponse


Is there any problem with this approach?

To find out how many students in IIMA like a particular mobile App, an email with a
link to download the App is sent to all IIMA students and responses are sought.


Nonsampling Errors in a Census: Summary

Controlling nonsampling errors in a large exercise like a census is challenging.

There are a lot of units to cover and a lot of personnel involved (difficult to train and to
ensure consistency).

Nonsampling errors, when aggregated over a large number of units, can lead to
grossly wrong results.
A classic example is the opinion poll during the Landon versus Roosevelt election in 1936.


Sampling Illustration


Sampling: some notation

In sampling, we would like to draw a (representative) subset of units from the sampling frame.

Representative: no particular bias should exist in selecting a sample.

Units in the population (i.e. sampling units) are denoted by $i = 1, 2, 3, \dots, N$.

Values of $X$ for units in the population are denoted by $X_1, X_2, \dots, X_N$ (i.e. in
capital letters).


Sampling: some notation (ctd)

We will denote sampled units as $i = 1, 2, 3, \dots, n$, being mindful that these units can
be any of the units from the sampling frame.
e.g. The first sampled unit may happen to be unit 100 in the sampling frame.

$n$ is called the sample size.

Sampled values of the variable are denoted $x_1, x_2, \dots, x_n$ (i.e. small letters).

The estimate of the population mean $\bar{X}$ is denoted $\hat{\bar{X}}$ (X bar hat):
$\hat{\bar{X}} = \bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$.


Sampling Bias and Sampling Error

Sampling Bias

Question: Sampling Bias


Is there anything wrong with this approach?

A sample of incoming PGP students is surveyed to determine which part of the
selection process, CAT or Interview, is more enjoyable.


Sampling Error

Question: Sampling Error


A bag contains some green and some white pebbles. You want to find out what
percentage of the pebbles in the bag are green.

Is the following conclusion correct?

I randomly draw 3 pebbles, and 2 out of 3 turn out to be green. So, I conclude that 2/3 of
the pebbles in the bag are green.


Sampling Error

Even if the sample is representative of the population, there is still one more type of
error, which is unavoidable!

Sampling Error arises because the estimate is based on a subset of the population
and not the entire population.
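
To make the idea of sampling error concrete, here is a minimal sketch in Python. The bag composition (50 green, 50 white) and the function name `estimate_distribution` are assumptions chosen for illustration. It lists every value the sample fraction of green pebbles can take when 3 pebbles are drawn without replacement, along with its exact probability.

```python
from math import comb

def estimate_distribution(n_green, n_white, n_draw):
    """Exact distribution of the sample fraction of green pebbles
    when n_draw pebbles are drawn at random without replacement."""
    total = n_green + n_white
    dist = {}
    for k in range(n_draw + 1):
        # Number of ways to draw k green and (n_draw - k) white pebbles.
        ways = comb(n_green, k) * comb(n_white, n_draw - k)
        if ways > 0:
            dist[k / n_draw] = ways / comb(total, n_draw)
    return dist

# Hypothetical bag: 50 green and 50 white pebbles, so the true fraction is 0.5.
dist = estimate_distribution(n_green=50, n_white=50, n_draw=3)
for estimate, prob in sorted(dist.items()):
    print(f"estimate = {estimate:.3f}, probability = {prob:.3f}")
```

With only 3 draws the estimate can only be 0, 1/3, 2/3 or 1, so in this hypothetical bag it can never equal the true fraction 0.5, even though the draw is perfectly random.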


Types of Sampling: Question

Question: Types of Sampling


Match each type of Sampling on the left with the correct example on the right.

Type of Sampling

1. Convenience Sampling
2. Judgement Sampling
3. Systematic Sampling
4. Probability Sampling

Example

a To survey a sample of Hyper-city customers, you stand at the store entrance and reach
out to people who enter.

b To determine the main features to consider while buying a car, you pick a sample from a
list of people who bought cars recently.

c To determine the number of people per house in a remote village who use mosquito nets,
you start with a randomly selected house and then visit every 5th house along a randomly
chosen direction.

d To find out how many PGP students use contact lenses, you procure a list of all the (say)
1000 students from the PGP office and randomly choose 100 students from the list.

Types of Sampling: Idea


Type of Sampling: Idea

1. Convenience Sampling: (a) Based on ease of reach or convenience.

2. Judgement Sampling: (b) Uses expert judgement to determine what is representative.

3. Systematic Sampling: (c) Random sampling, but not every set of n units has an equal
chance of making it into the sample of size n. So if units are not homogeneously
distributed, there can be a bias, e.g. you may end up choosing only corner houses.

4. Probability Sampling: (d) Sampling based on a pre-determined probability distribution,
e.g. in Simple Random Sampling, equal probability is assigned to every possible sample.
This is the most effective way to obtain a representative sample.
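
The point about systematic sampling can be sketched in Python. This is an illustrative sketch only: the frame size `N = 50`, the step `k = 5`, and the function name `systematic_sample` are assumptions, not part of the course material. With a random start and a fixed step there are only `k` possible samples, so most sets of `n` units can never be selected.

```python
import random

def systematic_sample(N, k, rng=random):
    """Pick a random start in 1..k, then take every k-th unit up to N."""
    start = rng.randint(1, k)
    return list(range(start, N + 1, k))

# Hypothetical frame of N = 50 houses, visiting every 5th house.
sample = systematic_sample(N=50, k=5, rng=random.Random(1))
print(sample)

# All samples this design can ever produce: one per possible start.
all_possible = [list(range(start, 51, 5)) for start in range(1, 6)]
print(len(all_possible), "possible samples of size", len(all_possible[0]))
```

If, say, every 5th house happens to be a corner house, one of these few samples consists only of corner houses, which is the bias the slide warns about.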

Summary: Census vs Sampling

Census: subject to Nonsampling errors, which are difficult to control due to the size of
the study.

Sampling
(i) Nonsampling errors/biases are easier to manage due to the smaller size.
(ii) Sampling Errors can be controlled in Probability Sampling by determining an
appropriate sample size.

Probability sampling is the most preferred approach to sampling.

Convenience, Judgement or Systematic sampling may not always be avoidable. When used, the
assumption is that they can be treated approximately as a probability sample.


Simple Random Sampling

Simple Random Sampling (SRS): A probability sampling design where every sample of size n
has equal probability of getting selected.

SRS is the simplest form of Probability Sampling.

It forms the basis for many other more complex sampling designs (e.g. Stratified
sampling, Cluster sampling, etc.).

In this course, we will learn to work with

(a) SRS with Replacement (SRSWR)

(b) SRS without Replacement (SRSWOR)


Choosing a Simple Random Sample: Steps + Packages

Suppose the sampling frame is {1, 2, . . . , N}.

Step 1: Choose a random number between 1 and N.

In Excel: RANDBETWEEN(1, N); in R: sample(c(1:N), size=1)

Step 2 (for SRSWR): Repeat Step 1 until n units are selected.

Note: here a unit can occur more than once in the sample.

Step 2 (for SRSWOR): Repeat Step 1, but discard the sampled unit if it is already
in the sample so far. Go on until n distinct units are chosen.

In R: sample(c(1:N), size=n, replace=FALSE)
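
The two procedures above can also be sketched in Python, shown here only as an illustration alongside the Excel and R calls. The function names `srswr` and `srswor` are made up for this sketch; it uses Python's standard `random` module and follows the "draw, and discard repeats" description literally.

```python
import random

def srswr(N, n, rng=random):
    """Simple random sample WITH replacement from the frame {1, ..., N}."""
    return [rng.randint(1, N) for _ in range(n)]

def srswor(N, n, rng=random):
    """Simple random sample WITHOUT replacement from the frame {1, ..., N}:
    repeat Step 1 and discard a unit if it is already in the sample."""
    if n > N:
        raise ValueError("n cannot exceed N when sampling without replacement")
    chosen = []
    while len(chosen) < n:
        unit = rng.randint(1, N)
        if unit not in chosen:
            chosen.append(unit)
    return chosen

rng = random.Random(2017)  # fixed seed so the illustration is repeatable
print("SRSWR :", srswr(N=20, n=5, rng=rng))
print("SRSWOR:", srswor(N=20, n=5, rng=rng))
```

In practice the built-in `random.sample(range(1, N + 1), n)` draws a sample without replacement in one call; the explicit loop is kept here only because it mirrors the steps.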



A Side Note on Packages

We will use some packages during this course: mostly Minitab and Excel.

R is a very powerful and useful software. It is not necessary for this course, but it will
be useful if you learn it.

For anyone interested, I have created a presentation on Introduction to R, available on my
webpage:

https://www.iima.ac.in/web/faculty/faculty-profiles/karthik-sriram
Click on Teaching and look under Teaching Material.


SRS : some probability questions

1. In SRSWR, what is the probability of getting any particular sample of size n?

2. In SRSWOR, what is the probability of getting any particular sample of size n?

3. In SRSWR, what is the probability that any particular unit (say unit 1) is present
in a sample of size n?

4. In SRSWOR, what is the probability that any particular unit (say unit 1) is
present in a sample of size n?
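
One way to check answers to these questions is to enumerate all samples for a small frame. The sketch below is in Python, with N = 5 and n = 3 chosen arbitrarily. It rests on two conventions that are assumptions of the sketch: a SRSWR sample is an ordered sequence of draws, and a SRSWOR sample is an unordered set of distinct units.

```python
from fractions import Fraction
from itertools import combinations, product

N, n = 5, 3  # small hypothetical frame so that every sample can be listed
frame = range(1, N + 1)

# SRSWR: ordered sequences of n draws, repeats allowed, all equally likely.
wr_samples = list(product(frame, repeat=n))
p_sample_wr = Fraction(1, len(wr_samples))
p_unit1_wr = Fraction(sum(1 in s for s in wr_samples), len(wr_samples))

# SRSWOR: unordered sets of n distinct units, all equally likely.
wor_samples = list(combinations(frame, n))
p_sample_wor = Fraction(1, len(wor_samples))
p_unit1_wor = Fraction(sum(1 in s for s in wor_samples), len(wor_samples))

print("SRSWR : P(particular sample) =", p_sample_wr, " P(unit 1 in sample) =", p_unit1_wr)
print("SRSWOR: P(particular sample) =", p_sample_wor, " P(unit 1 in sample) =", p_unit1_wor)
```

For this small case the enumeration agrees with the closed forms $1/N^n$ and $1-(1-1/N)^n$ for SRSWR, and $1/\binom{N}{n}$ and $n/N$ for SRSWOR.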


SRS Estimation

In SRS, we will mainly consider two problems:

a) Estimation of Proportion ($p$), e.g. % of people supporting a party in a city.

b) Estimation of Mean ($\mu$ or $\bar{X}$), e.g. average household expenditure in a city.


Estimation of Proportion

Exercise: Estimation of Proportion

Project Description

Smart phone usage for in-store shopping

Google, along with M/A/R/C Research, conducted a survey in the U.S. to study the impact
of smart phones on in-store shopping. They surveyed a random sample of 1,507 smart-phone
owners who use mobile devices for shopping and who completed a 3-part survey for a
shopping trip. Questions of interest include, for example: When do they use the phone?
Before going to the store, to search for a store? Do they use it inside the store while
shopping?

Assume the number of smart-phone owners is 200,000.

Courtesy: Google Projects
http://www.thinkwithgoogle.com

Summary of Outcome (figure)


Exercise: Estimation of Proportion

Answer the following questions.

What is the population here?

What is the sampling frame?

How many units N are in the sampling frame?

Note that they are using a sample. What is the sample size n?

What assumption are they making about the sample?


Exercise: Estimation of Proportion

Answer the following questions

Name any two population parameters of interest.

What is their estimate for the % of smart phone shoppers?


Exercise: Estimation of Proportion

Answer the following questions

How do you think the estimate of the % of smart phone shoppers ($p$) is computed?

(a) What possible values can each $x_i$ take for any sampled unit $i$?

(b) What is the mathematical formula for $\hat{p}$, the estimator of $p$, in terms of $x_i$?
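
A minimal sketch of one way to think about these two questions, in Python. The ten sample values are made up for illustration. Each $x_i$ is coded as 1 if sampled owner $i$ is a smart-phone shopper and 0 otherwise, and the estimate is the sample mean of these indicators, $\hat{p} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

```python
# Hypothetical sample of n = 10 smart-phone owners (illustrative values only):
# x[i] = 1 if owner i is a smart-phone shopper, 0 otherwise.
x = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]

n = len(x)
p_hat = sum(x) / n  # sample mean of the 0/1 indicators = sample proportion

print(f"n = {n}, number of shoppers = {sum(x)}, p_hat = {p_hat:.2f}")
```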


Exercise: Estimation of Proportion

Answer the following questions

Is this statement correct? Why or why not?

"We can conclude that 79% of smart phone owners are smart phone shoppers."


Point-Estimator /Point-Estimate

$\hat{p}$ is called the Point-Estimator.

The particular value of $\hat{p}$ obtained in the chosen sample is called the Point-Estimate.

Note that some books may use the terms interchangeably!

Note that the value of $\hat{p}$ we get depends on the particular sample obtained. Here,
the Point-Estimate of $p$ is 79%.

If the sampling had been repeated once again, we would have chosen a different set
of 1507 owners, and hence it is very likely that a different value of $\hat{p}$ would have
been obtained.

So, if we consider all possible samples, we get a whole collection of values of $\hat{p}$,
with different probabilities of occurrence.


Sampling Distribution

Note that the Point-Estimator is a random variable, because the value it takes
depends on a random outcome, i.e. the random sample that is drawn.

Sampling Distribution
The collection of all possible values of the Point-Estimator that can potentially occur
when we draw a sample of size n, along with the likelihood of their occurrence (either
probability density or probability mass function), is referred to as the Sampling
Distribution of the Point-Estimator.
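
The definition can be illustrated by simulation. The Python sketch below is hedged throughout: the true proportion `p = 0.79`, the small sample size `n = 20`, the number of repetitions, and the function name `simulate_p_hat` are all assumptions for illustration. It repeatedly draws a with-replacement sample of 0/1 values and tabulates how often each value of the sample proportion occurs.

```python
import random
from collections import Counter

def simulate_p_hat(p, n, reps, seed=0):
    """Approximate the sampling distribution of the sample proportion by
    drawing `reps` independent samples of n Bernoulli(p) values."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(reps):
        t = sum(1 for _ in range(n) if rng.random() < p)  # number of 1s in the sample
        counts[t / n] += 1
    # Relative frequency of each observed value of the sample proportion.
    return {value: c / reps for value, c in sorted(counts.items())}

dist = simulate_p_hat(p=0.79, n=20, reps=10_000)
for value, freq in dist.items():
    print(f"p_hat = {value:.2f}: relative frequency {freq:.4f}")
```

Each printed line is one possible value of the estimator together with an approximate probability, which is the information a sampling distribution contains.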


Sampling Distribution of $\hat{p}$ in SRSWR

Note that $x_1 \sim \text{Bernoulli}(p)$, i.e.
$x_1 = \begin{cases} 1, & \text{with probability } p \\ 0, & \text{with probability } (1-p) \end{cases}$

Due to SRSWR, $x_1, x_2, \dots, x_n$ are i.i.d. (independent and identically distributed)
Bernoulli($p$).

Question: So, $t = x_1 + x_2 + \dots + x_n$ follows what distribution?
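
As a hint towards the question, the standard result is that a sum of $n$ i.i.d. Bernoulli($p$) variables follows a Binomial($n$, $p$) distribution. The Python sketch below uses that result to write out the exact sampling distribution of $t/n$ for a small case; the values `n = 5` and `p = 0.79` are chosen only for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(t = k) when t is the sum of n i.i.d. Bernoulli(p) variables."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 5, 0.79  # illustrative values only

# Sampling distribution of t / n: each possible value with its probability.
sampling_dist = {k / n: binomial_pmf(k, n, p) for k in range(n + 1)}
for value, prob in sampling_dist.items():
    print(f"t/n = {value:.1f}: probability {prob:.4f}")

# Mean of the sampling distribution.
expected_value = sum(value * prob for value, prob in sampling_dist.items())
print(f"mean of the sampling distribution = {expected_value:.4f}")
```

The mean of this distribution comes out equal to `p`, which is the sense in which the sample proportion is centred on the population proportion.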


Concept of Bias

Since the Estimator is a random variable, we can talk about its expected value, i.e.
the mean value of the estimator as per the sampling distribution.

Bias of the Point Estimator

Bias is defined as the difference between the expected value of the estimator and the
population parameter we are trying to estimate. We say the estimator is unbiased if
Bias = 0, i.e. when $E(\hat{p}) = p$.

$\text{Bias(Point Estimator)} = E[\text{Point Estimator}] - \text{Parameter}$.


Bias for $\hat{p}$ in SRSWR

Due to SRSWR, $x_1, x_2, \dots, x_n$ are i.i.d. Bernoulli($p$).

$E(\hat{p}) = \frac{1}{n} \sum_{i=1}^{n} E[x_i] = p$

$\text{Bias}(\hat{p}) = E(\hat{p}) - p = 0$

In SRSWR, $\hat{p}$ is an unbiased estimator for $p$.


Concept of standard error

Standard Error (SE)

This is a measure of the accuracy of the estimator in estimating the parameter of interest.
It is defined as:
$SE(\text{Point Estimator}) = \sqrt{E[(\text{Estimator} - \text{Parameter})^2]}$

For the point estimator $\hat{p}$, SE is computed as
$SE(\hat{p}) = \text{Standard Error of } \hat{p} \text{ to estimate } p = \sqrt{E[(\hat{p} - p)^2]}$.

Note that since $E[\hat{p}] = p$, $SE(\hat{p})$ = standard deviation $SD(\hat{p})$.


Exercise: Estimation of Proportion

Answer the following questions

What is the standard error of the estimator (assuming SRSWR)?

What is the standard error of the estimator (assuming SRSWOR)?
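
For the SRSWR case, a short derivation is sketched below. It uses only two facts: the sampled values $x_1, \dots, x_n$ are i.i.d. Bernoulli($p$), so $\mathrm{Var}(x_i) = p(1-p)$, and the estimator is the sample mean $\hat{p} = \frac{1}{n}\sum_{i=1}^{n} x_i$. Because $\hat{p}$ is unbiased, its standard error equals its standard deviation.

```latex
\begin{align*}
\mathrm{Var}(\hat{p})
  &= \mathrm{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)
   = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(x_i)
   = \frac{n\,p(1-p)}{n^2}
   = \frac{p(1-p)}{n}, \\[4pt]
SE(\hat{p}) &= \sqrt{\mathrm{Var}(\hat{p})} = \sqrt{\frac{p(1-p)}{n}} .
\end{align*}
```

The second equality uses independence of the draws, which holds under SRSWR but not under SRSWOR; that is why the SRSWOR formula carries an extra correction factor.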


A note on sampling in infinite populations

Typically, for finite populations, sampling is done without replacement.

However, note that if $N$ is large relative to $n$, WR and WOR will be approximately the same,

because $\sqrt{\frac{N-n}{N-1}} \to 1$ as $N \to \infty$.

Populations can be infinite, e.g. wind speeds at a particular location.

There is no Sampling Frame to do SRS for infinite populations, but the random
sample needs to be selected so that
(a) Every unit that is selected comes from the same population
(b) Each unit is selected independently.
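
The claim that WR and WOR coincide for large N can be checked numerically. The sketch below evaluates the correction factor $\sqrt{(N-n)/(N-1)}$ for a fixed sample size and growing N. The function name `fpc` and the values of `n` and `N` are assumptions chosen only for illustration.

```python
import math

def fpc(N: int, n: int) -> float:
    """Finite population correction sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

n = 100  # hypothetical sample size
for N in (200, 1_000, 10_000, 1_000_000):
    # The factor approaches 1 as N grows, so the WOR and WR standard errors converge.
    print(f"N = {N:>9,}: correction = {fpc(N, n):.4f}")
```

For a small population the factor is well below 1, so sampling without replacement is noticeably more precise; for a very large one the difference is negligible.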


Estimation of Proportion

Exercise: Estimation of Proportion


Answer the following question:

What is the standard error of the estimate (assuming SRSWOR)?

[Panel: Summary of Outcome]

$\mathrm{SE}(\hat{p}) = \sqrt{\frac{N-n}{N-1}} \sqrt{\frac{p(1-p)}{n}}$

$\mathrm{SE}(\hat{p}) = \sqrt{\frac{2000000 - 1507}{2000000 - 1}} \sqrt{\frac{0.79(1 - 0.79)}{1507}} = 0.0105$.

Note that since $E[\hat{p}] = p$, the standard error and the standard deviation are the same. In general,
an estimator for a population parameter may be biased. For the problem of estimating
proportions, it happens to be unbiased.
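
The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch: the helper name `se_proportion` is hypothetical, and the inputs (N = 2,000,000, n = 1507, p = 0.79) are the ones used in the exercise.

```python
import math
from typing import Optional

def se_proportion(p: float, n: int, N: Optional[int] = None) -> float:
    """SE of a sample proportion; pass N for SRSWOR, omit it for SRSWR."""
    se_wr = math.sqrt(p * (1 - p) / n)
    if N is None:
        return se_wr
    # SRSWOR: multiply by the finite population correction.
    return math.sqrt((N - n) / (N - 1)) * se_wr

se_wor = se_proportion(0.79, 1507, N=2_000_000)
se_wr = se_proportion(0.79, 1507)
print(f"SRSWOR: {se_wor:.4f}  SRSWR: {se_wr:.4f}")
```

Because N is very large compared with n here, the two designs give nearly the same standard error.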

Estimation of Proportion

Exercise: Determining Sample Size based on condition on SE

Standard deviation of the estimate (assuming SRSWOR):

$\mathrm{SE}(\hat{p}) = \sqrt{\frac{N-n}{N-1}} \sqrt{\frac{p(1-p)}{n}}$

Could we have chosen $n$ prior to the survey so that $\mathrm{SE}(\hat{p}) \le 0.01$?
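
One way to approach this question is to solve the SRSWOR formula for n, which gives $n \ge \frac{N\,p(1-p)}{(N-1)e^2 + p(1-p)}$ for a target standard error $e$. Since p is unknown before the survey, a planning value is needed; p = 0.5 is the most conservative choice because it maximises p(1-p). The sketch below assumes N = 2,000,000 as in the preceding exercise, and the function names are hypothetical.

```python
import math

def min_sample_size(target_se: float, N: int, p: float = 0.5) -> int:
    """Smallest n with SRSWOR SE(p-hat) <= target_se, for a planning value p."""
    q = p * (1 - p)
    return math.ceil(N * q / ((N - 1) * target_se ** 2 + q))

def se_wor(p: float, n: int, N: int) -> float:
    """SRSWOR standard error of the sample proportion."""
    return math.sqrt((N - n) / (N - 1)) * math.sqrt(p * (1 - p) / n)

N = 2_000_000
n_worst = min_sample_size(0.01, N)          # conservative planning value p = 0.5
n_guess = min_sample_size(0.01, N, p=0.79)  # if a prior guess of p is available
print(n_worst, se_wor(0.5, n_worst, N))
print(n_guess, se_wor(0.79, n_guess, N))
```

The conservative choice requires a larger sample than a planning value further from 0.5, which is the price of not knowing p in advance.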


Estimation of Proportion

Summary: Estimation of Proportion (p)

Two sampling designs: SRSWR and SRSWOR.

For large N, the two are approximately the same.

The point estimator $\hat{p}$ is a random variable. Its distribution is called the sampling
distribution.

For both SRSWR and SRSWOR, $\hat{p} = \bar{x}$ is unbiased, i.e. $E[\hat{p}] = p$.

For SRSWR, $\mathrm{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$; for SRSWOR, $\mathrm{SE}(\hat{p}) = \sqrt{\frac{N-n}{N-1}} \sqrt{\frac{p(1-p)}{n}}$.

Using the formulas for SE, we can choose n while planning the survey, to control
sampling error to a desired level of accuracy. This is possible only when sampling
is planned using probability sampling.
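
The points in this summary can be illustrated by simulation. The sketch below repeatedly draws samples from a made-up 0/1 population under both designs and compares the mean and spread of the estimates with the formulas. The population size, true proportion, sample size, number of repetitions and seed are all assumptions chosen for illustration.

```python
import math
import random
import statistics

rng = random.Random(0)               # fixed seed for reproducibility
N, n, reps = 1000, 50, 5000          # hypothetical sizes
population = [1] * 300 + [0] * 700   # true proportion p = 0.3
p = sum(population) / N

def simulate(draw):
    """Return the mean and SD of p-hat over repeated samples."""
    estimates = [sum(draw()) / n for _ in range(reps)]
    return statistics.mean(estimates), statistics.pstdev(estimates)

mean_wr, sd_wr = simulate(lambda: rng.choices(population, k=n))  # with replacement
mean_wor, sd_wor = simulate(lambda: rng.sample(population, n))   # without replacement

se_wr = math.sqrt(p * (1 - p) / n)
se_wor = math.sqrt((N - n) / (N - 1)) * se_wr

print(f"SRSWR : mean {mean_wr:.4f}, simulated SD {sd_wr:.4f}, formula SE {se_wr:.4f}")
print(f"SRSWOR: mean {mean_wor:.4f}, simulated SD {sd_wor:.4f}, formula SE {se_wor:.4f}")
```

The collection of simulated estimates approximates the sampling distribution: its mean should sit close to p, and its standard deviation close to the formula SE for each design.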


An unusual Example of Sampling: Crowd Estimation

In the past, street protests have led to a military coup in Egypt. The sizes of crowds
determine the strength of such protests.
Crowds are often estimated from aerial pictures of the crowd!

Figure: Tahrir Square on November 27, 2012: Protests against Mohd Morsi

How can sampling be done here?


Estimation of Population Mean

Exercise: Estimating Mean

An income-tax audit is planned in a business district of a city consisting of 100
businesses. The auditor has a record of the income filed by each business. The
objective is to determine how much under- or perhaps over-reporting exists.

Since an audit is a time-consuming process, the auditor plans to take an SRSWOR
sample of 5 businesses. She wants to answer the following questions:

(a) What proportion of businesses are under-reporting?

(b) What is the extent of under-reporting?

(c) What is the estimate of the total loss to the Income-tax department from these businesses?

(d) It is also required that a measure of accuracy be reported for these estimates.


Estimation of Population Mean

Exercise: Population

Table: Population Values

unit Reported Audited unit Reported Audited


1 2038642 3315379 26 1999788 3108352
2 1913064 2940391 27 1897488 3321842
3 2182653 2500547 28 2248574 2395434
4 2358678 3175269 29 1867883 2656837
5 2199321 3305205 30 2033325 2880150
6 2221498 2787269 31 1889359 3002753
7 1944383 2859098 32 2019636 3157186
8 2203841 3069309 33 1770667 3494016
9 2009087 3242680 34 1750015 3115390
10 2315156 3238158 35 1959578 3131504
11 2043658 2900762 36 2035500 2487454
12 1790693 2999451 37 2261529 2862292
13 1942262 2908269 38 1731413 2855181
14 2096310 3009974 39 2172217 3132036
15 1756725 3457691 40 2046575 2965062
16 2061627 2723674 41 2208451 3386484
17 1895964 2757380 42 2265820 3293434
18 1911537 2666675 43 2189192 3159985
19 1880137 3186752 44 1802875 2987870
20 2258916 2696655 45 1756827 2891414
21 2167078 3181502 46 1674159 2929422
22 1886797 2892519 47 2018463 3334932
23 2157684 2966856 48 1995577 3261660
24 1766814 3257874 49 2102696 3187304
25 1893836 3001743 50 1954089 3144499

Estimation of Population Mean

Exercise: Estimation of Mean

Do the following

Variable X = Audited - Reported

Population values $X_1, X_2, \ldots, X_N$, $N = 50$.

Pretend population values can be known only for the chosen sample.

Parameter: the population mean $\bar{X}$, to be estimated.

Draw SRSWOR: $x_1, \ldots, x_n$, $n = 5$.


Estimation of Population Mean

Exercise: Estimation of Mean

Do the following

Compute the point estimator for $\bar{X}$.

Compute its standard error.
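
A minimal sketch of these two steps follows. It uses the sample mean as the point estimator and the standard SRSWOR result $\mathrm{SE}(\bar{x}) = \sqrt{\frac{N-n}{N-1}}\,\frac{\sigma}{\sqrt{n}}$, where $\sigma$ is the population standard deviation (divisor N). That formula is a textbook result assumed here rather than one stated in this section. To keep the example short, only units 1 to 10 of the population table are used as a toy population, so N = 10 here rather than 50. The seed is arbitrary.

```python
import math
import random
import statistics

# (Reported, Audited) for units 1-10 of the population table; a toy subset.
rows = [
    (2038642, 3315379), (1913064, 2940391), (2182653, 2500547),
    (2358678, 3175269), (2199321, 3305205), (2221498, 2787269),
    (1944383, 2859098), (2203841, 3069309), (2009087, 3242680),
    (2315156, 3238158),
]
X = [audited - reported for reported, audited in rows]  # X = Audited - Reported
N, n = len(X), 5

rng = random.Random(1)      # fixed seed so the draw is reproducible
sample = rng.sample(X, n)   # SRSWOR of size n

x_bar = statistics.mean(sample)            # point estimate of the population mean
s2 = statistics.variance(sample)           # sample variance (divisor n - 1)
se_hat = math.sqrt((1 - n / N) * s2 / n)   # SE estimated from the sample alone

# Only computable here because the toy population is fully known:
sigma = statistics.pstdev(X)               # population SD (divisor N)
se_true = math.sqrt((N - n) / (N - 1)) * sigma / math.sqrt(n)

print(f"estimate of mean X: {x_bar:.0f}")
print(f"estimated SE: {se_hat:.0f}, SE from full population: {se_true:.0f}")
```

In the audit setting only `se_hat` would be available, since the audited values are known only for the sampled businesses.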
