Karthik Sriram
Indian Institute of Management Ahmedabad
karthiks@iima.ac.in
Probability and Statistics II
Outline
1 About the Course
2 Introduction to Sampling
3 Census vs. Sampling
4 Probability Sampling
Estimation of Proportion
Estimation of Population Mean
Course Content
WELCOME!
Statistics is the science (and art!) of data analysis.
PS II (QM1b) introduces Statistics. We will cover the basic ideas under three topics.
Is the average salary for boys statistically different from that for girls?
Experimental Study
This typically involves an intervention (a treatment) at the time of data collection.
For example, to study the effect of a drug on headaches, create a control group and a
treatment group, administer a placebo to the control group and the drug to the
treatment group, and then measure and compare the effect in both groups.
Observational Study
We observe the data as-is without any intervention.
For example, to study the relationship among salary, CGPA and number of years of
past experience, we simply collect data on these variables from a sample of students
and analyze it.
Drawing causal inferences from such data is challenging, so one should avoid that
trap when drawing conclusions. For example, suppose you find that the average salary
for girls is higher than that for boys. That does not mean being a girl leads to a
higher salary; there may be many other factors at play!
In this course, we mainly focus on inference from observational data: we will look at
associations, not causality.
Course Evaluation.
You are welcome to ask questions or present viewpoints, but that is not part of CP.
1. Gain familiarity with topics from text and slides before the class.
Needless to say:
1. Do not come late to class (you may be marked absent and/or asked to leave).
2. What was the extent of tax under-reporting among taxpayers in India in FY
2016-17?
3. What was the average household expenditure during the Diwali month last year in
Ahmedabad?
Some terms
To formulate an approach to these questions, the following ideas are essential:
1 Population
3 Sampling Unit
4 Sampling Frame
5 Element
6 Variable
Definition of terms
The population mean is usually denoted by $\bar{X} = \dfrac{X_1 + X_2 + \cdots + X_N}{N}$.
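As a quick numerical illustration of this formula, the following Python sketch computes $\bar{X}$ for a small population of N = 5 values; the numbers are made up for illustration and are not course data.

```python
# Hypothetical population values X_1, ..., X_N (illustrative numbers only).
population = [12.0, 15.0, 9.0, 20.0, 14.0]

def population_mean(values):
    """Return X-bar = (X_1 + X_2 + ... + X_N) / N."""
    return sum(values) / len(values)

X_bar = population_mean(population)
print(X_bar)  # 14.0
```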
Question
Question on Variable
Suppose we are estimating the percentage of households with at least one senior
citizen.
Approaches to Estimate
1. Census: Collect data from every sampling unit in the frame and compute the
exact value.
Nonsampling Errors
(a) Ask the principal to provide you the number of such kids in the school.
(b) Conduct a reading and writing proficiency test for the students and determine the number from the results.
Would you like Congress to win the next election and Rahul Gandhi to become the PM?
Question: Bias
Is there any problem with this question?
Do you think the Bangalore city corporation is justified in cutting down beautiful
green trees to construct the high speed expressway that will perhaps only cater to
VIPs?
To find out how many students in IIMA like a particular mobile App, an email with a
link to download the App is sent to all IIMA students and responses are sought.
Nonsampling errors, when aggregated over a large number of units, can lead to
grossly wrong results.
A classic example is the opinion poll conducted during the 1936 US presidential
election between Landon and Roosevelt.
Sampling Illustration
Values of the variable X for the units in the population are denoted by
$X_1, X_2, \dots, X_N$ (i.e. in capitals).
Sampled values of the variable are denoted $x_1, x_2, \dots, x_n$ (i.e. in small letters).
The estimate of the population mean $\bar{X}$ is denoted $\hat{\bar{X}}$ ("X bar hat"), where
$\hat{\bar{X}} = \bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$.
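A minimal Python sketch of the estimate $\hat{\bar{X}} = \bar{x}$, assuming a small made-up population and an SRSWOR draw of n = 3 units (the values, seed and sample size are hypothetical). Units are sampled by index, so equal values would still be treated as distinct units.

```python
import random

# Hypothetical population values X_1, ..., X_N (illustrative only).
population = [12.0, 15.0, 9.0, 20.0, 14.0, 18.0, 11.0, 16.0]
N = len(population)
n = 3

def estimate_mean(sample):
    """X-bar-hat = x-bar = (x_1 + ... + x_n) / n."""
    return sum(sample) / len(sample)

rng = random.Random(1)                    # fixed seed so the draw is reproducible
indices = rng.sample(range(N), n)         # n distinct units, without replacement
sample = [population[i] for i in indices]  # the sampled values x_1, ..., x_n
x_bar = estimate_mean(sample)
X_bar = sum(population) / N               # the (usually unknown) population mean
print("sample:", sample, "estimate:", x_bar, "true mean:", X_bar)
```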
Sampling Bias
A sample of incoming PGP students is surveyed to determine which part of the
selection process, the CAT or the interview, is more enjoyable.
Sampling Error
Sampling Error arises because the estimate is based on a subset of the population
and not the entire population.
4. Probability Sampling
To find out how many PGP students use contact lenses, you procure a list of all the
(say) 1000 students from the PGP office and randomly choose 100 students from the list.
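The selection step can be sketched in Python with the standard library's `random.sample`, which draws k distinct items so that every subset of size k is equally likely. Here the roll numbers 1 to 1000 are a hypothetical stand-in for the PGP list, and the seed is only for reproducibility.

```python
import random

# Hypothetical sampling frame: roll numbers 1..1000 standing in for the PGP list.
frame = list(range(1, 1001))

rng = random.Random(2024)        # seed only for reproducibility
chosen = rng.sample(frame, 100)  # 100 distinct students, each subset equally likely
print(sorted(chosen)[:10])       # first few selected roll numbers
```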
3. Systematic Sampling
Random sampling, but not every set of n units has an equal chance of making it into
the sample of size n. So if units are not homogeneously distributed, there can be a
bias; e.g. you may end up choosing only corner houses.
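The following Python sketch illustrates the point above under simplifying assumptions (units numbered 0 to N-1, N an exact multiple of n, and a made-up street where every 4th house is a corner house). With interval k = N/n, only k different samples can ever occur, and a periodic layout can make the whole sample consist of corner houses.

```python
import math
import random

def systematic_sample(N, n, start):
    """Every k-th unit from `start`, with k = N // n and units numbered 0..N-1.
    Simplifying assumption: N is an exact multiple of n."""
    k = N // n
    return [start + j * k for j in range(n)]

def random_systematic_sample(N, n, rng):
    """Random start chosen uniformly from 0..k-1, then every k-th unit."""
    return systematic_sample(N, n, rng.randrange(N // n))

# Only k = N // n distinct samples can occur (4 when N = 12, n = 3),
# while SRS allows any of C(12, 3) = 220 subsets.
N, n = 12, 3
possible = {tuple(systematic_sample(N, n, s)) for s in range(N // n)}
print(len(possible), math.comb(N, n))  # 4 220

# Hypothetical street: every 4th house (index % 4 == 0) is a corner house.
is_corner = [i % 4 == 0 for i in range(N)]
for s in range(N // n):
    print(s, [is_corner[i] for i in systematic_sample(N, n, s)])
# Start 0 selects only corner houses; starts 1, 2, 3 select none.

print(random_systematic_sample(1000, 100, random.Random(7))[:5])
```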
4. Probability Sampling
Sampling based on a pre-determined probability distribution. E.g. in Simple Random
Sampling, equal probability is assigned to every possible sample. This is the most
effective way to obtain a representative sample.
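One way to see the "equal probability for every possible sample" property is a quick simulation. The sketch below, with hypothetical N = 5 and n = 2, lists all $\binom{5}{2} = 10$ possible samples and checks that `random.sample` produces each one with relative frequency close to 1/10; the number of repetitions and the seed are arbitrary choices.

```python
import itertools
import math
import random
from collections import Counter

N, n = 5, 2
units = range(1, N + 1)
all_samples = list(itertools.combinations(units, n))
print(len(all_samples), math.comb(N, n))  # 10 10

# Draw many SRSWOR samples; store each as a sorted tuple so order does not matter.
rng = random.Random(0)
reps = 100_000
counts = Counter(tuple(sorted(rng.sample(units, n))) for _ in range(reps))

# Each relative frequency should be close to 1 / C(N, n) = 0.1.
for s in all_samples:
    print(s, round(counts[s] / reps, 3))
```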
Sampling
(i) Nonsampling errors and biases are easier to manage due to the smaller size.
(ii) Sampling errors can be controlled in probability sampling by choosing an
appropriate sample size.
It forms the basis for many other, more complex sampling designs (e.g. stratified
sampling, cluster sampling, etc.).
Note: here a unit can occur more than once in the sample.
Step 2: For SRSWOR, repeat step 1, but discard the sampled unit if it is already
in the sample. Continue until n distinct units are chosen.
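A possible Python sketch of the two procedures. Step 1 is not spelled out above, so it is assumed here to mean drawing one unit from 1, ..., N with equal probability 1/N, repeated n times for SRSWR; the SRSWOR function then follows step 2 literally by discarding repeats. The function names `srswr` and `srswor` are hypothetical.

```python
import random

def srswr(N, n, rng):
    """Assumed step 1: draw a unit from 1..N with probability 1/N, n times.
    Units may repeat (with replacement)."""
    return [rng.randint(1, N) for _ in range(n)]

def srswor(N, n, rng):
    """Step 2: keep drawing as in step 1, discarding any unit already chosen,
    until n distinct units are in the sample."""
    if n > N:
        raise ValueError("cannot choose more distinct units than N")
    sample = []
    while len(sample) < n:
        unit = rng.randint(1, N)
        if unit not in sample:  # discard repeats
            sample.append(unit)
    return sample

rng = random.Random(3)
wr = srswr(10, 6, rng)
wor = srswor(10, 6, rng)
print(wr)
print(wor)
```

Rejecting repeats this way still gives every set of n distinct units the same chance, but it can waste many draws when n is close to N; library routines such as `random.sample` avoid that cost.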
We will use some software packages during this course, mostly Minitab and Excel.
R is a very powerful and useful piece of software, but it is not necessary for this course.
https://www.iima.ac.in/web/faculty/faculty-profiles/karthik-sriram
Click on Teaching and look under Teaching Material.
3. In SRSWR, what is the probability that any particular unit (say unit 1) is present
in a sample of size n?
4. In SRSWOR, what is the probability that any particular unit (say unit 1) is
present in a sample of size n?
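If you want to check your answers to questions 3 and 4 numerically, a Monte Carlo sketch like the one below estimates the inclusion probability of unit 1 by repeated sampling. The choice N = 10, n = 3, the number of repetitions and the seed are arbitrary assumptions; compare the printed values with your formulas.

```python
import random

def prob_unit1_included(N, n, with_replacement, reps=100_000, seed=0):
    """Monte Carlo estimate of P(unit 1 appears in a sample of size n)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        if with_replacement:
            sample = [rng.randint(1, N) for _ in range(n)]  # SRSWR
        else:
            sample = rng.sample(range(1, N + 1), n)         # SRSWOR
        if 1 in sample:
            hits += 1
    return hits / reps

p_wr = prob_unit1_included(10, 3, with_replacement=True)
p_wor = prob_unit1_included(10, 3, with_replacement=False)
print(round(p_wr, 3), round(p_wor, 3))
```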
SRS Estimation
Estimation of Proportion
Smart phone usage for store shopping
Google, along with M/A/R/C research, conducted a survey in the U.S. to study the
impact of smartphones on in-store shopping. They surveyed a random sample of 1,507
smartphone owners who use mobile devices for shopping and who completed a 3-part
survey for a shopping trip. Questions of interest include, e.g., when do they use it:
before going to the store, to search for a store? Do they use it inside the store while
shopping? etc.
Assume the number of smartphone owners is 200000.
What is the population here?
What is the sampling frame?
How many units N are in the sampling frame?
Note that they are using a sample. What is the sample size n?
What assumption are they making about the sample?
Point-Estimator /Point-Estimate
The particular value of $\hat{p}$ obtained from the chosen sample is called the Point-Estimate.
Note that the value of $\hat{p}$ we get depends on the particular sample obtained. Here, the
Point-Estimate of p is $\hat{p} = 79\%$.
If the sampling were repeated, we would choose a different set of 1507 owners, and
hence it is very likely that a different value of $\hat{p}$ would be obtained.
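A small simulation makes this concrete. The sketch below assumes a hypothetical frame of N = 200000 owners in which exactly 79% are coded 1, draws two independent SRSWOR samples of n = 1507, and computes $\hat{p}$ for each; the two estimates will typically differ even though the population is the same.

```python
import random

N, n = 200_000, 1507
ones = 79 * N // 100  # hypothetical: exactly 79% of owners coded 1
population = [1] * ones + [0] * (N - ones)

def p_hat(sample):
    """Sample proportion: mean of the 0/1 values."""
    return sum(sample) / len(sample)

rng = random.Random(11)
# Two independent SRSWOR samples of the same size from the same frame.
estimates = [p_hat(rng.sample(population, n)) for _ in range(2)]
print(estimates)
```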
Sampling Distribution
Note that the Point-Estimator is a random variable, because the value it takes
depends on a random outcome, i.e. the random sample that is drawn.
Sampling Distribution
The collection of all possible values of the Point-Estimator that can occur when we
draw a sample of size n, along with the likelihood of their occurrence (either a
probability density or a probability mass function), is referred to as the Sampling
Distribution of the Point-Estimator.
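For a tiny population the sampling distribution can be written down exactly by listing every possible sample. The Python sketch below assumes a hypothetical population of N = 5 units, 3 of which have the trait (p = 3/5), and SRSWOR with n = 2; each of the 10 samples has probability 1/10, and collecting the resulting values of $\hat{p}$ gives its probability mass function.

```python
import itertools
from collections import Counter
from fractions import Fraction

population = [1, 1, 1, 0, 0]  # hypothetical: N = 5 units, 3 with the trait (p = 3/5)
n = 2

samples = list(itertools.combinations(range(len(population)), n))  # every SRSWOR sample
pmf = Counter()
for s in samples:
    p_hat = Fraction(sum(population[i] for i in s), n)
    pmf[p_hat] += Fraction(1, len(samples))  # each sample has probability 1 / C(N, n)

for value in sorted(pmf):
    print(value, pmf[value])
# 0 1/10
# 1/2 3/5
# 1 3/10
```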
Note that $x_1 \sim \text{Bernoulli}(p)$, i.e., $x_1 = \begin{cases} 1, & \text{with probability } p \\ 0, & \text{with probability } 1-p \end{cases}$.
Concept of Bias
Since the Estimator is a random variable, we can talk about its expected value, i.e.
the mean value of the estimator as per the sampling distribution.
$E(\hat{p}) = \dfrac{1}{n}\sum_{i=1}^{n} E[x_i] = p$
$\text{Bias}(\hat{p}) = E(\hat{p}) - p = 0$
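The unbiasedness result can be checked exactly on a toy example. Assuming a hypothetical population of N = 5 units with p = 3/5 and n = 2, the sketch averages $\hat{p}$ over every equally likely sample, both without replacement (all combinations) and with replacement (all ordered draws), using exact fractions.

```python
import itertools
from fractions import Fraction

population = [1, 1, 1, 0, 0]  # hypothetical population: N = 5, p = 3/5
N, n = len(population), 2
p = Fraction(sum(population), N)

def expected_p_hat(samples):
    """Average of p-hat over a list of equally likely samples (tuples of unit indices)."""
    samples = list(samples)
    total = sum(Fraction(sum(population[i] for i in s), n) for s in samples)
    return total / len(samples)

E_wor = expected_p_hat(itertools.combinations(range(N), n))   # SRSWOR: C(5, 2) = 10 samples
E_wr = expected_p_hat(itertools.product(range(N), repeat=n))  # SRSWR: 5**2 = 25 ordered samples
print(p, E_wor, E_wr, E_wor - p, E_wr - p)  # 3/5 3/5 3/5 0 0
```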
There is no sampling frame from which to draw an SRS for an infinite population, but
the random sample needs to be selected so that
(a) every unit that is selected comes from the same population, and
(b) each unit is selected independently.
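For an infinite (or conceptually infinite) population, conditions (a) and (b) correspond to independent and identically distributed draws. The sketch below mimics this for a Bernoulli variable by drawing each unit with a fresh, independent uniform random number and the same success probability; the value p = 0.79, the sample size and the seed are assumptions for illustration.

```python
import random

def iid_bernoulli_sample(p, n, rng):
    """(a) same population: every draw uses the same p;
    (b) independence: every draw uses a fresh uniform random number."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

rng = random.Random(5)
x = iid_bernoulli_sample(0.79, 10_000, rng)
p_hat = sum(x) / len(x)
print(p_hat)
```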
$SE(\hat{p}) = \sqrt{\dfrac{N-n}{N-1}}\sqrt{\dfrac{p(1-p)}{n}}$
$SE(\hat{p}) = \sqrt{\dfrac{200000-1507}{200000-1}}\sqrt{\dfrac{0.79(1-0.79)}{1507}}$
$SE(\hat{p}) = 0.0105$.
Note that since $E[\hat{p}] = p$, the standard error and the standard deviation of $\hat{p}$ are
the same. In general, an estimator of a population parameter may be biased; for the
problem of estimating proportions, $\hat{p}$ happens to be unbiased.
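The arithmetic above can be reproduced with a few lines of Python. As on the slide, the sample value 0.79 is plugged in for the unknown p; the function name `se_p_hat_srswor` is hypothetical.

```python
import math

def se_p_hat_srswor(p, n, N):
    """SE(p-hat) under SRSWOR: sqrt((N - n)/(N - 1)) * sqrt(p(1 - p)/n)."""
    return math.sqrt((N - n) / (N - 1)) * math.sqrt(p * (1 - p) / n)

se = se_p_hat_srswor(0.79, 1507, 200_000)
print(round(se, 4))  # 0.0105
```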
For SRSWR, $SE(\hat{p}) = \sqrt{\dfrac{p(1-p)}{n}}$; for SRSWOR, $SE(\hat{p}) = \sqrt{\dfrac{N-n}{N-1}}\sqrt{\dfrac{p(1-p)}{n}}$.
Using the formulas for SE, we can choose n while planning the survey so as to control
the sampling error to a desired level of accuracy. This is possible only when the
survey is planned using probability sampling.
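A minimal planning sketch: given a guess for p and a target SE, search for the smallest n that meets the target under each design. The planning values (p = 0.5, target SE = 0.01, N = 200000) are hypothetical; p = 0.5 is a conservative guess because p(1-p) is largest there. A simple linear search is used instead of a closed form to keep the logic transparent.

```python
import math

def se_srswr(p, n):
    """SE(p-hat) for SRSWR: sqrt(p(1 - p)/n)."""
    return math.sqrt(p * (1 - p) / n)

def se_srswor(p, n, N):
    """SE(p-hat) for SRSWOR: finite population correction times the SRSWR SE."""
    return math.sqrt((N - n) / (N - 1)) * se_srswr(p, n)

def smallest_n(se_fn, target):
    """Smallest n >= 1 with se_fn(n) <= target (SE decreases as n grows)."""
    n = 1
    while se_fn(n) > target:
        n += 1
    return n

p_guess, target, N = 0.5, 0.01, 200_000  # hypothetical planning values
n_wr = smallest_n(lambda n: se_srswr(p_guess, n), target)
n_wor = smallest_n(lambda n: se_srswor(p_guess, n, N), target)
print(n_wr, n_wor)
```

As expected, SRSWOR needs a slightly smaller sample than SRSWR for the same target, because the finite population correction is below 1.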
In the past, street protests have led to a military coup in Egypt. The size of the crowd
determines the strength of such protests.
Crowd sizes are often estimated from aerial photographs of the crowd!
Since an audit is a time-consuming process, the auditor plans to take an SRSWOR
sample of 5 businesses. She wants to answer the following questions:
(c) What is the estimated total loss to the Income-tax department from these businesses?
(d) A measure of accuracy must also be reported for these estimates.
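Questions (c) and (d) call for an estimate of a population total together with a standard error. The sketch below assumes the usual textbook SRSWOR estimator $\hat{T} = N\bar{x}$ with $SE(\hat{T}) = N\sqrt{(1 - n/N)\,s^2/n}$, where $s^2$ is the sample variance with divisor n - 1; the population size N = 50 and the five loss amounts are hypothetical, not the exercise data.

```python
import math
import statistics

def estimate_total_srswor(sample, N):
    """Assumed textbook estimator: T-hat = N * x-bar, with
    SE(T-hat) = N * sqrt((1 - n/N) * s^2 / n), s^2 = sample variance (n - 1 divisor)."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s2 = statistics.variance(sample)  # divides by n - 1
    total = N * x_bar
    se = N * math.sqrt((1 - n / N) * s2 / n)
    return total, se

# Hypothetical: N = 50 businesses, loss amounts for the 5 audited ones.
losses = [2.0, 0.0, 5.0, 1.0, 2.0]
total_hat, se_total = estimate_total_srswor(losses, N=50)
print(total_hat, round(se_total, 2))
```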
Exercise: Population
Do the following