ECON1203/ECON2292

Business and Economic Statistics
Week 3
Week 3 topics
Data collection and sampling
Random sampling
Probability distributions
Marginal, conditional & joint distributions
Sampling with & without replacement
Key references
Keller 5.1-5.4, 6.1-6.3
Stages of statistical analysis
Define & understand the problem
E.g., firm wants to determine effectiveness of its advertising
Collect data
Analyse data
Use descriptive statistics to organize the data
What are the sales before and after the ads?
Extract information about population parameters
Can we conclude that sales after the ads are higher than sales before the ads?
Communicate results
Class project will provide experience in such analyses
Data collection
In practice, we can often find secondary data
Data collected by someone else (e.g., the ABS), possibly for some other purpose
Alternatively, we could collect primary data
E.g., market researchers using mail survey or
phone interview to determine impact of an ad
Data collection...
In an observational study, the researcher observes the behaviour of the individuals in the study without trying to influence the outcome of the study. The data obtained are called observational data
E.g., a researcher at a pharmaceutical company wants to determine the effect of a drug on heart attacks. The researcher may select a sample of people and ask whether they have taken the drug in the last 2 years and how many heart attacks they have had over the same period.
Data collection...
In an experimental study, the researcher manipulates the independent (explanatory) variables and tries to determine how they influence the dependent (response) variable. The data obtained are called experimental data
E.g., the researcher may randomly select some people and divide them into two groups. One group takes the drug regularly (treatment group) and the other does not (control group). The researcher then observes the number of heart attacks in the two groups over 2 years. Statistical methods can be used to determine the influence of the drug.
Threats to comparison: confounding
Does radiation from mobile phones cause cancer?
An observational study would compare a sample of users
with non-users
If users have a higher incidence of cancer, is this evidence that mobile phones cause cancer?
Note: There is no consistent evidence that this is the case
But even if a relationship were found, you would need to control for other factors that might explain the finding
Possible confounding factors?
Higher use of mobile phones in cities where exposure to
other forms of radiation is higher
Threats to comparison: confounding...
Suppose an experimental study of the mobile
phone cancer link is designed
Subjects randomly assigned to
Control group of non-users
Treatment group of users
Wait a few years & observe differences in cancer
rates
Why is group allocation done at random?
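One way to think about the question: random allocation tends to balance confounding factors across the two groups, even ones the researcher never measured. The minimal simulation below assumes a hypothetical population of 10,000 subjects in which 60% live in cities (standing in for the city-radiation confounder from the previous slide); the population, the 60% share and the helper `city_share` are illustrative assumptions, not data.

```python
import random

# Hypothetical population: True = lives in a city (assumed confounder, 60% share).
random.seed(1)
population = [random.random() < 0.6 for _ in range(10_000)]

# Randomly allocate subjects: shuffle the indices, first half = treatment.
subjects = list(range(len(population)))
random.shuffle(subjects)
half = len(subjects) // 2
treatment = subjects[:half]
control = subjects[half:]

def city_share(group):
    """Proportion of city dwellers among the subject indices in `group`."""
    return sum(population[i] for i in group) / len(group)

print(f"city share, treatment: {city_share(treatment):.3f}")
print(f"city share, control:   {city_share(control):.3f}")
```

With random allocation the two shares come out close to each other, so a later difference in cancer rates is harder to attribute to city residence.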
Data collection
Data collection requires
Definition of target population
Method of sampling
The method of sampling must avoid the problem of bias, where the design of the sample systematically favours certain outcomes
What's wrong with phone-in polls on talkback radio?
Example of a self-selected sample
Simple random sampling: a process by which all samples of the same size n are equally likely to be chosen from the population of interest
Data collection
Producing a simple random sample
List and number all the individuals in the population of interest from 1 to N
Use a random number table or computer to randomly generate n distinct numbers between 1 and N; the individuals with those numbers form the sample
Stratified random sampling: separate the population into mutually exclusive sets (strata), and then draw a simple random sample from each stratum
How to draw a sample of households in Australia?
Cluster sampling: separate the population into mutually exclusive sets (clusters), then draw a simple random sample of the clusters
How to draw a sample of the students in today's class?
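The three sampling schemes above can be sketched with Python's standard `random` module. This is a minimal illustration under stated assumptions: the population of 20 numbered individuals, their made-up "state" (strata) and "tutorial" (clusters) labels, and the function names are all hypothetical. For cluster sampling the sketch assumes every member of a chosen cluster is included (one-stage cluster sampling).

```python
import random

random.seed(42)

# Hypothetical population of N = 20 individuals numbered 1..N.
N = 20
population = list(range(1, N + 1))
# Made-up strata (state) and clusters (tutorial group of 5) for illustration.
state = {i: ("NSW" if i <= 8 else "VIC" if i <= 14 else "QLD") for i in population}
tutorial = {i: (i - 1) // 5 + 1 for i in population}   # 4 clusters of 5

def simple_random_sample(units, n):
    """Every subset of size n equally likely: n distinct units, no replacement."""
    return random.sample(units, n)

def stratified_sample(units, stratum_of, n_per_stratum):
    """Draw a simple random sample of size n_per_stratum within each stratum."""
    strata = {}
    for u in units:
        strata.setdefault(stratum_of[u], []).append(u)
    sample = []
    for members in strata.values():
        sample.extend(random.sample(members, n_per_stratum))
    return sample

def cluster_sample(units, cluster_of, n_clusters):
    """Draw a simple random sample of clusters; keep everyone in those clusters."""
    clusters = sorted(set(cluster_of.values()))
    chosen = random.sample(clusters, n_clusters)
    return [u for u in units if cluster_of[u] in chosen]

srs = simple_random_sample(population, 5)
strat = stratified_sample(population, state, 2)
clus = cluster_sample(population, tutorial, 2)
print("SRS:       ", srs)
print("Stratified:", strat)
print("Cluster:   ", clus)
```

Stratifying guarantees each state is represented; cluster sampling is cheaper when listing whole groups (e.g., tutorials) is easier than listing individuals.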
Probability
Mathematical means of studying uncertainty
E.g., setting premiums for insurance policies
Provides logical foundation of statistical inference
Making decisions on basis of partial information
E.g., what is the probability of a sell-out at an upcoming concert, given data on past concerts, weather forecasts, etc.?
Probability review
A random experiment is an action or process that has
several possible outcomes
The set of all possible outcomes is a sample space
An event consists of one or several outcomes
The probability of event E, denoted by P(E), measures how likely it is that E will occur

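Probability review...
If all outcomes in the sample space are equally likely,
$$P(E) = \frac{\text{number of outcomes in } E}{\text{total number of possible outcomes}}$$
For any event E,
$$0 \le P(E) \le 1$$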
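As a small worked check of the counting definition, assume one roll of a fair die (six equally likely outcomes) and let E be "roll an even number". The die example and the helper name `classical_probability` are illustrative assumptions; `Fraction` keeps the answer exact.

```python
from fractions import Fraction

# Sample space for one roll of a fair die (equally likely outcomes assumed).
sample_space = {1, 2, 3, 4, 5, 6}
E = {2, 4, 6}   # event: roll an even number

def classical_probability(event, space):
    """P(E) = (number of outcomes in E) / (total number of possible outcomes)."""
    return Fraction(len(event & space), len(space))

p = classical_probability(E, sample_space)
print(p)   # 1/2
```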
Probability review
Often easier to work with the probability of the event complementary to event E, i.e., "not E", denoted $E^c$
$$P(E^c) = 1 - P(E)$$
The conditional probability that E occurs, given that F has occurred, is defined by P(E | F) = P(E and F) / P(F)
Similarly, P(F |E) = P(E and F) / P(E)
Rearranging yields the multiplication rule
P(E and F) = P(F)P(E |F)
If P(E | F) = P(E), then E & F are said to be independent
i.e., conditioning has no effect
Equivalently, P(F | E) = P(F), or P(E and F) = P(E)P(F)
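The complement rule, the definition of conditional probability, the multiplication rule and the independence check can all be verified on a small example. The sketch assumes a fair die with E = "even" and F = "at most 4"; these events and the helper names `P` and `P_given` are illustrative, not from the source.

```python
from fractions import Fraction

space = {1, 2, 3, 4, 5, 6}   # fair die, equally likely outcomes (assumed)
E = {2, 4, 6}                # even
F = {1, 2, 3, 4}             # at most 4

def P(event):
    """Classical probability of an event (subset of the sample space)."""
    return Fraction(len(event), len(space))

def P_given(a, b):
    """Conditional probability P(A | B) = P(A and B) / P(B)."""
    return P(a & b) / P(b)

p_not_E = 1 - P(E)                   # complement rule
p_E_and_F = P(F) * P_given(E, F)     # multiplication rule
independent = P_given(E, F) == P(E)  # conditioning on F has no effect?

print(p_not_E, p_E_and_F, independent)   # 1/2 1/3 True
```

Here P(E | F) = (1/3)/(2/3) = 1/2 = P(E), so E and F are independent, and P(F | E) = P(F) holds as well.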
SIA: Private health insurance
Abbott defends private health rebate July 17, 2006 - 3:16PM
http://www.smh.com.au/articles/2006/07/17/1152988457337.html

Health Minister Tony Abbott has hit out at critics of the Federal Government's private health insurance rebate.

"I think that there has been a consistent line of ideological hostility to the private health insurance rebate from a certain
style of academic and indeed from the Labor Party," Mr Abbott told ABC Radio today. Mr Abbott's comments came
after University of NSW professor Denzil Fiebig released a study of the rebate's effectiveness.

Professor Fiebig claimed the rebate was unsustainable and was no guarantee that people with private health
insurance would use the private health system over the public one.

But Mr Abbott said the rebate took pressure off public hospitals.
"If we didn't have the rebate we'd have a lot more pressure on the public hospital system," Mr Abbott said.

He said Professor Fiebig was part of an ideological group who would never support private health care. Giving people
choice between private and public health care was vital, Mr Abbott said.
SIA: Private health insurance...
What is the relationship between having private
health insurance (PHI) & being admitted to hospital
as a public or private patient?
Data from the ABS 2001 National Health Survey were used to derive the following table of relative frequencies (%)
Admission to hospital in last 12 months

              Not admitted   Admitted as      Admitted as       Totals
                             public patient   private patient
Have PHI          40.3           1.5              5.4            47.2
No PHI            45.3           6.6              0.9            52.8
Totals            85.6           8.1              6.3           100.0

SIA: Private health insurance...
Treat relative frequencies as probabilities
Find the probability of the following events:
Have PHI
Have PHI & admitted to hospital
Admitted as a private patient, given have PHI
Admitted as a public patient, given have PHI
Are admission as a private patient & having PHI independent events?
Marginal probability
P(PHI) = 0.472
Joint probability
P(PHI and admitted) = 0.015 + 0.054 = 0.069
Conditional probabilities
P(admitted as private | PHI) = 0.054/0.472 = 0.114
P(admitted as public | PHI) = 0.015/0.472 = 0.032
Independence
Admission & PHI status are related (dependent), since
P(admitted as private | PHI) = 0.114 ≠ 0.063 = P(admitted as private)
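The same marginal, joint and conditional probabilities can be read off the table programmatically. This minimal sketch simply stores the table's relative frequencies (%) in a dictionary; the dictionary layout and variable names are illustrative choices, and the final comparison mirrors the independence check above.

```python
# Relative frequencies (%) from the ABS 2001 National Health Survey table.
table = {
    "PHI":    {"not admitted": 40.3, "public": 1.5, "private": 5.4},
    "no PHI": {"not admitted": 45.3, "public": 6.6, "private": 0.9},
}

# Treat relative frequencies as probabilities.
prob = {row: {col: v / 100 for col, v in cols.items()} for row, cols in table.items()}

p_phi = sum(prob["PHI"].values())                                     # marginal
p_phi_and_admitted = prob["PHI"]["public"] + prob["PHI"]["private"]   # joint
p_private_given_phi = prob["PHI"]["private"] / p_phi                  # conditional
p_public_given_phi = prob["PHI"]["public"] / p_phi                    # conditional
p_private = prob["PHI"]["private"] + prob["no PHI"]["private"]        # column marginal

print("P(PHI) =", round(p_phi, 3))
print("P(PHI and admitted) =", round(p_phi_and_admitted, 3))
print("P(private | PHI) =", round(p_private_given_phi, 3))
print("P(public | PHI) =", round(p_public_given_phi, 3))
print("P(private) =", round(p_private, 3))
```

Because P(private | PHI) differs clearly from P(private), the two events are not independent.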
Auditing example
Let's draw on both topics of probability & sampling
Background
Local auditing firm services 100 companies
10 of these companies are known to have overdue
accounts
If 2 separate firms are chosen at random, what is the probability distribution for the number of overdue accounts?
Auditing example
Let $e_i$ denote the event that the $i$th firm chosen has an overdue account
$$P(\text{none overdue}) = P(e_1^c \text{ and } e_2^c) = P(e_1^c)\,P(e_2^c \mid e_1^c) = \frac{90}{100}\times\frac{89}{99} = 0.809$$
$$P(\text{one overdue}) = P(e_1 \text{ and } e_2^c) + P(e_1^c \text{ and } e_2) = \frac{10}{100}\times\frac{90}{99} + \frac{90}{100}\times\frac{10}{99} = 0.182$$
$$P(\text{two overdue}) = 1 - 0.809 - 0.182 = 0.009$$
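As a brute-force check of these three probabilities, the sketch below enumerates every ordered pair of two different companies (sampling without replacement), assuming all such pairs are equally likely. The 0/1 list encoding of the 100 companies is an illustrative choice.

```python
from fractions import Fraction
from itertools import permutations

# 100 companies: 1 = overdue account (10 of them), 0 = not overdue (90).
companies = [1] * 10 + [0] * 90

# All ordered pairs of two different companies, each equally likely
# (sampling without replacement).
counts = {0: 0, 1: 0, 2: 0}
total = 0
for a, b in permutations(range(len(companies)), 2):
    counts[companies[a] + companies[b]] += 1
    total += 1

dist = {k: Fraction(v, total) for k, v in counts.items()}
for k, p in dist.items():
    print(k, "overdue:", p, round(float(p), 3))
```

The exact answers are 89/110, 2/11 and 1/110, which round to 0.809, 0.182 and 0.009.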
Sampling
Auditing example illustrates distinction between
sampling with & without replacement
Because the problem specified separate firms, sampling was done without replacement
Such sampling induces dependence across events
P(e_2) depends on the type of firm sampled in the first draw
Independence across draws, and hence random sampling in the sense of independent draws, requires sampling with replacement
The distinction is not of practical importance if the population is large relative to the sample
Redo auditing example assuming sampling with
replacement
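One possible way to redo the example with replacement: each draw is then independent with P(overdue) = 10/100 on both draws, so branch probabilities simply multiply. The sketch below enumerates the four (first, second) outcomes; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

p = Fraction(10, 100)   # probability a randomly chosen company is overdue

# With replacement the two draws are independent, so probabilities multiply.
dist = {0: Fraction(0), 1: Fraction(0), 2: Fraction(0)}
for first, second in product([0, 1], repeat=2):
    prob = 1
    for overdue in (first, second):
        prob *= p if overdue else 1 - p
    dist[first + second] += prob

for k, q in dist.items():
    print(k, "overdue:", q, float(q))
```

This gives 0.81, 0.18 and 0.01, close to the without-replacement values 0.809, 0.182 and 0.009, consistent with the point that the distinction matters little when the population is large.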
Sampling
Events in experiments such as the auditing example may be represented by probability trees
These are diagrams (that resemble trees)
Keller Ex 6.5 looks at selecting 2 students (without
replacement) from 10 students (3 F, 7 M)
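A probability tree can also be generated mechanically: each level is one draw, and each branch probability is the number of remaining students of that type divided by the number remaining overall. The sketch below applies this to the setting described (2 students from 3 F and 7 M, without replacement); it is an illustration, not Keller's own solution, and the function name `tree` is hypothetical.

```python
from fractions import Fraction

def tree(counts, draws):
    """Return {path: probability} for every branch of a without-replacement tree."""
    branches = {(): Fraction(1)}
    for _ in range(draws):
        new = {}
        for path, prob in branches.items():
            remaining = dict(counts)
            for label in path:          # remove the items already drawn on this path
                remaining[label] -= 1
            left = sum(remaining.values())
            for label, n in remaining.items():
                if n > 0:
                    new[path + (label,)] = prob * Fraction(n, left)
        branches = new
    return branches

for path, prob in tree({"F": 3, "M": 7}, 2).items():
    print("".join(path), prob)
```

The four branches are FF = 1/15, FM = 7/30, MF = 7/30 and MM = 7/15, which sum to 1.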
Auditing example revisited
Assumed probability of an overdue account was
known
Enabled determination of probability distribution of
number of overdue accounts
Consider a different problem
Suppose probability of an overdue account is not known
Given a sample of companies, the auditing firm can determine the sample proportion of these that have overdue accounts
How can this statistic be used to infer something about the
true population proportion of companies with overdue
accounts?
This is an inference problem
Progress report #2
In the SIA, admission distributions
Assign probabilities to qualitative characteristics
Public or private patient
In auditing example
Probabilities assigned to quantitative characteristics
Probabilities of number of overdue accounts
Topic of random variables
Need to introduce theoretical distributions that are
useful in representing/modelling actual data
Initially, discrete distributions
Binomial
