
2. Statistical Inference: Single Population Mean and Proportion (Review)
ECON 251
Research Methods
Descriptive Statistics vs. Inferential Statistics
Descriptive statistics: calculating summary characteristics of data.
Inferential statistics: using sample summary measures to estimate population characteristics.
In descriptive statistics we summarize the data from a population or a sample of it.
In inferential statistics, data on the population is NOT available. We take a sample and use its summarizing measures to estimate the unknown population characteristics.
[Diagram: Descriptive statistics: population or sample -> summarize the data. Inferential statistics: sample -> find summarizing measures -> inference -> unknown population characteristics.]
Statistical Inference Review
There are two procedures for making inference: Hypothesis Testing (HT) and Estimation.
In estimation, we attempt to estimate the value of the parameter in either of two ways:
Point Estimator: a point estimator draws inference about a population by estimating the value of an unknown parameter using a single value, or point.
Interval Estimator: an interval estimator draws inference about a population by estimating the value of an unknown parameter using an interval.
We use intervals so we can be precise about our degree of certainty regarding the sample statistic's proximity to the population parameter.
HT involves testing a specific belief about the value of the parameter.
HT concepts are the foundation for estimation as well, so we begin there.
4 Steps For Hypothesis Testing
Step 1: Set up alternative & null hypotheses
Step 2: Calculate the test statistic
Step 3: Find critical values (rejection region method) or find the p-value (p-value method)
Step 4: Make a decision
Step One: Set up alternative & null hypotheses
The purpose of hypothesis testing is to determine whether there is enough statistical evidence in favor of a certain belief about a population parameter.
There are two hypotheses (about a population parameter or parameters):
$H_0$ - the null hypothesis [for example, $H_0: \mu = 5$]
$H_1$ - the alternative hypothesis [for example, $H_1: \mu > 5$]

Step One: Set up alternative & null hypotheses (continued)
The alternative hypothesis is the most important: it is what you are trying to prove. Always start by stating the alternative first.
The alternative can involve $>$, $<$ or $\neq$.
The alternative establishes whether the test is one-tailed or two-tailed.
The alternative establishes the location of the rejection region(s).
Once you have correctly defined the alternative, the null is easy to establish.
We always assume the null is true, therefore $H_0$ MUST contain $=$, and may contain $\leq$ or $\geq$.
Step Two: Calculating test statistics

Population Mean w/ Sigma known: $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

Population Mean w/ Sigma unknown: $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$

Population Proportion: $z = \frac{\hat{p} - p}{\sqrt{p(1 - p) / n}}$
Step Two: Calculating test statistics
The standardization formulas provide the test statistic.
They convert our sample statistic from the sampling
distribution to the standardized distribution (t or z in this
case).
There are millions of sampling distributions. Rather than
knowing everything about every one of those distributions, we
standardize our statistic thereby moving it from the sampling
distribution and placing it on the standardized distribution.
We know everything there is to know about the standardized
distribution. Because the test statistic is on the standardized
distribution, we can compare the test statistic to a critical
value, or the area associated with the test statistic (p-value) to
alpha.
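The three standardization formulas from Step Two can be sketched in Python as follows. This is a minimal illustration rather than part of the original slides: the function names are hypothetical, and the numbers in the usage lines are made up purely for demonstration.

```python
import math

def z_stat_mean(x_bar, mu_0, sigma, n):
    """z statistic for a population mean when sigma is known."""
    return (x_bar - mu_0) / (sigma / math.sqrt(n))

def t_stat_mean(x_bar, mu_0, s, n):
    """t statistic for a population mean when sigma is unknown (uses sample s)."""
    return (x_bar - mu_0) / (s / math.sqrt(n))

def z_stat_proportion(p_hat, p_0, n):
    """z statistic for a population proportion; the standard error uses the hypothesized p_0."""
    return (p_hat - p_0) / math.sqrt(p_0 * (1 - p_0) / n)

# Made-up inputs, chosen so the arithmetic is easy to check by hand.
z_mean = z_stat_mean(x_bar=52.0, mu_0=50.0, sigma=10.0, n=100)  # 2 / (10 / 10)
t_mean = t_stat_mean(x_bar=52.0, mu_0=50.0, s=8.0, n=64)        # 2 / (8 / 8)
z_prop = z_stat_proportion(p_hat=0.55, p_0=0.50, n=100)         # 0.05 / 0.05
print(z_mean, t_mean, round(z_prop, 4))
```

Each function only moves the sample statistic onto the standardized scale; the decision itself still requires a critical value or p-value (Step Three).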
Step Three: Find critical value or p-value
You need to decide which method you are going to use to
make your decision.
If you are doing the calculations by hand, you will frequently
use the rejection region (critical value) method.
The critical value will either be given to you (exams, in-class examples) or you can find it in Excel (NORMSINV, TINV).
The p-value method will frequently be used when you are using software to do your calculations, as most programs provide these values. You can also find them in Excel (NORMSDIST, TDIST).
In the latter case, be sure you can identify the p-value
graphically as well.
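For readers working outside Excel, the z-based lookups can be sketched with Python's standard library. This is an illustration, not the course's prescribed tool: `NormalDist.inv_cdf` plays the role of NORMSINV and `NormalDist.cdf` the role of NORMSDIST. The standard library has no t distribution, so t critical values would still come from a table or another package. The test statistic below is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Critical values via the inverse CDF (the NORMSINV analogue).
z_05 = std_normal.inv_cdf(1 - 0.05)    # upper-tail critical value for alpha = 0.05
z_025 = std_normal.inv_cdf(1 - 0.025)  # two-tailed critical value for alpha = 0.05

# p-value via the CDF (the NORMSDIST analogue), for a hypothetical test statistic.
test_stat = 2.0
p_upper = 1 - std_normal.cdf(test_stat)  # area to the right of the test statistic

print(round(z_05, 3), round(z_025, 3), round(p_upper, 4))
```

The printed critical values should agree, to rounding, with the $z_{0.05}$ and $z_{0.025}$ entries quoted in the example problems.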
Decision Rule: rejection region (critical value) method
Reject $H_0$ if the test statistic is more extreme than the critical value, given the significance level (probability of Type I error) $\alpha$.
Two-sided alternative: rejection regions in both tails, beyond the critical values $-z_{\alpha/2}$ and $z_{\alpha/2}$.
One-sided (upper tail) alternative: rejection region in the upper tail, beyond the critical value $z_{\alpha}$.
One-sided (lower tail) alternative: rejection region in the lower tail, beyond the critical value $-z_{\alpha}$.
In the case of the t distribution we will have $t_{\alpha/2}$ and $t_{\alpha}$ respectively.
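The rejection-region rule can be written as a small function. This is a sketch under stated assumptions: the function name and the `alternative` labels are hypothetical, the critical value is passed in as a positive magnitude ($z_{\alpha}$ or $t_{\alpha}$ for one-sided tests, $z_{\alpha/2}$ or $t_{\alpha/2}$ for two-sided tests), and the test statistics in the usage lines are invented.

```python
def reject_h0(test_stat, critical_value, alternative):
    """Return True if the test statistic falls in the rejection region.

    critical_value is a positive magnitude taken from a z or t table.
    alternative is one of "two-sided", "upper", "lower".
    """
    if alternative == "two-sided":
        return abs(test_stat) > critical_value
    if alternative == "upper":
        return test_stat > critical_value
    if alternative == "lower":
        return test_stat < -critical_value
    raise ValueError("alternative must be 'two-sided', 'upper' or 'lower'")

# Invented test statistics, compared against z critical values at alpha = 0.05.
two_sided = reject_h0(2.10, 1.96, "two-sided")   # 2.10 lies beyond 1.96
upper_tail = reject_h0(1.50, 1.645, "upper")     # 1.50 does not exceed 1.645
lower_tail = reject_h0(-1.80, 1.645, "lower")    # -1.80 lies below -1.645
print(two_sided, upper_tail, lower_tail)
```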
Decision Rule: p-value method
The p-value measures the amount of evidence in favor of the alternative hypothesis: the smaller the p-value, the more evidence in favor of the alternative (and the more likely you will reject $H_0$). The p-value is most commonly compared to an $\alpha$ of 5% for the Reject/DNR decision:
Reject $H_0$ if the p-value is smaller than the significance level.
Two-sided alternative: p-value = the area to the left of $-|t_m|$ plus the area to the right of $|t_m|$ (each tail holds p-value/2).
One-sided (upper tail) alternative: p-value = the area to the right of $t_m$.
One-sided (lower tail) alternative: p-value = the area to the left of $t_m$.
($t_m$ = test statistic; the same holds true for $|Z_m|$ and $Z_m$.)
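The three tail-area definitions translate directly into code when the test statistic is a z value. This is a minimal sketch using only the standard library; the function name and the `alternative` labels are hypothetical, and a t statistic would need a t distribution from a table or another package instead.

```python
from statistics import NormalDist

def p_value_z(z, alternative):
    """p-value for a z test statistic under the stated alternative."""
    cdf = NormalDist().cdf
    if alternative == "upper":
        return 1 - cdf(z)             # area to the right of z
    if alternative == "lower":
        return cdf(z)                 # area to the left of z
    if alternative == "two-sided":
        return 2 * (1 - cdf(abs(z)))  # both tails beyond |z|
    raise ValueError("alternative must be 'upper', 'lower' or 'two-sided'")

alpha = 0.05
p_two = p_value_z(2.5, "two-sided")  # hypothetical test statistic
p_up = p_value_z(2.5, "upper")
decision = "Reject H0" if p_two < alpha else "DNR H0"
print(round(p_two, 4), round(p_up, 4), decision)
```

Note that the two-sided p-value is exactly twice the one-sided value for the same statistic, which is the "each tail holds p-value/2" statement in the slide.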

Three steps to finding the p-value from a graph.
1. Find the test statistic.
2. Draw an arrow from the test statistic to the extreme end of the nearest rejection region.
3. If it is a two-tailed test, do this on the opposite side of the distribution as well.
The area of the graph that has an arrow through it is the p-value.
Try showing the p-value graphically in these 4 examples. In each case, assume that the critical value is 3.2:

Example 1: $H_0: \mu = 5$; $H_1: \mu > 5$; Test stat = 7
Example 2: $H_0: \mu = 5$; $H_1: \mu > 5$; Test stat = 3
Example 3: $H_0: \mu = 5$; $H_1: \mu < 5$; Test stat = 3
Example 4: $H_0: \mu = 5$; $H_1: \mu \neq 5$; Test stat = 7
Using the p-value is the most common method of making your decision, as most computer software provides this value. However, you must graph your distribution before making a final determination.
Step Four: Make your decision
Make one of the following two conclusions based on the test:

Reject the null hypothesis in favor of the alternative
hypothesis.
There ___ enough evidence to infer that the alternative is true

Do not reject the null hypothesis in favor of the alternative
hypothesis.
There _______ enough evidence to infer that the alternative
is true
Errors
Two types of errors are possible when making a decision:
Type I error - reject $H_0$ when $H_0$ is true.
Type II error - do not reject $H_0$ when $H_0$ is false.

States of Nature
Decision       | $H_0$ is true                                | $H_0$ is false
DNR $H_0$      | $1 - \alpha$                                 | Type II error, $\beta$
Reject $H_0$   | Type I error, $\alpha$ (significance level)  | $1 - \beta$ (power of test)
Analogy: Hypothesis testing is similar to a jury trial
Assume innocent until proven guilty: assume $H_0$ is true until proven otherwise.
The court either finds the defendant guilty (Reject $H_0$) or not guilty (DNR $H_0$).
Courts do not prove a person innocent (Accept $H_0$); a not-guilty verdict means only that there was not enough evidence to prove guilt. Similarly, if we DNR $H_0$, we are not saying $H_0$ is true, only that there is not enough evidence for us to believe otherwise.
What level of proof is required to establish a guilty verdict? What if you convict an innocent person?
This is identical to establishing the significance level of the test.
A Type I error ($\alpha$) is equivalent to convicting an innocent person. We focus on $\alpha$, rather than worry about a Type II error ($\beta$), which is equivalent to releasing a guilty person.
"Beyond a reasonable doubt" is the court-of-law norm.
Errors
It would be desirable to reduce both types of errors at the same time. But for a given sample size this is NOT possible.
There is a trade-off between $\alpha$ and $\beta$. As we try to decrease $\alpha$, $\beta$ will increase, and vice versa.
Because the consequences of a Type I error are in most circumstances considered to be of greater concern than those of a Type II error (sending an innocent person to jail is worse than letting a guilty person go), we focus on controlling the size of the Type I error.
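The trade-off can be made concrete by computing $\beta$ for several values of $\alpha$ in one fixed scenario. The sketch below is illustrative only: the scenario (upper-tail z test of $H_0: \mu = 50$ with a true mean of 52, $\sigma = 10$, $n = 25$) is entirely made up, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def type_ii_error(alpha, mu_0, mu_true, sigma, n):
    """beta for an upper-tail z test of H0: mu = mu_0 when the true mean is mu_true."""
    se = sigma / math.sqrt(n)
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    critical_x_bar = mu_0 + z_alpha * se  # smallest sample mean that leads to rejection
    # beta = probability the sample mean falls below the cutoff when mu_true holds
    return NormalDist(mu_true, se).cdf(critical_x_bar)

# Made-up scenario; only alpha changes between the three calculations.
betas = {a: type_ii_error(a, mu_0=50, mu_true=52, sigma=10, n=25)
         for a in (0.10, 0.05, 0.01)}
for a, b in betas.items():
    print(f"alpha = {a:.2f}  beta = {b:.3f}  power = {1 - b:.3f}")
```

With everything else held fixed, shrinking $\alpha$ pushes the cutoff further into the tail, so $\beta$ grows and the power $1 - \beta$ falls.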
Errors
The standard in statistics varies depending upon the issue at stake:
______________ evidence = 1% significance level
__________ evidence = 1.001-5% significance level
__________ evidence = 5.001-10% significance level
__________ evidence = 10.001% or higher significance level

Unless stated specifically to the contrary, assume we are using $\alpha = 0.05$ in all problems.
Examples: Hypothesis Testing
#1 A Nielsen survey estimated in the year 2000 that the mean number of hours of television viewing per household was 7.25 hours per day. The survey involved 250 households. The sample data had a standard deviation of 2.5 hours per day. In 1990, it was determined that the population mean of viewing per household was 6.70 hours per day. Has TV viewing increased since 1990?
($t_{249,0.005}=2.596$, $t_{249,0.01}=2.34$, $t_{249,0.025}=1.9695$, $t_{249,0.05}=1.651$);
($z_{0.005}=2.58$, $z_{0.01}=2.33$, $z_{0.025}=1.96$, $z_{0.05}=1.645$)
Hypothesis Testing 4 Step Solution
Identify the alternative and null hypotheses.
$H_0$:
$H_1$:
Calculate the test statistic.
$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} =$
Find the critical value or p-value.
$Z_{0.05} =$
Make the decision.
_______ $H_0$ in favor of the alternative. There is ___________ proof that TV viewing has increased since 1990.
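One way to check your arithmetic for example #1 is sketched below. It is a possible reading of the problem, not an official solution: it assumes $H_0: \mu = 6.70$ against $H_1: \mu > 6.70$ at the default $\alpha = 0.05$, and, because only the sample standard deviation is given, it compares the statistic with both the t and the z critical values quoted with the problem.

```python
import math

# Values taken from example #1.
x_bar, mu_0, s, n = 7.25, 6.70, 2.5, 250

# Standardize the sample mean (the sample s stands in for sigma).
test_stat = (x_bar - mu_0) / (s / math.sqrt(n))

t_crit = 1.651  # t_{249,0.05} from the problem statement
z_crit = 1.645  # z_{0.05} from the problem statement

# Upper-tail test: reject when the statistic exceeds the critical value.
reject_with_t = test_stat > t_crit
reject_with_z = test_stat > z_crit
print(round(test_stat, 2), reject_with_t, reject_with_z)
```

With 249 degrees of freedom the t and z critical values are nearly identical, so the choice between them does not change the comparison here.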

#2 The owners of Subway claim that their stores average $875,000 in annual sales. You used this information in deciding to open a store in Delaware. Your store, however, has not come even close to these annual sales figures. You want to prove that you were misled, and that the average figure for all stores is actually less than $875,000. You collect annual sales figures from 70 randomly selected stores. The average in your sample turns out to be $856,000, with a standard deviation of $24,000. You also know from a friend who is in management of a similar franchise that you can count on the standard deviation of sales being $28,000. Can you prove your claim?
($t_{69,0.005}=2.649$, $t_{69,0.01}=2.382$, $t_{69,0.025}=1.995$, $t_{69,0.05}=1.667$);
($t_{70,0.005}=2.648$, $t_{70,0.01}=2.381$, $t_{70,0.025}=1.994$, $t_{70,0.05}=1.667$);
($z_{0.005}=2.58$, $z_{0.01}=2.33$, $z_{0.025}=1.96$, $z_{0.05}=1.645$)
#3 Your company is considering opening a retail store in Fairbanks, Alaska, but will only do so if average daily spending per capita is higher there than in the rest of the country. According to recent data, the average US household spends $90 per day. A sample was taken in Fairbanks. From a sample of 49, the average daily expenditure was $84.50, and the standard deviation was $14.50. Should you open a store in Fairbanks? You have a lot riding on this decision, so you need to be sure of your conclusion.
($t_{48,0.005}=2.68$, $t_{48,0.01}=2.41$, $t_{48,0.025}=2.01$, $t_{48,0.05}=1.68$);
($z_{0.005}=2.58$, $z_{0.01}=2.33$, $z_{0.025}=1.96$, $z_{0.05}=1.645$)
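The test statistics for examples #2 and #3 can be checked with a few lines of Python. The setup choices below are assumptions, not given solutions: #2 is read as a lower-tail z test ($H_1: \mu < 875{,}000$, using the $28,000 figure as the known population standard deviation, $\alpha = 0.05$), and #3 as an upper-tail t test ($H_1: \mu > 90$) at $\alpha = 0.01$ because the problem stresses being sure.

```python
import math

# Example #2: sigma treated as known (28,000), so a z statistic is used.
z_2 = (856_000 - 875_000) / (28_000 / math.sqrt(70))
reject_2 = z_2 < -1.645        # lower-tail rejection region, z_{0.05}

# Example #3: only the sample standard deviation is available, so a t statistic is used.
t_3 = (84.50 - 90) / (14.50 / math.sqrt(49))
reject_3 = t_3 > 2.41          # upper-tail rejection region, t_{48,0.01}

print(round(z_2, 2), reject_2)
print(round(t_3, 2), reject_3)
```

In #3 the sample mean is below the hypothesized value, so the statistic is negative and cannot land in an upper-tail rejection region whatever significance level is chosen.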

#4 Microsoft Outlook is believed to be the most widely used e-mail manager. A Microsoft executive claims that Microsoft Outlook is used by more than 75% of Internet users. A Merrill Lynch study involving 300 respondents reported that 72% use Microsoft Outlook. Is there enough evidence here to disprove the executive's claim?
($t_{299,0.005}=2.592$, $t_{299,0.01}=2.339$, $t_{299,0.025}=1.968$, $t_{299,0.05}=1.65$);
($z_{0.005}=2.58$, $z_{0.01}=2.33$, $z_{0.025}=1.96$, $z_{0.05}=1.645$)

#5 A fast-food restaurant plans a special offer that will enable customers to purchase specially designed drink glasses featuring well-known cartoon characters. If more than 15% of the customers will purchase the glasses, the special offer will be implemented. A preliminary test has been set up at several locations, and 88 of 500 customers purchased the glasses. Should the special offer be introduced?
($t_{87,0.005}=2.634$, $t_{87,0.01}=2.37$, $t_{87,0.025}=1.988$, $t_{87,0.05}=1.663$);
($z_{0.005}=2.58$, $z_{0.01}=2.33$, $z_{0.025}=1.96$, $z_{0.05}=1.645$)

#6 For a new newspaper to be financially viable, it has to capture more than 12% of the Toronto market. In a survey conducted among 400 randomly selected prospective readers, 58 participants indicated they would subscribe to the newspaper. Can the publisher conclude that the proposed newspaper will be financially viable at the 10% significance level?
($t_{57,0.005}=2.665$, $t_{57,0.01}=2.39$, $t_{57,0.025}=2.00$, $t_{57,0.05}=1.67$);
($z_{0.005}=2.58$, $z_{0.01}=2.33$, $z_{0.025}=1.96$, $z_{0.05}=1.645$, $z_{0.1}=1.282$)
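The proportion examples (#4 to #6) all use the same z statistic, so one helper covers them. This is a sketch under assumptions: the helper name is hypothetical; #4 is read as a lower-tail test ($H_1: p < 0.75$) and #5 as an upper-tail test ($H_1: p > 0.15$), both at the default $\alpha = 0.05$; #6 is an upper-tail test ($H_1: p > 0.12$) at the stated 10% level.

```python
import math

def z_stat_proportion(p_hat, p_0, n):
    """z statistic for a proportion; the standard error uses the hypothesized p_0."""
    return (p_hat - p_0) / math.sqrt(p_0 * (1 - p_0) / n)

z_4 = z_stat_proportion(0.72, 0.75, 300)       # example #4
z_5 = z_stat_proportion(88 / 500, 0.15, 500)   # example #5
z_6 = z_stat_proportion(58 / 400, 0.12, 400)   # example #6

reject_4 = z_4 < -1.645   # lower tail, z_{0.05}
reject_5 = z_5 > 1.645    # upper tail, z_{0.05}
reject_6 = z_6 > 1.282    # upper tail, z_{0.1}

for label, z, reject in (("#4", z_4, reject_4), ("#5", z_5, reject_5), ("#6", z_6, reject_6)):
    print(label, round(z, 2), "Reject H0" if reject else "DNR H0")
```

Example #5 is a close call: its statistic sits just below the 5% critical value, so the conclusion would flip at a slightly looser significance level.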
Confidence Interval Estimation 4 Steps
Confidence interval estimation relies on the same concepts
and relationships as does hypothesis testing. A simple four
step approach to these problems can also be helpful.
1. We begin by calculating the point estimate from our sample
data.
2. To establish the appropriate interval width, find the upper
and lower limits on the standardized distribution associated
with your confidence level.
3. Use the confidence interval formulas to place them on the
sampling distribution.
4. Place the sample statistic at the center of the interval and
the confidence interval is complete.

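Confidence Interval Formulas

Population Mean w/ Sigma known: $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$

Population Mean w/ Sigma unknown: $\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$

Population Proportion: $\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$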
$1 - \alpha$ is the confidence level associated with the interval.
The sample statistic is used as the center of the interval.
W: width of the interval; 2 x W: total length of the interval.
UCL (Upper Confidence Limit) and LCL (Lower Confidence Limit) are found using the critical value associated with $\alpha/2$.
Confidence interval width for the mean is a function of:
use of t distribution or z distribution
level of confidence chosen (positively related)
$\sigma$ of the sampling distribution (positively related)
sample size (negatively related)
The population parameter can lie outside of the interval; in fact, we know it will $\alpha \times 100$% of the time.
If interested in establishing a confidence interval of a specific width and level of confidence, calculate ahead of time the minimum sample size required to achieve your objective.
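The stated relationships between the width W and its inputs can be checked numerically. The sketch below uses made-up numbers and a hypothetical helper name; it changes one input at a time relative to a baseline.

```python
import math

def half_width(critical_value, sigma, n):
    """W: the distance from the center of the interval to either limit."""
    return critical_value * sigma / math.sqrt(n)

# Made-up baseline: 95% confidence, sigma = 10, n = 100.
base = half_width(1.96, sigma=10.0, n=100)
higher_confidence = half_width(2.58, sigma=10.0, n=100)  # 99% instead of 95%
more_spread = half_width(1.96, sigma=20.0, n=100)        # sigma doubled
larger_sample = half_width(1.96, sigma=10.0, n=400)      # n quadrupled

print(base, higher_confidence, more_spread, larger_sample)
```

Quadrupling the sample size only halves W, because n enters through its square root.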
Example: Confidence Interval Estimation
#7 As a new Subway franchisee, you are estimating your expected annual sales. You have annual sales figures from 70 randomly selected stores. The average in your sample turns out to be $856,000, with a standard deviation of $24,000. The population standard deviation is $28,000. You want a 90% and a 95% confidence interval around your estimate.
($t_{69,0.005}=2.649$, $t_{69,0.01}=2.382$, $t_{69,0.025}=1.995$, $t_{69,0.05}=1.667$);
($t_{70,0.005}=2.648$, $t_{70,0.01}=2.381$, $t_{70,0.025}=1.994$, $t_{70,0.05}=1.667$);
($z_{0.005}=2.58$, $z_{0.01}=2.33$, $z_{0.025}=1.96$, $z_{0.05}=1.645$)
Confidence Interval Estimation 4 Steps
1. Calculate the point estimate.
$\bar{x} =$
2. Find the upper and lower limits on the standardized distribution associated with your confidence level.
For $1 - \alpha = 90\%$: $Z_{0.05} =$
3. Use the confidence interval formulas to place the upper and lower limits on the sampling distribution.
$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} =$
4. Place the point estimate at the center of the interval.
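The four steps can be traced in code for example #7. This is a sketch for checking hand calculations, assuming the z formula applies because the population standard deviation ($28,000) is stated; the helper name is hypothetical and the critical values are the rounded table values quoted with the problem.

```python
import math

# Step 1: point estimate (values from example #7).
x_bar, sigma, n = 856_000, 28_000, 70
standard_error = sigma / math.sqrt(n)

def z_interval(z_crit):
    """Steps 3 and 4: scale the standardized limit and center it on x_bar."""
    w = z_crit * standard_error
    return (x_bar - w, x_bar + w)

# Step 2: standardized limits for each confidence level.
ci_90 = z_interval(1.645)  # z_{0.05}
ci_95 = z_interval(1.96)   # z_{0.025}

print([round(limit) for limit in ci_90])
print([round(limit) for limit in ci_95])
```

The 95% interval contains the 90% interval, consistent with width being positively related to the confidence level.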
Using CI to decide hypothesis tests:
If you have calculated a confidence interval, and then decide you also want to test a hypothesis with this information, you can do so directly provided:
The hypothesis being tested is two-tailed
The $\alpha$ from the hypothesis test and the $1 - \alpha$ from the confidence interval total 1.0
If these two conditions hold, then determine whether the hypothesized value in the null hypothesis for the parameter falls in the interval created. If it does, DNR $H_0$. If it does not, Reject $H_0$.

Example: Using CI to decide a Hypothesis Test
#8 The owners of Subway claim that their stores average $875,000 in annual sales. You used this information in deciding to open a store in Delaware. Your store, however, has not come even close to these annual sales figures. You want to prove that you were misled, and that the average figure for all stores is actually NOT $875,000. You collect annual sales figures from 70 randomly selected stores. The average in your sample turns out to be $856,000, with a standard deviation of $24,000. You also know from a friend who is in management of a similar franchise that you can count on the standard deviation of sales being $28,000. Can you prove your claim?
($t_{69,0.005}=2.649$, $t_{69,0.01}=2.382$, $t_{69,0.025}=1.995$, $t_{69,0.05}=1.667$);
($t_{70,0.005}=2.648$, $t_{70,0.01}=2.381$, $t_{70,0.025}=1.994$, $t_{70,0.05}=1.667$);
($z_{0.005}=2.58$, $z_{0.01}=2.33$, $z_{0.025}=1.96$, $z_{0.05}=1.645$)
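The interval-based decision rule reduces to a containment check. The sketch below applies it to example #8, assuming a two-tailed test at $\alpha = 0.05$ paired with a 95% z interval built from the stated $28,000 standard deviation; the function name is hypothetical.

```python
import math

def ci_rejects_h0(hypothesized_value, lcl, ucl):
    """Two-tailed test via a confidence interval: reject when the value lies outside."""
    return not (lcl <= hypothesized_value <= ucl)

# Values from example #8.
x_bar, sigma, n = 856_000, 28_000, 70
w = 1.96 * sigma / math.sqrt(n)  # z_{0.025} for a 95% interval
lcl, ucl = x_bar - w, x_bar + w

reject = ci_rejects_h0(875_000, lcl, ucl)
print(round(lcl), round(ucl), "Reject H0" if reject else "DNR H0")
```

The check is only valid when both conditions from the previous slide hold: the alternative is two-tailed and the interval's confidence level is $1 - \alpha$.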
Estimating n for Confidence Intervals
Sample sizes required to construct intervals of a certain degree of confidence and width can be determined by using one of the following formulas:

Sample Size for Means: $n = \frac{z_{\alpha/2}^2 \sigma^2}{W^2}$

Sample Size for Proportions
a priori idea of $\hat{p}$: $n = \left( \frac{z_{\alpha/2} \sqrt{\hat{p}(1 - \hat{p})}}{W} \right)^2$

no a priori idea of $\hat{p}$: $n = \frac{z_{\alpha/2}^2 (0.5)^2}{W^2}$


Estimating n for Confidence Intervals
When involving a mean:
Use the sample standard deviation from a previous study as $\sigma$
Use a pilot study to obtain a standard deviation ($\sigma$)
Use judgment, or best guess
When involving a proportion:
Use your best estimate if confident of a reasonable value for $\hat{p}$ (you have an a priori value for the sample proportion)
Use 0.5 as $\hat{p}$ (you have no a priori value for the sample proportion)
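The sample size formulas can be sketched as two small functions. The names are hypothetical and the inputs in the usage lines are made up; results are rounded up because a fractional observation cannot be collected, and rounding down would make the interval slightly wider than the target.

```python
import math

def n_for_mean(z_crit, sigma, w):
    """Minimum n so that a z interval for the mean has half-width at most w."""
    return math.ceil((z_crit * sigma / w) ** 2)

def n_for_proportion(z_crit, w, p_hat=0.5):
    """Minimum n for a proportion interval; p_hat = 0.5 when there is no a priori value."""
    return math.ceil(z_crit ** 2 * p_hat * (1 - p_hat) / w ** 2)

# Made-up planning inputs at 95% confidence.
n_mean = n_for_mean(1.96, sigma=10.0, w=2.0)          # (9.8)^2 = 96.04, rounded up
n_prop_unknown = n_for_proportion(1.96, w=0.05)       # no a priori p_hat
n_prop_known = n_for_proportion(1.96, w=0.05, p_hat=0.2)
print(n_mean, n_prop_unknown, n_prop_known)
```

Using 0.5 gives the largest required sample, since $\hat{p}(1 - \hat{p})$ peaks at 0.5; that is why it is the safe default when nothing is known in advance.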


Example: Estimating n for Confidence Intervals
#9 The interval you have created for your Subway is a good start, but you would be more comfortable with a tighter range for your estimate of sales. You decide that the maximum you can tolerate is +/- 2,500. What sample size would you need to collect to obtain a 90% confidence interval for annual sales with a width of 2,500?
($t_{69,0.005}=2.649$, $t_{69,0.01}=2.382$, $t_{69,0.025}=1.995$, $t_{69,0.05}=1.667$);
($t_{70,0.005}=2.648$, $t_{70,0.01}=2.381$, $t_{70,0.025}=1.994$, $t_{70,0.05}=1.667$);
($z_{0.005}=2.58$, $z_{0.01}=2.33$, $z_{0.025}=1.96$, $z_{0.05}=1.645$)
$n = \frac{z_{\alpha/2}^2 \sigma^2}{W^2} =$
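A quick arithmetic check for example #9 is sketched below. It assumes the population standard deviation of $28,000 carries over from the earlier Subway examples and that 2,500 is W, the distance from the center to either limit (the "+/-" in the problem).

```python
import math

z_crit = 1.645    # z_{0.05}, for a 90% confidence level
sigma = 28_000    # assumed carried over from the earlier Subway examples
w = 2_500         # tolerated +/- distance from the point estimate

n_exact = (z_crit * sigma / w) ** 2
n_required = math.ceil(n_exact)   # round up to the next whole store
print(round(n_exact, 2), n_required)
```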
Examples: Confidence Intervals
#10 The Environmental Protection Agency (EPA) has agreed to give tax rebates to manufacturers of vehicles that get a combined city and highway gas mileage of at least 32 mpg. A 49-car sample of a new Ford vehicle reveals a mean of 32.6 mpg. It is believed that the highway gas mileage for Ford vehicles has a standard deviation of 0.78 mpg.
($t_{48,0.005}=2.68$, $t_{48,0.01}=2.41$, $t_{48,0.025}=2.01$, $t_{48,0.05}=1.68$);
($z_{0.005}=2.58$, $z_{0.01}=2.33$, $z_{0.025}=1.96$, $z_{0.05}=1.645$)
Construct a 95% confidence interval. Then a 99% confidence interval.
#11 Redo example #10, but this time, the standard deviation of
the mpg for the 49 cars is 0.83, and there is no credible
information regarding the population standard deviation of
mpg for these vehicles.
Construct a 95% confidence interval. Then a 99%
confidence interval.
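Examples #10 and #11 differ only in which standard deviation and critical value enter the formula, which the sketch below makes explicit. It is meant for checking hand calculations: the helper name is hypothetical, #10 is treated as a sigma-known case (z values) and #11 as a sigma-unknown case (t values with 48 degrees of freedom), and the critical values are the rounded ones quoted with example #10.

```python
import math

def interval(x_bar, critical_value, sd, n):
    """Center +/- critical value times the standard error."""
    w = critical_value * sd / math.sqrt(n)
    return (x_bar - w, x_bar + w)

# Example #10: sigma = 0.78 treated as known, so z critical values.
ci10_95 = interval(32.6, 1.96, 0.78, 49)
ci10_99 = interval(32.6, 2.58, 0.78, 49)

# Example #11: only s = 0.83 is available, so t critical values.
ci11_95 = interval(32.6, 2.01, 0.83, 49)
ci11_99 = interval(32.6, 2.68, 0.83, 49)

for name, ci in (("#10 95%", ci10_95), ("#10 99%", ci10_99),
                 ("#11 95%", ci11_95), ("#11 99%", ci11_99)):
    print(name, round(ci[0], 3), round(ci[1], 3))
```

The #11 intervals are wider at each confidence level, reflecting both the larger standard deviation and the larger t critical values.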

#12 Suppose we have made an interval estimation for the
mean of the population such as: [126.56, 192.41]. If we
realize that the true population mean is 195.7, what should
we conclude?
The procedure for interval estimation must have been done
incorrectly.
We should first standardize the LCL and UCL and then see
if they capture the mean.
The procedure can still be valid, since we allow for a certain
amount of error.
We must use a t distribution instead of a z distribution.
We could never get this result.

#13 A major news source conducted a poll asking 814 adults to respond to a series of questions about their feelings toward the state of affairs within the United States. A total of 562 adults responded "yes" to the question: "Do you feel things are going well in the United States these days?"
A) What is the point estimate of the proportion of the adult population that feels things are going well in the United States?

B) What is the 90% confidence interval for the proportion of
the adult population that feels things are going well in the
United States?

C) If one wanted to be 95% certain, and have an interval no
wider than 3%, what sample size would be required?
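The three parts of example #13 can be checked with the sketch below. Two choices are assumptions rather than givens: the sample proportion from part A is used as the a priori value in part C, and "no wider than 3%" is read as W = 0.03, the distance from the center to either limit. If the 3% is meant as the total length of the interval, W would be 0.015 and the required sample would be roughly four times larger.

```python
import math

yes, n = 562, 814

# A) Point estimate of the population proportion.
p_hat = yes / n

# B) 90% confidence interval, using z_{0.05} = 1.645.
w_90 = 1.645 * math.sqrt(p_hat * (1 - p_hat) / n)
ci_90 = (p_hat - w_90, p_hat + w_90)

# C) Sample size for 95% confidence with W = 0.03 (assumed reading of "3%").
n_needed = math.ceil(1.96 ** 2 * p_hat * (1 - p_hat) / 0.03 ** 2)

print(round(p_hat, 4))
print(round(ci_90[0], 4), round(ci_90[1], 4))
print(n_needed)
```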
