
1

Confidence Intervals
2
Inferential Statistics

Based on a sample, inferential statistics is all about making some type of statement concerning the possible value of the population parameter.
Statements are made in a probabilistic sense; i.e., we can never say, "I am absolutely sure that the true value of the population parameter is exactly this particular number."

3
Types of statements
Point estimate: your best guess as to the value of the population parameter.
Confidence interval: based on a sample, make a statement like "I am 90% sure that the true value of the population mean is BETWEEN 65 and 72" (here 65 is a lower bound and 72 is an upper bound).
Hypothesis testing: assume some value for the population parameter, e.g., "I think the true mean is at least 85." Then take a sample and see whether the evidence supports or refutes this claim.

4
Remember, we are generally making statements about 2 population parameters, the population mean ($\mu$) and the population proportion ($p$), and we use the sample mean ($\bar{X}$) and the sample proportion ($\hat{p}$), respectively, to estimate them.
| Parameter | Estimate | When to use |
|---|---|---|
| $\mu$ | $\bar{X}$ | When the outcome of each individual trial has many different possible outcomes |
| $p$ | $\hat{p}$ | When the outcome of each individual trial only has 2 possible outcomes |
5
Point Estimate
This is the best guess of the value of the population parameter, given your sample information.
Recall that the sample mean is (approximately) normally distributed with a mean of $\mu$, which means that, on average, the sample mean equals the true population mean (the sample mean is an unbiased estimator of the population mean).

Therefore, the sample mean $\bar{X}$ obtained from the sample is your point estimate of $\mu$.

Likewise for $\hat{p}$: the sample proportion is the point estimate of $p$.
6
Point Estimate Example

The signing bonuses of 30 new players in the PBL are used to estimate the mean bonus for all new players. The sample mean is P130,000 with a standard deviation of P25,500. What is the point estimate of the mean signing bonus for all new PBL players?
Answer: the sample mean is P130,000, so this is your point estimate of the population mean, $\mu$.
Note that this is a mean problem because, in our sample, we look at each player's signing bonus, which can take many different values.

7
Point Estimate Example

In a random sample of 100 students at a particular college,
60 indicated that they favored having the option of
receiving a pass-fail grade for elective courses. What is the
point estimate of the proportion of ALL students who favor
having the option of receiving a pass-fail grade?
Answer: the sample proportion is 60/100 = 0.6, so this is your point estimate of the population proportion, $p$.
Note that this is a proportion problem because, in our sample, we look at whether each student favors having the pass/fail option, and there are only 2 outcomes: they either favor having the option or they don't.


8
Confidence Interval
A confidence interval consists of a range in which the population parameter may fall, together with a confidence level.
The range is a lower and an upper bound between which you think the population parameter lies.
The confidence level is how sure you are that the parameter is within this range.
Interpretation: a 95% confidence level means that if you repeatedly took samples and built an interval from each one in the same way, 95% of those similarly constructed intervals would contain the population parameter.
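To see the repeated-sampling interpretation in action, here is a minimal Python simulation sketch. It assumes a normal population with $\mu = 85$ and $\sigma = 35$ (values borrowed from the restaurant example later in these slides), samples of size n = 100, and a 95% level; the function name `coverage_rate`, the seed, and the number of repetitions are our own choices. Each repetition draws a sample, builds the interval $\bar{X} \pm Z\,s/\sqrt{n}$, and records whether it contains the true $\mu$.

```python
import random
from statistics import NormalDist, fmean, stdev


def coverage_rate(mu=85.0, sigma=35.0, n=100, level=0.95, reps=2000, seed=1):
    """Fraction of simulated confidence intervals that contain the true mean mu.

    Assumes a normal population with mean mu and standard deviation sigma.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # 1.96 for a 95% level
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar, s = fmean(sample), stdev(sample)
        half_width = z * s / n ** 0.5           # Z standard deviations of the sample mean
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps


rate = coverage_rate()
print(f"{rate:.3f} of the simulated intervals contain mu")
```

Any single interval either contains $\mu$ or it does not; the 95% describes the long-run share of intervals that do.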




9
Understanding Confidence Intervals
Remember that by the Central Limit Theorem, the sample mean is (approximately) normally distributed with a mean of $\mu$. Also, recall that by the empirical rule, if a variable is normally distributed, then 95.5% of all possible values lie within 2 standard deviations of the mean.
Therefore, 95.5% of all possible sample means lie within 2 standard deviations of the true mean.
Looking at it another way: for 95.5% of all possible samples, the population mean lies within 2 standard deviations of the sample mean.
Bottom line: if you take $\bar{X}$ and add and subtract 2 standard deviations (of $\bar{X}$), the result is a 95.5% confidence interval for $\mu$.

10
Understanding Confidence Intervals
Suppose that you know the true population mean number of customers a restaurant has per night, $\mu = 85$, and suppose you know the standard deviation of the number of customers, $\sigma = 35$. Suppose we take a sample of 100 nights and calculate the sample mean number of customers the restaurant has.
The distribution of the sample mean is drawn on the next page.
11
Distribution of the Sample Mean
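[Figure: normal curve showing the distribution of the sample mean $\bar{X}$]

$$\mu_{\bar{X}} = \mu = 85, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{35}{\sqrt{100}} = 3.5$$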
12
Understanding Confidence Intervals
If we move up 2 standard deviations and down 2 standard deviations from the true mean of 85, we get a range from 85 − 2(3.5) to 85 + 2(3.5), that is, (78, 92).

The Empirical Rule tells us that 95.5% of all possible sample
means lie within 2 standard deviations of the mean so 95.5%
of all sample means lie within the range (78, 92)
If you go out and take a sample of size 100 and calculate the sample mean:
You may get a value of 86; this lies within this range.
You may get a value of 89; this lies within this range.
You may get a value of 79; this lies within this range.
You may get a value of 75; this DOES NOT lie within this range, but it is still a possible sample mean.
13
In 95.5% of the samples of size 100, you will get a sample
mean between 78 and 92.
95.5% of the time, you will get a sample mean that is
WITHIN 2 standard deviations of the true mean.
4.5% of the time, you will get a sample mean that is NOT
WITHIN 2 standard deviations of the true mean.
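The numbers on the last few slides can be reproduced with Python's standard-library `statistics.NormalDist`. This sketch uses the restaurant values ($\mu = 85$, $\sigma = 35$, n = 100) and computes the standard deviation of $\bar{X}$, the two-standard-deviation range, and the exact normal probability of landing inside it, which the empirical rule rounds to 95.5%.

```python
from statistics import NormalDist

mu, sigma, n = 85, 35, 100            # restaurant example from the slides
se = sigma / n ** 0.5                 # standard deviation of the sample mean
low, high = mu - 2 * se, mu + 2 * se  # 2 standard deviations either side of mu

std_normal = NormalDist()
prob_within_2 = std_normal.cdf(2) - std_normal.cdf(-2)

print(se, (low, high), round(prob_within_2, 4))  # 3.5 (78.0, 92.0) 0.9545
```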

14
Turn this situation around: in 95.5% of the samples of size 100, the true mean is WITHIN 2 standard deviations of the sample mean.
So suppose we take a sample of size n = 100 and get a sample mean of $\bar{X} = 82$, and suppose the standard deviation of the sample is s = 35.
To calculate the 95.5% confidence interval, move 2 standard deviations below the sample mean and 2 standard deviations above it:

$$\left(82 - 2\left(\frac{35}{\sqrt{100}}\right),\; 82 + 2\left(\frac{35}{\sqrt{100}}\right)\right) = (75, 89)$$
15
Important Notes
When we say move 2 standard deviations up and down from the sample mean, I am talking about 2 STANDARD DEVIATIONS OF $\bar{X}$, where

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

We may not know the true population standard deviation, $\sigma$, which we need in order to calculate the standard deviation of the sample mean, but we can use the sample standard deviation, s, as an estimate of $\sigma$. This is what we did in the previous problem, where we said suppose the sample standard deviation is 35 and used that number in the calculation of the standard deviation of the sample mean.
16
General Confidence Intervals
The most frequently used confidence levels are 80, 90, 95,
and 99%.
Suppose we want to calculate a confidence interval based on a confidence level of L% (where L could be 80, 90, 95, 99, or any other number). The book uses $1-\alpha$ notation for the confidence level.
17
Calculating a Confidence Interval
First, go out and take a sample of size n, and calculate the sample mean, $\bar{X}$, and the standard deviation of the sample, s.
To form the confidence interval, add and subtract a certain number of standard deviations from the sample mean, where a standard deviation is

$$\sigma_{\bar{X}} = \frac{s}{\sqrt{n}}$$

The number of standard deviations you move is called the confidence coefficient, $Z_L$, and is based on the confidence level L. The confidence interval is

$$\left(\bar{X} - Z_L\frac{s}{\sqrt{n}},\; \bar{X} + Z_L\frac{s}{\sqrt{n}}\right)$$
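The interval formula translates directly into a few lines of Python. This is a minimal sketch; the function name `mean_interval` is hypothetical. As a check, it reuses the restaurant sample from earlier ($\bar{X} = 82$, s = 35, n = 100) with $Z = 2$.

```python
from math import sqrt


def mean_interval(xbar: float, s: float, n: int, z: float) -> tuple[float, float]:
    """Return (lower, upper) = xbar -/+ z * s / sqrt(n)."""
    half_width = z * s / sqrt(n)   # z standard deviations of the sample mean
    return xbar - half_width, xbar + half_width


print(mean_interval(82, 35, 100, 2))   # (75.0, 89.0)
```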
18
Confidence Coefficient
If your confidence level is L%, calculate the appropriate confidence coefficient as follows:
1. Take L/2 (as a decimal).
2. Look this number up as a probability in the standard normal table; that is, find the number in the body of the table closest to it (recall that the numbers in the body of the table are probabilities, the area between 0 and Z, whereas the numbers on the left and top are Zs).
3. Find the Z that this probability corresponds to.
4. This Z is your confidence coefficient.
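Instead of searching the table, the same confidence coefficient can be computed by inverting the standard normal CDF. Because the table gives the area between 0 and Z, finding an area of L/2 there is the same as finding the Z whose cumulative probability is 0.5 + L/2. A minimal sketch using `statistics.NormalDist.inv_cdf` (the function name `confidence_coefficient` is our own):

```python
from statistics import NormalDist


def confidence_coefficient(level: float) -> float:
    """Z such that the area between 0 and Z is level / 2 (so between -Z and Z is level)."""
    return NormalDist().inv_cdf(0.5 + level / 2)


for level in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}: Z = {confidence_coefficient(level):.2f}")
```

The unrounded 99% value is about 2.576, which sits between the table's 2.57 and 2.58.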
19
Intuition of Confidence Intervals
The basic idea of a confidence interval is that you want to start with the sample estimate (mean or proportion) and then move up and down a certain number of standard deviations so that you cover 95% (or 90% or 99%, depending on your confidence level) of the area.
The number of standard deviations you have to move depends on the confidence level. For a 95% confidence level you must move 1.96 standard deviations up and down, so that an area of 0.4750 lies between 0 and 1.96 standard deviations, and therefore 0.95 (2 × 0.475) lies between −1.96 and 1.96 standard deviations.

20
95% Confidence Interval
[Figure: standard normal curve with the areas from −1.96 to 0 and from 0 to 1.96 shaded]

$$P(-1.96 \le Z \le 0) = 0.4750, \qquad P(0 \le Z \le 1.96) = 0.4750$$
21
Example
Suppose you wanted to find the confidence coefficient for a confidence level of 90%.
L = 0.90
Take 0.90/2 = 0.45.
Try to find the number in the body of the table as close to 0.45 as you can.
Note that you see a .4495 and a .4505, and these are as close to 0.45 as you can get (it doesn't matter which of the two you choose, but we will go with .4495).
The .4495 entry corresponds to Z = 1.64, so the confidence coefficient for a 90% confidence interval is 1.64.
If the confidence level is 95%, then the confidence coefficient is 1.96.
If the confidence level is 99%, then the confidence coefficient is 2.57 (or 2.58).

22
Standard-Normal Distribution

| Z | 0.00 | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 | 0.06 | 0.07 | 0.08 | 0.09 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.0000 | 0.0040 | 0.0080 | 0.0120 | 0.0160 | 0.0199 | 0.0239 | 0.0279 | 0.0319 | 0.0359 |
| 0.1 | 0.0398 | 0.0438 | 0.0478 | 0.0517 | 0.0557 | 0.0596 | 0.0636 | 0.0675 | 0.0714 | 0.0753 |
| 0.2 | 0.0793 | 0.0832 | 0.0871 | 0.0910 | 0.0948 | 0.0987 | 0.1026 | 0.1064 | 0.1103 | 0.1141 |
| 0.3 | 0.1179 | 0.1217 | 0.1255 | 0.1293 | 0.1331 | 0.1368 | 0.1406 | 0.1443 | 0.1480 | 0.1517 |
| 0.4 | 0.1554 | 0.1591 | 0.1628 | 0.1664 | 0.1700 | 0.1736 | 0.1772 | 0.1808 | 0.1844 | 0.1879 |
| 0.5 | 0.1915 | 0.1950 | 0.1985 | 0.2019 | 0.2054 | 0.2088 | 0.2123 | 0.2157 | 0.2190 | 0.2224 |
| 0.6 | 0.2257 | 0.2291 | 0.2324 | 0.2357 | 0.2389 | 0.2422 | 0.2454 | 0.2486 | 0.2517 | 0.2549 |
| 0.7 | 0.2580 | 0.2611 | 0.2642 | 0.2673 | 0.2704 | 0.2734 | 0.2764 | 0.2794 | 0.2823 | 0.2852 |
| 0.8 | 0.2881 | 0.2910 | 0.2939 | 0.2967 | 0.2995 | 0.3023 | 0.3051 | 0.3078 | 0.3106 | 0.3133 |
| 0.9 | 0.3159 | 0.3186 | 0.3212 | 0.3238 | 0.3264 | 0.3289 | 0.3315 | 0.3340 | 0.3365 | 0.3389 |
| 1.0 | 0.3413 | 0.3438 | 0.3461 | 0.3485 | 0.3508 | 0.3531 | 0.3554 | 0.3577 | 0.3599 | 0.3621 |
| 1.1 | 0.3643 | 0.3665 | 0.3686 | 0.3708 | 0.3729 | 0.3749 | 0.3770 | 0.3790 | 0.3810 | 0.3830 |
| 1.2 | 0.3849 | 0.3869 | 0.3888 | 0.3907 | 0.3925 | 0.3944 | 0.3962 | 0.3980 | 0.3997 | 0.4015 |
| 1.3 | 0.4032 | 0.4049 | 0.4066 | 0.4082 | 0.4099 | 0.4115 | 0.4131 | 0.4147 | 0.4162 | 0.4177 |
| 1.4 | 0.4192 | 0.4207 | 0.4222 | 0.4236 | 0.4251 | 0.4265 | 0.4279 | 0.4292 | 0.4306 | 0.4319 |
| 1.5 | 0.4332 | 0.4345 | 0.4357 | 0.4370 | 0.4382 | 0.4394 | 0.4406 | 0.4418 | 0.4429 | 0.4441 |
| 1.6 | 0.4452 | 0.4463 | 0.4474 | 0.4484 | 0.4495 | 0.4505 | 0.4515 | 0.4525 | 0.4535 | 0.4545 |
| 1.7 | 0.4554 | 0.4564 | 0.4573 | 0.4582 | 0.4591 | 0.4599 | 0.4608 | 0.4616 | 0.4625 | 0.4633 |
| 1.8 | 0.4641 | 0.4649 | 0.4656 | 0.4664 | 0.4671 | 0.4678 | 0.4686 | 0.4693 | 0.4699 | 0.4706 |
| 1.9 | 0.4713 | 0.4719 | 0.4726 | 0.4732 | 0.4738 | 0.4744 | 0.4750 | 0.4756 | 0.4761 | 0.4767 |
| 2.0 | 0.4772 | 0.4778 | 0.4783 | 0.4788 | 0.4793 | 0.4798 | 0.4803 | 0.4808 | 0.4812 | 0.4817 |
| 2.1 | 0.4821 | 0.4826 | 0.4830 | 0.4834 | 0.4838 | 0.4842 | 0.4846 | 0.4850 | 0.4854 | 0.4857 |
| 2.2 | 0.4861 | 0.4864 | 0.4868 | 0.4871 | 0.4875 | 0.4878 | 0.4881 | 0.4884 | 0.4887 | 0.4890 |
| 2.3 | 0.4893 | 0.4896 | 0.4898 | 0.4901 | 0.4904 | 0.4906 | 0.4909 | 0.4911 | 0.4913 | 0.4916 |
| 2.4 | 0.4918 | 0.4920 | 0.4922 | 0.4925 | 0.4927 | 0.4929 | 0.4931 | 0.4932 | 0.4934 | 0.4936 |
| 2.5 | 0.4938 | 0.4940 | 0.4941 | 0.4943 | 0.4945 | 0.4946 | 0.4948 | 0.4949 | 0.4951 | 0.4952 |
| 2.6 | 0.4953 | 0.4955 | 0.4956 | 0.4957 | 0.4959 | 0.4960 | 0.4961 | 0.4962 | 0.4963 | 0.4964 |
| 2.7 | 0.4965 | 0.4966 | 0.4967 | 0.4968 | 0.4969 | 0.4970 | 0.4971 | 0.4972 | 0.4973 | 0.4974 |
| 2.8 | 0.4974 | 0.4975 | 0.4976 | 0.4977 | 0.4977 | 0.4978 | 0.4979 | 0.4979 | 0.4980 | 0.4981 |
| 2.9 | 0.4981 | 0.4982 | 0.4982 | 0.4983 | 0.4984 | 0.4984 | 0.4985 | 0.4985 | 0.4986 | 0.4986 |
| 3.0 | 0.4987 | 0.4987 | 0.4987 | 0.4988 | 0.4988 | 0.4989 | 0.4989 | 0.4989 | 0.4990 | 0.4990 |
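To see where the table's entries come from, and to mimic the "closest entry in the body" lookup, here is a sketch that rebuilds the table from `statistics.NormalDist.cdf` and searches it. The helper name `lookup_coefficient` and the rule of breaking ties toward the smaller Z (matching the choice of .4495 over .4505 in the 90% example) are our assumptions.

```python
from statistics import NormalDist

_std_normal = NormalDist()

# z = 0.00, 0.01, ..., 3.09 (rows 0.0 to 3.0 combined with columns 0.00 to 0.09)
Z_VALUES = [i / 100 for i in range(310)]

# Body of the table: area between 0 and z, rounded to 4 places like the printed table
TABLE = {z: round(_std_normal.cdf(z) - 0.5, 4) for z in Z_VALUES}


def lookup_coefficient(level: float) -> float:
    """Z whose table entry is closest to level / 2; ties go to the smaller Z."""
    target = level / 2
    return min(Z_VALUES, key=lambda z: (round(abs(TABLE[z] - target), 6), z))


print(TABLE[1.96], TABLE[1.64], TABLE[1.65])                      # 0.475 0.4495 0.4505
print([lookup_coefficient(L) for L in (0.90, 0.95, 0.98, 0.99)])  # [1.64, 1.96, 2.33, 2.57]
```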
23
Example
Dell Publishing samples 48 shipments to estimate the mean
postal cost. The sample mean is $25.36 with a standard
deviation of $4.80. Calculate the 98% confidence interval
for the mean postal cost.
Note that the sample size is > 30, so the Central Limit Theorem applies and the sample mean is approximately normally distributed.
$\bar{X} = 25.36$, $s = 4.80$, $n = 48$, $Z_{.98} = 2.33$
98% confidence interval for the mean postal cost:

$$\left(\bar{X} - Z_L\frac{s}{\sqrt{n}},\; \bar{X} + Z_L\frac{s}{\sqrt{n}}\right) = \left(25.36 - 2.33\frac{4.80}{\sqrt{48}},\; 25.36 + 2.33\frac{4.80}{\sqrt{48}}\right) = (23.75, 26.97)$$
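A quick numerical check of the postal-cost interval, as a sketch using only the standard library. It plugs in the table value Z = 2.33, then the unrounded `NormalDist().inv_cdf(0.99)` (about 2.326), to show that both give the same interval to two decimals.

```python
from math import sqrt
from statistics import NormalDist

xbar, s, n = 25.36, 4.80, 48
se = s / sqrt(n)                                  # standard deviation of the sample mean

for z in (2.33, NormalDist().inv_cdf(0.99)):      # table value, then unrounded value
    low, high = xbar - z * se, xbar + z * se
    print(f"Z = {z:.3f}: ({low:.2f}, {high:.2f})")
# Z = 2.330: (23.75, 26.97)
# Z = 2.326: (23.75, 26.97)
```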

24
Confidence Intervals for Proportions
Suppose we want to form a confidence interval for the
population proportion, p
Recall the distribution of the sample proportion:

$$\mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$
25
The confidence interval for the population proportion is calculated in a similar manner as that for the population mean:

$$\left(\hat{p} - Z_L\,\sigma_{\hat{p}},\; \hat{p} + Z_L\,\sigma_{\hat{p}}\right)$$

where

$$\sigma_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Note that in calculating the standard deviation of the sample proportion, we are using our sample proportion, $\hat{p}$, and not the population proportion, $p$.
This is because we DON'T KNOW WHAT THE POPULATION PROPORTION IS; WE ARE TRYING TO FORM AN INTERVAL IN WHICH WE THINK THE POPULATION PROPORTION LIES.
So we use our best guess of $p$, which is $\hat{p}$.
26
Example
In a sample of 1200 tourists, 840 plan to repeat their trips the following year. Calculate the 95% confidence interval for the proportion of travelers who expect to repeat their trips.
$\hat{p} = 840/1200 = 0.70$, $n = 1200$, $Z_{.95} = 1.96$ (calculated the same way as with confidence intervals for means), and

$$\sigma_{\hat{p}} = \sqrt{\frac{0.70(1-0.70)}{1200}} = 0.0132$$

The 95% confidence interval for the population proportion is

$$\left(\hat{p} - Z_L\,\sigma_{\hat{p}},\; \hat{p} + Z_L\,\sigma_{\hat{p}}\right) = \left(0.70 - 1.96(0.0132),\; 0.70 + 1.96(0.0132)\right) = (0.674, 0.726)$$

We are 95% sure that the true proportion of individuals who expect to repeat their trips is between 0.674 and 0.726.
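The same steps for a proportion, written as a small Python sketch. The function name `proportion_interval` is hypothetical; as on the slide, it uses $\hat{p}$ in place of the unknown $p$ when computing the standard deviation.

```python
from math import sqrt


def proportion_interval(successes: int, n: int, z: float) -> tuple[float, float, float]:
    """Return (lower, upper, se) for the sample proportion successes / n."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)   # p-hat stands in for the unknown p
    return p_hat - z * se, p_hat + z * se, se


low, high, se = proportion_interval(840, 1200, 1.96)
print(f"se = {se:.4f}, interval = ({low:.3f}, {high:.3f})")   # se = 0.0132, interval = (0.674, 0.726)
```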
27
Margin of Error or Sampling Error
Margin of Error (or Sampling Error) is the distance between the estimate and the true parameter:
Margin of Error = |estimate − parameter|

As a byproduct of a confidence interval, we can also calculate our maximum margin of error. Once again, this maximum is in a probabilistic sense; for example, we are 95% sure that our margin of error is no larger than a certain amount.
Confidence interval: $\hat{p} \pm Z_L\,\sigma_{\hat{p}}$
Margin of error: $Z_L\,\sigma_{\hat{p}}$
28
Margin of Error
Suppose the true proportion is 0.71
If your estimate is .69 then your margin of error is .02
If your estimate is .74 then your margin of error is .03
With a population proportion of 0.71, the standard deviation of the sample proportion is (assuming a sample size of n = 200)

$$\sigma_{\hat{p}} = \sqrt{\frac{0.71(1-0.71)}{200}} = 0.032$$

Note that 90% of the sample proportions are within 1.64 standard deviations of the true proportion.
90% are between 0.71 − (1.64)(0.032) and 0.71 + (1.64)(0.032).
90% are between 0.658 and 0.762.
29
The margin of error is no larger than (1.64)(0.032) ≈ 0.052 for 90% of the sample proportions.

If you form a 90% confidence interval, then you are also 90% sure that your margin of error is no larger than the amount you add to and subtract from your estimate.
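The claim that 90% of sample proportions fall within the margin can be checked by simulation. A minimal sketch, assuming p = 0.71 and n = 200 as on the slide and drawing each sample as 200 independent yes/no trials; the seed and number of repetitions are arbitrary choices. It uses the unrounded standard deviation (about 0.0321), so the margin comes out near 0.0526 rather than the 0.052 obtained from the rounded 0.032.

```python
import random
from math import sqrt

p, n, z = 0.71, 200, 1.64
se = sqrt(p * (1 - p) / n)      # standard deviation of the sample proportion
margin = z * se                 # maximum margin of error at the 90% level

rng = random.Random(7)
reps = 5000
within = 0
for _ in range(reps):
    # one simulated survey: the share of n respondents who say "yes"
    p_hat = sum(rng.random() < p for _ in range(n)) / n
    if abs(p_hat - p) <= margin:
        within += 1

share_within = within / reps
print(f"se = {se:.4f}, margin = {margin:.4f}, share within margin = {share_within:.3f}")
```

Because $\hat{p}$ can only move in steps of 1/200, the simulated share lands near 0.90 rather than exactly on it.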




30
Calculating the Appropriate Sample Size: Means
In some applications, you may want to know how large a
sample is necessary in order for your margin of error to be
no larger than a certain amount
Recall that the standardized value of x is found as

$$Z = \frac{x - \mu_x}{\sigma_x}$$

For the sample mean:

$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

Rearranging and solving for n yields

$$n = \frac{Z^2 s^2}{(\bar{X} - \mu)^2}$$

where Z is determined by the confidence level, s is the standard deviation of the sample, and $\bar{X} - \mu$ is the maximum margin of error.
31
Calculating the Appropriate Sample Size: Means
As a restaurant owner, you need to decide how much food
to prepare each night. In a sample of 100 nights, the mean
number of customers is 85 with a standard deviation of 38.
How large must your sample be if you want to be 99% sure that your sampling error is no larger than 4?
$\bar{X} - \mu = 4$, $s = 38$, $Z = 2.57$

Therefore, plugging these numbers into the formula from the last page:

$$n = \frac{Z^2 s^2}{(\bar{X} - \mu)^2} = \frac{(2.57)^2 (38)^2}{4^2} \approx 596.1$$

Since a sample size must be a whole number, round up to n = 597.

Interpretation: you have to take a sample of 597 in order to be 99% sure that your maximum margin of error is no larger than 4 customers.
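As a check on the arithmetic, this sketch evaluates the formula and rounds up, since rounding down would leave the margin of error slightly above 4. The function name `sample_size_for_mean` is hypothetical.

```python
import math


def sample_size_for_mean(z: float, s: float, max_error: float) -> int:
    """Smallest n with z * s / sqrt(n) <= max_error, i.e. n >= (z * s / max_error) ** 2."""
    n_exact = (z * s / max_error) ** 2
    return math.ceil(n_exact)


print(f"{(2.57 * 38 / 4) ** 2:.2f}")       # 596.09
print(sample_size_for_mean(2.57, 38, 4))   # 597
```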

32
Calculating the Appropriate Sample Size: Proportions
In some applications, you may want to know how large a
sample is necessary in order for your margin of error to be
no larger than a certain amount
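Recall that the standardized value of x is found as

$$Z = \frac{x - \mu_x}{\sigma_x}$$

For the sample proportion:

$$Z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}}$$

Note that we have used $\hat{p}$ in calculating the standard deviation of the sample proportion.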

33
Calculating the Appropriate Sample Size: Proportions
Rearranging and solving for n yields

$$n = \frac{Z^2\,\hat{p}(1-\hat{p})}{(\hat{p} - p)^2}$$

where Z is determined by the confidence level, and $\hat{p} - p$ is the maximum margin of error.
Problem: we are trying to figure out how large a sample to take, so we haven't taken a sample yet and we don't know what $\hat{p}$ is.
Solution:
Use 0.5 for the value of $\hat{p}$ (this makes $\hat{p}(1-\hat{p})$ as large as possible, so the resulting sample size is the most conservative), or
take a pilot survey (a preliminary sample) to get an idea of the proportion you are likely to see when you take your real sample.
34
Calculating the Appropriate Sample Size: Proportions
In a survey, CNN wants to estimate the proportion of Americans who plan to travel this Christmas. If they want to be 95% sure that their estimate is off by no more than 2%, how many people must they survey?
They want $\hat{p} - p = 0.02$, and to be 95% sure they need to move 1.96 standard deviations (so Z = 1.96). Therefore, using $\hat{p} = 0.5$ and the formula on the previous page:

$$n = \frac{(1.96)^2 (0.5)(0.5)}{(0.02)^2} = 2401$$

Interpretation: if CNN takes a sample of 2401, then they can be 95% sure that their estimate is no more than 2% from the true population proportion.
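A sketch of the proportion sample-size formula in Python; the function name `sample_size_for_proportion` and the small round-off tolerance are our own choices. The second call uses an arbitrary guess of 0.3 to illustrate why 0.5 is the conservative default: any other guess gives a smaller n.

```python
import math


def sample_size_for_proportion(z: float, max_error: float, p_guess: float = 0.5) -> int:
    """Smallest n with z * sqrt(p_guess * (1 - p_guess) / n) <= max_error."""
    n_exact = z ** 2 * p_guess * (1 - p_guess) / max_error ** 2
    # Subtract a tiny tolerance so float round-off (e.g. 2401.0000000000005)
    # does not push an exact whole number up to the next integer.
    return math.ceil(n_exact - 1e-9)


print(sample_size_for_proportion(1.96, 0.02))        # 2401 with the conservative guess 0.5
print(sample_size_for_proportion(1.96, 0.02, 0.3))   # 2017: a guess of 0.3 needs fewer people
```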
35
Small Sample Confidence Intervals
The previous section on constructing confidence intervals is
valid if the sample size is > 30
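As a rule of thumb, the Central Limit Theorem can only be relied on when the sample size is > 30.
If the sample size is < 30, we cannot rely on the normal approximation. Instead (provided the population itself is roughly normal), the sample mean, standardized using s rather than $\sigma$, has a STUDENT-T distribution.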
The Student-t distribution looks like the normal distribution, but
it has more area in the tails of the distribution

The ONLY difference when constructing confidence intervals
for small samples versus constructing them for large samples is
that for small samples, use a t number instead of a Z number.
36
In the formula

$$\left(\bar{X} - Z_L\frac{s}{\sqrt{n}},\; \bar{X} + Z_L\frac{s}{\sqrt{n}}\right)$$

use a t number instead of the $Z_L$ number.
37
Choosing a t
Which t number should you use?
There is a different t-distribution for each number of degrees of freedom.
DEGREES OF FREEDOM = n − 1
The numbers in the body of the t-distribution table are STANDARD DEVIATIONS, not probabilities.
If your sample size is 28, then you have 28 − 1 = 27 degrees of freedom.
As the degrees of freedom get larger, the t-distribution looks more and more like the normal distribution.



38
If you want to form a 90% confidence interval (and you have 27 degrees of freedom), choose the column headed by .05 and the row for 27 degrees of freedom.
You should find a number of 1.703.
t numbers are subscripted by the degrees of freedom and by how much area lies beyond a certain point.
In this example, we would have $t_{27,.05} = 1.703$.
We will also start doing this with our Z numbers, subscripting the Zs with the amount of area BEYOND a certain point.
If we had a LARGE sample and wanted to form a 95% confidence interval, we would use $Z_{.025} = 1.96$ (an area of .025 lies beyond 1.96 in each tail).
Notice that the column headings are the areas in the TAILS of the t-distribution.
Therefore, the area between 0 and a given number of standard deviations is 0.5 minus the area in the upper tail.



39
Interpretation: for a t-distribution with 27 degrees of freedom, you need to move 1.703 standard deviations from the mean to have an area of .45 between 0 and 1.703, and therefore .90 between −1.703 and 1.703.
See the next slide for a graph.
Note: with a small sample, you cannot use the t table to form a confidence interval at any arbitrary confidence level; you can only look up confidence levels of 60%, 80%, 90%, 95%, 98%, and 99%.

40
90% Confidence Interval with Small Samples
[Figure: Student-t curve (27 degrees of freedom) with the areas from −1.703 to 0 and from 0 to 1.703 shaded]

$$P(t \ge 1.703) = 0.05, \qquad P(-1.703 \le t \le 0) = 0.45, \qquad P(0 \le t \le 1.703) = 0.45$$
41
Finding t-Numbers
Suppose you have 21 observations and you want to form a 95% confidence interval: $t_{20,.025} = 2.086$
Suppose you have 30 observations and you want to form an 80% confidence interval: $t_{29,.10} = 1.311$
Suppose you have 16 observations and you want to form a 99% confidence interval: $t_{15,.005} = 2.947$
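If no t table is handy, the t numbers can be reproduced numerically. Python's standard library has no Student-t distribution, so this sketch (the function names `t_pdf`, `upper_tail`, and `t_number` are our own) integrates the t density with Simpson's rule and then uses bisection to find the value with a given area in the upper tail. With SciPy installed, `scipy.stats.t.ppf(1 - area, df)` gives the same numbers directly.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    const = math.exp(log_const) / math.sqrt(df * math.pi)
    return const * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail(x: float, df: int, steps: int = 2000) -> float:
    """P(T > x) for x > 0: 0.5 minus the area from 0 to x (Simpson's rule, even steps)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_number(df: int, tail_area: float) -> float:
    """t_{df, tail_area}: the value with tail_area of the distribution beyond it.

    Searches 0..100, which covers the tail areas in a standard t table.
    """
    lo, hi = 0.0, 100.0
    for _ in range(50):             # bisection: upper_tail shrinks as x grows
        mid = (lo + hi) / 2
        if upper_tail(mid, df) > tail_area:
            lo = mid                # too much area beyond mid, so move right
        else:
            hi = mid
    return (lo + hi) / 2


for df, area in [(27, 0.05), (20, 0.025), (29, 0.10), (15, 0.005)]:
    print(f"t_{{{df},{area}}} = {t_number(df, area):.3f}")
```

As the degrees of freedom grow, the result approaches the normal value: `t_number(1000, 0.025)` is about 1.962, close to Z = 1.96.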
42
Example
Dell Publishing samples 18 shipments to estimate the mean
postal cost. The sample mean is $25.36 with a standard
deviation of $4.80. Calculate the 98% confidence interval
for the mean postal cost.
Note that the sample size is < 30, so the Central Limit Theorem DOES NOT apply, and the standardized sample mean has a Student-t distribution.
$\bar{X} = 25.36$, $s = 4.80$, $n = 18$, $t_{17,.01} = 2.567$
98% confidence interval for the mean postal cost:

$$\left(\bar{X} - t_{17,.01}\frac{s}{\sqrt{n}},\; \bar{X} + t_{17,.01}\frac{s}{\sqrt{n}}\right) = \left(25.36 - 2.567\frac{4.80}{\sqrt{18}},\; 25.36 + 2.567\frac{4.80}{\sqrt{18}}\right) = (22.46, 28.26)$$
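A sketch checking the small-sample postal-cost interval, with the table value $t_{17,.01} = 2.567$ plugged in as given. For comparison only, it also computes what the large-sample Z = 2.33 would give with the same data, which makes the wider t interval visible.

```python
from math import sqrt

xbar, s, n = 25.36, 4.80, 18
se = s / sqrt(n)                 # standard deviation of the sample mean

t_value = 2.567                  # t_{17,.01} from the t table (17 = n - 1 degrees of freedom)
z_value = 2.33                   # large-sample Z for 98%, shown only for comparison

t_interval = (xbar - t_value * se, xbar + t_value * se)
z_interval = (xbar - z_value * se, xbar + z_value * se)

print("t interval: ({:.2f}, {:.2f})".format(*t_interval))   # t interval: (22.46, 28.26)
print("Z interval: ({:.2f}, {:.2f})".format(*z_interval))   # narrower than the t interval
```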
