Professional Documents
Culture Documents
Sample proportion
2
(
y
y
)
i
n 1
population proportion
estimates
Confidence Intervals
A confidence interval (CI) is an interval of numbers
believed to contain the parameter value.
The probability the method produces an interval that contains
the parameter is called the confidence level. Most studies use
a confidence level close to 1, such as 0.95 or 0.99.
Most CIs have the form
/ n (1 ) / n
Finding a CI in practice
Complication: The true standard error
/ n (1 ) / n
by se
n
n
Example: What percentage of 18-22 yearold Americans report being very happy?
Recent GSS data: 35 of n = 164 say they are very happy
(others report being pretty happy or not too happy)
.10
.05
.01
/2
.050
.025
.005
z/2
1.645
1.96
2.58
z ( se) with se (1 ) / n
z-value such that prob. for a normal dist within z standard errors of
mean equals confidence level
(e.g., z=1.96 for 95% confidence, z=2.58 for 99% conf.)
With n for most polls (roughly 1000), margin of error usually
about 0.03 (ideally)
Method requires large n so sampling distribution of sample
proportion is approximately normal (CLT) and estimate of true
standard error is decent
In practice, ok if at least 15 observations in the category of
interest and at least 15 observations not in it.
(ex. n=164, 35 very happy, 164 35 = 129 not very happy,
condition satisfied)
se (1 ) / n 0.0(1.0) / 20 0.000
95% CI for population proportion is 0.0 1.96(0.0), or (0.0, 0.0)
Better CI method (due to Edwin Wilson at Harvard in 1927, but
not in most statistics books):
Rather than estimating the standard error, figure out values for
which
| | 1.96 (1 ) / n
Example: for n = 20 with 0,
solving the quadratic equation this gives for provides solutions
0 and 0.16, so 95% CI is (0, 0.16)
Agresti and Coull (1998) suggested using ordinary CI,
estimate z(se), after adding 2 observations of each type.
This simpler approach works well even for very small n
(95% CI has same midpoint as Wilson CI);
Example: 0 vegetarians, 20 non-vegetarians
Change to 2 veg, 22 non-veg, and then we find
Thus,
P ( 1.96 y y 1.96 y ) .95
We can be 95% confident that the sample mean
s
se
n
95% confidence interval for :
s
y 1.96( se), which is y 1.96
n
This works ok for large n, because s then a good estimate of
(and CLT applies). But for small n, replacing by its
estimate s introduces extra error, and CI is not quite wide
enough unless we replace z-score by a slightly larger t-score.
Part of a t table
df
1
10
16
30
100
infinity
90%
t.050
6.314
1.812
1.746
1.697
1.660
1.645
Confidence Level
95%
98%
t.025
t.010
12.706
31.821
2.228
2.764
2.120
2.583
2.042
2.457
1.984
2.364
1.960
2.326
99%
t.005
63.657
3.169
2.921
2.750
2.626
2.576
se obtained as
se s / n 7.157 / 17 1.736
= 2.865,
s = 2.617
Multiple choice:
a. We can be 95% confident the sample mean is between 2.69
and 3.04.
b. 95% of the population watches between 2.69 and 3.04 hours
of TV per day
c. We can be 95% confident the population mean is between
2.69 and 3.04
d. If random samples of size 899 were repeatedly selected, in
the long run 95% of them would contain
= 2.865
Solution
(instead of n = 1067)
Its easier to estimate a population proportion as the value
gets closer to 0 or 1 (close election, near 0.50, difficult)
Better to use approx value for rather than 0.50 unless you
have no idea about its value.
n (1 )
z
n
M
2
z
2 1.96
n 7
2
M
47
yP
(
y
)
(
y
)
P( y) 1
0
= 1,
2
Possible samples
(equally likely)
(0, 0)
(0, 2)
(2, 0)
(2, 2)
Mean of estimates
( yi y )2
n
0
1
1
0
0.5
( yi y ) 2
n 1
0
2
2
0
1.0
( yi ) 2
n
1
1
1
1
1.0