Chapter 8
P. 253-278
Collecting a random sample
Goal: to understand characteristics about a
population
Examples:
What's the average commuting time for city residents?
What's the average household income of the patrons of a
particular grocery store?
What's the average leaf size of birch trees on August 1
in a particular state park?
What proportion of people in a particular tropical city have
had malaria?
Estimating the mean
One of the most common goals of statistical
inference is estimating a population mean
with a sample mean
Central Limit Theorem
When we have n independent, identically distributed random
variables ($X_1, \dots, X_n$), the distribution of their mean
$\bar{X}$ approaches a normal distribution with mean $\mu$ and
variance $\sigma^2 / n$ as n gets large.
Independence of random variables means that the
value of one observation has no effect on the value
of another observation.
[Figure: population distribution, x-axis from 0 to 30]
Mean ($\mu$) = 20
Variance ($\sigma^2$) = 100
Std. dev ($\sigma$) = 10
Suppose we take a sample of 50 observations from this population and find
its mean (suppose this mean = 19). Take another sample of 50 observations
and find the mean (suppose it's 24). Do this many times, and we'll come up
with a distribution of means. The Central Limit Theorem tells us this
distribution will be approximately normal, as on the next slide (as long as
n is large, and 50 is large enough):
The normal curve
[Figure: normal curve of the sample mean $\bar{x}$, x-axis from 16 to 24]
Mean ($\mu$) = 20
Sample size (n) = 50
Variance of sample mean = $\sigma^2 / n = 100 / 50 = 2$
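The repeated-sampling idea above can be checked with a short simulation. This is only a sketch: the slides do not say what shape the population has, so the gamma population below (shape 4, scale 5, giving mean 20 and variance 100) is an assumption, as are the function name, the number of repetitions, and the seed.

```python
import random
import statistics

def simulate_sample_means(n=50, repeats=5000, seed=0):
    """Draw `repeats` samples of size n and return the list of sample means."""
    rng = random.Random(seed)
    means = []
    for _ in range(repeats):
        # Assumed population: gamma(shape=4, scale=5), so mean 20 and variance 100.
        sample = [rng.gammavariate(4, 5) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

means = simulate_sample_means()
# The mean of the sample means should be near 20,
# and their variance near sigma^2 / n = 100 / 50 = 2.
print("mean of sample means:", round(statistics.fmean(means), 2))
print("variance of sample means:", round(statistics.variance(means), 2))
```

Even though the assumed population is skewed, a histogram of `means` should look roughly bell-shaped, which is the point of the theorem.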
Symbols
Population Parameter: $\theta$
Estimate: $\hat{\theta}$
Expected: $E(\hat{\theta})$
Basic Types of Inference
Point Inference
The value of a population parameter is estimated using a
single value
Examples: mean, standard deviation, etc.
Interval Inference
Attaching a probability to an estimate (i.e., making a
confidence interval)
Population Proportions: $p \approx X / n$
Population Variance: $\sigma^2 \approx s^2$
$\alpha = 1 - 0.95 = 0.05$
$P\left(17 - 1.96 \frac{30}{\sqrt{100}} \le \mu \le 17 + 1.96 \frac{30}{\sqrt{100}}\right) = 0.95$
So we can say that the 95% C.I. is 17 +/- 5.88, or (11.12, 22.88)
Example #1 Questions
What would happen to our interval if we
used a 99% confidence interval
instead?
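A minimal sketch for checking Example #1 and exploring the 99% question, using the values that appear in the interval above (sample mean 17, $\sigma$ = 30, n = 100). The function name `z_interval` is illustrative; the critical value comes from the standard library's `NormalDist` rather than a z-table, so it is slightly more precise than 1.96.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Two-sided z confidence interval for a mean with known sigma."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

ci95 = z_interval(17, 30, 100, 0.95)
ci99 = z_interval(17, 30, 100, 0.99)
print("95% C.I.:", ci95)
print("99% C.I.:", ci99)
```

Raising the confidence level increases $z_{\alpha/2}$ (from about 1.96 to about 2.58), so the 99% interval is wider than the 95% one.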
This means that even when we have s instead of $\sigma$ we can use the
z-distribution if n is large
Central Limit Theorem: as n gets large.
What is large?
Rule of thumb: $n \ge 30$
For n less than 30, the distribution of $\bar{x}$ does not follow the normal
distribution accurately enough.
For this class, use the t-distribution any time you have s instead of $\sigma$
Example #2
n = 16
$\bar{x}$ = 30
$s^2$ = 1600
What is the 95% C.I. for the mean?
Example #2
s = 40
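Degrees of freedom = n - 1 = 15
$t_{\alpha/2,\,n-1} = t_{0.05/2,\,16-1} = t_{0.025,\,15} = 2.131$ (from the t-table)
$P\left(\bar{X} - 2.131 \frac{s}{\sqrt{n}} \le \mu \le \bar{X} + 2.131 \frac{s}{\sqrt{n}}\right) = 0.95$
$P\left(30 - 2.131 \frac{40}{\sqrt{16}} \le \mu \le 30 + 2.131 \frac{40}{\sqrt{16}}\right) = 0.95$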
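The arithmetic for Example #2 can be finished with a few lines of Python. This is a sketch: the helper name `t_interval` is illustrative, and the critical value 2.131 is taken from the t-table lookup above rather than computed, since the standard library has no t-distribution.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """Two-sided t confidence interval for a mean, given a table value t_crit."""
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin

# Example #2: n = 16, sample mean 30, s^2 = 1600, t_{0.025,15} = 2.131
low, high = t_interval(xbar=30, s=sqrt(1600), n=16, t_crit=2.131)
print(f"95% C.I.: ({low:.2f}, {high:.2f})")  # 30 +/- 21.31
```

The margin is 2.131 * 40 / 4 = 21.31, so the interval runs from about 8.69 to 51.31, which is wide because the sample is small and s is large.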
This also limits us to using only large samples (in this case n > 100)
$\hat{p} - z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
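As a sketch of the proportion interval, the function below evaluates $\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1 - \hat{p})/n}$. The counts used (450 successes out of 1000) are hypothetical numbers chosen only for illustration; they do not come from the slides, and the function name is likewise an assumption.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Large-sample (normal approximation) confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical data: 450 of 1000 sampled people answer "yes".
p_low, p_high = proportion_interval(450, 1000)
print(f"95% C.I. for p: ({p_low:.3f}, {p_high:.3f})")
```

The normal approximation behind this formula is why the method is restricted to large samples.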
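Equation for $\mu$: $n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$
Equation for p: $n = \left(\frac{z_{\alpha/2} \sqrt{p(1 - p)}}{E}\right)^2 = \frac{z_{\alpha/2}^2\, p(1 - p)}{E^2}$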
our equation? $n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$
What should $z_{\alpha/2}$ be?
$z_{\alpha/2}$ = 1.96
E = 100
$\sigma$ = 175
n = number of days we should sample
$n = \left(\frac{1.96 \times 175}{100}\right)^2$
$n \approx 11.765$
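A small sketch of the sample-size calculation for a mean, using the values above (z = 1.96, $\sigma$ = 175, E = 100). The function name is illustrative. Because a sample size must be a whole number, the usual practice is to round the result up, which is an added step not shown on the slide.

```python
from math import ceil

def sample_size_for_mean(z, sigma, error):
    """Required n so that the margin of error z * sigma / sqrt(n) is at most `error`."""
    return (z * sigma / error) ** 2

n_exact = sample_size_for_mean(z=1.96, sigma=175, error=100)
n_days = ceil(n_exact)  # round up to a whole number of days
print(f"n = {n_exact:.3f}, so sample {n_days} days")
```

Rounding down to 11 would leave the margin of error slightly larger than the target of 100.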
Example #5
A city council election is being held with several
candidates expecting reasonably large returns.
p = 0.45
n = number of people we should sample
$n = \left(\frac{2.58 \sqrt{0.45 \times 0.55}}{0.005}\right)^2 = \frac{2.58^2 \times 0.2475}{0.005^2}$
$n \approx 65{,}898.4$ (round up to 65,899)
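The sample-size formula for a proportion, $n = z_{\alpha/2}^2\, p(1 - p) / E^2$, can be evaluated directly with the Example #5 inputs (z = 2.58, p = 0.45, E = 0.005). This is a sketch with an illustrative function name; note that p(1 - p) enters under a square root inside the squared term, so it appears to the first power here.

```python
from math import ceil

def sample_size_for_proportion(z, p, error):
    """Required n so that z * sqrt(p * (1 - p) / n) is at most `error`."""
    return z ** 2 * p * (1 - p) / error ** 2

n_exact_p = sample_size_for_proportion(z=2.58, p=0.45, error=0.005)
n_people = ceil(n_exact_p)  # round up to a whole number of people
print(f"n = {n_exact_p:.1f}, so sample {n_people} people")
```

Such a large n follows from the very small margin of error: halving E quadruples the required sample size.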
Class Problem
Given this sample of middle school kid
heights (in inches)
56, 64, 52, 69, 66, 64, 63, 46, 46, 49, 47,
60, 54, 45, 45, 69, 62, 67, 49, 43, 59
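$P\left(\bar{X} - 2.845 \frac{s}{\sqrt{n}} \le \mu \le \bar{X} + 2.845 \frac{s}{\sqrt{n}}\right) = 0.99$
$P\left(55.95 - 2.845 \frac{8.96}{\sqrt{21}} \le \mu \le 55.95 + 2.845 \frac{8.96}{\sqrt{21}}\right) = 0.99$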
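The class problem can be reproduced from the raw heights. The sketch below computes the sample mean and standard deviation with the standard library and plugs in the t-table value 2.845 used above (99% confidence, 20 degrees of freedom); the variable names are illustrative.

```python
from math import sqrt
import statistics

heights = [56, 64, 52, 69, 66, 64, 63, 46, 46, 49, 47,
           60, 54, 45, 45, 69, 62, 67, 49, 43, 59]

n_heights = len(heights)              # 21 observations
xbar = statistics.fmean(heights)      # sample mean
s = statistics.stdev(heights)         # sample standard deviation (n - 1 denominator)
t_crit = 2.845                        # t_{0.005,20}, taken from the t-table

margin = t_crit * s / sqrt(n_heights)
ci_low, ci_high = xbar - margin, xbar + margin
print(f"mean = {xbar:.2f}, s = {s:.2f}")
print(f"99% C.I.: ({ci_low:.2f}, {ci_high:.2f})")
```

The computed mean (about 55.95) and standard deviation (about 8.96) match the values in the interval above, and the margin works out to roughly 5.56 inches.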
For Monday