CHAPTER 7
[Diagram: Statistical Methods divide into Descriptive Statistics and Inferential Statistics; Inferential Statistics divides into Estimation and Hypothesis Testing.]
Estimation Process
ESTIMATION
Estimation is the process of using information from the sample (often using $\bar{X}$ and $s$) to make inferences about population parameters such as $\mu$. This reverses the process examined so far, where we have used information about $\mu$ and $\sigma$ to make probability statements about $\bar{X}$.
Unbiasedness
Efficiency
Consistency
Rules and criteria - Bias
An unbiased estimator is one which, on average, will give the right
answer. The expected value of the estimator (i.e. averaged over
many applications of the estimator) is equal to the population
parameter.
$$E(\hat{\theta}) = \theta$$
The sample mean, variance and proportion are unbiased estimators
of the corresponding population parameters.
$$\mathrm{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$$
Properties of Point Estimators
Usually, if there is an unbiased estimator of a population parameter,
several others exist as well.
$$E(\bar{x}) = E\left(\frac{x_1 + x_2 + \cdots + x_n}{n}\right) = \frac{1}{n}\bigl(E(x_1) + E(x_2) + \cdots + E(x_n)\bigr) = \frac{1}{n}(\mu + \mu + \cdots + \mu) = \mu$$
Since $E(x_s) < E(\bar{x})$, where $x_s$ is the smallest sample observation, (c) is biased downwards.
For (d), $E(\bar{x} + 1/n) = \mu + 1/n > \mu$, so (d) is biased upwards.
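The bias statements above can be checked by simulation. The sketch below is illustrative only: the normal population, its parameters, the sample size and the number of replications are all assumptions chosen for the demonstration, not values from these notes. It averages three estimators over many samples: the sample mean, the smallest observation, and the sample mean plus 1/n.

```python
import random
from statistics import fmean

random.seed(0)           # fixed seed so the run is repeatable
MU, SIGMA = 10.0, 2.0    # hypothetical population mean and standard deviation
N, REPS = 5, 20000       # hypothetical sample size and number of samples

means, minima, shifted = [], [], []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = fmean(sample)
    means.append(xbar)             # sample mean
    minima.append(min(sample))     # smallest sample observation
    shifted.append(xbar + 1 / N)   # sample mean plus 1/n

# Estimated bias = average of the estimator over all samples, minus mu
bias_mean = fmean(means) - MU
bias_min = fmean(minima) - MU
bias_shifted = fmean(shifted) - MU

print(f"bias of sample mean:        {bias_mean:+.3f}")
print(f"bias of smallest value:     {bias_min:+.3f}")
print(f"bias of sample mean + 1/n:  {bias_shifted:+.3f}")
```

Under these assumptions the first estimated bias should be close to zero, the second clearly negative, and the third close to 1/n = 0.2.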
EXAMPLES
The sample mean $\bar{X}$ and sample variance $S^2$ are unbiased estimators of population mean $\mu$ and population variance $\sigma^2$ respectively.
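Derivations
$$P(-z_{\alpha/2} \le Z \le +z_{\alpha/2}) = 1 - \alpha$$
Therefore,
$$1 - \alpha = P\left(-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le +z_{\alpha/2}\right)$$
$$= P\left(-z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \bar{X} - \mu \le +z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$
$$= P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$
Therefore, the interval
$$\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$
is a large sample confidence interval for $\mu$ with confidence coefficient $(1-\alpha)$.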
How to construct Confidence Intervals
Select a confidence level. The confidence level refers to the likelihood that the true population parameter lies within the range specified by the confidence interval. The confidence level is usually expressed as a percentage. Thus, a 95% confidence level suggests that the probability that the true population parameter lies within the confidence interval is 0.95.
Compute alpha. Alpha refers to the likelihood that the true population parameter lies
outside the confidence interval. Alpha is usually expressed as a proportion. Thus, if
the confidence level is 95%, then alpha would equal 1 - 0.95 or 0.05.
Identify a sample statistic (e.g., mean, standard deviation) to serve as a point estimate of the population parameter.
Specify the sampling distribution of the statistic.
Based on the sampling distribution of the statistic, find the value for which the
cumulative probability is 1 – α/2. That value is the upper limit of the confidence
interval.
In a similar way, find the value for which the cumulative probability is α/2. That value is the lower limit of the confidence interval.
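The steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming the sampling distribution of the statistic is normal and centred on the point estimate; the function name and the numbers in the usage line are hypothetical.

```python
from statistics import NormalDist

def normal_confidence_interval(point_estimate, standard_error, confidence=0.95):
    """Return (lower, upper) limits from the quantiles of a normal sampling distribution."""
    alpha = 1 - confidence                      # compute alpha from the confidence level
    sampling_dist = NormalDist(mu=point_estimate, sigma=standard_error)
    lower = sampling_dist.inv_cdf(alpha / 2)        # cumulative probability alpha/2
    upper = sampling_dist.inv_cdf(1 - alpha / 2)    # cumulative probability 1 - alpha/2
    return lower, upper

# Hypothetical point estimate 50 with standard error 2, at 95% confidence
lower, upper = normal_confidence_interval(50.0, 2.0, confidence=0.95)
print(f"{lower:.2f} to {upper:.2f}")
```

For other sampling distributions (t, chi-square, F) the same two quantile lookups apply, but the quantiles would come from that distribution instead.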
Large-Sample Confidence Interval for a Population Mean
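$$\bar{X} = \mu \pm Z\sigma_{\bar{x}}$$
[Figure: sampling distribution of $\bar{X}$ with standard error $\sigma_{\bar{x}}$. 90% of samples fall within $\mu \pm 1.65\sigma_{\bar{x}}$, 95% within $\mu \pm 1.96\sigma_{\bar{x}}$, and 99% within $\mu \pm 2.58\sigma_{\bar{x}}$.]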
Example
Example: Suppose we draw a sample of 100 observations of returns on the Gini index, assumed to be normally distributed, with sample mean 4% and standard deviation 6%.
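As a worked illustration of the large-sample interval $\bar{X} \pm z_{\alpha/2}\, s/\sqrt{n}$ applied to these figures, the sketch below computes the limits. The 95% confidence level is an assumption made here for illustration; the example as stated does not give one.

```python
from math import sqrt
from statistics import NormalDist

n = 100            # sample size from the example
xbar = 4.0         # sample mean, in percent
s = 6.0            # sample standard deviation, in percent
confidence = 0.95  # assumed level; not stated in the example

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # upper alpha/2 point, about 1.96
margin = z * s / sqrt(n)                            # z times the estimated standard error
lower, upper = xbar - margin, xbar + margin

print(f"{lower:.3f}% to {upper:.3f}%")
```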
The resulting interval is to be of the form $\bar{X} \pm 0.1$, with $1-\alpha = 0.95$.
Thus, $B = 0.1$ and $z_{0.025} = 1.96$. It follows that
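Suppose we want to study the mean life length of batteries, and only a lower limit for the mean is of interest.
We need to find a statistic $g(\bar{X})$ so that $P[\mu > g(\bar{X})] = 1 - \alpha$.
Again using the definition of $Z$ we have
$$P\left(\bar{X} - z_{\alpha}\frac{\sigma}{\sqrt{n}} \le \mu\right) = 1 - \alpha$$
or
$$g(\bar{X}) = \bar{X} - z_{\alpha}\frac{\sigma}{\sqrt{n}}$$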
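A one-sided lower limit of this kind puts all of α in one tail, so it uses $z_{\alpha}$ instead of $z_{\alpha/2}$. The sketch below shows the calculation; the battery figures (mean 500 hours, σ = 40, n = 64) are hypothetical and are not taken from these notes.

```python
from math import sqrt
from statistics import NormalDist

def lower_confidence_bound(xbar, sigma, n, confidence=0.95):
    """One-sided lower limit: xbar - z_alpha * sigma / sqrt(n)."""
    z_alpha = NormalDist().inv_cdf(confidence)  # all of alpha sits in one tail
    return xbar - z_alpha * sigma / sqrt(n)

# Hypothetical battery data: mean life 500 hours, sigma 40 hours, 64 batteries
bound = lower_confidence_bound(xbar=500.0, sigma=40.0, n=64, confidence=0.95)
print(f"mu is above {bound:.2f} hours with 95% confidence")
```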
Binomial Distribution – Large Sample CI for p
Estimating the parameter p for the Binomial distribution is analogous
to the estimation of a population mean.
Note that $Y/n = \sum_{i=1}^{n} X_i / n = \bar{X}$ with $X_i$ having $E(X_i) = p$ and $V(X_i) = p(1-p)$. From the CLT, we have that $Y/n$ is approximately normally distributed with mean $p$ and variance $p(1-p)/n$.
The confidence interval is constructed in a similar manner as the confidence interval for $\mu$. The observed fraction of successes $y/n$ will be used as the estimate of $p$ and will be written as $\hat{p}$.
The CI for $p$ is of the form:
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
with confidence $(1-\alpha)$.
We have that $1-\alpha = 0.90$ and $z_{\alpha/2} = 1.645$, $y = 20$ and $n = 100$.
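With these values, the interval $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ can be evaluated directly. The sketch below is only an illustration of that arithmetic; the variable names are chosen here.

```python
from math import sqrt
from statistics import NormalDist

y, n = 20, 100       # observed successes and sample size from the example
confidence = 0.90

p_hat = y / n                                         # point estimate of p
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)    # about 1.645
std_err = sqrt(p_hat * (1 - p_hat) / n)               # estimated standard error of p_hat
margin = z * std_err
lower, upper = p_hat - margin, p_hat + margin

print(f"p_hat = {p_hat:.2f}, interval {lower:.4f} to {upper:.4f}")
```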
Variability dependent on n,
or sample size.
$$\frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}$$
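The interval for $\sigma^2$ above can be sketched as follows. The sample values (n = 10, s² = 4.0) are hypothetical. The chi-square critical values are passed in as arguments because the Python standard library has no chi-square quantile function; the two numbers used are the usual table values for 9 degrees of freedom (upper-tail areas 0.025 and 0.975).

```python
def variance_confidence_interval(n, s2, chi2_upper, chi2_lower):
    """CI for sigma^2: ((n-1)s^2 / chi2_{alpha/2}, (n-1)s^2 / chi2_{1-alpha/2}).

    chi2_upper is the critical value with upper-tail area alpha/2,
    chi2_lower the critical value with upper-tail area 1 - alpha/2.
    """
    scaled = (n - 1) * s2
    return scaled / chi2_upper, scaled / chi2_lower

# Hypothetical sample: n = 10, s^2 = 4.0; table values for 9 df at 95% confidence
var_lower, var_upper = variance_confidence_interval(
    n=10, s2=4.0, chi2_upper=19.023, chi2_lower=2.700
)
print(f"{var_lower:.3f} < sigma^2 < {var_upper:.3f}")
```

Note that the interval is not symmetric about $s^2$, because the chi-square distribution is skewed.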
Homework
Confidence Intervals – The Multiple Sample Case
Two samples are derived, one from each population, and we have the following:
$$E(\bar{X}_1) = \mu_1, \quad V(\bar{X}_1) = \frac{\sigma_1^2}{n_1}$$
$$E(\bar{X}_2) = \mu_2, \quad V(\bar{X}_2) = \frac{\sigma_2^2}{n_2}$$
Independent Populations
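$$E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2$$
$$V(\bar{X}_1 - \bar{X}_2) = V(\bar{X}_1) + V(\bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \quad \text{(by independence)}$$
$$-z_{\alpha/2} \le \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \le z_{\alpha/2}$$
$$(\bar{X}_1 - \bar{X}_2) - z_{\alpha/2}\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2} \le (\mu_1 - \mu_2) \le (\bar{X}_1 - \bar{X}_2) + z_{\alpha/2}\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$$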
Example
A farm-equipment manufacturer wants to compare the average daily
downtime for two sheet metal stamping machines located in two
different factories. Investigation of company records for 100 randomly
selected days on each of the two machines gave the following results:
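$n_1 = 100$, $n_2 = 100$
$\bar{X}_1 = 12$ minutes, $\bar{X}_2 = 9$ minutes
$S_1^2 = 6$, $S_2^2 = 4$
With $1-\alpha = 0.90$ and $z_{0.05} = 1.645$, the interval is
$$(\bar{X}_1 - \bar{X}_2) \pm z_{0.05}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} = 3 \pm 1.645\sqrt{0.06 + 0.04} = 3 \pm 0.52$$
That is, we are about 90% confident that $\mu_1 - \mu_2$ is between 3 − 0.52 = 2.48 and 3 + 0.52 = 3.52. This evidence suggests that $\mu_1$ is larger than $\mu_2$.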
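The downtime result can be reproduced with a short calculation. This sketch substitutes the sample variances for the unknown population variances, which is the usual large-sample approximation; the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

n1, n2 = 100, 100          # days sampled for each machine
xbar1, xbar2 = 12.0, 9.0   # mean daily downtime, minutes
s1_sq, s2_sq = 6.0, 4.0    # sample variances
confidence = 0.90

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.645
diff = xbar1 - xbar2
margin = z * sqrt(s1_sq / n1 + s2_sq / n2)           # z times the SE of the difference
lower, upper = diff - margin, diff + margin

print(f"{diff:.0f} +/- {margin:.2f}  ->  {lower:.2f} to {upper:.2f}")
```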
Normal Distribution – Same Variance – 2 Samples
$$E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2$$
$$V(\bar{X}_1 - \bar{X}_2) = V(\bar{X}_1) + V(\bar{X}_2) = \frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}$$
We need a good estimator for the common variance, which should
be a function of the two sample variances, but unbiased.
Variance unknown, $\sigma_1 = \sigma_2$.
$$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2/n_1 + s_p^2/n_2}} \sim t_{n_1+n_2-2}$$
$$-t_{\alpha/2,\,n_1+n_2-2} \le \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2/n_1 + s_p^2/n_2}} \le t_{\alpha/2,\,n_1+n_2-2}$$
$$(\bar{X}_1 - \bar{X}_2) - t_{\alpha/2,\,n_1+n_2-2}\, s_p\sqrt{1/n_1 + 1/n_2} \le (\mu_1 - \mu_2) \le (\bar{X}_1 - \bar{X}_2) + t_{\alpha/2,\,n_1+n_2-2}\, s_p\sqrt{1/n_1 + 1/n_2}$$
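The pooled interval can be sketched as below. The pooled variance is taken to be the standard estimator $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$, which is assumed here because its definition does not appear in these notes. The summary statistics are hypothetical, and the t critical value is passed in from a table (2.086 for 20 degrees of freedom at 95% confidence) because the standard library has no t quantile function.

```python
from math import sqrt

def pooled_t_interval(xbar1, xbar2, s1_sq, s2_sq, n1, n2, t_crit):
    """CI for mu1 - mu2 assuming equal variances; t_crit is t_{alpha/2, n1+n2-2}."""
    # Pooled variance: sample variances weighted by their degrees of freedom
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    margin = t_crit * sqrt(sp_sq) * sqrt(1 / n1 + 1 / n2)
    diff = xbar1 - xbar2
    return sp_sq, diff - margin, diff + margin

# Hypothetical summary statistics; t_{0.025, 20} = 2.086 from a t table
sp_sq, lower, upper = pooled_t_interval(
    xbar1=20.0, xbar2=18.0, s1_sq=4.0, s2_sq=5.0, n1=10, n2=12, t_crit=2.086
)
print(f"pooled variance {sp_sq:.2f}, interval {lower:.3f} to {upper:.3f}")
```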
Confidence Interval for the Difference Between Two Binomial Proportions
$$\frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}} \sim N(0,1)$$
$$-z_{\alpha/2} \le \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}} \le z_{\alpha/2}$$
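$$(\hat{p}_1 - \hat{p}_2) - z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \le (p_1 - p_2) \le (\hat{p}_1 - \hat{p}_2) + z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$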
Example
We want to compare the proportion of defective electric motors turned out by two shifts of workers. From the large number of motors produced in a given week, n1=50 motors were selected from the output of shift I, and n2=40 motors were selected from the output of shift II. The sample from shift I revealed four to be defective, and the sample from shift II showed six faulty motors. Estimate the true difference between the proportions of defective motors produced, using a 95% confidence interval.
Note. Since the interval contains zero, there does not appear to be any
significant difference between the rates of defectives for the two
shifts. Zero cannot be ruled out as a plausible value of the true
difference between proportions of defective motors.
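The interval behind this note can be computed from the counts in the example. The sketch below evaluates $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$; the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

n1, defects1 = 50, 4   # shift I: motors sampled, defectives found
n2, defects2 = 40, 6   # shift II: motors sampled, defectives found
confidence = 0.95

p1_hat = defects1 / n1
p2_hat = defects2 / n2
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96

diff = p1_hat - p2_hat
std_err = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
lower, upper = diff - z * std_err, diff + z * std_err

print(f"difference {diff:.2f}, interval {lower:.4f} to {upper:.4f}")
print("interval contains zero:", lower <= 0 <= upper)
```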
Confidence Interval for the Ratio of Two Variances
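$$\frac{s_1^2}{s_2^2} \Big/ \frac{\sigma_1^2}{\sigma_2^2} \sim F_{n_1-1,\,n_2-1}$$
[Figure: density $f(F)$ of the F distribution, with the upper-tail critical value $F_{\alpha,\nu_1,\nu_2}$ marked on the $F$ axis.]
$$F_{1-\alpha/2,\,n_1-1,\,n_2-1} \le \frac{s_1^2}{s_2^2} \Big/ \frac{\sigma_1^2}{\sigma_2^2} \le F_{\alpha/2,\,n_1-1,\,n_2-1}$$
$$\frac{s_1^2}{s_2^2}\left(\frac{1}{F_{\alpha/2,\,n_1-1,\,n_2-1}}\right) \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{s_1^2}{s_2^2}\left(\frac{1}{F_{1-\alpha/2,\,n_1-1,\,n_2-1}}\right)$$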
Homework
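/* Read (brand, dist) pairs; the trailing @@ holds the input line
   so that several pairs are read from each row of data */
DATA iron;
INPUT brand $ dist @@;
CARDS;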
A 251.2 B 263.2 C 269.7 D 251.6
A 245.1 B 262.9 C 263.2 D 248.6
A 248.0 B 265.0 C 277.5 D 249.4
A 251.1 B 254.5 C 267.4 D 242.0
A 260.5 B 264.3 C 270.5 D 246.5
A 250.0 B 257.0 C 265.5 D 251.3
A 253.9 B 262.8 C 270.7 D 261.8
A 244.6 B 264.4 C 272.9 D 249.0
A 254.6 B 260.6 C 275.6 D 247.1
A 248.8 B 255.9 C 266.5 D 245.9
;
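/* One-way ANOVA of dist on brand; TUKEY with CLDIFF requests simultaneous
   confidence limits for all pairwise differences between brand means */
PROC GLM DATA=iron;
CLASS brand;
MODEL dist=brand;
MEANS brand / CLDIFF TUKEY;
RUN;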
Alpha 0.05
Error Degrees of Freedom 36
Error Mean Square 21.17503
Critical Value of Studentized Range 3.80880
Minimum Significant Difference 5.5424
brand Comparison | Difference Between Means | Simultaneous 95% Confidence Limits
b. female - This column gives values of the class variable, in our case female. This variable is necessary
for doing the independent group t-test and is specified by class statement.
c. N - This is the number of valid (i.e., non-missing) observations in each group defined by the variable
listed on the class statement (often called the independent variable).
d. Lower CL Mean and Upper CL Mean - These are the lower and upper confidence limits of the
mean. By default, they are 95% confidence limits.
e. Mean - This is the mean of the dependent variable for each level of the independent variable. On the
last line the difference between the means is given.
f. Lower CL Std Dev and Upper CL Std Dev - These are the lower and upper 95% confidence limits
for the standard deviation for the dependent variable for each level of the independent variable.
g. Std Dev - This is the standard deviation of the dependent variable for each of the levels of the
independent variable. On the last line the standard deviation for the difference is given.
h. Std Err - This is the standard error of the mean. It is used in calculating the t-value.
We compare the mean writing score between the group of female
students and the group of male students. Ideally, these subjects
are randomly selected from a larger population of subjects.
Depending on whether we assume that the variances for both
populations are the same or not, the standard error of the
difference between the group means and the degrees of freedom
are computed differently.
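The two calculations can be compared side by side. The sketch below uses hypothetical summary statistics (not the writing-score data) and assumes the standard formulas: the pooled standard error with $n_1+n_2-2$ degrees of freedom for equal variances, and the unpooled standard error with Satterthwaite's approximate degrees of freedom for unequal variances.

```python
from math import sqrt

def pooled_se_and_df(s1_sq, n1, s2_sq, n2):
    """Equal-variance case: pooled variance, df = n1 + n2 - 2."""
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    return sqrt(sp_sq * (1 / n1 + 1 / n2)), n1 + n2 - 2

def unpooled_se_and_df(s1_sq, n1, s2_sq, n2):
    """Unequal-variance case: separate variances, Satterthwaite's approximate df."""
    v1, v2 = s1_sq / n1, s2_sq / n2
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return sqrt(v1 + v2), df

# Hypothetical groups: a small, variable group and a larger, less variable one
se_pooled, df_pooled = pooled_se_and_df(100.0, 10, 25.0, 30)
se_unpooled, df_unpooled = unpooled_se_and_df(100.0, 10, 25.0, 30)

print(f"equal variances:   SE = {se_pooled:.4f}, df = {df_pooled}")
print(f"unequal variances: SE = {se_unpooled:.4f}, df = {df_unpooled:.2f}")
```

With these made-up numbers the two approaches differ noticeably, because the smaller group has the larger variance; when the group sizes and variances are similar, the two results are close.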