In statistics, estimation refers to the process by which one makes inferences about a population,
based on information obtained from a sample.
Confidence Intervals
Statisticians use a confidence interval to express the precision and uncertainty associated with a
particular sampling method. A confidence interval consists of three parts.
A confidence level.
A statistic.
A margin of error.
The confidence level describes the uncertainty of a sampling method. The statistic and the
margin of error define an interval estimate that describes the precision of the method. The
interval estimate of a confidence interval is defined by the sample statistic ± margin of error.
For example, suppose we compute an interval estimate of a population parameter. We might
describe this interval estimate as a 95% confidence interval. This means that if we used the same
sampling method to select different samples and compute different interval estimates, the true
population parameter would fall within a range defined by the sample statistic ± margin of error
95% of the time.
Confidence intervals are preferred to point estimates, because confidence intervals indicate (a)
the precision of the estimate and (b) the uncertainty of the estimate.
Confidence Level
The probability part of a confidence interval is called a confidence level. The confidence level
describes the likelihood that a particular sampling method will produce a confidence interval that
includes the true population parameter.
Here is how to interpret a confidence level. Suppose we collected all possible samples from a
given population, and computed confidence intervals for each sample. Some confidence intervals
would include the true population parameter; others would not. A 95% confidence level means
that 95% of the intervals contain the true population parameter; a 90% confidence level means
that 90% of the intervals contain the population parameter; and so on.
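The interpretation above can be checked with a small simulation. The sketch below is only an illustration: it draws many random samples rather than all possible samples, and the population (normal, mean 10, standard deviation 2), the sample size, and the function name coverage_rate are assumptions made for the example, not part of the lesson.

```python
import random
import statistics

def coverage_rate(confidence=0.95, n=50, trials=2000, seed=0):
    """Fraction of simulated intervals that contain the true population mean."""
    rng = random.Random(seed)
    true_mean, true_sd = 10.0, 2.0  # hypothetical population parameters
    # Critical z score for the requested confidence level.
    z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / n ** 0.5  # standard error of the mean
        if mean - z * se <= true_mean <= mean + z * se:
            hits += 1
    return hits / trials
```

Calling coverage_rate(0.95) should return a value close to 0.95, and coverage_rate(0.90) a value close to 0.90, which is what the confidence level promises about the sampling method.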
Margin of Error
In a confidence interval, the range of values above and below the sample statistic is called the
margin of error.
For example, suppose the local newspaper conducts an election survey and reports that the
independent candidate will receive 30% of the vote. The newspaper states that the survey had a
5% margin of error and a confidence level of 95%. These findings result in the following
confidence interval: We are 95% confident that the independent candidate will receive between
25% and 35% of the vote.
Note: Many public opinion surveys report interval estimates, but not confidence intervals. They
provide the margin of error, but not the confidence level. To clearly interpret survey results you
need to know both! We are much more likely to accept survey findings if the confidence level is
high (say, 95%) than if it is low (say, 50%).
http://stattrek.com/estimation/estimation-in-statistics.aspx?Tutorial=AP
Notation
The following notation is helpful when we talk about the standard deviation and the standard
error.
Population parameters
P: Proportion of successes in population
μ: Population mean
μi: Mean of population i
σp: Standard deviation of p
σx: Standard deviation of x̄
Sample statistics
p: Proportion of successes in sample
x̄: Sample estimate of population mean
s: Sample estimate of σ
Standard Deviation
Sample mean, x̄
σx = σ / sqrt( n )
Sample proportion, p
σp = sqrt [ P(1 - P) / n ]
Difference between means, x̄1 - x̄2
σx1-x2 = sqrt [ σ1² / n1 + σ2² / n2 ]
Note: In order to compute the standard deviation of a sample statistic, you must know the value
of one or more population parameters.
Standard Error
Sample mean, x̄
SEx = s / sqrt( n )
Sample proportion, p
SEp = sqrt [ p(1 - p) / n ]
The equations for the standard error are identical to the equations for the standard deviation,
except for one thing - the standard error equations use statistics where the standard deviation
equations use parameters. Specifically, the standard error equations use p in place of P, and s in
place of σ.
http://stattrek.com/estimation/standard-error.aspx?tutorial=ap
Margin of Error
In a confidence interval, the range of values above and below the sample statistic is called the
margin of error.
For example, suppose we wanted to know the percentage of adults that exercise daily. We could
devise a sample design to ensure that our sample estimate will not differ from the true population
value by more than, say, 5 percent (the margin of error) 90 percent of the time (the confidence
level).
If you know the standard deviation of the statistic, compute the margin of error as the critical
value times the standard deviation of the statistic. Otherwise, compute it as the critical value
times the standard error of the statistic. Previously, we described how to compute the
standard deviation and standard error.
When one of these conditions is satisfied, the critical value can be expressed as a t score or as a z
score. To find the critical value, follow these steps.
To express the critical value as a z score, find the z score having a cumulative
probability equal to the critical probability (p*).
The critical t score (t*) is the t score having degrees of freedom equal
to DF and a cumulative probability equal to the critical probability (p*).
Should you express the critical value as a t score or as a z score? There are several ways to
answer this question. As a practical matter, when the sample size is large (greater than 40), it
doesn't make much difference. Both approaches yield similar results. Strictly speaking, when the
population standard deviation is unknown or when the sample size is small, the t score is
preferred. Nevertheless, many introductory statistics texts use the z score exclusively. On this
web site, we provide sample problems that illustrate both approaches.
You can use the Normal Distribution Calculator to find the critical z score, and the t Distribution
Calculator to find the critical t score. You can also use a graphing calculator or standard
statistical tables (found in the appendix of most introductory statistics texts).
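As an alternative to a calculator or table, the critical z score can be computed with Python's standard library. This is a minimal sketch, assuming the usual definitions alpha = 1 - confidence level and p* = 1 - alpha / 2, which this section refers to but does not spell out; the function name critical_z is made up for the example.

```python
from statistics import NormalDist

def critical_z(confidence_level):
    """Critical z score for a two-sided interval.

    confidence_level is a fraction such as 0.95 (not a percentage).
    """
    alpha = 1 - confidence_level
    p_star = 1 - alpha / 2  # critical probability (assumed definition)
    # z score whose cumulative probability equals p*.
    return NormalDist().inv_cdf(p_star)
```

For a 95% confidence level this gives a value near 1.96. The standard library has no t distribution, so a critical t score needs a table or a package such as SciPy (scipy.stats.t.ppf takes the cumulative probability and the degrees of freedom).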
To express a confidence interval, you need three pieces of information:
Confidence level
Statistic
Margin of error
Given these inputs, the range of the confidence interval is defined by the sample statistic ±
margin of error. And the uncertainty associated with the confidence interval is specified by the
confidence level.
Often, the margin of error is not given; you must calculate it. Previously, we described how to
compute the margin of error.
Identify a sample statistic. Choose the statistic (e.g., sample mean, sample
proportion) that you will use to estimate a population parameter.
Find the margin of error. If you are working on a homework problem or a test
question, the margin of error may be given. Often, however, you will need to
compute the margin of error, based on one of the following equations.
Margin of error = Critical value * Standard deviation of statistic
Margin of error = Critical value * Standard error of statistic
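The steps above can be sketched in a few lines of Python. This is an illustration only: the function name confidence_interval and its parameter names are hypothetical, and the margin of error is taken to be the critical value times the standard deviation (or standard error) of the statistic.

```python
def confidence_interval(statistic, critical_value, spread):
    """Return (lower, upper) = statistic -/+ critical_value * spread.

    spread is the standard deviation of the statistic if it is known,
    otherwise its standard error.
    """
    margin_of_error = critical_value * spread
    return statistic - margin_of_error, statistic + margin_of_error
```

For the newspaper survey described earlier, a sample proportion of 0.30 with a critical value of 1.96 and a hypothetical standard error of 0.0255 gives an interval of roughly 0.25 to 0.35.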
Estimation Requirements
The approach described in this lesson is valid whenever the following conditions are met:
σp = sqrt[ P * ( 1 - P ) / n ] * sqrt[ ( N - n ) / ( N - 1 ) ]
where P is the population proportion, n is the sample size, and N is the
population size. When the population size is much larger (at least 10 times
larger) than the sample size, the standard deviation can be approximated by:
σp = sqrt[ P * ( 1 - P ) / n ]
When the true population proportion P is not known, the standard deviation of
the sampling distribution cannot be calculated. Under these circumstances,
use the standard error. The standard error (SE) provides an unbiased estimate
of the standard deviation. It can be calculated from the equation below.
SEp = sqrt[ p * ( 1 - p ) / n ]
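To see how much the finite population factor matters, the formulas for a proportion can be written as two small functions. This is a sketch under the definitions given above; the function names sd_proportion and se_proportion are made up for the example.

```python
from math import sqrt

def sd_proportion(P, n, N=None):
    """Standard deviation of a sample proportion.

    P is the population proportion, n the sample size, N the population size.
    When N is given, the factor sqrt((N - n) / (N - 1)) is applied;
    when N is omitted, the approximate formula is used.
    """
    sd = sqrt(P * (1 - P) / n)
    if N is not None:
        sd *= sqrt((N - n) / (N - 1))
    return sd

def se_proportion(p, n):
    """Standard error: the approximate formula with the sample proportion p."""
    return sqrt(p * (1 - p) / n)
```

With P = 0.5 and n = 100, the approximate formula gives 0.05. If the population has only N = 1000 members (exactly 10 times the sample size), the exact value is about 0.0475, so the approximation overstates the standard deviation by roughly 5%.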
Alert
The Advanced Placement Statistics Examination only covers the "approximate" formulas for the
standard deviation and standard error. However, students are expected to be aware of the
limitations of these formulas; namely, the approximate formulas should only be used when the
population size is at least 10 times larger than the sample size.
Find the margin of error. Previously, we showed how to compute the margin of
error.
In the next section, we work through a problem that shows how to use this approach to construct
a confidence interval for a proportion.
Estimation Requirements
The approach described in this lesson is valid whenever the following conditions are met:
Each sample includes at least 10 successes and 10 failures. (Some texts say
that 5 successes and 5 failures are enough.)
σp1 - p2 =
sqrt{ [P1 * (1 - P1) / n1] * [(N1 - n1) / (N1 - 1)] + [P2 * (1 - P2) / n2] * [(N2 - n2) / (N2 - 1)] }
where P1 is the population proportion for sample 1, P2 is the population
proportion for sample 2, n1 is the sample size from population 1, n2 is the
sample size from population 2, N1 is the number of observations in population
1, and N2 is the number of observations in population 2. When each sample is
small (less than 10% of its population), the standard deviation can be
approximated by:
σp1 - p2 = sqrt{ [P1 * (1 - P1) / n1] + [P2 * (1 - P2) / n2] }
When the population parameters (P1 and P2) are not known, the standard
deviation of the sampling distribution cannot be calculated. Under these
circumstances, use the standard error. The standard error (SE) provides an
unbiased estimate of the standard deviation. It can be calculated from the
equation below.
SEp1 - p2 =
sqrt{ [p1 * (1 - p1) / n1] * [(N1 - n1) / (N1 - 1)] + [p2 * (1 - p2) / n2] * [(N2 - n2) / (N2 - 1)] }
where p1 is the sample proportion for sample 1, and where p2 is the sample
proportion for sample 2. When each sample is small (less than 10% of its
population), the standard error can be approximated by:
SEp1 - p2 = sqrt{ [p1 * (1 - p1) / n1] + [p2 * (1 - p2) / n2] }
Alert
Some texts present a different, less general version of the approximate formulas. These formulas,
which appear below, are valid when the proportions are equal.
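σp1 - p2 = sqrt{ P * (1 - P) * [ (1 / n1) + (1 / n2) ] }
where P = P1 = P2
SEp1 - p2 = sqrt{ p * (1 - p) * [ (1 / n1) + (1 / n2) ] }
where p = p1 = p2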
Remember, these two formulas should be used only when the proportions from each group are
equal, and when each sample size is small (less than 10% of the population size).
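The standard error of a difference between proportions can be sketched as follows. The function names are hypothetical. The equal-proportion version is what the general approximate formula reduces to when p1 = p2 = p, which the second function makes explicit.

```python
from math import sqrt

def se_diff_proportions(p1, n1, p2, n2, N1=None, N2=None):
    """Standard error of p1 - p2.

    If a population size (N1 or N2) is supplied, the matching
    finite population factor (N - n) / (N - 1) is applied to that term.
    """
    def term(p, n, N):
        v = p * (1 - p) / n
        if N is not None:
            v *= (N - n) / (N - 1)
        return v
    return sqrt(term(p1, n1, N1) + term(p2, n2, N2))

def se_diff_equal_proportions(p, n1, n2):
    """Less general version, valid only when both proportions equal p."""
    return sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
```

When the two proportions really are equal and the samples are small relative to their populations, both functions return the same value.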
Find the margin of error. Previously, we showed how to compute the margin of
error.
In the next section, we work through a problem that shows how to use this approach to construct
a confidence interval for the difference between proportions.
http://stattrek.com/estimation/difference-in-proportions.aspx?tutorial=ap
Estimation Requirements
The approach described in this lesson is valid whenever the following conditions are met:
Generally, the sampling distribution will be approximately normally distributed if any of the
following conditions apply.
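When the population size is much larger than the sample size, the standard deviation can be
approximated by:
σx = σ / sqrt( n )
When the population standard deviation σ is unknown, use the standard error instead:
SEx = s / sqrt( n )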
Note: In real-world analyses, the standard deviation of the population is seldom known.
Therefore, the standard error is used more often than the standard deviation.
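Putting the pieces together for a mean, a confidence interval can be computed directly from raw sample data. This is a minimal sketch, assuming a large simple random sample so that a z score is an acceptable critical value; the function name mean_confidence_interval is made up for the example.

```python
import statistics
from math import sqrt

def mean_confidence_interval(sample, confidence_level=0.95):
    """Interval estimate of a population mean: sample mean -/+ z * SE."""
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / sqrt(n)  # SEx = s / sqrt(n)
    # Assumption: z critical value, reasonable only for large samples.
    z = statistics.NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return mean - z * se, mean + z * se
```

For small samples, the z score would be replaced by a critical t score with n - 1 degrees of freedom.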
Alert
The Advanced Placement Statistics Examination only covers the "approximate" formulas for the
standard deviation and standard error. However, students are expected to be aware of the
limitations of these formulas; namely, the approximate formulas should only be used when the
population size is at least 10 times larger than the sample size.
Identify a sample statistic. Use the sample mean to estimate the population
mean.
Find the margin of error. Previously, we showed how to compute the margin of
error.
In the next section, we work through a problem that shows how to use this approach to construct
a confidence interval to estimate a population mean.
Estimation Requirements
The approach described in this lesson is valid whenever the following conditions are met:
Generally, the sampling distribution will be approximately normally distributed if each sample is
described by at least one of the following statements.
If the population standard deviations are known, the standard deviation of the
sampling distribution is:
σx1-x2 = sqrt [ σ1² / n1 + σ2² / n2 ]
When the standard deviation of either population is unknown and the sample
sizes (n1 and n2) are large, the standard deviation of the sampling distribution
can be estimated by the standard error, using the equation below.
SEx1-x2 = sqrt [ s1² / n1 + s2² / n2 ]
Note: In real-world analyses, the standard deviation of the population is seldom known.
Therefore, SEx1-x2 is used more often than σx1-x2.
Alert
Some texts present additional options for calculating standard deviations. These formulas, which
should only be used under special circumstances, are described below.
Standard deviation. Use this formula when the population standard deviations
are known and are equal.
σx1 - x2 = σd = σ * sqrt[ (1 / n1) + (1 / n2) ]
where σ = σ1 = σ2
Pooled standard deviation. Use this formula when the population standard
deviations are unknown, but assumed to be equal; and the sample sizes (n1
and n2) are small (under 30).
SDpooled = sqrt{ [ (n1 - 1) * s1² + (n2 - 1) * s2² ] / (n1 + n2 - 2) }
where σ1 = σ2
Remember, these two formulas should be used only when the various required underlying
assumptions are justified.
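The two special-case formulas in this alert translate directly into code. The sketch below is illustrative, and the function names are hypothetical.

```python
from math import sqrt

def sd_diff_equal_sigma(sigma, n1, n2):
    """Standard deviation of x1 - x2 when both populations share a known sigma."""
    return sigma * sqrt(1 / n1 + 1 / n2)

def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation from two sample standard deviations.

    Each sample variance is weighted by its degrees of freedom (n - 1).
    """
    return sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
```

When both samples have the same standard deviation, the pooled value equals that common value regardless of the sample sizes, which is a quick sanity check on the weighting.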
Find the margin of error. Previously, we showed how to compute the margin of
error, based on the critical value and standard deviation.
When the sample size is large, you can use a t score or a z score for the critical value.
Since it does not require computing degrees of freedom, the z score is a little easier. When
the sample sizes are small (less than 40), use a t score for the critical value.
If you use a t score, you will need to compute degrees of freedom (DF).
The next section presents sample problems that illustrate how to use z scores and t scores
as critical values.
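The degrees-of-freedom formula for this lesson is not reproduced in this text. As an assumption, the sketch below uses the Welch-Satterthwaite approximation, a common choice when the two population standard deviations are not assumed equal; when a pooled standard deviation is used instead, DF = n1 + n2 - 2 is the usual value. The function names are hypothetical.

```python
from math import sqrt

def se_diff_means(s1, n1, s2, n2):
    """Standard error of the difference between two sample means."""
    return sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite degrees of freedom (an assumed choice, see lead-in)."""
    v1 = s1 ** 2 / n1  # variance contribution of sample 1
    v2 = s2 ** 2 / n2  # variance contribution of sample 2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
```

The result is generally not an integer. When the two samples have equal sizes and equal standard deviations, it reduces to n1 + n2 - 2.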
Estimation Requirements
The approach described in this lesson is valid whenever the following conditions are met:
The data set is a simple random sample of observations from the population
of interest.
Each element of the population includes measurements on two paired
variables (e.g., x and y) such that the paired difference between x and y is: d
= x - y.
The sampling distribution of the mean difference between data pairs (d) is
approximately normally distributed.
Generally, the sampling distribution will be approximately normally distributed if the sample is
described by at least one of the following statements.
σd̄ = σd / sqrt( n )
SEd̄ = sd / sqrt( n )
Note: In real-world analyses, the standard deviation of the population is seldom known.
Therefore, the standard error is used more often than the standard deviation.
Alert
The Advanced Placement Statistics Examination only covers the "approximate" formulas for the
standard deviation and standard error. However, students are expected to be aware of the
limitations of these formulas; namely, the approximate formulas should only be used when the
population size is at least 10 times larger than the sample size.
Identify a sample statistic. Use the mean difference between sample data
pairs (d̄) to estimate the mean difference between population data pairs (μd).
Find the margin of error. Previously, we showed how to compute the margin of
error, based on the critical value and standard deviation.
When the sample size is large, you can use a t score or a z score for the critical value.
Since it does not require computing degrees of freedom, the z score is a little easier. When
the sample sizes are small (less than 40), use a t score for the critical value.
If you use a t score, you will need to compute degrees of freedom (DF). In this case, the
degrees of freedom is equal to the sample size minus one: DF = n - 1.
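For paired data, the whole calculation can be sketched as follows. The critical value is passed in by the caller (from a t table, for example), because the standard library has no t distribution; the function names are made up for the example.

```python
import statistics
from math import sqrt

def paired_summary(x, y):
    """Return (mean difference, standard error, degrees of freedom)."""
    d = [xi - yi for xi, yi in zip(x, y)]  # paired differences d = x - y
    n = len(d)
    d_bar = statistics.fmean(d)
    se = statistics.stdev(d) / sqrt(n)     # standard error of the mean difference
    return d_bar, se, n - 1                # DF = n - 1

def paired_interval(x, y, critical_value):
    """Confidence interval for the mean difference between data pairs."""
    d_bar, se, _ = paired_summary(x, y)
    return d_bar - critical_value * se, d_bar + critical_value * se
```

Note that the standard error is computed from the differences themselves, not from the two samples separately.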
Estimation Requirements
The approach described in this lesson is valid whenever the standard requirements for simple
linear regression are met.
Predictor    Coef    SE Coef    T       P
Constant     76      30         2.53    0.01
(slope)      35      20         1.75    0.04
In the output above, the standard error of the slope is equal to 20. In this example, the
standard error is labeled "SE Coef". However, other software packages might use a different
label for the standard error. It might be "StDev", "SE", "Std Dev", or something else.
If you need to calculate the standard error of the slope (SE) by hand, use the following formula:
SE = sb1 = sqrt [ Σ(yi - ŷi)² / (n - 2) ] / sqrt [ Σ(xi - x̄)² ]
where yi is the value of the dependent variable for observation i, ŷi is the estimated value of the
dependent variable for observation i, xi is the observed value of the independent variable for
observation i, x̄ is the mean of the independent variable, and n is the number of observations.
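The by-hand calculation can be sketched in Python. This illustration fits the least-squares line first, since the estimated values of the dependent variable are needed for the residuals; the function name slope_standard_error is hypothetical.

```python
from math import sqrt

def slope_standard_error(x, y):
    """Return (slope, standard error of the slope) for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    # Least-squares slope and intercept.
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    # Sum of squared residuals (observed minus estimated values).
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return b1, sqrt(sse / (n - 2)) / sqrt(sxx)
```

Dividing the slope by its standard error gives the t ratio that regression output usually reports alongside it, as in the sample output where 35 / 20 = 1.75.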
Find the margin of error. Previously, we showed how to compute the margin of
error, based on the critical value and standard error. When calculating the
margin of error for a regression slope, use a t score for the critical value, with
degrees of freedom (DF) equal to n - 2.
In the next section, we work through a problem that shows how to use this approach to construct
a confidence interval for the slope of a regression line.
http://stattrek.com/regression/slope-confidence-interval.aspx?tutorial=ap