
CHAPTER 3

ESTIMATION
Definition
• Estimation is the procedure by which a numerical value (or values) is assigned to a population
parameter (such as the mean, median, mode, variance or standard
deviation) based on information collected from a sample.

• In other words, a sample estimate stands in for the unknown population parameter.
Examples
• Eg. 1:
If the Malaysian Census Bureau could contact every household in Malaysia to find the
mean housing expenditure incurred by households per month, the results of the
survey would give the true value of the population parameter. However, it is usually too
expensive, too time consuming or virtually impossible to contact every member of
a population to find the true value of a population
parameter. Therefore, we usually take a sample from the population and calculate
the value of the appropriate sample statistic. Then we assign a value (or values) to the
corresponding population parameter based on the value of the sample statistic.
The value assigned to a population parameter based on the value of a sample statistic is
called an estimate of the population parameter.

• Eg. 2:
Suppose the Malaysian Census Bureau takes a sample of 10,000 households and
finds that the mean housing expenditure per month, $\bar{x}$, is RM1370. If the bureau assigns this
value to the population mean, then RM1370 is called an estimate of μ. The sample
statistic used to estimate a population parameter is called an estimator. Thus the
sample mean $\bar{x}$ is an estimator of the population mean μ.
Properties of a good estimator
• Unbiased estimator - the estimator should be “close” in some sense to
the true value of the unknown parameter. That is, the expected value
(the mean of the estimates obtained from samples of a given size) is
equal to the parameter being estimated.
• Consistent estimator - as the sample size increases, the value of the
estimator approaches the value of the parameter being estimated.
• Relatively efficient estimator - of all the statistics that can be used to
estimate a parameter, the relatively efficient estimator has the
smallest variance.
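The unbiasedness property can be checked by brute force on a tiny case. The sketch below uses a made-up population of four values (an assumption for illustration, not data from this chapter), lists every sample of size 2 drawn with replacement, and compares the average of the sample means with μ and the average of the sample variances (divisor n - 1) with σ².

```python
from itertools import product
from statistics import mean, pvariance, variance

# Hypothetical population, chosen only for illustration.
population = [2, 4, 6, 8]
mu = mean(population)            # population mean
sigma2 = pvariance(population)   # population variance (divisor N)

# Every ordered sample of size 2 drawn with replacement (16 samples).
samples = list(product(population, repeat=2))

# Averaging an estimator over all equally likely samples gives its expected value.
expected_sample_mean = mean(mean(s) for s in samples)
expected_sample_var = mean(variance(s) for s in samples)  # divisor n - 1

print(mu, expected_sample_mean)      # the two values agree
print(sigma2, expected_sample_var)   # the two values agree
```

Under these assumptions both estimators hit their targets on average, which is what "unbiased" means. Any single sample can still be far from the parameter.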
Types of estimation
A. Point estimate
A point estimate is a single number used to estimate a population parameter. The best
point estimate of the population mean μ is the sample mean $\bar{x}$.
Eg: the bureau can state that the mean housing expenditure per
month μ for all households is about RM1370.
The point estimate of a population parameter = the value of the
corresponding sample statistic.
The standard error of $\bar{X}$ and the estimated standard error are, respectively,

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \quad\text{or}\quad \hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}}$$
• Eg 1: An article in the Journal of Heat and Mass Transfer described a new
method of measuring the thermal conductivity of Armco iron. Using a
temperature of 100°F and a power input of 550 watts, the following 10
measurements of thermal conductivity were obtained:
41.60, 41.48, 42.34, 41.95, 41.86,
42.18, 41.72, 42.26, 41.81, 42.04

a) Find the point estimate of the mean thermal conductivity at 100°F and
550 watts power input.
b) Find the standard error of the sample mean.

Sol: a) The point estimate is the sample mean, $\bar{x} = 41.924$.

b) The estimated standard error is $\hat{\sigma}_{\bar{x}} = s/\sqrt{n} = 0.284/\sqrt{10} \approx 0.0898$.
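Both quantities in Eg 1 can be checked numerically. The sketch below is one way to do it with Python's standard library; the variable names are illustrative only.

```python
from math import sqrt
from statistics import mean, stdev

# The 10 thermal conductivity measurements from Eg 1.
conductivity = [41.60, 41.48, 42.34, 41.95, 41.86,
                42.18, 41.72, 42.26, 41.81, 42.04]

n = len(conductivity)
x_bar = mean(conductivity)    # a) point estimate of the mean
s = stdev(conductivity)       # sample standard deviation (divisor n - 1)
std_error = s / sqrt(n)       # b) estimated standard error of the sample mean

print(f"x_bar = {x_bar:.3f}, s = {s:.3f}, SE = {std_error:.4f}")
# prints: x_bar = 41.924, s = 0.284, SE = 0.0898
```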
B. Confidence interval
• An interval constructed around the point estimate that is likely to contain
the corresponding population parameter is called a confidence interval.
• We cannot be certain that the interval contains the true unknown
population parameter, because we
only use a sample from the full population to compute the point
estimate and the interval.
• The confidence interval is constructed so that we have high confidence
that it does contain the unknown population parameter.
• Any value of the confidence level can be chosen to construct a
confidence interval; the more common values are 90%, 95% and
99%.
• The confidence interval is given as:

Point estimate $\bar{x}$ ± margin of error E

• The confidence level associated with a confidence interval states how
much confidence we have that this interval contains the true
population parameter. The confidence level is denoted by
(1-α)100%, where α is the significance level.
Confidence intervals for the mean: σ Known
A. TWO-SIDED CONFIDENCE INTERVAL
• If $\bar{x}$ is the sample mean of a random sample of size n from a normal
population with known variance σ², a 100(1-α)% confidence interval
on μ is given by

$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

• The value of $z_{\alpha/2}$ is obtained from the standard normal distribution
table, and the term $z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ is called the margin of error E (also called the
maximum error of the estimate).
B. ONE-SIDED/UPPER BOUND CONFIDENCE INTERVAL
• If $\bar{x}$ is the sample mean of a random sample of size n from a normal
population with known variance σ², a 100(1-α)% upper confidence bound
on μ is given by

$$\mu \le \bar{x} + z_{\alpha}\frac{\sigma}{\sqrt{n}}$$

• The value of $z_{\alpha}$ is obtained from the standard normal distribution
table, and the term $z_{\alpha}\frac{\sigma}{\sqrt{n}}$ is called the margin of error E.
C. ONE-SIDED/LOWER BOUND CONFIDENCE INTERVAL
• If $\bar{x}$ is the sample mean of a random sample of size n from a normal
population with known variance σ², a 100(1-α)% lower confidence bound
on μ is given by

$$\mu \ge \bar{x} - z_{\alpha}\frac{\sigma}{\sqrt{n}}$$

• The value of $z_{\alpha}$ is obtained from the standard normal distribution
table, and the term $z_{\alpha}\frac{\sigma}{\sqrt{n}}$ is called the margin of error E.
• Eg 2: A manager at the Papyrus Paper Company wants to estimate the
mean time required for a new machine to produce a ream of paper. A
random sample of 36 reams required an average time of 1.5 minutes
for each ream. Assuming σ = 0.30 minute,
a. construct an interval estimate with a confidence level 95%.
b. construct an upper bound interval estimate with a confidence level
95%
Sol:
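One possible way to work Eg 2 numerically is sketched below. It takes the critical values from `statistics.NormalDist` rather than a printed table, so the last digits may differ slightly from a hand calculation with z = 1.96 or z = 1.645.

```python
from math import sqrt
from statistics import NormalDist

n, x_bar, sigma = 36, 1.5, 0.30
alpha = 0.05
std_error = sigma / sqrt(n)

# a) Two-sided interval: z_{alpha/2} leaves alpha/2 in each tail.
z_half = NormalDist().inv_cdf(1 - alpha / 2)
lower = x_bar - z_half * std_error
upper = x_bar + z_half * std_error

# b) Upper bound: all of alpha sits in one tail, so z_alpha is used.
z_alpha = NormalDist().inv_cdf(1 - alpha)
upper_bound = x_bar + z_alpha * std_error

print(f"a) {lower:.3f} <= mu <= {upper:.3f}")
print(f"b) mu <= {upper_bound:.3f}")
```

With these inputs the two-sided interval is roughly 1.402 to 1.598 minutes and the upper bound is roughly 1.582 minutes.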
• Eg 3: The Ledd Pipe Company has received a large shipment of pipes,
and a quality control inspector wants to estimate the average
diameter of the pipes to see if they meet minimum standards. She
takes a random sample of 15 pipes, and the sample produces an
average diameter of 2.55 mm. In the past, the diameters of the pipes
have been normally distributed, and the population standard
deviation has been 0.07mm. Assuming that these still hold true,
construct an interval estimate with a 99% level of confidence. Also,
find the lower bound interval estimate with 99% level of confidence.
Sol:
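Eg 3 can be sketched the same way. Because σ is taken as known and the diameters are assumed normal, z values apply even though n = 15; the critical values again come from `statistics.NormalDist` instead of a table.

```python
from math import sqrt
from statistics import NormalDist

n, x_bar, sigma = 15, 2.55, 0.07
alpha = 0.01
std_error = sigma / sqrt(n)

# Two-sided 99% interval uses z_{alpha/2}.
z_half = NormalDist().inv_cdf(1 - alpha / 2)
lower = x_bar - z_half * std_error
upper = x_bar + z_half * std_error

# One-sided 99% lower bound uses z_alpha.
z_alpha = NormalDist().inv_cdf(1 - alpha)
lower_bound = x_bar - z_alpha * std_error

print(f"{lower:.3f} <= mu <= {upper:.3f}")
print(f"mu >= {lower_bound:.3f}")
```

The two-sided interval comes out near 2.503 mm to 2.597 mm, and the lower bound near 2.508 mm.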
• Eg 4: ASTM Standard E23 defines standard test methods for notched
bar impact testing of metallic materials. The Charpy V-notch (CVN)
technique measures impact energy and is often used to determine
whether or not a material experiences a ductile-to-brittle transition
with decreasing temperature. Ten measurements of impact energy (J)
on specimens of A238 steel cut at 60°C are as follows:
64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, 64.3
Assume that impact energy is normally distributed with σ = 1 J.
Construct a 95% confidence interval for the mean impact energy.
Sol:
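For Eg 4 the sample mean has to be computed from the raw data first. A minimal sketch, with the critical value taken from `statistics.NormalDist`:

```python
from math import sqrt
from statistics import NormalDist, mean

# Impact energy measurements (J) from Eg 4.
impact_energy = [64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, 64.3]
sigma = 1.0     # population standard deviation, given in the question
alpha = 0.05

n = len(impact_energy)
x_bar = mean(impact_energy)
z_half = NormalDist().inv_cdf(1 - alpha / 2)
margin = z_half * sigma / sqrt(n)

lower, upper = x_bar - margin, x_bar + margin
print(f"x_bar = {x_bar:.2f}, {lower:.2f} <= mu <= {upper:.2f}")
# prints: x_bar = 64.46, 63.84 <= mu <= 65.08
```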
Sample size
• If $\bar{x}$ is used as an estimate of μ, we can be 100(1-α)% confident that
the error $|\bar{x} - \mu|$ will not exceed a specified amount E (the margin of
error) when the sample size is

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$

• If necessary, round the answer up to obtain a whole number.

• This formula is applicable when the desired margin of error E for
the mean and the population variance are both known.
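The sample size formula follows from setting the margin of error of the two-sided interval (σ known) equal to the target E and solving for n:

$$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \;\Rightarrow\; \sqrt{n} = \frac{z_{\alpha/2}\,\sigma}{E} \;\Rightarrow\; n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$

Rounding up rather than to the nearest integer keeps the actual margin of error at or below E.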
Eg 5: A college president asks the statistics teacher to estimate the
average age of the students at their college. How large a sample is
necessary? The statistics teacher decides the estimate should be
accurate within 1 year and be 99% confident. From a previous study,
the standard deviation of the ages is known to be 3 years.

Sol:
Eg 6: A company that produces detergents wants to estimate the mean
amount of detergent in 64-ounce jugs at a 99% confidence level. The
company knows that the standard deviation of the amount of the
detergent in all such jugs is 0.2 ounces. How large a sample should the
company take so that the estimate is within 0.04 ounces of the
population mean?

Sol:
Eg 7: A scientist wishes to estimate the average depth of a river. He
wants to be 98% confident that the estimate is accurate within 2 feet.
From a previous study, the standard deviation of the depths measured
was 4.33 feet.

Sol:
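The three sample size exercises (Eg 5 to Eg 7) can be sketched with one small helper. `sample_size` is a hypothetical function written for this illustration. It uses exact normal quantiles, so a result can differ by one from a hand calculation with a rounded table value: in Eg 6, z = 2.58 gives 166.41 and hence n = 167, whereas the exact quantile gives about 165.87 and hence n = 166.

```python
from math import ceil
from statistics import NormalDist

def sample_size(confidence, sigma, margin):
    """Smallest whole n with z_{alpha/2} * sigma / sqrt(n) <= margin."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / margin) ** 2)   # always round up

n_eg5 = sample_size(0.99, sigma=3, margin=1)        # student ages, within 1 year
n_eg6 = sample_size(0.99, sigma=0.2, margin=0.04)   # detergent jugs, within 0.04 oz
n_eg7 = sample_size(0.98, sigma=4.33, margin=2)     # river depth, within 2 feet

print(n_eg5, n_eg6, n_eg7)   # prints: 60 166 26
```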
Confidence interval for the mean: σ Unknown
• If the population standard deviation, 𝜎 is not known, then we use sample
standard deviation, 𝑠.

A. TWO-SIDED CONFIDENCE INTERVAL


• If $\bar{x}$ is the sample mean of a random sample of size n from a normal
population with UNKNOWN variance σ², a 100(1-α)% confidence interval
on μ is given by

$$\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}$$

• The value of $t_{\alpha/2}$ is obtained from the t distribution table with n-1 degrees
of freedom.
B. ONE-SIDED/UPPER BOUND CONFIDENCE INTERVAL
• If $\bar{x}$ is the sample mean of a random sample of size n from a normal
population with UNKNOWN variance σ², a 100(1-α)% upper confidence
bound on μ is given by

$$\mu \le \bar{x} + t_{\alpha}\frac{s}{\sqrt{n}}$$

• The value of $t_{\alpha}$ is obtained from the t distribution table with n-1
degrees of freedom.
C. ONE-SIDED/LOWER BOUND CONFIDENCE INTERVAL
• If $\bar{x}$ is the sample mean of a random sample of size n from a normal
population with UNKNOWN variance σ², a 100(1-α)% lower confidence
bound on μ is given by

$$\mu \ge \bar{x} - t_{\alpha}\frac{s}{\sqrt{n}}$$

• The value of $t_{\alpha}$ is obtained from the t distribution table with n-1
degrees of freedom.
Characteristics of the t-distribution

• Similarities (normal distribution and t-distribution)

i. It is bell-shaped.
ii. It is symmetrical about the mean.
iii. The mean, median and mode are equal to 0 and are located at the centre of the
distribution.
iv. The curve never touches the x-axis.

• Differences
The t-distribution differs from the standard normal distribution in the following ways.
i. The variance is greater than 1.
ii. The t-distribution is actually a family of curves based on the concept of degrees of
freedom, which is related to sample size.
iii. As the sample size increases, the t-distribution approaches the standard normal
distribution.
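The way the t-distribution approaches the standard normal can be seen from its critical values. The sketch below compares a few two-sided 95% values, copied by hand from a standard t table (an assumption: they are not computed here), with the corresponding z value.

```python
from statistics import NormalDist

# t_{0.025, df} for selected degrees of freedom, copied from a standard t table.
t_table = {5: 2.571, 10: 2.228, 30: 2.042, 120: 1.980}
z = NormalDist().inv_cdf(0.975)   # z_{0.025}, about 1.960

for df, t in t_table.items():
    # The gap t - z shrinks as the degrees of freedom grow.
    print(f"df = {df:>3}: t = {t:.3f}, t - z = {t - z:.3f}")
```

For small samples the t value is noticeably larger than z, which makes the interval wider to account for estimating σ by s.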
Many statistical distributions use the concept of degrees of freedom, and the
formulas for finding the degrees of freedom vary for different statistical tests.
The degrees of freedom are the number of values that are free to vary after a
sample statistic has been computed.
Eg 9: Ten randomly selected automobiles were stopped, and the tread depth
of the right front tyre was measured. The mean was 0.32 inch, and the
standard deviation was 0.08 inch. Find the 95% confidence interval of the
mean depth. Assume that the variable is approximately normally distributed.

Sol:
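A sketch of Eg 9, assuming the critical value t = 2.262 for 9 degrees of freedom is read from a t table (the Python standard library has no t quantile function, so the value is entered by hand):

```python
from math import sqrt

n, x_bar, s = 10, 0.32, 0.08
t_crit = 2.262   # t_{0.025, 9} from a t table (manual lookup, not computed)

margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"{lower:.2f} <= mu <= {upper:.2f}")
# prints: 0.26 <= mu <= 0.38
```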
Eg 10: Johan, the manager of the paint store, wants to estimate the
mean amount of product sold per day. Twenty business days are
monitored, and an average of 32 litres is sold daily. The sample
standard deviation is 12 litres. Calculate the confidence limits at the 95%
confidence level.

Sol:
Eg 11: An article in the journal Materials Engineering (1989, Vol II,
pp. 275-281) described the results of tensile adhesion tests on 22 U-700
alloy specimens. The loads at specimen failure are as follows (in
megapascals):
19.8 10.1 14.9 7.5 15.4 15.4
15.4 18.5 7.9 12.7 11.9 11.4
11.4 14.1 17.6 16.7 15.8 19.5
8.8 13.6 11.9 11.4
Find a 95% confidence interval for the mean load at failure of the
22 U-700 alloy specimens.

Sol:
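For Eg 11 the sample mean and standard deviation come from the raw data. The sketch below assumes t = 2.080 for 21 degrees of freedom, read from a t table. Because it carries full precision for the mean and s, the limits may differ in the last digit from a hand calculation that rounds them first.

```python
from math import sqrt
from statistics import mean, stdev

# Load at failure (MPa) for the 22 specimens in Eg 11.
load = [19.8, 10.1, 14.9, 7.5, 15.4, 15.4,
        15.4, 18.5, 7.9, 12.7, 11.9, 11.4,
        11.4, 14.1, 17.6, 16.7, 15.8,
        19.5, 8.8, 13.6, 11.9, 11.4]

n = len(load)
x_bar = mean(load)
s = stdev(load)     # sample standard deviation (divisor n - 1)
t_crit = 2.080      # t_{0.025, 21} from a t table (manual lookup, not computed)

margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"x_bar = {x_bar:.2f}, s = {s:.2f}")
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```

The interval is roughly 12.1 MPa to 15.3 MPa.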
• For Eg 9-11, construct the 99% upper and lower bound confidence
interval
GUIDELINES
Guideline for when to use the z or t distribution

Is σ known?
  Yes: Use zα/2 values no matter what the sample size is.*
  No: Is n ≥ 30?
    Yes: Use zα/2 values and s in place of σ in the formula.
    No: Use tα/2 values and s in the formula.**

* Variable must be normally distributed when n < 30.
** Variable must be approximately normally distributed.
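The guideline can be written as a small decision function. This is only a sketch of the flowchart: `critical_value_choice` is a hypothetical name, and the normality conditions in the footnotes still have to be checked separately.

```python
def critical_value_choice(sigma_known: bool, n: int) -> tuple[str, str]:
    """Return (distribution, spread measure) following the guideline."""
    if sigma_known:
        return ("z", "sigma")   # any sample size; needs normality when n < 30
    if n >= 30:
        return ("z", "s")       # large sample: s stands in for sigma
    return ("t", "s")           # small sample: needs approximate normality

print(critical_value_choice(True, 36))    # Eg 2: ('z', 'sigma')
print(critical_value_choice(False, 10))   # Eg 9: ('t', 's')
```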
Confidence intervals for the variance
If s² is the sample variance from a random sample of n observations
from a normal distribution with unknown variance σ², then a 100(1-α)%
confidence interval on σ² is

$$\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}} \quad\text{or}\quad \frac{(n-1)s^2}{\chi^2_{\text{right}}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{\text{left}}}$$

where $\chi^2_{\alpha/2,\,n-1}$ and $\chi^2_{1-\alpha/2,\,n-1}$ are the upper and lower 100α/2
percentage points of the chi-square distribution with n-1 degrees of
freedom.
Confidence intervals for the standard deviation
If s is the sample standard deviation from a random sample of n
observations from a normal distribution with unknown variance σ²,
then a 100(1-α)% confidence interval on σ is

$$\sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}}} \le \sigma \le \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}}$$

where $\chi^2_{\alpha/2,\,n-1}$ and $\chi^2_{1-\alpha/2,\,n-1}$ are the upper and lower 100α/2
percentage points of the chi-square distribution with n-1 degrees of
freedom.
• To calculate these confidence intervals, a new statistical distribution is
needed. It is called the chi-square distribution.
• The chi-square variable is similar to the t variable in that its
distribution is a family of curves based on the number of degrees of
freedom.

Characteristics of the χ² distribution

• A χ² variable cannot be negative;
• The distributions are positively skewed;
• At about 100 d.f., the χ² distribution becomes somewhat symmetric;
• The area under each χ² distribution is equal to 1 or 100%.
Eg 12: Find the values for $\chi^2_{\text{right}}$ and $\chi^2_{\text{left}}$ for a 90% confidence interval when n = 25.
Sol:
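Eg 12 is a table lookup, and the only arithmetic is finding which row and columns to read. The sketch below shows that bookkeeping; the two table entries for 24 degrees of freedom are copied by hand from a standard chi-square table (an assumption, not computed).

```python
confidence, n = 0.90, 25
alpha = 1 - confidence
df = n - 1                               # row of the table: 24

right_col = round(alpha / 2, 2)          # area to the right of chi2_right: 0.05
left_col = round(1 - alpha / 2, 2)       # area to the right of chi2_left: 0.95

# Excerpt of a chi-square table for df = 24, keyed by area to the right.
chi2_table_df24 = {0.95: 13.848, 0.05: 36.415}

chi2_right = chi2_table_df24[right_col]
chi2_left = chi2_table_df24[left_col]
print(df, chi2_right, chi2_left)   # prints: 24 36.415 13.848
```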

Eg 13: Find the 95% confidence interval for the variance and standard deviation of the
nicotine content of cigarettes manufactured if a sample of 20 cigarettes has a standard
deviation of 1.6 mg.
Sol:
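The arithmetic for Eg 13 can be sketched as follows, assuming the chi-square values for 19 degrees of freedom (32.852 and 8.907) are read from a standard table rather than computed.

```python
from math import sqrt

n, s = 20, 1.6
df = n - 1
chi2_right = 32.852   # chi-square value with area 0.025 to its right, df = 19 (table)
chi2_left = 8.907     # chi-square value with area 0.975 to its right, df = 19 (table)

# The larger chi-square value gives the lower limit, and vice versa.
var_lower = df * s**2 / chi2_right
var_upper = df * s**2 / chi2_left

# Limits for the standard deviation are the square roots of the variance limits.
sd_lower, sd_upper = sqrt(var_lower), sqrt(var_upper)

print(f"{var_lower:.1f} <= variance <= {var_upper:.1f}")   # 1.5 <= variance <= 5.5
print(f"{sd_lower:.1f} <= sd <= {sd_upper:.1f}")           # 1.2 <= sd <= 2.3
```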

Hence, you can be 95% confident that the true variance for the nicotine content is between
1.5 and 5.5, and the true standard deviation for the nicotine content of all cigarettes
manufactured is between 1.2 and 2.3 mg, based on a sample of 20 cigarettes.
Past Year’s Questions
Q3 (b) Mid Term Exam Sem 2 2014/2015
An Izod impact test was performed on 30 specimens of PVC pipe. The sample
mean is $\bar{x}$ = 0.25 and the sample standard deviation is s = 0.25. Find a 99%
confidence interval for the variance of Izod impact strength and interpret
the finding.

Q3(c) Mid Term Exam Sem 2 2015/2016 set 1


A civil engineer is analysing the compressive strength of concrete.
Compressive strength is normally distributed with σ² = 1000 psi².
It is desired to estimate the compressive strength with an error that is less
than 15 psi at 99% confidence. What sample size is required?
Q4 sem 2 2014/2015
An automobile company is working on changes in a fuel injection system to
improve gasoline mileage. A random sample of 10 test runs gave the
following mileage (in mpg)
38, 42, 40, 39, 44, 37, 39, 45, 41, 44
i. Find the sample mean and sample standard deviation for the above data
ii. Construct a 90% lower confidence bound on the mean gasoline mileage
and interpret it.
iv. What is the point estimate of the mean gasoline mileage for the
population
Q5 sem 2 2014/2015
If the variance of national accounting examination is 900, how large a
sample is needed to estimate the true mean score within 6 points with
99% confidence?

• Q4(b) sem 2 2015/2016 set 1
Ten people were randomly selected, and each was asked to list how
many hours of television they watched per week. The results were as
follows:
20 25 30 18 22 35 40 32 25 30
i. What is the point estimate of the mean hours of television they watched
per week?
ii. Estimate the mean hours of television they watched per week at 95%
confidence interval.
iii. Construct a 90% confidence interval estimate for the variance of the number of
hours of television they watched per week.
Q3(c) sem 2 2015/2016 set 2
An electrical firm manufactures light bulbs that have a length of life
which is approximately normally distributed with a standard deviation
of 40 hours. How large a sample is needed if the firm wants to be 96% confident
that the estimate of the mean life of the bulbs is within 10 hours?
Q4(b) sem 2 2015/2016 set 2
An article in the Journal of Composite Materials (December 1989, Vol 23, pp.
1200) describes the effect of delamination on the natural frequency of
beams made from composite laminates. Five delaminated beams were
subjected to loads and the resulting frequencies were as follows (in hertz):

230.66 233.05 232.58 229.48 232.58

i. Find a 95% confidence interval for the mean beam frequency.
Hence, can you conclude that the mean beam frequency is 230
hertz? Explain.
ii. Construct a 98% confidence interval for the standard deviation of the
beam frequency.
Q2(a) Sem 1 2015/2016
The data below show the IQ scores for eight individuals, each of whom is
the youngest in their family.
IQ scores: 131 119 103 93 108 100 111 130
i. Estimate the sample mean
ii. Construct the 95% confidence interval for the population mean
• Q4(d) Sem 1 2015/2016
A health care professional wishes to estimate the birth weight of
infants. How large a sample must be obtained if she desires to be 90%
confident that the true mean is within 2kg of the sample mean?
Assume 𝜎=8 kg.
Q4(e) Sem 1 2015/2016
The mean weight in kg for 8 adult males are given as follows. Construct
a 90% confidence interval to estimate the variance of weight for all
adult males.
