1 Introduction
The field of statistical inference consists of those
methods used to make decisions or to draw
conclusions about a population. These methods
utilize the information contained in a sample from the
population in drawing conclusions. Statistical
inference may be divided into two major areas:
parameter estimation and hypothesis testing.
Definition 2.1: An Interval Estimate
In interval estimation, an interval is constructed around
the point estimate and it is stated that this interval is
likely to contain the corresponding population
parameter.
Definition 2.2: Confidence Level and Confidence Interval
Each interval is constructed with regard to a given confidence
level and is called a confidence interval. The confidence level
associated with a confidence interval states how much confidence
we have that this interval contains the true population parameter. The
confidence level is denoted by $(1-\alpha)100\%$.
EQT 373
[Diagram: confidence intervals for the population mean (variance known or unknown) and for the population proportion.]
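The $(1-\alpha)100\%$ Confidence Interval of Population Mean, $\mu$
(i) if $\sigma^2$ is known and normally distributed population
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
or
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
(ii) if $\sigma^2$ is unknown, large $n$ ($n \ge 30$)
$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$$
or
$$\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}$$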
(iii) if $\sigma^2$ is unknown, normally distributed population
and small sample size ($n < 30$)
$$\bar{x} \pm t_{n-1,\,\alpha/2}\frac{s}{\sqrt{n}}$$
or
$$\bar{x} - t_{n-1,\,\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{n-1,\,\alpha/2}\frac{s}{\sqrt{n}}$$
Example:
If a random sample of size $n = 20$ from a normal population
with the variance $\sigma^2 = 225$ has the mean $\bar{x} = 64.3$, construct
a 95% confidence interval for the population mean, $\mu$.
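It is known that $n = 20$, $\bar{x} = 64.3$ and $\sigma = 15$.
For 95% CI,
$$95\% = 100(1-\alpha)\%$$
$$1-\alpha = 0.95$$
$$\alpha = 0.05$$
$$\frac{\alpha}{2} = 0.025$$
$$z_{\alpha/2} = z_{0.025} = 1.96$$
Hence, 95% CI
$$= \bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$$
$$= 64.3 \pm 1.96\left(\frac{15}{\sqrt{20}}\right)$$
$$= 64.3 \pm 6.57$$
$$= [57.73,\ 70.87] \quad\text{or}\quad 57.73 < \mu < 70.87$$
Thus, we are 95% confident that the mean of the random variable
is between 57.73 and 70.87.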
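The interval above can be checked numerically. The following is a minimal sketch, assuming Python's standard `statistics.NormalDist` for the critical value; the function name `z_interval_mean` is illustrative and not part of the course material.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_mean(xbar, sigma, n, confidence):
    """Two-sided z interval for a mean when sigma is known."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Values from the example: n = 20, x-bar = 64.3, sigma = 15, 95% confidence.
low, high = z_interval_mean(xbar=64.3, sigma=15.0, n=20, confidence=0.95)
print(f"{low:.2f} < mu < {high:.2f}")
```

The code uses the unrounded critical value rather than the tabulated 1.96, so results agree with the hand calculation only to the displayed precision.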
Example 2.2 :
A publishing company has just published a new textbook. Before the
company decides the price at which to sell this textbook, it wants to know the
average price of all such textbooks in the market. The research department at
the company took a sample of 36 comparable textbooks and collected the
information on their prices. This information produced a mean price RM70.50
for this sample. It is known that the standard deviation of the prices of
all such textbooks is RM4.50. Construct a 90% confidence interval for the mean
price of all such college textbooks.
Solution
It is known that $n = 36$, $\bar{x} = \text{RM}70.50$ and $\sigma = \text{RM}4.50$.
For 90% CI,
$$90\% = 100(1-\alpha)\%$$
$$1-\alpha = 0.90$$
$$\alpha = 0.1$$
$$\frac{\alpha}{2} = 0.05$$
$$z_{\alpha/2} = z_{0.05} = 1.65$$
Hence, 90% CI
$$= \bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$$
$$= 70.50 \pm 1.65\left(\frac{4.50}{\sqrt{36}}\right)$$
$$= 70.50 \pm 1.24$$
$$= [\text{RM}69.26,\ \text{RM}71.74]$$
Thus, we are 90% confident that the mean price of all such
college textbooks is between RM69.26 and RM71.74.
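2.1.2 The $(1-\alpha)100\%$ Confidence Interval Estimates
on the Differences between Two Population Means, $\mu_1 - \mu_2$
(i) if $\sigma_1^2$ and $\sigma_2^2$ are known and normally
distributed population
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
(ii) if $\sigma_1^2$ and $\sigma_2^2$ are unknown, large $n$
($n_1 \ge 30$, $n_2 \ge 30$)
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
or
$$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$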
(iii) if $(\sigma_1^2, \sigma_2^2)$ is unknown, normally distributed
population and small sample size ($n_1 < 30$, $n_2 < 30$)
$$(\bar{x}_1 - \bar{x}_2) \pm t_{v,\,\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
with
$$v = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$
For cases (ii) and (iii), if $(\sigma_1^2, \sigma_2^2)$ is unknown but we assume $\sigma_1^2 = \sigma_2^2 = \sigma^2$,
the pooled estimator $S_p$ is used to estimate $\sigma$, and the formula is given as follows:
for large sample size ($n_1 \ge 30$, $n_2 \ge 30$),
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
for small sample size ($n_1 < 30$, $n_2 < 30$),
$$(\bar{x}_1 - \bar{x}_2) \pm t_{n_1+n_2-2,\,\alpha/2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
with
$$S_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
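The two auxiliary quantities used in the small-sample intervals, the degrees of freedom $v$ and the pooled estimator $S_p$, are easy to get wrong by hand. The sketch below computes both; the function names and the sample summaries are hypothetical and chosen only for illustration.

```python
from math import sqrt


def welch_df(s1, n1, s2, n2):
    """Degrees of freedom v for the unequal-variance t interval."""
    a = s1 ** 2 / n1
    b = s2 ** 2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation S_p under the equal-variance assumption."""
    return sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))


# Hypothetical sample summaries, not taken from the text.
v = welch_df(s1=2.0, n1=10, s2=2.0, n2=10)
sp = pooled_sd(s1=2.0, n1=10, s2=2.0, n2=10)
print(v, sp)
```

When both samples have the same size and the same standard deviation, $v$ reduces to $n_1 + n_2 - 2$ and $S_p$ equals the common standard deviation, which gives a quick sanity check. In general $v$ is not an integer and is commonly rounded down before looking up a t table.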
Example 2.3 :
The scientist wondered whether there was a difference in the average daily
intakes of dairy products between men and women. He took a sample of n = 50
adult women and recorded their daily intakes of dairy products in grams per day.
He did the same for adult men. A summary of his sample results is listed below.
Construct a 95% confidence interval for the difference in the average daily intakes
of dairy products for men and women. Can you conclude that there is a difference
in the average daily intakes of dairy products for men and women?
                              Men                  Women
Sample size                   50                   50
Sample mean                   780 grams per day    762 grams per day
Sample standard deviation     35                   30
Solution
Hence, 95% CI:
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (780 - 762) \pm 1.96\sqrt{\frac{35^2}{50} + \frac{30^2}{50}}$$
$$= 18 \pm 12.78$$
$$= [5.22,\ 30.78]$$
Thus, we should be willing to conclude that there is a difference in the
average daily intakes of dairy products for men and women, as
$\mu_1 - \mu_2 = 0$ (there is no difference between the two population means)
is not one of the possible values for $(\mu_1 - \mu_2)$.
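As a check on Example 2.3, here is a small sketch of the large-sample interval for $\mu_1 - \mu_2$, again assuming `statistics.NormalDist` for the critical value. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_diff_means(x1, s1, n1, x2, s2, n2, confidence):
    """Large-sample z interval for mu1 - mu2 using sample variances."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # standard error of x1 - x2
    diff = x1 - x2
    return diff - z * se, diff + z * se


# Men: mean 780, sd 35, n 50. Women: mean 762, sd 30, n 50.
low, high = z_interval_diff_means(780, 35, 50, 762, 30, 50, confidence=0.95)
print(f"[{low:.2f}, {high:.2f}]")
print("0 inside interval:", low <= 0 <= high)
```

Testing whether 0 lies inside the interval mirrors the reasoning in the solution: if it does not, a difference of zero is not among the plausible values.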
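The $(1-\alpha)100\%$ Confidence Interval for $p$ for Large Samples ($n \ge 30$)
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
or
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$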
Example 2.4
According to the analysis of Women Magazine in
June 2005, stress has become a common part of
everyday life among working women in Malaysia. The
demands of work, family and home place an
increasing burden on average Malaysian women.
According to this poll, 40% of working women
included in the survey indicated that they had a limited
amount of time to relax. The poll was based on a
randomly selected sample of 1502 working women aged 30
and above. Construct a 95% confidence interval for
the corresponding population proportion.
Solution
Let $p$ be the proportion of all working women aged 30 and above
who have a limited amount of time to relax, and let $\hat{p}$ be the
corresponding sample proportion. From the given information,
$$n = 1502, \quad \hat{p} = 0.40, \quad \hat{q} = 1 - \hat{p} = 1 - 0.40 = 0.60$$
Hence, 95% CI
$$= \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
$$= 0.40 \pm 1.96\sqrt{\frac{0.40(0.60)}{1502}}$$
$$= 0.40 \pm 0.02478$$
$$= [0.375,\ 0.425] \text{ or } 37.5\% \text{ to } 42.5\%$$
Thus, we can state with 95% confidence that the proportion of all
working women aged 30 and above who have a limited amount of
time to relax is between 37.5% and 42.5%.
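The same calculation for a single proportion can be sketched as follows. The helper name `z_interval_proportion` is an assumption for illustration; the inputs are those of Example 2.4.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_proportion(p_hat, n, confidence):
    """Large-sample z interval for a population proportion."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Example 2.4: p-hat = 0.40 from n = 1502 respondents, 95% confidence.
low, high = z_interval_proportion(p_hat=0.40, n=1502, confidence=0.95)
print(f"[{low:.3f}, {high:.3f}]")
```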
2.1.4 The $(1-\alpha)100\%$ Confidence Interval Estimates for the
Differences between Two Proportions, $P_1 - P_2$ ($n_1 \ge 30$, $n_2 \ge 30$)
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
Example 2.5:
A researcher wanted to estimate the difference between the
percentages of users of two toothpastes who will never switch to
another toothpaste. In a sample of 500 users of Toothpaste A
taken by this researcher, 100 said that they will never switch to
another toothpaste. In another sample of 400 users of Toothpaste
B taken by the same researcher, 68 said that they will never
switch to another toothpaste. Construct a 97% confidence
interval for the difference between the proportions of all users of
the two toothpastes who will never switch.
Solutions
Toothpaste A : $n_1 = 500$ and $x_1 = 100$
Toothpaste B : $n_2 = 400$ and $x_2 = 68$
The sample proportions are calculated;
$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{100}{500} = 0.20; \qquad \hat{p}_2 = \frac{x_2}{n_2} = \frac{68}{400} = 0.17$$
A 97% confidence interval;
$$(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
$$= (0.20 - 0.17) \pm 2.17\sqrt{\frac{0.20(0.80)}{500} + \frac{0.17(0.83)}{400}}$$
$$= 0.03 \pm 0.05628$$
$$= [-0.026,\ 0.086]$$
Thus, with 97% confidence we can state that the difference between the two
population proportions is between -0.026 and 0.086.
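A less common confidence level such as 97% is a good case for computing the critical value rather than reading it from a table. The sketch below reproduces Example 2.5 under that approach; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_diff_proportions(x1, n1, x2, n2, confidence):
    """Large-sample z interval for p1 - p2 from success counts."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    se = sqrt(p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Toothpaste A: 100 of 500. Toothpaste B: 68 of 400. 97% confidence.
low, high = z_interval_diff_proportions(100, 500, 68, 400, confidence=0.97)
print(f"[{low:.3f}, {high:.3f}]")
```

Because the resulting interval contains 0, the data are compatible with no difference between the two population proportions at this confidence level.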
Determining Sample Size (for the mean and for the proportion)
The required sample size can be found to reach a desired margin
of error ($e$) with a specified level of confidence $(1-\alpha)$. The
margin of error is also called the error of estimation.
Definition 2.3 (Estimating the Population Mean):
If $\bar{x}$ is used as an estimate of $\mu$, we can be $100(1-\alpha)\%$ confident
that the error $|\bar{x} - \mu|$ will not exceed a specified amount $E$ when the
sample size is
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$
Sampling error (margin of error)
$$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
$$e = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Example:
If $\sigma = 45$, what sample size is needed to estimate the
mean within $\pm 5$ with 90% confidence?
$$n = \frac{Z^2\sigma^2}{e^2} = \frac{(1.645)^2(45)^2}{5^2} = 219.19$$
So the required sample size is n = 220
(Always round up)
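The "always round up" rule maps directly onto a ceiling operation. A minimal sketch, assuming the formula $n = (z_{\alpha/2}\sigma/e)^2$ stated above and a hypothetical helper name:

```python
from math import ceil
from statistics import NormalDist


def sample_size_mean(sigma, e, confidence):
    """Smallest n whose margin of error for the mean is at most e."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return ceil((z * sigma / e) ** 2)  # always round up


# sigma = 45, margin of error 5, 90% confidence.
n = sample_size_mean(sigma=45.0, e=5.0, confidence=0.90)
print(n)
```

Rounding down would give a margin of error slightly larger than requested, which is why the ceiling is used instead of ordinary rounding.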
Example 2.7:
A team of efficiency experts intends to use the mean of a random
sample of size n = 150 to estimate the average mechanical
aptitude of assembly-line workers in a large industry (as
measured by a certain standardized test). If, based on
experience, the efficiency experts can assume that $\sigma = 6.2$
for such data, what can they assert with probability 0.99
about the maximum error of their estimate?
Solutions
Substituting $n = 150$, $\sigma = 6.2$, and $z_{0.005} = 2.575$ into the expression
for the maximum error, we get
$$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = \frac{2.575(6.2)}{\sqrt{150}} = 1.30$$
Thus, the efficiency experts can assert with probability 0.99 that their
error will be less than 1.30.
If $\hat{p} = \frac{x}{n}$ is used as an estimate of $p$, we can assert with
$(1-\alpha)100\%$ confidence that the error is less than
$$z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$
If we set $E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ and solve for $n$, the appropriate
sample size is
$$n = \hat{p}(1-\hat{p})\left(\frac{z_{\alpha/2}}{E}\right)^2$$
Determining Sample Size for the Proportion
$$e = Z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Now solve for n
to get
$$n = \frac{Z^2\,\hat{p}(1-\hat{p})}{e^2}$$
Example 2.8
How large a sample would be necessary to estimate the
true proportion defective in a large population within
$\pm 3\%$, with 95% confidence?
(Assume a sample yields $\hat{p} = 0.12$)
Solution:
For 95% confidence, we have $Z_{\alpha/2} = 1.96$
$e = 0.03$; $\hat{p} = 0.12$
$$n = \frac{Z_{\alpha/2}^2\,\hat{p}(1-\hat{p})}{e^2} = \frac{(1.96)^2(0.12)(1-0.12)}{(0.03)^2} = 450.74$$
So use n = 451
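The sample-size formula for a proportion can be sketched the same way. The function name is hypothetical; the inputs are those of Example 2.8.

```python
from math import ceil
from statistics import NormalDist


def sample_size_proportion(p_hat, e, confidence):
    """Smallest n whose margin of error for a proportion is at most e."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return ceil(z ** 2 * p_hat * (1.0 - p_hat) / e ** 2)


# Example 2.8: p-hat = 0.12, margin of error 0.03, 95% confidence.
n = sample_size_proportion(p_hat=0.12, e=0.03, confidence=0.95)
print(n)
```

The product $\hat{p}(1-\hat{p})$ is largest at $\hat{p} = 0.5$, so using 0.5 when no preliminary estimate exists gives a conservative (larger) sample size.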
Example 2.9:
A study is made to determine the proportion of voters in a
sizable community who favor the construction of a nuclear
power plant. If 140 of 400 voters selected at random favor the
project and we use $\hat{p} = \frac{140}{400} = 0.35$ as an estimate of the actual
proportion of all voters in the community who favor the project,
what can we say with 99% confidence about the maximum
error?
Solution
Substituting $n = 400$, $\hat{p} = 0.35$, and $z_{0.005} = 2.575$ into the formula,
we get
$$E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 2.575\sqrt{\frac{(0.35)(0.65)}{400}} = 0.061$$
Thus, if we use $\hat{p} = 0.35$ as an estimate of the actual proportion of
voters in the community who favor the project, we can assert with
99% confidence that the error is less than 0.061.
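The maximum-error calculations in Examples 2.7 and 2.9 share the same structure: a critical value times a standard error. A short sketch with illustrative function names:

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence):
    """z_{alpha/2} for a two-sided statement at the given confidence."""
    return NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)


def max_error_mean(sigma, n, confidence):
    """E = z_{alpha/2} * sigma / sqrt(n)."""
    return z_critical(confidence) * sigma / sqrt(n)


def max_error_proportion(p_hat, n, confidence):
    """E = z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    return z_critical(confidence) * sqrt(p_hat * (1.0 - p_hat) / n)


e_mean = max_error_mean(sigma=6.2, n=150, confidence=0.99)        # Example 2.7
e_prop = max_error_proportion(p_hat=0.35, n=400, confidence=0.99)  # Example 2.9
print(f"{e_mean:.2f} {e_prop:.3f}")
```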
Example 2.10:
How large a sample is required if we want to be 95% confident
that the error in using $\hat{p}$ to estimate $p$ is less than 0.05? If
$\hat{p} = 0.12$, find the required sample size.
Solution
$$n = \hat{p}(1-\hat{p})\left(\frac{z_{0.025}}{E}\right)^2 = 0.12(0.88)\left(\frac{1.96}{0.05}\right)^2 \approx 163$$
Hypothesis and Test Procedures
A statistical test of hypothesis consists of:
1. The Null hypothesis, $H_0$
2. The Alternative hypothesis, $H_1$
3. The test statistic and its p-value
4. The rejection region
5. The conclusion
Definition 2.5:
Hypothesis testing can be used to determine whether a statement
about the value of a population parameter should or should not
be rejected.
Null hypothesis, $H_0$: A null hypothesis is a claim (or statement)
about a population parameter that is assumed to be true.
(The null hypothesis is either rejected or fails to be rejected.)
Alternative hypothesis, $H_1$: An alternative hypothesis is a claim
about a population parameter that will be true if the null
hypothesis is false.
Test statistic is a function of the sample data on which the
decision is to be based.
p-value is the probability calculated using the test statistic. The
smaller the p-value, the more contradictory the data are to $H_0$.
Definition 2.6: p-value
The p-value is the smallest significance level at which the null
hypothesis is rejected.
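Using the $p$-value approach, we reject the null hypothesis, $H_0$, if
    $p$-value $< \alpha$ for one tailed test
    $p$-value $< \alpha/2$ for two tailed test
and we do not reject the null hypothesis, $H_0$, if
    $p$-value $\ge \alpha$ for one tailed test
    $p$-value $\ge \alpha/2$ for two tailed test
(Here the $p$-value is taken as the one-tail area beyond the observed test statistic.)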
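The decision rule can be sketched in code. This is a minimal illustration, assuming a z test statistic and assuming that the p-value is the one-tail area beyond the observed statistic, so that a two-tailed test compares it with $\alpha/2$. The function names and the `tail` argument are hypothetical.

```python
from statistics import NormalDist


def tail_area(z, tail):
    """One-tail area beyond z under the standard normal curve."""
    cdf = NormalDist().cdf
    if tail == "left":
        return cdf(z)
    if tail == "right":
        return 1.0 - cdf(z)
    if tail == "two":
        return 1.0 - cdf(abs(z))  # area beyond |z| in a single tail
    raise ValueError("tail must be 'left', 'right' or 'two'")


def reject_h0(z, alpha, tail):
    """Apply the p-value rule: compare with alpha, or alpha/2 if two-tailed."""
    threshold = alpha / 2.0 if tail == "two" else alpha
    return tail_area(z, tail) < threshold


# Hypothetical observed statistic z = 2.10 at alpha = 0.05.
print(reject_h0(2.10, 0.05, tail="two"))
print(reject_h0(2.10, 0.05, tail="right"))
```

Comparing the one-tail area with $\alpha/2$ is equivalent to doubling that area and comparing it with $\alpha$, which is how many software packages report a two-sided p-value.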
1. State the null hypothesis, $H_0$, and the alternative
hypothesis, $H_1$
2. Choose the level of significance, $\alpha$, and the sample size, $n$
3. Determine the appropriate test statistic
4. Determine the critical values that divide the rejection and
non rejection regions
5. Collect data and compute the value of the test statistic
6. Make the statistical decision and state the managerial
conclusion. If the test statistic falls into the non rejection
region, do not reject the null hypothesis $H_0$. If the test
statistic falls into the rejection region, reject the null
hypothesis. Express the managerial conclusion in the
context of the problem
It is not always obvious how the null and alternative hypotheses
should be formulated.
When formulating the null and alternative hypotheses, the
nature or purpose of the test must also be taken into
account. We will examine:
1) The claim or assertion leading to the test.
2) The null hypothesis to be evaluated.
3) The alternative hypothesis.
4) Whether the test will be two-tail or one-tail.
5) A visual representation of the test itself.
In some cases it is easier to identify the alternative hypothesis first.
In other cases the null is easier.
Alternative Hypothesis as a Research Hypothesis
Many applications of hypothesis testing involve
an attempt to gather evidence in support of a
research hypothesis.
In such cases, it is often best to begin with the
alternative hypothesis and make it the conclusion
that the researcher hopes to support.
The conclusion that the research hypothesis is true
is made if the sample data provide sufficient
evidence to show that the null hypothesis can be
rejected.
Example: A new drug is developed with the goal
of lowering blood pressure more than the existing drug.
Alternative Hypothesis:
The new drug lowers blood pressure more than
the existing drug.
Null Hypothesis:
The new drug does not lower blood pressure more
than the existing drug.
Null Hypothesis as an Assumption to be Challenged
We might begin with a belief or assumption that
a statement about the value of a population
parameter is true.
We then use a hypothesis test to challenge the
assumption and determine if there is statistical
evidence to conclude that the assumption is
incorrect.
In these situations, it is helpful to develop the null
hypothesis first.
Example: The label on a soft drink bottle states
that it contains at least 67.6 fluid ounces.
Null Hypothesis:
The label is correct. $\mu \ge 67.6$ ounces.
Alternative Hypothesis:
The label is incorrect. $\mu < 67.6$ ounces.
Example: Average tire life is 35000 miles.
Null Hypothesis: $\mu = 35000$ miles
Alternative Hypothesis: $\mu \ne 35000$ miles
How to decide whether to reject or accept $H_0$?
The entire set of values that the test statistic may assume is divided
into two regions. One set, consisting of values that support
$H_1$ and lead to rejecting $H_0$, is called the rejection region. The other,
consisting of values that support $H_0$, is called the acceptance
region.
Tails of a Test
                     Two-Tailed Test                                Left-Tailed Test     Right-Tailed Test
Sign in $H_0$        $=$                                            $\ge$                $\le$
Sign in $H_1$        $\ne$                                          $<$                  $>$
Rejection Region     In both tails                                  In the left tail     In the right tail
Rejection Region     $Z < -z_{\alpha/2}$ or $Z > z_{\alpha/2}$      $Z < -z_{\alpha}$    $Z > z_{\alpha}$
2.2.1 a) Testing Hypothesis on the Population Mean, $\mu$
Null Hypothesis : $H_0: \mu = \mu_0$
Test Statistic :
any population, $\sigma$ is known and n is large
or
normal population, $\sigma$ is known and n is small
$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
any population, $\sigma$ is unknown and n is large
$$Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
normal population, $\sigma$ is unknown and n is
small
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \qquad v = n - 1$$
Example 2.11
The average monthly earnings for women in managerial and
professional positions is RM2400. Do men in the same positions
have average monthly earnings that are higher than those for women?
A random sample of $n = 40$ men in managerial and professional
positions showed $\bar{x} = \text{RM}3600$ and $s = \text{RM}400$. Test the appropriate
hypothesis using $\alpha = 0.01$.
Solution
1. The hypotheses to be tested are,
$$H_0: \mu \le 2400$$
$$H_1: \mu > 2400$$
2. We use the normal distribution ($n \ge 30$)
3. Rejection Region: $Z > z_{\alpha}$; $z_{\alpha} = z_{0.01} = 2.33$
4. Test Statistic
$$Z = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{3600 - 2400}{400/\sqrt{40}} = 18.97$$
Since $18.97 > 2.33$ falls in the rejection region, we reject $H_0$
and conclude that average monthly earnings for men in managerial
and professional positions are significantly higher than those for women.
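The four steps of the solution can be sketched as a small right-tailed z test. The function name is illustrative, and the critical value is computed with `statistics.NormalDist` rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist


def right_tailed_z_test_mean(xbar, mu0, s, n, alpha):
    """Return (Z, z_alpha, reject) for H0: mu <= mu0 vs H1: mu > mu0."""
    z = (xbar - mu0) / (s / sqrt(n))
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # critical value
    return z, z_alpha, z > z_alpha


# Example 2.11: n = 40, x-bar = 3600, s = 400, mu0 = 2400, alpha = 0.01.
z, z_alpha, reject = right_tailed_z_test_mean(3600, 2400, 400, 40, alpha=0.01)
print(f"Z = {z:.2f}, critical value = {z_alpha:.2f}, reject H0: {reject}")
```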
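Null Hypothesis : $H_0: \mu_1 - \mu_2 = 0$
Test statistics:
For two large and independent samples
and $\sigma_1$ and $\sigma_2$ are known.
$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
For two large and independent samples
and $\sigma_1$ and $\sigma_2$ are unknown.
(Assume $\sigma_1 \ne \sigma_2$)
$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
For two large and independent samples
and $\sigma_1$ and $\sigma_2$ are unknown.
(Assume $\sigma_1 = \sigma_2$)
$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
with
$$S_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$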
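For two small and independent samples
and $\sigma_1$ and $\sigma_2$ are unknown.
(Assume $\sigma_1 \ne \sigma_2$)
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}, \qquad v = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$
For two small and independent samples
taken from two normally distributed
populations. $\sigma_1$ and $\sigma_2$ are unknown.
(Assume $\sigma_1 = \sigma_2$)
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad v = n_1 + n_2 - 2$$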
Alternative hypothesis           Rejection Region
$H_1: \mu_1 - \mu_2 \ne 0$       $Z < -z_{\alpha/2}$ or $Z > z_{\alpha/2}$
$H_1: \mu_1 - \mu_2 > 0$         $Z > z_{\alpha}$
$H_1: \mu_1 - \mu_2 < 0$         $Z < -z_{\alpha}$
Example 2.12
An investigation conducted by a university to determine whether
car ownership affects academic achievement was based on two
random samples of 100 male students, each drawn from the
student body. The grade point average for the $n_1 = 100$ non
owners of cars had an average and variance equal to $\bar{x}_1 = 2.70$
and $s_1^2 = 0.36$ as opposed to $\bar{x}_2 = 2.54$ and $s_2^2 = 0.40$ for the
$n_2 = 100$ car owners. Do the data present sufficient evidence to
indicate a difference in the mean achievements between car owners
and non owners of cars? Test using $\alpha = 0.05$.
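Solution
The hypotheses to be tested are,
$$H_0: \mu_1 - \mu_2 = 0$$
$$H_1: \mu_1 - \mu_2 \ne 0$$
Therefore, test statistic is
$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{2.70 - 2.54}{\sqrt{\dfrac{0.36}{100} + \dfrac{0.40}{100}}} = 1.84$$
Rejection Region: $Z < -z_{\alpha/2}$ or $Z > z_{\alpha/2}$; $z_{\alpha/2} = z_{0.05/2} = z_{0.025} = 1.96$
[Figure: standard normal curve with "Reject $H_0$" regions of area $\alpha/2$ in each tail beyond $z = -1.96$ and $z = +1.96$, and a "Do Not Reject $H_0$" region of area $1-\alpha$ between them.]
Since 1.84 does not exceed 1.96 and is not less than $-1.96$, we fail to reject $H_0$;
that is, there is not sufficient evidence to declare that there is a difference
in the average academic achievement for the two groups.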
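A two-tailed version of the test, applied to the figures of Example 2.12, might look like the sketch below. The function takes sample variances directly, as the example reports them; its name is an assumption for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_z_test_diff_means(x1, var1, n1, x2, var2, n2, alpha):
    """Return (Z, z_{alpha/2}, reject) for H0: mu1 - mu2 = 0."""
    z = (x1 - x2) / sqrt(var1 / n1 + var2 / n2)
    z_half = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z, z_half, abs(z) > z_half


# Example 2.12: non-owners (2.70, variance 0.36), owners (2.54, variance 0.40).
z, z_half, reject = two_tailed_z_test_diff_means(
    2.70, 0.36, 100, 2.54, 0.40, 100, alpha=0.05
)
print(f"Z = {z:.2f}, critical value = {z_half:.2f}, reject H0: {reject}")
```

Checking `abs(z) > z_half` covers both tails at once, which is the same as the pair of conditions $Z < -z_{\alpha/2}$ or $Z > z_{\alpha/2}$.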
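Null Hypothesis : $H_0: p = p_0$
Test Statistic:
$$Z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}}, \qquad q_0 = 1 - p_0$$
Alternative hypothesis      Rejection Region
$H_1: p \ne p_0$            $Z < -z_{\alpha/2}$ or $Z > z_{\alpha/2}$
$H_1: p > p_0$              $Z > z_{\alpha}$
$H_1: p < p_0$              $Z < -z_{\alpha}$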
Example 2.13
When working properly, a machine that is used to make chips for calculators
does not produce more than 4% defective chips. Whenever the machine
produces more than 4% defective chips it needs an adjustment. To check if
the machine is working properly, the quality control department at the
company often takes a sample of chips and inspects them to determine if the
chips are good or defective. One such random sample of 200 chips taken
recently from the production line contained 14 defective chips. Test at the 5%
significance level whether or not the machine needs an adjustment.
Solution
The hypotheses to be tested are,
$$H_0: p \le 0.04$$
$$H_1: p > 0.04$$
Test statistic is
$$Z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}} = \frac{0.07 - 0.04}{\sqrt{\dfrac{0.04(0.96)}{200}}} = 2.17$$
Rejection Region: $Z > z_{\alpha}$; $z_{\alpha} = z_{0.05} = 1.65$
Since $2.17 > 1.65$ falls in the rejection region, we can reject $H_0$
and conclude that the machine needs an adjustment.
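The test on a single proportion uses the hypothesized value $p_0$, not $\hat{p}$, in the standard error. A minimal sketch with the data of Example 2.13 and an illustrative function name:

```python
from math import sqrt
from statistics import NormalDist


def right_tailed_z_test_proportion(x, n, p0, alpha):
    """Return (Z, z_alpha, reject) for H0: p <= p0 vs H1: p > p0."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1.0 - p0) / n)  # standard error under H0
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return z, z_alpha, z > z_alpha


# Example 2.13: 14 defective chips out of 200, p0 = 0.04, alpha = 0.05.
z, z_alpha, reject = right_tailed_z_test_proportion(14, 200, p0=0.04, alpha=0.05)
print(f"Z = {z:.2f}, critical value = {z_alpha:.3f}, reject H0: {reject}")
```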
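Null Hypothesis : $H_0: p_1 - p_2 = 0$
Test Statistics:
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}}$$
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \quad\text{where}\quad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}, \quad \hat{q} = 1 - \hat{p}$$
Alternative hypothesis       Rejection Region
$H_1: p_1 - p_2 \ne 0$       $Z < -z_{\alpha/2}$ or $Z > z_{\alpha/2}$
$H_1: p_1 - p_2 > 0$         $Z > z_{\alpha}$
$H_1: p_1 - p_2 < 0$         $Z < -z_{\alpha}$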
Example 2.14:
Reconsider Example 2.5. At the 1% significance level, can we
conclude that the proportion of users of Toothpaste A who will
never switch to another toothpaste is higher than the proportion
of users of Toothpaste B who will never switch to another
toothpaste?
Solution
The hypotheses to be tested are,
$$H_0: p_1 - p_2 \le 0 \quad (p_1 \text{ is not greater than } p_2)$$
$$H_1: p_1 - p_2 > 0 \quad (p_1 \text{ is greater than } p_2)$$
Therefore, test statistic is
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.20 - 0.17}{\sqrt{(0.187)(0.813)\left(\dfrac{1}{500} + \dfrac{1}{400}\right)}} = 1.15$$
Rejection Region: $Z > z_{\alpha}$; $z_{\alpha} = z_{0.01} = 2.33$
Since $1.15 < 2.33$, we fail to reject $H_0$. There is therefore not sufficient
evidence at the 1% significance level to conclude that the proportion of users
of Toothpaste A who will never switch to another toothpaste is greater than the
proportion of users of Toothpaste B who will never switch to another toothpaste.
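The pooled two-proportion test can be sketched as follows, using the counts from Example 2.5. The function name is hypothetical, and the pooled estimate $\hat{p} = (x_1 + x_2)/(n_1 + n_2)$ is the one used in the solution.

```python
from math import sqrt
from statistics import NormalDist


def right_tailed_z_test_two_proportions(x1, n1, x2, n2, alpha):
    """Return (Z, z_alpha, reject) for H0: p1 - p2 <= 0 vs H1: p1 - p2 > 0."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # pooled sample proportion
    se = sqrt(p_pool * (1.0 - p_pool) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return z, z_alpha, z > z_alpha


# Toothpaste A: 100 of 500. Toothpaste B: 68 of 400. alpha = 0.01.
z, z_alpha, reject = right_tailed_z_test_two_proportions(100, 500, 68, 400, 0.01)
print(f"Z = {z:.2f}, critical value = {z_alpha:.2f}, reject H0: {reject}")
```

Failing to reject here agrees with the 97% interval from Example 2.5, which contained 0.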
