Hypothesis Testing

Chapter 10 Hypothesis Testing
1. Introduction
2. Elements of a Statistical Test
3. Tests for a single population mean and a single population proportion
4. Tests for the differences between two population means and between two population proportions
5. The relationship between hypothesis testing and confidence intervals
6. Tests for a single variance and for comparing two variances
Statistical Inference: Testing Hypotheses (Ch. 10)
Are Math students smarter than average people?
Do Subway sandwiches have 6 g of fat or less?
How can we test these claims?
Statistical Inference: Testing Hypotheses
A test of significance is a procedure for evaluating the strength of the evidence provided by the data against a hypothesis.
Terminology
Key words:
- Null hypothesis Ho
- Alternative hypothesis Ha
- Test statistic and its distribution
- Rejection region
- Type I and Type II errors
- One-sided and two-sided tests
Hypotheses (Ho, Ha):
The null hypothesis, denoted by Ho, is a claim about the population that is being tested in a statistical test. The test is designed to assess the strength of the evidence against the null hypothesis. Usually the null hypothesis is a statement of "no effect" or "no difference."
The alternative hypothesis, denoted by Ha, is the competing claim about the population that we are trying to find evidence for.
The conclusion of a test of Ho versus Ha is then either
(1) reject Ho, only if the sample evidence strongly suggests that Ho is not true; or
(2) fail to reject Ho, if the sample does not contain such evidence.
- Test statistic
A test statistic is a function of the sample data on which the decision to reject or fail to reject Ho is based.
- Rejection region (RR)
The RR is the region containing all values of the test statistic for which the null hypothesis is rejected in favor of the alternative hypothesis. If, for a particular sample, the computed value of the test statistic falls in the RR, we reject Ho; otherwise, we fail to reject Ho.
- Type I error and Type II error
The error of rejecting Ho when Ho is true is called a Type I error.
The error of failing to reject Ho when Ho is false is called a Type II error.
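Testing a hypothesis is like a court trial: Ho plays the role of "the defendant is innocent," and a guilty verdict requires strong evidence.
What are the errors in testing a hypothesis? Convicting an innocent defendant corresponds to a Type I error; acquitting a guilty defendant corresponds to a Type II error.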
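To make the Type I error concrete, here is a minimal simulation sketch (not part of the original notes): it repeatedly samples from a population where Ho is true ($\mu = \mu_0$), runs an upper-tailed z-test each time, and counts rejections. Every rejection is a Type I error, so the rejection fraction should be close to $\alpha$. The parameter values ($\mu_0 = 100$, $\sigma = 16$, $n = 16$, $\alpha = 0.05$) are illustrative assumptions borrowed from the IQ example.

```python
import random
from statistics import NormalDist


def type_i_error_rate(mu0=100.0, sigma=16.0, n=16, alpha=0.05,
                      trials=20000, seed=1):
    """Estimate P(reject Ho | Ho true) for an upper-tailed one-sample z-test.

    Samples are drawn from N(mu0, sigma^2), so Ho is true and every
    rejection counted here is a Type I error.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # z(alpha): upper-tail critical value
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        z = (xbar - mu0) / (sigma / n ** 0.5)
        if z > z_crit:
            rejections += 1
    return rejections / trials


rate = type_i_error_rate()
print(f"estimated Type I error rate: {rate:.3f}")  # should be close to alpha = 0.05
```

Lowering $\alpha$ shrinks the RR and so lowers the Type I error rate, at the cost of more Type II errors.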
- One-sided (one-tailed) and two-sided (two-tailed) tests
If we are interested only in deviations from Ho in one direction, then Ha is one-sided and the test is called a one-sided test.
If we are interested in any difference from Ho, without specifying the direction of the difference, then Ha is two-sided and the test is called a two-sided test.
Main Components of a Hypothesis-Testing Procedure
1. State the null and alternative hypotheses.
2. Specify the test statistic.
3. Compute the value of the test statistic for the given sample(s).
4. For a chosen significance level $\alpha$, determine the RR.
5. Draw a conclusion.
10.4 Hypothesis test for a population mean
The idea behind testing a hypothesis about $\mu$: is $\bar{x}$ close to $\mu$? How do we measure closeness?
What do you conclude if $\bar{x}$ fell in location (1)? Or in location (2)?
To answer this question, assume $\mu$ equals some value (this is our hypothesis). Then examine the z-score, or the corresponding probability, to measure closeness.
One-sample z-test for a population mean
Let $\mu_0$ be the hypothesized value of $\mu$. Assume
1. $\bar{x}$ is the sample mean from a random sample;
2. the population is approximately normal or n is large.
Ho: $\mu = \mu_0$
Test statistic: $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
(a) If Ha: $\mu > \mu_0$, then we reject Ho if $z > z(\alpha)$;
(b) If Ha: $\mu < \mu_0$, then we reject Ho if $z < -z(\alpha)$;
(c) If Ha: $\mu \neq \mu_0$, then we reject Ho if $|z| > z(\alpha/2)$.
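The three rejection rules above can be written as one small function. This is a hedged sketch, not the textbook's code: the function name `one_sample_z_test` and the `alternative` labels are illustrative choices, and the critical values come from Python's `statistics.NormalDist`. The usage line plugs in the numbers of Example 1 that follows.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(xbar, mu0, sigma, n, alpha, alternative):
    """Return (z, reject) for Ho: mu = mu0 when sigma is known.

    alternative: "greater" (Ha: mu > mu0), "less" (Ha: mu < mu0),
    or "two-sided" (Ha: mu != mu0).
    """
    z = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if alternative == "greater":
        reject = z > std_normal.inv_cdf(1 - alpha)          # z > z(alpha)
    elif alternative == "less":
        reject = z < -std_normal.inv_cdf(1 - alpha)         # z < -z(alpha)
    elif alternative == "two-sided":
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)  # |z| > z(alpha/2)
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, reject


# Example 1: xbar = 113, mu0 = 100, sigma = 16, n = 16, alpha = 0.01
z, reject = one_sample_z_test(113, 100, 16, 16, 0.01, "greater")
print(round(z, 2), reject)  # 3.25 True
```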
Example 1: Are Math students smarter than average people?
Suppose you are given the following information. It is known that adults' IQs have a bell-shaped distribution with mean $\mu = 100$ and SD $\sigma = 16$. Assume that a sample of 16 math students from Brock University gave an average IQ of 113.
Are math students smarter than average people? Is the claim supported by the above data at the 1% level?
Test for the mean of a normal population using the z-statistic when $\sigma$ is known
Draw a conclusion by comparing z with $z(\alpha)$ or $z(\alpha/2)$ for the chosen significance level $\alpha$.
Are Math students smarter than average people?
If large IQs are observed for Math students, then we conclude that Math students are not the same as average people but are in fact smarter.
If Math students are no better than average people, then Math students should have the same mean IQ, 100.
Step 1: The population characteristic of interest is $\mu$ = true mean IQ score of math students.
Step 2: Hypotheses: Ho: $\mu = 100$ (no difference) vs. Ha: $\mu > 100$ (smarter).
Step 3: Significance level: $\alpha = 0.01$.
Step 4: Check assumptions: (1) random sample; (2) normality.
Step 5: Test statistic: $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{113 - 100}{16/\sqrt{16}} = 3.25$.
Step 6: This is an upper-tailed test, so the RR is $z > z(\alpha)$, with $z(0.01) = 2.33$.
Step 7: $z = 3.25 > 2.33$, so we reject Ho and conclude that $\mu > 100$: at the 1% level, the data support the claim that math students are smarter.
Steps in Hypothesis Testing:
1. Describe the population characteristic of interest.
2. State the null hypothesis, Ho, and the alternative hypothesis, Ha.
3. Select the significance level $\alpha$.
4. Check the assumptions required for the test.
5. Compute the value of the test statistic w using the given sample.
6. Determine the RR.
7. State the conclusion (reject Ho if w belongs to the RR; do not reject Ho otherwise).
The conclusion should then be stated in the context of the problem, and the level of significance should be included.
Two important results from previous chapters
If the population is approximately normal or n is large, then
1. $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ has approximately a standard normal distribution;
2. $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$ has approximately a t distribution with df = n - 1.
Test for the mean of a normal population using the t-statistic when $\sigma$ is unknown
The RR is determined by t-values.
Draw a conclusion by comparing t with $t(n-1, \alpha)$ or $t(n-1, \alpha/2)$ for the chosen significance level $\alpha$.
One-sample t-test for a population mean
Let $\mu_0$ be the hypothesized value of $\mu$. Assume
1. $\bar{x}$ is the sample mean from a random sample;
2. the population is approximately normal or n is large.
Ho: $\mu = \mu_0$
Test statistic: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
(a) If Ha: $\mu > \mu_0$, then we reject Ho if $t > t(n-1, \alpha)$;
(b) If Ha: $\mu < \mu_0$, then we reject Ho if $t < -t(n-1, \alpha)$;
(c) If Ha: $\mu \neq \mu_0$, then we reject Ho if $|t| > t(n-1, \alpha/2)$.
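A minimal sketch of the t statistic and the upper-tailed rule (a). Python's standard library has no t quantile function, so this sketch assumes the critical value $t(n-1, \alpha)$ is looked up in a t table and passed in; the helper names `one_sample_t` and `reject_upper` are hypothetical. The usage line applies it to the lamb data of Example 2 below, with $t(5, 0.05) = 2.015$.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(data, mu0):
    """Return (t, df) for Ho: mu = mu0 when sigma is unknown."""
    n = len(data)
    se = stdev(data) / sqrt(n)  # s / sqrt(n); stdev uses the n - 1 divisor
    return (mean(data) - mu0) / se, n - 1


def reject_upper(t, t_crit):
    """Upper-tailed rule (a): reject Ho if t > t(n-1, alpha).

    t_crit is a table value supplied by the caller.
    """
    return t > t_crit


gains = [8, 7, 3, 9, 2, 4]
t, df = one_sample_t(gains, 3.5)
print(round(t, 2), df, reject_upper(t, 2.015))  # t is about 1.70, df = 5, not rejected
```

Carrying full precision gives $t \approx 1.70$; the hand calculation, which rounds the SE to 1.18, gives 1.69. The conclusion is the same.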
Example 2. Estimating weight gains of lambs: inference about $\mu$ using t
The following are the weight gains (lbs) of six young lambs of the same breed that had been raised on the same diet: 8, 7, 3, 9, 2, 4.
(a) Construct a 90% CI for the true mean weight gain.
(b) Is the true mean weight gain more than 3.5 lbs? Test using $\alpha = 0.05$.
(a) For the data 8, 7, 3, 9, 2, 4, we have $\sum x = 33$ and $\sum x^2 = 223$. Construct a 90% CI for the true mean weight gain of a population of similar lambs.
Mean $\bar{x} = 5.5$, SD $s = 2.88$, SE $= s/\sqrt{n} = 1.1758 \approx 1.18$.
df = n - 1 = 6 - 1 = 5, $1 - \alpha = 0.90$, $t^* = t(5, \alpha/2) = t(5, 0.05) = 2.015 \approx 2.02$.
90% CI: $5.5 \pm (2.02)(1.18) = 5.5 \pm 2.38 = (3.12, 7.88)$.
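The interval $\bar{x} \pm t^* s/\sqrt{n}$ can be sketched as follows, again assuming $t^* = t(n-1, \alpha/2)$ is taken from a t table because the standard library has no t quantile. The function name `t_confidence_interval` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def t_confidence_interval(data, t_star):
    """Return (lower, upper) = xbar -/+ t* * s / sqrt(n).

    t_star = t(n-1, alpha/2) is a table value supplied by the caller.
    """
    n = len(data)
    xbar = mean(data)
    margin = t_star * stdev(data) / sqrt(n)
    return xbar - margin, xbar + margin


lo, hi = t_confidence_interval([8, 7, 3, 9, 2, 4], 2.015)  # t(5, 0.05) = 2.015
print(f"90% CI: ({lo:.2f}, {hi:.2f})")
```

Without intermediate rounding this gives about (3.13, 7.87), which agrees with the hand result (3.12, 7.88) up to rounding of $t^*$ and the SE.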
(b) Is the true mean weight gain more than 3.5 lbs?
Step 1: The population characteristic of interest is $\mu$ = true mean weight gain.
Step 2: Hypotheses: Ho: $\mu = 3.5$ vs. Ha: $\mu > 3.5$.
Step 3: Significance level: $\alpha = 0.05$.
Step 4: Check assumptions: (1) random sample; (2) normality?
Step 5: Test statistic: $t = \frac{5.5 - 3.5}{1.18} = 1.69$, df = n - 1 = 5.
Step 6: This is an upper-tailed test, so the RR is $t > t(n-1, \alpha)$.
Step 7: Under $\alpha = 0.05$, $t(5, 0.05) = 2.015$. Since $t = 1.69 < 2.015$, we cannot reject Ho.
So we conclude that there is no strong evidence that the true mean weight gain is more than 3.5 lbs, at significance level 0.05.
Summary of one-sample tests for the mean
1. Terms:
Null hypothesis Ho
Alternative hypothesis Ha
Test statistic and its distribution
Rejection region
Type I and Type II errors
One-sided and two-sided tests
2. Testing for a population mean
(1) Using $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ when $\sigma$ is known.
(2) Using $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ when $\sigma$ is unknown.
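One-sample test for a population proportion p using the z distribution
Ho: $p = p_0$ (a predefined, hypothesized value).
Let $X_i \sim B(1, p)$, $i = 1, \dots, n$, be independent, so that $Y = \sum_{i=1}^{n} X_i \sim b(n, p)$ and
$\hat{p} = \frac{Y}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}$,
with $E(\bar{X}) = p$ and $\mathrm{Var}(\bar{X}) = \frac{p(1-p)}{n}$.
Thus, for large n and p not too close to zero (or one), in practice we use
$\hat{p} = \frac{Y}{n} \overset{\text{approx.}}{\sim} N\left(p, \frac{p(1-p)}{n}\right)$.
The true proportion p can be estimated by $\hat{p} = \frac{\#\text{ successes}}{\#\text{ trials}}$, and
$\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \overset{\text{approx.}}{\sim} N(0, 1)$.
So a test statistic can be
$z = \frac{\hat{p} - p_0}{\sqrt{\hat{p}(1-\hat{p})/n}}$,
since p is unknown and, when n is large, we can use the observed proportion $\hat{p}$ in its place.
We could use this statistic for the approximate test. But it is more appropriate to use $p_0(1-p_0)$ in the denominator, since $p = p_0$ when Ho is true:
$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$.
If Ha: $p > p_0$, then we reject Ho if $z > z(\alpha)$;
If Ha: $p < p_0$, then we reject Ho if $z < -z(\alpha)$;
If Ha: $p \neq p_0$, then we reject Ho if $|z| > z(\alpha/2)$.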
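A minimal sketch of the one-sample proportion test using the null-variance statistic $z = (\hat{p} - p_0)/\sqrt{p_0(1-p_0)/n}$ and the two-sided rule. The function names and the data (60 successes in 100 trials, $p_0 = 0.5$) are hypothetical and chosen only so the arithmetic is easy to check: $z = 0.1/0.05 = 2$.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(successes, n, p0):
    """z = (p_hat - p0) / sqrt(p0 (1 - p0) / n), using the variance under Ho."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


def two_sided_reject(z, alpha):
    """Reject Ho: p = p0 in favor of Ha: p != p0 if |z| > z(alpha/2)."""
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


# Hypothetical data (not from the notes): 60 successes in 100 trials, p0 = 0.5
z = one_proportion_z(60, 100, 0.5)
print(round(z, 2), two_sided_reject(z, 0.05), two_sided_reject(z, 0.01))
```

With these numbers Ho is rejected at $\alpha = 0.05$ ($z(0.025) = 1.96$) but not at $\alpha = 0.01$ ($z(0.005) \approx 2.576$).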
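Two-sample tests for the difference between two population means
For a comparison study:
$\mu_1$: population mean for the treatment group
$\mu_2$: population mean for the control group
How do we test Ho: $\mu_1 = \mu_2$?
With independent samples, $\bar{X}_1 \sim N\left(\mu_1, \frac{\sigma_1^2}{n_1}\right)$ and $\bar{X}_2 \sim N\left(\mu_2, \frac{\sigma_2^2}{n_2}\right)$, so
$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$.
When $\sigma_1^2, \sigma_2^2$ are known,
$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0, 1)$.
Ho: $\mu_1 = \mu_2$
Test statistic: $z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
If Ha: $\mu_1 > \mu_2$, then we reject Ho if $z > z(\alpha)$;
If Ha: $\mu_1 < \mu_2$, then we reject Ho if $z < -z(\alpha)$;
If Ha: $\mu_1 \neq \mu_2$, then we reject Ho if $|z| > z(\alpha/2)$.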
This test can be used when the following assumptions are valid:
(1) Both samples are randomly and independently selected;
(2) Both populations are normally distributed, or
(2') both sample sizes are sufficiently large (then $\sigma_i^2$ can be replaced with $s_i^2$);
(3) $\sigma_1^2, \sigma_2^2$ are known.
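A minimal sketch of the known-variance two-sample z statistic with the upper-tailed rule. The function name `two_sample_z` and all the numbers are hypothetical, picked so that $\sqrt{4/4 + 9/3} = 2$ and $z = (10 - 8)/2 = 1$ can be checked by hand.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar1, xbar2, var1, var2, n1, n2):
    """z = (xbar1 - xbar2) / sqrt(var1/n1 + var2/n2) for Ho: mu1 = mu2.

    var1 and var2 are the known population variances sigma1^2 and sigma2^2.
    """
    return (xbar1 - xbar2) / sqrt(var1 / n1 + var2 / n2)


# Hypothetical numbers (not from the notes)
z = two_sample_z(10, 8, 4, 9, 4, 3)
z_crit = NormalDist().inv_cdf(0.95)  # z(0.05) for Ha: mu1 > mu2
print(z, z > z_crit)  # 1.0 False
```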
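When $\sigma_1^2 = \sigma_2^2 = \sigma^2$ is unknown,
$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t(n_1 + n_2 - 2)$,
where $S_{pooled}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$.
Test statistic: $t = \frac{\bar{X}_1 - \bar{X}_2}{S_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
If Ha: $\mu_1 > \mu_2$, then we reject Ho if $t > t(n_1 + n_2 - 2, \alpha)$;
If Ha: $\mu_1 < \mu_2$, then we reject Ho if $t < -t(n_1 + n_2 - 2, \alpha)$;
If Ha: $\mu_1 \neq \mu_2$, then we reject Ho if $|t| > t(n_1 + n_2 - 2, \alpha/2)$.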
This test can be used when the following assumptions are valid:
(1) Both samples are randomly and independently selected;
(2) Both populations are normally distributed;
(3) $\sigma_1^2 = \sigma_2^2 = \sigma^2$;
(4) $\sigma^2$ is unknown.
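The pooled statistic can be computed directly from summary statistics. This is a sketch with a hypothetical function name `pooled_t`; the critical value $t(n_1+n_2-2, \alpha)$ still has to come from a table. The usage line reproduces the Factor S numbers of Example 1 below ($S^2_{pooled} = 334.93$, $t \approx 3.54$).

```python
from math import sqrt


def pooled_t(xbar1, xbar2, s1, s2, n1, n2):
    """Return (t, df, s2_pooled) for Ho: mu1 = mu2, assuming sigma1^2 = sigma2^2."""
    df = n1 + n2 - 2
    s2_pooled = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = sqrt(s2_pooled) * sqrt(1 / n1 + 1 / n2)
    return (xbar1 - xbar2) / se, df, s2_pooled


# Factor S summary statistics: treatment (21, 63, 18.5), control (21, 43, 18.1)
t, df, s2p = pooled_t(63, 43, 18.5, 18.1, 21, 21)
print(round(s2p, 2), round(t, 2), df)  # 334.93 3.54 40
```

The same function reproduces the marathon values of Example 2 ($S^2_{pooled} \approx 1264.14$, $t \approx -0.306$, df = 53).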
Example 1 (Pappenheimer & Karnovsky 1982, described in Moore & McCabe 1989). Scientists isolated a natural sleep potion: Factor S. The rabbits in the treatment group were administered a compound containing Factor S. The rabbits in the control group were given the same compound without Factor S. The response variable is the percentage of time asleep.
Results:
Treatment: $n_1 = 21$, $\bar{x}_1 = 63$, $s_1 = 18.5$
Control: $n_2 = 21$, $\bar{x}_2 = 43$, $s_2 = 18.1$
Does Factor S increase the mean percentage of sleep time? Carry out a test with level $\alpha = 0.05$.
Solution:
Step 1: Let $\mu_1$ be the mean percentage of sleep time for rabbits in the treatment group, which were administered a compound containing Factor S. Let $\mu_2$ be the mean percentage of sleep time for rabbits in the control group, which were given the same compound without Factor S.
Step 2: Ho: $\mu_1 = \mu_2$ versus Ha: $\mu_1 > \mu_2$ (increases).
Step 3: The significance level is $\alpha = 0.05$, as required.
Step 4: We assume that the rabbits are randomly assigned to each group, and that the percentages of sleep time for both groups are normally distributed with equal variances.
Step 5: $\bar{x}_1 - \bar{x}_2 = 63 - 43 = 20$,
$S_{pooled}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{20(18.5)^2 + 20(18.1)^2}{40} = 334.93$,
observed test statistic: $t = \frac{\bar{x}_1 - \bar{x}_2}{S_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{20}{\sqrt{334.93}\sqrt{\frac{1}{21} + \frac{1}{21}}} = 3.54$.
Step 6: The critical value is $t(n_1 + n_2 - 2, \alpha) = t(40, 0.05) = 1.684$, which is close to $z(0.05) = 1.645$. So the rejection region is $t > 1.684$.
Step 7: From this sample, $t = 3.54 > 1.684$. Therefore, we conclude that Factor S increases the mean percentage of sleep time, at significance level 0.05.
Example 2. The following are summary statistics for samples from the Edmonton 2001 marathon results (x is the finishing time in minutes):
F20-29: $n_1 = 22$, $\bar{x}_1 = 260.4$, $s_1 = 33.9$
F30-39: $n_2 = 33$, $\bar{x}_2 = 263.4$, $s_2 = 36.6$
Is there a difference in the true mean marathon finishing times between women in their 20s and women in their 30s?
Use this example to practice the 7-step testing procedure, and check your work against the following key computed values:
$\bar{x}_1 - \bar{x}_2 = -3$,
$S_{pooled}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{21(33.9)^2 + 32(36.6)^2}{53} = 1264.1383$,
$SE = S_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = \sqrt{1264.1383}\sqrt{\frac{1}{22} + \frac{1}{33}} = 9.79$,
$t = -3/9.79 = -0.306$, and $t(53, 0.025) \approx 2.01$ (close to $z(0.025) = 1.96$).
Since $|t| = 0.306 < 2.01$, there is not enough evidence of a difference between women in their 20s and women in their 30s at $\alpha = 0.05$.
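Why is $\frac{(n_1 + n_2 - 2)S_{pooled}^2}{\sigma^2} \sim \chi^2(n_1 + n_2 - 2)$?
Why is $t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t(n_1 + n_2 - 2)$?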
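The first question can be checked numerically (this is a sanity check, not a proof). The sketch below, with hypothetical sample sizes $n_1 = 5$, $n_2 = 7$ and $\sigma = 2$, simulates $(n_1 + n_2 - 2)S^2_{pooled}/\sigma^2$ many times. A $\chi^2(10)$ variable has mean 10 and variance 20, so the simulated draws should match those moments. The key fact behind it: $(n_i - 1)s_i^2/\sigma^2 \sim \chi^2(n_i - 1)$ independently for each sample, and independent chi-squares add.

```python
import random
from statistics import fmean, pvariance


def sample_var(xs):
    """Unbiased sample variance s^2 (divisor n - 1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def pooled_chi2_draws(n1=5, n2=7, sigma=2.0, trials=20000, seed=7):
    """Simulate (n1 + n2 - 2) * S_pooled^2 / sigma^2 for two normal samples
    with a common variance sigma^2 (the means are allowed to differ)."""
    rng = random.Random(seed)
    df = n1 + n2 - 2
    draws = []
    for _ in range(trials):
        x1 = [rng.gauss(0.0, sigma) for _ in range(n1)]
        x2 = [rng.gauss(5.0, sigma) for _ in range(n2)]
        s2_pooled = ((n1 - 1) * sample_var(x1) + (n2 - 1) * sample_var(x2)) / df
        draws.append(df * s2_pooled / sigma ** 2)
    return df, draws


df, draws = pooled_chi2_draws()
# A chi-square(df) variable has mean df and variance 2 * df (here 10 and 20).
print(df, round(fmean(draws), 2), round(pvariance(draws), 2))
```

For the second question, the standardized difference of means is N(0, 1) and independent of $S^2_{pooled}$ under normality, and a N(0, 1) divided by the square root of an independent $\chi^2(\nu)/\nu$ is, by definition, $t(\nu)$.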
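For the two-sample test comparing two population means, Ho: $\mu_1 = \mu_2$, what if $\sigma_1^2, \sigma_2^2$ are unknown but $\sigma_1^2 \neq \sigma_2^2$?
When both sample sizes are large, the test statistic
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \overset{\text{approx.}}{\sim} N(0, 1)$
can be used.
Another approximate test used in practice is
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \overset{\text{approx.}}{\sim} t(\nu_1)$, with $\nu_1 = \min(n_1 - 1, n_2 - 1)$.
Note: this is a very conservative estimate of the degrees of freedom.
A more accurate test is
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \overset{\text{approx.}}{\sim} t(\nu_2)$, with $\nu_2 = \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{(S_1^2/n_1)^2}{n_1 - 1} + \frac{(S_2^2/n_2)^2}{n_2 - 1}}$.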
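A minimal sketch comparing the two degrees-of-freedom choices above. The function name `welch_t` is hypothetical; the formula for $\nu_2$ is the one in the notes (commonly called the Welch-Satterthwaite approximation). The usage line reuses the marathon summary statistics from Example 2 above.

```python
from math import sqrt


def welch_t(xbar1, xbar2, s1, s2, n1, n2):
    """Return (t, nu1, nu2) for Ho: mu1 = mu2 with unequal, unknown variances."""
    a = s1 ** 2 / n1  # S1^2 / n1
    b = s2 ** 2 / n2  # S2^2 / n2
    t = (xbar1 - xbar2) / sqrt(a + b)
    nu1 = min(n1 - 1, n2 - 1)  # conservative choice
    nu2 = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))  # more accurate
    return t, nu1, nu2


# Marathon summary statistics: F20-29 (22, 260.4, 33.9), F30-39 (33, 263.4, 36.6)
t, nu1, nu2 = welch_t(260.4, 263.4, 33.9, 36.6, 22, 33)
print(round(t, 3), nu1, round(nu2, 1))
```

Here $\nu_2 \approx 47.5$ lies between the conservative $\nu_1 = 21$ and the pooled df $n_1 + n_2 - 2 = 53$; $\nu_2$ is generally not an integer, so a table value for the nearest df is used.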
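Recall: one-sample test for a population proportion p using the z distribution
Ho: $p = p_0$
For large n and p not too close to zero, in practice we use
$\hat{p} = \frac{Y}{n} \overset{\text{approx.}}{\sim} N\left(p, \frac{p(1-p)}{n}\right)$, so that $\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \overset{\text{approx.}}{\sim} N(0, 1)$.
So a test statistic can be
$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$.
We could have used $z = \frac{\hat{p} - p_0}{\sqrt{\hat{p}(1-\hat{p})/n}}$ for this approximate test. Your textbook suggests that it is more appropriate to use $p_0(1-p_0)$, since $p = p_0$ when Ho is true.
If Ha: $p > p_0$, then we reject Ho if $z > z(\alpha)$;
If Ha: $p < p_0$, then we reject Ho if $z < -z(\alpha)$;
If Ha: $p \neq p_0$, then we reject Ho if $|z| > z(\alpha/2)$.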
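For two population proportions, the analogous large-sample result is
$\frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}} \overset{\text{approx.}}{\sim} N(0, 1)$.
When Ho: $p_1 = p_2$ is true, the test statistic can be
$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c(1-\hat{p}_c)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$,   (1)
where $\hat{p}_c = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$, which is the consistent estimator with the minimum variance among all unbiased linear combinations of $\hat{p}_1$ and $\hat{p}_2$.
Note: when Ho: $p_1 - p_2 = p_{d0} \neq 0$, the test statistic can be
$z = \frac{\hat{p}_1 - \hat{p}_2 - p_{d0}}{\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}}$.   (2)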
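Statistics (1) and (2) differ only in the standard error. A minimal sketch with hypothetical function names; the usage line uses the refrigerator counts of Example 8.8 below (12 of 50 and 12 of 60).

```python
from math import sqrt


def two_prop_z_pooled(x1, n1, x2, n2):
    """Statistic (1): pooled p_c in the standard error, for Ho: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    pc = (x1 + x2) / (n1 + n2)  # equals (n1*p1 + n2*p2) / (n1 + n2)
    return (p1 - p2) / sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))


def two_prop_z_unpooled(x1, n1, x2, n2, pd0=0.0):
    """Statistic (2): separate variances, for Ho: p1 - p2 = pd0."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2 - pd0) / se


z1 = two_prop_z_pooled(12, 50, 12, 60)
z2 = two_prop_z_unpooled(12, 50, 12, 60)
print(round(z1, 3), round(z2, 3))
```

Both values (about 0.506 and 0.503) are far below $z(0.01) = 2.33$, so the two statistics lead to the same conclusion here.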
Example 8.8 (page 413). Two brands of refrigerators, denoted A and B, are each guaranteed for 1 year. In a random sample of 50 refrigerators of Brand A, 12 were observed to fail before the guarantee period ended. An independent random sample of 60 refrigerators of Brand B also revealed 12 failures during the guarantee period. Is there any difference between the two proportions of failures during the guarantee period? Use $\alpha = 2\%$.
$n_1 = 50$, $\hat{p}_1 = 12/50 = 0.24$, $\hat{q}_1 = 0.76$; $n_2 = 60$, $\hat{p}_2 = 12/60 = 0.20$, $\hat{q}_2 = 0.80$; $z(\alpha/2) = z(0.01) = 2.33$.
A 98% CI for $p_1 - p_2$ is
$(0.24 - 0.20) \pm 2.33\sqrt{\frac{0.24(0.76)}{50} + \frac{0.20(0.80)}{60}} = 0.04 \pm 0.1851 = (-0.1451, 0.2251)$.
Ho: $p_1 = p_2$ vs. Ha: $p_1 \neq p_2$.
Using (1), $z = \frac{0.04}{0.0791} = 0.506 < 2.33$.
Fail to reject Ho: $p_1 = p_2$.
Use this example to practice the testing procedure for comparing two population proportions.
Note: The result obtained by computing a CI, as in your textbook on page 413, is actually consistent with a test using (2):
$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}}$.
In this case, $z = \frac{0.04}{0.0795} = 0.503 < 2.33$.
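When we use the test statistic $z = \frac{\hat{p} - p_0}{\sqrt{\hat{p}(1-\hat{p})/n}}$, the two-sided test agrees with the CI $\hat{p} \pm z(\alpha/2)\sqrt{\hat{p}(1-\hat{p})/n}$ for p. Likewise, when we use the test statistic
$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}}$,
the test agrees with the CI for $p_1 - p_2$: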
For a two-sided test comparing two population proportions, Ho: $p_1 = p_2$ vs. Ha: $p_1 \neq p_2$:
We fail to reject Ho: $p_1 = p_2$ at significance level $\alpha$ if the value 0 lies inside the $100(1-\alpha)\%$ confidence interval for $p_1 - p_2$;
We reject Ho: $p_1 = p_2$ at significance level $\alpha$ if the value 0 lies outside the $100(1-\alpha)\%$ confidence interval for $p_1 - p_2$.
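The CI/test correspondence can be checked numerically: when the interval and statistic (2) share the same unpooled standard error, "0 inside the CI" and "$|z| < z(\alpha/2)$" are the same inequality rearranged. A minimal sketch with a hypothetical function name, using the refrigerator data ($\alpha = 0.02$, $z(0.01) = 2.33$).

```python
from math import sqrt


def diff_ci_and_z(x1, n1, x2, n2, z_crit):
    """Return (lower, upper, z): the CI (p1_hat - p2_hat) +/- z_crit * SE and
    z = (p1_hat - p2_hat) / SE with the same unpooled SE (statistic (2), p_d0 = 0)."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - z_crit * se, d + z_crit * se, d / se


lo, hi, z2 = diff_ci_and_z(12, 50, 12, 60, 2.33)
zero_inside = lo < 0 < hi
print(round(lo, 4), round(hi, 4), zero_inside, abs(z2) < 2.33)
```

The printed bounds, about (-0.1452, 0.2252), match the hand-computed CI up to rounding; both checks say "fail to reject".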
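Testing regarding population variance(s)
1. Tests for a single variance
Example 1 (7.4-1, Hogg & Tanis, p. 373). A psychology professor claims that the variance of IQ scores for college students is equal to $\sigma^2 = 100$. To test this claim, it is decided to test the hypothesis Ho: $\sigma^2 = 100$ against the two-sided alternative hypothesis Ha: $\sigma^2 \neq 100$. A random sample of $n = 23$ students will be selected, and the test will be based on the observed unbiased estimate $s^2 = 147.82$ of the variance $\sigma^2$ of their IQ scores.
Ho: $\sigma^2 = \sigma_0^2$. Under Ho, $\frac{(n-1)s^2}{\sigma_0^2} \sim \chi^2(n-1)$.
If Ha: $\sigma^2 > \sigma_0^2$, then we reject Ho if $\frac{(n-1)s^2}{\sigma_0^2} > \chi^2_{\alpha}(n-1)$;
If Ha: $\sigma^2 < \sigma_0^2$, then we reject Ho if $\frac{(n-1)s^2}{\sigma_0^2} < \chi^2_{1-\alpha}(n-1)$;
If Ha: $\sigma^2 \neq \sigma_0^2$, then we reject Ho if $\frac{(n-1)s^2}{\sigma_0^2} < \chi^2_{1-\alpha/2}(n-1)$ or $\frac{(n-1)s^2}{\sigma_0^2} > \chi^2_{\alpha/2}(n-1)$.
Here $\chi^2_{\alpha}(n-1)$ denotes the value with upper-tail probability $\alpha$.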
Assumptions:
(1) The sample is randomly selected;
(2) The population is normally distributed.
Relation with confidence intervals for $\sigma^2$:
For a two-sided test for a single population variance, Ho: $\sigma^2 = \sigma_0^2$ vs. Ha: $\sigma^2 \neq \sigma_0^2$:
We fail to reject Ho at significance level $\alpha$ if the value $\sigma_0^2$ lies inside the $100(1-\alpha)\%$ confidence interval for $\sigma^2$:
$\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2}(n-1)}, \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}(n-1)}\right)$;
We reject Ho at significance level $\alpha$ if the value $\sigma_0^2$ lies outside this interval.
Suppose that IQ scores are normally distributed. Let $Q = \frac{(n-1)s^2}{\sigma^2} \sim \chi^2(n-1)$.
When Ho: $\sigma^2 = 100$ is true, the test statistic is
$Q = \frac{(n-1)s^2}{100} = \frac{22(147.82)}{100} = 32.52$.
$\chi^2_{0.975}(22) = 10.98$, $\chi^2_{0.025}(22) = 36.78$.
Since $10.98 < 32.52 < 36.78$, we fail to reject this claim at significance level 0.05.
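A minimal sketch of the two-sided chi-square test for a single variance. The standard library has no chi-square quantile function, so this sketch assumes both table values are supplied by the caller; the function name `variance_test_two_sided` is hypothetical.

```python
def variance_test_two_sided(s2, n, sigma0_sq, chi2_lower, chi2_upper):
    """Return (Q, reject) for Ho: sigma^2 = sigma0^2 vs Ha: sigma^2 != sigma0^2.

    chi2_lower = chi^2_{1-alpha/2}(n-1) and chi2_upper = chi^2_{alpha/2}(n-1)
    are table values supplied by the caller.
    """
    q = (n - 1) * s2 / sigma0_sq
    return q, not (chi2_lower < q < chi2_upper)


# IQ example: s^2 = 147.82, n = 23, sigma0^2 = 100, alpha = 0.05
q, reject = variance_test_two_sided(147.82, 23, 100, 10.98, 36.78)
print(round(q, 2), reject)  # 32.52 False
```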
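Note that a 95% confidence interval for $\sigma^2$ can be computed by
$\left(\frac{(n-1)s^2}{36.78}, \frac{(n-1)s^2}{10.98}\right) = \left(\frac{22(147.82)}{36.78}, \frac{22(147.82)}{10.98}\right) = (88.42, 296.18)$.
This CI contains $\sigma^2 = 100$, so we would again fail to reject Ho: $\sigma^2 = 100$, as expected.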
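The same table values give the confidence interval. A minimal sketch (hypothetical function name `variance_ci`) that reproduces (88.42, 296.18) and confirms that $\sigma_0^2 = 100$ lies inside it.

```python
def variance_ci(s2, n, chi2_lower, chi2_upper):
    """100(1-alpha)% CI for sigma^2: ((n-1)s^2/chi2_upper, (n-1)s^2/chi2_lower).

    chi2_upper = chi^2_{alpha/2}(n-1), chi2_lower = chi^2_{1-alpha/2}(n-1),
    both table values supplied by the caller.
    """
    ss = (n - 1) * s2
    return ss / chi2_upper, ss / chi2_lower


lo, hi = variance_ci(147.82, 23, 10.98, 36.78)
print(f"({lo:.2f}, {hi:.2f})", lo < 100 < hi)  # (88.42, 296.18) True
```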
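2. Tests for equality of two population variances
Example 2 (7.4-2, Hogg & Tanis, p. 376). A biologist who studies spiders believes that not only do female green lynx spiders tend to be longer than their male counterparts, but also the lengths of the female spiders seem to vary more than those of the male spiders. We shall test whether this latter belief is true ($\alpha = 0.01$).
Ho: $\sigma_1^2 = \sigma_2^2$. Under Ho, $F = \frac{s_1^2}{s_2^2} \sim F(n_1 - 1, n_2 - 1)$.
If Ha: $\sigma_1^2 > \sigma_2^2$, then we reject Ho if $\frac{s_1^2}{s_2^2} > F_{\alpha}(n_1 - 1, n_2 - 1)$;
If Ha: $\sigma_1^2 < \sigma_2^2$, then we reject Ho if $\frac{s_1^2}{s_2^2} < F_{1-\alpha}(n_1 - 1, n_2 - 1) = \frac{1}{F_{\alpha}(n_2 - 1, n_1 - 1)}$;
If Ha: $\sigma_1^2 \neq \sigma_2^2$, then we reject Ho if $\frac{s_1^2}{s_2^2} > F_{\alpha/2}(n_1 - 1, n_2 - 1)$ or $\frac{s_2^2}{s_1^2} > F_{\alpha/2}(n_2 - 1, n_1 - 1)$.
Assumptions:
(1) The two samples are random and independent;
(2) Both populations are normal.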
Suppose that the distribution of the length $X_1$ of male spiders is $N(\mu_1, \sigma_1^2)$ and the distribution of the length $X_2$ of female spiders is $N(\mu_2, \sigma_2^2)$. We shall test
Ho: $\sigma_1^2 = \sigma_2^2$ (i.e., $\sigma_1^2/\sigma_2^2 = 1$) vs. Ha: $\sigma_1^2 < \sigma_2^2$ (i.e., $\sigma_1^2/\sigma_2^2 < 1$).
30 male and 30 female spiders are randomly selected. The sample observations yielded:
Male: $n_1 = 30$, $\bar{x}_1 = 5.917$, $s_1^2 = 0.4399$
Female: $n_2 = 30$, $\bar{x}_2 = 8.153$, $s_2^2 = 1.41$
$F_{0.01}(29, 29) \approx 2.41$ (nearest degrees of freedom available in the table).
Since $\frac{s_1^2}{s_2^2} = \frac{0.4399}{1.41} = \frac{1}{3.2053} < \frac{1}{2.41}$, the null hypothesis is rejected in favor of the biologist's belief at significance level 0.01.
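A minimal sketch of the lower-tailed F test used for the spiders. As elsewhere, the F table value is an input because the standard library has no F quantile function; the function name `f_test_lower` is hypothetical.

```python
def f_test_lower(s2_1, s2_2, f_alpha_swapped):
    """Lower-tailed F test of Ho: sigma1^2 = sigma2^2 vs Ha: sigma1^2 < sigma2^2.

    Reject Ho if s1^2 / s2^2 < 1 / F_alpha(n2 - 1, n1 - 1); the table value
    F_alpha(n2 - 1, n1 - 1) is supplied by the caller.
    """
    ratio = s2_1 / s2_2
    return ratio, ratio < 1 / f_alpha_swapped


# Spiders: s1^2 = 0.4399 (male), s2^2 = 1.41 (female), F_0.01(29, 29) about 2.41
ratio, reject = f_test_lower(0.4399, 1.41, 2.41)
print(round(ratio, 4), round(1 / 2.41, 4), reject)
```

The ratio is about 0.312, below the cutoff $1/2.41 \approx 0.415$, so Ho is rejected, as in the hand calculation.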
Most commonly used test statistics:
Example 3. Recall the marathon example: the following are summary statistics for samples from the Edmonton 2001 marathon results (x is the finishing time in minutes):
F20-29: $n_1 = 22$, $\bar{x}_1 = 260.4$, $s_1 = 33.9$
F30-39: $n_2 = 33$, $\bar{x}_2 = 263.4$, $s_2 = 36.6$
Is there a difference in the true mean marathon finishing times between women in their 20s and women in their 30s?
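This is a test for comparing two population means. We previously used the pooled two-sample t-test:
$\bar{x}_1 - \bar{x}_2 = -3$,
$S_{pooled}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{21(33.9)^2 + 32(36.6)^2}{53} = 1264.1383$,
$SE = S_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = \sqrt{1264.1383}\sqrt{\frac{1}{22} + \frac{1}{33}} = 9.79$,
$t = -3/9.79 = -0.306$, and $t(53, 0.025) \approx 2.01$ (close to $z(0.025) = 1.96$).
Since $|t| = 0.306 < 2.01$, there is not enough evidence of a difference between women in their 20s and women in their 30s at $\alpha = 0.05$.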
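The test used relied on the assumption $\sigma_1^2 = \sigma_2^2$. Is it a proper test?
We should test Ho: $\sigma_1^2 = \sigma_2^2$ in advance. It is a two-sided test. Let $\alpha = 0.05$.
$F_{0.025}(21, 32) \approx F_{0.025}(20, 30) = 2.20$ (nearest degrees of freedom available in the table);
$F_{0.975}(21, 32) = \frac{1}{F_{0.025}(32, 21)} \approx \frac{1}{F_{0.025}(30, 21)} = \frac{1}{2.31} = 0.4329$.
$\frac{s_1^2}{s_2^2} = \frac{33.9^2}{36.6^2} = 0.8579$, and $0.4329 < 0.8579 < 2.20$, so we fail to reject Ho: $\sigma_1^2 = \sigma_2^2$.
Therefore, the two-sample t-test assuming equal variances was reasonable here.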
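A minimal sketch of this two-sided F check, with the nearest-df table values from the notes passed in as inputs. The function name `f_test_two_sided` is hypothetical.

```python
def f_test_two_sided(s1, s2, f_upper, f_upper_swapped):
    """Two-sided F test of Ho: sigma1^2 = sigma2^2 from sample SDs s1, s2.

    f_upper = F_{alpha/2}(n1-1, n2-1) and f_upper_swapped = F_{alpha/2}(n2-1, n1-1)
    are table values supplied by the caller. Ho is not rejected when
    1 / f_upper_swapped < s1^2 / s2^2 < f_upper.
    """
    ratio = s1 ** 2 / s2 ** 2
    return ratio, not (1 / f_upper_swapped < ratio < f_upper)


# Marathon: s1 = 33.9, s2 = 36.6; table values 2.20 and 2.31 (nearest df)
ratio, reject = f_test_two_sided(33.9, 36.6, 2.20, 2.31)
print(round(ratio, 4), reject)  # 0.8579 False
```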
Relation with confidence intervals for $\sigma_1^2/\sigma_2^2$:
For a two-sided test comparing two population variances, Ho: $\sigma_1^2 = \sigma_2^2$ vs. Ha: $\sigma_1^2 \neq \sigma_2^2$:
We fail to reject Ho at significance level $\alpha$ if the value 1 lies inside the $100(1-\alpha)\%$ confidence interval for $\sigma_1^2/\sigma_2^2$:
$\left(\frac{s_1^2}{s_2^2}\cdot\frac{1}{F_{\alpha/2}(n_1 - 1, n_2 - 1)}, \frac{s_1^2}{s_2^2}\cdot F_{\alpha/2}(n_2 - 1, n_1 - 1)\right)$;
We reject Ho at significance level $\alpha$ if the value 1 lies outside this interval.
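A minimal sketch of this interval for the marathon data, reusing the nearest-df table values 2.20 and 2.31 from Example 3 as inputs (an approximation carried over from the notes). The function name `variance_ratio_ci` is hypothetical. Because 1 lies inside the interval, the conclusion matches the F test of Example 3.

```python
def variance_ratio_ci(s2_1, s2_2, f_upper, f_upper_swapped):
    """100(1-alpha)% CI for sigma1^2 / sigma2^2:
    ((s1^2/s2^2) / F_{alpha/2}(n1-1, n2-1), (s1^2/s2^2) * F_{alpha/2}(n2-1, n1-1)).
    Both F values are table values supplied by the caller.
    """
    r = s2_1 / s2_2
    return r / f_upper, r * f_upper_swapped


lo, hi = variance_ratio_ci(33.9 ** 2, 36.6 ** 2, 2.20, 2.31)
print(f"({lo:.3f}, {hi:.3f})", lo < 1 < hi)
```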
