
Chapter 10 Hypotheses Testing

1. Introduction
2. Elements of a Statistical Test
3. Tests for single population mean and single
population proportion
4. Tests for the differences between two population
means and between two population proportions
5. The relationship between hypothesis-testing and
confidence intervals
6. Test for single variance, and for comparing two
variances

Statistical Inference: Testing Hypotheses (Ch. 10)

Are Math students smarter than the average person?
Do Subway sandwiches have 6 gm of fat or less?

How to test these claims?

Statistical Inference
Testing Hypotheses
A test of significance is a procedure for
evaluating the strength of the evidence
provided by the data against a hypothesis.

Terminology
Key words:
- Null Hypothesis Ho
- Alternative Hypothesis Ha
- Test Statistic and its distribution
- Rejection Region
- Type I and Type II errors
- One-sided and two-sided tests

Hypotheses (Ho, Ha):


The null hypothesis, denoted by Ho, is a claim about the
population that is being tested in a statistical test. The
test is designed to assess the strength of the evidence
against the null hypothesis. Usually the null
hypothesis is a statement of no effect or no
difference.
The alternative hypothesis, denoted by Ha, is the
competing claim about the population that we are
trying to find evidence for.
Conclusions of the test, Ho versus Ha, are then
(1) Reject Ho, only if sample evidence strongly suggests
that Ho is not true. Or,
(2) Fail to reject Ho, if the sample does not contain such
evidence.

- Test Statistic
A test statistic is the function of the sample data on which
the decision to reject or fail to reject Ho is based.
- Rejection Region (RR)
The RR is the set of all values of the test statistic
for which the null hypothesis is rejected in favor
of the alternative hypothesis. If, for a particular sample,
the computed value of the test statistic falls in the RR, we
reject Ho; otherwise, we fail to reject Ho.
- Type I error and Type II error
The error of rejecting Ho when Ho is true is called a
Type I error.
The error of failing to reject Ho when Ho is false is
called a Type II error.

Testing a hypothesis is like a court trial

What are the errors in hypothesis testing?

- One-sided (one-tailed) and two-sided (two-tailed) tests
If we are interested only in deviations from Ho in one
direction, then Ha is one-sided and the test is
called a one-sided test.
If we are interested in a difference from Ho without
specifying the direction of the difference, then Ha is
two-sided and the test is called a two-sided test.

Main Components in a Hypothesis Testing Procedure
1. Compose the null and alternative hypotheses
2. Specify the test statistic
3. Compute the value of the test statistic for the particular
sample(s) given
4. For the chosen significance level $\alpha$, determine the RR
5. State the conclusion

10.4 Hypothesis test for a population mean

Idea behind testing a hypothesis about $\mu$:
Is $\bar{x}$ close to $\mu$? How do we measure closeness?
What do you conclude if $\bar{x}$ fell in location (1)? Or in location (2)?
To answer this question, assume $\mu$ = some value (this is our hypothesis).
Then examine the z-score or the probability as a measure of closeness.

One-sample z-test for a population mean

Let $\mu_0$ be the hypothesized value of $\mu$. Assume
1. $\bar{x}$ is the sample mean from a random sample;
2. the population is approximately normal or n is large.
Ho: $\mu = \mu_0$
Test statistic: $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$

(a) If Ha: $\mu > \mu_0$, then we reject Ho if $z > z(\alpha)$;
(b) If Ha: $\mu < \mu_0$, then we reject Ho if $z < -z(\alpha)$;
(c) If Ha: $\mu \neq \mu_0$, then we reject Ho if $|z| > z(\alpha/2)$.

Example 1: Are Math students smarter than the average person?
Suppose you are given the following information.
It is known that adult IQs have a bell-shaped
distribution with mean $\mu$ = 100 and SD $\sigma$ = 16.
Assume that a sample of 16 math students from Brock
University gave an average IQ of 113.
Are the math students smarter than the average
person? Is the claim supported by the above data
at the 1% level?

Test for the mean of a normal population using the z-statistic when $\sigma$ is known

Draw the conclusion by comparing z with $z(\alpha)$ or $z(\alpha/2)$
for the chosen significance level $\alpha$.

Are Math students smarter than the average person?
If large IQs are observed for Math students,
then we conclude that Math students are indeed
not the same as the average person, but in fact
are smarter.
If Math students are no better than the average
person, then Math students should have the
same mean IQ.

Step 1: The population characteristic of interest is
$\mu$ = true mean of IQ scores of math students.
Step 2: Hypotheses: Ho: $\mu = 100$ (no difference)
vs. Ha: $\mu > 100$ (smarter).
Step 3: Significance level: $\alpha = 0.01$.
Step 4: Check assumptions: (1) random sample (2) normality
Step 5: Test statistic: $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{113 - 100}{16/\sqrt{16}} = 3.25$
Step 6: This is an upper-tailed test. So the RR is $z > z(\alpha)$, with
$z(0.01) = 2.33$.
Step 7: $z = 3.25 > 2.33$. So we reject Ho and conclude $\mu > 100$ at the 1% level,
which means that math students are smarter.
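
The arithmetic in Example 1 can be checked with a short script. This is a minimal sketch using only the Python standard library; the function name `one_sample_z` is a hypothetical helper, not something from the course material.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(xbar, mu0, sigma, n):
    """z statistic for Ho: mu = mu0 when the population SD sigma is known."""
    return (xbar - mu0) / (sigma / sqrt(n))

# Values from Example 1: sample mean 113, mu0 = 100, sigma = 16, n = 16
z = one_sample_z(113, 100, 16, 16)

alpha = 0.01
z_crit = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value z(alpha)
reject = z > z_crit                       # upper-tailed rejection region

print(f"z = {z:.2f}, z({alpha}) = {z_crit:.2f}, reject Ho: {reject}")
```

The computed critical value differs slightly from the table value 2.33 only because of rounding.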

Steps in Hypothesis Testing:
1. Describe the population characteristic of interest.
2. State the null hypothesis, Ho, and the alternative hypothesis,
Ha.
3. Select the significance level $\alpha$.
4. Check the assumptions required for the test.
5. Compute the value of the test statistic w, using the given
sample.
6. Determine the RR.
7. State the conclusion (which will be to reject Ho if w
belongs to the RR and not to reject Ho otherwise).

The conclusion should then be stated in the context of the
problem, and the level of significance should be included.

Two important results from previous chapters

If the population is approximately normal or n is large, then
1. $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
has approximately a standard normal distribution.

2. $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
has approximately a t distribution with df = n-1.

Test for the mean of a normal population using the t-statistic when $\sigma$ is unknown

The RR is determined by t-values.
Draw the conclusion by comparing t with $t(n-1, \alpha)$ or
$t(n-1, \alpha/2)$ for the chosen significance level $\alpha$.

One-sample t test for a population mean

Let $\mu_0$ be the hypothesized value of $\mu$. Assume
1. $\bar{x}$ is the sample mean from a random sample;
2. the population is approximately normal or n is large.
Ho: $\mu = \mu_0$
Test statistic: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$

(a) If Ha: $\mu > \mu_0$, then we reject Ho if $t > t(n-1, \alpha)$;
(b) If Ha: $\mu < \mu_0$, then we reject Ho if $t < -t(n-1, \alpha)$;
(c) If Ha: $\mu \neq \mu_0$, then we reject Ho if $|t| > t(n-1, \alpha/2)$.

Example 2. Estimating weight gains by lambs: Inference about $\mu$ using t
The following are the weight gains (lbs) of six young
lambs of the same breed who had been raised on the same
diet: 8, 7, 3, 9, 2, 4
(a) Construct a 90% CI for the true mean weight
gain.
(b) Is the true mean weight gain more than 3.5 lbs?
Test using $\alpha = 0.05$.

Example 2 (solution): Inference about $\mu$ using t

(a) The following are the weight gains (lbs) of six young
lambs of the same breed who had been raised on the same
diet: 8, 7, 3, 9, 2, 4 ($\sum x = 33$, $\sum x^2 = 223$).
Construct a 90% CI for the true mean weight gain of a
population of similar lambs.
Mean $\bar{x} = 5.5$, SD $s = 2.88$, SE $= s/\sqrt{n} = 1.1758 \approx 1.18$
df $= n-1 = 6-1 = 5$, $1-\alpha = 0.90$, $t^* = t(5, \alpha/2) = 2.02$
90% CI: $5.5 \pm (2.02)(1.18) = 5.5 \pm 2.38 = (3.12, 7.88)$

(b) Is the true mean weight gain more than 3.5 lbs?
Step 1: The population characteristic of interest is
$\mu$ = true mean weight gain.
Step 2: Hypotheses: Ho: $\mu = 3.5$ vs. Ha: $\mu > 3.5$
Step 3: Significance level: $\alpha = 0.05$.
Step 4: Check assumptions: (1) random sample (2) normality?
Step 5: Test statistic: $t = \dfrac{5.5 - 3.5}{1.18} = 1.69$, df = n-1 = 5
Step 6: This is an upper-tailed test. So the RR is $t > t(n-1, \alpha)$.
Step 7: With $\alpha = 0.05$, $t(5, 0.05) = 2.015$. Since $t = 1.69 < 2.015$, we cannot
reject Ho.
So we conclude that there is no strong evidence that the
true mean weight gain is more than 3.5 lbs at significance level
0.05.
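
Both parts of Example 2 can be reproduced from the raw data. This is a minimal sketch using only the standard library; because Python has no built-in t distribution, the critical value 2.015 is hard-coded from the table value quoted in the example. The unrounded statistic comes out near 1.70 rather than 1.69, since the slide rounds the SE to 1.18 first.

```python
from math import sqrt
from statistics import mean, stdev

gains = [8, 7, 3, 9, 2, 4]       # weight gains (lbs) from Example 2
n = len(gains)
xbar = mean(gains)               # sample mean
s = stdev(gains)                 # sample SD (divisor n - 1)
se = s / sqrt(n)                 # standard error of the mean

t_crit = 2.015                   # t(5, 0.05), taken from the t table in the example

# (a) 90% CI uses t(5, alpha/2) with alpha = 0.10, i.e. the same table value
ci = (xbar - t_crit * se, xbar + t_crit * se)

# (b) upper-tailed test of Ho: mu = 3.5 vs Ha: mu > 3.5 at alpha = 0.05
mu0 = 3.5
t_stat = (xbar - mu0) / se
reject = t_stat > t_crit

print(f"mean = {xbar}, s = {s:.2f}, SE = {se:.3f}")
print(f"90% CI = ({ci[0]:.2f}, {ci[1]:.2f}), t = {t_stat:.2f}, reject Ho: {reject}")
```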

Summary of one-sample test for mean

1. Terms:
- Null Hypothesis Ho
- Alternative Hypothesis Ha
- Test Statistic and its distribution
- Rejection Region
- Type I and Type II errors
- One-sided and two-sided tests

2. Testing for a population mean
(1) Using $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ when $\sigma$ is known.
(2) Using $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$ when $\sigma$ is unknown.

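One-sample test for population proportion p using the z-distribution

$H_0: p = p_0$ (a predefined, hypothesized value)

$X_i \sim B(1, p)$, $Y = \sum_{i=1}^{n} X_i \sim b(n, p)$,
$\hat{p} = \dfrac{Y}{n} = \dfrac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}$,
and $E(\bar{X}) = p$; $\mathrm{Var}(\bar{X}) = \dfrac{p(1-p)}{n}$.

Thus, for a large n and p not too close to zero, in practice we use
$\hat{p} = \dfrac{Y}{n} \overset{\text{approx.}}{\sim} N\!\left(p, \dfrac{p(1-p)}{n}\right)$.

The true proportion p can be estimated by
$\hat{p} = \dfrac{\#\text{ successes}}{\#\text{ of trials}}$,
and
$\dfrac{\hat{p} - p}{\sqrt{p(1-p)/n}} \overset{\text{approx.}}{\sim} N(0, 1)$.

So a test statistic can be:
$z = \dfrac{\hat{p} - p_0}{\sqrt{p(1-p)/n}}$

Since p is unknown, when n is large we can use $\hat{p}$ (the observed proportion) instead.
We could have used
$z = \dfrac{\hat{p} - p_0}{\sqrt{\hat{p}(1-\hat{p})/n}}$
for this approximate test. But it would be more appropriate to use
$z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$,
since $p = p_0$ when $H_0$ is true.

If $H_a: p > p_0$, then we reject Ho if $z > z_{\alpha}$;
If $H_a: p < p_0$, then we reject Ho if $z < -z_{\alpha}$;
If $H_a: p \neq p_0$, then we reject Ho if $|z| > z_{\alpha/2}$.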
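
To see how the two versions of the proportion statistic compare, here is a minimal sketch. The counts (58 successes in 100 trials), the value `p0 = 0.5`, and the function name are all hypothetical, chosen only for illustration; they do not come from the course material.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z(y, n, p0, use_null_se=True):
    """z statistic for Ho: p = p0.

    use_null_se=True  -> standard error built from p0 (preferred under Ho)
    use_null_se=False -> standard error built from the observed proportion
    """
    p_hat = y / n
    p_for_se = p0 if use_null_se else p_hat
    return (p_hat - p0) / sqrt(p_for_se * (1 - p_for_se) / n)

# Hypothetical data: 58 successes out of 100 trials, testing Ho: p = 0.5
z_null = proportion_z(58, 100, 0.5, use_null_se=True)
z_obs = proportion_z(58, 100, 0.5, use_null_se=False)

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
reject = abs(z_null) > z_crit

print(f"z (null SE) = {z_null:.3f}, z (observed SE) = {z_obs:.3f}, reject Ho: {reject}")
```

The two statistics are close for large n, but they can fall on opposite sides of the critical value in borderline cases, which is why the choice of standard error matters.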

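Two-sample tests for the difference between two population means

For a comparison study:
$\mu_1$: population mean for the treatment group
$\mu_2$: population mean for the control group
How do we test $H_0: \mu_1 = \mu_2$?

$\bar{X}_1 \sim N\!\left(\mu_1, \dfrac{\sigma_1^2}{n_1}\right)$, $\bar{X}_2 \sim N\!\left(\mu_2, \dfrac{\sigma_2^2}{n_2}\right)$,
$\bar{X}_1 - \bar{X}_2 \sim N\!\left(\mu_1 - \mu_2, \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)$

When $\sigma_1^2$, $\sigma_2^2$ are known,
$z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0, 1)$.

$H_0: \mu_1 = \mu_2$
Test statistic: $z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$

If $H_a: \mu_1 > \mu_2$, then we reject Ho if $z > z_{\alpha}$;
If $H_a: \mu_1 < \mu_2$, then we reject Ho if $z < -z_{\alpha}$;
If $H_a: \mu_1 \neq \mu_2$, then we reject Ho if $|z| > z_{\alpha/2}$.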

This test can be used when the following assumptions are valid:
(1) Both samples are randomly selected,
(2) Both populations are normally distributed,
or (2') Both sample sizes are sufficiently large
((2) can be replaced with (2')),
(3) $\sigma_1^2$, $\sigma_2^2$ are known.

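When $\sigma_1^2 = \sigma_2^2 = \sigma^2$ is unknown,
$t = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_{pooled}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim t(n_1 + n_2 - 2)$,
where
$S^2_{pooled} = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$.

Test statistic: $t = \dfrac{\bar{X}_1 - \bar{X}_2}{S_{pooled}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$

If $H_a: \mu_1 > \mu_2$, then we reject Ho if $t > t_{\alpha}(n_1 + n_2 - 2)$;
If $H_a: \mu_1 < \mu_2$, then we reject Ho if $t < -t_{\alpha}(n_1 + n_2 - 2)$;
If $H_a: \mu_1 \neq \mu_2$, then we reject Ho if $|t| > t_{\alpha/2}(n_1 + n_2 - 2)$.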

This test can be used when the following assumptions are valid:
(1) Both samples are randomly selected,
(2) Both populations are normally distributed,
(3) $\sigma_1^2 = \sigma_2^2 = \sigma^2$,
(4) $\sigma^2$ is unknown.

Example 1. (Pappenheimer & Karnovsky 1982, described in Moore & McCabe
1989). Scientists isolate natural sleep
potion: Factor S. The rabbits in the
treatment group were administered a
compound containing Factor S. The
rabbits in the control group were given
the same compound without Factor S. The
response variable is the percentage of time
asleep.
Results:

Group       n            mean               SD
Treatment   $n_1 = 21$   $\bar{x}_1 = 63$   $s_1 = 18.5$
Control     $n_2 = 21$   $\bar{x}_2 = 43$   $s_2 = 18.1$

Does Factor S increase the mean
percentage of sleep time? Carry out a test
with level $\alpha = 0.05$.

Solution:
Step 1:
Let $\mu_1$ be the mean percentage of sleep
time for the rabbits in the treatment group,
which were administered a compound containing
Factor S. Let $\mu_2$ be the mean percentage
of sleep time for the rabbits in the control
group, which were given the same compound
without Factor S.
Step 2:
$H_0: \mu_1 = \mu_2$ versus $H_a: \mu_1 > \mu_2$
(increases)
Step 3: The significance level is $\alpha = 0.05$,
as required.
Step 4: We assume that the rabbits are
randomly assigned to each group, and the
percentages of sleep time for both groups
are normally distributed.

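Step 5:
$\bar{x}_1 - \bar{x}_2 = 63 - 43 = 20$
$S^2_{pooled} = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \dfrac{20(18.5)^2 + 20(18.1)^2}{40} = 334.93$

Observed test statistic:
$t = \dfrac{\bar{X}_1 - \bar{X}_2}{S_{pooled}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{20}{\sqrt{334.93}\sqrt{\dfrac{1}{21} + \dfrac{1}{21}}} = 3.54$.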

Step 6: The critical value is:
$t_{\alpha}(n_1 + n_2 - 2) = t_{0.05}(40) \approx z_{0.05} = 1.645$
So the rejection region is $t > 1.645$.
Step 7: From this sample, $3.54 > 1.645$.
Therefore, we conclude that Factor S
increases the mean percentage of sleep
time, at the level of significance $\alpha = 0.05$.
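
The pooled-variance calculation in Step 5 can be wrapped in a small function. This is a minimal sketch; `pooled_t` is a hypothetical helper name, and the critical value 1.645 is the normal approximation used in Step 6 rather than an exact t quantile.

```python
from math import sqrt

def pooled_t(n1, xbar1, s1, n2, xbar2, s2):
    """Pooled two-sample t statistic for Ho: mu1 = mu2 (equal variances assumed).

    Returns (t statistic, pooled variance, degrees of freedom).
    """
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
    return (xbar1 - xbar2) / se, sp2, df

# Factor S example: treatment (n=21, mean 63, SD 18.5) vs control (n=21, mean 43, SD 18.1)
t_stat, sp2, df = pooled_t(21, 63, 18.5, 21, 43, 18.1)

crit = 1.645             # approximation to t_0.05(40) used in the example
reject = t_stat > crit   # upper-tailed test

print(f"S2_pooled = {sp2:.2f}, t = {t_stat:.2f}, df = {df}, reject Ho: {reject}")
```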

Example 2. The following are summary
statistics for samples from the Edmonton
2001 marathon results (x is the finishing
time in minutes)

Group    n            mean                  SD
F20-29   $n_1 = 22$   $\bar{x}_1 = 260.4$   $s_1 = 33.9$
F30-39   $n_2 = 33$   $\bar{x}_2 = 263.4$   $s_2 = 36.6$

Is there a difference in the true mean
marathon finishing times between the group
of women in their 20s and that of women in
their 30s?

Use this example to practice the 7-step
testing procedure, and check against the
following key computed values:
$\bar{x}_1 - \bar{x}_2 = -3$
$S^2_{pooled} = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \dfrac{21(33.9)^2 + 32(36.6)^2}{53} = 1264.1383$
$SE = S_{pooled}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}} = \sqrt{1264.1383}\sqrt{\dfrac{1}{22} + \dfrac{1}{33}} = 9.79$
$|t| = \dfrac{3}{9.79} = 0.306$
$t_{\alpha/2}(53) \approx z_{0.025} = 1.96$
Since $0.306 < 1.96$, there is not enough
evidence of a difference between
the group of women in their 20s and that of
women in their 30s at $\alpha = 0.05$.

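Why is $\dfrac{(n_1 + n_2 - 2)S^2_{pooled}}{\sigma^2} \sim \chi^2(n_1 + n_2 - 2)$?

Why is $t = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_{pooled}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim t(n_1 + n_2 - 2)$?

For the two-sample test comparing two
population means, $H_0: \mu_1 = \mu_2$, what if
$\sigma_1^2$, $\sigma_2^2$ are unknown but $\sigma_1^2 \neq \sigma_2^2$?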

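When both sample sizes are large, the following test
statistic can be used:
$t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} \overset{\text{approx.}}{\sim} N(0, 1)$;

Another approximate test in practice is:
$t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} \overset{\text{approx.}}{\sim} t(\nu_1)$, with
$\nu_1 = \min(n_1 - 1, n_2 - 1)$
Note: This is a very conservative estimate
of the degrees of freedom;

A more accurate test is:
$t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} \overset{\text{approx.}}{\sim} t(\nu_2)$, with
$\nu_2 = \dfrac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{(S_1^2/n_1)^2}{n_1 - 1} + \dfrac{(S_2^2/n_2)^2}{n_2 - 1}}$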
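
To compare the two choices of degrees of freedom, the sketch below applies both to the marathon summary statistics from Example 2 ($n_1 = 22$, $s_1 = 33.9$, $n_2 = 33$, $s_2 = 36.6$). The function name `unequal_var_df` is hypothetical; the formula is the $\nu_2$ expression above.

```python
from math import sqrt

def unequal_var_df(s1, n1, s2, n2):
    """Approximate degrees of freedom nu_2 for the unequal-variance t statistic."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Marathon summary statistics from Example 2
n1, xbar1, s1 = 22, 260.4, 33.9
n2, xbar2, s2 = 33, 263.4, 36.6

t_stat = (xbar1 - xbar2) / sqrt(s1**2 / n1 + s2**2 / n2)  # unpooled statistic
nu_conservative = min(n1 - 1, n2 - 1)                     # nu_1
nu_accurate = unequal_var_df(s1, n1, s2, n2)              # nu_2

print(f"t = {t_stat:.3f}, nu_1 = {nu_conservative}, nu_2 = {nu_accurate:.1f}")
```

The conservative choice gives far fewer degrees of freedom than $\nu_2$, so it yields a larger critical value and a test that rejects less often.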

Recall: one-sample test for population
proportion p using the z-distribution
$H_0: p = p_0$
$\hat{p} \overset{\text{approx.}}{\sim} N\!\left(p, \dfrac{p(1-p)}{n}\right)$
for a large n and p not too close to
zero. In practice, we use
$\hat{p} = \dfrac{Y}{n}$, with
$\dfrac{\hat{p} - p}{\sqrt{p(1-p)/n}} \overset{\text{approx.}}{\sim} N(0, 1)$.

So a test statistic can be:
$z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$

We could have used
$z = \dfrac{\hat{p} - p_0}{\sqrt{\hat{p}(1-\hat{p})/n}}$
for this approximate test. Your textbook suggests
that it would be more appropriate to use $p_0(1-p_0)$
when $H_0$ is true.

If $H_a: p > p_0$, then we reject Ho if $z > z_{\alpha}$;
If $H_a: p < p_0$, then we reject Ho if $z < -z_{\alpha}$;
If $H_a: p \neq p_0$, then we reject Ho if $|z| > z_{\alpha/2}$.

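When $H_0: p_1 = p_2$ is true,
$\dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}} \overset{\text{approx.}}{\sim} N(0, 1)$,

and the test statistic can be
$z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c(1 - \hat{p}_c)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$,  (1)

where $\hat{p}_c = \dfrac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$, which is the consistent
estimator with the minimum variance
among all linear combinations of $\hat{p}_1$ and $\hat{p}_2$.

Note: when $H_0: p_1 - p_2 = p_{d0} \neq 0$, the test
statistic can be
$z = \dfrac{\hat{p}_1 - \hat{p}_2 - p_{d0}}{\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}}$.  (2)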

Example 8.8 (Page 413) Two brands of
refrigerators, denoted A and B, are each
guaranteed for 1 year. In a random sample
of 50 refrigerators of Brand A, 12 were
observed to fail before the guarantee
period ended. An independent random
sample of 60 refrigerators of Brand B also
revealed 12 failures during the guarantee
period. Is there any difference between
the two proportions of failures during the
guarantee period, with $\alpha = 2\%$?

$n_1 = 50$, $\hat{p}_1 = \dfrac{12}{50} = 0.24$, $\hat{q}_1 = 0.76$
$n_2 = 60$, $\hat{p}_2 = \dfrac{12}{60} = 0.20$, $\hat{q}_2 = 0.80$
$z_{\alpha/2} = z_{0.01} = 2.33$
A 98% CI for $p_1 - p_2$ is:
$(0.24 - 0.20) \pm 2.33\sqrt{\dfrac{(0.24)(0.76)}{50} + \dfrac{(0.20)(0.80)}{60}} = 0.04 \pm 0.1851 = (-0.1451, 0.2251)$

$H_0: p_1 = p_2$ v.s. $H_a: p_1 \neq p_2$
Using (1), $z = \dfrac{0.04}{0.07912} = 0.506$.
Since $0.506 < 2.33$, we fail to reject $H_0: p_1 = p_2$.
Use this example to practice the testing
procedure for comparing two population
proportions.
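
The pooled-proportion statistic for the refrigerator example can be checked as follows. This is a minimal sketch; the function name `two_proportion_z` is a hypothetical helper, and the pooled estimate is the combined failure rate of both samples.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(y1, n1, y2, n2):
    """Pooled z statistic for Ho: p1 = p2 from success counts y1, y2."""
    p1, p2 = y1 / n1, y2 / n2
    pc = (y1 + y2) / (n1 + n2)                    # pooled proportion
    se = sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Brand A: 12 failures in 50; Brand B: 12 failures in 60
z = two_proportion_z(12, 50, 12, 60)

alpha = 0.02
z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # two-sided critical value
reject = abs(z) > z_crit

print(f"z = {z:.3f}, critical value = {z_crit:.2f}, reject Ho: {reject}")
```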

Note: The conclusion reached by computing a CI,
as in your textbook on page 413, is
actually consistent with a test using (2):
$z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}}$.
In this case, $z = \dfrac{0.04}{0.07944} = 0.504 < 2.33$.

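When we use the test statistic
$z = \dfrac{\hat{p} - p_0}{\sqrt{\hat{p}(1-\hat{p})/n}}$
for one sample, or the test statistic
$z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}}$
for two samples, the standard error is the same one used in the corresponding confidence interval.

For a two-sided test for comparing two
population proportions:
$H_0: p_1 = p_2$ v.s. $H_a: p_1 \neq p_2$
We fail to reject $H_0: p_1 = p_2$ at significance
level $\alpha$ if the value 0 lies inside a
$100(1-\alpha)\%$ confidence interval for $p_1 - p_2$;
We reject $H_0: p_1 = p_2$ at significance level $\alpha$
if the value 0 lies outside a $100(1-\alpha)\%$
confidence interval for $p_1 - p_2$.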

Testing regarding population variance(s)

1. Tests for a single variance
Example 1 (7.4-1, Hogg & Tanis P373) A
psychology professor claims that the
variance of IQ scores for college students
is equal to $\sigma^2 = 100$. To test this claim, it is
decided to test the hypothesis
$H_0: \sigma^2 = 100$ against the two-sided
alternative hypothesis $H_a: \sigma^2 \neq 100$. A
random sample of $n = 23$ students will be
selected, and the test will be based on the
observed unbiased estimate $s^2 = 147.82$ of
the variance $\sigma^2$ of their IQ scores.

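$H_0: \sigma^2 = \sigma_0^2$
If $H_a: \sigma^2 > \sigma_0^2$, then we reject Ho if
$\dfrac{(n-1)s^2}{\sigma_0^2} > \chi^2_{\alpha}(n-1)$;
If $H_a: \sigma^2 < \sigma_0^2$, then we reject Ho if
$\dfrac{(n-1)s^2}{\sigma_0^2} < \chi^2_{1-\alpha}(n-1)$;
If $H_a: \sigma^2 \neq \sigma_0^2$, then we reject Ho if
$\dfrac{(n-1)s^2}{\sigma_0^2} > \chi^2_{\alpha/2}(n-1)$ or $\dfrac{(n-1)s^2}{\sigma_0^2} < \chi^2_{1-\alpha/2}(n-1)$.

Assumptions:
(1) The sample is randomly selected
(2) The population is normally distributed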

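Relation with confidence intervals for $\sigma^2$:

For a two-sided test for a single population
variance:
$H_0: \sigma^2 = \sigma_0^2$ v.s. $H_a: \sigma^2 \neq \sigma_0^2$
We fail to reject $H_0: \sigma^2 = \sigma_0^2$ at significance
level $\alpha$ if the value of $\sigma_0^2$ lies inside the
$100(1-\alpha)\%$ confidence interval for $\sigma^2$:
$\left(\dfrac{(n-1)s^2}{\chi^2_{\alpha/2}(n-1)}, \dfrac{(n-1)s^2}{\chi^2_{1-\alpha/2}(n-1)}\right)$;

We reject $H_0: \sigma^2 = \sigma_0^2$ at significance level $\alpha$
if the value of $\sigma_0^2$ lies outside the
$100(1-\alpha)\%$ confidence interval for $\sigma^2$:
$\left(\dfrac{(n-1)s^2}{\chi^2_{\alpha/2}(n-1)}, \dfrac{(n-1)s^2}{\chi^2_{1-\alpha/2}(n-1)}\right)$.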

Suppose that IQ scores are normally
distributed.
Let $Q = \dfrac{(n-1)s^2}{\sigma^2} \sim \chi^2(n-1)$.
When $H_0: \sigma^2 = 100$ is true, the test
statistic is
$Q = \dfrac{(n-1)s^2}{100} = 32.52$.

$\chi^2_{0.975}(22) = 10.98$, $\chi^2_{0.025}(22) = 36.78$
Since $10.98 < 32.52 < 36.78$, we fail to
reject this claim at a significance level of
0.05.

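Note that a 95% confidence interval for $\sigma^2$
can be computed by:
$\left(\dfrac{(n-1)s^2}{36.78}, \dfrac{(n-1)s^2}{10.98}\right) = \left(\dfrac{22(147.82)}{36.78}, \dfrac{22(147.82)}{10.98}\right) = (88.42, 296.18)$

This CI contains $\sigma^2 = 100$, so we would
again fail to reject $H_0: \sigma^2 = 100$, as expected.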
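
The test statistic and the matching confidence interval for the IQ variance example can be verified together. This is a minimal sketch; the standard library has no chi-square quantile function, so the two critical values are hard-coded from the table values quoted in the example.

```python
# Values from the IQ variance example
n = 23
s2 = 147.82          # observed unbiased estimate of the variance
sigma0_sq = 100      # hypothesized variance under Ho

# Chi-square critical values with 22 df, taken from the example (alpha = 0.05)
chi2_lower = 10.98   # chi^2_{0.975}(22)
chi2_upper = 36.78   # chi^2_{0.025}(22)

q = (n - 1) * s2 / sigma0_sq                 # test statistic Q
reject = q < chi2_lower or q > chi2_upper    # two-sided rejection region

# 95% CI for sigma^2: dividing by the larger critical value gives the lower limit
ci = ((n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower)
inside = ci[0] < sigma0_sq < ci[1]

print(f"Q = {q:.2f}, reject Ho: {reject}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), contains 100: {inside}")
```

The test decision and the CI check always agree, because both are rearrangements of the same inequality on Q.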

2. Tests for equality of two population variances:
Example 2 (7.4-2, Hogg & Tanis P376)
A biologist who studies spiders believes
that not only do female green lynx spiders
tend to be longer than their male
counterparts, but also the lengths of the
female spiders seem to vary more than
those of the male spiders. We shall test
whether this latter belief is true. ($\alpha = 0.01$.)

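$H_0: \sigma_1^2 = \sigma_2^2$
If $H_a: \sigma_1^2 > \sigma_2^2$, then we reject Ho if
$\dfrac{s_1^2}{s_2^2} > F_{\alpha}(n_1 - 1, n_2 - 1)$;
If $H_a: \sigma_1^2 < \sigma_2^2$, then we reject Ho if
$\dfrac{s_1^2}{s_2^2} < \dfrac{1}{F_{\alpha}(n_2 - 1, n_1 - 1)}$;
If $H_a: \sigma_1^2 \neq \sigma_2^2$, then we reject Ho if
$\dfrac{s_1^2}{s_2^2} > F_{\alpha/2}(n_1 - 1, n_2 - 1)$ or
$\dfrac{s_1^2}{s_2^2} < \dfrac{1}{F_{\alpha/2}(n_2 - 1, n_1 - 1)}$.

Assumptions:
(1) The two samples are random and independent;
(2) Both populations are normal.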

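Suppose that the distribution of the length
$X_1$ of male spiders is $N(\mu_1, \sigma_1^2)$; the
distribution of the length $X_2$ of female
spiders is $N(\mu_2, \sigma_2^2)$. We shall test
$H_0: \sigma_1^2 = \sigma_2^2$, i.e. $\dfrac{\sigma_1^2}{\sigma_2^2} = 1$,
v.s. $H_a: \sigma_1^2 < \sigma_2^2$, i.e. $\dfrac{\sigma_1^2}{\sigma_2^2} < 1$.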

30 male and 30 female spiders are
randomly selected.

The sample observations yielded:

Group    n            mean                  $s^2$
Male     $n_1 = 30$   $\bar{x}_1 = 5.917$   $s_1^2 = 0.4399$
Female   $n_2 = 30$   $\bar{x}_2 = 8.153$   $s_2^2 = 1.41$

$F_{0.01}(29, 29) \approx 2.41$ (nearest degrees of
freedom available!)
Since
$\dfrac{s_1^2}{s_2^2} = \dfrac{0.4399}{1.41} = 3.2053^{-1} < 2.41^{-1}$,
the null hypothesis is rejected in favor of
the biologist's belief at a significance level of
0.01.
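
The variance-ratio comparison in the spider example amounts to a one-line check. This is a minimal sketch; the critical value 2.41 is hard-coded from the table value quoted in the example, since the standard library has no F quantile function.

```python
# Sample variances from the spider example
s1_sq = 0.4399   # male spiders,   n1 = 30
s2_sq = 1.41     # female spiders, n2 = 30

f_crit = 2.41    # F_0.01(29, 29), nearest table value as quoted in the example

ratio = s1_sq / s2_sq          # s1^2 / s2^2
inverse_ratio = s2_sq / s1_sq  # s2^2 / s1^2, easier to compare with F tables

# Lower-tailed test of Ho: sigma1^2 = sigma2^2 vs Ha: sigma1^2 < sigma2^2.
# ratio < 1/f_crit is the same condition as inverse_ratio > f_crit.
reject = ratio < 1 / f_crit

print(f"s1^2/s2^2 = {ratio:.4f}, s2^2/s1^2 = {inverse_ratio:.4f}, reject Ho: {reject}")
```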

Most commonly used test statistics:

Example 3. Recall the marathon example:
The following are summary statistics for
samples from the Edmonton 2001
marathon results (x is the finishing time in
minutes)

Group    n            mean                  SD
F20-29   $n_1 = 22$   $\bar{x}_1 = 260.4$   $s_1 = 33.9$
F30-39   $n_2 = 33$   $\bar{x}_2 = 263.4$   $s_2 = 36.6$

Is there a difference in the true mean
marathon finishing times between the group
of women in their 20s and that of women in
their 30s?

This is a test for comparing two population
means. We have previously used
$\bar{x}_1 - \bar{x}_2 = -3$
$S^2_{pooled} = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \dfrac{21(33.9)^2 + 32(36.6)^2}{53} = 1264.1383$
$SE = S_{pooled}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}} = \sqrt{1264.1383}\sqrt{\dfrac{1}{22} + \dfrac{1}{33}} = 9.79$
$|t| = \dfrac{3}{9.79} = 0.306$
$t_{\alpha/2}(53) \approx z_{0.025} = 1.96$
Since $0.306 < 1.96$, there is not enough
evidence of a difference between
the group of women in their 20s and that of
women in their 30s at $\alpha = 0.05$.

The test used was under the assumption of
$\sigma_1^2 = \sigma_2^2$. Is it a proper test?
We should test $H_0: \sigma_1^2 = \sigma_2^2$ in advance. It
is a two-sided test. Let $\alpha = 0.05$.
$F_{0.025}(21, 32) \approx F_{0.025}(20, 30) = 2.20$
(nearest degrees of freedom available!)
$F_{0.025}(32, 21)^{-1} \approx F_{0.025}(30, 21)^{-1} = 1/2.31 = 0.4329$
$\dfrac{s_1^2}{s_2^2} = \dfrac{33.9^2}{36.6^2} = 0.8579 < 2.20$ and $\dfrac{s_1^2}{s_2^2} = 0.8579 > 0.4329$.

Therefore, we fail to reject $H_0: \sigma_1^2 = \sigma_2^2$, and the two-sample t-test
assuming equal variances was valid.
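
The two-sided variance check for the marathon data can be scripted the same way. This is a minimal sketch; both F critical values are the nearest table values quoted above, hard-coded because the standard library has no F quantile function.

```python
# Sample standard deviations from the marathon example
s1, s2 = 33.9, 36.6      # F20-29 (n1 = 22) and F30-39 (n2 = 33)

# Nearest table values quoted in the text (alpha = 0.05, two-sided)
f_upper = 2.20           # F_0.025(20, 30), used for F_0.025(21, 32)
f_lower = 1 / 2.31       # 1 / F_0.025(30, 21), used for 1 / F_0.025(32, 21)

ratio = s1**2 / s2**2    # s1^2 / s2^2

# Reject equal variances only if the ratio falls in either tail
reject = ratio > f_upper or ratio < f_lower

print(f"ratio = {ratio:.4f}, acceptance range = ({f_lower:.4f}, {f_upper:.2f}), reject Ho: {reject}")
```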

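Relation with confidence intervals for $\dfrac{\sigma_1^2}{\sigma_2^2}$

For a two-sided test for comparing two
population variances:
$H_0: \sigma_1^2 = \sigma_2^2$ v.s. $H_a: \sigma_1^2 \neq \sigma_2^2$
We fail to reject $H_0: \sigma_1^2 = \sigma_2^2$ at significance
level $\alpha$ if the value 1 lies inside the
$100(1-\alpha)\%$ confidence interval for $\dfrac{\sigma_1^2}{\sigma_2^2}$:
$\left(\dfrac{s_1^2}{s_2^2} \cdot \dfrac{1}{F_{\alpha/2}(n_1 - 1, n_2 - 1)}, \dfrac{s_1^2}{s_2^2} \cdot F_{\alpha/2}(n_2 - 1, n_1 - 1)\right)$;

We reject $H_0: \sigma_1^2 = \sigma_2^2$ at significance level $\alpha$
if the value 1 lies outside the
$100(1-\alpha)\%$ confidence interval for $\dfrac{\sigma_1^2}{\sigma_2^2}$:
$\left(\dfrac{s_1^2}{s_2^2} \cdot \dfrac{1}{F_{\alpha/2}(n_1 - 1, n_2 - 1)}, \dfrac{s_1^2}{s_2^2} \cdot F_{\alpha/2}(n_2 - 1, n_1 - 1)\right)$.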
