Chapter 2 Hypothesis Testing

Chapter 2 Statistical Hypothesis Testing
_________________________________________________________________
_
STATISTICAL TESTING OF HYPOTHESIS
Concepts and Definitions:
In reality, the problem confronting the scientist or researcher is often not so
much the estimation of a population parameter as the formation of a data-based
decision procedure that can produce a conclusion about some scientific system.
For example, the BFAD must decide whether a new flu vaccine is “effective” or
“ineffective”, or a sociologist might wish to collect appropriate data to decide
whether a person’s blood type and skin color are independent variables. In each
of these cases, the BFAD or the sociologist postulates or conjectures something
about a system. In addition, each must use experimental data and make a decision
based on those data. Formally, in each case, the postulate can be put in the form
of a statistical hypothesis. Procedures that lead to the acceptance or rejection
of statistical hypotheses such as these comprise a major area of statistical
inference. In this section, we discuss the basic principles of statistical
hypothesis testing.
What is a Hypothesis?
A hypothesis is an assertion or a statement about a population. Data are
gathered to check the validity of this claim. In practice, the population is
assumed to be so large that it is not feasible to collect all its elements and
verify the hypothesis directly. In this light, we use only the available data
(a sufficiently large sample) to check the reasonableness of the statement.
What is Hypothesis Testing?
Statistical hypothesis testing is a procedure for framing two conflicting
theories as competing statements and testing these statements on the basis of
sample evidence and probability theory. At the end of this procedure, we either
“reject” or “do not reject” the hypothesis that was assumed to be true.
Five-Step Procedure for Testing a Hypothesis
There is a five-step procedure that systematizes hypothesis testing; when we
get to step 5, we are ready to reject or not reject the hypothesis. Keep in mind
that this procedure, as used by statisticians, does not provide proof that
something is true in the manner that a mathematician “proves” a theorem. It does
provide a “proof beyond reasonable doubt,” in the manner of the court system.
The reader must become accustomed to the idea that the acceptance of a
hypothesis merely implies that the data do not give sufficient evidence to
refute it. On the other hand, rejection implies that the sample evidence refutes
it. Put another way, rejection means that there is a small probability of
obtaining the observed sample information when, in fact, the hypothesis is true.
Step 1. State the Null Hypothesis and the Alternative Hypothesis.
This procedure always starts with stating two mutually exclusive hypotheses:
the null hypothesis, denoted by $H_0$, and the alternative hypothesis, denoted
by $H_1$. The null hypothesis is always assumed to be true before performing the
test. Sometimes it is referred to as the status quo. In practice, the null
hypothesis is expressed as a statement concerning the value of a population
parameter (say, the mean). The alternative hypothesis, on the other hand,
describes what you will conclude if you reject the null hypothesis. It is often
called the research hypothesis, since it is the conclusion that researchers
hope to support.
To illustrate these concepts, suppose we want to test whether a coin is “fair”
or “biased”. For this problem, the appropriate null hypothesis is that the
probability of getting a head, say $p$, is equal to ½. That is,
$H_0: p = 1/2$. The appropriate alternative hypothesis is $H_1: p \neq 1/2$. The
alternative seems obvious! The coin is biased if the chance of getting a head
(or a tail) is not equal to 0.5. Remember that in most cases, the null
hypothesis is expressed as $\theta = \theta_0$, where $\theta$ is the unknown
population parameter and $\theta_0$ is the assumed value to be tested.
Step 2. Select a Level of Significance.
After establishing the null and alternative hypotheses, the next step is to
choose the level of significance, denoted by $\alpha$. To understand the notion
of level of significance, note that a researcher may commit two possible errors
in testing a hypothesis. If the researcher rejects the null hypothesis when it
is in fact true, then he or she commits a Type I error. On the other hand, if
the researcher does not reject the null hypothesis when it is in fact false,
then he or she commits a Type II error. The table below summarizes the decisions
the researcher could make and the possible consequences.
                     Researcher's decision
Null hypothesis      Do not reject H_0      Reject H_0
H_0 is true          Correct decision       Type I error
H_0 is false         Type II error          Correct decision
We denote $\alpha = P(\text{Type I error})$ and $\beta = P(\text{Type II error})$.
Ideally, both of these error probabilities should be small for the test to be
reliable. However, for a fixed sample size we cannot minimize these error
probabilities simultaneously: if we decrease $\alpha$, then $\beta$ increases,
and the reverse is also true. In practice, we usually select a small value for
$\alpha$ and find a test with maximum $1 - \beta$. In statistical theory,
$1 - \beta$ is called the power of a test. In biological studies, the usual
values of $\alpha$ are 0.05 and 0.01. Sometimes, a test that rejects the null
hypothesis at the 0.05 level of significance is called significant, while a test
that rejects the null hypothesis at the 0.01 level of significance is called
highly significant.
Step 3. Select the Appropriate Test Statistic
A test statistic is a value, determined from sample information, used to
determine whether to reject the null hypothesis. We will deal with specific test
statistics in detail later.
Now consider the coin-tossing problem posed earlier. Since we want to test the
“fairness” of the coin, an appropriate test statistic is the number of heads
that occur in n tosses, say n = 100. We can also use the ratio of the number of
heads to the total number of tosses.
Step 4. Formulate the Decision Rule.
A decision rule is a statement of the specific conditions under which the null
hypothesis is rejected and the conditions under which it is not rejected. The
region or area of rejection defines the location of all those values that are so
large or so small that the probability of their occurrence under a true null
hypothesis is rather remote.
Going back to our problem regarding the “fairness” of a coin, note that if the
null hypothesis is true (i.e., p = 0.5), then the number of heads X in 100
tosses should not be too different from 50. So if the decision rule is to reject
$H_0$ if and only if $X < 47$ or $X > 52$, then we will reject the null
hypothesis very often, even when it is true. This is the same as saying that a
defendant is guilty as soon as one witness appears in court! At the other
extreme, if we reject the null hypothesis if and only if $X < 30$ or $X > 80$,
then we can think of this as requiring a roomful of witnesses before we say that
the defendant is guilty. In practice, the decision rule will be based on the
value of $\alpha$ and the probability distribution of the test statistic.
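To see how strongly the choice of decision rule affects the Type I error
probability, the two coin-tossing rules can be evaluated under the null
hypothesis. The sketch below assumes that the rules use strict inequalities
(reject when X < 47 or X > 52, and when X < 30 or X > 80) and that X follows a
binomial distribution with n = 100 and p = 0.5. The function name
`rejection_probability` is only an illustrative helper.

```python
from math import comb


def rejection_probability(n, p, lower, upper):
    """Return P(X < lower or X > upper) for X ~ Binomial(n, p)."""
    # Probability that X falls in the acceptance region lower..upper (inclusive).
    accept = sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lower, upper + 1)
    )
    return 1.0 - accept


n, p_null = 100, 0.5  # 100 tosses; the coin is fair under H0

# Assumed rule 1: reject H0 if X < 47 or X > 52 (acceptance region 47..52).
alpha_lenient = rejection_probability(n, p_null, 47, 52)
# Assumed rule 2: reject H0 if X < 30 or X > 80 (acceptance region 30..80).
alpha_strict = rejection_probability(n, p_null, 30, 80)

print(f"Reject if X < 47 or X > 52: P(Type I error) = {alpha_lenient:.4f}")
print(f"Reject if X < 30 or X > 80: P(Type I error) = {alpha_strict:.6f}")
```

Under these assumptions the first rule rejects a fair coin more than half of the
time, while the second almost never does. A practical rule sits between the two
and is fixed by the chosen level of significance.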
Step 5. Make a Decision

The fifth and final step is to compute the value of the test statistic, check
whether its value is inside the rejection region, and decide whether to reject
or not reject the null hypothesis.
P-values in Hypothesis Test
A commonly used method in hypothesis testing is to report the p-value of the
data. A p-value does not require imposing a pre-selected level of significance.
(Sometimes, selecting the appropriate $\alpha$ is troublesome to the
researcher.) The p-value is the probability that the test statistic will take on
a value that is at least as extreme as the observed value of the statistic when
the null hypothesis is assumed to be true. Thus, a p-value conveys much
information about the weight of evidence against $H_0$, and so a decision maker
can draw a conclusion at any specified level of significance. Formally, we
define the p-value as the smallest level of significance that would lead to the
rejection of the null hypothesis given the observed data. With this definition,
if the observed p-value is equal to 0.40, then we could reject the null
hypothesis only by adopting a level of significance of at least 0.40. But this
value is too large! We cannot afford such a large probability of committing a
Type I error, so we do not reject the null hypothesis. On the other hand, if the
p-value is 0.0001, then we can reject the null hypothesis even at this very
small level of significance, and so we reject it. Thus, given a specified value
of $\alpha$, we reject $H_0$ if and only if p-value $\le \alpha$. Otherwise, we
do not reject.
This method of testing hypotheses is very popular nowadays, since almost all
statistical packages report the p-value for a specific test.
One-Tailed and Two-Tailed Tests
In theory, the test statistic has no fixed value; that is, it is assumed to be
random. The probability distribution of a test statistic is often called the
sampling distribution. In hypothesis testing, we partition the set of all values
of the test statistic into a rejection region and an acceptance region. If the
value of the test statistic falls in the rejection region, then we reject the
null hypothesis. Otherwise, we do not reject it. If the rejection regions are
located at both tails of the distribution of the test statistic, then we have a
two-tailed test.
[Figure: The sampling distribution of a z test, with a rejection region in each
tail and the acceptance region in the center.]
Otherwise, we have a one-tailed test. The figure below depicts two one-tailed
tests using the z statistic.
[Figure: (Left) one-tailed test, with the rejection region in the left tail;
(Right) one-tailed test, with the rejection region in the right tail.]
Tests Concerning a Single Mean
Large Sample ( Z –test )
Suppose that a random sample $X_1, \ldots, X_n$ is drawn from a normal
population with mean $\mu$ and standard deviation $\sigma$, and suppose we want
to test $H_0: \mu = \mu_0$ against an alternative hypothesis $H_1$. If $\sigma$
is known, then the test statistic for this testing problem is given by

$$Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$$

Under $H_0$, this test statistic has a standard normal distribution, and the
test is often called the z-test. For a specified level of significance
$\alpha$, the rejection region for each alternative hypothesis is given below.
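Alternative hypothesis      Reject H_0 if and only if
$\mu < \mu_0$               $Z < -z_{\alpha}$
$\mu > \mu_0$               $Z > z_{\alpha}$
$\mu \neq \mu_0$            $Z < -z_{\alpha/2}$ or $Z > z_{\alpha/2}$

In the table, the critical value $z_{\alpha}$ is chosen such that
$P(Z > z_{\alpha}) = \alpha$.
[Figure: Standard normal curve with area $1 - \alpha$ to the left of
$z_{\alpha}$ and area $\alpha$ to its right.]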
Example 1: Medical researchers have developed a new artificial heart
constructed primarily of titanium and plastic. The heart will last
and operate almost indefinitely once it is implanted in the patient’s
body, but the battery pack needs to be recharged about every four
hours. A random sample of 50 battery packs is selected and
subjected to a life test. The average life of these batteries is 4.05
hours. Assume that battery life is normally distributed with
standard deviation σ = 0.2 hours. Is there evidence to support the
claim that the mean battery life exceeds 4 hours? Use α = 0.05.
Solution: We will solve this testing problem in 5 steps.

1. $H_0: \mu = 4$ (the mean lifetime of the battery packs is 4 hours)
$H_1: \mu > 4$ (the mean lifetime of the battery packs exceeds 4 hours)
2. The problem specifies a 0.05 level of significance.
3. Since we are testing the mean lifetime and the population standard
deviation is known, we use the z-test with standard error
σ / √n = 0.2 / √50 = 0.02828. In MegaStat's normal distribution box,
enter 4 for the mean, 0.02828 for the standard deviation, and 4.05
for x. Click Preview for the result. The result is
z = (4.05 − 4) / 0.02828 = 1.77.
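4. At the 0.05 level of significance, we reject $H_0$ if and only if
$Z > z_{0.05} = 1.645$ (using the normal distribution in MegaStat or
the NORMSINV function in Excel).
NOTE: Since this is a right-tailed test, use the positive critical value.
Thus, the rejection region is Z > 1.645.
[Figure: Rejection region for the example, to the right of the critical value
1.645, with the observed value z = 1.77 inside it.]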
5. Based on the rejection region (and critical value 1.645), we see
that the z value is inside the rejection region (1.77 is greater than
1.645). Thus, we reject the null hypothesis and conclude that the
mean battery life exceeds 4 hours.
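The computation in Example 1 can also be checked without MegaStat. The
following is a minimal sketch using only the Python standard library; the
helper name `z_statistic` is illustrative and not part of any package used in
the text. It also reports the right-tailed p-value, which the worked solution
does not show.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(sample_mean, mu0, sigma, n):
    """One-sample z statistic for H0: mu = mu0 with known sigma."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
alpha = 0.05

z = z_statistic(sample_mean=4.05, mu0=4.0, sigma=0.2, n=50)
z_crit = std_normal.inv_cdf(1 - alpha)  # right-tailed critical value z_alpha
p_value = 1 - std_normal.cdf(z)         # P(Z > observed z) under H0
reject = z > z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if reject else "Do not reject H0")
```

The decision based on the critical value and the decision based on comparing
the p-value with α agree, as they must for the same test.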
Example 2 A random sample of 100 deaths in the Philippines last year showed
an average life span of 69.3 years. Assuming a population standard
deviation of 7.8 years, does this seem to indicate that the mean life
span today is less than 70 years? Use a 0.01 level of significance.
Solution: We will solve this testing problem in 5 steps.

1. H0: µ = 70 years.
H1: µ < 70 years.
2. Use α = 0.01.
3. Since we are testing the mean life span and the population standard
deviation is known, we use the z-test with standard error
σ / √n = 7.8 / √100 = 0.78. Using the normal distribution under
MegaStat, z = (69.3 − 70) / 0.78 = −0.90.
4. At the 0.01 level of significance, we reject $H_0$ if and only if
$Z < -z_{0.01} = -2.33$ (using the normal distribution in MegaStat or
the NORMSINV function in Excel).
Note: Since this is a left-tailed test, use the negative critical value.
Thus, the critical value is z = −2.33.
[Figure: Rejection region to the left of the critical value −2.33, with the
observed value z = −0.90 outside it.]
5. Based on the rejection region (and critical value –2.33), we see
that the z value is outside the rejection region (–0.90 is greater
than –2.33). Thus, we do not reject the null hypothesis. The data do
not give sufficient evidence that the mean life span of Filipinos is
less than 70 years.
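Example 2 is a left-tailed test, so the critical value and the p-value come
from the lower tail of the standard normal distribution. The sketch below
repeats the calculation with the Python standard library only; the variable
names are chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution

sample_mean, mu0, sigma, n = 69.3, 70.0, 7.8, 100
alpha = 0.01

standard_error = sigma / sqrt(n)          # 7.8 / 10 = 0.78
z = (sample_mean - mu0) / standard_error  # observed test statistic
z_crit = std_normal.inv_cdf(alpha)        # left-tailed critical value, -z_alpha
p_value = std_normal.cdf(z)               # P(Z < observed z) under H0
reject = z < z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if reject else "Do not reject H0")
```

The p-value is far above 0.01, which matches the decision not to reject the
null hypothesis.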
Other Uses for the Normal Z Test
The normal Z test can also be used in the following cases:

1. The population standard deviation $\sigma$ is unknown and the sample size is
large enough. (Simulations show that the approximation is valid provided
that the sample size is at least 30.) In this case, we replace $\sigma$ by
$s$, where $s$ is defined by

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

2. The population is not known to be normal and the sample size is large
enough. In this case, we apply the central limit theorem. Also, if the
population standard deviation is not known, use $s$.
Small Sample ( t –test )
Another important continuous distribution in statistics is the Student t
distribution, or simply the t distribution. The t curve and the standard normal
curve are shown below.
[Figure: The z distribution and the t distribution plotted on the same axes.]
Note particularly that the t distribution is flatter, more spread out, than the
standard normal distribution. This is because t distributions have larger standard
deviations than the standard normal.
The following characteristics of the t distribution are based on the
assumption that the population of interest is normal, or nearly normal.
1. It is, like the z distribution, bell-shaped and symmetrical about zero
(0).
2. There is not one t distribution, but rather a “family” of t distributions.
All t distributions have mean 0, but their standard deviations differ
according to the sample size n. As the sample size increases, the t
distribution approaches the z distribution. That is, for large n, t and z
are almost identical.
Suppose that we observe a random sample $X_1, \ldots, X_n$ from a normal
population with mean $\mu$ and standard deviation $\sigma$. In this case, we
assume that the sample size is small (usually smaller than 30) and the
population standard deviation is not known (in practice, this is usually the
case). Suppose we want to test $H_0: \mu = \mu_0$ versus some alternative
hypothesis. For this testing problem, we use the t statistic defined by

$$T = \frac{\bar{X} - \mu_0}{S / \sqrt{n}}$$

where $S$ is the sample standard deviation. The following table provides the
rejection regions for this testing problem.
Alternative Hypothesis      Rejection Region
$\mu < \mu_0$               $T < -t_{\alpha, n-1}$
$\mu > \mu_0$               $T > t_{\alpha, n-1}$
$\mu \neq \mu_0$            $T < -t_{\alpha/2, n-1}$ or $T > t_{\alpha/2, n-1}$

where $t_{\alpha, n-1}$ is the critical value of a t distribution with $n - 1$
degrees of freedom such that $P(T > t_{\alpha, n-1}) = \alpha$.
Example 3 A recent survey stated that cell phone owners received an average
of 50 texts daily. To test the claim, a researcher surveyed 25 cell
phone owners and found that the average number of texts received
was 46. The standard deviation of the sample was 7. At the 0.03
level of significance, is there enough evidence to reject the
survey’s claim?
Solution: 1. H0: µ = 50
H1: µ ≠ 50
2. Use α = 0.03.
3. Compute the t statistic. Using Excel, enter the label (text), the
sample mean (46), the sample standard deviation (7), and the sample
size (25) in cells A1 to A4. Click the MegaStat menu, then select
Hypothesis Tests and choose Mean vs. Hypothesized Value. At the
dialog box, select summary input. In the input range, highlight the
values from A1 to A4. Enter 50, the population mean, for the
hypothesized mean, "not equal" for the alternative, and 97% for the
confidence interval. Note that the confidence level is equal to
(1 – α) 100%. Choose the t-test.
[Figure: MegaStat dialog box for Mean vs. Hypothesized Value.]
After pressing OK, MegaStat will provide an output sheet where all the
necessary computations and statistics are displayed.
The result is t = (46 − 50) / (7 / √25) = −2.86.
4. Compute the critical region. At α = 0.03, we reject H0 if
$T < -t_{\alpha/2, n-1}$ or $T > t_{\alpha/2, n-1}$. Using MegaStat's
t distribution tool, select "calculate t given probability". Enter
0.015 for the probability, since this is a two-tailed test
(α/2 = 0.03/2 = 0.015). Enter 24 for the degrees of freedom
(n – 1 = 25 – 1 = 24). Click Preview.
Note: Since the test is two-tailed, the critical values are t = ± 2.31.
[Figure: Rejection regions to the left of −2.31 and to the right of 2.31, with
the observed value t = −2.86 in the left rejection region.]
5. Based on the rejection region (and critical values ± 2.31), we see
that the t value is inside the rejection region (–2.86 < –2.31).
Thus, we reject the null hypothesis and conclude that the mean
number of texts received by cell phone owners is not equal to 50
texts daily.
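The t statistic in Example 3 follows directly from the summary figures. The
sketch below assumes the critical value 2.31 quoted in the text rather than
computing it, because the Python standard library has no t distribution; the
helper name `t_statistic` is illustrative.

```python
from math import sqrt


def t_statistic(sample_mean, mu0, s, n):
    """One-sample t statistic for H0: mu = mu0 with sample standard deviation s."""
    return (sample_mean - mu0) / (s / sqrt(n))


t = t_statistic(sample_mean=46, mu0=50, s=7, n=25)
degrees_of_freedom = 25 - 1

# Two-tailed critical value for alpha = 0.03 and 24 d.f., taken from the text.
t_crit = 2.31
reject = abs(t) > t_crit

print(f"t = {t:.2f} with {degrees_of_freedom} degrees of freedom")
print("Reject H0" if reject else "Do not reject H0")
```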
Example 4 A company claims that the mean weight per banana it ships is 150
grams. A quality control supervisor takes a random sample of 11
bananas and weighs each one. The results are reported below in
grams.
152 149 157 155 152 148 147 149 150 152 156
Is there sufficient evidence to reject the company’s claim? Use
α = 0.05.
Solution: 1. H0: µ = 150 grams.
H1: µ ≠ 150 grams.
2. Use α = 0.05.
3. Compute the t statistic. Using Excel, enter the data in cells A1 to
A11. Click the MegaStat menu, then select Hypothesis Tests and choose
Mean vs. Hypothesized Value. At the dialog box, select data input.
In the input range, highlight the values from A1 to A11. Enter 150,
the population mean, for the hypothesized mean, "not equal" for the
alternative, and 95% for the confidence interval. Note that the
confidence level is equal to (1 – α) 100%. Choose the t-test.
[Figure: MegaStat dialog box for Mean vs. Hypothesized Value.]
After pressing OK, MegaStat will provide an output sheet where all the
necessary computations and statistics are displayed.
The result is t = 1.54.
4. The critical region, using the t distribution with 10 degrees of
freedom, is t < −2.23 or t > 2.23.
[Figure: Rejection regions to the left of −2.23 and to the right of 2.23, with
the observed value t = 1.54 between them.]
5. Do not reject H0, since –2.23 < 1.54 < 2.23. There is not sufficient
evidence to reject the company's claim that the mean weight per
banana is 150 grams.
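Because Example 4 starts from raw data, the sample mean and sample standard
deviation have to be computed first. A minimal sketch with the Python standard
library follows; the critical value 2.23 is taken from the text as given, and
the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Banana weights in grams, as listed in Example 4.
weights = [152, 149, 157, 155, 152, 148, 147, 149, 150, 152, 156]
mu0 = 150  # hypothesized mean weight

n = len(weights)
x_bar = mean(weights)
s = stdev(weights)  # sample standard deviation (divides by n - 1)
t = (x_bar - mu0) / (s / sqrt(n))

# Two-tailed critical value for alpha = 0.05 and 10 d.f., taken from the text.
t_crit = 2.23
reject = abs(t) > t_crit

print(f"n = {n}, mean = {x_bar:.3f}, s = {s:.3f}, t = {t:.2f}")
print("Reject H0" if reject else "Do not reject H0")
```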
Two Samples: Tests on Two Means
Normal Test for Two Independent Samples
Suppose now that we obtain two independent normal samples. That is, let
$X_1, \ldots, X_n$ be normally distributed with mean $\mu_1$ and standard
deviation $\sigma_1$, and let $Y_1, \ldots, Y_m$ be normally distributed with
mean $\mu_2$ and standard deviation $\sigma_2$. In this case, we want to test
$H_0: \mu_1 = \mu_2$ versus some alternative hypothesis $H_1$. If the population
standard deviations are both known, then we use the Z statistic defined by

$$Z = \frac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{\sigma_1^2}{n} + \dfrac{\sigma_2^2}{m}}}$$
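The rejection regions, together with their respective alternative hypotheses,
are summarized below.

Alternative hypothesis      Rejection region
$\mu_1 < \mu_2$             $Z < -z_{\alpha}$
$\mu_1 > \mu_2$             $Z > z_{\alpha}$
$\mu_1 \neq \mu_2$          $Z < -z_{\alpha/2}$ or $Z > z_{\alpha/2}$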
Note that the above test can also be used when the population standard
deviations are not known, provided that both n and m are large. Just replace
$\sigma_1$ by $s_1$ and $\sigma_2$ by $s_2$.
Example 5 A soft-drink manufacturer claims that its 12-ounce bottles
contain, on average, 30 calories. A supervisor took a random
sample of 40 bottles of this soft drink, checked them for
calories, and found a mean of 32.4 calories and a standard
deviation of 2 calories. Another supervisor took a random sample
of 50 bottles and found a mean of 29.5 calories and a standard
deviation of 3.4 calories. Test at the 1% significance level
whether the sample results indicate that the bottled soft drinks
have equal mean calorie content.
Solution: 1. $H_0: \mu_1 = \mu_2$ (the samples have the same mean calorie content)
$H_1: \mu_1 \neq \mu_2$
2. Use α = 0.01 level of significance.
3. Compute the z statistic. Using Excel, enter the data into two
columns. In this case, enter the following in column A: test 1,
32.4, 2, 40; and in column B: test 2, 29.5, 3.4, 50. Click the
MegaStat menu and select Hypothesis Tests. In the Hypothesis Tests
option, select Compare Two Independent Groups. At the dialog box,
select summary input. Highlight A1 to A4 for Group 1 and B1 to B4
for Group 2. Select the z-test option and change the confidence
level to 99%.
[Figure: MegaStat dialog box for Compare Two Independent Groups.]
After pressing OK, MegaStat will provide an output sheet where all the
necessary computations and statistics are displayed.
The result is z = 5.04.
4. The critical region at α = 0.01 (two-tailed) is z < −2.58 or z > 2.58.
[Figure: Rejection regions to the left of −2.58 and to the right of 2.58, with
the observed value z = 5.04 in the right rejection region.]
5. Reject H0, since 5.04 > 2.58, and conclude that the two samples do
not have the same mean calorie content.
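The z value in Example 5 can be reproduced from the two-sample Z formula, with
the sample standard deviations standing in for the unknown population values
because both samples are large. The sketch below uses only the Python standard
library; the helper name `two_sample_z` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean_x, s_x, n, mean_y, s_y, m):
    """Z statistic for H0: mu1 = mu2 using large-sample standard deviations."""
    return (mean_x - mean_y) / sqrt(s_x**2 / n + s_y**2 / m)


alpha = 0.01
z = two_sample_z(mean_x=32.4, s_x=2.0, n=40, mean_y=29.5, s_y=3.4, m=50)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
reject = abs(z) > z_crit

print(f"z = {z:.2f}, critical values = ±{z_crit:.2f}")
print("Reject H0" if reject else "Do not reject H0")
```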
Two-Sample t-Test
In this case, we consider two independent random samples $X_1, \ldots, X_n$ and
$Y_1, Y_2, \ldots, Y_m$, where $X_i$ is normally distributed with mean $\mu_1$
and standard deviation $\sigma_1$, while $Y_j$ is normally distributed with mean
$\mu_2$ and standard deviation $\sigma_2$. Also, we assume that both sample
sizes n and m are small. Suppose we consider testing $H_0: \mu_1 = \mu_2$ versus
some alternative hypothesis. If we assume that the two population standard
deviations are equal, then the test statistic for this testing problem is given
by

$$T = \frac{\bar{X} - \bar{Y}}{S_P \sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}$$

where $S_P$, known as the pooled standard deviation, is given by

$$S_P = \sqrt{\frac{(n - 1) S_1^2 + (m - 1) S_2^2}{n + m - 2}}$$
The t statistic has a t distribution with $n + m - 2$ degrees of freedom. The
following table summarizes the rejection regions and the corresponding
alternative hypotheses.

Alternative hypothesis      Rejection region
$\mu_1 < \mu_2$             $T < -t_{\alpha, n+m-2}$
$\mu_1 > \mu_2$             $T > t_{\alpha, n+m-2}$
$\mu_1 \neq \mu_2$          $T < -t_{\alpha/2, n+m-2}$ or $T > t_{\alpha/2, n+m-2}$
Example 6 A manufacturer claims that the average tensile strength of thread X
exceeds the average tensile strength of thread Y by at least 10
kilograms. To test his claim, 17 pieces of each type of thread are
tested under similar conditions. Type X thread had an average
tensile strength of 85.7 kilograms with a standard deviation of 5.67
kilograms, while thread Y had an average tensile strength of 75.3
kilograms with a standard deviation of 4.46 kilograms. Test the
manufacturer’s claim using a 0.01 level of significance.
Solution: 1. H0: µ1 − µ2 = 10.
H1: µ1 − µ2 > 10.
2. Use α = 0.01.
3. Compute the t statistic. Using Excel, enter the data into two
columns. In this case, enter the following in column A: thread X,
85.7, 5.67, 17; and in column B: thread Y, 75.3, 4.46, 17. Click the
MegaStat menu and select Hypothesis Tests. In the Hypothesis Tests
option, select Compare Two Independent Groups. At the dialog box,
select summary input. Highlight A1 to A4 for Group 1 and B1 to B4
for Group 2. Enter 10 as the hypothesized difference. Select the
"greater than" option for the alternative. Select the t-test
(pooled variance) option and change the confidence level to 99%.
Press OK for the result.
The result is t = 0.23.
4. The critical region at α = 0.01 with 32 degrees of freedom is
t > 2.45.
[Figure: Rejection region to the right of 2.45, with the observed value
t = 0.23 outside it.]
5. Do not reject H0. We are unable to conclude that the tensile
strength of thread X exceeds that of thread Y by more than
10 kilograms.
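The pooled t statistic in Example 6 can be traced step by step. In the sketch
below, the hypothesized difference of 10 kilograms is subtracted in the
numerator, which is the natural extension of the T formula to H0: µ1 − µ2 = 10.
The critical value 2.45 is taken from the text, and the function names are
illustrative.

```python
from math import sqrt


def pooled_sd(s1, n, s2, m):
    """Pooled standard deviation S_P for two samples of sizes n and m."""
    return sqrt(((n - 1) * s1**2 + (m - 1) * s2**2) / (n + m - 2))


def pooled_t(mean_x, s1, n, mean_y, s2, m, d0=0.0):
    """Two-sample pooled t statistic for H0: mu1 - mu2 = d0."""
    sp = pooled_sd(s1, n, s2, m)
    return (mean_x - mean_y - d0) / (sp * sqrt(1 / n + 1 / m))


n = m = 17
sp = pooled_sd(5.67, n, 4.46, m)
t = pooled_t(85.7, 5.67, n, 75.3, 4.46, m, d0=10)
degrees_of_freedom = n + m - 2

# Right-tailed critical value for alpha = 0.01 and 32 d.f., taken from the text.
t_crit = 2.45
reject = t > t_crit

print(f"S_P = {sp:.3f}, t = {t:.2f}, d.f. = {degrees_of_freedom}")
print("Reject H0" if reject else "Do not reject H0")
```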
Test on a Single Proportion
Tests of hypotheses concerning proportions are required in many areas. All
manufacturing firms are concerned about the proportion of defective items when a
shipment is made. A gambler depends on knowledge of the proportion of
outcomes that he considers favorable. A politician is certainly interested in
knowing what fraction of the voters will favor him in the next election.
For small sample sizes, the binomial test is the appropriate test. However,
several studies have shown that the normal distribution provides a very accurate
approximation to the binomial distribution when n is large and p is close to ½.
Thus, we use the z-test, which standardizes the binomial random variable x
using its mean µ = np and standard deviation σ = √(npq), where q = 1 − p and p
is the value stated in the null hypothesis. For large n, the statistic

$$Z = \frac{x - np}{\sqrt{npq}}$$

is approximately standard normal.
Example 7 A politician claims that he will garner 90% of the votes from his
bailiwick province. Would you agree with his claim if, on a given
day, a researcher asked 1000 qualified voters and it turned out that
only 872 were in favor of the politician? Use α = 0.05.
Solution: 1) H0: p = 0.9
H1: p ≠ 0.9
2) Let α = 0.05.
3) Computation:
z = (872 − 1000(0.9)) / √(1000(0.9)(0.1)) = −2.95; p-value = 0.0032
4) Critical Region
z < −1.96 or z > 1.96 (z 0.05/2 = 1.96)
5) Decision: Reject H0.
Conclusion: We have reason to doubt the politician’s claim.
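The figures reported for Example 7 can be reproduced from the normal
approximation to the binomial. A minimal sketch with the Python standard
library follows; the two-tailed p-value is computed as twice the tail area
beyond the observed z, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution

x, n = 872, 1000  # voters in favor, voters asked
p0 = 0.9          # proportion claimed under H0
q0 = 1 - p0
alpha = 0.05

z = (x - n * p0) / sqrt(n * p0 * q0)
p_value = 2 * std_normal.cdf(-abs(z))         # two-tailed p-value
z_crit = std_normal.inv_cdf(1 - alpha / 2)    # two-tailed critical value
reject = abs(z) > z_crit

print(f"z = {z:.2f}, p-value = {p_value:.4f}, critical values = ±{z_crit:.2f}")
print("Reject H0" if reject else "Do not reject H0")
```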
Test on Two Proportions
Situations arise where we wish to test the hypothesis that two proportions
are equal. For example, a person may decide to give up drinking only if he is
convinced that the proportion of drinkers with liver cancer exceeds the
proportion of nondrinkers with liver cancer. In general, we wish to test the
null hypothesis that two proportions, or binomial parameters, are equal.
Example 8 In a study to estimate the proportion of residents in a certain city
and its suburbs who favor the construction of a thermal power
plant, it is found that 73 out of 130 urban residents favor the
construction while only 60 of 150 suburban residents are in favor.
Is there a significant difference between the proportions of urban and
suburban residents who favor construction of the thermal plant?
Use a 0.01 level of significance.
Solution: 1) H0: p1 = p2
H1: p1 ≠ p2
2) Let α = 0.01.
3) Computation:
z = 2.70; p-value = 0.0069
4) Critical Region
z < −2.58 or z > 2.58 (z 0.01/2 = 2.58)
5) Decision: Reject H0, since 2.70 > 2.58.
Conclusion: The proportion of urban residents favoring
the construction is not equal to the
proportion of suburban residents favoring it.
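The text reports z = 2.70 for Example 8 without showing the formula. The sketch
below assumes the usual pooled two-proportion z statistic, in which both
samples are combined to estimate the common proportion under H0; this choice is
an assumption, although it does reproduce the reported values. Variable names
are illustrative.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution

x1, n1 = 73, 130  # urban residents in favor, urban residents sampled
x2, n2 = 60, 150  # suburban residents in favor, suburban residents sampled
alpha = 0.01

p1_hat = x1 / n1
p2_hat = x2 / n2
# Pooled estimate of the common proportion under H0: p1 = p2 (assumed formula).
p_pooled = (x1 + x2) / (n1 + n2)
standard_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

z = (p1_hat - p2_hat) / standard_error
p_value = 2 * std_normal.cdf(-abs(z))       # two-tailed p-value
z_crit = std_normal.inv_cdf(1 - alpha / 2)  # two-tailed critical value
reject = abs(z) > z_crit

print(f"z = {z:.2f}, p-value = {p_value:.4f}, critical values = ±{z_crit:.2f}")
print("Reject H0" if reject else "Do not reject H0")
```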
One – and – Two – Sample Tests Concerning Variances
Engineers and scientists are constantly confronted with studies in which
they are required to demonstrate that measurements involving products or
processes fall inside specifications set by consumers. The specifications are
often met if the process variance is sufficiently small. Attention is also
focused on comparative experiments between processes, where inherent
variability must be formally compared. A test comparing two variances is often
applied prior to conducting a t-test on two means. The aim is to determine
whether the equal-variance assumption is violated.
Assuming that the population being sampled is normal, the chi-squared statistic
for testing $\sigma^2 = \sigma_0^2$ is given by

$$\chi^2 = \frac{(n - 1) s^2}{\sigma_0^2}$$

which has $n - 1$ degrees of freedom.
Example 9 A manufacturer of cell phone batteries claims that the life of his
batteries is approximately normally distributed with a standard
deviation equal to 1.05 years. If a random sample of 15 of these
batteries has a standard deviation of 1.3 years, do you think that
σ > 1.05 years? Use a 0.01 level of significance.
Solution: 1) H0: σ² = 1.1025
H1: σ² > 1.1025
2) Let α = 0.01.
3) Compute the chi-square variance test statistic:
χ² = (n − 1)s² / σ0² = (14)(1.3)² / 1.1025 = 21.46, P = 0.0904
4) Critical Region:
χ² > 29.14 (14 degrees of freedom).
[Figure: Chi-square curve with the critical value 29.14 and the observed value
21.46 to its left, outside the rejection region.]
5) Do not reject H0. The χ² statistic is not significant at the 0.01
level. However, based on the P-value of 0.09, there is some weak
evidence that σ > 1.05.
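Both the statistic and the P-value in Example 9 can be checked by hand. The
sketch below relies on a closed form for the chi-square upper-tail probability
that holds only when the degrees of freedom are even, which is the case here
(14). The helper `chi2_sf_even_df` is an illustrative function, not a library
routine, and the critical value 29.14 is taken from the text.

```python
from math import exp, factorial


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(chi-square > x); valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even number of d.f.")
    half = x / 2
    # For df = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    return exp(-half) * sum(half**i / factorial(i) for i in range(df // 2))


n, s, sigma0 = 15, 1.3, 1.05
df = n - 1

chi2 = df * s**2 / sigma0**2        # (n - 1) s^2 / sigma_0^2
p_value = chi2_sf_even_df(chi2, df)

chi2_crit = 29.14                   # critical value for alpha = 0.01, from the text
reject = chi2 > chi2_crit

print(f"chi-square = {chi2:.2f}, d.f. = {df}, P = {p_value:.4f}")
print("Reject H0" if reject else "Do not reject H0")
```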
Goodness-of-Fit Test
Throughout this chapter we have been concerned with the testing of
statistical hypotheses about single population parameters such as µ, σ², and p.
Now we shall consider a test to determine if a population has a specified
theoretical distribution. The test is based on how good a fit we have between
the frequency of occurrence of observations in an observed sample and the
expected frequencies obtained from the hypothesized distribution.
The observed frequencies will always differ from the expected frequencies
due to sampling error. But are these differences significant? The chi-square
goodness-of-fit test enables one to determine the answer.

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O = observed frequencies
E = expected frequencies
and the statistic has k – 1 degrees of freedom, k being the number of
categories.
Note: The expected frequency for each category must be at least 5.
Example 10 There are three gates at Juan G. Macaraeg National High School.
The principal would like to know if the gates are equally utilized.
As an experiment, 2700 students are observed as they enter the
school. The number of students who entered through the gate in
Canarvacanan was 1005, in Sta. Fe 985, and in Sto. Niño 710. At the
0.05 significance level, can we conclude that there is no difference
in the use of the three gates?
Solution:
1. H0: Students have no gate preference.
H1: Students show a gate preference.
2. Let α = 0.05.
3. Compute the chi-square goodness-of-fit statistic. Under H0, the
expected frequency for each gate is 2700 / 3 = 900.
χ² = (105² + 85² + (−190)²) / 900 = 60.39
4. Critical Region
χ² > χ² 0.05, 2 = 5.99
5. Decision: Reject H0, since 60.39 > 5.99.
Conclusion: There is a large difference between the set of
observed frequencies and the set of expected frequencies, so the
gates are not equally utilized.
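The goodness-of-fit statistic for Example 10 is a short sum over the three
gates. The following sketch uses only the Python standard library; the
dictionary keys simply repeat the gate names from the example, and the critical
value 5.99 is taken from the text.

```python
# Observed number of students entering through each gate (Example 10).
observed = {"Canarvacanan": 1005, "Sta. Fe": 985, "Sto. Niño": 710}

total = sum(observed.values())
k = len(observed)
expected = total / k  # equal use of the gates under H0

# Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E over categories.
chi2 = sum((o - expected) ** 2 / expected for o in observed.values())
degrees_of_freedom = k - 1

chi2_crit = 5.99  # critical value for alpha = 0.05 and 2 d.f., from the text
reject = chi2 > chi2_crit

print(f"expected = {expected:.0f} per gate, chi-square = {chi2:.2f}, "
      f"d.f. = {degrees_of_freedom}")
print("Reject H0" if reject else "Do not reject H0")
```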
Test for Independence ( Categorical Data)
The chi-square test procedure can also be used to test the hypothesis of
independence of two variables of classification. Suppose we wish to determine if
each person’s blood type and eye color are related in any way. To find out
whether two observed characteristics of a member of a population are
independent, we use the test for independence.
Suppose we pick a sample of size n and classify the data in a two-way table
on the basis of two variables. Such a table for determining whether the
distribution according to one variable is contingent on the distribution of the
other is called a contingency table. A contingency table with r rows and c
columns is referred to as an r × c table (“r × c” is read “r by c”).
The formula for the test for independence is

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where the summation extends over all rc cells in the r × c contingency
table, and the expected frequency E of each cell is (row total × column total)
/ grand total. If $\chi^2 > \chi^2_{\alpha}$ with v = (r – 1)(c – 1) degrees of
freedom, reject the null hypothesis of independence at the α level of
significance; otherwise, do not reject the null hypothesis.
Example 11 A researcher wishes to see whether the age of an individual is
related to milk consumption. The data are classified below.

                     Milk Consumption
Age              Low    Moderate    High
16 – 25           10          15      16
26 – 35           11          13      27
36 – 45           16          28       5
45 and above       8           9       7

At α = 0.05, is there a relationship between milk consumption and
age?
Solution: 1) H0: Milk consumption of a person does not depend on his age.
H1: Milk consumption of a person depends on his age.
2) Let α = 0.05.
3) Compute $\chi^2$:
$\chi^2 = 22.37$
4) Critical Region:
$\chi^2_{0.05,\,6} = 12.59$
5) Decision: Reject H0, since 22.37 > 12.59.
Conclusion: Milk consumption of an individual depends on his age.
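The computation in step 3 of Example 11 can be checked with a short script. This is a minimal sketch in plain Python, not the method used in the text: the function name `chi_square_independence` is a hypothetical helper, and the expected counts are assumed to follow the usual (row total × column total) / n rule.

```python
# Hypothetical helper (not from the text): chi-square statistic and degrees
# of freedom for an r x c contingency table given as a list of rows.
def chi_square_independence(observed):
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count under independence (assumed standard rule).
            e = row_totals[i] * col_totals[j] / n
            chi2 += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Observed frequencies from Example 11
# (rows: age groups; columns: Low, Moderate, High).
milk = [
    [10, 15, 16],
    [11, 13, 27],
    [16, 28, 5],
    [8, 9, 7],
]

chi2, df = chi_square_independence(milk)
critical = 12.59  # tabulated value for alpha = 0.05 and 6 df, as in the example

print(f"chi-square = {chi2:.2f}, df = {df}")
print("Reject H0" if chi2 > critical else "Do not reject H0")
```

Run on the table from Example 11, the sketch gives a statistic of about 22.37 with 6 degrees of freedom, which exceeds the critical value 12.59.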
Name:_______________________________________ Score:______________
Section:______________________________________ Date: ______________
Exercise 2.1
1. The hospital record shows that the mean weight of a newborn baby is
8.3 lbs, with a standard deviation of 0.6 lbs. A researcher takes a sample
of 100 newborn babies and finds that they have a mean weight of 7.8 lbs.
Test the claim at the 0.01 level of significance.
2. A sociologist finds that in the Philippines, the mean number of years of
education is 10, while the standard deviation is 1.8. In Region I, a random
sample of 100 people is found to have a sample mean of 12 years. At
the 0.01 level of significance, test the claim that the mean for Region I is
the same as the mean of the country.
3. A survey found that women over the age of 60 consume an average of
1870 calories a day. In order to see if the number of calories consumed by
women over age 60 living in Binalonan is the same, a researcher
sampled 200 women over the age of 60 and found that the mean number of
calories consumed was 1997. The standard deviation of the sample was 56
calories. At α = 0.05, can it be concluded that there is no difference
between the mean number of calories consumed by women over age 60 in
Binalonan and the survey average?
4. A manufacturer of a certain brand of auto batteries claims that the mean
life of these batteries is 80 months. A consumer protection agency that
wants to check this claim took a random sample of 50 such batteries
and found that the mean life for this sample is 73.75 months with
a standard deviation of 7 months. Using the 2.5% significance level,
would you conclude that the mean life of these batteries is less than 80
months?
5. A company claims that the mean weight per banana it ships is 180 grams
with a standard deviation of 10 grams. Data generated from a sample of 70
bananas randomly selected from a shipment indicated a mean weight of
193.5 grams per banana. Is there sufficient evidence to reject the
company’s claim? Use α = 0.01.
6. The treasurer of a certain university claims that the mean monthly salary
of its college professors is P 37,750 with a standard deviation of P 3000.
A researcher takes a random sample of 100 college professors, who were
found to have a mean monthly salary of P 34,375. Do the 100 college
professors have higher salaries than the rest? Test the claim at the α = 0.02
level of significance.
Name:_______________________________________ Score:______________
Section:______________________________________ Date: ______________
Exercise 2.2
1. Past experience indicates that the time for high school juniors to complete
a standardized test is a normal random variable with a mean of 60 minutes.
If a random sample of 15 high school juniors took an average of 66
minutes to complete this test with a standard deviation of 5 minutes, test
the hypothesis at the 0.01 level of significance that µ = 60 minutes against
the alternative that µ < 60 minutes.
2. A recent survey stated that households received an average of 30
telephone calls per week. To test the claim, a researcher surveyed 15
households and found that the average number of calls was 38 and the
standard deviation was 4. At α = 0.01, can the claim be rejected?
3. Test the hypothesis that the average content of a particular soft drink is 1
liter if the contents of a random sample of 10 bottles are 1.04, 0.97, 1.01,
1.05, 0.97, 0.98, 1.02, 1.05, 0.97, and 0.97 liters. Use a 0.01 level of
significance and assume that the distribution of contents is normal.
4. The president of a certain tricycle operators and drivers association claims
that the average mileage of tricycles is less than 80,000. A sample of 16
tricycles has an average mileage of 90,000, with a standard deviation of 8,000.
At α = 0.02, is there enough evidence to reject the president’s claim?
Name:_______________________________________ Score:______________
Section:______________________________________ Date: ______________
Exercise 2.3
1. A large automobile manufacturing company is trying to decide whether to
buy brand X or brand Y tires for its new models. To help arrive at a
decision, an experiment is conducted using 100 tires of each brand. The tires
are run until they wear out. The results are

Brand    Mean (in kilometers)    SD (in kilometers)
X              60,000                   750
Y              75,750                   500

Test the hypothesis that there is no difference between the 2 brands of tires.
Use a 0.03 level of significance.
2. Two types of plastic are suitable for use by an electric component
manufacturer. The breaking strength of the plastic is important. It is
known that σ₁ = σ₂ = 1 psi. From random samples of sizes n = 60 and
m = 72, we obtain sample means x̄₁ = 162.5 psi and x̄₂ = 155.0 psi. The company
will choose the plastic with the larger breaking strength. Based on the samples,
should the company choose plastic 1? Use a 0.01 level of significance.
3. A Mathematics department head claims that the average score of the Special
Science Class (SSC) exceeds the average score of the Regular class by at least
20 points in the progress test. To test his claim, he chose 100 SSC students at
random and found that their average score was 80 with a standard deviation
of 9, while 120 Regular class students had an average score of 67 with a
standard deviation of 7. Test the department head’s claim using a 0.01
level of significance.
4. The following data, recorded in kilometers per liter, represent the fuel
consumption of two vehicles tested in 90-kilometer-per-hour steady-speed
tests:

Vehicle      Sample (n)    Mean    Standard Deviation
Jeepney          67         27            2.3
Mini-bus         59         29            4.1

Test the hypothesis that mini-buses, on the average, exceed similarly
equipped jeepneys by 5 kilometers per liter. Use a 0.01 level of
significance.
Name:_______________________________________ Score:______________
Section:______________________________________ Date: ______________
Exercise 2.4
1. To find out whether a new drug will cure diabetes, 15 mice with an
advanced stage of the disease are selected. Survival times, in years, from the
time the experiment commenced are as follows:

Treatment       8.9   6.8   6.2   2.9   7.9   5.8   4.2   6.1
No treatment    2.2   2.3   2.5   3.6   2.4   3.1   2.1

At the 0.05 level of significance, can the drug be said to be effective?
2. Teacher X conducted a review class in his Chemistry class. He gave a test
before and after the review and gathered the following data:

Student                 1    2    3    4    5    6    7    8    9
Score before review    15   19   19   17   24   29   26   11   18
Score after review     17   29   41   20   32   28   39   30   27

At the α = 0.01 level of significance, is the review class effective?
3. It is claimed that a new diet will reduce a person’s weight by 3 kilograms
on the average in a period of 10 days. The weights of 8 women who
followed this diet were recorded before and after a 20-day period.

Woman             1      2      3      4      5      6      7      8
Weight before    57.6   60.5   89.8   67.9   57.6   54.2   68.3   69.3
Weight after     50.6   60.3   62.4   60.2   51.6   55.0   59.4   60.3

Test the claim at the α = 0.02 level of significance.
4. Twenty subjects were used in an experiment to determine if an atmosphere
involving exposure to carbon monoxide has an impact on breathing
capability. The subjects were exposed to breathing chambers, one of which
contained a high concentration of CO. The average number of breaths per
minute without CO is 50 with a standard deviation of 2, while the average
number of breaths per minute with CO is 49 with a standard deviation
of 6. Test the hypothesis that the mean breathing frequency is the same for
the two environments. Use a 0.01 level of significance.
Name:_______________________________________ Score:______________
Section:______________________________________ Date: ______________
Exercise 2.5
1. It is claimed that a certain drug causes a successful cure in 70% of all
cases. At a 0.01 level of significance, would you agree with this claim if, in a
sample of 800 cases, 550 were cured?
2. At a certain college it is estimated that at most 50% of the students ride
motorcycles to class. Does this seem to be a valid estimate if, in a random
sample of 100 college students, 40 are found to ride motorcycles to class? Use
a 0.01 level of significance.
3. A telephone company claims that two-thirds of the homes in a certain city
have landline telephones. Do we have reason to doubt this claim if, in a
random sample of 1000 homes in this city, it is found that 878 have
telephones? Use a 0.05 level of significance.
4. Suppose that, in the past, 40% of all adults favored the death penalty. Do we
have reason to believe that the proportion of adults favoring the death penalty
today has increased if, in a random sample of 40 adults, 19 favor the death
penalty? Use a 0.02 level of significance.
Name:_______________________________________ Score:______________
Section:______________________________________ Date: ______________
Exercise 2.6
1. An urban community would like to show that the incidence of breast
cancer is higher there than in a nearby rural area. If it is found that 250 of
3000 adult women in the urban community have breast cancer and 105 of 1000
adult women in the rural community have breast cancer, can we conclude
that breast cancer is more prevalent in the urban community? Use a 0.01
level of significance.
2. A study was made to determine whether more Filipinos than Italians prefer
white champagne to pink champagne at weddings. Of the 1000 Filipinos
selected at random, 178 preferred white champagne, and of the 3000
Italians selected, 80 preferred white champagne. Can we conclude that a
higher proportion of Filipinos than Italians prefer white champagne at
weddings? Use a 0.02 level of significance.
Name:_______________________________________ Score:______________
Section:______________________________________ Date: ______________
Exercise 2.7
1. A soft-drink dispensing machine is said to be out of control if the variance
of the contents exceeds 1.2 deciliters. If a random sample of 200 drinks
from this machine has a variance of 1.69, does this indicate at the 0.01
level of significance that the machine is out of control?
2. A company claims that the variance of the sugar content of its ice cream is
equal to 50. A sample of 200 servings is selected, the sugar contents
are measured, and the sample variance is found to be 40. At α = 0.05, is
there sufficient evidence to believe the claim?
3. A medical researcher believes that the standard deviation of the

temperature of newborn babies is equal to 0.70. A sample of 50 infants was
found to have a standard deviation of 0.750. Assume that the variable is
normally distributed; does the evidence support the medical researcher’s
claim? Use a 0.01 level of significance.
4. A study is conducted to compare the length of time men and women take to
encode a certain phrase on their cellular phones. The data are
tabulated below:

         Men    Women
n         10      23
sd       1.2     3.7

Test the hypothesis that the two variances are equal against the alternative
hypothesis that the variance for men is greater than that for women. Use a
0.01 level of significance.
Name:_______________________________________ Score:______________
Section:______________________________________ Date: ______________
Exercise 2.8
1. A machine is supposed to mix peanuts, cashews, hazelnuts, and pecans in

the ratio 4: 3: 2: 1. A can containing 640 of these mixed nuts was found to
have 250 peanuts, 190 cashews, 160 hazelnuts, and 40 pecans. At a 0.05
level of significance, test the hypothesis that the machine is mixing the
nuts in the ratio 4: 3: 2: 1.
2. A supervisor at a certain cinema wants to determine if there is any
preference in the flavors of popcorn that were sold in a day. A random
sample of sales is selected, and the data are shown below. At α = 0.05, are
the flavors selected with equal frequency?

Flavor    Butter    Barbeque    Cheese    Plain
Sold        38         51         36        45
Name:_______________________________________ Score:______________
Section:______________________________________ Date: ______________
Exercise 2.9
1. The department head of Mathematics wanted to determine if there are
significant differences in the way his instructors handed out pass or fail
grades. He set a 0.05 level of significance. The data are shown below:

Instructor    PASS    FAIL
X              50      10
Y              61       3
Z              27       5
2. A sociology study compared 3 groups in their responses to 1 question:
“Are you happier now than you were 4 years ago?” Their responses are
tabulated below.

                              Response
Group                 More    Less    Same
Professional           59      29      19
Blue collar            16      12      20
Unskilled laborers     17      57       5

Is there a significant difference among their responses? Use a 0.01 level of
significance.
3. In a study of car accidents and drivers who use cellular phones, the
following sample data are obtained. At α = 0.01, test the claim that the
occurrence of accidents is independent of the use of cellular phones.

                           Had accident    Had no accident
Cellular phone user            100               300
Non-cellular phone user         15               400
4. The supermarket sells red and white eggs in sizes small, medium, large,
and extra large. The table shows the number of cartons sold for the
various sizes and colors during a one-month period.

                              Size
Type of egg    Small    Medium    Large    Extra Large
Red             7000     4077      5011       2080
White           4810     5500      8203       4700

Is egg color preference dependent on the size purchased? Test at the 0.01
level of significance.