06 Test Hypothesis

06_Test hypotheses
Definition:
A statistical hypothesis is an assumption made about some parameter, i.e. about a statistical measure of a population.
Not every hypothesis is a statistical hypothesis.
Non-statistical hypotheses are:
Mars is inhabited by human beings
Mr. A is the candidate who would be the best president for the Republic of Indonesia
Mt. Merapi is guarded by Mbah Jugo
None of these statements is an assumption about a parameter. A slight change in the wording of a statement can turn it into a statistical hypothesis, for example:
Mr. A is the candidate most favored for the presidency of the Republic of Indonesia
In order to establish the truth or falsity of a statistical hypothesis with complete certainty, it may be necessary to examine the entire population. Since this is frequently impossible or impractical, we are forced to take a sample from the population and use the sample in deciding whether the hypothesis is true or false.
Ho : The hypothesis being tested (read "H sub zero")
H1 : The alternative hypothesis
Example:
We wish to test the honesty of a coin. This is a statistical hypothesis, because it is equivalent to the assumption that, for the population of a very large number of tosses of this coin, the probability p of obtaining a head is 1/2. Suppose that in tossing this coin 100 times we obtain 60 heads, and from this information we must reach a decision regarding the honesty of the coin.
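Using the normal curve approximation to the binomial distribution, we find:
Hypothesis Ho: $p = \tfrac{1}{2}$; $q = \tfrac{1}{2}$
$n = 100$; $m = np = \tfrac{1}{2} \cdot 100 = 50$
$\sigma = \sqrt{npq} = \sqrt{100 \cdot \tfrac{1}{2} \cdot \tfrac{1}{2}} = \sqrt{25} = 5$
$z = \frac{59.5 - 50}{5} = \frac{9.5}{5} = 1.90$
$P = 0.5000 - 0.4713 = 0.0287$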
Therefore, if an honest coin is tossed 100 times, the probability of obtaining 60 or more heads is 0.0287 (= 2.87%), certainly not a high probability.
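The coin calculation can be checked numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` stands in for the printed normal table; the function name `upper_tail_probability` is illustrative and not part of the original material.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_probability(n, p, k):
    """Approximate P(at least k successes in n trials) with the normal curve.

    Uses the continuity correction: "k or more" starts at k - 0.5.
    """
    mean = n * p                      # m = np
    sd = sqrt(n * p * (1 - p))        # sigma = sqrt(npq)
    z = (k - 0.5 - mean) / sd
    return z, 1.0 - NormalDist().cdf(z)


z, prob = upper_tail_probability(n=100, p=0.5, k=60)
print(f"z = {z:.2f}, P = {prob:.4f}")  # z = 1.90, P = 0.0287
```

The result agrees with the table-based value 0.5000 - 0.4713 = 0.0287.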
[Figure: normal curve centered at m = 50 (z = 0). The shaded right tail beyond X = 59.5 (z = 1.90) represents 60 or more heads, P = 0.0287.]
From this result we are justified in drawing one of two conclusions:
1. The hypothesis is correct, but a rare event has occurred
2. The hypothesis is not correct
By definition:
In testing a statistical hypothesis, the hypothesis is declared to be true if the calculated probability exceeds a given value $\alpha$, called the significance level; it is considered false if the calculated probability is less than $\alpha$.
If the calculated probability is less than $\alpha$, indicating that the hypothesis is false, the result is termed significant.
Convention:
A result is significant if the calculated probability is less than $\alpha = 0.05$.
A result is highly significant if the calculated probability is less than $\alpha = 0.01$.
In the above example, since the result 0.0287 is less than 0.05, the result is significant (although not highly significant), and consequently, on the basis of the 5% level of significance ($\alpha = 0.05$), we reject the hypothesis that the coin is honest.
Why did we calculate the probability of obtaining 60 or more heads and not the probability of obtaining a deviation of 10 or more from the expected number of heads (i.e. 60 or more, or 40 or fewer, heads)?
The answer is:
Which of the two probabilities to calculate depends on what we wish to consider as the alternative to the hypothesis Ho: $p = \tfrac{1}{2}$. If we consider the alternative hypothesis, denoted by H1, to be $p > \tfrac{1}{2}$, then the appropriate probability to calculate is that of obtaining 60 or more heads, since, by taking $p > \tfrac{1}{2}$ as the alternative hypothesis, we have indicated that we do not wish to reject the hypothesis if fewer than 50 heads are obtained.
If, on the other hand, we consider the alternative hypothesis to be $p \neq \tfrac{1}{2}$, then the appropriate probability to calculate is that of obtaining 60 or more, or 40 or fewer, heads.
The latter probability is twice the former, i.e. 2(0.0287) = 0.0574, so in this case, using $\alpha = 0.05$ as the significance level, we do not consider this result significant.
By definition:
If the area in only one tail of a curve is used in testing a statistical hypothesis, we speak of a one-tail test; if the area in both tails is used, we refer to the test as two-tail.
One-tail test: the alternative hypothesis is clearly that the parameter lies on one particular side of (is smaller than, or is larger than) the value stated in the hypothesis.
Two-tail test: if the interest in the problem is whether the result obtained was different (that is, either lower or higher) from the expected one, the alternative hypothesis is that the parameter is different from the value stated in the hypothesis.
In case of doubt, a two-tail test is recommended.
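[Figure: two-tail test for $\alpha = 0.05$. Reject the hypothesis if z > 1.96 or z < -1.96; accept the hypothesis for z between -1.96 and 1.96.]
[Figure: one-tail test for $\alpha = 0.05$. Reject the hypothesis if z > 1.64 (or, when the alternative lies in the lower tail, if z < -1.64); accept the hypothesis otherwise.]
These boundary values of z (1.96 and 1.64) are called critical values.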
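The critical values can be recovered from the inverse of the normal cumulative distribution instead of a printed table. This is a minimal sketch, assuming Python's standard-library `statistics.NormalDist`; the helper name `critical_value` is hypothetical.

```python
from statistics import NormalDist


def critical_value(alpha, two_tail):
    """Return the positive critical z for significance level alpha.

    A two-tail test splits alpha equally between the two tails;
    a one-tail test puts all of alpha in a single tail.
    """
    tail_area = alpha / 2 if two_tail else alpha
    return NormalDist().inv_cdf(1.0 - tail_area)


print(round(critical_value(0.05, two_tail=True), 2))   # 1.96
print(round(critical_value(0.05, two_tail=False), 2))  # 1.64
```

The unrounded one-tail value is close to 1.645, which is why tables quote it as either 1.64 or 1.65.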
Example:
From the Mendelian theory, we know that crosses of peas should give
yellow and green peas in the ratio 3:1 . In a particular experiment, 179
yellow and 49 green peas are obtained. Is this a significant deviation from
the theory on the basis of the 5 % level of significance?
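Solution:
The probability of obtaining a yellow pea is $p = \tfrac{3}{4}$; with a total sample of 228, the expected number of yellow peas is $\tfrac{3}{4} \cdot 228 = 171$.
Hypothesis Ho: $p = \tfrac{3}{4}$; $q = \tfrac{1}{4}$; $n = 228$; $m = np = \tfrac{3}{4} \cdot 228 = 171$
$\sigma = \sqrt{npq} = \sqrt{228 \cdot \tfrac{3}{4} \cdot \tfrac{1}{4}} = \sqrt{42.75} = 6.54$
$z = \frac{178.5 - 171}{6.54} = \frac{7.5}{6.54} = 1.15$
$P = (0.5000 - 0.3749) \cdot 2 = (0.1251) \cdot 2 = 0.2502$
Since P > 0.05, the result does not represent a significant deviation from the theory.
The same conclusion also follows from the fact that 1.15 < 1.96, the critical value of z for a two-tail test.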
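The pea calculation follows the same pattern as the coin example, but with both tails. A minimal sketch under the same assumptions (normal approximation with continuity correction, `statistics.NormalDist` in place of the table); the function name `two_tail_binomial_test` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_tail_binomial_test(n, p, observed):
    """Two-tail normal-approximation test of an observed binomial count."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    # Continuity correction: pull the observed count half a unit toward the mean.
    deviation = max(abs(observed - mean) - 0.5, 0.0)
    z = deviation / sd
    prob = 2.0 * (1.0 - NormalDist().cdf(z))
    return z, prob


z, prob = two_tail_binomial_test(n=228, p=0.75, observed=179)
print(f"z = {z:.2f}, P = {prob:.2f}")
print("significant at 5%" if prob < 0.05 else "not significant at 5%")
```

Because the code does not round z to 1.15 before looking up the area, its probability differs from 0.2502 in the third decimal place; the conclusion is unchanged.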
It should be emphasized that the testing of statistical hypotheses does not constitute a mathematical proof of the truth or falsity of the hypotheses. Unfortunately, there is no absolute certainty that the conclusion reached will be correct. In fact, in testing statistical hypotheses, two types of incorrect conclusions are possible.
Hypotheses and conclusions reached from a sample

Sample indicates      Hypothesis stated is true    Hypothesis stated is false
Reject hypothesis     Type 1 error                 Correct conclusion
Accept hypothesis     Correct conclusion           Type 2 error

By definition
If it happens that the hypothesis being tested is actually true, and if from the
sample we reach the conclusion that it is false, we say that a type 1 error has
been committed.
If it happens that the hypothesis being tested is actually false, and if from the
sample we reach the conclusion that it is true, we say that a type 2 error has
been committed.
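To make the two error types concrete, the sketch below computes both probabilities exactly for the coin example. It rests on two assumptions that are not in the original text: the decision rule "reject Ho if 59 or more heads appear in 100 tosses" (which is what a one-tail test with critical value 1.64 amounts to), and a hypothetical biased coin with p = 0.6 as the true state when Ho is false.

```python
from math import comb


def prob_at_least(n, p, k):
    """Exact binomial probability of k or more successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n = 100
threshold = 59        # assumed rule: reject Ho when heads >= 59
p_if_false = 0.6      # hypothetical true p when Ho is false

# Type 1 error: Ho (p = 1/2) is true, yet the sample leads us to reject it.
type1 = prob_at_least(n, 0.5, threshold)

# Type 2 error: Ho is false (p = 0.6), yet the sample leads us to accept it.
type2 = 1.0 - prob_at_least(n, p_if_false, threshold)

print(f"P(type 1 error) = {type1:.3f}")
print(f"P(type 2 error) = {type2:.3f}")
```

The type 1 probability stays just below the 5% significance level, while the type 2 probability depends entirely on which alternative value of p is assumed.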
Example:
The mean breaking strength of a certain type of cord has been established from considerable experience at 18.3 kg, with a standard deviation of 1.2 kg. A new machine is purchased to manufacture this type of cord. A sample of 100 pieces obtained from the new machine shows a mean breaking strength of 17.0 kg. Would you say that this sample is inferior on the basis of the 1% level of significance?
Solution:
We will calculate the probability of obtaining, from a population with a mean of 18.3 and a standard deviation of 1.2, a sample of 100 pieces with a mean of 17.0 or less. If the probability turns out to be less than 1%, we reject the hypothesis; otherwise, we accept it.
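Hypothesis Ho: $m = 18.3$; $\sigma = 1.2$
$\bar{X} = 17.0$; $n = 100$
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{1.2}{\sqrt{100}} = \frac{1.2}{10} = 0.12$
$z = \frac{17.0 - 18.3}{0.12} = \frac{-1.3}{0.12} = -10.83$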
Since |z| = 10.83 is larger than any value of z listed in the statistical table, the corresponding tail area is much less than 0.01 and, consequently, the result is highly significant: we reject the hypothesis and conclude that the sample is inferior.
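The cord example can be reproduced as follows. This is a minimal sketch, assuming a known population standard deviation and Python's `statistics.NormalDist`; the function name `z_for_sample_mean` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_for_sample_mean(sample_mean, pop_mean, pop_sd, n):
    """z-score of a sample mean and the lower-tail probability of a mean this low."""
    standard_error = pop_sd / sqrt(n)          # sigma / sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    lower_tail = NormalDist().cdf(z)           # may underflow to 0.0 for extreme z
    return z, lower_tail


z, prob = z_for_sample_mean(sample_mean=17.0, pop_mean=18.3, pop_sd=1.2, n=100)
print(f"z = {z:.2f}")
print("reject Ho at the 1% level" if prob < 0.01 else "accept Ho at the 1% level")
```

For a z this far into the tail the computed probability is indistinguishable from zero in floating point, which matches the remark that the value lies beyond the range of the table.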
A frequently tested hypothesis in statistics is the assumption that the means of two populations are identical. Suppose we are given two samples of sizes n1 and n2 and that we have calculated the means of these samples ($\bar{X}$ and $\bar{Y}$, respectively). We then wish to test the hypothesis that the two populations from which these samples were selected have identical means.
This is determined by calculating the probability of selecting, from two populations with identical means, two samples whose means differ by $|\bar{X} - \bar{Y}|$ or more. A two-tail test is normally used in testing the hypothesis that two means are equal (or that the difference between the means is zero), although there are rare occasions when a one-tail test may be preferable.
Example:
The gasoline mileage for 50 cars of make A has a mean of 17.0 km per gallon with a standard deviation of 2.5 km per gallon, while that for 70 cars of make B has a mean of 18.6 km per gallon with a standard deviation of 3.0 km per gallon. If this is the only information available on the gasoline consumption of these two makes of cars, can we conclude at the 5% level that cars of make B consume less gasoline than those of make A?
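Solution:
Here we wish to test the statistical hypothesis that the means of two populations (distance traveled per gallon of gasoline for makes A and B, denoted by X and Y, respectively) are identical, or Ho: $m_x = m_y$. The alternative hypothesis H1 is $m_x < m_y$.
Hypothesis Ho: $m_x = m_y$, or $m_x - m_y = 0$
Alternative hypothesis H1: $m_x < m_y$
$\bar{X} = 17.0$; $\bar{Y} = 18.6$; $\sigma_x = s_x = 2.5$; $\sigma_y = s_y = 3.0$; $n_1 = 50$; $n_2 = 70$
$\sigma_{\bar{X}} = \frac{\sigma_x}{\sqrt{n_1}} = \frac{2.5}{\sqrt{50}}$, $\sigma_{\bar{Y}} = \frac{\sigma_y}{\sqrt{n_2}} = \frac{3.0}{\sqrt{70}}$
$\sigma_{\bar{X} - \bar{Y}} = \sqrt{\sigma_{\bar{X}}^2 + \sigma_{\bar{Y}}^2} = \sqrt{\frac{2.5^2}{50} + \frac{3.0^2}{70}} = \sqrt{\frac{6.25}{50} + \frac{9.00}{70}} = \sqrt{0.1250 + 0.1286} = \sqrt{0.2536} = 0.50$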
Then:
$z = \frac{(\bar{X} - \bar{Y}) - (m_x - m_y)}{\sigma_{\bar{X} - \bar{Y}}} = \frac{(17.0 - 18.6) - 0}{0.50} = \frac{-1.60}{0.50} = -3.20$
$P = 0.5000 - 0.4993 = 0.0007$

Since the probability P is less than 0.05, the result is significant. This means that if the populations actually have identical means, the probability of obtaining a difference of -1.60 or less between the two sample means is very small (P = 0.0007). This is an excellent indication that cars of make B consume less gasoline than those of make A, or equivalently that the latter have a higher gasoline consumption.
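The two-sample comparison can be sketched in a few lines. The code assumes, as the solution does, that the sample standard deviations stand in for the population values, and it uses `statistics.NormalDist` in place of the table; the function name `two_sample_z` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean_x, sd_x, n_x, mean_y, sd_y, n_y):
    """z for the difference of two sample means under Ho: m_x = m_y,
    with the lower-tail probability for the alternative m_x < m_y."""
    se_diff = sqrt(sd_x**2 / n_x + sd_y**2 / n_y)
    z = (mean_x - mean_y) / se_diff
    return z, NormalDist().cdf(z)


z, prob = two_sample_z(17.0, 2.5, 50, 18.6, 3.0, 70)
print(f"z = {z:.2f}, P = {prob:.4f}")
print("significant at 5%" if prob < 0.05 else "not significant at 5%")
```

Without rounding the standard error to 0.50, z comes out slightly smaller in magnitude than -3.20, but the probability still rounds to 0.0007.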
By definition:
The hypothesis being tested is also called the null hypothesis.
Confidence limits

By definition:
The limits which will contain a parameter with a probability of 95% (or some other given per cent) are called the 95% (or that other per cent) confidence limits for the parameter.
The interval between the confidence limits is called the confidence interval.
Illustration
Suppose a random sample of 100 variates is taken from a normal population
and is found to have a mean of 40 and standard deviation of 11. It will not be
possible to determine precisely the mean of the population, since the sample
of 100 variates with mean 40 could have been taken from any population with
mean at or near 40. Consequently, the best we can hope for is to establish
limits within which the mean of the population will fall with a specified
probability or confidence (usually taken as 95%)
We shall assume that (since the sample is large) the standard deviation of the
population can be well approximated by the standard deviation of the sample,
then we find:
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{11}{10} = 1.10$
Therefore, 95% of the sample means lie between the values of $\bar{X}$ obtained by solving for $\bar{X}$ the two equations
$\frac{\bar{X} - m}{1.10} = \pm 1.96$
The value 1.96 is obtained from the statistical table as the value of z corresponding to an area of $\tfrac{1}{2}(0.95) = 0.475$. The solutions are:
$\bar{X}_1 = m - 2.16$
and
$\bar{X}_2 = m + 2.16$
However, since in the illustration we are given $\bar{X}$ and wish to determine m, we must solve the above equations for m, which gives
$m_1 = \bar{X} - 2.16 = 40 - 2.16 = 37.84$
and
$m_2 = \bar{X} + 2.16 = 40 + 2.16 = 42.16$
These values, m1 and m2, constitute the lower and upper 95% confidence limits for the mean of the population. Frequently they are written more briefly as
$m = 40 \pm 2.16$
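The confidence limits above can be computed directly. This is a minimal sketch, assuming a large sample so that the sample standard deviation approximates the population value, and using `statistics.NormalDist` for the z value; the function name `confidence_limits` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def confidence_limits(sample_mean, sample_sd, n, confidence):
    """Lower and upper confidence limits for a population mean (large sample)."""
    standard_error = sample_sd / sqrt(n)
    # z that leaves (1 - confidence) / 2 in each tail, e.g. 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


lower, upper = confidence_limits(sample_mean=40, sample_sd=11, n=100, confidence=0.95)
print(f"{lower:.2f} to {upper:.2f}")  # 37.84 to 42.16
```

Passing a different `confidence` argument gives the limits for other levels without a new table lookup.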
[Figure: 95% confidence interval for m, extending from $\bar{X} - 2.16$ to $\bar{X} + 2.16$ around $\bar{X} = 40$.]
Example:
A sample of 70 variates has a mean of 65 with standard deviation of 4.2. Find
the 98% confidence limits for the mean of the population.
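Solution:
Here we are given $\bar{X} = 65$, $s = 4.2$, $n = 70$.
Assuming $\sigma \approx s$, we find:
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{4.2}{\sqrt{70}} = \frac{4.2}{8.37} = 0.50$
From the statistical table, the value of z corresponding to an area of $\tfrac{1}{2}(0.98) = 0.49$ is 2.33; thus:
$\frac{\bar{X} - m}{0.50} = \pm 2.33$
Therefore:
$\bar{X} - m = \pm 1.16$
and
$m = \bar{X} \pm 1.16 = 65 \pm 1.16$
which are the 98% confidence limits for the mean of the population.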
Determining the Size of the Sample in a Survey

A poll is to be conducted to determine the percentage of the population favoring candidate A in an election in which there are two candidates. Determine the size of the sample which should be polled in order to obtain the percentage in the population within 3% at the 95% confidence limits. A quick survey conducted on a small sample leads to the belief that the percentage of voters favoring candidate A is 43%.
Solution: From the results of the survey sample we will assume that the percentage of voters in the population favoring candidate A is 43%. We are interested in determining the size of the sample such that, with a probability of 0.95, the percentage in the sample does not differ by more than 3% from that of the population.
Then we have:
$p = 0.43$, $q = 0.57$, $m = np = 0.43n$
$\sigma = \sqrt{npq} = \sqrt{n(0.43)(0.57)} = \sqrt{0.2451\,n} = 0.495\sqrt{n}$
Since we wish the number of people favoring candidate A to differ from that in the population by at most 0.03n with a probability of 0.95, we have, by using the normal curve approximation to the binomial distribution,
$\frac{0.03n}{0.495\sqrt{n}} = 1.96$
or
$\sqrt{n} = \frac{1.96(0.495)}{0.03} = 32.3$
or
$n = (32.3)^2 = 1043.29 \approx 1044$
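The sample-size formula can be written as a small function. This is a minimal sketch: it solves the same equation for n, takes z from `statistics.NormalDist` instead of the table, and rounds up to a whole number of respondents; the function name `sample_size` is illustrative.

```python
from math import ceil
from statistics import NormalDist


def sample_size(p, margin, confidence):
    """Smallest n for which a sample proportion lies within `margin` of p
    with the given confidence (normal approximation to the binomial)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # From margin * n = z * sqrt(n * p * q), i.e. n = z^2 * p * q / margin^2.
    return ceil(z**2 * p * (1 - p) / margin**2)


print(sample_size(p=0.43, margin=0.03, confidence=0.95))
```

Carried at full precision, the formula gives about 1047 rather than 1044. The small gap comes from the hand calculation rounding the intermediate values 0.495 and 32.3.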