Hypothesis Testing Guide

Chapter Five
HYPOTHESIS TESTING
In the previous section we saw the difference between a probability distribution and a frequency distribution.
Generally speaking, a probability distribution refers to the underlying, usually unknown, distribution of the population; it can also be described as the true underlying distribution of the random variable.
A frequency distribution refers to the observed distribution of the sample.
When the number of observations is large, the observed relative frequency distribution will tend to look like the true probability distribution.
Statistical Inference
Consider the probability distribution of a six-faced die. In simulated tosses of an unbiased die, 60 and 6000 times, the following frequency distributions were observed.
Value on face            1      2      3      4      5      6
No. of tosses = 60
  Frequency              9     13     11      3     15      9
  Relative frequency 0.150  0.217  0.183  0.050  0.250  0.150
No. of tosses = 6000
  Frequency            964   1045    994    983   1021    993
  Relative frequency 0.161  0.174  0.166  0.164  0.170  0.166
Notice that with 6000 tosses the observed relative frequencies are very close to the true probability of each face (1/6 ≈ 0.167), while with 60 tosses they differ considerably from it.
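As a quick check on the table, the following Python sketch recomputes the relative frequencies from the observed counts and reports how far the largest one strays from the fair-die probability 1/6. It is illustrative only; the helper names `relative_frequencies` and `max_deviation_from_fair` are our own.

```python
# Observed face counts (faces 1..6) copied from the die table.
counts_60 = [9, 13, 11, 3, 15, 9]
counts_6000 = [964, 1045, 994, 983, 1021, 993]


def relative_frequencies(counts):
    """Divide each face count by the total number of tosses."""
    total = sum(counts)
    return [c / total for c in counts]


def max_deviation_from_fair(counts):
    """Largest absolute gap between an observed relative frequency and 1/6."""
    return max(abs(f - 1 / 6) for f in relative_frequencies(counts))


for label, counts in (("60 tosses", counts_60), ("6000 tosses", counts_6000)):
    rel = [round(f, 3) for f in relative_frequencies(counts)]
    print(label, rel, "max deviation:", round(max_deviation_from_fair(counts), 4))
```

The largest deviation shrinks from roughly 0.12 with 60 tosses to under 0.01 with 6000 tosses, which is the pattern the text describes.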
In practice, we do not know the true population
probability distribution, but wish to infer, from our
sample, something about it.
Statistical inference is the process of using
samples to make inferences (formal methods for
drawing conclusions) about a population.
Statistical inference includes estimation (point and interval) and hypothesis testing.
Hypothesis Testing
The formal process of hypothesis testing provides us with a means of answering research questions.
A hypothesis is a testable statement that describes the nature of the proposed relationship between two or more variables of interest.
The purpose of the study is to collect data that will allow the researcher to test the hypothesis.
Idea of hypothesis testing
Null and Alternative hypotheses

The null hypothesis (H0) is a statement about the value of a population parameter. Typically, the null hypothesis postulates that there is no difference (no association) between a factor and an outcome, or that there is no intervention effect.
The alternative hypothesis (HA) states the opposing view: that there is a difference between the factor and the outcome, or that there is an intervention effect.
Hypotheses are often stated in a null form so as to allow them to be refuted.
E.g. "not all swans are white" (H0) as opposed to "all swans are white" (HA). It is difficult to prove the latter, as one would supposedly have to see all the swans in the world.
By taking a sample of swans, however, we can reject or accept H0.
Steps in hypothesis testing

1. Identify the null hypothesis H0 and the alternative hypothesis HA.
2. Choose the level of significance α. The value should be small, usually less than 10% (5% is the conventional choice). It is important to consider the consequences of both types of errors.
3. Select the test statistic and determine its value from the sample data. This value is called the observed value of the test statistic. Remember that a t statistic is usually appropriate for small samples when the population standard deviation is unknown; for large samples (or when σ is known and the data are normally distributed), a z statistic works well.
4. Compare the observed value of the statistic to the critical value obtained for the chosen α.
5. Make a decision.
If the test statistic falls in the critical region: reject H0 in favor of HA.
If the test statistic does not fall in the critical region: conclude that there is not enough evidence to reject H0.
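Types of tests
1. Two-tailed test
H0: μ = μ0 versus HA: μ ≠ μ0
$$z_{cal} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}, \qquad z_{tab} = z_{\alpha/2}$$
Decision: if $|z_{cal}| > z_{tab}$, reject H0; if $|z_{cal}| \le z_{tab}$, do not reject H0.
2. One-tailed test (right tail)
H0: μ = μ0 (μ ≤ μ0) versus HA: μ > μ0
$$z_{cal} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}, \qquad z_{tab} = z_{\alpha}$$
Decision: if $z_{cal} > z_{tab}$, reject H0; if $z_{cal} \le z_{tab}$, do not reject H0.
3. One-tailed test (left tail)
H0: μ = μ0 (μ ≥ μ0) versus HA: μ < μ0
Decision: if $z_{cal} < -z_{tab}$, reject H0; if $z_{cal} \ge -z_{tab}$, do not reject H0.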
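The three decision rules can be sketched in Python as follows. This is a minimal illustration, assuming σ is known; it uses the standard library's `statistics.NormalDist` (Python 3.8+) in place of a printed z table, and the function names and the labels `"two-sided"`, `"greater"`, `"less"` are our own choices.

```python
from statistics import NormalDist


def one_sample_z(xbar, mu0, sigma, n):
    """z_cal = (xbar - mu0) / (sigma / sqrt(n)); sigma is assumed known."""
    return (xbar - mu0) / (sigma / n ** 0.5)


def z_test_decision(z_cal, alpha=0.05, alternative="two-sided"):
    """Apply the critical-value rules for the three types of tests.

    alternative: "two-sided" (HA: mu != mu0), "greater" (HA: mu > mu0)
    or "less" (HA: mu < mu0).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "two-sided":
        z_tab = std_normal.inv_cdf(1 - alpha / 2)  # z_{alpha/2}
        reject = abs(z_cal) > z_tab
    elif alternative == "greater":
        z_tab = std_normal.inv_cdf(1 - alpha)  # z_alpha
        reject = z_cal > z_tab
    elif alternative == "less":
        z_tab = std_normal.inv_cdf(1 - alpha)  # compare against -z_alpha
        reject = z_cal < -z_tab
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return "reject H0" if reject else "do not reject H0"
```

For example, a value of z_cal = 1.8 is not significant two-tailed at α = 0.05 (critical value about 1.96) but is significant one-tailed with HA: μ > μ0 (critical value about 1.645).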
Types of errors
There are two types of errors:

Decision       H0 true                     H0 false
Reject H0      Type I error (α)            Correct decision (1 − β)
Accept H0      Correct decision (1 − α)    Type II error (β)

A Type I error is usually regarded as the more serious error; its probability, α, is the level of significance.
The power of a test is the probability of rejecting a false null hypothesis; it is given by 1 − β.
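To make α, β and power concrete, here is a small Python sketch for a two-tailed one-sample z test. It assumes σ is known and that the true mean takes some hypothetical value `mu_true`; under that assumption z_cal is normal with mean (mu_true − μ0)/(σ/√n) and variance 1, so the power is the probability that |z_cal| exceeds z_{α/2}. The function names are illustrative only.

```python
from statistics import NormalDist


def z_test_power(mu_true, mu0, sigma, n, alpha=0.05):
    """Power (1 - beta) of a two-tailed one-sample z test with known sigma."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    shift = (mu_true - mu0) / (sigma / n ** 0.5)  # mean of z_cal when mu = mu_true
    # P(z_cal > z_crit) + P(z_cal < -z_crit) with z_cal ~ N(shift, 1)
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


def type_ii_error(mu_true, mu0, sigma, n, alpha=0.05):
    """beta = probability of accepting H0 when the true mean is mu_true."""
    return 1 - z_test_power(mu_true, mu0, sigma, n, alpha)
```

When `mu_true` equals μ0 the "power" reduces to α itself (the Type I error rate), and it grows toward 1 as the true mean moves away from μ0.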
The judiciary vs. the statistician
In law:
1. Premise (the law): H0: Not guilty; HA: Guilty
2. Evidence
3. Arguments based on the law

Conclusion    Truth: Not guilty       Truth: Guilty
Accept H0     OK                      Error (Type II, β)
Reject H0     Error (Type I, α)       OK
In statistics:
1. Hypothesis: H0: μ = μ0, P = P0; HA: these are different
2. Data
3. Test statistics

Conclusion    Truth: H0 true          Truth: H0 false
Accept H0     OK                      Error (Type II, β)
Reject H0     Error (Type I, α)       OK
Test Statistics
In hypothesis testing we always start with statements for H0 and HA.
Because of random variation, even an unbiased
sample may not accurately represent the
population as a whole.
As a result, it is possible that any observed
differences or associations may have occurred
by chance.
Statistical testing of a research hypothesis
allows the researcher to quantify the risk or error
involved in making inferences about a
population based on the information obtained
from a sample.
A test statistic is a value that we compare with the known distribution of what we expect when the null hypothesis is true.
The general formula of the test statistic is:
$$\text{Test statistic} = \frac{\text{Observed value} - \text{Hypothesized value}}{\text{Standard error}}$$
The P-Value
In most applications, the outcome of performing a hypothesis test is to produce a p-value.
The p-value is the probability, assuming the null hypothesis is true, of obtaining a difference at least as large as the one observed purely by chance.
A small p-value implies that the probability of the observed value occurring just by chance is low when the null hypothesis is true.
That is, a small p-value suggests that there might be sufficient evidence for rejecting the null hypothesis.
But for what values of the p-value should we reject the null hypothesis?
By convention, a p-value of 0.05 or smaller is
considered sufficient evidence for rejecting the
null hypothesis.
By using p-value of 0.05, we are allowing a
5% chance of wrongly rejecting the null
hypothesis when it is in fact true.
When the p-value is less than or equal to 0.05, we often say that the result is statistically significant.
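Instead of reading a table, the p-value of a z statistic can be computed directly. This is a minimal sketch, assuming the test statistic follows the standard normal distribution under H0; `p_value_from_z` and `is_significant` are hypothetical helper names, and the 0.05 default simply mirrors the convention described above.

```python
from statistics import NormalDist


def p_value_from_z(z_cal, alternative="two-sided"):
    """p-value of a z statistic, computed under H0 (standard normal)."""
    std_normal = NormalDist()
    if alternative == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z_cal)))
    if alternative == "greater":  # HA: parameter > hypothesized value
        return 1 - std_normal.cdf(z_cal)
    if alternative == "less":  # HA: parameter < hypothesized value
        return std_normal.cdf(z_cal)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


def is_significant(p_value, alpha=0.05):
    """Conventional rule: significant when p <= alpha."""
    return p_value <= alpha
```

A two-sided z of about 1.96 gives a p-value of about 0.05, which is why 1.96 appears as the critical value for α = 0.05.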
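EXAMPLE
A researcher reports that the mean IQ of a sample of 16 students is 110, while the mean IQ of the whole population is 100 with a standard deviation of 10. Test the hypothesis that the students' mean IQ equals the population mean.
Solution
1. H0: μ = 100 versus HA: μ ≠ 100
2. Assume α = 0.05.
3. $z_{cal} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{110 - 100}{10/\sqrt{16}} = \frac{(110 - 100) \times 4}{10} = 4$
4. The critical value is $z_{0.025} = 1.96$.
5. Since 4 > 1.96, the test statistic falls in the critical region.
Conclusion: reject the null hypothesis.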
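The IQ example can be reproduced in a few lines of Python as a sketch. Like the example, it treats σ = 10 as known; it uses `statistics.NormalDist` (Python 3.8+) for the critical value and p-value, so z_tab comes out as about 1.96 rather than the rounded table value.

```python
from math import sqrt
from statistics import NormalDist

# Data from the example; sigma is treated as known.
n, xbar, mu0, sigma, alpha = 16, 110, 100, 10, 0.05

z_cal = (xbar - mu0) / (sigma / sqrt(n))  # (110 - 100) / (10 / 4) = 4.0
z_tab = NormalDist().inv_cdf(1 - alpha / 2)  # z_{0.025}, about 1.96
p_value = 2 * (1 - NormalDist().cdf(abs(z_cal)))  # two-tailed p-value
decision = "reject H0" if abs(z_cal) > z_tab else "do not reject H0"
print(f"z_cal = {z_cal:.2f}, z_tab = {z_tab:.2f}, p = {p_value:.5f}: {decision}")
```

The p-value is far below 0.05, which agrees with the critical-value conclusion.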
Hypothesis testing for proportions
Example
In a study of childhood abuse in psychiatric patients, Brown found that 166 in a sample of 947 patients reported histories of physical or sexual abuse.
a. Construct a 95% confidence interval for the true proportion.
b. Test the hypothesis that the true population proportion is 30%.
Solution (a)
The sample proportion is $\hat{p} = 166/947 = 0.175$. The 95% CI for P is given by
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.175 \pm 1.96\sqrt{\frac{0.175 \times 0.825}{947}} = 0.175 \pm 1.96 \times 0.0124 = [0.151,\ 0.199]$$
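The interval above can be computed with a short Python sketch of the normal-approximation formula. `proportion_ci` is a hypothetical helper name; because it uses the unrounded p̂ = 166/947 ≈ 0.1753 rather than 0.175, its upper limit (about 0.1995) differs slightly from the hand calculation.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation CI: p_hat +/- z_{alpha/2} * sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # 1.96 for 95%
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


low, high = proportion_ci(166, 947)
print(f"95% CI: [{low:.3f}, {high:.3f}]")
```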
Solution (b)
To test the hypothesis, we follow the steps:
Step 1: State the hypotheses
H0: P = P0 = 0.3
HA: P ≠ 0.3
Step 2: Fix the level of significance (α = 0.05)
Step 3: Compute the calculated and tabulated values of the test statistic
$$z_{cal} = \frac{\hat{p} - P_0}{\sqrt{\frac{P_0(1-P_0)}{n}}} = \frac{0.175 - 0.3}{\sqrt{\frac{0.3 \times 0.7}{947}}} = \frac{-0.125}{0.0149} = -8.39$$
$$z_{tab} = z_{\alpha/2} = z_{0.025} = 1.96$$
Step 4: Compare the calculated and tabulated values of the test statistic
Since |z_cal| = 8.39 is larger than the tabulated value 1.96, we reject the null hypothesis.
Step 5: Conclusion
Hence we conclude that the proportion of childhood abuse in psychiatric patients is different from 0.3.
If the sample size is small (np < 5 or n(1 − p) < 5), the normal approximation is unreliable, and an exact test based on the binomial distribution should be used instead of the z test.
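Steps 1 to 5 for a single proportion can be bundled into one Python sketch. It follows the worked example in using P0 in the standard error and applies the large-sample condition from the text before computing anything; `one_proportion_z_test` is a hypothetical name. With the unrounded p̂ = 166/947 the statistic is about −8.37 rather than −8.39, which does not change the decision.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0, alpha=0.05):
    """Two-tailed large-sample z test of H0: P = p0.

    The standard error uses p0 (the value of P under H0).
    Returns (z_cal, z_tab, reject_h0).
    """
    # Large-sample condition from the text: n*p0 and n*(1 - p0) at least 5.
    if n * p0 < 5 or n * (1 - p0) < 5:
        raise ValueError("sample too small for the normal approximation")
    p_hat = successes / n
    z_cal = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    z_tab = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    return z_cal, z_tab, abs(z_cal) > z_tab


z_cal, z_tab, reject = one_proportion_z_test(166, 947, 0.3)
print(f"z_cal = {z_cal:.2f}, z_tab = {z_tab:.2f}, reject H0: {reject}")
```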
Two-sample means and proportions
So far we have seen estimation for a single mean and a single proportion. It is also possible to compute point and interval estimates for the difference between two population means.
Let x1, x2, …, xn1 be a sample from the first population and y1, y2, …, yn2 be a sample from the second population, with sample means $\bar{X}$ and $\bar{Y}$, respectively.
Then the point estimate for the difference of means (μ1 − μ2) is given by $(\bar{X} - \bar{Y})$.
Two-sample estimation
Confidence interval estimation
A (1 − α)100% confidence interval for the difference of means is given by
$$(\bar{x} - \bar{y}) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
If σ1 and σ2 are unknown, they can be estimated by s1 and s2.
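A Python sketch of this interval follows. `diff_means_ci` is a hypothetical helper; `sd1` and `sd2` are the population σs when known, or the sample s1 and s2 substituted as the text allows. As an illustration it is applied to the serum uric acid data from the example in this chapter.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_ci(xbar, ybar, sd1, sd2, n1, n2, confidence=0.95):
    """(xbar - ybar) +/- z_{alpha/2} * sqrt(sd1^2 / n1 + sd2^2 / n2)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    diff = xbar - ybar
    return diff - z * se, diff + z * se


# Uric acid data: means 4.5 and 3.4, SDs 2.9 and 3.5, sample sizes 12 and 15.
low, high = diff_means_ci(4.5, 3.4, 2.9, 3.5, 12, 15)
print(f"95% CI for mu1 - mu2: [{low:.2f}, {high:.2f}]")
```

Because this interval contains 0, it points to the same conclusion as a two-tailed test at α = 0.05.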
Hypothesis testing for two sample means
The steps to test the hypothesis for a difference of means are the same as for a single mean.
Step 1: State the hypotheses
H0: μ1 − μ2 = 0
versus
HA: μ1 − μ2 ≠ 0, HA: μ1 − μ2 < 0, or HA: μ1 − μ2 > 0
Step 2: Significance level (α)
Step 3: Test statistic
$$z_{cal} = \frac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
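Step 4: Decision
$z_{tab} = z_{\alpha/2}$ for a two-tailed test; $z_{tab} = z_{\alpha}$ for a one-tailed test.
For HA: μ1 − μ2 ≠ 0: if $|z_{cal}| > z_{tab}$, reject H0; if $|z_{cal}| \le z_{tab}$, do not reject H0.
For HA: μ1 − μ2 > 0: if $z_{cal} > z_{tab}$, reject H0; otherwise do not reject H0.
For HA: μ1 − μ2 < 0: if $z_{cal} < -z_{tab}$, reject H0; otherwise do not reject H0.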
Small sample size and unknown population variances
The test statistic will be Student's t statistic with n1 + n2 − 2 degrees of freedom (this assumes equal population variances, estimated by the pooled sample variance).
Hence the tabulated value of t is read from the t table.
The decision rules remain the same.
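The pooled t statistic behind the n1 + n2 − 2 degrees of freedom can be sketched in Python as below. It assumes equal population variances, and `pooled_t` is a hypothetical name. Python's standard library has no t distribution, so the sketch returns only t and df; the critical value still comes from a t table, exactly as the text says.

```python
from math import sqrt


def pooled_t(xbar, ybar, s1, s2, n1, n2):
    """Pooled two-sample t statistic for H0: mu1 - mu2 = 0, and its df.

    Assumes equal population variances.
    """
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df  # pooled variance
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (xbar - ybar) / se, df


# Illustration with the uric acid data from the example in this chapter.
t_cal, df = pooled_t(4.5, 3.4, 2.9, 3.5, 12, 15)
print(f"t = {t_cal:.3f} on {df} degrees of freedom")
```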
Example
Researchers wish to know whether the data they have collected provide sufficient evidence to indicate a difference in mean serum uric acid levels between normal individuals and individuals with Down syndrome. The data consist of serum uric acid readings on 12 individuals with Down syndrome and 15 normal individuals. The means are 4.5 mg/100 ml and 3.4 mg/100 ml, with standard deviations of 2.9 and 3.5 mg/100 ml, respectively.
H0: μ1 − μ2 = 0 versus HA: μ1 − μ2 ≠ 0
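SOLUTION
$$z_{cal} = \frac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{(4.5 - 3.4) - 0}{\sqrt{\frac{2.9^2}{12} + \frac{3.5^2}{15}}} = \frac{1.1}{\sqrt{1.5175}} = \frac{1.1}{1.23} \approx 0.89$$
$$z_{tab} = z_{\alpha/2} = z_{0.025} = 1.96$$
Since |z_cal| = 0.89 < 1.96, we do not reject H0: the data do not provide sufficient evidence of a difference in mean serum uric acid levels.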
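The uric acid calculation can be checked with a short Python sketch that mirrors the z statistic used in the solution, with s1 and s2 standing in for the unknown σs. The variable names are our own; the critical value comes from `statistics.NormalDist` rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist

# Group 1: Down syndrome (n1 = 12); group 2: normal individuals (n2 = 15).
xbar, ybar = 4.5, 3.4
s1, s2 = 2.9, 3.5
n1, n2 = 12, 15

se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # sqrt(0.7008 + 0.8167), about 1.23
z_cal = ((xbar - ybar) - 0) / se  # H0: mu1 - mu2 = 0
z_tab = NormalDist().inv_cdf(0.975)  # z_{0.025}, about 1.96
decision = "reject H0" if abs(z_cal) > z_tab else "do not reject H0"
print(f"z_cal = {z_cal:.2f}, z_tab = {z_tab:.2f}: {decision}")
```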
Estimation and hypothesis testing for two population proportions
Let n1 and n2 be the sample sizes from the two populations. If x and y are the numbers of outcomes of interest, then the point estimates for the two populations are p1 = x/n1 and p2 = y/n2, respectively.
The point estimate of P1 − P2 is p1 − p2.
If the sample sizes are large (n1p1 > 5, n1(1 − p1) > 5, n2p2 > 5 and n2(1 − p2) > 5), then the interval estimate for the difference of proportions is given by
$$(p_1 - p_2) \pm z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$
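A Python sketch of this interval, including the large-sample check, is shown below. `diff_proportions_ci` is a hypothetical helper, and the counts in the usage line (45 of 150 versus 30 of 150) are made up purely for illustration.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportions_ci(x, n1, y, n2, confidence=0.95):
    """(p1 - p2) +/- z_{alpha/2} * sqrt(p1(1-p1)/n1 + p2(1-p2)/n2)."""
    p1, p2 = x / n1, y / n2
    # Large-sample conditions from the text: each n*p and n*(1 - p) must exceed 5.
    for n, p in ((n1, p1), (n2, p2)):
        if n * p <= 5 or n * (1 - p) <= 5:
            raise ValueError("sample too small for the normal approximation")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - z * se, (p1 - p2) + z * se


# Hypothetical counts, for illustration only.
low, high = diff_proportions_ci(45, 150, 30, 150)
print(f"95% CI for P1 - P2: [{low:.3f}, {high:.3f}]")
```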
Hypothesis testing for two proportions
To test the hypothesis
H0: P1 − P2 = 0
versus
HA: P1 − P2 ≠ 0
the test statistic is given by
$$z_{cal} = \frac{(p_1 - p_2) - (P_1 - P_2)}{\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}}$$
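The two-proportion test statistic can be sketched in Python as follows, using the unpooled standard error exactly as written above (many texts instead pool p1 and p2 under H0; this sketch does not). `two_proportion_z` is a hypothetical name and the counts are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x, n1, y, n2):
    """z statistic for H0: P1 - P2 = 0 with the unpooled standard error."""
    p1, p2 = x / n1, y / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se


# Hypothetical counts, for illustration only.
z_cal = two_proportion_z(45, 150, 30, 150)
p_value = 2 * (1 - NormalDist().cdf(abs(z_cal)))  # two-tailed
print(f"z_cal = {z_cal:.2f}, p = {p_value:.3f}")
```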
Small sample size
If the sample sizes are small (n1p1 < 5 or n2p2 < 5), the normal approximation is unreliable, and an exact method such as Fisher's exact test should be used at the given level of significance.
Test of significance using the chi-square
The chi-square (χ²) distribution is a probability distribution.
The chi-square test is useful for making statistical inferences about categorical data with two or more categories.
Definition: The chi-square statistic measures the discrepancy between k observed frequencies O1, O2, …, Ok and the corresponding expected frequencies e1, e2, …, ek:
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - e_i)^2}{e_i}$$
The sampling distribution of the chi-square statistic is known as the chi-square distribution.
As with the t distribution, there is a different χ² distribution for each value of the degrees of freedom.
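To see the formula at work, the Python sketch below applies it to the 60 die tosses from the start of the chapter, testing whether the die is fair. This is a goodness-of-fit use of the statistic; it assumes the usual rule df = k − 1 for k categories when no parameters are estimated. `chi_square_statistic` is our own helper name.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - e_i)^2 / e_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# 60 tosses of a die, faces 1..6, from the frequency table.
observed = [9, 13, 11, 3, 15, 9]
expected = [60 / 6] * 6  # 10 per face if the die is fair
chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1  # k - 1 = 5
print(f"chi-square = {chi2:.2f} on {df} df")
```

The result, about 8.6 on 5 df, is below the tabulated value χ²0.05,5 = 11.07, so this sketch would not reject fairness at the 5% level.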
Characteristics of the chi-square distribution
1. Every χ² distribution extends indefinitely to the right from 0.
2. Every χ² distribution has only one (right) tail.
3. As df increases, the χ² curves become more bell shaped and approach the normal curve in appearance (but remember that a chi-square curve starts at 0, not at −∞).
If the value of χ² is zero, there is perfect agreement between the observed and the expected frequencies. The greater the discrepancy between the observed and expected frequencies, the larger the value of χ².
To test the significance of χ², the calculated value of χ² is compared with the tabulated value for the given df at a certain level of significance; if the calculated value exceeds the tabulated value, H0 is rejected.
Example:
The following table shows the relation between the number of accidents in 1 year and the age of the driver in a random sample of 500 drivers between 18 and 50. Test, at the 0.01 level of significance, the hypothesis that the number of accidents is independent of the driver's age.
There are 75 drivers between 18 and 25 who had no accidents, 115 between 26 and 40 with no accidents, and so on. Such a table is called a contingency table, and each box containing a frequency is called a cell. This is a 3 × 3 contingency table.
[Table: Observed frequencies]
[Table: Expected frequencies; each expected frequency is (row total × column total) / grand total]
Hypothesis: H0: There is no relation between the age of the driver and the number of accidents.
HA: The variables are dependent (related).
The degrees of freedom (df) in a contingency table with R rows and C columns are:
$df = (R - 1)(C - 1) = (3 - 1)(3 - 1) = 4$. Hence, $\chi^2_{tab} = \chi^2_{0.01,\,4} = 13.28$.
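The expected-frequency rule and the df formula can be sketched in Python for any R × C table. The helper names are our own, and since the observed accident table is not reproduced here, the usage line runs on a small made-up 2 × 2 table instead.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an R x C table."""
    expected = expected_counts(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)  # (R - 1)(C - 1)
    return chi2, df


# Hypothetical 2 x 2 table, for illustration only.
chi2, df = chi_square_independence([[10, 20], [20, 10]])
print(f"chi-square = {chi2:.3f} on {df} df")
```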
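$$\chi^2_{calc} = \frac{(75-90)^2}{90} + \frac{(115-120)^2}{120} + \frac{(110-90)^2}{90} + \cdots + \frac{(5-15)^2}{15}$$
$$= 2.5 + 0.208 + 4.444 + 0.556 + 0.417 + 2.222 + 6.667 + 0 + 6.667 = 23.68$$
(This corresponds to a p-value of less than .001.) Since 23.68 exceeds the tabulated value $\chi^2_{0.01,\,4} = 13.28$, we reject H0.
Therefore, there is a relationship between the number of accidents and the age of the driver.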