
# Hypothesis Testing

Hypothesis
A hypothesis is an assumption or a statement that may or may not be true
The hypothesis is tested on the basis of information obtained from a sample
Example: for the assessed value of an apartment in a multistoried building, one may be interested in knowing whether or not the assessed value equals some particular value, say Rs. 80 lakh
Example: is a new drug more effective than the existing drug, based on the sample data?
Example: is the proportion of smokers in a class different from 0.30?
Null Hypothesis
Hypotheses that are proposed with the intent of being rejected are called null hypotheses
This requires that we hypothesize the opposite of what we want to prove
If we want to prove that two variables (e.g., sales and expenditure) are related, we formulate the null hypothesis that they are not related
If we want to prove that the average wage of skilled workers in town 1 is greater than that in town 2, we formulate the null hypothesis that there is no difference in the average wages of skilled workers in the two towns
A null hypothesis is denoted by $H_0$
Alternative Hypothesis
Rejection of the null hypothesis leads to the acceptance of the alternative hypothesis
The rejection of the null hypothesis indicates that one of the following has statistical significance:
o the relationship between variables (e.g., sales and expenditure)
o the difference between means (e.g., wages of skilled workers in towns 1 and 2)
o the difference between proportions
The acceptance of the null hypothesis indicates that the observed differences are due to chance
The alternative hypothesis is denoted by $H_1$
One-tailed & Two-tailed tests
A test is called one-tailed if the null hypothesis is rejected only when the value of the test statistic falls in one specified tail of the distribution
A test is called two-tailed if the null hypothesis is rejected when the value of the test statistic falls in either of the two tails of its sampling distribution
Two-tailed test Example
Consider a soft drink bottling plant which dispenses soft drinks
in bottles of 300 ml capacity
The bottling is done through an automatic plant
Overfilling a bottle (liquid content more than 300 ml) means a huge loss to the company, given the large volume of sales
Underfilling means the customers are getting less than 300 ml of the drink when they are paying for 300 ml
This could damage the company's reputation
The company wants to avoid both overfilling and underfilling
Therefore, it would prefer to test the hypothesis whether the
mean content of the bottles is different from 300 ml
This hypothesis could be written as: $H_0: \mu = 300$ ml; $H_1: \mu \neq 300$ ml
The hypotheses stated above are called two-tailed hypotheses
One-tailed test Example

If the concern is the overfilling of bottles, the hypotheses could be stated as:
$H_0: \mu = 300$ ml; $H_1: \mu > 300$ ml
Such hypotheses are called one-tailed hypotheses, and the researcher would be interested in the upper tail (right-hand tail) of the distribution
If the concern is loss of reputation of the company (underfilling of the bottles), the hypotheses may be stated as:
$H_0: \mu = 300$ ml; $H_1: \mu < 300$ ml
These hypotheses are also one-tailed, and the researcher would be interested in the lower tail (left-hand tail) of the distribution
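
The difference between the three sets of hypotheses can be sketched in a few lines of Python. This is only an illustration, assuming the test statistic follows a standard normal distribution under $H_0$ (a z-test); the function name `rejects_h0` and the value z = 1.80 are hypothetical and do not come from the bottling example itself.

```python
from statistics import NormalDist


def rejects_h0(z, alpha=0.05, tail="two"):
    """Return True if the z statistic falls in the rejection region.

    tail: "two"   -> H1: mu != mu0 (alpha/2 in each tail)
          "upper" -> H1: mu >  mu0 (right-hand tail)
          "lower" -> H1: mu <  mu0 (left-hand tail)
    """
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    if tail == "two":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    if tail == "upper":
        return z > std_normal.inv_cdf(1 - alpha)
    if tail == "lower":
        return z < std_normal.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'upper', or 'lower'")


# Hypothetical test statistic (assumed for illustration only).
z = 1.80
for tail in ("two", "upper", "lower"):
    print(tail, rejects_h0(z, alpha=0.05, tail=tail))
```

With this assumed value, the same statistic leads to rejection in the upper-tailed test but not in the two-tailed or lower-tailed test, because the two-tailed test splits the 5% between both tails.
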
Type 1 & Type 2 Error
Type-1 error: rejecting $H_0$ when it is true. Its probability is denoted by $\alpha$. In QC, it is termed the producer's risk, because it is the probability of rejecting a good lot
Type-2 error: accepting $H_0$ when it is false. Its probability is denoted by $\beta$. In QC, it is termed the consumer's risk, because it is the probability of accepting a bad lot
The expression $(1 - \beta)$ is called the power of the test
To decrease the risk of committing both types of errors, one may increase the sample size

| Decision based on sample | $H_0$ is true | $H_0$ is false |
|---|---|---|
| Reject $H_0$ | Type-1 error | Correct decision |
| Accept $H_0$ | Correct decision | Type-2 error |
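
The effect of sample size on the Type-2 error can be illustrated with a small calculation. The sketch below assumes an upper-tailed z-test with a known population standard deviation; the function name `type2_error` and the numbers (true mean 302 ml, standard deviation 5 ml) are hypothetical choices made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def type2_error(mu0, mu1, sigma, n, alpha=0.05):
    """Probability of a Type-2 error for an upper-tailed z-test.

    H0: mu = mu0 against H1: mu > mu0, where the true mean is mu1
    and the population standard deviation sigma is assumed known.
    """
    se = sigma / sqrt(n)  # standard error of the sample mean
    # H0 is rejected when the sample mean exceeds this cut-off.
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # beta = P(sample mean <= cutoff) when the true mean is mu1.
    return NormalDist(mu1, se).cdf(cutoff)


# Assumed values: H0 mean 300 ml, true mean 302 ml, sigma 5 ml.
for n in (10, 30, 100):
    beta = type2_error(mu0=300, mu1=302, sigma=5, n=n)
    print(f"n={n:3d}  beta={beta:.3f}  power={1 - beta:.3f}")
```

In this sketch the Type-1 error probability is held fixed at 5%, so only the Type-2 error probability shrinks as the sample size grows; with a larger sample one could instead lower both.
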

Formulation of Hypothesis
While designing any hypotheses, there are a few criteria that the
researcher must fulfill. These are:
It must be formulated in a simple, clear, and declarative form
A broad hypothesis might not be empirically testable
It should test only one relationship between only two variables at a time
Example: ... for the new diet drink will have a positive impact on brand awareness of the drink
Example: High organizational commitment will lead to lower turnover intention
A hypothesis must be measurable and quantifiable
The validation of the hypothesis would necessarily involve
testing the statistical significance of the hypothesized relation

Testing of Hypothesis
The following steps are followed in the testing of a hypothesis:

Setting up a suitable significance level: the level of significance, $\alpha$, denotes the probability of rejecting the null hypothesis when it is true. The value of $\alpha$ varies from problem to problem, but it is usually taken as either 5% or 1%

Determination of a test statistic: this could be the $Z$, $t$, $F$, or $\chi^2$ test statistic; which one is to be used depends on various assumptions

Determination of the critical region: before a sample is drawn from the population, it is very important to specify the values of the test statistic that will lead to rejection or acceptance of the null hypothesis. The set of values that leads to the rejection of the null hypothesis is called the critical region. Given $\alpha$, the optimal critical region for a two-tailed test consists of an area of $\alpha/2$ in the right-hand tail of the distribution plus an area of $\alpha/2$ in the left-hand tail, where the null hypothesis is rejected
Computing the value of the test statistic
Inference: $H_0$ may be rejected or accepted depending upon whether the computed value falls in the rejection region or the acceptance region
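
The steps listed above can be traced in a short Python sketch. It assumes a two-tailed Z test with a known population standard deviation; the function name `two_tailed_z_test`, the fill measurements, and the value of `sigma` are all hypothetical and serve only to show the order of the steps.

```python
from math import sqrt
from statistics import NormalDist, mean


def two_tailed_z_test(sample, mu0, sigma, alpha=0.05):
    """Walk through the testing steps for H0: mu = mu0 vs H1: mu != mu0."""
    # Significance level: alpha is chosen before looking at the data.
    # Test statistic: Z, assuming the population sd (sigma) is known.
    # Critical region: |Z| > z_crit, with alpha/2 in each tail.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Computing the value of the test statistic.
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Inference: reject H0 if z falls in the critical region.
    reject = abs(z) > z_crit
    return z, z_crit, reject


# Hypothetical fill volumes in ml (not data from the source).
fills_ml = [301.2, 299.5, 300.8, 302.1, 298.9, 300.4, 301.7, 299.8]
z, z_crit, reject = two_tailed_z_test(fills_ml, mu0=300, sigma=1.5)
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

For a small sample with an unknown population standard deviation, a t statistic with the appropriate degrees of freedom would normally replace Z; that variant is not shown here.
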
Degrees of Freedom (d.f.)
The d.f. is the number of values in a calculation that are free to vary
Suppose that we know the mean of certain data is 25 and that the values are 20, 10, 50, and one unknown value, $x$. Then $(20 + 10 + 50 + x)/4 = 25$, so $x = 20$: the unknown value is fully determined
Suppose instead that we know the mean of a data set is 25, with values 20, 10, and two unknown values, say $x$ and $y$
This means $(20 + 10 + x + y)/4 = 25$, so $30 + x + y = 100$, or $x + y = 70$, which gives $y = 70 - x$
Once we choose a value for $x$, the value for $y$ is determined. This shows that there is 1 d.f.
Now consider a sample of size 100. If we know that the mean of this sample is 20, but do not know any of the individual values, then there are 99 d.f. All values must add up to a total of $20 \times 100 = 2000$. Once we have the values of 99 elements in the data set, the last one can be determined
If the size of the given sample is $n$, then the d.f. will be $(n - 1)$
In a contingency table (CT), the d.f. is calculated in a slightly different manner. If the order of the CT is $r \times c$, then the d.f. will be $(r - 1)(c - 1)$, where $r$ is the number of rows and $c$ is the number of columns
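
The arithmetic behind these examples can be checked with a minimal sketch. The helper names `last_value` and `contingency_df` are hypothetical; the first reproduces the examples with a known mean, the second applies the contingency-table rule.

```python
def last_value(known_values, mean, n):
    """Given n - 1 known values and the mean of all n, return the nth value."""
    if len(known_values) != n - 1:
        raise ValueError("exactly n - 1 values must be supplied")
    # The total is fixed at mean * n, so the last value is not free to vary.
    return mean * n - sum(known_values)


def contingency_df(rows, cols):
    """Degrees of freedom of an r x c contingency table."""
    return (rows - 1) * (cols - 1)


# Mean 25 with values 20, 10, 50: the fourth value is forced.
print(last_value([20, 10, 50], mean=25, n=4))  # 20

# Mean 25 with values 20, 10 and a freely chosen x = 30: y is forced.
print(last_value([20, 10, 30], mean=25, n=4))  # 40

# A table with 3 rows and 4 columns (an assumed size).
print(contingency_df(3, 4))  # 6
```
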
$\chi^2$-test
The chi-square test is widely used in research. To use the chi-square test, data is required in the form of frequencies
Data expressed in percentages or proportions can also be used, provided it can be converted into frequencies
The majority of the applications of chi-square ($\chi^2$) are with discrete data
Unlike the normal and $t$ distributions, the chi-square distribution is not symmetric
The values of a chi-square statistic are greater than or equal to zero
The shape of a chi-square distribution depends upon the degrees of freedom
As the degrees of freedom increase, the distribution tends to the normal distribution
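
How the shape changes with the degrees of freedom can be seen from the standard closed-form moments of the chi-square distribution (mean $k$, variance $2k$, skewness $\sqrt{8/k}$ for $k$ degrees of freedom). These formulas are textbook results rather than something stated in these notes, and the function name `chi_square_moments` is hypothetical.

```python
from math import sqrt


def chi_square_moments(df):
    """Mean, variance, and skewness of a chi-square distribution."""
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    return {
        "mean": df,
        "variance": 2 * df,
        "skewness": sqrt(8 / df),  # 0 would indicate a symmetric shape
    }


for df in (2, 10, 50, 200):
    m = chi_square_moments(df)
    print(f"df={df:3d}  mean={m['mean']}  variance={m['variance']}  "
          f"skewness={m['skewness']:.3f}")
```

The skewness is always positive (a longer right-hand tail) and falls towards zero as the degrees of freedom grow, which is consistent with the distribution becoming more nearly normal.
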
$\chi^2$-test applications
Goodness of fit: the $\chi^2$-test is used to find out how well a theoretical distribution fits the empirical (observed) distribution obtained from the sample data
Independence of attributes: the $\chi^2$-test is used to find out whether two or more attributes are associated or not
Equality of more than two population proportions: for example, the interest may be in determining whether, in an organization, the proportion of satisfied employees is the same in four categories, viz., class I, class II, class III, and class IV employees
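
The first two applications can be sketched in plain Python using the usual statistic, the sum of (observed - expected)^2 / expected. The function names, the die-roll counts, and the 2 x 2 table below are hypothetical illustrations; the 5% critical values (11.070 for 5 d.f. and 3.841 for 1 d.f.) are taken from standard chi-square tables.

```python
def goodness_of_fit_statistic(observed, expected):
    """Chi-square statistic comparing observed with expected frequencies."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def independence_statistic(table):
    """Chi-square statistic and d.f. for an r x c table of frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two attributes were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Goodness of fit: 60 hypothetical rolls of a die, 10 expected per face.
chi2_gof = goodness_of_fit_statistic([8, 12, 9, 11, 14, 6], [10] * 6)
print(f"goodness of fit: chi2 = {chi2_gof:.2f}, reject H0: {chi2_gof > 11.070}")

# Independence: hypothetical 2 x 2 table of two attributes.
chi2_ind, df_ind = independence_statistic([[30, 20], [20, 30]])
print(f"independence: chi2 = {chi2_ind:.2f}, d.f. = {df_ind}, "
      f"reject H0: {chi2_ind > 3.841}")
```

The third application, equality of several proportions, can be handled with the same `independence_statistic` function by arranging the counts as a table with one column per category (for example, satisfied and not satisfied employees in each of the four classes).
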