
Hypothesis Testing Summary

Hypothesis testing begins with the drawing of a sample and calculating its characteristics
(aka, statistics). A statistical test (a specific form of a hypothesis test) is an inferential process, based on probability, and is used to draw conclusions about the population parameters.
One way to clarify the process of hypothesis testing is to imagine that you have, first of all, a
population to which no treatment has been applied (aka, comparison group). You know
the parameters of this population (for example, the mean and standard deviation). Another
population exists that is the same as the first, except that some treatment has been applied
(aka, the treatment or experimental group). You do not know the parameters of this population.
Samples are drawn from this latter population and the statistics derived from the sample
serve as the estimates of the unknown population parameters. This is the situation in which
hypothesis testing applies, and it provides an introduction for understanding more complicated versions of hypothesis testing that you will encounter later.
The logic of hypothesis testing can be stated in three steps:
1 A hypothesis concerning a population is stated.
2 A sample is selected from the population.
3 The sample data are used to determine whether the hypothesis can reasonably be
supported or not. Ultimately, the conclusion drawn is about the population, not just
the sample.
None of this is necessary if the entire population is small and accessible, but this is almost
never the case.

Step 1: H0

Step 1 breaks down into a series of formal stages. The first stage is to state the null hypothesis, which is usually the hypothesis of no difference. The null hypothesis states that the treatment has no effect or, stated differently, that there is no difference between treated and untreated
populations (e.g., μ1 − μ2 = 0). That is, the independent variable or treatment will have no
effect on the dependent variable. The null hypothesis is represented by the symbol H0.
Examples of several forms that the null hypothesis can take are contained in the following
table:

H0:                    Explanation
A.  μ = 100            Population mean is 100.
B.  μ = 0              Population mean is 0.
C.  σ = 15             Standard deviation in the population is 15.
D.  μ1 − μ2 = 0        Means of populations 1 and 2 are equal; there is no difference between the parameters μ1 and μ2.
E.  σ1² − σ2² = 0      Variance in population 1 is equal to the variance in population 2.
F.  ρXY = 0            Correlation coefficient between X and Y in the population is 0.
G.  ρ1 − ρ2 = 0        The difference between ρXY in population 1 and ρXY in population 2 is 0.
H.  μ1 = μ2 = μ3       The means in populations 1, 2, and 3 are equal.
I.  π = .5             The proportion in the population is .5.

Step 1: HA

The second stage is to state the alternative hypothesis. It proposes the opposite of the null
hypothesis in that it says there will be an effect of treatment, there will be differences
between populations, or that the independent variable or treatment does indeed affect the
dependent variable. The symbol for the alternative hypothesis is either HA or H1. Most
often the HA is non-directional: it just says there will be a difference without saying in
which direction. Sometimes a directional hypothesis is used. This will be discussed a little
later.

Step 2: Sampling

Step 2 requires that a suitable sample be selected from the population. In order to adequately
represent the population, the sample must be random. See sections 9.4 through 9.9 of Hopkins, Hopkins, & Glass (1996) if you have problems with this concept.
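As a minimal sketch of what drawing a simple random sample looks like in practice (Python/NumPy; the population size, sample size, and seed are arbitrary illustrative choices, not values from the text):

```python
import numpy as np

# Hypothetical sampling frame: ID numbers for a population of 10,000 members.
population_ids = np.arange(10_000)

# Draw a simple random sample of n = 25 without replacement, so every member
# has an equal chance of selection.
rng = np.random.default_rng(seed=42)   # arbitrary seed, kept for reproducibility
sample_ids = rng.choice(population_ids, size=25, replace=False)
print(sample_ids)
```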

Step 3: Statistical Test

In Step 3 the data from the sample are compared with the statement of the null hypothesis.
For example, the sample mean (representing the mean of the unknown population) is compared with the known population mean. The decision is made whether or not to reject the
null hypothesis. See the discussion in section 10.3 in HH&G for the reasoning behind speaking only of rejecting, or failing to reject, the null hypothesis.
If we reject the null hypothesis, we accept the alternative hypothesis and conclude that
there is a treatment effect or a difference between populations, that is, that the independent
variable or treatment has affected the dependent variable.
These steps are restated in the following table:

Step    Action
1.      State the statistical hypothesis H0 to be tested (e.g., H0: μ = 100).
2.      Set the level of statistical significance (alpha level). That is, specify the degree of risk of a Type I error: the risk of incorrectly concluding that H0 is false when in fact it is true. This risk, stated as a probability, is denoted by α (alpha) and is the probability of a Type I error (e.g., α = .05).
3.      Assuming H0 to be correct, determine the probability (p) of obtaining a sample mean (X̄) that differs from the population mean (μ) by an amount as large as or larger than that which was observed (e.g., if μ = 100 and X̄ = 108, calculate the probability of observing a difference between X̄ and μ of 8 or more points).
4.      Make a decision regarding H0: whether or not to reject it (e.g., if the probability (p) from Step 3 is less than alpha (Step 2), we reject the null hypothesis and conclude that μ does not equal 100).
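The table's example (H0: μ = 100, X̄ = 108) can be sketched in Python as a one-sample z test. The population standard deviation (σ = 15) and sample size (n = 25) are assumptions added for the illustration; they are not given in the example above:

```python
from scipy import stats

mu_0 = 100      # Step 1: H0: mu = 100
alpha = 0.05    # Step 2: level of statistical significance
x_bar = 108     # observed sample mean
sigma = 15      # assumed known population standard deviation (illustrative)
n = 25          # assumed sample size (illustrative)

# Step 3: probability of a difference of 8 or more points in either direction,
# assuming H0 is true, using the standard error of the mean.
se = sigma / n ** 0.5
z = (x_bar - mu_0) / se
p = 2 * stats.norm.sf(abs(z))   # two-tailed p-value

# Step 4: decision regarding H0
print(f"z = {z:.2f}, p = {p:.4f}")
print("Reject H0" if p < alpha else "Fail to reject H0")
```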


Whenever we make a decision about rejecting or failing to reject the null hypothesis, two
types of error may occur:
• We reject the null hypothesis when we should not, because in reality the null hypothesis is true. This is known as a Type I, or alpha (α), error.
• We accept the null hypothesis when we should not, because in reality the null hypothesis is false. This decision is known as a Type II, or beta (β), error.
The four outcomes of decision making are illustrated in the following box:

                         Actual state of nature: the null hypothesis is, in reality:
Our decision             True                        False
Accept H0                Correct decision            Error: Type II, or beta
Reject H0                Error: Type I, or alpha     Correct decision

Most experimenters hope to reject the null hypothesis and to therefore claim that their
experimental treatment has had an effect. However, as false claims of treatment effects (Type
I errors) are scientifically serious, it is necessary to set stringent criteria. We can never be
absolutely certain that we have correctly rejected, or failed to reject, the null hypothesis, but
we can determine the probability associated with making an error in this process.
You may recall from a previous lecture and textbook readings that the probability of obtaining a particular sample mean from a population can be determined using z-scores. Sample
means very close to the population mean are highly likely. Sample means distant from the
population mean (in the tails of the distribution) are very unlikely, but they do occur. If the
null hypothesis is true and our treatment has no effect, we would expect that the sample we
draw will have a mean close to that of the population. Sample means in the tails are not very
likely if the null hypothesis is true. Such means indicate that we should reject the null
hypothesis. (See Figure 10.2, p. 175, HH&G). A boundary or decision line has to be drawn,
therefore, between those sample means that are expected, given the null hypothesis, and
those that are so unlikely that they lead to rejection of the null hypothesis. The boundary
that separates these sample means is called the level of significance or alpha level. It is a
probability value beyond which obtained sample means are very unlikely to occur if the null
hypothesis is true. The value .05 is commonly used as the alpha (α) level. It represents the proportion of the area in the tails of the distribution where sample means are sufficiently
unlikely, if the null hypothesis is true.
The alpha level also tells us the probability of producing a Type I error. An example makes
this whole process clearer; a good one is provided on pp. 174-175 of HH&G.
Although .05 is the most commonly accepted alpha level in psychological and educational
research, more stringent levels, such as .01 and .001 may be used when the consequence of
making a Type I error is serious.
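As a small illustration (Python/SciPy; the alpha values are just the levels named above), the critical z values that mark the boundary of the rejection region for a two-tailed test can be computed directly:

```python
from scipy import stats

# Critical z values that mark the boundary (critical region) for common
# two-tailed alpha levels; sample means beyond these boundaries lead to
# rejection of H0.
for alpha in (0.05, 0.01, 0.001):
    z_crit = stats.norm.ppf(1 - alpha / 2)
    print(f"alpha = {alpha:<5}  reject H0 if |z| > {z_crit:.3f}")
```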
A general statement of the z-score statistic is provided on p. 175 of HH&G:
z = (sample mean − hypothesized population mean) / standard error of the sampling distribution


A more general form of this which you will find applicable to a large range of statistics that
you will learn about in the future is:
test statistic = (sample statistic − hypothesized population parameter) / standard error of the distribution of the test statistic

This may be restated in less statistical terms as:


test statistic = obtained difference / difference expected by chance
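A minimal sketch of this general form in Python (the function name and the illustrative numbers are hypothetical; the z statistic for the mean is just one instance of the pattern):

```python
def test_statistic(sample_statistic, hypothesized_parameter, standard_error):
    """General form: obtained difference / difference expected by chance."""
    return (sample_statistic - hypothesized_parameter) / standard_error

# One-sample z for the mean: statistic = sample mean, parameter = hypothesized mean,
# standard error = sigma / sqrt(n). Numbers are illustrative only.
z = test_statistic(108, 100, 15 / 25 ** 0.5)
print(round(z, 2))   # 2.67
```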

Do remember that, when testing the null hypothesis, you can reject it when the difference
between your sample data and that which would be expected according to the null hypothesis is large enough. However, if a small difference is obtained, you should not accept the null
hypothesis. Instead, according to the logic involved in this process, you are only entitled to
say that you fail to reject the null hypothesis.
When we reject the null hypothesis, we are saying that the difference we obtained (between
the sample statistic and the hypothesized population parameter) is sufficiently unlikely to
occur by chance alone. We are entitled to say that our treatment has had an effect. But there
is always a small chance that we are wrong. In this case we have made a Type I error. The
probability that we are wrong is equal to the significance level (alpha).
When findings are stated in a research report, the null and alternative hypotheses are normally not mentioned. Instead, the term statistical significance is used. If the null hypothesis
has been rejected, the findings are said to be statistically significant. If the null hypothesis
was not rejected the findings are not statistically significant.
It is necessary to make a statement about whether or not you obtained statistical significance
and to include the value of your sample statistic and say whether the probability of obtaining
that statistic is greater or smaller than the alpha level you have chosen. These values are often
included in brackets for linguistic simplicity. There is an endless variety of ways in which
you can say the same thing, but you must take care with the wording of statements about significance and non-significance.

One-Tailed Hypothesis Tests


The null hypothesis always says that there is no treatment effect. The alternative hypothesis
says that there is a treatment effect (or in other words, a difference between the sample data
and that expected according to the null hypothesis). Such a statement does not predict the
direction of difference created by the treatment. It is said to be a two-tailed hypothesis
because highly unlikely events in either tail of the distribution will lead to rejection of the
null hypothesis.
Although they are used less often, alternative hypotheses may be one-tailed or directional.
In this case, the researcher is predicting either an increase or a decrease as a result of treatment, but not both. In this case the null hypothesis is rejected only if the sample data fall in
the predicted tail of the distribution. The critical region still represents the same area of the
curve (e.g., .05), but the whole area is located in only one tail (see section 10.10, p. 181, HH&G).
Two-tailed tests are said to be more conservative because the difference between the sample
data and that expected according to the null hypothesis (that is the treatment effect) must be
larger to achieve the same level of statistical significance, and thus reject the null hypothesis,


than in a one-tailed test. Even when a particular direction of treatment effect can be predicted, a two-tailed test is still often used.
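A small sketch of this point (Python/SciPy; the z value of 1.80 is an arbitrary illustrative number): the same test statistic can fall short of significance two-tailed yet be significant one-tailed, because the one-tailed test places the entire critical region in the predicted tail.

```python
from scipy import stats

z = 1.80   # illustrative test statistic

p_two_tailed = 2 * stats.norm.sf(abs(z))   # unlikely values in either tail count
p_one_tailed = stats.norm.sf(z)            # only the predicted tail counts

print(f"two-tailed p = {p_two_tailed:.3f}")   # about .072: not significant at .05
print(f"one-tailed p = {p_one_tailed:.3f}")   # about .036: significant at .05
```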

Statistical Power
The goal of hypothesis testing is usually to correctly reject the null hypothesis, or in other
words, to show that the treatment applied has had an effect. The probability of correctly
rejecting the null hypothesis is called the power of a statistical test.
Power is calculated as 1 − β, where β is the probability of making a Type II error (failing to
reject the null hypothesis when it is false). Statistical power is large when the treatment effect
is large. Put another way, you are more likely to correctly reject the null hypothesis when the
treatment has created a large difference between your sample data and the original population.
Other factors that influence power that are more directly controllable than size of treatment
effect are:
• the alpha level chosen: smaller alpha levels produce smaller values for power;
• whether a one-tailed or two-tailed test is used: statistical power is greater for one-tailed tests;
• sample size: larger samples provide greater power.
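As a minimal sketch of computing power = 1 − β for a two-tailed one-sample z test (Python/SciPy; the true mean, σ, and n are illustrative assumptions, not values from the text):

```python
from scipy import stats

mu_0, mu_true = 100, 108   # hypothesized and (assumed) true population means
sigma, n = 15, 25          # assumed population SD and sample size
alpha = 0.05

se = sigma / n ** 0.5
z_crit = stats.norm.ppf(1 - alpha / 2)

# Treatment effect expressed in standard-error units.
shift = (mu_true - mu_0) / se

# Probability of landing in the rejection region when H0 is false, i.e., 1 - beta.
power = stats.norm.sf(z_crit - shift) + stats.norm.cdf(-z_crit - shift)
print(f"power = {power:.3f}")   # larger effect, larger n, or larger alpha -> more power
```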
