
ERRORS IN HYPOTHESIS TESTING

A superintendent in a medium-sized school has a problem. The mathematics scores on nationally
standardized achievement tests such as the SAT and ACT of the students attending her school are
lower than the national average. The school board members, who don't care whether the football
or basketball teams win or not, are greatly concerned about this deficiency. The superintendent
fears that if it is not corrected, she will lose her job before long.
As the superintendent was sitting in her office wondering what to do, a salesperson approached
with a briefcase and a sales pitch. The salesperson had heard about the problem of the
mathematics scores and was prepared to offer the superintendent a "deal she couldn't refuse."
The deal was teaching machines to teach mathematics, guaranteed to increase the mathematics
scores of the students. In addition, the machines never take breaks or demand a pay increase.
The superintendent agreed that the machines might work, but was concerned about the cost. The
salesperson finally wrote some figures. Since there were about 1000 students in the school and
one machine was needed for every ten students, the school would need about one hundred
machines. At a cost of $10,000 per machine, the total cost to the school would be about
$1,000,000. As the superintendent picked herself up off the floor, she said she would consider the
offer, but didn't think the school board would go for such a big expenditure without prior
evidence that the machines actually worked. Besides, how could she know that the company that
manufactures the machines would not go bankrupt in the next year, leaving the school
stuck with a million dollars' worth of useless electronic junk?
The salesperson was prepared and offered to lease ten machines to the school for testing purposes
for one year at a cost of $500 each. At the end of a year the superintendent
would make a decision about the effectiveness of the machines. If they worked, she would pitch
them to the school board; if not, then she would return the machines with no further obligation.
An experimental design was agreed upon. One hundred students would be randomly selected
from the student population and taught using the machines for one year. At the end of the year,
the mean mathematics scores of those students would be compared to the mean scores of the
students who did not use the machine. If the means were different enough, the machines would
be purchased. The astute student will recognize this as a nested t-test.
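The comparison of the two group means can be sketched in a few lines of Python. This is only a minimal sketch: it assumes Welch's unequal-variance form of the two-sample t statistic (the text does not say which formula is intended), and the scores are made-up numbers used purely for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(treated, control):
    """Two-sample t statistic (Welch form) for the difference in means."""
    std_error = sqrt(variance(treated) / len(treated)
                     + variance(control) / len(control))
    return (mean(treated) - mean(control)) / std_error


# Hypothetical scores, not data from the text.
machine_group = [520, 540, 510, 560, 530]
other_students = [500, 515, 495, 525, 505]

t_stat = welch_t(machine_group, other_students)
print(f"t = {t_stat:.2f}")
```

The larger the statistic, the more the two means differ relative to the sampling error; the decision rule discussed next determines how large it must be before the machines are bought.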
In order to help decide how different the two means would have to be in order to buy the
machines, the superintendent did a theoretical analysis of the decision process. This analysis is
presented in the following decision box.

                                  "Real World"

DECISION                          The machines don't work.    The machines work.

Buy the machines.                 Type I ERROR                CORRECT
Decide the machines work.         probability = α             probability = 1-β ("power")

Do not buy the machines.          CORRECT                     Type II ERROR
Decide that the machines          probability = 1-α           probability = β
do not work.

The decision box has the decision that the superintendent must make on the left-hand side. For
simplicity's sake, only two possibilities are permitted: either buy all the machines or buy none of
the machines. The columns at the top represent "the state of the real world". The state of the real
world can never be truly known, because if it were known whether or not the machines worked,
there would be no point in doing the experiment. The four cells represent various places one
could be, depending upon the state of the world and the decision made. Each cell will be
discussed in turn.
1. Buying the machines when they do not work.
This is called a Type I error and in this case is very costly ($1,000,000). The probability of this
type of error is α, also called the significance level, and is directly controlled by the
experimenter. Before the experiment begins, the experimenter directly sets the value of α. In this
case the value of α would be set low, lower than the usual value of .05, perhaps as low as .0001,
which means that, if the machines didn't work, the experimenter would buy them only about one
time out of 10,000.
2. Not buying the machines when they really didn't work.
This is a correct decision, made with probability 1-α, when in fact the teaching machines don't
work and the machines are not purchased.

The relationship between the probabilities in these two decision boxes can be illustrated using
the sampling distribution when the null hypothesis is true. The decision point is set by α, the
area in the tail or tails of the distribution. Setting α smaller moves the decision point further into
the tails of the distribution.
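How far the decision point moves can be sketched numerically. The following assumes a one-tailed test and a normal sampling distribution (both assumptions made only for illustration) and uses Python's standard-library NormalDist; the function name decision_point is hypothetical.

```python
from statistics import NormalDist


def decision_point(alpha):
    """z value that cuts off an upper-tail area of alpha under the null distribution."""
    return NormalDist().inv_cdf(1 - alpha)


for alpha in (0.05, 0.01, 0.0001):
    print(f"alpha = {alpha:<7} decision point z = {decision_point(alpha):.3f}")
```

Under these assumptions the decision point moves from about 1.64 standard errors at .05 to about 3.72 standard errors at .0001, so a much larger observed difference is needed before the null hypothesis is rejected.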

3. Not buying the machines when they really work.


This is called a Type II error and is made with probability β. The value of β is not directly set by
the experimenter, but is a function of a number of factors, including the size of α, the size of the
effect, the size of the sample, and the variance of the original distribution. The value of β is
inversely related to the value of α; the smaller the value of α, the larger the value of β. It can
now be seen that setting the value of α to a small value was not done without cost, as the value
of β is increased.
4. Buying the machines when they really work.
This is the cell where the experimenter would usually like to be. The probability of making this
correct decision is 1-β and is given the name "power." Because α was set low, β would be high, and
as a result 1-β would be low. Thus it would be unlikely that the superintendent would buy the
machines, even if they did work.
The relationship between the probability of a Type II error (β) and power (1-β) is illustrated
below in a sampling distribution when there actually was an effect.
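The same relationship can be approximated by simulation. The sketch below repeatedly draws samples from a population in which the effect is real and counts how often the sample mean falls short of the decision point. All numbers (a 20-point effect, a standard deviation of 100, 100 students, a one-tailed test at .05, 2000 trials) are hypothetical choices for illustration, not values from the text.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1)  # fixed seed so the run is repeatable

ALPHA, EFFECT, SIGMA, N, TRIALS = 0.05, 20, 100, 100, 2000

std_error = SIGMA / sqrt(N)
# Decision point in score units, set from the null distribution (mean 0).
cutoff = NormalDist().inv_cdf(1 - ALPHA) * std_error

misses = 0
for _ in range(TRIALS):
    # One simulated experiment in which the effect really exists.
    gains = [random.gauss(EFFECT, SIGMA) for _ in range(N)]
    if sum(gains) / N < cutoff:  # null retained although it is false
        misses += 1

beta = misses / TRIALS  # estimated probability of a Type II error
power = 1 - beta        # estimated probability of a correct rejection
print(f"estimated beta = {beta:.3f}, estimated power = {power:.3f}")
```

Every simulated experiment ends in exactly one of the two outcomes, which is why the two estimates always sum to one.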

The relationship between the size of α and β can be seen in the following illustration combining
the two previous distributions into overlapping distributions, the top graph with α = .05 and the
bottom with α = .01.
[Figure: overlapping sampling distributions, one for H0 true and one for H1 true]

The size of the effect is the difference between the center points (μ) of the two distributions. If
the size of the effect is increased, the two distributions overlap less, so for a fixed α the
probability of a Type II error (β) decreases.

When the error variance of the scores is decreased, the probability of a Type II error is decreased
if everything else remains constant, as illustrated below.

An interactive exercise designed to allow exploration of the relationships between alpha, size of
effects, size of sample (N), size of error, and beta can now be understood. The values of alpha,
size of effects, size of sample, and size of error can all be adjusted with the appropriate scroll
bars. When one of these values is changed, the graphs will change and the value of beta will be
re-computed. The area representing the value of alpha on the graph is drawn in dark gray. The
area representing beta is drawn in dark blue, while the corresponding value of power is
represented by the light blue area. Using this exercise the student should verify:

The size of beta decreases as the size of error decreases.

The size of beta decreases as the size of the sample increases.

The size of beta decreases as the size of alpha increases.

The size of beta decreases as the size of the effects increases.

The size of the increase or decrease in beta is a complex function of changes in all of the other
values. For example, changes in the size of the sample may have either small or large effects on
beta depending upon the other values. If a large treatment effect and small error are present in the
experiment, then changes in the sample size are going to have a small effect on beta, because
beta is already close to zero.
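The four relationships listed above, and the point about sample size, can also be checked numerically without the interactive exercise. The sketch below is a simplification: it assumes a one-tailed z test with a normal sampling distribution, and the baseline values and the function name beta are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def beta(alpha, effect, sigma, n):
    """Type II error probability for a one-tailed z test (normal approximation)."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf(z_crit - effect * sqrt(n) / sigma)


# Hypothetical baseline; each entry below changes one input at a time.
base = dict(alpha=0.05, effect=20, sigma=100, n=100)
changes = {
    "smaller error": dict(sigma=50),
    "larger sample": dict(n=400),
    "larger alpha": dict(alpha=0.10),
    "larger effect": dict(effect=40),
}
baseline_beta = beta(**base)
changed_betas = {label: beta(**{**base, **change})
                 for label, change in changes.items()}
print(f"baseline: beta = {baseline_beta:.3f}")
for label, value in changed_betas.items():
    print(f"{label}: beta = {value:.3f}")

# How much beta drops when the sample grows from 50 to 200, in two settings.
settings = {
    "large effect, small error": dict(effect=40, sigma=50),
    "small effect, large error": dict(effect=10, sigma=100),
}
drops = {}
for label, s in settings.items():
    drops[label] = beta(0.05, n=50, **s) - beta(0.05, n=200, **s)
    print(f"{label}: beta drops by {drops[label]:.3f}")
```

In the first setting beta is already near zero at the smaller sample size, so adding students changes it very little; in the second setting the same increase in sample size lowers beta substantially.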

A SECOND CHANCE
As might be expected, in the previous situation the superintendent decided not to purchase the
teaching machines, because she had essentially stacked the deck against deciding that there were
any effects. When she described the experiment and the result to the salesperson the next year,
the salesperson listened carefully and understood the reason why α had been set so low.
The salesperson had a new offer to make, however. Because of an advance in microchip
technology, the entire teaching machine had been placed on a single integrated circuit. As a result
the price had dropped to $500 a machine. Now it would cost the superintendent a total of
$50,000 to purchase the machines, a sum that is quite reasonable.
The analysis of the two types of errors revealed that the cost of a Type I error,
buying the machines when they really don't work ($50,000), is small when compared to the loss
encountered in a Type II error, not purchasing the machines when in fact they do work,
although it is difficult to put into dollars the cost of the students not learning to their highest
potential.
In any case, the superintendent would probably set the value of α to a fairly large value (.10
perhaps) relative to the standard value of .05. This would have the effect of decreasing the value
of β and increasing the power (1-β) of the experiment. Thus the decision to buy the machines
would be made more often if in fact the machines worked. The experiment was repeated the next
year under the same conditions as the previous year, except the size of α was set to .10.
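The two years can be put side by side in a rough calculation. The purchase prices, the significance levels, and the sample of 100 students come from the text; the 20-point effect, the standard deviation of 100, and the one-tailed normal approximation are assumptions made only to get illustrative numbers. The expected Type I loss is simply the significance level times the purchase price, and applies only if the machines do not work.

```python
from math import sqrt
from statistics import NormalDist


def power(alpha, effect, sigma, n):
    """Power of a one-tailed z test under a normal approximation."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist().cdf(z_crit - effect * sqrt(n) / sigma)


# alpha and price are from the text; effect and sigma below are hypothetical.
years = {
    "first year": dict(alpha=0.0001, price=1_000_000),
    "second year": dict(alpha=0.10, price=50_000),
}
summary = {}
for label, y in years.items():
    type_i_loss = y["alpha"] * y["price"]  # expected loss if the machines do not work
    chance_to_buy = power(y["alpha"], effect=20, sigma=100, n=100)
    summary[label] = (type_i_loss, chance_to_buy)
    print(f"{label}: expected Type I loss = ${type_i_loss:,.0f}, "
          f"power = {chance_to_buy:.2f}")
```

Under these assumed values the second-year design accepts a larger expected Type I loss in exchange for a far better chance of buying machines that actually work.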
The results of the significance test indicated that the means were significantly different, the null
hypothesis was rejected, and a decision about the reality of effects was made. The machines were
purchased, the salesperson earned a commission, the math scores of the students increased, and
everyone lived happily ever after.

THE ANALYSIS GENERALIZED TO ALL EXPERIMENTS

The analysis of the reality of the effects of the teaching machines may be generalized to all
significance tests. Rather than buying or not buying the machines, one rejects or retains the null
hypothesis. In the "real world," rather than the machines working or not working, the null
hypothesis is true or false. The following presents the boxes representing significance tests in
general.

                                  "Real World"

DECISION                          NULL TRUE               NULL FALSE
                                  ALTERNATIVE FALSE       ALTERNATIVE TRUE
                                  No Effects              Real Effects

Reject Null                       Type I ERROR            CORRECT
Accept Alternative                prob = α                prob = 1-β ("power")
Decide there are real effects.

Retain Null                       CORRECT                 Type II ERROR
Retain Alternative                prob = 1-α              prob = β
Decide that no effects were
discovered.

CONCLUSION
Setting the value of α is not automatic, but depends upon an analysis of the relative costs of the
two types of errors. The probabilities of the two types of errors (I and II) are inversely related. If
the cost of a Type I error is high relative to the cost of a Type II error, then the value of α should
be set relatively low. If the cost of a Type I error is low relative to the cost of a Type II error, then
the value of α should be set relatively high.

http://www.psychstat.missouristate.edu/introbook/sbk26.htm
