
Hypothesis Testing

W&W, Chapter 9

Overview
We will discuss two approaches to
hypothesis testing:
1) Using confidence intervals
2) Using critical t or z values, or p values

Hypothesis Test
A statistical hypothesis is simply a claim
about a population that can be put to
the test by drawing a random sample.

Elements of a Hypothesis Test


1. The null hypothesis, Ho: Specifies
hypothesized values for one or more of
the population parameters
2. The alternative hypothesis, HA: A
statement which says that the population
parameter is something other than the
value specified by the null hypothesis

Elements of a Hypothesis Test


3. The error level, or $\alpha$, which is just 1 minus the confidence level (in terms of
probability)
4. One-tailed versus two-tailed tests

Example #1
Suppose we want to know if there is a difference in
the salaries for male and female professors. We
might take two samples, one of men and one of
women, to determine their respective mean salary
levels. The calculated M1 and M2 are estimates of the
population means, $\mu_1$ and $\mu_2$.

Ho: $\mu_1 - \mu_2 = 0$, or Ho: $\mu_1 = \mu_2$
HA: $\mu_1 - \mu_2 \neq 0$, or HA: $\mu_1 \neq \mu_2$

Example #1 (continued)
This is stated as a two-tailed test. If you
believe that women make less than
men, then the alternative hypothesis
might be something like:
HA: $\mu_1 - \mu_2 > 0$ (taking group 1 to be men)

Example #2
As an employee of the Federal Trade Commission, you are
vigilant in your stand against false or misleading advertising. A
manufacturer of razor blades claims that their new blades give
on average 15 good shaves. You conduct a small test by asking
10 randomly chosen men to each try one of these new razor
blades. The average number of good shaves reported is 13 and
the standard deviation is 3.62. The manufacturer claims that the
true number of shaves (or population value) is 15, or:

Ho: $\mu = 15$

Example #2 (continued)
If we want to challenge the manufacturer's
claim, we might employ a one-tailed test,
where the alternative hypothesis would be:
HA: $\mu < 15$
Or if we were agnostic, we could use a two-tailed test:
HA: $\mu \neq 15$

Example #3
A more general test that we will see when we get to regression
is where the null hypothesis is equal to zero, and we want to
know if our parameters have a statistically significant effect, or
are different from zero.
For example, suppose a researcher wants to determine whether
electoral rules are related to voter turnout. Suppose
the impact of electoral rules on voter turnout is called $\beta$. A
typical hypothesis test will be something like the following:
Ho: Electoral rules have no impact on voter turnout, or $\beta = 0$
HA: Electoral rules affect voter turnout, or $\beta \neq 0$

Testing Hypotheses: Confidence Intervals
Let's start with the first example I gave
where we want to see if there is a
difference in the mean salary level (in
thousands of dollars) of male and
female professors. Suppose that for
men ($M_1 = 16$, $n_1 = 10$, $\sum (X_1 - M_1)^2 = 106$)
and for women ($M_2 = 11$, $n_2 = 5$, $\sum (X_2 - M_2)^2 = 40$).

Testing Hypotheses: Confidence Intervals
We calculate the confidence interval as:
$$\mu_1 - \mu_2 = (M_1 - M_2) \pm t_{\alpha/2} \, s_p \sqrt{1/n_1 + 1/n_2}$$
We need to calculate the value of $s_p$, which is just:
$$s_p^2 = \frac{\sum (X_1 - M_1)^2 + \sum (X_2 - M_2)^2}{(n_1 - 1) + (n_2 - 1)} = \frac{106 + 40}{(10 - 1) + (5 - 1)} = \frac{146}{13} = 11.2$$
thus $s_p = \sqrt{11.2} = 3.35$

Testing Hypotheses: Confidence Intervals
We plug this back into our confidence
interval to obtain:
$$\mu_1 - \mu_2 = (16 - 11) \pm 2.16 \, (3.35) \sqrt{1/10 + 1/5} = 5 \pm 4$$
Note: 2.16 is the critical value of t for
95% confidence, 13 df (for a two-tailed
test).
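As a check on the arithmetic, here is a minimal Python sketch of the pooled-variance interval for the salary example. The variable names are ours, and the critical value 2.16 is simply copied from the slide rather than computed.

```python
import math

# Summary statistics from the salary example (in thousands of dollars).
m1, n1, ss1 = 16.0, 10, 106.0   # men: mean, sample size, sum of squared deviations
m2, n2, ss2 = 11.0, 5, 40.0     # women: same three quantities

# Pooled standard deviation: combined squared deviations over combined df.
df = (n1 - 1) + (n2 - 1)
sp = math.sqrt((ss1 + ss2) / df)

# Critical t for 95% confidence, 13 df, two-tailed (value taken from the slide).
t_crit = 2.16

diff = m1 - m2
margin = t_crit * sp * math.sqrt(1 / n1 + 1 / n2)
lower, upper = diff - margin, diff + margin

# Ho says the difference is 0; reject it if 0 lies outside the interval.
reject_null = not (lower <= 0.0 <= upper)
print(f"sp = {sp:.2f}, interval = ({lower:.2f}, {upper:.2f}), reject Ho: {reject_null}")
```

The interval comes out close to the rounded 5 +/- 4 used on the slide, and zero falls outside it.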

Testing Hypotheses: Confidence Intervals
With 95% confidence, the difference between
our means is estimated to be between 1 and
9, thus the claim that there is no difference
cannot be accepted, i.e., we can reject the
null hypothesis. Zero is not contained in the
interval.
In general, any hypothesis that lies outside the
confidence interval may be rejected. Thus
the confidence interval may be regarded as
the set of acceptable hypotheses.

Testing Hypotheses: p-values


Let's go back to our one-tailed test by the
FTC employee, who wants to determine
if the razor blade manufacturer's claim
of 15 good shaves is valid.
Ho: = 15
HA: < 15

Testing Hypotheses: p-values


A p-value is just the probability that the
sample value would be at least as extreme as the
value actually observed if Ho is true. In
other words, the p-value summarizes
how much agreement there is between
the data and the null hypothesis. In this
case, the null is that the razors give 15
good shaves.

Testing Hypotheses: p-values


We start by calculating the t or z
statistic associated with our observed
value. We would use t in this problem
because the sample size is small
(N=10), and the population standard
deviation is unknown (when the
sample size is large, t and z are
nearly equivalent).

Testing Hypotheses: p-values


$$t = \frac{M - \mu_0}{s/\sqrt{N}} = \frac{13 - 15}{3.62/\sqrt{10}} = -1.74$$
We can think of t as:
$$t = \frac{\text{estimate} - \text{null hypothesis}}{\text{standard error}}$$
If the null hypothesis is zero, then
t = estimate/standard error.
In this case, the t ratio simply measures the size
of the estimate relative to its standard error.
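The t ratio for the razor blade example can be sketched in a few lines of Python. The function name and argument names are illustrative choices of ours, not part of W&W.

```python
import math

def t_statistic(estimate, null_value, std_dev, n):
    """Return (estimate - null_value) / standard error, with SE = s / sqrt(n)."""
    standard_error = std_dev / math.sqrt(n)
    return (estimate - null_value) / standard_error

# Razor example: sample mean 13, claimed mean 15, s = 3.62, N = 10.
t_razor = t_statistic(estimate=13, null_value=15, std_dev=3.62, n=10)
print(f"t = {t_razor:.3f}")
```

Carried to three decimals the result is about -1.747, which agrees with the -1.74 on the slide up to rounding in the last digit.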

Testing Hypotheses: p-values


Now we want to find the area beyond that value of t,
which gives us the p-value. In this problem, t = -1.74.
To find the p-value, we need to take into account our
degrees of freedom, df = n-1 in this problem, which is
9.
We go to the t-table and look to see where our calculated
t falls relative to the cutoff values for various probability
values. In absolute value, our t (1.74) is between the t values of 1.38
(p=.10) and 1.83 (p=.05). Thus we can say that our p-value is between .05 and .10.
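Instead of bracketing the p-value with a table, one can approximate the tail area directly. The sketch below integrates the Student t density numerically with Simpson's rule, using only the Python standard library. The helper names and the step count are our own choices, and a statistics package would normally do this for you.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def one_tailed_p(t, df, steps=2000):
    """Area in one tail beyond |t|, using Simpson's rule on [0, |t|]."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps              # steps must be even for Simpson's rule
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2   # odd interior points get 4, even get 2
        total += weight * t_pdf(a + i * h, df)
    central_area = total * h / 3     # area between 0 and |t|
    return 0.5 - central_area        # by symmetry, half the mass lies beyond 0

p_value = one_tailed_p(-1.74, df=9)
print(f"one-tailed p = {p_value:.3f}")
```

The result lands between .05 and .10, consistent with the table lookup.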

Comparing our p-value or t/z statistic with a critical p-value or t/z
A classical hypothesis test consists of
setting a critical value, which will give us
the reject and accept regions. For
example, for a one-tailed test, with 95%
confidence ($\alpha = .05$), we use a value of
z = 1.64 as our critical value, or for a
two-tailed test, we use z = 1.96.
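These critical z values can be reproduced from the standard normal distribution. A minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are ours.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, standard deviation 1

z_one_tailed = std_normal.inv_cdf(1 - alpha)       # all of alpha in one tail
z_two_tailed = std_normal.inv_cdf(1 - alpha / 2)   # alpha split across both tails
print(f"one-tailed: {z_one_tailed:.3f}, two-tailed: {z_two_tailed:.3f}")
```

The one-tailed value is about 1.645, which the slide reports as 1.64.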

Comparison (continued)
We reject the null hypothesis if our calculated
t or z is beyond the critical t or z, or if the p-value is less than $\alpha$.
In the above example, a 95% critical t value
for 9 df is 1.83 (one-tailed). Since our calculated t does
not exceed the critical t in absolute value (it does not fall in the reject
region), we cannot reject (and so accept) the null hypothesis,
the manufacturer's claim of 15 good shaves.
Also, our p-value is larger than $\alpha$, which is .05.
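The decision rule can be written as a small function. This is only a sketch: the function and the tail labels ("lower", "upper", "two") are our own naming, and the critical value 1.83 is taken from the slide.

```python
def reject_null(t_calc, t_crit, tail):
    """Classical decision rule; tail is 'lower', 'upper', or 'two' (our labels)."""
    if tail == "lower":
        return t_calc < -t_crit       # reject region is the left tail
    if tail == "upper":
        return t_calc > t_crit        # reject region is the right tail
    if tail == "two":
        return abs(t_calc) > t_crit   # reject region is both tails
    raise ValueError(f"unknown tail: {tail!r}")

# Razor example: HA is mu < 15, so the reject region is the lower tail.
decision = reject_null(t_calc=-1.74, t_crit=1.83, tail="lower")
print("reject Ho" if decision else "cannot reject Ho")
```

For the razor data the calculated t of -1.74 is not below -1.83, so the function reports that Ho cannot be rejected.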

The Critical Region


A way to think about a calculated value
in the critical region is that either:
1) Ho is true, but we have been
exceedingly unlucky and got a very
improbable sample, or
2) Ho is not true after all. Thus it is no
surprise that our observed value was so
high or low.

The Critical Region


When we calculated the difference
between male and female professors'
salaries, if the difference is really large,
then we would expect to find something
in the tails, very far away from the
center of the distribution where the
difference is zero.

Type I and Type II Errors


Choosing an alpha level is tricky because
it sets the level at which we will reject
the null hypothesis. The higher this value is, the
greater the chance that we will falsely
reject a true Ho.
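A small simulation can make this concrete. The sketch below draws many samples from a population in which Ho really is true and counts how often a two-tailed z test rejects it. Everything here is an illustrative assumption of ours: a normal population, a known standard deviation (borrowing 3.62 and a mean of 15 from the razor example), samples of 10, and 20,000 trials.

```python
import math
import random
from statistics import NormalDist

def type_i_error_rate(alpha, n=10, mu0=15.0, sigma=3.62, trials=20000, seed=0):
    """Share of samples drawn under a true Ho that a two-tailed z test rejects."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        z = (sample_mean - mu0) / (sigma / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

for alpha in (0.01, 0.05, 0.10):
    rate = type_i_error_rate(alpha)
    print(f"alpha = {alpha:.2f}: rejected a true Ho in {rate:.3f} of trials")
```

The rejection rate should track alpha closely, which is the sense in which alpha is the probability of falsely rejecting a true Ho.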

Type I and Type II Errors


State of the World | Ho Accepted | Ho Rejected
If Ho is true | Correct decision, Pr = $1 - \alpha$ | Type I error, Pr = $\alpha$
If Ho is false | Type II error, Probability = $\beta$ | Correct decision, Probability = $1 - \beta$ = power of the test

Type I and Type II Errors


To give you an analogy, in a court of law
we assume people are innocent (the
null hypothesis) until proven guilty.
A Type I error would be finding an
innocent man guilty.
A Type II error would be letting a guilty
man go free.
Which is worse?

Type I and Type II Errors


By decreasing our error or alpha level, we will
increase the chance of a Type II error
(accepting the null when it is really false)
because we make the criteria for rejection
more stringent.
The only way that the $\alpha$ error can be reduced without
increasing the probability of a Type II error is
by increasing the sample size.
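The trade-off can be illustrated numerically. The sketch below computes the Type II error probability for a lower-tailed z test under assumptions that are ours, not the slides': the population standard deviation is treated as known (3.62), the null value is 15, and the true mean is hypothetically 13.

```python
import math
from statistics import NormalDist

def type_ii_error(alpha, n, mu0=15.0, mu_true=13.0, sigma=3.62):
    """Beta for a lower-tailed z test of Ho: mu = mu0 when the true mean is mu_true."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # Distance between the null and true means, in standard-error units.
    shift = (mu0 - mu_true) * math.sqrt(n) / sigma
    # Probability that the z statistic stays above -z_crit, so Ho is not rejected.
    return NormalDist().cdf(z_crit - shift)

for n in (10, 40):
    for alpha in (0.05, 0.01):
        beta = type_ii_error(alpha, n)
        print(f"n = {n:2d}, alpha = {alpha:.2f}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

At a fixed sample size, lowering alpha from .05 to .01 raises beta. Raising n from 10 to 40 brings beta back down even at the stricter alpha.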
