
Hypothesis Testing in Stata

The purpose of this handout is to show how to perform hypothesis testing in Stata for:

One Sample Means
One Sample Proportion
Two Sample Means
Two Sample Proportions

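Note that there are usually several ways to perform each test in Stata. The commands differ based on the
information you have to work with: summary statistics or the raw data itself. The immediate commands,
whose names end in i (ttesti, prtesti), take summary statistics typed on the command line; ttest and
prtest work on variables in the dataset.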

1 Sample Mean
The thermostat in your classroom is set at 72°F, but you think the thermostat isn't working well. On seven
randomly selected days, you measure the temperature at your seat. Your measurements (in degrees
Fahrenheit) are 71, 73, 69, 68, 69, 70, and 71. Test whether the mean temperature at your seat is different
from 72°F.

Suppose the data is loaded into Stata under the variable name temp.

1 Sample Mean - Hypothesis Test with Summary Statistics

We want to test Ho: μ = 72 vs. Ha: μ ≠ 72

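The Stata command is ttesti n x̄ s μo, where n is the sample size, x̄ the sample mean, s the sample
standard deviation, and μo the hypothesized mean.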

Since the p-value = .0257 < .05, we reject the null hypothesis and conclude that the average temperature
is not 72 degrees.
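
As a sketch of how the general form is filled in for the thermostat data: the seven measurements give n = 7, a sample mean of about 70.14, and a sample standard deviation of about 1.68. The two-decimal rounding is an assumption made here for illustration; the handout does not state the precision that was originally used.

```stata
* One-sample t test from summary statistics (immediate form).
* Arguments: n = 7, sample mean = 70.14, sample SD = 1.68 (both rounded,
* an assumption of this sketch), hypothesized mean = 72.
ttesti 7 70.14 1.68 72
```

Stata reports p-values for all three alternatives; the two-sided one, labeled Ha: mean != 72, is the one used here.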

1 Sample Mean - Hypothesis Test with Data in a Column

We want to test Ho: μ = 72 vs. Ha: μ ≠ 72

The Stata command is ttest varname = μo

Since the p-value = .0257 < .05, we reject the null hypothesis and conclude that the average temperature
is not 72 degrees.
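
If the measurements are not yet in Stata, they can be typed in directly. A minimal sketch, using the variable name temp from above:

```stata
* Enter the seven temperature measurements into a variable named temp.
clear
input temp
71
73
69
68
69
70
71
end

* Sample size, mean, and standard deviation (the inputs ttesti needs).
summarize temp

* One-sample t test of Ho: mean = 72 using the raw data.
ttest temp == 72
```

The double equals sign is the form shown in Stata's documentation for ttest.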

Note that the p-value from this output may differ slightly from the ttesti output; this is because I
rounded the sample statistics when entering them in the ttesti command above.

1 Sample Proportion

A survey of 300 gun owners is taken and 120 (40%) of those surveyed say they have a gun for protection
of self/family. Use these results to test Ho: p = 32% vs. Ha: p > 32%, where p is the percent of all gun
owners who say they have a gun for protection of self/family.

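1 Sample Proportion - Hypothesis Test with Counts or Summary Statistics

The Stata command is

prtesti n n1 p0, count
or
prtesti n p p0

where n is the sample size, n1 the number of successes, p the sample proportion, and p0 the hypothesized
proportion. Proportions are entered as decimals (0.32, not 32).

The output is exactly the same; the only difference is what information is given to the prtesti command:
either the counts or the summary statistics.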

For the alternative of interest (Ha: p > 0.32), the p-value is 0.0015 < 0.05, so we reject the null hypothesis.
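
For the gun-owner survey, the two forms above can be filled in as follows. The display lines are an added check, not part of the original handout: they recompute the z statistic and the one-sided p-value by hand using Stata's normal() function, the standard normal cumulative distribution.

```stata
* One-sample proportion test from counts: n = 300, successes = 120, p0 = 0.32.
prtesti 300 120 0.32, count

* Same test from the sample proportion: n = 300, p = 0.40, p0 = 0.32.
prtesti 300 0.40 0.32

* Hand check: z = (p - p0) / sqrt(p0*(1 - p0)/n), p-value = P(Z > z).
scalar z = (0.40 - 0.32) / sqrt(0.32*0.68/300)
display "z = " z
display "one-sided p-value = " 1 - normal(z)
```

The p-value for the upper-tail alternative is the one listed under the "greater than" alternative in the prtesti output.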

1 Sample Proportion - With Data in a Column

Some researchers believe that more than 80% of the population is right-handed. Let's test that claim using
our class survey data. We have a variable called righthanded, which is 1 if the person is right-handed and 0
otherwise. We want to test Ho: p = 0.80 vs. Ha: p > 0.80

The Stata command is

prtest varname = p0

Since the p-value for the alternative of interest (Ha: p > 0.80) is approximately 0, which is less than
0.05, we reject the null hypothesis.
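
With the class survey data loaded, the general form becomes the following. This is a sketch that assumes righthanded is coded 0/1 as described above.

```stata
* Frequency table of the 0/1 indicator, to confirm the coding.
tabulate righthanded

* One-sample proportion test of Ho: p = 0.80 using the raw data.
prtest righthanded == 0.80
```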

2 Sample Means

From class survey data, we have the amounts men and women pay for haircuts. We wish to test whether
the mean haircut prices are the same, versus the alternative that they are different. That is, we wish to test
Ho: μmen = μwomen vs. Ha: μmen ≠ μwomen

2 Sample Means with data in two columns

Suppose we have data in two columns as follows:

The Stata command is

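ttest varname1 = varname2, unpaired unequal

The unpaired option tells Stata that the two columns are independent samples rather than matched pairs;
unequal requests the test that does not assume equal variances in the two groups.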

Since the p-value for the alternative of interest equals 0.0005, which is less than 0.05, we reject the null
hypothesis.

2 Sample Means with summary data

The Stata command is

ttesti n1 x1 s1 n2 x2 s2, unequal

where n, x, and s are each group's sample size, sample mean, and sample standard deviation.

The output is not exactly the same as in the two-column test above, because the mean and standard
deviation values were rounded when entered.
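
One way to avoid the rounding difference is to pass the unrounded statistics that summarize leaves behind in r(N), r(mean), and r(sd). A sketch, assuming the two columns are named men_price and women_price (hypothetical names; the handout does not give them):

```stata
* Store unrounded summary statistics for each column.
* men_price and women_price are hypothetical variable names.
quietly summarize men_price
local n1 = r(N)
local m1 = r(mean)
local s1 = r(sd)

quietly summarize women_price
local n2 = r(N)
local m2 = r(mean)
local s2 = r(sd)

* Two-sample t test from summary statistics, unequal variances.
ttesti `n1' `m1' `s1' `n2' `m2' `s2', unequal
```

With unrounded inputs, the result should agree with the two-column ttest output up to display precision.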

2 Sample Means with the by command

Suppose your data is stacked and has a corresponding grouping variable as follows (females are coded 1,
men are coded 0):

The Stata command is

ttest varname, by(group_variable) unequal

We might want to test whether men and women spend the same amount on their haircuts. Based on
the two-sided p-value of 0.0005, we reject this null hypothesis.
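
A sketch for stacked data, assuming the haircut price is stored in a variable named price and the grouping variable is named female (both names are hypothetical):

```stata
* price and female (1 = female, 0 = male) are hypothetical variable names.
* Summary statistics by group, as a quick look before testing.
tabstat price, by(female) statistics(n mean sd)

* Two-sample t test of equal means, not assuming equal variances.
ttest price, by(female) unequal
```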

2 Sample Proportions

2 Sample Proportions with Summary Data

Ticketmaster is evaluating two suppliers of a scanning system it is considering purchasing. Both scanners
are designed to detect forged tickets for sporting events. The company is interested in determining
whether there is a difference in the proportion of forged tickets detected by the two suppliers. Two
hundred known forged tickets will be randomly selected and scanned by systems from each supplier. For
supplier one, 186 forgeries are detected, and for supplier two, 168 are detected.

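The Stata command is prtesti n1 obs1 n2 obs2, count

where n1 and n2 are the numbers of tickets scanned by each system and obs1 and obs2 are the numbers of
forgeries each system detected.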

For the two-sided alternative that the proportions of forged tickets detected differ between the two
suppliers, the p-value is 0.0048 < 0.05, so we reject the null hypothesis of equality and conclude that the
proportion of fake tickets detected is different for the two suppliers.
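
For the scanner data, the command can be filled in as below. The display lines are an added hand check, not part of the original handout; they assume the usual pooled-proportion z statistic for testing equality of two proportions.

```stata
* Two-sample proportion test from counts:
* supplier one: 186 of 200 detected; supplier two: 168 of 200 detected.
prtesti 200 186 200 168, count

* Hand check using the pooled proportion under Ho: p1 = p2.
scalar p1 = 186/200
scalar p2 = 168/200
scalar pooled = (186 + 168)/(200 + 200)
scalar z = (p1 - p2) / sqrt(pooled*(1 - pooled)*(1/200 + 1/200))
display "z = " z
display "two-sided p-value = " 2*(1 - normal(abs(z)))
```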

2 Sample Proportions with the by command

Suppose your data is stacked and has a corresponding grouping variable as follows:

The Stata command is

prtest varname, by(group_variable)

We might want to test whether the same percentage of males and females are right-handed. Based on the
two-sided p-value of 0.8550, we fail to reject this hypothesis.
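
A sketch of this test on the class survey data, assuming the grouping variable is named female (a hypothetical name; the handout does not give it):

```stata
* righthanded is the 0/1 indicator from above; female (1 = female,
* 0 = male) is a hypothetical name for the grouping variable.
prtest righthanded, by(female)
```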
