
Introduction to Statistics

Math 58 Fall 2007


Jo Hardin
Exam 2
1) Short Answer Questions
(a) (4 pts) What is wrong with the null hypothesis statement H0: X-bar2 = X-bar1?
Hypotheses should always be about population parameters (here, the population means μ1 and μ2), not about sample statistics such as X-bar1 and X-bar2.
(b) (4 pts) Suppose a researcher indicates that the p-value he calculated was not
statistically significant but suggestive. What might this mean?
The p-value was probably between 0.05 and 0.1.
(c) (4 pts) A poll conducted March 6-8, 2004 by the Wall Street Journal/NBC News asked
a random sample of 1018 respondents their opinions about gay marriage. When asked to
state whether they would favor or oppose a constitutional amendment making it illegal
for gay couples to marry, 43% responded in favor of the amendment and 52% opposed
(5% were unsure). When these same respondents were asked whether they would favor
or oppose a constitutional amendment that defined marriage as a union between a man
and a woman and made same-sex marriages unconstitutional, 54% favored the
amendment, 42% opposed (1% said it depends, and 3% were not sure).
A friend suggests carrying out a two-sample z-test of proportions to compare these
results. What is the primary reason why this would not be an appropriate analysis of
these data?
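The samples aren't independent: both percentages come from the same 1018 respondents, so the two proportions are not from independent samples.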
(d) (4 pts) Suppose the following statement is made in a statistical summary: "A
comparison of breathing capabilities of individuals in households with low nitrogen
dioxide levels and individuals in households with high nitrogen dioxide levels indicated
that there is no difference in the means (two-sided p-value = .24)." What is wrong with
this statement?
A large p-value leads to the decision "not enough evidence to reject the null
hypothesis." A large p-value does *not* say that the null hypothesis is true. The quote
above says there is no difference in means, which implies that the null hypothesis is
true.
(e) (3 pts) True or False? A histogram of sample data will have a normal distribution if
the sample size is large enough. Briefly explain.
False. Many variables would not be normally distributed even if you sampled the entire
population (income, colors of M&Ms, ...); a larger sample only makes the histogram look
more like the population's distribution. What we learned is that the sample mean and
sample proportion are approximately normally distributed if the sample is large enough.
(f) (4 pts) In a study of the effects of marijuana use during pregnancy, measurements on
babies of mothers who used marijuana during pregnancy were compared to
measurements on babies of mothers who did not (New England Journal of Medicine). A
95% confidence interval for the difference in mean head circumference (nonuse minus
use) was .61 to 1.19 cm. What can be said from this statement about a p-value for testing
the hypothesis that the mean difference is zero?
We can reject the null hypothesis that the difference in population mean head circumference is zero at the 0.05 level (two-sided p-value < 0.05), because zero is not in the 95% interval.
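The link between the interval and the test can be checked numerically. The sketch below is only an approximation: it assumes the published interval has the form estimate ± 1.96 × SE (the article's actual method is not given here), and the helper name `approx_p_from_ci` is hypothetical.

```python
from statistics import NormalDist

def approx_p_from_ci(lower, upper, conf=0.95):
    """Approximate two-sided p-value for H0: difference = 0.

    Assumes (hypothetically) that the CI was built as estimate +/- z* x SE.
    """
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    estimate = (lower + upper) / 2                     # interval midpoint
    se = (upper - lower) / (2 * z_star)                # half-width divided by z*
    z = estimate / se                                  # statistic for H0: diff = 0
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Head-circumference interval (nonuse minus use), in cm
p = approx_p_from_ci(0.61, 1.19)
print(f"approximate two-sided p-value: {p:.1e}")  # far below 0.05
```

Any 95% interval that excludes zero corresponds to a two-sided p-value below 0.05; the exact size of p depends on the assumed form of the interval.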
(g) (2 pts) True or False? A statistical computer package will only print out a p-value or
confidence interval if the conditions for its validity are met. No explanation necessary.
False. The computer program will spit out results regardless of the appropriateness of the
testing procedure.
2) On a recent trip, I wandered through a parking garage in Las Vegas. I noticed that
there seemed to be a large number of California license plates. So I randomly selected a
subset of cars in the garage (with over 350 cars) and counted that 9 of the 15 cars had CA
plates.
(a) (8 pts) Use these data to determine a 95% confidence interval for the proportion of all
cars in this garage that have CA license plates. Report and interpret the interval.
n p-hat = 9 < 10 and n(1 - p-hat) = 6 < 10: our sample isn't quite large enough to use the CLT (and the normal-distribution multiplier), but we'll calculate the CI anyway, with the caveat that it might be a bit off.
0.6 ± 1.96 * sqrt( 0.6 * 0.4 / 15) = 0.6 ± 0.248 = (0.352, 0.848)
We are 95% confident that the true proportion of cars in the garage with CA plates is
between 0.352 and 0.848.
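The interval arithmetic can be reproduced in a few lines of Python. This is a minimal sketch of the large-sample (Wald) interval used above; `wald_ci` is a hypothetical helper name, and the normal multiplier comes from `statistics.NormalDist` rather than a table.

```python
import math
from statistics import NormalDist

def wald_ci(successes, n, conf=0.95):
    """Large-sample (Wald) confidence interval for a population proportion."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96 for 95%
    se = math.sqrt(p_hat * (1 - p_hat) / n)             # estimated standard error
    return p_hat - z_star * se, p_hat + z_star * se

# Check the normal-approximation condition first: both counts should be at least 10.
successes, n = 9, 15
print("n*p_hat =", successes, " n*(1 - p_hat) =", n - successes)  # 9 and 6, both < 10

low, high = wald_ci(successes, n)
print(f"95% CI: ({low:.3f}, {high:.3f})")  # about (0.352, 0.848)
```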
(b) (4 pts) Indicate whether the width of this interval would change (larger, smaller, or no
change) if: (no explanations needed)
- the confidence level was increased to 99%? larger
- the sample proportion remained the same but the sample size was doubled? smaller
- the sample proportion remained the same but the population of interest was the
proportion of cars in all Las Vegas garages? no change
- the sample proportion was larger? smaller (because 0.6 is already above 0.5, a larger sample proportion makes p-hat(1 - p-hat) smaller)
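A quick way to see these answers is to evaluate the width formula 2 z* sqrt(p-hat(1 - p-hat)/n) under each change. In this sketch, `ci_width` is a hypothetical helper, p-hat = 0.7 stands in for "a larger sample proportion," and the formula assumes the population is large relative to the sample (no finite-population correction).

```python
import math
from statistics import NormalDist

def ci_width(p_hat, n, conf=0.95):
    """Full width of the Wald interval: 2 * z* * sqrt(p_hat (1 - p_hat) / n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return 2 * z_star * math.sqrt(p_hat * (1 - p_hat) / n)

base = ci_width(0.6, 15)
print(f"baseline width: {base:.3f}")
print(f"99% confidence: {ci_width(0.6, 15, conf=0.99):.3f}")  # larger
print(f"sample size 30: {ci_width(0.6, 30):.3f}")             # smaller
print(f"p_hat = 0.7:    {ci_width(0.7, 15):.3f}")             # smaller
# The population size never enters the formula, so widening the population
# to all Las Vegas garages does not change the width.
```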
(c) (4 pts) Explain what is meant by the phrase "95% confident" in part (a) without
using the words "confident" or "sure" (or other synonym) in your explanation.
If we took many samples of size 15 from the cars in the garage and each
time calculated a 95% confidence interval for the true proportion of cars in the
garage with CA plates, we would expect about 95% of those intervals to contain the
true proportion.
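The repeated-sampling interpretation can be illustrated with a small simulation. This is a sketch under stated assumptions: the true proportion is taken to be a hypothetical 0.6, each car is treated as an independent draw (ignoring that the garage holds a finite number of cars), and `wald_covers` is a hypothetical helper. With n = 15 the printed fraction tends to come out somewhat below 0.95, which echoes the small-sample caveat in part (a).

```python
import math
import random
from statistics import NormalDist

def wald_covers(p_true, n, rng, conf=0.95):
    """Draw one sample of size n, build a Wald CI, report whether it contains p_true."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    successes = sum(rng.random() < p_true for _ in range(n))
    p_hat = successes / n
    me = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - me <= p_true <= p_hat + me

rng = random.Random(58)              # fixed seed so the run is reproducible
P_TRUE, N, REPS = 0.6, 15, 20_000    # P_TRUE is a hypothetical "true" proportion
hits = sum(wald_covers(P_TRUE, N, rng) for _ in range(REPS))
coverage = hits / REPS
print(f"fraction of intervals containing p: {coverage:.3f}")
```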
3) Eleven-month-old Frannie weighs 19 pounds. The national average for the weight of
an eleven-month-old baby is 20.25 pounds, with a standard deviation of 2 pounds.
(a) (4 pts) Determine the z-score for Frannie's weight, and write a sentence interpreting
the z-score.
Z = (19-20.25) / 2 = -0.625
Frannie's weight is 0.625 standard deviations below the national average.
(b) (4 pts) Determine what proportion of eleven-month-olds in the US weigh more than
Frannie. Be sure to state any assumptions that you make in order to do this calculation.
If we assume that weights are normally distributed, we can find the probability that a
randomly selected baby weighs more than Frannie:
P( X > 19 ) = P( Z > -0.625) ≈ 0.734 (the table value at z = -0.63 gives 0.7357)
where X is a randomly selected baby's weight and Z is the corresponding z-score.
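Both calculations can be checked with Python's standard library. This sketch assumes, as the answer does, that weights of eleven-month-olds are approximately normal; `statistics.NormalDist` replaces the printed normal table, so the result differs slightly from the table value.

```python
from statistics import NormalDist

mean_11mo, sd_11mo = 20.25, 2.0   # national mean and SD for 11-month-olds (lbs)
frannie = 19.0

z = (frannie - mean_11mo) / sd_11mo
print(f"z = {z:.3f}")   # -0.625

# Assumes weights are approximately normal, as stated in part (b).
weights = NormalDist(mu=mean_11mo, sigma=sd_11mo)
prop_heavier = 1 - weights.cdf(frannie)
print(f"P(X > 19) = {prop_heavier:.4f}")  # about 0.734
```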
(*c) (4 pts) For an eighteen-month-old, the national average weight is 24.25 pounds, with
a standard deviation of 2.5 pounds. Determine what Frannie's weight will have to be at
the age of eighteen months, in order for her to be among the middle 50% of eighteen-month-old babies.
Again, assuming normality and using the normal table:
P( -0.67 < Z < 0.67) = P(-0.67 < (X-24.25) / 2.5 < 0.67) ≈ 0.5
Solving for X gives 24.25 ± 0.67 * 2.5 = 24.25 ± 1.675. That is, the middle 50% of weights are between 22.575 lbs and 25.925 lbs, so Frannie would need to weigh between about 22.6 and 25.9 lbs.
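The same normality assumption lets us read the middle 50% directly off the quartiles. This sketch uses `statistics.NormalDist.inv_cdf`; because it uses z ≈ 0.674 instead of the table's 0.67, the endpoints differ from the hand calculation in the second decimal place.

```python
from statistics import NormalDist

weights_18mo = NormalDist(mu=24.25, sigma=2.5)   # assumes approximately normal weights
q1 = weights_18mo.inv_cdf(0.25)   # 25th percentile
q3 = weights_18mo.inv_cdf(0.75)   # 75th percentile
print(f"middle 50%: {q1:.2f} lbs to {q3:.2f} lbs")
```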

4) Nenana is a small, interior Alaskan town that holds a famous competition to predict the
exact moment that spring arrives each year. The arrival of spring is defined to be the
moment when the ice on Tanana River breaks, which is measured by a tripod erected on
the ice with a trigger to an official clock. The minute at which the ice breaks has been
recorded in every year since 1917. Treating these data (through 2006) as a random
sample from the process by which nature produces the ice-breaking each year, software
reports the following 95% one-sample t confidence interval based on the date of ice
break (recorded in number of days since April 1).
One-Sample T: date
Variable   N    Mean      StDev    95% CI
date       90   34.4778   5.9685   (33.2277, 35.7279)

(a) (3 pts) Calculate the margin of error used in this analysis without using the actual
confidence interval. Show your work.
SE = s / sqrt(n) = 5.9685 / sqrt(90) = 0.629
t* (df = 89, 95% confidence) ≈ 1.99
margin of error = (t*)(SE) = (1.99)(0.629) ≈ 1.25
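The margin-of-error calculation can be reproduced as follows. Python's standard library has no t-distribution quantile function, so this sketch hard-codes t* ≈ 1.987 for 89 degrees of freedom from a t table (with SciPy installed, `scipy.stats.t.ppf(0.975, 89)` would compute it).

```python
import math

n, mean, sd = 90, 34.4778, 5.9685   # values from the software output
t_star = 1.987   # t critical value for df = 89, 95% confidence (from a t table)

se = sd / math.sqrt(n)
margin = t_star * se
print(f"SE = {se:.4f}, margin of error = {margin:.4f}")
print(f"CI = ({mean - margin:.4f}, {mean + margin:.4f})")  # matches the output to rounding
```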
(b) (2 pts) Would the standard error increase, decrease, or stay the same if all else
stayed the same but we (no explanation needed):
(i) increased the confidence level? stay the same
(ii) increased the sample size? decrease
(c) (2 pts) Convert this interval from the date scale back to the calendar year. Is this
year's ice-break date (April 27, 2007) in the confidence interval?
The interval goes from about the 4th of May to the 6th of May. April 27th is not in the
interval.
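The conversion from "days since April 1" to calendar dates can be sketched with the `datetime` module. The sketch assumes April 1 itself is day 0, which matches the May 4 to May 6 reading above; adding a `timedelta` to a `date` keeps only the whole days.

```python
from datetime import date, timedelta

# Assumption: "days since April 1" counts April 1 itself as day 0.
april_1 = date(2007, 4, 1)
for days in (33.2277, 34.4778, 35.7279):   # lower bound, mean, upper bound
    print(days, "->", april_1 + timedelta(days=days))

ice_break_2007 = date(2007, 4, 27)
lower = april_1 + timedelta(days=33.2277)
upper = april_1 + timedelta(days=35.7279)
print("April 27 in the interval?", lower <= ice_break_2007 <= upper)
```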
(d) (3 pts) Specify the primary reason why you would not be surprised if this year's ice-break date was not in this interval. (Hint: the answer doesn't have anything to do with
global warming.)
The interval estimates the population mean ice-break date, not the date in any individual year. We don't expect
individual observations to fall in a confidence interval for the population mean; individual years vary far more (SD ≈ 6 days) than the sample mean does (SE ≈ 0.6 days).
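A rough numerical comparison shows why. This sketch assumes ice-break dates are roughly normal (so about 95% of individual years fall within 2 standard deviations of the mean) and counts April 1 as day 0; it is a rule of thumb, not a formal prediction interval.

```python
import math

n, mean, sd = 90, 34.4778, 5.9685
se = sd / math.sqrt(n)

# Rough rule of thumb, assuming ice-break dates are approximately normal:
# about 95% of individual years fall within 2 SD of the mean.
typical_low, typical_high = mean - 2 * sd, mean + 2 * sd
print(f"SE of the mean: {se:.2f} days; SD of individual years: {sd:.2f} days")
print(f"typical individual years: day {typical_low:.1f} to day {typical_high:.1f}")

april_27 = 26   # days since April 1, counting April 1 as day 0 (assumption)
print("April 27 typical for an individual year?", typical_low <= april_27 <= typical_high)
```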

85
80
65

70

75

co u rse g ra d e

90

95

5) Dr. Lock, a colleague of ours at another college, conducted a study in which he taught
two sections of introductory statistics. He used exclusively sports-related examples in
one section and used a typical variety of examples in the other section. Otherwise, he
tried to teach the courses as similarly as possible. These two sections had been advertised
as such in the course registration process, so students knew in advance which section they
were signing up for. At the end of the course, Dr. Lock compared the total points earned
(from a maximum of 400 points) among students in the two sections.

Section   Mean (x-bar)   St. Dev. (s)   Sample Size (n)
Sports    80.06          8.11           30
Regular   83.76          5.00           27

(a) (3 pts) Identify the observational units, the explanatory variable, and the response
variable.
Obs units: students
Expl Var: course (sports or regular)
Resp Var: score in course
(b) (4 pts) Examine visual displays and numerical summaries for comparing student
performance between the two sections. Write a few sentences describing what they
reveal about how the performances of students compare between the two sections.
The sports section seems to score slightly lower than the regular section (both in terms of
mean and median). However, the sports section also seems much more variable, both in
terms of high performing students and low performing students.

(c) (12 pts) Conduct a test of whether the two sections differed significantly. (For the
purposes of conducting this test, suppose that section was randomly allocated.) Include
all elements of the test, and define whatever symbols you introduce. Also summarize
your conclusion, being sure to comment on significance, causation and generalizability.
H0: μ1 = μ2
On average, students score the same in either section of introductory statistics.
Ha: μ1 ≠ μ2
On average, students do not score the same in the two sections of introductory statistics.
Here μ1 is the population mean score of students who take the sports section, and μ2 is the
population mean score of students who take the regular section.
t = (X-bar1 - X-bar2 - 0) / sqrt( s1^2 / n1 + s2^2 / n2) = (80.06 - 83.76) / sqrt(8.11^2 / 30 + 5.00^2 / 27) = -3.70 / 1.766 ≈ -2.10, with df = min(n1, n2) - 1 = 26
p-value = 2 * P( t_{26} > 2.10) < 2 * 0.025 = 0.05 (because the upper 2.5% point of t_{26} is 2.056)
The p-value is just under our specified cutoff of 0.05, so we reject the null hypothesis
that, on average, students score the same in the two sections. Students in the regular
section scored significantly higher on average.
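The test statistic and decision can be reproduced from the summary table. This sketch uses the unpooled standard error and the conservative df = min(n1, n2) - 1 from the answer above, with t* = 2.056 for df = 26 hard-coded from a t table; SciPy's `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` would instead use Welch's approximate degrees of freedom.

```python
import math

# Summary statistics from the table (1 = sports section, 2 = regular section)
x1, s1, n1 = 80.06, 8.11, 30
x2, s2, n2 = 83.76, 5.00, 27

se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # unpooled standard error
t = (x1 - x2 - 0) / se                    # null difference is 0
df = min(n1, n2) - 1                      # conservative df = 26
t_star = 2.056                            # two-sided 5% critical value, df = 26 (t table)

print(f"SE = {se:.3f}, t = {t:.2f}, df = {df}")
print("reject H0 at the 0.05 level?", abs(t) > t_star)
```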
If, indeed, we randomly assigned the section, we would expect other variables (study
skills, background, etc.) to balance out in the two sections. A significant difference in the
course, therefore, would be attributed to the examples. That is, random assignment
would allow us to conclude that sports examples cause lower scores, on average.
The students are almost certainly not a random sample of students at the college we are
studying (much less a random sample of students from all colleges in the country!). Therefore,
we are hesitant to generalize the results to any population larger than the students who will be
taking introductory courses from Dr. Lock.
(d) (3 pts) No longer assuming that the sections were randomly allocated, suppose that
the data reveal that students in the sports section did significantly worse than those in the
regular section. How would you briefly respond to someone who claims that this shows
that using sports examples is harmful to students learning?
Without random assignment to sections, the data come from an observational study.
Observational studies can have confounding variables, which prevent us from inferring
causation. Here, for example, athletes might be more likely to take the sports section and
also have very little time to study. They wouldn't perform as well in their courses because
they don't have the time to learn the material, not because of the sports examples.
