Final Examination
Directions
The exam will end 3 hours after it begins. The exam is divided into two parts.
The first part is multiple choice. Please answer the multiple choice questions on the exam
by circling the best answer (some rounding occurs in several places). The second part of
the exam consists of several problems. Please answer these problems in the space
provided on the exam (you may use the backs of the sheets if necessary). You will get
partial credit for these problems provided that your answers are organized and legible so
that your train of thought can be easily followed. All answers must also be transferred to
the answer sheet to be fully counted.
Good Luck
DON'T EVEN THINK ABOUT PANICKING
By Printing my name below I acknowledge that Harvard has an honor code and that I
will adhere to it. Failure to abide by the honor code could result in failing this course and
having to wash Professor Parzen's car with my toothbrush.
NAME: ______________________________________________ (-50 if not printed)
Multiple Choice (3 points each)
1) A hypothesis test is used to prevent a machine from under-filling or overfilling
quart bottles of beer. On the basis of a sample, the null hypothesis is rejected and
the machine is shut down for inspection. A thorough examination reveals there is
nothing wrong with the filling machine. From a statistical point of view:
a. A correct decision was made.
b. A Type I and Type II error were made.
c. A Type I error was made.
d. A Type II error was made.
2) The median waiting time for patients to see a doctor at a local emergency room is
much smaller than the mean waiting time. Which of the following is most
consistent with this information (circle one):
a. A histogram of the waiting times would be symmetric.
b. A histogram of the waiting times would be left-skewed.
c. A histogram of the waiting times would be right-skewed.
3) A student is studying very hard in a fluid dynamics course, but he knows he will
either pass or not pass. Suppose this student, Jack Daniels, has a probability of
0.90 for studying the night before the exam. Also, he has a probability of 0.75 for
passing the exam. If the probability of passing the exam, given that he studied the
night before, is 0.82, what may you conclude?
a. The probability of Jack not passing the exam is 0.10.
b. P(Jack studies OR Jack passes) is greater than 0.75.
c. P(Jack does not pass AND Jack studies) is greater than 0.5.
d. P(Jack passes AND Jack studies) is greater than 0.75.
e. None of the above
4) Suppose a computer processor yielded the following random sample of binary
digits:
0101001101010100010000010101010001011010
01010010010101010100110001011010
Is the computer processor yielding an even distribution of ones and zeros? If the
above sample contains 72 digits, of which thirty are ones, what is the value of
the test statistic to answer this question?
a. t = -1.41
b. t = -1.65
c. t = 1.65
d. t = 1.41
e. None of the above
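A quick numerical check of question 4, as a sketch only. It assumes the intended method is the one-sample test for a proportion with null value p0 = 0.5, and it uses the counts stated in the question (72 digits, 30 ones). The function name is illustrative and not part of the exam.

```python
import math

def proportion_test_statistic(successes, n, p0):
    """Test statistic (p_hat - p0) / sqrt(p0 * (1 - p0) / n) for one proportion."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error

# Counts as stated in the question: 30 ones among 72 digits.
stat = proportion_test_statistic(successes=30, n=72, p0=0.5)
print(round(stat, 2))  # -1.41
```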
5) If Steve and Doug Butabi want to find the proportion of people who believe they
can move their heads graciously, how large of a sample size is required so that the
margin of error is at most 2 percentage points with a 95% confidence level?
a. 3382
b. 4148
c. 34
d. 266
e. None of the above
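The sample-size arithmetic for question 5 can be sketched as follows. This assumes the conservative choice p = 0.5 (no prior estimate of the proportion is given) and the critical value z = 1.96 for 95% confidence; exact fractions are used so the rounding step is not affected by floating-point error. The function name is hypothetical.

```python
import math
from fractions import Fraction

def sample_size_for_proportion(margin, z, p=Fraction(1, 2)):
    """Smallest n satisfying z * sqrt(p * (1 - p) / n) <= margin."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Margin of error of 2 percentage points, 95% confidence (z = 1.96 assumed).
n_required = sample_size_for_proportion(margin=Fraction("0.02"), z=Fraction("1.96"))
print(n_required)  # 2401
```

Under these assumptions the required size is 2401, which is not one of the listed values in choices a to d.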
6) In the Land of Chocolate, three friends (Hershey, Nougat, and B.C.) all have to
make tough decisions about the next phase of their life where they have ONLY
two options: going to college or working in the nearby sugar mines of Dos
Catorce. Suppose the following probabilities are true: P(Hershey goes to college)
= 0.2 = P(Nougat works at the sugar mines), P(B.C. goes to college) = 0.7. If
each of them makes their choice independent of the others, what is the probability
of Hershey and Nougat going to college while B.C. works at the sugar mines?
a. 0.112
b. 0.048
c. 0.028
d. None of the above
7) The following diagram comes from a famous piece of music.
A random sample of 81 people indicated that 19 people knew the piece of music
from the diagram alone. A random sample of 200 people (independent from the
first) indicated that 175 people knew the piece of music when it was played on a
piano. Construct a 95% confidence interval for the population proportion of
people who know the piece of music from the diagram alone.
a) (.14,.33)
b) (.23,.56)
c) (.12,.23)
d) (.32,.44)
e) None of the above
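A sketch of the interval calculation for question 7, assuming the usual large-sample (Wald) interval for a proportion with z = 1.96. Only the first sample (19 of 81) is relevant to the proportion who know the piece from the diagram alone. The function name is illustrative.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Large-sample confidence interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# First sample only: 19 of 81 people recognized the piece from the diagram.
low, high = wald_interval(19, 81)
print(round(low, 2), round(high, 2))  # 0.14 0.33
```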
Below you are given the graphs of two normal density curves, both with the same mean.
Use these density curves to answer the following questions:
8) The area under each of these curves is equal to 1.
a) True
b) False
9) Which curve has the larger standard deviation?
a) Graph A
b) Graph B
10) Which distribution has a smaller percent of its data between 35 and 40 units?
a) Graph A
b) Graph B
11) Give a rough estimate of the standard deviation of the density curve in Graph B
a) 5
b) 10
c) 15
d) 20
e) None of the above
12) An instructor gives the same y versus x data as given below to four students.
They each come up with four different answers for the straight line regression
model. Only one is correct. The correct model is
a. y = 60x - 1200
b. y = 30x - 200
c. y = 139.43 + 29.684x
d. y = 1+ 22.782x
13) A scientist finds that regressing the y versus x data given below results in the
coefficient of determination for the straight-line regression model to be one.
The missing value for y at x = 17 most nearly is
a. -2.444
b. 2.000
c. 6.889
d. 34.00
14) Suppose Z is a standard normal random variable. Then what is the probability that
X=1+2Z will be less than 3?
a) .1587
b) .3413
c) .8413
d) .0013
e) None of the Above.
15) Suppose that X is a binomial random variable with n=3, p=.22. What is the
probability that X will take the value 2?
a) .886744
b) .113256
c) .037752
d) .962248
e) None of the Above.
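The binomial probability in question 15 can be checked directly from the probability mass function. This is a minimal sketch; the helper name is not from the exam.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

prob_two = binomial_pmf(k=2, n=3, p=0.22)
print(round(prob_two, 6))  # 0.113256
```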
16) Consider a game where you win $9 with probability .1 and lose $1 with
probability .9. What is your expected profit for this game?
a) $1
b) $1
c) 0
d) $2
e) None of the Above.
17) Suppose that X is a random variable taking the value 5 with probability .4, and
taking the value 5 with probability .6. What is the standard deviation of X?
a) 10
b) 4
c) 24
d) 24
e) None of the Above.
18) Suppose that X is a random variable with a binomial distribution. If the expected
value of X is 50 and the variance of X is 25 then the distribution of X must be:
a) Symmetric
b) Skewed Right
c) Skewed Left
d) It cannot be determined from the given information.
19) If two random variables X and Y have a negative covariance, then:
a) high values of x tend to be associated with high values of y and low values
of x tend to be associated with low values of y.
b) high values of x tend to be associated with low values of y and low
values of x tend to go with high values of y.
c) negative values of x tend to go with negative y values, and vice versa.
d) the expected value of x times y is less than zero.
20) A management-consultant firm uses a regression model where X1 stands for
previous experience, X2 for number of years at current job, and X3 for score on a
job-aptitude test. These variables are used in a regression model to predict job
satisfaction. Job satisfaction ranges from 1 to 20, with 20 indicating that an
employee is satisfied with every aspect of his or her job. The prediction equation
is Yhat = 1.7 - 0.15 X1 + 0.25 X2 + 0.14 X3. What would the consulting firm
predict for the job satisfaction of an employee who has 15 years of prior
experience, 10 years of employment at the present job, and an aptitude test score
of 85?
a. 14.83
b. 13.85
c. 17.79
d. 15.12
e. None of the above
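A sketch of the prediction in question 20. It assumes the coefficient on X1 is negative, that is, Yhat = 1.7 - 0.15 X1 + 0.25 X2 + 0.14 X3; the sign appears to have been lost in the source text, and a minus sign is the reading that reproduces one of the listed answers. The function name is hypothetical.

```python
def predict_job_satisfaction(x1, x2, x3):
    """Assumed prediction equation: Yhat = 1.7 - 0.15*X1 + 0.25*X2 + 0.14*X3."""
    return 1.7 - 0.15 * x1 + 0.25 * x2 + 0.14 * x3

# 15 years of prior experience, 10 years at the present job, aptitude score 85.
y_hat = predict_job_satisfaction(x1=15, x2=10, x3=85)
print(round(y_hat, 2))  # 13.85
```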
21) The average cost of tuition, room and board at small private liberal arts colleges is
reported to be $8,500 per term, but a financial administrator believes that the
average cost is higher. A study was conducted using a sample of 150 small liberal
arts colleges. The computer output below was obtained. Let α = 0.05.
Hypothesis test results:
μ : population mean
H0 : μ = 8500
HA : μ > 8500
Based on the output, the conclusion should be
a. the true average cost is higher than $8,500.
b. the true average cost is lower than $8,500.
c. the true average cost is equal to $8,500.
d. the true average cost is equal to $8,708.90.
22) In developing a confidence interval for a population mean, a sample size of 40
observations was used. The CI was 17.25 ± 2.42. Had the sample size been 160
instead of 40, the CI would have been
a) 17.25 ± 1.68
b) 17.25 ± 1.21
c) 69.00 ± 9.68
d) 17.25 ± 9.68
23) After fitting a regression model, if the sum of the residuals equals 0 ($\sum_{i=1}^{n} e_i = 0$),
(a) The model is fitting well.
(b) We have reason to doubt the normality assumption.
(c) It tells us nothing.
(d) The slope parameter must be zero.
Computer output for question 21:
Mean   Sample Mean   Std. Err.   DF    T-Stat      P-value
μ      8708.9        96.36292    149   2.1678462   0.0159
24) If the coefficient of determination R² = 100%, then
(a) none of the variability in the observations is explained by the model fit.
(b) all observations fall on the fitted line exactly.
(c) the model is not true.
(d) a quadratic model would fit the data better.
25) In testing the hypothesis Ho : μ = 75 vs Ha : μ ≠ 75, the following information is
known: n = 64, x̄ = 72, and s = 10. The computed test statistic is equal to
a) 1.96
b) 2.4
c) -2.4
d) -1.96
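The test statistic in question 25 follows from the one-sample formula t = (x̄ - μ0) / (s / √n). A minimal sketch, with an illustrative function name:

```python
import math

def one_sample_t(sample_mean, mu0, s, n):
    """One-sample t statistic: (sample_mean - mu0) / (s / sqrt(n))."""
    return (sample_mean - mu0) / (s / math.sqrt(n))

t_stat = one_sample_t(sample_mean=72, mu0=75, s=10, n=64)
print(round(t_stat, 2))  # -2.4
```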
26) Suppose the 95% confidence interval for the true population proportion p is (0.36,
0.54). Based on this confidence interval alone, in which of the following set(s) of
hypotheses would the null hypothesis be rejected (at the 0.05 significance level)?
a) Ho : p = 0.3 versus Ha : p ≠ 0.3
b) Ho : p = 0.4 versus Ha : p ≠ 0.4
c) Ho : p = 0.5 versus Ha : p ≠ 0.5
d) All of the above.
27) Your boss asks you to calculate a 99% confidence interval instead of a 90%
confidence interval. What is an advantage and a disadvantage of this action?
a) The advantage is higher confidence. The disadvantage is a wider interval.
b) The advantage is higher confidence. The disadvantage is a narrower interval.
c) The advantage is lower confidence. The disadvantage is a wider interval.
d) The advantage is lower confidence. The disadvantage is a narrower interval.
Based on a random sample of 1000 high school students, 280 of them said they are
current smokers. The 90% confidence interval for the true proportion of all high school
students that are current smokers is (0.26, 0.30).
28) Does the sample proportion lie in the interval (0.26, 0.30)?
a) Yes
b) No
c) Can't tell
29) Does the population proportion lie in the interval (0.26, 0.30)?
a) Yes
b) No
c) Can't tell
30) If we use a 95% confidence level instead of a 90% confidence level, will the
confidence interval calculation from the same data produce an interval narrower
than (0.26, 0.30)?
a) Yes
b) No
c) Can't tell
31) Will the sample proportion for a future sample of 1000 high school students lie in
the interval (0.26,0.30)?
a) Yes
b) No
c) Can't tell
The scatterplot below displays information for 50 states for the year 2000 with regard to
the variables: M.D.s per 100,000 which represents the number of doctors per 100,000
residents and Percent Poverty, the percentage of the population considered to be living in
poverty. The R-sq value is 5.6% and the least squares regression line is
M.D.s per 100,000 = 279.3 - 4.175 (Percent Poverty)
32) Which of these options better interprets the value of the slope?
a) For each additional percent in poverty the estimated number of M.D.s per
100,000 goes down by 4.175 on the average.
b) For each additional percent in poverty the estimated number of M.D.s per
100,000 goes up by 4.175 on the average.
c) For each additional M.D. the estimated percent in poverty goes down by 4.175%
on the average.
d) For each additional M.D. the estimated percent in poverty goes up by 4.175% on
the average.
e) For every 1 M.D. the percent in poverty goes down by 4.175%.
33) In the year 2000 the percent in poverty for Tennessee was 13.4. According to the
model (or regression equation), how many doctors would we have expected per
100,000 people?
a) About 279
b) About 275
c) About 223
d) About 335
e) About 250
34) Which of these statements is the best interpretation of R-sq in this example?
a) 5.6% of the people living in poverty have enough M.D.s
b) Only 5.6% of the variability in the number of M.D.s per 100,000 is
explained by the percent of the population living in poverty.
c) Only 5.6% of the M.D.s live in poverty.
d) Only 5.6% of the people living in poverty have no M.D.s
35) In the year 2000 the District of Columbia had 23.5% in poverty and 702 M.D.s
per 100,000. If this data point was added to the scatterplot, it would be
a) a residual.
b) negatively correlated with the data.
c) an outlier and influential observation.
d) a weak influence on the least-squares regression line.
e) a lurking variable.
A person's muscle mass is expected to decrease with age. To explore this relationship in
women, a nutritionist randomly selected 15 women from each 10-year age group,
beginning with age 40 and ending with age 79. The observations and least-squares
regression line appear in the scatterplot and the R-sq value is 75%.
36) Which of the following statements is the most accurate?
(A) For each additional year of age the estimated mean muscle mass increases and
decreases.
(B) The relationship between age and muscle mass is weak because the
correlation is negative. Higher muscle mass goes with both lower and higher age.
(C) The scatterplot shows a negative direction, with higher muscle mass
going with lower age. The plot is generally straight with a moderate amount
of scatter.
(D) The relationship between age and muscle mass is weak because R-sq=75% is
a small number compared to the intercept of 156.35.
(E) The correlation between age and muscle mass turns out to be -0.866. This is
an indication that age is causing muscle mass to decrease with time.
37) Which is the most appropriate statement regarding the interpretation of the
intercept?
(A) For each additional year of age the estimated mean muscle mass decreases by
approximately 1.19 MMIs.
(B) The average muscle mass is 156.35 MMI for women at age 0.
(C) The minimum muscle mass is 156.35 MMI.
(D) For each additional year of age muscle mass decreases by approximately
156.35 MMIs.
(E) We cannot interpret the intercept here since it does not make sense that a
newborn female child would have a muscle mass index of 156.35.
38) The following probability density curve represents waiting times at a customer
service counter at a national department store. The mean waiting time is 5 minutes
with standard deviation 5 minutes. If we took all possible samples of size n=100,
how would you describe the sampling distribution of the resulting sample means?
(A) Shape = right skewed, mean = 5, standard deviation = 5
(B) Shape = same as above graph, mean = 5, standard deviation = 0.5
(C) Shape = approximately normal, mean = 5, standard deviation = 0.5
(D) Shape = approximately normal, mean = 5, standard deviation = 5
(E) Shape = binomial, n =100; p = .05
Hy-Vee Inc. collected data to measure the impact of television advertising on the price
which customers expect to pay for a deluxe pre-packaged dinner sold in Hy-Vee
grocery stores. For each local TV market, Hy-Vee determined two marketing inputs:
x1 = Number of one-week TV promotions
x2 = Advertised discount (in percent) for price of the dinner
In particular, Hy-Vee used x1 = 1, 3, 5, and 7 promotions in combination with x2 = 10%,
20%, 30%, and 40% discounts. Hy-Vee advertised in 10 local TV markets for each of the
(4x4) = 16 combinations of x1 and x2, for a total of 160 markets. Hy-Vee also conducted
a post-advertising customer survey in each market to measure
y = Expected price for the dinner, in dollars
Here is the resulting computer output:
39) Which of the following conclusions is supported by the output?
(a) Promotions is linearly related to Price.
(b) Discount is linearly related to Price, after accounting for Promotions.
(c) The regression assumptions are satisfied.
(d) Neither Promotions nor Discount is linearly related to Price.
(e) The price of beer is likely to fall now that the national elections are over.
40) Interpret the slope for Promotions.
(a) Promotions decrease on average by 0.102 for each one-dollar increase in
expected price.
(b) Expected price decreases on average by $0.102 for each additional promotion.
(c) Promotions decrease on average by 0.102 for each one-dollar increase in
expected price, when discount is held constant.
(d) Expected price decreases on average by $0.102 for each additional
promotion, when discount is held constant.
(e) Expected price decreases on average by $0.0174 for each additional
promotion, when discount is held constant.
41) Suppose that Hy-Vee plans an ad campaign which features two promotions of a
35% price discount in each local market. Estimate the mean expected price with
95% certainty.
(a) $4.31
(b) ($4.24, $4.37)
(c) ($3.78, $4.84)
(d) Stop! I'm too tired to calculate this.
42) Suppose that the goal of the ad campaign described in the previous question is for
customers to expect the price to be at most $4.35, on average. Which modification
to the ad campaign should be recommended to help Hy-Vee achieve its goal?
(a) Feature a 10% price discount instead of a 35% discount.
(b) Feature a 30% price discount instead of a 35% discount.
(c) Run seven promotions in each market.
(d) Run a single promotion in each market.
(e) None of the modifications is recommended.
Short Answers
1) (9 points) In a recent study, 928 women were asked about their smoking habits
during pregnancy and then again five years later. The data are summarized in the
table below.
a) What is the approximate probability that a randomly chosen woman smoked 5
years after pregnancy?
(230+95)/928
b) If a randomly selected woman smoked during pregnancy, what is the probability
that she smoked 5 years after pregnancy?
P(5 yrs later|smoked during) = 230/271
c) Are the events Smoking during Pregnancy and Smoking Five Years Later
independent or dependent? Explain.
Dependent: the conditional probability in (b) does not equal the unconditional
probability in (a).
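The three answers above can be reproduced from the counts that appear in the worked solutions (the original table is not shown here, so the variable names below are assumptions about what each count represents): 928 women in total, 271 who smoked during pregnancy, 230 who smoked at both times, and 95 who smoked only five years later.

```python
from fractions import Fraction

# Counts taken from the worked answers; the labels are assumed.
total_women = 928
smoked_during = 271
smoked_both = 230
smoked_later_only = 95

# (a) Unconditional probability of smoking five years later.
p_later = Fraction(smoked_both + smoked_later_only, total_women)

# (b) Conditional probability of smoking later, given smoking during pregnancy.
p_later_given_during = Fraction(smoked_both, smoked_during)

# (c) Independence would require the two probabilities to be equal.
independent = p_later == p_later_given_during

print(round(float(p_later), 3), round(float(p_later_given_during), 3), independent)
# 0.35 0.849 False
```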
2) (21 points) Consider the following multiple regression computer output and then
answer the questions on the following pages.
Multiple linear regression results
Dependent Variable: var13
Independent Variable(s): var1, var2, var3, var4, var5, var6, var7, var8, var9, var10,
var11, var12
Parameter estimates:
Variable    Estimate        Std. Err.      Tstat          P-value
Intercept   -0.057636276    0.048596717    -1.1860118     0.236
var1        -0.001675507    0.028293129    -0.059219576   0.9528
var2        4.140445E-5     1.442341E-4    0.28706425     0.7741
var3        0.0025728503    0.0026423088   0.973713       0.3305
var4        0.0386679       0.016722031    2.3123925      0.021
var5        -0.002243308    0.0021889468   -1.0248344     0.3058
var6        0.029505625     0.02173865     1.3572887      0.1751
var7        0.06405778      0.027510468    2.3284876      0.0202
var8        0.088917315     0.021120988    4.2099032      <0.0001
var9        -0.04077969     0.027402725    -1.4881619     0.1372
var10       -0.07517062     0.02721936     -2.76166       0.0059
var11       -0.030563422    0.02852621     -1.0714155     0.2843
var12       -0.012943288    0.0255786      -0.50602025    0.613
Analysis of variance table for multiple regression model:
Source   DF    SS           MS           F-stat      P-value
Model    12    23.953445    1.9961203    18.363802   <0.0001
Error    709   77.06733     0.10869864
Total    721   101.020775
Root MSE (also called s_e): 0.32969475
R-squared (adjusted): 0.2242
a) Which variable is the most important variable in the model?
Var8 since it has the lowest p-value.
b) Which variable would be removed first when performing a backwards
stepwise regression?
Var1 since it has the highest p-value.
c) What would happen to the value of R-sq when you remove the variable in
part (b) (circle one answer)
Go Up Go Down
d) What is the coefficient of determination (R-sq) for the full model?
R-sq = SSR/SST = 1 - (SSE/SST) = 1 - (77.067/101.02) ≈ 0.237
e) Compute a 95% confidence interval for var10.
-0.07517062 +/- 1.96*(0.02721936)
f) Do we need var2 in the model? Explain.
No, its p-value (0.7741) is above .05
g) Test the null hypothesis that var8 equals 0.1
Ho: var8=0.1 Ha: no it doesn't (var8 ≠ 0.1)
t = (0.088917315-0.1)/ 0.021120988 = -0.5247
Since |t|<1.96 we fail to reject the null hypothesis.
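The arithmetic in parts (d), (e), and (g) can be checked with a short script. This is a sketch that takes its inputs from the regression output above and uses the normal critical value 1.96, as the solutions do; the variable names are illustrative.

```python
# Sums of squares and degrees of freedom from the analysis of variance table.
sse, sst = 77.06733, 101.020775
error_df, total_df = 709, 721

# (d) Coefficient of determination, plus the adjusted version for comparison
# with the reported value of 0.2242.
r_squared = 1 - sse / sst
adj_r_squared = 1 - (sse / error_df) / (sst / total_df)

# (e) 95% confidence interval for the var10 coefficient (1.96 assumed).
var10_estimate, var10_std_err = -0.07517062, 0.02721936
var10_ci = (var10_estimate - 1.96 * var10_std_err,
            var10_estimate + 1.96 * var10_std_err)

# (g) t statistic for Ho: var8 = 0.1.
var8_estimate, var8_std_err = 0.088917315, 0.021120988
t_var8 = (var8_estimate - 0.1) / var8_std_err

print(round(r_squared, 4), round(adj_r_squared, 4))    # 0.2371 0.2242
print(round(var10_ci[0], 4), round(var10_ci[1], 4))    # -0.1285 -0.0218
print(round(t_var8, 4), abs(t_var8) < 1.96)            # -0.5247 True
```

The interval for var10 excludes zero, which agrees with its reported p-value of 0.0059.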
Practice Exam 2 Solutions
Final Examination
Directions
The exam will end 3 hours after it begins. The exam is divided into three parts.
The first and second parts are true-false and multiple choice, respectively. Please
answer the true-false and multiple choice questions on the exam by circling the
best answer. There will be some partial credit for the multiple-choice questions
as long as some credible work is shown. The third part of the exam consists of
several problems. Please answer these problems in the space provided on the
exam (you may use the backs of the sheets if necessary). You will get partial
credit for these problems provided that your answers are organized and legible
so that your train of thought can be easily followed.
Unless stated, all confidence intervals and hypothesis tests should be calculated at
the 95% confidence level (use 1.96).
A note on re-grade requests: Only written requests will be considered. Clerical
errors will be changed without question, but other inquiries will result in a re-
grade of the entire exam.
GOOD LUCK
By signing my name here I acknowledge that the GBS has an honor code and I will abide by it.
___________________________________________
NAME (PLEASE PRINT) :
_______________________________________________________________ (-100 if
not printed)
Multiple Choice (5 points each)
1) Consider the following sample data:
25 11 6 4 2 17 9 6
For these data the median is:
a. 7.5
b. 3.5
c. 10.
d. None of the above.
2) The owner of a fish market has an assistant who has determined that the weights of
catfish are normally distributed, with mean of 3.2 pounds and standard deviation of 0.8
pound. If a sample of 64 fish yields a mean of 3.4 pounds, what is the probability of
obtaining a sample mean this large or larger?
a) 0.0001
b) 0.0013
c) 0.0228
d) 0.4987
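A sketch of the calculation for question 2: the sample mean of 64 fish has standard error σ/√n, and the probability is the upper tail of the corresponding normal distribution. The function name is illustrative; `statistics.NormalDist` is from the Python standard library.

```python
import math
from statistics import NormalDist

def upper_tail_for_sample_mean(x_bar, mu, sigma, n):
    """P(sample mean >= x_bar) when the population is Normal(mu, sigma)."""
    standard_error = sigma / math.sqrt(n)
    return 1 - NormalDist(mu, standard_error).cdf(x_bar)

p_upper = upper_tail_for_sample_mean(x_bar=3.4, mu=3.2, sigma=0.8, n=64)
print(round(p_upper, 4))  # 0.0228
```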
3) In the construction of confidence intervals, if all other quantities are unchanged, an
increase in the sample size will lead to a ______ interval.
a) narrower
b) wider
c) less significant
d) biased
4) A major department store chain is interested in estimating the average amount its credit
card customers spent on their first visit to the chain's new store in the mall. Fifteen credit
card accounts were randomly sampled and analyzed with the following results: X̄ =
$50.50 and s² = 400. Construct a 95% confidence interval for the average amount its
credit card customers spent on their first visit to the chain's new store in the mall
assuming that the amount spent follows a normal distribution.
a) $50.50 ± $9.09
b) $50.50 ± $10.12
c) $50.50 ± $11.00
d) $50.50 ± $11.08
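A sketch of the margin-of-error calculation for question 4. With n = 15 and an unknown population variance, the interval uses a t critical value with 14 degrees of freedom; the value 2.145 below is an assumed table lookup (the Python standard library has no t distribution), and the names are illustrative.

```python
import math

# Assumed two-sided 95% critical value for t with 14 degrees of freedom,
# taken from a standard t table.
T_CRIT_14_DF = 2.145

def t_margin_of_error(sample_variance, n, t_crit):
    """Half-width of the t interval: t_crit * sqrt(s^2 / n)."""
    return t_crit * math.sqrt(sample_variance / n)

margin = t_margin_of_error(sample_variance=400, n=15, t_crit=T_CRIT_14_DF)
print(round(margin, 2))  # 11.08
```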
5) In the annual report, a major food chain stated that the distribution of daily sales at their Detroit stores
is known to be bell-shaped, and that 95 percent of all daily sales fell between $19,200 and $36,400.
Based on this information, what were the mean sales?
a. Around $20,000
b. Close to $30,000
c. Approximately $27,800
d. Can't be determined without more information.
6) For some positive value of X, the probability that a standard normal variable is between 0
and +2X is 0.1255. The value of X is
a) 0.99
b) 0.40
c) 0.32
d) 0.16
e) None of the above
7) If we know that the length of time it takes a college student to find a parking spot in the
library parking lot follows a normal distribution with a mean of 3.5 minutes and a
standard deviation of 1 minute, find the probability that a randomly selected college
student will find a parking spot in the library parking lot in less than 3 minutes.
a) 0.3551
b) 0.3085
c) 0.2674
d) 0.1915
e) None of the above
8) The Central Limit Theorem is important in statistics because
a) for a large n, it says the population is approximately normal.
b) for any population, it says the sampling distribution of the sample mean is
approximately normal, regardless of the sample size.
c) for a large n, it says the sampling distribution of the sample mean is
approximately normal, regardless of the shape of the population.
d) for any sized sample, it says the sampling distribution of the sample mean is
approximately normal.
9) It is believed that the number of people who attend a Mardi Gras parade each year depends on the
temperature that day. A regression has been conducted on a sample of years where the
temperature ranged from 28 to 64 degrees and the number of people attending ranged from 8400 to
14,600. The regression equation was found to be y = 2378 + 191x. Which of the following is
true?
a. The average change in parade attendance is an additional 2378 people per one degree
increase in temperature.
b. The average change in parade attendance is an additional 191 people per one
degree increase in temperature.
c. If the temperature is 75 degrees, we can expect that 16,703 people will attend.
d. If the temperature is 0 degrees this year, then we should expect 2378 people to attend.
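A sketch for exploring the choices in question 9. It assumes the fitted line is y = 2378 + 191x (the equation is scrambled in the source; this reading is consistent with the numbers quoted in choices c and d) and flags temperatures outside the observed range of 28 to 64 degrees, where the line is an extrapolation. All names are hypothetical.

```python
OBSERVED_TEMPERATURE_RANGE = (28, 64)  # degrees, as stated in the question

def predicted_attendance(temperature):
    """Assumed fitted line: attendance = 2378 + 191 * temperature."""
    return 2378 + 191 * temperature

def is_extrapolation(temperature):
    """True when the temperature lies outside the range used to fit the line."""
    low, high = OBSERVED_TEMPERATURE_RANGE
    return not (low <= temperature <= high)

for temp in (0, 75):
    print(temp, predicted_attendance(temp), is_extrapolation(temp))
# 0 2378 True
# 75 16703 True
```

Both 0 and 75 degrees fall outside the observed range, so the arithmetic in choices c and d is correct but the predictions are not reliable.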
10) In analyzing the residuals to determine whether the simple regression analysis satisfies
the regression assumptions, which of the following is the best answer?
a. The histogram of the residuals should be approximately bell shaped
b. The scatter plot of the residuals against the dependent variable should
illustrate that the variation in residuals is the same over all levels of y (should
have no patterns).
c. Neither a nor b are true
d. Both a and b are true
11) Assume that after running a regression you have calculated a prediction of ŷ = 110.
Also assume that n = 201 and that s