Authors:
Drs. Mark Soskin and Bradley Braun
Associate Professors of Economics
2.45 A scatterplot is a
a. two-dimensional plot of data
b. frequency distribution
c. histogram
d. trend line
e. random array of points
2.46 An index
a. summarizes a group of related variables
b. is the average of several variables
c. is often used to represent overall movements in stock prices
d. is often used to represent overall movements in consumer prices
e. all of the above
2.48 Which of the following is a guideline for constructing time series graphs?
a. place time on the vertical axis
b. always draw a trend line rather than connect the plotted points
c. beware of graphs that omit the most recently available data
d. compare each time series observation with its cross section counterpart
e. graph two (or more) time series variables on the same graph to compare
them
3.9 If mean and median salaries are very similar at a firm, then you should
conclude that
a. most employees make about the same salary
b. if some salaries are far below the mean, other salaries are considerably
above the mean
c. no salaries are far from the mean
d. some salaries may be far above the mean, but none can be far below it
e. the similarity between mean and median tells us nothing about the
distribution of the data
3.10 Given the following means, for which would you also want to know the
dispersion?
a. choosing among three mutual funds which averaged 15, 12, and 8 percent
return
b. hiring a new engineer from among graduates with 3.3, 3.1, and 3.0 grade
point averages
c. attempting to make vacation reservations at one of three resort areas with
average annual occupancy rates of 60, 70, and 80 percent
d. all of the above
e. none of the above
3.11 If the mean weight of 50 parts in a shipment is 2.2 pounds and the median
weight is 0.8 pounds, the total weight of the shipment is
a. 40 pounds
b. 50 pounds
c. 110 pounds
d. 220 pounds
e. cannot be determined from the information provided
3.12 If the mean weight of 50 parts in a shipment is 2.2 pounds and the median
weight is 0.8 pounds, then
a. half the parts weigh more than 0.8 pounds
b. 26 parts weigh less than 0.8 pounds
c. 25 parts weigh between 0.8 and 2.2 pounds
d. all of the above
e. none of the above
3.84
is the parameter but
__
a. X and
__
b. X and M
__
c. m and X
__
d. and X
e. none of the above
3.85 If the mean for a very large number of samples is equal to the parameter
being estimated, the estimator is
a. unbiased
b. a sample
c. not an outlier
d. random
e. none of the above
3.86 An outlier is defined as
a. an observation considerably larger than any other in the sample
b. an observation considerably smaller than any other in the sample
c. an observation considerably larger or smaller than any other in the sample
d. observations that are measured incorrectly
e. observations that should be deleted from the sample
3.87 Which of the following conclusions does not involve potentially large
sampling bias?
a. poverty cannot be difficult to overcome. All the people I know that came
from the ghetto are doing well financially now.
b. I can't see why our product isn't selling. Our most trusted customers tell
me we have the best product on the market.
c. I knew I wanted to become a computer programmer. I have loved computers,
playing around with the one at home for years.
d. I took art courses in school, so I know that I am not artistically inclined.
e. All of the above involve sampling bias.
3.88 To ensure that s has the same units and scale as the variable in the data,
a. we divide by n - 1
b. we square the differences from the mean
c. we sum all the squared differences
d. we take the square root of the final mean sum of squares
e. none of the above
3.89 To adjust for the lost degree of freedom in the sample estimator s
a. we divide by n - 1
b. we square the differences from the mean
c. we sum all the squared differences
d. we take the square root of the final mean sum of squares
e. none of the above
3.90 For a utility company responding to the 200 service outages last month,
median repair time was 45 minutes, the mean was one and one-half hours,
the minimum was 15 minutes, the maximum was 1 day, and the standard
deviation was 3 hours. The range of service outage times last month was
a. 23.85 hours
b. 23.75 hours
c. 23.55 hours
d. 23.25 hours
e. 22.50 hours
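The arithmetic behind 3.90 can be sketched in a few lines of Python: the range is the maximum minus the minimum once both are expressed in the same units. The variable names are illustrative only.

```python
# Range = maximum - minimum, with both values converted to hours.
min_hours = 15 / 60   # minimum repair time: 15 minutes
max_hours = 24.0      # maximum repair time: 1 day
outage_range = max_hours - min_hours
print(f"range = {outage_range:.2f} hours")
```

The median, mean, and standard deviation given in the question play no part in this calculation.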
3.91. The list prices of new mid- and full-sized car models were surveyed, and the
summary statistics are reported below:
Variable   N    Mean  Median  TrMean  StDev  SEMean
PRICE     45   22795   20389   22620   6486     967
Variable    Min    Max     Q1     Q3
PRICE     13206  35553  17091  27357
Answer the following four questions regarding the data described above:
1. The median price, $20,389, must be an actual car price in the data set
because:
a. there is an odd number of car models in the data set
b. the average of the minimum and maximum is the median
c. the median and mode always are actual observed values, but the mean
may not be
d. the median is less than the mean
e. the median does not have to be an actual car price in this data set
2. $22,347 is
a. The interquartile range
b. The range
c. The standard deviation
d. The mode
e. None of the above
3.
4. If a car dealer were to stock one of each of the 45 car models, the total
value of its inventory would be approximately:
a. $600,000
b. $750,000
c. $900,000
d. $1 million
e. Cannot be calculated from the information given
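The quantities asked about in 3.91 follow directly from the summary statistics. A minimal Python sketch, using only the PRICE figures printed above (the variable names are illustrative):

```python
# Summary statistics copied from the output for PRICE.
n, mean_price = 45, 22795
min_price, max_price = 13206, 35553
q1, q3 = 17091, 27357

price_range = max_price - min_price   # maximum minus minimum
iqr = q3 - q1                         # interquartile range
total_value = n * mean_price          # the sum of all prices equals n times the mean

print(price_range, iqr, total_value)
```

Because the mean is the total divided by n, multiplying the mean by n recovers the total value of one car of each model.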
3.92. In the 1993 NBA college draft, the 54 players drafted signed for annual
salaries (in thousands of dollars) given in the following sorted data listing,
summary statistics, and histogram:
SALARY
125 125 125 125 125 145 160 170 175 200 220
245 275 300 305 325 365 380 400 430 475 500
550 575 600 625 660 700 745 775 800 825 865
900 950 1000 1100 1250 1300 1430 1500 1600 1700 1780
1875 1945 2000 2100 2200 2260 2335 2350 2400 2500
[Descriptive statistics output for SALARY not reproduced]
[Histogram of SALARY: frequency on the vertical axis, salary from 100 to 2500
(in thousands of dollars) on the horizontal axis]
6. By only examining the histogram for the salary data, we can immediately
conclude that
a. The mean will be substantially larger than the median
b. There are outliers in the data
c. Salaries do not have a bell-shaped distribution
d. The standard deviation will be relatively large
e. All of the above
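The mean and median of the 54 salaries can be computed directly from the sorted listing in 3.92. The sketch below uses only Python's standard `statistics` module; the list is typed in from the data display above, and the variable names are illustrative.

```python
from statistics import mean, median

# Annual salaries (thousands of dollars) from the sorted data listing.
salaries = [
    125, 125, 125, 125, 125, 145, 160, 170, 175, 200, 220,
    245, 275, 300, 305, 325, 365, 380, 400, 430, 475, 500,
    550, 575, 600, 625, 660, 700, 745, 775, 800, 825, 865,
    900, 950, 1000, 1100, 1250, 1300, 1430, 1500, 1600, 1700, 1780,
    1875, 1945, 2000, 2100, 2200, 2260, 2335, 2350, 2400, 2500,
]

salary_mean = mean(salaries)
salary_median = median(salaries)   # average of the 27th and 28th sorted values
print(len(salaries), round(salary_mean, 1), salary_median)
```

With an even number of observations, the median is the average of the two middle values, so it need not be an actual salary in the data.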
3.93. A regional supervisor for Coors has the price per 6-pack of its beer surveyed
from the 25 supermarket chains in Los Angeles. Describe in a couple
sentences the average and variability of beer prices from the following
computer output of descriptive statistics:
         MIN     MAX      Q1      Q3
coors  3.2900  4.2900  3.4900  3.8200
3.94 New television shows are notoriously risky ventures. Nielsen ratings are the
main variable used to assess the success of a show and the rates that can be
charged advertisers. A network programmer would like you to summarize
the Nielsen ratings for the 1993-94 season crop of new shows. You obtain
the following computer printout:
Data Display
NIELSEN
4.8 5.0
7.8 7.9
9.5 10.2
11.3 11.4
12.7 12.7
15.3 15.7
7.6
9.2
10.8
12.2
14.8
7.6
9.4
10.9 11.3
12.2 12.2
15.1 15.2
Descriptive Statistics
           N    MEAN  MEDIAN  TRMEAN  STDEV  SEMEAN
NIELSEN   63  11.176  10.900  11.077  3.586   0.452
            MIN     MAX     Q1      Q3
NIELSEN   4.800  20.500  8.300  13.600
a) Based on the DESCRIBE information, locate and mark the mean and median
Nielsen rating on the sorted data listing above. [No explanation please]
b) Is the median an actual value in the data? Why or why not? In two sentences,
explain carefully how the median was calculated from this data set.
c) By examining the Nielsen data printed above, explain why the mean is not very
different from the median.
d) What is the RANGE for the Nielsen data (single number answer)?
e) Calculate (from the DESCRIBE output above) an interval two standard deviations
on either side of the mean. Are about 95 percent of the Nielsen ratings
within this interval? Which ratings (if any) are above and below these limits?
f) Write a one-sentence report summarizing to the network programmer your
findings about averages and dispersion of new program Nielsen ratings.
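Parts (d) and (e) of 3.94 rely only on the DESCRIBE output, so the arithmetic can be sketched as follows. This is a minimal Python illustration with hypothetical variable names; it does not use the individual ratings, which are only partly legible in the listing.

```python
# Values copied from the descriptive statistics for NIELSEN.
mean, stdev = 11.176, 3.586
minimum, maximum = 4.800, 20.500

rating_range = maximum - minimum
lower = mean - 2 * stdev
upper = mean + 2 * stdev

print(f"range = {rating_range:.1f}")
print(f"two-sd interval = ({lower:.3f}, {upper:.3f})")
print("minimum inside interval:", lower <= minimum)
print("maximum inside interval:", maximum <= upper)
```

Comparing the minimum and maximum with the interval limits shows at a glance whether any ratings can fall outside them on either side.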
s = 0.4899    R-sq = 94.2%    R-sq(adj) = 94.1%
Analysis of Variance
SOURCE      DF      SS      MS       F      p
Regression   1  171.69  171.69  715.35  0.000
Error       44   10.56    0.24
Total       45  182.25
(1) The monthly trend rate in short-term interest rates may be described by: Interest
rates (decreased / increased) at an average rate of ______ percentage
points per month.
(2) Assuming past trends continued, forecast interest rates for December 1993
(Month = 50):
(3) What percent of variation in interest rates is explained by the regression?
______%. This percentage may be verified by subtracting from 1 the following ratio:
______.
(4) The standard error of the estimate, 0.4899, is the square root of which
number?
(5) To capture the actual interest rate outcome about 95% of the time, predictions
based on the fitted equation may report a margin of error of roughly plus or
minus ______ (please round).
(6) The correlation is negative because of the negative sign of the
______. The correlation coefficient between interest rate and Month is -0.______.
[show work]
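Questions (3) through (6) can be checked from the analysis of variance table alone. The sketch below is a minimal Python illustration under two stated assumptions: the margin of error is taken as roughly two standard errors of the estimate, and the negative sign of the correlation is taken from the wording of question (6), since the fitted equation itself is not shown. Variable names are illustrative.

```python
import math

sse, sst = 10.56, 182.25   # error and total sums of squares
df_error = 44

mse = sse / df_error             # mean square error
see = math.sqrt(mse)             # standard error of the estimate
r_squared = 1 - sse / sst        # share of variation explained
margin = 2 * see                 # rough 95% margin of error (assumed two-SEE rule)
r = -math.sqrt(r_squared)        # negative sign assumed from question (6)

print(f"MSE = {mse:.2f}, SEE = {see:.4f}")
print(f"R-sq = {r_squared:.1%}, r = {r:.2f}, margin = +/- {margin:.2f}")
```

Questions (1) and (2) need the slope and intercept of the fitted equation, so they are left out of the sketch.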
Answers?
4.43 The mean cross-product deviation for bivariate data is the
a. correlation
b. coefficient of determination
c. standard error of the estimate
d. covariance
e. none of the above
4.44 Which of the following is a unitless measure ranging from -1 to +1?
a. correlation
b. coefficient of determination
c. standard error of the estimate
d. covariance
e. all of the above
4.45 Which of the following is not a difference between regression and
correlation?
a. correlation requires us to first plot the data
b. regression requires that we designate a dependent variable
c. regression allows us to make predictions
d. correlation is merely a measure of association
e. correlation is more appropriate for exploring a new area of inquiry
4.46 If rX,Z = 0 then we say that X and Z are
a. uncorrelated
b. perfectly correlated
c. negatively correlated
d. both a and b
e. both a and c
4.47 If the correlation between two variables is 0.2, then R² for the simple
regression is
a. 0.40
b. 0.20
c. 0.04
d. 0.02
e. cannot be calculated with the information given
Answer the next four questions based on the following statistical output:
The primary business of a newspaper is to sell readers to advertisers. Data are
collected for a Florida newspaper on the following two variables:
Advert  advertising space (in thousands of inches) purchased during each month
Month a time trend variable for each of n = 50 months: month = 1, 2, ..., 50
The regression equation is :
S = 7.276    R-sq = 50.7%    R-sq(adj) = 49.7%
Analysis of Variance
SOURCE      DF      SS      MS      F      p
Regression   1  2612.4  2612.4  49.34  0.000
Error       48  2541.4    52.9
Total       49  5153.8
4.50. The least-squares equation reduces the total variation in predicting advert
a. From 5154 to 2541
b. From 2612 to 52.9
c. From 2541 to 52.9
d. From 50.7 to 49.7
e. From 49 to 48
4.51. We may predict advertising space for month 20 to be approximately
a. 92,000 inches with a margin of error of 15,000 inches
b. 82,000 inches with a margin of error of 15,000 inches
c. 92,000 inches with a margin of error of 7,300 inches
d. 82,000 inches with a margin of error of 7,300 inches
e. 82,000 inches with a margin of error of 7,300 inches
4.52. The correlation coefficient between advertising space and month was
a. +0.71
b. - 0.71
c. +0.26
d. - 0.26
e. Insufficient information to determine answer
4.59 Given that rxy = -.80, determine the following:
(a) R² for the regression of Y on X and the sign of the slope coefficient b1
(b) R² for the regression of X on Y and the sign of the slope coefficient b1
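In simple regression the coefficient of determination is the square of the correlation, and the slope takes the sign of the correlation because it equals r times a ratio of (positive) standard deviations. A minimal Python sketch of 4.59 under that standard relationship, with illustrative variable names:

```python
import math

r_xy = -0.80

# Squaring removes the sign, so the result is the same for Y on X and for X on Y.
r_squared = r_xy ** 2
# The slope shares the sign of r in either regression.
slope_sign = math.copysign(1, r_xy)

print(f"R-sq = {r_squared:.0%}, slope sign = {slope_sign:+.0f}")
```

The same squaring applies to 4.47, where a correlation of 0.2 corresponds to 0.04.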
4.60 The following multiple regression determines insurance charged to short haul
trucking firms: premium = b0 + b1 fleetsz + b2 popden
where the variables in the regression equation are defined as:
premium = insurance premiums charged each firm (in dollars per truck)
fleetsz = number of trucks owned by each firm
popden = county population density (people / square mile) where the
company is located
From a random sample of n = 32 trucking firms surveyed in 1993,
the regression equation is
premium = 779 - 5.49 fleetsz + 0.885 popden
[other output not relevant is omitted here]
S = 332.5    R-sq = 36.6%    R-sq(adj) = 32.2%
Analysis of Variance
SOURCE      DF       SS      MS     F      p
Regression   2  1850413  925206  8.37  0.001
Error       29  3206046  110553
Total       31  5056459
1. This data set consists of time series / cross section (circle one) data.
2. This is multivariate regression because there is more than one variable.
3. The ______ statistic tells us we have explained over one-third the variation in
premiums.
4. The degrees of freedom for the error sum of squares = ______, found by
______.
5. Use the fitted equation to predict premiums for a firm with 10 trucks in a county
with 50 people / square mile.
[round answer to the nearest dollar]
6. For the following, show computations assuming no other explanatory variable
changes:
(a) Companies increasing fleet size by 20 trucks are charged $110 lower premiums on
average.
b1 Δfleetsz =
(b) Companies moving to counties of 200 fewer people / square mile are charged
$177 less on average.
b2 Δpopden =
Answers?
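The computational parts of 4.60 (items 4, 5, and 6) can be sketched in Python from the fitted equation. The function and variable names below are hypothetical; the coefficients, n = 32, and the two explanatory variables come from the problem statement.

```python
# Fitted equation: premium = 779 - 5.49 fleetsz + 0.885 popden
b0, b_fleetsz, b_popden = 779, -5.49, 0.885
n, k = 32, 2   # sample size and number of explanatory variables

def predict_premium(fleetsz, popden):
    """Fitted premium in dollars per truck."""
    return b0 + b_fleetsz * fleetsz + b_popden * popden

df_error = n - k - 1                  # item 4
predicted = predict_premium(10, 50)   # item 5
delta_fleet = b_fleetsz * 20          # item 6(a): fleet size up by 20 trucks
delta_popden = b_popden * (-200)      # item 6(b): density down by 200 people / sq mile

print(df_error, round(predicted), round(delta_fleet, 1), round(delta_popden, 1))
```

Each marginal effect multiplies one coefficient by the change in its own variable while the other variable is held fixed.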
4.76 The monthly rent for a 1200 square foot apartment that is a 2 minute walk
from campus may be estimated to be
a. $1180
b. $980
c. $1220
d. $1420
e. $760
4.77 How much should you expect rents to change on average if you moved to an
equal-sized apartment 10 minutes further walking distance from campus?
a. $100 less
b. $100 more
c. $20 more
d. $200 more
e. exactly the same amount
4.78 We should not expect R² from two regressions to be directly comparable if
a. the samples for the two regressions were gathered from very different time
periods
b. the sample for one of the regressions was from a more broadly-defined
population
c. the dependent variables used in each regression were defined differently
d. all of the above
e. none of the above
4.79 Suppose a particular regression equation is utterly worthless in accounting
for variation in the dependent variable over the entire population. Then we
should expect that the least-squares equation from a random sample of that
population will yield an R² = 0
a. only very rarely
b. less than 50 percent of the time
c. most of the time
d. nearly always
e. always
4.80 Which is not true about the difference between R² and the adjusted R²?
a. the difference is greater for smaller sample sizes
b. the difference is greater if R² is large
c. the difference is greater if there are more explanatory variables in the
equation
d. adjusted R² cannot be greater than R²
e. adjusted R² cannot be greater than 100 percent
4.81 Multiple regression is so often used in business and economics today
because
a. most variables we seek to explain are affected by a complex set of factors.
b. an acceptable fit often cannot be obtained with only a single explanatory
variable.
c. controlled experiments are often too difficult to conduct.
d. the abundance of data and speed of modern computers make multiple
regression a practical option.
e. all of the above.
4.82 In comparing regression fits on cross section and time series data
a. R² is usually lower for cross section data because it is easier to explain why
different items are different
b. R² is usually higher for cross section data because it is easier to explain
why different items are different
c. R² is usually lower for cross section data because it is more difficult to
explain why different items are different
d. R² is usually higher for cross section data because it is more difficult to
explain why different items are different
e. there is no systematic difference between fits for either type of data
Answers? 4.72-4.82
Answer the next five questions based on the following case description and
statistical output:
We next examine a regression equation where advertising space is a function of
newspaper sales and the state of the economy:
advert = advertising space (in thousands of inches) purchased during each month
circ = the monthly level of circulation (measured in millions)
jobless = the monthly unemployment rate (in percent)
The regression equation is:
advert = 125 + 1.95 circ - 6.13 jobless
4.83 Forecast advertising space next month when circulation is 4 million and
unemployment rate is 6 percent.
a. 22,000 inches
b. 29,000 inches
c. 96,000 inches
d. 132,000 inches
e. None of the above
4.84 If the circulation numbers were instead reported in thousands (rather than
millions) of newspapers, what would the coefficient of circ now have to be
for the fitted equation to tell exactly the same story?
a. 1950
b. 1.95
c. 0.00195
d. 0.00000195
e. Cannot be determined from the information provided.
4.85 Assuming unemployment is the same, circulation decline of one million will
result in a
a. 1950 inch increase in advertising space
b. 127,000 inch increase in advertising space
c. 1950 inch decrease in advertising space
d. 127,000 inch decrease in advertising space
e. None of the above
4.86 For a one percentage point increase in the unemployment rate, other things
equal, advertising space will
a. increase by about six thousand inches
b. increase by about 119 thousand inches
c. decrease by about six thousand inches
d. decrease by about 119 thousand inches
e. none of the above
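Questions 4.83 through 4.86 all work from the fitted equation advert = 125 + 1.95 circ - 6.13 jobless. A minimal Python sketch of the arithmetic follows; the function and variable names are hypothetical, and advert is kept in its stated units of thousands of inches.

```python
# Fitted equation: advert = 125 + 1.95 circ - 6.13 jobless
b0, b_circ, b_jobless = 125, 1.95, -6.13

def predict_advert(circ_millions, jobless_pct):
    """Fitted advertising space in thousands of inches."""
    return b0 + b_circ * circ_millions + b_jobless * jobless_pct

forecast = predict_advert(4, 6)      # 4.83: circulation 4 million, unemployment 6 percent
b_circ_thousands = b_circ / 1000     # 4.84: coefficient if circ is measured in thousands
delta_circ = b_circ * (-1)           # 4.85: circulation falls by one million
delta_jobless = b_jobless * 1        # 4.86: unemployment rises one percentage point

print(f"forecast = {forecast:.2f} thousand inches")
print(f"coefficient with circ in thousands = {b_circ_thousands:.5f}")
print(f"change for 4.85 = {delta_circ * 1000:+.0f} inches")
print(f"change for 4.86 = {delta_jobless:+.2f} thousand inches")
```

Rescaling circ from millions to thousands makes each value 1000 times larger, so the coefficient must shrink by the same factor for the predictions to stay unchanged.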
4.87 Which of the following describes the data set and regression equation:
a. data set is time series data and the regression is multivariate regression
b. data set is cross section data and the regression is multivariate regression
c. data set is time series data and the regression is simple regression
d. data set is cross section data and the regression is simple regression
e. not enough information provided to determine the data set and regression
type.
4.88 Determine ΔY for each of the following cases:
(a) b1 = 15, ΔX = 10
(b) b1 = 15, ΔX = 100
(c) b1 = 1.5, ΔX = 100
(d) b1 = -5, ΔX = 10
(e) b1 = 15, ΔX = -10
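A short Python sketch of the five cases in 4.88, assuming the quantity asked for is the predicted change in Y from the delta formula, ΔY = b1 ΔX. That reading is an assumption, since the symbols did not survive in the question as printed; the dictionary and variable names are illustrative.

```python
# Each case pairs a slope b1 with a change in X, as printed in the question.
cases = {
    "a": (15, 10),
    "b": (15, 100),
    "c": (1.5, 100),
    "d": (-5, 10),
    "e": (15, -10),
}

# Delta formula: change in predicted Y = slope times change in X.
delta_y = {label: b1 * dx for label, (b1, dx) in cases.items()}

for label, value in delta_y.items():
    print(f"({label}) {value:+g}")
```

The sign of each result is the product of the signs of the slope and the change in X.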
4.88 Answer the next four questions based on the following case and statistical
output on the regression: Predicted P GAS = b0 + b1 month
where the variables are:
P GAS monthly average price at pump for regular gasoline (in cents / gallon)
month numbered from 1 to 53 from April 1986 to August 1990.
The Minitab regression equation is: P GAS = 83.8 + 0.414 month
[other output not relevant is omitted here]
S = 4.532
R-sq = 66.9%
R-sq(adj) = 66.3%
a. Rounded to the nearest cent, the annual trend rate (i.e., every 12 months) in
gas price increases is about
a. 1 cent
b. 5 cents
c. 84 cents
d. 89 cents
e. None of the above
b. The forecast of gas prices in March 1991 (i.e., month = 60), assuming past
trends continue is approximately
a. 25 cents
b. 35 cents
c. 84 cents
d. 109 cents
e. None of the above
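Both gasoline questions follow from the fitted trend line P GAS = 83.8 + 0.414 month. A minimal Python sketch, with illustrative variable names:

```python
# Fitted trend line: P GAS = 83.8 + 0.414 month (cents per gallon)
b0, b1 = 83.8, 0.414

annual_trend = 12 * b1             # twelve monthly steps of b1 cents each
forecast_month_60 = b0 + b1 * 60   # March 1991 is month 60

print(f"annual trend = {annual_trend:.2f} cents")
print(f"forecast for month 60 = {forecast_month_60:.2f} cents")
```

The forecast extends the line seven months past the last observation (month 53), so it holds only if past trends continue, as the question stipulates.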
c.
d.
4.89 Answer the next four questions based on the following case and statistical
output: Data on the following four variables were collected from a random
sample of n=49 persons taking the business law section of the CPA (Certified
Public Accountant) exam:
LAWSCORE = each person's score on the business law section of the CPA exam
HOURS = number of hours that person studied per week to prepare for the exam
GPA = undergraduate grade point average of the person taking the exam
WORKEXP = number of years work experience of the person taking the exam
The regression equation is in the following form:
Predicted LAWSCORE = b0 + b1 HOURS + b2 GPA + b3 WORKEXP
When the regression was run, the following output results:
LAWSCORE = 53.8 + 0.400 HOURS + 3.45 GPA + 0.272 WORKEXP
[other output not relevant is omitted here]
s = 9.050    R-sq = 25.4%    R-sq(adj) = 20.5%
Analysis of Variance
SOURCE      DF       SS      MS     F      p
Regression   3  1256.46  418.82  5.11  0.004
Error       45  3685.78   81.91
Total       48  4942.25
a) The degrees of freedom for the error sum of squares are calculated as follows:
a. 48 - 1 - 1 = 46 degrees of freedom
b. 50 - 2 = 48 degrees of freedom
c. 45 - 4 = 41 degrees of freedom
d. 49 - 3 - 1 = 45 degrees of freedom
e. None of the above
b) If a second regression equation reports an R² = 28.6% and an adjusted R²
= 19.2%, then this second equation has
a. A better fit than the regression equation above.
b. A worse fit than the regression equation above.
c. The same fit as the regression equation above.
d. Not enough information provided to answer this question.
c) Assuming the other explanatory variables don't change, ten more hours study
weekly
a. increases exam scores an average of 0.4 points
b. increases exam scores an average of 4 points
c. increases exam scores an average of 40 points
d. increases exam scores an average of 58 points
e. None of the above
d) The exam score for a person studying 30 hours per week who earned a 3.0
grade point average in college and has ten years work experience is
predicted to be approximately
a. 72
b. 79
c. 85
d. 91
e. None of these
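The four parts of 4.89 can be checked from the fitted equation and the analysis of variance table. The sketch below is a minimal Python illustration with hypothetical variable names; the adjusted R² line uses the standard definition, one minus the ratio of the mean square error to the total mean square, which is an assumption about how the printed 20.5% was obtained.

```python
# Fitted equation: LAWSCORE = 53.8 + 0.400 HOURS + 3.45 GPA + 0.272 WORKEXP
b0, b_hours, b_gpa, b_workexp = 53.8, 0.400, 3.45, 0.272
n, k = 49, 3
sse, sst = 3685.78, 4942.25

df_error = n - k - 1                                     # part a
adj_r_squared = 1 - (sse / df_error) / (sst / (n - 1))   # used in part b
delta_score = b_hours * 10                               # part c: ten more hours weekly
predicted = b0 + b_hours * 30 + b_gpa * 3.0 + b_workexp * 10   # part d

print(df_error, f"{adj_r_squared:.1%}", delta_score, round(predicted, 2))
```

Adjusted R² penalizes extra explanatory variables, which is why it, rather than R², is the figure to compare across equations in part b.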
Answers?
4.90 At the beginning of 1991, a national realtors association wants you to analyze
trends in the home construction industry over the preceding four years and
forecast new construction activity this year.
Predicted H starts = b0 + b1 Month
The variables in the regression equation are for month t
H starts = number of housing starts (in thousands) during month t
Month = 1 to 48 (from January 1987 to December 1990)
The regression is fit yielding the following Minitab output:
The regression equation is: H starts = 1727 - 12.2 month
[other output not relevant is omitted here]
s = 97.54    R-sq = 75.7%    R-sq(adj) = 75.2%
Analysis of Variance
SOURCE      DF       SS       MS       F      p
Regression   1  1366204  1366204  143.60  0.000
Error       46   437648     9514
Total       47  1803852
a) What was the monthly trend rate in housing starts over the period examined?
Housing starts (increased / decreased) at an average rate of ______ thousand per
month.
decreased, 12.2
b) What percent of variation in housing starts was explained by this trend equation?
75.7%
c) Calculate the correlation between Month and H starts.
R² = 0.757, so r = -√0.757 = -.87, negative because b1 < 0
d) Verify with a calculator that the standard error of the estimate has the proper
relationship to the mean square error MSE.
SEE = √MSE = √9514 = 97.54
e) Verify with a calculator that R² has the expected relationship to the error and
total sum of squares, SSE and SST.
R² = 1 - SSE / SST = 1 - 437,648 / 1,803,852 = .757 or 75.7%
f) Remember that month = 48 was December 1990. Use a calculator to forecast
housing starts for January 1991 and again for February 1991 from the fitted
regression equation.
predicted H starts = 1727 - 12.2(49) = 1129.2 thousand for January 1991
predicted H starts = 1727 - 12.2(50) = 1117.0 thousand for February 1991
g) The actual values for housing starts were 844 thousand in January 1991 and 1008
thousand in February. Compare your answers in question (f) above with
these actual outcomes, and discuss how close each of your forecasts was.
Both forecasts were too high; however, the second forecast (for February 1991)
was much closer, being only 109 thousand off rather than 285 thousand away
from the actual outcome.
h) Below is the time series plot of housing starts from 1987 through the end of
1991 (i.e., for an additional 12 months beyond the data for which the
regression equation was fitted)
[Time series plot: H starts (vertical axis, about 1050 to 1750 thousand) against
month (horizontal axis, 0 to 60)]
i) Based on your examination of this plot, discuss how you can tell that the
regression equation we estimated would yield very bad forecasts for mid- and late
1991 (months 54 to 60).
The downward trend over the first four years (48 months) was used to fit the
regression equation. However, the trend may have reversed and turned into
an upward trend thereafter. Thus, forecasts from the fitted line will be
increasingly wrong and too low.
j) If a different regression equation were to be used instead of a time trend
equation, suggest one or more explanatory variables that would explain the
variation in housing starts in the U.S. Acceptable answers: interest rates,
construction costs, unemployment, and population.
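The calculator checks in parts (c) through (g) of 4.90 can be reproduced with a short Python sketch. It uses only the figures from the Minitab output and the two actual outcomes quoted in part (g); the variable names are illustrative.

```python
import math

# Fitted trend line: H starts = 1727 - 12.2 month (thousands of starts)
b0, b1 = 1727, -12.2
sse, sst, mse = 437648, 1803852, 9514

see = math.sqrt(mse)                          # part d: standard error of the estimate
r_squared = 1 - sse / sst                     # part e
r = math.copysign(math.sqrt(r_squared), b1)   # part c: r takes the sign of the slope

actual = {49: 844, 50: 1008}   # January and February 1991, from part g
errors = {}
for month, observed in actual.items():
    forecast = b0 + b1 * month                # part f
    errors[month] = forecast - observed       # positive means the forecast was too high
    print(f"month {month}: forecast {forecast:.1f}, error {errors[month]:+.1f}")

print(f"SEE = {see:.2f}, R-sq = {r_squared:.3f}, r = {r:.2f}")
```

A positive error in both months matches the conclusion that each forecast overshot the actual number of starts.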
Answers?
4.91 The director of a hospital pharmacy wants to determine how staffing level
affects the rate at which prescriptions are processed. She decides to use the
following regression equation: prescrip = b0 + b1 staff + b2 in-pat
where the variables are defined as follows:
staff = average number of staff on duty on day t
prescrip = prescriptions processed per hour during day t
in-pat = number of in-patients at the hospital on day t
Forty-seven days are sampled and a regression yields the following equation:
prescrip = - 67.1 + 21.0 staff + 0.395 in-pat.
[other output not relevant is omitted here]
S = 14.22    R-sq = 69.6%    R-sq(adj) = 68.3%
Analysis of Variance
SOURCE      DF     SS     MS      F      p
Regression   2  20392  10196  50.45  0.000
Error       44   8892    202
Total       46  29284
a) Verify that the error degrees of freedom should be 44, given this regression and
sample size.
DF = n - k - 1 = 47 - 2 - 1 = 44
Using the marginal effects "delta" (Δ) formula, answer b and c: Based on the fitted
equation, determine the expected change in the prescription processing rate,
assuming the other explanatory variable in the regression equation does not
change: [In each case, show delta formula computations and report answers in
correct direction of change, numerical magnitude, and correct units]
b) The staff level at the pharmacy is increased by two persons.
Δprescrip = b1 Δstaff = (+21)(+2) = +42, an increase of 42 prescriptions / hour
c) The number of patients at the hospital decreases by 100.
Δprescrip = b2 Δin-pat = (+.395)(-100) = -39.5, a decrease of about 40
prescriptions / hour
Make a prediction for questions d and e:
d) Predict the prescription processing rate when there are 200 patients in the
hospital and 10 persons maintained on staff at the pharmacy.
predicted prescrip = -67.1 + 21.0(10) +.395(200) = -67.1 + 210 + 79
= 222 prescriptions / hour
e) Why shouldn't you worry about the negatively signed intercept term, -67.1, in
the fitted equation? [Hint: use information from the descriptive statistics
below]
Even if both variables are at their minimum values, predicted prescrip would still
have a positive value.
            N    MEAN  MEDIAN  TRMEAN  STDEV  SEMEAN
staff      47   9.512   9.563   9.509  1.037   0.151
prescrip   47  170.55  174.93  170.68  25.23    3.68
in-pat.    47   95.98   97.00   95.72  15.40    2.25
             MIN     MAX      Q1      Q3
staff      7.375  12.125   8.625  10.187
prescrip  110.00  227.86  153.57  188.14
in-pat.    72.00  129.00   81.00  108.00
f) What is the danger of using the fitted equation to predict "prescrip" if only four
staffers are on duty at the pharmacy?
The staff data used to fit the regression ranged from 7.375 to 12.125, so predicting
for four staffers would be an extrapolation, and such predictions cannot be trusted.
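The worked answers to 4.91 can be reproduced with a minimal Python sketch. The function and variable names are hypothetical; the coefficients and the minimum and maximum values come from the regression and descriptive statistics output above.

```python
# Fitted equation: prescrip = -67.1 + 21.0 staff + 0.395 in-pat
b0, b_staff, b_inpat = -67.1, 21.0, 0.395
staff_min, staff_max = 7.375, 12.125   # observed range of staff
inpat_min = 72.00                      # observed minimum of in-pat

def predict_prescrip(staff, in_pat):
    """Fitted prescriptions processed per hour."""
    return b0 + b_staff * staff + b_inpat * in_pat

delta_staff = b_staff * 2                 # part b: two more staff
delta_inpat = b_inpat * (-100)            # part c: 100 fewer patients
predicted = predict_prescrip(10, 200)     # part d
at_minimums = predict_prescrip(staff_min, inpat_min)   # part e
extrapolating = not (staff_min <= 4 <= staff_max)      # part f: four staffers

print(delta_staff, delta_inpat, round(predicted, 1), round(at_minimums, 1), extrapolating)
```

Evaluating the equation at both minimums shows why the negative intercept is harmless: the intercept describes zero staff and zero patients, a point far outside the sampled data.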
Answers?
Answers for Chapter 4
4.1: a; 4.2: d; 4.3: b; 4.4: a; 4.5: d; 4.6: e; 4.7: d; 4.8: d; 4.9: b; 4.10: c; 4.29: a; 4.30: d; 4.31: d;
4.32: a; 4.33: b; 4.34 (rounded): a) 88%, b) 67%, c) 6%, d) 60%, e) 82%; 4.35; 4.43: d; 4.44: a;
4.45: a; 4.46: a; 4.47: c; 4.48: e; 4.49: d; 4.50: a; 4.51: a; 4.52: b; 4.59: a and b are the same
answer (64% and negative sign); 4.60; 4.72-4.82; 4.88: a) +150, b) +1500, c) +150, d) -50, e) -150; 4.89; 4.90; 4.91;
6.3
6.4 Subjective probability
a. is derived from human judgments.
b. expresses our personal degree of belief.
c. is small when there is great doubt that an event will occur.
d. is most appropriate in new, complex, and difficult to quantify situations.
e. all of the above.
6.16 If the probability of an event A is P(A) = .25, then the probability of its
complement C, P(C) must be
a. 0
b. 0.25
c. 0.5
d. 0.75
e. 1.0
6.17 If this year, P(Recession) = 0.25, P(Mideast War) = 0.10, and
P(Recession and Mideast War) = 0.05, then P(Recession or Mideast War) is
a. 0.025
b. 0.20
c. 0.30
d. 0.35
e. 0.40
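The complement rule in 6.16 and the addition rule in 6.17 can be sketched in Python as follows. The variable names are illustrative.

```python
# 6.16: complement rule, P(C) = 1 - P(A)
p_a = 0.25
p_complement = 1 - p_a

# 6.17: addition rule, P(A or B) = P(A) + P(B) - P(A and B)
p_recession, p_war, p_both = 0.25, 0.10, 0.05
p_either = p_recession + p_war - p_both

print(f"P(C) = {p_complement:.2f}")
print(f"P(Recession or Mideast War) = {p_either:.2f}")
```

Subtracting the joint probability prevents the overlap between the two events from being counted twice.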
6.18 Upon entering an intersection where two roads cross, cars have a 0.2
probability of turning left. Then the event Right Turn
a. has a probability of 0.8
b. is the complementary event of Left Turn
c. is mutually exclusive of Left Turn
d. is independent of Left Turn
e. all of the above
6.19 Suppose that of all newly-trained employees at a fast-food franchise, fifty
percent are still working there one year later but another thirty percent do not last
more than the first two months. Therefore, there is a twenty percent chance that
a new employee will
a. be around after two months
b. not last a year
c. be fired immediately
d. work there more than two months but no more than a year
e. insufficient information to answer
6.20 New products of the type your company is considering have achieved
market success in 200 out of 1000 cases. Which of the following correctly
describes this record?
a. the probability of success is 0.20
b. the probability of not achieving success is 80 percent
c. the odds are four-to-one against success
d. the chance of success is one in five
e. all of the above
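The different ways of stating the record in 6.20 are all conversions of the same relative frequency. A minimal Python sketch, using exact fractions from the standard library so that no rounding intrudes (variable names are illustrative):

```python
from fractions import Fraction

p_success = Fraction(200, 1000)       # 200 successes in 1000 cases
p_failure = 1 - p_success
odds_against = p_failure / p_success  # failures per success

print(p_success, p_failure, odds_against)
```

Odds compare the chance of failure with the chance of success, whereas a probability compares successes with all cases.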
6.21 If there are even odds that you will get a job offer from Acme, Inc., then the
probability P(Acme job offer) equals
a. 100 percent
b. 75 percent
c. 50 percent
d. 33 percent
e. not enough information provided
6.22 Which of the following does not indicate statistical independence among
events?
a. Students in the honors program have the same chance of passing this
course as any other type of student.
b. The chance of a small firm failing is identical to that for any size firm.
c. The odds of the boss' son getting a promotion are no better or worse than
for any other employee of the firm
d. People who saw their new commercial were no more likely to shop at KMart than those who didn't watch it.
e. All of the above indicate independent events.
6.23 For a product to be delivered on time, all of the following must occur: the
order is processed within one work day, the product is shipped the following
day, and the shipment is routed through the proper regional distribution
center. If these events are independent and their probabilities are
0.8, 0.8, and 0.5, the probability of on-time delivery is
a. 0.50
b. 0.48
c. 0.40
d. 0.32
e. 0.24
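For independent events, the probability that all of them occur is the product of their individual probabilities. The sketch below applies that multiplication rule to 6.23; the helper name `joint_probability` is hypothetical.

```python
import math

def joint_probability(probabilities):
    """P(all events occur), assuming the events are independent."""
    return math.prod(probabilities)

# Order processed on time, product shipped next day, shipment routed correctly.
on_time = joint_probability([0.8, 0.8, 0.5])
print(f"P(on-time delivery) = {on_time:.2f}")
```

The same helper handles any number of independent events, for example `joint_probability([0.5, 0.4])` for two events with probabilities 0.5 and 0.4.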
6.24 If S is the set containing events describing the time between servicing of a
copying machine under warranty, which of the following would be a possible
event in S?
a. five months
b. customer complained
c. asked for money back
d. cartridge needs replacing
e. all of the above
6.25 If S = {A, B, C} is an exhaustive set of events, then
a. P(A) + P(B) + P(C) = 1
b. P(S) = 1
c. A, B, and C must be mutually exclusive events
d. all of the above
e. none of the above
6.26 If S = {A, B, C} and P(A) + P(B) + P(C) = 1.0, then
a. A, B, and C are mutually exclusive and exhaustive events.
b. A, B, and C are mutually exclusive but not exhaustive events.
c. A, B, and C are exhaustive but not mutually exclusive events.
d. A, B, and C are complements.
e. A, B, and C are independent events.
6.27 If the sample space consists of a listing of each product sold at a
supermarket, then frozen foods and fresh produce are
a. outcomes
b. events
c. sample spaces
d. experiments
e. none of the above
6.28 If the sample space consists of a listing of each product sold at a
supermarket, then frozen foods and fresh produce are
a. independent
b. mutually exclusive
c. exhaustive
d. complementary
e. all of the above
[Figure: U.S. Manufacturing Companies, with regions labeled "Less than 50 Workers"
and "Export"]
6.37 Which of the following pairs of events are most likely to be independent?
a. a rusting fender and a car less than two years old
b. a person who has attended college and an athlete over 7 feet tall
c. a company president earning more than $500,000 a year and a Fortune
500 company
d. an A on the first exam and final course grade of D
e. having blue eyes and majoring in accounting
6.38 If the probability of an event A is P(A) = .25, then the probability of its
complement C, P(C) must be
a. 0
b. 0.25
c. 0.5
d. 0.75
e. 1.0
6.39 People usually express reluctance to offer a needy relative one of their two
kidneys for transplant surgery. They are often convinced to become donors when
they learn that the same diseases that cause one kidney to fail will also damage the
other kidney. This argument relies on explaining that the failures of the two kidneys are
not
a. exhaustive events
b. mutually exclusive events
c. independent events
d. complementary events
e. all of the above
6.40 If events A and B are statistically independent, and P(A) = 0.5 and P(B) = 0.4,
then the joint probability P(A and B) must be
a. 0.1
b. 0.2
c. 0.5
d. 0.9
e. cannot calculate without knowing the conditional probabilities
6.41 For a product to be delivered on time, all of the following must occur: the
order is processed within one work day, the product is shipped the following
day, and the shipment is routed through the proper regional distribution
center. If each of these events is independent and their probabilities are
0.8, 0.6, and 0.5, the probability of on-time delivery is
a. 0.50
b. 0.48
c. 0.40
d. 0.30
e. 0.24
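The multiplication rule behind question 6.41 can be sketched in a few lines of Python. This is only an illustration; the variable names are hypothetical and the three probabilities are taken from the question.

```python
from math import prod

# Step probabilities from question 6.41: processing, shipping, routing.
step_probabilities = [0.8, 0.6, 0.5]

# For independent events, P(A and B and C) = P(A) * P(B) * P(C).
p_on_time = prod(step_probabilities)

print(round(p_on_time, 2))
```

The same product applies to any number of independent steps; it would not be valid if the steps influenced one another.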
6.64 If X is a random variable with sample space {2, 6} and P(X=2) = 0.5, P(X=6) =
0.5, then 4 equals
a. the mean
b. the variance
c. the standard deviation
d. both a and b
e. both a and c
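As a rough check on question 6.64, the mean, variance, and standard deviation of a discrete random variable can be computed directly from their definitions. The dictionary layout below is an assumption made for illustration.

```python
from math import sqrt

# Distribution from question 6.64: value -> probability.
distribution = {2: 0.5, 6: 0.5}

# Mean: probability-weighted average of the values.
mean = sum(x * p for x, p in distribution.items())

# Variance: probability-weighted average of squared deviations from the mean.
variance = sum((x - mean) ** 2 * p for x, p in distribution.items())

# Standard deviation: square root of the variance.
std_dev = sqrt(variance)

print(mean, variance, std_dev)
```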
6.65 In Tampa, the probability distribution for the random variable X measuring
the price charged for weekday video tape rental is found to be:
Price Charged    P(X)
$1.50            0.10
 2.00            0.60
 2.50            0.10
 3.00            0.20
6.69 A manager may not select the investment alternative with the highest
expected profit if another investment
a. has a lower standard deviation and the manager is averse to risk
b. has a higher standard deviation and the manager is averse to risk
c. has a lower standard deviation and the manager is risk loving
d. both a and c are possible explanations of the manager's behavior
e. both b and c are possible explanations of the manager's behavior
The questions that follow are related to the following decision making problem: A
manufacturer has to decide whether to replace (R), fix (F), or ignore (I) its aging
factory equipment this year. If R is chosen, there is a 0.8 chance of no production
stoppages (NO) and a 0.2 chance of minor stoppages (MIN). If F is selected, on the
other hand, P(NO) falls to 0.5 and P(MIN) increases to 0.5. If I is chosen by the
manufacturer, P(MIN) = 0.6 and there is now a 0.4 chance for major stoppages (MAJ).
Because of the higher costs associated with replacing equipment, profits from choice
R will only be 15 if NO occurs and 5 if MIN results. For choice F, a NO outcome yields
profits of 18 and MIN results in profits of 8. In the case of choice I, MIN produces
profits of 20 but MAJ causes profits of 0.
6.70 The number of decision forks faced by this manufacturer is
a. 0
b. 1
c. 2
d. 3
e. 4
6.71 The number of chance forks faced by this manufacturer is
a. 0
b. 1
c. 2
d. 3
e. 4
6.72 The greatest expected profits for this manufacturer are derived by choosing
a. R
b. F
c. I
d. either a or b
e. either b or c
6.73 The expected profits from R is
a. 7 and 8
b. 7 and 12
c. 13 and 8
d. 13 and 12
e. none of the above
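The expected profit of each choice in the replace/fix/ignore problem is the probability-weighted sum over its chance branches. A minimal sketch, assuming the probabilities and profits stated in the case text (the data structure is hypothetical):

```python
# Each choice maps to its chance branches as (probability, profit) pairs.
choices = {
    "R": [(0.8, 15), (0.2, 5)],   # replace: NO, MIN
    "F": [(0.5, 18), (0.5, 8)],   # fix: NO, MIN
    "I": [(0.6, 20), (0.4, 0)],   # ignore: MIN, MAJ
}

# Expected profit = sum of probability * profit over the branches.
expected_profit = {
    name: sum(p * profit for p, profit in branches)
    for name, branches in choices.items()
}

for name, value in expected_profit.items():
    print(name, round(value, 2))
```

One decision fork leads to three chance forks here, so the comparison is among three expected values.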
6.74 In problems using Bayes' theorem, we
a. always seek to determine a conditional probability as our answer
b. assume that outcomes are each statistically independent
c. look for keywords such as "who", "what", and "how" to determine
whether we are dealing with marginal probabilities
d. all of the above
e. none of the above
6.75 In problems using Bayes' theorem, we
a. always seek to determine a conditional probability as our answer
b. assume that events are not statistically independent
c. look for the presence of keywords such as "given that", "if", and "when" to
identify conditional probabilities
d. all of the above
e. none of the above
Use the information from the following situation to answer questions below:
A study of past shuttle launch attempts reveals that the probability of a launch (L)
taking place was 40 percent if a heavy cloud cover (HC) was forecast, 60 percent if
a light cloud cover (LC) was forecast, and 80 percent if no clouds (NC) were
forecast. A survey of weather forecasts for the Cape informs us that clear days are
forecast three-quarters of the time and light clouds are forecast 20 percent of the
time. Assume that there are only three kinds of forecasts: HC, LC, and NC.
6.76 The probability P(HC) is
a. 0 percent
b. 5 percent
c. 30 percent
d. 37.5 percent
e. 55 percent
6.77 The 40, 60, and 80 percent probabilities given in the problem are
a. marginal probabilities
b. conditional probabilities
c. joint probabilities
d. Bayesian probabilities
e. all of the above
6.78 Using Bayes' theorem, we may use the shuttle launch and weather
information to solve for
a. P(HC|L)
b. P(LC|L)
c. P(NC|L)
d. all of the above
e. none of the above
6.79 The denominator 0.74 in Bayes' formula from the launch and weather
probabilities is calculated from the following sum:
a. 0.01 + 0.24 + 0.49
b. 0.02 + 0.12 + 0.60
c. 0.04 + 0.08 + 0.62
d. 0.12 + 0.24 + 0.38
e. 0.20 + 0.30 + 0.24
6.80 The probability that heavy clouds were forecast if a launch is known to have
occurred that day is approximately
a. 22 percent
b. 12 percent
c. 7 percent
d. 3 percent
e. 1 percent
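The shuttle launch questions follow the usual Bayes' theorem steps: build the joint probabilities, sum them to get the marginal probability of a launch, then divide. The sketch below assumes the figures given in the situation; the dictionary names are hypothetical.

```python
# Forecast probabilities; heavy clouds is whatever remains after NC and LC.
prior = {"HC": 1 - 0.75 - 0.20, "LC": 0.20, "NC": 0.75}

# Conditional probabilities of a launch given each forecast.
launch_given = {"HC": 0.40, "LC": 0.60, "NC": 0.80}

# Joint probabilities P(forecast and launch).
joint = {f: prior[f] * launch_given[f] for f in prior}

# Marginal probability of a launch: the denominator in Bayes' formula.
p_launch = sum(joint.values())

# Posterior probabilities P(forecast | launch).
posterior = {f: joint[f] / p_launch for f in joint}

print(round(p_launch, 2), round(posterior["HC"], 3))
```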
Most statistical analysis uses only a few distributions for all but one of the
following reasons:
a. we may use large sample properties
b. business variables are distributed in only a few different manners
c. we can represent many distributions by a handful of distribution families
d. business statistics relies primarily on discrete distributions
e. all of the above are explanations
7.4
7.5 The probability that exactly two computers in the lab will break down this
month is approximately
a. 0.1
b. 0.2
c. 0.3
d. 0.4
e. 0.5
7.6 The probability that no more than two computers will break down this
month is approximately
a. 0.3
b. 0.4
c. 0.5
d. 0.6
e. 0.7
7.7
7.8
7.9 A company either adopts TQM methods or it does not adopt them. If the
probability of adoption is 0.4, then, in the notation of Bernoulli trials,
a. P(S) = 0.4
b. P(F) = 0.6
c. p = 0.6
d. q = 0.4
e. all of the above
7.10 If the probability of any particular car buyer choosing the dealer's bank
financing is 0.7, then the probability that the next four buyers will each
choose the dealer's bank financing is approximately
a. 0.53
b. 0.49
c. 0.34
d. 0.24
e. 0.17
7.11 For the preceding example, out of n = 5 buyers the expected number of
buyers who accept the dealer's bank financing is
a. 0.7
b. 1.05
c. 2.5
d. 3.2
e. 3.5
7.12 The standard deviation of the number of buyers in the preceding
example is
a. 3.50
b. 1.87
c. 1.22
d. 1.05
e. 1.02
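Questions 7.10 through 7.12 rest on three binomial facts: independent successes multiply, the mean is np, and the standard deviation is the square root of np(1 - p). A small sketch under those standard formulas, with hypothetical variable names:

```python
from math import sqrt

p = 0.7  # probability a buyer chooses the dealer's bank financing

# 7.10: all of the next four independent buyers choose the financing.
p_next_four = p ** 4

# 7.11 and 7.12: binomial mean and standard deviation for n = 5 buyers.
n = 5
expected_buyers = n * p
std_dev_buyers = sqrt(n * p * (1 - p))

print(round(p_next_four, 2), round(expected_buyers, 1), round(std_dev_buyers, 2))
```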
7.13 A probability density function applies only to
a. continuous random variables
b. discrete random variables
c. distributions having both tails infinitely long
d. distributions having at least one infinitely-long tail
e. any random variables
7.14 By examining the algebraic function for the normal pdf, it is easy to see that
the density
a. is determined from only two parameters, μ and σ
b. is the same for x = 1.5 as it is for x = 1.5
c. always has a positive value
d. is a maximum at x = μ
e. all of the above
Statistical inference favors the use of larger samples because large sample
sizes do each of the following except:
a. reduce the thickness of the t distribution tails.
b. increase the √n divisor in the standard error calculations.
c. tend to make central limit theorem approximations appropriate.
d. permit the use of the normal distribution for our sampling distribution.
e. all of the above
8.2
8.3
8.4 Which of the following are not common characteristics of both the normal
distribution and t distributions:
a. Their shape depends on the number of degrees of freedom.
b. They are symmetrical.
c. Each has two infinitely-long tails.
d. They each have a single mode.
e. All of the above are characteristics of both distributions.
8.5 If mean auto sales for a random sample of n = 9 salespersons last year was
$300,000 and the sample standard deviation was $90,000, then we should
use as the standard error of the estimate a value of
a. $60,000
b. $30,000
c. $15,000
d. $7500
e. answer depends on the choice of sampling distribution.
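The standard error of a sample mean is the sample standard deviation divided by the square root of the sample size. A minimal sketch using the figures from question 8.5 (variable names are illustrative):

```python
from math import sqrt

n = 9                 # salespersons in the sample
sample_sd = 90_000    # sample standard deviation in dollars

# Standard error of the mean: s / sqrt(n).
standard_error = sample_sd / sqrt(n)

print(standard_error)
```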
8.6
8.7
8.8
8.31 The difference between a parameter and a sample point estimate of that
parameter is called the
a. sampling error
b. bias
c. variance
d. distribution
e. degrees of freedom
8.32 Which of the following does not suggest an interval estimate for the
population mean:
a. As a rough guess, I'd say we rework an average of 15 assemblies each
month.
b. Our new boss reprimands a dozen or so employees a week.
c. Flights average 10 minutes late give or take five minutes.
d. GM's market share averaged about 40 % in each of the past three decades.
e. All of the above suggest interval estimates.
8.33 Which of the following does not involve statistical inference?
a. descriptive statistics
b. point estimation
c. interval estimation
d. hypothesis testing
e. forecasting
8.34 Which information is generally found in a confidence interval?
a. a point estimate
b. an interval width
c. a confidence level
d. all of the above
e. a and c only
8.35 The probability distribution of an estimator or statistic is a
a. random sample
b. confidence interval
c. null hypothesis
d. summary statistic
e. sampling distribution
8.36 Analysts for a tire manufacturer estimate from a random sample that mean
tread life for its new radial belt tire is between 20,000 and 50,000 miles.
Which of the following strategies might be used to obtain a narrower
confidence interval?
a. Use a point estimate.
b. Collect a larger sample.
c. Use a smaller value for α.
d. Use a larger confidence level.
e. All of the above.
8.41 A statement capable of being subjected to empirical evidence is
a. a hypothesis
b. a tautology
c. a theory
d. a parameter
e. an assumption
8.42 One difference between business statistics and statistics in the sciences is
a. business statistics cannot use the scientific method
b. sciences do not need to formulate hypotheses
c. sciences do not test hypotheses
d. human behavior is not as predictable as planets and viruses
e. all of the above are differences
8.43 Each of the following is a major source of measurement error in business and
economic data except
a. many societies fail to record business and economic data
b. governments suppress data to protect confidentiality
c. people are reluctant to disclose financial information
d. businesses are reluctant to make information available to rivals
e. all of the above are sources of measurement error
8.44 In traditional hypothesis testing, we
a. try to reject the null hypothesis
b. set up the null hypothesis as a "straw man"
c. make the null and alternative hypotheses exhaustive
d. make the null and alternative hypotheses mutually exclusive
e. all of the above
e. an hypothesis test
8.50 Which of the following is not a limitation of statistical significance?
a. in large samples, significance can be found from minor patterns
b. in small samples, even substantial effect size may not result in significance
c. although we can select α, we usually don't know β
d. rejecting H0 does not necessarily support any particular HA
e. all of the above are limitations
8.51 Which of the following is not an unethical practice in hypothesis testing?
a. choosing an H0 that is easy to reject
b. always conducting one-sided tests
c. conducting hypothesis tests prior to estimating confidence intervals
d. choosing significance levels just large enough to obtain significance
e. all of the above are unethical
8.52 A one-sided test that yields significance at the α = .05 level is equivalent to
significance for the corresponding two-sided test at the
a. α = .10 level
b. α = .05 level
c. α = .025 level
d. α = .01 level
e. none of the above
8.53 One difference between hypothesis testing and interval estimation is
a. hypothesis tests may have no meaningful estimation counterpart
b. hypothesis testing involves inferential statistics
c. hypothesis tests examine the distribution centered around
d. hypothesis testing loses much of its usefulness for small samples
e. hypothesis testing is more important in forecasting problems
X
I.
9.1
9.2
9.3
9.4
9.5 The log transformation may be used when working with variables having a
a. bimodal distribution
b. symmetrical distributions
c. uniform distributions
d. distribution with two infinitely-long tails
e. highly skewed distribution
9.6
Answer the next three questions based on the following case and statistical output:
A manufacturer collects a sample of 24 monthly sales (in millions of dollars)
to estimate mean monthly sales for the population:
Confidence Intervals
Variable    N     Mean    StDev   SE Mean      90.0 % C.I.
sales      24   10.796    4.273     0.872   ( 9.301, 12.291)
9.7 For this case, the standard error of the mean is a little more than one-fifth
the size of the sample standard deviation because:
a. The square-root of the sample size is slightly less than five
b. The standard deviation is a bit less than five
c. Half the mean is slightly greater than five
d. Twice the width of the confidence interval is slightly greater than five
e. None of the above facts are relevant here
9.8 The confidence interval reported is substantially narrower than four
standard errors of the mean wide because
a. 90% confidence intervals are narrower than 95% intervals
b. confidence intervals using the t-distribution are narrower than those using
the z-distribution
c. the standard deviation is fairly small in this sample
d. we must first divide by the square root of the sample size
e. all of the above
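The relationships probed in questions 9.7 and 9.8 can be checked against the printout. The sketch below recomputes the standard error from N and StDev, then backs out how many standard errors wide the reported 90% interval is. It uses only the printed values; the variable names are hypothetical.

```python
from math import sqrt

# Values read from the confidence interval printout.
n = 24
st_dev = 4.273
lower, upper = 9.301, 12.291

# SE Mean = StDev / sqrt(N); sqrt(24) is slightly less than five.
se_mean = st_dev / sqrt(n)

# Half-width of the interval, expressed as a multiple of the standard error.
half_width = (upper - lower) / 2
multiplier = half_width / se_mean

# Full interval width in standard errors (compare with four).
width_in_se = 2 * multiplier

print(round(se_mean, 3), round(multiplier, 2), round(width_in_se, 2))
```

The implied multiplier is close to the tabled t value for 23 degrees of freedom at 90% confidence, which is why the interval spans fewer than four standard errors.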
9.12 The null and alternative hypotheses for this test on mean nonsmoker
premiums are:
a. H0: μ = $25, HA: μ > $25
b. H0: μ = $25, HA: μ ≠ $25
c. H0: μ = $25, HA: μ < $25
d. H0: μ > $25, HA: μ < $25
e. H0: μ < $25, HA: μ > $25
9.13 According to the printout, the mean premium for this sample of nonsmokers
a. is 16.94 standard errors below $25
b. is 16.39 standard errors below $25
c. is 2.59 standard errors below $25
d. is 2.34 standard errors below $25
e. is 0.012 standard errors below $25
9.14 Mean premium tests significantly less than $25 (per $1000 coverage) at any of
the following significance levels except:
a. α = .20 significance level
b. α = .10 significance level
c. α = .05 significance level
d. α = .01 significance level
e. tests significant at any of the levels selected above
9.15 If a two-tailed test had been conducted instead, the computer output would
have been exactly the same except
a. the t-ratio would have been twice as large: t = 4.68
b. the t-ratio would have been half as large: t = 1.17
c. the t-ratio would have been positive: t = +2.34
d. the p-value would have been twice as large: p = 0.024
e. the p-value would have been half as large: p = 0.006
Answers for Chapter 9:
9.1: a; 9.2: a; 9.3: b; 9.4: a; 9.5: e; 9.6: a; 9.7: a; 9.8: e; 9.9: a; 9.10: b; 9.11: a; 9.12: c; 9.13:
d; 9.14: d; 9.15: d
J.
10.5 A property assessor wants to determine whether property values vary among
houses on different-sized lots. If the assessor measures property value for a
random sample of houses in the Atlanta area in October 1993, the relation
between property value and lot size may be confounded if
a. property values differ between different regions of the country
b. property values vary from one year to the next
c. property values vary among houses different distances from downtown
Atlanta jobs and shopping
d. property values differ among houses, apartments, and commercial
e. none of the above are capable of producing confounding effects for this
experimental design
10.15 What distinguishes one-way analysis of variance from other types of ANOVA
is
a. there is only one response variable
b. there is only one factor
c. there is only one treatment
d. both b and c
e. none of the above are unique to one-way ANOVA
10.20 Which of the following is not an assumption of analysis of variance models?
a. zero overall population mean
b. constant standard deviation among the treatment populations
c. normally distributed treatment populations
d. independent, random samples
e. all of the above are assumptions of ANOVA models
10.21 Which of the following is an assumption about the random disturbance term
in analysis of variance models?
a. constant standard deviation of the random disturbance
b. normal distribution of the random disturbance
c. both a and b
d. none of the above
10.22 If treatment sample sizes are approximately equal, which of the following
can be said about the sensitivity of the assumptions for one-way analysis of
variance:
a. moderate departures from normality do not invalidate ANOVA test results
b. moderate differences among treatment standard deviations do not
invalidate ANOVA test results
c. use of non-independent or nonrandom samples does not invalidate ANOVA
test results
d. a and b are true
e. only balanced designs may be subjected to analysis of variance tests
10.23 If we wish to test whether construction worker wages are significantly
different among four different states, the number of treatments necessary
for analysis of variance must be
a. 1
b. 2
c. 3
d. 4
e. insufficient information provided to answer this question
10.24 In an ANOVA test of whether mean construction worker wages are
significantly different among four different states, not rejecting the null
hypothesis implies that
a. mean construction worker wages are the same as mean wages in other
occupations
b. mean construction worker wages are the same from one year to the next
c. mean construction worker wages are the same for each state
d. wages are the same among all construction workers
e. all of the above
10.25 In order to reject the null hypothesis that mean MPG (miles per gallon) is the
same among subcompact, compact, and mid-sized cars, we must conclude
that
a. sample mean MPGs are all different
b. at least one sample mean is different from the sample mean MPG for the
other two car sizes
c. population mean MPGs are all different
d. at least one population mean is different from the population mean MPG
for the other two car sizes
e. none of the above
10.33 The F-ratio for one-way analysis of variance is the ratio of
a. the treatment mean to the overall mean
b. the treatment mean square to the mean square error
c. the treatment sum of squares to the error sum of squares
d. the alpha value to the p-value
e. the p-value to the alpha value
Answer the following questions about the analysis of variance table below:
SOURCE    DF    SS    MS    F
FACTOR     3    18
ERROR     20
TOTAL           48
10.34 The number of treatments for the factor is
a. 1
b. 2
c. 3
d. 4
e. cannot be determined from the information given
10.35 The sample size for the experimental design is
a. 22
b. 23
c. 24
d. 25
e. 26
10.36 The error sum of squares, SSE, and mean square error, MSE, are
a. 30 and 10
b. 30 and 1.5
c. 66 and 33
d. 66 and 3.3
e. not enough information to determine
10.37 The F ratio is
a. 1
b. 3
c. 4
d. 5
e. 8
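The missing entries of a one-way ANOVA table follow from a few identities: degrees of freedom and sums of squares add up to their totals, and each mean square is a sum of squares divided by its degrees of freedom. The sketch below assumes the table reads FACTOR DF = 3 and SS = 18, ERROR DF = 20, and TOTAL SS = 48, which is the reading consistent with the answer choices; the variable names are hypothetical.

```python
# Entries assumed to be given in the ANOVA table.
df_factor = 3
df_error = 20
ss_factor = 18
ss_total = 48

# Number of treatments k and sample size n from the degrees of freedom.
treatments = df_factor + 1              # DF factor = k - 1
sample_size = df_factor + df_error + 1  # DF total = n - 1

# Error sum of squares by subtraction, then the mean squares.
ss_error = ss_total - ss_factor
ms_factor = ss_factor / df_factor
ms_error = ss_error / df_error

# F-ratio: treatment mean square over mean square error.
f_ratio = ms_factor / ms_error

print(treatments, sample_size, ss_error, ms_error, f_ratio)
```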
10.38 Determine MSTR and MSE from the following information:
(a) SSTR = 30, SSE = 480, n = 30, and k = 6
(b) SSTR = 300, SSE = 48, n = 30, and k = 6
(c) SSTR = 30, SSE = 480, n = 60, and k = 12
(d) SSTR = 30, SSE = 480, n = 12, and k = 3
(e) SSTR = 30, SSE = 48, n = 12, and k = 4
10.39 Determine the F-ratio from the following information:
(a) MSTR = 400, MSE = 120
(b) MSTR = 0.50, MSE = 0.020
(c) SSTR = 30, SSE = 480, n = 12, and k = 3
(d) SSTR = 300, SSE = 48, n = 30, and k = 6
10.42 Tests to compare individual treatment means may be conducted only if
a. the null hypothesis that all treatment means are equal was rejected
b. at least one of the sample treatment means is different from the rest
c. the sample size is large (at least 30)
d. there are at least four treatments being compared
e. not all the assumptions of the ANOVA model are valid
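[Four partially completed ANOVA tables, each with rows FACTOR, ERROR, TOTAL and columns DF, SS, MS (plus F in the last). Cell positions were lost; surviving entries are listed in their original order. Part labels (c), (d), and (e) survive without a clear table assignment.]
Table 1: DF 3, 20; SS 20, 60; MS (blank)
Table 2: DF 5; SS 20; MS (blank); further entries 29, 80
Table 3: DF, SS, MS headers followed by entries 8.5, 21, 23, 63, 80
Table 4: DF 4; SS, MS (blank); F 4.5; further entries 15, 14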
10.78 Which of the following is not common to both regression and analysis of
variance?
a. involve linear models
b. use an F-test to test for the significance of the model
c. use quantitative explanatory variables
d. use a dependent variable that is measured quantitatively
e. all of the above are common to regression and analysis of variance
10.79 Regression models are similar to analysis of variance models in all except
which of the following ways:
a. the dependent variable in regression is like the response variable in
ANOVA
b. the explanatory variables in regression are like the factors and treatments
in ANOVA
c. both use F tests for significance of the model
d. both report analysis of variance tables
e. all of the above are similarities between regression and analysis of
variance
10.80 Which of the following advice should you give someone deciding between
the use of regression and analysis of variance models?
a. use analysis of variance if you have a balanced design
b. use analysis of variance if you need to estimate a slope coefficient for an
explanatory variable
c. use regression if you have a controlled experiment with most confounding
factors held fixed
d. use regression if you have only one or two explanatory variables, each of
which is categorical
e. all of the above are good advice to give
(1) State the alternative hypothesis given that the null hypothesis is
H0: μ(5 min) = μ(10 min) = μ(15-30 min)
HA: at least one μj is different from the other means
(2) Show results of p-value decision rules for this test at the α = .05 significance level
and state your test findings in one sentence.
p = .089 > α = .05. No significant spending differences found regardless of amount of
time salesperson devoted to customer.
(3) If the manager only examines the means of the three treatment samples, he
might conclude that 10 minutes with the customer nearly quadrupled purchase
amounts ($116.1 versus $30.9), and 15 to 30 minutes doubled that figure again
($215.6 versus $116.1). Use the data listing and confidence interval diagram to
respond to this mistaken conclusion.
The overlapping confidence intervals and huge range of data within the last two
categories indicate that within-category variation dominates the between-category
variation.
(4) Is the design balanced? Explain.
Design unbalanced because fewer observations from "5 min" than from other two
categories
10.83. Many companies have United Way drives among their employees, and set
goals to surpass the previous year. Test whether marital status and the size
of one's salary have a significant effect on the percentage of salary donated to
United Way. A factorial design with replications from random sampling
results in 30 employee observations stored in three columns (i.e. stacked)
with response variable and two factors defined as follows:
donate%: response variable, percentage of salary donated to charity
marstat: factor 1, coded 1 = single, 2 = married
paycode: factor 2, coded 1 = salary under $20,000, 2 = salary $20,000 to
$30,000, and 3 = salary at least $30,000
A table of means is first generated:
ROWS: marstat    COLUMNS: paycode
            1        2        3      ALL
1      0.9284   1.4323   2.2490   1.5366
2      0.9508   1.4080   1.7954   1.3848
ALL    0.9396   1.4202   2.0222   1.4607
CELL CONTENTS -- donate%:MEAN
A two-way model (with interactions) is run, with the following results:
Analysis of Variance for donate%
Source         DF        SS       MS       F       P
marstat         1    0.1729   0.1729    0.85   0.367
paycode         2    5.8841   2.9420   14.39   0.000
Interaction     2    0.3442   0.1721    0.84   0.443
Error          24    4.9053   0.2044
Total          29   11.3064
(1) Use the p-values to conduct the F tests at the α = .05 level for each factor, and
interpret your test results verbally.
Only salary matters.
(2) Use the ALL column or ALL row of the table of means to quantify any significant
patterns found in question #1.
The significant factor, salary, was associated with employees earning at least $30,000
donating at more than twice the rate (2.02%) of employees earning under $20,000 (0.94%).
(3) Conduct a test at the .05 level for possible interactions between marital status
and salary. What does the test result allow you to conclude about whether
an additivity assumption would have been valid?
Fails the p-value decision rule (p = 0.443 > α = .05), so the interaction is not significant and an additivity assumption would have been valid.
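The three F tests in problem 10.83 can be retraced from the printed ANOVA table: each F is the effect's mean square divided by the error mean square, and the p-value decision rule compares each printed p with α. This is a sketch using the printed values; the dictionary layout is an assumption.

```python
alpha = 0.05
ms_error = 0.2044  # Error mean square from the ANOVA table

# Mean squares and p-values as printed for each effect.
effects = {
    "marstat": {"ms": 0.1729, "p": 0.367},
    "paycode": {"ms": 2.9420, "p": 0.000},
    "Interaction": {"ms": 0.1721, "p": 0.443},
}

# F-ratio for each effect: effect mean square over error mean square.
f_ratios = {name: row["ms"] / ms_error for name, row in effects.items()}

# p-value decision rule: an effect tests significant when p < alpha.
significant = [name for name, row in effects.items() if row["p"] < alpha]

for name, f in f_ratios.items():
    print(name, round(f, 2))
print("significant:", significant)
```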
11.5 If you have data for the entire population, which of the following will no
longer be a factor?
a. sampling error
b. measurement error
c. modeling error
d. errors in judgment
e. all of the above
11.6 If all regression assumptions are valid, least-squares estimators
a. are unbiased
b. have minimum standard deviation among all unbiased estimators
c. are efficient
d. all of the above
e. none of the above
11.7 Which of the following is not true about autocorrelation?
a. it results in inefficient estimation
b. it is a problem only for time series data
c. it means that εi and εj are correlated for i ≠ j
d. it results in biased estimation
e. all of the above are true
11.8 Omitting an intercept, or constant, term from a regression equation
a. is recommended whenever we suspect that E(ε) is not zero
b. means the equation cannot be estimated with least-squares analysis
c. usually improves the regression fit
d. should be avoided even if the intercept is meaningless in the equation
e. all of the above
11.9 Which of the following is not a regression assumption?
a. the parameters are constant
b. E(ε) is zero
c. ε is uncorrelated with each of the explanatory variables
d. each ε is uncorrelated with every other ε
e. all of the above are regression assumptions
Coef      Std Err   t-ratio       p
8559        85392      0.10   0.920
115.2       138.0      0.83   0.407
-90.33      70.69     -1.28   0.206
-1639        4574     -0.36   0.721
88.76       19.29      4.60   0.000
227.97      20.57     11.08   0.000
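Each t-ratio in a regression printout is the coefficient divided by its standard error. The sketch below recomputes that column from the Coef and Std Err values above, assuming the six rows pair up in the order printed (the predictor names did not survive, so none are supplied here).

```python
# Coefficients and standard errors in the order they appear in the printout.
coefs = [8559, 115.2, -90.33, -1639, 88.76, 227.97]
std_errs = [85392, 138.0, 70.69, 4574, 19.29, 20.57]

# t-ratio = coefficient / standard error, rounded as in the printout.
t_ratios = [round(c / s, 2) for c, s in zip(coefs, std_errs)]

print(t_ratios)
```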
11.14. Complete the alternative hypothesis for two-tailed significance test on the
SALETAX variable.
a) H0: β2 = 0 and HA: β2 ___ (complete the alternative hypothesis)
b) Use the p-value decision rule to conduct this test at the α = .05 level.
11.15. A one-tailed test is conducted on whether airport arrivals is directly related
to traffic volume; then
a) H0: β5 = 0 and HA: β5 ___ (complete the alternative hypothesis)
b) Use the p-value decision rule to conduct this test at the α = .05 level.
11.16. Why is a one-tailed test for the AIRARIV variable justified in this model?
11.17. Use the p-value decision rule to conduct a one-tailed test for METROPOP at
the α = .05 level.
11.18. Based on a 95% confidence level, each additional one thousand auto
tourists adds about 89 cars/month to the traffic, plus or minus ___.
11.25 A valid decision rule for a two-sided test of a regression coefficient is
a. t-ratio > t_{α/2}(n − k − 1)
b. t-ratio < t_{α/2}(n − k − 1)
c. t-ratio > t_{α}(n − k − 1)
d. t-ratio < t_{α}(n − k − 1)
e. none of the above
11.26 A valid decision rule for a two-sided test of a regression coefficient is
a. p > α
b. p < α
c. p/2 > α
d. p/2 < α
e. none of the above
11.27 Which of the following steps does not belong in the inference process
for explanatory variables?
a. use the t-ratio or p-value decision rules to determine test results
b. test alternative models at several different significance levels
c. translate test results on each coefficient into significance about the
corresponding explanatory variable
d. interpret estimates of regression coefficients as slopes
e. all of these steps belong
11.28 Which of the following steps is out of sequence in the inference
process for explanatory variables?
a. state the regression model
b. collect the sample data and estimate the regression equation
c. decide which variables are to be tested
d. determine which variables are eligible for one-sided tests
e. assign a level of significance
11.29 Which of the following does not belong with the rest in interpreting
the findings of a two-sided test of an explanatory variable X1?
a. X1 is directly related to the dependent variable in the model
b. β1 is significantly different from zero
c. we reject the null hypothesis
d. X1 is statistically significant
e. all of the above are equivalent
11.30 For the one-tailed test, a valid decision rule for an explanatory variable
to test significant is that the regression coefficient has the anticipated sign
and
a. p > α/2
b. p < α/2
c. p/2 > α
d. p/2 < α
e. none of the above
Case Study: Answer questions 11.31- 11.36
An insurance company investigates the determinants of life insurance rates. A
survey of n = 58 policyholders collects information on the three variables in the
following regression model: premium = β0 + β1 Age + β2 mortrate + ε
premium: annual premium for each $1000 in life insurance coverage (in dollars)
Age
mortrate
Regression Analysis
The regression equation is:
premium = - 0.77 + 0.364 Age + 1.40 mortrate
Predictor      Coef   Std Err   t-ratio       p
Constant     -0.765     3.463     -0.22   0.826
Age          0.3636    0.1158      3.14   0.003
mortrate     1.4038    0.2283      6.15   0.000
11.31 Which are the null-alternative hypotheses for a two-tailed significance
test on the Age variable?
a. H0: β1 = 0, HA: β1 > 0
b. H0: β1 = 0, HA: β1 ≠ 0
c. H0: β1 = 0, HA: β1 < 0
d. H0: β1 > 0, HA: β1 < 0
e. H0: β1 < 0, HA: β1 > 0
11.32 According to the regression printout above,
a. Age is more than three standard errors greater than mean age
b. Age is about 0.36 standard errors greater than 0
c. The sample regression coefficient of Age is about 0.36 standard errors greater than 0
d. The sample regression coefficient of Age is more than three standard errors greater
than 0
e. The population regression coefficient of Age is about 0.36 standard errors greater than
0
11.33 If Age is tested at the α = .05 significance level, then we may conclude
each of the following except:
a. p < α
b. Reject H0
c. The coefficient of Age is significantly different from zero
d. Age has a significant and direct effect on life insurance premiums
e. All of the above are valid
11.34 Each additional year of age adds an average of ______ to insurance
premiums (other things equal)?
a. 36 cents with a margin of error of about 3.5 cents
b. 36 cents with a margin of error of about 23 cents
c. 11 cents with a margin of error of about 3 cents
d. $3.14 with a margin of error of 12 cents
e. Cannot be answered because Age is not statistically significant
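The margin of error for a regression coefficient is roughly a t critical value times the coefficient's standard error. The sketch below uses a critical value of 2.0, which is an assumption: a rule-of-thumb approximation for a 95% interval with this many degrees of freedom, not a value printed in the output.

```python
# Age row of the regression printout.
coef_age = 0.3636     # dollars of premium per additional year of age
std_err_age = 0.1158

# Assumed rule-of-thumb critical value for a 95% confidence level.
t_critical = 2.0

# Margin of error = critical value * standard error.
margin_of_error = t_critical * std_err_age

print(round(coef_age * 100), "cents, plus or minus about",
      round(margin_of_error * 100), "cents")
```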
11.35 Construct the null-alternative hypotheses for a one-tailed significance
test that mortrate is directly related to premium.
a. H0: β2 = 0, HA: β2 > 0
b. H0: β2 = 0, HA: β2 ≠ 0
c. H0: β2 = 0, HA: β2 < 0
d. H0: β2 > 0, HA: β2 < 0
e. H0: β2 < 0, HA: β2 > 0
11.36 Mortality rates would have a significant, direct relationship with
premiums at any of the following significance levels except:
a. α = .20 significance level
b. α = .10 significance level
c. α = .05 significance level
d. α = .01 significance level
e. tests significant at any of the levels selected above
s = 26.51    R-sq = 83.1%    R-sq(adj) = 81.8%
Analysis of Variance
SOURCE        DF        SS       MS        F       p
Regression          227866    %#$@*    &%#@!   0.000
Error                46388      703
Total               274254
11.39 Complete the null and alternative hypotheses lines for the F-test of this
model:
H0: β1 = ___ = 0
HA: βj ≠ 0 for at least one j, where j = 1 through ___.
11.40. Using the p-value decision rule, p is (less / greater) than α, so we (reject /
cannot reject) the null hypothesis, and we therefore conclude that this
expressway traffic model (tests / does not test) significant at the α = .01 level.
11.41. This test is also equivalent to the following test for the population R-square,
ρ²:
H0: ρ² = 0 HA: ρ² ___ (complete the alternative hypothesis)
Test findings from question 11.40 allow us to conclude that the model fit ( is / is
not ) significant.
11.42. A garbled Fax transmission made some items in the output unreadable.
From your knowledge of the analysis of variance table, the degrees of
freedom for the regression must equal ___, the mean square regression is
therefore ___, and the F-ratio must then be equal to _______.
11.45 If for a particular explanatory variable, the sample regression
coefficient is 10.5 and its standard deviation is 3.5, then the t-ratio is
a. 7.0
b. 14
c. 0.33
d. 3.0
e. insufficient information provided to determine the t-ratio
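The t-ratio for testing H0: βj = 0 is the sample coefficient divided by its standard error. A short Python sketch of that arithmetic (the function name is illustrative):

```python
def t_ratio(coefficient: float, std_error: float) -> float:
    """t-ratio for H0: beta_j = 0: coefficient divided by its standard error."""
    return coefficient / std_error


print(t_ratio(10.5, 3.5))
```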
11.46 Which of the following can be determined from t-tests on regression
results?
a. whether the model is significant
b. whether specific explanatory variables are significant
c. whether the regression fit is significant
d. all of the above
e. none of the above
11.47
11.56
Which of the following has an F distribution?
a. the ratio of sums of squares
b. mean squares
c. sums of squares
d. the ratio of mean squares
e. all of the above
11.57
A regression model is more likely to test significant if
a. the sample size n is large
b. there are many explanatory variables in the model
c. R² is small
d. the α used for the test is small
e. all of the above
11.58
If SSE = 600, SSR = 300, n = 29, and k = 4, then the F-ratio is
a. 6
b. 4
c. 3
d. 1/2
e. none of the above
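The F-ratio is the mean square regression divided by the mean square error, with k and n - k - 1 degrees of freedom. A minimal Python sketch under that standard definition (the function name is illustrative):

```python
def f_ratio(ssr: float, sse: float, n: int, k: int) -> float:
    """F = MSR / MSE, where MSR = SSR / k and MSE = SSE / (n - k - 1)."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse


print(f_ratio(ssr=300, sse=600, n=29, k=4))
```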
Answer questions 11.59-11.61 based on a regression analysis of variance table;
unfortunately, a defect in the printer causes it to garble some entries, including
the degrees of freedom (DF) for the error and the SSR.
SOURCE       DF    SS        MS        F        p
Regression   3     *$#@!#    @#!@$#    &%$#@    0.000
Error        &#    84.80     %$#@##
Total        19    204.05
11.59
The sample used to generate this ANOVA table must contain
a. n = 20 observations
b. n = 19 observations
c. n = 17 observations
d. n = 16 observations
e. not enough information provided
11.60
SSR is equal to
a. 288.85
b. 119.25
c. 39.75
d. 20
e. not enough information provided
11.61
The F-ratio for the table is approximately
a. 8.91
b. 7.50
c. 1.406
d. 0.469
e. not enough information provided
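The missing entries of an ANOVA table follow from its additive structure: DF and SS each add up to the Total row, and each MS is SS divided by DF. A minimal Python sketch using only the legible entries of the table above (the function and key names are illustrative):

```python
def complete_anova(df_regression: int, df_total: int, sse: float, sst: float) -> dict:
    """Fill in an ANOVA table from regression DF, total DF, SSE, and SST."""
    df_error = df_total - df_regression
    ssr = sst - sse
    msr = ssr / df_regression
    mse = sse / df_error
    return {
        "n": df_total + 1,  # total DF is n - 1
        "df_error": df_error,
        "SSR": ssr,
        "MSR": msr,
        "MSE": mse,
        "F": msr / mse,
    }


table = complete_anova(df_regression=3, df_total=19, sse=84.80, sst=204.05)
for name, value in table.items():
    print(name, round(value, 3))
```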
11.62 Solve for the F-ratio given the following information:
(a) n = 30, k = 4, SSR = 200, SSE = 1000
(b) MSR = 60, s = 20
(c) n = 36, k = 2, SST = 120, R² = 0.60
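Each part supplies a different set of ingredients for the same F-ratio. The sketch below shows the three equivalent routes in Python. It assumes that s is the standard error of the estimate (so MSE = s²) and that the 0.60 in part (c) is the R-square; the function names are illustrative.

```python
def f_from_sums(ssr: float, sse: float, n: int, k: int) -> float:
    """F from sums of squares: (SSR / k) / (SSE / (n - k - 1))."""
    return (ssr / k) / (sse / (n - k - 1))


def f_from_msr_and_s(msr: float, s: float) -> float:
    """F from MSR and the standard error of the estimate, since MSE = s**2."""
    return msr / s**2


def f_from_r_squared(r_sq: float, n: int, k: int) -> float:
    """F from R-square: (R2 / k) / ((1 - R2) / (n - k - 1)); SST cancels out."""
    return (r_sq / k) / ((1 - r_sq) / (n - k - 1))


print(f_from_sums(200, 1000, n=30, k=4))
print(f_from_msr_and_s(60, 20))
print(f_from_r_squared(0.60, n=36, k=2))
```

The third form shows why SST is not needed in part (c): both SSR and SSE are proportional to SST, so it drops out of the ratio.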
11.63 Complete the following ANOVA table by calculating the MS column and F
ratio:
SOURCE       DF    SS     MS    F
Regression   4     136
Error        21    630
Total        25    769
11.64 Complete the omitted information from the following ANOVA table if the
model contains 3 explanatory variables and the sample size is 32.
SOURCE       DF    SS    MS    F
Regression         60
Error
Total              90
11.65 Complete the following ANOVA table, determine the R², and test whether
R² is significantly greater than 0 at the .01 level.
SOURCE       DF    SS     MS    F
Regression   6     240
Error              120
Total        18
11.66 Complete the following ANOVA table, determine the R² and s, and test whether
the model is significant at the .01 level.
SOURCE  DF  SS  MS  F
Regression  2  0.30
Error  50  0.04
Total
11.67 Determine four things that are wrong in the following ANOVA table:
SOURCE       DF    SS     MS    F      p
Regression   4     180    40    2.0    .001
Error        25    500    20
Total        30    640
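The arithmetic identities of an ANOVA table can be checked mechanically. The sketch below is a hypothetical Python helper (not part of any statistics package) that tests whether DF and SS add up, whether each MS equals SS / DF, and whether F equals the printed MSR / MSE. It does not check the p-value, which would require the F distribution.

```python
def check_anova(df_reg, df_err, df_tot, ss_reg, ss_err, ss_tot,
                ms_reg, ms_err, f, tol=0.01):
    """Return a list of arithmetic inconsistencies found in an ANOVA table."""
    problems = []
    if df_reg + df_err != df_tot:
        problems.append("regression DF + error DF does not equal total DF")
    if abs(ss_reg + ss_err - ss_tot) > tol:
        problems.append("SSR + SSE does not equal SST")
    if abs(ss_reg / df_reg - ms_reg) > tol:
        problems.append("MSR does not equal SSR / regression DF")
    if abs(ss_err / df_err - ms_err) > tol:
        problems.append("MSE does not equal SSE / error DF")
    if abs(ms_reg / ms_err - f) > tol:
        problems.append("F does not equal printed MSR / printed MSE")
    return problems


problems = check_anova(df_reg=4, df_err=25, df_tot=30,
                       ss_reg=180, ss_err=500, ss_tot=640,
                       ms_reg=40, ms_err=20, f=2.0)
for problem in problems:
    print(problem)
```

Any remaining discrepancy, such as a p-value that looks too small for the printed F-ratio, has to be judged against an F table with the appropriate degrees of freedom.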
Case Study: United Way predicts donations of two firms using information on
wages, employment, and last year's giving. They collect a random sample of 55
participating companies and fit the model:
Giving = β0 + β1 Wages + β2 EMP + β3 GiveLast + ε
where the variables in the model are defined as:
Giving     Total amount raised for United Way at each company (in dollars)
Wages      Average employee annual wage (in thousands of dollars)
EMP        Number of employees at the company
GiveLast   Total amount raised last year at the same company (in dollars)
Use the descriptive statistics and regression to answer questions 11.68-11.71:
Variable    N    Mean    Median   TrMean   StDev   SEMean   Min     Max
Wages       55   27.42   28.00    26.98    10.41   1.40     12.00   60.00
EMP         55   203.0   150.0    180.9    143.0   19.3     100.0   800.0
GiveLast    55   6202    4460     5425     6238    841      100.0   28208
The regression equation is
Giving = - 449 + 43.8 Wages + 2.93 EMP + 0.661 GiveLast
s = 2210    R-sq = 81.1%    R-sq(adj) = 80.0%
Fit    Stdev.Fit    95.0% C.I.      95.0% P.I.
?      298          (4841, 6038)    (962, 9917)
Values for FIRM 1: Wages = 27.5, EMP = 200, and GiveLast = 6200
Fit      Stdev.Fit    95.0% C.I.       95.0% P.I.
26104    12251        (1503, 50704)    (1106, 51101) XX
Values for FIRM 2: Wages = 45, EMP = 5000, and GiveLast = 15000
X denotes a row with X values away from the center
XX denotes a row with very extreme X values
11.68 The predicted value for Firm 1's giving is $________
11.69 We are 95% confident that Firm 1 will donate between approximately $____
and $____ this year.
11.70. The margin of error for the prediction of Firm 1's giving at the 95% level of
confidence is about ______ times the _______ because the sample size is
_________ and Wages, EMP, and GiveLast are near their _______ values.
11.71. The XX extrapolation warning attached to Firm 2's prediction is a result
of trying to predict corporate giving based on a level of _________ outside
the sample data range used to fit the model.
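A point prediction comes from substituting a firm's values into the fitted equation, and the printed intervals can be used to cross-check it. The following Python sketch uses the rounded coefficients from the output for Firm 1; the dictionary and variable names are illustrative, and small discrepancies from the printout are expected because of rounding.

```python
# Rounded coefficients from the fitted equation in the output.
coef = {"Constant": -449.0, "Wages": 43.8, "EMP": 2.93, "GiveLast": 0.661}
firm1 = {"Wages": 27.5, "EMP": 200, "GiveLast": 6200}

# Fitted value: intercept plus coefficient times value for each variable.
fit = coef["Constant"] + sum(coef[name] * value for name, value in firm1.items())

# The fit is the midpoint of both printed intervals.
ci_low, ci_high = 4841, 6038
pi_low, pi_high = 962, 9917
ci_mid = (ci_low + ci_high) / 2

# Margin of error of the prediction interval, expressed in units of s.
s = 2210
pi_margin_in_s = (pi_high - pi_low) / 2 / s

print(round(fit, 1), ci_mid, round(pi_margin_in_s, 2))
```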
11.72
Which of the following measures the interval for the average value of
the dependent variable given particular values of the explanatory variables?
a. the prediction interval for Y
b. the confidence interval for the conditional mean of Y
c. the forecast interval for Y
d. the univariate confidence interval for Y
e. none of the above
11.73
For which of the following situations would I use a prediction interval?
a. estimating the number of defects in a car rolling off the assembly line at 4
P.M.
b. estimating the mean time spent by a sales clerk with customers of a
particular age
c. estimating the average number of years that CEOs retain their jobs if they
have been with that same corporation 10 years
d. all of the above
e. none of the above
11.74
The standard error of the conditional mean is related to the
standard error of the estimate s in that the former is equal to
a. s
b. s/√n
c. s only when all explanatory variables are at their mean
d. s/√n only when all explanatory variables are at their mean
e. always larger than s
11.75
Which of the following is true about the standard error of the
conditional mean s?
a. s is larger the further the explanatory variables are from their means
b. s is larger the closer the explanatory variables are to their means
c. s is unaffected by the explanatory variables, only is affected
d. s is larger the closer the explanatory variables are to one another
e. none of the above
11.76
In comparing the confidence interval for the conditional mean with the
prediction interval, which of the following is true?
a. prediction intervals are larger and more affected by extreme values for
explanatory variables
b. prediction intervals are larger but less affected by extreme values for
explanatory variables
c. prediction intervals are smaller but more affected by extreme values for
explanatory variables
d. prediction intervals are smaller and less affected by extreme values for
explanatory variables
e. comparisons depend on the specific model and sample being analyzed
11.77
In regression models, forecast intervals are a type of
a. confidence interval for the conditional mean
b. univariate confidence interval for the variable being forecast
c. prediction interval
d. confidence interval for the regression coefficient
e. none of the above
11.78 A consortium of Florida cities hires a budget analyst to model the factors
affecting police force size and predict the police force for three newly incorporated
cities. The model is fit for n = 56 middle-sized cities (populations 25,000 to 100,000)
using the following model: FORCE = β0 + β1 VIOL + β2 PRTX + β3 POP + β4 OLD% + ε
where the variables in the model are defined as:
FORCE   police force size (number of officers on the city police force)
VIOL    violent crime rate (as a percent of total crimes)
PRTX    property tax (city tax in dollars per capita)
POP     city population (in thousands)
OLD%    elderly share of population (percent at least 65)
After generating descriptive univariate statistics on the independent variables, the
regression model is fit and three predictions made:
Variable   N    Mean     Median   TrMean
VIOL       56   7.661    6.750    6.952
PRTX       56   164.8    103.0    142.1
POP        56   52.38    47.55    51.57
OLD%       56   10.900   10.550   10.616
s = 20.36    R-sq = 85.0%    R-sq(adj) = 83.8%
Analysis of Variance
SOURCE       DF    SS        MS     F    p
Regression   4     119858                0.000
Error        51    21144     415
Total        55    141002
Fit      Stdev.Fit   95.0% C.I.          95.0% P.I.
89.71    2.73        (84.24, 95.19)      (48.46, 130.96)       PREDICTION 1
179.34   9.33        (160.60, 198.08)    (134.36, 224.31)      PREDICTION 2
66.19    18.90       (28.24, 104.13)     (10.41, 121.97) XX    PREDICTION 3
XX denotes a row with very extreme X values
Answer the following questions based on the preceding output and model:
A)
A printing error caused the MSR to be missing from the Analysis of Variance
table. The MSR is equal to
a. 290
b. 29,965
c. 98,714
d. 479,432
e. none of the above
B)
The same printing error caused the F-ratio also to be omitted from the
Analysis of Variance table. The F-ratio is approximately
a. 12
b. 17
c. 72
d. 98
e. none of the above
C)
D) After conducting the F-test at the α = .01 significance level using the p-value
decision rule, what should we do?
a. reject H0 and conclude that the model is significant
b. reject H0 and conclude that the model is not significant
c. cannot reject H0 and conclude that the model is significant
d. cannot reject H0 and conclude that the model is not significant
e. insufficient information provided to conduct the test
E)
We are 95% confident that the mean police force for hundreds of cities with
the same characteristics as the first city is between approximately which two
values?
a. 90 to 95
b. 84 to 90
c. 84 to 95
d. 48 to 131
e. 69 to 110
F)
Which of the following is not true about the prediction for the first city?
a. the point prediction is based on explanatory variables near their sample
means
b. the prediction interval is approximately four times the standard error of
the estimate
c. the prediction interval is narrower than for virtually any other possible
prediction
d. all of the above are true
G)
The prediction interval is wider for the second city than for the first city
because
a. the second prediction involves extrapolation
b. the second prediction involves explanatory variables that are not near the
sample means
c. the second prediction involves a larger police force
d. the second prediction is made after the first
e. none of the above
H)
In this case, the prediction for the third city causes an "XX" warning to be
issued on the printout because
a. one of the explanatory variables lies outside the range of the sample data
b. two of the explanatory variables lie outside the range of the sample data
c. three of the explanatory variables lie outside the range of the sample data
d. all four of the explanatory variables lie outside the range of the sample
data
e. the warning is not relevant to this case because no time series forecasting
is involved
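The omitted ANOVA entries and both intervals for the first prediction can be rebuilt from the numbers that are legible in the output. The Python sketch below assumes the standard formulas: the confidence interval is Fit ± t × Stdev.Fit and the prediction interval is Fit ± t × sqrt(s² + Stdev.Fit²). The critical value 2.008 is an assumed table value for 51 degrees of freedom, and results match the printout only up to rounding.

```python
import math

# ANOVA entries readable in the police force output.
ssr, sse = 119858.0, 21144.0
df_regression, df_error = 4, 51

msr = ssr / df_regression
mse = sse / df_error
f_ratio = msr / mse

# Prediction 1: rebuild both intervals from Fit, Stdev.Fit, and s.
fit, stdev_fit, s = 89.71, 2.73, 20.36
t_crit = 2.008  # ASSUMPTION: approximate t(.025) table value, 51 degrees of freedom

ci = (fit - t_crit * stdev_fit, fit + t_crit * stdev_fit)
pi_se = math.sqrt(s**2 + stdev_fit**2)  # adds the error variance to the fit variance
pi = (fit - t_crit * pi_se, fit + t_crit * pi_se)

print(round(msr, 1), round(f_ratio, 1))
print([round(v, 2) for v in ci], [round(v, 2) for v in pi])
```

Because s is much larger than Stdev.Fit near the sample means, the prediction interval is dominated by s, which is why its total width is close to four times the standard error of the estimate.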
11.79
To perform rate studies and issue municipal bonds, counties need to model and
estimate electrical power usage needs. Counties also use the resulting information to
determine whether it is more profitable to purchase on the spot market, enter into
long-term supply contracts with utilities, or build their own generating facilities.
Monroe County in the Key West resort area of Florida gathers quarterly time series
data on T = 28 quarters from 1982-1988 on the following variables:
power_t = residential power usage (in millions of kilowatt hours) during the tth quarter
DD cool_t = cooling degree days (a measure of temperature)
customer_t = number of billed residences (in thousands) during the tth quarter
retail_t = Florida taxable retail sales (in billions of dollars) during the tth quarter
A consultant for the county's utility board obtains the following regression output
on this data:
Predict for ddcool = 400, customer = 15, retail = 6;
Predict for ddcool = 1197, customer = 16.36, retail = 6.77.
The regression equation is
power = - 48.8 + 0.0104 DD cool + 4.38 customer - 0.21 retail
Predictor   Coef       StErr      t-ratio   p
Constant    -48.83     11.12      -4.39     0.000
DD cool     0.010438   0.001517   6.88      0.000
customer    4.379      1.084      4.04      0.000
retail      -0.210     1.452      -0.14     0.886
s = 3.325    R-sq = 80.4%    R-sq(adj) = 78.0%
Analysis of Variance
SOURCE       DF    SS         MS      F      p
Regression   3     1043.06    347.7   32.8   0.000
Error        24    254.28     10.6
Total        27    1297.34
Fit      Stdev.Fit   95% C.I.             95% P.I.
19.771   1.642       ( 16.382, 23.160)    ( 12.116, 27.425)   PREDICTION 1
33.884   0.628       ( 32.587, 35.181)    ( 26.899, 40.868)   PREDICTION 2
MEAN 'DD cool' = 1197.0; MEAN 'customer' = 16.359; Mean 'retail'= 6.7732
Answer the following questions based on the preceding output and model:
A)
Modeling and F-Test of the Model: Formally present the model being
estimated.
Hint: Use variable names (DD cool, customer, and retail) and the beta (β) parameters as
variable coefficients:
B) The hypotheses associated with the F-test on the model may be constructed in
terms of the betas (β's) of the model as:
C) Conduct the F-test at the α = .01 significance level using the p-value decision
rule and then state your conclusion in one sentence.
D) Check that the 19.771 Fit from PREDICTION 1 is correct:
Hint: Use the fitted equation to predict power usage when there are 400 cooling degree
days during the quarter, 15 thousand customers and retail sales are $6 billion.
E) Based on PREDICTION 1, which 95% confidence interval would you report if you
were forecasting power usage for a particular quarter with those explanatory
variable values?
F) Based on PREDICTION 1, which 95% confidence interval would you report if you
wanted to capture average power usage for many quarters with those
explanatory variable values?
G) Examine PREDICTION 2, which is based on 1197 cooling degree days, 16,360
customers, and $6.77 billion in retail sales. Explain why the standard deviation of
the fit for PREDICTION 2 is so much smaller (only 0.63) than the "Stdev.Fit"
for PREDICTION 1 (1.64).
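Two of the checks requested above reduce to short calculations. The Python sketch below evaluates the fitted equation with the unrounded coefficients from the Predictor table, and then compares the Stdev.Fit of PREDICTION 2 with s/√n, the value the standard error of the conditional mean takes when every explanatory variable is at its sample mean. The function name is illustrative.

```python
import math


def predict_power(ddcool: float, customer: float, retail: float) -> float:
    """Fitted power usage from the coefficients in the Predictor table."""
    return -48.83 + 0.010438 * ddcool + 4.379 * customer - 0.210 * retail


fit1 = predict_power(ddcool=400, customer=15, retail=6)
fit2 = predict_power(ddcool=1197, customer=16.36, retail=6.77)

# At the sample means the standard error of the fit reduces to s / sqrt(n).
s, n = 3.325, 28
stdev_fit_at_means = s / math.sqrt(n)

print(round(fit1, 3), round(fit2, 3), round(stdev_fit_at_means, 3))
```

PREDICTION 2 uses values that are essentially the sample means listed under the output, which is the case where Stdev.Fit is at its minimum.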
11.80 A bank analyst collects monthly data on the economy for the period just prior
to the 1990-92 recession. The sample consists of monthly time series data on the U.S.
economy from 1988 until the middle of 1990, 30 consecutive months. The variables
in the model are defined
unemp
conf
starts
invent
Stdev      t-ratio   p
1.104      8.61      0.000
0.008559   -0.52     0.607
0.1971     -1.45     0.158
0.001652   -5.74     0.000
A) Formally present in equation form the model being estimated. Use the variable
names (unemp, conf, starts, and invent) instead of Y, X1, X2, and X3, and don't
forget to use the beta (β) parameters as variable coefficients.
B) Complete the null-alternative hypotheses for a two-tailed significance test of the
'invent' variable. [use the proper βj]: H0: ___ and HA: ___
C) Next, construct the one-tailed hypothesis to test whether there is an inverse
relationship of housing starts with the dependent variable.
[use the proper βj]: H0: ___ and HA: ___
D) Explain in one sentence why we are justified in conducting the one-tailed test in
question (C) above.
E) Using the p-value decision rule, conduct each of the two tests from questions B
and C at the α = .05 level; in each case, show which two numbers you compared
and determine whether each null hypothesis can or cannot be rejected.
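The p-value decision rule compares p with α directly for a two-tailed test; for a one-tailed test the printed two-tailed p is halved when the sign of the t-ratio agrees with the alternative hypothesis. The Python sketch below illustrates the rule. It assumes that the rows of the output appear in the order Constant, conf, starts, invent, which the surviving printout does not confirm, and the function name is illustrative.

```python
def one_tailed_p(t_ratio: float, p_two_tailed: float, expect_negative: bool) -> float:
    """Convert a printed two-tailed p-value into a one-tailed p-value."""
    sign_matches = (t_ratio < 0) if expect_negative else (t_ratio > 0)
    return p_two_tailed / 2 if sign_matches else 1 - p_two_tailed / 2


alpha = 0.05

# ASSUMPTION: the last output row belongs to invent, the third row to starts.
invent_p = 0.000
starts_t, starts_p = -1.45, 0.158

reject_invent = invent_p < alpha  # two-tailed test
starts_p_one = one_tailed_p(starts_t, starts_p, expect_negative=True)
reject_starts = starts_p_one < alpha  # one-tailed test for an inverse relationship

print(reject_invent, round(starts_p_one, 3), reject_starts)
```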
L.
12.6 One difference between the sign test and the t-test is that the sign test
a. does not assume a normal distribution of the sampling statistic
b. tests hypotheses related to the mean
c. involves only the sum of ranks
d. all of the above
e. none of the above
12.7 One difference between the sign test and the Wilcoxon test is that the
Wilcoxon test
a. does not assume a normal distribution of the sampling statistic
b. tests hypotheses related to the mean
c. involves only the sum of ranks
d. all of the above
e. none of the above
12.8 Given a null hypothesis of M = 10, for which of the following samples would
the sign test yield different results:
a. 5, 5, 5, 5, 5, 15
b. 1, 1, 1, 1, 9, 15
c. 5, 5, 5, 5, 5, 1000
d. 5, 5, 5, 5, 15, 15
e. all of the above would yield the same sign test results
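The sign test uses only the number of observations below and above the hypothesized median, not their magnitudes. A minimal Python sketch that tallies those counts for each sample (the function name is illustrative):

```python
def sign_counts(sample, m0):
    """Return (number below m0, number above m0); ties with m0 are dropped."""
    below = sum(1 for x in sample if x < m0)
    above = sum(1 for x in sample if x > m0)
    return below, above


samples = {
    "a": [5, 5, 5, 5, 5, 15],
    "b": [1, 1, 1, 1, 9, 15],
    "c": [5, 5, 5, 5, 5, 1000],
    "d": [5, 5, 5, 5, 15, 15],
}
for label, sample in samples.items():
    print(label, sign_counts(sample, m0=10))
```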
12.9 Given a null hypothesis of M = 10, for which of the following samples would
the Wilcoxon test yield different results
a. 1, 2, 3, 4, 15
b. 6, 7, 8, 9, 15
c. 1, 2, 3, 4, 1000
d. 9, 9, 9, 9, 12
e. all of the above would yield the identical Wilcoxon test results
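The Wilcoxon signed-rank test ranks the absolute deviations from the hypothesized median and sums the ranks separately for positive and negative deviations, so it depends on the ordering of the deviations but not on their exact sizes. The following Python sketch computes the two rank sums, using average ranks for ties; the function name is illustrative and no p-value is computed.

```python
def signed_rank_sums(sample, m0):
    """Return (W+, W-): rank sums of positive and negative deviations from m0."""
    diffs = [x - m0 for x in sample if x != m0]  # drop observations equal to m0
    ordered = sorted(diffs, key=abs)

    # Assign average ranks to groups of tied absolute deviations.
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        rank_of[abs(ordered[i])] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    w_plus = sum(rank_of[abs(d)] for d in diffs if d > 0)
    w_minus = sum(rank_of[abs(d)] for d in diffs if d < 0)
    return w_plus, w_minus


samples = {
    "a": [1, 2, 3, 4, 15],
    "b": [6, 7, 8, 9, 15],
    "c": [1, 2, 3, 4, 1000],
    "d": [9, 9, 9, 9, 12],
}
for label, sample in samples.items():
    print(label, signed_rank_sums(sample, m0=10))
```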
12.10 For nonparametric tests on the two samples 5, 6, 7, 8, 9, 20 and 4, 5, 6, 7, 8,
11 using a null hypothesis of M = 10, we would be
a. less likely to reject H0 with the first sample if we used the Wilcoxon test
b. less likely to reject H0 with the first sample if we used the sign test
c. less likely to reject H0 with the first sample if we used either test
d. equally likely to reject H0 for the first and second sample if we used the
sign test
e. both a and d
MEDIAN
86.65
B) Use the histogram to decide which test (or tests) are justified by the
distributional assumptions. Carefully explain your reasoning.
Histogram of fuelcost   N = 32
Midpoint   Count
50         1   *
55         4   ****
60         1   *
65         1   *
70         1   *
75         0
80         3   ***
85         6   ******
90         6   ******
95         4   ****
100        4   ****
105        1   *
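One way to examine the shape of the distribution is to approximate the mean from the histogram midpoints and compare it with a reported median. The Python sketch below does this; it assumes that each observation sits at its class midpoint, and that the median of 86.65 printed before the histogram refers to fuelcost, which the surviving text does not confirm.

```python
# Midpoint -> count, as read from the fuelcost histogram.
histogram = {50: 1, 55: 4, 60: 1, 65: 1, 70: 1, 75: 0,
             80: 3, 85: 6, 90: 6, 95: 4, 100: 4, 105: 1}

n = sum(histogram.values())

# Grouped-data approximation: treat every observation as its class midpoint.
grouped_mean = sum(midpoint * count for midpoint, count in histogram.items()) / n

reported_median = 86.65  # ASSUMPTION: this printed median refers to fuelcost
print(n, grouped_mean, grouped_mean < reported_median)
```

A mean noticeably below the median, together with the cluster of low values around 55, points away from a symmetric bell shape, which bears on whether tests that assume normality or symmetry are justified.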