DSCI 305
1. A regression model describes the causal relationship between a response variable and a set of
explanatory variables. True
2. The least squares method minimizes the total variation in the response variable of a regression model.
False. The least squares method minimizes the sum of squares due to errors (SSE), which is the
unexplained portion of the total variation in the response variable.
3. The least squares method maximizes the variation explained by the regression model.
True. Since SST = SSR + SSE and SST is fixed by the data, minimizing SSE is equivalent to maximizing SSR.
5. If all the points in a scatter diagram between X and Y are on a straight line, SSE = 0.
True.
6. The larger the sample size, the larger the proportion of variation explained by the regression model.
False. The coefficient of determination or the explanatory power of a regression model would not
be improved by increasing the sample size. It would be improved by adding new explanatory
variables which are significantly related to the response variable.
7. To test the relationship between the response variable and the explanatory variable in a simple
regression model, one can use either the t test or the F test.
True
8. The degrees of freedom of SSE (the error sum of squares) in a simple linear regression model are the
sample size minus two.
True
9. Regression results are usually insensitive to unusual observations such as outliers and influential
observations.
False. Regression results are sensitive to unusual observations such as outliers and influential
observations
10. An observation far away from the mean of the explanatory variable could be an influential
observation.
True. By contrast, an observation far away from the regression line is called an outlier.
11. The condition of equal error variances over all relevant ranges of the variables in a regression model is
called homoscedasticity.
True
12. If the normal probability plot of residuals in a regression model is close to a straight line, one can
assume that the sample observations of the response variable are from a Normal distribution.
True
13. The width of a confidence interval for the mean Y in a simple regression model decreases as the
sample size increases.
True. The width of a confidence interval for the mean Y and the width of a prediction interval for the
individual Y depend on the confidence level, the data dispersion, the sample size, and the distance
of a given X value from its mean.
14. The width of a confidence interval for the individual Y becomes wider as the value of X is farther
away from its mean.
True
15. For a given X value in a simple regression model, a prediction interval for the individual Y is always
wider than a confidence interval for the mean Y.
True
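Statements 2, 3, and 13-15 above can be checked numerically. The sketch below is only an illustration: the x and y values are invented, not course data, and the helper names se_fit and se_pred are hypothetical. It fits a least squares line, confirms that SST = SSR + SSE, and compares the standard error for the mean Y with the standard error for an individual Y at a point near the mean of X and at a point farther away.

```python
import math

# Hypothetical toy data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.3, 3.8, 6.1, 8.2, 9.7, 12.4]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares estimates for y = b0 + b1 * x.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

# Sum of squares decomposition: SST = SSR + SSE.
sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((fi - y_bar) ** 2 for fi in fitted)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# Standard error of regression, with n - 2 degrees of freedom.
s = math.sqrt(sse / (n - 2))


def se_fit(x0):
    """Standard error for estimating the mean of Y at x0."""
    return s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)


def se_pred(x0):
    """Standard error for predicting an individual Y at x0."""
    return s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)


print(f"SST = {sst:.4f}, SSR + SSE = {ssr + sse:.4f}")
for x0 in (x_bar, x_bar + 3):
    print(f"x0 = {x0:.1f}: se_fit = {se_fit(x0):.4f}, se_pred = {se_pred(x0):.4f}")
```

Both standard errors grow as x0 moves away from the mean of X, and se_pred exceeds se_fit at every x0 because of the extra "1 +" term under the square root. For a fixed confidence level, the interval widths follow the same ordering.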
The following description is for questions 16-34.
How important is long driving to get a low score in golf? To investigate this, data on 122 players on the
PGA tour in 2017 are examined. The data are given in “Quiz 4 review questions (DATA).” The average
driving distance per round is used to predict the average score using the simple linear regression model.
16. What are the response and the explanatory variables in this study?
The response variable is the average score and
the explanatory variable is the average driving distance per round
17. What are the degrees of freedom of the quantity s, the standard error of regression?
The degrees of freedom of s are n – 2 = 122 – 2 = 120
19. What are the hypotheses of the t and F procedures to test the significance of the relationship
between the response and the explanatory variables?
H0: β1 = 0, Ha: β1 ≠ 0
22. Construct a 95% confidence interval for the slope β1 in the simple linear regression model.
23. What is the proportion of the variation in the average score explained by the regression model?
7.77% (r^2 = 0.0777)
24. What is the correlation between the average score and the average driving distance?
Since the slope of the regression line is negative, the correlation is negative: r = −√0.0777 = −0.2787
26. Construct the normal probability plot of the residuals. Evaluate the normality assumption.
The normal probability plot appears to be a straight line except for three extreme points. The residuals
are close to normal.
27. Construct a plot of residuals against the average driving distance. Evaluate the homoscedasticity
assumption.
There is no pattern in the residual plot to suspect any violation of homoscedasticity
30. According to the regression line, find a point estimate of the average score in 2017 for PGA tour
players who averaged 300 yards per drive.
S = 78.7835 – 0.0239 D =78.7835 − 0.0239*300 = 71.60
31. Find the standard error of fit associated with a 95% confidence interval for the average score of all
PGA tour players who averaged 300 yards per drive. Use an approximation formula.
32. Construct a 95% confidence interval for the average score of all PGA tour players who averaged 300
yards per drive.
33. Find the standard error of prediction associated with a 95% prediction interval for the average score of
a PGA tour player who averaged 300 yards per drive. Use an approximation formula.
34. Construct a 95% prediction interval for the average score of a PGA tour player who averaged 300
yards per drive.
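The answers to questions 17, 24, and 30 can be reproduced from the figures quoted above. This is a minimal sketch, not the course's worked solution: the variable names are illustrative, and the intercept and slope are the rounded values shown in the answer to question 30, so the last digit of the point estimate may differ slightly from the quoted value.

```python
import math

# Values quoted in the answers above (rounded as shown there).
b0 = 78.7835        # intercept of the fitted line
b1 = -0.0239        # slope: change in average score per yard
r_squared = 0.0777  # proportion of variation explained
n = 122             # number of players

# Degrees of freedom of s in simple linear regression.
df_error = n - 2

# The correlation has the same sign as the slope.
r = math.copysign(math.sqrt(r_squared), b1)

# Point estimate of the average score at 300 yards per drive.
distance = 300
predicted_score = b0 + b1 * distance

print(f"df = {df_error}")
print(f"r = {r:.4f}")
print(f"predicted score at {distance} yards = {predicted_score:.2f}")
```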
SUMMARY OUTPUT

Regression Statistics
Multiple R           ?
R Square             ?
Adjusted R Square    0.5500
Standard Error       ?
Observations         22

ANOVA
             df    SS     MS     F    Sig. F
Regression   ?     ?      220    ?    0.0000
Residual     ?     ?      ?
Total        ?     385
40. Find the value of the t statistic for testing the hypotheses H0: β1 = 0, Ha: β1 ≠ 0.
41. Find the value of the F statistic for testing the hypotheses H0: β1 = 0, Ha: β1 ≠ 0.
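The missing entries of the summary output can be worked out from the three numbers it does give (22 observations, regression MS = 220, total SS = 385). The sketch below assumes a simple regression with one explanatory variable, so the regression degrees of freedom are 1. That assumption is not stated in the table itself, but the hypotheses involve only β1, and the adjusted R Square that results can be compared with the 0.5500 shown. The variable names are illustrative.

```python
import math

# Values given in the summary output.
n = 22         # Observations
msr = 220.0    # Regression MS
sst = 385.0    # Total SS

# Assumption: simple regression, one explanatory variable.
p = 1

# Degrees of freedom.
df_reg = p
df_err = n - p - 1
df_tot = n - 1

# Sums of squares and mean squares.
ssr = msr * df_reg
sse = sst - ssr
mse = sse / df_err

# Test statistics. In simple regression F = t^2, so only |t| can be
# recovered here; the sign of t follows the slope, which is not shown.
f_stat = msr / mse
t_abs = math.sqrt(f_stat)

# Regression statistics.
r_squared = ssr / sst
multiple_r = math.sqrt(r_squared)
adj_r_squared = 1 - (sse / df_err) / (sst / df_tot)
s = math.sqrt(mse)

print(f"df: regression = {df_reg}, residual = {df_err}, total = {df_tot}")
print(f"SSR = {ssr:.2f}, SSE = {sse:.2f}, MSE = {mse:.4f}")
print(f"F = {f_stat:.4f}, |t| = {t_abs:.4f}")
print(f"R^2 = {r_squared:.4f}, adjusted R^2 = {adj_r_squared:.4f}, s = {s:.4f}")
```

Under this assumption the computed adjusted R^2 agrees with the 0.5500 in the table, which supports reading the output as a one-variable model.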
A regression line obtained by the least squares method minimizes the sum of squared residuals.
o True
The debt-to-capital ratio of a company signifies the amount of financial risk a company is
taking. If the ratio is high, the risk is high. The data on debt-to-capital ratio and return on
capital for 24 health care companies are given in "Quiz 4 (DATA)." Develop a simple linear
regression model to predict the return on capital using the debt-to-capital ratio, and answer the
following questions:
o Explanatory variable: the debt-to-capital ratio
The sample correlation between the return on capital and the debt-to-capital ratio is (Give your
answer to 3 decimal places.)
o -.519
o The correlation takes the sign of the slope, so check whether the slope is negative.
The proportion of variation in the return on capital explained by the debt-to-capital ratio is
(Give your answer to 3 decimal places.)
o .269
Based on the regression output, we conclude at α = 5% that the linear relationship
between the response variable and the explanatory variable is
o Significant since the P-Value is less than the significance level
The lower and upper limits of a 95% confidence interval for the population slope β1 are
(Give your answer to 3 decimal places.)
o Lower: -.732
o Upper: -.115
Consider company #5 where the debt-to-capital ratio = 15.6 and the return on capital =
31.8. The residual of this observation is (Give your answer to 1 decimal place.)
o 3.8
o predicted return on capital = 34.614 - 0.423 x 15.6 = 28.0,
residual = 31.8 - 28.0 = 3.8
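The correlation, confidence interval, and residual answers above fit together, and a few lines of arithmetic make that visible. This is a rough sketch using only the rounded figures quoted in the answers; the variable names are illustrative. It derives r from R^2 with the sign of the slope, checks that the confidence interval is centered on the slope estimate, and recomputes the residual for company #5.

```python
import math

# Rounded values quoted in the answers above.
b0 = 34.614                   # intercept
b1 = -0.423                   # slope
r_squared = 0.269             # proportion of variation explained
ci_lower, ci_upper = -0.732, -0.115

# Correlation: square root of R^2, carrying the sign of the slope.
r = math.copysign(math.sqrt(r_squared), b1)

# A confidence interval for the slope is symmetric about the estimate,
# so its midpoint should match b1 up to rounding.
ci_midpoint = (ci_lower + ci_upper) / 2

# Residual = observed value minus predicted value (company #5).
debt_to_capital = 15.6
observed_return = 31.8
predicted_return = b0 + b1 * debt_to_capital
residual = observed_return - predicted_return

print(f"r = {r:.3f}")
print(f"CI midpoint = {ci_midpoint:.4f} (slope estimate {b1})")
print(f"predicted = {predicted_return:.1f}, residual = {residual:.1f}")
```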
Which of the following plots is the normal probability plot of the residuals?
o Plot A
The standard error of prediction associated with a 95% prediction interval for the return on
capital when the debt-to-capital ratio is 20% is approximately
o 10.975 × √(1 + 1/24)
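The approximate standard error of prediction takes the standard error of regression s = 10.975 and the sample size n = 24 and drops the term for the distance of X from its mean, leaving s√(1 + 1/n). The sketch below evaluates that expression. It also shows s/√n as the matching approximation for the standard error of fit; that companion formula is an assumption here, since the source does not write it out.

```python
import math

s = 10.975   # standard error of regression, from the answer above
n = 24       # number of companies

# Approximate standard error of prediction (individual Y):
# the exact formula's (x0 - x_bar)^2 / Sxx term is dropped.
se_pred_approx = s * math.sqrt(1 + 1 / n)

# Assumed companion approximation for the standard error of fit (mean Y).
se_fit_approx = s / math.sqrt(n)

print(f"approximate SE of prediction = {se_pred_approx:.4f}")
print(f"approximate SE of fit        = {se_fit_approx:.4f}")
```

The approximations are closest when the X value of interest is near the sample mean of X, where the dropped term is smallest.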
A plot of the residuals versus the lagged residuals is shown below. This plot shows (Note
that this is not a residual plot but a lagged residual plot.)
The coefficient of determination of the model is (Give your answer to 3 decimal places.)
o .778
o SSR = SST – SSE = 310.67-69.12=241.55
o R^2 = SSR/SST = 241.55/310.67 = .778
The degrees of freedom associated with SSE are
o 13
o n – p – 1 = 15 – 1 – 1 = 13
The standard error of regression is (Give your answer to 3 decimal places.)
o 2.306
o MSE = SSE/DFE = 69.12/13 = 5.3169
o s = sqrt(MSE) = 2.306
The F statistic for testing H0: β1 = 0 is (Give your answer to 2 decimal places.)
o 45.41
o F = MSR/MSE
o MSR = SSR/p = 241.55/1 = 241.55
o MSE = SSE/(n – p – 1) = 69.12/13 = 5.31692
o F = 241.55/5.31692= 45.41
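The chain of calculations in the last four answers can be collected into one short script. This is a sketch built only from the figures shown above (SST = 310.67, SSE = 69.12, n = 15, one explanatory variable); the variable names are illustrative. Because those sums of squares are rounded, the F ratio computed from them may differ from the quoted answer in the second decimal place.

```python
import math

# Figures taken from the answers above.
sst = 310.67   # total sum of squares
sse = 69.12    # error sum of squares
n = 15         # sample size
p = 1          # number of explanatory variables

# Explained variation and coefficient of determination.
ssr = sst - sse
r_squared = ssr / sst

# Degrees of freedom for error, and the standard error of regression.
df_err = n - p - 1
mse = sse / df_err
s = math.sqrt(mse)

# F statistic for testing H0: beta1 = 0.
msr = ssr / p
f_stat = msr / mse

print(f"SSR = {ssr:.2f}, R^2 = {r_squared:.3f}")
print(f"df(SSE) = {df_err}, MSE = {mse:.5f}, s = {s:.3f}")
print(f"F = {f_stat:.2f}")
```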