
Quiz 4 Review Questions

DSCI 305

Questions 1-15 are True/False questions.

1. A regression model describes the causal relationship between a response variable and a set of
explanatory variables. True

2. The least squares method minimizes the total variation in the response variable of a regression model.
False. The least squares method minimizes the sum of squares due to error (SSE), which is the
unexplained portion of the total variation in the response variable.

3. The least squares method maximizes the variation explained by the regression model.
True. Because SST = SSR + SSE and SST does not depend on the fitted line, minimizing SSE is equivalent to maximizing SSR.
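Questions 2 and 3 rest on the identity SST = SSR + SSE. The minimal Python sketch below fits the least squares line with the usual formulas b1 = Sxy/Sxx and b0 = ȳ − b1x̄ and then checks the identity. It uses a small hypothetical dataset made up for illustration, not the course data. Because SST depends only on the y values, the line that makes SSE smallest also makes SSR largest.

```python
def least_squares(x, y):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for the least squares fit of y on x."""
    b0, b1 = least_squares(x, y)
    y_bar = sum(y) / len(y)
    fitted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # explained variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # unexplained variation
    return sst, ssr, sse


# Hypothetical data for illustration only
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
sst, ssr, sse = sums_of_squares(x, y)
print(f"SST={sst:.4f}  SSR={ssr:.4f}  SSE={sse:.4f}  R^2={ssr / sst:.4f}")
```

Replacing y with points that lie exactly on a line, for example y = [3, 5, 7, 9, 11], gives SSE = 0 and SSR = SST, so R² = 1. This matches questions 4 and 5.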

4. If SSR = SST in a regression model, the coefficient of determination is zero.
False. If SSR = SST in a regression model, the coefficient of determination is one.

5. If all the points in a scatter diagram between X and Y are on a straight line, SSE = 0.
True.

6. The larger the sample size, the larger the proportion of variation explained by the regression model.
False. The coefficient of determination or the explanatory power of a regression model would not
be improved by increasing the sample size. It would be improved by adding new explanatory
variables which are significantly related to the response variable.

7. To test the relationship between the response variable and the explanatory variable in a simple
regression model, one can use either the t test or the F test.
True

8. The degrees of freedom of SSE (the error sum of squares) in a simple linear regression model are the
sample size minus two.
True

9. Regression results are usually insensitive to unusual observations such as outliers and influential
observations.
False. Regression results are sensitive to unusual observations such as outliers and influential
observations.

10. An observation far away from the mean of the explanatory variable could be an influential
observation.
True. Such an observation has high leverage and can strongly influence the fitted line. (An observation far away from the regression line is called an outlier.)

11. The condition of equal error variances over all relevant ranges of the variables in a regression model is
called homoscedasticity.
True

12. If the normal probability plot of residuals in a regression model is close to a straight line, one can
assume that the sample observations of the response variable are from a Normal distribution.
True

13. The width of a confidence interval for the mean Y in a simple regression model decreases as the
sample size increases.
True. The width of a confidence interval for the mean Y and the width of a prediction interval for the
individual Y depend on the confidence level, the data dispersion, the sample size, and the distance
of a given X value from its mean.

14. The width of a prediction interval for the individual Y becomes wider as the value of X is farther
away from its mean.
True

15. For a given X value in a simple regression model, a prediction interval for the individual Y is always
wider than a confidence interval for the mean Y.
True
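Questions 13-15 follow from the standard error formulas for simple regression at a given value x0. The standard error of fit is s√(1/n + (x0 − x̄)²/Sxx), and the standard error of prediction is s√(1 + 1/n + (x0 − x̄)²/Sxx), where Sxx is the sum of squared deviations of X from its mean. The Python sketch below evaluates both. The summary values s, n, x̄, and Sxx are hypothetical and chosen only to show the patterns.

```python
import math


def se_fit(s, n, x_bar, sxx, x0):
    """Standard error of the estimated mean response at x0."""
    return s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)


def se_pred(s, n, x_bar, sxx, x0):
    """Standard error for predicting an individual response at x0."""
    return s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)


# Hypothetical summary values, chosen only to illustrate the patterns
s, n, x_bar, sxx = 2.0, 25, 50.0, 400.0
for x0 in (50.0, 55.0, 60.0):
    print(x0, round(se_fit(s, n, x_bar, sxx, x0), 4), round(se_pred(s, n, x_bar, sxx, x0), 4))
```

Both intervals use the same t* multiplier. The extra 1 under the square root therefore makes the prediction interval wider at every x0, and both intervals widen as x0 moves away from x̄.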

The following description is for questions 16-34.

How important is long driving to get a low score in golf? To investigate this, data on 122 players on the
PGA tour in 2017 are examined. The data are given in “Quiz 4 review questions (DATA).” The average
driving distance per round is used to predict the average score using the simple linear regression model.

16. What are the response and the explanatory variables in this study?
The response variable is the average score and
the explanatory variable is the average driving distance per round

17. What are the degrees of freedom of the quantity s, the standard error of regression?
The degrees of freedom of s are n − 2 = 122 − 2 = 120.

18. What is the estimated regression line?
S = 78.7835 − 0.0239 D, where S is the average score and D is the average driving distance.

19. What are the hypotheses of the t and F procedures to test the significance of the relationship
between the response and the explanatory variables?
H0: β1 = 0, Ha: β1 ≠ 0

20. Perform the t test at α = 5%.
Reject the null hypothesis since the P-value is less than 0.05.

21. Perform the F test at α = 1%.
Reject the null hypothesis since the P-value is less than 0.01.

22. Construct a 95% confidence interval for the slope β1 in the simple linear regression model.

23. What is the proportion of the variation in the average score explained by the regression model?
7.77% (r² = 0.0777)

24. What is the correlation between the average score and the average driving distance?
r = −√0.0777 = −0.2787. The correlation is negative because it has the same sign as the slope of the regression line.
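In simple regression, r² equals the coefficient of determination, and r has the same sign as the slope. You can therefore recover r from R² and the sign of b1. Here is a small Python sketch; the function name is illustrative.

```python
import math


def correlation_from_r2(r_squared, slope):
    """Recover r from R^2 in simple regression; r takes the sign of the slope."""
    return math.copysign(math.sqrt(r_squared), slope)


r = correlation_from_r2(0.0777, -0.0239)   # golf model: R^2 and slope from above
print(round(r, 4))
```

The same function also works for the later debt-to-capital item: correlation_from_r2(0.269, -0.423) rounds to −0.519.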

25. Are there any outliers in the data?
One of the standard residuals is 3.634 (greater than 3), so that observation is a potential outlier. All other standard
residuals are between −3 and 3.
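The outlier rule used in question 25 flags any observation whose standardized residual is below −3 or above 3. The minimal Python sketch below assumes each standardized residual is computed as e_i / s. Excel's Standard Residuals column may use a slightly different divisor. The residual values are hypothetical.

```python
def flag_outliers(residuals, s, cutoff=3.0):
    """Return the indices of observations whose standardized residual e / s
    lies outside [-cutoff, cutoff]."""
    flagged = []
    for i, e in enumerate(residuals):
        if abs(e / s) > cutoff:  # |standardized residual| > 3 marks a potential outlier
            flagged.append(i)
    return flagged


# Hypothetical residuals; s = 1.1326 is the golf model's standard error of regression
residuals = [0.5, -1.2, 4.2, 0.9, -2.1]
print(flag_outliers(residuals, s=1.1326))
```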

26. Construct the normal probability plot of the residuals. Evaluate the normality assumption.
The normal probability plot is close to a straight line except for three extreme points, so the residuals
are close to normal.

27. Construct a plot of residuals against the average driving distance. Evaluate the homoscedasticity
assumption.
The residual plot shows no pattern that would suggest a violation of homoscedasticity.

28. Are the residuals independent from each other?
The lagged residual plot appears random, so we accept the assumption that the residuals are
independent.
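A lagged residual plot graphs each residual against the residual just before it in data order. The Python sketch below shows how those plotted pairs are formed. It also computes the lag-1 correlation as a numeric summary; this summary is an addition and not part of the original question. The residuals are hypothetical. A correlation near zero is consistent with a random-looking lagged plot.

```python
import math


def lagged_pairs(residuals):
    """Pair each residual e_t with the one before it: (e_(t-1), e_t)."""
    return list(zip(residuals[:-1], residuals[1:]))


def lag1_correlation(residuals):
    """Pearson correlation between e_(t-1) and e_t over the lagged pairs."""
    pairs = lagged_pairs(residuals)
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical residuals listed in data order
residuals = [0.4, -0.3, 0.1, 0.5, -0.6, 0.2]
for prev, curr in lagged_pairs(residuals):
    print(prev, curr)  # each printed pair is one point on a lagged residual plot
print(round(lag1_correlation(residuals), 3))
```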

29. Which of the following conclusions seems most justified?
(a) There is no evidence of a relationship between the average driving distance and the average score
in 2017 data.
(b) There is distinct evidence (P-value less than 0.05) that there is a positive correlation between
the average driving distance and the average score in 2017 data.
(c) There is distinct evidence that PGA tour players who averaged longer driving distance had
lower average scores in 2017.
(d) The presence of strongly influential observations in these data makes it impossible to draw any
conclusions about the relationship between the average driving distance and the average score in
2017 data.
(c) Since the P-value is 0.0019, there is statistically significant evidence of a negative
correlation.

30. According to the regression line, find a point estimate of the average score in 2017 for PGA tour
players who averaged 300 yards per drive.
S = 78.7835 − 0.0239 D = 78.7835 − 0.0239 × 300 = 78.7835 − 7.17 ≈ 71.6

31. Find the standard error of fit associated with a 95% confidence interval for the average score of all
PGA tour players who averaged 300 yards per drive. Use an approximation formula.

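32. Construct a 95% confidence interval for the average score of all PGA tour players who averaged 300
yards per drive.
First, compute the point estimate: 78.7835 − 0.0239 × 300 ≈ 71.6.
Next, find t* with the Excel function T.INV.2T(0.05, 120).
The approximate standard error of fit is s/√n = 1.1326/√122, that is, the standard error of regression divided by the square root of the number of observations.
The interval is the point estimate ± t* × (standard error of fit).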

33. Find the standard error of prediction associated with a 95% prediction interval for the average score of
a PGA tour player who averaged 300 yards per drive. Use an approximation formula.

34. Construct a 95% prediction interval for the average score of a PGA tour player who averaged 300
yards per drive.
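The calculations for questions 30-34 can be sketched in Python. The sketch makes three assumptions:

- The standard error of fit is approximated by s/√n, as in the notes for question 32.
- The standard error of prediction is approximated by s√(1 + 1/n), the form used later for the debt-to-capital example.
- t* is hardcoded as 1.97993, the value of T.INV.2T(0.05, 120).

The coefficients are the rounded values printed above, so the point estimate comes out at about 71.61.

```python
import math

# Values quoted in questions 17-32 for the golf model
B0, B1 = 78.7835, -0.0239   # intercept and slope (rounded, as printed)
S = 1.1326                  # standard error of regression
N = 122                     # number of players
T_STAR = 1.97993            # value of T.INV.2T(0.05, 120), hardcoded


def golf_intervals(distance):
    """Point estimate, approximate standard errors, 95% CI and 95% PI at a driving distance."""
    point = B0 + B1 * distance
    se_fit = S / math.sqrt(N)            # approximation for the mean Y
    se_pred = S * math.sqrt(1 + 1 / N)   # approximation for an individual Y
    ci = (point - T_STAR * se_fit, point + T_STAR * se_fit)
    pi = (point - T_STAR * se_pred, point + T_STAR * se_pred)
    return point, se_fit, se_pred, ci, pi


point, se_fit, se_pred, ci, pi = golf_intervals(300)
print(f"estimate {point:.2f}, SE fit {se_fit:.4f}, SE prediction {se_pred:.4f}")
print(f"95% CI ({ci[0]:.2f}, {ci[1]:.2f}), 95% PI ({pi[0]:.2f}, {pi[1]:.2f})")
```

Both approximations drop the term that grows with the distance of X from its mean. As a result, they understate both standard errors whenever X is not at its mean.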

The following description is for questions 35-41.

Below is an Excel output of a simple regression model: Y = β0 + β1X + ε.

SUMMARY OUTPUT

Regression Statistics
Multiple R           ?
R Square             ?
Adjusted R Square    0.5500
Standard Error       ?
Observations         22

ANOVA
             df    SS     MS     F     Significance F
Regression   ?     ?      220    ?     0.0000
Residual     ?     ?      ?
Total        ?     385

             Coefficients    Standard Error    t Stat    P-value
Intercept    5.4             2.270             ?         0.0274
X            2.7             0.523             ?         0.0000

35. How much variation in Y cannot be explained by the regression model?
SSR = MSR × df(Regression) = 220 × 1 = 220
SSE = SST − SSR = 385 − 220 = 165, so SSE = 165 is the variation in Y that the model cannot explain.

36. Find the coefficient of determination of the regression model.

37. Find the degrees of freedom associated with SSE.

38. What is the standard error of regression?

39. Construct a 99% confidence interval for β1.

40. Find the value of the t statistic for testing the hypotheses H0: β1 = 0, Ha: β1 ≠ 0.

41. Find the value of the F statistic for testing the hypotheses H0: β1 = 0, Ha: β1 ≠ 0.
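The missing entries for questions 35-41 follow from the ANOVA identities. The Python sketch below fills them in from the values shown in the output. It hardcodes t* = 2.84534, the value of T.INV.2T(0.01, 20), to avoid a SciPy dependency. The last calculation recomputes adjusted R² as a consistency check against the 0.5500 in the output.

```python
import math

# Values shown in the Excel output (n = 22 observations)
N = 22
SST = 385.0          # total sum of squares
MSR = 220.0          # regression mean square
B1, SE_B1 = 2.7, 0.523
T_STAR_99 = 2.84534  # value of T.INV.2T(0.01, 20), hardcoded

df_reg = 1                   # one explanatory variable
df_err = N - 2               # degrees of freedom of SSE (question 37)
ssr = MSR * df_reg
sse = SST - ssr              # unexplained variation (question 35)
r_squared = ssr / SST        # coefficient of determination (question 36)
mse = sse / df_err
s = math.sqrt(mse)           # standard error of regression (question 38)
ci_99 = (B1 - T_STAR_99 * SE_B1, B1 + T_STAR_99 * SE_B1)  # question 39
t_stat = B1 / SE_B1          # question 40
f_stat = MSR / mse           # question 41

# Consistency check against the printed Adjusted R Square of 0.5500
adj_r_squared = 1 - (1 - r_squared) * (N - 1) / (N - 2)

print(f"SSE={sse:.0f}  R^2={r_squared:.4f}  df={df_err}  s={s:.4f}")
print(f"99% CI for beta1: ({ci_99[0]:.3f}, {ci_99[1]:.3f})  t={t_stat:.3f}  F={f_stat:.3f}")
print(f"adjusted R^2 = {adj_r_squared:.4f}")
```

In simple regression, the t statistic for the slope satisfies t² = F. Here 5.163² ≈ 26.65 versus F ≈ 26.67; the small gap comes from the rounded coefficient and standard error.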

• A regression line obtained by the least squares method minimizes the sum of squared residuals.
o True
• The debt-to-capital ratio of a company signifies the amount of financial risk a company is
taking. If the ratio is high, the risk is high. The data on debt-to-capital ratio and return on
capital for 24 health care companies are given in "Quiz 4 (DATA)." Develop a simple linear
regression model to predict the return on capital using the debt-to-capital ratio, and answer the
following questions:
o The debt-to-capital ratio
• The sample correlation between the return on capital and the debt-to-capital ratio is (Give your
answer to 3 decimal places.)
o -.519
o Check the sign of the slope: r has the same sign as the slope, so r is negative here.

• The proportion of variation in the return on capital explained by the debt-to-capital ratio is
(Give your answer to 3 decimal places.)
o .269
• Based on the regression output, we conclude at α = 5% that the linear relationship
between the response variable and the explanatory variable is
o Significant, since the P-value is less than the significance level

• The lower and upper limits of a 95% confidence interval for the population slope β1 are
(Give your answer to 3 decimal places.)
o Lower: -.732
o Upper: -.115
• Consider company #5 where the debt-to-capital ratio = 15.6 and the return on capital =
31.8. The residual of this observation is (Give your answer to 1 decimal place.)
o 3.8
o predicted return on capital = 34.614 - 0.423 × 15.6 = 28.0,
residual = 31.8 - 28.0 = 3.8
• Which of the following plots is the normal probability plot of the residuals?
o Plot A

• How many potential outliers are in the data?
o No outliers
• A point estimate for the return on capital when the debt-to-capital ratio is 20% is (Give your
answer to 1 decimal place.)
o 26.2
o predicted return on capital = 34.614 - 0.423 × 20 = 26.2

• The standard error of prediction associated with a 95% prediction interval for the return on
capital when the debt-to-capital ratio is 20% is approximately
o 10.975 × √(1 + 1/24)

• A plot of the residuals versus the lagged residuals is shown below. This plot shows (Note
that this is not a residual plot but a lagged residual plot.)
o No distinct violation of the independence of errors


• For a given confidence level, a confidence interval for the mean Y is wider than a
prediction interval for an individual Y.
o False
• In a simple regression model, the width of a confidence interval for the response variable,
evaluated at X = Xi, increases
o As the distance of Xi from the mean of the explanatory variable increases.
• Which of the following charts is most appropriate to examine homoscedasticity?
o A residual plot
• The errors in a simple linear regression model are assumed to be normal, homoscedastic,
and independent from each other.
o True
• If the spread of residuals either increases or decreases as an independent variable increases,
the assumption of constant variance is not met. This problem is called heteroscedasticity.
o True
• In a simple regression model, the square of the correlation between the response variable
and the explanatory variable is equal to the coefficient of determination.
o True
• Below is an Excel output of a simple regression model:
• The coefficient of determination of the model is (Give your answer to 3 decimal places.)
o .778
o SSR = SST − SSE = 310.67 − 69.12 = 241.55
o R² = SSR/SST = 241.55/310.67 = .778
• The degrees of freedom associated with SSE are
o 13
o n − p − 1 = 15 − 1 − 1 = 13
• The standard error of regression is (Give your answer to 3 decimal places.)
o 2.306
o MSE = SSE/DFE = 69.12/13 = 5.3169
o s = √MSE = 2.306
• The F statistic for testing H0: β1 = 0 is (Give your answer to 2 decimal places.)
o 45.43
o F = MSR/MSE
o MSR = SSR/p = 241.55/1 = 241.55
o MSE = SSE/(n − p − 1) = 69.12/13 = 5.31692
o F = 241.55/5.31692 = 45.43
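The arithmetic in the debt-to-capital items and in the second Excel output can be recomputed with a short Python sketch. It uses only numbers quoted in the answers above. It assumes the approximate standard error of prediction is s√(1 + 1/n), with s = 10.975 and n = 24.

```python
import math

# Debt-to-capital example: fitted line and s quoted in the answers above
B0, B1 = 34.614, -0.423
S_DEBT, N_DEBT = 10.975, 24


def predict_return(debt_ratio):
    """Predicted return on capital from the fitted line."""
    return B0 + B1 * debt_ratio


residual_5 = 31.8 - predict_return(15.6)          # company #5: observed minus predicted
estimate_20 = predict_return(20)                  # point estimate at a ratio of 20
se_pred_20 = S_DEBT * math.sqrt(1 + 1 / N_DEBT)   # approximate SE of prediction

# Second Excel output: sums of squares quoted in the answers above
SST, SSE, N, P = 310.67, 69.12, 15, 1             # P = number of explanatory variables
ssr = SST - SSE
r_squared = ssr / SST
df_err = N - P - 1
mse = SSE / df_err
s = math.sqrt(mse)
msr = ssr / P
f_stat = msr / mse

print(f"residual #5 = {residual_5:.1f}, estimate at 20 = {estimate_20:.1f}, SE pred = {se_pred_20:.3f}")
print(f"R^2 = {r_squared:.3f}, df = {df_err}, s = {s:.3f}, F = {f_stat:.2f}")
```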
