
AB1202 – Statistics and Quantitative Methods In-Lecture Exercise Solutions

AB1202 – STATISTICS AND QUANTITATIVE METHODS


In-Lecture Exercise (ILE) Solutions (Week 10)
Lecture 10 – Model Building

Q1. Which of the following statements is definitely TRUE?


(a) An order 3 polynomial model is better than an order 2 polynomial model
(b) An order 2 polynomial model is better than an order 1 polynomial model
(c) An order 1 polynomial model is better than an order 0 polynomial model [CORRECT]

Q2. A qualitative random variable with a sample space of {red, green, blue}
______________________.
(a) can be modeled in a regression using 2 binary variables [CORRECT]
(b) can be modeled in a regression using 2 normally distributed variables
(c) can only be modeled in a regression using 3 frequency count variables
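Why two binary variables are enough: one level serves as the baseline and is coded with both dummies equal to 0. A third dummy would be redundant, because X_red = 1 - X_green - X_blue is perfectly collinear with the intercept. The Python sketch below assumes "red" as the baseline (any level would do). The function name dummy_encode is hypothetical.

```python
# Hypothetical encoding: "red" is taken as the baseline level, so it needs
# no dummy of its own; any of the three levels could play this role.
LEVELS = ("red", "green", "blue")


def dummy_encode(colour: str) -> tuple[int, int]:
    """Return (X_green, X_blue) for one observation; red -> (0, 0)."""
    if colour not in LEVELS:
        raise ValueError(f"unknown level: {colour!r}")
    return int(colour == "green"), int(colour == "blue")


for colour in LEVELS:
    print(colour, dummy_encode(colour))
# red (0, 0)
# green (1, 0)
# blue (0, 1)
```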

Q3. A regression model must have only continuous explanatory random variables, since
otherwise the notion of "gradient" would not quite make sense. However, in the airline
seating model discussed in the lecture slides, the model was given as
y = 9.8994 - 0.1063A + 0.948D - 0.144X_E - 1.1368X_B, where X_E and X_B have a sample
space of {0, 1}. The best explanation to reconcile the two is:
(a) the binary variables are automatically treated by regression software as
continuous random variables to complete the regression
(b) the binary variables basically force the regression model to regress only the
economy, or only the business, or only the first-class passengers, but never
mix them together [CORRECT]
(c) the binary variables provide insight into reality by interpolating between
values in the sample space
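Plugging the dummies into the Q3 equation shows what they do: each class receives its own intercept, while the slopes with respect to A and D are shared, so the "gradient" is only taken along the continuous variables. This Python sketch assumes that first class is the baseline (X_E = X_B = 0), that X_E = 1 marks economy and that X_B = 1 marks business. A and D are treated as plain numeric inputs, with their meaning defined in the lecture slides. The helper names are hypothetical.

```python
# Coefficients of the airline seating model quoted in Q3.
# Assumption: first class is the baseline (X_E = X_B = 0); X_E = 1 marks
# economy and X_B = 1 marks business.
INTERCEPT = 9.8994
B_A, B_D, B_E, B_B = -0.1063, 0.948, -0.144, -1.1368

CLASS_DUMMIES = {"first": (0, 0), "economy": (1, 0), "business": (0, 1)}


def predict(a: float, d: float, seat_class: str) -> float:
    """Predicted y for given A, D and seat class."""
    x_e, x_b = CLASS_DUMMIES[seat_class]
    return INTERCEPT + B_A * a + B_D * d + B_E * x_e + B_B * x_b


def class_intercept(seat_class: str) -> float:
    """Intercept of the fitted plane for one class (A = D = 0)."""
    return predict(0.0, 0.0, seat_class)


for seat_class in CLASS_DUMMIES:
    print(seat_class, round(class_intercept(seat_class), 4))
# first 9.8994
# economy 9.7554
# business 8.7626

# The slope with respect to A is the same in every class: the dummies only
# shift the intercept, giving one parallel plane per class.
for seat_class in CLASS_DUMMIES:
    slope_a = predict(1.0, 0.0, seat_class) - predict(0.0, 0.0, seat_class)
    print(seat_class, round(slope_a, 4))
```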

Q4. Which statement is TRUE about multicollinearity?


(a) it is unavoidable in practice [CORRECT]
(b) it can be eliminated altogether in an optimized regression model
(c) it is very much sought after in any regression modeling work
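One way to see why some multicollinearity is unavoidable: even explanatory variables generated completely independently of each other show a nonzero sample correlation. The minimal NumPy sketch below uses simulated data. The seed and sample size are arbitrary.

```python
import numpy as np

# Two explanatory variables generated independently of each other.
rng = np.random.default_rng(seed=1)  # arbitrary seed for reproducibility
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Their population correlation is 0, but their sample correlation is not.
r = np.corrcoef(x1, x2)[0, 1]
print(f"sample correlation r = {r:.3f}")
```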

Q5. Which of the following CANNOT provide insight into the presence of multicollinearity?
(a) Variance Inflation Factor
(b) Covariance Matrix [CORRECT]
(c) Correlation Matrix
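A plausible reading of this answer is that covariance entries depend on the units of measurement, so their size alone says little about how strongly two variables are related. Correlations and VIFs are unit-free. The NumPy sketch below shows this on simulated data by rescaling one variable. It also uses the standard identity that the VIFs are the diagonal of the inverse correlation matrix of the explanatory variables.

```python
import numpy as np

rng = np.random.default_rng(seed=2)  # simulated data, arbitrary seed
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)  # x2 strongly related to x1

X = np.column_stack([x1, x2])
X_rescaled = np.column_stack([1000 * x1, x2])  # x1 recorded in other units

# The covariance between x1 and x2 changes 1000-fold with the units...
cov = np.cov(X, rowvar=False)[0, 1]
cov_rescaled = np.cov(X_rescaled, rowvar=False)[0, 1]

# ...but the correlation does not, so it reads directly as strength.
corr = np.corrcoef(X, rowvar=False)
corr_rescaled = np.corrcoef(X_rescaled, rowvar=False)

# VIFs are the diagonal of the inverse correlation matrix of the predictors.
vif = np.diag(np.linalg.inv(corr))

print(f"cov:  {cov:.3f} vs {cov_rescaled:.1f}")
print(f"corr: {corr[0, 1]:.3f} vs {corr_rescaled[0, 1]:.3f}")
print("VIF:", np.round(vif, 1))
```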

Q6. Having assessed that explanatory variables X1 and X3 are strongly negatively
linearly correlated in our multiple regression model involving many variables, we should:
(a) remove both X1 and X3 from the model
(b) retain both X1 and X3
(c) remove either X1 or X3, depending on which contextually makes more sense [CORRECT]
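The numerical effect of dropping one variable from a strongly correlated pair shows up in the VIFs. The NumPy sketch below uses simulated data in which x3 is close to -x1. It only illustrates how the VIFs respond. Which of the two variables to drop remains the contextual decision described in answer (c). The helper name vifs is hypothetical.

```python
import numpy as np


def vifs(X: np.ndarray) -> np.ndarray:
    """VIF of each column of X: diagonal of the inverse correlation matrix."""
    return np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))


rng = np.random.default_rng(seed=3)  # simulated data, arbitrary seed
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = -x1 + 0.1 * rng.normal(size=n)  # strongly negatively correlated with x1

before = vifs(np.column_stack([x1, x2, x3]))  # columns: x1, x2, x3
after = vifs(np.column_stack([x1, x2]))       # x3 dropped
print("with x3:   ", np.round(before, 1))
print("without x3:", np.round(after, 1))
```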

Q7. Which of the following is NOT an objective function which allows us to compare the
goodness of models?
(a) Akaike Information Criterion (AIC)
(b) p-value [CORRECT]
(c) R^2
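For a linear model fitted by least squares, R's step() ranks models by AIC computed as n*ln(RSS/n) + 2k, where k counts the fitted coefficients. This is the extractAIC() form, which for a fixed n differs from AIC() only by a constant. A lower value is better. The NumPy sketch below computes this quantity on simulated data. The helper name ols_aic is hypothetical.

```python
import numpy as np


def ols_aic(X: np.ndarray, y: np.ndarray) -> float:
    """AIC of a least-squares fit of y on the columns of X plus an intercept,
    in the form n*ln(RSS/n) + 2k, where k counts every fitted coefficient
    (intercept included). Lower means a better fit/size trade-off."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    k = design.shape[1]
    return float(n * np.log(rss / n) + 2 * k)


# Simulated example: y depends on x1 only; x2 is pure noise.
rng = np.random.default_rng(seed=4)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 + rng.normal(size=n)

aic_null = ols_aic(np.empty((n, 0)), y)            # Y ~ 1
aic_x1 = ols_aic(np.column_stack([x1]), y)         # Y ~ X1
aic_x1_x2 = ols_aic(np.column_stack([x1, x2]), y)  # Y ~ X1 + X2
print(round(aic_null, 2), round(aic_x1, 2), round(aic_x1_x2, 2))
```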

2018SEM1 Page 1

Q8. Running R with 20 explanatory variables and a sufficiently large number of samples, a
stepwise forward regression using AIC gives a best model with only 5 explanatory
variables. The lecture slides say it is dangerous to use this best model without further
thought. Which of the following best explains the reason?
(a) the 5-variable model might not give the best objective function value
(b) the 5 variables' coefficients could all be zero
(c) the 5 variables selected might not be contextually the most natural or
meaningful [CORRECT]

Q9. Suppose R runs a stepwise forward regression with 3 explanatory variables, X1, X2
and X3. At a particular step, it has arrived at Y ~ X1 with an AIC value of 1.33. It then
shows "+ X2" with an AIC of 2.14 and "+ X3" with an AIC of -0.51. What action would R take next?
(a) Add X2 to its model to give Y ~ X1 + X2
(b) Add X3 to its model to give Y ~ X1 + X3 [CORRECT]
(c) Stop and report that its best model is Y ~ X1
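The decision in Q9 follows a simple rule. At each forward step, R's step() lists the AIC of every "+ term" candidate alongside the current model (shown as "<none>") and moves to whichever row has the lowest AIC. It stops when "<none>" is lowest. The Python sketch below implements only this single-step rule. The function name forward_step is hypothetical and not part of R.

```python
from typing import Optional


def forward_step(current_aic: float, candidates: dict[str, float]) -> Optional[str]:
    """One forward-selection step by AIC.

    candidates maps each '+ term' to the AIC of the model with that term
    added. Returns the term giving the lowest AIC if it beats the current
    model; otherwise None, meaning stop with the current model."""
    if not candidates:
        return None
    best_term = min(candidates, key=candidates.get)
    return best_term if candidates[best_term] < current_aic else None


# Numbers from Q9: the current model Y ~ X1 has AIC 1.33.
print(forward_step(1.33, {"+ X2": 2.14, "+ X3": -0.51}))  # + X3
```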

Q10. Values of 20 explanatory variables (X1, X2, …, X20) were gathered in 100 samples to
be regressed against the response variable Y. In another exercise, 3 of the 20 variables
were manually selected in consultation with industry experts and professors, and then
regressed against Y using the same 100 samples. On the basis of this assessment alone, which
model's coefficients would be expected to be more reliable?
(a) the full 20-variable model
(b) the 3-variable model [CORRECT]
(c) both models give equally reliable coefficients; only the model significance
would differ

