Chapter 12

Chapter 12, page 1
Math 445 Chapter 12: Strategies for Variable Selection
Chapter 12 introduces criteria for model selection and comparison and discusses some standard
methods of choosing models. Model selection depends on the objectives of the study. Ramsey and
Schafer identify three possible objectives (pp. 345-6) that will influence how you select a model or
models:
1. Adjusting for a large set of explanatory variables. We want to examine the effect of a
particular variable or variables after adjusting for the effect of other variables which we know
may affect the response.
2. Fishing for explanation. Which variables are important in explaining the response?
3. Prediction. All that is desired is a model that predicts the response well from the values of the
explanatory variables. Interpretation of the model is not a goal.
Before considering some criteria by which a “best” model might be selected among a class of models,
let’s review model development.
Model Development Steps

• Variable Selection: identification of response variable(s) and all candidate explanatory variables.
This is done in the planning stages of a study. Note that a general rule of thumb is that you need
5 to 10 times as many observations as there are explanatory variables in order to do a good job of
model selection and fitting.
• Model Formation: fitting and comparing models based on some selection criteria to determine
one or more candidate models.
• Model Diagnostics: checking for problems in the model and/or its assumptions.
1. Residual Analysis - identifying outliers, missing variables, model lack of fit, and
violation of assumptions.
2. Influence Statistics - identifying influential observations, or those which have a great
effect on the form of the model.
Example: Suppose we are studying differences in abundance of bird species in 3 forest habitats. The
habitats represent various levels of prescribed burns. The experiment itself consists of counting the
number of birds of each species type heard from a station within 100 meters in a 10-minute period.
Many stations were used in the study for replication.
What is the response variable?
Explanatory variables? Habitat type, Neighboring habitat type, elevation, slope, aspect, visibility, etc.
Suppose we have 8 candidate explanatory variables X1, …, X8. How many possible first-order models
are there?
With 20 variables, there are 1,048,575 models, and these are only the first-order models. Clearly
fitting all possible models is not a feasible prospect.
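The count can be checked directly. Each of k candidate variables is either in or out of a first-order model, which gives 2^k subsets; leaving out the constant-only model gives 2^k − 1. A minimal sketch (the function name is only illustrative):

```python
def count_first_order_models(k: int) -> int:
    """Number of non-empty subsets of k candidate explanatory variables."""
    return 2 ** k - 1

print(count_first_order_models(8))   # 255
print(count_first_order_models(20))  # 1048575
```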
Criteria for selecting models
1. R²: R² cannot decrease when variables are added, so the model maximizing R² is the one with
all the variables. Maximizing R² is equivalent to minimizing SSE. R² is an appropriate way to
compare models with the same number of explanatory variables (as long as the response
variable is the same). Be aware that measures like R² based on correlations are sensitive to
outliers.
2. MSE = SSE/(n − p): MSE can increase when variables are added to the model, so minimizing
MSE is a reasonable procedure. However, minimizing MSE is equivalent to maximizing
adjusted R² (discussed below) and tends to overfit (include too many variables).
3. Adjusted R²: This statistic adjusts R² by including a penalty for the number of parameters in
the model. This statistic is closely related to both R² and MSE, as shown below.
\[
\text{Adjusted } R^2
= \frac{\text{Total mean square} - \text{Residual mean square}}{\text{Total mean square}}
= \frac{\text{MST} - \text{MSE}}{\text{MST}}
= 1 - \frac{\text{MSE}}{\text{MST}}
= R^2 - \frac{(p-1)(1-R^2)}{n-p}
\]
where p is the number of coefficients (including the intercept) in the model. The third
expression shows that maximizing adjusted R² is equivalent to minimizing MSE since MST is
fixed (it’s simply the variance of the response variable).
• Adjusted R² tends to select models with too many variables (overfitting). This can be seen
from the fact that adjusted R² will increase when a variable is added if the F statistic for
comparing the two models is greater than 1. This is a very generous criterion: for a single
added variable, F > 1 corresponds to a significance level of roughly .3.
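As a numerical check of the identities above, the sketch below computes adjusted R² two ways. The inputs (n = 110, p = 4, SSE = 23.069, total SS = 70.695) are taken from the W+T+S and Constant rows of the ozone table later in this handout; the function name is illustrative.

```python
def adjusted_r2(sse: float, sst: float, n: int, p: int) -> float:
    """Adjusted R^2 = 1 - MSE/MST, with p coefficients including the intercept."""
    mse = sse / (n - p)      # residual mean square
    mst = sst / (n - 1)      # total mean square
    return 1.0 - mse / mst

n, p = 110, 4
sse, sst = 23.069, 70.695    # W+T+S row and Constant row of the ozone table

r2 = 1.0 - sse / sst
adj_from_mse = adjusted_r2(sse, sst, n, p)
adj_from_r2 = r2 - (p - 1) * (1.0 - r2) / (n - p)

print(round(r2, 3))          # 0.674
print(adj_from_mse, adj_from_r2)
```

The two adjusted values agree up to floating-point rounding, and both sit slightly below the unadjusted R².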
4. Mallows’ Cp: The Cp statistic assumes that the full model with all variables fits. Then Cp is
computed for a reduced model as
\[
C_p = p + (n-p)\,\frac{\hat\sigma^2 - \hat\sigma^2_{\text{full}}}{\hat\sigma^2_{\text{full}}}
    = (n-p)\,\frac{\hat\sigma^2}{\hat\sigma^2_{\text{full}}} + 2p - n
\]
where p is the number of coefficients (including the intercept) in the reduced model.
• Note that σ̂² is simply MSE (mean square error or mean square residual) for a model.
• Models with small values of Cp are considered better and, ideally, we look for the smallest
model with a Cp of around p or smaller. Some statistics programs will compute Cp for a
large set of models and plot Cp versus p, as in Display 12.9 on p. 357. Unfortunately, SPSS
does not compute Cp automatically.
• Cp assumes that the full model fits and satisfies all the regression model assumptions.
Outliers, unexplained nonlinearity, and nonconstant variance may seriously affect the
performance of Cp as a model selection tool.
• Mallows’ Cp is closely related to AIC. AIC has come to be preferred by many statisticians
in recent years.
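Since SPSS does not report Cp, it can be computed by hand from two SSE values. The sketch below uses the ozone table later in this handout and assumes, purely for illustration, that the "full" model is the one with all three main effects and all two-way interactions (p = 7, SSE = 21.534). The function name is hypothetical.

```python
def mallows_cp(sse: float, p: int, sse_full: float, p_full: int, n: int) -> float:
    """Cp = p + (n - p) * (s2 - s2_full) / s2_full, where s2 = SSE / (n - p)."""
    s2 = sse / (n - p)
    s2_full = sse_full / (n - p_full)
    return p + (n - p) * (s2 - s2_full) / s2_full

n = 110
sse_full, p_full = 21.534, 7   # assumed full model: W + T + S + all two-way interactions
sse_red, p_red = 23.069, 4     # reduced model: W + T + S

cp_red = mallows_cp(sse_red, p_red, sse_full, p_full, n)
cp_full = mallows_cp(sse_full, p_full, sse_full, p_full, n)

# Second form of the formula: (n - p) * s2 / s2_full + 2p - n
s2_full = sse_full / (n - p_full)
cp_alt = sse_red / s2_full + 2 * p_red - n

print(round(cp_red, 2), round(cp_alt, 2), cp_full)
```

The full model always has Cp equal to its own p. Here the main-effects model comes out near 8.3, well above its p of 4, which suggests it leaves out something the larger model captures.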
5. Akaike's Information Criterion (AIC): The AIC statistic for a model is given by:
\[
\text{AIC} = n \ln\!\left(\frac{\text{SSE}}{n}\right) + 2p
\]
where SSE = the error SS for the model under consideration and ln is natural log.
• The model with the smallest AIC value is considered best.
• The term 2p is the penalty for the number of parameters in the model.
• Ripley: “AIC has been criticized in asymptotic studies and simulation studies for tending to
over-fit, that is, choose a model at least as large as the true model. That is a virtue, not a
deficiency: this is a prediction-based criterion, not an explanation based one.” BIC (below)
is a criterion based on an “explanation” approach and places a bigger penalty on the number of
parameters.
• AIC can only be used to compare models. It is not an absolute measure of fit of the model
like R² is. The model with the smallest AIC among those you examined may fit the data
best, but this does not mean it's a good model. Therefore, selecting which models to
consider (which variables, transformations, form of the model) and making sure the models
satisfy the regression model assumptions is very important.
• Since AIC is not an absolute measure of fit, many authors suggest reporting ∆AIC, the
difference between the AIC of each model and the AIC of the best fitting model. A further
suggestion is to consider all models with ∆AIC less than about 2 as having essentially equal
support.
• Neither AIC nor Cp nor R² nor adjusted R² can be used to compare models with different
response variables.
• AIC is based on the assumption that the models satisfy the regression model assumptions
and can be greatly affected by outliers.
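The AIC formula and the ∆AIC comparison can be sketched as follows. The three models and their SSE values come from the ozone table later in this handout (n = 110). Because the table's SSE values are rounded, the computed AIC can differ from the tabled value in the second decimal place.

```python
import math

def aic(sse: float, n: int, p: int) -> float:
    """AIC = n * ln(SSE / n) + 2p, with p coefficients including the intercept."""
    return n * math.log(sse / n) + 2 * p

n = 110
models = {                       # name: (p, SSE) from the ozone table
    "W + T + S": (4, 23.069),
    "W + T + S + T:S": (5, 21.897),
    "W + T + S + W:T + T:S": (6, 21.537),
}

aics = {name: aic(sse, n, p) for name, (p, sse) in models.items()}
best = min(aics.values())
for name, value in aics.items():
    print(f"{name:24s} AIC = {value:8.2f}   dAIC = {value - best:5.2f}")
```

By the ∆AIC rule of thumb, the two models containing T:S are essentially tied, while the main-effects model is noticeably less well supported.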
6. Bayesian Information Criterion (BIC). BIC is similar to AIC but the penalty on the number
of parameters is p ln(n), where ln is the natural log. That is,
\[
\text{BIC} = n \ln\!\left(\frac{\text{SSE}}{n}\right) + p \ln(n)
\]
BIC is motivated by a Bayesian approach to model selection and is said not to tend to overfit
like AIC. Therefore, it may be better for model selection for “explanation.” The purpose of
having the penalty depend on the sample size n is to reduce the likelihood that small and
relatively unimportant parameters are included (which is more likely with large n).
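To see how much heavier the BIC penalty is, the sketch below compares the per-parameter penalty with AIC's 2 and computes BIC for the main-effects ozone model (n = 110, p = 4, SSE = 23.069, from the table later in this handout).

```python
import math

def bic(sse: float, n: int, p: int) -> float:
    """BIC = n * ln(SSE / n) + p * ln(n)."""
    return n * math.log(sse / n) + p * math.log(n)

n = 110
penalty_per_parameter = math.log(n)      # compare with 2 for AIC
bic_main_effects = bic(23.069, n, 4)     # W + T + S

print(round(penalty_per_parameter, 2))   # 4.7
print(round(bic_main_effects, 2))
```

The BIC penalty per parameter exceeds AIC's 2 whenever n is greater than e² ≈ 7.4, so for any realistic sample size BIC favors smaller models than AIC does.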
7. PRESS Statistic (not in text): another prediction-based model selection statistic is the PRESS
statistic. It is calculated as follows: Remove the ith observation and fit the model with the
remaining n-1 observations. Then use this model to calculate a predicted value for the left-out
observation; call this predicted value Yi* . Compute Yi − Yi* , the difference between the
observed response and the predicted response from the model without the ith observation in it.
Repeat this process for each data value. The PRESS statistic is then defined as:
\[
\text{PRESS} = \sum_{i=1}^{n} \left(Y_i - Y_i^{*}\right)^2
\]
• The model with the smallest PRESS statistic is considered “best.”
• Leaving one item out at a time is known as n-fold cross-validation or leave-one-out
cross-validation.
• The Yi − Yi* are called “deleted” residuals in SPSS. So the PRESS statistic can be
computed in SPSS by saving the deleted residuals, creating a new variable which is the
square of the deleted residuals, then computing the sum of this new variable using
Analyze…Descriptive Statistics…Descriptives and choosing Sum under Options.
• PRESS is similar to SSE, but is based on the deleted residuals rather than the raw residuals.
Unlike SSE, it’s possible for PRESS to increase when variables are added to the model.
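The leave-one-out recipe for PRESS can be sketched for a simple linear regression as follows. The data are made up for illustration and are not from the handout; the helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares (intercept, slope) for a simple linear regression."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def press(xs, ys):
    """Sum of squared deleted residuals, refitting without each case in turn."""
    total = 0.0
    for i in range(len(xs)):
        x_rest = xs[:i] + xs[i + 1:]
        y_rest = ys[:i] + ys[i + 1:]
        b0, b1 = fit_line(x_rest, y_rest)
        total += (ys[i] - (b0 + b1 * xs[i])) ** 2
    return total

def sse(xs, ys):
    """Ordinary residual sum of squares from the fit to all the data."""
    b0, b1 = fit_line(xs, ys)
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Made-up illustrative data
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 5.8]

print("PRESS:", press(x, y))
print("SSE:  ", sse(x, y))
```

PRESS comes out larger than SSE because each case is predicted by a model that never saw it.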
The PRESS statistic is an example of the general idea of using cross-validation to assess the predictive
power of models. A model will generally predict the data it's based on better than new data, and bigger
models will necessarily do at least as well at predicting the data they’re based on as smaller nested
models: SSE never increases as more terms are added to the model. A less biased way of assessing the
predictive power of a model is to use the following general idea: fit a model using a subset of the data,
then validate the model using the remainder of the data. This is called cross-validation (abbreviated
CV).
In k-fold CV, the data are randomly split into k approximately equal-sized subsets. Each subset is left
out in turn and the model based on the remaining subsets is used to predict for the left-out subset. The
PRESS statistic is based on n-fold CV, that is, only one observation at a time is left out. Simulations
have suggested that smaller values of k may work better; 10-fold CV has become a standard method of
cross-validation. Cross-validation is most useful as a way to compare models rather than as an
absolute measure of how good the predictions will be. This is because the model used for prediction of
each subset is different than the model based on all the data that will actually be used to predict future
observations. Each of the models being compared should use the same splits of the data. It’s also best
to repeat the 10-fold CV several times and average the results.
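A minimal sketch of k-fold CV, comparing a constant-only model with a straight-line model on the same splits. The data are simulated for illustration (not from the handout), and all function names are hypothetical.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Randomly split indices 0..n-1 into k approximately equal-sized folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_mean(xs, ys):
    """Constant-only model, returned as (intercept, slope) with slope 0."""
    return (sum(ys) / len(ys), 0.0)

def fit_line(xs, ys):
    """Simple linear regression by least squares; returns (intercept, slope)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return (ybar - slope * xbar, slope)

def cv_sse(xs, ys, folds, fit):
    """Sum of squared prediction errors, each fold predicted from the others."""
    total = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        b0, b1 = fit([xs[i] for i in train], [ys[i] for i in train])
        total += sum((ys[i] - (b0 + b1 * xs[i])) ** 2 for i in fold)
    return total

# Simulated data: a linear trend plus noise
rng = random.Random(1)
x = [float(i) for i in range(1, 21)]
y = [2.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]

folds = k_fold_indices(len(x), k=10)   # the same splits are used for both models
cv_mean = cv_sse(x, y, folds, fit_mean)
cv_line = cv_sse(x, y, folds, fit_line)
print("mean-only:", cv_mean)
print("line:     ", cv_line)
```

To follow the advice about repeating the procedure, one could call `k_fold_indices` with several different seeds and average the resulting CV sums for each model.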
Example
Ozone data without case 17. n = 110 cases. Dependent variable is log(ozone) (natural log).
All possible models with main effects and two-way interactions

Model                          p   SSE      R²      MSE     AIC       BIC       PRESS
W + T + S + W:T + W:S + T:S    7   21.534   0.695   0.209   -165.39   -146.49   25.62
W + T + S + W:T + W:S          6   22.152   0.687   0.213   -164.28   -148.08   25.56
W + T + S + W:T + T:S          6   21.537   0.695   0.207   -167.38   -151.17   24.51
W + T + S + W:S + T:S          6   21.867   0.691   0.210   -165.70   -149.50   25.44
W + T + S + W:T                    22.182                                       24.55
W + T + S + W:S                5   22.726   0.679   0.216   -163.47   -149.96   25.63
W + T + S + T:S                5   21.897   0.690   0.209   -167.56   -154.05   24.54
W + T + S                      4   23.069   0.674   0.218   -163.82   -153.02   25.20
W + T + W:T                    4   26.372   0.627   0.249   -149.10   -138.30   28.54
W + T                          3   26.995   0.618   0.252   -148.53   -140.43   28.78
W + S + W:S                    4   36.121   0.489   0.341   -114.50   -103.69   39.39
W + S                          3   36.410   0.485   0.340   -115.62   -107.52   38.70
T + S + T:S                    4   27.029   0.618   0.255   -146.39   -135.59   29.22
T + S                          3   28.038   0.603   0.262   -144.36   -136.26   29.68
W                              2   44.985   0.364   0.417    -94.36    -88.95   46.84
T                              2   31.908   0.549   0.295   -132.14   -126.74   32.98
S                              2   57.974   0.180   0.537    -66.45    -61.05   60.15
Constant                       1   70.695   0.000   0.649    -46.63    -43.93   72.00
All possible models with main effects and quadratic terms

Model                          p   SSE      R²      MSE     AIC       BIC       PRESS
W + T + S + W^2 + T^2 + S^2    7   20.175   0.715   0.196   -172.56   -153.66   23.57
W + T + S + W^2 + T^2          6   20.754   0.706   0.200   -171.45   -155.25   23.79
W + T + S + W^2 + S^2          6   20.875   0.705   0.201   -170.81   -154.61   23.51
W + T + S + T^2 + S^2          6   21.270   0.699   0.205   -168.75   -152.55   24.15
W + T + S + W^2                    21.393                                       23.65
W + T + S + T^2                5   21.818   0.691   0.208   -167.95   -154.45   24.36
W + T + S + S^2                5   22.614   0.680   0.215   -164.01   -150.51   25.12
W + T + W^2 + T^2              5   24.924   0.647   0.237   -153.31   -139.81   28.19
W + T + W^2                    4   25.390   0.641   0.240   -153.27   -142.47   27.68
W + T + T^2                    4   25.998   0.632   0.245   -150.67   -139.87   28.33
W + S + W^2 + S^2              5   29.996   0.576   0.286   -132.94   -119.43   32.79
W + S + W^2                    4   32.958   0.534   0.311   -124.58   -113.78   35.31
W + S + S^2                    4   33.350   0.528   0.315   -123.28   -112.47   36.12
T + S + T^2 + S^2              5   25.466   0.640   0.243   -150.95   -137.44   28.14
T + S + T^2                    4   26.418   0.626   0.249   -148.91   -138.11   28.58
T + S + S^2                    4   27.207   0.615   0.257   -145.67   -134.87   29.39
W + W^2                        3   41.263   0.416   0.386   -101.86    -93.76   43.98
T + T^2                        3   30.579   0.567   0.286   -134.82   -126.72   32.32
S + S^2                        3   49.093   0.306   0.459    -82.74    -74.64   51.72
Approaches to choosing a model
There are a number of possible approaches to model selection using the measures above to compare
and select models:
• Choose several models a priori that make scientific sense. Use the criteria above (like AIC and
BIC) to compare models.
• Examine all possible models involving the variables, including interactions and/or quadratic
terms or both (this is what was done with the Ozone data). Generally feasible only up to 3 or 4
variables.
• Examine all main effects models only (there are 2^k − 1 possible models where k is the number of
variables). Consider interactions or other higher order terms only after the main effects have
been selected.
• If the number of variables is large, select a subset of the variables first, perhaps based on the
correlation of each of the variables individually with the response and/or eliminating redundant
variables (ones which are highly correlated with another variable). Then proceed with one of
the above approaches.
• If the number of variables is large, use stepwise regression to select possible models. Stepwise
regression does not require examination of all models.
Some authors do not believe in stepwise methods and other procedures that search for “good-fitting”
models because they are essentially searching through many tens or hundreds of possible models,
whether they make any scientific sense or not, and picking the “best” ones. The more models you
consider, the higher the likelihood you will select the “wrong” one. Therefore, they believe, you should
select a few models a priori that you will compare. Others argue that there is no “right” model and that
if the goal is prediction, it does not matter if the model makes physical sense. In that case, cross-
validation (discussed above) might be an important tool.
Stepwise regression
Stepwise regression methods attempt to find models minimizing or maximizing some criterion without
examining every possible model. Stepwise methods are not guaranteed to find the best model (in
terms of the criterion selected), but simply try to find good models using a one-step-at-a-time
approach.
The three most common types of subset selection methods employed are outlined below. The criterion
used in these descriptions is the F statistic for comparing two nested models, but stepwise methods can
also use the associated P-value, or AIC or BIC as a criterion. The latter two are now generally
preferred to the F statistic or P-value. SPSS, however, only does stepwise regression with the F
statistic or P-value.
The three types of stepwise methods are:
Forward Selection
1. Start with the model with only the constant.
2. Consider all models which consist of the current model plus one more term. For each term not
in the model, calculate its “F-to-enter” (the extra sum-of-squares F statistic). Identify the
variable with the largest F-to-enter. Higher order terms (interactions, quadratic terms) are
eligible for entry only if all lower order terms involved in them are already in the model. For
example, do not consider the interaction AxB for entry unless both A and B individually are
already in the model.
3. If the largest F-to-enter is greater than 4 (or some other user-specified number), add this
variable to get a new current model and return to step 2. If the largest F-to-enter is less than the
user-specified number, stop.
The criterion could also be the P-value for the F-test, in which case a term is added only if its P-value
is less than the user-specified cutoff (usually somewhere between .05 and .20). If a variable is a
categorical variable with more than 2 levels, we add all the indicator variables for this variable at once.
Note that once a variable has been entered it cannot be removed, even if its coefficient becomes
statistically nonsignificant with the addition of other variables, which is possible.
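The forward selection steps can be traced on the ozone main-effects models without refitting anything, since the table earlier in this handout gives SSE for every subset of W, T and S (n = 110). A minimal sketch using an F-to-enter cutoff of 4; the function names are illustrative.

```python
N = 110
# SSE for every main-effects model, from the ozone table (W, T, S)
SSE = {
    frozenset(): 70.695,
    frozenset("W"): 44.985,
    frozenset("T"): 31.908,
    frozenset("S"): 57.974,
    frozenset("WT"): 26.995,
    frozenset("WS"): 36.410,
    frozenset("TS"): 28.038,
    frozenset("WTS"): 23.069,
}

def f_to_enter(current, term):
    """Extra-sum-of-squares F for adding one term to the current model."""
    bigger = current | {term}
    p_bigger = len(bigger) + 1               # coefficients, including the intercept
    mse_bigger = SSE[bigger] / (N - p_bigger)
    return (SSE[current] - SSE[bigger]) / mse_bigger

def forward_selection(candidates, f_cutoff=4.0):
    """Add the term with the largest F-to-enter until none exceeds the cutoff."""
    current, order = frozenset(), []
    while True:
        remaining = [t for t in candidates if t not in current]
        if not remaining:
            break
        best = max(remaining, key=lambda t: f_to_enter(current, t))
        if f_to_enter(current, best) <= f_cutoff:
            break
        current = current | {best}
        order.append(best)
    return order

print(forward_selection("WTS"))   # ['T', 'W', 'S']
```

Temperature enters first with a very large F-to-enter, followed by wind and then solar radiation; all three clear the cutoff comfortably.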
Backward Elimination
1. Start with the model with all of the candidate variables and any higher order terms which might
be important.
2. Calculate the F-to-remove for each variable in the current model (the extra-sum-of-squares test
statistic). Identify the variable with the smallest F-to-remove. A lower order term is eligible
for removal only if all higher order terms involving that variable have already been removed.
For example, the variable A is not eligible for removal if AxB is still in the model.
3. If the smallest F-to-remove is 4 (or some other user-specified number) or less, then remove that
variable to get a new current model and return to step 2. If the smallest F-to-remove is greater
than the user-specified number, stop.
• Again, the criterion for removal could be the P-value (remove a variable only if its P-value is
greater than the cutoff).
• Backward elimination is preferred to forward selection by many users because it does not
eliminate a term unless there is good reason to (forward selection, on the other hand, does not
include a term unless there is convincing evidence to include it).
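Backward elimination can be traced the same way on the ozone main-effects models, using the SSE values from the table earlier in this handout (n = 110). The second call uses an artificially high cutoff of 20 only to show variables actually being removed.

```python
N = 110
# SSE for every main-effects model, from the ozone table (W, T, S)
SSE = {
    frozenset(): 70.695,
    frozenset("W"): 44.985,
    frozenset("T"): 31.908,
    frozenset("S"): 57.974,
    frozenset("WT"): 26.995,
    frozenset("WS"): 36.410,
    frozenset("TS"): 28.038,
    frozenset("WTS"): 23.069,
}

def f_to_remove(current, term):
    """Extra-sum-of-squares F for dropping one term from the current model."""
    smaller = current - {term}
    p_current = len(current) + 1             # coefficients, including the intercept
    mse_current = SSE[current] / (N - p_current)
    return (SSE[smaller] - SSE[current]) / mse_current

def backward_elimination(start, f_cutoff=4.0):
    """Drop the term with the smallest F-to-remove while it is at or below the cutoff."""
    current = frozenset(start)
    while current:
        worst = min(current, key=lambda t: f_to_remove(current, t))
        if f_to_remove(current, worst) > f_cutoff:
            break
        current = current - {worst}
    return current

print(sorted(backward_elimination("WTS")))         # ['S', 'T', 'W']
print(sorted(backward_elimination("WTS", 20.0)))   # ['T']
```

With the usual cutoff of 4 nothing is removed, so backward elimination and forward selection agree on the three-variable model for these data.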
Stepwise Selection
This method is a hybrid of the previous two, involving both forward selection and backward
elimination.
1. Start with the model with only the constant.
2. Do one step of forward selection.
3. Do one step of backward elimination.
4. Repeat steps 2 and 3 until no changes occur during one cycle of steps 2 and 3.
The F-to-enter must be greater than the F-to-remove; otherwise, you could have a never-ending cycle
of a variable being entered, then eliminated. If a P-value cutoff is used, then the P for entry must be
smaller than the P for removal.
Forward selection in SAT data (Case study 12.1) using P of .05 or less to enter. Preliminary analysis
presented in text suggested that log of percent taking exam (log(takers)) should be used in place of
takers.
Coefficients (a)
                        Unstandardized Coefficients   Standardized Coefficients
Model  Term             B           Std. Error        Beta                        t          Sig.
1      (Constant)       1112.248    12.275                                        90.611     .000
       Log10(takers)    -135.896    9.476             -.900                       -14.340    .000
2      (Constant)       1060.351    15.539                                        68.239     .000
       Log10(takers)    -148.061    8.459             -.981                       -17.504    .000
       expend           2.900       .646              .252                        4.488      .000
3      (Constant)       851.315     87.022                                        9.783      .000
       Log10(takers)    -143.383    8.272             -.950                       -17.333    .000
       expend           2.698       .620              .234                        4.350      .000
       years            12.833      5.265             .127                        2.438      .019
a. Dependent Variable: sat
Excluded Variables (d)
                                                 Partial        Collinearity Statistics
Model  Variable   Beta In      t        Sig.     Correlation    Tolerance
1      income     .078 (a)     .997     .324     .144           .648
       years      .157 (a)     2.592    .013     .354           .960
       public     .048 (a)     .755     .454     .109           .980
       expend     .252 (a)     4.488    .000     .548           .897
       rank       .221 (a)     1.028    .309     .148           .086
2      income     -.057 (b)    -.783    .438     -.115          .533
       years      .127 (b)     2.438    .019     .338           .943
       public     -.014 (b)    -.254    .801     -.037          .916
       rank       .101 (b)     .546     .588     .080           .084
3      income     -.051 (c)    -.726    .472     -.108          .532
       public     .056 (c)     .938     .353     .138           .727
       rank       .369 (c)     1.939    .059     .278           .067
a. Predictors in the Model: (Constant), Log10(takers)
b. Predictors in the Model: (Constant), Log10(takers), expend
c. Predictors in the Model: (Constant), Log10(takers), expend, years
d. Dependent Variable: sat
These three stepwise methods will not necessarily lead to the same model. In addition, changes in the F
or P-to-enter and F or P-to-remove can result in more or fewer variables in the final model.
The SPSS stepwise regression procedure has some disadvantages. SPSS has no way of knowing that
some variables may be higher order terms that involve lower order terms. Therefore, it cannot enforce
the restriction that higher order terms cannot be added before the corresponding lower order terms
have been added, nor that lower order terms cannot be eliminated until all higher order terms involving
them have been eliminated (that is why I used the SAT data and not the Ozone data with higher order
terms in this example). SPSS also cannot treat the set of indicator variables corresponding to a
categorical variable as one set of variables that should all be added or eliminated at once.
However, SPSS does allow you to define blocks of explanatory variables which can be treated
differently in stepwise regression. Therefore, for the ozone data, where I wanted to look at adding
two-way interactions and quadratic terms, I defined Block 1 to be Wind, MaxTemp and SolarRad and
Block 2 to be all the two way interactions and quadratic terms. I also defined the “Method” for Block
1 to be “Enter”, which means these variables will be in the starting model and cannot be eliminated. I
also defined the “Method” for Block 2 to be “Stepwise”, which means these variables can be added or
eliminated. The P-to-enter and P-to-remove were the default values of .05 and .10, respectively.
Ozone data, case #17 deleted: stepwise regression; Wind, MaxTemp and SolarRad forced to be in the
model.
Coefficients (a)
                                    Unstandardized Coefficients   Standardized Coefficients
Model  Term                         B         Std. Error          Beta                        t         Sig.
1      (Constant)                   .114      .226                                            .504      .615
       Wind speed (mph)             -.030     .006                -.308                       -4.779    .000
       Maximum temperature (F)      .019      .002                .519                        7.830     .000
       Solar radiation (langleys)   .001      .000                .245                        4.248     .000
2      (Constant)                   .518      .260                                            1.992     .049
       Wind speed (mph)             -.096     .024                -.980                       -4.040    .000
       Maximum temperature (F)      .018      .002                .489                        7.534     .000
       Solar radiation (langleys)   .001      .000                .247                        4.429     .000
       Wind^2                       .003      .001                .676                        2.868     .005
a. Dependent Variable: Log10(Ozone)
Excluded Variables
                                                  Partial        Collinearity Statistics
Model  Variable     Beta In    t         Sig.     Correlation    Tolerance
1      Wind^2       .676       2.868     .005     .270           .052
       MaxTemp^2    1.929      2.454     .016     .233           .005
       SolarRad^2   -.359      -1.453    .149     -.140          .050
       WindTemp     -.776      -2.049    .043     -.196          .021
       WindSolar    -.256      -1.258    .211     -.122          .074
       TempSolar    1.198      2.371     .020     .225           .012
2      MaxTemp^2    1.431      1.789     .076     .173           .004
       SolarRad^2   -.384      -1.606    .111     -.156          .050
       WindTemp     -.021      -.038     .969     -.004          .010
       WindSolar    -.117      -.572     .568     -.056          .069
       TempSolar    .933       1.846     .068     .178           .011
One significant problem with using the F statistic or P-value is that the addition and elimination of
variables is not based on a criterion for comparing models – the final model is not necessarily
“optimal” in any sense. Why not add or eliminate variables based on one of the measures considered
in the first part of this handout, such as AIC or BIC?
The stepAIC function in the MASS library of S-Plus does stepwise regression using AIC (or BIC) as
the criterion. In forward selection, it looks for the single variable which reduces AIC the most; if no
variable reduces AIC, then it stops. In backward elimination, the goal is the same: find the variable
whose elimination reduces AIC the most. If no variable reduces AIC when it's eliminated, then stop.
In stepwise using both directions, find the addition or deletion which reduces AIC the most. Using AIC
has the additional appeal of not having to set arbitrary criteria for entering and removing variables.
The stepAIC function also handles categorical variables and interactions properly: an interaction
cannot be added unless all the variables involved in the interaction have been added; similarly, a
variable cannot be eliminated unless all higher order interactions involving that variable have been
eliminated. Unfortunately, stepAIC does not handle quadratic terms properly.
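The both-directions search described above can be imitated on the ozone main-effects models using the SSE values from the table earlier in this handout (n = 110). This is a toy sketch of the idea, not the MASS implementation; the function names and the `k` argument (2 for AIC, ln(n) for BIC) are chosen here for illustration.

```python
import math

N = 110
# SSE for every main-effects model, from the ozone table (W, T, S)
SSE = {
    frozenset(): 70.695,
    frozenset("W"): 44.985,
    frozenset("T"): 31.908,
    frozenset("S"): 57.974,
    frozenset("WT"): 26.995,
    frozenset("WS"): 36.410,
    frozenset("TS"): 28.038,
    frozenset("WTS"): 23.069,
}

def criterion(model, k=2.0):
    """n ln(SSE/n) + k p; k = 2 gives AIC and k = ln(n) gives BIC."""
    p = len(model) + 1                       # coefficients, including the intercept
    return N * math.log(SSE[model] / N) + k * p

def stepwise(start, candidates, k=2.0):
    """Make the single addition or deletion that lowers the criterion most; stop when none does."""
    current = frozenset(start)
    while True:
        moves = [current | {t} for t in candidates if t not in current]
        moves += [current - {t} for t in current]
        best = min(moves, key=lambda m: criterion(m, k))
        if criterion(best, k) >= criterion(current, k):
            return current
        current = best

print(sorted(stepwise("", "WTS")))                     # ['S', 'T', 'W']
print(sorted(stepwise("WTS", "WTS", k=math.log(N))))   # ['S', 'T', 'W']
```

Starting from the constant-only model with AIC, or from the full main-effects model with BIC, the search ends at the same three-variable model, and no entry or removal cutoffs had to be chosen.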
> m0 <- lm(sat~1,data=case1201)

> summary(m0)
Call: lm(formula = sat ~ 1, data = case1201)

Residuals:
Min 1Q Median 3Q Max
-158.4 -59.45 19.55 50.55 139.6
Coefficients:
Value Std. Error t value Pr(>|t|)
(Intercept) 948.4490 10.2140 92.8574 0.0000
Residual standard error: 71.5 on 48 degrees of freedom

Multiple R-Squared: 2.465e-029
F-statistic: Inf on 0 and 48 degrees of freedom, the p-value is NA
> stepAIC(m0,~log(takers) + income + years + public + expend + rank)

Start: AIC= 419.42
sat ~ 1
Df Sum of Sq RSS AIC

+ log(takers) 1 199006.8593 46369.26 339.7760
+ rank 1 190296.7388 55079.38 348.2108
+ income 1 102026.4049 143349.72 395.0799
+ years 1 26338.2438 219037.88 415.8538
<none> NA NA 245376.12 419.4176
+ public 1 1231.7335 244144.39 421.1710
+ expend 1 385.5838 244990.54 421.3406
Step: AIC= 339.78

sat ~ log(takers)

+ expend 1 20523.4615 25845.80 313.1361
+ years 1 6363.5198 40005.74 334.5429
<none> NA NA 46369.26 339.7760
+ rank 1 871.1345 45498.13 340.8467
+ income 1 785.0507 45584.21 340.9393
+ public 1 448.9059 45920.36 341.2993
- log(takers) 1 199006.8593 245376.12 419.4176
Step: AIC= 313.14

sat ~ log(takers) + expend

+ years 1 1248.184463 24597.62 312.7106
+ rank 1 1053.599508 24792.20 313.0967
<none> NA NA 25845.80 313.1361
+ income 1 53.329409 25792.47 315.0349
+ public 1 1.292761 25844.51 315.1336
- expend 1 20523.461462 46369.26 339.7760
- log(takers) 1 219144.737003 244990.54 421.3406
Step: AIC= 312.71

sat ~ log(takers) + expend + years

+ rank 1 2675.51301 21922.10 309.0681
<none> NA NA 24597.62 312.7106
- years 1 1248.18446 25845.80 313.1361
+ public 1 287.82166 24309.80 314.1339
+ income 1 19.19044 24578.43 314.6724
- expend 1 15408.12616 40005.74 334.5429
- log(takers) 1 190946.97826 215544.60 417.0660
Step: AIC= 309.07

sat ~ log(takers) + expend + years + rank

<none> NA NA 21922.10 309.0681
+ income 1 505.3684 21416.74 309.9253
+ public 1 185.0259 21737.08 310.6528
- rank 1 2675.5130 24597.62 312.7106
- years 1 2870.0980 24792.20 313.0967
- log(takers) 1 5094.3405 27016.44 317.3067
- expend 1 13619.6111 35541.72 330.7455
Call:
lm(formula = sat ~ log(takers) + expend + years + rank, data = case1201)
Coefficients:
(Intercept) log(takers) expend years rank
399.1147 -38.1005 3.995661 13.14731 4.400277
Degrees of freedom: 49 total; 44 residual

Residual standard error: 22.32106
Stepwise regression starting with the main effects model (mfull, the fit of sat on all six main effects) and allowing all two-way interactions.
> stepAIC(mfull,list(upper=~.^2,lower=~1))
Start: AIC= 311.88
sat ~ log(takers) + income + years + public + expend + rank

+ years:public 1 5027.807692 16368.93 300.7547
+ log(takers):public 1 3617.792915 17778.95 304.8035
+ income:public 1 1977.822427 19418.92 309.1269
+ income:years 1 1804.755461 19591.98 309.5617
- public 1 19.997447 21416.74 309.9253
+ public:rank 1 1452.863422 19943.87 310.4340
- income 1 340.339906 21737.08 310.6528
+ log(takers):years 1 1197.663996 20199.07 311.0570
+ log(takers):income 1 1194.412626 20202.33 311.0649
+ income:rank 1 1046.006240 20350.73 311.4235
<none> NA NA 21396.74 311.8795
+ years:rank 1 485.951497 20910.79 312.7538
+ log(takers):expend 1 447.951860 20948.79 312.8428
+ expend:rank 1 323.487437 21073.25 313.1330
+ years:expend 1 93.688852 21303.05 313.6645
+ public:expend 1 51.522079 21345.22 313.7614
+ log(takers):rank 1 44.248267 21352.49 313.7781
+ income:expend 1 9.445369 21387.29 313.8579
- log(takers) 1 2150.004922 23546.74 314.5712
- years 1 2531.615348 23928.35 315.3590
- rank 1 2679.046601 24075.78 315.6599
- expend 1 10964.372896 32361.11 330.1517
Step: AIC= 300.75

sat ~ log(takers) + income + years + public + expend + rank + years:public

- income 1 193.844212 16562.77 299.3315
+ income:rank 1 869.194138 15499.74 300.0811
<none> NA NA 16368.93 300.7547
+ public:rank 1 587.095100 15781.84 300.9649
+ expend:rank 1 513.555766 15855.37 301.1927
+ income:public 1 119.187306 16249.74 302.3966
+ log(takers):rank 1 96.896741 16272.03 302.4638
+ income:expend 1 16.336369 16352.59 302.7058
+ income:years 1 10.664688 16358.27 302.7227
+ log(takers):years 1 9.199796 16359.73 302.7271
+ public:expend 1 4.688396 16364.24 302.7406
+ years:rank 1 4.080195 16364.85 302.7425
+ years:expend 1 3.618119 16365.31 302.7439
- log(takers) 1 2319.536747 18688.47 305.2482
- rank 1 2533.477921 18902.41 305.8060
- years:public 1 5027.807692 21396.74 311.8795
- expend 1 13670.486641 30039.42 328.5038
Step: AIC= 299.33

sat ~ log(takers) + years + public + expend + rank + years:public

+ log(takers):public 1 7.036022e+002 15859.17 299.2045
<none> NA NA 16562.77 299.3315
+ expend:rank 1 6.439627e+002 15918.81 299.3884
+ log(takers):expend 1 6.224671e+002 15940.31 299.4545
+ public:rank 1 4.726451e+002 16090.13 299.9129
+ income 1 1.938442e+002 16368.93 300.7547
+ public:expend 1 3.375877e+000 16559.40 301.3216
+ log(takers):rank 1 1.935137e+000 16560.84 301.3258
+ years:expend 1 1.528711e+000 16561.25 301.3270
+ years:rank 1 8.679866e-001 16561.91 301.3290
+ log(takers):years 1 5.202697e-002 16562.72 301.3314
- rank 1 2.456165e+003 19018.94 304.1071
- log(takers) 1 2.985168e+003 19547.94 305.4514
- years:public 1 5.174303e+003 21737.08 310.6528
- expend 1 1.615704e+004 32719.81 330.6919
Step: AIC= 299.2
sat ~ log(takers) + years + public + expend + rank + years:public +
log(takers):public

<none> NA NA 15859.17 299.2045
+ expend:rank 1 602.5956096 15256.58 299.3063
- log(takers):public 1 703.6021875 16562.77 299.3315
+ income 1 413.5731794 15445.60 299.9097
+ years:rank 1 141.9104795 15717.26 300.7640
+ log(takers):years 1 102.4165565 15756.76 300.8870
+ public:rank 1 54.7708444 15804.40 301.0350
+ public:expend 1 39.7984090 15819.37 301.0813
+ log(takers):rank 1 6.6716882 15852.50 301.1839
+ years:expend 1 0.8878288 15858.28 301.2017
- years:public 1 2725.3253513 18584.50 304.9749
- rank 1 3086.8696076 18946.04 305.9190
- expend 1 12860.9171063 28720.09 326.3031
Call:
lm(formula = sat ~ log(takers) + years + public + expend + rank + years:public +
log(takers):public, data = case1201)
Coefficients:
(Intercept) log(takers) years public expend rank years:public
2590.556 19.42852 -134.2278 -26.43972 4.347684 5.991911 1.661026
log(takers):public
-0.5848999

Stepwise using BIC

> stepAIC(mfull,list(upper=~.^2,lower=~1),k=log(49))
Start: AIC= 325.12
sat ~ log(takers) + income + years + public + expend + rank

+ years:public 1 5027.807692 16368.93 315.8892
- public 1 19.997447 21416.74 321.2762
- income 1 340.339906 21737.08 322.0037
+ income:public 1 1977.822427 19418.92 324.2615
+ income:years 1 1804.755461 19591.98 324.6963
<none> NA NA 21396.74 325.1222
+ public:rank 1 1452.863422 19943.87 325.5686
- log(takers) 1 2150.004922 23546.74 325.9221
+ log(takers):years 1 1197.663996 20199.07 326.1916
+ income:rank 1 1046.006240 20350.73 326.5581
- years 1 2531.615348 23928.35 326.7099
- rank 1 2679.046601 24075.78 327.0109
+ years:rank 1 485.951497 20910.79 327.8884
+ expend:rank 1 323.487437 21073.25 328.2676
+ years:expend 1 93.688852 21303.05 328.7990
+ public:expend 1 51.522079 21345.22 328.8959
+ log(takers):rank 1 44.248267 21352.49 328.9126
+ income:expend 1 9.445369 21387.29 328.9924
- expend 1 10964.372896 32361.11 341.5027
Step: AIC= 315.89

sat ~ log(takers) + income + years + public + expend + rank + years:public

- income 1 193.844212 16562.77 312.5743
<none> NA NA 16368.93 315.8892
+ income:rank 1 869.194138 15499.74 317.1075
+ public:rank 1 587.095100 15781.84 317.9913
+ expend:rank 1 513.555766 15855.37 318.2191
- log(takers) 1 2319.536747 18688.47 318.4910
- rank 1 2533.477921 18902.41 319.0487
+ income:public 1 119.187306 16249.74 319.4230
+ log(takers):rank 1 96.896741 16272.03 319.4901
+ income:expend 1 16.336369 16352.59 319.7321
+ income:years 1 10.664688 16358.27 319.7491
+ log(takers):years 1 9.199796 16359.73 319.7535
+ public:expend 1 4.688396 16364.24 319.7670
+ years:rank 1 4.080195 16364.85 319.7688
+ years:expend 1 3.618119 16365.31 319.7702
- years:public 1 5027.807692 21396.74 325.1222
- expend 1 13670.486641 30039.42 341.7466
Step: AIC= 312.57

sat ~ log(takers) + years + public + expend + rank + years:public

<none> NA NA 16562.77 312.5743
+ log(takers):public 1 7.036022e+002 15859.17 314.3390
+ expend:rank 1 6.439627e+002 15918.81 314.5230
+ log(takers):expend 1 6.224671e+002 15940.31 314.5891
+ public:rank 1 4.726451e+002 16090.13 315.0475
- rank 1 2.456165e+003 19018.94 315.4581
+ income 1 1.938442e+002 16368.93 315.8892
+ public:expend 1 3.375877e+000 16559.40 316.4561
+ log(takers):rank 1 1.935137e+000 16560.84 316.4604
+ years:expend 1 1.528711e+000 16561.25 316.4616
+ years:rank 1 8.679866e-001 16561.91 316.4635
+ log(takers):years 1 5.202697e-002 16562.72 316.4659
- log(takers) 1 2.985168e+003 19547.94 316.8024
- years:public 1 5.174303e+003 21737.08 322.0037
- expend 1 1.615704e+004 32719.81 342.0428
Call:
lm(formula = sat ~ log(takers) + years + public + expend + rank + years:public,
data = case1201)
Coefficients:
(Intercept) log(takers) years public expend rank years:public
3274.012 -34.05226 -164.8157 -33.8661 4.651103 5.040749 2.042115
Degrees of freedom: 49 total; 42 residual
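
The criterion values printed at each step above can be checked by hand. The sketch below is written in Python purely for the arithmetic and assumes the usual linear-model form n*log(RSS/n) + k*p, with p counting the intercept, n = 49 (an assumption that reproduces the printed values), and k = log(n) because this run uses BIC. The function name `criterion` is ours, not an S-Plus function.

```python
import math

def criterion(rss, n, p, k=2.0):
    """n*log(RSS/n) + k*p; k=2 gives AIC, k=log(n) gives BIC."""
    return n * math.log(rss / n) + k * p

n = 49              # assumed number of states used in the fit
k = math.log(n)     # BIC multiplier

# Model before the step: 7 terms plus intercept, RSS from the <none> row.
with_income = criterion(16368.93, n, p=8, k=k)
# Model after dropping income: 6 terms plus intercept.
without_income = criterion(16562.77, n, p=7, k=k)

print(round(with_income, 2), round(without_income, 2))  # 315.89 312.57
```

Dropping income raises the RSS slightly, but the saving of one parameter (worth log(49), about 3.89) more than offsets it, so the criterion falls and the step is taken.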

> m1 <- lm(log(ozone)~wind+temp+solar,data=Ozone)
> summary(m1)
Call: lm(formula = log(ozone) ~ wind + temp + solar, data = Ozone)

Residuals:
Min 1Q Median 3Q Max
-1.0203 -0.31515 -0.0093072 0.32296 1.1222
Coefficients:
Value Std. Error t value Pr(>|t|)
(Intercept) 0.26236 0.52033 0.50423 0.61515
wind -0.06931 0.01450 -4.77854 0.00001
temp 0.04445 0.00568 7.82953 0.00000
solar 0.00219 0.00052 4.24768 0.00005
Residual standard error: 0.46651 on 106 degrees of freedom

Multiple R-Squared: 0.67369
F-statistic: 72.947 on 3 and 106 degrees of freedom, the p-value is 0
Stepwise regression using AIC: start with the main effects model and allow all two-way interactions
and quadratic terms; “lower” specifies the lowest allowable model, which is the main effects model.
> stepAIC(m1,list(upper=~.^2+I(wind^2)+I(temp^2)+I(solar^2),lower=m1))
Start: AIC= -163.82
log(ozone) ~ wind + temp + solar

Df Sum of Sq RSS AIC
+ I(wind^2) 1 1.67592921 21.392844 -170.11663
+ I(temp^2) 1 1.25107360 21.817700 -167.95347
+ temp:solar 1 1.17208023 21.896693 -167.55592
+ wind:temp 1 0.88700820 22.181765 -166.13308
+ I(solar^2) 1 0.45453682 22.614236 -164.00908
<none> NA NA 23.068773 -163.82005
+ wind:solar 1 0.34252408 22.726249 -163.46557
Step: AIC= -170.12

log(ozone) ~ wind + temp + solar + I(wind^2)

Df Sum of Sq RSS AIC
+ temp:solar 1 0.67869427353 20.714150 -171.66297
+ I(temp^2) 1 0.63901036417 20.753834 -171.45243
+ I(solar^2) 1 0.51800644492 20.874838 -170.81295
<none> NA NA 21.392844 -170.11663
+ wind:solar 1 0.06713886979 21.325705 -168.46239
+ wind:temp 1 0.00030265311 21.392541 -168.11818
- I(wind^2) 1 1.67592920662 23.068773 -163.82005
Step: AIC= -171.66

log(ozone) ~ wind + temp + solar + I(wind^2) + temp:solar

Df Sum of Sq RSS AIC
+ I(solar^2) 1 0.7474978978 19.966652 -173.70586
<none> NA NA 20.714150 -171.66297
+ I(temp^2) 1 0.2793246140 20.434825 -171.15638
- temp:solar 1 0.6786942735 21.392844 -170.11663
+ wind:temp 1 0.0536327944 20.660517 -169.94815
+ wind:solar 1 0.0015866564 20.712563 -169.67139
- I(wind^2) 1 1.1825432544 21.896693 -167.55592
Step: AIC= -173.71

log(ozone) ~ wind + temp + solar + I(wind^2) + I(solar^2) + temp:solar

Df Sum of Sq RSS AIC
<none> NA NA 19.966652 -173.70586
+ I(temp^2) 1 0.2687418912 19.697910 -173.19646
+ wind:temp 1 0.0981394822 19.868512 -172.24786
+ wind:solar 1 0.0051540289 19.961498 -171.73426
- I(solar^2) 1 0.7474978978 20.714150 -171.66297
- temp:solar 1 0.9081857264 20.874838 -170.81295
- I(wind^2) 1 1.1811295810 21.147781 -169.38399
Call:
lm(formula = log(ozone) ~ wind + temp + solar + I(wind^2) + I(solar^2) +
temp:solar, data = Ozone)
Coefficients:
(Intercept) wind temp solar I(wind^2)
2.7000915 -0.19764083 0.016191722 -0.0024656831 0.0059294158
I(solar^2) temp:solar
-0.000012334129 0.0001202964
Degrees of freedom: 110 total; 103 residual

Stepwise using BIC (k is the multiplier on p; the default k=2 gives AIC, and k=log(n) gives BIC, although the output still labels the criterion “AIC”):
> stepAIC(m1,list(upper=~.^2+I(wind^2)+I(temp^2)+I(solar^2),lower=m1),k=log(110))
Start: AIC= -153.02
log(ozone) ~ wind + temp + solar

Df Sum of Sq RSS AIC
+ I(wind^2) 1 1.67592921 21.392844 -156.61423
+ I(temp^2) 1 1.25107360 21.817700 -154.45107
+ temp:solar 1 1.17208023 21.896693 -154.05352
<none> NA NA 23.068773 -153.01813
+ wind:temp 1 0.88700820 22.181765 -152.63068
+ I(solar^2) 1 0.45453682 22.614236 -150.50668
+ wind:solar 1 0.34252408 22.726249 -149.96317
Step: AIC= -156.61

log(ozone) ~ wind + temp + solar + I(wind^2)

Df Sum of Sq RSS AIC
<none> NA NA 21.392844 -156.61423
+ temp:solar 1 0.67869427353 20.714150 -155.46008
+ I(temp^2) 1 0.63901036417 20.753834 -155.24955
+ I(solar^2) 1 0.51800644492 20.874838 -154.61006
- I(wind^2) 1 1.67592920662 23.068773 -153.01813
+ wind:solar 1 0.06713886979 21.325705 -152.25951
+ wind:temp 1 0.00030265311 21.392541 -151.91530
Call:
lm(formula = log(ozone) ~ wind + temp + solar + I(wind^2), data = Ozone)
Coefficients:
(Intercept) wind temp solar I(wind^2)
1.1932358 -0.22081888 0.041915712 0.0022096915 0.0068982286
Degrees of freedom: 110 total; 105 residual
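
The AIC run went on to add temp:solar after I(wind^2), while the BIC run stopped. A rough way to see why, sketched in Python for the arithmetic only: under the form n*log(RSS/n) + k*p, adding one term lowers the criterion exactly when the drop in n*log(RSS/n) exceeds k. The helper name `gain` is ours, and the RSS values are copied from the output above.

```python
import math

def gain(rss_old, rss_new, n):
    """Drop in n*log(RSS/n) obtained by adding one term to the model."""
    return n * math.log(rss_old / rss_new)

n = 110
# Adding temp:solar to log(ozone) ~ wind + temp + solar + I(wind^2)
g = gain(21.392844, 20.714150, n)

aic_accepts = g > 2.0           # AIC charges 2 per extra parameter
bic_accepts = g > math.log(n)   # BIC charges log(110), about 4.70

print(round(g, 2), aic_accepts, bic_accepts)
```

The gain is roughly 3.5, which clears the AIC penalty of 2 but not the BIC penalty of about 4.7, so the two criteria part ways at this step.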

Bayesian posterior probabilities based on equal priors
Model p SSE R2 MSE AIC BIC PRESS EXP(-BIC) Post. Prob

W + T + S + W:T + W:S + T:S 7 21.534 0.695 0.209 -165.39 -146.49 25.62 4.16676E+63 0.00002
W + T + S + W:T + W:S 6 22.152 0.687 0.213 -164.28 -148.08 25.56 2.04328E+64 0.00012
W + T + S + W:T + T:S 6 21.537 0.695 0.207 -167.38 -151.17 24.51 4.49052E+65 0.00254
W + T + S + W:S + T:S 6 21.867 0.691 0.210 -165.70 -149.50 25.44 8.45328E+64 0.00048
W + T + S + W:T 5 22.182 0.686 0.211 -166.13 -152.63 24.55 1.93360E+66 0.01093
W + T + S + W:S 5 22.726 0.679 0.216 -163.47 -149.96 25.63 1.33906E+65 0.00076
W + T + S + T:S 5 21.897 0.690 0.209 -167.56 -154.05 24.54 7.99954E+66 0.04522
W+T+S 4 23.069 0.674 0.218 -163.82 -153.02 25.20 2.85589E+66 0.01614
W + T + W:T 4 26.372 0.627 0.249 -149.10 -138.30 28.54 1.15592E+60 0.00000
W+T 3 26.995 0.618 0.252 -148.53 -140.43 28.78 9.72689E+60 0.00000
W + S + W:S 4 36.121 0.489 0.341 -114.50 -103.69 39.39 1.07645E+45 0.00000
W+S 3 36.410 0.485 0.340 -115.62 -107.52 38.70 4.95841E+46 0.00000
T + S + T:S 4 27.029 0.618 0.255 -146.39 -135.59 29.22 7.69111E+58 0.00000
T+S 3 28.038 0.603 0.262 -144.36 -136.26 29.68 1.50302E+59 0.00000
W 2 44.985 0.364 0.417 -94.36 -88.95 46.84 4.27065E+38 0.00000
T 2 31.908 0.549 0.295 -132.14 -126.74 32.98 1.10276E+55 0.00000
S 2 57.974 0.180 0.537 -66.45 -61.05 60.15 3.26346E+26 0.00000
Constant 1 70.695 0.000 0.649 -46.63 -43.93 72.00 1.19828E+19 0.00000
W + T + S + W^2 + T^2 + S^2 7 20.175 0.715 0.196 -172.56 -153.66 23.57 5.41614E+66 0.03062
W + T + S + W^2 + T^2 6 20.754 0.706 0.200 -171.45 -155.25 23.79 2.65594E+67 0.15014
W + T + S + W^2 + S^2 6 20.875 0.705 0.201 -170.81 -154.61 23.51 1.40046E+67 0.07917
W + T + S + T^2 + S^2 6 21.270 0.699 0.205 -168.75 -152.55 24.15 1.78494E+66 0.01009
W + T + S + W^2 5 21.393 0.697 0.204 -170.12 -156.61 23.65 1.03481E+68 0.58499
W + T + S + T^2 5 21.818 0.691 0.208 -167.95 -154.45 24.36 1.19339E+67 0.06746
W + T + S + S^2 5 22.614 0.680 0.215 -164.01 -150.51 25.12 2.32093E+65 0.00131
W + T + W^2 + T^2 5 24.924 0.647 0.237 -153.31 -139.81 28.19 5.23253E+60 0.00000
W + T + W^2 4 25.390 0.641 0.240 -153.27 -142.47 27.68 7.48057E+61 0.00000
W + T + T^2 4 25.998 0.632 0.245 -150.67 -139.87 28.33 5.55609E+60 0.00000
W + S + W^2 + S^2 5 29.996 0.576 0.286 -132.94 -119.43 32.79 7.37547E+51 0.00000
W + S + W^2 4 32.958 0.534 0.311 -124.58 -113.78 35.31 2.59434E+49 0.00000
W + S + S^2 4 33.350 0.528 0.315 -123.28 -112.47 36.12 7.00004E+48 0.00000
T + S + T^2 + S^2 5 25.466 0.640 0.243 -150.95 -137.44 28.14 4.89140E+59 0.00000
T + S + T^2 4 26.418 0.626 0.249 -148.91 -138.11 28.58 9.55897E+59 0.00000
T + S + S^2 4 27.207 0.615 0.257 -145.67 -134.87 29.39 3.74366E+58 0.00000
W + W^2 3 41.263 0.416 0.386 -101.86 -93.76 43.98 5.24144E+40 0.00000
T + T^2 3 30.579 0.567 0.286 -134.82 -126.72 32.32 1.08093E+55 0.00000
S + S^2 3 49.093 0.306 0.459 -82.74 -74.64 51.72 2.60459E+32 0.00000
Total 1.76893E+68 1.00000
0.207
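
The last two columns of the table can be reproduced from the BIC column alone. The sketch below (Python, for the arithmetic only) follows the table's convention of weighting each model by EXP(-BIC) and dividing by the total over all 37 models; the function name `posterior_probs` is ours. Because the BIC values are typed in to two decimals, the results agree with the table only to about three decimals.

```python
import math

def posterior_probs(bic):
    """Normalise exp(-BIC) over the candidate models (equal priors)."""
    best = min(bic.values())
    # Shifting by the smallest BIC keeps the exponentials small;
    # the shift cancels when dividing by the total.
    w = {m: math.exp(-(b - best)) for m, b in bic.items()}
    total = sum(w.values())
    return {m: v / total for m, v in w.items()}

bic = {
    "W + T + S + W:T + W:S + T:S": -146.49, "W + T + S + W:T + W:S": -148.08,
    "W + T + S + W:T + T:S": -151.17, "W + T + S + W:S + T:S": -149.50,
    "W + T + S + W:T": -152.63, "W + T + S + W:S": -149.96,
    "W + T + S + T:S": -154.05, "W + T + S": -153.02,
    "W + T + W:T": -138.30, "W + T": -140.43,
    "W + S + W:S": -103.69, "W + S": -107.52,
    "T + S + T:S": -135.59, "T + S": -136.26,
    "W": -88.95, "T": -126.74, "S": -61.05, "Constant": -43.93,
    "W + T + S + W^2 + T^2 + S^2": -153.66, "W + T + S + W^2 + T^2": -155.25,
    "W + T + S + W^2 + S^2": -154.61, "W + T + S + T^2 + S^2": -152.55,
    "W + T + S + W^2": -156.61, "W + T + S + T^2": -154.45,
    "W + T + S + S^2": -150.51, "W + T + W^2 + T^2": -139.81,
    "W + T + W^2": -142.47, "W + T + T^2": -139.87,
    "W + S + W^2 + S^2": -119.43, "W + S + W^2": -113.78,
    "W + S + S^2": -112.47, "T + S + T^2 + S^2": -137.44,
    "T + S + T^2": -138.11, "T + S + S^2": -134.87,
    "W + W^2": -93.76, "T + T^2": -126.72, "S + S^2": -74.64,
}

post = posterior_probs(bic)
top = max(post, key=post.get)
print(top, round(post[top], 3))
```

The model W + T + S + W^2 carries a little under 60% of the posterior weight, and only a handful of models with BIC within about 3 of the minimum account for nearly all of the rest.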
Example: Data were collected for each of the 50 states
on the average SAT score and a number of other variables. The reason for
collecting the other variables is to help explain the discrepancy between
states' SAT averages. For example, many midwestern states (Montana included)
have much higher SAT scores than other regions. A closer look reveals that
this difference is due primarily to the fact that only the better students
in these states actually take the SAT exam. Hence it is important to examine
what factors affect the average SAT scores for each state. Some of the
variables considered as ``explanatory'' variables were:
\begin{enumerate}
\item Percentage of eligible students who took the exam (TAKERS)
\item Median income of families of test-takers (INCOME)
\item Average number of years of study in social science, natural science,
and humanities among the test-takers (YEARS)
\item Percentage of test-takers in public schools (PUBLIC)
\item State expenditures in hundreds of dollars per student (EXPEND)
\item Median percentile ranking of test-takers within their schools (RANK).
\end{enumerate}
Before fitting any models, it is a good idea to examine
the relationships between all pairs of variables. A scatterplot
matrix and a correlation matrix are very useful. The variable
TAKERS appears to have a nonlinear relationship with SAT score so
we may want to consider a transformation of takers: log of TAKERS
appears to work well. There also appear to be a couple of
outliers; Alaska is a particularly extreme outlier on state
expenditures (EXPEND).
For this data set, there are other possible
objectives besides finding good models for predicting SAT score.
For example:
\begin{quotation}
{\em After accounting for the percentage of students who
took the test (Log(TAKERS)) and the median class rank of the
test-takers (RANK), which variables are important predictors of
state SAT scores?} \end{quotation}
\begin{quotation} {\em
After accounting for the percentage of students who took the test
(TAKERS) and the median class rank of the test-takers (RANK),
which states performed best for the amount of money they spend?}
\end{quotation}
The first question might be examined by looking at partial
correlations between SAT score and other variables after adjusting
for TAKERS and RANK. Added variable plots and partial residual
plots (available in S-Plus on the regression menu) allow us to
look at this visually (these plots should be obtained by adding
each variable separately to the model with TAKERS and RANK).
The second question could be answered in this way. First, fit the
regression model involving the TAKERS and RANK variables. What do
the resulting residuals tell us? The residuals are the differences
between the observed SAT scores and those predicted by the variables
TAKERS and RANK. A positive residual means the SAT score is
higher than predicted and a negative residual means it is lower
than predicted based on these 2 variables. The states could then
be ranked based on these residuals.
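
The residual-ranking idea can be sketched as follows. This is an illustration only, written in Python: the six "states" and all their values are HYPOTHETICAL (they are not the case1201 data), the helper `fit_ols` is ours, and using log(TAKERS) rather than TAKERS is an assumption carried over from the transformation suggested earlier.

```python
import math

def fit_ols(X, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    for c in range(p):
        # Partial pivoting, then clear column c in every other row.
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

# HYPOTHETICAL data for six made-up states.
states = ["A", "B", "C", "D", "E", "F"]
takers = [5, 10, 20, 40, 60, 70]           # percent taking the exam
rank = [90, 82, 84, 74, 76, 70]            # median class rank percentile
sat = [1050, 1010, 980, 930, 900, 905]     # average SAT score

X = [[1.0, math.log(t), r] for t, r in zip(takers, rank)]
beta = fit_ols(X, sat)
fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
resid = [y - f for y, f in zip(sat, fitted)]

# Largest positive residual first: score furthest above its prediction.
ranking = sorted(zip(states, resid), key=lambda sr: sr[1], reverse=True)
for s, r in ranking:
    print(s, round(r, 1))
```

States at the top of the list score higher than their TAKERS and RANK values would predict; those at the bottom score lower.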
\end{document}
\underline{Note:} Both AIC and BIC are available in
S-Plus in the MASS library. The AIC of any fitted linear model
can be obtained by the command \textbf{extractAIC(m)} and the BIC
by \textbf{extractAIC(m,k=log(n))} where m is a fitted model and
$n$ is the sample size. Stepwise regression using AIC or BIC is
obtained from the \textbf{stepAIC} command which is illustrated on
a separate handout.
Example
Ozone data without case 17. n = 110 cases. Dependent variable is log(ozone).
All possible models with main effects and two-way interactions
Model p SSE R2 MSE AIC BIC PRESS
W + T + S + W:T + W:S + T:S 7 21.534 0.695 0.209 -165.39 -146.49 25.62
W + T + S + W:T + W:S 6 22.152 0.687 0.213 -164.28 -148.08 25.56
W + T + S + W:T + T:S 6 21.537 0.695 0.207 -167.38 -151.17 24.51
W + T + S + W:S + T:S 6 21.867 0.691 0.210 -165.70 -149.50 25.44
W + T + S + W:T 5 22.182 0.686 0.211 -166.13 -152.63 24.55
W + T + S + W:S 5 22.726 0.679 0.216 -163.47 -149.96 25.63
W + T + S + T:S 5 21.897 0.690 0.209 -167.56 -154.05 24.54
W+T+S 4 23.069 0.674 0.218 -163.82 -153.02 25.20
W + T + W:T 4 26.372 0.627 0.249 -149.10 -138.30 28.54
W+T 3 26.995 0.618 0.252 -148.53 -140.43 28.78
W + S + W:S 4 36.121 0.489 0.341 -114.50 -103.69 39.39
W+S 3 36.410 0.485 0.340 -115.62 -107.52 38.70
T + S + T:S 4 27.029 0.618 0.255 -146.39 -135.59 29.22
T+S 3 28.038 0.603 0.262 -144.36 -136.26 29.68
W 2 44.985 0.364 0.417 -94.36 -88.95 46.84
T 2 31.908 0.549 0.295 -132.14 -126.74 32.98
S 2 57.974 0.180 0.537 -66.45 -61.05 60.15
Constant 1 70.695 0.000 0.649 -46.63 -43.93 72.00
All possible models with main effects and quadratic terms
Model p SSE R2 MSE AIC BIC PRESS
W + T + S + W^2 + T^2 + S^2 7 20.175 0.715 0.196 -172.56 -153.66 23.57
W + T + S + W^2 + T^2 6 20.754 0.706 0.200 -171.45 -155.25 23.79
W + T + S + W^2 + S^2 6 20.875 0.705 0.201 -170.81 -154.61 23.51
W + T + S + T^2 + S^2 6 21.270 0.699 0.205 -168.75 -152.55 24.15
W + T + S + W^2 5 21.393 0.697 0.204 -170.12 -156.61 23.65
W + T + S + T^2 5 21.818 0.691 0.208 -167.95 -154.45 24.36
W + T + S + S^2 5 22.614 0.680 0.215 -164.01 -150.51 25.12
W + T + W^2 + T^2 5 24.924 0.647 0.237 -153.31 -139.81 28.19
W + T + W^2 4 25.390 0.641 0.240 -153.27 -142.47 27.68
W + T + T^2 4 25.998 0.632 0.245 -150.67 -139.87 28.33
W + S + W^2 + S^2 5 29.996 0.576 0.286 -132.94 -119.43 32.79
W + S + W^2 4 32.958 0.534 0.311 -124.58 -113.78 35.31
W + S + S^2 4 33.350 0.528 0.315 -123.28 -112.47 36.12
T + S + T^2 + S^2 5 25.466 0.640 0.243 -150.95 -137.44 28.14
T + S + T^2 4 26.418 0.626 0.249 -148.91 -138.11 28.58
T + S + S^2 4 27.207 0.615 0.257 -145.67 -134.87 29.39
W + W^2 3 41.263 0.416 0.386 -101.86 -93.76 43.98
T + T^2 3 30.579 0.567 0.286 -134.82 -126.72 32.32
S + S^2 3 49.093 0.306 0.459 -82.74 -74.64 51.72
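
The R2 and MSE entries in these tables follow directly from SSE and p. A small check in Python, assuming R2 = 1 - SSE/SST with SST taken as the SSE of the constant-only model (70.695) and MSE = SSE/(n - p); the function name `r2_and_mse` is ours.

```python
n = 110        # cases in the ozone data (case 17 removed)
sst = 70.695   # SSE of the constant-only model, i.e. the total sum of squares

def r2_and_mse(sse, p):
    """Return (R-squared, mean squared error) for a model with p coefficients."""
    return 1 - sse / sst, sse / (n - p)

r2, mse = r2_and_mse(23.069, 4)       # W+T+S row
print(round(r2, 3), round(mse, 3))    # 0.674 0.218
```

R2 can only rise as terms are added, whereas MSE divides by n - p and can rise, which is one reason the tables also report AIC, BIC, and PRESS.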
