
Model Selection in R

We will work again with the data from Problem 6.9, Grocery Retailer. Recall that we formed a data table named Grocery consisting of the variables Hours, Cases, Costs, and Holiday. We fit a full linear model, which we named Retailer, with Hours as the response variable and Cases, Costs, and Holiday as the three predictor variables. Writing X1 = Cases, X2 = Costs, and X3 = Holiday, we can have a sub-model that includes only X1, or only X2, or only X3, or just X1 and X2, or just X1 and X3, or just X2 and X3, or all three variables. We can also complicate matters by including powers of these variables (appropriately centered), interactions like X1X2, or other transformations (square root, log, etc.), but then we would have to obtain another full model that includes all these variables.
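With P - 1 = 3 predictors there are 2^3 - 1 = 7 non-empty sub-models. A minimal base-R sketch that just lists them as model formulas; nothing is fitted, and the only assumption is the three column names above:

```r
# List every non-empty subset of the three predictors as a model formula.
predictors <- c("Cases", "Costs", "Holiday")

# combn(x, k, simplify = FALSE) returns a list of all size-k subsets of x
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

formulas <- vapply(subsets,
                   function(s) paste("Hours ~", paste(s, collapse = " + ")),
                   character(1))
print(formulas)
```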
Our first model selection tool is the R function leaps(). This is found in the package leaps, which must first be loaded:
> library(leaps)
If R can't find the package, you will need to download and install it from the R repository, either via the Packages menu and its Install package(s) option or by typing install.packages("leaps") at the prompt. The leaps() function searches for the best subsets of your predictors using whichever criterion you designate. To use this function, we need to provide it with a matrix consisting of the predictor variables, a vector consisting of the response variable, the names of the predictor variables, and the criterion to use. For this example, the predictor variables are the second through fourth columns of the table Grocery, i.e., Grocery[,2:4], and the response variable is the first column, i.e., Grocery[,1]. The names of the predictors are contained in the second through fourth elements of the vector names(Grocery), i.e., names(Grocery)[2:4]. Suppose we use the Mallows' Cp criterion for model selection. Then the R command to use is:
> leaps( x=Grocery[,2:4], y=Grocery[,1],
  names=names(Grocery)[2:4], method="Cp")
We get the following output:
The first part of the output, denoted $which, lists seven possible sub-models in seven rows. The row labels in the first column give the number of predictors in each sub-model, and the variables in each sub-model are those marked TRUE in that row. For example, the first sub-model includes just the variable Holiday (note that Cases and Costs are FALSE and Holiday is TRUE). The next two parts of the output ($label and $size) add little new information, although $size gives p for each row. The last part, designated $Cp, gives the value of the Mallows' Cp criterion for each sub-model, in the same order. The best sub-model is the one whose Cp value is small and closest to p (the number of parameters in the model, including the intercept). For the full model we always have Cp = p, so the idea is to find a suitable reduced model, if possible. Here the best reduced model is the third one, consisting of Cases and Holiday, for which Cp ≈ 2.325 and p = 3.
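To see what the $Cp column measures, the same table can be rebuilt by brute force, which is cheap with three predictors. This is a sketch, not how leaps() works internally (it uses a more efficient search). Because the textbook data are not reproduced here, the code builds a synthetic stand-in called Grocery with made-up coefficients, so its numbers will not match the output discussed above; the helper cp_for() is hypothetical.

```r
# Synthetic stand-in for the Grocery table (made-up numbers, NOT the textbook data)
set.seed(1)
n <- 52
Grocery <- data.frame(
  Cases   = round(runif(n, 200000, 500000)),
  Costs   = round(runif(n, 4, 10), 2),
  Holiday = as.integer(seq_len(n) %% 13 == 0)
)
Grocery$Hours <- 4000 + 0.001 * Grocery$Cases + 600 * Grocery$Holiday +
  rnorm(n, sd = 140)  # Costs has no true effect in this simulation

predictors <- c("Cases", "Costs", "Holiday")
full <- lm(Hours ~ Cases + Costs + Holiday, data = Grocery)
MSE_full <- deviance(full) / df.residual(full)

# Hypothetical helper: Mallows' Cp = SSE_p / MSE_full - (n - 2p)
cp_for <- function(vars) {
  fit <- lm(reformulate(vars, response = "Hours"), data = Grocery)
  p <- length(coef(fit))  # number of parameters, including the intercept
  c(p = p, Cp = deviance(fit) / MSE_full - (n - 2 * p))
}

subsets <- unlist(lapply(seq_along(predictors),
                         function(k) combn(predictors, k, simplify = FALSE)),
                  recursive = FALSE)
res <- t(sapply(subsets, cp_for))
rownames(res) <- sapply(subsets, paste, collapse = "+")
print(round(res, 3))
```

For the full model the Cp column equals p exactly, which is a handy sanity check on any Cp table.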
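Instead of the Mallows' Cp criterion, we can use the R² or the adjusted R² criterion. Just use method="r2" or method="adjr2", respectively, in place of method="Cp" as the last function argument. For adjusted R², the highest value indicates the best sub-model. Plain R² never decreases when a predictor is added, so it identifies the best subset of each size, and across sizes we look for the point where adding predictors no longer raises R² appreciably.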
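The R² and adjusted R² columns can be sketched the same way with base R. Again the data frame is a synthetic stand-in with made-up coefficients, and r2_for() is a hypothetical helper that reads both values from summary():

```r
# Synthetic stand-in for the Grocery table (made-up numbers, NOT the textbook data)
set.seed(1)
n <- 52
Grocery <- data.frame(
  Cases   = round(runif(n, 200000, 500000)),
  Costs   = round(runif(n, 4, 10), 2),
  Holiday = as.integer(seq_len(n) %% 13 == 0)
)
Grocery$Hours <- 4000 + 0.001 * Grocery$Cases + 600 * Grocery$Holiday +
  rnorm(n, sd = 140)  # Costs has no true effect in this simulation

predictors <- c("Cases", "Costs", "Holiday")
subsets <- unlist(lapply(seq_along(predictors),
                         function(k) combn(predictors, k, simplify = FALSE)),
                  recursive = FALSE)

# Hypothetical helper: R^2_p and R^2_{a,p} from the model summary
r2_for <- function(vars) {
  s <- summary(lm(reformulate(vars, response = "Hours"), data = Grocery))
  c(R2 = s$r.squared, adjR2 = s$adj.r.squared)
}

tab <- t(sapply(subsets, r2_for))
rownames(tab) <- sapply(subsets, paste, collapse = "+")
print(round(tab, 4))

# Sub-model with the largest adjusted R^2
rownames(tab)[which.max(tab[, "adjR2"])]
```

The full model always has the largest R², which is why the adjusted version is the one to maximize across sizes.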
A less attractive alternative to using the leaps() function is to make a list of each sub-model you wish to consider, then fit a linear model for each sub-model individually to obtain the selection criteria for that model. One way to make this easier is to start with the full model, then use the update() function to remove and/or add predictors step by step. For instance, we could start with our full model Retailer and delete just one variable, Costs, fitting a new model named NewMod with only the remaining predictors. The R command is
> NewMod <- update( Retailer, .~. - Costs )
Then if we want to modify NewMod so that we obtain another new model with both Costs and Cases deleted, the R command would be:
> NewMod <- update( NewMod, .~. - Cases)
If you then want to add Costs back into the model (but not Cases), the R command would be:
> NewMod <- update( NewMod, .~. + Costs)
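The three update() calls above can be run end to end as follows. The data frame is a synthetic stand-in (made-up numbers) so the sketch runs on its own; the model names Retailer and NewMod follow the text.

```r
# Synthetic stand-in for the Grocery table (made-up numbers, NOT the textbook data)
set.seed(1)
n <- 52
Grocery <- data.frame(
  Cases   = round(runif(n, 200000, 500000)),
  Costs   = round(runif(n, 4, 10), 2),
  Holiday = as.integer(seq_len(n) %% 13 == 0)
)
Grocery$Hours <- 4000 + 0.001 * Grocery$Cases + 600 * Grocery$Holiday +
  rnorm(n, sd = 140)

Retailer <- lm(Hours ~ Cases + Costs + Holiday, data = Grocery)

NewMod <- update(Retailer, . ~ . - Costs)  # Hours ~ Cases + Holiday
NewMod <- update(NewMod, . ~ . - Cases)    # Hours ~ Holiday
NewMod <- update(NewMod, . ~ . + Costs)    # Hours ~ Holiday + Costs
print(formula(NewMod))

# R^2_p and R^2_{a,p} for the current sub-model
s <- summary(NewMod)
c(R2 = s$r.squared, adjR2 = s$adj.r.squared)
```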
At each step you can obtain the values of R²_p and R²_a,p from the model summary, where they are given as Multiple R-squared and Adjusted R-squared, respectively, so record those next to the corresponding subset. To obtain the Mallows' Cp criterion for each sub-model, however, you need your calculator. You also need the value of MSE when you fit the full model with all the potential predictor variables (in our example, MSE = 20532), and the value of n (the number of observations in the data set, which is 52 in our example). Then find the value of SSE for the sub-model in the ANOVA table for that sub-model; this is substituted for SSEp in formula (9.9) on p. 358. Divide it by the MSE from the full model, then subtract n - 2p from the result, where p is one more than the number of variables in your sub-model. That is, Cp = SSEp/MSE - (n - 2p), and the result is the value of Cp for that sub-model.
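The calculator steps can also be done in R. A sketch on synthetic stand-in data (made-up numbers, so the result is not the textbook's), reading MSE and SSEp from the anova() tables of the full model and of the sub-model with Cases and Holiday:

```r
# Synthetic stand-in for the Grocery table (made-up numbers, NOT the textbook data)
set.seed(1)
n <- 52
Grocery <- data.frame(
  Cases   = round(runif(n, 200000, 500000)),
  Costs   = round(runif(n, 4, 10), 2),
  Holiday = as.integer(seq_len(n) %% 13 == 0)
)
Grocery$Hours <- 4000 + 0.001 * Grocery$Cases + 600 * Grocery$Holiday +
  rnorm(n, sd = 140)

full <- lm(Hours ~ Cases + Costs + Holiday, data = Grocery)
sub  <- lm(Hours ~ Cases + Holiday, data = Grocery)

MSE_full <- anova(full)["Residuals", "Mean Sq"]  # MSE of the full model
SSE_p    <- anova(sub)["Residuals", "Sum Sq"]    # SSE of the sub-model
p  <- 2 + 1                                      # two predictors plus the intercept
Cp <- SSE_p / MSE_full - (n - 2 * p)
Cp
```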
To obtain the AICp criterion for any sub-model, you will have to obtain a linear fit involving just the predictors for that sub-model, as described above. Then type
> extractAIC(model)
but put the name you have given the sub-model in place of model. Likewise, to obtain the SBCp criterion (also called BICp), type
> extractAIC(model, k = log(n))
but put the value of n for your data set in place of n, and put the name of your sub-model in place of model. Each command prints two numbers: the first is p, the number of parameters, and the second is the criterion value. We would choose the sub-model that minimizes these two criteria.
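As a check on what extractAIC() returns, this sketch compares it with the usual formulas AICp = n ln SSEp - n ln n + 2p and SBCp = n ln SSEp - n ln n + p ln n. It runs on synthetic stand-in data with made-up coefficients:

```r
# Synthetic stand-in for the Grocery table (made-up numbers, NOT the textbook data)
set.seed(1)
n <- 52
Grocery <- data.frame(
  Cases   = round(runif(n, 200000, 500000)),
  Costs   = round(runif(n, 4, 10), 2),
  Holiday = as.integer(seq_len(n) %% 13 == 0)
)
Grocery$Hours <- 4000 + 0.001 * Grocery$Cases + 600 * Grocery$Holiday +
  rnorm(n, sd = 140)

sub <- lm(Hours ~ Cases + Holiday, data = Grocery)

aic <- extractAIC(sub)              # first element: p; second: AIC_p
sbc <- extractAIC(sub, k = log(n))  # first element: p; second: SBC_p

# The same quantities from the formulas
SSE <- deviance(sub)
p   <- length(coef(sub))
aic_hand <- n * log(SSE) - n * log(n) + 2 * p
sbc_hand <- n * log(SSE) - n * log(n) + log(n) * p
rbind(extractAIC = c(aic[2], sbc[2]), by_hand = c(aic_hand, sbc_hand))
```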
To obtain the PRESSp criterion for each sub-model, type
> sum((model$residuals/(1-hatvalues(model)))^2)
but put the name of your sub-model in place of model. Be careful with the parentheses. We would choose the sub-model that minimizes this value.
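The PRESS shortcut works because, for least squares, the deleted residual equals e_i/(1 - h_ii). This sketch, on synthetic stand-in data, computes PRESS with the one-line formula and confirms it against a brute-force leave-one-out loop; press() is a hypothetical helper name, and residuals(model) is used in place of model$residuals (the two are the same for an lm fit).

```r
# Synthetic stand-in for the Grocery table (made-up numbers, NOT the textbook data)
set.seed(1)
n <- 52
Grocery <- data.frame(
  Cases   = round(runif(n, 200000, 500000)),
  Costs   = round(runif(n, 4, 10), 2),
  Holiday = as.integer(seq_len(n) %% 13 == 0)
)
Grocery$Hours <- 4000 + 0.001 * Grocery$Cases + 600 * Grocery$Holiday +
  rnorm(n, sd = 140)

# Hypothetical helper: PRESS via the hat-value shortcut
press <- function(model) sum((residuals(model) / (1 - hatvalues(model)))^2)

sub <- lm(Hours ~ Cases + Holiday, data = Grocery)

# Brute-force check: refit without observation i, then predict observation i
loo_errors <- vapply(seq_len(nrow(Grocery)), function(i) {
  fit_i <- lm(Hours ~ Cases + Holiday, data = Grocery[-i, ])
  Grocery$Hours[i] - unname(predict(fit_i, newdata = Grocery[i, ]))
}, numeric(1))

c(shortcut = press(sub), brute_force = sum(loo_errors^2))
```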
Since there are 2^(P-1) subsets to consider among P - 1 potential predictor variables, the above process can become very tedious and time consuming when there are four or more predictors. One way around this in R is to use stepwise regression. You can do forward stepwise regression, backward stepwise regression, or a combination of both, but R's step() function uses the AICp criterion at each step instead of the criteria described in the text (by default; adding the argument k = log(n) makes it use SBCp instead). To use this procedure in the forward direction, you first must fit a base model (with one predictor) and a full model (with all the predictors you wish to consider). To fit a base model in our example, we will choose Holiday as our predictor, since we are certain this variable should be included in our final model:
> Base <- lm( Hours ~ Holiday, data=Grocery )
We will use Retailer as our full model. Then to implement forward stepwise regression, type
> step(Base, scope = list( upper=Retailer, lower=~1 ),
  direction = "forward", trace=FALSE)
but use the names of your base model and full model, respectively, in place of Base and Retailer. The output for our example looks like:
The forward stepwise regression procedure identified the model which included the two predictors Holiday and Cases, but not Costs, as the one which produced the lowest value of AIC.
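A self-contained run of the forward search, on a synthetic stand-in data frame (made-up coefficients in which Costs has no true effect), so the selected model need not match the output described for the textbook data. Passing formula(Retailer) as the upper scope is a choice made here for explicitness.

```r
# Synthetic stand-in for the Grocery table (made-up numbers, NOT the textbook data)
set.seed(1)
n <- 52
Grocery <- data.frame(
  Cases   = round(runif(n, 200000, 500000)),
  Costs   = round(runif(n, 4, 10), 2),
  Holiday = as.integer(seq_len(n) %% 13 == 0)
)
Grocery$Hours <- 4000 + 0.001 * Grocery$Cases + 600 * Grocery$Holiday +
  rnorm(n, sd = 140)  # Costs has no true effect in this simulation

Retailer <- lm(Hours ~ Cases + Costs + Holiday, data = Grocery)
Base     <- lm(Hours ~ Holiday, data = Grocery)

# Forward search: start from Base, add terms from the full model while AIC drops
fwd <- step(Base, scope = list(upper = formula(Retailer), lower = ~1),
            direction = "forward", trace = FALSE)
formula(fwd)
```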
To use the same procedure in the backward direction, the command is much simpler, since the full model is the base model. We just type:
> step( Retailer, direction = "backward", trace=FALSE )
but use the name of your full model, with all potential predictor variables included, in place of Retailer. The output for our example looks like:
The backward elimination procedure also identified the best model as the one which includes only Cases and Holiday, not Costs. You can also let step() consider both adding and dropping a predictor at each step by typing "both" in place of "forward" after direction= in the forward stepwise regression command, i.e.,
> step(Base, scope = list( upper=Retailer, lower=~1 ),
  direction = "both", trace=FALSE)
This is probably the best way to go. If you prefer to see the results at each step, regardless of direction, change the last setting to trace=TRUE.
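The backward and combined searches in the same self-contained setting (synthetic data, made-up coefficients). The last call shows the k argument of step(): with k = log(n) the penalty becomes SBC rather than AIC.

```r
# Synthetic stand-in for the Grocery table (made-up numbers, NOT the textbook data)
set.seed(1)
n <- 52
Grocery <- data.frame(
  Cases   = round(runif(n, 200000, 500000)),
  Costs   = round(runif(n, 4, 10), 2),
  Holiday = as.integer(seq_len(n) %% 13 == 0)
)
Grocery$Hours <- 4000 + 0.001 * Grocery$Cases + 600 * Grocery$Holiday +
  rnorm(n, sd = 140)

Retailer <- lm(Hours ~ Cases + Costs + Holiday, data = Grocery)
Base     <- lm(Hours ~ Holiday, data = Grocery)

back <- step(Retailer, direction = "backward", trace = FALSE)
both <- step(Base, scope = list(upper = formula(Retailer), lower = ~1),
             direction = "both", trace = FALSE)
# Same backward search, but penalizing with SBC (k = log(n)) instead of AIC
back_sbc <- step(Retailer, direction = "backward", k = log(n), trace = FALSE)

formula(back); formula(both); formula(back_sbc)
```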
Instead of using the AIC criterion, we can perform a backward stepwise regression using P-values to delete predictors one at a time. Choose a significance level α before you begin. Then start with the full model, look at the corresponding model summary, and identify the predictor (if any) which has the largest P-value (for the t test) above your α level. Then fit a new linear model with that predictor deleted (use the update() function to make this easier). Now look at the model summary corresponding to the new model, and again identify the predictor whose P-value (for the t test) is largest, provided it is not smaller than your α level. Fit a new linear model with that predictor deleted, and continue this process until all the remaining P-values are below your α level.
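The P-value procedure above can be automated with a small loop. backward_by_p() is a hypothetical helper, not a library function; it assumes all predictors are numeric (so each coefficient name matches a term in the formula) and runs here on synthetic stand-in data.

```r
# Synthetic stand-in for the Grocery table (made-up numbers, NOT the textbook data)
set.seed(1)
n <- 52
Grocery <- data.frame(
  Cases   = round(runif(n, 200000, 500000)),
  Costs   = round(runif(n, 4, 10), 2),
  Holiday = as.integer(seq_len(n) %% 13 == 0)
)
Grocery$Hours <- 4000 + 0.001 * Grocery$Cases + 600 * Grocery$Holiday +
  rnorm(n, sd = 140)

Retailer <- lm(Hours ~ Cases + Costs + Holiday, data = Grocery)

# Hypothetical helper: backward elimination on t-test P-values.
# Assumes numeric predictors, so coefficient names equal term names.
backward_by_p <- function(model, alpha = 0.05) {
  repeat {
    tab <- summary(model)$coefficients
    pv  <- tab[rownames(tab) != "(Intercept)", "Pr(>|t|)", drop = FALSE]
    if (nrow(pv) == 0 || max(pv) <= alpha) return(model)
    worst <- rownames(pv)[which.max(pv)]       # largest P-value above alpha
    model <- update(model, as.formula(paste(". ~ . -", worst)))
  }
}

final <- backward_by_p(Retailer, alpha = 0.05)
formula(final)
```

Because each pass refits the model, the P-values of the remaining predictors are re-evaluated after every deletion, as in the manual procedure.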
We can also perform a version of backward (or forward) stepwise regression using the R functions addterm() and dropterm() in the MASS package. To load them, use the library(MASS) command. These functions allow you to use an F-test criterion or a P-value criterion. You should have an F limit and/or an α level chosen ahead of time. Start with your full model, and use the R command:
> dropterm( Retailer, test = "F" )
with the name of your full model in place of Retailer. We get:
Then identify and delete the predictor (if any) with the smallest F-value below your F limit, or the largest P-value above your α level. For example, if the F limit to delete a variable is 2.0, the obvious candidate for deletion is Costs, with an F-value of 0.3251. Likewise, if we are using an α level of 0.05 for deletion, we would delete Costs. Then we fit a new linear model, call it NewMod, with Costs deleted (use the update() function to make this easier), and use the command:
> dropterm( NewMod, test = "F" )
and repeat the process until all F-values are larger than your F limit or all P-values are below your α level.
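The same F-limit rule can be sketched with drop1() from the stats package, which reports the partial F statistic for deleting each term; dropterm(..., test = "F") should report the same statistic. backward_by_F() is a hypothetical helper, and the data are a synthetic stand-in with made-up coefficients.

```r
# Synthetic stand-in for the Grocery table (made-up numbers, NOT the textbook data)
set.seed(1)
n <- 52
Grocery <- data.frame(
  Cases   = round(runif(n, 200000, 500000)),
  Costs   = round(runif(n, 4, 10), 2),
  Holiday = as.integer(seq_len(n) %% 13 == 0)
)
Grocery$Hours <- 4000 + 0.001 * Grocery$Cases + 600 * Grocery$Holiday +
  rnorm(n, sd = 140)

Retailer <- lm(Hours ~ Cases + Costs + Holiday, data = Grocery)

# One round: partial F-test for dropping each term
print(drop1(Retailer, test = "F"))

# Hypothetical helper: delete the smallest-F term while its F is below F_limit
backward_by_F <- function(model, F_limit = 2.0) {
  repeat {
    if (length(attr(terms(model), "term.labels")) == 0) return(model)
    tab <- drop1(model, test = "F")
    tab <- tab[rownames(tab) != "<none>", , drop = FALSE]
    i <- which.min(tab[["F value"]])
    if (tab[["F value"]][i] >= F_limit) return(model)
    model <- update(model, as.formula(paste(". ~ . -", rownames(tab)[i])))
  }
}

final <- backward_by_F(Retailer, F_limit = 2.0)
formula(final)
```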
Similarly, one may start with a reduced model and use the R command addterm() to choose a variable for admission. If you want to begin with a null model consisting of no predictors (just the intercept), use ~ 1 in the model formula, i.e., put 1 where you usually put the names of the predictors:
> Null <- lm( Hours ~ 1, data=Grocery )
Of course, use the name of your response variable in place of Hours and your data table in place of Grocery. Then use the command:
> addterm( Null, scope = Retailer, test="F" )
with the name of your full model in place of Retailer. We get:
Then identify and admit the predictor (if any) with the largest F-value above your F limit or the smallest P-value below your α level. For example, if the F limit to admit a variable is 3.0, the obvious candidate for admission is Holiday, which has the largest F-value. Likewise, if we are using an α level of 0.05 for admission, we would admit Holiday. Fit a new linear model, call it NewMod, with this variable added (use the update() function to make this easier), and use the command:
> addterm( NewMod, scope = Retailer, test = "F" )
and repeat the process until none of the remaining candidates has an F-value above your F limit (or a P-value below your α level).
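And the forward counterpart with add1() from the stats package, again on synthetic stand-in data. forward_by_F() is a hypothetical helper; it stops when no remaining candidate has an F-value above the limit, and it checks for remaining candidates first so that add1() is never called when nothing is left to add.

```r
# Synthetic stand-in for the Grocery table (made-up numbers, NOT the textbook data)
set.seed(1)
n <- 52
Grocery <- data.frame(
  Cases   = round(runif(n, 200000, 500000)),
  Costs   = round(runif(n, 4, 10), 2),
  Holiday = as.integer(seq_len(n) %% 13 == 0)
)
Grocery$Hours <- 4000 + 0.001 * Grocery$Cases + 600 * Grocery$Holiday +
  rnorm(n, sd = 140)

Retailer <- lm(Hours ~ Cases + Costs + Holiday, data = Grocery)
Null     <- lm(Hours ~ 1, data = Grocery)

# One round: partial F-test for adding each candidate term
print(add1(Null, scope = formula(Retailer), test = "F"))

# Hypothetical helper: admit the largest-F candidate while its F exceeds F_limit
forward_by_F <- function(model, upper, F_limit = 3.0) {
  repeat {
    remaining <- setdiff(attr(terms(upper), "term.labels"),
                         attr(terms(model), "term.labels"))
    if (length(remaining) == 0) return(model)
    tab <- add1(model, scope = upper, test = "F")
    tab <- tab[rownames(tab) != "<none>", , drop = FALSE]
    i <- which.max(tab[["F value"]])
    if (tab[["F value"]][i] <= F_limit) return(model)
    model <- update(model, as.formula(paste(". ~ . +", rownames(tab)[i])))
  }
}

final <- forward_by_F(Null, upper = formula(Retailer), F_limit = 3.0)
formula(final)
```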
One can also combine both functions to check as many sub-models as seem reasonable by moving backward and forward. Of course, when there are many variables this becomes impractical. Also, one can use the functions add1() and drop1() instead of addterm() and dropterm(), respectively, in the same manner. These are part of the stats package, which R loads automatically, so you don't need to load any additional package to call them.
