CART from A to B
James Guszcza, FCAS, MAAA
CAS Predictive Modeling Seminar
Chicago
September 2005
Contents
An Insurance Example
Some Basic Theory
Suggested Uses of CART
Case Study: comparing CART with other methods
What is CART?
Philosophy
"Our philosophy in data analysis is to look at the data from a number of different viewpoints. Tree structured regression offers an interesting alternative for looking at regression type problems. It has sometimes given clues to data structure not apparent from a linear regression analysis. Like any tool, its greatest benefit lies in its intelligent and sensible application."
--Breiman, Friedman, Olshen, Stone
An Insurance Example
[Figure: a CART tree for a binary target with classes {0, 1}.]

Root (Node 1, splits on NUM_VEH), N = 57203: class 0: 37891 (66.2%), class 1: 19312 (33.8%)

First split, NUM_VEH > 4.500:
  Left  (N = 36359): class 0: 29083 (80.0%), class 1: 7276 (20.0%)
  Right (N = 20844): class 0: 8808 (42.3%), class 1: 12036 (57.7%)

Growing further (internal nodes: Node 2 splits on LIAB_ONLY, N = 36359; Node 4 splits on NUM_VEH, N = 20844) yields six terminal nodes:

  Terminal 1: class 0, N = 24122: 18984 (78.7%) / 5138 (21.3%)
  Terminal 2: class 1, N = 4367:  2508 (57.4%) / 1859 (42.6%)
  Terminal 3: class 0, N = 7870:  7591 (96.5%) / 279 (3.5%)
  Terminal 4: class 1, N = 8998:  4327 (48.1%) / 4671 (51.9%)
  Terminal 5: class 0, N = 2709:  2072 (76.5%) / 637 (23.5%)
  Terminal 6: class 1, N = 9137:  2409 (26.4%) / 6728 (73.6%)
[The same tree is shown again at successive stages of growth; a later stage adds an internal split at Node 3 on FREQ1_F_RPT (N = 28489).]
High-Dimensional Predictors

Categorical predictors: CART considers every possible subset of categories. This is a nice feature, and a very handy way to group massively categorical predictors into a small number of groups.

[Figure: a tree splitting repeatedly on the categorical predictor LINE_IND$ (root N = 38300). Successive splits on subsets of its levels, e.g. = ("dump", ...) vs. = ("contr", ...), then = ("hauling", ...) and = ("specDel"), produce terminal nodes of N = 11641, 25758, 652, and 249, effectively grouping the levels into a handful of segments.]
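The search described above is finite: a categorical predictor with k levels admits 2^(k-1) - 1 distinct binary groupings. A minimal sketch enumerating them (the level names are taken from the figure, but this is an illustration, not the seminar's software):

```python
from itertools import combinations

def binary_splits(levels):
    """Enumerate all distinct binary partitions of a set of category levels.

    For k levels there are 2**(k-1) - 1 such splits: every proper, nonempty
    subset defines a split, and a subset and its complement are the same split,
    so we fix one level on the left side to avoid mirror duplicates.
    """
    levels = sorted(levels)
    anchor, rest = levels[0], levels[1:]
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            left = {anchor, *combo}
            right = set(levels) - left
            if right:  # skip the trivial split with an empty right side
                yield left, right

# Four of the LINE_IND$ levels shown in the figure
splits = list(binary_splits(["dump", "contr", "hauling", "specDel"]))
print(len(splits))  # 2**(4-1) - 1 = 7
```

With k in the dozens this enumeration explodes, which is why CART's exhaustive subset search is so handy only after the implementation applies shortcuts for it.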
A Little Theory

Splitting Rules

Regression Trees

Classification Trees
Entropy = -[p*log(p) + (1-p)*log(1-p)]
Entropy/Gini are maximized when p = 0.5 and minimized when p = 0 or 1.
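A quick numeric check of those two claims (plain Python, natural log; any log base gives the same argmax):

```python
import math

def entropy(p):
    """Binary entropy: -[p*log(p) + (1-p)*log(1-p)], with 0*log(0) taken as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def gini(p):
    """Binary Gini impurity: 2*p*(1-p)."""
    return 2.0 * p * (1.0 - p)

# Both impurity measures peak at p = 0.5 and vanish at p = 0 or 1.
ps = [i / 100 for i in range(101)]
assert max(ps, key=entropy) == 0.5
assert max(ps, key=gini) == 0.5
assert entropy(0.0) == entropy(1.0) == 0.0
assert gini(0.0) == gini(1.0) == 0.0
```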
Classification Trees vs. Regression Trees

Classification trees: the splitting criteria and the goodness-of-fit measure (misclassification rates) are available as model-tuning parameters.

Regression trees: the splitting criterion and the goodness-of-fit measure are the same: sum of squared errors. No priors or misclassification costs.

"Sequentially collapse nodes that result in the smallest change in purity: weakest link pruning."
--Dan Steinberg, 2004 CAS P.M. Seminar
Cost-Complexity Pruning
Weakest-Link Pruning
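In cost-complexity terms, pruning trades resubstitution error R(T) against tree size |T| via R(T) + alpha*|T|; the "weakest link" is the internal node t with the smallest g(t) = (R(t) - R(T_t)) / (|T_t| - 1), the error increase per leaf removed if t is collapsed. A sketch of one pruning step (the node names and error values below are made up for illustration):

```python
def weakest_link(nodes):
    """Pick the internal node to collapse next under weakest-link pruning.

    `nodes` maps a node name to (r_node, r_subtree, n_leaves):
      r_node    - resubstitution error if the node is collapsed to a leaf,
      r_subtree - error of the subtree rooted at the node,
      n_leaves  - number of leaves in that subtree.
    g(t) = (r_node - r_subtree) / (n_leaves - 1) is the per-leaf cost of collapsing.
    """
    def g(item):
        _, (r_node, r_subtree, n_leaves) = item
        return (r_node - r_subtree) / (n_leaves - 1)
    return min(nodes.items(), key=g)[0]

# Hypothetical internal nodes of a small tree
nodes = {
    "root":  (0.34, 0.20, 4),   # g = 0.14 / 3
    "left":  (0.12, 0.10, 2),   # g = 0.02 / 1  <- weakest link
    "right": (0.25, 0.10, 2),   # g = 0.15 / 1
}
print(weakest_link(nodes))  # left
```

Repeating this step produces the nested sequence of subtrees among which cross-validation then picks the winner.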
Finding the model: 10-fold cross-validation

Divide the data into 10 pieces, P1 through P10. Fit the model 10 times, each time training on nine of the pieces and testing on the remaining one; the held-out test piece rotates so that each of P1-P10 serves as the test set exactly once.

[Figure: a 10 x 10 grid over pieces P1-P10; each row marks nine pieces "train" and one piece "test", with the test piece shifting one position per row.]
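The rotation in the grid above can be sketched as an index generator (a minimal sketch; how observations are assigned to pieces is arbitrary here):

```python
def kfold_indices(n, k=10):
    """Split indices 0..n-1 into k folds; yield (train, test) index lists.

    Each fold serves as the test set exactly once, mirroring the slide's
    grid of pieces P1..P10 with one 'test' piece per row.
    """
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(kfold_indices(100, k=10))
print(len(splits))                            # 10 folds
print(len(splits[0][0]), len(splits[0][1]))   # 90 10
```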
How to Cross-Validate

Just Right

[Plot: cross-validated relative error (y axis, roughly 0.2 to 1.0) against size of tree (2, 8, 10, 13, 18, 21) and the corresponding complexity parameter cp (Inf, 0.059, 0.035, 0.0093, 0.0055, 0.0036).]
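One common way to pick the "just right" size from such a plot is the 1-SE rule used with rpart's plotcp output: take the smallest tree whose cross-validated error is within one standard error of the minimum. A sketch with hypothetical error values (not the actual numbers behind the plot):

```python
def pick_size(cv_results):
    """Smallest tree size whose cross-validated error is within one
    standard error of the minimum (the 1-SE rule).

    cv_results: list of (size, xerror, xstd) tuples.
    """
    best_err, best_std = min((e, s) for _, e, s in cv_results)
    threshold = best_err + best_std
    return min(size for size, err, _ in cv_results if err <= threshold)

# Hypothetical cross-validation results for the tree sizes on the plot
cv_results = [
    (2, 0.62, 0.02), (8, 0.38, 0.02), (10, 0.35, 0.02),
    (13, 0.33, 0.02), (18, 0.32, 0.02), (21, 0.33, 0.02),
]
print(pick_size(cv_results))  # 13: smallest size within 1 SE of the minimum
```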
CART in Practice
CART advantages
Unlike regression
CART Disadvantages
Uses of CART
Variable Selection
Case Study: Spam E-mail Detection
Compare CART with:
Neural Nets
MARS
Logistic Regression
Ordinary Least Squares
The Data

Class split: 8% / 92%.
57 variables created, mostly frequencies: frequency of particular words (including the company name), frequency of particular characters, etc.
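Frequency variables like these can be built roughly as follows. The exact definitions behind the 57 variables are not shown in the deck, so this is just one plausible construction (frequencies expressed as percentages):

```python
import re

def word_freq(text, word):
    """Frequency of `word` as a percentage of all words in the message."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words:
        return 0.0
    return 100.0 * words.count(word.lower()) / len(words)

def char_freq(text, ch):
    """Frequency of character `ch` as a percentage of all characters."""
    if not text:
        return 0.0
    return 100.0 * text.count(ch) / len(text)

msg = "Act now!!! Free offer: send $$$ to claim your free prize"
print(word_freq(msg, "free"))  # 20.0 (2 of 10 words)
print(char_freq(msg, "$") > 0)
```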
Methodology
Software
Un-pruned Tree
Pruning Back

[Plot: cross-validated relative error (0.2 to 1.0) against size of tree (up to 83 leaves: 10, 11, 13, 15, 17, 19, 20, 22, 24, 25, 30, 37, 52, 56, 66, 71, 83) and complexity parameter cp (Inf, 0.09, 0.043, 0.018, 0.011, 0.0096, 0.0061, 0.0049, 0.0033, 0.002, 0.0011, 2.4e-05).]
Pruned Tree #1

[The same cross-validated error plot, indicating the size of tree at which pruned tree #1 was cut.]
Pruned Tree #2

Suggests a rule: many $ signs, capital letters, and exclamation points, together with few instances of the company name (HP), indicate spam.

[Figure: the pruned tree. Splits include freq_DOLLARSIGN < 0.0555, freq_remove < 0.065, freq_EXCL < 0.5235, freq_hp >= 0.16, freq_george >= 0.14, freq_EXCL < 0.3765, avg.CAPS < 2.92, tot.CAPS < 83.5, freq_free < 0.77, and freq_remove < 0.025; each leaf is labeled 0 (non-spam) or 1 (spam) with class counts such as 1.061e+04/170 and 4/290.]
[Gains chart: percentage of spam captured against percentage of the total population (Perc.Total.Pop, 0.0 to 1.0), comparing the perfect model, the unpruned tree, pruned tree #1, and pruned tree #2.]
Other Models

GLM model: logistic regression run on 20 of the most powerful predictive variables.
Comparison of Techniques

[Gains chart: Perc.Spam against Perc.Total.Pop (0.0 to 1.0), comparing the perfect model, MARS, neural net, pruned tree #1, GLM, and ordinary regression.]
Goodnode: (freq_$ < .0565) & (freq_remove < .065) & (freq_! < .524)
Badnode: (freq_$ > .0565) & (freq_hp < .16) & (freq_! > .375)
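Rules like these translate directly into indicator variables that a GLM can use alongside the raw predictors, which is presumably what the "hybrid glm" in the next chart does. A sketch; the dictionary keys are renamed stand-ins for the slide's freq_$, freq_!, and freq_hp:

```python
def good_node(x):
    """Indicator for the tree's 'good' (non-spam) node, per the slide's rule."""
    return (x["freq_dollar"] < 0.0565 and
            x["freq_remove"] < 0.065 and
            x["freq_excl"] < 0.524)

def bad_node(x):
    """Indicator for the tree's 'bad' (spam) node, per the slide's rule."""
    return (x["freq_dollar"] > 0.0565 and
            x["freq_hp"] < 0.16 and
            x["freq_excl"] > 0.375)

# A hypothetical spammy message: many $ and !, no mention of HP
email = {"freq_dollar": 0.2, "freq_remove": 0.0, "freq_excl": 0.9, "freq_hp": 0.0}
print(good_node(email), bad_node(email))  # False True
```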
[Gains chart: Perc.Spam against Perc.Total.Pop (0.0 to 1.0), comparing the perfect model, neural net, decision tree #2, GLM, and hybrid GLM.]
Concluding Thoughts
More Philosophy
"Binary Trees give an interesting and often illuminating way of looking at the data in classification or regression problems. They should not be used to the exclusion of other methods. We do not claim that they are always better. They do add a flexible nonparametric tool to the data analyst's arsenal."
--Breiman, Friedman, Olshen, Stone