[Figure: analytics process diagram; recoverable stage labels: Modeling, Deploy Model, Monitor Performance]
From Building Better Models with JMP Pro, Grayson, Gardner and Stephens, 2015.
Data Preparation

Key Activities:
• Determine which data are needed
• Compile (or collect new) data
• Explore, examine and understand data
• Assess data quality
• Clean and transform data
• Define features
• Reduce dimensionality
• Create training, validation and test sets

Key Tools:
• SQL/Query
• Data table structuring - join, concatenate, update, stack, summarize,…
• Summary statistics and graphical displays, interactive tools and filtering
• Multivariate procedures (clustering, PCA,…)
• Transformations, creating derived variables
• Missing data utilities, outlier analysis, recoding, binning
• Creating holdout set(s)
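
The "create training, validation and test sets" activity can be sketched outside JMP as a stratified random split. This is a minimal illustration in plain Python, not the procedure used in the deck: the function name `stratified_split`, the 60/20/20 fractions, and the seed are assumptions, and the labels are placeholders that only mirror the 700 good / 300 bad class counts reported for the German Credit data.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=1):
    """Return (train, validation, test) row-index lists, stratified by label.

    The fractions are an assumed 60/20/20 split, not taken from the source.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train, valid, test = [], [], []
    for idx in by_class.values():
        rng.shuffle(idx)  # randomize row order within each class
        n_train = round(fractions[0] * len(idx))
        n_valid = round(fractions[1] * len(idx))
        train += idx[:n_train]
        valid += idx[n_train:n_train + n_valid]
        test += idx[n_train + n_valid:]  # remainder goes to the test set
    return train, valid, test

# Placeholder labels: only the class counts follow the deck.
labels = ["good"] * 700 + ["bad"] * 300
train, valid, test = stratified_split(labels)
print(len(train), len(valid), len(test))
```

Stratifying by the response keeps the good/bad ratio roughly the same in each set, which matters when one class is much rarer than the other.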
Modeling

Key Activities:
• Choose the appropriate modeling method or methods
• Fit one or more models
• Evaluate the performance of each model using validation statistics (misclassification, RMSE, Rsquare)
• Choose the best model or set of models to address the analytics problem (and ultimately the business problem)

Key Tools:
• Multiple Regression
• Logistic Regression
• Naïve Bayes
• kNN
• Classification and Regression Trees
• Bootstrap Forests and Boosted Trees
• Neural Networks
• Generalized Linear Models
• Survival Models
• Forecasting/Time Series
• **Create ensemble models
• Model Comparison
• Text Mining
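
The validation statistics named above (misclassification, RMSE, Rsquare) can be written out in a few lines. This is a sketch of the textbook definitions in plain Python, with hypothetical function names and made-up example values. The R-square shown is the usual one for a continuous response, which may differ from what a given tool reports for a categorical response.

```python
import math

def misclassification_rate(actual, predicted):
    """Share of validation rows whose predicted class differs from the actual class."""
    return sum(a != p for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error for a continuous response."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def r_square(actual, predicted):
    """1 - SS_residual / SS_total for a continuous response."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Made-up validation rows, for illustration only.
print(misclassification_rate(["good", "bad", "good", "good"],
                             ["good", "good", "good", "bad"]))
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(r_square([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

These statistics are computed on the validation set, not the training set, so that models are compared on data they were not fit to.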
The Data
• German Credit data set available at https://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data)
• Each applicant rated as either a “good credit” (700 cases) or a “bad credit” (300 cases)
The best model, from a profit perspective, is a Two Stage Forward Selection, with an average profit of 0.1315.
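
Ranking models "from a profit perspective" means scoring each model's decisions against a profit matrix rather than counting errors. The sketch below shows the arithmetic only. The profit values in `PROFIT`, the function name `average_profit`, and the example rows are all hypothetical; the deck does not state the profit matrix behind the 0.1315 figure.

```python
# Hypothetical profit matrix: (actual class, decision) -> profit per applicant.
# These values are assumptions for illustration, not taken from the source.
PROFIT = {
    ("good", "accept"): 0.35,   # loan repaid
    ("bad", "accept"): -1.00,   # loan defaults
    ("good", "reject"): 0.00,   # no loan made
    ("bad", "reject"): 0.00,    # no loan made
}

def average_profit(actual, decisions, profit=PROFIT):
    """Mean profit per applicant for one model's accept/reject decisions."""
    total = sum(profit[(a, d)] for a, d in zip(actual, decisions))
    return total / len(actual)

# Made-up validation rows and decisions from two hypothetical models.
actual = ["good", "good", "bad", "bad"]
model_a = ["accept", "reject", "accept", "reject"]
model_b = ["accept", "accept", "reject", "reject"]

for name, decisions in [("model_a", model_a), ("model_b", model_b)]:
    print(name, average_profit(actual, decisions))
```

Under an asymmetric matrix like this one, a model with a higher misclassification rate can still earn a higher average profit if it makes fewer of the expensive errors.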
[Figure: ROC curve; x-axis: 1-Specificity]
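
A curve plotted against 1-Specificity is traced by sweeping a cutoff over the model's predicted scores and recording sensitivity at each cutoff. The following plain-Python sketch shows that construction and a trapezoidal area under the curve. The function names and the four example rows are hypothetical, and it assumes both classes are present in the data.

```python
def roc_points(actual, scores, positive="good"):
    """Return (1 - specificity, sensitivity) pairs, one per distinct score cutoff."""
    n_pos = sum(a == positive for a in actual)
    n_neg = len(actual) - n_pos
    points = [(0.0, 0.0)]  # cutoff above every score: nothing classified positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(s >= cutoff and a == positive for a, s in zip(actual, scores))
        fp = sum(s >= cutoff and a != positive for a, s in zip(actual, scores))
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Made-up scores: predicted probability of "good" for four applicants.
actual = ["good", "bad", "good", "bad"]
scores = [0.9, 0.8, 0.7, 0.1]
points = roc_points(actual, scores)
print(points)
print(auc(points))
```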
[Figure: chart-type diagram; recoverable labels: Conceptual, Data-Driven, Exploratory]
Adapted from Good Charts by Scott Berinato, p. 76.
Modeling Objective: Develop a classification model to predict if an applicant is a good or bad credit risk.
Modeling Results: Developed a classification model to maximize net profits
Estimated average gain per loan made ~ $2236
Key Drivers: Co-Applicant for Loan, Owns Residence, Rents, Number of Existing Credits, and Interactions Between Many Factors