
Comptroller of the Currency

Administrator of National Banks

Managing Model Risk in Retail Scoring

Dennis Glennon

Credit Risk Analysis Division


Office of the Comptroller of the Currency

September 28, 2012

The opinions expressed in this paper are those of the authors and do not necessarily reflect those of
the Office of the Comptroller of the Currency. All errors are the responsibility of the authors.

Agenda

- Introduction to Model Risk
  - What is it?
  - Why is it relevant?
- Managing Model Risk
  - Overview of Sound Model Development and Validation Procedures
- Emerging Issues Related to Model Risk

2
Model Risk: What is it?

- Model Risk: the potential for adverse consequences from decisions based on incorrect or misused model outputs
  - Model errors that produce inaccurate outputs
  - Models used incorrectly or inappropriately (i.e., using a model outside the environment for which it was designed)
- Model risk emerges from the process used to develop models for measuring credit risk.
  - The process introduces a secondary loss exposure beyond that of credit risk alone
  - e.g., poor underwriting decisions based on erroneous models or overly broad interpretations of model results
3
Model Risk: What is it?
- Credit Risk: the risk to earnings or capital arising from an obligor's failure to meet the terms of any contract with the bank or otherwise to perform as agreed
- Model risk is a conceptually distinct exposure to loss.
- There are many reasons for poor model-based results, including:
  - Poor modeling (i.e., inadequate understanding of the business)
  - Poor model selection (i.e., overfitting)
  - Inadequate understanding of model use
  - Changing conditions in the market

4
Managing Model Risk

- The goal of model-risk analysis is to isolate the effects of a bank's choice of risk-management strategies from those associated with incorrect or misused model output.
- Model validation is an essential component of a sound model-risk management process.
  - Validate at time of model development/implementation
  - Ongoing monitoring
  - Re-validate

5
Model Risk

- Model validation can be costly.
- However, using unvalidated models to underwrite, price, and/or manage risk is potentially an unsafe and unsound practice.
- The best defense against model risk is the implementation of formal, prudent, and comprehensive model-validation procedures.

6
Model Risk: Sound Modeling Practices

- Sound modeling practices
  - In many cases, there are generally accepted methods of building and validating models.
  - These methods incorporate procedures developed in the finance, statistics, econometrics, and information-theory literature.
  - Although these methods are valid, they may not be appropriate in all applications.
    - A model selected for its ability to discriminate between high and low risk may perform poorly at predicting the likelihood of default.

7
Models as Decision Tools

- Two primary modeling objectives:
  - Classification: the model is used to rank credits by their expected relative performance
  - Prediction: the model is used to accurately predict the probability of the outcome
- Modelers typically have one of these objectives in mind when developing and validating their models.

8
Model Selection: Which model is better?

[Figure: side-by-side histograms of observed goods (y = 0) and bads (y = 1) by score quintile. Model 1 — bads per quintile 1, 3, 5, 7, 9 against goods 9, 7, 5, 3, 1, giving bad rates (#B / (#G + #B)) of 0.1, 0.3, 0.5, 0.7, 0.9. Model 2 — bads 1, 5, 4, 4, 11 against goods 11, 6, 5, 2, 1, giving bad rates 0.08, 0.45, 0.44, 0.67, 0.92.]

9
Models as Decision Tools

- A comparison of models: visual summary

[Figure: log(odds) versus score, two panels. Left, "Reliable, but not Accurate": at score 253 the development log-odds line implies odds of 33:1 (bad rate 3.0%), but the actual log-odds line implies odds of 12.2:1 (bad rate 7.6%). Right, "Reliable and Accurate": the actual and development log-odds lines coincide, both implying odds of 33:1 (bad rate 3.0%).]

10
Illustrative Example

[Figure: risk-rating model — ln(good/bad) by score band (644 to 753) for the development sample (K-S = 32.1) and the validation sample (K-S = 34.3). For reference, ln(20/1) = 3.0 corresponds to a bad rate of 5%, and ln(4/1) = 1.4 to a bad rate of 20%.]

11
Models as Decision Tools

- The model design should reflect how the model will be used.
- As such, the choices of:
  - sample design
  - modeling technique
  - validation procedures
  should reflect the intended purpose for which the model will ultimately be used.
- To effectively manage model risk, the right tools must be used.
12
Models as Decision Tools

- Models are developed for different purposes, i.e., classification or prediction. As such, the choices of:
  - sample design
  - modeling technique
  - validation procedures
  are driven by the intended purpose for which the model will ultimately be used.

13
Model Validation
- The classification objective is the weaker of the two conditions.
- There are well-developed methods outlined in the literature and accepted by the industry that are used to assess the validity of models developed under that objective.
- In practice, we see:
  - Development
    - KS / Gini used as the primary model-selection tool
    - These are evaluated on the development, hold-out, and out-of-time samples
  - Validation
    - KS / Gini
    - Stability tests (e.g., PSI, characteristic analysis, etc.)
    - Backtesting analysis
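The stability checks listed above can be illustrated with a short sketch. The PSI below follows the standard formula; the score-band counts are hypothetical, chosen only for demonstration:

```python
from math import log

def psi(expected_counts, actual_counts):
    """Population Stability Index across score bands:
    sum of (actual% - expected%) * ln(actual% / expected%)."""
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    index = 0.0
    for e_n, a_n in zip(expected_counts, actual_counts):
        e, a = e_n / e_total, a_n / a_total  # population share per band
        index += (a - e) * log(a / e)
    return index

# Hypothetical band counts: development sample vs. recent applicants
development = [100, 200, 300, 250, 150]
recent = [120, 180, 290, 260, 150]
print(round(psi(development, recent), 4))
```

A common industry rule of thumb reads a PSI below 0.10 as a stable population and above 0.25 as a material shift.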
14
Model Validation
- Almost all scoring models generate KS values that reject the null that the distribution of good accounts is equal to the distribution of bads.
- KS is also used to identify the specific model with the maximum separation across alternative models.
- In practice, however, the difference between the max KS and those of alternative models is never tested using statistical methods (although there are tests outlined in the literature, e.g., Krzanowski and Hand, 2011).
- More importantly, once a model is selected, few modelers apply a statistical test to determine whether the KS has changed significantly over time to conclude the model is no longer working as expected.
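The KS statistic discussed above can be computed directly from the two empirical score distributions. A minimal sketch (the scores below are invented for illustration):

```python
def ks_statistic(goods, bads):
    """Two-sample KS statistic: the maximum vertical distance between
    the empirical CDFs of good-account and bad-account scores."""
    goods, bads = sorted(goods), sorted(bads)
    max_gap = 0.0
    for s in goods + bads:  # empirical CDFs can only jump at observed scores
        cdf_g = sum(1 for g in goods if g <= s) / len(goods)
        cdf_b = sum(1 for b in bads if b <= s) / len(bads)
        max_gap = max(max_gap, abs(cdf_g - cdf_b))
    return max_gap

# Invented scores: bads concentrate at the low end of the score range
goods = [620, 640, 660, 680, 700, 720]
bads = [580, 600, 620, 660]
print(ks_statistic(goods, bads))
```

Testing whether this maximum separation differs significantly across candidate models, or has deteriorated over time, is precisely the step the slide notes is usually skipped.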

15
Model Validation
The tests that have been developed, however, tend to be sensitive to sample size. Given the size of development and validation samples, very small changes may be statistically significant.

- OPEN ISSUE 1: Are there tests banks can use to test for statistical significance that are not overly sensitive to sample size?

16
Model Validation
- Predictive models are developed under a model-accuracy objective.
- As a result, a goodness-of-fit test is required for model selection.
- Common performance measures used to evaluate predictive models:
  - Interval test
  - Chi-square test
  - Hosmer-Lemeshow (H-L) test
- Unfortunately, these goodness-of-fit tests assume defaults are independent events. If the events are dependent, the tests will reject the null too frequently.
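The H-L test named above can be sketched as a chi-square statistic summed over score segments (the segment counts below are hypothetical; under the independence assumption the statistic is compared with a chi-square critical value):

```python
def hosmer_lemeshow(accounts, defaults, predicted_pd):
    """H-L statistic: each segment contributes (O - E)^2 / (n * p * (1 - p)),
    where O is observed defaults and E = n * p is expected defaults."""
    stat = 0.0
    for n, o, p in zip(accounts, defaults, predicted_pd):
        expected = n * p
        stat += (o - expected) ** 2 / (expected * (1 - p))
    return stat

# Hypothetical three-segment example: accounts, observed defaults,
# and the model-predicted default probability per segment
print(hosmer_lemeshow([1000, 2000, 3000],
                      [120, 180, 150],
                      [0.10, 0.08, 0.05]))
```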
17
Model Validation
- The Vasicek Test is an alternative test of accuracy that allows for dependence.
- The Vasicek Test is designed to capture the effect of dependence on the size of the confidence bands.
- Formula used to derive the confidence bands:

  V_int = Φ( (Φ⁻¹(PD) + √ρ · Z.95) / √(1 − ρ) )

  where V_int is the width of the interval; Φ is the N(0,1) distribution function; Z.95 = 1.64; and ρ is the correlation.
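The band formula can be sketched with Python's standard library (Z.95 defaults to the slide's 1.64; the example table on the next slide appears to have been computed with 1.645):

```python
from math import sqrt
from statistics import NormalDist

def vasicek_upper_bound(pd, rho, z95=1.64):
    """Upper confidence bound on a segment's realized default rate under
    the one-factor Vasicek model:
    Phi((Phi^-1(PD) + sqrt(rho) * Z.95) / sqrt(1 - rho))."""
    phi = NormalDist()  # standard normal distribution
    return phi.cdf((phi.inv_cdf(pd) + sqrt(rho) * z95) / sqrt(1 - rho))

# Segment 9 of the example table: estimated PD 0.18798, rho = 0.05
print(vasicek_upper_bound(0.18798, 0.05, z95=1.645))
```

Higher ρ widens the band: for the same PD, the bound at ρ = 0.15 sits well above the bound at ρ = 0.015.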

18
Vasicek Test: An Example

Vasicek Test Analysis

Segment  Accounts  Estimated PD  Actual PD  Vasicek 95% CI upper bound     Upper Bound
                                            ρ = 0.15  ρ = 0.05  ρ = 0.015
      1      1000       0.00000    0.00200  0.000003   0.00000    0.00000      0.00005
      2      1000       0.00001    0.00000  0.000058   0.00004    0.00003      0.00024
      3      1000       0.00008    0.00000  0.000323   0.00023    0.00015      0.00062
      4      1000       0.00031    0.00100  0.001272   0.00087    0.00059      0.00141
      5      1000       0.00102    0.00400  0.003957   0.00265    0.00183      0.00299
      6      1000       0.00313    0.00800  0.011466   0.00760    0.00536      0.00659
      7      1000       0.01003    0.01900  0.033541   0.02230    0.01618      0.01620
      8      1000       0.03767    0.06300  0.107877   0.07392    0.05605      0.04948
      9      1000       0.18798    0.26700  0.393836   0.29771    0.24538      0.21220
     10      1000       0.75928    0.54900  0.927103   0.86425    0.81919      0.78578
19
Model Validation: Vasicek Test
- If ρ is too high, the bands are too wide: too many models would pass the test.
- ρ is not known and has to be estimated.
  - For point-in-time based models, ρ can be very small
  - For through-the-cycle based models, ρ can be large
- In practice, we often see models fail the interval/chi-square test but pass the Vasicek test (especially when samples are large).
- Open Issue 2: How do we resolve the inconsistency?
20
Sensitivity of Validation Test to Sample Size
- Accuracy tests tend to reject models that discriminate well
  - consistent with the expectations of the LOB
- Measurement can be so precise that even a small, non-relevant difference in point estimates can be considered statistically significant.

21
Illustrative Example

Default Rates

[Figure: actual vs. predicted default rates (%) across 20 score-range segments. Actual rates fall from 34.11 in segment 1 to 1.14 in segment 20; predicted rates fall from 34.56 to 1.43. The full values appear in the table on the next slide.]

22
Illustrative Example

                                 Default Rate (%)
Seg  Default  Non-Default   Total  Actual  Predicted  p-value   HL (c.v. - 5%)
  1     4027         7780   11807   34.11      34.56   0.3039    1.0572
  2     2992        10158   13150   22.75      22.55   0.5832    0.3011
  3     1847         9568   11415   16.18      16.59   0.2390    1.3867
  4     1184         7505    8689   13.63      13.27   0.3226    0.9787
  5      878         6795    7673   11.44      11.48   0.9125    0.0121
  6     1007         9223   10230    9.84       9.86   0.9459    0.0046
  7      598         5996    6594    9.07       8.65   0.2250    1.4722
  8      536         6512    7048    7.60       7.90   0.3506    0.8713
  9      474         5973    6447    7.35       7.16   0.5541    0.3500
 10      507         6913    7420    6.83       6.54   0.3124    1.0205
 11      459         6752    7211    6.37       5.97   0.1516    2.0568
 12      373         6150    6523    5.72       5.50   0.4357    0.6076
 13      380         6647    7027    5.41       5.04   0.1562    2.0109
 14      339         7214    7553    4.49       4.64   0.5354    0.3842
 15      355         7698    8053    4.41       4.14   0.2238    1.4799
 16      244         6584    6828    3.57       3.76   0.4094    0.6806
 17      239         6712    6951    3.44       3.38   0.7819    0.0767
 18      246         8145    8391    2.93       2.96   0.8712    0.0263
 19      217         9360    9577    2.27       2.46   0.2296    1.4432
 20      208        17978   18186    1.14       1.43   0.0010   10.8227

HL stat   27.0433
p-value    0.0782

23
Illustrative Example

p-values

[Figure: segment-level p-values by score range, comparing the n-sample and 3n-sample against the critical value (c-value).]

24
Interval Tests with Large Samples
Conclusion:
- Statistical difference: significant
- Economic difference: insignificant

Solutions?
- Reduce the number of observations using a sample: less powerful test
- Redefine the test
  - Interval test
  - Focus on capital

25
Interval Tests with Large Samples

[Figure: five confidence intervals, (1) through (5), plotted against an interval from −1% to +1% around 0.]

26
Interval Test
- Restate the null as an interval defined over an economically acceptable range.
  - If the CI(1−α) around the point estimate is within the interval, conclude no economically significant difference.
  - We may want to reformulate the interval test in terms of an acceptable economic bias in the calculation of regulatory capital.
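One way to operationalize this restated null (a sketch: the ±1% band and the segment-20 inputs come from the slides' illustration, while the helper name and the normal approximation are assumptions of mine):

```python
from math import sqrt

def no_economic_difference(n, defaults, predicted_pd,
                           econ_band=0.01, z=1.96):
    """Interval test over an economic range: True when the 95% CI for the
    actual default rate lies entirely inside predicted_pd +/- econ_band."""
    rate = defaults / n
    se = sqrt(rate * (1.0 - rate) / n)      # normal-approximation std. error
    lo, hi = rate - z * se, rate + z * se   # CI for the actual rate
    return predicted_pd - econ_band <= lo and hi <= predicted_pd + econ_band

# Segment 20 of the earlier table: statistically significant (p = 0.0010),
# yet the CI sits well inside a +/-1% economic band around the prediction
print(no_economic_difference(18186, 208, 0.0143))
```

Tightening the band (e.g., to ±0.1%) flips the conclusion, which is exactly the business judgment the open issue asks about.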

- Open Issue 3: How do we reconcile business and statistical significance?

27
Conclusion

- Active management of model risk
  - Sound model development, implementation, and use of models are vital elements, and
  - Rigorous model validation is critical to effective model-risk management.
- Model risk should be managed like other risks
  - Identify the source
  - Manage it properly
28