
Preparing for Basel II: Common Problems, Practical Solutions

Part 2: Modeling Strategies

by Jeffrey S. Morrison

Continuing the series dealing with requirements for Basel II and building PD and LGD models, this article offers some tips in model-building strategy so the analyst can get the most out of the data available. The focus is on selecting the right predictors, dummy variables, transformations, and interactions, evaluating influential outliers, and developing segmentation schemes.

Volumes of literature have been written on probability of default, loss given default, and the statistical techniques used to estimate them. There's a deficit of literature, however, on practical advice so the analyst can produce better models more quickly. For each topic there are a variety of approaches, many of which can work equally well. Regardless of whether the analyst is building a PD or LGD model, some practical strategies can provide a good jump-start in aligning your modeling process to the New Basel Capital Accord. In this article, we will refer to a hypothetical dataset to model the probability of default using such predictive attributes as LTV, payment history, bureau scores, and income. And although these techniques can be implemented in just about any statistical language, examples will be given using the SAS software language as well as S-PLUS.

Selecting the Right Predictors

Although good modeling data may be initially hard to find, once your bank has put together the informational infrastructure for the New Capital Accord, you might have more data than you know what to do with. In building a PD model using a regression technique, typically only 10 to 15 predictor variables are finally chosen. However, you may have 200 or more potentially predictive attributes coming from your loan accounting systems, not to mention outside vendors such as economic service providers. So how do you narrow down all that information? As indicated in the May 2003 issue of The RMA Journal, sound judgment, combined with knowledge of simple correlations in your data, is a good starting point. Luckily, there are some additional tools to make the model-building process a little less stressful.

© 2004 by RMA. Jeff Morrison is vice president, Credit Metrics-PRISM Team, at SunTrust Banks, Inc., Atlanta, Georgia.


The first is the automatic variable selection routines found in most statistical packages. Generally, they are called forward stepwise, backward stepwise, and best-fit procedures. Forward selection builds the model from the ground up, entering a new variable and eliminating any previous variables not found statistically significant according to a specified criterion. The backward stepwise procedure works just the opposite: it starts with all available predictors and eliminates them one by one based upon their level of significance. Best-fit procedures may take much more computer time because they may try a variety of different variables to maximize some measure of fit rather than the significance level of each variable. What's the advice? Experiment with them all, but don't be surprised if the backward stepwise procedure works a little better in the long run. However, a word of caution: only include variables in your model that make business sense, regardless of what comes out of any automatic procedure.

One major drawback of these procedures is that they can lead to variables in your final model that are too correlated with one another. For example, if local area employment and income are highly correlated with one another, the stepwise procedures may not be able to correctly isolate their individual influence. As a result, they could both end up in your model. This condition, called multicollinearity, may cause problems in the model-building process. One way of avoiding it is to perform a correlation analysis before you run any regression procedure. The advice: only include predictor variables in your regression that have a correlation with each other of less than .75 in absolute value. If you do run into variables that are too highly correlated with one another, choose the one that is most highly correlated with your dependent variable and drop the other.
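As a rough illustration, the sketch below shows both steps in SAS: a correlation screen followed by a backward stepwise logistic regression. The dataset name DEV, the 0/1 default flag DEFAULT, and predictor names such as EMPLOYMENT and MOB are hypothetical placeholders rather than fields from any particular system.

* Step 1: correlation screen - flag predictor pairs with |r| greater than .75;
PROC CORR DATA=DEV;
   VAR LTV INCOME EMPLOYMENT B_SCORE MOB;
RUN;

* Step 2: backward stepwise selection - start with all candidates, drop the least significant;
* DESCENDING models the probability that DEFAULT = 1;
PROC LOGISTIC DATA=DEV DESCENDING;
   MODEL DEFAULT = LTV INCOME EMPLOYMENT B_SCORE MOB
         / SELECTION=BACKWARD SLSTAY=0.05;
RUN;

The SLSTAY= value sets how significant a variable must remain in order to stay in the model; whatever the procedure keeps should still pass the business-sense test described above.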

A second tool is called variable clustering, a set of procedures that can help the analyst jump some of the hurdles associated with using an automatic selection routine. This procedure seeks to find groups or clusters of variables that look alike. The optimal predictive attributes would be those that are highly correlated with variables within the cluster, but correlated very little with variables in other clusters. SAS, for example, has an easy-to-use procedure called PROC VARCLUS that produces a table such as the one shown in Figure 1.

Figure 1
Variable Clustering

(A) Variable    (B) R-Squared, Own Cluster    (C) R-Squared, Next Closest    (D) 1-R-Squared Ratio

Cluster 1
VAR1            .334                          .544                           1.460
VAR2            .544                          .622                           1.206
VAR7            .322                          .242                           0.894

Cluster 2
VAR5            .988                          .433                           0.021
VAR8            .544                          .231                           0.592
VAR3            .764                          .434                           0.416
VAR4            .322                          .511                           1.386

Clustering routines do not offer any insight into the relationship of the individual variables and our dependent variable (PD or LGD). What they do offer, however, is a much easier way of looking at your predictive attributes. It may be that Cluster 1 represents delinquent payment behavior data. Cluster 2 could reflect geographic or economic data. Therefore, in a regression procedure you should be sure to test variables from each cluster. So how do you know which to pick? One way is to choose those variables with the lowest ratio values as indicated in Column D, a measure easily computed by most popular statistical software packages such as SAS. In our example, we might select VAR7 from Cluster 1 and VAR5 from Cluster 2 to try in a stepwise regression.
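A minimal PROC VARCLUS call of the kind that produces Figure 1 might look like the sketch below; the dataset name is a placeholder and the variable list simply mirrors Figure 1. The procedure's output reports, for each variable, the R-squared with its own cluster, the R-squared with the next closest cluster, and the 1-R-squared ratio shown in Column D.

* Group candidate predictors into clusters of similar variables;
PROC VARCLUS DATA=DEV;
   VAR VAR1 VAR2 VAR3 VAR4 VAR5 VAR7 VAR8;
RUN;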


Dummy Variables

Much of the information you might have on an account is in the form of a categorical variable. These are variables such as product type, collateral code, or the state in which the customer or collateral resides. For example, let's say you have a product type code stored as an integer ranging from 1 to 4. If the variable was entered into a regression model without any changes, then its coefficient's meaning might not make much sense, let alone be predictive. However, if we recoded it as follows, then the regression could pick up the differences each product category makes in the PD or LGD, all other things remaining equal:

IF PRODUCT_CODE=1 THEN DUMMY_1=1; ELSE DUMMY_1=0;
IF PRODUCT_CODE=2 THEN DUMMY_2=1; ELSE DUMMY_2=0;
IF PRODUCT_CODE=3 THEN DUMMY_3=1; ELSE DUMMY_3=0;
IF PRODUCT_CODE=4 THEN DUMMY_4=1; ELSE DUMMY_4=0;

What you now have are four product code dummy variables that will allow the model to estimate the default impact among different product types. By convention, one of the variables has to be left out of the regression model or an error will result. Therefore, you would include dummy_1, dummy_2, and dummy_3 in your regression model rather than a single variable representing all values of the product code.
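For a PD model fit with logistic regression, the MODEL statement might then look something like the sketch below, with DUMMY_4 omitted as the reference category; the dataset name and the other predictors are placeholders.

PROC LOGISTIC DATA=DEV DESCENDING;
   * DUMMY_4 is left out as the reference product type;
   MODEL DEFAULT = DUMMY_1 DUMMY_2 DUMMY_3 LTV B_SCORE INCOME;
RUN;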

Variable Binning

Building on the dummy variable concept, a procedure called binning sometimes helps the analyst increase the model's predictive power even further. Although there are a variety of different binning approaches available, all leverage off the idea that breaking down a variable into intervals can provide additional information. When dealing with a variable that has continuous values, such as a bureau score, a two-step process can be implemented. First we do something called discretizing, where the original value is changed into ranges in order to smooth the data, creating a small number of discrete values.

An example of this process is shown in Figure 2, where the bureau score was recoded into having only five values. Now that you have regrouped the data into five range values, simply create dummy variables from them as shown in Figure 3. Note that the last grouping variable is not created because, by convention, we would automatically leave it out of the model. The reason this procedure sometimes works to increase the accuracy of the model is that it allows the model to be more flexible in estimating the relationship between the bureau score, for example, and default when the relationship may not be strictly linear. This is a great technique to try on other variables such as LTV, income, or months on books.

Figure 2
Step 1: Discretizing Your Data

Grouping Range    SAS Code
455-569           IF B_SCORE > 455 AND B_SCORE <= 569 THEN B_SCORE=569;
570-651           IF B_SCORE > 569 AND B_SCORE <= 651.5 THEN B_SCORE=651;
652-695           IF B_SCORE > 651 AND B_SCORE <= 695 THEN B_SCORE=695;
696-764           IF B_SCORE > 695 AND B_SCORE <= 764 THEN B_SCORE=764;
765-850           IF B_SCORE > 764 AND B_SCORE <= 850 THEN B_SCORE=849;

Figure 3
Step 2: Dummy Variables (Binning)

B_SCORE_1 = 0;
IF B_SCORE=569 THEN B_SCORE_1 = 1;
B_SCORE_2 = 0;
IF B_SCORE=651 THEN B_SCORE_2 = 1;
B_SCORE_3 = 0;
IF B_SCORE=695 THEN B_SCORE_3 = 1;
B_SCORE_4 = 0;
IF B_SCORE=764 THEN B_SCORE_4 = 1;


Transformations

Many times, simple transformations of the data end up making the model more predictive. If you are working with a PD model, one thing you could do as part of the preliminary analysis is to run a logistic regression using only a single predictor variable and note the log likelihood measure as shown in Figure 4. This measure is automatically produced by most statistical software packages. If you do the same for various transformations, such as the square, the square root, and the log of the variable, pick the transformation where the log likelihood measure is closest to 0. Include this winning transformation in your full regression model along with the other predictors. However, there is no guarantee that such a transformation will improve the overall fit of the model once the other predictor variables have been added. Remember, modeling is as much of an art as it is a science!

Figure 4
Transformation Analysis (Logistic)

Obs    _Status_       _Lnlike_    Name      T_Type    Recomm
1      0 Converged    -423.744    INCOME    XSQ       <-------*
2      0 Converged    -423.744    INCOME    QUAD
3      0 Converged    -434.979    INCOME    NONE
4      0 Converged    -443.669    INCOME    SQRT
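One way to produce a comparison like Figure 4 is to fit a one-variable logistic regression for each candidate transformation and collect the log likelihood (_LNLIKE_) from each OUTEST= dataset. The sketch below is illustrative only; the dataset and variable names are placeholders, and the same pattern would be repeated for the untransformed variable and the remaining transformations.

* Create transformed versions of the candidate predictor;
DATA TRANS;
   SET DEV;
   INCOME_XSQ  = INCOME*INCOME;
   INCOME_SQRT = SQRT(INCOME);
   INCOME_LOG  = LOG(INCOME + 1);  * shift by 1 to avoid taking the log of zero;
RUN;

* Fit a single-predictor model and keep its log likelihood in an output dataset;
PROC LOGISTIC DATA=TRANS DESCENDING OUTEST=FIT_XSQ;
   MODEL DEFAULT = INCOME_XSQ;
RUN;

* Repeat for INCOME, INCOME_SQRT, and INCOME_LOG, then compare _LNLIKE_ across the OUTEST datasets;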

Interactions

Additional variables can be created reflecting the interaction of two variables together, and these may sometimes end up increasing your model's accuracy. Usually the creation of these variables comes from prior knowledge as to what should influence default probabilities or LGD. To create such a variable, simply multiply together the variables you wish to interact and include the new variable in the regression.
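For example, an interaction between LTV and the bureau score can be built in a short data step (the dataset names are placeholders) and then entered in the MODEL statement like any other predictor.

DATA DEV2;
   SET DEV;
   * Interaction of loan-to-value and bureau score;
   LTV_X_SCORE = LTV * B_SCORE;
RUN;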

Influential Outliers

Observations called influence points can impact the parameter estimates more than others. During the model development process, it is always good to produce an influence plot so you can decide what corrective action (if any) is necessary. Figure 5 is a bubble chart showing that observation #100 exerts significant influence on the size of your regression estimates. This type of analysis usually uses a shortcut approach to see how the predicted probabilities (and estimated parameters) would change if each observation were removed. The rule of thumb is to examine any observations that have a change in deviance greater than 4.0, as represented by the dotted horizontal line in Figure 5. Some software packages call these measures by different names, but most produce some option for looking at influential observations.

Figure 5
Influence Plot: Logistic Influence Results. Change in deviance is plotted against the estimated probability for each observation; the dotted horizontal line marks a change in deviance of 4.0.

In reality, there may always be some observations that show up on a graph of this type, but the worst ones should be reviewed and a decision made as to whether they should be deleted. With respect to a model's implementation code, it is always a good idea to limit extreme predictor values to certain maximums or minimums. For example, you might want to restrict an LTV value to be between 0 and 100 in the coding algorithm before coefficients or weights are used to predict the probability of default.
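In SAS, one way to obtain diagnostics of this kind is the INFLUENCE option of PROC LOGISTIC, which reports the change in deviance (DIFDEV) for each observation; capping extreme values is then a simple guard in the scoring data step. Both are sketched below with placeholder dataset names.

* Influence diagnostics, including the change in deviance (DIFDEV), for each observation;
PROC LOGISTIC DATA=DEV DESCENDING;
   MODEL DEFAULT = LTV B_SCORE INCOME / INFLUENCE;
RUN;

* In the implementation code, keep extreme predictor values within sensible bounds;
DATA SCORED;
   SET NEWLOANS;
   IF LTV > 100 THEN LTV = 100;
   IF LTV < 0 THEN LTV = 0;
RUN;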


Segmentation

Sometimes a better overall PD or LGD model can be built if a segmentation analysis is performed. Segmentation in this context refers to the development of more than one PD or LGD model for better overall accuracy. The effectiveness of devoting the time needed to build additional models depends on the predictive information available and whether or not the segmented populations are really that different from one another. Sometimes the segmentation groups are obvious, such as modeling consumer loans apart from loans to small businesses. Sometimes the segmentation is not so obvious. For example, you could build a PD model for loans with high, medium, or low outstanding balances, or separate models for accounts with different levels of income or LTV.

So how do you know how to define the high, medium, or low segmentation breaks? One common approach is something called CART, which stands for Classification and Regression Trees, a method quite different from regression analysis. It uses an approach referred to as recursive partitioning to determine breaks in your data. It does this by first finding the best binary split in the data, followed by the next best binary split, and so forth.

CART is an exploratory analysis tool that attempts to describe the structure of your data in a tree-like fashion. S-PLUS, one of the more popular CART vendors, uses deviance to determine the tree structure. As in regression, CART relates a single dependent variable (in our case, default or LGD) to a set of predictors. An example of this tree-like structure is shown in Figure 6. CART first splits accounts into those with LTVs below 54.5% and those above. Therefore, you could design a PD model for accounts with LTVs below this cutoff and one for accounts above it. If you had even more time on your hands, you might want to consider using score and income to further define your segmentation schemes.

Figure 6
Typical CART Segmentation Analysis in S-PLUS Software. The tree splits first on LTV < 0.545; further splits on Score < 651.5, Income < 71,000, and Score < 719.5 produce five terminal nodes, Node 1 through Node 5.

Sometimes it is possible to use CART analysis directly in a logistic regression model. This is a much quicker process than building and implementing separate regression models for each segment. So how can we integrate the CART methodology into a single PD model? The answer is to construct a variable that reflects the tree structure for the branches in the tree. Figure 6 shows that the tree structure produces five nodes. A node represents the end of the splitting logic and a segmentation branch of the tree. Therefore, in order to create a CART segmentation variable for our regression model, we would code something like this:

IF LTV<.545 AND SCORE<651.5 THEN NODE=1;
IF LTV<.545 AND SCORE>=651.5 THEN NODE=2;
Etc.

The same type of coding would be done for nodes 3-5. To introduce this information content into the logistic regression, we would create dummy variables for each node and use them as predictors in our PD model.
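A sketch of that last step, assuming the NODE variable has already been assigned on the development dataset by the logic above; the dataset names are placeholders, and Node 5 is treated here as the reference category left out of the model.

DATA SEG;
   SET DEV;
   * 1/0 dummy variables for the CART nodes; NODE_5 is omitted as the reference;
   NODE_1 = (NODE = 1);
   NODE_2 = (NODE = 2);
   NODE_3 = (NODE = 3);
   NODE_4 = (NODE = 4);
RUN;

PROC LOGISTIC DATA=SEG DESCENDING;
   * The node dummies would typically enter alongside the other chosen predictors;
   MODEL DEFAULT = NODE_1 NODE_2 NODE_3 NODE_4;
RUN;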

Summary

Getting the most out of the data is one of the biggest challenges facing the model builder. Anyone can build a model. However, the challenge is building the most predictive model possible. This article has offered some tips on building a better mousetrap and how to go about it in a way that is based on sound statistical approaches that should satisfy most regulators. On the other hand, it must be remembered that building PD and LGD models is more an art than a science. Today's model builders come from a variety of disciplines, have different academic training and experience, and are accustomed to using certain statistical tools. Regardless of these differences, there is simply no substitute for knowing your data. Understanding how it was developed, what values are reasonable, and how the business works is essential in the model development process.

Contact: Jeff.Morrison@suntrust.com.

References
1. Jeffrey S. Morrison, "Preparing for Modeling Requirements in Basel II, Part 1: Model Development," The RMA Journal, May 2003.
2. Logistic Regression, John Wiley & Sons, Inc., 1989.
3. Bryan D. Nelson, "Variable Reduction for Modeling Using PROC VARCLUS," Paper 261-26, SAS user-group publication.
