Data Mining Techniques &
Its Applications in Insurance
Society of Actuaries
San Francisco Spring Meeting
June 24 - 26, 2002
Lijia Guo, PhD, ASA, MAAA
University of Central Florida
Session 11L
SOA San Francisco Spring Meeting
June 24-26, 2002
Slide 2
Learning Objectives
Understanding a Data Mining Process
Gaining insight into the actuarial
applications of data mining techniques
Exploring the prospects for applying data
mining techniques in your own practice
Slide 3
Agenda
Introduction
Data Mining Methods
Actuarial Applications
Conclusions & Questions
Slide 4
Introduction
Changes in Information Technology
Availability of large quantities of insurance
data
Mind your business by mining your data
Slide 5
What is Data Mining?
An information discovery process.
Prediction
-- Finding unknown values/relationships/patterns from a
large known database
Description
-- Interpretation of a large database
Making crucial business decisions - turn the
newfound knowledge into actionable results
Slide 6
Why Use Data Mining?
Product development
Marketing
Analysis of Claims Distribution
Healthcare
ALM
Fraud detection
Solvency analysis
Slide 7
Data Mining Methods
Classification
Regression
Clustering
Summarization
Dependency modeling
Deviation Detection
Slide 8
Data Mining Algorithms
Decision Trees (Breiman et al., 1984)
Logistic regression (Hosmer & Lemeshow,
1989)
Neural Networks (Bishop, 1995; Ripley, 1996)
Fuzzy Logic
Genetic Algorithms (Goldberg, 1989)
Bayesian analysis (Cheeseman et al., 1988)
Hybrid algorithms
Slide 9
Data Mining Algorithms
-- Decision Trees
What are decision trees
How decision trees work
Choosing variables
Grouping
Creating the leaf nodes of the tree
Strengths and weaknesses
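The "choosing variables" and "grouping" steps above can be sketched as follows. This is a minimal illustration, assuming Gini impurity as the splitting criterion (one common choice, not necessarily the one used in the talk); the function names and the toy records are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(rows, variable, target):
    """Weighted Gini impurity after grouping rows by one categorical variable."""
    groups = {}
    for row in rows:
        groups.setdefault(row[variable], []).append(row[target])
    n = len(rows)
    return sum(len(g) / n * gini(g) for g in groups.values())


def best_split(rows, variables, target):
    """Choose the variable whose grouping leaves the least impurity."""
    return min(variables, key=lambda v: split_impurity(rows, v, target))


# Hypothetical toy records, invented for illustration only.
rows = [
    {"gender": "F", "union": "yes", "died": 0},
    {"gender": "F", "union": "no", "died": 0},
    {"gender": "M", "union": "yes", "died": 1},
    {"gender": "M", "union": "no", "died": 1},
]
chosen = best_split(rows, ["gender", "union"], "died")
print(chosen)
```

A full tree repeats this choice inside each resulting group until a stopping rule is met; the groups left at that point are the leaf nodes.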
Slide 10
Data Mining Algorithms
-- Neural Networks
What are Neural Networks
How Neural Networks work
Processing elements
Training
Predicting
Strengths and weaknesses
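The processing element, training, and predicting steps above can be sketched with a single unit. This is a minimal illustration, assuming a sigmoid activation and plain gradient descent on the log-loss; the learning rate, epoch count, and toy data are hypothetical and not taken from the talk.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, inputs):
    """One processing element: weighted sum of the inputs passed through a sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train(samples, epochs=2000, rate=0.5):
    """Adjust weights and bias by gradient descent, one sample at a time."""
    n_inputs = len(samples[0][0])
    weights, bias = [0.0] * n_inputs, 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = predict(weights, bias, inputs) - target
            weights = [w - rate * error * x for w, x in zip(weights, inputs)]
            bias -= rate * error
    return weights, bias


# Hypothetical training data: the logical OR of two binary inputs.
samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train(samples)
outputs = [round(predict(weights, bias, inputs)) for inputs, _ in samples]
print(outputs)
```

A network chains many such elements in layers; the training loop is the same idea applied to every weight.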
Slide 11
Data Mining Algorithms
-- Hybrid Algorithms
Problems with standard algorithms
Advanced algorithms
Discovery-driven approaches
Mixture of algorithms
Slide 12
Data Mining:
Knowledge Discovery Process
Data Acquisition
Data integration
Data exploration
Model building
Understanding your model
Post-mining analysis
Slide 13
Data Mining Process: Data Acquisition
Data acquisition
Getting your data
Data qualification issues
Data quality issues
Data derivation
Defining a study
Basic Risk Characteristics
Slide 14
Data Mining Process: Data Acquisition
-- Case Study
SOA database for RP-2000 Mortality Tables
10,957,103 exposed life-years
Subset of the database that includes all lives
above age 70 (3,769,956 exposed life-years, 217,490 deaths)
Risk groups
Age, gender, participation status, union, pay type,
collar type, and annuity amount, etc.
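The figures quoted for the above-70 subset imply a crude death rate, which is a useful sanity check before any modeling. The sketch below assumes "exp" means exposed life-years and "death" means the number of deaths; the variable names are illustrative.

```python
# Figures quoted on the slide.
total_exposure = 10_957_103      # exposed life-years, whole database
exposed_life_years = 3_769_956   # exposed life-years above age 70
deaths = 217_490                 # deaths above age 70

# Crude (central) death rate: deaths per exposed life-year.
crude_death_rate = deaths / exposed_life_years

# Share of the database's exposure that lies above age 70.
share_of_exposure = exposed_life_years / total_exposure

print(f"crude death rate above 70: {crude_death_rate:.4f}")
print(f"share of total exposure:   {share_of_exposure:.3f}")
```

Under that reading, the subset covers roughly a third of the total exposure, with about 58 deaths per 1,000 life-years.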
Slide 15
Data Mining Process: Data Acquisition
-- Case Study
Existing studies on advanced-age mortality
Smooth extension of the patterns
Families of curves - Gompertz law, etc.
All these approaches aim at explaining the age
pattern of mortality.
Mortality distribution varies among seniors
with different backgrounds
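For reference, the Gompertz law named above models the force of mortality as mu(x) = B * c^x, so its logarithm is a straight line in age and depends on nothing else. The sketch below illustrates that property; the parameter values B and c are hypothetical, chosen only to make the example run.

```python
import math


def gompertz_hazard(age, B, c):
    """Gompertz law: the force of mortality grows geometrically with age."""
    return B * c ** age


# Hypothetical parameters, for illustration only.
B, c = 5e-5, 1.10

ages = [70, 75, 80, 85]
log_hazard = [math.log(gompertz_hazard(age, B, c)) for age in ages]

# Under Gompertz, log-hazard rises by the same amount over every 5-year step.
steps = [later - earlier for earlier, later in zip(log_hazard, log_hazard[1:])]
print(steps)
```

Because age is the only input, a curve of this family cannot by itself distinguish seniors with different backgrounds, which is the gap the case study addresses.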
Slide 16
Data Mining Process: Data Integration
To identify the factors that influence
mortality
To study the interaction of the risk factors
To gain the perspective on the importance
of these factors
Slide 17
Slide 18
Data Mining Process: Data Integration
-- Case Study
Main effects exist for all six variables
considered
The degrees of the effects of the risk factors
differ.
The interaction of these factors
The importance of the factors
Slide 19
Data Mining Process: Data exploration
Decision tree algorithm
Analyze the influence and the importance of
the mortality risk factors
Observations are grouped into several segments
Algorithm - SAS/Enterprise Miner Version
4.2 (2001).
Further study of the interaction and the
importance of the risk factors
Slide 20
Data Mining Process: Data Integration
-- Case Study
Variable Importance Measure
Variable               Importance
Participation Status   1.00
Gender                 0.75
Annuity size           0.43
Pay Type               0.21
Union                  0.18
Collar                 0.00
Slide 21
Data Mining Process: Data exploration
Slide 22
Data Mining Process: Data exploration
Slide 23
Data Mining Process: Model building
Six risk groups:
Employees
Beneficiaries
Combined
Disabled
Male Retirees
Female Retirees
Logistic regression method
Slide 24
Data Mining Process: Model Building -- Case Study: Female Retiree
Slide 25
Data Mining Process: Model Building
-- Case Study: Female Retiree Group
Collar and Pay Type are two important
variables
An interaction between Collar and Pay
Type exists
Neither annuity size nor union is
picked up by the tree algorithm
Slide 26
Data Mining Process: Model Building
-- Case Study: Female Retiree Group
R-square for the regression is 0.95
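\[
\log\left(\frac{p}{1-p}\right) = -17.97 + 0.26\,x - 0.00087\,x^{2} + C + PT + PTC
\]
\[
C = \begin{cases} 0.0047 & \text{mixed collar} \\ 0 & \text{blue collar} \\ 0 & \text{white collar} \end{cases}
\qquad
PT = \begin{cases} 0 & \text{salaried pay type} \\ 0.051 & \text{hourly pay type} \\ 0.033 & \text{combined pay type} \end{cases}
\]
Operator signs are only partly legible in the source: the signs on the constant and age terms are the ones that keep p between 0 and 1 and rising with age above 70, and the remaining terms are shown with the "+" that survives. A further coefficient, 0.046, appears in the source equation, but the term it multiplies is not recoverable.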
Where p is the mortality rate, x is the age
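To see what the female retiree model implies, the logit can be inverted to give p directly. The sketch below is hedged on two assumptions: the age terms carry the signs -17.97 + 0.26 x - 0.00087 x^2 (the only sign pattern that yields plausible, rising mortality above age 70), and the collar and pay-type terms are left out, exposed only through a hypothetical `offset` argument.

```python
import math


def female_retiree_mortality(age, offset=0.0):
    """Mortality rate p from the logit model, using the age terms only.

    Assumed signs: -17.97 + 0.26*x - 0.00087*x**2.
    `offset` is a hypothetical hook for the collar / pay-type terms.
    """
    logit = -17.97 + 0.26 * age - 0.00087 * age ** 2 + offset
    return 1.0 / (1.0 + math.exp(-logit))


rates = {age: female_retiree_mortality(age) for age in (70, 80, 90)}
for age, p in rates.items():
    print(age, round(p, 4))
```

Under these assumptions the baseline rate is a little under 2% at age 70 and rises steadily through age 90.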
Slide 27
Data Mining Process: Model Building -- Case Study: Female Retiree Group
Slide 28
Data Mining Process: Model Building
-- Case Study: Male Retiree Group
R-square for the regression is 0.92
Where p is the mortality rate, x is the age
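\[
\log\left(\frac{p}{1-p}\right) = -14.57 + 0.20\,x - 0.00055\,x^{2} + S + U + SU
\]
\[
S = \begin{cases} 0.0074 & \text{small annuity} \\ 0.060 & \text{medium annuity} \\ 0.044 & \text{large annuity} \end{cases}
\qquad
U = \begin{cases} 0.040 & \text{combined} \\ 0.14 & \text{non-union member} \\ 0 & \text{union member} \end{cases}
\]
Operator signs are only partly legible in the source: the signs on the constant and age terms are the ones that keep p between 0 and 1 and rising with age above 70, and the remaining terms are shown with the "+" that survives.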
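One consequence of the quadratic age term in the male retiree model can be worked out directly. The derivation below is a sketch that assumes the age terms read -14.57 + 0.20 x - 0.00055 x^2, and it ignores the annuity-size and union terms, which do not depend on age.

```latex
\[
\frac{d}{dx}\,\log\left(\frac{p}{1-p}\right)
  = 0.20 - 2(0.00055)\,x
  = 0.20 - 0.0011\,x
\]
\[
x = 70:\; 0.20 - 0.077 = 0.123
\qquad
x = 85:\; 0.20 - 0.0935 = 0.1065
\]
```

The slope of the log-odds therefore falls as age increases, whereas a Gompertz-type curve would keep the slope of log-mortality constant across ages.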
Slide 29
Data Mining Process: SEMMA
Slide 30
Data Mining Process: Model Building -- Case Study: Male Retiree
Slide 31
Data Mining Process: Post-mining Analysis -- Case Study
Slide 32
Data Mining Process: Understanding
your model -- Case Study
The male retiree mortality model and the female
retiree mortality model depend on different
variables
Mortality of the beneficiaries is determined by
gender, annuity size, pay type, and their
interactions
The gender factor plays a much-reduced role
in the beneficiaries' mortality model
Slide 33
Data Mining Process: Post-mining
Analysis -- Case Study
Limited results on the mortality distribution for
ages above 95
As female demographics have changed over the past
three decades, variables such as annuity size and
union will play a more important role in
determining female mortality
Other risk factors such as education, lifestyle,
smoking/non-smoking, etc.
Slide 34
Data Mining Process: Summary
-- Case Study
Non-Gompertz (linear growth) between ages
70 and 85
Selection of the risk factors may influence
the quality of the mortality model
Mortality models vary with the most
important risk factor (participation
status, in this study) among all the
variables
Slide 35
Data Mining Process:
-- Case Study in Claim Analysis
Basic risk characteristics
Top-down identification
Underlying statistical properties
Domain-specific constraints
Slide 36
Data Mining Process:
-- Case Study in ALM
Decision tree and DNF learning
Generative stochastic modeling
Probabilistic networks
Probabilistic Rules
Hidden Markov model
Slide 37
Data Mining Process:
-- Applications in Healthcare
More productive managed care program
Pricing
Individual health insurance market
Recovery & prevention of fraudulent claims
Prescription Drugs cost management
Slide 38
Quiz on Data Mining
What is Data Mining?
What can data mining do?
What are data mining techniques?
What are the applications of data mining?
How can you practice data mining?
Slide 39
Summary
Overview of data mining techniques
Its application to actuarial practice
Future developments
Potential contribution to your area