Professional Documents
Culture Documents
Agenda
1.
2.
3.
4.
5.
6.
7.
8.
2
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Introductions
3
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Introduction / Objective
1. What is Predictive Modeling?
2. Types of predictive models.
3. Data and Data Preparation.
4. Applications case studies.
4
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Predictive Modeling:
A Review of the Basics
5
SCIOinspire Corp Proprietary & confidential. Copyright 2008
6
SCIOinspire Corp Proprietary & confidential. Copyright 2008
P o p u la tio n R is k R a n k in g
E v e n t fre q u e n c y (p e rc e n t)
80
60
40
20
0
0 .2 %
0 .7 %
1 .3 %
4%
14%
25%
C u m u la tiv e T o ta l P o p u la tio n
7
SCIOinspire Corp Proprietary & confidential. Copyright 2008
8
SCIOinspire Corp Proprietary & confidential. Copyright 2008
9
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Identification how?
10
SCIOinspire Corp Proprietary & confidential. Copyright 2008
11
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Diagnosis type
Reliability
Practicality
Physician Referral
High
Lab tests
High
Low
Low
Claims
Medium
High
Prescription
Drugs
Medium
High
Self-reported
Low/medium
Low
Medical and Drug Claims are often the most practical method of identifying
candidates for predictive modeling.
12
SCIOinspire Corp Proprietary & confidential. Copyright 2008
ICD-9-CM CODE
DESCRIPTION
DIABETES
250.xx
Diabetes mellitus
357.2
Polyneuropathy in diabetes
362.0, 362.0x
Diabetic retinopathy
366.41
Diabetic cataract
648.00-648.04
Less severe
13
SCIOinspire Corp Proprietary & confidential. Copyright 2008
CODES
CODE TYPE
DESCRIPTION - ADDITIONAL
HCPCS
J1815
HCPCS
67227
CPT4
DIABETES;
G0108,
G0109
individual or group
CPT4
996.57
ICD-9-CM
V45.85
ICD-9-CM
V53.91
ICD-9-CM
V65.46
ICD-9-CM
14
SCIOinspire Corp Proprietary & confidential. Copyright 2008
More Severe
Insulin**
OralAntiDiabetics
2720*
2723*
2725*
2728*
2730*
2740*
2750*
2760*
2799*
Sulfonylureas**
Antidiabetic - Amino Acid Derivatives**
Biguanides**
Meglitinide Analogues**
Diabetic Other**
ReductaseInhibitors**
Alpha-Glucosidase Inhibitors**
Insulin Sensitizing Agents**
Antiadiabetic Combinations**
15
SCIOinspire Corp Proprietary & confidential. Copyright 2008
16
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Dx Group
402.x
403.1
404.x
Hypertensive heart/renal
disease, with heart failure
415.x
425.X
429.x
Cardiomyopathy/ myocarditis
428.X
Heart failure
Construction of a model*
* From Ian Duncan: Managing and Evaluating Care Management Interventions (Actex, 2008)
Condition Category
Diabetes with No or
Unspecified Complications
0.0
2.1
Hypertension
0.0
1.5
Drug/Alcohol Dependence
0.6
Age-Sex
0.4
4.6
18
Notes
Trumped by Diabetes
with Renal
Manifestation
Trumped by CHF
Construction of a model
Grouper/Risk-adjustment theory is that there is a high correlation
between risk scores and actual dollars (resources used).
The Society of Actuaries has published three studies that test this
correlation. They are available from the SOA and are well worth reading.
(See bibliography.) They explain some of the theory of risk-adjusters and
their evaluation, as well as showing the correlation between $s and Risk
Scores for a number of commercial models.
Note 1: the SOA tests both Concurrent (retrospective) and Prospective
models. Concurrent model correlations tend to be higher.
Note 2: there are some issues with models that you should be aware of:
They tend to be less accurate at the extremes (members with high or
low risk scores);
We have observed an inverse correlation between risk-score and $s
across a wide range of members.
19
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Look-back
Lab
Prescription
Clean Period
Hospital
Office
Office
Admission
Visit
Visit
Office
Visit
20
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Broad:
Rx:
4.7%
Rx
30.8%
6.3%
6.6%
21
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Narrow
+ Broad
+ Rx
TOTAL
Year 2
Year 1
Narrow
+ Broad
+ Rx
Not Identified
TOTAL
75.9%
85.5%
24.1%
100.0%
14.5%
100.0%
92.6%
7.4%
100.0%
22
SCIOinspire Corp Proprietary & confidential. Copyright 2008
100.0%
Reimbursement
23
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Condition(s)
CHF
# members
Score
Risk Total
Expected
Cost
Actual Cost
19.9
39.8
43,780
$ 50,000
532
Cancer 1
20
8.7
174.2
191,620
$ 150,000
796
10
16.0
159.7
175,670
$ 160,000
531
15
9.0
135.3
148,830
$ 170,000
1221
4.8
28.8
31,680
$ 50,000
710
10
11.1
110.9
121,990
$ 125,000
882
Diabetes
3.7
25.7
28,270
$ 28,000
967
Cardiac
6.1
24.5
26,950
$ 30,000
881
Asthma
3.0
24.1
26,510
$ 40,000
723.0
795,300
$ 803,000
82
24
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Program Evaluation
25
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Equals:
Reduced Cost PMPY
Multiplied by: Actual Member Years in Measurement Period
Estimated Savings
$420
20,000
$8,400,000
26
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Actuarial, Underwriting
27
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Provider Profiling
Profiling of provider
Efficiency Evaluation
28
SCIOinspire Corp Proprietary & confidential. Copyright 2008
29
SCIOinspire Corp Proprietary & confidential. Copyright 2008
30
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Predictive
Modeling
Tools
Statistical
Models
Artificial
Intelligence
31
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Predictive
Modeling
Tools
Statistical
Models
Artificial
Intelligence
32
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Actuarial, Underwriting
and Profiling
Perspectives
Medical
Management
Perspective
Program Evaluation
Perspective
33
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Risk Groupers
34
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Company
Risk Grouper
Data Source
IHCIS/Ingenix
ERG
Age/Gender, ICD-9
NDC, Lab
UC San Diego
CDPS
Age/Gender, ICD -9
NDC
DxCG
DCG
RxGroup
Age/Gender, ICD -9
Age/Gender, NDC
Symmetry/Ingenix
ERG
PRG
ICD 9, NDC
NDC
Johns Hopkins
ACG
Age/Gender, ICD 9
35
SCIOinspire Corp Proprietary & confidential. Copyright 2008
* See New SOA study (Winkelman et al) published 2007. Available from SOA.
36
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Risk
Groupers
PM Tools
Statistical
Models
Artificial
Intelligence
37
Actuarial,
Underwriting and
Profiling
Perspectives
38
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Program
Evaluation
Perspective
Statistical Models
39
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Trees
ANOVA
Logistic
Regression
Time Series
Survival
Analysis
Linear
Regression
40
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Non-linear
Regression
Risk
Groupers
PM Tools
Statistical
Models
Artificial
Intelligence
41
Risk
Groupers
PM Tools
Statistical
Models
Artificial
Intelligence
42
43
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Neural
Network
Genetic
Algorithms
Conjugate
Gradient
Nearest
Neighbor
Pairings
Principal
Component
Analysis
Rule
Induction
Fuzzy Logic
Simulated
Annealing
44
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Kohonen
Network
Perception
NN tracks complex
relationships by
resembling the human
brain
NN can accurately
model complicated
health care systems
Reality
Performance equals
standard statistical
models
46
SCIOinspire Corp Proprietary & confidential. Copyright 2008
In Summary
47
SCIOinspire Corp Proprietary & confidential. Copyright 2008
* Meaning that experts determine the risk factors, rather than statistics.
48
SCIOinspire Corp Proprietary & confidential. Copyright 2008
49
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Population
Actual Cost
0.0% - 0.5%
0.5% - 1.0%
Top 1%
Top 5%
Total
67,665
67,665
135,330
676,842
13,537,618
PMPY Total
Actual Cost
$47,357
$20,977
$34,170
$14,303
$1,623
Percentage of
Total Cost
14.6%
6.5%
21.1%
44.1%
100%
0.5% - 1.0%
Top 1%
Top 5%
Total
5,249
24,619
32,496
35,150
14.9%
70.0%
92.4%
100.0%
50
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Population
Actual Cost
0.0% - 0.5%
0.5% - 1.0%
Top 1%
Top 5%
Total
67,665
67,665
135,330
676,842
13,537,618
PMPY Total
Actual Cost
$47,357
$20,977
$34,170
$14,303
$1,623
Percentage of
Total Cost
14.6%
6.5%
21.1%
44.1%
100%
0.5% - 1.0%
Top 1%
Top 5%
Total
5,249
24,619
32,496
35,150
14.9%
70.0%
92.4%
100.0%
51
SCIOinspire Corp Proprietary & confidential. Copyright 2008
P o p u la tio n R is k R a n k in g
E v e n t fre q u e n c y (p e rc e n t)
80
60
40
20
0
0 .2 %
0 .7 %
1 .3 %
4%
14%
25%
C u m u la tiv e T o ta l P o p u la tio n
53
SCIOinspire Corp Proprietary & confidential. Copyright 2008
54
SCIOinspire Corp Proprietary & confidential. Copyright 2008
55
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Savings/Cost ($ millions)
$4
$3
$2
Gro ss
Savings
Expenses
$1
Net Savings
$2%
17%
32%
47%
62%
$(1)
$(2)
Penetration (%)
56
SCIOinspire Corp Proprietary & confidential. Copyright 2008
77%
92%
57
SCIOinspire Corp Proprietary & confidential. Copyright 2008
What is a model?
58
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Background
Available data for creating the score included the following
Eligibility/demographics
Rx claims
Medical claims
59
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Analysis Dataset
Test Dataset
Diagnostics
Put half of the claimants into an analysis dataset and half into a test dataset. This is to prevent
over-fitting. The scoring will be constructed on the analysis dataset and tested on the test
dataset. Diagnostic reports are run on each dataset and compared to each other to ensure that
the compositions of the datasets are essentially similar. Reports are run on age, sex, cost, as
well as disease and Rx markers.
60
SCIOinspire Corp Proprietary & confidential. Copyright 2008
61
SCIOinspire Corp Proprietary & confidential. Copyright 2008
62
SCIOinspire Corp Proprietary & confidential. Copyright 2008
40
% Claimants
35
Avg of composite
dependent variable
30
25
20
15
10
5
0
1
64
Typical
"transformed"
variable
65
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Anticoagulants flag
Digoxin flag
Diuretics flag
Number of corticosteroid drug prescriptions truncated at 2
66
SCIOinspire Corp Proprietary & confidential. Copyright 2008
67
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Stepwise Algorithm
1. Run a single-variable regression for each independent variable.
Select the variable that results in the greatest value of R2. This is
Variable 1.
2. Run a two-variable regression for each remaining independent
variable. In each regression, the other independent variable is
Variable 1. Select the remaining variable that results in the greatest
incremental value of R2. This is Variable 2.
3. Run a three-variable regression for each remaining independent
variable. In each regression, the other two independent variables are
Variables 1 and 2. Select the remaining variable that results in the
greatest incremental value of R2. This is Variable 3.
6. Results - Examples
Stepwise linear regressions were run using the "promising"
independent variables as inputs and the composite dependent
variable as an output.
Separate regressions were run for each patient sex.
Sample Regressions
Female
Scheduled drug prescription 358.1
NClass
414.5
MClass
157.5
Baseline cost
0.5
Diabetes Dx
1818.9
Intercept
18.5
69
SCIOinspire Corp Proprietary & confidential. Copyright 2008
6. Results - Examples
Examples of application of the female model
Female Regression Regression Formula
(Scheduled Drug *358.1) + (NClass*414.5) + (Cost*0.5) + (Diabetes*1818.9) + (MClass*157.5) -18.5
Transformed
Value
Predicted Value
Raw Value
Claimant
ID
1
2
3
3
2
0
1
2
3
3
6
0
1
2
3
423
5,244
1,854
1
2
3
0
0
0
1
2
3
8
3
0
Transform Function
Actual Value
Schedule Drugs
Schedule Drugs
2 < RV < 5
RV >5
2.0
3.0
2 $
2 $
1 $
716.20
716.20
358.10
Value Range
Transformed Value
RV< 2
1.0
3 $
6 $
0.5 $
1,243.50
2,487.00
207.25
Value Range
Transformed Value
RV < 2
0.5
2,000 $
6,000 $
2,000 $
1,000.00
3,000.00
1,000.00
Value Range
Transformed Value
Cost
RV < 5k 5k < RV < 10k RV > 10k
2,000
6,000
10,000
NClass
Cost
Diabetes
NClass
2 < RV < 5
3.0
Value Range
Transformed Value
Yes
1.0
Diabetes
No
0.0
3 $
2 $
0.5 $
472.50
315.00
78.75
Value Range
Transformed Value
RV < 1
0.5
MClass
1 < RV < 7
2.0
$
$
$
3,413.70
6,499.70
1,625.60
0 $
0 $
0 $
MClass
TOTAL
1
2
3
$
$
$
4,026.00
5,243.00
1,053.00
70
SCIOinspire Corp Proprietary & confidential. Copyright 2008
RV > 5
6.0
RV > 7
3.0
71
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Background Case 1
Large client.
Several years of data provided for modeling.
Never able to become comfortable with data which did
not perform well according to our benchmark statistics
($/claimant; $pmpm; number of claims per member).
BENCHMARK
DATA
(Commercial only)
pmpm
Medical Only
70.40
Rx Only
16.49
7.70
TOTAL
86.89
22.10
(Commercial;
Claims/
member/ year
pmpm
14.40
Claims/
member/ year
Medical + Rx
32.95
5.36
TOTAL
32.95
5.36
72
Background Case 1
73
SCIOinspire Corp Proprietary & confidential. Copyright 2008
All Groups
140
120
100
80
60
40
20
-8
-9
-1
0% 00
%
to
+
0% -9
9
- 7 to %
0% -8
9
- 6 to %
0% -7
9
- 5 to %
0% -6
9
- 4 to %
0% -5
9
- 3 to %
0% -4
9
- 2 to %
0% -3
9%
t
-1 o
0% -2
9
to %
0 % 19
%
to
0% -9%
1 0 to
% 9%
20 to 1
% 9%
30 to 2
% 9%
40 to 3
% 9%
50 to 4
% 9%
60 to 5
% 9%
70 to 6
% 9%
80 to 7
% 9%
90 to 8
% 9%
to
99
%
Analysis 1: all groups. This analysis shows that, at the group level,
prediction is not particularly accurate, with a significant number of groups at
the extremes of the distribution.
74
SCIOinspire Corp Proprietary & confidential. Copyright 2008
50
40
30
20
10
-1
0
-9
0% 0%
+
t
-8 o -9
0%
9%
to
-7
0% 89%
to
-6
0% 79%
t
-5 o -6
0%
9%
to
-4
0% 59%
t
-3 o -4
0%
9%
to
-2
0% 39%
t
-1 o -2
0%
9%
to
-1
0% 9 %
to
-9
0% %
10 to 9
%
%
to
19
20
%
%
to
29
30
%
%
to
39
40
%
%
to
49
50
%
%
to
59
60
%
%
to
69
70
%
%
to
79
80
%
%
to
89
90
%
%
to
99
%
-8
-9
-1
0% 00
%
to
+
0% -9
9
-7 to %
0% -8
9
-6 to %
0% -7
9
-5 to %
0% -6
9
-4 to %
0% -5
9
-3 to %
0% -4
9
-2 to %
0% -3
9
t
-1 o %
0% -2
9
to %
0% 19
%
to
0% -9%
10 to
% 9%
20 to 1
% 9%
30 to 2
% 9%
40 to 3
% 9%
50 to 4
% 9%
60 to 5
% 9%
70 to 6
% 9%
80 to 7
% 9%
90 to 8
% 9%
to
99
%
Conclusion
77
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Background Case 2.
78
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Background
A number of different tree models were built (at clients
request).
Technically, an optimal model was chosen.
Problem: how to convince Underwriting that:
Adding the predictive model to the underwriting process
produces more accurate results; and
They need to change their processes to incorporate the
predictive model.
79
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Some data
Node
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
PREDICTED
Average Profit
(3.03)
0.19
(0.20)
0.09
(0.40)
(0.27)
0.11
0.53
(0.13)
0.27
0.38
0.08
0.06
0.27
(1.07)
0.07
(0.33)
0.11
(0.13)
PREDICTED
PREDICTED
Number in
Number in Node
Node
(Adjusted)
70
173
860
2,122
2,080
5,131
910
2,245
680
1,678
350
863
650
1,604
190
469
1,150
2,837
1,360
3,355
1,560
3,849
320
789
12,250
30,221
2,400
5,921
540
1,332
10,070
24,843
1,400
3,454
4,460
11,003
1,010
2,492
42,310
104,380
80
SCIOinspire Corp Proprietary & confidential. Copyright 2008
ACTUAL
Number in
node
170
2,430
6,090
2,580
20
760
1,810
470
2,910
3,740
3,920
830
29,520
6,410
1,320
24,950
3,250
11,100
2,100
104,380
ACTUAL
Average Profit
(0.60)
0.07
(0.06)
0.10
0.02
0.16
0.04
(0.01)
0.03
0.04
(0.07)
0.08
0.02
0.21
(0.03)
(0.08)
(0.10)
0.08
(0.11)
0.005
Node
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
PREDICTED
Average Profit
(3.03)
0.19
(0.20)
0.09
(0.40)
(0.27)
0.11
0.53
(0.13)
0.27
0.38
0.08
0.06
0.27
(1.07)
0.07
(0.33)
0.11
(0.13)
PREDICTED
PREDICTED
Number in
Number in Node
Node
(Adjusted)
70
173
860
2,122
2,080
5,131
910
2,245
680
1,678
350
863
650
1,604
190
469
1,150
2,837
1,360
3,355
1,560
3,849
320
789
12,250
30,221
2,400
5,921
540
1,332
10,070
24,843
1,400
3,454
4,460
11,003
1,010
2,492
42,310
104,380
ACTUAL
Number in
node
170
2,430
6,090
2,580
20
760
1,810
470
2,910
3,740
3,920
830
29,520
6,410
1,320
24,950
3,250
11,100
2,100
104,380
ACTUAL
Average Profit
(0.60)
0.07
(0.06)
0.10
0.02
0.16
0.04
(0.01)
0.03
0.04
(0.07)
0.08
0.02
0.21
(0.03)
(0.08)
(0.10)
0.08
(0.11)
0.005
Directionally
Correct
(+ or -)
6 red
13 green
81
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Node
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
PREDICTED
Average Profit
(3.03)
0.19
(0.20)
0.09
(0.40)
(0.27)
0.11
0.53
(0.13)
0.27
0.38
0.08
0.06
0.27
(1.07)
0.07
(0.33)
0.11
(0.13)
PREDICTED
PREDICTED
Number in
Number in Node
Node
(Adjusted)
70
173
860
2,122
2,080
5,131
910
2,245
680
1,678
350
863
650
1,604
190
469
1,150
2,837
1,360
3,355
1,560
3,849
320
789
12,250
30,221
2,400
5,921
540
1,332
10,070
24,843
1,400
3,454
4,460
11,003
1,010
2,492
42,310
104,380
ACTUAL
Number in
node
170
2,430
6,090
2,580
20
760
1,810
470
2,910
3,740
3,920
830
29,520
6,410
1,320
24,950
3,250
11,100
2,100
104,380
ACTUAL
Average Profit
(0.60)
0.07
(0.06)
0.10
0.02
0.16
0.04
(0.01)
0.03
0.04
(0.07)
0.08
0.02
0.21
(0.03)
(0.08)
(0.10)
0.08
(0.11)
0.005
Directionally Predicted
Correct
to be
(+ or -)
Profitable
6 red
13 green
82
SCIOinspire Corp Proprietary & confidential. Copyright 2008
11 nodes
Underwriting Decision-making
Underwriting Decision
Total Profit
Average
Profit per
Case
Cases
Written
557.5
0.005
104,380
Average
Profit per
Case
Cases
Written
557.5
0.005
104,380
1,379.4
0.016
87,760
83
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Underwriting Decision-making
Underwriting Decision
Total Profit
84
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Underwriting Decision-making
Underwriting Decision
Total Profit
Average
Profit per
Case
Cases
Written
557.5
0.005
104,380
1,379.4
0.016
87,760
2,219.5
0.021
104,380
Average
Profit per
Case
Cases
Written
85
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Underwriting Decision-making
Underwriting Decision
Total Profit
557.5
0.005
104,380
1,379.4
0.016
87,760
2,219.5
0.021
104,380
2,543.5
0.026
100,620
86
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Underwriting Decision-making
Underwriting Decision
Total Profit
Average
Profit per
Case
Cases
Written
557.5
0.005
104,380
1,379.4
0.016
87,760
2,219.5
0.021
104,380
2,543.5
0.026
100,620
3,836.5
0.038
100,620
Average
Profit per
Case
Cases
Written
87
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Underwriting Decision-making
Underwriting Decision
Total Profit
557.5
0.005
104,380
1,379.4
0.016
87,760
2,219.5
0.021
104,380
2,543.5
0.026
100,620
3,836.5
0.038
100,620
2,540.8
0.025
101,090
88
SCIOinspire Corp Proprietary & confidential. Copyright 2008
89
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Background
Large health plan client seeking a model to improve case
identification for case management.
Considered two commercially-available models:
90
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Analysis
91
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Analysis
99 96 93 90 87 84 81 78 75 72 69 66 63 60 57 54 51 48 45 42 39 36 33 30 27 24 21 18 15 12 9
Model Percentile
Model 2
Model 1
92
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Analysis
93
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Background
Lift Chart Comparison between Two models
60.0%
Percent of Members w/ Hospitalization Identified
50.0%
40.0%
30.0%
20.0%
10.0%
0.0%
99
98
97
96
95
94
93
92
91
90
89
88
Model Percentile
Model 2
Model 1
94
SCIOinspire Corp Proprietary & confidential. Copyright 2008
87
86
85
84
83
82
81
80
Analysis
Decile
From
Decile Admissions
To
Population
Expected
Actual
Predicted
Frequency
Actual
Frequency
Predictive
ratio
100%
90%
1,690
808
694
47.8%
41.1%
85.9%
90%
80%
1,699
268
321
15.8%
18.9%
119.6%
80%
70%
1,657
152
247
9.2%
14.9%
162.0%
70%
60%
1,673
107
191
6.4%
11.4%
178.4%
60%
50%
1,681
82
168
4.9%
10.0%
204.0%
50%
40%
1,760
67
165
3.8%
9.4%
246.7%
40%
30%
1,667
50
118
3.0%
7.1%
236.0%
30%
20%
1,729
38
92
2.2%
5.3%
241.9%
20%
10%
1,624
26
68
1.6%
4.2%
261.7%
10%
0%
1,708
91
37
5.3%
2.2%
40.9%
16,888
1,690
2,101
100%
124.4%
95
SCIOinspire Corp Proprietary & confidential. Copyright 2008
96
SCIOinspire Corp Proprietary & confidential. Copyright 2008
97
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Variable
Values
Intercept
Personal Disease
History 1
Health Screenings
Have you had a SIGMOIDOSCOPY within the last 5 years? (tube inserted in rectum
to check for lower intestine problems)
Weight
Management
Health Screenings
Personal Disease
History 2
Have you never been diagnosed with any of the following: list of 27 major
conditions
Personal Disease
History 3
TIA (mini-stroke lasting less than 24 hrs), Heart Attack, Angina, Breast Cancer,
Emphysema
Immunizations
Pneumonia
Physical Activity 1
In the last month, how often have you been angered because of things that
happened that were outside your control?
98
190
0 (No)
1 (Yes)
10,553
0 (No)
1 (Yes)
2,045
26 (Min)
40 (No Value)
45 (Max)
3,069
4,722
5,312
0 (No)
1 (Yes)
1,176
0 (No)
1 (Yes)
(1,220)
0 (No)
1 (Yes)
0 (No)
1 (Yes)
0 (Min, No Value)
20 (Max)
Cost
Impact
0 (Never, Almost
Never, Sometimes,
Fairly Often)
1 (Very Often, No
Value)
2,589
1,118
(915)
-
1,632
Please rate how confident you are that you can have your skin
checked by a doctor once a year?
Women's health 1
Women's health 2
Physical Activity 2
Nutrition
Tobacco
Please rate how confident you are that you can keep from
smoking cigarettes when you feel you need a lift.
99
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Discussion?
100
SCIOinspire Corp Proprietary & confidential. Copyright 2008
(224)
2 (Not confident)
(447)
3 (Fairly confident)
4 (Confident)
(671)
(894)
5 (Very Confident)
7 (No Value)
(1,118)
(1,565)
0 (No)
1 (Yes)
1 (NotPlanning (I am
planning on becoming
pregnant in the next 6
months.))
2 (No Value)
3 (Planning (I am
planning on becoming
pregnant in the next 6
months.))
4 (Pregnant (I am
currently pregnant))
0 (Min, No Value)
3 (Max)
0 (None, No Value)
999
590
1,181
1,771
2,361
(917)
-
1 (OneThree, FourFive)
2 (SixPlus)
(868)
(1,736)
(294)
(441)
2 (Not confident)
3 (Fairly confident)
4 (Confident)
(588)
(883)
(1,177)
Selected references
This is not an exhaustive bibliography. It is only a starting point for
explorations.
Shapiro, A.F. and Jain, L.C. (editors); Intelligent and Other Computational
Techniques in Insurance; World Scientific Publishing Company; 2003.
Dove, Henry G., Duncan, Ian, and Robb, Arthur; A Prediction Model for
Targeting Low-Cost, High-Risk Members of Managed Care Organizations;
The American Journal of Managed Care, Vol 9 No 5, 2003
Berry, Michael J. A. and Linoff, Gordon; Data Mining Techniques for
Marketing, Sales and Customer Support; John Wiley and Sons, Inc; 2004
Montgomery, Douglas C., Peck, Elizabeth A., and Vining, G Geoffrey;
Introduction to Linear Regression Analysis; John Wiley and Sons, Inc; 2001
Kahneman, Daniel, Slovic, Paul, and Tversky (editors); Judgment under
uncertainty: Heuristics and Biases; Cambridge University Press; 1982
101
SCIOinspire Corp Proprietary & confidential. Copyright 2008
102
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Further Questions?
iduncan@soluciaconsulting.com
Solucia Inc.
220 Farmington Avenue, Suite 4
Farmington, CT 06032
860-676-8808
www.soluciaconsulting.com
103
SCIOinspire Corp Proprietary & confidential. Copyright 2008
Outline
Exponential Family
Components of GLM
Estimation
Exponential family
Poisson Distribution
| h
i (|; ) =
>
|!
where | = 0> 1> 2>
i (|; ) = exp (| log log |!)
d (|) = |
e () = log
Other parameters, in addition to the parameter of interest are regarded as nuisance parameters, treated
as known.
Poisson, Normal, and Binomial
Normal Distribution
Binomial Distribution
1
2
i (|; ) =
1@2 exp 2 (| ) >
2
2 2
1
i (|; ) =
|2
|
2
1
2
i (|; ) = exp 22 + 2 22 2 log 2
| (1 )q|
q!
| (1 )q| >
|! (q |)!
"
i (|; ) = exp
d (|) = |
e () = @2
2
d (|) = |
1
2
f () = 2
2 2 log 2
2
|
g (|) = 2
2
| log | log (1 )
+q log (1 ) + log q
|
e () = log log (1 ) = log 1
Properties
Log-likelihood Function
f ()
0
e ()
H [d (\ )] =
go(;|)
X (; |) = g
score statistics
00
Y du [d (\ )] =
00
i3
e ()
X = d (\ ) e () + f () regarded as a random variable (used for inference about parameters in GLMs)
H [X ] = 0
Y du [X ] =
formation
h
H X
00
0
00
e ()f ()
f () =, called the in0
e ()
= Y du (X ) = =
Components of GLM
Random Component
1. Response variables \1> = = = > \Q are assumed to share
the same distribution form from the exponential family. That is
i (|l; l) = exp [|le (l) + f (l) + g (|l)] =
2. The joint probability density function of \1> = = = > \Q
is
i (|1> = = = > |Q ; 1> = = = > Q )
=
Q
Y
l=1
= exp
Q
X
l=1
|le (l) +
Q
X
l=1
f (l) +
Q
X
l=1
g (|l)
Systematic Component
A set of parameters and explanatory variables
xW1
..
X
xWQ
{11 {1s
..
...
= ..
{Q1 {Qs
Link Function
A monotone link function j such that
j (l) = l = xW
1 =
=
s
X
m=0
m {lm =
l = H [\l] =
lqghs=
Logistic Regression
Q l> 2 =
\l
j (l) = l = xW
l , where the link function is the
identity function.
lqghs=
Ehuqrxool (l) =
l l (1 l)1|l
l=1
Usually, written as
y = X + e
l=l=g=
Q
X
Q
X
l
= exp
|l log
+
log (1 l)
1
l
l=1
l=1
l
>
j (l) = log
1 l
the logit function.
Poisson Regression
\l
lqghs=
S rvvlrq (l) =
l=1 |l!
Q
Q
Q
X
X
X
= exp
|l log l
l
log |l!
l=1
l=1
l=1
Estimation
Consider independent random variables \1> = = = > \Q .
H [\l] = l=
j (l) = xW
l =
For each \l, the log-likelihood function is
ol = |le (l) + f (l) + g (|l) =
H (\l) = l = f0 (l) @e0 (l)
h
00
00
i3
Q
X
l=1
Q
X
l=1
ol
|le (l) +
Q
X
l=1
f (l) +
Q
X
l=1
g (|l) =
#
Q "
X
Col
l=1 C m
#
Q "
X
Col Cl Cl C l
l=1 C l Cl C l C m
!#
Q "
X
(|l l)
Cl
=
{lm
C l
l=1 Y du (\l)
Since
Col
= |le0 (l) + f0 (l) = e0 (l) [|l l]
Cl
Cl
f00 (l) e0 (l) + f0 (l) e00 (l)
=
Cl
[e0 (l)]2
= e0 (l) Y du (\l)
C l
= {lm.
C m
The variance-covariance matrix of the Xm0 v has terms
h
=mn = H Xm Xn
Newton-Raphson Algorithm
The estimating equation is given by
=(p1)b(p) = =(p1)b(p1) + U(p1)> (1)
where b(p) is the vector of estimates of the parameters 1> = = = > s at the pwk iteration, =(p1) is
the information matrix at the (p 1)wk iteration,
U(p1) is the vector of Xm0 v at the (p 1)wk iteration.
!2
{lm {ln
Cl
=
>
Y
du
(\
)
C
l
l
l=1
Q
X
1
Cl
zll =
Y du (\l) C l
!2
!2
{lm {ln
Cl
(p1)
en
n=1 l=1 Y du (\l) C l
s X
Q
X
Q " (| )
X
l
l
Cl
+
{lm
C l
l=1 Y du (\l)
!#
(2)
2
3
6 7 8 9 10 12 15
1 1 0 0 0 0 1
1
1
XW Wz>
where z has elements
}l =
s
X
(p1)
{ln en
+ (|l l)
n=1
C
with l and C l evaluated at
l
Cl
>
C l
b(p1).
where
(p1)
XW WX
b(p) =
"
=
XW Wz(p1)=
1
2
"
and xl =
1
{l
and
W
X
Wz
PQ
|l
e1+e2{l
= Pl=1
=
{l|l
Q
l=1 e1+e2{l
j (l) = l = xW
l = l=
We have,
Cl
= 1=
C l
The maximum likelihood estimates are obtained iteratively from the equations
Thus,
zll =
(p1)
b(p) =
XW WX
1
1
=
Y du (\l)
1 + 2{l
and
}l = e1 + e2{l + (|l e1 e2{l)
= |l=
Also
W
= = X
PWX
Q
1
l=1 e1+e2{l
= P
{l
Q
l=1 e1+e2{l
XW Wz(p1)>
y=z=
PQ
{l
l=1 e1+e2{l
PQ
{2l
l=1 e1+e2{l
2
3
..
15
and
1 1
1 1
= .. .. =
(p)
e1
(p)
e2
(1)
(1)
(1)
XW WX
"
XW Wz(1) =
"
=
"
1=821429 0=75
0=75
1=25
"
9=869048
0=583333
(1)1
XW WX
"
7=4514
4=9375
"
#"
>
b
1
b
2
7=45163
4=93530
#
b is the inverse
The variance-covariance matrix for
of the information matrix = = XW WX. That is
>
XW Wz(1)
0=729167 0=4375
0=4375 1=0625
b =
Therefore,
b(2) =
so
1 2
9=869048
0=583333
=1 =
"
0=7817 0=4166
0=4166 1=1863
b is 0=7817 =
Thus the estimated standard error for
1
b is
0=8841 and the estimated standard error for
2
1=1863 = 1=0892.
So, for example, an approximate 95% condence interval for the slope 2 is
4=93530 1=96 1=0892 or (2=80> 7=07) =
>res.p=glm(y~x,family=poisson(link="identity"))
>summary(res.p)
Logistic Regression
log10 FV2pjo1
Number of Number
beetle, ql killed, |l
1=6907
1=7242
1=7552
1=7842
1=8113
1=8369
1=8610
1=8839
59
60
62
56
63
59
62
60
6
13
18
28
52
53
61
60
l
(1 l)
Q |l log 1 l +ql log
X
!
=
=
ql
+ log
l=1
|l
l
log
1 l
= 1 + 2{l>
so
l =
exp ( 1 + 2{l)
1 + exp ( 1 + 2{l)
and
Consider Q random variables \1> = = = > \Q , where \l
Elq (ql> l).
|
(
+
{
)
q
log
[1
+
exp
(
+
{
)]
Q
l
l
l
l
1
2
1
2
!
X
ql
o=
+ log
l=1
|l
l
=
1 l
X1 =
" P
#
PQ
Q q (1 )
q
{
(1
)
l
l
l
l
l
l
l
l=1
PQ
= = PQl=1
=
2
q
{
(1
)
l
l
l
l
l=1
l=1 ql{l l (1 l)
(p)
e1
(p)
e2
1 2
30=384
34=270
R Code
Data entry and manipulation
>y=c(6,13,18,28,52,53,61,60)
>n=c(59,60,62,56,63,59,62,60)
>x=c(1.6907,1.7242,1.7552,1.7842,
1.8113,1.8369,1.8610,1.8839)
>n_y=n-y
>beetle.mat=cbind(y,n_y)
Logistic Regression
>res.glm=glm(beetle.mat~x,
family=binomial(link="logit"))
Estimated Proportion of Deaths
>fitted.values(res.glm)
Inference
Condence Intervals
Statistic
Wald Statistic:
(b )W = (b) (b ) "2 (s) >
Hypothesis Tests
e Q > =1
Hypothesis Test
1. Specify a model P0 cerresponding to K0. Specify a
more general model P1 (with P0 as a special case
of P1).
Prediction
S \l = 1|{l
= {l
=
b +
b { + +
b {
exp
0
1 l1
s ls
b +
b { + +
b {
1 + exp
0
1 l1
s ls
Prediction Accuracy
Classication Table: For a given threshold f [0> 1],
we can form the following 2 2 classication table.
Predicted
Obsedved 0
1
0
q00 q01
1
q10 q11
Total
q0 q1
Total
q0
q1
q
Summary
Generalized Linear Models (GLM; McCullagh and
Nelder, 1983) extend the ordinary linear regression
model to encompass nonnormal response while, at
the same time, enjoying nearly all its merits.
Within the maximum likelihood framework, GLMs
provide a unied approach for commonly used linear
models with continuous, binary/categorical, or count
response.
GLMs are exible and easy to implement.
Actuaries are capable.