Pristine www.edupristine.com
Multivariate scenario:
Create all possible unique combinations of the independent variables.
For each combination, find the standard deviation of the response variable.
Divide the individual values of the response variable by the respective standard deviation.
This is too cumbersome to do manually in MS Excel, and the process is iterative.
It is more convenient to do in a statistical package such as R.
Course approach
First fit a multivariate regression without fixing heteroskedasticity to get a final set of significant variables.
Then adjust manually and re-fit the regression using MS Excel. This is for demonstration only, as manual adjustment is always questionable.
Demonstrate linear regression using R.
1. Age
2. Age Band
3. Years of Driving Experience
4. Number of Vehicles
5. Gender
6. Married
7. Vehicle Age
8. Vehicle Age Band
9. Fuel Type
Multiple R           0.475273
R Square             0.225885
Adjusted R Square    0.225834
Standard Error       201.2916
Observations         15290
Age Band    Sum of Age    # Policies    Average Age
16-25         93,770.0       4,563.0           20.6
26-59        270,793.0       6,384.0           42.4
60+          282,636.0       4,343.0           65.1
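The Average Age column is simply Sum of Age divided by # Policies for each band. A minimal sketch of that arithmetic, using the figures from the table (the variable names are illustrative, not from the course material):

```python
# Sum of Age and # Policies per age band, copied from the table above.
bands = {
    "16-25": (93770.0, 4563),
    "26-59": (270793.0, 6384),
    "60+": (282636.0, 4343),
}

# Average age of a band = sum of ages in the band / number of policies in the band.
average_age = {band: total / count for band, (total, count) in bands.items()}

# The three bands together should cover every policy in the data set.
total_policies = sum(count for _, count in bands.values())

for band, avg in average_age.items():
    print(f"{band}: {avg:.1f}")
print("Total policies:", total_policies)
```

The band average then replaces the raw Age value as the regression input for every policy in that band.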
# Policies    Average Vehicle Age
3,688                        2.50
5,523                        8.02
6,079                       12.97
1. Age Band, in the form of the Average Age of the band (selected out of Age and Age Band; also selected over Years of Driving Experience)
2. Number of Vehicles
3. Gender
4. Married
5. Vehicle Age Band, in the form of the Average Vehicle Age of the band (selected out of Vehicle Age and Vehicle Age Band)
6. Fuel Type
We will run the regression in multivariate fashion and then select the final list of variables on the basis of statistical significance.
1. Gender (F = 0, M = 1)
2. Married
3. Fuel Type (P = 0, D = 1)
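The dummy coding above can be sketched as a simple lookup. This is an illustrative example only: the field names and the sample record are hypothetical, and only the two mappings stated on the slide (Gender and Fuel Type) are encoded.

```python
# Mappings taken from the slide: F = 0, M = 1 and P = 0, D = 1.
GENDER_CODES = {"F": 0, "M": 1}
FUEL_TYPE_CODES = {"P": 0, "D": 1}


def encode_policy(policy):
    """Return a copy of the policy record with 0/1 dummy columns added."""
    return {
        **policy,
        "gender_dummy": GENDER_CODES[policy["gender"]],
        "fuel_type_dummy": FUEL_TYPE_CODES[policy["fuel_type"]],
    }


# Hypothetical record, used only to show the encoding.
sample = {"gender": "M", "fuel_type": "D"}
encoded = encode_policy(sample)
print(encoded)
```

An unknown category raises a KeyError here, which is usually preferable to silently coding it as 0.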
Snapshot of the final data on which we will run the multivariate regression
Regression Statistics
Multiple R           0.865972274
R Square             0.749907979
Adjusted R Square    0.749809794
Standard Error       114.4310136
Observations         15290

ANOVA
              df       SS             MS             F          Significance F
Regression    6        600073213.5    100012202.3    7637.75
Residual      15283    200122584.4    13094.45688
Total         15289    800195798

                      Coefficients   Standard Error   t Stat      P-value   Lower 95%   Upper 95%
Intercept             624.56529      5.29192          118.02233   0.00000   614.19249   634.93809
Avg Age               -5.55974       0.06546          -84.93889   0.00000   -5.68804    -5.43144
Number of Vehicles    0.17875        0.97039          0.18420     0.85386   -1.72333    2.08082
Gender Dummy          50.88326       1.89081          26.91084    0.00000   47.17705    54.58947
Married Dummy         78.39837       1.92148          40.80106    0.00000   74.63204    82.16469
Avg Vehicle Age       -15.14220      0.26734          -56.63987   0.00000   -15.66623   -14.61818
Fuel Type Dummy       267.93559      2.74845          97.48614    0.00000   262.54830   273.32287
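Selecting variables by statistical significance amounts to comparing each P-value with a chosen cut-off. The sketch below assumes a 5% significance level, which is an assumption consistent with the 95% intervals in the output rather than something the slides state; the P-values are copied from the output above.

```python
# Assumed significance level (5%); the slides only show 95% confidence intervals.
ALPHA = 0.05

# P-values of the independent variables from the regression output.
p_values = {
    "Avg Age": 0.00000,
    "Number of Vehicles": 0.85386,
    "Gender Dummy": 0.00000,
    "Married Dummy": 0.00000,
    "Avg Vehicle Age": 0.00000,
    "Fuel Type Dummy": 0.00000,
}

# A variable is kept when its P-value is below the cut-off.
significant = [name for name, p in p_values.items() if p < ALPHA]
insignificant = [name for name, p in p_values.items() if p >= ALPHA]

print("Keep:", significant)
print("Drop:", insignificant)
```

Only Number of Vehicles fails the test, which also shows in its 95% interval spanning zero (-1.72333 to 2.08082).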
Independent Vars              Coefficients (b)   Standard Error (σ)   t Stat (b/σ)   P-value          Lower 95%     Upper 95%
                                                                                     (t-dist table)   (b-1.96*σ)    (b+1.96*σ)
     Intercept                      624.565            5.292             118.022        0.000          614.192       634.938
X1   Avg Age              b1        -5.560             0.065             -84.939        0.000          -5.688        -5.431
X2   Number of Vehicles   b2        0.179              0.970             0.184          0.854          -1.723        2.081      Insignificant
X3   Gender Dummy         b3        50.883             1.891             26.911         0.000          47.177        54.589
X4   Married Dummy        b4        78.398             1.921             40.801         0.000          74.632        82.165
X5   Avg Vehicle Age      b5        -15.142            0.267             -56.640        0.000          -15.666       -14.618
X6   Fuel Type Dummy      b6        267.936            2.748             97.486         0.000          262.548       273.323

ANOVA
              df               SS             MS (SS/df)     F (MSReg/MSRes)   Significance F (from F dist table)
Regression    p = 6            600073213.5    100012202.3    7637.75
Residual      n-p-1 = 15283    200122584.4    13094.457
Total         n-1 = 15289      800195798

Regression Statistics
Multiple R           SquareRoot(R Square)               0.8659723
R Square             SS Regression/SS Total             0.7499080
Adjusted R Square    R2 - (1 - R2)*{p/(n-p-1)}          0.7498098
Standard Error       SquareRoot{SS Residual/(n-p-1)}    114.4310136
Observations                                            15290
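The formulas annotated above can be checked directly against the reported figures. A minimal sketch using only the sums of squares, n, p, and the Avg Age coefficient from the output (variable names are illustrative):

```python
import math

n, p = 15290, 6                  # observations and number of independent variables
ss_regression = 600073213.5
ss_residual = 200122584.4
ss_total = ss_regression + ss_residual

# Regression statistics
r_square = ss_regression / ss_total
multiple_r = math.sqrt(r_square)
adjusted_r_square = r_square - (1 - r_square) * (p / (n - p - 1))
standard_error = math.sqrt(ss_residual / (n - p - 1))

# ANOVA: mean squares and F statistic
ms_regression = ss_regression / p
ms_residual = ss_residual / (n - p - 1)
f_stat = ms_regression / ms_residual

# Coefficient-level statistics for Avg Age (b1 and its standard error)
b1, se_b1 = -5.55974, 0.06546
t_stat = b1 / se_b1
lower_95 = b1 - 1.96 * se_b1
upper_95 = b1 + 1.96 * se_b1

print(f"R Square          {r_square:.7f}")
print(f"Adjusted R Square {adjusted_r_square:.7f}")
print(f"Standard Error    {standard_error:.4f}")
print(f"F                 {f_stat:.2f}")
print(f"t Stat (Avg Age)  {t_stat:.3f}  CI [{lower_95:.5f}, {upper_95:.5f}]")
```

The t Stat differs from the reported -84.939 in the second decimal place because the standard error is used here in its rounded form.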
Regression Statistics
Multiple R           0.865972274
R Square             0.749907979
Adjusted R Square    0.749809794
Standard Error       114.4310136
Observations         15290

ANOVA
              df       SS             MS             F           Significance F
Regression    5        600072769.2    120014553.8    9165.874
Residual      15284    200123028.7    13093.6292
Total         15289    800195798

                   Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept          625.005        4.723            132.333    0.00      615.7474    634.2625
Avg Age            -5.560         0.065            -84.942    0.00      -5.6879     -5.4314
Gender Dummy       50.883         1.891            26.912     0.00      47.1768     54.5890
Married Dummy      78.402         1.921            40.806     0.00      74.6356     82.1677
Avg Vehicle Age    -15.142        0.267            -56.641    0.00      -15.6660    -14.6180
Fuel Type Dummy    267.935        2.748            97.489     0.00      262.5480    273.3223
Variable           Coefficient   Sign of Coefficient   Inference
Intercept          625.005
Avg Age            -5.560        -ve                   The higher the age, the lower the loss
Gender Dummy       50.883        +ve                   Average loss is higher for males than for females
Married Dummy      78.402        +ve                   Average loss is higher for single than for married policyholders
Avg Vehicle Age    -15.142       -ve                   The older the vehicle, the lower the losses
Fuel Type Dummy    267.935       +ve                   Losses are higher for fuel type D
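With these coefficients, a predicted loss is the intercept plus each coefficient multiplied by the corresponding input. The sketch below is illustrative: the function name and the example policy are hypothetical, and the Married dummy is passed in as an already-coded 0/1 value.

```python
# Coefficients of the final model, copied from the table above.
COEFFICIENTS = {
    "intercept": 625.005,
    "avg_age": -5.560,
    "gender_dummy": 50.883,
    "married_dummy": 78.402,
    "avg_vehicle_age": -15.142,
    "fuel_type_dummy": 267.935,
}


def predict_loss(avg_age, gender_dummy, married_dummy, avg_vehicle_age, fuel_type_dummy):
    """Linear prediction: intercept + sum of coefficient * input."""
    return (
        COEFFICIENTS["intercept"]
        + COEFFICIENTS["avg_age"] * avg_age
        + COEFFICIENTS["gender_dummy"] * gender_dummy
        + COEFFICIENTS["married_dummy"] * married_dummy
        + COEFFICIENTS["avg_vehicle_age"] * avg_vehicle_age
        + COEFFICIENTS["fuel_type_dummy"] * fuel_type_dummy
    )


# Hypothetical policy: age band 26-59 (average age 42.4), male (1),
# Married dummy 0, vehicle band with average age 8.02, fuel type P (0).
example = predict_loss(42.4, 1, 0, 8.02, 0)
print(f"Predicted loss: {example:.2f}")
```

Switching only the fuel type dummy from 0 to 1 raises the prediction by exactly the Fuel Type coefficient, 267.935, which is what the positive sign in the table expresses.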
Bin   # Policies   Predicted Loss   Actual Loss   Cumulative     Random   Cumulative   % Cumulative   Area Under
                                                  Actual Loss             % Obs        Actual Loss    Gains Curve
0     0            0                0             0              0        0            0              0
1     1528         1,167,070        1,230,474     1,230,474      10%      10%          20.87%         0.0104
2     1529         1,046,034        991,944       2,222,418      10%      20%          37.69%         0.0293
3     1529         757,330          746,854       2,969,272      10%      30%          50.36%         0.0440
4     1529         589,366          552,534       3,521,806      10%      40%          59.73%         0.0550
5     1529         531,160          553,919       4,075,725      10%      50%          69.12%         0.0644
6     1529         485,428          477,284       4,553,009      10%      60%          77.22%         0.0732
7     1529         432,934          385,411       4,938,420      10%      70%          83.75%         0.0805
8     1529         385,595          423,814       5,362,234      10%      80%          90.94%         0.0873
9     1529         308,050          310,846       5,673,081      10%      90%          96.21%         0.0936
10    1530         193,465          223,351       5,896,432      10%      100%         100.00%        0.0981

[Chart: Predicted Loss and Actual Loss (Losses) with # Policies, by Bins of Equal # Policies]
[Gains Chart: % Cumulative Actual Loss against Cumulative % Obs]
Gini Coeff: 0.27177
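The gains table can be reproduced from the Actual Loss column alone. The sketch below assumes the bins are already ordered by predicted loss and are of equal size (10% of policies each, as in the Cumulative % Obs column). The area under the gains curve is taken with the trapezoid rule, and the Gini coefficient is computed as 2 * area - 1; this formula is not stated on the slide but reproduces its 0.27177.

```python
# Actual loss per bin, in the order shown in the table (bins 1 to 10).
actual_loss = [1230474, 991944, 746854, 552534, 553919,
               477284, 385411, 423814, 310846, 223351]


def gains_curve(losses):
    """Return cumulative % of observations and cumulative % of actual loss."""
    total = sum(losses)
    bins = len(losses)
    cum_obs = [i / bins for i in range(bins + 1)]   # 0%, 10%, ..., 100%
    cum_loss = [0.0]
    running = 0
    for loss in losses:
        running += loss
        cum_loss.append(running / total)
    return cum_obs, cum_loss


def area_under_curve(x, y):
    """Trapezoid rule: one trapezoid per bin."""
    return sum((x[i] - x[i - 1]) * (y[i] + y[i - 1]) / 2 for i in range(1, len(x)))


cum_obs, cum_loss = gains_curve(actual_loss)
area = area_under_curve(cum_obs, cum_loss)
gini = 2 * area - 1

print(f"Area under gains curve: {area:.4f}")
print(f"Gini coefficient:       {gini:.5f}")
```

A model with no discriminating power would follow the Random column (10% of losses in every bin), giving an area of 0.5 and a Gini coefficient of 0.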
Find the standard deviation of capped losses for each segment. The detailed methodology is explained in MS Excel.
Calculate Standardized Capped Losses as Capped Losses / Segment Std Dev.
This becomes the new response variable.
Doing this kind of exercise manually can be flawed, as some of the segments could be sparsely populated.
This demo is just to explain the underlying technique/methodology.
Statistical packages like SAS and R have built-in capability to take care of this.
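The manual adjustment described above can be sketched as follows. This is an illustration under stated assumptions, not the course's Excel workbook: the records, field names, and the min_size threshold are hypothetical, and a segment is taken to be one unique combination of the independent variables. Segments that are too sparse to give a usable standard deviation are left unstandardized (None) instead of being divided by an unreliable value.

```python
from collections import defaultdict
from statistics import stdev


def standardize_by_segment(rows, keys, response, min_size=2):
    """Divide each response value by the standard deviation of its segment."""
    # Group the response values by segment (unique combination of key values).
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row[response])

    # Sample standard deviation per segment; None marks sparse segments.
    segment_sd = {
        seg: (stdev(values) if len(values) >= min_size else None)
        for seg, values in groups.items()
    }

    # New response variable: capped loss / segment standard deviation.
    result = []
    for row in rows:
        sd = segment_sd[tuple(row[k] for k in keys)]
        standardized = row[response] / sd if sd else None
        result.append({**row, "standardized_loss": standardized})
    return result, segment_sd


# Hypothetical policies: (gender dummy, fuel type dummy) define the segment.
policies = [
    {"gender": 0, "fuel": 0, "capped_loss": 100.0},
    {"gender": 0, "fuel": 0, "capped_loss": 300.0},
    {"gender": 1, "fuel": 0, "capped_loss": 200.0},
    {"gender": 1, "fuel": 0, "capped_loss": 400.0},
    {"gender": 1, "fuel": 0, "capped_loss": 600.0},
    {"gender": 1, "fuel": 1, "capped_loss": 500.0},  # single-policy segment
]

standardized, segment_sd = standardize_by_segment(
    policies, keys=("gender", "fuel"), response="capped_loss"
)
for row in standardized:
    print(row)
```

The single-policy segment shows the sparsity problem directly: with one observation there is no standard deviation to divide by.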
Regression Statistics
R Square             0.129001269
Adjusted R Square    0.128716331
Standard Error       4.77078689
Observations         15290

ANOVA
              df       SS           MS          F        Significance F
Regression    5        51522.10     10304.42    452.73   0
Residual      15284    347870.07    22.76
Total         15289    399392.17

                   Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept          12.476         0.197            63.374     0.000     12.091      12.862
Avg Age            -0.086         0.003            -31.554    0.000     -0.091      -0.081
Gender Dummy       0.213          0.079            2.702      0.007     0.058       0.368
Married Dummy      -0.204         0.080            -2.552     0.011     -0.361      -0.047
Avg Vehicle Age    -0.376         0.011            -33.770    0.000     -0.398      -0.354
Fuel Type Dummy    0.136          0.115            1.188      0.235     -0.088      0.361
Thank you!
702, Raaj Chambers, Old Nagardas Road, Andheri (E), Mumbai-400 069. INDIA
www.edupristine.com
Ph. +91 22 3215 6191