Deloitte Consulting, 2005

Introduction to Bootstrapping
James Guszcza, FCAS, MAAA
CAS Predictive Modeling Seminar
Chicago
September, 2005

What's it all about?

Actuaries compute point estimates of statistics all the time:
Loss ratio/claim frequency for a population
Outstanding losses
Correlation between variables
GLM parameter estimates

A point estimate tells us what the data indicates.
But how can we measure our confidence in this indication?

More Concisely

Point estimate says: "what do you think?"
Variability of the point estimate says: "how sure are you?"
Traditional approaches:
Credibility theory
Use distributional assumptions to construct confidence intervals

Is there an easier and more flexible way?

Enter the Bootstrap

In the late 1970s the statistician Brad Efron made an ingenious suggestion.
Most (sometimes all) of what we know about the "true" probability distribution comes from the data.
So let's treat the data as a proxy for the true distribution.
We draw multiple samples from this proxy. This is called "resampling".
And compute the statistic of interest on each of the resulting pseudo-datasets.

Philosophy

[Bootstrapping] requires very little in the way of modeling, assumptions, or analysis, and can be applied in an automatic way to any situation, no matter how complicated.
An important theme is the substitution of raw computing power for theoretical analysis.
--Efron and Gong 1983

Bootstrapping fits very nicely into the "data mining" paradigm.

The Basic Idea

Theoretical Picture

Any actual sample of data was drawn from the unknown "true" distribution.
We use the actual data to make inferences about the true parameters.
Each green oval is the sample that "might have been".

[Diagram: "the true distribution in the sky" generates N possible samples of size k, each summarized by its own Y-bar]
Sample 1: Y^1_1, Y^1_2, ..., Y^1_k  ->  Y-bar^1
Sample 2: Y^2_1, Y^2_2, ..., Y^2_k  ->  Y-bar^2
Sample 3: Y^3_1, Y^3_2, ..., Y^3_k  ->  Y-bar^3
...
Sample N: Y^N_1, Y^N_2, ..., Y^N_k  ->  Y-bar^N

The distribution of our estimator (Y-bar) depends on both the true distribution and the size (k) of our sample.

The Basic Idea

The Bootstrapping Process

Treat the actual (empirical) distribution as a proxy for the true distribution.
Sample with replacement from your actual distribution N times.
Compute the statistic of interest on each "re-sample".

[Diagram: the actual sample Y_1, Y_2, ..., Y_k generates N re-samples of size k, each summarized by its own Y*-bar]
Re-sample 1: Y*^1_1, Y*^1_2, ..., Y*^1_k  ->  Y*-bar^1
Re-sample 2: Y*^2_1, Y*^2_2, ..., Y*^2_k  ->  Y*-bar^2
Re-sample 3: Y*^3_1, Y*^3_2, ..., Y*^3_k  ->  Y*-bar^3
...
Re-sample N: Y*^N_1, Y*^N_2, ..., Y*^N_k  ->  Y*-bar^N

{Y*-bar} constitutes an estimate of the distribution of Y-bar.

Sampling With Replacement

In fact, there is a chance of
(1 - 1/500)^500 ≈ 1/e ≈ .368
that any one of the original data points won't appear at all if we sample with replacement 500 times.
So any data point is included with Prob ≈ .632.

Intuitively, we treat the original sample as the "true population in the sky".
Each resample simulates the process of taking a sample from the "true" distribution.
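The .368 and .632 figures can be checked numerically. The following is a minimal sketch in R, assuming a data set of 500 points identified by their index; the seed and the number of simulated resamples (2000) are arbitrary choices, not taken from the slides.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n <- 500

# Exact probability that a given point is never drawn in n draws with replacement
p.excluded <- (1 - 1/n)^n

# Simulation: fraction of the original points that appear in each resample
frac.included <- replicate(2000, {
  idx <- sample(n, n, replace = TRUE)   # indices of one resample
  length(unique(idx)) / n
})

p.excluded            # close to exp(-1)
mean(frac.included)   # close to 1 - exp(-1)
```

The exact value and the simulated average should agree closely, since each resample is itself 500 independent draws.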

Theoretical vs. Empirical

Graph on left: Y-bar calculated from an infinite number of samples from the "true distribution".
Graph on right: {Y*-bar} calculated in each of 1000 re-samples from the empirical distribution.
Analogy: μ : Y-bar :: Y-bar : Y*-bar

[Figure, left panel: "true distribution (Y-bar)", phi.ybar plotted against ybar]
[Figure, right panel: "bootstrap distribution (Y*-bar)", density plotted against y.star.bar]

Summary

The empirical distribution (your data) serves as a proxy for the true distribution.
"Resampling" means (repeatedly) sampling with replacement.
Resampling the data is analogous to the process of drawing the data from the "true distribution".
We can resample multiple times.
Compute the statistic of interest T on each re-sample.
We get an estimate of the distribution of T.

Motivating Example

Let's look at a simple case where we all know the answer in advance.
Pull 500 draws from the n(5000,100) dist.
The sample mean ≈ 5000 is a point estimate of the "true" mean μ.
But how sure are we of this estimate?
From theory, we know that:
s.d.(X-bar) = σ/√N = 100/√500 = 4.47

raw data
statistic   value
#obs        500
mean        4995.79
sd          98.78
2.5%ile     4812.30
97.5%ile    5195.58

Visualizing the Raw Data

500 draws from n(5000,100)
Look at summary statistics, histogram, probability density estimate, QQ-plot.
Looks pretty normal.

[Figure: histogram and density estimate of the n(5000,100) data, with normal Q-Q plot]

Sampling With Replacement

Now let's use resampling to estimate the s.d. of the sample mean (4.47).
Draw a data point at random from the data set. Then throw it back in.
Draw a second data point. Then throw it back in.
Keep going until we've got 500 data points.
You might call this a "pseudo" data set.
This is not merely re-sorting the data.
Some of the original data points will appear more than once; others won't appear at all.
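To make the "draw, then throw it back in" procedure concrete, here is a small sketch of a single pseudo data set in R. The data vector x is a simulated stand-in for the original 500 observations, and the variable names are illustrative only.

```r
set.seed(1)                                   # arbitrary seed
x <- rnorm(500, mean = 5000, sd = 100)        # stand-in for the original data set

idx <- sample(seq_along(x), replace = TRUE)   # 500 draws, each point "thrown back in"
pseudo <- x[idx]                              # one pseudo data set

counts <- tabulate(idx, nbins = length(x))    # how often each original point was drawn
table(counts)                                 # 0 = never drawn, 2 or more = drawn repeatedly
```

The table of counts shows directly that the pseudo data set is not a re-sorting of x: some points are missing and others are repeated.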

Resampling

Sample with replacement 500 data points from the original dataset S.
Call this S*1.
Now do this 999 more times!
S*1, S*2, ..., S*1000
Compute X-bar on each of these 1000 samples.

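R Code

norm.data <- rnorm(500, mean = 5000, sd = 100)

# Resample `data` with replacement R times; return each resample's mean and s.d.
boots <- function(data, R) {
  b.avg <- numeric(R)
  b.sd <- numeric(R)
  for (b in 1:R) {
    ystar <- sample(data, length(data), replace = TRUE)
    b.avg[b] <- mean(ystar)
    b.sd[b] <- sd(ystar)
  }
  list(avg = b.avg, sd = b.sd)
}
res <- boots(norm.data, 1000)
b.avg <- res$avg   # bootstrap means, used on the following slides
b.sd <- res$sd     # bootstrap standard deviations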

Results

From theory we know that X-bar ~ n(5000, 4.47).
Bootstrapping estimates this pretty well!
And we get an estimate of the whole distribution, not just a confidence interval.

raw data
statistic   value
#obs        500
mean        4995.79
sd          98.78
2.5%ile     4705.08
97.5%ile    5259.27

X-bar
statistic   theory     bootstrap
#obs        1,000      1,000
mean        5000.00    4995.98
sd          4.47       4.43
2.5%ile     4991.23    4987.60
97.5%ile    5008.77    5004.82

[Figure: distribution of the bootstrap X-bar data, with normal Q-Q plot]

Two Ways of Looking at a Confidence Interval

Approximate normality assumption
X-bar ± 2*(bootstrap dist s.d.)

Percentile method
Just take the desired percentiles of the bootstrap histogram.
More reliable in cases of asymmetric bootstrap histograms.

> mean(norm.data) - 2 * sd(b.avg)
[1] 4986.926
> mean(norm.data) + 2 * sd(b.avg)
[1] 5004.661
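The two interval constructions can be compared side by side. The sketch below is self-contained: it regenerates a normal sample and its bootstrap means rather than reusing the objects from the slides, and the seed is arbitrary, so the numbers will differ slightly from those shown above.

```r
set.seed(2005)                                      # arbitrary seed
norm.data <- rnorm(500, mean = 5000, sd = 100)

# Bootstrap distribution of the sample mean (1000 resamples)
b.avg <- replicate(1000, mean(sample(norm.data, replace = TRUE)))

# 1. Approximate normality: point estimate +/- 2 bootstrap standard deviations
ci.normal <- mean(norm.data) + c(-2, 2) * sd(b.avg)

# 2. Percentile method: read the interval off the bootstrap distribution
ci.percentile <- quantile(b.avg, probs = c(0.025, 0.975))

ci.normal
ci.percentile
```

For a symmetric bootstrap distribution such as this one the two intervals should nearly coincide; they are expected to diverge when the bootstrap histogram is skewed.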

And a Bonus

Note that we can calculate both the mean and standard deviation of each pseudo-dataset.
This enables us to estimate the correlation between the mean and s.d.
Normal distribution is not skew ⇒ mean, s.d. are uncorrelated.
Our bootstrapping experiment confirms this.

[Figure: scatter plot of sample.sd against sample.mean across the pseudo-datasets]
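One way to check the "uncorrelated" claim is to keep both statistics from every resample and correlate them. This is a sketch on freshly simulated normal data, with an arbitrary seed and an illustrative matrix name (stats) that does not come from the slides.

```r
set.seed(2005)                                   # arbitrary seed
norm.data <- rnorm(500, mean = 5000, sd = 100)

# For each resample keep both statistics, so they can be compared pairwise
stats <- t(replicate(1000, {
  ystar <- sample(norm.data, replace = TRUE)
  c(mean = mean(ystar), sd = sd(ystar))
}))

cor(stats[, "mean"], stats[, "sd"])   # expected to be near zero for symmetric data
```

Plotting stats[, "sd"] against stats[, "mean"] would give the kind of shapeless cloud described on the slide.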

More Interesting Examples

We've seen that bootstrapping replicates a result we know to be true from theory.
Often in the real world we either don't know the "true" distributional properties of a random variable...
...or are too busy to find out.
This is when bootstrapping really comes in handy.

Severity Data

2700 size-of-loss data points.
Mean = 3052, Median = 1136
Let's estimate the distributions of the sample mean & 75th %ile.
Gamma? Lognormal? Don't need to know.

[Figure: severity distribution, density plotted over loss sizes up to 50000]
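The same recipe applies to skewed severity data without fitting any distribution. The original 2700 losses are not available, so the sketch below assumes a lognormal stand-in whose parameters were picked only to give a similarly right-skewed shape; they are not the author's data or model.

```r
set.seed(2005)                                        # arbitrary seed
# Hypothetical stand-in for the 2700 losses: a right-skewed lognormal sample
severity <- rlnorm(2700, meanlog = 7, sdlog = 1.4)

B <- 1000
boot.mean <- numeric(B)
boot.q75 <- numeric(B)
for (b in 1:B) {
  ystar <- sample(severity, replace = TRUE)
  boot.mean[b] <- mean(ystar)
  boot.q75[b] <- quantile(ystar, 0.75)
}

# Bootstrap standard errors and percentile intervals; no distribution is fitted
c(se.mean = sd(boot.mean), se.q75 = sd(boot.q75))
quantile(boot.mean, probs = c(0.025, 0.975))
quantile(boot.q75, probs = c(0.025, 0.975))
```

Only the statistic computed inside the loop changes from one application to the next; the resampling step stays the same.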

Bootstrapping Sample Avg, 75th %ile

[Figure: bootstrap dist of severity sample avg, with normal Q-Q plot]
[Figure: bootstrap dist of severity 75th %ile, with normal Q-Q plot]

What about the 90th %ile?

So far so good: bootstrapping shows that many of our sample statistics (even average severity!) are approximately normally distributed.
But this breaks down if our statistic is not a "smooth" function of the data...
Often in loss reserving we want to focus our attention way out in the tail...
90th %ile is an example.

[Figure: bootstrap dist of severity 90th %ile, with normal Q-Q plot]

Variance Related to the Mean

As with the normal example, we can calculate both the sample average and s.d. on each pseudo-dataset.
This time (as one would expect) the variance is a function of the mean.

[Figure: scatter plot of sample.sd against sample.mean across the pseudo-datasets]

Bootstrapping a Correlation Coefficient #1

About 700 data points
Credit on a scale of 1-100
1 is worst; 100 is best
Age, credit are linearly related
See plot
R² ≈ .08; ρ ≈ .28
Older people tend to have better credit
What is the confidence interval around ρ?

[Figure: plot of Age vs Credit]

Bootstrapping a Correlation Coefficient #1

ρ appears normally distributed.
ρ ≈ .28
s.d.(ρ) ≈ .028
Both confidence interval calculations agree fairly well:

> quantile(boot.avg,probs=c(.025,.975))
     2.5%     97.5%
0.2247719 0.3334889
> rho - 2*sd(boot.avg); rho + 2*sd(boot.avg)
0.2250254 0.3354617

[Figure: correlation coefficient - bootstrap dist, with normal Q-Q plot]
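When bootstrapping a correlation, the unit being resampled is the whole observation, so each (age, credit) pair must stay together. The sketch below assumes simulated stand-in data with a weak positive relationship; the generating formula, the seed, and the name boot.rho are illustrative and not from the original analysis.

```r
set.seed(2005)                                   # arbitrary seed
# Hypothetical stand-in data: about 700 (age, credit) pairs, weakly related
n <- 700
age <- runif(n, 18, 90)
credit <- 40 + 0.25 * age + rnorm(n, sd = 18)
dat <- data.frame(age, credit)

rho <- cor(dat$age, dat$credit)

# Resample whole rows so each (age, credit) pair stays together
boot.rho <- replicate(1000, {
  idx <- sample(nrow(dat), replace = TRUE)
  cor(dat$age[idx], dat$credit[idx])
})

quantile(boot.rho, probs = c(0.025, 0.975))     # percentile interval
rho + c(-2, 2) * sd(boot.rho)                   # normal-approximation interval
```

Resampling age and credit independently of each other would destroy the relationship being measured, which is why a single index vector is applied to both columns.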

Bootstrapping a Correlation Coefficient #2

Let's try a different example.
1300 zip-code level data points
Variables: population density, median #vehicles/HH
R² ≈ .50; ρ ≈ -.70

[Figure: Median #Vehicles vs Pop Density, veh plotted against density, with loess line and regression line]

Bootstrapping a Correlation Coefficient #2

The distribution of ρ is more skew.
ρ ≈ -.70
95% conf interval: (-.75, -.67)
Not symmetric around ρ
Effect becomes more pronounced the larger the magnitude of ρ.

[Figure: correlation coefficient - bootstrap dist, with normal Q-Q plot]

Bootstrapping Loss Ratio

Now for what we've all been waiting for...
Total loss ratio of a segment of business is our favorite point estimate.
Its variability depends on many things:
Size of book
Loss distribution
Accuracy of rating plan
Consistency of underwriting...
How could we hope to write down the true probability distribution?
Bootstrapping to the rescue...

Bootstrapping Loss Ratio & Frequency

50,000 insurance policies
Severity dist from previous example
LR = .79
Claim frequency = .08
Let's build confidence intervals around these two point estimates.
We will resample the data 500 times.
Compute total LR and freq on each sample.
Plot the histogram.
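The steps above can be sketched in R as follows. The policy data are simulated under assumed premium, frequency, and severity parameters (none of them from the slides), so the resulting loss ratio will not match the .79 quoted above; the point is the mechanics of resampling policies and recomputing both statistics.

```r
set.seed(2005)                                   # arbitrary seed
# Hypothetical policy-level data: premium, claim indicator, and loss amount
n <- 50000
premium <- runif(n, 200, 500)
has.claim <- rbinom(n, 1, 0.08)
loss <- has.claim * rlnorm(n, meanlog = 7, sdlog = 1.4)
policies <- data.frame(premium, has.claim, loss)

lr <- sum(policies$loss) / sum(policies$premium)   # total loss ratio
freq <- mean(policies$has.claim)                   # claim frequency

# Resample policies (rows) with replacement; recompute both statistics each time
boot <- t(replicate(500, {
  idx <- sample(n, replace = TRUE)
  c(lr = sum(policies$loss[idx]) / sum(policies$premium[idx]),
    freq = mean(policies$has.claim[idx]))
}))

quantile(boot[, "lr"], probs = c(0.025, 0.975))
quantile(boot[, "freq"], probs = c(0.025, 0.975))
```

Calling hist(boot[, "lr"]) and hist(boot[, "freq"]) would produce the histograms mentioned in the last step.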

Results: Distribution of total LR

A little skew, but somewhat close to normal
LR ≈ .79
s.d.(LR) ≈ .05
conf interval ≈ ±0.1
Confidence interval calculations disagree a bit:

> quantile(boot.avg,probs=c(.025,.975))
     2.5%     97.5%
0.6974607 0.8829664
> lr - 2*sd(boot.avg); lr + 2*sd(boot.avg)
0.6897653 0.8888983

[Figure: bootstrap total LR, with normal Q-Q plot]

Dependence on Sample Size

Let's take a sub-sample of 10,000 policies.
How does this affect the variability of LR?
Again re-sample 500 times.
Skewness, variance increase considerably.
LR: .79 → .78
s.d.(LR): .05 → .13

[Figure: bootstrap total LR, with normal Q-Q plot]

Distribution of Capped LR

Capped LR is analogous to trimmed mean from robust statistics.
Remove leverage of a few large data points.
Here we cap policy-level losses at $30,000.
Affects 50 out of 2700 claims.
Closer to frequency ⇒ distribution less skew, close to normal.
s.d. cut in half! .05 → .025

[Figure: bootstrap LR - losses capped @ $30K, with normal Q-Q plot]
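The effect of capping can be illustrated by bootstrapping the loss ratio twice, once on raw losses and once on losses capped at $30,000. This is a sketch on simulated policies with assumed parameters; the helper boot.sd is a hypothetical name introduced here, and the size of the reduction depends entirely on the assumed severity distribution.

```r
set.seed(2005)                                   # arbitrary seed
# Hypothetical policy-level data, standing in for the real book of business
n <- 20000
premium <- runif(n, 200, 500)
loss <- rbinom(n, 1, 0.08) * rlnorm(n, meanlog = 7, sdlog = 1.4)
capped <- pmin(loss, 30000)            # cap each policy's loss at $30,000

# Bootstrap s.d. of the total loss ratio for a given loss vector
boot.sd <- function(loss, premium, B = 500) {
  sd(replicate(B, {
    idx <- sample(length(loss), replace = TRUE)
    sum(loss[idx]) / sum(premium[idx])
  }))
}

c(uncapped = boot.sd(loss, premium), capped = boot.sd(capped, premium))
```

Because the cap only touches the few largest losses, the drop in the bootstrap s.d. shows how much of the loss ratio's variability those few claims carry.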

Results: Distribution of Frequency

Much less variance than LR; very close to normal
freq ≈ .08
s.d.(freq) ≈ .0017
Confidence interval calculations match very well:

> quantile(boot.avg,probs=c(.025,.975))
      2.5%      97.5%
0.07734336 0.08391072
> lr - 2*sd(boot.avg); lr + 2*sd(boot.avg)
0.07719618 0.08388898

[Figure: bootstrap total freq, with normal Q-Q plot]

When are LRs statistically different?

Example: Divide our 50,000 policies into two sub-segments: {clean drivers, other}
LRtot = .79
LRclean = .58, LRRclean = -27%
LRother = .84, LRRother = +6%
(LRR: loss ratio relativity, the segment LR relative to LRtot)
Clean drivers appear to have ≈30% lower LR than non-clean drivers.
How sure are we of this indication?
Let's use bootstrapping.

Bootstrapping the difference in LRs

Simultaneously re-sample the two segments 500 times.
At each iteration, calculate
LRc*, LRo*, (LRc* - LRo*), (LRc* / LRo*)
Analyze the resulting empirical distributions.
What is the average difference in loss ratios?
What percent of the time is the difference in loss ratios greater than x%?
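A possible reading of this procedure is to resample each segment with replacement at every iteration and record the difference and ratio of the two loss ratios. The sketch below follows that reading on simulated data; the segment mix, claim frequencies, severity parameters, and the 10% threshold are all assumptions made for illustration.

```r
set.seed(2005)                                     # arbitrary seed
# Hypothetical book: a "clean" flag, premium, and loss per policy
n <- 50000
clean <- rbinom(n, 1, 0.2) == 1
premium <- runif(n, 200, 500)
freq <- ifelse(clean, 0.06, 0.085)                 # assumed claim frequencies
loss <- rbinom(n, 1, freq) * rlnorm(n, meanlog = 7, sdlog = 1.4)

loss.ratio <- function(idx) sum(loss[idx]) / sum(premium[idx])
i.clean <- which(clean)
i.other <- which(!clean)

# Resample each segment with replacement and recompute both loss ratios
boot <- t(replicate(500, {
  lr.c <- loss.ratio(sample(i.clean, replace = TRUE))
  lr.o <- loss.ratio(sample(i.other, replace = TRUE))
  c(diff = lr.c - lr.o, ratio = lr.c / lr.o)
}))

mean(boot[, "diff"])            # average difference in loss ratios
mean(boot[, "ratio"] < 0.9)     # share of resamples where the clean LR is >10% lower
```

The last line answers the "what percent of the time" question directly: it is just the fraction of bootstrap iterations meeting the condition.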

LR distributions of the sub-populations

[Figure: bootstrap dist of LR: clean driving record, with normal Q-Q plot]
[Figure: bootstrap dist of LR: non-clean record, with normal Q-Q plot]

LRR distributions of the sub-populations

[Figure: bootstrap dist of LRR: clean driving record, with normal Q-Q plot]
[Figure: bootstrap dist of LRR: non-clean record, with normal Q-Q plot]

Distribution of LRR Differences

[Figure: bootstrap dist of LRR_other - LRR_clean, with normal Q-Q plot]
[Figure: bootstrap dist of LRR_other / LRR_clean, with normal Q-Q plot]

Final Example: loss reserve variability

A major issue in the loss reserving community is reserve variability.
Bootstrapping is a natural way to tackle this problem.
Predictive variance of your estimate of outstanding losses.
Hard to find an analytic formula for the variability of o/s losses.
Approach here: bootstrap cases, not residuals.

Bootstrapping Reserves

S = database of 5000 claims
Sample with replacement all claims in S.
Call this S*1.
Same size as S.
Now do this 499 more times!
S*1, S*2, ..., S*500
Estimate o/s reserves on each sample.
Get a distribution of reserve estimates.

Simulated Loss Data

Simulate database of 5000 claims
500 claims/year; 10 years
Each of the 5000 claims was drawn from a lognormal distribution with parameters μ = 8; σ = 1.3
Build in loss development patterns.
L_{i+j} = L_i * (link + ε)
ε is a random error term
See CLRS presentation (2005) for more details.

Bootstrapping Reserves

Compute our reserve estimate on each S*k.
These 500 reserve estimates constitute an estimate of the distribution of outstanding losses.
Notice that we did this by resampling our original dataset S of claims.
Note: this bootstrapping method differs from other analyses which bootstrap the residuals of a model.
Those residual-based methods rely on the assumption that your model is correct.
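The case-resampling approach can be sketched end to end in R. Several pieces here are assumptions, since the slides do not specify them: the link ratios, the size of the error term, the use of a basic chain-ladder estimate as the reserving method, and the reduced number of resamples (200). Only the claim counts (500 per year, 10 years) and the lognormal parameters (8 and 1.3) come from the slides.

```r
set.seed(2005)                                   # arbitrary seed
n.years <- 10
per.year <- 500
# Hypothetical age-to-age link ratios; the slides do not give the values used
link <- c(1.50, 1.25, 1.12, 1.06, 1.03, 1.02, 1.01, 1.005, 1.002)

# One row per claim, one column per development age (cumulative losses)
ay <- rep(1:n.years, each = per.year)            # accident year of each claim
n.claims <- length(ay)
dev <- matrix(NA_real_, n.claims, n.years)
dev[, 1] <- rlnorm(n.claims, meanlog = 8, sdlog = 1.3)
for (j in 1:(n.years - 1)) {
  eps <- rnorm(n.claims, sd = 0.05)              # assumed random error term
  dev[, j + 1] <- dev[, j] * (link[j] + eps)
}
# Hide the future: accident year a is observed only through age n.years - a + 1
for (j in 1:n.years) dev[ay > n.years - j + 1, j] <- NA

# Chain-ladder reserve from claim-level data (an assumed reserving method)
cl.reserve <- function(dev, ay) {
  n <- ncol(dev)
  tri <- rowsum(dev, ay)                         # aggregate claims into a triangle
  f <- sapply(1:(n - 1), function(j)
    sum(tri[1:(n - j), j + 1]) / sum(tri[1:(n - j), j]))
  cdf <- c(rev(cumprod(rev(f))), 1)              # age-to-ultimate factors
  latest <- tri[cbind(1:n, n:1)]                 # latest observed diagonal
  sum(latest * (cdf[n:1] - 1))
}

# Bootstrap cases: resample claims (rows), not model residuals
boot.res <- replicate(200, {
  idx <- sample(n.claims, replace = TRUE)
  cl.reserve(dev[idx, , drop = FALSE], ay[idx])
})

c(mean = mean(boot.res), sd = sd(boot.res))
quantile(boot.res, probs = c(0.025, 0.975))
```

Because whole claims are resampled and the reserve is re-estimated from scratch each time, the spread of boot.res reflects both process noise and the uncertainty in the estimated development factors, without assuming that the reserving model's residuals are well behaved.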

Distribution of Outstanding Losses

Blue bars: the bootstrapped distribution
Dotted line: kernel density estimate of the distribution
Pink line: superimposed normal

[Figure: total reserves - all 10 years, density plotted over reserves from 19000 to 25000]

Distribution of Outstanding Losses

Mean: $21.751M
Median: $21.746M
σ: $0.982M
σ/μ ≈ 4.5%
The simulated dist of outstanding losses appears normal.
95% confidence interval: (19.8M, 23.7M)
Note: the 2.5 and 97.5 %iles of the bootstrapping distribution roughly agree with $21.75M ± 2σ

[Figure: total reserves - all 10 years, with the 95% confidence interval marked]

Distribution of Outstanding Losses

We can examine a QQ plot to verify that the distribution of o/s losses is approximately normal.
However, the tails are somewhat heavier than normal.
Remember: this is just simulated data!
Real-life results have been consistent with these results.

[Figure: total reserves - all 10 years, with normal Q-Q plot]

References

Bootstrap Methods and their Application
--Davison and Hinkley

An Introduction to the Bootstrap
--Efron and Tibshirani

A Leisurely Look at the Bootstrap
--Efron and Gong
American Statistician 1983

Bootstrap Methods for Standard Errors
--Efron and Tibshirani
Statistical Science 1986

Applications of Resampling Methods in Actuarial Practice
--Derrig, Ostaszewski, Rempala
PCAS 2000