Non-Life Insurance:
Mathematics & Statistics

Lecture Notes

Mario V. Wüthrich
RiskLab Switzerland
Department of Mathematics
ETH Zurich
Lecture notes. The present lecture notes cover the lecture Non-Life Insurance: Mathematics & Statistics, which is held in the Department of Mathematics at ETH Zurich. This lecture is a merger of the two lectures Nicht-Leben Versicherungsmathematik and Risk Theory for Insurance. It was taught for the first time in Spring 2014 at ETH Zurich and in Fall 2014 at the University of Bologna (jointly with Tim Verdonck). The lecture aims at providing a basis in non-life insurance mathematics, which forms a core subject of actuarial science. After this course, students are recommended to follow lectures that give deeper knowledge in different subjects of non-life insurance mathematics, such as Credibility Theory, Non-Life Insurance Pricing with Generalized Linear Models, Stochastic Claims Reserving Methods, Market-Consistent Actuarial Valuation, Quantitative Risk Management, Data Analytics, etc.
Prerequisites. The prerequisites for this lecture are a solid education in mathematics, in particular, in probability theory and statistics.
Terms of Use. These lecture notes are an ongoing project which is continuously revised and updated. Of course, there may be errors in the notes and there is always room for improvement. Therefore, I appreciate any comments and/or corrections that readers may have. However, I would like you to respect the following rules:

- These notes are provided solely for educational, personal and non-commercial use. Any commercial use or reproduction is forbidden.
- All rights remain with the author. He may update the manuscript or withdraw the manuscript at any time. There is no right to the availability of any (old) version of these notes. The author may also change these terms of use at any time.
- The author disclaims all warranties, including but not limited to the use or the contents of these notes. By using these notes, you fully agree to this.
- Citation: please use the SSRN URL (http://ssrn.com/abstract=2319328).
- All pictures and graphs included in these notes are either downloaded from the internet (open access) or were plotted by the author. If downloaded graphs violate copyright, I appreciate an immediate note and the corresponding pictures will be removed from these lecture notes.
Previous versions.
September 2, 2013
December 2, 2013
August 27, 2014
June 29, 2015
Acknowledgment
Writing these notes, I profited greatly from various inspiring as well as ongoing discussions, concrete contributions and critical comments with and by several people: first of all, the students that have been following our lectures at ETH Zurich since 2006; furthermore Hans Bühlmann, Christoph Buser, Philippe Deprez, Paul Embrechts, Farhad Farhadmotamed, Urs Fitze, Markus Gesmann, Alois Gisler, Laurent Huber, Lukas Meier, Michael Merz, Esbjörn Ohlsson, Gareth Peters, Albert Pinyol i Agelet, Peter Reinhard, Simon Rentzmann, Rodrigo Targino, Teja Turk, Tim Verdonck, Maximilien Vila, Yitian Yang, Patrick Zöchbauer. I especially thank Alois Gisler for providing his lecture notes [54] and the corresponding exercises.
Mario V. Wüthrich
Contents

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
  1.1 Nature of non-life insurance . . . . . . . . . . . . . . . . . . . . . 11
    1.1.1 Non-life insurance and the law of large numbers . . . . . . . . . 11
    1.1.2 Risk components and premium elements . . . . . . . . . . . . . . . 13
  1.2 Probability theory and statistics . . . . . . . . . . . . . . . . . . 14
    1.2.1 Random variables and distribution functions . . . . . . . . . . . 14
    1.2.2 Terminology in statistics . . . . . . . . . . . . . . . . . . . . 21

    8.1.2 Exponential dispersion family with conjugate priors . . . . . . . 207
  8.2 Linear credibility estimation . . . . . . . . . . . . . . . . . . . . 212
    8.2.1 Bühlmann-Straub model . . . . . . . . . . . . . . . . . . . . . . 213
    8.2.2 Bühlmann-Straub credibility formula . . . . . . . . . . . . . . . 214
    8.2.3 Estimation of structural parameters . . . . . . . . . . . . . . . 218
    8.2.4 Prediction error in the Bühlmann-Straub model . . . . . . . . . . 221

9 Claims Reserving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
  9.1 Outstanding loss liabilities . . . . . . . . . . . . . . . . . . . . . 226
  9.2 Claims reserving algorithms . . . . . . . . . . . . . . . . . . . . . 232
    9.2.1 Chain-ladder algorithm . . . . . . . . . . . . . . . . . . . . . . 232
    9.2.2 Bornhuetter-Ferguson algorithm . . . . . . . . . . . . . . . . . . 236
  9.3 Stochastic claims reserving methods . . . . . . . . . . . . . . . . . 237
    9.3.1 Gamma-gamma Bayesian CL model . . . . . . . . . . . . . . . . . . 239
    9.3.2 Over-dispersed Poisson model . . . . . . . . . . . . . . . . . . . 247
  9.4 Claims development result . . . . . . . . . . . . . . . . . . . . . . 249
    9.4.1 Definition of the claims development result . . . . . . . . . . . 249
    9.4.2 One-year uncertainty in the Bayesian CL model . . . . . . . . . . 251
    9.4.3 The full picture of run-off uncertainty . . . . . . . . . . . . . 257

10 Solvency Considerations . . . . . . . . . . . . . . . . . . . . . . . . . 263
  10.1 Balance sheet and solvency . . . . . . . . . . . . . . . . . . . . . 263
  10.2 Risk modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
  10.3 Insurance liability variables . . . . . . . . . . . . . . . . . . . . 270
    10.3.1 Market-consistent values . . . . . . . . . . . . . . . . . . . . 270
    10.3.2 Insurance risk . . . . . . . . . . . . . . . . . . . . . . . . . 271
Chapter 1

Introduction

1.1 Nature of non-life insurance

1.1.1 Non-life insurance and the law of large numbers

Insurance originates from a general demand of society, which asks for protection against unforeseeable events that might cause serious (financial) damage to individuals and society. Insurance organizes the financial protection against such unforeseeable (random) events, meaning that it takes care of the financial replacement of the (potential) damage. The general idea is to build a community (collective) to which everybody contributes a certain amount (fixed deterministic premium¹), and then the (potential) financial damage is financed by the means of this community.

¹ In special cases, for instance in re-insurance or accident insurance, the premium can also have a random part. This is not further discussed here.
The basic features of such communities are that every member faces similar risks.
By building such communities the individual members profit from diversification
benefits in the form of a law of large numbers that applies to the community.
Insurance companies organize the fair distribution within the community.
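The diversification benefit from the law of large numbers can be illustrated with a small simulation (a sketch with purely illustrative parameters: each member suffers a claim of fixed size 10 000 with probability 10%, so the expected cost per member is 1 000):

```python
import random

random.seed(1)

# each member independently suffers a claim of fixed size 10_000 with
# probability 0.1 (illustrative parameters); expected cost per member: 1_000
def average_claim(n_members, p_claim=0.1, claim_size=10_000):
    total = sum(claim_size for _ in range(n_members) if random.random() < p_claim)
    return total / n_members

small = [average_claim(10) for _ in range(2_000)]       # tiny community
large = [average_claim(10_000) for _ in range(200)]     # big community

def spread(xs):
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

# the fluctuation of the per-member cost shrinks like n_members^(-1/2)
print(spread(small), spread(large))
```

In the large community the cost per member is almost deterministic, which is exactly the diversification benefit described above.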
The law of large numbers goes back to Jakob Bernoulli and his path-breaking work Ars Conjectandi, which appeared in 1713, eight years after his death, see Bolthausen-Wüthrich [15]. A refinement is provided by the central limit theorem, attributed to A. de Moivre: for i.i.d. claims Y_1, Y_2, ... with mean μ and finite variance σ² > 0,

( Σ_{i=1}^n Y_i − nμ ) / ( σ √n ) → N(0, 1)  in distribution, as n → ∞.  (1.2)
1.1.2
no
tes
NL
14
Chapter 1. Introduction
(a) the model world does not provide an appropriate description of real world behavior;
(b) the parameters in the chosen model are misspecified;
(c) risk factors change over time so that past observations do not appropriately describe what may happen in the future (non-stationarity); of course, this is closely related to (a) and (b).
In practice, these uncertainties (including pure randomness) ask for a risk loading (risk margin) beyond the pure risk premium defined by π = E[Y_i]. The aim of this risk loading is to provide financial stability. We will describe this in detail below in Chapters 5, 6 and 10.
We close this section by describing the premium elements that are considered for insurance premium calculation: the pure risk premium, the risk margin (risk loading), further loadings (e.g. for administrative expenses and profit) and, finally, taxes. The sum of all these items specifies the insurance premium. Non-life insurance mathematics and statistics typically studies the first two items. This is part of the program of the subsequent chapters.

1.2 Probability theory and statistics

1.2.1 Random variables and distribution functions
In this section we briefly recall the crucial notation and key results of probability theory used in these notes. We denote the underlying probability space by (Ω, F, P) and assume throughout that this probability space is sufficiently rich so that it carries all the objects that we are going to consider.

Random variables on this probability space (Ω, F, P) are denoted by capital letters X, Y, S, N, ... and the corresponding observations are denoted by small letters x, y, s, n, .... That is, x constitutes a realization of X. Random vectors are denoted by boldface, e.g., X = (X_1, ..., X_d)′ and the corresponding observation by x = (x_1, ..., x_d)′ for a given dimension d ∈ N. Since there is broad similarity between random variables and random vectors, we restrict to random variables for introducing the crucial terminology from probability theory.

Version April 14, 2016, M.V. Wüthrich, ETH Zurich
Random variables X are characterized by (probability, cumulative) distribution functions F = F_X : R → [0, 1], meaning that for all x ∈ R

F(x) = F_X(x) = P[X ≤ x] ∈ [0, 1].

If X is discrete, taking values in a countable set A, it is characterized by its probability weights

p_k = P[X = k] > 0  for k ∈ A.

If X is absolutely continuous, there exists a function f ≥ 0 such that

F(x) = ∫_{−∞}^x f(y) dy  for all x ∈ R.

This function f is called density of X and in that case we also use the terminology X ∼ f.
Assume X ∼ F and h : R → R is a sufficiently nice measurable function. We define the expected value of h(X) by

E[h(X)] = ∫_R h(x) dF(x) = { Σ_{k∈A} h(k) p_k  if X is discrete;  ∫_R h(x) f(x) dx  if X is absolutely continuous. }

The middle term uses the general framework of the Riemann-Stieltjes integral ∫_R h dF (and in fact the second equality is not an identity because the middle term is more general than the right-hand side). The "sufficiently nice" refers to the fact that E[h(X)] is only defined upon existence. The most important functions h in our analysis define the following moments (based upon existence):

- mean, expectation, expected value or first moment of X ∼ F:
  μ_X = E[X] = ∫_R x dF(x);

- k-th moment of X ∼ F:
  E[X^k] = ∫_R x^k dF(x);
- variance of X ∼ F:
  σ_X² = Var(X) = E[(X − E[X])²] = E[X²] − E[X]² ≥ 0;

- skewness of X ∼ F:
  ς_X = E[(X − E[X])³] / σ_X³;

- coefficient of variation of X ∼ F (for μ_X ≠ 0):
  Vco(X) = σ_X / μ_X,  with standard deviation σ_X = Var(X)^{1/2};

- moment generating function of X ∼ F:
  M_X(r) = E[e^{rX}], whenever it exists.
Lemma 1.1. Choose X ∼ F and assume that there exists r_0 > 0 such that M_X(r) < ∞ for all r ∈ (−r_0, r_0). Then M_X(r) has a power series expansion for r ∈ (−r_0, r_0) with

M_X(r) = Σ_{k≥0} (r^k / k!) E[X^k].

Proof. Note that it suffices to choose r ∈ (−r_0, r_0) with r ≠ 0. Since e^{|rx|} ≤ e^{rx} + e^{−rx}, the assumptions imply the integrability E[exp{|rX|}] < ∞. This implies that E[|X|^k] < ∞ for all k ∈ N because |x|^k is dominated by e^{|rx|} for sufficiently large |x|. It also implies that the partial sums |f_m(x)| = |Σ_{k=0}^m (rx)^k / k!| are uniformly bounded by the integrable (w.r.t. dF) function Σ_{k≥0} |rx|^k / k! = e^{|rx|}. This allows us to apply the dominated convergence theorem, which provides

lim_{m→∞} Σ_{k=0}^m (r^k / k!) E[X^k] = lim_{m→∞} E[f_m(X)] = E[lim_{m→∞} f_m(X)] = M_X(r).  □
Lemma 1.1 implies that the power series converges for all r ∈ (−r_0, r_0) for given r_0 > 0 and, thus, we have a strictly positive radius of convergence ρ_0 > 0. A standard result from analysis implies that in the interior of the interval [−ρ_0, ρ_0] we can differentiate M_X(·) arbitrarily often (term by term of the power series) and the derivatives at the origin are given by

(d^k / dr^k) M_X(r)|_{r=0} = E[X^k] < ∞  for k ∈ N_0.  (1.3)
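Identity (1.3) can be checked numerically: approximate the derivatives of M_X at the origin by central finite differences and compare with the known moments. The sketch below uses the exponential distribution with rate λ = 2 (an illustrative choice), for which M_X(r) = λ/(λ − r) and E[X^k] = k!/λ^k:

```python
import math

lam = 2.0
M = lambda r: lam / (lam - r)            # MGF of Exp(lam), finite for r < lam

def deriv_at_zero(f, k, h=1e-2):
    # central finite-difference approximation of the k-th derivative f^(k)(0)
    return sum((-1) ** (k - j) * math.comb(k, j) * f((j - k / 2) * h)
               for j in range(k + 1)) / h ** k

# compare the finite-difference derivative with the exact moment k!/lam^k
moments = {k: (math.factorial(k) / lam ** k, deriv_at_zero(M, k)) for k in range(1, 4)}
for k, (exact, approx) in moments.items():
    print(k, exact, approx)
```

The first three moments agree with the derivatives up to the O(h²) discretization error of the difference scheme.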
Lemma 1.2. Choose a random variable X ∼ F and assume that there exists r_0 > 0 such that M_X(r) < ∞ for all r ∈ (−r_0, r_0). Then the distribution function F of X is completely determined by its moment generating function M_X.

Proof. The existence of a strictly positive radius of convergence ρ_0 implies that all moments of X exist and that they are directly determined by the moment generating function via (1.3). Theorem 30.1 of Billingsley [13] then implies that there is at most one distribution function F which has the same moments (1.3) for all k ∈ N.  □
For one-sided random variables the statement even holds true in general:

Lemma 1.3. Assume X ≥ 0, P-a.s. The distribution function F of X is completely determined by its moment generating function M_X.

In particular, under these assumptions, for two random variables X and Y we have the implication

M_X ≡ M_Y  ⟹  X (d)= Y.
(Portrait: A.A. Markov)
We say X has a Gaussian distribution, write X ∼ N(μ, σ²), if X is absolutely continuous with density

f(x) = 1 / (√(2π) σ) · exp{ −(x − μ)² / (2σ²) }  for x ∈ R.

Its moment generating function is given by

M_X(r) = exp{ rμ + r²σ²/2 }  for r ∈ R.  (1.4)

By differentiation of (1.4) we obtain

μ_X = E[X] = (d/dr) M_X(r)|_{r=0} = μ,

E[X²] = (d²/dr²) M_X(r)|_{r=0} = exp{ rμ + r²σ²/2 } ( (μ + rσ²)² + σ² )|_{r=0} = μ² + σ²,

and hence

σ_X² = Var(X) = E[X²] − E[X]² = σ².

Moreover, any random variable Y that has moment generating function of the form (1.4) is Gaussian with mean μ_Y = μ and variance σ_Y² = σ², see Lemma 1.2.
Exercise 1 (Gaussian distribution).

(c) Assume X ∼ N(0, 1). Prove that E[X^{2k+1}] = 0 for all k ∈ N_0.

(Portrait: C.F. Gauss)
The derivatives of the cumulant function log M_X at the origin provide

(d/dr) log M_X(r)|_{r=0} = E[X] = μ_X,

(d²/dr²) log M_X(r)|_{r=0} = ( M_X″(r) M_X(r) − (M_X′(r))² ) / (M_X(r))² |_{r=0} = Var(X) = σ_X²,

(d³/dr³) log M_X(r)|_{r=0} = E[(X − E[X])³] = ς_X σ_X³.  (1.5)
Lemma 1.6. Assume that M_X is finite on (−r_0, r_0) with r_0 > 0. Then log M_X(·) is a convex function on (−r_0, r_0).

Proof. In order to prove convexity we calculate the second derivative at position r ∈ (−r_0, r_0):

(d²/dr²) log M_X(r) = ( M_X″(r) M_X(r) − (M_X′(r))² ) / (M_X(r))² = E[X² e^{rX}] / E[e^{rX}] − ( E[X e^{rX}] / E[e^{rX}] )².

Define the distribution function

F_r(x) = (1 / M_X(r)) ∫_{−∞}^x e^{ry} dF(y).  (1.6)

For X_r ∼ F_r we have E[X_r] = M_X′(r) / M_X(r) = E[X e^{rX}] / E[e^{rX}], and therefore

0 ≤ Var(X_r) = E[X_r²] − E[X_r]² = E[X² e^{rX}] / E[e^{rX}] − ( E[X e^{rX}] / E[e^{rX}] )² = (d²/dr²) log M_X(r).  □
Remark. The distribution function F_r defined in (1.6) gives the Esscher measure of F. The Esscher measure has been introduced by Bühlmann [19] for a new premium calculation principle. We come back to this in Section 6.2.2, below.
The next formula is often used: assume that X ∼ F is non-negative, P-a.s., and has finite first moment. Then we have the identity

E[X] = ∫_0^∞ x dF(x) = ∫_0^∞ [1 − F(x)] dx = ∫_0^∞ P[X > x] dx.
The proof uses integration by parts and the result says that we can calculate expected values from survival functions F̄(x) = 1 − F(x) = P[X > x]. Survival functions will be important for the study of the fatness of the tails of distribution functions. This plays a crucial role for the modeling of large claims.
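A quick numerical sanity check of this identity (a sketch using the exponential distribution with rate 1/2, so that the mean equals 2):

```python
import math

lam = 0.5                                  # Exp(lam): mean 1/lam = 2
survival = lambda x: math.exp(-lam * x)    # P[X > x]

# trapezoidal rule for the tail-probability integral, truncated at a point
# where the survival function is numerically negligible
upper, n = 60.0, 100_000
h = upper / n
integral = h * (0.5 * survival(0.0) + sum(survival(i * h) for i in range(1, n))
                + 0.5 * survival(upper))
print(integral)                            # close to E[X] = 2
```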
Recall the tower property of conditional expectation: for an integrable random variable X and any random variable Y,

E[X] = E[ E[X|Y] ],  (1.7)

where E[X|Y] is an abbreviation for E[X|σ(Y)] with σ(Y) ⊂ F denoting the σ-algebra generated by the random variable Y. Assume that X is square integrable; then the tower property (1.7) implies

Var(X) = E[ Var(X|Y) ] + Var( E[X|Y] ).  (1.8)

For a distribution function F we define its generalized inverse by

F^←(p) = inf{ x ∈ R : F(x) ≥ p },

where we use the convention inf ∅ = ∞. For p ∈ (0, 1), F^←(p) is often called the p-quantile of X ∼ F. The generalized inverse F^← is only tricky at places where F has a discontinuity or where F is not strictly increasing. It satisfies the following properties, see Proposition A3 in McNeil et al. [77]:

1. F^← is non-decreasing and left-continuous.
2. F^← is continuous iff F is strictly increasing.
3. F^← is strictly increasing iff F is continuous.
4. (If F is right-continuous, then) F(x) ≥ z iff x ≥ F^←(z).
5. F^←(F(x)) ≤ x.
6. F(F^←(z)) ≥ z.
Items 4. to 6. need F^←(z) < ∞. Note that the first part of item 4. is put in brackets because distribution functions are right-continuous. However, generalized inverses can also be defined for functions that are not right-continuous (as long as they are non-decreasing) and then the condition in the bracket of item 4. is needed.

1.2.2 Terminology in statistics
Often we face the problem that we need to predict the outcome of a random variable X ∼ F. This problem is solved by specifying an appropriate predictor X̂. For instance, we can choose as predictor X̂ = μ_X = E[X]. On the other hand, a distribution function F often involves unknown parameters. These unknown parameters need to be estimated, for instance, using past experience and expert opinion. For example, we can estimate the (unknown) mean μ_X of X by an estimator μ̂_X. If we now choose the predictor X̂ = μ̂_X for predicting X, then μ̂_X serves at the same time as estimator for μ_X and as predictor for X. In this sense we obtain an estimation error which is specified by the difference μ_X − μ̂_X, and we obtain a prediction error which is characterized by the following difference

X − X̂ = X − μ̂_X = (X − μ_X) + (μ_X − μ̂_X).  (1.9)
The second term on the right-hand side of (1.9) specifies the estimation error and
the first term on the right-hand side of (1.9) is often called pure process error which
is due to the stochastic nature of X, see also Section 9.3.
Statistical tests deal with the problem of making decisions. Assume we have an observation x of a random vector X ∼ F_θ with given but unknown parameter θ, which lies in a given set Θ of possible parameters. The aim is to test whether the (true, unknown) parameter θ that has generated x may belong to some subset Θ_0 ⊂ Θ. In the simplest case we have a singleton Θ_0 = {θ_0}. Assume that we would like to check whether x may have been generated by a given parameter θ_0.

Null hypothesis H_0: θ = θ_0.
(Two-sided) alternative hypothesis H_1: θ ≠ θ_0.
We then build a test statistic T(X) whose distribution function is known under the null hypothesis H_0, and we consider the question whether T(x) takes an unlikely value under the null hypothesis. Therefore one chooses a significance level q ∈ (0, 1) (typically 5% or 1%) and for this significance level one chooses a critical region C_q with P[T(X) ∈ C_q] ≤ q (under the null hypothesis). The null hypothesis is then rejected if T(x) falls into this critical region. In practice, one often calculates the
so-called p-value. This denotes the critical probability at which the null hypothesis is just rejected (for one-sided unbounded intervals). For instance, if we choose a significance level of 5% and the resulting p-value of T(x) is less than or equal to 5%, then the test rejects the null hypothesis at the 5% significance level.
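As a schematic illustration (not from the text): a two-sided z-test for the mean of Gaussian observations with known standard deviation, where the observed sample mean below is a hypothetical value.

```python
import math

# two-sided z-test for H0: theta = theta0 with known sigma
# (schematic example; the observed sample mean x_bar is hypothetical)
theta0, sigma, n = 0.0, 1.0, 50
x_bar = 0.35

T = (x_bar - theta0) / (sigma / math.sqrt(n))                 # N(0,1) under H0
Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))    # standard normal cdf
p_value = 2.0 * (1.0 - Phi(abs(T)))

print(round(T, 3), round(p_value, 4))
# H0 is rejected at the 5% significance level because p_value <= 0.05
```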
Exercise 2 (χ²-distribution). Assume that X_k has a χ²-distribution with k ∈ N degrees of freedom, i.e. X_k is absolutely continuous with density

f(x) = 1 / (2^{k/2} Γ(k/2)) · x^{k/2−1} exp{−x/2}  for x ≥ 0.

(a) Prove that f is a density. Hint: see Section 3.3.3 and the proof of Proposition 2.20.

(b) Prove that M_{X_k}(r) = (1 − 2r)^{−k/2} for r < 1/2.
Chapter 2

Collective Risk Modeling

The aim of this chapter is to describe the probability distribution of the total claim amount S that an insurance company faces within a fixed time period. For the time period we take one (accounting) year. Assume that N counts all claims that occur within this fixed accounting year. The total claim amount is then given by

S = Y_1 + Y_2 + ... + Y_N = Σ_{i=1}^N Y_i,

where Y_i describes the size of the i-th claim.

2.1 Compound distributions

Model Assumptions 2.1 (compound distribution). Assume the total claim amount is given by S = Σ_{i=1}^N Y_i with:

1. N is a discrete random variable which takes values in A ⊂ N_0;
2. Y_1, Y_2, ... are i.i.d. with distribution function G satisfying G(0) = 0;
3. N and (Y_1, Y_2, ...) are independent.
Remarks.
If S satisfies these three standard assumptions from Model Assumptions 2.1
we say that S has a compound distribution.
The first assumption of the compound distribution says that the number of
claims N takes only non-negative integer values. The event {N = 0} means
that no claim occurs which provides a total claim amount of S = 0.
The second assumption means that the individual claims Y_i do not affect each other; for instance, if we face a large first claim Y_1, this does not give any information for the remaining claims Y_i, i ≥ 2. Moreover, we have homogeneity in the sense that all claims have the same marginal distribution function G with

0 = G(0) = P[Y_1 ≤ 0],

i.e. the individual claim sizes Y_i are strictly positive, P-a.s. We use the terminology (individual) claim size, (individual) claim and claims severity synonymously for Y_i.
Finally, the last assumption says that the individual claim sizes are not affected by the number of claims and vice versa; for instance, observing many claims does not give any information about whether these claims are of smaller or larger size.
This compound distribution is the base model for collective risk modeling and we
are going to describe different choices for the claims count distribution of N and
for the individual claim size distributions of Yi . We start with the basic recognition
features of compound distributions.
Proposition 2.2. Assume S has a compound distribution. We have

E[S] = E[N] E[Y_1],
Var(S) = Var(N) E[Y_1]² + E[N] Var(Y_1),
Vco(S) = sqrt( Vco(N)² + Vco(Y_1)² / E[N] ),
M_S(r) = M_N( log M_{Y_1}(r) )  for r ∈ R,

whenever they exist.
Proof. Using the tower property (1.7) we obtain for the mean of S

E[S] = E[ Σ_{i=1}^N Y_i ] = E[ E[ Σ_{i=1}^N Y_i | N ] ] = E[ Σ_{i=1}^N E[Y_i] ] = E[ N E[Y_1] ] = E[N] E[Y_1].

For the variance, the decomposition (1.8) yields

Var(S) = Var( E[ Σ_{i=1}^N Y_i | N ] ) + E[ Var( Σ_{i=1}^N Y_i | N ) ] = Var( N E[Y_1] ) + E[ N Var(Y_1) ] = Var(N) E[Y_1]² + E[N] Var(Y_1).

The formula for Vco(S) follows from the first two moments. For the moment generating function, conditioning on N yields

M_S(r) = E[ E[ e^{r Σ_{i=1}^N Y_i} | N ] ] = E[ ( M_{Y_1}(r) )^N ] = E[ e^{N log M_{Y_1}(r)} ] = M_N( log M_{Y_1}(r) ).  □

Moreover, for the distribution function of S we obtain

P[S ≤ x] = Σ_{k∈A} P[ Σ_{i=1}^k Y_i ≤ x | N = k ] P[N = k] = Σ_{k∈A} G^{*k}(x) P[N = k],  (2.1)

where G^{*k} denotes the k-fold convolution of G. With formula (2.1) we obtain a closed form solution for the distribution function of S. However, in general, this formula is not useful due to the computational complexity of calculating G^{*k} for too many k ∈ A. We present other solutions for the calculation of the distribution function of S. These involve simulations, approximations and smart analytic techniques under additional model assumptions.
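Such a simulation approach can be sketched as follows: we sample from a compound distribution and compare the empirical mean and variance with the formulas of Proposition 2.2. The parameters are purely illustrative; the claims count N is the number of successes in 20 independent trials with success probability 0.3, and the claim sizes are exponential with mean 100:

```python
import random

random.seed(7)

# illustrative compound model: N = successes in 20 trials with prob. 0.3
# (so E[N] = 6, Var(N) = 4.2), and Y_i ~ Exp with mean mu = 100
v, p, mu = 20, 0.3, 100.0

def compound_sample():
    n = sum(random.random() < p for _ in range(v))              # claims count N
    return sum(random.expovariate(1.0 / mu) for _ in range(n))  # S = sum of Y_i

samples = [compound_sample() for _ in range(100_000)]
mean_hat = sum(samples) / len(samples)
var_hat = sum((s - mean_hat) ** 2 for s in samples) / (len(samples) - 1)

# Proposition 2.2: E[S] = E[N] E[Y1], Var(S) = Var(N) E[Y1]^2 + E[N] Var(Y1)
mean_th = v * p * mu                                   # = 600
var_th = v * p * (1 - p) * mu ** 2 + v * p * mu ** 2   # Var(Y1) = mu^2 for Exp
print(mean_hat, mean_th)
print(var_hat, var_th)
```

The Monte Carlo estimates reproduce the compound moments up to sampling error.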
2.2 Explicit claims count distributions

In this section we give explicit distribution functions for the modeling of the number of claims N. In all cases we denote by v > 0 the underlying volume and we normalize the expected number of claims as

E[N] = λv,

where λ > 0 denotes the expected claims frequency. Under these premises we would like to describe the probability weights

p_k = P[N = k]  for k ∈ A ⊂ N_0.

2.2.1 Binomial distribution

For the binomial distribution we choose a fixed volume v ∈ N and a fixed default probability p ∈ (0, 1) (expected claims frequency). We say N has a binomial distribution, write N ∼ Binom(v, p), if

p_k = P[N = k] = (v choose k) p^k (1 − p)^{v−k}  for all k ∈ A = {0, 1, ..., v}.

For v = 1 we obtain the Bernoulli distribution, N ∼ Bernoulli(p), with

P[N = k] = 1 − p for k = 0,  and  P[N = k] = p for k = 1.
Proposition 2.3. Assume N ∼ Binom(v, p) for fixed v ∈ N and p ∈ (0, 1). Then

E[N] = vp,  Var(N) = vp(1 − p),  Vco(N) = sqrt( (1 − p) / (vp) ),
M_N(r) = ( p e^r + (1 − p) )^v  for all r ∈ R.

Proof. We calculate the moment generating function; the first two moments then follow from formula (1.5). For the moment generating function we have

M_N(r) = Σ_{k∈A} e^{rk} (v choose k) p^k (1 − p)^{v−k} = Σ_{k∈A} (v choose k) (p e^r)^k (1 − p)^{v−k}
       = ( p e^r + (1 − p) )^v Σ_{k∈A} (v choose k) ( p e^r / (p e^r + (1 − p)) )^k ( (1 − p) / (p e^r + (1 − p)) )^{v−k}.

The last sum is again a summation over the probability weights p_k, k ∈ A, of a binomial distribution with default probability p̃ = p e^r / (p e^r + (1 − p)) ∈ (0, 1). Therefore it adds up to 1, which completes the proof.  □
Corollary 2.4. Assume N ∼ Binom(v, p) with given v ∈ N and p ∈ (0, 1). Choose X_1, ..., X_v i.i.d. ∼ Bernoulli(p). Then we have

N (d)= Σ_{i=1}^v X_i.

Proof. In view of Lemma 1.3 it suffices to prove that N and X = Σ_{i=1}^v X_i have the same moment generating function. The moment generating function of the latter is given by

M_X(r) = E[ exp{ r Σ_{i=1}^v X_i } ] = Π_{i=1}^v E[ e^{r X_i} ] = ( p e^r + (1 − p) )^v = M_N(r).  □
Remarks. The corollary states that N describes the number of defaults within a portfolio of fixed size v ∈ N. Every risk in this portfolio has the same default probability p and defaults of different risks do not influence each other (they are independent). Thus, if N has a binomial distribution then every risk in such a portfolio can default at most once. This is the case, for instance, for life insurance policies where an insured can die at most once. In non-life insurance this distribution is less commonly used because for typical non-life insurance policies we can have more than one claim within a fixed time interval; e.g., a car insurance policy can suffer two or more accidents within the same accounting year. Therefore, the binomial distribution is not of central interest in non-life insurance modeling.
Definition 2.5 (compound binomial model). The total claim amount S has a compound binomial distribution, write

S ∼ CompBinom(v, p, G),

if S has a compound distribution with N ∼ Binom(v, p) for given v ∈ N and p ∈ (0, 1) and individual claim size distribution G.
Proposition 2.6. Assume S ∼ CompBinom(v, p, G). Then

E[S] = vp E[Y_1],
Var(S) = vp ( Var(Y_1) + (1 − p) E[Y_1]² ),
Vco(S) = sqrt( (1 − p + Vco(Y_1)²) / (vp) ),
M_S(r) = ( p M_{Y_1}(r) + (1 − p) )^v  for r ∈ R.
Remark. The coefficient of variation Vco(S) is a measure for the degree of diversification within the portfolio. If S has a compound binomial distribution with fixed default probability p and fixed claim size distribution G having finite second moment, then the coefficient of variation converges to zero of order v^{−1/2} as the portfolio size v increases.
Corollary 2.7 (aggregation property). Assume S_1, ..., S_n are independent with S_j ∼ CompBinom(v_j, p, G) for all j = 1, ..., n. The aggregated claim has a compound binomial distribution with

S = Σ_{j=1}^n S_j ∼ CompBinom( Σ_{j=1}^n v_j, p, G ).

Proof. Exercise. Note here that n describes the (deterministic) number of portfolios and should not be confused with the binomial random variable N.  □
Assume S ∼ CompBinom(v, p, G) and choose a large claims threshold M > 0. Define the total claim in the large claims layer by

S_lc = Σ_{i=1}^N Y_i 1_{{Y_i > M}}.

Then we have S_lc ∼ CompBinom(v, p(1 − G(M)), G_lc), where the large claims size distribution satisfies G_lc(y) = P[Y_1 ≤ y | Y_1 > M].
2.2.2 Poisson distribution

For defining the Poisson distribution we choose a fixed volume v > 0 and a fixed expected claims frequency λ > 0. We say N has a Poisson distribution, write N ∼ Poi(λv), if

p_k = P[N = k] = e^{−λv} (λv)^k / k!  for all k ∈ A = N_0.

The Poisson distribution goes back to Siméon Denis Poisson (1781-1840), who published his work on probability theory in 1837.

Proposition 2.8. Assume N ∼ Poi(λv) for fixed λ, v > 0. Then

E[N] = λv = Var(N),  Vco(N) = (λv)^{−1/2},
M_N(r) = exp{ λv (e^r − 1) }  for all r ∈ R.
Proof. We calculate the moment generating function; the first two moments then follow from formula (1.5). Using the power series expansion of the exponential function we have

M_N(r) = Σ_{k≥0} e^{rk} e^{−λv} (λv)^k / k! = e^{−λv} Σ_{k≥0} (λv e^r)^k / k! = exp{ −λv + λv e^r }.  □

Proposition 2.8 provides the interpretation of the parameter λ. For given volume v > 0 the expected claims frequency is

E[N] / v = λ.
Moreover, for the coefficient of variation of the claims frequency N/v we obtain

Vco(N/v) = Vco(N) = (λv)^{−1/2} → 0  for v → ∞.  (2.2)

The Poisson distribution arises as a limit of binomial distributions in the following sense (law of small numbers): assume N_v ∼ Binom(v, p(v)) with default probabilities p(v) ∈ (0, 1) satisfying v p(v) → c > 0 as v → ∞; then N_v converges in distribution to Poi(c).

Proof. In view of Lemma 1.4 we need to prove that the moment generating functions of N_v have the appropriate convergence property. We have

M_{N_v}(r) = ( p(v) e^r + (1 − p(v)) )^v = [ (1 + p(v)(e^r − 1))^{1/p(v)} ]^{v p(v)}.

Note that p(v) → 0 as v → ∞. If we apply this limit to the inner bracket (1 + p(v)(e^r − 1))^{1/p(v)}, we exactly obtain the limit definition of the exponential function exp{e^r − 1}, see Definition 14.30 in Merz-Wüthrich [79]. This, with the fact that v p(v) → c as v → ∞, provides M_{N_v}(r) → exp{c(e^r − 1)}, which proves the claim.  □
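This convergence can be observed numerically by comparing the probability weights of Binom(v, c/v) with those of Poi(c) in total variation distance (c = 2 is an illustrative choice):

```python
import math

c = 2.0   # limiting expected number of claims (illustrative)

def binom_pmf(k, v, p):
    return math.comb(v, k) * p ** k * (1 - p) ** (v - k) if k <= v else 0.0

def poi_pmf(k):
    return math.exp(-c) * c ** k / math.factorial(k)

def tv_distance(v, kmax=60):
    # total variation distance between Binom(v, c/v) and Poi(c),
    # truncated where both tails are numerically negligible
    p = c / v
    return 0.5 * sum(abs(binom_pmf(k, v, p) - poi_pmf(k)) for k in range(kmax + 1))

d = [tv_distance(v) for v in (10, 100, 1000)]
print(d)   # decreasing towards 0 as v grows
```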
Definition 2.10 (compound Poisson model). The total claim amount S has a compound Poisson distribution, write

S ∼ CompPoi(λv, G),

if S has a compound distribution with N ∼ Poi(λv) for given λ, v > 0 and individual claim size distribution G.

Proposition 2.11. Assume S ∼ CompPoi(λv, G). Then

E[S] = λv E[Y_1],
Var(S) = λv E[Y_1²],
Vco(S) = sqrt( (1 + Vco(Y_1)²) / (λv) ),
M_S(r) = exp{ λv ( M_{Y_1}(r) − 1 ) }  for r ∈ R.
Analogously to the compound binomial case, the coefficient of variation converges to zero of order v^{−1/2} as the portfolio size v increases.
The compound Poisson distribution has the so-called aggregation property and the
disjoint decomposition property. These are two extremely beautiful and useful
properties which explain part of the popularity of the compound Poisson model.
We first state and prove these two properties and then we give interpretations in
the context of non-life insurance portfolio modeling.
Theorem 2.12 (aggregation property). Assume S_1, ..., S_n are independent with S_j ∼ CompPoi(λ_j v_j, G_j) for all j = 1, ..., n. The aggregated claim has a compound Poisson distribution,

S = Σ_{j=1}^n S_j ∼ CompPoi(λv, G),

with

λv = Σ_{j=1}^n λ_j v_j  and  G = Σ_{j=1}^n ( λ_j v_j / (λv) ) G_j.
Proof. We have assumed that G_j(0) = 0 for all j = 1, ..., n, which implies that S ≥ 0, P-a.s. From Lemma 1.3 it follows that we only need to identify the moment generating function of S in order to prove that it is compound Poisson distributed. Observe that M_S(r) exists at least for r ≤ 0. Thus, we calculate (using the independence of the S_j's)

M_S(r) = E[ exp{ r Σ_{j=1}^n S_j } ] = Π_{j=1}^n E[ exp{ r S_j } ] = Π_{j=1}^n exp{ λ_j v_j ( M_{Y_1^{(j)}}(r) − 1 ) } = exp{ λv ( Σ_{j=1}^n (λ_j v_j / (λv)) M_{Y_1^{(j)}}(r) − 1 ) },

where we have assumed Y_1^{(j)} ∼ G_j. This is a compound Poisson distribution with expected number of claims λv, and the claim size distribution G is obtained from the moment generating function Σ_{j=1}^n (λ_j v_j / (λv)) M_{Y_1^{(j)}}(r): note that G = Σ_{j=1}^n (λ_j v_j / (λv)) G_j is a distribution function (non-decreasing, right-continuous, lim_{x→−∞} G(x) = 0 and lim_{x→∞} G(x) = 1). We choose Y ∼ G and obtain

M_Y(r) = ∫_0^∞ e^{ry} dG(y) = Σ_{j=1}^n (λ_j v_j / (λv)) ∫_0^∞ e^{ry} dG_j(y) = Σ_{j=1}^n (λ_j v_j / (λv)) M_{Y_1^{(j)}}(r).

Using Lemma 1.3 once more for the claim size distribution proves the theorem.  □
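A Monte Carlo sketch of the aggregation property (illustrative parameters; the Poisson sampler uses Knuth's method, since the Python standard library provides none): the sum of two independent compound Poisson portfolios with exponential claim sizes should have mean λv E_G[Y] and variance λv E_G[Y²] for λv = 8 and the mixture G = (3/8) G_1 + (5/8) G_2.

```python
import math
import random

random.seed(42)

def poisson(lam):
    # Knuth's Poisson sampler (adequate for the small intensities used here)
    L, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= random.random()
        if prod < L:
            return k
        k += 1

# two independent compound Poisson portfolios (illustrative parameters)
lv1, mu1 = 3.0, 10.0     # G_1 = Exp with mean 10
lv2, mu2 = 5.0, 40.0     # G_2 = Exp with mean 40

def aggregate_sample():
    s1 = sum(random.expovariate(1 / mu1) for _ in range(poisson(lv1)))
    s2 = sum(random.expovariate(1 / mu2) for _ in range(poisson(lv2)))
    return s1 + s2

samples = [aggregate_sample() for _ in range(100_000)]
mean_hat = sum(samples) / len(samples)
var_hat = sum((s - mean_hat) ** 2 for s in samples) / (len(samples) - 1)

# Theorem 2.12: S ~ CompPoi(8, (3/8) G_1 + (5/8) G_2), hence
mean_th = lv1 * mu1 + lv2 * mu2                      # E[S]   = 230
var_th = lv1 * 2 * mu1 ** 2 + lv2 * 2 * mu2 ** 2     # Var(S) = 16600 (E[Y^2] = 2 mu^2)
print(mean_hat, mean_th)
print(var_hat, var_th)
```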
Consider now, in addition to the claim sizes Y_i, discrete indicators I_i taking values in {1, ..., m}, for instance labeling the sub-portfolio that claim i belongs to. Assume the marginal distribution of I_1 is given by

P[I_1 = j] = p_j^+  for j = 1, ..., m,  (2.3)

and that the claim size distribution can be decomposed accordingly as

G(y) = Σ_{j=1}^m p_j^+ G_j(y).
This allows us to extend the compound Poisson model from Definition 2.10.
Definition 2.13 (extended compound Poisson model). The total claim amount S = Σ_{i=1}^N Y_i has a compound Poisson distribution as defined in Definition 2.10. In addition, we assume that (Y_i, I_i)_{i≥1} are i.i.d. and independent of N, with Y_i having marginal distribution function G with G(0) = 0 and I_i having marginal distribution given by (2.3).
Remark. Note that Definition 2.13 gives a well-defined extension, i.e. it fully respects the assumptions made in Definition 2.10, because (Y_i, I_i)_{i≥1} are i.i.d. and independent of N with Y_i having the appropriate marginal distribution function G. Observe that we do not specify the dependence structure between Y_i and I_i. If we choose m = 1 in (2.3) we are back in the classical compound Poisson model. Therefore, the next theorem especially applies to the compound Poisson model.
Before stating the next theorem we introduce an admissible and measurable disjoint decomposition (partition) of the total space. The random vector (Y_1, I_1) takes values in R_+ × {1, ..., m}. On this space we choose a finite sequence A_1, ..., A_n of (measurable) sets such that A_k ∩ A_l = ∅ for all k ≠ l and

∪_{k=1}^n A_k = R_+ × {1, ..., m}.  (2.4)
Theorem 2.14 (disjoint decomposition of compound Poisson distributions). Assume that S fulfills the extended compound Poisson model assumptions of Definition 2.13. We choose an admissible and measurable disjoint decomposition A_1, ..., A_n for (Y_1, I_1). Define for k = 1, ..., n the random variables

S_k = Σ_{i=1}^N Y_i 1_{{(Y_i, I_i) ∈ A_k}}.

Then S_1, ..., S_n are independent and compound Poisson distributed,

S_k ∼ CompPoi( λ_k v_k = p^{(k)} λv, G_k ),

with p^{(k)} = P[(Y_1, I_1) ∈ A_k] and

G_k(y) = P[ Y_1 ≤ y | (Y_1, I_1) ∈ A_k ].
Proof of Theorem 2.14. We prove the theorem using the multivariate extension of the moment generating function. Choose r = (r_1, ..., r_n)′ with r_k ≤ 0. The multivariate moment generating function of the random vector S = (S_1, ..., S_n)′ is given by

M_S(r) = E[ exp{ r′S } ] = E[ exp{ Σ_{k=1}^n r_k S_k } ] = E[ exp{ Σ_{k=1}^n r_k Σ_{i=1}^N Y_i 1_{{(Y_i, I_i) ∈ A_k}} } ]
       = E[ E[ Π_{i=1}^N exp{ Σ_{k=1}^n r_k Y_i 1_{{(Y_i, I_i) ∈ A_k}} } | N ] ].

Note that N is a Poisson distributed random variable and n denotes the deterministic number of disjoint sets A_1, ..., A_n. We calculate the inner expected value; since the A_l partition the state space of (Y_i, I_i), conditioning on the partition element gives

E[ exp{ Σ_{k=1}^n r_k Y_i 1_{{(Y_i, I_i) ∈ A_k}} } ] = Σ_{l=1}^n E[ exp{ r_l Y_i } | (Y_i, I_i) ∈ A_l ] P[(Y_i, I_i) ∈ A_l] = Σ_{l=1}^n p^{(l)} M_{Y^{(l)}}(r_l),

where Y^{(l)} ∼ G_l. Using the i.i.d. property of the (Y_i, I_i) and the moment generating function of N ∼ Poi(λv), this implies

M_S(r) = E[ exp{ N log( Σ_{l=1}^n p^{(l)} M_{Y^{(l)}}(r_l) ) } ] = exp{ λv ( Σ_{l=1}^n p^{(l)} M_{Y^{(l)}}(r_l) − 1 ) }
       = Π_{l=1}^n exp{ λv p^{(l)} ( M_{Y^{(l)}}(r_l) − 1 ) } = Π_{l=1}^n M_{S_l}(r_l),

where we have used Σ_{l=1}^n p^{(l)} = 1. This proves the theorem because we have obtained a product (i.e. independence) of moment generating functions of compound Poisson distributed random variables S_l, l = 1, ..., n.  □
34
For I we have chosen a finite (discrete) indicator. Of course, this model can
easily be extended to other indicators. The crucial property is the i.i.d. assumption on the random vectors (Y_i, I_i). We have chosen a finite indicator I
because this has the natural interpretation of sub-portfolios. If I = 1, P-a.s.,
then we can completely drop this indicator.
The choice of the appropriate volume on the sub-portfolios depends on the choice
of the indicator I. If m = 1, i.e. if we only consider one portfolio, and if we apply
a disjoint decomposition of this portfolio as follows

  Y_i = Y_i 1_{Y_i ∈ A_1} + ... + Y_i 1_{Y_i ∈ A_n},

then it is natural to set v_k = v and λ_k = p^{(k)} λ for k = 1, ..., n. That is, the volume
v > 0 remains constant but the expected claims frequencies λ_k change according
to A_k. This is also called thinning of the Poisson point process.
The second extreme case is m = n > 1 and the disjoint decomposition is given by

  {(Y_i, I_i) ∈ A_k} = {I_i = k},

i.e. we only consider a decomposition according to different sub-portfolios k =
1, ..., m. In this case we would rather define v_k > 0 by the volume of portfolio k
and λ_k = p^{(k)} λ v/v_k.
Version April 14, 2016, M.V. Wüthrich, ETH Zurich
Assume that S ~ CompPoi(λv, G). We choose a large claims threshold M > 0 and the disjoint decomposition

  A = A_1 = {Y_1 ≤ M}  and  A_2 = A^c = {Y_1 > M}.

We define the total claim S_sc in the small claims layer and the total claim S_lc in the large claims layer by

  S_sc = Σ_{i=1}^N Y_i 1_{Y_i ≤ M}  and  S_lc = Σ_{i=1}^N Y_i 1_{Y_i > M}.

Theorem 2.14 implies that S_sc and S_lc are independent and compound Poisson
distributed with

  S_sc ~ CompPoi( λ_sc v = G(M) λ v , G_sc(y) = P[Y_1 ≤ y | Y_1 ≤ M] ),

and

  S_lc ~ CompPoi( λ_lc v = (1 − G(M)) λ v , G_lc(y) = P[Y_1 ≤ y | Y_1 > M] ).
In particular, this means that we can model the small and the large claims layers
completely separately and then obtain the total claim amount distribution by a simple convolution of the two resulting distribution functions (due to independence),
see Example 4.11, below.
For the large claims layer we need to determine the expected large claims frequency
λ_lc > 0. The individual large claim sizes Y_1|{Y_1 > M} are often modeled with a Pareto
distribution with threshold M and tail parameter α > 1; for more details see
Sections 3.2.5 and 3.4.1.
The small claims layer is often approximated by a parametric distribution function: we have seen in (2.1) that compound distributions may lead to rather time-consuming computational complexity when the expected number of claims λ_sc v is
large. Therefore, one typically assumes that the expected number of small claims
is sufficiently large so that we are already in the asymptotic regime of the central
limit theorem, and then we approximate this compound distribution by the Gaussian distribution, see Theorem 4.1 below, or maybe by a distribution function that
is slightly skewed, see Sections 4.1.2 and 4.1.3. Note that the small claims layer
cannot be distorted by large claims because they are already sorted out by the
threshold M. We will describe this in more detail in Section 3.4.1, below.
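The layer decomposition and the thinning property can be illustrated with a short simulation. The following Python sketch (all parameters hypothetical; the notes themselves work with R) splits a simulated compound Poisson year at a threshold M and compares the average small and large claims counts with G(M)λv and (1 − G(M))λv:

```python
import math
import random

random.seed(2024)

# Hypothetical setup: expected number of claims lam_v = 20, claim sizes
# Y_i ~ expo(mean mu) as an illustrative choice of G, threshold M.
lam_v, mu, M = 20.0, 3000.0, 5000.0

def rpoisson(mean):
    """Knuth's Poisson sampler (fine for moderate means)."""
    L, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def simulate_year():
    """One year of the compound Poisson model, decomposed into the
    small (Y <= M) and large (Y > M) claims counts."""
    n_sc = n_lc = 0
    for _ in range(rpoisson(lam_v)):
        y = random.expovariate(1.0 / mu)
        if y <= M:
            n_sc += 1
        else:
            n_lc += 1
    return n_sc, n_lc

years = [simulate_year() for _ in range(5000)]
mean_sc = sum(n for n, _ in years) / len(years)
mean_lc = sum(n for _, n in years) / len(years)

# Thinning: N_sc ~ Poi(G(M) lam v) and N_lc ~ Poi((1 - G(M)) lam v)
G_M = 1.0 - math.exp(-M / mu)
print(round(mean_sc, 2), round(G_M * lam_v, 2))
print(round(mean_lc, 2), round((1.0 - G_M) * lam_v, 2))
```

The empirical averages approach the thinned frequencies as the number of simulated years grows.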
2.2.3 Mixed Poisson distribution

Above we have introduced the binomial and the Poisson distributions. These
distributions have the following relationship for the claims count N:

  binomial distribution:       E[N] > Var(N),
  Poisson distribution:        E[N] = Var(N),
  mixed Poisson distribution:  E[N] < Var(N).

In the next section we make an explicit choice for the distribution function H.
2.2.4 Negative-binomial distribution
In this section we assume that N has a mixed Poisson distribution and we assume
that the latent variable Λ is drawn from a gamma distribution. Therefore, we
briefly introduce the gamma distribution, which is described in more detail in
Section 3.3.3, below. Assume X ~ Γ(γ, c) with shape parameter γ > 0 and scale parameter c > 0. Its density is given by

  f(x) = (c^γ / Γ(γ)) x^{γ−1} exp{−cx}  for x ≥ 0,

and we have

  E[X] = γ/c,  Var(X) = γ/c²,  M_X(r) = ( c/(c − r) )^γ  for r < c.

The gamma distribution has many nice properties and it is used rather frequently
for the modeling of latent variables and for the modeling of individual claim sizes,
see Section 3.3.3. For the negative-binomial distribution the latent variable is chosen as Λ ~ Γ(γ, γ), which gives

  E[Λ] = 1  and  Var(Λ) = 1/γ > 0.
Proposition 2.20 (negative-binomial distribution, 2nd definition). The negative-binomial distribution as defined in Definition 2.19 satisfies for k ∈ A = N_0

  p_k = P[N = k] = binom(k + γ − 1, k) (1 − p)^γ p^k,

with p = λv/(γ + λv) ∈ (0, 1).

[Portrait: G. Pólya]
Proof. Conditionally on Λ, N has a Poisson distribution with mean λvΛ. Therefore,

  p_k = ∫_0^∞ ( (xλv)^k / k! ) exp{−xλv} · ( γ^γ / Γ(γ) ) x^{γ−1} exp{−γx} dx
      = ( γ^γ (λv)^k Γ(γ + k) / ( Γ(γ) k! (γ + λv)^{γ+k} ) ) ∫_0^∞ ( (γ + λv)^{γ+k} / Γ(γ + k) ) x^{γ+k−1} exp{−(γ + λv)x} dx
      = ( Γ(γ + k) / (Γ(γ) k!) ) ( γ/(γ + λv) )^γ ( λv/(γ + λv) )^k = binom(k + γ − 1, k) (1 − p)^γ p^k;

notice that the second last equality follows because we have a gamma density with shape
parameter γ + k and scale parameter γ + λv under the integral. This trick of completion should
be remembered because it is applied rather frequently. □
Proposition 2.21. The negative-binomial distribution of Definition 2.19 satisfies

  E[N] = λv,  Var(N) = λv (1 + λv/γ),
  Vco(N) = sqrt( 1/(λv) + 1/γ ) > γ^{−1/2} > 0,
  M_N(r) = ( (1 − p)/(1 − p e^r) )^γ  for all r < −log p.
Proof. The first three statements are a direct consequence of the proof of Lemma 2.18 and the
properties of the gamma distribution. Therefore, it remains to calculate the moment generating
function. The tower property implies

  M_N(r) = E[ E[ e^{rN} | Λ ] ] = E[ exp{ λvΛ (e^r − 1) } ] = M_Λ( λv(e^r − 1) ),

and the claim follows from the moment generating function of the Γ(γ, γ) distribution. □
Proposition 2.21 provides a nice interpretation. For given volume v > 0 the expected claims frequency is

  E[N/v] = λ.

Moreover, for the coefficient of variation of the claims frequency N/v we obtain

  Vco(N/v) = sqrt( 1/(λv) + 1/γ ) → γ^{−1/2} > 0  for v → ∞.
This can be interpreted as follows. The random variable Λ reflects the uncertainty
in the true underlying frequency parameter of the Poisson distribution. This
uncertainty also remains in the portfolio for infinitely large volume v, i.e. this
risk is not diversifiable, and the positive lower bound γ^{−1/2} is determined by the
dispersion parameter γ ∈ (0, ∞). In particular, consider a time series N_1, N_2, ...
of claims counts in different accounting years 1, 2, .... Each of these accounting
years has its own (risk) characteristics Λ_1, Λ_2, ..., like weather conditions, inflation
index, portfolio fluctuations, etc. Since we do not know these characteristics a
priori, i.e. prior to future accounting years, we model these characteristics with a
latent factor (Λ_t)_{t≥1} which provides the true frequency parameter for accounting
year t, given by λ_t = λΛ_t. This differs from the Poisson case, see (2.2).
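The over-dispersion of the gamma-mixed Poisson model is easy to check by simulation. A minimal Python sketch with hypothetical parameters λv = 20 and γ = 2, for which Proposition 2.21 gives E[N] = 20 and Var(N) = λv(1 + λv/γ) = 220:

```python
import math
import random

random.seed(7)

# Mixed Poisson sketch (hypothetical parameters): N | Λ ~ Poi(lam_v * Λ)
# with Λ ~ Γ(γ, γ), i.e. E[Λ] = 1 and Var(Λ) = 1/γ.
lam_v, gamma_ = 20.0, 2.0

def rpoisson(mean):
    """Knuth's Poisson sampler (fine for moderate means)."""
    L, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

samples = []
for _ in range(20000):
    lam_t = random.gammavariate(gamma_, 1.0 / gamma_)   # latent Γ(γ, γ) factor
    samples.append(rpoisson(lam_v * lam_t))

mean_n = sum(samples) / len(samples)
var_n = sum((x - mean_n) ** 2 for x in samples) / (len(samples) - 1)
print(round(mean_n, 1), round(var_n, 1))   # theory: E[N] = 20, Var(N) = 220
```

The simulated variance clearly exceeds the simulated mean, in contrast to the Poisson case.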
[Figure: probability weights of the binomial, Poisson and negative-binomial distributions.]
For a compound negative-binomial model S we have

  E[S] = λv E[Y_1],
  Var(S) = λv E[Y_1²] + (λv)² E[Y_1]²/γ,
  Vco(S) = sqrt( (1 + Vco(Y_1)²)/(λv) + 1/γ ) > γ^{−1/2},
  M_S(r) = ( (1 − p)/(1 − p M_{Y_1}(r)) )^γ  for r ∈ R such that M_{Y_1}(r) < 1/p.

As in the compound Poisson case we can decompose the total claim amount; define the large claims layer by

  S_lc = Σ_{i=1}^N Y_i 1_{Y_i > M}.

Then we have S_lc ~ CompNB((1 − G(M))λv, γ, G_lc), where the large claims size
distribution satisfies G_lc(y) = P[Y_1 ≤ y | Y_1 > M].

2.3 Parameter estimation
Once we have specified the distribution functions for N and Y_i we still need to
determine their parameters. In the case of the claims count distribution of N these
are (i) the default probability p for the binomial distribution; (ii) the expected
claims frequency λ for the Poisson distribution; or (iii) the expected claims frequency λ and the dispersion parameter γ for the negative-binomial distribution.
Essentially, there are three different common ways to estimate these parameters:
Essentially, there are three different common ways to estimate these parameters:
1. method of moments (MM),
2. maximum likelihood estimation (MLE) method,
3. Bayesian inference method.
2.3.1 Method of moments
We start with an example to explain the method of moments. Assume that we have
an i.i.d. sequence X_1, ..., X_T ~ F, where F is a parametric distribution function
that depends (for simplicity) on a two-dimensional (real-valued) parameter (θ_1, θ_2).
Assume that the first two moments of X_1 are finite, and thus, for all t = 1, ..., T
we have mean and variance (as functions of (θ_1, θ_2))

  μ = μ(θ_1, θ_2) = E[X_t] < ∞  and  σ² = σ²(θ_1, θ_2) = Var(X_t) < ∞.
Remark. For general d-dimensional (real-valued) parameters (θ_1, ..., θ_d) we extend the argument to the first d moments of X_t.
We define the sample mean and the sample variance by, T ≥ 2 for the latter,

  μ̂_T = (1/T) Σ_{t=1}^T X_t  and  σ̂_T² = (1/(T−1)) Σ_{t=1}^T (X_t − μ̂_T)².   (2.5)

A straightforward calculation shows that these are unbiased estimators for μ and
σ², that is,

  E[μ̂_T] = μ = μ(θ_1, θ_2)  and  E[σ̂_T²] = σ² = σ²(θ_1, θ_2).   (2.6)

This motivates the moment estimator (θ̂_1, θ̂_2) for (θ_1, θ_2) obtained by solving the system of
equations

  μ̂_T = μ(θ̂_1, θ̂_2)  and  σ̂_T² = σ²(θ̂_1, θ̂_2).
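A minimal Python sketch of this moment matching for a two-parameter example (the gamma family with a hypothetical true parameter (shape, scale) = (3, 2), where μ = shape·scale and σ² = shape·scale²):

```python
import random
import statistics

random.seed(11)

# Simulate an i.i.d. sample from Γ(shape=3, scale=2) and invert the
# moment equations mu = shape*scale and sigma^2 = shape*scale^2.
T = 50_000
xs = [random.gammavariate(3.0, 2.0) for _ in range(T)]

mu_hat = statistics.fmean(xs)
sig2_hat = statistics.variance(xs)        # unbiased, denominator T-1, cf. (2.5)

scale_hat = sig2_hat / mu_hat
shape_hat = mu_hat / scale_hat
print(round(shape_hat, 2), round(scale_hat, 2))
```

For large T the moment estimators recover the true parameters up to sampling error.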
In our situation the problem is more involved. Assume we have a vector of observations N = (N_1, ..., N_T)', where N_t denotes the number of claims in accounting
year t. The difficulty is that N_t, t = 1, ..., T, are not i.i.d. because they depend
on different volumes v_t. That is, in general, the portfolio changes over accounting
years. Therefore, we need to slightly modify the framework described above.
Assumption 2.25. Assume there exist strictly positive volumes v_1, ..., v_T such
that the components of F = (N_1/v_1, ..., N_T/v_T)' are independent with

  λ = E[N_t/v_t]  and  σ_t² = Var(N_t/v_t) < ∞,

for all t = 1, ..., T.
Lemma 2.26. We make Assumption 2.25. The unbiased linear (in F) estimator
for λ with minimal variance is given by

  λ̂_T^MV = ( Σ_{t=1}^T 1/σ_t² )^{−1} Σ_{t=1}^T (N_t/v_t)/σ_t²,

with

  Var(λ̂_T^MV) = ( Σ_{t=1}^T 1/σ_t² )^{−1}.
Proof. We apply the method of Lagrange, see Section 24.3 in Merz-Wüthrich [79]. We define
the mean vector λe with e = (1, ..., 1)' ∈ R^T and the diagonal positive definite covariance matrix
Σ = diag(σ_1², ..., σ_T²) of F. Then we would like to solve the following minimization problem

  x⁺ = argmin_{x ∈ R^T: λ x'e = λ} (1/2) x'Σx,

thus, we minimize the variance Var(x'F) = x'Σx subject to all unbiased linear combinations of
F, which gives the constraint λ = E[x'F] = λ x'e. The Lagrangian for this problem is given by

  L(x, c) = (1/2) x'Σx − c(λ x'e − λ),

with first order conditions

  (∂/∂x) L(x, c) = Σx − cλe = 0  and  (∂/∂c) L(x, c) = −λ x'e + λ = 0.

The first requirement implies x = cλ Σ^{−1} e. Plugging this into the second requirement
implies λ = λ x'e = cλ² e'Σ^{−1}e. If we solve this for the Lagrange multiplier we obtain c =
λ^{−1} (e'Σ^{−1}e)^{−1}. This provides

  x⁺ = (e'Σ^{−1}e)^{−1} Σ^{−1} e = ( Σ_{t=1}^T σ_t^{−2} )^{−1} ( σ_1^{−2}, ..., σ_T^{−2} )',

and

  Var(λ̂_T^MV) = (x⁺)'Σx⁺ = (e'Σ^{−1}e)^{−1} = ( Σ_{t=1}^T σ_t^{−2} )^{−1}. □
We apply this lemma to the case of the binomial and the Poisson distributions.
Assume that N_t, t = 1, ..., T, are independent with N_t ~ Binom(v_t, p) or N_t ~
Poi(λv_t), respectively. Then we have in the binomial case

  E[N_t/v_t] = p  and  σ_t² = Var(N_t/v_t) = p(1 − p)/v_t,

and in the Poisson case

  E[N_t/v_t] = λ  and  σ_t² = Var(N_t/v_t) = λ/v_t.

Note that in both cases the unknown parameter p and λ, respectively, appears in the
variance. However, the appearance is of multiplicative nature, which implies that
it cancels in the weights w_t = σ_t^{−2} ( Σ_{s=1}^T σ_s^{−2} )^{−1}. Therefore, we get the following
moment estimators in the binomial and the Poisson cases.
Estimator 2.27 (moment estimators in the binomial and Poisson cases).
We have the following unbiased linear minimal variance estimators:

  p̂_T^MV = ( Σ_{s=1}^T v_s )^{−1} Σ_{t=1}^T N_t = Σ_{t=1}^T ( v_t / Σ_{s=1}^T v_s ) (N_t/v_t);

  λ̂_T^MV = ( Σ_{s=1}^T v_s )^{−1} Σ_{t=1}^T N_t = Σ_{t=1}^T ( v_t / Σ_{s=1}^T v_s ) (N_t/v_t).

The corresponding estimation variances are

  Var(p̂_T^MV) = p(1 − p) / Σ_{s=1}^T v_s  and  Var(λ̂_T^MV) = λ / Σ_{s=1}^T v_s.

These variances (and uncertainties) converge to zero for Σ_{s=1}^T v_s → ∞, and they can
be estimated by replacing the unknown parameters p and λ, respectively, by their
estimators. Note that we can explicitly give the distributions of these estimators
because in the former case Σ_{t=1}^T N_t ~ Binom( Σ_{t=1}^T v_t, p ) and in the latter case
Σ_{t=1}^T N_t ~ Poi( λ Σ_{t=1}^T v_t ).
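In the Poisson case the minimal variance estimator is simply total claims over total volume; a Python sketch using the water insurance data of Table 2.1 below:

```python
# Estimator 2.27 in the Poisson case, applied to the data of Table 2.1
# (private households water insurance, Gisler [54]).
volumes = [240755, 255571, 269739, 281708, 306888,
           320265, 323481, 334753, 340265, 344757]
counts = [13153, 14186, 14207, 13461, 21261,
          19934, 15796, 15157, 17483, 19185]

total_v = sum(volumes)
lam_hat = sum(counts) / total_v

# The same estimator written as a volume-weighted average of the yearly
# frequencies N_t / v_t:
lam_hat2 = sum((v / total_v) * (n / v) for n, v in zip(counts, volumes))

# Estimated standard deviation of lam_hat: sqrt(lam / sum(v_s))
se = (lam_hat / total_v) ** 0.5
print(round(lam_hat, 4), round(se, 5))
```

Both expressions give the same value, an estimated yearly frequency of about 5.4%.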
In the negative-binomial case we have

  E[N_t/v_t] = λ  and  σ_t² = Var(N_t/v_t) = λ/v_t + λ²/γ.

The variance term has two unknown parameters λ and γ and we lose the nice multiplicative structure from the binomial and the Poisson case which has allowed us to
apply Lemma 2.26 in a straightforward manner. If we drop the condition of minimal
variance we obtain the following unbiased linear estimator.
Estimator 2.28 (moment estimator in the negative-binomial case (1/2)).
We have the following unbiased linear estimator for λ:

  λ̂_T^NB = ( Σ_{s=1}^T v_s )^{−1} Σ_{t=1}^T N_t = Σ_{t=1}^T ( v_t / Σ_{s=1}^T v_s ) (N_t/v_t).
In the last formula we could also take other volume-weighted averages. The unbiasedness of λ̂_T^NB immediately follows from the assumptions of the negative-binomial
distribution. The variance of this estimator is given by

  Var(λ̂_T^NB) = ( Σ_{s=1}^T v_s )^{−2} Σ_{t=1}^T Var(N_t) = ( Σ_{s=1}^T v_s )^{−2} Σ_{t=1}^T ( λv_t + (λv_t)²/γ ).   (2.7)

For the estimation of the dispersion parameter γ we consider the weighted sample variance

  V̂_T² = (1/(T−1)) Σ_{t=1}^T v_t ( N_t/v_t − λ̂_T^NB )².

Lemma 2.29. Under the assumptions of the negative-binomial model we have

  E[V̂_T²] = (1/(T−1)) [ Σ_{t=1}^T ( λv_t + (λv_t)²/γ )/v_t − Σ_{t=1}^T ( λv_t + (λv_t)²/γ ) / Σ_{s=1}^T v_s ].
Proof of Lemma 2.29. Using the unbiasedness of λ̂_T^NB for λ we have

  E[V̂_T²] = (1/(T−1)) Σ_{t=1}^T v_t E[ ( N_t/v_t − λ̂_T^NB )² ] = (1/(T−1)) Σ_{t=1}^T v_t Var( N_t/v_t − λ̂_T^NB )
          = (1/(T−1)) Σ_{t=1}^T v_t [ Var(N_t/v_t) − 2 Cov( N_t/v_t, λ̂_T^NB ) + Var(λ̂_T^NB) ]
          = (1/(T−1)) [ Σ_{t=1}^T ( λv_t + (λv_t)²/γ )/v_t − Σ_{t=1}^T ( λv_t + (λv_t)²/γ ) / Σ_{s=1}^T v_s ].

This proves the lemma. □
In the uniform volume case v_t ≡ v the estimator simplifies to

  λ̂_T^NB v = (1/T) Σ_{t=1}^T N_t = μ̂_T,

which is the sample mean of the i.i.d. random variables N_t. For the estimate of γ we
obtain in the uniform volume case

  γ̂_T^NB = ( λ̂_T^NB v )² / ( V̂_T² v − λ̂_T^NB v ),  with  V̂_T² v = (1/(T−1)) Σ_{t=1}^T ( N_t − λ̂_T^NB v )² = σ̂_T²,

where the latter term is the sample variance of the i.i.d. random variables N_t. Or in
other words, the proposed estimators in the uniform volume v_t ≡ v case are found
by looking at the system of equations (2.6). In the negative-binomial model this
system is given by

  E[μ̂_T] = μ = λv  and  E[σ̂_T²] = σ² = λv + (λv)²/γ.

Replacing μ and σ² by their sample estimators and solving the system of equations
provides λ̂_T^NB and γ̂_T^NB in the uniform volume case.
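In the uniform volume case both moment estimators are closed-form; a Python sketch using the claims counts of Table 2.2 below (v_t = 10'000):

```python
import statistics

# Uniform volume case: solve E[N] = lam*v and Var(N) = lam*v + (lam*v)^2/gamma
# for (lam, gamma) using the sample mean and the sample variance of N_t.
v = 10_000
counts = [1000, 997, 985, 989, 1056, 1070, 994, 986, 1093, 1054]  # Table 2.2

mean_n = statistics.fmean(counts)
var_n = statistics.variance(counts)      # sample variance, denominator T-1

lam_hat = mean_n / v
# gamma_hat is only meaningful under over-dispersion var_n > mean_n:
gamma_hat = mean_n ** 2 / (var_n - mean_n)
print(round(lam_hat, 5), round(gamma_hat, 1))
```

Here the sample variance exceeds the sample mean, so the data indeed shows over-dispersion relative to the Poisson model.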
2.3.2 Maximum likelihood estimators
Denote by p_k^{(t)}(θ) = P_θ[N_t = k] the probability weights of N_t under the parameter θ. The likelihood and log-likelihood functions of the observation N = (N_1, ..., N_T)' are given by

  L_N(θ) = Π_{t=1}^T p_{N_t}^{(t)}(θ)  and  ℓ_N(θ) = Σ_{t=1}^T log p_{N_t}^{(t)}(θ).

The MLE for θ is based on the rationale that θ should be chosen such that the
probability of observing N = (N_1, ..., N_T)' is maximized. The MLE θ̂_T^MLE for θ
based on the observation N is thus given by (subject to existence and uniqueness)

  θ̂_T^MLE = argmax_θ L_N(θ) = argmax_θ ℓ_N(θ).

Under suitable regularity properties and for a real-valued parameter θ the MLE θ̂_T^MLE is found as a solution of

  (∂/∂θ) ℓ_N(θ) = Σ_{t=1}^T (∂/∂θ) log p_{N_t}^{(t)}(θ) = 0.

This is solved by a root search algorithm. If the probability weights p_k^{(t)}(θ) are sufficiently regular as a function of θ in a regular domain which contains the true parameter θ, then the MLE θ̂_T^MLE is asymptotically unbiased for T → ∞ and under appropriate scaling it has an asymptotic
Gaussian distribution with inverse Fisher's information as covariance matrix; for
details see Theorem 4.1 in Lehmann [70].
In the binomial case, N_t ~ Binom(v_t, p), the MLE p̂_T^MLE solves

  (∂/∂p) ℓ_N(p) = Σ_{t=1}^T ( N_t/p − (v_t − N_t)/(1 − p) ) = 0,

which provides

  p̂_T^MLE = ( Σ_{s=1}^T v_s )^{−1} Σ_{t=1}^T N_t = Σ_{t=1}^T ( v_t / Σ_{s=1}^T v_s ) (N_t/v_t) = p̂_T^MV.

In the Poisson case, N_t ~ Poi(λv_t), the MLE λ̂_T^MLE solves

  (∂/∂λ) ℓ_N(λ) = Σ_{t=1}^T ( −v_t + N_t/λ ) = 0,

which provides

  λ̂_T^MLE = ( Σ_{s=1}^T v_s )^{−1} Σ_{t=1}^T N_t = Σ_{t=1}^T ( v_t / Σ_{s=1}^T v_s ) (N_t/v_t) = λ̂_T^MV.

Thus, in the binomial and the Poisson case the MLEs coincide with the unbiased linear minimal variance estimators of Estimator 2.27.
In the negative-binomial case, with p_t = λv_t/(γ + λv_t), the MLE for (λ, γ) solves

  (∂/∂(λ, γ)) Σ_{t=1}^T [ log binom(N_t + γ − 1, N_t) + γ log(1 − p_t) + N_t log p_t ] = 0.

Unfortunately, this system of equations does not have a closed form solution, and
a root search algorithm is needed to find the MLE solution for (λ, γ), see also page
61 below.
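A crude numerical illustration of such a search: the following Python sketch evaluates the negative-binomial log-likelihood on a grid for γ, keeping λ fixed at the moment estimator (a full MLE would do a joint root search over (λ, γ); the data are those of Table 2.2 below):

```python
import math

# Data of Table 2.2; lam is fixed at the moment estimator, and we scan a
# grid for gamma only (a simplification, not the full joint MLE).
volumes = [10_000] * 10
counts = [1000, 997, 985, 989, 1056, 1070, 994, 986, 1093, 1054]

lam = sum(counts) / sum(volumes)

def loglik(lam, gamma, counts, volumes):
    """Negative-binomial log-likelihood with p_t = lam*v_t/(gamma + lam*v_t);
    log binom(n + gamma - 1, n) is written with math.lgamma."""
    ll = 0.0
    for n, v in zip(counts, volumes):
        p = lam * v / (gamma + lam * v)
        ll += (math.lgamma(gamma + n) - math.lgamma(gamma) - math.lgamma(n + 1)
               + gamma * math.log(1.0 - p) + n * math.log(p))
    return ll

gammas = [50 * k for k in range(1, 201)]       # grid 50, 100, ..., 10'000
gamma_hat = max(gammas, key=lambda g: loglik(lam, g, counts, volumes))
print(gamma_hat)
```

The grid maximum lands at a finite γ, consistent with the over-dispersion of the data; a refined root search around this value would give the exact solution.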
2.3.3 Example and χ²-goodness-of-fit analysis
We observe a strong growth of volume of more than 40% in this insurance portfolio
from v_1982 = 240'755 policies to v_1991 = 344'757 policies. Such a strong growth
might question the stationarity assumption in the expected claims frequency λ_t
because this growth might also reflect a substantial change in the portfolio (and
possibly the underlying product). Nevertheless we assume its validity (because
the observed claims frequencies N_t/v_t do not show any structure such as a linear
trend, see Figure 2.2) and we fit the Poisson and, if necessary, the negative-binomial
distribution to this data set.
Poisson model. We assume that N_t are independent with N_t ~ Poi(λv_t). The
linear minimal variance estimator and the MLE for λ are given by, see Estimator
2.32,

  λ̂_T^MV = λ̂_T^MLE = ( Σ_{s=1982}^{1991} v_s )^{−1} Σ_{t=1982}^{1991} N_t = 5.43%.
  year t    volume v_t    number of claims N_t    frequency N_t/v_t
  1982      240755        13153                   5.46%
  1983      255571        14186                   5.55%
  1984      269739        14207                   5.27%
  1985      281708        13461                   4.78%
  1986      306888        21261                   6.93%
  1987      320265        19934                   6.22%
  1988      323481        15796                   4.88%
  1989      334753        15157                   4.53%
  1990      340265        17483                   5.14%
  1991      344757        19185                   5.56%
  total     3018182       163823                  5.43%

Table 2.1: Private households water insurance: number of policies, claims counts
and observed yearly claims frequencies, source Gisler [54].
The coefficient of variation in the Poisson model is given by, see (2.2),

  Vco(N_t/v_t) = ( λ̂_T^MV v_t )^{−1/2} ≈ 0.8%.
Figure 2.2: Observed yearly claims frequencies N_t/v_t from t = 1982 to 1991 compared to the overall average frequency of 5.43%, see Table 2.1.
Negative-binomial model. For the data in Table 2.1 we obtain the weighted sample variance estimate V̂_T² = 15.84 > 5.43% = λ̂_T^MV. Thus, we have a clear over-dispersion which results in the
estimate

  γ̂_T^NB = 56.23  and  Vco(N_t/v_t) = ( (λ̂_T^NB v_t)^{−1} + (γ̂_T^NB)^{−1} )^{1/2} ≈ 13%.
This makes much more sense in view of the observed frequencies N_t/v_t in Table
2.1. We see that 7/10 of the observations are within these confidence bounds, see
Figure 2.3 (rhs).
We close this section with a statistical test: in the previous example it was obvious
that the Poisson model does not fit to the data. In situations where this is less
obvious we can use the following χ²-goodness-of-fit test.
Null hypothesis H_0: N_t are independent and Poi(λv_t) distributed for t = 1, ..., T.
We are going to build a test statistic for the evaluation of this null hypothesis H_0.
We define

  χ = χ(N) = Σ_{t=1}^T ( N_t/v_t − λ )² / ( λ/v_t ).
It is not straightforward to determine the explicit distribution function of χ.
Therefore, we give an approximate answer to this request of hypothesis testing.
Figure 2.3: Observed yearly claims frequencies N_t/v_t from t = 1982 to 1991 compared to the estimated overall frequency of 5.43%. (lhs): 1 standard deviation confidence bounds in the Poisson case; (rhs): 1 standard deviation confidence bounds in the
negative-binomial case.
The aggregation and disjoint decomposition theorems (Theorems 2.12 and 2.14)
imply that N_t ~ Poi(λv_t) can be understood as a sum of v_t i.i.d. random variables
X_i ~ Poi(λ). That is,

  N_t =(d) Σ_{i=1}^{v_t} X_i,

with E[X_1] = λ and Var(X_1) = λ. But then the CLT (1.2) applies with

  ( N_t/v_t − λ ) / sqrt(λ/v_t) = ( N_t − λv_t ) / sqrt(λv_t) = ( Σ_{i=1}^{v_t} X_i − λv_t ) / sqrt(λv_t) ⇒ N(0, 1)  as v_t → ∞.

Therefore, for large volumes v_t the test statistic χ is approximately χ²-distributed with T degrees of freedom.
In the last step we need to replace the unknown parameter λ by its estimate λ̂_T^MLE.
By doing so, we lose one degree of freedom; thus, we get the test statistic χ̂ and
the corresponding distributional approximation

  χ̂ = Σ_{t=1}^T v_t ( N_t/v_t − λ̂_T^MLE )² / λ̂_T^MLE ≈(d) χ²_{T−1}.   (2.8)
We revisit the previous example. For the data in Table 2.1 we obtain χ̂ = 2'627.
The 99%-quantile of the χ²-distribution with T − 1 = 9 degrees of freedom is given
by 21.67. Since this is by far smaller than χ̂ we reject the null hypothesis H_0 on
the significance level of q = 1%. This, of course, is not surprising in view of Figure
2.3 (lhs).
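The test statistic (2.8) for the data of Table 2.1 can be recomputed in a few lines of Python:

```python
# Test statistic (2.8) for the data of Table 2.1:
volumes = [240755, 255571, 269739, 281708, 306888,
           320265, 323481, 334753, 340265, 344757]
counts = [13153, 14186, 14207, 13461, 21261,
          19934, 15796, 15157, 17483, 19185]

lam_hat = sum(counts) / sum(volumes)
chi = sum(v * (n / v - lam_hat) ** 2 / lam_hat
          for n, v in zip(counts, volumes))
print(round(chi))   # far above 21.67, the 99% quantile of chi^2 with 9 df
```

The resulting value is far beyond the 99% quantile, so the Poisson null hypothesis is clearly rejected.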
Exercise 5. Consider the data given in Table 2.2. Estimate the parameters for
the Poisson and the negative-binomial models. Which model is preferred? Does
a χ²-goodness-of-fit test reject, on the 5% significance level, the null hypothesis of
having Poisson distributions?

  t      1      2      3      4      5      6      7      8      9      10
  N_t    1000   997    985    989    1056   1070   994    986    1093   1054
  v_t    10000  10000  10000  10000  10000  10000  10000  10000  10000  10000

Table 2.2: Observed claims counts N_t and corresponding volumes v_t.
Chapter 3

Individual Claim Size Modeling

In this chapter we consider the modeling of the individual claim sizes Y_i within the compound model

  S = Σ_{i=1}^N Y_i,

where we recall the model assumptions:
1. N is a discrete claims count random variable taking values in N_0;
2. Y_1, Y_2, ... are i.i.d. ~ G with G(0) = 0;
3. N and (Y_1, Y_2, ...) are independent.
3.1 Data analysis and descriptive statistics
The first observation is that the two data sets contain many claims records with
zero claims payments. That is, many of the recorded claims were settled without
any payments. In the case of PP insurance these were about 16% of the reported
claims and in the case of CP insurance we observe about 21% of zero claims. Zero
claims are due to reasons such as: the final claim does not exceed the deductible,
the insurance company is not liable for the claim, another insurance policy covers
the claim, reporting a (small) claim reduces the no-claims-benefit too much so that
the insured decides to withdraw the claim, etc.
We can treat zero claims in two different ways: (i) estimate the proportion of
zero claims separately and add this probability weight to G at 0; or (ii) simply
reduce the expected claims frequency by these zero claims. The first way (i) is
mathematically consistent, but contradicts our model assumption G(0) = 0; the
second way (ii) perfectly fits into the compound Poisson modeling framework due
to the disjoint decomposition Theorem 2.14 (also the binomial and the negative-binomial case can be handled, see Examples 3 and 4). In general, the second
version (ii) is the simpler one to deal with (however, one may lose information by
dropping zero claims). Here, we assume that G(0) = 0 and E[N] = λv, where v > 0
is the portfolio size and N only counts strictly positive claims. Henceforth, after
subtracting these zero claims, we have n = 61'053 strictly positive claims records
in PP insurance and n = 14'532 in CP insurance, denoted by Y_1, ..., Y_n.
We start with the scatter plots of the data, see Figures 3.1 and 3.2. We plot the
individual claim sizes (ordered by arrival date) both on the original scale (lhs) and
on the log scale (rhs). These scatter plots do not offer much information because
they are overloaded; at least they do not show any obvious trends (and therefore
suggest stationarity of the data).

Figure 3.1: Scatter plot of the n = 61'053 strictly positive claims records of PP
insurance ordered by arrival date: original scale (lhs) and log scale (rhs).

We calculate the sample means and the sample variances of the observations, see also (2.5),
  μ̂_n = (1/n) Σ_{i=1}^n Y_i  and  σ̂_n² = (1/(n−1)) Σ_{i=1}^n (Y_i − μ̂_n)².

Figure 3.2: Scatter plot of the n = 14'532 strictly positive claims records of CP
insurance ordered by arrival date: original scale (lhs) and log scale (rhs).
For our data sets we obtain the empirical moments

  PP:  μ̂_n = 3'116,  σ̂_n = 7'534,  Vco_n = 2.42;   (3.1)
  CP:  μ̂_n = 6'850,  σ̂_n = 28'505,  Vco_n = 4.16.   (3.2)
Next we give the histogram for PP insurance, see Figure 3.3 (lhs). We see that
a few large claims distort the whole picture so that the histogram is not helpful.
We could plot a second one only considering small claims. In Figure 3.3 (rhs) we
plot the histogram for logged claim sizes. In Figure 3.4 we give the corresponding box plots.

Figure 3.3: Histogram of the n = 61'053 strictly positive claims records of PP
insurance: original scale (lhs) and log scale (rhs).
Figure 3.4: Box plots of claims records of PP and CP insurance: original scale (lhs)
and log scale (rhs).

They show positive skewness. The ultimate goal is to model the full
distribution functions G(y) = P[Y ≤ y] for the two portfolios PP and CP. Having so
many observations we could directly work with the empirical distribution function
(at least for small claims, see Section 3.4.1) which is given by

  Ĝ_n(y) = (1/n) Σ_{i=1}^n 1_{Y_i ≤ y}.   (3.3)
no
The empirical distribution function of logged claim sizes is given in Figure 3.5
(lhs). For a sequence of observations Y1 , . . . , Yn we denote the ordered sample by
Y(1) Y(2) . . . Y(n) . For the next definitions we assume that Y G has finite
mean. We define the loss size index function and its empirical counterpart by
Ry
NL
I(G(y)) =
0 z
R
0 z
Pbnc
dG(z)
dG(z)
and
Ib
n ()
Y(i)
,
i=1 Yi
= Pi=1
n
for [0, 1]. The loss size index function I(G(y)) chooses a claim size threshold
y and then evaluates the relative expected claim that is explained by claim sizes
below this threshold y. The resulting empirical graphs are presented in Figure 3.5
(rhs). Rather typically in non-life insurance we see that the 20% largest claims
roughly cause 75% of the total claim size! This explains that large claims heavily
influence the total claim amount.
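The empirical loss size index Î_n(α) is easy to compute from the ordered sample. The following Python sketch uses simulated log-normal claims (hypothetical parameters) and reproduces the qualitative statement that the largest 20% of the claims carry most of the total claim amount:

```python
import random

random.seed(5)

# Empirical loss size index: share of the total claim amount generated by
# the 100*alpha% smallest claims, on simulated log-normal claims with
# hypothetical parameters (mu, sigma) = (7.5, 1.5).
ys = sorted(random.lognormvariate(7.5, 1.5) for _ in range(50_000))
total = sum(ys)

def I_hat(alpha):
    k = int(alpha * len(ys))
    return sum(ys[:k]) / total

share_smallest_80 = I_hat(0.8)
print(round(share_smallest_80, 2))   # the 20% largest claims carry the rest
```

For this tail heaviness the smallest 80% of the claims only explain roughly a quarter of the total claim amount.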
We have already seen in the previous figures that large claims may lead to several
modeling challenges. Two plots that especially focus on large claims are the mean
excess plot and the log-log plot. We define the mean excess function and the empirical
mean excess function by

  e(u) = E[ Y − u | Y > u ]  and  ê_n(u) = Σ_{i=1}^n (Y_i − u) 1_{Y_i > u} / Σ_{i=1}^n 1_{Y_i > u}.

The empirical mean excess plot and the empirical log-log plot are then given by the graphs

  u ↦ ê_n(u)  and  y ↦ ( log y, log(1 − Ĝ_n(y)) ).

Figure 3.6: Empirical log-log plot (lhs) and empirical mean excess plot (rhs) of PP
and CP insurance data.

In Figure 3.6 we present the empirical log-log and mean excess plots of the two
data sets. Linear decrease in the log-log plot and linear increase in the mean excess
plot will have the interpretation of heavy tailed distributions in the sense that the
survival function Ḡ = 1 − G is regularly varying at infinity, see (3.4) below.
3.2 Selected parametric claims size distributions

In this section we present a selection of parametric distribution functions G that are commonly used for claim size modeling. For Y ~ G we denote by M_Y(r) the moment generating function (where it exists), by μ_Y and σ_Y² the mean and the variance of Y (if they exist), by Vco(Y) the coefficient of variation, and by Ḡ = 1 − G the survival function.

The survival function Ḡ = 1 − G is called regularly varying at infinity with (tail) index β ∈ [0, ∞) if for all t > 0

  lim_{x→∞} Ḡ(xt)/Ḡ(x) = lim_{x→∞} (1 − G(xt))/(1 − G(x)) = t^{−β},   (3.4)

and in this case we write Ḡ ∈ R_{−β}. If the above holds true for β = 0 then we say Ḡ is slowly
varying at infinity and we write Ḡ ∈ R_0; if the above holds
true for β = ∞ then we say Ḡ is rapidly varying at infinity and we write Ḡ ∈ R_{−∞}.
From an insurance point of view distribution functions G with Ḡ ∈ R_{−β} for some
β ∈ [0, ∞) are dangerous because they have a large potential for big claims, see
Chapter 3 in Embrechts et al. [39]. Therefore, it is crucial to know this index of
regular variation at infinity, see also Remarks 5.17.

3.2.1 Gamma distribution
Some people attribute the gamma distribution to Karl Pearson (1857-1936); however, it seems that already Laplace [69] has used it. We have introduced the gamma
distribution in Section 2.2.4 for the definition of the negative-binomial distribution
and we will also see that this distribution is very useful in the context of generalized
linear models and Bayesian modeling, see Chapters 7, 8 and 9 below.
The gamma distribution has two parameters, a shape parameter γ > 0 and a scale
parameter c > 0. We write Y ~ Γ(γ, c). The distribution of Y has positive
support R_+ with density for y ≥ 0 given by

  g(y) = ( c^γ / Γ(γ) ) y^{γ−1} exp{−cy}.

There is no closed form solution for the distribution function G. For y ≥ 0 it can
only be expressed as

  G(y) = ∫_0^y ( c^γ / Γ(γ) ) x^{γ−1} e^{−cx} dx = (1/Γ(γ)) ∫_0^{cy} z^{γ−1} e^{−z} dz = G(γ, cy),

where G(γ, ·) is the (normalized) incomplete gamma function. From this we see that the family
of gamma distributions is closed towards multiplication with a positive constant,
that is, for β > 0 we have

  βY ~ Γ(γ, c/β).   (3.5)
This property is important when we deal with claims inflation and it explains why
c is called scale parameter. For the moment generating function and the moments
we have

  M_Y(r) = ( c/(c − r) )^γ  for r < c,
  μ_Y = γ/c,  σ_Y² = γ/c²,  Vco(Y) = γ^{−1/2},  ς_Y = 2 γ^{−1/2} > 0,

where ς_Y denotes the skewness of Y.
NL
no
tes
(m
1 I(G(u))
u,
1 G(u)
61
w)
The gamma distribution does not have a regularly varying tail at infinity, see
(m
The method of moments estimators (based on the first two empirical moments) are
given by

  ĉ^MM = μ̂_n / σ̂_n²  and  γ̂^MM = μ̂_n² / σ̂_n².

For the MLE we have log-likelihood function, set Y = (Y_1, ..., Y_n)',

  ℓ_Y(γ, c) = Σ_{i=1}^n [ γ log c − log Γ(γ) + (γ − 1) log Y_i − c Y_i ].

The MLE γ̂^MLE is found as the solution of

  log γ − log μ̂_n − Γ'(γ)/Γ(γ) + (1/n) Σ_{i=1}^n log Y_i = 0,   (3.6)

which requires a root search algorithm, and then

  ĉ^MLE = γ̂^MLE / μ̂_n.
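Equation (3.6) is easily solved by bisection since log γ − Γ'(γ)/Γ(γ) is strictly decreasing. The following Python sketch fits simulated gamma claims, approximating the digamma function ψ = Γ'/Γ by a central difference of math.lgamma:

```python
import math
import random

random.seed(3)

# Simulated gamma claims with hypothetical true (gamma, c) = (2.5, 1.5);
# note that random.gammavariate takes (shape, scale) with scale = 1/c.
ys = [random.gammavariate(2.5, 1.0 / 1.5) for _ in range(100_000)]

mean_y = sum(ys) / len(ys)
mean_log = sum(math.log(y) for y in ys) / len(ys)

def digamma(x, h=1e-5):
    """Central-difference approximation of psi(x) = Gamma'(x)/Gamma(x)."""
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2.0 * h)

def score(g):
    """Left-hand side of (3.6); strictly decreasing in g."""
    return math.log(g) - math.log(mean_y) - digamma(g) + mean_log

lo, hi = 1e-3, 1e3
for _ in range(80):                     # bisection on the decreasing score
    mid = (lo + hi) / 2.0
    lo, hi = (mid, hi) if score(mid) > 0.0 else (lo, mid)

gamma_hat = (lo + hi) / 2.0
c_hat = gamma_hat / mean_y
print(round(gamma_hat, 2), round(c_hat, 2))
```

The bisection recovers the true shape and scale up to sampling error of the MLE.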
Example 3.2 (gamma distribution for PP data). We fit the gamma distribution to the PP insurance data displayed in Figure 3.1.

Figure 3.8: Gamma distribution with MM and MLE fits applied to the PP insurance data. lhs: QQ plot; rhs: loss size index function.

Figure 3.9: Gamma distribution with MM and MLE fits applied to the PP insurance data. lhs: log-log plot; rhs: mean excess plot.
From Figures 3.8 and 3.9 we immediately conclude that the gamma model does not
fit to the PP data. The reason is that the data is more heavy tailed. This can be
seen in the QQ plot in Figure 3.8 (lhs): the data at the right end of the distribution
lie substantially above the line. The MM estimators manage to model the data up
to some layer, the MLE estimators, however, are heavily distorted by the small
claims which can be seen in the mean excess plot in Figure 3.9 (rhs). In fact, we
have too many small claims (observations below 1500) to be explained by a gamma
distribution. The MLE is heavily based on these small observations, in Figure 3.8
(rhs) and Figure 3.9 (lhs) we see that MLE fits well for small claims, whereas MM
provides more appropriate results in the upper range of the data. Summarizing, we
should choose more heavy tailed distribution functions to model this data and the
resulting figures are already sufficient for rejecting the gamma model. This first
data example also indicates that probably there is not one distribution that fits all
claims layers. We come back to this in Section 3.4.
Remark 3.3 (inverse Gaussian distribution). A distribution function which is also
found quite often in the actuarial literature is the inverse Gaussian distribution,
see for instance Section 3.9.6 in Kaas et al. [64]. Its density is for y ≥ 0 given by

  g(y) = ( α / sqrt(2πc) ) y^{−3/2} exp{ −(α − cy)²/(2cy) } = ( α / sqrt(2πc) ) y^{−3/2} exp{ −( α²/(2cy) − α + cy/2 ) },

where α > 0 is a shape parameter and c > 0 a scale parameter. Observe that this
density behaves similarly to the gamma density for y → ∞. For the distribution
function we have a closed form solution in the following (weak) sense

  G(y) = Φ( −α/sqrt(cy) + sqrt(cy) ) + e^{2α} Φ( −α/sqrt(cy) − sqrt(cy) ),

where Φ(·) is the standard Gaussian distribution. This can be checked by calculating the derivative of the latter. For the moment generating function and the
moments we have

  M_Y(r) = exp{ α ( 1 − sqrt(1 − 2r/c) ) }  for r ≤ c/2,
  μ_Y = α/c,  σ_Y² = α/c²,  Vco(Y) = α^{−1/2},  ς_Y = 3 α^{−1/2} > 0.

The MLEs are given by

  α̂^MLE = [ ( (1/n) Σ_{i=1}^n Y_i^{−1} ) ( (1/n) Σ_{i=1}^n Y_i ) − 1 ]^{−1}  and  ĉ^MLE = α̂^MLE / μ̂_n.
3.2.2 Weibull distribution

[Portrait: E.H.W. Weibull]

The Weibull distribution has two parameters, a shape parameter τ > 0 and a scale parameter c > 0. We write Y ~ Weibull(τ, c); the distribution function is for y ≥ 0 given by

  G(y) = 1 − exp{ −(cy)^τ }.

This still does not have a regularly varying tail at infinity, but for τ < 1 the decay of the
survival function Ḡ is slower than in the gamma case, see also Table 3.4.4. The mean, the variance, the expected value in layers, the loss size index function and the mean excess function are given by

  μ_Y = Γ(1 + 1/τ)/c,
  σ_Y² = ( Γ(1 + 2/τ) − Γ(1 + 1/τ)² )/c²,
  E[ Y 1_{u_1 < Y ≤ u_2} ] = ( Γ(1 + 1/τ)/c ) [ G(1 + 1/τ, (cu_2)^τ) − G(1 + 1/τ, (cu_1)^τ) ],
  I(G(y)) = G(1 + 1/τ, (cy)^τ),
  e(u) = ( Γ(1 + 1/τ)/c ) ( 1 − G(1 + 1/τ, (cu)^τ) ) / exp{ −(cu)^τ } − u.
For generating Weibull random numbers observe that we have the identity

  Y =(d) (1/c) Z^{1/τ}  with  Z ~ expo(1) = Γ(1, 1).

The R code for the Γ(1, 1) distribution is

> rgamma(n, shape=1, rate=1)
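The same identity can be used in any language; a Python sketch (parameters hypothetical) that checks the simulated mean against Γ(1 + 1/τ)/c:

```python
import math
import random
import statistics

random.seed(9)

# Weibull sampling via Y = Z^(1/tau) / c with Z ~ expo(1); we compare the
# empirical mean with the theoretical mean Gamma(1 + 1/tau)/c.
tau, c = 0.7, 1.0 / 2000.0
ys = [random.expovariate(1.0) ** (1.0 / tau) / c for _ in range(100_000)]

mean_theory = math.gamma(1.0 + 1.0 / tau) / c
mean_emp = statistics.fmean(ys)
print(round(mean_emp / mean_theory, 2))
```

The ratio of empirical to theoretical mean is close to one for large samples.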
The method of moments estimators are given by

  ĉ^MM = Γ(1 + 1/τ̂^MM) / μ̂_n  and  σ̂_n²/μ̂_n² + 1 = Γ(1 + 2/τ̂^MM) / Γ(1 + 1/τ̂^MM)²,

where the second equation is solved for τ̂^MM by a root search.
The MLE (τ̂^MLE, ĉ^MLE) is found as the solution of the system

  c = [ (1/n) Σ_{i=1}^n Y_i^τ ]^{−1/τ}  and  (τ/n) Σ_{i=1}^n log(cY_i) ( (cY_i)^τ − 1 ) = 1,

which again requires a root search algorithm.
Example 3.4 (Weibull distribution for PP data). We fit the Weibull distribution to the PP insurance data
displayed in Figure 3.1. From Figures 3.11 and 3.12 we see that the Weibull model
gives a better fit to the PP data compared to the gamma model. The reason is
that it allows for more probability mass in the upper tail of the distribution; the
estimate for τ is in the interval (0.5, 0.75). The MM estimators manage to model
the data up to some layer. The MLE estimators, however, are still distorted by
the big mass of small claims, which can be seen in the mean excess plot in Figure
3.12 (rhs). Summarizing, we should choose even more heavy tailed distributions to
model this data, and we should carefully treat (and probably separate) small and
large claims.

Figure 3.11: Weibull distribution with MM and MLE fits applied to the PP insurance data. lhs: QQ plot; rhs: loss size index function.
3.2.3 Log-normal distribution

Figure 3.12: Weibull distribution with MM and MLE fits applied to the PP insurance data. lhs: log-log plot; rhs: mean excess plot.

Making the tail of the distribution function heavier than the Weibull distribution
tail leads us to the log-normal distribution. The log-normal distribution has two
parameters, a mean parameter μ ∈ R and a standard deviation parameter σ > 0. Its density is for y > 0 given by

  g(y) = ( 1 / ( sqrt(2π) σ y ) ) exp{ −(log y − μ)² / (2σ²) }.
NL
log y
G(y) =
,
68
o
= exp + 2 /2 ,
Y2
= exp 2 + 2
Vco(Y ) =
exp{ 2 } 1 ,
1/2
exp{ 2 } 1
exp{ 2 } + 2
,
1/2
exp{ 2 } 1
> 0.
The moment generating function M_Y(r) does not exist for r > 0. The expected value in layers, the loss size index function and the mean excess function are given by

  E[ Y 1_{u_1 < Y ≤ u_2} ] = μ_Y [ Φ( (log u_2 − (μ + σ²))/σ ) − Φ( (log u_1 − (μ + σ²))/σ ) ],
  I(G(y)) = Φ( (log y − (μ + σ²))/σ ),
  e(u) = μ_Y ( 1 − Φ( (log u − (μ + σ²))/σ ) ) / ( 1 − Φ( (log u − μ)/σ ) ) − u.
The method of moments estimators are given by

  σ̂^MM = [ log( σ̂_n²/μ̂_n² + 1 ) ]^{1/2}  and  μ̂^MM = log μ̂_n − (σ̂^MM)²/2,

and the MLEs are given by

  μ̂^MLE = (1/n) Σ_{i=1}^n log Y_i  and  (σ̂^MLE)² = (1/n) Σ_{i=1}^n ( log Y_i − μ̂^MLE )².
no
tes
(m
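The two log-normal fitting procedures can be sketched as follows (a Python sketch rather than the R used in these notes; all names and parameter values are ours):

```python
import math, random

rng = random.Random(2)
mu0, sigma0, n = 7.0, 1.0, 50000
ys = [math.exp(rng.gauss(mu0, sigma0)) for _ in range(n)]   # simulated LN claims

# method of moments: match the sample mean and variance of Y itself
mean_y = sum(ys) / n
var_y = sum((y - mean_y)**2 for y in ys) / n
sigma2_mm = math.log(var_y / mean_y**2 + 1.0)
mu_mm = math.log(mean_y) - sigma2_mm / 2.0

# MLE: sample mean and sample variance of log Y
logs = [math.log(y) for y in ys]
mu_mle = sum(logs) / n
sigma2_mle = sum((x - mu_mle)**2 for x in logs) / n
```

Both approaches recover (\mu_0, \sigma_0) on simulated data; the MM version is noticeably noisier because the sample variance of a heavy-tailed Y fluctuates much more than the moments of \log Y.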
Example 3.5 (log-normal distribution for PP data). We fit the PP insurance data displayed in Figure 3.1. In Figures 3.14 and 3.15 we present the results.

Figure 3.14: Log-normal distribution with MM and MLE fits applied to the PP insurance data. lhs: QQ plot; rhs: loss size index function.

We observe that the log-normal distribution gives quite a good fit. We give some comments on the plots: the MM estimator looks convincing because the observations match the lines quite well in the QQ plot. The only things that slightly disturb the picture are the three largest observations, see QQ plot. It seems that they are less heavy-tailed than the log-normal distribution suggests. This is also the reason why the empirical mean excess plot deviates from the log-normal distribution, see Figure 3.15 (rhs). A little bit puzzling is the bad performance of the MLE. The reason is again that more than 50% of the claims are less than 1500. The MLE is therefore very much based on these small observations and provides a good fit in that range of observations, but it gives a bad fit for larger claims. We conclude from this that the PP data set should be modeled with different distributions in different layers. The reason for this heterogeneity is that PP insurance contracts have different modules such as theft, water damage, fire, etc., and it is recommended (if the data allows) to model each of these modules separately. This may also explain the abnormalities in the log-log plot, because these different modules, in general, have different maximal covers.

Figure 3.15: Log-normal distribution with MM and MLE fits applied to the PP insurance data. lhs: log-log plot; rhs: mean excess plot. Note that the small hump in the empirical distribution is at CHF 3000, which is probably induced by a maximal cover for a particular risk factor.

3.2.4 Log-gamma distribution
The log-gamma distribution is more heavy-tailed than the log-normal distribution and is obtained by assuming that \log Y \sim \Gamma(\gamma, c) for positive parameters \gamma and c. The density for y \ge 1 is given by

g(y) = \frac{c^\gamma}{\Gamma(\gamma)} (\log y)^{\gamma - 1} \, y^{-(c+1)}.

The moments are given by

\mu_Y = \left( \frac{c}{c-1} \right)^\gamma   for c > 1,
\sigma_Y^2 = \left( \frac{c}{c-2} \right)^\gamma - \mu_Y^2   for c > 2,
\varsigma_Y = \frac{ \left( \frac{c}{c-3} \right)^\gamma - 3 \mu_Y \sigma_Y^2 - \mu_Y^3 }{ \sigma_Y^3 }   for c > 3.
Moreover, M_Y(r) = \infty for r > 0. For the layers we have, with G(\gamma, \cdot) denoting the \Gamma(\gamma, 1) distribution function,

E\left[ Y 1_{\{u_1 < Y \le u_2\}} \right] = \left( \frac{c}{c-1} \right)^\gamma \left[ G(\gamma, (c-1) \log u_2) - G(\gamma, (c-1) \log u_1) \right],

I(G(y)) = G(\gamma, (c-1) \log y),

e(u) = \left( \frac{c}{c-1} \right)^\gamma \frac{1 - G(\gamma, (c-1) \log u)}{1 - G(\gamma, c \log u)} - u.

The method of moments estimators are obtained from

\hat{\gamma}^{MM} = \frac{\log \hat{\mu}_n}{\log\left( \hat{c}^{MM} / (\hat{c}^{MM} - 1) \right)}   and   \hat{\sigma}_n^2 + \hat{\mu}_n^2 = \left( \frac{\hat{c}^{MM}}{\hat{c}^{MM} - 2} \right)^{\hat{\gamma}^{MM}},

where the latter is solved numerically using, e.g., the R command uniroot(). The MLE is obtained analogously to the MLE for gamma observations by simply replacing Y_i by log Y_i.
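A numerical analogue of the uniroot() step can be sketched in Python with a simple bisection (the helper name `loggamma_mm` and the bracketing interval are our own choices; the sketch assumes \hat{\mu}_n > 1, which holds since the log-gamma distribution lives on [1, \infty)):

```python
import math

def loggamma_mm(mu_n, sigma2_n):
    """Solve the log-gamma MM equations numerically (bisection in c > 2):
    gamma = log(mu_n)/log(c/(c-1)) and sigma2_n + mu_n^2 = (c/(c-2))^gamma."""
    second = lambda c: (c / (c - 2.0))**(math.log(mu_n) / math.log(c / (c - 1.0))) \
                       - (sigma2_n + mu_n**2)
    lo, hi = 2.0 + 1e-9, 1e6     # second() decreases in c (small c = heavy tail)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if second(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    c = 0.5 * (lo + hi)
    return math.log(mu_n) / math.log(c / (c - 1.0)), c

# consistency check: the moments implied by (gamma0, c0) are recovered exactly
gamma0, c0 = 2.0, 5.0
mu = (c0 / (c0 - 1.0))**gamma0
sigma2 = (c0 / (c0 - 2.0))**gamma0 - mu**2
g_hat, c_hat = loggamma_mm(mu, sigma2)
```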
Figure 3.17: Log-gamma distribution with MM and MLE fits applied to the PP insurance data. lhs: QQ plot; rhs: loss size index function.

Figure 3.18: Log-gamma distribution with MM and MLE fits applied to the PP insurance data. lhs: log-log plot; rhs: mean excess plot.
3.2.5 Pareto distribution

[Portrait: V.F.D. Pareto]

The Pareto distribution specifies a (large claims) threshold \theta > 0 and then only models claims above this threshold, see also Example 2.16. The claims above this threshold are assumed to have regularly varying tails with tail index \alpha > 0. For Y \sim Pareto(\theta, \alpha), the density for y \ge \theta is given by

g(y) = \frac{\alpha}{\theta} \left( \frac{y}{\theta} \right)^{-(\alpha+1)}.

We have closedness towards multiplication with a positive constant, that is, for \lambda > 0 we have

\lambda Y \sim Pareto(\lambda \theta, \alpha).    (3.7)
Moreover, M_Y(r) = \infty for r > 0, and

\mu_Y = \theta \, \frac{\alpha}{\alpha - 1}   for \alpha > 1,
\sigma_Y^2 = \theta^2 \, \frac{\alpha}{(\alpha - 1)^2 (\alpha - 2)}   for \alpha > 2,
\varsigma_Y = \frac{2(1 + \alpha)}{\alpha - 3} \left( \frac{\alpha - 2}{\alpha} \right)^{1/2}   for \alpha > 3.

For the layers we obtain

E\left[ Y 1_{\{u_1 < Y \le u_2\}} \right] = \theta \, \frac{\alpha}{\alpha - 1} \left[ \left( \frac{u_1}{\theta} \right)^{-\alpha+1} - \left( \frac{u_2}{\theta} \right)^{-\alpha+1} \right]   for \alpha \ne 1,

I(G(y)) = 1 - \left( \frac{y}{\theta} \right)^{-\alpha+1}   for \alpha > 1,

e(u) = \frac{u}{\alpha - 1}   for \alpha > 1 and u \ge \theta.
As soon as we only study tails of distributions we should use MLEs for parameter estimation (the method of moments is not sufficiently robust against outliers). Since the threshold \theta has a natural meaning we only need to estimate \alpha. The MLE is given by

\hat{\alpha}^{MLE} = \left( \frac{1}{n} \sum_{i=1}^n \left( \log Y_i - \log \theta \right) \right)^{-1}.

For Y_1, \ldots, Y_n i.i.d. \sim Pareto(\theta, \alpha) we have

E\left[ \hat{\alpha}^{MLE} \right] = \frac{n}{n-1} \, \alpha   and   Var\left( \hat{\alpha}^{MLE} \right) = \frac{n^2}{(n-1)^2 (n-2)} \, \alpha^2.
These statements follow because the \log(Y_i/\theta) are i.i.d. \sim expo(\alpha), and hence \frac{1}{n} \sum_{i=1}^n \log(Y_i/\theta) \stackrel{(d)}{=} Z with Z \sim \Gamma(n, n\alpha). For k < n this implies

E\left[ (\hat{\alpha}^{MLE})^k \right] = \int_0^\infty z^{-k} \, \frac{(n\alpha)^n}{\Gamma(n)} \, z^{n-1} e^{-n\alpha z} \, dz = \frac{\Gamma(n-k)}{\Gamma(n)} \, (n\alpha)^k,    (3.8)

and the formulas for the mean and the variance follow for k = 1 and k = 2.
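The bias E[\hat{\alpha}^{MLE}] = \frac{n}{n-1}\alpha can be checked by simulation; here is a small Python sketch (all names and parameter values are ours):

```python
import math, random

rng = random.Random(3)
theta, alpha, n, reps = 500000.0, 2.5, 10, 20000   # small n makes the bias visible

est = []
for _ in range(reps):
    # Pareto sampling by inversion: Y = theta * U^(-1/alpha), U uniform on (0,1]
    ys = [theta * (1.0 - rng.random())**(-1.0 / alpha) for _ in range(n)]
    est.append(n / sum(math.log(y / theta) for y in ys))    # MLE of alpha

mean_est = sum(est) / reps                       # approx n/(n-1) * alpha = 2.777...
mean_corr = (n - 1) / n * mean_est               # bias-corrected, approx alpha = 2.5
```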
The Hill estimator for \alpha is defined by

\hat{\alpha}^H_{k,n} = \left( \frac{1}{k-1} \sum_{i=1}^{k-1} \log Y_{(i)} - \log Y_{(k)} \right)^{-1},

where Y_{(1)} \ge Y_{(2)} \ge \ldots \ge Y_{(n)} denote the ordered observations. The Hill estimator is based on the rationale that the Pareto distribution is closed towards increasing thresholds, i.e. for Y \sim Pareto(\theta_0, \alpha) and \theta_1 > \theta_0 we have for all y \ge \theta_1

P[ Y > y \,|\, Y \ge \theta_1 ] = \frac{(y/\theta_0)^{-\alpha}}{(\theta_1/\theta_0)^{-\alpha}} = \left( \frac{y}{\theta_1} \right)^{-\alpha}.

Therefore, if the data comes from a Pareto distribution we should observe stability in \hat{\alpha}^H_{k,n} for changing k. The confidence bounds of the Hill estimators are determined by Lemma 3.7.
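The stability rationale can be illustrated numerically; a Python sketch (the helper name `hill` and all parameter values are ours):

```python
import math, random

def hill(ys, k):
    """Hill estimator from the k largest observations Y_(1) >= ... >= Y_(k):
    alpha_hat = ( (1/(k-1)) * sum_{i<k} log Y_(i) - log Y_(k) )^(-1)."""
    top = sorted(ys, reverse=True)[:k]
    mean_log_excess = sum(math.log(y) for y in top[:k - 1]) / (k - 1) - math.log(top[k - 1])
    return 1.0 / mean_log_excess

rng = random.Random(4)
alpha0 = 2.5
# exact Pareto(1, 2.5) sample: every threshold gives the same tail index
ys = [(1.0 - rng.random())**(-1.0 / alpha0) for _ in range(20000)]
hill_plot = [hill(ys, k) for k in (100, 500, 2000, 5000)]   # should be stable near 2.5
```

For exact Pareto data the Hill plot is flat around \alpha_0; for non-Pareto tails it drifts with k, which is exactly the "Hill horror plot" phenomenon discussed below.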
Example 3.8 (Pareto for extremes of PP insurance). We start the analysis with the PP insurance data. To perform this large claims analysis we choose only the largest claims of Figure 3.1. The Hill plot k \mapsto \hat{\alpha}^H_{k,n} is given in Figure 3.20 (together with confidence bounds of 1 standard deviation, estimated by Lemma 3.7). We observe a fairly stable picture in k around the value \alpha = 2.5 up to the largest 100 claims. For larger claims the Hill estimator moves up to 4 or 5, which (once more) indicates that the tail of the largest observations is not really heavy-tailed. This is similar to the log-normal and the log-gamma fit. Sidney Ira Resnick [86] has called this phenomenon the Hill horror plot; it stems from the difficulty that the Hill estimator cannot correctly adjust to non-Pareto-like tails. The right-hand side of Figure 3.20 gives the log-log plot for \alpha = 2.5; in accordance with the Hill plot we see that the slope of the data is slightly less than this value for smaller claims, but the data becomes less heavy-tailed further out in the tails. This also becomes obvious from the mean excess plot and the QQ plot in Figure 3.21.

[Portrait: S.I. Resnick]
Figure 3.20: PP insurance data; lhs: Hill plot k \mapsto \hat{\alpha}^H_{k,n} with confidence bounds of 1 standard deviation; rhs: log-log plot for \alpha = 2.5.

Figure 3.21: PP insurance data, largest claims only; lhs: QQ plot; rhs: mean excess plot for \alpha = 2.5.
Example 3.9 (Pareto for extremes of CP insurance). In a second analysis we
examine the extremes of the CP claims data of Figure 3.2. The results are presented
in Figure 3.22. At first sight they look similar to the PP insurance example,
i.e. they begin to destabilize between the 150 and 100 largest claims. However, the
main difference is that the tail index is much smaller in the CP example. That is,
there is a higher potential for large claims for this line of business.
Example 3.10 (nuclear power accident example). We revisit the nuclear power
accident data set studied in Hofert-Wüthrich [60], see also Sovacool [94].
Figure 3.22: CP insurance data; lhs: Hill plot k \mapsto \hat{\alpha}^H_{k,n} with a confidence interval of 1 standard deviation; rhs: log-log plot for \alpha = 1.4.
In Figure 3.23 we display the logged claim sizes and their empirical distribution function.

Figure 3.23: 61 largest nuclear power accidents until 2011; lhs: logged claim sizes (in chronological order, the last entry is Fukushima 2011); rhs: empirical distribution function of claim sizes.

In Figure 3.24 we provide the Hill plot. We observe that this data is very heavy-tailed. The Hill plot suggests to set the tail index around 0.64, which means that we have an infinite mean model. The log-log plot in Figure 3.24 shows that this tail index choice captures the slope quite well.

Figure 3.24: 61 largest nuclear power accidents until 2011; lhs: Hill plot k \mapsto \hat{\alpha}^H_{k,n} with confidence bounds of 1 standard deviation; rhs: log-log plot for \alpha = 0.64.
Exercise 8. Natural hazards in Switzerland are covered by the so-called Schweizerische Elementarschaden-Pool (ES-Pool). This is a pool of private Swiss insurance companies which organizes the diversification of natural hazards in Switzerland. For the pricing of these natural hazards one distinguishes between small events and large events, the latter having a total claim amount exceeding CHF 50 millions per event. The following 15 storm and flood events have been observed in the years 1986-2005 (these are the events with a total claim amount exceeding CHF 50 millions).

date          amount in CHF mio.      date          amount in CHF mio.
20.06.1986     52.8                   18.05.1994     78.5
18.08.1986    135.2                   18.02.1999     75.3
18.07.1987     55.9                   12.05.1999    178.3
23.08.1987    138.6                   26.12.1999    182.8
26.02.1990    122.9                   04.07.2000     54.4
21.08.1992     55.8                   13.10.2000    365.3
24.09.1993    368.2                   20.08.2005   1051.1
08.10.1993     83.8

[Photo: Storm Lothar, 26.12.1999]

What is the probability that we observe a storm and flood event next year which exceeds the level of M = 2 billions CHF?
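One possible line of attack (a sketch, not necessarily the intended solution of the exercise): fit a Pareto distribution with threshold \theta = 50 to the 15 events by MLE and combine the exceedance probability with a Poisson frequency of large events. In Python:

```python
import math

# amounts in CHF mio. of the 15 large storm/flood events (threshold theta = 50)
amounts = [52.8, 135.2, 55.9, 138.6, 122.9, 55.8, 368.2, 83.8,
           78.5, 75.3, 178.3, 182.8, 54.4, 365.3, 1051.1]
theta = 50.0
n = len(amounts)
alpha_hat = n / sum(math.log(y / theta) for y in amounts)   # Pareto MLE of the tail index

# probability that a single large event exceeds M = 2000 mio. (= CHF 2 bn)
p_exceed = (2000.0 / theta)**(-alpha_hat)

# 15 large events in the 20 years 1986-2005 -> Poisson rate per year
lam = n / 20.0
p_next_year = 1.0 - math.exp(-lam * p_exceed)   # at least one such event next year
```

The fitted tail index is close to 1, i.e. a very heavy tail, and the resulting probability is of the order of 1-2% per year.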
3.3 Model selection

In the previous section we have presented different distributions for claim size modeling and we have debated which one fits best to the observed data. The argumentation was completely based on graphical tools like log-log plots. Graphical tools are important, but in statistics there are also methodological tools that consider these questions from a more analytical point of view. Two commonly used tests are the Kolmogorov-Smirnov (KS) test and the Anderson-Darling (AD) test. These two tests are discussed in Sections 3.3.1 and 3.3.2. In Section 3.3.3 we give the \chi^2-goodness-of-fit test and we discuss the Akaike information criterion (AIC) as well as the Bayesian information criterion (BIC).

3.3.1 Kolmogorov-Smirnov test

[Portrait: A.N. Kolmogorov]
Assume we have an i.i.d. sequence Y_1, Y_2, \ldots from an unknown continuous distribution function G, and denote the corresponding empirical distribution function of finite sample size n by \hat{G}_n, see also (3.3). We would like to test whether these samples Y_1, Y_2, \ldots may stem from G_0. Consider the null hypothesis H_0: G = G_0 against the two-sided alternative hypothesis that these distribution functions differ. We define the KS test statistics by

D_n = D_n(Y_1, \ldots, Y_n) = \left\| \hat{G}_n - G_0 \right\|_\infty = \sup_y \left| \hat{G}_n(y) - G_0(y) \right|.

[Portrait: N.V. Smirnov]

The KS test statistics has the property, see (13.4) in Billingsley [12],

\sqrt{n} \, D_n \to K   in distribution as n \to \infty,

where K has the Kolmogorov distribution

P[K \le y] = 1 - 2 \sum_{j=1}^\infty (-1)^{j+1} \exp\left\{ -2 j^2 y^2 \right\}.

This provides the following critical values for \sqrt{n} D_n at the given significance levels:

significance level   20%    10%    5%     2%     1%
critical value       1.07   1.22   1.36   1.52   1.63
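The test statistics D_n and the limiting Kolmogorov distribution can be computed directly; a Python sketch (helper names are ours; D_n is evaluated at the jump points of \hat{G}_n):

```python
import math, random

def ks_statistic(ys, G0):
    """D_n = sup_y |G_n(y) - G0(y)|, attained at the jump points of G_n."""
    ys = sorted(ys)
    n = len(ys)
    d = 0.0
    for i, y in enumerate(ys):
        g = G0(y)
        d = max(d, abs(g - i / n), abs(g - (i + 1) / n))
    return d

def kolmogorov_cdf(y, terms=100):
    """P[K <= y] = 1 - 2 * sum_j (-1)^(j+1) exp(-2 j^2 y^2)."""
    return 1.0 - 2.0 * sum((-1)**(j + 1) * math.exp(-2.0 * j * j * y * y)
                           for j in range(1, terms + 1))

rng = random.Random(5)
n = 2000
us = [rng.random() for _ in range(n)]       # data actually drawn from G0 = uniform
dn = ks_statistic(us, lambda y: y)
reject_5pct = math.sqrt(n) * dn > 1.36      # compare with the 5% critical value
```

Note that kolmogorov_cdf(1.36) is approximately 0.95, consistent with the 5% critical value in the table above.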
Figure 3.24: KS test statistics for method of moments and MLE fits applied to the PP insurance data; lhs: log-normal distribution; rhs: log-gamma distribution.
Example 3.11 (KS test, PP insurance data). We apply the KS test to the log-normal and the log-gamma fits of the PP insurance data, see Examples 3.5 and 3.6. In the log-normal case we obtain for the MLE fit D_n = 0.05 and for the method of moments fit D_n = 0.12. These values are far too large compared to the large sample size of n = 61 053, and the KS test clearly rejects the null hypothesis of having a log-normal distribution on the 1% significance level. If we look at Figure 3.24 (lhs) we see that these big values of the KS test statistics are driven by small claims, i.e. we obtain a bad fit for small claims; the tails, however, do not look too bad. The log-gamma fit looks better than the log-normal fit, see Figure 3.24 (note that the y-axes have different scales in the two plots). It provides KS test statistics D_n = 0.04 for the MLE fit and D_n = 0.06 for the method of moments fit. These values are still far too large, so H_0 is also rejected on the 1% significance level.

Conclusion. The claim size modeling should be split into different claim size layers.
Example 3.12 (KS test, tail distribution). In this example we investigate the tail fits of the Pareto distributions in the CP and the PP examples for the n = 505 largest claims, see Examples 3.8 and 3.9. The results are presented in Figure 3.25. For the PP insurance data we obtain D_n = 0.027 (for \alpha = 2.5) and for the CP insurance data we obtain D_n = 0.061 (for \alpha = 1.4). The first value is sufficiently small so that the null hypothesis cannot be rejected on the 5% significance level; the CP insurance value is just about at the critical value on the 5% significance level, i.e. the resulting p-value is just about 5%. The plot of the point-wise terms of D_n looks fine for the PP insurance data; however, the graph for the CP insurance data looks a bit one-sided, suggesting two different regimes (this can also be seen from Figure 3.22).
Figure 3.25: Point-wise terms of KS test statistics for MLE fits applied to the 505
largest claims; lhs: PP insurance data; rhs: CP insurance data.
3.3.2 Anderson-Darling test

The two statisticians Theodore Wilbur Anderson and Donald Allan Darling have developed a modification of the KS test, the so-called AD test, which gives more weight to the tails of the distributions. It is therefore more sensitive in detecting tail fits, but on the other hand it has the disadvantage of not being non-parametric, and critical values need to be calculated for every chosen distribution function.

[Portraits: T.W. Anderson, D.A. Darling]

The KS test statistics is modified by the introduction of a weight function \psi: [0, 1] \to R_+ which modifies the KS test statistics D_n as follows:

\sup_y \left| \hat{G}_n(y) - G_0(y) \right| \sqrt{\psi(G_0(y))}.

Different choices of \psi allow to weight different regions of the support of the distribution function differently; the KS test statistics is obtained by \psi \equiv 1. The choice proposed by Anderson and Darling is \psi(t) = (t(1-t))^{-1} in order to investigate the tails of the distributions.

In contrast to the maximal difference between the empirical distribution function \hat{G}_n and the null hypothesis G_0 we could also consider a weighted L^2-distance. This leads to the test statistics

A_n^2 = n \int_R \frac{ \left( \hat{G}_n(y) - G_0(y) \right)^2 }{ G_0(y) (1 - G_0(y)) } \, dG_0(y).    (3.9)

3.3.3 \chi^2-goodness-of-fit test, AIC and BIC

[Portrait: K. Pearson]

There are many other criteria that can be applied for testing fits and distributional choices. Many of them are based on asymptotic normality. For instance, a \chi^2-goodness-of-fit test splits the support of the null hypothesis distribution function G_0 into K disjoint intervals I_k = [c_k, c_{k+1}), k = 1, \ldots, K. Then the data is grouped according to these intervals, i.e. O_k counts the number of observed realizations Y_1, \ldots, Y_n in interval I_k and E_k denotes the expected number of observations in I_k according to the distribution function G_0. The test statistics of n observations is then defined by

X^2_{n,K} = \sum_{k=1}^K \frac{(O_k - E_k)^2}{E_k}.

If d parameters were estimated in G_0, then X^2_{n,K} is compared to a \chi^2-distribution with K - 1 - d degrees of freedom, see also Exercise 2 on page 22. Often it is suggested that we should have E_k > 4 for reasonable results. However, these rules of thumb are not very reliable. This \chi^2-goodness-of-fit test is sometimes also called Pearson's \chi^2-test, named after Karl Pearson (1857-1936), who investigated this test in 1900.
[Portrait: H. Akaike]

The Akaike information criterion (AIC) compares two models with densities g_1 and g_2 via

AIC^{(i)} = -2 \, \ell_Y^{(i)} + 2 \, d^{(i)},

where \ell_Y^{(i)} is the log-likelihood function of density g_i for data Y and d^{(i)} denotes the number of estimated parameters in g_i, for i = 1, 2. For MLE we maximize \ell_Y^{(i)}, and in order to evaluate the AIC we penalize the model for having too many parameters. The AIC then says that the model with the smallest AIC value should be preferred.

The BIC uses a different penalty term for the number of parameters (all these penalty terms are motivated by asymptotic results). It reads as

BIC^{(i)} = -2 \, \ell_Y^{(i)} + d^{(i)} \log n,

and the model with the smallest BIC value should be preferred.

Figure 3.26: Akaike's original hand notes on the AIC (lhs) at the Institute of Statistical Mathematics in Tokyo, Japan (rhs).
Exercise 9 (AIC and BIC). Assume we have i.i.d. claim sizes Y = (Y_1, \ldots, Y_n)' with n = 1000 which were generated by a gamma distribution, see Figure 3.27. The sample mean and sample standard deviation are given by

\hat{\mu}_n = 0.1039   and   \hat{\sigma}_n = 0.1050.

If we fit the parameters of the gamma distribution we obtain the method of moments estimators and the MLEs

\hat{\gamma}^{MM} = 0.9794 and \hat{c}^{MM} = 9.4249,
\hat{\gamma}^{MLE} = 1.0013 and \hat{c}^{MLE} = 9.6360.

This provides the fitted distributions displayed in Figure 3.28. The fits look perfect and the corresponding log-likelihoods are given by

\ell_Y(\hat{\gamma}^{MM}, \hat{c}^{MM}) = 1264.013   and
Figure 3.27: I.i.d. claim sizes Y = (Y_1, \ldots, Y_n)' with n = 1000; lhs: observed data; rhs: empirical distribution function.

Figure 3.28: Fitted gamma distributions; lhs: log-log plot; rhs: QQ plot.

(a) Why is \ell_Y(\hat{\gamma}^{MLE}, \hat{c}^{MLE}) > \ell_Y(\hat{\gamma}^{MM}, \hat{c}^{MM}), and which fit should be preferred according to the AIC?

(b) The estimates of \gamma are very close to 1, so we could also use an exponential distribution function. For the exponential distribution function we obtain the MLE \hat{c}^{MLE} = 9.6231 and \ell_Y(\hat{c}^{MLE}) = 1264.169. Which model (gamma or exponential) should be preferred according to the AIC and the BIC?
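The AIC/BIC mechanics can be sketched on a simpler comparison where both MLEs are available in closed form: exponential (1 parameter) against log-normal (2 parameters) on exponential data. A Python sketch (all names and parameter values are ours, not the exercise data):

```python
import math, random

rng = random.Random(6)
n = 10000
ys = [rng.expovariate(1.0) for _ in range(n)]   # true model: exponential

# exponential MLE (d = 1 parameter): ll = n log(c) - c * sum(y)
c_hat = n / sum(ys)
ll_exp = n * math.log(c_hat) - c_hat * sum(ys)

# log-normal MLE (d = 2 parameters), fitted to the same data
logs = [math.log(y) for y in ys]
mu_hat = sum(logs) / n
s2_hat = sum((x - mu_hat)**2 for x in logs) / n
ll_ln = (-sum(logs) - n / 2.0 * math.log(2.0 * math.pi * s2_hat)
         - sum((x - mu_hat)**2 for x in logs) / (2.0 * s2_hat))

def aic(ll, d): return -2.0 * ll + 2.0 * d
def bic(ll, d): return -2.0 * ll + d * math.log(n)

prefer_aic = "exponential" if aic(ll_exp, 1) < aic(ll_ln, 2) else "log-normal"
```

On data truly generated by the exponential model, both criteria pick the exponential model with very high probability; the penalty terms only matter when the log-likelihoods are close, as in Exercise 9(b).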
3.4 Calculating within layers for claim sizes

In the previous sections we have experienced that it is difficult to fit one parametric distribution function to the entire range of possible outcomes of the claim sizes. Therefore, we often consider claim sizes in different layers. Another reason why different layers of claim sizes are of interest is that re-insurance can often be bought for different claims layers. For these reasons we would like to understand how claim sizes behave in different layers. First we discuss the modeling issue and second we describe the modeling of re-insurance layers.

3.4.1 Claim size modeling using layers
We come back to the issue that the KS test rejects the most popular parametric fits, see Example 3.11. We assume that Y_1, Y_2, \ldots are i.i.d. \sim G and we would like to split G into different layers. The simplest case is to choose two layers, see Example 2.16; that is, choose a large claims threshold M > 0 such that G(M) \in (0, 1), i.e. G(M) is bounded away from zero and one. We then define the disjoint decomposition

\{Y_1 \le M\}   and   \{Y_1 > M\}.

Assume that S \sim CompPoi(\lambda v, G). We consider the total claim S_{sc} in the small claims layer and the total claim S_{lc} in the large claims layer given by

S_{sc} = \sum_{i=1}^N Y_i 1_{\{Y_i \le M\}}   and   S_{lc} = \sum_{i=1}^N Y_i 1_{\{Y_i > M\}}.

Theorem 2.14 implies that S_{sc} and S_{lc} are independent and compound Poisson distributed with

S_{sc} \sim CompPoi\left( \lambda_{sc} v = G(M) \lambda v, \; G_{sc}(y) = P[Y_1 \le y \,|\, Y_1 \le M] \right)   and
S_{lc} \sim CompPoi\left( \lambda_{lc} v = (1 - G(M)) \lambda v, \; G_{lc}(y) = P[Y_1 \le y \,|\, Y_1 > M] \right).

Thus, we can model large claims and small claims separately (independently). Observe that we have the following decomposition:

G(y) = P[ Y_1 \le y \,|\, Y_1 \le M ] \, G(M) + P[ Y_1 \le y \,|\, Y_1 > M ] \, (1 - G(M))
     = G_{sc}(y) G(M) + G_{lc}(y) (1 - G(M)).
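The independent-thinning statement of Theorem 2.14 can be checked by simulation; a Python sketch with a toy claim size distribution (all names and parameter values are ours):

```python
import math, random

def poisson(lam, rng):
    # Poisson sampling by inversion of the cdf
    u, k, p = rng.random(), 0, math.exp(-lam)
    cdf = p
    while u > cdf:
        k += 1
        p *= lam / k
        cdf += p
    return k

rng = random.Random(7)
lam_v, M, p_large = 100.0, 1.0, 0.2    # toy model with 1 - G(M) = 0.2

def claim():
    # toy G: with prob 0.8 a small claim in (0, 1], else a large claim > 1
    if rng.random() < p_large:
        return 1.0 + rng.expovariate(1.0)
    return rng.random()

reps = 10000
n_small, n_large = [], []
for _ in range(reps):
    ys = [claim() for _ in range(poisson(lam_v, rng))]
    n_small.append(sum(1 for y in ys if y <= M))
    n_large.append(sum(1 for y in ys if y > M))

mean_small = sum(n_small) / reps    # approx lam_sc * v = G(M) * lam_v = 80
mean_large = sum(n_large) / reps    # approx lam_lc * v = (1 - G(M)) * lam_v = 20
ms, ml = mean_small, mean_large
cov = sum((a - ms) * (b - ml) for a, b in zip(n_small, n_large)) / reps
```

The empirical claim counts match the thinned Poisson parameters, and their sample covariance is close to zero, in line with the independence of S_{sc} and S_{lc}.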
Often a successful modeling approach involves 3 steps:

1. Choose a large claims threshold M > 0.

2. Model the small claims layer distribution G_{sc}, e.g., by the empirical distribution function of the observations not exceeding M.

3. Fit a Pareto distribution to G_{lc} for threshold \theta = M, i.e. estimate the tail index \alpha > 0 from the observations exceeding this threshold M.
Example 3.13. We revisit the PP and the CP insurance data sets. We choose the large claims threshold M = 500 000 in both cases. In the PP insurance data set we have 237 observations above this threshold, which provides the estimate 1 - \hat{G}(M) = 237/61 053 = 0.39%. For the CP insurance example we have 272 claims above this threshold, which provides the estimate 1 - \hat{G}(M) = 1.87%. Next we calculate the sample mean and the sample coefficient of variation in the small claims layer \{Y_i \le M\}:

PP:   \hat{\mu}_{\{Y_i \le M\}} = 20 805,   \hat{Vco}_{\{Y_i \le M\}} = 1.80,
CP:   \hat{\mu}_{\{Y_i \le M\}} = 40 377,   \hat{Vco}_{\{Y_i \le M\}} = 1.51.

Figure 3.29: Empirical fit in small claims layer and Pareto distribution fit in large claims layer, the gray lines show the large claims threshold; lhs: PP insurance data; rhs: CP insurance data.
3.4.2 Re-insurance layers and deductibles

Above we have calculated expected values in claims layers E[Y 1_{\{u_1 < Y \le u_2\}}] for various parametric distribution functions. This is of interest for several reasons, which we are going to discuss next.
(i) The first reason is that insurance contracts often have deductibles. On the one hand, small claims often cause too much administrative costs, and on the other hand, deductibles are also an instrument to prevent fraud (moral hazard). For instance, it can become quite expensive for an insurance company if every insured claims that his umbrella got stolen. Therefore, a deductible d > 0 of, say, CHF 200 is introduced and the insurance company only covers the claim (Y - d)_+ that exceeds this deductible d. In this case the pure risk premium for claim Y \sim G is given by

E\left[ (Y - d)_+ \right] = \int_d^\infty (1 - G(y)) \, dy = (1 - G(d)) \, e(d),    (3.10)

under the assumption that P[Y > d] > 0 and that the mean excess function e(\cdot) of Y exists.

Remark. Fitting a distribution function to claims data (Y - d)_+ needs some care. If the original claims Y \sim G (absolutely continuous with density g), then the density after the deductible is, for y \ge d, given by

g_d(y) = \frac{g(y)}{1 - G(d)}.    (3.11)
(ii) The second reason is that the insurance company may have a maximal insurance cover per claim, i.e. it covers claims only up to a maximal size of M > 0 and the exceedances need to be paid by the insured; or, similarly, it may cover claims exceeding M but have a re-insurance cover for these exceedances. In that case the insurance company covers (Y \wedge M) and the pure risk premium for this (bounded) claim is given by

E[Y \wedge M] = \int_0^M y \, dG(y) + M \, P[Y > M] = E\left[ Y 1_{\{Y \le M\}} \right] + M \, P[Y > M].
An issue, when dealing with layers, is claims inflation. Assume we sell insurance contracts with a deductible d > 0 and we ask for a pure risk premium E[(Y - d)_+]. Since cash flows have time values this premium has to be revised carefully for later periods, as the following theorem shows.

Theorem 3.14 (leverage effect of claims inflation). Choose a fixed deductible d > 0 and assume that the claim at time 0 is given by Y_0. Assume that there is a deterministic inflation index i > 0 such that the claim at time 1 can be represented by Y_1 \stackrel{(d)}{=} (1 + i) Y_0. Then we have

E[(Y_1 - d)_+] \ge (1 + i) \, E[(Y_0 - d)_+].
Proof. A change of variables provides

E[(Y_0 - d)_+] = \int_d^\infty P[Y_0 > y] \, dy.

The left-hand side of the claim is calculated as follows:

E[(Y_1 - d)_+] = (1 + i) \int_{d/(1+i)}^\infty P[Y_0 > y] \, dy
= (1 + i) \left( \int_{d/(1+i)}^d P[Y_0 > y] \, dy + \int_d^\infty P[Y_0 > y] \, dy \right)
\ge (1 + i) \, E[(Y_0 - d)_+],

where we have twice applied a change of variables. This proves the claim. ∎
Example (leverage effect for Pareto claims). Assume Y_0 \sim Pareto(\theta, \alpha) and choose inflation index i > 0 such that \theta(1 + i) < d. From (3.7) we obtain

Y_1 = (1 + i) Y_0 \sim Pareto(\theta(1 + i), \alpha).

This provides for \alpha > 1 and i > 0

E[(Y_1 - d)_+] = \frac{d}{\alpha - 1} \left( \frac{d}{\theta(1+i)} \right)^{-\alpha} = (1 + i)^\alpha \, \frac{d}{\alpha - 1} \left( \frac{d}{\theta} \right)^{-\alpha} > (1 + i) \, E[(Y_0 - d)_+].

Observe that we obtain a strict inequality, i.e. the pure risk premium grows faster than the claim sizes themselves. The reason for this faster growth is that claims Y_0 \le d may entitle to claims payments after claims inflation adjustments, i.e. not only the claim sizes are growing under inflation but also the number of claims is growing if one does not adapt the deductible to inflation.
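The strict inequality above can be verified numerically; a Python sketch of the Pareto stop-loss premium (the helper name `pareto_stoploss` and all parameter values are ours):

```python
def pareto_stoploss(theta, alpha, d):
    """E[(Y-d)+] = d/(alpha-1) * (d/theta)^(-alpha) for Y ~ Pareto(theta, alpha),
    valid for alpha > 1 and d >= theta."""
    assert d >= theta and alpha > 1.0
    return d / (alpha - 1.0) * (d / theta)**(-alpha)

theta, alpha, d, i = 1000.0, 2.5, 5000.0, 0.05
lhs = pareto_stoploss(theta * (1.0 + i), alpha, d)   # inflated claim: Pareto(theta(1+i), alpha)
rhs = (1.0 + i) * pareto_stoploss(theta, alpha, d)   # naively inflated premium
ratio = lhs / rhs                                     # equals (1+i)^(alpha-1) > 1
```

The ratio (1+i)^{\alpha-1} quantifies the leverage effect: the heavier the premium depends on the tail (larger \alpha here), the stronger the premium outgrows pure claims inflation.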
Exercise 10. In Figure 3.30 we display the distribution function of the loss Y \sim G and the distribution functions of the loss after applying different re-insurance covers to Y. Can you explicitly determine the re-insurance covers from the graphs in Figure 3.30?

Exercise 11. Assume claim sizes Y_i in a given line of business can be described by a log-normal distribution with mean E[Y_i] = 30 000 and Vco(Y_i) = 4. Up to now the insurance company was not offering contracts with deductibles. Now it wants to offer the following three deductible versions d = 200, 500, 10 000. Answer the following questions:

1. How does the claims frequency change by the introduction of deductibles?

2. How does the expected claim size change by the introduction of deductibles?

3. By which amount does the expected total claim amount change?
Chapter 4

Approximations for Compound Distributions
In Chapter 2 we have introduced claims count distributions for the modeling of the
number of claims N within a fixed time period. In Chapter 3 we have met several
claim size distribution functions G for the modeling of the claim sizes Y1 , Y2 , . . ..
Ultimately, we always need to calculate the compound distribution of S, see Definition 2.1. As explained in Proposition 2.2, we can easily calculate the moments
and the moment generating function of this compound distribution. On the other
hand the distribution function of S given in (2.1) is a notoriously difficult object
because it involves (too) many convolutions of the claim size distribution function
G. The aim here is to explain how we can circumvent this difficulty.
4.1 Approximations

4.1.1 Normal approximation
Compound distributions may have two different risk drivers in the tail of the distribution function, namely, the number of claims N may contribute to large values of S, or single large claims in Y_1, \ldots, Y_N may drive extreme values in S. Let us concentrate on the compound Poisson model; in particular, we would like to use the decomposition theorem in the spirit of Example 2.16. In this case, mostly the claim sizes Y_i contribute to the tail of the distribution (if these are heavy-tailed). Therefore, we emphasize that in the light of the compound Poisson model one should separate small from large claims, resulting in the independent decomposition S = S_{sc} + S_{lc}. Next, if the expected number of small claims \lambda_{sc} v is large, S_{sc} can be approximated by a parametric distribution function, and S_{lc} should be modeled explicitly. This we are going to describe in detail in the remainder of this chapter.

Theorem 4.1 (normal approximation). Assume S \sim CompPoi(\lambda v, G) with E[Y_1^2] < \infty. Then we have

\frac{S - \lambda v E[Y_1]}{\sqrt{\lambda v E[Y_1^2]}} \to N(0, 1)   in distribution as v \to \infty.

Proof. For v \in N we have the decomposition

S \stackrel{(d)}{=} \sum_{i=1}^N Y_i = \sum_{\ell=1}^v \sum_{i=1}^{N_\ell} Y_i^{(\ell)} = \sum_{\ell=1}^v S_\ell,

where the S_\ell are i.i.d. \sim CompPoi(\lambda, G). The first two moments of these compound Poisson distributions are given by E[S_1] = \lambda E[Y_1] and Var(S_1) = \lambda E[Y_1^2]. Therefore, the assumptions of the CLT are fulfilled and the claim follows from (1.2). ∎
Theorem 4.1 is the motivation for the following approximation of the distribution function of S:

P[S \le x] = P\left[ \frac{S - \lambda v E[Y_1]}{\sqrt{\lambda v E[Y_1^2]}} \le \frac{x - \lambda v E[Y_1]}{\sqrt{\lambda v E[Y_1^2]}} \right] \approx \Phi\left( \frac{x - \lambda v E[Y_1]}{\sqrt{\lambda v E[Y_1^2]}} \right).    (4.1)
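Approximation (4.1) can be compared against a Monte Carlo evaluation of the compound Poisson distribution; a Python sketch with a toy claim size distribution (all names and parameter values are ours):

```python
import math, random

def poisson(lam, rng):
    u, k, p = rng.random(), 0, math.exp(-lam)
    cdf = p
    while u > cdf:
        k += 1
        p *= lam / k
        cdf += p
    return k

def Phi(x):
    # standard Gaussian distribution function via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

rng = random.Random(8)
lam_v = 200.0                  # expected number of claims
EY, EY2 = 0.5, 1.0 / 3.0       # toy claim sizes: uniform on (0, 1)

x = lam_v * EY + 2.0 * math.sqrt(lam_v * EY2)   # two 'standard deviations' above the mean
reps, hits = 4000, 0
for _ in range(reps):
    s = sum(rng.random() for _ in range(poisson(lam_v, rng)))
    if s <= x:
        hits += 1

mc = hits / reps                                            # Monte Carlo for P[S <= x]
approx = Phi((x - lam_v * EY) / math.sqrt(lam_v * EY2))     # normal approximation (4.1)
```

With light-tailed claim sizes and a large expected number of claims the two values are close; the remaining gap is driven by the positive skewness of S, which motivates the refinements in the next sections.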
Example 4.2 (normal approximation for PP insurance). We revisit the PP insurance data of Example 3.13. We consider 3 different examples:

(a) Only small claims: in this example we only consider the claim size distribution function G(y) = P[Y \le y \,|\, Y \le M], i.e. the claims are compactly supported in (0, M]. As explicit claim size distribution function we choose the empirical distribution of Example 3.13, see Figure 3.29 (lhs), with M = 500 000. We choose portfolio size v such that \lambda v = 100.

(b) Claim size distribution function G is chosen as in (a), but this time we choose portfolio size v such that \lambda v = 1000.

(c) In addition to (b) we add the large claims layer, modeled by a Pareto distribution with \theta = M = 500 000 and \alpha = 2.5, and for the expected number of large claims we set \lambda_{lc} v = 3.9.

For simplicity the true distribution function is evaluated by Monte Carlo simulation, which contradicts our statement above but is appropriate for sufficiently large samples (and sufficient patience). We choose 100 000 simulations; this will be further illustrated in Example 4.11 below.

In Figure 4.1 we present the results of the normal approximation (4.1) in case (a). We observe a reasonably good fit around the mean, but the normal approximation clearly under-estimates the tails of the true distribution function, see the log-log plot in Figure 4.1 (rhs). Moreover, the true distribution function has positive skewness \varsigma_S = 0.43, whereas the normal approximation has zero skewness. In the normal approximation we obtain probability mass \Phi(-\lambda v E[Y_1] / \sqrt{\lambda v E[Y_1^2]}) = 6 \cdot 10^{-7} for a negative total claim amount (which is fairly small).
Finally, in Figure 4.3 we also include large claims (in contrast to Figure 4.2), having an expected number of large claims of 3.9 and a Pareto tail parameter of \alpha = 2.5. We see that in this case the normal approximation is useless in the tail, which strongly favors the large claims separation as suggested in Example 2.16.

4.1.2 Translated gamma and log-normal approximations
In Example 4.2 we have seen that the normal approximation can be useful for large portfolio sizes v and under the exclusion of large claims. For small portfolio sizes the approximation may be bad because the true distribution often has substantial positive skewness. This leads to the idea of approximating the small claims layer by other distribution functions that enjoy positive skewness. We choose k \in R and define the (translated or shifted) random variables

X = k + Z,   where Z \sim \Gamma(\gamma, c)   or   Z \sim LN(\mu, \sigma^2).

In the translated gamma case we have

E[X] = k + \gamma/c,   Var(X) = \gamma/c^2   and   \varsigma_X = 2 \gamma^{-1/2} > 0.
The idea now is to do a fit of moments between S and X. Assume that S has finite third moment; then we choose

X = k + Z,   where Z \sim \Gamma(\gamma, c)   or   Z \sim LN(\mu, \sigma^2),

such that

E[X] = E[S],   Var(X) = Var(S)   and   \varsigma_X = \varsigma_S.    (4.2)

Exercise.

1. Prove that the fit of moments approximation (4.2) for a translated gamma distribution for X provides the following system of equations:

\lambda v E[Y_1] = k + \gamma/c,   \lambda v E[Y_1^2] = \gamma/c^2   and   \frac{E[Y_1^3]}{(\lambda v)^{1/2} E[Y_1^2]^{3/2}} = 2 \gamma^{-1/2}.

2. Solve this system of equations for k \in R, \gamma > 0 and c > 0 and prove that it has a well-defined solution for G(0) = 0.

3. Why should this approximation not be applied to case (c) of Example 4.2?
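The moment-matching system of the exercise can be solved in closed form; a Python sketch (the helper name `translated_gamma_fit` and the example moments are ours):

```python
import math

def translated_gamma_fit(mean_s, var_s, skew_s):
    """Fit of moments (4.2) for X = k + Z, Z ~ Gamma(gamma, c):
    solve skew 2/sqrt(gamma) = skew_s, variance gamma/c^2 = var_s,
    mean k + gamma/c = mean_s (requires skew_s > 0)."""
    gamma = (2.0 / skew_s)**2
    c = math.sqrt(gamma / var_s)
    k = mean_s - gamma / c
    return k, gamma, c

# compound Poisson moments, e.g. lam*v = 1000 and Y uniform on (0,1)
lam_v, EY, EY2, EY3 = 1000.0, 0.5, 1.0 / 3.0, 0.25
mean_s, var_s = lam_v * EY, lam_v * EY2
skew_s = lam_v * EY3 / var_s**1.5
k, g, c = translated_gamma_fit(mean_s, var_s, skew_s)
```

By construction the fitted (k, \gamma, c) reproduce the first three moments of S exactly.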
The KS test rejects the null hypothesis on the 5% significance level for the normal approximation in both cases (a) and (b), whereas this is not the case for the translated gamma and log-normal approximations in both cases (a) and (b); the p-values are clearly bigger than 5% (for the exact p-values we refer to Table 4.1 below). In case (a) the translated gamma approximation is favored, in case (b) the translated log-normal approximation (though the differences in the latter are negligible).
4.1.3 Edgeworth approximation

The Edgeworth approximation refines the normal approximation (4.1). Define the standardized random variable

Z = \frac{S - \lambda v E[Y_1]}{\sqrt{\lambda v E[Y_1^2]}}.

Assume that the moment generating function M_Z of Z exists in a neighborhood of the origin and expand its logarithm for small r:

\log M_Z(r) = \sum_{k=0}^n \frac{d^k}{dr^k} \log M_Z(r) \Big|_{r=0} \, \frac{r^k}{k!} + o(r^n)   as r \to 0.

We set a_k = \frac{d^k}{dr^k} \log M_Z(r)|_{r=0} / k! and note that we have a_0 = \log M_Z(0) = 0, a_1 = E[Z] = 0 and a_2 = Var(Z)/2! = 1/2. This provides the approximation

M_Z(r) \approx \exp\left\{ \frac{1}{2} r^2 + \sum_{k=3}^n a_k r^k \right\} = \exp\left\{ \frac{1}{2} r^2 \right\} \exp\left\{ \sum_{k=3}^n a_k r^k \right\}.

Expanding the second exponential gives

M_Z(r) \approx e^{r^2/2} \left[ 1 + \sum_{k=3}^n a_k r^k + \frac{ \left( \sum_{k=3}^n a_k r^k \right)^2 }{2!} + \ldots \right].

Note that we only consider finitely many terms, reflected by the upper index n in the summation. Thus, for appropriate constants b_k \in R we get the approximation (for small r)

M_Z(r) \approx e^{r^2/2} \left[ 1 + a_3 r^3 + \sum_{k \ge 4} b_k r^k \right].    (4.3)
Lemma 4.4. Let \Phi denote the standard Gaussian distribution function and \Phi^{(k)} its k-th derivative. For k \in N_0 and r \in R we have

r^k e^{r^2/2} = (-1)^k \int_R e^{rx} \, \Phi^{(k+1)}(x) \, dx.

Proof. We prove the claim by induction. For k = 0 we have \int_R e^{rx} \Phi'(x) \, dx = E[e^{rX}] = e^{r^2/2} for X \sim N(0, 1). For the induction step, integration by parts gives

\int_R e^{rx} \, \Phi^{(k+2)}(x) \, dx = \left[ e^{rx} \Phi^{(k+1)}(x) \right]_{-\infty}^{\infty} - r \int_R e^{rx} \, \Phi^{(k+1)}(x) \, dx.

Note that the first term on the right-hand side is equal to zero because \Phi^{(k+1)}(x) goes faster to zero than e^{rx} may possibly converge to infinity. This and the induction assumption for k provides the identity

(-1)^{k+1} \int_R e^{rx} \, \Phi^{(k+2)}(x) \, dx = r \, (-1)^k \int_R e^{rx} \, \Phi^{(k+1)}(x) \, dx = r \cdot r^k e^{r^2/2},

which proves the claim. ∎
Lemma 4.4 allows to rewrite approximation (4.3) for small r as follows; set X \sim N(0, 1):

M_Z(r) \approx E\left[ e^{rX} \right] - a_3 \int_R e^{rx} \, \Phi^{(4)}(x) \, dx + \sum_{k \ge 4} b_k (-1)^k \int_R e^{rx} \, \Phi^{(k+1)}(x) \, dx
= \int_R e^{rx} \, d\left( \Phi(x) - a_3 \Phi^{(3)}(x) + \sum_{k \ge 4} (-1)^k b_k \Phi^{(k)}(x) \right).

Assume that Z has distribution function denoted by F_Z; then the latter suggests the approximation, see Lemmas 1.2, 1.3 and 1.4,

F_Z(z) \approx \Phi(z) - a_3 \Phi^{(3)}(z) + \sum_{k \ge 4} (-1)^k b_k \Phi^{(k)}(z) \stackrel{def.}{=} EW(z),    (4.4)

and for x = \sqrt{\lambda v E[Y_1^2]} \, z + \lambda v E[Y_1] we obtain P[S \le x] = F_Z(z) \approx EW(z).
This formula provides the refinement of the normal approximation (4.1), namely we correct the first order approximation by higher order terms involving the skewness and other higher order terms reflected by a_3 and b_k in (4.4). The Edgeworth approximation (4.4) is elegant, but its use requires some care, as we are just going to highlight.

We first consider the derivatives \Phi^{(k)} for k \ge 1. The first derivative is given by

\Phi'(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2},

and for k \ge 2

\Phi^{(k)}(z) = \frac{d^{k-1}}{dz^{k-1}} \frac{1}{\sqrt{2\pi}} e^{-z^2/2} = O\left( z^{k-1} e^{-z^2/2} \right)   for |z| \to \infty.

In particular, all these derivatives vanish as |z| \to \infty, which implies

\lim_{z \to -\infty} EW(z) = 0   and   \lim_{z \to \infty} EW(z) = 1.
Attention. The issue with the Edgeworth approximation EW(z) is that it is not necessarily a distribution function because it does not need to be monotone in z, see Example 4.5 below!

Example 4.5. To see the possible non-monotonicity of EW(z) we only take into account the skewness, i.e. a_3 = E[Z^3]/6 = \varsigma_S/6, and the approximation e^z \approx 1 + z in (4.4). We have

\Phi'(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2},
\Phi^{(2)}(z) = -\frac{1}{\sqrt{2\pi}} z e^{-z^2/2},
\Phi^{(3)}(z) = -\frac{1}{\sqrt{2\pi}} e^{-z^2/2} + \frac{1}{\sqrt{2\pi}} z^2 e^{-z^2/2},
\Phi^{(4)}(z) = \frac{1}{\sqrt{2\pi}} z e^{-z^2/2} + \frac{2}{\sqrt{2\pi}} z e^{-z^2/2} - \frac{1}{\sqrt{2\pi}} z^3 e^{-z^2/2}.

This implies

\frac{d}{dz} EW(z) = \Phi'(z) - a_3 \Phi^{(4)}(z) = \Phi'(z) \left( 1 - 3 a_3 z + a_3 z^3 \right).    (4.5)

Consider the function h(z) = 1 - 3 a_3 z + a_3 z^3 for positive skewness \varsigma_S > 0. Then we have

\lim_{z \to -\infty} h(z) = -\infty   and   \lim_{z \to \infty} h(z) = \infty,
which explains that the derivative of EW(z) has both signs and therefore EW(z)
is not monotone. However, in the upper tail of the distribution of S, that is, for
z sufficiently large, the Edgeworth approximation (4.5) is monotone and can be
used as an appropriate approximation. We emphasize that these monotonicity
properties should always be carefully checked in the Edgeworth approximation.
We revisit the numerical examples given in Examples 4.3. In Figure 4.6 we give the approximation in case (a), i.e. expected number of claims equal to 100, and in Figure 4.7 we give the approximation in case (b), i.e. expected number of claims equal to 1000. In both cases we only choose the next additional moment, which is the skewness and refers to the term a₃, and we choose the approximation e^z ≈ 1 + z in (4.4). We see in both cases that the Edgeworth approximation clearly outperforms the Gaussian approximation. However, the Edgeworth approximation is still light-tailed, which can be seen by comparing it to the translated gamma approximation.

In Figure 4.8 we compare the Edgeworth density (4.5) to the Gaussian density. We clearly see the influence of the skewness parameter a₃ and ς_S > 0, respectively. Moreover, we also see that the influence of the skewness parameter decreases with a higher expected number of claims. Of course, this exactly reflects the CLT, see Theorem 4.1.

If we calculate the minimal value of the Edgeworth density (4.5) we obtain in case (a) the value −9.8 · 10⁻⁴ and in case (b) the value −4.1 · 10⁻⁵. This exactly explains that the Edgeworth density is not a proper probability density, because it violates the positivity property. However, this only occurs in the range of very small claims and therefore it can be used as an approximation in the range of large claims.

Version April 14, 2016, M.V. Wüthrich, ETH Zurich

Figure 4.8: We compare the Edgeworth density (4.5) to the Gaussian density; lhs: case (a), i.e. expected number of claims 100; rhs: case (b), i.e. expected number of claims 1000.
Finally, in Table 4.1 we present the p-values resulting from the KS test of the different approximations, see Section 3.3.1. In this particular case we see that the translated gamma distribution is preferred in case (a), whereas in case (b) the approximations are very similar. For this reason, one often chooses a translated gamma distribution in practice (and also because it can easily be handled).

Table 4.1: KS test p-values for the normal approximation, the translated gamma approximation, the translated log-normal approximation and the Edgeworth approximation.

We finally remark that there exist approximations similar to the Edgeworth approximation, for instance, the Gram-Charlier expansion, the Laguerre-gamma expansion or the Jacobi-beta expansion. These expansions are quite popular in engineering, but they have similar weaknesses as the Edgeworth approximation and we will not discuss them further.
4.2 Algorithms for compound distributions

4.2.1 Panjer algorithm

Note that Panjer distributions require p₀ > 0, otherwise the recursion for k ≥ 1 will not provide a well-defined distribution function.

Bjørn Sundt and William S. Jewell (1932-2003) have characterized the Panjer distributions. This is stated in the following lemma.

Lemma 4.7 (Sundt-Jewell [95]). Assume N has a non-degenerate Panjer distribution. Then N is either binomially, Poisson or negative-binomially distributed.

Proof. Case (i). Assume a = 0. Then the Panjer recursion reads

p_k = p_{k−1} b/k > 0   for all k ∈ ℕ.

This is exactly the Poisson distribution with parameters a = 0 and b = λv > 0 for A = ℕ₀, because for the Poisson distribution we have, see Section 2.2.2, p_k/p_{k−1} = λv/k.
Case (ii). Assume a < 0. To have positive probabilities we need to make sure that a + b/k remains positive for all k ∈ A. This requires |A| < ∞. We denote the maximal value in A by v ∈ ℕ (assuming it has p_v > 0). The positivity constraint then provides b/v > −a and a + b/(v+1) = 0. The latter implies that p_k = 0 for all k > v and is equivalent to the requirement v = −(a+b)/a > 0. We set p = −a/(1−a) ∈ (0,1), which provides

p_k = p_{k−1} (a + b/k) = p_{k−1} (−a) ((v+1)/k − 1) = p_{k−1} (p/(1−p)) ((v−k+1)/k),

and we can rewrite this ratio as

p_{k−1} (p/(1−p)) ((v−k+1)/k) = p_{k−1} ( −p/(1−p) + ((v+1) p/(1−p)) (1/k) ).

This is exactly the binomial distribution with parameters a = −p/(1−p) and b = (v+1)p/(1−p) and A = {0, …, v}.
Case (iii). Assume a > 0. In this case we define γ = (a+b)/a > 0. This provides b = a(γ−1) and

p_k = p_{k−1} (a + b/k) = p_{k−1} a (1 + (γ−1)/k).

Since the latter should be summable in order to obtain a well-defined distribution function, we need to have a < 1. For the negative-binomial distribution we have, see Proposition 2.20,

p_k / p_{k−1} = p (k + γ − 1)/k = p + p(γ−1)/k.

This is exactly the negative-binomial distribution with parameters a = p and b = p(γ−1) and A = ℕ₀. This proves the lemma. □
The previous lemma shows that the (important) claims count distributions that we have considered in Chapter 2 are Panjer distributions, and the corresponding choices a, b ∈ ℝ are provided in the proof of Lemma 4.7. We restate this in the next corollary.

Theorem 4.9 (Panjer algorithm). Assume N has a Panjer distribution with parameters a, b ∈ ℝ and the claim sizes Y_i are strictly positive and discrete with weights g_k = P[Y₁ = k], k ∈ ℕ. Then we have

f_r = P[S = r] =
  p₀,   for r = 0,
  Σ_{k=1}^{r} (a + b k/r) g_k f_{r−k},   for r > 0.

Proof of Theorem 4.9. We will prove a more general result in Theorem 4.9(B) below. □
Remarks.

The Panjer algorithm requires a Panjer distribution for N and strictly positive and discrete claim sizes Y_i ∈ ℕ, P-a.s. Then it provides an algorithm that allows us to calculate the compound distribution without doing the involved convolutions (2.1): Assume N ∼ Poi(λv); hence a = 0, b = λv, and for r ∈ ℕ

f_r = Σ_{k=1}^{r} (k/r) λv g_k f_{r−k}.   (4.6)
Theorem 4.9 allows us to apply recursion (4.6) as follows:

f₀ = p₀ = e^{−λv},
f₁ = λv g₁ f₀,
f₂ = (1/2) λv g₁ f₁ + λv g₂ f₀,
f₃ = (1/3) λv g₁ f₂ + (2/3) λv g₂ f₁ + λv g₃ f₀,
…

Observe that f_r only depends on f₀, …, f_{r−1}.
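The notes implement this recursion in R further below; as a minimal self-contained sketch (our own Python illustration, with a hypothetical function name), recursion (4.6) reads:

```python
import math

def panjer_poisson(lam, g, rmax):
    """Panjer recursion (4.6) for compound Poisson S with N ~ Poi(lam):
    f_r = sum_{k=1}^r (k/r) * lam * g_k * f_{r-k}, started in f_0 = exp(-lam);
    g[k] = P[Y_1 = k] with g[0] = 0 (strictly positive claim sizes)."""
    f = [math.exp(-lam)]
    for r in range(1, rmax + 1):
        f.append(sum(k / r * lam * g[k] * f[r - k]
                     for k in range(1, min(r, len(g) - 1) + 1)))
    return f
```

With g a point mass at 1 (so S = N) the recursion reproduces the Poisson probability weights, which gives a simple sanity check.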
In practical applications there might occur the situation where the initial value f₀ is nonsensical on IT systems. This has to do with the fact that IT systems can represent numbers only up to some numerical precision. Let us explain this using the compound Poisson distribution providing Panjer algorithm (4.6). If the expected number of claims λv is very large, then on IT systems the initial value f₀ = p₀ = e^{−λv} may be interpreted as zero, and thus the algorithm cannot start due to missing numerical precision and a meaningful starting value. We call this numerical underflow.
In this case we can modify the Panjer algorithm as follows: choose any strictly positive starting value f̃₀ > 0 and develop the iteration

f̃_r = Σ_{k=1}^{r} (a + b k/r) g_k f̃_{r−k}   for r ∈ ℕ.

Observe that this provides a multiplicative shift from f_r to f̃_r, i.e. f̃_r = (f̃₀/f₀) f_r. The true probability weights are then found by

f_r = f̃_r / Σ_{u≥0} f̃_u,

where we use Σ_{r=0}^{n} f_r → 1 as n → ∞, so that in practice the normalization is truncated at a sufficiently large n.
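The rescaled recursion can be sketched as follows (our own Python illustration of the idea, not the notes' code; the truncation point rmax must carry essentially all probability mass for the final normalization to be accurate):

```python
import math

def panjer_poisson_rescaled(lam, g, rmax, f0_tilde=1.0):
    """Underflow-safe Panjer recursion: start in an arbitrary f~_0 > 0
    instead of exp(-lam) and normalize at the end. The recursion is
    linear, so f~_r = (f~_0 / f_0) * f_r for every r."""
    f = [f0_tilde]
    for r in range(1, rmax + 1):
        f.append(sum(k / r * lam * g[k] * f[r - k]
                     for k in range(1, min(r, len(g) - 1) + 1)))
    total = sum(f)                      # proportional to f~_0 / f_0
    return [x / total for x in f]
```

By construction the returned weights add up to one; they agree with the classical recursion whenever the truncated support {0, …, rmax} carries (almost) all mass.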
"N
X
i=1
Yi = dr = P
"N
X
Yi /d = r = P
i=1
"N
X
Ye
=r ,
i=1
with Yei = Yi /d N.
For non-discrete claim sizes Y_i we need to discretize them in order to apply the Panjer algorithm. Choose span size d > 0 and consider for k ∈ ℕ₀

G((k+1)d) − G(kd) = P[kd < Y₁ ≤ (k+1)d].

These probabilities can now either be shifted to the left or to the right endpoint of the interval [kd, (k+1)d]. We define the two new discrete distribution functions for k ∈ ℕ₀ by

g_k⁻ = P[Y₁⁻ = kd] = G((k+1)d) − G(kd),   (4.7)

and

g_{k+1}⁺ = P[Y₁⁺ = (k+1)d] = G((k+1)d) − G(kd),   (4.8)
which gives Y₁⁻ ≤_sd Y₁ ≤_sd Y₁⁺, where the latter means P[Y₁⁻ > x] ≤ P[Y₁ > x] ≤ P[Y₁⁺ > x] for all x. This implies

S⁻ = Σ_{i=1}^{N} Y_i⁻ ≤_sd S = Σ_{i=1}^{N} Y_i ≤_sd S⁺ = Σ_{i=1}^{N} Y_i⁺,

for Y_i⁻ being i.i.d. copies of Y₁⁻ and Y_i⁺ being i.i.d. copies of Y₁⁺ (also independent of N). Thus, we get lower and upper bounds S⁻ ≤_sd S ≤_sd S⁺, which become narrower the smaller we choose the span d. In most applications, especially for small span d, these bounds/approximations are sufficient compared to the other uncertainties involved in the prediction process (parameter estimation uncertainty, etc.).
To S⁺ we can directly apply the Panjer algorithm. S⁻ is more subtle, because it may happen that g₀⁻ > 0 and, thus, the Panjer algorithm cannot be applied in its classical form of Theorem 4.9. In the case of the compound Poisson distribution this problem is easily circumvented due to the disjoint decomposition theorem, Theorem 2.14, which says that

S⁻ = Σ_{i=1}^{N} Y_i⁻ = Σ_{i=1}^{N} Y_i⁻ 1_{{Y_i⁻ > 0}} =(d) S̃

has again a compound Poisson distribution, with parameter λṽ = λv(1 − g₀⁻) and weights of the claim sizes g̃_k = g_k⁻/(1 − g₀⁻) for k ∈ ℕ. Finally, we apply the Panjer algorithm to the compound Poisson distributed random variable S̃ to get the second bound. We prefer to give a more general version of the Panjer algorithm that also allows us to treat the case g₀ > 0, see Theorem 4.9(B) below.

There are more sophisticated discretization methods, but often our proposal (4.7)-(4.8) is sufficient. Moreover, it provides explicit upper and lower bounds, which is an advantage if one tries to quantify the precision of the approximation.
Theorem 4.9(B) (modified Panjer algorithm). Assume S has a compound distribution according to Model Assumptions 2.1 with N having a Panjer distribution with parameters a, b ∈ ℝ and the claim size distribution G being discrete with support ℕ₀ (we allow for g₀ = P[Y₁ = 0] > 0). Then we have for r ∈ ℕ₀

f_r = P[S = r] =
  Σ_{k∈ℕ₀} p_k g₀^k,   for r = 0,
  (1 − a g₀)^{−1} Σ_{k=1}^{r} (a + b k/r) g_k f_{r−k},   for r > 0.
Proof of Theorem 4.9(B). Note that we have k p_k = (ak + b) p_{k−1} = a(k−1) p_{k−1} + (a+b) p_{k−1}. We multiply this equation with (M_{Y₁}(x))^{k−1} M'_{Y₁}(x) and sum over k ∈ ℕ. This provides the identity

Σ_{k∈ℕ} k p_k (M_{Y₁}(x))^{k−1} M'_{Y₁}(x) = Σ_{k∈ℕ} (a(k−1) p_{k−1} + (a+b) p_{k−1}) (M_{Y₁}(x))^{k−1} M'_{Y₁}(x).

The left-hand side is equal to M'_S(x), because M_S(x) = Σ_{k∈ℕ₀} p_k (M_{Y₁}(x))^k, whereas the right-hand side fulfills, again using the derivative of M_S(x) in the second step,

Σ_{k∈ℕ} (a(k−1) p_{k−1} + (a+b) p_{k−1}) (M_{Y₁}(x))^{k−1} M'_{Y₁}(x)
 = Σ_{k∈ℕ₀} (a k p_k + (a+b) p_k) (M_{Y₁}(x))^k M'_{Y₁}(x) = a M'_S(x) M_{Y₁}(x) + (a+b) M_S(x) M'_{Y₁}(x).

Thus, we have just proved that the moment generating function of compound Panjer distributions satisfies the following differential equation:

M'_S(x) = a M_{Y₁}(x) M'_S(x) + (a+b) M'_{Y₁}(x) M_S(x).
Writing M_S(x) = Σ_{r≥0} f_r e^{rx} and M_{Y₁}(x) = Σ_{k≥0} g_k e^{kx} and comparing the coefficients of e^{rx} on both sides of this differential equation provides for r ≥ 1

r f_r = a Σ_{k=0}^{r} g_k (r−k) f_{r−k} + (a+b) Σ_{k=1}^{r} k g_k f_{r−k}
      = a r g₀ f_r + a Σ_{k=1}^{r} g_k (r−k) f_{r−k} + (a+b) Σ_{k=1}^{r} k g_k f_{r−k}
      = a r g₀ f_r + a r Σ_{k=1}^{r} g_k f_{r−k} + b Σ_{k=1}^{r} k g_k f_{r−k}.

Dividing both sides by r ≥ 1 and bringing the first term on the right-hand side of the last equality to the other side, we obtain

(1 − a g₀) f_r = Σ_{k=1}^{r} (a + b k/r) g_k f_{r−k}.
For r = 0 we have

f₀ = P[S = 0] = p₀ + Σ_{k∈ℕ} p_k P[Y₁ + … + Y_k = 0] = p₀ + Σ_{k∈ℕ} p_k g₀^k = Σ_{k∈ℕ₀} p_k g₀^k,

where in the second last step we have used the independence property of the claim sizes Y_i. This finishes the proof. Note that we have (implicitly) assumed that there exists a positive radius of convergence for the moment generating functions, see also Lemma 1.1. We can do this w.l.o.g. because, in order to calculate f_r = P[S = r], we may replace the claim sizes Y_i by the bounded claim sizes Y_i ∧ (r+1) and the resulting probability weight f_r will be the same. □
Figure 4.9: Discretized claim size distributions (g_k⁻)_k and (g_k⁺)_k; lhs: case (i) with span d = 100 000; rhs: case (ii) with span d = 10 000.

As span size we choose two different values: (i) d = 100 000 and (ii) d = 10 000. In Figure 4.9 we plot the resulting probability weights (g_k⁻)_k and (g_k⁺)_k. We see that the discretization error disappears for decreasing span d.
We then implement the Panjer algorithm in R. The implementation is rather straightforward. In a first step we invert the ordering in the claim size distributions (g_k⁻)_k and (g_k⁺)_k so that in the second step we can apply matrix multiplications. This looks as follows:

> # note that we shift indexes by 1 (because arrays in R start at 1)
> for (k in 0:(Kmax-1)) { g[2,Kmax-k] <- g[1,k+1]*k }
> f[1] <- exp(-lambda * v)
> for (r in 1:(Kmax-1)) {
+   f[r+1] <- g[2,(Kmax-r):(Kmax-1)] %*% f[1:r] * lambda * v / r
+ }
slope of the blue line). We observe that asymptotically the compound Poisson distribution with λv = 1 coincides with the Pareto claim size distribution.
Example 4.11. We revisit case (c) of Example 4.2. For large claims S_lc we assume a compound Poisson distribution with expected number of claims λ_lc v = 3.9 and Pareto(θ, α) claim size distribution with θ = 500 000 and α = 2.5. We choose the same two discretizations as in Example 4.10, see Figure 4.9, and then we apply the Panjer algorithm to the large claims layer as explained above. The results for the distributions of S_lc are presented in Figures 4.12 and 4.13.

The results are in line with the ones of Example 4.10 and we should prefer span size d = 10 000, which gives a sufficiently good approximation to the continuous Pareto claim size distribution. Observe that due to λ_lc v = 3.9 the resulting compound Poisson distribution has more modes now, see Figure 4.12 (rhs). In Figure 4.13 we see that the asymptotic behavior is sandwiched between the Pareto distribution Pareto(θ, α) with tail parameter α = 2.5 and this Pareto distribution stretched with the expected number of claims λ_lc v = 3.9 (blue lines in Figure 4.13). We observe a rather slow convergence to the asymptotic slope, which tells us that parameter estimation for Pareto distributions is a very difficult (if not impossible) task if only few observations are available.

Finally, we convolute the large claims layer S_lc of case (c) in Example 4.2 with the corresponding small claims layer S_sc, see case (b) of Example 4.2. For the small claims layer we choose a translated gamma distribution as approximation to the true distribution function of S_sc, i.e. we set

S = S_sc + S_lc ≈ X_sc + S_lc^±,   (4.9)
Figure 4.13: Log-log plot of the compound Poisson distribution with λ_lc v = 3.9 from the Panjer algorithm; lhs: case (i) with span d = 100 000; rhs: case (ii) with span d = 10 000.
where X_sc is the translated gamma approximation to S_sc (see Example 2.16 and (4.2)) and S_lc^± are the discretized versions of S_lc which model the large claims layer having a compound Poisson distribution with Pareto claim sizes.

Figure 4.14: Case (c) of Example 4.2: exact discretized distribution X_sc^± + S_lc^± for span d = 10 000, Monte Carlo approximation and normal approximation (only rhs); lhs: discrete probability weights (upper and lower bounds); rhs: log-log plot (see also Figure 4.3 (rhs)).

In order to calculate the compound Poisson random variable S_lc we apply the Panjer algorithm with span d = 10 000. The disjoint decomposition theorem, see Theorem 2.14 and Example 2.16, implies that in the compound Poisson case we may and will assume that the large claims separation leads to an independent decoupling of S_sc and S_lc, and of X_sc and S_lc, respectively, see (4.9). Therefore, the aggregate distribution of X_sc + S_lc is obtained by a simple convolution of the marginal distributions of X_sc and S_lc. Using also a discretization of the distribution function of X_sc to the same span d = 10 000 as in the Panjer algorithm for S_lc, denoted by X_sc^±, the convolution of X_sc^± + S_lc^± can easily be calculated analytically. That is, no Monte Carlo simulation is needed. Namely, denote the discrete probability weights of X_sc^± by (f_k^(1))_{k≥0} and the discrete probability weights of S_lc^± by (f_k^(2))_{k≥0}, i.e. set

f_k^(1) = P[X_sc^± = kd]   and   f_k^(2) = P[S_lc^± = kd].
Then we have for r ∈ ℕ₀

f_r = P[X_sc^± + S_lc^± = rd] = Σ_{k=0}^{r} f_k^(1) f_{r−k}^(2).   (4.10)
For heavy-tailed distribution functions the Monte Carlo simulation approach has a weak speed of convergence performance. Note that convolution (4.10) is exact, and in some sense this discretized version can be interpreted as an optimal Monte Carlo sample with equidistant observations.
We conclude that approximation (4.9), with a translated gamma distribution for the small claims layer and a compound Poisson distribution with Pareto tails for the large claims layer, is often a good model for total claim amount modeling in non-life insurance. Moreover, using a discretization with appropriate span size d, the resulting discrete distribution function can be calculated analytically (and we obtain upper and lower bounds which can be controlled).

Table 4.2: Resulting key figures; the 99.5%-VaR corresponds to the 99.5%-quantile of S, see Example 6.25 below. The 99.5%-VaR is calculated with the discretized version with span d = 10 000, therefore we obtain upper and lower bounds resulting from the discretization error in X_sc^± + S_lc^±.

Finally, in Table 4.2 we present the resulting key figures. We observe that the resulting distribution function is substantially more heavy-tailed than the Gaussian distribution, which is not surprising in view of Figure 4.14 (rhs).
4.2.2 Fast Fourier transform

We briefly sketch the fast Fourier transform (FFT) to explain the main idea. We follow Embrechts-Frei [38] and Section 6.7 in Panjer [84], and we also recommend Černý [27] as a reference.
In Chapter 1 we have introduced the moment generating function of X, given by M_X(r) = E[e^{rX}]. The beauty of such transforms is that they allow us to treat independent random variables in an elegant way, in the sense that convolutions turn into products, i.e. for X and Y independent we have (whenever they exist)

M_{X+Y}(r) = M_X(r) M_Y(r).

For compound distributed random variables S we have, see Proposition 2.2,

M_S(r) = M_N(log M_{Y₁}(r)).   (4.11)
If we manage to identify the right-hand side of the latter equation, that is, find Z such that M_N(log M_{Y₁}(r)) = M_Z(r), then Lemma 1.2 explains that S and Z have the same distribution function and we do not need to perform the convolutions (if Z is sufficiently explicit). This is also the idea behind this section.

Assume we have finite support A = {0, …, n−1} and that (f_l)_{l∈A} is a discrete distribution on A. The discrete Fourier transform of (f_l)_l is defined by

f̂_z = Σ_{l=0}^{n−1} f_l exp(2πi zl/n)   for z ∈ A.   (4.12)
no
The discrete Fourier transform has the following nice inversion formula
X
1 n1
zl
fl =
fz exp 2i
n z=0
n
(
for l A.
(4.13)
NL
This provides the first part of the idea to the algorithm: if we are able to explicitly
calculate the discrete Fourier transform (fz )z , then the inversion formula provides
the wanted probability weights (fl )l . Note that this idea also applies if (fl )l are
weights that do not necessarily add up to 1.
Remarks. In the literature one also finds another definition of the discrete Fourier transform, namely in (4.12) the factor 2πi is sometimes replaced by −2πi. This implies that we also need a switch of sign in the inversion formula (4.13). Similarly, the scaling n^{−1} in (4.13) may be shifted to (4.12). Note that the discrete Fourier transform acts on the cyclic group ℤ/nℤ.

The above gives the following recipe:

Step 1. Choose a threshold n ∈ ℕ up to which we would like to determine the distribution function of S, i.e. we are interested in P[S ≤ n−1].
Step 2. Discretize the claim severity distribution G to obtain weights (g_k)_{k∈A}. For discretization we refer to the last section on the Panjer algorithm, see the remarks on page 107. Note that typically we have Σ_{k∈A} g_k < 1, because claims Y_i may exceed threshold n−1 with positive probability.

Step 3. Calculate the discrete Fourier transform (ĝ_z)_{z∈A} of (g_k)_{k∈A}.

Step 4. Calculate the discrete Fourier transform (f̂_z)_{z∈A} of S ∼ (f_l)_{l∈A} using identity (4.11) with r = 2πiz/n and (ĝ_z)_{z∈A}, respectively, that is, set

f̂_z = M_S(2πi z/n) = M_N(log M_{Y₁}(2πi z/n)) = M_N(log ĝ_z).   (4.14)

Step 5. Apply the inversion formula to obtain (f_l)_{l∈A} from (f̂_z)_{z∈A}.
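The recipe can be sketched for the compound Poisson case, where M_N(log ĝ_z) = exp(λv(ĝ_z − 1)) for N ∼ Poi(λv) (our own Python illustration; numpy's fft uses the −2πi sign convention in (4.12), which is harmless here because the matching inverse transform is applied):

```python
import math
import numpy as np

def compound_poisson_fft(lam, g, n):
    """FFT recipe (Steps 1-5): pad the discretized claim size weights to
    length n, transform, apply hat f_z = exp(lam * (hat g_z - 1)) (this is
    M_N(log hat g_z) for N ~ Poi(lam)), and invert. Compound mass beyond
    n - 1 is wrapped around (aliasing error)."""
    g_pad = np.zeros(n)
    g_pad[: len(g)] = g
    g_hat = np.fft.fft(g_pad)
    f_hat = np.exp(lam * (g_hat - 1.0))
    return np.fft.ifft(f_hat).real
```

With a point mass at 1 as claim size distribution the output reproduces the Poisson weights (up to the negligible wrap-around for large enough n).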
The remaining part of the FFT explains how to calculate the discrete Fourier transform (ĝ_z)_{z∈A} of Y₁ ∼ (g_l)_{l∈A} efficiently. There is a nice recursive algorithm that allows one to calculate these discrete Fourier transforms for the choices n = 2^d, d ∈ ℕ₀. The discrete Fourier transform of (g_l)_l for n = 2^d is given by

ĝ_z = Σ_{l=0}^{2^d − 1} g_l exp(2πi zl/2^d)
    = Σ_{l=0}^{2^{d−1} − 1} g_{2l} exp(2πi z(2l)/2^d) + Σ_{l=0}^{2^{d−1} − 1} g_{2l+1} exp(2πi z(2l+1)/2^d)
    = Σ_{l=0}^{2^{d−1} − 1} g_{2l} exp(2πi zl/2^{d−1}) + exp(2πi z/2^d) Σ_{l=0}^{2^{d−1} − 1} g_{2l+1} exp(2πi zl/2^{d−1})
    = ĝ_z^(0) + exp(2πi z/2^d) ĝ_z^(1),

where ĝ_z^(0) is the discrete Fourier transform of (g_l^(0))_{l=0,…,m−1} = (g_{2l})_{l=0,…,m−1} and ĝ_z^(1) is the discrete Fourier transform of (g_l^(1))_{l=0,…,m−1} = (g_{2l+1})_{l=0,…,m−1} for m = 2^{d−1}. Note that this step reduces length 2^d to length 2^{d−1}, and iterating this until we have reduced the total length 2^d to 2^0 = 1 calculates the discrete Fourier transform of (g_l)_l in an efficient way.

Observe that the total length of (f̂_z)_z is also n = 2^d. Therefore, exactly the same recursive algorithm is applied for the calculation of the inversion formula to obtain (f_l)_l.
In R there is a command for the FFT. Use the following lines to transform a discrete, finite distribution g = (g_l)_l (note that the inverse transform in R requires the manual scaling by 1/n):

> g_hat <- fft(g)
> g <- fft(g_hat, inverse = TRUE) / length(g)

For more information on the FFT and calculations with complex numbers we refer to Černý [27].
method                 claims counts N         complexity    precision
direct convolution     any distribution        O(n³)         exact
Panjer algorithm       Panjer distributions    O(n²)         exact
FFT                    any distribution        O(n log n)    not exact
Observe that we have hidden one issue when applying the FFT to compound distributions. As mentioned above, the discrete Fourier transform acts on the cyclic group ℤ/nℤ. But transformation (4.14) does not respect this cyclic structure, and compound claims that exceed n−1 are wrapped around. This wrap-around error (also called aliasing error) can be substantial and needs careful consideration. If it is too large, then n should be increased so that less probability mass exceeds the threshold n−1; an example is provided in Figure 4.15.

Figure 4.15: Panjer algorithm versus FFT for the compound Poisson distribution with λv = 1 and discrete claim size distribution (g_ℓ)_ℓ with g_ℓ = 1/10 for ℓ = 1, …, 10, with (lhs) n = 12, (middle) n = 15, and (rhs) n = 20.
Chapter 5

Ruin Theory in Discrete Time

Ruin theory has its origin in the early twentieth century, when Ernst Filip Oskar Lundberg (1876-1965) [71] wrote his famous Uppsala PhD thesis in 1903. It was later the distinguished Swedish mathematician and actuary Harald Cramér (1893-1985) [29, 30] who developed the cornerstones in collective risk and ruin theory and made many of Lundberg's ideas mathematically rigorous. Therefore, the underlying process studied in ruin theory is called the Cramér-Lundberg process. For the collected work of Cramér we refer to [31]. Since then a vast literature has developed in this field; important contributions are Feller [45], Bühlmann [19], Rolski et al. [89], Asmussen-Albrecher [7], Dickson [36], Kaas et al. [64] and many scientific papers by Hans-Ulrich Gerber and Elias S.W. Shiu. Therefore, this theory is sometimes also called Gerber-Shiu risk theory, see Kyprianou [68].

5.1 Net profit condition
We consider the surplus process (C_t)_{t∈ℕ₀} with initial capital c₀ ≥ 0, given by

C_t = C_t^(c₀) = c₀ + Σ_{u=1}^{t} (Π_u − S_u),

where Π_u denotes the premium income and S_u the total claim amount of accounting year u. A minimal solvency requirement is

C_t ≥ 0   for all t ≥ 0,

otherwise the company cannot fulfill its liabilities at any point in time t ∈ ℕ₀. In the present set-up we look at a homogeneous surplus process (having independent and stationary increments X_t = Π_t − S_t). Moreover, no financial return on assets is considered. Of course, this is a rather synthetic situation. For the present purpose it is sufficient because it already highlights crucial issues, and it will be refined for solvency considerations in Chapter 10.
Definition 5.2 (ruin time and finite horizon ruin probability). We define the ruin time of the surplus process (C_t)_{t∈ℕ₀} by

τ = inf{s ∈ ℕ₀; C_s < 0}.

The finite horizon ruin probability up to time t ∈ ℕ and for initial capital c₀ ≥ 0 is defined by

ψ_t(c₀) = P[τ ≤ t | C₀ = c₀] = P[ inf_{s=0,…,t} C_s^(c₀) < 0 ].
Remark on the notation. Below we use that for c₀ = 0 the stochastic process (C_t^(0))_{t∈ℕ₀} = (C_t)_{t∈ℕ₀} is a random walk on the probability space (Ω, 𝔉, P) starting at zero. The general surplus process can then be described by (C_t^(c₀))_{t∈ℕ₀} = (C_t^(0) + c₀)_{t∈ℕ₀} under P and, as stated in Definition 5.2, we can indicate the initial capital by using the notation P[·|C₀ = c₀]. In Markov process theory it has become standard to write the latter as P_{c₀}[·], meaning that (C_t)_{t∈ℕ₀} under P_{c₀} is equal in law to (C_t^(0) + c₀)_{t∈ℕ₀} under P.
The event {τ ≤ t} can be written as follows:

{τ ≤ t} = { inf{s ∈ ℕ₀; C_s < 0} ≤ t } = ∪_{s=0,…,t} {C_s < 0},

and therefore τ is a stopping time w.r.t. the filtration generated by (C_t)_{t∈ℕ₀}. To consider the limiting case t → ∞ we need to extend the positive real line by an additional point {∞}, because τ is not necessarily finite, P-a.s. We use the notation R̄₊ for the extended positive real line [0, ∞].

The finite horizon ruin probability ψ_t(c₀) is non-decreasing in t and it is bounded by 1 (because it is a probability). This immediately implies convergence for t → ∞, and we can define

ψ(c₀) = lim_{t→∞} ψ_t(c₀).   (5.1)
Lemma 5.3 (ultimate ruin probability). The ultimate ruin probability for initial capital c₀ ≥ 0 is given by

ψ(c₀) = lim_{t→∞} ψ_t(c₀) = P_{c₀}[τ < ∞] ∈ [0, 1].
Proof. The second equality is a direct consequence of the definition; note that

{τ < ∞} = ∪_{t∈ℕ₀} {τ ≤ t} = ∪_{t∈ℕ₀} ∪_{s=0,…,t} {C_s < 0} = ∪_{t∈ℕ₀} {C_t < 0} = { inf_{t∈ℕ₀} C_t < 0 }.

For the first equality we use the monotone convergence property of probability measures; note {τ ≤ t} ⊂ {τ ≤ t+1}, so that

P_{c₀}[τ < ∞] = P_{c₀}[ ∪_{t∈ℕ₀} {τ ≤ t} ] = lim_{t→∞} P_{c₀}[τ ≤ t] = lim_{t→∞} ψ_t(c₀) = ψ(c₀).  □
We define the random walk (Z_t)_{t∈ℕ₀} by

Z_t = C_t^(0) = C_t − c₀ = Σ_{u=1}^{t} (Π_u − S_u) = Σ_{u=1}^{t} X_u.   (5.2)
Theorem 5.4 (random walk theorem). Assume the X_t are i.i.d. with P[X₁ = 0] < 1 and E[|X₁|] < ∞. The random walk (Z_t)_{t∈ℕ₀} defined in (5.2) has one of the following three behaviors:
(i) if E[X₁] > 0, then lim_{t→∞} Z_t = ∞, P-a.s.;
(ii) if E[X₁] < 0, then lim_{t→∞} Z_t = −∞, P-a.s.;
(iii) if E[X₁] = 0, then lim sup_{t→∞} Z_t = ∞ and lim inf_{t→∞} Z_t = −∞, P-a.s.

Corollary 5.5 (ultimate ruin with probability one). Assume E[Π₁] ≤ E[S₁]. Then ψ(c₀) ≡ 1 for any initial capital c₀ ≥ 0.

Proof. The random walk theorem implies for E[X₁] = E[Π₁] − E[S₁] ≤ 0 that lim inf_{t→∞} Z_t = −∞, P-a.s., and thus lim inf_{t→∞} C_t = −∞, P_{c₀}-a.s. (for any c₀ ≥ 0). But this means that we have ultimate ruin with probability 1. □
Henceforth, to avoid almost sure ultimate ruin we need to charge an (expected) annual premium E[Π₁] which exceeds the expected annual claim E[S₁]. This gives rise to the following standard assumption.

Assumption 5.6 (net profit condition). The surplus process satisfies the net profit condition (NPC) given by

E[Π₁] > E[S₁].
5.2 Lundberg bound

Our next goal is to find more explicit bounds on the ruin probability as a function of the initial capital c₀ ≥ 0. We start with a lemma which gives the renewal property of the surplus process. We define the distribution function F by S₁ − Π₁ ∼ F; thus, the −X_t are i.i.d. with distribution function F. Note that from S₁ ∼ F_S, −Π₁ ∼ F_{−Π} and the independence of S₁ and Π₁ it follows that F is the convolution F = F_S * F_{−Π}.
Lemma 5.8. The finite horizon ruin probability and the ultimate ruin probability satisfy the following equations for t ∈ ℕ₀ and initial capital c₀ ≥ 0:

ψ_{t+1}(c₀) = 1 − F(c₀) + ∫_{−∞}^{c₀} ψ_t(c₀ − y) dF(y),

ψ(c₀) = 1 − F(c₀) + ∫_{−∞}^{c₀} ψ(c₀ − y) dF(y).

Proof. We start with the finite horizon ruin probability. Observe that we have the partition for c₀ ≥ 0

{τ ≤ t+1} = {τ ≤ 1} ∪ {1 < τ ≤ t+1} = {S₁ − Π₁ > c₀} ∪ {1 < τ ≤ t+1}.

This implies

ψ_{t+1}(c₀) = P[S₁ − Π₁ > c₀] + ∫_{−∞}^{c₀} P_{c₀}[1 < τ ≤ t+1 | C₁ = c₀ − y] dF(y)
            = P[S₁ − Π₁ > c₀] + ∫_{−∞}^{c₀} P_{c₀−y}[τ ≤ t] dF(y)
            = 1 − F(c₀) + ∫_{−∞}^{c₀} ψ_t(c₀ − y) dF(y).

The ultimate ruin probability statement is a direct consequence of the finite horizon statement. Using that we have point-wise convergence (5.1) and that ψ_t is bounded by 1, which is integrable w.r.t. dF, we can apply the dominated convergence theorem to the finite horizon ruin probability statement, which provides the claim for the ultimate ruin probability as t → ∞.  □
Lemma 5.10 (uniqueness of the Lundberg coefficient). Assume that (NPC) holds and that a Lundberg coefficient R > 0 exists. Then R is unique.

Proof. Due to the existence of a Lundberg coefficient R > 0 and due to the independence between S₁ and Π₁, the following function is well-defined for all r ∈ [0, R]:

r ↦ h(r) = log M_{S₁−Π₁}(r) = log( M_{S₁}(r) M_{−Π₁}(r) ) = log E[e^{rS₁}] + log E[e^{−rΠ₁}].

Similarly to Lemma 1.6 we see that h(r) is a convex function on [0, R] with h(0) = 0 and h'(0) = E[S₁ − Π₁] < 0 under (NPC). But then there is at most one R > 0 with h(R) = 0. This proves the uniqueness of the Lundberg coefficient.  □
Theorem 5.11 (Lundberg's exponential bound). Assume (NPC) holds and the Lundberg coefficient R > 0 exists. Then

ψ(c₀) ≤ e^{−Rc₀}   for all c₀ ≥ 0.

Proof. It suffices to prove that ψ_t(c₀) ≤ e^{−Rc₀} for all t ∈ ℕ, because ψ_t(c₀) → ψ(c₀) for t → ∞. We apply Lemma 5.8 to the finite horizon ruin probability ψ_t(c₀) to obtain the following proof by induction.

t = 1: We apply Chebychev's inequality to obtain for the Lundberg coefficient R > 0 and any c₀ ≥ 0

ψ₁(c₀) = P_{c₀}[τ ≤ 1] = P[S₁ − Π₁ > c₀] = P[ e^{R(S₁−Π₁)} > e^{Rc₀} ] ≤ e^{−Rc₀} M_{S₁−Π₁}(R) = e^{−Rc₀}.

t → t+1: We assume that the claim holds true for ψ_t(c₀) and any c₀ ≥ 0. Then with Lemma 5.8

ψ_{t+1}(c₀) = ∫_{c₀}^{∞} dF(y) + ∫_{−∞}^{c₀} ψ_t(c₀ − y) dF(y)
            ≤ ∫_{c₀}^{∞} e^{−R(c₀−y)} dF(y) + ∫_{−∞}^{c₀} e^{−R(c₀−y)} dF(y) = e^{−Rc₀} M_{S₁−Π₁}(R) = e^{−Rc₀},

due to the choice of the Lundberg coefficient R > 0. This proves the Lundberg bound.  □
Note that the existence of the Lundberg coefficient R > 0 in particular requires M_{S₁}(R) < ∞, and hence, by Chebychev's inequality,

P[S₁ > x] ≤ M_{S₁}(R) e^{−Rx} → 0   as x → ∞.

This means that the claims S₁ have exponentially decaying tails, which are so-called light-tailed claims.

A main question is whether this exponential bound can be improved in the case where the Lundberg coefficient exists. The difficulty in most selected cases is that the ultimate ruin probability cannot be calculated explicitly. An exception is the Bernoulli case.
Proposition 5.12 (Bernoulli random walk). Assume that the X_t are i.i.d. with P[X_t = 1] = p and P[X_t = −1] = 1 − p for given p > 1/2. For all c₀ ∈ ℕ₀ we have

ψ(c₀) = ((1−p)/p)^{c₀+1}.

Note that this model is obtained by assuming Π_t ≡ 1 and S_t ∈ {0, 2}, with probability p of having a zero claim.

Proof. We choose a finite interval (−1, a) for a ∈ ℕ and define for fixed c₀ ∈ [0, a) ∩ ℕ₀ the stopping time

τ_a = inf{ s ∈ ℕ₀; C_s = c₀ + Z_s ∉ (−1, a) }.

The random walk theorem implies τ_a < ∞, P-a.s., because the interval (−1, a) is finite. We define the random variable

Y_t = ((1−p)/p)^{c₀+Z_t} = ((1−p)/p)^{C_t}.
It satisfies

E[Y_t | Y_{t−1}] = Y_{t−1} E[ ((1−p)/p)^{X_t} | Y_{t−1} ] = Y_{t−1} ( p · (1−p)/p + (1−p) · ((1−p)/p)^{−1} ) = Y_{t−1},

thus (Y_t)_{t≥0} is a martingale. Then also the stopped process (Y_{τ_a ∧ t})_{t≥0} is a martingale. Moreover, the latter martingale is bounded, and since the stopping time is finite, P-a.s., we can apply the stopping theorem (uniform integrability), see Section 10.10 in Williams [97], which provides

((1−p)/p)^{c₀} = E[Y₀] = E[Y_{τ_a}]
 = ((1−p)/p)^{−1} P_{c₀}[C_{τ_a} = −1] + ((1−p)/p)^{a} P_{c₀}[C_{τ_a} = a]
 = ((1−p)/p)^{−1} P_{c₀}[C_{τ_a} = −1] + ((1−p)/p)^{a} (1 − P_{c₀}[C_{τ_a} = −1]),

where the last step follows because (C_t)_{t∈ℕ₀} leaves the interval (−1, a), P_{c₀}-a.s., either at −1 or at a. This provides the identity

P_{c₀}[C_{τ_a} = −1] = [ ((1−p)/p)^{c₀} − ((1−p)/p)^{a} ] / [ ((1−p)/p)^{−1} − ((1−p)/p)^{a} ].

Finally, note that {C_{τ_a} = −1} is increasing in a, and thus, using p > 1/2,

ψ(c₀) = lim_{a→∞} P_{c₀}[C_{τ_a} = −1] = ((1−p)/p)^{c₀+1}.  □
The Lundberg coefficient for the Bernoulli random walk is found by the positive solution of

M_{S₁−Π₁}(R) = p e^{−R} + (1−p) e^{R} = 1,   i.e.   R = log( p/(1−p) ) > 0.

This provides e^{−Rc₀} = ((1−p)/p)^{c₀} and hence ψ(c₀) = ((1−p)/p) e^{−Rc₀} ≤ e^{−Rc₀}. That is, the Lundberg bound is optimal in the sense that we cannot improve the exponential order of decay, because the Lundberg coefficient R already provides the optimal order.
5.3 Pollaczek-Khinchin formula

In most cases we cannot explicitly calculate the ultimate ruin probability ψ(c₀). Exceptions are the Bernoulli random walk of Proposition 5.12 and the Cramér-Lundberg process in continuous time with an exponential claim size distribution, see (5.3.8) in Rolski et al. [89]. In other cases where the Lundberg coefficient exists we apply Lundberg's exponential bound of Theorem 5.11, or refined versions thereof. But the following question remains: what can we do if the Lundberg coefficient does not exist, i.e. if the tail probability of S_t does not necessarily decay exponentially? The latter is quite typical in non-life insurance modeling.

5.3.1 Ladder epochs

We define γ₀ = 0 and, recursively for k ∈ ℕ,

γ_k = inf{ t > γ_{k−1}; Z_t < Z_{γ_{k−1}} }   if γ_{k−1} < ∞,   and   γ_k = ∞   otherwise.
γ_k is called the k-th strong descending ladder epoch, see (6.3.6) in Rolski et al. [89]. These stopping times form an increasing sequence that records the arrivals of new ladder heights (descending records). For their distribution functions we have, under the i.i.d. property of the X_t's (independent and stationary increments),

P[γ_k < ∞] = P[γ₁ < ∞]^k = ψ(0)^k.

The probability of a finite ladder epoch is exactly equal to the ultimate ruin probability ψ(0) with initial capital c₀ = 0.

Note that we could have Π_t − S_t ≥ 0, P-a.s., which would imply that the ultimate ruin probability ψ(0) = 0, because the premium collected is bigger than the maximal claim, P-a.s. We exclude this situation, as it is not interesting for ruin probability considerations and because the insured will (hopefully) never pay a premium that exceeds his maximal loss in any situation. Henceforth, under (NPC) we throughout assume that ψ(0) ∈ (0,1) (where the upper bound follows from (NPC)).
We define the random variable
\[
K^+ = \sup\left\{k\in\mathbb{N}_0;\;\tau_k<\infty\right\}.
\]
$K^+$ counts the total number of finite ladder epochs, i.e. the total number of strong descending records. We have (applying the tower property several times)
\[
\mathbb{P}\left[K^+ = k\right]
= \mathbb{P}\left[\tau_k<\infty\right]-\mathbb{P}\left[\tau_{k+1}<\infty\right]
= \psi(0)^k\left(1-\psi(0)\right),
\]
that is, the total number of finite ladder epochs has a geometric distribution with success probability $1-\psi(0)\in(0,1)$ under (NPC). On the set $\{K^+=k\}$, $k\ge 1$, we study the ladder heights which are for $l\le k$ given by
\[
Z_l^+ = Z_{\tau_{l-1}} - Z_{\tau_l} > 0,
\qquad \mathbb{P}\text{-a.s.}
\]
The random variable $Z_l^+$ measures by which amount the old local minimum $Z_{\tau_{l-1}}$ is improved. Due to the i.i.d. property of the $X_t$'s we have
\[
\mathbb{P}\left[\left.\bigcap_{l=1}^{k}\left\{Z_l^+\le x_l\right\}\right|K^+=k\right]
= \prod_{l=1}^{k}\mathbb{P}\left[\left.Z_l^+\le x_l\,\right|\tau_l<\infty\right]
= \prod_{l=1}^{k}H(x_l),
\qquad (5.3)
\]
where the distribution function $H$ neither depends on $k$ nor on $l$. Thus, the ladder heights $(Z_l^+)_{l=1,\dots,k}$ are i.i.d. on the set $\{K^+=k\}$. Finally, we consider the maximal ladder height achieved by $(Z_t)_{t\in\mathbb{N}_0}$; this is (minus) the global minimum of the random walk $(Z_t)_{t\in\mathbb{N}_0}$,
\[
M^+ = \sum_{l=1}^{K^+} Z_l^+
= Z_0 - Z_{\tau_{K^+}}
= -Z_{\tau_{K^+}}
= \sup_{t\in\mathbb{N}_0}(-Z_t)
= -\inf_{t\in\mathbb{N}_0} Z_t.
\]
This now allows us to study the ultimate ruin probability as follows. Choose initial capital $c_0\ge 0$. The ultimate ruin probability is given by
\[
\psi(c_0)
= \mathbb{P}\left[\inf_{t\in\mathbb{N}_0} C_t < 0\right]
= \mathbb{P}\left[\inf_{t\in\mathbb{N}_0} Z_t < -c_0\right]
= \mathbb{P}\left[M^+ > c_0\right]
= \sum_{k\in\mathbb{N}_0}\mathbb{P}\left[K^+=k\right]
\mathbb{P}\left[\left.\sum_{l=1}^{K^+} Z_l^+ > c_0\,\right| K^+=k\right]
= (1-\psi(0))\sum_{k\in\mathbb{N}}\psi(0)^k\left(1-H^{*k}(c_0)\right).
\]
This proves Spitzer's formula, which is Corollary 6.3.1 in Rolski et al. [89]:
Theorem 5.13 (Spitzer's formula). Assume $\psi(0)\in(0,1)$. Then for $c_0\ge 0$
\[
\psi(c_0) = (1-\psi(0))\sum_{k\in\mathbb{N}}\psi(0)^k\left(1-H^{*k}(c_0)\right).
\]
Recall that the underlying random walk is given by
\[
C_t - c_0 = \sum_{u=1}^t (\Pi_u - S_u) = \sum_{u=1}^t X_u.
\]
5.3.2 Cramér-Lundberg process
In classical (continuous time) ruin theory one starts with a homogeneous Poisson point process $(N_t)_{t\in\mathbb{R}_+}$ having constant intensity $\lambda > 0$ for the arrival of claims. The premium income is modeled proportionally to time with constant premium rate $\pi > 0$. The continuous time surplus process is then defined by $C_0 = c_0 \ge 0$ and for $t > 0$
\[
C_t = c_0 + \pi t - \sum_{u=1}^{N_t} S_u,
\qquad (5.4)
\]
with i.i.d. claim amounts $S_u$ satisfying $S_u > 0$, $\mathbb{P}$-a.s., and with these claim amounts being independent of the claims arrival process $(N_t)_{t\in\mathbb{R}_+}$. This continuous time surplus process $(C_t)_{t\in\mathbb{R}_+}$ is called Cramér-Lundberg process. Definition 5.2 of the ruin time is then extended to continuous time, namely
\[
\tau = \inf\left\{s\in\mathbb{R}_+;\; C_s < 0\right\}.
\]
Note that ruin can only occur at time points where claims happen; otherwise the continuous time surplus process $(C_t)_{t\in\mathbb{R}_+}$ is strictly increasing with constant slope $\pi > 0$ (in fact, the continuous time surplus process is a spectrally negative Lévy process, see Chapter 1 in Kyprianou [68]). We define the inter-arrival times between two claims by $W_u$, $u\in\mathbb{N}$. For the homogeneous Poisson point process $(N_t)_{t\in\mathbb{R}_+}$ these inter-arrival times are i.i.d. exponentially distributed with parameter $\lambda > 0$. Therefore, we can rewrite the continuous time surplus process in these claims arrival times: define $V_n = \sum_{u=1}^n W_u$ and
\[
C_n \stackrel{\text{def.}}{=} C_{V_n}
= c_0 + \pi V_n - \sum_{u=1}^{N_{V_n}} S_u
= c_0 + \sum_{u=1}^n \left(\pi W_u - S_u\right).
\]
This is exactly the set-up of Definition 5.1 with i.i.d. premia $\Pi_u = \pi W_u$, $u\in\mathbb{N}$. A crucial thing that has changed is time, moving from $t\in\mathbb{R}_+$ to operational time $n\in\mathbb{N}_0$, and therefore
\[
\mathbb{P}\left[\tau<\infty\,\middle|\,C_0=c_0\right] = \mathbb{P}_{c_0}\left[\tau<\infty\right] = \psi(c_0),
\qquad (5.5)
\]
with $\Pi_u = \pi W_u$. For (NPC) we require a premium rate $\pi > 0$ such that
\[
0 < \mathbb{E}[X_1] = \pi\,\mathbb{E}[W_1] - \mathbb{E}[S_1] = \pi/\lambda - \mathbb{E}[S_1]
\qquad\Longleftrightarrow\qquad
\pi > \lambda\,\mathbb{E}[S_1].
\]
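The discretized skeleton $C_n = c_0 + \sum_{u\le n}(\pi W_u - S_u)$ can be simulated directly at the claim arrivals. A hedged sketch with illustrative parameters; exponential claims are chosen only because they admit the closed form $\psi(c_0) = \gamma\,e^{-(1-\gamma)c_0/\mu}$ with $\gamma = \lambda\mu/\pi$, quoted above via (5.3.8) in Rolski et al. [89]:

```python
import random, math

def simulate_ruin(c0, prem_rate, lam, claim_mean, n_paths=2000, n_claims=300, seed=7):
    # Skeleton C_n = c0 + sum_{u<=n} (pi * W_u - S_u) at claim arrivals:
    # W_u ~ Exp(lam) inter-arrival times, S_u ~ Exp(1/claim_mean) claims.
    # Ruin can only occur at claim arrivals, so checking the skeleton suffices.
    rng = random.Random(seed)
    ruined = 0
    for _ in range(n_paths):
        c = c0
        for _ in range(n_claims):
            w = rng.expovariate(lam)                # inter-arrival time W_u
            s = rng.expovariate(1.0 / claim_mean)   # claim size S_u
            c += prem_rate * w - s
            if c < 0:
                ruined += 1
                break
    return ruined / n_paths

lam, mu, prem = 1.0, 1.0, 1.25      # NPC holds: prem > lam * mu
gamma = lam * mu / prem             # = psi(0) = 0.8
results = []
for c0 in (0.0, 2.0, 5.0):
    exact = gamma * math.exp(-(1 - gamma) * c0 / mu)  # exponential-claims formula
    results.append((c0, exact, simulate_ruin(c0, prem, lam, mu)))
print(results)
```

The truncation after a fixed number of claims is harmless here because the positive drift makes late ruin extremely unlikely.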
In this set-up the ladder height distribution $H$ of Spitzer's formula can be calculated explicitly, namely
\[
H(x) = 1 - \mathbb{E}[S_1]^{-1}\int_x^\infty \mathbb{P}[S_1>y]\,dy
= \mathbb{E}[S_1]^{-1}\int_0^x \mathbb{P}[S_1>y]\,dy.
\qquad (5.6)
\]
We do not prove this statement; it uses the Wiener-Hopf factorization, for details we refer to Theorem 6.4.4 in Rolski et al. [89].

Note that $H$ is a distribution function on $\mathbb{R}_+$ because $\int_0^\infty \mathbb{P}[S_1>y]\,dy = \mathbb{E}[S_1]$. This then allows to state the following theorem, which gives the Félix Pollaczek (1892-1981) and Aleksandr Yakovlevich Khinchin (1894-1959) formula.
Theorem 5.14 (Pollaczek-Khinchin formula). Assume the Cramér-Lundberg set-up (5.4) with (NPC), and set $\gamma = \lambda\,\mathbb{E}[S_1]/\pi = \psi(0)\in(0,1)$. Then for $c_0\ge 0$
\[
\psi(c_0) = (1-\gamma)\sum_{k\in\mathbb{N}}\gamma^k\left(1-H^{*k}(c_0)\right),
\]
with $H$ given in (5.6).
Moreover, the ultimate ruin probability satisfies the integral equation
\[
\psi(c_0) = \frac{\lambda}{\pi}\left[\int_{c_0}^\infty (1-F_S(x))\,dx
+ \int_0^{c_0}\psi(c_0-x)\,(1-F_S(x))\,dx\right],
\]
with distribution function $S_1 \sim F_S$. We do not prove this statement because the Pollaczek-Khinchin formula is sufficient for our purposes. The exact assumptions and a proof of this integral equation are, for instance, provided in Rolski et al. [89], Theorem 5.3.2.
We conclude that for the compound Poisson case (5.4) we have three different
descriptions for the ultimate ruin probability: (i) probabilistic description, (ii)
Pollaczek-Khinchin formula from renewal theory, and (iii) the integral equation.
Depending on the problem one then chooses the most convenient one, i.e. we can
apply different techniques coming from different fields to solve the questions.
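The Pollaczek-Khinchin formula also has a direct sampling interpretation: $\psi(c_0)$ is the tail probability of a geometric compound of i.i.d. ladder heights with distribution $H$. For exponential claims the integrated-tail distribution $H$ is again exponential, which allows an immediate check against the closed form; a sketch (all parameter values are illustrative):

```python
import random, math

def psi_pollaczek_khinchin_mc(c0, gamma, mu, n_sims=20000, seed=3):
    # psi(c0) = P[ Z_1 + ... + Z_{K+} > c0 ], where K+ is geometric with
    # P[K+ = k] = (1 - gamma) gamma^k, and the ladder heights Z_l are i.i.d.
    # with the integrated-tail distribution H; for Exp(1/mu) claims H is
    # again Exp(1/mu), which is used below.
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        total = 0.0
        while rng.random() < gamma:       # one more finite ladder epoch
            total += rng.expovariate(1.0 / mu)
            if total > c0:
                hits += 1
                break
    return hits / n_sims

gamma, mu, c0 = 0.8, 1.0, 3.0
exact = gamma * math.exp(-(1 - gamma) * c0 / mu)   # closed form, exponential case
print(exact, psi_pollaczek_khinchin_mc(c0, gamma, mu))
```

This is the renewal-theoretic description (ii) turned into a simulation algorithm; for heavy tailed $H$ the same sampler works once one can draw from $H$.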
5.4 Subexponential claim sizes
A distribution function $F$ supported on $\mathbb{R}_+$ is called subexponential if
\[
\lim_{x\to\infty}\frac{1-F^{*2}(x)}{1-F(x)} = 2.
\]
The following lemma collects the key properties of subexponential distribution functions.

Lemma 5.15. Assume $F$ is subexponential. Then:

1. For all $n\in\mathbb{N}$
\[
\lim_{x\to\infty}\frac{1-F^{*n}(x)}{1-F(x)} = n.
\]
In fact, this is an if and only if statement.

2. For all $r>0$ we have $e^{rx}(1-F(x))\to\infty$ as $x\to\infty$.

3. For all $\varepsilon>0$ there exists $D<\infty$ such that for all $n\ge 2$ and all $x\ge 0$
\[
\frac{1-F^{*n}(x)}{1-F(x)} \le D\,(1+\varepsilon)^n.
\]
Proof of Lemma 5.15. We start with the following statement for subexponential distribution functions $F$: for all $t\in\mathbb{R}$
\[
\lim_{x\to\infty}\frac{1-F(x-t)}{1-F(x)} = 1.
\qquad (5.7)
\]
We first prove (5.7). Choose $t\ge 0$; then we have for $x>t$, using monotonicity of $F$,
\[
\frac{1-F^{*2}(x)}{1-F(x)}-1
= \frac{F(x)-F^{*2}(x)}{1-F(x)}
= \int_0^x\frac{1-F(x-y)}{1-F(x)}\,dF(y)
= \int_0^t\frac{1-F(x-y)}{1-F(x)}\,dF(y)+\int_t^x\frac{1-F(x-y)}{1-F(x)}\,dF(y)
\ge F(t)+\frac{1-F(x-t)}{1-F(x)}\left(F(x)-F(t)\right).
\]
This implies (the sandwich is for $\liminf_{x\to\infty}$ and $\limsup_{x\to\infty}$)
\[
1 \le \limsup_{x\to\infty}\frac{1-F(x-t)}{1-F(x)}
\le \limsup_{x\to\infty}\frac{1}{F(x)-F(t)}\left[\frac{1-F^{*2}(x)}{1-F(x)}-1-F(t)\right]
= \frac{2-1-F(t)}{1-F(t)} = 1.
\]
For $t<0$ note that
\[
\lim_{x\to\infty}\frac{1-F(x-t)}{1-F(x)}
= \lim_{x\to\infty}\left(\frac{1-F(x)}{1-F(x-t)}\right)^{-1}
= \lim_{y\to\infty}\left(\frac{1-F(y-(-t))}{1-F(y)}\right)^{-1} = 1,
\qquad (5.8)
\]
where we substituted $y=x-t$ and applied the case $-t>0$. This proves (5.7). Moreover, for all $n\ge 1$ and $x\ge 0$ we have the identity
\[
\frac{1-F^{*(n+1)}(x)}{1-F(x)}
= 1+\int_0^x\frac{1-F^{*n}(x-y)}{1-F(x)}\,dF(y).
\qquad (5.9)
\]
We now turn to the proof of the first statement of Lemma 5.15. We prove the claim by induction. For $n=1,2$ the statement holds true by definition. Thus, we assume that it holds true for $n\ge 2$ and we would like to prove it for $n+1$. Choose $\varepsilon>0$; then there exists $x_0$ such that for all $x>x_0$
\[
\left|\frac{1-F^{*n}(x)}{1-F(x)}-n\right| < \varepsilon.
\]
Using (5.9) we obtain
\[
\frac{1-F^{*(n+1)}(x)}{1-F(x)}-1
= \frac{F(x)-F^{*(n+1)}(x)}{1-F(x)}
= \int_0^x\frac{1-F^{*n}(x-y)}{1-F(x)}\,dF(y)
= \int_0^{x-x_0}\frac{1-F^{*n}(x-y)}{1-F(x-y)}\,\frac{1-F(x-y)}{1-F(x)}\,dF(y)
+ \int_{x-x_0}^x\frac{1-F^{*n}(x-y)}{1-F(x)}\,dF(y).
\]
The second integral is non-negative, and using (5.7) we obtain
\[
\limsup_{x\to\infty}\int_{x-x_0}^x\frac{1-F^{*n}(x-y)}{1-F(x)}\,dF(y)
\le \limsup_{x\to\infty}\int_{x-x_0}^x\frac{1}{1-F(x)}\,dF(y)
= \limsup_{x\to\infty}\frac{F(x)-F(x-x_0)}{1-F(x)}
= -1+\limsup_{x\to\infty}\frac{1-F(x-x_0)}{1-F(x)} = 0.
\]
For the first integral we have for $x>x_0$, using the triangle inequality,
\[
\left|\int_0^{x-x_0}\frac{1-F^{*n}(x-y)}{1-F(x-y)}\,\frac{1-F(x-y)}{1-F(x)}\,dF(y)
- n\int_0^{x-x_0}\frac{1-F(x-y)}{1-F(x)}\,dF(y)\right|
\le \int_0^{x-x_0}\left|\frac{1-F^{*n}(x-y)}{1-F(x-y)}-n\right|\frac{1-F(x-y)}{1-F(x)}\,dF(y)
\le \varepsilon\int_0^{x-x_0}\frac{1-F(x-y)}{1-F(x)}\,dF(y).
\]
Finally observe
\[
\int_0^{x-x_0}\frac{1-F(x-y)}{1-F(x)}\,dF(y)
= \int_0^{x}\frac{1-F(x-y)}{1-F(x)}\,dF(y)
- \int_{x-x_0}^{x}\frac{1-F(x-y)}{1-F(x)}\,dF(y);
\]
the first integral converges to 1, see (5.8), and the second integral converges to 0 because it is non-negative with
\[
\limsup_{x\to\infty}\int_{x-x_0}^x\frac{1-F(x-y)}{1-F(x)}\,dF(y)
\le \limsup_{x\to\infty}\int_{x-x_0}^x\frac{1}{1-F(x)}\,dF(y)
= -1+\limsup_{x\to\infty}\frac{1-F(x-x_0)}{1-F(x)} = 0.
\]
This proves that for all $\varepsilon>0$ there exists $x_1\ge x_0$ such that for all $x>x_1$ we have
\[
\left|\frac{1-F^{*(n+1)}(x)}{1-F(x)}-(n+1)\right| \le 4\varepsilon.
\]
This proves the first statement of Lemma 5.15. We now turn to the second statement of the lemma. Note that for $0<y<x$
\[
e^{rx}(1-F(x)) = \frac{1-F(x)}{1-F(x-y)}\left(1-F(x-y)\right)e^{r(x-y)}\,e^{ry}.
\]
Choose $\varepsilon\in(0,1)$ and set $y=\frac{1}{r}\log(3/(1-\varepsilon))>0$. With (5.7) there exists $x_0$ such that for all $x>x_0$
\[
\frac{1-F(x)}{1-F(x-y)} \ge 1-\varepsilon,
\]
and therefore for all $x>x_0$
\[
e^{rx}(1-F(x)) \ge (1-\varepsilon)\,e^{ry}\,e^{r(x-y)}\left(1-F(x-y)\right)
= 3\,e^{r(x-y)}\left(1-F(x-y)\right).
\]
Iterating this inequality along $x, x+y, x+2y,\dots$ shows that $e^{rx}(1-F(x))\to\infty$ as $x\to\infty$.

For the third statement define $\alpha_n = \sup_{x\ge 0}\frac{1-F^{*n}(x)}{1-F(x)}$, and note $\alpha_1=1$. The subexponentiality of $F$ implies that for all $\varepsilon>0$ there exists $x_0$ such that for all $x>x_0$
\[
\int_0^x\frac{1-F(x-y)}{1-F(x)}\,dF(y) = \frac{1-F^{*2}(x)}{1-F(x)}-1 \le 1+\varepsilon.
\]
For $x\le x_0$ we have the trivial bound $\frac{1-F^{*(n+1)}(x)}{1-F(x)}\le\frac{1}{1-F(x_0)}$, and for $x>x_0$ the identity (5.9) together with the above display gives $\frac{1-F^{*(n+1)}(x)}{1-F(x)}\le 1+\alpha_n(1+\varepsilon)$. Together,
\[
\alpha_{n+1} \le 1+\frac{1}{1-F(x_0)}+\alpha_n\,(1+\varepsilon).
\]
Iteration provides
\[
\alpha_{n+1}
\le 1+\frac{1}{1-F(x_0)}+\left(1+\frac{1}{1-F(x_0)}+\alpha_{n-1}(1+\varepsilon)\right)(1+\varepsilon)
\le \left(1+\frac{1}{1-F(x_0)}\right)\sum_{k=0}^{n-1}(1+\varepsilon)^k+\alpha_1(1+\varepsilon)^n
\le \left(1+\frac{1}{1-F(x_0)}\right)\sum_{k=0}^{n}(1+\varepsilon)^k
\le \left(1+\frac{1}{1-F(x_0)}\right)\frac{1}{\varepsilon}(1+\varepsilon)^{n+1},
\]
which proves the claim for $D=\left(1+(1-F(x_0))^{-1}\right)/\varepsilon\in(0,\infty)$. This proves Lemma 5.15. $\Box$
Consider a random variable $X$ with subexponential distribution function $F$ and choose $r>0$. The second statement of Lemma 5.15 provides
\[
\mathbb{E}\left[e^{rX}\right]
= \int_0^\infty\mathbb{P}\left[e^{rX}>y\right]dy
\ge \int_1^\infty\mathbb{P}\left[X>\log(y)/r\right]dy
= r\int_0^\infty e^{rx}\,\mathbb{P}[X>x]\,dx = \infty.
\qquad (5.10)
\]
We conclude that for any $r>0$ the moment generating function of subexponential distributions does not exist, and therefore there is no Lundberg coefficient in this case. We call such subexponential distributions heavy tailed distributions.
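A quick numerical illustration of (5.10) for a Pareto survival function ($\theta=1$, $\alpha=2$ and $r=0.01$ are illustrative choices): beyond the minimum of $x\mapsto e^{rx}x^{-\alpha}$ the exponentially weighted tail grows without bound, so no Lundberg coefficient can exist.

```python
import math

def survival(x):
    # Pareto(1, 2) survival function
    return 1.0 if x <= 1.0 else x ** -2.0

r = 0.01
# e^{r x} (1 - F(x)) is minimal near x = alpha / r = 200 and explodes afterwards
vals = [math.exp(r * x) * survival(x) for x in (200, 500, 1000, 2000, 4000)]
print(vals)
```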
Theorem 2.5.5 in Rolski et al. [89] gives an important sufficient condition for having
a subexponential distribution.
Lemma 5.16 (regularly varying survival function). Assume that $F$ is supported on $\mathbb{R}_+$ and has a regularly varying survival function at infinity with index $\alpha\in(0,\infty)$, i.e. for all $y>0$
\[
\lim_{x\to\infty}\frac{1-F(xy)}{1-F(x)} = y^{-\alpha}.
\]
Then $F$ is subexponential.
Proof. Assume that $X_1$ and $X_2$ are two i.i.d. random variables with regularly varying survival functions with parameter $\alpha\in(0,\infty)$. Note that we have for all $\varepsilon\in(0,1)$
\[
\{X_1+X_2>x\} \subseteq \{X_1>(1-\varepsilon)x\}\cup\{X_2>(1-\varepsilon)x\}\cup\{X_1>\varepsilon x,\,X_2>\varepsilon x\}.
\]
The i.i.d. property and regular variation imply (the last event has a tail of smaller order)
\[
\limsup_{x\to\infty}\frac{1-F^{*2}(x)}{1-F(x)}
\le \inf_{\varepsilon\in(0,1)} 2(1-\varepsilon)^{-\alpha} = 2.
\]
On the other hand we have for any positively supported distribution function $F$, see also (5.9),
\[
\frac{1-F^{*2}(x)}{1-F(x)}
= 1+\frac{F(x)-F^{*2}(x)}{1-F(x)}
= 1+\int_0^x\frac{1-F(x-y)}{1-F(x)}\,dF(y)
\ge 1+\int_0^x dF(y) = 1+F(x),
\]
and therefore
\[
\liminf_{x\to\infty}\frac{1-F^{*2}(x)}{1-F(x)} \ge 2.
\]
This proves subexponentiality. Note that the lower bound holds true for any distribution function supported on $\mathbb{R}_+$. $\Box$
Remarks 5.17. Lemma 5.16 gives the connection to classical extreme value theory. In extreme value theory one distinguishes three different domains of attraction for tail behavior, see Section 3.3 in Embrechts et al. [39]: (i) the Weibull case, which are distribution functions with finite right endpoint of their support; (ii) the Gumbel case, which are light tailed to moderately heavy tailed distribution functions; (iii) the Fréchet case, which are heavy tailed distribution functions. The Fréchet case is exactly characterized by regularly varying survival functions with (tail) index $\alpha\in(0,\infty)$, see Theorem 3.3.7 in Embrechts et al. [39]. This index has already been met in Section 3.2, see formula (3.4). Lemma 5.16 now says that every distribution
function with regularly varying survival function at infinity is subexponential. The following table gives an overview of some frequently used distributions:

    distribution                                subexponential    regularly varying at infinity
    gamma distribution                          no                no
    Weibull distribution (shape parameter < 1)  yes               no
    log-normal distribution                     yes               no
    log-gamma distribution                      yes               yes
    Pareto distribution                         yes               yes
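The defining property $\lim_{x\to\infty}(1-F^{*2}(x))/(1-F(x)) = 2$ can be verified numerically for the Pareto distribution; a sketch using the exact decomposition $\mathbb{P}[X_1+X_2>x] = 2\int_1^{x/2}\bar F(x-y)\,dF(y)+\bar F(x/2)^2$ and a log-spaced trapezoidal rule (grid sizes are ad hoc):

```python
def pareto_sf(x, alpha=2.0):
    # survival function of Pareto(1, alpha)
    return 1.0 if x <= 1.0 else x ** -alpha

def conv2_sf(x, alpha=2.0, n=4000):
    # P[X1 + X2 > x] for i.i.d. Pareto(1, alpha), using the split
    # P[X1+X2 > x] = 2 * int_1^{x/2} sf(x - y) dF(y) + sf(x/2)^2
    # (distinguish which of the two variables exceeds x/2).
    a, b = 1.0, x / 2.0
    total, prev_y, prev_g = 0.0, None, None
    for i in range(n + 1):
        y = a * (b / a) ** (i / n)                                  # log-spaced grid
        g = pareto_sf(x - y, alpha) * alpha * y ** (-alpha - 1.0)   # sf(x-y) f(y)
        if prev_y is not None:
            total += 0.5 * (g + prev_g) * (y - prev_y)              # trapezoid
        prev_y, prev_g = y, g
    return 2.0 * total + pareto_sf(x / 2.0, alpha) ** 2

ratios = []
for x in (100.0, 1000.0, 10000.0):
    ratios.append(conv2_sf(x) / pareto_sf(x))
print(ratios)   # decreases towards 2, the defining property of subexponentiality
```

For the (light tailed) exponential distribution the same ratio equals $1+x$ and diverges, which is why the exponential distribution is not subexponential.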
We apply the Pollaczek-Khinchin formula, see Theorem 5.14, to obtain the following result in the subexponential case.

Theorem (asymptotic ruin probability, subexponential case). Assume the set-up of Theorem 5.14 with $\gamma=\psi(0)\in(0,1)$ and subexponential ladder height distribution $H$. Then
\[
\lim_{c_0\to\infty}\frac{\psi(c_0)}{1-H(c_0)} = \frac{\gamma}{1-\gamma}.
\]
Proof. Our aim is to apply Lemma 5.15 to the Pollaczek-Khinchin formula. The latter provides
\[
\lim_{c_0\to\infty}\frac{\psi(c_0)}{1-H(c_0)}
= (1-\gamma)\lim_{c_0\to\infty}\sum_{k\in\mathbb{N}}\gamma^k\,\frac{1-H^{*k}(c_0)}{1-H(c_0)}.
\]
Choose $\varepsilon>0$ so small that $\gamma(1+\varepsilon)<1$. The third statement of Lemma 5.15 provides a constant $D<\infty$ such that
\[
\sum_{k\in\mathbb{N}}\gamma^k\,\frac{1-H^{*k}(c_0)}{1-H(c_0)}
\le \sum_{k\in\mathbb{N}}\gamma^k\,D(1+\varepsilon)^k
= D\sum_{k\in\mathbb{N}}\left(\gamma(1+\varepsilon)\right)^k < \infty,
\]
because $\gamma(1+\varepsilon)<1$. Thus, we have found a uniform integrable upper bound and this allows us to exchange the two limits. This provides
\[
\lim_{c_0\to\infty}\frac{\psi(c_0)}{1-H(c_0)}
= (1-\gamma)\sum_{k\in\mathbb{N}}\gamma^k\lim_{c_0\to\infty}\frac{1-H^{*k}(c_0)}{1-H(c_0)}
= (1-\gamma)\sum_{k\in\mathbb{N}}\gamma^k\,k.
\]
The last term is the expected value of the geometric distribution, which is given by $\gamma/(1-\gamma)$. This proves the theorem. $\Box$
Example (Pareto claim sizes). Assume $S_1\sim\text{Pareto}(\theta,\alpha)$ with threshold $\theta>0$ and tail index $\alpha>1$, i.e. $\mathbb{P}[S_1>y]=(y/\theta)^{-\alpha}$ for $y\ge\theta$, and thus $\mathbb{E}[S_1]=\theta\alpha/(\alpha-1)$. Then (5.6) provides for $x\ge\theta$
\[
1-H(x)
= \mathbb{E}[S_1]^{-1}\int_x^\infty\mathbb{P}[S_1>y]\,dy
= \frac{\alpha-1}{\theta\alpha}\int_x^\infty\left(\frac{y}{\theta}\right)^{-\alpha}dy
= \frac{1}{\alpha}\left(\frac{x}{\theta}\right)^{-\alpha+1}.
\]
This survival function is regularly varying at infinity with index $\alpha-1>0$, so $H$ is subexponential by Lemma 5.16, and the previous theorem provides
\[
\lim_{c_0\to\infty}\psi(c_0)\,\alpha\left(\frac{c_0}{\theta}\right)^{\alpha-1}
= \frac{\gamma}{1-\gamma}.
\]
That is, we have found in the Pareto (subexponential) case for $\alpha>1$
\[
\psi(c_0) \sim \frac{\gamma}{(1-\gamma)\,\alpha}\left(\frac{c_0}{\theta}\right)^{-\alpha+1}
\qquad\text{as } c_0\to\infty.
\]
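Assuming the normalization $\gamma/((1-\gamma)\alpha)$ derived above, the polynomial decay can be tabulated directly; a small sketch (all parameter values are illustrative, not from the text):

```python
def psi_pareto_asymptotic(c0, gamma, alpha, theta):
    # psi(c0) ~ gamma / ((1 - gamma) * alpha) * (c0 / theta)^(1 - alpha)
    return gamma / (1.0 - gamma) / alpha * (c0 / theta) ** (1.0 - alpha)

gamma, alpha, theta = 0.8, 2.5, 1.0
p1 = psi_pareto_asymptotic(100.0, gamma, alpha, theta)
p2 = psi_pareto_asymptotic(1000.0, gamma, alpha, theta)
print(p1, p2, p1 / p2)   # tenfold capital divides psi by 10^(alpha-1), about 31.6
```

Contrast this with the Lundberg bound: there, multiplying the initial capital by 10 multiplies the exponent of the bound by 10.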
The most general version of asymptotic ruin behavior in the subexponential case goes back to Paul Embrechts and Noël Veraverbeke [41]. However, an important missing piece in the argumentation was provided by Charles M. Goldie. The Pareto case has previously been solved by Bengt von Bahr [8].
Chapter 6

Premium Calculation Principles
From the random walk Theorem 5.4 and from Assumption 5.6 we see that we need to charge an (expected) premium that exceeds the expected claim amount $\mathbb{E}[S_t]$; otherwise there is ultimate ruin, $\mathbb{P}$-a.s. This is referred to as the net profit condition (NPC). In the present chapter we assume that the premium $\pi_t$ is deterministic; then (NPC) reads as $\pi_t>\mathbb{E}[S_t]$. For simplicity (because we consider a fixed accounting year in this chapter) we drop the time index $t$, and then (NPC) is given by
\[
\pi > \mathbb{E}[S].
\qquad (6.1)
\]
In this chapter we pursue two aims:

- we justify why the insurance company can charge a premium that exceeds the average claim amount $\mathbb{E}[S]$, i.e. why the insured is willing to pay a premium that exceeds his expected claim amount $\mathbb{E}[S]$; and
- we give different pricing principles to calculate premium loadings $\pi-\mathbb{E}[S]>0$.
Simple solution (expected value principle). Choose a fixed constant $\alpha>0$ and charge (to everyone) the premium
\[
\pi = (1+\alpha)\,\mathbb{E}[S].
\qquad (6.2)
\]
Example 6.1 (expected value principle). We consider two different portfolios with claims $S_1$ and $S_2$ having the same mean $\mathbb{E}[S_1]=\mathbb{E}[S_2]$. Under the previous simple solution both insured pay the same insurance premium
\[
\pi = (1+\alpha)\,\mathbb{E}[S_1] = (1+\alpha)\,\mathbb{E}[S_2] > \mathbb{E}[S_2] = \mathbb{E}[S_1].
\]
We give an explicit distributional example.
Conclusion. The premium loading should be risk-based! That is, the loading $\pi-\mathbb{E}[S]>0$ should reflect the risk of fluctuations of $S$ around its mean $\mathbb{E}[S]$.

6.1 Simple risk-based loading principles
The first notion of risk is usually described by the variance of a random variable. Therefore, we assume in this section that the second moment of $S$ exists.

Variance loading principle. Choose a fixed constant $\alpha>0$ and define the insurance premium by
\[
\pi = \mathbb{E}[S] + \alpha\,\mathrm{Var}(S).
\]
Note that the variance does not scale linearly, e.g. for a (deterministic) exchange rate $r_{\mathrm{fx}}>0$ we have $\mathrm{Var}(r_{\mathrm{fx}}S)=r_{\mathrm{fx}}^2\,\mathrm{Var}(S)$. This non-linearity of the variance implies that the premium cannot easily be scaled with exchange rates and inflation indexes. Therefore, one often studies modifications of the variance principle, which brings us to the next principle.

Standard deviation loading principle. Choose a fixed constant $\alpha>0$ and define the insurance premium by
\[
\pi = \mathbb{E}[S] + \alpha\,\mathrm{Var}(S)^{1/2} = \mathbb{E}[S]\left(1+\alpha\,\mathrm{Vco}(S)\right),
\]
with coefficient of variation $\mathrm{Vco}(S)=\mathrm{Var}(S)^{1/2}/\mathbb{E}[S]$.
For a risky position $S_1$ and a deterministic claim $S_2$ with $\mathbb{E}[S_1]=\mathbb{E}[S_2]$ we then obtain
\[
\pi_1 = \mathbb{E}[S_1]+\alpha\,\mathrm{Var}(S_1)^{1/2}
> \mathbb{E}[S_1] = \mathbb{E}[S_2]+\alpha\,\mathrm{Var}(S_2)^{1/2} = \pi_2.
\]
For the risky position $S_1$ we charge a premium that strictly exceeds the expected claim, and the loading is zero for the deterministic claim $S_2$. The standard deviation loading principle is usually better understood than the variance loading principle because practitioners often have a good feeling for appropriate ranges of the coefficient of variation. For instance, they know that for certain lines of business it should be around 10%. Moreover, this principle is invariant under changes of currencies. Assume that $r_{\mathrm{fx}}>0$ is again the (deterministic) exchange rate between two different currencies. Then we obtain the identity
\[
\mathbb{E}[r_{\mathrm{fx}}S]+\alpha\,\mathrm{Var}(r_{\mathrm{fx}}S)^{1/2}
= r_{\mathrm{fx}}\left(\mathbb{E}[S]+\alpha\,\mathrm{Var}(S)^{1/2}\right)
= r_{\mathrm{fx}}\,\pi.
\]
tes
The previous examples consider rather simple premium loading principles and there
are more principles of this type such as the modified variance principle. In the next
section we describe more sophisticated principles which are motivated by economic
behavior of financial agents and give risk measurement and risk management perspectives. These more advanced principles try to describe decision making and
include:
Exercise 13 (car fleet). A car fleet consists of three sub-portfolios with volumes $v_i$, claims frequencies $\lambda_i$, expected claim sizes $\mathbb{E}[Y_1^{(i)}]$ and coefficients of variation $\mathrm{Vco}(Y_1^{(i)})$ given by:

    sub-portfolio i    v_i    lambda_i    E[Y_1^(i)]    Vco(Y_1^(i))
    1                  40     25%         2000          2.5
    2                  30     23%         1700          2.0
    3                  10     19%         4000          3.0

Assume that the car fleet can be modeled by a compound Poisson distribution.

1. Calculate the expected claim amount of the car fleet.
2. Calculate the premium for the car fleet using the variance loading principle with $\alpha = 3\cdot 10^{-6}$.
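One possible way to organize the compound Poisson bookkeeping behind such a computation (a sketch, not the official solution; it assumes that $v_i\lambda_i$ is the expected number of claims of sub-portfolio $i$ and that the sub-portfolios are independent):

```python
# (v_i, lam_i, E[Y^(i)], Vco(Y^(i))) per sub-portfolio
fleet = [
    (40, 0.25, 2000.0, 2.5),
    (30, 0.23, 1700.0, 2.0),
    (10, 0.19, 4000.0, 3.0),
]

# expected claim amount: sum over sub-portfolios of v_i * lam_i * E[Y]
mean = sum(v * lam * ey for v, lam, ey, _ in fleet)

# compound Poisson variance: Var(S_i) = v_i lam_i E[Y^2]
#                                     = v_i lam_i E[Y]^2 (1 + Vco(Y)^2)
var = sum(v * lam * ey ** 2 * (1 + vco ** 2) for v, lam, ey, vco in fleet)

alpha = 3e-6                       # variance loading parameter
premium = mean + alpha * var       # variance loading principle
print(mean, var, premium)
```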
6.2 Advanced premium calculation principles

In this section we consider more advanced principles for the calculation of premium loadings. These considerations can also be viewed as an introduction to economic decision making, risk measurement and risk management.
6.2.1 Utility theory pricing principles
Utility theory aims at modeling the happiness index of financial agents making economic decisions. That is, for a financial agent holding a position $X$, we try to evaluate an index that quantifies his happiness generated by this position $X$.

Utility theory can be introduced in a rather general framework using preference ordering. If this system of preference ordering is sufficiently regular then there exists a so-called numerical representation for the preference ordering; for details we refer to the book of Föllmer-Schied [47].

We always start from the latter and assume that there exists a John von Neumann (1903-1957) and Oskar Morgenstern (1902-1977) representation for the preference ordering on a given set
\[
\mathcal{X} \subset L^1(\Omega,\mathcal{F},\mathbb{P}).
\]
Figure 6.1: lhs: exponential utility function with $\alpha=0.05$ and $I=\mathbb{R}$, see (6.6); rhs: power utility function with $\gamma\in\{0.5, 1, 1.5\}$ and $I=\mathbb{R}_+$, see (6.7).
This representation is given by a utility function $u:I\to\mathbb{R}$ on an interval $I\subseteq\mathbb{R}$ which is assumed to be strictly increasing and strictly concave, i.e. (for twice differentiable $u$)
\[
u' > 0 \qquad\text{and}\qquad u'' < 0
\]
on $I$, respectively. For $X,Y\in\mathcal{X}$ the preference ordering is represented as
\[
X \succ Y \qquad\Longleftrightarrow\qquad \mathbb{E}[u(X)] > \mathbb{E}[u(Y)],
\qquad (6.3)
\]
i.e. we strictly prefer $X$ over $Y$. In this context, $X$ always has the interpretation of a gain, and if the gain of position $X$ dominates the gain of position $Y$ (in the above sense) we have strict preference $X\succ Y$. We conclude: $u$ introduces a preference ordering on $\mathcal{X}$ where positive outcomes of $X\in\mathcal{X}$ describe gains and negative outcomes losses.

Strict concavity property. Strict concavity implies that we can apply Jensen's inequality, which provides for all $X\in\mathcal{X}$
\[
\mathbb{E}[u(X)] \le u\left(\mathbb{E}[X]\right),
\qquad (6.4)
\]
with strict inequality for non-deterministic $X$.
This latter property is exactly the argument why policyholders are willing to pay an insurance premium that exceeds their average claim amount $\mathbb{E}[Y]$, and hence finance (NPC). Assume that a policyholder has (deterministic) initial wealth $c_0$ and he faces a risk that may reduce his wealth by (the random amount) $Y$. Hence, he holds a risky position $X=c_0-Y$ and his happiness index of this position is given by $\mathbb{E}[u(c_0-Y)]$ if $u$ describes the (risk-averse) utility function of this policyholder. The strict concavity and increasing properties now imply the following preference ordering (for non-deterministic $Y$)
\[
\mathbb{E}[u(c_0-Y)] < u\left(c_0-\mathbb{E}[Y]\right).
\]
The left-hand side describes the present happiness and the right-hand side describes the happiness that he would achieve if he could exchange $Y$ for $\mathbb{E}[Y]$. Therefore, any deterministic premium $\pi>\mathbb{E}[Y]$ such that
\[
\mathbb{E}[u(c_0-Y)] \le u(c_0-\pi)
\]
would make him more happy than his current position $c_0-Y$. Thus, strict concavity and the increasing property of $u$ imply that he is willing to pay any premium $\pi$ in the (non-empty) interval
\[
\left(\mathbb{E}[Y],\; c_0-u^{-1}\left(\mathbb{E}[u(c_0-Y)]\right)\right],
\qquad (6.5)
\]
to improve his happiness position. The lower bound of this interval is the (NPC) and the upper bound is the maximal price that the policyholder will just tolerate according to his risk-averse utility function $u$ (this bound may also be infinite). The less risk-averse he is, the narrower the interval will get. The extreme case of risk-neutrality, which corresponds to the linear function $u(x)=x$, will just provide that the upper bound is equal to the lower bound in (6.5), and no insurance is necessary.
The two most popular utility functions are, see also Figure 6.1:

- exponential utility function, constant absolute risk-aversion (CARA) utility function: for $\alpha>0$ (defined on $I=\mathbb{R}$)
\[
u(x) = \frac{1}{\alpha}\left(1-\exp\{-\alpha x\}\right);
\qquad (6.6)
\]
- power utility function, constant relative risk-aversion (CRRA) utility function, isoelastic utility function: for $\gamma>0$ (defined on $I=\mathbb{R}_+$)
\[
u(x) = \begin{cases} \dfrac{x^{1-\gamma}}{1-\gamma} & \text{for } \gamma\ne 1,\\[1ex] \log x & \text{for } \gamma=1. \end{cases}
\qquad (6.7)
\]
Example 6.3 (exponential utility function). Assume that the policyholder has exponential utility function (6.6), he has initial wealth $c_0$ and he faces a risky position $Y\in L^1(\Omega,\mathcal{F},\mathbb{P})$ with $\mathrm{Var}(Y)>0$ and $Y\ge 0$, $\mathbb{P}$-a.s. This implies that the expected claim is given by $\mathbb{E}[Y]>0$. The exponential utility function has the following properties
\[
u'(x) = \exp\{-\alpha x\} > 0
\qquad\text{and}\qquad
u''(x) = -\alpha\exp\{-\alpha x\} < 0.
\]
Therefore, it is strictly increasing and concave on $\mathbb{R}$, see Figure 6.1 (lhs). Its inverse is given by
\[
u^{-1}(y) = -\frac{1}{\alpha}\log\left(1-\alpha y\right).
\]
This implies that acceptable premia lie in the non-empty interval, see (6.5),
\[
\left(\mathbb{E}[Y],\; \frac{1}{\alpha}\log\mathbb{E}\left[\exp\{\alpha Y\}\right]\right],
\]
where the upper bound is infinite if the moment generating function of $Y$ does not exist in $\alpha$. The important observation in this example is that the price tolerance in $\pi$ does not depend on the initial wealth $c_0$ of the policyholder. We will see that this property uniquely holds true for the exponential utility function, and we may ask the question how realistic this property is in real world decision making.
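The interval $(\mathbb{E}[Y],\,\frac{1}{\alpha}\log\mathbb{E}[\exp\{\alpha Y\}]]$ can be evaluated explicitly whenever the moment generating function is known. A sketch for a hypothetical gamma-distributed claim $Y\sim\Gamma(k,\beta)$ with $M_Y(r)=(\beta/(\beta-r))^k$ for $r<\beta$ (all numbers are illustrative assumptions):

```python
import math

def exp_utility_upper_bound(alpha, k, beta):
    # upper bound (1/alpha) log E[exp(alpha Y)] for Y ~ Gamma(k, beta), alpha < beta
    assert alpha < beta, "moment generating function must exist in alpha"
    return k / alpha * math.log(beta / (beta - alpha))

k, beta = 2.0, 0.002               # E[Y] = k / beta = 1000
bounds = []
for alpha in (1e-4, 5e-4, 1e-3):
    bounds.append(exp_utility_upper_bound(alpha, k, beta))
print(k / beta, bounds)            # all bounds exceed E[Y]; they grow with alpha
```

The growth of the upper bound in $\alpha$ reflects increasing risk aversion: a more risk-averse policyholder tolerates a larger maximal premium.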
Example 6.4 (power utility function). Assume that the policyholder has power utility function (6.7), he has initial wealth $c_0>1$ and he faces a risky position $Y\sim\text{Bernoulli}(p=1/2)$. This implies that the expected claim is given by $\mathbb{E}[Y]=1/2$. The power utility function has the following properties
\[
u'(x) = x^{-\gamma} > 0
\qquad\text{and}\qquad
u''(x) = -\gamma\,x^{-\gamma-1} < 0.
\]
For $\gamma=1$, i.e. $u(x)=\log x$ with $u^{-1}(y)=e^y$, the upper bound of the interval (6.5) equals
\[
\pi = c_0 - \exp\left(\tfrac12\log c_0+\tfrac12\log(c_0-1)\right)
= c_0 - \sqrt{c_0(c_0-1)} =: b(c_0).
\]
This implies that any possible premium lies in the non-empty interval, see (6.5),
\[
\left(\frac12,\; c_0-\sqrt{c_0(c_0-1)}\right].
\]
The important observation in this example is that the price tolerance in $\pi$ depends on the initial wealth $c_0>1$ of the policyholder.
For the upper bound $b(c_0)$ we have
\[
\lim_{c_0\downarrow 1} b(c_0) = 1
\qquad\text{and}\qquad
\lim_{c_0\to\infty} b(c_0) = \frac12,
\]
and
\[
b'(c_0) = 1-\frac{2c_0-1}{2\sqrt{c_0(c_0-1)}} < 0,
\]
because $c_0(c_0-1) < c_0^2-c_0+\frac14 = \left(c_0-\frac12\right)^2$. This shows that we have strict monotonicity in the initial capital $c_0>1$, i.e. the richer the policyholder the narrower the price tolerance interval (6.5), see also Example 6.14, below.
Definition 6.5 (utility indifference price). The utility indifference price $\pi=\pi(u,F_S,c_0)\in\mathbb{R}$ for utility function $u$, initial capital $c_0\in I$ and risky position $S\sim F_S$ is given by the solution of (subject to existence)
\[
u(c_0) = \mathbb{E}\left[u(c_0+\pi-S)\right].
\]
Of course, $\pi$ and $S$ need to be such that $c_0+\pi-S\in I$, $\mathbb{P}$-a.s. This may give rise to restrictions on the range of $S$ if $I$ is a bounded interval, see also Example 6.4. Note that if the utility indifference price $\pi$ exists, it is unique. This follows from the strict monotonicity of $u$.
The utility indifference price given in Definition 6.5 gives the insurance company's point of view. It is assumed that the insurance company has initial capital $c_0\in I$, similar to the surplus process given in Definition 5.1. It will then only accept an insurance contract $S$ at price $\pi$ if the resulting utility does not decrease, i.e. if it is indifferent between accepting $S$ at price $\pi$ and not selling such a contract.

Jensen's inequality and the strict increasing property of $u$ immediately provide the following corollary.

Corollary 6.6. The utility indifference price $\pi=\pi(u,F_S,c_0)$ for initial capital $c_0$, risk-averse utility function $u$ and risky position $S\sim F_S$ satisfies
\[
\pi = \pi(u,F_S,c_0) > \mathbb{E}[S].
\]
Proof. Exercise. $\Box$
Remarks. In Example 6.3 we have seen that for the exponential utility function the price tolerance does not depend on the initial wealth $c_0$. The next proposition shows that this property characterizes the exponential utility function.

Proposition 6.8. The utility indifference price $\pi(u,F_S,c_0)$ of Definition 6.5 does not depend on the initial capital $c_0$ (for all risky positions $S$) if and only if
\[
u(x) = a - b\,\exp\{-cx\},
\qquad\text{for } a\in\mathbb{R} \text{ and } b,c>0.
\]
Remark. Note that the utility function $u(x)=a-b\,\exp\{-cx\}$ gives the same preference ordering as the exponential utility function (6.6) with $c=\alpha$: if we have two different utility functions $u(\cdot)$ and $v(\cdot)$ with $v=a+b\,u$ for $a\in\mathbb{R}$ and $b\in\mathbb{R}_+$ (positive affine transformation) then they generate the same preference ordering.
Proof of Proposition 6.8. Note that the assumption $u\in C^2$ is not necessary because concavity implies that $u$ is differentiable almost everywhere, and this is sufficient to prove the result; for details on this we refer to Lemma 1.8 in Schmidli [91].

Direction $\Leftarrow$ is immediately clear just by evaluating Definition 6.5. So we prove direction $\Rightarrow$. The following proof is borrowed from Schmidli [91]. Choose $S\sim\text{Bernoulli}(p)$. Definition 6.5 implies for this Bernoulli claim $S$ the identity
\[
u(c_0) = p\,u(c_0+\pi-1)+(1-p)\,u(c_0+\pi),
\qquad (6.8)
\]
for utility indifference price $\pi=\pi(p)=\pi(u,p,c_0)$ depending on $p\in(0,1)$ only. We now consider the derivatives w.r.t. $c_0$ and $p$. The former provides
\[
u'(c_0) = \left[p\,u'(c_0+\pi-1)+(1-p)\,u'(c_0+\pi)\right]\frac{\partial}{\partial c_0}(c_0+\pi)
= p\,u'(c_0+\pi-1)+(1-p)\,u'(c_0+\pi),
\]
where in the last step we have used the assumption that the premium $\pi$ does not depend on $c_0$. The derivative w.r.t. $p$ is given by (the implicit function theorem provides existence of the derivative of $\pi$ w.r.t. $p$, denoted by $\pi'(p)$)
\[
0 = u(c_0+\pi-1)-u(c_0+\pi)+\left[p\,u'(c_0+\pi-1)+(1-p)\,u'(c_0+\pi)\right]\pi'(p),
\]
and hence, using the previous identity,
\[
u'(c_0)\,\pi'(p) = u(c_0+\pi)-u(c_0+\pi-1) > 0.
\]
The strict increasing property of $u$ implies that $\pi'(p)>0$. Next we calculate the derivatives of the latter identity w.r.t. $c_0$ and $p$ (again using the implicit function theorem for the latter). This provides the two identities
\[
u''(c_0)\,\pi'(p) = u'(c_0+\pi)-u'(c_0+\pi-1),
\]
and
\[
u'(c_0)\,\pi''(p) = \left[u'(c_0+\pi)-u'(c_0+\pi-1)\right]\pi'(p).
\]
Merging these identities implies
\[
\frac{u''(c_0)}{u'(c_0)} = \frac{\pi''(p)}{(\pi'(p))^2} = -c < 0,
\]
for some constant $c>0$. The last identity follows because the left-hand side is independent of $p$ and the middle term is independent of $c_0$. This last identity is a differential equation for the utility function $u$ whose solution is exactly given by the exponential function.
$\Box$
The proof of Proposition 6.8 provides insights into risk-aversion. Define the absolute and the relative risk-aversion of a twice differentiable utility function $u$ by
\[
\mathrm{ARA}(x) = -\frac{u''(x)}{u'(x)}
\qquad\text{and}\qquad
\mathrm{RRA}(x) = -x\,\frac{u''(x)}{u'(x)}.
\]
Example 6.9 (exponential utility function). The exponential utility function (6.6) with risk-aversion parameter $\alpha>0$ satisfies for all $x\in\mathbb{R}$
\[
\mathrm{ARA}(x) = \alpha.
\]
Example 6.10 (power utility function). The power utility function (6.7) with risk-aversion parameter $\gamma>0$ satisfies for all $x\in\mathbb{R}_+$
\[
\mathrm{RRA}(x) = \gamma.
\]
Assume that $u$ and $v$ are two utility functions that are defined on the same interval $I$. Then $u$ is called more risk-averse than $v$ on $I$ if for any $X$ with range in $I$ we have
\[
u^{-1}\left(\mathbb{E}[u(X)]\right) \le v^{-1}\left(\mathbb{E}[v(X)]\right).
\]
Proposition 6.11. Assume that $u,v\in C^2(I)$ are two utility functions defined on the same interval $I\subseteq\mathbb{R}$. The following are equivalent:

- $u$ is more risk-averse than $v$ on $I$;
- $\mathrm{ARA}_u(x) \ge \mathrm{ARA}_v(x)$ for all $x\in I$.
Proof. We first prove direction $\Rightarrow$. The proof goes by contradiction. Assume that the claim does not hold true. Due to the twice continuous differentiability of the utility functions on $I$ there exists a non-empty open interval $O\subseteq I$ such that
\[
\mathrm{ARA}_u(x) = -\frac{u''(x)}{u'(x)} < -\frac{v''(x)}{v'(x)} = \mathrm{ARA}_v(x)
\qquad\text{for all } x\in O.
\]
We consider the function $u(v^{-1}(\cdot))$ on the non-empty open interval $v(O)$ (note that $v$ is continuous and strictly increasing). We calculate
\[
\frac{d}{dz}u(v^{-1}(z)) = u'(v^{-1}(z))\,\frac{d}{dz}v^{-1}(z) = \frac{u'(v^{-1}(z))}{v'(v^{-1}(z))} > 0,
\]
because both $u$ and $v$ are strictly increasing, and
\[
\frac{d^2}{dz^2}u(v^{-1}(z))
= \frac{u''(v^{-1}(z))}{(v'(v^{-1}(z)))^2}-\frac{u'(v^{-1}(z))\,v''(v^{-1}(z))}{(v'(v^{-1}(z)))^3}
= \frac{u'(v^{-1}(z))}{(v'(v^{-1}(z)))^2}\left[\frac{u''(v^{-1}(z))}{u'(v^{-1}(z))}-\frac{v''(v^{-1}(z))}{v'(v^{-1}(z))}\right] > 0
\qquad\text{on } v(O).
\]
This implies that $u(v^{-1}(\cdot))$ is a risk-seeking (strictly convex) function on the non-empty interval $v(O)$. Choose a non-deterministic random variable $Y$ such that $Y\in O$, $\mathbb{P}$-a.s. Since $O$ is a non-empty open interval such a random variable can be chosen (i.e. no concentration in a single point). This implies that $Z=v(Y)$ is a non-deterministic random variable with range in $v(O)$, and the strict convexity of $u(v^{-1}(\cdot))$ on $v(O)$ implies, using Jensen's inequality,
\[
u^{-1}\left(\mathbb{E}[u(Y)]\right)
= u^{-1}\left(\mathbb{E}\left[u(v^{-1}(v(Y)))\right]\right)
> u^{-1}\left(u\left(v^{-1}\left(\mathbb{E}[v(Y)]\right)\right)\right)
= v^{-1}\left(\mathbb{E}[v(Y)]\right).
\qquad (6.9)
\]
This contradicts the assumption that $u$ is more risk-averse than $v$ on $I$.
The last corollary also explains that the price elasticity interval (6.5) becomes more
narrow for decreasing risk-aversion.
Proof of Theorem 6.13. We start with direction $\Rightarrow$. Calculating the derivative of the defining identity $u(c_0)=\mathbb{E}[u(c_0+\pi(u,F_S,c_0)-S)]$ w.r.t. $c_0$, using that $\pi$ is decreasing in $c_0$, we obtain
\[
u'(c_0) = \mathbb{E}\left[u'(c_0+\pi(u,F_S,c_0)-S)\right]\frac{\partial}{\partial c_0}\left(c_0+\pi(c_0)\right)
\le \mathbb{E}\left[u'(c_0+\pi(u,F_S,c_0)-S)\right].
\]
Setting $v=-u'$, this reads as
\[
v(c_0) \ge \mathbb{E}\left[v(c_0+\pi(u,F_S,c_0)-S)\right],
\qquad\text{i.e.}\qquad
c_0 \ge v^{-1}\left(\mathbb{E}\left[v(c_0+\pi(u,F_S,c_0)-S)\right]\right),
\]
whereas Definition 6.5 provides $c_0=u^{-1}(\mathbb{E}[u(c_0+\pi(u,F_S,c_0)-S)])$. Since this holds for any $c_0$ and $S$ we obtain that $v$ is more risk-averse than $u$, and Proposition 6.11 implies that $\mathrm{ARA}_v(x)\ge \mathrm{ARA}_u(x)$ for all $x\in I$. From this we obtain
\[
-\frac{u'''}{u''} = -\frac{v''}{v'} \ge -\frac{u''}{u'} \ge 0,
\]
and therefore
\[
\frac{d}{dx}\mathrm{ARA}_u(x)
= \frac{d}{dx}\left(-\frac{u''}{u'}\right)
= -\frac{u'''}{u'}+\frac{(u'')^2}{(u')^2}
= \frac{u''}{u'}\left(\frac{u''}{u'}-\frac{u'''}{u''}\right) \le 0.
\]
This proves the first direction of the equivalence. The proof of direction $\Leftarrow$ is received by just reading the above proof in the other direction (all the statements are equivalences).
$\Box$
Example 6.14 (power utility function). The power utility function (6.7) with risk-aversion parameter $\gamma>0$ satisfies for all $x\in\mathbb{R}_+$
\[
\mathrm{ARA}(x) = \gamma\,x^{-1},
\]
which is strictly decreasing in $x$.
Exercise 16. Choose the car fleet example from Exercise 13 on page 143. Assume
that this car fleet can be modeled by an appropriate compound Poisson distribution
having gamma claim sizes.
1. Calculate the expected claim amount of the car fleet.
2. Calculate the premium for the car fleet using the utility indifference price principle for the exponential utility function with parameter $\alpha = 1.5\cdot 10^{-6}$.
6.2.2 Esscher premium
Classical actuarial practice calculates premium loadings by giving more weight to bad events compared to good events. Basically, this means that one does a change of measure towards a less favorable probability measure. Hans Bühlmann [19] introduced this idea in the actuarial literature by constructing the Esscher measure.

Define for $\alpha>0$ the Esscher (probability) distribution $F_\alpha$ of $F$ as follows:
\[
F_\alpha(s) = \frac{1}{M_S(\alpha)}\int_{-\infty}^s e^{\alpha x}\,dF(x),
\]
under the additional assumption that the moment generating function $M_S(\alpha)$ of $S$ exists in $\alpha$. Note that this defines a (normalized) distribution function $F_\alpha$.

Definition 6.15 (Esscher premium). Choose $S\sim F$ and assume that there exists $r_0>0$ such that $M_S(r)<\infty$ for all $r\in(-r_0,r_0)$. The Esscher premium of $S$ in $\alpha\in(0,r_0)$ is defined by
\[
\pi_\alpha = \mathbb{E}_\alpha[S] = \int_{-\infty}^\infty s\,dF_\alpha(s).
\]
The Esscher premium can be rewritten as
\[
\pi_\alpha = \frac{M_S'(\alpha)}{M_S(\alpha)}
= \frac{d}{dr}\log M_S(r)\Big|_{r=\alpha}
\ge \mathbb{E}[S],
\]
where the inequality follows from the convexity of $r\mapsto\log M_S(r)$ together with $\frac{d}{dr}\log M_S(r)|_{r=0}=\mathbb{E}[S]$.
Example 6.17 (Esscher premium for Gaussian distributions). Choose $\alpha>0$ and assume that $S\sim\mathcal{N}(\mu,\sigma^2)$. Then we have
\[
\pi_\alpha = \frac{d}{dr}\log M_S(r)\Big|_{r=\alpha}
= \frac{d}{dr}\left(\mu r+\frac{\sigma^2 r^2}{2}\right)\Big|_{r=\alpha}
= \mu+\alpha\sigma^2 > \mu = \mathbb{E}[S].
\]
In the Gaussian case we obtain the variance loading. Thus, the variance loading, the exponential utility function and the Esscher premium principles provide exactly the same insurance premium in the Gaussian case.
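Since the Esscher premium is the derivative of $\log M_S$ at $\alpha$, it can be approximated by a central difference whenever the moment generating function is available; a sketch (the gamma case is an added illustration, not from the text):

```python
import math

def esscher_premium(log_mgf, alpha, h=1e-6):
    # pi_alpha = d/dr log M_S(r) at r = alpha, via central difference
    return (log_mgf(alpha + h) - log_mgf(alpha - h)) / (2.0 * h)

# Gaussian S ~ N(mu, sigma^2): log M(r) = mu r + sigma^2 r^2 / 2,
# so the Esscher premium equals mu + alpha sigma^2 (variance loading).
mu, sigma, alpha = 100.0, 20.0, 0.01
premium_gauss = esscher_premium(lambda r: mu * r + 0.5 * sigma ** 2 * r ** 2, alpha)

# Gamma S ~ Gamma(k, beta): log M(r) = -k log(1 - r/beta) for r < beta,
# so the Esscher premium is k / (beta - alpha) > E[S] = k / beta.
k, beta = 2.0, 0.02
premium_gamma = esscher_premium(lambda r: -k * math.log(1.0 - r / beta), alpha)
print(premium_gauss, premium_gamma, k / (beta - alpha))
```

Note how the gamma premium blows up as $\alpha\uparrow\beta$, a mild preview of the sensitivity to heavy tails discussed below.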
Conclusions.

- The Esscher premium can easily be calculated from the moment generating function $M_S(r)$.
- The Esscher premium can only be calculated for light tailed claims, see also Section 5.2 on the Lundberg coefficient. Towards all more heavy tailed claims the Esscher premium reacts so sensitively that it becomes infinite. In the next section we study probability distortion principles that allow for more heavy tailed distributions in premium calculations, still leading to finite premia.
- In classical economic theory, prices are often derived by the assumption of market clearing in a risk exchange economy. That is, if we assume that we have (i) an economy with risky positions $S_1,\dots,S_K$; (ii) market participants who have an exponential utility function with risk aversion parameters $\alpha_i>0$; and (iii) market clearing in the sense that all risky positions are allocated to the market participants, then one can prove that the risky positions are exactly priced with the Esscher measure of the aggregate market capitalization. This is in the spirit of Bühlmann [19] and is, for instance, found in Tsanakas-Christofides [96].
6.2.3 Probability distortion pricing principles
In the previous section we have met a pricing principle that was based on probability distortions. In this first case it was only possible to calculate insurance prices for light tailed claims because the distortion reacted very sensitively to heavy tails. In the present section we look at probability distortions from a different angle which will allow for more flexibility. Assume that $S\sim F$ with $S\ge 0$, $\mathbb{P}$-a.s. Then using integration by parts the expected claim is calculated as
\[
\mathbb{E}[S] = \int_0^\infty x\,dF(x) = \int_0^\infty\mathbb{P}[S>x]\,dx.
\]
In this section we directly distort the survival function $\bar F(x)=\mathbb{P}[S>x]$. Therefore, we introduce a distortion function $h:[0,1]\to[0,1]$ which is a continuous, increasing and concave function with $h(0)=0$ and $h(1)=1$; in Figure 6.2 we give two examples.
Figure 6.2: Distortion functions $h$ of Examples 6.19 and 6.20, below, with $\gamma=1/2$ and $q=0.1$, respectively.
- $h(p)$ distorts the probability $p$ with the property that $h(p)\ge p$ for all $p\in[0,1]$, because $h$ is increasing and concave with $h(0)=0$ and $h(1)=1$.
- The concavity of $h$ reflects risk aversion, similar to the utility functions used in Section 6.2.1.
- Note that the existence of $p\in(0,1)$ with $h(p)>p$ implies that $h(p)>p$ for all $p\in(0,1)$. Therefore, we assume under strict risk-aversion that $h(p)>p$ for all $p\in(0,1)$.
Definition 6.18. Assume that $h:[0,1]\to[0,1]$ is a continuous, increasing and concave function with $h(0)=0$, $h(1)=1$ and $h(p)>p$ for all $p\in(0,1)$. The probability distorted price $\pi_h$ of $S\ge 0$ is defined by (subject to existence)
\[
\pi_h = \mathbb{E}_h[S] = \int_0^\infty h\left(\mathbb{P}[S>x]\right)dx.
\]
Note that
\[
\pi_h = \int_0^\infty h\left(\mathbb{P}[S>x]\right)dx
> \int_0^\infty\mathbb{P}[S>x]\,dx = \mathbb{E}[S].
\]
Similar to the Esscher premium, we modify the probability distribution function of the claims $S$ (in contrast to the utility theory approach, where we modify the claim sizes).
Example 6.19 (power distortion function). Choose the power distortion function
\[
h(x) = x^\gamma \qquad\text{for } \gamma\in(0,1).
\qquad (6.10)
\]
Assume $S\sim\text{Pareto}(1,\alpha)$ with $\alpha>1$, i.e. $\mathbb{P}[S>x]=x^{-\alpha}$ for $x\ge 1$. Then we obtain for $\gamma\in(1/\alpha,1)$
\[
\pi_h = \int_0^\infty h\left(\mathbb{P}[S>x]\right)dx
= \int_0^1 1\,dx+\int_1^\infty x^{-\alpha\gamma}\,dx
= 1+\frac{1}{\alpha\gamma-1} = \frac{\alpha\gamma}{\alpha\gamma-1},
\]
and therefore
\[
\pi_h = \frac{\alpha\gamma}{\alpha\gamma-1} > \frac{\alpha}{\alpha-1} = \mathbb{E}[S]
\qquad\text{for } \gamma\in(1/\alpha,1).
\]
In contrast to the Esscher premium we can calculate the probability distorted premium also for heavy tailed claims, as long as the risk aversion (concavity of $h$) is not too large, i.e. in our case $\gamma\in(1/\alpha,1)$.
Exercise 18. Choose the power distortion function (6.10). Calculate the probability distorted price of S ∼ Γ(1, c) and of S ∼ Bernoulli(p).
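For the Γ(1, c) case, i.e. an exponential claim with survival function e^{−cx}, the power distortion gives h(F̄(x)) = e^{−γcx} and hence π_h = 1/(γc) > 1/c = E[S]. A minimal numerical sketch (the function name and parameter choices are ours) confirms this by direct integration:

```python
import math

def power_distorted_price_exponential(c: float, gamma: float,
                                      upper: float = 50.0, n: int = 200_000) -> float:
    """Evaluate pi_h = int_0^infty h(P[S > x]) dx for S ~ Gamma(1, c), i.e. an
    exponential claim with survival function exp(-c x), under the power
    distortion h(p) = p**gamma, using the trapezoidal rule on [0, upper]."""
    dx = upper / n
    total = 0.5 * (1.0 + math.exp(-gamma * c * upper))  # boundary terms
    for k in range(1, n):
        total += math.exp(-gamma * c * k * dx)          # h(exp(-c x)) = exp(-gamma c x)
    return total * dx

# closed form: pi_h = 1/(gamma*c); with c = 2, gamma = 1/2 this equals 1,
# exceeding the pure risk premium E[S] = 1/c = 0.5
print(power_distorted_price_exponential(c=2.0, gamma=0.5))
```

The Bernoulli(p) case needs no integration at all: π_h = h(p) = p^γ.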
Example 6.20 (expected shortfall). Choose the distortion function

h(x) = x/q for x ≤ q,   and   h(x) = 1 otherwise.   (6.11)

For simplicity we assume that F is continuous and strictly increasing. This simplifies considerations, because then F^{−1} is also continuous and strictly increasing and we have F(F^{−1}(α)) = α and F^{−1}(F(x)) = x, see Chapter 1 (the strictly increasing property of F would not be necessary for getting the full flavor of this example). Consider the survival function of S given by F̄(x) = 1 − F(x) = P[S > x]. Note that under our assumptions
π_h = ∫_0^∞ h(P[S > x]) dx = ∫_{{F̄(x) > q}} 1 dx + (1/q) ∫_{{F̄(x) ≤ q}} F̄(x) dx
    = ∫_{{x < F^{−1}(1−q)}} 1 dx + (1/q) ∫_{{x ≥ F^{−1}(1−q)}} F̄(x) dx
    = (1/q) ∫_{F^{−1}(1−q)}^∞ P[S > x] dx + F^{−1}(1−q).
Note that these identities need more care if F is not strictly increasing. The continuity and strictly increasing property of F also imply

π_h = ∫_{F^{−1}(1−q)}^∞ P[S > x | S > F^{−1}(1−q)] dx + F^{−1}(1−q) = E[S | S > F^{−1}(1−q)].

The latter is exactly the so-called Tail-Value-at-Risk (TVaR) or the conditional tail expectation (CTE) of the random variable S at the 1−q security level. Moreover, F^{−1}(1−q) is the Value-at-Risk (VaR) of the random variable S at the 1−q security level. The continuity of F implies that this TVaR is equal to the expected shortfall (ES) of S at the security level 1−q, that is,

π_h = E[S | S > F^{−1}(1−q)] = (1/q) ∫_{1−q}^1 F^{−1}(u) du = ES_{1−q}(S),
see Artzner et al. [5, 6], Acerbi–Tasche [1] and Lemma 2.16 in McNeil et al. [77]. The proof again uses the fact that for continuous distribution functions F we have F(F^{−1}(α)) = α, and then the left-hand side of the above statement can be obtained by a change of variables from the right-hand side.

We conclude that under continuity assumptions the risk measure ES_{1−q}(S) can be obtained via the probability distortion (6.11), and following Delbaen [32], it is therefore a coherent risk measure, see also the next section.
Exercise 19. Choose the probability distortion (6.11) for q = 1% and calculate the probability distorted price for
S ∼ LN(μ, σ²),
S ∼ Pareto(θ, α) with α > 1.
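The log-normal case of Exercise 19 can be cross-checked numerically. Under the distortion (6.11) the distorted price equals ES_{1−q}(S), which for S ∼ LN(μ, σ²) has the well-known closed form e^{μ+σ²/2} Φ(σ − Φ^{−1}(1−q))/q. A sketch (helper names are ours; `statistics.NormalDist` is the Python standard-library normal distribution) compares it with direct quadrature of (1/q) ∫_{1−q}^1 F^{−1}(u) du:

```python
import math
from statistics import NormalDist

def es_lognormal(mu: float, sigma: float, q: float) -> float:
    """Closed-form ES_{1-q}(S) for S ~ LN(mu, sigma^2):
    exp(mu + sigma^2/2) * Phi(sigma - Phi^{-1}(1-q)) / q."""
    nd = NormalDist()
    return math.exp(mu + sigma ** 2 / 2) * nd.cdf(sigma - nd.inv_cdf(1 - q)) / q

def es_numeric(mu: float, sigma: float, q: float, n: int = 100_000) -> float:
    """Midpoint-rule evaluation of (1/q) * int_{1-q}^1 F^{-1}(u) du,
    with log-normal quantile F^{-1}(u) = exp(mu + sigma * Phi^{-1}(u))."""
    nd = NormalDist()
    du = q / n
    s = sum(math.exp(mu + sigma * nd.inv_cdf(1 - q + (k + 0.5) * du)) for k in range(n))
    return s * du / q

print(es_lognormal(0.0, 1.0, 0.01))  # roughly 15.2 for a standard log-normal claim
```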
6.2.4

We consider risk measures ϱ on a set X of risky positions, given as mappings

X ↦ ϱ(X).
Remarks.

A risk measure ϱ attaches to each (risky) position X a value ϱ(X) ∈ R.

If the risk measure ϱ is the regulatory risk measure, then ϱ(X) ∈ R reflects the necessary risk bearing capital that needs to be available within the insurance company to run business X. This is the minimal equity the insurance company needs to hold to balance possible shortfalls in the insurance portfolio. This is going to be explained in more detail below.
Interpretation.

For outcomes S ≤ E[S]: the claim can be financed by the pure risk premium E[S] alone.

For outcomes S > E[S]: the pure risk premium E[S] is not sufficient and the shortfall S − E[S] > 0 needs to be paid from ϱ(S − E[S]). Thus, the investor's capital ϱ(S − E[S]) is at risk, and he may lose (part of) it. Therefore, he will ask for a cost-of-capital rate

r_CoC > r_0,

if r_0 denotes the risk-free rate (which he receives on a risk-free bank account with the same time to maturity as his investment).
Axioms 6.22 (axioms for risk measures ϱ). Assume ϱ is a risk measure on the convex cone X containing R. Then we define for X, Y ∈ X, c ∈ R and λ > 0:
(a) normalization: ϱ(0) = 0;
(b) monotonicity: for all X ≤ Y, P-a.s., we have ϱ(X) ≤ ϱ(Y);
(c) translation invariance: for all X and c we have ϱ(X + c) = ϱ(X) + c;
(d) positive homogeneity: for all X and for every λ > 0 we have ϱ(λX) = λ ϱ(X);
(e) subadditivity: for all X, Y we have ϱ(X + Y) ≤ ϱ(X) + ϱ(Y).
Observe that some of the axioms imply others; e.g., positive homogeneity implies normalization, since ϱ(0) = ϱ(λ·0) = λ ϱ(0) for all λ > 0, which immediately gives ϱ(0) = 0. For a detailed analysis of such implications we refer to Section 6.1 in McNeil et al. [77] and Section 9.1 in Wüthrich–Merz [101].

For our analysis we require (at least) normalization (a) and translation invariance (c). We briefly comment on this.

Translation invariance. If we hold a risky position X and if we inject capital c > 0, then the loss is reduced to X − c. This implies for the risk measure ϱ that the reduced position satisfies

ϱ(X − c) = ϱ(X) − c.

This justifies the definition of the regulatory risk measure as stated above. Namely, if we sell a risky portfolio S and we collect the pure risk premium E[S], then the risk of the residual loss S − E[S] is given by

ϱ(S − E[S]) = ϱ(S) − E[S].
Normalization and translation invariance. A balance sheet of an insurance company is called acceptable if its (future) surplus C_1 ∈ X satisfies ϱ(−C_1) ≤ 0, see also Wüthrich [98]. Assume that the insurance company sells a policy S at price π ≥ E[S] and at the same time it has initial capital c_0 = ϱ(S − E[S]) ≥ 0. Then the future surplus of the company is given by C_1 = c_0 + π − S. The regulator then checks the acceptability condition, which reads as

ϱ(−C_1) = ϱ(−(c_0 + π − S)) = −c_0 − π + ϱ(S) = −π + E[S] ≤ 0.   (6.12)
Definition 6.23 (coherent risk measure). The risk measure ϱ is called coherent if it satisfies Axioms 6.22.
P. Artzner
Example 6.24 (standard deviation risk measure). Choose

ϱ(X) = β (Var(X))^{1/2}

for a given parameter β > 0. This risk measure is normalized, positively homogeneous and subadditive. But it is neither translation invariant nor monotone. Note that for the standard deviation risk measure the cost-of-capital pricing principle coincides with the standard deviation loading principle presented in Section 6.1.
Example 6.25 (Value-at-Risk, VaR). The VaR of S ∼ F at security level 1 − q ∈ (0, 1) is given by the left-continuous generalized inverse of F at 1 − q, i.e.

ϱ(S) = VaR_{1−q}(S) = F^{−1}(1 − q).
Example 6.26 (expected shortfall). The expected shortfall has already been introduced in Example 6.20, where we have stated that the expected shortfall is equal to the TVaR for continuous distribution functions F. Instead of introducing it via probability distortion functions, we can also define it directly. Assume that S ∼ F with F continuous. Then we have

ϱ(S) = TVaR_{1−q}(S) = E[S | S > VaR_{1−q}(S)] = (1/q) ∫_{1−q}^1 VaR_u(S) du = ES_{1−q}(S).
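The representation ES_{1−q}(S) = (1/q) ∫_{1−q}^1 VaR_u(S) du suggests a simple order-statistics estimator: average the largest q-fraction of a sample. A small Monte Carlo sketch (function name and the exponential test distribution are our choice):

```python
import math
import random

def empirical_es(sample: list, q: float) -> float:
    """Estimate ES_{1-q}(S) by averaging the largest ceil(q*n) observations."""
    xs = sorted(sample, reverse=True)
    k = max(1, math.ceil(q * len(xs)))
    return sum(xs[:k]) / k

random.seed(1)
# exponential claims with mean 1: VaR_{1-q}(S) = -log(q) and ES_{1-q}(S) = 1 - log(q)
sample = [random.expovariate(1.0) for _ in range(200_000)]
print(empirical_es(sample, q=0.05))  # close to 1 - log(0.05), about 4.0
```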
This cost-of-capital pricing principle can also be obtained with probability distortion functions: choose h as in Example 6.20 and define the distortion function h̃ : [0, 1] → [0, 1] by

h̃(x) = (1 − r_CoC) x + r_CoC h(x),

for fixed r_CoC ∈ (0, 1), see Figure 6.3. For a non-negative random variable S ≥ 0 with continuous (and strictly increasing) distribution function we obtain, see Example 6.20,

π_h̃ = ∫_0^∞ h̃(P[S > x]) dx = (1 − r_CoC) E[S] + r_CoC ES_{1−q}(S).
Remarks.

Solvency II considers VaR_{1−q}(S − E[S]) for 1 − q = 99.5% as the regulatory risk measure.

The Swiss Solvency Test considers ES_{1−q}(S − E[S]) for 1 − q = 99% as the regulatory risk measure.
Figure 6.3: Distortion functions h of Example 6.20 (expected shortfall) and corresponding h̃ for the expected shortfall cost-of-capital loading.
For r_CoC one often sets 6% above the risk-free rate. However, this is a heavily debated number, because in stress periods this rate should probably be higher.
For S ∼ LN(μ, σ²) the cost-of-capital loaded price then equals

π_h̃ = exp{μ + σ²/2} ( 1 − r_CoC + (r_CoC/q) Φ(σ − Φ^{−1}(1 − q)) ).

6.2.5

E[ϕ] = d_0
By the FKG inequality, see Fortuin et al. [48], it follows that ϕ and S are positively correlated, and thus

π(0) = E[ϕ S] ≥ E[S].

Observe the identity

π(0) = E[ϕ S] = (1/M_S(α)) E[e^{αS} S] = π_α,

which is exactly the Esscher premium π_α, and P* is the Esscher measure corresponding to F, see Section 6.2.2.
Example 6.28 (cost-of-capital loading with expected shortfall). This example treats the expected shortfall risk measure. Assume S ∼ F with continuous distribution function F. The VaR at security level 1 − q ∈ (0, 1) is then given by VaR_{1−q}(S) = F^{−1}(1 − q), see Example 6.25. Note again that F(F^{−1}(1 − q)) = 1 − q, see Chapter 1. Choose r_CoC ∈ (0, 1) and define the probability distortion

ϕ = (1 − r_CoC) + (r_CoC/q) 1_{{S > VaR_{1−q}(S)}} > 0,   P-a.s.

This provides

π(0) = E[ϕ S] = E[ ((1 − r_CoC) + (r_CoC/q) 1_{{S > VaR_{1−q}(S)}}) S ]
     = (1 − r_CoC) E[S] + (r_CoC/q) E[ 1_{{S > VaR_{1−q}(S)}} S ]
     = (1 − r_CoC) E[S] + r_CoC E[S | S > VaR_{1−q}(S)].
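A Monte Carlo sketch of this identity (with an exponential claim as a stand-in distribution; function name and parameter values are ours): applying the empirical version of the distortion ϕ reproduces (1 − r_CoC) E[S] + r_CoC E[S | S > VaR_{1−q}(S)]:

```python
import math
import random

def coc_price_via_deflator(sample: list, q: float, r_coc: float) -> float:
    """Empirical version of pi(0) = E[phi * S] with
    phi = (1 - r_coc) + (r_coc / q) * 1{S > VaR_{1-q}(S)}."""
    xs = sorted(sample)
    var_ = xs[math.ceil((1 - q) * len(xs)) - 1]  # empirical VaR_{1-q}
    return sum(((1 - r_coc) + (r_coc / q) * (x > var_)) * x for x in xs) / len(xs)

random.seed(7)
sample = [random.expovariate(1.0) for _ in range(200_000)]
# for Exp(1): E[S] = 1 and ES_{1-q} = 1 - log(q); with q = 5% and r_coc = 6%
# the price should be close to 0.94 * 1 + 0.06 * (1 - log(0.05)), about 1.18
print(coc_price_via_deflator(sample, q=0.05, r_coc=0.06))
```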
Chapter 7
Assume that the total claim amount S has a compound Poisson distribution. Decoupling per policy, we can write

S = Σ_{i=1}^N Y_i = Σ_{l=1}^v Σ_{i=1}^{N^{(l)}} Y_i^{(l)} = Σ_{l=1}^v S_l,

where S_l = Σ_{i=1}^{N^{(l)}} Y_i^{(l)} describes the total claim amount of policy l = 1, …, v. This decoupling provides independent compound Poisson distributions S_l. That is, we have S_l ∼ CompPoi(λ_l, G_l), where we set volume v_l = 1, λ_l > 0 is the expected number of claims of policy l and Y_i^{(l)} ∼ G_l describes the claim size distribution of policy l. This provides the following decomposition of the mean

E[S] = Σ_{l=1}^v E[S_l] = Σ_{l=1}^v λ_l E[Y_1^{(l)}] = λ E[Y_1] Σ_{l=1}^v (λ_l E[Y_1^{(l)}]) / (λ E[Y_1]) = μ Σ_{l=1}^v χ^{(l)},

where μ = E[S]/v = λ E[Y_1] is the average claim over all policies and χ^{(l)} = λ_l E[Y_1^{(l)}]/(λ E[Y_1]) > 0 reflects the contribution of policy l = 1, …, v. This means that in the case of heterogeneity we should determine these risk characteristics χ^{(l)} for every policy l to obtain
risk adjusted premia, because these risk characteristics χ^{(l)} describe the differences between the policies. This would require modelling v different parameters. To avoid over-parametrization and to have sufficient volume(s) for a LLN, one chooses a fixed (finite) number, say K, of tariff criteria (like age, type of car, kilometers yearly driven, place of living, etc.) such that the total portfolio is divided into sufficiently homogeneous sub-portfolios (risk classes, risk cells). These tariff criteria play the role of covariates in regression theory.

Then we try to modify the overall average claim μ = E[S]/v to these risk classes such that their prices become a function of the risk characteristics in the K tariff criteria. This way we may substantially reduce the number of parameters and estimation can be done.

For this exposition we assume to only have two tariff criteria (covariates), i.e. K = 2, and we would like to set up a multiplicative tariff structure.
Assume we have K = 2 tariff criteria. The first criterion (covariate) has I risk characteristics i ∈ {1, …, I} and the second criterion (covariate) has J risk characteristics j ∈ {1, …, J}. Thus, we have M = I · J different risk classes (risk cells), see Table 7.1 for an illustration. Risk class (i, j) has expected total claim amount

v_{i,j} μ^{(i,j)},

where v_{i,j} denotes the number of policies belonging to risk class (i, j) and μ^{(i,j)} describes the quality of that risk class. Our aim is to set up a multiplicative tariff structure

μ^{(i,j)} = μ χ_{1,i} χ_{2,j}.   (7.1)
The tariff factors could, for instance, look as follows:

 yearly km χ_{1,i}:   0-10000: 0.8 | 10-15000: 0.9 | 15-20000: 1.0 | 20-25000: 1.1 | 25000+: 1.2
 no accident χ_{2,j}: 0 years: 1.2 | 1 year: 1.1 | 2 years: 1.0 | 3 years: 0.9 | 4 years: 0.8 | 5 years: 0.7 | 6+ years: 0.5

Observe that the 1st tariff criterion is continuous, but typically it is discretized for having finitely many risk characteristics, see Table 7.2 for an example.
Related to the first item: the aim should be to build homogeneous risk classes of sufficient volume such that a LLN applies and we get statistical significance. The total expected claim then reads as

E[S] = Σ_{i,j} v_{i,j} μ^{(i,j)} = μ Σ_{i,j} v_{i,j} χ_{1,i} χ_{2,j},   (7.2)

where μ = λ E[Y_1] is the average claim per policy over the whole portfolio v, i.e. E[S] = μ v, and μ^{(i,j)} = μ χ_{1,i} χ_{2,j} describes the multiplicative tariff structure for two tariff criteria.

7.1
The method of Bailey & Simon determines the tariff factors as minimizers of

X² = Σ_{i,j} (S_{i,j} − v_{i,j} μ χ_{1,i} χ_{2,j})² / (v_{i,j} μ χ_{1,i} χ_{2,j}).   (7.3)

R.A. Bailey

This can either be done by first summing over rows i or over columns j. Note that the solution χ̂_{2,j} is found by solving

0 = (∂/∂χ_{2,j}) X² = (∂/∂χ_{2,j}) Σ_i (S_{i,j} − v_{i,j} μ χ_{1,i} χ_{2,j})² / (v_{i,j} μ χ_{1,i} χ_{2,j}),

which provides

χ̂_{2,j} = ( Σ_i S_{i,j}² / (v_{i,j} μ̂ χ̂_{1,i})  /  Σ_i v_{i,j} μ̂ χ̂_{1,i} )^{1/2}.

This implies

Σ_i v_{i,j} μ̂ χ̂_{1,i} χ̂_{2,j} = ( Σ_i v_{i,j} μ̂ χ̂_{1,i} )^{1/2} ( Σ_i S_{i,j}² / (v_{i,j} μ̂ χ̂_{1,i}) )^{1/2}.

Next we apply the Schwarz inequality to the terms on the right-hand side, which provides the following lower bound:

( Σ_i v_{i,j} μ̂ χ̂_{1,i} )^{1/2} ( Σ_i S_{i,j}² / (v_{i,j} μ̂ χ̂_{1,i}) )^{1/2} ≥ Σ_i (v_{i,j} μ̂ χ̂_{1,i})^{1/2} · S_{i,j} / (v_{i,j} μ̂ χ̂_{1,i})^{1/2} = Σ_i S_{i,j}.
Example 7.3 (method of Bailey & Simon). We choose an example with two tariff criteria. The first one specifies whether the car is owned or leased, the second one specifies the age of the driver. For simplicity we set v_{i,j} ≡ 1 and we aim to determine the tariff factors μ, χ_{1,i} and χ_{2,j}. The method of Bailey & Simon then requires minimization of

X² = Σ_{i,j} (S_{i,j} − μ χ_{1,i} χ_{2,j})² / (μ χ_{1,i} χ_{2,j}).

Note that we need to normalize the estimators to obtain a unique solution. We set μ̂ = 1 and χ̂_{1,1} = 1. The observations S_{i,j} are given by, see also Figure 7.1:
          21-30y   31-40y   41-50y   51-60y
 owned     1300     1200     1000     1200
 leased    1800     1300     1300     1500
Figure 7.1: scatter plot of the claim amounts S_{i,j} per age class (O = owned, L = leased).
This provides the following multiplicative tariff structure:

              31-40y   41-50y   51-60y   χ̂_{1,i}
 owned         1112     1020     1197    1.0000
 leased        1395     1280     1503    1.2548
 μ̂ χ̂_{2,j}    1112     1020     1197
In this example we have a (systematic) positive bias, as stated in Lemma 7.2, i.e.

Σ_{i,j} μ̂ χ̂_{1,i} χ̂_{2,j} ≥ Σ_{i,j} S_{i,j} = S.

The method of total marginal sums of Bailey & Jung [9, 63] instead determines the tariff factors from the marginal sum conditions

Σ_{j=1}^J v_{i,j} μ χ_{1,i} χ_{2,j} = Σ_{j=1}^J S_{i,j}   for all i,   (7.4)
Σ_{i=1}^I v_{i,j} μ χ_{1,i} χ_{2,j} = Σ_{i=1}^I S_{i,j}   for all j.   (7.5)

J. Jung

Remarks.
Both the method of Bailey & Simon and the method of Bailey & Jung are
rather pragmatic methods because they are not directly based on a stochastic
model. Therefore, in the remainder of this chapter we are going to describe
more sophisticated methods which are motivated by a probabilistic model.
Example 7.4 (method of Bailey & Jung, method of total marginal sums). We
revisit the data of Example 7.3. This time we determine the parameters by solving
the system (7.4)-(7.5). This needs to be done numerically and provides the following
multiplicative tariff structure:
              21-30y   31-40y   41-50y   51-60y   χ̂_{1,i}
 owned         1375     1108     1020     1197    1.0000
 leased        1725     1392     1280     1503    1.2553
 μ̂ χ̂_{2,j}    1375     1108     1020     1197
We conclude that both methods give similar results for this example.
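The system (7.4)–(7.5) can be solved by a simple fixed-point iteration. A sketch for the data of Example 7.3 (with v_{i,j} ≡ 1, μ merged into χ_{2,j}, and the normalization χ̂_{1,owned} = 1 as in the tables above):

```python
# data of Example 7.3: rows are the first tariff criterion, columns the age classes
S = {"owned": [1300, 1200, 1000, 1200], "leased": [1800, 1300, 1300, 1500]}
ages = range(4)

chi1 = {"owned": 1.0, "leased": 1.0}
chi2 = [1.0] * 4
for _ in range(100):  # alternate the two marginal-sum conditions (7.4)-(7.5)
    chi2 = [sum(S[i][j] for i in S) / sum(chi1.values()) for j in ages]
    chi1 = {i: sum(S[i]) / sum(chi2) for i in S}

# normalize chi1_owned = 1 as in the text
c = chi1["owned"]
chi1 = {i: v / c for i, v in chi1.items()}
chi2 = [x * c for x in chi2]
print(round(chi1["leased"], 4), [round(x) for x in chi2])
# reproduces the tariff structure of Example 7.4: 1.2553 and [1375, 1108, 1020, 1197]
```

In this small balanced example the iteration converges after a single step; in general one iterates until the marginal sums are matched to the desired accuracy.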
7.2 Gaussian approximation

7.2.1 Maximum likelihood estimation
The expected value of the claim ratio R_{i,j} = S_{i,j}/v_{i,j} is given by, see (7.2),

E[R_{i,j}] = μ^{(i,j)} = μ χ_{1,i} χ_{2,j}.

Combining these two items implies that we plan to consider the following model. Taking logarithms may turn the multiplicative tariff structure into an additive structure. If this logarithm X_{i,j} of R_{i,j} has a Gaussian distribution, we have nice mathematical properties. Therefore, we assume a log-normal distribution for R_{i,j}, which hopefully gives a good approximation to the true tariffication problem. These choices imply for the first two moments

E[R_{i,j}] = e^{β_0 + σ²/2} e^{β_{1,i}} e^{β_{2,j}}   and   Var(R_{i,j}) = E[R_{i,j}]² (e^{σ²} − 1).

Observe that the mean has the right multiplicative structure; set μ = e^{β_0 + σ²/2}, χ_{1,i} = e^{β_{1,i}} and χ_{2,j} = e^{β_{2,j}}. However, the distributional properties are rather different from compound models, and the underlying volumes v_{i,j} are also not considered in an appropriate way. Nevertheless, this log-linear additive Gaussian structure is often used because of its nice mathematical structure and because popular statistical methods can be applied.
Set M = I · J and define for X_{i,j} = log R_{i,j} = log(S_{i,j}/v_{i,j}) the vector

X = (X_1, …, X_M)' = (X_{1,1}, …, X_{1,J}, …, X_{I,1}, …, X_{I,J})' ∈ R^M.   (7.6)

Note that we change the labeling of the observations because this is going to be simpler in the sequel. Index m always refers to

m = m(i, j) = (i − 1) J + j ∈ {1, …, M = I · J}.   (7.7)
We consider the Gaussian model

X = Z β + ε,   (7.8)

with design matrix Z ∈ R^{M×(r+1)}, parameter vector β ∈ R^{r+1} and independent Gaussian components ε_m ∼ N(0, σ² w_m) for given weights w_m > 0. Throughout we assume that Z has full rank. We initialize β_{1,1} = β_{2,1} = 0 and β_0 plays the role of the intercept. At the moment the weights w_m do not have a natural meaning; often one sets w_m = v_{i,j}^{−1} (inversely proportional to the underlying volume), because in this case one has

Var(R_{i,j}) = Var(e^{X_{i,j}}) = E[R_{i,j}]² (e^{σ²/v_{i,j}} − 1) ≈ σ² E[R_{i,j}]² / v_{i,j}
for v_{i,j} large. Thus, the variance of the claims ratio R_{i,j} is roughly inversely proportional to the underlying volume v_{i,j}. In view of Example 7.3 this gives the following table, where the 1s show to which class the observations belong:

 m   owned  leased  21-30y  31-40y  41-50y  51-60y   X = log R
 1     1      0       1       0       0       0        7.17
 2     1      0       0       1       0       0        7.09
 3     1      0       0       0       1       0        6.91
 4     1      0       0       0       0       1        7.09
 5     0      1       1       0       0       0        7.50
 6     0      1       0       1       0       0        7.17
 7     0      1       0       0       1       0        7.17
 8     0      1       0       0       0       1        7.31
This table needs to be turned into the appropriate form so that it fits to (7.8). Therefore, we need to drop the columns 'owned' and '21-30y' because of the chosen normalization β_{1,1} = β_{2,1} = 0. This provides the following design matrix and parameter vector:

 Z:  intercept  leased  31-40y  41-50y  51-60y
         1        0       0       0       0
         1        0       1       0       0
         1        0       0       1       0
         1        0       0       0       1
         1        1       0       0       0
         1        1       1       0       0
         1        1       0       1       0
         1        1       0       0       1

 β = (β_0, β_{1,2}, β_{2,2}, β_{2,3}, β_{2,4})'.
The MLE of β is obtained by maximizing the Gaussian log-likelihood of the density

f(x) = (2π)^{−M/2} |Σ|^{−1/2} exp{ −(1/2) (x − Zβ)' Σ^{−1} (x − Zβ) },

with covariance matrix Σ = σ² diag(w_1, …, w_M), which provides

β̂^{MLE} = (Z' Σ^{−1} Z)^{−1} Z' Σ^{−1} X.   (7.9)

The tariff factors can then be estimated by (avoiding the variance correction term, which is appropriate for σ² w_m/2 ≈ 0)

μ̂ = exp{β̂_0^{MLE}},   χ̂_{1,i} = exp{β̂_{1,i}^{MLE}}   and   χ̂_{2,j} = exp{β̂_{2,j}^{MLE}}.
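A numerical sketch of (7.9) in the homoscedastic case Σ = σ² 1, where it reduces to the normal equations β̂ = (Z'Z)^{−1} Z'X, on the data of Example 7.3 (helper names are ours; no external linear-algebra library is assumed):

```python
import math

# log-claims and design matrix of the two-way example (owned row, then leased row)
S = [1300, 1200, 1000, 1200, 1800, 1300, 1300, 1500]
X = [math.log(s) for s in S]
Z = [
    [1, 0, 0, 0, 0], [1, 0, 1, 0, 0], [1, 0, 0, 1, 0], [1, 0, 0, 0, 1],
    [1, 1, 0, 0, 0], [1, 1, 1, 0, 0], [1, 1, 0, 1, 0], [1, 1, 0, 0, 1],
]  # columns: intercept, leased, 31-40y, 41-50y, 51-60y

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

n, k = len(Z), len(Z[0])
ZtZ = [[sum(Z[m][a] * Z[m][b] for m in range(n)) for b in range(k)] for a in range(k)]
ZtX = [sum(Z[m][a] * X[m] for m in range(n)) for a in range(k)]
beta = solve(ZtZ, ZtX)                 # MLE (7.9) with Sigma proportional to the identity

chi_leased = math.exp(beta[1])
print(round(chi_leased, 4))            # tariff factor for "leased", about 1.2495
print(round(math.exp(beta[0])))        # fitted (owned, 21-30y) cell, about 1368
```

These values match the log-linear tariff structure reported in Example 7.5 below.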
Example 7.5 (log-linear model). We use the data S_{i,j} from Example 7.3. Assume w_m = v_{i,j}^{−1} ≡ 1 and initialize μ̂ = 1 and χ̂_{1,1} = 1. The log-linear MLE formula (7.9) provides the following multiplicative tariff structure:

              21-30y   31-40y   41-50y   51-60y   χ̂_{1,i}
 owned         1368     1117     1020     1200    1.0000
 leased        1710     1396     1274     1500    1.2495
 μ̂ χ̂_{2,j}    1368     1117     1020     1200
We compare the results from the method of Bailey & Simon, the method of total marginal sums (Bailey & Jung) and the log-linear MLE method. We see that in this example all three methods provide similar results.

Observe: the risk class (owned, 21-30y) is punished by the bad performance of (leased, 21-30y) and vice versa. A similar remark holds true for risk class (leased, 31-40y).

Remarks.
The multiplicative tariff construction above has used the design matrix Z = (z_{m,k})_{m,k} ∈ R^{M×(r+1)}, which was generated by categorical variables. Categorical variables allow us to group observations into disjoint risk categories.

Binary variables are a special case of categorical variables that can only have two specifications, 1 for true and 0 for false. Recall that all our z_{m,k} ∈ {0, 1}. E.g., the observation S_{i,j} either belongs to the class 'owned' or to the class 'leased'.

A serious drawback of the log-linear model is that we need to have observations in all risk classes, because otherwise X_{i,j} = log(S_{i,j}/v_{i,j}) is not well-defined. In practice, it may happen that one has a risk class with positive volume v_{i,j} > 0 but no claim in that risk class. This results in S_{i,j} = 0. In this case one should use the more sophisticated models presented below, see for instance Section 7.3.4 for a claims count example. Moreover, volumes v_{i,j} should be large in order to have the right relationship for the resulting variances of the claims ratios.

7.2.2 Goodness-of-fit analysis
Compared to the methods in the previous section, the log-linear MLE formula (7.9) has the advantage that we can apply classical statistical methods for a goodness-of-fit test and for variable selection/reduction techniques. We introduce this statistical language now. For this discussion we assume homoscedasticity, i.e. identical weights

w_m ≡ 1   and   Σ = σ² 1,

which simplifies the MLE to β̂^{MLE} = (Z'Z)^{−1} Z'X. The general case is treated in the next section. We introduce the total sum of squares (the first and last equalities are definitions)
SS_tot = Σ_m (X_m − X̄)² = Σ_m (X̂_m − X̄)² + Σ_m (X_m − X̂_m)² = SS_reg + SS_err,   (7.10)

with X̄ = (1/M) Σ_{m=1}^M X_m and X̂ = Z β̂^{MLE}.

SS_tot is the total difference between the observations X_m and the sample mean X̄, without knowing the explanatory variables Z.
Proof of (7.10). We rewrite the total sum of squares SS_tot in vector notation. Therefore, we define

ε̂ = X − Z β̂^{MLE} = X − X̂   and   X̄ = X̄ (1, …, 1)'.   (7.11)

We calculate

X' X = (X̂ + ε̂)' (X̂ + ε̂) = X̂' X̂ + 2 X̂' ε̂ + ε̂' ε̂.

The MLE β̂^{MLE} satisfies the normal equations

0 = Z' (X − Z β̂^{MLE}) = Z' ε̂,   (7.12)

and as a consequence

X̂' ε̂ = (Z β̂^{MLE})' ε̂ = 0.

This implies

X' X = X̂' X̂ + ε̂' ε̂.

We subtract X̄' X̄ on both sides to obtain

SS_tot = X' X − X̄' X̄ = ε̂' ε̂ + X̂' X̂ − X̄' X̄ = SS_err + SS_reg,

where for the last step we need to observe that the intercept β_0 is contained in every row of the design matrix Z, therefore the first column of Z is equal to (1, …, 1)'. This and (7.12) imply 0 = (1, …, 1)' ε̂ = Σ_m (X_m − X̂_m). This treats the cross-product terms, leading to X̂' X̂ − X̄' X̄ = SS_reg. This proves (7.10). □
The coefficient of determination is defined by

R² = 1 − SS_err/SS_tot = SS_reg/SS_tot ∈ [0, 1].

This is the ratio of the explained sum of squares SS_reg and the total sum of squares SS_tot. If the model explains the structure in the observations well, then R² should be close to 1, because then X̂ is able to explain the underlying structure.
For Example 7.5 we obtain R² = 0.9202, which is in favor of this model explaining the data S_{i,j}.

Residual standard deviation σ: for further analysis we also need the residual standard deviation σ. It is estimated (in the homoscedastic case) by

σ̂² = (1/M) Σ_m (X_m − X̂_m)² = ε̂' ε̂ / M = SS_err / M,

where ε̂ was defined in (7.11). Set r = I + J − 2, i.e. the dimension of the parameter β is r + 1. σ̂² is the MLE for σ², and M σ̂² is distributed as σ² χ²_{M−r−1}, see, for instance, Section 7.4 in Johnson–Wichern [62]. Often, one also considers the unbiased variance parameter estimator ŝ² = M/(M − r − 1) · σ̂².
Likelihood ratio test: finally, we would like to see whether we need to include a specific parameter β_{k,l_k}. Note that the model is, of course, invariant under permutations of parameters and components. Therefore, we can choose any specific ordering and, to simplify notation, we define

β = (β_0, β_1, …, β_r)' ∈ R^{r+1},   (7.13)

so that we have the ordering of components that is appropriate for the next layout.

Null hypothesis H_0: β_0 = … = β_{p−1} = 0 for given p < r + 1.

1. Calculate the residual differences SS^{full}_err and β̂ in the full model with (r+1)-dimensional parameter vector β ∈ R^{r+1}.
2. Calculate the residual differences SS^{H_0}_err in the reduced model (β_p, …, β_r)' ∈ R^{r+1−p}.

We calculate the likelihood ratio Λ; we denote the design matrix of the reduced model by Z_0:
Λ = f̂_{H_0}(X) / f̂_{full}(X)
  = [ (SS^{H_0}_err/M)^{−M/2} exp{ −(1/(2σ̂²_{H_0})) (X − Z_0 β̂^{MLE}_{H_0})' (X − Z_0 β̂^{MLE}_{H_0}) } ]
    / [ (SS^{full}_err/M)^{−M/2} exp{ −(1/(2σ̂²_{full})) (X − Z β̂^{MLE}_{full})' (X − Z β̂^{MLE}_{full}) } ]
  = ( SS^{H_0}_err / SS^{full}_err )^{−M/2}
  = ( 1 + (SS^{H_0}_err − SS^{full}_err)/SS^{full}_err )^{−M/2}.   (7.14)

The likelihood ratio test rejects the null hypothesis H_0 for small values of Λ. This is equivalent to rejection for large values of (SS^{H_0}_err − SS^{full}_err)/SS^{full}_err.
This motivates the test statistic

F = (SS^{H_0}_err − SS^{full}_err)/SS^{full}_err · (M − r − 1)/p = (SS^{H_0}_err − SS^{full}_err)/(p ŝ²_{full}).   (7.15)

Under the null hypothesis H_0, F has an F-distribution with p and M − r − 1 degrees of freedom, and H_0 is rejected if

F > F_{p, M−r−1}(1 − α),   (7.16)

where the latter denotes the (1 − α)-quantile of the F-distribution with degrees of freedom df_1 = p and df_2 = M − r − 1. The heteroscedastic case is given in (7.22), below.
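The goodness-of-fit numbers quoted in Example 7.6 (R² = 0.9202, F statistic 8.653 with df_1 = 4 and df_2 = 3 for dropping everything but the intercept) can be reproduced from the log-claims rounded to two decimals, X_m = log(S_{i,j}), of the design-matrix table above. In this balanced two-way layout the OLS fitted values are simply row mean + column mean − grand mean (a sketch; variable names are ours):

```python
import math

# log-claims of Example 7.3, rounded to two decimals
X = {("O", "A"): 7.17, ("O", "B"): 7.09, ("O", "C"): 6.91, ("O", "D"): 7.09,
     ("L", "A"): 7.50, ("L", "B"): 7.17, ("L", "C"): 7.17, ("L", "D"): 7.31}
rows, cols = "OL", "ABCD"

row_mean = {i: sum(X[(i, j)] for j in cols) / len(cols) for i in rows}
col_mean = {j: sum(X[(i, j)] for i in rows) / len(rows) for j in cols}
grand = sum(X.values()) / len(X)

# balanced two-way OLS: fitted value = row effect + column effect - grand mean
ss_err = sum((X[(i, j)] - (row_mean[i] + col_mean[j] - grand)) ** 2
             for i in rows for j in cols)
ss_tot = sum((x - grand) ** 2 for x in X.values())

M, p = 8, 4        # 8 cells; H_0 drops the p = 4 covariate effects
r = 4              # r + 1 = 5 estimated parameters in the full model
r2 = 1 - ss_err / ss_tot
F = ((ss_tot - ss_err) / p) / (ss_err / (M - r - 1))
print(round(r2, 4), round(F, 3))  # 0.9202 and 8.653
```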
Example 7.6 (regression model, revisited). We revisit Example 7.5. In Figure 7.2 we give the R output of the command lm. The lines 'Call' give the MLE problem to be solved. The adjusted R-squared is obtained from

1 − (SS_err/SS_tot) · (M − 1)/(M − r − 1).
The final line displays an F test statistic (7.15) of value 8.653 for df_1 = 4 and df_2 = 3 for dropping all variables except the intercept β_0. This gives a p-value of 5.36%, which says that the null hypothesis is just barely not rejected at the 5% significance level, and we stay with the full model.
          21-30y   31-40y   41-50y   51-60y
           2000     1800     1500     1600
           2200     1600     1400     1400
           2500     2000     1700     1600
7.3

In the previous section we have taken a log-normal approximation for the total claim amounts S_{i,j} in the risk classes (i, j). Taking logarithms has then led to a multiplicative structure in a natural way. In the present section we express the expected claim of risk class (i, j) as the expected number of claims times the average claim, i.e.

E[S_{i,j}] = E[N_{i,j}] · E[Y^{(1)}_{i,j}],

where N_{i,j} describes the number of claims in risk class (i, j) and Y^{(l)}_{i,j} the corresponding i.i.d. claim sizes for l = 1, …, N_{i,j} in risk class (i, j). Note that we suppose a compound distribution for this decoupling.
We say that X belongs to the exponential dispersion family (EDF) if its density is of the form

f_X(x; θ, φ) = exp{ (xθ − b(θ))/(φ/w) + c(x, φ, w) },

where w > 0 is a given weight, φ > 0 the dispersion parameter, θ ∈ Θ the canonical parameter, b : Θ → R the cumulant function and c(·, ·, ·) the appropriate normalization.
Lemma 7.8. For X ∼ EDF(θ, φ, w, b(·)) the moment generating function is given by

M_X(r) = exp{ (b(θ + rφ/w) − b(θ)) / (φ/w) }.

Proof. Choose θ and r in the neighborhood of zero such that M_X(r) exists. Then we have

M_X(r) = ∫ e^{rx} exp{ (xθ − b(θ))/(φ/w) + c(x, φ, w) } dx
       = ∫ exp{ (x(θ + rφ/w) − b(θ))/(φ/w) + c(x, φ, w) } dx
       = exp{ (b(θ + rφ/w) − b(θ))/(φ/w) } ∫ exp{ (x(θ + rφ/w) − b(θ + rφ/w))/(φ/w) + c(x, φ, w) } dx.

We have assumed that Θ is an open set. Therefore, for any θ ∈ Θ we have that θ_r = θ + rφ/w ∈ Θ for r sufficiently close to zero. Therefore, the last integral is over the density that corresponds to EDF(θ_r, φ, w, b(·)), and since this is a well-defined density with identical support for all θ_r, this last integral is equal to 1. This proves the claim. □
Corollary 7.9. We make the same assumptions as in Lemma 7.8 and in addition we assume that b ∈ C² in the interior of Θ. Then we have

E[X] = b'(θ)   and   Var(X) = (φ/w) b''(θ).

Proof. In view of (1.3) we only need to calculate the first and second derivatives at zero of the moment generating function. We have from Lemma 7.8

(d/dr) M_X(r) |_{r=0} = exp{ (b(θ + rφ/w) − b(θ))/(φ/w) } b'(θ + rφ/w) |_{r=0} = b'(θ),

and

(d²/dr²) M_X(r) |_{r=0} = exp{ (b(θ + rφ/w) − b(θ))/(φ/w) } [ (b'(θ + rφ/w))² + (φ/w) b''(θ + rφ/w) ] |_{r=0} = (b'(θ))² + (φ/w) b''(θ).

This proves the claim. □

Example 7.10. Binomial distribution. Choose φ = 1, w = v and cumulant function b(θ) = log(1 + e^θ). Then

f_X(x; θ, 1) / exp{c(x, 1, v)} = exp{ v (xθ − log(1 + e^θ)) } = exp{ vx log(e^θ/(1 + e^θ)) + v(1 − x) log(1/(1 + e^θ)) } = p^{vx} (1 − p)^{v − vx},

with p = e^θ/(1 + e^θ). This provides

E[X] = b'(θ) = e^θ/(1 + e^θ) = p   and   Var(X) = (1/v) b''(θ) = (1/v) · e^θ/(1 + e^θ) · 1/(1 + e^θ) = p(1 − p)/v.
Poisson distribution. Choose φ = 1, w = v and cumulant function b(θ) = e^θ. Then

f_X(x; θ, 1) / exp{c(x, 1, v)} = exp{ v (xθ − e^θ) } = λ^{vx} e^{−vλ},

with λ = e^θ. This provides

E[X] = b'(θ) = e^θ = λ   and   Var(X) = (1/v) b''(θ) = e^θ/v = λ/v.
Gaussian distribution. Choose cumulant function b(θ) = θ²/2. Then

f_X(x; θ, φ) / exp{c(x, φ, w)} = exp{ (xθ − θ²/2)/(φ/w) } = exp{ −(θ² − 2θx)/(2φ/w) },

which corresponds to a Gaussian distribution with mean E[X] = b'(θ) = θ and variance Var(X) = (φ/w) b''(θ) = φ/w.
Gamma distribution. Choose cumulant function b(θ) = −log(−θ) for θ < 0. Then

f_X(x; θ, φ) / exp{c(x, φ, w)} = exp{ (xθ + log(−θ))/(φ/w) } = (−θ)^{w/φ} exp{ (w/φ) θ x },

which corresponds to a gamma distribution with shape parameter w/φ and scale parameter −(w/φ)θ. This provides

E[X] = b'(θ) = −1/θ   and   Var(X) = (φ/w) b''(θ) = (φ/w) θ^{−2} = (φ/w) E[X]².

For more examples we refer to Table 13.8 in Frees [49] on page 379.
These examples show that several popular distribution functions belong to the exponential dispersion family. In the present notes we concentrate on the Poisson and the gamma distributions for pricing the two components 'number of claims' and 'claims severities'. However, the theory holds true in more generality. Our aim is to consider compound Poisson models and to express the expected claim of risk class (i, j) as the expected number of claims times the average claim, i.e.

E[S_{i,j}] = E[N_{i,j}] · E[Y^{(1)}_{i,j}],

where N_{i,j} describes the number of claims in risk class (i, j) and Y^{(l)}_{i,j} the corresponding i.i.d. claim sizes for l = 1, …, N_{i,j} in risk class (i, j). We then aim for calculating a multiplicative tariff which considers risk characteristics χ's for both the number of claims and the claims severities of risk class (i, j).
We assume that the N_{i,j} are independent with N_{i,j} ∼ Poi(λ_{i,j} v_{i,j}) and v_{i,j} counting the number of policies in risk class (i, j). Under these assumptions we derive a multiplicative tariff structure for the characteristics of the expected claims frequency λ_{i,j}. For the claim sizes we will do a similar construction by making a gamma distributional assumption. Since the latter is slightly more involved than the former, we start with the Poisson case.

7.3.1
We assume that the N_{i,j} are independent with N_{i,j} ∼ Poi(λ_{i,j} v_{i,j}), where v_{i,j} denotes the number of policies in risk class (i, j). In view of the exponential dispersion family we make the following Ansatz for the expected claims frequency, see Example 7.10,

λ_{i,j} = E[N_{i,j}/v_{i,j}] = b'(θ_{i,j}) = exp{θ_{i,j}} = exp{(Zβ)_m},   (7.17)

where in the last step we assume having a multiplicative tariff structure, which provides an additive structure on the log-scale reflected by the linear term Zβ. The index m = m(i, j) was defined in (7.7), the matrix Z ∈ R^{M×(r+1)} denotes the design matrix and β ∈ R^{r+1} is the parameter vector. Thus, we assume that X_{i,j} = N_{i,j}/v_{i,j} ∈ N_0/v_{i,j} are independent with

X_{i,j} ∼ EDF(θ_{i,j} = (Zβ)_m, φ = 1, v_{i,j}, b(θ) = exp{θ}).

Our aim is to estimate the parameter vector β ∈ R^{r+1}. Identity (7.17) immediately explains that the natural link function g in this problem (between mean and parameter) is the so-called log-link function g(λ) = log(λ), because this turns the multiplicative tariff structure into an additive form. The joint log-likelihood function of X ∈ R^M_+ is given by (we use independence here)

ℓ_X(β) ∝ Σ_m (X_m θ_m − exp{θ_m}) / (1/v_m) = Σ_m (X_m (Zβ)_m − exp{(Zβ)_m}) / (1/v_m),
where we have applied the relabeling of the components of X and v_{i,j} such that they fit to the design matrix Z, see also (7.6). The MLE β̂^{MLE} for β is found by the solution of

∇_β ℓ_X(β) = 0.   (7.18)

We calculate for l = 0, …, r

(∂/∂β_l) ℓ_X(β) = Σ_m (X_m − exp{(Zβ)_m}) (∂/∂β_l)(Zβ)_m / (1/v_m) = Σ_m (X_m − exp{(Zβ)_m}) z_{m,l} v_m.

Proposition 7.11. The solution to the MLE problem (7.18) in the Poisson case is given by the solution of

Z' V exp{Zβ} = Z' V X,

with volume matrix V = diag(v_1, …, v_M).

Remarks. One should observe the similarities between the Gaussian case (7.9) and the Poisson case of Proposition 7.11, given by, respectively,

Z' Σ^{−1} Z β̂^{MLE} = Z' Σ^{−1} X   and   Z' V exp{Z β̂^{MLE}} = Z' V X.

The Gaussian case is solved analytically (assuming full rank of Z); the Poisson case can only be solved numerically, due to the presence of the exponential function. The Poisson case can be rewritten as

Z' V exp{Z β̂^{MLE}} − Z' N = 0.

Observe that the latter exactly provides the solution to the method of total marginal sums by Bailey & Jung [9, 63] given by (7.4)–(7.5).
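Proposition 7.11 can be illustrated with a small Newton iteration (the 2×2 grid, volumes and claim counts below are invented for illustration). At the MLE, the fitted counts V exp{Zβ̂} reproduce the observed marginal sums, which is exactly the Bailey & Jung property (7.4)–(7.5):

```python
import math

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

# hypothetical 2x2 tariff grid: design row [intercept, leased, age-class-2],
# volume v_m (number of policies) and observed claim count N_m
cells = [([1, 0, 0], 500, 40), ([1, 0, 1], 400, 20),
         ([1, 1, 0], 300, 36), ([1, 1, 1], 200, 17)]

beta = [math.log(113 / 1400), 0.0, 0.0]      # start at the overall log-frequency
for _ in range(50):                          # Newton steps for Z'V exp(Z beta) = Z'N
    mu = [v * math.exp(sum(zk * bk for zk, bk in zip(z, beta))) for z, v, _ in cells]
    grad = [sum((cells[m][2] - mu[m]) * cells[m][0][l] for m in range(4)) for l in range(3)]
    hess = [[sum(mu[m] * cells[m][0][a] * cells[m][0][b] for m in range(4))
             for b in range(3)] for a in range(3)]
    beta = [bk + s for bk, s in zip(beta, solve(hess, grad))]

fitted = [v * math.exp(sum(zk * bk for zk, bk in zip(z, beta))) for z, v, _ in cells]
# marginal sums: fitted and observed counts agree per tariff criterion
print(round(fitted[0] + fitted[1], 6), round(sum(fitted), 6))  # 60.0 and 113.0
```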
The analysis of the gamma claim sizes is more involved because it needs more transformations. We denote by n_{i,j} the number of observations Y^{(l)}_{i,j} in risk class (i, j); this plays the role of the volume in the exponential dispersion family. We assume that

Y^{(l)}_{i,j} ∼ Γ(γ_{i,j}, c_{i,j})   i.i.d. for l = 1, …, n_{i,j}.

From the moment generating function given in Section 3.3.3 we immediately see that for given n_{i,j} the convolution is given by

Y_{i,j} = Σ_{l=1}^{n_{i,j}} Y^{(l)}_{i,j} ∼ Γ(n_{i,j} γ_{i,j}, c_{i,j}).
Thus, the total claim amount Y_{i,j} in risk class (i, j) for given n_{i,j} has a gamma distribution (which belongs to the exponential dispersion family). We define the normalized random variable X_m = Y_{i,j}/n_{i,j}, where we again use the relabeling defined in (7.7). Observe that the family of gamma distributions is closed towards multiplication, see (3.5). Therefore, the density of X_m is given by

f_{X_m}(x) = ((c_m n_m)^{γ_m n_m} / Γ(γ_m n_m)) x^{γ_m n_m − 1} exp{−c_m n_m x}.   (7.19)

Finally, define the cumulant function b(θ) = −log(−θ) for θ < 0, see Example 7.10. The density of X_m = Y_{i,j}/n_{i,j} in risk class (i, j) is then given by

f_{X_m}(x) = exp{ (θ_m x − b(θ_m)) / (φ_m/n_m) } (1/Γ(n_m/φ_m)) (n_m/φ_m)^{n_m/φ_m} x^{n_m/φ_m − 1},

with canonical parameter θ_m = −c_m/γ_m and dispersion parameter φ_m = 1/γ_m.
The first two moments are given by, see Corollary 7.9,

E[X_m] = b'(θ_m) = −θ_m^{−1}   and   Var(X_m) = (φ_m/n_m) b''(θ_m) = (φ_m/n_m) θ_m^{−2}.

We again choose the log-link function, i.e. we set

log E[X_m] = log(−θ_m^{−1}) = −log(−θ_m) = (Zβ)_m,   (7.20)

with design matrix Z ∈ R^{M×(r+1)} and parameter vector β ∈ R^{r+1}. This gives the relationship

μ_m = −θ_m^{−1} = exp{(Zβ)_m}.

For the joint log-likelihood function of X ∈ R^M_+ we then obtain (assuming independence between the components of X)

ℓ_X(β) ∝ Σ_m (θ_m X_m + log(−θ_m)) / (φ_m/n_m) = −Σ_m (n_m/φ_m) [ X_m exp{−(Zβ)_m} + (Zβ)_m ].

Note that this excludes risk classes (i, j) with no observation, n_{i,j} = 0. The MLE β̂^{MLE} for β is found by the solution of

∇_β ℓ_X(β) = 0.   (7.21)
We calculate for l = 0, …, r

(∂/∂β_l) ℓ_X(β) = Σ_m (n_m/φ_m) [ X_m exp{−(Zβ)_m} − 1 ] z_{m,l} = Σ_m (n_m/φ_m) [ X_m μ_m^{−1} − 1 ] z_{m,l},

where Z = (z_{m,l})_{m,l} ∈ R^{M×(r+1)}. For rewriting the previous equation in matrix form we define the weight matrix V = diag(μ_1^{−1} n_1/φ_1, …, μ_M^{−1} n_M/φ_M). The last equation is then written as

∇_β ℓ_X(β) = Z' V X − Z' V exp{Zβ}.

Proposition 7.12. The solution to the MLE problem (7.21) in the gamma case is given by the solution of

Z' V exp{Zβ} = Z' V X.
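Because V = V(β) depends on β, the gamma MLE must be computed iteratively. A Fisher-scoring sketch on hypothetical severity data (the grid, claim numbers n_m and average claims X_m are invented; φ cancels in the score):

```python
import math

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

# hypothetical cells: design row [intercept, leased, age-class-2],
# number of claims n_m and observed average claim size X_m
cells = [([1, 0, 0], 30, 1400.0), ([1, 0, 1], 25, 1100.0),
         ([1, 1, 0], 20, 1900.0), ([1, 1, 1], 15, 1300.0)]

beta = [math.log(1400.0), 0.0, 0.0]
for _ in range(100):  # Fisher scoring; the weights V(beta) are refreshed in each step
    mu = [math.exp(sum(zk * bk for zk, bk in zip(z, beta))) for z, _, _ in cells]
    grad = [sum(cells[m][1] * (cells[m][2] / mu[m] - 1) * cells[m][0][l]
                for m in range(4)) for l in range(3)]
    info = [[sum(cells[m][1] * cells[m][0][a] * cells[m][0][b] for m in range(4))
             for b in range(3)] for a in range(3)]
    beta = [bk + s for bk, s in zip(beta, solve(info, grad))]

mu = [math.exp(sum(zk * bk for zk, bk in zip(z, beta))) for z, _, _ in cells]
score = [sum(cells[m][1] * (cells[m][2] / mu[m] - 1) * cells[m][0][l] for m in range(4))
         for l in range(3)]
print([round(s, 8) for s in score])  # the MLE condition Z'V(X - mu) = 0 holds
```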
Remarks.

Proposition 7.12 for the gamma case looks very promising, because it has the same structure as Proposition 7.11 for the Poisson case. However, this similarity only holds at first sight: the parameter vector β determines μ, which is also integrated into the weight matrix V = V(β). Therefore, the MLE β̂^{MLE} needs to be calculated by an iterative procedure.

Note that the parameter vector β acts on the scale parameter c_m, because c_m = γ_m/μ_m with μ_m = exp{(Zβ)_m}. The shape parameter γ_m is determined through the dispersion parameter, i.e. γ_m = 1/φ_m.

For the general case within the exponential dispersion family with link function g we refer to Section 2.3.2 in Ohlsson–Johansson [82].

We have seen that the weights w_{i,j} are given by the number of policies v_{i,j} in the Poisson case and by the number of claims n_{i,j} in the gamma case.

In the log-linear Gaussian model there was the difficulty that we could not handle risk classes without claims, see page 177. For the Poisson model this is not a difficulty, because X_m = N_m/v_m = 0 is a valid observation. For the gamma claim sizes, risk classes without an observation are naturally excluded in the MLE.

We summarize the 3 cases considered:
Gaussian case:
Z' Σ^{−1} Z β̂^{MLE} − Z' Σ^{−1} X = 0.

Poisson case:
Z' V exp{Z β̂^{MLE}} − Z' V X = 0.

Gamma case:
Z' V̂ exp{Z β̂^{MLE}} − Z' V̂ X = 0,   with μ̂ = exp{Z β̂^{MLE}}.

7.3.3
In this section we consider variable reduction for the exponential dispersion family under the assumption of choosing the log-link function. In the Gaussian case we have calculated the F statistics given in (7.15). This F statistics was based on the classical (unscaled) Pearson's residuals, which measure the differences between the observations and the (estimated) means, see (7.11). In the general case of the exponential dispersion family it is more appropriate to replace Pearson's residuals by the deviance residuals, which measure the contributions of the residual differences to the log-likelihood. This is what we are going to explain next.
Having observations X = (X_1, ..., X_M)′ with independent components, we determine the MLE β̂^MLE for β ∈ R^{r+1} within the exponential dispersion family with log-link function and design matrix Z ∈ R^{M×(r+1)} as described above. This then provides the estimate for the mean given by, see (7.17) and (7.20),

μ̂ = (μ̂_m)_{m=1,...,M} = exp{Z β̂^MLE}.

We define the inverse function h = (b′)^{−1}, which implies θ̂_m = h(μ̂_m). The log-likelihood function at this estimate μ̂ is then given by

ℓ_X(μ̂) = Σ_m [ (X_m h(μ̂_m) − b(h(μ̂_m)))/(φ/w_m) + c(X_m, φ, w_m) ].

The maximal possible log-likelihood value is obtained in the saturated model, in which every observation X_m is fitted by its own individual mean parameter μ̂_m = X_m; this gives

ℓ_X(X) = Σ_m [ (X_m h(X_m) − b(h(X_m)))/(φ/w_m) + c(X_m, φ, w_m) ].

The (scaled) deviance statistics is then defined by

D*(X, μ̂) = D(X, μ̂)/φ = 2 (ℓ_X(X) − ℓ_X(μ̂)) ≥ 0.

Observe that these deviance statistics play the role of the residual differences SS_err (Pearson's residuals) which were used in the likelihood ratio given in (7.14): the deviance statistics measure the contribution of the residual differences to the log-likelihood.
Similar to Section 7.2.2 we would now like to see whether we can reduce the number of components of the parameter β ∈ R^{r+1}. Assume that under the null hypothesis H0 we can drop p of the r + 1 components of β, and denote the MLE fits of the reduced and the full model by μ̂_{H0} and μ̂_{full}, respectively. In analogy to (7.14)-(7.15) we consider the test statistics

F = [ (D(X, μ̂_{H0}) − D(X, μ̂_{full})) / p ] / [ D(X, μ̂_{full}) / (M − r − 1) ] ≥ 0,    (7.22)

which under H0 is approximately F-distributed with p and M − r − 1 degrees of freedom, and the likelihood ratio test statistics

X² = D*(X, μ̂_{H0}) − D*(X, μ̂_{full}),    (7.23)

which under H0 is approximately χ²-distributed with p degrees of freedom.
The dispersion parameter φ is typically unknown, whereas the means are estimated by μ̂_m (under the assumption that φ_m = φ for all m, so that φ cancels in the MLE). Then we can estimate φ from Pearson's (classical) residuals by

φ̂^P = ( 1/(M − r − 1) ) Σ_m w_m (X_m − b′(θ̂_m))² / b″(θ̂_m),

or from the deviance residuals by

φ̂^D = D(X, μ̂) / (M − r − 1).

We can also calculate φ̂^P and φ̂^D in the Poisson case, where φ = 1; if they are substantially different from 1, then we either have under- or over-dispersion, and a different model should be used.
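In the Poisson case the variance function is b″(θ̂_m) = μ̂_m, so the Pearson dispersion estimate takes a particularly simple form; a minimal sketch with made-up numbers:

```python
# Pearson dispersion estimate phi^P in the Poisson case (b''(theta_m) = mu_m).
# Toy values: M = 2 observations and r + 1 = 1 fitted parameter.
X = [1.0, 2.0]      # observations
mu = [1.5, 1.5]     # fitted means
w = [1.0, 1.0]      # weights
M, r = 2, 0
phi_P = sum(wm * (x - m) ** 2 / m for x, m, wm in zip(X, mu, w)) / (M - r - 1)
# values substantially below 1 indicate under-dispersion,
# values substantially above 1 indicate over-dispersion
```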
Finally, to check the accuracy of the model and of the fit one should also plot the residuals. Again, we have two options. We can either study Pearson's residuals given by

r_{P,m} = ( X_m − b′(θ̂_m) ) / ( b″(θ̂_m)/w_m )^{1/2},

or the deviance residuals given by

r_{D,m} = sgn( X_m − μ̂_m ) · ( D_m(X_m, μ̂_m) )^{1/2},    (7.24)

for m = 1, ..., M, where D_m(X_m, μ̂_m) denotes the contribution of observation m to the deviance statistics. These residuals should not show any structure because the X_m's were assumed to be independent, and the observed residuals should roughly be centered having similar variances. We come back to this in Section 7.3.4, below.
Example 7.13. Assume that X_1, ..., X_M are independent with

X_m ~ EDF(θ_m, φ, w_m, b(θ) = θ²/2).    (7.25)

From Example 7.10 we know that these X_m's have a Gaussian distribution, i.e. their densities are given by

f(x_m; θ_m, φ) = (2πφ/w_m)^{−1/2} exp{ −(x_m − θ_m)² / (2φ/w_m) }.

The scaled deviance statistics is given by, set μ̂_m = b′(θ̂_m) = θ̂_m,

D*(X, μ̂) = (1/φ) Σ_m w_m (X_m − μ̂_m)².    (7.26)

Compare this to the residual difference SS_err of Section 7.2.2, and compare (7.22) and (7.15) for the Gaussian model (7.25).
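The identity (7.26) is easy to check numerically: computing the scaled deviance as twice the log-likelihood difference between the saturated model and the fit reproduces the weighted sum of squares exactly (all numbers below are made up):

```python
import math

phi = 2.0
w = [1.0, 3.0, 2.0]          # weights w_m
X = [1.0, 2.0, 4.0]          # observations
mu = [1.5, 2.5, 3.0]         # fitted means

def loglik(means):
    # Gaussian log-likelihood with Var(X_m) = phi / w_m
    return sum(-0.5 * math.log(2 * math.pi * phi / wt)
               - (x - m) ** 2 / (2 * phi / wt)
               for x, m, wt in zip(X, means, w))

D_star = 2 * (loglik(X) - loglik(mu))                                # scaled deviance
direct = sum(wt * (x - m) ** 2 for x, m, wt in zip(X, mu, w)) / phi  # formula (7.26)
```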
Exercise 22. Calculate the deviance statistics for the Poisson and the gamma
model, see also (3.4) in Ohlsson-Johansson [82].
Version April 14, 2016, M.V. Wüthrich, ETH Zurich
7.3.4 Example: MTPL claims frequencies

We present a claims frequency example from motor third party liability (MTPL) insurance. For the claims counts of the risk classes m = (l1, l2, l3, l4), characterized by four tariff criteria, we assume

N_m = N_{l1,l2,l3,l4} ~ Poi( λ_{l1,l2,l3,l4} v_{l1,l2,l3,l4} ),

with v_m = v_{l1,l2,l3,l4} being the number of policies in risk class m and λ_{l1,l2,l3,l4} the expected claims frequency in the corresponding risk class. We assume independence between different risk classes, and we choose a multiplicative tariff structure for the expected claims frequency, see also (7.1) and (7.17),

λ_m = λ_{l1,l2,l3,l4} = exp{ β_{l1,l2,l3,l4} } = exp{ β_0 + β_{1,l1} + β_{2,l2} + β_{3,l3} + β_{4,l4} },    (7.27)

with intercept β_0 and tariff factors β_{k,lk} for the tariff criteria k = 1, ..., 4. The 4 tariff criteria reflect the weight category of the car, the age of the driver, the kilometers yearly driven and the local region (canton) in Switzerland. We define the relative volume measures for the 4 different tariff criteria as follows:

v^weight_{l1,·} = Σ_{l2,l3,l4} v_{l1,l2,l3,l4} / Σ_m v_m,

and analogously for v^age_{l2,·}, v^km_{l3,·} and v^canton_{l4,·}. Moreover, for all tariff criteria k = 1, ..., 4 we can consider the marginal MLEs. These are given by, see also Estimator 2.32,

λ̂^weight_{l1} = Σ_{l2,l3,l4} N_{l1,l2,l3,l4} / Σ_{l2,l3,l4} v_{l1,l2,l3,l4},

and analogously for the other tariff criteria.
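These marginal MLEs are simple ratios of aggregated counts and volumes; a small sketch with two criteria and made-up numbers:

```python
import numpy as np

# claims counts N and volumes v on a 2 x 3 grid: (criterion 1) x (criterion 2)
N = np.array([[3., 1., 2.],
              [8., 4., 6.]])
v = np.array([[30., 20., 10.],
              [40., 50., 60.]])

# marginal MLEs: aggregate counts and volumes over the other criterion
lam_marg1 = N.sum(axis=1) / v.sum(axis=1)   # per level of criterion 1
lam_marg2 = N.sum(axis=0) / v.sum(axis=0)   # per level of criterion 2
```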
k = 1. The first tariff criterion is the weight category of the car. We have the following 7 risk characteristics for l1 ∈ {1, ..., 7}:

l1   weight in kg   label        v^weight_{l1,·}   λ̂^weight_{l1}
1    1-500          W1-500       <1%               15.4%
2    501-1000       W501-1000    8%                7.1%
3    1001-1500      W1001-1500   56%               6.7%
4    1501-2000      W1501-2000   30%               7.3%
5    2001-2500      W2001-2500   4%                11.0%
6    2501-3000      W2501-3000   1%                13.3%
7    3001-3500      W3001-3500   1%                21.4%
k = 2. The second tariff criterion is the age of the driver. We have the following 8 risk characteristics for l2 ∈ {1, ..., 8}:

l2   age     label     v^age_{l2,·}   λ̂^age_{l2}
1    18-20   Y18-20    6%             19.8%
2    21-25   Y21-25    5%             8.8%
3    26-30   Y26-30    6%             7.7%
4    31-40   Y31-40    17%            6.6%
5    41-50   Y41-50    22%            6.2%
6    51-60   Y51-60    20%            5.8%
7    61-70   Y61-70    14%            5.4%
8    71-99   Y71-99    10%            6.7%

k = 3. The third tariff criterion is the kilometers yearly driven (in 1'000 km). We have the following 7 risk characteristics for l3 ∈ {1, ..., 7}:

l3   in 1'000 km   label    v^km_{l3,·}   λ̂^km_{l3}
1    1-5           K1-5     1%            7.4%
2    6-10          K6-10    52%           6.6%
3    11-15         K11-15   30%           7.3%
4    16-20         K16-20   14%           8.2%
5    21-25         K21-25   1%            12.3%
6    26-30         K26-30   1%            12.4%
7    31-99         K31-99   1%            13.1%

k = 4. The fourth tariff criterion is the Swiss canton the car is registered in (according to its license plate). There are 26 different cantons in Switzerland, which implies l4 ∈ {1, ..., 26}.

Figure 7.3: Fourth tariff criterion: cantons of Switzerland the car is registered in, i.e. l4 ∈ {AG, AI, ..., ZH}.
k = 1. We observe that the light weight category W1-500 and the heavy weight categories W2001-2500, W2501-3000 and W3001-3500 have much higher claims frequencies than the middle weight classes, see Figure 7.4 (lhs). The straight horizontal line is the overall sample claims frequency. Figure 7.4 (lhs) also indicates that the light and heavy weight categories have much less volume than the middle weight categories; this is also reflected in the values v^weight_{l1,·}, l1 = 1, ..., 7.
Figure 7.4: Marginal MLEs (lhs) for the different weight categories λ̂^weight_{l1} and (rhs) for the different age categories λ̂^age_{l2}.

Figure 7.5: Marginal MLEs (lhs) for the different kilometers yearly driven categories λ̂^km_{l3} and (rhs) for the different cantons λ̂^canton_{l4}.
k = 2. From the marginal claims frequencies for the different age classes, mainly the young drivers are conspicuous, see Figure 7.4 (rhs). The average claims frequency of the drivers between 18 and 20 is more than twice as large as the average claims frequency of all other drivers.

k = 3. Figure 7.5 (lhs) shows that frequent long-distance drivers have a much higher claims frequency than other drivers. But frequent long-distance drivers make up only a small proportion of the total MTPL portfolio, see also the values v^km_{l3,·}.

Figure 7.6: (lhs) Tukey-Anscombe plot which shows the fitted means Ê[N_m] versus the deviance residuals r_{D,m} for m = 1, ..., M; (rhs) QQ plot of the deviance residuals r_{D,m} versus the theoretical (estimated) quantiles q̂_m for m = 1, ..., M.
Observe that in this example we have 7 · 8 · 7 · 26 = 10'192 (potential) risk classes. However, only in M = 6'146 risk classes do we have a positive volume v_m > 0 (in the other risk classes we have not sold any policies). Introducing the multiplicative tariff structure (7.27) with K = 4 tariff criteria reduces the complexity to r + 1 = 7 + 8 + 7 + 26 − 3 = 45 parameters. We apply the GLM estimation method for Poisson claims counts, i.e. we evaluate (7.18) using Proposition 7.11. This is done with the R command

> d.glm <- glm(counts ~ W1-500 + W501-1000 + ...,
               data=input, offset=log(volumes), family=poisson())

where input contains the counts N_m, the volumes v_m as well as the corresponding design matrix Z ∈ {0, 1}^{M×(r+1)} which consists of binary variables only. The summary of the results, similar to Figure 7.2, is obtained by the R command

> summary(d.glm)
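For readers without R, the same type of Poisson fit with a log-exposure offset can be sketched directly via Fisher scoring (Proposition 7.11); the design matrix and data below are made up and much smaller than the MTPL example:

```python
import numpy as np

def poisson_glm_fit(Z, N, v, n_iter=50):
    """Poisson GLM with log-link and offset log(v) (cf. the R call above)."""
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        mu = v * np.exp(Z @ beta)          # fitted counts v_m * lambda_m
        score = Z.T @ (N - mu)             # Poisson score
        info = Z.T @ (Z * mu[:, None])     # Fisher information
        beta += np.linalg.solve(info, score)
    return beta

Z = np.array([[1., 0.], [1., 0.], [1., 1.], [1., 1.]])  # binary design matrix
v = np.array([100., 50., 80., 70.])   # volumes (numbers of policies)
N = np.array([5., 2., 12., 9.])       # observed claims counts
beta = poisson_glm_fit(Z, N, v)
mu = v * np.exp(Z @ beta)             # fitted means E[N_m]
```

At the MLE the marginal totals balance, Z′N = Z′μ̂, which is the defining property of the Poisson score equation.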
This R command provides the MLE β̂^MLE ∈ R^{r+1} with corresponding standard errors and p-values, the deviance statistics D(X, μ̂) = 3'761.8 on M − r − 1 = 6'146 − 45 = 6'101 degrees of freedom and the AIC value of 13'519. Furthermore, it provides the so-called null deviance which corresponds to a model that only has an intercept β_0. This null deviance corresponds to the total difference SS_tot in the Gaussian model. In our example the null deviance is 8'025.5 on 6'146 − 1 = 6'145 degrees of freedom. Moreover, R provides the fitted means

Ê[N_m] = v_m λ̂^MLE_m = v_m exp{ (Z β̂^MLE)_m }.
Next we determine the deviance residuals r_{D,m}, see (7.24). In the Poisson case they take a rather simple form:

r_{D,m} = sgn( N_m − Ê[N_m] ) · ( 2 N_m [ Ê[N_m]/N_m − 1 + log( N_m/Ê[N_m] ) ] )^{1/2};

for N_m = 0 the deviance residual reduces to r_{D,m} = −(2 Ê[N_m])^{1/2}. These deviance residuals and the corresponding theoretical quantiles q̂_m can be obtained directly in R.
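A small numpy sketch of these Poisson deviance residuals (hypothetical counts and fitted means); by construction the squared residuals add up to the deviance statistics:

```python
import numpy as np

def poisson_deviance_residuals(N, mu):
    """Deviance residuals r_{D,m} for Poisson counts (phi = 1)."""
    r = np.empty_like(mu)
    pos = N > 0
    r[pos] = np.sign(N[pos] - mu[pos]) * np.sqrt(
        2 * N[pos] * (mu[pos] / N[pos] - 1 + np.log(N[pos] / mu[pos])))
    r[~pos] = -np.sqrt(2 * mu[~pos])     # limiting case N_m = 0
    return r

N = np.array([0., 2., 3., 6.])        # observed counts (made up)
mu = np.array([0.8, 1.5, 3.2, 4.9])   # fitted means E[N_m] (made up)
r_D = poisson_deviance_residuals(N, mu)

# deviance computed directly (with the convention 0 * log 0 = 0)
pos = N > 0
dev = 2 * ((N[pos] * np.log(N[pos] / mu[pos]) - (N[pos] - mu[pos])).sum()
           + mu[~pos].sum())
```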
The resulting estimates of the dispersion parameter are

φ̂^D = D(X, μ̂)/(M − r − 1) = 3'761.8/6'101 = 0.62 and φ̂^P = 0.89.

Both estimates φ̂^D and φ̂^P suggest under-dispersion in the data (a χ²-goodness-of-fit test, see (2.8) for the test statistics, would reject the Poisson assumption on the 1% significance level). However, as long as we are only interested in tariff segmentation for different risk classes we may still use the GLM fit as relative tariff factors, unless we have drastic changes in the portfolio mix. Finally, in Figures 7.7 and 7.8 we provide the fitted tariff factors compared to the marginal MLE estimates.

Figure 7.7: Marginal MLEs and the GLM fitted values: (lhs) for the different weight categories and (rhs) for the different age classes.

Figure 7.8: (lhs) Marginal MLEs and the GLM fitted values for the different kilometers yearly driven categories; (rhs) GLM fitted values for the different cantons.
k = 1. We see that the GLM fit punishes the light weight cars even slightly more, whereas the heavy weight cars are relieved, see Figure 7.7 (lhs). From a practical point of view the former seems a bit unreasonable. It probably has to do with the fact that the volume v^weight_{1,·} in the lowest weight class is very small (this weight class should probably be merged with the next one). The relief for the heavy weight cars might be compensated by the fact that these heavy weight cars are typically driven by frequent long-distance drivers.

k = 2. The marginal estimates for the different age classes are very much in line with the corresponding GLM fits, see Figure 7.7 (rhs).

k = 3. Figure 7.8 (lhs) suggests that we can probably merge the three kilometers yearly driven classes K21-25, K26-30 and K31-99, also due to their small volumes.

k = 4. Figure 7.8 (rhs) shows that we might be able to merge the different cantons into 4 or 5 different tariff regions to reduce the complexity of the tariff structure. This is what we analyze next using the variable reduction technique of Section 7.3.3.

In the last step we apply the variable reduction technique presented in Section 7.3.3. We have performed this for all tariff criteria: the weight category criterion and the age classes cannot be reduced further. This is a bit surprising for the lowest weight class W1-500, because the resulting estimate seems a bit unreasonable and this risk factor has a very low volume v^weight_{1,·}. But the tests clearly reject the null hypothesis of a merger with the next weight class. Therefore, we only present the analysis for the kilometers yearly driven tariff criterion and for the canton tariff criterion.
Null hypothesis H0: The three kilometers yearly driven classes K21-25, K26-30 and K31-99 are merged, i.e. β_{3,5} = β_{3,6} = β_{3,7}.

We calculate the test statistics F given in (7.22), the test statistics X² given in (7.23) and the AIC. These values are given in Table 7.3.

                        full model   under H0   test statistics   p-value
AIC                     13'519       13'516
deviance statistics     3'761.8      3'762.0
test statistics F                               0.23              79%
test statistics X²                              0.28              87%

Table 7.3: Parameter reduction analysis for the tariff criterion kilometers yearly driven.

The AIC supports the model with merged classes K21-25, K26-30 and K31-99, and both the F test statistics and the X² test statistics do not reject the null hypothesis H0 on the 5% significance level. Therefore, we consider a merger of these risk classes to one risk factor according to null hypothesis H0.
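The p-values in Table 7.3 can be cross-checked without any statistics package: for p = 2 dropped parameters the tail probabilities of χ²₂ and of F(2, n) have exact closed forms (this is a numerical check of the table values, not part of the original analysis):

```python
import math

p, n = 2, 6101            # dropped parameters and M - r - 1
X2, F = 0.28, 0.23        # test statistics from Table 7.3

p_chi2 = math.exp(-X2 / 2)            # exact tail probability of chi^2 with 2 dof
p_F = (1 + p * F / n) ** (-n / 2)     # exact tail probability of F(2, n)
# p_chi2 is about 0.87 and p_F about 0.79, matching the 87% and 79% in Table 7.3
```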
Null hypothesis H0:
(i) The three kilometers yearly driven classes K21-25, K26-30 and K31-99 are merged, i.e. β_{3,5} = β_{3,6} = β_{3,7};
(ii) the following cantons are merged:
(a) β_{4,AG} = β_{4,BE} = β_{4,LU},
(b) β_{4,AI} = β_{4,AR},
(c) β_{4,GR} = β_{4,SG},
(d) β_{4,GL} = β_{4,NW} = β_{4,OW} = β_{4,SZ} = β_{4,UR} = β_{4,ZG},
(e) β_{4,FR} = β_{4,GE} = β_{4,JU} = β_{4,NE} = β_{4,TI} = β_{4,VD} = β_{4,VS}.

We again calculate the test statistics F given in (7.22), the test statistics X² given in (7.23) and the AIC. The results are presented in Table 7.4.

                        full model   under H0   test statistics   p-value
AIC                     13'519       13'516
deviance statistics     3'761.8      3'792.4
test statistics F                               2.92              <1%
test statistics X²                              30.6              2.2%

Table 7.4: Parameter reduction analysis for the tariff criteria kilometers yearly driven and for the cantons according to null hypothesis H0.
The AIC supports the reduced model of null hypothesis H0; the F test statistics rejects H0 on the 1% significance level, whereas the X² test statistics does not reject the null hypothesis H0 on the 1% significance level. Thus, if we want to reduce the complexity of the tariff structure, we could choose the reduced model of null hypothesis H0, see Figure 7.9 for the resulting regional tariff factors.

In the end this tariff decision is a strategic business decision (which is supported by statistical analysis). This business decision will also depend on the tariff structure applied in the previous year: in these considerations, in particular when introducing a new tariff structure, one should always keep in mind that the individual premia on single policies should not change too much from one year to the next. Otherwise loyal customers will be very upset about the new pricing policy of the insurance company, and they will think that the company's business is not under control. Therefore, transitions should always be done as smoothly as possible. Another reason for such business decisions is that the prices should (hopefully) be competitive in many segments. Therefore, it is also important that these business decisions take into account what competitors are doing.

Figure 7.9: Tariff factors for cantons: (lhs) full model; (rhs) reduced model of null hypothesis H0.
Chapter 8

Bayesian Models and Credibility Theory

In the previous chapter we have performed tariffication using GLMs. This was done by splitting the total portfolio into different homogeneous risk classes (i, j). The volume measures in these risk classes (i, j) were given by v_{i,j} in Section 7.3.1 (Poisson case) and by n_{i,j} in Section 7.3.2 (gamma case), respectively. There may occur the situation where a risk class (i, j) has only a small volume v_{i,j} and n_{i,j}, respectively, i.e. only a few policies or claims fall into that risk class. In that case the observations N_{i,j} and S_{i,j} may not be very informative, and single outliers may disturb the whole picture, see Figure 7.7 (lhs). Credibility theory aims at dealing with such situations in that it specifies a tariff of the following type:

λ̂_{i,j} = α_{i,j} · S_{i,j}/v_{i,j} + (1 − α_{i,j}) λ̄,

i.e. the tariff λ̂_{i,j} for the next accounting year is calculated as a credibility weighted average between the individual past observation S_{i,j}/v_{i,j} and the overall average λ̄, with credibility weight α_{i,j} ∈ [0, 1]. For α_{i,j} = 1 we fully rely on the individual past observation S_{i,j}/v_{i,j}, for α_{i,j} = 0 we only rely on the overall average λ̄. Credibility theory makes this approach rigorous and specifies the credibility weights α_{i,j}.

Credibility theory belongs to the field of Bayesian statistics:
- There are exact Bayesian methods which allow for analytical solutions.
- There are simulation methods, such as the Markov chain Monte Carlo (MCMC) method, which allow for numerical solutions of Bayesian models.
- There are approximations, such as linear credibility methods, which give optimal solutions in sub-spaces of possible solutions.
Central to these methods is the Bayes rule.
8.1 Exact Bayesian models

Figure 8.1: (lhs) Grave of the Bayes family at Bunhill Fields Burial Ground, London, UK; (rhs) historical review of McGrayne [76].

We consider an observation X whose distribution depends on an (unknown) parameter θ. We specify a prior distribution/density π(θ) for this parameter; we explain below how this prior distribution is specified. The joint density of X and θ is then given by f(x|θ)π(θ), and the Bayes rule provides the posterior density of θ, given the observation x,

π(θ|x) = f(x|θ)π(θ) / ∫ f(x|θ̃)π(θ̃) dθ̃.

This means that we start with a prior distribution π(θ). This prior distribution either expresses expert knowledge or is determined from a portfolio of similar business. Having observed x, we modify the prior belief π(θ) to obtain the posterior distribution π(θ|x) that reflects both the prior knowledge about θ and the experience x. That is, the prior belief π(θ) is improved by the arriving observation x. The general idea then is to update our (prior) knowledge about θ whenever a new observation arrives. These updates constantly improve our estimation of the unknown model parameter θ.

8.1.1 Poisson-gamma model
Definition 8.1 (Poisson-gamma model). Assume fixed volumes v_t > 0 are given for t ∈ N.
- Conditionally, given Λ, the components of N = (N_1, ..., N_T) are independent with N_t ~ Poi(Λ v_t).
- Λ ~ Γ(γ, c) with prior parameters γ > 0 and c > 0.

From the Bayes rule we obtain the posterior density of Λ, conditionally given N = (N_1, ..., N_T),

π(λ|N) ∝ Π_{t=1}^T [ (λ v_t)^{N_t}/N_t! · e^{−λ v_t} ] · c^γ/Γ(γ) · λ^{γ−1} e^{−cλ} ∝ λ^{γ + Σ_t N_t − 1} e^{−(c + Σ_t v_t) λ},

that is, the posterior distribution is again a gamma distribution,

Λ|{N} ~ Γ( γ + Σ_{t=1}^T N_t , c + Σ_{t=1}^T v_t ).
Remarks 8.3.
- The gamma distribution is a conjugate prior to the Poisson distribution: prior and posterior belong to the same family of distributions, only the parameters are updated according to

γ ↦ γ^post_T = γ + Σ_{t=1}^T N_t and c ↦ c^post_T = c + Σ_{t=1}^T v_t.

- Often γ and c are called prior parameters, and γ^post_T and c^post_T posterior parameters (at time T).
- Note that this update has a recursive structure:

γ^post_T = γ^post_{T−1} + N_T and c^post_T = c^post_{T−1} + v_T.
For the estimation of the unknown parameter λ we obtain the following prior and posterior estimators:

λ̂_0 = E[Λ] = γ/c,

λ̂^post_T = E[Λ|N] = γ^post_T / c^post_T = ( γ + Σ_{t=1}^T N_t ) / ( c + Σ_{t=1}^T v_t ).

Corollary 8.4. The posterior estimator has the credibility form

λ̂^post_T = α_T λ̂_T + (1 − α_T) λ̂_0,

with credibility weight α_T and observation based estimator λ̂_T given by

α_T = Σ_{t=1}^T v_t / ( c + Σ_{t=1}^T v_t ) ∈ (0, 1) and λ̂_T = ( Σ_{t=1}^T v_t )^{−1} Σ_{t=1}^T N_t.

Moreover, the conditional estimation uncertainty is given by

E[ (λ̂^post_T − Λ)² | N ] = Var(Λ|N) = γ^post_T / (c^post_T)² = (1 − α_T) λ̂^post_T · (1/c).

Proof. We have

λ̂^post_T = ( γ + Σ_t N_t ) / ( c + Σ_t v_t )
= [ Σ_t v_t / (c + Σ_t v_t) ] · ( Σ_t N_t / Σ_t v_t ) + [ c / (c + Σ_t v_t) ] · (γ/c)
= α_T λ̂_T + (1 − α_T) λ̂_0.

This proves the first claim. For the estimation uncertainty we have

E[ (λ̂^post_T − Λ)² | N ] = Var(Λ|N) = γ^post_T / (c^post_T)² = (1 − α_T) λ̂^post_T · (1/c).

This proves the claim. □
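The credibility form just proved, as well as the recursive update derived below in Corollary 8.6, can be verified numerically (prior parameters and observations below are made up):

```python
# Poisson-gamma posterior estimator: batch form vs. credibility form
# vs. recursive update (Corollaries 8.4 and 8.6).
gamma0, c0 = 2.0, 20.0                 # prior parameters
N = [3, 1, 4, 2]                       # observed claims counts N_t
v = [10.0, 12.0, 11.0, 13.0]           # volumes v_t

lam0 = gamma0 / c0                              # prior estimator
lam_post = (gamma0 + sum(N)) / (c0 + sum(v))    # posterior estimator

# credibility form (Corollary 8.4)
alpha = sum(v) / (c0 + sum(v))
lam_hat = sum(N) / sum(v)
cred = alpha * lam_hat + (1 - alpha) * lam0

# recursive form (Corollary 8.6)
lam_rec, csum = lam0, c0
for Nt, vt in zip(N, v):
    beta = vt / (csum + vt)
    lam_rec = beta * (Nt / vt) + (1 - beta) * lam_rec
    csum += vt
```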
Remarks 8.5.
- Corollary 8.4 shows that the posterior estimator λ̂^post_T is a credibility weighted average between the prior guess λ̂_0 and the purely observation based estimator λ̂_T, with credibility weight α_T ∈ (0, 1).
- The prior uncertainty is given by

Var(Λ) = γ/c² = λ̂_0 · (1/c).

For c large we have an informative prior distribution, for c small we have a vague prior distribution, and for c = 0 we have a non-informative or improper prior distribution. The latter means that we have no prior parameter knowledge (this has to be understood in an asymptotic sense).
- The observation based estimator satisfies, see Estimators 2.27 and 2.32,

λ̂^MV_T = λ̂^MLE_T = λ̂_T.

Corollary 8.6. The posterior estimator λ̂^post_T has the recursive update structure

λ̂^post_T = β_T · (N_T/v_T) + (1 − β_T) λ̂^post_{T−1},

with credibility weight

β_T = v_T / ( c + Σ_{t=1}^T v_t ) ∈ (0, 1).

Proof. We have

λ̂^post_T = ( γ + Σ_{t=1}^T N_t ) / ( c + Σ_{t=1}^T v_t )
= N_T / ( c + Σ_{t=1}^T v_t ) + [ ( c + Σ_{t=1}^{T−1} v_t ) / ( c + Σ_{t=1}^T v_t ) ] · ( γ + Σ_{t=1}^{T−1} N_t ) / ( c + Σ_{t=1}^{T−1} v_t )
= β_T · (N_T/v_T) + (1 − β_T) λ̂^post_{T−1}. □

Expanding the recursion once more gives

λ̂^post_T = β_T (N_T/v_T) + (1 − β_T) α_{T−1} λ̂_{T−1} + (1 − β_T)(1 − α_{T−1}) λ̂_0,

which shows how the prior guess λ̂_0 is successively replaced by the arriving observations.

8.1.2 Exponential dispersion family with conjugate priors
Next we study a larger class of distribution functions for which we can explicitly solve the pricing problem in a Bayesian context. The crucial property of the Poisson-gamma model is that the prior and the posterior distributions belong to the same family of parametric distributions; only the parameters change from prior parameters to posterior parameters. There are many examples of this type. The best known examples belong to the exponential dispersion family with conjugate priors. We have already met the exponential dispersion family in Definition 7.7: X ~ EDF(θ, φ, w, b(·)) has (generalized) density

f_X(x; θ, φ) = exp{ (xθ − b(θ))/(φ/w) + c(x, φ, w) },

for an (unknown) parameter θ in the open set Θ. In the Bayesian case we model this parameter by a random variable θ with a prior distribution on Θ, and we then try to determine the posterior distribution after having collected (independent) observations X_1, ..., X_T that belong to this EDF(θ, φ, w, b(·)).

Model Assumptions 8.7 (EDF with conjugate prior).
- The prior density of θ on Θ is given by

π_{x_0,τ}(θ) = exp{ (x_0 θ − b(θ))/τ² + d(x_0, τ) },

with fixed prior parameters x_0 ∈ I and τ ∈ (0, c̄), where d(·,·) describes the normalization. Here, I ⊆ R denotes the possible choices of x_0 such that π_{x_0,τ} is a well-defined density on Θ for all τ ∈ (0, c̄), for a fixed given constant c̄ > 0.
- Conditionally, given θ, the components of X = (X_1, ..., X_T) are independent with X_t ~ EDF(θ, φ, w_t, b(·)), having well-defined densities with supports not depending on θ.
Theorem 8.8. We make Model Assumptions 8.7 and assume that the domain I of possible prior choices x_0 is an open interval which contains the range of X_t for all θ ∈ Θ and t = 1, ..., T. The posterior distribution of θ, given X, is given by the density π_{x̂^post_T, τ^post}(θ) with

τ^post = [ Σ_{t=1}^T w_t/φ + 1/τ² ]^{−1/2} < τ, with τ^post ∈ (0, c̄),

and

x̂^post_T = α_T x̂^MV_T + (1 − α_T) x_0 ∈ I,

with credibility weight α_T and minimum variance estimator x̂^MV_T given by

α_T = Σ_{t=1}^T w_t / ( Σ_{t=1}^T w_t + φ/τ² ) and x̂^MV_T = ( Σ_{t=1}^T w_t )^{−1} Σ_{t=1}^T w_t X_t,

where for the minimum variance statement we additionally assume that the second moments of X_t|{θ} exist and that the cumulant function b ∈ C² in the interior of Θ.
Proof. The Bayes rule gives for the posterior distribution of θ, conditionally given X,

π(θ|X) ∝ Π_{t=1}^T f_{X_t}(X_t; θ, φ) · π_{x_0,τ}(θ) ∝ Π_{t=1}^T exp{ (X_t θ − b(θ))/(φ/w_t) } · exp{ (x_0 θ − b(θ))/τ² }
= exp{ [ Σ_{t=1}^T X_t w_t/φ + x_0/τ² ] θ − [ Σ_{t=1}^T w_t/φ + 1/τ² ] b(θ) }
= exp{ ( x̂^post_T θ − b(θ) ) / (τ^post)² },

with (τ^post)^{−2} = Σ_t w_t/φ + 1/τ² and

x̂^post_T = (τ^post)² [ Σ_t X_t w_t/φ + x_0/τ² ] = α_T x̂^MV_T + (1 − α_T) x_0 ∈ I.

The latter holds true because I is (by assumption) an open interval that contains x_0 and the range of all possible outcomes X_t for all θ ∈ Θ and t = 1, ..., T. Therefore, we obtain the posterior density π_{x̂^post_T, τ^post}, which is a well-defined density on Θ by assumption. There remains the proof of the minimum variance statement. For fixed parameter θ we know that X = (X_1, ..., X_T) are independent with X_t ~ EDF(θ, φ, w_t, b(·)). Corollary 7.9 (or its generalization) implies

E[X_t|θ] = b′(θ) and Var(X_t|θ) = (φ/w_t) · b″(θ).    (8.1)

Note that the conditional mean b′(θ) does not depend on t; therefore the statement for the minimum variance estimator follows from Lemma 2.26. This closes the proof. □

Theorem 8.9. Under the assumptions of Theorem 8.8, and assuming in addition that the prior density π_{x_0,τ} vanishes on the boundary of Θ, we have

E[b′(θ)] = x_0 and E[b′(θ)|X] = x̂^post_T = α_T x̂^MV_T + (1 − α_T) x_0.    (8.2)

Proof. In view of Theorem 8.8 it suffices to prove the first statement for all x_0 ∈ I and τ ∈ (0, c̄). We have

E[b′(θ)] = ∫_Θ b′(θ) exp{ (x_0 θ − b(θ))/τ² + d(x_0, τ) } dθ
= x_0 − τ² exp{d(x_0, τ)} ∫_Θ (d/dθ) exp{ (x_0 θ − b(θ))/τ² } dθ
= x_0 − τ² exp{d(x_0, τ)} [ exp{ (x_0 θ − b(θ))/τ² } ]_{∂Θ} = x_0,

because the prior density vanishes on the boundary ∂Θ of Θ. □

Since, given θ, a further observation X_{T+1} has conditional mean b′(θ), the experience premium is

E[X_{T+1}|X] = E[b′(θ)|X] = x̂^post_T = α_T x̂^MV_T + (1 − α_T) x_0.

Thus, we get a credibility weighted average for the premium of X_{T+1} which is based on the prior knowledge x_0 and on the past experience X_1, ..., X_T. Similar to Corollary 8.6 we obtain a recursive update structure for this experience premium, which allows one to express the premium more and more accurately as time passes (under the above stationarity assumptions, of course).
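In the Gaussian case (b(θ) = θ²/2, so b′(θ) = θ) the posterior mean is available in closed form, and the credibility representation of Theorem 8.9 can be checked directly (all numbers below are made up):

```python
# Normal-normal model: prior theta ~ N(x0, tau^2), X_t | theta ~ N(theta, phi/w_t).
x0, tau2, phi = 0.5, 0.04, 2.0
X = [0.8, 0.6, 1.1]
w = [1.0, 2.0, 4.0]

# exact posterior mean of theta (precision-weighted average)
precision = sum(wt / phi for wt in w) + 1 / tau2
post_mean = (sum(wt * xt / phi for wt, xt in zip(w, X)) + x0 / tau2) / precision

# credibility representation of Theorem 8.9
alpha = sum(w) / (sum(w) + phi / tau2)
x_MV = sum(wt * xt for wt, xt in zip(w, X)) / sum(w)
cred = alpha * x_MV + (1 - alpha) * x0
```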
Remarks 8.11.
- Examples that belong to the exponential dispersion family with conjugate priors are: the Poisson-gamma model, the gamma-gamma model and the (log-)normal-normal model. For detailed information we refer to Chapter 2 in Bühlmann-Gisler [24].
- Theorem 8.8 gives an additional way of parameter estimation within the exponential dispersion family. In contrast to the MLEs and the minimum variance estimators, this Bayesian way also allows one to include prior information, which may come from experts or from similar business. Moreover, parameter uncertainty is quantified by the posterior distribution.
Example 8.12 (gamma-gamma model). We close this section with the example of the gamma-gamma model. We recall Example 7.10. Choose fixed volumes w_t > 0, t = 1, ..., T, and dispersion parameter φ = 1/γ > 0. Assume that, conditionally given Λ > 0, the observations X_1, ..., X_T are independent gamma distributed with densities

f_{X_t}(x; Λ, φ) = ( (w_t Λ/φ)^{w_t/φ} / Γ(w_t/φ) ) x^{w_t/φ − 1} exp{ −(w_t Λ/φ) x } for x ∈ R₊.

This is the form used in (7.19) with scale parameter c = Λ/φ > 0. Observe that the range of the random variables X_t is R₊ and that we obtain well-defined gamma densities on R₊ for all Λ ∈ R₊ and all t = 1, ..., T. This motivates the choice of the open set Θ̃ = R₊ for the possible parameter choices Λ.

Thus, we need to show two things: (i) the density f_{X_t}(x; Λ, φ) belongs to the exponential dispersion family for a particular cumulant function b: Θ → R; (ii) this will allow us to define the conjugate prior density π_{x_0,τ}, for which we would like to show that we can apply Theorem 8.9.

Item (i) was already done in Example 7.10; however, we do it once more because the signs need a careful treatment. We rewrite

f_{X_t}(x; Λ, φ) = exp{ ( x(−Λ) − (−log(Λ)) ) / (φ/w_t) } exp{ c(x, φ, w_t) }.

The last formula seems to be a waste of minus signs, but with the definitions θ = −Λ and b(θ) = −log(−θ) for θ < 0 we see that the gamma density belongs to the exponential dispersion family, that is, by a slight abuse of notation in f_{X_t},

f_{X_t}(x; θ, φ) = exp{ (xθ − b(θ))/(φ/w_t) } exp{ c(x, φ, w_t) },

with θ ∈ Θ = R₋. In particular, we have

E[X_t | θ] = b′(θ) = −1/θ = 1/Λ ∈ R₊.

This completes task (i).

For task (ii), the conjugate prior density takes the form

π_{x_0,τ}(θ) = exp{ (x_0 θ − b(θ))/τ² + d(x_0, τ) } ∝ (−θ)^{(1/τ² + 1) − 1} exp{ −(x_0/τ²)(−θ) }.

This is a gamma density, set Λ = −θ, with shape parameter 1 + 1/τ² > 0 and scale parameter x_0/τ². This implies that we should choose I = R₊ and x_0, τ > 0. In view of Theorem 8.8 the assumptions are fulfilled because I is an open interval containing all possible observations X_t, and thus Theorem 8.8 can be applied.

Next we observe that this density disappears on the boundary of Θ̃ = R₊ given by the set {∞} ∪ {0}. Therefore, we have from Theorem 8.9 (we also perform the whole calculation to back-test the result)

x_0 = E[b′(θ)] = E[1/Λ]
= ∫_{R₊} (1/Λ) ( (x_0/τ²)^{1+1/τ²} / Γ(1 + 1/τ²) ) Λ^{1/τ²} exp{ −(x_0/τ²) Λ } dΛ
= ( (x_0/τ²)^{1+1/τ²} Γ(1/τ²) / ( Γ(1 + 1/τ²) (x_0/τ²)^{1/τ²} ) ) ∫_{R₊} ( (x_0/τ²)^{1/τ²} / Γ(1/τ²) ) Λ^{1/τ² − 1} exp{ −(x_0/τ²) Λ } dΛ = x_0.

For the posterior estimator we obtain from Theorem 8.9

E[ 1/Λ | X ] = x̂^post_T = α_T x̂^MV_T + (1 − α_T) x_0,

with credibility weight

α_T = Σ_{t=1}^T w_t / ( Σ_{t=1}^T w_t + φ/τ² ).
In this section we have considered examples for which we can explicitly calculate the posterior distribution. The next section gives approximations for the cases where this is not possible.

8.2 Linear credibility theory

In general, the posterior is only known up to proportionality, π(θ|x) ∝ f(x|θ)π(θ), and its moments cannot be computed in closed form. Linear credibility theory therefore approximates the (Bayesian) posterior mean by the best estimator that is linear in the observations.

8.2.1 Bühlmann-Straub model
Model 8.13 (Bühlmann-Straub (BS) model [25]). Assume we have I risk classes and T random variables per risk class. Assume fixed volumes w_{i,t} > 0, i = 1, ..., I and t = 1, ..., T, are given.
- Conditionally, given Θ_i, the components of X_i = (X_{i,1}, ..., X_{i,T}) are independent with the first two conditional moments given by

E[X_{i,t} | Θ_i] = μ(Θ_i) and Var(X_{i,t} | Θ_i) = σ²(Θ_i)/w_{i,t}.

- The pairs (Θ_1, X_1), ..., (Θ_I, X_I) are independent, and Θ_1, ..., Θ_I are i.i.d.

Remarks 8.14.
- If we set I = 1, i.e. we only have one risk class, then an explicit example of the BS Model 8.13 is given by the exponential dispersion family with conjugate priors, Model Assumptions 8.7. The conditional mean and variance are then modeled by, see (8.1),

μ(θ) = b′(θ) and σ²(θ) = φ · b″(θ).

- We define the structural parameters

μ_0 = E[μ(Θ_1)] (collective mean),    (8.3)
σ² = E[σ²(Θ_1)] (average variance within risk classes),    (8.4)
τ² = Var(μ(Θ_1)) (variance between risk classes).    (8.5)

8.2.2 Credibility estimators
The Bayesian estimator for the (unknown) mean μ(Θ_i) of risk class i is given by

μ̂(Θ_i) = E[ μ(Θ_i) | X_1, ..., X_I ].    (8.6)

In the exponential dispersion family with conjugate priors this posterior mean can be calculated explicitly, see Theorem 8.9. In most other situations, however, this is not the case. Therefore, we approximate this posterior mean. We briefly describe how this approximation is done. Assume that all considered random variables are square integrable; thus, we work on the Hilbert space L²(Ω, F, P) of square integrable random variables, where the inner product is given by

⟨X, Y⟩ = E[XY] for X, Y ∈ L²(Ω, F, P).

In this Hilbert space the random vectors X_1, ..., X_I generate the subspace G(X) of all square integrable, (X_1, ..., X_I)-measurable random variables. The posterior mean μ̂(Θ_i) given by (8.6) is the element of the subspace G(X) that minimizes the L²-distance between this subspace G(X) and μ(Θ_i); in the Hilbert space, this estimator μ̂(Θ_i) corresponds to the orthogonal projection of μ(Θ_i) onto G(X). In general, this minimization and orthogonal projection onto G(X), respectively, has a too complicated form. To reduce this complexity we restrict the orthogonal projection to simpler subsets L of G(X). This provides approximations to μ̂(Θ_i) ∈ G(X) in the more restricted subsets L ⊆ G(X). We define the following two subsets:

L(X, 1) = { μ̂ = a_0 + Σ_{i,t} a_{i,t} X_{i,t} ; a_0, a_{i,t} ∈ R } ⊆ G(X),

L_0(X) = { μ̂ = Σ_{i,t} a_{i,t} X_{i,t} ; a_{i,t} ∈ R for all i, t, and E[μ̂] = μ_0 } ⊆ G(X).

The first subset L(X, 1) includes the constants, which will imply unbiasedness of the estimators, whereas in the second case L_0(X) we need to enforce unbiasedness by a side constraint.
Definition 8.15 (inhomogeneous and homogeneous credibility estimator). We assume that the BS Model 8.13 is fulfilled with collective mean μ_0 ∈ R.

The inhomogeneous (linear) credibility estimator of μ(Θ_i) based on X_1, ..., X_I is defined by

μ̂̂(Θ_i) = argmin_{μ̂ ∈ L(X,1)} E[ (μ̂ − μ(Θ_i))² ].

The homogeneous (linear) credibility estimator of μ(Θ_i) based on X_1, ..., X_I is defined by

μ̂̂^hom(Θ_i) = argmin_{μ̂ ∈ L_0(X)} E[ (μ̂ − μ(Θ_i))² ].
Remark 8.16. The inhomogeneous credibility estimator μ̂̂(Θ_i) is the best approximation to μ(Θ_i) (in the L²-sense) among all linear estimators given by L(X, 1). Because L(X, 1) is a subset of G(X), we immediately obtain for the mean square error, with the Pythagorean theorem for successive orthogonal projections,

E[ (μ̂̂(Θ_i) − μ(Θ_i))² ] = E[ (μ̂̂(Θ_i) − μ̂(Θ_i))² ] + E[ (μ̂(Θ_i) − μ(Θ_i))² ].    (8.7)

Theorem 8.17 (Bühlmann-Straub formula). Under Model 8.13, the inhomogeneous credibility estimator of μ(Θ_i) is given by

μ̂̂(Θ_i) = α_{i,T} X̄_{i,1:T} + (1 − α_{i,T}) μ_0,

with credibility weight α_{i,T} and observation based estimator X̄_{i,1:T} given by

α_{i,T} = Σ_{t=1}^T w_{i,t} / ( Σ_{t=1}^T w_{i,t} + σ²/τ² ) and X̄_{i,1:T} = ( Σ_{t=1}^T w_{i,t} )^{−1} Σ_{t=1}^T w_{i,t} X_{i,t}.

The homogeneous credibility estimator of μ(Θ_i) is given by

μ̂̂^hom(Θ_i) = α_{i,T} X̄_{i,1:T} + (1 − α_{i,T}) μ̂_T, with estimate μ̂_T = ( Σ_{i=1}^I α_{i,T} )^{−1} Σ_{i=1}^I α_{i,T} X̄_{i,1:T}.
Proof of Theorem 8.17. The theorem can be proved by brute force doing convex optimization
(using the method of Lagrange in the latter case) or we can apply Hilbert space techniques using
projection properties, see Chapters 3 and 4 in Bhlmann-Gisler [24]. We do the brute force
216
2
X
al,t Xl,t (i )
h(a) = E a0 +
l,t
(m
w
l,t
over all possible choices a0 , al,t R. This requires that we calculate all derivatives w.r.t. these
parameters and set them equal to zero.
!
h(a) = 2E a0 +
al,t Xl,t (i ) = 0,
(8.8)
a0
l,t
!
h(a) = 2E Xj,s a0 +
al,t Xl,t (i ) = 0.
(8.9)
aj,s
Equation (8.8) immediately implies unbiasedness of the inhomogeneous credibility estimator, and
moreover that
X
al,t .
a0 = 0 1
l,t
Plugging this into (8.9) and using (8.8) once more immediately gives for all j, s the requirement
X
!
Cov Xj,s ,
al,t Xl,t (i ) = 0.
tes
l,t
no
Using the uncorrelatedness between different risk classes (which is implied by the independence)
we obtain the following (normal) equations, see Corollary 3.17 and Section 4.3 in Bhlmann-Gisler
[24],
X
a0 = 0 1
al,t ,
(8.10)
l,t
Cov (Xj,s , (i )) =
T
X
for all j, s.
(8.11)
t=1
NL
2
1{t=s} + 2 > 0.
wj,s
This implies that the left-hand side of (8.11) is equal to 0 for j 6= i and because Cov (Xj,s , Xj,t )
2 > 0 it follows that aj,s = 0 for all j 6= i. This is not surprising because we have assumed that
the different risk classes are independent. Therefore (8.10)-(8.11) reduce to
!
T
X
def.
a0 = 0 1
ai,t
= 0 (1 i,T ) ,
(8.12)
t=1
T
X
2
+ 2
ai,t = ai,s
+ 2 i,T
wi,s
w
i,s
t=1
2
= ai,s
for all s.
(8.13)
This defines $\alpha_{i,T} = \sum_{t=1}^T a_{i,t}$ and we still need to see that this credibility weight has the claimed form. Requirement (8.13) then implies for all $s$
\[ a_{i,s} = \frac{\tau^2}{\sigma^2}\, (1 - \alpha_{i,T})\, w_{i,s}. \]
Summation over $s$ provides
\[ \alpha_{i,T} = \sum_{s=1}^T a_{i,s} = \frac{\tau^2}{\sigma^2}\, (1 - \alpha_{i,T}) \sum_{s=1}^T w_{i,s}, \]
and solving this for $\alpha_{i,T}$ gives
\[ \alpha_{i,T} = \frac{\tau^2 \sum_{t=1}^T w_{i,t}}{\sigma^2 + \tau^2 \sum_{t=1}^T w_{i,t}} = \frac{\sum_{t=1}^T w_{i,t}}{\sum_{t=1}^T w_{i,t} + \sigma^2/\tau^2}, \]
as well as
\[ a_{i,s} = \frac{\tau^2\, w_{i,s}}{\sigma^2 + \tau^2 \sum_{t=1}^T w_{i,t}} = \alpha_{i,T}\, \frac{w_{i,s}}{\sum_{t=1}^T w_{i,t}}. \]
If we collect all the terms we have found the following inhomogeneous credibility estimator
\[ \widehat{\widehat{\mu(\Theta_i)}} = \alpha_{i,T}\, \frac{1}{\sum_{t=1}^T w_{i,t}} \sum_{s=1}^T w_{i,s}\, X_{i,s} + (1 - \alpha_{i,T})\, \mu_0 = \alpha_{i,T}\, \bar{X}_{i,1:T} + (1 - \alpha_{i,T})\, \mu_0. \]
This proves the first claim, and an important observation is that this credibility estimator is unbiased for $\mu_0$. Therefore, it coincides with the estimator if we would have projected to
\[ L_0(\boldsymbol{X}, 1) = L(\boldsymbol{X}, 1) \cap \left\{\widehat{\mu} \in L^2(\Omega, \mathcal{F}, P):\; E[\widehat{\mu}] = \mu_0\right\}. \]
The proof of the homogeneous credibility estimator goes along the same lines as the inhomogeneous one, using the method of Lagrange for replacing (8.8) by the side constraint
\[ \mu_0 = E[\widehat{\mu}] = E\left[\sum_{i,t} a_{i,t}\, X_{i,t}\right] = \sum_{i,t} a_{i,t}\, E[X_{i,t}] = \sum_{i,t} a_{i,t}\, \mu_0, \]
which implies $\sum_{i,t} a_{i,t} = 1$. An alternative proof would go by using the iterative property and the linearity of orthogonal projections on subspaces. For details we refer to Section 4.6 in Bühlmann-Gisler [24]. This closes the proof of Theorem 8.17. □
Remarks 8.18 (interpretation of the BS formula of Theorem 8.17). The BS formula provides the best linear approximations to the true premium $\mu(\Theta_i)$ and the Bayesian estimator $\widehat{\mu(\Theta_i)}$ in the $L^2$-sense, see also (8.7).

The inhomogeneous and the homogeneous credibility estimators are somewhat different, which may also lead to different interpretations.

For the inhomogeneous credibility estimator we assume that there is prior knowledge on $\mu(\Theta_i)$ in the form of the prior mean parameter $\mu_0$. This prior knowledge has uncertainty described by the variance parameter $\tau^2$, and the resulting estimator is the classical credibility weighted average between portfolio experience $\bar{X}_{i,1:T}$ and prior knowledge $\mu_0$, which leads to the credibility weights $\alpha_{i,T}$. To calculate this estimator it is sufficient to have one risk class only.
Version April 14, 2016, M.V. Wüthrich, ETH Zurich
The credibility coefficient $\kappa = \sigma^2/\tau^2$ describes the ratio of volatilities within risk classes and between risk classes. This is the crucial ratio that determines the credibility weights
\[ \alpha_{i,T} = \frac{\sum_{t=1}^T w_{i,t}}{\sum_{t=1}^T w_{i,t} + \kappa}. \]
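The credibility weighted averages above translate directly into a small computational sketch. The following functions are our own illustration (names are not from the lecture notes) of the inhomogeneous and homogeneous credibility estimators of Theorem 8.17:

```python
def credibility_weight(w, sigma2, tau2):
    # alpha_{i,T} = sum_t w_{i,t} / (sum_t w_{i,t} + sigma^2/tau^2)
    return sum(w) / (sum(w) + sigma2 / tau2)

def weighted_mean(w, x):
    # X-bar_{i,1:T} = sum_t (w_{i,t} / sum_s w_{i,s}) X_{i,t}
    return sum(wt * xt for wt, xt in zip(w, x)) / sum(w)

def inhomogeneous_estimator(w, x, sigma2, tau2, mu0):
    # credibility weighted average between class experience and prior mean mu_0
    a = credibility_weight(w, sigma2, tau2)
    return a * weighted_mean(w, x) + (1 - a) * mu0

def homogeneous_estimators(W, X, sigma2, tau2):
    # mu_0 is replaced by the credibility weighted average mu-hat_T over classes
    alphas = [credibility_weight(w, sigma2, tau2) for w in W]
    xbars = [weighted_mean(w, x) for w, x in zip(W, X)]
    mu_hat = sum(a * xb for a, xb in zip(alphas, xbars)) / sum(alphas)
    return [a * xb + (1 - a) * mu_hat for a, xb in zip(alphas, xbars)], mu_hat
```

Note that the homogeneous estimator is exactly the inhomogeneous one with the prior mean $\mu_0$ replaced by the data-driven $\widehat{\mu}_T$.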
This latter case can now be used for tariffication of risk factors on different risk classes, similar to the GLM Chapter 7. The overall premium is given by $\widehat{\mu}_T$, the experience of risk class $i$ is given by $\bar{X}_{i,1:T}$, and the credibility weight $\alpha_{i,T} \in (0,1)$ explains how this information needs to be combined to obtain the risk adjusted premium of risk class $i$.

8.2.3 Estimation of the structural parameters

In order to apply the credibility estimators there remains the specification of the structural parameters $\sigma^2$ and $\tau^2$. We make the same choice as in Bühlmann-Gisler [24]. We define the sample estimators of risk class $i$
\[ \widehat{s}_i^2 = \frac{1}{T-1} \sum_{t=1}^T w_{i,t} \left(X_{i,t} - \bar{X}_{i,1:T}\right)^2, \tag{8.14} \]
and the overall sample estimator
\[ \widehat{\sigma}_T^2 = \frac{1}{I} \sum_{i=1}^I \widehat{s}_i^2, \tag{8.15} \]
with $E[\widehat{\sigma}_T^2] = \sigma^2$. Observe that one risk class is sufficient to get an estimate for $\sigma^2$ if $T > 1$.
If we have prior knowledge $\mu_0$ then $\tau^2$ should be calibrated such that it quantifies the reliability of this prior knowledge. If we use the homogeneous credibility estimator then $\tau^2$ is estimated from the volatility between the risk classes (here we need more than one risk class $i$). We define the weighted sample mean over all observations
\[ \bar{X} = \frac{1}{\sum_{i,t} w_{i,t}} \sum_{i,t} w_{i,t}\, X_{i,t} = \frac{1}{\sum_{i,t} w_{i,t}} \sum_{i=1}^I \left(\sum_{t=1}^T w_{i,t}\right) \bar{X}_{i,1:T}, \]
and the weighted sample variance between risk classes
\[ \widehat{v}_T^2 = \frac{I}{I-1} \sum_{i=1}^I \frac{\sum_t w_{i,t}}{\sum_{j,s} w_{j,s}} \left(\bar{X}_{i,1:T} - \bar{X}\right)^2. \]
Similar to Lemma 2.29 we can calculate the expected value of $\widehat{v}_T^2$ which then shows that we need to define
\[ \widehat{t}_T^2 = c_w \left(\widehat{v}_T^2 - \frac{I\, \widehat{\sigma}_T^2}{\sum_{j,s} w_{j,s}}\right), \]
with constant
\[ c_w = \frac{I-1}{I} \left[\sum_{i=1}^I \frac{\sum_t w_{i,t}}{\sum_{j,s} w_{j,s}} \left(1 - \frac{\sum_t w_{i,t}}{\sum_{j,s} w_{j,s}}\right)\right]^{-1}. \]
This estimator has the unbiasedness property $E[\widehat{t}_T^2] = \tau^2$, we refer to Section 4.8 in Bühlmann-Gisler [24]. The only difficulty is that it might become negative which, of course, is non-sense for estimating $\tau^2$. Therefore, we set for the final estimator
\[ \widehat{\tau}_T^2 = \max\left\{\widehat{t}_T^2,\; 0\right\}. \tag{8.16} \]
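The estimation steps (8.14)-(8.16) can be sketched as follows. This is a minimal illustration; the formulas for $\widehat{v}_T^2$ and $c_w$ follow the versions stated above, and all names are our own:

```python
def estimate_structural_parameters(W, X):
    """Sample estimators of sigma^2 and tau^2 per (8.14)-(8.16).

    W, X: lists over risk classes i of per-period weights and observations.
    """
    I, T = len(W), len(W[0])
    xbar_i = [sum(w[t] * x[t] for t in range(T)) / sum(w) for w, x in zip(W, X)]
    # within-class variance estimators (8.14)-(8.15)
    s2 = [sum(w[t] * (x[t] - xb) ** 2 for t in range(T)) / (T - 1)
          for w, x, xb in zip(W, X, xbar_i)]
    sigma2 = sum(s2) / I
    # between-class variance estimator with bias correction
    w_i = [sum(w) for w in W]
    w_tot = sum(w_i)
    xbar = sum(wi * xb for wi, xb in zip(w_i, xbar_i)) / w_tot
    v2 = I / (I - 1) * sum((wi / w_tot) * (xb - xbar) ** 2
                           for wi, xb in zip(w_i, xbar_i))
    c_w = (I - 1) / I / sum((wi / w_tot) * (1 - wi / w_tot) for wi in w_i)
    t2 = c_w * (v2 - I * sigma2 / w_tot)
    return sigma2, max(t2, 0.0)  # truncation at zero, see (8.16)
```

The truncation in the last line reflects that $\widehat{t}_T^2$ can become negative in small samples.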
Example 8.19. We consider the claims ratios $X_{i,t} = S_{i,t}/v_{i,t}$ of the data in Table 8.1 with weights $w_{i,t} = v_{i,t}$; the resulting estimates are provided in Table 8.2. We see that in risk class 4 we have big volumes $v_{4,t}$ which results in a high credibility weight estimate of $\widehat{\alpha}_{4,T} = 87.8\%$. In risk class 5 we have small volumes $v_{5,t}$ which results in a low credibility weight estimate of $\widehat{\alpha}_{5,T} = 45.2\%$. From this we calculate the credibility weighted overall claims ratio
                            t=1      t=2      t=3      t=4      t=5
  risk class 1   v_{1,t}    729      786      872      951     1019
                 S_{1,t}    583     1100      262      837     1630
                 X_{1,t}   80.0%   139.9%    30.0%    88.0%   160.0%
  risk class 2   v_{2,t}   1631     1802     2090     2300     2368
                 S_{2,t}     99     1298      326      463      895
                 X_{2,t}    6.1%    72.0%    15.6%    20.1%    37.8%
  risk class 3   v_{3,t}    796      827      874      917      944
                 S_{3,t}   1433      496      699     1742     1038
                 X_{3,t}  180.0%    60.0%    80.0%   190.0%   110.0%
  risk class 4   v_{4,t}   3152     3454     3715     3859     4198
                 S_{4,t}   1765     4145     3121     4129     3358
                 X_{4,t}   56.0%   120.0%    84.0%   107.0%    80.0%
  risk class 5   v_{5,t}    400      420      422      424      440
                 S_{5,t}     40        0      169     1018       44
                 X_{5,t}   10.0%     0.0%    40.0%   240.1%    10.0%

Table 8.1: Observed claims $S_{i,t}$ and corresponding numbers of policies $v_{i,t}$.
                                       class 1   class 2   class 3   class 4   class 5
  $\widehat{\alpha}_{i,T}$              63.0%     79.9%     63.0%     87.8%     45.2%
  $\bar{X}_{i,1:T}$                    101.3%     30.2%    124.1%     89.9%     60.4%
  $\widehat{\widehat{\mu(\Theta_i)}}^{\mathrm{hom}}$   93.5%     40.3%    107.9%     88.7%     71.3%

Table 8.2: Estimated credibility weights $\widehat{\alpha}_{i,T}$, observation based estimates $\bar{X}_{i,1:T}$ and homogeneous credibility estimates $\widehat{\widehat{\mu(\Theta_i)}}^{\mathrm{hom}}$ of the claims ratio at time $T = 5$.
$\widehat{\mu}_T = 80.4\%$ (which should be compared to the sample mean $\bar{X} = 77.9\%$) and from this we finally calculate the homogeneous credibility estimators for the claims ratios, see Table 8.2. We observe smoothing of $\bar{X}_{i,1:T}$ towards $\widehat{\mu}_T$ according to the credibility weights $\widehat{\alpha}_{i,T}$.
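The figures of Example 8.19 can be verified directly from the data in Table 8.1. The following sketch takes the rounded credibility weights from Table 8.2 as given:

```python
# volumes v_{i,t} and claims S_{i,t} from Table 8.1, t = 1,...,5
v = [[729, 786, 872, 951, 1019], [1631, 1802, 2090, 2300, 2368],
     [796, 827, 874, 917, 944], [3152, 3454, 3715, 3859, 4198],
     [400, 420, 422, 424, 440]]
S = [[583, 1100, 262, 837, 1630], [99, 1298, 326, 463, 895],
     [1433, 496, 699, 1742, 1038], [1765, 4145, 3121, 4129, 3358],
     [40, 0, 169, 1018, 44]]
alpha = [0.630, 0.799, 0.630, 0.878, 0.452]  # credibility weights, Table 8.2

# weighted claims ratios X-bar_{i,1:T} = sum_t S_{i,t} / sum_t v_{i,t}
xbar = [sum(Si) / sum(vi) for Si, vi in zip(S, v)]
# credibility weighted overall claims ratio mu-hat_T
mu_hat = sum(a * x for a, x in zip(alpha, xbar)) / sum(alpha)
# homogeneous credibility estimators and overall sample mean
hom = [a * x + (1 - a) * mu_hat for a, x in zip(alpha, xbar)]
sample_mean = sum(sum(Si) for Si in S) / sum(sum(vi) for vi in v)
```

Running this reproduces $\bar{X}_{1,1:T} \approx 101.3\%$, $\widehat{\mu}_T \approx 80.4\%$, $\bar{X} \approx 77.9\%$ and the homogeneous estimate $\approx 93.5\%$ for risk class 1, in line with Table 8.2.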
Exercise 23.
(a) Choose the data of Table 8.1 and calculate the inhomogeneous credibility estimators $\widehat{\widehat{\mu(\Theta_i)}}$ for the claims ratios under the assumption that the collective mean is given by $\mu_0 = 90\%$ and the variance between risk classes is given by $\tau^2 = 0.20$.
(b) What changes if the variance between risk classes is given by $\tau^2 = 0.05$?

8.2.4 Prediction error
Observe that the credibility estimator $\widehat{\widehat{\mu(\Theta_i)}}$ is used to estimate $\mu(\Theta_i)$ and to predict next year's claim $X_{i,T+1}$. Similar to (1.9) we can analyze the total prediction error
\[ X_{i,T+1} - \widehat{\widehat{\mu(\Theta_i)}} = \left(X_{i,T+1} - \mu(\Theta_i)\right) + \left(\mu(\Theta_i) - \widehat{\widehat{\mu(\Theta_i)}}\right), \]
which provides
\[ E\left[\left(X_{i,T+1} - \widehat{\widehat{\mu(\Theta_i)}}\right)^2\right] = E\left[\left(X_{i,T+1} - \mu(\Theta_i)\right)^2\right] + E\left[\left(\mu(\Theta_i) - \widehat{\widehat{\mu(\Theta_i)}}\right)^2\right] = E\left[\mathrm{Var}\left(X_{i,T+1}\,\middle|\,\Theta_i\right)\right] + E\left[\left(\mu(\Theta_i) - \widehat{\widehat{\mu(\Theta_i)}}\right)^2\right] = \frac{\sigma^2}{w_{i,T+1}} + (1 - \alpha_{i,T})\, \tau^2, \tag{8.17} \]
see Theorem 4.3 in Bühlmann-Gisler [24]. The first term on the right-hand side of (8.17) is called process variance and the second term parameter uncertainty. Similarly we obtain for the homogeneous credibility estimator, see Theorem 4.6 in Bühlmann-Gisler [24],
\[ E\left[\left(X_{i,T+1} - \widehat{\widehat{\mu(\Theta_i)}}^{\mathrm{hom}}\right)^2\right] = \frac{\sigma^2}{w_{i,T+1}} + (1 - \alpha_{i,T})\, \tau^2 \left(1 + \frac{1 - \alpha_{i,T}}{\sum_{l=1}^I \alpha_{l,T}}\right). \tag{8.18} \]
The expressions in (8.17) and (8.18) are called mean square error of prediction (MSEP). We will come back to this notion in Section 9.3 and for a comprehensive treatment we refer to Section 3.1 in Wüthrich-Merz [100].
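Formulas (8.17) and (8.18) can be evaluated directly. A minimal sketch (function names are our own):

```python
def msep_inhomogeneous(sigma2, tau2, w_next, alpha_i):
    # (8.17): process variance + parameter uncertainty
    return sigma2 / w_next + (1 - alpha_i) * tau2

def msep_homogeneous(sigma2, tau2, w_next, alpha_i, alphas):
    # (8.18): additional term because mu_0 is replaced by its estimate mu-hat_T
    return sigma2 / w_next + (1 - alpha_i) * tau2 * (
        1 + (1 - alpha_i) / sum(alphas))
```

As expected, the homogeneous MSEP always exceeds the inhomogeneous one, since estimating $\mu_0$ adds estimation uncertainty.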
Exercise 24. Estimate the prediction uncertainty $E[(X_{i,T+1} - \widehat{\widehat{\mu(\Theta_i)}}^{\mathrm{hom}})^2]$ for the data of Example 8.19 under the assumption that the volume grows 5% in each risk class.
Exercise 25. Consider the following claims counts data:

  region i        v_i       N_i
  1             50061      3880
  2             10135       794
  3            121310      8941
  4             35045      3448
  5             19720      1672
  6             39092      5186
  7              4192       314
  8             19635      1934
  9             21618      2285
  10            34332      2689
  11            11105       661
  12            56590      4878
  13            13551      1205
  14            19139      1646
  15            10242       850
  16            28137      2229
  17            33846      3389
  18            61573      5937
  19            17067      1530
  20             8263       671
  21           148872     15014
  total        763525     69153

Calculate the inhomogeneous credibility estimators for each region $i$ under the assumption that $N_i|_{\Theta_i}$ has a Poisson distribution with mean $\lambda(\Theta_i)\, v_i = \Theta_i\, \lambda_0\, v_i$ and $E[\Theta_i] = 1$. The prior frequency parameter is given by $\lambda_0 = 8.8\%$ and the prior uncertainty by $\tau^2 = 2.4 \cdot 10^{-4}$.

Hint: For the estimation of the credibility coefficient $\kappa = \sigma^2/\tau^2$ one should use that $N_i|_{\Theta_i}$ is Poisson distributed, which has direct consequences for the corresponding variance $\sigma^2(\Theta_i)$, see also Proposition 2.8.
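A possible starting point for Exercise 25 is the following sketch for region 1 only. It uses that the Poisson assumption makes the conditional variance equal to the conditional mean, so the credibility coefficient becomes $\kappa = \lambda_0/\tau^2$ (this is the interpretation suggested by the hint, not a worked solution from the notes):

```python
lam0, tau2 = 0.088, 2.4e-4
kappa = lam0 / tau2                        # sigma^2 = lambda_0 for Poisson
v1, N1 = 50061, 3880                       # region 1 of the table
alpha1 = v1 / (v1 + kappa)                 # credibility weight
X1 = N1 / v1                               # observed claims frequency
est1 = alpha1 * X1 + (1 - alpha1) * lam0   # inhomogeneous credibility estimator
```

For a large region such as region 1 the credibility weight is close to 1, so the estimator stays close to the observed frequency.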
Example 8.20 (MTPL frequencies). We revisit the MTPL example of Section 7.3.4. In this example we have observed that some of the risk classes $m$ have a rather small volume $v_m$ which, of course, is in favor of applying credibility methods. For this analysis we only consider a risk classification by cantons. This exactly corresponds to the marginal consideration in Figure 7.5 (rhs). We assume that the regional data fulfills the BS model assumptions with $w_m = v_m$ and, henceforth, we can use Theorem 8.17. We do this under the additional assumption of having conditional Poisson distributions for $N_m$, $m \in \{\mathrm{AG}, \mathrm{AI}, \ldots, \mathrm{ZH}\}$. The latter implies for $X_m = N_m/v_m$ that, see also Exercise 25 above,
\[ \frac{\sigma^2(\Theta_m)}{w_m} = \mathrm{Var}\left(X_m\,\middle|\,\Theta_m\right) = \frac{E[X_m|\Theta_m]}{v_m} = \frac{\lambda(\Theta_m)}{v_m}. \]
Figure 8.2: (lhs) MLEs $\widehat{\lambda}_m^{\mathrm{canton}}$ and (rhs) credibility estimators $\widehat{\widehat{\lambda(\Theta_m)}}^{\mathrm{hom}}$ for the different cantons $m \in \{\mathrm{AG}, \mathrm{AI}, \ldots, \mathrm{ZH}\}$.
tes
2 = E[ 2 (m )] = E[(m )] = 0 ,
NL
no
\
\
(
m)
b ,
= m Xm + (1 m )
0
m =
vm
.
b /b2
vm +
0
The resulting credibility weights $\alpha_m$ are within the interval $(35\%, 98\%)$ depending on having a small or large canton. Remarkable is that the estimate $\widehat{\lambda}_0 = 7.48\%$ is substantially higher than the sample mean of $7.15\%$. In Figure 8.2 we present the MLEs $\widehat{\lambda}_m^{\mathrm{canton}} = X_m$ and the credibility estimators $\widehat{\widehat{\lambda(\Theta_m)}}^{\mathrm{hom}}$ for the different cantons $m \in \{\mathrm{AG}, \mathrm{AI}, \ldots, \mathrm{ZH}\}$. We observe that the MLEs are smoothed towards the collective mean estimate $\widehat{\lambda}_0$. This applies in particular to small cantons such as $m = \mathrm{AI}$, whereas large cantons are only marginally affected by the collective mean estimate.

This picture could now be further refined using methods from spatial statistics (based on the intuition that neighboring cantons behave more similarly, etc.). This has, for instance, been done in Fringeli [50].
Chapter 9

Claims Reserving

This chapter will give a completely new perspective to non-life insurance business which has not been covered in these notes, yet. Until now we have assumed that the total claim amount for a fixed accounting year can be described by a compound distribution of the form
\[ S_t = \sum_{i=1}^{N_t} Y_i. \]
  assets as of 31/12/2013             mio. CHF
  debt securities                         6374
  equity securities                       1280
  loans & mortgages                       1882
  real estate                              908
  participation                           2101
  short term investments                   693
  other assets                             696
  total assets                           13934

  liabilities as of 31/12/2013        mio. CHF
  claims reserves                         7189
  provisions for annuities                1178
  other liabilities and provisions        2481
  share capital                            169
  legal reserve                            951
  free reserve, forwarded gains           1966
  total liabilities & equity             13934

Table 9.1: Source: Annual Report 2013 of AXA Versicherungen AG, Switzerland.
9.1 Outstanding loss liabilities
Figure 9.1: Non-life insurance run-off showing insurance period $[U_1, U_2]$ and a claim with accident date $T_1 \in [U_1, U_2]$, reporting date $T_2 > U_2 > T_1$ and settlement date $T_3 > T_2$. Moreover, we have claims payments during the settlement period $[T_2, T_3]$.
1. $t < T_1$. Such (potential) claims have not yet occurred. If the company is lucky then $T_1 > U_2$. This means that it is not liable for this particular claim under the actual insurance policy because the contract is already terminated at claims occurrence. Be careful, the company may still be liable for this particular claim, namely, if the contract is renewed and $T_1$ falls into the renewed insurance period, but renewals are not of interest for the present (claims reserving) discussion because they correspond to insurance exposures only sold in the future.

In this first case $t < T_1$ the only information available at the insurance company is the insurance contract signed, i.e. the exposure for which it is liable in case of a claims occurrence $T_1 \in [U_1, U_2]$. Therefore, one often speaks about unearned premium if the exposure has not yet expired, i.e. if $t < U_2$.
2. $T_1 \leq t < T_2$ and $T_1 \in [U_1, U_2]$. In this case the insurance claim has occurred but it has not yet been reported to the insurance company. These claims are called Incurred But Not Yet Reported (IBNYR) claims. For such claims we do not have any individual claims information (because it is IBNYR) but we already have external information, like the economic environment (e.g. unemployment rate, inflation rate, financial distress), weather conditions and natural catastrophes (storm, flood, earthquake, etc.), nuclear power accidents, flu epidemics, and so on. This external information already gives us a hint whether we should expect more or less claims reportings.
3. $T_2 \leq t < T_3$ and $T_1 \in [U_1, U_2]$. These claims are reported at the company but the final assessment is still missing. Typically, we are in the situation of more and more information becoming available about the claim, i.e. the prediction uncertainty in the final claim assessment decreases. However, these claims are not completely resolved and therefore they are called Reported But Not Settled (RBNS) claims. The settlement period $[T_2, T_3]$ is also the period within which claims payments are done, see Figure 9.1.

During the settlement period we receive more and more information on the individual claim like accident date, cause of accident, type of accident, line-of-business and contracts involved, claims assessments and predictions by claims adjusters, payments already done, etc.

4. $T_3 < t$ and $T_1 \in [U_1, U_2]$. The claim is settled, the file is closed and stored, and we expect no further payments for that claim. Under some circumstances, it may be necessary that a claim file is re-opened due to unexpected further claims development. If this happens too often then the files are probably closed too early and the claims settlement philosophy should be reviewed in that particular company. If there is a systematic re-opening it may also ask for a special provision for unexpected re-openings, for example, for contracts which have a timely unlimited cover for relapses.
To give statistical statements about insurance contracts and claims behavior, insurance companies build homogeneous groups and sub-portfolios to which a LLN applies. In non-life insurance, contracts are often grouped into different business lines such as private property, commercial property, private liability, commercial liability, accident insurance, health insurance, motor third party liability insurance, motor hull insurance, etc. If this classification is too rough it can further be divided into sub-portfolios, for example, private property can be divided by hazard categories like fire, water, theft, etc. Often such sub-classes are built by geographical markets and for different jurisdictions.

Once these (hopefully) homogeneous risk classes are built we study all claims that belong to such a sub-portfolio. These claims are further categorized by the accident date. Claims that fall into the same accident period are triggered by similar external factors like weather conditions and the economic environment; therefore such a classification is reasonable. Since the usual time scale for insurance contracts and business consolidation is years, claims are typically gathered on the yearly time scale. Therefore, we consider accounting years denoted by $k \in \mathbb{N}$. All claims that have accident dates $T_1 \in [1/1/k, 31/12/k]$ are called claims with accident year $k$. We abbreviate the latter interval by $[k, k+1)$. These claims generate cash flows which are also considered on the consolidated yearly level, i.e. all payments that are done in the same accounting year are aggregated. This motivates the classical claims development triangle.
Thus, we consider all claims (of a given sub-portfolio) which have accident dates $T_1 \in [i, i+1) = [1/1/i, 31/12/i]$, i.e. have the same accident year $i$. For these claims we consider aggregate cash flows which are further sub-divided by their payment delays denoted by $j \in \mathbb{N}_0$ and called development years. For instance,

$X_{i,0}$ = payments in year $[i, i+1)$ for claims with accident year $i$;
$X_{i,1}$ = payments in year $[i+1, i+2)$ for claims with accident year $i$;
$X_{i,j}$ = payments in year $[i+j, i+j+1)$ for claims with accident year $i$.
  accident    development years j
  year i      0           1           ...         J
  1           X_{1,0}     X_{1,1}     ...         X_{1,J}
  ...
  t-J         X_{t-J,0}   X_{t-J,1}   ...         X_{t-J,J}
  ...
  t           X_{t,0}

  upper-left triangle: observations D_t;  lower-right triangle: to be predicted D_t^c

Table 9.2: Claims development triangle at time $t$.
All payments on a diagonal $i + j = k$ are done in the same accounting year (and hence are influenced by the same external factors like inflation). Therefore, we denote the accounting year payments by
\[ X_k = \sum_{i+j=k} X_{i,j} = \sum_{i=1 \vee (k-J)}^{t \wedge k} X_{i,k-i} = \sum_{j=0 \vee (k-t)}^{J \wedge (k-1)} X_{k-j,j}. \tag{9.1} \]
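The diagonal sums of (9.1) can be illustrated with a short sketch (our own illustration, not from the notes):

```python
def accounting_year_payments(X):
    """Aggregate payments X_{i,j} along the diagonals i + j = k of (9.1).

    X[i-1][j] holds the payment of accident year i with development delay j.
    """
    out = {}
    for i, row in enumerate(X, start=1):
        for j, x in enumerate(row):
            out[i + j] = out.get(i + j, 0) + x  # diagonal i + j = k
    return out
```

All entries on the same diagonal are paid in the same accounting year, so they share the same inflation environment.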
At time $t \in \mathbb{N}$ we are liable for all claims that have occurred in accident years $i \leq t$. We call these claims past exposure claims. Some of these past exposure claims are already settled (if the settlement date $T_3 \leq t$), others belong to either the class of RBNS claims (if the reporting date $T_2 \leq t$ but the settlement date $T_3 > t$) or the class of IBNYR claims (if the reporting date $T_2 > t$).

On the aggregate level we have the following payment information at time $t \in \mathbb{N}$ for past exposure claims
\[ \mathcal{D}_t = \left\{X_{i,j};\; i + j \leq t,\; 1 \leq i \leq t,\; 0 \leq j \leq J\right\}. \tag{9.2} \]
NL
no
This corresponds to the lower triangle in Table 9.2. This lower triangle Dtc is
called outstanding loss liabilities and it is the major object of interest. Namely,
these outstanding loss liabilities constitute the liabilities of the insurance company
originating from past premium exposures. In particular, the company needs to
build appropriate provisions so that it is able to fulfill these future cash flows.
These provisions are called claims reserves and they should satisfy the following
requirements:
the claims reserves should be evaluated such that it considers all relevant
(past) information;
the claims reserves should be a best-estimate for the outstanding loss liabilities adjusted for time value of money.
Basically, this means that we need to predict the lower triangle Dtc based on all
available information Ft Dt at time t. In particular, we need to define a stochastic
model on a filtered probability space (, F, P, F) (i) that allows to incorporate
past information Ft F through a filtration F = (Ft )t ; (ii) that reflects the
characteristics of past observations Dt ; (iii) that is able to predict future payments
of the outstanding loss liabilities Dtc ; and (vi) that is able to attach time values
of money to these future cash flows Xi,j , i + j > t. Of course, this is a rather
ambitious program and we will build such a stochastic model step-by-step.
Version April 14, 2016, M.V. Wthrich, ETH Zurich
231
For the time being we skip the task of attaching time values of money to cash flows and we only consider nominal payments. The total nominal claims payments for accident year $i$ are given by
\[ \sum_{j=0}^J X_{i,j} = \sum_{l=1}^{N_i} Y_l = S_i. \tag{9.3} \]
Thus, for assessing the total claim amount $S_i$ of accident year $i$ we need to describe the claims settlement process $X_{i,0}, \ldots, X_{i,J}$. In particular, we need to predict the (unobserved) future claims cash flows of the outstanding loss liabilities to quantify the total claim $S_i$ of accident year $i$. Here, $S_i$ is measured on a nominal basis, therefore we use the symbol "=" in the above identity (9.3), see also Wüthrich [98]. Moreover, we see that the total claim amount of a fixed accident year $i$ is by far more complex than a compound distribution.
We assume that the latest observed accident/accounting year is $t = I$ and we do all considerations based on this (fixed) accounting year.

The (nominal) best-estimate reserves at time $t = I > J$ for past exposure claims are (under these model assumptions) defined by
\[ R = \sum_{i+j>I} E\left[X_{i,j}\,\middle|\,\mathcal{F}_I\right] = \sum_{(i,j) \in \mathcal{I}_I^c} E\left[X_{i,j}\,\middle|\,\mathcal{F}_I\right], \]
with index sets $\mathcal{I} = \{(i,j):\, 1 \leq i \leq I,\, 0 \leq j \leq J\}$, $\mathcal{I}_I = \{(i,j) \in \mathcal{I}:\, i + j \leq I\}$ and
\[ \mathcal{I}_I^c = \mathcal{I} \setminus \mathcal{I}_I. \]
The set $\mathcal{I}_I^c$ exactly corresponds to the lower triangle $\mathcal{D}_I^c$. $(\mathcal{F}_t)_{t \geq 0}$ is a filtration on $(\Omega, \mathcal{F}, P)$ assuming that $X_{i,j}$ is $\mathcal{F}_{i+j}$-measurable for all $(i,j) \in \mathcal{I}$.

The best-estimate reserves $R$ are a predictor for the (nominal) outstanding loss liabilities of past exposure claims at time $t = I$,
\[ \sum_{(i,j) \in \mathcal{I}_I^c} X_{i,j}. \]
9.2 Claims reserving algorithms

The title of this section contains the term "algorithms". Initially, in the insurance industry, actuaries have designed algorithms to determine claims reserves $R$. These algorithms should be understood as mechanical guidelines to obtain claims reserves. Only much later actuaries started to think about stochastic models underlying these algorithms. In this section we present claims reserving from this algorithmic point of view and in the next section we present stochastic models that support these algorithms.

The two most popular algorithms are the so-called chain-ladder (CL) algorithm and the Bornhuetter-Ferguson (BF) algorithm [16]. These two algorithms take different viewpoints. The CL algorithm takes the position that the observations $\mathcal{D}_I$ are extrapolated into the lower triangle, the BF algorithm takes the position that the lower triangle $\mathcal{D}_I^c$ is extrapolated independently of the observations $\mathcal{D}_I$ using expert knowledge. Depending on the line of business considered and the progress of the claims development process one or the other method may provide better predictions. Only actuarial experience may tell which one should be preferred in which particular situation. Therefore, we are going to present both algorithms from a rather mechanical point of view, because we cannot provide applied insight to a given data set.

9.2.1 Chain-ladder algorithm
For the study of the CL algorithm we need to define (nominal) cumulative payments
\[ C_{i,j} = \sum_{l=0}^j X_{i,l}. \]
That is, we sum all payments $X_{i,l}$, $l \geq 0$, for a fixed accident year $i$ so that ultimately we obtain $C_{i,J} = S_i$, if $S_i$ denotes the total (nominal) claim amount that corresponds to accident year $i$, see also (9.3). We call $C_{i,J}$ the ultimate (nominal) claim of accident year $i$.

CL idea. All accident years $i \in \{1, \ldots, I\}$ behave similarly and for cumulative payments we have approximately
\[ C_{i,j+1} \approx f_j\, C_{i,j}, \tag{9.4} \]
for given factors $f_j > 0$. These factors $f_j$ are called CL factors, age-to-age factors or link ratios.
Structure (9.4) immediately provides the intuition for predicting the ultimate claim $C_{i,J}$ based on the observations $\mathcal{D}_I$, namely, choose for every accident year $i$ the observation on the last observed diagonal, that is $C_{i,I-i}$, and multiply this observation with the successive CL factors $f_{I-i}, \ldots, f_{J-1}$.

The remaining difficulty is that, in general, the CL factors $f_j$ are not known and, henceforth, need to be estimated. Assuming that a volume weighted average provides the most reliable results we set in view of (9.4)
\[ \widehat{f}_j^{\mathrm{CL}} = \frac{\sum_{i=1}^{I-j-1} C_{i,j+1}}{\sum_{i=1}^{I-j-1} C_{i,j}} = \sum_{i=1}^{I-j-1} \frac{C_{i,j}}{\sum_{n=1}^{I-j-1} C_{n,j}}\; \frac{C_{i,j+1}}{C_{i,j}}. \tag{9.5} \]
This motivates the CL predictor for the ultimate claim,
\[ \widehat{C}_{i,J}^{\mathrm{CL}} = C_{i,I-i} \prod_{j=I-i}^{J-1} \widehat{f}_j^{\mathrm{CL}}, \tag{9.6} \]
and, more generally,
\[ \widehat{C}_{i,n}^{\mathrm{CL}} = C_{i,I-i} \prod_{j=I-i}^{n-1} \widehat{f}_j^{\mathrm{CL}} \qquad \text{for } i + n > I. \]
The CL reserves of accident year $i$ are then given by
\[ \widehat{R}_i^{\mathrm{CL}} = \widehat{C}_{i,J}^{\mathrm{CL}} - C_{i,I-i} = C_{i,I-i} \left(\prod_{j=I-i}^{J-1} \widehat{f}_j^{\mathrm{CL}} - 1\right), \]
and aggregated over all accident years we predict the (nominal) outstanding loss liabilities of past exposure claims by the CL predictor
\[ \widehat{R}^{\mathrm{CL}} = \sum_{i=I-J+1}^I \widehat{R}_i^{\mathrm{CL}}. \]
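The CL algorithm (9.5)-(9.6) is short enough to state in full. The following is a minimal sketch (our own implementation, representing the triangle as a list of rows of decreasing length):

```python
def chain_ladder(C):
    """CL factors (9.5) and reserves per accident year.

    C: cumulative triangle; C[i] contains the observed C_{i,j} of accident
    year i (row lengths decrease for younger accident years).
    """
    I = len(C)
    J = len(C[0]) - 1
    # volume weighted age-to-age factors, see (9.5)
    f = []
    for j in range(J):
        num = sum(C[i][j + 1] for i in range(I) if len(C[i]) > j + 1)
        den = sum(C[i][j] for i in range(I) if len(C[i]) > j + 1)
        f.append(num / den)
    # project each accident year to its ultimate, see (9.6)
    reserves = []
    for i in range(I):
        last = len(C[i]) - 1
        ult = C[i][last]
        for j in range(last, J):
            ult *= f[j]
        reserves.append(ult - C[i][last])
    return f, reserves
```

On a toy triangle where every accident year exactly doubles its cumulative payments, the estimated factors are exactly 2 and the reserves follow by projection.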
Table 9.4: Observed cumulative payments $C_{i,j}$ with $(i,j) \in \mathcal{I}_I$ and estimated CL factors $\widehat{f}_j^{\mathrm{CL}}$.

Table 9.5: CL predicted cumulative payments $\widehat{C}_{i,j}^{\mathrm{CL}}$, $(i,j) \in \mathcal{I}_I^c$, and estimated CL reserves $\widehat{R}_i^{\mathrm{CL}}$.

  accident   β̂^CL_{I-i}   Ĉ^BF_{i,J}   Ĉ^CL_{i,J}   BF reserves   CL reserves
  year i                                               R̂^BF_i       R̂^CL_i
  1          100.0%      11148124     11148124              0             0
  2           99.9%      10664316     10663318          16124         15126
  3           99.8%      10662749     10662008          26998         26257
  4           99.6%       9761643      9758606          37575         34538
  5           99.1%       9882350      9872218          95434         85302
  6           98.4%      10113777     10092247         178024        156494
  7           97.0%       9623328      9568143         341305        286121
  8           94.8%       8830301      8705378         574089        449167
  9           88.0%       8967375      8691971        1318646       1043242
  10          59.0%      10443953      9626383        4768384       3950815
  total                                               7356580       6047061

Table 9.6: Claims reserves from the BF method and the CL method.
9.2.2 Bornhuetter-Ferguson algorithm

BF idea. All accident years $i \in \{1, \ldots, I\}$ behave similarly and payments approximately behave as
\[ X_{i,j} \approx \gamma_j\, \widehat{\mu}_i, \tag{9.7} \]
for given prior information $\widehat{\mu}_i$ and a given development pattern $(\gamma_j)_{j=0,\ldots,J}$ with normalization $\sum_{j=0}^J \gamma_j = 1$.

The prior value $\widehat{\mu}_i$ should reflect an estimate for the total expected ultimate claim $E[C_{i,J}]$ of accident year $i$. It is assumed that this prior value is given externally by expert opinion which, in theory, should not be based on $\mathcal{D}_I$. There only remains to estimate the development pattern $(\gamma_j)_j$. In view of the CL method, one defines the following estimates for the development pattern:
\[ \widehat{\beta}_j^{\mathrm{CL}} = \prod_{l=j}^{J-1} \frac{1}{\widehat{f}_l^{\mathrm{CL}}} = \frac{\prod_{l=0}^{j-1} \widehat{f}_l^{\mathrm{CL}}}{\prod_{l=0}^{J-1} \widehat{f}_l^{\mathrm{CL}}}. \]
This ratio exactly reflects the proportion paid after the first $j$ development periods (according to the estimated CL pattern). Therefore, we define the estimates
\[ \widehat{\gamma}_0^{\mathrm{CL}} = \widehat{\beta}_0^{\mathrm{CL}}, \qquad \widehat{\gamma}_j^{\mathrm{CL}} = \widehat{\beta}_j^{\mathrm{CL}} - \widehat{\beta}_{j-1}^{\mathrm{CL}} \quad \text{for } j = 1, \ldots, J-1, \qquad \widehat{\gamma}_J^{\mathrm{CL}} = 1 - \widehat{\beta}_{J-1}^{\mathrm{CL}}. \]
Equipped with these estimates we predict the ultimate claim $C_{i,J}$ for $i + J > I$ in the BF method by
\[ \widehat{C}_{i,J}^{\mathrm{BF}} = C_{i,I-i} + \widehat{\mu}_i \sum_{j=I-i+1}^J \widehat{\gamma}_j^{\mathrm{CL}} = C_{i,I-i} + \widehat{\mu}_i \left(1 - \widehat{\beta}_{I-i}^{\mathrm{CL}}\right), \tag{9.8} \]
with corresponding BF reserves of accident year $i$
\[ \widehat{R}_i^{\mathrm{BF}} = \widehat{\mu}_i \sum_{j=I-i+1}^J \widehat{\gamma}_j^{\mathrm{CL}} = \widehat{\mu}_i \left(1 - \widehat{\beta}_{I-i}^{\mathrm{CL}}\right). \]
Aggregated over all accident years this gives the BF reserves
\[ \widehat{R}^{\mathrm{BF}} = \sum_{i=I-J+1}^I \widehat{R}_i^{\mathrm{BF}} = \sum_{(i,j) \in \mathcal{I}_I^c} \widehat{\mu}_i\, \widehat{\gamma}_j^{\mathrm{CL}}. \]
For the comparison of the two methods we rewrite the CL predictor as follows:
\[ \widehat{C}_{i,J}^{\mathrm{CL}} = C_{i,I-i} + C_{i,I-i} \left(\prod_{j=I-i}^{J-1} \widehat{f}_j^{\mathrm{CL}}\right) \left(1 - \prod_{j=I-i}^{J-1} \frac{1}{\widehat{f}_j^{\mathrm{CL}}}\right) = C_{i,I-i} + \left(1 - \widehat{\beta}_{I-i}^{\mathrm{CL}}\right) \widehat{C}_{i,J}^{\mathrm{CL}}, \]
which should be compared to
\[ \widehat{C}_{i,J}^{\mathrm{BF}} = C_{i,I-i} + \left(1 - \widehat{\beta}_{I-i}^{\mathrm{CL}}\right) \widehat{\mu}_i. \]
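The BF predictor (9.8) is easily added on top of the CL factors. A minimal sketch (our own implementation, reusing the triangle representation from the CL sketch above):

```python
def bornhuetter_ferguson(C, f, mu_prior):
    """BF reserves per accident year, see (9.8).

    C: cumulative triangle (rows of decreasing length), f: estimated CL
    factors, mu_prior[i]: external prior estimate of the ultimate claim.
    """
    J = len(C[0]) - 1
    # beta_j = proportion paid after the first j development years
    beta = []
    for j in range(J + 1):
        prod = 1.0
        for l in range(j, J):
            prod /= f[l]
        beta.append(prod)
    reserves = []
    for i, row in enumerate(C):
        last = len(row) - 1
        reserves.append(mu_prior[i] * (1 - beta[last]))
    return beta, reserves
```

Note that, in contrast to the CL reserves, the BF reserves depend on the observations only through the development pattern, not through the last observed diagonal.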
Thus, we see that we have the same structure. The only difference is that for the BF method we use the external prior estimate $\widehat{\mu}_i$ for the ultimate claim and in the CL method the observation based estimate $\widehat{C}_{i,J}^{\mathrm{CL}}$. Therefore, we have two complementing prediction positions, which exactly gives the explanation mentioned in the introduction to Section 9.2. For further remarks (also detailed remarks on the example in Tables 9.3-9.6) we refer to Wüthrich-Merz [100].

9.3 Stochastic claims reserving methods
In the previous section we have presented algorithms for the calculation of the claims reserves $R$. Of course, we should also estimate the precision of these predictions, i.e. by how much the true payouts $\sum_{(i,j) \in \mathcal{I}_I^c} X_{i,j}$ may deviate from these predictions $R$, see also (1.9) and Smith-Thaper [93]. This brings us back to the notion of risk measures of Section 6.2.4. In claims reserving, the most popular risk measure is the conditional mean square error of prediction (MSEP) because it can be calculated or estimated explicitly in many examples. Assume $\widehat{X}$ is a $\mathcal{D}_I$-measurable predictor for the random variable $X$. The conditional MSEP is defined by
\[ \mathrm{msep}_{X|\mathcal{D}_I}\left(\widehat{X}\right) = E\left[\left(\widehat{X} - X\right)^2 \middle|\, \mathcal{D}_I\right]. \tag{9.9} \]
It decouples into process variance and parameter estimation error,
\[ \mathrm{msep}_{X|\mathcal{D}_I}\left(\widehat{X}\right) = \mathrm{Var}\left(X\,\middle|\,\mathcal{D}_I\right) + \left(\widehat{X} - E\left[X\,\middle|\,\mathcal{D}_I\right]\right)^2. \tag{9.10} \]
9.3.1 Gamma-gamma Bayesian chain-ladder model

Model Assumptions 9.1 (gamma-gamma Bayesian CL model).
(a) Conditionally, given $\Theta = (\Theta_0, \ldots, \Theta_{J-1})$, the processes $(C_{i,j})_{j=0,\ldots,J}$ are independent (in $i$) Markov processes with conditional distributions
\[ C_{i,j+1}\,\big|\,\{C_{i,j}, \Theta\} \sim \Gamma\left(\frac{C_{i,j}}{\sigma_j^2},\; \frac{\Theta_j}{\sigma_j^2}\right), \]
for given constants $\sigma_j > 0$.
(b) $\Theta_j$ are independent and $\Gamma(\gamma_j, f_j(\gamma_j - 1))$-distributed with given prior parameters $f_j > 0$ and $\gamma_j > 1$ for $j = 0, \ldots, J-1$.
(c) $\Theta$ and $C_{1,0}, \ldots, C_{I,0}$ are independent and $C_{i,0} > 0$, $P$-a.s., for all $i = 1, \ldots, I$.

Model Assumptions 9.1 imply
\[ E\left[C_{i,j+1}\,\middle|\,C_{i,j}, \Theta\right] = \Theta_j^{-1}\, C_{i,j}. \tag{9.11} \]
The joint density of observations and parameters is given by
\[ h(\mathcal{D}_I, \Theta) = \prod_{(i,j) \in \mathcal{I}_I,\, j \geq 1} \frac{\left(\Theta_{j-1}/\sigma_{j-1}^2\right)^{C_{i,j-1}/\sigma_{j-1}^2}}{\Gamma\left(C_{i,j-1}/\sigma_{j-1}^2\right)}\; C_{i,j}^{C_{i,j-1}/\sigma_{j-1}^2 - 1} \exp\left\{-\frac{\Theta_{j-1}}{\sigma_{j-1}^2}\, C_{i,j}\right\}\; g(C_{1,0}, \ldots, C_{I,0}) \prod_{j=0}^{J-1} \frac{\left(f_j(\gamma_j-1)\right)^{\gamma_j} \Theta_j^{\gamma_j - 1}}{\Gamma(\gamma_j)} \exp\left\{-\Theta_j\, f_j\, (\gamma_j - 1)\right\}, \]
where $g(C_{1,0}, \ldots, C_{I,0})$ denotes the density of the first column $j = 0$. Applying Bayes' rule provides for the posterior distribution of $\Theta$, conditionally given $\mathcal{D}_I$,
\[ h(\Theta|\mathcal{D}_I) \propto \prod_{j=0}^{J-1} \Theta_j^{\gamma_j + \sum_{i=1}^{I-j-1} C_{i,j}/\sigma_j^2 - 1} \exp\left\{-\Theta_j \left(f_j(\gamma_j - 1) + \sum_{i=1}^{I-j-1} C_{i,j+1}/\sigma_j^2\right)\right\}. \]
Lemma 9.2. Under Model Assumptions 9.1, the components of $\Theta$ are independent, conditionally given $\mathcal{D}_I$, with posterior distributions
\[ \Theta_j\,\big|\,\mathcal{D}_I \sim \Gamma\left(\gamma_j + \sum_{i=1}^{I-j-1} \frac{C_{i,j}}{\sigma_j^2},\; f_j(\gamma_j - 1) + \sum_{i=1}^{I-j-1} \frac{C_{i,j+1}}{\sigma_j^2}\right). \]
Corollary 9.3. Under Model Assumptions 9.1, the posterior Bayesian CL factors are given by
\[ \widehat{f}_j^{\mathrm{BCL}} \overset{\text{def.}}{=} E\left[\Theta_j^{-1}\,\middle|\,\mathcal{D}_I\right] = \alpha_j\, \widehat{f}_j^{\mathrm{CL}} + (1 - \alpha_j)\, f_j, \]
with credibility weights
\[ \alpha_j = \frac{\sum_{i=1}^{I-j-1} C_{i,j}}{\sum_{i=1}^{I-j-1} C_{i,j} + \sigma_j^2\, (\gamma_j - 1)} \in (0,1). \tag{9.12} \]
Proof. The proof is a straightforward application of the gamma distributional properties, namely
\[ E\left[\Theta_j^{-1}\,\middle|\,\mathcal{D}_I\right] = \frac{f_j(\gamma_j - 1) + \sum_{i=1}^{I-j-1} C_{i,j+1}/\sigma_j^2}{\gamma_j + \sum_{i=1}^{I-j-1} C_{i,j}/\sigma_j^2 - 1} = \frac{(\gamma_j - 1)\, f_j + \left(\sum_{i=1}^{I-j-1} C_{i,j}/\sigma_j^2\right) \widehat{f}_j^{\mathrm{CL}}}{\gamma_j - 1 + \sum_{i=1}^{I-j-1} C_{i,j}/\sigma_j^2}. \]
□

Remarks 9.4.
Lemma 9.2 and Corollary 9.3 are the key for the derivation of the reserves. The result says that in the gamma-gamma Bayesian CL model the Bayesian CL factors should be estimated by a credibility weighted average between the classical CL estimate $\widehat{f}_j^{\mathrm{CL}}$ and the prior estimate $f_j$ with credibility weight $\alpha_j \in (0,1)$. Moreover, for $j \geq 0$, we can consider the product of these estimates $\widehat{f}_j^{\mathrm{BCL}}$ due to posterior independence; this will be highlighted in more detail in Theorem 9.5, below.

The parameter $\gamma_j$ describes the degree of information contained in the prior distribution of $\Theta_j$. If we let $\gamma_j \downarrow 1$ (non-informative priors) we obtain $\alpha_j \uparrow 1$. In this case we give full credibility to the observation based estimate, i.e. we have $\widehat{f}_j^{\mathrm{BCL}} \to \widehat{f}_j^{\mathrm{CL}}$ in the non-informative limit $\gamma_j \downarrow 1$.

Observe that the individual development factors $F_{i,j+1} = C_{i,j+1}/C_{i,j}$ satisfy the Bühlmann-Straub (BS) model, see Model 8.13: conditionally given $\Theta_j$,
\[ E\left[F_{i,j+1}\,\middle|\,\Theta_j\right] = \mu(\Theta_j) = \Theta_j^{-1} \qquad \text{and} \qquad \mathrm{Var}\left(F_{i,j+1}\,\middle|\,\Theta_j, C_{i,j}\right) = \frac{\sigma_j^2(\Theta_j)}{C_{i,j}}, \tag{9.13} \]
with variance function $\sigma_j^2(\theta) = \sigma_j^2\, \theta^{-2}$. Thus, $C_{i,j}$ plays the role of the volume measure and $\sigma_j^2(\cdot)$ plays the role of the variance function. We calculate, see (8.4) and (8.5),
\[ \tau_j^2 = \mathrm{Var}(\mu(\Theta_j)) = f_j^2\, \frac{1}{\gamma_j - 2}, \qquad \varsigma_j^2 = E[\sigma_j^2(\Theta_j)] = E\left[\sigma_j^2\, \Theta_j^{-2}\right] = \sigma_j^2\, f_j^2\, \frac{\gamma_j - 1}{\gamma_j - 2}. \tag{9.14} \]
Therefore, Corollary 9.3 provides the classical BS formula and the structure of the credibility weights is given by, see Theorem 8.17 and (9.12),
\[ \alpha_j = \frac{\sum_{i=1}^{I-j-1} C_{i,j}}{\sum_{i=1}^{I-j-1} C_{i,j} + \varsigma_j^2/\tau_j^2}, \qquad \text{with } \frac{\varsigma_j^2}{\tau_j^2} = \sigma_j^2\, (\gamma_j - 1). \]
Note that the BS formula requires $\gamma_j > 2$, otherwise the credibility coefficient $\varsigma_j^2/\tau_j^2$ cannot be calculated. However, (9.12) is more general in this sense because the second prior moment of $\Theta_j^{-1}$ does not need to exist for Corollary 9.3.
Theorem 9.5. Under Model Assumptions 9.1, the Bayesian CL predictor for $C_{i,J}$ with $i + J > I$ is given by
\[ \widehat{C}_{i,J}^{\mathrm{BCL}} = E\left[C_{i,J}\,\middle|\,\mathcal{D}_I\right] = C_{i,I-i} \prod_{j=I-i}^{J-1} \widehat{f}_j^{\mathrm{BCL}}. \]
Proof. We use the conditional independence between different accident years, the conditional Markov property and the tower property to obtain
\[ \widehat{C}_{i,J}^{\mathrm{BCL}} = E\left[E\left[C_{i,J}\,\middle|\,C_{i,0}, \ldots, C_{i,I-i}, \Theta\right]\,\middle|\,\mathcal{D}_I\right] = C_{i,I-i}\, E\left[\prod_{j=I-i}^{J-1} \Theta_j^{-1}\,\middle|\,\mathcal{D}_I\right]. \]
Using the posterior independence of Lemma 9.2 and Corollary 9.3 proves the claim. □
Remark 9.6. Theorem 9.5 explains that our Model Assumptions 9.1 give the CL reserves if we let the prior distributions of $\Theta_j^{-1}$ become non-informative, i.e. for $\gamma_j \downarrow 1$, $j = I-i, \ldots, J-1$, we have
\[ \widehat{C}_{i,J}^{\mathrm{BCL}} \to \widehat{C}_{i,J}^{\mathrm{CL}}. \tag{9.15} \]
For this reason we can use the (non-informative prior) gamma-gamma Bayesian CL model as a stochastic representation of the CL algorithm (9.6). This analogy allows to study prediction uncertainty within Model Assumptions 9.1 for the CL algorithm in an asymptotic sense.
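The credibility structure of Corollary 9.3 and the non-informative limit of Remark 9.6 can be sketched as follows (our own illustration, for a single development column):

```python
def bayes_cl_factor(col_j, col_j1, sigma2_j, f_j, gamma_j):
    """f-hat_j^BCL = alpha_j * f-hat_j^CL + (1 - alpha_j) * f_j, Corollary 9.3.

    col_j, col_j1: observed C_{i,j} and C_{i,j+1} over accident years i.
    """
    f_cl = sum(col_j1) / sum(col_j)                              # (9.5)
    alpha = sum(col_j) / (sum(col_j) + sigma2_j * (gamma_j - 1)) # (9.12)
    return alpha * f_cl + (1 - alpha) * f_j
```

An informative prior ($\gamma_j$ large) pulls the factor towards the prior value $f_j$, while $\gamma_j \downarrow 1$ recovers the classical CL factor.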
For the conditional MSEP we obtain, see (9.10),
\[ \mathrm{msep}_{C_{i,J}|\mathcal{D}_I}\left(\widehat{C}_{i,J}^{\mathrm{BCL}}\right) = \mathrm{Var}\left(C_{i,J}\,\middle|\,\mathcal{D}_I\right) + \left(E\left[C_{i,J}\,\middle|\,\mathcal{D}_I\right] - \widehat{C}_{i,J}^{\mathrm{BCL}}\right)^2 = \mathrm{Var}\left(C_{i,J}\,\middle|\,\mathcal{D}_I\right). \]
This shows the optimality of the Bayesian CL predictor $\widehat{C}_{i,J}^{\mathrm{BCL}}$ within our model assumptions and there remains the calculation of the conditional variance of the ultimate claim $C_{i,J}$. We define (subject to being well-defined)
\[ \Gamma_j = \frac{\sigma_j^2}{\sigma_j^2\, (\gamma_j - 2) + \sum_{l=1}^{I-j-1} C_{l,j}}. \]
Theorem 9.7. Under Model Assumptions 9.1 the Bayesian CL predictor satisfies
for i > I J
J1
X
BCL
= Cbi,J
j2
no
j=Ii
J1
Y
BCL
Cbi,J
n=j
fbnBCL (1 + n )
2
J1
Y
(1 + j ) 1 ,
j=Ii
NL
In1
Cl,n /n2 > 2 for all I i n
under the additional assumption that n + l=1
J 1; otherwise the second moment is infinite. The aggregated conditional MSEP
is given by
msepP
C |DI
i i,J
BCL
Cbi,J
BCL
msepCi,J |DI Cbi,J
+2
BCL b BCL
Cl,J
Cbi,J
J1
Y
(1 + j ) 1 ,
j=Ii
i<l
X
i
!
X
Ci,J DI
=
Cov (Ci,J , Cl,J | DI ) .
i,l
We calculate these covariance terms. Applying the tower property for conditional expectations
implies for i, l > I - J

    Cov( C_{i,J}, C_{l,J} | D_I )
      = E\big[ Cov( C_{i,J}, C_{l,J} | D_I, \Theta ) \big| D_I \big]
        + Cov\big( E[C_{i,J}|D_I,\Theta], E[C_{l,J}|D_I,\Theta] \big| D_I \big).   (9.16)

For i \neq l the first term vanishes by the conditional independence of different accident years. For
i = l it equals E[ Var(C_{i,J}|D_I,\Theta) | D_I ], and conditionally given \Theta we have

    Var( C_{i,J} | D_I, \Theta )
      = \sigma_{J-1}^2 \Theta_{J-1}^{-2} \, C_{i,I-i} \prod_{j=I-i}^{J-2} \Theta_j^{-1}
        + \Theta_{J-1}^{-2} \, Var( C_{i,J-1} | D_I, \Theta ).

Hence, we obtain the well-known recursive formula for the process variance in the CL method
(see Section 3.2.2 in Wüthrich-Merz [100]). By iterating the recursion we find for given \Theta (see
also Lemma 3.6 in Wüthrich-Merz [100])

    Var( C_{i,J} | D_I, \Theta )
      = C_{i,I-i} \sum_{j=I-i}^{J-1} \prod_{m=I-i}^{j-1} \Theta_m^{-1} \; \sigma_j^2 \, \Theta_j^{-2} \prod_{n=j+1}^{J-1} \Theta_n^{-2},   (9.17)
where empty products are set equal to 1. Applying the operator E[\,\cdot\,|D_I] to (9.17) and using the
posterior independence of the random variables \Theta_j we obtain

    E\big[ Var( C_{i,J} | D_I, \Theta ) \big| D_I \big]
      = C_{i,I-i} \sum_{j=I-i}^{J-1} \prod_{m=I-i}^{j-1} \hat{f}_m^{BCL} \; \sigma_j^2 \, E\Big[ \prod_{n=j}^{J-1} \Theta_n^{-2} \Big| D_I \Big]

      = C_{i,I-i} \sum_{j=I-i}^{J-1} \prod_{m=I-i}^{j-1} \hat{f}_m^{BCL} \; \sigma_j^2 \prod_{n=j}^{J-1} \big( \hat{f}_n^{BCL} \big)^2 \,
        \frac{ \gamma_n - 1 + \sum_{l=1}^{I-n-1} C_{l,n}/\sigma_n^2 }{ \gamma_n - 2 + \sum_{l=1}^{I-n-1} C_{l,n}/\sigma_n^2 }

      = \hat{C}_{i,J}^{BCL} \sum_{j=I-i}^{J-1} \sigma_j^2 \prod_{n=j}^{J-1} \hat{f}_n^{BCL} (1 + \Gamma_n).

Note that in the second step we need \gamma_n + \sum_{l=1}^{I-n-1} C_{l,n}/\sigma_n^2 > 2 for all I-i \le n \le J-1 so
that these conditional expectations are finite. For the second term in (9.16) we have, w.l.o.g. we
assume l > i > I - J,
    Cov\Big( C_{i,I-i} \prod_{j=I-i}^{J-1} \Theta_j^{-1}, \; C_{l,I-l} \prod_{j=I-l}^{J-1} \Theta_j^{-1} \Big| D_I \Big)

      = C_{i,I-i} C_{l,I-l} \bigg( E\Big[ \prod_{j=I-l}^{I-i-1} \Theta_j^{-1} \prod_{j=I-i}^{J-1} \Theta_j^{-2} \Big| D_I \Big]
        - E\Big[ \prod_{j=I-i}^{J-1} \Theta_j^{-1} \Big| D_I \Big] E\Big[ \prod_{j=I-l}^{J-1} \Theta_j^{-1} \Big| D_I \Big] \bigg)

      = \hat{C}_{i,J}^{BCL} \hat{C}_{l,J}^{BCL} \Big[ \prod_{j=I-i}^{J-1} (1 + \Gamma_j) - 1 \Big].

Collecting all terms proves the claim. \square
If the priors are sufficiently non-informative relative to the observations, in the sense that

    \sigma_j^2 \gamma_j \ll \sum_{l=1}^{I-j-1} C_{l,j},   (9.18)

we obtain 0 \le \Gamma_j \ll 1.
Note that assumption (9.18) is stronger than \gamma_j + \sum_{l=1}^{I-j-1} C_{l,j}/\sigma_j^2 > 2, which provides
finiteness of the conditional variances in Theorem 9.7. Assumption (9.18) for all j then
implies for the first term in Theorem 9.7
    \hat{C}_{i,J}^{BCL} \sum_{j=I-i}^{J-1} \sigma_j^2 \prod_{n=j}^{J-1} \hat{f}_n^{BCL} (1 + \Gamma_n)
      \approx \hat{C}_{i,J}^{BCL} \sum_{j=I-i}^{J-1} \sigma_j^2 \prod_{n=j}^{J-1} \hat{f}_n^{BCL}
      = \big( \hat{C}_{i,J}^{BCL} \big)^2 \sum_{j=I-i}^{J-1} \frac{\sigma_j^2}{\hat{C}_{i,j}^{BCL}}.
In fact, the right-hand side is a lower bound for the left-hand side for any \gamma_j > 1
(where the second posterior moment exists). For the second term in Theorem 9.7
we have under (9.18)
    \big( \hat{C}_{i,J}^{BCL} \big)^2 \Big[ \prod_{j=I-i}^{J-1} (1 + \Gamma_j) - 1 \Big]
      \approx \big( \hat{C}_{i,J}^{BCL} \big)^2 \sum_{j=I-i}^{J-1} \Gamma_j.

In fact, the right-hand side is again a lower bound for the left-hand side for any
\gamma_j > 1.
This implies that under assumption (9.18) for all I-i \le j \le J-1 we have the
approximation

    msep_{C_{i,J}|D_I}(\hat{C}_{i,J}^{BCL})
      \approx \big( \hat{C}_{i,J}^{BCL} \big)^2 \sum_{j=I-i}^{J-1} \bigg[ \frac{\sigma_j^2}{\hat{C}_{i,j}^{BCL}} + \Gamma_j \bigg],   (9.19)
where the right-hand side is a lower bound for the left-hand side for any \gamma_j > 1.
Since the latter formula applies to any \gamma_j > 1 (it can even be made uniform in
\gamma_j \to 1) we can consider its non-informative limit \gamma_j \to 1. In this case the Bayesian
CL predictor converges to the classical CL predictor, see (9.15). For the \Gamma_j-terms
we obtain in the non-informative limit

    \lim_{\gamma_j \to 1} \Gamma_j
      = \lim_{\gamma_j \to 1} \frac{\sigma_j^2}{\sigma_j^2(\gamma_j - 2) + \sum_{l=1}^{I-j-1} C_{l,j}}
      = \frac{\sigma_j^2}{\sum_{l=1}^{I-j-1} C_{l,j} - \sigma_j^2}
      \approx \frac{\sigma_j^2}{\sum_{l=1}^{I-j-1} C_{l,j}},   (9.20)

where in the last step we have again used (9.18). In fact, the last approximation is
again a lower bound. This motivates in the non-informative prior case \gamma_j \to 1 the
following approximation and lower bound to (9.19) and Theorem 9.7, respectively,
    msep^{Mack}_{C_{i,J}|D_I}(\hat{C}_{i,J}^{CL})
      = \big( \hat{C}_{i,J}^{CL} \big)^2 \sum_{j=I-i}^{J-1} \bigg[ \frac{s_j^2/(\hat{f}_j^{CL})^2}{\hat{C}_{i,j}^{CL}}
        + \frac{s_j^2/(\hat{f}_j^{CL})^2}{\sum_{l=1}^{I-j-1} C_{l,j}} \bigg],   (9.21)

where we set \sigma_j^2 = s_j^2/(\hat{f}_j^{CL})^2. The conditional MSEP formula (9.21) is exactly the
famous Mack formula [73]. We emphasize important remarks and differences:
Remarks 9.8.

- This implies that Mack's model and our model are different, and the derivations of the corresponding conditional MSEP formulas are different. However, we have proved that under assumption (9.18) we expect the numerical results of the two approaches to be very close. This will be justified in Example 9.9 below. Assumption (9.18) is fulfilled in many applied data sets and, therefore, it is a relief that both methods come to similar conclusions about prediction uncertainties in many applied situations.

- We use variance parameters \sigma_j^2, Mack [73] uses variance parameters s_j^2. Their relationship is justified by identity (9.11), see also (9.14).

- The blue terms are the process uncertainty terms and the red terms are the parameter estimation error terms in Mack's formula. For more interpretation we refer to Section 9.4 below, and to Merz-Wüthrich [80].
For aggregated accident years, one has under assumption (9.18) the approximation and
lower bound to Theorem 9.7 given by

    msep^{Mack}_{\sum_i C_{i,J}|D_I}\Big( \sum_i \hat{C}_{i,J}^{CL} \Big)
      = \sum_i msep^{Mack}_{C_{i,J}|D_I}(\hat{C}_{i,J}^{CL})
        + 2 \sum_{i<l} \hat{C}_{i,J}^{CL} \hat{C}_{l,J}^{CL} \sum_{j=I-i}^{J-1} \frac{s_j^2/(\hat{f}_j^{CL})^2}{\sum_{n=1}^{I-j-1} C_{n,j}}.   (9.22)

Again, the red term describes the parameter estimation error in Mack's formula
and for interpretation we refer to Merz-Wüthrich [80].
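As a concrete illustration, the following sketch evaluates Mack's formula (9.21) and the aggregation (9.22) on a small toy triangle (all numbers invented, not the notes' data; the tail variance uses Mack's usual extrapolation rule, an ad-hoc choice when only one development pair is available).

```python
# Sketch (toy data): Mack's conditional MSEP (9.21) and aggregation (9.22).
# s_j^2 are Mack's variance estimates; in the notes sigma_j^2 = s_j^2/(f_j^CL)^2.
C = [
    [1000, 1800, 1980, 2020],   # accident year 1, development years 0..3
    [1100, 1930, 2150],
    [1200, 2100],
    [1300],
]
I, J = 4, 3

col = lambda j, n: [C[i][j] for i in range(n)]          # first n entries of column j
f = [sum(col(j + 1, I - j - 1)) / sum(col(j, I - j - 1)) for j in range(J)]

s2 = []
for j in range(J):
    n = I - j - 1
    if n >= 2:                          # Mack's unbiased variance estimator
        s2.append(sum(C[i][j] * (C[i][j + 1] / C[i][j] - f[j]) ** 2
                      for i in range(n)) / (n - 1))
    else:                               # Mack's extrapolation for the last period
        s2.append(min(s2[-1] ** 2 / s2[-2], s2[-1], s2[-2]))

chat, msep = {}, {}
for i in range(2, I + 1):               # accident years with open development
    c = C[i - 1][I - i]
    cs = {I - i: c}
    for j in range(I - i, J):
        c *= f[j]
        cs[j + 1] = c                   # CL projections C_hat_{i,j}
    chat[i] = c
    # (9.21): process variance term + parameter estimation error term
    msep[i] = c ** 2 * sum(s2[j] / f[j] ** 2 *
                           (1 / cs[j] + 1 / sum(col(j, I - j - 1)))
                           for j in range(I - i, J))

# (9.22): add the covariance cross terms for accident years i < l
cross = 2 * sum(chat[i] * chat[l] *
                sum(s2[j] / f[j] ** 2 / sum(col(j, I - j - 1))
                    for j in range(I - i, J))
                for i in chat for l in chat if i < l)
total = sum(msep.values()) + cross
print({i: round(v ** 0.5, 1) for i, v in msep.items()}, round(total ** 0.5, 1))
```

The aggregated square-rooted MSEP exceeds the root of the summed single-year MSEPs because the cross terms in (9.22) are positive.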
The standard deviation parameters are estimated by, see also (9.14),

    \hat{\sigma}_j = \hat{s}_j / \hat{f}_j^{CL}.   (9.24)

This provides the following estimates for development periods j = 0, \ldots, 8:

    j                0       1      2      3     4     5     6     7     8
    \hat{s}_j      135.25  33.80  15.76  19.85  9.34  2.00  0.82  0.22  0.06
    \hat{\sigma}_j  90.62  31.36  15.41  19.56  9.27  1.99  0.82  0.22  0.06

These parameters provide the results for the square-rooted conditional MSEPs
given in Table 9.8. We observe that for the total claims reserves the 1 standard
deviation confidence bounds are about 7.7% of the total claims reserves. These
confidence bounds should now be put in relation to the point estimator in the
balance sheet of Table 9.1 for the claims reserves.
Version April 14, 2016, M.V. Wthrich, ETH Zurich
    accident         CL reserves    msep^{1/2}   msep^{1/2}   in %
    year i           \hat{R}_i^{CL}   Bayes        Mack       reserves
    1                        0             0            0
    2                    15126           267          267      1.8%
    3                    26257           914          914      3.5%
    4                    34538          3058         3058      8.9%
    5                    85302          7628         7628      8.9%
    6                   156494         33341        33341     21.3%
    7                   286121         73467        73467     25.7%
    8                   449167         85399        85398     19.0%
    9                  1043242        134338       134337     12.9%
    10                 3950815        410850       410817     10.4%
    covariance^{1/2}                  116811       116810
    total              6047061        462990       462960      7.7%

Table 9.8: Claims reserves and prediction uncertainty in the non-informative priors
gamma-gamma Bayesian CL model (see Theorem 9.7) and Mack's formula (9.21)-(9.22).
We also observe that the exact formula given by Theorem 9.7 with non-informative
priors and Mack's formula (9.21)-(9.22) are very close, i.e. 462990 versus 462960.
This observation holds true for many typical non-life insurance data sets and it
says that both models (though being different) come to the same conclusion about
prediction uncertainty.

9.3.2 Over-dispersed Poisson model

Model Assumptions 9.10 (over-dispersed Poisson model). The incremental payments
X_{i,j} are independent with X_{i,j}/\phi \sim Poi(\mu_i \gamma_j / \phi), for given parameters \mu_i > 0,
\gamma_j > 0 and dispersion \phi > 0.

Observe that

    E[X_{i,j}] = \mu_i \gamma_j,        Var(X_{i,j}) = \phi \, \mu_i \gamma_j.
The parameters are only unique up to scaling, therefore one imposes a side constraint,
either

    \mu_1 = 1    or    \sum_{j=0}^{J} \gamma_j = 1.

The first option is more convenient in the application of GLM methods, the second
option gives an explicit meaning to the pattern (\gamma_j)_j, namely, it corresponds to the
cash flow pattern.

The best-estimate reserves at time I are given by

    R = \sum_{(i,j) \in \mathcal{I}_I^c} E[ X_{i,j} | D_I ] = \sum_{(i,j) \in \mathcal{I}_I^c} \mu_i \gamma_j.
Hence, we need to estimate the parameters \mu_i and \gamma_j. This is done with MLE
methods. We assume J + 1 = I, which simplifies the notation. Having observations D_I
allows us to estimate the parameters. The log-likelihood function for \mu = (\mu_1, \ldots, \mu_I),
\gamma = (\gamma_0, \ldots, \gamma_J) and \phi is given by

    \ell_{D_I}(\mu, \gamma, \phi)
      = \sum_{(i,j) \in \mathcal{I}_I} \Big[ -\frac{\mu_i \gamma_j}{\phi}
        + \frac{X_{i,j}}{\phi} \log \frac{\mu_i \gamma_j}{\phi}
        - \log\big( (X_{i,j}/\phi)! \big) \Big].
Calculating the derivatives w.r.t. \mu and \gamma and setting them equal to zero implies
that we need to solve the following system of equations to find the MLEs

    \gamma_j \sum_{i=1}^{I-j} \mu_i = \sum_{i=1}^{I-j} X_{i,j}    for all j = 0, \ldots, J,   (9.25)

    \mu_i \sum_{j=0}^{I-i} \gamma_j = \sum_{j=0}^{I-i} X_{i,j}    for all i = 1, \ldots, I,   (9.26)

w.l.o.g. under the side constraint \sum_{j=0}^{J} \gamma_j = 1. The remarkable fact about the MLE
system (9.25)-(9.26) is that it can be solved explicitly and that it provides the CL
reserves. Moreover, the constant dispersion parameter \phi cancels and is not relevant
for estimating the reserves.
Theorem 9.11. Under Model Assumptions 9.10, the MLEs for \mu and \gamma under the
side constraint \sum_{j=0}^{J} \gamma_j = 1, given D_I, are given by

    \hat{\mu}_i^{MLE} = \hat{C}_{i,J}^{CL}    and
    \hat{\gamma}_j^{MLE} = \prod_{k=j}^{J-1} \frac{1}{\hat{f}_k^{CL}} \Big( 1 - \frac{1}{\hat{f}_{j-1}^{CL}} \Big)

(with the convention 1/\hat{f}_{-1}^{CL} = 0). In particular, this implies

    \hat{R}_i^{ODP} = \hat{\mu}_i^{MLE} \sum_{j=I-i+1}^{J} \hat{\gamma}_j^{MLE} = \hat{R}_i^{CL}.

Proof. For the proof we refer to Lemma 2.16, Corollary 2.18 and Remarks 2.19 in Wüthrich-Merz
[100]. Basically, the proof goes by induction along the last observed diagonal in D_I. \square
Remarks 9.12.

- Theorem 9.11 goes back to Hachemeister-Stanard [58], Kremer [67] and Mack [72].

- Theorem 9.11 explains the popularity of the ODP model for claims reserving because it provides exactly the CL reserves. Thus, we have found a second stochastic model (besides the non-informative prior gamma-gamma Bayesian CL model) that can be used to explain the CL algorithm from a stochastic point of view.

- In this ODP model we can also give an estimate for the conditional MSEP. This estimate uses the fact that MLEs can be approximated by standard Gaussian asymptotic results for GLMs. For details we refer to England-Verrall [42] and Wüthrich-Merz [100], Section 6.4.3. Another way to assess prediction uncertainty is to use bootstrap simulation.

- The ODP framework also allows us to give an estimate for the conditional MSEP in the BF method, and it justifies the choice \hat{\beta}_j^{CL} = \sum_{k=0}^{j} \hat{\gamma}_k^{MLE}. For details we refer to Alai et al. [3, 4].
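The ODP-CL equivalence of Theorem 9.11 can be verified numerically. The sketch below (toy triangle, invented numbers) builds \hat{\mu}_i = \hat{C}_{i,J}^{CL} and the pattern \hat{\gamma}_j from the CL factors and checks that the MLE equations (9.25)-(9.26) and the side constraint hold exactly.

```python
# Sketch (toy data): the ODP MLE system (9.25)-(9.26) is solved by the CL
# reserves (Theorem 9.11): mu_i = C^CL_{i,J}, gamma_j built from CL factors.
C = [
    [1000, 1800, 1980, 2020],
    [1100, 1930, 2150],
    [1200, 2100],
    [1300],
]
I, J = 4, 3  # rows i = 0..I-1 observed up to column I-1-i; columns j = 0..J

f = [sum(C[i][j + 1] for i in range(I - j - 1)) /
     sum(C[i][j] for i in range(I - j - 1)) for j in range(J)]

# P[j] = prod_{k=j}^{J-1} 1/f_k, so gamma_0 = P[0] and gamma_j = P[j] - P[j-1]
P = [1.0] * (J + 1)
for j in range(J - 1, -1, -1):
    P[j] = P[j + 1] / f[j]
gamma = [P[0]] + [P[j] - P[j - 1] for j in range(1, J + 1)]
assert abs(sum(gamma) - 1.0) < 1e-12       # side constraint sum_j gamma_j = 1

# mu_i = CL-projected ultimate claim of row i
mu = []
for i in range(I):
    c = C[i][I - 1 - i]
    for j in range(I - 1 - i, J):
        c *= f[j]
    mu.append(c)

X = lambda i, j: C[i][j] - (C[i][j - 1] if j > 0 else 0)   # increments

# (9.25): gamma_j * sum_i mu_i = sum_i X_{i,j}, over the observed rows
for j in range(J + 1):
    rows = range(I - j) if j > 0 else range(I)
    lhs = gamma[j] * sum(mu[i] for i in rows)
    rhs = sum(X(i, j) for i in rows)
    assert abs(lhs - rhs) < 1e-6 * max(1.0, rhs)

# (9.26): mu_i * sum_{j <= I-1-i} gamma_j = C_{i,I-1-i}
for i in range(I):
    assert abs(mu[i] * sum(gamma[: I - i]) - C[i][I - 1 - i]) < 1e-6
print("ODP MLE system reproduces CL:", [round(m, 1) for m in mu])
```

The dispersion \phi does not appear anywhere, which mirrors the remark that it cancels for reserve estimation.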
9.4 Claims development result

9.4.1 Definition of the claims development result

In the previous sections we have given a static point of view of claims reserving.
However, claims reserving should be understood as a dynamic process, where more
and more information becomes available over time and prediction is continuously
adapted to this new knowledge. This is also the viewpoint that needs to be taken
for solvency considerations.
We consider the run-off situation, and thus the last accident year I is kept fixed.
In the run-off situation the flow of information (9.2) is changed to (we do a slight
abuse of notation here)

    D_t = \{ X_{i,j};\; i + j \le t,\; 1 \le i \le I,\; 0 \le j \le J \}.

This generates a filtration denoted by (\mathcal{D}_t)_{t \ge 0} on (\Omega, \mathcal{F}, P) that describes the flow
of information (we abbreviate \mathcal{D}_t = \sigma(D_t)). At time t \ge I the ultimate claim of
accident year i is predicted by the best-estimate

    \hat{C}_{i,J}^{(t)} = E[ C_{i,J} | D_t ].   (9.27)

This is the predictor that minimizes the conditional MSEP at time t. The best-estimate
reserves at time t \ge I for accident year i > t - J are provided by

    R_i^{(t)} = \hat{C}_{i,J}^{(t)} - C_{i,t-i}.   (9.28)
The claims development result (CDR) of accident year i for accounting year (t, t+1]
is then defined by

    CDR_{i,t+1} = R_i^{(t)} - \big( X_{i,t-i+1} + R_i^{(t+1)} \big) = \hat{C}_{i,J}^{(t)} - \hat{C}_{i,J}^{(t+1)}.   (9.29)
Corollary 9.13. Assume C_{i,J} has finite first moment and i + J > t \ge I. Then we
have

    E[ CDR_{i,t+1} | D_t ] = 0.

This corollary explains that on average we neither expect losses nor gains in the
claims development result, but the prediction is just unbiased. Note that (9.27)
implies for the conditional MSEP of the CDR prediction at time t = I

    msep_{CDR_{i,I+1}|D_I}(0) = E\big[ (CDR_{i,I+1} - 0)^2 \big| D_I \big]
      = Var\big( \hat{C}_{i,J}^{(I+1)} \big| D_I \big).   (9.30)
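A one-line numerical illustration of (9.29) (all numbers invented): the CDR equals the prior reserves minus the sum of the observed payments and the posterior reserves, and equivalently the change in the ultimate prediction.

```python
# Sketch (invented numbers): two equivalent ways to compute CDR_{i,t+1}.
C_ti    = 5000.0  # C_{i,t-i}: cumulative claims of accident year i at time t
X_next  = 800.0   # X_{i,t-i+1}: payments in accounting year (t, t+1]
Chat_t  = 6200.0  # ultimate prediction at time t
Chat_t1 = 6150.0  # updated ultimate prediction at time t+1

R_t  = Chat_t - C_ti                   # best-estimate reserves at time t  (9.28)
R_t1 = Chat_t1 - (C_ti + X_next)       # best-estimate reserves at time t+1
cdr  = R_t - (X_next + R_t1)           # claims development result         (9.29)
assert cdr == Chat_t - Chat_t1         # = change in the ultimate prediction
print(cdr)  # prints 50.0: a gain, since the predicted ultimate decreased
```

Corollary 9.13 says that, conditionally on the information at time t, this quantity has mean zero.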
9.4.2 One-year run-off uncertainty

We aim to study the volatility of this one-period update. We do this in the gamma-gamma
Bayesian CL Model 9.1.

Firstly, observe that Lemma 9.2 easily extends to the following lemma (the proof
is an exercise).
Lemma 9.14. Choose t \ge I. Under Model Assumptions 9.1, the posteriors of
\Theta_0, \ldots, \Theta_{J-1} are independent, conditionally given D_t, with

    \Theta_j | D_t \sim \Gamma\bigg( \gamma_j + \sum_{i=1}^{(t-j-1) \wedge I} \frac{C_{i,j}}{\sigma_j^2},\;
      f_j(\gamma_j - 1) + \sum_{i=1}^{(t-j-1) \wedge I} \frac{C_{i,j+1}}{\sigma_j^2} \bigg).

This provides the Bayesian CL predictor at time t \ge I

    \hat{C}_{i,J}^{(t)} = E[ C_{i,J} | D_t ] = C_{i,t-i} \prod_{j=t-i}^{J-1} \hat{f}_j^{(t)},

with posterior expected Bayesian CL factors given by \hat{f}_j^{(t)} = E[ \Theta_j^{-1} | D_t ].
Here, we slightly change notation, the upper index now indicates the time point t
of the available information D_t.

Next we exploit the recursive structure of credibility estimators, see for instance
Corollary 8.6. This holds true in quite some generality; for the current exposition
we restrict to t \in \{I, I+1\} because these are the only indexes of interest for the
analysis of (9.30). For t = I + 1 and j \ge 0 we have (in the last step we use the
calculation of the proof of Corollary 9.3)
    \hat{f}_j^{(I+1)} = E\big[ \Theta_j^{-1} \big| D_{I+1} \big]
      = \frac{ f_j(\gamma_j - 1) + \sum_{i=1}^{I-j} C_{i,j+1}/\sigma_j^2 }{ \gamma_j - 1 + \sum_{i=1}^{I-j} C_{i,j}/\sigma_j^2 }

      = \frac{ C_{I-j,j+1}/\sigma_j^2 }{ \gamma_j - 1 + \sum_{i=1}^{I-j} C_{i,j}/\sigma_j^2 }
        + \frac{ f_j(\gamma_j - 1) + \sum_{i=1}^{I-j-1} C_{i,j+1}/\sigma_j^2 }{ \gamma_j - 1 + \sum_{i=1}^{I-j} C_{i,j}/\sigma_j^2 }

      = \beta_j^{(I)} \, \frac{C_{I-j,j+1}}{C_{I-j,j}} + \big( 1 - \beta_j^{(I)} \big) \hat{f}_j^{(I)},

with credibility weight

    \beta_j^{(I)} = \frac{ C_{I-j,j} }{ \sigma_j^2 (\gamma_j - 1) + \sum_{i=1}^{I-j} C_{i,j} } \in (0, 1).
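This one-step credibility update can be checked numerically: updating with the new diagonal observation via \beta_j^{(I)} must reproduce the posterior mean recomputed from scratch with the enlarged sample. A sketch with invented values:

```python
# Sketch (toy values): credibility update of the Bayesian CL factor,
#   f^{(I+1)} = beta * (C_new_next / C_new) + (1 - beta) * f^{(I)},
# versus direct recomputation of the posterior mean at time I+1.
f_prior, gamma, sigma2 = 1.8, 1.5, 100.0
Cj  = [1000.0, 1100.0, 1200.0]       # C_{i,j} observed at time I
Cj1 = [1800.0, 1930.0, 2100.0]       # C_{i,j+1} observed at time I
C_new, C_new_next = 1300.0, 2280.0   # new development pair observed at time I+1

def post_mean(cj, cj1):
    # E[1/Theta_j | data] for the Gamma(gamma, f(gamma-1)) prior (Lemma 9.2 form)
    return ((f_prior * (gamma - 1) + sum(cj1) / sigma2)
            / (gamma - 1 + sum(cj) / sigma2))

f_I   = post_mean(Cj, Cj1)
beta  = C_new / (sigma2 * (gamma - 1) + sum(Cj) + C_new)
f_rec = beta * (C_new_next / C_new) + (1 - beta) * f_I   # recursive update
f_dir = post_mean(Cj + [C_new], Cj1 + [C_new_next])      # direct recomputation
assert abs(f_rec - f_dir) < 1e-9
print(f_rec)
```

The identity holds exactly; the assertion tolerance only covers floating-point rounding.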
The important observation is that there is only one random term in \hat{f}_j^{(I+1)}, conditionally
given D_I, namely the new diagonal observation C_{I-j,j+1}. This is crucial in the calculation
of the conditional MSEP of the claims development result prediction. We start with a
technical lemma.
Lemma 9.15. Under Model Assumptions 9.1 we have for i > I - J

    Var( C_{i,I-i+1} | D_I ) = \big( \hat{C}_{i,I-i+1}^{(I)} \big)^2 \, \big( \beta_{I-i}^{(I)} \big)^{-1} \Gamma_{I-i},

provided \gamma_{I-i} + \sum_{l=1}^{i-1} C_{l,I-i}/\sigma_{I-i}^2 > 2; otherwise the
conditional variance is infinite.
Proof. In the first step we apply Theorem 9.7 to the one-period ahead prediction (i.e. with J
replaced by I - i + 1) and then we derive

    Var( C_{i,I-i+1} | D_I )
      = C_{i,I-i} \, \sigma_{I-i}^2 \big( \hat{f}_{I-i}^{(I)} \big)^2 (1 + \Gamma_{I-i})
        + C_{i,I-i}^2 \big( \hat{f}_{I-i}^{(I)} \big)^2 \Gamma_{I-i}

      = \big( \hat{C}_{i,I-i+1}^{(I)} \big)^2 \bigg[ \frac{\sigma_{I-i}^2}{C_{i,I-i}} (1 + \Gamma_{I-i}) + \Gamma_{I-i} \bigg]
      = \big( \hat{C}_{i,I-i+1}^{(I)} \big)^2 \bigg[ \Big( \frac{\sigma_{I-i}^2}{C_{i,I-i}} + 1 \Big) (1 + \Gamma_{I-i}) - 1 \bigg].
It remains to show that the last bracket equals (\beta_{I-i}^{(I)})^{-1} \Gamma_{I-i}. We calculate

    \Big( \frac{\sigma_{I-i}^2}{C_{i,I-i}} + 1 \Big) (1 + \Gamma_{I-i}) - 1
      = \Gamma_{I-i} + \frac{\sigma_{I-i}^2}{C_{i,I-i}} (1 + \Gamma_{I-i})

      = \frac{ \sigma_{I-i}^2 }{ \sigma_{I-i}^2 (\gamma_{I-i} - 2) + \sum_{l=1}^{i-1} C_{l,I-i} }
        + \frac{ \sigma_{I-i}^2 }{ C_{i,I-i} } \,
          \frac{ \sigma_{I-i}^2 (\gamma_{I-i} - 1) + \sum_{l=1}^{i-1} C_{l,I-i} }{ \sigma_{I-i}^2 (\gamma_{I-i} - 2) + \sum_{l=1}^{i-1} C_{l,I-i} }

      = \frac{ \sigma_{I-i}^2 \big( \sigma_{I-i}^2 (\gamma_{I-i} - 1) + \sum_{l=1}^{i} C_{l,I-i} \big) }
             { C_{i,I-i} \big( \sigma_{I-i}^2 (\gamma_{I-i} - 2) + \sum_{l=1}^{i-1} C_{l,I-i} \big) }
      = \big( \beta_{I-i}^{(I)} \big)^{-1} \Gamma_{I-i}. \qquad \square
Theorem 9.16. Under Model Assumptions 9.1 the Bayesian CL predictor satisfies for
i > I - J

    msep_{CDR_{i,I+1}|D_I}(0)
      = \big( \hat{C}_{i,J}^{(I)} \big)^2 \bigg[ \Big( 1 + \big( \beta_{I-i}^{(I)} \big)^{-1} \Gamma_{I-i} \Big)
        \prod_{j=I-i+1}^{J-1} \Big( 1 + \beta_j^{(I)} \Gamma_j \Big) - 1 \bigg],

and for aggregated accident years

    msep_{\sum_i CDR_{i,I+1}|D_I}(0)
      = \sum_i msep_{CDR_{i,I+1}|D_I}(0)
        + 2 \sum_{i<l} \hat{C}_{i,J}^{(I)} \hat{C}_{l,J}^{(I)}
          \bigg[ (1 + \Gamma_{I-i}) \prod_{j=I-i+1}^{J-1} \Big( 1 + \beta_j^{(I)} \Gamma_j \Big) - 1 \bigg].
Proof. By (9.30) and its aggregated analogue we need to calculate

    Var\big( \hat{C}_{i,J}^{(I+1)} \big| D_I \big)    and
    msep_{\sum_i CDR_{i,I+1}|D_I}(0) = \sum_{i,l} Cov\big( \hat{C}_{i,J}^{(I+1)}, \hat{C}_{l,J}^{(I+1)} \big| D_I \big).

Note that

    \hat{C}_{i,J}^{(I+1)} = C_{i,I-i+1} \prod_{j=I-i+1}^{J-1} \hat{f}_j^{(I+1)}
      = C_{i,I-i+1} \prod_{j=I-i+1}^{J-1} \bigg[ \beta_j^{(I)} \frac{C_{I-j,j+1}}{C_{I-j,j}}
        + \big( 1 - \beta_j^{(I)} \big) \hat{f}_j^{(I)} \bigg].

The only random terms under the measure P[\,\cdot\,|D_I] are C_{i,I-i+1}, C_{i-1,I-i+2}, \ldots, C_{I-J+1,J}. All
these random variables belong to different accident years i and to different development periods
j. Therefore, they are independent given D_I; this follows from Model Assumptions 9.1 and
Lemma 9.2. Moreover, we have the following unbiasedness of successive estimations (use the
tower property)

    E\bigg[ \beta_j^{(I)} \frac{C_{I-j,j+1}}{C_{I-j,j}} + \big( 1 - \beta_j^{(I)} \big) \hat{f}_j^{(I)} \bigg| D_I \bigg]
      = E\big[ \hat{f}_j^{(I+1)} \big| D_I \big] = \hat{f}_j^{(I)}.
In the first step we decouple the covariance as follows

    Cov\big( \hat{C}_{i,J}^{(I+1)}, \hat{C}_{l,J}^{(I+1)} \big| D_I \big)
      = E\big[ \hat{C}_{i,J}^{(I+1)} \hat{C}_{l,J}^{(I+1)} \big| D_I \big] - \hat{C}_{i,J}^{(I)} \hat{C}_{l,J}^{(I)},

with

    E\big[ \hat{C}_{i,J}^{(I+1)} \hat{C}_{l,J}^{(I+1)} \big| D_I \big]
      = E\bigg[ C_{i,I-i+1} \prod_{j=I-i+1}^{J-1} \hat{f}_j^{(I+1)} \;
        C_{l,I-l+1} \prod_{m=I-l+1}^{J-1} \hat{f}_m^{(I+1)} \bigg| D_I \bigg].
We first treat the variance case i = l. In that case we have, using conditional independence,

    E\Big[ \big( \hat{C}_{i,J}^{(I+1)} \big)^2 \Big| D_I \Big]
      = E\big[ (C_{i,I-i+1})^2 \big| D_I \big]
        \prod_{j=I-i+1}^{J-1} E\bigg[ \bigg( \beta_j^{(I)} \frac{C_{I-j,j+1}}{C_{I-j,j}}
          + \big( 1 - \beta_j^{(I)} \big) \hat{f}_j^{(I)} \bigg)^2 \bigg| D_I \bigg],

which allows us to calculate each term individually.
Unbiasedness and Lemma 9.15 for i = I - j imply for these individual terms

    E\Big[ \big( \hat{f}_j^{(I+1)} \big)^2 \Big| D_I \Big]
      = Var\bigg( \beta_j^{(I)} \frac{C_{I-j,j+1}}{C_{I-j,j}} + \big( 1 - \beta_j^{(I)} \big) \hat{f}_j^{(I)} \bigg| D_I \bigg)
        + \big( \hat{f}_j^{(I)} \big)^2

      = \bigg( \frac{\beta_j^{(I)}}{C_{I-j,j}} \bigg)^2 Var\big( C_{I-j,j+1} \big| D_I \big) + \big( \hat{f}_j^{(I)} \big)^2

      = \bigg( \frac{\beta_j^{(I)}}{C_{I-j,j}} \bigg)^2 \big( \hat{C}_{I-j,j+1}^{(I)} \big)^2 \big( \beta_j^{(I)} \big)^{-1} \Gamma_j
        + \big( \hat{f}_j^{(I)} \big)^2
      = \big( \hat{f}_j^{(I)} \big)^2 \Big( \beta_j^{(I)} \Gamma_j + 1 \Big).

Similarly we have for the first term

    E\big[ (C_{i,I-i+1})^2 \big| D_I \big]
      = Var( C_{i,I-i+1} | D_I ) + \big( \hat{C}_{i,I-i+1}^{(I)} \big)^2
      = \big( \hat{C}_{i,I-i+1}^{(I)} \big)^2 \Big( \big( \beta_{I-i}^{(I)} \big)^{-1} \Gamma_{I-i} + 1 \Big).
Collecting all the terms proves the statement for i = l. There remains the case of different
accident years. W.l.o.g. we assume i < l, which implies I - i + 1 > I - l + 1. Conditional
independence, given D_I, implies for the covariance between these accident years

    E\big[ \hat{C}_{i,J}^{(I+1)} \hat{C}_{l,J}^{(I+1)} \big| D_I \big]
      = E[ C_{l,I-l+1} | D_I ] \prod_{m=I-l+1}^{I-i-1} E\big[ \hat{f}_m^{(I+1)} \big| D_I \big] \;
        E\big[ C_{i,I-i+1} \hat{f}_{I-i}^{(I+1)} \big| D_I \big]
        \prod_{j=I-i+1}^{J-1} E\Big[ \big( \hat{f}_j^{(I+1)} \big)^2 \Big| D_I \Big]

      = C_{l,I-l} \hat{f}_{I-l}^{(I)} \prod_{m=I-l+1}^{I-i-1} \hat{f}_m^{(I)} \,
        \Big( Cov\big( C_{i,I-i+1}, \hat{f}_{I-i}^{(I+1)} \big| D_I \big) + C_{i,I-i} \big( \hat{f}_{I-i}^{(I)} \big)^2 \Big)
        \prod_{j=I-i+1}^{J-1} \big( \hat{f}_j^{(I)} \big)^2 \Big( \beta_j^{(I)} \Gamma_j + 1 \Big)

      = \hat{C}_{i,J}^{(I)} \hat{C}_{l,J}^{(I)} \big( \Gamma_{I-i} + 1 \big)
        \prod_{j=I-i+1}^{J-1} \Big( \beta_j^{(I)} \Gamma_j + 1 \Big),

where in the last step we used

    Cov\big( C_{i,I-i+1}, \hat{f}_{I-i}^{(I+1)} \big| D_I \big)
      = \frac{\beta_{I-i}^{(I)}}{C_{i,I-i}} \, Var( C_{i,I-i+1} | D_I )
      = \Gamma_{I-i} \, C_{i,I-i} \big( \hat{f}_{I-i}^{(I)} \big)^2.

Subtracting \hat{C}_{i,J}^{(I)} \hat{C}_{l,J}^{(I)} proves the claim. \square
NL
We study the conditional MSEP formula of the claims development result under
assumption (9.18). This assumption implies again that 0 j 1. Moreover, we
(I)
have j (0, 1) from which we see that (9.18) implies
(I)
0 j j 1.
The other term in Theorem 9.16 is more sophisticated. We have from the proof of
Lemma 9.15

    \big( \beta_{I-i}^{(I)} \big)^{-1} \Gamma_{I-i}
      = \Big( \frac{\sigma_{I-i}^2}{C_{i,I-i}} + 1 \Big) \big( 1 + \Gamma_{I-i} \big) - 1.   (9.31)

Moreover, we get the approximation (and lower bound) under (9.18) and (9.31)

    \big( \beta_{I-i}^{(I)} \big)^{-1} \Gamma_{I-i} \approx \frac{\sigma_{I-i}^2}{C_{i,I-i}} + \Gamma_{I-i}.
This implies that under assumptions (9.18) and (9.31) we obtain the approximation

    msep_{CDR_{i,I+1}|D_I}(0)
      \approx \big( \hat{C}_{i,J}^{(I)} \big)^2 \bigg[ \frac{\sigma_{I-i}^2}{C_{i,I-i}} + \Gamma_{I-i}
        + \sum_{j=I-i+1}^{J-1} \beta_j^{(I)} \Gamma_j \bigg],   (9.32)

where the right-hand side is a lower bound for the left-hand side for any \gamma_j > 1.
l=1
Cl,j
j 1
no
(I)
lim j = lim
j 1 2
j
(9.33)
NL
This motivates in the non-informative prior case j 1 the following approximation and lower bound to (9.32) and Theorem 9.16, respectively,
    msep^{MW}_{CDR_{i,I+1}|D_I}(0)
      = \big( \hat{C}_{i,J}^{CL} \big)^2 \bigg[ \frac{ s_{I-i}^2/(\hat{f}_{I-i}^{CL})^2 }{ C_{i,I-i} }
        + \frac{ s_{I-i}^2/(\hat{f}_{I-i}^{CL})^2 }{ \sum_{l=1}^{i-1} C_{l,I-i} }
        + \sum_{j=I-i+1}^{J-1} \tilde{\beta}_j^{(I)} \, \frac{ s_j^2/(\hat{f}_j^{CL})^2 }{ \sum_{l=1}^{I-j-1} C_{l,j} } \bigg],   (9.34)

where we set \sigma_j^2 = s_j^2/(\hat{f}_j^{CL})^2. This is the Merz-Wüthrich (MW) formula, see (3.17)
in [78]. We also refer to Bühlmann et al. [23] and Merz-Wüthrich [80].
Remarks 9.17.

- Concerning the derivation and the stochastic model choice for the MW formula (9.34), the same Remarks 9.8 apply as for Mack's formula (9.21).

- Mack's formula (9.21) is often called total run-off uncertainty and the MW formula (9.34) corresponds to the one-year run-off uncertainty. Comparing these two formulas we observe that from the total run-off uncertainty the first blue term with index j = I - i also appears in the one-year run-off uncertainty. This is the process variance in period j = I - i. From the red terms, the first red term with index j = I - i appears (parameter uncertainty) and the remaining red terms j \ge I - i + 1 of the summation in (9.21) are scaled with factors \tilde{\beta}_j^{(I)} \in (0,1) to obtain the one-year run-off uncertainty. These scalings reflect the release of parameter uncertainty when new information (a new diagonal in the claims development triangle) arrives.

- The same interpretation applies to (9.32) versus (9.19).
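The following sketch (toy triangle, invented numbers) evaluates the MW one-year formula (9.34) next to Mack's total formula (9.21) for the youngest accident year, illustrating that the one-year uncertainty is bounded by the total run-off uncertainty.

```python
# Sketch (toy triangle, invented values): Mack's total run-off MSEP (9.21)
# versus the MW one-year MSEP (9.34) for the youngest accident year.
C = [
    [1000, 1800, 1980, 2020],   # accident year 1, development years j = 0..3
    [1100, 1930, 2150],
    [1200, 2100],
    [1300],
]
I, J = 4, 3

col = lambda j, n: [C[i][j] for i in range(n)]          # first n entries of column j
f = [sum(col(j + 1, I - j - 1)) / sum(col(j, I - j - 1)) for j in range(J)]

s2 = []                                                 # Mack variance estimates
for j in range(J):
    n = I - j - 1
    if n >= 2:
        s2.append(sum(C[i][j] * (C[i][j + 1] / C[i][j] - f[j]) ** 2
                      for i in range(n)) / (n - 1))
    else:                                               # ad-hoc tail rule
        s2.append(min(s2[-1] ** 2 / s2[-2], s2[-1], s2[-2]))
sig2 = [s2[j] / f[j] ** 2 for j in range(J)]            # sigma_j^2 = s_j^2 / f_j^2

i = I                                                   # youngest accident year
chat = {I - i: C[i - 1][I - i]}
c = chat[I - i]
for j in range(I - i, J):
    c *= f[j]
    chat[j + 1] = c                                     # CL projections C_hat_{i,j}

# Mack's total run-off MSEP (9.21)
mack = chat[J] ** 2 * sum(sig2[j] * (1 / chat[j] + 1 / sum(col(j, I - j - 1)))
                          for j in range(I - i, J))

# MW one-year MSEP (9.34) with scaling factors tilde beta_j^{(I)}
beta_t = lambda j: C[I - j - 1][j] / sum(col(j, I - j))
mw = chat[J] ** 2 * (sig2[I - i] / chat[I - i]
                     + sig2[I - i] / sum(col(I - i, i - 1))
                     + sum(beta_t(j) * sig2[j] / sum(col(j, I - j - 1))
                           for j in range(I - i + 1, J)))
print(round(mw ** 0.5, 1), round(mack ** 0.5, 1))
assert 0 < mw < mack   # one-year uncertainty below total run-off uncertainty
```

The first development period contributes identically to both formulas; the later periods enter the MW formula only through the scaled parameter error terms, which makes mw strictly smaller than mack here.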
For aggregated accident years we have

    msep^{MW}_{\sum_i CDR_{i,I+1}|D_I}(0)
      = \sum_i msep^{MW}_{CDR_{i,I+1}|D_I}(0)
        + 2 \sum_{i<l} \hat{C}_{i,J}^{CL} \hat{C}_{l,J}^{CL}
          \bigg[ \frac{ s_{I-i}^2/(\hat{f}_{I-i}^{CL})^2 }{ \sum_{n=1}^{i-1} C_{n,I-i} }
          + \sum_{j=I-i+1}^{J-1} \tilde{\beta}_j^{(I)} \, \frac{ s_j^2/(\hat{f}_j^{CL})^2 }{ \sum_{n=1}^{I-j-1} C_{n,j} } \bigg].   (9.35)
Example 9.18. We revisit claims reserving Example 9.9 and calculate the claims
development result uncertainty. We consider the non-informative prior case and
15126
26257
34538
85302
156494
286121
449167
1043242
3950815
6047061
NL
2
3
4
5
6
7
8
9
10
total
total msep1/2
Mack (9.21)
CDR msep1/2
MW (9.34)
CDR/total
msep1/2
267
914
3058
7628
33341
73467
85398
134337
410817
462960
267
884
2948
7018
32470
66178
50296
104311
385773
420220
100%
97%
96%
92%
97%
90%
59%
78%
94%
91%
no
CL reserves
b CL
R
i
Table 9.9: Claims reserves and prediction uncertainty: Macks formula (9.21)-(9.22)
for the total run-off uncertainty and MW formula (9.34)-(9.35) for the one-year
claims development uncertainty.
257
w)
The results are presented in Table 9.9. We see that in this example the one-year
claims development result uncertainty measured by the square-rooted conditional
MSEP results in 91% of the total run-off uncertainty. The reason for this high
value is that knowing the next diagonal in the claims development triangle already
releases a major part of the claims run-off risks. For the next accounting year
we predict payments of 3873205, which is almost 2/3 of the total claims reserves,
i.e. we expect a rather fast claims settlement in this example and a fast decrease
of run-off uncertainties. Typically, the square-rooted conditional MSEP of the
claims development result is in the range of 50% to 95% relative to the total run-off
uncertainty; the former relates to liability insurance and the latter to property
insurance.
Exercise 26 (Italian motor third party liability insurance example). We revisit the
Italian motor third party liability insurance example of Bühlmann et al. [23]. The
field study considers 12 x 12 run-off triangles of 37 Italian insurance companies at
the end of 2006. For these data the claims reserves and the corresponding square-rooted
conditional MSEPs for the total run-off uncertainty and for the one-year
claims development result uncertainty using Mack's formula (9.22) and the MW
formula (9.35), respectively, were calculated. The results are presented in Table
9.10. Note that for confidentiality reasons the volumes of the 4 biggest companies
were all set equal to 100.0 and the order of these 4 companies is arbitrary.
Give interpretations to these results.

9.4.3 The full run-off picture
Note that in Theorem 9.16 and in the MW formula (9.34)-(9.35) we have only
derived the uncertainties in the next accounting year I + 1. A natural question is:
what can we say about the individual uncertainties in all future accounting years?
This is exactly the question answered in Merz-Wüthrich [80]. We would like to
briefly summarize these results (without proofs) because they give further insight
into the run-off risk behavior of claims development triangles.

We consider the total prediction error as a telescoping sum of successive claims
development results. Note that we have \hat{C}_{i,J}^{(i+J)} = C_{i,J}, P-a.s., because this
ultimate claim is observable at time t = i + J. This and the definition of the claims
development result imply for the total prediction error at time t = I

    \hat{C}_{i,J}^{(I)} - C_{i,J}
      = \sum_{k=I+1}^{i+J} \Big( \hat{C}_{i,J}^{(k-1)} - \hat{C}_{i,J}^{(k)} \Big)
      = \sum_{k=1}^{i+J-I} CDR_{i,I+k},

for i > I - J, see (9.29). This telescoping sum describes all innovations of the
claims development process. These innovations have mean zero (martingale property), see
Corollary 9.13. This immediately implies that they are uncorrelated. Under the
assumption that the second moment exists, uncorrelatedness provides the following
    company   business       total msep^{1/2}   CDR msep^{1/2}    CDR/total
              volume in %    (in % reserves)    (in % reserves)   msep^{1/2} (in %)
    1            100.0            4.03               3.24              80.4
    2            100.0            2.90               2.36              81.4
    3            100.0            2.41               1.98              82.3
    4            100.0            3.45               2.85              82.6
    5             61.8            3.66               3.04              82.9
    6             56.9            5.54               4.50              81.2
    7             53.0            4.52               3.70              81.8
    8             49.4            4.60               3.82              83.1
    9             46.2            5.61               4.59              81.8
    10            41.6            5.32               4.36              82.0
    ..
    30             3.5           18.02              14.78              82.0
    31             3.4           17.23              13.92              80.8
    32             2.6           18.73              14.89              79.5
    33             2.5           23.11              19.10              82.6
    34             2.2           20.83              17.53              84.2
    35             2.0           17.01              13.87              81.5
    36             1.8           26.16              21.54              82.4
    37             1.8           27.79              22.25              80.1
    total                         0.96               0.78              81.8

Table 9.10: Italian motor third party liability insurance example of Bühlmann et
al. [23]. Prediction uncertainties: Mack's formula (9.22) for the total run-off uncertainty
and MW formula (9.35) for the one-year claims development uncertainty.
    msep_{C_{i,J}|D_I}\big( \hat{C}_{i,J}^{(I)} \big)
      = \sum_{k=1}^{i+J-I} Var( CDR_{i,I+k} | D_I )

      = \sum_{k=1}^{i+J-I} msep_{CDR_{i,I+k}|D_I}(0)

      = \sum_{k=1}^{i+J-I} E\Big[ msep_{CDR_{i,I+k}|D_{I+k-1}}(0) \Big| D_I \Big].   (9.36)

The first line of (9.36) describes the total run-off uncertainty over the entire settlement
period of the claims; the second line considers the claims development
result volatilities based on today's knowledge D_I; and the third line considers the
expected one-year run-off uncertainties of all future periods. Thus, formula (9.36)
exactly explains how the total run-off uncertainty needs to be split (dynamically)
across all future development periods. In Theorem 9.16 and the MW formula (9.34)
we have only derived the first term with index k = 1 of this sum on the right-hand
side (in the gamma-gamma Bayesian CL model).
In the non-informative prior gamma-gamma Bayesian CL model all terms k =
1, \ldots, i + J - I can be estimated, see Merz-Wüthrich [80], and these estimates have
exactly the same structure as the MW formula (9.34). They are estimated by

    msep^{MW}_{CDR_{i,I+k}|D_I}(0)
      \stackrel{def.}{=} \widehat{E}\Big[ msep^{MW}_{CDR_{i,I+k}|D_{I+k-1}}(0) \Big| D_I \Big]   (9.37)

      = \big( \hat{C}_{i,J}^{CL} \big)^2 \Bigg[ \frac{ s_{I-i+k-1}^2/(\hat{f}_{I-i+k-1}^{CL})^2 }{ \hat{C}_{i,I-i+k-1}^{CL} }
        + \frac{ s_{I-i+k-1}^2/(\hat{f}_{I-i+k-1}^{CL})^2 }{ \sum_{l=1}^{i-k} C_{l,I-i+k-1} }
          \prod_{m=1}^{k-1} \Big( 1 - \tilde{\beta}_{I-i+m}^{(I)} \Big)
        + \sum_{j=I-i+k}^{J-1} \tilde{\beta}_{j-k+1}^{(I)}
          \prod_{m=0}^{k-2} \Big( 1 - \tilde{\beta}_{j-m}^{(I)} \Big)
          \frac{ s_j^2/(\hat{f}_j^{CL})^2 }{ \sum_{l=1}^{I-j-1} C_{l,j} } \Bigg].
For aggregated accident years one sets

    \Phi^{MW}_{I+k|I}
      \stackrel{def.}{=} msep^{MW}_{\sum_{i=I-J+k}^{I} CDR_{i,I+k}|D_I}(0)
      = \sum_{i=I-J+k}^{I} msep^{MW}_{CDR_{i,I+k}|D_I}(0)   (9.38)

        + 2 \sum_{i<l} \hat{C}_{i,J}^{CL} \hat{C}_{l,J}^{CL}
          \Bigg[ \frac{ s_{I-i+k-1}^2/(\hat{f}_{I-i+k-1}^{CL})^2 }{ \sum_{n=1}^{i-k} C_{n,I-i+k-1} }
            \prod_{m=1}^{k-1} \Big( 1 - \tilde{\beta}_{I-i+m}^{(I)} \Big)
          + \sum_{j=I-i+k}^{J-1} \tilde{\beta}_{j-k+1}^{(I)}
            \prod_{m=0}^{k-2} \Big( 1 - \tilde{\beta}_{j-m}^{(I)} \Big)
            \frac{ s_j^2/(\hat{f}_j^{CL})^2 }{ \sum_{n=1}^{I-j-1} C_{n,j} } \Bigg].
# illustrating the data
> plot(tri, ylab="", main="")
> plot(tri, lattice=TRUE, ylab="", main="")
# calculating the CL reserves and the corresponding MSEPs,
# CL reserves and Mack's formula (9.21)-(9.22) including illustrations
> M <- MackChainLadder(tri, est.sigma="Mack")
> M
> plot(M)
> plot(M, lattice=TRUE)
Example 9.19. We revisit Example 9.9 and calculate for this example the full run-off
uncertainty picture using (9.37)-(9.38). We start by illustrating the data using
the above R commands. This provides Figure 9.2. The graphs show that the data
is rather regular, with a small decrease of volume over accident years. Moreover,
most of the payments are done in the first two development years j = 0, 1.

Figure 9.2: Illustration of the data of Table 9.4 (the labeling of the development
year axis is shifted by 1).
[Figure 9.3: Chain-ladder forecasts with Mack's standard error bounds per origin
period, and standardised residuals plotted against fitted values, origin period,
calendar period and development period.]
In Figure 9.3 the graphs of Figure 9.2 are complemented by the predicted payments
in the lower triangle. These graphs also include the 1 standard deviation confidence
bounds (top left and right-hand side). Moreover, Figure 9.3 (lhs) provides residuals
in the direction of all three time axes and ordered by the size of the observations.
These residuals should not show any trends in one of the (time) axes. We see that
there might be some problem in the accident year direction. The decrease in the
accounting/calendar year direction should not be overstated because the first two
years contain rather scarce information.
Finally, in Table 9.11 we provide the full run-off picture. This table summarizes in
the 5th column the expected future accounting year cash flows for t > I

    \sum_{i+j=t} E[ X_{i,j} | D_I ]
      = \sum_{i+j=t} C_{i,I-i} \prod_{l=I-i}^{j-2} \hat{f}_l^{(I)} \Big( \hat{f}_{j-1}^{(I)} - 1 \Big),

and in the 2nd column the corresponding expected run-off of the claims reserves
for t \ge I

    E\big[ R^{(t)} \big| D_I \big] = \sum_{i+j \ge t+1} E[ X_{i,j} | D_I ].

Moreover, the table provides in the 6th column the square-rooted expected one-year
uncertainties (\Phi^{MW}_{t+1|I})^{1/2} for t \ge I, and in the 3rd column the expected run-off
of the total uncertainty calculated as

    \Big( \sum_{s \ge t} \Phi^{MW}_{s+1|I} \Big)^{1/2},
    accounting   exp. run-off     rooted exp.   in %       expected cash flows          (\Phi^{MW}_{t+1|I})^{1/2}
    years t      of reserves      run-off of    reserves   \sum_{i+j=t} E[X_{i,j}|D_I]
                 E[R^{(t)}|D_I]   MSEP
    10             6047061          462960        8%                                       420220
    11             2173856          194285        9%             3873205                   150544
    12             1048144          122813       12%             1125712                    93390
    13              570584           79758       14%              477560                    72882
    14              293063           32397       11%              277521                    31459
    15              148951            7739        5%              144112                     7172
    16               67824            2906        4%               81127                     2803
    17               36036             769        2%               31788                      744
    18               13655             191        1%               22381                      191
    19                   0               0                         13655                        0

Table 9.11: The full run-off picture: expected run-off of the claims reserves and of
the prediction uncertainty.
where the first term t = I corresponds to the square-rooted Mack formula (9.22).
We conclude that we now have the full run-off picture: the 2nd column displays the
expected run-off of the claims reserves and the 3rd column provides the expected
run-off of the prediction uncertainty (measured by the square-rooted remaining
conditional MSEPs). This is in particular of interest for risk margin calculations
in solvency considerations.
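Relation (9.36) can be checked directly on the one-year uncertainties reported in Table 9.11: their squares must sum to the squared total run-off uncertainty, up to rounding in the table. A small sketch:

```python
# Check of (9.36) on the Table 9.11 figures: the one-year uncertainties
# aggregate (in squares) to the total run-off uncertainty.
one_year = [420220, 150544, 93390, 72882, 31459, 7172, 2803, 744, 191]
total = sum(u ** 2 for u in one_year) ** 0.5
print(round(total, 2))                 # within rounding of the reported 462960
assert abs(total - 462960) < 1
```

The agreement to less than one unit confirms that the tabulated columns were produced consistently with the telescoping decomposition.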
Chapter 10

Solvency Considerations
In the previous chapters we have mainly discussed the modeling of insurance contracts,
the related liability cash flows and the implications for tariffication. Recalling the discussion
in Chapter 1, the insurance company organizes the equal balance within the community.
That is, it issues insurance contracts at a fixed premium and in return it promises
to cover all (financial) claims that fall under these contracts.
Of course, we need to make sure that the insurance company
can keep its promises. This is exactly the crucial task of supervision
(regulation) and sound risk management practice. Regulation aims to
protect the policyholder in that it enforces (by law) the insurance company to
follow good risk management practice. Companies should be sufficiently well capitalized
so that they can fulfill their promises also under certain stress scenarios.
This is exactly what we would like (and need) to study in the present chapter.
We have already touched this issue in Chapter 5 on ruin theory. The main purpose
of Chapter 5 was to explain that there is a huge difference in ruin behavior between
light-tailed and heavy-tailed claims. Beyond that insight, the random walk model of
Chapter 5 is much too simple to reflect real-world insurance problems. Therefore,
we modify the ultimate ruin probability considerations so that they reflect the
current risk management task. In a first step we will discuss more general risk
management views, for a comprehensive discussion we refer to Wüthrich-Merz [101],
and in a second step we discuss more explicitly the solvency and risk management
implementations used in the insurance industry.
10.1 Balance sheet and solvency
    assets                      liabilities
    deposits                    policyholder deposits
    money market                reinsurance deposits
    derivatives                 borrowings
    ...                         hybrid debt
                                convertible debt
                                insurance liabilities:
                                  claims reserves
                                  premium reserves
                                  annuities

Table 10.1: Balance sheet of a non-life insurance company at a fixed point in time.
Table 10.1 presents a snapshot of a non-life insurance company's balance sheet,
that is, it reflects all positions at a certain moment in time t \in R_+. The left-hand
side shows the assets at time point t and the right-hand side should show the
liabilities at the same time point t. We denote the value of the assets at time t by
A_t, and L_t denotes the value of the liabilities at time t.
In the language of Chapter 5, we can think of A_t denoting all asset values in the
company at time t. These comprise the initial capital, all premia received and all
other amounts received minus the payments done up to time t. These amounts
are invested at the financial market and, thus, are allocated to the different asset
classes displayed in Table 10.1. On the other hand, the liabilities L_t reflect the
value of all obligations accepted by the insurance company that are still open at
time t.
In a similar context to the ruin theory of Chapter 5, we should have A_t \ge L_t in
order to cover the liabilities by asset values at time t. In fact, we may study the
continuous time surplus process (\tilde{C}_t)_{t \in R_+}, given by \tilde{C}_t = A_t - L_t, which should
remain non-negative: for a given large probability 1-p \in (0,1) the initial capital c_0
and the asset strategy should be chosen such that

    P\Big[ \inf_{t \in R_+} \tilde{C}_t \ge 0 \Big| \tilde{C}_0 = c_0 \Big]
      = P_{c_0}\Big[ \inf_{t \in R_+} A_t - L_t \ge 0 \Big] \ge 1 - p.   (10.1)

In modern solvency considerations this ultimate ruin view is replaced by a one-period
view: the initial capital c_0 and the asset strategy should be chosen such that

    P_{c_0}[ A_1 \ge L_1 ] = P_{c_0}[ L_1 - A_1 \le 0 ] \ge 1 - p.   (10.2)
no
This means that we need to choose the initial capital c0 and the asset strategy, which
maps value A0 at time 0 to value A1 at time 1, such that the (given stochastic)
liabilities L1 can be covered with large probability at time 1. Note that A1 and L1
are, in general, not independent.
NL
Step 2 (risk measure). The no ruin condition in (10.2) is described under the
Value-at-Risk risk measure VaR1p (L1 A1 ) on security level 1 p (0, 1), see
Example 6.25. Assume we have a normalized, monotone and translation invariant
risk measure %, see (6.12), then more generally
the initial capital c0 and the asset strategy should be chosen such that
% (L1 A1 ) 0.
(10.3)
Solvency II uses the VaR risk measure on the 1 p = 99.5% security level and the
Swiss Solvency Test (SST) uses the TVaR risk measure on the 1p = 99% security
level, see also Examples 6.25, 6.20 and 6.26. The main aspect is now concerned
with the stochastic modeling of position L1 A1 .
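As a toy illustration of (10.2)-(10.3) (all distributions and parameter values invented; the lognormal asset and liability models are purely for demonstration), one can check the VaR condition empirically: simulate the asset deficit L_1 - A_1 and require its empirical 99.5% quantile to be non-positive.

```python
# Sketch (toy model): empirical VaR_{99.5%} check of the asset deficit L_1 - A_1.
import random

random.seed(1)
c0 = 170.0                        # initial capital (invented value)
n = 100_000

def asset_deficit():
    A1 = (100.0 + c0) * random.lognormvariate(0.0, 0.05)  # asset value at time 1
    L1 = 200.0 * random.lognormvariate(0.0, 0.08)         # liability value at time 1
    return L1 - A1

sample = sorted(asset_deficit() for _ in range(n))
var995 = sample[int(0.995 * n)]   # empirical 99.5% quantile of L_1 - A_1
print(round(var995, 1))
# the balance sheet is acceptable w.r.t. VaR at level 1 - p = 99.5% iff var995 <= 0
assert var995 <= 0
```

Lowering c0 in this toy model eventually pushes the quantile above zero, i.e. the balance sheet becomes unacceptable and must be modified as discussed below.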
266
w)
Step 3 (market(-consistent) values). The main difficulty is the stochastic modeling of L1 A1 . Some positions in this difference are traded at active financial
markets. For these positions we need to stochastically model their market prices
at time 1 (viewed from time 0). However, most positions (on the liability) side
of the balance sheet are not traded at active markets. For these positions we
need to determine market-consistent values of their stochastic developments in a
marked-to-model approach, see also Happ et al. [59]. Let us explain the rationale
behind this with the liabilities L1 at hand and using the claims reserving context
of Chapter 9.
Assume we can split the liabilities $L_1$ into two elements:

(i) payments $X_1$ done at time 1 (similar to Section 9.1 we map all payments in accounting year $[1,2) = [1/1/1, 31/12/1]$ to its endpoint);

(ii) outstanding loss liabilities $L_1^+$ at time 1 (at the end of accounting year 1).

The liabilities at time 1 are then given by

$$L_1 = X_1 + L_1^+.$$
The easier part is the modeling of $X_1$. We need to find a stochastic model that is able to predict the payments $X_1$ and capture the dependencies with $A_1$ and $L_1^+$. The more complicated part is $L_1^+$. This amount should reflect a market-consistent value for the outstanding loss liabilities at time 1. Observe that it differs from the best-estimate reserves $R^{(1)}$ given in (9.28) in two crucial ways:

(1) The best-estimate reserves $R^{(1)}$ were calculated on a nominal basis, i.e. the time value of money was not considered because no discounting was applied to $R^{(1)}$.
blocks are not independent). In the last step, we need to evaluate the risk measure condition (10.3). If this condition is fulfilled, we have an acceptable balance sheet and the company is solvent at time 0 w.r.t. the chosen risk measure $\varrho$. If (10.3) is not fulfilled, we have an unacceptable balance sheet and it needs to be modified to achieve solvency. Options for modification are the following: change the asset strategy so that it better matches the liabilities; reduce liabilities and mitigate uncertainties in liabilities (if possible); inject more initial capital $c_0$.
In the remainder of this chapter we discuss the modeling of the asset deficit at time $t=1$, where for $t \in \mathbb{N}_0$ the asset deficit is defined by

$$\mathrm{AD}_t \stackrel{\text{def.}}{=} L_t - A_t = X_t + L_t^+ - A_t. \tag{10.4}$$

Thus, the insurance company is solvent at time 0 (w.r.t. the risk measure $\varrho$) if

$$\varrho\left(\mathrm{AD}_1\right) = \varrho\left(L_1 - A_1\right) \le 0.$$
10.2 Risk modules
Typically the modeling of the asset deficit AD1 at time t = 1, defined in (10.4), is
split into different modules that reflect different risk classes. In a first step each
risk class is studied individually and in a second step the results are aggregated to
obtain the overall picture.
Figure 10.1: lhs: Swiss Solvency Test risk modules; rhs: Solvency II risk modules
(sources [26] and [44]).
One may question whether this modeling approach is smart. Modeling individual risk classes may still be fine, but the aggregation of risk classes is rather non-straightforward because it is very difficult to capture the interaction between the different risk classes. Nevertheless, we would like to describe the approach used in practice (and also the shortcuts applied).
In Figure 10.1 we show the individual risk modules used in the Swiss Solvency Test [26] and in Solvency II [44]. Overall they are rather similar, though some differences exist. Often one considers the following four risk classes, driven by the risk factors described next:
1. Market risk. We cite SCR.5.1. of QIS5 [44]: Market risk arises from the level
or volatility of market prices of financial instruments. Exposure to market
risk is measured by the impact of movements in the level of financial variables
such as stock prices, interest rates, real estate prices and exchange rates.
2. Insurance risk. Insurance risk is typically split into the different insurance branches: non-life insurance, life insurance, health insurance and reinsurance. Here we concentrate on non-life insurance risk. This is further subdivided into (i) reserve risk, which describes outstanding loss liabilities of past exposure claims; and (ii) premium risk, which describes the risk deriving from newly sold contracts that give an exposure over the next accounting period. Additionally, there is often an annuity portfolio deriving from liability insurance covering third-party disability claims.
3. Credit risk. We cite SCR.6.1. of QIS5 [44]: The counterparty default risk
module should reflect possible losses due to unexpected default, or deterioration in the credit standing, of the counterparties and debtors of undertakings
over the forthcoming twelve months. The scope of the counterparty default
risk module includes risk-mitigating contracts, such as reinsurance arrangements, securitisations and derivatives, and receivables from intermediaries,
as well as any other credit exposures which are not covered in the spread risk
sub-module.
4. Operational risk. We cite SCR.3.1. of QIS5 [44]: Operational risk is the risk
of loss arising from inadequate or failed internal processes, or from personnel
and systems, or from external events. Operational risk should include legal
risks, and exclude risks arising from strategic decisions, as well as reputation
risks. The operational risk module is designed to address operational risks to
the extent that these have not been explicitly covered in other risk modules.
Let us formalize these risk factors and classes. Therefore, we first consider the beginning of accounting year 1. At time $t=0$ the asset deficit is given by

$$\mathrm{AD}_0 = L_0 - A_0.$$

We assume that $X_0 = 0$, which implies that $L_0 = L_0^+$; thus, $L_0$ is the value of all liabilities that need to be settled after $t=0$. For simplification we assume that the liabilities consist of insurance liabilities only. In this case, $L_0^+$ describes the liabilities stemming from claims with accident date prior to $t=0$ (these are the liabilities of past exposure claims; we denote them by previous year (PY) claims, see also Chapter 9), and of claims with accident date in accounting year 1 (these are all liabilities of the new premium exposure if we assume one-year contracts only; we denote them by current year (CY) claims). Summarizing, this implies on the liability side of the balance sheet at time $t=0$ (with the obvious notation)

$$L_0 = L_0^+ = L_0^{\mathrm{PY}} + L_0^{\mathrm{CY}}.$$
On the asset side of the balance sheet we have (this is also a simplified version)

$$A_0 = c_0 + A_0^{\mathrm{PY}} + \Pi^{\mathrm{CY}},$$

where $A_0^{\mathrm{PY}}$ are the provisions to cover the PY liabilities $L_0^{\mathrm{PY}}$, $\Pi^{\mathrm{CY}}$ is the premium received for the CY claims $L_0^{\mathrm{CY}}$ and $c_0$ is the initial capital. As described above, this amount $A_0$ is invested at the financial market and provides value $A_1$ at time $t=1$. This value needs to be compared to

$$L_1 = X_1 + L_1^+ = X_1^{\mathrm{PY}} + L_1^{+,\mathrm{PY}} + X_1^{\mathrm{CY}} + L_1^{+,\mathrm{CY}}, \tag{10.5}$$

where $X_1^{\mathrm{PY}}$ are the payments for PY claims, $X_1^{\mathrm{CY}}$ are the payments for CY claims, $L_1^{+,\mathrm{PY}}$ is the value of the outstanding loss liabilities at time $t=1$ for claims with accident year prior to $t=0$, and $L_1^{+,\mathrm{CY}}$ is the value of the outstanding loss liabilities at time $t=1$ for CY claims (i.e. accident date in year 1). Thus, if we merge these two values into $L_1^+ = L_1^{+,\mathrm{PY}} + L_1^{+,\mathrm{CY}}$, we obtain the new outstanding loss liabilities for past exposure claims with accident date prior to $t=1$. Finally, $X_1^{\mathrm{Op}}$ denotes the operational risk loss payment where, for simplicity, we assume that this can immediately be settled. We conclude that the asset deficit at time 1 is given by

$$\mathrm{AD}_1 = X_1^{\mathrm{PY}} + L_1^{+,\mathrm{PY}} + X_1^{\mathrm{CY}} + L_1^{+,\mathrm{CY}} + X_1^{\mathrm{Op}} - A_1. \tag{10.6}$$
Coming back to the risk modules: market risk affects all variables in (10.5); insurance risk is mainly reflected in $X_1^{\mathrm{PY}}$, $L_1^{+,\mathrm{PY}}$, $X_1^{\mathrm{CY}}$ and $L_1^{+,\mathrm{CY}}$; credit risk is a main risk driver in $A_0$ (if we assume that liabilities are considered before reinsurance is applied (gross)); and operational risk is reflected in $X_1^{\mathrm{Op}}$.
In the remainder we concentrate on the modeling of insurance liabilities.

10.3

10.3.1 Market-consistent values
$$L_1^{\mathrm{Ins}} = X_1^{\mathrm{PY}} + L_1^{+,\mathrm{PY}} + X_1^{\mathrm{CY}} + L_1^{+,\mathrm{CY}}.$$
Note that in our terminology $L_1 = L_1^{\mathrm{Ins}} + X_1^{\mathrm{Op}}$. Assume that $X = (X_1, \ldots, X_n)$ denotes the (random) cash flow that is generated by the insurance liabilities, see also (9.1). We assume that this cash flow is adapted to the filtration $\mathbb{F} = (\mathcal{F}_s)_{s\ge 1}$. In analogy to Wüthrich-Merz [100] we need to choose an appropriate (state price) deflator $\varphi = (\varphi_1, \ldots, \varphi_n)$ (which is $\mathbb{F}$-adapted and strictly positive, $\mathbb{P}$-a.s.) and then

$$L_1^{\mathrm{Ins}} = \frac{1}{\varphi_1} \sum_{s\ge 1} \mathbb{E}\left[\left.\varphi_s X_s\right|\mathcal{F}_1\right] = X_1 + \frac{1}{\varphi_1} \sum_{s\ge 2} \mathbb{E}\left[\left.\varphi_s X_s\right|\mathcal{F}_1\right]$$
$$= \sum_{s\ge 1} \frac{1}{\varphi_1}\, \mathbb{E}\left[\left.\varphi_s\right|\mathcal{F}_1\right] \mathbb{E}\left[\left.X_s\right|\mathcal{F}_1\right] = \sum_{s\ge 1} P(1,s)\, \mathbb{E}\left[\left.X_s\right|\mathcal{F}_1\right] \tag{10.7}$$
$$= X_1^{\mathrm{PY}} + X_1^{\mathrm{CY}} + \sum_{s\ge 2} P(1,s)\, \mathbb{E}\left[\left.X_s\right|\mathcal{F}_1\right],$$

where $P(1,s)$ denotes the price at time 1 of the zero-coupon bond that matures at time $s \ge 2$ (and $P(1,1)=1$). Note that viewed from time 0 both $P(1,s)$ and $\mathbb{E}[X_s|\mathcal{F}_1]$ are $\mathcal{F}_1$-measurable random variables in (10.7), and (expected) insurance cash flows are adjusted for the time value of money.

Under all the previous assumptions (in particular the uncorrelatedness assumption used in (10.7)) the acceptability requirement (10.3) reads as: the initial capital $c_0$ and the asset strategy should be chosen such that

$$\varrho\left( X_1^{\mathrm{PY}} + X_1^{\mathrm{CY}} + X_1^{\mathrm{Op}} + \sum_{s\ge 2} P(1,s)\, \mathbb{E}\left[\left.X_s\right|\mathcal{F}_1\right] - A_1 \right) \le 0. \tag{10.8}$$
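To make the discounting in (10.7) concrete, the following sketch evaluates $X_1$ plus the zero-coupon-discounted expected future cash flows; the flat 2% yield curve and all cash flow numbers are hypothetical and only serve as an illustration:

```python
# Market-consistent value of the insurance liabilities at time 1 as in (10.7):
# payments X_1 plus zero-coupon-discounted expected future cash flows.
def liability_value(x1, expected_cashflows, zcb_prices):
    """x1: payment at time 1; expected_cashflows[s] = E[X_s | F_1] for s = 2, 3, ...;
    zcb_prices[s] = P(1, s), price at time 1 of the bond maturing at s."""
    return x1 + sum(zcb_prices[s] * expected_cashflows[s]
                    for s in expected_cashflows)

# hypothetical example: flat annual yield of 2%
horizon = range(2, 6)
P = {s: 1.02 ** -(s - 1) for s in horizon}   # P(1, s) under a flat curve
EX = {s: 100.0 for s in horizon}             # expected payments E[X_s | F_1]
print(round(liability_value(50.0, EX, P), 2))   # → 430.77
```

Viewed from time 0, both the bond prices and the conditional expectations are random; the sketch fixes one realization of them.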
Since the asset deficit still has a rather involved form, the model is further simplified. Denote the expected values

$$p(1,s) = \mathbb{E}\left[\left.P(1,s)\right|\mathcal{F}_0\right] \qquad \text{and} \qquad x_s = \mathbb{E}\left[\left.X_s\right|\mathcal{F}_0\right].$$

This approximation implies that for (10.8) we study the following three terms

$$Z_1 = \sum_{s\ge 1} p(1,s)\, x_s + \sum_{s\ge 1} \left(P(1,s) - p(1,s)\right) x_s - A_1,$$
$$Z_2 = \sum_{s\ge 1} p(1,s) \left(\mathbb{E}\left[\left.X_s\right|\mathcal{F}_1\right] - x_s\right),$$
$$Z_3 = X_1^{\mathrm{Op}}.$$

The first term $p(1,s)\,x_s$ is the expected value (viewed from time 0) of the time-1 price $P(1,s)\,\mathbb{E}[X_s|\mathcal{F}_1]$. The term $(P(1,s) - p(1,s))\,x_s$ captures uncertainties in financial discounting, and $p(1,s)\,(\mathbb{E}[X_s|\mathcal{F}_1] - x_s)$ describes volatilities in the insurance cash flows. The cross term of the uncertainties was dropped in this approximation. Typically, the above terms are assumed to be independent so that they can be studied individually, and aggregation is obtained by simply convoluting their marginal distributions.
Z1 describes market and credit risks, Z2 describes insurance risk and Z3 describes
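Under this independence assumption, the convolution of the three marginals is, in Monte Carlo terms, just the sum of independently drawn samples. A toy sketch (all three marginal models are hypothetical placeholders, not the calibrated solvency distributions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# hypothetical marginal models for the three modules
z1 = rng.normal(-30.0, 10.0, n)      # market/credit: expected profit, volatile
z2 = rng.gamma(4.0, 5.0, n) - 20.0   # insurance: centered gamma
z3 = rng.poisson(0.05, n) * 40.0     # operational: rare fixed-size losses

# independence => the distribution of the asset deficit is the convolution of
# the marginals, i.e. the sum of independent samples in Monte Carlo terms
ad1 = z1 + z2 + z3
print(np.quantile(ad1, 0.995) <= 0)  # acceptability under VaR_{99.5%}
```

If the modules are in fact dependent (e.g. via reinsurance), this convolution understates or overstates the tail, which is exactly the caveat raised above.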
operational risk. In non-life insurance one often assumes that these three random variables are independent (which may be problematic in particular w.r.t. reinsurance).
In the remainder of this chapter we describe the insurance liability variable Z2 . For
the other terms we refer to the related solvency literature QIS5 [44], Swiss Solvency
Test [46] and Wthrich-Merz [101].
Insurance risk
p(1, s) (E [ Xs | F1 ] xs ) .
s1
As already mentioned the insurance variables are separated into PY variables and
CY variables w.r.t. the valuation date t = 0. This provides the split
Z2
=
def.
Z2PY + Z2CY
X
s1
s1
272
The final simplification is that we assume that there are deterministic payout patterns $(\gamma_s^{\mathrm{PY}})_{s\ge 1}$ and $(\gamma_s^{\mathrm{CY}})_{s\ge 1}$, for instance obtained by the CL method, see Theorem 9.11 (the estimation errors in these patterns are neglected). Then the last expressions can be modified to

$$Z_2^{\mathrm{PY}} = \left(\sum_{s\ge 1} p(1,s)\,\gamma_s^{\mathrm{PY}}\right) \left[\mathbb{E}\left[\left.X^{\mathrm{PY}}\right|\mathcal{F}_1\right] - \mathbb{E}\left[\left.X^{\mathrm{PY}}\right|\mathcal{F}_0\right]\right],$$
$$Z_2^{\mathrm{CY}} = \left(\sum_{s\ge 1} p(1,s)\,\gamma_s^{\mathrm{CY}}\right) \left[\mathbb{E}\left[\left.S_1\right|\mathcal{F}_1\right] - \mathbb{E}\left[\left.S_1\right|\mathcal{F}_0\right]\right],$$

where $X^{\mathrm{PY}} = \sum_{s\ge 1} X_s^{\mathrm{PY}}$ denotes the total nominal PY payments. The first line $Z_2^{\mathrm{PY}}$ reflects the study of the claims development result, see (9.29). The second line $Z_2^{\mathrm{CY}}$ describes the total nominal claim $S_1$ of accident year 1 that is caused by the premium exposure CY. The terms in the round brackets are the deterministic discount factors that respect the underlying maturities of the cash flows; the terms in the square brackets are the random terms that need further modeling and analysis.

The claims development result $\mathrm{CDR}_1$ has expected value 0, see Corollary 9.13, if the claims reserves are defined by conditional expectations in a Bayesian model. Therefore, there remains the study of higher moments. In practice, one restricts to the second moment:

- Calculate for every line of business the conditional MSEP of the claims development result prediction, for instance using the MW formula (9.35). This provides a variance estimate for every line of business.
- Specify a correlation matrix between the different lines of business, see for instance SCR.9.34. in QIS5 [44].
- The previous two items allow one to aggregate the uncertainties of the individual lines of business and to obtain the overall variance of the sum over all lines of business.
- Fit a translated gamma or a log-normal distribution to these first two moments, assuming that the mean is exactly given by $R^{(0)}$. This provides an approximation to the distribution of $\mathrm{CDR}_1$.
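The aggregation and fitting steps above can be sketched as follows; the line-of-business standard deviations and the correlation matrix are hypothetical placeholders (in practice they come from the MSEP formulas and from regulatory correlation tables such as SCR.9.34. of QIS5 [44]):

```python
import numpy as np

def aggregate_sd(sd, corr):
    """Overall standard deviation of the sum over lines of business with
    marginal standard deviations `sd` and correlation matrix `corr`."""
    sd = np.asarray(sd)
    return float(np.sqrt(sd @ np.asarray(corr) @ sd))

def lognormal_from_moments(mean, sd):
    """Parameters (mu, sigma) of a log-normal matching the first two moments."""
    sigma2 = np.log(1.0 + (sd / mean) ** 2)
    mu = np.log(mean) - sigma2 / 2.0
    return float(mu), float(np.sqrt(sigma2))

# hypothetical: three lines of business, MSEP^{1/2} per line, and correlations
sd_lob = [40.0, 25.0, 10.0]
corr = [[1.00, 0.50, 0.25],
        [0.50, 1.00, 0.25],
        [0.25, 0.25, 1.00]]
sd_total = aggregate_sd(sd_lob, corr)        # < sum(sd_lob): diversification
mu, sigma = lognormal_from_moments(1000.0, sd_total)   # mean fixed at R^{(0)}
```

The correlation aggregation only controls second moments; the fitted two-parameter distribution then supplies the tail shape by assumption, which is why the choice between translated gamma and log-normal matters in practice.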
The claim $\mathbb{E}[S_1|\mathcal{F}_1]$ resulting from the premium exposure CY is split into two independent random variables $S_{\mathrm{sc}}$ and $S_{\mathrm{lc}}$, where $S_{\mathrm{sc}}$ reflects all small claims below a given threshold $M$ and $S_{\mathrm{lc}}$ the claims above that threshold, see Examples 2.16 and 4.11.

The large claims amount $S_{\mathrm{lc}}$ is modeled per line of business (or per peril) by independent compound Poisson distributions with Pareto claim severities, and aggregation is done using the aggregation Theorem 2.12, resulting in a compound Poisson distribution. The latter can be evaluated, for instance, with the Panjer algorithm, see Theorem 4.9, or the fast Fourier transform (FFT), see Section 4.2.2.

The small claims amount $S_{\mathrm{sc}}$ is treated similarly to the claims development result, i.e. estimate per line of business the first two moments, aggregate these moments using an appropriate correlation matrix, see for instance Section 8.4.2 in the technical Swiss Solvency Test document [46], and fit a gamma or a log-normal distribution to these first two moments.
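The Panjer algorithm mentioned above can be sketched as follows for the compound Poisson case; the discretized severity vector is hypothetical and stands in for a discretized Pareto layer:

```python
import math

def panjer_poisson(lam, g, n):
    """Panjer recursion for a compound Poisson distribution.

    lam: Poisson parameter; g[y]: discretized severity probabilities on
    0, 1, ..., len(g)-1; returns P[S = x] for x = 0, ..., n.
    """
    f = [0.0] * (n + 1)
    f[0] = math.exp(-lam * (1.0 - g[0]))     # P[S = 0]
    for x in range(1, n + 1):
        # Poisson case of the Panjer class: a = 0, b = lam
        f[x] = (lam / x) * sum(y * g[y] * f[x - y]
                               for y in range(1, min(x, len(g) - 1) + 1))
    return f

# hypothetical discretized severity (a crude stand-in for a Pareto layer)
g = [0.0, 0.6, 0.25, 0.1, 0.05]
f = panjer_poisson(2.0, g, 50)
print(abs(sum(f) - 1.0) < 1e-6)   # mass up to 50 captures almost everything
```

The recursion costs $O(n^2)$ operations in the truncation point $n$, which is why the FFT alternative becomes attractive for fine discretizations.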
Remarks.
- In the Swiss Solvency Test one distinguishes between pure process risk and parameter uncertainty for the small claims layer, too. Process risk is diversifiable with increasing volume, whereas parameter uncertainty is not. As a result, the coefficient of variation per line of business has a similar form as has been found for the compound negative-binomial distribution, see Proposition 2.24. That is, as the volume $v \to \infty$ the coefficient of variation does not vanish but stays strictly positive.
- In the Swiss Solvency Test one additionally aggregates so-called scenarios. The motivation for this is that the present model cannot reflect all uncertainties, and therefore it is disturbed by scenarios. Basically, these scenarios are claims of Bernoulli type, i.e. they occur with a certain probability and, if they occur, they have a given amount.
- For the aggregation between PY and CY claims it is either assumed that they are independent, or the claims development result uncertainty $\mathrm{CDR}_1$ and the CY small claims amount $S_{\mathrm{sc}}$ are again aggregated via a correlation matrix, and then an overall distribution is fitted to the resulting first two moments.
- In summary, we see that many approximations are used (as described above) and, also crucially, that aggregation is done via correlation matrices. The latter may be quite problematic because correlations typically also depend on the underlying volumes, which is neglected in actual solvency implementations. Therefore, this needs to be revised carefully in each individual case.
Market-value margin
The careful reader will have noticed that we have lost the risk margin somewhere on the way to the final result. We will not further discuss the risk and market-value margin here; we only want to mention that the current calculation of the market-value margin is quite ad hoc, see Chapter 6 in the Swiss Solvency Test [46] and Section 10.3 in Wüthrich-Merz [100], and further refinements are necessary. The crucial point is that the conditional uncorrelatedness in (10.7) does not hold true in general, see Wüthrich-Merz [100]; for a more general discussion we also refer to Happ et al. [59] and Wüthrich [98].
Appendix

Derivations from Gaussian distributions

Assume $Z_0, Z_1, Z_2, \ldots$ are i.i.d. standard Gaussian random variables.

$\chi^2$-distribution. Define for $k \in \mathbb{N}$ the random variable

$$X_k = \sum_{i=1}^{k} Z_i^2.$$

$X_k$ has a $\chi^2$-distribution with $k$ degrees of freedom, see Example 2 on page 22. Its density is given by

$$f(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2-1} \exp\{-x/2\} \qquad \text{for } x \ge 0.$$

$t$-distribution. Define for $k \in \mathbb{N}$ the random variable

$$X_k = \frac{Z_0}{\sqrt{\sum_{i=1}^{k} Z_i^2 / k}}.$$

$X_k$ has a $t$-distribution with $k$ degrees of freedom. Its density is given by

$$f(x) = \frac{\Gamma((k+1)/2)}{\sqrt{k\pi}\,\Gamma(k/2)} \left(1 + \frac{x^2}{k}\right)^{-(k+1)/2} \qquad \text{for } x \in \mathbb{R}.$$

The moment generating function $M_{X_k}(r)$ does not exist for $r > 0$, and we have $\mathbb{E}[X_k] = 0$, for $k > 1$, and $\mathrm{Var}(X_k) = k/(k-2)$, for $k > 2$.

$F$-distribution. Define for $k, m \in \mathbb{N}$ the random variable

$$X_{k,m} = \frac{\sum_{i=1}^{k} Z_i^2 / k}{\sum_{i=k+1}^{k+m} Z_i^2 / m}.$$

$X_{k,m}$ has an $F$-distribution with $k$ and $m$ degrees of freedom. Its density is given by

$$f(x) = \frac{\Gamma((k+m)/2)}{\Gamma(k/2)\,\Gamma(m/2)} \left(\frac{k}{m}\right)^{k/2} x^{k/2-1} \left(1 + \frac{k}{m}\,x\right)^{-(k+m)/2} \qquad \text{for } x \ge 0.$$

The moment generating function $M_{X_{k,m}}(r)$ does not exist for $r > 0$, and we have $\mathbb{E}[X_{k,m}] = m/(m-2)$, for $m > 2$.

Sample mean and sample variance. Assume $Z_1, \ldots, Z_k$ are i.i.d. Gaussian with mean $\mu$ and variance $\sigma^2$. Define

$$\bar{Z} = \frac{1}{k}\sum_{j=1}^{k} Z_j \qquad \text{and} \qquad S^2 = \frac{1}{k-1}\sum_{j=1}^{k} \left(Z_j - \bar{Z}\right)^2.$$

Then $\bar{Z}$ and $S^2$ are independent with

$$\bar{Z} \sim \mathcal{N}\left(\mu, \sigma^2/k\right) \qquad \text{and} \qquad S^2 \sim \frac{\sigma^2}{k-1}\,\chi^2_{k-1},$$

and consequently $\sqrt{k}\,(\bar{Z}-\mu)/S$ has a $t$-distribution with $k-1$ degrees of freedom.
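These sampling facts are easy to check by simulation; the following sketch (sample sizes and parameters chosen arbitrarily) verifies the variance of $\bar{Z}$ and the mean of $(k-1)S^2/\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, k, n = 1.0, 2.0, 5, 200_000

z = rng.normal(mu, sigma, size=(n, k))
zbar = z.mean(axis=1)
s2 = z.var(axis=1, ddof=1)                   # unbiased sample variance

# Zbar ~ N(mu, sigma^2/k) and (k-1) S^2 / sigma^2 ~ chi^2_{k-1} (mean k-1)
print(abs(zbar.var() - sigma ** 2 / k) < 0.05)
print(abs(((k - 1) * s2 / sigma ** 2).mean() - (k - 1)) < 0.05)
```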
Bibliography
[1] Acerbi, C., Tasche, D. (2002). On the coherence of expected shortfall. Journal of Banking and Finance 26/7, 1487-1503.
[2] Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control 19/6, 716-723.
[3] Alai, D.H., Merz, M., Wüthrich, M.V. (2009). Mean square error of prediction in the Bornhuetter-Ferguson claims reserving method. Annals of Actuarial Science 4/1, 7-31.
[4] Alai, D.H., Merz, M., Wüthrich, M.V. (2010). Prediction uncertainty in the Bornhuetter-Ferguson claims reserving method: revisited. Annals of Actuarial Science 5/1, 7-17.
[5] Artzner, P., Delbaen, F., Eber, J.M., Heath, D. (1997). Thinking coherently. Risk 10/11, 68-71.
[6] Artzner, P., Delbaen, F., Eber, J.M., Heath, D. (1999). Coherent measures of risk. Mathematical Finance 9/3, 203-228.
[7] Asmussen, S., Albrecher, H. (2010). Ruin Probabilities. 2nd edition. World Scientific.
[8] Bahr, von B. (1975). Asymptotic ruin probabilities when exponential moments do not exist. Scandinavian Actuarial Journal 1975, 6-10.
[9] Bailey, R.A. (1963). Insurance rates with minimum bias. Proceedings CAS 50, 4-11.
[10] Bailey, R.A., Simon, L.J. (1960). Two studies on automobile insurance ratemaking. ASTIN Bulletin 1, 192-217.
[11] Bichsel, F. (1964). Erfahrungstarifierung in der Motorfahrzeug-Haftpflicht-Versicherung. Bulletin of the Swiss Association of Actuaries 1964, 119-130.
[12] Billingsley, P. (1968). Convergence of Probability Measures. Wiley.
[13] Billingsley, P. (1995). Probability and Measure. 3rd edition. Wiley.
[14] Boland, P.J. (2007). Statistical and Probabilistic Methods in Actuarial Science. Chapman & Hall/CRC.
[15] Bolthausen, E., Wüthrich, M.V. (2013). Bernoulli's law of large numbers. ASTIN Bulletin 43/2, 73-79.
[16] Bornhuetter, R.L., Ferguson, R.E. (1972). The actuary and IBNR. Proceedings CAS 59, 181-195.
[17] Boyd, S., Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press.
[18] Bühlmann, H. (1970). Mathematical Methods in Risk Theory. Springer.
[19] Bühlmann, H. (1980). An economic premium principle. ASTIN Bulletin 11/1, 52-60.
[20] Bühlmann, H. (1992). Stochastic discounting. Insurance: Mathematics and Economics 11/2, 113-127.
[21] Bühlmann, H. (1995). Life insurance with stochastic interest rates. In: Financial Risk in Insurance, G. Ottaviani (ed.), Springer, 1-24.
[22] Bühlmann, H. (2004). Multidimensional valuation. Finance 25, 15-29.
[23] Bühlmann, H., De Felice, M., Gisler, A., Moriconi, F., Wüthrich, M.V. (2009). Recursive credibility formula for chain ladder factors and the claims development result. ASTIN Bulletin 39/1, 275-306.
[24] Bühlmann, H., Gisler, A. (2005). A Course in Credibility Theory and its Applications. Springer.
[25] Bühlmann, H., Straub, E. (1970). Glaubwürdigkeit für Schadensätze. Bulletin of the Swiss Association of Actuaries 1970, 111-131.
[27] Černý, A. (2006). Introduction to fast Fourier transform in finance. SSRN manuscript ID 559416.
[35] Denuit, M., Maréchal, X., Pitrebois, S., Walhin, J.-F. (2007). Actuarial Modelling of Claim Counts. Wiley.
[36] Dickson, D.C.M. (2005). Insurance Risk and Ruin. Cambridge University Press.
[37] Duffie, D. (2001). Dynamic Asset Pricing Theory. 3rd edition. Princeton University Press.
[38] Embrechts, P., Frei, M. (2009). Panjer recursion versus FFT for compound distributions. Mathematical Methods of Operations Research 69/3, 497-508.
[39] Embrechts, P., Klüppelberg, C., Mikosch, T. (2003). Modelling Extremal Events for Insurance and Finance. 4th printing. Springer.
[40] Embrechts, P., Nešlehová, J., Wüthrich, M.V. (2009). Additivity properties for Value-at-Risk under Archimedean dependence and heavy-tailedness. Insurance: Mathematics and Economics 44/2, 164-169.
[41] Embrechts, P., Veraverbeke, N. (1982). Estimates for the probability of ruin with special emphasis on the possibility of large claims. Insurance: Mathematics and Economics 1/1, 55-72.
[42] England, P.D., Verrall, R.J. (2002). Stochastic claims reserving in general insurance. British Actuarial Journal 8/3, 443-518.
[43] England, P.D., Verrall, R.J., Wüthrich, M.V. (2012). Bayesian overdispersed Poisson model and the Bornhuetter-Ferguson claims reserving method. Annals of Actuarial Science 6/2, 258-283.
[44] European Commission (2010). QIS 5 Technical Specifications, Annex to Call for Advice from CEIOPS on QIS5.
[45] Feller, W. (1966). An Introduction to Probability Theory and its Applications. Volume II. Wiley.
[46] FINMA (2006). Swiss Solvency Test. FINMA SST Technisches Dokument, Version 2. October 2006.
[47] Föllmer, H., Schied, A. (2004). Stochastic Finance: An Introduction in Discrete Time. 2nd edition. de Gruyter.
[48] Fortuin, C.M., Kasteleyn, P.W., Ginibre, J. (1971). Correlation inequalities on some partially ordered sets. Communications in Mathematical Physics 22/2, 89-103.
[49] Frees, E.W. (2010). Regression Modeling with Actuarial and Financial Applications. Cambridge University Press.
[50] Fringeli, M. (2005). Credibility für Probleme mit räumlicher Abhängigkeit. Diploma Thesis, ETH Zurich.
[51] Garcia Ben, M., Yohai, V.J. (2004). Quantile-quantile plot for deviance residuals in the generalized linear model. Journal of Computational and Graphical Statistics 13/1, 36-47.
[52] Gesmann, M., Murphy, D., Zhang, W., Carrato, A., Crupi, G., Wüthrich, M.V. (2015). ChainLadder: statistical methods and models for the calculation of outstanding claims reserves in general insurance. R package version 0.2.0.
[53] Gilks, W.R., Richardson, S., Spiegelhalter, D.J. (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall.
[54] Gisler, A. (2011). Nicht-Leben Versicherungsmathematik. Lecture Notes, ETH Zurich.
[55] Gisler, A., Wüthrich, M.V. (2008). Credibility for the chain ladder reserving method. ASTIN Bulletin 38/2, 565-600.
[56] Green, P.J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82/4, 711-732.
[57] Green, P.J. (2003). Trans-dimensional Markov chain Monte Carlo. In: Highly Structured Stochastic Systems, P.J. Green, N.L. Hjort, S. Richardson (eds.), Oxford Statistical Science Series, 179-206. Oxford University Press.
[58] Hachemeister, C.A., Stanard, J.N. (1975). IBNR claims count estimation with static lag functions. ASTIN Colloquium 1975, Portugal.
[59] Happ, S., Merz, M., Wüthrich, M.V. (2015). Best-estimate claims reserves in incomplete markets. European Actuarial Journal 5/1, 55-77.
[60] Hofert, M., Wüthrich, M.V. (2013). Statistical review of nuclear power accidents. Asia-Pacific Journal of Risk and Insurance 7/1, Article 1.
[61] Johansen, A.M., Evers, L., Whiteley, N. (2010). Monte Carlo Methods. Lecture Notes, Department of Mathematics, University of Bristol.
[62] Johnson, R.A., Wichern, D.W. (1998). Applied Multivariate Statistical Analysis. 4th edition. Prentice-Hall.
[63] Jung, J. (1968). On automobile insurance ratemaking. ASTIN Bulletin 5, 41-48.
[64] Kaas, R., Goovaerts, M., Dhaene, J., Denuit, M. (2008). Modern Actuarial Risk Theory: Using R. 2nd edition. Springer.
[65] Kehlmann, D. (2005). Die Vermessung der Welt. Rowohlt Verlag.
[66] Kolmogoroff, A. (1933). Grundbegriffe der Wahrscheinlichkeitsrechnung. Springer.
[67] Kremer, E. (1985). Einführung in die Versicherungsmathematik. Vandenhoeck & Ruprecht, Göttingen.
[68] Kyprianou, A. (2014). Gerber-Shiu Risk Theory. Springer.
[69] Laplace, P.S. (1812). Théorie analytique des probabilités. Suppl. to 3rd edition, Courcier, Paris 1820.
[70] Lehmann, E.L. (1983). Theory of Point Estimation. Wiley.
[71] Lundberg, F. (1903). Approximerad framställning av sannolikhetsfunktionen. Återförsäkring av kollektivrisker. Almqvist & Wiksell, Uppsala.
[72] Mack, T. (1991). A simple parametric model for rating automobile insurance or estimating IBNR claims reserves. ASTIN Bulletin 21/1, 93-109.
[73] Mack, T. (1993). Distribution-free calculation of the standard error of chain ladder reserve estimates. ASTIN Bulletin 23/2, 213-225.
[77] McNeil, A.J., Frey, R., Embrechts, P. (2005). Quantitative Risk Management: Concepts, Techniques and Tools. Princeton University Press.
[78] Merz, M., Wüthrich, M.V. (2008). Modelling the claims development result for solvency purposes. CAS E-Forum Fall 2008, 542-568.
[82] Ohlsson, E., Johansson, B. (2010). Non-Life Insurance Pricing with Generalized Linear Models. Springer.
[83] Panjer, H.H. (1981). Recursive evaluation of a family of compound distributions. ASTIN Bulletin 12/1, 22-26.
[84] Panjer, H.H. (2006). Operational Risk: Modeling Analytics. Wiley.
[85] Renshaw, A.E., Verrall, R.J. (1998). A stochastic model underlying the chain-ladder technique. British Actuarial Journal 4/4, 903-923.
[86] Resnick, S.I. (1997). Heavy tail modeling of teletraffic data. Annals of Statistics 25/5, 1805-1869.
[87] Resnick, S.I. (2002). Adventures in Stochastic Processes. 3rd printing. Birkhäuser.
[88] Robert, C.P. (2001). The Bayesian Choice. 2nd edition. Springer.
[89] Rolski, T., Schmidli, H., Schmidt, V., Teugels, J. (1999). Stochastic Processes for Insurance and Finance. Wiley.
[90] Saluz, A., Gisler, A., Wüthrich, M.V. (2011). Development pattern and prediction error for the stochastic Bornhuetter-Ferguson claims reserving model. ASTIN Bulletin 41/2, 279-317.
[91] Schmidli, H. (2007). Risk Theory. Lecture Notes, University of Cologne.
[92] Schweizer, M. (2009). Stochastic Processes and Stochastic Analysis. Lecture Notes, ETH Zurich.
[93] Smith, A., Thaper, S. (2014). Making uncertainty explicit: stochastic modelling. Actuarial Post, February 12, 2014, 12-15.
[94] Sovacool, B.K. (2008). The costs of failure: a preliminary assessment of major energy accidents, 1907-2007. Energy Policy 36/5, 1802-1820.
[95] Sundt, B., Jewell, W.S. (1981). Further results of recursive evaluation of compound distributions. ASTIN Bulletin 12/1, 27-39.
[96] Tsanakas, A., Christofides, N. (2006). Risk exchange with distorted probabilities. ASTIN Bulletin 36/1, 219-243.
[97] Williams, D. (1991). Probability with Martingales. Cambridge University Press.
[98] Wüthrich, M.V. (2013). From ruin theory to solvency in non-life insurance. To appear in Scandinavian Actuarial Journal.
[99] Wüthrich, M.V., Bühlmann, H., Furrer, H. (2010). Market-Consistent Actuarial Valuation. 2nd edition. Springer.
[100] Wüthrich, M.V., Merz, M. (2008). Stochastic Claims Reserving Methods in Insurance. Wiley.
[101] Wüthrich, M.V., Merz, M. (2013). Financial Modeling, Actuarial Valuation and Solvency in Insurance. Springer.
[102] Wüthrich, M.V., Merz, M. (2015). Stochastic claims reserving manual: advances in dynamic modeling. SSRN Manuscript ID 2649057.
List of exercises

Exercise 1, page 18
Exercise 2, page 22
Corollary 2.7, page 28
Exercise 3, page 28
Exercise 4, page 40
Exercise 5, page 51
Exercise 6, page 51
Exercise 7, page 60
Exercise 8, page 78
Exercise 9, page 84
Exercise 10, page 90
Exercise 11, page 90
Exercise 12, page 98
Exercise 13, page 143
Corollary 6.6, page 149
Exercise 14, page 150
Exercise 15, page 153
Exercise 16, page 154
Exercise 17, page 155
Exercise 18, page 158
Exercise 19, page 159
Exercise 20, page 164
Exercise 21, page 181
Exercise 22, page 191
Exercise 23, page 221
Exercise 24, page 221
Exercise 25, page 222
Exercise 26, page 257
Index
F-distribution, 180
χ²-distribution, 22, 62
χ²-goodness-of-fit test, 49, 83
p-value, 22
decomposition property, 33
chi-square distribution, 22, 62
chi-square-goodness-of-fit test, 49, 83
definition, 30
CL factor, 232
moments, 30
Bayes, 240
concave, 144
estimate, 240
conditional tail expectation, 159
CL method, 232
conjugate prior, 208
CL model
constant absolute risk-aversion, 147
gamma-gamma Bayes, 239
constant relative risk-aversion, 147
MSEP, 242
continuous variable, 177
CL reserves, 233
convergence in distribution, 17
claims
convex cone, 161
counts, 23
convolution, 25
frequency, 26
cost-of-capital, 160, 163
claims development
rate, 160, 164
result, 250, 272
Cramér, Harald, 121
triangle, 229
Cramér-Lundberg process, 121
claims inflation, 89
credibility coefficient, 218, 241
claims reserves, 230, 231
credibility estimator, 209
claims reserving, 225
homogeneous, 214
algorithm, 232
inhomogeneous, 214
method stochastic, 237
credibility weight, 201, 205, 208
closing date, 226
credit risk, 268, 271
CLT, 13, 94
CRRA utility function, 147
CoC, 160
CTE, 159
rate, 160
cumulant function, 182
coefficient of determination, 178
cumulant generating function, 19
coefficient of variation, 16, 58
current year claim, 269
coherent risk measure, 157, 162
CY claim, 269
collective mean, 213
CY risk, 272
collective risk model, 23
Darling, Donald Allan, 82
compound binomial distribution, 27
De Moivre, Abraham, 13, 94
definition, 27
decomposition property, 33
moments, 28
deductible, 88
compound distribution, 23
deflator, 165, 270
definition, 23
Delbaen, Freddy, 157
moments, 24
compound negative-binomial distribution, density, 15, 58
descending ladder epoch, 129
40
design matrix, 175
definition, 40
development year, 229
moments, 40
deviance statistics, 190
compound Poisson distribution, 30
aggregation property, 31
discrete distribution, 15
EDF, 182
Edgeworth approximation, 100
Edgeworth, Francis Ysidro, 100
Embrechts, Paul, 139
Embrechts-Veraverbeke theorem, 137
empirical
distribution function, 56
loss size index function, 56
mean excess function, 56
England, Peter D., 247
ES, 159
Esscher
measure, 154
premium, 154, 165
estimation error, 21
estimator, 21
expectation, 15
expected claims frequency, 26
expected shortfall, 158, 159, 163
expected value, 15, 58
expected value principle, 141
exponential dispersion family, 182, 208
exponential distribution, 62
exponential utility function, 147
discretization, 108
disjoint decomposition, 32
property, 33
dispersion, 182
distortion function, 156
distribution function, 15
distribution-free CL model, 238
Duffie, James Darrell, 165
F-distribution, 180
fast Fourier transform, 116
Ferguson, Ronald E., 236
FFT, 116
finite horizon ruin probability, 122
first moment, 15
Fisher, Sir Ronald Aylmer, 45
Fourier transform
discrete, 117
i.i.d., 20
IBNYR, 227
incomplete gamma function, 59
independent and identically distributed,
20
individual claim size, 23, 53
informative prior, 206
inhomogeneous credibility estimator, 214
insurance risk, 268, 271
inverse Gaussian distribution, 63
inversion formula, 117
isoelastic utility function, 147
Jewell, William S., 106
Jung, Jan, 173
Khinchin, Aleksandr Yakovlevich, 132
Kolmogorov distribution, 80
ladder
epoch, 129
height, 129
Laplace, Pierre-Simon, 13, 94, 202
large claims separation, 35
law of large numbers, 12
layer, 58, 86
leverage effect, 89
likelihood function, 45
likelihood ratio test, 179
linear credibility, 201, 212
link ratio, 232
LLN, 12
log-gamma distribution, 70
log-likelihood function, 45
log-linear model, 176
log-link function, 185
log-log plot, 57
log-normal distribution, 66
loss size index function, 56, 58
Lundberg
bound, 125, 126
coefficient, 125
Lundberg, Ernst Filip Oskar, 121
Lyapunov, Aleksandr Mikhailovich, 94
mean, 15, 58
mean excess function, 56, 58
mean excess plot, 57
mean square error of prediction
conditional, 237
Merz, Michael, 250
Merz-Wüthrich formula, 255
method of
Bailey & Jung, 173
Bailey & Simon, 170
total marginal sums, 173
method of moments, 40
minimal variance estimator, 42
mixed Poisson distribution, 36
definition, 36
MLE, 40, 45
MM, 40
model risk, 13
model world, 14
moment estimator, 41
moment generating function, 16, 19, 58
moments, 15
monotonicity, 161
Monte Carlo simulation, 93
Morgenstern, Oskar, 144
MSEP, 237, 242
multiplicative tariff, 168
multivariate Gaussian distribution, 175
density, 176
MV, 42
MW formula, 255
negative-binomial distribution, 37
definition, 37
moments, 38
net profit condition, 124
Neumann, von John, 144
non-informative prior, 206
normal approximation, 94
normalization, 161
NPC, 124
null hypothesis, 21
number of claims, 23
p-value, 22
Pólya, George, 38
Panjer
algorithm, 105, 107
distribution, 105
recursion, 105
Panjer, Harry H., 105
parameter estimation
claims count distribution, 40
error, 238
Pareto distribution, 73
Pareto, Vilfredo Federico Damaso, 73
past exposure claim, 230
Pearson's residuals, 191
Pearson, Karl, 59, 83
Poisson distribution, 29, 184
definition, 29
moments, 29
Poisson, Siméon Denis, 29
Poisson-gamma model, 203
Pollaczek, Félix, 132
Pollaczek-Khinchin formula, 129, 132
positive homogeneity, 161
posterior
distribution, 203
parameter, 204
power law distribution, 73
power utility function, 147
prediction error, 21, 221
predictor, 21, 237
premium
calculation principle, 141
CY, 269
elements, 13
liability risk, 268, 273
previous year claim, 268
Price, Richard, 202
prior
distribution, 202
parameter, 204
probability distortion, 156
probability space, 14
process uncertainty, 238
provisions, 230, 269
pure randomness, 13
PY claim, 268
PY risk, 272
radius of convergence, 16
Radon-Nikodym derivative, 165
random variables, 14
random walk theorem, 124
rapidly varying, 59
RBNS, 228
re-insurance, 88
real world, 14
regularly varying, 59, 136
renewal property, 125
reporting
date, 226
delay, 226
reserve risk, 268
reserves, 230, 231
residual standard deviation, 179
Resnick, Sidney Ira, 75
Riemann-Stieltjes integral, 15
risk
averse, 144
bearing capital, 159
characteristics, 168
class, 168
components, 13
margin, 266, 274
measure, 159, 265
modules, 267
ruin probability
finite horizon, 122
ultimate, 123
ruin theory, 121
ruin time, 122
sample
estimators, 41
mean, 41, 54
variance, 41, 54
saturated model, 189
scale parameter, 59
scaled deviance, 190
scatter plot, 54
settlement
date, 226
delay, 229
period, 226
shape parameter, 59
Shiu, Elias S.W., 121
significance level, 21
Simon, LeRoy J., 170
skewness, 16, 58
slowly varying, 59
Smirnov, Nikolai Vasilyevich, 79
solvency, 266
Solvency II, 265
Spitzer's formula, 130
Spitzer, Frank Ludvig, 131
SST, 265
standard assumptions for compound distributions, 23
standard deviation, 16
standard deviation loading principle, 142
stochastic claims reserving method, 237
stochastic dominance, 109
stopping time, 129
Straub, Erwin, 212
structural parameter, 218
subadditivity, 161
subexponential, 133, 135, 137
Sundt, Bjørn, 106
surplus process, 121, 264
survival function, 20, 58
Swiss Solvency Test, 265