
The Journal of Operational Risk (27–43) Volume 4/Number 3, Fall 2009

Bayesian analysis of extreme operational losses
Chyng-Lan Liang
Algorithmics (UK) Limited, Eldon House, 2 Eldon Street, London EC2M 7LS, UK;
email: chyngl@yahoo.com
Bayesian techniques offer an alternative to parameter estimation methods,
such as maximum likelihood estimation, for extreme value models. These
techniques treat the parameters to be estimated as random variables,
instead of some fixed, possibly unknown, constants. We investigate, with
simulated examples, how Bayesian analysis can be used to estimate the
parameters of extreme value models, for the case where we have no prior
knowledge at all and the case where we have prior knowledge in the form
of expert opinion. In addition, Bayesian analysis provides a framework for
the incorporation of information from external data into a loss model based
on internal data; this is again illustrated using simulation.
1 INTRODUCTION
Maximum likelihood estimation (MLE) techniques are not the only way to draw
inferences from the likelihood function; Bayesian inference offers an alternative
methodology, as well as viewpoint. There is some debate concerning the viability
of these methods, which we will only briefly touch upon. We will be concentrating
instead on how these methods might be applied in practice.
The Bayesian techniques we will consider here have wide applicability, but we are
particularly interested in how these techniques may give us greater insight into the
behavior of loss processes at extreme levels. There has been a great deal of interest in
the use of Bayesian methods in operational risk; recent papers on the subject include
those by Shevchenko and Wüthrich (2006) and Peters and Sisson (2006). We will not
be restricting ourselves to conjugate priors (see Shevchenko and Wüthrich (2006) for
classes of frequency and severity models admitting conjugate forms) and will instead
be concentrating on the fitting of extreme losses, using the distributions suggested by
extreme value theory.
In Section 2, we introduce the concepts behind Bayesian analysis; in Section 3
we briefly describe how simulation-based techniques, and in particular Markov
chain Monte Carlo (MCMC) techniques, can help us overcome the difficulty of
computation in Bayesian analysis. In Section 4, we consider parameter estimation
for the extreme value model for annual maxima, through a small simulation study.
For simulated data, we compare the Bayesian estimates for the model parameters
when we have no prior information (Section 4.1) with those obtained when we have
© 2009 Incisive Media. Copying or distributing in print or electronic forms without written permission of Incisive Media is prohibited.
prior expert opinion (Section 4.2). In the latter case, there is no obvious methodology
for transforming expert opinions into prior distributions for the parameters. We will
be following the method for eliciting prior information proposed by Coles and Tawn
(1996). We consider using external data for prior specification in Section 5. Typically
there are concerns about the incorporation of external data into the modeling
process: questions about scaling, applicability and loss severity thresholds. We
illustrate a possible use of Bayesian analysis with an example showing where
the external data can provide us with information concerning the shape of the
distribution.
2 BAYESIAN INFERENCE
Suppose we have data x = (x_1, . . . , x_n), constituting independent identically distributed (iid) realizations of a random variable, X, whose density belongs to a parametric family parametrized by θ. The likelihood for θ may then be given by P(x | θ) = ∏_{i=1}^{n} P(x_i | θ), since the observations x_i, i = 1, . . . , n, are independent.
In the classical framework, θ is a constant: an unknown constant to be estimated, in many cases. In the Bayesian setting, θ itself is a random variable, with an a priori distribution, which reflects uncertainty concerning the parameter value prior to observing any data. This distribution is called the prior distribution.
Bayes' theorem states that:

P(θ | x) = P(θ) P(x | θ) / ∫_Θ P(y) P(x | y) dy   (1)
Bayes' theorem is an immediate consequence of the axioms of probability; it is its use in statistical analysis that is controversial and revolutionary. The theorem allows us to convert some prior set of beliefs concerning the unknown θ, as represented by the prior distribution, P(θ), into a posterior distribution, P(θ | x), which incorporates the information provided by the data x. In addition, since we get a complete distribution, not just an estimate, the accuracy of our estimate can be summarized, for example, by the variance of the posterior distribution; we do not need to resort to asymptotic theory, as we do for MLE. Proponents of this approach argue that the prior distribution allows us to supplement data with other sources of relevant information; opponents contend that conclusions become subjective with the subjective choice of priors.
3 METROPOLIS-HASTINGS ALGORITHM
Computation of the integral in the denominator in Equation (1) can cause problems, especially for high-dimensional θ. This problem has been overcome by simulation-based techniques. The idea behind simulation-based techniques is to simulate a
sample from the posterior distribution and use this simulated sample to obtain
estimates of the moments and properties of the posterior distribution.
Under fairly general conditions, a Markov chain eventually reaches an equilibrium distribution. The Metropolis-Hastings algorithm aims to construct a Markov chain that has the posterior distribution as its equilibrium distribution. The algorithm is given below.

Set an initial value θ_1.

Specify an arbitrary probability rule q(θ_{i+1} | θ_i) for iterative simulation of successive values. The q function here is known as the transition kernel of the chain. This generates a first-order Markov chain, as the stochastic properties of θ_{i+1} are independent of θ_1, . . . , θ_{i−1}, given θ_i. In order for the sequence θ_1, θ_2, . . . to have the equilibrium distribution given by Equation (1), we add an acceptance/rejection step.

For iteration i, use the probability rule q(· | θ_i) to generate a candidate value, y, for θ_{i+1}. Let:

α_i = min{1, [P(y) P(x | y) q(θ_i | y)] / [P(θ_i) P(x | θ_i) q(y | θ_i)]}   (2)

We set:

θ_{i+1} = y with probability α_i;  θ_{i+1} = θ_i with probability 1 − α_i   (3)

Under simple regularity assumptions, the generated sequence is a Markov chain, with equilibrium distribution being the distribution given by Equation (1). For a large enough value of m, the sequence θ_m, θ_{m+1}, . . . can be used in a similar way to a sequence of independent values to estimate properties of the posterior distribution, like the posterior mean. This holds regardless of the choice of the transition kernel, q, though this choice will affect the settling-in period and the dependence in the sequence.
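As a concrete sketch of these steps (not the paper's code), consider a random-walk version with a symmetric normal transition kernel, for which the q terms in the acceptance probability (2) cancel:

```python
import numpy as np

def metropolis_hastings(log_post, theta0, n_iter, step, seed=0):
    # Random-walk Metropolis: the normal kernel is symmetric, so the q terms
    # in the acceptance probability (2) cancel and only the posterior ratio remains.
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)                      # log[P(theta)P(x | theta)], unnormalized
    chain = np.empty((n_iter, theta.size))
    for i in range(n_iter):
        cand = theta + rng.normal(0.0, step, size=theta.size)
        lp_cand = log_post(cand)
        if np.log(rng.uniform()) < lp_cand - lp:   # accept with probability alpha_i
            theta, lp = cand, lp_cand
        chain[i] = theta                      # on rejection, repeat the old value
    return chain

# Toy target: a N(3, 2^2) "posterior"; the chain mean should settle near 3
chain = metropolis_hastings(lambda t: -0.5 * ((t[0] - 3.0) / 2.0) ** 2,
                            np.array([0.0]), 20000, 1.0)
est = chain[2000:, 0].mean()                  # discard the settling-in period
```

The toy target and step size are illustrative choices; in practice the step controls the acceptance rate and hence the dependence in the sequence.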
3.1 Estimation of extreme quantiles
The objective of extreme value analysis is usually an estimate of the probability
of future events reaching extreme levels. Extreme quantiles are therefore often of
primary interest. We will use the 99.5% quantile as a comparison point; it is often
convenient to have one point of comparison, as opposed to comparing the estimates
for each distribution parameter.
Suppose we again have our iid observations x = (x_1, x_2, . . . , x_n) for X with probability density function (pdf) P(x | θ). Prediction can be handled within a Bayesian setting. Let x̃ denote a future observation of X. The predictive density of x̃ given our previous observations x = (x_1, x_2, . . . , x_n) is given by:

P(x̃ | x_1, x_2, . . . , x_n) = ∫_Θ P(x̃ | θ) P(θ | x_1, x_2, . . . , x_n) dθ   (4)
Research Paper www.thejournalofoperationalrisk.com
Suppose the posterior distribution has been estimated by simulation. If θ_1, . . . , θ_s denotes a sample from a random variable with distribution P(θ | x), then we have, in the Bayesian framework:

P(X < c | x_1, x_2, . . . , x_n) = ∫_Θ P(X < c | θ) P(θ | x_1, x_2, . . . , x_n) dθ
    ≈ (1/s) ∑_{i=1}^{s} P(X < c | θ_i)   (5)
The predictive distribution, given by Equation (5), allows for both parameter uncer-
tainty and randomness in future observations. Equation (5) may be used to estimate
the extreme quantiles of interest, by determining the c giving the desired probability.
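The combination of Equation (5) with bisection can be sketched as follows; the exponential cdf and the two-point posterior sample are hypothetical stand-ins, chosen so that the answer can be checked in closed form:

```python
import numpy as np

def predictive_quantile(cdf, thetas, p, lo, hi, tol=1e-6):
    # Bisection for the c with (1/s) * sum_i P(X < c | theta_i) = p, as in (5)
    def pred_cdf(c):
        return np.mean([cdf(c, t) for t in thetas])
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pred_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical posterior sample of two rate parameters, X | theta ~ Exp(theta)
thetas = [1.0, 2.0]
q = predictive_quantile(lambda c, t: 1.0 - np.exp(-t * c), thetas, 0.5, 0.0, 10.0)
```

Here the predictive median solves e^{−c} + e^{−2c} = 1, so the bisection result can be verified against the closed-form answer −log[(√5 − 1)/2].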
4 ESTIMATION FOR EXTREME VALUE MODEL
Let Y_1, Y_2, . . . be a sequence of iid random variables. Let X_n = max{Y_1, . . . , Y_n}. From work by Fisher and Tippett (1928), we have that if two sequences of real numbers a_n > 0, b_n exist such that:

lim_{n→∞} P((X_n − b_n)/a_n ≤ x) = F(x)   (6)

then, if F is a non-degenerate distribution function, it can be given by:

F(x) = exp{−[1 + k(x − μ)/σ]^{−1/k}}   if k ≠ 0
F(x) = exp{−exp[−(x − μ)/σ]}   if k = 0   (7)

Here σ > 0 and, for k > 0, x > μ − σ/k, while for k < 0, x < μ − σ/k. k is the shape parameter, μ is the location parameter and σ is the scale parameter. This is the generalized extreme value (GEV) distribution.
Suppose we have data (x_1, x_2, . . . , x_n), where x_i is the annual maximum loss for the year indexed by i. We will illustrate the estimation of distribution parameters using Bayesian inference through a small simulation study, and we will simulate the maximum losses from the GEV distribution.
One thousand iid observations were simulated from GEV(k = 0.2, μ = 5, σ = 5). Figure 1 shows the histogram and empirical cumulative distribution function (cdf) for our simulated data.
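A sketch of this simulation step, assuming the orientation of the GEV in which k > 0 gives the heavy tail (the orientation consistent with the quoted 99.5% quantile of 52.1 for GEV(0.2, 5, 5)):

```python
import numpy as np

def rgev(n, k, mu, sigma, seed=0):
    # Inverse-cdf sampling from F(x) = exp(-[1 + k(x - mu)/sigma]^(-1/k)), k != 0:
    # solving F(x) = u gives x = mu + (sigma/k) * [(-log u)^(-k) - 1]
    u = np.random.default_rng(seed).uniform(size=n)
    return mu + sigma * ((-np.log(u)) ** (-k) - 1.0) / k

x = rgev(1000, k=0.2, mu=5.0, sigma=5.0)
# Analytic 99.5% quantile in this parameterization, used later as the comparison point
q995 = 5.0 + (5.0 / 0.2) * ((-np.log(0.995)) ** (-0.2) - 1.0)
```

The seed and sample size are illustrative; the draws are supported on x > μ − σ/k = −20 for these parameter values.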
In the following sections, we consider the case where we lack prior informa-
tion (4.1) and the case where we have some (4.2). We use Bayesian techniques
to estimate the parameters and extreme quantiles for our GEV distribution in
these two cases; the resulting estimates will be compared with those obtained by
classical MLE.
FIGURE 1 Histogram and empirical cdf of data generated from GEV(0.2, 5, 5), with n = 1,000. The cdf and pdf of the GEV distribution fitted by MLE are also shown.
4.1 Non-informative prior
In the situation where we have no prior information, we will still need to specify a
prior distribution. It is common practice to use either uniform priors or priors with
very high variance, reflecting the absence of any genuine prior information. Such
priors are referred to as non-informative priors.
We reparameterize by setting φ = log σ, as an easier way to respect the positivity of σ. The MCMC realizations for φ may be transformed back to realizations of σ by taking the exponential.
We assume independence and the following prior pdf:

P(μ, φ, k) = f_μ(μ) f_φ(φ) f_k(k)   (8)

where f_μ(·), f_φ(·) and f_k(·) are normal density functions with mean zero and variances v_μ, v_φ and v_k respectively. The variances, v_μ, v_φ and v_k, should be large for a near-flat prior; the choice of normality is arbitrary here. We chose v_μ = v_φ = v_k = 10^4.
We adopt an algorithm that is a slight variant of the one described in Section 3. The generation of a candidate value and its subsequent acceptance/rejection are carried out sequentially for each of the parameters, μ, φ and k, where q is replaced in turn by transition densities q_μ, q_φ and q_k, each being a function of its own argument only. We settled on normal q's, with mean zero and variances w_μ, w_φ and w_k.
We note that, unlike our choice for the prior distribution, our choice for the transition density, q, will not affect the model; it will affect only the efficiency of the algorithm. We chose w_μ = 0.01, w_φ = 0.0025 and w_k = 0.001. This choice was made fairly arbitrarily; no attempt has been made here to tailor either the
FIGURE 2 MCMC realizations of GEV parameters for non-informative prior: top panel, k; middle panel, σ; bottom panel, μ. The horizontal line marks the actual parameter value; our data was simulated from GEV(0.2, 5, 5).
transition density or the MCMC algorithm to improve its efficiency or the properties
of the chain.
Figure 2 shows the values generated by 1,000 iterations of the chain, with initial values of k_1 = 0, μ_1 = 0, φ_1 = 2. In Figure 2 (scale parameter), we have transformed the scale parameter back to the σ scale by setting σ_i = exp(φ_i) for each of the simulated φ_i values.
The settling-in period seems to take around 200 iterations in this example. If we
delete the first 200 simulated values, the remaining 800 can be used to determine
properties of the posterior distribution.
The sample means and standard deviation of the 800 simulated values, for each
of the model parameters, are shown in Table 1, together with the MLE and standard
TABLE 1 Estimates (to three significant figures) for the GEV model parameters, k, σ, μ.

Parameter            k              σ             μ             99.5% quantile
Bayesian inference   0.194 (0.027)  5.19 (0.145)  4.89 (0.175)  53.2
MLE                  0.200 (0.024)  5.15 (0.145)  4.85 (0.183)  53.3
Actual value         0.2            5             5             52.1

Results are shown to three significant figures, derived by MLE and Bayesian analysis with non-informative priors (standard deviations/errors in parentheses), together with the resulting estimate for the 99.5% quantile.
errors. The results are very similar, which is reassuring given how uninformative the
prior specification is.
We are also interested in estimating extreme quantiles. Equation (5) was used, together with the bisection method, to calculate the 99.5% quantile for our annual maximum. Discarding the first 200 of 1,000 MCMC realizations, we have s = 800 and an estimate of 53.2 (to three significant figures) for the 99.5% quantile. The MLE is 53.3, and the 99.5% quantile for the distribution from which the data was generated is 52.1 (to three significant figures).
When we have no prior information concerning the extreme value model param-
eters, Bayesian inference, with non-informative priors, may provide us with another
method to estimate the parameter values. There are no ranges in the parameter space
where the method breaks down, unlike with MLE, and estimates for the variability
of the estimates are also a side product of the methodology.
4.2 Prior expert opinion
We will next consider the situation where we do have prior information or beliefs
concerning the parameters, and show how we may use data to update our beliefs.
In this section, we will model the same data as in Section 4.1, but this time we will
assume that we have experts on whose opinion to base our prior specication.
It is unlikely that experts will be able to express their prior beliefs concerning extremal behavior directly in terms of the model parameters. Even if they were able to come up with prior marginal distributions for the parameters, coming up with the joint prior specification would still remain problematic. In particular, increasing either the scale parameter or the shape parameter leads to a longer-tailed distribution, so dependence between these two parameters is expected.
Coles and Tawn (1996) advocate asking experts about the quantiles of some extreme values. In particular, we may ask experts for their median and 90% estimates for particular quantiles of the annual maximum loss.
Since we have three model parameters, we require expert opinion for three quantiles, which we will denote by q_1, q_2 and q_3, where, for i = 1, 2, 3:

q_i = μ + (σ/k)([−log p_i]^{−k} − 1)   (9)
for some (large) probabilities p_1 < p_2 < p_3. Since, by definition, q_1 < q_2 < q_3, and we need to respect this ordering, Coles and Tawn (1996) advocate working instead with the differences q̃_1 = q_1, q̃_2 = q_2 − q_1, q̃_3 = q_3 − q_2, with the assumption that:

q̃_i ∼ gamma(α_i, β_i),   i = 1, 2, 3   (10)
The estimates for the median and 90% quantile for each of q̃_1, q̃_2 and q̃_3, provided by our experts, will allow us to calculate the gamma parameters, α_i, β_i for i = 1, 2, 3, by solving two simultaneous equations in two unknowns. Assuming independence, we then have:

P(q_1, q_2, q_3) = [β_1^{α_1} q_1^{α_1 − 1} exp(−β_1 q_1) / Γ(α_1)]
    × [β_2^{α_2} (q_2 − q_1)^{α_2 − 1} exp(−β_2 [q_2 − q_1]) / Γ(α_2)]
    × [β_3^{α_3} (q_3 − q_2)^{α_3 − 1} exp(−β_3 [q_3 − q_2]) / Γ(α_3)]   (11)
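The elicitation step reduces, for each difference, to solving two equations in two unknowns. A sketch of that computation (the elicited values 38 and 47 are hypothetical, not taken from the paper, and scipy is assumed to be available); it exploits the fact that the ratio of the 90% quantile to the median of a gamma distribution depends on the shape alone and decreases monotonically in it:

```python
from scipy.stats import gamma
from scipy.optimize import brentq

def gamma_from_quantiles(median, q90):
    # Recover (alpha, beta) of a gamma(alpha, beta) prior (rate parameterization)
    # from an elicited median and 90% quantile: solve for the shape via the
    # scale-free quantile ratio, then back out the rate.
    ratio = q90 / median
    alpha = brentq(lambda a: gamma.ppf(0.9, a) / gamma.ppf(0.5, a) - ratio,
                   0.1, 1e4)
    beta = gamma.ppf(0.5, alpha) / median
    return alpha, beta

# Hypothetical elicitation: an expert states median 38 and 90% quantile 47
a, b = gamma_from_quantiles(38.0, 47.0)
```

Applying this to each of the three elicited pairs yields the (α_i, β_i) of Equation (10).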
We now have our prior distribution in terms of q_1, q_2, q_3 and we would like it in terms of our GEV model parameters, k, σ and μ. The determinant of the Jacobian matrix, J, is:

det(J) = −(σ/k²)[−log p_1]^{−k} log(−log p_1)[(−log p_2)^{−k} − (−log p_3)^{−k}]
    + (σ/k²)[−log p_2]^{−k} log(−log p_2)[(−log p_1)^{−k} − (−log p_3)^{−k}]
    − (σ/k²)[−log p_3]^{−k} log(−log p_3)[(−log p_1)^{−k} − (−log p_2)^{−k}]   (12)
Substituting Equation (9) into Equation (11) and multiplying by the absolute value of the determinant of J gives us our prior joint distribution for k, σ and μ. Figure 4 shows the marginal prior distributions for the GEV model parameters, k, σ and μ, for gamma priors with (α_1, α_2, α_3) = (35, 15, 50) and (β_1, β_2, β_3) = (0.9, 0.5, 0.8).
We next configure our MCMC algorithm, noting that again no attempt has been made to guarantee that the generated chain has desirable properties, such as a short settling-in period or low correlation, opting instead for simplicity. Our analysis is based on a Gibbs sampler, successively updating the individual parameters conditional on the current values of the other parameters, with a Metropolis acceptance/rejection step.
We sequentially generate the candidate values for each of the model parameters in turn and determine the acceptance/rejection probability, with the other parameters fixed at their last accepted values. We again use normal transition kernels.
The first 1,000 realizations of our MCMC simulation, for each of the three parameters, are shown in Figure 3. Initial values for the parameters are again taken to be k_1 = 0, μ_1 = 0, φ_1 = 2. The settling-in period seems to take around 200 iterations.
FIGURE 3 First 1,000 MCMC realizations of GEV parameters for gamma priors: top panel, k; middle panel, σ; bottom panel, μ. The horizontal line marks the actual parameter value; our data was simulated from GEV(0.2, 5, 5).
Discarding the first 200 observations, the distribution of the remaining observations
may be used to approximate the posterior distribution of our model parameters.
Figure 4 shows the marginal prior and posterior pdfs for k, σ and μ. Since we wished to plot the posterior distributions with a higher degree of smoothness, 3,000 MCMC realizations (with the first 200 observations discarded) were generated.
As might be expected, given data, our uncertainty concerning the model parameter
values decreases, and the posterior variance is smaller than the prior variance.
Comparisons between prior and posterior estimates, and between those obtained through Bayesian and classical inference, are shown in Table 2. Our prior beliefs have affected the resulting estimates: Bayesian inference is subjective.
FIGURE 4 Marginal prior and posterior distributions for k, σ and μ.
TABLE 2 Comparison of the prior, posterior and maximum likelihood estimates (to two significant figures) for the GEV model parameters.

Parameter      k             σ           μ           99.5% quantile
Prior          0.33          4.8         24
Posterior      0.25 (0.020)  5.3 (0.16)  4.9 (0.19)  62.5
MLE            0.20 (0.024)  5.1 (0.15)  4.8 (0.18)  53.3
Actual value   0.2           5           5           52.1

Simulation means are shown for the posterior; the mean for the prior was found with a discrete approximation. Standard errors/standard deviations are given in parentheses for the maximum likelihood and posterior estimates respectively. The estimates for the 99.5% quantile are also shown, together with the actual value.
Equation (5), together with the bisection method, was again used to calculate the 99.5% quantile for our annual maximum. Discarding the first 200 of 3,000 MCMC realizations, we have s = 2,800 and an estimate of 62.5 (to three significant figures) for the 99.5% quantile. Our prior beliefs have, in this instance, raised our estimate for the extreme quantile. We note that this methodology might be particularly useful where the experts believe that conditions have changed since the loss data was collected and they wish to add their own input to the estimation process. Our example results may be the result of expert belief that the losses to be suffered in the future will be larger than the collected loss data might indicate; incorporating their prior beliefs into our estimation process has resulted in a higher estimate for the 99.5% quantile.
Coles and Tawn (1996) were able to elicit prior information of the kind we have illustrated here from their hydrology expert. It remains to be seen whether experts in other fields, including those for operational losses, will be able to do the same.
5 EXTERNAL DATA
Data sparseness is a particular problem when data is heavy tailed. With only a
limited amount of data, it is very hard to determine the properties of the tail of the
distribution. Banks are increasingly attempting to supplement their loss databases
with external sources of loss data.
These efforts to catalogue the operational loss experience of the industry typically
fall into two categories: databases that use public sources, such as newspapers and
press releases, and consortium databases, based on losses contributed by member
banks, which are then pooled and shared. For both types of databases, the larger
losses are usually more likely to be catalogued, and the threshold above which losses
are reported may or may not be known.
We need to be careful when utilizing external data. The industry is beginning
to realize that one cannot simply take a tail event from another bank and treat it
as a signal of what could happen internally. Rather, a better approach might be to
consider tail events occurring only within peer rms that are of a similar size and
operate in a similar business environment, with a comparable scope of business
activities. Attempts to scale data have also been made, although no consensus on
the best practice for this has yet been reached.
Bayesian methods may be used to incorporate the use of external data into the
creation of the model for internal losses. Depending on the source and applicability
of the external data, it may be possible for external data to provide us with prior
information concerning our internal model parameters. This would be especially
desirable in the situation where there is little internal data, resulting in a bad fit
when using the internal data alone. There are various ways in which information
from external data can be used to come up with a prior distribution.
We may only have, or have confidence in, external data above a high (known) threshold. In this case, a peaks over threshold (POT) approach (Embrechts et al (2003)) is suggested. Alternatively, we may not know the threshold for the data (or it may not be of a fixed value); this may be the case for external databases derived from public sources. Here we may prefer to use the largest loss suffered in each year, and we could consider fitting a GEV distribution to the annual maximum losses, as we did in Section 4.
5.1 Losses over a threshold
In this section, we will illustrate a possible Bayesian approach with a simple example where we have external data over a known threshold. Suppose we have only a small number of observations (internal data) from the distribution we wish to fit. With only a small number of observations from a heavy-tailed distribution, a good fit based on the internal data alone is less likely. Suppose that, in addition, we have a large quantity of observations (external data) independently sampled from the tail of a distribution which we believe has the same shape parameter.
We will use external data to provide us with information on the shape parameter.
The main assumption being made is that the internal and external data reflect
observations from distributions that have the same shape parameter. We will assume
non-informative priors for the location and scale parameters.
A random variable X is said to have a generalized Pareto distribution (GPD) with shape parameter k, location parameter μ and scale parameter σ, if the distribution function is given by:

F(x | k, μ, σ) = 1 − [1 + k(x − μ)/σ]^{−1/k}   if k ≠ 0
F(x | k, μ, σ) = 1 − exp[−(x − μ)/σ]   if k = 0   (13)

where σ > 0 and, for k ≥ 0, x > μ, while for k < 0, μ < x < μ − σ/k. The extreme value index, a function of the power of the tail decay, is k; the GPD has a Pareto-type tail when k > 0.
Pickands (1975) proved that the distribution of exceedances above a threshold will tend to the GPD, as the threshold tends to infinity, provided the underlying distribution function belongs to the domain of attraction of the GEV distribution. The class of distributions belonging to this domain is fairly large, making the theorem quite widely applicable.
We will consider a couple of ways of using external data to determine a prior distribution for the shape parameter:

Method 1. We may fit the GPD to the external data, in order to determine the MLE of the shape parameter for the internal model. We can take the prior distribution to be normal, with mean given by the MLE of the shape parameter and standard deviation given by the estimated standard error of that estimate.

Method 2. Estimators of the extreme value index could be used to provide information on the shape parameter. Such estimators abound in the literature; they include those proposed by Pickands (1975), Hill (1975) and Huisman et al (2001). The estimates for the extreme value index over a range of threshold values may, for example, be used to generate the prior distribution for the shape parameter.
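As an illustration of method 2, a sketch of the Hill estimator (the Pareto test data below is an illustrative stand-in, not the paper's external data):

```python
import numpy as np

def hill_estimator(losses, n_exceed):
    # Hill's estimator of the extreme value index from the n_exceed largest
    # order statistics (appropriate for heavy tails, index > 0)
    x = np.sort(np.asarray(losses, dtype=float))[::-1]   # descending order
    k = n_exceed
    return np.mean(np.log(x[:k])) - np.log(x[k])

# Sanity check on exact Pareto data with index 0.5: P(X > x) = x^(-2), x >= 1
u = np.random.default_rng(0).uniform(size=100_000)
evi = hill_estimator((1.0 - u) ** (-0.5), 2000)
```

Repeating the computation over a range of `n_exceed` values produces the kind of threshold plot shown in Figure 6.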
5.2 Simulation study
A small simulation study was conducted to illustrate the use of external data in a
Bayesian approach to parameter estimation. For Y Gamma(, ), we will say that
X = exp(Y) is log gamma, with parameters , and . is the shape parameter
for the log gamma distribution, and 1/ is the extreme value index.
Our internal data consisted of 100 iid observations simulated from the
log gamma( =1.2, =2, =7) distribution. We tted the log gamma distribution
to this data by MLE. Figure 5 (see page 39) shows the histogram and empirical cdf
FIGURE 5 Histogram of our internal data and the pdf of the fitted log gamma distribution (left). Empirical cdf for internal data and the fitted cdf (right).
of the internal data, together with the fitted pdf and cdf. There was no statistically significant evidence, under the Kolmogorov-Smirnov goodness-of-fit test, that the data did not come from the fitted distribution. The MLEs were α = 1.11, δ = 1.70, μ = 7.00 (to three significant figures).
We then generated 10,000 iid observations from the tail (above 50,000) of the log gamma(1.8, 2, 7.5) distribution; this was our external data. We have assumed that the shape parameter, δ, is the same for the models generating the internal and external losses.
For method 1, we fitted the GPD to the exceedances over 50,000. The resulting MLE for the extreme value index was found to be 0.527, with an estimated standard error of 0.0154 (to three significant figures). We assumed that 1/δ had a normal prior distribution with mean 0.527 and standard deviation 0.0154, N(0.527, 0.0154²), with α and μ having non-informative prior distributions, N(0, 10^4).
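Method 1 can be sketched with scipy's GPD implementation; the simulated losses below are a hypothetical stand-in for the external data, `genpareto`'s shape parameter `c` plays the role of the extreme value index, and the standard error uses the standard asymptotic approximation (1 + k)/√n:

```python
import numpy as np
from scipy.stats import genpareto

# Simulated external losses above a known threshold u, with a Pareto-type
# tail whose extreme value index is 0.5, i.e. P(X > x) = (x/u)^(-2) for x > u
rng = np.random.default_rng(0)
u = 50_000.0
losses = u * rng.uniform(size=10_000) ** (-0.5)

# Fit the GPD to the exceedances with the location fixed at zero
c, loc, scale = genpareto.fit(losses - u, floc=0.0)
se = (1.0 + c) / np.sqrt(len(losses))   # asymptotic standard error of the shape MLE
```

The resulting `c` and `se` would then define the N(c, se²) prior for the shape, as in the text.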
For method 2, the results from the estimators of the extreme value index proposed by Pickands (1975), Hill (1975) and Huisman et al (2001) are shown in Figure 6. Taking the Huisman estimates, for example, we can form a non-parametric prior distribution for the extreme value index. α and μ will again be assumed to have non-informative prior distributions, N(0, 10^4).
Normal transition kernels were used, with initial parameter values of α_1 = 1, 1/δ_1 = 0.7, μ_1 = 1. Two thousand realizations of the MCMC simulation for methods 1 and 2 are shown in Figure 7. Discarding the first 500 MCMC realizations gives us the estimates for the log gamma parameters; the results are summarized in Table 3.
FIGURE 6 Estimates for the extreme value index derived from the Pickands, Hill and Huisman estimators, using external data above varying thresholds. The unbroken line shows the actual parameter value, and the dashed line shows the MLE from fitting internal data (left). The histogram of the Huisman estimates for thresholds resulting in 30–1,000 losses above the threshold (right).
TABLE 3 Comparison of the Bayesian estimates (to three significant figures) for the log gamma model parameters, for priors obtained using methods 1 and 2, with the estimates obtained through MLE.

Parameter        α     δ     μ     99.5% quantile
MLE              1.11  1.70  7.00  29,200
Prior method 1   1.26  1.89  6.99  24,800
Prior method 2   1.28  1.91  6.99  24,600
Actual value     1.2   2     7     19,600
The estimates for the 99.5% quantile are 24,800 and 24,600 (to three significant figures) for methods 1 and 2 respectively. The estimate from maximum likelihood fitting on the internal data alone is 29,200; the actual 99.5% quantile for the log gamma(1.2, 2, 7) distribution is 19,600 (to three significant figures).
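The quoted actual quantile can be checked numerically. Assuming the parameterization log X − μ ~ Gamma(shape α, rate δ) for the log gamma(α, δ, μ) model (an assumption on our part, but one that agrees with the table's actual value of 19,600 to within rounding), the 99.5% quantile follows from the gamma quantile function. The sketch below uses only the standard library:

```python
import math

def gamma_cdf(x, shape, rate=1.0):
    """Regularized lower incomplete gamma function P(shape, rate*x),
    computed by the standard series expansion."""
    t = x * rate
    term = 1.0 / shape
    total = term
    n = 0
    while term > total * 1e-14:
        n += 1
        term *= t / (shape + n)
        total += term
    return total * math.exp(shape * math.log(t) - t - math.lgamma(shape))

def gamma_ppf(p, shape, rate=1.0, lo=0.0, hi=100.0):
    """Gamma quantile by bisection on the CDF."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, rate) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def log_gamma_ppf(p, alpha, delta, mu):
    """Quantile of the log gamma(alpha, delta, mu) model, under the
    assumed parameterization log X - mu ~ Gamma(shape alpha, rate delta)."""
    return math.exp(mu + gamma_ppf(p, alpha, rate=delta))

print(log_gamma_ppf(0.995, 1.2, 2.0, 7.0))  # close to the quoted 19,600
```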
In this example, external data has helped us to fine-tune our model for internal losses. However, it should be noted that, in our example, the external data that we simulated actually had relevance for our internal model (external and internal shared
FIGURE 7 MCMC realizations of log gamma parameters for prior information from external data with method 1 (left) and method 2 (right): top panels, α; middle panels, δ; bottom panels, μ.
[Each panel plots the parameter value against iteration (0–2,000): α on the range 1.0–2.0, δ on 1.4–2.2, μ on 6.6–7.0.]
The unbroken line shows the actual parameter value; the dashed line shows the MLE from fitting internal data.
the same shape parameter). We have also assumed that we have relatively few internal losses but a large quantity of external losses in the tail of the distribution.
The model assumptions made here were known to be correct in the simulation world. If the assumptions are not correct, our estimates will be biased by our preconceptions or prior beliefs.
6 CONCLUSION
Bayesian analysis provides another way of viewing the problem of parameter estimation. Most importantly, it allows for the incorporation of prior opinion into the estimation, which some proponents view as its greatest strength (and some opponents view as its greatest weakness).
In this paper, we have considered how Bayesian inference may be used in the fitting of extreme value distributions to extreme losses. The prior specification may be based on expert opinion, external data sources or a combination of the two. In particular, Bayesian methods provide a framework for the incorporation of information derived from external and internal sources. Currently, the combination of data from various sources in operational risk has mostly involved pooling the data or creating a mixture model. For the former methodology, we would have to consider whether and how to scale the data; for the latter, we would need to decide on the mixture coefficient. Bayesian analysis provides another option, though choices still have to be made by the modeler.
The simulation study conducted here demonstrates how prior information may be used to affect the fitted distribution and that, where the prior specification provides relevant information about the distribution of the losses, the estimation of the model parameters can be improved. The development of simulation-based techniques, in particular MCMC, has overcome the difficulty of computing the posterior and has made Bayesian techniques very popular in many areas of application.
REFERENCES
Coles, S. G., and Tawn, J. A. (1996). A Bayesian analysis of extreme rainfall data. Applied Statistics 45(4), 463–478.
Embrechts, P., Klüppelberg, C., and Mikosch, T. (2003). Modelling Extremal Events for Insurance and Finance. Springer-Verlag, New York.
Fisher, R. A., and Tippett, L. H. C. (1928). Limiting forms of the frequency distribution of the largest or smallest member of a sample. Proceedings of the Cambridge Philosophical Society 24, 180–190.
Hill, B. M. (1975). A simple general approach to inference about the tail of a distribution. Annals of Statistics 3, 1163–1174.
Huisman, R., Koedijk, K. G., Kool, C. J. M., and Palm, F. (2001). Tail-index estimates in small samples. Journal of Business and Economic Statistics 19(1), 208–216.
Peters, G. W., and Sisson, S. A. (2006). Bayesian inference, Monte Carlo sampling and operational risk. The Journal of Operational Risk 1(3), 27–50.
Pickands, J. (1975). Statistical inference using extreme order statistics. Annals of Statistics 3, 119–131.
Shevchenko, P., and Wüthrich, M. (2006). The structural modelling of operational risk via Bayesian inference: combining loss data with expert opinions. The Journal of Operational Risk 1(3), 3–26.