
A Mixing Model For Operational Risk

Jim Gustafsson∗ Jens Perch Nielsen†

∗ RSA Scandinavia & University of Copenhagen | Gammel Kongevej 60 | DK-1790 Copenhagen V, Denmark | E-mail: jgu@codan.dk | Telephone: +45 33 555 23 57, Fax: +45 33 55 21 22
† Cass Business School | E-mail: Jens.Nielsen.1@city.ac.uk

30th June 2008

Abstract

External data can often be useful in improving estimation of operational risk loss distributions.

This paper develops a systematic approach that incorporates external information into internal

loss distribution modelling. The standard statistical model resembles Bayesian methodology or

credibility theory in the sense that prior knowledge (external data) has more weight when internal

data is scarce than when internal data is abundant.

Key words and phrases: Mixing data sources, Prior knowledge, Operational risk, Actuarial

loss models, Transformation, Generalized Champernowne distribution, Scarce data.

JEL Classification: C13, C14.

1 Introduction

Calculating loss distributions for operational risk using only internal data often fails to capture potential risks and unprecedentedly large loss amounts that have a huge impact on the capital. Conversely, calculating the loss distribution using only external data provides a capital figure that is not sensitive to internal losses, does not increase despite the occurrence of a large internal loss, and does not decrease despite the improvement of internal controls. The proposed model in this paper combines both internal and external data. We first go through an intuitive and simplified example of how prior



knowledge can be incorporated into modern estimation techniques. Consider the following simplified example. Let (Hi)1≤i≤n be iid stochastic variables with density h on [0, 1], and let hθ be another density on [0, 1] representing prior knowledge of the density h. We wish to estimate h guided by our prior knowledge density hθ. Let now gθ = h/hθ and let ĥθ = hθ · ĝθ be our estimator of h guided by prior knowledge. When hθ is close to h, gθ is close to the density of a Uniform distribution. To estimate gθ we take advantage of nonparametric smoothing techniques, and we consider the simplest smoothing technique available: kernel smoothing. Note that in the above estimation of h, we have changed the problem from estimating the completely unknown and complicated density h to estimating the simpler gθ, which is close to the density of a Uniform distribution when hθ represents good prior knowledge. Now let ĝθ be a kernel density estimator of gθ. The nature of kernel smoothing is such that it smooths a lot when data is sparse. In our case this means that ĥθ will be close to hθ for very small sample sizes. When data is more abundant, less kernel smoothing takes place and the more complicated features of gθ become visible. Therefore, kernel smoothing incorporating prior knowledge has many similarities to Bayesian methodology or credibility theory, which also rely on global information as prior knowledge when data is scarce and on more local estimation when data is abundant.

The purpose of this paper is to use prior knowledge from external data to improve our estimation of the internal data distribution. The underlying dynamics of the new mixing model transformation approach reflect the simple intuition in the illustration above on how to include prior knowledge for densities on [0, 1]. In our case we wish to estimate loss distributions on the entire positive real axis. This is done by estimating a distribution Fθ on the external data. We then transform the internal data (Hi)1≤i≤n with the estimated external cumulative distribution function, i.e. using Fθ̂(Hi). The transformed data set has support on [0, 1] and a density equal to the true internal density divided by the estimated external density, evaluated on the transformed scale. When the external data distribution is a close approximation to the internal data distribution, we have therefore simplified the problem to estimating something close to a Uniform distribution. When this estimation problem has been solved through kernel smoothing, we back-transform to the original axis and obtain our final mixing estimator.
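To make the mechanics concrete, the following sketch implements this transform-smooth-back-transform idea under simplifying assumptions: a lognormal model stands in for the external parametric distribution (the paper itself uses the generalized Champernowne distribution introduced in Section 3), and an off-the-shelf Gaussian kernel density estimate replaces the boundary-corrected kernel smoother of Sections 2 and 3. All function and variable names are ours, chosen for illustration only.

```python
import numpy as np
from scipy import stats

def mixing_density(x, internal, external, bandwidth=0.2):
    """Density estimate for internal losses, guided by external prior knowledge."""
    # 1) Prior knowledge: fit a parametric severity model to the external losses.
    shape, _, scale = stats.lognorm.fit(external, floc=0.0)
    prior = stats.lognorm(shape, loc=0.0, scale=scale)

    # 2) Transform the internal losses with the estimated external cdf;
    #    the transformed sample is roughly uniform on [0, 1] if the prior fits well.
    u = prior.cdf(internal)

    # 3) Kernel-smooth the transformed sample (the correction factor).
    correction = stats.gaussian_kde(u, bw_method=bandwidth)

    # 4) Back-transform: estimated internal density = prior density x correction.
    return prior.pdf(x) * correction(prior.cdf(x))

# Scarce internal data, abundant external data (both simulated here for illustration).
rng = np.random.default_rng(1)
internal = rng.lognormal(0.0, 1.2, size=30)
external = rng.lognormal(0.1, 1.0, size=5000)
grid = np.linspace(0.1, 10.0, 200)
density = mixing_density(grid, internal, external)
```

With only 30 internal points the kernel smoother flattens the correction factor, so the estimate stays close to the external fit; with many internal points the correction factor dominates and the estimate adapts to the internal data.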



The proposed mixing model is based on a semi-parametric estimator of the loss distribution. Klug-

man, Panjer and Willmot (1998), McNeil, Frey and Embrechts (2005), Cizek, Härdle and Weron
(2005) and Panjer (2006) are important introductions to actuarial estimation techniques of purely

parametric loss distributions. Cruz (2001) describes a number of parametric distributions which can
be used for modeling loss distributions. Embrechts, Klüppelberg and Mikosch (1999) and Embrechts

(2000) focus on the tail and Extreme Value Theory (EVT). Further, the recent paper by Degen, Em-
brechts and Lambrigger (2007) discusses some fundamental properties of the g-and-h distribution

and how it is linked to the well-documented EVT-based methodology. However, if one focuses on the excess function, the link to the Generalized Pareto Distribution (GPD) has an extremely slow convergence rate, and capital estimation for small levels of risk tolerance using EVT may lead to inaccurate results if the parametric g-and-h distribution is chosen as model. Diebold, Schuermann and Stroughair (2001) and Dutta and Perry (2006) also stress weaknesses of EVT when it comes to real data analysis. This problem may be addressed by non- and semi-parametric procedures. For a review of
modern kernel smoothing techniques see Wand and Jones (1995). During the last decade a class of

semi-parametric methods was developed and designed to work better than a purely non-parametric

approach, see Bolancé, Guillén and Nielsen (2003), Hjort and Glad (1995) and Clements, Hurn and
Lindsay (2003). They showed that non-parametric estimators can be substantially improved by a transformation step, and they offer several alternatives for the transformation itself.

The paper by Buch-Larsen, Bolancé, Guillén and Nielsen (2005) maps the original data into [0, 1] via the parametric start and corrects non-parametrically for possible misspecification. A flexible parametric start facilitates accurate estimation; Buch-Larsen et al. (2005) generalized the Champernowne distribution (GCD), see Champernowne (1936, 1952), and used its cumulative distribution function as the transformation function. In the spirit of Buch-Larsen et al. (2005) many extensions have been proposed. Gustafsson, Hagmann, Nielsen and Scaillet (2008) investigate the performance of symmetric versus asymmetric kernels on [0, 1], Gustafsson, Nielsen, Pritchard and Roberts (2006) estimate operational risk losses with a continuous credibility model, and Guillén, Gustafsson, Nielsen and Pritchard (2007), and the extended version by Buch-Kromann, Englund, Gustafsson, Nielsen and
Thuring (2007), introduce the concept of under-reporting in operational risk quantification. Bolancé, Guillén and Nielsen (2008) develop a transformation kernel density estimator that uses a double transformation to estimate heavy-tailed distributions. Before describing the new method we mention

other recent methods in the literature. The papers by Shevchenko and Wüthrich (2006), Bühlmann,
Shevchenko and Wüthrich (2007) and Lambrigger, Shevchenko and Wüthrich (2007) combine loss

data with scenario analysis information via Bayesian inference for the assessment of operational risk,
and Verrall, Cowell and Khoon (2007) examine Bayesian networks for operational risk assessment.

Figini, Giudici, Uberti and Sanyal (2008) develop a method that estimates the internal data distribution with help from truncated external data originating from exactly the same underlying distribution. In our approach we allow the external distribution to differ from the internal distribution; the transformation approach has the consequence that the internal estimation process is guided by the more stable estimator originating from the external data. If the underlying distribution of the external data is close to the underlying distribution of the internal data, this approach improves the efficiency of estimation. We also present a variation of this approach where we correct for the estimated

median, such that only the shape of the underlying internal data has to be close to the shape of the

underlying external data for our procedure to work well.

2 The asymptotic properties of the transformation approach

Let (Xi)1≤i≤n be a sequence of iid random losses from a probability distribution F(x) with unknown density function f(x). Let T(x) be a twice continuously differentiable transformation whose first derivative T'(x) serves as a global start. This density is assumed to provide a meaningful but potentially inaccurate description of f(x). The transformation function T(x) could be a cumulative distribution function, or it could come from a non-parametric class of functions, see e.g. Wand et al. (1991) and Buch-Larsen et al. (2005). The transformed density estimator with a local constant boundary

correction can be written as
\[
\hat{f}(x) \;=\; T'(x)\,\alpha_{01}\bigl(T(x),h\bigr)^{-1}\, n^{-1}\sum_{i=1}^{n} K_h\bigl(T(X_i)-T(x)\bigr) \;=\; T'(x)\, l\bigl(T(x)\bigr), \tag{2.1}
\]

with local model l(·), transformed losses T(Xi), bandwidth h and a symmetric kernel Kh with Kh(·) = (1/h)K(·/h), and

\[
\alpha_{ij}(u,h) \;=\; \int_{\max\{-1,\,(u-1)/h\}}^{\min\{1,\,u/h\}} (vh)^{i}\, K(v)^{j}\, dv. \tag{2.2}
\]

A deeper analysis of the transformation process can be found in e.g. Gustafsson et al. (2008). The asymptotic theory of (2.1) is presented in Theorem 1, and the proof can be found in Buch-Larsen et al. (2005).

Theorem 1 Let the transformation function T(x) be a known, twice differentiable function. The bias of f̂(x) is given by

\[
E\bigl\{\hat{f}(x)\bigr\} - f(x) \;=\; 2^{-1}\,\alpha_{21}\bigl(T(x),h\bigr)\,\beta\bigl(T(x)\bigr) + o(h^{2}),
\]

with

\[
\beta\bigl(T(x)\bigr) \;=\; \left[\left\{\frac{f(x)}{T'(x)}\right\}'\frac{1}{T'(x)}\right]',
\]

and the variance

\[
V\bigl\{\hat{f}(x)\bigr\} \;=\; \alpha_{02}\bigl(T(x),h\bigr)\,(nh)^{-1} f(x)\,T'(x) + o\bigl((nh)^{-1}\bigr),
\]

where the asymptotics are given for n → ∞, h → 0 and nh → ∞.
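A direct numerical transcription of (2.1)-(2.2) may help fix ideas. The sketch below is not the authors' code: it assumes an Epanechnikov kernel on [-1, 1] (any symmetric kernel with this support matches the integration limits in (2.2)) and takes the transformation T and its derivative Tprime as user-supplied, vectorized callables.

```python
import numpy as np
from scipy.integrate import quad

def epanechnikov(v):
    """Symmetric kernel K with support [-1, 1]."""
    v = np.asarray(v, dtype=float)
    return np.where(np.abs(v) <= 1.0, 0.75 * (1.0 - v**2), 0.0)

def alpha(i, j, u, h):
    """Boundary functional alpha_ij(u, h) of equation (2.2)."""
    lo, hi = max(-1.0, (u - 1.0) / h), min(1.0, u / h)
    value, _ = quad(lambda v: (v * h)**i * float(epanechnikov(v))**j, lo, hi)
    return value

def transformation_kde(x, data, T, Tprime, h):
    """Local constant transformation estimator (2.1) evaluated at a point x."""
    u = T(x)
    kernel_average = np.mean(epanechnikov((T(data) - u) / h) / h)  # n^-1 sum K_h(T(X_i) - T(x))
    return Tprime(x) * kernel_average / alpha(0, 1, u, h)
```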

3 The transformation technique

In the previous section a general description of the transformation approach and its asymptotic prop-

erties was given. In this section we assume that the transformation function is a cumulative distribution function T(x, θ) with density T'(x, θ), indexed by the global parameter θ = {θ1, θ2, ..., θp} ∈ Θ ⊆ R^p. As before, let (Xi)1≤i≤n be a sequence of random losses originating from a
probability distribution F(x) with unknown density f(x). Then equation (2.1) can be rewritten as
\[
\hat{f}(x,\theta) \;=\; T'(x,\theta)\,\alpha_{01}\bigl(T(x,\theta),h\bigr)^{-1}\, n^{-1}\sum_{i=1}^{n} K_h\bigl(T(X_i,\theta)-T(x,\theta)\bigr) \;=\; T'(x,\theta)\, l\bigl(T(x,\theta)\bigr). \tag{3.1}
\]

Now let θ̂ be a suitable estimator of θ; then the estimator (3.1) becomes
\[
\hat{f}(x,\hat\theta) \;=\; T'(x,\hat\theta)\,\alpha_{01}\bigl(T(x,\hat\theta),h\bigr)^{-1}\, n^{-1}\sum_{i=1}^{n} K_h\bigl(T(X_i,\hat\theta)-T(x,\hat\theta)\bigr) \;=\; T'(x,\hat\theta)\, l\bigl(T(x,\hat\theta)\bigr). \tag{3.2}
\]

From e.g. Buch-Larsen et al. (2005) we know that when θ̂ is a square-root-n-consistent estimator of θ, the asymptotic theory of f̂(x, θ̂) is equivalent to the asymptotic theory of f̂(x, θ). The asymptotic theory of f̂(x, θ) follows immediately from Theorem 1, and

\[
\sqrt{nh}\,\Bigl\{\hat{f}\bigl(x,\hat\theta\bigr) - E\hat{f}\bigl(x,\hat\theta\bigr)\Bigr\} \;\sim\; N\!\left(0,\;\frac{\alpha_{02}\bigl(T(x),h\bigr)\, f(x)\, T'(x)}{nh}\right)
\]

holds.

Let now (Xi)1≤i≤n be a sequence of collected internal losses and let (Yj)1≤j≤m be a sequence of externally reported losses. Then (Yj)1≤j≤m represents prior knowledge that should enrich a limited internal data set. The parametric density estimator based on the external data set offers prior knowledge from the general industry. The correction of this prior knowledge is based on a local kernel smoother of the transformed internal data points. This provides us with a consistent and fully nonparametric estimator of the density of the internal data. The approach is therefore fully nonparametric at the same time as it is informed by prior information based on external data sources. The new mixing local transformation kernel density estimator has the same asymptotics as (3.1), given in Theorem 1. Hence, if T'(x, θ̂y) is a misspecified approximation of f(x), θ̂y converges in probability to the pseudo-true value θ which minimizes the Kullback-Leibler distance¹ between T'(x, θ) and the true density f(x). Therefore θ̂y is a square-root-n-consistent estimator of θ. The proposed mixing model can be
expressed as
\[
\hat{f}(x,\hat\theta^{y}) \;=\; T'(x,\hat\theta^{y})\,\alpha_{01}\bigl(T(x,\hat\theta^{y}),h\bigr)^{-1}\, n^{-1}\sum_{i=1}^{n} K_h\bigl(T(X_i,\hat\theta^{y})-T(x,\hat\theta^{y})\bigr) \;=\; T'(x,\hat\theta^{y})\, l\bigl(T(x,\hat\theta^{y})\bigr) \tag{3.3}
\]

and has the same asymptotic theory as (3.1) for n → ∞, h → 0 and nh → ∞.

The global start T'(x, ·) in (3.2) and (3.3) is modelled by the Generalized Champernowne Distribution (GCD) with density

\[
T'(x,\cdot) \;=\; \frac{\alpha\,(x+c)^{\alpha-1}\bigl((M+c)^{\alpha}-c^{\alpha}\bigr)}{\bigl((x+c)^{\alpha}+(M+c)^{\alpha}-2\,c^{\alpha}\bigr)^{2}}, \qquad x \in \mathbb{R}_{+}, \tag{3.4}
\]

incorporating internal parameters θ = {α, M, c} and external parameters θy = {αy, My, cy}. Here α is a tail parameter, M controls the body of the distribution and c is a shift parameter that controls the behaviour close to zero. The parameter vectors θ and θy are estimated by maximum likelihood; for a deeper analysis of the GCD, see Buch-Larsen et al. (2005).
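As an illustration of (3.4) and of the maximum likelihood step, one possible implementation is sketched below; the helper names, optimiser and starting values are our own pragmatic choices, not those of the paper. Fitting the external sample gives θ̂y and fitting the internal sample gives θ̂; plugging gcd_cdf and gcd_pdf with the external estimates into a transformation estimator such as the transformation_kde sketch above then yields the mixing estimator (3.3).

```python
import numpy as np
from scipy.optimize import minimize

def gcd_pdf(x, alpha, M, c):
    """Generalized Champernowne density (3.4)."""
    num = alpha * (x + c)**(alpha - 1.0) * ((M + c)**alpha - c**alpha)
    den = ((x + c)**alpha + (M + c)**alpha - 2.0 * c**alpha)**2
    return num / den

def gcd_cdf(x, alpha, M, c):
    """Corresponding cdf; equals 1/2 at x = M, so M is the median."""
    return ((x + c)**alpha - c**alpha) / (
        (x + c)**alpha + (M + c)**alpha - 2.0 * c**alpha)

def fit_gcd(losses):
    """Maximum likelihood estimates of theta = (alpha, M, c) for positive losses."""
    def negloglik(par):
        alpha, M, c = par
        if alpha <= 0.0 or M <= 0.0 or c < 0.0:
            return np.inf
        return -np.sum(np.log(gcd_pdf(losses, alpha, M, c)))
    start = np.array([1.0, np.median(losses), 0.1])
    return minimize(negloglik, start, method="Nelder-Mead").x
```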

Another, slightly different mixing model will also be included in the paper. This estimator follows the same idea as in Gustafsson et al. (2005). The model is the same as (3.3), but instead of using the prior knowledge parameter vector θ̂y, this generalized model transforms the internal losses with a combined parameter vector θ̂ȳ = {α̂y, M̂, ĉy}. The difference is that prior knowledge is provided only by the tail parameter α̂y and the shift parameter ĉy, while the median is represented by the internally estimated parameter M̂. With this model we combine the external parameters α̂y and ĉy with a body described internally by M̂.
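As a short continuation of the hypothetical sketches above (the arrays internal_losses and external_losses are placeholders), the combined parameter vector of this second mixing model is simply assembled from the two separate fits:

```python
# Placeholder arrays; continues the fit_gcd sketch above.
alpha_y, M_y, c_y = fit_gcd(external_losses)   # prior knowledge from external data
alpha_i, M_i, c_i = fit_gcd(internal_losses)   # internal fit
theta_bar = (alpha_y, M_i, c_y)                # external tail and shift, internal median
```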


¹ The Kullback-Leibler distance is a distance function from a true probability density, f(x), to a target probability estimator, T'(x, θ̂y), and is defined as \(\int_0^{\infty} f(w)\,\log\bigl\{ f(w)/T'(w,\hat\theta^{y}) \bigr\}\, dw\).

4 Data study

This section demonstrates the practical relevance of the mixing model presented above. We estimate a loss distribution and calculate next year's operational risk exposure using the common risk measures Value-at-Risk (VaR) and Tail-Value-at-Risk (TVaR) for different levels of risk tolerance. For the severity estimator we employ model (3.2) and model (3.3), and benchmark against the parametric Weibull distribution. For the frequency model we assume that N(t) is an independent homogeneous Poisson process, N(t) ∼ Po(λt), with positive intensity λ. Using Monte Carlo simulation with these severity and frequency assumptions we can create a simulated one-year operational risk loss distribution.

We denote the internal collected losses from the event risk category Employment Practices and Workplace Safety by (Xi)1≤i≤N(t), where N(t) describes the random number of losses over a fixed time period t = 0, ..., T and N(0) = 0. Further, (Yj)1≤j≤M(t) represents external data from the same event risk category, with random number of losses M(t) and M(0) = 0. Table 1 reports summary statistics for each data set.

TABLE 1

Statistics for Event Risk Category Employment Practices and Workplace Safety

Number of Maximum Sample Sample Standard Time

Losses Loss ($M) Mean ($M) Median ($M) Deviation ($M) Horizon (T )

Internal data 120 16.80 1.86 0.30 3.66 2

External data 6526 561.43 1.82 0.18 13.41 8

The mean and median are similar for the two data sets. However, the number of losses, the maximum loss, the standard deviation and the collection period differ widely. We condition on N = n and sample the Poisson process of internal events that occur over a one-year time horizon. With n = 120 internal losses the maximum likelihood estimator of the annual intensity is λ̂ = n/T = 120/2 = 60. We denote the simulated annual frequencies by λ̂r, r = 1, ..., R, with the number of simulations R = 10,000. For each λ̂r we draw uniformly distributed random samples and combine these with

loss sizes taken from the inverse function of the severity distribution. The outcome is the annual total

loss distribution denoted by


\[
S_r \;=\; \sum_{k=1}^{\hat\lambda_r} \hat{F}^{\leftarrow}(u_{rk},\cdot), \qquad r = 1,\ldots,R,
\]

with u_{rk} ∼ U(0, 1) for k = 1, ..., λ̂r, and

\[
\hat{F}(u_{rk},\cdot) \;=\; \int_0^{u_{rk}} \hat{f}(\xi,\cdot)\, d\xi,
\]

where · incorporates θ̂ for model (3.2) or the prior knowledge parameters θ̂y and θ̂ȳ for (3.3). For simplicity, we introduce the abbreviations M1 for the semi-parametric model (3.2), and M2 and M3 for (3.3) with θ̂y and θ̂ȳ, respectively. M4 is the purely external data version of (3.2), where the prior knowledge is locally corrected with external data. This corresponds to the situation where a company has not started to collect internal data and must rely entirely on prior knowledge. Finally, M5 is the benchmark model with a Weibull assumption on the severity. A loss distribution can then be calculated for relevant return periods, thereby identifying the capital to be held to protect against unexpected losses. The two risk measures used are the VaR

\[
\mathrm{VaR}_{\alpha}(S_r) \;=\; \sup\bigl\{\, s \in \mathbb{R} \;\big|\; P(S_r \le s) \le \alpha \,\bigr\}
\]

and the TVaR, which gives the expected loss beyond the VaR and is defined as

\[
\mathrm{TVaR}_{\alpha}(S_r) \;=\; E\bigl[\, S_r \mid S_r \ge \mathrm{VaR}_{\alpha}(S_r)\,\bigr]
\]

for risk tolerance α. Table 2 collects summary statistics for the simulated total loss distribution across
each model. Among the usual summary statistics we report VaR and TVaR for risk tolerance α =

{0.95, 0.99, 0.999}.
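The following sketch shows one way to carry out the Monte Carlo aggregation and to read off VaR and TVaR; the severity quantile function F_inv is assumed given (for instance a numerically inverted fitted cdf), and the Weibull parameters in the example are purely illustrative, not those of the paper.

```python
import numpy as np
from scipy import stats

def simulate_annual_loss(F_inv, lam, R=10_000, seed=0):
    """Simulated annual totals S_r = sum_{k=1}^{N_r} F_inv(u_rk), r = 1..R."""
    rng = np.random.default_rng(seed)
    totals = np.empty(R)
    for r in range(R):
        n_r = rng.poisson(lam)          # annual frequency, Po(lambda)
        u = rng.uniform(size=n_r)       # u_rk ~ U(0, 1)
        totals[r] = F_inv(u).sum()      # severities via the inverse cdf
    return totals

def var_tvar(totals, alpha):
    var = np.quantile(totals, alpha)
    tvar = totals[totals >= var].mean()  # E[S | S >= VaR_alpha(S)]
    return var, tvar

# Example with an illustrative Weibull severity (cf. benchmark M5) and lambda = 60.
severity = stats.weibull_min(c=0.6, scale=1.0)
S = simulate_annual_loss(severity.ppf, lam=60)
print(var_tvar(S, 0.99), var_tvar(S, 0.999))
```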

TABLE 2

Statistics of Simulated Loss Distributions for Event Risk Category Employment Practices and Workplace Safety (∼ $M)

Model Mean Median Sd VaR95% VaR99% VaR99.9% TVaR95% TVaR99% TVaR99.9%

M1 226 178 149 547 765 980 691 889 1180

M2 244 186 167 609 881 1227 787 1025 1385

M3 296 213 241 633 936 1331 820 1143 1487

M4 316 274 322 801 1165 1644 1073 1388 1858

M5 100 97 27 249 272 294 363 383 405

One can see from Table 2 that all loss distributions are right-skewed, since every mean is larger than its respective median. Interpreting the VaR results, one notices that the fully prior knowledge model M4 shows much larger values than the others. The benchmark model M5 predicts the lowest values of the considered models; compared with the purely internal estimator M1, model M5 suggests a capital requirement of only 30%-40% of the amount M1 recommends. The mixing model M2 has an 8% higher sample mean and 10%-25% higher VaR values than M1, which is due to the prior knowledge based on external data. The TVaR results show a similar pattern to the VaR results. A conclusion is that the two mixing models, M2 and M3, are able to stabilize the estimation process and do not underestimate the tail as model M5 apparently does.

5 Data Driven Simulation Study

This section provides a data-driven simulation study in which we evaluate scenarios with different prior knowledge for the new mixing model. The study confirms the intuitive expectation that the mixing model outperforms model (3.2) in situations with scarce data. As data becomes more abundant, the advantage disappears and the mixing model can even hurt performance slightly. Another insight is that the parametric Weibull distribution provides a doubtful goodness-of-fit to heavy-tailed data. The study employs two different true distributions with unlike characteristics; the idea of a data-driven simulation study, in contrast to a predetermined one, is that the two true distributions are estimated on the real operational risk samples described in the previous section. The two distributions used in the study are presented by their respective densities below.

1. Lognormal,
\[
f(x,\xi_1) \;=\; \frac{1}{x\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{\log(x)-\mu}{\sigma}\right)^{2}}, \qquad x \in \mathbb{R}_{+},
\]
with location and scale parameters ξ1 = {μ, σ}.

2. Generalized Champernowne Distribution (GCD),
\[
f(x,\xi_2) \;=\; \frac{\alpha\,(x+c)^{\alpha-1}\bigl((M+c)^{\alpha}-c^{\alpha}\bigr)}{\bigl((x+c)^{\alpha}+(M+c)^{\alpha}-2\,c^{\alpha}\bigr)^{2}}, \qquad x \in \mathbb{R}_{+},
\]
with ξ2 = {α, M, c}.

The study was also carried out with the Exponential, Weibull, Gamma and Pareto distributions. The outcomes were similar to those for the lognormal and the GCD; they are not reported in the paper but are available upon request. The two true distributions are estimated on the two available data sets by maximum likelihood, producing the estimated parameter vectors ξ̂d and ξ̂dy with d = 1, 2. Random losses are drawn from the estimated true models to obtain pseudo data. The internal data are denoted (X^b_{di})1≤i≤nk, where d indicates the distribution, b = 1, ..., B is the iteration with B = 2000, and the sample sizes simulated for next year's exposure N(t) = nk are chosen as nk = {10, 20, 50, 100, 200, 500, 1000, 2000, 4000}. The external samples are denoted by (Y^b_{dj})1≤j≤ml with M(t) = ml, where ml = {100, 5000}. For each data set the internal and external global start, T'(x, ·) in (3.2) and (3.3), is assumed to be described by the GCD given in (3.4). Model (3.2) is then estimated for each sample combination {b, d, nk}, and for each sample nk we estimate the mixing model (3.3) for each external data combination {b, d, ml}.

In summary, for each internal sample size nk six estimators are compared: the purely internal estimator (3.2) and four scenarios of the mixing model (3.3), informed by different external sample sizes ml and different global parametric starts. Also here we benchmark with a parametric Weibull distribution. The performance measures considered are means over replications of the integrated squared error (ISE), which measures the error between the true and the estimated density with equal weight across the support, and the weighted integrated squared error (WISE), which focuses on the fit in the tails. We
can write the statistical performance measures as
 1/2 
B
X Z∞ ³ ´2
1  
ISE =  fˆ(x, ·) − f (x, ξˆd ) dx 
B b=1 0
 1/2 
B
X Z∞ ³ ´2
1  
WISE =  fˆ(x, ·) − f (x, ξˆd ) x2 dx 
B b=1 0

with true density f (x, ξd ), and the estimator fˆ (x, ·) as one of the six different models. Note that the

true model is only enriched with the internal data.
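A numerical version of these two criteria could look as follows; the finite upper integration limit and the grid are our own simplification of the integral over the positive axis, and f_hat and f_true are assumed to be vectorized density functions.

```python
import numpy as np

def ise_wise(f_hat, f_true, x_max=200.0, n_grid=5000):
    """Single-replication ISE and WISE; the study averages these over B = 2000 runs."""
    x = np.linspace(1e-6, x_max, n_grid)
    sq_err = (f_hat(x) - f_true(x))**2
    ise = np.sqrt(np.trapz(sq_err, x))           # equal weight over the support
    wise = np.sqrt(np.trapz(sq_err * x**2, x))   # x^2 weight emphasises the tail
    return ise, wise
```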

Figure 1 presents the results for the true lognormal distribution. We report relative errors: the ISE and WISE of each proposed model are divided by those of the pure internal estimator, so a value above one means that the mixing (or benchmark) model performs worse at that sample size.

*** Figure 1 About Here ***

The top left-hand graph fluctuates around one; the mixing model therefore has an ISE similar to that of the pure internal estimator. For some internal sample sizes the pure internal estimator (3.2) gives less than half a percent better results, and of the compared mixing models the one enriched with 5000 external observations, f̂(x, θ̂y)5000, performs best at the largest internal sample sizes. The top right-hand graph presents the benchmark model performance: the parametric Weibull goes from 20% worse to above 30% worse as the internal sample size increases. For the tail measure WISE we note that above n4 the mixing models stabilize at a performance that is 1%-2% worse. Note that the breakpoint is higher when one focuses on the extremes than when the whole axis is considered, as in ISE. The bottom right-hand graph presents the benchmark performance; the Weibull clearly predicts the true underlying density of the internal data poorly, culminating in 60% poorer results at sample size n9 = 4000.

In the situation with the GCD, almost the same pattern as in the lognormal case appears. Figure 2 gives a visual presentation of the results.

*** Figure 2 About Here ***

For the ISE the breakpoint has increased to around n5 compared with the lognormal case; thereafter the mixing models stabilize in the region of 5%-10% poorer prediction power. Focusing on WISE, we see that the breakpoint lies between 100 and 500 internal observations.

6 Conclusions

This paper gives an intuitive and easily implemented model that combines different data sources. From the data study we found that a light-tailed parametric fit leads to low Value-at-Risk figures. Also, the mixing methods seem to give more stable and more realistic results than parametric methods that do not include external data. From the data-driven simulation study we conclude that, in the situation with scarce internal data, the proposed mixing model outperforms the model with only one data source. However, as the internal data becomes more abundant, a breakpoint can be identified where (3.2) becomes better than (3.3). We also find that the breakpoints occur earlier for light-tailed distributions than for heavy-tailed ones. Large data sets will often do better without prior knowledge, in the sense that the transformation is better estimated from pure internal data; however, when the breakpoint is reached practitioners need to analyse the data carefully to find out whether the internal data alone include large losses that they believe cover the whole company's operational risk exposure. Otherwise the mixing model should be preferred, since it automatically enriches the internal exposure with large external losses.

References

[1] Bolancé, C., Guillén, M. and Nielsen, J.P. (2003). Kernel density estimation of actuarial loss functions. Insurance: Mathematics and Economics, Vol. 32, 19-36.

[2] Bolancé, C., Guillén, M. and Nielsen, J.P. (2008). Inverse Beta transformation in kernel density estimation. Statistics & Probability Letters (to appear).

[3] Buch-Larsen, T., Nielsen, J.P., Guillén, M. and Bolancé, C. (2005). Kernel density estimation for heavy-tailed distributions using the Champernowne transformation. Statistics, Vol. 39, No. 6, 503-518.

[4] Buch-Kromann, T., Englund, M., Gustafsson, J., Nielsen, J.P. and Thuring, F. (2007). Non-parametric estimation of operational risk losses adjusted for under-reporting. Scandinavian Actuarial Journal, Vol. 4, 293-304.

[5] Bühlmann, H., Shevchenko, P.V. and Wüthrich, M.V. (2007). A "toy" model for operational risk quantification using credibility theory. Journal of Operational Risk, Vol. 2, No. 1, 3-20.

[6] Champernowne, D.G. (1936). The Oxford meeting, September 25-29, 1936. Econometrica, Vol. 5, No. 4, October 1937.

[7] Champernowne, D.G. (1952). The graduation of income distributions. Econometrica, Vol. 20, 591-615.

[8] Cizek, P., Härdle, W. and Weron, R. (2005). Statistical Tools for Finance and Insurance. Springer-Verlag, Berlin Heidelberg.

[9] Clements, A.E., Hurn, A.S. and Lindsay, K.A. (2003). Möbius-like mappings and their use in kernel density estimation. Journal of the American Statistical Association, Vol. 98, 993-1000.

[10] Cruz, M.G. (2001). Modeling, Measuring and Hedging Operational Risk. John Wiley & Sons, Ltd.

[11] Degen, M., Embrechts, P. and Lambrigger, D.D. (2007). The Quantitative Modeling of Operational Risk: Between G-and-H and EVT. Astin Bulletin, Vol. 37, No. 2, 265-292.

[12] Diebold, F., Schuermann, T. and Stroughair, J. (2001). Pitfalls and opportunities in the use of extreme value theory in risk management. In: Refenes, A.-P., Moody, J. and Burgess, A. (Eds.), Advances in Computational Finance, Kluwer Academic Press, Amsterdam, pp. 3-12. Reprinted from: Journal of Risk Finance, Vol. 1, 30-36 (Winter 2000).

[13] Dutta, K. and Perry, J. (2006). A tale of tails: an empirical analysis of loss distribution models for estimating operational risk capital. Federal Reserve Bank of Boston, Working Paper No. 06-13.

[14] Embrechts, P., Klüppelberg, C. and Mikosch, T. (1999). Modeling Extremal Events for Insurance and Finance. Springer.

[15] Embrechts, P. (2000). Extremes and Integrated Risk Management. London: Risk Books, Risk Waters Group.

[16] Figini, S., Giudici, P., Uberti, P. and Sanyal, A. (2008). A statistical method to optimize the combination of internal and external data in operational risk measurement. Journal of Operational Risk, Vol. 2, No. 4, 69-78.

[17] Guillén, M., Gustafsson, J., Nielsen, J.P. and Pritchard, P. (2007). Using external data in operational risk. Geneva Papers, Vol. 32, 178-189.

[18] Gustafsson, J., Nielsen, J.P., Pritchard, P. and Roberts, D. (2006). Quantifying Operational Risk Guided by Kernel Smoothing and Continuous Credibility: A Practitioner's View. Journal of Operational Risk, Vol. 1, No. 1, 43-56.

[19] Gustafsson, J., Hagmann, M., Nielsen, J.P. and Scaillet, O. (2008). Local Transformation Kernel Density Estimation of Loss Distributions. Journal of Business and Economic Statistics (to appear).

[20] Hjort, N.L. and Glad, I.K. (1995). Nonparametric Density Estimation with a Parametric Start. Annals of Statistics, Vol. 24, 882-904.

[21] Klugman, S.A., Panjer, H.H. and Willmot, G.E. (1998). Loss Models: From Data to Decisions. New York: John Wiley & Sons, Inc.

[22] Lambrigger, D.D., Shevchenko, P.V. and Wüthrich, M.V. (2007). The quantification of operational risk using internal data, relevant external data and expert opinion. Journal of Operational Risk, Vol. 2, No. 3, 3-27.

[23] McNeil, A.J., Frey, R. and Embrechts, P. (2005). Quantitative Risk Management. Princeton Series in Finance.

[24] Panjer, H.H. (2006). Operational Risk: Modeling Analytics. New York: John Wiley & Sons, Inc.

[25] Shevchenko, P.V. and Wüthrich, M.V. (2006). The structural modeling of operational risk via Bayesian inference. Journal of Operational Risk, Vol. 1, No. 3, 3-26.

[26] Verrall, R.J., Cowell, R. and Khoon, Y.Y. (2007). Modelling Operational Risk with Bayesian Networks. Journal of Risk and Insurance, Vol. 74, No. 4, 795-827.

[27] Wand, M.P., Marron, J.S. and Ruppert, D. (1991). Transformation in Density Estimation (with comments). Journal of the American Statistical Association, Vol. 94, 1231-1241.

[28] Wand, M.P. and Jones, M.C. (1995). Kernel Smoothing. Chapman & Hall.

[Figure 1 consists of four panels, all titled "Lognormal Distribution": relative ISE (top row) and relative WISE (bottom row) plotted against the internal sample size (10 to 4000). The left-hand panels compare the mixing models f̂(x, θ̂y)100 and f̂(x, θ̂y)5000; the right-hand panels show the benchmark Weibull.]

Figure 1: The relative error between model (3.2) and mixing model (3.3) estimated on data-driven lognormal data. The internal data used is in the range [10, 4000] and the mixing model (3.3) is enriched by the external sample sizes m = 100 and m = 5000.

[Figure 2 consists of four panels, all titled "GCD": relative ISE (top row) and relative WISE (bottom row) plotted against the internal sample size (10 to 4000). The left-hand panels compare the mixing models f̂(x, θ̂y)100 and f̂(x, θ̂y)5000; the right-hand panels show the benchmark Weibull.]

Figure 2: The relative error between model (3.2) and mixing model (3.3) estimated on data-driven GCD data. The internal data used is in the range [10, 4000] and the mixing model (3.3) is enriched by the external sample sizes m = 100 and m = 5000.

