
A Mixing Model For Operational Risk

Jim Gustafsson∗ Jens Perch Nielsen†

∗ RSA Scandinavia & University of Copenhagen | Gammel Kongevej 60 | DK-1790 Copenhagen V, Denmark | E-mail: jgu@codan.dk | Telephone: +45 33 555 23 57, Fax: +45 33 55 21 22
† Cass Business School | E-mail: Jens.Nielsen.1@city.ac.uk

30th June 2008

Abstract

External data can often be useful in improving estimation of operational risk loss distributions.

This paper develops a systematic approach that incorporates external information into internal

loss distribution modelling. The standard statistical model resembles Bayesian methodology or

credibility theory in the sense that prior knowledge (external data) has more weight when internal

data is scarce than when internal data is abundant.

Key words and phrases: Mixing data sources, Prior knowledge, Operational risk, Actuarial

loss models, Transformation, Generalized Champernowne distribution, Scarce data.

JEL Classification: C13, C14.

1 Introduction

Calculating loss distributions for operational risk using only internal data often fails to capture potential risks and unprecedentedly large loss amounts that have a huge impact on the capital. Conversely, calculating the loss distribution using only external data provides a capital figure that is not sensitive to internal losses, does not increase despite the occurrence of a large internal loss, and does not decrease despite the improvement of internal controls. The proposed model in this paper combines both internal and external data. We first go through an intuitive and simplified example of how prior



knowledge can be incorporated into modern estimation techniques. Consider the following simplified example. Let (Hi)1≤i≤n be iid stochastic variables with density h on [0, 1], and let hθ be another density on [0, 1] representing prior knowledge of the density h. We wish to estimate h guided by our prior knowledge density hθ. Let now gθ = h/hθ and let ĥθ = hθ · ĝθ be our estimator of h guided by prior knowledge. When hθ is close to h, gθ is close to the density of a Uniform distribution. To estimate gθ we take advantage of nonparametric smoothing techniques, and we consider the simplest smoothing technique available: kernel smoothing. Note that in the above estimation of h, we have changed the problem from estimating the completely unknown and complicated density h to estimating the simpler gθ, which is close to the density of a Uniform distribution when hθ represents good prior knowledge. Now let ĝθ be a kernel density estimator of gθ. The nature of kernel smoothing is such that it smooths a lot when data is sparse. In our case this means that ĥθ will be close to hθ for very small sample sizes. When data is more abundant, less kernel smoothing takes place and the more complicated features of gθ become visible. Therefore, kernel smoothing incorporating prior knowledge has many similarities to Bayesian methodology or credibility theory, which also rely on global information as prior knowledge when data is scarce and on more local estimation when data is abundant.

The purpose of this paper is to use prior knowledge from external data to improve our estimation of the internal data distribution. The underlying dynamics of the new mixing model transformation approach reflect the simple intuition in the illustration above on how to include prior knowledge for densities on [0, 1]. In our case we wish to estimate loss distributions on the entire positive real axis. This is done by estimating a distribution Fθ on the external data. We then transform the internal data (Hi)1≤i≤n with the estimated external cumulative distribution function, i.e. using Fθ̂(Hi). The transformed data set has support on [0, 1] and a density equal to the true internal density divided by the estimated external density, evaluated on the transformed scale. When the external data distribution is a close approximation to the internal data distribution, we have therefore simplified the problem to estimating something close to a Uniform distribution. When this estimation problem has been solved through kernel smoothing, we back-transform to the original axis and obtain our final mixing estimator.
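To make the mechanics concrete, the following sketch implements this transform-smooth-back-transform idea under simplifying assumptions: a lognormal model stands in for the external parametric distribution (the paper itself uses the generalized Champernowne distribution introduced in Section 3), and an off-the-shelf Gaussian kernel density estimate replaces the boundary-corrected kernel smoother of Sections 2 and 3. All function and variable names are ours, chosen for illustration only.

```python
import numpy as np
from scipy import stats

def mixing_density(x, internal, external, bandwidth=0.2):
    """Density estimate for internal losses, guided by external prior knowledge."""
    # 1) Prior knowledge: fit a parametric severity model to the external losses.
    shape, _, scale = stats.lognorm.fit(external, floc=0.0)
    prior = stats.lognorm(shape, loc=0.0, scale=scale)

    # 2) Transform the internal losses with the estimated external cdf;
    #    the transformed sample is roughly uniform on [0, 1] if the prior fits well.
    u = prior.cdf(internal)

    # 3) Kernel-smooth the transformed sample (the correction factor).
    correction = stats.gaussian_kde(u, bw_method=bandwidth)

    # 4) Back-transform: estimated internal density = prior density x correction.
    return prior.pdf(x) * correction(prior.cdf(x))

# Scarce internal data, abundant external data (both simulated here for illustration).
rng = np.random.default_rng(1)
internal = rng.lognormal(0.0, 1.2, size=30)
external = rng.lognormal(0.1, 1.0, size=5000)
grid = np.linspace(0.1, 10.0, 200)
density = mixing_density(grid, internal, external)
```

With only 30 internal points the kernel smoother flattens the correction factor, so the estimate stays close to the external fit; with many internal points the correction factor dominates and the estimate adapts to the internal data.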



The proposed mixing model is based on a semi-parametric estimator of the loss distribution. Klug-

man, Panjer and Willmot (1998), McNeil, Frey and Embrechts (2005), Cizek, Härdle and Weron
(2005) and Panjer (2006) are important introductions to actuarial estimation techniques of purely

parametric loss distributions. Cruz (2001) describes a number of parametric distributions which can
be used for modeling loss distributions. Embrechts, Klüppelberg and Mikosch (1999) and Embrechts

(2000) focus on the tail and Extreme Value Theory (EVT). Further, the recent paper by Degen, Em-
brechts and Lambrigger (2007) discusses some fundamental properties of the g-and-h distribution

and how it is linked to the well-documented EVT-based methodology. However, if one focuses on the excess function, the link to the Generalized Pareto Distribution (GPD) has an extremely slow convergence rate, and capital estimation for small levels of risk tolerance using EVT may lead to inaccurate results if the parametric g-and-h distribution is chosen as model. Diebold, Schuermann and Stroughair (2001) and Dutta and Perry (2006) also stress weaknesses of EVT when it comes to real data analysis. This problem may be addressed by non- and semi-parametric procedures. For a review of
modern kernel smoothing techniques see Wand and Jones (1995). During the last decade a class of

semi-parametric methods was developed and designed to work better than a purely non-parametric

approach, see Bolancé, Guillén and Nielsen (2003), Hjort and Glad (1995) and Clements, Hurn and
Lindsay (2003). They showed that non-parametric estimators can be substantially improved by a transformation step, and they offer several alternatives for the transformation itself.

The paper by Buch-Larsen, Bolancé, Guillén and Nielsen (2005) maps the original data into [0, 1] via the parametric start and corrects non-parametrically for possible misspecification. A flexible parametric start facilitates accurate estimation; Buch-Larsen et al. (2005) generalized the Champernowne distribution (GCD), see Champernowne (1936, 1952), and used its cumulative distribution function as the transformation function. In the spirit of Buch-Larsen et al. (2005) many extensions have been proposed. Gustafsson, Hagmann, Nielsen and Scaillet (2008) investigate the performance of symmetric versus asymmetric kernels on [0, 1], Gustafsson, Nielsen, Pritchard and Roberts (2006) estimate operational risk losses with a continuous credibility model, and Guillén, Gustafsson, Nielsen and Pritchard (2007), and the extended version by Buch-Kromann, Englund, Gustafsson, Nielsen and
Thuring (2007), introduce the concept of under-reporting in operational risk quantification. Bolancé, Guillén and Nielsen (2008) develop a transformation kernel density estimator that uses a double transformation to estimate heavy-tailed distributions. Before describing the new method we mention

other recent methods in the literature. The papers by Shevchenko and Wüthrich (2006), Bühlmann,
Shevchenko and Wüthrich (2007) and Lambrigger, Shevchenko and Wüthrich (2007) combine loss

data with scenario analysis information via Bayesian inference for the assessment of operational risk,
and Verrall, Cowell and Khoon (2007) examine Bayesian networks for operational risk assessment.

Figini, Giudici, Uberti and Sanyal (2008) develop a method that estimates the internal data distribution with help from truncated external data originating from exactly the same underlying distribution. In our approach we allow the external distribution to differ from the internal distribution; the transformation approach has the consequence that the internal estimation process is guided by the more stable estimator originating from the external data. If the underlying distribution of the external data is close to the underlying distribution of the internal data, this approach improves the efficiency of estimation. We also present a variation of this approach where we correct for the estimated

median, such that only the shape of the underlying internal data has to be close to the shape of the

underlying external data for our procedure to work well.

2 The asymptotic properties of the transformation approach

Let (Xi)1≤i≤n be a sequence of iid random losses from a probability distribution F(x) with unknown density function f(x). Let T(x) be a twice continuously differentiable transformation whose first derivative T'(x) serves as a global start. This density is assumed to provide a meaningful but potentially inaccurate description of f(x). The transformation function T(x) could be a cumulative distribution function, or it could come from a non-parametric class of functions, see e.g. Wand et al. (1991) and Buch-Larsen et al. (2005). The transformed density estimator with a local constant boundary

correction can be written as
\[
\hat{f}(x) \;=\; T'(x)\,\alpha_{01}\bigl(T(x),h\bigr)^{-1}\, n^{-1}\sum_{i=1}^{n} K_h\bigl(T(X_i)-T(x)\bigr) \;=\; T'(x)\, l\bigl(T(x)\bigr), \tag{2.1}
\]

with local model l(·), transformed losses T(Xi), bandwidth h and a symmetric kernel Kh with Kh(·) = (1/h)K(·/h), and

\[
\alpha_{ij}(u,h) \;=\; \int_{\max\{-1,\,(u-1)/h\}}^{\min\{1,\,u/h\}} (vh)^{i}\, K(v)^{j}\, dv. \tag{2.2}
\]

A deeper analysis of the transformation process can be found in e.g. Gustafsson et al. (2008). The asymptotic theory of (2.1) is presented in Theorem 1, and the proof can be found in Buch-Larsen et al. (2005).

Theorem 1 Let the transformation function T(x) be a known, twice differentiable function. The bias of f̂(x) is given by

\[
E\bigl\{\hat{f}(x)\bigr\} - f(x) \;=\; 2^{-1}\,\alpha_{21}\bigl(T(x),h\bigr)\,\beta\bigl(T(x)\bigr) + o(h^{2}),
\]

with

\[
\beta\bigl(T(x)\bigr) \;=\; \left[\left\{\frac{f(x)}{T'(x)}\right\}'\frac{1}{T'(x)}\right]',
\]

and the variance

\[
V\bigl\{\hat{f}(x)\bigr\} \;=\; \alpha_{02}\bigl(T(x),h\bigr)\,(nh)^{-1} f(x)\,T'(x) + o\bigl((nh)^{-1}\bigr),
\]

where the asymptotics are given for n → ∞, h → 0 and nh → ∞.
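A direct numerical transcription of (2.1)-(2.2) may help fix ideas. The sketch below is not the authors' code: it assumes an Epanechnikov kernel on [-1, 1] (any symmetric kernel with this support matches the integration limits in (2.2)) and takes the transformation T and its derivative Tprime as user-supplied, vectorized callables.

```python
import numpy as np
from scipy.integrate import quad

def epanechnikov(v):
    """Symmetric kernel K with support [-1, 1]."""
    v = np.asarray(v, dtype=float)
    return np.where(np.abs(v) <= 1.0, 0.75 * (1.0 - v**2), 0.0)

def alpha(i, j, u, h):
    """Boundary functional alpha_ij(u, h) of equation (2.2)."""
    lo, hi = max(-1.0, (u - 1.0) / h), min(1.0, u / h)
    value, _ = quad(lambda v: (v * h)**i * float(epanechnikov(v))**j, lo, hi)
    return value

def transformation_kde(x, data, T, Tprime, h):
    """Local constant transformation estimator (2.1) evaluated at a point x."""
    u = T(x)
    kernel_average = np.mean(epanechnikov((T(data) - u) / h) / h)  # n^-1 sum K_h(T(X_i) - T(x))
    return Tprime(x) * kernel_average / alpha(0, 1, u, h)
```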

3 The transformation technique

In the previous section a general description of the transformation approach and its asymptotic prop-

erties was given. In this section we assume that the transformation function is a cumulative distribution function T(x, θ) with density T'(x, θ), indexed by the global parameter θ = {θ1, θ2, ..., θp} ∈ Θ ⊆ R^p. As before, let (Xi)1≤i≤n be a sequence of random losses originating from a
probability distribution F(x) with unknown density f(x). Then equation (2.1) can be rewritten as
\[
\hat{f}(x,\theta) \;=\; T'(x,\theta)\,\alpha_{01}\bigl(T(x,\theta),h\bigr)^{-1}\, n^{-1}\sum_{i=1}^{n} K_h\bigl(T(X_i,\theta)-T(x,\theta)\bigr) \;=\; T'(x,\theta)\, l\bigl(T(x,\theta)\bigr). \tag{3.1}
\]

Now let θ̂ be a suitable estimator of θ; then the estimator (3.1) becomes
\[
\hat{f}(x,\hat\theta) \;=\; T'(x,\hat\theta)\,\alpha_{01}\bigl(T(x,\hat\theta),h\bigr)^{-1}\, n^{-1}\sum_{i=1}^{n} K_h\bigl(T(X_i,\hat\theta)-T(x,\hat\theta)\bigr) \;=\; T'(x,\hat\theta)\, l\bigl(T(x,\hat\theta)\bigr). \tag{3.2}
\]

From e.g. Buch-Larsen et al. (2005) we know that when θ̂ is a square-root-n-consistent estimator of θ, the asymptotic theory of f̂(x, θ̂) is equivalent to the asymptotic theory of f̂(x, θ). The asymptotic theory of f̂(x, θ) follows immediately from Theorem 1, and

\[
\sqrt{nh}\,\Bigl\{\hat{f}\bigl(x,\hat\theta\bigr) - E\hat{f}\bigl(x,\hat\theta\bigr)\Bigr\} \;\sim\; N\!\left(0,\;\frac{\alpha_{02}\bigl(T(x),h\bigr)\, f(x)\, T'(x)}{nh}\right)
\]

holds.

Let now (Xi)1≤i≤n be a sequence of collected internal losses and let (Yj)1≤j≤m be a sequence of externally reported losses. Then (Yj)1≤j≤m represents prior knowledge that should enrich a limited internal data set. The parametric density estimator based on the external data set offers prior knowledge from the general industry. The correction of this prior knowledge is based on a local kernel smoother of the transformed internal data points. This provides us with a consistent and fully nonparametric estimator of the density of the internal data. The approach is therefore fully nonparametric at the same time as it is informed by prior information based on external data sources. The new mixing local transformation kernel density estimator has the same asymptotics as (3.1), given in Theorem 1. Hence, if T'(x, θ̂y) is a misspecified approximation of f(x), θ̂y converges in probability to the pseudo-true value θ which minimizes the Kullback-Leibler distance¹ between T'(x, θ) and the true density f(x). Therefore θ̂y is a square-root-n-consistent estimator of θ. The proposed mixing model can be
expressed as
\[
\hat{f}(x,\hat\theta^{y}) \;=\; T'(x,\hat\theta^{y})\,\alpha_{01}\bigl(T(x,\hat\theta^{y}),h\bigr)^{-1}\, n^{-1}\sum_{i=1}^{n} K_h\bigl(T(X_i,\hat\theta^{y})-T(x,\hat\theta^{y})\bigr) \;=\; T'(x,\hat\theta^{y})\, l\bigl(T(x,\hat\theta^{y})\bigr) \tag{3.3}
\]

and has the same asymptotic theory as (3.1) for n → ∞, h → 0 and nh → ∞.

The global start T'(x, ·) in (3.2) and (3.3) is modelled by the Generalized Champernowne Distribution (GCD) with density

\[
T'(x,\cdot) \;=\; \frac{\alpha\,(x+c)^{\alpha-1}\bigl((M+c)^{\alpha}-c^{\alpha}\bigr)}{\bigl((x+c)^{\alpha}+(M+c)^{\alpha}-2\,c^{\alpha}\bigr)^{2}}, \qquad x \in \mathbb{R}_{+}, \tag{3.4}
\]

incorporating internal parameters θ = {α, M, c} and external parameters θy = {αy, My, cy}. Here α is a tail parameter, M controls the body of the distribution and c is a shift parameter that controls the behaviour close to zero. The parameter vectors θ and θy are estimated by maximum likelihood; for a deeper analysis of the GCD, see Buch-Larsen et al. (2005).
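As an illustration of (3.4) and of the maximum likelihood step, one possible implementation is sketched below; the helper names, optimiser and starting values are our own pragmatic choices, not those of the paper. Fitting the external sample gives θ̂y and fitting the internal sample gives θ̂; plugging gcd_cdf and gcd_pdf with the external estimates into a transformation estimator such as the transformation_kde sketch above then yields the mixing estimator (3.3).

```python
import numpy as np
from scipy.optimize import minimize

def gcd_pdf(x, alpha, M, c):
    """Generalized Champernowne density (3.4)."""
    num = alpha * (x + c)**(alpha - 1.0) * ((M + c)**alpha - c**alpha)
    den = ((x + c)**alpha + (M + c)**alpha - 2.0 * c**alpha)**2
    return num / den

def gcd_cdf(x, alpha, M, c):
    """Corresponding cdf; equals 1/2 at x = M, so M is the median."""
    return ((x + c)**alpha - c**alpha) / (
        (x + c)**alpha + (M + c)**alpha - 2.0 * c**alpha)

def fit_gcd(losses):
    """Maximum likelihood estimates of theta = (alpha, M, c) for positive losses."""
    def negloglik(par):
        alpha, M, c = par
        if alpha <= 0.0 or M <= 0.0 or c < 0.0:
            return np.inf
        return -np.sum(np.log(gcd_pdf(losses, alpha, M, c)))
    start = np.array([1.0, np.median(losses), 0.1])
    return minimize(negloglik, start, method="Nelder-Mead").x
```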

Another, slightly different mixing model will also be included in the paper. This estimator follows the same idea as in Gustafsson et al. (2005). The model is the same as (3.3), but instead of using the prior knowledge parameter vector θ̂y, this generalized model transforms the internal losses with a combined parameter vector θ̂ȳ = {α̂y, M̂, ĉy}. The difference is that prior knowledge is provided only by the tail parameter α̂y and the shift parameter ĉy, while the median is represented by the internally estimated parameter M̂. With this model we combine the external parameters α̂y and ĉy with a body described internally by M̂.
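As a short continuation of the hypothetical sketches above (the arrays internal_losses and external_losses are placeholders), the combined parameter vector of this second mixing model is simply assembled from the two separate fits:

```python
# Placeholder arrays; continues the fit_gcd sketch above.
alpha_y, M_y, c_y = fit_gcd(external_losses)   # prior knowledge from external data
alpha_i, M_i, c_i = fit_gcd(internal_losses)   # internal fit
theta_bar = (alpha_y, M_i, c_y)                # external tail and shift, internal median
```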


¹ The Kullback-Leibler distance is a distance function from a true probability density, f(x), to a target probability estimator, T'(x, θ̂y), and is defined as \(\int_0^{\infty} f(w)\,\log\bigl\{ f(w)/T'(w,\hat\theta^{y}) \bigr\}\, dw\).

4 Data study

This section demonstrates the practical relevance of the mixing model presented above. We estimate a loss distribution and calculate next year's operational risk exposure using the common risk measures Value-at-Risk (VaR) and Tail-Value-at-Risk (TVaR) for different levels of risk tolerance. For the severity estimator we employ model (3.2) and model (3.3), and benchmark against the parametric Weibull distribution. For the frequency model we assume that N(t) is an independent homogeneous Poisson process, N(t) ∼ Po(λt), with positive intensity λ. Using Monte Carlo simulation with these severity and frequency assumptions we can create a simulated one-year operational risk loss distribution.

We denote the internal collected losses from the event risk category Employment Practices and Workplace Safety by (Xi)1≤i≤N(t), where N(t) describes the random number of losses over a fixed time period t = 0, ..., T and N(0) = 0. Further, (Yj)1≤j≤M(t) represents external data from the same event risk category, with random number of losses M(t) and M(0) = 0. Table 1 reports summary statistics for each data set.

TABLE 1

Statistics for Event Risk Category Employment Practices and Workplace Safety

Number of Maximum Sample Sample Standard Time

Losses Loss ($M) Mean ($M) Median ($M) Deviation ($M) Horizon (T )

Internal data 120 16.80 1.86 0.30 3.66 2

External data 6526 561.43 1.82 0.18 13.41 8

The mean and median are similar for the two data sets. However, the number of losses, the maximum loss, the standard deviation and the collection period differ widely. We condition on N = n and sample the Poisson process of internal events that occur over a one-year time horizon. With n = 120 internal losses the maximum likelihood estimator of the annual intensity is λ̂ = n/T = 120/2 = 60. We denote the simulated annual frequencies by λ̂r, r = 1, ..., R, with the number of simulations R = 10,000. For each λ̂r we draw uniformly distributed random samples and combine these with

loss sizes taken from the inverse function of the severity distribution. The outcome is the annual total

loss distribution denoted by


\[
S_r \;=\; \sum_{k=1}^{\hat\lambda_r} \hat{F}^{\leftarrow}(u_{rk},\cdot), \qquad r = 1,\ldots,R,
\]

with u_{rk} ∼ U(0, 1) for k = 1, ..., λ̂r, and

\[
\hat{F}(u_{rk},\cdot) \;=\; \int_0^{u_{rk}} \hat{f}(\xi,\cdot)\, d\xi,
\]

where · incorporates θ̂ for model (3.2) or the prior knowledge parameters θ̂y and θ̂ȳ for (3.3). For simplicity, we introduce the abbreviations M1 for the semi-parametric model (3.2), and M2 and M3 for (3.3) with θ̂y and θ̂ȳ, respectively. M4 is the purely external data version of (3.2), where the prior knowledge is locally corrected with external data. This corresponds to the situation where a company has not started to collect internal data and must rely entirely on prior knowledge. Finally, M5 is the benchmark model with a Weibull assumption on the severity. A loss distribution can then be calculated for relevant return periods, thereby identifying the capital to be held to protect against unexpected losses. The two risk measures used are the VaR

\[
\mathrm{VaR}_{\alpha}(S_r) \;=\; \sup\bigl\{\, s \in \mathbb{R} \;\big|\; P(S_r \le s) \le \alpha \,\bigr\}
\]

and the TVaR, which gives the expected loss beyond the VaR and is defined as

\[
\mathrm{TVaR}_{\alpha}(S_r) \;=\; E\bigl[\, S_r \mid S_r \ge \mathrm{VaR}_{\alpha}(S_r)\,\bigr]
\]

for risk tolerance α. Table 2 collects summary statistics for the simulated total loss distribution across
each model. Among the usual summary statistics we report VaR and TVaR for risk tolerance α =

{0.95, 0.99, 0.999}.
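The following sketch shows one way to carry out the Monte Carlo aggregation and to read off VaR and TVaR; the severity quantile function F_inv is assumed given (for instance a numerically inverted fitted cdf), and the Weibull parameters in the example are purely illustrative, not those of the paper.

```python
import numpy as np
from scipy import stats

def simulate_annual_loss(F_inv, lam, R=10_000, seed=0):
    """Simulated annual totals S_r = sum_{k=1}^{N_r} F_inv(u_rk), r = 1..R."""
    rng = np.random.default_rng(seed)
    totals = np.empty(R)
    for r in range(R):
        n_r = rng.poisson(lam)          # annual frequency, Po(lambda)
        u = rng.uniform(size=n_r)       # u_rk ~ U(0, 1)
        totals[r] = F_inv(u).sum()      # severities via the inverse cdf
    return totals

def var_tvar(totals, alpha):
    var = np.quantile(totals, alpha)
    tvar = totals[totals >= var].mean()  # E[S | S >= VaR_alpha(S)]
    return var, tvar

# Example with an illustrative Weibull severity (cf. benchmark M5) and lambda = 60.
severity = stats.weibull_min(c=0.6, scale=1.0)
S = simulate_annual_loss(severity.ppf, lam=60)
print(var_tvar(S, 0.99), var_tvar(S, 0.999))
```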

TABLE 2

Statistics of Simulated Loss Distributions for Event Risk Category Employment Practices and Workplace Safety (∼ $M)

Model Mean Median Sd VaR95% VaR99% VaR99.9% TVaR95% TVaR99% TVaR99.9%

M1 226 178 149 547 765 980 691 889 1180

M2 244 186 167 609 881 1227 787 1025 1385

M3 296 213 241 633 936 1331 820 1143 1487

M4 316 274 322 801 1165 1644 1073 1388 1858

M5 100 97 27 249 272 294 363 383 405

One can see from Table 2 that all loss distributions are right-skewed, since every mean is larger than its respective median. Interpreting the VaR results, one notices that the fully prior knowledge model M4 shows much larger values than the others. The benchmark model M5 predicts the lowest values of the considered models; compared with the purely internal estimator M1, model M5 suggests a capital requirement of only 30%-40% of the amount M1 recommends. The mixing model M2 has an 8% higher sample mean and 10%-25% higher VaR values than M1, which is due to the prior knowledge based on external data. The TVaR results show a similar pattern to the VaR results. A conclusion is that the two mixing models, M2 and M3, are able to stabilize the estimation process and do not underestimate the tail as model M5 apparently does.

5 Data Driven Simulation Study

This section provides a data-driven simulation study in which we evaluate scenarios with different prior knowledge for the new mixing model. The study confirms the intuitive expectation that the mixing model outperforms model (3.2) in situations with scarce data. As data becomes more abundant, the advantage disappears and the mixing model can even hurt performance slightly. Another insight is that the parametric Weibull distribution provides a doubtful goodness-of-fit to heavy-tailed data. The study employs two different true distributions with unlike characteristics; the idea of a data-driven simulation study, in contrast to a predetermined one, is that the two true distributions are estimated on the real operational risk samples described in the previous section. The two distributions used in the study are presented by their respective densities below.

1. Lognormal,
\[
f(x,\xi_1) \;=\; \frac{1}{x\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{\log(x)-\mu}{\sigma}\right)^{2}}, \qquad x \in \mathbb{R}_{+},
\]
with location and scale parameters ξ1 = {μ, σ}.

2. Generalized Champernowne Distribution (GCD),
\[
f(x,\xi_2) \;=\; \frac{\alpha\,(x+c)^{\alpha-1}\bigl((M+c)^{\alpha}-c^{\alpha}\bigr)}{\bigl((x+c)^{\alpha}+(M+c)^{\alpha}-2\,c^{\alpha}\bigr)^{2}}, \qquad x \in \mathbb{R}_{+},
\]
with ξ2 = {α, M, c}.

The study was also carried out with the Exponential, Weibull, Gamma and Pareto distributions. The outcomes were similar to those for the lognormal and the GCD; they are not reported in the paper but are available upon request. The two true distributions are estimated on the two available data sets by maximum likelihood, producing the estimated parameter vectors ξ̂d and ξ̂dy with d = 1, 2. Random losses are drawn from the estimated true models to obtain pseudo data. The internal data are denoted (X^b_{di})1≤i≤nk, where d indicates the distribution, b = 1, ..., B is the iteration with B = 2000, and the sample sizes simulated for next year's exposure N(t) = nk are chosen as nk = {10, 20, 50, 100, 200, 500, 1000, 2000, 4000}. The external samples are denoted by (Y^b_{dj})1≤j≤ml with M(t) = ml, where ml = {100, 5000}. For each data set the internal and external global start, T'(x, ·) in (3.2) and (3.3), is assumed to be described by the GCD given in (3.4). Model (3.2) is then estimated for each sample combination {b, d, nk}, and for each sample nk we estimate the mixing model (3.3) for each external data combination {b, d, ml}.

In summary, for each internal sample size nk six estimators are compared: the purely internal estimator (3.2) and four scenarios of the mixing model (3.3), informed by different external sample sizes ml and different global parametric starts. Also here we benchmark with a parametric Weibull distribution. The performance measures considered are means over replications of the integrated squared error (ISE), which measures the error between the true and the estimated density with equal weight across the support, and the weighted integrated squared error (WISE), which focuses on the fit in the tails. We
can write the statistical performance measures as
 1/2 
B
X Z∞ ³ ´2
1  
ISE =  fˆ(x, ·) − f (x, ξˆd ) dx 
B b=1 0
 1/2 
B
X Z∞ ³ ´2
1  
WISE =  fˆ(x, ·) − f (x, ξˆd ) x2 dx 
B b=1 0

with true density f (x, ξd ), and the estimator fˆ (x, ·) as one of the six different models. Note that the

true model is only enriched with the internal data.
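A numerical version of these two criteria could look as follows; the finite upper integration limit and the grid are our own simplification of the integral over the positive axis, and f_hat and f_true are assumed to be vectorized density functions.

```python
import numpy as np

def ise_wise(f_hat, f_true, x_max=200.0, n_grid=5000):
    """Single-replication ISE and WISE; the study averages these over B = 2000 runs."""
    x = np.linspace(1e-6, x_max, n_grid)
    sq_err = (f_hat(x) - f_true(x))**2
    ise = np.sqrt(np.trapz(sq_err, x))           # equal weight over the support
    wise = np.sqrt(np.trapz(sq_err * x**2, x))   # x^2 weight emphasises the tail
    return ise, wise
```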

Figure 1 presents the results for the true lognormal distribution. We report relative errors: the ISE and WISE of each proposed model are divided by those of the pure internal estimator, so a value above one means that the mixing (or benchmark) model performs worse at that sample size.

*** Figure 1 About Here ***

The top left-hand graph fluctuates around one; the mixing model therefore has an ISE similar to that of the pure internal estimator. For some internal sample sizes the pure internal estimator (3.2) gives less than half a percent better results, and of the compared mixing models the one enriched with 5000 external observations, f̂(x, θ̂y)5000, performs best at the largest internal sample sizes. The top right-hand graph presents the benchmark model performance: the parametric Weibull goes from 20% worse to above 30% worse as the internal sample size increases. For the tail measure WISE we note that above n4 the mixing models stabilize at a performance that is 1%-2% worse. Note that the breakpoint is higher when one focuses on the extremes than when the whole axis is considered, as in ISE. The bottom right-hand graph presents the benchmark performance; the Weibull clearly predicts the true underlying density of the internal data poorly, culminating in 60% poorer results at sample size n9 = 4000.

In the situation with the GCD, almost the same pattern as in the lognormal case appears. Figure 2 gives a visual presentation of the results.

*** Figure 2 About Here ***

For the ISE the breakpoint has increased to around n5 compared with the lognormal case; thereafter the mixing models stabilize in the region of 5%-10% poorer prediction power. Focusing on WISE, we see that the breakpoint lies between 100 and 500 internal observations.

6 Conclusions

This paper gives an intuitive and easily implemented model that combines different data sources. From the data study we found that a light-tailed parametric fit leads to low Value-at-Risk figures. Also, the mixing methods seem to give more stable and more realistic results than parametric methods that do not include external data. From the data-driven simulation study we conclude that, in the situation with scarce internal data, the proposed mixing model outperforms the model with only one data source. However, as the internal data becomes more abundant, a breakpoint can be identified where (3.2) becomes better than (3.3). We also find that the breakpoints occur earlier for light-tailed distributions than for heavy-tailed ones. Large data sets will often do better without prior knowledge, in the sense that the transformation is better estimated from pure internal data; however, when the breakpoint is reached practitioners need to analyse the data carefully to find out whether the internal data alone include large losses that they believe cover the whole company's operational risk exposure. Otherwise the mixing model should be preferred, since it automatically enriches the internal exposure with large external losses.

References

[1] Bolancé, C., Guillén, M. and Nielsen, J.P. (2003). Kernel density estimation of actuarial loss functions. Insurance: Mathematics and Economics, Vol. 32, 19-36.

[2] Bolancé, C., Guillén, M. and Nielsen, J.P. (2008). Inverse Beta transformation in kernel density estimation. Statistics & Probability Letters (to appear).

[3] Buch-Larsen, T., Nielsen, J.P., Guillén, M. and Bolancé, C. (2005). Kernel density estimation for heavy-tailed distributions using the Champernowne transformation. Statistics, Vol. 39, No. 6, 503-518.

[4] Buch-Kromann, T., Englund, M., Gustafsson, J., Nielsen, J.P. and Thuring, F. (2007). Non-parametric estimation of operational risk losses adjusted for under-reporting. Scandinavian Actuarial Journal, Vol. 4, 293-304.

[5] Bühlmann, H., Shevchenko, P.V. and Wüthrich, M.V. (2007). A "toy" model for operational risk quantification using credibility theory. Journal of Operational Risk, Vol. 2, No. 1, 3-20.

[6] Champernowne, D.G. (1936). The Oxford meeting, September 25-29, 1936. Econometrica, Vol. 5, No. 4, October 1937.

[7] Champernowne, D.G. (1952). The graduation of income distributions. Econometrica, Vol. 20, 591-615.

[8] Cizek, P., Härdle, W. and Weron, R. (2005). Statistical Tools for Finance and Insurance. Springer-Verlag, Berlin Heidelberg.

[9] Clements, A.E., Hurn, A.S. and Lindsay, K.A. (2003). Möbius-like mappings and their use in kernel density estimation. Journal of the American Statistical Association, Vol. 98, 993-1000.

[10] Cruz, M.G. (2001). Modeling, Measuring and Hedging Operational Risk. John Wiley & Sons, Ltd.

[11] Degen, M., Embrechts, P. and Lambrigger, D.D. (2007). The Quantitative Modeling of Operational Risk: Between G-and-H and EVT. Astin Bulletin, Vol. 37, No. 2, 265-292.

[12] Diebold, F., Schuermann, T. and Stroughair, J. (2001). Pitfalls and opportunities in the use of extreme value theory in risk management. In: Refenes, A.-P., Moody, J. and Burgess, A. (Eds.), Advances in Computational Finance, Kluwer Academic Press, Amsterdam, pp. 3-12. Reprinted from: Journal of Risk Finance, Vol. 1, 30-36 (Winter 2000).

[13] Dutta, K. and Perry, J. (2006). A tale of tails: an empirical analysis of loss distribution models for estimating operational risk capital. Federal Reserve Bank of Boston, Working Paper No. 06-13.

[14] Embrechts, P., Klüppelberg, C. and Mikosch, T. (1999). Modeling Extremal Events for Insurance and Finance. Springer.

[15] Embrechts, P. (2000). Extremes and Integrated Risk Management. London: Risk Books, Risk Waters Group.

[16] Figini, S., Giudici, P., Uberti, P. and Sanyal, A. (2008). A statistical method to optimize the combination of internal and external data in operational risk measurement. Journal of Operational Risk, Vol. 2, No. 4, 69-78.

[17] Guillén, M., Gustafsson, J., Nielsen, J.P. and Pritchard, P. (2007). Using external data in operational risk. Geneva Papers, Vol. 32, 178-189.

[18] Gustafsson, J., Nielsen, J.P., Pritchard, P. and Roberts, D. (2006). Quantifying Operational Risk Guided by Kernel Smoothing and Continuous Credibility: A Practitioner's View. Journal of Operational Risk, Vol. 1, No. 1, 43-56.

[19] Gustafsson, J., Hagmann, M., Nielsen, J.P. and Scaillet, O. (2008). Local Transformation Kernel Density Estimation of Loss Distributions. Journal of Business and Economic Statistics (to appear).

[20] Hjort, N.L. and Glad, I.K. (1995). Nonparametric Density Estimation with a Parametric Start. Annals of Statistics, Vol. 24, 882-904.

[21] Klugman, S.A., Panjer, H.H. and Willmot, G.E. (1998). Loss Models: From Data to Decisions. New York: John Wiley & Sons, Inc.

[22] Lambrigger, D.D., Shevchenko, P.V. and Wüthrich, M.V. (2007). The quantification of operational risk using internal data, relevant external data and expert opinion. Journal of Operational Risk, Vol. 2, No. 3, 3-27.

[23] McNeil, A.J., Frey, R. and Embrechts, P. (2005). Quantitative Risk Management. Princeton Series in Finance.

[24] Panjer, H.H. (2006). Operational Risk: Modeling Analytics. New York: John Wiley & Sons, Inc.

[25] Shevchenko, P.V. and Wüthrich, M.V. (2006). The structural modeling of operational risk via Bayesian inference. Journal of Operational Risk, Vol. 1, No. 3, 3-26.

[26] Verrall, R.J., Cowell, R. and Khoon, Y.Y. (2007). Modelling Operational Risk with Bayesian Networks. Journal of Risk and Insurance, Vol. 74, No. 4, 795-827.

[27] Wand, M.P., Marron, J.S. and Ruppert, D. (1991). Transformation in Density Estimation (with comments). Journal of the American Statistical Association, Vol. 94, 1231-1241.

[28] Wand, M.P. and Jones, M.C. (1995). Kernel Smoothing. Chapman & Hall.

[Figure 1 consists of four panels, all titled "Lognormal Distribution": relative ISE (top row) and relative WISE (bottom row) plotted against the internal sample size (10 to 4000). The left-hand panels compare the mixing models f̂(x, θ̂y)100 and f̂(x, θ̂y)5000; the right-hand panels show the benchmark Weibull.]

Figure 1: The relative error between model (3.2) and mixing model (3.3) estimated on data-driven lognormal data. The internal data used is in the range [10, 4000] and the mixing model (3.3) is enriched by the external sample sizes m = 100 and m = 5000.

[Figure 2 consists of four panels, all titled "GCD": relative ISE (top row) and relative WISE (bottom row) plotted against the internal sample size (10 to 4000). The left-hand panels compare the mixing models f̂(x, θ̂y)100 and f̂(x, θ̂y)5000; the right-hand panels show the benchmark Weibull.]

Figure 2: The relative error between model (3.2) and mixing model (3.3) estimated on data-driven GCD data. The internal data used is in the range [10, 4000] and the mixing model (3.3) is enriched by the external sample sizes m = 100 and m = 5000.

