

Extended Extreme Value Mixture Models for Inference over Extremes


Professor’s Name


Location of Institution


Chapter 1

1.0 Introduction

Extreme value theory has become increasingly important in recent years because rare events, although infrequent, can have significant impacts in many fields, such as a plunge in financial markets or extreme rainfall in some parts of the globe (Alves, 2016). Unlike standard statistical theory, which explores the typical behavior of a process, extreme value theory describes unusual behavior and the occurrence of rare events. According to Finkenstadt & Rootzén (2003), extreme value theory provides parametric models for the upper or lower tail of the data-generating process. Hence, extreme value theory is the natural basis for extrapolating beyond the observed data. The performance of an extreme value model is assessed by how well it describes the behavior of the tail of the data; if the model fits well, it can be used to extrapolate the quantities of interest (Genest & Nešlehová, 2014). One branch of extreme value theory considers exceedances over a suitably high threshold, whose asymptotic behavior motivates the extreme value model known as the generalized Pareto distribution (GPD).

Scientists try to predict unpredictable events in a principled way. The major challenge in analyzing extremes is how to propose a model and estimate its parameters from little information, since extreme data are rare. Another challenge is precision: it is difficult to estimate the probability of events that have not yet been observed. To tackle these questions, extreme value theory, the generalized Pareto distribution and extended Pareto distributions have been developed to analyze such occurrences; they propose specific distributions for the extreme observations. The generalized Pareto distribution is a continuous probability distribution used to model the tails of other distributions. It is defined by three parameters, namely location, scale and shape; in some cases it is specified by the scale and shape only, and in rare cases by the shape parameter alone. When it is fitted to exceedances, the threshold is usually chosen subjectively using graphical tools (Baek, 2009). In most cases, the resulting tail estimates are subject to a considerable amount of uncertainty due to the threshold choice, which is not accounted for in the subsequent inference. Some models use kernel density estimators for the non-extreme component of the distribution. The kernel density estimation (KDE) functionality developed for these models has also been provided in standalone form, as it should be of interest to a broad


In financial markets, a rare event can have a significant impact on stock prices, especially in the highly competitive technology industry. FANG, an acronym coined by CNBC's Jim Cramer, refers to four well-known technology stocks: Facebook, Amazon, Netflix, and Google (now Alphabet Inc.). As of March 20, 2018, the combined market capitalization of these companies was USD 2.127 trillion. Extreme events in these stocks could influence not only short-term stock prices but also the whole economy in the long term. It is therefore worthwhile to investigate extreme events in FANG stock prices and to seek better predictions of their tail distributions.

The main goal of this project is to use extreme value theory to find suitable distribution parameters for extreme events by examining the daily returns of the FANG stocks (Facebook, Amazon, Netflix, and Google) over the past ten years (2008/6/30 to 2018/6/30). The dissertation is structured as follows. Chapter 2 describes the main theoretical results and theorems of EVT. Chapter 3 presents the generalized Pareto distribution and a brief literature review. Chapter 4 presents the extended generalized Pareto distribution models, together with the definitions of the MOGPD and HLGPD models. Posterior inference and findings from an extensive simulation plan are reported in Chapter 5, while results and estimated measures from the financial applications are described in Chapter 6. Chapter 7 presents conclusions about the methods used and their effectiveness, as well as their shortcomings and directions for future work. The detailed algorithms and the R code used for the simulations are reported in the Appendix.

1.1 Research Objectives

This project has three main objectives:


1. To determine whether extended generalized Pareto distributions predict extreme observations more accurately than the standard generalized Pareto distribution.

2. To establish whether extended extreme value mixture models can be fitted coherently to a variety of datasets.

3. To determine whether these models provide more accurate predictions than standard extreme value models.

1.2 Research Questions

To achieve these objectives, this dissertation seeks to answer the following questions:

1. Do extended generalized Pareto distributions predict extreme observations more accurately than the standard generalized Pareto distribution?

2. Can extended extreme value mixture models be fitted coherently to a variety of datasets?

3. Do these models provide more accurate predictions than standard extreme value models?

Chapter 2

2.0 Extreme Value Theory

Extreme value theory (EVT) is a field of statistics that deals with rare events far from the center of a distribution. It aims to assess the probability of occurrence of extreme events in different settings. Since traditional statistical methods do not provide reliable extrapolation into the tail of a distribution, more advanced approaches have been proposed to make inference about the characteristics of the tail (Chan & Gray, 2016).

EVT can be divided into two major methods to deal with extreme data. The first way

depends on deriving block maxima series as a preliminary step and attempt to control the

skimmed dataset. Another one relies on capture the data with peak value above the chosen

sufficiently high threshold. These methods will be outlined in the details below, according to the


2.1 Asymptotic Model

The asymptotic model provides a natural way of deciding which observations are extreme, namely those that exceed some high value. Suppose that $X_1, \ldots, X_n$ is a sequence of independent random variables with common distribution function $F$, and let $M_n = \max\{X_1, \ldots, X_n\}$ be the maximum of the process over a block of $n$ observations (minima can be handled in the same way, since $\min_i X_i = -\max_i(-X_i)$). The distribution of $M_n$ is then

$$ \Pr(M_n \le x) = \Pr(X_1 \le x, \ldots, X_n \le x) = \Pr(X_1 \le x) \times \cdots \times \Pr(X_n \le x) = \{F(x)\}^n. $$

In ideal circumstances the distribution $F$ is known, and the distribution of $M_n$ follows directly. In practice, however, $F$ is usually unknown. An alternative approach is therefore to use approximate families of models for $\{F(x)\}^n$, which can be estimated from the extreme data alone (Aschenbrenner, 2017).
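As a quick check of the identity $\Pr(M_n \le x) = \{F(x)\}^n$, the Python sketch below compares the exact value with a Monte Carlo estimate. The choice of an exponential $F$ and the helper names are illustrative assumptions; in real applications $F$ is unknown, which is precisely why the asymptotic approximations that follow are needed.

```python
import math
import random

def max_cdf_exact(x: float, n: int, rate: float = 1.0) -> float:
    """P(M_n <= x) = F(x)^n when X_1, ..., X_n are iid Exponential(rate)."""
    F = 1.0 - math.exp(-rate * x) if x > 0 else 0.0
    return F ** n

def max_cdf_simulated(x: float, n: int, rate: float = 1.0,
                      reps: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(M_n <= x) from `reps` simulated blocks of size n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        block_max = max(rng.expovariate(rate) for _ in range(n))
        if block_max <= x:
            hits += 1
    return hits / reps

n, x = 10, 3.0
exact = max_cdf_exact(x, n)
simulated = max_cdf_simulated(x, n)
print(f"exact F(x)^n = {exact:.4f}, simulated = {simulated:.4f}")
```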

As $n \to \infty$, the distribution of $M_n$ degenerates to a point mass at the upper end point of $F$, which is known as the degeneracy problem. This difficulty is avoided by a linear renormalization of $M_n$:

$$ M_n^{*} = \frac{M_n - b_n}{a_n}, $$

where $\{a_n > 0\}$ and $\{b_n\}$ are sequences of constants. If, for suitable choices of $a_n$ and $b_n$, the distribution of $M_n^{*}$ converges to a non-degenerate distribution function $G$, then $G$ is called an extreme value distribution function. Fisher and Tippett (1928) showed that suitable choices of the constants stabilize the distribution of $M_n^{*}$; this result is known as the Extremal Types Theorem, or the Fisher-Tippett-Gnedenko theorem (Cotter & Dowd, 2016).

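2.1.1 Fisher-Tippett-Gnedenko Theorem

If there exist sequences of constants $a_n > 0$ and $b_n$ such that, as $n \to \infty$,

$$ \Pr\left(\frac{M_n - b_n}{a_n} \le x\right) \to G(x), \tag{1} $$

where $G(x)$ is a non-degenerate distribution function, then $G(x)$ belongs to one of the following families:

$$ \text{I (Gumbel):} \quad G(x) = \exp\left\{-\exp\left[-\left(\frac{x-\mu}{\sigma}\right)\right]\right\}, \quad -\infty < x < \infty; \tag{2} $$

$$ \text{II (Fréchet):} \quad G(x) = \begin{cases} 0, & x \le \mu, \\ \exp\left\{-\left(\frac{x-\mu}{\sigma}\right)^{-\xi}\right\}, & x > \mu; \end{cases} \tag{3} $$

$$ \text{III (Weibull):} \quad G(x) = \begin{cases} \exp\left\{-\left[-\left(\frac{x-\mu}{\sigma}\right)\right]^{\xi}\right\}, & x < \mu, \\ 1, & x \ge \mu, \end{cases} \tag{4} $$

for parameters $\sigma > 0$, $\mu \in \mathbb{R}$ and, in the case of families II and III, $\xi > 0$.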

The Fisher-Tippett-Gnedenko theorem states that the normalized sample maxima $(M_n - b_n)/a_n$ converge in distribution to a variable belonging to one of the families I, II and III. These three classes of distributions are widely known as the Gumbel, Fréchet and Weibull families (Dey & Yan, 2015). Each family has a location parameter $\mu$ and a scale parameter $\sigma$; in addition, the Fréchet and Weibull families have a shape parameter $\xi$. This means that, whenever suitably normalized maxima $M_n$ have a non-degenerate limiting distribution, that limit must be one of the three types of extreme value distribution, regardless of the distribution $F$. In this sense, the theorem is an extreme value analogue of the central limit theorem (Bartková & Čunderlíková, 2017).
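To see the theorem at work, consider iid Exponential(1) variables, for which $a_n = 1$ and $b_n = \log n$ are suitable normalizing constants: $\Pr(M_n - \log n \le x) = (1 - e^{-x}/n)^n \to \exp(-e^{-x})$, the Gumbel distribution. The Python sketch below evaluates this exact probability for increasing $n$; the exponential choice and the function names are only an illustration.

```python
import math

def gumbel_cdf(x: float) -> float:
    """Standard Gumbel distribution function exp(-exp(-x))."""
    return math.exp(-math.exp(-x))

def normalized_max_cdf(x: float, n: int) -> float:
    """P((M_n - b_n)/a_n <= x) for iid Exponential(1) data with a_n = 1, b_n = log(n)."""
    z = x + math.log(n)               # the event M_n <= a_n * x + b_n
    if z <= 0:
        return 0.0
    return (1.0 - math.exp(-z)) ** n  # exact value F(z)^n

x = 0.5
for n in (10, 100, 10_000):
    print(n, round(normalized_max_cdf(x, n), 6), round(gumbel_cdf(x), 6))
```

The gap between the two columns shrinks as $n$ grows, which is the convergence in (1).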

2.1.2 The Generalized Extreme Value Distribution (GEV)

In practice the distribution of the data is unknown, so it is not appropriate to adopt one of the three limiting families in advance and ignore the uncertainty in that choice. A better approach is to use a single, unified extreme value distribution obtained by reformulating the three families of the Fisher-Tippett-Gnedenko theorem (Beirlant & Matthys, 2006).

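2.1.2-1 Theorem (Generalized Extreme Value Distribution)

If there exist sequences of constants $a_n > 0$ and $b_n$ such that, as $n \to \infty$,

$$ \Pr\left(\frac{M_n - b_n}{a_n} \le x\right) \to G(x), \tag{i} $$

where $G(x)$ is a non-degenerate distribution function, then $G(x)$ is a member of the GEV family

$$ G(x \mid \mu, \sigma, \xi) = \begin{cases} \exp\left\{-\left[1 + \xi\left(\frac{x-\mu}{\sigma}\right)\right]_+^{-1/\xi}\right\}, & \xi \neq 0, \\ \exp\left[-\exp\left(-\frac{x-\mu}{\sigma}\right)\right], & \xi = 0, \end{cases} \tag{5} $$

defined on $\{x : 1 + \xi(x-\mu)/\sigma > 0\}$, where $\mu \in \mathbb{R}$ is the location parameter, $\sigma > 0$ the scale parameter and $\xi \in \mathbb{R}$ the shape parameter.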

The GEV distribution combines the three types of extreme value distribution into a single family, and it can be used to model the maxima of finite sequences of data (Beranger & Padoan, 2015). Depending on the value of the shape parameter, the GEV reduces to the Gumbel, Fréchet or Weibull family. For fixed location $\mu$ and scale $\sigma$:

If $\xi < 0$, the GEV distribution is the (reversed) Weibull distribution, with a finite upper end point $\mu - \sigma/\xi$.

If $\xi = 0$, the GEV distribution is the Gumbel distribution.

If $\xi > 0$, the GEV distribution is the Fréchet distribution, with a heavy tail.
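The three regimes can be captured in a single function. The Python sketch below evaluates equation (5) directly, returning 0 or 1 outside the support; the name `gev_cdf` is hypothetical. Note that some libraries use a different sign convention: SciPy's `scipy.stats.genextreme`, for example, uses a shape parameter `c` equal to $-\xi$.

```python
import math

def gev_cdf(x: float, mu: float = 0.0, sigma: float = 1.0, xi: float = 0.0) -> float:
    """GEV distribution function G(x | mu, sigma, xi) of equation (5)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if xi == 0.0:                       # Gumbel case
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower end point (Frechet, xi > 0)
        # or above the upper end point (reversed Weibull, xi < 0)
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

for xi in (-0.5, 0.0, 0.5):
    print(xi, [round(gev_cdf(x, xi=xi), 4) for x in (-1.0, 0.0, 1.0, 3.0)])
```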

Given independent and identically distributed observations $X_1, \ldots, X_N$ of a random variable $X$, divided into $m$ blocks of size $n$, if $n$ is large enough then the block maxima $M_{n,1}, \ldots, M_{n,m}$ are approximately distributed as a GEV. However, the quality of this approximation depends on the block size $n$, which involves a trade-off between bias and variance. If each block contains few observations, the asymptotic argument is no longer valid and the estimates are biased; if the blocks are too large, there are too few block maxima, leading to high variance.
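The bias-variance trade-off can be made concrete with a small Python sketch: for a fixed series of $N$ observations, larger blocks give maxima closer to the asymptotic regime but fewer of them. The helper `block_maxima` is hypothetical, and the simulated Student-t series (3 degrees of freedom, a common stand-in for heavy-tailed returns) is purely illustrative, not FANG data.

```python
import math
import random
from typing import List

def block_maxima(x: List[float], block_size: int) -> List[float]:
    """Maximum of each complete block of `block_size` consecutive observations."""
    n_blocks = len(x) // block_size          # a trailing incomplete block is dropped
    return [max(x[i * block_size:(i + 1) * block_size]) for i in range(n_blocks)]

def student_t_sample(n: int, df: float, seed: int = 0) -> List[float]:
    """Illustrative heavy-tailed series: t = Z / sqrt(chi2_df / df)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-squared with df degrees of freedom
        out.append(z / math.sqrt(chi2 / df))
    return out

series = student_t_sample(2520, df=3.0)       # roughly ten years of trading days
for n in (5, 21, 63, 252):                    # week, month, quarter, year
    maxima = block_maxima(series, n)
    print(f"block size {n:>3}: m = {len(maxima):>3} maxima, "
          f"mean = {sum(maxima) / len(maxima):.3f}")
```

With annual blocks only $m = 10$ maxima remain, which illustrates the variance side of the trade-off.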


Various extreme value mixture models have previously been proposed to model the whole distribution, capturing the bulk of the distribution while retaining the flexibility of an extreme value model for the lower and/or upper tails (Papastathopoulos & Tawn, 2013). A key feature of these models is that they explicitly include the threshold as a parameter to be estimated, thereby overcoming the threshold choice problem and allowing the associated uncertainty to be quantified. Mendes, Lopes, and Vaz (2004) introduced a basic mixture model in which the bulk distribution is assumed to be normal and two separate generalized Pareto distributions are used for the tails, with the thresholds estimated either by a model fit statistic or by a quasi-likelihood procedure.

According to Frigessi, Ola, & Håvard (2002), a dynamically weighted mixture model can be used, in which a weight function that varies over the range of the support shifts the weight from a light-tailed density, such as the Weibull, which forms the bulk model, to the generalized Pareto model, which dominates the upper tail. There is no explicit threshold in this approach: the threshold estimation problem is replaced by that of estimating the parameters of the transition function. Nevertheless, an effective threshold can be identified as the point beyond which the weighted contribution of the Weibull is small compared with that of the generalized Pareto distribution. Behrens et al. (2004) developed a mixture model that combines a parametric form for the density, such as the gamma, Weibull or normal, up to a threshold with a generalized Pareto distribution for the tail above it. In their approach, the threshold is explicitly treated as a parameter to be estimated. More recently, Carreau and Bengio (2009) proposed a hybrid Pareto distribution, which joins a normal distribution with a generalized Pareto tail, with the resulting probability density function constrained to be continuous, together with its first derivative, at the threshold.


A significant disadvantage of the above approaches is the need to specify a parametric model for the bulk distribution; in addition, the hybrid Pareto has complex sampling properties. Tancredi, Clive, & Anthony (2006) proposed a semiparametric model that uses piecewise uniform distributions between a threshold deliberately chosen to be too low and the unknown threshold above which the tail model applies. Their approach can be viewed as a piecewise linear approximation to the distribution function below the threshold, combined with the tail model above it. Bayesian inference is carried out with a reversible jump algorithm because the number of uniform components is unknown. The threshold is treated as a parameter, so the inferences account for threshold uncertainty.

This dissertation therefore proposes a more flexible model for analyzing extremes, covering the data both below and above the threshold. The model avoids the need to assume a parametric distribution for the bulk and captures the entire distribution below and above the threshold.

Chapter 3

3.0 Generalized Pareto Distribution & Literature Review

The generalized Pareto distribution is an example of an extreme value model. In the classical extreme value model, it is assumed that, for independent and identically distributed (iid) observations $\{x_i : i = 1, 2, \ldots, n\}$, the excesses $x_i - u$ over a suitably high threshold $u$ can be described by a generalized Pareto distribution, denoted $\mathrm{GPD}(\sigma_u, \varepsilon)$. Its distribution function is

$$ G(x \mid \varepsilon, \sigma_u, u) = \Pr(X \le x \mid X > u) = \begin{cases} 1 - \left[1 + \varepsilon\left(\frac{x-u}{\sigma_u}\right)\right]_+^{-1/\varepsilon}, & \varepsilon \neq 0, \\ 1 - \exp\left[-\left(\frac{x-u}{\sigma_u}\right)_+\right], & \varepsilon = 0, \end{cases} \tag{6} $$

for $x > u$, where $y_+ = \max(y, 0)$ and $\sigma_u > 0$. For $\varepsilon \ge 0$ the support is $x > u$, which is unbounded above; for $\varepsilon < 0$ the support is bounded, $u < x < u - \sigma_u/\varepsilon$.

When working with extreme values, it is essential to estimate extreme quantiles of the distribution. An extreme quantile of the exceedance distribution is a value $z_p$ such that $\Pr(X \le z_p \mid X > u) = p$ for values of $p$ close to 1. The GPD quantile function is obtained by inverting equation (6):

$$ z_p = \begin{cases} u + \frac{\sigma_u}{\varepsilon}\left[(1-p)^{-\varepsilon} - 1\right], & \varepsilon \neq 0, \\ u - \sigma_u \log(1-p), & \varepsilon \to 0, \end{cases} \tag{7} $$

where $p \in (0, 1)$.
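A minimal Python sketch of the GPD distribution function (6) and of its inverse, the conditional quantile obtained above, is given below; `sigma` plays the role of $\sigma_u$ and `eps` of $\varepsilon$. The function names are hypothetical, and the sketch assumes the parameters are already known rather than estimated.

```python
import math

def gpd_cdf(x: float, u: float, sigma: float, eps: float) -> float:
    """Pr(X <= x | X > u) for a GPD with threshold u, scale sigma, shape eps (equation (6))."""
    if x <= u:
        return 0.0
    z = (x - u) / sigma
    if eps == 0.0:
        return 1.0 - math.exp(-z)
    t = 1.0 + eps * z
    if t <= 0.0:               # beyond the finite upper end point u - sigma/eps (eps < 0)
        return 1.0
    return 1.0 - t ** (-1.0 / eps)

def gpd_quantile(p: float, u: float, sigma: float, eps: float) -> float:
    """Conditional p-quantile z_p obtained by inverting gpd_cdf."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")
    if eps == 0.0:
        return u - sigma * math.log(1.0 - p)
    return u + sigma / eps * ((1.0 - p) ** (-eps) - 1.0)

for eps in (-0.3, 0.0, 0.4):
    q = gpd_quantile(0.99, u=1.0, sigma=2.0, eps=eps)
    print(eps, round(q, 4), gpd_cdf(q, 1.0, 2.0, eps))
```

The printed distribution function values recover $p = 0.99$ up to rounding, and the quantile grows quickly with $\varepsilon$, reflecting the heavier tail.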

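Pickands-Balkema-de Haan Theorem

Let $X_1, \ldots, X_n$ be a sequence of iid random variables with common distribution function $F$ and upper end point $x_F = \sup\{x : F(x) < 1\}$, and let $F_u(x) = \Pr(X \le x \mid X > u)$ be the conditional excess distribution function. If $F$ is in the domain of attraction of a GEV distribution, then for $u$ large enough $F_u$ is well approximated by the generalized Pareto distribution. In symbols:

$$ \lim_{u \to x_F} F_u(x) = G(x), $$

where $G(x)$ is defined as

$$ G(x \mid u, \sigma_u, \xi) = \begin{cases} 1 - \left[1 + \xi\left(\frac{x-u}{\sigma_u}\right)\right]_+^{-1/\xi}, & \xi \neq 0, \\ 1 - \exp\left(-\frac{x-u}{\sigma_u}\right), & \xi = 0, \end{cases} \tag{8} $$

for $x > u$, $\sigma_u > 0$ and $1 + \xi(x-u)/\sigma_u > 0$.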

As in the GEV distribution, the shape parameter $\xi$ determines the tail behavior of the GPD. Three cases can be distinguished:

• $\xi < 0$: short tail with the finite upper end point $x_F = u - \sigma_u/\xi$;

• $\xi = 0$: exponential tail;

• $\xi > 0$: heavy tail, unbounded above the threshold.

A distribution $F$ is said to be in the domain of attraction of an extreme value distribution if there exist constants $a_n > 0$ and $b_n \in \mathbb{R}$ such that equation (i) holds for an iid sequence $X_1, \ldots, X_n$ with common distribution $F$.

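The GEV and GPD models are closely related. Let $Y$ follow a $\mathrm{GEV}(\mu, \sigma, \xi)$ distribution, let $u$ be a high threshold and let $y > 0$ be the excess over $u$. For large values of its argument, $1 - G(z) \approx [1 + \xi(z-\mu)/\sigma]^{-1/\xi}$, since $1 - e^{-t} \approx t$ for small $t$. Hence

$$ \Pr(Y > u + y \mid Y > u) = \frac{1 - \Pr(Y \le u + y)}{1 - \Pr(Y \le u)} \approx \frac{\left[1 + \xi\left(\frac{u + y - \mu}{\sigma}\right)\right]^{-1/\xi}}{\left[1 + \xi\left(\frac{u - \mu}{\sigma}\right)\right]^{-1/\xi}} = \left[1 + \frac{\xi y}{\hat{\sigma}_u}\right]^{-1/\xi}, \tag{9} $$

where $\hat{\sigma}_u = \sigma + \xi(u - \mu)$.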
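Equation (9) can be checked numerically. The Python sketch below compares the exact GEV conditional exceedance probability with the GPD form for increasing thresholds; the parameter values are arbitrary illustrations, and `gev_survival`, `exact_exceedance` and `gpd_exceedance` are hypothetical helper names.

```python
import math

def gev_survival(y: float, mu: float, sigma: float, xi: float) -> float:
    """1 - G(y) for a GEV with xi != 0, assuming 1 + xi * (y - mu) / sigma > 0."""
    t = (1.0 + xi * (y - mu) / sigma) ** (-1.0 / xi)
    return -math.expm1(-t)       # 1 - exp(-t), accurate when t is small

def exact_exceedance(y, u, mu, sigma, xi):
    """Pr(Y > u + y | Y > u) computed directly from the GEV."""
    return gev_survival(u + y, mu, sigma, xi) / gev_survival(u, mu, sigma, xi)

def gpd_exceedance(y, u, mu, sigma, xi):
    """Right-hand side of (9), with sigma_u = sigma + xi * (u - mu)."""
    sigma_u = sigma + xi * (u - mu)
    return (1.0 + xi * y / sigma_u) ** (-1.0 / xi)

mu, sigma, xi, y = 0.0, 1.0, 0.2, 1.0
for u in (1.0, 5.0, 20.0, 100.0):
    e = exact_exceedance(y, u, mu, sigma, xi)
    g = gpd_exceedance(y, u, mu, sigma, xi)
    print(f"u = {u:6.1f}: exact = {e:.6f}, GPD approximation = {g:.6f}")
```

The agreement improves as $u$ increases, which is exactly the sense in which the GPD approximates GEV exceedances over a high threshold.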

Measures for extreme data

To evaluate extreme data, extreme quantiles of the distribution are essential and straightforward to obtain. For the GPD, the quantile function is obtained by inverting equation (8):

$$ z_p = \begin{cases} u + \frac{\sigma_u}{\xi}\left[(1-p)^{-\xi} - 1\right], & \xi \neq 0, \\ u - \sigma_u \log(1-p), & \xi \to 0, \end{cases} \tag{10} $$

where $p \in (0, 1)$.

However, equation (10) applies only when all the observations considered lie above the threshold. In real data the majority of observations usually lie below the threshold, and $u$ is estimated as the empirical quantile of level $1 - N_u/N$, where $N$ is the total number of observations and $N_u$ is the number of observations above the threshold. In this situation $z_p$ denotes the quantile of the distribution of all observations. Since $\Pr(X > u)$ is estimated by $N_u/N$, we have $\Pr(X > z) \approx (N_u/N)\{1 - G(z)\}$, so the $p$ in equation (10) is replaced by $p^{*} = 1 - (1 - p)N/N_u$. The high-quantile formula, valid for $p > 1 - N_u/N$, is then

$$ z_p = \mu + \frac{\sigma}{\varepsilon}\left\{\left[(1-p)\frac{N}{N_u}\right]^{-\varepsilon} - 1\right\}, \tag{11} $$

where

$\mu$: the chosen threshold;

$\sigma$: the maximum likelihood estimate of the scale parameter;

$\varepsilon$: the maximum likelihood estimate of the shape parameter;

$p$: the level of the chosen quantile;

$N$: the sample size;

$N_u$: the number of observations above the threshold.
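A minimal Python sketch of this high-quantile formula, assuming the threshold, scale and shape have already been estimated, is given below. The helper `tail_quantile` is hypothetical, and the numbers in the example (2,500 observations, 125 exceedances, $\sigma = 0.01$, $\varepsilon = 0.2$) are made up for illustration. The key step is the rescaling: because $\Pr(X > u)$ is estimated by $N_u/N$, an unconditional tail probability $1 - p$ corresponds to the conditional GPD tail probability $(1 - p)N/N_u$.

```python
import math

def tail_quantile(p: float, threshold: float, sigma: float, eps: float,
                  n: int, n_u: int) -> float:
    """Unconditional p-quantile of a GPD tail fitted above `threshold`.

    Pr(X > threshold) is estimated by n_u / n, so the conditional level is
    p* = 1 - (1 - p) * n / n_u, which is only meaningful for p > 1 - n_u / n.
    """
    if not (1.0 - n_u / n) < p < 1.0:
        raise ValueError("p must lie in (1 - n_u/n, 1)")
    tail_prob = (1.0 - p) * n / n_u            # equals 1 - p*
    if eps == 0.0:
        return threshold - sigma * math.log(tail_prob)
    return threshold + sigma / eps * (tail_prob ** (-eps) - 1.0)

# Hypothetical fit: 2,500 daily losses, 125 of them above a threshold of 0.03
q99 = tail_quantile(0.99, threshold=0.03, sigma=0.01, eps=0.2, n=2500, n_u=125)
print(f"estimated 99% quantile: {q99:.4f}")
```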

The traditional method is to choose the threshold a priori using graphical diagnostics. Guidelines for these diagnostics are given by Coles (2001): the mean residual life plot, the threshold (parameter) stability plots and the usual goodness-of-fit diagnostics (e.g., probability plots, quantile plots, return level plots, and comparison of the empirical and fitted densities). Although it is easy to find a potential threshold with these plots, the approach has several drawbacks. The main disadvantage is that a threshold chosen by inspecting a graph is subject to subjective judgment, and the uncertainty of this choice is ignored in the subsequent inference. For a review of such diagnostics and the difficulties associated with them, see Scarrott and MacDonald (2012).
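The mean residual life plot is simple to compute: for each candidate threshold $u$ it shows the empirical mean excess $e(u) = \text{mean}(x_i - u \mid x_i > u)$. If the GPD holds above some $u_0$ with $\xi < 1$, then $e(u) = \{\sigma_{u_0} + \xi(u - u_0)\}/(1 - \xi)$ is linear in $u \ge u_0$, so one looks for the lowest threshold above which the plot is roughly linear. The Python sketch below computes the plotted points only (no graphics); the function names are hypothetical.

```python
from typing import List, Tuple

def mean_excess(x: List[float], u: float) -> Tuple[float, int]:
    """Empirical mean excess e(u) = mean(x_i - u | x_i > u) and the number of exceedances."""
    excesses = [xi - u for xi in x if xi > u]
    if not excesses:
        return float("nan"), 0
    return sum(excesses) / len(excesses), len(excesses)

def mean_residual_life(x: List[float], thresholds: List[float]) -> List[Tuple[float, float, int]]:
    """Points (u, e(u), n_u) of the mean residual life plot."""
    return [(u, *mean_excess(x, u)) for u in thresholds]

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
for u, e, n_u in mean_residual_life(data, [0.0, 2.0, 4.0]):
    print(u, e, n_u)   # prints 0.0 3.5 6, then 2.0 2.5 4, then 4.0 1.5 2
```

In practice the number of exceedances $n_u$ is shown alongside $e(u)$, because the plot becomes erratic when only a handful of points lie above the threshold.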

According to Pickands (1975), the distribution of exceedances over sufficiently high thresholds behaves in a stable way, converging to a generalized Pareto distribution. However, this result provides no information about the distribution below the threshold, even though observations both below and above the threshold carry information about the data-generating process (Falk & Michel, 2008).

In extreme value modeling, Nascimento, Bourguignon, and Leão (2016) proposed an extension of the generalized extreme value distribution based on a new baseline function, which is more flexible than the standard generalized extreme value distribution. Papastathopoulos and Tawn (2013) proposed some of the most recent generalizations of the generalized Pareto distribution, extending the GPD through gamma, beta and exponentiated constructions. Their models build on the generated families of distributions developed by Zografos and Balakrishnan (2009). Papastathopoulos and Tawn (2013) noted that "the inclusion of this parameter offers an additional structure for the main body of the distribution, improves the stability of the modified scale, tail index and return level estimates to threshold choice and allows a lower threshold to be selected." However, the distributions developed by Papastathopoulos and Tawn (2013) are complicated: the distribution functions of the beta- and gamma-based models involve the incomplete beta and incomplete gamma functions.

Inference about extremes is usually carried out by selecting observations that exceed a fixed threshold and then fitting the generalized Pareto distribution to the resulting dataset. Recently, models that use the full dataset and do not rely on a fixed threshold have been introduced under the name of extreme value mixture models. Furthermore, extensions of the generalized Pareto distribution have been studied in recent work to allow for further flexibility (Čunderlíková & Bartková, 2018). However, extreme value mixture models that embed these newer Pareto distributions have yet to be defined. This project will define inferential routines for such extended extreme value mixture models and investigate their capability to predict extremes. These methods can be applied to datasets from environmental or financial applications. In particular, this project studies the generalizations of the Pareto distribution given in Nascimento & Pereira (2017).

3.1 Literature Review

Many studies have since shown how Bayesian methods and extreme value theory can be put into practice to determine the threshold. Woolrich, Behrens, Christian, Mark, & Stephen (2004) developed one of the simplest extreme value mixture models. They used a parametric model for the bulk distribution, namely the gamma distribution, and the generalized Pareto model for the tail distribution. This was one of the first extreme value mixture models to capture both the bulk and the tail; however, it did not include a nonparametric component to capture uncertainty in the extreme values.

Further, Carreau & Bengio (2009) developed a model combining the normal distribution and a generalized Pareto model, which they referred to as the hybrid Pareto model. They imposed continuity constraints on the density function and its first derivative at the threshold. This was the first model that attempted to enforce continuity at the threshold, hence creating a smooth connection between the tail distribution and the bulk distribution. However, this model has been reported to perform poorly in practice.

Frigessi, Ola, & Håvard (2002) used a Weibull distribution for the bulk of the data in a dynamically weighted mixture model. Instead of defining the threshold explicitly, they used a Cauchy cumulative distribution function as the weight function connecting the tail and bulk distributions. In their model, the bulk distribution was assumed to be light-tailed, while the generalized Pareto distribution represented the upper tail.

In 2006, Tancredi, Clive, & Anthony developed a mixture model using several uniform distributions to represent the bulk distribution nonparametrically, combined with a generalized Pareto tail. The dimension of the parameter vector changes with the number of uniform density functions, so they estimated the model with a reversible jump Markov chain Monte Carlo algorithm, which handles changes in the dimension of the parameter space. Moreover, because of the semiparametric nature of extreme value mixture models, posterior inference on the parameters is carried out via MCMC or adaptive MCMC machinery (Roberts & Rosenthal, 2006; Brooks et al., 2011), using a block Metropolis-Hastings algorithm.


Behrens et al. (2004) designed an extreme value mixture model that combines a parametric model for the bulk distribution with the GPD for the tail distribution. They used a right-truncated gamma distribution for the bulk, a model that has proved to be among the most flexible and straightforward of the known extreme value mixture models. Along the lines of Behrens et al. (2004), Mendes and Lopes (2004) and Zhao et al. (2010) introduced a mixture model with a normal bulk distribution, in which the two tails are represented by separate threshold models. Zhao et al. (2010) applied their model to finance, in particular to assessing the risks of financial gains and losses. In this model, both the lower and the upper thresholds are estimated along with all other parameters within a Bayesian framework (Mokrani, 2016). Like that of Behrens et al., their model therefore allows uncertainty quantification and automated threshold selection. Finally, the model allows the asymmetry of the loss and gain tails to be tested, by comparing its fit with that of a model in which the two tails share the same shape parameter.

Chapter 4

4.0 Extended Generalized Pareto Distribution Models

This chapter presents the extended generalized Pareto distribution models proposed in this dissertation, which describe both the bulk distribution and the tail. The observations below the threshold are assumed to follow a nonparametric density $h(\cdot \mid \mu, X)$, where $X$ is the observation vector and $\mu$ is the parameter of the density. The excesses above the threshold, termed the upper tail, follow a generalized Pareto distribution $\mathrm{GPD}(\sigma_u, \varepsilon)$. The nonparametric component is regarded as a reasonable approximation to the data-generating distribution below the threshold. Like other extreme value mixture models, the proposed model can therefore be applied to many datasets without a fixed threshold choice, so that threshold uncertainty is fully accounted for (Walshaw, 2014).


Smith (1984) showed that the maximum likelihood estimator does not satisfy the usual regularity conditions if $\varepsilon \in (-1, -0.5)$ and does not exist when $\varepsilon < -1$. Cases with $\varepsilon < -0.5$ are extremely rare in practice. It is also worth noting that the threshold and the scale parameter are related (Papastathopoulos & Tawn, 2013). If the threshold is changed from $\mu$ to $\mu' > \mu$, the new exceedances still follow a generalized Pareto distribution, with the same shape parameter $\varepsilon$ and scale parameter

$$ \sigma' = \sigma + \varepsilon(\mu' - \mu). $$
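The threshold-stability property can be verified numerically. The Python sketch below, written with `u` for the threshold $\mu$ and arbitrary illustrative parameter values, compares the conditional survival above a higher threshold implied by the original GPD with that of a GPD using the updated scale $\sigma'$. It also prints the modified scale $\sigma - \varepsilon\mu$, which does not depend on the threshold; this is the quantity whose stability is mentioned in the quotation from Papastathopoulos and Tawn (2013). The function name is hypothetical.

```python
def gpd_survival(x: float, u: float, sigma: float, eps: float) -> float:
    """Pr(X > x | X > u) for a GPD with threshold u, scale sigma and shape eps != 0."""
    if x <= u:
        return 1.0
    t = 1.0 + eps * (x - u) / sigma
    return t ** (-1.0 / eps) if t > 0 else 0.0

u, sigma, eps = 0.0, 1.0, 0.25
u_new = 2.0
sigma_new = sigma + eps * (u_new - u)       # sigma' = sigma + eps * (u' - u)

checks = []
for x in (2.5, 4.0, 10.0):
    # Survival above u_new implied by the original GPD fitted above u ...
    implied = gpd_survival(x, u, sigma, eps) / gpd_survival(u_new, u, sigma, eps)
    # ... and the survival of a GPD above u_new with the same shape and scale sigma'
    refitted = gpd_survival(x, u_new, sigma_new, eps)
    checks.append((implied, refitted))
    print(x, implied, refitted)

# The modified scale sigma - eps * u is the same for both thresholds
print(sigma - eps * u, sigma_new - eps * u_new)
```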

Extreme value models are used to determine high quantiles $q$ satisfying $\Pr(X > q) = 1 - p$ for large values of $p$. The proposed model also allows the estimation of quantiles above the threshold, since these are functions of the generalized Pareto distribution. Inverting the distribution function, $p = G(q \mid \varepsilon, \sigma, \mu)$, for any probability $p \in (0, 1)$ gives

$$ q = \mu + \frac{\sigma}{\varepsilon}\left[(1-p)^{-\varepsilon} - 1\right]. \tag{12} $$

These quantiles are very important, since they show the value of incorporating the generalized Pareto model. Mixture models are widely used for nonparametric density estimation, although they lead to more complex models that require computationally intensive techniques. According to Diebolt & Christian (1994), mixtures of normal distributions can be used for nonparametric density estimation.

4.1 Dual Gamma GEV

In applications where the data are restricted to positive values, a mixture of gamma distributions is a natural choice for approximating the bulk of the distribution. The mixture model used in this dissertation is denoted $MP_k$; its density is

$$ h(x \mid \theta, P) = \sum_{j=1}^{k} p_j f_G(x \mid \mu_j, \gamma_j), \tag{13} $$

with corresponding distribution function $H$. Here $\theta = (\mu, \gamma)$ contains the gamma parameters, $\mu = (\mu_1, \ldots, \mu_k)$ and $\gamma = (\gamma_1, \ldots, \gamma_k)$, $P = (p_1, \ldots, p_k)$ contains the mixture weights, and $f_G$ is the gamma density, parameterized by its mean $\mu$ and shape $\gamma$:

$$ f_G(x \mid \mu, \gamma) = \frac{(\gamma/\mu)^{\gamma}}{\Gamma(\gamma)} x^{\gamma - 1} \exp\left(-\frac{\gamma}{\mu} x\right), \quad x > 0. \tag{14} $$

The parameters satisfy $\mu_j, \gamma_j > 0$, $p_j \in [0, 1]$ and $\sum_{j=1}^{k} p_j = 1$.
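Equations (13) and (14) translate directly into code. The Python sketch below evaluates the gamma mixture density using the mean-shape parameterization (rate $\gamma/\mu$) and checks numerically that it integrates to one and that its mean equals $\sum_j p_j \mu_j$, which follows because each $\mu_j$ is a component mean. The component values are arbitrary illustrations, and the function names are hypothetical.

```python
import math
from typing import Sequence

def gamma_pdf(x: float, mean: float, shape: float) -> float:
    """Gamma density (14) parameterized by its mean mu and shape gamma (rate = gamma / mu)."""
    if x <= 0:
        return 0.0
    rate = shape / mean
    log_f = (shape * math.log(rate) - math.lgamma(shape)
             + (shape - 1.0) * math.log(x) - rate * x)
    return math.exp(log_f)

def gamma_mixture_pdf(x: float, means: Sequence[float], shapes: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Mixture density h(x | mu, gamma, P) of equation (13)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to one")
    return sum(p * gamma_pdf(x, m, g) for m, g, p in zip(means, shapes, weights))

# Illustrative two-component mixture with component means 1 and 5
means, shapes, weights = [1.0, 5.0], [5.0, 10.0], [0.4, 0.6]
step = 60.0 / 60000                              # midpoint rule on (0, 60]
grid = [(i + 0.5) * step for i in range(60000)]
total = sum(gamma_mixture_pdf(x, means, shapes, weights) for x in grid) * step
mix_mean = sum(x * gamma_mixture_pdf(x, means, shapes, weights) for x in grid) * step
print(f"integral = {total:.6f}, mean = {mix_mean:.4f} (expected 0.4*1 + 0.6*5 = 3.4)")
```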

This shows that mixtures of gamma distributions can be used for density estimation. They can fit the bulk of the data satisfactorily but cannot extrapolate into the tail, where there is little information. Extreme value theory, by contrast, provides precise information about the tail. The model below is therefore designed to overcome such limitations.


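To obtain the proposed model, we build on the above. Let $h$ and $H$ denote the density and distribution function of the $MP_k$ mixture, and let $g$ and $G$ denote the density and distribution function of the generalized Pareto distribution with parameters $\varphi$ (the threshold $u$, scale $\sigma$ and shape $\varepsilon$). The density of the proposed model is then

$$ f(x \mid \theta, P, \varphi) = \begin{cases} h(x \mid \mu, \gamma, P), & x \le u, \\ \left[1 - H(u \mid \mu, \gamma, P)\right] g(x \mid \varphi), & x > u. \end{cases} \tag{15} $$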
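The spliced density (15) can be sketched in Python with only the standard library. Because the `math` module has no incomplete gamma function, the gamma distribution function is computed from the standard power series of the regularized lower incomplete gamma function; a production implementation would more likely call a library routine such as SciPy's `scipy.special.gammainc`. The parameter values in the check are illustrative (an exponential bulk, i.e. a single gamma component with shape 1, and $\varepsilon < 0$ so that the support is bounded), and all function names are hypothetical.

```python
import math

def reg_lower_gamma(a: float, x: float, tol: float = 1e-15, max_terms: int = 10_000) -> float:
    """Regularized lower incomplete gamma P(a, x), summed from its power series."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def gamma_pdf(x, mean, shape):
    """Gamma density (14) with rate shape / mean."""
    if x <= 0:
        return 0.0
    rate = shape / mean
    return math.exp(shape * math.log(rate) - math.lgamma(shape)
                    + (shape - 1.0) * math.log(x) - rate * x)

def gamma_cdf(x, mean, shape):
    return reg_lower_gamma(shape, shape / mean * x)

def gpd_pdf(x, u, sigma, eps):
    """GPD density above the threshold u (eps != 0); zero outside the support."""
    if x <= u:
        return 0.0
    t = 1.0 + eps * (x - u) / sigma
    return t ** (-1.0 / eps - 1.0) / sigma if t > 0 else 0.0

def mgpd_pdf(x, means, shapes, weights, u, sigma, eps):
    """Density (15): gamma mixture up to u, GPD tail weighted by 1 - H(u) above u."""
    if x <= u:
        return sum(p * gamma_pdf(x, m, g) for m, g, p in zip(means, shapes, weights))
    H_u = sum(p * gamma_cdf(u, m, g) for m, g, p in zip(means, shapes, weights))
    return (1.0 - H_u) * gpd_pdf(x, u, sigma, eps)

# Check: with eps = -0.2 the support ends at u - sigma/eps = 6, so f integrates to 1 on (0, 6]
args = ([1.0], [1.0], [1.0], 1.0, 1.0, -0.2)     # exponential bulk, threshold u = 1
step = 6.0 / 60000
total = sum(mgpd_pdf((i + 0.5) * step, *args) for i in range(60000)) * step
print(f"integral of f = {total:.6f}")
```

The factor $1 - H(u)$ is what makes the two pieces add up to a proper density, since the bulk contributes mass $H(u)$ and the GPD integrates to one above $u$.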

Here $H$ is a mixture of gammas. The theorem of Pickands (1975) applies only when $H$ belongs to the domain of attraction of a GEV distribution. The gamma distribution belongs to the maximum domain of attraction of the Gumbel distribution, and so does a finite mixture of gammas, whose upper tail is dominated by its slowest-decaying component. The main advantage of this model is its flexibility: the nonparametric component focuses on the center of the distribution without imposing unimodality, while a parametric model is assumed for the tail because of its theoretical justification. Combining the two gives a semiparametric approach in which the threshold is estimated as a parameter of the model rather than fixed in advance. This divides the sample space into two regimes, the central part and the tail (Phoa, 2016). Because this division is governed by the data, the uncertainty about all components of the model, including the threshold, is accounted for automatically.

The gamma mixture used for the central part of the distribution can be distinguished from the generalized Pareto tail through the likelihood, which makes the threshold identifiable. When the data carry little information about the threshold, inference relies more heavily on its prior distribution. As stated before, obtaining high quantiles of the distribution is essential, and this is another advantage of extended extreme value mixture models. For $p \le H(u \mid \mu, \gamma, P)$, the $p$-quantile $q$ lies below the threshold and is obtained by solving

$$ p = H(q \mid \mu, \gamma, P) = \sum_{j=1}^{k} p_j \int_0^{q} f_G(x \mid \mu_j, \gamma_j)\, dx. \tag{16} $$

This equation must be solved numerically. Another advantage of the joint model is the ease of obtaining high quantiles. For values above the threshold, the distribution function of the mixture model is

$$ F(x \mid \theta, P, \varphi) = H(u \mid \mu, \gamma, P) + \left[1 - H(u \mid \mu, \gamma, P)\right] G(x \mid \varphi), \tag{17} $$

so for $p > H(u \mid \mu, \gamma, P)$ the $p$-quantile is obtained directly as the GPD quantile of level

$$ p^{*} = \frac{p - H(u \mid \mu, \gamma, P)}{1 - H(u \mid \mu, \gamma, P)}. \tag{18} $$

Equation (18) is used to evaluate high quantiles above the threshold, whereas equation (16) is used for lower quantiles below the threshold. The quantile function is a nonlinear function of the model parameters, so its posterior distribution is obtained by approximation techniques, for example by evaluating the quantile at each posterior draw of the parameters. Once this posterior distribution is available for any probability $p$, it provides useful information about the behavior of the extreme values (Walshaw, 2013).
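The two cases of the quantile computation can be combined in one function. The Python sketch below solves (16) by bisection when $p \le H(u)$ and otherwise inverts the GPD at the level $p^*$ of (18). For brevity it takes the bulk distribution function as an argument and, as an assumption for illustration only, uses an exponential distribution function in place of the gamma mixture $H$; the function names are hypothetical. Applying such a function to each posterior draw of the parameters yields draws from the posterior distribution of the quantile.

```python
import math
from typing import Callable

def spliced_quantile(p: float, bulk_cdf: Callable[[float], float], u: float,
                     sigma: float, eps: float, lower: float = 0.0,
                     iters: int = 200) -> float:
    """p-quantile of the spliced distribution (17).

    Below the threshold, H(q) = p is solved by bisection as in (16); above it,
    the GPD is inverted at the rescaled level p* of (18).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")
    H_u = bulk_cdf(u)
    if p <= H_u:
        lo, hi = lower, u                        # H is non-decreasing on [lower, u]
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if bulk_cdf(mid) < p:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)
    p_star = (p - H_u) / (1.0 - H_u)             # equation (18)
    if eps == 0.0:
        return u - sigma * math.log(1.0 - p_star)
    return u + sigma / eps * ((1.0 - p_star) ** (-eps) - 1.0)

def exp_cdf(x: float) -> float:
    """Stand-in bulk distribution H(x) = 1 - exp(-x); in the model this is the gamma mixture."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

q_low = spliced_quantile(0.30, exp_cdf, u=2.0, sigma=1.0, eps=0.2)    # below u
q_high = spliced_quantile(0.99, exp_cdf, u=2.0, sigma=1.0, eps=0.2)   # above u
print(q_low, q_high)
```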

The prior distribution can also be used to ensure identifiability in mixture models, since it imposes restrictions that allow the model parameters to be identified. Authors such as Frühwirth-Schnatter (2001) have used this technique to impose constraints on the means of Gaussian mixture models. In most cases, prior information about the extremes of the data is provided by experts with background knowledge of the data. This is usually not the case for the generalized Pareto distribution, so, following Coles and Tawn (1996), we use the following prior for the generalized Pareto parameters:

$$ \pi(\sigma, \varepsilon) \propto \sigma^{-1} (1 + \varepsilon)^{-1} (1 + 2\varepsilon)^{-1/2} $$

Following the recommendation of Coles and Tawn (1996), this prior is defined for ε > −0.5; values ε < −0.5 are rare in most applications. The prior for the threshold is a normal distribution. Its mean is set at the 90th percentile of the data, its variance is chosen to give a 95% prior interval for the threshold, and the threshold is allowed to range between the 50th and 99th percentiles (Carreau & Bengio, 2009).

According to Woolrich et al. (2004), the prior for the threshold is approximately normal, $N(\mu_u, \sigma_u^2)$. These hyperparameters must be chosen with care, since the prior mean can have a significant influence on the resulting model; the mean should therefore be placed at a high quantile of the data. When there is no knowledge about the parameters of the prior distribution, the prior should still be highly concentrated. It is entirely reasonable for the threshold prior to concentrate on the upper part of the sample, which also avoids the possibility of the threshold taking negative values.

Combining the likelihood with the prior distribution gives the posterior distribution, which summarizes what is known about the parameters after observing the data. In our case, the log-posterior is given in equation (13):

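$$
\begin{aligned}
\log \pi(\theta, P, \varphi \mid X) ={}& Z + \sum_{i:\,x_i \le u} \log\left( \sum_{j=1}^{k} p_j f_G(x_i \mid \mu_j, \gamma_j) \right) + \sum_{i:\,x_i > u} \log\left[ 1 - \sum_{j=1}^{k} p_j F_G(u \mid \mu_j, \gamma_j) \right] \\
&- \sum_{i:\,x_i > u} \left[ \log \sigma + \frac{1 + \varepsilon}{\varepsilon} \log\left( 1 + \frac{\varepsilon (x_i - u)}{\sigma} \right) \right] \\
&+ \sum_{j=1}^{k} \left[ (c_j - 1) \log \gamma_j - d_j \gamma_j - (a_j + 1) \log \mu_j - \frac{b_j}{\mu_j} \right] - \frac{1}{2} \left( \frac{u - \mu_u}{\sigma_u} \right)^{2} \\
&- \log \sigma - \log(1 + \varepsilon) - \frac{1}{2} \log(1 + 2\varepsilon) \qquad (13)
\end{aligned}
$$

as in do Nascimento et al. (2012). Here Z is a normalizing constant, the terms in (c_j, d_j) and (a_j, b_j) are the log-kernels of gamma priors on γ_j and inverse-gamma priors on μ_j, and the last three terms are the logarithm of the prior π(σ, ε).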
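A direct transcription of (13) helps check each term. The sketch below assumes positive data, the mean/shape gamma parametrisation (rate = shape / mean), and hyperparameters supplied in a dictionary `hyper` with keys `a`, `b`, `c`, `d`, `mu_u` and `sigma_u`; these names and the function names are hypothetical. It drops the constant Z and, like (13), contains no prior term for the weights p_j, so it is only suitable for comparing parameter values (for example inside a Metropolis-Hastings step), not as a normalized density.

```python
import math


def gamma_logpdf(x, mean, shape):
    """Log-density of a gamma distribution with the given mean and shape."""
    rate = shape / mean
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)


def gamma_cdf(x, mean, shape):
    """Gamma c.d.f. via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = shape, shape * x / mean
    term = total = 1.0 / a
    for n in range(1, 10_000):
        term *= z / (a + n)
        total += term
        if term < 1e-15 * total:
            break
    return min(1.0, math.exp(a * math.log(z) - z - math.lgamma(a)) * total)


def log_posterior(data, p, mu, gam, u, sigma, eps, hyper):
    """Unnormalized log-posterior (13) without the constant Z; data must be positive."""
    if sigma <= 0 or eps <= -0.5 or u <= 0:
        return -math.inf
    comps = list(zip(p, mu, gam))
    h_u = sum(pj * gamma_cdf(u, mj, gj) for pj, mj, gj in comps)
    lp = 0.0
    for x in data:
        if x <= u:
            # bulk: log of the gamma-mixture density
            lp += math.log(sum(pj * math.exp(gamma_logpdf(x, mj, gj)) for pj, mj, gj in comps))
        else:
            # tail: log[1 - H(u)] plus the GPD log-density
            t = 1.0 + eps * (x - u) / sigma
            if t <= 0 or h_u >= 1.0:
                return -math.inf
            lp += math.log(1.0 - h_u) - math.log(sigma)
            if abs(eps) > 1e-12:
                lp -= (1.0 + eps) / eps * math.log(t)
            else:
                lp -= (x - u) / sigma
    for j in range(len(comps)):
        # gamma(c_j, d_j) prior on gam_j and inverse-gamma(a_j, b_j) prior on mu_j
        lp += (hyper["c"][j] - 1.0) * math.log(gam[j]) - hyper["d"][j] * gam[j]
        lp -= (hyper["a"][j] + 1.0) * math.log(mu[j]) + hyper["b"][j] / mu[j]
    lp -= 0.5 * ((u - hyper["mu_u"]) / hyper["sigma_u"]) ** 2  # normal threshold prior
    lp -= math.log(sigma) + math.log1p(eps) + 0.5 * math.log1p(2.0 * eps)  # pi(sigma, eps)
    return lp


hyper = {"a": [2.0, 2.0], "b": [1.0, 3.0], "c": [2.0, 2.0], "d": [1.0, 1.0],
         "mu_u": 3.0, "sigma_u": 1.0}
data = [0.4, 0.9, 1.3, 2.1, 2.7, 3.4, 4.8, 7.5]
print(log_posterior(data, [0.5, 0.5], [1.0, 2.5], [2.0, 3.0], 3.0, 1.2, 0.2, hyper))
```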

4.2 The Proposed Generalized Pareto Distribution Model

The extreme value models used in this dissertation are based on two density functions that generalize the generalized Pareto distribution (GPD) density: the Marshall-Olkin generalized Pareto distribution (MOGPD) and the half-logistic generalized Pareto distribution (HLGPD).

4.2.1 Marshall-Olkin generalized Pareto distribution


Traditional extreme value analysis faces the problem of choosing a suitable threshold: a threshold that is too high leaves few observations and therefore produces estimates with large variance. It is therefore useful to have more flexible models for the distribution of extreme values. Nascimento (2017) proposes two models for exceedances based on extensions of the GPD. The main idea of these extended generalized Pareto models is to introduce an extra parameter, which reshapes the baseline p.d.f. g, to capture the features of extreme events. The Marshall-Olkin (MO) transformation can generate a wide range of behaviours from a given baseline distribution. It adds flexibility by defining the new MO-generated distribution through its survival function:


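$$ \bar F(x; \delta) = \frac{\delta \bar G(x)}{1 - \bar\delta \bar G(x)} = \frac{\delta \bar G(x)}{G(x) + \delta \bar G(x)}, \qquad x \in \mathcal{X} \subseteq \mathbb{R}, \quad \delta > 0 \qquad (14) $$

where $\bar\delta = 1 - \delta$ and $\bar G(x) = 1 - G(x)$ is the baseline survival function; when δ = 1, $\bar F(x) = \bar G(x)$.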

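The p.d.f. corresponding to (14) is

$$ f(x; \delta) = \frac{\delta\, g(x)}{\left[1 - \bar\delta \bar G(x)\right]^{2}}, \qquad x \in \mathcal{X} \subseteq \mathbb{R}, \quad \delta > 0 $$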
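The Marshall-Olkin construction in (14) is generic in the baseline G, which a higher-order function makes explicit. The sketch below (hypothetical name `marshall_olkin`) returns the survival function and density for any baseline supplied as a pair of Python functions; the unit exponential baseline is used only for illustration.

```python
import math


def marshall_olkin(baseline_pdf, baseline_sf, delta):
    """Return (survival, density) of the Marshall-Olkin transform (14) of a baseline G."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    delta_bar = 1.0 - delta

    def survival(x):
        gbar = baseline_sf(x)
        return delta * gbar / (1.0 - delta_bar * gbar)

    def density(x):
        return delta * baseline_pdf(x) / (1.0 - delta_bar * baseline_sf(x)) ** 2

    return survival, density


def exp_pdf(x):
    """Unit exponential density, used here only as an example baseline."""
    return math.exp(-x)


def exp_sf(x):
    """Unit exponential survival function."""
    return math.exp(-x)


sf, pdf = marshall_olkin(exp_pdf, exp_sf, delta=3.0)
print(sf(1.0), pdf(1.0))
```

With δ = 1 the returned survival function coincides with the baseline, matching the remark after (14).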

The Marshall-Olkin generalized Pareto distribution

Taking the GPD as the baseline and applying the Marshall-Olkin generalization in (14), the c.d.f. of the MOGPD is given in equation (15):


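$$ F(x; u, \sigma, \xi, \delta) = \begin{cases} \dfrac{1 - \left[1 + \xi\left(\frac{x-u}{\sigma}\right)\right]^{-1/\xi}}{1 - \bar\delta \left[1 + \xi\left(\frac{x-u}{\sigma}\right)\right]^{-1/\xi}}, & \xi \neq 0, \\[2ex] \dfrac{1 - \exp\left[-\left(\frac{x-u}{\sigma}\right)\right]}{1 - \bar\delta \exp\left[-\left(\frac{x-u}{\sigma}\right)\right]}, & \xi \to 0, \end{cases} \qquad (15) $$

where x > u for ξ ≥ 0 and $1 + \xi (x-u)/\sigma > 0$ for ξ < 0, with $\bar\delta = 1 - \delta$ and δ > 0.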

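The p.d.f. of the MOGPD is

$$ f(x; u, \sigma, \xi, \delta) = \begin{cases} \dfrac{\delta \left[1 + \xi\left(\frac{x-u}{\sigma}\right)\right]^{-1/\xi - 1}}{\sigma \left\{1 - \bar\delta \left[1 + \xi\left(\frac{x-u}{\sigma}\right)\right]^{-1/\xi}\right\}^{2}}, & \xi \neq 0, \\[2ex] \dfrac{\delta \exp\left[-\left(\frac{x-u}{\sigma}\right)\right]}{\sigma \left\{1 - \bar\delta \exp\left[-\left(\frac{x-u}{\sigma}\right)\right]\right\}^{2}}, & \xi \to 0. \end{cases} \qquad (16) $$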

The formulas above show that the MOGPD includes the GPD as the special case δ = 1. In addition, as δ → 0 the MOGPD converges to a distribution degenerate at 0 (that is, with all mass at zero excess over u), so the parameter δ can be viewed as a degeneration parameter.
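The two limiting cases of the MOGPD can be checked numerically. The sketch below implements (15) and (16) through the GPD survival function; with δ = 1 the c.d.f. matches the GPD, and with δ close to 0 almost all of the mass sits just above u. Function names and parameter values are illustrative only.

```python
import math


def gpd_sf(x, u, sigma, xi):
    """GPD survival function [1 + xi (x - u) / sigma]^(-1/xi), for x > u."""
    z = (x - u) / sigma
    if abs(xi) < 1e-12:
        return math.exp(-z)  # xi -> 0 limit
    t = 1.0 + xi * z
    return t ** (-1.0 / xi) if t > 0 else 0.0  # 0 beyond the endpoint when xi < 0


def mogpd_cdf(x, u, sigma, xi, delta):
    """MOGPD c.d.f. (15): (1 - G_bar) / (1 - delta_bar * G_bar)."""
    if x <= u:
        return 0.0
    gbar = gpd_sf(x, u, sigma, xi)
    return (1.0 - gbar) / (1.0 - (1.0 - delta) * gbar)


def mogpd_pdf(x, u, sigma, xi, delta):
    """MOGPD p.d.f. (16): delta * g / (1 - delta_bar * G_bar)^2."""
    if x <= u:
        return 0.0
    gbar = gpd_sf(x, u, sigma, xi)
    if gbar == 0.0:
        return 0.0
    t = 1.0 if abs(xi) < 1e-12 else 1.0 + xi * (x - u) / sigma
    g = gbar / (sigma * t)  # GPD density (1/sigma) t^(-1/xi - 1)
    return delta * g / (1.0 - (1.0 - delta) * gbar) ** 2


for delta in (1.0, 0.5, 1e-6):
    print(delta, mogpd_cdf(10.5, 10.0, 2.0, 0.3, delta))
```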

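The quantile function of the MOGPD, obtained by inverting (15) for p ∈ (0, 1), is

$$ z_p = \begin{cases} u + \dfrac{\sigma}{\xi}\left[\left(\dfrac{1-p}{1-\bar\delta p}\right)^{-\xi} - 1\right], & \xi \neq 0, \\[2ex] u - \sigma \log\left(\dfrac{1-p}{1-\bar\delta p}\right), & \xi \to 0. \end{cases} \qquad (17) $$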
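Because (17) is in closed form, quantiles (and hence inverse-transform sampling) for the MOGPD need no numerical root finding. The sketch below implements (17) next to a compact version of (15) and checks the round trip F(z_p) = p; the names and parameter values are hypothetical.

```python
import math


def mogpd_cdf(x, u, sigma, xi, delta):
    """MOGPD c.d.f. (15), written via the GPD survival function."""
    if x <= u:
        return 0.0
    z = (x - u) / sigma
    if abs(xi) < 1e-12:
        gbar = math.exp(-z)
    else:
        t = 1.0 + xi * z
        gbar = t ** (-1.0 / xi) if t > 0 else 0.0
    return (1.0 - gbar) / (1.0 - (1.0 - delta) * gbar)


def mogpd_quantile(p, u, sigma, xi, delta):
    """MOGPD quantile function (17) for p in (0, 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")
    ratio = (1.0 - p) / (1.0 - (1.0 - delta) * p)  # value of G_bar at z_p
    if abs(xi) < 1e-12:
        return u - sigma * math.log(ratio)
    return u + sigma / xi * (ratio ** (-xi) - 1.0)


# Round trip: F(z_p) should return p.
for delta in (0.5, 1.0, 2.0):
    z = mogpd_quantile(0.9, u=10.0, sigma=2.0, xi=0.3, delta=delta)
    print(delta, z, mogpd_cdf(z, 10.0, 2.0, 0.3, delta))
```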

4.2.2 Half-logistic generalized Pareto distribution (HLGPD)

Taking the GPD as the baseline and applying the half-logistic generalization, we obtain the following c.d.f.:

$$ F(x; u, \sigma, \xi, \lambda) = \frac{1 - \left[1 + \xi\left(\frac{x-u}{\sigma}\right)\right]^{-\lambda/\xi}}{1 + \left[1 + \xi\left(\frac{x-u}{\sigma}\right)\right]^{-\lambda/\xi}} $$

Here x > u for ξ > 0 and 0 ≤ x − u ≤ −σ/ξ for ξ < 0, and λ > 0 is an extra shape parameter; we also write λ* = λ/σ (Cordeiro, 2016).

The p.d.f. of the half-logistic generalized Pareto distribution is then given by

$$ f(x; u, \sigma, \xi, \lambda) = \frac{2\lambda \left[1 + \xi\left(\frac{x-u}{\sigma}\right)\right]^{-\lambda/\xi - 1}}{\sigma \left\{1 + \left[1 + \xi\left(\frac{x-u}{\sigma}\right)\right]^{-\lambda/\xi}\right\}^{2}} $$

This density is simple in form: unlike beta-generated extensions, it involves no special function such as the beta function. Therefore, the quantile function of the HLGPD is given as follows:

$$ z_p = u + \frac{\sigma}{\xi}\left[\left(\frac{1-p}{1+p}\right)^{-\xi/\lambda} - 1\right], \qquad \xi \neq 0 $$

Where p ∈ (0, 1).
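The HLGPD can be sketched in the same way as the MOGPD. Here we assume that the half-logistic generalization is the type I half-logistic-G construction $F(x) = [1 - \bar G(x)^{\lambda}]/[1 + \bar G(x)^{\lambda}]$ applied to the GPD survival function $\bar G$; this form, the names `hlgpd_cdf` and `hlgpd_quantile`, and the parameter values are assumptions for illustration. The quantile follows by solving $\bar G(z_p)^{\lambda} = (1-p)/(1+p)$, and the branch for ξ → 0 uses the exponential limit of the GPD.

```python
import math


def gpd_sf(x, u, sigma, xi):
    """GPD survival function G_bar(x) for x > u."""
    z = (x - u) / sigma
    if abs(xi) < 1e-12:
        return math.exp(-z)
    t = 1.0 + xi * z
    return t ** (-1.0 / xi) if t > 0 else 0.0


def hlgpd_cdf(x, u, sigma, xi, lam):
    """Assumed HLGPD c.d.f.: (1 - G_bar^lam) / (1 + G_bar^lam)."""
    if x <= u:
        return 0.0
    s = gpd_sf(x, u, sigma, xi) ** lam
    return (1.0 - s) / (1.0 + s)


def hlgpd_quantile(p, u, sigma, xi, lam):
    """Inverse of hlgpd_cdf for p in (0, 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")
    ratio = (1.0 - p) / (1.0 + p)  # value of G_bar(z_p)^lam
    if abs(xi) < 1e-12:
        return u - sigma / lam * math.log(ratio)  # exponential (xi -> 0) limit
    return u + sigma / xi * (ratio ** (-xi / lam) - 1.0)


for lam in (0.5, 1.0, 3.0):
    z = hlgpd_quantile(0.95, u=5.0, sigma=1.5, xi=0.25, lam=lam)
    print(lam, z, hlgpd_cdf(z, 5.0, 1.5, 0.25, lam))
```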


However, the model can be written in terms of only two parameters, ξ* = ξ/σ and λ* = λ/σ, because σ, ξ and λ enter the distribution only through these ratios. For ξ ≠ 0, the rewritten density is the most convenient form for inference:

$$ f(x; u, \xi^{*}, \lambda^{*}) = \frac{2\lambda^{*} \left[1 + \xi^{*}(x-u)\right]^{-\lambda^{*}/\xi^{*} - 1}}{\left\{1 + \left[1 + \xi^{*}(x-u)\right]^{-\lambda^{*}/\xi^{*}}\right\}^{2}}, \qquad \xi \neq 0 $$

It is important to note that the half-logistic generalized Pareto distribution (HLGPD) with λ = 1 is the same as the Marshall-Olkin generalized Pareto distribution with δ = 2. That is, if X follows an HLGPD with λ = 1, then X also follows an MOGPD with δ = 2.
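Under the same assumed half-logistic form $F(x) = [1 - \bar G(x)^{\lambda}]/[1 + \bar G(x)^{\lambda}]$, both remarks above can be checked numerically: with λ = 1 the HLGPD c.d.f. equals the MOGPD c.d.f. with δ = 2, and the c.d.f. depends on (σ, ξ, λ) only through ξ* = ξ/σ and λ* = λ/σ. The sketch below (hypothetical names and parameter values) evaluates both differences on a grid of points above u.

```python
import math


def gpd_sf(x, u, sigma, xi):
    """GPD survival function G_bar(x) for x > u."""
    z = (x - u) / sigma
    if abs(xi) < 1e-12:
        return math.exp(-z)
    t = 1.0 + xi * z
    return t ** (-1.0 / xi) if t > 0 else 0.0


def hlgpd_cdf(x, u, sigma, xi, lam):
    """Assumed HLGPD c.d.f.: (1 - G_bar^lam) / (1 + G_bar^lam)."""
    if x <= u:
        return 0.0
    s = gpd_sf(x, u, sigma, xi) ** lam
    return (1.0 - s) / (1.0 + s)


def mogpd_cdf(x, u, sigma, xi, delta):
    """MOGPD c.d.f. (15)."""
    if x <= u:
        return 0.0
    gbar = gpd_sf(x, u, sigma, xi)
    return (1.0 - gbar) / (1.0 - (1.0 - delta) * gbar)


grid = [5.0 + 0.25 * k for k in range(1, 41)]

# HLGPD with lambda = 1 against MOGPD with delta = 2
max_gap = max(abs(hlgpd_cdf(x, 5.0, 1.5, 0.25, 1.0) - mogpd_cdf(x, 5.0, 1.5, 0.25, 2.0))
              for x in grid)

# Doubling sigma, xi and lambda together keeps xi/sigma and lambda/sigma fixed
max_gap_scaled = max(abs(hlgpd_cdf(x, 5.0, 1.5, 0.25, 2.0) - hlgpd_cdf(x, 5.0, 3.0, 0.5, 4.0))
                     for x in grid)

print(max_gap, max_gap_scaled)
```

Both printed differences should be zero up to floating-point rounding.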


Baek, C., Pipiras, V., Wendt, H., & Abry, P. (2009). Second order properties of distribution tails and estimation of tail exponents in random difference equations. Extremes, 12(4), 361-400. doi:10.1007/s10687-009-0082-x

Behrens, C. N., Lopes, H. F., & Gamerman, D. (2004). Bayesian analysis of extreme events with threshold estimation. Statistical Modelling: An International Journal, 4(3), 227-244. doi:10.1191/1471082x04st075oa

Carreau, J., & Bengio, Y. (2009). A hybrid Pareto model for asymmetric fat-tailed data: The univariate case. Extremes, 12, 53-76.

Coles, S. G. (2001). An introduction to statistical modelling of extreme values. London: Springer.

Diebolt, J., & Robert, C. P. (1994). Estimation of finite mixture distributions through Bayesian sampling. Journal of the Royal Statistical Society: Series B (Methodological), 363-375.


do Nascimento, F. F., Gamerman, D., & Lopes, H. F. (2012). A semiparametric Bayesian approach to extreme value estimation. Statistics and Computing, 661-675.

Finkenstadt, B., & Rootzén, H. (2003). Extreme values in finance, telecommunications, and the

environment. CRC Press.

Fisher, R. A., & Tippett, L. H. C. (1928). Limiting forms of the frequency distribution of the largest or smallest member of a sample. Mathematical Proceedings of the Cambridge Philosophical Society, 24(2), 180-190.

Frigessi, A., Haug, O., & Rue, H. (2002). A dynamic mixture model for unsupervised tail estimation without threshold selection. Extremes, 5(3), 219-235.

Gelman, A., Meng, X., Brooks, S., & Jones, G. L. (2011). Handbook of Markov Chain Monte Carlo.

Boca Raton: CRC Press (Taylor & Francis Group).

Genest, C., & Nešlehová, J. (2014). Copula modeling for extremes. Wiley StatsRef, 34-36.

Mendes, d. M., Lopes, H. F., & Vaz, B. (2004). Data driven estimates for mixtures. Computational Statistics & Data Analysis, 583-598.

Papastathopoulos, I., & Tawn, J. (2013). Extended generalised Pareto models for tail estimation. Journal

of Statistical Planning and Inference, 131-143.

Pickands, J. (1975). Statistical inference using extreme order statistics. The Annals of Statistics, 119-131.

Roberts, G. O., & Rosenthal, J. S. (2009). Examples of adaptive MCMC. Journal of Computational and Graphical Statistics, 18, 349-367.

Tancredi, A., Anderson, C., & O'Hagan, A. (2006). Accounting for threshold uncertainty in extreme value estimation. Extremes, 9(2), 87-89.


Woolrich, M. W., Behrens, T. E. J., Beckmann, C. F., Jenkinson, M., & Smith, S. M. (2004). Multilevel linear modelling for FMRI group analysis using Bayesian inference. NeuroImage, 21, 1732-1747.

Zhao, X., Scarrott, C. J., Oxley, L., & Reale, M. (2011). "Let the tails speak for themselves": Bayesian extreme value mixture modelling for estimating VaR. Submitted. Available from: ∼c.scarrott.

Zografos, K., & Balakrishnan, N. (2009). On families of beta- and generalized gamma-generated distributions and associated inference. Statistical Methodology, 344-362.