Huyen PHAM
Laboratoire de Probabilités et
Modèles Aléatoires
CNRS, UMR 7599
Université Paris 7
e-mail: pham@math.jussieu.fr
and Institut Universitaire de France
Abstract
The area of large deviations is a set of asymptotic estimates on the probabilities of rare events, together with a set of methods to derive such results. The subject had its origins in
the Scandinavian insurance industry, where it was used for risk analysis. Since then,
it has undergone many developments, and large deviations theory is a very active field
in applied probability and statistical mechanics. It also finds important and varied
applications in finance, and has attracted considerable interest in recent years among both
academics and practitioners. Financial applications range from Monte-Carlo
methods and importance sampling in option pricing to estimates of large portfolio
losses subject to credit risk, long term portfolio investment, and implied volatility
asymptotics for stochastic volatility models. The purpose of these lecture notes is to
present some essential results and techniques in large deviations theory, and to review
recent developments in finance.
Key words: large deviations, importance sampling, rare event simulation, exit probability,
small time asymptotics, implied volatilities, credit risk, asymptotic arbitrage, long term
investment.
MSC Classification (2000): 60F10, 62P05, 65C05, 91B28, 91B30.
Lecture notes for the third SMAI European Summer School in Financial Mathematics, Paris, August
2010.
Introduction
The area of large deviations is a set of asymptotic results on rare event probabilities and a
set of methods to derive such results. It goes back to the Scandinavian insurance industry,
where it was used for the evaluation of risk. Large deviations is a very active area in
applied probability, and finds important applications in finance where questions related
to extremal events play an increasingly important role. Large deviations arise in various
financial contexts. They appear naturally in option pricing with Monte-Carlo methods by
importance sampling and simulation of rare events. They also occur in risk management
for the estimation of probabilities of large portfolio losses in credit risk, and in long term
investment. More recently, a growing literature has developed on asymptotics for stochastic volatility
models by methods of large deviations.
Large deviations theory deals with asymptotic estimates of probabilities of rare events
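associated with random processes. These probabilities are exponentially small in the sense:
$$P[A^\varepsilon] = C_\varepsilon \exp\Big(-\frac{I}{\varepsilon}\Big) = \exp\Big(-\frac{I}{\varepsilon} + o(1/\varepsilon)\Big), \tag{1.1}$$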
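$$p_n(f) = \frac{1}{6^n}\,\frac{n!}{n_1!\cdots n_6!}.$$
Let us now look at the limiting behaviour of this distribution when the number $n$ of throws goes to infinity. By Stirling's formula $n! \simeq n^n e^{-n}\sqrt{2\pi n}$, and since $n_1 + \cdots + n_6 = n$, we get
$$p_n(f) \simeq \frac{1}{6^n}\,\frac{n^n\sqrt{2\pi n}}{n_1^{n_1}\cdots n_6^{n_6}\,\sqrt{2\pi n_1}\cdots\sqrt{2\pi n_6}},$$
and so, with $f_i = n_i/n$,
$$\frac{1}{n}\ln p_n(f) \simeq -I(f) := -\sum_{i=1}^{6} f_i \ln\frac{f_i}{1/6}.$$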
$I(f) \geq 0$ is the relative entropy of the a posteriori probability $f = (f_i)_i$ with respect to the a
priori probability $f^0 = (1/6, \ldots, 1/6)$. Hence, $p_n(f) = e^{-nI(f)+o(n)}$. This means that when $n$ is
large, $p_n(f)$ is concentrated where $I(f)$ is minimal. The minimum is attained at $f = f^0$
$= (1/6, \ldots, 1/6)$, where $I(f^0) = 0$: this is the ordinary law of large numbers! For $f \neq f^0$, $I(f)$
$> 0$, and $p_n(f)$ tends to zero exponentially fast. These ideas, concepts and computations in
large deviations (concentration phenomenon, entropy functional minimization, etc.) still
hold in general random contexts, including diffusion processes, but of course need more
sophisticated mathematical treatments.
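As a numerical illustration of the die example, the following minimal Python sketch compares the exact value of $-\frac{1}{n}\ln p_n(f)$ with the relative entropy $I(f)$. The frequency vector `freqs` and the function names are hypothetical choices made for this illustration; the multinomial probability is evaluated through log-gamma functions to avoid overflow.

```python
import math

def log_multinomial_prob(counts):
    """Exact ln p_n(f) for a fair six-sided die, with counts = (n_1, ..., n_6)."""
    n = sum(counts)
    log_coeff = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_coeff - n * math.log(6)

def relative_entropy(freqs):
    """I(f) = sum_i f_i ln(f_i / (1/6)), with the convention 0 ln 0 = 0."""
    return sum(f * math.log(6 * f) for f in freqs if f > 0)

# Hypothetical observed frequencies (they sum to one).
freqs = (0.25, 0.25, 0.2, 0.1, 0.1, 0.1)
rate = relative_entropy(freqs)

for n in (20, 200, 2000, 20000):
    counts = [round(n * f) for f in freqs]
    # The second column should approach the third one as n grows.
    print(n, -log_multinomial_prob(counts) / n, rate)
```

The gap between the two columns is of order $\ln(n)/n$, which is the $o(n)$ correction in the exponent.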
These lecture notes are organized as follows. The next section provides an
overview of large deviations theory. The aim is to present the main results, but also the different approaches and key ingredients for deriving such results. The fundamental theorems
in large deviations are often named after their historical founders. Section
3 is devoted to large deviations for option pricing. We shall address two issues. The first
one is concerned with importance sampling for Monte-Carlo computation of expectations
arising in option pricing. We present two approaches for the determination of a suitable
change of probability measure, both relying on asymptotic results from large deviations.
The second problem is concerned with asymptotics in stochastic volatility models. We focus
in particular on the determination of implied volatilities for small time to maturity by using
methods from large deviations. The last part, in Section 4, deals with large deviations in risk
management. We present some results on the estimation of large portfolio losses in credit
risk. We also address the long term investment problem by considering asymptotic arbitrage,
and portfolio selection with an outperformance probability criterion.
2
2.1
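Let $X$ be a real-valued random variable with probability distribution $\mu$ and cumulant generating function (c.g.f.) $\Gamma(\theta) = \ln E[e^{\theta X}]$, $\theta \in \mathbb{R}$, with domain $D(\Gamma) = \{\theta \in \mathbb{R} : \Gamma(\theta) < \infty\}$. For $\theta \in D(\Gamma)$, we define the probability distribution $\mu_\theta$ by
$$\mu_\theta(dx) = \exp\big(\theta x - \Gamma(\theta)\big)\,\mu(dx). \tag{2.1}$$
Given a sequence $(X_i)$ of i.i.d. random variables with distribution $\mu$, we define the probability measure $P_\theta$ through its likelihood ratio evaluated at
$(X_1, \ldots, X_n)$, $n \in \mathbb{N}^*$, by:
$$\frac{dP_\theta}{dP}(X_1, \ldots, X_n) = \prod_{i=1}^{n} \frac{d\mu_\theta}{d\mu}(X_i) = \exp\Big(\theta \sum_{i=1}^{n} X_i - n\Gamma(\theta)\Big). \tag{2.2}$$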
By denoting $E_\theta$ the corresponding expectation under $P_\theta$, formula (2.2) means that for all
$n \in \mathbb{N}^*$,
$$E\big[f(X_1, \ldots, X_n)\big] = E_\theta\Big[f(X_1, \ldots, X_n) \exp\Big(-\theta\sum_{i=1}^{n} X_i + n\Gamma(\theta)\Big)\Big], \tag{2.3}$$
for all Borel functions $f$ for which the expectation on the l.h.s. of (2.3) is finite. Moreover,
the random variables $X_1, \ldots, X_n$, $n \in \mathbb{N}^*$, are i.i.d. with probability distribution $\mu_\theta$ under
$P_\theta$. Actually, the relation (2.3) extends from a fixed number of steps $n$ to a random number
of steps, provided the random horizon is a stopping time. More precisely, if $\tau$ is a stopping
time in $\mathbb{N}$ for $X_1, \ldots, X_n, \ldots$, i.e. the event $\{\tau \leq n\}$ is measurable with respect to the
$\sigma$-algebra generated by $\{X_1, \ldots, X_n\}$ for all $n$, then
$$E\big[f(X_1, \ldots, X_\tau) 1_{\tau < \infty}\big] = E_\theta\Big[f(X_1, \ldots, X_\tau) \exp\Big(-\theta\sum_{i=1}^{\tau} X_i + \tau\Gamma(\theta)\Big) 1_{\tau < \infty}\Big], \tag{2.4}$$
for all Borel functions $f$ for which the expectation on the l.h.s. of (2.4) is finite.
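The cumulant generating function $\Gamma$ records some useful information on the probability
distributions $\mu_\theta$. For example, $\Gamma'(\theta)$ is the mean of $\mu_\theta$. Indeed, for any $\theta$ in the interior of
$D(\Gamma)$, differentiation yields by dominated convergence:
$$\Gamma'(\theta) = \frac{E[Xe^{\theta X}]}{E[e^{\theta X}]} = E\big[X \exp\big(\theta X - \Gamma(\theta)\big)\big] = E_\theta[X]. \tag{2.5}$$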
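Normal distribution
Let $\mu$ be the normal distribution $N(0, \sigma^2)$, whose c.g.f. is given by:
$$\Gamma(\theta) = \frac{\theta^2\sigma^2}{2}.$$
Under the change of measure
$$\frac{dP_\theta}{dP}(X_1, \ldots, X_n) = \exp\Big(\theta\sum_{i=1}^{n} X_i - n\frac{\theta^2\sigma^2}{2}\Big),$$
the random variables $X_1, \ldots, X_n$ are i.i.d. with normal distribution $N(\theta\sigma^2, \sigma^2)$: the effect
of $P_\theta$ is to change the mean of $X_i$ from $0$ to $\theta\sigma^2$. This result can be interpreted as the
finite-dimensional version of Girsanov's theorem.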
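Exponential distribution
Let $\mu$ be the exponential distribution of intensity $\lambda$. Its c.g.f. is given by
$$\Gamma(\theta) = \begin{cases} \ln\dfrac{\lambda}{\lambda - \theta}, & \theta < \lambda, \\ \infty, & \theta \geq \lambda. \end{cases}$$
A direct calculation shows that for $\theta < \lambda$, $\mu_\theta$ is the exponential distribution
of intensity $\lambda - \theta$. Hence, the effect of the change of probability measure $P_\theta$ is to shift the
intensity from $\lambda$ to $\lambda - \theta$.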
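The exponential case can be checked numerically. The sketch below, written in plain Python with hypothetical function names and illustrative values of $\lambda$ and $\theta$, integrates the tilted density $e^{\theta x - \Gamma(\theta)}\lambda e^{-\lambda x}$ with a midpoint rule and compares its mean with $1/(\lambda - \theta)$, which is also $\Gamma'(\theta)$ as stated in (2.5).

```python
import math

def cgf_exponential(theta, lam):
    """Gamma(theta) = ln(lam / (lam - theta)) for theta < lam, +infinity otherwise."""
    return math.log(lam / (lam - theta)) if theta < lam else math.inf

def tilted_density(x, theta, lam):
    """Density of mu_theta: exp(theta*x - Gamma(theta)) times the Exp(lam) density."""
    return math.exp(theta * x - cgf_exponential(theta, lam)) * lam * math.exp(-lam * x)

def tilted_mean(theta, lam, upper=200.0, steps=200_000):
    """Midpoint-rule approximation of the mean of mu_theta on [0, upper]."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += x * tilted_density(x, theta, lam) * h
    return total

lam, theta = 2.0, 0.5   # illustrative values, with theta < lam
print(tilted_mean(theta, lam), 1.0 / (lam - theta))
```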
2.2 Cramér's theorem
The most classical result in the large deviations area is Cramér's theorem. It concerns
large deviations associated with the empirical mean of i.i.d. random variables valued in
a finite-dimensional space, sometimes called large deviations of level 1. We do not
state Cramér's theorem in full generality. Our purpose is to put emphasis on the
methods used to derive such a result. For simplicity, we consider the case of real-valued i.i.d.
random variables $X_i$ with (nondegenerate) probability distribution $\mu$ of finite mean $EX_1$
$= \int x\,\mu(dx) < \infty$, and we introduce the random walk $S_n = \sum_{i=1}^{n} X_i$. It is well-known by
the law of large numbers that the empirical mean $S_n/n$ converges in probability to $\bar{x} = EX_1$,
i.e. $\lim_n P[S_n/n \in (\bar{x} - \varepsilon, \bar{x} + \varepsilon)] = 1$ for all $\varepsilon > 0$. Notice also, by the central limit
theorem, that $\lim_n P[S_n/n \in [\bar{x}, \bar{x} + \varepsilon)] = 1/2$ for all $\varepsilon > 0$. Large deviations results focus
on asymptotics for probabilities of rare events, for example of the form $P[S_n/n \geq x]$ for $x >$
$EX_1$, and state that
$$P\Big[\frac{S_n}{n} \geq x\Big] \simeq e^{-\gamma n},$$
for some constant $\gamma$ to be made precise later. The symbol $\simeq$ means that the ratio is one in
the log-limit (here when $n$ goes to infinity), i.e. $\frac{1}{n}\ln P[S_n/n \geq x] \to -\gamma$. The rate of
convergence is characterized by the Fenchel-Legendre transform of the c.g.f. $\Gamma$ of $X_1$:
$$\Gamma^*(x) = \sup_{\theta \in \mathbb{R}}\big[\theta x - \Gamma(\theta)\big] \in [0, \infty], \quad x \in \mathbb{R}.$$
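Theorem 2.1 (Cramér's theorem)
For any $x \geq EX_1$, we have
$$\lim_{n\to\infty} \frac{1}{n}\ln P\Big[\frac{S_n}{n} \geq x\Big] = -\Gamma^*(x) = -\inf_{y \geq x}\Gamma^*(y). \tag{2.6}$$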
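Proof. 1) Upper bound. The main step in the upper bound of (2.6) is based on Chebyshev's
inequality combined with the i.i.d. assumption on the $X_i$:
$$P\Big[\frac{S_n}{n} \geq x\Big] = E\big[1_{\frac{S_n}{n} \geq x}\big] \leq E\big[e^{\theta(S_n - nx)}\big] = \exp\big(n\Gamma(\theta) - \theta n x\big), \quad \forall\, \theta \geq 0.$$
By taking the infimum over $\theta \geq 0$, and since $\Gamma^*(x) = \sup_{\theta \geq 0}[\theta x - \Gamma(\theta)]$ for $x \geq EX_1$, we
then obtain
$$P\Big[\frac{S_n}{n} \geq x\Big] \leq \exp\big(-n\Gamma^*(x)\big),$$
and so in particular the upper bound of (2.6).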
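2) Lower bound. Since $P[S_n/n \geq x] \geq P[S_n/n \in [x, x + \varepsilon)]$ for all $\varepsilon > 0$, it suffices to show
that
$$\lim_{\varepsilon \to 0}\liminf_{n \to \infty} \frac{1}{n}\ln P\Big[\frac{S_n}{n} \in [x, x + \varepsilon)\Big] \geq -\Gamma^*(x). \tag{2.7}$$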
For simplicity, we assume that $\mu$ has bounded support, so that $\Gamma$ is finite
everywhere, and that there exists a solution $\theta = \theta(x) > 0$ to the saddle-point equation $\Gamma'(\theta) =$
$x$, i.e. attaining the supremum in $\Gamma^*(x) = \theta(x)x - \Gamma(\theta(x))$. The key step is now to introduce
the new probability distribution $\mu_\theta$ as in (2.1) and $P_\theta$ the corresponding probability measure
on $(\Omega, \mathcal{F})$ with likelihood ratio:
$$\frac{dP_\theta}{dP} = \prod_{i=1}^{n}\frac{d\mu_\theta}{d\mu}(X_i) = \exp\big(\theta S_n - n\Gamma(\theta)\big).$$
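Then, we have for all $\varepsilon > 0$:
$$P\Big[\frac{S_n}{n} \in [x, x+\varepsilon)\Big] = E_\theta\Big[\exp\big(-\theta S_n + n\Gamma(\theta)\big) 1_{\frac{S_n}{n} \in [x, x+\varepsilon)}\Big] \geq e^{-n(\theta x - \Gamma(\theta))}\, e^{-n|\theta|\varepsilon}\, P_\theta\Big[\frac{S_n}{n} \in [x, x+\varepsilon)\Big],$$
and so
$$\frac{1}{n}\ln P\Big[\frac{S_n}{n} \in [x, x+\varepsilon)\Big] \geq -[\theta x - \Gamma(\theta)] - |\theta|\varepsilon + \frac{1}{n}\ln P_\theta\Big[\frac{S_n}{n} \in [x, x+\varepsilon)\Big]. \tag{2.8}$$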
Now, since $\Gamma'(\theta) = x$, we have $E_\theta[X_1] = x$, and by the law of large numbers and the CLT:
$\lim_n P_\theta[S_n/n \in [x, x + \varepsilon)] = 1/2$ $(> 0)$. We also have $\Gamma^*(x) = \theta x - \Gamma(\theta)$. Therefore,
by sending $n$ to infinity and then $\varepsilon$ to zero in (2.8), we get (2.7). Finally, notice that
$\inf_{y \geq x}\Gamma^*(y) = \Gamma^*(x)$ since $\Gamma^*$ is nondecreasing on $[EX_1, \infty)$. $\Box$
Examples
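1) Bernoulli distribution: for $X_1 \sim B(p)$, we have $\Gamma^*(x) = x\ln\frac{x}{p} + (1 - x)\ln\frac{1-x}{1-p}$ for $x \in$
$[0, 1]$ and $\infty$ otherwise.
2) Poisson distribution: for $X_1 \sim P(\lambda)$, we have $\Gamma^*(x) = x\ln\frac{x}{\lambda} + \lambda - x$ for $x \geq 0$ and $\infty$
otherwise.
3) Normal distribution: for $X_1 \sim N(0, \sigma^2)$, we have $\Gamma^*(x) = \frac{x^2}{2\sigma^2}$, $x \in \mathbb{R}$.
4) Exponential distribution: for $X_1 \sim E(\lambda)$, we have $\Gamma^*(x) = \lambda x - 1 - \ln(\lambda x)$ for $x > 0$ and
$\Gamma^*(x) = \infty$ otherwise.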
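These closed-form rate functions can be cross-checked by computing the Fenchel-Legendre transform numerically. The following Python sketch maximizes $\theta \mapsto \theta x - \Gamma(\theta)$ by ternary search, which is legitimate because this map is concave. The parameter values, the search intervals and the function names are assumptions made for the illustration.

```python
import math

def legendre_transform(cgf, x, lo, hi, iters=200):
    """Approximate the sup over theta in [lo, hi] of theta * x - cgf(theta)
    by ternary search on a concave objective."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if m1 * x - cgf(m1) < m2 * x - cgf(m2):
            lo = m1
        else:
            hi = m2
    theta = 0.5 * (lo + hi)
    return theta * x - cgf(theta)

p, lam, sigma = 0.3, 2.0, 1.5   # illustrative parameters

def cgf_bernoulli(t):
    return math.log(1 - p + p * math.exp(t))

def cgf_poisson(t):
    return lam * (math.exp(t) - 1)

def cgf_normal(t):
    return 0.5 * (sigma * t) ** 2

def cgf_exponential(t):
    return math.log(lam / (lam - t))   # only valid for t < lam

# (name, c.g.f., point x, closed-form rate, search interval for theta)
cases = [
    ("Bernoulli", cgf_bernoulli, 0.6,
     0.6 * math.log(0.6 / p) + 0.4 * math.log(0.4 / (1 - p)), (-20.0, 20.0)),
    ("Poisson", cgf_poisson, 3.0,
     3.0 * math.log(3.0 / lam) + lam - 3.0, (-20.0, 20.0)),
    ("Normal", cgf_normal, 1.0, 1.0 / (2 * sigma ** 2), (-20.0, 20.0)),
    ("Exponential", cgf_exponential, 1.0,
     lam * 1.0 - 1 - math.log(lam * 1.0), (-50.0, lam - 1e-9)),
]

for name, cgf, x, closed_form, (lo, hi) in cases:
    print(name, legendre_transform(cgf, x, lo, hi), closed_form)
```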
Remark 2.1 Cramér's theorem possesses a multivariate counterpart dealing with the large
deviations of the empirical means of i.i.d. random vectors in $\mathbb{R}^d$.
Remark 2.2 The independence of the random variables $X_i$ in the large deviations result
for the empirical mean $S_n = \sum_{i=1}^{n} X_i/n$ can be relaxed with the Gärtner-Ellis theorem,
once we get the existence of the limit:
$$\Gamma(\theta) := \lim_{n\to\infty}\frac{1}{n}\ln E\big[e^{n\theta.S_n}\big], \quad \theta \in \mathbb{R}^d.$$
The large deviation principle then holds with a rate function given by the Fenchel-Legendre
transform of $\Gamma$:
$$\Gamma^*(x) = \sup_{\theta \in \mathbb{R}^d}\big[\theta.x - \Gamma(\theta)\big], \quad x \in \mathbb{R}^d, \tag{2.9}$$
provided $\Gamma$ is essentially smooth, i.e. (i) $\Gamma$ is differentiable in the interior of its domain,
assumed to be nonempty, and (ii) $\Gamma$ is steep: $|\Gamma'(\theta_n)| \to \infty$ for any sequence $(\theta_n)$ converging
to a boundary point of the domain. The steepness condition ensures the existence of a saddle-point $\theta(x)$ for any $x \in \mathbb{R}^d$, with a maximum attained in (2.9) for $\theta(x)$.
Remark 2.3 (Relation with importance sampling)
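Fix $n$ and let us consider the estimation of $p_n = P[S_n/n \geq x]$. A standard estimator for
$p_n$ is the average of $N$ independent copies of $X = 1_{S_n/n \geq x}$. However, as shown in the
introduction, for large $n$, $p_n$ is small, and the relative error of this estimator is large. By
using an exponential change of measure $P_\theta$ with likelihood ratio
$$\frac{dP_\theta}{dP} = \exp\big(\theta S_n - n\Gamma(\theta)\big),$$
so that
$$p_n = E_\theta\Big[\exp\big(-\theta S_n + n\Gamma(\theta)\big) 1_{\frac{S_n}{n} \geq x}\Big],$$
we obtain an importance sampling (IS) estimator of $p_n$ by averaging independent replications of $\exp(-\theta S_n + n\Gamma(\theta)) 1_{S_n/n \geq x}$ simulated under $P_\theta$.
The parameter $\theta$ is chosen in order to minimize the variance of this estimator, or equivalently
its second moment, which satisfies for $\theta \geq 0$:
$$M_n^2(\theta, x) = E_\theta\Big[\exp\big(-2\theta S_n + 2n\Gamma(\theta)\big) 1_{\frac{S_n}{n} \geq x}\Big] \leq \exp\big(-2n(\theta x - \Gamma(\theta))\big). \tag{2.10}$$
By noting from the Cauchy-Schwarz inequality that $M_n^2(\theta, x) \geq p_n^2 = (P[S_n/n \geq x])^2 \simeq Ce^{-2n\Gamma^*(x)}$
as $n$ goes to infinity, from Cramér's theorem, we see that the fastest possible exponential
rate of decay of $M_n^2(\theta, x)$ is twice the rate of the probability itself, i.e. $2\Gamma^*(x)$. Hence,
from (2.10), and with the choice of $\theta = \theta_x$ s.t. $\Gamma^*(x) = \theta_x x - \Gamma(\theta_x)$, we get an asymptotically
optimal IS estimator in the sense that:
$$\lim_{n\to\infty}\frac{1}{n}\ln M_n^2(\theta_x, x) = 2\lim_{n\to\infty}\frac{1}{n}\ln p_n.$$
This parameter $\theta_x$ is such that $E_{\theta_x}[S_n/n] = x$, so that the event $\{S_n/n \geq x\}$ is no longer
rare under $P_{\theta_x}$, and is precisely the parameter used in the derivation of the large deviations
result in Cramér's theorem.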
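The importance sampling estimator of this remark can be sketched in a few lines of Python for Gaussian increments, where everything is explicit: $\Gamma(\theta) = \theta^2\sigma^2/2$, $\theta_x = x/\sigma^2$, and the exact value of $p_n$ is available for comparison. The function name `is_estimate`, the seed and the numerical values of $n$, $x$, $\sigma$ are illustrative assumptions, and NumPy is assumed to be installed.

```python
import math
import numpy as np

def is_estimate(n, x, sigma, num_samples, rng):
    """IS estimate of P[S_n / n >= x] for i.i.d. N(0, sigma^2) increments,
    using the exponential tilt theta_x that solves Gamma'(theta) = x."""
    theta = x / sigma**2                    # Gamma'(theta) = theta * sigma^2 = x
    gamma = 0.5 * theta**2 * sigma**2       # Gamma(theta) for N(0, sigma^2)
    # Under P_theta the X_i are i.i.d. N(theta * sigma^2, sigma^2).
    samples = rng.normal(theta * sigma**2, sigma, size=(num_samples, n))
    s_n = samples.sum(axis=1)
    # Likelihood ratio dP/dP_theta times the indicator of the rare event.
    weights = np.exp(-theta * s_n + n * gamma) * (s_n / n >= x)
    return float(weights.mean())

n, x, sigma = 50, 0.5, 1.0                  # illustrative values
rng = np.random.default_rng(0)
estimate = is_estimate(n, x, sigma, 20_000, rng)
# Exact value: S_n / n is N(0, sigma^2 / n) under P.
exact = 0.5 * math.erfc(x * math.sqrt(n) / (sigma * math.sqrt(2.0)))
print(estimate, exact)
```

With these values the target probability is of order $10^{-4}$, so a plain Monte-Carlo average with the same number of samples would see the event only a handful of times, whereas under $P_{\theta_x}$ about half of the samples fall in it.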
2.3
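In this section, we present an approach to large deviations theory based on the Laplace principle,
which consists in the evaluation of the asymptotics of certain expectations.
We first give the formal definition of a large deviation principle (LDP). Consider a
sequence $\{Z^\varepsilon\}_\varepsilon$ on $(\Omega, \mathcal{F}, P)$ valued in some topological space $\mathcal{X}$. The LDP characterizes
the limiting behaviour as $\varepsilon \to 0$ of the family of probability measures $\{P[Z^\varepsilon \in dx]\}_\varepsilon$ on $\mathcal{X}$
in terms of a rate function. A rate function $I$ is a lower semicontinuous function mapping
$I : \mathcal{X} \to [0, \infty]$. It is a good rate function if the level sets $\{x \in \mathcal{X} : I(x) \leq M\}$ are compact
for all $M < \infty$.
The sequence $\{Z^\varepsilon\}_\varepsilon$ satisfies a LDP on $\mathcal{X}$ with rate function $I$ (and speed $\varepsilon$) if:
(i) Upper bound: for any closed subset $F$ of $\mathcal{X}$
$$\limsup_{\varepsilon\to 0}\varepsilon\ln P[Z^\varepsilon \in F] \leq -\inf_{x \in F} I(x).$$
(ii) Lower bound: for any open subset $G$ of $\mathcal{X}$
$$\liminf_{\varepsilon\to 0}\varepsilon\ln P[Z^\varepsilon \in G] \geq -\inf_{x \in G} I(x).$$
If $F$ is a subset of $\mathcal{X}$ such that the infimum of $I$ over the interior of $F$ and over the closure of $F$ coincide, with common value $I_F$, then
$$\lim_{\varepsilon\to 0}\varepsilon\ln P[Z^\varepsilon \in F] = -I_F,$$
which means that $P[Z^\varepsilon \in F] = C_\varepsilon e^{-I_F/\varepsilon}$ where $(C_\varepsilon)$ is a sequence converging at a subexponential rate, i.e. $\varepsilon\ln C_\varepsilon$ tends to zero when $\varepsilon$ goes to zero. The classical Cramér's theorem
considered the case of the empirical mean $Z^\varepsilon = S_n/n$ of i.i.d. random variables in $\mathbb{R}^d$, with
$\varepsilon = 1/n$.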
We first state a basic transformation of LDP, namely a contraction principle, which
yields that the LDP is preserved under continuous mappings.
Theorem 2.2 (Contraction principle)
Suppose that $\{Z^\varepsilon\}_\varepsilon$ satisfies a LDP on $\mathcal{X}$ with a good rate function $I$, and let $f$ be a
continuous mapping from $\mathcal{X}$ to $\mathcal{Y}$. Then $\{f(Z^\varepsilon)\}_\varepsilon$ satisfies a LDP on $\mathcal{Y}$ with the good rate
function
$$J(y) = \inf\big\{I(x) : x \in \mathcal{X},\ y = f(x)\big\}.$$
In particular, when $f$ is a continuous one-to-one mapping, $J = I \circ f^{-1}$.
Proof. Clearly, $J$ is nonnegative. Since $I$ is a good rate function, for all $y \in f(\mathcal{X})$, the
infimum in the definition of $J$ is attained at some point of $\mathcal{X}$. Thus, the level sets of $J$,
$\Psi_J(M) := \{y : J(y) \leq M\}$, are equal to
$$\Psi_J(M) = \{f(x) : I(x) \leq M\} = f(\Psi_I(M)),$$
where $\Psi_I(M) := \{x : I(x) \leq M\}$ is the corresponding level set of $I$. Since $\Psi_I(M)$ is
compact and $f$ is continuous, so are the sets $\Psi_J(M)$, which means that $J$ is a good rate function. Moreover,
by definition of $J$, we have for any $A \subset \mathcal{Y}$:
$$\inf_{y \in A} J(y) = \inf_{x \in f^{-1}(A)} I(x).$$
Since $f$ is continuous, the set $f^{-1}(A)$ is open (resp. closed) for any open (resp. closed) $A$
$\subset \mathcal{Y}$. Therefore, the LDP for $\{f(Z^\varepsilon)\}_\varepsilon$ with rate function $J$ follows as a consequence of the
LDP for $\{Z^\varepsilon\}_\varepsilon$ with rate function $I$. $\Box$
We now provide an equivalent formulation of the large deviation principle, relying on Varadhan's integral formula, which involves the asymptotic behavior of certain expectations. It
extends the well-known method of Laplace for studying the asymptotics of certain integrals
on $\mathbb{R}$: given a continuous function $\varphi$ from $[0, 1]$ into $\mathbb{R}$, Laplace's method states that
$$\lim_{n\to\infty}\frac{1}{n}\ln\int_0^1 e^{n\varphi(x)}\,dx = \max_{x \in [0,1]}\varphi(x).$$
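Laplace's method is easy to observe numerically. The sketch below approximates $\frac{1}{n}\ln\int_0^1 e^{n\varphi(x)}dx$ by a midpoint rule combined with a log-sum-exp shift, for an illustrative function $\varphi(x) = \sin(3x)$ whose maximum on $[0, 1]$ equals $1$. The function names and the choice of $\varphi$ are assumptions made for this example.

```python
import math

def scaled_log_integral(phi, n, steps=100_000):
    """(1/n) * ln of the midpoint-rule approximation of int_0^1 exp(n * phi(x)) dx.
    The maximum exponent is factored out to avoid floating-point overflow."""
    h = 1.0 / steps
    exponents = [n * phi((k + 0.5) * h) for k in range(steps)]
    shift = max(exponents)
    total = sum(math.exp(v - shift) for v in exponents) * h
    return (shift + math.log(total)) / n

def phi(x):
    # Illustrative continuous function; its maximum over [0, 1] is 1, at x = pi / 6.
    return math.sin(3.0 * x)

for n in (1, 10, 100, 1000):
    print(n, scaled_log_integral(phi, n))
```

The printed values increase towards $\max_{[0,1]}\varphi = 1$, with a gap of order $\ln(n)/n$.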
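Varadhan's result is formulated as follows:
Theorem 2.3 (Varadhan)
Suppose that $\{Z^\varepsilon\}_\varepsilon$ satisfies a LDP on $\mathcal{X}$ with good rate function $I$. Then, for any bounded
continuous function $\varphi : \mathcal{X} \to \mathbb{R}$, we have
$$\lim_{\varepsilon\to 0}\varepsilon\ln E\big[e^{\varphi(Z^\varepsilon)/\varepsilon}\big] = \sup_{x \in \mathcal{X}}\big[\varphi(x) - I(x)\big]. \tag{2.11}$$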
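Proof. (a) Since $\varphi$ is bounded, there exists $M \in (0, \infty)$ s.t. $-M \leq \varphi(x) \leq M$ for all $x \in$
$\mathcal{X}$. For $N$ a positive integer, and $j \in \{1, \ldots, N\}$, we consider the closed subsets of $\mathcal{X}$
$$F_{N,j} = \Big\{x \in \mathcal{X} : -M + \frac{2(j-1)M}{N} \leq \varphi(x) \leq -M + \frac{2jM}{N}\Big\},$$
so that $\cup_{j=1}^{N} F_{N,j} = \mathcal{X}$. We then have from the large deviations upper bound on $(Z^\varepsilon)$,
$$\begin{aligned}
\limsup_{\varepsilon\to 0}\varepsilon\ln E\big[e^{\varphi(Z^\varepsilon)/\varepsilon}\big]
&= \limsup_{\varepsilon\to 0}\varepsilon\ln\int_{\mathcal{X}} e^{\varphi(x)/\varepsilon}\,P[Z^\varepsilon \in dx] \\
&\leq \limsup_{\varepsilon\to 0}\varepsilon\ln\sum_{j=1}^{N}\int_{F_{N,j}} e^{\varphi(x)/\varepsilon}\,P[Z^\varepsilon \in dx] \\
&\leq \limsup_{\varepsilon\to 0}\varepsilon\ln\sum_{j=1}^{N} e^{(-M + 2jM/N)/\varepsilon}\,P[Z^\varepsilon \in F_{N,j}] \\
&\leq \max_{j=1,\ldots,N}\Big[-M + \frac{2jM}{N} + \limsup_{\varepsilon\to 0}\varepsilon\ln P[Z^\varepsilon \in F_{N,j}]\Big] \\
&\leq \max_{j=1,\ldots,N}\Big[-M + \frac{2jM}{N} + \sup_{x \in F_{N,j}}[-I(x)]\Big] \\
&\leq \max_{j=1,\ldots,N}\Big[-M + \frac{2jM}{N} + \sup_{x \in F_{N,j}}[\varphi(x) - I(x)] - \inf_{x \in F_{N,j}}\varphi(x)\Big] \\
&\leq \sup_{x \in \mathcal{X}}\big[\varphi(x) - I(x)\big] + \frac{2M}{N}.
\end{aligned}$$
By sending $N$ to infinity, we get the inequality $\limsup_{\varepsilon\to 0}\varepsilon\ln E[e^{\varphi(Z^\varepsilon)/\varepsilon}] \leq \sup_{x \in \mathcal{X}}[\varphi(x) - I(x)]$.
(b) To prove the reverse inequality, we fix an arbitrary point $x_0 \in \mathcal{X}$ and $\delta > 0$, and we consider the open set $G = \{x \in \mathcal{X} : \varphi(x) > \varphi(x_0) - \delta\}$, which contains $x_0$. We then have from the large deviations lower bound on $(Z^\varepsilon)$,
$$\begin{aligned}
\liminf_{\varepsilon\to 0}\varepsilon\ln E\big[e^{\varphi(Z^\varepsilon)/\varepsilon}\big]
&\geq \liminf_{\varepsilon\to 0}\varepsilon\ln E\big[e^{\varphi(Z^\varepsilon)/\varepsilon} 1_{Z^\varepsilon \in G}\big] \\
&\geq \varphi(x_0) - \delta + \liminf_{\varepsilon\to 0}\varepsilon\ln P[Z^\varepsilon \in G] \\
&\geq \varphi(x_0) - \delta - \inf_{x \in G} I(x) \\
&\geq \varphi(x_0) - I(x_0) - \delta.
\end{aligned}$$
Since $x_0 \in \mathcal{X}$ and $\delta > 0$ are arbitrary, we get the required lower bound $\sup_{x_0 \in \mathcal{X}}[\varphi(x_0) - I(x_0)]$. $\Box$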
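Remark 2.4 The relation (2.11) has the following interpretation. By writing formally the
LDP for $(Z^\varepsilon)$ with rate function $I$ as $P[Z^\varepsilon \in dx] \simeq e^{-I(x)/\varepsilon}dx$, we can write
$$E\big[e^{\varphi(Z^\varepsilon)/\varepsilon}\big] = \int e^{\varphi(x)/\varepsilon}\,P[Z^\varepsilon \in dx] \simeq \int e^{(\varphi(x) - I(x))/\varepsilon}\,dx \simeq C_\varepsilon\exp\Big(\frac{\sup_{x \in \mathcal{X}}(\varphi(x) - I(x))}{\varepsilon}\Big).$$
As in Laplace's method, Varadhan's formula states that to exponential order, the main
contribution to the integral is due to the largest value of the exponent.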
When (2.11) holds, we say that the sequence $(Z^\varepsilon)$ satisfies a Laplace principle on $\mathcal{X}$
with rate function $I$. Hence, Theorem 2.3 means that the large deviation principle implies
the Laplace principle. The next result proves the converse.
Theorem 2.4 The Laplace principle implies the large deviation principle with the same
good rate function. More precisely, if $I$ is a good rate function on $\mathcal{X}$ and the limit
$$\lim_{\varepsilon\to 0}\varepsilon\ln E\big[e^{\varphi(Z^\varepsilon)/\varepsilon}\big] = \sup_{x \in \mathcal{X}}\big[\varphi(x) - I(x)\big]$$
is valid for all bounded continuous functions $\varphi$, then $(Z^\varepsilon)$ satisfies a large deviation principle
on $\mathcal{X}$ with rate function $I$.
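Proof. (a) We first prove the large deviation upper bound. Given a closed set $F$ of $\mathcal{X}$, we
define the nonnegative function: $\psi(x) = 0$ if $x \in F$, and $\infty$ otherwise. Let $d(x, F)$ denote
the distance from $x$ to $F$, and for $n \in \mathbb{N}$, define
$$\psi_n(x) = n\big(d(x, F) \wedge 1\big).$$
Then, $\psi_n$ is a bounded continuous function and $\psi_n \nearrow \psi$ as $n$ goes to infinity. Hence,
$$\varepsilon\ln P[Z^\varepsilon \in F] = \varepsilon\ln E[\exp(-\psi(Z^\varepsilon)/\varepsilon)] \leq \varepsilon\ln E[\exp(-\psi_n(Z^\varepsilon)/\varepsilon)],$$
and so from the Laplace principle
$$\limsup_{\varepsilon\to 0}\varepsilon\ln P[Z^\varepsilon \in F] \leq \limsup_{\varepsilon\to 0}\varepsilon\ln E[\exp(-\psi_n(Z^\varepsilon)/\varepsilon)] = \sup_{x \in \mathcal{X}}\big[-\psi_n(x) - I(x)\big] = -\inf_{x \in \mathcal{X}}\big[\psi_n(x) + I(x)\big].$$
The proof of the large deviation upper bound is then completed once we show that
$$\lim_{n\to\infty}\inf_{x \in \mathcal{X}}\big[\psi_n(x) + I(x)\big] = \inf_{x \in F} I(x),$$
a fact which relies on the compactness of the level sets of $I$.
(b) We now consider the large deviation lower bound. Let $G$ be an open set of $\mathcal{X}$. If $I_G := \inf_{x \in G} I(x) = \infty$, there is nothing to prove, so we may assume that $I_G < \infty$. Take $x \in G$ with $I(x) < \infty$, and choose $M > I(x)$ and $\delta > 0$ such that $B(x, \delta) \subset G$. We define the function
$$\psi(y) = M\Big(\frac{d(x, y)}{\delta} \wedge 1\Big),$$
and observe that $\psi$ is bounded, continuous, nonnegative, and satisfies: $\psi(x) = 0$, $\psi(y) =$
$M$ for $y \notin B(x, \delta)$. We then have
$$E[\exp(-\psi(Z^\varepsilon)/\varepsilon)] \leq e^{-M/\varepsilon}P[Z^\varepsilon \notin B(x, \delta)] + P[Z^\varepsilon \in B(x, \delta)] \leq e^{-M/\varepsilon} + P[Z^\varepsilon \in B(x, \delta)],$$
and so
$$\max\Big\{\liminf_{\varepsilon\to 0}\varepsilon\ln P[Z^\varepsilon \in B(x, \delta)],\ -M\Big\} \geq \liminf_{\varepsilon\to 0}\varepsilon\ln E[\exp(-\psi(Z^\varepsilon)/\varepsilon)] = \sup_{y \in \mathcal{X}}\big[-\psi(y) - I(y)\big] \geq -\psi(x) - I(x) = -I(x).$$
Since $-M < -I(x)$, and $B(x, \delta) \subset G$, it follows that
$$\liminf_{\varepsilon\to 0}\varepsilon\ln P[Z^\varepsilon \in G] \geq \liminf_{\varepsilon\to 0}\varepsilon\ln P[Z^\varepsilon \in B(x, \delta)] \geq -I(x),$$
and thus, $x$ being arbitrary in $G$,
$$\liminf_{\varepsilon\to 0}\varepsilon\ln P[Z^\varepsilon \in G] \geq -\inf_{x \in G} I(x) = -I_G,$$
which is the required lower bound. $\Box$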
We next show how one can evaluate expectations arising in Laplace principles, which
can then be used to derive the large deviation principle.
2.4
The relative entropy plays a key role in the determination of the rate function. We are
given a topological space $S$, and we denote by $\mathcal{P}(S)$ the set of probability measures on $S$
equipped with its Borel $\sigma$-field.
For $\nu \in \mathcal{P}(S)$, the relative entropy $R(.|\nu)$ is a mapping from $\mathcal{P}(S)$ into the extended real line $\bar{\mathbb{R}}$,
defined by
$$R(\mu|\nu) = \int_S \ln\frac{d\mu}{d\nu}\,d\mu = \int_S \frac{d\mu}{d\nu}\ln\frac{d\mu}{d\nu}\,d\nu,$$
whenever $\mu \in \mathcal{P}(S)$ is absolutely continuous with respect to $\nu$, and we set $R(\mu|\nu) = \infty$
otherwise. By observing that $s\ln s \geq s - 1$ with equality if and only if $s = 1$, we see that
$R(\mu|\nu) \geq 0$, and $R(\mu|\nu) = 0$ if and only if $\mu = \nu$.
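The relative entropy arises in the expectation in the Laplace principle via the following
variational formula.
Proposition 2.1 Let $\varphi$ be a bounded measurable function on $S$ and $\nu$ a probability measure
on $S$. Then,
$$\ln\int_S e^{\varphi}\,d\nu = \sup_{\mu \in \mathcal{P}(S)}\Big[\int_S \varphi\,d\mu - R(\mu|\nu)\Big], \tag{2.12}$$
and the supremum is attained uniquely by the probability measure $\mu_0$ defined by
$$\frac{d\mu_0}{d\nu} = \frac{e^{\varphi}}{\int_S e^{\varphi}\,d\nu}.$$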
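Proof. In the supremum in (2.12), we may restrict to $\mu \in \mathcal{P}(S)$ with finite relative entropy:
$R(\mu|\nu) < \infty$. If $R(\mu|\nu) < \infty$, then $\mu$ is absolutely continuous with respect to $\nu$, and since
$\nu$ is equivalent to $\mu_0$, $\mu$ is also absolutely continuous with respect to $\mu_0$. Thus,
$$\begin{aligned}
\int_S \varphi\,d\mu - R(\mu|\nu) &= \int_S \varphi\,d\mu - \int_S \ln\frac{d\mu}{d\nu}\,d\mu \\
&= \int_S \varphi\,d\mu - \int_S \ln\frac{d\mu}{d\mu_0}\,d\mu - \int_S \ln\frac{d\mu_0}{d\nu}\,d\mu \\
&= \ln\int_S e^{\varphi}\,d\nu - R(\mu|\mu_0).
\end{aligned}$$
We conclude by using the fact that $R(\mu|\mu_0) \geq 0$ and $R(\mu|\mu_0) = 0$ if and only if $\mu = \mu_0$. $\Box$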
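On a finite state space the variational formula (2.12) can be verified directly. The Python sketch below draws a hypothetical reference measure $\nu$ and a bounded function $\varphi$ on five points, checks that the tilted measure $\mu_0 \propto e^{\varphi}\nu$ attains $\ln\int e^{\varphi}d\nu$, and checks that randomly drawn candidate measures never exceed it. All names and the random inputs are illustrative assumptions, and NumPy is assumed to be installed.

```python
import numpy as np

def relative_entropy(mu, nu):
    """R(mu|nu) on a finite set, assuming nu > 0 everywhere (convention 0 ln 0 = 0)."""
    mask = mu > 0
    return float(np.sum(mu[mask] * np.log(mu[mask] / nu[mask])))

rng = np.random.default_rng(1)
nu = rng.dirichlet(np.ones(5))           # hypothetical reference probability measure
phi = rng.uniform(-1.0, 1.0, size=5)     # hypothetical bounded function

def gain(mu):
    """The functional inside the supremum of (2.12)."""
    return float(phi @ mu) - relative_entropy(mu, nu)

lhs = float(np.log(np.sum(np.exp(phi) * nu)))
mu0 = np.exp(phi) * nu / np.sum(np.exp(phi) * nu)   # maximizer of (2.12)

candidates = rng.dirichlet(np.ones(5), size=1000)
best_random = max(gain(mu) for mu in candidates)
print(lhs, gain(mu0), best_random)
```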
The dual formula to the variational formula (2.12) is known as the Donsker-Varadhan
variational formula. We denote by $B(S)$ the set of bounded measurable functions on $S$.
Proposition 2.2 (Donsker-Varadhan variational formula)
For all $\mu, \nu \in \mathcal{P}(S)$, we have
$$R(\mu|\nu) = \sup_{\varphi \in B(S)}\Big[\int_S \varphi\,d\mu - \ln\int_S e^{\varphi}\,d\nu\Big]. \tag{2.13}$$
Proof. We denote by $H(\mu, \nu)$ the r.h.s. term in (2.13). By taking the zero function $\varphi \equiv 0$ on $S$,
we observe that $H(\mu, \nu) \geq 0$. From (2.12), we have for any $\varphi \in B(S)$:
$$R(\mu|\nu) \geq \int_S \varphi\,d\mu - \ln\int_S e^{\varphi}\,d\nu,$$
and so by taking the supremum over $\varphi$: $R(\mu|\nu) \geq H(\mu, \nu)$. To prove the converse inequality,
we may assume w.l.o.g. that $H(\mu, \nu) < \infty$. We first show that under this condition $\mu$ is
absolutely continuous with respect to $\nu$. Let $A$ be a Borel set for which $\nu(A) = 0$. Consider
for any $n > 0$ the function $\varphi_n = n1_A \in B(S)$, so that by definition of $H$:
$$\infty > H(\mu, \nu) \geq \int_S \varphi_n\,d\mu - \ln\int_S e^{\varphi_n}\,d\nu = n\mu(A).$$
Taking $n$ to infinity, we get $\mu(A) = 0$, and thus $\mu \ll \nu$. We define $f = d\mu/d\nu$ the Radon-Nikodym derivative of $\mu$ with respect to $\nu$. If $f$ is uniformly positive and bounded, then
$\varphi = \ln f$ lies in $B(S)$, and we get by definition of $H$:
$$H(\mu, \nu) \geq \int_S \varphi\,d\mu - \ln\int_S e^{\varphi}\,d\nu = \int_S \ln\frac{d\mu}{d\nu}\,d\mu = R(\mu|\nu),$$
which is the desired inequality. If $f$ is uniformly positive but not bounded, we set $f_n =$
$f \wedge n$, $\varphi_n = \ln f_n \in B(S)$, and use the inequality $H(\mu, \nu) \geq \int_S \varphi_n\,d\mu - \ln\int_S e^{\varphi_n}\,d\nu$. By
sending $n$ to infinity and using the monotone convergence theorem, we get the required
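inequality. In the general case, we consider for $\varepsilon \in (0, 1)$ the probability measure $\mu_\varepsilon = (1 - \varepsilon)\mu + \varepsilon\nu$, whose density with respect to $\nu$ is uniformly positive, so that $R(\mu_\varepsilon|\nu) \leq H(\mu_\varepsilon, \nu)$ from the previous case. We then conclude by showing that
$$\lim_{\varepsilon\to 0} R(\mu_\varepsilon|\nu) = R(\mu|\nu) \quad\text{and}\quad \lim_{\varepsilon\to 0} H(\mu_\varepsilon, \nu) = H(\mu, \nu).$$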
This is achieved by using convexity arguments, and we refer to [15] for the details.
The expression (2.13) of the relative entropy is useful, in particular, to show that for
fixed $\nu \in \mathcal{P}(S)$, the function $R(.|\nu)$ is a good rate function.
2.5 Sanov's theorem
We deal now with large deviations of level 2, which concern random measures. In particular,
Sanov's theorem addresses large deviations associated with the empirical measure of i.i.d.
random variables. In this paragraph, we show how one can derive this large deviation result
by means of the Laplace principle. Let $(X_n)$ be a sequence of i.i.d. random variables valued
in some Polish space $S$, with common probability distribution $\rho$. We introduce the
corresponding sequence $(L_n)$ of empirical measures valued in $\mathcal{P}(S)$ by:
$$L_n = \frac{1}{n}\sum_{j=0}^{n-1}\delta_{X_j},$$
where $\delta_x$ is the Dirac measure at $x \in S$. The law of large numbers essentially implies the
weak convergence of $L_n$ to $\rho$. The next stage is Sanov's theorem, which states a large
deviation principle for $L_n$.
Theorem 2.5 (Sanov)
The sequence of empirical measures $(L_n)_n$ satisfies a large deviation principle with good
rate function the relative entropy $R(.|\rho)$.
The purpose of this paragraph is to sketch the arguments for deriving Sanov's theorem
by using the Laplace principle. This entails calculating the asymptotic behavior of the
following expectations:
$$V^n := \frac{1}{n}\ln E\big[\exp(n\varphi(L_n))\big], \tag{2.14}$$
where $\varphi$ is any bounded continuous function mapping $\mathcal{P}(S)$ into $\mathbb{R}$. The main issue is
to obtain a representation in the form (2.11), and a key step is to express $V^n$ as the gain
function of an associated stochastic control problem by using the variational formula (2.12).
In the sequel, $\varphi$ is fixed. We introduce a sequence of random subprobability measures
related to the empirical measures as follows. For $t \in [0, 1]$, we denote by $\mathcal{M}_t(S)$ the set of
measures on $S$ with total mass equal to $t$. Fix $n \in \mathbb{N}^*$, and for $i = 0, \ldots, n - 1$, we define
$L^n_0 = 0$, and
$$L^n_{i+1} = L^n_i + \frac{1}{n}\delta_{X_i},$$
so that $L^n_n$ equals the empirical measure $L_n$, and $L^n_i$ is valued in $\mathcal{M}_{i/n}(S)$. We also introduce, for each $i = 0, \ldots, n$, and $\mu \in \mathcal{M}_{i/n}(S)$, the function
$$V^n(i, \mu) = \frac{1}{n}\ln E_{i,\mu}\big[\exp(n\varphi(L^n_n))\big],$$
where $E_{i,\mu}$ denotes the expectation conditioned on $L^n_i = \mu$. Thus, $V^n(0, 0) = V^n$ defined
in (2.14), and $V^n(n, \mu) = \varphi(\mu)$. In order to obtain a representation formula for $V^n$, we
first derive a recursive equation relating $V^n(i, .)$ and $V^n(i + 1, .)$ that we interpret as the
dynamic programming equation of a stochastic control problem.
Recalling that the random variables $X_i$ are i.i.d. with common distribution $\rho$, we
see that the random measures $\{L^n_i, i = 0, \ldots, n\}$ form a Markov chain on state spaces
$\{\mathcal{M}_{i/n}(S), i = 0, \ldots, n\}$ with probability transition:
$$P[L^n_{i+1} \in A \,|\, L^n_i = \mu] = P\Big[\mu + \frac{1}{n}\delta_{X_i} \in A\Big] = \int_S 1_A\Big(\mu + \frac{1}{n}\delta_y\Big)\rho(dy).$$
We then obtain by the law of iterated conditional expectations and the Markov property:
$$\begin{aligned}
V^n(i, \mu) &= \frac{1}{n}\ln E_{i,\mu}\Big[E_{i+1, L^n_{i+1}}\big[\exp(n\varphi(L^n_n))\big]\Big] \\
&= \frac{1}{n}\ln E_{i,\mu}\Big[\exp\big(nV^n(i+1, L^n_{i+1})\big)\Big] \\
&= \frac{1}{n}\ln\int_S \exp\Big(nV^n\big(i+1, \mu + \frac{1}{n}\delta_y\big)\Big)\rho(dy).
\end{aligned} \tag{2.15}$$
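The relation (2.15) is the dynamic programming equation for the following stochastic control
problem. The controlled process is a Markov chain $\{\bar{L}^n_i, i = 0, \ldots, n\}$ starting from $\bar{L}^n_0 =$
$0$, with controlled probability transitions:
$$P[\bar{L}^n_{i+1} \in A \,|\, \bar{L}^n_i = \mu] = \int_S 1_A\Big(\mu + \frac{1}{n}\delta_y\Big)\nu_i(dy),$$
where $\{\nu_i, i = 0, \ldots, n\}$ is the control process valued in $\mathcal{P}(S)$, of feedback type, i.e. for each
$i$, the decision $\nu_i$ depends on $\bar{L}^n_i$. The running gain is $-\frac{1}{n}R(\nu|\rho)$, and the terminal gain is
$\varphi$. Indeed, applying the variational formula (2.12) to the function $y \mapsto nV^n(i+1, \mu + \frac{1}{n}\delta_y)$ rewrites (2.15) as
$$V^n(i, \mu) = \sup_{\nu \in \mathcal{P}(S)}\Big[\int_S V^n\Big(i+1, \mu + \frac{1}{n}\delta_y\Big)\nu(dy) - \frac{1}{n}R(\nu|\rho)\Big].$$
We deduce that
$$V^n = V^n(0, 0) = \sup_{(\nu_i)} E\Big[\varphi(\bar{L}^n_n) - \frac{1}{n}\sum_{i=0}^{n-1} R(\nu_i|\rho)\Big]. \tag{2.16}$$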
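Fix some arbitrary $\nu \in \mathcal{P}(S)$, and consider the constant control $\nu_i = \nu$. With this
choice, $\bar{L}^n_n$ is the empirical measure of i.i.d. random variables having common distribution
$\nu$, and the representation (2.16) yields
$$V^n \geq E[\varphi(\bar{L}^n_n)] - R(\nu|\rho).$$
Moreover, since $\bar{L}^n_n$ converges weakly to $\nu$, we have by the dominated convergence theorem:
$$\lim_{n\to\infty} E[\varphi(\bar{L}^n_n)] = \varphi(\nu),$$
and so, $\nu$ being arbitrary, $\liminf_n V^n \geq \sup_{\nu \in \mathcal{P}(S)}[\varphi(\nu) - R(\nu|\rho)]$.
The reverse inequality requires more technical details (see [15]), and
we finally get
$$\lim_{n\to\infty} V^n = \sup_{\nu \in \mathcal{P}(S)}\big[\varphi(\nu) - R(\nu|\rho)\big].$$
This implies that $L_n$ satisfies the Laplace principle, and thus the large deviation principle
with the good rate function $R(.|\rho)$ on $\mathcal{P}(S)$.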
Remark 2.5 There are extensions of Sanov's theorem on LDP for the empirical measure of
Markov chains and occupation times of continuous-time Markov processes. The main references are the seminal works by Donsker and Varadhan [11], [12], [13]. Consider an ergodic
Feller-Markov process $X$ valued in $S$, with generator $L$ and invariant measure $\mu$. Under
some conditions, the occupation measure $L_t = \frac{1}{t}\int_0^t \delta_{X_s}ds$ converges to $\mu$ exponentially fast
with a Donsker-Varadhan rate function given by
$$I(\nu) = \sup_{u \in D^+(L)}\Big[-\int \frac{Lu}{u}\,d\nu\Big], \quad \nu \in \mathcal{P}(S),$$
where $D^+(L)$ is the space of positive functions in the domain of $L$. For a one-dimensional
diffusion with generator $Lu = \frac{1}{2}\sigma^2 u_{xx} + bu_x$, the rate function may be written in terms of the
Dirichlet form:
$$I(\nu) = \begin{cases} \mathcal{E}(\sqrt{f}, \sqrt{f}) := \dfrac{1}{2}\displaystyle\int \sigma^2\Big(\frac{\partial\sqrt{f}}{\partial x}\Big)^2\mu(dx), & \text{if } \nu \ll \mu \text{ with } f = \dfrac{d\nu}{d\mu}, \\ \infty, & \text{otherwise.} \end{cases}$$
This Donsker-Varadhan large deviations result is often formulated via the Laplace principle
in terms of asymptotic evaluation of Markov process expectations for large time: for any
bounded continuous function $\varphi$ on $S$,
$$\lim_{t\to\infty}\frac{1}{t}\ln E\Big[\exp\Big(\int_0^t \varphi(X_s)\,ds\Big)\Big] = \sup_{\nu \in \mathcal{P}(S)}\Big[\int \varphi\,d\nu - I(\nu)\Big].$$
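2.6 Freidlin-Wentzell theory
In many problems, the interest is in rare events that depend on a random process, and the
corresponding asymptotic probabilities, usually called sample path large deviations or large
deviations of level 3, were developed by Freidlin-Wentzell and Donsker-Varadhan.
The first example is known as Schilder's theorem, and concerns large deviations for
the small-noise Brownian motion $\sqrt{\varepsilon}W$, where $W = (W_t)_{0 \leq t \leq T}$ is a Brownian motion: as $\varepsilon$ goes to zero,
$(\sqrt{\varepsilon}W)_\varepsilon$ satisfies a large deviation principle on $C([0, T])$ with rate function, also called
action functional:
$$I(h) = \begin{cases} \dfrac{1}{2}\displaystyle\int_0^T |\dot{h}(t)|^2\,dt, & \text{if } h \in H_0([0, T]) := \{h \in H([0, T]) : h(0) = 0\}, \\ \infty, & \text{otherwise,} \end{cases}$$
where $H([0, T])$ denotes the space of absolutely continuous functions on $[0, T]$ with square-integrable derivative $\dot{h}$.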
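Let us show the lower bound of this LDP. Consider $G$ a nonempty open set of $C([0, T])$,
$h \in G$, and $\delta > 0$ s.t. $B(h, \delta) \subset G$. We want to prove that
$$\liminf_{\varepsilon\to 0}\varepsilon\ln P[\sqrt{\varepsilon}W \in B(h, \delta)] \geq -I(h).$$
For $h \notin H_0([0, T])$, this inequality is trivial since $I(h) = \infty$. Suppose now $h \in H_0([0, T])$,
and consider the probability measure:
$$\frac{dQ^h}{dP} = \exp\Big(\int_0^T \frac{\dot{h}(t)}{\sqrt{\varepsilon}}\,dW_t - \frac{1}{2\varepsilon}\int_0^T |\dot{h}(t)|^2\,dt\Big),$$
so that by the Cameron-Martin theorem, $W^h := W - h/\sqrt{\varepsilon}$ is a Brownian motion under $Q^h$. Then, writing $|.|$ for the sup norm on $C([0, T])$, we have
$$\begin{aligned}
P[\sqrt{\varepsilon}W \in B(h, \delta)] &= P\Big[|W^h| < \frac{\delta}{\sqrt{\varepsilon}}\Big] \\
&= E^{Q^h}\Big[\exp\Big(-\int_0^T \frac{\dot{h}(t)}{\sqrt{\varepsilon}}\,dW_t + \frac{1}{2\varepsilon}\int_0^T |\dot{h}(t)|^2\,dt\Big) 1_{|W^h| < \delta/\sqrt{\varepsilon}}\Big] \\
&= E^{Q^h}\Big[\exp\Big(-\int_0^T \frac{\dot{h}(t)}{\sqrt{\varepsilon}}\,dW^h_t - \frac{1}{2\varepsilon}\int_0^T |\dot{h}(t)|^2\,dt\Big) 1_{|W^h| < \delta/\sqrt{\varepsilon}}\Big] \\
&= E\Big[\exp\Big(-\int_0^T \frac{\dot{h}(t)}{\sqrt{\varepsilon}}\,dW_t - \frac{1}{2\varepsilon}\int_0^T |\dot{h}(t)|^2\,dt\Big) 1_{|W| < \delta/\sqrt{\varepsilon}}\Big] \quad (W^h \text{ is a } Q^h\text{-BM}) \\
&= E\Big[\exp\Big(+\int_0^T \frac{\dot{h}(t)}{\sqrt{\varepsilon}}\,dW_t - \frac{1}{2\varepsilon}\int_0^T |\dot{h}(t)|^2\,dt\Big) 1_{|W| < \delta/\sqrt{\varepsilon}}\Big] \quad (W \sim -W) \\
&= E\Big[\cosh\Big(\int_0^T \frac{\dot{h}(t)}{\sqrt{\varepsilon}}\,dW_t\Big) 1_{|W| < \delta/\sqrt{\varepsilon}}\Big]\exp\Big(-\frac{1}{2\varepsilon}\int_0^T |\dot{h}(t)|^2\,dt\Big) \\
&\geq \exp\Big(-\frac{1}{2\varepsilon}\int_0^T |\dot{h}(t)|^2\,dt\Big)\,P\Big[|W| < \frac{\delta}{\sqrt{\varepsilon}}\Big].
\end{aligned}$$
This implies
$$\liminf_{\varepsilon\to 0}\varepsilon\ln P[\sqrt{\varepsilon}W \in B(h, \delta)] \geq -I(h),$$
since $P[|W| < \delta/\sqrt{\varepsilon}]$ tends to one as $\varepsilon$ goes to zero.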
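We now consider the diffusion with small noise
$$X^\varepsilon_t = x + \int_0^t b(X^\varepsilon_s)\,ds + \sqrt{\varepsilon}\int_0^t \sigma(X^\varepsilon_s)\,dW_s, \quad 0 \leq t \leq T,$$
where $b$ and $\sigma$ are Lipschitz and bounded. By using the contraction principle for LDP, we
derive that $(X^\varepsilon)$ satisfies a LDP in $C([0, T])$ with the good rate function
$$I_x(h) = \inf_{\{f \in H_0([0,T]) \,:\, h(t) = x + \int_0^t b(h(s))ds + \int_0^t \sigma(h(s))\dot{f}(s)ds\}} \frac{1}{2}\int_0^T |\dot{f}(t)|^2\,dt.$$
When $\sigma$ is a square invertible matrix, the preceding formula for the rate function simplifies
to
$$I_x(h) = \begin{cases} \dfrac{1}{2}\displaystyle\int_0^T |\dot{h}(t) - b(h(t))|^2_{(\sigma\sigma')^{-1}(h(t))}\,dt, & \text{if } h \in H_x([0, T]), \\ \infty, & \text{otherwise,} \end{cases}$$
where $H_x([0, T]) := \{h \in H([0, T]) : h(0) = x\}$ and $|v|^2_A := v'Av$. We sketch the proof in the case $\sigma = I_d$ and $x = 0$.
Let $F$ be the map on $C([0, T])$ which associates to $f$ the solution $h = F(f)$ of
$$h(t) = \int_0^t b(h(s))\,ds + f(t), \quad 0 \leq t \leq T,$$
so that $X^\varepsilon = F(\sqrt{\varepsilon}W)$.
One easily checks from the Lipschitz condition on $b$ and Gronwall's lemma that the map $F$ is
continuous on $C([0, T])$, so that the contraction principle is applicable, and we obtain that
$(X^\varepsilon)$ satisfies a LDP with good rate function
$$I(h) = \inf_{\{f \in H_0([0,T]),\ h = F(f)\}} \frac{1}{2}\int_0^T |\dot{f}(t)|^2\,dt.$$
Moreover, for $f \in H_0([0, T])$, the relation $h = F(f)$ reads
$$\dot{h}(t) = b(h(t)) + \dot{f}(t), \quad h(0) = 0,$$
from which we derive the simple expression of the rate function
$$I(h) = \frac{1}{2}\int_0^T |\dot{h}(t) - b(h(t))|^2\,dt.$$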
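The action functional $I(h) = \frac{1}{2}\int_0^T |\dot{h}(t) - b(h(t))|^2 dt$ can be approximated on a time grid. The Python sketch below uses a forward difference for $\dot{h}$ and a left Riemann sum, with a hypothetical mean-reverting drift $b(y) = -y$. It shows that the solution of the noiseless equation $\dot{h} = b(h)$ has zero cost, while a straight line from $0$ to $1$ has cost $\frac{1}{2}\int_0^1 (1 + t)^2 dt = 7/6$. The function names and the drift are assumptions of the illustration, and NumPy is assumed to be installed.

```python
import numpy as np

def action(h, drift, T):
    """Left Riemann-sum approximation of (1/2) * int_0^T |h'(t) - b(h(t))|^2 dt
    for a one-dimensional path h sampled on a uniform grid of [0, T]."""
    dt = T / (len(h) - 1)
    h_dot = np.diff(h) / dt                      # forward difference for h'
    return float(0.5 * np.sum((h_dot - drift(h[:-1])) ** 2) * dt)

def drift(y):
    # Hypothetical drift b(y) = -y (mean reversion towards 0).
    return -y

T, steps = 1.0, 10_000
t = np.linspace(0.0, T, steps + 1)

ode_path = np.zeros_like(t)    # solves h' = b(h) with h(0) = 0
straight = t.copy()            # straight line h(t) = t, from 0 to 1

print(action(ode_path, drift, T))   # the path follows the drift: no cost
print(action(straight, drift, T))   # to be compared with 7/6
```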
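Another important application of Freidlin-Wentzell theory deals with the problem of
diffusion exit from a domain, and occurs naturally in finance, see Section 3.1. We briefly
summarize these results. Let $\varepsilon > 0$ be a (small) positive parameter and consider the stochastic
differential equation in $\mathbb{R}^d$ on some interval $[0, T]$,
$$dX^\varepsilon_s = b_\varepsilon(s, X^\varepsilon_s)\,ds + \sqrt{\varepsilon}\,\sigma(s, X^\varepsilon_s)\,dW_s, \tag{2.17}$$
where the drift $b_\varepsilon$ converges, as $\varepsilon$ goes to zero, to some function $b$
uniformly on compact sets. Given an open set $\mathcal{O}$ of $\mathbb{R}^d$, we consider the exit time from $\mathcal{O}$,
$$\tau^\varepsilon_{t,x} = \inf\big\{s \geq t : X^{\varepsilon,t,x}_s \notin \mathcal{O}\big\},$$
and the corresponding exit probability
$$v_\varepsilon(t, x) = P[\tau^\varepsilon_{t,x} \leq T], \quad (t, x) \in [0, T] \times \mathbb{R}^d.$$
Here $X^{\varepsilon,t,x}$ denotes the solution to (2.17) starting from $x$ at time $t$. It is well-known that
the process $X^{\varepsilon,t,x}$ converges to $X^{0,t,x}$, the solution to the ordinary differential equation
$$dX^0_s = b(s, X^0_s)\,ds, \quad X^0_t = x.$$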
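In order to ensure that $v_\varepsilon$ goes to zero, we assume that for all $t \in [0, T]$,
$$\text{(H)} \qquad x \in \mathcal{O} \implies X^{0,t,x}_s \in \mathcal{O}, \quad \forall\, s \in [t, T].$$
Indeed, under (H), the system (2.17) tends, when $\varepsilon$ is small, to stay inside $\mathcal{O}$, so that the
event $\{\tau^\varepsilon_{t,x} \leq T\}$ is rare. The large deviations asymptotics of $v_\varepsilon(t, x)$, when $\varepsilon$ goes to zero,
can be studied by PDE arguments. The function $v_\varepsilon$ satisfies the linear PDE
$$\frac{\partial v_\varepsilon}{\partial t} + b_\varepsilon(t, x).D_x v_\varepsilon + \frac{\varepsilon}{2}\mathrm{tr}\big(\sigma\sigma'(t, x)D_x^2 v_\varepsilon\big) = 0, \quad (t, x) \in [0, T) \times \mathcal{O}, \tag{2.18}$$
together with the boundary conditions
$$v_\varepsilon(t, x) = 1, \quad (t, x) \in [0, T) \times \partial\mathcal{O}, \tag{2.19}$$
$$v_\varepsilon(T, x) = 0, \quad x \in \mathcal{O}. \tag{2.20}$$
The logarithmic transformation $V_\varepsilon = -\varepsilon\ln v_\varepsilon$ then leads to the nonlinear PDE
$$-\frac{\partial V_\varepsilon}{\partial t} - b_\varepsilon(t, x).D_x V_\varepsilon - \frac{\varepsilon}{2}\mathrm{tr}\big(\sigma\sigma'(t, x)D_x^2 V_\varepsilon\big) + \frac{1}{2}(D_x V_\varepsilon)'\sigma\sigma'(t, x)D_x V_\varepsilon = 0, \quad (t, x) \in [0, T) \times \mathcal{O}, \tag{2.21}$$
with the boundary data
$$V_\varepsilon(t, x) = 0, \quad (t, x) \in [0, T) \times \partial\mathcal{O}, \tag{2.22}$$
$$V_\varepsilon(T, x) = \infty, \quad x \in \mathcal{O}. \tag{2.23}$$
At the limit $\varepsilon = 0$, the PDE (2.21) becomes a first-order Hamilton-Jacobi equation
$$-\frac{\partial V_0}{\partial t} - b(t, x).D_x V_0 + \frac{1}{2}(D_x V_0)'\sigma\sigma'(t, x)D_x V_0 = 0, \quad (t, x) \in [0, T) \times \mathcal{O}, \tag{2.24}$$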
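with the boundary data (2.22)-(2.23). By PDE-viscosity solutions methods and comparison
results, we can prove (see e.g. [3] or [20]) that $V_\varepsilon$ converges uniformly on compact subsets of
$[0, T) \times \mathcal{O}$, as $\varepsilon$ goes to zero, to $V_0$ the unique viscosity solution to (2.24) with the boundary
data (2.22)-(2.23). Moreover, $V_0$ has a representation in terms of a control problem. Consider
the Hamiltonian function
$$H(t, x, p) = -b(t, x).p + \frac{1}{2}p'\sigma\sigma'(t, x)p, \quad (t, x, p) \in [0, T] \times \mathcal{O} \times \mathbb{R}^d,$$
which is quadratic and in particular convex in $p$. Then, using the Legendre transform, we
may rewrite
$$H(t, x, p) = \sup_{q \in \mathbb{R}^d}\big[q.p - H^*(t, x, q)\big],$$
where
$$H^*(t, x, q) = \sup_{p \in \mathbb{R}^d}\big[p.q - H(t, x, p)\big] = \frac{1}{2}\big(q + b(t, x)\big)'\big(\sigma\sigma'(t, x)\big)^{-1}\big(q + b(t, x)\big), \quad (t, x, q) \in [0, T] \times \mathcal{O} \times \mathbb{R}^d.$$
Hence, the PDE (2.24) is rewritten as
$$\frac{\partial V_0}{\partial t} + \inf_{q \in \mathbb{R}^d}\big[q.D_x V_0 + H^*(t, x, -q)\big] = 0, \quad (t, x) \in [0, T) \times \mathcal{O},$$
which, together with the boundary data (2.22)-(2.23), is associated to the value function
for the following calculus of variations problem:
$$\begin{aligned}
V_0(t, x) &= \inf_{x(.) \in \mathcal{A}(t,x)}\int_t^T H^*\big(u, x(u), -\dot{x}(u)\big)\,du \\
&= \inf_{x(.) \in \mathcal{A}(t,x)}\int_t^T \frac{1}{2}\big(\dot{x}(u) - b(u, x(u))\big)'\big(\sigma\sigma'(u, x(u))\big)^{-1}\big(\dot{x}(u) - b(u, x(u))\big)\,du, \quad (t, x) \in [0, T) \times \mathcal{O},
\end{aligned}$$
where
$$\mathcal{A}(t, x) = \big\{x(.) \in H([0, T]) : x(t) = x \text{ and } \tau(x) \leq T\big\}.$$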
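Here $\tau(x)$ is the exit time of $x(.)$ from $\mathcal{O}$. The large deviations result is then stated as
$$\lim_{\varepsilon\to 0}\varepsilon\ln v_\varepsilon(t, x) = -V_0(t, x), \tag{2.25}$$
and the above limit holds uniformly on compact subsets of $[0, T) \times \mathcal{O}$. Notice that for $t$
$= 0$ and $T = 1$, the quantity $d(x, \partial\mathcal{O}) = \sqrt{2V_0(0, x)}$ is the distance between $x$ and $\partial\mathcal{O}$ in
the Riemannian metric defined by $(\sigma\sigma')^{-1}$. A more precise result may be obtained, which
allows one to remove the above log estimate. This type of result is developed in [18], and is
called a sharp large deviations estimate. It states an asymptotic expansion (in $\varepsilon$) of the exit
probability for points $(t, x)$ belonging to a set $N$ of $[0, T'] \times \mathcal{O}$ for some $T' < T$, open in the
relative topology, and s.t. $V_0 \in C^\infty(N)$. Then, under the condition that
$$b_\varepsilon = b + \varepsilon b_1 + O(\varepsilon^2),$$
one has
$$v_\varepsilon(t, x) = \exp\Big(-\frac{V_0(t, x)}{\varepsilon} - w(t, x)\Big)\big(1 + O(\varepsilon)\big),$$
where $w$ is solution to the PDE problem
$$-\frac{\partial w}{\partial t} - (b - \sigma\sigma' D_x V_0).D_x w = \frac{1}{2}\mathrm{tr}\big(\sigma\sigma' D_x^2 V_0\big) + b_1.D_x V_0 \quad \text{in } N,$$
$$w(t, x) = 0 \quad \text{on } \big([0, T) \times \partial\mathcal{O}\big) \cap \bar{N}.$$
This first-order linear PDE can be integrated along the characteristic curve $\xi$ solving
$$\dot{\xi}(s) = (b - \sigma\sigma' D_x V_0)(s, \xi(s)), \quad \xi(t) = x.$$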
3
3.1
In this section, we show how to use large deviations approximation via importance sampling
for Monte-Carlo computation of expectations arising in option pricing. In the context of
continuous-time models, we are interested in the computation of
$$I_g = E\big[g(S_t, 0 \leq t \leq T)\big],$$
where $S$ is the underlying asset price, and $g$ is the payoff of the option, possibly path-dependent, i.e. depending on the path process $S_t$, $0 \leq t \leq T$. The Monte-Carlo approximation technique consists in simulating $N$ independent sample paths $(S^i_t)_{0 \leq t \leq T}$, $i = 1, \ldots, N$,
in the distribution of $(S_t)_{0 \leq t \leq T}$, and approximating the required expectation by the sample
mean estimator:
$$I^N_g = \frac{1}{N}\sum_{i=1}^{N} g(S^i).$$
The consistency of this estimator is ensured by the law of large numbers, while the approximation
error is governed by the variance of this estimator through the central limit theorem:
the lower the variance of $g(S)$, the better the approximation for a given number $N$ of
simulations. As already mentioned in the introduction, the basic principle of importance
sampling is to reduce variance by changing the probability measure from which paths are generated. Here, the idea is to change the distribution of the price process to be simulated in
order to take into account the specificities of the payoff function $g$, and to drive the process
to the region of high contribution to the required expectation. We focus in this section on
the importance sampling technique within the context of diffusion models, and then show
how to obtain an optimal change of measure by a large deviation approximation of the
required expectation.
3.1.1
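We briefly describe the importance sampling variance reduction technique for diffusions. Let $X$ be a $d$-dimensional diffusion process governed by
\[ dX_s = b(X_s)ds + \sigma(X_s)dW_s, \tag{3.1} \]
where $W$ is a Brownian motion under $P$, and denote by $v(t,x) = E[g(X_s^{t,x}, t \le s \le T)]$ the expectation of interest, where $X^{t,x}$ is the solution to (3.1) starting from $x$ at time $t$. Consider a process $\phi$ such that $M_t = \exp\big(-\int_0^t \phi_u' dW_u - \frac{1}{2}\int_0^t |\phi_u|^2 du\big)$, $0 \le t \le T$, is a martingale, and define the probability measure $Q$ by
\[ \frac{dQ}{dP} = M_T. \]
Moreover, by Girsanov's theorem, the process $\hat W_t = W_t + \int_0^t \phi_u du$, $0 \le t \le T$, is a Brownian motion under $Q$, and the dynamics of $X$ under $Q$ is given by
\[ dX_s = \big(b(X_s) - \sigma(X_s)\phi_s\big)ds + \sigma(X_s)d\hat W_s. \tag{3.2} \]
From Bayes formula, the expectation of interest can be written as
\[ v(t,x) = E^Q\big[ g(X_s^{t,x}, t \le s \le T) L_T \big], \tag{3.3} \]
where $L$ is the $Q$-martingale
\[ L_t = \frac{1}{M_t} = \exp\Big( \int_0^t \phi_u' d\hat W_u - \frac{1}{2}\int_0^t |\phi_u|^2 du \Big), \quad 0 \le t \le T. \tag{3.4} \]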
The expression (3.3) suggests, for any choice of $\phi$, an alternative Monte-Carlo estimator for $v(t,x)$ with
\[ I^N_{g,\phi}(t,x) = \frac{1}{N}\sum_{i=1}^N g(X^{i,t,x}) L_T^i, \]
by simulating $N$ independent sample paths $(X^{i,t,x})$ and $L_T^i$ of $(X^{t,x})$ and $L_T$ under $Q$ given by (3.2)-(3.4). Hence, the change of probability measure through the choice of $\phi$ leads to a modification of the drift process in the simulation of $X$. The variance reduction technique consists in determining a process $\phi$ which induces a smaller variance for the corresponding estimator $I_{g,\phi}$ than the initial one $I_{g,0}$. The next two paragraphs present two approaches leading to the construction of such processes $\phi$. In the first approach, developed in [28], the process $\phi$ is stochastic, and requires an approximation of the expectation of interest. In the second approach, due to [30], the process $\phi$ is deterministic and derived through a simple optimization problem. Both approaches rely on asymptotic results from the theory of large deviations.
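As an illustration of the estimator built from (3.2)-(3.4), the following minimal sketch treats a one-dimensional Black-Scholes terminal payoff $(S_T - K)_+$ with a constant process $\phi$. The function names, the numerical parameters and the rule used to pick $\phi$ (centering the median of $S_T$ under $Q$ at the strike) are assumptions made for this example only; they are not the constructions of [28] or [30]. The expectation is taken without discounting.

```python
import math
import random
from statistics import NormalDist


def is_estimator(s0, strike, r, sigma, T, phi, n_paths, rng):
    """Estimate E[(S_T - K)^+] by sampling under Q and weighting by L_T.

    Under Q, W~ is a Brownian motion, W_T = W~_T - phi*T, and for a constant
    phi the weight (3.4) is L_T = exp(phi*W~_T - 0.5*phi^2*T).
    phi = 0 gives the plain Monte-Carlo estimator.
    Returns (sample mean, standard error).
    """
    total = total_sq = 0.0
    for _ in range(n_paths):
        w_tilde = math.sqrt(T) * rng.gauss(0.0, 1.0)
        w = w_tilde - phi * T
        weight = math.exp(phi * w_tilde - 0.5 * phi**2 * T)
        s_T = s0 * math.exp((r - 0.5 * sigma**2) * T + sigma * w)
        x = max(s_T - strike, 0.0) * weight
        total += x
        total_sq += x * x
    mean = total / n_paths
    var = max(total_sq / n_paths - mean**2, 0.0)
    return mean, math.sqrt(var / n_paths)


def centering_phi(s0, strike, r, sigma, T):
    """Constant phi for which the median of S_T under Q equals the strike
    (a heuristic choice made for this sketch)."""
    return ((r - 0.5 * sigma**2) * T - math.log(strike / s0)) / (sigma * T)


def exact_value(s0, strike, r, sigma, T):
    """Closed form of E[(S_T - K)^+] (undiscounted), used as a benchmark."""
    cdf = NormalDist().cdf
    d1 = (math.log(s0 / strike) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return s0 * math.exp(r * T) * cdf(d1) - strike * cdf(d2)
```

Calling `is_estimator` once with `phi = 0.0` and once with `phi = centering_phi(...)` on an out-of-the-money strike gives two unbiased estimates of the same quantity; comparing the two returned standard errors shows the effect of the drift change.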
3.1.2
We are looking for a stochastic process $\phi$ which reduces (possibly to zero!) the variance of the corresponding estimator. The heuristics for achieving this goal is based on the following argument. Suppose for the moment that the payoff $g$ depends only on the terminal value $X_T$. Then, by applying Itô's formula to the $Q$-martingale $v(s,X_s^{t,x})L_s$ between $s = t$ and $s = T$, we obtain:
\[ g(X_T^{t,x})L_T = v(t,x)L_t + \int_t^T L_s\big( D_x v(s,X_s^{t,x})'\sigma(X_s^{t,x}) + v(s,X_s^{t,x})\phi_s' \big) d\hat W_s. \]
Hence, the variance of $I^N_{g,\phi}(t,x)$ is given by
\[ Var_Q\big(I^N_{g,\phi}(t,x)\big) = \frac{1}{N} E^Q\Big[ \int_t^T L_s^2 \big| D_x v(s,X_s^{t,x})'\sigma(X_s^{t,x}) + v(s,X_s^{t,x})\phi_s' \big|^2 ds \Big]. \]
The choice of the process $\phi$ is motivated by the following remark. If the function $v$ were known, then one could make the variance vanish by choosing
\[ \phi_s = \phi_s^* = -\frac{1}{v(s,X_s^{t,x})}\sigma'(X_s^{t,x}) D_x v(s,X_s^{t,x}). \tag{3.5} \]
Of course, the function $v$ is unknown (this is precisely what we want to compute), but this suggests using a process $\phi$ from the above formula with an approximation of the function $v$. We may then reasonably hope to reduce the variance, and also to use such a method for more general payoff functions, possibly path-dependent. We shall use a large deviations approximation for the function $v$.
The basic idea for the use of large deviations approximation to the expectation function $v$ is the following. Suppose the option of interest, characterized by its payoff function $g$, has a low probability of exercise, e.g. it is deeply out of the money. Then, a large proportion of simulated paths end up out of the exercise domain, giving no contribution to the Monte-Carlo estimator but increasing the variance. In order to reduce the variance, it is interesting to change the drift in the simulation of the price process so as to make the exercise domain more likely. This is achieved with a large deviations approximation of the process of interest in the asymptotics of small diffusion term: such a result is known in the literature as the Freidlin-Wentzell sample path large deviations principle. Equivalently, by time-scaling, this amounts to a large deviation approximation of the process in small time, studied by Varadhan.
To illustrate our purpose, let us consider the case of an up-in bond, i.e. an option that pays one unit of numeraire iff the underlying asset reaches a given up-barrier $K$. Within a
(3.6)
(Yt )dWt2 ,
(3.7)
= inf u t : Xut,x
/ ,
= (0, K) R.
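The event $\{\max_{t\le u\le T} S_u^{t,x} \ge K\} = \{\tau_{t,x} \le T\}$ is rare when $x = (s,y) \in \Gamma$, i.e. $s < K$ (out of the money option) and the time to maturity $T - t$ is small. The large deviations asymptotics for the exit probability $v(t,x)$ in small time to maturity $T - t$ is provided by the Freidlin-Wentzell and Varadhan theories. Indeed, we see from the time-homogeneity of the coefficients of the diffusion and by time-scaling that we may write $v(t,x) = w_{T-t}(0,x)$, where for $\varepsilon > 0$, $w_\varepsilon$ is the function defined on $[0,1]\times(0,\infty)\times R$ by
\[ w_\varepsilon(t,x) = P[\tau_\varepsilon^{t,x} \le 1], \]
and $X^{\varepsilon,t,x}$ is the solution to
\[ dX_s^\varepsilon = \varepsilon b(X_s^\varepsilon)ds + \sqrt{\varepsilon}\,\Sigma(X_s^\varepsilon)dW_s, \qquad X_t^\varepsilon = x, \]
with $b$ and $\Sigma$ the drift and diffusion coefficients of $X = (S,Y)$, and $\tau_\varepsilon^{t,x} = \inf\{ s \ge t : X_s^{\varepsilon,t,x} \notin \Gamma \}$. From the large deviations result (2.25) stated in paragraph 2.3, we have:
\[ \lim_{t \nearrow T} (T-t)\ln v(t,x) = -V_0(0,x), \]
where
\[ V_0(t,x) = \inf_{x(.)\in A(t,x)} \int_t^1 \frac{1}{2}\dot x(u)' M(x(u))\dot x(u)du, \qquad (t,x) \in [0,1)\times\Gamma, \]
\[ A(t,x) = \big\{ x(.) \in H([0,1]) : x(t) = x \text{ and } \tau(x) \le 1 \big\}. \]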
We also have another interpretation of the positive function $V_0$ in terms of Riemannian distance on $R^d$ associated to the metric $M(x) = (\Sigma\Sigma'(x))^{-1}$. By denoting $L_0(x) = \sqrt{2V_0(0,x)}$, one can prove (see [42]) that $L_0$ is the unique viscosity solution to the eikonal equation
\[ (D_x L_0)'\Sigma\Sigma'(x)D_x L_0 = 1, \quad x \in \Gamma, \qquad L_0(x) = 0, \quad x \in \partial\Gamma, \]
and that it may be represented as
\[ L_0(x) = \inf_{z\in\partial\Gamma} L_0(x,z), \quad x \in \Gamma, \tag{3.8} \]
where
\[ L_0(x,z) = \inf_{x(.)\in A(x,z)} \int_0^1 \sqrt{\dot x(u)' M(x(u))\dot x(u)}\,du, \]
and $A(x,z) = \{ x(.) \in H([0,1]) : x(0) = x \text{ and } x(1) = z \}$. Hence, the function $L_0$ can be computed either by the numerical resolution of the eikonal equation or by using the representation (3.8). $L_0(x)$ is interpreted as the minimal length (according to the metric $M$) of the path $x(.)$ allowing to reach the boundary $\partial\Gamma$ from $x$. From the above large deviations result, which is written as
\[ \ln v(t,x) \simeq -\frac{L_0^2(x)}{2(T-t)}, \quad \text{as } T - t \to 0, \]
and the expression (3.5) for the optimal theoretical $\phi^*$, we use a change of probability measure with
\[ \phi(t,x) = \frac{L_0(x)}{T-t}\Sigma'(x)D_x L_0(x). \]
Such a process $\phi$ may also be of interest in a more general framework than the up-in bond: one can use it for computing any option whose exercise domain looks similar to that of the up-in bond. We also expect the variance reduction to be more significant when the exercise probability is low, i.e. for deep out of the money options. In the particular case of the Black-Scholes model, i.e. $\Sigma(x) = \sigma s$, we have
\[ L_0(x) = \frac{1}{\sigma}\ln\frac{K}{s}, \]
and so
\[ \phi(t,x) = \frac{1}{\sigma(T-t)}\ln\frac{s}{K}, \qquad s < K. \]
3.1.3
We describe here a method due to [30], which, in contrast with the above approach, does not require the knowledge of the option price, and is restricted to deterministic changes of drift. The original approach of [30] was developed in a discrete-time setting, and extended to a continuous-time context by [33]. In these lectures, we follow the continuous-time diffusion setting of paragraph 3.1.1. It is convenient to identify the option payoff with a nonnegative functional $G(W)$ of the Brownian motion $W = (W_t)_{0\le t\le T}$ on the set $C([0,T])$ of continuous functions on $[0,T]$, and to define $F = \ln G$ valued in $R\cup\{-\infty\}$. For example, in the case of the Black-Scholes model for the stock price $S$, with interest rate $r$ and volatility $\sigma$, the payoff of a geometric average Asian option is $\big(e^{\frac{1}{T}\int_0^T \ln S_t dt} - K\big)_+$, corresponding to a functional
\[ G(w) = \Big( S_0 e^{(r - \frac{\sigma^2}{2})\frac{T}{2}} e^{\frac{\sigma}{T}\int_0^T w_t dt} - K \Big)_+, \tag{3.9} \]
1 Z
T
RT
0
S0 exp wt + (r 2 /2)t K .
+
= exp
Z
h(t)dW
t
2
2
|h(t)|
dt ,
|h(t)|2 dt .
= E exp 2F (W )
h(t)dWt +
2 0
0
M 2 (h) = EQh
1 T
= E exp
h(t)dWt +
2F ( W )
|h(t)|2 dt
.
2 0
0
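Now, from Schilder's theorem, $(Z^\varepsilon = \sqrt{\varepsilon}W)_\varepsilon$ satisfies a LDP on $C([0,T])$ with rate function $I(z) = \frac{1}{2}\int_0^T |\dot z(t)|^2 dt$ for $z \in H_0([0,T])$, and $\infty$ otherwise. Hence, under subquadratic growth conditions on the log payoff of the option, one can apply Varadhan's integral principle (see Theorem 2.3) to the function $z \to 2F(z) - \int_0^T \dot h(t).\dot z(t)dt + \frac{1}{2}\int_0^T |\dot h(t)|^2 dt$, and get
\[ \lim_{\varepsilon\to 0} \varepsilon\ln M_\varepsilon^2(h) = \sup_{z\in H_0([0,T])} \Big[ 2F(z) + \frac{1}{2}\int_0^T |\dot z(t) - \dot h(t)|^2 dt - \int_0^T |\dot z(t)|^2 dt \Big]. \tag{3.10} \]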
|z(t)
h(t)|
dt
|z(t)|
2 dt .
(3.11)
Z
2F (h)
0
hH0 ([0,T ])
2
|h(t)|
dt .
(3.12)
sup
2 ln e
c
|h(t)|
dt ,
(3.13)
0
hH0 ([0,T ])
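where $a = \sigma/T$, $c = \frac{K}{S_0}\exp\big(-(r - \frac{\sigma^2}{2})\frac{T}{2}\big)$. The corresponding Euler-Lagrange equation is
\[ \ddot h = -\lambda, \quad \text{with} \quad \lambda = a\,\frac{\exp\big(a\int_0^T h(t)dt\big)}{\exp\big(a\int_0^T h(t)dt\big) - c}, \tag{3.14} \]
which, together with the conditions $h(0) = 0$ and $\dot h(T) = 0$, gives
\[ h(t) = -\frac{\lambda}{2}t^2 + \lambda T t. \tag{3.15} \]
Since $\int_0^T h(t)dt = \lambda T^3/3$ for such $h$, the constant $\hat\lambda$ is determined by plugging (3.15) into (3.14), and thus the optimal drift is
\[ \hat h(t) = -\frac{\hat\lambda}{2}t^2 + \hat\lambda T t. \]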
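The deterministic change of drift for the geometric average Asian option can be sketched numerically as follows. This is only an illustration under stated assumptions: the constant $\lambda$ is obtained by bisection on the scalar equation resulting from (3.14)-(3.15), using $\int_0^T h(t)dt = \lambda T^3/3$; the path integral is discretized by a trapezoidal rule; the drift $\dot h(t) = \lambda(T-t)$ is frozen at the left end point of each time step; the payoff is the undiscounted functional (3.9). All function names are hypothetical.

```python
import math
import random


def solve_lambda(a, c, T, n_iter=200):
    """Root of lam = a / (1 - c*exp(-a*lam*T**3/3)), found by bisection.

    This is (3.14) evaluated at h given by (3.15).
    """
    def gap(lam):
        denom = 1.0 - c * math.exp(-a * lam * T**3 / 3.0)
        return math.inf if denom <= 0.0 else a / denom - lam

    lo = max(0.0, 3.0 * math.log(c) / (a * T**3))  # at or below lo, gap is +inf or positive
    hi = lo + 1.0
    while gap(hi) > 0.0:  # gap is decreasing in lam: widen until the sign changes
        hi = lo + 2.0 * (hi - lo)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def geometric_asian(s0, strike, r, sigma, T, n_paths, n_steps, lam, rng):
    """Estimate E[G(W)] for the payoff (3.9); lam = 0 is plain Monte-Carlo.

    Paths are simulated with drift hdot(t) = lam*(T - t) and reweighted by
    the likelihood ratio of the discretized Gaussian increments.
    Returns (sample mean, standard error).
    """
    dt = T / n_steps
    sqdt = math.sqrt(dt)
    a = sigma / T
    fwd = s0 * math.exp((r - 0.5 * sigma**2) * T / 2.0)
    total = total_sq = 0.0
    for _ in range(n_paths):
        w = integral = log_ratio = 0.0
        for k in range(n_steps):
            hdot = lam * (T - k * dt)          # drift frozen at the left end point
            db = sqdt * rng.gauss(0.0, 1.0)    # Brownian increment under the new measure
            dw = db + hdot * dt                # increment of the shifted path
            log_ratio -= hdot * db + 0.5 * hdot**2 * dt
            integral += (w + 0.5 * dw) * dt    # trapezoidal rule for the time integral of w
            w += dw
        x = max(fwd * math.exp(a * integral) - strike, 0.0) * math.exp(log_ratio)
        total += x
        total_sq += x * x
    mean = total / n_paths
    var = max(total_sq / n_paths - mean**2, 0.0)
    return mean, math.sqrt(var / n_paths)
```

With $a = \sigma/T$ and $c$ as defined above, `solve_lambda(a, c, T)` returns the constant to pass as `lam`. Because any deterministic drift yields an unbiased estimator of the discretized expectation, the freezing of the drift on each step affects only the variance, not the mean.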
The following table compares the results obtained from Monte-Carlo simulations without applying variance reduction, by applying antithetic control variates, and the above importance sampling method. Parameter values are $T = 1$, $r = 3\%$, $\sigma = 30\%$, $S_0 = 100$, $K = 145$.

Number of simulations                      without    anti-control    IS
20000    Standard deviation/mean           13.9905    13.48           1.07
10000    Standard deviation/mean           13.7428    13.58           1.065

We also report from [33] the following table, which compares the performance, in terms of variance ratios between the risk-neutral sample and the sample with the optimal drift, for an Asian option in a Black-Scholes model. Parameter values are $T = 1$, $r = 5\%$, $\sigma = 20\%$, $S_0 = 50$, and strikes are varying. Simulations are performed with $10^6$ paths.

Strike             50       60       70
Price              304.0    28.00    1.063
Variance ratios    7.59     26.5     310
The performance gap increases with the strike. This is justified by the fact that a larger strike causes the option to become more out of the money, so that the role of the drift in reshaping the payoff distribution in the region of interest becomes more crucial. An extension of this importance sampling method, using sample path large deviations results, is considered in [49] for stochastic volatility models.
3.2
In recent years, there has been an increasing interest in asymptotic and expansion methods in option pricing and implied volatility for stochastic volatility models. There is a considerable literature dealing with various asymptotics (small-time, large-time, fast mean-reverting, extreme strike) for stochastic volatility models, see [2], [4], [44], [36], [6], [46], [16], [43], [24], [17], [39], [53], [26], [29]. In particular, large deviations provide a powerful tool for describing the limiting behavior of implied volatilities. We recall that an implied volatility is the volatility parameter needed in the Black-Scholes formula in order to match a call option price, and it is common practice to quote prices in volatility through this transformation. In this section, we shall focus on small time asymptotics near maturity of options.
3.2.1
Let us consider a general stochastic volatility model for the log-stock price Xt = ln St given
by
p
1
dXt = 2 (Yt )dt + f (Yt ) 1 2 dWt1 + dWt2 ,
2
dYt = (Yt )dt + (Yt )dWt2 ,
(3.16)
(3.17)
t0
k 0,
(3.18)
for some continuous rate function $I$ on $R$. For general SV models (3.16)-(3.17), such a large deviations result is derived from Freidlin-Wentzell theory. Indeed, by time scaling, we see that for any $\varepsilon > 0$, the process $(X_{\varepsilon t} - x_0, Y_{\varepsilon t})$ has the same distribution as $(X^\varepsilon - x_0, Y^\varepsilon)$
defined by
p
1
x(t)
2
y(t)
2 i
dt,
I(x(.), y(.)) =
+
2(1 2 ) 0 f (y(t))2
f (y(t))
(y(t))2
for all $(x(.),y(.)) \in H([0,1])$ s.t. $(x(0),y(0)) = (0,y_0)$. Then by applying the contraction principle (see Theorem 2.2), we deduce that $X_1^\varepsilon - x_0$, as $\varepsilon$ goes to zero, and so $X_t - x_0$, as $t$ goes to zero, satisfies a LDP in $R$ with the rate function $I : R \to [0,\infty]$ given by
\[ I(k) = \inf_{(x,y)\in H([0,1]),\,(x,y)(0)=(0,y_0),\,x(1)=k} I(x(.),y(.)). \]
The quantity $d(k) = \sqrt{2I(k)}$ is actually the distance from $(0,y_0)$ to the line $\{x = k\}$ on the plane $R^2$ for the Riemannian metric defined by the inverse of the diffusion coefficient of $(X,Y)$. Hence, the LDP for $X_t - x_0$ means that:
\[ \lim_{t\to 0} t\ln P[X_t - x_0 \ge k] = -\frac{1}{2}d(k)^2, \quad k \ge 0. \]
The calculation of $d(k)$, and so the determination of the distance-minimizing geodesic $(x^*,y^*)$ from $(0,y_0)$ to the line $\{x = k\}$, is a differential geometry problem associated to a calculus of variations problem, which in general does not have explicit solutions (see [25] for some details). The solution to this problem can also be characterized by PDE methods through a nonlinear eikonal equation, see [4]. For the Heston model and more generally for affine SV models, the LDP (3.18) can be derived directly from the explicit computation of the moment generating function and the Gärtner-Ellis theorem. We postpone the details to the next paragraph.
Pricing. We show in general how the LDP for the log-stock price provides an approximation for pricing out-of-the-money call options of small maturity. We have the following estimate:
\[ \lim_{t\to 0} t\ln E[(S_t - K)_+] = \lim_{t\to 0} t\ln P[S_t \ge K] = -I(x), \tag{3.19} \]
where $x = \ln(K/S_0) > 0$ is the log-moneyness. A similar result holds for out-of-the-money put options. Let us first show the lower bound. For any $\varepsilon > 0$, we have
\[ E[(S_t - K)_+] \ge E[(S_t - K)_+ 1_{S_t \ge K+\varepsilon}] \ge \varepsilon P[S_t \ge K + \varepsilon]. \tag{3.20} \]
By sending $\varepsilon$ to zero, and from the continuity of $I$, we obtain the desired lower bound. To show the upper bound, we apply Hölder's inequality for any $p, q > 1$, $1/p + 1/q = 1$, to get
\[ E[(S_t - K)_+] = E[(S_t - K)_+ 1_{S_t \ge K}] \le \big(E[(S_t - K)_+^p]\big)^{\frac{1}{p}} \big(E[1_{S_t \ge K}]\big)^{\frac{1}{q}} \le \big(E[S_t^p]\big)^{\frac{1}{p}} \big(P[S_t \ge K]\big)^{\frac{1}{q}}. \]
Taking $\ln$ and multiplying by $t$, this implies
\[ t\ln E[(S_t - K)_+] \le \frac{t}{p}\ln E[S_t^p] + \Big(1 - \frac{1}{p}\Big) t\ln P[S_t \ge K]. \]
Now, for fixed $p$, $t\ln E[S_t^p] \to 0$ as $t$ goes to zero. It follows from the LDP (3.18) that
\[ \limsup_{t\to 0} t\ln E[(S_t - K)_+] \le -\Big(1 - \frac{1}{p}\Big) I(x). \]
By sending $p$ to infinity, we obtain the required upper bound and so finally the rare event estimate in (3.19).
Implied volatility. As a consequence of the rare event estimate for call option pricing, we can determine the asymptotic behaviour of the implied volatility. Recall that the implied volatility $\sigma_t = \sigma_t(x)$ of a call option on $S_t$ with strike $K = S_0 e^x$ and time to maturity $t$ is determined from the relation:
\[ E[(S_t - K)_+] = C^{BS}(t,S_0,x,\sigma_t) := S_0\Phi(d_1(t,x,\sigma_t)) - S_0 e^x\Phi(d_2(t,x,\sigma_t)), \tag{3.21} \]
where
\[ d_1(t,x,\sigma) = \frac{-x + \frac{1}{2}\sigma^2 t}{\sigma\sqrt{t}}, \qquad d_2(t,x,\sigma) = d_1(t,x,\sigma) - \sigma\sqrt{t}, \]
and $\Phi(d) = \int_{-\infty}^d \varphi(x)dx$ is the cdf of the normal law $N(0,1)$. As a consequence of the large deviation pricing (3.19), we compute the asymptotic implied volatility for out-of-the-money call options of small maturity:
\[ \lim_{t\to 0}\sigma_t(x) = \frac{|x|}{\sqrt{2I(x)}}, \quad x \ne 0. \tag{3.22} \]
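The relation (3.21) can be made concrete with a short numerical sketch: the Black-Scholes price in log-moneyness form (zero interest rate, as in (3.21)) and its inversion by bisection. The function names and the bracketing interval for the volatility are choices made for this example. The normal cdf is written with the complementary error function so that it stays accurate deep in the tail, which matters for small maturities.

```python
import math


def norm_cdf(d):
    """Standard normal cdf, written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-d / math.sqrt(2.0))


def bs_call(t, s0, x, sigma):
    """C^BS(t, S0, x, sigma) of (3.21): strike K = S0*exp(x), zero rates."""
    vol = sigma * math.sqrt(t)
    d1 = (-x + 0.5 * sigma**2 * t) / vol
    d2 = d1 - vol
    return s0 * norm_cdf(d1) - s0 * math.exp(x) * norm_cdf(d2)


def implied_vol(price, t, s0, x, lo=1e-6, hi=5.0, n_iter=200):
    """Invert sigma -> bs_call(t, s0, x, sigma) by bisection.

    The call price is increasing in sigma, so the root is unique whenever
    the price lies between the values at the bracket ends lo and hi.
    """
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if bs_call(t, s0, x, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

In the Black-Scholes model itself the rate function is $I(x) = x^2/(2\sigma^2)$, so evaluating `-t * math.log(bs_call(t, 1.0, x, sigma))` for decreasing `t` gives a simple way to observe the convergence in (3.19), and (3.22) then returns $\sigma$.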
The derivation relies on the standard estimate on $\Phi$ (see section 14.8 in [54]):
\[ \Big(d + \frac{1}{d}\Big)^{-1}\varphi(d) \le 1 - \Phi(d) \le \frac{1}{d}\varphi(d), \quad d > 0, \tag{3.23} \]
which implies that $1 - \Phi(d) \sim \varphi(d)/d$ as $d$ goes to infinity. We show the result for out-of-the-money call option prices, i.e. $x > 0$. The same argument is valid for $x < 0$. Since the out-of-the-money call option price $E[(S_t - K)_+]$ goes to zero as $t \to 0$, we see from the
(d1 )
d1
x2
,
2 lim inf t0 t2
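which proves the lower bound in (3.22) by sending $\varepsilon$ to zero. For the upper bound, fix $t$ the maturity of the option, and denote by $S^{\sigma_t}$ the Black-Scholes price process with the constant implied volatility $\sigma_t$. Then, from (3.19) and as in (3.20), for all $\varepsilon > 0$, we have for $t$ small enough,
\[ \exp\Big(-\frac{I(x) - \varepsilon}{t}\Big) \ge E[(S_t - K)_+] = E[(S_t^{\sigma_t} - K)_+] \ge \varepsilon P[S_t^{\sigma_t} \ge K + \varepsilon] = \varepsilon\Phi(d_{2,\varepsilon}) = \varepsilon\big(1 - \Phi(-d_{2,\varepsilon})\big) \ge \varepsilon\Big(|d_{2,\varepsilon}| + \frac{1}{|d_{2,\varepsilon}|}\Big)^{-1}\varphi(d_{2,\varepsilon}), \]
where
\[ d_{2,\varepsilon} = -\frac{\ln\big(\frac{K+\varepsilon}{S_0}\big) + \frac{1}{2}\sigma_t^2 t}{\sigma_t\sqrt{t}} \to -\infty, \]
as $t$ goes to zero. Taking $t\ln$ in the above inequality, and sending $t$ to zero, we deduce that
\[ -(I(x) - \varepsilon) \ge -\frac{\big(\ln\frac{K+\varepsilon}{S_0}\big)^2}{2\limsup_{t\to 0}\sigma_t^2}, \]
which proves the upper bound in (3.22) by sending $\varepsilon$ to zero, and finally the desired result.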
3.2.2
In this paragraph, we consider the popular Heston stochastic volatility model for the log stock price $X_t = \ln S_t$ (interest rates and dividends are assumed to be null):
\[ dX_t = -\frac{1}{2}Y_t dt + \sqrt{Y_t}\big(\sqrt{1-\rho^2}\,dW_t^1 + \rho\,dW_t^2\big), \tag{3.24} \]
\[ dY_t = \kappa(\theta - Y_t)dt + \sigma\sqrt{Y_t}\,dW_t^2, \tag{3.25} \]
1)
p
= ln E exp p
Ys dWs2
Ys ds exp
Ys ds ,
2
2
0
0
0
where we used the law of iterated conditional expectation in the second equality, and the fact that $Y_t$ is measurable with respect to $W^2$. By Girsanov's theorem, we then get
\[ \Gamma_t(p) = \ln E^Q\Big[\exp\Big(\frac{p(p-1)}{2}\int_0^t Y_s ds\Big)\Big], \]
where under $Q$, the process $Y$ satisfies the sde
\[ dY_t = \big(\kappa\theta - (\kappa - \rho\sigma p)Y_t\big)dt + \sigma\sqrt{Y_t}\,dW_t^{2,Q}, \tag{3.27} \]
with $W^{2,Q}$ a Brownian motion. We are then reduced to the calculation of exponential functionals of CIR processes, for which we have closed-form expressions derived either by probabilistic or PDE methods (see e.g. [14], [1], [37], [39]).
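We present here the PDE approach. Fix $p \in R$, and consider the function defined by
\[ F(t,y,p) = E^Q\Big[\exp\Big(\frac{p(p-1)}{2}\int_0^t Y_s ds\Big)\Big| Y_0 = y\Big], \]
so that $\Gamma_t(p) = \ln F(t,y_0,p)$. From Feynman-Kac formula, the function $F$ is solution to the parabolic linear Cauchy problem
\[ \frac{\partial F}{\partial t} = \frac{p(p-1)}{2}yF + \big(\kappa\theta - (\kappa - \rho\sigma p)y\big)\frac{\partial F}{\partial y} + \frac{\sigma^2}{2}y\frac{\partial^2 F}{\partial y^2}, \qquad F(0,y,p) = 1. \]
We look for a function $F$ in the form $F(t,y,p) = \exp(\phi(t,p) + y\psi(t,p))$ for some deterministic functions $\phi(.,p)$, $\psi(.,p)$. By plugging into the PDE for $F$, we obtain that $\phi$ and $\psi$ satisfy the ordinary differential equations (ode):
\[ \frac{\partial\psi}{\partial t} = \frac{p(p-1)}{2} - (\kappa - \rho\sigma p)\psi + \frac{\sigma^2}{2}\psi^2, \qquad \psi(0,p) = 0, \tag{3.28} \]
\[ \frac{\partial\phi}{\partial t} = \kappa\theta\psi, \qquad \phi(0,p) = 0. \tag{3.29} \]
One can solve explicitly the Riccati equation (3.28) under the condition:
\[ \Delta = \Delta(p) := (\kappa - \rho\sigma p)^2 - \sigma^2 p(p-1) \ge 0. \tag{3.30} \]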
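Indeed, in this case, a particular solution to (3.28) is given by the constant function in time:
\[ \psi_0(p) = \frac{\kappa - \rho\sigma p + \sqrt{\Delta}}{\sigma^2}, \]
and denoting by $\zeta = 1/(\psi - \psi_0)$, i.e. $\psi = \psi_0 + \frac{1}{\zeta}$, we see that the function $\zeta$ satisfies the first-order linear ode:
\[ \frac{\partial\zeta}{\partial t} + \sqrt{\Delta}\,\zeta + \frac{\sigma^2}{2} = 0, \qquad \zeta(0,p) = -\frac{1}{\psi_0(p)}. \]
The solution to this equation is given by
\[ \zeta(t,p) = -\frac{\sigma^2}{2\sqrt{\Delta}}\big(1 - e^{-\sqrt{\Delta}t}\big) - \frac{\sigma^2}{\kappa - \rho\sigma p + \sqrt{\Delta}}e^{-\sqrt{\Delta}t}. \]
We then obtain the solution to the Riccati equation (3.28) after some straightforward calculations:
\[ \psi(t,p) = \psi_0(p) + \frac{1}{\zeta(t,p)} = \frac{\kappa - \rho\sigma p - \sqrt{\Delta}}{\sigma^2}\,\frac{1 - e^{-\sqrt{\Delta}t}}{1 - he^{-\sqrt{\Delta}t}} = p(p-1)\frac{\sinh\big(\frac{\sqrt{\Delta}}{2}t\big)}{(\kappa - \rho\sigma p)\sinh\big(\frac{\sqrt{\Delta}}{2}t\big) + \sqrt{\Delta}\cosh\big(\frac{\sqrt{\Delta}}{2}t\big)}, \]
and
\[ \phi(t,p) = \frac{\kappa\theta}{\sigma^2}\Big[(\kappa - \rho\sigma p - \sqrt{\Delta})t - 2\ln\Big(\frac{1 - he^{-\sqrt{\Delta}t}}{1 - h}\Big)\Big] = \frac{\kappa\theta}{\sigma^2}\Big[(\kappa - \rho\sigma p - \sqrt{\Delta})t + 2\ln\Big(\frac{\sqrt{\Delta}\,e^{\frac{\sqrt{\Delta}}{2}t}}{(\kappa - \rho\sigma p)\sinh\big(\frac{\sqrt{\Delta}}{2}t\big) + \sqrt{\Delta}\cosh\big(\frac{\sqrt{\Delta}}{2}t\big)}\Big)\Big], \]
where
\[ h = h(p) := \frac{\kappa - \rho\sigma p - \sqrt{\Delta}}{\kappa - \rho\sigma p + \sqrt{\Delta}}. \]
The solutions $\phi$, $\psi$ to these equations are defined for all $t \ge 0$ such that $(1 - he^{-\sqrt{\Delta}t})/(1 - h) > 0$, i.e. for $t \in [0,T^*)$ where
\[ T^* = T^*(p) = \begin{cases} \infty, & \text{if } \kappa - \rho\sigma p \ge 0, \\ \frac{1}{\sqrt{\Delta}}\ln h, & \text{if } \kappa - \rho\sigma p < 0. \end{cases} \]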
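When $\Delta(p) < 0$, we extend the functions $\phi$ and $\psi$ by analytic continuation by substituting $\sqrt{\Delta}$ by $i\sqrt{-\Delta}$, which yields:
\[ \psi(t,p) = p(p-1)\frac{\sin\big(\frac{\sqrt{-\Delta}}{2}t\big)}{(\kappa - \rho\sigma p)\sin\big(\frac{\sqrt{-\Delta}}{2}t\big) + \sqrt{-\Delta}\cos\big(\frac{\sqrt{-\Delta}}{2}t\big)}, \tag{3.31} \]
\[ \phi(t,p) = \frac{\kappa\theta}{\sigma^2}\Big[(\kappa - \rho\sigma p - i\sqrt{-\Delta})t + 2\ln\Big(\frac{\sqrt{-\Delta}\,e^{i\frac{\sqrt{-\Delta}}{2}t}}{(\kappa - \rho\sigma p)\sin\big(\frac{\sqrt{-\Delta}}{2}t\big) + \sqrt{-\Delta}\cos\big(\frac{\sqrt{-\Delta}}{2}t\big)}\Big)\Big]. \tag{3.32} \]
These functions are well-defined as long as
\[ (\kappa - \rho\sigma p)\sin\Big(\frac{\sqrt{-\Delta}}{2}t\Big) + \sqrt{-\Delta}\cos\Big(\frac{\sqrt{-\Delta}}{2}t\Big) > 0, \]
which corresponds to an explosion time
\[ T^* = T^*(p) = \frac{2}{\sqrt{-\Delta}}\Big[\pi 1_{\kappa - \rho\sigma p > 0} + \arctan\Big(\frac{\sqrt{-\Delta}}{\rho\sigma p - \kappa}\Big)\Big]. \]
Recalling that a Laplace transform is analytic in the interior of its convex domain (when it is not empty), we deduce that the function $\Gamma_t$ defined in (3.26) is explicitly given by
\[ \Gamma_t(p) = \begin{cases} \phi(t,p) + y_0\psi(t,p), & t < T^*(p),\ p \in R, \\ \infty, & t \ge T^*(p),\ p \in R. \end{cases} \]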
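Our purpose is to derive a LDP for $X_t - x_0$ when $t$ goes to zero, and thus, in view of the Gärtner-Ellis theorem, we need to determine the limiting moment generating function:
\[ \Gamma(p) := \lim_{t\to 0} t\Gamma_t(p/t). \]
We are then led to substitute $p \to p/t$ and let $t \to 0$ in the above calculations. Observe that for $t$ small, $\Delta(p/t) \sim -(1-\rho^2)\sigma^2 p^2/t^2$, and so:
\[ T^*(p/t) \sim \begin{cases} \frac{2t}{\sigma|p|\sqrt{1-\rho^2}}\Big[\pi 1_{\rho p < 0} + \mathrm{sgn}(p)\arctan\Big(\frac{\sqrt{1-\rho^2}}{\rho}\Big)\Big], & \text{for } \rho \ne 0,\ p \ne 0, \\ \frac{\pi t}{\sigma|p|}, & \text{for } \rho = 0,\ p \ne 0, \end{cases} \]
and $T^*(p/t) = \infty$ for $p = 0$. Hence, for $t > 0$ small, the set $\{t < T^*(p/t)\}$ may be written equivalently as $p \in (p_-,p_+)$ where $p_- < 0$ is defined by
\[ p_- = \begin{cases} \frac{2\arctan\big(\frac{\sqrt{1-\rho^2}}{\rho}\big)}{\sigma\sqrt{1-\rho^2}}, & \text{if } \rho < 0, \\ -\frac{\pi}{\sigma}, & \text{if } \rho = 0, \\ \frac{-2\pi + 2\arctan\big(\frac{\sqrt{1-\rho^2}}{\rho}\big)}{\sigma\sqrt{1-\rho^2}}, & \text{if } \rho > 0, \end{cases} \]
and $p_+ > 0$ by
\[ p_+ = \begin{cases} \frac{2\pi + 2\arctan\big(\frac{\sqrt{1-\rho^2}}{\rho}\big)}{\sigma\sqrt{1-\rho^2}}, & \text{if } \rho < 0, \\ \frac{\pi}{\sigma}, & \text{if } \rho = 0, \\ \frac{2\arctan\big(\frac{\sqrt{1-\rho^2}}{\rho}\big)}{\sigma\sqrt{1-\rho^2}}, & \text{if } \rho > 0. \end{cases} \]
Moreover, by observing that $t\sqrt{-\Delta(p/t)} \to \sigma\sqrt{1-\rho^2}\,|p|$, we find that for all $p \in (p_-,p_+)$:
\[ t\psi(t,p/t) \to \frac{p}{\sigma\big(\sqrt{1-\rho^2}\cot\big(\frac{1}{2}\sigma p\sqrt{1-\rho^2}\big) - \rho\big)}, \qquad t\phi(t,p/t) \to 0, \]
as $t$ goes to zero, the second limit holding because $\phi(t,p/t)$ remains bounded. We conclude that
\[ \Gamma(p) = \begin{cases} \frac{y_0\,p}{\sigma\big(\sqrt{1-\rho^2}\cot\big(\frac{1}{2}\sigma p\sqrt{1-\rho^2}\big) - \rho\big)}, & \text{for } p \in (p_-,p_+), \\ \infty, & \text{otherwise.} \end{cases} \]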
From the basic properties of moment generating functions, we know that $\Gamma$ is convex, lower-semicontinuous, and by direct inspection, we easily see that $\Gamma$ is smooth on $(p_-,p_+)$ with $\Gamma(p) \to \infty$ and $|\Gamma'(p)| \to \infty$ as $p \uparrow p_+$ and $p \downarrow p_-$. We can then apply the Gärtner-Ellis theorem, which implies that $X_t - x_0$ satisfies a LDP with rate function equal to the Fenchel-Legendre transform of $\Gamma$, i.e.
\[ \Gamma^*(x) = \sup_{p\in(p_-,p_+)}[px - \Gamma(p)], \quad x \in R. \tag{3.33} \]
(3.34)
(3.35)
t0
t0
xk
xk
|x|
,
2I(x)
and explicit expressions of $\Gamma$ and its Fenchel-Legendre transform $I = \Gamma^*$ in (3.33), we obtain
\[ \sigma_0(x) = \sqrt{y_0}\Big[1 + \frac{\rho\sigma}{4y_0}x + \Big(1 - \frac{5\rho^2}{2}\Big)\frac{\sigma^2}{24y_0^2}x^2 + O(x^3)\Big], \]
which determines the level, slope and curvature of the small-time implied volatility at the money in the Heston model, as in [46] and [16].
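The small-time Heston smile can be evaluated numerically from the limiting function $\Gamma$: the sketch below computes its Fenchel-Legendre transform (3.33) by a golden-section search over $(p_-,p_+)$, deduces the implied volatility $|x|/\sqrt{2\Gamma^*(x)}$, and provides the second-order expansion for comparison. The function names, the search tolerance and the small margin kept from the end points $p_\pm$ are choices made for this illustration; the symbols $y_0$, $\sigma$ (volatility of variance) and $\rho$ are those of (3.24)-(3.25).

```python
import math


def limit_cgf(p, y0, sigma, rho):
    """Gamma(p) = y0*p / (sigma*(rb*cot(sigma*p*rb/2) - rho)), rb = sqrt(1-rho^2)."""
    if p == 0.0:
        return 0.0
    rb = math.sqrt(1.0 - rho**2)
    theta = 0.5 * sigma * p * rb
    return y0 * p / (sigma * (rb * math.cos(theta) / math.sin(theta) - rho))


def cgf_domain(sigma, rho):
    """End points (p_-, p_+) of the interval on which Gamma is finite."""
    rb = math.sqrt(1.0 - rho**2)
    if rho == 0.0:
        return -math.pi / sigma, math.pi / sigma
    at = math.atan(rb / rho)
    scale = 2.0 / (sigma * rb)
    if rho < 0.0:
        return scale * at, scale * (math.pi + at)
    return scale * (at - math.pi), scale * at


def rate_function(x, y0, sigma, rho, n_iter=120):
    """Fenchel-Legendre transform (3.33), by golden-section search.

    p -> p*x - Gamma(p) is concave on (p_-, p_+), hence unimodal.
    """
    p_lo, p_hi = cgf_domain(sigma, rho)
    margin = 1e-9 * (p_hi - p_lo)
    lo, hi = p_lo + margin, p_hi - margin
    g = (math.sqrt(5.0) - 1.0) / 2.0

    def f(p):
        return p * x - limit_cgf(p, y0, sigma, rho)

    a, b = hi - g * (hi - lo), lo + g * (hi - lo)
    fa, fb = f(a), f(b)
    for _ in range(n_iter):
        if fa < fb:                      # maximum lies in [a, hi]
            lo, a, fa = a, b, fb
            b = lo + g * (hi - lo)
            fb = f(b)
        else:                            # maximum lies in [lo, b]
            hi, b, fb = b, a, fa
            a = hi - g * (hi - lo)
            fa = f(a)
    return f(0.5 * (lo + hi))


def small_time_implied_vol(x, y0, sigma, rho):
    """|x| / sqrt(2*Gamma^*(x)) for x != 0."""
    return abs(x) / math.sqrt(2.0 * rate_function(x, y0, sigma, rho))


def expansion_implied_vol(x, y0, sigma, rho):
    """Level, slope and curvature terms of the at-the-money expansion."""
    slope = rho * sigma / (4.0 * y0)
    curvature = (1.0 - 2.5 * rho**2) * sigma**2 / (24.0 * y0**2)
    return math.sqrt(y0) * (1.0 + slope * x + curvature * x * x)
```

For small $|x|$ the two functions should agree up to the neglected $O(x^3)$ term; for larger $|x|$ only `small_time_implied_vol` reflects the full rate function.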
4.1
4.1.1
A basic problem in measuring portfolio credit risk is determining the distribution of losses
from default over a fixed horizon. Credit portfolios are often large, including exposure to
thousands of obligors, and the default probabilities of high-quality credits are extremely
small. These features in credit risk context lead to consider rare but significant large loss
events, and emphasis is put on the small probabilities of large losses, as these are relevant
for calculation of value at risk and related risk measures.
We use the following notation:
n = number of obligors to which portfolio is exposed,
Yk = default indicator (= 1 if default, 0 otherwise) for k-th obligor,
pk = marginal probability that k-th obligor defaults, i.e. pk = P[Yk = 1],
ck = loss resulting from default of the k-th obligor,
Ln = c1 Y1 + . . . + cn Yn = total loss from defaults.
We are interested in estimating tail probabilities $P[L_n > \ell_n]$ in the limiting regime of increasingly high loss thresholds $\ell_n$, where large losses are rare and result from a large number $n$ of obligors and multiple defaults.
For simplicity, we consider a homogeneous portfolio where all $p_k$ are equal to $p$, and all $c_k$ are equal to 1. An essential feature for credit risk management is the mechanism used to model the dependence across sources of credit risk. The dependence among obligors is modelled by the dependence among the default indicators $Y_k$. This dependence is introduced through a normal copula model as follows: each default indicator is represented as
\[ Y_k = 1_{\{X_k > x_k\}}, \quad k = 1,\ldots,n, \]
where $(X_1,\ldots,X_n)$ is a multivariate normal vector. Without loss of generality, we take each $X_k$ to have a standard normal distribution, and we choose $x_k$ to match the marginal default probability $p_k$, i.e. $x_k = \Phi^{-1}(1 - p_k) = -\Phi^{-1}(p_k)$, with $\Phi$ the cumulative normal distribution. We also denote by $\varphi = \Phi'$ the density of the normal distribution. The correlations among the $X_k$, which determine the dependence among the $Y_k$, are specified through a single factor model of the form:
\[ X_k = \rho Z + \sqrt{1-\rho^2}\,\varepsilon_k, \quad k = 1,\ldots,n, \tag{4.1} \]
where $Z$ has the standard normal distribution $N(0,1)$, the $\varepsilon_k$ are independent with $N(0,1)$ distribution, and $Z$ is independent of $\varepsilon_k$, $k = 1,\ldots,n$. $Z$ is called the systematic risk factor (industry or regional risk factors, for example), and $\varepsilon_k$ is an idiosyncratic risk associated with the $k$-th obligor. The constant $\rho$ in $[0,1)$ is a factor loading on the single factor $Z$, and is assumed here to be identical for all obligors. We shall distinguish the case of independent obligors ($\rho = 0$), and dependent obligors ($\rho > 0$). More general multivariate factor models with inhomogeneous obligors are studied in [31]. Other recent works dealing with large deviations in credit risk include the paper [50], which analyzes rare events related to losses in senior tranches of CDO, and the paper [45], which studies the portfolio loss process.
4.1.2
Independent obligors
In this case, $\rho = 0$, the default indicators $Y_k$ are i.i.d. with Bernoulli distribution of parameter $p$, and $L_n$ has a binomial distribution with parameters $n$ and $p$. By the law of large numbers, $L_n/n$ converges to $p$. Hence, in order that the loss event $\{L_n \ge \ell_n\}$ becomes rare (without being trivially impossible), we let $\ell_n/n$ approach $q \in (p,1)$. It is then appropriate to specify $\ell_n = nq$ with $p < q < 1$. From Cramér's theorem and the expressions of the c.g.f. of the Bernoulli distribution and its Fenchel-Legendre transform, we obtain the large deviation result for the loss probability:
\[ \lim_{n\to\infty}\frac{1}{n}\ln P[L_n \ge nq] = -q\ln\frac{q}{p} - (1-q)\ln\frac{1-q}{1-p} < 0. \]
Remark 4.1 By denoting $\Gamma(\theta) = \ln(1 - p + pe^\theta)$ the c.g.f. of $Y_k$, we have an IS (unbiased) estimator of $P[L_n \ge nq]$ by taking the average of independent replications of
\[ \exp(-\theta L_n + n\Gamma(\theta))1_{L_n \ge nq}, \]
where $L_n$ is sampled with a default probability $p(\theta) = P_\theta[Y_k = 1] = pe^\theta/(1 - p + pe^\theta)$. Moreover, see Remark 2.3, this estimator is asymptotically optimal, as $n$ goes to infinity, for the choice of parameter $\theta_q \ge 0$ attaining the argmax in $\theta q - \Gamma(\theta)$.
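The estimator of Remark 4.1 can be sketched in a few lines for the independent case. The code below uses the parameter $\theta_q$ solving $\Gamma'(\theta) = q$, for which the twisted default probability $p(\theta_q)$ equals $q$, and compares against the exact binomial tail. Function names and the numerical values used in any call are assumptions of this example.

```python
import math
import random


def cramer_rate(q, p):
    """Fenchel-Legendre transform of the Bernoulli c.g.f. at q in (p, 1)."""
    return q * math.log(q / p) + (1.0 - q) * math.log((1.0 - q) / (1.0 - p))


def exact_tail(n, p, q):
    """P[L_n >= n*q] for L_n binomial with parameters n and p."""
    k0 = math.ceil(n * q - 1e-9)
    return sum(math.comb(n, k) * p**k * (1.0 - p)**(n - k) for k in range(k0, n + 1))


def twisted_tail_estimate(n, p, q, n_samples, rng):
    """Importance sampling estimate of P[L_n >= n*q] by exponential twisting.

    Defaults are sampled with probability p(theta_q) = q, and each replication
    is weighted by exp(-theta_q*L_n + n*Gamma(theta_q)).
    Returns (sample mean, standard error).
    """
    theta = math.log(q * (1.0 - p) / (p * (1.0 - q)))   # solves Gamma'(theta) = q
    gamma = math.log(1.0 - p + p * math.exp(theta))     # c.g.f. at theta
    k0 = math.ceil(n * q - 1e-9)
    total = total_sq = 0.0
    for _ in range(n_samples):
        losses = sum(1 for _ in range(n) if rng.random() < q)
        x = math.exp(-theta * losses + n * gamma) if losses >= k0 else 0.0
        total += x
        total_sq += x * x
    mean = total / n_samples
    var = max(total_sq / n_samples - mean**2, 0.0)
    return mean, math.sqrt(var / n_samples)
```

A plain Monte-Carlo estimator would need on the order of $1/P[L_n \ge nq]$ replications to observe the event at all, whereas under the twisted measure the event $\{L_n \ge nq\}$ occurs in a non-negligible fraction of replications. The quantity `math.exp(-n * cramer_rate(q, p))` is the Chernoff upper bound on the same probability.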
4.1.3
Dependent obligors
We consider the case where $\rho > 0$. Then, conditionally on the factor $Z$, the default indicators $Y_k$ are i.i.d. with Bernoulli distribution of parameter:
\[ p(Z) = P[Y_k = 1|Z] = P\big[\rho Z + \sqrt{1-\rho^2}\,\varepsilon_k > -\Phi^{-1}(p)\big|Z\big] = \Phi\Big(\frac{\rho Z + \Phi^{-1}(p)}{\sqrt{1-\rho^2}}\Big). \tag{4.2} \]
Hence, by the law of large numbers, $L_n/n$ converges in law to the random variable $p(Z)$ valued in $(0,1)$. In order that $\{L_n \ge \ell_n\}$ becomes a rare event (without being impossible) as $n$ increases, we therefore let $\ell_n/n$ approach 1 from below. We then set
\[ \ell_n = nq_n, \quad \text{with } q_n < 1,\ q_n \nearrow 1 \text{ as } n \to \infty, \tag{4.3} \]
and we assume that $q_n$ increases to 1 at a rate of order $n^{-a}$:
\[ 1 - q_n = O(n^{-a}), \quad \text{with } 0 < a \le 1. \tag{4.4} \]
We then state the large deviations result for the large loss threshold regime.
Theorem 4.1 In the single-factor homogeneous portfolio credit risk model (4.1), and with large threshold $\ell_n$ as in (4.3)-(4.4), we have
\[ \lim_{n\to\infty}\frac{1}{\ln n}\ln P[L_n \ge nq_n] = -a\frac{1-\rho^2}{\rho^2}. \]
Observe that in the above theorem, we normalize by $\ln n$, indicating that the probability decays like $n^{-\gamma}$, with $\gamma = a(1-\rho^2)/\rho^2$. We find that the decay rate is determined by the effect of the dependence structure in the Gaussian copula model. When $\rho$ is small (weak dependence between sources of credit risk), large losses occur very rarely, which is formalized by a high decay rate. In the opposite case, this decay rate is small when $\rho$ tends to one, which means that large losses are most likely to result from systematic risk factors.
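The quantities entering Theorem 4.1 are easy to evaluate numerically. The sketch below, whose function names and grid parameters are assumptions of this example, computes the conditional default probability (4.2), the factor level $z$ at which $p(z)$ reaches a given threshold $q$, and the decay exponent $\gamma$; it also checks by quadrature that averaging $p(Z)$ over the factor recovers the marginal default probability $p$.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def conditional_default_prob(z, p, rho):
    """p(z) of (4.2): default probability given the factor value Z = z."""
    return _STD.cdf((rho * z + _STD.inv_cdf(p)) / math.sqrt(1.0 - rho**2))


def factor_threshold(q, p, rho):
    """Factor level z solving p(z) = q, for rho > 0."""
    return (math.sqrt(1.0 - rho**2) * _STD.inv_cdf(q) - _STD.inv_cdf(p)) / rho


def decay_exponent(a, rho):
    """gamma = a*(1 - rho^2)/rho^2, the polynomial decay rate of Theorem 4.1."""
    return a * (1.0 - rho**2) / rho**2


def average_default_prob(p, rho, n_grid=4000, z_max=8.0):
    """Trapezoidal approximation of E[p(Z)] for Z standard normal."""
    h = 2.0 * z_max / n_grid
    total = 0.0
    for i in range(n_grid + 1):
        z = -z_max + i * h
        weight = 0.5 if i in (0, n_grid) else 1.0
        total += weight * conditional_default_prob(z, p, rho) * _STD.pdf(z)
    return total * h
```

For instance, with $q_n = 1 - n^{-a}$, the values `factor_threshold(q_n, p, rho)` grow with $n$, which illustrates that losses of order $nq_n$ require an increasingly extreme realization of the systematic factor $Z$.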
Proof. 1) We first prove the lower bound:
\[ \liminf_{n\to\infty}\frac{1}{\ln n}\ln P[L_n \ge nq_n] \ge -a\frac{1-\rho^2}{\rho^2}. \tag{4.5} \]
(4.6)
n 1.
(4.7)
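Now given $p(Z) = q_n$, $L_n$ is binomially distributed with parameters $n$ and $q_n$, and thus
\[ P[L_n \ge nq_n | p(Z) = q_n] \ge 1 - \Phi(0) = \frac{1}{2}\ (> 0). \tag{4.8} \]
We focus on the tail probability $P[Z \ge z_n]$ as $n$ goes to infinity. First, observe that since $q_n$ goes to 1, we have $z_n$ going to infinity as $n$ tends to infinity. Furthermore, from the expression (4.2) of $p(z)$, the rate of decrease (4.4), and using the property that $1 - \Phi(x) \simeq \varphi(x)/x$ as $x \to \infty$, we have
\[ O(n^{-a}) = 1 - q_n = 1 - p(z_n) = 1 - \Phi\Big(\frac{\rho z_n + \Phi^{-1}(p)}{\sqrt{1-\rho^2}}\Big) \simeq \frac{1}{\sqrt{2\pi}}\,\frac{\sqrt{1-\rho^2}}{\rho z_n + \Phi^{-1}(p)}\exp\Big(-\frac{1}{2}\Big(\frac{\rho z_n + \Phi^{-1}(p)}{\sqrt{1-\rho^2}}\Big)^2\Big), \]
as $n \to \infty$, so that by taking logarithm:
\[ a\ln n - \frac{1}{2}\frac{\rho^2 z_n^2}{1-\rho^2} - \ln z_n = O(1). \]
This implies
\[ \lim_{n\to\infty}\frac{z_n^2}{\ln n} = 2a\frac{1-\rho^2}{\rho^2}. \tag{4.9} \]
By writing
\[ P[Z \ge z_n] \ge P[z_n \le Z \le z_n + 1] \ge \frac{1}{\sqrt{2\pi}}\exp\Big(-\frac{1}{2}(z_n+1)^2\Big), \]
we deduce with (4.9)
\[ \liminf_{n\to\infty}\frac{1}{\ln n}\ln P[Z \ge z_n] \ge -a\frac{1-\rho^2}{\rho^2}. \]
Combining with (4.7) and (4.8), we get the required lower bound (4.5).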
2) We now focus on the upper bound
\[ \limsup_{n\to\infty}\frac{1}{\ln n}\ln P[L_n \ge nq_n] \le -a\frac{1-\rho^2}{\rho^2}. \tag{4.10} \]
(4.11)
(4.12)
(4.13)
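where
\[ \Gamma^*(q,z) = \sup_{\theta\ge 0}[\theta q - \Gamma(\theta,z)] = \begin{cases} 0, & \text{if } q \le p(z), \\ q\ln\frac{q}{p(z)} + (1-q)\ln\frac{1-q}{1-p(z)}, & \text{if } p(z) < q \le 1. \end{cases} \]
This yields
\[ P[L_n \ge nq_n] \le E\big[e^{F_n(Z)}\big], \tag{4.14} \]
where we set $F_n(z) = -n\Gamma^*(q_n,z)$. Since $\rho > 0$, the function $p(z)$ is increasing in $z$, so $\Gamma(\theta,z)$ is an increasing function of $z$ for all $\theta \ge 0$. Hence, $F_n(z)$ is an increasing function of $z$, which is nonpositive and attains its maximum value 0 for all $z$ s.t. $q_n = p(z_n) \le p(z)$, i.e. $z \ge z_n$. Moreover, by differentiation, we can check that $F_n$ is a concave function of $z$.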
We now introduce a change of measure. The idea is to shift the factor mean to reduce the variance of the term $e^{F_n(Z)}$ in the r.h.s. of (4.14). We consider the change of measure $P_\mu$ that puts the distribution of $Z$ to $N(\mu,1)$. Its likelihood ratio is given by
\[ \frac{dP_\mu}{dP} = \exp\Big(\mu Z - \frac{1}{2}\mu^2\Big), \]
so that
\[ E\big[e^{F_n(Z)}\big] = E_\mu\big[e^{F_n(Z) - \mu Z + \frac{1}{2}\mu^2}\big], \]
where $E_\mu$ denotes the expectation under $P_\mu$. By concavity of $F_n$, we have $F_n(Z) \le F_n(\mu) + F_n'(\mu)(Z - \mu)$, so that
\[ E\big[e^{F_n(Z)}\big] \le E_\mu\big[e^{F_n(\mu) + (F_n'(\mu) - \mu)Z - \mu F_n'(\mu) + \frac{1}{2}\mu^2}\big]. \tag{4.15} \]
We now choose $\mu = \mu_n$ solution to
\[ F_n'(\mu_n) = \mu_n, \tag{4.16} \]
so that the term in the expectation in the r.h.s. of (4.15) does not depend on $Z$, and is therefore a constant term (with zero variance). Such a $\mu_n$ exists, since, by strict concavity of the function $z \to F_n(z) - \frac{1}{2}z^2$, equation (4.16) is the first-order equation associated to the optimization problem:
\[ \mu_n = \arg\max_{\mu\in R}\Big[F_n(\mu) - \frac{1}{2}\mu^2\Big]. \]
With this choice of factor mean $\mu_n$, and by inequalities (4.14), (4.15), we get
\[ P[L_n \ge nq_n] \le e^{F_n(\mu_n) - \frac{1}{2}\mu_n^2}. \tag{4.17} \]
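We now prove that $\mu_n/z_n$ converges to 1 as $n$ goes to infinity. Actually, we show that for all $\varepsilon > 0$, there is $n_0$ large enough so that for all $n \ge n_0$, $z_n(1-\varepsilon) < \mu_n < z_n$. Since $F_n'(\mu_n) - \mu_n = 0$, and the function $F_n'(z) - z$ is decreasing by concavity of $F_n(z) - z^2/2$, it suffices to show that
\[ F_n'(z_n(1-\varepsilon)) - z_n(1-\varepsilon) > 0 \quad \text{and} \quad F_n'(z_n) - z_n < 0. \tag{4.18} \]
We have
\[ F_n'(z) = n\Big(\frac{q_n}{p(z)} - \frac{1-q_n}{1-p(z)}\Big)\frac{\rho}{\sqrt{1-\rho^2}}\,\varphi\Big(\frac{\rho z + \Phi^{-1}(p)}{\sqrt{1-\rho^2}}\Big). \]
The second inequality in (4.18) holds since $F_n'(z_n) = 0$ and $z_n > 0$ for $q_n > p$, hence for $n$ large enough. Actually, $z_n$ goes to infinity as $n$ goes to infinity from (4.9). For the first inequality in (4.18), we use the property that $1 - \Phi(x) \simeq \varphi(x)/x$ as $x \to \infty$, so that
\[ \lim_{n\to\infty}\frac{p(z_n)}{p(z_n(1-\varepsilon))} = 1, \quad \text{and} \quad \lim_{n\to\infty}\frac{1 - p(z_n)}{1 - p(z_n(1-\varepsilon))} = 0. \]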
= 0(na(1) ),
2
1
and therefore
2
2
zn (1 ) = 0( ln n) = o(n1a(1) )
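We deduce that for $n$ large enough $F_n'(z_n(1-\varepsilon)) - z_n(1-\varepsilon) > 0$ and so (4.18).
Finally, recalling that $F_n$ is nonpositive, and from (4.17), we obtain:
\[ \limsup_{n\to\infty}\frac{1}{\ln n}\ln P[L_n \ge nq_n] \le \lim_{n\to\infty} -\frac{1}{2}\frac{\mu_n^2}{\ln n} = \lim_{n\to\infty} -\frac{1}{2}\frac{z_n^2}{\ln n} = -a\frac{1-\rho^2}{\rho^2}. \tag{4.19} \]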
$M_n^2(\mu,q_n) \ge (P[L_n \ge nq_n])^2$, so that the fastest possible rate of decay of $M_n^2(\mu,q_n)$ is twice that of the probability itself:
\[ \liminf_{n\to\infty}\frac{1}{\ln n}\ln M_n^2(\mu,q_n) \ge 2\lim_{n\to\infty}\frac{1}{\ln n}\ln P[L_n \ge nq_n]. \tag{4.21} \]
Moreover,
\[ \lim_{n\to\infty}\frac{1}{\ln n}\ln M_n^2(\mu_n,q_n) \le -2a\frac{1-\rho^2}{\rho^2} = 2\lim_{n\to\infty}\frac{1}{\ln n}\ln P[L_n \ge nq_n], \]
and thus the estimator (4.20) for the choice $\mu = \mu_n$ is asymptotically optimal. The choice of $\mu = z_n$ also leads to an asymptotically optimal estimator.
Remark 4.2 Large deviation results can also be proved by similar methods for the loss distribution in the limiting regime where individual loss probabilities decrease toward zero, see [31] for the details. This setting is relevant to portfolios of highly-rated obligors, for which one-year default probabilities are extremely small. This is also relevant to measuring risk over short time horizons. In this limiting regime, we set
$\ell_n = nq$,
Then,
\[ \lim_{n\to\infty}\frac{1}{n}\ln P[L_n \ge nq] = -\frac{a}{\rho^2}, \]
and we may construct, similarly as in the case of large losses, a two-step IS asymptotically optimal estimator.
4.2
Arbitrage is the cornerstone concept of modern mathematical finance, and several versions
of the so-called fundamental theorem of asset pricing have been proved, see [8] for an
overview. This theorem states that absence of arbitrage is essentially equivalent to the
existence of an equivalent martingale measure. While typical models admit equivalent
martingale measures up to a finite horizon, this is not true globally. This means that short-term arbitrage would not exist, but arbitrage opportunities may arise in the long run. The
existence of such infinite horizon arbitrage opportunities has been studied for example in
[38]. We present here some results in [23] on explicit estimates for asymptotic arbitrage,
and related to large deviation estimates for the market price of risk.
Let us consider for simplicity a one dimensional diffusion model for the stock price
\[ dS_t = \sigma(S_t)\big(dW_t + \phi(S_t)dt\big), \tag{4.22} \]
on a filtered probability space $(\Omega, F, \mathbb{F} = (F_t)_{t\ge 0}, P)$, with $W$ a standard Brownian motion, $\sigma$ the (local) volatility coefficient, and $\phi$ the so-called market price of risk function: the stock's rate of return per unit volatility. We assume that the Doléans-Dade exponential process
\[ Z_t = \exp\Big(-\int_0^t \phi(S_u)dW_u - \frac{1}{2}\int_0^t |\phi(S_u)|^2 du\Big), \quad t \ge 0, \tag{4.23} \]
is a martingale, so that from Girsanov's theorem, for each $T > 0$, the measure $Q_T$ on $F_T$ defined by
\[ \frac{dQ_T}{dP} = Z_T, \]
(ii) $P[X_T \ge e^{\varepsilon T}] \ge 1 - \varepsilon$.
(4.25)
Here, for $T > 0$, we say that the trading strategy $\gamma$ is admissible if it is predictable and $S$-integrable, and the corresponding outcome process $\{X_t = \int_0^t \gamma_u dS_u, 0 \le t \le T\}$ is bounded from below by a constant. The message of the result (4.25) is the following. It says that for any tolerance level $\varepsilon$, one may find $T_\varepsilon$ large enough such that, for $T \ge T_\varepsilon$, an exponentially growing profit can be obtained on $[0,T]$ with an exponentially decreasing potential loss and with a probability of failure below $\varepsilon$. This can be interpreted as a strong form of asymptotic arbitrage. However, the relation between $\varepsilon$ and $T_\varepsilon$ is not clarified, and the trading strategies and outcomes $X_T$ are not explicitly given (indeed, the proof is non-constructive).
We go further on such asymptotic arbitrage results by defining a stronger formulation on the market price of risk than (4.24).
Definition 4.1 We say that the market price of risk satisfies a large deviations estimate
if there are constants c1 , c2 > 0 s.t.
Z
i
1 h1 T
lim sup ln P
|(St )|2 dt c1
< c2 .
(4.26)
T
T 0
T
For the Black-Scholes model, the market price of risk is constant, and so the estimate (4.26) is trivially satisfied if the market price of risk is nonzero. For other models, such a large deviations estimate for the market price of risk would follow by a contraction principle whenever the diffusion process is ergodic and satisfies a large deviations principle. In the case of the geometric Ornstein-Uhlenbeck process, this can be derived more directly from explicit calculations of the Laplace transform and the Gärtner-Ellis theorem.
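The Black-Scholes claim can be checked in one line. The following is a short worked computation, writing $\theta \neq 0$ for the constant market price of risk (the symbol is a notational assumption) and using only the definition (4.26):

```latex
% Black-Scholes: \theta(S_t) \equiv \theta \neq 0, so the time average is deterministic.
\frac{1}{T}\int_0^T |\theta(S_t)|^2\,dt = \theta^2
\quad\Longrightarrow\quad
P\Big[\frac{1}{T}\int_0^T |\theta(S_t)|^2\,dt \le c_1\Big] = 0
\quad \text{for every } c_1 \in (0, \theta^2).
% Hence, with the convention \ln 0 = -\infty,
\limsup_{T\to\infty}\frac{1}{T}\ln P\Big[\frac{1}{T}\int_0^T |\theta(S_t)|^2\,dt \le c_1\Big]
= -\infty < -c_2 \quad \text{for any } c_2 > 0.
```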
Under the stronger large deviation estimate (4.26), one expects a strengthening of (4.25) with an exponential decay in time for the probability of falling short of the exponential lower bound in (4.25) (ii):
$$P[X_T < e^{\gamma_1 T}] \le e^{-\gamma_3 T},$$
for some positive constants $\gamma_1$, $\gamma_3$. Such a result would establish an explicit relationship between a preset tolerance level and the time necessary to reach that tolerance level.
We first illustrate this kind of result in the simple case of the Black-Scholes model with constant volatility $\sigma > 0$, and where the market price of risk function is a constant $\theta$:
$$dS_t = \sigma S_t\,(dW_t + \theta\,dt), \quad t \ge 0. \tag{4.27}$$
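Theorem 4.2 Consider the Black-Scholes model (4.27). Take $\lambda \in (0, \theta^2/2)$, $0 < \gamma_1 < \lambda$, and set for all $T > 0$, $A_T = \{Z_T \ge e^{-\lambda T}\}$, $\xi_T = e^{\gamma_1 T}\,Q[A_T^c]/Q[A_T]$. Then, the claim
$$X_T = e^{\gamma_1 T}\,1_{A_T^c} - \xi_T\,1_{A_T} \tag{4.28}$$
is attainable from zero initial capital, i.e. there exists an admissible trading strategy $\alpha$ s.t. $X_T = \int_0^T \alpha_t\,dS_t$, and satisfies for any $0 < \gamma_2 < \lambda - \gamma_1$:
$$X_T \ge -e^{-\gamma_2 T}, \quad \text{for large } T, \tag{4.29}$$
$$\lim_{T\to\infty} \frac{1}{T} \ln P[X_T < e^{\gamma_1 T}] = -\frac{1}{2}\Big(\frac{\theta}{2} - \frac{\lambda}{\theta}\Big)^2. \tag{4.30}$$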
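Proof. Since $Z_T < e^{-\lambda T}$ on $A_T^c$, we have
$$Q[A_T^c] = \int_{\{Z_T < e^{-\lambda T}\}} Z_T\,dP \le e^{-\lambda T},$$
and so
$$\xi_T \le \frac{e^{-(\lambda - \gamma_1)T}}{1 - e^{-\lambda T}} \le e^{-\gamma_2 T} \quad \text{for large } T, \text{ if } \gamma_2 < \lambda - \gamma_1. \tag{4.31}$$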
Consider now the contingent claim $X_T$ defined in (4.28), which is $\mathcal{F}_T^W$-measurable. Clearly, $X_T \ge -\xi_T$, and we have by construction:
$$E^Q[X_T] = e^{\gamma_1 T}\,Q[A_T^c] - \xi_T\,Q[A_T] = 0.$$
By the martingale representation theorem applied to the $Q$-martingale $E^Q[X_T \mid \mathcal{F}_t]$, there exists an admissible trading strategy $\alpha$ s.t. $X_T = \int_0^T \alpha_t\,dS_t$. It remains to show the large deviation estimate (4.30). Observe by definition of $X_T$ in (4.28) that
$$\{X_T < e^{\gamma_1 T}\} = A_T.$$
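Thus, taking $\theta > 0$ (the case $\theta < 0$ is symmetric),
$$P[X_T < e^{\gamma_1 T}] = P[Z_T \ge e^{-\lambda T}] = P\Big[\theta W_T \le -\Big(\frac{\theta^2}{2} - \lambda\Big)T\Big] = \Phi\Big(-\Big(\frac{\theta}{2} - \frac{\lambda}{\theta}\Big)\sqrt{T}\Big),$$
where $\Phi$ denotes the distribution function of a standard normal variable with density $\varphi$. Finally, recalling the equivalence $\Phi(-d) \sim \varphi(d)/d$ as $d$ goes to infinity, we obtain the large deviation estimate (4.30). □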
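The convergence in (4.30) can be observed numerically. The following is a minimal sketch, assuming the Gaussian expression for the shortfall probability derived in the proof above; the function names and the parameter values (`theta = 0.5`, `lam = 0.05`) are illustrative assumptions, not taken from the text.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal distribution function, via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def shortfall_log_rate(theta: float, lam: float, T: float) -> float:
    """(1/T) ln P[Z_T >= exp(-lam*T)] in the Black-Scholes model (theta > 0 assumed)."""
    c = theta / 2.0 - lam / theta
    return math.log(normal_cdf(-c * math.sqrt(T))) / T


def limiting_rate(theta: float, lam: float) -> float:
    """Right-hand side of (4.30)."""
    return -0.5 * (theta / 2.0 - lam / theta) ** 2


# Illustrative parameters with 0 < lam < theta^2 / 2.
theta, lam = 0.5, 0.05
for T in (100.0, 1000.0, 10000.0, 40000.0):
    print(T, shortfall_log_rate(theta, lam, T), limiting_rate(theta, lam))
```

The gap between the two printed columns shrinks like $\ln(T)/T$, which is the polynomial prefactor hidden in the equivalence $\Phi(-d) \sim \varphi(d)/d$.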
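Remark 4.3 1. We can compute explicitly the constant $\xi_T$ in the assertion of Theorem 4.2. Indeed, notice by Girsanov's theorem that $W_t^Q = W_t + \theta t$ is a $Q$-Brownian motion, so that (for $\theta > 0$)
$$Q[A_T] = Q\Big[\theta W_T^Q \le \Big(\frac{\theta^2}{2} + \lambda\Big)T\Big] = \Phi\Big(\Big(\frac{\theta}{2} + \frac{\lambda}{\theta}\Big)\sqrt{T}\Big),$$
and so
$$\xi_T = e^{\gamma_1 T}\,\frac{\Phi\big(-(\frac{\theta}{2} + \frac{\lambda}{\theta})\sqrt{T}\big)}{\Phi\big((\frac{\theta}{2} + \frac{\lambda}{\theta})\sqrt{T}\big)}.$$
2. The decay rate in (4.30) is optimal under the constraint $X_T \ge -\xi_T$. More precisely, for any admissible outcome $\tilde X_T = \int_0^T \tilde\alpha_t\,dS_t$ s.t. $\tilde X_T \ge -\xi_T$, we can show by using the Neyman-Pearson lemma that $P[\tilde X_T < e^{\gamma_1 T}] \ge P[Z_T \ge e^{-\lambda T}]$, and so the shortfall probability $P[\tilde X_T < e^{\gamma_1 T}]$ cannot decay at a faster rate than described in (4.30).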
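The explicit constant of Remark 4.3 can be evaluated directly to see the bound (4.29) at work. This is a sketch under the assumption that the closed form above holds with $\theta > 0$; it works with logarithms to avoid overflow, and the helper names and parameter values are hypothetical.

```python
import math


def log_normal_cdf(x: float) -> float:
    """ln of the standard normal distribution function."""
    return math.log(0.5 * math.erfc(-x / math.sqrt(2.0)))


def log_xi(theta: float, lam: float, gamma1: float, T: float) -> float:
    """ln xi_T = gamma1*T + ln Phi(-s) - ln Phi(s), with s = (theta/2 + lam/theta)*sqrt(T)."""
    s = (theta / 2.0 + lam / theta) * math.sqrt(T)
    return gamma1 * T + log_normal_cdf(-s) - log_normal_cdf(s)


# Illustrative values: lam < theta^2/2, gamma1 < lam, gamma2 < lam - gamma1.
theta, lam, gamma1, gamma2 = 0.5, 0.1, 0.04, 0.05
for T in (50.0, 100.0, 500.0, 1000.0):
    # (4.29) requires xi_T <= exp(-gamma2*T), i.e. ln xi_T <= -gamma2*T.
    print(T, log_xi(theta, lam, gamma1, T), -gamma2 * T)
```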
We next consider the less trivial example of the geometric Ornstein-Uhlenbeck process for the stock price (also known as the Platen-Rebolledo model):
$$S_t = \exp(Y_t), \tag{4.32}$$
$$dY_t = -\kappa Y_t\,dt + \sigma\,dW_t, \tag{4.33}$$
with a market price of risk $\theta_t = \frac{\sigma}{2} - \frac{\kappa}{\sigma} Y_t$, which satisfies the large deviation estimate (4.26) with an explicit rate function identified in [21]. The Doléans-Dade process
$$Z_t = \exp\Big(-\int_0^t \Big(\frac{\sigma}{2} - \frac{\kappa}{\sigma} Y_u\Big)dW_u - \frac{1}{2}\int_0^t \Big(\frac{\sigma}{2} - \frac{\kappa}{\sigma} Y_u\Big)^2 du\Big) \tag{4.34}$$
is a true martingale (see Proposition 2.5 in [1]), and this defines, for each $T > 0$, a unique equivalent martingale measure $Q$ on $\mathcal{F}_T$ for $S$. Similarly as in Theorem 4.2, we have the following large deviation result.
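Theorem 4.3 Consider the geometric Ornstein-Uhlenbeck model (4.32)-(4.33). Take $\lambda \in (0, \frac{\sigma^2}{8} + \frac{\kappa}{4})$, $0 < \gamma_1 < \lambda$, and set for all $T > 0$, $A_T = \{Z_T \ge e^{-\lambda T}\}$, $\xi_T = e^{\gamma_1 T}\,Q[A_T^c]/Q[A_T]$. Then, the claim
$$X_T = e^{\gamma_1 T}\,1_{A_T^c} - \xi_T\,1_{A_T} \tag{4.35}$$
is attainable from zero initial capital, i.e. there exists an admissible trading strategy $\alpha$ s.t. $X_T = \int_0^T \alpha_t\,dS_t$, and satisfies for any $0 < \gamma_2 < \lambda - \gamma_1$:
$$X_T \ge -e^{-\gamma_2 T}, \quad \text{for large } T, \tag{4.36}$$
$$\lim_{T\to\infty} \frac{1}{T} \ln P[X_T < e^{\gamma_1 T}] = -\frac{\big(\frac{\sigma^2}{8} + \frac{\kappa}{4} - \lambda\big)^2}{\frac{\sigma^2}{8} + \frac{\kappa}{2} - \lambda}. \tag{4.37}$$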
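Proof. The proof follows the same lines of arguments as in Theorem 4.2. Keeping the same notations, we have $\{X_T < e^{\gamma_1 T}\} = A_T$, and we only have to prove the large deviation estimate:
$$\lim_{T\to\infty} \frac{1}{T} \ln P[A_T] = -\frac{\big(\frac{\sigma^2}{8} + \frac{\kappa}{4} - \lambda\big)^2}{\frac{\sigma^2}{8} + \frac{\kappa}{2} - \lambda}. \tag{4.38}$$
Using (4.33) to express $dW_t$ in terms of $dY_t$, we write
$$A_T = \{Z_T \ge e^{-\lambda T}\} = \Big\{\int_0^T Y_t\,dY_t + \frac{\kappa}{2}\int_0^T Y_t^2\,dt + \frac{\sigma^2}{2\kappa}(Y_0 - Y_T) \ge \frac{\sigma^2}{\kappa}\Big(\frac{\sigma^2}{8} - \lambda\Big)T\Big\} = \Big\{\frac{1}{T}\eta_T + \frac{1}{T}\zeta_T \ge \frac{\sigma^2}{\kappa}\Big(\frac{\sigma^2}{8} - \lambda\Big)\Big\},$$
with $\eta_T = \int_0^T Y_t\,dY_t + \frac{\kappa}{2}\int_0^T Y_t^2\,dt$ and $\zeta_T = \frac{\sigma^2}{2\kappa}(Y_0 - Y_T)$. We then use results in [21], which state large deviations with an explicit rate function for $\eta_T/T$, combined with a perturbation argument for $\zeta_T/T$ (see details in [23]), to derive the large deviation estimate (4.38). □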
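The role of the threshold $\frac{\sigma^2}{8} + \frac{\kappa}{4}$ can be illustrated by simulation: by ergodicity, $\frac{1}{T}\ln Z_T$ should concentrate around minus this value, so that $\{Z_T \ge e^{-\lambda T}\}$ is a rare event exactly when $\lambda$ lies below it. The sketch below is a sanity check only. It assumes the dynamics $dY_t = -\kappa Y_t\,dt + \sigma\,dW_t$ with $Y_0 = 0$ and the market price of risk $\sigma/2 - \kappa Y_t/\sigma$, uses a plain Euler scheme, and all function names and parameter values are hypothetical.

```python
import math
import random


def simulate_log_density_rate(kappa, sigma, T, dt=0.01, seed=0):
    """Euler scheme for dY = -kappa*Y dt + sigma dW; returns (1/T) ln Z_T on one path."""
    rng = random.Random(seed)
    y, log_z = 0.0, 0.0
    sqrt_dt = math.sqrt(dt)
    for _ in range(int(round(T / dt))):
        dw = rng.gauss(0.0, sqrt_dt)
        theta = sigma / 2.0 - kappa * y / sigma  # market price of risk at the current time
        log_z += -theta * dw - 0.5 * theta * theta * dt
        y += -kappa * y * dt + sigma * dw
    return log_z / T


def ergodic_limit(kappa, sigma):
    """Expected long-run value of (1/T) ln Z_T under the stationary law of Y."""
    return -(sigma ** 2 / 8.0 + kappa / 4.0)


def decay_rate(kappa, sigma, lam):
    """Decay rate appearing on the right-hand side of (4.37), without the minus sign."""
    lam_bar = sigma ** 2 / 8.0 + kappa / 4.0
    return (lam_bar - lam) ** 2 / (lam_bar + kappa / 4.0 - lam)


print(simulate_log_density_rate(1.0, 0.5, T=500.0), ergodic_limit(1.0, 0.5))
print(decay_rate(1.0, 0.5, 0.1))
```

The decay rate vanishes as $\lambda$ increases to $\frac{\sigma^2}{8} + \frac{\kappa}{4}$, which is consistent with the range of $\lambda$ allowed in Theorem 4.3.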
Remark 4.4 The key point in Theorem 4.2 and Theorem 4.3 is the exponential decay of the probabilities
$$P[Z_T \ge e^{-\lambda T}]. \tag{4.39}$$
For general models as in (4.22) with density of martingale measure (4.23), such a result would follow from the Donsker-Varadhan large deviation principle for the empirical distribution of $S$ and the contraction principle. Indeed, observe by Itô's formula that
$$\int_0^T \theta(S_t)\,dW_t = \int_0^T \frac{\theta(S_t)}{\sigma(S_t)}\,dS_t - \int_0^T \theta^2(S_t)\,dt = \int_0^T f(S_t)\,dS_t - \int_0^T \theta^2(S_t)\,dt = F(S_T) - F(S_0) - \frac{1}{2}\int_0^T f'(S_t)\sigma^2(S_t)\,dt - \int_0^T \theta^2(S_t)\,dt,$$
where $f = \theta/\sigma$ and $F$ is a primitive of $f$. Hence, by setting
$$L_T = \frac{1}{T}\int_0^T \delta_{S_t}\,dt,$$
the occupation measure of $S$, we see that the probability in (4.39) may be written as
$$P[Z_T \ge e^{-\lambda T}] = P\Big[\frac{1}{T}\big(F(S_T) - F(S_0)\big) - \frac{1}{2}\int_0^\infty h(x)\,dL_T(x) \le \lambda\Big],$$
where we set $h(x) = f'(x)\sigma^2(x) + \theta^2(x)$. Hence, we would obtain an exponential decay estimate for (4.39) once we have a large deviation principle for
$$\frac{1}{T}\big(F(S_T) - F(S_0)\big) - \frac{1}{2}\int_0^\infty h(x)\,dL_T(x).$$
Such an LDP with explicit rate function is proved for the geometric Ornstein-Uhlenbeck process based on the result in [21], and could be extended to affine stochastic volatility (SV) models by relying on recent results in this literature, see e.g. [39]. It is also used for large time to maturity asymptotics in option pricing and implied volatility for affine SV models.
4.3
4.3.1
A popular approach for institutional managers is to measure the performance of their portfolio relative to the achievement of a given benchmark. This means that investors are interested in maximizing the probability that their wealth exceeds a predetermined index. Equivalently, this may also be formulated as the problem of minimizing the probability that the wealth of the investor falls below a specified value. This target problem was studied by several authors for a goal achievement in finite time horizon, see e.g. [7] or [22]. In a static framework, the paper [52] considered an asymptotic version of this outperformance criterion when the time horizon goes to infinity, which leads to a large deviations portfolio criterion. We now develop an asymptotic dynamic version of the outperformance management criterion due to [47]. Such a problem corresponds to an ergodic objective of beating a given benchmark, and may be of particular interest for institutional managers with a long term horizon, like mutual funds. Moreover, stationary long term horizon problems are expected to be more tractable than finite horizon problems, and should provide some good insight for management problems with long, but finite, time horizon.
The general financial framework is the following. Let $X^\alpha$ be the portfolio wealth process with a proportion $\alpha \in \mathcal{A}$ invested in stock. It grows typically in time at an exponential rate, and thus, the relevant quantity over a long term horizon $T$ is the logarithm of the wealth:
$$\bar X_T^\alpha = \frac{1}{T}\ln X_T^\alpha,$$
which is expected to converge when $T$ goes to infinity. Given a threshold $x$, the outperformance probability is $P[\bar X_T^\alpha \ge x]$, which should decay exponentially fast as the time horizon goes to infinity:
$$P[\bar X_T^\alpha \ge x] \approx \exp(-I(x, \alpha)T), \quad \text{as } T \to \infty.$$
Therefore, the lower the decay rate $I(x, \alpha)$, the higher the chance of realizing an index outperformance. The asymptotic version of the outperforming benchmark criterion is then formulated as
$$v(x) = \sup_{\alpha\in\mathcal{A}} \limsup_{T\to\infty} \frac{1}{T}\ln P[\bar X_T^\alpha \ge x], \quad x \in \mathbb{R}. \tag{4.40}$$
This is a nonstandard large deviations control problem for which there is a priori no direct dynamic programming, and we shall use a dual approach for solving this control problem.
4.3.2
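We adopt a duality approach based on the relation between the rate function of an LDP and the cumulant generating function. The formal derivation is the following. Given a portfolio policy $\alpha \in \mathcal{A}$, if there is an LDP for $\bar X_T^\alpha$, its rate function $I(\cdot, \alpha)$ should be related by the Donsker-Varadhan formula:
$$I(x, \alpha) = \sup_{\theta}\,[\theta x - \Gamma(\theta, \alpha)],$$
to the cumulant generating function
$$\Gamma(\theta, \alpha) = \limsup_{T\to\infty} \frac{1}{T}\ln E\big[e^{\theta T \bar X_T^\alpha}\big]. \tag{4.41}$$
In this case, we would get
$$v(x) = \sup_{\alpha\in\mathcal{A}} \limsup_{T\to\infty} \frac{1}{T}\ln P[\bar X_T^\alpha \ge x] = -\inf_{\alpha\in\mathcal{A}} I(x, \alpha) = -\inf_{\alpha\in\mathcal{A}} \sup_{\theta\ge 0}\,[\theta x - \Gamma(\theta, \alpha)],$$
and so, provided that one could interchange infimum and supremum in the above relation (actually, the minmax theorem does not apply directly since $\mathcal{A}$ is not necessarily compact and $\alpha \mapsto \theta x - \Gamma(\theta, \alpha)$ is not convex):
$$v(x) = -\sup_{\theta\ge 0}\,[\theta x - \Lambda(\theta)], \tag{4.42}$$
where
$$\Lambda(\theta) = \sup_{\alpha\in\mathcal{A}} \Gamma(\theta, \alpha) = \sup_{\alpha\in\mathcal{A}} \limsup_{T\to\infty} \frac{1}{T}\ln E\big[e^{\theta T \bar X_T^\alpha}\big]. \tag{4.43}$$
Problem (4.43) is the dual problem via (4.42) to the original problem (4.40). We shall see in the next section that (4.43) can be reformulated via a suitable change of probability measures as a risk-sensitive ergodic control problem, which is more tractable than (4.40) and is studied by dynamic programming methods leading in some cases to explicit calculations.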
The above formal derivation suggests the following dual procedure for solving problem (4.40):
• Solve the dual risk-sensitive control problem $\Lambda(\theta)$, and find the associated optimal control $\hat\alpha(\theta)$.
• The solution to the primal large deviations portfolio selection problem is then given by
$$v(x) = -\sup_{\theta\ge 0}\,[\theta x - \Lambda(\theta)]. \tag{4.44}$$
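The two-step dual procedure can be tried on a case where everything is explicit. The sketch below assumes a Black-Scholes market with constant coefficients, for which the standard power-utility computation gives $\Lambda(\theta) = \theta r + \frac{\theta}{2(1-\theta)}\big(\frac{\mu - r}{\sigma}\big)^2$ on $[0, 1)$; this closed form, the parameter values and all function names are assumptions made for illustration. The Legendre-type transform is computed by brute force on a grid and compared with the expression obtained by hand.

```python
import math


def dual_value(theta, mu, r, sigma):
    """Lambda(theta) for constant coefficients (assumed closed form, theta in [0, 1))."""
    lam = (mu - r) / sigma  # constant market price of risk
    return theta * r + theta * lam ** 2 / (2.0 * (1.0 - theta))


def primal_value_numeric(x, mu, r, sigma, n_grid=100000):
    """v(x) = -sup over theta in [0, 1) of [theta*x - Lambda(theta)], by grid search."""
    best = 0.0  # value of the bracket at theta = 0
    for i in range(n_grid):
        theta = i / n_grid
        best = max(best, theta * x - dual_value(theta, mu, r, sigma))
    return -best


def primal_value_closed_form(x, mu, r, sigma):
    """Hand computation of the same transform under the assumed Lambda."""
    x_bar = r + 0.5 * ((mu - r) / sigma) ** 2  # equals Lambda'(0)
    if x <= x_bar:
        return 0.0
    return -(math.sqrt(x - r) - math.sqrt(x_bar - r)) ** 2


mu, r, sigma = 0.08, 0.02, 0.2
for x in (0.05, 0.10, 0.20):
    print(x, primal_value_numeric(x, mu, r, sigma), primal_value_closed_form(x, mu, r, sigma))
```

Below the threshold $\Lambda'(0)$ the supremum is attained at $\theta = 0$ and the value is zero: such growth targets are outperformed with a probability that does not decay exponentially.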
The rigorous derivation of this duality relation is stated in the following theorem, which may be viewed as an extension of the Gärtner-Ellis theorem with control components.
Theorem 4.4 Suppose that there exists $\bar\theta \in (0, \infty]$ such that for all $\theta \in [0, \bar\theta)$, there exists a solution $\hat\alpha(\theta) \in \mathcal{A}$ to the dual problem $\Lambda(\theta)$, with a limit in (4.41), i.e.
$$\Lambda(\theta) = \lim_{T\to\infty} \frac{1}{T}\ln E\Big[\exp\Big(\theta T \bar X_T^{\hat\alpha(\theta)}\Big)\Big]. \tag{4.45}$$
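Suppose also that $\Lambda$ is continuously differentiable on $[0, \bar\theta)$. Then, for all $x < \Lambda'(\bar\theta) := \lim_{\theta\uparrow\bar\theta}\Lambda'(\theta)$, we have
$$v(x) = -\sup_{\theta\in[0,\bar\theta)}\,[\theta x - \Lambda(\theta)]. \tag{4.46}$$
Moreover, the sequence of controls
$$\alpha_t^{*,n} = \begin{cases} \hat\alpha_t\big(\theta(x + \tfrac{1}{n})\big), & \text{if } x > \Lambda'(0), \\ \hat\alpha_t\big(\theta(\Lambda'(0) + \tfrac{1}{n})\big), & \text{if } x \le \Lambda'(0), \end{cases}$$
with $\theta(x) \in (0, \bar\theta)$ s.t. $\Lambda'(\theta(x)) = x$, is nearly optimal in the sense that
$$\lim_{n\to\infty} \limsup_{T\to\infty} \frac{1}{T}\ln P\Big[\bar X_T^{\alpha^{*,n}} \ge x\Big] = v(x).$$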
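Proof.
Step 1. Let us consider the Fenchel-Legendre transform of the convex function $\Lambda$ on $[0, \bar\theta)$:
$$\Lambda^*(x) = \sup_{\theta\in[0,\bar\theta)}\,[\theta x - \Lambda(\theta)], \quad x \in \mathbb{R}. \tag{4.47}$$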
(4.49)
(, 0 ()).
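Step 2: Upper bound. For all $x \in \mathbb{R}$, $\alpha \in \mathcal{A}$, an application of Chebyshev's inequality yields:
$$P[\bar X_T^\alpha \ge x] \le \exp(-\theta x T)\,E[\exp(\theta T \bar X_T^\alpha)], \quad \forall\,\theta \in [0, \bar\theta),$$
and so, by definition of $\Lambda$ and $\Lambda^*$,
$$\limsup_{T\to\infty} \frac{1}{T}\ln P[\bar X_T^\alpha \ge x] \le -\Lambda^*(x). \tag{4.50}$$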
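Step 3: Lower bound. Given $x < \Lambda'(\bar\theta)$, let $x_n = x + \frac{1}{n}$ if $x > \Lambda'(0)$ and $x_n = \Lambda'(0) + \frac{1}{n}$ otherwise, so that $\alpha^{*,n} = \hat\alpha(\theta(x_n))$, and define the probability measure $Q_T^n$ on $(\Omega, \mathcal{F}_T)$ by
$$\frac{dQ_T^n}{dP} = \exp\Big[\theta(x_n)\,T\,\bar X_T^{\alpha^{*,n}} - \Gamma_T(\theta(x_n), \alpha^{*,n})\Big], \tag{4.51}$$
where
$$\Gamma_T(\theta, \alpha) = \ln E[\exp(\theta T \bar X_T^\alpha)], \quad \theta \in [0, \bar\theta),\ \alpha \in \mathcal{A}.$$
For $\varepsilon > 0$ small enough that $x_n - \varepsilon \ge x$, we have
$$\frac{1}{T}\ln P\Big[\bar X_T^{\alpha^{*,n}} \ge x\Big] \ge \frac{1}{T}\ln \int \frac{dP}{dQ_T^n}\,1_{\{x_n - \varepsilon < \bar X_T^{\alpha^{*,n}} < x_n + \varepsilon\}}\,dQ_T^n \ge -\theta(x_n)(x_n + \varepsilon) + \frac{1}{T}\Gamma_T(\theta(x_n), \alpha^{*,n}) + \frac{1}{T}\ln Q_T^n\Big[x_n - \varepsilon < \bar X_T^{\alpha^{*,n}} < x_n + \varepsilon\Big],$$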
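where we use (4.51) in the last inequality. By definition of the dual problem, this yields:
$$\liminf_{T\to\infty} \frac{1}{T}\ln P\Big[\bar X_T^{\alpha^{*,n}} \ge x\Big] \ge -\theta(x_n)(x_n + \varepsilon) + \Lambda(\theta(x_n)) + \liminf_{T\to\infty} \frac{1}{T}\ln Q_T^n\Big[x_n - \varepsilon < \bar X_T^{\alpha^{*,n}} < x_n + \varepsilon\Big]$$
$$\ge -\Lambda^*(x_n) - \varepsilon\,\theta(x_n) + \liminf_{T\to\infty} \frac{1}{T}\ln Q_T^n\Big[x_n - \varepsilon < \bar X_T^{\alpha^{*,n}} < x_n + \varepsilon\Big], \tag{4.52}$$
where the second inequality follows by the definition of $\Lambda^*$ (and actually holds with equality due to (4.48)). We now show that:
$$\liminf_{T\to\infty} \frac{1}{T}\ln Q_T^n\Big[x_n - \varepsilon < \bar X_T^{\alpha^{*,n}} < x_n + \varepsilon\Big] = 0. \tag{4.53}$$
Denote by $\tilde\Gamma_T^n$ the c.g.f. under $Q_T^n$ of $\bar X_T^{\alpha^{*,n}}$. For all $\zeta \in \mathbb{R}$, we have by (4.51):
$$\tilde\Gamma_T^n(\zeta) := \ln E^{Q_T^n}\Big[\exp\Big(\zeta T \bar X_T^{\alpha^{*,n}}\Big)\Big] = \Gamma_T(\theta(x_n) + \zeta, \alpha^{*,n}) - \Gamma_T(\theta(x_n), \alpha^{*,n}).$$
Therefore, by definition of the dual problem and (4.45), we have for all $\zeta \in [-\theta(x_n), \bar\theta - \theta(x_n))$:
$$\limsup_{T\to\infty} \frac{1}{T}\tilde\Gamma_T^n(\zeta) \le \Lambda(\theta(x_n) + \zeta) - \Lambda(\theta(x_n)). \tag{4.54}$$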
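As in Step 2 of this proof, by Chebyshev's inequality, we have for all $\zeta \in [0, \bar\theta - \theta(x_n))$:
$$\limsup_{T\to\infty} \frac{1}{T}\ln Q_T^n\Big[\bar X_T^{\alpha^{*,n}} \ge x_n + \varepsilon\Big] \le -\zeta(x_n + \varepsilon) + \limsup_{T\to\infty} \frac{1}{T}\tilde\Gamma_T^n(\zeta) \le -\zeta(x_n + \varepsilon) + \Lambda(\zeta + \theta(x_n)) - \Lambda(\theta(x_n)),$$
where the second inequality follows from (4.54). We deduce
$$\limsup_{T\to\infty} \frac{1}{T}\ln Q_T^n\Big[\bar X_T^{\alpha^{*,n}} \ge x_n + \varepsilon\Big] \le -\sup\{\theta\,(x_n + \varepsilon) - \Lambda(\theta) : \theta \in [\theta(x_n), \bar\theta)\} - \Lambda(\theta(x_n)) + \theta(x_n)(x_n + \varepsilon)$$
$$\le -\Lambda^*(x_n + \varepsilon) - \Lambda(\theta(x_n)) + \theta(x_n)(x_n + \varepsilon) = -\Lambda^*(x_n + \varepsilon) + \Lambda^*(x_n) + \varepsilon\,\theta(x_n), \tag{4.55}$$
where the second inequality and the last equality follow from (4.48). Similarly, we have for all $\zeta \in [-\theta(x_n), 0]$:
$$\limsup_{T\to\infty} \frac{1}{T}\ln Q_T^n\Big[\bar X_T^{\alpha^{*,n}} \le x_n - \varepsilon\Big] \le -\zeta(x_n - \varepsilon) + \limsup_{T\to\infty} \frac{1}{T}\tilde\Gamma_T^n(\zeta) \le -\zeta(x_n - \varepsilon) + \Lambda(\theta(x_n) + \zeta) - \Lambda(\theta(x_n)),$$
and so:
$$\limsup_{T\to\infty} \frac{1}{T}\ln Q_T^n\Big[\bar X_T^{\alpha^{*,n}} \le x_n - \varepsilon\Big] \le -\sup\{\theta\,(x_n - \varepsilon) - \Lambda(\theta) : \theta \in [0, \theta(x_n)]\} - \Lambda(\theta(x_n)) + \theta(x_n)(x_n - \varepsilon)$$
$$\le -\Lambda^*(x_n - \varepsilon) - \Lambda(\theta(x_n)) + \theta(x_n)(x_n - \varepsilon) = -\Lambda^*(x_n - \varepsilon) + \Lambda^*(x_n) - \varepsilon\,\theta(x_n). \tag{4.56}$$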
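Since the right-hand sides of (4.55) and (4.56) are negative, $Q_T^n[\bar X_T^{\alpha^{*,n}} \ge x_n + \varepsilon]$ and $Q_T^n[\bar X_T^{\alpha^{*,n}} \le x_n - \varepsilon]$ tend to zero as $T$ goes to infinity, which proves (4.53). Plugging (4.53) into (4.52) and sending $\varepsilon$ to zero, we get
$$\liminf_{T\to\infty} \frac{1}{T}\ln P\Big[\bar X_T^{\alpha^{*,n}} \ge x\Big] \ge -\Lambda^*(x_n).$$
Sending $n$ to infinity, and using the continuity of $\Lambda^*$ together with $\Lambda^*(x) = 0 = \Lambda^*(\Lambda'(0))$ for $x \le \Lambda'(0)$, we obtain:
$$\liminf_{n\to\infty} \liminf_{T\to\infty} \frac{1}{T}\ln P\Big[\bar X_T^{\alpha^{*,n}} \ge x\Big] \ge -\Lambda^*(x).$$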
Remark 4.5 Connection with classical portfolio selection and risk aversion
Since $e^{\theta T \bar X_T^\alpha} = (X_T^\alpha)^\theta$, we see that the value function of the dual problem can be written as:
$$\Lambda(\theta) = \lim_{T\to\infty} \frac{1}{T}\ln E\Big[U_\theta\Big(X_T^{\hat\alpha(\theta)}\Big)\Big],$$
where $U_\theta(x) = x^\theta$ is a power utility function with Constant Relative Risk Aversion (CRRA) $1 - \theta > 0$ provided that $\theta < 1$. Then, Theorem 4.4 means that for any target level $x$, the optimal overperformance probability of growth rate is (approximately) directly related, for large $T$, to the expected CRRA utility of wealth, by:
$$P\big[\bar X_T^{\alpha^*} \ge x\big] \approx E\big[U_{\theta(x)}\big(X_T^{\alpha^*}\big)\big]\,e^{-\theta(x)\,x\,T}, \tag{4.57}$$
with the convention that $\theta(x) = 0$ for $x \le \Lambda'(0)$. Hence, $1 - \theta(x)$ can be interpreted as a constant degree of relative risk aversion for an investor who has an overperformance target level $x$. Moreover, by strict convexity of the function $\Lambda^*$ in (4.47), it is clear that $\theta(x)$ is strictly increasing for $x > \Lambda'(0)$. So an investor with a higher target level $x$ has a lower degree of relative risk aversion $1 - \theta(x)$. In summary, Theorem 4.4 (or relation (4.57)) inversely relates the target level of growth rate to the degree of relative risk aversion in expected utility theory.
4.3.3
Let us consider a factor model for the savings account $S^0$, and the stock price $S$ in the form:
$$\frac{dS_t^0}{S_t^0} = r(Y_t)\,dt, \qquad \frac{dS_t}{S_t} = \mu(Y_t)\,dt + \sigma(Y_t)\,dW_t,$$
with k > 0, > 0, and B, W are two correlated Brownian motions on a filtered probability
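space $(\Omega, \mathcal{F}, \mathbb{F} = (\mathcal{F}_t)_{t\ge 0}, P)$. The coefficients $r(y)$, $\mu(y)$, $\sigma(y) > 0$ are measurable functions on $\mathbb{R}$. The dynamics of the wealth process $X = X^\alpha$ controlled by the proportion $\alpha$ invested in stock is governed by
$$dX_t = X_t\Big[\big(r(Y_t) + (\mu - r)(Y_t)\,\alpha_t\big)dt + \alpha_t\,\sigma(Y_t)\,dW_t\Big]. \tag{4.58}$$
Here, $\alpha$ is an $\mathbb{F}$-adapted process valued in $A \subset \mathbb{R}$, and we denote by $\mathcal{A}$ the set of controlled processes $\alpha$ s.t. the equation (4.58) is well-defined.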
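We now show that the dual control problem (4.43) may be transformed via a change of probability measure into a risk-sensitive control problem. From the dynamics of the wealth process, we may rewrite the moment generating function of $\bar X_T^\alpha = \frac{1}{T}\ln X_T^\alpha$ as:
$$J_T(\theta, \alpha) := E[\exp(\theta T \bar X_T^\alpha)] = X_0^\theta\,E\Big[\xi_T(\alpha)\exp\Big(\theta\int_0^T \ell(\theta, Y_t, \alpha_t)\,dt\Big)\Big],$$
where
$$\ell(\theta, y, a) = r(y) + (\mu - r)(y)\,a - \frac{1 - \theta}{2}\,(a\,\sigma(y))^2,$$
and
$$\xi_t(\alpha) = \exp\Big(\theta\int_0^t \sigma(Y_u)\,\alpha_u\,dW_u - \frac{\theta^2}{2}\int_0^t |\sigma(Y_u)\,\alpha_u|^2\,du\Big), \quad t \ge 0.$$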
(4.59)
k(
y Yt ) + (Yt )t dt + dWtQ ,
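where $W^Q$ is a $Q$-Brownian motion. Hence, the dual problem may be written as a stochastic control problem with exponential integral cost criterion:
$$\Lambda(\theta) = \sup_{\alpha\in\mathcal{A}} \limsup_{T\to\infty} \frac{1}{T}\ln J_T(\theta, \alpha) = \sup_{\alpha\in\mathcal{A}} \limsup_{T\to\infty} \frac{1}{T}\ln E^Q\Big[\exp\Big(\theta\int_0^T \ell(\theta, Y_t, \alpha_t)\,dt\Big)\Big], \quad \theta \ge 0. \tag{4.60}$$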
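When the coefficients are constant and the proportion $a$ is constant, the density $\xi_T(\alpha)$ has expectation one, so $\frac{1}{T}\ln J_T$ reduces to $\theta\,\ell(\theta, a)$ up to a vanishing term, and the dual value over constant proportions is $\theta \max_a \ell(\theta, a)$. The sketch below maximizes this concave quadratic numerically and compares it with the classical closed-form maximizer $a^* = \frac{\mu - r}{\sigma^2 (1 - \theta)}$. The restriction to constant coefficients and constant proportions, the parameter values and the function names are assumptions made for illustration.

```python
def ell(theta, a, mu, r, sigma):
    """Running cost l(theta, y, a) for constant coefficients (no dependence on y)."""
    return r + (mu - r) * a - 0.5 * (1.0 - theta) * (a * sigma) ** 2


def maximize_concave(f, lo, hi, iters=200):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


theta, mu, r, sigma = 0.5, 0.08, 0.02, 0.2
a_star = maximize_concave(lambda a: ell(theta, a, mu, r, sigma), -50.0, 50.0)
dual = theta * ell(theta, a_star, mu, r, sigma)
print(a_star, (mu - r) / (sigma ** 2 * (1.0 - theta)))  # numerical vs closed-form maximizer
print(dual)  # candidate value of Lambda(theta) over constant proportions
```

The penalty $\frac{1-\theta}{2}(a\sigma)^2$ weakens as $\theta$ increases to one, which is why the optimal proportion grows with $\theta$.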
For fixed $\theta$, this is an ergodic risk-sensitive control problem which has been studied by several authors, see e.g. [19], [5] or [51] in a discrete-time setting. It admits a dynamic programming equation:
() =
1 2 00
1
(y) + |0 (y)|2 + k(
y y)0 (y) + r(y)
2
2
( r)(y)
(1 )
0
2
+ max (y)( (y) +
)
((y)) .
A
(y)
2
(4.61)
The unknown is the pair $(\tilde\Lambda(\theta), \phi_\theta) \in \mathbb{R} \times C^2(\mathbb{R})$, and $\tilde\Lambda(\theta)$ is a candidate for $\Lambda(\theta)$. The above P.D.E. is formally derived by considering the finite horizon problem
$$u_\theta(T, y) = \sup_{\alpha\in\mathcal{A}} E_y^Q\Big[\exp\Big(\theta\int_0^T \ell(\theta, Y_t, \alpha_t)\,dt\Big)\Big],$$
by writing the Bellman equation for this classical control problem and by making the logarithm transformation
$$\ln u_\theta(T, y) \simeq \tilde\Lambda(\theta)\,T + \phi_\theta(y),$$
for large $T$. One can prove rigorously that a pair solution $(\tilde\Lambda(\theta), \phi_\theta)$ to the PDE (4.61) provides a solution $\Lambda(\theta) = \tilde\Lambda(\theta)$ to the dual problem (4.43), with an optimal control given by the argument max in (4.61). This is called a verification theorem in stochastic control theory. Actually, there may be multiple solutions to (4.61) (even up to a constant), and we need some ergodicity condition to select the good one that satisfies the verification theorem. We refer to [47] for the details, and we illustrate our purpose in the case where $A = \mathbb{R}$, the coefficients $r(y)$, $\mu(y)$ are linear in $y$, and $\sigma$ is constant. This includes the Black-Scholes, Platen-Rebolledo or Vasicek models. In this case, we are looking for a quadratic solution to (4.61):
$$\phi_\theta(y) = \frac{1}{2}A(\theta)\,y^2 + B(\theta)\,y,$$
(, y) =
0 (y)
( r)(y)
+
.
(y)2 (1 )
(y)(1 )
By substituting into (4.61), and cancelling terms in $y^2$, $y$ and constant terms, we obtain
• a polynomial second degree equation for $A(\theta)$
• a linear equation for $B(\theta)$, given $A(\theta)$
$$v(x) = -\sup_{\theta\in[0,\bar\theta)}\,[\theta x - \Lambda(\theta)], \quad x < \Lambda'(\bar\theta), \tag{4.62}$$
x + n1 , Yt ,
,n
t
=
0 (0) + n1 , Yt ,
) ,
if x x
:= 0 (0) =
( x x
v(x) = sup [x ()] =
0,
if x < x
,
[0,1)
(x) = 1
1 r 2
2 2
p
x
/x if x x
, and 0 otherwise, and
(
2x,
if x x
t =
r
,
if
x
<
x
.
2
k
1 1 +
2
2
k y r + 12 2
!2
,
[0, 1).
kyr+ 12 2 2
(xx)2
+
:= 12
xx+ k , if x x
v(x) =
4
0,
if x < x
,
and the optimal portfolio proportion is:
kyr+ 21 2
x)+k]
[4(x
Y
+
,
t
t (x) =
kyr+ 21 2
,
kY +
t
k
4
if x x
if x < x
.
Some variants and extensions in finance of this large deviations control problem are studied
in [35], [34] and recently for robust utility maximization in [41].
References
[1] Andersen L. and V. Piterbarg (2007): Moment explosions in stochastic volatility models,
Finance and Stochastics, 11, 29-50.
[2] Avellaneda M., Boyer-Olson D., Busca J. and P. Friz (2003): Méthodes de grandes déviations et pricing d'options sur indice, C.R. Acad. Sci. Paris, 336, 263-266.
[3] Barles G. (1994): Solutions de viscosité des équations d'Hamilton-Jacobi, Springer Verlag.
[4] Berestycki H., Busca J. and I. Florent (2004): Computing the implied volatility in stochastic
volatility models, Communications on Pure and Applied Mathematics, vol LVII, 1352-1373.
[5] Bielecki T. and S. Pliska (2004): Risk-sensitive ICAPM with application to fixed-income management, IEEE Transactions on Automatic Control, 49, 420-432.
[6] Bourgade P. and O. Croissant (2005): Heat kernel expansion for a family of stochastic volatility
models: -geometry, available at http://arxiv.org/pdf/cs/0511024
[7] Browne S. (1999): Beating a moving target: optimal portfolio strategies for outperforming a
stochastic benchmark, Finance and Stochastics, 3, 275-294.
[8] Delbaen F. and W. Schachermayer (2006): The mathematics of arbitrage, Springer Finance.
[9] Dembo A., Deuschel J.D. and D. Duffie (2004): Large portfolio losses, Finance and Stochastics,
8, 3-16.
[10] Dembo A. and O. Zeitouni (1998): Large deviations techniques and applications, 2nd edition,
Springer Verlag.
[11] Donsker, M. D. and Varadhan, S. R. S. (1975): Asymptotic evaluation of certain Markov
process expectations for large time. I. II, Comm. Pure Appl. Math., 28, 1-47; ibid. 28, 279-301.
[31] Glasserman P., Kang W. and P. Shahabuddin (2007), Large deviations in multifactor portfolio
credit risk, Mathematical Finance, 17, 345-379.
[32] Glasserman P. and J. Li (2005), Importance sampling for portfolio credit risk, Management
science, 51, 1643-1656.
[33] Guasoni P. and S. Robertson (2008): Optimal importance sampling with explicit formulas in
continuous-time, Finance and Stochastics, 12, 1-19.
[34] Hata H., Nagai H. and S. Sheu (2009): Asymptotics of the probability minimizing a down-side risk, to appear in Annals of Applied Probability.
[35] Hata H. and J. Sekine (2005): Solving long term investment problems with Cox-Ingersoll-Ross interest rates, Advances in Mathematical Economics, 8, 231-255.
[36] Henry-Labordère P. (2005): A general asymptotic implied volatility for stochastic volatility models, working paper.
[37] Hurd T. and A. Kuznetsov (2008): Explicit formulas for Laplace transforms of stochastic integrals, Markov Processes and Related Fields, 14, 277-290.
[38] Kabanov Y. and D. Kramkov (1998): Asymptotic arbitrage in large financial markets, Finance and Stochastics, 2, 143-172.
[39] Keller-Ressel M. (2009): Moment explosions and long-term behavior of affine stochastic
volatility models, to appear in Mathematical Finance.
[40] Kemna A. and T. Vorst (1990): A pricing method for options based on average values,
Journal of Banking and Finance, 14, 113-129.
[41] Knispel T. (2010): Asymptotics of robust utility maximization, Preprint, University of Leibniz.
[42] Lasry J.M. and P.L. Lions (1995): Grandes déviations pour des processus de diffusion couplés par un processus de sauts, CRAS, t. 321, 849-854.
[43] Laurence P. (2008): Implied volatility, fundamental solutions, asymptotic analysis and symmetry methods, presentation at Linz, Ricam kick-off workshop.
[44] Lee R. (2004): The Moment Formula for Implied Volatility at Extreme Strikes, Mathematical Finance, 14, 469-480.
[45] Leijdekker, Mandjes M. and P. Spreij (2009): Sample path large deviations in credit risk, preprint available at http://arxiv.org/pdf/0909.5610v1
[46] Lewis A. (2007): Geometries and smile asymptotics for a class of stochastic volatility models, www.optioncity.net
[47] Pham H. (2003): A large deviations approach to optimal long term investment, Finance and
Stochastics, 7, 169-195.
[48] Pham H. (2007): Some applications of large deviations in finance and insurance, in Paris
Princeton Lectures Notes on Mathematical Finance, 1919, 191-244.
[49] Robertson S. (2010): Sample path large deviations and optimal importance sampling for
stochastic volatility models, Stochastic Processes and their Applications, 120, 66-83.
[50] Sowers R. (2009): Losses in investment-grade tranches of synthetic CDOs: a large deviations
analysis, preprint.
[51] Stettner L. (2004): Duality and risk sensitive portfolio optimization, in Mathematics of
Finance, Proceedings AMS-IMS-SIAM, eds G. Yin and Q. Zhang, 333-347.
[52] Stutzer M. (2003): Portfolio choice with endogenous utility : a large deviations approach,
Journal of Econometrics, 116, 365-386.
[53] Tehranchi M. (2009): Asymptotics of implied volatility far from maturity, Journal of Applied
Probability, 46, 629-650.
[54] Williams D. (1991): Probability with martingales, Cambridge mathematical text.