
Large deviations in finance
Huyen PHAM
Laboratoire de Probabilités et Modèles Aléatoires
CNRS, UMR 7599
Université Paris 7
e-mail: pham@math.jussieu.fr
and Institut Universitaire de France
August 30, 2010
Abstract
The area of large deviations is a set of asymptotic estimates of rare event probabilities, together with a set of methods for deriving such results. The subject had its origins in
the Scandinavian insurance industry, where it was used for risk analysis. Since then,
it has undergone many developments, and large deviations theory is now a very active field
in applied probability and statistical mechanics. It also finds important and varied
applications in finance, and has attracted considerable interest in recent years among both
academics and practitioners. Financial applications range from Monte-Carlo
methods and importance sampling in option pricing to estimates of large portfolio
losses subject to credit risk, long term portfolio investment, and implied volatility
asymptotics for stochastic volatility models. The purpose of these lecture notes is to
present some essential results and techniques in large deviations theory, and to review
recent developments in finance.
Key words: large deviations, importance sampling, rare event simulation, exit probability,
small time asymptotics, implied volatilities, credit risk, asymptotic arbitrage, long term
investment.
MSC Classification (2000): 60F10, 62P05, 65C05, 91B28, 91B30.
Lecture notes for the third SMAI European Summer School in Financial Mathematics, Paris, August
2010.
Contents
1 Introduction
2 An overview of large deviations theory
2.1 Laplace transform and change of probability measures
2.2 Cramér's theorem
2.3 Large deviations and Laplace principles
2.4 Relative entropy and Donsker-Varadhan formula
2.5 Sanov's theorem
2.6 Freidlin-Wentzell theory
3 Large deviations in option pricing
3.1 Optimal importance sampling via large deviations approximation
3.1.1 Importance sampling for diffusions via Girsanov's theorem
3.1.2 Option pricing approximation with a Freidlin-Wentzell large deviation principle
3.1.3 Change of drift via Varadhan-Laplace principle
3.2 Asymptotics in stochastic volatility models
3.2.1 Large deviations in stochastic volatility models for small time to maturity
3.2.2 The case of Heston model
4 Large deviations in risk management
4.1 Large portfolio losses in credit risk
4.1.1 Portfolio credit risk in a single factor normal copula model
4.1.2 Independent obligors
4.1.3 Dependent obligors
4.2 Asymptotic arbitrage and large deviations
4.3 A large deviations approach to optimal long term investment
4.3.1 An asymptotic outperforming benchmark criterion
4.3.2 Duality to the large deviations control problem
4.3.3 Application to factor models
1 Introduction
The area of large deviations is a set of asymptotic results on rare event probabilities and a
set of methods to derive such results. It goes back to the Scandinavian insurance industry,
where it was used for the evaluation of risk. Large deviations is a very active area in
applied probability, and finds important applications in finance where questions related
to extremal events play an increasingly important role. Large deviations arise in various
financial contexts. They appear naturally in option pricing with Monte-Carlo methods by
importance sampling and simulation of rare events. They also occur in risk management
for the estimation of probabilities of large portfolio losses in credit risk, and in long term
investment. Recently, a growing literature has developed on asymptotics for stochastic volatility
models by methods of large deviations.
Large deviations theory deals with asymptotic estimates of probabilities of rare events
associated with random processes. These probabilities are exponentially small in the sense:
$$ \mathbb{P}[A_\varepsilon] = C_\varepsilon \exp\Big(-\frac{I}{\varepsilon}\Big) = \exp\Big(-\frac{I}{\varepsilon} + o(1/\varepsilon)\Big), \qquad (1.1) $$
where $I \geq 0$ is the so-called large deviations rate, and $(C_\varepsilon)$ is a sequence converging at a
subexponential rate, i.e. $\varepsilon \ln C_\varepsilon$ goes to zero when $\varepsilon$ goes to zero. In the relation (1.1), $I$ is
the leading order term on the logarithmic scale in large deviations theory, and $C_\varepsilon$ represents the
correction term in sharp large deviations. In these lectures, we shall mainly focus on the
leading order term.
Large deviations results are closely connected with changes of probability measure under
which the event $A_\varepsilon$, which is rare under the probability measure $\mathbb{P}$, is no longer rare under
a new probability measure $\mathbb{P}_\varepsilon$. Typically, the Radon-Nikodym density $d\mathbb{P}_\varepsilon/d\mathbb{P}$ has an
exponential form, and the task is to determine the dominant contribution to the exponent
when $\varepsilon$ is small. Moreover, these changes of probability measure are useful in the simulation
of rare events. Indeed, if $p_\varepsilon = \mathbb{P}[A_\varepsilon]$ is extremely small, then sampling according to $\mathbb{P}$ is
unlikely to produce the event $A_\varepsilon$, and this induces a high relative error in the estimation
of $p_\varepsilon$. The method of quick simulation instead samples according to a new probability measure
$\mathbb{P}_\varepsilon$ that gives more weight to the rare but important outcomes in $A_\varepsilon$. It is called the importance
sampling method, and the problem is then to find a suitable $\mathbb{P}_\varepsilon$, optimal
according to some criterion, in order to reduce the variance of the estimator of $p_\varepsilon$.
The large deviations rate function is often related to some form of entropy. Let us
illustrate this fact through an elementary example. Throw a fair die $n$ times and set $f_i = n_i/n$
as the frequency of the number $i = 1, \ldots, 6$. Denote by $p_n(f)$ the probability that the
numbers $1, \ldots, 6$ appear with frequencies $f = (f_1 = n_1/n, \ldots, f_6 = n_6/n)$ in the $n$ throws
of the die. This multinomial distribution is given by
$$ p_n(f) = \frac{1}{6^n} \frac{n!}{n_1! \cdots n_6!}. $$
Let us now look at the limiting behaviour of this distribution when the number $n$ of throws
goes to infinity. By using Stirling's formula $k! \simeq k^k e^{-k} \sqrt{2\pi k}$, we get
$$ p_n(f) \simeq \frac{1}{6^n} \frac{n^n}{n_1^{n_1} \cdots n_6^{n_6}} \frac{\sqrt{2\pi n}}{\sqrt{2\pi n_1} \cdots \sqrt{2\pi n_6}}. $$
Since $f_i = n_i/n$, this implies
$$ \frac{1}{n} \ln p_n(f) \simeq -I(f) := -\sum_{i=1}^{6} f_i \ln\Big(\frac{f_i}{1/6}\Big). $$
$I(f) \geq 0$ is the relative entropy of the a posteriori probability $f = (f_i)_i$ with respect to the a
priori probability $f_0 = (1/6, \ldots, 1/6)$. Hence, $p_n(f) = e^{-nI(f)+o(n)}$. This means that when $n$ is
large, $p_n(f)$ is concentrated where $I(f)$ is minimal. The minimum is attained at $f = f_0
= (1/6, \ldots, 1/6)$, where $I(f_0) = 0$: this is the ordinary law of large numbers! For $f \neq f_0$, $I(f)
> 0$, and $p_n(f)$ tends to zero exponentially fast. These ideas, concepts and computations in
large deviations (concentration phenomenon, entropy functional minimization, etc.) still
hold in general random contexts, including diffusion processes, but of course require a more
sophisticated mathematical treatment.
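The convergence in the dice example can be checked numerically. The following is a minimal sketch, not part of the original notes: the frequency profile `freqs` is a hypothetical choice made only for illustration, and the exact multinomial probability is evaluated in log-space with `math.lgamma` to avoid overflow.

```python
import math

def log_multinomial_prob(counts):
    """Exact ln p_n(f) for a fair die: ln(n! / (n_1! ... n_6!)) - n ln 6."""
    n = sum(counts)
    log_coeff = math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in counts)
    return log_coeff - n * math.log(6)

def relative_entropy(freqs):
    """I(f) = sum_i f_i ln(f_i / (1/6)), with the convention 0 ln 0 = 0."""
    return sum(f * math.log(6 * f) for f in freqs if f > 0)

# Hypothetical frequency profile, chosen only for illustration.
freqs = (0.30, 0.20, 0.20, 0.10, 0.10, 0.10)
rate = relative_entropy(freqs)

for n in (100, 1_000, 10_000, 100_000):
    counts = [round(f * n) for f in freqs]  # n_i = n f_i
    empirical_rate = -log_multinomial_prob(counts) / n
    print(f"n={n:>6}  -(1/n) ln p_n(f) = {empirical_rate:.5f}   I(f) = {rate:.5f}")
```

The gap between the two printed columns shrinks roughly like $(\ln n)/n$, which is the $o(n)$ correction in the exponent coming from the square-root factors in Stirling's formula.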
These lecture notes are organized as follows. The next section provides an
overview of large deviations theory. The aim is to present the main results, but also the different approaches and key ingredients for deriving such results. The fundamental theorems
in large deviations are often named after their founders. Section
3 is devoted to large deviations for option pricing. We shall address two issues. The first
one is concerned with importance sampling for Monte-Carlo computation of expectations
arising in option pricing. We present two approaches for the determination of a suitable
change of probability measure, both relying on asymptotic results from large deviations.
The second problem is concerned with asymptotics in stochastic volatility models. We focus
in particular on the determination of implied volatilities for small time to maturity by using
methods from large deviations. The last part, Section 4, deals with large deviations in risk
management. We present some results on the estimation of large portfolio losses in credit
risk. We also address the long term investment problem by considering asymptotic arbitrage,
and portfolio selection with an outperformance probability criterion.
2 An overview of large deviations theory

2.1 Laplace transform and change of probability measures
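If $X$ is a (real-valued) random variable on $(\Omega, \mathcal{F})$ with probability distribution $\mu(dx)$, the
cumulant generating function (c.g.f.) of $\mu$ is the logarithm of the Laplace function of $X$,
i.e.:
$$ \Gamma(\theta) = \ln \mathbb{E}[e^{\theta X}] = \ln \int e^{\theta x} \mu(dx) \in (-\infty, \infty], \qquad \theta \in \mathbb{R}. $$
Notice that $\Gamma(0) = 0$, and $\Gamma$ is convex by Hölder's inequality. We denote $D(\Gamma) = \{\theta \in \mathbb{R} :
\Gamma(\theta) < \infty\}$, and for any $\theta \in D(\Gamma)$, we define a probability measure $\mu_\theta$ on $\mathbb{R}$ by:
$$ \mu_\theta(dx) = \exp(\theta x - \Gamma(\theta)) \mu(dx). \qquad (2.1) $$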
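Suppose that $X_1, \ldots, X_n, \ldots$ is an i.i.d. sequence of random variables with distribution
$\mu$, and consider the new probability measure $\mathbb{P}_\theta$ on $(\Omega, \mathcal{F})$ with likelihood ratio evaluated at
$(X_1, \ldots, X_n)$, $n \in \mathbb{N}^*$, by:
$$ \frac{d\mathbb{P}_\theta}{d\mathbb{P}}(X_1, \ldots, X_n) = \prod_{i=1}^n \frac{d\mu_\theta}{d\mu}(X_i) = \exp\Big(\theta \sum_{i=1}^n X_i - n\Gamma(\theta)\Big). \qquad (2.2) $$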
By denoting $\mathbb{E}_\theta$ the corresponding expectation under $\mathbb{P}_\theta$, formula (2.2) means that for all
$n \in \mathbb{N}^*$,
$$ \mathbb{E}\big[f(X_1, \ldots, X_n)\big] = \mathbb{E}_\theta\Big[f(X_1, \ldots, X_n) \exp\Big(-\theta \sum_{i=1}^n X_i + n\Gamma(\theta)\Big)\Big], \qquad (2.3) $$
for all Borel functions $f$ for which the expectation on the l.h.s. of (2.3) is finite. Moreover,
the random variables $X_1, \ldots, X_n$, $n \in \mathbb{N}^*$, are i.i.d. with probability distribution $\mu_\theta$ under
$\mathbb{P}_\theta$. Actually, the relation (2.3) extends from a fixed number of steps $n$ to a random number
of steps, provided the random horizon is a stopping time. More precisely, if $\tau$ is a stopping
time in $\mathbb{N}$ for $X_1, \ldots, X_n, \ldots$, i.e. the event $\{\tau \leq n\}$ is measurable with respect to the
$\sigma$-algebra generated by $\{X_1, \ldots, X_n\}$ for all $n$, then
$$ \mathbb{E}\big[f(X_1, \ldots, X_\tau) 1_{\tau < \infty}\big] = \mathbb{E}_\theta\Big[f(X_1, \ldots, X_\tau) \exp\Big(-\theta \sum_{i=1}^{\tau} X_i + \tau\Gamma(\theta)\Big) 1_{\tau < \infty}\Big], \qquad (2.4) $$
for all Borel functions $f$ for which the expectation on the l.h.s. of (2.4) is finite.
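The cumulant generating function $\Gamma$ records some useful information on the probability
distributions $\mu_\theta$. For example, $\Gamma'(\theta)$ is the mean of $\mu_\theta$. Indeed, for any $\theta$ in the interior of
$D(\Gamma)$, differentiation yields by dominated convergence:
$$ \Gamma'(\theta) = \frac{\mathbb{E}[X e^{\theta X}]}{\mathbb{E}[e^{\theta X}]} = \mathbb{E}\big[X \exp(\theta X - \Gamma(\theta))\big] = \mathbb{E}_\theta[X]. \qquad (2.5) $$
A similar calculation shows that $\Gamma''(\theta)$ is the variance of $\mu_\theta$. Notice in particular that if 0
lies in the interior of $D(\Gamma)$, then $\Gamma'(0) = \mathbb{E}[X]$ and $\Gamma''(0) = Var(X)$.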
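The "similar calculation" for the second derivative can be spelled out as follows. This is a short sketch under the same assumption as in (2.5), namely that $\theta$ lies in the interior of $D(\Gamma)$ so that differentiation under the expectation is justified:

```latex
\Gamma''(\theta)
  = \frac{\mathbb{E}[X^2 e^{\theta X}]}{\mathbb{E}[e^{\theta X}]}
    - \left( \frac{\mathbb{E}[X e^{\theta X}]}{\mathbb{E}[e^{\theta X}]} \right)^{2}
  = \mathbb{E}_\theta[X^2] - \big( \mathbb{E}_\theta[X] \big)^{2}
  = \mathrm{Var}_\theta(X) \;\geq\; 0 .
```

In particular, this gives a second way to see that $\Gamma$ is convex on the interior of its domain.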
Bernoulli distribution
Let $\mu$ be the Bernoulli distribution of parameter $p$. Its c.g.f. is given by
$$ \Gamma(\theta) = \ln(1 - p + p e^{\theta}). $$
A direct calculation shows that $\mu_\theta$ is the Bernoulli distribution of parameter
$p e^{\theta}/(1 - p + p e^{\theta})$.
Poisson distribution
Let $\mu$ be the Poisson distribution of intensity $\lambda$. Its c.g.f. is given by
$$ \Gamma(\theta) = \lambda(e^{\theta} - 1). $$
A direct calculation shows that $\mu_\theta$ is the Poisson distribution of intensity
$\lambda e^{\theta}$. Hence, the effect of the change of probability measure $\mathbb{P}_\theta$ is to multiply the intensity
by a factor $e^{\theta}$.
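Normal distribution
Let $\mu$ be the normal distribution $N(0, \sigma^2)$, whose c.g.f. is given by:
$$ \Gamma(\theta) = \frac{\theta^2 \sigma^2}{2}. $$
A direct calculation shows that $\mu_\theta$ is the normal distribution $N(\theta\sigma^2, \sigma^2)$.
Hence, if $X_1, \ldots, X_n$ are i.i.d. with normal distribution $N(0, \sigma^2)$, then under the change of
measure $\mathbb{P}_\theta$ with likelihood ratio:
$$ \frac{d\mathbb{P}_\theta}{d\mathbb{P}}(X_1, \ldots, X_n) = \exp\Big(\theta \sum_{i=1}^n X_i - n\frac{\theta^2 \sigma^2}{2}\Big), $$
the random variables $X_1, \ldots, X_n$ are i.i.d. with normal distribution $N(\theta\sigma^2, \sigma^2)$: the effect
of $\mathbb{P}_\theta$ is to change the mean of $X_i$ from 0 to $\theta\sigma^2$. This result can be interpreted as the
finite-dimensional version of Girsanov's theorem.
Exponential distribution
Let $\mu$ be the exponential distribution of intensity $\lambda$. Its c.g.f. is given by
$$ \Gamma(\theta) = \begin{cases} \ln\Big(\dfrac{\lambda}{\lambda - \theta}\Big), & \theta < \lambda, \\ \infty, & \theta \geq \lambda. \end{cases} $$
A direct calculation shows that for $\theta < \lambda$, $\mu_\theta$ is the exponential distribution
of intensity $\lambda - \theta$. Hence, the effect of the change of probability measure $\mathbb{P}_\theta$ is to shift the
intensity from $\lambda$ to $\lambda - \theta$.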
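The "direct calculations" above can be checked numerically from the definition (2.1). The sketch below is illustrative only: the parameter values and helper names are assumptions, not part of the notes. It builds the tilted Bernoulli and Poisson laws from (2.1) and compares their means with the closed forms and with $\Gamma'(\theta)$ from (2.5).

```python
import math

def bernoulli_cgf(theta, p):
    """Gamma(theta) = ln(1 - p + p e^theta) for a Bernoulli(p) law."""
    return math.log(1.0 - p + p * math.exp(theta))

def tilted_bernoulli(theta, p):
    """mu_theta({1}) = exp(theta * 1 - Gamma(theta)) * mu({1}), from (2.1)."""
    return math.exp(theta - bernoulli_cgf(theta, p)) * p

def poisson_cgf(theta, lam):
    """Gamma(theta) = lam (e^theta - 1) for a Poisson(lam) law."""
    return lam * (math.exp(theta) - 1.0)

def tilted_poisson_pmf(k, theta, lam):
    """mu_theta({k}) = exp(theta k - Gamma(theta)) * mu({k}), from (2.1)."""
    log_pmf = -lam + k * math.log(lam) - math.lgamma(k + 1)
    return math.exp(theta * k - poisson_cgf(theta, lam) + log_pmf)

def derivative(g, theta, h=1e-6):
    """Central finite difference, used to approximate Gamma'(theta)."""
    return (g(theta + h) - g(theta - h)) / (2.0 * h)

p, lam, theta = 0.3, 2.0, 0.7  # illustrative values

p_theta = tilted_bernoulli(theta, p)
closed_form = p * math.exp(theta) / (1.0 - p + p * math.exp(theta))
mean_from_cgf = derivative(lambda t: bernoulli_cgf(t, p), theta)

# Mean of the tilted Poisson law, truncating the series far in the tail.
poisson_mean = sum(k * tilted_poisson_pmf(k, theta, lam) for k in range(200))

print("Bernoulli:", p_theta, closed_form, mean_from_cgf)
print("Poisson:  ", poisson_mean, lam * math.exp(theta))
```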
2.2 Cramér's theorem
The most classical result in the large deviations area is Cramér's theorem. It concerns
large deviations associated with the empirical mean of i.i.d. random variables valued in
a finite-dimensional space, sometimes called large deviations of level 1. We do not
state Cramér's theorem in full generality. Our purpose is to put emphasis on the
methods used to derive such a result. For simplicity, we consider the case of real-valued i.i.d.
random variables $X_i$ with (nondegenerate) probability distribution $\mu$ of finite mean $\mathbb{E}X_1
= \int x \mu(dx) < \infty$, and we introduce the random walk $S_n = \sum_{i=1}^n X_i$. It is well-known by
the law of large numbers that the empirical mean $S_n/n$ converges in probability to $\bar{x} =
\mathbb{E}X_1$, i.e. $\lim_n \mathbb{P}[S_n/n \in (\bar{x} - \varepsilon, \bar{x} + \varepsilon)] = 1$ for all $\varepsilon > 0$. Notice also, by the central limit
theorem, that $\lim_n \mathbb{P}[S_n/n \in [\bar{x}, \bar{x} + \varepsilon)] = 1/2$ for all $\varepsilon > 0$. Large deviations results focus
on asymptotics for probabilities of rare events, for example of the form $\mathbb{P}[S_n/n \geq x]$ for $x >
\mathbb{E}X_1$, and state that
$$ \mathbb{P}\Big[\frac{S_n}{n} \geq x\Big] \simeq e^{-\gamma n}, $$
for some constant $\gamma$ to be made precise later. The symbol $\simeq$ means that the ratio is one in
the log-limit (here when $n$ goes to infinity), i.e. $\frac{1}{n} \ln \mathbb{P}[S_n/n \geq x] \to -\gamma$. The rate of
convergence is characterized by the Fenchel-Legendre transform of the c.g.f. $\Gamma$ of $X_1$:
$$ \Gamma^*(x) = \sup_{\theta \in \mathbb{R}} \big[\theta x - \Gamma(\theta)\big] \in [0, \infty], \qquad x \in \mathbb{R}. $$
As a supremum of affine functions, $\Gamma^*$ is convex. The sup in the definition of $\Gamma^*$ can be
evaluated by differentiation: for $x \in \mathbb{R}$, if $\theta = \theta(x)$ is a solution to the saddle-point equation
$x = \Gamma'(\theta)$, then $\Gamma^*(x) = \theta x - \Gamma(\theta)$. Notice, from (2.5), that the exponential change of
measure $\mathbb{P}_\theta$ sets the expectation of $X_1$ to $x$. Actually, exponential change of measure is a
key tool in large deviations methods. The idea is to select a measure under which the rare
event is no longer rare, so that the rate of decrease of the original probability is given by
the rate of decrease of the likelihood ratio. This particular change of measure is intended
to approximate the most likely way for the rare event to occur.
By Jensen's inequality, we show that $\Gamma^*(\mathbb{E}X_1) = 0$. This implies that for all $x \geq \mathbb{E}X_1$,
$\Gamma^*(x) = \sup_{\theta \geq 0} [\theta x - \Gamma(\theta)]$, and so $\Gamma^*$ is nondecreasing on $[\mathbb{E}X_1, \infty)$.
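The Jensen step, and the restriction of the supremum to $\theta \geq 0$, can be written out as follows. This is a short sketch that only uses the convexity of the exponential function and $\Gamma(0) = 0$:

```latex
% Jensen: E[e^{\theta X_1}] >= e^{\theta E X_1}, hence for every \theta in R
\Gamma(\theta) = \ln \mathbb{E}\big[e^{\theta X_1}\big] \;\geq\; \theta\, \mathbb{E}X_1
\quad\Longrightarrow\quad
\theta\, \mathbb{E}X_1 - \Gamma(\theta) \;\leq\; 0 = 0 \cdot \mathbb{E}X_1 - \Gamma(0),
\quad\text{so } \Gamma^*(\mathbb{E}X_1) = 0 .
% For x >= E X_1 and \theta < 0, the candidate value is dominated by the value at \theta = 0:
\theta x - \Gamma(\theta) \;\leq\; \theta\, \mathbb{E}X_1 - \Gamma(\theta) \;\leq\; 0
\quad\Longrightarrow\quad
\Gamma^*(x) = \sup_{\theta \geq 0} \big[ \theta x - \Gamma(\theta) \big] .
```

Monotonicity on $[\mathbb{E}X_1, \infty)$ then follows because, for each fixed $\theta \geq 0$, the map $x \mapsto \theta x - \Gamma(\theta)$ is nondecreasing.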
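Theorem 2.1 (Cramér's theorem)
For any $x \geq \mathbb{E}X_1$, we have
$$ \lim_{n \to \infty} \frac{1}{n} \ln \mathbb{P}\Big[\frac{S_n}{n} \geq x\Big] = -\Gamma^*(x) = -\inf_{y \geq x} \Gamma^*(y). \qquad (2.6) $$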
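Proof. 1) Upper bound. The main step in the upper bound of (2.6) is based on Chebyshev's
inequality combined with the i.i.d. assumption on the $X_i$:
$$ \mathbb{P}\Big[\frac{S_n}{n} \geq x\Big] = \mathbb{E}\big[1_{S_n/n \geq x}\big] \leq \mathbb{E}\big[e^{\theta(S_n - nx)}\big] = \exp\big(n\Gamma(\theta) - \theta n x\big), \qquad \forall \theta \geq 0. $$
By taking the infimum over $\theta \geq 0$, and since $\Gamma^*(x) = \sup_{\theta \geq 0} [\theta x - \Gamma(\theta)]$ for $x \geq \mathbb{E}X_1$, we
then obtain
$$ \mathbb{P}\Big[\frac{S_n}{n} \geq x\Big] \leq \exp\big(-n\Gamma^*(x)\big), $$
and so in particular the upper bound of (2.6).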

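2) Lower bound. Since $\mathbb{P}[S_n/n \geq x] \geq \mathbb{P}[S_n/n \in [x, x + \varepsilon)]$ for all $\varepsilon > 0$, it suffices to show
that
$$ \lim_{\varepsilon \to 0} \liminf_{n \to \infty} \frac{1}{n} \ln \mathbb{P}\Big[\frac{S_n}{n} \in [x, x + \varepsilon)\Big] \geq -\Gamma^*(x). \qquad (2.7) $$
For simplicity, we assume that $\mu$ has bounded support, so that $\Gamma$ is finite
everywhere, and there exists a solution $\theta = \theta(x) > 0$ to the saddle-point equation $\Gamma'(\theta) =
x$, i.e. attaining the supremum in $\Gamma^*(x) = \theta(x)x - \Gamma(\theta(x))$. The key step is now to introduce
the new probability distribution $\mu_\theta$ as in (2.1) and $\mathbb{P}_\theta$ the corresponding probability measure
on $(\Omega, \mathcal{F})$ with likelihood ratio:
$$ \frac{d\mathbb{P}_\theta}{d\mathbb{P}} = \prod_{i=1}^n \frac{d\mu_\theta}{d\mu}(X_i) = \exp\big(\theta S_n - n\Gamma(\theta)\big). $$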
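Then, we have by (2.3) and for all $\varepsilon > 0$:
$$ \mathbb{P}\Big[\frac{S_n}{n} \in [x, x + \varepsilon)\Big] = \mathbb{E}_\theta\Big[\exp\big(-\theta S_n + n\Gamma(\theta)\big) 1_{S_n/n \in [x, x+\varepsilon)}\Big] $$
$$ = e^{-n(\theta x - \Gamma(\theta))} \, \mathbb{E}_\theta\Big[\exp\Big(-n\theta\Big(\frac{S_n}{n} - x\Big)\Big) 1_{S_n/n \in [x, x+\varepsilon)}\Big] $$
$$ \geq e^{-n(\theta x - \Gamma(\theta))} e^{-n|\theta|\varepsilon} \, \mathbb{P}_\theta\Big[\frac{S_n}{n} \in [x, x + \varepsilon)\Big], $$
and so
$$ \frac{1}{n} \ln \mathbb{P}\Big[\frac{S_n}{n} \in [x, x + \varepsilon)\Big] \geq -[\theta x - \Gamma(\theta)] - |\theta|\varepsilon + \frac{1}{n} \ln \mathbb{P}_\theta\Big[\frac{S_n}{n} \in [x, x + \varepsilon)\Big]. \qquad (2.8) $$
Now, since $\Gamma'(\theta) = x$, we have $\mathbb{E}_\theta[X_1] = x$, and by the law of large numbers and CLT:
$\lim_n \mathbb{P}_\theta[S_n/n \in [x, x + \varepsilon)] = 1/2$ $(> 0)$. We also have $\Gamma^*(x) = \theta x - \Gamma(\theta)$. Therefore,
by sending $n$ to infinity and then $\varepsilon$ to zero in (2.8), we get (2.7). Finally, notice that
$\inf_{y \geq x} \Gamma^*(y) = \Gamma^*(x)$ since $\Gamma^*$ is nondecreasing on $[\mathbb{E}X_1, \infty)$. □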
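Examples
1) Bernoulli distribution: for $X_1 \sim B(p)$, we have $\Gamma^*(x) = x \ln\big(\frac{x}{p}\big) + (1 - x) \ln\big(\frac{1-x}{1-p}\big)$ for $x \in
[0, 1]$ and $\infty$ otherwise.
2) Poisson distribution: for $X_1 \sim P(\lambda)$, we have $\Gamma^*(x) = x \ln\big(\frac{x}{\lambda}\big) + \lambda - x$ for $x \geq 0$ and $\infty$
otherwise.
3) Normal distribution: for $X_1 \sim N(0, \sigma^2)$, we have $\Gamma^*(x) = \frac{x^2}{2\sigma^2}$, $x \in \mathbb{R}$.
4) Exponential distribution: for $X_1 \sim E(\lambda)$, we have $\Gamma^*(x) = \lambda x - 1 - \ln(\lambda x)$ for $x > 0$ and
$\Gamma^*(x) = \infty$ otherwise.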
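These closed forms can be cross-checked by maximizing $\theta x - \Gamma(\theta)$ numerically. The sketch below is an illustration, not a method from the notes: it uses a plain ternary search (valid because the objective is concave in $\theta$), and the search brackets and parameter values are arbitrary assumptions.

```python
import math

def legendre_transform(cgf, x, lo, hi, iters=200):
    """Maximise theta * x - cgf(theta) over [lo, hi] by ternary search.

    The objective is concave in theta because the c.g.f. is convex.
    """
    def objective(t):
        return t * x - cgf(t)
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if objective(a) < objective(b):
            lo = a
        else:
            hi = b
    return objective(0.5 * (lo + hi))

p, lam_p, sigma, lam_e = 0.3, 2.0, 1.5, 2.0  # illustrative parameters

# (name, c.g.f., point x, bracket for theta, closed-form rate at x)
cases = [
    ("Bernoulli", lambda t: math.log(1 - p + p * math.exp(t)), 0.6, (-20.0, 20.0),
     0.6 * math.log(0.6 / p) + 0.4 * math.log(0.4 / (1 - p))),
    ("Poisson", lambda t: lam_p * (math.exp(t) - 1), 3.5, (-20.0, 20.0),
     3.5 * math.log(3.5 / lam_p) + lam_p - 3.5),
    ("Normal", lambda t: 0.5 * t * t * sigma ** 2, 2.0, (-20.0, 20.0),
     2.0 ** 2 / (2 * sigma ** 2)),
    # The exponential c.g.f. is finite only for theta < lam_e.
    ("Exponential", lambda t: math.log(lam_e / (lam_e - t)), 1.5, (-20.0, lam_e - 1e-12),
     lam_e * 1.5 - 1 - math.log(lam_e * 1.5)),
]

results = {}
for name, cgf, x, (lo, hi), closed in cases:
    numeric = legendre_transform(cgf, x, lo, hi)
    results[name] = (numeric, closed)
    print(f"{name:<12} numeric = {numeric:.8f}   closed form = {closed:.8f}")
```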
Remark 2.1 Cramér's theorem possesses a multivariate counterpart dealing with the large
deviations of the empirical means of i.i.d. random vectors in $\mathbb{R}^d$.
Remark 2.2 The independence of the random variables $X_i$ in the large deviations result
for the empirical mean $S_n = \sum_{i=1}^n X_i/n$ can be relaxed with the Gärtner-Ellis theorem,
once we get the existence of the limit:
$$ \Gamma(\theta) := \lim_{n \to \infty} \frac{1}{n} \ln \mathbb{E}\big[e^{n\theta.S_n}\big], \qquad \theta \in \mathbb{R}^d, $$
and the large deviation principle holds with a rate function given by the Fenchel-Legendre
transform of $\Gamma$:
$$ \Gamma^*(x) = \sup_{\theta \in \mathbb{R}^d} [\theta.x - \Gamma(\theta)], \qquad x \in \mathbb{R}^d, \qquad (2.9) $$
provided $\Gamma$ is essentially smooth, i.e. (i) $\Gamma$ is differentiable in the interior of its domain,
assumed to be nonempty, and (ii) $\Gamma$ is steep: $|\Gamma'(\theta_n)| \to \infty$ for any sequence $(\theta_n)$ converging
to a boundary point of the domain. The steepness condition ensures the existence of a saddle-point $\theta(x)$ for any $x \in \mathbb{R}^d$, with a maximum attained in (2.9) at $\theta(x)$.
Remark 2.3 (Relation with importance sampling)
Fix $n$ and let us consider the estimation of $p_n = \mathbb{P}[S_n/n \geq x]$. A standard estimator for
$p_n$ is the average of $N$ independent copies of $X = 1_{S_n/n \geq x}$. However, as shown in the
introduction, for large $n$, $p_n$ is small, and the relative error of this estimator is large. By
using an exponential change of measure $\mathbb{P}_\theta$ with likelihood ratio
$$ \frac{d\mathbb{P}_\theta}{d\mathbb{P}} = \exp\big(\theta S_n - n\Gamma(\theta)\big), $$
so that
$$ p_n = \mathbb{E}_\theta\Big[\exp\big(-\theta S_n + n\Gamma(\theta)\big) 1_{S_n/n \geq x}\Big], $$
we have an (unbiased) importance sampling (IS) estimator of $p_n$, by taking the average of
independent replications of
$$ \exp\big(-\theta S_n + n\Gamma(\theta)\big) 1_{S_n/n \geq x}. $$
The parameter $\theta$ is chosen in order to minimize the variance of this estimator, or equivalently
its second moment:
$$ M_n^2(\theta, x) = \mathbb{E}_\theta\Big[\exp\big(-2\theta S_n + 2n\Gamma(\theta)\big) 1_{S_n/n \geq x}\Big] \leq \exp\big(-2n(\theta x - \Gamma(\theta))\big). \qquad (2.10) $$
By noting from the Cauchy-Schwarz inequality that $M_n^2(\theta, x) \geq p_n^2 = \big(\mathbb{P}[S_n/n \geq x]\big)^2 \simeq C e^{-2n\Gamma^*(x)}$
as $n$ goes to infinity, from Cramér's theorem, we see that the fastest possible exponential
rate of decay of $M_n^2(\theta, x)$ is twice the rate of the probability itself, i.e. $2\Gamma^*(x)$. Hence,
from (2.10), and with the choice of $\theta = \theta_x$ s.t. $\Gamma^*(x) = \theta_x x - \Gamma(\theta_x)$, we get an asymptotically
optimal IS estimator in the sense that:
$$ \lim_{n \to \infty} \frac{1}{n} \ln M_n^2(\theta_x, x) = 2 \lim_{n \to \infty} \frac{1}{n} \ln p_n. $$
This parameter $\theta_x$ is such that $\mathbb{E}_{\theta_x}[S_n/n] = x$, so that the event $\{S_n/n \geq x\}$ is no longer
rare under $\mathbb{P}_{\theta_x}$, and is precisely the parameter used in the derivation of the large deviations
result in Cramér's theorem.
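The estimator of Remark 2.3 can be sketched for Gaussian increments, where $\Gamma(\theta) = \theta^2\sigma^2/2$, the saddle point is $\theta_x = x/\sigma^2$, and the exact value of $p_n$ is available for comparison. The sample sizes, seed and parameter values below are illustrative assumptions, and the function names are hypothetical.

```python
import math
import random

def naive_estimator(n, x, sigma, num_samples, rng):
    """Plain Monte-Carlo average of 1_{S_n/n >= x} under P."""
    hits = 0
    for _ in range(num_samples):
        s = sum(rng.gauss(0.0, sigma) for _ in range(n))
        if s >= n * x:
            hits += 1
    return hits / num_samples

def importance_sampling_estimator(n, x, sigma, num_samples, rng):
    """Average of exp(-theta S_n + n Gamma(theta)) 1_{S_n/n >= x} under P_theta."""
    theta = x / sigma ** 2                 # saddle point: Gamma'(theta) = theta sigma^2 = x
    cgf = 0.5 * theta ** 2 * sigma ** 2    # Gamma(theta) for N(0, sigma^2)
    total = 0.0
    for _ in range(num_samples):
        # Under P_theta the X_i are i.i.d. N(theta sigma^2, sigma^2).
        s = sum(rng.gauss(theta * sigma ** 2, sigma) for _ in range(n))
        if s >= n * x:
            total += math.exp(-theta * s + n * cgf)
    return total / num_samples

def exact_probability(n, x, sigma):
    """P[S_n/n >= x] when S_n/n ~ N(0, sigma^2 / n)."""
    return 0.5 * math.erfc(x * math.sqrt(n) / (sigma * math.sqrt(2.0)))

rng = random.Random(0)
n, x, sigma, num_samples = 50, 0.5, 1.0, 5000

p_exact = exact_probability(n, x, sigma)
p_is = importance_sampling_estimator(n, x, sigma, num_samples, rng)
p_naive = naive_estimator(n, x, sigma, num_samples, rng)
print(f"exact = {p_exact:.3e}   IS = {p_is:.3e}   naive = {p_naive:.3e}")
```

With these settings the event has probability of order $10^{-4}$, so the naive estimator typically sees only a handful of hits (often none), whereas about half of the tilted samples fall in the event.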
2.3 Large deviations and Laplace principles
In this section, we present an approach to large deviations theory based on the Laplace principle,
which consists in the evaluation of the asymptotics of certain expectations.
We first give the formal definition of a large deviation principle (LDP). Consider a
sequence $\{Z_\varepsilon\}_\varepsilon$ on $(\Omega, \mathcal{F}, \mathbb{P})$ valued in some topological space $\mathcal{X}$. The LDP characterizes
the limiting behaviour as $\varepsilon \to 0$ of the family of probability measures $\{\mathbb{P}[Z_\varepsilon \in dx]\}_\varepsilon$ on $\mathcal{X}$
in terms of a rate function. A rate function $I$ is a lower semicontinuous function mapping
$I : \mathcal{X} \to [0, \infty]$. It is a good rate function if the level sets $\{x \in \mathcal{X} : I(x) \leq M\}$ are compact
for all $M < \infty$.
The sequence $\{Z_\varepsilon\}_\varepsilon$ satisfies a LDP on $\mathcal{X}$ with rate function $I$ (and speed $\varepsilon$) if:
(i) Upper bound: for any closed subset $F$ of $\mathcal{X}$
$$ \limsup_{\varepsilon \to 0} \varepsilon \ln \mathbb{P}[Z_\varepsilon \in F] \leq -\inf_{x \in F} I(x). $$
(ii) Lower bound: for any open subset $G$ of $\mathcal{X}$
$$ \liminf_{\varepsilon \to 0} \varepsilon \ln \mathbb{P}[Z_\varepsilon \in G] \geq -\inf_{x \in G} I(x). $$
If $F$ is a subset of $\mathcal{X}$ s.t. $\inf_{x \in F^o} I(x) = \inf_{x \in \bar{F}} I(x) := I_F$, then
$$ \lim_{\varepsilon \to 0} \varepsilon \ln \mathbb{P}[Z_\varepsilon \in F] = -I_F, $$
which means that $\mathbb{P}[Z_\varepsilon \in F] = C_\varepsilon e^{-I_F/\varepsilon}$ where $(C_\varepsilon)$ is a sequence converging at a subexponential rate, i.e. $\varepsilon \ln C_\varepsilon$ tends to zero when $\varepsilon$ goes to zero. The classical Cramér's theorem
considered the case of the empirical mean $Z_\varepsilon = S_n/n$ of i.i.d. random variables in $\mathbb{R}^d$, with
$\varepsilon = 1/n$.
We first state a basic transformation of LDP, namely a contraction principle, which
yields that the LDP is preserved under continuous mappings.
Theorem 2.2 (Contraction principle)
Suppose that $\{Z_\varepsilon\}_\varepsilon$ satisfies a LDP on $\mathcal{X}$ with a good rate function $I$, and let $f$ be a
continuous mapping from $\mathcal{X}$ to $\mathcal{Y}$. Then $\{f(Z_\varepsilon)\}_\varepsilon$ satisfies a LDP on $\mathcal{Y}$ with the good rate
function
$$ J(y) = \inf\big\{I(x) : x \in \mathcal{X}, \; y = f(x)\big\}. $$
In particular, when $f$ is a continuous one-to-one mapping, $J = I(f^{-1})$.
Proof. Clearly, $J$ is nonnegative. Since $I$ is a good rate function, for all $y \in f(\mathcal{X})$, the
infimum in the definition of $J$ is attained at some point of $\mathcal{X}$. Thus, the level sets of $J$,
$\Psi_J(M) := \{y : J(y) \leq M\}$, are equal to
$$ \Psi_J(M) = \{f(x) : I(x) \leq M\} = f(\Psi_I(M)), $$
where $\Psi_I(M) := \{x : I(x) \leq M\}$ is the corresponding level set of $I$. Since $\Psi_I(M)$ is
compact and $f$ is continuous, so are the sets $\Psi_J(M)$, which means that $J$ is a good rate function. Moreover,
by definition of $J$, we have for any $A \subset \mathcal{Y}$:
$$ \inf_{y \in A} J(y) = \inf_{x \in f^{-1}(A)} I(x). $$
Since $f$ is continuous, the set $f^{-1}(A)$ is open (resp. closed) for any open (resp. closed) $A
\subset \mathcal{Y}$. Therefore, the LDP for $\{f(Z_\varepsilon)\}_\varepsilon$ with rate function $J$ follows as a consequence of the
LDP for $\{Z_\varepsilon\}_\varepsilon$ with rate function $I$. □
We now provide an equivalent formulation of the large deviation principle, relying on Varadhan's integral formula, which involves the asymptotic behavior of certain expectations. It
extends the well-known method of Laplace for studying the asymptotics of certain integrals
on $\mathbb{R}$: given a continuous function $\varphi$ from $[0, 1]$ into $\mathbb{R}$, Laplace's method states that
$$ \lim_{n \to \infty} \frac{1}{n} \ln \int_0^1 e^{n\varphi(x)} dx = \max_{x \in [0,1]} \varphi(x). $$
Varadhan's result is formulated as follows:
Theorem 2.3 (Varadhan)
Suppose that $\{Z_\varepsilon\}_\varepsilon$ satisfies a LDP on $\mathcal{X}$ with good rate function $I$. Then, for any bounded
continuous function $\varphi : \mathcal{X} \to \mathbb{R}$, we have
$$ \lim_{\varepsilon \to 0} \varepsilon \ln \mathbb{E}\big[e^{\varphi(Z_\varepsilon)/\varepsilon}\big] = \sup_{x \in \mathcal{X}} \big[\varphi(x) - I(x)\big]. \qquad (2.11) $$
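Proof. (a) Since $\varphi$ is bounded, there exists $M \in (0, \infty)$ s.t. $-M \leq \varphi(x) \leq M$ for all $x \in
\mathcal{X}$. For $N$ a positive integer, and $j \in \{1, \ldots, N\}$, we consider the closed subsets of $\mathcal{X}$
$$ F_{N,j} = \Big\{x \in \mathcal{X} : -M + \frac{2(j-1)M}{N} \leq \varphi(x) \leq -M + \frac{2jM}{N}\Big\}, $$
so that $\cup_{j=1}^N F_{N,j} = \mathcal{X}$. We then have from the large deviations upper bound on $(Z_\varepsilon)$,
$$ \limsup_{\varepsilon \to 0} \varepsilon \ln \mathbb{E}\big[e^{\varphi(Z_\varepsilon)/\varepsilon}\big] = \limsup_{\varepsilon \to 0} \varepsilon \ln \int_{\mathcal{X}} e^{\varphi(x)/\varepsilon} \, \mathbb{P}[Z_\varepsilon \in dx] $$
$$ \leq \limsup_{\varepsilon \to 0} \varepsilon \ln \Big(\sum_{j=1}^N \int_{F_{N,j}} e^{\varphi(x)/\varepsilon} \, \mathbb{P}[Z_\varepsilon \in dx]\Big) $$
$$ \leq \limsup_{\varepsilon \to 0} \varepsilon \ln \Big(\sum_{j=1}^N e^{(-M + 2jM/N)/\varepsilon} \, \mathbb{P}[Z_\varepsilon \in F_{N,j}]\Big) $$
$$ \leq \limsup_{\varepsilon \to 0} \varepsilon \ln \Big(N \max_{j=1,\ldots,N} e^{(-M + 2jM/N)/\varepsilon} \, \mathbb{P}[Z_\varepsilon \in F_{N,j}]\Big) $$
$$ \leq \max_{j=1,\ldots,N} \Big(-M + \frac{2jM}{N} + \limsup_{\varepsilon \to 0} \varepsilon \ln \mathbb{P}[Z_\varepsilon \in F_{N,j}]\Big) $$
$$ \leq \max_{j=1,\ldots,N} \Big(-M + \frac{2jM}{N} + \sup_{x \in F_{N,j}} [-I(x)]\Big) $$
$$ \leq \max_{j=1,\ldots,N} \Big(-M + \frac{2jM}{N} + \sup_{x \in F_{N,j}} [\varphi(x) - I(x)] - \inf_{x \in F_{N,j}} \varphi(x)\Big) $$
$$ \leq \sup_{x \in \mathcal{X}} [\varphi(x) - I(x)] + \frac{2M}{N}, $$
where the factor $N$ disappears in the limit because $\varepsilon \ln N \to 0$.
By sending $N$ to infinity, we get the inequality $\leq$ in (2.11).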

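(b) To prove the reverse inequality, we fix an arbitrary point $x_0 \in \mathcal{X}$, an arbitrary $\delta > 0$,
and we consider the open set $G = \{x \in \mathcal{X} : \varphi(x) > \varphi(x_0) - \delta\}$. Then, we have from the
large deviations lower bound on $(Z_\varepsilon)$,
$$ \liminf_{\varepsilon \to 0} \varepsilon \ln \mathbb{E}\big[e^{\varphi(Z_\varepsilon)/\varepsilon}\big] \geq \liminf_{\varepsilon \to 0} \varepsilon \ln \mathbb{E}\big[e^{\varphi(Z_\varepsilon)/\varepsilon} 1_{Z_\varepsilon \in G}\big] $$
$$ \geq \varphi(x_0) - \delta + \liminf_{\varepsilon \to 0} \varepsilon \ln \mathbb{P}[Z_\varepsilon \in G] $$
$$ \geq \varphi(x_0) - \delta - \inf_{x \in G} I(x) $$
$$ \geq \varphi(x_0) - I(x_0) - \delta. $$
Since $x_0 \in \mathcal{X}$ and $\delta > 0$ are arbitrary, we get the required result. □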
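Remark 2.4 The relation (2.11) has the following interpretation. By writing formally the
LDP for $(Z_\varepsilon)$ with rate function $I$ as $\mathbb{P}[Z_\varepsilon \in dx] \simeq e^{-I(x)/\varepsilon} dx$, we can write
$$ \mathbb{E}\big[e^{\varphi(Z_\varepsilon)/\varepsilon}\big] = \int e^{\varphi(x)/\varepsilon} \, \mathbb{P}[Z_\varepsilon \in dx] \simeq \int e^{(\varphi(x) - I(x))/\varepsilon} dx \simeq C \exp\Big(\frac{\sup_{x \in \mathcal{X}} (\varphi(x) - I(x))}{\varepsilon}\Big). $$
As in Laplace's method, Varadhan's formula states that to exponential order, the main
contribution to the integral is due to the largest value of the exponent.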
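Varadhan's formula can be illustrated numerically in a one-dimensional case. The sketch below assumes $Z_\varepsilon = \sqrt{\varepsilon}\,N$ with $N$ standard normal, a family known to satisfy a LDP with rate function $I(x) = x^2/2$; the test function $\varphi(x) = \sin(2x)$, the truncation interval and the grid size are arbitrary illustrative choices.

```python
import math

def phi(x):
    """Illustrative bounded continuous function."""
    return math.sin(2.0 * x)

def rate(x):
    """Rate function of Z_eps = sqrt(eps) * N(0, 1)."""
    return 0.5 * x * x

def scaled_log_expectation(eps, lo=-6.0, hi=6.0, steps=60000):
    """eps * ln E[exp(phi(Z_eps)/eps)], via a Riemann sum evaluated in log-space."""
    h = (hi - lo) / steps
    exponents = [(phi(lo + i * h) - rate(lo + i * h)) / eps for i in range(steps + 1)]
    m = max(exponents)  # log-sum-exp shift to avoid overflow
    log_integral = m + math.log(sum(math.exp(e - m) for e in exponents) * h)
    log_norm = -0.5 * math.log(2.0 * math.pi * eps)  # Gaussian density normalisation
    return eps * (log_integral + log_norm)

def varadhan_limit(lo=-6.0, hi=6.0, steps=60000):
    """sup_x [phi(x) - I(x)], approximated on the same grid."""
    h = (hi - lo) / steps
    return max(phi(lo + i * h) - rate(lo + i * h) for i in range(steps + 1))

limit = varadhan_limit()
values = {eps: scaled_log_expectation(eps) for eps in (1e-1, 1e-2, 1e-3)}
for eps, value in values.items():
    print(f"eps={eps:<6} eps ln E = {value:.5f}   sup(phi - I) = {limit:.5f}")
```

The discrepancy is of order $\varepsilon$, consistent with the subexponential prefactor $C$ in Remark 2.4.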
When (2.11) holds, we say that the sequence $(Z_\varepsilon)$ satisfies a Laplace principle on $\mathcal{X}$
with rate function $I$. Hence, Theorem 2.3 means that the large deviation principle implies
the Laplace principle. The next result proves the converse.
Theorem 2.4 The Laplace principle implies the large deviation principle with the same
good rate function. More precisely, if $I$ is a good rate function on $\mathcal{X}$ and the limit
$$ \lim_{\varepsilon \to 0} \varepsilon \ln \mathbb{E}\big[e^{\varphi(Z_\varepsilon)/\varepsilon}\big] = \sup_{x \in \mathcal{X}} \big[\varphi(x) - I(x)\big] $$
is valid for all bounded continuous functions $\varphi$, then $(Z_\varepsilon)$ satisfies a large deviation principle
on $\mathcal{X}$ with rate function $I$.
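Proof. (a) We first prove the large deviation upper bound. Given a closed set $F$ of $\mathcal{X}$, we
define the nonpositive function: $\varphi(x) = 0$ if $x \in F$, and $-\infty$ otherwise. Let $d(x, F)$ denote
the distance from $x$ to $F$, and for $n \in \mathbb{N}$, define
$$ \varphi_n(x) = -n\big(d(x, F) \wedge 1\big). $$
Then, $\varphi_n$ is a bounded continuous function and $\varphi_n$ decreases to $\varphi$ as $n$ goes to infinity. Hence,
$$ \varepsilon \ln \mathbb{P}[Z_\varepsilon \in F] = \varepsilon \ln \mathbb{E}[\exp(\varphi(Z_\varepsilon)/\varepsilon)] \leq \varepsilon \ln \mathbb{E}[\exp(\varphi_n(Z_\varepsilon)/\varepsilon)], $$
and so from the Laplace principle
$$ \limsup_{\varepsilon \to 0} \varepsilon \ln \mathbb{P}[Z_\varepsilon \in F] \leq \limsup_{\varepsilon \to 0} \varepsilon \ln \mathbb{E}[\exp(\varphi_n(Z_\varepsilon)/\varepsilon)] = \sup_{x \in \mathcal{X}} [\varphi_n(x) - I(x)] = -\inf_{x \in \mathcal{X}} [-\varphi_n(x) + I(x)]. $$
The proof of the large deviation upper bound is then completed once we show that
$$ \lim_{n \to \infty} \inf_{x \in \mathcal{X}} [-\varphi_n(x) + I(x)] = \inf_{x \in F} I(x), $$
and this is left as an exercise to the reader.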

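(b) We now consider the large deviation lower bound. Let $G$ be an open set in $\mathcal{X}$. If $I_G =
\infty$, there is nothing to prove, so we may assume that $I_G < \infty$. Let $x$ be an arbitrary point
in $G$ with $I(x) < \infty$. We can choose a real number $M > I(x)$, and $\delta > 0$ such that $B(x, \delta) \subset G$. Define
the function
$$ \varphi(y) = -M\Big(\frac{d(x, y)}{\delta} \wedge 1\Big), $$
and observe that $\varphi$ is bounded, continuous, nonpositive, and satisfies: $\varphi(x) = 0$, $\varphi(y) =
-M$ for $y \notin B(x, \delta)$. We then have
$$ \mathbb{E}[\exp(\varphi(Z_\varepsilon)/\varepsilon)] \leq e^{-M/\varepsilon} \, \mathbb{P}[Z_\varepsilon \notin B(x, \delta)] + \mathbb{P}[Z_\varepsilon \in B(x, \delta)] \leq e^{-M/\varepsilon} + \mathbb{P}[Z_\varepsilon \in B(x, \delta)], $$
and so
$$ \max\Big\{\liminf_{\varepsilon \to 0} \varepsilon \ln \mathbb{P}[Z_\varepsilon \in B(x, \delta)], \; -M\Big\} \geq \liminf_{\varepsilon \to 0} \varepsilon \ln \mathbb{E}[\exp(\varphi(Z_\varepsilon)/\varepsilon)] = \sup_{y \in \mathcal{X}} [\varphi(y) - I(y)] \geq -I(x). $$
Since $-M < -I(x)$, and $B(x, \delta) \subset G$, it follows that
$$ \liminf_{\varepsilon \to 0} \varepsilon \ln \mathbb{P}[Z_\varepsilon \in G] \geq \liminf_{\varepsilon \to 0} \varepsilon \ln \mathbb{P}[Z_\varepsilon \in B(x, \delta)] \geq -I(x), $$
and thus
$$ \liminf_{\varepsilon \to 0} \varepsilon \ln \mathbb{P}[Z_\varepsilon \in G] \geq -\inf_{x \in G} I(x) = -I_G, $$
which ends the proof.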
We next show how one can evaluate expectations arising in Laplace principles, which
can then be used to derive the large deviation principle.
2.4 Relative entropy and Donsker-Varadhan formula
The relative entropy plays a key role in the determination of the rate function. We are
given a topological space $S$, and we denote by $\mathcal{P}(S)$ the set of probability measures on $S$
equipped with its Borel $\sigma$-field.
For $\mu \in \mathcal{P}(S)$, the relative entropy $R(.|\mu)$ is a mapping from $\mathcal{P}(S)$ into $\bar{\mathbb{R}}$,
defined by
$$ R(\nu|\mu) = \int_S \ln\Big(\frac{d\nu}{d\mu}\Big) d\nu = \int_S \frac{d\nu}{d\mu} \ln\Big(\frac{d\nu}{d\mu}\Big) d\mu, $$
whenever $\nu \in \mathcal{P}(S)$ is absolutely continuous with respect to $\mu$, and we set $R(\nu|\mu) = \infty$
otherwise. By observing that $s \ln s \geq s - 1$ with equality if and only if $s = 1$, we see that
$R(\nu|\mu) \geq 0$, and $R(\nu|\mu) = 0$ if and only if $\nu = \mu$.
The relative entropy arises in the expectation in the Laplace principle via the following
variational formula.
Proposition 2.1 Let $\varphi$ be a bounded measurable function on $S$ and $\mu$ a probability measure
on $S$. Then,
$$ \ln \int_S e^{\varphi} d\mu = \sup_{\nu \in \mathcal{P}(S)} \Big[\int_S \varphi \, d\nu - R(\nu|\mu)\Big], \qquad (2.12) $$
and the supremum is attained uniquely by the probability measure $\nu_0$ defined by
$$ \frac{d\nu_0}{d\mu} = \frac{e^{\varphi}}{\int_S e^{\varphi} d\mu}. $$
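Proof. In the supremum in (2.12), we may restrict to $\nu \in \mathcal{P}(S)$ with finite relative entropy:
$R(\nu|\mu) < \infty$. If $R(\nu|\mu) < \infty$, then $\nu$ is absolutely continuous with respect to $\mu$, and since $\mu$
is equivalent to $\nu_0$, $\nu$ is also absolutely continuous with respect to $\nu_0$. Thus,
$$ \int_S \varphi \, d\nu - R(\nu|\mu) = \int_S \varphi \, d\nu - \int_S \ln\Big(\frac{d\nu}{d\mu}\Big) d\nu $$
$$ = \int_S \varphi \, d\nu - \int_S \ln\Big(\frac{d\nu}{d\nu_0}\Big) d\nu - \int_S \ln\Big(\frac{d\nu_0}{d\mu}\Big) d\nu $$
$$ = \ln \int_S e^{\varphi} d\mu - R(\nu|\nu_0). $$
We conclude by using the fact that $R(\nu|\nu_0) \geq 0$ and $R(\nu|\nu_0) = 0$ if and only if $\nu = \nu_0$. □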
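On a finite state space the variational formula (2.12) and the identity used in the proof can be verified directly. The following sketch uses a hypothetical three-point space with arbitrary values for $\mu$, $\varphi$ and a few competitor measures $\nu$; none of these numbers come from the notes.

```python
import math

def relative_entropy(nu, mu):
    """R(nu|mu) = sum_s nu(s) ln(nu(s)/mu(s)); infinite if nu is not << mu."""
    total = 0.0
    for n_s, m_s in zip(nu, mu):
        if n_s > 0:
            if m_s == 0:
                return math.inf
            total += n_s * math.log(n_s / m_s)
    return total

def objective(nu, mu, phi):
    """The bracket in (2.12): integral of phi d(nu) minus R(nu|mu)."""
    return sum(n_s * p_s for n_s, p_s in zip(nu, phi)) - relative_entropy(nu, mu)

mu = [0.5, 0.3, 0.2]     # reference measure (illustrative)
phi = [1.0, -0.5, 2.0]   # bounded function on the three points (illustrative)

normaliser = sum(m * math.exp(p) for m, p in zip(mu, phi))
log_laplace = math.log(normaliser)                        # l.h.s. of (2.12)
nu0 = [m * math.exp(p) / normaliser for m, p in zip(mu, phi)]  # claimed maximiser

candidates = [[0.5, 0.3, 0.2], [1 / 3, 1 / 3, 1 / 3], [0.1, 0.1, 0.8], [1.0, 0.0, 0.0]]
gaps = [log_laplace - objective(nu, mu, phi) for nu in candidates]

print("ln int e^phi dmu      :", log_laplace)
print("objective at nu_0     :", objective(nu0, mu, phi))
print("gaps for competitors  :", gaps)
print("R(nu|nu_0) for the same:", [relative_entropy(nu, nu0) for nu in candidates])
```

Each gap coincides with $R(\nu|\nu_0)$, which is exactly the last line of the proof above.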
The dual formula to the variational formula (2.12) is known as the Donsker-Varadhan
variational formula. We denote by $B(S)$ the set of bounded measurable functions on $S$.
Proposition 2.2 (Donsker-Varadhan variational formula)
For all $\nu, \mu \in \mathcal{P}(S)$, we have
$$ R(\nu|\mu) = \sup_{\varphi \in B(S)} \Big[\int_S \varphi \, d\nu - \ln \int_S e^{\varphi} d\mu\Big]. \qquad (2.13) $$
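Proof. We denote by $H(\nu, \mu)$ the r.h.s. term in (2.13). By taking the zero function on $S$,
we observe that $H(\nu, \mu) \geq 0$. From (2.12), we have for any $\varphi \in B(S)$:
$$ R(\nu|\mu) \geq \int_S \varphi \, d\nu - \ln \int_S e^{\varphi} d\mu, $$
and so by taking the supremum over $\varphi$: $R(\nu|\mu) \geq H(\nu, \mu)$. To prove the converse inequality,
we may assume w.l.o.g. that $H(\nu, \mu) < \infty$. We first show that under this condition $\nu$ is
absolutely continuous with respect to $\mu$. Let $A$ be a Borel set for which $\mu(A) = 0$. Consider
for any $n > 0$ the function $\varphi_n = n 1_A \in B(S)$, so that by definition of $H$:
$$ \infty > H(\nu, \mu) \geq \int_S \varphi_n \, d\nu - \ln \int_S e^{\varphi_n} d\mu = n\nu(A). $$
Taking $n$ to infinity, we get $\nu(A) = 0$, and thus $\nu \ll \mu$. We define $f = d\nu/d\mu$, the Radon-Nikodym derivative of $\nu$ with respect to $\mu$. If $f$ is uniformly positive and bounded, then
$\varphi = \ln f$ lies in $B(S)$, and we get by definition of $H$:
$$ H(\nu, \mu) \geq \int_S \varphi \, d\nu - \ln \int_S e^{\varphi} d\mu = \int_S \ln\Big(\frac{d\nu}{d\mu}\Big) d\nu = R(\nu|\mu), $$
which is the desired inequality. If $f$ is uniformly positive but not bounded, we set $f_n =
f \wedge n$, $\varphi_n = \ln f_n \in B(S)$, and use the inequality $H(\nu, \mu) \geq \int_S \varphi_n \, d\nu - \ln \int_S e^{\varphi_n} d\mu$. By
sending $n$ to infinity and using the monotone convergence theorem, we get the required
inequality. In the general case, we define for $\varepsilon \in [0, 1]$: $\nu_\varepsilon = (1 - \varepsilon)\nu + \varepsilon\mu$, $f_\varepsilon = \frac{d\nu_\varepsilon}{d\mu} =
(1 - \varepsilon)f + \varepsilon$. Since $f_\varepsilon$ is uniformly positive for $\varepsilon > 0$, we have $R(\nu_\varepsilon|\mu) \leq H(\nu_\varepsilon, \mu)$. The
proof is completed by showing that
$$ \lim_{\varepsilon \to 0} R(\nu_\varepsilon|\mu) = R(\nu|\mu), \qquad \text{and} \qquad \lim_{\varepsilon \to 0} H(\nu_\varepsilon, \mu) = H(\nu, \mu). $$
This is achieved by using convexity arguments, and we refer to [15] for the details.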
The expression (2.13) of the relative entropy is useful, in particular, to show that for
fixed $\mu \in \mathcal{P}(S)$, the function $R(.|\mu)$ is a good rate function.
2.5 Sanov's theorem
We now deal with large deviations of level 2, which concern random measures. In particular,
Sanov's theorem addresses large deviations associated with the empirical measure of i.i.d.
random variables. In this paragraph, we show how one can derive this large deviation result
by means of the Laplace principle. Let $(X_n)$ be a sequence of i.i.d. random variables valued
in some Polish space $S$, and with common probability distribution $\mu$. We introduce the
corresponding sequence $(L_n)$ of empirical measures valued in $\mathcal{P}(S)$ by:
$$ L_n = \frac{1}{n} \sum_{j=0}^{n-1} \delta_{X_j}, $$
where $\delta_x$ is the Dirac measure at $x \in S$. The law of large numbers implies essentially the
weak convergence of $L_n$ to $\mu$. The next stage is Sanov's theorem, which states a large
deviation principle for $L_n$.
Theorem 2.5 (Sanov)
The sequence of empirical measures $(L_n)_n$ satisfies a large deviation principle with good
rate function the relative entropy $R(.|\mu)$.
The purpose of this paragraph is to sketch the arguments for deriving Sanov's theorem
by using the Laplace principle. This entails calculating the asymptotic behavior of the
following expectations:
$$ V^n := \frac{1}{n} \ln \mathbb{E}[\exp(n\varphi(L_n))], \qquad (2.14) $$
where $\varphi$ is any bounded continuous function mapping $\mathcal{P}(S)$ into $\mathbb{R}$. The main issue is
to obtain a representation in the form (2.11), and a key step is to express $V^n$ as the gain
function of an associated stochastic control problem by using the variational formula (2.12).
In the sequel, $\varphi$ is fixed. We introduce a sequence of random subprobability measures
related to the empirical measures as follows. For $t \in [0, 1]$, we denote by $\mathcal{M}_t(S)$ the set of
measures on $S$ with total mass equal to $t$. Fix $n \in \mathbb{N}^*$, and for $i = 0, \ldots, n - 1$, we define
$L_0^n = 0$, and
$$ L_{i+1}^n = L_i^n + \frac{1}{n} \delta_{X_i}, $$
so that $L_n^n$ equals the empirical measure $L_n$, and $L_i^n$ is valued in $\mathcal{M}_{i/n}(S)$. We also introduce, for each $i = 0, \ldots, n$, and $\nu \in \mathcal{M}_{i/n}(S)$, the function
$$ V^n(i, \nu) = \frac{1}{n} \ln \mathbb{E}_{i,\nu}[\exp(n\varphi(L_n^n))], $$
where $\mathbb{E}_{i,\nu}$ denotes the expectation conditioned on $L_i^n = \nu$. Thus, $V^n(0, 0) = V^n$ defined
in (2.14), and $V^n(n, \nu) = \varphi(\nu)$. In order to obtain a representation formula for $V^n$, we
first derive a recursive equation relating $V^n(i, .)$ and $V^n(i + 1, .)$ that we interpret as the
dynamic programming equation of a stochastic control problem.
Recalling that the random variables $X_i$ are i.i.d. with common distribution $\mu$, we
see that the random measures $\{L_i^n, i = 0, \ldots, n\}$ form a Markov chain on state spaces
$\{\mathcal{M}_{i/n}(S), i = 0, \ldots, n\}$ with probability transition:
$$ \mathbb{P}[L_{i+1}^n \in A \,|\, L_i^n = \nu] = \mathbb{P}\Big[\nu + \frac{1}{n}\delta_{X_i} \in A\Big] = \int_S 1_A\Big(\nu + \frac{1}{n}\delta_y\Big) \mu(dy). $$
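We then obtain by the law of iterated conditional expectations and the Markov property:
$$ V^n(i, \nu) = \frac{1}{n} \ln \mathbb{E}_{i,\nu}\Big[\mathbb{E}_{i+1, L_{i+1}^n}[\exp(n\varphi(L_n^n))]\Big] $$
$$ = \frac{1}{n} \ln \mathbb{E}_{i,\nu}\Big[\exp\big(nV^n(i + 1, L_{i+1}^n)\big)\Big] $$
$$ = \frac{1}{n} \ln \int_S \exp\Big(nV^n\Big(i + 1, \nu + \frac{1}{n}\delta_y\Big)\Big) \mu(dy). $$
By applying the variational formula (2.12), we obtain:
$$ V^n(i, \nu) = \sup_{\tilde{\nu} \in \mathcal{P}(S)} \Big[\int_S V^n\Big(i + 1, \nu + \frac{1}{n}\delta_y\Big) \tilde{\nu}(dy) - \frac{1}{n} R(\tilde{\nu}|\mu)\Big]. \qquad (2.15) $$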
The relation (2.15) is the dynamic programming equation for the following stochastic control
problem. The controlled process is a Markov chain $\{\bar{L}_i^n, i = 0, \ldots, n\}$ starting from $\bar{L}_0^n =
0$, with controlled probability transitions:
$$ \mathbb{P}[\bar{L}_{i+1}^n \in A \,|\, \bar{L}_i^n = \nu] = \int_S 1_A\Big(\nu + \frac{1}{n}\delta_y\Big) \nu_i(dy), $$
where $\{\nu_i, i = 0, \ldots, n\}$ is the control process valued in $\mathcal{P}(S)$, of feedback type, i.e. for each
$i$, the decision $\nu_i$ depends on $\bar{L}_i^n$. The running gain is $-\frac{1}{n} R(\nu_i|\mu)$, and the terminal gain is
$\varphi$. We deduce that
$$ V^n = V^n(0, 0) = \sup_{(\nu_i)} \mathbb{E}\Big[\varphi(\bar{L}_n^n) - \frac{1}{n} \sum_{i=0}^{n-1} R(\nu_i|\mu)\Big]. \qquad (2.16) $$
Fix some arbitrary $\nu \in \mathcal{P}(S)$, and consider the constant control $\nu_i = \nu$. With this
choice, $\bar L^n_n$ is the empirical measure of i.i.d. random variables having common distribution
$\nu$, and the representation (2.16) yields
\[ V^n \ge E\big[\varphi(\bar L^n_n)\big] - R(\nu|\rho). \]
Moreover, since $\bar L^n_n$ converges weakly to $\nu$, we have by the dominated convergence theorem:
\[ \lim_{n\to\infty} E\big[\varphi(\bar L^n_n)\big] = \varphi(\nu). \]
Since $\nu$ is arbitrary in $\mathcal{P}(S)$, we deduce that
\[ \liminf_{n\to\infty} V^n \ge \sup_{\nu\in\mathcal{P}(S)}\big[\varphi(\nu) - R(\nu|\rho)\big]. \]
The corresponding upper bound requires more technical details (see [15]), and
we finally get
\[ \lim_{n\to\infty} V^n = \sup_{\nu\in\mathcal{P}(S)}\big[\varphi(\nu) - R(\nu|\rho)\big]. \]
This implies that $L_n$ satisfies the Laplace principle, and thus the large deviation principle
with the good rate function $R(\cdot|\rho)$ on $\mathcal{P}(S)$.
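As a finite-state sanity check of this Laplace limit, write $\rho$ for the common law of the $X_i$ and take a linear functional $\varphi(\nu) = \int f\,d\nu$. Then $V^n = \ln\int e^f d\rho$ for every $n$, because the $X_i$ are i.i.d., and the supremum of $\varphi(\nu) - R(\nu|\rho)$ should be attained at the tilted measure $\nu^* \propto e^f\rho$. The sketch below checks this on a three-point alphabet; the function names and the sample data are illustrative assumptions, not taken from the text.

```python
import numpy as np

def relative_entropy(nu, rho):
    """R(nu|rho) = sum_k nu_k ln(nu_k / rho_k) on a finite alphabet, with 0 ln 0 = 0."""
    nu, rho = np.asarray(nu, float), np.asarray(rho, float)
    mask = nu > 0
    if np.any(rho[mask] == 0):
        return np.inf  # nu is not absolutely continuous with respect to rho
    return float(np.sum(nu[mask] * np.log(nu[mask] / rho[mask])))

def laplace_objective(nu, f, rho):
    """phi(nu) - R(nu|rho) for the linear functional phi(nu) = sum_k f_k nu_k."""
    return float(np.dot(f, nu)) - relative_entropy(nu, rho)

def gibbs_measure(f, rho):
    """Tilted measure nu*_k proportional to rho_k exp(f_k)."""
    f = np.asarray(f, float)
    w = np.asarray(rho, float) * np.exp(f - np.max(f))  # shift by max(f) for stability
    return w / w.sum()

# Illustrative data (assumed for this example only).
rho = np.array([0.5, 0.3, 0.2])
f = np.array([1.0, -0.5, 2.0])
log_laplace = float(np.log(np.dot(rho, np.exp(f))))  # ln int e^f d rho, equal to V^n for every n
nu_star = gibbs_measure(f, rho)
```

Evaluating `laplace_objective(nu_star, f, rho)` reproduces `log_laplace`, while any other probability vector gives a smaller value, which is the finite-alphabet form of the variational formula used in the proof.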
Remark 2.5 There are extensions of Sanov's theorem on LDP for empirical measures of
Markov chains and occupation times of continuous-time Markov processes. The main references are the seminal works by Donsker and Varadhan [11], [12], [13]. Consider an ergodic
Feller-Markov process $X$ valued in $S$, with generator $L$, and invariant measure $\mu$. Under
some conditions, the occupation measure $L_t = \frac{1}{t}\int_0^t \delta_{X_s}\,ds$ converges to $\mu$ exponentially fast
with a Donsker-Varadhan rate function given by
\[ I(\nu) = \sup_{u\in D^+(L)}\Big[-\int\frac{Lu}{u}\,d\nu\Big], \quad \nu\in\mathcal{P}(S), \]
where $D^+(L)$ is the space of positive functions in the domain of $L$. For a one-dimensional
diffusion with generator $Lu = \frac{1}{2}\sigma^2u_{xx} + bu_x$, the rate function may be written in terms of
Dirichlet form:
\[ I(\nu) = \begin{cases} \mathcal{E}(\sqrt{f},\sqrt{f}) := \frac{1}{2}\int\sigma^2\big((\sqrt{f})'\big)^2\,\mu(dx), & \text{if } f = \frac{d\nu}{d\mu}, \\ \infty, & \text{otherwise.} \end{cases} \]
This Donsker-Varadhan large deviations result is often formulated via the Laplace principle
in terms of asymptotic evaluation of Markov process expectations for large time: for any
bounded continuous function $\varphi$ on $S$,
\[ \lim_{t\to\infty}\frac{1}{t}\ln E\Big[\exp\Big(\int_0^t\varphi(X_s)\,ds\Big)\Big] = \sup_{\nu\in\mathcal{P}(S)}\Big[\int\varphi\,d\nu - I(\nu)\Big]. \]
2.6 Freidlin-Wentzell theory
In many problems, the interest is in rare events that depend on a random process, and the
corresponding asymptotic probabilities, usually called sample path large deviations or large
deviations of level 3, were developed by Freidlin-Wentzell and Donsker-Varadhan.
The first example is known as Schilder's theorem, and concerns large deviations for
the process $Z^\varepsilon = \sqrt{\varepsilon}W$, as $\varepsilon$ goes to zero, where $W = (W_t)_{t\in[0,T]}$ is a Brownian motion
in $\mathbb{R}^d$. Denote by $C([0,T])$ the space of continuous functions on $[0,T]$, and $H([0,T])$
the Cameron-Martin space consisting of absolutely continuous functions $h$, with square
integrable derivative $\dot h$.
Theorem 2.6 (Schilder)
$(\sqrt{\varepsilon}W)_\varepsilon$ satisfies a large deviation principle on $C([0,T])$ with rate function, also called
action functional:
\[ I(h) = \begin{cases} \frac{1}{2}\int_0^T|\dot h(t)|^2\,dt, & \text{if } h\in H_0([0,T]) := \{h\in H([0,T]) : h(0) = 0\}, \\ \infty, & \text{otherwise.} \end{cases} \]
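Let us show the lower bound of this LDP. Consider $G$ a nonempty open set of $C([0,T])$,
$h\in G$, and $\delta > 0$ s.t. $B(h,\delta)\subset G$. We want to prove that
\[ \liminf_{\varepsilon\to0}\varepsilon\ln P\big[\sqrt{\varepsilon}W\in B(h,\delta)\big] \ge -I(h). \]
For $h\notin H_0([0,T])$, this inequality is trivial since $I(h) = \infty$. Suppose now $h\in H_0([0,T])$,
and consider the probability measure:
\[ \frac{dQ^h}{dP} = \exp\Big(\int_0^T\frac{\dot h(t)}{\sqrt{\varepsilon}}\,dW_t - \frac{1}{2\varepsilon}\int_0^T|\dot h(t)|^2\,dt\Big), \]
so that by Cameron-Martin theorem, $W^h := W - \frac{h}{\sqrt{\varepsilon}}$ is a Brownian motion under $Q^h$.
Then, we have
\begin{align*}
P\big[\sqrt{\varepsilon}W\in B(h,\delta)\big] &= P\Big[|W^h| < \frac{\delta}{\sqrt{\varepsilon}}\Big] \\
&= E^{Q^h}\Big[\exp\Big(-\int_0^T\frac{\dot h(t)}{\sqrt{\varepsilon}}\,dW^h_t - \frac{1}{2\varepsilon}\int_0^T|\dot h(t)|^2\,dt\Big)1_{|W^h|<\delta/\sqrt{\varepsilon}}\Big] \\
(W^h \text{ is a } Q^h\text{-BM}) \quad &= E\Big[\exp\Big(-\int_0^T\frac{\dot h(t)}{\sqrt{\varepsilon}}\,dW_t - \frac{1}{2\varepsilon}\int_0^T|\dot h(t)|^2\,dt\Big)1_{|W|<\delta/\sqrt{\varepsilon}}\Big] \\
(W \sim -W) \quad &= E\Big[\exp\Big(+\int_0^T\frac{\dot h(t)}{\sqrt{\varepsilon}}\,dW_t - \frac{1}{2\varepsilon}\int_0^T|\dot h(t)|^2\,dt\Big)1_{|W|<\delta/\sqrt{\varepsilon}}\Big] \\
&= E\Big[\exp\Big(-\frac{1}{2\varepsilon}\int_0^T|\dot h(t)|^2\,dt\Big)\cosh\Big(\int_0^T\frac{\dot h(t)}{\sqrt{\varepsilon}}\,dW_t\Big)1_{|W|<\delta/\sqrt{\varepsilon}}\Big] \\
&\ge \exp\Big(-\frac{1}{2\varepsilon}\int_0^T|\dot h(t)|^2\,dt\Big)P\Big[|W| < \frac{\delta}{\sqrt{\varepsilon}}\Big],
\end{align*}
where the last equality averages the two preceding expressions, and the inequality uses $\cosh\ge1$. This implies
\[ \varepsilon\ln P\big[\sqrt{\varepsilon}W\in B(h,\delta)\big] \ge -I(h) + \varepsilon\ln P\Big[|W| < \frac{\delta}{\sqrt{\varepsilon}}\Big], \]
and thus the required lower bound, since $P[|W| < \delta/\sqrt{\varepsilon}]$ tends to 1 as $\varepsilon$ goes to zero.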
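Schilder's theorem can be checked by hand in one simple case. In dimension $d = 1$, write $\varepsilon$ for the small-noise parameter and consider the event that $\sqrt{\varepsilon}W$ reaches a level $K > 0$ before $T$. The reflection principle gives its probability exactly, and the cheapest path in the sense of the action functional is the straight line $h(t) = Kt/T$, with cost $K^2/(2T)$. The following sketch, whose function names and numerical values are assumptions made for illustration, compares the two.

```python
import math
import numpy as np

def schilder_action(h, T):
    """Discretised action 0.5 * int_0^T |h'(t)|^2 dt for a path sampled on a uniform grid."""
    h = np.asarray(h, dtype=float)
    dt = T / (len(h) - 1)
    return 0.5 * float(np.sum(np.diff(h, axis=0) ** 2)) / dt

def eps_log_hit_prob(eps, K, T):
    """eps * ln P[max_{t<=T} sqrt(eps) W_t >= K], exact by the reflection principle (d = 1).

    P[max >= K] = 2 (1 - Phi(K / sqrt(eps T))) = erfc(K / sqrt(2 eps T)).
    """
    return eps * math.log(math.erfc(K / math.sqrt(2.0 * eps * T)))

T, K = 1.0, 1.0
straight_line = np.linspace(0.0, K, 201)       # h(t) = K t / T on a uniform grid
action = schilder_action(straight_line, T)     # equals K^2 / (2 T) up to rounding
scaled_logs = [eps_log_hit_prob(eps, K, T) for eps in (1e-2, 1e-3)]
```

As $\varepsilon$ decreases, the entries of `scaled_logs` approach `-action`; values of $\varepsilon$ much below $10^{-3}$ would underflow in double precision with these parameters.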

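One can extend Schilder's result to the case of diffusion with small noise parameter:
\[ X^\varepsilon_t = x + \int_0^tb(X^\varepsilon_s)\,ds + \sqrt{\varepsilon}\int_0^t\sigma(X^\varepsilon_s)\,dW_s, \quad 0\le t\le T, \]
where $b$ and $\sigma$ are Lipschitz, and bounded. By using contraction principle for LDP, we
derive that $(X^\varepsilon)_\varepsilon$ satisfies a LDP in $C([0,T])$ with the good rate function
\[ I_x(h) = \inf_{\{f\in H_0([0,T]) \,:\, h(t) = x + \int_0^tb(h(s))ds + \int_0^t\sigma(h(s))\dot f(s)ds\}}\ \frac{1}{2}\int_0^T|\dot f(t)|^2\,dt. \]
When $\sigma$ is a square invertible matrix, the preceding formula for the rate function simplifies
to
\[ I_x(h) = \begin{cases} \frac{1}{2}\int_0^T\big|\dot h(t) - b(h(t))\big|^2_{(\sigma\sigma')^{-1}(h(t))}\,dt, & \text{if } h\in H_x([0,T]), \\ \infty, & \text{otherwise,} \end{cases} \]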
where $H_x([0,T]) := \{h\in H([0,T]) : h(0) = x\}$. We sketch the proof in the case $\sigma = I_d$,
and w.l.o.g. for $x = 0$. The transformation $\sqrt{\varepsilon}W\mapsto X^\varepsilon$ is given by the deterministic map
$F : C([0,T])\to C([0,T])$ defined by $F(f) = h$, where $h$ is the solution to
\[ h(t) = F(f)(t) = \int_0^tb(h(s))\,ds + f(t), \quad t\in[0,T]. \]
One easily checks from the Lipschitz condition on $b$ and Gronwall's lemma that the map $F$ is
continuous on $C([0,T])$, so that the contraction principle is applicable, and we obtain that
$(X^\varepsilon)_\varepsilon$ satisfies a LDP with good rate function
\[ I(h) = \inf_{\{f\in H_0([0,T]),\ h = F(f)\}}\ \frac{1}{2}\int_0^T|\dot f(t)|^2\,dt. \]
Moreover, observe that for $f\in H_0([0,T])$, $h = F(f)$ is differentiable a.e. with
\[ \dot h(t) = b(h(t)) + \dot f(t), \quad h(0) = 0, \]
from which we derive the simple expression of the rate function
\[ I(h) = \frac{1}{2}\int_0^T\big|\dot h(t) - b(h(t))\big|^2\,dt. \]
Another important application of Freidlin-Wentzell theory deals with the problem of
diffusion exit from a domain, and occurs naturally in finance, see Section 3.1. We briefly
summarize these results. Let $\varepsilon > 0$ be a (small) positive parameter and consider the stochastic
differential equation in $\mathbb{R}^d$ on some interval $[0,T]$,
\[ dX^\varepsilon_s = b_\varepsilon(s,X^\varepsilon_s)\,ds + \sqrt{\varepsilon}\,\sigma(s,X^\varepsilon_s)\,dW_s, \tag{2.17} \]
and suppose that there exists a Lipschitz function $b$ on $[0,T]\times\mathbb{R}^d$ s.t.
\[ \lim_{\varepsilon\to0}b_\varepsilon = b, \]
uniformly on compact sets. Given an open set $\Gamma$ of $\mathbb{R}^d$, we consider the exit time from $\Gamma$,
\[ \tau^\varepsilon_{t,x} = \inf\big\{s\ge t : X^{\varepsilon,t,x}_s\notin\Gamma\big\}, \]
and the corresponding exit probability
\[ v_\varepsilon(t,x) = P[\tau^\varepsilon_{t,x}\le T], \quad (t,x)\in[0,T]\times\mathbb{R}^d. \]
Here $X^{\varepsilon,t,x}$ denotes the solution to (2.17) starting from $x$ at time $t$. It is well-known that
the process $X^{\varepsilon,t,x}$ converges to $X^{0,t,x}$, the solution to the ordinary differential equation
\[ dX^0_s = b(s,X^0_s)\,ds, \quad X^0_t = x. \]
In order to ensure that $v_\varepsilon$ goes to zero, we assume that for all $t\in[0,T]$,
\[ \text{(H)} \qquad x\in\Gamma \implies X^{0,t,x}_s\in\Gamma, \quad \forall s\in[t,T]. \]
Indeed, under (H), the system (2.17) tends, when $\varepsilon$ is small, to stay inside $\Gamma$, so that the
event $\{\tau^\varepsilon_{t,x}\le T\}$ is rare. The study of the large deviations asymptotics of $v_\varepsilon(t,x)$, when $\varepsilon$ goes to zero,
was initiated by Varadhan and Freidlin-Wentzell by probabilistic arguments. An alternative
approach, introduced by Fleming, connects this theory with optimal control and the Bellman
equation, and is developed within the theory of viscosity solutions, see e.g. [3]. We sketch
here this approach. It is well-known that the function $v_\varepsilon$ satisfies the linear PDE
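\[ \frac{\partial v_\varepsilon}{\partial t} + b_\varepsilon(t,x).D_xv_\varepsilon + \frac{\varepsilon}{2}\mathrm{tr}\big(\sigma\sigma'(t,x)D_x^2v_\varepsilon\big) = 0, \quad (t,x)\in[0,T)\times\Gamma, \tag{2.18} \]
together with the boundary conditions
\begin{align*}
v_\varepsilon(t,x) &= 1, \quad (t,x)\in[0,T)\times\partial\Gamma, \tag{2.19} \\
v_\varepsilon(T,x) &= 0, \quad x\in\Gamma. \tag{2.20}
\end{align*}
Here $\partial\Gamma$ is the boundary of $\Gamma$. We now make the logarithm transformation
\[ V_\varepsilon = -\varepsilon\ln v_\varepsilon. \]
Then, after some straightforward derivation, (2.18) becomes the nonlinear PDE
\[ -\frac{\partial V_\varepsilon}{\partial t} - b_\varepsilon(t,x).D_xV_\varepsilon - \frac{\varepsilon}{2}\mathrm{tr}\big(\sigma\sigma'(t,x)D_x^2V_\varepsilon\big) + \frac{1}{2}(D_xV_\varepsilon)'\sigma\sigma'(t,x)D_xV_\varepsilon = 0, \quad (t,x)\in[0,T)\times\Gamma, \tag{2.21} \]
and the boundary data (2.19)-(2.20) become
\begin{align*}
V_\varepsilon(t,x) &= 0, \quad (t,x)\in[0,T)\times\partial\Gamma, \tag{2.22} \\
V_\varepsilon(T,x) &= \infty, \quad x\in\Gamma. \tag{2.23}
\end{align*}
At the limit $\varepsilon = 0$, the PDE (2.21) becomes a first-order PDE
\[ -\frac{\partial V_0}{\partial t} - b(t,x).D_xV_0 + \frac{1}{2}(D_xV_0)'\sigma\sigma'(t,x)D_xV_0 = 0, \quad (t,x)\in[0,T)\times\Gamma, \tag{2.24} \]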
with the boundary data (2.22)-(2.23). By PDE-viscosity solutions methods and comparison
results, we can prove (see e.g. [3] or [20]) that $V_\varepsilon$ converges uniformly on compact subsets of
$[0,T)\times\Gamma$, as $\varepsilon$ goes to zero, to $V_0$, the unique viscosity solution to (2.24) with the boundary
data (2.22)-(2.23). Moreover, $V_0$ has a representation in terms of a control problem. Consider
the Hamiltonian function
\[ H(t,x,p) = -b(t,x).p + \frac{1}{2}p'\sigma\sigma'(t,x)p, \quad (t,x,p)\in[0,T]\times\Gamma\times\mathbb{R}^d, \]
which is quadratic and in particular convex in $p$. Then, using the Legendre transform, we
may rewrite
\[ H(t,x,p) = \sup_{q\in\mathbb{R}^d}\big[-q.p - H^*(t,x,q)\big], \]
where
\begin{align*}
H^*(t,x,q) &= \sup_{p\in\mathbb{R}^d}\big[-p.q - H(t,x,p)\big] \\
&= \frac{1}{2}(q - b(t,x))'(\sigma\sigma'(t,x))^{-1}(q - b(t,x)), \quad (t,x,q)\in[0,T]\times\Gamma\times\mathbb{R}^d.
\end{align*}
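The duality between the quadratic Hamiltonian and its transform is easy to verify numerically at a frozen point $(t,x)$. The sketch below assumes the convention $H(p) = -b.p + \frac{1}{2}p'ap$ with $a = \sigma\sigma'$ invertible, so that $H^*(q) = \frac{1}{2}(q-b)'a^{-1}(q-b)$ and the supremum defining $H^*$ is attained at $p^* = a^{-1}(b-q)$. Function names and the numerical values are illustrative assumptions.

```python
import numpy as np

def hamiltonian(p, b, a):
    """H(p) = -b.p + 0.5 p' a p, with a = sigma sigma' evaluated at a frozen (t, x)."""
    return float(-b @ p + 0.5 * p @ a @ p)

def hamiltonian_dual(q, b, a):
    """H*(q) = 0.5 (q - b)' a^{-1} (q - b)."""
    r = q - b
    return float(0.5 * r @ np.linalg.solve(a, r))

def dual_maximiser(q, b, a):
    """p* = a^{-1} (b - q), the point where sup_p [-p.q - H(p)] is attained."""
    return np.linalg.solve(a, b - q)

# Illustrative coefficients (assumed for this example only).
sigma = np.array([[1.0, 0.2], [0.3, 0.8]])
a = sigma @ sigma.T
b = np.array([0.1, -0.4])
q = np.array([0.7, 0.2])
p_star = dual_maximiser(q, b, a)
```

At `p_star` the quantity `-p_star @ q - hamiltonian(p_star, b, a)` coincides with `hamiltonian_dual(q, b, a)`, and any other `p` gives a smaller value.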
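Hence, the PDE (2.24) is rewritten as
\[ \frac{\partial V_0}{\partial t} + \inf_{q\in\mathbb{R}^d}\big[q.D_xV_0 + H^*(t,x,q)\big] = 0, \quad (t,x)\in[0,T)\times\Gamma, \]
which, together with the boundary data (2.22)-(2.23), is associated to the value function
for the following calculus of variations problem:
\begin{align*}
V_0(t,x) &= \inf_{x(.)\in\mathcal{A}(t,x)}\int_t^TH^*(u,x(u),\dot x(u))\,du, \quad (t,x)\in[0,T)\times\Gamma, \\
&= \inf_{x(.)\in\mathcal{A}(t,x)}\int_t^T\frac{1}{2}\big(\dot x(u) - b(u,x(u))\big)'\big(\sigma\sigma'(u,x(u))\big)^{-1}\big(\dot x(u) - b(u,x(u))\big)\,du,
\end{align*}
where
\[ \mathcal{A}(t,x) = \big\{x(.)\in H([0,T]) : x(t) = x \text{ and } \tau(x)\le T\big\}. \]
Here $\tau(x)$ is the exit time of $x(.)$ from $\Gamma$. The large deviations result is then stated as
\[ \lim_{\varepsilon\to0}-\varepsilon\ln v_\varepsilon(t,x) = V_0(t,x), \tag{2.25} \]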
and the above limit holds uniformly on compact subsets of $[0,T)\times\Gamma$. Notice that for $t = 0$, and $T = 1$, the quantity $d(x,\partial\Gamma) = \sqrt{2V_0(0,x)}$ is the distance between $x$ and $\partial\Gamma$ in
the Riemannian metric defined by $(\sigma\sigma')^{-1}$. A more precise result may be obtained, which
allows to remove the above log estimate. This type of result is developed in [18], and is
called sharp large deviations estimate. It states an asymptotic expansion (in $\varepsilon$) of the exit
probability for points $(t,x)$ belonging to a set $N$ of $[0,T']\times\Gamma$ for some $T' < T$, open in the
relative topology, and s.t. $V_0\in C^\infty(N)$. Then, under the condition that
\[ b_\varepsilon = b + \varepsilon b_1 + O(\varepsilon^2), \]
one has
\[ v_\varepsilon(t,x) = \exp\Big(-\frac{V_0(t,x)}{\varepsilon} - w(t,x)\Big)(1 + O(\varepsilon)), \]
uniformly on compact sets of $N$, where $w$ is solution to the PDE problem
\begin{align*}
-\frac{\partial w}{\partial t} - (b - \sigma\sigma'D_xV_0).D_xw &= \frac{1}{2}\mathrm{tr}(\sigma\sigma'D_x^2V_0) + b_1.D_xV_0 \quad \text{in } N, \\
w(t,x) &= 0 \quad \text{on } \big([0,T)\times\partial\Gamma\big)\cap\bar N.
\end{align*}
The function $w$ may be represented as
\[ w(t,x) = \int_t^{\rho}\Big(\frac{1}{2}\mathrm{tr}(\sigma\sigma'D_x^2V_0) + b_1.D_xV_0\Big)(s,\xi(s))\,ds, \]
where $\xi$ is the solution to
\[ \dot\xi(s) = (b - \sigma\sigma'D_xV_0)(s,\xi(s)), \quad \xi(t) = x, \]
and $\rho$ is the exit time (after $t$) of $(s,\xi(s))$ from $N$.
3 Large deviations in option pricing

3.1 Optimal importance sampling via large deviations approximation
In this section, we show how to use large deviations approximation via importance sampling
for Monte-Carlo computation of expectations arising in option pricing. In the context of
continuous-time models, we are interested in the computation of
\[ I_g = E\big[g(S_t, 0\le t\le T)\big], \]
where $S$ is the underlying asset price, and $g$ is the payoff of the option, possibly path-dependent, i.e. depending on the path process $S_t$, $0\le t\le T$. The Monte-Carlo approximation technique consists in simulating $N$ independent sample paths $(S^i_t)_{0\le t\le T}$, $i = 1,\ldots,N$,
in the distribution of $(S_t)_{0\le t\le T}$, and approximating the required expectation by the sample
mean estimator:
\[ I^N_g = \frac{1}{N}\sum_{i=1}^Ng(S^i). \]
The consistency of this estimator is ensured by the law of large numbers, while the approximation
error is governed by the variance of this estimator through the central limit theorem:
the lower the variance of $g(S)$, the better the approximation for a given number $N$ of
simulations. As already mentioned in the introduction, the basic principle of importance
sampling is to reduce variance by changing the probability measure from which paths are generated. Here, the idea is to change the distribution of the price process to be simulated in
order to take into account the specificities of the payoff function $g$, and to drive the process
to the region of high contribution to the required expectation. We focus in this section on
the importance sampling technique within the context of diffusion models, and then show
how to obtain an optimal change of measure by a large deviation approximation of the
required expectation.
3.1.1 Importance sampling for diffusions via Girsanov's theorem
We briefly describe the importance sampling variance reduction technique for diffusions.
Let $X$ be a $d$-dimensional diffusion process governed by
\[ dX_s = b(X_s)\,ds + \sigma(X_s)\,dW_s, \tag{3.1} \]
where $(W_t)_{t\ge0}$ is an $n$-dimensional Brownian motion on a filtered probability space $(\Omega,\mathcal{F},\mathbb{F} = (\mathcal{F}_t)_{t\ge0},P)$, and the Borel functions $b$, $\sigma$ satisfy the usual Lipschitz condition ensuring the
existence of a strong solution to the s.d.e. (3.1). We denote by $X^{t,x}_s$ the solution to (3.1)
starting from $x$ at time $t$, and we define the function:
\[ v(t,x) = E\big[g(X^{t,x}_s, t\le s\le T)\big], \quad (t,x)\in[0,T]\times\mathbb{R}^d. \]
Let $\theta = (\theta_t)_{0\le t\le T}$ be an $\mathbb{R}^n$-valued adapted process such that the process
\[ M_t = \exp\Big(-\int_0^t\theta_u'\,dW_u - \frac{1}{2}\int_0^t|\theta_u|^2\,du\Big), \quad 0\le t\le T, \]
is a martingale, i.e. $E[M_T] = 1$. This is ensured for instance under the Novikov criterion:
$E\big[\exp\big(\frac{1}{2}\int_0^T|\theta_u|^2\,du\big)\big] < \infty$. In this case, one can define a probability measure $Q$ equivalent
to $P$ on $(\Omega,\mathcal{F}_T)$ by:
\[ \frac{dQ}{dP} = M_T. \]
Moreover, by Girsanov's theorem, the process $\hat W_t = W_t + \int_0^t\theta_u\,du$, $0\le t\le T$, is a Brownian
motion under $Q$, and the dynamics of $X$ under $Q$ is given by
\[ dX_s = \big(b(X_s) - \sigma(X_s)\theta_s\big)\,ds + \sigma(X_s)\,d\hat W_s. \tag{3.2} \]
From Bayes formula, the expectation of interest can be written as
\[ v(t,x) = E^Q\big[g(X^{t,x}_s, t\le s\le T)L_T\big], \tag{3.3} \]
where $L$ is the $Q$-martingale
\[ L_t = \frac{1}{M_t} = \exp\Big(\int_0^t\theta_u'\,d\hat W_u - \frac{1}{2}\int_0^t|\theta_u|^2\,du\Big), \quad 0\le t\le T. \tag{3.4} \]
The expression (3.3) suggests, for any choice of $\theta$, an alternative Monte-Carlo estimator for
$v(t,x)$ with
\[ I^N_{g,\theta}(t,x) = \frac{1}{N}\sum_{i=1}^Ng(X^{i,t,x})L^i_T, \]
by simulating $N$ independent sample paths $(X^{i,t,x})$ and $L^i_T$ of $(X^{t,x})$ and $L_T$ under $Q$ given
by (3.2)-(3.4). Hence, the change of probability measure through the choice of $\theta$ leads to a
modification of the drift process in the simulation of $X$. The variance reduction technique
consists in determining a process $\theta$ which induces a smaller variance for the corresponding
estimator $I_{g,\theta}$ than the initial one $I_{g,0}$. The next two paragraphs present two approaches
leading to the construction of such processes $\theta$. In the first approach, developed in [28], the
process $\theta$ is stochastic, and requires an approximation of the expectation of interest. In the
second approach, due to [30], the process $\theta$ is deterministic and derived through a simple
optimization problem. Both approaches rely on asymptotic results from the theory of large
deviations.
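To make the mechanics of the estimator concrete, the sketch below applies the change of measure with a constant $\theta$ to a one-dimensional Black-Scholes terminal payoff $(S_T - K)_+$, where only the Brownian motion at time $T$ needs to be simulated and the weight is $L_T = \exp(\theta\hat W_T - \frac{1}{2}\theta^2T)$. The choice of $\theta$ used here simply centres $\ln S_T$ at $\ln K$ under $Q$; it is an assumed heuristic for illustration, not one of the two optimal constructions discussed next, and all function names and parameter values are likewise assumptions.

```python
import math
import numpy as np

def bs_call_undiscounted(s0, K, r, sigma, T):
    """E[(S_T - K)_+] for S_T = s0 exp((r - sigma^2/2) T + sigma W_T), used as a reference value."""
    Phi = lambda d: 0.5 * math.erfc(-d / math.sqrt(2.0))
    d1 = (math.log(s0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return s0 * math.exp(r * T) * Phi(d1) - K * Phi(d2)

def centering_theta(s0, K, r, sigma, T):
    """Constant theta such that ln S_T has mean ln K under Q (heuristic choice)."""
    return -(math.log(K / s0) - (r - 0.5 * sigma**2) * T) / (sigma * T)

def is_estimate(theta, n_paths, s0, K, r, sigma, T, seed=0):
    """Sample mean and variance of g(S_T) L_T simulated under Q; theta = 0 is plain Monte Carlo."""
    rng = np.random.default_rng(seed)
    w_hat = math.sqrt(T) * rng.standard_normal(n_paths)   # hat W_T, Brownian under Q
    w = w_hat - theta * T                                  # W_T = hat W_T - theta T
    s_T = s0 * np.exp((r - 0.5 * sigma**2) * T + sigma * w)
    weight = np.exp(theta * w_hat - 0.5 * theta**2 * T)    # L_T as in (3.4)
    samples = np.maximum(s_T - K, 0.0) * weight
    return float(samples.mean()), float(samples.var(ddof=1))

params = dict(s0=100.0, K=145.0, r=0.03, sigma=0.3, T=1.0)
theta = centering_theta(**params)
plain_mean, plain_var = is_estimate(0.0, 200_000, **params)
is_mean, is_var = is_estimate(theta, 200_000, **params)
exact = bs_call_undiscounted(**params)
```

Both estimators target the same expectation, but the weighted one has a markedly smaller sample variance because most simulated paths now end in the exercise region.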
3.1.2 Option pricing approximation with a Freidlin-Wentzell large deviation principle
We are looking for a stochastic process $\theta$ which allows us to reduce (possibly to zero!) the
variance of the corresponding estimator. The heuristics for achieving this goal is based
on the following argument. Suppose for the moment that the payoff $g$ depends only on
the terminal value $X_T$. Then, by applying Itô's formula to the $Q$-martingale $v(s,X^{t,x}_s)L_s$
between $s = t$ and $s = T$, we obtain:
\[ g(X^{t,x}_T)L_T = v(t,x)L_t + \int_t^TL_s\Big(D_xv(s,X^{t,x}_s)'\sigma(X^{t,x}_s) + v(s,X^{t,x}_s)\theta_s'\Big)\,d\hat W_s. \]
Hence, the variance of $I^N_{g,\theta}(t,x)$ is given by
\[ \mathrm{Var}_Q\big(I^N_{g,\theta}(t,x)\big) = \frac{1}{N}E^Q\Big[\int_t^TL_s^2\Big|D_xv(s,X^{t,x}_s)'\sigma(X^{t,x}_s) + v(s,X^{t,x}_s)\theta_s'\Big|^2\,ds\Big]. \]
The choice of the process $\theta$ is motivated by the following remark. If the function $v$ were
known, then one could make the variance vanish by choosing
\[ \theta_s = \theta^*_s = -\frac{1}{v(s,X^{t,x}_s)}\sigma'(X^{t,x}_s)D_xv(s,X^{t,x}_s), \quad t\le s\le T. \tag{3.5} \]
Of course, the function $v$ is unknown (this is precisely what we want to compute), but this
suggests to use a process $\theta$ from the above formula with an approximation of the function
$v$. We may then reasonably hope to reduce the variance, and also to use such a method
for more general payoff functions, possibly path-dependent. We shall use a large deviations
approximation for the function $v$.
The basic idea for the use of large deviations approximation to the expectation function
$v$ is the following. Suppose the option of interest, characterized by its payoff function $g$,
has a low probability of exercise, e.g. it is deeply out of the money. Then, a large proportion
of simulated paths end up out of the exercise domain, giving no contribution to the Monte-Carlo estimator but increasing the variance. In order to reduce the variance, it is interesting
to change the drift in the simulation of the price process to make the exercise domain more
likely. This is achieved with a large deviations approximation of the process of interest in
the asymptotics of small diffusion term: such a result is known in the literature as the Freidlin-Wentzell sample path large deviations principle. Equivalently, by time-scaling, this amounts
to a large deviation approximation of the process in small time, studied by Varadhan.
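To illustrate our purpose, let us consider the case of an up-in bond, i.e. an option that
pays one unit of numéraire iff the underlying asset reaches a given up-barrier $K$. Within a
stochastic volatility model $X = (S,Y)$ as in (3.1) and given by:
\begin{align*}
dS_t &= \sigma(Y_t)S_t\,dW^1_t \tag{3.6} \\
dY_t &= \eta(Y_t)\,dt + \gamma(Y_t)\,dW^2_t, \quad d\langle W^1,W^2\rangle_t = \rho\,dt, \tag{3.7}
\end{align*}
its price is then given by
\[ v(t,x) = E\big[1_{\max_{t\le u\le T}S^{t,x}_u\ge K}\big] = P[\tau_{t,x}\le T], \quad t\in[0,T],\ x = (s,y)\in(0,\infty)\times\mathbb{R}, \]
where
\[ \tau_{t,x} = \inf\big\{u\ge t : X^{t,x}_u\notin\Gamma\big\}, \quad \Gamma = (0,K)\times\mathbb{R}. \]
The event $\{\max_{t\le u\le T}S^{t,x}_u\ge K\} = \{\tau_{t,x}\le T\}$ is rare when $x = (s,y)\in\Gamma$, i.e. $s < K$
(out of the money option) and the time to maturity $T - t$ is small. The large deviations
asymptotics for the exit probability $v(t,x)$ in small time to maturity $T - t$ is provided by
the Freidlin-Wentzell and Varadhan theories. Indeed, we see from the time-homogeneity of
the coefficients of the diffusion and by time-scaling that we may write $v(t,x) = w_{T-t}(0,x)$,
where for $\varepsilon > 0$, $w_\varepsilon$ is the function defined on $[0,1]\times(0,\infty)\times\mathbb{R}$ by
\[ w_\varepsilon(t,x) = P[\tau^\varepsilon_{t,x}\le1], \]
and $X^{\varepsilon,t,x}$ is the solution to
\[ dX^\varepsilon_s = \varepsilon b(X^\varepsilon_s)\,ds + \sqrt{\varepsilon}\,\Sigma(X^\varepsilon_s)\,dW_s, \quad X^\varepsilon_t = x, \]
with $b$ and $\Sigma$ the drift and diffusion coefficients of $X = (S,Y)$, and $\tau^\varepsilon_{t,x} = \inf\{s\ge t : X^{\varepsilon,t,x}_s\notin\Gamma\}$. From the large deviations result (2.25) stated in
Section 2.6, we have:
\[ \lim_{t\nearrow T}-(T-t)\ln v(t,x) = V_0(0,x), \]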
where
\[ V_0(t,x) = \inf_{x(.)\in\mathcal{A}(t,x)}\int_t^1\frac{1}{2}\dot x(u)'M(x(u))\dot x(u)\,du, \quad (t,x)\in[0,1)\times\Gamma, \]
$\Sigma(x)$ is the diffusion matrix of $X = (S,Y)$, $M(x) = (\Sigma\Sigma'(x))^{-1}$, and
\[ \mathcal{A}(t,x) = \big\{x(.)\in H([0,1]) : x(t) = x \text{ and } \tau(x)\le1\big\}. \]
We also have another interpretation of the positive function $V_0$ in terms of Riemannian distance on $\mathbb{R}^d$ associated to the metric $M(x) = (\Sigma\Sigma'(x))^{-1}$. By denoting $L_0(x) = \sqrt{2V_0(0,x)}$, one can prove (see [42]) that $L_0$ is the unique viscosity solution to the eikonal
equation
\begin{align*}
(D_xL_0)'\Sigma\Sigma'(x)D_xL_0 &= 1, \quad x\in\Gamma, \\
L_0(x) &= 0, \quad x\in\partial\Gamma,
\end{align*}
and that it may be represented as
\[ L_0(x) = \inf_{z\in\partial\Gamma}L_0(x,z), \quad x\in\Gamma, \tag{3.8} \]
where
\[ L_0(x,z) = \inf_{x(.)\in\mathcal{A}(x,z)}\int_0^1\sqrt{\dot x(u)'M(x(u))\dot x(u)}\,du, \]
and $\mathcal{A}(x,z) = \{x(.)\in H([0,1]) : x(0) = x \text{ and } x(1) = z\}$. Hence, the function $L_0$ can
be computed either by the numerical resolution of the eikonal equation or by using the
representation (3.8). $L_0(x)$ is interpreted as the minimal length (according to the metric
$M$) of the path $x(.)$ allowing to reach the boundary $\partial\Gamma$ from $x$. From the above large
deviations result, which is written as
\[ \ln v(t,x) \simeq -\frac{L_0^2(x)}{2(T-t)}, \quad \text{as } T - t\to0, \]
and the expression (3.5) for the optimal theoretical $\theta^*$, we use a change of probability
measure with
\[ \theta(t,x) = \frac{L_0(x)}{T-t}\Sigma'(x)D_xL_0(x). \]
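In the simplest setting these objects are explicit. For a driftless Black-Scholes price with constant volatility $\sigma$, so that the diffusion coefficient is $\sigma s$, the distance to the barrier is $L_0(s) = \frac{1}{\sigma}\ln(K/s)$ and the drift formula above gives $\theta(t,s) = \frac{1}{\sigma(T-t)}\ln(s/K)$ for $s < K$. The sketch below, with assumed function names and parameter values, checks the eikonal equation by finite differences and compares the small-time asymptotics with the classical closed-form law of the running maximum of a drifted Brownian motion.

```python
import math

def bs_distance(s, K, sigma):
    """L_0(s) = ln(K/s) / sigma: length from s to the barrier K in the metric (sigma s)^{-2}."""
    return math.log(K / s) / sigma

def bs_theta(t, s, K, sigma, T):
    """theta(t, s) = L_0(s)/(T - t) * (sigma s) * dL_0/ds, valid for s < K and t < T."""
    dL0 = -1.0 / (sigma * s)
    return bs_distance(s, K, sigma) / (T - t) * (sigma * s) * dL0

def up_in_bond_price(tau, s, K, sigma):
    """P[max_{u <= tau} S_u >= K] for dS = sigma S dW started at s < K (zero rates)."""
    k = math.log(K / s)
    sd = sigma * math.sqrt(tau)
    Phi = lambda d: 0.5 * math.erfc(-d / math.sqrt(2.0))
    # ln S is a Brownian motion with drift -sigma^2/2; standard formula for its maximum.
    return Phi((-k - 0.5 * sd**2) / sd) + math.exp(-k) * Phi((-k + 0.5 * sd**2) / sd)

s, K, sigma, T = 100.0, 130.0, 0.3, 1.0
h = 1e-4
dL0_fd = (bs_distance(s + h, K, sigma) - bs_distance(s - h, K, sigma)) / (2.0 * h)
eikonal_lhs = (sigma * s * dL0_fd) ** 2                      # should be close to 1
tau = 1e-3
scaled_log_price = tau * math.log(up_in_bond_price(tau, s, K, sigma))
limit_value = -0.5 * bs_distance(s, K, sigma) ** 2           # -L_0^2 / 2
```

With these values `scaled_log_price` is already close to `limit_value`, and `bs_theta` is negative for $s < K$, which under (3.2) pushes the simulated price upwards towards the barrier.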
Such a process $\theta$ may also be of interest in a more general framework than the
up-in bond: one can use it for computing any option whose exercise domain looks similar
to that of the up-in bond. We also expect the variance reduction to be more significant when
the exercise probability is low, i.e. for deep out-of-the-money options. In the particular case
of the Black-Scholes model, i.e. $\Sigma(x) = \sigma s$, we have
\[ L_0(x) = \frac{1}{\sigma}\ln\Big(\frac{K}{s}\Big), \]
and so
\[ \theta(t,x) = \frac{1}{\sigma(T-t)}\ln\Big(\frac{s}{K}\Big), \quad s < K. \]

3.1.3 Change of drift via Varadhan-Laplace principle
We describe here a method due to [30], which, in contrast with the above approach, does
not require the knowledge of the option price, and restricts to deterministic changes of drift.
The original approach of [30] was developed in a discrete-time setting, and extended to a
continuous-time context by [33]. In these lectures, we follow the continuous-time diffusion
setting of paragraph 3.1.1. It is convenient to identify the option payoff with a nonnegative
functional $G(W)$ of the Brownian motion $W = (W_t)_{0\le t\le T}$ on the set $C([0,T])$ of continuous
functions on $[0,T]$, and to define $F = \ln G$ valued in $\mathbb{R}\cup\{-\infty\}$. For example, in the case
of the Black-Scholes model for the stock price $S$, with interest rate $r$ and volatility $\sigma$,
the payoff of a geometric average Asian option is $\big(e^{\frac{1}{T}\int_0^T\ln S_t\,dt} - K\big)_+$, corresponding to a
functional
\[ G(w) = \Big(S_0e^{(r-\frac{\sigma^2}{2})\frac{T}{2}}e^{\frac{\sigma}{T}\int_0^Tw_t\,dt} - K\Big)_+, \tag{3.9} \]
while the arithmetic Asian option is $\big(\frac{1}{T}\int_0^TS_t\,dt - K\big)_+$, corresponding to a functional:
\[ G(w) = \Big(\frac{S_0}{T}\int_0^T\exp\big(\sigma w_t + (r - \sigma^2/2)t\big)\,dt - K\Big)_+. \]
We shall restrict to deterministic changes of drifts in Girsanov's theorem. We then consider
the Cameron-Martin space $H_0([0,T])$ of absolutely continuous functions, vanishing in zero,
with square integrable derivative. Any $h\in H_0([0,T])$ induces an equivalent probability
measure $Q^h$ via its Radon-Nikodym density:
\[ \frac{dQ^h}{dP} = \exp\Big(\int_0^T\dot h(t)\,dW_t - \frac{1}{2}\int_0^T|\dot h(t)|^2\,dt\Big), \]
and $\hat W = W - h$ is a Brownian motion under $Q^h$. Moreover, from Bayes formula, we
have an unbiased estimator of the option price $E[G(W)]$ by simulating under $Q^h$ the payoff
$G(W)\frac{dP}{dQ^h}$. An optimal choice of $h$ should minimize the variance under $Q^h$ of this payoff,
or equivalently its second moment given by:
\begin{align*}
M^2(h) = E^{Q^h}\Big[\Big(G(W)\frac{dP}{dQ^h}\Big)^2\Big] &= E\Big[G(W)^2\frac{dP}{dQ^h}\Big] \\
&= E\Big[\exp\Big(2F(W) - \int_0^T\dot h(t)\,dW_t + \frac{1}{2}\int_0^T|\dot h(t)|^2\,dt\Big)\Big].
\end{align*}
The above quantity is in general intractable, and we present here an approximation by
means of small-noise asymptotics:
\[ M^2_\varepsilon(h) = E\Big[\exp\Big\{\frac{1}{\varepsilon}\Big(2F(\sqrt{\varepsilon}W) - \int_0^T\sqrt{\varepsilon}\,\dot h(t)\,dW_t + \frac{1}{2}\int_0^T|\dot h(t)|^2\,dt\Big)\Big\}\Big]. \]
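Now, from Schilder's theorem, $(Z^\varepsilon = \sqrt{\varepsilon}W)_\varepsilon$ satisfies a LDP on $C([0,T])$ with rate function $I(z) = \frac{1}{2}\int_0^T|\dot z(t)|^2\,dt$ for $z\in H_0([0,T])$, and $\infty$ otherwise. Hence, under subquadratic
growth conditions on the log payoff of the option, one can apply Varadhan's integral principle (see Theorem 2.3) to the function $z\mapsto2F(z) - \int_0^T\dot h(t).\dot z(t)\,dt + \frac{1}{2}\int_0^T|\dot h(t)|^2\,dt$, and
get
\[ \lim_{\varepsilon\to0}\varepsilon\ln M^2_\varepsilon(h) = \sup_{z\in H_0([0,T])}\Big[2F(z) + \frac{1}{2}\int_0^T|\dot z(t) - \dot h(t)|^2\,dt - \int_0^T|\dot z(t)|^2\,dt\Big]. \tag{3.10} \]
We then say that $\hat h\in H_0([0,T])$ is an asymptotic optimal drift if it is solution to the
problem:
\[ \inf_{h\in H_0([0,T])}\ \sup_{z\in H_0([0,T])}\Big[2F(z) + \frac{1}{2}\int_0^T|\dot z(t) - \dot h(t)|^2\,dt - \int_0^T|\dot z(t)|^2\,dt\Big]. \tag{3.11} \]
Swapping the order of optimization, and noting that the infimum over $h$ of the middle term is zero, attained at $h = z$, this min-max problem is reduced to:
\[ \sup_{h\in H_0([0,T])}\Big[2F(h) - \int_0^T|\dot h(t)|^2\,dt\Big]. \tag{3.12} \]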
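Problem (3.12) is a standard problem of calculus of variations, which may be reduced to
the resolution of the associated Euler-Lagrange differential equation.
Let us illustrate the computation in the case of the geometric average Asian option (3.9)
(for which there is an explicit solution, see [40]). Problem (3.12) is written as
\[ \sup_{h\in H_0([0,T])}\Big[2\ln\Big(e^{a\int_0^Th(t)\,dt} - c\Big) - \int_0^T|\dot h(t)|^2\,dt\Big], \tag{3.13} \]
where $a = \sigma/T$, $c = \frac{K}{S_0}\exp\big(-(r - \frac{\sigma^2}{2})\frac{T}{2}\big)$. The corresponding Euler-Lagrange equation is
\[ \ddot h = -\lambda, \quad \text{with } \lambda = a\,\frac{\exp\big(a\int_0^Th(t)\,dt\big)}{\exp\big(a\int_0^Th(t)\,dt\big) - c}, \tag{3.14} \]
hence with solutions in the form
\[ h(t) = -\frac{\lambda}{2}t^2 + \beta t. \tag{3.15} \]
The parameter $\beta$ is found by substituting (3.15) into $\lambda$ in (3.14), which yields
\[ \beta(\lambda) = \frac{a\lambda T^3 + 6\ln\big(\frac{\lambda c}{\lambda - a}\big)}{3aT^2}. \]
Then, for this value of $\beta = \beta(\lambda)$, the problem (3.13) is solved by maximizing over $\lambda > a$. The optimal $\hat\lambda$ is unique by strict concavity, and found implicitly via the first-order
equation
\[ a\hat\lambda T^3 + 3\ln\Big(\frac{\hat\lambda - a}{c\hat\lambda}\Big) = 0. \]
This $\hat\lambda$ satisfies $\beta(\hat\lambda) = \hat\lambda T$, and thus the optimal drift is
\[ \hat h(t) = -\frac{\hat\lambda}{2}t^2 + \hat\lambda Tt. \]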
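The first-order equation for the optimal multiplier has no closed-form solution, but its left-hand side $a\lambda T^3 + 3\ln\big(\frac{\lambda - a}{c\lambda}\big)$ is increasing in $\lambda$ on $(a,\infty)$, so a bisection is enough. The sketch below is one possible way to compute $\hat\lambda$ and the drift $\hat h(t) = -\frac{\hat\lambda}{2}t^2 + \hat\lambda Tt$; the function names, the bracketing strategy and the tolerance are assumptions made for illustration.

```python
import math

def asian_foc(lam, a, c, T):
    """Left-hand side of the first-order equation: a lam T^3 + 3 ln((lam - a) / (c lam))."""
    return a * lam * T**3 + 3.0 * math.log((lam - a) / (c * lam))

def optimal_lambda(a, c, T, tol=1e-12):
    """Root of asian_foc on lam > a by bisection (assumes the value just above a is negative)."""
    lo = a * (1.0 + 1e-12)
    hi = 2.0 * a + 1.0
    while asian_foc(hi, a, c, T) < 0.0:   # enlarge the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if asian_foc(mid, a, c, T) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def optimal_drift(t, lam, T):
    """hat h(t) = -lam t^2 / 2 + lam T t."""
    return -0.5 * lam * t**2 + lam * T * t

# Parameter values quoted in the text for the geometric Asian example.
T, r, sigma, S0, K = 1.0, 0.03, 0.30, 100.0, 145.0
a = sigma / T
c = (K / S0) * math.exp(-(r - 0.5 * sigma**2) * T / 2.0)
lam_hat = optimal_lambda(a, c, T)
```

The resulting drift starts at zero, is increasing on $[0,T]$, and has zero slope at $T$, in line with the condition $\beta(\hat\lambda) = \hat\lambda T$.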
The following table compares the results obtained from Monte-Carlo simulations without applying variance reduction, by applying antithetic control variates, and the above
importance sampling method. Parameter values are $T = 1$, $r = 3\%$, $\sigma = 30\%$, $S_0 = 100$,
$K = 145$.

Number of simulations    Statistic                  without    anti-control    IS
20000                    Standard deviation/mean    13.9905    13.48           1.07
10000                    Standard deviation/mean    13.7428    13.58           1.065

We also report from [33] the following table, which compares the performance, in terms
of variance ratios between the risk-neutral sample and the sample with the optimal drift
for an Asian option in a Black-Scholes model. Parameter values are $T = 1$, $r = 5\%$, $\sigma = 20\%$, $S_0 = 50$, and strikes are varying. Simulations are performed with $10^6$ paths.

Strike    Price    Variance ratio
50        304.0    7.59
60        28.00    26.5
70        1.063    310
The performance gap increases with the strike. This is justified by the fact that a
larger strike causes the option to become more out-of-the-money, and then the role of the drift
in reshaping the payoff distribution in the region of interest becomes more crucial. An
extension of this method of importance sampling by using sample path large deviations
results was recently considered in [49] for stochastic volatility models.

3.2 Asymptotics in stochastic volatility models
In recent years, there has been an increasing interest in asymptotic and expansion methods in option pricing and implied volatility for stochastic volatility models. There is a
considerable literature dealing with various asymptotics (small-time, large-time, fast mean-reverting, extreme strike) for stochastic volatility models, see [2], [4], [44], [36], [6], [46],
[16], [43], [24], [17], [39], [53], [26], [29]. In particular, large deviations provide a powerful
tool for describing the limiting behavior of implied volatilities. We recall that an implied
volatility is the volatility parameter needed in the Black-Scholes formula in order to match
a call option price, and it is common practice to quote prices in volatility through this
transformation. In this section, we shall focus on small time asymptotics near maturity of
options.
3.2.1 Large deviations in stochastic volatility models for small time to maturity
Let us consider a general stochastic volatility model for the log-stock price $X_t = \ln S_t$ given
by
\begin{align*}
dX_t &= -\frac{1}{2}\sigma^2(Y_t)\,dt + \sigma(Y_t)\big(\sqrt{1-\rho^2}\,dW^1_t + \rho\,dW^2_t\big), \tag{3.16} \\
dY_t &= \eta(Y_t)\,dt + \gamma(Y_t)\,dW^2_t, \tag{3.17}
\end{align*}
with $X_0 = x_0$, $Y_0 = y_0$, where $W^1$ and $W^2$ are two independent Brownian motions, $\rho\in(-1,1)$, and
$\eta$, $\sigma > 0$, and $\gamma > 0$ are bounded and Lipschitz continuous functions on $\mathbb{R}$. The aim is to
derive an approximation of call option price and implied volatility when time to maturity
is small.
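Large deviations for the log-stock price. A first step is to obtain a large deviations
principle for the log stock price in the form:
\[ \lim_{t\to0}t\ln P[X_t - x_0\ge k] = -I(k), \quad k\ge0, \tag{3.18} \]
for some continuous rate function $I$ on $\mathbb{R}$. For general SV models (3.16)-(3.17), such a large
deviations result is derived from Freidlin-Wentzell theory. Indeed, by time scaling, we see
that for any $\varepsilon > 0$, the process $(X_{\varepsilon t} - x_0, Y_{\varepsilon t})$ has the same distribution as $(X^\varepsilon - x_0, Y^\varepsilon)$
defined by
\begin{align*}
dX^\varepsilon_t &= -\frac{\varepsilon}{2}\sigma(Y^\varepsilon_t)^2\,dt + \sqrt{\varepsilon}\,\sigma(Y^\varepsilon_t)\big(\sqrt{1-\rho^2}\,dW^1_t + \rho\,dW^2_t\big), \\
dY^\varepsilon_t &= \varepsilon\,\eta(Y^\varepsilon_t)\,dt + \sqrt{\varepsilon}\,\gamma(Y^\varepsilon_t)\,dW^2_t.
\end{align*}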
Now, from the Freidlin-Wentzell or Varadhan sample path large deviations result recalled
in Section 2.6, we know that $(X^\varepsilon_t - x_0, Y^\varepsilon_t)_{0\le t\le1}$ satisfies a LDP in $C([0,1])$ as $\varepsilon$ goes to
zero, with rate function
\[ I(x(.),y(.)) = \frac{1}{2(1-\rho^2)}\int_0^1\Big[\frac{\dot x(t)^2}{\sigma(y(t))^2} - \frac{2\rho\,\dot x(t)\dot y(t)}{\sigma(y(t))\gamma(y(t))} + \frac{\dot y(t)^2}{\gamma(y(t))^2}\Big]\,dt, \]
for all $(x(.),y(.))\in H([0,1])$ s.t. $(x(0),y(0)) = (0,y_0)$. Then by applying the contraction
principle (see Theorem 2.2), we deduce that $X^\varepsilon_1 - x_0$, as $\varepsilon$ goes to zero, and so $X_t - x_0$, as
$t$ goes to zero, satisfies a LDP in $\mathbb{R}$ with the rate function $I : \mathbb{R}\to[0,\infty]$ given by
\[ I(k) = \inf_{(x,y)\in H([0,1]),\ (x,y)(0) = (0,y_0),\ x(1) = k}I(x(.),y(.)). \]
The quantity $d(k) = \sqrt{2I(k)}$ is actually the distance from $(0,y_0)$ to the line $\{x = k\}$ on
the plane $\mathbb{R}^2$ for the Riemannian metric defined by the inverse of the diffusion coefficient
of $(X,Y)$. Hence, the LDP for $X_t - x_0$ means that:
\[ \lim_{t\to0}t\ln P[X_t - x_0\ge k] = -\frac{1}{2}d(k)^2, \quad k\ge0. \]
The calculation of $d(k)$, and so the determination of the distance-minimizing geodesic
$(x^*,y^*)$ from $(0,y_0)$ to the line $\{x = k\}$, is a differential geometry problem associated
to a calculus of variations problem, which in general does not have explicit solutions
(see [25] for some details). The solution to this problem can also be characterized by PDE
methods through a nonlinear eikonal equation, see [4]. For the Heston model and more
generally for affine SV models, the LDP (3.18) can be derived directly from explicit computation of the moment generating function and the Gärtner-Ellis theorem. We postpone the
details to the next paragraph.
Pricing. We show in general how the LDP for the log-stock price provides an approximation for pricing out-of-the-money call options of small maturity. We have the following
estimate:
\[ \lim_{t\to0}t\ln E[(S_t - K)_+] = \lim_{t\to0}t\ln P[S_t\ge K] = -I(x), \tag{3.19} \]
where $x = \ln(K/S_0) > 0$ is the log-moneyness. A similar result holds for out-of-the-money
put options. Let us first show the lower bound. For any $\delta > 0$, we have
\[ E[(S_t - K)_+]\ge E[(S_t - K)_+1_{S_t\ge K+\delta}]\ge\delta P[S_t\ge K+\delta]. \tag{3.20} \]
By using the LDP (3.18), we then get
\begin{align*}
\liminf_{t\to0}t\ln E[(S_t - K)_+] &\ge \liminf_{t\to0}t\ln P[S_t\ge K+\delta] = \liminf_{t\to0}t\ln P\Big[X_t - x_0\ge\ln\Big(\frac{K+\delta}{S_0}\Big)\Big] \\
&\ge -I\Big(\ln\Big(\frac{K+\delta}{S_0}\Big)\Big).
\end{align*}
By sending $\delta$ to zero, and from the continuity of $I$, we obtain the desired lower bound. To
show the upper bound, we apply Hölder's inequality for any $p, q > 1$, $1/p + 1/q = 1$, to get
\begin{align*}
E[(S_t - K)_+] = E[(S_t - K)_+1_{S_t\ge K}] &\le \big(E[(S_t - K)_+^p]\big)^{\frac{1}{p}}\big(E[1_{S_t\ge K}]\big)^{\frac{1}{q}} \\
&\le \big(E[S_t^p]\big)^{\frac{1}{p}}\big(P[S_t\ge K]\big)^{\frac{1}{q}}.
\end{align*}
Taking $\ln$ and multiplying by $t$, this implies
\[ t\ln E[(S_t - K)_+]\le\frac{t}{p}\ln E[S_t^p] + \Big(1 - \frac{1}{p}\Big)t\ln P[S_t\ge K]. \]
Now, for fixed $p$, $t\ln E[S_t^p]\to0$ as $t$ goes to zero. It follows from the LDP (3.18) that
\[ \limsup_{t\to0}t\ln E[(S_t - K)_+]\le-\Big(1 - \frac{1}{p}\Big)I(x). \]
By sending $p$ to infinity, we obtain the required upper bound and so finally the rare event
estimate in (3.19).
Implied volatility. As a consequence of the rare event estimate for call option pricing,
we can determine the asymptotic behaviour of the implied volatility. Recall that the
implied volatility $\sigma_t = \sigma_t(x)$ of a call option on $S_t$ with strike $K = S_0e^x$, and time to
maturity $t$ is determined from the relation:
\[ E[(S_t - K)_+] = C^{BS}(t,S_0,x,\sigma_t) := S_0\Phi(d_1(t,x,\sigma_t)) - S_0e^x\Phi(d_2(t,x,\sigma_t)), \tag{3.21} \]
where
\[ d_1(t,x,\sigma) = \frac{-x + \frac{1}{2}\sigma^2t}{\sigma\sqrt{t}}, \quad d_2(t,x,\sigma) = d_1(t,x,\sigma) - \sigma\sqrt{t}, \]
and $\Phi(d) = \int_{-\infty}^d\varphi(x)\,dx$ is the cdf of the normal law $N(0,1)$. As a consequence of the large
deviation pricing (3.19), we compute the asymptotic implied volatility for out-of-the-money
call options of small maturity:
\[ \lim_{t\to0}\sigma_t(x) = \frac{|x|}{\sqrt{2I(x)}}, \quad x\ne0. \tag{3.22} \]
The derivation relies on the standard estimate on $\Phi$ (see section 14.8 in [54]):
\[ \Big(d + \frac{1}{d}\Big)^{-1}\varphi(d)\le1 - \Phi(d)\le\frac{1}{d}\varphi(d), \quad d > 0, \tag{3.23} \]
which implies that $1 - \Phi(d)\sim\varphi(d)/d$ as $d$ goes to infinity. We show the result for the out-of-the-money call option price, i.e. $x > 0$. The same argument is valid for $x < 0$. Since the
out-of-the-money call option price $E[(S_t - K)_+]$ goes to zero as $t\to0$, we see from the
relation (3.21) defining the implied volatility that $\sigma_t\sqrt{t}\to0$, and so $d_1 = d_1(t,x,\sigma_t)\to-\infty$. Then, from (3.19) and (3.21), for any $\delta > 0$, we have for $t$ small enough
\[ \exp\Big(-\frac{I(x)+\delta}{t}\Big)\le E[(S_t - K)_+]\le S_0\Phi(d_1) = S_0\big(1 - \Phi(-d_1)\big)\le\frac{S_0}{|d_1|}\varphi(d_1). \]
Taking $t\ln$ in the above inequality, and sending $t$ to zero, we deduce that
\[ -(I(x)+\delta)\le-\frac{x^2}{2\liminf_{t\to0}\sigma_t^2}, \]
which proves the lower bound in (3.22) by sending $\delta$ to zero. For the upper bound, fix $t$
the maturity of the option, and denote by $S^{\sigma_t}$ the Black-Scholes price with the constant
implied volatility $\sigma_t$. Then, from (3.19) and as in (3.20), for all $\delta > 0$, we have for $t$ small
enough,
\begin{align*}
\exp\Big(-\frac{I(x)-\delta}{t}\Big)\ge E[(S_t - K)_+] &= E[(S^{\sigma_t}_t - K)_+] \\
&\ge \delta P[S^{\sigma_t}_t\ge K+\delta] = \delta\Phi(d_{2,\delta}) = \delta\big(1 - \Phi(-d_{2,\delta})\big) \\
&\ge \delta\Big(|d_{2,\delta}| + \frac{1}{|d_{2,\delta}|}\Big)^{-1}\varphi(d_{2,\delta}),
\end{align*}
where
\[ d_{2,\delta} = -\frac{\ln\big(\frac{K+\delta}{S_0}\big) + \frac{1}{2}\sigma_t^2t}{\sigma_t\sqrt{t}}\to-\infty, \]
as $t$ goes to zero. Taking $t\ln$ in the above inequality, and sending $t$ to zero, we deduce that
\[ -(I(x)-\delta)\ge-\frac{\big(\ln\frac{K+\delta}{S_0}\big)^2}{2\limsup_{t\to0}\sigma_t^2}, \]
which proves the upper bound in (3.22) by sending $\delta$ to zero, and finally the desired result.
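A consistency check of (3.19) and (3.22) is available in the Black-Scholes model itself, where the log-price is a Brownian motion with constant volatility $\sigma_0$: the rate function is then $I(x) = x^2/(2\sigma_0^2)$, and (3.22) must return $\sigma_0$. The sketch below evaluates $t\ln C^{BS}$ for small $t$ with zero rates; the function names and numerical values are assumptions made for illustration, and much smaller maturities would underflow in double precision.

```python
import math

def bs_call(t, s0, x, sigma):
    """C^BS(t, s0, x, sigma) as in (3.21), with strike K = s0 e^x and zero rates."""
    Phi = lambda d: 0.5 * math.erfc(-d / math.sqrt(2.0))
    sd = sigma * math.sqrt(t)
    d1 = (-x + 0.5 * sd**2) / sd
    d2 = d1 - sd
    return s0 * Phi(d1) - s0 * math.exp(x) * Phi(d2)

def limiting_implied_vol(x, rate):
    """Right-hand side of (3.22): |x| / sqrt(2 I(x))."""
    return abs(x) / math.sqrt(2.0 * rate)

sigma0, x, s0 = 0.2, 0.1, 1.0
rate = x**2 / (2.0 * sigma0**2)   # I(x) in the Black-Scholes model
scaled_logs = [t * math.log(bs_call(t, s0, x, sigma0)) for t in (1e-2, 1e-3)]
```

The entries of `scaled_logs` approach `-rate` as the maturity shrinks, and `limiting_implied_vol(x, rate)` gives back `sigma0`. The convergence is slow because of the polynomial prefactor in the call price, which the logarithmic scaling only removes in the limit.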
3.2.2 The case of Heston model
In this paragraph, we consider the popular Heston stochastic volatility model for the log
stock price $X_t = \ln S_t$ (interest rates and dividends are assumed to be null):
\begin{align*}
dX_t &= -\frac{1}{2}Y_t\,dt + \sqrt{Y_t}\big(\sqrt{1-\rho^2}\,dW^1_t + \rho\,dW^2_t\big) \tag{3.24} \\
dY_t &= \kappa(\theta - Y_t)\,dt + \sigma\sqrt{Y_t}\,dW^2_t, \tag{3.25}
\end{align*}
with $X_0 = x_0\in\mathbb{R}$, $Y_0 = y_0 > 0$, $\rho\in(-1,1)$, $\kappa, \theta, \sigma > 0$ with $2\kappa\theta > \sigma^2$. This last
condition ensures that the Cox-Ingersoll-Ross SDE for $Y$ admits a unique strong solution,
which remains strictly positive. In this case, one can derive explicitly the rate function
of the LDP for the log-stock price by an explicit computation of the moment generating
function of $X$, and of its limit.
Let us then define the quantity
\[ \Gamma_t(p) = \ln E\big[\exp\big(p(X_t - x_0)\big)\big], \quad p\in\mathbb{R}. \tag{3.26} \]
By definition of $X$ in (3.24), we have for all $p\in\mathbb{R}$:
\begin{align*}
\Gamma_t(p) &= \ln E\Big[\exp\Big(-\frac{p}{2}\int_0^tY_s\,ds + p\rho\int_0^t\sqrt{Y_s}\,dW^2_s + p\sqrt{1-\rho^2}\int_0^t\sqrt{Y_s}\,dW^1_s\Big)\Big] \\
&= \ln E\Big\{\exp\Big(-\frac{p}{2}\int_0^tY_s\,ds + p\rho\int_0^t\sqrt{Y_s}\,dW^2_s\Big)E\Big[\exp\Big(p\sqrt{1-\rho^2}\int_0^t\sqrt{Y_s}\,dW^1_s\Big)\Big|(W^2_s)_{s\le t}\Big]\Big\} \\
&= \ln E\Big[\exp\Big(-\frac{p}{2}\int_0^tY_s\,ds + p\rho\int_0^t\sqrt{Y_s}\,dW^2_s + \frac{p^2(1-\rho^2)}{2}\int_0^tY_s\,ds\Big)\Big] \\
&= \ln E\Big[\exp\Big(p\rho\int_0^t\sqrt{Y_s}\,dW^2_s - \frac{p^2\rho^2}{2}\int_0^tY_s\,ds\Big)\exp\Big(\frac{p(p-1)}{2}\int_0^tY_s\,ds\Big)\Big],
\end{align*}
where we used the law of iterated conditional expectations in the second equality, and the
fact that $Y_t$ is measurable with respect to $W^2$. By Girsanov's theorem, we then get
\[ \Gamma_t(p) = \ln E^Q\Big[\exp\Big(\frac{p(p-1)}{2}\int_0^tY_s\,ds\Big)\Big], \]
where under $Q$, the process $Y$ satisfies the sde
\[ dY_t = \big(\kappa\theta - (\kappa - \rho\sigma p)Y_t\big)\,dt + \sigma\sqrt{Y_t}\,dW^{2,Q}_t, \tag{3.27} \]
with $W^{2,Q}$ a Brownian motion. We are then reduced to the calculation of exponential
functionals of CIR processes, for which we have closed-form expressions derived either by
probabilistic or PDE methods (see e.g. [14], [1], [37], [39]).
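We present here the PDE approach. Fix $p \in \mathbb{R}$, and consider the function defined by
$$F(t,y,p) = E^{Q}\Big[\exp\Big(\frac{p(p-1)}{2}\int_0^t Y_s\,ds\Big)\,\Big|\,Y_0 = y\Big],$$
so that $\Gamma_t(p) = \ln F(t,y_0,p)$. From the Feynman-Kac formula, the function $F$ is solution to the parabolic linear Cauchy problem
$$\frac{\partial F}{\partial t} = \frac{p(p-1)}{2}\,yF + \big(\kappa\theta - (\kappa-\rho\sigma p)y\big)\frac{\partial F}{\partial y} + \frac{\sigma^2}{2}\,y\,\frac{\partial^2 F}{\partial y^2}, \qquad F(0,y,p) = 1.$$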
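We look for a function $F$ in the form $F(t,y,p) = \exp(\phi(t,p) + y\psi(t,p))$ for some deterministic functions $\phi(\cdot,p)$, $\psi(\cdot,p)$. By plugging into the PDE for $F$, we obtain that $\phi$ and $\psi$ satisfy the ordinary differential equations (ODE):
$$\frac{\partial\psi}{\partial t} = \frac{p(p-1)}{2} - (\kappa-\rho\sigma p)\psi + \frac{\sigma^2}{2}\psi^2, \qquad \psi(0,p) = 0, \tag{3.28}$$
$$\frac{\partial\phi}{\partial t} = \kappa\theta\,\psi, \qquad \phi(0,p) = 0. \tag{3.29}$$
One can solve explicitly the Riccati equation (3.28) under the condition:
$$\Delta = \Delta(p) := (\kappa-\rho\sigma p)^2 - \sigma^2 p(p-1) \ge 0. \tag{3.30}$$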
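Indeed, in this case, a particular solution to (3.28) is given by the constant function in time:
$$\psi_0(p) = \frac{\kappa - \rho\sigma p + \sqrt{\Delta}}{\sigma^2},$$
and denoting by $\tilde\psi = 1/(\psi - \psi_0)$, i.e. $\psi = \psi_0 + 1/\tilde\psi$, we see that the function $\tilde\psi$ satisfies the first-order linear ODE:
$$\frac{\partial\tilde\psi}{\partial t} + \sqrt{\Delta}\,\tilde\psi + \frac{\sigma^2}{2} = 0, \qquad \tilde\psi(0,p) = -\frac{1}{\psi_0(p)}.$$
The solution to this equation is given by
$$\tilde\psi(t,p) = -\frac{\sigma^2}{\kappa - \rho\sigma p + \sqrt{\Delta}}\,e^{-\sqrt{\Delta}\,t} + \frac{\sigma^2}{2\sqrt{\Delta}}\big(e^{-\sqrt{\Delta}\,t} - 1\big).$$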
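We then obtain the solution to the Riccati equation (3.28) after some straightforward calculations:
$$\psi(t,p) = \psi_0(p) + \frac{1}{\tilde\psi(t,p)} = \frac{\kappa-\rho\sigma p-\sqrt{\Delta}}{\sigma^2}\,\frac{1-e^{-\sqrt{\Delta}\,t}}{1-h\,e^{-\sqrt{\Delta}\,t}} = p(p-1)\,\frac{\sinh\big(\frac{\sqrt{\Delta}}{2}t\big)}{(\kappa-\rho\sigma p)\sinh\big(\frac{\sqrt{\Delta}}{2}t\big) + \sqrt{\Delta}\cosh\big(\frac{\sqrt{\Delta}}{2}t\big)},$$
and
$$\begin{aligned}
\phi(t,p) &= \frac{\kappa\theta}{\sigma^2}\Big[(\kappa-\rho\sigma p-\sqrt{\Delta})\,t - 2\ln\Big(\frac{1-h\,e^{-\sqrt{\Delta}\,t}}{1-h}\Big)\Big] \\
&= \frac{\kappa\theta}{\sigma^2}\Big[(\kappa-\rho\sigma p-\sqrt{\Delta})\,t + 2\ln\Big(\frac{\sqrt{\Delta}\,e^{\frac{\sqrt{\Delta}}{2}t}}{(\kappa-\rho\sigma p)\sinh\big(\frac{\sqrt{\Delta}}{2}t\big) + \sqrt{\Delta}\cosh\big(\frac{\sqrt{\Delta}}{2}t\big)}\Big)\Big],
\end{aligned}$$
where
$$h = h(p) := \frac{\kappa-\rho\sigma p-\sqrt{\Delta}}{\kappa-\rho\sigma p+\sqrt{\Delta}}.$$
The solutions $\phi$, $\psi$ to these equations are defined for all $t \ge 0$ such that $(1-h\,e^{-\sqrt{\Delta}\,t})/(1-h) > 0$, i.e. for $t \in [0,T^*)$ where
$$T^* = T^*(p) = \begin{cases} \infty, & \text{if } \kappa-\rho\sigma p \ge 0, \\ \dfrac{1}{\sqrt{\Delta}}\ln h, & \text{if } \kappa-\rho\sigma p < 0. \end{cases}$$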
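As a sanity check on the closed form of $\psi$, the following minimal sketch compares the hyperbolic expression with a direct Runge-Kutta integration of the Riccati equation (3.28). The function names and the parameter values are illustrative assumptions, not part of the original notes, and the sketch only covers the case $\Delta(p) > 0$.

```python
import math


def psi_closed_form(t, p, kappa, sigma, rho):
    """Hyperbolic closed form of psi(t, p); only the case Delta(p) > 0 is handled."""
    b = kappa - rho * sigma * p
    delta = b * b - sigma * sigma * p * (p - 1.0)
    if delta <= 0.0:
        raise ValueError("this sketch only covers Delta(p) > 0")
    d = math.sqrt(delta)
    s = math.sinh(0.5 * d * t)
    c = math.cosh(0.5 * d * t)
    return p * (p - 1.0) * s / (b * s + d * c)


def psi_rk4(t, p, kappa, sigma, rho, steps=2000):
    """Classical RK4 integration of the Riccati ODE (3.28) started from psi(0) = 0."""
    b = kappa - rho * sigma * p

    def rhs(y):
        return 0.5 * p * (p - 1.0) - b * y + 0.5 * sigma * sigma * y * y

    h = t / steps
    y = 0.0
    for _ in range(steps):
        k1 = rhs(y)
        k2 = rhs(y + 0.5 * h * k1)
        k3 = rhs(y + 0.5 * h * k2)
        k4 = rhs(y + h * k3)
        y += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return y


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    args = dict(t=1.0, p=0.5, kappa=2.0, sigma=0.5, rho=-0.5)
    print(psi_closed_form(**args), psi_rk4(**args))
```

The two values should agree up to the discretisation error of the integrator, which gives a quick way to catch sign errors when re-deriving the formula.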
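When $\Delta(p) < 0$, we extend the functions $\phi$ and $\psi$ by analytic continuation by substituting $\sqrt{\Delta}$ by $i\sqrt{-\Delta}$, which yields:
$$\psi(t,p) = p(p-1)\,\frac{\sin\big(\frac{\sqrt{-\Delta}}{2}t\big)}{(\kappa-\rho\sigma p)\sin\big(\frac{\sqrt{-\Delta}}{2}t\big) + \sqrt{-\Delta}\cos\big(\frac{\sqrt{-\Delta}}{2}t\big)}, \tag{3.31}$$
$$\phi(t,p) = \frac{\kappa\theta}{\sigma^2}\Big[(\kappa-\rho\sigma p-i\sqrt{-\Delta})\,t + 2\ln\Big(\frac{\sqrt{-\Delta}\,e^{i\frac{\sqrt{-\Delta}}{2}t}}{(\kappa-\rho\sigma p)\sin\big(\frac{\sqrt{-\Delta}}{2}t\big) + \sqrt{-\Delta}\cos\big(\frac{\sqrt{-\Delta}}{2}t\big)}\Big)\Big]. \tag{3.32}$$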
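This analytic continuation holds as long as
$$(\kappa-\rho\sigma p)\sin\Big(\frac{\sqrt{-\Delta}}{2}t\Big) + \sqrt{-\Delta}\cos\Big(\frac{\sqrt{-\Delta}}{2}t\Big) > 0,$$
which corresponds to an explosion time
$$T^* = T^*(p) = \frac{2}{\sqrt{-\Delta}}\Big[\pi 1_{\{\kappa-\rho\sigma p>0\}} + \arctan\Big(\frac{\sqrt{-\Delta}}{\rho\sigma p-\kappa}\Big)\Big].$$
Recalling that a Laplace transform is analytic in the interior of its convex domain (when it is not empty), we deduce that the function $\Gamma_t$ defined in (3.26) is explicitly given by
$$\Gamma_t(p) = \begin{cases} \phi(t,p) + y_0\,\psi(t,p), & t < T^*(p),\ p \in \mathbb{R}, \\ \infty, & t \ge T^*(p),\ p \in \mathbb{R}. \end{cases}$$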
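Our purpose is to derive a LDP for $X_t - x_0$ when $t$ goes to zero, and thus, in view of the Gärtner-Ellis theorem, we need to determine the limiting moment generating function:
$$\Gamma(p) := \lim_{t\to 0} t\,\Gamma_t(p/t).$$
We are then led to substitute $p \to p/t$ and let $t \to 0$ in the above calculations. Observe that for $t$ small, $\Delta(p/t) \sim -(1-\rho^2)\sigma^2 p^2/t^2$, and so:
$$T^*(p/t) \sim \begin{cases} \dfrac{2t}{\sigma|p|\sqrt{1-\rho^2}}\Big[\pi 1_{\{\rho p\le 0\}} + \mathrm{sgn}(p)\arctan\Big(\dfrac{\sqrt{1-\rho^2}}{\rho}\Big)\Big], & \text{for } \rho \ne 0,\ p \ne 0, \\ \dfrac{\pi t}{\sigma|p|}, & \text{for } \rho = 0,\ p \ne 0, \end{cases}$$
and $T^*(p/t) = \infty$ for $p = 0$.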
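Hence, for $t > 0$ small, the set $\{t < T^*(p/t)\}$ may be written equivalently as $p \in (p_-,p_+)$ where $p_- < 0$ is defined by
$$p_- = \begin{cases} \dfrac{2\arctan\big(\frac{\sqrt{1-\rho^2}}{\rho}\big)}{\sigma\sqrt{1-\rho^2}}, & \text{if } \rho < 0, \\ -\dfrac{\pi}{\sigma}, & \text{if } \rho = 0, \\ \dfrac{-2\pi + 2\arctan\big(\frac{\sqrt{1-\rho^2}}{\rho}\big)}{\sigma\sqrt{1-\rho^2}}, & \text{if } \rho > 0, \end{cases}$$
and $p_+ > 0$ is defined by
$$p_+ = \begin{cases} \dfrac{2\pi + 2\arctan\big(\frac{\sqrt{1-\rho^2}}{\rho}\big)}{\sigma\sqrt{1-\rho^2}}, & \text{if } \rho < 0, \\ \dfrac{\pi}{\sigma}, & \text{if } \rho = 0, \\ \dfrac{2\arctan\big(\frac{\sqrt{1-\rho^2}}{\rho}\big)}{\sigma\sqrt{1-\rho^2}}, & \text{if } \rho > 0. \end{cases}$$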
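Moreover, by observing that $t\sqrt{\Delta(p/t)} \to i\sigma\sqrt{1-\rho^2}\,|p|$, we find that for all $p \in (p_-,p_+)$:
$$t\,\psi(t,p/t) \longrightarrow \frac{p}{\sigma\Big(\sqrt{1-\rho^2}\cot\big(\frac{\sigma p\sqrt{1-\rho^2}}{2}\big) - \rho\Big)},$$
$$t\,\phi(t,p/t) \sim t\,\frac{\kappa\theta}{\sigma^2}\Big[-\sigma\big(\rho p + i\sqrt{1-\rho^2}\,|p|\big) + 2\ln\Big(\frac{\sqrt{1-\rho^2}\,e^{i\sigma|p|\sqrt{1-\rho^2}/2}}{-\rho\sin\big(\frac{\sigma p\sqrt{1-\rho^2}}{2}\big) + \sqrt{1-\rho^2}\cos\big(\frac{\sigma p\sqrt{1-\rho^2}}{2}\big)}\Big)\Big] \longrightarrow 0.$$
We conclude that
$$\Gamma(p) = \begin{cases} \dfrac{y_0\,p}{\sigma\Big(\sqrt{1-\rho^2}\cot\big(\frac{\sigma p\sqrt{1-\rho^2}}{2}\big) - \rho\Big)}, & \text{for } p \in (p_-,p_+), \\ \infty, & \text{otherwise.} \end{cases}$$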
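From the basic properties of moment generating functions, we know that $\Gamma$ is convex, lower-semicontinuous, and by direct inspection, we easily see that $\Gamma$ is smooth on $(p_-,p_+)$ with $\Gamma(p) \to \infty$ and $|\Gamma'(p)| \to \infty$ as $p \uparrow p_+$ and $p \downarrow p_-$. We can then apply the Gärtner-Ellis theorem, which implies that $X_t - x_0$ satisfies a LDP with rate function $\Gamma^*$ equal to the Fenchel-Legendre transform of $\Gamma$, i.e.
$$\Gamma^*(x) = \sup_{p\in(p_-,p_+)}\,[px - \Gamma(p)], \quad x \in \mathbb{R}. \tag{3.33}$$
For any $x \in \mathbb{R}$, the supremum in (3.33) is attained at a point $p^* = p^*(x)$ solution to $x = \Gamma'(p^*)$. From Jensen's inequality, notice that for all $t > 0$, $p \in \mathbb{R}$, $\Gamma_t(p) \ge E[\ln e^{p(X_t-x_0)}] = p(E[X_t] - x_0)$. Since $E[X_t] \to x_0$ as $t$ goes to zero, this implies that $\Gamma(p) = \lim_{t\to0} t\,\Gamma_t(p/t) \ge 0$ for all $p \in \mathbb{R}$, and thus $\Gamma^*(0) = 0$. It follows that for any $x \ge 0$, $p < 0$, $px - \Gamma(p) \le -\Gamma(p) \le -\Gamma(0) = 0$. Therefore, $\Gamma^*(x) = \sup_{p\in[0,p_+)}[px - \Gamma(p)]$, for $x \ge 0$, which implies that $\Gamma^*$ is nondecreasing on $\mathbb{R}_+$. Similarly, $\Gamma^*(x) = \sup_{p\in(p_-,0]}[px - \Gamma(p)]$, for $x \le 0$, and so $\Gamma^*$ is nonincreasing on $\mathbb{R}_-$. The LDP for $X_t - x_0$ can then be formulated as:
$$\lim_{t\to0} t\ln P[X_t - x_0 \ge k] = -\inf_{x\ge k}\Gamma^*(x) = -\Gamma^*(k), \quad k \ge 0, \tag{3.34}$$
$$\lim_{t\to0} t\ln P[X_t - x_0 \le k] = -\inf_{x\le k}\Gamma^*(x) = -\Gamma^*(k), \quad k \le 0. \tag{3.35}$$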
Finally, by Taylor expansion around $x = 0$ (at-the-money) of the small time implied volatility formula in (3.22):
$$\sigma_0(x) = \frac{|x|}{\sqrt{2I(x)}},$$
and explicit expressions of $\Gamma$ and its Fenchel-Legendre transform $I = \Gamma^*$ in (3.33), we obtain
$$\sigma_0(x) = \sqrt{y_0}\,\Big[1 + \frac{\rho\sigma}{4y_0}\,x + \Big(1 - \frac{5\rho^2}{2}\Big)\frac{\sigma^2}{24y_0^2}\,x^2 + O(x^3)\Big],$$
which determines the level, slope and curvature of the small-time implied volatility at the money in the Heston model, as in [46] and [16].
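The chain from the limiting moment generating function to the implied volatility expansion can be checked numerically. The sketch below evaluates $\Gamma(p)$ on $(p_-,p_+)$, computes the Legendre transform (3.33) by a ternary search (a numerical shortcut chosen here, valid because $p \mapsto px - \Gamma(p)$ is concave), and compares $|x|/\sqrt{2\Gamma^*(x)}$ with the second-order expansion. All function names and parameter values are illustrative assumptions.

```python
import math


def limiting_mgf(p, y0, sigma, rho):
    """Gamma(p) on (p_-, p_+), written with sin/cos so that p = 0 is harmless."""
    rho_bar = math.sqrt(1.0 - rho * rho)
    a = 0.5 * sigma * rho_bar * p
    return y0 * p * math.sin(a) / (sigma * (rho_bar * math.cos(a) - rho * math.sin(a)))


def mgf_domain(sigma, rho):
    """Endpoints (p_-, p_+) of the effective domain of Gamma."""
    rho_bar = math.sqrt(1.0 - rho * rho)
    if rho == 0.0:
        return -math.pi / sigma, math.pi / sigma
    at = math.atan(rho_bar / rho)
    if rho < 0.0:
        return 2.0 * at / (sigma * rho_bar), (2.0 * math.pi + 2.0 * at) / (sigma * rho_bar)
    return (-2.0 * math.pi + 2.0 * at) / (sigma * rho_bar), 2.0 * at / (sigma * rho_bar)


def rate_function(x, y0, sigma, rho, iters=200):
    """Gamma*(x) = sup_p [p x - Gamma(p)], by ternary search on the concave objective."""
    lo, hi = mgf_domain(sigma, rho)
    lo, hi = lo * (1.0 - 1e-9), hi * (1.0 - 1e-9)  # stay strictly inside the domain

    def objective(p):
        return p * x - limiting_mgf(p, y0, sigma, rho)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective(0.5 * (lo + hi))


def implied_vol_small_time(x, y0, sigma, rho):
    """sigma_0(x) = |x| / sqrt(2 Gamma*(x)), for x != 0."""
    return abs(x) / math.sqrt(2.0 * rate_function(x, y0, sigma, rho))


def implied_vol_expansion(x, y0, sigma, rho):
    """Second-order expansion of sigma_0 around x = 0."""
    return math.sqrt(y0) * (1.0 + rho * sigma * x / (4.0 * y0)
                            + (1.0 - 2.5 * rho * rho) * sigma ** 2 * x ** 2 / (24.0 * y0 ** 2))


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    for x in (-0.01, 0.01):
        print(x, implied_vol_small_time(x, 0.04, 0.5, -0.5),
              implied_vol_expansion(x, 0.04, 0.5, -0.5))
```

The two columns should agree closely for small $|x|$ and drift apart as $|x|$ grows, since the expansion is only local.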
4 Large deviations in risk management
4.1 Large portfolio losses in credit risk
4.1.1 Portfolio credit risk in a single factor normal copula model
A basic problem in measuring portfolio credit risk is determining the distribution of losses
from default over a fixed horizon. Credit portfolios are often large, including exposure to
thousands of obligors, and the default probabilities of high-quality credits are extremely
small. These features in credit risk context lead to consider rare but significant large loss
events, and emphasis is put on the small probabilities of large losses, as these are relevant
for calculation of value at risk and related risk measures.
We use the following notation:
$n$ = number of obligors to which portfolio is exposed,
$Y_k$ = default indicator (= 1 if default, 0 otherwise) for the $k$-th obligor,
$p_k$ = marginal probability that the $k$-th obligor defaults, i.e. $p_k = P[Y_k = 1]$,
$c_k$ = loss resulting from default of the $k$-th obligor,
$L_n = c_1Y_1 + \ldots + c_nY_n$ = total loss from defaults.
We are interested in estimating tail probabilities $P[L_n > \ell_n]$ in the limiting regime at increasingly high loss thresholds $\ell_n$, and rarity of large losses resulting from a large number $n$ of obligors and multiple defaults.
For simplicity, we consider a homogeneous portfolio where all $p_k$ are equal to $p$, and all $c_k$ are equal to the constant 1. An essential feature for credit risk management is the mechanism used to model the dependence across sources of credit risk. The dependence among obligors is modelled by the dependence among the default indicators $Y_k$. This dependence is introduced through a normal copula model as follows: each default indicator is represented as
$$Y_k = 1_{\{X_k > x_k\}}, \quad k = 1,\ldots,n,$$
where $(X_1,\ldots,X_n)$ is a multivariate normal vector. Without loss of generality, we take each $X_k$ to have a standard normal distribution, and we choose $x_k$ to match the marginal default probability $p_k$, i.e. $x_k = \Phi^{-1}(1-p_k) = -\Phi^{-1}(p_k)$, with $\Phi$ the cumulative normal distribution. We also denote by $\varphi = \Phi'$ the density of the normal distribution. The correlations among the $X_k$, which determine the dependence among the $Y_k$, are specified through a single factor model of the form:
$$X_k = \rho Z + \sqrt{1-\rho^2}\,\varepsilon_k, \quad k = 1,\ldots,n, \tag{4.1}$$
where $Z$ has the standard normal distribution $N(0,1)$, the $\varepsilon_k$ are independent with $N(0,1)$ distribution, and $Z$ is independent of $\varepsilon_k$, $k = 1,\ldots,n$. $Z$ is called the systematic risk factor (industry or regional risk factors, for example), and $\varepsilon_k$ is an idiosyncratic risk associated with the $k$-th obligor. The constant $\rho$ in $[0,1)$ is a factor loading on the single factor $Z$, and is assumed here to be identical for all obligors. We shall distinguish the case of independent obligors ($\rho = 0$), and dependent obligors ($\rho > 0$). More general multivariate factor models with inhomogeneous obligors are studied in [31]. Other recent works dealing with large deviations in credit risk include the paper [50], which analyzes rare events related to losses in senior tranches of CDO, and the paper [45], which studies the portfolio loss process.
4.1.2 Independent obligors
In this case, $\rho = 0$, the default indicators $Y_k$ are i.i.d. with Bernoulli distribution of parameter $p$, and $L_n$ has a binomial distribution of parameters $n$ and $p$. By the law of large numbers, $L_n/n$ converges to $p$. Hence, in order that the loss event $\{L_n \ge \ell_n\}$ becomes rare (without being trivially impossible), we let $\ell_n/n$ approach $q \in (p,1)$. It is then appropriate to specify $\ell_n = nq$ with $p < q < 1$. From Cramér's theorem and the expressions of the c.g.f. of the Bernoulli distribution and its Fenchel-Legendre transform, we obtain the large deviation result for the loss probability:
$$\lim_{n\to\infty}\frac{1}{n}\ln P[L_n \ge nq] = -q\ln\Big(\frac{q}{p}\Big) - (1-q)\ln\Big(\frac{1-q}{1-p}\Big) < 0.$$
Remark 4.1 By denoting $\Gamma(\theta) = \ln(1-p+pe^{\theta})$ the c.g.f. of $Y_k$, we have an IS (unbiased) estimator of $P[L_n \ge nq]$ by taking the average of independent replications of
$$\exp(-\theta L_n + n\Gamma(\theta))\,1_{\{L_n\ge nq\}},$$
where $L_n$ is sampled with a default probability $p(\theta) = P_\theta[Y_k = 1] = pe^{\theta}/(1-p+pe^{\theta})$. Moreover, see Remark 2.3, this estimator is asymptotically optimal, as $n$ goes to infinity, for the choice of parameter $\theta_q \ge 0$ attaining the argmax in $\theta q - \Gamma(\theta)$.
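The estimator of Remark 4.1 can be sketched in a few lines. The code below is a minimal illustration under stated assumptions: the threshold is given as an integer number of defaults `k` (so that $q = k/n$), the function names are hypothetical, and the exact binomial tail is included only as a reference value for small $n$.

```python
import math
import random


def binomial_tail_exact(n, p, k):
    """Exact P[L_n >= k] for L_n ~ Binomial(n, p); reference value for small n."""
    return sum(math.comb(n, j) * p ** j * (1.0 - p) ** (n - j) for j in range(k, n + 1))


def is_binomial_tail(n, p, k, num_samples, seed=0):
    """Exponentially twisted IS estimate of P[L_n >= k], with q = k / n in (p, 1)."""
    rng = random.Random(seed)
    q = k / n
    # theta_q maximises theta * q - Gamma(theta), with Gamma(theta) = ln(1 - p + p e^theta).
    theta = math.log(q * (1.0 - p) / (p * (1.0 - q)))
    cgf = math.log(1.0 - p + p * math.exp(theta))
    # Twisted default probability p(theta); it equals q for this choice of theta.
    p_twisted = p * math.exp(theta) / (1.0 - p + p * math.exp(theta))
    total = 0.0
    for _ in range(num_samples):
        losses = sum(1 for _ in range(n) if rng.random() < p_twisted)
        if losses >= k:
            total += math.exp(-theta * losses + n * cgf)  # likelihood ratio on the event
    return total / num_samples


if __name__ == "__main__":
    # Hypothetical portfolio: 100 obligors, 10% default probability, threshold 30 defaults.
    print(is_binomial_tail(100, 0.1, 30, 10_000), binomial_tail_exact(100, 0.1, 30))
```

A plain Monte Carlo estimator with the same sample size would almost surely return zero here, since the target probability is far smaller than the reciprocal of the number of samples.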
4.1.3 Dependent obligors
We consider the case where $\rho > 0$. Then, conditionally on the factor $Z$, the default indicators $Y_k$ are i.i.d. with Bernoulli distribution of parameter:
$$p(Z) = P[Y_k = 1\,|\,Z] = P\big[\rho Z + \sqrt{1-\rho^2}\,\varepsilon_k > -\Phi^{-1}(p)\,\big|\,Z\big] = \Phi\Big(\frac{\rho Z + \Phi^{-1}(p)}{\sqrt{1-\rho^2}}\Big). \tag{4.2}$$
Hence, by the law of large numbers, $L_n/n$ converges in law to the random variable $p(Z)$ valued in $(0,1)$. In order that $\{L_n \ge \ell_n\}$ becomes a rare event (without being impossible) as $n$ increases, we therefore let $\ell_n/n$ approach 1 from below. We then set
$$\ell_n = nq_n, \quad \text{with } q_n < 1,\ q_n \nearrow 1 \text{ as } n \to \infty. \tag{4.3}$$
Actually, we assume that the rate of increase of $q_n$ to 1 is of order $n^{-a}$ with $a \le 1$:
$$1 - q_n = O(n^{-a}), \quad \text{with } 0 < a \le 1. \tag{4.4}$$
We then state the large deviations result for the large loss threshold regime.
Theorem 4.1 In the single-factor homogeneous portfolio credit risk model (4.1), and with large threshold $\ell_n$ as in (4.3)-(4.4), we have
$$\lim_{n\to\infty}\frac{1}{\ln n}\ln P[L_n \ge nq_n] = -a\,\frac{1-\rho^2}{\rho^2}.$$
Observe that in the above theorem, we normalize by $\ln n$, indicating that the probability decays like $n^{-\gamma}$, with $\gamma = a(1-\rho^2)/\rho^2$. We find that the decay rate is determined by the effect of the dependence structure in the Gaussian copula model. When $\rho$ is small (weak dependence between sources of credit risk), large losses occur very rarely, which is formalized by a high decay rate. In the opposite case, this decay rate is small when $\rho$ tends to one, which means that large losses are most likely to result from systematic risk factors.
Proof. 1) We first prove the lower bound:
$$\liminf_{n\to\infty}\frac{1}{\ln n}\ln P[L_n \ge nq_n] \ge -a\,\frac{1-\rho^2}{\rho^2}. \tag{4.5}$$
From Bayes formula, we have
$$P[L_n \ge nq_n] \ge P[L_n \ge nq_n,\ p(Z) \ge q_n] = P[L_n \ge nq_n\,|\,p(Z) \ge q_n]\,P[p(Z) \ge q_n]. \tag{4.6}$$
For any $n \ge 1$, we define $z_n \in \mathbb{R}$ the solution to
$$p(z_n) = q_n, \quad n \ge 1.$$
Since $p(\cdot)$ is an increasing one to one function, we have $\{p(Z) \ge q_n\} = \{Z \ge z_n\}$. Moreover, observing that $L_n$ is an increasing function of $Z$, we get
$$P[L_n \ge nq_n\,|\,p(Z) \ge q_n] = P[L_n \ge nq_n\,|\,Z \ge z_n] \ge P[L_n \ge nq_n\,|\,Z = z_n] = P[L_n \ge nq_n\,|\,p(Z) = q_n],$$
so that from (4.6)
$$P[L_n \ge nq_n] \ge P[L_n \ge nq_n\,|\,p(Z) = q_n]\,P[Z \ge z_n]. \tag{4.7}$$
Now given $p(Z) = q_n$, $L_n$ is binomially distributed with parameters $n$ and $q_n$, and thus
$$P[L_n \ge nq_n\,|\,p(Z) = q_n] \longrightarrow 1 - \Phi(0) = \frac{1}{2}\ (> 0). \tag{4.8}$$
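We focus on the tail probability $P[Z \ge z_n]$ as $n$ goes to infinity. First, observe that since $q_n$ goes to 1, we have $z_n$ going to infinity as $n$ tends to infinity. Furthermore, from the expression (4.2) of $p(z)$, the rate of decrease (4.4), and using the property that $1 - \Phi(x) \simeq \varphi(x)/x$ as $x \to \infty$, we have
$$O(n^{-a}) = 1 - q_n = 1 - p(z_n) = 1 - \Phi\Big(\frac{\rho z_n + \Phi^{-1}(p)}{\sqrt{1-\rho^2}}\Big) \simeq \frac{1}{\sqrt{2\pi}}\,\frac{\sqrt{1-\rho^2}}{\rho z_n + \Phi^{-1}(p)}\exp\Big(-\frac{1}{2}\Big(\frac{\rho z_n + \Phi^{-1}(p)}{\sqrt{1-\rho^2}}\Big)^2\Big),$$
as $n \to \infty$, so that by taking logarithm:
$$a\ln n - \frac{1}{2}\,\frac{\rho^2 z_n^2}{1-\rho^2} - \ln z_n = O(1).$$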
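This implies
$$\lim_{n\to\infty}\frac{z_n^2}{\ln n} = 2a\,\frac{1-\rho^2}{\rho^2}. \tag{4.9}$$
By writing
$$P[Z \ge z_n] \ge P[z_n \le Z \le z_n + 1] \ge \frac{1}{\sqrt{2\pi}}\exp\Big(-\frac{1}{2}(z_n+1)^2\Big),$$
we deduce with (4.9)
$$\liminf_{n\to\infty}\frac{1}{\ln n}\ln P[Z \ge z_n] \ge -a\,\frac{1-\rho^2}{\rho^2}.$$
Combining with (4.7) and (4.8), we get the required lower bound (4.5).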
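2) We now focus on the upper bound
$$\limsup_{n\to\infty}\frac{1}{\ln n}\ln P[L_n \ge nq_n] \le -a\,\frac{1-\rho^2}{\rho^2}. \tag{4.10}$$
We introduce the conditional c.g.f. of $Y_k$:
$$\Gamma(\theta,z) = \ln E\big[e^{\theta Y_k}\,\big|\,Z = z\big] \tag{4.11}$$
$$\phantom{\Gamma(\theta,z)} = \ln\big(1 - p(z) + p(z)e^{\theta}\big). \tag{4.12}$$
Then, for any $\theta \ge 0$, we get by Chebyshev's inequality,
$$P[L_n \ge nq_n\,|\,Z] \le E\big[e^{\theta(L_n - nq_n)}\,\big|\,Z\big] = e^{-n(\theta q_n - \Gamma(\theta,Z))},$$
so that
$$P[L_n \ge nq_n\,|\,Z] \le e^{-n\Gamma^*(q_n,Z)}, \tag{4.13}$$
where
$$\Gamma^*(q,z) = \sup_{\theta\ge0}\,[\theta q - \Gamma(\theta,z)] = \begin{cases} 0, & \text{if } q \le p(z), \\ q\ln\Big(\dfrac{q}{p(z)}\Big) + (1-q)\ln\Big(\dfrac{1-q}{1-p(z)}\Big), & \text{if } p(z) < q \le 1. \end{cases}$$
By taking expectation on both sides of (4.13), we get
$$P[L_n \ge nq_n] \le E\big[e^{F_n(Z)}\big], \tag{4.14}$$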
where we set $F_n(z) = -n\Gamma^*(q_n,z)$. Since $\rho > 0$, the function $p(z)$ is increasing in $z$, so $\Gamma(\theta,z)$ is an increasing function of $z$ for all $\theta \ge 0$. Hence, $F_n(z)$ is an increasing function of $z$, which is nonpositive and attains its maximum value 0, for all $z$ s.t. $q_n = p(z_n) \le p(z)$, i.e. $z \ge z_n$. Moreover, by differentiation, we can check that $F_n$ is a concave function of $z$. We now introduce a change of measure. The idea is to shift the factor mean to reduce the variance of the term $e^{F_n(Z)}$ in the r.h.s. of (4.14). We consider the change of measure $P_\mu$ that puts the distribution of $Z$ to $N(\mu,1)$. Its likelihood ratio is given by
$$\frac{dP_\mu}{dP} = \exp\Big(\mu Z - \frac{1}{2}\mu^2\Big),$$
so that
$$E\big[e^{F_n(Z)}\big] = E_\mu\big[e^{F_n(Z) - \mu Z + \frac{1}{2}\mu^2}\big],$$
where $E_\mu$ denotes the expectation under $P_\mu$. By concavity of $F_n$, we have $F_n(Z) \le F_n(\mu) + F_n'(\mu)(Z-\mu)$, so that
$$E\big[e^{F_n(Z)}\big] \le E_\mu\big[e^{F_n(\mu) + (F_n'(\mu)-\mu)Z - \mu F_n'(\mu) + \frac{1}{2}\mu^2}\big]. \tag{4.15}$$
We now choose $\mu = \mu_n$ solution to
$$F_n'(\mu_n) = \mu_n, \tag{4.16}$$
so that the term in the expectation in the r.h.s. of (4.15) does not depend on $Z$, and is therefore a constant term (with zero-variance). Such a $\mu_n$ exists, since, by strict concavity of the function $z \mapsto F_n(z) - \frac{1}{2}z^2$, equation (4.16) is the first-order equation associated to the optimization problem:
$$\mu_n = \arg\max_{\mu\in\mathbb{R}}\Big[F_n(\mu) - \frac{1}{2}\mu^2\Big].$$
With this choice of factor mean $\mu_n$, and by inequalities (4.14), (4.15), we get
$$P[L_n \ge nq_n] \le e^{F_n(\mu_n) - \frac{1}{2}\mu_n^2}. \tag{4.17}$$
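We now prove that $\mu_n/z_n$ converges to 1 as $n$ goes to infinity. Actually, we show that for all $\varepsilon > 0$, there is $n_0$ large enough so that for all $n \ge n_0$, $z_n(1-\varepsilon) < \mu_n < z_n$. Since $F_n'(\mu_n) - \mu_n = 0$, and the function $F_n'(z) - z$ is decreasing by concavity of $F_n(z) - z^2/2$, it suffices to show that
$$F_n'(z_n(1-\varepsilon)) - z_n(1-\varepsilon) > 0 \quad \text{and} \quad F_n'(z_n) - z_n < 0. \tag{4.18}$$
We have
$$F_n'(z) = n\Big[\frac{p(z_n)}{p(z)} - \frac{1-p(z_n)}{1-p(z)}\Big]\,\varphi\Big(\frac{\rho z + \Phi^{-1}(p)}{\sqrt{1-\rho^2}}\Big)\frac{\rho}{\sqrt{1-\rho^2}}.$$
The second inequality in (4.18) holds since $F_n'(z_n) = 0$ and $z_n > 0$ for $q_n > p$, hence for $n$ large enough. Actually, $z_n$ goes to infinity as $n$ goes to infinity from (4.9). For the first inequality in (4.18), we use the property that $1 - \Phi(x) \simeq \varphi(x)/x$ as $x \to \infty$, so that
$$\lim_{n\to\infty}\frac{p(z_n)}{p(z_n(1-\varepsilon))} = 1, \quad \text{and} \quad \lim_{n\to\infty}\frac{1-p(z_n)}{1-p(z_n(1-\varepsilon))} = 0.$$
From (4.9), we have
$$\varphi\Big(\frac{\rho z_n(1-\varepsilon) + \Phi^{-1}(p)}{\sqrt{1-\rho^2}}\Big) = O\big(n^{-a(1-\varepsilon)^2}\big),$$
and therefore
$$F_n'(z_n(1-\varepsilon)) = O\big(n^{1-a(1-\varepsilon)^2}\big).$$
Moreover, from (4.9) and as $a \le 1$, we have
$$z_n(1-\varepsilon) = O\big(\sqrt{\ln n}\big) = o\big(n^{1-a(1-\varepsilon)^2}\big).$$
We deduce that for $n$ large enough $F_n'(z_n(1-\varepsilon)) - z_n(1-\varepsilon) > 0$ and so (4.18). Finally, recalling that $F_n$ is nonpositive, and from (4.17), we obtain:
$$\limsup_{n\to\infty}\frac{1}{\ln n}\ln P[L_n \ge nq_n] \le -\frac{1}{2}\lim_{n\to\infty}\frac{\mu_n^2}{\ln n} = -\frac{1}{2}\lim_{n\to\infty}\frac{z_n^2}{\ln n} = -a\,\frac{1-\rho^2}{\rho^2}. \tag{4.19}$$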
Application: asymptotic optimality of two-step importance sampling estimator

Consider the estimation problem of $P[L_n \ge nq_n]$. We apply a two-step importance sampling (IS) by using IS conditional on the common factor $Z$ and IS to the distribution of the factor $Z$. Observe that conditioning on $Z$ reduces to the problem of the independent case studied in the previous paragraph, with default probability $p(Z)$ as defined in (4.2), and c.g.f. $\Gamma(\cdot,Z)$ in (4.11). Choose $\theta_{q_n}(Z) \ge 0$ attaining the argmax in $\theta q_n - \Gamma(\theta,Z)$, and return the estimator
$$\exp\big(-\theta_{q_n}(Z)L_n + n\Gamma(\theta_{q_n}(Z),Z)\big)\,1_{\{L_n\ge nq_n\}},$$
where $L_n$ is sampled with a default probability $p(\theta_{q_n}(Z),Z) = p(Z)e^{\theta_{q_n}(Z)}/(1 - p(Z) + p(Z)e^{\theta_{q_n}(Z)})$. This provides an unbiased conditional estimator of $P[L_n \ge nq_n\,|\,Z]$ and an asymptotically optimal conditional variance. We further apply IS to the factor $Z \sim N(0,1)$ under $P$, by shifting the factor mean to $\mu$, and then considering the estimator
$$\exp\Big(-\mu Z + \frac{1}{2}\mu^2\Big)\exp\big(-\theta_{q_n}(Z)L_n + n\Gamma(\theta_{q_n}(Z),Z)\big)\,1_{\{L_n\ge nq_n\}}, \tag{4.20}$$
where $Z$ is sampled from $N(\mu,1)$. To summarize, the two-step IS estimator is generated as follows:
- Sample $Z$ from $N(\mu,1)$
- Compute $\theta_{q_n}(Z)$ and $p(\theta_{q_n}(Z),Z)$
- Return the estimator (4.20) where $L_n$ is sampled with default probability $p(\theta_{q_n}(Z),Z)$.
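The three steps above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the reference algorithm of [31]: the threshold is an integer number of defaults `k` with $q_n = k/n < 1$, the factor shift is taken to be $\mu = z_n$ (one of the two asymptotically optimal choices mentioned in the text), and all function names are hypothetical. The quadrature routine is only a reference value for small $n$.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def cond_default_prob(z, p, rho):
    """p(z) in (4.2): default probability conditional on the factor Z = z."""
    return STD_NORMAL.cdf((rho * z + STD_NORMAL.inv_cdf(p)) / math.sqrt(1.0 - rho * rho))


def factor_shift(p, rho, q):
    """z_q solving p(z_q) = q, used here as the shifted factor mean mu."""
    return (math.sqrt(1.0 - rho * rho) * STD_NORMAL.inv_cdf(q) - STD_NORMAL.inv_cdf(p)) / rho


def two_step_is(n, p, rho, k, mu, num_samples, seed=0):
    """Average of independent replications of the estimator (4.20), with q_n = k / n."""
    rng = random.Random(seed)
    q = k / n
    total = 0.0
    for _ in range(num_samples):
        z = rng.gauss(mu, 1.0)                      # step 1: Z ~ N(mu, 1)
        pz = cond_default_prob(z, p, rho)
        # step 2: twisting parameter (zero when the event is not rare given Z)
        theta = math.log(q * (1.0 - pz) / (pz * (1.0 - q))) if pz < q else 0.0
        cgf = math.log(1.0 - pz + pz * math.exp(theta))
        p_twisted = pz * math.exp(theta) / (1.0 - pz + pz * math.exp(theta))
        # step 3: sample L_n under the twisted probability and weight it
        losses = sum(1 for _ in range(n) if rng.random() < p_twisted)
        if losses >= k:
            total += math.exp(-mu * z + 0.5 * mu * mu) * math.exp(-theta * losses + n * cgf)
    return total / num_samples


def reference_probability(n, p, rho, k, z_max=8.0, step=0.01):
    """Riemann sum of P[Bin(n, p(z)) >= k] against the N(0, 1) density."""
    total = 0.0
    for i in range(int(2.0 * z_max / step) + 1):
        z = -z_max + i * step
        pz = cond_default_prob(z, p, rho)
        tail = sum(math.comb(n, j) * pz ** j * (1.0 - pz) ** (n - j) for j in range(k, n + 1))
        total += tail * STD_NORMAL.pdf(z) * step
    return total


if __name__ == "__main__":
    # Hypothetical portfolio: 50 obligors, p = 5%, rho = 0.5, threshold 25 defaults.
    mu = factor_shift(0.05, 0.5, 25 / 50)
    print(two_step_is(50, 0.05, 0.5, 25, mu, 20_000), reference_probability(50, 0.05, 0.5, 25))
```

For extreme factor values the conditional probability `pz` can round to 0 or 1 in floating point; a production version would need to guard against that, which this sketch does not.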
By construction, this provides an unbiased estimator of $P[L_n \ge nq_n]$, and the key point is to specify the choice of $\mu$ in order to reduce the global variance or equivalently the second moment $M_n^2(\mu,q_n)$ of this estimator. First, recall from the Cauchy-Schwarz inequality: $M_n^2(\mu,q_n) \ge (P[L_n \ge nq_n])^2$, so that the fastest possible rate of decay of $M_n^2(\mu,q_n)$ is twice that of the probability itself:
$$\liminf_{n\to\infty}\frac{1}{\ln n}\ln M_n^2(\mu,q_n) \ge 2\lim_{n\to\infty}\frac{1}{\ln n}\ln P[L_n \ge nq_n]. \tag{4.21}$$
To achieve this twice rate, we proceed as follows. Denoting by $\bar E$ the expectation under the IS distribution, we have
$$\begin{aligned}
M_n^2(\mu,q_n) &= \bar E\Big[\exp(-2\mu Z + \mu^2)\exp\big(-2\theta_{q_n}(Z)L_n + 2n\Gamma(\theta_{q_n}(Z),Z)\big)\,1_{\{L_n\ge nq_n\}}\Big] \\
&\le \bar E\Big[\exp(-2\mu Z + \mu^2)\exp\big(-2n\theta_{q_n}(Z)q_n + 2n\Gamma(\theta_{q_n}(Z),Z)\big)\Big] \\
&= \bar E\Big[\exp\big(-2\mu Z + \mu^2 + 2F_n(Z)\big)\Big],
\end{aligned}$$
by definition of $\theta_{q_n}(Z)$ and $F_n(z) = -n\sup_{\theta\ge0}[\theta q_n - \Gamma(\theta,z)]$ introduced in the proof of the upper bound in Theorem 4.1. As in (4.15), (4.17), by choosing $\mu = \mu_n$ solution to $F_n'(\mu_n) = \mu_n$, we then get
$$M_n^2(\mu_n,q_n) \le \exp\big(2F_n(\mu_n) - \mu_n^2\big) \le \exp(-\mu_n^2),$$
since $F_n$ is nonpositive. From (4.19), this yields
$$\limsup_{n\to\infty}\frac{1}{\ln n}\ln M_n^2(\mu_n,q_n) \le -2a\,\frac{1-\rho^2}{\rho^2} = 2\lim_{n\to\infty}\frac{1}{\ln n}\ln P[L_n \ge nq_n],$$
which proves together with (4.21) that
$$\lim_{n\to\infty}\frac{1}{\ln n}\ln M_n^2(\mu_n,q_n) = -2a\,\frac{1-\rho^2}{\rho^2} = 2\lim_{n\to\infty}\frac{1}{\ln n}\ln P[L_n \ge nq_n],$$
and thus the estimator (4.20) for the choice $\mu = \mu_n$ is asymptotically optimal. The choice of $\mu = z_n$ also leads to an asymptotically optimal estimator.
Remark 4.2 We also prove by similar methods large deviation results for the loss distribution in the limiting regime where individual loss probabilities decrease toward zero, see [31] for the details. This setting is relevant to portfolios of highly-rated obligors, for which one-year default probabilities are extremely small. This is also relevant to measuring risk over short time horizons. In this limiting regime, we set
$$\ell_n = nq, \ \text{with } 0 < q < 1, \qquad p = p_n = O(e^{-na}), \ \text{with } a > 0.$$
Then,
$$\lim_{n\to\infty}\frac{1}{n}\ln P[L_n \ge nq] = -\frac{a}{\rho^2},$$
and we may construct similarly as in the case of large losses, a two-step IS asymptotically optimal estimator.
4.2 Asymptotic arbitrage and large deviations
Arbitrage is the cornerstone concept of modern mathematical finance, and several versions
of the so-called fundamental theorem of asset pricing have been proved, see [8] for an
overview. This theorem states that absence of arbitrage is essentially equivalent to the
existence of an equivalent martingale measure. While typical models admit equivalent
martingale measures up to a finite horizon, this is not true globally. This means that short-term arbitrage would not exist, but arbitrage opportunities may arise in the long run. The
existence of such infinite horizon arbitrage opportunities has been studied for example in
[38]. We present here some results in [23] on explicit estimates for asymptotic arbitrage,
and related to large deviation estimates for the market price of risk.
Let us consider for simplicity a one dimensional diffusion model for the stock price
$$dS_t = \sigma(S_t)\big(dW_t + \varphi(S_t)\,dt\big), \tag{4.22}$$
on a filtered probability space $(\Omega,\mathcal{F},\mathbb{F} = (\mathcal{F}_t)_{t\ge0},P)$, with $W$ a standard Brownian motion, $\sigma$ the (local) volatility coefficient, and $\varphi$ the so-called market price of risk function: the stock's rate of return per unit volatility. We assume that the Doléans-Dade exponential process
$$Z_t = \exp\Big(-\int_0^t \varphi(S_u)\,dW_u - \frac{1}{2}\int_0^t |\varphi(S_u)|^2\,du\Big), \quad t \ge 0, \tag{4.23}$$
is a martingale, so that from Girsanov's theorem, for each $T > 0$, the measure $Q_T$ on $\mathcal{F}_T$ defined by
$$\frac{dQ_T}{dP} = Z_T,$$
is a probability measure equivalent to $P$ on $(\Omega,\mathcal{F}_T)$ s.t. $(S_t)_{0\le t\le T}$ is a local martingale under $Q_T$. To simplify notation, we omit the dependence of $Q = Q_T$ on $T$.
We say that $S$ has a non-trivial market price of risk if there is $c > 0$ s.t.
$$\lim_{T\to\infty} P\Big[\frac{1}{T}\int_0^T |\varphi(S_t)|^2\,dt < c\Big] = 0. \tag{4.24}$$
The economic interpretation of (4.24) is that the market price of risk should on average be bounded away from zero in the long run. Intuitively, in such a market, even if $T$ goes to infinity, market opportunities do not run dry as time goes on. More precisely, we have from Theorem 1.4 in [23] the first result: If $S$ has a non-trivial market price of risk satisfying (4.24), then there exists $\gamma > 0$, and for each $\varepsilon > 0$, for $T$ large enough, one can find some outcome $X_T = \int_0^T \alpha_t\,dS_t$ for some admissible trading strategy $\alpha$, starting from zero initial capital s.t.
$$\text{(i)}\ X_T \ge -e^{-\gamma T}, \qquad \text{(ii)}\ P[X_T \ge e^{\gamma T}] \ge 1 - \varepsilon. \tag{4.25}$$
Here, for $T > 0$, we say that the trading strategy $\alpha$ is admissible if it is predictable and $S$-integrable, and the corresponding outcome process $\{X_t = \int_0^t \alpha_u\,dS_u,\ 0 \le t \le T\}$ is bounded from below by a constant. The message of the result (4.25) is the following. It says that for any tolerance level $\varepsilon$, one may find $T_\varepsilon$ large enough such that, for $T \ge T_\varepsilon$, an exponentially growing profit can be obtained on $[0,T]$ with an exponentially decreasing potential loss and with a probability of failure below $\varepsilon$. This can be interpreted as a strong form of asymptotic arbitrage. However, the relation between $\varepsilon$ and $T_\varepsilon$ is not clarified, and the trading strategies and outcomes $X_T$ are not explicitly given (indeed, the proof is non-constructive).
We go further on such asymptotic arbitrage results by defining a stronger formulation on the market price of risk than (4.24).
Definition 4.1 We say that the market price of risk satisfies a large deviations estimate if there are constants $c_1, c_2 > 0$ s.t.
$$\limsup_{T\to\infty}\frac{1}{T}\ln P\Big[\frac{1}{T}\int_0^T |\varphi(S_t)|^2\,dt \le c_1\Big] < -c_2. \tag{4.26}$$
For the Black-Scholes model, the market price of risk is constant, and so the estimate (4.26) is trivially satisfied if the market price of risk is non zero. For other models, such large deviations estimate for the market price of risk would follow by a contraction principle whenever the diffusion process is ergodic and satisfies a large deviations principle. In the case of the geometric Ornstein-Uhlenbeck process, this can be derived more directly from explicit calculations of the Laplace transform and the Gärtner-Ellis theorem.
Under the stronger large deviation estimate (4.26), one expects a strengthening of (4.25) with an exponential decay in time for the probability of falling short of the exponential lower bound in (4.25) (ii):
$$P[X_T < e^{\gamma_1 T}] \le e^{-\gamma_3 T},$$
for some positive constants $\gamma_1$, $\gamma_3$. Such a result would establish an explicit relationship between a preset tolerance level and the time necessary to reach that tolerance level.
We first illustrate such kind of result in the simple case of the Black-Scholes model with constant volatility $\sigma > 0$, and where the market price of risk function $\varphi$ is constant:
$$dS_t = \sigma S_t(dW_t + \varphi\,dt). \tag{4.27}$$
The density of the unique martingale measure $Q$ is
$$Z_t = \exp\Big(-\varphi W_t - \frac{\varphi^2}{2}t\Big), \quad t \ge 0.$$
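Theorem 4.2 Consider the Black-Scholes model (4.27). Take $\gamma \in (0,\varphi^2/2)$, $0 < \gamma_1 < \gamma$, and set for all $T > 0$, $A_T = \{Z_T \ge e^{-\gamma T}\}$, $\xi_T = e^{\gamma_1 T}Q[A_T^c]/Q[A_T]$. Then, the claim
$$X_T = e^{\gamma_1 T}1_{A_T^c} - \xi_T 1_{A_T} \tag{4.28}$$
is attainable from zero initial capital, i.e. there exists an admissible trading strategy $\alpha$ s.t. $X_T = \int_0^T \alpha_t\,dS_t$, and satisfies for any $0 < \gamma_2 < \gamma - \gamma_1$:
$$X_T \ge -e^{-\gamma_2 T}, \quad \text{for large } T, \tag{4.29}$$
$$\lim_{T\to\infty}\frac{1}{T}\ln P[X_T < e^{\gamma_1 T}] = -\frac{1}{2}\Big(\frac{\varphi}{2} - \frac{\gamma}{\varphi}\Big)^2. \tag{4.30}$$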
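In the Black-Scholes case every quantity of Theorem 4.2 is an explicit Gaussian probability, so the statement can be explored numerically. The sketch below assumes $\varphi > 0$ and uses hypothetical function names and parameter values; the normal distribution function is evaluated through `math.erfc` so that very small tail probabilities do not round to zero.

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function, accurate in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def shortfall_probability(T, phi, gamma):
    """P[A_T] = P[Z_T >= exp(-gamma T)] = P[X_T < exp(gamma_1 T)], assuming phi > 0."""
    return std_normal_cdf(-(0.5 * phi - gamma / phi) * math.sqrt(T))


def loss_bound(T, phi, gamma, gamma1):
    """xi_T = exp(gamma_1 T) Q[A_T^c] / Q[A_T], assuming phi > 0 (overflows for huge T)."""
    x = (0.5 * phi + gamma / phi) * math.sqrt(T)
    return math.exp(gamma1 * T) * std_normal_cdf(-x) / std_normal_cdf(x)


def decay_rate(phi, gamma):
    """Limit of (1/T) ln P[X_T < exp(gamma_1 T)] as T goes to infinity."""
    return -0.5 * (0.5 * phi - gamma / phi) ** 2


if __name__ == "__main__":
    # Hypothetical values with 0 < gamma_1 < gamma < phi^2 / 2.
    phi, gamma, gamma1 = 0.5, 0.05, 0.02
    for T in (100.0, 1_000.0, 10_000.0, 40_000.0):
        print(T, math.log(shortfall_probability(T, phi, gamma)) / T, decay_rate(phi, gamma))
    print(loss_bound(1_000.0, phi, gamma, gamma1))
```

The convergence of the normalised log-probability is slow, because of a logarithmic correction of order $\ln(T)/T$, which is worth keeping in mind when reading the limit as a finite-horizon approximation.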
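Proof. First, observe that the sets $A_T$ satisfy
$$Q[A_T] \ge 1 - e^{-\gamma T}. \tag{4.31}$$
Indeed, $Q[Z_T < e^{-\gamma T}] = \int_{\{Z_T < e^{-\gamma T}\}} Z_T\,dP \le e^{-\gamma T}$, and this implies (4.31). We deduce that the constants $\xi_T$ satisfy
$$\xi_T \le \frac{e^{-(\gamma-\gamma_1)T}}{1 - e^{-\gamma T}} \le e^{-\gamma_2 T} \quad \text{for large } T \text{ if } \gamma_2 < \gamma - \gamma_1.$$
Consider now the contingent claim $X_T$ defined in (4.28), which is $\mathcal{F}_T^W$-measurable. Clearly, $X_T \ge -\xi_T$, and we have by construction:
$$E^Q[X_T] = e^{\gamma_1 T}Q[A_T^c] - \xi_T Q[A_T] = 0.$$
By the martingale representation theorem applied to the $Q$-martingale $E^Q[X_T\,|\,\mathcal{F}_t]$, there exists an admissible trading strategy $\alpha$ s.t. $X_T = \int_0^T \alpha_t\,dS_t$. It remains to show the large deviation estimate (4.30). Observe by definition of $X_T$ in (4.28) that
$$\{X_T < e^{\gamma_1 T}\} = A_T.$$
Thus,
$$P[X_T < e^{\gamma_1 T}] = P[Z_T \ge e^{-\gamma T}] = P\Big[-\varphi W_T - \frac{\varphi^2}{2}T \ge -\gamma T\Big] = \Phi\Big(-\Big(\frac{|\varphi|}{2} - \frac{\gamma}{|\varphi|}\Big)\sqrt{T}\Big),$$
where $\Phi$ denotes the distribution function of a standard normal variable with density $\phi$. Finally, recalling the equivalence $\Phi(-d) \simeq \phi(d)/d$ as $d$ goes to infinity, we obtain the large deviation estimate (4.30). □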
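Remark 4.3 1. We can compute explicitly the constant $\xi_T$ in the assertion of Theorem 4.2. Indeed, notice by Girsanov's theorem that $W_t^Q = W_t + \varphi t$ is a $Q$-Brownian motion so that
$$Q[A_T] = Q\Big[-\varphi W_T^Q + \frac{\varphi^2}{2}T \ge -\gamma T\Big] = \Phi\Big(\Big(\frac{|\varphi|}{2} + \frac{\gamma}{|\varphi|}\Big)\sqrt{T}\Big),$$
and so
$$\xi_T = e^{\gamma_1 T}\,\frac{\Phi\Big(-\Big(\frac{|\varphi|}{2} + \frac{\gamma}{|\varphi|}\Big)\sqrt{T}\Big)}{\Phi\Big(\Big(\frac{|\varphi|}{2} + \frac{\gamma}{|\varphi|}\Big)\sqrt{T}\Big)}.$$
2. The decay rate in (4.30) is optimal under the constraint $X_T \ge -\xi_T$. More precisely, for any admissible outcome $\tilde X_T = \int_0^T \tilde\alpha_t\,dS_t$ s.t. $\tilde X_T \ge -\xi_T$, we can show by using the Neyman-Pearson lemma that $P[\tilde X_T < e^{\gamma_1 T}] \ge P[Z_T \ge e^{-\gamma T}]$, and so the shortfall probability $P[\tilde X_T < e^{\gamma_1 T}]$ cannot decay at a faster rate than described in (4.30).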
We next consider the less trivial example of the geometric Ornstein-Uhlenbeck process for the stock price (also known as Platen-Rebolledo model):
$$S_t = \exp(Y_t), \tag{4.32}$$
where $Y$ is the stationary Ornstein-Uhlenbeck process defined by
$$dY_t = -\kappa Y_t\,dt + \sigma\,dW_t, \tag{4.33}$$
with parameters $\kappa > 0$, $\sigma > 0$. The process $S$ satisfies the SDE
$$dS_t = \sigma S_t\Big[dW_t - \Big(\sigma^{-1}\kappa Y_t - \frac{\sigma}{2}\Big)dt\Big],$$
with a market price of risk: $\varphi_t = -\sigma^{-1}\kappa Y_t + \frac{\sigma}{2}$, which satisfies the large deviation estimate (4.26) with an explicit rate function identified in [21]. The Doléans-Dade process
$$Z_t = \exp\Big(\int_0^t \Big(\sigma^{-1}\kappa Y_u - \frac{\sigma}{2}\Big)dW_u - \frac{1}{2}\int_0^t \Big(\sigma^{-1}\kappa Y_u - \frac{\sigma}{2}\Big)^2 du\Big) \tag{4.34}$$
is a true martingale (see Proposition 2.5 in [1]), and this defines, for each $T > 0$, a unique equivalent martingale measure $Q$ on $\mathcal{F}_T$ for $S$. Similarly as in Theorem 4.2, we have the following large deviation result.
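Theorem 4.3 Consider the geometric Ornstein-Uhlenbeck model (4.32)-(4.33). Take $\gamma \in (0,\frac{\sigma^2}{8} + \frac{\kappa}{4})$, $0 < \gamma_1 < \gamma$, and set for all $T > 0$, $A_T = \{Z_T \ge e^{-\gamma T}\}$, $\xi_T = e^{\gamma_1 T}Q[A_T^c]/Q[A_T]$. Then, the claim
$$X_T = e^{\gamma_1 T}1_{A_T^c} - \xi_T 1_{A_T} \tag{4.35}$$
is attainable from zero initial capital, i.e. there exists an admissible trading strategy $\alpha$ s.t. $X_T = \int_0^T \alpha_t\,dS_t$, and satisfies for any $0 < \gamma_2 < \gamma - \gamma_1$:
$$X_T \ge -e^{-\gamma_2 T}, \quad \text{for large } T, \tag{4.36}$$
$$\lim_{T\to\infty}\frac{1}{T}\ln P[X_T < e^{\gamma_1 T}] = -\frac{\big(\frac{\sigma^2}{8} + \frac{\kappa}{4} - \gamma\big)^2}{\frac{\sigma^2}{8} + \frac{\kappa}{2} - \gamma}. \tag{4.37}$$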
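Proof. The proof follows the same lines of arguments as in Theorem 4.2. Keeping the same notations, we have $\{X_T < e^{\gamma_1 T}\} = A_T$, and we only have to prove the large deviation estimate:
$$\lim_{T\to\infty}\frac{1}{T}\ln P[A_T] = -\frac{\big(\frac{\sigma^2}{8} + \frac{\kappa}{4} - \gamma\big)^2}{\frac{\sigma^2}{8} + \frac{\kappa}{2} - \gamma}. \tag{4.38}$$
To this end, we write (after some straightforward transformation on $Z$ in (4.34))
$$\begin{aligned}
A_T &= \{Z_T \ge e^{-\gamma T}\} \\
&= \Big\{\int_0^T Y_t\,dY_t + \frac{\kappa}{2}\int_0^T Y_t^2\,dt + \frac{\sigma^2}{2\kappa}(Y_0 - Y_T) \ge \frac{\sigma^2}{\kappa}\Big(\frac{\sigma^2}{8} - \gamma\Big)T\Big\} \\
&= \Big\{\frac{1}{T}\eta_T + \frac{1}{T}\zeta_T \ge \frac{\sigma^2}{\kappa}\Big(\frac{\sigma^2}{8} - \gamma\Big)\Big\},
\end{aligned}$$
with $\eta_T = \int_0^T Y_t\,dY_t + \frac{\kappa}{2}\int_0^T Y_t^2\,dt$ and $\zeta_T = \frac{\sigma^2}{2\kappa}(Y_0 - Y_T)$. We then use results in [21], which state large deviations with an explicit rate function for $\eta_T/T$, combined with a perturbation argument for $\zeta_T/T$ (see details in [23]), to derive the large deviation estimate (4.38). □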
Remark 4.4 The key point in Theorem 4.2 and Theorem 4.3 is the exponential decay of the probabilities
$$P[Z_T \ge e^{-\gamma T}]. \tag{4.39}$$
For general models as in (4.22) with density of martingale measure (4.23), such a result would follow from the Donsker-Varadhan large deviation principle for the empirical distribution of $S$ and the contraction principle. Indeed, observe by Itô's formula that
$$\begin{aligned}
\int_0^T \varphi(S_t)\,dW_t &= \int_0^T \frac{\varphi(S_t)}{\sigma(S_t)}\,dS_t - \int_0^T \varphi^2(S_t)\,dt \\
&= \int_0^T f(S_t)\,dS_t - \int_0^T \varphi^2(S_t)\,dt \\
&= F(S_T) - F(S_0) - \frac{1}{2}\int_0^T f'(S_t)\sigma^2(S_t)\,dt - \int_0^T \varphi^2(S_t)\,dt,
\end{aligned}$$
where $f(x) = \varphi(x)/\sigma(x)$, and $F \in C^2$ is s.t. $F' = f$. Thus, by denoting
$$L_T = \frac{1}{T}\int_0^T \delta_{S_t}\,dt,$$
the occupation measure of $S$, we see that the probability in (4.39) may be written as
$$P[Z_T \ge e^{-\gamma T}] = P\Big[\frac{1}{T}\big(F(S_T) - F(S_0)\big) - \frac{1}{2}\int h(x)\,dL_T \le \gamma\Big],$$
where we set $h(x) = f'(x)\sigma^2(x) + \varphi^2(x)$. Hence, we would obtain an exponential decay estimate for (4.39) once we have a large deviation principle for
$$\frac{1}{T}\big(F(S_T) - F(S_0)\big) - \frac{1}{2}\int h(x)\,dL_T.$$
Such LDP with explicit rate function is proved for the geometric Ornstein-Uhlenbeck process based on the result in [21], and could be extended for affine stochastic volatility (SV) models by relying on recent results in this literature, see e.g. [39]. It is also used for large time to maturity asymptotics in option pricing and implied volatility for affine SV models.
4.3 A large deviations approach to optimal long term investment
4.3.1 An asymptotic outperforming benchmark criterion
A popular approach for institutional managers is to assess the performance of their portfolio relative to the achievement of a given benchmark. This means that investors are interested in maximizing the probability that their wealth exceeds a predetermined index. Equivalently, this may be also formulated as the problem of minimizing the probability that the wealth of the investor falls below a specified value. This target problem was studied by several authors for a goal achievement in finite time horizon, see e.g. [7] or [22]. In a static framework, the paper [52] considered an asymptotic version of this outperformance criterion when time horizon goes to infinity, which leads to a large deviations portfolio criterion. We now develop an asymptotic dynamic version of the outperformance management criterion due to [47]. Such a problem corresponds to an ergodic objective of beating a given benchmark, and may be of particular interest for institutional managers with long term horizon, like mutual funds. On the other hand, stationary long term horizon problems are expected to be more tractable than finite horizon problems, and should provide some good insight for management problems with long, but finite, time horizon.
The general financial framework is the following. Let $X^\pi$ be the portfolio wealth process with a proportion $\pi \in \mathcal{A}$ invested in stock. It grows typically in time at an exponential rate, and thus, the relevant quantity over a long term horizon $T$ is the logarithm of the wealth:
$$\bar X_T^\pi = \frac{1}{T}\ln X_T^\pi,$$
which is expected to converge when $T$ goes to infinity. Given a threshold $x$, the outperformance probability is $P[\bar X_T^\pi \ge x]$, which should decay exponentially fast as time horizon goes to infinity:
$$P[\bar X_T^\pi \ge x] \approx \exp(-I(x,\pi)T), \quad \text{as } T \to \infty.$$
Therefore, the lower is the decay rate $I(x,\pi)$, the more chance there is of realizing an index outperformance. The asymptotic version of the outperforming benchmark criterion is then formulated as
$$v(x) = \sup_{\pi\in\mathcal{A}}\limsup_{T\to\infty}\frac{1}{T}\ln P[\bar X_T^\pi \ge x], \quad x \in \mathbb{R}. \tag{4.40}$$
This is a nonstandard large deviations control problem for which there is a priori no direct dynamic programming, and we shall use a dual approach for solving this control problem.
4.3.2 Duality to the large deviations control problem
We adopt a duality approach based on the relation between the rate function of a LDP and the cumulant generating function. The formal derivation is the following. Given a portfolio policy $\pi \in \mathcal{A}$, if there is a LDP for $\bar X_T^\pi$, its rate function $I(\cdot,\pi)$ should be related by the Donsker-Varadhan formula:
$$I(x,\pi) = \sup_{\theta}\,[\theta x - \Gamma(\theta,\pi)],$$
to the limiting log-moment generating function
$$\Gamma(\theta,\pi) = \limsup_{T\to\infty}\frac{1}{T}\ln E\big[e^{\theta T\bar X_T^\pi}\big]. \tag{4.41}$$
In this case, we would get
$$v(x) = \sup_{\pi\in\mathcal{A}}\limsup_{T\to\infty}\frac{1}{T}\ln P[\bar X_T^\pi \ge x] = -\inf_{\pi\in\mathcal{A}} I(x,\pi) = -\inf_{\pi\in\mathcal{A}}\sup_{\theta\ge0}\,[\theta x - \Gamma(\theta,\pi)],$$
and so, provided that one could interchange infimum and supremum in the above relation (actually, the minmax theorem does not apply directly since $\mathcal{A}$ is not necessarily compact and $\pi \mapsto \theta x - \Gamma(\theta,\pi)$ is not convex):
$$v(x) = -\sup_{\theta\ge0}\,[\theta x - \Gamma(\theta)], \tag{4.42}$$
where
$$\Gamma(\theta) = \sup_{\pi\in\mathcal{A}}\Gamma(\theta,\pi) = \sup_{\pi\in\mathcal{A}}\limsup_{T\to\infty}\frac{1}{T}\ln E\big[e^{\theta T\bar X_T^\pi}\big]. \tag{4.43}$$
Problem (4.43) is the dual problem via (4.42) to the original problem (4.40). We shall see in the next section that (4.43) can be reformulated via suitable change of probability measures as a risk-sensitive ergodic control problem, which is more tractable than (4.40) and is studied by dynamic programming methods leading in some cases to explicit calculations.
The above formal derivation suggests the following dual procedure for solving problem (4.40):
- Solve the dual risk-sensitive control problem $\Gamma(\theta)$, and find the associated optimal control $\hat\pi(\theta)$.
- The solution to the primal large deviations portfolio selection problem is then given by
$$v(x) = -\sup_{\theta\ge0}\,[\theta x - \Gamma(\theta)], \tag{4.44}$$
with an optimal control $\pi^*(x)$ determined by
$$\pi^*(x) = \hat\pi(\theta(x)),$$
where $\theta(x)$ attains the supremum in (4.44).
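The rigorous derivation of this duality relation is stated in the following theorem, which may be viewed as an extension of the Gärtner-Ellis theorem with control components.
Theorem 4.4 Suppose that there exists $\bar\theta \in (0,\infty]$ such that for all $\theta \in [0,\bar\theta)$, there exists a solution $\hat\pi(\theta) \in \mathcal{A}$ to the dual problem $\Gamma(\theta)$, with a limit in (4.41), i.e.
$$\Gamma(\theta) = \lim_{T\to\infty}\frac{1}{T}\ln E\Big[\exp\Big(\theta T\bar X_T^{\hat\pi(\theta)}\Big)\Big]. \tag{4.45}$$
Suppose also that $\Gamma(\theta)$ is continuously differentiable on $[0,\bar\theta)$. Then for all $x < \Gamma'(\bar\theta) := \lim_{\theta\nearrow\bar\theta}\Gamma'(\theta)$, we get
$$v(x) = -\sup_{\theta\in[0,\bar\theta)}\,[\theta x - \Gamma(\theta)]. \tag{4.46}$$
Moreover, the sequence of controls
$$\pi_t^{*,n} = \begin{cases} \hat\pi_t\big(\theta(x + \frac{1}{n})\big), & \Gamma'(0) < x < \Gamma'(\bar\theta), \\ \hat\pi_t\big(\theta(\Gamma'(0) + \frac{1}{n})\big), & x \le \Gamma'(0), \end{cases}$$
with $\theta(x) \in (0,\bar\theta)$ s.t. $\Gamma'(\theta(x)) = x \in (\Gamma'(0),\Gamma'(\bar\theta))$, is nearly optimal in the sense that
$$\lim_{n\to\infty}\limsup_{T\to\infty}\frac{1}{T}\ln P\Big[\bar X_T^{\pi^{*,n}} \ge x\Big] = v(x).$$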
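Proof.
Step 1. Let us consider the Fenchel-Legendre transform of the convex function $\Gamma$ on $[0, \bar{\theta})$:
$$\Gamma^{*}(x) = \sup_{\theta \in [0, \bar{\theta})} \big[\theta x - \Gamma(\theta)\big], \quad x \in \mathbb{R}. \tag{4.47}$$
Since $\Gamma$ is $C^1$ on $[0, \bar{\theta})$, it is well-known (see e.g. Lemma 2.3.9 in [10]) that the function $\Gamma^{*}$ is convex, nondecreasing and satisfies:
$$\Gamma^{*}(x) = \begin{cases} \theta(x) x - \Gamma(\theta(x)), & \text{if } \Gamma'(0) < x < \Gamma'(\bar{\theta}), \\ 0, & \text{if } x \leq \Gamma'(0), \end{cases} \tag{4.48}$$
$$\theta(x) x - \Gamma^{*}(x) > \theta(x) x' - \Gamma^{*}(x'), \quad \forall\, \Gamma'(0) < x < \Gamma'(\bar{\theta}), \ \forall\, x' \neq x, \tag{4.49}$$
where $\theta(x) \in (0, \bar{\theta})$ is s.t. $\Gamma'(\theta(x)) = x \in (\Gamma'(0), \Gamma'(\bar{\theta}))$. Moreover, $\Gamma^{*}$ is continuous on $(-\infty, \Gamma'(\bar{\theta}))$.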
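Step 2: Upper bound. For all $x \in \mathbb{R}$, $\alpha \in \mathcal{A}$, an application of Chebycheff's inequality yields:
$$P\big[\bar{X}_T^{\alpha} \geq x\big] \leq \exp(-\theta x T)\, E\big[\exp(\theta T \bar{X}_T^{\alpha})\big], \quad \forall\, \theta \in [0, \bar{\theta}),$$
and so
$$\limsup_{T \to \infty} \frac{1}{T} \ln P\big[\bar{X}_T^{\alpha} \geq x\big] \leq -\theta x + \limsup_{T \to \infty} \frac{1}{T} \ln E\big[\exp(\theta T \bar{X}_T^{\alpha})\big], \quad \forall\, \theta \in [0, \bar{\theta}).$$
By definitions of $\Gamma$ and $\Gamma^{*}$, we deduce:
$$\sup_{\alpha \in \mathcal{A}} \limsup_{T \to \infty} \frac{1}{T} \ln P\big[\bar{X}_T^{\alpha} \geq x\big] \leq -\Gamma^{*}(x). \tag{4.50}$$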
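Step 3: Lower bound. Given $x < \Gamma'(\bar{\theta})$, let us define the probability measure $Q_T^n$ on $(\Omega, \mathcal{F}_T)$ via:
$$\frac{dQ_T^n}{dP} = \exp\Big[\theta(x_n) T \bar{X}_T^{\alpha^{*,n}} - \Gamma_T\big(\theta(x_n), \alpha^{*,n}\big)\Big], \tag{4.51}$$
where $x_n = x + 1/n$ if $x > \Gamma'(0)$, $x_n = \Gamma'(0) + 1/n$ otherwise, $\alpha^{*,n} = \hat{\alpha}(\theta(x_n))$, and
$$\Gamma_T(\theta, \alpha) = \ln E\big[\exp(\theta T \bar{X}_T^{\alpha})\big], \quad \theta \in [0, \bar{\theta}), \ \alpha \in \mathcal{A}.$$
Here $n$ is large enough so that $x + 1/n < \Gamma'(\bar{\theta})$. We now take $\varepsilon > 0$ small enough so that $x \leq x_n - \varepsilon$ and $x_n + \varepsilon < \Gamma'(\bar{\theta})$. We then have:
$$\begin{aligned}
\frac{1}{T} \ln P\big[\bar{X}_T^{\alpha^{*,n}} \geq x\big]
&\geq \frac{1}{T} \ln P\big[x_n - \varepsilon < \bar{X}_T^{\alpha^{*,n}} < x_n + \varepsilon\big] \\
&= \frac{1}{T} \ln \int \frac{dP}{dQ_T^n}\, 1_{\{x_n - \varepsilon < \bar{X}_T^{\alpha^{*,n}} < x_n + \varepsilon\}}\, dQ_T^n \\
&\geq -\theta(x_n)(x_n + \varepsilon) + \frac{1}{T} \Gamma_T\big(\theta(x_n), \alpha^{*,n}\big) + \frac{1}{T} \ln Q_T^n\big[x_n - \varepsilon < \bar{X}_T^{\alpha^{*,n}} < x_n + \varepsilon\big],
\end{aligned}$$
where we use (4.51) in the last inequality. By definition of the dual problem, this yields:
$$\begin{aligned}
\liminf_{T \to \infty} \frac{1}{T} \ln P\big[\bar{X}_T^{\alpha^{*,n}} \geq x\big]
&\geq -\theta(x_n)(x_n + \varepsilon) + \Gamma(\theta(x_n)) + \liminf_{T \to \infty} \frac{1}{T} \ln Q_T^n\big[x_n - \varepsilon < \bar{X}_T^{\alpha^{*,n}} < x_n + \varepsilon\big] \\
&\geq -\Gamma^{*}(x_n) - \varepsilon \theta(x_n) + \liminf_{T \to \infty} \frac{1}{T} \ln Q_T^n\big[x_n - \varepsilon < \bar{X}_T^{\alpha^{*,n}} < x_n + \varepsilon\big],
\end{aligned} \tag{4.52}$$
where the second inequality follows by the definition of $\Gamma^{*}$ (and actually holds with equality due to (4.48)). We now show that:
$$\liminf_{T \to \infty} \frac{1}{T} \ln Q_T^n\big[x_n - \varepsilon < \bar{X}_T^{\alpha^{*,n}} < x_n + \varepsilon\big] = 0. \tag{4.53}$$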
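Denote by $\tilde{\Gamma}_T^n$ the c.g.f. under $Q_T^n$ of $T \bar{X}_T^{\alpha^{*,n}}$. For all $\zeta \in \mathbb{R}$, we have by (4.51):
$$\tilde{\Gamma}_T^n(\zeta) := \ln E^{Q_T^n}\big[\exp(\zeta T \bar{X}_T^{\alpha^{*,n}})\big] = \Gamma_T\big(\theta(x_n) + \zeta, \alpha^{*,n}\big) - \Gamma_T\big(\theta(x_n), \alpha^{*,n}\big).$$
Therefore, by definition of the dual problem and (4.45), we have for all $\zeta \in [-\theta(x_n), \bar{\theta} - \theta(x_n))$:
$$\limsup_{T \to \infty} \frac{1}{T} \tilde{\Gamma}_T^n(\zeta) \leq \Gamma(\theta(x_n) + \zeta) - \Gamma(\theta(x_n)). \tag{4.54}$$
As in part 1) of this proof, by Chebycheff's inequality, we have for all $\zeta \in [0, \bar{\theta} - \theta(x_n))$:
$$\begin{aligned}
\limsup_{T \to \infty} \frac{1}{T} \ln Q_T^n\big[\bar{X}_T^{\alpha^{*,n}} \geq x_n + \varepsilon\big]
&\leq -\zeta(x_n + \varepsilon) + \limsup_{T \to \infty} \frac{1}{T} \tilde{\Gamma}_T^n(\zeta) \\
&\leq -\zeta(x_n + \varepsilon) + \Gamma(\zeta + \theta(x_n)) - \Gamma(\theta(x_n)),
\end{aligned}$$
where the second inequality follows from (4.54). We deduce
$$\begin{aligned}
\limsup_{T \to \infty} \frac{1}{T} \ln Q_T^n\big[\bar{X}_T^{\alpha^{*,n}} \geq x_n + \varepsilon\big]
&\leq -\sup\big\{\zeta(x_n + \varepsilon) - \Gamma(\zeta) : \zeta \in [\theta(x_n), \bar{\theta})\big\} - \Gamma(\theta(x_n)) + \theta(x_n)(x_n + \varepsilon) \\
&\leq -\Gamma^{*}(x_n + \varepsilon) - \Gamma(\theta(x_n)) + \theta(x_n)(x_n + \varepsilon) \\
&= -\Gamma^{*}(x_n + \varepsilon) + \Gamma^{*}(x_n) + \varepsilon \theta(x_n),
\end{aligned} \tag{4.55}$$
where the second inequality and the last equality follow from (4.48). Similarly, we have for all $\zeta \in [-\theta(x_n), 0]$:
$$\begin{aligned}
\limsup_{T \to \infty} \frac{1}{T} \ln Q_T^n\big[\bar{X}_T^{\alpha^{*,n}} \leq x_n - \varepsilon\big]
&\leq -\zeta(x_n - \varepsilon) + \limsup_{T \to \infty} \frac{1}{T} \tilde{\Gamma}_T^n(\zeta) \\
&\leq -\zeta(x_n - \varepsilon) + \Gamma(\theta(x_n) + \zeta) - \Gamma(\theta(x_n)),
\end{aligned}$$
and so:
$$\begin{aligned}
\limsup_{T \to \infty} \frac{1}{T} \ln Q_T^n\big[\bar{X}_T^{\alpha^{*,n}} \leq x_n - \varepsilon\big]
&\leq -\sup\big\{\zeta(x_n - \varepsilon) - \Gamma(\zeta) : \zeta \in [0, \theta(x_n)]\big\} - \Gamma(\theta(x_n)) + \theta(x_n)(x_n - \varepsilon) \\
&\leq -\Gamma^{*}(x_n - \varepsilon) + \Gamma^{*}(x_n) - \varepsilon \theta(x_n).
\end{aligned} \tag{4.56}$$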
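By (4.55)-(4.56), we then get:
$$\begin{aligned}
&\limsup_{T \to \infty} \frac{1}{T} \ln Q_T^n\Big[\big\{\bar{X}_T^{\alpha^{*,n}} \leq x_n - \varepsilon\big\} \cup \big\{\bar{X}_T^{\alpha^{*,n}} \geq x_n + \varepsilon\big\}\Big] \\
&\quad \leq \max\Big\{\limsup_{T \to \infty} \frac{1}{T} \ln Q_T^n\big[\bar{X}_T^{\alpha^{*,n}} \geq x_n + \varepsilon\big] ;\ \limsup_{T \to \infty} \frac{1}{T} \ln Q_T^n\big[\bar{X}_T^{\alpha^{*,n}} \leq x_n - \varepsilon\big]\Big\} \\
&\quad \leq \max\big\{-\Gamma^{*}(x_n + \varepsilon) + \Gamma^{*}(x_n) + \varepsilon \theta(x_n) ;\ -\Gamma^{*}(x_n - \varepsilon) + \Gamma^{*}(x_n) - \varepsilon \theta(x_n)\big\} \\
&\quad < 0,
\end{aligned}$$
where the strict inequality follows from (4.49). This implies that $Q_T^n\big[\{\bar{X}_T^{\alpha^{*,n}} \leq x_n - \varepsilon\} \cup \{\bar{X}_T^{\alpha^{*,n}} \geq x_n + \varepsilon\}\big] \to 0$ and hence $Q_T^n\big[x_n - \varepsilon < \bar{X}_T^{\alpha^{*,n}} < x_n + \varepsilon\big] \to 1$ as $T$ goes to infinity. In particular (4.53) is satisfied, and by sending $\varepsilon$ to zero in (4.52), we get:
$$\liminf_{T \to \infty} \frac{1}{T} \ln P\big[\bar{X}_T^{\alpha^{*,n}} \geq x\big] \geq -\Gamma^{*}(x_n).$$
By continuity of $\Gamma^{*}$ on $(-\infty, \Gamma'(\bar{\theta}))$, we obtain by sending $n$ to infinity and recalling that $\Gamma^{*}(x) = 0 = \Gamma^{*}(\Gamma'(0))$ for $x \leq \Gamma'(0)$:
$$\liminf_{n \to \infty} \liminf_{T \to \infty} \frac{1}{T} \ln P\big[\bar{X}_T^{\alpha^{*,n}} \geq x\big] \geq -\Gamma^{*}(x).$$
This last inequality combined with (4.50) ends the proof.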
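The change of measure used in the lower bound can be illustrated on a toy case that is an assumption of this sketch and not the controlled setting of the theorem: a fixed control for which $\bar{X}_T \sim N(m, s^2/T)$ under $P$. Tilting by $\theta = (x_n - m)/s^2$ then recentres the growth rate at $x_n$, i.e. $\bar{X}_T \sim N(x_n, s^2/T)$ under the tilted measure. The helper name and the numbers below are hypothetical.

```python
import math

def window_prob(center, half_width, mean, s, T):
    """P[center - half_width < Z < center + half_width] for Z ~ N(mean, s^2 / T)."""
    scale = math.sqrt(T) / (s * math.sqrt(2.0))
    lower = (center - half_width - mean) * scale
    upper = (center + half_width - mean) * scale
    # Difference of two upper tails; erfc keeps accuracy far in the tail.
    return 0.5 * (math.erfc(lower) - math.erfc(upper))

m, s = 0.08, 0.2        # hypothetical mean and volatility of the growth rate under P
x_n, eps = 0.13, 0.01   # target level and window half-width

for T in (100, 1000, 5000):
    under_p = window_prob(x_n, eps, m, s, T)     # exponentially small in T
    under_q = window_prob(x_n, eps, x_n, s, T)   # tends to one in T
    print(T, under_p, under_q)
```

Under the original measure the window around $x_n$ is a rare event, whereas under the tilted measure its probability tends to one, which is the mechanism behind (4.53).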
Remark 4.5 Connection with classical portfolio selection and risk aversion
Since $X_T^{\theta} = e^{\theta T \bar{X}_T}$, we see that the value function of the dual problem can then be written as:
$$\Gamma(\theta) = \lim_{T \to \infty} \frac{1}{T} \ln E\Big[U_{\theta}\big(X_T^{\hat{\alpha}(\theta)}\big)\Big],$$
where $U_{\theta}(x) = x^{\theta}$ is a power utility function with Constant Relative Risk Aversion (CRRA) $1 - \theta > 0$ provided that $\theta < 1$. Then, Theorem 4.4 means that for any target level $x$, the optimal overperformance probability of growth rate is (approximately) directly related, for large $T$, to the expected CRRA utility of wealth, by:
$$P\big[\bar{X}_T^{\alpha^{*}} \geq x\big] \approx E\Big[U_{\theta(x)}\big(X_T^{\alpha^{*}}\big)\Big]\, e^{-\theta(x) x T}, \tag{4.57}$$
with the convention that $\theta(x) = 0$ for $x \leq \Gamma'(0)$. Hence, $1 - \theta(x)$ can be interpreted as a constant degree of relative risk aversion for an investor who has an overperformance target level $x$. Moreover, by strict convexity of the function $\Gamma$ in (4.47), it is clear that $\theta(x)$ is strictly increasing for $x > \Gamma'(0)$. So an investor with a higher target level $x$ has a lower degree of relative risk aversion $1 - \theta(x)$. In summary, Theorem 4.4 (or relation (4.57)) inversely relates the target level of growth rate to the degree of relative risk aversion in expected utility theory.
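To make the inverse relation between target level and risk aversion concrete, the sketch below solves $\Gamma'(\theta(x)) = x$ by bisection for the Black-Scholes dual function $\Gamma(\theta) = \bar{x}\,\theta/(1 - \theta)$, $\bar{x} = \frac{1}{2}\big(\frac{\mu - r}{\sigma}\big)^2$, of Example 1 in Section 4.3.3, and reports the implied relative risk aversion $1 - \theta(x)$. The function names, the bisection routine and the numerical value of $\bar{x}$ are illustrative assumptions.

```python
import math

def gamma_prime(theta, xbar):
    """Derivative of Gamma(theta) = xbar * theta / (1 - theta) on [0, 1)."""
    return xbar / (1.0 - theta) ** 2

def theta_of_x(x, xbar, tol=1e-12):
    """Solve Gamma'(theta) = x on (0, 1) by bisection; theta = 0 when x <= Gamma'(0)."""
    if x <= xbar:
        return 0.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gamma_prime(mid, xbar) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

xbar = 0.045  # hypothetical value of 0.5 * ((mu - r) / sigma) ** 2
for x in (0.03, 0.05, 0.10, 0.50):
    theta = theta_of_x(x, xbar)
    print(x, theta, 1.0 - theta)  # target level, theta(x), implied risk aversion
```

For targets above $\bar{x}$ the bisection agrees with the closed form $\theta(x) = 1 - \sqrt{\bar{x}/x}$, and the implied risk aversion $1 - \theta(x)$ decreases as the target $x$ increases.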
4.3.3 Application to factor models
Let us consider a factor model for the savings account $S^0$, and the stock price $S$ in the form:
$$\frac{dS_t^0}{S_t^0} = r(Y_t)\,dt, \qquad \frac{dS_t}{S_t} = \mu(Y_t)\,dt + \sigma(Y_t)\,dW_t,$$
with a factor $Y$ governed by an Ornstein-Uhlenbeck ergodic process:
$$dY_t = k(\bar{y} - Y_t)\,dt + \varsigma\, dB_t, \qquad d\langle B, W\rangle_t = \rho\, dt,$$
with $k > 0$, $\varsigma > 0$, and $B$, $W$ are two correlated Brownian motions on a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F} = (\mathcal{F}_t)_{t \geq 0}, P)$. The coefficients $r(y)$, $\mu(y)$, $\sigma(y) > 0$ are measurable functions on $\mathbb{R}$. The dynamics of the wealth process $X = X^{\alpha}$ controlled by the proportion $\alpha$ invested in stock is governed by
$$dX_t = X_t \Big[\big(r(Y_t) + (\mu - r)(Y_t)\,\alpha_t\big)\,dt + \alpha_t\, \sigma(Y_t)\,dW_t\Big]. \tag{4.58}$$
Here, $\alpha$ is an $\mathbb{F}$-adapted process valued in $A \subset \mathbb{R}$, and we denote by $\mathcal{A}$ the set of controlled processes s.t. the equation (4.58) is well-defined.
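We now show that the dual control problem (4.43) may be transformed via a change of probability measure into a risk-sensitive control problem. From the dynamics of the wealth process, we may rewrite the moment generating function of $\bar{X}_T = \frac{1}{T} \ln X_T$ as:
$$J_T(\theta, \alpha) := E\big[\exp(\theta T \bar{X}_T^{\alpha})\big] = X_0^{\theta}\, E\Big[\xi_T^{\alpha}(\theta) \exp\Big(\theta \int_0^T \ell(\theta, Y_t, \alpha_t)\,dt\Big)\Big],$$
where
$$\ell(\theta, y, a) = r(y) + a\,(\mu - r)(y) - \frac{1 - \theta}{2}\,\big(a\,\sigma(y)\big)^2,$$
and $\xi^{\alpha}(\theta)$ is the Doléans-Dade exponential local martingale
$$\xi_t^{\alpha}(\theta) = \exp\Big(\theta \int_0^t \sigma(Y_u)\,\alpha_u\, dW_u - \frac{\theta^2}{2} \int_0^t |\sigma(Y_u)\,\alpha_u|^2\, du\Big), \quad t \geq 0. \tag{4.59}$$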
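The factorisation of $X_T^{\theta}$ into $X_0^{\theta}\,\xi_T^{\alpha}(\theta) \exp\big(\theta \int_0^T \ell\, dt\big)$ is a pathwise identity, and it also holds exactly for a log-Euler discretisation. The minimal sketch below checks this on one simulated path. The coefficient functions, the feedback control, the factor parameters and the use of an independent Brownian motion for the factor are all hypothetical choices made only for this illustration.

```python
import math
import random

# Hypothetical coefficient functions of the factor y (not taken from the notes).
def r(y): return 0.02 + 0.01 * y
def mu(y): return 0.06 + 0.02 * y
def sigma(y): return 0.2

def ell(theta, y, a):
    """Running cost l(theta, y, a) = r + a (mu - r) - (1 - theta)/2 (a sigma)^2."""
    return r(y) + a * (mu(y) - r(y)) - 0.5 * (1.0 - theta) * (a * sigma(y)) ** 2

def wealth_power_identity(theta, T=1.0, n=1000, x0=1.5, seed=0):
    """Return (X_T^theta, X_0^theta * xi_T * exp(theta * int l dt)) on one Euler path."""
    rng = random.Random(seed)
    dt = T / n
    y, log_x, log_xi, int_ell = 0.0, math.log(x0), 0.0, 0.0
    for _ in range(n):
        a = 0.5 + 0.1 * math.tanh(y)        # hypothetical feedback control
        dW = rng.gauss(0.0, math.sqrt(dt))
        dB = rng.gauss(0.0, math.sqrt(dt))  # independent of W here, for simplicity
        vol = a * sigma(y)
        log_x += (r(y) + a * (mu(y) - r(y)) - 0.5 * vol ** 2) * dt + vol * dW
        log_xi += theta * vol * dW - 0.5 * theta ** 2 * vol ** 2 * dt
        int_ell += ell(theta, y, a) * dt
        y += 1.0 * (0.0 - y) * dt + 0.3 * dB  # Ornstein-Uhlenbeck step for the factor
    lhs = math.exp(theta * log_x)
    rhs = x0 ** theta * math.exp(log_xi + theta * int_ell)
    return lhs, rhs

print(wealth_power_identity(theta=0.5))
```

The two returned numbers coincide up to floating-point rounding for any seed and any $\theta$, because the $-\frac{\theta^2}{2}(a\sigma)^2$ term of the exponential martingale and the $-\frac{1-\theta}{2}(a\sigma)^2$ term of $\ell$ recombine into the Itô correction $-\frac{1}{2}(a\sigma)^2$ of $\ln X$.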
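If $\xi^{\alpha}(\theta)$ is a true martingale, it defines a probability measure $Q = Q^{\alpha}(\theta)$ under which, by Girsanov's theorem, the dynamics of $Y$ is given by:
$$dY_t = \big[k(\bar{y} - Y_t) + \rho\,\varsigma\,\theta\,\sigma(Y_t)\,\alpha_t\big]\,dt + \varsigma\, dW_t^{Q},$$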
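where $W^{Q}$ is a $Q$-Brownian motion. Hence, the dual problem may be written as a stochastic control problem with exponential integral cost criterion:
$$\Gamma(\theta) := \sup_{\alpha \in \mathcal{A}} \limsup_{T \to \infty} \frac{1}{T} \ln J_T(\theta, \alpha) = \sup_{\alpha \in \mathcal{A}} \limsup_{T \to \infty} \frac{1}{T} \ln E^{Q}\Big[\exp\Big(\theta \int_0^T \ell(\theta, Y_t, \alpha_t)\,dt\Big)\Big], \quad \theta \geq 0. \tag{4.60}$$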
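For fixed $\theta$, this is an ergodic risk-sensitive control problem which has been studied by several authors, see e.g. [19], [5] or [51] in a discrete-time setting. It admits a dynamic programming equation:
$$\Lambda(\theta) = \frac{1}{2}\varsigma^2 \phi_{\theta}''(y) + \frac{1}{2}\varsigma^2 |\phi_{\theta}'(y)|^2 + k(\bar{y} - y)\,\phi_{\theta}'(y) + \theta\, r(y) + \max_{a \in A}\Big[\theta\, a\, \sigma(y)\Big(\rho\,\varsigma\,\phi_{\theta}'(y) + \frac{(\mu - r)(y)}{\sigma(y)}\Big) - \frac{\theta(1 - \theta)}{2}\,\big(a\,\sigma(y)\big)^2\Big]. \tag{4.61}$$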
The unknown is the pair $(\Lambda(\theta), \phi_{\theta}) \in \mathbb{R} \times C^2(\mathbb{R})$, and $\Lambda(\theta)$ is a candidate for $\Gamma(\theta)$. The above P.D.E. is formally derived by considering the finite horizon problem
$$u_{\theta}(T, y) = \sup_{\alpha \in \mathcal{A}} E_y^{Q}\Big[\exp\Big(\theta \int_0^T \ell(\theta, Y_t, \alpha_t)\,dt\Big)\Big],$$
by writing the Bellman equation for this classical control problem and by making the logarithm transformation
$$\ln u_{\theta}(T, y) \simeq \Lambda(\theta)\, T + \phi_{\theta}(y),$$
for large $T$. One can prove rigorously that a pair solution $(\Lambda(\theta), \phi_{\theta})$ to the PDE (4.61) provides a solution $\Lambda(\theta) = \Gamma(\theta)$ to the dual problem (4.43), with an optimal control given by the argument max in (4.61). This is called a verification theorem in stochastic control theory. Actually, there may be multiple solutions to (4.61) (even up to a constant), and we need some ergodicity condition to select the one that satisfies the verification theorem. We refer to [47] for the details, and we illustrate our purpose in the case where $A = \mathbb{R}$, the coefficients $r(y)$, $\mu(y)$ are linear in $y$, and $\sigma$ is constant. This includes Black-Scholes, Platen-Rebolledo or Vasicek models. In this case, we are looking for a quadratic solution to (4.61):
$$\phi_{\theta}(y) = \frac{1}{2} A(\theta)\, y^2 + B(\theta)\, y,$$
with a candidate optimal Markov control
$$\hat{\alpha}(\theta, y) = \frac{\rho\,\varsigma\,\phi_{\theta}'(y)}{\sigma(y)(1 - \theta)} + \frac{(\mu - r)(y)}{\sigma(y)^2 (1 - \theta)}.$$
By substituting into (4.61), and cancelling terms in $y^2$, $y$ and constant terms, we obtain
• a polynomial second degree equation for $A(\theta)$,
• a linear equation for $B(\theta)$, given $A(\theta)$,
• $\Lambda(\theta)$ is then expressed explicitly as a function of $A(\theta)$ and $B(\theta)$ from (4.61).
The existence of a solution to the second degree equation for $A(\theta)$, through the nonnegativity of the discriminant, allows us to determine the bound $\bar{\theta}$ and so the interval $[0, \bar{\theta})$ on which $\Gamma$ is well-defined and finite. Moreover, we find two possible roots to the polynomial second degree equation for $A(\theta)$, but only one satisfies the ergodicity condition. From Theorem 4.4, we deduce that
$$v(x) = -\sup_{\theta \in [0, \bar{\theta})} \big[\theta x - \Gamma(\theta)\big], \quad \forall\, x < \Gamma'(\bar{\theta}), \tag{4.62}$$
with a sequence of nearly optimal controls given by:
$$\alpha_t^{*,n} = \begin{cases} \hat{\alpha}\big(\theta(x + \tfrac{1}{n}), Y_t\big), & \Gamma'(0) < x < \Gamma'(\bar{\theta}), \\ \hat{\alpha}\big(\theta(\Gamma'(0) + \tfrac{1}{n}), Y_t\big), & x \leq \Gamma'(0), \end{cases}$$
with $\theta(x) \in (0, \bar{\theta})$ s.t. $\Gamma'(\theta(x)) = x$. In the one-factor model described above with linear functions $r$, $\mu$, and constant $\sigma$, the function $\Gamma$ is steep, i.e. $\Gamma'(\bar{\theta}) = \infty$, and so (4.62) holds for all $x \in \mathbb{R}$.
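Example 1: Black-Scholes model
$$dS_t = S_t\,(\mu\, dt + \sigma\, dW_t),$$
with constants $\mu$, $\sigma > 0$, and a constant interest rate $r$. Then, the solution to the dual problem is
$$\Gamma(\theta) = \frac{1}{2}\, \frac{\theta}{1 - \theta} \Big(\frac{\mu - r}{\sigma}\Big)^2, \quad \text{for } \theta < \bar{\theta} = 1,$$
and the solution to the large deviations control problem is:
$$v(x) = -\sup_{\theta \in [0, 1)} \big[\theta x - \Gamma(\theta)\big] = \begin{cases} -\big(\sqrt{x} - \sqrt{\bar{x}}\big)^2, & \text{if } x \geq \bar{x} := \Gamma'(0) = \frac{1}{2}\big(\frac{\mu - r}{\sigma}\big)^2, \\ 0, & \text{if } x < \bar{x}, \end{cases}$$
with $\theta(x) = 1 - \sqrt{\bar{x}/x}$ if $x \geq \bar{x}$, and 0 otherwise, and
$$\alpha_t^{*} = \begin{cases} \dfrac{\sqrt{2x}}{\sigma}, & \text{if } x \geq \bar{x}, \\[2mm] \dfrac{\mu - r}{\sigma^2}, & \text{if } x < \bar{x}. \end{cases}$$
We observe that for an index value $x$ small enough, actually $x < \bar{x}$, the optimal investment for our large deviations criterion is equal to the optimal investment of Merton's problem for an investor with relative risk aversion one. When the index value is larger than $\bar{x}$, the optimal investment is increasing with $x$, with a degree of relative risk aversion $1 - \theta(x)$ decreasing in $x$.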
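The closed-form expressions of Example 1 can be cross-checked numerically. The sketch below, with hypothetical function names and parameter values, evaluates the Legendre transform by brute force on a grid of $\theta \in [0, 1)$ and compares it with $-(\sqrt{x} - \sqrt{\bar{x}})^2$. It also evaluates the optimal proportion on both sides of $\bar{x}$.

```python
import math

def gamma_bs(theta, xbar):
    """Dual value function Gamma(theta) = xbar * theta / (1 - theta), theta in [0, 1)."""
    return xbar * theta / (1.0 - theta)

def v_closed(x, xbar):
    """Closed-form rate v(x) of Example 1."""
    return -(math.sqrt(x) - math.sqrt(xbar)) ** 2 if x >= xbar else 0.0

def v_grid(x, xbar, n=20000):
    """-sup over a uniform grid of theta in [0, 1) of [theta x - Gamma(theta)]."""
    return -max((i / n) * x - gamma_bs(i / n, xbar) for i in range(n))

def alpha_star(x, mu, r, sigma):
    """Optimal constant proportion of Example 1 for the target growth rate x."""
    xbar = 0.5 * ((mu - r) / sigma) ** 2
    return math.sqrt(2.0 * x) / sigma if x >= xbar else (mu - r) / sigma ** 2

mu, r, sigma = 0.08, 0.02, 0.2          # hypothetical market parameters
xbar = 0.5 * ((mu - r) / sigma) ** 2    # threshold Gamma'(0)
for x in (0.02, xbar, 0.10, 0.50):
    print(x, v_closed(x, xbar), v_grid(x, xbar), alpha_star(x, mu, r, sigma))
```

The grid value matches the closed form up to discretisation error, and the proportion $\sqrt{2x}/\sigma$ joins continuously with the Merton proportion $(\mu - r)/\sigma^2$ at $x = \bar{x}$.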
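Example 2: Platen-Rebolledo model
$$S_t = \exp(Y_t), \qquad dY_t = k(\bar{y} - Y_t)\,dt + \sigma\, dW_t.$$
The solution to the dual problem is
$$\Gamma(\theta) = \frac{k}{2}\Big(1 - \sqrt{1 - \theta}\Big) + \frac{\theta}{2}\Big(\frac{k\bar{y} - r + \frac{1}{2}\sigma^2}{\sigma}\Big)^2, \quad \theta \in [0, 1).$$
The solution to the (primal) large deviations problem is
$$v(x) = \begin{cases} -\dfrac{(x - \bar{x})^2}{x - \bar{x} + \frac{k}{4}}, & \text{if } x \geq \bar{x} := \dfrac{k}{4} + \dfrac{1}{2}\Big(\dfrac{k\bar{y} - r + \frac{1}{2}\sigma^2}{\sigma}\Big)^2, \\[3mm] 0, & \text{if } x < \bar{x}, \end{cases}$$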
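and the optimal portfolio proportion is:
$$\alpha_t^{*}(x) = \begin{cases} -\big[4(x - \bar{x}) + k\big]\dfrac{Y_t}{\sigma^2} + \dfrac{k\bar{y} - r + \frac{1}{2}\sigma^2}{\sigma^2}, & \text{if } x \geq \bar{x}, \\[3mm] -k\,\dfrac{Y_t}{\sigma^2} + \dfrac{k\bar{y} - r + \frac{1}{2}\sigma^2}{\sigma^2}, & \text{if } x < \bar{x}. \end{cases}$$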
Some variants and extensions in finance of this large deviations control problem are studied
in [35], [34] and recently for robust utility maximization in [41].
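As a consistency check between the dual and primal formulas of Example 2, the following sketch computes the Legendre transform of $\Gamma(\theta) = \frac{k}{2}(1 - \sqrt{1 - \theta}) + \frac{\theta}{2} c^2$ both on a grid and through the first-order condition $\Gamma'(\theta) = x$, and compares with $-(x - \bar{x})^2 / (x - \bar{x} + k/4)$. Here $c$ is shorthand, introduced only for this illustration, for the constant $(k\bar{y} - r + \frac{1}{2}\sigma^2)/\sigma$. The function names and numerical values are hypothetical.

```python
import math

def gamma_pr(theta, k, c):
    """Gamma(theta) = (k/2)(1 - sqrt(1 - theta)) + (theta/2) c^2 on [0, 1)."""
    return 0.5 * k * (1.0 - math.sqrt(1.0 - theta)) + 0.5 * theta * c ** 2

def v_pr(x, k, c):
    """Closed-form rate: -(x - xbar)^2 / (x - xbar + k/4) for x >= xbar, else 0."""
    xbar = 0.25 * k + 0.5 * c ** 2
    return -(x - xbar) ** 2 / (x - xbar + 0.25 * k) if x >= xbar else 0.0

def theta_pr(x, k, c):
    """Solution of Gamma'(theta) = x, i.e. k / (4 sqrt(1 - theta)) + c^2 / 2 = x."""
    xbar = 0.25 * k + 0.5 * c ** 2
    if x < xbar:
        return 0.0
    m = 1.0 + 4.0 * (x - xbar) / k   # m = 1 / sqrt(1 - theta)
    return 1.0 - 1.0 / m ** 2

def v_pr_grid(x, k, c, n=20000):
    """-sup over a uniform grid of theta in [0, 1) of [theta x - Gamma(theta)]."""
    return -max((i / n) * x - gamma_pr(i / n, k, c) for i in range(n))

k, c = 1.0, 0.4   # hypothetical mean-reversion speed and constant c
for x in (0.2, 0.4, 0.6, 1.0):
    th = theta_pr(x, k, c)
    print(x, v_pr(x, k, c), v_pr_grid(x, k, c), -(th * x - gamma_pr(th, k, c)))
```

The three computed values agree for each target level, which supports reading the primal rate as the Legendre transform of the stated dual function.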
References
[1] Andersen L. and V. Piterbarg (2007): Moment explosions in stochastic volatility models,
Finance and Stochastics, 11, 29-50.
[2] Avellaneda M., Boyer-Olson D., Busca J. and P. Friz (2003): Méthodes de grandes déviations et pricing d'options sur indice, C.R. Acad. Sci. Paris, 336, 263-266.
[3] Barles G. (1994): Solutions de viscosité des équations d'Hamilton-Jacobi, Springer Verlag.
[4] Berestycki H., Busca J. and I. Florent (2004): Computing the implied volatility in stochastic
volatility models, Communications on Pure and Applied Mathematics, vol LVII, 1352-1373.
[5] Bielecki T. and S. Pliska (2004): Risk-sensitive ICAPM with application to fixed-income management, IEEE Transactions on Automatic Control, 49, 420-432.
[6] Bourgade P. and O. Croissant (2005): Heat kernel expansion for a family of stochastic volatility
models: -geometry, available at http://arxiv.org/pdf/cs/0511024
[7] Browne S. (1999): Beating a moving target: optimal portfolio strategies for outperforming a
stochastic benchmark, Finance and Stochastics, 3, 275-294.
[8] Delbaen F. and W. Schachermayer (2006): The mathematics of arbitrage, Springer Finance.
[9] Dembo A., Deuschel J.D. and D. Duffie (2004): Large portfolio losses, Finance and Stochastics,
8, 3-16.
[10] Dembo A. and O. Zeitouni (1998): Large deviations techniques and applications, 2nd edition,
Springer Verlag.
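[11] Donsker, M. D. and Varadhan, S. R. S. (1975): Asymptotic evaluation of certain Markov process expectations for large time. I. II, Comm. Pure Appl. Math., 28, 1-47; ibid. 28, 279-301.
[12] Donsker, M. D. and Varadhan, S. R. S. (1976): Asymptotic evaluation of certain Markov process expectations for large time. III, Comm. Pure Appl. Math., 29, (4), 389-461.
[13] Donsker, M. D. and Varadhan, S. R. S. (1983): Asymptotic evaluation of certain Markov process expectations for large time. IV, Comm. Pure Appl. Math., 36, 183-212.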
[14] Duffie D., Filipovic D. and W. Schachermayer (2003): Affine processes and applications in
finance, Annals of Applied Probability, 13, 984-1053.
[15] Dupuis P. and R. Ellis (1997): A weak convergence approach to the theory of large deviations,
Wiley Series in Probability and Statistics.
[16] Durrleman V. (2010): From implied to spot volatilities, Finance and Stochastics, 14, 157-177.
[17] Feng J., Forde M. and J.P. Fouque (2009): Short maturity asymptotics for a fast mean-reverting Heston stochastic volatility model, to appear in SIAM Journal on Financial Mathematics.
[18] Fleming W. and M. James (1992): Asymptotic series and exit time probability, Annals of
Probability, 20, 1369-1384.
[19] Fleming W. and W. McEneaney (1995): Risk sensitive control on an infinite time horizon,
SIAM Journal on Control and Optimization, 33, 1881-1915.
[20] Fleming W. and M. Soner (1994): Controlled Markov processes and viscosity solutions, Springer
Verlag.
[21] Florens D. and H. Pham (1999): Large deviations in estimation of an Ornstein-Uhlenbeck
model, Journal of Applied Probability, 36, 60-77.
[22] Föllmer H. and P. Leukert (1999): Quantile hedging, Finance and Stochastics, 3, 251-273.
[23] Föllmer H. and W. Schachermayer (2008): Asymptotic arbitrage and large deviations, Math. Fin. Econ., 1, 213-249.
[24] Forde M. and A. Jacquier (2009a): Small time asymptotics for implied volatility under the
Heston model, to appear in International Journal of Theoretical and Applied Finance.
[25] Forde M. and A. Jacquier (2009b): Small time asymptotics for implied volatility under a
general local-stochastic volatility model, preprint.
[26] Forde M. and A. Jacquier (2009c): The large-maturity smile for the Heston model, to appear
in Finance and Stochastics.
[27] Fournie E., Lasry J.M. and P.L. Lions (1997): Some nonlinear methods to study far-from-the-money contingent claims, Numerical Methods in Finance, L.C.G. Rogers et D. Talay, eds,
Cambridge University Press.
[28] Fournie E., Lasry J.M. and N. Touzi (1997): Monte Carlo methods for stochastic volatility
models, Numerical Methods in Finance, L.C.G. Rogers et D. Talay, eds, Cambridge University
Press.
[29] Gatheral J., Hsu E., Laurence P., Ouyang C. and T.H. Wang (2010): Asymptotics of implied
volatility in local volatility models, Preprint.
[30] Glasserman P., Heidelberger P. and P. Shahabuddin (1999), Asymptotically optimal importance sampling and stratification for pricing path-dependent options, Mathematical finance, 9,
117-152.
[31] Glasserman P., Kang W. and P. Shahabuddin (2007), Large deviations in multifactor portfolio
credit risk, Mathematical Finance, 17, 345-379.
[32] Glasserman P. and J. Li (2005), Importance sampling for portfolio credit risk, Management
science, 51, 1643-1656.
[33] Guasoni P. and S. Robertson (2008): Optimal importance sampling with explicit formulas in
continuous-time, Finance and Stochastics, 12, 1-19.
[34] Hata H., Nagai H. and S. Sheu (2009): Asymptotics of the probability minimizing a down-side
risk, to appear in Annals of Applied Probability.
[35] Hata H. and J. Sekine (2005): Solving long term investment problems with Cox-Ingersoll-Ross
interest rates, Advances in Mathematical Economics, 8, 231-255.
[36] Henry-Labordère P. (2005): A general asymptotic implied volatility for stochastic volatility
models, working paper.
[37] Hurd T. and A. Kuznetsov (2008): Explicit formulas for Laplace transforms of stochastic
integrals, Markov Processes and Related Fields, 14, 277-290.
[38] Kabanov Y. and D. Kramkov (1998): Asymptotic arbitrage in large financial markets, Finance and Stochastics, 2, 143-172.
[39] Keller-Ressel M. (2009): Moment explosions and long-term behavior of affine stochastic
volatility models, to appear in Mathematical Finance.
[40] Kemna A. and T. Vorst (1990): A pricing method for options based on average values,
Journal of Banking and Finance, 14, 113-129.
[41] Knispel T. (2010): Asymptotics of robust utility maximization, Preprint, University of Leibniz.
[42] Lasry J.M. and P.L. Lions (1995): Grandes deviations pour des processus de diffusion couples
par un processus de sauts, CRAS, t. 321, 849-854.
[43] Laurence P. (2008): Implied volatility, fundamental solutions, asymptotic analysis and symmetry methods, presentation at Linz, Ricam kick-off workshop.
[44] Lee R. (2004): The Moment Formula for Implied Volatility at Extreme Strikes, Mathematical
Finance, 14, 469-480.
[45] Leijdekker, Mandjes M. and P. Spreij (2009): Sample path large deviations in credit risk,
preprint available at http://arxiv.org/pdf/0909.5610v1
[46] Lewis A. (2007): Geometries and smile asymptotics for a class of stochastic volatility models,
www.optioncity.net
[47] Pham H. (2003): A large deviations approach to optimal long term investment, Finance and
Stochastics, 7, 169-195.
[48] Pham H. (2007): Some applications of large deviations in finance and insurance, in Paris
Princeton Lectures Notes on Mathematical Finance, 1919, 191-244.
[49] Robertson S. (2010): Sample path large deviations and optimal importance sampling for
stochastic volatility models, Stochastic Processes and their Applications, 120, 66-83.
[50] Sowers R. (2009): Losses in investment-grade tranches of synthetic CDOs: a large deviations
analysis, preprint.
[51] Stettner L. (2004): Duality and risk sensitive portfolio optimization, in Mathematics of
Finance, Proceedings AMS-IMS-SIAM, eds G. Yin and Q. Zhang, 333-347.
[52] Stutzer M. (2003): Portfolio choice with endogenous utility : a large deviations approach,
Journal of Econometrics, 116, 365-386.
[53] Tehranchi M. (2009): Asymptotics of implied volatility far from maturity, Journal of Applied
Probability, 46, 629-650.
[54] Williams D. (1991): Probability with martingales, Cambridge mathematical text.