
An Introduction to Malliavin Calculus

Courant Institute of Mathematical Sciences
New York University

Peter K. Friz
August 3, 2002

These notes are available on
www.math.nyu.edu/phd students/frizpete
Please send corrections to
Peter.Friz@cims.nyu.edu
These lecture notes are based on a couple of seminar talks I gave at Courant in Spring 2001. I am extremely grateful to Prof. S.R.S. Varadhan for supporting and stimulating my interest in Malliavin calculus. I am also indebted to Nicolas Victoir and Enrique Loubet for their careful reading of this text.
-Peter F.
Notations:

$\Omega$ ... Wiener space $C[0,1]$ resp. $C([0,1],\mathbb{R}^m)$
$\mathcal{F}$ ... natural filtration
$H$ ... $L^2[0,1]$ resp. $L^2([0,1],\mathbb{R}^m)$
$H^{\otimes k}$ ... tensor product, $\cong L^2([0,1]^k)$; $H^{\odot k}$ ... symmetric tensor product
$\tilde H$ ... Cameron-Martin space, elements are paths with derivative in $H$
$W : \mathcal{F} \to \mathbb{R}$ ... Wiener measure on $\Omega$
$\beta_t = \beta(t)$ ... Brownian motion (= coordinate process on $(\Omega, \mathcal{F}, W)$)
$W : H \to L^2(\Omega)$ ... defined by $W(h) = \int_0^1 h\,d\beta$
$S_2$ ... Wiener polynomials, functionals of the form polynomial$(W(h_1),\dots,W(h_n))$
$S_1$ ... cylindrical functionals, $\subset S_2$
$\mathbb{D}^{k,p}$ ... subset of $L^p(\Omega)$ containing the $k$-times Malliavin differentiable functionals
$\mathbb{D}^\infty$ ... $\bigcap_{k,p}\mathbb{D}^{k,p}$, smooth Wiener functionals
$\lambda,\ \lambda_m$ ... ($m$-dimensional) Lebesgue measure
$\mu,\ \mu_n$ ... ($n$-dimensional) standard Gaussian measure
$\nabla$ ... gradient operator on $\mathbb{R}^n$
$L^p(\Omega,H)$ ... $H$-valued random variables s.t. $\int_\Omega \|\cdot\|_H^p\,dW < \infty$
$D$ ... Malliavin derivative, operator $L^p(\Omega) \to L^p(\Omega,H)$
$\delta$ ... $= D^*$, the adjoint operator; also: divergence, Skorohod integral
$L$ ... $= \delta D$, Ornstein-Uhlenbeck operator $L^p(\Omega) \to L^p(\Omega)$
$W^{k,p}$ ... Sobolev spaces built on $\mathbb{R}^n$; $H^k$ ... $= W^{k,2}$
$\partial$ ... (for functions $f:\mathbb{R}\to\mathbb{R}$) simple differentiation
$\partial^*$ ... adjoint of $\partial$ on $L^2(\mathbb{R},\mu)$
$L$ ... $= \partial^*\partial$, one-dimensional OU operator
$\partial_i,\ \partial_{ij}$ ... partial derivatives w.r.t. $x_i$, $x_j$ etc.
$L$ ... generator of an $m$-dimensional diffusion process, for instance $L = E^{ij}\partial_{ij} + B^i\partial_i$
$H_n$ ... Hermite polynomials
$\Delta_n(t)$ ... $n$-dimensional simplex $\{0 < t_1 < \dots < t_n < t\} \subset [0,1]^n$
$J(\cdot)$ ... iterated Wiener-Ito integral, operator $L^2[\Delta_n] \to C_n \subset L^2(\Omega)$
$C_n$ ... $n$-th Wiener chaos
$\alpha$ ... multiindex (finite-dimensional)
$X$ ... $m$-dimensional diffusion process given by an SDE, driven by $d$ BMs
$\gamma = \gamma(X)$ ... $\langle DX, DX\rangle_H$, Malliavin covariance matrix
$V, W$ ... vector fields on $\mathbb{R}^m$, seen as maps $\mathbb{R}^m \to \mathbb{R}^m$ or as first order differential operators
$B, A_0$ ... vector fields on $\mathbb{R}^m$, appearing as drift term in the Ito (resp. Stratonovich) SDE
$A_1,\dots,A_d$ ... vector fields on $\mathbb{R}^m$, appearing in the diffusion term of the SDE
$\circ\,d\beta$ ... Stratonovich differential = Ito differential + $(\dots)dt$
$X$ ... diffusion given by the SDE, $X(0) = x$
$Y, Z$ ... $\mathbb{R}^{m\times m}$-valued processes, derivative of $X$ w.r.t. $X(0)$ resp. its inverse
$\partial V$ ... short for the matrix $(\partial_j V^i)$
$\nabla_W V$ ... covariant derivative, $= (\partial V)W$; $\nabla$ is called a connection
$[V,W]$ ... Lie bracket, yields another vector field
$\mathrm{Lie}\,\{\dots\}$ ... the smallest vector space closed under Lie brackets, containing $\{\dots\}$
$\mathcal{D}$ ... $= C_c^\infty$, test functions
$\mathcal{D}'$ ... Schwartz distributions = continuous functionals on $\mathcal{D}$
Chapter 1

Analysis on the Wiener Space

1.1 Wiener Space

$\Omega$ will denote the Wiener space $C([0,1])$. As usual, we put the Wiener measure $W$ on $\Omega$, therefore getting a probability space
$$(\Omega, \mathcal{F}, W)$$
where $\mathcal{F}$ is generated by the coordinate maps. On the other hand we can furnish $\Omega$ with the $\|\cdot\|_\infty$-norm, making it a (separable) Banach space. $\mathcal{F}$ coincides with the $\sigma$-field generated by the open sets of this Banach space. Random variables on $\Omega$ are called Wiener functionals. The coordinate process $\beta(t)$ is a Brownian motion under $W$, with natural filtration $\sigma(\{\beta(s) : s \le t\}) \equiv \mathcal{F}_t$. Often we will write this Brownian motion as $\beta(t) = \beta(t,\omega) = \omega(t)$, in particular in the context of stochastic Wiener-Ito integrals.
1.2 Two simple classes of Wiener functionals

Let $f$ be a polynomial and $h_1,\dots,h_n \in H \equiv L^2[0,1]$. Define first a class of cylindrical functionals
$$S_1 = \{F : F = f(\beta_{t_1},\dots,\beta_{t_n})\},$$
then the larger class of Wiener polynomials
$$S_2 = \{F : F = f(W(h_1),\dots,W(h_n))\}$$
where $W(h) \equiv \int_0^1 h\,d\beta$.
Remarks: - Both $S_i$ are algebras. In particular $S_2$ is what [Malliavin2] p13 calls the fundamental algebra.
- An $S_2$-type functional with all $h_i$'s deterministic step functions is in $S_1$.
- In both cases, we are dealing with r.v. of the type
$$F = f(n\text{-dimensional gaussian}) = \tilde f(n \text{ indep. std. gaussians}).$$
Constructing $\tilde f$ boils down to a Gram-Schmidt orthonormalization of the $h_i$'s. When restricting the discussion to $S_2$-functionals one can actually forget $\Omega$ and simply work with $(\mathbb{R}^n,\mu_n)$, that is, $\mathbb{R}^n$ with the $n$-dimensional standard Gaussian measure $d\mu_n(x) = (2\pi)^{-n/2}\exp(-|x|^2/2)\,dx$. This remark looks harmless here but will prove useful during the whole setup of the theory.
- $S_1 \subset S_2 \subset \bigcap_{p \ge 1} L^p(\Omega)$, as the polynomial growth of $f$ assures the existence of all moments. From this point of view, one could weaken the assumptions on $f$: for instance, smooth and of maximal polynomial growth, or exponential-martingale-type functionals.
1.3 Directional derivatives on the Wiener Space

Recall that $[W(h)](\omega) = \int_0^1 h\,d\beta(\omega)$ is constructed as an $L^2$-limit and hence, as an element of $L^2(\Omega,W)$, only $W$-a.s. defined. Hence, any $S_2$- or more general Wiener functional is only $W$-a.s. defined.
In which directions can we shift the argument of a functional while keeping it a.s. well-defined? By Girsanov's theorem, the Cameron-Martin directions
$$\tilde h(\cdot) := \int_0^\cdot h(t)\,dt \quad \text{with } h \in H$$
are fine, as the shifted Wiener measure $(\tau_{\tilde h})_*W$ is equivalent to $W$. The set of all $\tilde h$ is the Cameron-Martin space $\tilde H$. It is known that for a direction $k \notin \tilde H$ the shifted measure is singular w.r.t. $W$, see [RY], Ch. VIII/2. Hence $F(\cdot + k)$ does not make sense when $F$ is an a.s. defined functional, and neither does a directional derivative in direction $k$.
Remarks: - The paths $\tilde h$ are sometimes called finite energy paths.
- The set $\tilde H$ has zero $W$-measure, since every $\tilde h$ is of bounded variation while $W$-a.s. Brownian paths are not.
- The map $h \mapsto \tilde h$ is a continuous linear injection from $H$ into $(\Omega, \|\cdot\|_\infty)$.
- Also, $h \mapsto \tilde h$ is a bijection from $H$ onto $\tilde H$ with inverse $\frac{d}{dt}\tilde h(t) = h(t)$. This derivative exists $dt$-a.s. since $\tilde h$ is absolutely continuous; moreover $h \in H$, i.e. square-integrable. In particular, we can use this to transfer the Hilbert structure from $H$ to $\tilde H$: for $\tilde g, \tilde k \in \tilde H$ let $g, k$ denote their square-integrable derivatives; then
$$\langle \tilde g, \tilde k\rangle_{\tilde H} := \langle g, k\rangle_H = \int_0^1 g\,k\,d\lambda.$$
- In a more general context $\tilde H$ (or indeed $H$) is known as the reproducing kernel space for the Gaussian measure $W$ on the Banach space $\Omega$ (terminology from [DaPrato], p40).
1.4 The Malliavin derivative D in special cases

Take $F \in S_1$, with slightly different notation
$$F(\omega) = f(\beta(t_1),\dots,\beta(t_n)) = f\big(W(1_{[0,t_1]}),\dots,W(1_{[0,t_n]})\big).$$
Then, at $\varepsilon = 0$,
$$\frac{d}{d\varepsilon}F(\omega + \varepsilon\tilde h)$$
equals
$$\sum_{i=1}^n \partial_i f(\beta(t_1),\dots,\beta(t_n))\int_0^{t_i} h\,d\lambda =: \langle DF, h\rangle_H$$
where we define
$$DF = \sum_i \partial_i f\big(W(1_{[0,t_1]}),\dots,W(1_{[0,t_n]})\big)\,1_{[0,t_i]}.$$
This extends naturally to $S_2$ functionals,
$$DF = \sum_i \partial_i f(W(h_1),\dots,W(h_n))\,h_i,$$
and this should be regarded as an $H$-valued r.v.
Remarks: - $D$ is well-defined. In particular, for $F = W(h) = \int_0^1 h\,d\beta$ this is a consequence of the Ito isometry.
- Sometimes it is convenient to write
$$D_tF(\omega) = \sum_i \partial_i f(W(h_1)(\omega),\dots)\,h_i(t)$$
which, of course, is only $W$-a.s. well-defined.
- Since $D(\int_0^1 h\,d\beta) = D(W(h)) = h$,
$$DF = \sum_i \partial_i f(W(h_1),\dots,W(h_n))\,D(W(h_i)),$$
which is the germ of a chain-rule formula.
- Here is a product rule, for $F, G \in S_2$:
$$D(FG) = F\,DG + G\,DF. \quad (1.1)$$
(Just check it for monomials, $F = W(h)^n$, $G = W(g)^m$.) See [Nualart], p34 for an extension.
- As $f$ has only polynomial growth, we have $DF \in L^p(\Omega,H)$, i.e. $\int_\Omega \|DF(\omega)\|_H^p\,dW < \infty$. For $p = 2$ this can be expressed more simply: $DF \in L^2([0,1]\times\Omega)$, and (after fixing a version) $DF = DF(t,\omega)$ can be thought of as a stochastic process.
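A quick sanity check of these formulas (our own toy computation, not part of the original text): for $F = W(h)^2$ the defining formula gives $DF = 2W(h)\,h$, and the product rule (1.1) with $F = G = W(h)$ gives the same. In particular $\|DF\|_H^2 = 4W(h)^2\|h\|_H^2$, which has moments of all orders, as claimed.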
1.5 Extending the Malliavin Derivative D

So far we have
$$D : L^p(\Omega) \supset S_2 \to L^p(\Omega,H).$$
It is instructive to compare this to the following well-known situation in (Sobolev) analysis. Take $f \in L^p(U)$, some domain $U \subset \mathbb{R}^n$. Then the gradient operator $\nabla = (\partial_i)_{i=1,\dots,n}$ maps an appropriate subset of $L^p(U)$ into $L^p(U,\mathbb{R}^n)$. The $\mathbb{R}^n$ comes clearly into play as it is (isomorphic to) the tangent space at any point of $U$. Going back to the Wiener space we see that $H$ (or, equivalently, $\tilde H$) plays the role of the tangent space to the Wiener space.[1]
[1] We follow Malliavin himself and also Nualart in defining $DF$ as an $H$-valued r.v. This seems the simplest choice in view of the calculus to come. Oksendal, Uestuenel and Hsu define it as an $\tilde H$-valued r.v. As commented in Section 1.3 the difference is purely notational since there is a natural isomorphism between $H$ and $\tilde H$. For instance, we can write $D(\int_0^1 h\,d\beta) = h$, while the $\tilde H$-choice leads to ($\tilde H$-derivative of)$(\int_0^1 h\,d\beta) = \int_0^\cdot h\,d\lambda$.
Again, consider $\nabla$ on $L^p(U)$: what is its natural domain? The best you can do is $(\nabla, W^{1,p})$, which is a closed operator, while $(\nabla, C_c^1)$ (for instance) is a closable operator. This closability (see [RR]) is exactly what you need to extend the operator to the closure of $C_c^1$ with respect to $\|\cdot\|_{W^{1,p}}$, where
$$\|f\|^p_{W^{1,p}} = \int_U |f|^p\,d\lambda_n + \sum_{i=1}^n\int_U |\partial_i f|^p\,d\lambda_n$$
or, equivalently,
$$\int_U |f|^p\,d\lambda_n + \int_U \|\nabla f\|^p_{\mathbb{R}^n}\,d\lambda_n.$$
Using an integration-by-parts formula (see the following section on IBP), $(D, S_2)$ is easily seen to be closable (details are found in [Nualart] p26 or [Uestuenel]). The extended domain is denoted by $\mathbb{D}^{1,p}$ and is exactly the closure of $S_2$ with respect to $\|\cdot\|_{1,p}$, where
$$\|F\|^p_{1,p} = \int_\Omega |F|^p\,dW + \int_\Omega \|DF\|^p_H\,dW = E|F|^p + E\|DF\|^p_H.$$
Remarks: - Look up the definition of a closed operator and compare Bass' way of introducing the Malliavin derivative ([Bass] p193) with the classical result in [Stein] p122.
- For simplicity take $p = 2$ and consider $F = f(W(h_1),\dots,W(h_n))$ with $h_i$'s in $H$. As mentioned in section 1.2, there is no loss of generality in assuming the $h_i$'s to be orthonormal. Then
$$\|DF\|^2_H = \sum_{i=1}^n \big(\partial_i f(n \text{ iid std gaussians})\big)^2,$$
and $\|F\|^2_{1,2}$ simply becomes
$$\int_{\mathbb{R}^n} f^2\,d\mu_n + \int_{\mathbb{R}^n} \|\nabla f\|^2_{\mathbb{R}^n}\,d\mu_n,$$
which is just the norm of the weighted Sobolev space $W^{1,2}(\mu_n)$. More on this link between $\mathbb{D}^{1,p}$ and finite-dimensional Sobolev spaces is to be found in [Malliavin1] and [Nualart].
- A frequent characterization of Sobolev spaces on $\mathbb{R}^n$ is via the Fourier transform (see, for instance, [Evans] p282). Let $f \in L^2 = L^2(\mathbb{R}^n)$; then
$$f \in H^k \iff (1 + |x|^k)\hat f \in L^2.$$
Moreover,
$$\|f\|_{H^k} \approx \|(1 + |x|^k)\hat f\|_{L^2}.$$
In particular, this allows a natural definition of $H^s(\mathbb{R}^n)$ for all $s \in \mathbb{R}$. For later reference, we consider the case $k = 1$ and, for simplicity, $n = 1$. Recall that $i\partial$ is a self-adjoint operator on $(L^2(\mathbb{R}), \langle\cdot,\cdot\rangle)$:
$$\langle (1+x)\hat f, (1+x)\hat f\rangle = \langle (1+i\partial)f, (1+i\partial)f\rangle = \langle f, f\rangle + \langle i\partial f, i\partial f\rangle = \langle f, f\rangle + \langle f, -\partial^2 f\rangle = \langle f, (1+A)f\rangle,$$
where $A$ denotes the negative second derivative (the cross terms vanish by integration by parts). In Section 1.10 this will be linked to the usual definition of Sobolev spaces (as seen at the beginning of this section), both on $(\mathbb{R}^n,\mu_n)$ and on $(\Omega, W)$.
- The preceding discussion about how to obtain the optimal domain for the gradient on $(\mathbb{R}^n,\lambda_n)$ is rarely an issue in practical expositions of Sobolev theory on $\mathbb{R}^n$. The reason is, of course, that we can take weak resp. distributional derivatives. As is well known, Sobolev spaces can then be defined as those $L^p$-functions whose weak derivatives are again in $L^p$. A priori, this can't be done on the Wiener space (at this stage, what are the smooth test functions there?).
1.6 Integration by Parts

As motivation, we look at $(\mathbb{R},\lambda)$ first. Take $f$ smooth with compact support (for instance); then, by the translation invariance of Lebesgue measure,
$$\int f(x+h)\,d\lambda = \int f(x)\,d\lambda$$
and hence, after dividing by $h$ and letting $h \to 0$,
$$\int f'\,d\lambda = 0.$$
Replacing $f$ by $fg$ this reads
$$\int f'g\,d\lambda = -\int fg'\,d\lambda.$$
The point is that IBP is the infinitesimal expression of a measure invariance. Things are simple here because $\lambda_n$ is translation invariant, $(\tau_h)_*\lambda_n = \lambda_n$. Let's look at $(\mathbb{R}^n,\mu_n)$. It is elementary to check that for any $h \in \mathbb{R}^n$
$$\frac{d(\tau_h)_*\mu_n}{d\mu_n}(x) = \exp\Big(\sum_{i=1}^n h_ix_i - \frac{1}{2}\sum_{i=1}^n h_i^2\Big).$$
The corresponding fact on the Wiener space $(\Omega,W)$ is the Cameron-Martin theorem. For $\tilde h \in \tilde H$ and with $\tau_{\tilde h}(\omega) = \omega + \int_0^\cdot h\,d\lambda = \omega + \tilde h$,
$$\frac{d(\tau_{\tilde h})_*W}{dW}(\omega) = \exp\Big(\int_0^1 h\,d\beta(\omega) - \frac{1}{2}\int_0^1 h^2\,d\lambda\Big).$$

Theorem 1 (IBP on the Wiener Space) Let $h \in H$, $F \in S_2$. Then
$$E(\langle DF, h\rangle_H) = E\Big(F\int_0^1 h\,d\beta\Big).$$

Proof: (1st variant, following [Nualart]) By homogeneity, w.l.o.g. $\|h\| = 1$. Furthermore, we can find $f$ such that $F = f(W(h_1),\dots,W(h_n))$ with $(h_i)$ orthonormal in $H$ and $h = h_1$. Then, using classical IBP,
$$E\langle DF, h\rangle = E\sum_i \partial_i f\,\langle h_i, h\rangle = \int_{\mathbb{R}^n}\partial_1 f(x)\,(2\pi)^{-n/2}e^{-|x|^2/2}\,dx$$
$$= -\int_{\mathbb{R}^n} f(x)\,(2\pi)^{-n/2}e^{-|x|^2/2}\,(-x_1)\,dx = \int_{\mathbb{R}^n} f(x)\,x_1\,d\mu_n = E(F\,W(h_1)) = E(F\,W(h)).$$
(2nd variant, following an idea of Bismut, see [Bass]) We already saw in section 1.4 that for $F \in S_1$ the directional derivative in direction $\tilde h$ exists and coincides with $\langle DF, h\rangle$. For such $F$,
$$\int_\Omega F(\omega)\,dW(\omega) = \int_\Omega F\big(\tau_{-\tilde h}(\omega) + \tilde h\big)\,dW(\omega) = \int_\Omega F(\omega + \tilde h)\,d(\tau_{-\tilde h})_*W(\omega)$$
$$= \int_\Omega F\Big(\omega + \int_0^\cdot h\,d\lambda\Big)\exp\Big(-\int_0^1 h\,d\beta - \frac{1}{2}\int_0^1 h^2\,d\lambda\Big)\,dW(\omega),$$
using Girsanov's theorem. Replace $h$ by $\varepsilon h$ and observe that the l.h.s. is independent of $\varepsilon$. At least formally, exchanging integration over $\Omega$ and $\frac{d}{d\varepsilon}$ at $\varepsilon = 0$, we find
$$\int_\Omega\Big(\langle DF, h\rangle - F(\omega)\int_0^1 h\,d\beta\Big)\,dW(\omega) = 0$$
as required. To make this rigorous, approximate $F$ by $F$'s which are, together with $\|DF\|_H$, bounded on $\Omega$. Another approximation leads to $S_2$-type functionals. $\Box$

Remarks: - IBP on the Wiener space is one of the cornerstones of Malliavin calculus. The second variant of the proof inspired the name "stochastic calculus of variations": Wiener paths are perturbed by paths $\varepsilon\tilde h(\cdot)$. Stochastic calculus of variations has (well, a priori) nothing to do with the classical calculus of variations.
- As before, we can apply this result to a product $FG$, where both $F, G \in S_2$. This yields
$$E(G\langle DF, h\rangle) = E(-F\langle DG, h\rangle + FG\,W(h)). \quad (1.2)$$
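Theorem 1 is easy to test by Monte Carlo. Here is a minimal sketch (our own illustration, not part of the original notes; seed and sample size are arbitrary): for $F = f(W(h))$ with $\|h\|_H = 1$ we have $\langle DF, h\rangle_H = f'(W(h))$, and $W(h)$ is a standard Gaussian, so the theorem reduces to the classical Gaussian IBP $E[f'(Z)] = E[Zf(Z)]$.

import numpy as np

# Monte Carlo sanity check of Theorem 1 for F = f(W(h)), ||h||_H = 1, f = exp.
# Then <DF, h>_H = f'(W(h)) and W(h) ~ N(0,1), so the claim is E[f'(Z)] = E[f(Z) Z].
rng = np.random.default_rng(0)
Z = rng.standard_normal(10**6)      # samples of W(h)
lhs = np.mean(np.exp(Z))            # E <DF, h>_H
rhs = np.mean(np.exp(Z) * Z)        # E [F W(h)]
print(lhs, rhs)                     # both approx e^{1/2} = 1.6487...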
1.7 Ito representation formula / Clark-Ocone-Haussmann formula

As already mentioned, $DF \in L^2([0,1]\times\Omega)$ can be thought of as a stochastic process. Is it adapted? Let's see. Set
$$F(s) := \mathcal{E}_s(h) := \exp\Big(\int_0^s h\,d\beta - \frac{1}{2}\int_0^s h^2\,d\lambda\Big),$$
an exponential martingale. $F := \mathcal{E}(h) := F(1)$ is not quite in $S_2$ but is easily seen to be in $\mathbb{D}^{1,p}$ and, at least formally and $d\lambda(t)$-a.s.,
$$D_tF = e^{-\frac{1}{2}\int_0^1 h^2\,d\lambda}\,D_t\Big(\exp\int_0^1 h\,d\beta\Big) = e^{-\frac{1}{2}\int_0^1 h^2\,d\lambda}\exp\Big(\int_0^1 h\,d\beta\Big)\,h(t) = F\,h(t).$$
The chain rule used here is made rigorous by approximation in $S_2$ using the partial sums of the exponential.
As $F$ contains information up to time 1, $D_tF$ is not adapted to $\mathcal{F}_t$, but we can always project down:
$$E(D_tF\,|\,\mathcal{F}_t) = E(F(1)h(t)\,|\,\mathcal{F}_t) = h(t)\,E(F(1)\,|\,\mathcal{F}_t) = h(t)\,F(t),$$
using the martingale property of $F(t)$. On the other hand, $F$ solves the SDE
$$dF(t) = h(t)\,F(t)\,d\beta(t)$$
with $F(0) = 1 = E(F)$. Hence
$$F = E(F) + \int_0^1 h(t)F(t)\,d\beta(t) = E(F) + \int_0^1 E(D_tF\,|\,\mathcal{F}_t)\,d\beta(t). \quad (1.3)$$
By throwing away some information this reads
$$F = E(F) + \int_0^1 \varphi(t,\omega)\,d\beta \quad (1.4)$$
for some adapted process $\varphi \in L^2([0,1]\times\Omega)$. We proved (1.4) for $F$ of the form $\mathcal{E}(h)$, sometimes called Wick exponentials; call $\mathcal{E}$ the set of all such $F$'s. Obviously this extends to the linear span$(\mathcal{E})$ and, by a density argument (the span of $\mathcal{E}$ is dense in $L^2$, cf. section 1.14), to any $F \in L^2(\Omega, W)$. This is the Ito representation theorem.
Looking back to (1.3), we can't expect this to hold for any $F \in L^2(\Omega,W)$, since $D$ is only defined on the proper subset $\mathbb{D}^{1,2}$. However, it is true for $F \in \mathbb{D}^{1,2}$; this is the Clark-Ocone-Haussmann formula.
Remarks: - In most books, for instance [Nualart], the proof uses the Wiener-Ito chaos decomposition, although approximation via span$(\mathcal{E})$ should work.
- A similar type of computation allows one to show, at least for $F \in \mathrm{span}(\mathcal{E})$,[2] and $d\lambda(t)\otimes dW$-a.s.,
$$D_t\,E(F\,|\,\mathcal{F}_s) = E(D_tF\,|\,\mathcal{F}_s)\,1_{[0,s]}(t).$$
In particular,
$$F \text{ is } \mathcal{F}_s\text{-measurable} \implies D_tF = 0 \text{ for Lebesgue-a.e. } t > s. \quad (1.5)$$
[2] An extension to $\mathbb{D}^{1,2}$ is proved in [Nualart], p32, via the WICD.
The intuition here is very clear: if $F$ only depends on the early part of the path up to time $s$, i.e. on $\{\beta(s') : s' \le s\}$, perturbing the path later on (i.e. on $t > s$) shouldn't change a thing. Now recall the interpretation of $\langle DF, h\rangle = \int D_tF\,h(t)\,dt$ as the directional derivative in direction of the perturbation $\tilde h = \int_0^\cdot h\,d\lambda$.
- Comparing (1.4) and (1.3), the question arises what really happens for $F \in L^2 \setminus \mathbb{D}^{1,2}$. There is an extension of $D$ to $\mathbb{D}^{-\infty}$, the space of Meyer-Watanabe distributions built on the space $\mathbb{D}^\infty$ (introduced a little later in this text), and $L^2 \subset \mathbb{D}^{-\infty}$. In this context, (1.3) makes sense for all $F \in L^2$, see [Uestuenel], p42.
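Here is the formula in action on the simplest nontrivial example (our own illustration): take $F = \beta(1)^2 = W(1_{[0,1]})^2 \in \mathbb{D}^{1,2}$. Then $D_tF = 2\beta(1)$, $E(D_tF\,|\,\mathcal{F}_t) = 2\beta(t)$ and $E(F) = 1$, so the Clark-Ocone-Haussmann formula reads
$$\beta(1)^2 = 1 + \int_0^1 2\beta(t)\,d\beta(t),$$
which is precisely Ito's formula for $\beta(1)^2$.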
1.8 Higher derivatives

When $f : \mathbb{R}^n \supset U \to \mathbb{R}$, then $\nabla f = (\partial_i f)$ is a vector field on $U$, meaning that at each point $\nabla f(x) \in T_xU \cong \mathbb{R}^n$, with standard differential-geometry notation. Then $(\partial_{ij}f)$ is a (symmetric) 2-tensor field, i.e. at each point an element of $T_x^*U \otimes T_x^*U \cong \mathbb{R}^n\otimes\mathbb{R}^n$. As seen in section 1.5, the tangent space of $\Omega$ corresponds to $H$; therefore $D^2F$ (still to be defined!) should be an $H\otimes H$-valued r.v. (or $H\odot H$, to indicate symmetry). No need to worry about tensor calculus in infinite dimension, since $H\otimes H \cong L^2([0,1]^2)$. For $F \in S_2$ (for instance), randomness fixed,
$$D^2_{s,t}F := D_s(D_tF)$$
is $d\lambda^2(s,t)$-a.s. well-defined, i.e. good enough to define an element of $L^2([0,1]^2)$. Again, there is closability of the operator $D^2 : L^p(W) \to L^p(W, H\odot H)$ to check, leading to a maximal domain $\mathbb{D}^{2,p}$ with associated norm $\|\cdot\|_{2,p}$, and the same is done for higher derivatives. Details are in [Nualart], p26.
Remarks: - $\mathbb{D}^{k,p}$ is not an algebra, but
$$\mathbb{D}^\infty := \bigcap_{k,p}\mathbb{D}^{k,p}$$
is. As with the class of rapidly decreasing functions, underlying the tempered distributions, $\mathbb{D}^\infty$ can be given a metric and then serve to introduce continuous functionals on it, the Meyer-Watanabe distributions. This is quite a central point in many expositions, including [IW], [Oksendal2], [Ocone] and [Uestuenel].
- Standard Sobolev embedding theorems, as for instance [RR] p215, tell us that for $U = \mathbb{R}^n$
$$W^{k,p}(U) \subset C_b(U)$$
whenever $kp > \dim U = n$. Now, very formally, when $n = \infty$ one could have a function in the intersection of all these Sobolev spaces without achieving any continuity. And this is what happens on $\Omega$!!! For instance, taking $F = W(h)$, $h \in H$, gives $DF = h$, $D^2F = 0$, therefore $F \in \mathbb{D}^\infty$. On the other hand, [Nualart] has classified those $h$ for which a continuous choice of $W(h)$ exists, as those $L^2$-functions that have a representative of bounded variation; see [Nualart] p32 and the references therein.
1.9 The Skorohod Integral / Divergence

For simplicity consider $p = 2$; then
$$D : L^2(\Omega) \supset \mathbb{D}^{1,2} \to L^2(\Omega, H),$$
a densely defined unbounded operator. Let $\delta$ denote the adjoint operator, i.e. for $u \in \mathrm{Dom}\,\delta \subset L^2(\Omega,H) \cong L^2([0,1]\times\Omega)$ we require
$$E(\langle DF, u\rangle_H) = E(F\,\delta(u)).$$
Remark: On $(\mathbb{R}^n,\mu_n)$,
$$\int_{\mathbb{R}^n}\langle \nabla f, u\rangle_{\mathbb{R}^n}\,d\mu_n = \int_{\mathbb{R}^n} f\,(\delta u)\,d\mu_n; \quad (1.6)$$
this explains (up to a minus sign) why $\delta$ is called the divergence.
Take $F, G \in S_2$, $h \in H$. Then $\delta(Fh)$ is easily computed using the IBP formula (1.2):
$$E(\delta(Fh)\,G) = E(\langle Fh, DG\rangle) = E(F\langle h, DG\rangle) = E(-G\langle h, DF\rangle + FG\,W(h))$$
which implies
$$\delta(Fh) = F\,W(h) - \langle h, DF\rangle. \quad (1.7)$$
Taking $F \equiv 1$ we immediately get that $\delta$ coincides with the Ito integral on (deterministic) $L^2$-functions. But we can see much more: take $F$ $\mathcal{F}_r$-measurable and $h = 1_{(r,s]}$. We know from (1.5) that $D_tF = 0$ for a.e. $t > r$. Therefore
$$\langle h, DF\rangle = \int_0^1 1_{(r,s]}(t)\,D_tF\,dt = 0,$$
i.e.
$$\delta(Fh) = F\,W(h) = F(\beta_s - \beta_r) = \int_0^1 Fh\,d\beta$$
by the very definition of the Ito integral on adapted step functions.[3]
[3] Also called simple processes. See [KS] for definitions and density results.
By an approximation, for $u \in L^2_a$, the closed subspace of $L^2([0,1]\times\Omega)$ formed by the adapted processes, it still holds that
$$\delta(u) = \int_0^1 u(t)\,d\beta(t),$$
see [Nualart] p41 or [Uestuenel] p15. The divergence $\delta$ is therefore a generalization of the Ito integral (to non-adapted integrands) and - in this context - is called the Skorohod integral.
Remark: For $u \in H$, $W(u) = \delta(u)$, so (1.7) also reads
$$\delta(Fu) = F\,\delta(u) - \langle u, DF\rangle, \quad (1.8)$$
and this relation stays true for $u \in \mathrm{Dom}(\delta)$, $F \in \mathbb{D}^{1,2}$ and some integrability condition, see [Nualart], p40. The formal proof is simple, using the product rule:
$$E\langle Fu, DG\rangle = E\langle u, F\,DG\rangle = E\langle u, D(FG) - G(DF)\rangle = E[\delta(u)FG - \langle u, DF\rangle G] = E[(F\delta(u) - \langle u, DF\rangle)G].$$
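As an illustration with a genuinely non-adapted integrand (our own example, straight from (1.7)): take $u(t) = \beta(1)\,h(t)$, i.e. $F = \beta(1) = W(1_{[0,1]})$ with $DF = 1_{[0,1]}$. Then (1.7) gives
$$\delta(\beta(1)\,h) = \beta(1)\int_0^1 h\,d\beta - \int_0^1 h\,d\lambda;$$
the correction term $\langle h, DF\rangle_H = \int_0^1 h\,d\lambda$ is the price for the integrand anticipating $\beta(1)$.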
1.10 The OU-operator

We found gradient and divergence on $\Omega$. On $\mathbb{R}^n$, plugging them together yields a positive operator (the negative Laplacian):
$$A = \nabla^*\nabla = -\mathrm{div}\,\nabla.$$
Here is an application. Again we are on $(\mathbb{R}^n,\lambda_n)$; $\langle\cdot,\cdot\rangle$ denotes the inner product on $L^2(\mathbb{R}^n)$.
$$\|f\|^2_{W^{1,2}} = \|f\|^2_{H^1} = \int |f|^2\,d\lambda_n + \int \|\nabla f\|^2_{\mathbb{R}^n}\,d\lambda_n = \int |f|^2\,d\lambda_n + \int f\,Af\,d\lambda_n \quad \text{(using the Lebesgue analogue of (1.6))}$$
$$= \langle f, f\rangle + \langle Af, f\rangle = \langle (1+A)f, f\rangle = \langle (1+A)^{1/2}f, (1+A)^{1/2}f\rangle = \|(1+A)^{1/2}f\|^2,$$
using the square root of the positive operator $(1+A)$ as defined, for instance, by spectral calculus. For $p \neq 2$ there is no equality, but one still has
$$\|\cdot\|_{W^{1,p}} \approx \|(1+A)^{1/2}\,\cdot\,\|_{L^p}$$
when $p > 1$, see [Stein] p135.
Let's do the same on $(\Omega, W)$. First define the Ornstein-Uhlenbeck operator
$$L := \delta D.$$
Then the same is true, i.e. for $1 < p < \infty$,
$$\|\cdot\|_{1,p} \approx \|(1+L)^{1/2}\,\cdot\,\|_{L^p(\Omega)},$$
with equality for $p = 2$; the latter case is seen as before. This result is a corollary of the Meyer inequalities. The proof is not easy and is found in [Uestuenel] p19, [Nualart] p61 or [Sugita] p37.
How does $L$ act on an $S_2$-type functional $F = f(W(h_1),\dots,W(h_n))$, where we take w.l.o.g. the $h_i$'s orthonormal? Using $DF = \sum(\partial_i f)\,h_i$ and formula (1.7) we get
$$LF = \sum_i \partial_i f\; W(h_i) - \sum_i \langle D(\partial_i f), h_i\rangle = \sum_i \partial_i f\; W(h_i) - \sum_{i,j}\partial_{ij}f\,\langle h_j, h_i\rangle = (L^{(n)}f)(W(h_1),\dots,W(h_n))$$
where $L^{(n)}$ is defined as the operator, for functions on $\mathbb{R}^n$,
$$L^{(n)} := \sum_{i=1}^n\big[x_i\partial_i - \partial_{ii}\big] = x\cdot\nabla - \Delta.$$
Remarks: - Minus $L^{(n)}$ is the generator of the $n$-dimensional OU process given by the SDE
$$dx = \sqrt{2}\,d\beta - x\,dt,$$
with explicit solution
$$x(t) = x_0e^{-t} + \sqrt{2}\,e^{-t}\int_0^t e^s\,d\beta(s) \quad (1.9)$$
and, for $t$ fixed, $x(t)$ has law $N\big(x_0e^{-t}, (1 - e^{-2t})\,\mathrm{Id}\big)$.
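This law is easy to check by simulation. A minimal Euler-Maruyama sketch in one dimension (our own illustration; step count, sample size and seed are arbitrary choices):

import numpy as np

# Simulate dx = sqrt(2) dbeta - x dt and compare the empirical moments of x(t)
# with the exact law N(x0 e^{-t}, 1 - e^{-2t}) stated after (1.9).
rng = np.random.default_rng(1)
x0, t, n_steps, n_paths = 2.0, 1.0, 1000, 100_000
dt = t / n_steps
x = np.full(n_paths, x0)
for _ in range(n_steps):                     # Euler-Maruyama steps
    x += -x * dt + np.sqrt(2 * dt) * rng.standard_normal(n_paths)
print(x.mean(), x0 * np.exp(-t))             # both approx 0.7358
print(x.var(), 1 - np.exp(-2 * t))           # both approx 0.8647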
- $L$ plays the same role on $(\Omega,W)$ as $L^{(n)}$ on $(\mathbb{R}^n,\mu_n)$ or $A = -\Delta$ on $(\mathbb{R}^n,\lambda_n)$.
- Here is some OU calculus, (at least) for $F, G \in S_2$:
$$L(FG) = F\,LG + G\,LF - 2\langle DF, DG\rangle, \quad (1.10)$$
as is immediately seen from (1.1) and (1.8).
- Some more of that kind:
$$\delta(F\,DG) = F\,LG - \langle DF, DG\rangle. \quad (1.11)$$
1.11 The OU-semigroup

We first give a result from semigroup theory.

Theorem 2 Let $H$ be a Hilbert space, $B : H \to H$ a (possibly unbounded, densely defined) positive, self-adjoint operator. Then $-B$ is the infinitesimal generator of a strongly continuous semigroup of contractions on $H$.

Proof: Self-adjointness and positivity imply that $-B$ is dissipative in Pazy's terminology. Now use Corollary 4.4 in [Pazy], page 15 (which is derived from the Lumer-Phillips theorem, which is, itself, based on the Hille-Yosida theorem). $\Box$

Applying this to $A$ yields the heat semigroup on $L^2(\mathbb{R}^n,\lambda_n)$, applying it to $L^{(n)}$ yields the OU semigroup on $L^2(\mathbb{R}^n,\mu_n)$, and for $L$ we get the OU semigroup on $L^2(\Omega,W)$.

Let's look at the OU semigroup $P_t^{(n)}$ with generator $-L^{(n)}$. Take $f : \mathbb{R}^n \to \mathbb{R}$, say, smooth with compact support. Then it is well known that, using (1.9),
$$(P_t^{(n)}f)(x) = E_xf(x(t)) = \int_{\mathbb{R}^n} f\big(e^{-t}x + \sqrt{1 - e^{-2t}}\,y\big)\,d\mu_n(y)$$
is again a continuous function in $x$. (This property is summarized by saying that $P_t^{(n)}$ is a Feller semigroup.) Similarly, whenever $F : \Omega \to \mathbb{R}$ is nice enough we can set
$$(P_tF)(x) = \int_\Omega F\big(e^{-t}x + \sqrt{1 - e^{-2t}}\,y\big)\,dW(y) \quad (1.12)$$
$$= \int_\Omega F(x\cos\theta + y\sin\theta)\,dW(y), \qquad \cos\theta := e^{-t}.$$
A priori, this is not well-defined for $F \in L^p(\Omega)$ since two $W$-a.s. identical $F$'s could lead to different results. However, this does not happen:

Proposition 3 Let $1 \le p < \infty$. Then $P_t$ is a well-defined (bounded) operator from $L^p(\Omega)$ to $L^p(\Omega)$ (with norm 1).

Proof: Using Jensen and the rotational invariance of Wiener measure, with $R(x,y) = (x\cos\theta + y\sin\theta,\ -x\sin\theta + y\cos\theta)$ and $(F\otimes 1)(x,y) := F(x)$, we have
$$\|P_tF\|^p_{L^p(\Omega)} = \int_\Omega\Big|\int_\Omega F(x\cos\theta + y\sin\theta)\,dW(y)\Big|^p\,dW(x)$$
$$\le \int\!\!\int\big(|F\otimes 1|(R(x,y))\big)^p\,d(W\otimes W)(x,y)$$
$$= \int\!\!\int\big(|F\otimes 1|(x,y)\big)^p\,d(W\otimes W)(x,y)$$
$$= \int\!\!\int |F(x)|^p\,d(W\otimes W)(x,y) = \int|F(x)|^p\,dW(x) = \|F\|^p_{L^p(\Omega)}. \qquad \Box$$

It can be checked that $P_t$, as defined via (1.12) and considered as an operator on $L^2(\Omega)$, coincides with the abstract semigroup provided by the theorem at the beginning of this section. It suffices to check that $P_t$ is a semigroup with infinitesimal generator $-L$, the OU operator, see [Uestuenel] p17.
Remark: $P_t$ is actually more than just a contraction on $L^p$; it is hypercontractive, meaning that it increases the degree of integrability, see also [Uestuenel].
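The Mehler-type formula (1.12) is easy to probe numerically in one dimension. The following Monte Carlo sketch (our own illustration, not from the notes; the point $x$, time $t$, seed and sample size are arbitrary choices) checks the eigenfunction relation $P_tH_3 = e^{-3t}H_3$, consistent with the spectral decomposition of $L$ in section 1.14 below:

import numpy as np

# One-dimensional Mehler representation (P_t f)(x) = E f(e^{-t} x + sqrt(1-e^{-2t}) Z),
# Z ~ N(0,1), applied to the Hermite polynomial H_3(x) = x^3 - 3x.
rng = np.random.default_rng(2)
H3 = lambda x: x**3 - 3 * x
x, t = 1.5, 0.7
Z = rng.standard_normal(10**6)
mc = np.mean(H3(np.exp(-t) * x + np.sqrt(1 - np.exp(-2 * t)) * Z))
print(mc, np.exp(-3 * t) * H3(x))   # both approx -0.1378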
1.12 Some calculus on $(\mathbb{R},\mu)$

From section 1.10,
$$(Lf)(x) := (L^{(1)}f)(x) = xf'(x) - f''(x).$$
Following [Malliavin1], [Malliavin2] in notation, denote by $\partial f = f'$ the differentiation operator and by $\partial^*$ its adjoint on $L^2(\mu)$. By standard IBP,
$$(\partial^*f)(x) = -f'(x) + xf(x).$$
Note that $L = \partial^*\partial$. Define the Hermite polynomials by
$$H_0(x) = 1, \quad H_n = \partial^*H_{n-1} = (\partial^*)^n 1.$$
Using the commutation relation $\partial\partial^* - \partial^*\partial = \mathrm{Id}$, an induction (one-line) proof yields $\partial H_n = nH_{n-1}$. An immediate consequence is
$$LH_n = nH_n.$$
Since $H_n$ is a polynomial of degree $n$, $\partial^mH_n = 0$ when $m > n$; therefore, for $m > n$,
$$\langle H_n, H_m\rangle_{L^2(\mu)} = \langle H_n, (\partial^*)^m 1\rangle = \langle \partial^m H_n, 1\rangle = 0.$$
On the other hand, since $\partial^nH_n = n!$,
$$\langle H_n, H_n\rangle_{L^2(\mu)} = n!,$$
hence $\big\{(n!)^{-1/2}H_n\big\}$ is an orthonormal system, which is known to be complete, see [Malliavin2] p7. Hence, given $f \in L^2(\mu)$ we have
$$f = \sum c_n H_n \quad \text{with } c_n = \frac{1}{n!}\langle f, H_n\rangle.$$
Assume that all derivatives of $f$ are in $L^2(\mu)$, too. Then
$$\langle f, H_n\rangle = \langle f, \partial^*H_{n-1}\rangle = \langle \partial f, H_{n-1}\rangle = \dots = \langle \partial^n f, 1\rangle.$$
Denote this projection on $1$ by $E(\partial^n f)$ and observe that it equals $E((\partial^n f)(X))$ for a standard gaussian $X$. We have
$$f = \sum_{n=0}^\infty \frac{1}{n!}E(\partial^n f)\,H_n. \quad (1.13)$$
Apply this to $f_t(x) = \exp(tx - t^2/2)$ where $t$ is a fixed parameter. Noting $\partial^nf_t = t^nf_t$ and $E(\partial^nf_t) = t^n$ we get
$$\exp(tx - t^2/2) = \sum_{n=0}^\infty \frac{t^n}{n!}H_n(x).$$
Remark: [Malliavin1], [Malliavin2] extend $\partial$, $\partial^*$, $L$ in a straightforward manner to $(\mathbb{R}^{\mathbb{N}}, \mu^{\mathbb{N}})$, which is, in some sense, $(\Omega,W)$ with a fixed ONB in $H$.
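The operators $\partial$ and $\partial^*$ make the Hermite polynomials easy to generate and test numerically. A minimal sketch (our own, not part of the original notes): from $H_n = \partial^*H_{n-1} = xH_{n-1} - H_{n-1}'$ and $\partial H_{n-1} = (n-1)H_{n-2}$ one gets the three-term recursion $H_n = xH_{n-1} - (n-1)H_{n-2}$, and the relation $\langle H_n, H_m\rangle_{L^2(\mu)} = n!\,\delta_{nm}$ can be checked by sampling from $\mu$.

import numpy as np

# Hermite polynomials via H_n = x H_{n-1} - (n-1) H_{n-2}, plus a Monte Carlo
# check of the orthogonality relation <H_n, H_m>_{L^2(mu)} = n! delta_{nm}.
def hermite(n, x):
    h_prev, h = np.ones_like(x), x          # H_0 = 1, H_1 = x
    if n == 0:
        return h_prev
    for k in range(2, n + 1):
        h_prev, h = h, x * h - (k - 1) * h_prev
    return h

rng = np.random.default_rng(3)
X = rng.standard_normal(10**6)              # samples from mu = N(0,1)
print(np.mean(hermite(3, X) * hermite(3, X)))   # approx 3! = 6
print(np.mean(hermite(3, X) * hermite(2, X)))   # approx 0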
1.13 Iterated Wiener-Ito integrals

There is a close link between Hermite polynomials and iterated Wiener-Ito integrals of the form
$$J_n(f) := \int_{\Delta_n} f\,d\beta^n := \int_0^1\Big(\int_0^{t_n}\cdots\Big(\int_0^{t_2} f(t_1,\dots,t_n)\,d\beta_{t_1}\Big)\cdots\Big)\,d\beta_{t_n},$$
(well-defined) for $f \in L^2(\Delta_n)$, where $\Delta_n := \Delta_n(1) := \{0 < t_1 < \dots < t_n < 1\} \subset [0,1]^n$. Note that only integration over such a simplex makes sure that every Ito integration has an adapted integrand. Note that $J_n(f) \in L^2(\Omega)$. A straightforward computation using the Ito isometry shows that for $n \neq m$
$$E(J_n(f)J_m(g)) = 0$$
while
$$E(J_n(f)J_n(g)) = \langle f, g\rangle_{L^2(\Delta_n)}.$$

Proposition 4 Let $h \in H$ with $\|h\|_H = 1$. Let $h^{\otimes n}$ be the $n$-fold product, a (symmetric) element of $L^2([0,1]^n)$, and restrict it to $\Delta_n$. Then
$$n!\,J_n(h^{\otimes n}) = H_n(W(h)). \quad (1.14)$$

Proof: Set
$$M_t := \mathcal{E}_t(g) \quad \text{and} \quad N_t := 1 + \sum_{n=1}^\infty \int_{\Delta_n(t)} g^{\otimes n}\,d\beta^n$$
where $g \in H$. By the above orthogonality relations $N_t$ is seen to be in $L^2$. Moreover, both $Y = M$ resp. $N$ solve the integral equation
$$Y_t = 1 + \int_0^t Y_s\,g(s)\,d\beta_s.$$
By a uniqueness result for SDEs (OK, it's just Gronwall's lemma for the $L^2$-norm of $M_t - N_t$) we see that, $W$-a.s., $M_t = N_t$. Now take $f \in H$ with norm one. Using the above result with $g = \lambda f$, $t = 1$:
$$\exp\Big(\lambda\int_0^1 f\,d\beta - \frac{\lambda^2}{2}\Big) = 1 + \sum_{n=1}^\infty \lambda^n\,J_n(f^{\otimes n}), \quad (1.15)$$
and using the generating function for the Hermite polynomials finishes the proof. $\Box$

A simple geometric corollary of the preceding is that, for $h, g$ both norm-one elements of $H$,
$$E\big(H_n(W(h))\,H_m(W(g))\big) = 0$$
if $n \neq m$, and
$$E\big(H_n(W(h))\,H_n(W(g))\big) = n!\,(\langle h, g\rangle_H)^n.$$
Remark: If it were just for this corollary, an elementary and simple proof is contained in [Nualart].
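The corollary can also be checked by simulation. A small sketch (our own; the correlation value is an arbitrary illustrative choice): since $(W(h), W(g))$ is a centered Gaussian pair with unit variances and covariance $\rho = \langle h, g\rangle_H$, it suffices to sample such a pair directly.

import math
import numpy as np

# Monte Carlo check of E[H_n(W(h)) H_n(W(g))] = n! <h,g>_H^n for unit h, g.
rng = np.random.default_rng(4)
rho, n = 0.6, 3
Z1 = rng.standard_normal(10**6)
Z2 = rho * Z1 + np.sqrt(1 - rho**2) * rng.standard_normal(10**6)
H3 = lambda x: x**3 - 3 * x                  # Hermite polynomial H_3
print(np.mean(H3(Z1) * H3(Z2)))              # approx 1.296
print(math.factorial(n) * rho**n)            # 6 * 0.216 = 1.296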
1.14 The Wiener-Ito Chaos Decomposition

Set
$$C_n := \{J_n(f) : f \in L^2(\Delta_n)\} \quad (n\text{-th Wiener chaos}),$$
a family of closed, mutually orthogonal subspaces of $L^2(\Omega)$.
For $F = \mathcal{E}(h) \in L^2(\Omega)$ we know from the proof of Proposition 4 that
$$F = 1 + \sum_{n=1}^\infty J_n(h^{\otimes n}) \quad \text{(orthogonal sum)}.$$
Less explicitly, this is an orthogonal decomposition of the form
$$F = f_0 + \sum_{n=1}^\infty J_n(f_n)$$
for some sequence of $f_n \in L^2(\Delta_n)$. Clearly, this extends to span$(\mathcal{E})$, and since this span is dense in $L^2(\Omega)$ it further extends to any $F \in L^2(\Omega)$, which is the same as saying that
$$L^2(\Omega) = \bigoplus_{n=0}^\infty C_n \quad \text{(orthogonal)}$$
when setting $C_0$ the subspace of the constants. Indeed, assume there is a non-zero element $G \in (\bigoplus C_n)^\perp$, wlog of norm one. But there is an $F \in \mathrm{span}(\mathcal{E}) \subset \bigoplus C_n$ arbitrarily close - contradiction. This result is called the Wiener-Ito Chaos Decomposition.
Remarks: - A slightly different description of the Wiener chaos:
$$C_n = \text{closure of span}\{J_n(h^{\otimes n}) : \|h\|_H = 1\} = \text{closure of span}\{H_n(W(h)) : \|h\|_H = 1\}. \quad (1.16)$$
The second equality is clear by (1.14). Denote by $B_n$ the r.h.s.; clearly $B_n \subset C_n$. But since span$(\mathcal{E}) \subset \bigoplus B_n$, taking closures yields $\bigoplus B_n = L^2$, hence $B_n = C_n$.
We now turn to the spectral decomposition of the OU operator $L$.

Theorem 5 Let $\pi_n$ denote the orthogonal projection on $C_n$; then
$$L = \sum_{n=1}^\infty n\,\pi_n.$$

Proof: Set $X = W(h)$, $Y = W(k)$ for two norm-one elements of $H$, $a = \langle h, k\rangle$, $F = H_n(X)$. Then
$$E(LF\cdot H_m(Y)) = E\langle DH_n(X), DH_m(Y)\rangle$$
$$= E\langle nH_{n-1}(X)\,h,\ mH_{m-1}(Y)\,k\rangle \quad (\text{using } H_n' = \partial H_n = nH_{n-1})$$
$$= nm\,a\,E(H_{n-1}(X)H_{m-1}(Y))$$
which, see the end of the last section, is $0$ when $n \neq m$ and
$$nm\,a\,(n-1)!\,a^{n-1} = n\cdot n!\,a^n = n\,E(H_n(X)H_m(Y))$$
otherwise, i.e. when $n = m$. By density of the linear span of such $H_n(X)$'s the result follows. $\Box$
Another application of Hermite polynomials is the fine structure of $C_n$. Let $p : \mathbb{N} \to \mathbb{N}_0$ be such that $|p| = \sum_n p(n) < \infty$. Fix an ONB $(e_i)$ of $H$ and set
$$H_p := \prod_n H_{p(n)}(W(e_n)), \quad (1.17)$$
well-defined since $H_0 = 1$ and $p(n) \neq 0$ only finitely often. Set $p! = \prod p(n)!$.

Proposition 6 The set
$$\Big\{\frac{1}{(p!)^{1/2}}H_p : |p| = n\Big\}$$
forms a complete orthonormal set for the $n$-th Wiener chaos $C_n$.

Note that this proposition is true for any choice of ONB in $H$.

Proof: Orthonormality is quickly checked with the orthogonality properties of the $H_n(W(h))$ seen before. Next we show that $H_p \in C_n$. We do induction on $N$, the number of non-trivial factors in (1.17). For $N = 1$ this is a consequence of (1.14). For $N > 1$, $H_p$ splits up as
$$H_p = H_q\cdot H_i \quad \text{with } H_i = H_i(W(e_j))$$
for some $i, j$, where $H_q \in S_2$ is a Wiener polynomial in which $W(e_j)$ does not appear as an argument. Randomness fixed, it follows from the orthonormality of the $e_i$'s that $DH_q \perp e_j$, hence $DH_q \perp DH_i$.
By the induction hypothesis, $H_q \in C_{|q|} = C_{n-i}$. Hence
$$LH_q = (n-i)\,H_q,$$
using the spectral decomposition of the OU operator. By (1.10),
$$L(H_p) = L(H_qH_i) = H_q\,LH_i + H_i\,LH_q - 2\langle DH_q, DH_i\rangle = H_q\,(iH_i) + H_i\,(n-i)H_q = n\,H_p,$$
hence $H_p \in C_n$. Introduce $\tilde C_n$, the closure of the span of all $H_p$'s with $|p| = n$. We saw that $\tilde C_n \subset C_n$ and we want to show equality. To this end, take any $F \in L^2(\Omega,W)$ and set
$$f_k := E\big[F\,\big|\,\sigma(W(e_1),\dots,W(e_k))\big].$$
By martingale convergence, $f_k \to F$ in $L^2$. Furthermore
$$f_k = g_k(W(e_1),\dots,W(e_k))$$
for some $g_k \in L^2(\mathbb{R}^k,\mu_k) = (L^2(\mathbb{R},\mu))^{\otimes k}$. Since the (simple) Hermite polynomials form an ONB for $L^2(\mathbb{R},\mu)$, its $k$-fold tensor product has the ONB
$$\Big\{\frac{1}{(q!)^{1/2}}\prod_{i=1}^k H_{q(i)}(x_i) : \text{all multiindices } q : \{1,\dots,k\} \to \mathbb{N}_0\Big\}.$$
Hence
$$f_k \in \bigoplus_{i=0}^\infty \tilde C_i.$$
Set $f^n_k := \pi_nf_k$; then we still have $\lim_k f^n_k = \pi_nF$, while $f^n_k \in \tilde C_n$ for all $k$. Therefore $\tilde C_n = C_n$ as claimed. $\Box$

Remarks: - Compare this ONB for $C_n$ with (1.16). Choosing $h = e_1, e_2, \dots$ in that line will not span $C_n$. The reason is that $(e_i^{\otimes n})_i$ is not a basis for $H^{\odot n}$, the symmetric tensor product space, whereas the $h^{\otimes n}$ over all unit elements $h$ do span it. For instance, look at $n = 2$. A basis is $(e_i^{\otimes 2})_i$ together with $(e_i\odot e_j)_{i<j}$, and
$$(e_i + e_j)^{\otimes 2} - e_i^{\otimes 2} - e_j^{\otimes 2} = e_i\otimes e_j + e_j\otimes e_i;$$
the last expression equals (up to a constant) $e_i\odot e_j$.
- The link between Hermite polynomials and iterated Wiener-Ito integrals can be extended to this setting. For instance,
$$H_p = H_2(W(e_1))\cdot H_1(W(e_2)) = (\text{some constant})\,J_3(e_1\odot e_1\odot e_2).$$
There is surprisingly little to be found in books about this. Of course, it's contained in Ito's original paper [Ito], but even [Oksendal2] p3.4 refers to that paper when it comes down to it.
1.15 The Stroock-Taylor formula

Going back to the WICD, most authors prove it by an iterated application of the Ito representation theorem, see section 1.7. For instance, [Oksendal2], p1.4, writes this down in detail. Let's do the first step:
$$F = EF + \int_0^1 \varphi_t\,d\beta_t$$
$$= EF + \int_0^1\Big(E(\varphi_t) + \int_0^t \varphi_2(s,t,\cdot)\,d\beta_s\Big)\,d\beta_t$$
$$= EF + \int_0^1 E(\varphi_t)\,d\beta_t + \int_{\Delta_2}\varphi_2(s,t,\omega)\,d\beta_s\,d\beta_t$$
$$= f_0 + J_1(f_1) + \int_{\Delta_2}\varphi_2(s,t,\omega)\,d\beta_s\,d\beta_t$$
when setting $f_0 = E(F)$, $f_1 = E(\varphi)$. It's not hard to see that $\int_{\Delta_2}\varphi_2(s,t,\omega)\,d\beta_s\,d\beta_t$ is orthogonal to $C_0$ and $C_1$ (the same proof as for deterministic integrands - it always boils down to the fact that an Ito integral has mean zero), hence we found the first two $f$'s of the WICD. But we also saw in section 1.7 that $\varphi_t = E(D_tF\,|\,\mathcal{F}_t)$, hence
$$f_1(t) = E(D_tF),$$
$d\lambda(t)$-a.s. and for $F \in \mathbb{D}^{1,2}$. Similarly,
$$f_2(s,t) = E(D^2_{s,t}F)$$
$d\lambda^2(s,t)$-a.s., and so for the higher $f_n$'s, provided all necessary Malliavin derivatives of $F$ exist. We have

Theorem 7 (Stroock-Taylor) Let $F \in \bigcap_k\mathbb{D}^{k,2}$; then the following refined WICD holds:
$$F = EF + \sum_{n=1}^\infty J_n\big(E(D^nF)\big) = EF + \sum_{n=1}^\infty \frac{1}{n!}I_n\big(E(D^nF)\big)$$
where
$$I_n(f) := \int_{[0,1]^n} f\,d\beta^{\otimes n} := n!\,J_n(f)$$
for any $f \in L^2(\Delta_n)$ (or symmetric $f \in L^2([0,1]^n)$); this notation is only introduced here because of its current use in other texts.

Example: Consider $F = f(W(h))$ with $\|h\|_H = 1$ and a smooth function $f$ which, together with all its derivatives, is in $L^2(\mu)$. By iteration,
$$D^nF = (\partial^nf)(W(h))\,h^{\otimes n},$$
hence
$$E(D^nF) = h^{\otimes n}\,E\big((\partial^nf)(W(h))\big) = h^{\otimes n}\,E(\partial^nf)$$
where we use the notation from 1.12, $E(\partial^nf) = \int \partial^nf\,d\mu$. Then
$$J_n\big(E(D^nF)\big) = E(\partial^nf)\,J_n(h^{\otimes n}) = E(\partial^nf)\,\frac{1}{n!}H_n(W(h)),$$
and Stroock-Taylor just says
$$f(W(h)) = E(f) + \sum_{n=1}^\infty \frac{1}{n!}E(\partial^nf)\,H_n(W(h)),$$
which is, unsurprisingly, just (1.13) evaluated at $W(h)$.
Chapter 2

Smoothness of laws

2.1

Proposition 8 Let $F = (F^1,\dots,F^m)$ be an $m$-dimensional r.v. Suppose that for all $k$ and all multiindices $\alpha$ with $|\alpha| = k$ there is a constant $c_k$ such that for all $g \in C^k(\mathbb{R}^m)$
$$|E[\partial_\alpha g(F)]| \le c_k\,\|g\|_\infty. \quad (2.1)$$
Then the law of $F$ has a $C^\infty$ density.

Proof: Let $\mu(dx) = P(F \in dx)$ and $\hat\mu$ its Fourier transform. Fix $u \in \mathbb{R}^m$ and take $g = \exp(i\langle u,\cdot\rangle)$. Then, when $|\alpha| = k$,
$$|u_\alpha|\,|\hat\mu(u)| = |E[\partial_\alpha g(F)]| \le c_k$$
(where $u_\alpha = u_{\alpha_1}\cdots u_{\alpha_k}$). For any integer $l$, by choosing the right $\alpha$'s of order $l$ and maximising the l.h.s. we see that
$$\big(\max_{i=1,\dots,m}|u_i|\big)^l\,|\hat\mu(u)| \le c_l.$$
Hence, at infinity, $\hat\mu(u)$ decays faster than any polynomial in $|u|$. On the other hand, a Fourier transform is bounded (by one); therefore $\hat\mu \in L^1(\mathbb{R}^m)$. By standard Fourier transform results we have
$$\mathcal{F}^{-1}(\hat\mu) =: f \in C_0(\mathbb{R}^m)$$
and since $\hat f = \hat\mu$, by uniqueness, $d\mu = f\,d\lambda_m$. Replacing $\alpha$ by $\alpha + (0,\dots,0,l,0,\dots,0)$ we have
$$|u_i|^l\,|u_\alpha|\,|\hat f(u)| \le c_{k+l}.$$
But since $u_\alpha\hat f(u)$ is (up to a power of $i$) the Fourier transform of $\partial_\alpha f$, we conclude as before that $\partial_\alpha f \in C_0$. $\Box$

Remark: - Having (2.1) only for $k \le m+1$ you can still conclude that $\hat\mu(u) = O(|u|^{-(m+1)})$ and hence $\hat\mu \in L^1$, therefore $d\mu = f\,d\lambda$ for continuous $f$. However, as shown in [Malliavin1], having (2.1) only for $k = 1$, i.e. only involving first derivatives, one still has $d\mu = f\,d\lambda_m$ for some $f \in L^1(\mathbb{R}^m)$.

Now one way to proceed is as follows: for all $i = 1,\dots,m$ let $F^i \in \mathbb{D}^{1,2}$ (for the moment) and take $g : \mathbb{R}^m \to \mathbb{R}$ as above. By an application of the chain rule, $j$ fixed,
$$\langle Dg(F), DF^j\rangle = \Big\langle \sum_i \partial_ig(F)\,DF^i, DF^j\Big\rangle = \sum_i \partial_ig(F)\,\langle DF^i, DF^j\rangle.$$
Introducing the Malliavin covariance matrix
$$\gamma_{ij} = \langle DF^i, DF^j\rangle \quad (2.2)$$
and assuming that
$$\gamma^{-1} \text{ exists } W\text{-a.s.}, \quad (2.3)$$
this yields, a.s.,
$$\partial_ig(F) = \sum_j(\gamma^{-1})_{ij}\,\langle Dg(F), DF^j\rangle = \Big\langle Dg(F), \sum_j(\gamma^{-1})_{ij}\,DF^j\Big\rangle$$
and hence
$$E[\partial_ig(F)] = E\big\langle Dg(F), (\gamma^{-1})_{ij}DF^j\big\rangle = E\big[g(F)\,\delta\big((\gamma^{-1})_{ij}DF^j\big)\big]$$
by definition of the divergence, while hoping that $(\gamma^{-1})_{ij}DF^j \in \mathrm{Dom}\,\delta$. In this case we have
$$E[\partial_ig(F)] \le \|g\|_\infty\,E\big[\big|\delta\big((\gamma^{-1})_{ij}DF^j\big)\big|\big]$$
and we can conclude that $F$ has a density w.r.t. Lebesgue measure $\lambda_m$. With some additional assumptions this outline is made rigorous:[1]
[1] [Nualart], p81.

Theorem 9 Suppose $F = (F^1,\dots,F^m)$, $F^i \in \mathbb{D}^{2,4}$, and $\gamma^{-1}$ exists a.s. Then $F$ has a density w.r.t. $\lambda_m$.

Under much stronger assumptions we have the following result.

Theorem 10 Suppose $F = (F^1,\dots,F^m) \in \mathbb{D}^\infty$ and $\gamma^{-1} \in L^p$ for all $p$; then $F$ has a $C^\infty$ density.
For reference in the following proof,
$$D(g(F)) = \sum_i \partial_ig(F)\,DF^i \quad (2.4)$$
$$L(g(F)) = \sum_i \partial_ig(F)\,LF^i - \sum_{ij}\partial_{ij}g(F)\,\gamma_{ij} \quad (2.5)$$
$$L(FG) = F\,LG + G\,LF - 2\langle DF, DG\rangle; \quad (2.6)$$
the last equation was already seen in (1.10). The middle equation is a simple consequence of the chain rule (2.4) and (1.8). Also, $D$, $L$ and $E$ are extended componentwise to vector- or matrix-valued r.v.; for instance $\langle DF, DF\rangle = \gamma$.

Proof: Since $0 = D(\gamma\gamma^{-1})$ and $0 = L(\gamma\gamma^{-1})$, we have
$$D(\gamma^{-1}) = -\gamma^{-1}(D\gamma)\gamma^{-1}$$
and
$$L(\gamma^{-1}) = -\gamma^{-1}(L\gamma)\gamma^{-1} - 2\big\langle \gamma^{-1}D\gamma,\ \gamma^{-1}(D\gamma)\gamma^{-1}\big\rangle.$$
Take a (scalar-valued) $Q \in \mathbb{D}^\infty$ (at first reading take $Q = 1$) and a smooth function $g : \mathbb{R}^m \to \mathbb{R}$. Then
$$E[\gamma^{-1}\langle DF, D(g\circ F)\rangle\,Q] = E[\gamma^{-1}\langle DF, DF\rangle\,(\nabla g\circ F)\,Q] = E[(\nabla g\circ F)\,Q]. \quad (2.7)$$
We also have
$$L(F\,(g\circ F)) = F\,L(g\circ F) + (LF)\,(g\circ F) - 2\langle DF, D(g\circ F)\rangle.$$
This and the self-adjointness of $L$ yield
$$E[\gamma^{-1}\langle DF, D(g\circ F)\rangle\,Q] = \frac{1}{2}E\big[\gamma^{-1}\{F\,L(g\circ F) + (LF)(g\circ F) - L(F(g\circ F))\}\,Q\big]$$
$$= \frac{1}{2}E\big[(g\circ F)\,L(\gamma^{-1}FQ) + (g\circ F)\,\gamma^{-1}(LF)\,Q - F(g\circ F)\,L(\gamma^{-1}Q)\big]$$
$$= E[(g\circ F)\,R(Q)] \quad (2.8)$$
with the random vector
$$R(Q) = \frac{1}{2}\big[L(\gamma^{-1}FQ) + \gamma^{-1}(LF)\,Q - F\,L(\gamma^{-1}Q)\big].$$
From the vector equality (2.7) = (2.8),
$$E[(\partial_ig\circ F)\,Q] = E\big[(g\circ F)\,\{e_i\cdot R(Q)\}\big],$$
with $i$-th unit vector $e_i$. Now the idea is that, together with the other assumptions, $Q \in \mathbb{D}^\infty$ implies (componentwise) $R(Q) \in \mathbb{D}^\infty$. To see this one starts with Proposition 3, but then some more information about $L$ and its action on $\mathbb{D}^\infty$ is required. We don't go into details here, but see [Bass] and [IW].
The rest is easy: taking $Q = 1$ yields
$$|E[\partial_ig\circ F]| \le c_1\,\|g\|_\infty,$$
and the nice thing is that we can simply iterate: taking $Q = e_j\cdot R(1)$ we get
$$E[\partial_{ji}g\circ F] = E\big[(\partial_ig\circ F)(e_j\cdot R(1))\big] = E\big[(g\circ F)\,e_i\cdot R(e_j\cdot R(1))\big]$$
and one concludes as before. Obviously we can continue by induction. Hence, by the first proposition of this section we get the desired result. $\Box$
Chapter 3

Degenerate Diffusions

3.1 Malliavin Calculus on the d-dimensional Wiener Space

Generalizing the setup of Chapter 1, we call
$$\Omega = C([0,1],\mathbb{R}^d)$$
the $d$-dimensional Wiener space. Under the $d$-dimensional Wiener measure on $\Omega$ the coordinate process becomes a $d$-dimensional Brownian motion, $(\beta^1,\dots,\beta^d)$. The reproducing kernel space is now
$$H = L^2([0,1],\mathbb{R}^d) = L^2[0,1]\oplus\dots\oplus L^2[0,1] \quad (d \text{ copies}).$$
As in Chapter 1, the Malliavin derivative of a real-valued r.v. $X$ can be considered as an $H$-valued r.v. Hence we can write
$$DX = (D^1X,\dots,D^dX).$$
For an $m$-dimensional random variable $X = (X^i)$ set
$$DX = (D^jX^i)_{ij},$$
which appears as an $(m\times d)$-matrix of $L^2[0,1]$-valued r.v. The Malliavin covariance matrix, as introduced in Chapter 2, reads
$$\gamma_{ij} = \langle DX^i, DX^j\rangle_H = \sum_{k=1}^d\langle D^kX^i, D^kX^j\rangle_{L^2[0,1]},$$
or simply
$$\gamma = \langle DX, (DX)^T\rangle_{L^2[0,1]}. \quad (3.1)$$
3.2 The problem

Given vector fields $A_1,\dots,A_d, B$ on $\mathbb{R}^m$, consider the SDE
$$dX_t = A_j(X_t)\,d\beta^j_t + B(X_t)\,dt \quad (3.2)$$
(summation over $j = 1,\dots,d$). For some fixed $t > 0$ (and actually $t \le 1$ due to our choice of $\Omega$) we want to investigate the regularity of the law of $X(t)$, i.e. existence and smoothness of a density with respect to $\lambda_m$ on $\mathbb{R}^m$. We assume all the coefficients to be as nice as we need (smooth, bounded, bounded derivatives etc.). Indeed, the degeneracy we are interested in lies somewhere else: taking all coefficients zero, the law of $X(t)$ is just the Dirac measure at $X(0) = x$; in particular there doesn't exist a density.
3.3 SDEs and Malliavin Calculus, the 1-dimensional case

For simplicity take $m = d = 1$ and consider
$$X_t = x + \int_0^t a(X_s)\,d\beta_s + \int_0^t b(X_s)\,ds. \quad (3.3)$$
Our strategy is to assume[1] that all $X_s$ are in the domain of $D$ and then to bring $D$ under the integrals. To this end recall from section 1.7 that for fixed $s$ and an $\mathcal{F}_s$-measurable r.v. $F$ one has $D_rF = 0$ for $\lambda$-a.e. $r > s$.
[1] For a proof see [IW], p393.
Let $u(s,\omega)$ be some $\mathcal{F}_s$-adapted process, and let $r \le t$. Then
$$D_r\int_0^t u(s)\,ds = \int_0^t D_ru(s)\,ds = \int_r^t D_ru(s)\,ds;$$
the first step can be justified by a Riemann-sum approximation and the closedness of the operator $D$. The stochastic integral is more interesting; we restrict ourselves to a simple adapted process[2] of the form
$$u(t,\omega) = F(\omega)\,h(t)$$
with $h(t) = 1_{(s_1,s_2]}(t)$ and $\mathcal{F}_{s_1}$-measurable $F$. Again, let $r \le t$. Then
$$D_r\int_0^t Fh(s)\,d\beta(s) = D_r\Big(\int_{[0,r)}Fh(s)\,d\beta(s) + \int_{[r,t]}Fh(s)\,d\beta(s)\Big)$$
$$= 0 + D_r\int_0^1 Fh(s)1_{[r,t]}(s)\,d\beta(s)$$
$$= D_r\big(F\,W(h1_{[r,t]})\big)$$
$$= (D_rF)\,W(h1_{[r,t]}) + F\,h(r)$$
$$= \int_0^1(D_rF)\,h(s)1_{[r,t]}(s)\,d\beta(s) + u(r)$$
$$= u(r) + \int_r^t D_ru(s)\,d\beta(s). \quad (*)$$
[2] We already proceeded like this in section 1.9 when computing $\delta(u)$.
Let us comment on this result. First, if it makes you uncomfortable that our only-a.s.-well-defined little $r$ pops up in intervals, rewrite the preceding computation in integrated form, i.e. multiply everything with some arbitrary deterministic $L^2[0,1]$-function $k = k(r)$ and integrate $r$ over $[0,1]$. (Hint: interchange integration w.r.t. $d\beta_s$ and $dr$.)
Secondly, a few words about $(*)$. The reduction from $\int_0^t$ on the l.h.s. to $\int_r^t$ at the end is easy to understand - see the recall above. Next, taking $t = r + \varepsilon$ we can, at least formally, reduce $(*)$ to $u(r)$ alone. Also, the l.h.s. is easily seen to equal $D_r\int_r^t$. That is, when operating $D_r$ on $\int_r^{r+\varepsilon}u\,d\beta$ we somehow create a Dirac point mass $\delta_r(s)$. But that is not surprising! Formally, $D_rY = \langle DY, \delta_r\rangle$, corresponding to a (non-admissible!) perturbation of $\omega$ by a Heaviside function with jump at $r$, say $H(\cdot - r)$, with derivative $\delta_r$. Now, very formally, we interpret $\omega$ as a Brownian path perturbed in direction $H(\cdot - r)$. Taking differentials for use in the stochastic integral we find the Dirac mass $\delta_r$ appearing. (A detailed proof is found in [Oksendal2], Corollary 5.13.)
Back to our SDE: applying these results to (3.3) we get
$$D_rX_t = a(X_r) + \int_r^t D_ra(X_s)\,d\beta_s + \int_r^t D_rb(X_s)\,ds$$
$$= a(X_r) + \int_r^t a'(X_s)\,D_rX(s)\,d\beta_s + \int_r^t b'(X_s)\,D_rX(s)\,ds.$$
Fix $r$ and set $\tilde X := D_rX$. We found the (linear!) SDE
$$d\tilde X_t = a'(X_t)\,\tilde X_t\,d\beta_t + b'(X_t)\,\tilde X_t\,dt, \quad t > r, \quad (3.4)$$
with initial condition $\tilde X_r = a(X_r)$.
3.4 Stochastic Flow, the 1-dimensional case

A similar situation occurs when investigating the sensitivity of (3.3) w.r.t. the initial condition $X(0) = x$. Set
$$Y(t) = \partial_xX(t).$$
(A nice version of) $X(t,x)$ is called the stochastic flow.
A formal computation (see [Bass], p30 for a rigorous proof) gives the same SDE
$$dY_t = a'(X_t)\,Y_t\,d\beta_t + b'(X_t)\,Y_t\,dt, \quad t > 0,$$
and clearly $Y(0) = 1$. Matching this with (3.4) yields
$$D_rX(t) = Y(t)\,Y^{-1}(r)\,a(X(r)). \quad (3.5)$$
Remark: In the multidimensional setting note that, for fixed $r$,
$$D_rX(t) \in \mathbb{R}^{m\times d} \quad \text{while} \quad Y(t) \in \mathbb{R}^{m\times m}.$$
([Bass] actually makes the choice $m = d$ for a simpler exposure.)
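A consistency check of (3.5) on an explicitly solvable example (our own illustration): for geometric Brownian motion, $a(x) = \sigma x$, $b \equiv 0$, we have $X_t = x\exp(\sigma\beta_t - \sigma^2t/2)$ and $Y(t) = \partial_xX_t = X_t/x$. Then (3.5) gives, for $r \le t$,
$$D_rX_t = \frac{X_t}{x}\cdot\frac{x}{X_r}\cdot\sigma X_r = \sigma X_t,$$
in agreement with differentiating the explicit solution in a Cameron-Martin direction: $\frac{d}{d\varepsilon}\big|_0 X_t(\omega + \varepsilon\tilde h) = \sigma X_t\int_0^t h\,d\lambda$, i.e. $D_rX_t = \sigma X_t\,1_{[0,t]}(r)$.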
3.5 SDE/flows in multidimensional setting

Rewrite (3.2) in coordinates
$$dX^i = A^i_k(X)\,d\beta^k + B^i(X)\,dt, \quad i = 1,\dots,m, \quad (3.6)$$
with initial condition $X(0) = x = (x^j) \in \mathbb{R}^m$. Set
$$(Y)_{ij} = \partial_jX^i \equiv \frac{\partial}{\partial x^j}X^i.$$
As before (formally)
$$d\,\partial_jX^i = \partial_lA^i_k\,\partial_jX^l\,d\beta^k + \partial_lB^i\,\partial_jX^l\,dt.$$
To simplify notation, for any vector field $V$ on $\mathbb{R}^m$, considered as a map $\mathbb{R}^m \to \mathbb{R}^m$, we set[3]
$$(\partial V)_{ij} = \partial_jV^i. \quad (3.7)$$
[3] If you know classical tensor calculus it is clear that $\partial_jV^i$ corresponds to a matrix where $i$ represents the lines and $j$ the columns.
This yields the following $(m\times m)$-matrix SDE
$$dY = \partial A_k(X)\,Y\,d\beta^k + \partial B(X)\,Y\,dt, \quad Y(0) = I,$$
and there is no ambiguity in this notation. Note that this is (as before) a linear SDE. We will be interested in the inverse $Z := Y^{-1}$. As a motivation, consider the following 1-dimensional ODE:
$$dy = f(t)\,y\,dt.$$
Clearly $z = 1/y$ satisfies
$$dz = -f(t)\,z\,dt.$$
We can recover the same simplicity in the multidimensional SDE case by using Stratonovich calculus, a first-order stochastic calculus.
3.6 Stratonovich Integrals

3.6.1

Let $M, N$ be continuous semimartingales; define[4]
$$\int_0^t M_s\circ dN_s = \int_0^t M_s\,dN_s + \frac{1}{2}\langle M, N\rangle_t$$
resp.
$$M_t\circ dN_t = M_t\,dN_t + \frac{1}{2}\,d\langle M, N\rangle_t.$$
[4] Do not mix up the bracket with the inner product on Hilbert spaces.
The Ito formula becomes
$$f(M_t) = f(M_0) + \int_0^t f'(M_s)\circ dM_s. \quad (3.8)$$
See [Bass] p27 or any modern account of semimartingales for these results. A special case occurs when $M$ is given by the SDE
$$dM_t = u_t\,d\beta_t + v_t\,dt \quad \text{or} \quad dM_t = u_t\circ d\beta_t + \tilde v_t\,dt.$$
Then
$$M_t\circ d\beta_t = M_t\,d\beta_t + \frac{1}{2}u_t\,dt. \quad (3.9)$$
One could take this as a definition ([Nualart], p21 does this).
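The difference between the two integrals is easy to see numerically. A minimal sketch (our own illustration; grid size and seed are arbitrary): for $\int_0^1\beta\,d\beta$, left-point Riemann sums converge to the Ito integral $(\beta(1)^2-1)/2$, while averaging the endpoint values of $\beta$ (the Stratonovich prescription) yields $\beta(1)^2/2$; the difference is $\frac{1}{2}\langle\beta,\beta\rangle_1 = \frac{1}{2}$.

import numpy as np

# Left-point vs averaged-endpoint Riemann sums for int_0^1 beta dbeta.
rng = np.random.default_rng(5)
n = 10**5
db = rng.standard_normal(n) * np.sqrt(1 / n)     # Brownian increments
b = np.concatenate([[0.0], np.cumsum(db)])       # beta on the grid
ito = np.sum(b[:-1] * db)                        # left-point (Ito) sum
strat = np.sum(0.5 * (b[:-1] + b[1:]) * db)      # Stratonovich sum
print(ito, (b[-1]**2 - 1) / 2)
print(strat, b[-1]**2 / 2)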
3.6.2

Of course there is a multidimensional version of (3.8) (write it down!). For instance, let $V : \mathbb{R}^m \to \mathbb{R}^m$ and $X$ some $m$-dimensional process; then
$$dV(X) = (\partial V)(X)\circ dX. \quad (3.10)$$
It also implies a first-order product rule
$$d(MN) = N\circ dM + M\circ dN$$
where $M, N$ are (real-valued) semimartingales.
For later use we discuss a slight generalization. Let $Y, Z$ be two matrix-valued semimartingales (with dimensions such that $ZY$ makes sense). Define $d(ZY)$ component-wise. Then
$$d(ZY) = (\circ\,dZ)\,Y + Z\circ dY. \quad (3.11)$$
This might look confusing at first glance, but it simply means
$$Z^i_k(t)\,Y^k_j(t) = Z^i_k(0)\,Y^k_j(0) + \int_0^t Y^k_j\circ dZ^i_k + \int_0^t Z^i_k\circ dY^k_j.$$

3.6.3

Let $M, N, O, P$ be semimartingales and
$$dP = N\,dO.$$
Then it is well known that[5]
$$M\,dP = MN\,dO. \quad (3.12)$$
[5] [KS], p145.
A similar formula, less well known, holds for Stratonovich differentials. Let
$$dP = N\circ dO;$$
then
$$M\circ dP = MN\circ dO. \quad (3.13)$$
Proof: The l.h.s. equals $M\,dP + \frac{1}{2}d\langle M, P\rangle = M(N\,dO + \frac{1}{2}d\langle N, O\rangle) + \frac{1}{2}d\langle M, P\rangle$, so the only thing to show is
$$M\,d\langle N, O\rangle + d\langle M, P\rangle = d\langle MN, O\rangle.$$
Now $d\langle M, P\rangle = N\,d\langle M, O\rangle$ ([KS], p143). On the other hand,
$$d(MN) = M\,dN + N\,dM + d(\text{bounded variation})$$
shows $d\langle MN, O\rangle = M\,d\langle N, O\rangle + N\,d\langle M, O\rangle$ (since the bracket kills the bounded variation parts) and we are done. $\Box$
3.7 Some differential geometry jargon

3.7.1 Covariant derivatives

Given two smooth vector fields $V$, $W$ (on $\mathbb{R}^m$) and using (3.7),
$$(\partial V)W = W^j\,\partial_jV^i\,\partial_i,$$
where we follow the differential-geometry usage of denoting the standard basis by $(\partial_1,\dots,\partial_m)$. This simply means that $(\partial V)W$ is a vector whose $i$-th component is $W^j\partial_jV^i$. Also, we recognize directional derivatives (in direction $W$) on the r.h.s. In Riemannian geometry this is known as the covariant derivative[6] of $V$ in direction $W$. A standard notation is
$$\nabla_WV = W^j\,\partial_jV^i\,\partial_i.$$
$\nabla$ is called a connection.
[6] On a general Riemannian manifold there is an additional term due to curvature. Clearly, curvature is zero on $\mathbb{R}^m$.

3.7.2 The Lie Bracket

Let $V, W$ be as before. It is common in differential geometry that a vector $V(x) = (V^i(x))$ is identified with the first-order differential operator
$$V(x) = V^i(x)\,\partial_i|_x.$$
Consider the ODE on $\mathbb{R}^m$ given by
$$dX = V(X)\,dt.$$
It is known[7] that there exists (at least locally) an integral curve. More precisely, for every $x \in \mathbb{R}^m$ there exists some open (time) interval $I_x$ around $0$ and a smooth curve $X_x : I_x \to \mathbb{R}^m$ which satisfies the ODE and the initial condition $X_x(0) = x$. By setting
$$V_t(x) = X_x(t)$$
we obtain a so-called local 1-parameter group. For $t$ fixed, $V_t(\cdot)$ is a diffeomorphism between appropriate open sets. [Warner] p37 proves all this, including existence, on a general manifold.
[7] A simple consequence of the standard existence/uniqueness result for ODEs.
Consider a second ODE, say $dY = W(Y)\,dt$, with local one-parameter group $W_t(\cdot)$. Then, for $t$ small enough everything exists, and a second-order expansion yields
$$W_{-t}\circ V_{-t}\circ W_t\circ V_t(x) - x \sim t^2.$$
Dividing the l.h.s. by $t^2$ and letting $t \to 0$ one obtains a limit in $\mathbb{R}^m$ depending on $x$, say $[V,W](x)$, the so-called Lie bracket. It measures how much the two flows fail to commute (infinitesimally).
Considering $[V,W]$ as a first-order operator one actually finds
$$[V,W] = V\circ W - W\circ V,$$
where the r.h.s. is to be understood as composition of differential operators. Note that the r.h.s. is indeed a first-order operator, since $\partial_{ij} = \partial_{ji}$ (when operating on smooth functions, as here).
It is immediate to check that[8]
$$[V,W] = \nabla_VW - \nabla_WV = (\partial W)V - (\partial V)W.$$
Generally speaking, whenever two vector fields are "mixed together", the Lie bracket is likely to appear.
[8] In Riemannian geometry, the first equation is known as the torsion-free property of a Riemannian connection $\nabla$.

Example: Let $A$ be a vector field. Inspired by section 3.5 consider
$$dX = A(X)\,dt, \quad X(0) = x,$$
and
$$dY = \partial A(X)\,Y\,dt, \quad Y(0) = I.$$
Consider the matrix ODE
$$dZ = -Z\,\partial A(X)\,dt, \quad Z(0) = I. \quad (3.14)$$
By computing $d(ZY) = (-Z\,\partial A(X)\,dt)Y + Z(\partial A(X)\,Y\,dt) = 0$ we see that $Y^{-1}$ exists for all times and $Z = Y^{-1}$. Without special motivation, but for later use, we compute
$$d[Z_tV(X_t)] = (dZ_t)\,V(X_t) + Z_t\,dV(X_t)$$
$$= \big(-Z_t\,\partial A(X_t)\,V(X_t) + Z_t\,\partial V(X_t)\,A(X_t)\big)\,dt$$
$$= Z_t\big((\partial V)(X_t)A(X_t) - (\partial A)(X_t)V(X_t)\big)\,dt$$
$$= Z_t\,[A,V](X_t)\,dt. \quad (3.15)$$
3.8 Our SDEs in Stratonovich form

Recall
$$dX = A_k(X)\,d\beta^k + B(X)\,dt = A_k(X)\circ d\beta^k + A_0(X)\,dt \quad (3.16)$$
with $X(0) = x$. It is easy to check that
$$A_0^i = B^i - \frac{1}{2}A^j_k\,\partial_jA^i_k, \quad i = 1,\dots,m.$$
In the notations introduced in the last two sections,
$$A_0 = B - \frac{1}{2}(\partial A_k)A_k = B - \frac{1}{2}\nabla_{A_k}A_k.$$
With $Y$ defined as in section 3.5 we obtain
$$dY = \partial A_k(X)\,Y\circ d\beta^k + \partial A_0(X)\,Y\,dt, \quad Y(0) = I,$$
and $Z = Y^{-1}$ exists for all times and satisfies a generalized version of (3.14),
$$dZ = -Z\,\partial A_k(X)\circ d\beta^k - Z\,\partial A_0(X)\,dt, \quad Z(0) = I. \quad (3.17)$$
The proof goes along (3.14), using (3.11): since we already discussed the deterministic version we restrict to the case where $A_0 \equiv 0$. Then
$$d(ZY) = (dZ)\,Y + Z\circ dY = -Z\,\partial A_k(X)\,Y\circ d\beta^k + Z\,\partial A_k(X)\,Y\circ d\beta^k = 0.$$
(References for this and the next section are [Bass] p199-201, [Nualart] p109-113 and [IW] p393.)
3.9 The Malliavin Covariance Matrix

Define the $(m\times d)$ matrix
$$\sigma = (A_1 \,|\, \dots \,|\, A_d).$$
Then a generalization of (3.5) holds (see [Nualart] p109 for details):
$$D_rX(t) = Y(t)\,Y^{-1}(r)\,\sigma(X_r) = Y(t)\,Z(r)\,\sigma(X_r),$$
and $D_rX(t)$ appears as a (random) $\mathbb{R}^{m\times d}$-matrix, as already remarked at the end of section 3.4. Fix $t$ and write $X = X(t)$. From (3.1), the Malliavin covariance matrix equals
$$\gamma = \gamma_t = \int_0^1 D_rX\,(D_rX)^T\,dr = Y(t)\Big[\int_0^t Z(r)\,\sigma(X_r)\,\sigma^T(X_r)\,Z^T(r)\,dr\Big]\,Y^T(t). \quad (3.18)$$
3.10 Absolute continuity under Hormander's condition

We need a generalization of (3.15).

Lemma 11 Let $V$ be a smooth vector field on $\mathbb{R}^m$. Let $X$ and $Z$ be the processes given by the Stratonovich SDEs (3.16) and (3.17). Then
$$d(Z_tV(X_t)) = Z_t[A_k,V](X_t)\circ d\beta^k + Z_t[A_0,V](X_t)\,dt \quad (3.19)$$
$$= Z_t[A_k,V](X_t)\,d\beta^k + Z_t\Big(\frac{1}{2}[A_k,[A_k,V]] + [A_0,V]\Big)(X_t)\,dt \quad (3.20)$$
(summation over $k = 1,\dots,d$).

First observe that the second equality is a simple application of (3.9) and (3.19) with $V$ replaced by $[A_k,V]$. To see the first equality one could just point at (3.15) and argue with "first-order Stratonovich calculus". Here is a rigorous
Proof: Since the deterministic case was already considered in (3.15), we take w.l.o.g. $A_0 \equiv 0$. Using (3.10) and (3.11) we find
$$d(ZV(X)) = (dZ)\,V(X) + Z\circ dV(X) = -(Z\,\partial A_k\,V)\circ d\beta^k + (Z\,(\partial V)A_k)\circ d\beta^k = Z\,[A_k,V]\big|_X\circ d\beta^k. \qquad \Box$$
If you don't like the Stratonovich differentials, a (straightforward but quite tedious) computation via standard Ito calculus is given in [Nualart], p113.

Corollary 12 [9] Let $\tau$ be a stopping time and $y \in \mathbb{R}^m$ such that
$$\langle Z_tV(X_t), y\rangle_{\mathbb{R}^m} \equiv 0 \quad \text{for } t \in [0,\tau].$$
Then, for $i = 0,1,\dots,d$,
$$\langle Z_t[A_i,V](X_t), y\rangle_{\mathbb{R}^m} \equiv 0 \quad \text{for } t \in [0,\tau].$$
[9] Compare [Bell], p75.

Proof: Let's prove
$$Z_tV(X_t) \equiv 0 \implies Z_t[A_i,V](X_t) \equiv 0.$$
(The proof of the actual statement goes along the same lines.) First, the assumption implies that
$$Z_t[A_k,V](X_t)\,d\beta^k + Z_t\Big(\frac{1}{2}[A_k,[A_k,V]] + [A_0,V]\Big)(X_t)\,dt \equiv 0 \quad \text{for } t \in [0,\tau].$$
By uniqueness of the semimartingale decomposition into a (local) martingale part and a bounded variation part we get (always for $t \in [0,\tau]$)
$$Z_t[A_k,V](X_t) \equiv 0 \quad \text{for } k = 1,\dots,d$$
and
$$Z_t\Big(\frac{1}{2}[A_k,[A_k,V]] + [A_0,V]\Big)(X_t) \equiv 0.$$
By iterating this argument on the first relation,
$$Z_t[A_k,[A_j,V]](X_t) \equiv 0 \quad \text{for } k,j = 1,\dots,d,$$
and together with the second relation we find
$$Z_t[A_0,V](X_t) \equiv 0$$
and we are done. $\Box$
In the following, "range" denotes the image $\Theta(\mathbb{R}^m) \subset \mathbb{R}^m$ of some (random, time-dependent) $m\times m$-matrix $\Theta$.

Theorem 13 Recalling $X(0) = x$, for any $t > 0$ it holds that
$$\mathrm{span}\big\{A_1|_x,\dots,A_d|_x,\ [A_j,A_k]|_x,\ [[A_j,A_k],A_l]|_x,\ \dots;\ j,k,l = 0,\dots,d\big\} \subset \mathrm{range}\,\gamma_t \quad \text{a.s.}$$

Proof: For all $s \le t$ define
$$R_s = \mathrm{span}\,\{Z(r)A_i(X_r) : r \in [0,s],\ i = 1,\dots,d\}$$
and
$$R = R(\omega) = \bigcap_{s>0}R_s.$$
We claim that $R_t = \mathrm{range}\,\gamma_t$. From (3.18) it follows that
$$\mathrm{range}\,\gamma_t = \mathrm{range}\int_0^t Z(r)\,\sigma(X_r)\,\sigma^T(X_r)\,Z^T(r)\,dr \quad (3.21)$$
and since, for any $r \le t$ fixed, $\mathrm{span}\{Z(r)A_i(X_r) : i = 1,\dots,d\} = \mathrm{range}\,Z(r)\sigma(X_r) \supset \mathrm{range}\,Z(r)\sigma(X_r)\sigma^T(X_r)Z^T(r)$, the inclusion $\mathrm{range}\,\gamma_t \subset R_t$ is clear. On the other hand take some $v \in \mathbb{R}^m$ orthogonal to $\mathrm{range}\,\gamma_t$. Clearly
$$v^T\gamma_tv = 0.$$
Using (3.21) we actually have
$$\int_0^t|v^TZ(r)\,\sigma(X_r)|^2_{\mathbb{R}^d}\,dr = \sum_{k=1}^d\int_0^t|v^TZ(r)A_k(X_r)|^2\,dr = 0.$$
Since the diffusion paths are continuous, the whole integrand is continuous, and we deduce that, for all $k$ and all $r \le t$,
$$v \perp Z(r)A_k(X_r).$$
We showed $(\mathrm{range}\,\gamma_t)^\perp \subset (R_t)^\perp$, and hence the claim is proved.
Now, by Blumenthal's 0-1 law there exists a (deterministic) set $\tilde R$ such that $\tilde R = R(\omega)$ a.s. Suppose that $y \in \tilde R^\perp$. Then a.s. there exists a stopping time $\tau > 0$ such that $R_s = \tilde R$ for $s \in [0,\tau]$. This means that for all $i = 1,\dots,d$ and for all $s \in [0,\tau]$
$$\langle Z_sA_i(X_s), y\rangle_{\mathbb{R}^m} = 0,$$
or simply $y \perp Z_sA_i(X_s)$. Moreover, by iterating Corollary 12 we get
$$y \perp Z_s[A_j,A_k](X_s),\ Z_s[[A_j,A_k],A_l](X_s),\ \dots$$
for all $s \in [0,\tau]$. Calling $S$ the span appearing on the l.h.s. of the statement of the theorem and using the last result at $s = 0$ (where $Z_0 = I$ and $X_0 = x$), we see that $y \in S^\perp$. So we showed $S \subset \tilde R$. On the other hand, it is clear that a.s. $\tilde R \subset R_t = \mathrm{range}\,\gamma_t$, as we saw earlier. The proof is finished. $\Box$
Combining this with Theorem 9 we conclude

Theorem 14 Let $A_0,\dots,A_d$ be smooth vector fields (satisfying certain boundedness conditions[10]) on $\mathbb{R}^m$ which satisfy Hormander's condition (H1), that is,[11]
$$A_1|_x,\dots,A_d|_x,\ [A_j,A_k]|_x,\ [[A_j,A_k],A_l]|_x,\ \dots;\quad j,k,l = 0,\dots,d \quad (3.22)$$
span the whole space $\mathbb{R}^m$. Equivalently we can write
$$\mathrm{Lie}\,\{A_1|_x,\dots,A_d|_x,\ [A_1,A_0]|_x,\dots,[A_d,A_0]|_x\} = \mathbb{R}^m. \quad (3.23)$$
Fix $t > 0$ and let $X_t$ be the solution of the SDE
$$dX_t = \sum_{k=1}^dA_k(X_t)\circ d\beta^k_t + A_0(X_t)\,dt, \quad X(0) = x.$$
Then the law of $X(t)$, i.e. the measure $P[X(t) \in dy]$, has a density w.r.t. Lebesgue measure on $\mathbb{R}^m$.
[10] Bounded with bounded derivatives will do - we have to guarantee existence and uniqueness of $X$ and $Y$ as solutions of the corresponding SDEs.
[11] Note that $A_0|_x$ alone is not contained in the list, while it does appear in all the brackets.
3.11 Smoothness under Hormander's condition

Under essentially the same hypothesis as in the last theorem[12] one actually has a smooth density of $X(t)$, i.e. $\in C^\infty(\mathbb{R}^m)$. The idea is clearly to use Theorem 10, but there is some work to do. We refer to [Norris], [Nualart], [Bass] and [Bell].
[12] One requires that the vector fields have bounded derivatives of all orders, since higher-order analogues of $Y$ come into play.
3.12 The generator

It is well known that the generator of a (Markov) process given by the SDE
$$dX = A_k(X)\circ d\beta^k + A_0(X)\,dt = A_k(X)\,d\beta^k + B(X)\,dt = \sigma\,d\beta + B(X)\,dt \quad (3.24)$$
is the second-order differential operator
$$L = \frac{1}{2}E^{ij}\,\partial_{ij} + B^i\,\partial_i \quad (3.25)$$
with $(m\times m)$-matrix $E = \sigma\sigma^T$ (or $E^{ij} = \sum_{k=1}^dA^i_kA^j_k$ in coordinates). Identifying a vector field, say $V$, with a first-order differential operator, the expression $V^2 = V\circ V$ makes sense as a second-order differential operator. In coordinates,
$$V^i\partial_i(V^j\partial_j) = V^iV^j\,\partial_{ij} + V^j(\partial_jV^i)\,\partial_i.$$
Note that the last term on the r.h.s. is the vector $(\partial V)V = \nabla_VV$. Replacing $V$ by $A_k$ and summing over all $k$ we see that
$$E^{ij}\partial_{ij} = \sum_{k=1}^d A_k^2 - \nabla_{A_k}A_k.$$
We recall (see section 3.8) that $A_0 = B - \frac{1}{2}\sum\nabla_{A_k}A_k$. Hence
$$L = \frac{1}{2}\sum_{k=1}^dA_k^2 + A_0. \quad (3.26)$$
Besides giving another justification of the Stratonovich calculus, it is important to notice that this "sum of squares" form is invariant under coordinate transformations, hence a suitable operator for analysis on manifolds.
3.13

Example 1 (bad): Given two vector fields on $\mathbb{R}^2$ (in first-order differential operator notation)
$$A_1 = x_1\partial_1 + \partial_2, \quad A_2 = \partial_2,$$
set
$$L = \frac{1}{2}(A_1^2 + A_2^2).$$
Expanding,
$$(x_1\partial_1 + \partial_2)^2 = x_1^2\,\partial_{11} + x_1\,\partial_1 + 2x_1\,\partial_{12} + \partial_{22},$$
yields
$$L = \frac{1}{2}E^{ij}\partial_{ij} + B^i\partial_i$$
with
$$E = \begin{pmatrix} x_1^2 & x_1 \\ x_1 & 2 \end{pmatrix} \quad \text{and} \quad B = (x_1/2,\ 0)^T.$$
Now $E = \sigma\sigma^T$ with
$$\sigma = \begin{pmatrix} x_1 & 0 \\ 1 & 1 \end{pmatrix}$$
and the associated diffusion process is
$$dX_t = \sigma(X_t)\,d\beta_t + B(X_t)\,dt = A_1(X_t)\,d\beta^1_t + A_2(X_t)\,d\beta^2_t + B(X_t)\,dt.$$
We see that when we start from the $x_2$-axis, i.e. on $\{x_1 = 0\}$, there is no drift, $B \equiv 0$, and both Brownian motions push us around along the direction $x_2$ only; therefore there is no chance of ever leaving this axis again. Clearly, in such a situation, the law of $X(t)$ is singular with respect to Lebesgue measure on $\mathbb{R}^2$.
To check Hormander's condition H1 compute
$$[A_1, A_2] = (x_1\partial_1 + \partial_2)\,\partial_2 - \partial_2\,(x_1\partial_1 + \partial_2) = 0;$$
therefore the Lie algebra generated by $A_1$ and $A_2$ simply equals span$\{A_1, A_2\}$ and is not all of $\mathbb{R}^2$ when evaluated on the degenerate set $\{x_1 = 0\}$ - exactly as expected.
Example 2 (good): Same setting but
$$A_1 = x_2\partial_1 + \partial_2, \quad A_2 = \partial_2.$$
Again,
$$L = \frac{1}{2}(A_1^2 + A_2^2) = \frac{1}{2}E^{ij}\partial_{ij} + B^i\partial_i.$$
Similarly we find
$$dX_t = A_1(X_t)\,d\beta^1_t + A_2(X_t)\,d\beta^2_t + B(X_t)\,dt$$
with drift $B = (1/2,\ 0)^T$.
The situation looks similar: on the $x_1$-axis, where $\{x_2 = 0\}$, we have $A_1 = A_2$, therefore diffusion happens in the $x_2$-direction only. However, when we start at $\{x_2 = 0\}$ we are pushed in the $x_2$-direction and hence immediately leave the degenerate area.
To check Hormander's condition H1 compute
$$[A_1, A_2] = (x_2\partial_1 + \partial_2)\,\partial_2 - \partial_2\,(x_2\partial_1 + \partial_2) = -\partial_1.$$
See that
$$\mathrm{span}(A_2, [A_1, A_2]) = \mathbb{R}^2$$
at all points, and our Theorem 14 applies.
Example 3 (How many driving BMs?):
Consider the $m = 2$-dimensional process driven by one BM ($d = 1$),
$$dX^1 = d\beta, \quad dX^2 = X^1\,dt.$$
From this extract $A_1 = \partial_1$, drift $A_0 = x_1\partial_2$, and since $[A_1, A_0] = \partial_2$, Hormander's condition holds at all points of $\mathbb{R}^2$. Actually, it is an easy exercise to see that (for $X(0) = 0$) $(X^1, X^2)$ is a zero-mean Gaussian process with covariance matrix
$$\begin{pmatrix} t & t^2/2 \\ t^2/2 & t^3/3 \end{pmatrix};$$
indeed $E[\beta_t\int_0^t\beta_s\,ds] = \int_0^t s\,ds = t^2/2$ and $E[(\int_0^t\beta_s\,ds)^2] = \int_0^t\!\int_0^t(r\wedge s)\,dr\,ds = t^3/3$. Hence we can write down explicitly a density with respect to 2-dimensional Lebesgue measure. Generally, one BM together with the right drift is enough to have a density.
Example 4 (Is Hormander's condition necessary?):
No! Take $f : \mathbb{R} \to \mathbb{R}$ smooth, bounded etc. such that $f^{(n)}(0) = 0$ for all $n \ge 0$ (in particular $f(0) = 0$) but $f \neq 0$ away from the origin - think of $f(x) = e^{-1/x^2}$ - and look at
$$2L = (\partial_1)^2 + \big(f(x_1)\,\partial_2\big)^2,$$
as arising from $m = d = 2$, $A_1 = \partial_1$, $A_2 = f(x_1)\partial_2$. Check that $A_2, [A_1,A_2], \dots$ are all $0$ when evaluated at $x_1 = 0$ (simply because the Lie brackets make all derivatives of $f$ appear). Hence Hormander's condition is not satisfied when starting from the degenerate region $\{x_1 = 0\}$. On the other hand, due to $A_1$ we will immediately leave the degenerate region, and hence there is a density (the same argument as in Example 2).
37
Chapter 4
Hypelliptic PDEs
4.1

Let $V_0, \dots, V_d$ be smooth vector fields on some open $U \subset \mathbb{R}^n$, and let $c$ be a smooth function on $U$. Define the second order differential operator (where $c$ operates by multiplication)

$$ G := \sum_{k=1}^d V_k^2 + V_0 + c. $$

Let $f, g \in \mathcal{D}'(U)$, and assume

$$ Gf = g $$

in the distributional sense, which means (by definition)

$$ \langle f, G^*\phi \rangle = \langle g, \phi \rangle $$

for all test-functions $\phi \in \mathcal{D}(U)$. We call the operator $G$ hypoelliptic if, for all open $V \subset U$,

$$ g|_V \in C^\infty(V) \implies f|_V \in C^\infty(V). $$
Hörmander's Theorem, as proved in [Kohn], states:

Theorem 15 Assume

$$ \mathrm{Lie}\,\{V_0|_y, \dots, V_d|_y\} = \mathbb{R}^n $$

for all $y \in U$. Then the operator $G$ as given above is hypoelliptic.

Remark: An example showing that Hörmander's Theorem is a sufficient condition for hypoellipticity but not a necessary one goes along Example 4 from the last chapter.
4.2

Take $X$ as in section (3.8), take $U = (0,\infty) \times \mathbb{R}^m$ and let $\phi \in \mathcal{D}(U)$. For $T$ large enough

$$ E[\phi(T, X_T)] = E[\phi(0, X_0)] = 0, $$

hence, by Ito's formula,

$$ 0 = E\int_0^T (\partial_t + L)\phi(t, X_t)\,dt. $$
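In full, the Ito-formula step behind this identity reads (a standard expansion, added here for completeness; $\nabla_y$ denotes the gradient in the space variables):

$$ \phi(T, X_T) - \phi(0, X_0) = \int_0^T (\partial_t + L)\phi(t, X_t)\,dt + \int_0^T \nabla_y\phi(t, X_t)\,\sigma(X_t)\,d\beta_t, $$

and the stochastic integral has zero expectation since $\phi$ has compact support, so its integrand is bounded.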
By Fubini and $T \to \infty$ this implies

$$ 0 = \int_0^\infty\!\!\int_{\mathbb{R}^m} (\partial_t + L)\phi(t, y)\, p_t(dy)\,dt = \int_0^\infty\!\!\int_{\mathbb{R}^m} \psi(t, y)\, p_t(dy)\,dt $$

for $\psi \in \mathcal{D}(U)$ as defined through the last equation. This also reads

$$ 0 = \langle \mu, (\partial_t + L)\phi \rangle = \langle \mu, \psi \rangle $$

for some distribution $\mu \in \mathcal{D}'(U)$.(1) In distributional sense this writes

$$ (\partial_t + L)^*\mu = (-\partial_t + L^*)\mu = 0, \qquad (4.1) $$

saying that $\mu$ satisfies the forward Fokker-Planck equation. If we can guarantee that $-\partial_t + L^*$ is hypoelliptic then, by Hörmander's theorem, there exists $p(t, y)$ smooth in both variables s.t.

$$ \langle \mu, \phi \rangle = \int_{(0,\infty)\times\mathbb{R}^m} p(t, y)\,\phi(t, y)\,dt\,dy = \int_{(0,\infty)\times\mathbb{R}^m} \phi(t, y)\, p_t(dy)\,dt. $$

This implies

$$ p_t(dy) = p(t, y)\,dy $$

for $p(t, y)$ smooth on $(0,\infty) \times \mathbb{R}^m$.(2)

(1) The distribution $\mu$ is also represented by the (finite-on-compacts) measure given by the semi-direct product of the kernel $p_s(dy)$ and Lebesgue measure $ds = d\lambda(s)$.
(2) Note that the smoothness conclusion via Malliavin calculus doesn't say anything about smoothness in $t$, i.e. our conclusion here is stronger.
4.3

We need sufficient conditions to guarantee the hypoellipticity of $G = -\partial_t + L^*$ as operator on $U = (0,\infty) \times \mathbb{R}^m \subset \mathbb{R}^n$ with $n = m + 1$.

Lemma 16 Given a first order differential operator $V = v^i\partial_i$, its adjoint is given by

$$ V^* = -(V + c_V) $$

where $c_V = \partial_i v^i$ is a scalar-field acting by multiplication.

Proof: Easy. $\Box$
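For the reader's convenience, the one-line integration by parts behind the lemma (for $f, g \in \mathcal{D}$; no boundary terms appear since the test functions have compact support):

$$ \langle Vf, g\rangle = \int v^i(\partial_i f)\,g\,dx = -\int f\,\partial_i(v^i g)\,dx = -\int f\,\bigl(v^i\partial_i g + (\partial_i v^i)g\bigr)\,dx = \langle f, -(V + c_V)g\rangle. $$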
As a corollary,

$$ (V^2)^* = (V^*)^2 = (c_V + V)^2 = V^2 + 2c_V V + c $$
for some scalar-field $c$. For $L$ as given in (3.26) this implies

$$ L^* = \frac{1}{2}\sum_{k=1}^d A_k^2 - \Bigl(A_0 - \sum_k c_{A_k} A_k\Bigr) + c $$

for some (different) scalar-field $c$. Defining

$$ \tilde{A}_0 = A_0 - \sum_k c_{A_k} A_k \qquad (4.2) $$

this reads

$$ L^* = \frac{1}{2}\sum_{k=1}^d A_k^2 - \tilde{A}_0 + c. $$
We can trivially extend vector fields on $\mathbb{R}^m$ to vector fields on $U = (0,\infty) \times \mathbb{R}^m$ (time-independent vector fields). From the differential-operator point of view it just means that we are acting only on the space-variables and not on $t$. Then

$$ G = \frac{1}{2}\sum_{k=1}^d A_k^2 - (\tilde{A}_0 + \partial_t) + c $$

is an operator on $U$, in Hörmander form as needed. Define the vector

$$ \tilde{A} = \tilde{A}_0 + \partial_t \in \mathbb{R}^n. $$

(As a vector: think of having a $1$ in the $0$th position (time), then use the coordinates of $\tilde{A}_0$ to fill up positions $1$ to $m$.) Hence Hörmander's (sufficient) condition for $G$ being hypoelliptic reads

$$ \mathrm{Lie}\,\{A_1|_y, \dots, A_d|_y, \tilde{A}|_y\} = \mathbb{R}^n \qquad (4.3) $$
for all $y \in U$. Note that for $k = 1, \dots, d$

$$ [A_k, \partial_t] = -(\partial_t A_k^i)\,\partial_i = 0 $$

since the $A_k^i$ are functions of space only. It follows that

$$ [A_k, \tilde{A}] = [A_k, \tilde{A}_0] $$

(we abuse notation here: the bracket on the l.h.s. is taken in $\mathbb{R}^n$, resulting in a vector with no component in $t$-direction, which is therefore identified with the $\mathbb{R}^m$-vector on the r.h.s., the result of the bracket-operation in $\mathbb{R}^m$), and similarly no higher bracket will yield any component in $t$-direction. From this it follows that (4.3) is equivalent to

$$ \mathrm{Lie}\,\{A_1|_y, \dots, A_d|_y, [A_1, \tilde{A}_0]|_y, \dots, [A_d, \tilde{A}_0]|_y\} = \mathbb{R}^m \qquad (4.4) $$

for all $y \in \mathbb{R}^m$. Using (4.2) we can replace $\tilde{A}_0$ in condition (4.4) by $A_0$ without changing the spanned Lie-algebra. We summarize:
Theorem 17 Assume that Hörmander's condition (H2) holds:

$$ \mathrm{Lie}\,\{A_1|_y, \dots, A_d|_y, [A_1, A_0]|_y, \dots, [A_d, A_0]|_y\} = \mathbb{R}^m \qquad (4.5) $$

for all $y \in \mathbb{R}^m$. Then the law of the process $X_t$ has a density $p(t, y)$ which is smooth on $(0,\infty) \times \mathbb{R}^m$.
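As an illustration (a hedged, self-contained sympy sketch with ad-hoc helper names, not part of the original notes), condition (H2) for Example 3 of the last chapter can be checked mechanically:

```python
# H2 for dX1 = dbeta, dX2 = X1 dt, i.e. A_1 = d_1 and drift A_0 = x1 d_2.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
X = [x1, x2]

def lie_bracket(V, W):
    """Coordinates of [V, W] for vector fields given as coefficient lists."""
    return [sum(V[j] * sp.diff(W[i], X[j]) - W[j] * sp.diff(V[i], X[j])
                for j in range(2))
            for i in range(2)]

A1 = [1, 0]     # d_1
A0 = [0, x1]    # x1 d_2
br = lie_bracket(A1, A0)
print(br)                              # [0, 1], i.e. d_2
print(sp.Matrix([A1, br]).rank())      # 2: H2 holds at every point of R^2
```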
Remarks: - Compare conditions H1 and H2, see (3.23) and (4.5). The only difference is that H2 is required at all points while H1 only needs to hold at $x = X(0)$. (Hence H2 is the stronger condition.)
- Using H1 (i.e. Malliavin's approach) we don't get (a priori) information about smoothness in $t$.
- Neither H1 nor H2 allows $A_0$ (alone!) to help out with the span. The intuitive meaning is clear: $A_0$ alone represents the drift, hence doesn't cause any diffusion, which is the origin of a density for the process $X_t$.
- We identified the distribution $\mu$ as uniquely associated to the (smooth) function $p(t, y) = p(t, y; x)$. Hence from (4.1)

$$ \partial_t p = L^* p \qquad (L^* \text{ acts on } y) $$

and $p(0, dy)$ is the Dirac measure at $x$. All that is usually summarized by saying that $p$ is a fundamental solution of the above parabolic PDE, and our theorem gives smoothness results for it.
- Let $\sigma = (A_1 | \dots | A_d)$ and assume that $E = \sigma\sigma^T$ is uniformly elliptic. We claim that in this case the vectors $\{A_1, \dots, A_d\}$ already span $\mathbb{R}^m$ (at all points), so that Hörmander's condition is always satisfied.
Proof: Assume $v \in \mathrm{span}\{A_1, \dots, A_d\}^\perp$. Then

$$ 0 = \sum_k \langle v, A_k\rangle^2 = \sum_k |v_i A_k^i|^2 = v_i \Bigl(\sum_k A_k^i A_k^j\Bigr) v_j = v^T E v. $$

Since $E$ is symmetric and positive definite, we see that $v = 0$. $\Box$
Bibliography

[Bass] Bass, Diffusions and Elliptic Operators, Springer, 1997.
[Bell] Bell, The Malliavin Calculus, Pitman Monographs 34, 1987.
[BH] Bouleau, Hirsch, Dirichlet Forms ..., Springer, 1997.
[DaPrato] Da Prato, Zabczyk, Stochastic Equations in Infinite Dimensions, Cambridge University Press, 1992.
[Evans] L.C. Evans, Partial Differential Equations, AMS.
[IW] Ikeda, Watanabe, Stochastic Differential Equations and Diffusion Processes, North-Holland, 1989.
[Ito] K. Ito, Multiple Wiener integral, J. Math. Soc. Japan 3 (1951), 157-169.
[KS] Karatzas, Shreve, Brownian Motion and Stochastic Calculus, 2nd Ed., Springer.
[Kohn] J.J. Kohn, Pseudo-differential Operators and Hypoellipticity, Proc. Symp. Pure Math. 23, A.M.S. (1973), 61-69.
[Malliavin1] P. Malliavin, Integration and Probability, Springer.
[Malliavin2] P. Malliavin, Stochastic Analysis, Springer.
[Norris] Norris, Simplified Malliavin Calculus.
[Nualart] D. Nualart, The Malliavin Calculus and Related Topics, Springer.
[Ocone] Ocone, A guide to stochastic calculus of variations, LNM 1316, 1987.
[Oksendal1] B. Oksendal, Stochastic Differential Equations, Springer, 1995.
[Oksendal2] B. Oksendal, An Introduction to Malliavin Calculus with Applications to Economics, Lecture Notes, 1997, available on www.nhh.no/for/dp/1996/
[Pazy] A. Pazy, Semigroups of Linear Operators and Applications to Partial Differential Equations, Springer, 1983.
[RR] Renardy, Rogers, Introduction to Partial Differential Equations, Springer.
[RU] W. Rudin, Functional Analysis, McGraw-Hill, 1991.
[RY] Revuz, Yor, Continuous Martingales and Brownian Motion, Springer.
[Stein] Stein, Singular Integrals, Princeton UP, 1970.
[Sugita] H. Sugita, Sobolev spaces of Wiener functionals and Malliavin's calculus, J. Math. Kyoto Univ. 25-1 (1985), 31-48.
[Uestuenel] A.S. Uestuenel, An Introduction to Analysis on Wiener Space, Springer LNM 1610.
[Warner] Warner, Foundations of Differentiable Manifolds and Lie Groups, Springer.
[Williams] Williams, To begin at the beginning..., in: Stochastic Integrals, Lecture Notes in Math. 851 (1981), 1-55.