
Stochastic Calculus I.

Discrete Time
Semyon Malamud
October 24, 2016

Contents

1 Densities, Expectations, and Moment Generating Functions
  1.1 Uniform distribution
  1.2 Poisson Distribution
  1.3 Binomial Distribution
  1.4 Normal Distribution
  1.5 Cauchy distribution
  1.6 Γ-distribution
  1.7 Exponential Distribution

2 Conditional Densities

3 Conditional Expectation

4 Markov Processes
  4.1 Transition probabilities and Kolmogorov Equations
  4.2 Hitting Times for a Markov Chain

5 Stopped Processes and Optional Sampling for Martingales

1 Densities, Expectations, and Moment Generating Functions
Two basic properties:

1. If X has a density p(x), then the variable Y = βX + α (with β ≠ 0) has the density

\[ p_Y(y) = |\beta|^{-1}\, p\big( (y - \alpha)/\beta \big) . \]

2. If X_1 and X_2 are independent with densities p_{X_1}(x) and p_{X_2}(x), then Y = X_1 + X_2 has the density

\[ p_Y(y) = \int_{\mathbb{R}} p_{X_1}(y - x)\, p_{X_2}(x)\, dx = \int_{\mathbb{R}} p_{X_2}(y - x)\, p_{X_1}(x)\, dx , \]

and the moment-generating functions multiply: if M_{X_1}(α) = E[e^{αX_1}] and M_{X_2}(α) = E[e^{αX_2}] are the moment-generating functions of the independent variables X_1 and X_2, then

\[ M_{X_1+X_2}(\alpha) = E[e^{\alpha(X_1+X_2)}] = E[e^{\alpha X_1}]\, E[e^{\alpha X_2}] = M_{X_1}(\alpha)\, M_{X_2}(\alpha) . \tag{1.1} \]

The expectation of a function of X is

\[ E[f(X)] = \int_{\mathbb{R}} f(x)\, p(x)\, dx . \]

The covariance of two random variables is

\[ \mathrm{Cov}(X_1, X_2) = E[X_1 X_2] - E[X_1]\, E[X_2] . \]

If p(x_1, x_2) is the joint density of (X_1, X_2), then

\[ \mathrm{Cov}(X_1, X_2) = \int\!\!\int x_1 x_2\, p(x_1, x_2)\, dx_1 dx_2 - \int\!\!\int x_1\, p(x_1, x_2)\, dx_1 dx_2 \cdot \int\!\!\int x_2\, p(x_1, x_2)\, dx_1 dx_2 . \]

The variance is \( \mathrm{Var}(X) = \mathrm{Cov}(X, X) \), and the standard deviation is \( \sqrt{\mathrm{Var}(X)} \).
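Both properties are easy to check numerically. The following sketch (not part of the original notes; it assumes NumPy and two arbitrary illustrative distributions) estimates the moment-generating functions by Monte Carlo and checks the factorization (1.1), together with the vanishing covariance of independent variables.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Independent samples: X1 uniform on [0, 1], X2 standard normal.
x1 = rng.uniform(0.0, 1.0, n)
x2 = rng.standard_normal(n)

alpha = 0.5

# Empirical moment-generating functions at alpha.
m1 = np.exp(alpha * x1).mean()
m2 = np.exp(alpha * x2).mean()
m_sum = np.exp(alpha * (x1 + x2)).mean()

# Property (1.1): the MGF of an independent sum is the product of the MGFs.
mgf_gap = abs(m_sum - m1 * m2)

# Independence also makes Cov(X1, X2) = E[X1 X2] - E[X1] E[X2] vanish.
cov = (x1 * x2).mean() - x1.mean() * x2.mean()
```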

1.1 Uniform distribution


The function

\[ p(x) = \frac{1}{b-a}\, \mathbf{1}_{[a,b]}(x) \]

is the density of the uniform distribution on [a, b].

A random variable X distributed uniformly clearly satisfies X ∈ [a, b] almost
surely because the density is equal to zero outside of the interval. Expectations
satisfy
\[ E[f(X)] = \frac{1}{b-a} \int_a^b f(x)\, dx . \]

In particular,

\[ E[e^{\alpha X}] = \frac{1}{b-a} \int_a^b e^{\alpha x}\, dx = \frac{1}{b-a}\, \alpha^{-1} \big( e^{\alpha b} - e^{\alpha a} \big) \]

is the moment-generating function, and the moments can be directly calculated as

\[ E[X^n] = \frac{1}{b-a} \int_a^b x^n\, dx = \frac{1}{b-a}\, (n+1)^{-1} \big( b^{n+1} - a^{n+1} \big) . \]

What about a sum of two independent uniformly distributed variables X_1 and X_2? The moment-generating function is

\[ E[e^{\alpha(X_1+X_2)}] = \big( E[e^{\alpha X}] \big)^2 = \left( \frac{1}{b-a}\, \alpha^{-1} \big( e^{\alpha b} - e^{\alpha a} \big) \right)^2 . \]

The density is a convolution:

\[ p_{X_1+X_2}(x) = \int_{-\infty}^{+\infty} p_{X_1}(x-y)\, p_{X_2}(y)\, dy = \frac{1}{(b-a)^2} \int_{-\infty}^{+\infty} \mathbf{1}_{[a,b]}(x-y)\, \mathbf{1}_{[a,b]}(y)\, dy = \frac{1}{(b-a)^2} \int_a^b \mathbf{1}_{[a,b]}(x-y)\, dy . \tag{1.2} \]

Now we need to consider several cases:

(i) If x ≤ 2a, the density is zero because X_1 + X_2 ∈ [2a, 2b] almost surely.

(ii) If x ≥ 2b, the density is zero for the same reason.

(iii) If x ∈ [2a, 2b], the integrand is non-zero (and, in fact, equal to 1) exactly when x − y ∈ [a, b], that is, when x − b ≤ y ≤ x − a. Thus,

\[ \frac{1}{(b-a)^2} \int_a^b \mathbf{1}_{[a,b]}(x-y)\, dy = \frac{1}{(b-a)^2} \int_{\max\{a,\, x-b\}}^{\min\{b,\, x-a\}} dy = \frac{1}{(b-a)^2} \big( \min\{b, x-a\} - \max\{a, x-b\} \big) . \tag{1.3} \]

Again there are two cases:

(a) If x ≤ a + b, then max{a, x − b} = a and min{b, x − a} = x − a, so min{b, x − a} − max{a, x − b} = x − 2a.

(b) If x ≥ a + b, then max{a, x − b} = x − b and min{b, x − a} = b, so min{b, x − a} − max{a, x − b} = 2b − x.
We conclude that the density is

\[ p_{X_1+X_2}(x) = \frac{1}{(b-a)^2} \begin{cases} 0, & x \le 2a \\ x - 2a, & x \in [2a,\, a+b] \\ 2b - x, & x \in [a+b,\, 2b] \\ 0, & x \ge 2b . \end{cases} \]

See Figure 1.1.

[Figure 1.1: density of a sum of two i.i.d. variables uniformly distributed on [0, 1] — the triangular density on [0, 2], peaking at x = 1.]
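The triangular shape can be confirmed numerically. The sketch below (an illustration, not from the notes; it assumes NumPy and takes [a, b] = [0, 1]) discretizes the uniform density, convolves it with itself as in (1.2), and compares with the piecewise formula just derived.

```python
import numpy as np

a, b = 0.0, 1.0
dx = 0.001

# Discretized uniform density on [a, b].
x = np.arange(a, b, dx)
p = np.full_like(x, 1.0 / (b - a))

# Discrete convolution approximates the density of X1 + X2 on [2a, 2b].
conv = np.convolve(p, p) * dx
grid = 2 * a + dx * np.arange(len(conv))

# Piecewise (triangular) formula derived above.
tri = np.where(grid <= a + b, grid - 2 * a, 2 * b - grid) / (b - a) ** 2
tri = np.clip(tri, 0.0, None)

max_err = np.max(np.abs(conv - tri))
```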

A multi-dimensional uniform distribution on the n-dimensional parallelepiped

\[ D \stackrel{def}{=} [a_1, b_1] \times \cdots \times [a_n, b_n] = \{ (x_1, \cdots, x_n) \in \mathbb{R}^n :\ x_i \in [a_i, b_i]\ \forall i \} \]

has the density

\[ p(x_1, \cdots, x_n) = \frac{1}{(b_1 - a_1) \cdots (b_n - a_n)}\, \mathbf{1}_D = \frac{1}{(b_1 - a_1) \cdots (b_n - a_n)} \prod_{i=1}^{n} \mathbf{1}_{x_i \in [a_i, b_i]} . \]

Clearly, a random vector (X_1, \cdots, X_n) with this density has independent components. This follows from Lemma 2.1 below.

In particular, we immediately see that for a vector X = (X_1, \cdots, X_n) that is uniformly distributed on a parallelepiped, the coordinates are independent.

1.2 Poisson Distribution


Note: The Poisson distribution with parameter λ is concentrated on the set N = {0, 1, 2, ...} and assigns probability e^{-λ} λ^n / n! to the value n. The definition of the Poisson distribution should thus be understood for n belonging to this set.

Let X be a r.v. with a Poisson distribution (λ is known and fixed). Then

\[ M_X(s) = \sum_{k=0}^{\infty} e^{sk}\, e^{-\lambda} \frac{\lambda^k}{k!} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(e^s \lambda)^k}{k!} = e^{-\lambda}\, e^{e^s \lambda} = e^{\lambda(e^s - 1)} . \tag{1.4} \]

The first two moments are thus given by

\[ E(X) = \frac{d}{ds} M_X(s) \Big|_{s=0} = e^{\lambda(e^s - 1)}\, \lambda e^s \Big|_{s=0} = \lambda, \quad \text{and} \tag{1.5} \]
\[ E(X^2) = \Big( \lambda e^s e^{\lambda(e^s - 1)} + (\lambda e^s)^2\, e^{\lambda(e^s - 1)} \Big) \Big|_{s=0} = \lambda + \lambda^2 . \]

As a result,

\[ \mathrm{Var}(X) = E(X^2) - (E(X))^2 = \lambda . \]

Claim: The sum of two independent Poisson variables with parameters λ_1 and λ_2 is Poisson with parameter λ_1 + λ_2.

Direct Proof. Fix k ≥ 0 and note that, since a Poisson random variable only takes non-negative integer values, \( \Omega = \sqcup_{l=0}^{\infty} \{\omega : \xi_1 = l\} \) (⊔ means "disjoint union").

Then

\begin{align*}
\mathrm{Prob}(\{\xi_1 + \xi_2 = k\}) &= \mathrm{Prob}(\{\xi_1 + \xi_2 = k\} \cap \Omega) \\
&= \mathrm{Prob}\big( \{\xi_1 + \xi_2 = k\} \cap \big( \sqcup_{l=0}^{\infty} \{\xi_1 = l\} \big) \big) \\
&= \mathrm{Prob}\big( \sqcup_{l=0}^{\infty} \big( \{\xi_1 + \xi_2 = k\} \cap \{\xi_1 = l\} \big) \big) \\
&\stackrel{\text{disjoint sets}}{=} \sum_{l=0}^{\infty} \mathrm{Prob}(\{\xi_1 + \xi_2 = k\} \cap \{\xi_1 = l\}) \\
&= \sum_{l=0}^{\infty} \mathrm{Prob}(\{\xi_2 = k - l\} \cap \{\xi_1 = l\}) \\
&\stackrel{\text{indep.}}{=} \sum_{l=0}^{\infty} \mathrm{Prob}(\{\xi_2 = k - l\})\, \mathrm{Prob}(\{\xi_1 = l\}) \tag{1.6} \\
&\stackrel{\xi_2 \ge 0}{=} \sum_{l=0}^{k} \mathrm{Prob}(\{\xi_2 = k - l\})\, \mathrm{Prob}(\{\xi_1 = l\}) \\
&= \sum_{l=0}^{k} \Big( e^{-\lambda_2} \frac{\lambda_2^{k-l}}{(k-l)!} \Big) \Big( e^{-\lambda_1} \frac{\lambda_1^{l}}{l!} \Big) \\
&= e^{-(\lambda_1 + \lambda_2)} \frac{1}{k!} \sum_{l=0}^{k} \frac{k!}{(k-l)!\, l!}\, \lambda_2^{k-l} \lambda_1^{l} \\
&= e^{-(\lambda_1 + \lambda_2)} \frac{1}{k!} (\lambda_1 + \lambda_2)^k ,
\end{align*}
which is the probability that a Poisson random variable with parameter λ1 + λ2
is equal to k. As k was arbitrary,

ξ1 + ξ2 ∼ Poisson(λ1 + λ2 ).

Proof via the Moment Generating Function. From (1.4), the moment generating functions of our random variables are

\[ M_{\xi_i}(s) = \exp\{ \lambda_i (e^s - 1) \} . \]

From the assumed independence,

\[ M_{\xi_1 + \xi_2}(s) = \exp\{ \lambda_1 (e^s - 1) \}\, \exp\{ \lambda_2 (e^s - 1) \} = \exp\{ (\lambda_1 + \lambda_2)(e^s - 1) \} , \]

which is the moment generating function of a Poisson random variable with parameter λ_1 + λ_2. We conclude from the uniqueness of the moment generating function.
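As a numerical sanity check, one can convolve the two Poisson pmfs exactly as in the direct proof and compare with the Poisson(λ_1 + λ_2) pmf. The values λ_1 = 1.3 and λ_2 = 2.4 below are arbitrary illustrations.

```python
import math

def poisson_pmf(lam, n):
    """P(X = n) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**n / math.factorial(n)

lam1, lam2 = 1.3, 2.4

# Convolve the two pmfs as in (1.6) and compare with Poisson(lam1 + lam2).
max_diff = max(
    abs(
        sum(poisson_pmf(lam2, k - l) * poisson_pmf(lam1, l) for l in range(k + 1))
        - poisson_pmf(lam1 + lam2, k)
    )
    for k in range(25)
)
```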

1.3 Binomial Distribution
The binomial distribution B(n, p) assigns probability \( \binom{n}{k} p^k (1-p)^{n-k} \) to a value k ∈ {0, 1, \cdots, n}.
The moment generating function of a binomial r.v. X ∼ B(n, p) is given by

\begin{align*}
M(s) &= E[e^{sX}] = \sum_{k=0}^{n} e^{sk} \binom{n}{k} p^k (1-p)^{n-k} \\
&= (1-p)^n \sum_{k=0}^{n} \binom{n}{k} \Big( \frac{e^s p}{1-p} \Big)^k \tag{1.7} \\
&= (1-p)^n \Big( 1 + \frac{e^s p}{1-p} \Big)^n \\
&= (1 - p + p e^s)^n .
\end{align*}

It follows that

\[ E(X) = \frac{d}{ds} M(s) \Big|_{s=0} = n (1 - p + p e^s)^{n-1} p e^s \Big|_{s=0} = np , \tag{1.8} \]

and that

\begin{align*}
E(X^2) &= \frac{d^2}{ds^2} M(s) \Big|_{s=0} \\
&= np \big( e^s (1 - p + p e^s)^{n-1} + e^s (n-1)(1 - p + p e^s)^{n-2} p e^s \big) \Big|_{s=0} \tag{1.9} \\
&= np (1 + (n-1)p) = np(1-p) + (np)^2 .
\end{align*}

Finally,

\[ \mathrm{Var}(X) = E(X^2) - (E(X))^2 = np(1-p) . \tag{1.10} \]

We were also asked to evaluate two finite sums. We observe that, if p = 1/2,

\[ M(s) = \frac{1}{2^n} \sum_{k=0}^{n} \binom{n}{k} e^{sk} . \tag{1.11} \]

Using the explicit expression for the moment generating function,

\[ \sum_{k=0}^{n} \binom{n}{k} e^{sk} = (1 + e^s)^n . \]

It then follows that

\begin{align*}
\sum_{k=0}^{n} k \binom{n}{k} &= \frac{d}{ds} \Big( \sum_{k=0}^{n} \binom{n}{k} e^{sk} \Big) \Big|_{s=0} = \frac{d}{ds} (1 + e^s)^n \Big|_{s=0} \tag{1.12} \\
&= n (1 + e^s)^{n-1} e^s \Big|_{s=0} = n 2^{n-1} ,
\end{align*}

and that

\begin{align*}
\sum_{k=0}^{n} k^2 \binom{n}{k} &= \frac{d^2}{ds^2} \Big( \sum_{k=0}^{n} \binom{n}{k} e^{sk} \Big) \Big|_{s=0} = \frac{d^2}{ds^2} (1 + e^s)^n \Big|_{s=0} \\
&= n \big( e^s (1 + e^s)^{n-1} + e^s (n-1)(1 + e^s)^{n-2} e^s \big) \Big|_{s=0} \tag{1.13} \\
&= n 2^{n-2} (2 + (n-1)) = 2^{n-2}\, n (n+1) .
\end{align*}
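The two identities (1.12) and (1.13) are easy to verify exactly for small n, using Python's exact integer arithmetic (this check is an addition, not part of the notes).

```python
from math import comb

# Exact check of sum_k k C(n, k) = n 2^{n-1} and
# sum_k k^2 C(n, k) = 2^{n-2} n (n + 1)  (written as n(n+1)2^n / 4).
for n in range(1, 30):
    s1 = sum(k * comb(n, k) for k in range(n + 1))
    s2 = sum(k * k * comb(n, k) for k in range(n + 1))
    assert s1 == n * 2 ** (n - 1)
    assert s2 == n * (n + 1) * 2 ** n // 4
```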
Finally, we show how a binomial random variable can be constructed from "binary" ones. Let X_1, \cdots, X_n be i.i.d. with Prob(X_i = 1) = p and Prob(X_i = 0) = 1 − p. The moment generating function of any of the X_i's is

\[ M_{X_i}(s) = E[e^{s X_i}] = e^0 (1-p) + e^s p = 1 - p + p e^s . \]

But then, recalling that the X_i's are i.i.d.,

\[ M_{\sum_{i=1}^{n} X_i}(s) = \prod_{i=1}^{n} M_{X_i}(s) = (1 - p + p e^s)^n , \]

which is just the moment generating function of a binomial random variable. The uniqueness of the Laplace transform then ensures that, indeed,

\[ \sum_{i=1}^{n} X_i \sim B(n, p) . \]

This representation of a binomial r.v. allows a straightforward calculation of the first two moments. One must first note that

\[ E(X_i) = p , \]

and that

\[ \mathrm{Var}(X_i) = E(X_i^2) - (E(X_i))^2 = p - p^2 = p(1-p) . \]

The first two moments of the binomial r.v. are then given by

\[ E(X) = E\Big( \sum_{i=1}^{n} X_i \Big) = \sum_{i=1}^{n} E(X_i) = \sum_{i=1}^{n} p = np , \quad \text{and} \tag{1.14} \]
\[ \mathrm{Var}(X) = \mathrm{Var}\Big( \sum_{i=1}^{n} X_i \Big) = \sum_{i=1}^{n} \mathrm{Var}(X_i) = \sum_{i=1}^{n} p(1-p) = np(1-p) . \]

Note that in the last line there is no covariance term, because the terms of the sum are independent.
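The construction from binary variables can also be checked by simulation. The sketch below (illustrative parameters, assuming NumPy) sums i.i.d. {0, 1} variables and compares the sample mean and variance with np and np(1 − p).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 0.3

# 100,000 rows of n i.i.d. {0, 1} variables with success probability p;
# each row sum is one draw from B(n, p).
binomial = (rng.random((100_000, n)) < p).sum(axis=1)

mean_err = abs(binomial.mean() - n * p)            # should be near np
var_err = abs(binomial.var() - n * p * (1 - p))    # should be near np(1 - p)
```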

1.4 Normal Distribution

• A random vector (X_1, \cdots, X_d) with the joint density N(\mu, \Sigma):

\[ \frac{1}{(2\pi)^{d/2} (\det \Sigma)^{1/2}} \exp\big( -0.5\, (x - \mu)^T \Sigma^{-1} (x - \mu) \big) . \]

• E[X_i] = \mu_i,\ \mathrm{Cov}(X_i, X_j) = \Sigma_{ij}.

• Moment-generating function:

\[ E\Big[ \exp\Big( \sum_i \alpha_i X_i \Big) \Big] = \exp\Big( \sum_i \alpha_i \mu_i + 0.5\, \alpha^T \Sigma\, \alpha \Big) . \]

• If a random vector (X_1, \cdots, X_n) is normally distributed, then so is (X_1, \cdots, X_m) for any m ≤ n.

• If (X_1, \cdots, X_n) and (Y_1, \cdots, Y_m) are independent and normally distributed, then so is the combined vector (X_1, \cdots, X_n, Y_1, \cdots, Y_m).

• Linear transformations of normals are normal.
Example: one-dimensional case. The moment generating function of a N(\mu, \sigma^2) random variable X is given by

\begin{align*}
M_X(s) &= E[e^{sX}] = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{\mathbb{R}} e^{sx}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx \\
&= \frac{1}{\sqrt{2\pi}\,\sigma} \int_{\mathbb{R}} e^{\frac{2\sigma^2 s x - (x-\mu)^2}{2\sigma^2}}\, dx \tag{1.15} \\
&\stackrel{\text{identity}}{=} \frac{1}{\sqrt{2\pi}\,\sigma} \int_{\mathbb{R}} e^{\frac{(\mu + s\sigma^2)^2 - \mu^2 - (x - \mu - s\sigma^2)^2}{2\sigma^2}}\, dx \\
&= e^{\frac{(\mu + s\sigma^2)^2 - \mu^2}{2\sigma^2}}\, \frac{1}{\sqrt{2\pi}\,\sigma} \int_{\mathbb{R}} e^{-\frac{(x - \mu - s\sigma^2)^2}{2\sigma^2}}\, dx ,
\end{align*}

where the "identity" is completing the square: \( 2\sigma^2 s x - (x - \mu)^2 = (\mu + s\sigma^2)^2 - \mu^2 - (x - \mu - s\sigma^2)^2 \).

Now,

\[ \frac{1}{\sqrt{2\pi}\,\sigma} \int_{\mathbb{R}} e^{-\frac{(x - \mu - s\sigma^2)^2}{2\sigma^2}}\, dx = 1 , \]

being the integral of the density of a random variable distributed as N(\mu + s\sigma^2, \sigma^2). Hence

\[ M_X(s) = e^{\frac{(\mu + s\sigma^2)^2 - \mu^2}{2\sigma^2}} = e^{\frac{\mu^2 + 2\mu s\sigma^2 + s^2\sigma^4 - \mu^2}{2\sigma^2}} = e^{\mu s + \frac{s^2 \sigma^2}{2}} . \]
We now turn to the second part of the exercise. Using the independence of X_1 and X_2,

\[ M_{X_1+X_2}(s) = M_{X_1}(s)\, M_{X_2}(s) = e^{\mu_1 s + \frac{s^2 \sigma_1^2}{2}}\, e^{\mu_2 s + \frac{s^2 \sigma_2^2}{2}} = e^{(\mu_1 + \mu_2) s + \frac{s^2 (\sigma_1^2 + \sigma_2^2)}{2}} . \]

But this last expression is the moment generating function of a random variable distributed as N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2). From the uniqueness of the moment generating function we conclude that X_1 + X_2 is indeed distributed as N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2), which means that its density is given by

\[ \frac{1}{\sqrt{2\pi}\, \sqrt{\sigma_1^2 + \sigma_2^2}}\, e^{-\frac{(x - (\mu_1 + \mu_2))^2}{2(\sigma_1^2 + \sigma_2^2)}} . \]
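A quick Monte Carlo check of the one-dimensional formulas (illustrative parameters, assuming NumPy): the empirical MGF of each normal, and of the independent sum, should match \( e^{\mu s + s^2 \sigma^2 / 2} \) with the means and variances adding as derived above.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
s = 0.4

mu1, sig1 = 0.7, 1.5
mu2, sig2 = -0.3, 0.8

x1 = mu1 + sig1 * rng.standard_normal(n)
x2 = mu2 + sig2 * rng.standard_normal(n)

# MGF of a single normal: e^{mu s + s^2 sigma^2 / 2}.
mgf1_err = abs(np.exp(s * x1).mean() - np.exp(mu1 * s + s**2 * sig1**2 / 2))

# MGF of the independent sum matches that of N(mu1 + mu2, sig1^2 + sig2^2).
mgf_sum_emp = np.exp(s * (x1 + x2)).mean()
mgf_sum_exact = np.exp((mu1 + mu2) * s + s**2 * (sig1**2 + sig2**2) / 2)
rel_err = abs(mgf_sum_emp - mgf_sum_exact) / mgf_sum_exact
```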

1.5 Cauchy distribution


The Cauchy distribution has the density

\[ f(x) = \frac{\lambda}{\pi (\lambda^2 + x^2)} . \]

Let us first check that this is indeed a probability density. First, recall that

\[ \frac{d}{dx} \arctan(x) = \frac{1}{1 + x^2} . \]
But then,

\begin{align*}
\int_{\mathbb{R}} f(x)\, dx &= \int_{-\infty}^{\infty} \frac{\lambda}{\pi (\lambda^2 + x^2)}\, dx = \int_{-\infty}^{\infty} \frac{\lambda}{\pi \lambda^2 \big( 1 + (x/\lambda)^2 \big)}\, dx \\
&\stackrel{u := x/\lambda}{=} \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{1}{1 + u^2}\, du = \frac{1}{\pi} \arctan(u) \Big|_{-\infty}^{\infty} \tag{1.16} \\
&= \frac{1}{\pi} \Big( \frac{\pi}{2} - \Big( -\frac{\pi}{2} \Big) \Big) = 1 ,
\end{align*}

and we are indeed facing a density function. The Cauchy distribution is relatively unusual in the sense that it does not have a well-defined mean. To see this, we first observe that

\[ \lim_{M \to +\infty} \int_0^M \frac{\lambda}{\pi} \frac{x}{\lambda^2 + x^2}\, dx \stackrel{u := x^2}{=} \lim_{M \to +\infty} \frac{\lambda}{\pi} \int_0^{M^2} \frac{1}{\lambda^2 + u}\, \frac{1}{2}\, du = \lim_{M \to +\infty} \frac{\lambda}{2\pi} \ln(\lambda^2 + u) \Big|_0^{M^2} = +\infty . \tag{1.17} \]

But then one cannot define the mean of the Cauchy distribution: if it existed, the following would hold:

\begin{align*}
\int_{-\infty}^{+\infty} x f(x)\, dx &= \int_{-\infty}^{0} x f(x)\, dx + \int_{0}^{+\infty} x f(x)\, dx \\
&= \int_{-\infty}^{0} \frac{\lambda}{\pi} \frac{x}{\lambda^2 + x^2}\, dx + \int_{0}^{+\infty} \frac{\lambda}{\pi} \frac{x}{\lambda^2 + x^2}\, dx \tag{1.18} \\
&\stackrel{v := -x}{=} -\int_{0}^{+\infty} \frac{\lambda}{\pi} \frac{v}{\lambda^2 + v^2}\, dv + \int_{0}^{+\infty} \frac{\lambda}{\pi} \frac{x}{\lambda^2 + x^2}\, dx .
\end{align*}

But this last expression is not well-defined, being of the form "∞ − ∞". As the mean of the distribution is not well-defined, the variance cannot be defined either. One may still consider the second moment, but it turns out to be +∞, and so does the moment generating function (for any s ≠ 0).
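The divergence (1.17) can be seen numerically: the truncated integral \( \int_0^M x f(x)\, dx \) grows by ln(10)/π each time M is multiplied by 10. The sketch below (plain Python, taking λ = 1 for illustration) approximates it with the midpoint rule.

```python
import math

lam = 1.0

def truncated_mean(M, steps=200_000):
    """Midpoint-rule approximation of the integral of x * f(x) over [0, M]."""
    h = M / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x * lam / (math.pi * (lam**2 + x**2)) * h
    return total

# The truncated first-moment integral keeps growing, like log(M) / pi.
i10 = truncated_mean(10.0)
i100 = truncated_mean(100.0)
i1000 = truncated_mean(1000.0)
```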

1.6 Γ-distribution

The Γ-distribution (with shape parameter k > 0 and rate λ > 0) has the density

\[ f(x) = \frac{1}{\Gamma(k)} (\lambda x)^{k-1} \lambda e^{-\lambda x}, \quad x > 0 . \]

That is, this distribution is supported on [0, +∞) (i.e., the corresponding random variable is non-negative with probability 1).

We first check that the proposed density actually defines a distribution. Recalling the definition of the function Γ: for z ∈ \mathbb{R}_{>0},

\[ \Gamma(z) = \int_0^{+\infty} t^{z-1} e^{-t}\, dt , \]

we see that

\[ \int_0^{+\infty} f(x)\, dx = \frac{1}{\Gamma(k)} \int_0^{+\infty} (\lambda x)^{k-1} \lambda e^{-\lambda x}\, dx \stackrel{u := \lambda x}{=} \frac{1}{\Gamma(k)} \int_0^{+\infty} u^{k-1} e^{-u}\, du = \frac{1}{\Gamma(k)}\, \Gamma(k) = 1 , \tag{1.19} \]

as expected.
The moment generating function can be calculated as follows (for s < λ):

\begin{align*}
M(s) &\stackrel{\Delta}{=} \int_0^{+\infty} e^{sx} f(x)\, dx = \frac{1}{\Gamma(k)} \int_0^{+\infty} e^{sx} (\lambda x)^{k-1} \lambda e^{-\lambda x}\, dx \\
&= \frac{1}{\Gamma(k)} \int_0^{+\infty} (\lambda x)^{k-1} \lambda e^{-x(\lambda - s)}\, dx \\
&\stackrel{u := (\lambda - s)x}{=} \frac{1}{\Gamma(k)} \int_0^{+\infty} \Big( \frac{\lambda}{\lambda - s} u \Big)^{k-1} \frac{\lambda}{\lambda - s}\, e^{-u}\, du \tag{1.20} \\
&= \Big( \frac{\lambda}{\lambda - s} \Big)^k \frac{1}{\Gamma(k)}\, \Gamma(k) = \Big( 1 - \frac{s}{\lambda} \Big)^{-k} .
\end{align*}
Thus,

\[ \frac{d}{ds} M(s) = -k \Big( 1 - \frac{s}{\lambda} \Big)^{-k-1} \Big( \frac{-1}{\lambda} \Big) , \qquad \frac{d^2}{ds^2} M(s) = \frac{k}{\lambda} (k+1) \Big( 1 - \frac{s}{\lambda} \Big)^{-k-2} \frac{1}{\lambda} , \tag{1.21} \]

and

\[ \text{mean} = \frac{d}{ds} M(s) \Big|_{s=0} = \frac{k}{\lambda} , \qquad \text{variance} = \frac{d^2}{ds^2} M(s) \Big|_{s=0} - (\text{mean})^2 = \frac{k(k+1)}{\lambda^2} - \frac{k^2}{\lambda^2} = \frac{k}{\lambda^2} . \tag{1.22} \]
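A numerical check of (1.22), with illustrative k and λ and assuming NumPy. Note that NumPy's gamma sampler is parametrized by shape and scale = 1/λ, not by the rate λ itself.

```python
import numpy as np

rng = np.random.default_rng(3)
k, lam = 3.0, 2.0

# NumPy parametrizes the Gamma distribution by shape and scale = 1 / lambda.
x = rng.gamma(shape=k, scale=1.0 / lam, size=500_000)

mean_err = abs(x.mean() - k / lam)       # mean should be k / lambda
var_err = abs(x.var() - k / lam**2)      # variance should be k / lambda^2

# The MGF at s < lambda should match (1 - s / lambda)^{-k}, as in (1.20).
s = 0.5
mgf_err = abs(np.exp(s * x).mean() - (1 - s / lam) ** (-k))
```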

1.7 Exponential Distribution

The exponential distribution is a special case of the Γ-distribution, corresponding to k = 1. Of course, it is also supported on [0, +∞), and it has the density

\[ \lambda e^{-\lambda x} . \]

It has the remarkable memorylessness property: if X is exponentially distributed, then

\[ \mathrm{Prob}[X > s + \tau \,|\, X > s] = \mathrm{Prob}[X > \tau] \]

for any s, τ > 0. Indeed,

\[ \mathrm{Prob}[X > t] = \int_t^{+\infty} \lambda e^{-\lambda x}\, dx = e^{-\lambda t} , \]

and the claim follows because, by the definition of conditional probabilities,

\[ \mathrm{Prob}[X > s + \tau \,|\, X > s] = \frac{\mathrm{Prob}[(X > s + \tau) \cap (X > s)]}{\mathrm{Prob}[X > s]} = \frac{e^{-\lambda(s+\tau)}}{e^{-\lambda s}} = e^{-\lambda \tau} = \mathrm{Prob}[X > \tau] . \tag{1.23} \]
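Memorylessness (1.23) can be checked on simulated data (illustrative λ, s, and τ, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(4)
lam = 1.5
x = rng.exponential(scale=1.0 / lam, size=1_000_000)

s, tau = 0.4, 0.7

# P(X > s + tau | X > s) versus P(X > tau): memorylessness says they agree.
cond = (x > s + tau).sum() / (x > s).sum()
uncond = (x > tau).mean()
gap = abs(cond - uncond)
```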

2 Conditional Densities

For two events A, B, the conditional probability of A given B is

\[ P(A|B) \stackrel{def}{=} \frac{\mathrm{Prob}(A \cap B)}{\mathrm{Prob}(B)} . \]

Given a random vector X with the density p(x), we can formally write

\[ \mathrm{Prob}(X \in dx) = p(x)\, dx . \]

Here, dx is a formal, infinitesimal element of the space.

Given two random vectors X_1 and X_2 with the joint density p(x_1, x_2), the densities of X_1 and X_2 are given by

\[ p_{X_1}(x_1) = \int p(x_1, x_2)\, dx_2 , \qquad p_{X_2}(x_2) = \int p(x_1, x_2)\, dx_1 . \]

Then the conditional density of X_1, given that X_2 = x_2, is calculated as

\[ p(x_1 | x_2)\, dx_1 = \mathrm{Prob}(X_1 \in dx_1 \,|\, X_2 \in dx_2) = \frac{\mathrm{Prob}((X_1 \in dx_1) \cap (X_2 \in dx_2))}{\mathrm{Prob}(X_2 \in dx_2)} = \frac{p(x_1, x_2)\, dx_1\, dx_2}{p_{X_2}(x_2)\, dx_2} = \frac{p(x_1, x_2)}{p_{X_2}(x_2)}\, dx_1 . \tag{2.1} \]

Given a random vector X = (X_1, \cdots, X_n) with a joint density p(x_1, \cdots, x_n), the conditional density p(x_1, \cdots, x_k \,|\, x_{k+1}, \cdots, x_n) of (X_1, \cdots, X_k), given a realization (x_{k+1}, \cdots, x_n) of (X_{k+1}, \cdots, X_n), is given by

\[ p(x_1, \cdots, x_k \,|\, x_{k+1}, \cdots, x_n) = \frac{p(x_1, \cdots, x_n)}{\int p(x_1, \cdots, x_n)\, dx_1 \cdots dx_k} . \]

That is, in the denominator we integrate only over (x_1, \cdots, x_k), as the values (x_{k+1}, \cdots, x_n) are fixed. In fact, by the above,

\[ \int p(x_1, \cdots, x_n)\, dx_1 \cdots dx_k = p_{(X_{k+1}, \cdots, X_n)}(x_{k+1}, \cdots, x_n) \]

is just the density of the random vector (X_{k+1}, \cdots, X_n).

Lemma 2.1 Random vectors X and Y are independent if and only if the joint density p(x, y) of (X, Y) is given by

\[ p(x, y) = p_X(x)\, p_Y(y) , \]

where p_X and p_Y are the densities of X and Y respectively.

Proof. If X and Y are independent, then

\[ p(x, y)\, dx\, dy = \mathrm{Prob}((X \in dx) \cap (Y \in dy)) = \mathrm{Prob}(X \in dx)\, \mathrm{Prob}(Y \in dy) = p_X(x)\, dx\; p_Y(y)\, dy , \tag{2.2} \]

that is, p(x, y) = p_X(x)\, p_Y(y).

Conversely, if p(x, y) = p_X(x)\, p_Y(y), then

\[ E[f(X) g(Y)] = \int\!\!\int f(x) g(y)\, p(x, y)\, dx\, dy = \int\!\!\int f(x) g(y)\, p_X(x)\, p_Y(y)\, dx\, dy = \int f(x) p_X(x)\, dx \int g(y) p_Y(y)\, dy = E[f(X)]\, E[g(Y)] , \tag{2.3} \]

which means that X and Y are independent.

Another way to see this is through the conditional density. Clearly, X and Y are independent if and only if the conditional density of X given Y is the same as the density of X itself:

\[ p_X(x) = p(x|y) \]

for any value of y. By definition, since p_X is a probability density, we have

\[ \int p_X(x)\, dx = 1 , \]

and hence

\[ p(x|y) \stackrel{def}{=} \frac{p(x, y)}{\int p(x, y)\, dx} = \frac{p_X(x)\, p_Y(y)}{\int p_X(x)\, p_Y(y)\, dx} = \frac{p_X(x)\, p_Y(y)}{p_Y(y) \int p_X(x)\, dx} = p_X(x) . \]


In general, the joint density can be represented through conditional densities via

\[ p(x_n, \cdots, x_1) = p(x_n | x_{n-1}, \cdots, x_1)\, p(x_{n-1} | x_{n-2}, \cdots, x_1)\, p(x_{n-2} | x_{n-3}, \cdots, x_1) \cdots p(x_2 | x_1)\, p(x_1) . \tag{2.4} \]

Example 1. Suppose that (X_1, X_2) is jointly normal with the density N((\mu_1, \mu_2), \Sigma), where the covariance matrix Σ is given by

\[ \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix} . \]

Then the result proven in the lecture implies that p(x_1 | x_2) is a N(\mu, \sigma^2) density with

\[ \mu = \mu_1 + \sigma_{12}\, \sigma_{22}^{-1} (x_2 - \mu_2) \qquad \text{and} \qquad \sigma^2 = \sigma_{11} - \sigma_{12}^2\, \sigma_{22}^{-1} . \]

Example 2. Suppose that the random vector (X_1, X_2) has the joint density \( p(x_1, x_2) = 2\, \mathbf{1}_{x_1 \in [0,1]}\, \mathbf{1}_{x_2 \in [x_1, 1]} \). Let us check that this is a probability density. Indeed,

\[ \int\!\!\int p(x_1, x_2)\, dx_1\, dx_2 = 2 \int_0^1 \int_{x_1}^1 dx_2\, dx_1 = 2 \int_0^1 (1 - x_1)\, dx_1 = 2 (1 - 0.5) = 1 . \]

Now, the conditional density is

\[ p(x_1 | x_2) = \frac{p(x_1, x_2)}{\int p(x_1, x_2)\, dx_1} , \]

and, since the integration region is \( \{x_2 \in [x_1, 1],\ x_1 \in [0, 1]\} = \{x_2 \in [0, 1],\ x_1 \in [0, x_2]\} \), we get

\[ \int p(x_1, x_2)\, dx_1 = 2 \int_0^{x_2} \mathbf{1}_{x_1 \in [0,1]}\, dx_1 = 2 \int_0^{x_2} dx_1 = 2 x_2 , \]

so that \( p(x_1 | x_2) = x_2^{-1}\, \mathbf{1}_{x_1 \in [0, x_2]} \): conditionally on X_2 = x_2, the variable X_1 is uniform on [0, x_2].

Example 3. An auto-regressive process X_t is a Markov process such that

\[ X_{t+1} = a X_t + w_{t+1} , \]

where w_t is a sequence of i.i.d. random variables with the density f(z), and X_0 = 0. What is the joint density of X_1, \cdots, X_t?

Well, since the process is clearly Markov (the distribution of X_{t+1} only depends on X_t, because the w_t are i.i.d.), we have

\[ p(x_t | x_{t-1}, \cdots, x_1) = p(x_t | x_{t-1}) . \]

What is p(x_t | x_{t-1})? Well, X_t = a X_{t-1} + w_t. Conditional on X_{t-1} = x_{t-1}, the density p(x_t | x_{t-1}) is the density of a x_{t-1} + w_t, which is f(x_t - a x_{t-1}). Thus,

\[ p(x_t | x_{t-1}, \cdots, x_1) = f(x_t - a x_{t-1}) , \]

and the full joint density is

\[ p(x_t, x_{t-1}, \cdots, x_1) = p(x_t | x_{t-1}, \cdots, x_1)\, p(x_{t-1} | x_{t-2}, \cdots, x_1) \cdots p(x_2 | x_1)\, p(x_1) = f(x_t - a x_{t-1})\, f(x_{t-1} - a x_{t-2}) \cdots f(x_2 - a x_1)\, f(x_1) . \tag{2.5} \]
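A short simulation of the auto-regressive recursion (a sketch assuming standard normal innovations f and |a| < 1, neither of which the notes require): for |a| < 1 the process approaches variance 1/(1 − a²) and lag-one autocorrelation a, both easy to check on a long path.

```python
import numpy as np

rng = np.random.default_rng(5)
a = 0.8
T = 200_000

# AR(1) recursion X_{t+1} = a X_t + w_{t+1}, X_0 = 0, standard normal w_t.
w = rng.standard_normal(T)
x = np.empty(T)
x[0] = w[0]
for t in range(1, T):
    x[t] = a * x[t - 1] + w[t]

# After a burn-in, the variance approaches 1 / (1 - a^2) and the
# lag-one autocorrelation approaches a.
y = x[1000:]
var_err = abs(y.var() - 1.0 / (1.0 - a**2))
corr_err = abs(np.corrcoef(y[:-1], y[1:])[0, 1] - a)
```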

3 Conditional Expectation

For any function φ(x_1, \cdots, x_n), we have

\[ E[\varphi(X_1, \cdots, X_n) \,|\, (X_{k+1}, \cdots, X_n) = (x_{k+1}, \cdots, x_n)] = \int \varphi(x_1, \cdots, x_n)\, p(x_1, \cdots, x_k \,|\, x_{k+1}, \cdots, x_n)\, dx_1 \cdots dx_k . \tag{3.1} \]

We will also denote this as

\[ E[\varphi(X_1, \cdots, X_n) \,|\, (X_{k+1}, \cdots, X_n)] . \]

In particular, anything which is a function of the known X_{k+1}, \cdots, X_n can be treated as a constant:

\[ E[f(X_{k+1}, \cdots, X_n)\, \varphi(X_1, \cdots, X_n) + g(X_{k+1}, \cdots, X_n) \,|\, (X_{k+1}, \cdots, X_n)] = f(X_{k+1}, \cdots, X_n)\, E[\varphi(X_1, \cdots, X_n) \,|\, (X_{k+1}, \cdots, X_n)] + g(X_{k+1}, \cdots, X_n) . \tag{3.2} \]

As a major application, suppose that we are observing the process X_t. Then, for any t > τ,

\[ E[\varphi(X_1, \cdots, X_t) \,|\, \mathcal{F}_\tau] = E[\varphi(X_1, \cdots, X_t) \,|\, (X_1, \cdots, X_\tau)] , \]

because observing X_1, \cdots, X_\tau is the same as observing all the information in \mathcal{F}_\tau.

Example 1. Let (X_1, X_2) ∼ N((\mu_1, \mu_2), \Sigma). Then

\[ E[X_1 | X_2] = \mu_1 + \sigma_{12}\, \sigma_{22}^{-1} (X_2 - \mu_2) \]

and

\[ E[X_1^2 | X_2] = \mathrm{Var}[X_1 | X_2] + (E[X_1 | X_2])^2 = \sigma_{11} - \sigma_{12}^2\, \sigma_{22}^{-1} + \big( \mu_1 + \sigma_{12}\, \sigma_{22}^{-1} (X_2 - \mu_2) \big)^2 . \]

Furthermore,

\[ E[f(X_2) X_1 + g(X_2) \,|\, X_2] = f(X_2)\, E[X_1 | X_2] + g(X_2) . \]

Example 2. X_t is an autoregressive process,

\[ X_{t+1} = a X_t + w_{t+1} , \]

where the w_t are i.i.d. with the density f(z). We know that

\[ E[\varphi(X_{t+1}) \,|\, \mathcal{F}_t] = E[\varphi(X_{t+1}) \,|\, X_t] \]

because the process is Markov. Furthermore, we know from the above that the conditional density is

\[ p(x_{t+1} | x_t) = f(x_{t+1} - a x_t) , \]

and hence

\[ E[\varphi(X_{t+1}) \,|\, \mathcal{F}_t] = \int_{\mathbb{R}} \varphi(x)\, f(x - a X_t)\, dx . \]
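The conditional-mean formula of Example 1 can be checked by simulation (illustrative µ and Σ, assuming NumPy): among samples whose second coordinate lies in a thin slab around a chosen value x₂*, the average first coordinate should be close to µ₁ + σ₁₂ σ₂₂⁻¹ (x₂* − µ₂).

```python
import numpy as np

rng = np.random.default_rng(6)

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

xy = rng.multivariate_normal(mu, Sigma, size=400_000)
x1, x2 = xy[:, 0], xy[:, 1]

# Estimate E[X1 | X2 = x2*] by averaging x1 over samples with x2 near x2*.
x2_star = -1.0
slab = np.abs(x2 - x2_star) < 0.1
cond_mean_emp = x1[slab].mean()
cond_mean_exact = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x2_star - mu[1])
err = abs(cond_mean_emp - cond_mean_exact)
```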

4 Markov Processes

• A stochastic process X_t (possibly vector-valued, i.e., X_t ∈ \mathbb{R}^d);

• (\mathcal{F}_t) its natural filtration.

Definition X_t is Markov if

\[ P(X_t \in A \,|\, X_1, \cdots, X_{t-1}) = P(X_t \in A \,|\, X_{t-1}) . \]

In particular,

\[ E[g(X_T) \,|\, \mathcal{F}_t] = E[g(X_T) \,|\, X_t] = G(t, X_t) \]

is a deterministic function of (t, X_t).

4.1 Transition probabilities and Kolmogorov Equations

• X_t takes a finite number of values x_1, \cdots, x_n;

• P(X_{t+1} = x_j \,|\, X_t = x_i) = p(x_i, x_j);

• X_0 = x_0, with transition probabilities p(x_0, x_j);

• Kolmogorov difference equation:

\[ G(t, x_i) = \sum_j p(x_i, x_j)\, G(t+1, x_j) ; \]

• defining the transition matrix Π ∈ M_n(\mathbb{R}) by Π = (p(x_i, x_j)), and viewing G(t, \cdot) and g as vectors in \mathbb{R}^n,

\[ G(t, \cdot) = \Pi^{T-t} g . \]

Example: valuing Markov cash flows. With the vector of state values x = (x_1, \cdots, x_n)^T,

\[ V(x) = \sum_{t=0}^{\infty} e^{-rt} E[X_t \,|\, X_0 = x] , \]

and the Kolmogorov equation gives

\[ V = x + e^{-r}\, \Pi V \quad \Leftrightarrow \quad V = (I - e^{-r}\, \Pi)^{-1} x . \]
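The valuation formula can be checked against a truncated version of the defining series (a sketch with an illustrative 3-state chain, assuming NumPy; I denotes the identity matrix):

```python
import numpy as np

# Illustrative 3-state chain: state values x, transition matrix Pi
# (rows summing to one), and a discount rate r.
x = np.array([1.0, 2.0, 4.0])
Pi = np.array([[0.5, 0.3, 0.2],
               [0.1, 0.6, 0.3],
               [0.2, 0.2, 0.6]])
r = 0.05

# Closed form V = (I - e^{-r} Pi)^{-1} x ...
V = np.linalg.solve(np.eye(3) - np.exp(-r) * Pi, x)

# ... against the truncated series sum_t e^{-rt} Pi^t x.
V_series = np.zeros(3)
term = x.copy()
for t in range(2000):
    V_series += np.exp(-r * t) * term
    term = Pi @ term
```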

4.2 Hitting Times for a Markov Chain

Definition A random time is a random variable τ : Ω → \mathbb{Z}_+. A stopping time w.r.t. a filtration \mathcal{F}_t is a random time such that the event {τ ≤ t} belongs to \mathcal{F}_t.

Example: hitting time, that is, the first time the process hits a certain level.

Let X be a Markov process taking values x_1, \cdots, x_n with transition probabilities p_{ij}. Let T_{x_i} be the first time of hitting x_i, and let

\[ E[s^{T_{x_i}} \,|\, X_0 = x_j] = F_j(s) . \]

Show that the functions F_1(s), \cdots, F_n(s) satisfy the system of difference equations

\[ F_j(s) = p_{ji}\, s + s \sum_{k \ne i} p_{jk}\, F_k(s) . \]

Solve it for n = 2 and n = 3.

Given the moment generating functions, we can calculate the moments E[T_{x_i} | X_0 = x_j] and E[T_{x_i}^2 | X_0 = x_j] by differentiating the moment generating function:

\[ E[T_{x_i} | X_0 = x_j] = \frac{d}{ds} F_j(s) \Big|_{s=1} , \qquad E[T_{x_i}^2 | X_0 = x_j] = \frac{d^2}{ds^2} F_j(s) \Big|_{s=1} + \frac{d}{ds} F_j(s) \Big|_{s=1} . \]
Solution. The general idea is to condition on the next state reached by the process. Namely,

\begin{align*}
F_j(s) &= E(s^{T_{x_i}} \,|\, X_0 = x_j) = \sum_{k=1}^{n} E(s^{T_{x_i}} \mathbf{1}_{X_1 = x_k} \,|\, X_0 = x_j) \\
&\stackrel{\text{"Bayes"}}{=} \sum_{k=1}^{n} E(s^{T_{x_i}} \,|\, X_0 = x_j, X_1 = x_k)\, \mathrm{Prob}(X_1 = x_k \,|\, X_0 = x_j) \\
&\stackrel{\text{Markov}}{=} \sum_{k=1}^{n} E(s^{T_{x_i}} \,|\, X_1 = x_k)\, p_{jk} \tag{4.1} \\
&= E(s^{T_{x_i}} \,|\, X_1 = x_i)\, p_{ji} + \sum_{k \ne i} E(s^{T_{x_i}} \,|\, X_1 = x_k)\, p_{jk} \\
&= s\, p_{ji} + \sum_{k \ne i} s\, E(s^{T_{x_i} - 1} \,|\, X_1 = x_k)\, p_{jk} \\
&= s\, p_{ji} + s \sum_{k \ne i} F_k(s)\, p_{jk} .
\end{align*}

Assuming there are n = 2 states, denoted by i and j, there is a unique equation,

\[ F_j(s) = p_{ji}\, s + s\, p_{jj}\, F_j(s) , \]

whose solution is

\[ F_j(s) = \frac{s\, p_{ji}}{1 - s\, p_{jj}} . \]

Assuming there are n = 3 states, denoted by i, j, and k, the system above becomes

\[ F_j(s) = s\, p_{ji} + s [ p_{jj} F_j(s) + p_{jk} F_k(s) ] , \qquad F_k(s) = s\, p_{ki} + s [ p_{kj} F_j(s) + p_{kk} F_k(s) ] . \tag{4.2} \]

Expressing it in matrix form, we obtain

\[ \begin{pmatrix} 1 - s p_{jj} & -s p_{jk} \\ -s p_{kj} & 1 - s p_{kk} \end{pmatrix} \begin{pmatrix} F_j(s) \\ F_k(s) \end{pmatrix} = \begin{pmatrix} p_{ji}\, s \\ p_{ki}\, s \end{pmatrix} , \tag{4.3} \]

and

\[ F_j(s) = \frac{(1 - s p_{kk})\, p_{ji}\, s + s\, p_{jk}\, p_{ki}\, s}{(1 - s p_{jj})(1 - s p_{kk}) - s^2 p_{kj} p_{jk}} , \qquad F_k(s) = \frac{s\, p_{kj}\, p_{ji}\, s + (1 - s p_{jj})\, p_{ki}\, s}{(1 - s p_{jj})(1 - s p_{kk}) - s^2 p_{kj} p_{jk}} . \tag{4.4} \]
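Formula (4.4) can be checked against the exact distribution of the hitting time, computed by restricting the transition matrix to the non-absorbed states (an illustrative transition matrix, assuming NumPy):

```python
import numpy as np

# Illustrative 3-state chain, states ordered (i, j, k); compare the closed
# form (4.4) for F_j(s) = E[s^{T_i} | X_0 = x_j] with the value computed
# from the exact distribution of the hitting time T_i.
P = np.array([[0.2, 0.5, 0.3],
              [0.4, 0.4, 0.2],
              [0.3, 0.3, 0.4]])
s = 0.9

pji, pjj, pjk = P[1, 0], P[1, 1], P[1, 2]
pki, pkj, pkk = P[2, 0], P[2, 1], P[2, 2]
det = (1 - s * pjj) * (1 - s * pkk) - s**2 * pkj * pjk
Fj_closed = ((1 - s * pkk) * pji * s + s * pjk * pki * s) / det

# Exact distribution: Q keeps only the non-absorbed states {j, k};
# P(T_i = t | X_0 = x_j) = (Q^{t-1} h)_j, with h the one-step hit probabilities.
Q = P[1:, 1:]
h = P[1:, 0]
Fj_exact = 0.0
v = h.copy()
for t in range(1, 500):
    Fj_exact += s**t * v[0]
    v = Q @ v
```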

5 Stopped Processes and Optional Sampling for Martingales

Definition Given a filtration \mathcal{F}_t:

• a martingale is an adapted process (X_t) such that

\[ E[X_t \,|\, \mathcal{F}_{t-1}] = X_{t-1} ; \]

• if

\[ E[X_t \,|\, \mathcal{F}_{t-1}] \le X_{t-1} , \]

then X_t is a super-martingale;

• if

\[ E[X_t \,|\, \mathcal{F}_{t-1}] \ge X_{t-1} , \]

then X_t is a sub-martingale.
Example: random walk. Let

\[ X_t = Z_1 + \cdots + Z_t , \]

where the Z_t are i.i.d. Then

\[ E_t[X_{t+1}] = E_t[X_t + Z_{t+1}] = X_t + E_t[Z_{t+1}] = X_t + E[Z_1] . \]

Here we have used that the Z_t are all i.i.d. and hence

\[ E_t[Z_{t+1}] = E[Z_{t+1}] = E[Z_1] . \]

It follows that the process is a sub-(super-)martingale if E[Z_1] ≥ 0 (≤ 0).

Properties of Martingales

• E_t[M_{t+1} - M_t] = 0;

• E_s[M_t - M_s] = E_s[M_t] - M_s = 0 for s ≤ t;

• a convex (concave) function of a martingale is a sub-(super-)martingale.

Definition Given a stopping time τ, the stopped process is

\[ (X_t^\tau)(\omega) = X_{(t \wedge \tau)(\omega)}(\omega) , \qquad \text{where} \quad t \wedge \tau = \min\{ t, \tau(\omega) \} . \]

Theorem

• If X_t is a martingale (sub-/super-martingale), then so is the stopped process;

• \lim_{t \to \infty} X_t^\tau = X_\tau a.s. if τ is almost surely finite.

Proof for the martingale case. By the tower property of conditional expectation,

\[ E[I_A (X_{t+1} - X_t) \,|\, \mathcal{F}_\theta] = E[I_A (E[X_{t+1} | \mathcal{F}_t] - X_t) \,|\, \mathcal{F}_\theta] = 0 \tag{5.1} \]

for any \mathcal{F}_t-measurable event A and any θ ≤ t. We have

\[ X_t^\tau = X_0 + \sum_{\theta=1}^{t \wedge \tau} (X_\theta - X_{\theta-1}) = X_0 + \sum_{\theta=1}^{t} I_{\tau \ge \theta} (X_\theta - X_{\theta-1}) . \]

Since the event A = \{τ ≥ θ\} = \{τ ≤ θ - 1\}^c is \mathcal{F}_{\theta-1}-measurable, the claim follows.

Optional Sampling Theorem Suppose that X_t is a martingale such that

\[ |X_{t+1} - X_t| < c \]

for some c > 0, and τ is a stopping time such that E[τ] < ∞. Then

\[ E[X_\tau] = X_0 . \]

Proof. The telescoping sum above implies that

\[ |X_t^\tau| \le |X_0| + \tau c , \]

so |X_0| + τ c is an integrable majorant, and the claim follows from the Lebesgue dominated convergence theorem.

Examples.
1. Suppose Harriet has 7 dollars. Her plan is to make one dollar bets on fair
coin tosses until her wealth reaches either 0 or 50, and then to go home.
What is the expected amount of money that Harriet will have when she
goes home? What is the probability that she will have 50 when she goes
home?

2. Consider a contract that at time N will be worth either 100 or 0. Let S(n) be its price at time 0 ≤ n ≤ N. If S(n) is a martingale and S(0) = 47, then what is the probability that the contract will be worth 100 at time N?

3. Pedro plans to buy the contract in the previous problem at time 0 and sell
it the first time T at which the price goes above 55 or below 15. What is
the expected value of S(T )?

4. Suppose S(N) is with probability one either 100 or 0, and that S(0) = 50. Suppose further that there is at least a sixty percent probability that the price will at some point dip below 40 and then subsequently rise above 60 before time N. Prove that S(n) cannot be a martingale.

5. A player bets on a sequence of i.i.d. (and balanced) coin tosses: at each


turn, the player wins twice his bet if the coin falls on heads or loses his bet
if the coin falls on tails. Assume now that the player adopts the following
strategy: he starts by betting 1 franc. If he wins his bet (that is, if the
outcome is heads), he quits the game and does not bet anymore. If he loses
(that is, if the outcome is tails), he plays again and doubles his bet for the
next turn. He then goes on with the same strategy for the rest of the game.
We assume here that the player can borrow any money he wants in order to
bet. Of course, we also assume that he has no information on the outcome
of the next coin toss while betting on it. a) Is the process of gains of the
player a martingale (by convention, we set the gain of the player at time
zero to be equal to zero)? b) What is the gain of the player at the first time
heads comes out? c) Isn’t there a contradiction between a) and b)?

We describe the outcome of betting 1 dollar at time t ≥ 0 by a random variable

\[ X(t, \omega) = \begin{cases} 1, & \text{with probability } 1/2 \\ -1, & \text{with probability } 1/2 , \end{cases} \]

and assume the random variables (X(t, ω))_{t ≥ 0} are i.i.d. In particular, for any time t ≥ 0,

\[ E(X(t, \omega)) = 0 . \]

The gain process of someone who starts playing with Z(0) = 7 dollars and never stops playing is given by, for t ≥ 0,

\[ Z(t, \omega) = Z(0) + \sum_{s=1}^{t} X(s, \omega) . \]

If we take the filtration (\mathcal{F}_t)_{t \ge 0} to be the one generated by the coin tosses (X(t, ω))_{t ≥ 0}, then, for any t ≥ 0,

\begin{align*}
E(Z(t+1, \omega) \,|\, \mathcal{F}_t) &= E\Big( Z(0) + \sum_{s=1}^{t+1} X(s, \omega) \,\Big|\, \mathcal{F}_t \Big) \\
&= Z(0) + \sum_{s=1}^{t} X(s, \omega) + E(X(t+1, \omega) \,|\, \mathcal{F}_t) \tag{5.2} \\
&= Z(0) + \sum_{s=1}^{t} X(s, \omega) + E(X(t+1, \omega)) \\
&= Z(t, \omega) ,
\end{align*}

where we use the linearity of the conditional expectation and the \mathcal{F}_t-measurability of the first t tosses for the second equality, and the independence of X(t+1, ω) from \mathcal{F}_t for the third. Hence Z is a martingale.
Harriet will stop playing when her wealth reaches 0 or 50. We describe this time by T (one can check that it is a stopping time). From the lecture, we know that the stopped process

\[ (Z^T(t, \omega))_{t \ge 0} \]

is still a martingale. But it is also the process describing the evolution of Harriet's wealth. The mathematical formulation of our first question is thus: what is E(Z^T(T(ω), ω))?

To answer this, note that (Z^T(t, ω))_{t ≥ 0} is bounded by 0 and 50. It is therefore uniformly integrable, and by the suitable optional stopping theorem (slide 19, day 5),

\[ E(Z^T(T(\omega), \omega)) = E(Z^T(0, \omega)) = E(Z(0, \omega)) = Z(0) = 7 . \tag{5.3} \]

The expected amount of money that Harriet will have when she goes home is thus her initial wealth.
We turn to the probability of going home with 50 dollars. Note that one of the
three following events must happen:

• Harriet plays forever because her wealth never reaches either of 0 or 50;

• Harriet goes home with 0;

• Harriet goes home with 50.

The first possibility will never happen. This is intuitively clear, but may be verified by the following application of the second Borel–Cantelli lemma. If at some point the sequence of bets produces 50 wins in a row, we can be sure that, at worst, by the end of those 50 wins Harriet will have gone home. Let us then consider the events

\begin{align*}
A_1 &= \{ X_1, X_2, \ldots, X_{50} \text{ are winning bets} \} \\
A_2 &= \{ X_{51}, X_{52}, \ldots, X_{100} \text{ are winning bets} \} \tag{5.4} \\
&\ \ \vdots \\
A_k &= \{ X_{(k-1) \cdot 50 + 1}, X_{(k-1) \cdot 50 + 2}, \ldots, X_{k \cdot 50} \text{ are winning bets} \} .
\end{align*}

The events (A_k)_{k \ge 1} are independent, and the probability of each of them is 1/2^{50}, which is small but strictly positive. But then, as

\[ \sum_{k=1}^{\infty} \mathrm{Prob}(A_k) = \infty , \]

the second Borel–Cantelli lemma ensures that, with probability one, infinitely many of the events A_k occur. In particular, Harriet goes home after a finite time with probability one.
With this observation in mind, from (5.3),

\begin{align*}
Z(0) = E(Z^T(T(\omega), \omega)) &= E\big( Z^T(T(\omega), \omega) \mathbf{1}_{Z(T(\omega), \omega) = 0} \big) + E\big( Z^T(T(\omega), \omega) \mathbf{1}_{Z(T(\omega), \omega) = 50} \big) + E\big( Z^T(T(\omega), \omega) \mathbf{1}_{T(\omega) = \infty} \big) \\
&= E\big( Z^T(T(\omega), \omega) \mathbf{1}_{Z(T(\omega), \omega) = 0} \big) + E\big( Z^T(T(\omega), \omega) \mathbf{1}_{Z(T(\omega), \omega) = 50} \big) \tag{5.5} \\
&= E\big( 0 \cdot \mathbf{1}_{Z(T(\omega), \omega) = 0} \big) + E\big( 50 \cdot \mathbf{1}_{Z(T(\omega), \omega) = 50} \big) \\
&= 50 \cdot \mathrm{Prob}(Z^T(T(\omega), \omega) = 50) ,
\end{align*}

from which

\[ \mathrm{Prob}(\text{Harriet's wealth when she goes home is } 50) = \mathrm{Prob}(Z^T(T(\omega), \omega) = 50) = \frac{Z(0)}{50} = \frac{7}{50} . \tag{5.6} \]
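Harriet's problem can also be checked by simulating the fair game directly (a Monte Carlo sketch assuming NumPy, with 20,000 independent paths):

```python
import numpy as np

rng = np.random.default_rng(7)

start, top = 7, 50
n_paths = 20_000

wealth = np.full(n_paths, start)
active = np.ones(n_paths, dtype=bool)

# One-dollar fair bets until each path is absorbed at 0 or 50.
while active.any():
    wealth[active] += rng.choice([-1, 1], size=int(active.sum()))
    active = (wealth > 0) & (wealth < top)

p_top = (wealth == top).mean()      # should be close to 7/50
mean_final = wealth.mean()          # should be close to the initial 7
```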

Exercise 2 The process S(n) being a martingale,

\[ 47 = S(0) = E(S(N)) = 100 \cdot \mathrm{Prob}(S(N) = 100) + 0 \cdot \mathrm{Prob}(S(N) = 0) = 100 \cdot \mathrm{Prob}(S(N) = 100) , \]

from which

\[ \mathrm{Prob}(S(N) = 100) = \frac{47}{100} . \]

Exercise 3 As (S(t, ω))_{t ≥ 0} is a martingale, so is the stopped process (S^T(t, ω))_{t ≥ 0}. Also, since

\[ 0 = \text{first possible final value of } S \le 15 \le S(0) = 47 \le 55 \le 100 = \text{second possible final value of } S , \tag{5.7} \]

the price must cross one of the two thresholds by time N at the latest, so that, whatever the state of the world ω,

\[ 0 < T(\omega) \le N \implies T(\omega) = \min\{ T(\omega), N \} . \]

But then,

\[ E(S_{T(\omega)}(\omega)) = E(S_{\min\{T(\omega), N\}}(\omega)) = E(S_N^T(\omega)) = S^T(0) = S(0) = 47 , \]

where we used the fact that S^T is a martingale. The expected value is thus, again, the initial value.

Exercise 4 For contradiction, assume S is a martingale; we will take advantage of the process being likely to "go up after going down". For this we first write

• τ_{40} for the first time the process is below 40, and

• τ_{60} for the first time the process is above 60 after having been below 40 (hence τ_{60} ≥ τ_{40}).

We have that

\[ \Omega = \{ \tau_{40} > N \} \sqcup \{ \tau_{40} \le N;\ \tau_{60} > N \} \sqcup \{ \tau_{40} \le N;\ \tau_{60} \le N \} = A_1 \sqcup A_2 \sqcup A_3 , \tag{5.8} \]

where, for example, τ_{60} > N means that the process does not go above 60 (after dipping below 40) before time N, and thus ends at 0.

As S is assumed to be a martingale, so are S^{\tau_{40}} and S^{\tau_{60}}. In particular,

\[ E(S_N^{\tau_{60}} - S_N^{\tau_{40}}) = S_0 - S_0 = 0 . \tag{5.9} \]

We will work a bit more on the left-hand side. We have that

\begin{align*}
E(S_N^{\tau_{60}} - S_N^{\tau_{40}}) &= E\big( (S_N^{\tau_{60}} - S_N^{\tau_{40}}) \mathbf{1}_{A_1} \big) + E\big( (S_N^{\tau_{60}} - S_N^{\tau_{40}}) \mathbf{1}_{A_2} \big) + E\big( (S_N^{\tau_{60}} - S_N^{\tau_{40}}) \mathbf{1}_{A_3} \big) \\
&\ge E(0 \cdot \mathbf{1}_{A_1}) + E((-40) \cdot \mathbf{1}_{A_2}) + E(20 \cdot \mathbf{1}_{A_3}) \tag{5.10} \\
&= -40 \cdot \mathrm{Prob}(A_2) + 20 \cdot \mathrm{Prob}(A_3) .
\end{align*}
By assumption, Prob(A_3) ≥ 6/10. Concerning the other probability, as S^{\tau_{40}} is assumed to be a martingale,

\begin{align*}
50 = S(0) = E(S_N^{\tau_{40}}) &= E(S_{\tau_{40}} \,|\, \tau_{40} \le N)\, \mathrm{Prob}(\tau_{40} \le N) + E(S_N \,|\, \tau_{40} > N)\, \mathrm{Prob}(\tau_{40} > N) \\
&\le 40 \cdot \mathrm{Prob}(\tau_{40} \le N) + 100 \cdot (1 - \mathrm{Prob}(\tau_{40} \le N)) . \tag{5.11}
\end{align*}

As a result,

\[ \mathrm{Prob}(A_2 \sqcup A_3) = \mathrm{Prob}(\tau_{40} \le N) \le \frac{5}{6} , \]

and

\[ \mathrm{Prob}(A_1) = 1 - \mathrm{Prob}(A_2 \sqcup A_3) \ge \frac{1}{6} . \]

But then,

\[ \mathrm{Prob}(A_2) = 1 - \mathrm{Prob}(A_1) - \mathrm{Prob}(A_3) \le 1 - \frac{1}{6} - \frac{6}{10} = \frac{7}{30} . \]
Going back to (5.10),

\[ E(S_N^{\tau_{60}} - S_N^{\tau_{40}}) \ge -40 \cdot \mathrm{Prob}(A_2) + 20 \cdot \mathrm{Prob}(A_3) \ge -40 \cdot \frac{7}{30} + 20 \cdot \frac{6}{10} = \frac{8}{3} > 0 , \]

which clearly contradicts (5.9). As a result, S cannot be a martingale.

Exercise 5

a) We represent the coin tosses by the i.i.d. random variables (X(t, ω))_{t ≥ 0}, where

\[ X(t, \omega) = \begin{cases} 1 & (\text{for "heads"}), \text{ with probability } 1/2 \\ -1 & (\text{for "tails"}), \text{ with probability } 1/2 . \end{cases} \]

In particular, E(X(t, ω)) = 0, and the gain from betting a francs on the outcome at time t is thus aX(t, ω).

The strategy of the player can be described by the following sequence of amounts:

\[ a_1 = 1 , \qquad a_t(\omega) = 2^{t-1}\, \mathbf{1}_{\cap_{u=1}^{t-1} \{ X(u, \omega) = -1 \}} , \quad t \ge 2 , \tag{5.12} \]

where one should note that a_t(ω) is measurable with respect to the σ-field generated by the coin tosses up to time t − 1. (In other words, the strategy is reasonable in the sense that it does not require any knowledge of the coin tosses still to occur.) The gains resulting from this strategy are

\[ G(t, \omega) = \sum_{s=1}^{t} a_s(\omega) X(s, \omega) , \]

and one may check that this process is a martingale. Indeed, it is adapted to the filtration (\mathcal{F}_t)_{t \ge 0} generated by the coin tosses, and for any t ≥ 0,

\begin{align*}
E(G(t+1, \omega) \,|\, \mathcal{F}_t) &= E\Big( \sum_{s=1}^{t+1} a_s(\omega) X(s, \omega) \,\Big|\, \mathcal{F}_t \Big) \\
&= \sum_{s=1}^{t} a_s(\omega) X(s, \omega) + E(a_{t+1}(\omega) X(t+1, \omega) \,|\, \mathcal{F}_t) \\
&= G(t, \omega) + a_{t+1}(\omega)\, E(X(t+1, \omega) \,|\, \mathcal{F}_t) \tag{5.13} \\
&= G(t, \omega) + a_{t+1}(\omega)\, E(X(t+1, \omega)) \\
&= G(t, \omega) .
\end{align*}

b) If we denote by τ(ω) the stopping time indicating the first head,

\begin{align*}
G(\tau(\omega), \omega) &= \sum_{s=1}^{\tau(\omega)} a_s(\omega) X(s, \omega) = \sum_{s=1}^{\tau(\omega)-1} 2^{s-1} (-1) + 2^{\tau(\omega)-1} \tag{5.14} \\
&= -\big( 2^{\tau(\omega)-1} - 1 \big) + 2^{\tau(\omega)-1} = 1 .
\end{align*}
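Part b) can be confirmed by brute force: for every possible time of the first head, the accumulated gain of the doubling strategy is exactly 1.

```python
# For every possible time of the first head, the gain is losses of
# 1, 2, ..., 2^(t-2) on the tails, then a win of 2^(t-1) on the head,
# which always nets out to 1, as in (5.14).
for first_head in range(1, 40):
    gain = -sum(2 ** (s - 1) for s in range(1, first_head)) + 2 ** (first_head - 1)
    assert gain == 1
```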

c) There is an apparent contradiction: given that every time I play, on average I neither win nor lose, how can I find a strategy with which I will always eventually win?

This argument is an intuitive, and incorrect, interpretation of the optional sampling theorem. We have two versions of it, but the form is similar in both cases and looks like: if we have

• a martingale M,

• a stopping time τ, and

• an additional condition,

then

\[ M_0 = E(M_\tau) . \]

What is puzzling in our case is that the stopped gain process (which corresponds to the term inside the expectation on the right-hand side) is identically equal to one, while the starting value (which corresponds to the left-hand side) is zero. There is, however, no contradiction, as the setting of this problem does not satisfy the additional conditions required by the theorem. (These conditions can be either uniform integrability (cf. slide 19, day 5) or the combination of a bound on the increments and a finite expected value for the stopping time (cf. slide 12, day 5).)
