
1 Sets

(A ∪ B) ∪ C = A ∪ (B ∪ C)
(A ∪ B)^c = A^c ∩ B^c

Definition: σ-field
A collection F of subsets of Ω is a σ-field if:
(a) ∅ ∈ F
(b) if A_1, A_2, ... ∈ F then ∪_{i=1}^∞ A_i ∈ F
(c) if A ∈ F then A^c ∈ F
2 Probability

P(A^c) = 1 − P(A)
If B ⊇ A then P(B) = P(A) + P(B \ A) ≥ P(A)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
More generally:
P(∪_{i=1}^n A_i) = Σ_i P(A_i) − Σ_{i<j} P(A_i ∩ A_j) + Σ_{i<j<k} P(A_i ∩ A_j ∩ A_k) − ... + (−1)^{n+1} P(A_1 ∩ A_2 ∩ ... ∩ A_n)
Lemma 5, p. 7: Let A_1 ⊆ A_2 ⊆ ..., and write A for their limit:
A = ∪_{i=1}^∞ A_i = lim_{i→∞} A_i, then P(A) = lim_{i→∞} P(A_i)
Similarly, if B_1 ⊇ B_2 ⊇ B_3 ⊇ ..., then B = ∩_{i=1}^∞ B_i = lim_{i→∞} B_i satisfies P(B) = lim_{i→∞} P(B_i)
Multiplication rule
P(A ∩ B) = P(A) P(B | A)
Conditional probability
P(A | B) = P(A ∩ B) / P(B)
P(A | B, C, ...) = P(A, B, C, ...) / P(B, C, ...)
Bayes formula
P(A | B) = P(B | A) P(A) / P(B)
Total probability
P(A) = P(A | B) P(B) + P(A | B^c) P(B^c)
P(A) = Σ_{i=1}^n P(A | B_i) P(B_i)
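As an illustration of Bayes formula together with the total probability rule, here is a minimal Python sketch; the prevalence, sensitivity and false-positive numbers are made-up values for the example and are not part of the sheet.

```python
# Hypothetical numbers: P(B) = prevalence, P(A|B) = sensitivity, P(A|B^c) = false-positive rate.
p_B = 0.01
p_A_given_B = 0.95
p_A_given_Bc = 0.05

# Total probability: P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)
p_A = p_A_given_B * p_B + p_A_given_Bc * (1 - p_B)

# Bayes formula: P(B|A) = P(A|B)P(B) / P(A)
p_B_given_A = p_A_given_B * p_B / p_A
print(f"P(A) = {p_A:.4f}, P(B|A) = {p_B_given_A:.4f}")   # about 0.0590 and 0.1610
```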
Definition 1, p. 13:
A family {A_i : i ∈ I} is independent if:
P(∩_{i∈J} A_i) = Π_{i∈J} P(A_i) for all finite subsets J of I

3 Random Variable

Distribution function
F : R → [0, 1] : F(x) = P(X ≤ x)
Mass function
f(x) = P(X = x)
Lemma 11, p. 30:
Let F be the distribution function of X, then
(a) P(X > x) = 1 − F(x)
(b) P(x < X ≤ y) = F(y) − F(x)
(c) P(X = x) = F(x) − lim_{y↑x} F(y)
Marginal distribution:
lim_{y→∞} F_{X,Y}(x, y) = F_X(x), lim_{x→∞} F_{X,Y}(x, y) = F_Y(y)
Lemma 5, p. 39:
The joint distribution function F_{X,Y} of the random vector (X, Y) has the following properties:
lim_{x,y→−∞} F_{X,Y}(x, y) = 0 and lim_{x,y→∞} F_{X,Y}(x, y) = 1
if (x_1, y_1) ≤ (x_2, y_2) then F_{X,Y}(x_1, y_1) ≤ F_{X,Y}(x_2, y_2)
F_{X,Y} is continuous from above, in that F_{X,Y}(x + u, y + v) → F_{X,Y}(x, y) as u, v ↓ 0
Theorem:
If X and Y are independent and g, h : R → R, then g(X) and h(Y) are independent too.
Definition:
The expectation of the random variable X is:
E(X) = Σ_{x : f(x)>0} x f(x)
Lemma:
If X has mass function f and g : R → R, then:
E(g(X)) = Σ_x g(x) f(x)
Continuous counterpart: E(g(X)) = ∫_R g(x) f_X(x) dx
Definition:
If k is a positive integer, the k:th moment m_k of X is defined by m_k = E(X^k). The k:th central moment is σ_k = E((X − m_1)^k)
Theorem:
The expectation operator E satisfies:
(a) If X ≥ 0 then E(X) ≥ 0
(b) If a, b ∈ R then E(aX + bY) = aE(X) + bE(Y)
Lemma:
If X and Y are independent, then E(XY) = E(X) E(Y)
Definition:
X and Y are uncorrelated if E(XY) = E(X) E(Y)
Theorem:
For random variables X and Y
(a) Var(aX) = a² Var(X) for a ∈ R
(b) Var(X + Y) = Var(X) + Var(Y) if X and Y are uncorrelated.
Indicator function: E(I_A) = P(A)
3.1 Distribution functions

Constant variable
X(ω) = c: F(x) = 0 for x < c, F(x) = 1 for x ≥ c
Bernoulli distribution Bern(p)
A coin is tossed one time and shows heads with probability p, with X(H) = 1 and X(T) = 0
F(x) = 0 for x < 0, F(x) = 1 − p for 0 ≤ x < 1, F(x) = 1 for x ≥ 1
E(X) = p, Var(X) = p(1 − p)
Binomial distribution bin(n, p)
A coin is tossed n times and a head turns up each time with probability p. The total number of heads is described by:
f(k) = C(n, k) p^k q^{n−k}, q = 1 − p
E(X) = np, Var(X) = np(1 − p)
Poisson distribution
f(k) = (λ^k / k!) exp(−λ)
E(X) = Var(X) = λ
Geometric distribution
Independent Bernoulli trials are performed. Let W be the waiting time before the first success occurs. Then
f(k) = P(W = k) = p(1 − p)^{k−1}
E(X) = 1/p, Var(X) = (1 − p)/p²
Negative binomial distribution
Let W_r be the waiting time before the r:th success. Then
f(k) = P(W_r = k) = C(k − 1, r − 1) p^r (1 − p)^{k−r}, k = r, r + 1, ...
E(X) = r/p, Var(X) = r(1 − p)/p²
Exponential distribution:
F(x) = 1 − e^{−λx}, x ≥ 0
E(X) = 1/λ, Var(X) = 1/λ²
Normal distribution:
f(x) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²)), −∞ < x < ∞
E(X) = μ, Var(X) = σ²
Cauchy distribution:
f(x) = 1/(π(1 + x²)) (no moments!)
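The moment formulas above can be checked by simulation; a minimal numpy sketch (n, p and the sample size are arbitrary choices) comparing E(X) = np and Var(X) = np(1 − p) for the binomial case:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 0.3                        # arbitrary example parameters
x = rng.binomial(n, p, size=200_000)  # simulated bin(n, p) variables

print("mean:     simulated %.3f  theory %.3f" % (x.mean(), n * p))
print("variance: simulated %.3f  theory %.3f" % (x.var(), n * p * (1 - p)))
```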
3.2 Dependence

Joint distribution:
F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^x ∫_{−∞}^y f(u, v) dv du
Lemma:
The random variables X and Y are independent if and only if
f_{X,Y}(x, y) = f_X(x) f_Y(y) for all x, y ∈ R, equivalently
F_{X,Y}(x, y) = F_X(x) F_Y(y) for all x, y ∈ R
Marginal distribution:
F_X(x) = P(X ≤ x) = F(x, ∞) = ∫_{−∞}^x ∫_{−∞}^∞ f(u, y) dy du
Marginal densities: f_X(x) = Σ_y P({X = x} ∩ {Y = y}) = Σ_y P(X = x, Y = y) = Σ_y f_{X,Y}(x, y) in the discrete case,
f_X(x) = ∫_{−∞}^∞ f(x, y) dy in the continuous case
Lemma:
E(g(X, Y)) = Σ_{x,y} g(x, y) f_{X,Y}(x, y)
Definition:
Cov(X, Y) = E[(X − EX)(Y − EY)]

3.3 Conditional distributions

Definition:
The conditional distribution of Y given X = x is:
F_{Y|X}(y | x) = P(Y ≤ y | X = x) = ∫_{−∞}^y f(x, v) / f_X(x) dv, for x such that f_X(x) > 0
Theorem: Conditional expectation
ψ(X) = E(Y | X) satisfies E(ψ(X)) = E(Y), and more generally E(ψ(X) g(X)) = E(Y g(X))
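For the discrete formulas in 3.2 (marginals by summing the joint mass function, Cov(X, Y) = E[(X − EX)(Y − EY)], independence as factorisation), a small numpy sketch with an arbitrary 2×3 joint table chosen only for illustration:

```python
import numpy as np

# Arbitrary joint mass function f_{X,Y}; rows = values of X, columns = values of Y.
f = np.array([[0.10, 0.20, 0.10],
              [0.15, 0.25, 0.20]])
x_vals = np.array([0, 1])
y_vals = np.array([0, 1, 2])

f_X = f.sum(axis=1)          # marginal of X: sum over y
f_Y = f.sum(axis=0)          # marginal of Y: sum over x

EX = (x_vals * f_X).sum()
EY = (y_vals * f_Y).sum()
EXY = (np.outer(x_vals, y_vals) * f).sum()
cov = EXY - EX * EY          # equivalent to E[(X - EX)(Y - EY)]

independent = np.allclose(f, np.outer(f_X, f_Y))   # f = f_X f_Y for all x, y?
print("Cov(X,Y) =", round(cov, 4), "independent:", independent)
```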
3.4 Sums of random variables

Theorem:
P(X + Y = z) = Σ_x f(x, z − x)
If X and Y are independent, then
P(X + Y = z) = f_{X+Y}(z) = Σ_x f_X(x) f_Y(z − x) = Σ_y f_X(z − y) f_Y(y)

3.5 Multivariate normal distribution:

f(x) = exp(−½ (x − μ)^T V^{−1} (x − μ)) / √((2π)^n det(V))
X = (X_1, X_2, ..., X_n)^T
μ = (μ_1, ..., μ_n), E(X_i) = μ_i
V = (v_ij), v_ij = Cov(X_i, X_j)

4 Generating functions

Definition: Generating function
The generating function of the random variable X is defined by:
G(s) = E(s^X)
Example: Generating functions
Constant: G(s) = s^c
Bernoulli: G(s) = (1 − p) + ps
Geometric: G(s) = ps / (1 − s(1 − p))
Poisson: G(s) = e^{λ(s−1)}
Theorem: expectation from G(s)
(a) E(X) = G'(1)
(b) E(X(X − 1)...(X − k + 1)) = G^(k)(1)
Theorem: independence
If X and Y are independent, then G_{X+Y}(s) = G_X(s) G_Y(s)
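A short numpy sketch (λ and s chosen arbitrarily) estimating G(s) = E(s^X) by simulation and comparing with the Poisson closed form e^{λ(s−1)} and with E(X) = G'(1):

```python
import numpy as np

rng = np.random.default_rng(1)
lam, s = 2.5, 0.7                       # arbitrary example values
x = rng.poisson(lam, size=200_000)

g_mc = np.mean(s ** x)                  # Monte Carlo estimate of E(s^X)
g_exact = np.exp(lam * (s - 1))         # Poisson generating function
print("E(s^X): simulated %.4f, formula %.4f" % (g_mc, g_exact))
print("G'(1) should equal E(X) = lambda =", lam, "; sample mean =", round(x.mean(), 3))
```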
4.1 Characteristic functions

Definition: moment generating function
M(t) = E(e^{tX})
Definition: characteristic function
φ(t) = E(e^{itX})
Theorem: independence
If X and Y are independent, then φ_{X+Y}(t) = φ_X(t) φ_Y(t)
Theorem: Y = aX + b
φ_Y(t) = e^{itb} φ_X(at)
Definition: joint characteristic function
φ_{X,Y}(s, t) = E(e^{isX} e^{itY})
X and Y are independent if and only if:
φ_{X,Y}(s, t) = φ_X(s) φ_Y(t) for all s and t
Theorem: moment generating function and characteristic function
Examples of characteristic functions:
Ber(p): φ(t) = q + pe^{it}
Binomial distribution bin(n, p): φ_X(t) = (q + pe^{it})^n
Exponential distribution: φ(t) = ∫_0^∞ λ e^{itx} e^{−λx} dx
Cauchy distribution: φ(t) = (1/π) ∫ e^{itx} / (1 + x²) dx
Normal distribution N(0, 1): φ(t) = E(e^{itX}) = ∫ (1/√(2π)) exp(itx − ½x²) dx
Corollary:
Random variables X and Y have the same characteristic function if and only if they have the same distribution function.
Theorem: Law of large Numbers
Let X_1, X_2, X_3, ... be a sequence of iid r.vs with finite mean μ.
Their partial sums S_n = X_1 + X_2 + ... + X_n satisfy n^{−1} S_n → μ in distribution (and hence in probability) as n → ∞
Central Limit Theorem:
Let X_1, X_2, X_3, ... be a sequence of iid r.vs with finite mean μ and finite non-zero variance σ², and let S_n = X_1 + X_2 + ... + X_n. Then
(S_n − nμ) / √(nσ²) →^D N(0, 1) as n → ∞
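To illustrate the Central Limit Theorem above, a minimal numpy sketch: standardised sums of (arbitrarily chosen) exponential variables compared against a standard normal probability:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 300, 20_000
mu, sigma = 1.0, 1.0                         # Exp(1) has mean 1 and variance 1

s = rng.exponential(scale=1.0, size=(reps, n)).sum(axis=1)
z = (s - n * mu) / np.sqrt(n * sigma**2)     # (S_n - n mu) / sqrt(n sigma^2)

# For a standard normal, P(Z <= 1.96) is about 0.975.
print("P(Z <= 1.96) estimated:", np.mean(z <= 1.96))
```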
5 Markov chains

Definition Markov chain
P(X_n = s | X_0 = x_0, ..., X_{n−1} = x_{n−1}) = P(X_n = s | X_{n−1} = x_{n−1})
Definition homogeneous chain
P(X_{n+1} = j | X_n = i) = P(X_1 = j | X_0 = i)
Definition transition matrix
P = (p_ij) with p_ij = P(X_{n+1} = j | X_n = i)
Theorem: stochastic matrix
(a) P has non-negative entries
(b) P has row sums equal to 1
n-step transition
p_ij(m, m + n) = P(X_{m+n} = j | X_m = i)
Theorem Chapman-Kolmogorov
p_ij(m, m + n + r) = Σ_k p_ik(m, m + n) p_kj(m + n, m + n + r)
so P(m, m + n) = P^n
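The relation P(m, m + n) = P^n can be checked directly with numpy; the 3-state transition matrix below is an arbitrary example, not taken from the sheet:

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],      # arbitrary stochastic matrix
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

P5 = np.linalg.matrix_power(P, 5)   # 5-step transition probabilities p_ij(5)
print(P5.round(4))
print("row sums:", P5.sum(axis=1))  # still 1: P^n is again a stochastic matrix

mu0 = np.array([1.0, 0.0, 0.0])     # start in state 0
print("mu(5) = mu(0) P^5 =", mu0 @ P5)
```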
Definition: mass function
μ_i(n) = P(X_n = i)
μ(m + n) = μ(m) P^n, so μ(n) = μ(0) P^n
Definition: persistent, transient
persistent: P(X_n = i for some n ≥ 1 | X_0 = i) = 1
transient: P(X_n = i for some n ≥ 1 | X_0 = i) < 1
Definition: first passage time
f_ij(n) = P(X_1 ≠ j, ..., X_{n−1} ≠ j, X_n = j | X_0 = i)
f_ij := Σ_{n=1}^∞ f_ij(n)
Corollary: persistent, transient
State j is persistent if Σ_n p_jj(n) = ∞, and then Σ_n p_ij(n) = ∞ for all i
State j is transient if Σ_n p_jj(n) < ∞, and then Σ_n p_ij(n) < ∞ for all i
Theorem: Generating functions
P_ij(s) = Σ_{n=0}^∞ s^n p_ij(n), F_ij(s) = Σ_{n=0}^∞ s^n f_ij(n), then
(a) P_ii(s) = 1 + F_ii(s) P_ii(s)
(b) P_ij(s) = F_ij(s) P_jj(s) if i ≠ j
Definition: First visit time T_j
T_j := min{n ≥ 1 : X_n = j}
Definition: mean recurrence time μ_i
μ_i := E(T_i | X_0 = i) = Σ_n n f_ii(n)
Definition: null, non-null state
state i is null if μ_i = ∞
state i is non-null if μ_i < ∞
Theorem: nullness of a persistent state
A persistent state is null if and only if p_ii(n) → 0 as n → ∞
Definition: period d(i)
The period d(i) of a state i is defined by d(i) = gcd{n : p_ii(n) > 0}. We call i periodic if d(i) > 1 and aperiodic if d(i) = 1
Definition: Ergodic
A state is called ergodic if it is persistent, non-null, and aperiodic.
Definition: (Inter-)communication
i → j if p_ij(m) > 0 for some m
i ↔ j if i → j and j → i
Theorem: intercommunication
If i ↔ j then:
(a) i and j have the same period
(b) i is transient iff j is transient
(c) i is null persistent iff j is null persistent
Definition: closed, irreducible
A set C of states is called:
(a) closed if p_ij = 0 for all i ∈ C, j ∉ C
(b) irreducible if i ↔ j for all i, j ∈ C
An absorbing state is a closed set with one state.
Theorem: Decomposition
The state space S can be partitioned as S = T ∪ C_1 ∪ C_2 ∪ ..., where T is the set of transient states and the C_i are irreducible, closed sets of persistent states
Lemma: finite S
If S is finite, then at least one state is persistent and all persistent states are non-null.

5.1 Stationary distributions

Definition: stationary distribution
π is called a stationary distribution if
(a) π_j ≥ 0 for all j, Σ_j π_j = 1
(b) π = πP, so π_j = Σ_i π_i p_ij for all j
Theorem: existence of stationary distribution
An irreducible chain has a stationary distribution π iff all states are non-null persistent.
Then π is unique and given by π_i = 1/μ_i
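A small numpy sketch that computes the stationary distribution of an arbitrary 3-state chain by solving π = πP together with Σ_j π_j = 1:

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],      # arbitrary stochastic matrix
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
n = P.shape[0]

# Solve pi (P - I) = 0 together with the normalisation sum(pi) = 1:
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.append(np.zeros(n), 1.0)
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

print("pi   =", pi.round(4))
print("pi P =", (pi @ P).round(4))   # equals pi: stationary
```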
Lemma: ρ_i(k)
ρ_i(k): the mean number of visits of the chain to the state i between two successive visits to state k.
Lemma: For any state k of an irreducible persistent chain, the vector ρ(k) satisfies ρ_i(k) < ∞ for all i, and ρ(k) = ρ(k)P
Theorem: irreducible, persistent
If the chain is irreducible and persistent, there exists a positive x with x = xP, which is unique up to a multiplicative constant. The chain is non-null if Σ_i x_i < ∞ and null if Σ_i x_i = ∞
Theorem: transient chain
Let s be any state of an irreducible chain. The chain is transient iff there exists a non-zero solution {y_j : j ≠ s}, with |y_j| ≤ 1 for all j, to the equation:
y_i = Σ_{j : j≠s} p_ij y_j, i ≠ s
Theorem: persistent chain
Let s be any state of an irreducible chain on S = {0, 1, 2, ...}. The chain is persistent if there exists a solution {y_j : j ≠ s} to the inequalities
y_i ≥ Σ_{j : j≠s} p_ij y_j, i ≠ s,
such that y_i → ∞ as i → ∞
Theorem: Limit theorem
For an irreducible aperiodic chain we have that
p_ij(n) → 1/μ_j as n → ∞ for all i and j

5.2 Reversibility

Theorem: Inverse chain
Y with Y_n = X_{N−n} is a Markov chain with P(Y_{n+1} = j | Y_n = i) = (π_j/π_i) p_ji
Definition: Reversible chain
A chain is called reversible if π_i p_ij = π_j p_ji
Theorem: reversible ⇒ stationary
If there is a π with π_i p_ij = π_j p_ji, then π is the stationary distribution of the chain.

5.3 Poisson process

Definition: Poisson process
N(t) gives the number of events up to time t.
N(t) is a Poisson process on S = {0, 1, 2, ...} if
(a) N(0) = 0; if s < t then N(s) ≤ N(t)
(b) P(N(t + h) = n + m | N(t) = n) = λh + o(h) if m = 1, o(h) if m > 1, and 1 − λh + o(h) if m = 0
(c) the number of emissions in an interval is independent of the emissions in earlier intervals.
Theorem: Poisson distribution
N(t) has the Poisson distribution:
P(N(t) = j) = ((λt)^j / j!) e^{−λt}
Definition: arrival time, interarrival time
Arrival time: T_n = inf{t : N(t) = n}
Interarrival time: X_n = T_n − T_{n−1}
Theorem: Interarrival times
X_1, X_2, ... are independent, each having the exponential distribution with parameter λ
Birth process
Poisson process with intensities λ_0, λ_1, ...
Eq. Forward system of equations:
p'_ij(t) = λ_{j−1} p_{i,j−1}(t) − λ_j p_ij(t), j ≥ i, with λ_{−1} = 0 and p_ij(0) = δ_ij
Eq. Backward system of equations:
p'_ij(t) = λ_i p_{i+1,j}(t) − λ_i p_ij(t), j ≥ i, with p_ij(0) = δ_ij
Theorem:
The forward system has a unique solution, which satisfies the backward equations.
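A numpy sketch of the interarrival-time characterisation above: build a Poisson process from iid exponential interarrival times (rate and horizon are arbitrary choices) and check that E N(t) and Var N(t) are both close to λt:

```python
import numpy as np

rng = np.random.default_rng(3)
lam, t, reps = 2.0, 5.0, 20_000

counts = np.empty(reps, dtype=int)
for r in range(reps):
    arrivals = np.cumsum(rng.exponential(1 / lam, size=40))  # 40 >> lam*t events suffice here
    counts[r] = np.searchsorted(arrivals, t)                 # N(t) = number of arrivals <= t

print("E N(t):   simulated %.3f, theory lambda*t = %.3f" % (counts.mean(), lam * t))
print("Var N(t): simulated %.3f, theory %.3f" % (counts.var(), lam * t))
```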
5.4 Continuous Markov chains

Definition: Continuous Markov chain
X is a continuous-time Markov chain if:
P(X(t_n) = j | X(t_1) = i_1, ..., X(t_{n−1}) = i_{n−1}) = P(X(t_n) = j | X(t_{n−1}) = i_{n−1})
Definition: transition probability
p_ij(s, t) = P(X(t) = j | X(s) = i) for s ≤ t
homogeneous if p_ij(s, t) = p_ij(0, t − s)
Def: Generator matrix
G = (g_ij), with p_ij(h) = g_ij h + o(h) if i ≠ j and p_ii(h) = 1 + g_ii h + o(h)
Eq. Forward system of equations: P'_t = P_t G
Eq. Backward system of equations: P'_t = G P_t
Often the solutions are of the form P_t = exp(tG)
Matrix exponential: exp(tG) = Σ_{n=0}^∞ (t^n / n!) G^n
Irreducible if for any pair i, j: p_ij(t) > 0 for some t
Definition: Stationary
π with π_j ≥ 0, Σ_j π_j = 1 and π = π P_t for all t ≥ 0
Claim: π = π P_t for all t ⟺ πG = 0
Theorem:
For an irreducible chain: if a stationary distribution π exists, then p_ij(t) → π_j as t → ∞ for all i, j; if not, p_ij(t) → 0
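A sketch of P_t = exp(tG) using scipy's matrix exponential, with an arbitrary two-state generator chosen for the example; it also checks the claim that the stationary π satisfies πG = 0:

```python
import numpy as np
from scipy.linalg import expm

G = np.array([[-1.0,  1.0],    # arbitrary generator: rows sum to 0, off-diagonals >= 0
              [ 2.0, -2.0]])

Pt = expm(0.5 * G)             # transition matrix P_t for t = 0.5
print(Pt.round(4), "row sums:", Pt.sum(axis=1))

pi = np.array([2/3, 1/3])      # stationary distribution of this particular G
print("pi G   =", pi @ G)               # approximately [0, 0]
print("pi P_t =", (pi @ Pt).round(4))   # equals pi
```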
6 Convergence of Random Variables

Norm
(a) ||f|| ≥ 0
(b) ||f|| = 0 iff f = 0
(c) ||af|| = |a| ||f||
(d) ||f + g|| ≤ ||f|| + ||g||
Convergent almost surely
X_n →^{a.s.} X if P({ω : X_n(ω) → X(ω) as n → ∞}) = 1
Convergent in r:th mean
X_n →^r X if E|X_n^r| < ∞ and E(|X_n − X|^r) → 0 as n → ∞
Definition: convergent in probability
X_n →^P X as n → ∞ if P(|X_n − X| > ε) → 0 as n → ∞
Convergent in distribution
X_n →^D X if P(X_n ≤ x) → P(X ≤ x) as n → ∞ at every point x where F_X is continuous
Theorem: implications
(X_n →^{a.s.} X) or (X_n →^r X) ⟹ (X_n →^P X) ⟹ (X_n →^D X)
For r > s ≥ 1: (X_n →^r X) ⟹ (X_n →^s X)
Theorem: additional implications
(a) If X_n →^D c, where c is constant, then X_n →^P c
(b) If X_n →^P X and P(|X_n| ≤ k) = 1 for all n and some k, then X_n →^r X for all r ≥ 1
(c) If P_n(ε) = P(|X_n − X| > ε) satisfies Σ_n P_n(ε) < ∞ for all ε > 0, then X_n →^{a.s.} X
Theorem: Skorokhod's representation theorem
If X_n →^D X as n → ∞, then there exists a probability space and random variables Y_n, Y with:
(a) Y_n and Y have distributions F_n and F
(b) Y_n →^{a.s.} Y as n → ∞
Theorem: Convergence of functions
If X_n →^D X and g : R → R is continuous, then g(X_n) →^D g(X)
Theorem: Equivalence
The following statements are equivalent:
(a) X_n →^D X
(b) E(g(X_n)) → E(g(X)) for all bounded continuous functions g
(c) E(g(X_n)) → E(g(X)) for all functions g of the form g(x) = f(x) I_{[a,b]}(x) where f is continuous
Theorem: Borel-Cantelli
Let A = ∩_n ∪_{m≥n} A_m be the event that infinitely many of the A_n occur. Then:
(a) P(A) = 0 if Σ_n P(A_n) < ∞
(b) P(A) = 1 if Σ_n P(A_n) = ∞ and A_1, A_2, ... are independent.
Theorem:
X_n → X and Y_n → Y implies X_n + Y_n → X + Y for convergence a.s., in r:th mean and in probability. Not generally true in distribution.

6.1 Laws of large numbers

Theorem:
X_1, X_2, ... iid with E(X_i²) < ∞. Then
(1/n) Σ_{i=1}^n X_i → E(X) a.s. and in mean square
Theorem:
{X_n} iid with distribution function F. Then (1/n) Σ_{i=1}^n X_i →^P μ iff one of the following holds:
1) nP(|X_1| > n) → 0 and ∫_{[−n,n]} x dF → μ
2) the characteristic function φ(t) of the X_i is differentiable at t = 0 and φ'(0) = iμ
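A minimal numpy illustration of the law of large numbers: running means of iid uniform(0, 1) variables (an arbitrary choice with E(X) = 0.5) approaching the mean:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0.0, 1.0, size=100_000)              # iid with E(X) = 0.5
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)

for n in (10, 100, 10_000, 100_000):
    print(f"n = {n:>6}:  (1/n) sum X_i = {running_mean[n - 1]:.4f}")
```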
Theorem: Strong law of large numbers
X_1, X_2, ... iid. Then
(1/n) Σ_{i=1}^n X_i → μ a.s. as n → ∞, for some μ, iff E|X_1| < ∞. In this case μ = EX_1

6.2 Law of iterated logarithm

If X_1, X_2, ... are iid with mean 0 and variance 1, then
P(lim sup_{n→∞} S_n / √(2n log log n) = 1) = 1

6.3 Martingales

Definition: Martingale
S_n : n ≥ 1 is called a martingale with respect to the sequence X_n : n ≥ 1, if
(a) E|S_n| < ∞
(b) E(S_{n+1} | X_1, X_2, ..., X_n) = S_n
Lemma:
E(X_1 + X_2 | Y) = E(X_1 | Y) + E(X_2 | Y)
E(X g(Y) | Y) = g(Y) E(X | Y), g : R^n → R
E(X | h(Y)) = E(X | Y) if h : R^n → R^n is one-one
Lemma: Tower property
E[E(X | Y_1, Y_2) | Y_1] = E(X | Y_1)
Lemma:
If {B_i : 1 ≤ i ≤ n} is a partition of A, then E(X | A) = Σ_{i=1}^n E(X | B_i) P(B_i | A)
Theorem: Martingale convergence
If {S_n} is a martingale with E(S_n²) < M < ∞ for some M and all n, then there exists S such that S_n → S a.s. and in L²

6.4 Prediction and conditional expectation

Notation: ||U||_2 = √E(U²) = √⟨U, U⟩ where ⟨U, V⟩ = E(UV); ||U_n − U||_2 → 0 ⟺ U_n →^{L²} U
||U + V||_2 ≤ ||U||_2 + ||V||_2
Def.
X, Y r.v. with E(Y²) < ∞. The minimum mean square predictor of Y given X is the function Ŷ = h(X) that minimises ||Y − Ŷ||_2
Theorem:
If a (linear) space H is closed, then the minimising Ŷ ∈ H of ||Y − Ŷ||_2 exists
Projection theorem:
H is a closed linear space and Y satisfies E(Y²) < ∞. For M ∈ H:
E((Y − M)Z) = 0 for all Z ∈ H ⟺ ||Y − M||_2 ≤ ||Y − Z||_2 for all Z ∈ H
Theorem:
Let X and Y be r.v. with E(Y²) < ∞. The best predictor of Y given X is the conditional expectation E(Y | X).

7 Stochastic processes

Definition: Renewal process
N = {N(t) : t ≥ 0} is a process for which N(t) = max{n : T_n ≤ t}, where T_0 = 0, T_n = X_1 + X_2 + ... + X_n for n ≥ 1, and the X_n are iid non-negative r.v.s

8 Stationary processes

Definition:
The autocovariance function is c(t, t + h) = Cov(X(t), X(t + h))
Definition: The process X = {X(t) : t ≥ 0} taking real values is called strongly stationary if the families {X(t_1), X(t_2), ..., X(t_n)} and {X(t_1 + h), X(t_2 + h), ..., X(t_n + h)} have the same joint distribution for all t_1, t_2, ..., t_n and h > 0
Definition: The process X = {X(t) : t ≥ 0} taking real values is called weakly stationary if, for all t_1, t_2 and h > 0: E(X(t_1)) = E(X(t_2)) and Cov(X(t_1), X(t_2)) = Cov(X(t_1 + h), X(t_2 + h)); thus if and only if it has constant mean and its autocovariance function satisfies c(t, t + h) = c(0, h)
Definition:
The covariance of complex-valued C_1 and C_2 is Cov(C_1, C_2) = E[(C_1 − EC_1)(C_2 − EC_2)*], where * denotes complex conjugation
Theorem:
{X} real, stationary with zero mean and autocovariance c(m).
The best predictor from the class of linear functions of the subsequence {X}_{r−s}^r is
X̂_{r+k} = Σ_{i=0}^s a_i X_{r−i}, where Σ_{i=0}^s a_i c(|i − j|) = c(k + j) for 0 ≤ j ≤ s
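A numpy sketch of the linear-prediction equations just stated: assuming an autocovariance of AR(1) form, c(m) = φ^{|m|} (this choice is an assumption for the example, not from the sheet), solve Σ_i a_i c(|i − j|) = c(k + j), 0 ≤ j ≤ s, for the coefficients of X̂_{r+k}:

```python
import numpy as np

phi = 0.8
c = lambda m: phi ** abs(m)       # assumed autocovariance c(m) = phi^|m| (so c(0) = 1)
s, k = 3, 1                       # predict one step ahead from X_r, ..., X_{r-3}

# System: sum_i a_i c(|i - j|) = c(k + j) for j = 0, ..., s
C = np.array([[c(abs(i - j)) for i in range(s + 1)] for j in range(s + 1)])
rhs = np.array([c(k + j) for j in range(s + 1)])
a = np.linalg.solve(C, rhs)

print("a =", a.round(4))          # for this c(m): a ~ [phi, 0, 0, 0], i.e. predictor phi * X_r
```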
Definition:
The autocorrelation function of a weakly stationary process with autocovariance function c(t) is
ρ(t) = Cov(X(0), X(t)) / √(Var(X(0)) Var(X(t))) = c(t) / c(0)
Theorem:
Spectral theorem: For a weakly stationary process X with strictly positive σ², ρ is the characteristic function of some distribution F whenever ρ(t) is continuous at t = 0:
ρ(t) = ∫ e^{iλt} dF(λ)
F is called the spectral distribution function.
Ergodic theorem:
If X is strongly stationary and E|X_1| < ∞, there exists a r.v. Y with the same mean as the X_n such that
(1/n) Σ_{j=1}^n X_j → Y a.s. and in mean
Weakly stationary processes:
If X = {X_n : n ≥ 1} is a weakly stationary process, there exists a Y such that E(Y) = E(X_1) and (1/n) Σ_{j=1}^n X_j →^{m.s.} Y

8.1 Gaussian processes

Definition:
A real valued continuous-time process is called Gaussian if each finite-dimensional vector (X(t_1), X(t_2), ..., X(t_n)) has the multivariate normal distribution N(μ(t), V(t)), t = (t_1, t_2, ..., t_n)
Theorem:
The Gaussian process X is stationary iff E(X(t)) is constant for all t and V(t) = V(t + h) for all t and h > 0

9 Inequalities

Cauchy-Schwarz:
(E(XY))² ≤ E(X²) E(Y²), with equality if and only if P(aX + bY = 0) = 1 for some a, b not both zero
Jensen's inequality:
Given a convex function J(x) and a r.v. X with mean μ: E(J(X)) ≥ J(μ)
Markov's inequality
P(|X| ≥ a) ≤ E|X| / a for any a > 0
For non-negative h : R → [0, ∞): P(h(X) ≥ a) ≤ E(h(X)) / a for a > 0
General Markov's inequality:
For h : R → [0, M] non-negative and bounded by some M:
P(h(X) ≥ a) ≥ (E(h(X)) − a) / (M − a), 0 ≤ a < M
Chebyshev's inequality:
P(|X| ≥ a) ≤ E(X²) / a² if a > 0
Theorem: Hölder's inequality
If p, q > 1 and p^{−1} + q^{−1} = 1 then E|XY| ≤ (E|X^p|)^{1/p} (E|Y^q|)^{1/q}
Minkowski's inequality:
If p ≥ 1 then [E(|X + Y|^p)]^{1/p} ≤ (E|X^p|)^{1/p} + (E|Y^p|)^{1/p}
Minkowski 2:
E(|X + Y|^p) ≤ C_p [E|X^p| + E|Y^p|], where p > 0 and C_p = 1 for 0 < p ≤ 1, C_p = 2^{p−1} for p > 1
Kolmogorov's inequality:
Let {X_n} be independent with zero means and variances σ_1², σ_2², .... Then for ε > 0
P(max_{1≤i≤n} |X_1 + ... + X_i| > ε) ≤ (σ_1² + ... + σ_n²) / ε²
Doob-Kolmogorov's inequality:
If {S_n} is a martingale, then for any ε > 0
P(max_{1≤i≤n} |S_i| > ε) ≤ E(S_n²) / ε²
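Finally, a small numpy check of Markov's and Chebyshev's inequalities on an exponential sample (the threshold a = 3 is an arbitrary choice for the example):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(scale=1.0, size=200_000)   # Exp(1): E|X| = 1, E(X^2) = 2
a = 3.0

p_tail = np.mean(np.abs(x) >= a)               # P(|X| >= a), about exp(-3) ~ 0.05
print("P(|X| >= a) = %.4f" % p_tail)
print("Markov bound    E|X|/a     = %.4f" % (np.abs(x).mean() / a))
print("Chebyshev bound E(X^2)/a^2 = %.4f" % (np.mean(x**2) / a**2))
```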
