
1 Sets

(A ∪ B) ∪ C = A ∪ (B ∪ C)
(A ∪ B)^c = A^c ∩ B^c

Definition: σ-field
A collection F of subsets of Ω is a σ-field if:
(a) ∅ ∈ F
(b) if A_1, A_2, ... ∈ F then ∪_{i=1}^∞ A_i ∈ F
(c) if A ∈ F then A^c ∈ F
2 Probability

P(A^c) = 1 − P(A)
If B ⊇ A then P(B) = P(A) + P(B \ A) ≥ P(A)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
More generally:
P(∪_{i=1}^n A_i) = Σ_i P(A_i) − Σ_{i<j} P(A_i ∩ A_j) + Σ_{i<j<k} P(A_i ∩ A_j ∩ A_k) − ... + (−1)^{n+1} P(A_1 ∩ A_2 ∩ ... ∩ A_n)
Lemma 5, p. 7: Let A_1 ⊆ A_2 ⊆ ..., and write A for their limit:
A = ∪_{i=1}^∞ A_i = lim_{i→∞} A_i, then P(A) = lim_{i→∞} P(A_i)
Similarly, if B_1 ⊇ B_2 ⊇ B_3 ⊇ ..., then B = ∩_{i=1}^∞ B_i = lim_{i→∞} B_i satisfies P(B) = lim_{i→∞} P(B_i)
Multiplication rule
P(A ∩ B) = P(A) P(B | A)
Conditional probability
P(A | B) = P(A ∩ B) / P(B)
P(A | B, C, ...) = P(A, B, C, ...) / P(B, C, ...)
Bayes formula
P(A | B) = P(B | A) P(A) / P(B)
Total probability
P(A) = P(A | B) P(B) + P(A | B^c) P(B^c)
P(A) = Σ_{i=1}^n P(A | B_i) P(B_i)
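As an illustration of Bayes formula together with the total probability rule, here is a minimal Python sketch; the prevalence, sensitivity and false-positive numbers are made-up values for the example and are not part of the sheet.

```python
# Hypothetical numbers: P(B) = prevalence, P(A|B) = sensitivity, P(A|B^c) = false-positive rate.
p_B = 0.01
p_A_given_B = 0.95
p_A_given_Bc = 0.05

# Total probability: P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)
p_A = p_A_given_B * p_B + p_A_given_Bc * (1 - p_B)

# Bayes formula: P(B|A) = P(A|B)P(B) / P(A)
p_B_given_A = p_A_given_B * p_B / p_A
print(f"P(A) = {p_A:.4f}, P(B|A) = {p_B_given_A:.4f}")   # about 0.0590 and 0.1610
```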
Definition 1, p. 13:
A family {A_i : i ∈ I} is independent if:
P(∩_{i∈J} A_i) = Π_{i∈J} P(A_i) for all finite subsets J of I

3 Random Variable

Distribution function
F : R → [0, 1] : F(x) = P(X ≤ x)
Mass function
f(x) = P(X = x)
Lemma 11, p. 30:
Let F be the distribution function of X, then
(a) P(X > x) = 1 − F(x)
(b) P(x < X ≤ y) = F(y) − F(x)
(c) P(X = x) = F(x) − lim_{y↑x} F(y)
Marginal distribution:
lim_{y→∞} F_{X,Y}(x, y) = F_X(x), lim_{x→∞} F_{X,Y}(x, y) = F_Y(y)
Lemma 5, p. 39:
The joint distribution function F_{X,Y} of the random vector (X, Y) has the following properties:
lim_{x,y→−∞} F_{X,Y}(x, y) = 0 and lim_{x,y→∞} F_{X,Y}(x, y) = 1
if (x_1, y_1) ≤ (x_2, y_2) then F_{X,Y}(x_1, y_1) ≤ F_{X,Y}(x_2, y_2)
F_{X,Y} is continuous from above, in that F_{X,Y}(x + u, y + v) → F_{X,Y}(x, y) as u, v ↓ 0
Theorem:
If X and Y are independent and g, h : R → R, then g(X) and h(Y) are independent too.
Definition:
The expectation of the random variable X is:
E(X) = Σ_{x : f(x)>0} x f(x)
Lemma:
If X has mass function f and g : R → R, then:
E(g(X)) = Σ_x g(x) f(x)
Continuous counterpart: E(g(X)) = ∫_R g(x) f_X(x) dx
Definition:
If k is a positive integer, the k:th moment m_k of X is defined by m_k = E(X^k). The k:th central moment is σ_k = E((X − m_1)^k)
Theorem:
The expectation operator E satisfies:
(a) If X ≥ 0 then E(X) ≥ 0
(b) If a, b ∈ R then E(aX + bY) = aE(X) + bE(Y)
Lemma:
If X and Y are independent, then E(XY) = E(X) E(Y)
Definition:
X and Y are uncorrelated if E(XY) = E(X) E(Y)
Theorem:
For random variables X and Y
(a) Var(aX) = a² Var(X) for a ∈ R
(b) Var(X + Y) = Var(X) + Var(Y) if X and Y are uncorrelated.
Indicator function: E(I_A) = P(A)
3.1 Distribution functions

Constant variable
X(ω) = c: F(x) = 0 for x < c, F(x) = 1 for x ≥ c
Bernoulli distribution Bern(p)
A coin is tossed one time and shows heads with probability p, with X(H) = 1 and X(T) = 0
F(x) = 0 for x < 0, F(x) = 1 − p for 0 ≤ x < 1, F(x) = 1 for x ≥ 1
E(X) = p, Var(X) = p(1 − p)
Binomial distribution bin(n, p)
A coin is tossed n times and a head turns up each time with probability p. The total number of heads is described by:
f(k) = C(n, k) p^k q^{n−k}, q = 1 − p
E(X) = np, Var(X) = np(1 − p)
Poisson distribution
f(k) = (λ^k / k!) exp(−λ)
E(X) = Var(X) = λ
Geometric distribution
Independent Bernoulli trials are performed. Let W be the waiting time before the first success occurs. Then
f(k) = P(W = k) = p(1 − p)^{k−1}
E(X) = 1/p, Var(X) = (1 − p)/p²
Negative binomial distribution
Let W_r be the waiting time before the r:th success. Then
f(k) = P(W_r = k) = C(k − 1, r − 1) p^r (1 − p)^{k−r}, k = r, r + 1, ...
E(X) = r/p, Var(X) = r(1 − p)/p²
Exponential distribution:
F(x) = 1 − e^{−λx}, x ≥ 0
E(X) = 1/λ, Var(X) = 1/λ²
Normal distribution:
f(x) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²)), −∞ < x < ∞
E(X) = μ, Var(X) = σ²
Cauchy distribution:
f(x) = 1/(π(1 + x²)) (no moments!)
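The moment formulas above can be checked by simulation; a minimal numpy sketch (n, p and the sample size are arbitrary choices) comparing E(X) = np and Var(X) = np(1 − p) for the binomial case:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 0.3                        # arbitrary example parameters
x = rng.binomial(n, p, size=200_000)  # simulated bin(n, p) variables

print("mean:     simulated %.3f  theory %.3f" % (x.mean(), n * p))
print("variance: simulated %.3f  theory %.3f" % (x.var(), n * p * (1 - p)))
```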
3.2 Dependence

Joint distribution:
F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^x ∫_{−∞}^y f(u, v) dv du
Lemma:
The random variables X and Y are independent if and only if
f_{X,Y}(x, y) = f_X(x) f_Y(y) for all x, y ∈ R, equivalently
F_{X,Y}(x, y) = F_X(x) F_Y(y) for all x, y ∈ R
Marginal distribution:
F_X(x) = P(X ≤ x) = F(x, ∞) = ∫_{−∞}^x ∫_{−∞}^∞ f(u, y) dy du
Marginal densities: f_X(x) = Σ_y P({X = x} ∩ {Y = y}) = Σ_y P(X = x, Y = y) = Σ_y f_{X,Y}(x, y) in the discrete case,
f_X(x) = ∫_{−∞}^∞ f(x, y) dy in the continuous case
Lemma:
E(g(X, Y)) = Σ_{x,y} g(x, y) f_{X,Y}(x, y)
Definition:
Cov(X, Y) = E[(X − EX)(Y − EY)]

3.3 Conditional distributions

Definition:
The conditional distribution of Y given X = x is:
F_{Y|X}(y | x) = P(Y ≤ y | X = x) = ∫_{−∞}^y f(x, v) / f_X(x) dv, for x such that f_X(x) > 0
Theorem: Conditional expectation
ψ(X) = E(Y | X) satisfies E(ψ(X)) = E(Y), and more generally E(ψ(X) g(X)) = E(Y g(X))
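For the discrete formulas in 3.2 (marginals by summing the joint mass function, Cov(X, Y) = E[(X − EX)(Y − EY)], independence as factorisation), a small numpy sketch with an arbitrary 2×3 joint table chosen only for illustration:

```python
import numpy as np

# Arbitrary joint mass function f_{X,Y}; rows = values of X, columns = values of Y.
f = np.array([[0.10, 0.20, 0.10],
              [0.15, 0.25, 0.20]])
x_vals = np.array([0, 1])
y_vals = np.array([0, 1, 2])

f_X = f.sum(axis=1)          # marginal of X: sum over y
f_Y = f.sum(axis=0)          # marginal of Y: sum over x

EX = (x_vals * f_X).sum()
EY = (y_vals * f_Y).sum()
EXY = (np.outer(x_vals, y_vals) * f).sum()
cov = EXY - EX * EY          # equivalent to E[(X - EX)(Y - EY)]

independent = np.allclose(f, np.outer(f_X, f_Y))   # f = f_X f_Y for all x, y?
print("Cov(X,Y) =", round(cov, 4), "independent:", independent)
```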
3.4 Sums of random variables

Theorem:
P(X + Y = z) = Σ_x f(x, z − x)
If X and Y are independent, then
P(X + Y = z) = f_{X+Y}(z) = Σ_x f_X(x) f_Y(z − x) = Σ_y f_X(z − y) f_Y(y)

3.5 Multivariate normal distribution:

f(x) = exp(−½ (x − μ)^T V^{−1} (x − μ)) / √((2π)^n det(V))
X = (X_1, X_2, ..., X_n)^T
μ = (μ_1, ..., μ_n), E(X_i) = μ_i
V = (v_ij), v_ij = Cov(X_i, X_j)

4 Generating functions

Definition: Generating function
The generating function of the random variable X is defined by:
G(s) = E(s^X)
Example: Generating functions
Constant: G(s) = s^c
Bernoulli: G(s) = (1 − p) + ps
Geometric: G(s) = ps / (1 − s(1 − p))
Poisson: G(s) = e^{λ(s−1)}
Theorem: expectation from G(s)
(a) E(X) = G'(1)
(b) E(X(X − 1)...(X − k + 1)) = G^(k)(1)
Theorem: independence
If X and Y are independent, then G_{X+Y}(s) = G_X(s) G_Y(s)
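A short numpy sketch (λ and s chosen arbitrarily) estimating G(s) = E(s^X) by simulation and comparing with the Poisson closed form e^{λ(s−1)} and with E(X) = G'(1):

```python
import numpy as np

rng = np.random.default_rng(1)
lam, s = 2.5, 0.7                       # arbitrary example values
x = rng.poisson(lam, size=200_000)

g_mc = np.mean(s ** x)                  # Monte Carlo estimate of E(s^X)
g_exact = np.exp(lam * (s - 1))         # Poisson generating function
print("E(s^X): simulated %.4f, formula %.4f" % (g_mc, g_exact))
print("G'(1) should equal E(X) = lambda =", lam, "; sample mean =", round(x.mean(), 3))
```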
4.1 Characteristic functions

Definition: moment generating function
M(t) = E(e^{tX})
Definition: characteristic function
φ(t) = E(e^{itX})
Theorem: independence
If X and Y are independent, then φ_{X+Y}(t) = φ_X(t) φ_Y(t)
Theorem: Y = aX + b
φ_Y(t) = e^{itb} φ_X(at)
Definition: joint characteristic function
φ_{X,Y}(s, t) = E(e^{isX} e^{itY})
X and Y are independent if and only if:
φ_{X,Y}(s, t) = φ_X(s) φ_Y(t) for all s and t
Theorem: moment generating function and characteristic function
Examples of characteristic functions:
Ber(p): φ(t) = q + pe^{it}
Binomial distribution bin(n, p): φ_X(t) = (q + pe^{it})^n
Exponential distribution: φ(t) = ∫_0^∞ λ e^{itx} e^{−λx} dx
Cauchy distribution: φ(t) = (1/π) ∫ e^{itx} / (1 + x²) dx
Normal distribution N(0, 1): φ(t) = E(e^{itX}) = ∫ (1/√(2π)) exp(itx − ½x²) dx
Corollary:
Random variables X and Y have the same characteristic function if and only if they have the same distribution function.
Theorem: Law of large Numbers
Let X_1, X_2, X_3, ... be a sequence of iid r.vs with finite mean μ.
Their partial sums S_n = X_1 + X_2 + ... + X_n satisfy n^{−1} S_n → μ in distribution (and hence in probability) as n → ∞
Central Limit Theorem:
Let X_1, X_2, X_3, ... be a sequence of iid r.vs with finite mean μ and finite non-zero variance σ², and let S_n = X_1 + X_2 + ... + X_n. Then
(S_n − nμ) / √(nσ²) →^D N(0, 1) as n → ∞
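To illustrate the Central Limit Theorem above, a minimal numpy sketch: standardised sums of (arbitrarily chosen) exponential variables compared against a standard normal probability:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 300, 20_000
mu, sigma = 1.0, 1.0                         # Exp(1) has mean 1 and variance 1

s = rng.exponential(scale=1.0, size=(reps, n)).sum(axis=1)
z = (s - n * mu) / np.sqrt(n * sigma**2)     # (S_n - n mu) / sqrt(n sigma^2)

# For a standard normal, P(Z <= 1.96) is about 0.975.
print("P(Z <= 1.96) estimated:", np.mean(z <= 1.96))
```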
5 Markov chains

Definition Markov chain
P(X_n = s | X_0 = x_0, ..., X_{n−1} = x_{n−1}) = P(X_n = s | X_{n−1} = x_{n−1})
Definition homogeneous chain
P(X_{n+1} = j | X_n = i) = P(X_1 = j | X_0 = i)
Definition transition matrix
P = (p_ij) with p_ij = P(X_{n+1} = j | X_n = i)
Theorem: stochastic matrix
(a) P has non-negative entries
(b) P has row sums equal to 1
n-step transition
p_ij(m, m + n) = P(X_{m+n} = j | X_m = i)
Theorem Chapman-Kolmogorov
p_ij(m, m + n + r) = Σ_k p_ik(m, m + n) p_kj(m + n, m + n + r)
so P(m, m + n) = P^n
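The relation P(m, m + n) = P^n can be checked directly with numpy; the 3-state transition matrix below is an arbitrary example, not taken from the sheet:

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],      # arbitrary stochastic matrix
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

P5 = np.linalg.matrix_power(P, 5)   # 5-step transition probabilities p_ij(5)
print(P5.round(4))
print("row sums:", P5.sum(axis=1))  # still 1: P^n is again a stochastic matrix

mu0 = np.array([1.0, 0.0, 0.0])     # start in state 0
print("mu(5) = mu(0) P^5 =", mu0 @ P5)
```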
Definition: mass function
μ_i(n) = P(X_n = i)
μ(m + n) = μ(m) P^n, so μ(n) = μ(0) P^n
Definition: persistent, transient
persistent: P(X_n = i for some n ≥ 1 | X_0 = i) = 1
transient: P(X_n = i for some n ≥ 1 | X_0 = i) < 1
Definition: first passage time
f_ij(n) = P(X_1 ≠ j, ..., X_{n−1} ≠ j, X_n = j | X_0 = i)
f_ij := Σ_{n=1}^∞ f_ij(n)
Corollary: persistent, transient
State j is persistent if Σ_n p_jj(n) = ∞, and then Σ_n p_ij(n) = ∞ for all i
State j is transient if Σ_n p_jj(n) < ∞, and then Σ_n p_ij(n) < ∞ for all i
Theorem: Generating functions
P_ij(s) = Σ_{n=0}^∞ s^n p_ij(n), F_ij(s) = Σ_{n=0}^∞ s^n f_ij(n), then
(a) P_ii(s) = 1 + F_ii(s) P_ii(s)
(b) P_ij(s) = F_ij(s) P_jj(s) if i ≠ j
Definition: First visit time T_j
T_j := min{n ≥ 1 : X_n = j}
Definition: mean recurrence time μ_i
μ_i := E(T_i | X_0 = i) = Σ_n n f_ii(n)
Definition: null, non-null state
state i is null if μ_i = ∞
state i is non-null if μ_i < ∞
Theorem: nullness of a persistent state
A persistent state is null if and only if p_ii(n) → 0 as n → ∞
Definition: period d(i)
The period d(i) of a state i is defined by d(i) = gcd{n : p_ii(n) > 0}. We call i periodic if d(i) > 1 and aperiodic if d(i) = 1
Definition: Ergodic
A state is called ergodic if it is persistent, non-null, and aperiodic.
Definition: (Inter-)communication
i → j if p_ij(m) > 0 for some m
i ↔ j if i → j and j → i
Theorem: intercommunication
If i ↔ j then:
(a) i and j have the same period
(b) i is transient iff j is transient
(c) i is null persistent iff j is null persistent
Definition: closed, irreducible
A set C of states is called:
(a) closed if p_ij = 0 for all i ∈ C, j ∉ C
(b) irreducible if i ↔ j for all i, j ∈ C
An absorbing state is a closed set with one state.
Theorem: Decomposition
The state space S can be partitioned as S = T ∪ C_1 ∪ C_2 ∪ ..., where T is the set of transient states and the C_i are irreducible, closed sets of persistent states
Lemma: finite S
If S is finite, then at least one state is persistent and all persistent states are non-null.

5.1 Stationary distributions

Definition: stationary distribution
π is called a stationary distribution if
(a) π_j ≥ 0 for all j, Σ_j π_j = 1
(b) π = πP, so π_j = Σ_i π_i p_ij for all j
Theorem: existence of stationary distribution
An irreducible chain has a stationary distribution π iff all states are non-null persistent.
Then π is unique and given by π_i = 1/μ_i
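A small numpy sketch that computes the stationary distribution of an arbitrary 3-state chain by solving π = πP together with Σ_j π_j = 1:

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],      # arbitrary stochastic matrix
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
n = P.shape[0]

# Solve pi (P - I) = 0 together with the normalisation sum(pi) = 1:
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.append(np.zeros(n), 1.0)
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

print("pi   =", pi.round(4))
print("pi P =", (pi @ P).round(4))   # equals pi: stationary
```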
Lemma: ρ_i(k)
ρ_i(k): the mean number of visits of the chain to the state i between two successive visits to state k.
Lemma: For any state k of an irreducible persistent chain, the vector ρ(k) satisfies ρ_i(k) < ∞ for all i, and ρ(k) = ρ(k)P
Theorem: irreducible, persistent
If the chain is irreducible and persistent, there exists a positive x with x = xP, which is unique up to a multiplicative constant. The chain is non-null if Σ_i x_i < ∞ and null if Σ_i x_i = ∞
Theorem: transient chain
Let s be any state of an irreducible chain. The chain is transient iff there exists a non-zero solution {y_j : j ≠ s}, with |y_j| ≤ 1 for all j, to the equation:
y_i = Σ_{j : j≠s} p_ij y_j, i ≠ s
Theorem: persistent chain
Let s be any state of an irreducible chain on S = {0, 1, 2, ...}. The chain is persistent if there exists a solution {y_j : j ≠ s} to the inequalities
y_i ≥ Σ_{j : j≠s} p_ij y_j, i ≠ s,
such that y_i → ∞ as i → ∞
Theorem: Limit theorem
For an irreducible aperiodic chain we have that
p_ij(n) → 1/μ_j as n → ∞ for all i and j

5.2 Reversibility

Theorem: Inverse chain
Y with Y_n = X_{N−n} is a Markov chain with P(Y_{n+1} = j | Y_n = i) = (π_j/π_i) p_ji
Definition: Reversible chain
A chain is called reversible if π_i p_ij = π_j p_ji
Theorem: reversible ⇒ stationary
If there is a π with π_i p_ij = π_j p_ji, then π is the stationary distribution of the chain.

5.3 Poisson process

Definition: Poisson process
N(t) gives the number of events up to time t.
N(t) is a Poisson process on S = {0, 1, 2, ...} if
(a) N(0) = 0; if s < t then N(s) ≤ N(t)
(b) P(N(t + h) = n + m | N(t) = n) = λh + o(h) if m = 1, o(h) if m > 1, and 1 − λh + o(h) if m = 0
(c) the number of emissions in an interval is independent of the emissions in earlier intervals.
Theorem: Poisson distribution
N(t) has the Poisson distribution:
P(N(t) = j) = ((λt)^j / j!) e^{−λt}
Definition: arrival time, interarrival time
Arrival time: T_n = inf{t : N(t) = n}
Interarrival time: X_n = T_n − T_{n−1}
Theorem: Interarrival times
X_1, X_2, ... are independent, each having the exponential distribution with parameter λ
Birth process
Poisson process with intensities λ_0, λ_1, ...
Eq. Forward system of equations:
p'_ij(t) = λ_{j−1} p_{i,j−1}(t) − λ_j p_ij(t), j ≥ i, with λ_{−1} = 0 and p_ij(0) = δ_ij
Eq. Backward system of equations:
p'_ij(t) = λ_i p_{i+1,j}(t) − λ_i p_ij(t), j ≥ i, with p_ij(0) = δ_ij
Theorem:
The forward system has a unique solution, which satisfies the backward equations.
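A numpy sketch of the interarrival-time characterisation above: build a Poisson process from iid exponential interarrival times (rate and horizon are arbitrary choices) and check that E N(t) and Var N(t) are both close to λt:

```python
import numpy as np

rng = np.random.default_rng(3)
lam, t, reps = 2.0, 5.0, 20_000

counts = np.empty(reps, dtype=int)
for r in range(reps):
    arrivals = np.cumsum(rng.exponential(1 / lam, size=40))  # 40 >> lam*t events suffice here
    counts[r] = np.searchsorted(arrivals, t)                 # N(t) = number of arrivals <= t

print("E N(t):   simulated %.3f, theory lambda*t = %.3f" % (counts.mean(), lam * t))
print("Var N(t): simulated %.3f, theory %.3f" % (counts.var(), lam * t))
```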
5.4 Continuous Markov chains

Definition: Continuous Markov chain
X is a continuous-time Markov chain if:
P(X(t_n) = j | X(t_1) = i_1, ..., X(t_{n−1}) = i_{n−1}) = P(X(t_n) = j | X(t_{n−1}) = i_{n−1})
Definition: transition probability
p_ij(s, t) = P(X(t) = j | X(s) = i) for s ≤ t
homogeneous if p_ij(s, t) = p_ij(0, t − s)
Def: Generator matrix
G = (g_ij), with p_ij(h) = g_ij h + o(h) if i ≠ j and p_ii(h) = 1 + g_ii h + o(h)
Eq. Forward system of equations: P'_t = P_t G
Eq. Backward system of equations: P'_t = G P_t
Often the solutions are of the form P_t = exp(tG)
Matrix exponential: exp(tG) = Σ_{n=0}^∞ (t^n / n!) G^n
Irreducible if for any pair i, j: p_ij(t) > 0 for some t
Definition: Stationary
π with π_j ≥ 0, Σ_j π_j = 1 and π = π P_t for all t ≥ 0
Claim: π = π P_t for all t ⟺ πG = 0
Theorem:
For an irreducible chain: if a stationary distribution π exists, then p_ij(t) → π_j as t → ∞ for all i, j; if not, p_ij(t) → 0
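A sketch of P_t = exp(tG) using scipy's matrix exponential, with an arbitrary two-state generator chosen for the example; it also checks the claim that the stationary π satisfies πG = 0:

```python
import numpy as np
from scipy.linalg import expm

G = np.array([[-1.0,  1.0],    # arbitrary generator: rows sum to 0, off-diagonals >= 0
              [ 2.0, -2.0]])

Pt = expm(0.5 * G)             # transition matrix P_t for t = 0.5
print(Pt.round(4), "row sums:", Pt.sum(axis=1))

pi = np.array([2/3, 1/3])      # stationary distribution of this particular G
print("pi G   =", pi @ G)               # approximately [0, 0]
print("pi P_t =", (pi @ Pt).round(4))   # equals pi
```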
6 Convergence of Random Variables

Norm
(a) ||f|| ≥ 0
(b) ||f|| = 0 iff f = 0
(c) ||af|| = |a| ||f||
(d) ||f + g|| ≤ ||f|| + ||g||
Convergent almost surely
X_n →^{a.s.} X if P({ω : X_n(ω) → X(ω) as n → ∞}) = 1
Convergent in r:th mean
X_n →^r X if E|X_n^r| < ∞ and E(|X_n − X|^r) → 0 as n → ∞
Definition: convergent in probability
X_n →^P X as n → ∞ if P(|X_n − X| > ε) → 0 as n → ∞
Convergent in distribution
X_n →^D X if P(X_n ≤ x) → P(X ≤ x) as n → ∞ at every point x where F_X is continuous
Theorem: implications
(X_n →^{a.s.} X) or (X_n →^r X) ⟹ (X_n →^P X) ⟹ (X_n →^D X)
For r > s ≥ 1: (X_n →^r X) ⟹ (X_n →^s X)
Theorem: additional implications
(a) If X_n →^D c, where c is constant, then X_n →^P c
(b) If X_n →^P X and P(|X_n| ≤ k) = 1 for all n and some k, then X_n →^r X for all r ≥ 1
(c) If P_n(ε) = P(|X_n − X| > ε) satisfies Σ_n P_n(ε) < ∞ for all ε > 0, then X_n →^{a.s.} X
Theorem: Skorokhod's representation theorem
If X_n →^D X as n → ∞, then there exists a probability space and random variables Y_n, Y with:
(a) Y_n and Y have distributions F_n and F
(b) Y_n →^{a.s.} Y as n → ∞
Theorem: Convergence of functions
If X_n →^D X and g : R → R is continuous, then g(X_n) →^D g(X)
Theorem: Equivalence
The following statements are equivalent:
(a) X_n →^D X
(b) E(g(X_n)) → E(g(X)) for all bounded continuous functions g
(c) E(g(X_n)) → E(g(X)) for all functions g of the form g(x) = f(x) I_{[a,b]}(x) where f is continuous
Theorem: Borel-Cantelli
Let A = ∩_n ∪_{m≥n} A_m be the event that infinitely many of the A_n occur. Then:
(a) P(A) = 0 if Σ_n P(A_n) < ∞
(b) P(A) = 1 if Σ_n P(A_n) = ∞ and A_1, A_2, ... are independent.
Theorem:
X_n → X and Y_n → Y implies X_n + Y_n → X + Y for convergence a.s., in r:th mean and in probability. Not generally true in distribution.

6.1 Laws of large numbers

Theorem:
X_1, X_2, ... iid with E(X_i²) < ∞. Then
(1/n) Σ_{i=1}^n X_i → E(X) a.s. and in mean square
Theorem:
{X_n} iid with distribution function F. Then (1/n) Σ_{i=1}^n X_i →^P μ iff one of the following holds:
1) nP(|X_1| > n) → 0 and ∫_{[−n,n]} x dF → μ
2) the characteristic function φ(t) of the X_i is differentiable at t = 0 and φ'(0) = iμ
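A minimal numpy illustration of the law of large numbers: running means of iid uniform(0, 1) variables (an arbitrary choice with E(X) = 0.5) approaching the mean:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0.0, 1.0, size=100_000)              # iid with E(X) = 0.5
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)

for n in (10, 100, 10_000, 100_000):
    print(f"n = {n:>6}:  (1/n) sum X_i = {running_mean[n - 1]:.4f}")
```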
Theorem: Strong law of large numbers
X_1, X_2, ... iid. Then
(1/n) Σ_{i=1}^n X_i → μ a.s. as n → ∞, for some μ, iff E|X_1| < ∞. In this case μ = EX_1

6.2 Law of iterated logarithm

If X_1, X_2, ... are iid with mean 0 and variance 1, then
P(lim sup_{n→∞} S_n / √(2n log log n) = 1) = 1

6.3 Martingales

Definition: Martingale
S_n : n ≥ 1 is called a martingale with respect to the sequence X_n : n ≥ 1, if
(a) E|S_n| < ∞
(b) E(S_{n+1} | X_1, X_2, ..., X_n) = S_n
Lemma:
E(X_1 + X_2 | Y) = E(X_1 | Y) + E(X_2 | Y)
E(X g(Y) | Y) = g(Y) E(X | Y), g : R^n → R
E(X | h(Y)) = E(X | Y) if h : R^n → R^n is one-one
Lemma: Tower property
E[E(X | Y_1, Y_2) | Y_1] = E(X | Y_1)
Lemma:
If {B_i : 1 ≤ i ≤ n} is a partition of A, then E(X | A) = Σ_{i=1}^n E(X | B_i) P(B_i | A)
Theorem: Martingale convergence
If {S_n} is a martingale with E(S_n²) < M < ∞ for some M and all n, then there exists S such that S_n → S a.s. and in L²

6.4 Prediction and conditional expectation

Notation: ||U||_2 = √E(U²) = √⟨U, U⟩ where ⟨U, V⟩ = E(UV); ||U_n − U||_2 → 0 ⟺ U_n →^{L²} U
||U + V||_2 ≤ ||U||_2 + ||V||_2
Def.
X, Y r.v. with E(Y²) < ∞. The minimum mean square predictor of Y given X is the function Ŷ = h(X) that minimises ||Y − Ŷ||_2
Theorem:
If a (linear) space H is closed, then the minimising Ŷ ∈ H of ||Y − Ŷ||_2 exists
Projection theorem:
H is a closed linear space and Y satisfies E(Y²) < ∞. For M ∈ H:
E((Y − M)Z) = 0 for all Z ∈ H ⟺ ||Y − M||_2 ≤ ||Y − Z||_2 for all Z ∈ H
Theorem:
Let X and Y be r.v. with E(Y²) < ∞. The best predictor of Y given X is the conditional expectation E(Y | X).

7 Stochastic processes

Definition: Renewal process
N = {N(t) : t ≥ 0} is a process for which N(t) = max{n : T_n ≤ t}, where T_0 = 0, T_n = X_1 + X_2 + ... + X_n for n ≥ 1, and the X_n are iid non-negative r.v.s

8 Stationary processes

Definition:
The autocovariance function is c(t, t + h) = Cov(X(t), X(t + h))
Definition: The process X = {X(t) : t ≥ 0} taking real values is called strongly stationary if the families {X(t_1), X(t_2), ..., X(t_n)} and {X(t_1 + h), X(t_2 + h), ..., X(t_n + h)} have the same joint distribution for all t_1, t_2, ..., t_n and h > 0
Definition: The process X = {X(t) : t ≥ 0} taking real values is called weakly stationary if, for all t_1, t_2 and h > 0: E(X(t_1)) = E(X(t_2)) and Cov(X(t_1), X(t_2)) = Cov(X(t_1 + h), X(t_2 + h)); thus if and only if it has constant mean and its autocovariance function satisfies c(t, t + h) = c(0, h)
Definition:
The covariance of complex-valued C_1 and C_2 is Cov(C_1, C_2) = E[(C_1 − EC_1)(C_2 − EC_2)*], where * denotes complex conjugation
Theorem:
{X} real, stationary with zero mean and autocovariance c(m).
The best predictor from the class of linear functions of the subsequence {X}_{r−s}^r is
X̂_{r+k} = Σ_{i=0}^s a_i X_{r−i}, where Σ_{i=0}^s a_i c(|i − j|) = c(k + j) for 0 ≤ j ≤ s
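A numpy sketch of the linear-prediction equations just stated: assuming an autocovariance of AR(1) form, c(m) = φ^{|m|} (this choice is an assumption for the example, not from the sheet), solve Σ_i a_i c(|i − j|) = c(k + j), 0 ≤ j ≤ s, for the coefficients of X̂_{r+k}:

```python
import numpy as np

phi = 0.8
c = lambda m: phi ** abs(m)       # assumed autocovariance c(m) = phi^|m| (so c(0) = 1)
s, k = 3, 1                       # predict one step ahead from X_r, ..., X_{r-3}

# System: sum_i a_i c(|i - j|) = c(k + j) for j = 0, ..., s
C = np.array([[c(abs(i - j)) for i in range(s + 1)] for j in range(s + 1)])
rhs = np.array([c(k + j) for j in range(s + 1)])
a = np.linalg.solve(C, rhs)

print("a =", a.round(4))          # for this c(m): a ~ [phi, 0, 0, 0], i.e. predictor phi * X_r
```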
Definition:
The autocorrelation function of a weakly stationary process with autocovariance function c(t) is
ρ(t) = Cov(X(0), X(t)) / √(Var(X(0)) Var(X(t))) = c(t) / c(0)
Theorem:
Spectral theorem: For a weakly stationary process X with strictly positive σ², ρ is the characteristic function of some distribution F whenever ρ(t) is continuous at t = 0:
ρ(t) = ∫ e^{iλt} dF(λ)
F is called the spectral distribution function.
Ergodic theorem:
If X is strongly stationary and E|X_1| < ∞, there exists a r.v. Y with the same mean as the X_n such that
(1/n) Σ_{j=1}^n X_j → Y a.s. and in mean
Weakly stationary processes:
If X = {X_n : n ≥ 1} is a weakly stationary process, there exists a Y such that E(Y) = E(X_1) and (1/n) Σ_{j=1}^n X_j →^{m.s.} Y

8.1 Gaussian processes

Definition:
A real valued continuous-time process is called Gaussian if each finite-dimensional vector (X(t_1), X(t_2), ..., X(t_n)) has the multivariate normal distribution N(μ(t), V(t)), t = (t_1, t_2, ..., t_n)
Theorem:
The Gaussian process X is stationary iff E(X(t)) is constant for all t and V(t) = V(t + h) for all t and h > 0

9 Inequalities

Cauchy-Schwarz:
(E(XY))² ≤ E(X²) E(Y²), with equality if and only if P(aX + bY = 0) = 1 for some a, b not both zero
Jensen's inequality:
Given a convex function J(x) and a r.v. X with mean μ: E(J(X)) ≥ J(μ)
Markov's inequality
P(|X| ≥ a) ≤ E|X| / a for any a > 0
For non-negative h : R → [0, ∞): P(h(X) ≥ a) ≤ E(h(X)) / a for a > 0
General Markov's inequality:
For h : R → [0, M] non-negative and bounded by some M:
P(h(X) ≥ a) ≥ (E(h(X)) − a) / (M − a), 0 ≤ a < M
Chebyshev's inequality:
P(|X| ≥ a) ≤ E(X²) / a² if a > 0
Theorem: Hölder's inequality
If p, q > 1 and p^{−1} + q^{−1} = 1 then E|XY| ≤ (E|X^p|)^{1/p} (E|Y^q|)^{1/q}
Minkowski's inequality:
If p ≥ 1 then [E(|X + Y|^p)]^{1/p} ≤ (E|X^p|)^{1/p} + (E|Y^p|)^{1/p}
Minkowski 2:
E(|X + Y|^p) ≤ C_p [E|X^p| + E|Y^p|], where p > 0 and C_p = 1 for 0 < p ≤ 1, C_p = 2^{p−1} for p > 1
Kolmogorov's inequality:
Let {X_n} be independent with zero means and variances σ_1², σ_2², .... Then for ε > 0
P(max_{1≤i≤n} |X_1 + ... + X_i| > ε) ≤ (σ_1² + ... + σ_n²) / ε²
Doob-Kolmogorov's inequality:
If {S_n} is a martingale, then for any ε > 0
P(max_{1≤i≤n} |S_i| > ε) ≤ E(S_n²) / ε²
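Finally, a small numpy check of Markov's and Chebyshev's inequalities on an exponential sample (the threshold a = 3 is an arbitrary choice for the example):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(scale=1.0, size=200_000)   # Exp(1): E|X| = 1, E(X^2) = 2
a = 3.0

p_tail = np.mean(np.abs(x) >= a)               # P(|X| >= a), about exp(-3) ~ 0.05
print("P(|X| >= a) = %.4f" % p_tail)
print("Markov bound    E|X|/a     = %.4f" % (np.abs(x).mean() / a))
print("Chebyshev bound E(X^2)/a^2 = %.4f" % (np.mean(x**2) / a**2))
```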
