

Gaussian process
Gaussian process $X_t$: for any choice of $t_1, \dots, t_k$, $X = (X_{t_1}, \dots, X_{t_k})^T$ is a Gaussian random vector.
A Gaussian random process is fully characterized by its 1st and 2nd moments, i.e., by $m_X(t)$ and $R_X(t, s)$.
Any linear or affine transformation of a Gaussian random process is Gaussian, e.g., integration, differentiation, and stable linear filtering.
If samples of a Gaussian random process are uncorrelated, they are independent.
If a Gaussian random process is wss, it is sss.
A stationary Gaussian process with an arbitrary acf or psd can be obtained by filtering white Gaussian noise.
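As a quick numerical illustration of the last point, here is a minimal sketch (not from the lecture): white Gaussian noise is shaped by an FIR filter, and the estimated psd of the output is compared with $|H(f)|^2$. The filter taps, sample sizes, and variable names are illustrative assumptions.

```python
import numpy as np
from scipy.signal import lfilter, welch, freqz

# Minimal sketch: a stationary Gaussian process with a prescribed psd is
# obtained by passing white Gaussian noise through a stable filter.
# The taps `b` and the lengths below are illustrative choices.
rng = np.random.default_rng(1)
b = [1.0, 0.5, 0.25]                      # FIR shaping filter
w = rng.standard_normal(500_000)          # white Gaussian noise, S_W(f) = 1
x = lfilter(b, [1.0], w)                  # Gaussian, S_X(f) = |B(e^{j2pi f})|^2

f, s_est = welch(x, fs=1.0, nperseg=4096)  # one-sided psd estimate
_, h = freqz(b, worN=f, fs=1.0)            # shaping-filter response at f
# The one-sided density doubles the two-sided psd away from f = 0 and 1/2.
print(round(np.mean(s_est[1:-1] / (2 * np.abs(h[1:-1])**2)), 2))  # ~1.0
```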

Jointly Gaussian processes $X_t$ and $Y_t$: for any choice of $t_1, \dots, t_k$ and $s_1, \dots, s_l$, $(X_{t_1}, \dots, X_{t_k}, Y_{s_1}, \dots, Y_{s_l})^T$ is a Gaussian random vector.
Jointly Gaussian random processes are fully characterized by their 1st and 2nd moments, i.e., by $m_X(t)$, $m_Y(t)$, $R_X(t, s)$, $R_Y(t, s)$, and $R_{XY}(t, s)$.
Any linear or affine transformation of jointly Gaussian random processes is Gaussian.
If two jointly Gaussian random processes are uncorrelated, they are independent.
If jointly Gaussian random processes are jwss, they are jsss.

White noise
white noise $X_t$: a wss process with $S_X(f) = N_0/2$ watts/Hz
$m_X(t) = 0$, $R_X(\tau) = \frac{N_0}{2}\,\delta(\tau)$
For every $\tau > 0$, $X_t$ and $X_{t+\tau}$ are uncorrelated.
$P_X = E X_t^2 = \int \frac{N_0}{2}\,df = \infty$: infinite average power
If $X_t$ is Gaussian, then for every $\tau > 0$, $X_t$ and $X_{t+\tau}$ are independent.
Thermal noise: Gaussian, $S_X(f) = \frac{N_0}{2}\,\frac{|f|/f_0}{\exp(|f|/f_0) - 1}$,
$N_0 = kT = 3.77 \times 10^{-21}$, $k$: Boltzmann constant,
$f_0 = kT/h = 5.69$ THz, $h$: Planck constant.
$X_t$ and $X_{t+\tau}$ are effectively independent if $\tau$ is larger than a few picoseconds.
discrete-time white noise: uncorrelated identical rvs $\neq$ samples of continuous-time white noise

Matched filter
We discuss continuous-time cases; discrete-time cases are similar.
Consider detecting a deterministic signal $v(t)$ in wss random noise $X_t$ with psd $S_X(f)$:
$v(t) + X_t \;\to\; h(t) \;\to\; w(t) + Y_t$, sampled at $t_0$ and compared with a threshold to decide $H_1$ or $H_0$.
$w(t) + Y_t = \int h(t - \tau)\,(v(\tau) + X_\tau)\,d\tau$, where $w(t) = \int h(t - \tau)v(\tau)\,d\tau$ and $Y_t = \int h(t - \tau)X_\tau\,d\tau$.
goal: Find $h(t)$ that maximizes the signal-to-noise ratio (SNR) at $t_0$: $R = |w(t_0)|^2 / E Y_{t_0}^2$.

$H(f)$, $V(f)$, $W(f)$: Fourier transforms of $h(t)$, $v(t)$, $w(t)$
$|w(t_0)|^2 = \left|\int W(f)\,e^{j2\pi f t_0}\,df\right|^2 = \left|\int H(f)V(f)\,e^{j2\pi f t_0}\,df\right|^2$
$= \left|\int H(f)\sqrt{S_X(f)}\;\frac{V(f)\,e^{j2\pi f t_0}}{\sqrt{S_X(f)}}\,df\right|^2$
$\le \int |H(f)|^2 S_X(f)\,df \;\int \frac{|V(f)|^2}{S_X(f)}\,df$
[Schwarz ineq: $\left|\int x_t y_t\,dt\right|^2 \le \int |x_t|^2\,dt \int |y_t|^2\,dt$]
$= E Y_{t_0}^2 \int \frac{|V(f)|^2}{S_X(f)}\,df$, since $E Y_{t_0}^2 = \int S_Y(f)\,df = \int |H(f)|^2 S_X(f)\,df$.

$R = \frac{|w(t_0)|^2}{E Y_{t_0}^2} \le \int \frac{|V(f)|^2}{S_X(f)}\,df$
Equality holds if and only if $H(f)\sqrt{S_X(f)} = a\,\frac{V^*(f)\,e^{-j2\pi f t_0}}{\sqrt{S_X(f)}}$, or $H(f) = a\,\frac{V^*(f)\,e^{-j2\pi f t_0}}{S_X(f)}$ for some constant $a$: the matched filter, matched to the input signal.
It emphasizes the frequency band where the signal exceeds the noise, while it suppresses the band where the noise exceeds the signal.
$S_Y(f) = |H(f)|^2 S_X(f) = \frac{|a\,V(f)|^2}{S_X(f)}$
$R_Y(\tau) = \int \frac{|a\,V(f)|^2}{S_X(f)}\,e^{j2\pi f \tau}\,df$
With the matched filter,
$w(t) = \int W(f)\,e^{j2\pi f t}\,df = \int H(f)V(f)\,e^{j2\pi f t}\,df = \int a\,\frac{V^*(f)\,e^{-j2\pi f t_0}}{S_X(f)}\,V(f)\,e^{j2\pi f t}\,df$
$= a\int \frac{|V(f)|^2}{S_X(f)}\,e^{j2\pi f (t - t_0)}\,df = a^{-1} R_Y(t - t_0)$
$w(t_0) = a^{-1} R_Y(0) = a^{-1} E Y_t^2$: maximum at $t_0$
$R = \frac{E Y_t^2}{|a|^2}$
When $X_t$ is Gaussian and the decision is made by thresholding the output, the matched filter also minimizes the probability of decision error with a proper threshold level.

If $X_t$ is white with psd $S_X(f) = N_0/2$,
$H(f) = a\,\frac{V^*(f)\,e^{-j2\pi f t_0}}{S_X(f)} = \frac{2a}{N_0}\,V^*(f)\,e^{-j2\pi f t_0}$
$h(t) = \frac{2a}{N_0}\,v(t_0 - t)$: a time-reversed and shifted copy of the signal.
[figure: the signal $v(t)$ and the matched-filter impulse response $h(t)$]
$w(t_0) + Y_{t_0} = \int h(t_0 - \tau)\,(v(\tau) + X_\tau)\,d\tau = \frac{2a}{N_0}\int v(\tau)\,(v(\tau) + X_\tau)\,d\tau$: time correlation
$= \frac{2a}{N_0}\left(\int v^2(\tau)\,d\tau + \int v(\tau)X_\tau\,d\tau\right)$
This shows that the signal energy is fully utilized.
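A minimal discrete-time sketch of this white-noise case (not from the lecture): the matched filter reduces to correlating the received waveform with the known pulse, and the output peaks where the pulse sits. The pulse shape, noise level, and names are assumptions made for the example.

```python
import numpy as np

# Minimal discrete-time sketch of a matched filter in white Gaussian noise:
# h[n] = v[n0 - n], so filtering reduces to correlating with the known pulse.
# The pulse shape, noise level, and lengths are illustrative choices.
rng = np.random.default_rng(2)
v = np.hanning(64)                       # known deterministic pulse
n0 = 200                                 # pulse start in the observation
r = rng.standard_normal(512)             # white Gaussian noise, variance 1
r[n0:n0 + v.size] += v                   # received signal: v plus noise

y = np.correlate(r, v, mode="valid")     # matched-filter output w + Y
print(int(np.argmax(y)), n0)             # the output peaks near the pulse start
```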

Wiener filter
Consider estimating a random signal $V_t$ from a random observation signal $U_t$. They are assumed jwss with zero mean and known psds and cross psd. We discuss continuous-time cases.
$U_t \;\to\; h(t) \;\to\; \hat V_t$
goal: Find $h(t)$ that minimizes the mean-squared error (mse) $E|V_t - \hat V_t|^2$.
$\hat V_t = \int h(t - \tau)\,U_\tau\,d\tau = \int h(\theta)\,U_{t-\theta}\,d\theta$: optimal
$\tilde V_t = \int \tilde h(t - \tau)\,U_\tau\,d\tau = \int \tilde h(\theta)\,U_{t-\theta}\,d\theta$: another linear estimate
$E|V_t - \hat V_t|^2 \le E|V_t - \tilde V_t|^2$ for any $\tilde h(t)$.

[figure: $V_t - \hat V_t$ is orthogonal to the subspace of all processes obtained by linear filtering of $U_t$, in which $\hat V_t$ and $\tilde V_t$ lie]
orthogonality principle: $E\,\tilde V_t\,(V_t - \hat V_t) = 0$ for all $\tilde V_t = \int \tilde h(\theta)\,U_{t-\theta}\,d\theta$, i.e., for all $\tilde h(t)$
$\Longleftrightarrow$ $\hat V_t$ is an mmse estimate.

proof: $E|V_t - \tilde V_t|^2 = E|(V_t - \hat V_t) + (\hat V_t - \tilde V_t)|^2$
$= E|V_t - \hat V_t|^2 + E|\hat V_t - \tilde V_t|^2 + 2E(V_t - \hat V_t)(\hat V_t - \tilde V_t)$
$\ge E|V_t - \hat V_t|^2$ if the last term vanishes.
Since $\hat V_t - \tilde V_t = \int h(\theta)\,U_{t-\theta}\,d\theta - \int \tilde h(\theta)\,U_{t-\theta}\,d\theta = \int (h(\theta) - \tilde h(\theta))\,U_{t-\theta}\,d\theta$,
it is in the subspace, so $E(V_t - \hat V_t)(\hat V_t - \tilde V_t) = 0$ by orthogonality.
optimal filter, Wiener filter:
$0 = E\,\tilde V_t\,(V_t - \hat V_t) = E(V_t - \hat V_t)\int \tilde h(\theta)\,U_{t-\theta}\,d\theta$ [orthogonality]
$= E\int \tilde h(\theta)\,(V_t - \hat V_t)\,U_{t-\theta}\,d\theta$
$= \int \tilde h(\theta)\,(E V_t U_{t-\theta} - E \hat V_t U_{t-\theta})\,d\theta$
$= \int \tilde h(\theta)\,(R_{VU}(\theta) - R_{\hat V U}(\theta))\,d\theta$

For this to hold for any $\tilde h(\theta)$, we must have $R_{VU}(\theta) = R_{\hat V U}(\theta)$.
$R_{VU}(\tau) = R_{\hat V U}(\tau) = R_U(\tau) * h(\tau)$
$S_{VU}(f) = S_U(f)\,H(f)$
$H(f) = \frac{S_{VU}(f)}{S_U(f)}$
If $U_t = V_t + X_t$, where $X_t$ is a noise uncorrelated with $V_t$,
$S_{VU}(f) = S_V(f) + S_{VX}(f) = S_V(f)$
$S_U(f) = S_V(f) + S_X(f) + S_{VX}(f) + S_{XV}(f) = S_V(f) + S_X(f)$
$H(f) = \frac{S_V(f)}{S_V(f) + S_X(f)}$
The derivation goes parallel for discrete-time cases.
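A minimal discrete-time sketch of this result (not from the lecture): the Wiener filter $H = S_V/(S_V + S_X)$ is formed on an FFT grid from known psds and applied to a noisy observation, and the mse drops below that of the raw observation. The shaping filter, noise variance, and names are illustrative assumptions.

```python
import numpy as np
from scipy.signal import lfilter, freqz

# Minimal discrete-time sketch of the Wiener filter H = S_V / (S_V + S_X)
# for an observation U_n = V_n + X_n with X_n white and uncorrelated with V_n.
# The shaping filter `b`, noise variance, and length are illustrative choices.
rng = np.random.default_rng(3)
n = 2**16
b = [1.0, 0.9, 0.7, 0.4]                    # V is white noise shaped by this FIR
v = lfilter(b, [1.0], rng.standard_normal(n))
x = np.sqrt(2.0) * rng.standard_normal(n)   # white noise, S_X(f) = 2
u = v + x

freqs = np.fft.fftfreq(n)                   # FFT grid (cycles/sample)
_, bf = freqz(b, worN=2 * np.pi * freqs)    # B(e^{j2pi f}) on that grid
s_v, s_x = np.abs(bf)**2, 2.0               # known psds
h = s_v / (s_v + s_x)                       # Wiener filter (zero-phase here)
v_hat = np.real(np.fft.ifft(h * np.fft.fft(u)))

print(round(np.mean((u - v)**2), 2), round(np.mean((v_hat - v)**2), 2))
```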

Binomial counting process and random walk


counting process $N_t$: (1) $N_t \ge 0$ for $t \ge 0$; (2) integer-valued; (3) non-decreasing
binomial counting process $N_n$: discrete-time process; $N_n = \sum_{i=1}^{n} X_i$, where $X_i$ is a Bernoulli process.
[figure: sample paths of a binomial counting process and a random walk]
random walk $N_n$: discrete-time process; $N_n = \sum_{i=1}^{n} X_i$, where $X_i$ is a modified ($\pm 1$) Bernoulli process.

Poisson process
Poisson process $N_t$: continuous-time counting process with
$P(N_t = k) = p_{N_t}(k) = \frac{(\lambda t)^k}{k!}\,e^{-\lambda t}$ for some $\lambda > 0$, $t \ge 0$.
$\lambda$: arrival rate, the number of arrivals per unit time
In $n$ repeated Bernoulli trials with success probability $p = \lambda t / n$, for a fixed constant $\lambda$, let the random variable $K_n$ be the number of successes. Then $E K_n = \lambda t$, and as $n \to \infty$, $p_{K_n}(k)$ converges to the pmf of $\mathrm{Poi}(\lambda t)$.
Consider dividing $[0, t]$ into $n$ subintervals and executing a Bernoulli trial on each subinterval. Then $K_n$ is a binomial counting process, and $K_n \xrightarrow{\text{dist}} N_t$ as $n \to \infty$.

We have seen a proof before.
alternate proof: $G_{K_n}(z) = \left(1 - \frac{\lambda t}{n} + \frac{\lambda t}{n} z\right)^n = \left(1 + \frac{\lambda t (z - 1)}{n}\right)^n \to e^{\lambda t (z - 1)}$ [$\mathrm{Poi}(\lambda t)$] as $n \to \infty$.
A Poisson process is also defined for $t \ge 0$ using increments:
1. $N_0 = 0$.
2. For $s < t$, $N_t - N_s$ is a Poisson rv with mean $\lambda(t - s)$.
3. For $t_1 < t_2 < \cdots < t_n$, the increments $N_{t_2} - N_{t_1}, N_{t_3} - N_{t_2}, \dots, N_{t_n} - N_{t_{n-1}}$ are independent: independent increment process.

[figure: sample path of a Poisson process]

For an independent increment process with a known initial value, its joint probabilities can readily be computed.
$p_{N_{t_1} \cdots N_{t_n}}(k_1, \dots, k_n)$
$= P(N_{t_1} - N_0 = k_1 - 0,\; N_{t_2} - N_{t_1} = k_2 - k_1,\; \dots,\; N_{t_n} - N_{t_{n-1}} = k_n - k_{n-1})$
$= P(N_{t_1} - N_0 = k_1 - 0)\,P(N_{t_2} - N_{t_1} = k_2 - k_1) \cdots P(N_{t_n} - N_{t_{n-1}} = k_n - k_{n-1})$
$= \frac{(\lambda(t_1 - 0))^{k_1 - 0}}{(k_1 - 0)!}e^{-\lambda(t_1 - 0)}\;\frac{(\lambda(t_2 - t_1))^{k_2 - k_1}}{(k_2 - k_1)!}e^{-\lambda(t_2 - t_1)} \cdots \frac{(\lambda(t_n - t_{n-1}))^{k_n - k_{n-1}}}{(k_n - k_{n-1})!}e^{-\lambda(t_n - t_{n-1})}$

moments: $E N_t = \lambda t$, $\mathrm{var}(N_t) = \lambda t$: non-stationary
For $t > s$,
$E N_t N_s = E(N_t - N_s + N_s)N_s = E(N_t - N_s)N_s + E N_s^2 = E(N_t - N_s)\,E N_s + E N_s^2$
$= \lambda(t - s)\,\lambda s + (\lambda s)^2 + \lambda s = (\lambda t)(\lambda s) + \lambda s$
$\mathrm{cov}(N_t, N_s) = E N_t N_s - E N_t\,E N_s = \lambda s$
$R_N(t, s) = E N_t N_s = (\lambda t)(\lambda s) + \lambda \min(t, s)$
$C_N(t, s) = \mathrm{cov}(N_t, N_s) = \lambda \min(t, s)$

arrival time: $T_k := \min\{t > 0 : N_t \ge k\}$
$1 - F_{T_k}(t) = P(T_k > t) = P(N_t < k) = \sum_{i=0}^{k-1} \frac{(\lambda t)^i}{i!}\,e^{-\lambda t}$
[figure: a sample path of $N_t$ with the $k$th arrival at $T_k$]
$f_{T_k}(t) = -\frac{d}{dt}\,(1 - F_{T_k}(t)) = -\frac{d}{dt} \sum_{i=0}^{k-1} \frac{(\lambda t)^i}{i!}\,e^{-\lambda t}$
$= \sum_{i=0}^{k-1} \left[\frac{\lambda(\lambda t)^i}{i!} - \frac{i\lambda(\lambda t)^{i-1}}{i!}\right] e^{-\lambda t}$ (the sum telescopes)
$= \frac{\lambda(\lambda t)^{k-1}}{(k-1)!}\,e^{-\lambda t}$, $t \ge 0$: $\mathrm{Erl}(k, \lambda)$

inter-arrival time: $X_1, X_2, X_3, \dots$, where $X_k = T_k - T_{k-1}$
[figure: a sample path of $N_t$ with the $(k-1)$th arrival at $T_{k-1} = t$ and the next arrival after $t + x$]
$P(X_k \le x) = 1 - P(X_k > x)$
$= 1 - P(X_k > x \mid X_1 = x_1, \dots, X_{k-1} = x_{k-1})$
$= 1 - P(N_{t+x} - N_t = 0 \mid X_1 = x_1, \dots, X_{k-1} = x_{k-1})$, where $t = T_{k-1} = x_1 + \cdots + x_{k-1}$: $(k-1)$th arrival time
$= 1 - P(N_{t+x} - N_t = 0) = 1 - \frac{(\lambda x)^0}{0!}\,e^{-\lambda x}$, $x > 0$
$= \begin{cases} 1 - e^{-\lambda x}, & x > 0 \\ 0, & x \le 0 \end{cases} \;\Longrightarrow\; X_k \sim \exp(\lambda)$
A counting process with iid $\exp(\lambda)$ interarrival times is a Poisson process of rate $\lambda$.
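A minimal simulation sketch of this construction (not from the lecture): a Poisson process is built from iid exponential interarrival times, and $E N_t = \lambda t$ and $\mathrm{var}(N_t) = \lambda t$ are checked empirically. The rate, horizon, and trial count are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: build a Poisson process from iid exp(lam) interarrival
# times and check E N_t = lam * t and var(N_t) = lam * t.
rng = np.random.default_rng(4)
lam, t, trials = 3.0, 10.0, 20_000

counts = np.empty(trials)
for i in range(trials):
    arrivals = np.cumsum(rng.exponential(1.0 / lam, size=int(3 * lam * t)))
    counts[i] = np.searchsorted(arrivals, t)   # N_t = number of arrivals <= t

print(round(counts.mean(), 2), lam * t)         # ~30.0 vs 30.0
print(round(counts.var(), 2))                   # variance also ~ lam * t
```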

                     discrete time    continuous time
counting process     binomial         Poisson
arrival time         Pascal           Erlang
inter-arrival time   geometric        exponential

(superposition of Poisson processes): $L_t$ and $M_t$ are Poisson processes, independent of each other, with respective rates $\lambda$ and $\mu$. Then $N_t = L_t + M_t$ is a Poisson process with rate $\lambda + \mu$.
(decomposition of a Poisson process): $N_t$ is a Poisson process with rate $\lambda$; $X_k$ is a Bernoulli process, independent of $N_t$, with success probability $p$; $L_t$ and $M_t$ are counting processes defined such that $N_t = L_t + M_t$, where the $k$th arrival of $N_t$ induces either an arrival of $L_t$ if $X_k = 0$ or an arrival of $M_t$ if $X_k = 1$. Then $L_t$ and $M_t$ are Poisson processes with respective rates $\lambda(1 - p)$ and $\lambda p$, and are independent of each other.
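A minimal sketch of the decomposition (thinning) statement (not from the lecture): each arrival of a rate-$\lambda$ Poisson process is routed by an independent Bernoulli($p$) mark, and the two resulting streams have rates close to $\lambda p$ and $\lambda(1-p)$. The parameter values are illustrative.

```python
import numpy as np

# Minimal sketch of Poisson decomposition (thinning): mark each arrival of a
# rate-lam Poisson process with an independent Bernoulli(p) flag and split it
# into two streams. Rate, p, and horizon are illustrative choices.
rng = np.random.default_rng(5)
lam, p, t = 5.0, 0.3, 2_000.0

arrivals = np.cumsum(rng.exponential(1.0 / lam, size=int(2 * lam * t)))
arrivals = arrivals[arrivals <= t]
marks = rng.random(arrivals.size) < p      # X_k = 1 with probability p

rate_m = np.count_nonzero(marks) / t       # arrivals routed to M_t
rate_l = np.count_nonzero(~marks) / t      # arrivals routed to L_t
print(round(rate_m, 2), round(lam * p, 2))        # ~1.5 vs 1.5
print(round(rate_l, 2), round(lam * (1 - p), 2))  # ~3.5 vs 3.5
```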

[figure: superposition and decomposition of Poisson processes — sample paths of $L_t$, $M_t$, $N_t$, and the marks $X_k$]

Wiener process
A Wiener process $W_t$, also called Brownian motion, describes the motion of a highly excited particle in a fluid, viewed in one coordinate, that does not drift off in one direction.
Wiener process for $t \ge 0$:
1. $W_0 = 0$.
2. For $s < t$, $W_t - W_s$ is a Gaussian random variable with mean zero and variance $\sigma^2(t - s)$.
3. For $t_1 < t_2 < \cdots < t_k$, the increments $W_{t_2} - W_{t_1}, W_{t_3} - W_{t_2}, \dots, W_{t_k} - W_{t_{k-1}}$ are independent: independent increment process.
4. Each sample path is a continuous function of $t$.
$W_t = \int_0^t X_\tau\,d\tau$, $t \ge 0$, where $X_t$ is a white Gaussian process.

The sample paths of $W_t$ are continuous everywhere but differentiable nowhere.
For any $c > 0$, $V_t = \frac{1}{\sqrt{c}} W_{ct}$ is a Wiener process identical (in distribution) to $W_t$: self-similar, fractal.
moments: $E W_t = 0$, $\mathrm{var}(W_t) = \sigma^2 t$
For $t > s$,
$E W_t W_s = E(W_t - W_s + W_s)W_s = E(W_t - W_s)W_s + E W_s^2 = E(W_t - W_s)\,E W_s + E W_s^2 = \sigma^2 s$
$\mathrm{cov}(W_t, W_s) = E W_t W_s - E W_t\,E W_s = \sigma^2 s$
$R_W(t, s) = E W_t W_s = \sigma^2 \min(t, s)$
$C_W(t, s) = \mathrm{cov}(W_t, W_s) = \sigma^2 \min(t, s)$
Wiener processes are nonstationary.

random walk approximation:
$X_1, X_2, \dots$ is an equiprobable modified Bernoulli process with values $+1$ and $-1$.
$S_n := \sum_{i=1}^{n} X_i$: symmetric random walk
$W_t^{(n)} := \frac{1}{\sqrt{n}}\,S_{\lfloor nt \rfloor}$, where $\lfloor \cdot \rfloor$ is the greatest integer no greater than the argument.
[figure: a sample path of $W_t^{(n)}$, a staircase with steps of height $1/\sqrt{n}$ and width $1/n$]

As $n \to \infty$,
1. The power of the process is maintained.
2. By the central limit theorem, $W_t^{(n)}$ converges in distribution to a Gaussian process.
3. As the random walk is an independent increment process, so is its limit process.
4. $W_t^{(n)}$ eventually becomes a Wiener process.
If the random walk is replaced by a binomial counting process, the limit process is a drifting Wiener process.
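A minimal simulation sketch of the random-walk approximation (not from the lecture): sample paths of $W_t^{(n)}$ are generated and two Wiener-process properties ($\mathrm{var}(W_1) \approx 1$, nearly uncorrelated increments) are checked. The value of $n$ and the trial count are illustrative.

```python
import numpy as np

# Minimal sketch of the random-walk approximation to a Wiener process:
# W_t^{(n)} = S_{floor(n t)} / sqrt(n) with +/-1 equiprobable steps.
rng = np.random.default_rng(6)
n, t_max, trials = 1_000, 1.0, 5_000

steps = rng.choice([-1.0, 1.0], size=(trials, int(n * t_max)))
w = np.cumsum(steps, axis=1) / np.sqrt(n)   # W_t^{(n)} sampled on the grid k/n

# At t = 1, W_1 should be close to N(0, 1): check mean and variance.
print(round(w[:, -1].mean(), 2), round(w[:, -1].var(), 2))
# Independent increments: W_1 - W_{1/2} and W_{1/2} are nearly uncorrelated.
print(round(np.corrcoef(w[:, -1] - w[:, n // 2 - 1], w[:, n // 2 - 1])[0, 1], 2))
```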

Markov process
We discuss jointly continuous cases; jointly discrete cases are similar.
Markov property: for any $t_1 < t_2 < \cdots < t_n$ and $x_1, \dots, x_n$,
$f_{X_{t_n} \mid X_{t_1} \cdots X_{t_{n-1}}}(x_n \mid x_1, \dots, x_{n-1}) = f_{X_{t_n} \mid X_{t_{n-1}}}(x_n \mid x_{n-1})$
Markov process: a process with the Markov property

$f_{X_{t_1} \cdots X_{t_n}}(x_1, \dots, x_n)$
$= f_{X_{t_1}}(x_1)\,f_{X_{t_2} \mid X_{t_1}}(x_2 \mid x_1)\,f_{X_{t_3} \mid X_{t_1} X_{t_2}}(x_3 \mid x_1, x_2) \cdots f_{X_{t_n} \mid X_{t_1} \cdots X_{t_{n-1}}}(x_n \mid x_1, \dots, x_{n-1})$
$= f_{X_{t_1}}(x_1)\,f_{X_{t_2} \mid X_{t_1}}(x_2 \mid x_1)\,f_{X_{t_3} \mid X_{t_2}}(x_3 \mid x_2) \cdots f_{X_{t_n} \mid X_{t_{n-1}}}(x_n \mid x_{n-1})$
examples: binomial counting process, random walk, Poisson process, Wiener process
Discrete-valued Markov processes are called Markov chains.
equivalence: the Markov property is equivalent to the conditional independence of the past and the future given the present.
conditional independence: $f_{XY \mid Z}(x, y \mid z) = f_{X \mid Z}(x \mid z)\,f_{Y \mid Z}(y \mid z)$

conditional independence of the past and the future given the present: for $t_1 < t_2 < \cdots < t_n < \cdots < t_{n+k}$,
$f_{X_{t_1} \cdots X_{t_{n-1}} X_{t_{n+1}} \cdots X_{t_{n+k}} \mid X_{t_n}}(x_1, \dots, x_{n-1}, x_{n+1}, \dots, x_{n+k} \mid x_n)$
$= f_{X_{t_1} \cdots X_{t_{n-1}} \mid X_{t_n}}(x_1, \dots, x_{n-1} \mid x_n)\; f_{X_{t_{n+1}} \cdots X_{t_{n+k}} \mid X_{t_n}}(x_{n+1}, \dots, x_{n+k} \mid x_n)$
The equivalence implies that the time-reversed Markov process is also Markov.

proof of the theorem (arguments are suppressed):
Markov property $\Rightarrow$ conditional independence:
$f_{X_{t_1} \cdots X_{t_{n-1}} X_{t_{n+1}} \cdots X_{t_{n+k}} \mid X_{t_n}}$
$= f_{X_{t_1} \cdots X_{t_{n-1}} \mid X_{t_n}}\; f_{X_{t_{n+1}} \cdots X_{t_{n+k}} \mid X_{t_1} \cdots X_{t_n}}$ [ch: chain rule]
$= f_{X_{t_1} \cdots X_{t_{n-1}} \mid X_{t_n}}\,\big(f_{X_{t_{n+1}} \mid X_{t_1} \cdots X_{t_n}} \cdots f_{X_{t_{n+k}} \mid X_{t_1} \cdots X_{t_{n+k-1}}}\big)$ [ch]
$= f_{X_{t_1} \cdots X_{t_{n-1}} \mid X_{t_n}}\,\big(f_{X_{t_{n+1}} \mid X_{t_n}} \cdots f_{X_{t_{n+k}} \mid X_{t_{n+k-1}}}\big)$ [Mp]
$= f_{X_{t_1} \cdots X_{t_{n-1}} \mid X_{t_n}}\; f_{X_{t_{n+1}} \mid X_{t_n}}\,f_{X_{t_{n+2}} \mid X_{t_n} X_{t_{n+1}}} \cdots f_{X_{t_{n+k}} \mid X_{t_n} X_{t_{n+1}} \cdots X_{t_{n+k-1}}}$ [Mp]
$= f_{X_{t_1} \cdots X_{t_{n-1}} \mid X_{t_n}}\; f_{X_{t_{n+1}} \cdots X_{t_{n+k}} \mid X_{t_n}}$ [ch]

conditional independence $\Rightarrow$ Markov property:
$f_{X_{t_1} \cdots X_{t_{n-2}} X_{t_n} \mid X_{t_{n-1}}} = f_{X_{t_1} \cdots X_{t_{n-2}} \mid X_{t_{n-1}}}\; f_{X_{t_n} \mid X_{t_1} \cdots X_{t_{n-2}} X_{t_{n-1}}}$ [ch]
$f_{X_{t_1} \cdots X_{t_{n-2}} X_{t_n} \mid X_{t_{n-1}}} = f_{X_{t_1} \cdots X_{t_{n-2}} \mid X_{t_{n-1}}}\; f_{X_{t_n} \mid X_{t_{n-1}}}$ [ci]
Comparing the two factorizations gives $f_{X_{t_n} \mid X_{t_1} \cdots X_{t_{n-1}}} = f_{X_{t_n} \mid X_{t_{n-1}}}$.
An independent increment process with known initial state $X_0$ is Markov.
proof: $f_{X_{t_n} \mid X_{t_1} \cdots X_{t_{n-1}}}(x_n \mid x_1, \dots, x_{n-1})$
$= f_{X_{t_n} - X_{t_{n-1}} \mid X_{t_1} \cdots X_{t_{n-1}}}(x_n - x_{n-1} \mid x_1, \dots, x_{n-1})$
$= f_{X_{t_n} - X_{t_{n-1}} \mid X_{t_{n-1}}}(x_n - x_{n-1} \mid x_{n-1}) = f_{X_{t_n} \mid X_{t_{n-1}}}(x_n \mid x_{n-1})$
examples: binomial counting process, random walk, Poisson process, Wiener process

Chapman-Kolmogorov equation for a Markov process: for $t_1 \le t_2 \le t_3$,
$f_{X_{t_3} \mid X_{t_1}}(x_3 \mid x_1) = \int f_{X_{t_2} \mid X_{t_1}}(x_2 \mid x_1)\,f_{X_{t_3} \mid X_{t_2}}(x_3 \mid x_2)\,dx_2$
proof: $f_{X_{t_3} \mid X_{t_1}}(x_3 \mid x_1) = \int f_{X_{t_2} X_{t_3} \mid X_{t_1}}(x_2, x_3 \mid x_1)\,dx_2$ [marginal]
$= \int f_{X_{t_2} \mid X_{t_1}}(x_2 \mid x_1)\,f_{X_{t_3} \mid X_{t_1} X_{t_2}}(x_3 \mid x_1, x_2)\,dx_2$ [ch]
$= \int f_{X_{t_2} \mid X_{t_1}}(x_2 \mid x_1)\,f_{X_{t_3} \mid X_{t_2}}(x_3 \mid x_2)\,dx_2$ [Mp]

$f_{X_{t_k} \mid X_{t_1}}(x_k \mid x_1) = \int \cdots \int f_{X_{t_k} \mid X_{t_{k-1}}}(x_k \mid x_{k-1}) \cdots f_{X_{t_2} \mid X_{t_1}}(x_2 \mid x_1)\,dx_2 \cdots dx_{k-1}$
For Markov $X_t$ and $t_1 < t_2 < t_3$, $E(E(X_{t_3} \mid X_{t_2}) \mid X_{t_1}) = E(X_{t_3} \mid X_{t_1})$.
proof: $E(E(X_{t_3} \mid X_{t_2}) \mid X_{t_1} = x_1)$
$= \int E(X_{t_3} \mid X_{t_2} = x_2)\,f_{X_{t_2} \mid X_{t_1}}(x_2 \mid x_1)\,dx_2$
$= \int \left[\int x_3\,f_{X_{t_3} \mid X_{t_2}}(x_3 \mid x_2)\,dx_3\right] f_{X_{t_2} \mid X_{t_1}}(x_2 \mid x_1)\,dx_2$
$= \int x_3 \left[\int f_{X_{t_3} \mid X_{t_2}}(x_3 \mid x_2)\,f_{X_{t_2} \mid X_{t_1}}(x_2 \mid x_1)\,dx_2\right] dx_3$
$= \int x_3\,f_{X_{t_3} \mid X_{t_1}}(x_3 \mid x_1)\,dx_3$ [C-K eqn]
$= E(X_{t_3} \mid X_{t_1} = x_1)$

homogeneous Markov process: $f_{X_t \mid X_s}(u \mid v)$ is shift invariant, i.e., $\forall \tau$, $f_{X_{t+\tau} \mid X_{s+\tau}}(u \mid v) = f_{X_t \mid X_s}(u \mid v)$.
homogeneous $\neq$ stationary
A discrete-time homogeneous Markov chain (discrete-valued Markov process) is characterized by a state transition diagram, which includes states and transition probabilities.
state: the value of the random process, often an integer
transition probability: $p_{ij} = P(X_{n+1} = j \mid X_n = i)$

example: random walk
[figure: state transition diagram of the random walk — from each integer state, a transition to the right with probability $p$ and to the left with probability $1 - p$]
example:
[figure: a four-state transition diagram with states blu/1, gry/2, blk/3, brn/4; its self-loop and transition probabilities appear in the transition matrix below]

transition matrix $P$: the matrix whose $(i, j)$th entry is $p_{ij}$
example: random walk
$P = \begin{pmatrix} \ddots & \ddots & \ddots & & & \\ & 1-p & 0 & p & 0 & 0 \\ & 0 & 1-p & 0 & p & 0 \\ & 0 & 0 & 1-p & 0 & p \\ & & & \ddots & \ddots & \ddots \end{pmatrix}$
example (the four-state chain above):
$P = \begin{pmatrix} 0 & 0.1 & 0.4 & 0.5 \\ 0.2 & 0.8 & 0 & 0 \\ 0 & 0 & 0.9 & 0.1 \\ 0.2 & 0.3 & 0.5 & 0 \end{pmatrix}$

$\sum_j p_{ij} = 1$
$p_{ij}^{(2)} := [P^2]_{ij} = \sum_k p_{ik}\,p_{kj} = \sum_k P(X_{n+1} = k \mid X_n = i)\,P(X_{n+2} = j \mid X_{n+1} = k) = P(X_{n+2} = j \mid X_n = i)$: Chapman-Kolmogorov equation
$p_{ij}^{(m)} := [P^m]_{ij} = P(X_{n+m} = j \mid X_n = i)$
$p_{ij}^{(n+m)} = \sum_k p_{ik}^{(n)}\,p_{kj}^{(m)}$: Chapman-Kolmogorov equation
$P(X_{n+m} = j) = \sum_i p_{ij}^{(m)}\,P(X_n = i)$
$p^{(n+m)} = p^{(n)} P^m$, where $p^{(n)} := (P(X_n = 1), P(X_n = 2), P(X_n = 3), \dots)$ is the marginal pmf of $X_n$ expressed as a row vector.
$p^{(n)} = p^{(0)} P^n$

stationary distribution for $P$: $\pi = (\pi_1, \pi_2, \pi_3, \dots)$ that satisfies $\pi = \pi P$ and $\sum_i \pi_i = 1$.
If $\pi$ exists for the $P$ of a Markov chain,
$\lim_{n \to \infty} P^n = \begin{pmatrix} \pi_1 & \pi_2 & \pi_3 & \cdots \\ \pi_1 & \pi_2 & \pi_3 & \cdots \\ \pi_1 & \pi_2 & \pi_3 & \cdots \\ \vdots & \vdots & \vdots & \end{pmatrix}$, i.e., every row equals $\pi$, and
$\lim_{n \to \infty} p^{(n)} = \lim_{n \to \infty} p^{(0)} P^n = \pi$ regardless of $p^{(0)}$.
The Markov chain is asymptotically stationary if $\pi$ exists.
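A minimal numerical sketch (not from the lecture): iterating $P^n$ for a small chain shows every row converging to the stationary distribution $\pi$, and $p^{(n)} = p^{(0)}P^n \to \pi$ regardless of $p^{(0)}$. The 4-state matrix follows the example reconstructed above; treat the specific numbers as illustrative.

```python
import numpy as np

# Minimal sketch: iterate P^n and recover the stationary distribution pi.
# The 4-state matrix is an illustrative chain (rows sum to 1).
P = np.array([[0.0, 0.1, 0.4, 0.5],
              [0.2, 0.8, 0.0, 0.0],
              [0.0, 0.0, 0.9, 0.1],
              [0.2, 0.3, 0.5, 0.0]])

Pn = np.linalg.matrix_power(P, 200)     # every row converges to pi
pi = Pn[0]
print(np.round(pi, 4))
print(np.allclose(pi @ P, pi), np.isclose(pi.sum(), 1.0))  # pi = pi P, sums to 1

p0 = np.array([1.0, 0.0, 0.0, 0.0])     # any initial pmf p(0)
print(np.allclose(p0 @ Pn, pi))         # p(n) = p(0) P^n -> pi
```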

Gauss-Markov process
A Gaussian process $X_t$ is Markov if and only if for any $t_1 < t_2 < t_3$,
$C_X(t_1, t_3) = \frac{C_X(t_1, t_2)\,C_X(t_2, t_3)}{C_X(t_2, t_2)}$.
Note that $C_X(t_2, t_2) = \mathrm{var}(X_{t_2})$ and that this includes both discrete- and continuous-time processes.
proof of the only if part:
Recall that if $X$ and $Y$ are jointly Gaussian, $E(X \mid Y) = m_X + \rho\,\frac{\sigma_X}{\sigma_Y}(Y - m_Y)$.
For Gaussian $X_t$,
$E(X_{t_2} \mid X_{t_1}) = m_X(t_2) + \frac{C_X(t_1, t_2)}{C_X(t_1, t_1)}\,[X_{t_1} - m_X(t_1)]$.

For Markov $X_t$, $E(E(X_{t_3} \mid X_{t_2}) \mid X_{t_1}) = E(X_{t_3} \mid X_{t_1})$.
$E(E(X_{t_3} \mid X_{t_2}) \mid X_{t_1})$
$= E\!\left( m_X(t_3) + \frac{C_X(t_2, t_3)}{C_X(t_2, t_2)}\,[X_{t_2} - m_X(t_2)] \;\Big|\; X_{t_1} \right)$
$= m_X(t_3) + \frac{C_X(t_2, t_3)}{C_X(t_2, t_2)}\,[E(X_{t_2} \mid X_{t_1}) - m_X(t_2)]$
$= m_X(t_3) + \frac{C_X(t_2, t_3)}{C_X(t_2, t_2)}\left[ m_X(t_2) + \frac{C_X(t_1, t_2)}{C_X(t_1, t_1)}\,[X_{t_1} - m_X(t_1)] - m_X(t_2) \right]$
$= m_X(t_3) + \frac{C_X(t_2, t_3)}{C_X(t_2, t_2)}\,\frac{C_X(t_1, t_2)}{C_X(t_1, t_1)}\,[X_{t_1} - m_X(t_1)]$
Since $E(X_{t_3} \mid X_{t_1}) = m_X(t_3) + \frac{C_X(t_1, t_3)}{C_X(t_1, t_1)}\,[X_{t_1} - m_X(t_1)]$,
equating the two expressions gives the stated relation, and the proof is complete. The if part is not given.

example: Wiener process: $C_X(t, s) = \sigma^2 \min(t, s)$
example: Ornstein-Uhlenbeck process: $m_X(t) = 0$, $C_X(t, s) = \sigma^2 e^{-\alpha|t - s|}$, i.e., $C_X(\tau) = \sigma^2 e^{-\alpha|\tau|}$
Compare with the random telegraph process.
The Ornstein-Uhlenbeck process is the only continuous-time zero-mean stationary Gauss-Markov process.
If $X_t$ is stationary, the equation of the theorem implies an exponential $C_X(\tau)$, thereby suggesting the proof of the if part:
1. $C(\tau_1 + \tau_2) = \frac{1}{\sigma^2}\,C(\tau_1)\,C(\tau_2)$ for $\tau_1, \tau_2 > 0$
2. $C(\tau)$ is even
$\Rightarrow\; C(\tau) = \sigma^2 e^{-\alpha|\tau|}$, where $\alpha = \ln \sigma^2 - \ln C(1)$

Autoregressive and moving average process
Autoregressive (AR) and moving average (MA) processes are discrete-time processes generated by filtering iid processes.
1st order AR process: $X_n = aX_{n-1} + W_n$, where $W_n$ is iid.
It is the 1st order all-pole filter output when $W_n$ is the input.
[figure: $W_n \to$ first-order all-pole filter $\to X_n$; the process is Markov]

If stationary and zero mean,
$\sigma_X^2 = a^2\sigma_X^2 + \sigma_W^2 \;\Rightarrow\; \sigma_X^2 = \frac{\sigma_W^2}{1 - a^2}$
$R_X(1) = E X_n X_{n-1} = E(aX_{n-1} + W_n)X_{n-1} = a\sigma_X^2$
$R_X(2) = E X_n X_{n-2} = E(aX_{n-1} + W_n)X_{n-2} = a^2\sigma_X^2$
$R_X(\tau) = E X_n X_{n-\tau} = E(aX_{n-1} + W_n)X_{n-\tau} = a^{|\tau|}\sigma_X^2$
covariance matrix of $(X_1, X_2, \dots, X_n)^T$:
$\sigma_X^2 \begin{pmatrix} 1 & a & a^2 & \cdots & a^{n-1} \\ a & 1 & a & \cdots & a^{n-2} \\ a^2 & a & 1 & \cdots & a^{n-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a^{n-1} & a^{n-2} & a^{n-3} & \cdots & 1 \end{pmatrix}$
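A minimal simulation sketch (not from the lecture): a stationary AR(1) process is generated by all-pole filtering of iid Gaussian noise and its sample acf is compared with $R_X(\tau) = a^{|\tau|}\sigma_W^2/(1 - a^2)$. The coefficient and lengths are illustrative assumptions.

```python
import numpy as np
from scipy.signal import lfilter

# Minimal sketch: simulate a zero-mean AR(1) process and compare the sample
# acf with R_X(tau) = a^|tau| * sigma_W^2 / (1 - a^2).
rng = np.random.default_rng(8)
a, sw2, n = 0.8, 1.0, 500_000
w = np.sqrt(sw2) * rng.standard_normal(n)
x = lfilter([1.0], [1.0, -a], w)            # X_n = a X_{n-1} + W_n

sx2 = sw2 / (1 - a**2)
for tau in range(4):
    r_hat = np.mean(x[tau:] * x[:n - tau])  # sample R_X(tau)
    print(tau, round(r_hat, 3), round(a**tau * sx2, 3))
```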

$k$th order AR process: $X_n = a_1 X_{n-1} + \cdots + a_k X_{n-k} + W_n$, where $W_n$ is iid.
It is the $k$th order all-pole filter output when $W_n$ is the input.
[figure: a 4th-order all-pole filter (feedback taps $a_1, \dots, a_4$) and a 4th-order all-zero filter (feedforward taps $b_0, \dots, b_4$ on a delay line)]

A $k$th order AR process is $k$th order Markov.
$k$th order Markov: $f_{X_n \mid X_1 \cdots X_{n-1}}(x_n \mid x_1, \dots, x_{n-1}) = f_{X_n \mid X_{n-k} \cdots X_{n-1}}(x_n \mid x_{n-k}, \dots, x_{n-1})$
Equivalently, the past and the future are conditionally independent given $k$ consecutive samples.
The psd of an AR process is rational in $e^{j2\pi f}$:
$S_X(f) = \frac{\sigma_W^2}{\left|1 - a\,e^{-j2\pi f}\right|^2}$: first order
$S_X(f) = \frac{\sigma_W^2}{\left|1 - a_1 e^{-j2\pi f} - \cdots - a_k e^{-j2\pi f k}\right|^2}$: $k$th order
If $W_n$ is Gaussian, so is the AR process.
AR modeling of speech is widely used in a variety of applications such as speech coding and speech synthesis.

$k$th order MA process: $X_n = b_0 W_n + b_1 W_{n-1} + \cdots + b_k W_{n-k}$, where $W_n$ is iid.
It is the $k$th order all-zero filter output when $W_n$ is the input; it is not Markov.
The psd of an MA process is rational in $e^{j2\pi f}$:
$S_X(f) = \sigma_W^2 \left| b_0 + b_1 e^{-j2\pi f} + \cdots + b_k e^{-j2\pi f k} \right|^2$: $k$th order
$(k, l)$th order ARMA process: $X_n = a_1 X_{n-1} + \cdots + a_k X_{n-k} + b_0 W_n + \cdots + b_l W_{n-l}$, where $W_n$ is iid.
It is the $(k, l)$th order pole-zero filter output when $W_n$ is the input.

[figure: two equivalent realizations of a $(4,4)$th-order pole-zero filter, with feedback taps $a_1, \dots, a_4$ and feedforward taps $b_0, \dots, b_4$ on a delay line]
The psd of an ARMA process is rational in $e^{j2\pi f}$:
$S_X(f) = \sigma_W^2\,\frac{\left| b_0 + b_1 e^{-j2\pi f} + \cdots + b_l e^{-j2\pi f l} \right|^2}{\left| 1 - a_1 e^{-j2\pi f} - \cdots - a_k e^{-j2\pi f k} \right|^2}$: $(k, l)$th order
Any rational psd can be synthesized by an ARMA process.
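A minimal sketch of ARMA synthesis (not from the lecture): iid Gaussian noise is passed through a pole-zero filter, and the estimated psd is compared with the rational expression above. The coefficients are an illustrative stable choice.

```python
import numpy as np
from scipy.signal import lfilter, welch, freqz

# Minimal sketch: synthesize an ARMA process by pole-zero filtering iid
# Gaussian noise and compare its estimated psd with
# sigma_W^2 |B(e^{j2pi f})|^2 / |A(e^{j2pi f})|^2.
rng = np.random.default_rng(9)
b = [1.0, 0.5]                    # feedforward (zeros)
a = [1.0, -0.6, 0.2]              # feedback (poles), stable
w = rng.standard_normal(400_000)
x = lfilter(b, a, w)              # ARMA sample path

f, s_est = welch(x, fs=1.0, nperseg=4096, return_onesided=False)
_, h = freqz(b, a, worN=2 * np.pi * f)
print(round(np.mean(s_est / np.abs(h)**2), 2))   # ratio ~ 1 (sigma_W^2 = 1)
```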

Ergodicity
Ergodicity means equality between time averages and statistical averages.
statistical average: $E X_t = m_X$, $E g(X_t)$
time average:
$\mathcal{E}_T X_t = \frac{1}{T}\sum_{t=1}^{T} X_t$ or $\frac{1}{2T+1}\sum_{t=-T}^{T} X_t$, discrete-time
$\mathcal{E}_T X_t = \frac{1}{T}\int_0^T X_t\,dt$ or $\frac{1}{2T}\int_{-T}^{T} X_t\,dt$, continuous-time
$\mathcal{E}_T g(X_t)$ is defined likewise.
$\overline{\mathcal{E}}\,X_t = \lim_{T \to \infty} \mathcal{E}_T X_t$, and similarly $\overline{\mathcal{E}}\,g(X_t)$.
Assume $X_t$ is wss for ergodicity of the first and second moments and sss for higher moments.
We will discuss continuous-time cases.

$X_t$ is ergodic
in the mean $\iff \overline{\mathcal{E}}\,X_t = E X_t$
in the 2nd moment $\iff \overline{\mathcal{E}}\,X_t^2 = E X_t^2$
in the acf $\iff \forall \tau$, $\overline{\mathcal{E}}\,X_{t+\tau} X_t = E X_{t+\tau} X_t$
and so on, up to equality in all moments.
We can compute the corresponding moment or joint moment by time averaging the appropriate function of a sample path.
Ergodic theorems provide (necessary and) sufficient conditions for certain ergodicities.
Since the time average is a limit of a random sequence, the senses of the above equalities need to be defined.

For an iid random sequence $X_t$, $t = \dots, -1, 0, 1, 2, \dots$:
The WLLN states that $\mathcal{E}_T X_t \xrightarrow{\text{pr}} E X_t$ as $T \to \infty$.
The SLLN states that $\mathcal{E}_T X_t \xrightarrow{\text{a.s.}} E X_t$ as $T \to \infty$.
We also know that $\mathcal{E}_T X_t \xrightarrow{\text{ms}} E X_t$ as $T \to \infty$. [How?]
These imply ergodicity of the mean in three different senses.
$X_t$ is ms ergodic in the mean: $\mathcal{E}_T X_t \xrightarrow{\text{ms}} m_X$ as $T \to \infty$.
A (wss) process $X_t$ is ms ergodic in the mean if and only if
$\lim_{T \to \infty} \frac{1}{T}\int_0^T \left(1 - \frac{\tau}{T}\right) C_X(\tau)\,d\tau = 0$.
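A minimal numerical sketch of this behavior (not from the lecture): for an AR(1) process, whose covariance decays geometrically so the averaged-covariance condition above is satisfied, the mean-squared deviation of the time average from $m_X$ shrinks as $T$ grows. The mean, coefficient, and sizes are illustrative assumptions.

```python
import numpy as np
from scipy.signal import lfilter

# Minimal sketch: for an AR(1) process the time average over [0, T]
# approaches the ensemble mean in mean square as T grows.
rng = np.random.default_rng(10)
m, a, trials = 2.0, 0.9, 2_000

for T in (100, 1_000, 10_000):
    w = rng.standard_normal((trials, T))
    x = m + lfilter([1.0], [1.0, -a], w, axis=1)   # wss AR(1) with mean m
    et = x.mean(axis=1)                            # time average per sample path
    print(T, round(np.mean((et - m)**2), 4))       # E|E_T X_t - m|^2 -> 0
```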

proof: $E\,\mathcal{E}_T X_t = \mathcal{E}_T\,E X_t = m_X$
$\mathrm{var}(\mathcal{E}_T X_t) = E\left|\frac{1}{T}\int_0^T X_t\,dt - m_X\right|^2 = E\left|\frac{1}{T}\int_0^T (X_t - m_X)\,dt\right|^2$
$= \frac{1}{T^2}\int_0^T\!\!\int_0^T E(X_t - m_X)(X_s - m_X)\,dt\,ds$
$= \frac{1}{T^2}\int_0^T\!\!\int_0^T C_X(t - s)\,dt\,ds \qquad (1)$
$= \frac{1}{T^2}\int_0^T\!\!\int_{-s}^{T-s} C_X(\tau)\,d\tau\,ds \qquad [(t, s) \to (\tau, s),\ \tau = t - s]$
$= \frac{1}{T^2}\left(\int_{-T}^{0}\!\!\int_{-\tau}^{T} C_X(\tau)\,ds\,d\tau + \int_{0}^{T}\!\!\int_{0}^{T-\tau} C_X(\tau)\,ds\,d\tau\right)$
$= \frac{1}{T^2}\left(\int_{-T}^{0} (T + \tau)\,C_X(\tau)\,d\tau + \int_{0}^{T} (T - \tau)\,C_X(\tau)\,d\tau\right)$
$= \frac{1}{T^2}\int_{-T}^{T} (T - |\tau|)\,C_X(\tau)\,d\tau \qquad [\text{symmetry}]$
$= \frac{2}{T}\int_0^T \left(1 - \frac{\tau}{T}\right) C_X(\tau)\,d\tau \qquad (2)$, since $C_X(\tau)$ is even.

A (wss) process $X_t$ is ms ergodic in the mean if $\int_{-\infty}^{\infty} |C_X(\tau)|\,d\tau < \infty$.
proof: $\int_{-\infty}^{\infty} |C_X(\tau)|\,d\tau < \infty$
$\Rightarrow\; \lim_{T \to \infty} \frac{1}{T}\int_0^T \left(1 - \frac{\tau}{T}\right) C_X(\tau)\,d\tau = 0$ [dominated conv thm]
$\Rightarrow\; \mathrm{var}(\mathcal{E}_T X_t) \to 0$ as $T \to \infty$ [prev thm]
A (wss) process $X_t$ is ms ergodic in the mean if and only if $\lim_{T \to \infty} \frac{1}{T}\int_0^T C_X(\tau)\,d\tau = 0$.
proof: only if part: Assume ergodicity.

$\left|\frac{1}{T}\int_0^T C_X(\tau)\,d\tau\right|^2 = \left|\frac{1}{T}\int_0^T E(X_\tau - m_X)(X_0 - m_X)\,d\tau\right|^2$
$= \left|E(X_0 - m_X)\,\frac{1}{T}\int_0^T (X_\tau - m_X)\,d\tau\right|^2$
$\le E(X_0 - m_X)^2\; E\left|\frac{1}{T}\int_0^T (X_\tau - m_X)\,d\tau\right|^2$ [Schwarz ineq]
$= \sigma_X^2\,\frac{2}{T}\int_0^T \left(1 - \frac{\tau}{T}\right) C_X(\tau)\,d\tau$ [(2)]
$\to 0$ as $T \to \infty$ by the assumed ergodicity.
if part: Assume $\lim_{T \to \infty} \frac{1}{T}\int_0^T C_X(\tau)\,d\tau = 0$ and consider
$|\mathrm{var}(\mathcal{E}_T X_t)| = \left|\frac{1}{T^2}\int_0^T\!\!\int_0^T C_X(t - s)\,dt\,ds\right|$ [(1)]
$= \frac{1}{T^2}\left|\int_0^T\!\!\int_0^t C_X(t - s)\,ds\,dt + \int_0^T\!\!\int_0^s C_X(s - t)\,dt\,ds\right|$
$= \frac{2}{T^2}\left|\int_0^T\!\!\int_0^t C_X(t - s)\,ds\,dt\right|$ [symmetry]
$= \frac{2}{T^2}\left|\int_0^T\!\!\int_0^t C_X(\tau)\,d\tau\,dt\right|$, where $\tau = t - s$
$\le \frac{2}{T^2}\left|\int_0^{\tilde T}\!\!\int_0^t C_X(\tau)\,d\tau\,dt\right| + \frac{2}{T^2}\left|\int_{\tilde T}^{T}\!\!\int_0^t C_X(\tau)\,d\tau\,dt\right|$,
where $\tilde T$ is a constant given by this: since $\lim_{T \to \infty} \frac{1}{T}\int_0^T C_X(\tau)\,d\tau = 0$, for every $\epsilon > 0$ there is a $\tilde T$ such that $t \ge \tilde T \Rightarrow \left|\frac{1}{t}\int_0^t C_X(\tau)\,d\tau\right| < \epsilon$.
Assuming that $T$ is very large and $T \gg \tilde T$,
$\le \frac{2}{T^2}\int_0^{\tilde T}\left|\int_0^t C_X(\tau)\,d\tau\right| dt + \frac{2}{T^2}\int_{\tilde T}^{T} t\left|\frac{1}{t}\int_0^t C_X(\tau)\,d\tau\right| dt$ [triangle ineq]
$\le \frac{2}{T^2}\int_0^{\tilde T}\left|\int_0^t C_X(\tau)\,d\tau\right| dt + \frac{2\epsilon}{T^2}\int_{\tilde T}^{T} t\,dt = \epsilon_1 + \frac{\epsilon\,(T^2 - \tilde T^2)}{T^2} \le \epsilon_1 + \epsilon$,
where $\epsilon_1 \to 0$ as $T \to \infty$ since the first integral does not depend on $T$.

$X_t$ is ms ergodic in the acf: $\mathcal{E}_T\,X_{t+\tau} X_t \xrightarrow{\text{ms}} R_X(\tau)$ for each $\tau$ as $T \to \infty$.
Define for each $\tau$, $Y_t^{(\tau)} := X_{t+\tau} X_t$. Then $E Y_t^{(\tau)} = R_X(\tau)$, and $X_t$ is ms ergodic in the acf if and only if $Y_t^{(\tau)}$ is ms ergodic in the mean for every $\tau$.
$C_{Y^{(\tau)}}(\theta) = E Y_{t+\theta}^{(\tau)} Y_t^{(\tau)} - R_X^2(\tau) = E X_{t+\theta+\tau} X_{t+\theta} X_{t+\tau} X_t - R_X^2(\tau)$
A (wss) process $X_t$ is ms ergodic in the acf if and only if $\lim_{T \to \infty} \frac{1}{T}\int_0^T C_{Y^{(\tau)}}(\theta)\,d\theta = 0$ for every $\tau$.

Isserlis theorem: For jointly Gaussian zero-mean random variables $X_1, X_2, X_3$, and $X_4$,
$E X_1 X_2 X_3 X_4 = E X_1 X_2\,E X_3 X_4 + E X_1 X_3\,E X_2 X_4 + E X_1 X_4\,E X_2 X_3$.
In fact, for any even number $k$,
$E X_1 X_2 \cdots X_k = \sum E X_{i_1} X_{i_2}\,E X_{i_3} X_{i_4} \cdots E X_{i_{k-1}} X_{i_k}$,
where the sum is over all possible ways of partitioning $\{1, 2, \dots, k\}$ into $k/2$ pairs.
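A minimal Monte Carlo sketch of the four-variable identity (not from the lecture): sample a zero-mean Gaussian vector and compare the empirical fourth moment with the pairwise-covariance expression. The covariance matrix is an illustrative positive-definite choice.

```python
import numpy as np

# Minimal sketch: Monte Carlo check of the Isserlis theorem for a zero-mean
# Gaussian vector (X1, X2, X3, X4).
rng = np.random.default_rng(11)
c = np.array([[1.0, 0.5, 0.3, 0.2],
              [0.5, 1.0, 0.4, 0.1],
              [0.3, 0.4, 1.0, 0.6],
              [0.2, 0.1, 0.6, 1.0]])
x = rng.multivariate_normal(np.zeros(4), c, size=2_000_000)

lhs = np.mean(x[:, 0] * x[:, 1] * x[:, 2] * x[:, 3])
rhs = c[0, 1] * c[2, 3] + c[0, 2] * c[1, 3] + c[0, 3] * c[1, 2]
print(round(lhs, 3), round(rhs, 3))   # the two agree up to sampling error
```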

A stationary Gaussian process $X_t$ with zero mean is ms ergodic in the acf if and only if
$\lim_{T \to \infty} \frac{1}{T}\int_0^T C_X^2(\theta)\,d\theta = 0$.
proof: Let $Y_t^{(\tau)} := X_{t+\tau} X_t$, and consider
$C_{Y^{(\tau)}}(\theta) = E Y_{t+\theta}^{(\tau)} Y_t^{(\tau)} - C_X^2(\tau)$ [zero mean]
$= E X_{t+\theta+\tau} X_{t+\theta} X_{t+\tau} X_t - C_X^2(\tau)$
$= E X_{t+\theta+\tau} X_{t+\theta}\,E X_{t+\tau} X_t + E X_{t+\theta+\tau} X_{t+\tau}\,E X_{t+\theta} X_t + E X_{t+\theta+\tau} X_t\,E X_{t+\theta} X_{t+\tau} - C_X^2(\tau)$ [Isserlis thm]
$= C_X^2(\tau) + C_X^2(\theta) + C_X(\theta + \tau)\,C_X(\theta - \tau) - C_X^2(\tau)$
$= C_X^2(\theta) + C_X(\theta + \tau)\,C_X(\theta - \tau)$

only if: Assumed is, by the previous theorem, that $\lim_{T \to \infty} \frac{1}{T}\int_0^T C_{Y^{(\tau)}}(\theta)\,d\theta = 0$ for every $\tau$.
$\Rightarrow\; \lim_{T \to \infty} \frac{1}{T}\int_0^T \left[C_X^2(\theta) + C_X(\theta + \tau)\,C_X(\theta - \tau)\right] d\theta = 0$
Setting $\tau = 0$: $\lim_{T \to \infty} \frac{2}{T}\int_0^T C_X^2(\theta)\,d\theta = 0$.
if: Assume that $\lim_{T \to \infty} \frac{1}{T}\int_0^T C_X^2(\theta)\,d\theta = 0$.
$\lim_{T \to \infty} \frac{1}{T}\int_0^T C_{Y^{(\tau)}}(\theta)\,d\theta$
$\le \lim_{T \to \infty} \frac{1}{T}\int_0^T C_X^2(\theta)\,d\theta + \lim_{T \to \infty} \frac{1}{T}\int_0^T C_X(\theta + \tau)\,C_X(\theta - \tau)\,d\theta$
$\le \lim_{T \to \infty} \left(\frac{1}{T}\int_0^T C_X^2(\theta + \tau)\,d\theta\right)^{1/2} \left(\frac{1}{T}\int_0^T C_X^2(\theta - \tau)\,d\theta\right)^{1/2}$ [Schwarz]
$= 0$ for any finite $\tau$.
By the previous theorem, the proof is complete.