

Random vector and random matrix


random vector:
$X = (X_1, \ldots, X_k)^t$, $\quad EX = (EX_1, \ldots, EX_k)^t$: mean vector
random matrix:
$$Z = \begin{pmatrix} Z_{11} & \cdots & Z_{1l} \\ \vdots & & \vdots \\ Z_{k1} & \cdots & Z_{kl} \end{pmatrix}, \qquad EZ = \begin{pmatrix} EZ_{11} & \cdots & EZ_{1l} \\ \vdots & & \vdots \\ EZ_{k1} & \cdots & EZ_{kl} \end{pmatrix}$$
moments
correlation matrix of X:
$$R_X = EXX^t = \begin{pmatrix} EX_1^2 & \cdots & EX_1X_k \\ \vdots & & \vdots \\ EX_kX_1 & \cdots & EX_k^2 \end{pmatrix}$$


covariance matrix of X:
$$C_X = E(X - EX)(X - EX)^t = \begin{pmatrix} \operatorname{var}(X_1) & \cdots & \operatorname{cov}(X_1, X_k) \\ \vdots & & \vdots \\ \operatorname{cov}(X_k, X_1) & \cdots & \operatorname{var}(X_k) \end{pmatrix}$$

uncorrelated X: CX is diagonal.
iid X: $C_X = \sigma^2 I$, $I$: identity matrix
Other possibilities: uncorrelated (independent) between subvectors that are each correlated (dependent) within.
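As a concrete illustration, here is a minimal sketch (assuming NumPy; the sample data, sizes, and seed are illustrative) of estimating the mean vector, $R_X$, and $C_X$ from sample vectors:

```python
# Minimal sketch: sample-based estimates of EX, R_X = E[X X^t], and C_X.
import numpy as np

rng = np.random.default_rng(0)
n, k = 10_000, 3
samples = rng.normal(size=(n, k))          # n draws of a k-dim random vector

mean_vec = samples.mean(axis=0)            # estimate of EX
R_X = samples.T @ samples / n              # estimate of E[X X^t]
C_X = R_X - np.outer(mean_vec, mean_vec)   # C_X = R_X - (EX)(EX)^t

# For iid N(0,1) components, C_X should be close to the identity matrix.
print(np.round(C_X, 2))
```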
definiteness of a matrix
non-negative definite (positive semidefinite) $k \times k$ matrix $A$: symmetric and $\forall$ vector $a$, $a^tAa \ge 0$.
positive definite $k \times k$ matrix $A$: symmetric and $\forall$ vector $a \ne 0$, $a^tAa > 0$.


RX and CX are real symmetric and non-negative definite.


proof:
$a^tC_Xa = a^tE(X - EX)(X - EX)^ta$
$= E[a^t(X - EX)][(X - EX)^ta]$ [associative]
$= E[a^t(X - EX)]^2 \ge 0$
eigenvalues and eigenvectors
For a square matrix $A$, if $Ax = \lambda x$ for some $x \ne 0$, then $\lambda$ is an eigenvalue of $A$, and $x$ is an eigenvector of $A$.
A real matrix may have complex eigenvalues and complex
eigenvectors, but a real symmetric matrix has only real eigenvalues and only real eigenvectors (as we can choose).
A non-negative (positive) definite matrix has only non-negative
(positive) eigenvalues.


proof (non-negative definite): For a square matrix $A$, its eigenvalue $\lambda$, and the corresponding eigenvector $x$ normalized such that $\|x\| = 1$,
$\lambda = \lambda\|x\|^2 = \lambda x^tx = x^t(\lambda x) = x^tAx \ge 0$
A $k \times k$ real symmetric matrix $A$ has only real eigenvalues $\lambda_i$ and an (Euclidean) orthonormal set of $k$ eigenvectors $q_i$.
$$q_i^tq_j = q_{i1}q_{j1} + q_{i2}q_{j2} + \cdots + q_{ik}q_{jk} = \delta_{ij} = \begin{cases} 1, & i = j \\ 0, & i \ne j \end{cases}$$
proof (real eigenvalues):
In this proof, to be more general, eigenvectors are considered complex, though they can be chosen to be all real.
$\bar{\lambda}_i\|q_i\|^2 = \bar{\lambda}_i\bar{q}_i^tq_i = \overline{(\lambda_iq_i)}^tq_i = \overline{(Aq_i)}^tq_i = \bar{q}_i^tA^tq_i$
$= \bar{q}_i^tAq_i = \bar{q}_i^t\lambda_iq_i = \lambda_i\bar{q}_i^tq_i = \lambda_i\|q_i\|^2$, so $\bar{\lambda}_i = \lambda_i$.


Let us form a matrix $Q$ by arranging the orthonormal eigenvectors as columns.
$$q_i = \begin{pmatrix} q_{i1} \\ q_{i2} \\ \vdots \\ q_{ik} \end{pmatrix}, \qquad Q = (q_1, \ldots, q_k) = \begin{pmatrix} q_{11} & q_{21} & \cdots & q_{k1} \\ q_{12} & q_{22} & \cdots & q_{k2} \\ \vdots & \vdots & & \vdots \\ q_{1k} & q_{2k} & \cdots & q_{kk} \end{pmatrix}$$
Then $Q$ is an orthogonal (unitary) matrix: $QQ^t = Q^tQ = I$, ie, $Q^t = Q^{-1}$.
proof:
$$Q^tQ = \begin{pmatrix} q_1^t \\ \vdots \\ q_k^t \end{pmatrix}(q_1, \ldots, q_k) = \begin{pmatrix} q_1^tq_1 & q_1^tq_2 & \cdots & q_1^tq_k \\ q_2^tq_1 & q_2^tq_2 & \cdots & q_2^tq_k \\ \vdots & \vdots & & \vdots \\ q_k^tq_1 & q_k^tq_2 & \cdots & q_k^tq_k \end{pmatrix} = I$$


A real symmetric matrix $A$ is diagonalizable. For the matrix $Q$ defined above and the diagonal matrix $\Lambda$ whose diagonal elements are the eigenvalues of $A$,
$$A = Q\Lambda Q^t \quad\text{and}\quad \Lambda = Q^tAQ$$
proof:
$AQ = A(q_1, \ldots, q_k) = (Aq_1, \ldots, Aq_k) = (\lambda_1q_1, \ldots, \lambda_kq_k)$
$$Q^tAQ = \begin{pmatrix} q_1^t \\ \vdots \\ q_k^t \end{pmatrix}(\lambda_1q_1, \ldots, \lambda_kq_k) = \begin{pmatrix} \lambda_1q_1^tq_1 & \lambda_2q_1^tq_2 & \cdots & \lambda_kq_1^tq_k \\ \lambda_1q_2^tq_1 & \lambda_2q_2^tq_2 & \cdots & \lambda_kq_2^tq_k \\ \vdots & \vdots & & \vdots \\ \lambda_1q_k^tq_1 & \lambda_2q_k^tq_2 & \cdots & \lambda_kq_k^tq_k \end{pmatrix} = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_k \end{pmatrix} = \Lambda$$
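A minimal numerical check of this diagonalization (assuming NumPy; the symmetric matrix below is illustrative): np.linalg.eigh returns real eigenvalues and an orthogonal $Q$ for a real symmetric input.

```python
# Minimal sketch: verify Q Q^t = I, A = Q Λ Q^t, and Λ = Q^t A Q numerically.
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])            # real symmetric

eigvals, Q = np.linalg.eigh(A)             # eigenvalues and eigenvectors (columns of Q)
Lam = np.diag(eigvals)

print(np.allclose(Q @ Q.T, np.eye(3)))     # Q Q^t = I
print(np.allclose(Q @ Lam @ Q.T, A))       # A = Q Λ Q^t
print(np.allclose(Q.T @ A @ Q, Lam))       # Λ = Q^t A Q
```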


notation:
$\int_x = \int_{x_1} \cdots \int_{x_k}$; $\quad dx = dx_1 \cdots dx_k$
$\|X\| = \sqrt{\sum_{i=1}^k X_i^2}$ [Euclidean norm]; $\quad E\|X\|^2 = \sum_{i=1}^k EX_i^2$
$p_X(x) = p_{X_1 \cdots X_k}(x_1, \ldots, x_k)$
$f_X(x) = f_{X_1 \cdots X_k}(x_1, \ldots, x_k)$
$F_X(x) = F_{X_1 \cdots X_k}(x_1, \ldots, x_k)$
$\Phi_X(u) = \Phi_{X_1 \cdots X_k}(u_1, \ldots, u_k) = Ee^{ju^tX} = Ee^{j(u_1X_1 + \cdots + u_kX_k)}$
$= \sum_x e^{ju^tx}p_X(x)$ or $= \int e^{ju^tx}f_X(x)\,dx$
$= \prod_{i=1}^k \Phi_{X_i}(u_i)$ if independent.


Transformations or functions of random vectors
$Y = G(X)$:
$$\begin{pmatrix} Y_1 \\ \vdots \\ Y_l \end{pmatrix} = \begin{pmatrix} g_1(X_1, \ldots, X_k) \\ \vdots \\ g_l(X_1, \ldots, X_k) \end{pmatrix}$$
If $G$ is continuously differentiable and invertible ($l = k$), $X = H(Y)$, $H = G^{-1}$:
$$\begin{pmatrix} X_1 \\ \vdots \\ X_k \end{pmatrix} = \begin{pmatrix} h_1(Y_1, \ldots, Y_k) \\ \vdots \\ h_k(Y_1, \ldots, Y_k) \end{pmatrix}$$
$\Delta_y = \{y : y_1 < Y_1 \le y_1 + \delta_1, \ldots, y_k < Y_k \le y_k + \delta_k\}$
volume of $\Delta_y$: $|\Delta_y| = \delta_1 \cdots \delta_k$
$\Delta_x = H(\Delta_y)$, $\Delta_y = G(\Delta_x)$, volume of $\Delta_x$: $|\Delta_x|$


$P(X \in \Delta_x) = P(Y \in \Delta_y)$
$f_X(x)|\Delta_x| \approx f_Y(y)|\Delta_y|$, where $y = G(x)$,
$\lim_{|\Delta_y| \to 0} \dfrac{|\Delta_x|}{|\Delta_y|} = |\det(dH(y))|$
Jacobian of $H$: $\quad dH(y) = \begin{pmatrix} \frac{\partial h_1}{\partial y_1} & \cdots & \frac{\partial h_1}{\partial y_k} \\ \vdots & \ddots & \vdots \\ \frac{\partial h_k}{\partial y_1} & \cdots & \frac{\partial h_k}{\partial y_k} \end{pmatrix}$
$$f_Y(y) = f_X(x)|\det(dH(y))| = \frac{f_X(x)}{|\det(dG(x))|}, \quad\text{where } x = H(y)$$


example: $X, Y \sim N(0, 1)$ iid; $\quad R = \sqrt{X^2 + Y^2}$, $\quad \Theta = \angle(X, Y)$
For $R \ge 0$ and $-\pi < \Theta \le \pi$, the transformation from $(X, Y)$ to $(R, \Theta)$ is continuously differentiable and invertible.
$X = R\cos\Theta$, $\quad Y = R\sin\Theta$
$$dH(r, \theta) = \begin{pmatrix} \frac{\partial x}{\partial r} & \frac{\partial x}{\partial \theta} \\ \frac{\partial y}{\partial r} & \frac{\partial y}{\partial \theta} \end{pmatrix} = \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}, \qquad \det(dH(r, \theta)) = r$$
$f_{R\Theta}(r, \theta) = rf_{XY}(r\cos\theta, r\sin\theta)$ (for $r \ge 0$ and $-\pi < \theta \le \pi$)
$= r \cdot \frac{1}{\sqrt{2\pi}}e^{-r^2\cos^2\theta/2} \cdot \frac{1}{\sqrt{2\pi}}e^{-r^2\sin^2\theta/2} = \frac{1}{2\pi}re^{-r^2/2}$
$$f_R(r) = \begin{cases} re^{-r^2/2}, & r \ge 0 \\ 0, & \text{else} \end{cases}, \qquad f_\Theta(\theta) = \begin{cases} \frac{1}{2\pi}, & -\pi < \theta \le \pi \\ 0, & \text{else} \end{cases}$$


$R \sim \text{Ray}(1)$ and $\Theta \sim \text{unif}(-\pi, \pi]$ are independent.
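A minimal Monte Carlo sketch of this example (assuming NumPy; sample size and seed are illustrative):

```python
# Minimal sketch: for X, Y iid N(0,1), R should be Rayleigh(1) (mean sqrt(pi/2))
# and Θ uniform on (-π, π], approximately uncorrelated with R.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = rng.normal(size=100_000)

r = np.hypot(x, y)                         # R = sqrt(X^2 + Y^2)
theta = np.arctan2(y, x)                   # Θ = angle of (X, Y), in (-π, π]

print(r.mean(), np.sqrt(np.pi / 2))        # sample mean vs. Rayleigh(1) mean ≈ 1.2533
print(np.corrcoef(r, theta)[0, 1])         # close to 0
```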


$Y = G(X) = AX + b$, where $A$ is square and invertible.
$Y_i = \sum_{j=1}^k A_{ij}X_j + b_i$, $\quad \dfrac{\partial g_i}{\partial x_j} = A_{ij}$, $\ i = 1, \ldots, k$
$dG(x) = A$; $\quad dH(y) = A^{-1}$
$f_Y(y) = f_X(A^{-1}(y - b))|\det A^{-1}| = \dfrac{f_X(A^{-1}(y - b))}{|\det A|}$
$EY = AEX + b$; $\quad Y - EY = A(X - EX)$
$C_Y = E(Y - EY)(Y - EY)^t = EA(X - EX)(X - EX)^tA^t = AC_XA^t$
$\Phi_Y(v) = Ee^{jv^tY} = Ee^{jv^t(AX + b)} = e^{jv^tb}Ee^{j(A^tv)^tX} = e^{jv^tb}\Phi_X(A^tv)$
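A minimal sketch (assuming NumPy; the particular $A$, $b$, and $C_X$ below are illustrative) checking $C_Y = AC_XA^t$ empirically:

```python
# Minimal sketch: compare the sample covariance of Y = A X + b with A C_X A^t.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.5], [0.0, 2.0]])
b = np.array([1.0, -1.0])
C_X = np.array([[2.0, 0.3], [0.3, 1.0]])

X = rng.multivariate_normal(mean=[0.0, 0.0], cov=C_X, size=200_000)
Y = X @ A.T + b                            # each row is y = A x + b

print(np.round(np.cov(Y, rowvar=False), 2))  # sample C_Y
print(np.round(A @ C_X @ A.T, 2))            # theoretical A C_X A^t
```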


Estimation
minimum mean-squared-error (mmse) estimation of X:
Given the observation Y and some information on the jpdf,
find $\hat{X} = g(Y) = \arg\min_{\hat{X}} E\|X - \hat{X}\|^2$, where $X$: k-d, $Y$: l-d.
$E\|X - \hat{X}\|^2 = \sum_{i=1}^k E(X_i - \hat{X}_i)^2$

linear mmse estimator, Wiener filter:
$\hat{X} = AY$, where $AR_Y = R_{XY}$ and $R_{XY} = EXY^t$.
proof: First for each $i$, minimize $E(X_i - \hat{X}_i)^2$ by finding $\hat{X}_i = a_iY = \sum_{j=1}^l a_{ij}Y_j$, where $a_i$ will form the i-th row of $A$.
$\partial E(X_i - \hat{X}_i)^2/\partial a_{ij} = 0 \;\Rightarrow\; E(X_i - \hat{X}_i)(-Y_j) = 0$, $\ j = 1, \ldots, l$
note: differentiation and expectation are usually interchangeable.


orthogonality principle:
$E(X_i - \hat{X}_i)Y_j = 0$, ie, $EX_iY_j = E\hat{X}_iY_j$
[figure: the error $X_i - \hat{X}_i$ is orthogonal to $Y_j$; $\hat{X}_i$ is the projection of $X_i$ onto the span of $Y_1, \ldots, Y_l$]
$\hat{X}_i = \sum_{j=1}^l a_{ij}Y_j$
For $j = 1, \ldots, l$,
$EX_iY_j = Ea_iYY_j = a_i(EY_1Y_j, \ldots, EY_lY_j)^t$ [scalar, 1-d]
$EX_iY^t = a_iR_Y$ [row vector, l-d]
Repeating for $i = 1, \ldots, k$, $AR_Y = R_{XY}$. [matrix, k$\times$l]
$A = R_{XY}R_Y^{-1}$ and $\hat{X} = R_{XY}R_Y^{-1}Y$ if $R_Y$ is invertible.
For 1-d, it becomes $\hat{X} = \dfrac{EXY}{EY^2}Y$, solving $\min_a E(X - aY)^2$.
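A minimal sketch of the linear mmse (Wiener) estimator (assuming NumPy; the observation model, matrix $H$, and noise level below are illustrative), with sample moments standing in for the true $R_Y$ and $R_{XY}$:

```python
# Minimal sketch: X_hat = A Y with A = R_XY R_Y^{-1}, estimated from samples.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
X = rng.normal(size=(n, 2))                           # k = 2 target vector (zero mean)
H = np.array([[1.0, 0.0], [0.5, 1.0], [0.2, 0.3]])    # illustrative observation matrix
Y = X @ H.T + 0.1 * rng.normal(size=(n, 3))           # l = 3 noisy observation

R_Y = Y.T @ Y / n                                     # estimate of E[Y Y^t]
R_XY = X.T @ Y / n                                    # estimate of E[X Y^t]
A = R_XY @ np.linalg.inv(R_Y)                         # A R_Y = R_XY

X_hat = Y @ A.T
print(np.mean(np.sum((X - X_hat) ** 2, axis=1)))      # mean-squared error
```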


affine mmse estimator, Wiener filter:
$\hat{X} = A(Y - m_Y) + m_X$, where $AC_Y = C_{XY}$ and $C_{XY} = E(X - m_X)(Y - m_Y)^t$.
proof: Minimize $E\|X - (AY + b)\|^2$
$= E\|[(X - m_X) - A(Y - m_Y)] + (m_X - Am_Y - b)\|^2$
$= E\|(X - m_X) - A(Y - m_Y)\|^2 + \|m_X - Am_Y - b\|^2$ [The cross term disappears.]
$\Rightarrow b = m_X - Am_Y$ and $AC_Y = C_{XY}$
$A = C_{XY}C_Y^{-1}$ and $\hat{X} = C_{XY}C_Y^{-1}(Y - m_Y) + m_X$ if $C_Y$ is invertible.
This is equivalent to the linear mmse estimator of $X - m_X$ based on $Y - m_Y$.
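The same idea with centering gives a sketch of the affine estimator (assuming NumPy; the nonzero-mean data model below is illustrative):

```python
# Minimal sketch: X_hat = A (Y - m_Y) + m_X with A C_Y = C_XY from sample moments.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
X = 2.0 + rng.normal(size=(n, 2))                    # nonzero-mean target
Y = X @ np.array([[1.0, 0.4], [0.0, 1.0]]).T + 0.3 * rng.normal(size=(n, 2))

m_X, m_Y = X.mean(axis=0), Y.mean(axis=0)
Xc, Yc = X - m_X, Y - m_Y
C_Y = Yc.T @ Yc / n
C_XY = Xc.T @ Yc / n

A = C_XY @ np.linalg.inv(C_Y)                        # A C_Y = C_XY
X_hat = Yc @ A.T + m_X                               # A (Y - m_Y) + m_X
print(np.mean(np.sum((X - X_hat) ** 2, axis=1)))     # mean-squared error
```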


(general) mmse estimator: $\hat{X} = g(Y) = E(X|Y)$
proof: Minimize $E\|X - g(Y)\|^2 = E\,E(\|X - g(Y)\|^2 \mid Y)$
$= \sum_y E(\|X - g(y)\|^2 \mid Y = y)\,p_Y(y)$ or $= \int E(\|X - g(y)\|^2 \mid Y = y)\,f_Y(y)\,dy$
Minimize $E(\|X - g(y)\|^2 \mid Y = y)$ for each $y$ to get $g(y)$.
Given $y$, $g(y)$ is a vector $g = (g_1, \ldots, g_k)^t$.
$\varphi(g) := E(\|X - g\|^2 \mid Y = y)$
$= E(\|X\|^2 \mid Y = y) + \|g\|^2 - 2g^tE(X \mid Y = y)$
$= \sum_{i=1}^k E(X_i^2 \mid Y = y) + \sum_{i=1}^k g_i^2 - 2\sum_{i=1}^k g_iE(X_i \mid Y = y)$
$\partial\varphi(g)/\partial g_j = 0 \;\Rightarrow\; g_j = E(X_j \mid Y = y)$
$g(y) = E(X \mid Y = y) \;\Rightarrow\; g(Y) = E(X \mid Y)$
For 1-d, $\hat{X} = E(X|Y)$, solving $\min_g E(X - g(Y))^2$.


alternative proof:
orthogonality principle for functions of Y:
$Eh(Y)^t(X - g(Y)) = 0$ for any $h$ $\;\Rightarrow\;$ $g(Y) = \arg\min_g E\|X - g(Y)\|^2$
$h_1(Y) + h_2(Y) = (h_1 + h_2)(Y)$; $\quad ah(Y) = (ah)(Y)$ [functions of $Y$ form a linear space]
[figure: $X - g(Y)$ is orthogonal to every $h(Y)$; $g(Y)$ is the projection of $X$ onto the space of functions of $Y$]
proof of orthogonality principle for functions:
$E\|X - f(Y)\|^2 = E\|X - g(Y) + g(Y) - f(Y)\|^2$
$= E\|X - g(Y)\|^2 + E\|(g - f)(Y)\|^2 + 2E(g - f)(Y)^t(X - g(Y))$
$\ge E\|X - g(Y)\|^2$ if orthogonality holds.


$Eh(Y)^t[X - g(Y)]$
$= E\,E(h(Y)^t[X - g(Y)] \mid Y)$
$= Eh(Y)^tE([X - g(Y)] \mid Y)$
$= Eh(Y)^t[E(X|Y) - g(Y)]$
$= 0$ for any $h$, if and only if $g(Y) = E(X|Y)$.
Why only if?
Therefore, orthogonality holds if and only if $g(Y) = E(X|Y)$, and hence $E(X|Y)$ is the mmse estimator.


Gaussian random vector


$X = (X_1, \ldots, X_k)^t$ is a Gaussian random vector if any linear combination $a^tX = \sum_{i=1}^k a_iX_i$ is a Gaussian random variable.
$X \sim N(m, C)$, $\quad m$: mean vector, $C$: covariance matrix
jpdf [def]
$$f_X(x) = \frac{1}{(2\pi)^{k/2}\sqrt{\det C}}\exp\!\left(-\frac{1}{2}(x - m)^tC^{-1}(x - m)\right)$$
jchf [def]
$$\Phi_X(u) = \exp\!\left(jm^tu - \frac{u^tCu}{2}\right)$$
A Gaussian random vector is fully characterized by its 1-st
and 2-nd moments, ie, by m and C.
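A minimal sketch (assuming NumPy) of evaluating this jpdf; the helper name gaussian_pdf and the test point are illustrative:

```python
# Minimal sketch: evaluate f_X(x) = (2π)^{-k/2} (det C)^{-1/2} exp(-½ (x-m)^t C^{-1} (x-m)).
import numpy as np

def gaussian_pdf(x, m, C):
    k = len(m)
    d = x - m
    quad = d @ np.linalg.solve(C, d)       # (x-m)^t C^{-1} (x-m)
    norm = (2 * np.pi) ** (k / 2) * np.sqrt(np.linalg.det(C))
    return np.exp(-0.5 * quad) / norm

m = np.array([0.0, 1.0])
C = np.array([[1.0, 0.5], [0.5, 2.0]])
print(gaussian_pdf(np.array([0.5, 0.5]), m, C))
```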


$X \sim N(m, C) \;\Rightarrow\; Y = AX + b \sim N(Am + b, ACA^t)$
proof: $a^t(AX + b) = (A^ta)^tX + a^tb$ is Gaussian.
alternative proof:
$\Phi_Y(v) = Ee^{jv^t(AX+b)} = e^{jv^tb}Ee^{j(A^tv)^tX} = e^{jv^tb}\Phi_X(A^tv)$
$= e^{jv^tb}\exp\!\left(j(A^tv)^tm_X - \frac{(A^tv)^tC_X(A^tv)}{2}\right)$
$= \exp\!\left(jv^t(Am_X + b) - \frac{v^t(AC_XA^t)v}{2}\right)$
Any linear or affine transformation of a Gaussian random
vector is Gaussian.


example: This example shows that the converse of the above theorem does not hold.
$Y \sim N(0, 1)$; $\quad X_1 = \begin{cases} 0, & Y < 0 \\ Y, & Y \ge 0 \end{cases}$; $\quad X_2 = \begin{cases} Y, & Y < 0 \\ 0, & Y \ge 0 \end{cases}$
$X_1 + X_2 = Y$, but neither $X_1$ nor $X_2$ is Gaussian.
[figure: sketches of $f_{X_1}(x)$ and $f_{X_2}(x)$]
If the components of a Gaussian random vector are uncorrelated, they are independent.
proof (sketch): uncorrelated $\Rightarrow$ $C_X$ is diagonal $\Rightarrow$ $C_X^{-1}$ is diagonal $\Rightarrow$ $f_X(x) = \prod_i f_{X_i}(x_i)$ [See 2-d case]


alternative proof:
$\Phi_X(u) = \exp\!\left(jm^tu - \frac{u^tC_Xu}{2}\right)$
$= \exp\!\left(j\sum_{i=1}^k m_iu_i - \frac{1}{2}\sum_{i=1}^k \sigma_i^2u_i^2\right)$
$= \prod_{i=1}^k \exp\!\left(jm_iu_i - \frac{1}{2}\sigma_i^2u_i^2\right) = \prod_{i=1}^k \Phi_{X_i}(u_i)$

example: This example shows that each random variable may be Gaussian while they are not jointly Gaussian.
$X \sim N(0, 1)$; $\quad W = \pm 1$, equiprobable, independent of $X$; $\quad Y = WX$
$F_Y(y) = \frac{1}{2}P(Y \le y \mid W = 1) + \frac{1}{2}P(Y \le y \mid W = -1)$
$= \frac{1}{2}P(X \le y \mid W = 1) + \frac{1}{2}P(-X \le y \mid W = -1)$
$= \frac{1}{2}P(X \le y) + \frac{1}{2}P(-X \le y) = P(X \le y) = F_X(y)$


X and Y are Gaussian but not jointly.


X and Y are uncorrelated but dependent.

[figure: the joint distribution of $(X, Y)$ is concentrated on the lines $y = x$ and $y = -x$]

Synthesis of a Gaussian random vector with mean m and covariance matrix C.


For any real symmetric non-negative definite matrix $C$,
$C = Q\Lambda Q^t = Q\Lambda^{1/2}Q^tQ\Lambda^{1/2}Q^t = C^{1/2}C^{1/2}$, where $C^{1/2} = Q\Lambda^{1/2}Q^t$.
If $C$ is invertible, so is $C^{1/2}$.
Let $X$ consist of iid random variables, each $N(0, 1)$, ie, $X \sim N(0, I)$, such that
$$f_X(x) = \prod_{i=1}^k \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{x_i^2}{2}\right) = \frac{1}{(2\pi)^{k/2}}\exp\!\left(-\frac{1}{2}x^tx\right)$$


Let $Y = C^{1/2}X + m$.
Then $Y \sim N(m, C^{1/2}I(C^{1/2})^t) = N(m, C)$ such that
$$f_Y(y) = \frac{f_X(C^{-1/2}(y - m))}{|\det C^{1/2}|} = \frac{1}{(2\pi)^{k/2}\sqrt{\det C}}\exp\!\left(-\frac{1}{2}(y - m)^tC^{-1}(y - m)\right)$$
We can also use $Q\Lambda^{1/2}$ in place of $C^{1/2} = Q\Lambda^{1/2}Q^t$, ie, $Y = Q\Lambda^{1/2}X + m$.
Therefore to generate a Gaussian random vector with $m$ and $C$, we proceed as follows.
$k$ iid unif(0, 1)
$\to$ $k$ iid $N(0, 1)$ by the transform (inverse of the cdf)
$\to$ $N(m, C)$ by the affine transform (above)
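A minimal sketch of this generation procedure (assuming NumPy, plus SciPy's ndtri for the inverse standard normal cdf; the particular $m$ and $C$ below are illustrative):

```python
# Minimal sketch: uniform variates -> iid N(0,1) via the inverse cdf
# -> N(m, C) via Y = C^{1/2} X + m with C^{1/2} = Q Λ^{1/2} Q^t.
import numpy as np
from scipy.special import ndtri            # inverse of the standard normal cdf

rng = np.random.default_rng(0)
m = np.array([1.0, -2.0])
C = np.array([[2.0, 0.8], [0.8, 1.0]])     # real symmetric non-negative definite

eigvals, Q = np.linalg.eigh(C)
C_half = Q @ np.diag(np.sqrt(eigvals)) @ Q.T   # C^{1/2}

n = 100_000
U = rng.uniform(size=(n, 2))               # k iid unif(0,1) per sample
X = ndtri(U)                               # k iid N(0,1) per sample
Y = X @ C_half.T + m                       # N(m, C)

print(np.round(Y.mean(axis=0), 2))              # ≈ m
print(np.round(np.cov(Y, rowvar=False), 2))     # ≈ C
```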


The conditional expectation is an affine function for jointly Gaussian random vectors. That is, if $X$ and $Y$ are jointly Gaussian, $E(X|Y) = A(Y - m_Y) + m_X$, where $AC_Y = C_{XY}$.
proof (for zero mean): Let $AC_Y = C_{XY}$.
$$\begin{pmatrix} X - AY \\ Y \end{pmatrix} = \begin{pmatrix} I & -A \\ 0 & I \end{pmatrix}\begin{pmatrix} X \\ Y \end{pmatrix}: \text{jointly Gaussian}$$
$E(X - AY)Y^t = C_{XY} - AC_Y = O$: uncorrelated $\Rightarrow$ independent
Set $g(Y) = AY$.
$Eh(Y)^t[X - g(Y)] = 0$ for any $h$ [indep; zero mean]
$\therefore$ orthogonality holds. [$\Rightarrow g(Y) = E(X|Y)$]
$E(X|Y) = g(Y) = AY$


The figure shows the line $E(X|Y = y) = a(y - m_Y) + m_X$, a 1-dim case.
[figure: the line $E(X|Y = y)$ plotted against $y$, with slope $a > 0$]
$$f_{X|Y}(x|y) = \frac{1}{(2\pi)^{k/2}\sqrt{\det C_{X|Y}}}\exp\!\left(-\frac{1}{2}(x - m_{X|y})^tC_{X|Y}^{-1}(x - m_{X|y})\right),$$
where $m_{X|y} = E(X|Y = y) = A(y - m_Y) + m_X$
and $C_{X|Y} = C_X - AC_{YX}$, in which $A$ satisfies $AC_Y = C_{XY}$.
The vector conditional pdf is in the Gaussian jpdf form.
Note that $C_{X|Y}$ does not depend on $y$.
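A minimal sketch (assuming NumPy; the 1-d covariances and observed value below are illustrative) of computing $m_{X|y}$ and $C_{X|Y}$ from these formulas:

```python
# Minimal sketch: conditional mean and covariance for jointly Gaussian X, Y,
#   m_{X|y} = A (y - m_Y) + m_X  and  C_{X|Y} = C_X - A C_YX,  with A C_Y = C_XY.
import numpy as np

m_X, m_Y = np.array([0.0]), np.array([1.0])
C_X = np.array([[2.0]])
C_Y = np.array([[1.0]])
C_XY = np.array([[0.8]])                   # E(X - m_X)(Y - m_Y)^t

A = C_XY @ np.linalg.inv(C_Y)              # A C_Y = C_XY
y = np.array([2.0])                        # observed value of Y

m_cond = A @ (y - m_Y) + m_X               # E(X | Y = y)
C_cond = C_X - A @ C_XY.T                  # C_X - A C_YX  (C_YX = C_XY^t)
print(m_cond, C_cond)
```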


Karhunen-Loeve transform: KLT


Given a random vector $X$, the KLT $A$ is the matrix whose rows are (Euclidean) orthonormal eigenvectors $q_i^t$ of $C_X$.
$$q_i = \begin{pmatrix} q_{i1} \\ q_{i2} \\ \vdots \\ q_{ik} \end{pmatrix}, \qquad A = \begin{pmatrix} q_1^t \\ q_2^t \\ \vdots \\ q_k^t \end{pmatrix} = \begin{pmatrix} q_{11} & q_{12} & \cdots & q_{1k} \\ q_{21} & q_{22} & \cdots & q_{2k} \\ \vdots & \vdots & & \vdots \\ q_{k1} & q_{k2} & \cdots & q_{kk} \end{pmatrix} \quad [A = Q^t]$$
$$q_i^tq_j = q_{i1}q_{j1} + q_{i2}q_{j2} + \cdots + q_{ik}q_{jk} = \delta_{ij} = \begin{cases} 1, & i = j \\ 0, & i \ne j \end{cases}$$
$AA^t = A^tA = I$: $A$ is orthogonal or unitary.
$Y = AX$, $\ Y_i = q_i^tX = \sum_j q_{ij}X_j$: transform
$X = A^tY = \sum_i Y_iq_i$: expansion


$C_XA^t = C_X(q_1, \ldots, q_k) = (C_Xq_1, \ldots, C_Xq_k) = (\lambda_1q_1, \ldots, \lambda_kq_k)$
$$AC_XA^t = \begin{pmatrix} q_1^t \\ q_2^t \\ \vdots \\ q_k^t \end{pmatrix}(\lambda_1q_1, \ldots, \lambda_kq_k) = \Lambda = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_k \end{pmatrix}$$
$Y$ has uncorrelated components. (Assume $EX = 0$.)
$$EY_iY_j^t = E(q_i^tX)(X^tq_j) = q_i^t(EXX^t)q_j = q_i^tC_Xq_j = q_i^t\lambda_jq_j = \begin{cases} \lambda_i, & i = j \\ 0, & i \ne j \end{cases}$$
If $EX = 0$, $Y$ has orthogonal components.
If $X$ is Gaussian, $Y$ is a Gaussian random vector with independent components.
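A minimal KLT sketch (assuming NumPy; the covariance below, with entries $\rho^{|i-j|}$, is an illustrative correlated source) checking that $Y = AX$ has uncorrelated components with variances $\lambda_i$:

```python
# Minimal sketch: rows of A are orthonormal eigenvectors of C_X; the sample
# covariance of Y = A X should be approximately diag(λ_1, ..., λ_k).
import numpy as np

rng = np.random.default_rng(0)
k, rho = 4, 0.9
C_X = rho ** np.abs(np.subtract.outer(np.arange(k), np.arange(k)))  # C_X[i,j] = ρ^{|i-j|}

eigvals, Q = np.linalg.eigh(C_X)
A = Q.T                                     # KLT matrix, rows q_i^t

X = rng.multivariate_normal(np.zeros(k), C_X, size=200_000)
Y = X @ A.T                                 # Y = A X, sample by sample

print(np.round(np.cov(Y, rowvar=False), 2)) # ≈ diag(λ_1, ..., λ_k)
print(np.round(eigvals, 2))
```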


application: transform coding:
A speech or image sample vector $X$ is highly correlated.
$Y = AX$ has uncorrelated components, many of which have very small variance.
Let $\hat{Y}$ be an approximation of $Y$ with small components replaced by zeros and the others quantized.
$\hat{X} = A^{-1}\hat{Y}$ is an approximation of $X$ requiring fewer bits to represent.
JPEG image coding, MPEG video coding
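A minimal transform-coding sketch (assuming NumPy; the source model and the number of kept components are illustrative, and quantization of the kept components is omitted):

```python
# Minimal sketch: KLT the source, keep only the highest-variance components
# (zeroing the rest), and reconstruct with the inverse transform.
import numpy as np

rng = np.random.default_rng(0)
k, keep = 8, 3
C_X = 0.95 ** np.abs(np.subtract.outer(np.arange(k), np.arange(k)))  # correlated source
X = rng.multivariate_normal(np.zeros(k), C_X, size=10_000)

eigvals, Q = np.linalg.eigh(C_X)
A = Q.T[::-1]                              # rows ordered by decreasing eigenvalue

Y = X @ A.T                                # transform
Y_hat = Y.copy()
Y_hat[:, keep:] = 0.0                      # discard low-variance components
X_hat = Y_hat @ A                          # inverse transform (A^t = A^{-1})

print(np.mean((X - X_hat) ** 2))           # reconstruction mse per component
```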


[figure: example codebooks in 2-d for scalar quantization, vector quantization, and a transform code; each dot is a codevector]

To encode 2 samples, 4 bits or 16 different vectors are used.


The compression rate or code rate is 2 bits per sample.
The distance between vectors corresponds to distortion.


[figure: block diagram of a transform coder — each component $Y_i$ is scalar-quantized by $Q_i$, then binary encoded and decoded — and a 2-d example transform $T = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$ rotating $(X_1, X_2)$ to $(Y_1, Y_2)$]

[figure: transform basis images — KLT (AR, ρ = 0.9), KLT (Lena256, 18), DCT]


application: principal component analysis (PCA):


The two components of Y with largest variance, ie, principal
components, are used to display a scatter plot of sample vectors
of X.
[figure: scatter plots of sample vectors in the $(Y_1, Y_2)$ plane of the two principal components; in the last panel, sample vectors with different characteristics appear as separated clusters]

One point represents one sample vector.


By choosing principal components, the sample vectors with
different characteristics tend to appear maximally apart in
the plot.
pattern recognition, signal classification, face recognition
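A minimal PCA sketch (assuming NumPy; the sample data are illustrative) that computes the 2-d coordinates used for such a scatter plot:

```python
# Minimal sketch: project centered sample vectors onto the two eigenvectors of
# the sample covariance with the largest eigenvalues (principal components).
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(size=(500, 10))        # 500 sample vectors of dimension 10

centered = samples - samples.mean(axis=0)
C_hat = centered.T @ centered / len(samples)

eigvals, Q = np.linalg.eigh(C_hat)          # eigenvalues in ascending order
principal = Q[:, -2:]                       # eigenvectors with the 2 largest eigenvalues

coords = centered @ principal               # (Y_1, Y_2) for each sample vector
print(coords.shape)                         # (500, 2) points for the scatter plot
```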
