

Random vector and random matrix


random vector:
$X = (X_1, \ldots, X_k)^t$, $\quad EX = (EX_1, \ldots, EX_k)^t$: mean vector
random matrix:
$$Z = \begin{pmatrix} Z_{11} & \cdots & Z_{1l} \\ \vdots & & \vdots \\ Z_{k1} & \cdots & Z_{kl} \end{pmatrix}, \qquad EZ = \begin{pmatrix} EZ_{11} & \cdots & EZ_{1l} \\ \vdots & & \vdots \\ EZ_{k1} & \cdots & EZ_{kl} \end{pmatrix}$$
moments
correlation matrix of X:
$$R_X = EXX^t = \begin{pmatrix} EX_1^2 & \cdots & EX_1X_k \\ \vdots & & \vdots \\ EX_kX_1 & \cdots & EX_k^2 \end{pmatrix}$$


covariance matrix of X:
$$C_X = E(X - EX)(X - EX)^t = \begin{pmatrix} \operatorname{var}(X_1) & \cdots & \operatorname{cov}(X_1, X_k) \\ \vdots & & \vdots \\ \operatorname{cov}(X_k, X_1) & \cdots & \operatorname{var}(X_k) \end{pmatrix}$$

uncorrelated X: CX is diagonal.
iid X: $C_X = \sigma^2 I$, $I$: identity matrix
Other possibilities: uncorrelated (independent) between subvectors that are each correlated (dependent) within.
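As a concrete illustration, here is a minimal sketch (assuming NumPy; the sample data, sizes, and seed are illustrative) of estimating the mean vector, $R_X$, and $C_X$ from sample vectors:

```python
# Minimal sketch: sample-based estimates of EX, R_X = E[X X^t], and C_X.
import numpy as np

rng = np.random.default_rng(0)
n, k = 10_000, 3
samples = rng.normal(size=(n, k))          # n draws of a k-dim random vector

mean_vec = samples.mean(axis=0)            # estimate of EX
R_X = samples.T @ samples / n              # estimate of E[X X^t]
C_X = R_X - np.outer(mean_vec, mean_vec)   # C_X = R_X - (EX)(EX)^t

# For iid N(0,1) components, C_X should be close to the identity matrix.
print(np.round(C_X, 2))
```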
definiteness of a matrix
non-negative definite (positive semidefinite) $k \times k$ matrix $A$: symmetric and $\forall$ vector $a$, $a^tAa \ge 0$.
positive definite $k \times k$ matrix $A$: symmetric and $\forall$ vector $a \ne 0$, $a^tAa > 0$.


RX and CX are real symmetric and non-negative definite.


proof:
$a^tC_Xa = a^tE(X - EX)(X - EX)^ta$
$= E[a^t(X - EX)][(X - EX)^ta]$ [associative]
$= E[a^t(X - EX)]^2 \ge 0$
eigenvalues and eigenvectors
For a square matrix $A$, if $Ax = \lambda x$ for some $x \ne 0$, then $\lambda$ is an eigenvalue of $A$, and $x$ is an eigenvector of $A$.
A real matrix may have complex eigenvalues and complex
eigenvectors, but a real symmetric matrix has only real eigenvalues and only real eigenvectors (as we can choose).
A non-negative (positive) definite matrix has only non-negative
(positive) eigenvalues.


proof (non-negative definite): For a square matrix $A$, its eigenvalue $\lambda$, and the corresponding eigenvector $x$ normalized such that $\|x\| = 1$,
$\lambda = \lambda\|x\|^2 = \lambda x^tx = x^t(\lambda x) = x^tAx \ge 0$
A $k \times k$ real symmetric matrix $A$ has only real eigenvalues $\lambda_i$ and an (Euclidean) orthonormal set of $k$ eigenvectors $q_i$.
$$q_i^tq_j = q_{i1}q_{j1} + q_{i2}q_{j2} + \cdots + q_{ik}q_{jk} = \delta_{ij} = \begin{cases} 1, & i = j \\ 0, & i \ne j \end{cases}$$
proof (real eigenvalues):
In this proof, to be more general, eigenvectors are considered complex, though they can be chosen to be all real.
$\bar{\lambda}_i\|q_i\|^2 = \bar{\lambda}_i\bar{q}_i^tq_i = \overline{(\lambda_iq_i)}^tq_i = \overline{(Aq_i)}^tq_i = \bar{q}_i^tA^tq_i$
$= \bar{q}_i^tAq_i = \bar{q}_i^t\lambda_iq_i = \lambda_i\bar{q}_i^tq_i = \lambda_i\|q_i\|^2$, so $\bar{\lambda}_i = \lambda_i$.


Let us form a matrix $Q$ by arranging the orthonormal eigenvectors as columns.
$$q_i = \begin{pmatrix} q_{i1} \\ q_{i2} \\ \vdots \\ q_{ik} \end{pmatrix}, \qquad Q = (q_1, \ldots, q_k) = \begin{pmatrix} q_{11} & q_{21} & \cdots & q_{k1} \\ q_{12} & q_{22} & \cdots & q_{k2} \\ \vdots & \vdots & & \vdots \\ q_{1k} & q_{2k} & \cdots & q_{kk} \end{pmatrix}$$
Then $Q$ is an orthogonal (unitary) matrix: $QQ^t = Q^tQ = I$, ie, $Q^t = Q^{-1}$.
proof:
$$Q^tQ = \begin{pmatrix} q_1^t \\ \vdots \\ q_k^t \end{pmatrix}(q_1, \ldots, q_k) = \begin{pmatrix} q_1^tq_1 & q_1^tq_2 & \cdots & q_1^tq_k \\ q_2^tq_1 & q_2^tq_2 & \cdots & q_2^tq_k \\ \vdots & \vdots & & \vdots \\ q_k^tq_1 & q_k^tq_2 & \cdots & q_k^tq_k \end{pmatrix} = I$$


A real symmetric matrix $A$ is diagonalizable. For the matrix $Q$ defined above and the diagonal matrix $\Lambda$ whose diagonal elements are the eigenvalues of $A$,
$$A = Q\Lambda Q^t \quad\text{and}\quad \Lambda = Q^tAQ$$
proof:
$AQ = A(q_1, \ldots, q_k) = (Aq_1, \ldots, Aq_k) = (\lambda_1q_1, \ldots, \lambda_kq_k)$
$$Q^tAQ = \begin{pmatrix} q_1^t \\ \vdots \\ q_k^t \end{pmatrix}(\lambda_1q_1, \ldots, \lambda_kq_k) = \begin{pmatrix} \lambda_1q_1^tq_1 & \lambda_2q_1^tq_2 & \cdots & \lambda_kq_1^tq_k \\ \lambda_1q_2^tq_1 & \lambda_2q_2^tq_2 & \cdots & \lambda_kq_2^tq_k \\ \vdots & \vdots & & \vdots \\ \lambda_1q_k^tq_1 & \lambda_2q_k^tq_2 & \cdots & \lambda_kq_k^tq_k \end{pmatrix} = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_k \end{pmatrix} = \Lambda$$
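A minimal numerical check of this diagonalization (assuming NumPy; the symmetric matrix below is illustrative): np.linalg.eigh returns real eigenvalues and an orthogonal $Q$ for a real symmetric input.

```python
# Minimal sketch: verify Q Q^t = I, A = Q Λ Q^t, and Λ = Q^t A Q numerically.
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])            # real symmetric

eigvals, Q = np.linalg.eigh(A)             # eigenvalues and eigenvectors (columns of Q)
Lam = np.diag(eigvals)

print(np.allclose(Q @ Q.T, np.eye(3)))     # Q Q^t = I
print(np.allclose(Q @ Lam @ Q.T, A))       # A = Q Λ Q^t
print(np.allclose(Q.T @ A @ Q, Lam))       # Λ = Q^t A Q
```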


notation:
$\int_x = \int_{x_1} \cdots \int_{x_k}$; $\quad dx = dx_1 \cdots dx_k$
$\|X\| = \sqrt{\sum_{i=1}^k X_i^2}$ [Euclidean norm]; $\quad E\|X\|^2 = \sum_{i=1}^k EX_i^2$
$p_X(x) = p_{X_1 \cdots X_k}(x_1, \ldots, x_k)$
$f_X(x) = f_{X_1 \cdots X_k}(x_1, \ldots, x_k)$
$F_X(x) = F_{X_1 \cdots X_k}(x_1, \ldots, x_k)$
$\Phi_X(u) = \Phi_{X_1 \cdots X_k}(u_1, \ldots, u_k) = Ee^{ju^tX} = Ee^{j(u_1X_1 + \cdots + u_kX_k)}$
$= \sum_x e^{ju^tx}p_X(x)$ or $= \int e^{ju^tx}f_X(x)\,dx$
$= \prod_{i=1}^k \Phi_{X_i}(u_i)$ if independent.


Transformations or functions of random vectors
$Y = G(X)$:
$$\begin{pmatrix} Y_1 \\ \vdots \\ Y_l \end{pmatrix} = \begin{pmatrix} g_1(X_1, \ldots, X_k) \\ \vdots \\ g_l(X_1, \ldots, X_k) \end{pmatrix}$$
If $G$ is continuously differentiable and invertible ($l = k$), $X = H(Y)$, $H = G^{-1}$:
$$\begin{pmatrix} X_1 \\ \vdots \\ X_k \end{pmatrix} = \begin{pmatrix} h_1(Y_1, \ldots, Y_k) \\ \vdots \\ h_k(Y_1, \ldots, Y_k) \end{pmatrix}$$
$\Delta_y = \{y : y_1 < Y_1 \le y_1 + \delta_1, \ldots, y_k < Y_k \le y_k + \delta_k\}$
volume of $\Delta_y$: $|\Delta_y| = \delta_1 \cdots \delta_k$
$\Delta_x = H(\Delta_y)$, $\Delta_y = G(\Delta_x)$, volume of $\Delta_x$: $|\Delta_x|$


$P(X \in \Delta_x) = P(Y \in \Delta_y)$
$f_X(x)|\Delta_x| \approx f_Y(y)|\Delta_y|$, where $y = G(x)$,
$\lim_{|\Delta_y| \to 0} \dfrac{|\Delta_x|}{|\Delta_y|} = |\det(dH(y))|$
Jacobian of $H$: $\quad dH(y) = \begin{pmatrix} \frac{\partial h_1}{\partial y_1} & \cdots & \frac{\partial h_1}{\partial y_k} \\ \vdots & \ddots & \vdots \\ \frac{\partial h_k}{\partial y_1} & \cdots & \frac{\partial h_k}{\partial y_k} \end{pmatrix}$
$$f_Y(y) = f_X(x)|\det(dH(y))| = \frac{f_X(x)}{|\det(dG(x))|}, \quad\text{where } x = H(y)$$


example: $X, Y \sim N(0, 1)$ iid; $\quad R = \sqrt{X^2 + Y^2}$, $\quad \Theta = \angle(X, Y)$
For $R \ge 0$ and $-\pi < \Theta \le \pi$, the transformation from $(X, Y)$ to $(R, \Theta)$ is continuously differentiable and invertible.
$X = R\cos\Theta$, $\quad Y = R\sin\Theta$
$$dH(r, \theta) = \begin{pmatrix} \frac{\partial x}{\partial r} & \frac{\partial x}{\partial \theta} \\ \frac{\partial y}{\partial r} & \frac{\partial y}{\partial \theta} \end{pmatrix} = \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}, \qquad \det(dH(r, \theta)) = r$$
$f_{R\Theta}(r, \theta) = rf_{XY}(r\cos\theta, r\sin\theta)$ (for $r \ge 0$ and $-\pi < \theta \le \pi$)
$= r \cdot \frac{1}{\sqrt{2\pi}}e^{-r^2\cos^2\theta/2} \cdot \frac{1}{\sqrt{2\pi}}e^{-r^2\sin^2\theta/2} = \frac{1}{2\pi}re^{-r^2/2}$
$$f_R(r) = \begin{cases} re^{-r^2/2}, & r \ge 0 \\ 0, & \text{else} \end{cases}, \qquad f_\Theta(\theta) = \begin{cases} \frac{1}{2\pi}, & -\pi < \theta \le \pi \\ 0, & \text{else} \end{cases}$$


$R \sim \text{Ray}(1)$ and $\Theta \sim \text{unif}(-\pi, \pi]$ are independent.
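A minimal Monte Carlo sketch of this example (assuming NumPy; sample size and seed are illustrative):

```python
# Minimal sketch: for X, Y iid N(0,1), R should be Rayleigh(1) (mean sqrt(pi/2))
# and Θ uniform on (-π, π], approximately uncorrelated with R.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = rng.normal(size=100_000)

r = np.hypot(x, y)                         # R = sqrt(X^2 + Y^2)
theta = np.arctan2(y, x)                   # Θ = angle of (X, Y), in (-π, π]

print(r.mean(), np.sqrt(np.pi / 2))        # sample mean vs. Rayleigh(1) mean ≈ 1.2533
print(np.corrcoef(r, theta)[0, 1])         # close to 0
```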


$Y = G(X) = AX + b$, where $A$ is square and invertible.
$Y_i = \sum_{j=1}^k A_{ij}X_j + b_i$, $\quad \dfrac{\partial g_i}{\partial x_j} = A_{ij}$, $\ i = 1, \ldots, k$
$dG(x) = A$; $\quad dH(y) = A^{-1}$
$f_Y(y) = f_X(A^{-1}(y - b))|\det A^{-1}| = \dfrac{f_X(A^{-1}(y - b))}{|\det A|}$
$EY = AEX + b$; $\quad Y - EY = A(X - EX)$
$C_Y = E(Y - EY)(Y - EY)^t = EA(X - EX)(X - EX)^tA^t = AC_XA^t$
$\Phi_Y(v) = Ee^{jv^tY} = Ee^{jv^t(AX + b)} = e^{jv^tb}Ee^{j(A^tv)^tX} = e^{jv^tb}\Phi_X(A^tv)$
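A minimal sketch (assuming NumPy; the particular $A$, $b$, and $C_X$ below are illustrative) checking $C_Y = AC_XA^t$ empirically:

```python
# Minimal sketch: compare the sample covariance of Y = A X + b with A C_X A^t.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.5], [0.0, 2.0]])
b = np.array([1.0, -1.0])
C_X = np.array([[2.0, 0.3], [0.3, 1.0]])

X = rng.multivariate_normal(mean=[0.0, 0.0], cov=C_X, size=200_000)
Y = X @ A.T + b                            # each row is y = A x + b

print(np.round(np.cov(Y, rowvar=False), 2))  # sample C_Y
print(np.round(A @ C_X @ A.T, 2))            # theoretical A C_X A^t
```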


Estimation
minimum mean-squared-error (mmse) estimation of X:
Given the observation Y and some information on the jpdf,
find $\hat{X} = g(Y) = \arg\min_{\hat{X}} E\|X - \hat{X}\|^2$, where $X$: k-d, $Y$: l-d.
$E\|X - \hat{X}\|^2 = \sum_{i=1}^k E(X_i - \hat{X}_i)^2$

linear mmse estimator, Wiener filter:
$\hat{X} = AY$, where $AR_Y = R_{XY}$ and $R_{XY} = EXY^t$.
proof: First for each $i$, minimize $E(X_i - \hat{X}_i)^2$ by finding $\hat{X}_i = a_iY = \sum_{j=1}^l a_{ij}Y_j$, where $a_i$ will form the i-th row of $A$.
$\partial E(X_i - \hat{X}_i)^2/\partial a_{ij} = 0 \;\Rightarrow\; E(X_i - \hat{X}_i)(-Y_j) = 0$, $\ j = 1, \ldots, l$
note: differentiation and expectation are usually interchangeable.


orthogonality principle:
$E(X_i - \hat{X}_i)Y_j = 0$, ie, $EX_iY_j = E\hat{X}_iY_j$
[figure: the error $X_i - \hat{X}_i$ is orthogonal to $Y_j$; $\hat{X}_i$ is the projection of $X_i$ onto the span of $Y_1, \ldots, Y_l$]
$\hat{X}_i = \sum_{j=1}^l a_{ij}Y_j$
For $j = 1, \ldots, l$,
$EX_iY_j = Ea_iYY_j = a_i(EY_1Y_j, \ldots, EY_lY_j)^t$ [scalar, 1-d]
$EX_iY^t = a_iR_Y$ [row vector, l-d]
Repeating for $i = 1, \ldots, k$, $AR_Y = R_{XY}$. [matrix, k$\times$l]
$A = R_{XY}R_Y^{-1}$ and $\hat{X} = R_{XY}R_Y^{-1}Y$ if $R_Y$ is invertible.
For 1-d, it becomes $\hat{X} = \dfrac{EXY}{EY^2}Y$, solving $\min_a E(X - aY)^2$.
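A minimal sketch of the linear mmse (Wiener) estimator (assuming NumPy; the observation model, matrix $H$, and noise level below are illustrative), with sample moments standing in for the true $R_Y$ and $R_{XY}$:

```python
# Minimal sketch: X_hat = A Y with A = R_XY R_Y^{-1}, estimated from samples.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
X = rng.normal(size=(n, 2))                           # k = 2 target vector (zero mean)
H = np.array([[1.0, 0.0], [0.5, 1.0], [0.2, 0.3]])    # illustrative observation matrix
Y = X @ H.T + 0.1 * rng.normal(size=(n, 3))           # l = 3 noisy observation

R_Y = Y.T @ Y / n                                     # estimate of E[Y Y^t]
R_XY = X.T @ Y / n                                    # estimate of E[X Y^t]
A = R_XY @ np.linalg.inv(R_Y)                         # A R_Y = R_XY

X_hat = Y @ A.T
print(np.mean(np.sum((X - X_hat) ** 2, axis=1)))      # mean-squared error
```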


affine mmse estimator, Wiener filter:
$\hat{X} = A(Y - m_Y) + m_X$, where $AC_Y = C_{XY}$ and $C_{XY} = E(X - m_X)(Y - m_Y)^t$.
proof: Minimize $E\|X - (AY + b)\|^2$
$= E\|[(X - m_X) - A(Y - m_Y)] + (m_X - Am_Y - b)\|^2$
$= E\|(X - m_X) - A(Y - m_Y)\|^2 + \|m_X - Am_Y - b\|^2$ [The cross term disappears.]
$\Rightarrow b = m_X - Am_Y$ and $AC_Y = C_{XY}$
$A = C_{XY}C_Y^{-1}$ and $\hat{X} = C_{XY}C_Y^{-1}(Y - m_Y) + m_X$ if $C_Y$ is invertible.
This is equivalent to the linear mmse estimator of $X - m_X$ based on $Y - m_Y$.
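The same idea with centering gives a sketch of the affine estimator (assuming NumPy; the nonzero-mean data model below is illustrative):

```python
# Minimal sketch: X_hat = A (Y - m_Y) + m_X with A C_Y = C_XY from sample moments.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
X = 2.0 + rng.normal(size=(n, 2))                    # nonzero-mean target
Y = X @ np.array([[1.0, 0.4], [0.0, 1.0]]).T + 0.3 * rng.normal(size=(n, 2))

m_X, m_Y = X.mean(axis=0), Y.mean(axis=0)
Xc, Yc = X - m_X, Y - m_Y
C_Y = Yc.T @ Yc / n
C_XY = Xc.T @ Yc / n

A = C_XY @ np.linalg.inv(C_Y)                        # A C_Y = C_XY
X_hat = Yc @ A.T + m_X                               # A (Y - m_Y) + m_X
print(np.mean(np.sum((X - X_hat) ** 2, axis=1)))     # mean-squared error
```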


(general) mmse estimator: $\hat{X} = g(Y) = E(X|Y)$
proof: Minimize $E\|X - g(Y)\|^2 = E\,E(\|X - g(Y)\|^2 \mid Y)$
$= \sum_y E(\|X - g(y)\|^2 \mid Y = y)\,p_Y(y)$ or $= \int E(\|X - g(y)\|^2 \mid Y = y)\,f_Y(y)\,dy$
Minimize $E(\|X - g(y)\|^2 \mid Y = y)$ for each $y$ to get $g(y)$.
Given $y$, $g(y)$ is a vector $g = (g_1, \ldots, g_k)^t$.
$\varphi(g) := E(\|X - g\|^2 \mid Y = y)$
$= E(\|X\|^2 \mid Y = y) + \|g\|^2 - 2g^tE(X \mid Y = y)$
$= \sum_{i=1}^k E(X_i^2 \mid Y = y) + \sum_{i=1}^k g_i^2 - 2\sum_{i=1}^k g_iE(X_i \mid Y = y)$
$\partial\varphi(g)/\partial g_j = 0 \;\Rightarrow\; g_j = E(X_j \mid Y = y)$
$g(y) = E(X \mid Y = y) \;\Rightarrow\; g(Y) = E(X \mid Y)$
For 1-d, $\hat{X} = E(X|Y)$, solving $\min_g E(X - g(Y))^2$.


alternative proof:
orthogonality principle for functions of Y:
$Eh(Y)^t(X - g(Y)) = 0$ for any $h$ $\;\Rightarrow\;$ $g(Y) = \arg\min_g E\|X - g(Y)\|^2$
$h_1(Y) + h_2(Y) = (h_1 + h_2)(Y)$; $\quad ah(Y) = (ah)(Y)$ [functions of $Y$ form a linear space]
[figure: $X - g(Y)$ is orthogonal to every $h(Y)$; $g(Y)$ is the projection of $X$ onto the space of functions of $Y$]
proof of orthogonality principle for functions:
$E\|X - f(Y)\|^2 = E\|X - g(Y) + g(Y) - f(Y)\|^2$
$= E\|X - g(Y)\|^2 + E\|(g - f)(Y)\|^2 + 2E(g - f)(Y)^t(X - g(Y))$
$\ge E\|X - g(Y)\|^2$ if orthogonality holds.


$Eh(Y)^t[X - g(Y)]$
$= E\,E(h(Y)^t[X - g(Y)] \mid Y)$
$= Eh(Y)^tE([X - g(Y)] \mid Y)$
$= Eh(Y)^t[E(X|Y) - g(Y)]$
$= 0$ for any $h$, if and only if $g(Y) = E(X|Y)$.
Why only if?
Therefore, orthogonality holds if and only if $g(Y) = E(X|Y)$, and hence $E(X|Y)$ is the mmse estimator.


Gaussian random vector


$X = (X_1, \ldots, X_k)^t$ is a Gaussian random vector if any linear combination $a^tX = \sum_{i=1}^k a_iX_i$ is a Gaussian random variable.
$X \sim N(m, C)$, $\quad m$: mean vector, $C$: covariance matrix
jpdf [def]
$$f_X(x) = \frac{1}{(2\pi)^{k/2}\sqrt{\det C}}\exp\!\left(-\frac{1}{2}(x - m)^tC^{-1}(x - m)\right)$$
jchf [def]
$$\Phi_X(u) = \exp\!\left(jm^tu - \frac{u^tCu}{2}\right)$$
A Gaussian random vector is fully characterized by its 1-st
and 2-nd moments, ie, by m and C.
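A minimal sketch (assuming NumPy) of evaluating this jpdf; the helper name gaussian_pdf and the test point are illustrative:

```python
# Minimal sketch: evaluate f_X(x) = (2π)^{-k/2} (det C)^{-1/2} exp(-½ (x-m)^t C^{-1} (x-m)).
import numpy as np

def gaussian_pdf(x, m, C):
    k = len(m)
    d = x - m
    quad = d @ np.linalg.solve(C, d)       # (x-m)^t C^{-1} (x-m)
    norm = (2 * np.pi) ** (k / 2) * np.sqrt(np.linalg.det(C))
    return np.exp(-0.5 * quad) / norm

m = np.array([0.0, 1.0])
C = np.array([[1.0, 0.5], [0.5, 2.0]])
print(gaussian_pdf(np.array([0.5, 0.5]), m, C))
```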


$X \sim N(m, C) \;\Rightarrow\; Y = AX + b \sim N(Am + b, ACA^t)$
proof: $a^t(AX + b) = (A^ta)^tX + a^tb$ is Gaussian.
alternative proof:
$\Phi_Y(v) = Ee^{jv^t(AX+b)} = e^{jv^tb}Ee^{j(A^tv)^tX} = e^{jv^tb}\Phi_X(A^tv)$
$= e^{jv^tb}\exp\!\left(j(A^tv)^tm_X - \frac{(A^tv)^tC_X(A^tv)}{2}\right)$
$= \exp\!\left(jv^t(Am_X + b) - \frac{v^t(AC_XA^t)v}{2}\right)$
Any linear or affine transformation of a Gaussian random
vector is Gaussian.


example: This example shows that the converse of the above theorem does not hold.
$Y \sim N(0, 1)$; $\quad X_1 = \begin{cases} 0, & Y < 0 \\ Y, & Y \ge 0 \end{cases}$; $\quad X_2 = \begin{cases} Y, & Y < 0 \\ 0, & Y \ge 0 \end{cases}$
$X_1 + X_2 = Y$, but neither $X_1$ nor $X_2$ is Gaussian.
[figure: sketches of $f_{X_1}(x)$ and $f_{X_2}(x)$]
If the components of a Gaussian random vector are uncorrelated, they are independent.
proof (sketch): uncorrelated $\Rightarrow$ $C_X$ is diagonal $\Rightarrow$ $C_X^{-1}$ is diagonal $\Rightarrow$ $f_X(x) = \prod_i f_{X_i}(x_i)$ [See 2-d case]


alternative proof:
$\Phi_X(u) = \exp\!\left(jm^tu - \frac{u^tC_Xu}{2}\right)$
$= \exp\!\left(j\sum_{i=1}^k m_iu_i - \frac{1}{2}\sum_{i=1}^k \sigma_i^2u_i^2\right)$
$= \prod_{i=1}^k \exp\!\left(jm_iu_i - \frac{1}{2}\sigma_i^2u_i^2\right) = \prod_{i=1}^k \Phi_{X_i}(u_i)$

example: This example shows that each random variable may be Gaussian while they are not jointly Gaussian.
$X \sim N(0, 1)$; $\quad W = \pm 1$, equiprobable, independent of $X$; $\quad Y = WX$
$F_Y(y) = \frac{1}{2}P(Y \le y \mid W = 1) + \frac{1}{2}P(Y \le y \mid W = -1)$
$= \frac{1}{2}P(X \le y \mid W = 1) + \frac{1}{2}P(-X \le y \mid W = -1)$
$= \frac{1}{2}P(X \le y) + \frac{1}{2}P(-X \le y) = P(X \le y) = F_X(y)$


X and Y are Gaussian but not jointly.


X and Y are uncorrelated but dependent.

[figure: the joint distribution of $(X, Y)$ is concentrated on the lines $y = x$ and $y = -x$]

Synthesis of a Gaussian random vector with mean m and covariance matrix C.


For any real symmetric non-negative definite matrix $C$,
$C = Q\Lambda Q^t = Q\Lambda^{1/2}Q^tQ\Lambda^{1/2}Q^t = C^{1/2}C^{1/2}$, where $C^{1/2} = Q\Lambda^{1/2}Q^t$.
If $C$ is invertible, so is $C^{1/2}$.
Let $X$ consist of iid random variables, each $N(0, 1)$, ie, $X \sim N(0, I)$, such that
$$f_X(x) = \prod_{i=1}^k \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{x_i^2}{2}\right) = \frac{1}{(2\pi)^{k/2}}\exp\!\left(-\frac{1}{2}x^tx\right)$$


Let $Y = C^{1/2}X + m$.
Then $Y \sim N(m, C^{1/2}I(C^{1/2})^t) = N(m, C)$ such that
$$f_Y(y) = \frac{f_X(C^{-1/2}(y - m))}{|\det C^{1/2}|} = \frac{1}{(2\pi)^{k/2}\sqrt{\det C}}\exp\!\left(-\frac{1}{2}(y - m)^tC^{-1}(y - m)\right)$$
We can also use $Q\Lambda^{1/2}$ in place of $C^{1/2} = Q\Lambda^{1/2}Q^t$, ie, $Y = Q\Lambda^{1/2}X + m$.
Therefore to generate a Gaussian random vector with $m$ and $C$, we proceed as follows.
$k$ iid unif(0, 1)
$\to$ $k$ iid $N(0, 1)$ by the transform (inverse of the cdf)
$\to$ $N(m, C)$ by the affine transform (above)
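A minimal sketch of this generation procedure (assuming NumPy, plus SciPy's ndtri for the inverse standard normal cdf; the particular $m$ and $C$ below are illustrative):

```python
# Minimal sketch: uniform variates -> iid N(0,1) via the inverse cdf
# -> N(m, C) via Y = C^{1/2} X + m with C^{1/2} = Q Λ^{1/2} Q^t.
import numpy as np
from scipy.special import ndtri            # inverse of the standard normal cdf

rng = np.random.default_rng(0)
m = np.array([1.0, -2.0])
C = np.array([[2.0, 0.8], [0.8, 1.0]])     # real symmetric non-negative definite

eigvals, Q = np.linalg.eigh(C)
C_half = Q @ np.diag(np.sqrt(eigvals)) @ Q.T   # C^{1/2}

n = 100_000
U = rng.uniform(size=(n, 2))               # k iid unif(0,1) per sample
X = ndtri(U)                               # k iid N(0,1) per sample
Y = X @ C_half.T + m                       # N(m, C)

print(np.round(Y.mean(axis=0), 2))              # ≈ m
print(np.round(np.cov(Y, rowvar=False), 2))     # ≈ C
```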


The conditional expectation is an affine function for jointly Gaussian random vectors. That is, if $X$ and $Y$ are jointly Gaussian, $E(X|Y) = A(Y - m_Y) + m_X$, where $AC_Y = C_{XY}$.
proof (for zero mean): Let $AC_Y = C_{XY}$.
$$\begin{pmatrix} X - AY \\ Y \end{pmatrix} = \begin{pmatrix} I & -A \\ 0 & I \end{pmatrix}\begin{pmatrix} X \\ Y \end{pmatrix}: \text{jointly Gaussian}$$
$E(X - AY)Y^t = C_{XY} - AC_Y = O$: uncorrelated $\Rightarrow$ independent
Set $g(Y) = AY$.
$Eh(Y)^t[X - g(Y)] = 0$ for any $h$ [indep; zero mean]
$\therefore$ orthogonality holds. [$\Rightarrow g(Y) = E(X|Y)$]
$E(X|Y) = g(Y) = AY$


The figure shows the line $E(X|Y = y) = a(y - m_Y) + m_X$, a 1-dim case.
[figure: the line $E(X|Y = y)$ plotted against $y$, with slope $a > 0$]
$$f_{X|Y}(x|y) = \frac{1}{(2\pi)^{k/2}\sqrt{\det C_{X|Y}}}\exp\!\left(-\frac{1}{2}(x - m_{X|y})^tC_{X|Y}^{-1}(x - m_{X|y})\right),$$
where $m_{X|y} = E(X|Y = y) = A(y - m_Y) + m_X$
and $C_{X|Y} = C_X - AC_{YX}$, in which $A$ satisfies $AC_Y = C_{XY}$.
The vector conditional pdf is in the Gaussian jpdf form.
Note that $C_{X|Y}$ does not depend on $y$.
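A minimal sketch (assuming NumPy; the 1-d covariances and observed value below are illustrative) of computing $m_{X|y}$ and $C_{X|Y}$ from these formulas:

```python
# Minimal sketch: conditional mean and covariance for jointly Gaussian X, Y,
#   m_{X|y} = A (y - m_Y) + m_X  and  C_{X|Y} = C_X - A C_YX,  with A C_Y = C_XY.
import numpy as np

m_X, m_Y = np.array([0.0]), np.array([1.0])
C_X = np.array([[2.0]])
C_Y = np.array([[1.0]])
C_XY = np.array([[0.8]])                   # E(X - m_X)(Y - m_Y)^t

A = C_XY @ np.linalg.inv(C_Y)              # A C_Y = C_XY
y = np.array([2.0])                        # observed value of Y

m_cond = A @ (y - m_Y) + m_X               # E(X | Y = y)
C_cond = C_X - A @ C_XY.T                  # C_X - A C_YX  (C_YX = C_XY^t)
print(m_cond, C_cond)
```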


Karhunen-Loeve transform: KLT


Given a random vector $X$, the KLT $A$ is the matrix whose rows are (Euclidean) orthonormal eigenvectors $q_i^t$ of $C_X$.
$$q_i = \begin{pmatrix} q_{i1} \\ q_{i2} \\ \vdots \\ q_{ik} \end{pmatrix}, \qquad A = \begin{pmatrix} q_1^t \\ q_2^t \\ \vdots \\ q_k^t \end{pmatrix} = \begin{pmatrix} q_{11} & q_{12} & \cdots & q_{1k} \\ q_{21} & q_{22} & \cdots & q_{2k} \\ \vdots & \vdots & & \vdots \\ q_{k1} & q_{k2} & \cdots & q_{kk} \end{pmatrix} \quad [A = Q^t]$$
$$q_i^tq_j = q_{i1}q_{j1} + q_{i2}q_{j2} + \cdots + q_{ik}q_{jk} = \delta_{ij} = \begin{cases} 1, & i = j \\ 0, & i \ne j \end{cases}$$
$AA^t = A^tA = I$: $A$ is orthogonal or unitary.
$Y = AX$, $\ Y_i = q_i^tX = \sum_j q_{ij}X_j$: transform
$X = A^tY = \sum_i Y_iq_i$: expansion


$C_XA^t = C_X(q_1, \ldots, q_k) = (C_Xq_1, \ldots, C_Xq_k) = (\lambda_1q_1, \ldots, \lambda_kq_k)$
$$AC_XA^t = \begin{pmatrix} q_1^t \\ q_2^t \\ \vdots \\ q_k^t \end{pmatrix}(\lambda_1q_1, \ldots, \lambda_kq_k) = \Lambda = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_k \end{pmatrix}$$
$Y$ has uncorrelated components. (Assume $EX = 0$.)
$$EY_iY_j^t = E(q_i^tX)(X^tq_j) = q_i^t(EXX^t)q_j = q_i^tC_Xq_j = q_i^t\lambda_jq_j = \begin{cases} \lambda_i, & i = j \\ 0, & i \ne j \end{cases}$$
If $EX = 0$, $Y$ has orthogonal components.
If $X$ is Gaussian, $Y$ is a Gaussian random vector with independent components.
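A minimal KLT sketch (assuming NumPy; the covariance below, with entries $\rho^{|i-j|}$, is an illustrative correlated source) checking that $Y = AX$ has uncorrelated components with variances $\lambda_i$:

```python
# Minimal sketch: rows of A are orthonormal eigenvectors of C_X; the sample
# covariance of Y = A X should be approximately diag(λ_1, ..., λ_k).
import numpy as np

rng = np.random.default_rng(0)
k, rho = 4, 0.9
C_X = rho ** np.abs(np.subtract.outer(np.arange(k), np.arange(k)))  # C_X[i,j] = ρ^{|i-j|}

eigvals, Q = np.linalg.eigh(C_X)
A = Q.T                                     # KLT matrix, rows q_i^t

X = rng.multivariate_normal(np.zeros(k), C_X, size=200_000)
Y = X @ A.T                                 # Y = A X, sample by sample

print(np.round(np.cov(Y, rowvar=False), 2)) # ≈ diag(λ_1, ..., λ_k)
print(np.round(eigvals, 2))
```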


application: transform coding:
A speech or image sample vector $X$ is highly correlated.
$Y = AX$ has uncorrelated components, many of which have very small variance.
Let $\hat{Y}$ be an approximation of $Y$ with small components replaced by zeros and the others quantized.
$\hat{X} = A^{-1}\hat{Y}$ is an approximation of $X$ requiring fewer bits to represent.
JPEG image coding, MPEG video coding
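A minimal transform-coding sketch (assuming NumPy; the source model and the number of kept components are illustrative, and quantization of the kept components is omitted):

```python
# Minimal sketch: KLT the source, keep only the highest-variance components
# (zeroing the rest), and reconstruct with the inverse transform.
import numpy as np

rng = np.random.default_rng(0)
k, keep = 8, 3
C_X = 0.95 ** np.abs(np.subtract.outer(np.arange(k), np.arange(k)))  # correlated source
X = rng.multivariate_normal(np.zeros(k), C_X, size=10_000)

eigvals, Q = np.linalg.eigh(C_X)
A = Q.T[::-1]                              # rows ordered by decreasing eigenvalue

Y = X @ A.T                                # transform
Y_hat = Y.copy()
Y_hat[:, keep:] = 0.0                      # discard low-variance components
X_hat = Y_hat @ A                          # inverse transform (A^t = A^{-1})

print(np.mean((X - X_hat) ** 2))           # reconstruction mse per component
```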


[figure: example codebooks in 2-d for scalar quantization, vector quantization, and a transform code; each dot is a codevector]

To encode 2 samples, 4 bits or 16 different vectors are used.


The compression rate or code rate is 2 bits per sample.
The distance between vectors corresponds to distortion.


[figure: block diagram of a transform coder — each component $Y_i$ is scalar-quantized by $Q_i$, then binary encoded and decoded — and a 2-d example transform $T = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$ rotating $(X_1, X_2)$ to $(Y_1, Y_2)$]

[figure: transform basis images — KLT (AR, ρ = 0.9), KLT (Lena256, 18), DCT]


application: principal component analysis (PCA):


The two components of Y with largest variance, ie, principal
components, are used to display a scatter plot of sample vectors
of X.
[figure: scatter plots of sample vectors in the $(Y_1, Y_2)$ plane of the two principal components; in the last panel, sample vectors with different characteristics appear as separated clusters]

One point represents one sample vector.


By choosing principal components, the sample vectors with
different characteristics tend to appear maximally apart in
the plot.
pattern recognition, signal classification, face recognition
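A minimal PCA sketch (assuming NumPy; the sample data are illustrative) that computes the 2-d coordinates used for such a scatter plot:

```python
# Minimal sketch: project centered sample vectors onto the two eigenvectors of
# the sample covariance with the largest eigenvalues (principal components).
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(size=(500, 10))        # 500 sample vectors of dimension 10

centered = samples - samples.mean(axis=0)
C_hat = centered.T @ centered / len(samples)

eigvals, Q = np.linalg.eigh(C_hat)          # eigenvalues in ascending order
principal = Q[:, -2:]                       # eigenvectors with the 2 largest eigenvalues

coords = centered @ principal               # (Y_1, Y_2) for each sample vector
print(coords.shape)                         # (500, 2) points for the scatter plot
```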
