
Prepared by: Çağatay Candan, METU, Electrical Engineering Dept., Dec. 2011

document version: 1.2

On Gaussian Distribution
The Gaussian distribution is defined as follows:

$$ f_x(x) = \frac{1}{\sqrt{2\pi\sigma_x^2}}\, e^{-\frac{(x-\mu_x)^2}{2\sigma_x^2}} $$

The function $f_x(x)$ is clearly positive valued. Before calling this function a probability density function, we should check whether the area under the curve is equal to 1 or not.

R1.1: Area under the Gaussian Distribution [Signal Analysis, Papoulis]

Let

$$ I = \int_{-\infty}^{\infty} e^{-\frac{(x-\mu_x)^2}{2\sigma_x^2}}\, dx ; $$

then, by a simple change of variables, $I = \int_{-\infty}^{\infty} e^{-\frac{x^2}{2\sigma_x^2}}\, dx$, and

$$ I^2 = \left( \int_{-\infty}^{\infty} e^{-\frac{x_1^2}{2\sigma_x^2}}\, dx_1 \right)\left( \int_{-\infty}^{\infty} e^{-\frac{x_2^2}{2\sigma_x^2}}\, dx_2 \right) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\frac{x_1^2}{2\sigma_x^2}}\, e^{-\frac{x_2^2}{2\sigma_x^2}}\, dx_1\, dx_2 = \int_{-\pi}^{\pi}\int_{0}^{\infty} e^{-\frac{r^2}{2\sigma_x^2}}\, r\, dr\, d\phi = 2\pi\sigma_x^2 , $$

where the last step switches to polar coordinates. Hence $I = \sqrt{2\pi\sigma_x^2}$ and the area under $f_x(x)$ is $I/\sqrt{2\pi\sigma_x^2} = 1$. This shows that $f_x(x)$ is a valid probability density function.
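As a quick numerical sanity check, the normalization can be confirmed by numerical integration; a minimal Python sketch, where the values mu = 1.5 and sigma = 2.0 are arbitrary choices:

```python
import numpy as np
from scipy.integrate import quad

mu, sigma = 1.5, 2.0  # arbitrary example values

# Gaussian density with mean mu and standard deviation sigma
f = lambda x: np.exp(-(x - mu)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

area, _ = quad(f, -np.inf, np.inf)
print(area)  # approximately 1.0
```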

R1.2: Mean and Variance Calculation for the Gaussian Distribution

Mean: It can be noted that the distribution is symmetric around $x = \mu_x$, that is, $f_x(\mu_x + x) = f_x(\mu_x - x)$. Since the mean value is located at the center of symmetry, $\bar{x} = \mu_x$.

We can show this result as follows. By a simple change of variables, we get

$$ \bar{x} = \int x f_x(x)\, dx = \int (\mu_x + x) f_x(\mu_x + x)\, dx = \int (\mu_x - x) f_x(\mu_x - x)\, dx . $$

But since $f_x(\mu_x + x) = f_x(\mu_x - x)$, we have $\int (\mu_x - x) f_x(\mu_x - x)\, dx = \int (\mu_x - x) f_x(\mu_x + x)\, dx$, and therefore

$$ 2\bar{x} = \int (\mu_x + x) f_x(\mu_x + x)\, dx + \int (\mu_x - x) f_x(\mu_x + x)\, dx = 2\mu_x \int f_x(\mu_x + x)\, dx = 2\mu_x . $$

Variance: This one is trickier [Probability, Papoulis]. The area under the Gaussian distribution is equal to 1, that is,

$$ \int \frac{1}{\sqrt{2\pi\sigma_x^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma_x^2}}\, dx = 1 . $$

Now, taking the derivative of this relation with respect to $\mu$, we get

$$ \frac{d}{d\mu} \int \frac{1}{\sqrt{2\pi\sigma_x^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma_x^2}}\, dx = 0 $$

$$ \frac{1}{\sqrt{2\pi\sigma_x^2}} \int \frac{(x-\mu)}{\sigma_x^2}\, e^{-\frac{(x-\mu)^2}{2\sigma_x^2}}\, dx = 0 $$

The last equation can be written as $\int (x-\mu) f_x(x)\, dx = 0$, showing one more time that the mean of the Gaussian distribution is $\mu$. Taking the derivative of the relation $\int (x-\mu) f_x(x)\, dx = 0$ with respect to $\mu$ a second time, we get

$$ \int \frac{(x-\mu)^2}{\sigma_x^2}\, f_x(x)\, dx + \int (-1)\, f_x(x)\, dx = 0 , $$

showing that the variance of the Gaussian distribution is $\sigma_x^2$.
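The first two moments can also be cross-checked by direct numerical integration; a minimal Python sketch, with arbitrarily chosen mu and sigma:

```python
import numpy as np
from scipy.integrate import quad

mu, sigma = -0.7, 1.3  # arbitrary example values
f = lambda x: np.exp(-(x - mu)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

mean, _ = quad(lambda x: x * f(x), -np.inf, np.inf)
var, _ = quad(lambda x: (x - mu)**2 * f(x), -np.inf, np.inf)
print(mean, var)  # approximately mu and sigma**2
```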

Taking a derivative with respect to a "constant" (such as $\mu$ here) is a powerful calculation/proof tool. This can be done whenever the relation whose derivative is taken is valid for a continuum of $\mu$ values. In our case, the area under the Gaussian distribution is 1 for any value of $\mu$; hence

$$ I(\mu) = \int \frac{1}{\sqrt{2\pi\sigma_x^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma_x^2}}\, dx = 1 $$

is valid for all $\mu$.

A second note is on the interchange of the derivative and integral operators. This step looks innocent, but it is indeed treacherous, and the interchange has to be verified in general. For one-sided or double-sided integrals (a one-sided integral has its upper or lower limit at $\pm\infty$; a double-sided integral has limits $-\infty$ and $\infty$), the integrals may or may not converge. The interchange of the integral and derivative operators is the interchange of two limiting operations. (Remember that the derivative is also defined as a limit, $h \to 0$ or $1/h \to \infty$.) One has to be careful about this interchange. The integrals studied in a probability course generally behave nicely (positive functions of unit area), but we should always be careful when exchanging operators involving infinite or infinitesimal quantities.

R1.3: The distribution of y = ax + b where x is normally distributed

The distribution of $\tilde{y} = a\tilde{x} + b$ can be expressed using the fundamental theorem for functions of random variables:

$$ f_y(y) = \frac{1}{|a|}\, f_x\!\left(\frac{y-b}{a}\right) . $$

When the definition of the Gaussian density is substituted for $f_x(x)$, we get

$$ f_y(y) = \frac{1}{|a|}\, \frac{1}{\sqrt{2\pi\sigma_x^2}}\, e^{-\frac{\left(\frac{y-b}{a}-\mu\right)^2}{2\sigma_x^2}} = \frac{1}{\sqrt{2\pi a^2\sigma_x^2}}\, e^{-\frac{(y-b-a\mu)^2}{2a^2\sigma_x^2}} . $$

From the last equation, we note that $\tilde{y}$ is also Gaussian distributed, with mean $(a\mu_x + b)$ and variance $a^2\sigma_x^2$.

We reach the important conclusion that by scaling and biasing a Gaussian r.v., we get another Gaussian r.v. with a different mean and variance. The mean and variance of the new Gaussian random variable can be easily calculated using the expectation operator, $\bar{y} = E\{y\} = E\{ax+b\} = a\mu_x + b$ (and a similar relation for the variance), and the distribution of $\tilde{y}$ can then be written using the calculated mean and variance. We know that if a random variable is Gaussian distributed, we only need to calculate its mean and variance to write the distribution. As an example, let $\tilde{z} : N(0,1)$ (Gaussian distributed, or normally distributed, with zero mean and unit variance); then $\tilde{w} = 10\tilde{z} - 10$ is $N(-10, 100)$, since the mean of $10\tilde{z} - 10$ is $-10$ and its variance is

$$ E\{(w - \bar{w})^2\} = E\{\left((10z - 10) - (-10)\right)^2\} = E\{(10z)^2\} = 100\, E\{z^2\} = 100 . $$
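The scaling and biasing example above is easy to reproduce by simulation; a minimal Python sketch (the sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(1_000_000)  # z ~ N(0, 1)
w = 10 * z - 10                     # scaled and biased Gaussian

print(w.mean(), w.var())  # approximately -10 and 100
```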

R1.4: Moment Generating Function

Let $\tilde{z}$ be the zero mean, unit variance Gaussian random variable, $\tilde{z} : N(0,1)$. The moment generating function of $\tilde{z}$ is

$$ \Phi(s) = E\{e^{sz}\} = \int \frac{1}{\sqrt{2\pi}}\, e^{sz}\, e^{-\frac{z^2}{2}}\, dz . $$

We can write $sz - z^2/2$ as a perfect square plus a correction term as follows:

$$ -\left(\frac{z^2}{2} - sz\right) = -\left(\frac{z^2 - 2sz}{2}\right) = -\frac{(z-s)^2}{2} + \frac{s^2}{2} . $$

This process is known as completing the square and is frequently used. When this relation is substituted into $\Phi(s)$, we get

$$ \Phi(s) = \int \frac{1}{\sqrt{2\pi}}\, e^{sz}\, e^{-\frac{z^2}{2}}\, dz = e^{\frac{s^2}{2}}\, \frac{1}{\sqrt{2\pi}} \int e^{-\frac{(z-s)^2}{2}}\, dz = e^{\frac{s^2}{2}} . $$

(The moment generating function is valid for all $s$ values.)

When $\tilde{x}$ is a Gaussian random variable with mean $\mu$ and variance $\sigma^2$, $\tilde{x} : N(\mu, \sigma^2)$, then we know that $\tilde{x}$ can be written as $\tilde{x} = \sigma\tilde{z} + \mu$ where $\tilde{z} : N(0,1)$. The moment generating function of $\tilde{x}$ is therefore equal to

$$ \Phi_x(s) = E\{e^{sx}\} = E\{e^{s(\sigma z + \mu)}\} = e^{s\mu + \frac{s^2\sigma^2}{2}} . $$
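The closed form $\Phi(s) = e^{s^2/2}$ can be cross-checked against a sample average; a minimal Python sketch, with arbitrary $s$ values:

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.standard_normal(2_000_000)  # z ~ N(0, 1)

for s in (0.5, 1.0, 1.5):
    print(s, np.mean(np.exp(s * z)), np.exp(s**2 / 2))  # the two values should be close
```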

R1.5: Higher Order Moments of the Gaussian Distribution

Let $\tilde{x} : N(\mu, \sigma^2)$; we know that $\Phi_x(s) = e^{s\mu + \frac{s^2\sigma^2}{2}}$. Using the power series expansion of $e^z$, which is $e^z = \sum_{k=0}^{\infty} \frac{z^k}{k!}$, we can write $\Phi_x(s)$ as follows:

$$ \Phi_x(s) = 1 + \left(s\mu + \frac{s^2\sigma^2}{2}\right) + \frac{\left(s\mu + \frac{s^2\sigma^2}{2}\right)^2}{2} + \frac{\left(s\mu + \frac{s^2\sigma^2}{2}\right)^3}{6} + \ldots $$

The coefficient of $s^k$ in $\Phi_x(s)$ is equal to $E\{x^k\}/k!$. When $\tilde{x} : N(0, \sigma^2)$, we have

$$ E\{x^k\} = m_k = \begin{cases} \dfrac{(2n)!\,\sigma^{2n}}{2^n\, n!} = 1 \cdot 3 \cdot 5 \cdots (2n-1)\, \sigma^{2n}, & k = 2n \\ 0, & k = 2n+1 \end{cases} $$
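The even-moment formula can be spot-checked by Monte Carlo; a minimal Python sketch, with an arbitrary $\sigma$:

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(2)
sigma = 1.5
x = sigma * rng.standard_normal(5_000_000)  # x ~ N(0, sigma^2)

for n in (1, 2, 3):                      # k = 2, 4, 6
    k = 2 * n
    theory = factorial(k) * sigma**k / (2**n * factorial(n))
    print(k, np.mean(x**k), theory)      # empirical vs. (2n)! sigma^(2n) / (2^n n!)
```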

R1.6: Addition of two independent Gaussian r.v.'s, z = x + y, is a Gaussian r.v.

When two independent Gaussian distributed random variables $\tilde{x} : N(\mu_x, \sigma_x^2)$ and $\tilde{y} : N(\mu_y, \sigma_y^2)$ are added, the resulting distribution is also Gaussian, $\tilde{z} : N(\mu_x + \mu_y, \sigma_x^2 + \sigma_y^2)$, where $\tilde{z} = \tilde{x} + \tilde{y}$. This is a property of great importance in practice.

This result can be easily shown using moment generating functions:

$$ \Phi_z(s) = E\{e^{sz}\} = E\{e^{s(x+y)}\} = E\{e^{sx}\}\, E\{e^{sy}\} , $$

where we have used the independence of $\tilde{x}$ and $\tilde{y}$ in the last equality. From R1.4, we can substitute the moment generating function of the Gaussian distribution and get

$$ \Phi_z(s) = e^{s\mu_x + \frac{s^2\sigma_x^2}{2}}\, e^{s\mu_y + \frac{s^2\sigma_y^2}{2}} = e^{s(\mu_x + \mu_y) + \frac{s^2(\sigma_x^2 + \sigma_y^2)}{2}} . $$

Comparing this result with R1.4, we conclude that $\tilde{z} = \tilde{x} + \tilde{y}$ is indeed Gaussian distributed with mean $\mu_x + \mu_y$ and variance $\sigma_x^2 + \sigma_y^2$.
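A quick simulation illustrates the result; a minimal Python sketch, with arbitrarily chosen means and variances:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(1.0, 2.0, 1_000_000)   # N(1, 4)
y = rng.normal(-3.0, 1.5, 1_000_000)  # N(-3, 2.25), independent of x
z = x + y

print(z.mean(), z.var())  # approximately -2 and 6.25
```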

In the next section we define joint Gaussianity and show that an arbitrary linear combination of Gaussian random variables results in a Gaussian distribution.

Jointly Gaussian Random Variables

We first examine two random variables and then extend the discussion to random vectors, that is, the joint distribution of n random variables.

Two random variables $\tilde{x}$ and $\tilde{y}$ are called jointly Gaussian if their joint density can be written in the following form:

$$ f_{xy}(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\, \exp\!\left\{ -\frac{1}{2(1-\rho^2)}\left[ \frac{(x-\mu_x)^2}{\sigma_x^2} - 2\rho\,\frac{(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y} + \frac{(y-\mu_y)^2}{\sigma_y^2} \right] \right\} $$

The function $f_{xy}(x,y)$ is clearly positive valued and of finite area (since $e^{-x^2}$ is upper bounded by $e^{-|x|}$ for large $x$); therefore, with proper scaling, it can be utilized as a density function. Note that $f_{xy}(x,y)$ is a function of two independent variables and has five parameters, $\{\mu_x, \mu_y, \sigma_x^2, \sigma_y^2, \rho\}$. We will attach the labels of mean, variance and correlation coefficient to these parameters after justifying why we do so.

An alternative (but equivalent) definition of joint Gaussianity can be given as follows: $\tilde{x}$ and $\tilde{y}$ are called jointly Gaussian if $\tilde{z} = \varpi_x \tilde{x} + \varpi_y \tilde{y}$ is Gaussian distributed for any value of $\varpi_x$ and $\varpi_y$. Here $\tilde{z}$ is a single-variable (also called univariate) Gaussian random variable of the kind we have previously examined. We take the equivalence of the two definitions for granted and proceed without a proof for now. (The function $g(\Theta_x, \Theta_y)$ and the related derivation given in R2.4 below can be formulated into a proof.)
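The joint density formula above can be checked numerically against a library implementation; a minimal Python sketch, where the parameter values and the evaluation point are arbitrary choices:

```python
import numpy as np
from scipy.stats import multivariate_normal

mx, my, sx, sy, rho = 1.0, -2.0, 2.0, 0.5, 0.6  # arbitrary parameters

def f_xy(x, y):
    q = ((x - mx)**2 / sx**2
         - 2 * rho * (x - mx) * (y - my) / (sx * sy)
         + (y - my)**2 / sy**2)
    return np.exp(-q / (2 * (1 - rho**2))) / (2 * np.pi * sx * sy * np.sqrt(1 - rho**2))

cov = [[sx**2, rho * sx * sy], [rho * sx * sy, sy**2]]
mvn = multivariate_normal(mean=[mx, my], cov=cov)

print(f_xy(0.3, -1.0), mvn.pdf([0.3, -1.0]))  # the two values should agree
```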

R2.1: Definition in matrix form

The definition can also be written as follows:

$$ f_{xy}(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\, \exp\!\left\{ -\frac{1}{2} \begin{bmatrix} (x-\mu_x) & (y-\mu_y) \end{bmatrix} \begin{bmatrix} \sigma_x^2 & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^2 \end{bmatrix}^{-1} \begin{bmatrix} (x-\mu_x) \\ (y-\mu_y) \end{bmatrix} \right\} $$

Note that the argument of the exponential mimics the one-dimensional Gaussian distribution. If you ignore the terms related to the $y$ variable, you are left with

$$ \exp\!\left\{ -\frac{1}{2}\, (x-\mu_x) \left[ \sigma_x^2 \right]^{-1} (x-\mu_x) \right\} = \exp\!\left( -\frac{(x-\mu_x)^2}{2\sigma_x^2} \right) $$

From here we sense that multi-dimensional Gaussian distributions are linked to the univariate (single-variable) Gaussian distribution, a fact whose details are examined below.

The equivalence of the matrix definition given in R2.1 to the original definition follows very simply by calculating the matrix inverse in the matrix definition:

$$ \begin{bmatrix} \sigma_x^2 & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^2 \end{bmatrix}^{-1} = \frac{1}{\sigma_x^2\sigma_y^2(1-\rho^2)} \begin{bmatrix} \sigma_y^2 & -\rho\sigma_x\sigma_y \\ -\rho\sigma_x\sigma_y & \sigma_x^2 \end{bmatrix} = \frac{1}{(1-\rho^2)} \begin{bmatrix} 1/\sigma_x^2 & -\rho/\sigma_x\sigma_y \\ -\rho/\sigma_x\sigma_y & 1/\sigma_y^2 \end{bmatrix} $$
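The closed-form inverse can be verified numerically; a minimal Python sketch, with arbitrary parameter values:

```python
import numpy as np

sx, sy, rho = 2.0, 0.5, 0.6  # arbitrary parameters
C = np.array([[sx**2, rho * sx * sy],
              [rho * sx * sy, sy**2]])

closed_form = np.array([[1 / sx**2, -rho / (sx * sy)],
                        [-rho / (sx * sy), 1 / sy**2]]) / (1 - rho**2)

print(np.allclose(np.linalg.inv(C), closed_form))  # True
```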

R2.2: Definition for zero mean jointly distributed Gaussian r.v.'s

When zero is substituted for the means, we get the following:

$$ f_{xy}(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\, \exp\!\left\{ -\frac{1}{2} \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} \sigma_x^2 & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^2 \end{bmatrix}^{-1} \begin{bmatrix} x \\ y \end{bmatrix} \right\} $$

If new random variables $\tilde{x}_2$ and $\tilde{y}_2$ are defined from $\tilde{x}$ and $\tilde{y}$, where $\tilde{x}$ and $\tilde{y}$ are jointly Gaussian r.v.'s with zero mean (the joint pdf is given above), such that

$$ x_2 = x + \bar{x}, \qquad y_2 = y + \bar{y}, $$

then the distribution of the new random variables is $f_{x_2 y_2}(x,y) = f_{xy}(x - \bar{x},\, y - \bar{y})$ (by the fundamental theorem), and we see that the distribution of $\tilde{x}_2$ and $\tilde{y}_2$ has the form of the general joint Gaussian density. Since mean values only shift the center of symmetry to a point in the $(x,y)$ plane, we sometimes prefer to work with the zero mean Gaussian distribution and then note that the result is also valid for non-zero mean Gaussian distributions, thanks to the mentioned shift in the center of symmetry. There are some examples of this situation in these notes.

R2.3: Two independent Gaussian r.v.'s are jointly Gaussian

Let $\tilde{x}$ and $\tilde{y}$ be independent, zero mean, Gaussian distributed random variables. (As noted in R2.2, the value of the mean is not important in this argument; we take the mean values as zero to simplify the presentation.) Then $f_{xy}(x,y) = f_x(x) f_y(y)$, and when the definition of the univariate Gaussian distribution is inserted, we get the definition of the joint Gaussian distribution for zero mean variables. Since the joint density of two independent Gaussian random variables can be written in the form required by the joint Gaussian density, two independent Gaussian random variables are indeed jointly Gaussian. (This is not very surprising; the labels of the distributions give away the final conclusion.)

A more surprising result is the next one. If $\tilde{x}$ and $\tilde{y}$ are known to be Gaussian distributed (but not independent), the joint distribution of $\tilde{x}$ and $\tilde{y}$ is not necessarily Gaussian. That is, if the marginal densities of $\tilde{x}$ and $\tilde{y}$ are Gaussian, the joint density of $\tilde{x}$ and $\tilde{y}$ is not necessarily Gaussian.

A simple example is as follows. Let $\tilde{x} : N(0,1)$ and $\tilde{y} = \tilde{c}\tilde{x}$, where $\tilde{c}$ takes the values $1$ and $-1$ with equal probability, independently of $\tilde{x}$. It is not very difficult (for EE230 students) to show that $\tilde{y}$ is also $N(0,1)$. Clearly $\tilde{y}$ and $\tilde{x}$ are related to each other through the random mapping $\tilde{c}$; therefore $\tilde{x}$ and $\tilde{y}$ are not independent. Furthermore, the conditional density of $\tilde{y}$ given $\tilde{x}$ is $0.5\,\delta(y-x) + 0.5\,\delta(y+x)$. We show in R2.8 that if $\tilde{y}$ and $\tilde{x}$ are jointly Gaussian distributed, then $\tilde{y}$ given $\tilde{x}$ always has a univariate Gaussian distribution. Therefore, the distribution of $\tilde{y}$ given $\tilde{x}$, which is $0.5\,\delta(y-x) + 0.5\,\delta(y+x)$, gives away that $\tilde{x}$ and $\tilde{y}$ are not jointly Gaussian. For a constructive example, see [Probability, Papoulis].
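The y = cx example is easy to explore by simulation; a minimal Python sketch: the marginal moments of y match N(0,1), yet x + y is exactly zero half of the time, so (x, y) cannot be jointly Gaussian:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(500_000)
c = rng.choice([-1.0, 1.0], size=x.size)  # random sign, independent of x
y = c * x

print(y.mean(), y.var(), np.mean(y**4))  # approximately 0, 1, 3 (as for N(0,1))

# x + y equals 0 exactly whenever c = -1, so it is not Gaussian,
# hence (x, y) are not jointly Gaussian:
print(np.mean(x + y == 0))  # approximately 0.5
```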

R2.4: Moment Generating Function of jointly Gaussian r.v.'s x and y (zero mean)

Independent Case: $\Phi(s_x, s_y) = E\{e^{s_x x + s_y y}\}$ is equal to $E\{e^{s_x x}\}\, E\{e^{s_y y}\}$ by independence. Then

$$ \Phi(s_x, s_y) = e^{s_x\bar{x} + \frac{s_x^2\sigma_x^2}{2}}\, e^{s_y\bar{y} + \frac{s_y^2\sigma_y^2}{2}} = e^{s_x\bar{x} + s_y\bar{y}}\, e^{\frac{s_x^2\sigma_x^2 + s_y^2\sigma_y^2}{2}} $$

Dependent Case: Let $\tilde{x}'$ and $\tilde{y}'$ be independent, zero mean, unit variance Gaussian random variables. Let us define the random variables $\tilde{x}$ and $\tilde{y}$ from $\tilde{x}'$ and $\tilde{y}'$ as follows:

$$ x = a x', \qquad y = c x' + d y' \qquad (1) $$

and let us define an arbitrary-looking function $g(\Theta_x, \Theta_y)$ as follows:

$$ \begin{aligned} g(\Theta_x, \Theta_y) &= E\{\exp(\Theta_x x + \Theta_y y)\} \\ &= E\{\exp(\Theta_x a x' + \Theta_y (c x' + d y'))\} \\ &= E\{\exp((\Theta_x a + \Theta_y c) x' + \Theta_y d y')\} \\ &= E\{\exp((\Theta_x a + \Theta_y c) x')\}\, E\{\exp(\Theta_y d y')\} \\ &= \Phi_{x'}(\Theta_x a + \Theta_y c)\, \Phi_{y'}(\Theta_y d) \end{aligned} $$

Here $\Theta_x, \Theta_y$ are some scalars. When the moment generating functions of $\tilde{x}'$ and $\tilde{y}'$ are inserted into the last equation, we get

$$ g(\Theta_x, \Theta_y) = \Phi_{x'}(\Theta_x a + \Theta_y c)\, \Phi_{y'}(\Theta_y d) = e^{\frac{1}{2}(\Theta_x a + \Theta_y c)^2}\, e^{\frac{1}{2}(\Theta_y d)^2} = e^{\frac{1}{2}\left(\Theta_x^2 a^2 + 2\Theta_x\Theta_y a c + \Theta_y^2 (c^2 + d^2)\right)} $$
Now we interpret $g(\Theta_x, \Theta_y)$ and establish its connection with $\Phi(s_x, s_y)$. The moment generating function $\Phi(s_x, s_y) = E\{e^{s_x x + s_y y}\}$ is defined for the $s_x$ and $s_y$ values for which the expectation is finite. For univariate Gaussian distributions such as $\tilde{x}'$, the function $\Phi_{x'}(s_{x'})$ is defined for all complex $s_{x'}$ values. The same is also valid for the $\tilde{y}'$ variable. We can then say that the function $g(\Theta_x, \Theta_y)$ is defined for arbitrary complex valued pairs $(\Theta_x, \Theta_y)$.

Since the arguments of $g(\Theta_x, \Theta_y)$ can be arbitrary, we choose to relabel them as $\Theta_x = s_x$ and $\Theta_y = s_y$ to get

$$ g(\Theta_x, \Theta_y)\Big|_{\Theta_x = s_x,\ \Theta_y = s_y} = E\{\exp(s_x x + s_y y)\} = \Phi(s_x, s_y) , $$

and we get the following for the moment generating function of $\tilde{x}$ and $\tilde{y}$:

$$ \Phi(s_x, s_y) = e^{\frac{1}{2}\left(s_x^2 a^2 + 2 s_x s_y a c + s_y^2 (c^2 + d^2)\right)} \qquad (2) $$

Next we use the fundamental theorem to express the distribution of $\tilde{x}$ and $\tilde{y}$. We could also invert $\Phi(s_x, s_y)$ given in (2) to get the distribution of $\tilde{x}$ and $\tilde{y}$, but we prefer using the fundamental theorem instead of inverse Laplace transforms.

Rewriting equation (1) in matrix form, we get the following:

$$ \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} a & 0 \\ c & d \end{bmatrix} \begin{bmatrix} x' \\ y' \end{bmatrix} \quad\Rightarrow\quad \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1/a & 0 \\ -c/ad & 1/d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} $$

Then

$$ f_{xy}(x,y) = \frac{1}{|J|}\, f_{x'y'}\!\left(\frac{x}{a},\ \frac{1}{d}y - \frac{c}{ad}x\right) = A\, \exp\!\left\{ -\frac{1}{2}\left(\frac{x}{a}\right)^2 \right\} \exp\!\left\{ -\frac{1}{2}\left(\frac{1}{d}y - \frac{c}{ad}x\right)^2 \right\} \qquad (3) $$

Here $J$ is the Jacobian (which is $J = ad$) and $A$ is a scalar whose exact value is not of interest for now.

Finally, we fix $(a, c, d)$ given in (1) to the following special values:

$$ a = \sigma_x, \qquad c = \rho\sigma_y, \qquad d = \sigma_y\sqrt{1-\rho^2} $$

Then (2) and (3) become

$$ \Phi(s_x, s_y) = e^{\frac{1}{2}\left(s_x^2\sigma_x^2 + 2 s_x s_y \rho\sigma_x\sigma_y + s_y^2\sigma_y^2\right)} \qquad (2') $$

$$ f_{xy}(x,y) = A\, \exp\!\left\{ -\frac{1}{2(1-\rho^2)}\left[ \frac{x^2}{\sigma_x^2} - 2\rho\,\frac{xy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2} \right] \right\} \qquad (3') $$

From (2') and (3'), we get the moment generating function of jointly distributed zero mean Gaussian random variables $\tilde{x}$ and $\tilde{y}$.
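The (a, c, d) construction above is also a convenient way to generate correlated Gaussian samples; a minimal Python sketch, with arbitrary target parameters:

```python
import numpy as np

rng = np.random.default_rng(5)
sx, sy, rho = 2.0, 0.5, 0.6                   # target parameters
a, c, d = sx, rho * sy, sy * np.sqrt(1 - rho**2)

xp = rng.standard_normal(1_000_000)           # x' ~ N(0, 1)
yp = rng.standard_normal(1_000_000)           # y' ~ N(0, 1), independent of x'
x = a * xp
y = c * xp + d * yp

print(x.std(), y.std(), np.corrcoef(x, y)[0, 1])  # approximately sx, sy, rho
```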
R2.5: Moment Generating Function of jointly Gaussian r.v.'s x and y (non-zero mean)

Let $\tilde{x}'$ and $\tilde{y}'$ be jointly distributed zero mean Gaussian random variables. The moment generating function of $\tilde{x}'$ and $\tilde{y}'$ is given in R2.4, equation (2'). Define new random variables as

$$ x = x' + \bar{x}, \qquad y = y' + \bar{y} . $$

The moment generating function of $\tilde{x}$ and $\tilde{y}$ can be written as follows:

$$ \Phi(s_x, s_y) = E\{e^{s_x x + s_y y}\} = E\{e^{s_x(x' + \bar{x}) + s_y(y' + \bar{y})}\} = e^{s_x\bar{x} + s_y\bar{y}}\, \Phi_{x'y'}(s_x, s_y) , $$

which is

$$ \Phi(s_x, s_y) = e^{s_x\bar{x} + s_y\bar{y}}\, e^{\frac{1}{2}\left(s_x^2\sigma_x^2 + 2 s_x s_y \rho\sigma_x\sigma_y + s_y^2\sigma_y^2\right)} . $$

Note that when $\rho = 0$, the moment generating function given in the last equation reduces to the one found for the independent variables in R2.4.

R2.6: The correlation coefficient between x and y is ρ

We know that the correlation coefficient between random variables $\tilde{x}$ and $\tilde{y}$ is, by its definition, not affected by the mean values of $\tilde{x}$ and $\tilde{y}$. We also know that

$$ E\{xy\} = \frac{\partial^2}{\partial s_x\, \partial s_y}\, \Phi(s_x, s_y)\Big|_{s_x = s_y = 0} . $$

We take the moment generating function of the jointly Gaussian random variables (with zero mean) and calculate its partial derivatives as follows:

$$ \begin{aligned} \frac{\partial^2}{\partial s_x\, \partial s_y}\, \Phi(s_x, s_y) &= \frac{\partial^2}{\partial s_x\, \partial s_y}\, e^{\frac{1}{2}\left(s_x^2\sigma_x^2 + 2 s_x s_y \rho\sigma_x\sigma_y + s_y^2\sigma_y^2\right)} \\ &= \frac{\partial}{\partial s_y}\, (s_x\sigma_x^2 + s_y\rho\sigma_x\sigma_y)\, e^{\frac{1}{2}\left(s_x^2\sigma_x^2 + 2 s_x s_y \rho\sigma_x\sigma_y + s_y^2\sigma_y^2\right)} \\ &= \rho\sigma_x\sigma_y\, e^{\frac{1}{2}\left(s_x^2\sigma_x^2 + 2 s_x s_y \rho\sigma_x\sigma_y + s_y^2\sigma_y^2\right)} + (s_x\sigma_x^2 + s_y\rho\sigma_x\sigma_y)(s_y\sigma_y^2 + s_x\rho\sigma_x\sigma_y)\, e^{\frac{1}{2}\left(s_x^2\sigma_x^2 + 2 s_x s_y \rho\sigma_x\sigma_y + s_y^2\sigma_y^2\right)} \end{aligned} $$

When zero is substituted for $s_x$ and $s_y$, we get $E\{xy\} = \rho\sigma_x\sigma_y$, and from here we see that $\rho = \frac{E\{xy\}}{\sigma_x\sigma_y}$. Note that the right hand side of the last relation is the definition of the correlation coefficient for zero mean random variables. Therefore, the $\rho$ appearing in the definition of the joint density is the actual value of the correlation coefficient between $\tilde{x}$ and $\tilde{y}$. We call this parameter the correlation coefficient.
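The same differentiation can be reproduced symbolically; a minimal sketch using the sympy package (the symbol names are arbitrary):

```python
import sympy as sp

s_x, s_y, sigma_x, sigma_y = sp.symbols('s_x s_y sigma_x sigma_y', positive=True)
rho = sp.symbols('rho')

# Zero mean jointly Gaussian MGF from (2')
Phi = sp.exp((s_x**2 * sigma_x**2
              + 2 * s_x * s_y * rho * sigma_x * sigma_y
              + s_y**2 * sigma_y**2) / 2)

E_xy = sp.diff(Phi, s_x, s_y).subs({s_x: 0, s_y: 0})
print(sp.simplify(E_xy))  # rho*sigma_x*sigma_y
```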

R2.7: If the correlation coefficient of jointly Gaussian distributed r.v.'s is zero, then the random variables are independent.

In R2.5 we noted that when $\rho = 0$ the moment generating function given there reduces to the one found for the independent variables in R2.4, and we have also shown (R2.6) that $\rho$ is the correlation coefficient between $\tilde{x}$ and $\tilde{y}$. This important result follows from the combination of these two observations.

R2.7: The marginal densities of x and y are Gaussian.

The moment generating function of the marginal density of $\tilde{x}$ is $E\{e^{s_x x}\}$. The same function can be obtained from the moment generating function of the joint distribution,

$$ \Phi(s_x, s_y) = E\{e^{s_x x + s_y y}\} = e^{s_x\bar{x} + s_y\bar{y}}\, e^{\frac{1}{2}\left(s_x^2\sigma_x^2 + 2 s_x s_y \rho\sigma_x\sigma_y + s_y^2\sigma_y^2\right)} , $$

by taking $s_y = 0$. When this is done, we get $E\{e^{s_x x}\} = e^{s_x\bar{x}}\, e^{\frac{1}{2}s_x^2\sigma_x^2}$, which is the moment generating function of a univariate Gaussian distribution with mean $\bar{x}$ and variance $\sigma_x^2$. Hence, when we marginalize a joint Gaussian density, we get a univariate Gaussian density.

R2.8: The conditional density of y given x for jointly Gaussian variables

The density of $\tilde{y}$ conditioned on $\tilde{x}$ is also Gaussian. This result has significant importance in estimation theory.

i. Case of scalar random variables

Here $\tilde{x}$ and $\tilde{y}$ are jointly Gaussian distributed with the following density:

$$ f_{xy}(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\, \exp\!\left\{ -\frac{1}{2(1-\rho^2)}\left[ \frac{(x-\mu_x)^2}{\sigma_x^2} - 2\rho\,\frac{(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y} + \frac{(y-\mu_y)^2}{\sigma_y^2} \right] \right\} $$

The goal is to find $f_{y|x}(y|x) = \frac{f_{xy}(x,y)}{f_x(x)}$. We explicitly calculate this ratio and show that the conditional density is a Gaussian distribution. Without any loss of generality, we assume $\mu_x = \mu_y = 0$ to simplify the algebra. If $\mu_x = \mu_y = 0$ does not hold, the substitutions $x \to (x - \mu_x)$ and $y \to (y - \mu_y)$ should be made in all expressions.

Without further ado, we start the calculation:

$$ f_{y|x}(y|x) = \frac{f_{xy}(x,y)}{f_x(x)} = \frac{\dfrac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\, \exp\!\left\{ -\dfrac{1}{2(1-\rho^2)}\left[ \dfrac{x^2}{\sigma_x^2} - 2\rho\,\dfrac{xy}{\sigma_x\sigma_y} + \dfrac{y^2}{\sigma_y^2} \right] \right\}}{\dfrac{1}{\sqrt{2\pi}\,\sigma_x}\, \exp\!\left\{ -\dfrac{x^2}{2\sigma_x^2} \right\}} $$

Let us re-express the exponent in the numerator, $\frac{1}{2(1-\rho^2)}\left[ \frac{x^2}{\sigma_x^2} - 2\rho\,\frac{xy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2} \right]$:

$$ \begin{aligned} \frac{1}{2(1-\rho^2)}\left[ \frac{x^2}{\sigma_x^2} - 2\rho\,\frac{xy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2} \right] &= \frac{1}{2(1-\rho^2)\sigma_y^2}\left[ \frac{\sigma_y^2}{\sigma_x^2}x^2 - 2\rho\,\frac{\sigma_y}{\sigma_x}xy + y^2 \right] \\ &= \frac{1}{2(1-\rho^2)\sigma_y^2}\left[ y^2 - 2\rho\,\frac{\sigma_y}{\sigma_x}xy + \left\{ \rho\frac{\sigma_y}{\sigma_x}x \right\}^2 - \left\{ \rho\frac{\sigma_y}{\sigma_x}x \right\}^2 + \frac{\sigma_y^2}{\sigma_x^2}x^2 \right] \\ &= \frac{1}{2(1-\rho^2)\sigma_y^2}\left[ \left( y - \rho\frac{\sigma_y}{\sigma_x}x \right)^2 - \rho^2\frac{\sigma_y^2}{\sigma_x^2}x^2 + \frac{\sigma_y^2}{\sigma_x^2}x^2 \right] \\ &= \frac{1}{2(1-\rho^2)\sigma_y^2}\left( y - \rho\frac{\sigma_y}{\sigma_x}x \right)^2 + \frac{x^2}{2\sigma_x^2} \end{aligned} $$

Note that in the second line we have added and subtracted the same term, shown with curly brackets, i.e. $\{\ldots\}^2$. This process is called completing the square. Now $f_{y|x}(y|x)$ can be easily written as follows:

$$ f_{y|x}(y|x) = \frac{1}{\sqrt{2\pi}\,\sigma_y\sqrt{1-\rho^2}}\, \exp\!\left\{ -\frac{1}{2(1-\rho^2)\sigma_y^2}\left( y - \rho\frac{\sigma_y}{\sigma_x}x \right)^2 \right\} $$

Hence the conditional distribution is $N\!\left( \rho\frac{\sigma_y}{\sigma_x}x,\ (1-\rho^2)\sigma_y^2 \right)$.
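The conditional mean and variance above can be checked by simulation; a minimal Python sketch that approximates conditioning on {x = x0} by keeping samples in a narrow slice around x0; the parameter values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(6)
sx, sy, rho = 1.0, 2.0, 0.8
n = 2_000_000

x = sx * rng.standard_normal(n)
y = rho * (sy / sx) * x + sy * np.sqrt(1 - rho**2) * rng.standard_normal(n)

x0 = 0.7
sel = np.abs(x - x0) < 0.01                 # narrow slice approximating {x = x0}
print(y[sel].mean(), rho * (sy / sx) * x0)  # conditional mean
print(y[sel].var(), (1 - rho**2) * sy**2)   # conditional variance
```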

ii. Case of Multiple Observations (Multivariate Gaussians)

Here $\tilde{y}$ and $\tilde{\mathbf{x}}$ are jointly Gaussian distributed. In this case, $\tilde{y}$ is a scalar Gaussian random variable and $\tilde{\mathbf{x}}$ is an $N \times 1$ Gaussian vector. As in the previous case, both $\tilde{y}$ and $\tilde{\mathbf{x}}$ are assumed to be zero mean without any loss of generality. The goal is to express $f_{y|\mathbf{x}}(y|\mathbf{x}) = \frac{f_{\mathbf{x}y}(\mathbf{x},y)}{f_{\mathbf{x}}(\mathbf{x})}$, that is, to "update" the a-priori pdf of $\tilde{y}$, which is $N(0, \sigma_y^2)$, given the observation vector $\tilde{\mathbf{x}}$:

$$ f_{y|\mathbf{x}}(y|\mathbf{x}) = \frac{f_{\mathbf{x}y}(\mathbf{x},y)}{f_{\mathbf{x}}(\mathbf{x})} = \frac{\dfrac{1}{(2\pi)^{\frac{N+1}{2}}\, |C_z|^{\frac{1}{2}}}\, \exp\!\left\{ -\frac{1}{2}\mathbf{z}^T C_z^{-1}\mathbf{z} \right\}}{\dfrac{1}{(2\pi)^{\frac{N}{2}}\, |C_x|^{\frac{1}{2}}}\, \exp\!\left\{ -\frac{1}{2}\mathbf{x}^T C_x^{-1}\mathbf{x} \right\}} $$

Here $\mathbf{z}$ is the concatenation of $\tilde{\mathbf{x}}$ and $\tilde{y}$, $\mathbf{z} = \begin{bmatrix} \mathbf{x}_{N\times 1} \\ y \end{bmatrix}_{(N+1)\times 1}$, and $C_z$ and $C_x$ are the covariance matrices of $\tilde{\mathbf{z}}$ and $\tilde{\mathbf{x}}$, respectively. Then, as in the scalar problem, the exponent in the numerator can be expressed as follows:

$$ \frac{1}{2}\mathbf{z}^T C_z^{-1}\mathbf{z} = \frac{1}{2} \begin{bmatrix} \mathbf{x}^T & y \end{bmatrix} \begin{bmatrix} C_x & \mathbf{r}_{xy} \\ \mathbf{r}_{xy}^T & \sigma_y^2 \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{x} \\ y \end{bmatrix} $$

In the equation above, $\mathbf{r}_{xy} = E\{\mathbf{x}y\}$ is the cross-correlation between the observations and $\tilde{y}$. The matrix inverse in the equation above can be taken using the partitioned matrix inversion lemma.

There are various types of matrix inversion lemmas; they are abundant in books, but for the sake of simplicity we present a screenshot of a webpage illustrating the form of the lemma that we would like to use.

Partitioned Matrix Inversion Lemma


A screenshot from http://www.cs.nthu.edu.tw/~jang/book/addenda/matinv/matinv/

Substituting $A$ with $C_x$ and $\mathbf{b}$ with $\mathbf{r}_{xy}$, we can express $C_z^{-1}$ as follows using the lemma:

$$ C_z^{-1} = \begin{bmatrix} C_x & \mathbf{r}_{xy} \\ \mathbf{r}_{xy}^T & \sigma_y^2 \end{bmatrix}^{-1} = \begin{bmatrix} C_x^{-1} + \frac{1}{k} C_x^{-1}\mathbf{r}_{xy}\mathbf{r}_{xy}^T C_x^{-1} & -\frac{1}{k} C_x^{-1}\mathbf{r}_{xy} \\ -\frac{1}{k} \mathbf{r}_{xy}^T C_x^{-1} & \frac{1}{k} \end{bmatrix} $$
Then $f_{y|\mathbf{x}}(y|\mathbf{x})$ can be written as

$$ f_{y|\mathbf{x}}(y|\mathbf{x}) = \frac{f_{\mathbf{x}y}(\mathbf{x},y)}{f_{\mathbf{x}}(\mathbf{x})} = \frac{1}{\sqrt{2\pi}}\, \frac{|C_x|^{\frac{1}{2}}}{|C_z|^{\frac{1}{2}}}\, \frac{\exp\!\left\{ -\frac{1}{2}\mathbf{z}^T C_z^{-1}\mathbf{z} \right\}}{\exp\!\left\{ -\frac{1}{2}\mathbf{x}^T C_x^{-1}\mathbf{x} \right\}} = K\, \frac{\exp\!\left\{ -\frac{1}{2}\mathbf{z}^T C_z^{-1}\mathbf{z} \right\}}{\exp\!\left\{ -\frac{1}{2}\mathbf{x}^T C_x^{-1}\mathbf{x} \right\}} $$

In the last equation, $K$ denotes a constant scalar. The ratio of $\exp\{-\frac{1}{2}\mathbf{z}^T C_z^{-1}\mathbf{z}\}$ and $\exp\{-\frac{1}{2}\mathbf{x}^T C_x^{-1}\mathbf{x}\}$ can be written more explicitly as follows:

$$ \frac{\exp\!\left\{ -\frac{1}{2}\mathbf{z}^T C_z^{-1}\mathbf{z} \right\}}{\exp\!\left\{ -\frac{1}{2}\mathbf{x}^T C_x^{-1}\mathbf{x} \right\}} = \exp\!\left\{ -\frac{1}{2} \begin{bmatrix} \mathbf{x}^T & y \end{bmatrix} \left( C_z^{-1} - \begin{bmatrix} C_x^{-1} & \mathbf{0} \\ \mathbf{0}^T & 0 \end{bmatrix} \right) \begin{bmatrix} \mathbf{x} \\ y \end{bmatrix} \right\} $$

With the substitution of $C_z^{-1}$, we get the following for the matrix in the quadratic form:

$$ C_z^{-1} - \begin{bmatrix} C_x^{-1} & \mathbf{0} \\ \mathbf{0}^T & 0 \end{bmatrix} = \frac{1}{k} \begin{bmatrix} C_x^{-1}\mathbf{r}_{xy}\mathbf{r}_{xy}^T C_x^{-1} & -C_x^{-1}\mathbf{r}_{xy} \\ -\mathbf{r}_{xy}^T C_x^{-1} & 1 \end{bmatrix} $$

with $k = \sigma_y^2 - \mathbf{r}_{xy}^T C_x^{-1}\mathbf{r}_{xy}$. Then the density becomes

$$ f_{y|\mathbf{x}}(y|\mathbf{x}) = K\, \frac{\exp\!\left\{ -\frac{1}{2}\mathbf{z}^T C_z^{-1}\mathbf{z} \right\}}{\exp\!\left\{ -\frac{1}{2}\mathbf{x}^T C_x^{-1}\mathbf{x} \right\}} = K\, \exp\!\left\{ -\frac{1}{2k} \begin{bmatrix} \mathbf{x}^T & y \end{bmatrix} \begin{bmatrix} C_x^{-1}\mathbf{r}_{xy}\mathbf{r}_{xy}^T C_x^{-1} & -C_x^{-1}\mathbf{r}_{xy} \\ -\mathbf{r}_{xy}^T C_x^{-1} & 1 \end{bmatrix} \begin{bmatrix} \mathbf{x} \\ y \end{bmatrix} \right\} $$

The last relation can be further reduced to

$$ f_{y|\mathbf{x}}(y|\mathbf{x}) = K\, \exp\!\left\{ -\frac{1}{2k}\left( \mathbf{x}^T C_x^{-1}\mathbf{r}_{xy}\mathbf{r}_{xy}^T C_x^{-1}\mathbf{x} - 2 y\,\mathbf{r}_{xy}^T C_x^{-1}\mathbf{x} + y^2 \right) \right\} $$

This not-so-friendly relation is our old friend in wolf's clothing. To recognize our friend, we just need to define $\hat{y} = \mathbf{r}_{xy}^T C_x^{-1}\mathbf{x}$ and rewrite the same expression as follows:

$$ f_{y|\mathbf{x}}(y|\mathbf{x}) = K\, \exp\!\left\{ -\frac{1}{2k}\left( \hat{y}^2 - 2 y\hat{y} + y^2 \right) \right\} = K\, \exp\!\left\{ -\frac{1}{2k}\left( y - \hat{y} \right)^2 \right\} $$

Hence the conditional density is $N\!\left( \mathbf{r}_{xy}^T C_x^{-1}\mathbf{x},\ \sigma_y^2 - \mathbf{r}_{xy}^T C_x^{-1}\mathbf{r}_{xy} \right)$. (This is one of the most important results in estimation theory. It shows that, for jointly Gaussian variables, the linear minimum mean square error estimator (a topic of EE503) coincides with the conditional mean, which is the optimal estimator for the general minimum mean square error problem with no linearity constraint.)
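The conditional mean and variance expressions can be verified numerically; a minimal Python sketch, with an arbitrarily chosen joint covariance for z = [x1, x2, y]:

```python
import numpy as np

rng = np.random.default_rng(7)

# Assumed example joint covariance for z = [x1, x2, y] (N = 2 here)
Cz = np.array([[2.0, 0.3, 0.8],
               [0.3, 1.0, 0.5],
               [0.8, 0.5, 1.5]])
Cx = Cz[:2, :2]      # covariance of the observation vector x
rxy = Cz[:2, 2]      # cross-correlation E{x y}
sy2 = Cz[2, 2]       # variance of y

z = rng.multivariate_normal(np.zeros(3), Cz, size=1_000_000)
x, y = z[:, :2], z[:, 2]

w = np.linalg.solve(Cx, rxy)   # C_x^{-1} r_xy
y_hat = x @ w                  # conditional mean r_xy^T C_x^{-1} x for each sample

k = sy2 - rxy @ w              # predicted conditional variance
print(np.var(y - y_hat), k)    # the two values should agree
```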
