
ESS011

Mathematical statistics and signal processing

Lecture 11: Multivariate randomness

Tuomas A. Rajala

Chalmers TU

April 7, 2014

Week 4: where are we

Recall: a random experiment's outcome = the value of a random variable

Last week:
Some examples of distribution families for r.v.s
Joint distribution: From one variable to many variables

Today's menu: how to deal with multivariate random vectors.


Joint distribution, recall

Joint distribution: Let X and Y be continuous r.v.s. Then the joint density f(x, y) of the vector (X, Y) fulfils

1. Always non-negative: f(x, y) \ge 0 for all values of x and y
2. Integrates to 1: \int_{\mathbb{R}} \int_{\mathbb{R}} f(x, y)\, dx\, dy = 1
3. Describes probabilities:
   P(a < X < b \text{ and } c < Y < d) = \int_a^b \int_c^d f(x, y)\, dy\, dx
   for a, b, c, d \in \mathbb{R}.

In the discrete case: f (x, y) = P (X = x and Y = y).

Convention: if X is only defined on D_X \subset \mathbb{R}, set the density to 0 outside D_X.


Similarly for Y and the joint density.

We look at definitions for the bivariate case (X, Y), but everything generalizes to n > 2 variables.
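Where it helps, a small numerical check makes the three properties concrete. A minimal sketch (Python with scipy assumed, not part of the lecture), using the density f(x, y) = x + y from the worked example later in this lecture:

```python
from scipy import integrate

# Joint density from the worked example later in this lecture:
# f(x, y) = x + y for 0 < x, y < 1.
f = lambda x, y: x + y

# dblquad integrates func(y, x) with the inner variable first,
# so pass a wrapper that swaps the arguments.
total, _ = integrate.dblquad(lambda y, x: f(x, y), 0, 1, 0, 1)
print(total)   # property 2: integrates to 1

# Property 3: P(0.2 < X < 0.6 and 0.1 < Y < 0.9)
p, _ = integrate.dblquad(lambda y, x: f(x, y), 0.2, 0.6, 0.1, 0.9)
print(p)       # 0.288
```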


Joint distribution recall, contd.

Marginalization:

f_X(x) = \int_{\mathbb{R}} f(x, y)\, dy

and similarly for Y, for the discrete case, and for more than 2 variables.

Note that the marginals are also densities, so

f_X(x) \ge 0, \quad \int_{\mathbb{R}} f_X(x)\, dx = 1, \quad P(a < X < b) = \int_a^b f_X(x)\, dx

and similarly for Y; sums in the discrete case(s).

Independence: X \perp Y \iff f(x, y) = f_X(x) f_Y(y)
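As a sketch (scipy assumed), the marginalization integral can be checked numerically for the example density f(x, y) = x + y, whose marginal is f_X(x) = x + 1/2:

```python
from scipy import integrate

# Marginalize f(x, y) = x + y on (0, 1)^2 numerically;
# the closed form is f_X(x) = x + 1/2.
f = lambda x, y: x + y

for x in (0.1, 0.5, 0.9):
    fx, _ = integrate.quad(lambda y: f(x, y), 0, 1)
    print(x, fx, x + 0.5)   # numerical vs closed-form marginal
```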


Joint expectation
Expectation: For a r.v. (X, Y) with joint density f(x, y), and some function g(x, y),

E[g(X, Y)] := \int_{\mathbb{R}} \int_{\mathbb{R}} g(x, y) f(x, y)\, dy\, dx

In the discrete case: E[g(X, Y)] = \sum_x \sum_y g(x, y) f(x, y)

Technical requirement: \int_{\mathbb{R}} \int_{\mathbb{R}} |g(x, y)| f(x, y)\, dx\, dy < \infty.

Expected value: The expected value of X can be derived directly from the joint density with g(x, y) = x, as

E(X) = \int_{\mathbb{R}} \int_{\mathbb{R}} x f(x, y)\, dx\, dy = \int_{\mathbb{R}} x \int_{\mathbb{R}} f(x, y)\, dy\, dx = \int_{\mathbb{R}} x f_X(x)\, dx

and similarly for Y and the discrete case.
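A hedged sketch (numpy assumed): once we can sample from the joint density, E[g(X, Y)] can be approximated by Monte Carlo. Here rejection sampling for the example density f(x, y) = x + y, which is bounded by 2 on the unit square:

```python
import numpy as np

# Monte Carlo estimate of E[g(X, Y)] for f(x, y) = x + y on (0, 1)^2,
# sampled by rejection against the uniform proposal (f <= 2).
rng = np.random.default_rng(0)

def sample(n):
    pts = []
    while len(pts) < n:
        x, y, u = rng.random(3)
        if 2.0 * u <= x + y:          # accept with probability f(x, y)/2
            pts.append((x, y))
    return np.array(pts)

xy = sample(100_000)
print(np.mean(xy[:, 0] * xy[:, 1]))   # E(XY), ~1/3 for this density
print(np.mean(xy[:, 0]))              # E(X),  ~7/12
```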


The variance of a multivariate random variable


For a univariate random variable we had the variance Var(X) along with the mean \mu_X. In the multivariate case there is also mutual variation:

Covariance: Let X and Y be r.v.s with a joint density f_{XY} and marginal means \mu_X := E(X) and \mu_Y := E(Y). Then

Cov(X, Y) := E[(X - \mu_X)(Y - \mu_Y)]

is called the covariance of X and Y.

Computational formula: Cov(X, Y) = E(XY) - E(X)E(Y)

Notes:
If X is large (small) when Y is large (small), Cov(X, Y) > 0
If X is large (small) when Y is small (large), Cov(X, Y) < 0
X \perp Y \Rightarrow Cov(X, Y) = 0. Not the other way around in general!
Only the sign is interpretable, not the absolute value
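A minimal simulation sketch (numpy assumed) of the computational formula, on data where large X tends to go with large Y:

```python
import numpy as np

# Check Cov(X, Y) = E(XY) - E(X)E(Y) on simulated data.
rng = np.random.default_rng(1)
x = rng.normal(size=100_000)
y = 2.0 * x + rng.normal(size=100_000)   # large X goes with large Y

print(np.mean(x * y) - np.mean(x) * np.mean(y))  # ~2 = 2 Var(X) > 0
print(np.cov(x, y)[0, 1])                        # numpy's estimate agrees
```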

Correlation

As X and Y might be of different scales (e.g. meters vs counts), it is easier to interpret the scale-free version:

Correlation: Let X and Y be r.v.s with variances \sigma_X^2 and \sigma_Y^2. Then we define

\rho := Cor(X, Y) := \frac{Cov(X, Y)}{\sigma_X \sigma_Y}

and call it the correlation of X and Y.

Notes:
Cor(X, Y) \in [-1, 1]
Often we denote correlation with \rho or \rho_{XY}, and speak about the correlation coefficient
Correlation = covariance of the standardized variables (checked in the sketch below)
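A minimal sketch (numpy assumed) of that last note:

```python
import numpy as np

# Correlation is the covariance of the standardized variables.
rng = np.random.default_rng(2)
x = rng.normal(size=100_000)
y = x + rng.normal(size=100_000)

zx = (x - x.mean()) / x.std()       # standardized: mean 0, sd 1
zy = (y - y.mean()) / y.std()
print(np.mean(zx * zy))             # covariance of standardized variables
print(np.corrcoef(x, y)[0, 1])      # equals Cor(X, Y), ~1/sqrt(2) here
```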


Worked example
Let X and Y have the joint density f(x, y) = x + y for 0 < x, y < 1. What is their correlation?

f_X(x) = \int_0^1 (x + y)\, dy = x + \frac{1}{2}, \quad f_Y(y) = \int_0^1 (x + y)\, dx = y + \frac{1}{2}

so

E(X) = \int_0^1 x \left(x + \frac{1}{2}\right) dx = \frac{7}{12}, \quad E(Y) = \frac{7}{12}

and

E(X^2) = \int_0^1 x^2 \left(x + \frac{1}{2}\right) dx = \frac{5}{12}, \quad E(Y^2) = \frac{5}{12}

Var(X) = E(X^2) - [E(X)]^2 = \frac{11}{144} = Var(Y)

then

E(XY) = \int_0^1 \int_0^1 xy (x + y)\, dx\, dy = \int_0^1 \left(\frac{x^2}{2} + \frac{x}{3}\right) dx = \frac{1}{3}

and

Cov(X, Y) = E(XY) - E(X)E(Y) = -\frac{1}{144}

which finally leads to

Cor(X, Y) = \frac{Cov(X, Y)}{\sqrt{Var(X)}\sqrt{Var(Y)}} = \frac{-1/144}{11/144} = -\frac{1}{11} \approx -0.091

[Figure: perspective plot of the joint density f(x, y) = x + y over the unit square.]
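A sketch of a Monte Carlo check (numpy assumed), using the same rejection sampler as before:

```python
import numpy as np

# Check Cor(X, Y) = -1/11 ~ -0.091 for f(x, y) = x + y on (0, 1)^2.
rng = np.random.default_rng(3)
pts = []
while len(pts) < 200_000:
    x, y, u = rng.random(3)
    if 2.0 * u <= x + y:              # rejection sampling, f <= 2
        pts.append((x, y))
pts = np.array(pts)
print(np.corrcoef(pts[:, 0], pts[:, 1])[0, 1])   # ~ -0.0909
```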

Correlation and linear dependence

Correlation measures linear dependence: Let X be a r.v., and set Y := \beta_0 + \beta_1 X for some constants \beta_0, \beta_1 \ne 0. Then

Cov(X, Y) = \beta_1 \sigma_X^2, \quad Var(Y) = \beta_1^2 \sigma_X^2

\Rightarrow \quad Cor(X, Y) = \frac{\beta_1 \sigma_X^2}{\sqrt{\beta_1^2 \sigma_X^2}\, \sigma_X} = sign(\beta_1) \cdot 1

Depending on the sign of \beta_1, we have either (-) perfect negative correlation or (+) perfect positive correlation.

[Figure: scatter plot panels illustrating |rho| = 1 (lines with b1 > 0 and b1 < 0), |rho| < 1, independent variables (rho = 0), and a nonlinear dependence with rho = 0.]
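A minimal sketch (numpy assumed) of the two extremes in the figure: a linear transform gives |correlation| = 1 with the sign of \beta_1, while a nonlinear symmetric dependence can still give correlation 0:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=100_000)

print(np.corrcoef(x, 3.0 - 2.0 * x)[0, 1])   # b1 < 0: exactly -1
print(np.corrcoef(x, x ** 2)[0, 1])          # dependent on X, yet ~0
```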


Bivariate normal
For two normally distributed r.v.s, the correlation defines the joint distribution.

Bivariate normal distribution: The random vector (X, Y) has a bivariate normal distribution if the joint density is

f(x, y) = \frac{1}{2\pi \sigma_x \sigma_y \sqrt{1 - \rho^2}} \exp\left( -\frac{1}{2(1 - \rho^2)} \left[ \frac{(x - \mu_x)^2}{\sigma_x^2} - \frac{2\rho (x - \mu_x)(y - \mu_y)}{\sigma_x \sigma_y} + \frac{(y - \mu_y)^2}{\sigma_y^2} \right] \right)

Note especially that if \rho = 0,
f_X(x) = N(\mu_x, \sigma_x^2) and f_Y(y) = N(\mu_y, \sigma_y^2)
f(x, y) = f_X(x) f_Y(y), i.e. normal r.v.s are uncorrelated iff independent

[Figure: scatter plots of bivariate normal samples with rho = 0.5, rho = 0.8, and rho = 0.]
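A sketch of how such panels can be produced (numpy assumed): draw bivariate normal samples with a chosen rho and check that the sample correlation matches.

```python
import numpy as np

rng = np.random.default_rng(5)
rho = 0.8
cov = [[1.0, rho], [rho, 1.0]]     # unit variances, correlation rho
xy = rng.multivariate_normal([0.0, 0.0], cov, size=100_000)
print(np.corrcoef(xy[:, 0], xy[:, 1])[0, 1])   # ~0.8
```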


Conditional density

As with the other set theoretical results, we also extend the conditional probability

P(A|B) = P(A \cap B)/P(B)

to random variables.

Conditional density: For two random variables X and Y with marginal densities f_X and f_Y and joint density f(x, y), assuming f_Y(y) > 0 for some constant y, the function

f(x|y) := f(x|Y = y) := \frac{f(x, y)}{f_Y(y)}

is called the conditional density of X given the event \{Y = y\}.

Note: X \perp Y \Rightarrow f(x|y) = f_X(x)
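A sketch (scipy assumed) of forming a conditional density numerically from a joint density; for f(x, y) = x + y on (0, 1)^2 the closed form is f(x|y) = (x + y)/(y + 1/2):

```python
from scipy import integrate

f = lambda x, y: x + y

def f_cond(x, y):
    fy, _ = integrate.quad(lambda s: f(s, y), 0, 1)   # marginal f_Y(y)
    return f(x, y) / fy

print(f_cond(0.3, 0.5))   # 0.8 / 1.0, matches the closed form
print(integrate.quad(lambda x: f_cond(x, 0.5), 0, 1)[0])  # integrates to 1
```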


Worked example
Let X and Y have the joint density f(x, y) = \lambda^2 e^{-\lambda y} for 0 < x < y. What are the conditional densities?

f_X(x) = \int_x^\infty \lambda^2 e^{-\lambda y}\, dy = \lambda e^{-\lambda x} \sim Exp(\lambda)

f_Y(y) = \int_0^y \lambda^2 e^{-\lambda y}\, dx = \lambda^2 y e^{-\lambda y} \sim Gamma(2, 1/\lambda)

therefore

f_{X|y}(x) = \frac{f(x, y)}{f_Y(y)} = \frac{1}{y} \sim Unif([0, y])

f_{Y|x}(y) = \frac{f(x, y)}{f_X(x)} = \lambda e^{-\lambda (y - x)} \sim shifted-Exp(\lambda, x)
[Figure: the joint density f(x, y), the marginal of X, and the marginal of Y, drawn for lambda = 2.]
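A simulation sketch (numpy assumed) of the two-stage structure the conditionals imply, with lambda = 2: draw Y from its Gamma marginal, then X | Y = y uniformly on (0, y); marginally X should then be Exp(lambda):

```python
import numpy as np

rng = np.random.default_rng(6)
lam = 2.0
y = rng.gamma(2.0, 1.0 / lam, size=200_000)   # Y ~ Gamma(2, 1/lambda)
x = rng.uniform(0.0, y)                       # X | Y = y ~ Unif(0, y)

# X should be Exp(lambda): mean and sd both 1/lambda = 0.5.
print(x.mean(), x.std())
```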


Another example: inversion


Recall also Bayes' theorem:

f(x|y) = \frac{f(y|x) f_X(x)}{f_Y(y)}, \quad f_Y(y) > 0

Example: We have in orbit a space probe that measures the interval X > 0 between detections of high-energy particles. The probe sends the measurement back to base, but unfortunately the signal is corrupted by noise N, so that we observe Y = X + N. What is the most probable value of X = x if we observe Y = y?

Make the assumptions X \sim Exp(\lambda) and N \sim N(0, \sigma^2).

The question becomes: what is the maximum point of f(x|y)?

We need f(y|x); reason as follows: for a fixed X = x the only randomness in Y is N. Since N = Y - X, or n = y - x, we can write approximately

f(y|x) = f_N(n) = f_N(y - x).

Inversion example contd.


We now have, using Bayes' theorem,

f(x|y) = \frac{f(y|x) f_X(x)}{f_Y(y)} = \frac{1}{f_Y(y)} f_N(y - x) f_X(x)
= \frac{1}{f_Y(y)} \cdot \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{1}{2\sigma^2}(y - x)^2} \cdot \lambda e^{-\lambda x}
= \frac{1}{f_Y(y)} \cdot \frac{\lambda}{\sqrt{2\pi}\,\sigma} e^{-\frac{1}{2\sigma^2}(y^2 - 2xy + x^2) - \lambda x}

As we want the maximum, it suffices to minimize

h(x) := \frac{1}{2\sigma^2}(y^2 - 2xy + x^2) + \lambda x

which by differentiation and setting to 0 leads to x = y - \lambda \sigma^2.

If y > \lambda \sigma^2, take x = y - \lambda \sigma^2 as the best estimate of the true signal. Otherwise we better say x = 0.
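A sketch (numpy assumed) of the estimate just derived, x = max(y - \lambda \sigma^2, 0), tried on a few simulated transmissions with lambda and sigma taken as known:

```python
import numpy as np

def map_estimate(y, lam, sigma):
    # Maximum point of f(x|y), clipped to 0 since X > 0.
    return max(y - lam * sigma**2, 0.0)

rng = np.random.default_rng(7)
lam, sigma = 2.0, 0.3
x_true = rng.exponential(1.0 / lam, size=5)       # X ~ Exp(lambda)
y_obs = x_true + rng.normal(0.0, sigma, size=5)   # Y = X + N
for xt, yo in zip(x_true, y_obs):
    print(f"x = {xt:.3f}  y = {yo:.3f}  estimate = {map_estimate(yo, lam, sigma):.3f}")
```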

Convolution
What is the density of Z = X + Y for some r.v.s X and Y? We can derive it:

F_Z(z) = P(Z < z) = P(X + Y < z)
= \int_{-\infty}^{\infty} \int_{-\infty}^{z - y} f(x, y)\, dx\, dy
\{x := v - y\} = \int_{-\infty}^{\infty} \int_{-\infty}^{z} f(v - y, y)\, dv\, dy
= \int_{-\infty}^{z} \int_{-\infty}^{\infty} f(v - y, y)\, dy\, dv

so

f_Z(z) = \frac{d}{dz} F_Z(z) = \int_{-\infty}^{\infty} f(z - y, y)\, dy

The density of the sum is called the convolution.

Especially, if X \perp Y:

f_Z(z) = \int_{-\infty}^{\infty} f_X(z - y) f_Y(y)\, dy
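A sketch (numpy assumed) of the convolution integral discretized on a grid, checked against a known result: the sum of two independent Exp(1) r.v.s has the Gamma(2, 1) density z e^{-z}.

```python
import numpy as np

dz = 0.01
z = np.arange(0, 20, dz)
fx = np.exp(-z)                          # Exp(1) density on the grid
fz = np.convolve(fx, fx)[: len(z)] * dz  # discretized convolution integral
print(np.max(np.abs(fz - z * np.exp(-z))))   # ~dz: matches z * exp(-z)
```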


Example of convolution

Example: Let X \sim Poisson(\lambda_1) and Y \sim Poisson(\lambda_2) with X \perp Y. Show that Z := X + Y \sim Poisson(\lambda_1 + \lambda_2).

The discrete convolution has sums instead of integrals:

f_Z(z) = \sum_y f_X(z - y) f_Y(y) = \sum_{y=0}^{z} \frac{\lambda_1^{z-y}}{(z-y)!} e^{-\lambda_1} \frac{\lambda_2^{y}}{y!} e^{-\lambda_2}
= \frac{e^{-(\lambda_1 + \lambda_2)}}{z!} \sum_{y=0}^{z} \frac{z!}{y!(z-y)!} \lambda_1^{z-y} \lambda_2^{y}
= \frac{(\lambda_1 + \lambda_2)^z}{z!} e^{-(\lambda_1 + \lambda_2)} \sim Pois(\lambda_1 + \lambda_2)

where we used the binomial formula. In general, for independent X_i \sim Pois(\lambda_i), \sum_i X_i \sim Pois(\sum_i \lambda_i).
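A minimal simulation sketch (numpy assumed) of the result:

```python
import numpy as np

rng = np.random.default_rng(8)
lam1, lam2 = 1.5, 2.5
z = rng.poisson(lam1, 100_000) + rng.poisson(lam2, 100_000)
print(z.mean(), z.var())   # both ~4.0, consistent with Poisson(4)
```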


Summary

Today's menu was


Covariance and correlation: Mutual variation of random variables
Conditional distribution: Density of X given {Y = y}
Convolution: The density of a sum of random variables

Tomorrow we have two important limit theorems and also recap the
course so far.

