
Chapter 2

Multivariate Distributions
2.1 Distributions of Two Random Variables
2.2 Conditional Distributions and Expectations
2.3 The Correlation Coefficient
2.4 Independent Random Variables
2.5 Extension to Several Random Variables
2.1 Distributions of Two Random Variables
We begin the discussion of two random variables with the following example. A coin is to be tossed three times and our interest is in the ordered number pair (number of H's on the first two tosses, number of H's on all three tosses), where H and T represent, respectively, heads and tails. Thus the sample space is $\mathcal{C} = \{c : c = c_i,\ i = 1, 2, \ldots, 8\}$, where $c_1$ is TTT, $c_2$ is TTH, $c_3$ is THT, $c_4$ is HTT, $c_5$ is THH, $c_6$ is HTH, $c_7$ is HHT, and $c_8$ is HHH. Let $X_1$ and $X_2$ be two functions such that
\[
X_1(c_1) = X_1(c_2) = 0, \quad X_1(c_3) = X_1(c_4) = X_1(c_5) = X_1(c_6) = 1, \quad X_1(c_7) = X_1(c_8) = 2;
\]
and
\[
X_2(c_1) = 0, \quad X_2(c_2) = X_2(c_3) = X_2(c_4) = 1, \quad X_2(c_5) = X_2(c_6) = X_2(c_7) = 2, \quad X_2(c_8) = 3.
\]
Thus $X_1$ and $X_2$ are real-valued functions defined on the sample space $\mathcal{C}$, which take us from that sample space to the space of ordered number pairs
\[
\mathcal{A} = \{(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 3)\}.
\]
Thus $X_1$ and $X_2$ are two random variables defined on the space $\mathcal{C}$, and, in this example, the space of these random variables is the two-dimensional set $\mathcal{A}$ given immediately above.
Definition 1. Given a random experiment with a sample space $\mathcal{C}$, consider two random variables $X_1$ and $X_2$, which assign to each element $c$ of $\mathcal{C}$ one and only one ordered pair of numbers $X_1(c) = x_1$, $X_2(c) = x_2$. The space of $X_1$ and $X_2$ is the set of ordered pairs
\[
\mathcal{A} = \{(x_1, x_2) : x_1 = X_1(c),\ x_2 = X_2(c),\ c \in \mathcal{C}\}.
\]
Let $\mathcal{A}$ be the space associated with the two random variables $X_1$ and $X_2$ and let $A$ be a subset of $\mathcal{A}$. As in the case of one random variable, we shall speak of the event $A$.
Under certain restrictions on the space $\mathcal{A}$ and the function $f > 0$ on $\mathcal{A}$, we say that the two random variables $X$ and $Y$ are of the discrete type or of the continuous type, and have a distribution of that type, according as the probability set function $P(A)$, $A \subset \mathcal{A}$, can be expressed as
\[
P(A) = \Pr[(X, Y) \in A] = \sum_{A} f(x, y),
\]
or as
\[
P(A) = \Pr[(X, Y) \in A] = \iint_{A} f(x, y)\,dx\,dy.
\]
In either case $f$ is called the p.d.f. of the two random variables $X$ and $Y$. Of necessity, $P(\mathcal{A}) = 1$ in each case.
Example 1. (Page 77) Let
\[
f(x, y) =
\begin{cases}
6x^2 y, & 0 < x < 1,\ 0 < y < 1, \\
0, & \text{elsewhere},
\end{cases}
\]
be the p.d.f. of two random variables $X$ and $Y$, which must be of the continuous type. We have, for instance,
\[
\Pr\left(0 < X < \tfrac{3}{4},\ \tfrac{1}{3} < Y < 2\right)
= \int_{1/3}^{2} \int_{0}^{3/4} f(x, y)\,dx\,dy
= \int_{1/3}^{1} \int_{0}^{3/4} 6x^2 y\,dx\,dy + \int_{1}^{2} \int_{0}^{3/4} 0\,dx\,dy
= \tfrac{3}{8} + 0 = \tfrac{3}{8}.
\]
Note that this probability is the volume under the surface $f(x, y) = 6x^2 y$ and above the rectangular set $\{(x, y) : 0 < x < \tfrac{3}{4},\ \tfrac{1}{3} < y < 1\}$ in the $xy$-plane.
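This probability is easy to sanity-check numerically. A minimal sketch (not part of the original text) that integrates the density over the effective rectangle with SciPy:

```python
# Numerical check of Example 1: Pr(0 < X < 3/4, 1/3 < Y < 2) for f(x, y) = 6 x^2 y
# on the unit square.  The density vanishes for y >= 1, so that part of the
# y-range contributes nothing.
from scipy.integrate import dblquad

# dblquad expects the integrand as func(y, x), with x as the outer variable here.
prob, _ = dblquad(lambda y, x: 6 * x**2 * y, 0, 3/4, 1/3, 1)
print(prob)  # approximately 0.375 = 3/8
```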
Let the random variables $X$ and $Y$ have the probability set function $P(A)$, where $A$ is a two-dimensional set. If $A$ is the unbounded set $\{(u, v) : u \le x,\ v \le y\}$, where $x$ and $y$ are real numbers, we have
\[
P(A) = \Pr[(X, Y) \in A] = \Pr(X \le x,\ Y \le y).
\]
This function of the point $(x, y)$ is called the distribution function of $X$ and $Y$ and is denoted by
\[
F(x, y) = \Pr(X \le x,\ Y \le y).
\]
If $X$ and $Y$ are random variables of the continuous type that have p.d.f. $f(x, y)$, then
\[
F(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f(u, v)\,du\,dv.
\]
Accordingly, at points of continuity of $f(x, y)$, we have
\[
\frac{\partial^2 F(x, y)}{\partial x\,\partial y} = f(x, y).
\]
It is left as an exercise to show, in every case, that
\[
\Pr(a < X \le b,\ c < Y \le d) = F(b, d) - F(b, c) - F(a, d) + F(a, c)
\]
for all real constants $a < b$, $c < d$.
Let $f(x_1, x_2)$ be the p.d.f. of two random variables $X_1$ and $X_2$. Then
\[
\Pr(a < X_1 < b,\ -\infty < X_2 < \infty) = \int_{a}^{b} \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2\,dx_1
\]
for the continuous case, and
\[
\Pr(a < X_1 < b,\ -\infty < X_2 < \infty) = \sum_{a < x_1 < b} \sum_{x_2} f(x_1, x_2)
\]
for the discrete case. Now each of
\[
\int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2 \qquad \text{and} \qquad \sum_{x_2} f(x_1, x_2)
\]
is a function of $x_1$ alone, say $f_1(x_1)$.
The marginal p.d.f. of $X_1$ is given by
\[
f_1(x_1) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2 \quad \text{(continuous case)},
\]
\[
f_1(x_1) = \sum_{x_2} f(x_1, x_2) \quad \text{(discrete case)}.
\]
Thus, for every $a < b$, we have
\[
\Pr(a < X_1 < b) = \int_{a}^{b} f_1(x_1)\,dx_1 \quad \text{(continuous case)},
\]
\[
\Pr(a < X_1 < b) = \sum_{a < x_1 < b} f_1(x_1) \quad \text{(discrete case)}.
\]
The marginal p.d.f. of $X_2$ is given by
\[
f_2(x_2) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_1 \quad \text{(continuous case)},
\]
\[
f_2(x_2) = \sum_{x_1} f(x_1, x_2) \quad \text{(discrete case)}.
\]
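As an illustration of these formulas, here is a small symbolic sketch (the density $6x^2y$ from Example 1 is used only for convenience; it is not prescribed at this point in the text):

```python
# Marginal p.d.f.s of f(x, y) = 6 x^2 y, 0 < x < 1, 0 < y < 1 (zero elsewhere),
# obtained by integrating out the other variable over its support.
import sympy as sp

x, y = sp.symbols('x y', positive=True)
f = 6 * x**2 * y

f1 = sp.integrate(f, (y, 0, 1))  # marginal of X: 3*x**2 on (0, 1)
f2 = sp.integrate(f, (x, 0, 1))  # marginal of Y: 2*y on (0, 1)
print(f1, f2)
```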
Exercise 2.1. (Page 81)
Let $f(x_1, x_2) = 4x_1 x_2$, $0 < x_1 < 1$, $0 < x_2 < 1$, zero elsewhere, be the p.d.f. of $X_1$ and $X_2$. Find $\Pr(0 < X_1 < \tfrac{1}{2},\ \tfrac{1}{4} < X_2 < 1)$, $\Pr(X_1 = X_2)$, $\Pr(X_1 < X_2)$, and $\Pr(X_1 \le X_2)$.
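A Monte Carlo sketch can be used to check hand computations for this exercise. It assumes (something the form of the density itself suggests) that $4x_1x_2$ factors into two densities $2x$ on $(0,1)$, each of which can be sampled by inverse transform:

```python
# Monte Carlo sanity check for Exercise 2.1.
import numpy as np

rng = np.random.default_rng(0)
n = 10**6
x1 = np.sqrt(rng.random(n))   # inverse-CDF sampling: F(x) = x^2 for density 2x on (0, 1)
x2 = np.sqrt(rng.random(n))

print(np.mean((x1 < 0.5) & (x2 > 0.25)))  # estimate of Pr(0 < X1 < 1/2, 1/4 < X2 < 1)
print(np.mean(x1 == x2))                  # Pr(X1 = X2) is zero for continuous variables
print(np.mean(x1 < x2))                   # estimate of Pr(X1 < X2)
print(np.mean(x1 <= x2))                  # estimate of Pr(X1 <= X2)
```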
2.2 Conditional Distributions and Expectations
Let $f(x_1, x_2)$ be the joint p.d.f. of the random variables $X_1$ and $X_2$.
The conditional p.d.f. of the random variable $X_1$, given that the random variable $X_2$ has the value $x_2$, is defined by
\[
f_{1|2}(x_1 \mid x_2) = \frac{f(x_1, x_2)}{f_2(x_2)}, \qquad f_2(x_2) > 0.
\]
The conditional p.d.f. of the random variable $X_2$, given that the random variable $X_1$ has the value $x_1$, is defined by
\[
f_{2|1}(x_2 \mid x_1) = \frac{f(x_1, x_2)}{f_1(x_1)}, \qquad f_1(x_1) > 0.
\]
Since each of $f_{2|1}(x_2 \mid x_1)$ and $f_{1|2}(x_1 \mid x_2)$ is a p.d.f. of one random variable (whether of the discrete or the continuous type), each has all the properties of such a p.d.f.
The probability
\[
\Pr(a < X_2 < b \mid X_1 = x_1) = \int_{a}^{b} f_{2|1}(x_2 \mid x_1)\,dx_2
\]
is called the conditional probability that $a < X_2 < b$, given $X_1 = x_1$. Similarly, the conditional probability that $c < X_1 < d$, given $X_2 = x_2$, is
\[
\Pr(c < X_1 < d \mid X_2 = x_2) = \int_{c}^{d} f_{1|2}(x_1 \mid x_2)\,dx_1.
\]
If $u(X_2)$ is a function of $X_2$, the expectation
\[
E[u(X_2) \mid x_1] = \int_{-\infty}^{\infty} u(x_2)\,f_{2|1}(x_2 \mid x_1)\,dx_2
\]
is called the conditional expectation of $u(X_2)$, given $X_1 = x_1$.
The mean and the variance of the conditional distribution of $X_2$, given $X_1 = x_1$, are $E(X_2 \mid x_1)$ and $E\{[X_2 - E(X_2 \mid x_1)]^2 \mid x_1\}$, respectively. We have
\[
\operatorname{var}(X_2 \mid x_1) = E(X_2^2 \mid x_1) - [E(X_2 \mid x_1)]^2.
\]
The conditional expectation of $u(X_1)$, given $X_2 = x_2$, is given by
\[
E[u(X_1) \mid x_2] = \int_{-\infty}^{\infty} u(x_1)\,f_{1|2}(x_1 \mid x_2)\,dx_1.
\]
Example 1. Let $X_1$ and $X_2$ have the joint p.d.f.
\[
f(x_1, x_2) =
\begin{cases}
2, & 0 < x_1 < x_2 < 1, \\
0, & \text{elsewhere}.
\end{cases}
\]
Then the marginal probability density functions are, respectively,
\[
f_1(x_1) = \int_{x_1}^{1} 2\,dx_2 = \Big[\,2x_2\,\Big]_{x_1}^{1} = 2(1 - x_1), \quad 0 < x_1 < 1, \qquad \text{zero elsewhere},
\]
and
\[
f_2(x_2) = \int_{0}^{x_2} 2\,dx_1 = \Big[\,2x_1\,\Big]_{0}^{x_2} = 2x_2, \quad 0 < x_2 < 1, \qquad \text{zero elsewhere}.
\]
The conditional p.d.f. of $X_1$, given $X_2 = x_2$, $0 < x_2 < 1$, is
\[
f_{1|2}(x_1 \mid x_2) = \frac{2}{2x_2} = \frac{1}{x_2}, \quad 0 < x_1 < x_2, \qquad \text{zero elsewhere}.
\]
Here the conditional mean and conditional variance of $X_1$, given $X_2 = x_2$, are, respectively,
\[
E(X_1 \mid x_2) = \int_{-\infty}^{\infty} x_1 f_{1|2}(x_1 \mid x_2)\,dx_1
= \int_{0}^{x_2} x_1 \left(\frac{1}{x_2}\right) dx_1
= \frac{1}{x_2}\left[\frac{x_1^2}{2}\right]_{0}^{x_2}
= \frac{x_2}{2}, \quad 0 < x_2 < 1,
\]
and
\[
\operatorname{var}(X_1 \mid x_2) = \int_{0}^{x_2} \left(x_1 - \frac{x_2}{2}\right)^2 \left(\frac{1}{x_2}\right) dx_1
= \frac{1}{x_2}\left[\frac{1}{3}\left(x_1 - \frac{x_2}{2}\right)^3\right]_{0}^{x_2}
= \frac{x_2^2}{12}, \quad 0 < x_2 < 1.
\]
Finally, we shall compare the values of
\[
\Pr\left(0 < X_1 < \tfrac{1}{2} \,\middle|\, X_2 = \tfrac{3}{4}\right)
\qquad \text{and} \qquad
\Pr\left(0 < X_1 < \tfrac{1}{2}\right).
\]
We have
\[
\Pr\left(0 < X_1 < \tfrac{1}{2} \,\middle|\, X_2 = \tfrac{3}{4}\right)
= \int_{0}^{1/2} f_{1|2}\left(x_1 \,\middle|\, \tfrac{3}{4}\right) dx_1
= \int_{0}^{1/2} \tfrac{4}{3}\,dx_1
= \tfrac{2}{3},
\]
but
\[
\Pr\left(0 < X_1 < \tfrac{1}{2}\right)
= \int_{0}^{1/2} f_1(x_1)\,dx_1
= \int_{0}^{1/2} 2(1 - x_1)\,dx_1
= \tfrac{3}{4}.
\]
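These values can be checked numerically. A minimal sketch (not part of the text) using one-dimensional numerical integration:

```python
# Numerical check of the conditional mean, conditional variance, and the two
# probabilities compared above, for f(x1, x2) = 2 on 0 < x1 < x2 < 1, so that
# f_{1|2}(x1 | x2) = 1/x2 on (0, x2) and f_1(x1) = 2(1 - x1) on (0, 1).
from scipy.integrate import quad

x2 = 0.75
cond_pdf = lambda x1: 1 / x2                              # uniform on (0, x2)
mean, _ = quad(lambda x1: x1 * cond_pdf(x1), 0, x2)
var, _ = quad(lambda x1: (x1 - mean)**2 * cond_pdf(x1), 0, x2)
print(mean, var)                      # ~0.375 = x2/2 and ~0.046875 = x2^2/12

p_cond, _ = quad(cond_pdf, 0, 0.5)                        # Pr(0 < X1 < 1/2 | X2 = 3/4) ~ 2/3
p_uncond, _ = quad(lambda x1: 2 * (1 - x1), 0, 0.5)       # Pr(0 < X1 < 1/2) ~ 3/4
print(p_cond, p_uncond)
```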
Since $E(X_2 \mid x_1)$ is a function of $x_1$, $E(X_2 \mid X_1)$ is itself a random variable with its own distribution, mean, and variance.
Consider now the expectation of a function of two random variables, say $u(X_1, X_2)$. Of course, $Y = u(X_1, X_2)$ is a random variable and has a p.d.f., say $g(y)$, and
\[
E(Y) = \int_{-\infty}^{\infty} y\,g(y)\,dy.
\]
However, it can be proved that $E(Y)$ equals
\[
E[u(X_1, X_2)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} u(x_1, x_2)\,f(x_1, x_2)\,dx_1\,dx_2.
\]
We call $E[u(X_1, X_2)]$ the expectation (mathematical expectation or expected value) of $u(X_1, X_2)$, and it can be shown to be a linear operator, as in the one-variable case.
We also note that the expected value of $X_2$ can be found in two ways:
\[
E(X_2) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x_2\,f(x_1, x_2)\,dx_1\,dx_2 = \int_{-\infty}^{\infty} x_2\,f_2(x_2)\,dx_2.
\]
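A quick numerical illustration (a sketch using the density $6x_1^2 x_2$ on the unit square, chosen only for convenience; the identity holds for any joint p.d.f.):

```python
# E(X2) computed two ways: from the joint p.d.f. and from the marginal f2.
from scipy.integrate import dblquad, quad

joint = lambda x2, x1: 6 * x1**2 * x2              # dblquad wants f(inner, outer)
e_joint, _ = dblquad(lambda x2, x1: x2 * joint(x2, x1), 0, 1, 0, 1)

marginal = lambda x2: 2 * x2                       # f2(x2) = integral of 6 x1^2 x2 over x1
e_marginal, _ = quad(lambda x2: x2 * marginal(x2), 0, 1)

print(e_joint, e_marginal)                         # both approximately 2/3
```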
2.3 The Correlation Coefficient
Let $X$, $Y$, and $Z$ denote random variables that have joint p.d.f. $f(x, y, z)$.
The means of $X$, $Y$, and $Z$ are denoted by $\mu_1$, $\mu_2$, and $\mu_3$, respectively.
The variances of $X$, $Y$, and $Z$ are denoted by $\sigma_1^2$, $\sigma_2^2$, and $\sigma_3^2$, respectively.
The covariance of $X$ and $Y$ is given by
\[
E[(X - \mu_1)(Y - \mu_2)] = E(XY) - \mu_1 \mu_2.
\]
If each of $\sigma_1$ and $\sigma_2$ is positive, the number
\[
\rho_{12} = \frac{E[(X - \mu_1)(Y - \mu_2)]}{\sigma_1 \sigma_2}
\]
is called the correlation coefficient of $X$ and $Y$.
If $E(e^{t_1 X + t_2 Y})$ exists for $-h_1 < t_1 < h_1$, $-h_2 < t_2 < h_2$, where $h_1$ and $h_2$ are positive, it is denoted by $M(t_1, t_2)$ and is called the moment-generating function of the joint distribution of $X$ and $Y$.
Facts:
\[
M(t_1, 0) = E(e^{t_1 X}) = M(t_1)
\qquad \text{and} \qquad
M(0, t_2) = E(e^{t_2 Y}) = M(t_2),
\]
the moment-generating functions of the marginal distributions of $X$ and $Y$, respectively.
In the case of random variables of the continuous type,
\[
\frac{\partial^{k+m} M(t_1, t_2)}{\partial t_1^{k}\,\partial t_2^{m}}
= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x^{k} y^{m} e^{t_1 x + t_2 y} f(x, y)\,dx\,dy,
\]
so that
\[
\left. \frac{\partial^{k+m} M(t_1, t_2)}{\partial t_1^{k}\,\partial t_2^{m}} \right|_{t_1 = t_2 = 0} = E(X^{k} Y^{m}).
\]
Examples:
\[
\mu_1 = E(X) = \frac{\partial M(0, 0)}{\partial t_1},
\qquad
\mu_2 = E(Y) = \frac{\partial M(0, 0)}{\partial t_2},
\]
\[
\sigma_1^2 = E(X^2) - \mu_1^2 = \frac{\partial^2 M(0, 0)}{\partial t_1^2} - \mu_1^2,
\qquad
\sigma_2^2 = E(Y^2) - \mu_2^2 = \frac{\partial^2 M(0, 0)}{\partial t_2^2} - \mu_2^2,
\]
\[
E[(X - \mu_1)(Y - \mu_2)] = \frac{\partial^2 M(0, 0)}{\partial t_1\,\partial t_2} - \mu_1 \mu_2.
\]
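These differentiation formulas are straightforward to apply with a computer algebra system. A sketch (the m.g.f. used below is just a convenient example of a joint m.g.f.; it is the one that reappears in Exercise 2.33):

```python
# Moments from a joint m.g.f. by partial differentiation at t1 = t2 = 0.
import sympy as sp

t1, t2 = sp.symbols('t1 t2')
M = 1 / ((1 - t1) * (1 - t2))

mu1 = sp.diff(M, t1).subs({t1: 0, t2: 0})                    # E(X)
mu2 = sp.diff(M, t2).subs({t1: 0, t2: 0})                    # E(Y)
var1 = sp.diff(M, t1, 2).subs({t1: 0, t2: 0}) - mu1**2       # Var(X)
cov = sp.diff(M, t1, t2).subs({t1: 0, t2: 0}) - mu1 * mu2    # Cov(X, Y)
print(mu1, mu2, var1, cov)                                   # 1, 1, 1, 0
```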
Theorem 2.4.1. Suppose $(X, Y)$ have a joint distribution with the variances of $X$ and $Y$ finite and positive. Denote the means and variances of $X$ and $Y$ by $\mu_1$, $\mu_2$ and $\sigma_1^2$, $\sigma_2^2$, respectively, and let $\rho$ be the correlation coefficient between $X$ and $Y$. If $E(Y \mid X)$ is linear in $X$, then
\[
E(Y \mid X) = \mu_2 + \rho \frac{\sigma_2}{\sigma_1}(X - \mu_1) \tag{2.4.1}
\]
and
\[
E(\operatorname{Var}(Y \mid X)) = \sigma_2^2 (1 - \rho^2). \tag{2.4.2}
\]
Note: If $E(X \mid Y)$ is linear in $Y$, then
\[
E(X \mid Y) = \mu_1 + \rho \frac{\sigma_1}{\sigma_2}(Y - \mu_2)
\]
and
\[
E(\operatorname{Var}(X \mid Y)) = \sigma_1^2 (1 - \rho^2).
\]
Example 2.4.2. Let the random variables $X$ and $Y$ have the linear conditional means $E(Y \mid x) = 4x + 3$ and $E(X \mid y) = \frac{1}{16}y - 3$. In accordance with the general formulas for the linear conditional means, we see that $E(Y \mid x) = \mu_2$ if $x = \mu_1$ and $E(X \mid y) = \mu_1$ if $y = \mu_2$. Accordingly, in this special case, we have $\mu_2 = 4\mu_1 + 3$ and $\mu_1 = \frac{1}{16}\mu_2 - 3$, so that $\mu_1 = -\frac{15}{4}$ and $\mu_2 = -12$. The general formulas for the linear conditional means also show that the product of the coefficients of $x$ and $y$, respectively, is equal to $\rho^2$ and that the quotient of these coefficients is equal to $\sigma_2^2 / \sigma_1^2$. Here $\rho^2 = 4\left(\frac{1}{16}\right) = \frac{1}{4}$, with $\rho = \frac{1}{2}$ (not $-\frac{1}{2}$), and $\sigma_2^2 / \sigma_1^2 = 64$.
Thus, from the two linear conditional means, we are able to find the values of $\mu_1$, $\mu_2$, $\rho$, and $\sigma_2 / \sigma_1$, but not the values of $\sigma_1$ and $\sigma_2$.
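The arithmetic in Example 2.4.2 can be reproduced mechanically; a small sketch that solves the two mean equations and reads $\rho$ and $\sigma_2/\sigma_1$ off the slopes:

```python
# Example 2.4.2: recover mu1, mu2, rho, and sigma2/sigma1 from the two
# linear conditional means E(Y|x) = 4x + 3 and E(X|y) = y/16 - 3.
import sympy as sp

mu1, mu2 = sp.symbols('mu1 mu2')
print(sp.solve([sp.Eq(mu2, 4 * mu1 + 3), sp.Eq(mu1, mu2 / 16 - 3)], [mu1, mu2]))
# {mu1: -15/4, mu2: -12}

b1, b2 = sp.Rational(4), sp.Rational(1, 16)   # slopes of E(Y|x) and E(X|y)
rho = sp.sqrt(b1 * b2)        # product of slopes is rho^2; both slopes positive, so rho > 0
ratio = sp.sqrt(b1 / b2)      # quotient of slopes is sigma2^2/sigma1^2 = 64
print(rho, ratio)             # 1/2 and sigma2/sigma1 = 8
```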
2.4 Independent Random Variables
Definition 2. Let the random variables $X_1$ and $X_2$ have the joint p.d.f. $f(x_1, x_2)$ and the marginal probability density functions $f_1(x_1)$ and $f_2(x_2)$, respectively. The random variables $X_1$ and $X_2$ are said to be independent if, and only if, $f(x_1, x_2) \equiv f_1(x_1)\,f_2(x_2)$. Random variables that are not independent are said to be dependent.
Theorem 1. Let the random variables $X_1$ and $X_2$ have the joint p.d.f. $f(x_1, x_2)$. Then $X_1$ and $X_2$ are stochastically independent if and only if $f(x_1, x_2)$ can be written as the product of a nonnegative function of $x_1$ alone and a nonnegative function of $x_2$ alone. That is,
\[
f(x_1, x_2) \equiv g(x_1)\,h(x_2),
\]
where $g(x_1) > 0$ for $x_1 \in A_1$, zero elsewhere, and $h(x_2) > 0$ for $x_2 \in A_2$, zero elsewhere.
Theorem 2. If $X_1$ and $X_2$ are stochastically independent random variables with marginal probability density functions $f_1(x_1)$ and $f_2(x_2)$, respectively, then
\[
\Pr(a < X_1 < b,\ c < X_2 < d) = \Pr(a < X_1 < b)\,\Pr(c < X_2 < d)
\]
for every $a < b$ and $c < d$, where $a$, $b$, $c$, and $d$ are constants.
Theorem 3. Let the stochastically independent random variables $X_1$ and $X_2$ have the marginal probability density functions $f_1(x_1)$ and $f_2(x_2)$, respectively. The expected value of the product of a function $u(X_1)$ of $X_1$ alone and a function $v(X_2)$ of $X_2$ alone is, subject to their existence, equal to the product of the expected value of $u(X_1)$ and the expected value of $v(X_2)$; that is,
\[
E[u(X_1)\,v(X_2)] = E[u(X_1)]\,E[v(X_2)].
\]
Theorem 4. Let $X_1$ and $X_2$ denote random variables that have the joint p.d.f. $f(x_1, x_2)$ and the marginal probability density functions $f_1(x_1)$ and $f_2(x_2)$, respectively. Furthermore, let $M(t_1, t_2)$ denote the moment-generating function of the distribution. Then $X_1$ and $X_2$ are independent if and only if
\[
M(t_1, t_2) = M(t_1, 0)\,M(0, t_2).
\]
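Theorem 3 is easy to illustrate by simulation. A rough sketch; the distributions and the functions $u$, $v$ below are arbitrary choices made only for illustration:

```python
# Simulation check of Theorem 3: for independent X1 and X2,
# E[u(X1) v(X2)] = E[u(X1)] * E[v(X2)].
import numpy as np

rng = np.random.default_rng(1)
n = 10**6
x1 = rng.uniform(0, 1, n)        # X1 ~ Uniform(0, 1)
x2 = rng.exponential(1.0, n)     # X2 ~ Exponential(1), drawn independently of X1

u = x1**2                        # u(X1) = X1^2, so E[u(X1)] = 1/3
v = np.exp(-x2)                  # v(X2) = exp(-X2), so E[v(X2)] = 1/2
print(np.mean(u * v), np.mean(u) * np.mean(v))   # both approximately 1/6
```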
2.5 Extension to Several Random Variables
The notions about two random variables can be extended immediately to $n$ random variables. We make the following definition of the space of $n$ random variables.
Definition 3. Consider a random experiment with the sample space $\mathcal{C}$. Let the random variable $X_i$ assign to each element $c \in \mathcal{C}$ one and only one real number $X_i(c) = x_i$, $i = 1, 2, \ldots, n$. The space of these random variables is the set of ordered $n$-tuples
\[
\mathcal{A} = \{(x_1, x_2, \ldots, x_n) : x_1 = X_1(c),\ \ldots,\ x_n = X_n(c),\ c \in \mathcal{C}\}.
\]
Further, let $A$ be a subset of $\mathcal{A}$. Then $\Pr[(X_1, \ldots, X_n) \in A] = P(C)$, where
\[
C = \{c : c \in \mathcal{C} \text{ and } [X_1(c), X_2(c), \ldots, X_n(c)] \in A\}.
\]
The probability set function $P(A)$, $A \subset \mathcal{A}$, can be expressed as
\[
P(A) = \Pr[(X_1, \ldots, X_n) \in A] = \sum \cdots \sum_{A} f(x_1, \ldots, x_n)
\]
or as
\[
P(A) = \Pr[(X_1, \ldots, X_n) \in A] = \int \cdots \int_{A} f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n.
\]
The distribution function of the $n$ random variables $X_1, X_2, \ldots, X_n$ is the point function
\[
F(x_1, x_2, \ldots, x_n) = \Pr(X_1 \le x_1,\ X_2 \le x_2,\ \ldots,\ X_n \le x_n).
\]
Example 1. Let $f(x, y, z) = e^{-(x+y+z)}$, $0 < x, y, z < \infty$, zero elsewhere, be the p.d.f. of the random variables $X$, $Y$, and $Z$. Then the distribution function of $X$, $Y$, and $Z$ is given by
\[
\begin{aligned}
F(x, y, z) &= \Pr(X \le x,\ Y \le y,\ Z \le z)
= \int_{0}^{z} \int_{0}^{y} \int_{0}^{x} e^{-u-v-w}\,du\,dv\,dw \\
&= \int_{0}^{z} \int_{0}^{y} e^{-v-w}\left(1 - e^{-x}\right) dv\,dw \\
&= \left(1 - e^{-x}\right)\left(1 - e^{-y}\right)\left(1 - e^{-z}\right), \qquad 0 \le x, y, z < \infty,
\end{aligned}
\]
and is equal to zero elsewhere. Incidentally, except for a set of probability measure zero, we have
\[
\frac{\partial^3 F(x, y, z)}{\partial x\,\partial y\,\partial z} = f(x, y, z).
\]
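A quick numerical check of the factored form of $F$ at one arbitrary point (a sketch, not part of the text):

```python
# Check F(x, y, z) = (1 - e^-x)(1 - e^-y)(1 - e^-z) for f(x, y, z) = e^{-(x+y+z)}
# by triple numerical integration at a single point.
import numpy as np
from scipy.integrate import tplquad

x, y, z = 0.5, 1.0, 2.0
# tplquad integrates func(w, v, u) with u as the outermost variable; the limit
# pairs are given outermost first.
F_num, _ = tplquad(lambda w, v, u: np.exp(-(u + v + w)), 0, x, 0, y, 0, z)
F_closed = (1 - np.exp(-x)) * (1 - np.exp(-y)) * (1 - np.exp(-z))
print(F_num, F_closed)   # the two values should agree closely
```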
Let $X_1, X_2, \ldots, X_n$ be random variables having p.d.f. $f(x_1, x_2, \ldots, x_n)$ and let $u(X_1, X_2, \ldots, X_n)$ be a function of these variables such that the $n$-fold integral
\[
\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} u(x_1, \ldots, x_n)\,f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n \tag{1}
\]
exists, if the random variables are of the continuous type, or such that the $n$-fold sum
\[
\sum_{x_1} \cdots \sum_{x_n} u(x_1, \ldots, x_n)\,f(x_1, \ldots, x_n) \tag{2}
\]
exists, if the random variables are of the discrete type. The $n$-fold integral (or the $n$-fold sum, as the case may be) is called the mathematical expectation, denoted by
\[
E[u(X_1, X_2, \ldots, X_n)],
\]
of the function $u(X_1, X_2, \ldots, X_n)$.
Let the random variables $X_1, \ldots, X_n$ have the joint p.d.f. $f(x_1, \ldots, x_n)$. If the random variables are of the continuous type, then by an argument similar to the two-variable case, we have, for every $a < b$,
\[
\Pr(a < X_1 < b) = \int_{a}^{b} f_1(x_1)\,dx_1,
\]
where $f_1(x_1)$ is defined by the $(n-1)$-fold integral
\[
f_1(x_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \ldots, x_n)\,dx_2 \cdots dx_n.
\]
Therefore, $f_1(x_1)$ is the p.d.f. of the one random variable $X_1$, and $f_1(x_1)$ is called the marginal p.d.f. of $X_1$. The marginal probability density functions $f_2(x_2), \ldots, f_n(x_n)$ of $X_2, \ldots, X_n$, respectively, are similar $(n-1)$-fold integrals.
Let $f(x_1, x_2, \ldots, x_n)$ be the joint p.d.f. of the $n$ random variables $X_1, X_2, \ldots, X_n$, just as before. Now, however, let us take any group of $k < n$ of these random variables and find the joint p.d.f. of them. This joint p.d.f. is called the marginal p.d.f. of this particular group of $k$ variables.
If $f_1(x_1) > 0$, the symbol $f_{2,\ldots,n|1}(x_2, \ldots, x_n \mid x_1)$ is defined by the relation
\[
f_{2,\ldots,n|1}(x_2, \ldots, x_n \mid x_1) = \frac{f(x_1, x_2, \ldots, x_n)}{f_1(x_1)},
\]
and $f_{2,\ldots,n|1}(x_2, \ldots, x_n \mid x_1)$ is called the joint conditional p.d.f. of $X_2, \ldots, X_n$, given $X_1 = x_1$. The joint conditional p.d.f. of any $n - 1$ random variables, say $X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n$, given $X_i = x_i$, is defined as the joint p.d.f. of $X_1, X_2, \ldots, X_n$ divided by the marginal p.d.f. $f_i(x_i)$, provided that $f_i(x_i) > 0$.
More generally, the joint conditional p.d.f. of $n - k$ of the random variables, for given values of the remaining $k$ variables, is defined as the joint p.d.f. of the $n$ variables divided by the marginal p.d.f. of the particular group of $k$ variables, provided that the latter p.d.f. is positive.
The random variables $X_1, X_2, \ldots, X_n$ are said to be mutually independent if and only if
\[
f(x_1, x_2, \ldots, x_n) = f_1(x_1)\,f_2(x_2) \cdots f_n(x_n) = \prod_{i=1}^{n} f_i(x_i).
\]
The theorem that
\[
E[u(X_1)\,v(X_2)] = E[u(X_1)]\,E[v(X_2)]
\]
for independent random variables $X_1$ and $X_2$ becomes, for mutually independent random variables $X_1, X_2, \ldots, X_n$,
\[
E\left[\prod_{i=1}^{n} u_i(X_i)\right] = \prod_{i=1}^{n} E[u_i(X_i)].
\]
The moment-generating function of the joint distribution of $n$ random variables $X_1, X_2, \ldots, X_n$ is defined as follows. Let
\[
E[\exp(t_1 X_1 + t_2 X_2 + \cdots + t_n X_n)]
\]
exist for $-h_i < t_i < h_i$, $i = 1, 2, \ldots, n$, where each $h_i$ is positive. This expectation is denoted by $M(t_1, t_2, \ldots, t_n)$ and is called the m.g.f. of the joint distribution of $X_1, X_2, \ldots, X_n$. As in the cases of one or two variables, this m.g.f. is unique and uniquely determines the joint distribution of the $n$ variables (and hence all marginal distributions).
For example, the m.g.f. of the marginal distribution of $X_i$ is
\[
M(0, \ldots, 0, t_i, 0, \ldots, 0), \qquad i = 1, 2, \ldots, n;
\]
that of the marginal distribution of $X_i$ and $X_j$ is
\[
M(0, \ldots, 0, t_i, 0, \ldots, 0, t_j, 0, \ldots, 0);
\]
and so on.
Theorem 4 of this chapter can be generalized, and the factorization
\[
M(t_1, t_2, \ldots, t_n) = \prod_{i=1}^{n} M(0, \ldots, 0, t_i, 0, \ldots, 0)
\]
is a necessary and sufficient condition for the mutual independence of $X_1, X_2, \ldots, X_n$.
Exercise 2.33 (Page 107; Exercise 2.5.6, Page 114)
If $f(x_1, x_2) = e^{-x_1 - x_2}$, $0 < x_1 < \infty$, $0 < x_2 < \infty$, zero elsewhere, is the joint p.d.f. of the random variables $X_1$ and $X_2$, show that $X_1$ and $X_2$ are independent and that
\[
M(t_1, t_2) = (1 - t_1)^{-1}(1 - t_2)^{-1}, \qquad t_1 < 1,\ t_2 < 1.
\]
Also show that
\[
E\left(e^{t(X_1 + X_2)}\right) = (1 - t)^{-2}, \qquad t < 1.
\]
Accordingly, find the mean and the variance of $Y = X_1 + X_2$.
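A symbolic sketch that can be used to check work on this exercise (it computes the two m.g.f.'s directly from the density; extracting the mean and variance of $Y$ by differentiating at $t = 0$ is left to the reader):

```python
# Exercise 2.33: joint m.g.f. of f(x1, x2) = exp(-x1 - x2) on the positive
# quadrant, and the m.g.f. of Y = X1 + X2, computed symbolically.
import sympy as sp

x1, x2 = sp.symbols('x1 x2', positive=True)
t1, t2, t = sp.symbols('t1 t2 t', real=True)
f = sp.exp(-x1 - x2)

# sympy reports the convergence conditions (t1 < 1, t2 < 1, t < 1) as part of
# its Piecewise results.
M = sp.integrate(sp.exp(t1 * x1 + t2 * x2) * f, (x1, 0, sp.oo), (x2, 0, sp.oo))
M_Y = sp.integrate(sp.exp(t * (x1 + x2)) * f, (x1, 0, sp.oo), (x2, 0, sp.oo))
print(sp.simplify(M))    # expected: 1/((1 - t1)*(1 - t2))
print(sp.simplify(M_Y))  # expected: (1 - t)**(-2)
```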
Exercise 2.39 (Page 114; Exercise 2.6.4, Page 123)
Let $X_1$, $X_2$, $X_3$, and $X_4$ be four independent random variables, each with p.d.f. $f(x) = 3(1 - x)^2$, $0 < x < 1$, zero elsewhere. If $Y$ is the minimum of these four variables, find the distribution function and the p.d.f. of $Y$.
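A simulation sketch for checking a candidate answer to Exercise 2.39; it estimates the distribution function of $Y$ empirically (the closed form is left for the exercise):

```python
# Empirical distribution function of Y = min(X1, X2, X3, X4) for four
# independent draws with p.d.f. 3(1 - x)^2 on (0, 1).  Each Xi has distribution
# function 1 - (1 - x)^3, so inverse-CDF sampling gives X = 1 - (1 - U)^(1/3).
import numpy as np

rng = np.random.default_rng(2)
n = 10**6
x = 1 - (1 - rng.random((n, 4)))**(1 / 3)   # n samples of (X1, X2, X3, X4)
y = x.min(axis=1)

for q in (0.05, 0.10, 0.20):
    print(q, np.mean(y <= q))   # compare with the F_Y(q) you derive by hand
```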