(ENM 503)
Michael A. Carchidi
December 1, 2015
Chapter 7 - Jointly Distributed Random Variables
The following notes are based on the textbook entitled: A First Course in
Probability by Sheldon Ross (9th edition) and these notes can be viewed at
https://canvas.upenn.edu/
after you log in using your PennKey user name and Password.
1. Joint Distribution Functions
We are often interested in probability statements concerning two or more random variables. In order to deal with such probabilities, we define, for any two
random variables X and Y, the joint cumulative probability distribution function
of X and Y by
F(a, b) = P((X ≤ a) ∩ (Y ≤ b))   (1)
for −∞ < a, b < +∞. The cdf of just the random variable X can then be obtained
from Equation (1) by taking the limit of Equation (1) as b → +∞ since

lim_{b→+∞} ((X ≤ a) ∩ (Y ≤ b)) = (X ≤ a) ∩ lim_{b→+∞} (Y ≤ b) = (X ≤ a) ∩ (Y ≤ +∞)

which reduces to

lim_{b→+∞} ((X ≤ a) ∩ (Y ≤ b)) = (X ≤ a)   (2a)

or

lim_{b→+∞} F(a, b) = P(X ≤ a) = F_X(a).   (2b)
The separate distribution functions F_X(a) and F_Y(b) obtained from Equation (1) in this way are sometimes referred to as the marginal distributions of X and Y.
Note that

P((X > a) ∩ (Y > b)) = 1 − P((X ≤ a) ∪ (Y ≤ b)).   (3)

But

P((X ≤ a) ∪ (Y ≤ b)) = P(X ≤ a) + P(Y ≤ b) − P((X ≤ a) ∩ (Y ≤ b))

which becomes

P((X ≤ a) ∪ (Y ≤ b)) = F_X(a) + F_Y(b) − F(a, b).   (4)

Thus we find that

P((X > a) ∩ (Y > b)) = 1 − F_X(a) − F_Y(b) + F(a, b).   (5)
When X and Y are both discrete, we work instead with the joint probability mass function p(x, y) = P((X = x) ∩ (Y = y)), and the marginal pmfs are given by the sums

p_X(x) = Σ_{y∈R_Y} p(x, y)  and  p_Y(y) = Σ_{x∈R_X} p(x, y),   (6)

respectively.
Example #1: The Roll of Two Dice
Suppose that a 5-sided die and a 6-sided die are rolled with X being the
outcome of the 5-sided die and Y being the outcome of the 6-sided die. Then the
values of p(x, y) are

p(x, y) = (1/5)(1/6) = 1/30

for x = 1, 2, ..., 5 and y = 1, 2, ..., 6.
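As a quick sketch of this example in Python (the variable names here are mine, not from the notes), the joint pmf can be tabulated and its marginals recovered by summation:

```python
from fractions import Fraction

# Joint pmf of X (5-sided die) and Y (6-sided die): p(x, y) = (1/5)(1/6) = 1/30.
p = {(x, y): Fraction(1, 5) * Fraction(1, 6)
     for x in range(1, 6) for y in range(1, 7)}

# Marginals: summing over the other variable recovers each die's distribution.
p_X = {x: sum(v for (a, b), v in p.items() if a == x) for x in range(1, 6)}
p_Y = {y: sum(v for (a, b), v in p.items() if b == y) for y in range(1, 7)}
```

Using exact fractions makes it easy to confirm that the joint pmf sums to 1 and that each marginal is uniform.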
Example #2: Balls and Urns
Suppose that 3 balls are randomly selected from an urn containing 3 red, 4
white and 5 blue balls. If we let X and Y denote, respectively, the number of red
and white balls drawn, then R_X = {0, 1, 2, 3} and R_Y = {0, 1, 2, 3} with

p(x, y) = C(3, x) C(4, y) C(5, 3 − x − y) / C(5 + 4 + 3, 3)

or

p(x, y) = (1/220) C(3, x) C(4, y) C(5, 3 − x − y)

for x + y ≤ 3, where C(n, k) denotes the binomial coefficient.
Example #3
In an example where the joint pmf is computed by conditioning on a third quantity N, we have, for instance,

p(1, 2) = P((X = 1) ∩ (Y = 2) | N = 3) P(N = 3) = C(3, 1)(0.5)^1 (0.5)^2 (0.3) = 0.1125 = p(2, 1).
The results of these can be summarized in the following table of the values of p(x, y):

x\y                     0        1        2        3      p_X(x) = Row Sums
0                    0.1500   0.1000   0.0875   0.0375       0.3750
1                    0.1000   0.1750   0.1125   0            0.3875
2                    0.0875   0.1125   0        0            0.2000
3                    0.0375   0        0        0            0.0375
p_Y(y) = Column Sums 0.3750   0.3875   0.2000   0.0375       1.0000
Note how p_X(x) and p_Y(y) are computed as marginal distributions by taking row
and column sums, respectively.
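The row-sum and column-sum computation above can be sketched directly in Python (the list names are illustrative):

```python
# Joint pmf table p(x, y) from the example above; rows are x = 0..3, columns y = 0..3.
table = [
    [0.1500, 0.1000, 0.0875, 0.0375],
    [0.1000, 0.1750, 0.1125, 0.0000],
    [0.0875, 0.1125, 0.0000, 0.0000],
    [0.0375, 0.0000, 0.0000, 0.0000],
]

p_X = [sum(row) for row in table]                       # marginal of X: row sums
p_Y = [sum(row[j] for row in table) for j in range(4)]  # marginal of Y: column sums
total = sum(p_X)                                        # should be 1
```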
Jointly Continuous Random Variables
We say that X and Y are jointly continuous if there exists a function f(x, y),
defined for all real x and y, having the property that for every set C ⊆ ℝ × ℝ of
pairs of real numbers,

P((X, Y) ∈ C) = ∬_{(x,y)∈C} f(x, y) dx dy.   (7)
The function f (x, y) is called the joint probability density function of X and Y .
If A and B are any sets of real numbers, then by defining

C = {(x, y) | x ∈ A, y ∈ B},

we see that

P((X ∈ A) ∩ (Y ∈ B)) = ∫_B ∫_A f(x, y) dx dy   (8a)

and hence the marginal pdfs are

f_X(x) = ∫_{−∞}^{+∞} f(x, y) dy  and  f_Y(y) = ∫_{−∞}^{+∞} f(x, y) dx.   (8b)
Since

F(a, b) = P((X ≤ a) ∩ (Y ≤ b)) = ∫_{−∞}^{a} ∫_{−∞}^{b} f(x, y) dy dx

we see that

f(a, b) = ∂²F(a, b)/∂a∂b   (9)
and since

P((a₁ < X ≤ a₂) ∩ (b₁ < Y ≤ b₂)) = ∫_{a₁}^{a₂} ∫_{b₁}^{b₂} f(x, y) dy dx   (10a)

we see that

P((a < X ≤ a + Δa) ∩ (b < Y ≤ b + Δb)) = ∫_{a}^{a+Δa} ∫_{b}^{b+Δb} f(x, y) dy dx

which reduces to

P((a < X ≤ a + Δa) ∩ (b < Y ≤ b + Δb)) ≈ f(a, b) Δa Δb   (10b)
showing that f (a, b) is the probability per unit area that (X, Y ) is near (a, b).
Example #4: Joint Probability Density Functions
Suppose that the joint probability density function for the random variables
X and Y is given by
f(x, y) = { 2e^{−x}e^{−2y}, for 0 < x, y;  0, for other values of x and y }.

Then, for example,

P(X < Y) = ∫_0^{∞} ∫_0^{y} 2e^{−x}e^{−2y} dx dy = ∫_0^{∞} 2e^{−2y}(1 − e^{−y}) dy = 1 − 2/3 = 1/3.
Example #5: A Uniform Distribution Over a Disk
Suppose that the joint pdf of X and Y is given by

f(x, y) = { c, for x² + y² ≤ R²;  0, for x² + y² > R² }

where c is determined using

∫_{−R}^{+R} ∫_{−√(R²−x²)}^{+√(R²−x²)} c dy dx = cπR² = 1

and so

f(x, y) = { 1/(πR²), for x² + y² ≤ R²;  0, for x² + y² > R² }.

The marginal pdf of X is then

f_X(x) = ∫_{−√(R²−x²)}^{+√(R²−x²)} (1/(πR²)) dy = 2√(R²−x²)/(πR²)

for −R ≤ x ≤ +R. By symmetry, we also have

f_Y(y) = ∫_{−∞}^{+∞} f(x, y) dx = ∫_{−√(R²−y²)}^{+√(R²−y²)} (1/(πR²)) dx = 2√(R²−y²)/(πR²)
for R y +R, as the pdf for Y . Next, let D be the random variable giving
the distance the point (X, Y ) is from the origin, then
z 2
z 2
FD (z) = P (D z) = P (x2 + y 2 < z 2 ) =
=
R2
R
for 0 z R. Using this, we have
fD (z) =
2z
dF (z)
= 2
dz
R
for 0 z R.
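A Monte Carlo sketch can cross-check the distance cdf F_D(z) = (z/R)² (all names and parameter values here are mine, chosen only for illustration):

```python
import math
import random

# Sample (X, Y) uniformly on the disk of radius R by rejection from the
# enclosing square, then compare P(D <= z) against the exact cdf (z/R)^2.
random.seed(1)
R = 2.0
z = 1.2
n = 200_000
hits = 0
inside = 0
while inside < n:
    x = random.uniform(-R, R)
    y = random.uniform(-R, R)
    if x * x + y * y <= R * R:   # accepted: uniform on the disk
        inside += 1
        if math.hypot(x, y) <= z:
            hits += 1
estimate = hits / n
exact = (z / R) ** 2
```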
Example #6: Computing the pdf of Z = X/Y
Suppose that the joint pdf of X and Y is given by
f(x, y) = { e^{−(x+y)}, for 0 < x, 0 < y;  0, for other values of x and y }.

To compute the pdf of Z = X/Y, we use the cdf (since this involves probabilities)
and write

F(z) = P(Z ≤ z) = P(X/Y ≤ z) = ∬_{x/y≤z} f(x, y) dx dy
or

F(z) = ∫_0^{∞} ∫_0^{yz} f(x, y) dx dy = ∫_0^{∞} ∫_0^{yz} e^{−(x+y)} dx dy

which reduces to

F(z) = z/(z + 1)
for 0 ≤ z. Then

f(z) = dF(z)/dz = d/dz (z/(z + 1)) = 1/(z + 1)²

for 0 ≤ z.
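Since the cdf F(z) = z/(z + 1) is explicit, a short simulation (a sketch; the sample size and seed are arbitrary) can verify it:

```python
import random

# With X, Y independent Exp(1), estimate P(X/Y <= z) and compare to z/(z+1).
random.seed(2)
n = 200_000
z = 1.5
count = sum(1 for _ in range(n)
            if random.expovariate(1.0) / random.expovariate(1.0) <= z)
estimate = count / n
exact = z / (z + 1)  # = 0.6 for z = 1.5
```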
Of course, all of the above can be extended to several variables using

F(x₁, x₂, x₃, ..., xₙ) = P(∩_{k=1}^{n} (X_k ≤ x_k))   (11a)

and

f(x₁, x₂, x₃, ..., xₙ) = ∂ⁿF(x₁, x₂, x₃, ..., xₙ)/∂x₁∂x₂∂x₃···∂xₙ.   (11b)

For n independent trials, each of which results in one of r types with probabilities p₁, p₂, p₃, ..., p_r, the joint pmf of the counts n₁, n₂, n₃, ..., n_r is the multinomial distribution

p(n₁, n₂, n₃, ..., n_r) = (n!/(n₁!n₂!n₃!···n_r!)) p₁^{n₁} p₂^{n₂} p₃^{n₃} ··· p_r^{n_r}.   (12)

For example, in 9 rolls of a fair 6-sided die, the probability that one face appears three times, two faces appear twice each, two faces appear once each, and one face does not appear at all is

p(3, 2, 2, 1, 1, 0) = (9!/(3!2!2!1!1!0!)) (1/6)^9 = 35/23328

or p(3, 2, 2, 1, 1, 0) ≈ 0.0015 = 0.15%.
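Equation (12) translates directly into code; this sketch (the function name is mine) reproduces the die-roll calculation:

```python
from math import factorial

def multinomial_pmf(counts, probs):
    """Multinomial probability n!/(n1!...nr!) * p1^n1 * ... * pr^nr."""
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)
    p = float(coef)
    for c, pk in zip(counts, probs):
        p *= pk ** c
    return p

# Nine rolls of a fair die with face counts 3, 2, 2, 1, 1, 0:
p = multinomial_pmf([3, 2, 2, 1, 1, 0], [1 / 6] * 6)  # should equal 35/23328
```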
2. Independent Random Variables
The random variables X and Y are said to be independent if

P((X ∈ A) ∩ (Y ∈ B)) = P(X ∈ A)P(Y ∈ B)   (13a)

for any two sets A and B. This says that the events E_A = (X ∈ A) and E_B = (Y ∈ B)
are independent. This also says that

F(x, y) = P((X ≤ x) ∩ (Y ≤ y)) = P(X ≤ x)P(Y ≤ y) = F_X(x)F_Y(y)   (13b)

or

p(x, y) = p_X(x)p_Y(y)   (14a)

in the discrete case, and

f(x, y) = f_X(x)f_Y(y)   (14b)

in the continuous case, for all x ∈ R_X and y ∈ R_Y. This says that X and Y are independent if knowing
the value of one does not change the distribution of the other. Random variables
that are not independent are said to be dependent.
Example #8
A man and woman decide to meet at a certain location. If each of them
independently arrives at a time uniformly distributed between 12 noon and 1 PM,
find the probability that the first to arrive has to wait longer than 10 minutes. To
answer this, let X and Y denote, respectively, the time past 12 noon that the man
and the woman arrive; then X and Y are independent random variables, each of
which is uniformly distributed over the interval from 0 to 60 minutes. The desired
probability is
P = P(|Y − X| > 10) = P((Y − X > 10) ∪ (Y − X < −10))

or

P = P(Y > X + 10) + P(Y < X − 10).

This leads to

P(Y > X + 10) = ∫_0^{50} ∫_{x+10}^{60} f_X(x)f_Y(y) dy dx = ∫_0^{50} ∫_{x+10}^{60} (1/60)(1/60) dy dx = 25/72

and

P(Y < X − 10) = ∫_{10}^{60} ∫_0^{x−10} f_X(x)f_Y(y) dy dx = ∫_{10}^{60} ∫_0^{x−10} (1/60)(1/60) dy dx = 25/72
so that P = 25/72 + 25/72 = 25/36 ≈ 0.694. A plot of the region in which either one has to wait
more than ten minutes for the other is shown as the upper-left and lower-right
triangles in the figure below.
[Figure: the square 0 ≤ x, y ≤ 60, with the triangle y − x > 10 in the upper left and the triangle y − x < −10 in the lower right shaded.]
Each shaded triangle has legs of length 50, so comparing areas gives

P = ((1/2)(50)(50) + (1/2)(50)(50)) / ((60)(60)) = 25/36
which is the same answer obtained via integration. Note that just comparing areas
when dealing with uniform distributions is valid and sometimes simpler than doing
the integration.
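The meeting problem is also easy to simulate; this sketch (seed and sample size are arbitrary choices of mine) checks the answer 25/36:

```python
import random

# Monte Carlo estimate of P(|Y - X| > 10) with X, Y independent U(0, 60).
random.seed(3)
n = 200_000
count = sum(1 for _ in range(n)
            if abs(random.uniform(0, 60) - random.uniform(0, 60)) > 10)
estimate = count / n
exact = 25 / 36  # ≈ 0.694
```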
Example #9: Balls and Urns Example #2 Revisited
Suppose that 3 balls are randomly selected from an urn containing 3 red, 4
white and 5 blue balls. If we let X and Y denote, respectively, the number of red
and white balls drawn, then RX = {0, 1, 2, 3} and RY = {0, 1, 2, 3} with
p(x, y) = C(3, x) C(4, y) C(5, 3 − x − y) / C(5 + 4 + 3, 3)

or

p(x, y) = (1/220) C(3, x) C(4, y) C(5, 3 − x − y)

for x + y ≤ 3.
Here

p_X(1) = 108/220  and  p_Y(2) = 48/220

making

p_X(1)p_Y(2) = (108/220)(48/220) = 324/3025 ≠ p(1, 2) = 18/220

and thereby showing that X and Y are not independent here.
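The dependence check can be done exactly with rational arithmetic (a sketch; the helper name is mine):

```python
from fractions import Fraction
from math import comb

# Exact joint pmf for the urn example: p(x, y) = C(3,x) C(4,y) C(5,3-x-y) / C(12,3).
def p(x, y):
    if x + y > 3:
        return Fraction(0)
    return Fraction(comb(3, x) * comb(4, y) * comb(5, 3 - x - y), comb(12, 3))

p_X1 = sum(p(1, y) for y in range(4))   # marginal p_X(1) = 108/220
p_Y2 = sum(p(x, 2) for x in range(4))   # marginal p_Y(2) = 48/220
```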
Example (Buffon's needle): A needle of length L ≤ D is dropped onto a floor ruled with parallel lines a distance D apart. Let Θ be the acute angle between the needle and the lines, uniform on [0, π/2], and let X be the distance from the needle's midpoint to the nearest line, uniform on [0, D/2], with Θ and X independent. The needle crosses a line with probability

P = P(X < (L/2) cos Θ) = ∫_0^{π/2} ∫_0^{(L/2) cos θ} (2/D)(2/π) dx dθ

which reduces to P = 2L/(πD).
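A Monte Carlo sketch of the needle drop (parameter values here are mine, chosen with L ≤ D):

```python
import math
import random

# Estimate the crossing probability and compare to 2L/(pi*D).
random.seed(4)
L, D = 1.0, 2.0
n = 200_000
hits = sum(1 for _ in range(n)
           if random.uniform(0, D / 2) < (L / 2) * math.cos(random.uniform(0, math.pi / 2)))
estimate = hits / n
exact = 2 * L / (math.pi * D)  # ≈ 0.3183
```

Historically this estimate has been run in reverse, using the observed crossing frequency to estimate π.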
Example (breaking a stick): A stick of unit length is broken at two points, chosen independently and uniformly on (0, 1); call them R₁ and R₂, and assume first that R₁ ≤ R₂. The three pieces, of lengths R₁, R₂ − R₁ and 1 − R₂, form a triangle exactly when no piece is longer than 1/2, i.e., when

R₂ − R₁ ≤ 1/2,  R₁ ≤ 1/2  and  1/2 ≤ R₂.

Then

P(Triangle with R₁ ≤ R₂) = ∫_0^{1/2} ∫_{1/2}^{x+1/2} f₂(y)f₁(x) dy dx = ∫_0^{1/2} ∫_{1/2}^{x+1/2} (1)(1) dy dx = 1/8.
Assuming next that R₂ ≤ R₁, the three sides of the triangle being formed have
lengths R₂, R₁ − R₂ and 1 − R₁, and since R₁ and R₂ are identically distributed
and independent, we may simply interchange the roles of R₁ and R₂ and use the
result of the previous calculation. Thus, the symmetry in the problem leads to

P = 1/8 + 1/8 = 1/4

or P = 25%.
Suppose instead that the first break point R₁ is uniform on (0, 1) and the second break point R₂ is then chosen uniformly on (R₁, 1), so that the sides of the triangle are R₁, R₂ − R₁ and 1 − R₂. A triangle is formed when

R₂ − R₁ ≤ 1/2,  R₁ ≤ 1/2  and  1/2 ≤ R₂,

which is a region in the unit square that looks like a right triangle with side lengths
1/2 and 1/2, but because R₁ and R₂ are not independent and not both uniform in the unit
square here, we may not use simple areas to compute the probability. Instead, we
have

P(Triangle with R₁ ≤ R₂) = ∫_0^{1/2} ∫_{1/2}^{x+1/2} f₂(y|x)f₁(x) dy dx

or

P(Triangle with R₁ ≤ R₂) = ∫_0^{1/2} ∫_{1/2}^{x+1/2} (1/(1 − x))(1) dy dx

which reduces to

P(Triangle with R₁ ≤ R₂) = ln(2) − 1/2

or P ≈ 19.3%. The fact that this is smaller than 25% from the previous example
makes sense since we are being more restrictive in the choice of R₂ here. Note
that in computing f₂(y|x), we have

f₂(y|x) = 1/(1 − x)

since Y|X ~ U[x, 1).
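Both stick-breaking schemes simulate in a few lines (a sketch; seeds, sample sizes and names are mine):

```python
import math
import random

random.seed(5)
n = 200_000

def forms_triangle(a, b, c):
    # Triangle inequality for side lengths a, b, c.
    return a + b > c and a + c > b and b + c > a

# Scheme 1: both break points independent and uniform on (0, 1).
hits1 = 0
for _ in range(n):
    u, v = random.random(), random.random()
    r1, r2 = min(u, v), max(u, v)
    if forms_triangle(r1, r2 - r1, 1 - r2):
        hits1 += 1

# Scheme 2: second break point uniform on (r1, 1).
hits2 = 0
for _ in range(n):
    r1 = random.random()
    r2 = random.uniform(r1, 1.0)
    if forms_triangle(r1, r2 - r1, 1 - r2):
        hits2 += 1

p1, p2 = hits1 / n, hits2 / n  # expect about 0.25 and ln(2) - 1/2 ≈ 0.193
```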
Suppose next that X and Y are independent and that their joint pdf depends on (x, y) only through the distance of the point (x, y) from the origin. These two assumptions say that

f(x, y) = f_X(x)f_Y(y)

and

f(x, y) = g(x² + y²),

respectively, so that

f_X(x)f_Y(y) = g(x² + y²).

Differentiating both sides with respect to x gives f_X′(x)f_Y(y) = 2x g′(x² + y²), and dividing this by the previous equation yields

f_X′(x)/(2x f_X(x)) = g′(x² + y²)/g(x² + y²).

The left-hand side depends only on x while the right-hand side is unchanged when x and y are interchanged, so both sides must equal a constant c. Then

df_X(x)/f_X(x) = 2cx dx

or

ln(f_X(x)) = cx² + c₁

or

f_X(x) = A e^{cx²}.

Since

1 = ∫_{−∞}^{+∞} f_X(x) dx = A ∫_{−∞}^{+∞} e^{cx²} dx = 2A ∫_0^{+∞} e^{cx²} dx,

the constant c must be negative. Setting

c = −1/(2σ²)

and evaluating the Gaussian integral for A, we have

f_X(x) = (1/(σ√(2π))) e^{−(x/σ)²/2}
for −∞ < x < +∞, which shows that X ~ N(0, σ²). Using the same argument
for Y, we have

f_Y(y) = (1/(σ_Y√(2π))) e^{−(y/σ_Y)²/2}

for −∞ < y < +∞. Finally, using assumption 2, we could easily show that

σ_Y = σ

and the student should do this. Thus we find that

f(x, y) = f_X(x)f_Y(y) = (1/(σ√(2π))) e^{−(x/σ)²/2} (1/(σ√(2π))) e^{−(y/σ)²/2}

or

f(x, y) = (1/(2πσ²)) e^{−(x²+y²)/(2σ²)}

for −∞ < x, y < +∞. Using this we may now compute the probability that the
point (X, Y) is within a distance R from the origin, i.e., P(D ≤ R) where

D = √(X² + Y²),

as

P(D ≤ R) = P(D² ≤ R²) = P(X² + Y² ≤ R²)

which leads to

P(D ≤ R) = ∫_{−R}^{+R} ∫_{−√(R²−x²)}^{+√(R²−x²)} f(x, y) dy dx = (1/(2πσ²)) ∫_{−R}^{+R} ∫_{−√(R²−x²)}^{+√(R²−x²)} e^{−(x²+y²)/(2σ²)} dy dx.

Converting to polar coordinates, this reduces to

P(D ≤ R) = (1/(2πσ²)) ∫_0^{2π} ∫_0^{R} e^{−r²/(2σ²)} r dr dθ = 1 − e^{−R²/(2σ²)}.
If X₁, X₂, X₃, ..., X_n are independent random variables and Y = min{X₁, X₂, X₃, ..., X_n}, then

G(y) = P(Y ≤ y) = 1 − P(Y > y) = 1 − ∏_{k=1}^{n} P(X_k > y).   (15a)

But

P(X_k > y) = 1 − P(X_k ≤ y) = 1 − F_k(y)

and so

G(y) = 1 − ∏_{k=1}^{n} (1 − F_k(y)).   (15b)
If each X_k is exponential with parameter λ_k, then F_k(y) = 1 − e^{−λ_k y} and so

1 − F_k(y) = e^{−λ_k y}.

Then, if

Y = min{X₁, X₂, X₃, ..., X_n}

we find that

G(y) = 1 − ∏_{k=1}^{n} (1 − F_k(y)) = 1 − ∏_{k=1}^{n} e^{−λ_k y}   (16a)

or simply

G(y) = 1 − e^{−λy}  with  λ = λ₁ + λ₂ + λ₃ + ··· + λ_n   (16b)

for y ≥ 0, showing that the minimum of independent exponential random variables is itself exponential.
If each X_k is geometric with parameter p_k, then

1 − F_k(y) = (1 − p_k)^y

and so

Y = min{X₁, X₂, X₃, ..., X_n}

leads to

G(y) = 1 − ∏_{k=1}^{n} (1 − F_k(y)) = 1 − ∏_{k=1}^{n} (1 − p_k)^y   (17a)

or simply

G(y) = 1 − (1 − p)^y  with  (1 − p) = ∏_{k=1}^{n} (1 − p_k)   (17b)

for y = 1, 2, 3, ....
If each X_k is uniform on [a, b], then

F_k(y) = (y − a)/(b − a)  and  1 − F_k(y) = (b − y)/(b − a)

and

Y = min{X₁, X₂, X₃, ..., X_n}

has cdf given by

G(y) = 1 − ∏_{k=1}^{n} (1 − F_k(y)) = 1 − ∏_{k=1}^{n} (b − y)/(b − a)   (18a)

or

G(y) = 1 − ((b − y)/(b − a))ⁿ   (18b)

for a ≤ y ≤ b.
Similarly, for the maximum of independent random variables we have

H(z) = P(Z ≤ z) = ∏_{k=1}^{n} P(X_k ≤ z) = ∏_{k=1}^{n} F_k(z).   (19b)

If each X_k is uniform on [a, b], then

F_k(z) = (z − a)/(b − a)

and

Z = max{X₁, X₂, X₃, ..., X_n}

has cdf given by

H(z) = ∏_{k=1}^{n} F_k(z) = ∏_{k=1}^{n} (z − a)/(b − a)   (20a)

or

H(z) = ((z − a)/(b − a))ⁿ   (20b)

for a ≤ z ≤ b.
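These min/max cdfs are easy to cross-check by simulation (a sketch; the rates, bounds and sample size below are arbitrary choices of mine):

```python
import math
import random

random.seed(6)
n_trials = 200_000

# Min of independent exponentials: G(y) = 1 - exp(-(λ1+...+λn) y).
lams = [0.5, 1.0, 1.5]
y = 0.4
hit_min = sum(1 for _ in range(n_trials)
              if min(random.expovariate(l) for l in lams) <= y)
est_min = hit_min / n_trials
exact_min = 1 - math.exp(-sum(lams) * y)

# Max of independent uniforms on [a, b]: H(z) = ((z - a)/(b - a))^n.
a, b, m = 2.0, 5.0, 4
z = 4.0
hit_max = sum(1 for _ in range(n_trials)
              if max(random.uniform(a, b) for _ in range(m)) <= z)
est_max = hit_max / n_trials
exact_max = ((z - a) / (b - a)) ** m
```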
3. Sums of Independent Random Variables
If X and Y are independent discrete random variables, then the pmf of Z = X + Y is

p_Z(z) = Σ_x P(Y = z − x) p_X(x)

or simply

p_Z(z) = Σ_x p_X(x) p_Y(z − x)   (21)

where the sum is over all possible values of x in which both p_X(x) and p_Y(z − x)
are non-zero. This is known as a convolution sum.
Example #18: The Sum of Two Geometric Distributions
Suppose that X is geometric with parameter 0 ≤ p ≤ 1 so that

p_X(x) = { 0, for x = 0, −1, −2, ...;  p(1 − p)^{x−1}, for x = 1, 2, 3, ... }   (22a)

and Y is geometric with parameter 0 ≤ q ≤ 1 so that

p_Y(y) = { 0, for y = 0, −1, −2, ...;  q(1 − q)^{y−1}, for y = 1, 2, 3, ... }.   (22b)

Then, if Z = X + Y, we have

p_Z(z) = Σ_x p_X(x)p_Y(z − x) = Σ_{x=1}^{z−1} p_X(x)p_Y(z − x)

for z = 2, 3, 4, ..., since p_X(x) and p_Y(z − x) are both non-zero only for x =
1, 2, 3, ..., z − 1. This says that
p_Z(z) = Σ_{x=1}^{z−1} p(1 − p)^{x−1} q(1 − q)^{z−x−1} = pq(1 − q)^{z−2} Σ_{x=1}^{z−1} ((1 − p)/(1 − q))^{x−1}

or

p_Z(z) = pq(1 − q)^{z−2} Σ_{x=0}^{z−2} ((1 − p)/(1 − q))^{x} = pq(1 − q)^{z−2} (1 − ((1 − p)/(1 − q))^{z−1})/(1 − (1 − p)/(1 − q))

which reduces to

p_Z(z) = pq ((1 − q)^{z−1} − (1 − p)^{z−1})/(p − q)   (22c)
for z = 2, 3, 4, ..., and p ≠ q, and p_Z(z) = 0 for z = 1, 0, −1, −2, .... As a check,
we note that

Σ_{z=2}^{∞} p_Z(z) = Σ_{z=2}^{∞} pq ((1 − q)^{z−1} − (1 − p)^{z−1})/(p − q)
  = (pq/(p − q)) (Σ_{z=2}^{∞} (1 − q)^{z−1} − Σ_{z=2}^{∞} (1 − p)^{z−1})
  = (pq/(p − q)) (Σ_{z=1}^{∞} (1 − q)^{z} − Σ_{z=1}^{∞} (1 − p)^{z})
  = (pq/(p − q)) ((1 − q)/(1 − (1 − q)) − (1 − p)/(1 − (1 − p)))
  = (pq/(p − q)) ((1 − q)/q − (1 − p)/p) = (pq/(p − q)) ((p − q)/(pq))

or

Σ_{z=2}^{∞} p_Z(z) = 1
as it must. Assuming next that q = p, we again have

p_Z(z) = Σ_x p_X(x)p_Y(z − x) = Σ_{x=1}^{z−1} p_X(x)p_Y(z − x)

for z = 2, 3, 4, ..., since p_X(x) and p_Y(z − x) are both non-zero only for x =
1, 2, 3, ..., z − 1. This says that

p_Z(z) = Σ_{x=1}^{z−1} p(1 − p)^{x−1} p(1 − p)^{z−x−1} = p²(1 − p)^{z−2} Σ_{x=1}^{z−1} 1

or

p_Z(z) = (z − 1)p²(1 − p)^{z−2}   (22d)

for z = 2, 3, 4, .... As a check, we note that

Σ_{z=2}^{∞} p_Z(z) = Σ_{z=2}^{∞} (z − 1)p²(1 − p)^{z−2} = p² Σ_{z=1}^{∞} z(1 − p)^{z−1} = p² (1/p²) = 1.

Moreover,

lim_{q→p} pq ((1 − q)^{z−1} − (1 − p)^{z−1})/(p − q) = p²(z − 1)(1 − p)^{z−2}

showing that Equation (22d) is consistent with Equation (22c).
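The closed forms (22c) and (22d) can be checked against the raw convolution sum (21); a sketch with arbitrary parameter choices:

```python
# Convolution sum for Z = X + Y with X ~ Geom(p), Y ~ Geom(q).
def geom_pmf(k, p):
    return p * (1 - p) ** (k - 1) if k >= 1 else 0.0

def conv(z, p, q):
    # Direct convolution sum over x = 1, ..., z - 1.
    return sum(geom_pmf(x, p) * geom_pmf(z - x, q) for x in range(1, z))

p, q = 0.3, 0.5
z = 6
closed_pq = p * q * ((1 - q) ** (z - 1) - (1 - p) ** (z - 1)) / (p - q)  # (22c)
closed_pp = (z - 1) * p * p * (1 - p) ** (z - 2)                         # (22d)
```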
Example #19: The Sum of Two Poisson Distributions
Suppose that X and Y are two Poisson random variables with parameters
λ and μ, respectively, so that

p_X(x) = e^{−λ}λ^x/x!  and  p_Y(y) = e^{−μ}μ^y/y!   (23a)

for x, y = 0, 1, 2, 3, .... Then, if Z = X + Y, we have

p_Z(z) = Σ_{x=0}^{z} p_X(x)p_Y(z − x) = Σ_{x=0}^{z} (e^{−λ}λ^x/x!)(e^{−μ}μ^{z−x}/(z − x)!)
  = e^{−(λ+μ)} Σ_{x=0}^{z} λ^x μ^{z−x}/(x!(z − x)!) = (e^{−(λ+μ)}/z!) Σ_{x=0}^{z} C(z, x) λ^x μ^{z−x}

and so we have

p_Z(z) = e^{−(λ+μ)}(λ + μ)^z/z!   (23b)

for z = 0, 1, 2, 3, ..., which shows that Z = X + Y is also Poisson with parameter
λ + μ. In general, if X₁, X₂, X₃, ..., X_n are all Poisson with parameters λ₁, λ₂,
λ₃, ..., λ_n, then

X = X₁ + X₂ + X₃ + ··· + X_n   (23c)

is also Poisson with parameter

λ = λ₁ + λ₂ + λ₃ + ··· + λ_n.   (23d)
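The closure of the Poisson family under addition can be verified numerically by convolving the two pmfs (a sketch; the parameters are arbitrary):

```python
from math import exp, factorial

def pois(k, lam):
    # Poisson pmf e^{-lam} lam^k / k!
    return exp(-lam) * lam ** k / factorial(k)

lam, mu = 2.0, 3.5
# Convolution sum (21) for z = 0..9 versus the direct Poisson(lam + mu) pmf.
conv = [sum(pois(x, lam) * pois(z - x, mu) for x in range(z + 1)) for z in range(10)]
direct = [pois(z, lam + mu) for z in range(10)]
```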
If X and Y are independent continuous random variables, then the cdf of Z = X + Y is

F_Z(z) = ∫_{−∞}^{+∞} P(Y ≤ z − x) f_X(x) dx = ∫_{−∞}^{+∞} F_Y(z − x) f_X(x) dx   (24)

and differentiating gives

f_Z(z) = ∫ f_Y(z − x) f_X(x) dx

where the integration is over all values of x in which both f_X(x) and f_Y(z − x)
are non-zero. This is known as a convolution integral.
Example #20: The Sum of Two Exponential Distributions
Suppose that X is exponential with parameter 0 < λ so that

f_X(x) = { 0, for x < 0;  λe^{−λx}, for x ≥ 0 }   (25a)

and Y is exponential with parameter 0 < μ so that

f_Y(y) = { 0, for y < 0;  μe^{−μy}, for y ≥ 0 }.   (25b)

Assuming first that λ ≠ μ, if Z = X + Y we have

f_Z(z) = ∫_0^z f_X(x)f_Y(z − x) dx = ∫_0^z λe^{−λx} μe^{−μ(z−x)} dx
for z ≥ 0, since f_X(x) and f_Y(z − x) are both non-zero only for 0 ≤ x ≤ z. This
says that

f_Z(z) = λμ e^{−μz} ∫_0^z e^{−(λ−μ)x} dx = λμ e^{−μz} (1 − e^{−z(λ−μ)})/(λ − μ)

or

f_Z(z) = λμ (e^{−μz} − e^{−λz})/(λ − μ)   (25c)

for z ≥ 0 and λ ≠ μ, and f_Z(z) = 0 for z < 0. As a check, we note that

∫_0^{∞} f_Z(z) dz = (λμ/(λ − μ)) ∫_0^{∞} (e^{−μz} − e^{−λz}) dz = (λμ/(λ − μ))(1/μ − 1/λ) = 1
as it must. Assuming next that μ = λ, we note that if Z = X + Y, we have

f_Z(z) = ∫_0^z f_X(x)f_Y(z − x) dx = ∫_0^z λe^{−λx} λe^{−λ(z−x)} dx = λ²e^{−λz} ∫_0^z dx

for z ≥ 0, since f_X(x) and f_Y(z − x) are both non-zero only for 0 ≤ x ≤ z. This
says that

f_Z(z) = λ²ze^{−λz}   (25d)

for z ≥ 0 and μ = λ, and f_Z(z) = 0 for z < 0. As a check, we note that

∫_0^{∞} f_Z(z) dz = ∫_0^{∞} λ²ze^{−λz} dz = 1

as it must, and

lim_{μ→λ} λμ (e^{−μz} − e^{−λz})/(λ − μ) = λ²ze^{−λz}

showing that Equation (25d) is consistent with Equation (25c).
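Integrating (25c) from 0 to z gives the cdf F_Z(z) = 1 − (λe^{−μz} − μe^{−λz})/(λ − μ), which a simulation can check directly (a sketch; the rates λ = 1, μ = 2 and the seed are arbitrary choices of mine):

```python
import math
import random

# Monte Carlo cdf of Z = X + Y for independent exponentials with λ ≠ μ,
# compared to the integral of the density (25c).
random.seed(7)
lam, mu = 1.0, 2.0
z = 1.5
n = 200_000
hits = sum(1 for _ in range(n)
           if random.expovariate(lam) + random.expovariate(mu) <= z)
estimate = hits / n
exact = 1 - (lam * math.exp(-mu * z) - mu * math.exp(-lam * z)) / (lam - mu)
```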
Example: The Sum of Two Normal Distributions
Suppose that X is normal with parameters μ₁ and σ₁² so that

f_X(x) = (1/(σ₁√(2π))) exp(−(1/2)((x − μ₁)/σ₁)²)   (26a)

and Y is normal with parameters μ₂ and σ₂² so that

f_Y(y) = (1/(σ₂√(2π))) exp(−(1/2)((y − μ₂)/σ₂)²).   (26b)

Then if Z = X + Y, we have

f_Z(z) = ∫_{−∞}^{+∞} f_X(x)f_Y(z − x) dx
  = ∫_{−∞}^{+∞} (1/(σ₁√(2π))) exp(−(1/2)((x − μ₁)/σ₁)²) (1/(σ₂√(2π))) exp(−(1/2)((z − x − μ₂)/σ₂)²) dx.

Setting w = (x − μ₁)/σ₁, we have x = μ₁ + wσ₁ and dw = dx/σ₁, and so

f_Z(z) = (1/(2πσ₂)) ∫_{−∞}^{+∞} exp(−(1/2)w²) exp(−(1/2)((z − μ₁ − wσ₁ − μ₂)/σ₂)²) dw
  = (1/(2πσ₂)) ∫_{−∞}^{+∞} exp(−(1/2)w²) exp(−(1/2)((z − (μ₁ + μ₂) − wσ₁)/σ₂)²) dw.

Expanding the second exponent and collecting the terms in w, this becomes

f_Z(z) = (1/(2πσ₂)) exp(−(1/2)((z − (μ₁ + μ₂))/σ₂)²) ∫_{−∞}^{+∞} exp(−(1/2)w²(1 + σ₁²/σ₂²) + wσ₁(z − (μ₁ + μ₂))/σ₂²) dw.

Using

∫_{−∞}^{+∞} e^{−aw² + bw} dw = √(π/a) e^{b²/(4a)}   (26c)

with a = (1/2)(1 + σ₁²/σ₂²) and b = σ₁(z − (μ₁ + μ₂))/σ₂², the integral can be evaluated, and the result reduces to

f_Z(z) = (1/(√(σ₁² + σ₂²)√(2π))) exp(−(1/2)((z − (μ₁ + μ₂))/√(σ₁² + σ₂²))²)   (26d)

for −∞ < z < +∞, which shows that Z = X + Y is normal with mean μ₁ + μ₂ and variance σ₁² + σ₂².
Example: The Sum of Two Uniform Distributions
Suppose that X and Y are independent and uniform on [a, b] so that

f_X(x) = { 1/(b − a), for a ≤ x ≤ b;  0, otherwise }   (27a)

and

f_Y(y) = { 1/(b − a), for a ≤ y ≤ b;  0, otherwise }.   (27b)

Then, if Z = X + Y, we have

f_Z(z) = ∫_{−∞}^{+∞} f_X(x)f_Y(z − x) dx = (1/(b − a)) ∫_a^b f_Y(z − x) dx

and so

f_Z(z) = (1/(b − a)) ∫_{max(a, z−b)}^{min(b, z−a)} (1/(b − a)) dx

or

f_Z(z) = (min(b, z − a) − max(a, z − b))/(b − a)².   (27c)

For 2a ≤ z ≤ a + b, we have min(b, z − a) = z − a and max(a, z − b) = a, so that

f_Z(z) = ((z − a) − a)/(b − a)² = (z − 2a)/(b − a)².

For a + b ≤ z ≤ 2b, we have min(b, z − a) = b and max(a, z − b) = z − b, so that

f_Z(z) = (b − (z − b))/(b − a)² = (2b − z)/(b − a)².

Thus

f_Z(z) = { (z − 2a)/(b − a)², for 2a ≤ z ≤ a + b;  (2b − z)/(b − a)², for a + b ≤ z ≤ 2b }   (27d)

and f_Z(z) = 0 otherwise, which is the triangular distribution on [2a, 2b].
[Figure: plot of the triangular pdf f_Z(z).]
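The triangular density (27d) can be spot-checked against a histogram bin (a sketch; the endpoints, bin and seed are arbitrary choices of mine):

```python
import random

# Estimate the density of Z = X + Y for X, Y ~ U(a, b) in one small bin and
# compare to (z - 2a)/(b - a)^2 evaluated at the bin midpoint.
random.seed(8)
a, b = 0.0, 1.0
n = 200_000
width = 0.05
z0 = 0.5  # a point in [2a, a + b], the rising side of the triangle
count = sum(1 for _ in range(n)
            if z0 <= random.uniform(a, b) + random.uniform(a, b) < z0 + width)
density_estimate = count / (n * width)
exact = (z0 + width / 2 - 2 * a) / (b - a) ** 2  # density at the bin midpoint
```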
We had also seen that the sum of k independent exponential random variables, each with parameter λ,

Z = X₁ + X₂ + ··· + X_k,   (28a)

has the Erlang distribution with pdf

f_X(x) = λ^k x^{k−1} e^{−λx}/(k − 1)!   (28b)

for x ≥ 0 and f_X(x) = 0 for x < 0. We had also seen that the cdf of the Erlang
distribution is given by

F_X(x) = 1 − e^{−λx} Σ_{i=0}^{k−1} (λx)^i/i!   (28c)

for x ≥ 0.
Example: Suppose that four light bulbs have independent, exponentially distributed lifetimes with a mean of 150 hours each. Person A takes two of
these light bulbs into a room and turns them both on at the same time and leaves
them turned on. Person B takes the other two light bulbs into a different room
and turns on only one of the light bulbs and then turns on the other light bulb
only when and if the first light bulb burns out. (a) Compute the probability that
person A will be in the dark at the end of one week (168 hours). (b) Compute the
probability that person B will be in the dark at the end of one week (168 hours).
To solve part (a), let X₁, X₂, X₃, and X₄ be the random variables for the
lifetime (in hours) of the four light bulbs so that

F₁(x) = F₂(x) = F₃(x) = F₄(x) = 1 − e^{−λx}

with λ = 1/150. Since person A keeps both light bulbs on at the same time, this
person will be in the dark after one week (24 × 7 = 168 hours) only when

Y = max(X₁, X₂) ≤ 168.

It should be clear that

F_max(y) = P(Y ≤ y) = P(max(X₁, X₂) ≤ y) = P((X₁ ≤ y) ∩ (X₂ ≤ y)).

Since X₁ and X₂ are independent, we have

F_max(y) = P((X₁ ≤ y) ∩ (X₂ ≤ y)) = P(X₁ ≤ y)P(X₂ ≤ y)

resulting in

F_max(y) = F₁(y)F₂(y) = (1 − e^{−y/150})²

and so

P(Y ≤ 168) = (1 − e^{−168/150})² ≈ 0.4539.

To solve (b), we note that since person B keeps only one light bulb on at a time,
this person will be in the dark after one week (24 × 7 = 168 hours) only when
Z = X₃ + X₄ ≤ 168.

From our notes in class we know that Z will be an Erlang distribution with
parameters k = 2 and λ = 1/150. Then

F_sum(z) = 1 − Σ_{i=0}^{2−1} e^{−z/150} (z/150)^i/i!

which reduces to

F_sum(z) = 1 − (1 + z/150) e^{−z/150}.

Then

P(Z ≤ 168) = 1 − (1 + 168/150) e^{−168/150} ≈ 0.3083,

showing that person B is less likely to be in the dark than person A.
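Both parts of the light-bulb example can be cross-checked by simulating bulb lifetimes (a sketch; the seed and sample size are mine):

```python
import math
import random

# Bulb lifetimes ~ Exp(1/150); horizon is one week = 168 hours.
random.seed(9)
lam = 1.0 / 150.0
n = 100_000
# Person A: in the dark if the longer-lived of two simultaneous bulbs dies by 168 h.
dark_A = sum(1 for _ in range(n)
             if max(random.expovariate(lam), random.expovariate(lam)) <= 168)
# Person B: in the dark if the two lifetimes used in sequence total at most 168 h.
dark_B = sum(1 for _ in range(n)
             if random.expovariate(lam) + random.expovariate(lam) <= 168)
p_A = dark_A / n  # exact: (1 - e^{-168/150})^2 ≈ 0.4539
p_B = dark_B / n  # exact: 1 - (1 + 168/150) e^{-168/150} ≈ 0.3083
```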
Finally, for the chi-squared distribution with n degrees of freedom (the distribution of the sum of squares of n independent standard normal random variables), the pdf is

f(x) = (1/(2^{n/2} Γ(n/2))) x^{n/2−1} e^{−x/2}

for x ≥ 0, where Γ(n/2) = ∫_0^{∞} z^{n/2−1} e^{−z} dz. In particular, for n = 2 degrees of freedom this reduces to

f(x) = (1/(2^{2/2} Γ(2/2))) x^{2/2−1} e^{−x/2} = (1/2) e^{−x/2}

for x ≥ 0, which is the exponential distribution with parameter 1/2.