Notes#7 PDF

Probability and Statistics
(ENM 503)
Michael A. Carchidi
December 1, 2015
Chapter 7 - Jointly Distributed Random Variables
The following notes are based on the textbook entitled: A First Course in
Probability by Sheldon Ross (9th edition) and these notes can be viewed at
https://canvas.upenn.edu/
after you log in using your PennKey user name and Password.
1. Joint Distribution Functions
We are often interested in probability statements concerning two or more random variables. In order to do deal with such probabilities, we define, for any two
random variables X and Y , the joint cumulative probability distribution function
of X and Y by
F (a, b) = P ((X a) (Y b))
(1)
for < a, b < +. The cdf of just the random variable X can then be obtained
from Equation (1) by taking the limit of Equation (1) as b + since
lim ((X a) (Y b)) = (X a) lim (Y b) = (X a) (Y +)
b+
b+
which reduces to
lim ((X a) (Y b)) = X a.
b+
Then, since probability is a continuous set function, we have

lim F (a, b) = lim P ((X a) (Y b)) = P ( lim ((X a) (Y b)))
or
lim F (a, b) = P (X a) = FX (a).
(2a)
In a similar way, we have

lim F (a, b) = P (Y b) = FY (b).
(2b)
The separate joint distribution functions in Equations (2a,b) are sometimes referred to as the marginal distributions of X and Y .
Note that
where
F ((X > a) (Y > b)) = 1 F (X > a) (Y > b)

(X > a) (Y > b) = (X > a) (Y > b) = (X a) (Y b)
and so
F ((X > a) (Y > b)) = 1 F ((X a) (Y b)).
But
F ((X a) (Y b)) = F (X a) + F (Y b) F ((X a) (Y b))
which becomes
F ((X a) (Y b)) = FX (a) + FY (b) F (a, b).
Thus we find that
F ((X > a) (Y > b)) = 1 FX (a) FY (b) + F (a, b).
(3)
The student should show that, in general

P ((a1 < X a2 ) (b1 < Y b2 )) = F (a2 , b2 ) + F (a1 , b1 )
F (a1 , b2 ) F (a2 , b1 )
whenever a1 < a2 and b1 < b2 .
(4)
When X and Y are Both Discrete Random Variables

Although the earlier equations are valid for both discrete and continuous random variables, in the case when X and Y are both discrete random variables, it
is more convenient to define the joint probability mass function of X and Y as
p(x, y) = P ((X = x) (Y = y))
(5)
for all x RX and y RY . The probability mass functions of X and Y separately

are then obtained from
X
X
p(x, y) and pY (y) =
p(x, y),
(6)
pX (x) =
yRY
xRX
respectively.
Example #1: The Roll of Two Dice
Suppose that a 5-sided die and a 6-sided die are rolled with X being the
outcome of the 5-sided die and Y being the outcome of the 6-sided die. Then the
values of p(x, y) are
1 1
1
=
5 6
30
p(x, y) = P ((X = x) (Y = y)) =

for all x = 1, 2, 3, 4, 5 and y = 1, 2, 3, 4, 5, 6.
Example #2: Balls and Urns
Suppose that 3 balls are randomly selected from an urn containing 3 red, 4
white and 5 blue balls. If we let X and Y denote, respectively, the number of red
and white balls drawn, then RX = {0, 1, 2, 3} and RY = {0, 1, 2, 3} with
34 5 34 5
p(x, y) =
3xy
y5+4+3
=
3
or
123xy
1 3
4
5
p(x, y) =
.
220 x y 3 x y
3
The results of these can be summarized in the following table

The Values of 220 p(x, y)
x\y
0
1
2 3 pX (x) = Row Sums
0
10 40 30 4
84
1
30 60 18 0
108
2
15 12 0 0
27
3
1
0
0 0
1
py (y) = Column Sums 56 112 48 4
Sums Equal 220
Note how pX (x)and py (y) are computed as marginal distributions by taking row
and column sums, respectively.
Example #3: Number of Boys and Girls
Suppose that 15% of the families in a certain community have no children,
20% have 1 child, 35% have two children and 30% have three children. Suppose
further that in each family, each child is equally likely to be a boy or a girl. If a
family is chosen at random from this community, then B (the number of boys)
and G (the number of girls) in this family will have the joint probability mass
function, p(b, g), given by
p(0, 0) = P ((X = 0) (Y = 0)|0)P (0) = (1)(0.15) = 0.15
and
p(0, 1) = P ((X = 0) (Y = 1)|1)P (1) = (0.5)(0.2) = 0.10 = p(1, 0)
and
p(0, 2) = P ((X = 0) (Y = 2)|2)P (2) = (0.5)2 (0.35) = 0.0875 = p(2, 0)
and
p(0, 3) = P ((X = 0) (Y = 3)|3)P (3) = (0.5)3 (0.3) = 0.0375 = p(3, 0)
and

2
p(1, 1) = P ((X = 1) (Y = 1)|2)P (2) =
(0.5)1 (0.5)21 (0.35) = 0.175
1
4
and

3
p(1, 2) = P ((X = 1)(Y = 2)|3)P (3) =
(0.5)1 (0.5)21 (0.3) = 0.1125 = p(2, 1).
1
x\y
0
1
2
3
py (y) = Column Sums
The Values
0
1
0.1500 0.1000
0.1000 0.175
0.0875 0.1125
0.0375
0
0.3750 0.3875
of p(x, y)
2
3
0.0875 0.0375
0.1125
0
0
0
0
0
0.2000 0.0375
pX (x) = Row Sums

0.3750
0.3875
0.2000
0.0375
Sums Equal 1
and column sums, respectively.
Jointly Continuous Random Variables
We say that X and Y are jointly continuous if there exists a function f (x, y),
defined for all real x and y, having the property that for every set C = R R of
pairs of real numbers,
ZZ
P ((X, Y ) C) =
f (x, y)dxdy.
(7)
(x,y)C
The function f (x, y) is called the joint probability density function of X and Y .
If A and B are any sets of real numbers, then by defining
C = {(x, y)|x A, y B},
we see that
P ((X A) (Y B)) =
and hence
fX (x) =
Z Z
B
f (x, y)dxdy
(8a)
(8b)
f (x, y)dy
and
fY (y) =
f (x, y)dx.
Since
F (a, b) = P ((X a) (Y b)) =
we see that
f (x, y)dydx
2 F (a, b)
f (x, y) =
ab
(9)
and since
P ((a1 < X a2 ) (b1 < Y b1 )) =
a2
a1
b2
f (x, y)dydx
(10a)
b1
we see that
P ((a < X a + a) (b < Y b + )) =
a+
b+b
f (x, y)dydx
which reduces to
P ((a < X a + a) (b < Y b + b)) = f (a, b)ab
(10b)
showing that f (a, b) is the probability per unit area that (X, Y ) is near (a, b).
Example #4: Joint Probability Density Functions
Suppose that the joint probability density function for the random variables
X and Y is given by
x 2y
0 < x, y
2e e , for
f (x, y) =
.
0,
for other values of x and y
Then
P ((X > 1) (Y < 1)) =
We also have
P (X < Y ) =
which reduces to P (X < Y ) = 1/3.
2ex e2y dydx = e1 e3 ' 0.3181.

Z y
0
2ex e2y dxdy,
Example #5: A Problem in Geometry

Consider a circle of radius R, and suppose that a point within the circle is
randomly chosen in such a manner that all regions within the circle of equal area
are equally likely to contain the point. In other words, the point is uniformly
distributed within the circle. Let the center of the circle be at the origin and
define X and Y to be the coordinates of the point chosen, as shown below.
The point (X, Y )

Then since each (X, Y ) is equally likely to be near each point on the circle, it
follows that the joint probability density function of X and Y is given by
c, for x2 + y 2 R2
f (x, y) =
0, for x2 + y 2 > R2
where c is determined using
Z
and this leads to
and so
+R
f (x, y) =
f (x, y)dydx = 1
+ R2 x2
R2 x2
cdydx = cR2 = 1
1, for x2 + y 2 R2
R2
0, for x + y > R
7
To determine the pdf (fX (x)) for X, we have

Z +
Z +R2 x2
1
2 2
f (x, y)dy =
dy
=
R x2
fX (x) =
2
R2
R2 x2 R
for R x +R. By symmetry, we also have
Z
Z 2 2
+
fY (y) =
f (x, y)dx =
2 p 2
1
dx
=
R y2
R2
R2
R y
R2 y2
for R y +R, as the pdf for Y . Next, let D be the random variable giving
the distance the point (X, Y ) is from the origin, then
z 2
z 2
FD (z) = P (D z) = P (x2 + y 2 < z 2 ) =
=
R2
R
for 0 z R. Using this, we have
fD (z) =
2z
dF (z)
= 2
dz
R
for 0 z R.
Example #6: Computing the pdf of Z = X/Y
Suppose that the joint pdf of X and Y is given by
(x+y)
, for
0 < x, 0 < y
e
f (x, y) =
,
0,
for other values of x and y
to compute the pdf of Z = X/Y , we use the cdf (since this involves probabilities)
and write
ZZ
f (x, y)dxdy
F (z) = P (Z z) = P (X/Y z) =
x/yz
or
F (z) =
which reduces to
yz
f (x, y)dxdy =
F (z) =
Z yz
Z yz
0
e(x+y) dxdy =
e(x+y) dxdy
z
z+1
for 0 z. Then
d
dF (z)
=
f (z) =
dz
dz
z
z+1
1
(z + 1)2
for 0 z.
Of course, all of the above can be extended to several variables using
n
!
\
F (x1 , x2 , x3 , ..., xn ) = P
(Xk xk )
(11a)
k=1
and if X1 , X2 , X3 , ..., Xn are all continuous random variables, we have

f (x1 , x2 , x3 , ..., xn ) =
n F (x1 , x2 , x3 , ..., xn )
x1 x2 x3 xn
(11b)
as the joint pdf function for X1 , X2 , X3 , ..., Xn .

Example #7: The Multinomial Distribution
Suppose that a sequence of n independent and identical experiments are performed and suppose that each experiment can result in r possible outcomes, with
respective probabilities p1 , p2 , p3 , ..., pr , where p1 + p2 + p3 + + pr = 1. If we
let Xj denote the number of the n experiments that result in outcome number nj
for j = 1, 2, 3, ..., r, then
p(n1 , n2 , n3 , ..., nr ) = P ((X1 = n1 ) (X2 = n2 ) (Xr = nr ))
leads to
p(n1 , n2 , n3 , ..., nr ) =
n
pn1 pn2 pn3 pnr r
n1 n2 n3 nr 1 2 3
(12)
where n1 + n2 + n3 + + nr = n. For a specific example, suppose that a fair die

is rolled 9 times. The probability that 1 appears 3 times, 2 and 3 appear twice, 4
and 5 appear once each and 6 does not appear at all (zero times) is
3 2 2 1 1 0
1
1
1
1
1
9
1
p(3, 2, 2, 1, 1, 0) =
6
6
6
6
6
6
322110
or
9!
p(3, 2, 2, 1, 1, 0) =
3!2!2!1!1!0!
or p(3, 2, 2, 1, 1, 0) = 0.0015 = 0.15%.
9
1
35
=
6
23328
2. Independent Random Variables

Two random variables X and Y are said to be independent, if
P ((X A) (Y B)) = P (X A)P (Y B)
(13a)
for any two sets A and B. This says that the events EA = X A and EB = Y B
are independent. This also says that
F (x, y) = P ((X x) (Y y)) = P (X x)P (Y y) = FX (x)FY (y)
(13b)
for all x RX and y RY . When X and Y are discrete random variables, we

find that
p(x, y) = pX (x)pY (x)
(14a)
for all x RX and y RY , and when X and Y are continuous random variables,
we find that
f (x, y) =
2 (FX (x)FY (y))

2 F (x, y)
=
= FX0 (x)FY0 (y)
yx
yx
or
f (x, y) = fX (x)fY (x)
(14b)
for all x RX and y RY . This says that X and Y are independent if knowing
the value of one does not change the distribution of the other. Random variables
that are not independent are said to be dependent.
Example #8
A man and woman decide to meet at a certain location. If each of them
independently arrives at a time uniformly distributed between 12 noon and 1 PM,
find the probability that the first to arrive has to wait longer than 10 minutes. To
answer this, X and Y denote, respectively, the time past 12 noon that the man
and the woman arrive, then X and Y are independent random variables each of
10
which is uniformly distributed over the interval from 0 to 60 minutes. The desired
probability is
P = P (|Y X| > 10) = P ((Y X > 10) (Y X < 10))
or
P = P (Y > X + 10) + P (Y < X 10).
This leads to
P (Y > X + 10) =
50
60
fX (x)fY (y)dydx =
x+10
25
,
72
and
P (Y < X 10) =
60
10
x10
fX (x)fY (y)dydx =
60
10
x10
1
1
25
dydx =
60 60
72
so that P = 25/36 ' 0.694. A plot of the region in which any one has to wait
more than ten minutes for the other is shown as the upper left and lower right
triangles in the figure below.
60
50
40
30
20
10
10
20
30
40
50
60
The region in which any one has to wait

more than 10 minutes for the other is shown
as the upper left and lower right triangles
The total area of these triangles compared to the total area of the square is
P =
1
(50)(50)
2
+ 12 (50)(50)
25
=
(60)(60)
36
11
which is the same answer obtained via integration. Note that just comparing areas
when dealing with uniform distributions is valid and sometimes simpler than doing
the integration.
Example #9: Balls and Urns Example #2 Revisited
Suppose that 3 balls are randomly selected from an urn containing 3 red, 4
white and 5 blue balls. If we let X and Y denote, respectively, the number of red
and white balls drawn, then RX = {0, 1, 2, 3} and RY = {0, 1, 2, 3} with
34 5 34 5
p(x, y) =
3xy
y5+4+3
=
3
or
123xy
1 3
4
5
p(x, y) =
.
220 x y 3 x y

The Values of 220 p(x, y)
x\y
0
1
2 3 pX (x) = Row Sums
0
10 40 30 4
84
1
30 60 18 0
108
2
15 12 0 0
27
3
1
0
0 0
1
py (y) = Column Sums 56 112 48 4
Sums Equal 220
and column sums, respectively. Note also that
pX (1) =
108
220
and
making
pY (2) =
48
220
108
48
324
18
=
6= p(1, 2) =
220 220
3025
220
and thereby showing that X and Y are not independent here.
pX (1)pY (2) =
12
Example #10: A Geometry Problem - Buons Needle Problem

A table is ruled with equidistant parallel lines a distance D apart. A needle of
length L (where L D) is randomly thrown on the table so that the perpendicular
distance (X) from the center of needle to the closest of the parallel lines is uniform
from 0 to D/2 and the angle () the needle makes with line connecting the center
of the needle to the parallel line is also uniform from 0 to /2, as shown below.
X is the length of the dashed line and is the

angle between the needle and the dashed line
Let us compute the probability that the needle will intersect one of the lines, the
other possibility being that the needle will be completely contained in the strip
between two lines. To answer this, let us determine the position of the needle
by specifying (1) the distance X from the midpoint of the needle to the nearest
parallel line and (2) the angle between the needle and the projected line of
length X. The needle will intersect the line if the hypotenuse of the right triangle
in the above figure is less than L/2, that is if
L
X < cos().
2
As X varies from 0 to D/2 and between 0 and /2, it is reasonable to assume
that they are independent, uniformly distributed random variables over these
respective ranges. Hence
Z /2 Z (L/2) cos()
1
1
dxd
P = P (X < (L/2) cos()) =
/2 D/2
0
0
which reduces to P = 2L/(D).
13
Example #11: A Geometry Problem - The Pipe Cleaner Problem

Consider a pipe cleaner of length 1, which is simply a bendable wire of length
1. Two points are chosen at random on this wire and then the wire is bent at
these points to see if it is possible to form a triangle, as shown in the following
figure.
The pipe cleaner is bent at two points

The pipe cleaner is bend at two points to form a triangle as shown below.
The pipe cleaner is bent into a triangle

We would like to determine the probability that a triangle can be formed.
To solve this, let the two points be located at R1 and R2 . We are given that
R1 U [0, 1), R2 U[0, 1), so that their pdfs are f1 (x) = 1 for 0 x < 1 and
f2 (y) = 1 for 0 y < 1 . We first note that the probability P is given by
P = P ((Triangle with R1 R2 ) (Triangle with R2 R1 ))
14
= P (Triangle with R1 R2 ) + P (Triangle with R2 R1 )

since the two events (R1 R2 ) and (R2 R1 ) are disjoint. Assuming first that
R1 R2 , the three sides of the possible triangle being formed have lengths R1 ,
R2 R1 and 1 R2 , and for these to form the sides of a triangle, we must have
R1 (R2 R1 ) + (1 R2 ) and R2 R1 (R1 ) + (1 R2 )
and
1 R2 (R1 ) + (R2 R1 )
which all reduce to
R1 1/2
R2 R1 1/2
1/2 R2 .
Thus we want to compute

P (Triangle with R1 R2 ) = P ((R1 1/2) (R2 R1 1/2) (1/2 R2 ))
which is a region in the unit square that looks like a right triangle with side lengths
1/2 and 1/2, as shown below.
The horizontal axis is R1 , the vertical axis is

R2 and the right triangle is shown in bold
Using areas, we then have
1
P (Triangle with R1 R2 ) =
2
15

1
1
1
=
2
2
8
We can also compute this probability using

or
1/2
1/2
x+1/2
f2 (y)f1 (x)dydx
1/2
x+1/2
1/2
1 1
1
dydx =
1 1
8
Assuming next that R2 R1 , the three sides of the triangle being formed have
lengths R2 , R1 R2 and 1 R1 , and since R1 and R2 are identical distributions
and independent, we may simply interchange the rolls of R1 and R2 and use the
result of the previous calculation. Thus, the symmetry in the problem leads to
the area shown below

and
Thus we find that
P =
1 1 1
+ =
8 8 4
or P = 25%.
16
1
8
Example #12: Another Pipe-Cleaner Problem

Consider the same pipe cleaner of the previous example. A point is chosen at
random on this wire and then a second point is also chosen at random but only
to the right of the first point. The wire is then bent at these two points to see if
it is possible to form a triangle. Determine the probability that a triangle can be
formed.
To solve this, let the two points be located at R1 and R2 with R2 to the right
of R1 . We are given that R1 U[0, 1), but this time we have R2 U[R1 , 1), so
that their pdfs are f1 (x) = 1 for 0 x < 1 and f2 (y) = 1(1 x) for x y < 1.
The three sides of the possible triangle being formed have lengths R1 , R2 R1
and 1 R2 , and for these to form the sides of a triangle, we must have
and
R1 (R2 R1 ) + (1 R2 ) and R2 R1 (R1 ) + (1 R2 )
which all reduce to

R1 1/2
Thus we want to compute
1 R2 (R1 ) + (R2 R1 )
,
R2 R1 1/2
1/2 R2 .
P (Triangle with R1 R2 ) = P ((R1 1/2) (R2 R1 + 1/2) (1/2 R2 ))
which is a region in the unit square that looks like a right triangle with side lengths
1/2 and 1/2, as shown below,

17
but because R1 and R2 are not independent and not both uniform in the unit
square here, we may not use simple areas to compute the probability. Instead, we
have
Z
Z
1/2
or
x+1/2
f2 (y|x)f1 (x)dydx
1/2
1/2
x+1/2
1/2
1
1
dydx
1x 1
which reduces to
P (Triangle with R1 R2 ) = ln(2)
1
2
or P ' 19.3%. The fact that this is smaller than 25% from the previous example
makes sense since we are being more restrictive in the choice of R2 here. Note
that in computing f2 (y|x), we have
f2 (y|x) =
since Y |X U [x, 1).
1
1x
Example #13: Characterization of the Normal Distribution

Let X and Y denote the horizontal and vertical miss distance when a bullet
is fired at a target, and assume that (1) X and Y are independent continuous
random variables having dierentiable density functions and (2) the joint density
f (x, y) = fX (x)fY (y) of X and Y depends on (x, y) only through x2 + y 2 . This
second assumption states that the probability of the bullet landing on any point
of the xy plane depends only on the distance of the point from the target and not
on its angle of orientation. An equivalent way of phrasing this assumption is to
say that the joint density function is rotational invariant. It is a rather interesting
fact that only 1 and 2 imply that X and Y must be normally distributed random
variables. To prove this, note first that the assumptions 1 and 2 yields the relation
f (x, y) = fX (x)fY (y)
and
f (x, y) = g(x2 + y 2 ),
respectively, so that
fX (x)fY (y) = g(x2 + y 2 )
18
for some dierentiable function g. Dierentiating this equation with resect to x

leads to
fX0 (x)fY (y) = 2xg0 (x2 + y 2 )
and so
fX0 (x)fY (y)

2xg0 (x2 + y 2 )
fX0 (x)
=
=
fX (x)fY (y)
g(x2 + y 2 )
fX (x)
or
fX0 (x)
g0 (x2 + y 2 )
=
.
2xfX (x)
g(x2 + y 2 )
Setting y 2 = R2 x2 , we see that

g0 (R2 )
fX0 (x)
=
= a constant = c
2xfX (x)
g(R2 )
since R is a constant, and so
fX0 (x)
= 2cx
fX (x)
dfX (x)
= 2cxdx
fX (x)
or
or
ln(fX (x)) = cx2 + c1
fX (x) = Aecx .
or
Since
1=
fX (x)dx = A
cx2
dx = A
cx2
dx = 2A
Setting z = cx2 , we have dz = 2cxdx, we have

Z +
Z +
dz
A
A
z
1 = 2A
e
=
z 1/2 ez dz =
1/2
2c(z/c)
c 0
c
0
which says that A = (c/)1/2 , which further says that
r
c cx2
fX (x) =
.
e
Setting
c=
1
2 2
we have
1
1
2
fX (x) =
e 2 (x/)
2
19
ecx dx.
for < x < +, which shows that X N(0, 2 ). Using the same argument
for Y , we have
1
1
2
fY (y) =
e 2 (y/)
2
for < y < +. Finally, using assumption 2, we could easily show that
=
and the student should do this. Thus we find that
1
1
1
1
2
2
f (x, y) = fX (x)fY (y) =
e 2 (x/)
e 2 (y/)
2
2
or
1 1 (x2 +y2 )/2
f (x, y) =
e 2
2 2
for < x, y < +. Using this we may now compute the probability that the
point (X, Y ) is within a distance R from the origin, i.e., P (D R) where
D = X2 + Y 2
as
which leads to
Z
P (D R) =
P (D R) = P (D2 R2 ) = P (X 2 + Y 2 R2 )
+R
+ R2 x2
R2 x2
f (x, y)dydx =
+R
+ R2 x2
R2 x2
1 1 (x2 +y2 )/2

e 2
dydx.
2 2
Using polar coordinates x = r cos() and y = r sin(), we have x2 + y 2 = r2 and

dA = dydx = rdrd, and this becomes
Z R
Z 2 Z R
1 2
1 1 (r2 /2 )
1
2
e 2
rdrd = 2
e 2 (r / ) rdr
P (D R) =
2
0
0
0 2
or
1 2
2
12 (R/)2
P (D R) = e 2 (r / ) |R
0 = 1e
which is a nice simple answer in the end.
3. The Minimum of n Independent Random Variables

Suppose that X1 , X2 , X3 , . . ., Xn are independent random variables with cdfs
F1 (x), F2 (x), F3 (x), . . ., Fn (x), respectively, and suppose that Y is a random
variable defined by
Y = min{X1 , X2 , X3 , . . . , Xn }.
(15a)
20
Then the cdf of Y (denoted by G(y)) can be computed using

G(y) = P (Y y) = 1 P (Y > y) = 1 P (X1 > y, X2 > y, . . . , Xn > y)
n
Y
= 1 P (X1 > y)P (X2 > y) P (Xn > y) = 1
P (Xk > y).
k=1
But
P (Xk > y) = 1 P (Xk y) = 1 Fk (y)
and so
n
Y
G(y) = 1 (1 Fk (y)).
(15b)
k=1
Example #14: The Minimum of n Exponential Random Variables

Suppose now that X1 , X2 , X3 , . . ., Xn are all independent and exponential
with parameters: 1 , 2 , 3 , . . ., n , respectively, then
Fk (y) = 1 ek y
and so
1 Fk (y) = ek y .
Then, if
Y = min{X1 , X2 , X3 , . . . , Xn }
we find that
G(y) = 1
or simply
(16a)
n
n
Y
Y
(1 Fk (y)) = 1
ek y
k=1
k=1
G(y) = 1 e(1 +2 +3 ++n )y = 1 ey

showing that Y is also exponential with parameter
= 1 + 2 + 3 + + n
for y 0.
21
(16b)
Example #15: The Minimum of n Geometric Random Variables

Suppose now that X1 , X2 , X3 , . . ., Xn are all independent and geometric with
parameters: p1 , p2 , p3 , . . ., pn , respectively, then
Fk (y) = 1 (1 pk )y
1 Fk (y) = (1 pk )y
and so
and so
Y = min{X1 , X2 , X3 , . . . , Xn }
leads to
(17a)
n
n
Y
Y
G(y) = 1 (1 Fk (y)) = 1 (1 pk )y
k=1
or simply
G(y) = 1 (1 p)
k=1
with
n
Y
(1 p) =
(1 pk )
k=1
showing that Y is also geometric with parameter

n
Y
p = 1 (1 pk )
(17b)
k=1
for y = 1, 2, 3, ....
Example #16: The Minimum of n Uniform Random Variables

Suppose now that X1 , X2 , X3 , . . ., Xn are all independent and continuously
uniform in the interval [a, b), then
Fk (y) =
ya
ba
and
1 Fk (y) =
by
ba
and
Y = min{X1 , X2 , X3 , . . . , Xn }
has cdf given by
n
n
Y
Y
by
G(y) = 1 (1 Fk (y)) = 1
ba
k=1
k=1
22
(18a)
or
G(y) = 1
for a y b.
by
ba
(18b)
4. The Maximum of n Independent Random Variables

Suppose that X1 , X2 , X3 , . . ., Xn are independent random variables with cdfs:
F1 (x), F2 (x), F3 (x), . . ., Fn (x), respectively, and suppose that Z is a random
variable defined by
Z = max{X1 , X2 , X3 , . . . , Xn },
(19a)
then the cdf of Z (denoted by H(z)) can be computed using
H(z) = Pr(Z z) = Pr(X1 z, X2 z, . . . , Xn z)
resulting in
H(z) =
n
Y
k=1
Pr(Xk z) =
n
Y
(19b)
Fk (z).
k=1
Example #17: The Maximum of n Uniform Random Variables

Suppose now that X1 , X2 , X3 , . . ., Xn are all independent and continuously
uniform in the interval [a, b), then
Fk (y) =
ya
ba
and
1 Fk (y) =
by
ba
and
Z = max{X1 , X2 , X3 , . . . , Xn }
has cdf given by
H(z) =
n
Y
Fk (z) =
k=1
or
H(z) =
for a z b.
n
Y
za
k=1
za
ba
23
(20a)
ba
(20b)
5. The Sum of Two Independent Discrete Random Variables

Suppose that X and Y are both independent discrete random variables with
pmfs pX (x) and pY (y) and suppose that Z = X + Y . Then the probability that
Z = z is given by
X
P (Y + x = z|X = x)P (X = x)
pZ (z) = P (Z = z) = P (Y + X = z) =
x
which can be expressed as

pZ (z) =
X
x
or simply
pZ (z) =
P (Y = z x)pX (x)
X
x
pX (x)pY (z x)
(21)
where the sum in over all possible values of x in which both pX (x) and pY (z x)
are non-zero. This is known as a convolution sum.
Example #18: The Sum of Two Geometric Distributions
Suppose that X is geometric with parameter 0 p 1 so that
0,
for x = 0, 1, 2, ...
pX (x) =
x = 1, 2, 3, ...
p(1 p)x1 , for
and Y is geometric with parameter 0 q 1 so that
0,
for y = 0, 1, 2, ...
.
pY (y) =
y1
q(1 q) , for
y = 1, 2, 3, ...
Assuming first that p 6= q, we note that if Z = X + Y , we have

pZ (z) =
X
x
pX (x)pY (z x) =
24
z1
X
x=1
pX (x)pY (z x)
(22a)
(22b)
for z = 2, 3, 4, ..., since pX (x) and pY (z x) are both non-zero only for x =
1, 2, 3, ..., z 1. This says that
x1
z1
z1
X
X
1p
x1
zx1
z2
pZ (z) =
p(1 p) q(1 q)
= pq(1 q)
1q
x=1
x=1
or
z2
pZ (z) = pq(1 q)
which reduces to
x
z2
X
1p
x=0
1q
z2
= pq(1 q)
1 ((1 p)/(1 q))z1

1 (1 p)/(1 q)
(1 q)z1 (1 p)z1
pZ (z) = pq
(22c)
pq
for z = 2, 3, 4, ..., and p 6= q, and pZ (z) = 0 for z = 1, 0, 1, 2, .... As a check,
we note that
X
X
(1 q)z1 (1 p)z1
pZ (z) =
pq
pq
z=2
z=2
!
X
X
pq
(1 q)z1
(1 p)z1
=
p q z=2
z=2
!
X
X
pq
=
(1 q)z
(1 p)z
p q z=1
z=1
1q
1p
pq
=
p q 1 (1 q) 1 (1 p)
or
X
pZ (z) = 1
z=2
as it must. Assuming next that p = q, we note that if Z = X + Y , we have

pZ (z) =
X
x
pX (x)pY (z x) =
z1
X
x=1
pX (x)pY (z x)
for z = 2, 3, 4, ..., since pX (x) and pY (z x) are both non-zero only for x =
1, 2, 3, ..., z 1. This says that
pZ (z) =
z1
X
x=1
x1
p(1 p)
zx1
p(1 p)
25
z2
= p (1 p)
z1
X
x=1
or
pZ (z) = (z 1)p2 (1 p)z2
(22d)
for z = 2, 3, 4, ..., and p = q, and pZ (z) = 0 for z = 1, 0, 1, 2, .... As a check,

we note that
X
z=2
X
X
2
z2
2
pZ (z) =
(z 1)p (1 p)
=p
z(1 p)z1 = 1
z=2
z=1
as it must. One final point is to note that

(1 q)z1 (1 p)z1
= p2 (z 1)(1 p)z2
lim pq
qp
pq
showing that Equation (22d) is consistent with Equation (22c).
Example #19: The Sum of Two Poisson Distributions
Suppose that X and Y are two Poisson random variables with parameters
and , respectively, so that
pX (x) =
e x
x!
and
pY (y) =
e y
y!
(23a)
for x = 0, 1, 2, ..., and y = 0, 1, 2, 3, ..., and suppose that Z = X + Y , then the

pmf of Z can be computed as
pZ (z) =
z
X
x=0
pX (x)pY (z x) =
e(+)
z!
z
X
e x e zx
z
X
e(+) x zx
=
(z
x)!
x!(z x)!
x=0
x=0
z
z
X
e(+) X z x zx
x zx

=
z!
x!(z x)!
z! x=0 x
x=0
x!
and so we have
e(+) ( + )z
(23b)
pZ (z) =
z!
for z = 0, 1, 2, 3, ..., which shows that Z = X + Y is also Poisson with parameter
+ . In general, if X1 , X2 , X3 , . . ., Xn are all Poisson with parameters 1 , 2 ,
3 , . . ., n , then
(23c)
X = X1 + X2 + X3 + + Xn
26
is Poisson with parameter

= 1 + 2 + 3 + + n .
(23d)
6. The Sum of Two Independent Continuous Random Variables

Suppose that X and Y are both independent continuous random variables
with pdfs fX (x) and fY (y) and suppose that Z = X + Y . Then the probability
that Z z is given by
Z
FZ (z) = P (Z z) = P (Y + X z) = P (Y + x z|X = x)P (X = x)dx
where the integration in over all possible values of x. This can be expressed as
Z
FZ (z) = P (Y z x|X = x)fX (x)dx
or simply
FZ (z) =
P (Y z x)fX (x)dx =
FY (z x)fX (x)dx.
By taking the derivative with respect to z, we find that

Z
fZ (z) = fX (x)fY (z x)dx
(24)
where the integration is over all values of x in which both fX (x) and fY (z x)
are non-zero. This is known as a convolution integral.
Example #20: The Sum of Two Exponential Distributions
Suppose that X is exponential with parameter 0 < so that
for x < 0
0,
fX (x) =
x
e , for x 0
and Y is exponential with parameter 0 < so that
for y < 0
0,
fY (y) =
.
y
e , for y 0
27
(25a)
(25b)
Assuming first that 6= , we note that if Z = X + Y , we have

Z
Z z
fX (x)fY (z x)dx =
ex e(zx) dx
fZ (z) =
0
for z 0, since fX (x) and fY (z x) are both non-zero only for 0 x z. This
says that
Z z
1 ez()
z
x+x
z
fZ (z) = e
e
dx = e
0
or
ez ez
fZ (z) =
(25c)
for z 0 and 6= , and fZ (z) = 0 for z < 0. As a check, we note that
z
Z
Z
e
ez
dz = 1
pZ (z)dz =
0
0
as it must. Assuming next that = , we note that if Z = X + Y , we have
Z
Z z
fX (x)fY (z x)dx =
ex e(zx) dx
fZ (z) =
0
for z 0, since fX (x) and fY (z x) are both non-zero only for 0 x z. This
says that
fZ (z) = 2 zez
(25d)
for z 0 and = and fZ (z) = 0 for z < 0. As a check, we note that
Z
Z
fZ (z)dz =
2 zez dz = 1
0
as it must. One final point is to note that

z
ez
e
lim
= 2 zez
showing that Equation (25d) is consistent with Equation (25c).
28
Example #21: The Sum of Two Normal Distributions

Suppose that X is normal with parameters 1 and 12 so that
2 !
1
1 x 1
exp
fX (x) =
2
1
21
and Y is normal with parameters 2 and 22 so that
2 !
1
1 y 2
.
exp
fY (y) =
2
2
22
(26a)
(26b)
Then if Z = X + Y , we have
Z +
fX (x)fY (z x)dx
fZ (z) =
2 !
Z +
1
1 x 1
exp
=
2
1
21
2 !
1
1 z x 2
dx.
exp
2
2
22
Setting w = (x 1 )/1 , we have x = 1 + w1 and dw = dx/1 , and so
2 !
Z +
1 z 1 w1 2
1
1 2
fZ (z) =
dw
exp w exp
22
2
2
2
2 !
Z +
1 z w1 (1 + 2 )
1 2
1
dw
exp w exp
=
22
2
2
2
or
2 !
1
1 z (1 + 2 )
fZ (z) =
exp
22
2
2
Z +
12
z (1 + 2 )
1 2
dw.
exp w 1 + 2 + w1
2
2
22
Using the fact that
aw2 +bw
b2 /4a
dw = e
29
(26c)
for a > 0, we see that
fZ (z) =
1
1
exp
22
2
z (1 + 2 )
2
2 !
which reduces to

2 v
u
1 z(12+2 )
u 2
2
exp
t
2
2
1 + 12
2 1 + 12
2
1
1
fZ (z) = p 2
exp
2
2 1 + 22
!2
z (1 + 2 )
p
12 + 22
(26d)
showing that Z = X + Y is also normal with parameters 1 + 2 and 12 + 22 .

Example #22: The Sum of Two Identical Uniform Distributions
Suppose that X U[a, b) and Y U[a, b) so that if Z = X + Y , then
RZ = {x|2a x < 2b}.
Then
fX (x) =
and
fY (y) =
and
fZ (z) =
or
1/(b a), for a x < b
(27a)
1/(b a), for a y < b
(27b)
for all other values of x
0,
for all other values of y
0,
fX (x)fY (z x)dx =
1
fZ (z) =
ba
1
fY (z x)dx
ba
fY (z x)dx.
Now fY (z x) 6= 0 only for a z x < b, which says that

zb<xza
30
and so
or
1
fZ (z) =
ba
min(b,za)
max(a,zb)
fZ (z) =
1
1
dx =
ba
(b a)2
min(b,za)
dx
max(a,zb)
min(b, z a) max(a, z b)
.
(b a)2
(27c)
For 2a z a + b, we have z a b and z b a so that

min(b, z a) = z a
and then
fZ (z) =
and
max(a, z b) = a
zaa
z 2a
=
.
2
(b a)
(b a)2
But for a + b z 2b, we have z a b and z b a so that

min(b, z a) = b
and then
fZ (z) =
Thus we find that
fZ (z) =
and
max(a, z b) = z b
b (z b)
2b z
=
.
2
(b a)
(b a)2
z 2a, for 2a z a + b
(b a)2
(27d)
2b z, for a + b z 2b
which is a symmetric triangular distribution. A plot using a = 5 and b = 20 is

shown below.
31
.07
.06
.05
.04
.03
.02
.01
10
20
30
40
50
Plot of fZ (z) versus z for a = 5 and b = 20

and it shows a symmetric triangular distribution.
7. The Erlang Distribution
We have seen in the previous chapter that the sum of k independent and
identical exponential distributions (all with parameter )
X = X1 + X2 + X3 + + Xk ,
(28a)
is an Erlang distribution with parameters = /k and k. The pdf is given by

fX (x) =
k xk1 ex
(k 1)!
(28b)
for x 0 and fX (x) = 0 for x < 0. We had also seen that the cdf of the Erlang
distribution is given by
x
FX (x) = 1 e
k1
X
(x)i
i=0
i!
(28c)
for 0 < x, and F (x) = 0 for x 0.

Example #23: Being In The Dark
A box contains four identical light bulbs. Each light bulb has a lifetime distribution that is exponential with a mean of 150 hours. Person A takes two of
32
these light bulbs into a room and turns them both on at the same time and leaves
them turned on. Person B takes the other two light bulbs into a dierent room
and turns on only one of the light bulbs and then turns on the other light bulb
only when and if the first light bulb burns out. (a) Compute the probability that
person A will be in the dark at the end of one week (168 hours). (b) Compute the
probability that person B will be in the dark at the end of one week (168 hours).
To solve part (a), let X1 , X2 , X3 , and X4 be the random variables for the
lifetime (in hours) of the four light bulbs so that
F1 (x) = F2 (x) = F3 (x) = F4 (x) = 1 ex
with = 1/150. Since person A keeps both light bulbs on at the same time, this
person will be in the dark after one week (24 7 = 168 hours) only when
Y = max(X1 , X2 ) 168.
It should be clear that
Fmax (y) = P (Y y) = P (max(X1 , X2 ) y) = P ((X1 y) (X2 y)).
Since X1 and X2 are independent, we have
Fmax (y) = P ((X1 y) (X2 y)) = P (X1 y)P (X2 y)
resulting in
Fmax (y) = F1 (y)F2 (y) = (1 ey/150 )2
and so
P (Y 168) = (1 e168/150 )2 ' 0.4539.
To solve (b), we note that since person B keeps the light bulbs on only one at the
same time, this person will be in the dark after one week (24 7 = 168 hours)
only when
Z = X3 + X4 168.
From our notes in class we know that Z will be an Erlang distribution with
parameters k = 2, and = 1/300. Then
Fsum (z) = 1
21
X
e2(1/300)z
i=0
33
(2(1/300)z)i
i!
which reduces to
Then
z z/150
e
Fsum (z) = 1 1 +
.
150
168 168/150
e
,
P (Z 168) = 1 1 +
150
or P (Z 168) ' 0.3083.
8. Sum of Squared of Independent Standard Normals is Chi-Squared

We shall show in the next chapter that if Z1 , Z2 , Z3 , . . ., Zn are all independent
and from N(0, 1), then the random variable
X = Z12 + Z22 + Z32 + + Zn2
has a chi-squared distribution with n degrees of freedom.
The Chi-Squared Distribution With Degrees of Freedom ( 2 )
The chi-squared distribution with degrees of freedom (denoted by 2 ) is a
special case of a gamma distribution with = /2 and = 1/. This then has a
pdf given by
1
x/21 ex/2
f (x) = /2
2 (/2)
for 0 x and f (x) = 0 for x < 0. The cdf of this distribution is given by
Z x
Z x
1
F (x) =
f (z)dz = /2
z /21 ez/2 dz.
2
(/2)
0
0
Recall that
(/2) =
z /21 ez dz
is the gamma function evaluated at /2 and is given by (/2 1)! when /2 is an

integer. The mean and variance of the 2 distribution are given by E(X) = and
V (X) = 2, respectively. Typical plots of f (x) are shown in the figure below.
34
1
0.8
0.6
0.4
0.2
Plots of f (x) for = 1 and 2 and 3

showing: f1 (0) > f2 (0) > f3 (0)
Note that the mode of the 2 distribution is given by 2 for 2 . Note also
that for = 2, we find that
f2 (x) =
1
22/2 (2/2)
1
x2/21 ex/2 = ex/2
2
which is an exponential distribution with mean 2.
35

Notes#7 PDF

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Notes#7 PDF

Uploaded by

Copyright:

Available Formats

Probability and Statistics

Then, since probability is a continuous set function, we have

In a similar way, we have

F ((X > a) (Y > b)) = 1 F (X > a) (Y > b)

The student should show that, in general

When X and Y are Both Discrete Random Variables

for all x RX and y RY . The probability mass functions of X and Y separately

p(x, y) = P ((X = x) (Y = y)) =

The results of these can be summarized in the following table

pX (x) = Row Sums

P ((X > 1) (Y < 1)) =

which reduces to P (X < Y ) = 1/3.

2ex e2y dydx = e1 e3 ' 0.3181.

2ex e2y dxdy,

Example #5: A Problem in Geometry

The point (X, Y )

and this leads to

To determine the pdf (fX (x)) for X, we have

and if X1 , X2 , X3 , ..., Xn are all continuous random variables, we have

as the joint pdf function for X1 , X2 , X3 , ..., Xn .

where n1 + n2 + n3 + + nr = n. For a specific example, suppose that a fair die

2. Independent Random Variables

for all x RX and y RY . When X and Y are discrete random variables, we

2 (FX (x)FY (y))

The region in which any one has to wait

The results of these can be summarized in the following table

Example #10: A Geometry Problem - Buons Needle Problem

X is the length of the dashed line and is the

Example #11: A Geometry Problem - The Pipe Cleaner Problem

The pipe cleaner is bent at two points

The pipe cleaner is bent into a triangle

= P (Triangle with R1 R2 ) + P (Triangle with R2 R1 )

Thus we want to compute

The horizontal axis is R1 , the vertical axis is

We can also compute this probability using

The horizontal axis is R1 , the vertical axis is

Example #12: Another Pipe-Cleaner Problem

R1 (R2 R1 ) + (1 R2 ) and R2 R1 (R1 ) + (1 R2 )

which all reduce to

Thus we want to compute

P (Triangle with R1 R2 ) = P ((R1 1/2) (R2 R1 + 1/2) (1/2 R2 ))

The horizontal axis is R1 , the vertical axis is

Example #13: Characterization of the Normal Distribution

for some dierentiable function g. Dierentiating this equation with resect to x

fX0 (x)fY (y)

Setting y 2 = R2 x2 , we see that

Setting z = cx2 , we have dz = 2cxdx, we have

1 1 (x2 +y2 )/2

Using polar coordinates x = r cos() and y = r sin(), we have x2 + y 2 = r2 and

3. The Minimum of n Independent Random Variables

Then the cdf of Y (denoted by G(y)) can be computed using

Example #14: The Minimum of n Exponential Random Variables

G(y) = 1 e(1 +2 +3 ++n )y = 1 ey

Example #15: The Minimum of n Geometric Random Variables

showing that Y is also geometric with parameter

Example #16: The Minimum of n Uniform Random Variables

4. The Maximum of n Independent Random Variables

Example #17: The Maximum of n Uniform Random Variables

5. The Sum of Two Independent Discrete Random Variables

which can be expressed as

and Y is geometric with parameter 0 q 1 so that

Assuming first that p 6= q, we note that if Z = X + Y , we have