Random Variables
The most important questions of life are, for the most part,
really only problems of probability. Pierre Simon de Laplace
EXAMPLE 4.4 Suppose we toss 3 fair coins. Let Y denote the number
of heads appearing, then Y takes values 0, 1, 2, 3.
And
P(Y = 0) = P((T, T, T)) = 1/8,
P(Y = 1) = P((H, T, T), (T, H, T), (T, T, H)) = 3/8,
P(Y = 2) = P((H, H, T), (H, T, H), (T, H, H)) = 3/8,
P(Y = 3) = P((H, H, H)) = 1/8.
Suppose the game is such that you win if the largest number obtained is at least 17. What is the probability of winning?
EXAMPLE 4.6 Suppose that there are N distinct types of coupons and
that each time one obtains a coupon, it is, independently of previ-
ous selections, equally likely to be any one of the N types. Let T be
the number of coupons that need to be collected until one obtains a
complete set of at least one of each type. Compute P(T = n).
Solution:
Consider the probability P(T > n). Once we know that, P(T = n) will be given as
P(T = n) = P(T > n − 1) − P(T > n).
For j = 1, . . . , N, let A_j denote the event that no type j coupon is contained among the first n coupons collected. Note that ⋃_{j=1}^{N} A_j means that at least one of the N types of coupons has not been collected among the first n, so by inclusion-exclusion
P(T > n) = P(⋃_{j=1}^{N} A_j)
= ∑_j P(A_j) − ∑_{j_1 < j_2} P(A_{j_1} A_{j_2}) + · · ·
+ (−1)^{k+1} ∑_{j_1 < j_2 < · · · < j_k} P(A_{j_1} A_{j_2} · · · A_{j_k}) + · · ·
+ (−1)^{N+1} P(A_1 A_2 · · · A_N).
P(A_j) = ((N − 1)/N)^n.
Similarly, P(A_{j_1} A_{j_2} · · · A_{j_k}) = ((N − k)/N)^n, and hence
P(T > n) = N ((N − 1)/N)^n − \binom{N}{2} ((N − 2)/N)^n + \binom{N}{3} ((N − 3)/N)^n − · · · + (−1)^N \binom{N}{N − 1} (1/N)^n
= ∑_{i=1}^{N−1} \binom{N}{i} (−1)^{i+1} ((N − i)/N)^n.
The probability that T equals n can now be obtained from the preceding formula by the use of
P(T = n) = P(T > n − 1) − P(T > n).
REMARK:
One must collect at least N coupons to obtain a complete set, so P(T > n) = 1 if n < N. Therefore we obtain the interesting combinatorial identity that, for integers 1 ≤ n < N,
∑_{i=1}^{N−1} \binom{N}{i} (−1)^{i+1} ((N − i)/N)^n = 1,
which can be written as
∑_{i=0}^{N−1} \binom{N}{i} (−1)^{i+1} ((N − i)/N)^n = 0.
p(k) = c λ^k / k!,  k = 0, 1, . . .
where λ > 0 is a fixed positive value and c is a suitably chosen con-
stant.
(a) What is this suitable constant?
(b) Compute P(X = 0) and
(c) P(X > 2).
Solution:
(a) Since
∑_{k=0}^{∞} p(k) = 1,
we have that
1 = ∑_{k=0}^{∞} c λ^k/k! = c ∑_{k=0}^{∞} λ^k/k! = c e^{λ}
and so c = e^{−λ}.
Note: c is known as the normalising constant.
(b) It follows that
P(X = 0) = p(0) = c λ^0/0! = c = e^{−λ}.
(c)
P(X > 2) = 1 − P(X ≤ 2)
= 1 − P(X = 0) − P(X = 1) − P(X = 2)
= 1 − e^{−λ} − λ e^{−λ} − (λ^2/2) e^{−λ}
= 1 − e^{−λ} (1 + λ + λ^2/2).
REMARK:
Suppose that X is discrete and takes values x_1, x_2, x_3, . . . where x_1 < x_2 < x_3 < · · ·. Note then that F is a step function, that is, F is constant on the interval [x_{i−1}, x_i) (where it takes the value p(x_1) + · · · + p(x_{i−1})), and then takes a jump of size p(x_i) at x_i.
For instance, if X has a probability mass function given by
p(1) = 1/4,  p(2) = 1/2,  p(3) = 1/8,  p(4) = 1/8,
then its cumulative distribution function is
F(a) = 0     for a < 1,
       1/4   for 1 ≤ a < 2,
       3/4   for 2 ≤ a < 3,
       7/8   for 3 ≤ a < 4,
       1     for 4 ≤ a.
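The step-function behaviour of F can be made concrete in a couple of lines (a sketch; the helper name cdf is mine):

def cdf(a, pmf):
    # F(a) = sum of p(x) over all x <= a: piecewise constant, jumping by p(x_i) at x_i
    return sum(prob for x, prob in pmf.items() if x <= a)

pmf = {1: 1/4, 2: 1/2, 3: 1/8, 4: 1/8}
for a in [0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 5]:
    print(a, cdf(a, pmf))   # 0, 1/4, 1/4, 3/4, 3/4, 7/8, 7/8, 1, 1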
Interpretations of Expectation
See [Ross, pp. 125 – 126, 128]
(i) Weighted average of the possible values that X can take on, where the weight attached to each value is the probability that X assumes it.
E(X) = 0 × (1 − p) + 1 × p = p.
EXAMPLE 4.14 Let X denote the number obtained when a fair die is
rolled. Then, E(X) = 3.5.
Solution:
Here X takes values 1, 2, . . . , 6 each with probability 1/6. Hence
E(X) = 1 × 1/6 + 2 × 1/6 + · · · + 6 × 1/6
= (1 + 2 + 3 + 4 + 5 + 6)/6
= 7/2.
Solution:
Let X be the number of children they should continue to have until
they have one of each sex. For i ≥ 2, X = i if and only if either
(i) all of their first i − 1 children are boys and the ith child is a girl,
or
(ii) all of their first i − 1 children are girls and the ith child is a boy.
So by independence,
P(X = i) = (1/2)^{i−1} (1/2) + (1/2)^{i−1} (1/2) = (1/2)^{i−1},  i ≥ 2.
And
E(X) = ∑_{i=2}^{∞} i (1/2)^{i−1} = −1 + ∑_{i=1}^{∞} i (1/2)^{i−1} = −1 + 1/(1 − 1/2)^2 = 3.
Note that for |r| < 1, ∑_{i=1}^{∞} i r^{i−1} = 1/(1 − r)^2.
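The value E(X) = 3 can be double-checked by truncating the series or by simulation (a sketch, assuming each child is independently a boy or a girl with probability 1/2):

import random

# Truncated series: sum over i >= 2 of i * (1/2)^(i-1)
print(sum(i * 0.5**(i - 1) for i in range(2, 200)))        # ≈ 3

def children_until_both_sexes():
    first = random.random() < 0.5          # sex of the first child
    n = 1
    while (random.random() < 0.5) == first:
        n += 1                             # another child of the same sex as the first
    return n + 1                           # the last child is of the opposite sex

trials = 200_000
print(sum(children_until_both_sexes() for _ in range(trials)) / trials)   # ≈ 3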
Solution:
If he attempts to answer question 1 first, then he will win
0 with probability 1 − p_1,
v_1 with probability p_1 (1 − p_2),
v_1 + v_2 with probability p_1 p_2,
so his expected winnings are
v_1 p_1 (1 − p_2) + (v_1 + v_2) p_1 p_2.
Similarly, if he attempts question 2 first, his expected winnings are
v_2 p_2 (1 − p_1) + (v_1 + v_2) p_1 p_2.
Hence it is better to attempt question 1 first if
v_1 p_1 (1 − p_2) ≥ v_2 p_2 (1 − p_1),
or, equivalently, if
v_1 p_1 / (1 − p_1) ≥ v_2 p_2 / (1 − p_2).
For example, if he is 60 percent certain of answering question 1, worth
$200, correctly and he is 80 percent certain of answering question 2,
worth $100, correctly, then he should attempt to answer question 2
first because
400 = (100 × 0.8)/0.2 > (200 × 0.6)/0.4 = 300.
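A small helper makes the decision rule easy to apply (a sketch; the function name and the numbers simply mirror the example above):

def expected_winnings(v_first, p_first, v_second, p_second):
    # Attempt the first question; the second is attempted only if the first is correct.
    return v_first * p_first * (1 - p_second) + (v_first + v_second) * p_first * p_second

# Question 1: $200 at 60%; question 2: $100 at 80%
print(expected_winnings(200, 0.6, 100, 0.8))   # question 1 first: 168
print(expected_winnings(100, 0.8, 200, 0.6))   # question 2 first: 176, so answer 2 first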
Given X, we are often interested in g(X) and E[g(X)]. How do we compute E[g(X)]? One way is to find the probability mass function of g(X) first and then compute E[g(X)] by definition.
Solution:
Let Y = X^2. It follows that Y takes values 0 and 1, with P(Y = 0) = 0.5 and P(Y = 1) = 0.5.
Hence
E(X^2) = E(Y) = 0 × 0.5 + 1 × 0.5 = 0.5.
The procedure of introducing a new random variable Y = g(X) and finding its probability mass function is clumsy. Fortunately, we have the following result, which allows E[g(X)] to be computed directly from the probability mass function of X as E[g(X)] = ∑_i g(x_i) p(x_i).
Proof Group together all the terms in ∑_i g(x_i) p(x_i) having the same value of g(x_i). Suppose y_j, j ≥ 1, represent the different values of g(x_i), i ≥ 1. Then,
∑_i g(x_i) p(x_i) = ∑_j ∑_{i: g(x_i) = y_j} y_j p(x_i)
= ∑_j y_j ∑_{i: g(x_i) = y_j} p(x_i)
= ∑_j y_j P(g(X) = y_j)
= E[g(X)].
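In code, the result says E[g(X)] can be computed directly from the pmf of X, with no need to first derive the pmf of Y = g(X). A minimal sketch (the pmf below, with X taking values −1, 0, 1, is an assumed example consistent with E(X^2) = 0.5 above):

def expect(g, pmf):
    # E[g(X)] = sum over i of g(x_i) p(x_i)
    return sum(g(x) * p for x, p in pmf.items())

pmf = {-1: 0.2, 0: 0.5, 1: 0.3}            # assumed pmf with P(X^2 = 1) = 0.5
print(expect(lambda x: x, pmf))             # E[X]   = 0.1
print(expect(lambda x: x**2, pmf))          # E[X^2] = 0.5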
E[aX + b] = aE(X) + b.
E(X^2) = ∑_x x^2 p_X(x),
E[(X − µ)^k]
REMARK:
(a) The expected value of a random variable X, E(X), is also referred to as the first moment or the mean of X.
E[(X − µ)^2]
E(IA ) = P(A).
Solution:
Let s be the amount Peter orders. Let Y be the profit (note that it is a function of s), then
Y = bX − l(s − X)  if X ≤ s,
Y = sb             if X > s.
Here
φ(s) := E(Y) = ∑_{i=0}^{s} [bi − l(s − i)] p(i) + ∑_{i=s+1}^{∞} sb p(i)
= ∑_{i=0}^{s} [(b + l)i − ls] p(i) + ∑_{i=s+1}^{∞} sb p(i)
= (b + l) ∑_{i=0}^{s} i p(i) − ls ∑_{i=0}^{s} p(i) + sb [1 − ∑_{i=0}^{s} p(i)]
= (b + l) ∑_{i=0}^{s} i p(i) − (b + l) s ∑_{i=0}^{s} p(i) + sb
= sb − (b + l) ∑_{i=0}^{s} (s − i) p(i).
φ(s + 1) − φ(s)
= [(s + 1)b − (b + l) ∑_{i=0}^{s+1} ((s + 1) − i) p(i)] − [sb − (b + l) ∑_{i=0}^{s} (s − i) p(i)]
= b − (b + l) ∑_{i=0}^{s} p(i).
Since ∑_{i=0}^{s} p(i) is nondecreasing in s, φ(s + 1) > φ(s) if and only if ∑_{i=0}^{s} p(i) < b/(b + l). From this we see that φ first increases and then decreases; let s* be the change point, that is,
φ(0) < φ(1) < · · · < φ(s* − 1) < φ(s*) > φ(s* + 1) > · · · .
∎
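Numerically, the increment φ(s + 1) − φ(s) = b − (b + l) ∑_{i=0}^{s} p(i) changes sign at the first s with ∑_{i=0}^{s} p(i) ≥ b/(b + l), which is therefore the optimal order size. A sketch (the Poisson demand and the values of b and l are illustrative assumptions, not from the text):

import math

b, l, lam = 3.0, 1.0, 10.0                   # assumed profit, loss and mean demand
p = lambda i: math.exp(-lam) * lam**i / math.factorial(i)

def phi(s):
    # phi(s) = s*b - (b + l) * sum_{i=0}^{s} (s - i) p(i)
    return s * b - (b + l) * sum((s - i) * p(i) for i in range(s + 1))

# s* = smallest s with P(X <= s) >= b/(b + l)
s_star = next(s for s in range(1000) if sum(p(i) for i in range(s + 1)) >= b / (b + l))
print(s_star, max(range(40), key=phi))       # both report the maximiser of phi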
REMARK:
(1) Note that var(X) ≥ 0. (Why?)
(2) var(X) = 0 if and only if X is a degenerate random variable (that is, the
random variable taking only one value, its mean).
E(X 2 ) ≥ [E(X)]2 ≥ 0.
Solution:
It was shown that E[X] = 7/2. Now
E[X^2] = 1^2 × 1/6 + 2^2 × 1/6 + 3^2 × 1/6 + 4^2 × 1/6 + 5^2 × 1/6 + 6^2 × 1/6 = 91/6.
So
var(X) = 91/6 − (7/2)^2 = 35/12.
Proof
(i)
(1) Each trial results in either the occurrence or the non-occurrence of a particular event. Occurrence of the event is called a success, and non-occurrence a failure. Write p := P(success), and q := 1 − p = P(failure).
Examples:
Such trials are called Bernoulli(p) trials. We now introduce some random
variables related to Bernoulli trials:
Here
P(X = 1) = p, P(X = 0) = 1 − p
and
E(X) = p, var(X) = p(1 − p).
Here
E(X) = np, var(X) = np(1 − p).
Proof
E(X) = ∑_{k=0}^{n} k \binom{n}{k} p^k q^{n−k} = ∑_{k=1}^{n} k (n!/(k!(n − k)!)) p^k q^{n−k}
= np ∑_{k=1}^{n} ((n − 1)!/((k − 1)! [(n − 1) − (k − 1)]!)) p^{k−1} q^{(n−1)−(k−1)}
= np ∑_{j=0}^{n−1} \binom{n − 1}{j} p^j q^{(n−1)−j},  where j = k − 1
= np.
Now
E(X(X − 1)) = ∑_{k=0}^{n} k(k − 1) \binom{n}{k} p^k q^{n−k} = ∑_{k=2}^{n} k(k − 1) (n!/(k!(n − k)!)) p^k q^{n−k}
= · · ·
= n(n − 1) p^2.
So
var(X) = n(n − 1)p2 + np − (np)2 = np(1 − p).
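A quick numerical confirmation of E(X) = np and var(X) = np(1 − p), computed directly from the binomial pmf (a sketch):

import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 0.3
mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
second_moment = sum(k * k * binom_pmf(k, n, p) for k in range(n + 1))
print(mean, n * p)                                      # 6.0  6.0
print(second_moment - mean**2, n * p * (1 - p))         # 4.2  4.2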
Suppose that X is binomial with parameters (n, p). The key to computing its distribution function
P(X ≤ i) = ∑_{k=0}^{i} \binom{n}{k} p^k (1 − p)^{n−k},  i = 0, 1, . . . , n,
is to use the following relationship between P(X = k + 1) and P(X = k):
P(X = k + 1) = (p/(1 − p)) · ((n − k)/(k + 1)) · P(X = k).
Proof
P(X = k + 1)/P(X = k) = [(n!/((n − k − 1)!(k + 1)!)) p^{k+1} (1 − p)^{n−k−1}] / [(n!/((n − k)!k!)) p^k (1 − p)^{n−k}]
= (p/(1 − p)) · ((n − k)/(k + 1)).
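The recursion gives a convenient way to tabulate the whole binomial pmf, and hence the distribution function, from the single starting value P(X = 0) = (1 − p)^n. A minimal sketch:

def binomial_pmf_table(n, p):
    # P(X = k + 1) = [p/(1 - p)] * [(n - k)/(k + 1)] * P(X = k), starting from (1 - p)^n
    probs = [(1 - p)**n]
    for k in range(n):
        probs.append(probs[-1] * (p / (1 - p)) * (n - k) / (k + 1))
    return probs

probs = binomial_pmf_table(10, 0.3)
print(sum(probs))          # ≈ 1
print(sum(probs[:4]))      # P(X <= 3)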
(Note, the trial leading to the first success is included.) Here, X takes
values 1, 2, 3, . . . and so on. In fact, for k ≥ 1,
P(X = k) = p q^{k−1}.
And
E(X) = 1/p,  var(X) = (1 − p)/p^2.
X′ = number of failures in the Bernoulli(p) trials in order to obtain the first success.
Here
X = X′ + 1.
Hence, X′ takes values 0, 1, 2, . . . and
P(X′ = k) = p q^k,  for k = 0, 1, . . . .
And
E(X′) = (1 − p)/p,  var(X′) = (1 − p)/p^2.
(d) Negative Binomial random variable, denoted by NB(r, p);
Define the random variable
And
E(X) = r/p,  var(X) = r(1 − p)/p^2.
REMARK:
Take note that Geom(p) = NB(1, p).
Solution:
which reduces to
3(p − 1)^2 (2p − 1) > 0 ⟺ p > 1/2.
So p needs to be greater than 1/2.
(b) Consider a system of 2k + 1 components and let X denote the number of the first 2k − 1 components that function.
P_{2k+1}(effective)
= P(X ≥ k + 1) + P(X = k)(1 − (1 − p)^2) + P(X = k − 1) p^2.
Thus we need
P_{2k+1}(effective) − P_{2k−1}(effective)
= P(X = k − 1) p^2 − (1 − p)^2 P(X = k)
= \binom{2k − 1}{k − 1} p^{k−1} (1 − p)^k p^2 − (1 − p)^2 \binom{2k − 1}{k} p^k (1 − p)^{k−1}
= \binom{2k − 1}{k − 1} p^k (1 − p)^k [p − (1 − p)] > 0 ⟺ p > 1/2.
The last equality follows because \binom{2k − 1}{k − 1} = \binom{2k − 1}{k}.
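The comparison can also be checked numerically. The sketch below assumes, as in the derivation above, that a system of m (odd) components is effective when a majority of its independent components function:

import math

def p_effective(m, p):
    # probability that more than half of the m components function
    return sum(math.comb(m, j) * p**j * (1 - p)**(m - j) for j in range(m // 2 + 1, m + 1))

for p in (0.4, 0.5, 0.6):
    print(p, p_effective(3, p), p_effective(5, p))   # 5 components beat 3 only when p > 1/2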
Solution:
Suppose the probability that each accident occurs on Friday the 13th
is 1/30, just as on any other day. Then the probability of at least four
accidents on Friday the 13th is
1 − ∑_{i=0}^{3} \binom{12}{i} (1/30)^i (29/30)^{12−i} ≈ 0.000493.
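In code, the same tail probability is a one-liner (a sketch using only the standard library):

import math

n, p = 12, 1/30
tail = 1 - sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(4))
print(tail)    # ≈ 0.00049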
Solution:
This is given by
∞
P(T > k) = ∑ q j−1 p
j=k+1
= qk (p + qp + q2 p + · · · ) = qk .
This probability can also be found by noting that we are asking for
no successes (i.e., arrivals) in a sequence of k consecutive time units,
where the probability of a success in any one time unit is p. Thus, the
probability is just qk .
Solution:
Let E denote the event that the mathematician first discovers that
the right-hand matchbox is empty and that there are i matches in the
left-hand box at the time. Now, this event will occur if and only if the
(N + 1)th choice of the right-hand matchbox is made at the (N + 1 +
N − i)th trial. Hence, using the Negative Binomial formulation, with
p = 1/2, r = N + 1 and k = 2N − i + 1, we see that
P(E) = \binom{2N − i}{N} (1/2)^{2N−i+1}.
Since there is an equal probability that it is the left-hand box that is first
discovered to be empty and there are i matches in the right-hand box
at that time, the desired result is
2 P(E) = \binom{2N − i}{N} (1/2)^{2N−i}.
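The matchbox probabilities are easy to tabulate and to confirm by simulation (a sketch; the value N = 5 is an illustrative assumption):

import math, random

def match_prob(i, N):
    # 2 P(E) = C(2N - i, N) * (1/2)^(2N - i)
    return math.comb(2 * N - i, N) * 0.5**(2 * N - i)

def simulate(N, trials=200_000):
    counts = [0] * (N + 1)
    for _ in range(trials):
        left = right = N
        while True:
            if random.random() < 0.5:       # reach into the right-hand pocket
                if right == 0:
                    counts[left] += 1       # discovered empty; this many matches remain on the left
                    break
                right -= 1
            else:                           # reach into the left-hand pocket
                if left == 0:
                    counts[right] += 1
                    break
                left -= 1
    return [c / trials for c in counts]

N = 5
print([round(match_prob(i, N), 4) for i in range(N + 1)])
print([round(f, 4) for f in simulate(N)])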
P(X = k) = e^{−λ} λ^k / k!,  for k = 0, 1, 2, . . . .   (4.1)
This defines a probability mass function, since
∑_{k=0}^{∞} P(X = k) = e^{−λ} ∑_{k=0}^{∞} λ^k/k! = e^{−λ} e^{λ} = 1.
Notation: X ∼ Poisson(λ ).
The Poisson random variable has a tremendous range of application in di-
verse areas because it can be used as an approximation for a binomial ran-
dom variable with parameters (n, p) when n is large and p is small enough
so that np is of moderate size. To see this, suppose X is a binomial random
variable with parameters (n, p) and let λ = np. Then
P(X = k) = (n!/((n − k)!k!)) p^k (1 − p)^{n−k}
= (n!/((n − k)!k!)) (λ/n)^k (1 − λ/n)^{n−k}
= (n(n − 1) · · · (n − k + 1)/n^k) (λ^k/k!) ((1 − λ/n)^n / (1 − λ/n)^k).
Now, for n large and λ moderate,
(1 − λ/n)^n ≈ e^{−λ},  n(n − 1) · · · (n − k + 1)/n^k ≈ 1  and  (1 − λ/n)^k ≈ 1.
Hence, for n large and λ moderate,
P(X = k) ≈ e^{−λ} λ^k/k!.
REMARK:
In other words, if n independent trials, each of which results in a success with probability p, are performed, then when n is large and p is small enough to make np moderate, the number of successes occurring is approximately a Poisson random variable with parameter λ = np.
Some examples of random variables that obey the Poisson probability law,
that is, Equation (4.1) are:
Each of the preceding, and numerous other random variables, is approximately Poisson for the same reason – because of the Poisson approximation to the binomial.
EXAMPLE 4.35 Suppose that the number of typographical errors on a page of a book has a Poisson distribution with parameter λ = 1/2.
Calculate the probability that there is at least one error on a page.
Solution:
Let X denote the number of errors on the page. We have
P(X ≥ 1) = 1 − P(X = 0) = 1 − e^{−1/2} ≈ 0.393.
Solution:
The desired probability is
\binom{10}{0} (0.1)^0 (0.9)^{10} + \binom{10}{1} (0.1)^1 (0.9)^9 ≈ 0.7361,
whereas the Poisson approximation (λ = np = 0.1 × 10 = 1) yields the
value
e^{−1} + e^{−1} ≈ 0.7358.
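The comparison generalises; a sketch that reproduces both numbers and lets n and p vary:

import math

def binom_at_most(j, n, p):
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(j + 1))

def poisson_at_most(j, lam):
    return math.exp(-lam) * sum(lam**k / math.factorial(k) for k in range(j + 1))

n, p = 10, 0.1
print(binom_at_most(1, n, p))       # exact:                 ≈ 0.7361
print(poisson_at_most(1, n * p))    # Poisson approximation: ≈ 0.7358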
E(X) = λ , var(X) = λ .
Proof We have
E(X) = ∑_{k=0}^{∞} k (λ^k/k!) e^{−λ}
= e^{−λ} ∑_{k=1}^{∞} k λ^k/k!
= λ e^{−λ} ∑_{k=1}^{∞} λ^{k−1}/(k − 1)!
= λ e^{−λ} ∑_{j=0}^{∞} λ^j/j! = λ
and
E(X(X − 1)) = ∑_{k=0}^{∞} k(k − 1) (λ^k/k!) e^{−λ}
= λ^2 e^{−λ} ∑_{k=2}^{∞} λ^{k−2}/(k − 2)!
= λ^2 e^{−λ} ∑_{j=0}^{∞} λ^j/j! = λ^2.
Note that var(X) = E[X(X − 1)] + E(X) − [E(X)]^2 = λ^2 + λ − λ^2 = λ.
REMARK:
The Poisson distribution with parameter np is a very good approximation
to the distribution of the number of successes in n independent trials when
each trial has probability p of being a success, provided that n is large and p
small. In fact, it remains a good approximation even when the trials are not
independent, provided that their dependence is weak.
X = number of matches = E1 + · · · + En
and that
P(E_i) = 1/n  and  P(E_i | E_j) = 1/(n − 1),  j ≠ i,
showing that, though the events E_i, i = 1, . . . , n, are not independent, their dependence, for large n, is weak. So it is reasonable to expect that the number of matches X will approximately have a Poisson distribution with parameter n × 1/n = 1, and thus
P(X = k) ≈ e^{−1}/k!,  k = 0, 1, . . . .
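A simulation shows how close the number of matches is to Poisson(1) even for moderate n. The sketch below assumes the standard matching setup, where n items are randomly permuted and E_i is the event that item i lands in its own position:

import math, random
from collections import Counter

def simulate_matches(n, trials=100_000):
    counts = Counter()
    for _ in range(trials):
        perm = list(range(n))
        random.shuffle(perm)
        counts[sum(i == perm[i] for i in range(n))] += 1
    return counts

n, trials = 10, 100_000
counts = simulate_matches(n, trials)
for k in range(5):
    print(k, counts[k] / trials, math.exp(-1) / math.factorial(k))   # ≈ Poisson(1) pmf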
REMARK:
For the number of events to occur to approximately have a Poisson distribu-
tion, it is not essential that all the events have the same probability of occur-
rence, but only that all of these probabilities be small.
Poisson Paradigm
Consider n events, with p_i equal to the probability that event i occurs, i = 1, . . . , n. If all the p_i are “small” and the trials are either independent or at most “weakly dependent”, then the number of these events that occur approximately has a Poisson distribution with mean ∑_{i=1}^{n} p_i.
Another use of the Poisson distribution arises in situations where events oc-
cur at certain points in time. One example is to designate the occurrence of
an earthquake as an event; another possibility would be for events to corre-
spond to people entering a particular establishment; and a third possibility
is for an event to occur whenever a war starts. Let us suppose that events are
indeed occurring at certain random points of time, and let us assume that,
for some positive constant λ , the following assumptions hold true:
1. The probability that exactly 1 event occurs in a given interval of length h is equal to λh + o(h), where o(h) stands for any function f(h) for which lim_{h→0} f(h)/h = 0. (For instance, f(h) = h^2 is o(h), whereas f(h) = h is not.)
It can be shown that under the assumptions mentioned above, the number of
events occurring in any interval of length t is a Poisson random variable with
parameter λt, and we say that the events occur in accordance with a Poisson
process with rate λ .
EXAMPLE 4.40 Suppose that earthquakes occur in the western por-
tion of the United States in accordance with assumptions 1, 2, and
3, with λ = 2 and with 1 week as the unit of time. (That is, earth-
quakes occur in accordance with the three assumptions at a rate of 2
per week.)
(a) Find the probability that at least 3 earthquakes occur during the
next 2 weeks.
(b) Find the probability distribution of the time, starting from now,
until the next earthquake.
Solution:
Let N(t) denote the number of earthquakes that occur in t weeks.
(a) P(N(2) ≥ 3) = 1 − P(N(2) ≤ 2) = 1 − e^{−4} − 4 e^{−4} − (4^2/2) e^{−4} = 1 − 13 e^{−4}.
(b) Let X denote the amount of time (in weeks) until the next earthquake. Because X will be greater than t if and only if no events occur within the next t units of time, we have
P(X > t) = P(N(t) = 0) = e^{−2t},  t ≥ 0.
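Both parts are easy to check numerically (a sketch; part (b) uses the fact that X > t exactly when N(t) = 0):

import math

lam = 2                                        # earthquakes per week

def poisson_pmf(k, mean):
    return math.exp(-mean) * mean**k / math.factorial(k)

# (a) N(2) ~ Poisson(4)
print(1 - sum(poisson_pmf(k, lam * 2) for k in range(3)))   # P(N(2) >= 3)
print(1 - 13 * math.exp(-4))                                # = 1 - 13 e^{-4}, same value

# (b) P(X > t) = P(N(t) = 0) = e^{-2t}, e.g. for t = 1 week
print(math.exp(-lam * 1))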
Suppose that we have a set of N balls, of which m are red and N − m are blue.
We choose n of these balls, without replacement, and define X to be the number
of red balls in our sample. Then
P(X = x) = \binom{m}{x} \binom{N − m}{n − x} / \binom{N}{n},
for x = 0, 1, . . . , n.
A random variable whose probability mass function is given as the above
equation for some values of n, N, m is said to be a hypergeometric random
variable, and is denoted by H(n, N, m). Here
E(X) = nm/N,  var(X) = (nm/N) [(n − 1)(m − 1)/(N − 1) + 1 − nm/N].
EXAMPLE 4.41 A purchaser of electrical components buys them in
lots of size 10. It is his policy to inspect 3 components randomly from
a lot and to accept the lot only if all 3 are nondefective. If 30 percent
of the lots have 4 defective components and 70 percent have only 1,
what proportion of lots does the purchaser reject?
Solution:
Let A denote the event that the purchaser accepts a lot. Then
P(A) = P(A | lot has 4 defectives) × 3/10 + P(A | lot has 1 defective) × 7/10
= (\binom{4}{0} \binom{6}{3} / \binom{10}{3}) × 3/10 + (\binom{1}{0} \binom{9}{3} / \binom{10}{3}) × 7/10
= 54/100.
Hence the purchaser rejects 1 − 54/100 = 46 percent of the lots.
In this section, we will prove the result that the expected value of a sum of
random variables is equal to the sum of their expectations.
For a random variable X, let X(s) denote the value of X when s ∈ S is the
outcome. Now, if X and Y are both random variables, then so is their sum.
That is, Z = X +Y is also a random variable. Moreover, Z(s) = X(s) +Y (s).
EXAMPLE 4.42 Suppose that an experiment consists of flipping a coin
5 times, with the outcome being the resulting sequence of heads and
tails. Let X be the number of heads in the first 3 flips and Y the num-
ber of heads in the final 2 flips. Let Z = X +Y . Then for the outcome
s = (h, t, h, t, h),
X(s) = 2, Y(s) = 1, Z(s) = X(s) + Y(s) = 3.
For the outcome s = (h, h, h, t, h),
X(s) = 3, Y(s) = 1, Z(s) = X(s) + Y(s) = 4.
Let p(s) = P({s}) be the probability that s is the outcome of the experiment.
PROPOSITION 4.43
E[X] = ∑_{s∈S} X(s) p(s).
Proof Suppose that the distinct values of X are x_i, i ≥ 1. For each i, let S_i be the event that X is equal to x_i. That is, S_i = {s : X(s) = x_i}. Then,
E[X] = ∑_i x_i P(X = x_i)
= ∑_i x_i P(S_i)
= ∑_i x_i ∑_{s∈S_i} p(s)
= ∑_i ∑_{s∈S_i} x_i p(s)
= ∑_i ∑_{s∈S_i} X(s) p(s)
= ∑_{s∈S} X(s) p(s).
Proof Let Z = ∑_{i=1}^{n} X_i. Proposition 4.43 gives
E[Z] = ∑_{s∈S} Z(s) p(s)
= ∑_{s∈S} (X_1(s) + X_2(s) + · · · + X_n(s)) p(s)
= ∑_{s∈S} X_1(s) p(s) + ∑_{s∈S} X_2(s) p(s) + · · · + ∑_{s∈S} X_n(s) p(s)
= E[X_1] + E[X_2] + · · · + E[X_n].
∎
REMARK:
Note that this result does not require that the Xi ’s be independent.
EXAMPLE 4.46 Find the expected total number of successes that re-
sult from n trials when trial i is a success with probability pi , i =
1, . . . , n.
Solution:
Let
X_i = 1, if trial i is a success,
X_i = 0, if trial i is a failure.
We have the representation
X = ∑_{i=1}^{n} X_i.
Consequently,
E[X] = ∑_{i=1}^{n} E[X_i] = ∑_{i=1}^{n} p_i.
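Linearity of expectation needs no independence, which is exactly what the indicator representation exploits. A sketch that checks E[X] = ∑ p_i by simulation (the probabilities p_i below are arbitrary, and the trials are taken independent purely for ease of simulation):

import random

p = [0.1, 0.5, 0.9, 0.25]                      # assumed success probabilities p_i

def total_successes():
    return sum(random.random() < pi for pi in p)

trials = 200_000
print(sum(total_successes() for _ in range(trials)) / trials, sum(p))   # both ≈ 1.75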
Solution:
Note that from the previous example, E(X) = np.
Let X_i be defined as in the previous example. Then
E[X^2] = E[(∑_{i=1}^{n} X_i)(∑_{j=1}^{n} X_j)]
= E[∑_{i=1}^{n} X_i^2 + ∑_{i=1}^{n} ∑_{j≠i} X_i X_j]
= ∑_{i=1}^{n} E[X_i^2] + ∑_{i=1}^{n} ∑_{j≠i} E[X_i X_j]
= ∑_i p_i + ∑_{i=1}^{n} ∑_{j≠i} E[X_i X_j].
Hence,
Let X be a discrete random variable. Recall the definitions for the distribution
function (d.f.) and the probability mass function (p.m.f.) of X:
Proof
(i) Note that for a < b, {X ≤ a} ⊂ {X ≤ b} and so the result follows from
Proposition 2.9 (page 19).
∎
Some useful calculations
Theoretically, all probability questions about X can be computed in terms of its distribution function (or probability mass function).
Proof
P(X < b) = P(lim_{n→∞} {X ≤ b − 1/n})
= lim_{n→∞} P(X ≤ b − 1/n)
= lim_{n→∞} F(b − 1/n).
Note that P{X < b} does not necessarily equal F(b), since F(b) also includes the probability that X equals b. ∎
P(A) = ∑_{x∈A} p_X(x).
F_X(x) = ∑_{y≤x} p_X(y),  x ∈ ℝ.
Compute
(a) P(X < 3),
(b) P(X = 1),
(c) P(X > 1/2),
(d) P(2 < X ≤ 4).
Solution:
(c) P(X > 1/2) = 1 − P(X ≤ 1/2) = 1 − F(1/2) = 3/4.
(d) P(2 < X ≤ 4) = F(4) − F(2) = 1/12.
RECOMMENDED READING Read Examples 3d, 4c, 6d, 6e, 6g, 8b, 8c, 8f, 8g,
8h, 8j in Chapter 4 of [Ross].