
Four

Random Variables
The most important questions of life are, for the most part,
really only problems of probability. Pierre Simon de Laplace

4.1 R ANDOM VARIABLES

In many situations when an experiment is performed, we are interested in


some function of the outcome rather than the outcome itself. Here are some
examples:
E XAMPLE 4.1 When we roll a pair of dice, let’s say we are not inter-
ested in the numbers that are obtained in each die but we are only
interested in the sum of the numbers.

E XAMPLE 4.2 There are 20 questions in a multiple choice paper. Each


question has 5 alternatives. A student answers all 20 questions by
randomly and independently choosing one alternative out of 5 in
each question. We are interested in X := number of correct answers.

D EFINITION 4.3 A random variable, X, is a mapping from the sample space


to real numbers.

E XAMPLE 4.4 Suppose we toss 3 fair coins. Let Y denote the number
of heads appearing, then Y takes values 0, 1, 2, 3.
And

P(Y = 0) = P((T, T, T)) = 1/8
P(Y = 1) = P((H, T, T), (T, H, T), (T, T, H)) = 3/8
P(Y = 2) = P((H, H, T), (H, T, H), (T, H, H)) = 3/8
P(Y = 3) = P((H, H, H)) = 1/8.


E XAMPLE 4.5 An urn contains 20 chips numbered from 1 to 20. Three


chips are chosen at random from this urn. Let X be the largest num-
ber among the three chips drawn. Then X takes values from 3, 4, . . . , 20.
And, for k = 3, 4, . . . , 20,
P(X = k) = \binom{k-1}{2} / \binom{20}{3}.

Suppose a game is that, you will win if the largest number obtained
is at least 17. What is the probability of winning?

P(win) = P(X = 17) + P(X = 18) + P(X = 19) + P(X = 20)
       = 0.105 + 0.119 + 0.134 + 0.150 ≈ 0.509.
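As a quick numerical check, the winning probability can be reproduced with a few lines of Python; this is a minimal sketch, and the function name prob_largest is our own.

```python
from math import comb

def prob_largest(k, N=20, n=3):
    # P(largest of n chips drawn from 1..N equals k): the other n-1 chips
    # must come from the k-1 smaller numbers.
    return comb(k - 1, n - 1) / comb(N, n)

p_win = sum(prob_largest(k) for k in range(17, 21))
print(round(p_win, 3))   # 0.509
```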

E XAMPLE 4.6 Suppose that there are N distinct types of coupons and
that each time one obtains a coupon, it is, independently of previ-
ous selections, equally likely to be any one of the N types. Let T be
the number of coupons that needs to be collected until one obtains a
complete set of at least one of each type. Compute P(T = n).

Solution:
Consider first the probability P(T > n). Once we know it, P(T = n)
will be given as

P(T = n) = P(T > n − 1) − P(T > n).

For a fixed n, define the events A_1, A_2, . . . , A_N as follows: A_j is the event


that no type j coupon is contained among the first n coupons col-
lected, j = 1, . . . , N. Hence,
P(T > n) = P\left( \bigcup_{j=1}^{N} A_j \right).

Note that \bigcup_{j=1}^{N} A_j is the event that at least one of the N types has not been
collected among the first n coupons, which is the same as {T > n}.


By the inclusion-exclusion principle, we have
P(T > n) = P\left( \bigcup_{j=1}^{N} A_j \right)
         = \sum_{j} P(A_j) − \sum_{j_1 < j_2} P(A_{j_1} A_{j_2}) + · · ·
           + (−1)^{k+1} \sum_{j_1 < j_2 < · · · < j_k} P(A_{j_1} A_{j_2} · · · A_{j_k}) + · · ·
           + (−1)^{N+1} P(A_1 A_2 · · · A_N).

A_j occurs if each of the n coupons collected is not of type j. Thus

P(A_j) = ((N − 1)/N)^n.

Likewise, the event A_{j_1} A_{j_2} occurs if none of the first n coupons collected is of
either type j_1 or type j_2. Thus, again using independence, we see that

P(A_{j_1} A_{j_2}) = ((N − 2)/N)^n.

The same reasoning gives

P(A_{j_1} A_{j_2} · · · A_{j_k}) = ((N − k)/N)^n.

Collating, we see that for n > 0,

P(T > n) = N ((N − 1)/N)^n − \binom{N}{2} ((N − 2)/N)^n + \binom{N}{3} ((N − 3)/N)^n − · · ·
           + (−1)^N \binom{N}{N−1} (1/N)^n
         = \sum_{i=1}^{N−1} (−1)^{i+1} \binom{N}{i} ((N − i)/N)^n.

The probability that T equals n can now be obtained from the preced-
ing formula by the use of

P(T = n) = P(T > n − 1) − P(T > n).
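A minimal numerical sketch of these formulas (the helper names are our own; N = 6 corresponds to rolling a die until every face has appeared):

```python
from math import comb

def prob_T_greater(n, N):
    # P(T > n) by inclusion-exclusion over the N coupon types.
    return sum((-1) ** (i + 1) * comb(N, i) * ((N - i) / N) ** n
               for i in range(1, N))

def prob_T_equal(n, N):
    # P(T = n) = P(T > n - 1) - P(T > n).
    return prob_T_greater(n - 1, N) - prob_T_greater(n, N)

N = 6
print(sum(prob_T_equal(n, N) for n in range(N, 300)))      # ≈ 1
print(sum(n * prob_T_equal(n, N) for n in range(N, 300)))  # ≈ 14.7 = 6(1 + 1/2 + ... + 1/6)
```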

R EMARK :
One must collect at least N coupons to obtain a complete set, so P(T > n) = 1
if n < N. Therefore we obtain the interesting combinatorial identity that, for
integers 1 ≤ n < N,

\sum_{i=1}^{N−1} (−1)^{i+1} \binom{N}{i} ((N − i)/N)^n = 1,

which can be written as

\sum_{i=0}^{N−1} (−1)^{i+1} \binom{N}{i} ((N − i)/N)^n = 0.

Another random variable of interest is the number of distinct types of coupons


that are contained in the first n selections. For more details, refer to [Ross]. 

4.2 D ISCRETE R ANDOM VARIABLES

D EFINITION 4.7 A random variable X is said to be discrete if the range of X is


either finite or countably infinite.

D EFINITION 4.8 Suppose that random variable X is discrete, taking values


x1 , x2 , . . ., then the probability mass function of X, denoted by pX (or simply
as p if the context is clear), is defined as
p_X(x) = \begin{cases} P(X = x) & \text{if } x = x_1, x_2, \ldots \\ 0 & \text{otherwise.} \end{cases}

Properties of the probability mass function

(i) pX (xi ) ≥ 0; for i = 1, 2, . . .;

(ii) pX (x) = 0; for other values of x;

(iii) Since X must take on one of the values x_i, \sum_{i=1}^{\infty} p_X(x_i) = 1.

E XAMPLE 4.9 Suppose a random variable X only takes values 0, 1, 2 . . ..


If the probability mass function of X (p.m.f. of X) is of the form:

p(k) = c \frac{\lambda^k}{k!}, \quad k = 0, 1, \ldots,
where λ > 0 is a fixed positive value and c is a suitably chosen con-
stant.
(a) What is this suitable constant?
(b) Compute P(X = 0) and
(c) P(X > 2).

Solution:

(a) Since

\sum_{k=0}^{\infty} p(k) = 1,

we have that

1 = \sum_{k=0}^{\infty} c \frac{\lambda^k}{k!} = c \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = c e^{\lambda},

and so c = e^{−λ}.
Note: c is known as the normalising constant.
(b) It follows that
P(X = 0) = p(0) = c \frac{\lambda^0}{0!} = c = e^{−λ}.
(c)
P(X > 2) = 1 − P(X ≤ 2)
         = 1 − P(X = 0) − P(X = 1) − P(X = 2)
         = 1 − e^{−λ} − λ e^{−λ} − \frac{λ^2}{2} e^{−λ}
         = 1 − e^{−λ} \left( 1 + λ + \frac{λ^2}{2} \right).
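A small numerical check of parts (a)–(c); this is a sketch only, and the value of λ below is an arbitrary choice.

```python
from math import exp, factorial

lam = 1.5                                   # arbitrary λ > 0
c = exp(-lam)                               # normalising constant from part (a)
p = lambda k: c * lam ** k / factorial(k)   # the p.m.f. of X

print(abs(sum(p(k) for k in range(200)) - 1) < 1e-12)   # the p.m.f. sums to 1
print(p(0), exp(-lam))                                   # part (b): both equal e^{-λ}
print(1 - p(0) - p(1) - p(2),
      1 - exp(-lam) * (1 + lam + lam ** 2 / 2))          # part (c): two expressions agree
```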

D EFINITION 4.10 The cumulative distribution function of X, abbreviated


to distribution function (d.f.) of X, (denoted as FX or F if context is clear) is
defined as
FX : R −→ R
where
FX (x) = P(X ≤ x) for x ∈ R.

R EMARK :
Suppose that X is discrete and takes values x_1, x_2, x_3, . . . where x_1 < x_2 < x_3 < · · · .
Then F is a step function: F is constant on the interval [x_{i−1}, x_i), where it takes the
value p(x_1) + · · · + p(x_{i−1}), and it has a jump of size p(x_i) at x_i.
For instance, if X has a probability mass function given by
p(1) = 1/4,  p(2) = 1/2,  p(3) = 1/8,  p(4) = 1/8,

then its cumulative distribution function is

F(a) = \begin{cases} 0 & a < 1 \\ 1/4 & 1 ≤ a < 2 \\ 3/4 & 2 ≤ a < 3 \\ 7/8 & 3 ≤ a < 4 \\ 1 & 4 ≤ a. \end{cases}

The graph of this function is a step function with jumps at 1, 2, 3 and 4 (figure omitted here).

E XAMPLE 4.11 Three fair coins example (Example 4.4) continued. The distribution
function of Y is

F_Y(x) = \begin{cases} 0 & x < 0 \\ 1/8 & 0 ≤ x < 1 \\ 1/2 & 1 ≤ x < 2 \\ 7/8 & 2 ≤ x < 3 \\ 1 & 3 ≤ x. \end{cases}

4.3 E XPECTED VALUE

D EFINITION 4.12 If X is a discrete random variable having a probability mass


function pX , the expectation or the expected value of X, denoted by E(X) or
µX , is defined by
E(X) = \sum_{x} x\, p_X(x).

Commonly used notation:

Upper-case letters U, V, X, Y, Z denote random variables (they are actually functions),
and lower-case letters u, v, . . . denote values of random variables (values of random
variables are just real numbers).

Interpretations of Expectation
See [Ross, pp. 125 – 126, 128]

(i) Weighted average of the possible values that X can take on; the weights are the
probabilities with which X assumes those values.

(ii) Frequency point of view (relative frequency, in fact).

(iii) Center of gravity.



E XAMPLE 4.13 Suppose X takes only two values 0 and 1 with

P(X = 0) = 1 − p and P(X = 1) = p.

We call this random variable a Bernoulli random variable with parameter p, and we
denote it by X ∼ Be(p).

E(X) = 0 × (1 − p) + 1 × p = p.

E XAMPLE 4.14 Let X denote the number obtained when a fair die is
rolled. Then, E(X) = 3.5.

Solution:
Here X takes values 1, 2, . . . , 6 each with probability 1/6. Hence
E(X) = 1 × \frac{1}{6} + 2 × \frac{1}{6} + · · · + 6 × \frac{1}{6} = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = \frac{7}{2}.

E XAMPLE 4.15 A newly married couple decides to continue to have


children until they have one of each sex. If the events of having a boy
and a girl are independent and equiprobable, how many children
should this couple expect?

Solution:
Let X be the number of children they should continue to have until
they have one of each sex. For i ≥ 2, X = i if and only if either
(i) all of their first i − 1 children are boys and the ith child is a girl,
or
(ii) all of their first i − 1 children are girls and the ith child is a boy.
So by independence,
P(X = i) = \left( \frac{1}{2} \right)^{i−1} \frac{1}{2} + \left( \frac{1}{2} \right)^{i−1} \frac{1}{2} = \left( \frac{1}{2} \right)^{i−1}, \quad i ≥ 2.

And

E(X) = \sum_{i=2}^{\infty} i \left( \frac{1}{2} \right)^{i−1} = −1 + \sum_{i=1}^{\infty} i \left( \frac{1}{2} \right)^{i−1} = −1 + \frac{1}{(1 − 1/2)^2} = 3.

Note that for |r| < 1, \sum_{i=1}^{\infty} i r^{i−1} = \frac{1}{(1 − r)^2}.
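The answer E(X) = 3 is easy to confirm by simulation; a minimal sketch (the sample size is arbitrary):

```python
import random

def children_until_both_sexes():
    # Simulate independent, equiprobable births until both a boy and a girl appear.
    seen = set()
    count = 0
    while len(seen) < 2:
        seen.add(random.choice("BG"))
        count += 1
    return count

trials = 100_000
print(sum(children_until_both_sexes() for _ in range(trials)) / trials)   # ≈ 3
```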

E XAMPLE 4.16 A contestant on a quiz show is presented with 2 ques-


tions, 1 and 2. He can answer them in any order. If he decides on the
ith question, and if his answer is correct, then he can go on to the
other question; if his answer to the initial question is wrong, then he
is not allowed to answer the other. For a right answer to question i, he
receives v_i. If the probability that he knows the answer to question
i is p_i, i = 1, 2, what is the strategy that maximises his expected
return? Assume answers to the two questions are independent.

Solution:

(a) Answer question 1 then question 2:

He will win

0
 with probability 1 − p1
v1 with probability p1 (1 − p2 ) .

v1 + v2 with probability p1 p2

Hence his expected winnings is

v1 p1 (1 − p2 ) + (v1 + v2 )p1 p2 .

(b) Answer question 2 then question 1:


He will win

0
 with probability 1 − p2
v2 with probability (1 − p1 )p2 .

v1 + v2 with probability p1 p2

Hence his expected winnings is

v2 p2 (1 − p1 ) + (v1 + v2 )p1 p2 .

Therefore it is better to answer question 1 first if

v_1 p_1 (1 − p_2) ≥ v_2 p_2 (1 − p_1), \quad \text{that is,} \quad \frac{v_1 p_1}{1 − p_1} ≥ \frac{v_2 p_2}{1 − p_2}.
For example, if he is 60 percent certain of answering question 1, worth
$200, correctly and he is 80 percent certain of answering question 2,
worth $100, correctly, then he should attempt to answer question 2
first because
\frac{100 × 0.8}{0.2} = 400 > \frac{200 × 0.6}{0.4} = 300.
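The comparison of the two orderings can also be checked directly; a sketch with the numbers from this example (the function name is our own):

```python
def expected_winnings(first_v, first_p, second_v, second_p):
    # Answer the `first` question; only a correct answer allows attempting the `second`.
    return first_v * first_p * (1 - second_p) + (first_v + second_v) * first_p * second_p

v1, p1 = 200, 0.6   # question 1: $200, 60% certain
v2, p2 = 100, 0.8   # question 2: $100, 80% certain
print(expected_winnings(v1, p1, v2, p2))   # ≈ 168: question 1 first
print(expected_winnings(v2, p2, v1, p1))   # ≈ 176: question 2 first is better here
```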

4.4 E XPECTATION OF A F UNCTION OF A R ANDOM VARIABLE

Given X, we are often interested about g(X) and E[g(X)]. How do we com-
pute E[g(X)]? One way is to find the probability mass function of g(X) first
and proceed to compute E[g(X)] by definition.

E XAMPLE 4.17 Let X be the side length (in m) of a square region which we want
to paint. Say X is a random variable with a certain distribution. We
are interested in cX^2, where c is the cost of painting per m^2. What
is the expected cost of painting?

E XAMPLE 4.18 Let X be a random variable that takes values −1, 0


and 1 with probabilities:

P(X = −1) = 0.2, P(X = 0) = 0.5, P(X = 1) = 0.3.

We are interested in computing E(X^2).

Solution:
Let Y = X^2; it follows that Y takes values 0 and 1 with probabilities

P(Y = 0) = P(X = 0) = 0.5


P(Y = 1) = P(X = −1 or X = 1)
= P(X = −1) + P(X = 1) = 0.5

Hence
E(X 2 ) = E(Y ) = 0 × 0.5 + 1 × 0.5 = 0.5.

The procedure of introducing a new random variable Y = g(X) and finding its
probability mass function is clumsy. Fortunately, we have the following proposition.

P ROPOSITION 4.19 If X is a discrete random variable that takes values xi , i ≥


1, with respective probabilities p_X(x_i), then for any real-valued function g,

E[g(X)] = \sum_{i} g(x_i) p_X(x_i), \quad \text{or equivalently} \quad E[g(X)] = \sum_{x} g(x) p_X(x).

Proof Group together all the terms in ∑i g(xi )p(xi ) having the same value
of g(xi ). Suppose y j , j ≥ 1, represent the different values of g(xi ), i ≥ 1. Then,

grouping all the g(x_i) having the same value gives

\sum_{i} g(x_i) p(x_i) = \sum_{j} \sum_{i: g(x_i) = y_j} g(x_i) p(x_i)
                       = \sum_{j} \sum_{i: g(x_i) = y_j} y_j p(x_i)
                       = \sum_{j} y_j \sum_{i: g(x_i) = y_j} p(x_i)
                       = \sum_{j} y_j P(g(X) = y_j)
                       = E[g(X)].

E XAMPLE 4.20 Let’s apply Proposition 4.19 to Example 4.18.

E(X 2 ) = (−1)2 (0.2) + 02 (0.5) + 12 (0.3) = 0.5

which is in agreement with the result previously computed.
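A generic two-line helper makes this computation routine; a sketch (the function name is our own):

```python
def expectation_of_g(pmf, g):
    # E[g(X)] = sum over x of g(x) * p_X(x), for a pmf given as {x: p_X(x)}.
    return sum(g(x) * p for x, p in pmf.items())

pmf = {-1: 0.2, 0: 0.5, 1: 0.3}           # the distribution of Example 4.18
print(expectation_of_g(pmf, lambda x: x))        # E(X)   = 0.1
print(expectation_of_g(pmf, lambda x: x ** 2))   # E(X^2) = 0.5
```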

C OROLLARY 4.21 Let a and b be constants, then

E[aX + b] = aE(X) + b.

Proof Applying g(x) = ax + b in Proposition 4.19, we have

E[aX + b] = \sum_{x} [ax + b] p(x)
          = \sum_{x} [ax\, p(x) + b\, p(x)]
          = a \sum_{x} x\, p(x) + b \sum_{x} p(x)
          = aE(X) + b.

E XAMPLE 4.22 Take g(x) = x2 . Then,

E(X^2) = \sum_{x} x^2 p_X(x)

is called the second moment of X.

In general, for k ≥ 1, E(X^k) is called the k-th moment of X.


E XAMPLE 4.23 Let µ = E(X), and take g(x) = (x − µ)^k. Then

E[(X − µ)^k]

is called the k-th central moment of X.

R EMARK :
(a) The expected value of a random variable X, E(X) is also referred to as the
first moment or the mean of X.

(b) The first central moment is 0.

(c) The second central moment, namely E[(X − µ)^2], is called the variance of X. 

E XAMPLE 4.24 For an event A ⊂ S, let I_A be the indicator of A, that is,

I_A(s) = \begin{cases} 1 & \text{if } s ∈ A \\ 0 & \text{if } s ∈ S \setminus A. \end{cases}

Note that IA is a Bernoulli random variable with success probability


p = P(A). It then follows that

E(IA ) = P(A).

Theoretically, one can view calculating probability as calculating ex-


pectation. We will revisit this idea in Chapter 7.

E XAMPLE 4.25 A product, sold seasonally, yields a net profit of b


dollars for each unit sold and a net loss of l dollars for each unit
left unsold when the season ends. The number of this product sold
in Peter’s store is a random variable, denoted by X with probability
mass function p(i), i ≥ 0. Furthermore, this product has to be pre-
ordered. How much should Peter pre-order in order to maximize his
profit?

Solution:
Let s be the number of units Peter orders. Let Y be the profit (note that it is a
function of s); then

Y = \begin{cases} bX − l(s − X) & \text{if } X ≤ s \\ sb & \text{if } X > s. \end{cases}

Here

φ(s) := E(Y) = \sum_{i=0}^{s} [bi − l(s − i)] p(i) + \sum_{i=s+1}^{\infty} sb\, p(i)
             = \sum_{i=0}^{s} [(b + l)i − ls] p(i) + sb \left[ 1 − \sum_{i=0}^{s} p(i) \right]
             = (b + l) \sum_{i=0}^{s} i\, p(i) − (b + l)s \sum_{i=0}^{s} p(i) + sb
             = sb − (b + l) \sum_{i=0}^{s} (s − i) p(i).

Locating the optimal order:


Consider φ(s + 1) − φ(s):

φ(s + 1) − φ(s) = \left[ (s + 1)b − (b + l) \sum_{i=0}^{s+1} (s + 1 − i) p(i) \right]
                  − \left[ sb − (b + l) \sum_{i=0}^{s} (s − i) p(i) \right]
                = b − (b + l) \sum_{i=0}^{s} p(i).

Since \sum_{i=0}^{s} p(i) is nondecreasing in s, this difference decreases as s grows, so φ
first increases and then decreases. Let s^* be the change point, that is,

φ(0) < φ(1) < · · · < φ(s^* − 1) < φ(s^*) > φ(s^* + 1) > · · · .

So the optimal order quantity is s^*.

P ROPOSITION 4.26 (TAIL S UM F ORMULA FOR E XPECTATION )


For nonnegative integer-valued random variable X (that is, X takes values
0, 1, 2, . . .),
E(X) = \sum_{k=1}^{\infty} P(X ≥ k) = \sum_{k=0}^{\infty} P(X > k).

Proof Consider the following triangularization:



\sum_{k=1}^{\infty} P(X ≥ k) = P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) + · · ·
                              + P(X = 2) + P(X = 3) + P(X = 4) + · · ·
                              + P(X = 3) + P(X = 4) + · · ·
                              + P(X = 4) + · · ·
                              + · · ·
                             = P(X = 1) + 2P(X = 2) + 3P(X = 3) + 4P(X = 4) + · · ·
                             = E(X).
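The tail-sum formula is easy to verify numerically, for example for a Geom(p) random variable; a sketch (the truncation point is arbitrary but makes the neglected tail negligible):

```python
p, K = 0.3, 200
pmf = [0.0] + [p * (1 - p) ** (k - 1) for k in range(1, K)]   # Geom(p) on {1, ..., K-1}

mean_direct = sum(k * pmf[k] for k in range(1, K))
tail = lambda k: sum(pmf[j] for j in range(k, K))             # truncated P(X >= k)
mean_tail_sum = sum(tail(k) for k in range(1, K))
print(mean_direct, mean_tail_sum, 1 / p)                      # all ≈ 3.333
```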

4.5 VARIANCE AND S TANDARD D EVIATION

D EFINITION 4.27 If X is a random variable with mean µ, then the variance of


X, denoted by var(X), is defined by

var(X) = E[(X − µ)^2].

It is a measure of scattering (or spread) of the values of X.

D EFINITION 4.28 The standard deviation of X, denoted by σX or SD(X), is


defined as σ_X = \sqrt{\mathrm{var}(X)}.

An alternative formula for variance:

var(X) = E(X 2 ) − [E(X)]2 .

Proof (For the discrete case only.) Take g(x) = (x − µ)^2:

E[(X − µ)^2] = \sum_{x} (x − µ)^2 p_X(x)
             = \sum_{x} (x^2 − 2µx + µ^2) p_X(x)
             = \sum_{x} x^2 p_X(x) − 2µ \sum_{x} x\, p_X(x) + µ^2 \sum_{x} p_X(x)
             = E(X^2) − 2µ^2 + µ^2 = E(X^2) − µ^2.

z

R EMARK :
(1) Note that var(X) ≥ 0. (Why?)

(2) var(X) = 0 if and only if X is a degenerate random variable (that is, the
random variable taking only one value, its mean).

(3) It follows from the formula that

E(X 2 ) ≥ [E(X)]2 ≥ 0.

E XAMPLE 4.29 Calculate var(X) if X represents the outcome when a


fair die is rolled.

Solution:
It was shown that E[X] = 7/2. Now

E[X^2] = 1^2 × \frac{1}{6} + 2^2 × \frac{1}{6} + 3^2 × \frac{1}{6} + 4^2 × \frac{1}{6} + 5^2 × \frac{1}{6} + 6^2 × \frac{1}{6} = \frac{91}{6}.

So

var(X) = \frac{91}{6} − \left( \frac{7}{2} \right)^2 = \frac{35}{12}.

Scaling and shifting property of variance and standard deviation:

(i) var(aX + b) = a2 var(X).

(ii) SD(aX + b) = |a| SD(X).

Proof

(i)

var(aX + b) = E[(aX + b − E(aX + b))2 ]


= E[(aX + b − aµ − b)2 ]
= E[a2 (X − µ)2 ]
= a2 E[(X − µ)2 ]
= a2 var(X).

(ii) Follows immediately from (i), noting that \sqrt{a^2} = |a|.
z

4.6 D ISCRETE R ANDOM VARIABLES ARISING FROM R EPEATED T RIALS

We study a mathematical model for repeated trials:

(1) In each trial, a particular event either occurs or does not. Occurrence of this
event is called a success, and non-occurrence a failure.
Write p := P(success), and q := 1 − p = P(failure).
Examples:

nature of trial          | meaning of success | meaning of failure | probabilities p and q
-------------------------|--------------------|--------------------|----------------------
Flip a fair coin         | head               | tail               | 0.5 and 0.5
Roll a fair die          | six                | non-six            | 1/6 and 5/6
Roll a pair of fair dice | double six         | not double six     | 1/36 and 35/36
Birth of a child         | girl               | boy                | 0.487 and 0.513
Pick an outcome          | in A               | not in A           | P(A) and 1 − P(A)

(2) Each trial has success probability p and failure probability q = 1 − p;

(3) We repeat the trials independently.

Such trials are called Bernoulli(p) trials. We now introduce some random
variables related to Bernoulli trials:

(a) Bernoulli random variable, denoted by Be(p);


We only perform the experiment once, and define
X = \begin{cases} 1 & \text{if it is a success} \\ 0 & \text{if it is a failure.} \end{cases}

Here
P(X = 1) = p, P(X = 0) = 1 − p
and
E(X) = p, var(X) = p(1 − p).

(b) Binomial random variable, denoted by Bin(n, p);


We perform the experiment (under identical conditions and indepen-
dently) n times and define

X = number of successes in n Bernoulli(p) trials.

Therefore, X takes values 0, 1, 2, . . . , n. In fact, for 0 ≤ k ≤ n,

P(X = k) = \binom{n}{k} p^k q^{n−k}.

Here
E(X) = np, var(X) = np(1 − p).

Proof

E(X) = \sum_{k=0}^{n} k \binom{n}{k} p^k q^{n−k} = \sum_{k=1}^{n} k \frac{n!}{k!(n−k)!} p^k q^{n−k}
     = np \sum_{k=1}^{n} \frac{(n−1)!}{(k−1)! [(n−1)−(k−1)]!} p^{k−1} q^{(n−1)−(k−1)}
     = np \sum_{j=0}^{n−1} \binom{n−1}{j} p^j q^{(n−1)−j}, \quad \text{where } j = k − 1,
     = np.

We will make use of the fact that

var(X) = E(X 2 ) − (EX)2 = E(X(X − 1)) + E(X) − (EX)2 .

Now

E(X(X − 1)) = \sum_{k=0}^{n} k(k − 1) \binom{n}{k} p^k q^{n−k} = \sum_{k=2}^{n} k(k − 1) \frac{n!}{k!(n−k)!} p^k q^{n−k}
            = · · ·
            = n(n − 1) p^2.

So
var(X) = n(n − 1)p2 + np − (np)2 = np(1 − p).

Computing the Binomial Distribution Function

Suppose that X is binomial with parameters (n, p). The key to computing
its distribution function
P(X ≤ i) = \sum_{k=0}^{i} \binom{n}{k} p^k (1 − p)^{n−k}, \quad i = 0, 1, \ldots, n,

is to utilize the following relationship between P(X = k + 1) and P(X = k):

P(X = k + 1) = \frac{p}{1 − p} \cdot \frac{n − k}{k + 1} P(X = k).

Proof

\frac{P(X = k + 1)}{P(X = k)} = \frac{\frac{n!}{(n−k−1)!(k+1)!} p^{k+1} (1 − p)^{n−k−1}}{\frac{n!}{(n−k)!\,k!} p^{k} (1 − p)^{n−k}} = \frac{p}{1 − p} \cdot \frac{n − k}{k + 1}.
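A direct implementation of this recursion, which avoids computing large binomial coefficients explicitly; a sketch (the function name is our own):

```python
def binomial_cdf(i, n, p):
    # P(X <= i) for X ~ Bin(n, p), accumulated with
    # P(X = k+1) = p/(1-p) * (n-k)/(k+1) * P(X = k), starting from P(X = 0).
    prob_k = (1 - p) ** n
    total = prob_k
    for k in range(i):
        prob_k *= p / (1 - p) * (n - k) / (k + 1)
        total += prob_k
    return total

print(binomial_cdf(1, 10, 0.1))   # ≈ 0.7361, the exact value used in Example 4.36 below
```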

(c) Geometric random variable, denoted by Geom(p);


Define the random variable

X = number of Bernoulli(p) trials required


to obtain the first success.

(Note, the trial leading to the first success is included.) Here, X takes
values 1, 2, 3, . . . and so on. In fact, for k ≥ 1,

P(X = k) = pqk−1 .

And
E(X) = \frac{1}{p}, \quad var(X) = \frac{1 − p}{p^2}.

Another version of the Geometric distribution:

X' = number of failures in the Bernoulli(p) trials before the first success.

Here X = X' + 1. Hence, X' takes values 0, 1, 2, . . . and

P(X' = k) = p q^k, \quad k = 0, 1, \ldots .

And

E(X') = \frac{1 − p}{p}, \quad var(X') = \frac{1 − p}{p^2}.
(d) Negative Binomial random variable, denoted by NB(r, p);
Define the random variable

X = number of Bernoulli(p) trials required to obtain r successes.

Here, X takes values r, r + 1, . . . and so on. In fact, for k ≥ r,

P(X = k) = \binom{k − 1}{r − 1} p^r q^{k−r}.

And

E(X) = \frac{r}{p}, \quad var(X) = \frac{r(1 − p)}{p^2}.

R EMARK :
Take note that Geom(p) = NB(1, p). 

E XAMPLE 4.30 A gambler makes a sequence of 1-dollar bets, betting


each time on black at roulette at Las Vegas. Here a success is winning
1 dollar and a failure is losing 1 dollar. Since in American roulette
the gambler wins if the ball stops on one of 18 out of 38 positions and
loses otherwise, the probability of winning is p = 18/38 = 0.474.

E XAMPLE 4.31 A communication system consists of n components,


each of which will, independently, function with probability p. The
total system will be able to operate effectively if at least one-half of
its components function.
(a) For what values of p is a 5-component system more likely to
operate effectively than a 3-component system?
(b) In general, when is a (2k + 1)-component system better than a
(2k − 1)-component system?

Solution:

(a) The number of functioning components X is a binomial ran-


dom variable with parameters (n, p). The probability that a 5-
component system will be effective is

P(at least 3 components in a 5-component system function)
  = \binom{5}{3} p^3 (1 − p)^2 + \binom{5}{4} p^4 (1 − p) + p^5,

whereas the corresponding probability for a 3-component system is

P(at least 2 components in a 3-component system function)
  = \binom{3}{2} p^2 (1 − p) + p^3.

So a 5-component system is more likely to operate effectively than a 3-component
system when

10 p^3 (1 − p)^2 + 5 p^4 (1 − p) + p^5 > 3 p^2 (1 − p) + p^3,

which (after dividing by p^2) reduces to

3 (p − 1)^2 (2p − 1) > 0 \iff p > \frac{1}{2}.

So p needs to be greater than 1/2.
(b) Consider a system of 2k + 1 components and let X denote the
number of the first 2k − 1 components that function.

A (2k + 1)-component system will be effective if either


(i) X ≥ k + 1;
(ii) X = k and at least one of the remaining 2 components func-
tion; or
(iii) X = k − 1 and both of the next 2 components function.

So the probability that a (2k + 1)-component system will be ef-


fective is given by

P2k+1 (effective)
= P(X ≥ k + 1) + P(X = k)(1 − (1 − p)2 ) + P(X = k − 1)p2 .

Now the probability that a (2k − 1)-component system will be


effective is given by

P2k−1 (effective) = P(X ≥ k)


= P(X = k) + P(X ≥ k + 1).

Thus we need

P_{2k+1}(effective) − P_{2k−1}(effective)
  = P(X = k − 1) p^2 − (1 − p)^2 P(X = k)
  = \binom{2k − 1}{k − 1} p^{k−1} (1 − p)^{k} p^2 − (1 − p)^2 \binom{2k − 1}{k} p^{k} (1 − p)^{k−1}
  = \binom{2k − 1}{k − 1} p^{k} (1 − p)^{k} [p − (1 − p)] > 0 \iff p > \frac{1}{2}.

The last equality follows because \binom{2k−1}{k−1} = \binom{2k−1}{k}.

So a (2k + 1)-component system will be better than a (2k − 1)-component system
if (and only if) p > 1/2.
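The conclusion can be confirmed numerically by computing the two effectiveness probabilities directly; a sketch (the helper name is our own, and n is assumed odd as in the example):

```python
from math import comb

def effective(n, p):
    # P(at least half of the n components function), with X ~ Bin(n, p); for the
    # odd n used here "at least one-half" means at least (n + 1)/2 components.
    k_min = n // 2 + 1
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

for p in (0.3, 0.5, 0.7):
    print(p, effective(5, p) > effective(3, p))   # False, False (they are equal), True
```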

E XAMPLE 4.32 In a small town, out of 12 accidents that occurred in


1986, four happened on Friday the 13th. Is this a good reason for a
superstitious person to argue that Friday the 13th is inauspicious?

Solution:
Suppose the probability that each accident occurs on Friday the 13th
is 1/30, just as on any other day. Then the probability of at least four
accidents on Friday the 13th is

1 − \sum_{i=0}^{3} \binom{12}{i} \left( \frac{1}{30} \right)^{i} \left( \frac{29}{30} \right)^{12−i} ≈ 0.000493.

Since this probability is small, this is a good reason for a superstitious


person to argue that Friday the 13th is inauspicious.

E XAMPLE 4.33 The geometric distribution plays an important role


in the theory of queues, or waiting lines. For example, suppose a
line of customers waits for service at a counter. It is often assumed
that, in each small time unit, either 0 or 1 new customers arrive at
the counter. The probability that a customer arrives is p and that no
customer arrives is q = 1 − p. Then the time T until the next arrival
has a geometric distribution. It is natural to ask for the probability
that no customer arrives in the next k time units, that is, for P(T > k).

Solution:
This is given by

P(T > k) = \sum_{j=k+1}^{\infty} q^{j−1} p = q^k (p + qp + q^2 p + · · ·) = q^k.

This probability can also be found by noting that we are asking for
no successes (i.e., arrivals) in a sequence of k consecutive time units,
where the probability of a success in any one time unit is p. Thus, the
probability is just qk .

E XAMPLE 4.34 (B ANACH MATCH PROBLEM ) At all times, a pipe smok-


ing mathematician carries 2 matchboxes – 1 in his left-hand pocket
and 1 in his right-hand pocket. Each time he needs a match, he is
equally likely to take it from either pocket. Consider the moment
when the mathematician first discovers that one of his matchboxes
is empty. If it is assumed that both matchboxes initially contained
N matches, what is the probability that there are exactly i matches,
i = 0, 1, 2, . . . , N, in the other box?

Solution:
Let E denote the event that the mathematician first discovers that
the right-hand matchbox is empty and that there are i matches in the
left-hand box at the time. Now, this event will occur if and only if the
(N + 1)th choice of the right-hand matchbox is made at the (N + 1 +
N − i)th trial. Hence, using the Negative Binomial formulation, with
p = 1/2, r = N + 1 and k = 2N − i + 1, we see that

P(E) = \binom{2N − i}{N} \left( \frac{1}{2} \right)^{2N − i + 1}.
Since there is an equal probability that it is the left-box that is first
discovered to be empty and there are i matches in the right-hand box
at that time, the desired result is
2 P(E) = \binom{2N − i}{N} \left( \frac{1}{2} \right)^{2N − i}.
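A quick sanity check that these probabilities form a distribution over i = 0, 1, . . . , N; a sketch (the value of N is arbitrary):

```python
from math import comb

def banach(i, N):
    # P(exactly i matches remain in the other box when a box is first found empty).
    return comb(2 * N - i, N) * 0.5 ** (2 * N - i)

N = 50
probs = [banach(i, N) for i in range(N + 1)]
print(sum(probs))                                   # ≈ 1.0
print(max(range(N + 1), key=lambda i: probs[i]))    # the most likely number of matches left
```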

4.7 P OISSON R ANDOM VARIABLE

A random variable X is said to have a Poisson distribution with parameter λ


if X takes values 0, 1, 2, . . . with probabilities given as:

P(X = k) = \frac{e^{−λ} λ^k}{k!}, \quad k = 0, 1, 2, \ldots . \qquad (4.1)

This defines a probability mass function, since

\sum_{k=0}^{\infty} P(X = k) = e^{−λ} \sum_{k=0}^{\infty} \frac{λ^k}{k!} = e^{−λ} e^{λ} = 1.

Notation: X ∼ Poisson(λ ).
The Poisson random variable has a tremendous range of application in di-
verse areas because it can be used as an approximation for a binomial ran-
dom variable with parameters (n, p) when n is large and p is small enough
so that np is of moderate size. To see this, suppose X is a binomial random
variable with parameters (n, p) and let λ = np. Then

P(X = k) = \frac{n!}{(n − k)!\, k!} p^k (1 − p)^{n−k}
         = \frac{n!}{(n − k)!\, k!} \left( \frac{λ}{n} \right)^{k} \left( 1 − \frac{λ}{n} \right)^{n−k}
         = \frac{n(n − 1) \cdots (n − k + 1)}{n^k} \cdot \frac{λ^k}{k!} \cdot \frac{(1 − λ/n)^n}{(1 − λ/n)^k}.

Note that for n large and λ moderate,

\left( 1 − \frac{λ}{n} \right)^{n} ≈ e^{−λ}, \quad \frac{n(n − 1) \cdots (n − k + 1)}{n^k} ≈ 1, \quad \left( 1 − \frac{λ}{n} \right)^{k} ≈ 1.

Hence for n large and λ moderate,

P(X = k) ≈ e^{−λ} \frac{λ^k}{k!}.
R EMARK :
In other words, if n independent trials, each of which results in a success
with probability p, are performed, then when n is large and p is small enough
to make np moderate, the number of successes occurring is approximately
a Poisson random variable with parameter λ = np. 
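A short comparison of the exact Bin(n, p) probabilities with the Poisson(np) approximation; a sketch (the values of n and p are arbitrary choices satisfying "n large, p small"):

```python
from math import comb, exp, factorial

n, p = 100, 0.03
lam = n * p

binom = lambda k: comb(n, k) * p ** k * (1 - p) ** (n - k)
poiss = lambda k: exp(-lam) * lam ** k / factorial(k)

for k in range(6):
    print(k, round(binom(k), 4), round(poiss(k), 4))   # the two columns are close
```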

Some examples of random variables that obey the Poisson probability law,
that is, Equation (4.1) are:

(i) number of misprints on a page;

(ii) number of people in a community living to 100 years;

(iii) number of wrong telephone numbers that are dialed in a day;

(iv) number of people entering a store on a given day;

(v) number of particles emitted by a radioactive source;

(vi) number of car accidents in a day;

(vii) number of people having a rare kind of disease.

Each of the preceding, and numerous other random variables, are approxi-
mately Poisson for the same reason – because of the Poisson approximation
to the binomial.
E XAMPLE 4.35 Suppose that the number of typographical errors on
a page of a book has a Poisson distribution with parameter λ = 1/2.
Calculate the probability that there is at least one error on a page.

Solution:
Let X denote the number of errors on the page, we have

P(X ≥ 1) = 1 − P(X = 0) = 1 − e−1/2 = 0.393.



E XAMPLE 4.36 Suppose that the probability that an item produced


by a certain machine will be defective is 0.1. Find the probability that
a sample of 10 items will contain at most 1 defective item.

Solution:
The desired probability is

\binom{10}{0} (0.1)^0 (0.9)^{10} + \binom{10}{1} (0.1)^1 (0.9)^9 ≈ 0.7361,

whereas the Poisson approximation (λ = np = 10 × 0.1 = 1) yields the value

e^{−1} + e^{−1} ≈ 0.7358.

P ROPOSITION 4.37 If X ∼ Poisson(λ ),

E(X) = λ , var(X) = λ .

Proof We have

E(X) = \sum_{k=0}^{\infty} k \frac{λ^k}{k!} e^{−λ} = e^{−λ} \sum_{k=1}^{\infty} k \frac{λ^k}{k!}
     = λ e^{−λ} \sum_{k=1}^{\infty} \frac{λ^{k−1}}{(k − 1)!} = λ e^{−λ} \sum_{j=0}^{\infty} \frac{λ^j}{j!} = λ

and

E(X(X − 1)) = \sum_{k=0}^{\infty} k(k − 1) \frac{λ^k}{k!} e^{−λ}
            = λ^2 e^{−λ} \sum_{k=2}^{\infty} \frac{λ^{k−2}}{(k − 2)!} = λ^2 e^{−λ} \sum_{j=0}^{\infty} \frac{λ^j}{j!} = λ^2.

Note that

var(X) = E(X 2 ) − (E(X))2 = E(X(X − 1)) + E(X) − (E(X))2 ,

and so we have var(X) = λ . z



R EMARK :
The Poisson distribution with parameter np is a very good approximation
to the distribution of the number of successes in n independent trials when
each trial has probability p of being a success, provided that n is large and p
small. In fact, it remains a good approximation even when the trials are not
independent, provided that their dependence is weak. 

E XAMPLE 4.38 Consider again Example 2.22 of Chapter 2 (page 27)


(The matching problem).
Define the events Ei , i = 1, . . . , n, by

Ei = {person i selects his own hat}.

It is easy to see that

X = number of matches = I_{E_1} + · · · + I_{E_n}

and that

P(E_i) = \frac{1}{n} \quad \text{and} \quad P(E_i | E_j) = \frac{1}{n − 1}, \quad j ≠ i,
showing that though the events Ei , i = 1, . . . , n are not independent,
their dependence, for large n, is weak. So it is reasonable to expect
that the number of matches X will approximately have a Poisson dis-
tribution with parameter n × 1/n = 1, and thus

P(X = 0) ≈ e−1 = 0.37

which agrees with what was shown in Chapter 2.


For a second illustration of the strength of the Poisson approximation when
the trials are weakly dependent, let us consider again the birthday problem
presented in Example 2.19 of Chapter 2 (page 25).
E XAMPLE 4.39 Suppose that each of n people is equally likely to
have any of the 365 days of the year as his or her birthday, the prob-
lem is to determine the probability that a set of n independent people
all have different birthdays. This probability was shown to be less
than 1/2 when n = 23.
We can approximate the preceding probability by using the Poisson
approximation as follows:
Imagine that we have a trial for each of the \binom{n}{2} pairs of individuals i
and j, i ≠ j, and say that trial (i, j) is a success if persons i and j have
the same birthday.
Let Ei j denote the event that trial i, j is a success. Then the events
Ei j , 1 ≤ i < j ≤ n, are not independent and their dependence is weak.

Now P(E_{ij}) = 1/365, so it is reasonable to suppose that the number
of successes should approximately have a Poisson distribution with
mean λ = \binom{n}{2} / 365 = n(n − 1)/730. Therefore,

P(no 2 people have the same birthday) = P(0 successes) ≈ e^{−λ} = \exp\left( \frac{−n(n − 1)}{730} \right).

We want the smallest integer n so that

\exp\left( \frac{−n(n − 1)}{730} \right) ≤ \frac{1}{2}.

This can be solved to yield n = 23, in agreement with the result in


Chapter 2.
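The threshold n = 23 can be recovered numerically and compared with the exact probability from Chapter 2; a sketch:

```python
from math import exp

def approx_no_match(n):
    # Poisson approximation: exp(-n(n-1)/730).
    return exp(-n * (n - 1) / 730)

def exact_no_match(n):
    # Exact probability that n people all have different birthdays.
    prob = 1.0
    for i in range(n):
        prob *= (365 - i) / 365
    return prob

n = next(n for n in range(2, 100) if approx_no_match(n) <= 0.5)
print(n, approx_no_match(n), exact_no_match(n))   # 23, with both values just below 1/2
```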

R EMARK :
For the number of events to occur to approximately have a Poisson distribu-
tion, it is not essential that all the events have the same probability of occur-
rence, but only that all of these probabilities be small. 

Poisson Paradigm
Consider n events, with p_i equal to the probability that event i occurs, i = 1,
. . ., n. If all the p_i are "small" and the trials are either independent or at most
"weakly dependent", then the number of these events that occur approximately
has a Poisson distribution with mean \sum_{i=1}^{n} p_i.

Another use of the Poisson distribution arises in situations where events oc-
cur at certain points in time. One example is to designate the occurrence of
an earthquake as an event; another possibility would be for events to corre-
spond to people entering a particular establishment; and a third possibility
is for an event to occur whenever a war starts. Let us suppose that events are
indeed occurring at certain random points of time, and let us assume that,
for some positive constant λ , the following assumptions hold true:
1. The probability that exactly 1 event occurs in a given interval of length
h is equal to λh + o(h), where o(h) stands for any function f(h) for which
lim_{h→0} f(h)/h = 0. (For instance, f(h) = h^2 is o(h), whereas f(h) = h is not.)

2. The probability that 2 or more events occur in an interval of length h is


equal to o(h).

3. For any integers n, j_1, j_2, . . . , j_n and any set of n non-overlapping intervals,
if we define E_i to be the event that exactly j_i of the events under
consideration occur in the ith of these intervals, then the events E_1, E_2, . . . , E_n
are independent.

It can be shown that under the assumptions mentioned above, the number of
events occurring in any interval of length t is a Poisson random variable with
parameter λt, and we say that the events occur in accordance with a Poisson
process with rate λ .
E XAMPLE 4.40 Suppose that earthquakes occur in the western por-
tion of the United States in accordance with assumptions 1, 2, and
3, with λ = 2 and with 1 week as the unit of time. (That is, earth-
quakes occur in accordance with the three assumptions at a rate of 2
per week.)

(a) Find the probability that at least 3 earthquakes occur during the
next 2 weeks.
(b) Find the probability distribution of the time, starting from now,
until the next earthquake.

Solution:
Let N(t) denote the number of earthquakes that occur in t weeks.

(a) Then N(2) ∼ Poisson(4) and

P(N(2) ≥ 3) = 1 − P(N(2) ≤ 2) = 1 − e^{−4} − 4e^{−4} − \frac{4^2}{2} e^{−4} = 1 − 13 e^{−4}.

(b) Let X denote the amount of time (in weeks) until the next earthquake.
Because X will be greater than t if and only if no events occur within
the next t units of time, we have

P(X > t) = P(N(t) = 0) = e−λt

so the required probability distribution function F is given by

F(t) = P(X ≤ t) = 1 − e−λt = 1 − e−2t .



Computing the Poisson Distribution Function


If X is Poisson with parameter λ , then
\frac{P(X = i + 1)}{P(X = i)} = \frac{e^{−λ} λ^{i+1} / (i + 1)!}{e^{−λ} λ^{i} / i!} = \frac{λ}{i + 1},
which means that
P(X = i + 1) = \frac{λ}{i + 1} P(X = i).
This gives a nice recursive way of computing the Poisson probabilities:
P(X = 0) = e^{−λ}
P(X = 1) = λ P(X = 0) = λ e^{−λ}
P(X = 2) = \frac{λ}{2} P(X = 1) = \frac{λ^2}{2} e^{−λ}
  ⋮
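A direct implementation of this recursion; a sketch (the function name is our own):

```python
from math import exp

def poisson_pmf_table(lam, k_max):
    # Build P(X = 0), ..., P(X = k_max) using P(X = i+1) = lam/(i+1) * P(X = i).
    probs = [exp(-lam)]
    for i in range(k_max):
        probs.append(probs[-1] * lam / (i + 1))
    return probs

table = poisson_pmf_table(lam=4.0, k_max=30)
print(sum(table))            # ≈ 1 (the tail beyond k = 30 is negligible for λ = 4)
print(1 - sum(table[:3]))    # P(X >= 3) ≈ 0.7619 = 1 - 13e^{-4}, cf. Example 4.40(a)
```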

4.8 H YPERGEOMETRIC R ANDOM VARIABLE

Suppose that we have a set of N balls, of which m are red and N − m are blue.
We choose n of these balls, without replacement, and define X to be the number
of red balls in our sample. Then
P(X = x) = \frac{\binom{m}{x} \binom{N − m}{n − x}}{\binom{N}{n}}, \quad x = 0, 1, \ldots, n,

with the convention that a binomial coefficient \binom{a}{b} is 0 when b > a.
A random variable whose probability mass function is given as the above
equation for some values of n, N, m is said to be a hypergeometric random
variable, and is denoted by H(n, N, m). Here

E(X) = \frac{nm}{N}, \quad var(X) = \frac{nm}{N} \left[ \frac{(n − 1)(m − 1)}{N − 1} + 1 − \frac{nm}{N} \right].
E XAMPLE 4.41 A purchaser of electrical components buys them in
lots of size 10. It is his policy to inspect 3 components randomly from
a lot and to accept the lot only if all 3 are nondefective. If 30 percent
of the lots have 4 defective components and 70 percent have only 1,
what proportion of lots does the purchaser reject?
Solution:
Let A denote the event that the purchaser accepts a lot. Then

P(A) = P(A | lot has 4 defectives) × \frac{3}{10} + P(A | lot has 1 defective) × \frac{7}{10}
     = \frac{\binom{4}{0}\binom{6}{3}}{\binom{10}{3}} × \frac{3}{10} + \frac{\binom{1}{0}\binom{9}{3}}{\binom{10}{3}} × \frac{7}{10} = \frac{54}{100}.

Hence, 46 percent of the lots are rejected.
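The same answer can be reproduced with a small helper for the hypergeometric mass function; a sketch (the function name is our own):

```python
from math import comb

def hypergeom_pmf(x, n, N, m):
    # P(X = x) for X ~ H(n, N, m): x red balls in a sample of n drawn
    # without replacement from N balls of which m are red.
    return comb(m, x) * comb(N - m, n - x) / comb(N, n)

# A lot is accepted exactly when 0 of the 3 inspected components are defective.
p_accept = 0.3 * hypergeom_pmf(0, n=3, N=10, m=4) + 0.7 * hypergeom_pmf(0, n=3, N=10, m=1)
print(1 - p_accept)   # ≈ 0.46, i.e. 46 percent of lots are rejected
```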

4.9 E XPECTED VALUE O F S UMS O F R ANDOM VARIABLES

In this section, we will prove the result that the expected value of a sum of
random variables is equal to the sum of their expectations.
For a random variable X, let X(s) denote the value of X when s ∈ S is the
outcome. Now, if X and Y are both random variables, then so is their sum.
That is, Z = X +Y is also a random variable. Moreover, Z(s) = X(s) +Y (s).
E XAMPLE 4.42 Suppose that an experiment consists of flipping a coin
5 times, with the outcome being the resulting sequence of heads and
tails. Let X be the number of heads in the first 3 flips and Y the num-
ber of heads in the final 2 flips. Let Z = X +Y . Then for the outcome
s = (h,t, h,t, h),
X(s) = 2, Y (s) = 1, Z(s) = X(s) +Y (s) = 3.
For the outcome s = (h, h, h,t, h),
X(s) = 3, Y (s) = 1, Z(s) = X(s) +Y (s) = 4.

Let p(s) = P({s}) be the probability that s is the outcome of the experiment.

P ROPOSITION 4.43
E[X] = \sum_{s ∈ S} X(s) p(s).

Proof Suppose that the distinct values of X are x_i, i ≥ 1. For each i, let S_i be
the event that X is equal to x_i. That is, S_i = {s : X(s) = x_i}. Then,

E[X] = \sum_{i} x_i P(X = x_i)
     = \sum_{i} x_i P(S_i)
     = \sum_{i} x_i \sum_{s ∈ S_i} p(s)
     = \sum_{i} \sum_{s ∈ S_i} x_i p(s)
     = \sum_{i} \sum_{s ∈ S_i} X(s) p(s)
     = \sum_{s ∈ S} X(s) p(s).

The final equality follows because S1 , S2 , . . . are mutually exclusive events


whose union is S. z

E XAMPLE 4.44 Suppose that two independent flips of a coin that


comes up heads with probability p are made, and let X denote the
number of heads obtained. Now

P(X = 0) = P(t,t) = (1 − p)2 ,


P(X = 1) = P(h,t) + P(t, h) = 2p(1 − p),
P(X = 2) = P(h, h) = p2

It follows from the definition that

E[X] = 0 × (1 − p)2 + 1 × 2p(1 − p) + 2 × p2 = 2p

which agrees with

E[X] = X(h, h)p2 + X(h,t)p(1 − p) + X(t, h)(1 − p)p + X(t,t)(1 − p)2


= 2p2 + p(1 − p) + (1 − p)p
= 2p.

As a consequence of Proposition 4.43, we have the important and useful re-


sult that the expected value of a sum of random variables is equal to the sum
of their expectations.

P ROPOSITION 4.45 For random variables X_1, X_2, . . . , X_n,

E\left[ \sum_{i=1}^{n} X_i \right] = \sum_{i=1}^{n} E[X_i].

Proof Let Z = \sum_{i=1}^{n} X_i. Proposition 4.43 gives

E[Z] = \sum_{s ∈ S} Z(s) p(s)
     = \sum_{s ∈ S} (X_1(s) + X_2(s) + · · · + X_n(s)) p(s)
     = \sum_{s ∈ S} X_1(s) p(s) + \sum_{s ∈ S} X_2(s) p(s) + · · · + \sum_{s ∈ S} X_n(s) p(s)
     = E[X_1] + E[X_2] + · · · + E[X_n].

z
R EMARK :
Note that this result does not require that the Xi ’s be independent. 

E XAMPLE 4.46 Find the expected total number of successes that re-
sult from n trials when trial i is a success with probability pi , i =
1, . . . , n.

Solution:
Let

X_i = \begin{cases} 1 & \text{if trial } i \text{ is a success} \\ 0 & \text{if trial } i \text{ is a failure.} \end{cases}

We have the representation

X = \sum_{i=1}^{n} X_i.

Consequently,

E[X] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} p_i.

E XAMPLE 4.47 For X ∼ Bin(n, p), find var(X).

Solution:
Note that from the previous example, E(X) = np.
Let X_i be defined as in the previous example. Then

E[X^2] = E\left[ \left( \sum_{i=1}^{n} X_i \right) \left( \sum_{j=1}^{n} X_j \right) \right]
       = E\left[ \sum_{i=1}^{n} X_i^2 + \sum_{i=1}^{n} \sum_{j ≠ i} X_i X_j \right]
       = \sum_{i=1}^{n} E[X_i^2] + \sum_{i=1}^{n} \sum_{j ≠ i} E[X_i X_j]
       = \sum_{i} p_i + \sum_{i=1}^{n} \sum_{j ≠ i} E[X_i X_j].

Note that X_i^2 = X_i and that

X_i X_j = \begin{cases} 1 & \text{if } X_i = 1 \text{ and } X_j = 1 \\ 0 & \text{otherwise.} \end{cases}

Hence,

E[Xi X j ] = P(Xi = 1, X j = 1) = P(trials i and j are successes).



Note that if X is binomial, then for i ≠ j, the results of trial i and
trial j are independent, with each being a success with probability p.
Therefore,

E[X_i X_j] = p^2, \quad i ≠ j.
Thus
E[X 2 ] = np + n(n − 1)p2 ,
which gives

var(X) = E[X 2 ] − (E[X])2 = np + n(n − 1)p2 − n2 p2 = np(1 − p).

4.10 D ISTRIBUTION F UNCTIONS AND P ROBABILITY M ASS F UNCTIONS

Let X be a discrete random variable. Recall the definitions for the distribution
function (d.f.) and the probability mass function (p.m.f.) of X:

(1) For distribution function, FX : R −→ R defined by

FX (x) = P(X ≤ x).

(2) For probability mass function, pX : R −→ R defined by

pX (x) = P(X = x).

Properties of distribution function


(i) FX is a nondecreasing function, i.e., if a < b, then FX (a) ≤ FX (b).

(ii) limb→∞ FX (b) = 1.

(iii) limb→−∞ FX (b) = 0.

(iv) F_X is right continuous. That is, for any b ∈ R,

\lim_{x → b^{+}} F_X(x) = F_X(b).

Proof

(i) Note that for a < b, {X ≤ a} ⊂ {X ≤ b} and so the result follows from
Proposition 2.9 (page 19).

(ii) For b_n ↑ ∞, the events {X ≤ b_n}, n ≥ 1, are increasing events whose union
is the event {X < ∞}. By Proposition 2.25 (page 30),

\lim_{n → ∞} P(X ≤ b_n) = P(X < ∞) = 1.

(iii) Similar to the above. Consider bn ↓ −∞ and the events {X ≤ bn }, n ≥ 1.


(iv) If b_n decreases to b, then {X ≤ b_n}, n ≥ 1, are decreasing events whose
intersection is {X ≤ b}. Again, Proposition 2.25 yields

\lim_{n → ∞} P(X ≤ b_n) = P(X ≤ b).

z
Some useful calculations
Theoretically, all probability questions about X can be computed in terms of the
distribution function (or the probability mass function).

(1) Calculating probabilities from the distribution function

(i) P(a < X ≤ b) = F_X(b) − F_X(a).

Proof Note that {X ≤ b} = {X ≤ a} ∪ {a < X ≤ b} so


P(X ≤ b) = P(X ≤ a) + P(a < X ≤ b).
Rearrangement yields the result. z

(ii) P(X < b) = \lim_{n → ∞} F\left( b − \frac{1}{n} \right).

Proof

P(X < b) = P\left( \lim_{n → ∞} \left\{ X ≤ b − \frac{1}{n} \right\} \right)
         = \lim_{n → ∞} P\left( X ≤ b − \frac{1}{n} \right)
         = \lim_{n → ∞} F\left( b − \frac{1}{n} \right).

Note that P(X < b) does not necessarily equal F(b), since F(b) also
includes the probability that X equals b. z

(iii) P(X = a) = FX (a) − FX (a−) where FX (a−) = limx→a− FX (x).


(iv) Using the above, we can compute P(a ≤ X ≤ b); P(a ≤ X < b) and
P(a < X < b). For example,
P(a ≤ X ≤ b) = P(X = a) + P(a < X ≤ b)
= FX (a) − FX (a−) + [FX (b) − FX (a)]
= FX (b) − FX (a−).
Similarly for the other two.

(2) Calculating probabilities from the probability mass function: for a set A of values,

P(X ∈ A) = \sum_{x ∈ A} p_X(x).

(3) Calculating the probability mass function from the distribution function:

p_X(x) = F_X(x) − F_X(x−), \quad x ∈ R.

(4) Calculating the distribution function from the probability mass function:

F_X(x) = \sum_{y ≤ x} p_X(y), \quad x ∈ R.

E XAMPLE 4.48 The distribution function of the random variable X is


given by

F(x) = \begin{cases} 0 & x < 0 \\ x/2 & 0 ≤ x < 1 \\ 2/3 & 1 ≤ x < 2 \\ 11/12 & 2 ≤ x < 3 \\ 1 & 3 ≤ x. \end{cases}

Compute
(a) P(X < 3),
(b) P(X = 1),
(c) P(X > 1/2),
(d) P(2 < X ≤ 4).

Solution:

(a) P(X < 3) = \lim_{n} P\left( X ≤ 3 − \frac{1}{n} \right) = \lim_{n} F\left( 3 − \frac{1}{n} \right) = \frac{11}{12}.

(b) P(X = 1) = F(1) − \lim_{n} F\left( 1 − \frac{1}{n} \right) = \frac{2}{3} − \lim_{n} \frac{1 − 1/n}{2} = \frac{2}{3} − \frac{1}{2} = \frac{1}{6}.

(c) P\left( X > \frac{1}{2} \right) = 1 − P\left( X ≤ \frac{1}{2} \right) = 1 − F\left( \frac{1}{2} \right) = \frac{3}{4}.

(d) P(2 < X ≤ 4) = F(4) − F(2) = 1 − \frac{11}{12} = \frac{1}{12}.

R ECOMMENDED R EADING Read Examples 3d, 4c, 6d, 6e,6g, 8b, 8c, 8f, 8g,
8h, 8j in Chapter 4 of [Ross].
