
MA 295S.10 - ELEMENTS OF PROBABILITY
This course introduces the essential concepts of probability to MS and PhD Math students through topics
such as random variables and their transformations, special probability distributions, expectations and
cumulants, and probability generating functions. It also discusses modes of convergence, the Central Limit
Theorem, the Laws of Large Numbers, and other special topics. Emphasis is on proving results, to prepare
students for a more intensive probability course grounded on measure theory.
Summer 2013
12:00 - 1:30 pm
SEC A209
Richard B. Eden, Ph.D.
reden@math.admu.edu.ph
Mathematics Department
Ateneo de Manila University
Course Requirements
1. Regular problem sets
2. Midterm exams - written
3. Oral presentation on a special topic or paper on probability at the level of this course
1. Sample Space and Probability
1.1. Probabilistic Models
There are two types of phenomena.
1. Deterministic - repeated observations under a specified set of conditions invariably lead to the same
outcome.
A ball initially at rest is dropped a height of s meters inside an evacuated cylinder. We observe the
time it falls: t = √(2s/g).
2. Random or nondeterministic - repeated observations under a specified set of conditions do not lead
to the same result.
A coin is tossed. We observe the result: heads or tails.
A probabilistic model is a mathematical description of an uncertain event. It involves an underlying
process, called the experiment, that will produce exactly one out of several possible outcomes.
Given an experiment:
The sample space, denoted by Ω, is the set of all possible outcomes.
A sample outcome or a sample point is any element ω ∈ Ω.
An event is any subset A ⊆ Ω.
Example 1 Consider the experiment of tossing a coin three times.
Suppose we receive $1 each time a head comes up. We can then work with Ω = {0, 1, 2, 3}.
Suppose we receive $1 for every coin toss, up to and including the first time a head comes up. Then we
receive $2 for every coin toss, up to the second time a head comes up. Generally, the amount per toss
is doubled each time a head comes up. Then we need to work with Ω = {(a, b, c) : a, b, c = H or T}.
Example 2 Consider the experiment of choosing real values for the coefficients of ax² + bx + c = 0.
Then Ω = {(a, b, c) | a ≠ 0}. If A is the event for which the equation has no real solution, then
A = {(a, b, c) | b² − 4ac < 0}.
Next, for each experiment with sample space Ω, we want to choose events to which we must associate
probabilities. Let F be a collection of events, i.e., a collection of subsets of Ω.
Definition F is called a σ-field of Ω if the following are satisfied.
1. Ω ∈ F
2. A ∈ F ⟹ Aᶜ ∈ F
3. A₁, A₂, . . . ∈ F ⟹ ∪_{j=1}^∞ Aⱼ ∈ F
Example 3 Here are three simple examples of σ-fields of Ω.
F = {∅, Ω}
F = power set of Ω = {all subsets of Ω}
If A ⊆ Ω, F = {∅, A, Aᶜ, Ω}.
Remark A σ-field is closed under the taking of complements, countable unions and countable intersections.
MA 295S.10 Elements of Probability 1 Summer 2013 (R. Eden)
A₁, A₂, . . . ∈ F ⟹ ∩_{j=1}^∞ Aⱼ ∈ F
If A, B ∈ F, then the following are also in F: A ∪ B, A ∩ B, A − B = A ∩ Bᶜ, B − A, and
A △ B = (A − B) ∪ (B − A).
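For a small finite Ω these closure properties can be checked by brute force; a minimal illustrative sketch (not part of the notes' exercises), using the power set of Ω = {1, 2, 3} as the σ-field:

```python
from itertools import chain, combinations

def power_set(omega):
    """All subsets of omega, as frozensets -- the largest sigma-field of omega."""
    items = list(omega)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))}

omega = frozenset({1, 2, 3})
F = power_set(omega)

# Closure under complements: A in F implies A^c in F.
assert all(omega - A in F for A in F)
# Closure under (finite) unions: A, B in F implies A union B in F.
assert all(A | B in F for A in F for B in F)
print(len(F))  # 8 subsets of a 3-element set
```

For a finite Ω, closure under pairwise unions already gives closure under all (necessarily finite) countable unions.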
Finally, we want to allocate a probability to each event A ∈ F.
Definition A mapping P : F → R is called a probability measure on (Ω, F) if the following properties are
satisfied.
(K1) P(A) ≥ 0 for all A ∈ F
(K2) P(Ω) = 1
(K3) If A₁, A₂, . . . ∈ F and Aᵢ ∩ Aⱼ = ∅ for i ≠ j, then P(∪_{n=1}^∞ Aₙ) = Σ_{n=1}^∞ P(Aₙ).
Remark If A ∩ B = ∅, then we say that A and B are mutually exclusive. (K1), (K2) and (K3) are called
Kolmogorov's Axioms, named after A. Kolmogorov, one of the fathers of probability theory. The triple
(Ω, F, P) is called a probability space. P is also referred to as a probability law.
Remark In many instances, we will talk about probability models without explicitly mentioning the
underlying probability space anymore. It will be assumed we are working under an appropriate probability
space (, F, P).
Theorem Let A, B ∈ F.
1. P(Aᶜ) = 1 − P(A)
2. P(∅) = 0
3. If A ⊆ B, then P(A) ≤ P(B).
4. P(A) ≤ 1
5. P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Corollary Let A, B, C ∈ F.
1. P(A ∪ B) ≤ P(A) + P(B)
2. P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(B ∩ C) − P(C ∩ A) + P(A ∩ B ∩ C)
Example 4 Suppose that P(A) = 0.9 and P(B) = 0.8.
1. Show that P(A ∩ B) ≥ 0.7.
2. If P(A ∩ B) = 0.75, find P(A △ B) and P(Aᶜ ∩ Bᶜ).
Example 5 A sequence of events {Aₙ}_{n≥1} is said to be an increasing sequence if A₁ ⊆ A₂ ⊆ · · · . Define
A = lim_{n→∞} Aₙ = ∪_{n=1}^∞ Aₙ. Prove that lim_{n→∞} P(Aₙ) = P(A).
Example 6 Consider an experiment involving three tosses of a fair coin. What are Ω, F and P? If
A = {exactly 2 heads occur}, find P(A).
Theorem If the sample space consists of a finite number of possible outcomes, then for any event A =
{s₁, s₂, . . . , sₘ}, P(A) = P({s₁}) + P({s₂}) + · · · + P({sₘ}).
Theorem If the sample space consists of n possible outcomes which are equally likely, then the probability
of any event A is given by P(A) = #A/n.
Example 7 One card is drawn at random from a deck of 52 cards. What is the probability that it is
neither a heart nor a queen?
Example 8 A fair coin is tossed three times. What is the probability that heads show up an even number
of times?
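Small experiments like Example 8 can be settled by enumerating the equally likely outcomes and applying P(A) = #A/n; an illustrative sketch:

```python
from itertools import product

outcomes = list(product("HT", repeat=3))      # the 8 equally likely outcomes
even_heads = [w for w in outcomes if w.count("H") % 2 == 0]
p = len(even_heads) / len(outcomes)           # #A / n
print(p)  # 0.5: zero or two heads in 4 of the 8 outcomes
```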
1.2 Conditional Probability
On a throw of a fair die, the probability we get a 4 is 1/6. But if we already know beforehand that the
outcome is an even number, the probability is 1/3. We write P(A|B) to denote the probability of event A
given event B, i.e., given that event B has occurred.
Definition Let A and B be events with P(B) > 0. The (conditional) probability of A given B is
P(A|B) = P(A ∩ B)/P(B).
Example 9 The probability that a regularly scheduled flight departs on time is P(D) = 0.83; the
probability that it arrives on time is P(A) = 0.82; and the probability that it departs and arrives on time is
0.78. Find the probability that a plane
1. arrives on time given that it departed on time,
2. departed on time given that it has arrived on time, and
3. arrives on time, given that it did not depart on time.
Theorem Let (Ω, F, P) be a probability space, and B ∈ F with P(B) > 0. Then (Ω, F, Q) is also a
probability space where Q : F → R is defined as Q(A) = P(A|B).
Since conditional probabilities constitute a legitimate probability law, all general properties of proba-
bility laws remain valid.
Corollary Let A, B, C ∈ F with P(B) > 0.
1. P(Aᶜ|B) = 1 − P(A|B)
2. If A ⊆ C, then P(A|B) ≤ P(C|B).
3. P(A ∪ C|B) = P(A|B) + P(C|B) − P(A ∩ C|B)
Example 10 We toss a fair coin three times. Find P(A|B) where A = {more heads than tails come up}
and B = {1st toss is a head}.
Example 11 Find P(A ∩ B) if P(A) = 0.2, P(B) = 0.4 and P(A|B) + P(B|A) = 0.75.
Theorem P(A ∩ B) = P(A|B)P(B) and P(A ∩ B ∩ C) = P(A|B ∩ C)P(B|C)P(C)
Example 12 If an aircraft is present in a certain area, a radar correctly registers its presence with prob-
ability 0.99. If it is not present, the radar falsely registers an aircraft presence with probability 0.10. We
assume that an aircraft is present with probability 0.05. What is the probability of false alarm (a false
indication of aircraft presence), and the probability of missed detection (nothing registers, even though an
aircraft is present)?
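Example 12's two probabilities each follow from the multiplication rule P(A ∩ B) = P(A|B)P(B); a numerical sketch (illustrative, not an official solution):

```python
p_present = 0.05
p_register_given_present = 0.99
p_register_given_absent = 0.10

# False alarm: aircraft absent AND radar registers a presence.
p_false_alarm = (1 - p_present) * p_register_given_absent   # 0.95 * 0.10
# Missed detection: aircraft present AND radar does not register.
p_missed = p_present * (1 - p_register_given_present)       # 0.05 * 0.01
print(p_false_alarm, p_missed)  # approximately 0.095 and 0.0005
```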
Example 13 Three cards are drawn (in order) from an ordinary deck without replacement. Find the
probability that none of the cards is a heart.
Example 14 Consider the set of families having two children. Assume that the four possible birth se-
quences, taking gender into account - BB, BG, GB, GG - are equally likely. What is the probability that
both children are boys given that at least one is a boy?
1.3 Total Probability Theorem and Bayes' Rule
A partition of Ω is a collection of events {A₁, A₂, . . . , Aₙ} such that Ω = ∪_{i=1}^n Aᵢ and Aᵢ ∩ Aⱼ = ∅ for i ≠ j.
Theorem (Total Probability Theorem) Let {A₁, A₂, . . . , Aₙ} be a partition of Ω with P(Aᵢ) > 0 for
i = 1, 2, . . . , n. Then for any B ∈ F,
P(B) = Σ_{i=1}^n P(B|Aᵢ)P(Aᵢ) = P(B|A₁)P(A₁) + · · · + P(B|Aₙ)P(Aₙ).
Example 15 Two numbers are chosen at random from among the numbers 1 to 10 without replacement.
Find the probability that the second number chosen is 5.
Example 16 We roll a fair four-sided die. If the result is 1 or 2, we roll once more but otherwise we stop.
What is the probability that the sum total of our rolls is at least 4?
Example 17 Alice is taking a probability class and at the end of each week she can be either up-to-date or
she may have fallen behind. If she is up-to-date in a given week, the probability that she will be up-to-date
(or behind) in the next week is 0.8 (or 0.2, respectively). If she is behind in a given week, the probability
that she will be up-to-date (or behind) in the next week is 0.6 (or 0.4, respectively). Alice is (by default)
up-to-date when she starts the class. What is the probability that she is up-to-date after three weeks?
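Example 17 is the Total Probability Theorem applied week by week, partitioning on the previous week's status; an illustrative sketch of the recursion:

```python
p_up = 1.0  # Alice starts the class up-to-date
for week in range(3):
    # Partition on last week: up-to-date (stays up with prob 0.8)
    # or behind (recovers with prob 0.6).
    p_up = p_up * 0.8 + (1 - p_up) * 0.6
print(p_up)  # week by week: 0.8, then 0.76, then approximately 0.752
```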
Theorem (Bayes' Rule) Let {A₁, A₂, . . . , Aₙ} be a partition of Ω with P(Aᵢ) > 0 for i = 1, 2, . . . , n. Then
for any B ∈ F with P(B) > 0,
P(Aᵢ|B) = P(Aᵢ)P(B|Aᵢ)/P(B) = P(Aᵢ)P(B|Aᵢ) / [P(A₁)P(B|A₁) + · · · + P(Aₙ)P(B|Aₙ)].
Remark Bayes' Rule is often used for inference. There are a number of causes A₁, A₂, . . . , Aₙ that may
result in a certain effect B. Given that the effect B has been observed, we wish to evaluate the probability
that the cause Aᵢ is present.
Example 18 A company producing electric relays has three manufacturing plants producing 50, 30 and
20 percent, respectively, of its product. Suppose that the probabilities that a relay manufactured by these
plants is defective are 0.02, 0.05, and 0.01, respectively.
1. If a relay is selected at random from the output of the company, what is the probability that it is
defective?
2. If a relay selected at random is found to be defective, what is the probability that it was manufactured
by plant 2?
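Example 18 combines both theorems: total probability for part 1, Bayes' Rule for part 2. An illustrative computation:

```python
priors = [0.50, 0.30, 0.20]   # share of output from each plant
p_def = [0.02, 0.05, 0.01]    # P(defective | plant i)

# Total Probability Theorem: P(defective)
p_d = sum(p * d for p, d in zip(priors, p_def))
# Bayes' Rule: P(plant 2 | defective)
p_plant2 = priors[1] * p_def[1] / p_d
print(p_d, p_plant2)  # approximately 0.027 and 5/9
```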
Example 19 A simple binary communication channel carries messages by using only two signals, 0 and
1. For this channel, 40% of the time a 1 is transmitted. The probability that a transmitted 0 is correctly
received is 0.90, and the probability that a transmitted 1 is correctly received is 0.95.
1. What is the probability of a 1 being received?
2. Given that a 1 is received, what is the probability that a 1 was transmitted?
Example 20 You know that a certain letter is equally likely to be in any one of three different folders.
Let pᵢ < 1 (i = 1, 2, 3) be the probability that you will find your letter upon making a quick examination
of Folder i if the letter is, in fact, in Folder i. Suppose you look in Folder 1 and do not find the letter.
What is the probability that the letter is in Folder 1?
1.4 Independence
If P(A|B) = P(A), the knowledge that event B has occurred has no effect on the probability of event A.
Consequently, P(A ∩ B) = P(A)P(B), and also, P(B|A) = P(B).
Definition The events A and B are independent if P(A ∩ B) = P(A)P(B).
If P(A) > 0 and P(B) > 0, then A and B are independent iff P(A|B) = P(A) or P(B|A) = P(B).
Example 21 Consider the experiment of throwing two fair dice. Let A be the event that the sum of the
dice is 7, B the event that the sum is 6, and C the event that the first die is 4. Show that the events A
and C are independent, but events B and C are not independent.
Example 22 Bill and George go target shooting together. Both shoot at a target at the same time.
Suppose Bill hits the target with probability 0.7 whereas George, independently, hits the target with
probability 0.4.
1. Given that exactly one shot hit the target, what is the probability that it was George's shot?
2. Given that the target is hit, what is the probability that George hit it?
Theorem If A and B are independent, then so are (a) A and Bᶜ, (b) Aᶜ and B, and (c) Aᶜ and Bᶜ.
Definition The events A₁, A₂, . . . , Aₙ are independent if P(∩_{i∈S} Aᵢ) = ∏_{i∈S} P(Aᵢ) for every subset S of
{1, 2, . . . , n}.
Remark The three events A, B, C are independent if all the following are satisfied: (i) P(A ∩ B) =
P(A)P(B), (ii) P(B ∩ C) = P(B)P(C), (iii) P(C ∩ A) = P(C)P(A), and (iv) P(A ∩ B ∩ C) = P(A)P(B)P(C).
A, B, C are said to be pairwise independent if (i)-(iii) hold.
Example 23 Let A₁, A₂, . . . , Aₙ be independent events, and P(Aᵢ) = p for each Aᵢ.
1. What is the probability that none of the Aᵢ's occur?
2. Show that the probability that an even number of the Aᵢ's occur is (1/2)(1 + (q − p)ⁿ) where q = 1 − p.
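Since the number of the Aᵢ's that occur is Binomial(n, p), the identity in part 2 can be sanity-checked against a direct binomial sum (counting 0 as an even number); an illustrative check:

```python
from math import comb

def p_even_direct(n, p):
    """Sum P(exactly k of the A_i occur) over even k, including k = 0."""
    q = 1 - p
    return sum(comb(n, k) * p**k * q**(n - k) for k in range(0, n + 1, 2))

def p_even_formula(n, p):
    q = 1 - p
    return (1 + (q - p)**n) / 2

for n, p in [(1, 0.5), (4, 0.3), (7, 0.9)]:
    assert abs(p_even_direct(n, p) - p_even_formula(n, p)) < 1e-9
```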
1.5 Problems
1. Prove that P(∪_{i=1}^n Aᵢ) ≤ Σ_{i=1}^n P(Aᵢ).
2. A sequence of events {Aₙ}_{n≥1} is said to be a decreasing sequence if A₁ ⊇ A₂ ⊇ · · · . Define A =
lim_{n→∞} Aₙ = ∩_{n=1}^∞ Aₙ. Prove that lim_{n→∞} P(Aₙ) = P(A).
3. The numbers 1, 2, 3, . . . , n are arranged in random order. Find the probability that 1, 2, 3 are neigh-
bors in the ordering.
4. A committee of 5 persons is to be selected randomly from a group of 5 men and 10 women.
(a) Find the probability that the committee consists of 2 men and 3 women.
(b) Find the probability that the committee consists of all women.
5. A die is loaded in such a way that the probability of each face turning up is proportional to the
number of dots on that face. What is the probability of getting an even number in one throw?
6. 90% of men and 95% of women in a population are right-handed. In the population, 52% are men
and 48% are women. What is the probability that a randomly selected person is right-handed?
7. Show that if P(A|B) > P(A), then P(B|A) > P(B).
8. Suppose P(A) = 2/5, P(A ∪ B) = 3/5, P(B|A) = 1/4, P(C|B) = 1/3 and P(C|A ∩ B) = 1/2. Find P(A|B ∩ C).
9. Prove: P(∩_{i=1}^n Aᵢ) = P(Aₙ|A₁ ∩ A₂ ∩ · · · ∩ Aₙ₋₁) · · · P(A₃|A₁ ∩ A₂)P(A₂|A₁)P(A₁).
10. Justin hasn't decided yet where to have his next concert. There is a 50% chance that it will be in China,
a 30% chance it will be in India, and a 20% chance it will be in the Philippines. The probability of
Selena attending the concert is 30% if it is in China, 10% if in India, and 20% if in the Philippines.
If Selena ends up attending the concert, what is the probability that Justin chose the Philippines?
11. One bag contains 4 white balls and 3 black balls, and a second bag contains 3 white balls and 5
black balls. One ball is drawn from the first bag and placed unseen in the second bag. What is the
probability that a ball now drawn from the second bag is black?
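Problem 11 is a Total Probability computation, partitioning on the color of the transferred ball; an illustrative exact computation with rationals:

```python
from fractions import Fraction as F

p_white_moved = F(4, 7)   # a white ball is drawn from the first bag
p_black_moved = F(3, 7)
# The second bag then holds 9 balls: 5 black if a white was added,
# 6 black if a black was added.
p_black = p_white_moved * F(5, 9) + p_black_moved * F(6, 9)
print(p_black)  # 38/63
```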
12. If the events A, B and C are independent, are A and B ∩ C necessarily independent? Prove, or show
a counterexample.
13. Given an event C, A and B are said to be conditionally independent if P(A ∩ B|C) = P(A|C)P(B|C).
In this case, prove that P(A|B ∩ C) = P(A|C), i.e., if C is known to have occurred, the additional
knowledge that B also occurred does not change the probability of A.
14. How many times should a fair die be rolled so that the probability of
(a) at least one six is at least 0.9?
(b) at least two sixes is at least 0.9?
15. In the experiment of throwing two fair dice, consider the events A = {the first die is odd}, B =
{the second die is odd}, and C = {the sum is odd}. Show that A, B, C are pairwise independent,
but not independent.
16. In the experiment of throwing two fair dice, consider the events A = {the first die is 1, 2 or 3},
B = {the first die is 3, 4 or 5}, and C = {the sum is 9}. Show that P(A ∩ B ∩ C) = P(A)P(B)P(C),
but A, B, C are not independent.
2. Random Variables
2.1. Concept of a Random Variable
In many probabilistic models, the outcomes are of a numerical nature (e.g., instrument readings, stock
prices). In other experiments, the outcomes are not numerical, but they may be associated with some
numerical value of interest.
Definition A random variable (r.v.) X is a real-valued function of the experimental outcome.
A r.v. is a function that associates a real number with each sample point. Given the probability space
(Ω, F, P), a (measurable) function X : Ω → R is a r.v. A r.v. is usually denoted by an upper case letter.
Example 1 Consider the experiment of tossing a coin three times, and noting whether each toss results
in head or tail. A sample point is ω₁ = HTH. Let X be the r.v. which denotes the number of heads. Then
for this outcome, X = 2; strictly speaking, X(ω₁) = 2. For ω₂ = HTT, X = 1, i.e., X(ω₂) = 1.
Example 2 Consider the experiment of tossing two dice. Let X and Y be the r.v.s representing the sum,
and the outcome of the second die, respectively. For the outcome ω = (3, 4), X = 7 and Y = 4, i.e.,
X(ω) = 7 and Y (ω) = 4.
Example 3 Suppose Ω = [−1, 1]. Define the r.v.s X, Y and Z as X(ω) = ω, Y (ω) = ω² and Z(ω) =
sgn(ω). If we think of the experiment of randomly choosing any number between −1 and 1 (inclusive),
then X is the outcome, Y = X² is its square, and Z is any of three possible values: −1 if we picked a
negative number, 0 if we picked 0, and 1 if we picked a positive number.
Definition A random variable is discrete if its set of possible values is finite or at most countably infinite,
and continuous if it takes on values on a continuous scale.
In the preceding example, Z is a discrete r.v. while X and Y are continuous. X takes on all values in
[−1, 1], Y takes on all values in [0, 1], and Z takes on the values −1, 0, 1. In practical problems, continuous
r.v.s represent measured data such as all possible heights, temperatures, or distances, whereas discrete
random variables represent count data, such as the number of defectives in a sample of 7 items, or the
number of highway fatalities per month in a city.
2.2 Discrete Random Variables
A discrete random variable can be analyzed through its probability mass function.
Definition Given a discrete r.v. X whose range of values is S, its probability mass function (PMF)
or probability distribution is a function p : S → R for which
1. p(x) ≥ 0 for all x ∈ S,
2. Σ_{x∈S} p(x) = 1, and
3. P(X = x) = p(x) for all x ∈ S.
Remark Strictly speaking, when we write P(X = x), we mean P({X = x}), the probability of the event
where X = x, i.e., the event {ω ∈ Ω : X(ω) = x}. If there's more than one r.v. being considered, we may
choose to avoid ambiguity by writing p_X instead of p. Also, we may extend the domain of p from S to R
by letting p(x) = 0 if x ∉ S.
Example 4 A r.v. X has possible outcomes x = 0, 1, 2, 3. Find c if its PMF is p(x) = c(x² + 4).
Example 5 A fair coin is tossed until a head comes up for the first time. Let the r.v. Y be the number
of the toss on which this first head comes up. What is the PMF of Y ? Verify this is a valid PMF. What
is the probability that the first head happens on an odd-numbered toss?
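For Example 5, P(Y = y) = (1/2)^y, and the last question sums this over odd y, a geometric series; a numeric illustration (truncating the infinite sum):

```python
# P(first head on toss y) = (1/2)**y; sum over odd y = 1, 3, 5, ...
# Truncating at y < 200 is far beyond double precision.
p_odd = sum(0.5**y for y in range(1, 200, 2))
print(p_odd)  # converges to (1/2)/(1 - 1/4) = 2/3
```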
Example 6 A biased coin is tossed, which comes up a head with probability p, and a tail with probability
1 p. Let X = 1 if the outcome is a head, and X = 0 if the outcome is a tail. What is the PMF of X?
An experiment with two possible outcomes - success and failure - is called a Bernoulli trial. Define
the r.v. X to be X = 1 when the result is a success and X = 0 when the result is a failure. X is then
said to be a Bernoulli random variable. We also say X has a Bernoulli distribution, with PMF as in
the preceding example.
If Y = g(X) is a function of a r.v. X, then Y is also a r.v. If X is discrete and has PMF p_X, then Y
is also discrete, with PMF p_Y characterized by p_Y(y) = Σ_{x | g(x)=y} p_X(x).
Example 7 The r.v. X has PMF
p_X(x) = 2/7 if x is an integer in the range [−1, 1]
         1/7 if x = 2
         0   otherwise
Find the PMF of Y = X².
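The rule p_Y(y) = Σ_{x | g(x)=y} p_X(x) can be applied mechanically, grouping the x's that map to the same y; an illustrative sketch for Example 7:

```python
from fractions import Fraction as F

p_X = {-1: F(2, 7), 0: F(2, 7), 1: F(2, 7), 2: F(1, 7)}

p_Y = {}
for x, p in p_X.items():
    y = x * x                       # g(x) = x^2
    p_Y[y] = p_Y.get(y, F(0)) + p   # x = -1 and x = 1 both land on y = 1
# p_Y[0] = 2/7, p_Y[1] = 4/7, p_Y[4] = 1/7
assert sum(p_Y.values()) == 1
```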
Definition Given a discrete r.v. X with PMF p, its cumulative distribution function (CDF) is the
function F defined by F(x) = P(X ≤ x) = Σ_{t≤x} p(t).
Example 8 Find the CDF F of the r.v. X from the preceding example and graph it. Use the CDF to
find P(X = 0).
If xᵢ is a value assumed by X, and xᵢ₋₁, another value assumed by X, is the largest possible value less
than xᵢ, then clearly, p(xᵢ) = F(xᵢ) − F(xᵢ₋₁). The CDF of X has a piecewise constant and staircase-like
form.
2.3 Continuous Random Variables
Definition Given a continuous r.v. X, its probability density function (PDF), if it exists, is a function
f : R → R for which
1. f(x) ≥ 0 for all x ∈ R,
2. ∫_{−∞}^{∞} f(x) dx = 1, and
3. P(a ≤ X ≤ b) = ∫_a^b f(x) dx.
Remark A continuous r.v. has a probability of 0 of assuming exactly any of its values. If X has PDF f,
P(X = a) = ∫_a^a f(x) dx = 0
and so
P(a ≤ X ≤ b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a < X < b).
Consider a r.v. whose values are the heights of all people. Between any two values, say, 163.99 and 164.01
cm, there are an infinite number of heights. The probability of selecting a person at random exactly 164 cm
tall, and not one of the infinitely large set of heights so close to 164 cm that you cannot humanly measure
the difference, is remote, and thus we assign a probability of 0 to the event.
Remark If B ∈ B(R), P(X ∈ B) = ∫_B f(x) dx.
Example 9 A r.v. X has PDF
f(x) = c/(x + 5)³ if x > 0, and 0 elsewhere,
for some constant c. Find (a) c, (b) P(X > 7), (c) P(X + 8/X > 6), and (d) P(X is an even integer).
Remark For convenience, we will refer to the support of a continuous r.v. X, denoted S_X, as the set
of values x where its density is positive: f(x) > 0. The support is essentially the set of values that X
assumes. In the example above, S_X = (0, ∞). Strictly speaking, integrations involving the density should
only be over the support. For convenience, we will indicate these as integrations over (−∞, ∞) without
sacrificing correctness since f(x) = 0 anyway outside the support.
Definition Given a continuous r.v. X with PDF f, its cumulative distribution function (CDF) is the
function F defined by F(x) = P(X ≤ x) = ∫_{−∞}^x f(t) dt.
Remark It follows that (d/dx)F(x) = f(x) at points x where F has a derivative, P(a < X < b) = F(b) − F(a)
and P(X > a) = 1 − F(a). Also, F is continuous on (−∞, ∞).
Example 10 Let λ > 0. A r.v. X with density f(x) = λe^{−λx}, x > 0 is said to be an exponential
random variable with parameter λ (we also say X has exponential distribution). Find the CDF of X.
Example 11 Let X have CDF
F(x) = 0 if x < 0, x²/R² if 0 ≤ x ≤ R, and 1 if x > R.
Find its PDF.
Remark Even though in the preceding example, F is not differentiable at R, we can define f(R) = 0;
the resulting f will be a density for X.
If the r.v. Y = g(X) is a function of X, we can find the PDF of Y by differentiating the CDF of Y .
We can find the latter by expressing this in terms of the CDF of X because F_Y(y) = P(g(X) ≤ y).
Example 12 Let X be a r.v. with PDF f_X(x) = (1 + x)/2, −1 < x < 1. Find the PDF of W = 4X,
Y = 2 − 3X and Z = X².
Are there random variables that are neither discrete nor continuous? Yes. Consider the r.v. X which
assumes all values in [1, 3], with the following properties: P(X = 1) = 2/9, P(X = 3) = 1/9, and it has density
p(x) = 1/x² for 1 < x < 3, i.e., P(a < X < b) = ∫_a^b (1/x²) dx for 1 < a ≤ b < 3.
2.4 Expectation and Variance
A biased coin is more likely to turn up heads than tails. Let X = 1 if it turns up heads and X = −1 if
it turns up tails. Suppose P(X = 1) = 0.6, and that in fact, in 100 throws of the coin, 60 turned up heads
and 40, tails. X is recorded for each throw. What is the average of the values of X?
Definition Let X be a r.v. with probability distribution (PMF/PDF) f(x). The mean or expected
value or expectation of X, denoted by E(X), μ or μ_X, is
E(X) = Σ_x x f(x) if X is discrete
and
E(X) = ∫_{−∞}^{∞} x f(x) dx if X is continuous.
Example 13 A r.v. X has Poisson distribution with parameter λ > 0 if it has PMF p(x) = λˣe^{−λ}/x!
for x = 0, 1, 2, . . .. Verify that this is a valid PMF, and find E(X).
Example 14 Recall that a r.v. X has exponential distribution with parameter λ > 0 if it has PDF
f(x) = λe^{−λx} for x > 0. Find E(X).
Remark If X assumes infinitely many values, the infinite sum or improper integral defining E(X) may
not be well-defined. To remedy this, we will require that the sum/integral in the definition converge
absolutely: Σ_x |x| f(x) < ∞, ∫_{−∞}^{∞} |x| f(x) dx < ∞. Otherwise, X does not have finite expectation and
E(X) is undefined. In fact, E(X) is finite iff E(|X|) < ∞.
Theorem Let X be a r.v. with probability distribution f(x). Given a function g : R → R, the r.v. g(X)
has expected value
E(g(X)) = Σ_x g(x) f(x) if X is discrete
and
E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx if X is continuous,
provided Σ_x |g(x)| f(x) < ∞ or ∫_{−∞}^{∞} |g(x)| f(x) dx < ∞.
Example 15 Find E(7X + 1) for the r.v. in Example 7, and E(X²) for the r.v.s in Example 13 and
Example 14.
Theorem Let the r.v. X have finite expectation. Let g and h be functions for which g(X) and h(X) both
have finite expectations.
1. E(aX + b) = aE(X) + b for any constants a and b.
2. E(g(X) + h(X)) = E(g(X)) + E(h(X))
3. If g(X) ≤ h(X), then E(g(X)) ≤ E(h(X)).
4. |E(X)| ≤ E(|X|)
Theorem Let X be a nonnegative integer-valued random variable. If X has finite expectation, then
E(X) = Σ_{x=1}^∞ P(X ≥ x).
Example 16 If 0 < p < 1, then X is called a geometric random variable with parameter p if it has
mass function p(x) = p(1 − p)ˣ for x = 0, 1, 2, . . .. Find E(X).
Definition Let p > 0. The pth moment, or moment of order p, of X, is E(Xᵖ), if this is finite.
Example 17 Let X have density p(x) = 2/x³ for x > 1. Find its first and second moments.
Theorem (Jensen's Inequality) Let g be convex on [a, b], and X a r.v. such that a ≤ X ≤ b. Then
g(E(X)) ≤ E(g(X)).
Corollary For 0 < q < p, if the pth moment exists, then so does the qth moment.
Definition Let X be a r.v. with probability distribution f(x), finite second moment, and mean μ. The
variance of X, denoted by Var(X), σ² or σ²_X, is
E[(X − μ)²] = Σ_x (x − μ)² f(x) if X is discrete
and
E[(X − μ)²] = ∫_{−∞}^{∞} (x − μ)² f(x) dx if X is continuous.
The standard deviation of X is σ_X = √Var(X).
Remark For a r.v. X, E(X) describes where the distribution is centered. Var(X) on the other hand
describes the variability by considering the spread of values of X from the mean.
Example 18 Find the variance of X with mass function p for which p(0) = 1/4, p(1) = 1/8, p(2) = 1/2,
p(3) = 1/8.
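Example 18 can be checked with exact arithmetic, computing the variance both from the definition E[(X − μ)²] and from E(X²) − μ²; an illustrative sketch:

```python
from fractions import Fraction as F

pmf = {0: F(1, 4), 1: F(1, 8), 2: F(1, 2), 3: F(1, 8)}

mu = sum(x * p for x, p in pmf.items())                    # E(X)
var_def = sum((x - mu)**2 * p for x, p in pmf.items())     # E[(X - mu)^2]
var_alt = sum(x * x * p for x, p in pmf.items()) - mu**2   # E(X^2) - mu^2
print(mu, var_def, var_alt)  # 3/2 1 1
```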
Theorem The variance of X is Var(X) = E(X²) − [E(X)]² = E(X²) − μ².
Example 19 Verify this theorem for the preceding example. Find also the variance of Y = 2X + 3.
Theorem If X has finite second moment, Var(aX + b) = a² Var(X) for any constants a, b.
Example 20 Find the variance of a r.v. X having Poisson distribution with parameter λ, and a r.v. Y
having exponential distribution with parameter λ.
2.5 Joint Probability Distributions
Many probabilistic situations often involve several random variables of interest. Given two r.v.s X and
Y in the same probability space (Ω, F, P), we will write P(X = x, Y = y) to denote the probability of the
event {X = x} ∩ {Y = y}, i.e., the event {ω ∈ Ω : X(ω) = x and Y (ω) = y}.
Definition The function p_{X,Y}(x, y) = p(x, y) is a joint probability mass function or joint probability
distribution of the discrete random variables X and Y if
1. p(x, y) ≥ 0 for all x, y
2. Σ_x Σ_y p(x, y) = 1
3. P(X = x, Y = y) = p(x, y).
For any region A in the xy-plane, P[(X, Y ) ∈ A] = Σ_{(x,y)∈A} p(x, y).
Example 21 Suppose X and Y have the following joint probability distribution:
             X
             2      4
        1   0.10   0.15
   Y    3   0.20   0.30
        5   0.10   0.15
Find P(X + Y ≤ 5), P(X = 2) and P(Y = 3).
Definition The function f_{X,Y}(x, y) = f(x, y) is a joint probability density function of the continuous
random variables X and Y if
1. f(x, y) ≥ 0 for all x, y
2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1
3. For any region A in the xy-plane, P[(X, Y ) ∈ A] = ∫∫_A f(x, y) dx dy.
Example 22 Let X and Y have joint density f_{X,Y}(x, y) = 2e^{−(x+y)}, 0 < x < y, i.e., the support is
the region in the first quadrant above the line y = x. Find P(Y < 3X).
Given the joint distribution f_{X,Y}(x, y) of X and Y , how do we determine the distribution of X and of
Y alone?
Definition If the joint distribution of X and Y is f_{X,Y}(x, y), the marginal distributions of X alone
and of Y alone are
f_X(x) = Σ_y f_{X,Y}(x, y) and f_Y(y) = Σ_x f_{X,Y}(x, y)
for the discrete case, and
f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy and f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx
for the continuous case.
Example 23 Find the marginal distributions of X and Y in Example 21 and Example 22.
Definition The joint cumulative distribution function of the r.v.s X and Y is F_{X,Y}(x, y) =
P(X ≤ x, Y ≤ y). If their joint density is f_{X,Y}(x, y), then
F_{X,Y}(x, y) = Σ_{u≤x} Σ_{v≤y} f_{X,Y}(u, v)
for the discrete case, and
F_{X,Y}(x, y) = ∫_{−∞}^x ∫_{−∞}^y f_{X,Y}(u, v) dv du
for the continuous case.
Theorem Let X and Y be continuous r.v.s with joint CDF F_{X,Y}(x, y). Their joint PDF is
f_{X,Y}(x, y) = ∂²F_{X,Y}(x, y)/∂x∂y provided F_{X,Y} has continuous second partial derivatives.
Example 24 Both the r.v.s X and Y assume values in [0, 1]. If their joint CDF is F_{X,Y}(x, y) = (1/3)x²(2y + y²)
in [0, 1] × [0, 1], find their PDF.
Remark The definitions and theorems in this section extend in a very straightforward way to situations
involving more than two variables.
2.6 Further Properties of Expectations
Definition Suppose X and Y are r.v.s with joint distribution f_{X,Y}(x, y). Then the expected value of the
random variable g(X, Y ), a function of X and Y , is given by
E[g(X, Y )] = Σ_x Σ_y g(x, y) f_{X,Y}(x, y)
for the discrete case, provided Σ_x Σ_y |g(x, y)| f_{X,Y}(x, y) < ∞, and
E[g(X, Y )] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f_{X,Y}(x, y) dx dy
for the continuous case, provided ∫_{−∞}^{∞} ∫_{−∞}^{∞} |g(x, y)| f_{X,Y}(x, y) dx dy < ∞.
Theorem If X and Y both have finite mean, then E(aX + bY ) = aE(X) + bE(Y ).
Expectation is a linear operator: E[ag(X, Y ) + bh(X, Y )] = aE[g(X, Y )] + bE[h(X, Y )].
Example 25 An electrical circuit has three resistors wired in parallel. Their actual resistances, X, Y
and Z, vary between 10 and 20 according to their joint PDF f_{X,Y,Z}(x, y, z) = (1/67500)(xy + xz + yz),
10 ≤ x, y, z ≤ 20. What is the expected resistance for the circuit? Fact: if R is the circuit's resistance, then
1/R = 1/X + 1/Y + 1/Z.
Example 26 A disgruntled assistant is upset about having to stuff envelopes. Handed a box of n letters
and n envelopes, he vents his frustration by putting the letters into the envelopes at random. How many
people, on average, will receive their correct mail?
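By linearity of expectation, each letter lands in its own envelope with probability 1/n, so the expected number of matches is n · (1/n) = 1 for every n. A brute-force illustrative check over all n! stuffings for small n:

```python
from itertools import permutations
from fractions import Fraction as F

def expected_matches(n):
    """Average number of fixed points over all n! equally likely stuffings."""
    perms = list(permutations(range(n)))
    total = sum(sum(1 for i, letter in enumerate(p) if letter == i)
                for p in perms)
    return F(total, len(perms))

assert all(expected_matches(n) == 1 for n in (1, 2, 3, 4))
print(expected_matches(4))  # 1
```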
Example 27 Ten fair dice are rolled. Calculate the expected value of the sum of the faces showing.
Definition The covariance of the r.v.s X and Y , denoted Cov(X, Y ) or σ_{XY}, is E[(X − μ_X)(Y − μ_Y)].
Their correlation is ρ(X, Y ) = Cov(X, Y )/√(Var(X) Var(Y )) = σ_{XY}/(σ_X σ_Y).
The covariance is a measurement of the association between two random variables. For example, if
large (small) values of X often result in large (small) values of Y , positive (negative) X − μ_X will often
result in positive (negative) Y − μ_Y; (X − μ_X)(Y − μ_Y) will tend to be positive. Also, we will use
σ_{XY} for the covariance, not for the standard deviation of the product XY .
Theorem Cov(X, Y ) = σ_{XY} = E(XY ) − E(X)E(Y ) = E(XY ) − μ_X μ_Y.
Example 28 Find the covariance and correlation of X and Y if they have the following joint distribution:
             X
             2      4
        1   0.10   0.20
   Y    3   0.20   0.25
        5   0.10   0.15
Theorem We have the following properties of variance and covariance.
1. Cov(X, Y ) = Cov(Y, X), i.e., σ_{XY} = σ_{YX}
2. Cov(X, X) = Var(X), i.e., σ_{XX} = σ²_X
3. Cov(aX + b, cY + d) = ac Cov(X, Y )
4. Var(aX + bY ) = a² Var(X) + b² Var(Y ) + 2ab Cov(X, Y ), i.e., σ²_{aX+bY} = a²σ²_X + b²σ²_Y + 2ab σ_{XY}
5. Var(a₁X₁ + a₂X₂ + · · · + aₙXₙ) = Σ_{i=1}^n a²ᵢ Var(Xᵢ) + 2 Σ_{j<k} aⱼaₖ Cov(Xⱼ, Xₖ)
Theorem (Cauchy-Schwarz Inequality) |E(XY )|² ≤ E(X²)E(Y ²)
Corollary |ρ(X, Y )| ≤ 1
Theorem (Markov's Inequality) Let X be a nonnegative random variable with finite mean. Then for any
t > 0,
P(X ≥ t) ≤ E(X)/t.
Corollary Let X be a nonnegative random variable with finite pth moment. Then for any t > 0,
P(X ≥ t) ≤ E(Xᵖ)/tᵖ.
Remark We expect that as t → ∞, P(|X| ≥ t) → 0. From the above result, we get more information
about how fast P(|X| ≥ t) decays to 0 based on the moments that we know are finite for X.
Theorem (Chebyshev's Inequality) Let X have finite second moment. Then for any a > 0,
P[|X − E(X)| ≥ a] ≤ Var(X)/a².
Chebyshev's Inequality sometimes takes the following form: If X has finite second moment, then for any k > 0,
P(μ − kσ < X < μ + kσ) ≥ 1 − 1/k².
The probability that any random variable will assume a value within k standard deviations of the mean is at least 1 − 1/k².
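As a concrete illustration of the inequality (a sketch, not part of the notes), take X to be a fair die roll with μ = 3.5 and σ² = 35/12, and compare the exact tail probability with the Chebyshev bound.

```python
pmf = {x: 1 / 6 for x in range(1, 7)}               # fair die
mu = sum(x * p for x, p in pmf.items())
var = sum((x - mu) ** 2 * p for x, p in pmf.items())

a = 2.0
exact = sum(p for x, p in pmf.items() if abs(x - mu) >= a)  # P(|X - mu| >= 2)
bound = var / a ** 2                                        # Chebyshev bound
```

Here the bound (35/48) is far from tight: the exact tail probability is 1/3, illustrating that Chebyshev trades sharpness for generality.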
Example 29 A r.v. X has mean μ = 10 and variance σ² = 4. Use Chebyshev's Inequality to estimate P(5 < X < 15). For what values of c does Chebyshev's Inequality guarantee that P(|X − 10| ≥ c) ≤ 0.04?
2.7 Independent Random Variables
Definition The two random variables X and Y are said to be independent if for every interval A and every interval B, P(X ∈ A and Y ∈ B) = P(X ∈ A)P(Y ∈ B).
Theorem The random variables X and Y are independent if and only if there are functions g(x) and h(y) such that f_{X,Y}(x, y) = g(x)h(y). If this equation holds, there is a constant k such that f_X(x) = kg(x) and f_Y(y) = (1/k)h(y).
Example 30 The joint density of X and Y is given. Are X and Y independent?
1. f_{X,Y}(x, y) = 12xy(1 − y), 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
2. f_{X,Y}(x, y) = 8xy, 0 ≤ y ≤ x ≤ 1
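The factorization criterion can also be explored numerically: approximate the marginals by Riemann sums and test whether f_X(x)f_Y(y) reproduces f_{X,Y}(x, y) at a point. This is only an illustrative sketch (the analytic route is much shorter), and the test points x₀, y₀ are arbitrary choices.

```python
def f1(x, y):            # density 1: support is the unit square
    return 12 * x * y * (1 - y)

def f2(x, y):            # density 2: support 0 <= y <= x <= 1
    return 8 * x * y if y <= x else 0.0

n = 2000
h = 1.0 / n
mids = [(i + 0.5) * h for i in range(n)]   # midpoint grid on (0, 1)

def marginal_x(f, x):    # Riemann-sum approximation of f_X(x)
    return sum(f(x, y) for y in mids) * h

def marginal_y(f, y):    # Riemann-sum approximation of f_Y(y)
    return sum(f(x, y) for x in mids) * h

x0, y0 = 0.5, 0.25
prod1 = marginal_x(f1, x0) * marginal_y(f1, y0)  # matches f1(x0, y0): independent
prod2 = marginal_x(f2, x0) * marginal_y(f2, y0)  # differs from f2(x0, y0): dependent
```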
Example 31 If two random variables X and Y are defined over a region in the xy-plane that is not a rectangle (possibly infinite) with sides parallel to the coordinate axes, can X and Y be independent?
Theorem X and Y are independent iff F_{X,Y}(x, y) = F_X(x)F_Y(y).
Remark If X and Y are independent, and A and B are formed from intervals under the operations of countable intersection, countable union, and complement, then P(X ∈ A and Y ∈ B) = P(X ∈ A)P(Y ∈ B).
Example 32 Show that if X and Y are independent, then so are e^X and Y².
Remark If X and Y are independent, and g and h are (Borel) functions (e.g. continuous functions), then
g(X) and h(Y ) are independent.
Theorem If X and Y are independent, then
1. E(XY) = E(X)E(Y)
2. Cov(X, Y) = ρ(X, Y) = 0
3. Var(aX + bY) = a² Var(X) + b² Var(Y)
Remark However, if Cov(X, Y) = 0, it doesn't mean X and Y are independent. Suppose X takes the values −1, 0, 1 with equal probability 1/3, and let Y = X². Clearly, X and Y are dependent (not independent), but Cov(X, Y) = 0.
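The counterexample in the remark is easy to verify by direct enumeration:

```python
# X uniform on {-1, 0, 1}; Y = X^2. Dependent, yet Cov(X, Y) = 0.
pmf_x = {-1: 1 / 3, 0: 1 / 3, 1: 1 / 3}

EX  = sum(x * p for x, p in pmf_x.items())             # E(X) = 0
EY  = sum(x * x * p for x, p in pmf_x.items())         # E(Y) = E(X^2) = 2/3
EXY = sum(x * x * x * p for x, p in pmf_x.items())     # E(XY) = E(X^3) = 0
cov = EXY - EX * EY

# Independence would force P(X = 1, Y = 0) = P(X = 1) P(Y = 0), but the
# left side is 0 (since {Y = 0} = {X = 0}) while the right side is 1/9.
p_joint = 0.0
p_prod  = pmf_x[1] * pmf_x[0]
```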
Definition The random variables X_1, X_2, . . . , X_n are independent if for any collection A_1, A_2, . . . , A_n of subsets of R formed by taking countable unions, countable intersections, and complements of intervals,
P(X_1 ∈ A_1, X_2 ∈ A_2, . . . , X_n ∈ A_n) = P(X_1 ∈ A_1)P(X_2 ∈ A_2) ··· P(X_n ∈ A_n).
Theorem If X_1, X_2, . . . , X_n are independent, then
f_{X_1,X_2,...,X_n}(x_1, x_2, . . . , x_n) = f_{X_1}(x_1) f_{X_2}(x_2) ··· f_{X_n}(x_n),
F_{X_1,X_2,...,X_n}(x_1, x_2, . . . , x_n) = F_{X_1}(x_1) F_{X_2}(x_2) ··· F_{X_n}(x_n),
and E[X_1 X_2 ··· X_n] = E[X_1] E[X_2] ··· E[X_n].
Example 33 Consider k urns, each holding n chips, numbered 1 through n. A chip is to be drawn at
random from each urn. What is the probability that all k chips will bear the same number?
Example 34 Suppose that X_1, X_2, X_3, X_4 are independent random variables, each with pdf f_{X_i}(x_i) = 4x_i³, 0 ≤ x_i ≤ 1. Find
1. P(X_1 < 1/2)
2. P(exactly one X_i < 1/2)
3. f_{X_1,X_2,X_3,X_4}(x_1, x_2, x_3, x_4)
4. F_{X_2,X_3}(x_2, x_3).
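A numeric sketch for the first two parts (assuming the closed forms below, which follow from the stated pdf): each X_i has CDF F(x) = x⁴ on [0, 1], so P(X_1 < 1/2) = (1/2)⁴, and part 2 is a binomial count over the four variables.

```python
from math import comb

# CDF of each X_i: F(x) = x^4 on [0, 1], obtained from f(x) = 4x^3.
F = lambda x: x ** 4

q = F(0.5)                                # P(X_i < 1/2) = 1/16
p_part1 = q
p_part2 = comb(4, 1) * q * (1 - q) ** 3   # exactly one of the four is < 1/2
```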
Example 35 Let X and Y be independent random variables, each geometrically distributed with parameter p, i.e., f_X(x) = p(1 − p)^x, x = 0, 1, 2, . . ..
1. Find the distribution of min(X, Y) and of max(X, Y).
2. Find P(min(X, Y) = X) = P(Y ≥ X).
3. Find the distribution of X + Y.
4. Find P(Y = y | X + Y = z) for y = 0, 1, . . . , z.
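Part 1 can be sanity-checked numerically: truncate the joint PMF and tabulate min(X, Y); the result should match a geometric distribution with parameter 1 − (1 − p)², which is what the analysis yields. The values p = 0.3 and the truncation bound N below are arbitrary choices for illustration.

```python
p, N = 0.3, 400                      # illustrative parameter and truncation bound
geom = lambda k: p * (1 - p) ** k    # geometric PMF, support 0, 1, 2, ...

# Tabulate P(min(X, Y) = m) by brute force over the truncated joint PMF.
pmf_min = {}
for x in range(N):
    for y in range(N):
        m = min(x, y)
        pmf_min[m] = pmf_min.get(m, 0.0) + geom(x) * geom(y)

# min(X, Y) should be geometric with parameter q = 1 - (1 - p)^2.
q = 1 - (1 - p) ** 2
```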
Example 36 A random sample of 8 independent and identically distributed random variables X_1, X_2, . . . , X_8 is obtained. Each has density function f(x) = x³/4, 0 < x < 2. Let Y = min(X_1, X_2, . . . , X_8) and Z = max(X_1, X_2, . . . , X_8). Find the densities of Y and of Z.
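The standard route is F_Z(z) = [F(z)]⁸ and F_Y(y) = 1 − [1 − F(y)]⁸ with F(x) = x⁴/16, then differentiate. A seeded simulation consistent with that claim (inverse-transform sampling, using F⁻¹(u) = 2u^{1/4}; the evaluation point z₀ is an arbitrary choice):

```python
import random

random.seed(1)
F = lambda x: x ** 4 / 16        # CDF from f(x) = x^3 / 4 on (0, 2)
Finv = lambda u: 2 * u ** 0.25   # inverse CDF, used for sampling

trials = 200_000
z0 = 1.8
count_z = 0
for _ in range(trials):
    sample = [Finv(random.random()) for _ in range(8)]
    if max(sample) <= z0:
        count_z += 1

empirical = count_z / trials
theoretical = F(z0) ** 8         # P(max <= z0) = [F(z0)]^8
```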
2.8 Problems
1. Let F be the CDF of a r.v. X (make no assumptions that it is discrete or continuous). Prove that lim_{x→∞} F(x) = 1 and lim_{x→−∞} F(x) = 0.
Hint: To prove the first statement, note that it suffices to show that if {y_i} is any increasing sequence of numbers with y_n → ∞, then lim_{n→∞} F(y_n) = 1. Now, define the sequence of events {A_i} by A_1 = {X ≤ y_1} and A_n = {y_{n−1} < X ≤ y_n} for n = 2, 3, . . .. What is ⋃_{n=1}^∞ A_n? Express F(y_n) as the probability of the union of some of the events A_i.
2. Let p ∈ (0, 1), and let X be a continuous r.v. If x_p is a number such that F(x_p) = P[X ≤ x_p] = p, then x_p is referred to as the pth quantile of X. If X ~ Exp(λ), find x_{0.5} (the median), x_{0.25} (the lower quartile) and x_{0.75} (the upper quartile).
3. The joint density of X and Y is f_{X,Y}(x, y) = 10xy² for 0 < x < y < 1.
(a) Find the marginal densities of X and of Y. Use these to find the cumulative distributions of X and of Y.
(b) Use f_{X,Y} to derive the joint cumulative distribution F_{X,Y}(x, y) = (5/3)x²y³ − (2/3)x⁵ for points (x, y) in the support.
(c) Find lim_{y→∞} F_{X,Y}(x, y) = F_{X,Y}(x, 1) and lim_{x→∞} F_{X,Y}(x, y) = F_{X,Y}(y, y). Compare these answers with those from (a).
(d) For general r.v.s X and Y with a joint distribution, explain why lim_{y→∞} F_{X,Y}(x, y) = F_X(x) and lim_{x→∞} F_{X,Y}(x, y) = F_Y(y).
4. Let the r.v. X take the values 2^k and −2^k for k = 2, 3, . . . with PMF p defined by p(2^k) = p(−2^k) = 2^{−k}. Show that p is a valid PMF. Note that p is symmetric around 0; is E(X) = 0?
5. Let N be a positive integer. Suppose the r.v. X has PMF p(x) = 2x/[N(N + 1)] for x = 1, 2, . . . , N. Show that p is a valid PMF, and find the mean of X.
6. The speed of a molecule in a uniform gas at equilibrium is a random variable V whose probability distribution is f(v) = kv² e^{−bv²}, v > 0, where k is an appropriate constant and b depends on the absolute temperature and mass of the molecule. Find the probability distribution of the kinetic energy of the molecule, W, where W = mV²/2.
7. If X ~ Poi(λ), find E[(1 + X)^{−1}].
8. The gamma function is defined as the function Γ(k) = ∫_0^∞ t^{k−1} e^{−t} dt for k > 0. It has the property Γ(z + 1) = zΓ(z) for any z > 0. Let k > 0 and θ > 0. If a random variable X has density
f(x) = c x^{k−1} e^{−x/θ}, x > 0
for some constant c, we say that X has Gamma distribution with parameters k and θ.
(a) Show that c = 1/[Γ(k)θ^k]. Find E(X) and Var(X) in terms of k and θ.
(b) The skewness and kurtosis of a r.v. X are defined respectively as E[((X − μ)/σ)³] and E[((X − μ)/σ)⁴]. Find the kurtosis and skewness of X above.
(c) If a unique x* exists such that f(x) is maximized at x = x*, then we say x* is the mode of X. Show that X above has a mode only if k > 1. In this case, what is the mode?
9. Show that Var(X) = min_{a∈R} E[(X − a)²] by letting h(a) = E[(X − a)²] and finding the minimum value of the quadratic function h.
10. Suppose an experiment having r possible outcomes 1, 2, . . . , r that occur with probabilities p_1, . . . , p_r is repeated n times. Let X be the number of times the first outcome occurs, and Y the number of times the second outcome occurs. Show that
ρ(X, Y) = −√[ p_1 p_2 / ((1 − p_1)(1 − p_2)) ].
Hint: Let I_i = 1 if the ith trial yields outcome 1, and I_i = 0 otherwise. Similarly, let J_i = 1 if the ith trial yields outcome 2, and J_i = 0 otherwise. Then X = I_1 + ··· + I_n and Y = J_1 + ··· + J_n. Show that E(I_i J_i) = 0 and, if i ≠ j, E(I_i J_j) = p_1 p_2. Note too that I_1, . . . , I_n are independent, and so are J_1, . . . , J_n.
11. Let X_n take on the values 1, 2, . . . , n, each with probability 1/n. Define Y_n to be X_n². Find ρ(X_n, Y_n) and lim_{n→∞} ρ(X_n, Y_n).
Hint: Σ_{i=1}^n i = n(n+1)/2, Σ_{i=1}^n i² = n(n+1)(2n+1)/6, Σ_{i=1}^n i³ = [n(n+1)/2]², Σ_{i=1}^n i⁴ = n(n+1)(2n+1)(3n²+3n−1)/30.
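After solving Problem 11 analytically, the power sums in the hint give a closed form that can be evaluated for any n; numerically the correlation appears to approach √15/4 ≈ 0.968, a value the reader should confirm from the algebra. An illustrative sketch:

```python
from math import sqrt

def rho(n):
    """Correlation of X_n (uniform on 1..n) and Y_n = X_n^2, via the hint's sums."""
    s1 = n * (n + 1) // 2
    s2 = n * (n + 1) * (2 * n + 1) // 6
    s3 = s1 ** 2
    s4 = n * (n + 1) * (2 * n + 1) * (3 * n ** 2 + 3 * n - 1) // 30
    ex, ey, exy = s1 / n, s2 / n, s3 / n   # E(X), E(X^2) = E(Y), E(X^3) = E(XY)
    ey2 = s4 / n                           # E(X^4) = E(Y^2)
    cov = exy - ex * ey
    return cov / sqrt((ey - ex ** 2) * (ey2 - ey ** 2))
```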
12. Let X have uniform distribution on (0, 1), which means f_X(x) = 1, 0 < x < 1. Find the density of Y = −(1/λ) ln(1 − X) for λ > 0.
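Problem 12 is the inverse-transform method in disguise: the answer should turn out to be the Exp(λ) density (rate parametrization, mean 1/λ). A seeded simulation consistent with that, where λ = 2 is an arbitrary choice for illustration:

```python
import math
import random

random.seed(42)
lam = 2.0            # arbitrary rate, for illustration only
n = 100_000

# Transform X ~ Uniform(0, 1) into Y = -(1/lam) ln(1 - X).
ys = [-math.log(1 - random.random()) / lam for _ in range(n)]

sample_mean = sum(ys) / n   # should be near E(Y) = 1/lam if Y ~ Exp(lam)
```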
13. Let X be a r.v. with mass function
p(x) = 1/18 for x = 1, 3 and p(x) = 16/18 for x = 2.
Find a value of δ such that P(|X − μ| ≥ δ) = Var(X)/δ². This shows that in general, the bound given by Chebyshev's inequality cannot be improved.
14. Show that if X and Y have finite variance, then Cov(X, Y) = (1/4)[Var(X + Y) − Var(X − Y)] and Var(X + Y) + Var(X − Y) = 2[Var(X) + Var(Y)].
15. Show that if the covariances are finite, Cov(X + W, Y + Z) = Cov(X, Y) + Cov(X, Z) + Cov(W, Y) + Cov(W, Z).
16. Let X and Y have finite positive variance, and m, n, p, q real numbers with n, q nonzero. Show that
ρ(m + nX, p + qY) = ρ(X, Y) if nq > 0
and
ρ(m + nX, p + qY) = −ρ(X, Y) if nq < 0.
17. Let X and Y be independent random variables, each having geometric distribution with parameter p. Set Z = Y − X and M = min(X, Y).
(a) Show that for integers z and m ≥ 0,
p_{M,Z}(m, z) = P(X = m − z)P(Y = m) if z < 0, and p_{M,Z}(m, z) = P(X = m)P(Y = m + z) if z ≥ 0.
(b) Conclude from (a) that for integers z and m ≥ 0, p_{M,Z}(m, z) = p²(1 − p)^{2m}(1 − p)^{|z|}.
(c) Find the marginal distributions of M and Z. Are M and Z independent?
3. Special Distributions
3.x Conditional Distributions
Definition The conditional PMF of a discrete r.v. X, conditioned on an event A with P(A) > 0, is defined by
p_{X|A}(x) = P(X = x | A) = P({X = x} ∩ A)/P(A).
Example 1 Let X be the roll of a die and let A = {the roll is an even number}. Find p_{X|A}(x).
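A direct computation of the definition for this example (an illustrative sketch, assuming a fair die):

```python
pmf = {x: 1 / 6 for x in range(1, 7)}    # fair die
A = {2, 4, 6}                            # "the roll is an even number"
pA = sum(pmf[x] for x in A)

# Conditional PMF: p_{X|A}(x) = P({X = x} and A) / P(A).
cond = {x: (pmf[x] / pA if x in A else 0.0) for x in pmf}
```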
If X and Y are discrete r.v.s in the same probability space, the above definition can be specialized to events A of the form {Y = y}.
Definition Let X and Y be discrete r.v.s. The conditional PMF of X, given Y = y, is
p_{X|Y}(x|y) = p_X(x|y) = p_{X|y}(x) = P(X = x | Y = y) = p_{X,Y}(x, y)/p_Y(y)
for p_Y(y) ≠ 0.
Example 2 A fair coin is tossed five times. Let Y denote the total number of heads that occur, and X the number of heads occurring on the last two tosses. Find the conditional PMF p_{Y|x}(y) for each x.
The conditional PMF can be used to calculate the marginal PMFs:
p_X(x) = Σ_y p_{X,Y}(x, y) = Σ_y p_Y(y) p_{X|Y}(x|y).
Example 3 A transmitter sends messages over a computer network. X is the travel time of a given message and Y is the length of the given message. Suppose
p_Y(y) = 5/6 if y = 10², and p_Y(y) = 1/6 if y = 10⁴.
The travel time depends on the message length. In particular,
p_{X|Y}(x|y) = 1/2 if x = 10⁻⁴y, 1/3 if x = 10⁻³y, 1/6 if x = 10⁻²y.
Find the distribution of X. Use this to find E(X).
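A numeric sketch of this example, transcribing the two PMFs stated above and applying the marginalization formula:

```python
p_y = {10 ** 2: 5 / 6, 10 ** 4: 1 / 6}   # message length PMF

def p_x_given_y(y):
    # Travel time is 1e-4 y, 1e-3 y or 1e-2 y with probs 1/2, 1/3, 1/6.
    return {y / 10 ** 4: 1 / 2, y / 10 ** 3: 1 / 3, y / 10 ** 2: 1 / 6}

# Marginal: p_X(x) = sum over y of p_Y(y) * p_{X|Y}(x|y).
p_x = {}
for y, py in p_y.items():
    for x, pxy in p_x_given_y(y).items():
        p_x[x] = p_x.get(x, 0.0) + py * pxy

EX = sum(x * p for x, p in p_x.items())
```

Note that x = 1 can arise from either message length, so its marginal probability accumulates two contributions.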
Conditional expectations are defined as expected. For example,
E[g(X)|A] = Σ_x g(x) p_{X|A}(x) and E[X|Y = y] = Σ_x x p_{X|Y}(x|y).
Example 3 (continued) For the previous example, find E[X|Y = 10²] and E[X|Y = 10⁴].
Theorem (Total Expectation Theorem) E(X) = Σ_y p_Y(y) E[X|Y = y]. In general, if A_1, A_2, . . . , A_n form a partition of Ω with P(A_i) > 0 for each i, then E(X) = Σ_{i=1}^n P(A_i) E[X|A_i].
This means the unconditional average can be obtained by averaging the conditional averages. It is also instructive to compare this with P(X = x) = Σ_y p_Y(y) P(X = x | Y = y).
Example 3 (continued) Use the conditional expectations of X given Y to find E(X).