
Chapter 5

Week 5: Expectation for random variables

5.1 Discrete random variables


We introduced the median above as a characteristic of the "center" of a distribution. There is an alternative and more popular parameter of a distribution that describes its center.
Definition: Let X be a discrete random variable with frequency function p(x) such that

∑_x |x| p(x) < +∞.

The expectation of X (or the mean value of X, or the expected value of X) is

E(X) = ∑_x x p(x).

Note that the condition ∑_x |x| p(x) < +∞ is always satisfied if the random variable takes values in a finite set.

Notation: It is acceptable to write EX instead of E(X).

Example
Let X be a Bernoulli random variable such that

X = 1 with probability p, and X = 0 with probability 1 − p.

We have that EX = 1 · p + 0 · (1 − p) = p.
Note that X takes only the values 0 and 1, and it never takes the value p = EX if p ∈ (0, 1). This is why the term "the expected value" should not be taken literally.
Note that the median of the Bernoulli distribution is not a very useful parameter. If p = 1/2, then

P(X ≤ a) = P(X = 0) = P(X = 1) = P(X ≥ a) = 1/2


for any a ∈ (0, 1), i.e., any point a ∈ (0, 1) is a median.
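As a quick numerical check of the Bernoulli computation above, here is a minimal sketch (the helper name `expectation` and the value p = 0.3 are our illustrative choices):

```python
def expectation(pmf):
    # E(X) = sum of x * p(x) over the support of a discrete random variable,
    # where pmf maps each value x to its probability p(x).
    return sum(x * px for x, px in pmf.items())

p = 0.3  # an arbitrary illustrative success probability
bernoulli = {1: p, 0: 1 - p}
print(expectation(bernoulli))  # equals p, a value X itself never takes
```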


Example
If we toss a single die once and X is the outcome, then

x                1    2    3    4    5    6
p(x) = P(X = x)  1/6  1/6  1/6  1/6  1/6  1/6

E(X) = 1 × 1/6 + 2 × 1/6 + 3 × 1/6 + 4 × 1/6 + 5 × 1/6 + 6 × 1/6 = 3.5
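The die calculation can be reproduced exactly with rational arithmetic; a small sketch:

```python
from fractions import Fraction

# Fair die: each outcome 1..6 has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}
mean = sum(x * px for x, px in die.items())
print(mean, float(mean))  # 7/2, i.e. 3.5
```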

Example
A pair of dice are tossed once and X is the sum of the outcomes:

x                2     3     4     5     6     7     8     9     10    11    12
p(x) = P(X = x)  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

EX = ∑_{x=2}^{12} x p(x) = 2 × 1/36 + 3 × 2/36 + . . . + 12 × 1/36 = 7
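Rather than typing the table, the pmf of the sum of two dice can be built by enumerating all 36 equally likely outcomes; a sketch:

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 ordered outcomes of two fair dice and tally the sums.
pmf = {}
for a, b in product(range(1, 7), repeat=2):
    pmf[a + b] = pmf.get(a + b, Fraction(0)) + Fraction(1, 36)

mean = sum(x * px for x, px in pmf.items())
print(pmf[7], mean)  # 1/6 and 7
```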

Example
A gambler bets on a coin-tossing game: he pays $100 if the coin lands heads and receives $100 if it lands tails. What is his expected gain?
Solution: let X be the gain. It is a random variable with frequency function p_X(x) given by

x        −100  100
p_X(x)   1/2   1/2

It gives EX = (−100)(1/2) + 100(1/2) = 0. This is a fair game!

Example
A roulette wheel has the numbers 1, 2, ..., 36, as well as 0 and 00 (38 numbers in total). If you bet $1 that an odd non-zero number comes up, you win or lose $1 according to whether or not that event occurs. Let X be the gain. We have

P(X = −1) = 20/38,  P(X = 1) = 18/38.

Hence

EX = −1 · 20/38 + 1 · 18/38 = −1/19.

Your expected loss is about $0.05. This game gives an advantage to the casino; it covers the casino's business expenses.

Example
Consider the roulette wheel again. If you bet $1 that the number 1 comes up, you win $35 or lose $1 according to whether or not that event occurs. Let X be the gain. We have

P(X = −1) = 37/38,  P(X = 35) = 1/38.

Hence

EX = −1 · 37/38 + 35 · 1/38 = −1/19.

Your expected loss is again about $0.05. Note that the expected value is the same as in the previous example.
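Both roulette bets can be checked with exact fractions; a sketch:

```python
from fractions import Fraction

# Bet $1 on "odd non-zero number": 18 winning numbers out of 38.
ev_odd = (-1) * Fraction(20, 38) + 1 * Fraction(18, 38)

# Bet $1 on the single number 1: one winning number out of 38, paying $35.
ev_single = (-1) * Fraction(37, 38) + 35 * Fraction(1, 38)

print(ev_odd, ev_single)  # both -1/19, about -$0.05 per $1 bet
```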

5.2 Expectation for continuous random variables

Definition: Let X be a continuous random variable with density f(x) such that

∫_{−∞}^{∞} |x| f(x) dx < +∞.

The expectation of X (or the mean value of X, or the expected value of X) is

EX = E(X) = ∫_{−∞}^{∞} x f(x) dx.

Example
Consider the uniform density over (0, a):

f(x) = 1/a for 0 ≤ x ≤ a, and f(x) = 0 elsewhere.

Then

E(X) = ∫_0^a x · (1/a) dx = (1/a) · x²/2 |_0^a = a/2.

Example
Consider the exponential distribution

f(x) = λe^{−λx} for x ≥ 0, and f(x) = 0 elsewhere.

Then

EX = E(X) = ∫_0^{+∞} x λe^{−λx} dx = 1/λ.
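The formula EX = 1/λ can also be checked by Monte Carlo simulation; a sketch using the standard library (the rate λ = 2 and the sample size are arbitrary illustrative choices):

```python
import random

random.seed(0)  # fixed seed so the run is reproducible
lam = 2.0       # illustrative rate; the true mean is 1/lam = 0.5
n = 100_000
# Average of n independent Exponential(lam) samples.
mc_mean = sum(random.expovariate(lam) for _ in range(n)) / n
print(mc_mean)  # close to 0.5
```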

5.3 Case of distributions with heavy tails (fat tails)

It may happen that

∫_{−∞}^{∞} |x| f(x) dx = +∞

(i.e., the integral diverges). This happens if f(x) does not decay fast enough as |x| → +∞ (the case of a heavy-tailed (fat-tailed) distribution). Technically, the integral ∫_{−∞}^{∞} x f(x) dx is not defined in this case.
It is common to accept the following rules:


(A) If

∫_{−∞}^0 x f(x) dx > −∞  and  ∫_0^{∞} x f(x) dx = +∞,

then EX = +∞ (and, therefore, it is defined).

(B) If

∫_{−∞}^0 x f(x) dx = −∞  and  ∫_0^{∞} x f(x) dx < +∞,

then EX = −∞ (and, therefore, it is defined).

The only case where EX is not defined in any sense is when

∫_{−∞}^0 x f(x) dx = −∞  and  ∫_0^{∞} x f(x) dx = +∞.

Similar rules are often accepted for the case of discrete random
variables.

Example
Cauchy distribution: Consider the random variable X = X₁/X₂, where X₁ and X₂ are independent N(0, 1) variables. We found in Week 4 that X has density

f(x) = 1 / (π(1 + x²))

(the Cauchy density). We have that

∫_{−∞}^0 x / (π(1 + x²)) dx = −∞  and  ∫_0^{∞} x / (π(1 + x²)) dx = +∞.

This means that EX is not defined.
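For the Cauchy density the tail integral can be evaluated in closed form: ∫_0^T x/(π(1 + x²)) dx = ln(1 + T²)/(2π), which grows without bound as T → ∞. A small sketch makes the divergence visible:

```python
import math

def truncated_integral(T):
    # Closed form of the truncated tail integral of the Cauchy density:
    # ∫_0^T x / (π (1 + x²)) dx = ln(1 + T²) / (2π).
    return math.log(1 + T * T) / (2 * math.pi)

for T in (10, 1_000, 1_000_000):
    print(T, truncated_integral(T))  # keeps growing: no finite limit
```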



5.4 Expectations of functions of random variables

Discrete random variables
Theorem Let g(x) be a function, g : R → R. Let X be a discrete random variable with frequency function p(x) such that

∑_x |g(x)| p(x) < +∞.

Let Y = g(X). Then

EY = ∑_x g(x) p(x).

Proof. Assume that X takes values x_k, k = 1, 2, ..., and P(X = x_k) = p_X(x_k). It follows that Y takes values y_k = g(x_k), k = 1, 2, .... Assume first that, for any y_k, there is a unique x_k such that y_k = g(x_k). Then P(Y = y_k) = p_X(x_k), i.e., the frequency function for Y is p_Y(y_k) = p_X(x_k). We have

EY = ∑_k y_k p_Y(y_k) = ∑_k g(x_k) p_X(x_k) = ∑_x g(x) p(x).

The case where there is more than one x such that y_k = g(x) requires additional analysis. We have

P(Y = y_k) = ∑_{j: g(x_j) = y_k} p_X(x_j).

Hence

EY = ∑_k y_k p_Y(y_k) = ∑_k y_k ∑_{j: g(x_j) = y_k} p_X(x_j)
   = ∑_k ∑_{j: g(x_j) = y_k} g(x_j) p_X(x_j) = ∑_x g(x) p(x).

Example
Let X be a Bernoulli random variable such that

X = 1 with probability p, and X = 0 with probability 1 − p.

We have that E(X²) = 1² · p + 0² · (1 − p) = p.

Notation: It is common to write EX² instead of E(X²). Therefore, EX² means E(X²) rather than (EX)².
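The theorem turns E[g(X)] into a sum over the original pmf, with no need to derive the distribution of Y = g(X); a minimal sketch (the helper name `expect_g` and p = 0.3 are our illustrative choices):

```python
def expect_g(pmf, g):
    # E[g(X)] = sum of g(x) * p(x) over the support of X.
    return sum(g(x) * px for x, px in pmf.items())

p = 0.3  # illustrative
bern = {1: p, 0: 1 - p}
print(expect_g(bern, lambda x: x**2))  # E(X^2) = p
print(expect_g(bern, lambda x: x)**2)  # (EX)^2 = p^2, a different number
```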

Continuous random variables


Let g(x) be a function, g : R → R. Let X be a continuous random variable with density f(x) such that

∫_{−∞}^{∞} |g(x)| f(x) dx < +∞.

Then the expectation of Y = g(X) is

EY = ∫_{−∞}^{∞} g(x) f(x) dx.

Example
Consider the uniform distribution over (0, a):

f(x) = 1/a for 0 ≤ x ≤ a, and f(x) = 0 elsewhere.

Then

E(X²) = ∫_0^a x² · (1/a) dx = (1/a) · x³/3 |_0^a = a²/3.
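The integral E(X²) = a²/3 can be verified numerically with a midpoint Riemann sum (a = 3 is an arbitrary illustrative endpoint):

```python
# Midpoint-rule approximation of E(X^2) = ∫_0^a x^2 * (1/a) dx
# for the uniform density on (0, a).
a = 3.0
n = 100_000  # number of subintervals
h = a / n
approx = sum(((i + 0.5) * h) ** 2 * (1 / a) * h for i in range(n))
print(approx, a * a / 3)  # both about 3.0
```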

Example
Consider the exponential distribution

f(x) = λe^{−λx} for x ≥ 0, and f(x) = 0 elsewhere.

Then

EX² = E(X²) = ∫_0^{+∞} x² λe^{−λx} dx = 2/λ².

5.5 Joint random variables


Theorem Let X and Y be jointly distributed with joint density or joint frequency function f(x, y). Then

Eg(X, Y) = ∑_x ∑_y g(x, y) f(x, y)  if X, Y are discrete;
Eg(X, Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(x, y) dx dy  if X, Y are continuous.

Example
Consider a discrete random vector (X, Y) with frequency function

p(x, y) = (x + y)/30 for x = 0, 1, 2 and y = 0, 1, 2, 3, and p(x, y) = 0 elsewhere.

Then

E(XY) = ∑_{x=0}^{2} ∑_{y=0}^{3} xy · (x + y)/30.

It gives

E(XY) = (1/30) ∑_{x=0}^{2} {x × 0 × (x + 0) + x × 1 × (x + 1) + x × 2 × (x + 2) + x × 3 × (x + 3)}
      = (1/30) ∑_{x=0}^{2} {x(x + 1) + 2x(x + 2) + 3x(x + 3)}
      = (1/30) {0(0 + 1) + 2 × 0 × (0 + 2) + 3 × 0 × (0 + 3)
              + 1(1 + 1) + 2 × 1 × (1 + 2) + 3 × 1 × (1 + 3)
              + 2(2 + 1) + 2 × 2 × (2 + 2) + 3 × 2 × (2 + 3)}
      = 72/30 = 2.4.

Example
In the previous example,

E(X + Y) = ∑_{x=0}^{2} ∑_{y=0}^{3} (x + y) · (x + y)/30 = 98/30 ≈ 3.267.
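Both joint expectations above can be reproduced by brute-force enumeration with exact fractions; a sketch:

```python
from fractions import Fraction

# Frequency function p(x, y) = (x + y)/30 on x = 0..2, y = 0..3.
pmf = {(x, y): Fraction(x + y, 30) for x in range(3) for y in range(4)}

e_xy = sum(x * y * px for (x, y), px in pmf.items())
e_sum = sum((x + y) * px for (x, y), px in pmf.items())
print(e_xy, e_sum)  # 12/5 = 2.4 and 49/15 = 98/30
```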

Special cases
Let g(X, Y ) = X. Let us verify that Eg(X, Y ) = E(X).
We have

E(X) = ∑_x ∑_y x f(x, y)  if X, Y are discrete;
E(X) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x f(x, y) dx dy  if X, Y are continuous.

In the case when X and Y are continuous,

E(X) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x f(x, y) dx dy = ∫_{−∞}^{∞} x {∫_{−∞}^{∞} f(x, y) dy} dx = ∫_{−∞}^{∞} x f_X(x) dx,

where f_X(x) ≡ ∫_{−∞}^{∞} f(x, y) dy is the marginal density of X.
Similar conclusions hold for discrete random variables.
Theorem Let a be a constant. Then

E(a) = a.

Proof: The constant a can be described as a discrete random variable that takes only one value, a. Then Ea = a · 1 = a.

5.6 Expectation of a linear combination of random variables
Theorem Let X be a random variable and a ∈ R. Then

E(aX) = aEX.

Proof: We consider the continuous case only. Take g(x) = ax. We have

E(aX) = ∫_{−∞}^{∞} ax f(x) dx = a ∫_{−∞}^{∞} x f(x) dx = aEX.

Example
A roulette wheel has the numbers 1, 2, ..., 36, as well as 0 and 00 (38 numbers in total). If you bet $1 that an odd non-zero number comes up, you win or lose $1 according to whether or not that event occurs. Let X be the gain. We found above that

EX = −$1/19 ∼ −$0.05.

If you bet $100 that an odd non-zero number comes up, your gain is Y = 100X, and EY = 100EX ∼ −$5.

Theorem Let X be a random variable and a ∈ R. Then E(a + X) = a + EX.
Proof: We consider the continuous case only. Take g(x) = a + x. We have

E(a + X) = ∫_{−∞}^{∞} (a + x) f(x) dx = a ∫_{−∞}^{∞} f(x) dx + ∫_{−∞}^{∞} x f(x) dx = a + EX.

Example
A roulette wheel has the numbers 1, 2, ..., 36, as well as 0 and 00 (38 numbers in total). If you bet $1 that an odd non-zero number comes up, you win or lose $1 according to whether or not that event occurs. Let X be the gain. We found above that EX = −$1/19 ∼ −$0.05. Assume that you have $10 in your pocket, and let Y be the total amount of money in your pocket after the game. Then Y = $10 + X and EY = $10 + EX ∼ $9.95.

Theorem Let X and Y be random variables. Then

E(X + Y) = EX + EY.

Proof: We consider the continuous case only. Take g(x, y) = x + y. We have

E(X + Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x + y) f(x, y) dx dy
         = ∫_{−∞}^{∞} x {∫_{−∞}^{∞} f(x, y) dy} dx + ∫_{−∞}^{∞} y {∫_{−∞}^{∞} f(x, y) dx} dy
         = ∫_{−∞}^{∞} x f_X(x) dx + ∫_{−∞}^{∞} y f_Y(y) dy
         = EX + EY.

Here f_X(x) and f_Y(y) are the marginal densities of X and Y, respectively.
Example
A roulette wheel has the numbers 1, 2, ..., 36, as well as 0 and 00 (38 numbers in total). If you bet $1 that an odd non-zero number comes up, you win or lose $1 according to whether or not that event occurs. Let X be the gain. If you bet another $1 that the number 1 comes up, you win $35 or lose $1 according to whether or not that event occurs. Let Y be the gain. We found above that EX = EY = −$1/19 ∼ −$0.05. The expected gain for this combined $2 bet is E(X + Y) = −$2/19 ∼ −$0.11.

Corollary Let X_1, ..., X_n be random variables and a_0, a_1, ..., a_n ∈ R, and let

Z = a_0 + ∑_{i=1}^{n} a_i X_i.

Then

EZ = a_0 + ∑_{i=1}^{n} a_i EX_i.

Example
Let X be the total number of successes in n Bernoulli trials. We have that X = X_1 + ... + X_n, where the X_i are Bernoulli variables such that

X_i = 1 with probability p, and X_i = 0 with probability 1 − p.

We have that EX_i = 1 · p + 0 · (1 − p) = p. Hence EX = p + ... + p = np.
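The shortcut EX = np can be compared with the direct computation from the binomial pmf; a sketch with illustrative parameters n = 10, p = 0.3:

```python
from math import comb

# Direct mean of Binomial(n, p): sum of k * P(X = k) over k = 0..n.
n, p = 10, 0.3
mean = sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))
print(mean, n * p)  # both 3.0 (up to floating-point rounding)
```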
Corollary Let S(X) and T(X) be functions of X. Then E(S(X) + T(X)) = E(S(X)) + E(T(X)).
Corollary Let X and Y be jointly distributed, and let g and h be functions of X and Y. Then E[g(X, Y) + h(X, Y)] = E[g(X, Y)] + E[h(X, Y)].

5.7 Expectation of a product


Is it always correct that E(XY) = E(X)E(Y)?
The answer is no. For example, take Y = X, where X is a random variable such that

X = 1 with probability 1/2, and X = −1 with probability 1/2.

We have that EX = 0, but X² = 1 with probability 1, so

EX² = 1 ≠ EX · EX = 0.
Theorem If X and Y are independent random variables, then E(XY) = E(X)E(Y).
Proof (for the continuous case):

E(XY) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy f(x, y) dx dy.

Since X and Y are independent, we have f(x, y) = f_X(x) f_Y(y), where f_X(x) and f_Y(y) are the marginal densities of X and Y, respectively. Hence

E(XY) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy f_X(x) f_Y(y) dx dy = ∫_{−∞}^{∞} x f_X(x) dx · ∫_{−∞}^{∞} y f_Y(y) dy = E(X)E(Y).

Note: The condition E(XY) = E(X)E(Y) is necessary but not sufficient for the independence of X and Y. That is, E(XY) = E(X)E(Y) does not necessarily imply that X and Y are independent.
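Both halves of this section can be checked by enumeration: the counterexample with Y = X, and the product rule for an independent copy of the same variable; a sketch:

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)
pmf_x = {1: half, -1: half}  # EX = 0

ex = sum(x * px for x, px in pmf_x.items())
ex2 = sum(x * x * px for x, px in pmf_x.items())
print(ex, ex2)  # 0 and 1: E(X·X) = 1 differs from EX·EX = 0

# For an independent copy Y of X, the joint pmf factors, and the rule holds.
joint = {(x, y): px * py for (x, px), (y, py) in product(pmf_x.items(), repeat=2)}
e_xy = sum(x * y * pxy for (x, y), pxy in joint.items())
print(e_xy)  # 0 = EX·EY
```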

5.8 Probability of an event


Let A be a random event. Consider the random variable I_A such that

I_A = 1 if event A occurs, and I_A = 0 otherwise.

Theorem EI_A = P(A).

5.9 Expectation in the axiomatic setting


Let Ω be the sample space, and let X(ω) be a random variable, i.e., a mapping X : Ω → R. In this setting, it is common to write

EX = ∫_Ω X(ω) P(dω),

meaning that the expectation is a kind of integral. Modern probability theory gives a rigorous interpretation to this integration.

5.10 Moments

Definition: The kth moment of the random variable X is defined as

µ′_k = E(X^k).

By the rule of expectation, we have that

µ′_k = ∑_x x^k p(x)  if X is discrete;
µ′_k = ∫_{−∞}^{∞} x^k f(x) dx  if X is continuous,

k = 1, 2, .... The first moment (i.e., EX) is called the mean of X and is often denoted by µ, i.e.,

µ′_1 = E(X) = mean of X ≡ µ.

Example
Let X be a Bernoulli random variable such that

X = 1 with probability p, and X = 0 with probability 1 − p.

We have that µ′_k = E(X^k) = 1^k · p + 0^k · (1 − p) = p for k = 1, 2, ....

Definition: The kth central moment (or the kth moment about the mean) of the random variable X is defined as µ_k = E[(X − µ)^k]. By the rule of expectation, we have that

µ_k = ∑_x (x − µ)^k p(x)  if X is discrete;
µ_k = ∫_{−∞}^{∞} (x − µ)^k f(x) dx  if X is continuous.

Example
Let X be a Bernoulli random variable such that

X = 1 with probability p, and X = 0 with probability 1 − p.

We have that µ = EX = p and

µ_k = E(X − µ)^k = (1 − p)^k · p + (−p)^k · (1 − p).

In particular,

µ_2 = (1 − p)² · p + (−p)² · (1 − p) = p − 2p² + p³ + p² − p³ = p − p² = p(1 − p).
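The central-moment formula for the Bernoulli variable can be checked directly (p = 0.3 is an illustrative choice):

```python
# Central moments of a Bernoulli(p) variable from the definition
# E(X - mu)^k with mu = p.
p = 0.3
mu = p

def central_moment(k):
    return (1 - mu) ** k * p + (-mu) ** k * (1 - p)

print(central_moment(1))               # 0, as it must be for any variable
print(central_moment(2), p * (1 - p))  # both 0.21
```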
