
Chapter 5

Week 5: Expectation for random variables

5.1 Discrete random variables


We introduced the median above as a characteristic of the "center" of a distribution. There is an alternative and more popular parameter of a distribution that describes its center.
Definition: Let X be a discrete random variable with frequency function p(x) such that

∑_x |x| p(x) < +∞.

The expectation of X (or the mean value of X, or the expected value of X) is

E(X) = ∑_x x p(x).

Note that the condition ∑_x |x| p(x) < +∞ is always satisfied if the random variable takes values in a finite set.

Notation: It is acceptable to write EX instead of E(X).

Example
Let X be a Bernoulli random variable such that

X = 1 with probability p, and X = 0 with probability 1 − p.

We have that EX = 1 · p + 0 · (1 − p) = p.
Note that X takes only the values 0 and 1, and it never takes the value p = EX if p ∈ (0, 1). This is why the term "the expected value" should not be taken literally.
Note that the median of the Bernoulli distribution is not a very useful parameter. If p = 1/2, then

P(X ≤ a) = P(X = 0) = P(X = 1) = P(X ≥ a) = 1/2


for any a ∈ (0, 1), i.e., any point a ∈ (0, 1) is a median.
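As a quick numerical check of the Bernoulli computation above, here is a minimal sketch (the helper name `expectation` and the value p = 0.3 are our illustrative choices):

```python
def expectation(pmf):
    # E(X) = sum of x * p(x) over the support of a discrete random variable,
    # where pmf maps each value x to its probability p(x).
    return sum(x * px for x, px in pmf.items())

p = 0.3  # an arbitrary illustrative success probability
bernoulli = {1: p, 0: 1 - p}
print(expectation(bernoulli))  # equals p, a value X itself never takes
```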


Example
If we toss a single die once and X is the outcome, then

x                1    2    3    4    5    6
p(x) = P(X = x)  1/6  1/6  1/6  1/6  1/6  1/6

E(X) = 1 × 1/6 + 2 × 1/6 + 3 × 1/6 + 4 × 1/6 + 5 × 1/6 + 6 × 1/6 = 3.5
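The die calculation can be reproduced exactly with rational arithmetic; a small sketch:

```python
from fractions import Fraction

# Fair die: each outcome 1..6 has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}
mean = sum(x * px for x, px in die.items())
print(mean, float(mean))  # 7/2, i.e. 3.5
```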

Example
A pair of dice are tossed once and X is the sum of the outcomes:

x                2     3     4     5     6     7     8     9     10    11    12
p(x) = P(X = x)  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

EX = ∑_{x=2}^{12} x p(x) = 2 × 1/36 + 3 × 2/36 + . . . + 12 × 1/36 = 7
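Rather than typing the table, the pmf of the sum of two dice can be built by enumerating all 36 equally likely outcomes; a sketch:

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 ordered outcomes of two fair dice and tally the sums.
pmf = {}
for a, b in product(range(1, 7), repeat=2):
    pmf[a + b] = pmf.get(a + b, Fraction(0)) + Fraction(1, 36)

mean = sum(x * px for x, px in pmf.items())
print(pmf[7], mean)  # 1/6 and 7
```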

Example
A gambler bets on a coin-tossing game: he pays $100 if the coin lands heads and receives $100 if it lands tails. What is his expected gain?
Solution: let X be the gain. It is a random variable with frequency function p_X(x) given by

x        −100  100
p_X(x)   1/2   1/2

It gives EX = (−100)(1/2) + 100(1/2) = 0. This is a fair game!

Example
A roulette wheel has the numbers 1, 2, ..., 36, as well as 0 and 00 (38 numbers in total). If you bet $1 that an odd non-zero number comes up, you win or lose $1 according to whether or not that event occurs. Let X be the gain. We have

P(X = −1) = 20/38,  P(X = 1) = 18/38.

Hence

EX = −1 · 20/38 + 1 · 18/38 = −1/19.

Your expected loss is about $0.05. This game gives an advantage to the casino; it covers the casino's business expenses.

Example
Consider the roulette wheel again. If you bet $1 that the number 1 comes up, you win $35 or lose $1 according to whether or not that event occurs. Let X be the gain. We have

P(X = −1) = 37/38,  P(X = 35) = 1/38.

Hence

EX = −1 · 37/38 + 35 · 1/38 = −1/19.

Your expected loss is again about $0.05. Note that the expected value is the same as in the previous example.
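Both roulette bets can be checked with exact fractions; a sketch:

```python
from fractions import Fraction

# Bet $1 on "odd non-zero number": 18 winning numbers out of 38.
ev_odd = (-1) * Fraction(20, 38) + 1 * Fraction(18, 38)

# Bet $1 on the single number 1: one winning number out of 38, paying $35.
ev_single = (-1) * Fraction(37, 38) + 35 * Fraction(1, 38)

print(ev_odd, ev_single)  # both -1/19, about -$0.05 per $1 bet
```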

5.2 Expectation for continuous random variables

Definition: Let X be a continuous random variable with density f(x) such that

∫_{−∞}^{∞} |x| f(x) dx < +∞.

The expectation of X (or the mean value of X, or the expected value of X) is

EX = E(X) = ∫_{−∞}^{∞} x f(x) dx.

Example
Consider the uniform density over (0, a):

f(x) = 1/a for 0 ≤ x ≤ a, and f(x) = 0 elsewhere.

Then

E(X) = ∫_0^a x · (1/a) dx = (1/a) · x²/2 |_0^a = a/2.

Example
Consider the exponential distribution

f(x) = λe^{−λx} for x ≥ 0, and f(x) = 0 elsewhere.

Then

EX = E(X) = ∫_0^{+∞} x λe^{−λx} dx = 1/λ.
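The formula EX = 1/λ can also be checked by Monte Carlo simulation; a sketch using the standard library (the rate λ = 2 and the sample size are arbitrary illustrative choices):

```python
import random

random.seed(0)  # fixed seed so the run is reproducible
lam = 2.0       # illustrative rate; the true mean is 1/lam = 0.5
n = 100_000
# Average of n independent Exponential(lam) samples.
mc_mean = sum(random.expovariate(lam) for _ in range(n)) / n
print(mc_mean)  # close to 0.5
```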

5.3 Case of distributions with heavy tails (fat tails)

It may happen that

∫_{−∞}^{∞} |x| f(x) dx = +∞

(i.e., the integral diverges). This happens if f(x) does not decay fast enough as |x| → +∞ (the case of a heavy-tailed (fat-tailed) distribution). Technically, the integral ∫_{−∞}^{∞} x f(x) dx is not defined in this case.
It is common to accept the following rules:


(A) If

∫_{−∞}^0 x f(x) dx > −∞  and  ∫_0^{∞} x f(x) dx = +∞,

then EX = +∞ (and, therefore, it is defined).

(B) If

∫_{−∞}^0 x f(x) dx = −∞  and  ∫_0^{∞} x f(x) dx < +∞,

then EX = −∞ (and, therefore, it is defined).

The only case where EX is not defined in any sense is when

∫_{−∞}^0 x f(x) dx = −∞  and  ∫_0^{∞} x f(x) dx = +∞.

Similar rules are often accepted for the case of discrete random
variables.

Example
Cauchy distribution: Consider the random variable X = X₁/X₂, where X₁ and X₂ are independent N(0, 1) variables. We found in Week 4 that X has density

f(x) = 1 / (π(1 + x²))

(the Cauchy density). We have that

∫_{−∞}^0 x / (π(1 + x²)) dx = −∞  and  ∫_0^{∞} x / (π(1 + x²)) dx = +∞.

This means that EX is not defined.
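For the Cauchy density the tail integral can be evaluated in closed form: ∫_0^T x/(π(1 + x²)) dx = ln(1 + T²)/(2π), which grows without bound as T → ∞. A small sketch makes the divergence visible:

```python
import math

def truncated_integral(T):
    # Closed form of the truncated tail integral of the Cauchy density:
    # ∫_0^T x / (π (1 + x²)) dx = ln(1 + T²) / (2π).
    return math.log(1 + T * T) / (2 * math.pi)

for T in (10, 1_000, 1_000_000):
    print(T, truncated_integral(T))  # keeps growing: no finite limit
```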



5.4 Expectations of functions of random variables

Discrete random variables
Theorem Let g(x) be a function, g : R → R. Let X be a discrete random variable with frequency function p(x) such that

∑_x |g(x)| p(x) < +∞.

Let Y = g(X). Then

EY = ∑_x g(x) p(x).

Proof. Assume that X takes values x_k, k = 1, 2, ..., and P(X = x_k) = p_X(x_k). It follows that Y takes values y_k = g(x_k), k = 1, 2, .... Assume first that, for any y_k, there is a unique x_k such that y_k = g(x_k). Then P(Y = y_k) = p_X(x_k), i.e., the frequency function for Y is p_Y(y_k) = p_X(x_k). We have

EY = ∑_k y_k p_Y(y_k) = ∑_k g(x_k) p_X(x_k) = ∑_x g(x) p(x).

The case where there is more than one x such that y_k = g(x) requires additional analysis. We have

P(Y = y_k) = ∑_{j: g(x_j) = y_k} p_X(x_j).

Hence

EY = ∑_k y_k p_Y(y_k) = ∑_k y_k ∑_{j: g(x_j) = y_k} p_X(x_j)
   = ∑_k ∑_{j: g(x_j) = y_k} g(x_j) p_X(x_j) = ∑_x g(x) p(x).

Example
Let X be a Bernoulli random variable such that

X = 1 with probability p, and X = 0 with probability 1 − p.

We have that E(X²) = 1² · p + 0² · (1 − p) = p.

Notation: It is common to write EX² instead of E(X²). Therefore, EX² means E(X²) rather than (EX)².
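The theorem turns E[g(X)] into a sum over the original pmf, with no need to derive the distribution of Y = g(X); a minimal sketch (the helper name `expect_g` and p = 0.3 are our illustrative choices):

```python
def expect_g(pmf, g):
    # E[g(X)] = sum of g(x) * p(x) over the support of X.
    return sum(g(x) * px for x, px in pmf.items())

p = 0.3  # illustrative
bern = {1: p, 0: 1 - p}
print(expect_g(bern, lambda x: x**2))  # E(X^2) = p
print(expect_g(bern, lambda x: x)**2)  # (EX)^2 = p^2, a different number
```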

Continuous random variables


Let g(x) be a function, g : R → R. Let X be a continuous random variable with density f(x) such that

∫_{−∞}^{∞} |g(x)| f(x) dx < +∞.

Then the expectation of Y = g(X) is

EY = ∫_{−∞}^{∞} g(x) f(x) dx.

Example
Consider the uniform distribution over (0, a):

f(x) = 1/a for 0 ≤ x ≤ a, and f(x) = 0 elsewhere.

Then

E(X²) = ∫_0^a x² · (1/a) dx = (1/a) · x³/3 |_0^a = a²/3.
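The integral E(X²) = a²/3 can be verified numerically with a midpoint Riemann sum (a = 3 is an arbitrary illustrative endpoint):

```python
# Midpoint-rule approximation of E(X^2) = ∫_0^a x^2 * (1/a) dx
# for the uniform density on (0, a).
a = 3.0
n = 100_000  # number of subintervals
h = a / n
approx = sum(((i + 0.5) * h) ** 2 * (1 / a) * h for i in range(n))
print(approx, a * a / 3)  # both about 3.0
```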

Example
Consider the exponential distribution

f(x) = λe^{−λx} for x ≥ 0, and f(x) = 0 elsewhere.

Then

EX² = E(X²) = ∫_0^{+∞} x² λe^{−λx} dx = 2/λ².

5.5 Joint random variables


Theorem Let X and Y be jointly distributed with joint density or joint frequency function f(x, y). Then

Eg(X, Y) = ∑_x ∑_y g(x, y) f(x, y)  if X, Y are discrete;
Eg(X, Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(x, y) dx dy  if X, Y are continuous.

Example
Consider a discrete random vector (X, Y) with frequency function

p(x, y) = (x + y)/30 for x = 0, 1, 2 and y = 0, 1, 2, 3, and p(x, y) = 0 elsewhere.

Then

E(XY) = ∑_{x=0}^{2} ∑_{y=0}^{3} xy · (x + y)/30.

It gives

E(XY) = (1/30) ∑_{x=0}^{2} {x × 0 × (x + 0) + x × 1 × (x + 1) + x × 2 × (x + 2) + x × 3 × (x + 3)}
      = (1/30) ∑_{x=0}^{2} {x(x + 1) + 2x(x + 2) + 3x(x + 3)}
      = (1/30) {0(0 + 1) + 2 × 0 × (0 + 2) + 3 × 0 × (0 + 3)
              + 1(1 + 1) + 2 × 1 × (1 + 2) + 3 × 1 × (1 + 3)
              + 2(2 + 1) + 2 × 2 × (2 + 2) + 3 × 2 × (2 + 3)}
      = 72/30 = 2.4.

Example
In the previous example,

E(X + Y) = ∑_{x=0}^{2} ∑_{y=0}^{3} (x + y) · (x + y)/30 = 98/30 ≈ 3.267.
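Both joint expectations above can be reproduced by brute-force enumeration with exact fractions; a sketch:

```python
from fractions import Fraction

# Frequency function p(x, y) = (x + y)/30 on x = 0..2, y = 0..3.
pmf = {(x, y): Fraction(x + y, 30) for x in range(3) for y in range(4)}

e_xy = sum(x * y * px for (x, y), px in pmf.items())
e_sum = sum((x + y) * px for (x, y), px in pmf.items())
print(e_xy, e_sum)  # 12/5 = 2.4 and 49/15 = 98/30
```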

Special cases
Let g(X, Y ) = X. Let us verify that Eg(X, Y ) = E(X).
We have

E(X) = ∑_x ∑_y x f(x, y)  if X, Y are discrete;
E(X) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x f(x, y) dx dy  if X, Y are continuous.

In the case when X and Y are continuous,

E(X) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x f(x, y) dx dy = ∫_{−∞}^{∞} x {∫_{−∞}^{∞} f(x, y) dy} dx = ∫_{−∞}^{∞} x f_X(x) dx,

where f_X(x) ≡ ∫_{−∞}^{∞} f(x, y) dy is the marginal density of X.
Similar conclusions hold for discrete random variables.
Theorem Let a be a constant. Then

E(a) = a.

Proof: The constant a can be described as a discrete random variable that takes only one value, a. Then Ea = a · 1 = a.

5.6 Expectation of a linear combination of random variables
Theorem Let X be a random variable and a ∈ R. Then

E(aX) = aEX.

Proof: We consider the continuous case only. Take g(x) = ax. We have

E(aX) = ∫_{−∞}^{∞} ax f(x) dx = a ∫_{−∞}^{∞} x f(x) dx = aEX.

Example
A roulette wheel has the numbers 1, 2, ..., 36, as well as 0 and 00 (38 numbers in total). If you bet $1 that an odd non-zero number comes up, you win or lose $1 according to whether or not that event occurs. Let X be the gain. We found above that

EX = −$1/19 ∼ −$0.05.

If you bet $100 that an odd non-zero number comes up, your gain is Y = 100X, and EY = 100EX ∼ −$5.

Theorem Let X be a random variable and a ∈ R. Then E(a + X) = a + EX.
Proof: We consider the continuous case only. Take g(x) = a + x. We have

E(a + X) = ∫_{−∞}^{∞} (a + x) f(x) dx = a ∫_{−∞}^{∞} f(x) dx + ∫_{−∞}^{∞} x f(x) dx = a + EX.

Example
A roulette wheel has the numbers 1, 2, ..., 36, as well as 0 and 00 (38 numbers in total). If you bet $1 that an odd non-zero number comes up, you win or lose $1 according to whether or not that event occurs. Let X be the gain. We found above that EX = −$1/19 ∼ −$0.05. Assume that you have $10 in your pocket, and let Y be the total amount of money in your pocket after the game. Then Y = $10 + X and EY = $10 + EX ∼ $9.95.

Theorem Let X and Y be random variables. Then

E(X + Y) = EX + EY.

Proof: We consider the continuous case only. Take g(x, y) = x + y. We have

E(X + Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x + y) f(x, y) dx dy
         = ∫_{−∞}^{∞} x {∫_{−∞}^{∞} f(x, y) dy} dx + ∫_{−∞}^{∞} y {∫_{−∞}^{∞} f(x, y) dx} dy
         = ∫_{−∞}^{∞} x f_X(x) dx + ∫_{−∞}^{∞} y f_Y(y) dy
         = EX + EY.

Here f_X(x) and f_Y(y) are the marginal densities of X and Y, respectively.
Example
A roulette wheel has the numbers 1, 2, ..., 36, as well as 0 and 00 (38 numbers in total). If you bet $1 that an odd non-zero number comes up, you win or lose $1 according to whether or not that event occurs. Let X be the gain. If you bet another $1 that the number 1 comes up, you win $35 or lose $1 according to whether or not that event occurs. Let Y be the gain. We found above that EX = EY = −$1/19 ∼ −$0.05. The expected gain for this combined $2 bet is E(X + Y) = −$2/19 ∼ −$0.11.

Corollary Let X_1, ..., X_n be random variables and a_0, a_1, ..., a_n ∈ R, and let

Z = a_0 + ∑_{i=1}^{n} a_i X_i.

Then

EZ = a_0 + ∑_{i=1}^{n} a_i EX_i.

Example
Let X be the total number of successes in n Bernoulli trials. We have that X = X_1 + ... + X_n, where the X_i are Bernoulli variables such that

X_i = 1 with probability p, and X_i = 0 with probability 1 − p.

We have that EX_i = 1 · p + 0 · (1 − p) = p. Hence EX = p + ... + p = np.
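The shortcut EX = np can be compared with the direct computation from the binomial pmf; a sketch with illustrative parameters n = 10, p = 0.3:

```python
from math import comb

# Direct mean of Binomial(n, p): sum of k * P(X = k) over k = 0..n.
n, p = 10, 0.3
mean = sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))
print(mean, n * p)  # both 3.0 (up to floating-point rounding)
```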
Corollary Let S(X) and T(X) be functions of X. Then E(S(X) + T(X)) = E(S(X)) + E(T(X)).
Corollary Let X and Y be jointly distributed, and let g and h be functions of X and Y. Then E[g(X, Y) + h(X, Y)] = E[g(X, Y)] + E[h(X, Y)].

5.7 Expectation of a product


Is it always correct that E(XY) = E(X)E(Y)?
The answer is no. For example, take Y = X, where X is a random variable such that

X = 1 with probability 1/2, and X = −1 with probability 1/2.

We have that EX = 0, but X² = 1 with probability 1, so

EX² = 1 ≠ EX · EX = 0.
Theorem If X and Y are independent random variables, then E(XY) = E(X)E(Y).
Proof (for the continuous case):

E(XY) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy f(x, y) dx dy.

Since X and Y are independent, we have f(x, y) = f_X(x) f_Y(y), where f_X(x) and f_Y(y) are the marginal densities of X and Y, respectively. Hence

E(XY) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy f_X(x) f_Y(y) dx dy = ∫_{−∞}^{∞} x f_X(x) dx · ∫_{−∞}^{∞} y f_Y(y) dy = E(X)E(Y).

Note: The condition E(XY) = E(X)E(Y) is necessary but not sufficient for the independence of X and Y. That is, E(XY) = E(X)E(Y) does not necessarily imply that X and Y are independent.
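Both halves of this section can be checked by enumeration: the counterexample with Y = X, and the product rule for an independent copy of the same variable; a sketch:

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)
pmf_x = {1: half, -1: half}  # EX = 0

ex = sum(x * px for x, px in pmf_x.items())
ex2 = sum(x * x * px for x, px in pmf_x.items())
print(ex, ex2)  # 0 and 1: E(X·X) = 1 differs from EX·EX = 0

# For an independent copy Y of X, the joint pmf factors, and the rule holds.
joint = {(x, y): px * py for (x, px), (y, py) in product(pmf_x.items(), repeat=2)}
e_xy = sum(x * y * pxy for (x, y), pxy in joint.items())
print(e_xy)  # 0 = EX·EY
```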

5.8 Probability of an event


Let A be a random event. Consider the random variable I_A such that

I_A = 1 if event A occurs, and I_A = 0 otherwise.

Theorem EI_A = P(A).

5.9 Expectation in the axiomatic setting


Let Ω be the sample space, and let X(ω) be a random variable, i.e., a mapping X : Ω → R. In this setting, it is common to write

EX = ∫_Ω X(ω) P(dω),

meaning that the expectation is a kind of integral. Modern probability theory gives a rigorous interpretation to this integration.

5.10 Moments

Definition: The kth moment of the random variable X is defined as

µ′_k = E(X^k).

By the rule of expectation, we have that

µ′_k = ∑_x x^k p(x)  if X is discrete;
µ′_k = ∫_{−∞}^{∞} x^k f(x) dx  if X is continuous,

k = 1, 2, .... The first moment (i.e., EX) is called the mean of X and is often denoted by µ, i.e.,

µ′_1 = E(X) = mean of X ≡ µ.

Example
Let X be a Bernoulli random variable such that

X = 1 with probability p, and X = 0 with probability 1 − p.

We have that µ′_k = E(X^k) = 1^k · p + 0^k · (1 − p) = p for k = 1, 2, ....

Definition: The kth central moment (or the kth moment about the mean) of the random variable X is defined as µ_k = E[(X − µ)^k]. By the rule of expectation, we have that

µ_k = ∑_x (x − µ)^k p(x)  if X is discrete;
µ_k = ∫_{−∞}^{∞} (x − µ)^k f(x) dx  if X is continuous.

Example
Let X be a Bernoulli random variable such that

X = 1 with probability p, and X = 0 with probability 1 − p.

We have that µ = EX = p and

µ_k = E(X − µ)^k = (1 − p)^k · p + (−p)^k · (1 − p).

In particular,

µ_2 = (1 − p)² · p + (−p)² · (1 − p) = p − 2p² + p³ + p² − p³ = p − p² = p(1 − p).
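The central-moment formula for the Bernoulli variable can be checked directly (p = 0.3 is an illustrative choice):

```python
# Central moments of a Bernoulli(p) variable from the definition
# E(X - mu)^k with mu = p.
p = 0.3
mu = p

def central_moment(k):
    return (1 - mu) ** k * p + (-mu) ** k * (1 - p)

print(central_moment(1))               # 0, as it must be for any variable
print(central_moment(2), p * (1 - p))  # both 0.21
```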
