Expectation
Two coins are tossed. How many heads can we expect to come up?
Let Y be the number of heads. Then Y has pmf
$$p(y) = \begin{cases} 1/4, & y = 0 \\ 1/2, & y = 1 \\ 1/4, & y = 2. \end{cases}$$
The answer seems to be 1 (the middle value).
But what exactly do we mean by expect?
A mental experiment: Suppose we toss the two coins 1000 times, and each time
record the number of heads, y.
The result would be something like 1,1,2,0,1,2,0,1,1,1,...,1,0.
We'd get about 250 zeros, 500 ones and 250 twos.
So the average of the 1000 values of Y would be approximately
$$\frac{250(0) + 500(1) + 250(2)}{1000} = 0\left(\tfrac{1}{4}\right) + 1\left(\tfrac{1}{2}\right) + 2\left(\tfrac{1}{4}\right) = 1.$$
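This mental experiment is easy to carry out on a computer. Below is a minimal Python sketch (the seed and the choice of 1000 repetitions are just illustrative):

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

# Toss two fair coins 1000 times; record the number of heads each time.
counts = [sum(random.random() < 0.5 for _ in range(2)) for _ in range(1000)]

# The average should be close to 0(1/4) + 1(1/2) + 2(1/4) = 1.
print(sum(counts) / len(counts))  # approximately 1
```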
This motivates the general definition of the expected value (or mean) of a discrete rv Y:
$$\mu = EY = \sum_y y\, p(y).$$
For example, if Y ~ Bern(p), so that
$$p(y) = \begin{cases} p, & y = 1 \\ 1 - p, & y = 0, \end{cases}$$
then $EY = 0(1-p) + 1(p) = p$.
Thus for example, if we toss a fair coin thousands of times, and each time write 1
when a head comes up and 0 otherwise, we will get a sequence like 0,0,1,0,1,1,1,0,...
The average of these 1's and 0's will be about 1/2, corresponding to the fact that each
such number has a Bernoulli distribution with parameter 1/2 and thus a mean of 1/2.
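The same long-run-average behaviour holds for any success probability p. A small sketch, with p = 0.3 chosen purely for illustration:

```python
import random

random.seed(2)
p = 0.3  # an illustrative value; any 0 < p < 1 works

# Record 1 for each success and 0 otherwise, over many trials.
values = [1 if random.random() < p else 0 for _ in range(100_000)]

# The average of the 1's and 0's settles near the Bernoulli mean, p.
print(sum(values) / len(values))  # approximately 0.3
```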
Next, consider the mean of the binomial distribution. Suppose Y ~ Bin(n, p). Then
$$\mu = EY = \sum_{y=0}^{n} y\,\frac{n!}{y!(n-y)!}\,p^y(1-p)^{n-y} = \sum_{y=1}^{n} \frac{n!}{(y-1)!(n-y)!}\,p^y(1-p)^{n-y}$$
$$= np \sum_{y=1}^{n} \frac{(n-1)!}{(y-1)!\,((n-1)-(y-1))!}\,p^{y-1}(1-p)^{(n-1)-(y-1)}$$
$$= np \sum_{x=0}^{m} \frac{m!}{x!\,(m-x)!}\,p^x(1-p)^{m-x} \qquad (x = y-1 \text{ and } m = n-1)$$
$$= np,$$
since the last sum is the sum of all the Bin(m, p) probabilities, which is 1.
This makes sense. E.g., if we roll a die 60 times, we can expect 60(1/6) = 10 sixes.
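As a quick numerical check of EY = np, we can sum y p(y) directly. The sketch below uses the die example (n = 60, p = 1/6):

```python
from math import comb

def binom_mean(n, p):
    """E(Y) for Y ~ Bin(n, p), computed directly as the sum of y * p(y)."""
    return sum(y * comb(n, y) * p**y * (1 - p)**(n - y) for y in range(n + 1))

# The die example: n = 60 rolls, p = 1/6 chance of a six.
print(binom_mean(60, 1/6))  # approximately 10 (= np)
```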
More generally, for any function g of Y, we define
$$E\,g(Y) = \sum_y g(y)\, p(y).$$
Note: The text presents this equation as Theorem 3.2 and provides a proof for it. We have instead defined the expected value of a function of a rv, with no need for a proof.
For example, if Y ~ Bern(p), then
$$E(Y^2) = \sum_y y^2 p(y) = 0^2(1-p) + 1^2 p = p.$$
Laws of expectation
1. If c is a constant, then Ec = c.
2. E{cg(Y)} = cEg(Y).
3. $E\{g_1(Y) + \cdots + g_k(Y)\} = Eg_1(Y) + \cdots + Eg_k(Y)$.
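These laws can be checked numerically on any small pmf. A sketch using the two-coin pmf, with the constant and the functions chosen arbitrarily:

```python
# An illustrative pmf: the two-coin example, p(0) = 1/4, p(1) = 1/2, p(2) = 1/4.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

def E(g):
    """E g(Y) = sum over y of g(y) p(y)."""
    return sum(g(y) * p for y, p in pmf.items())

c = 7
print(E(lambda y: c))                           # law 1: Ec = c, gives 7.0
print(E(lambda y: c * y), c * E(lambda y: y))   # law 2: both 7.0
print(E(lambda y: y + y**2),
      E(lambda y: y) + E(lambda y: y**2))       # law 3: both 2.5
```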
Definitions:
1. The mean of Y is $\mu = EY$.
2. The k-th moment of Y about the origin is $\mu_k' = E(Y^k)$.
3. The k-th central moment of Y is $\mu_k = E\{(Y - \mu)^k\}$ ( $= E(Y - EY)^k$ ).
4. The variance of Y is $\sigma^2 = \mu_2$ ( $= \mathrm{Var}(Y)$ ), and the standard deviation of Y is $\sigma = \sqrt{\sigma^2}$.
Example: Suppose Y has pmf given by p(1) = 1/3 and p(2) = 2/3. Then
$$\mu = EY = \sum_y y\,p(y) = 1\left(\tfrac{1}{3}\right) + 2\left(\tfrac{2}{3}\right) = \tfrac{5}{3}.$$
So
$$\mu_3' = EY^3 = \sum_y y^3 p(y) = 1^3\left(\tfrac{1}{3}\right) + 2^3\left(\tfrac{2}{3}\right) = \tfrac{17}{3}.$$
Also
$$\sigma^2 = \mu_2 = E(Y - \mu)^2 = \sum_y (y - \mu)^2 p(y) = \left(1 - \tfrac{5}{3}\right)^2 \tfrac{1}{3} + \left(2 - \tfrac{5}{3}\right)^2 \tfrac{2}{3} = \tfrac{2}{9}.$$
Hence $\sigma = \sqrt{2/9} = 0.4714$.
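The same calculations in code, as a check on the arithmetic (reusing the pmf just given):

```python
from math import sqrt

# The pmf from the example above: p(1) = 1/3, p(2) = 2/3.
pmf = {1: 1/3, 2: 2/3}

mu  = sum(y * p for y, p in pmf.items())             # EY        = 5/3
mu3 = sum(y**3 * p for y, p in pmf.items())          # EY^3      = 17/3
var = sum((y - mu)**2 * p for y, p in pmf.items())   # E(Y-mu)^2 = 2/9
print(mu, mu3, var, sqrt(var))  # 1.6667  5.6667  0.2222  0.4714
```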
The various moments provide information about the nature of a distribution.
We have already seen that the mean provides a measure of central tendency.
The variance and standard deviation provide measures of dispersion.
Distributions that are highly dispersed have a large variance.
Example: a figure (not reproduced here) compares two pmfs, p(x) and p(y), one more dispersed than the other.
Laws of variance
1. $\mathrm{Var}\,Y = EY^2 - (EY)^2$. (Equivalently, $\sigma^2 = \mu_2' - \mu^2$.)
2. $\mathrm{Var}(a + bY) = b^2\,\mathrm{Var}\,Y$.
Proof of 1:
$$\mathrm{LHS} = E(Y - \mu)^2 = E(Y^2 - 2\mu Y + \mu^2) = EY^2 - 2\mu EY + \mu^2 = EY^2 - 2\mu^2 + \mu^2 = EY^2 - (EY)^2 = \mathrm{RHS}.$$
Proof of 2:
$$\mathrm{LHS} = E[(a + bY) - E(a + bY)]^2 = E[a + bY - a - b\mu]^2 = b^2 E(Y - \mu)^2 = \mathrm{RHS}.$$
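Both laws are easy to verify numerically. A sketch reusing the pmf from the earlier example, with a = 2 and b = 3 chosen arbitrarily:

```python
pmf = {1: 1/3, 2: 2/3}  # the pmf from the example above

def E(g):
    return sum(g(y) * p for y, p in pmf.items())

mu = E(lambda y: y)
var_def      = E(lambda y: (y - mu)**2)   # the definition, E(Y - mu)^2
var_shortcut = E(lambda y: y**2) - mu**2  # law 1: EY^2 - (EY)^2
print(var_def, var_shortcut)              # both 2/9 = 0.2222

# Law 2: Var(a + bY) = b^2 VarY, for illustrative constants a = 2, b = 3.
a, b = 2, 3
mu_t = E(lambda y: a + b * y)
print(E(lambda y: (a + b * y - mu_t)**2), b**2 * var_def)  # both 2.0
```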
Moment generating functions
The moment generating function (mgf) of Y is $m(t) = Ee^{Yt}$. The k-th moment of Y may be obtained by differentiating the mgf k times and setting t = 0:
$$\mu_k' = E(Y^k) = m^{(k)}(0) = \left[\frac{d^k}{dt^k}\, m(t)\right]_{t=0}.$$
(See Thm 3.12 in the text for a general proof, and below for the case k = 1.)
We may also write $m^{(1)}(t)$ as $m'(t)$, and $m^{(2)}(t)$ as $m''(t)$, etc.
Notes:
For all mgf's, m(t), it is true that $m^{(0)}(0) = m(0) = Ee^{Y \cdot 0} = E1 = 1$.
Proof that $\mu = \mu_1' = m'(0)$:
$$m'(t) = \frac{d}{dt}\, m(t) = \frac{d}{dt}\, Ee^{Yt} = \frac{d}{dt} \sum_y e^{yt} p(y) = \sum_y \frac{d}{dt}\, e^{yt} p(y) = \sum_y y e^{yt} p(y).$$
So
$$m'(0) = \sum_y y e^{y \cdot 0} p(y) = \sum_y y \cdot 1 \cdot p(y) = EY = \mu.$$
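A numerical illustration of this result: approximate m′(0) by a symmetric finite difference and compare it with EY, again using the pmf from the earlier example:

```python
from math import exp

pmf = {1: 1/3, 2: 2/3}  # the pmf from the earlier example

def m(t):
    """The mgf m(t) = E e^{Yt} = sum of e^{yt} p(y)."""
    return sum(exp(y * t) * p for y, p in pmf.items())

# Symmetric finite-difference estimate of m'(0), compared with EY.
h = 1e-6
print((m(h) - m(-h)) / (2 * h))            # approximately 5/3
print(sum(y * p for y, p in pmf.items()))  # EY = 5/3 exactly
```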
Example 17 Use the mgf technique to find the mean and variance of the binomial
distribution.
Let Y ~ Bin(n,p). Then Y has mgf
$$m(t) = Ee^{Yt} = \sum_{y=0}^{n} e^{yt} \binom{n}{y} p^y (1-p)^{n-y} = \sum_{y=0}^{n} \binom{n}{y} (pe^t)^y (1-p)^{n-y}$$
$$= \{(pe^t) + (1-p)\}^n \quad \text{by the binomial theorem.}$$
Thus $m(t) = (1 - p + pe^t)^n$.
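As a sanity check, this closed form can be compared with the defining sum; n = 5 and p = 0.4 below are arbitrary illustrative values:

```python
from math import comb, exp

n, p = 5, 0.4  # illustrative parameters

def mgf_sum(t):
    """E e^{Yt} computed directly from the Bin(n, p) pmf."""
    return sum(exp(y * t) * comb(n, y) * p**y * (1 - p)**(n - y)
               for y in range(n + 1))

def mgf_closed(t):
    """The closed form (1 - p + p e^t)^n from the binomial theorem."""
    return (1 - p + p * exp(t))**n

for t in (-1.0, 0.0, 0.5, 1.0):
    print(t, mgf_sum(t), mgf_closed(t))  # the two values agree at each t
```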
Then
$$m'(t) = \frac{dm(t)}{dt} = n(1 - p + pe^t)^{n-1} pe^t$$
(by the chain rule, $\frac{du}{dt} = \frac{du}{dv}\frac{dv}{dt}$, where here $v = 1 - p + pe^t$ and $u = m(t) = v^n$).
Also,
$$m''(t) = \frac{d^2 m(t)}{dt^2} = \frac{d}{dt}\frac{dm(t)}{dt} = np\{(1 - p + pe^t)^{n-1} e^t + e^t (n-1)(1 - p + pe^t)^{n-2} pe^t\}$$
by the product rule for differentiation
($\frac{d(uv)}{dt} = u\frac{dv}{dt} + v\frac{du}{dt}$, where here $u = (1 - p + pe^t)^{n-1}$ and $v = e^t$).
Setting t = 0 gives
$$\mu = m'(0) = n(1 - p + p)^{n-1} p = np$$
and
$$EY^2 = m''(0) = np\{1 + (n-1)p\} = np + n(n-1)p^2,$$
so
$$\mathrm{Var}\,Y = EY^2 - \mu^2 = np + n(n-1)p^2 - (np)^2 = np(1-p).$$
Example: Suppose Y has mgf
$$m(t) = \frac{1}{8}(1 + e^t)^3.$$
Find P(Y = 3).
We can write
$$m(t) = \left\{\frac{1}{2} + \frac{1}{2}e^t\right\}^3 = (1 - p + pe^t)^n, \text{ where } n = 3 \text{ and } p = 1/2.$$
Thus m(t) is the mgf of a random variable whose distribution is binomial with
parameters 3 and 1/2. Therefore Y ~ Bin(3,1/2), and so P(Y = 3) = 1/8.
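As a final check, finite-difference approximations to m′(0) and m″(0) for this mgf should recover the Bin(3, 1/2) mean np = 1.5 and variance np(1 − p) = 0.75:

```python
from math import exp

def m(t):
    """The given mgf, m(t) = (1/8)(1 + e^t)^3."""
    return (1 + exp(t))**3 / 8

h = 1e-5
m1 = (m(h) - m(-h)) / (2 * h)          # m'(0)  ~ EY   = np = 1.5
m2 = (m(h) - 2 * m(0) + m(-h)) / h**2  # m''(0) ~ EY^2 = 3.0
print(m1, m2, m2 - m1**2)              # mean 1.5, EY^2 3.0, variance 0.75
```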