
Chapter 6
Moment Generating Functions
6.1 Definition and Properties
Our previous discussion of probability generating functions was in the context of discrete r.v.s.
Now we introduce a more general form of generating function which can be used (though not
exclusively so) for continuous r.v.s.
The moment generating function (MGF) of a random variable $X$ is defined as

$$M_X(\theta) = E(e^{\theta X}) = \begin{cases} \displaystyle\sum_x e^{\theta x} P(X = x) & \text{if } X \text{ is discrete} \\[1ex] \displaystyle\int_{-\infty}^{\infty} e^{\theta x} f_X(x)\,dx & \text{if } X \text{ is continuous} \end{cases} \qquad (6.1)$$

for all real $\theta$ for which the sum or integral converges absolutely. In some cases the existence of $M_X(\theta)$ can be a problem for non-zero $\theta$: henceforth we assume that $M_X(\theta)$ exists in some neighbourhood of the origin, $|\theta| < \theta_0$. In this case the following can be proved:

(i) There is a unique distribution with MGF $M_X(\theta)$.
(ii) Moments about the origin may be found by power series expansion: thus we may write

$$M_X(\theta) = E(e^{\theta X}) = E\left[\sum_{r=0}^{\infty} \frac{(\theta X)^r}{r!}\right] = \sum_{r=0}^{\infty} \frac{\theta^r}{r!}\,E(X^r) \quad \text{[i.e. interchange of } E \text{ and } \textstyle\sum \text{ valid]},$$

i.e.

$$M_X(\theta) = \sum_{r=0}^{\infty} \mu'_r\,\frac{\theta^r}{r!} \quad \text{where } \mu'_r = E(X^r). \qquad (6.2)$$

So, given a function which is known to be the MGF of a r.v. $X$, expansion of this function in a power series of $\theta$ gives $\mu'_r$, the $r$th moment about the origin, as the coefficient of $\theta^r/r!$.
(iii) Moments about the origin may also be found by differentiation: thus

$$\frac{d^r}{d\theta^r}\{M_X(\theta)\} = \frac{d^r}{d\theta^r}\left[E(e^{\theta X})\right] = E\left[\frac{d^r}{d\theta^r}(e^{\theta X})\right] \quad \text{(i.e. interchange of } E \text{ and differentiation valid)} = E\left[X^r e^{\theta X}\right].$$

So

$$\left[\frac{d^r}{d\theta^r}\{M_X(\theta)\}\right]_{\theta=0} = E(X^r) = \mu'_r. \qquad (6.3)$$
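For illustration (an added sketch, not part of the original notes), properties (ii) and (iii) can be checked symbolically. The snippet below assumes the exponential($\lambda$) distribution, whose MGF is $M_X(\theta) = \lambda/(\lambda - \theta)$ for $\theta < \lambda$, and recovers $E(X^r) = r!/\lambda^r$ by differentiating at $\theta = 0$:

```python
import sympy as sp

theta = sp.symbols('theta')
lam = sp.symbols('lambda', positive=True)

# Assumed example: MGF of the exponential(lambda) distribution,
# valid for theta < lambda.
M = lam / (lam - theta)

# Property (iii): the r-th moment about the origin is the r-th
# derivative of the MGF evaluated at theta = 0.
for r in range(1, 5):
    moment = sp.simplify(sp.diff(M, theta, r).subs(theta, 0))
    print(f"E(X^{r}) =", moment)   # prints r!/lambda^r
```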
(iv) If we require moments about the mean, $\mu_r = E[(X - \mu)^r]$, we consider $M_{X-\mu}(\theta)$, which can be obtained from $M_X(\theta)$ as follows:

$$M_{X-\mu}(\theta) = E\left[e^{\theta(X-\mu)}\right] = e^{-\mu\theta}\,E(e^{\theta X}) = e^{-\mu\theta} M_X(\theta). \qquad (6.4)$$

Then $\mu_r$ can be obtained as the coefficient of $\frac{\theta^r}{r!}$ in the expansion

$$M_{X-\mu}(\theta) = \sum_{r=0}^{\infty} \mu_r\,\frac{\theta^r}{r!} \qquad (6.5)$$

or by differentiation:

$$\mu_r = \left[\frac{d^r}{d\theta^r}\{M_{X-\mu}(\theta)\}\right]_{\theta=0}. \qquad (6.6)$$

(v) More generally:

$$M_{a+bX}(\theta) = E\left[e^{\theta(a+bX)}\right] = e^{a\theta} M_X(b\theta). \qquad (6.7)$$
Example
Find the MGF of the $N(0, 1)$ distribution and hence of $N(\mu, \sigma^2)$. Find the moments about the mean of $N(\mu, \sigma^2)$.
Solution If $Z \sim N(0, 1)$,

$$\begin{aligned} M_Z(\theta) = E(e^{\theta Z}) &= \int_{-\infty}^{\infty} e^{\theta z}\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}z^2}\,dz \\ &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\{-\tfrac{1}{2}(z^2 - 2\theta z + \theta^2) + \tfrac{1}{2}\theta^2\}\,dz \\ &= \exp(\tfrac{1}{2}\theta^2)\,\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\{-\tfrac{1}{2}(z - \theta)^2\}\,dz. \end{aligned}$$

But here $\frac{1}{\sqrt{2\pi}}\exp\{\dots\}$ is the p.d.f. of $N(\theta, 1)$, so

$$M_Z(\theta) = \exp(\tfrac{1}{2}\theta^2). \qquad (6.8)$$

If $X = \mu + \sigma Z$, then $X \sim N(\mu, \sigma^2)$, and

$$M_X(\theta) = M_{\mu + \sigma Z}(\theta) = e^{\mu\theta} M_Z(\sigma\theta) \ \text{ by (6.7)} \ = \exp(\mu\theta + \tfrac{1}{2}\sigma^2\theta^2).$$
Then

$$M_{X-\mu}(\theta) = e^{-\mu\theta} M_X(\theta) = \exp(\tfrac{1}{2}\sigma^2\theta^2) = \sum_{r=0}^{\infty} \frac{(\tfrac{1}{2}\sigma^2\theta^2)^r}{r!} = \sum_{r=0}^{\infty} \frac{\sigma^{2r}}{2^r r!}\,\theta^{2r} = \sum_{r=0}^{\infty} \frac{\sigma^{2r}}{2^r}\cdot\frac{(2r)!}{r!}\cdot\frac{\theta^{2r}}{(2r)!}.$$

Using property (iv) above, we obtain

$$\mu_{2r+1} = 0, \quad r = 1, 2, \dots; \qquad \mu_{2r} = \frac{\sigma^{2r}(2r)!}{2^r r!}, \quad r = 0, 1, 2, \dots \qquad (6.9)$$

e.g. $\mu_2 = \sigma^2$; $\mu_4 = 3\sigma^4$.
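As a quick symbolic check of (6.9) (an added sketch, assuming the SymPy library), differentiating $M_{X-\mu}(\theta) = \exp(\tfrac{1}{2}\sigma^2\theta^2)$ at $\theta = 0$, as in (6.6), reproduces the vanishing odd central moments and $\mu_2 = \sigma^2$, $\mu_4 = 3\sigma^4$:

```python
import sympy as sp

theta = sp.symbols('theta')
sigma = sp.symbols('sigma', positive=True)

# MGF of X - mu for X ~ N(mu, sigma^2), from (6.4) and (6.8).
M_central = sp.exp(sp.Rational(1, 2) * sigma**2 * theta**2)

# Central moments via (6.6): mu_r = [d^r/dtheta^r M_{X-mu}(theta)]_{theta=0}.
for r in range(1, 7):
    mu_r = sp.simplify(sp.diff(M_central, theta, r).subs(theta, 0))
    print(f"mu_{r} =", mu_r)
# Output: 0, sigma**2, 0, 3*sigma**4, 0, 15*sigma**6 -- matching (6.9).
```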
6.2 Sum of independent variables
Theorem
Let $X, Y$ be independent r.v.s with MGFs $M_X(\theta)$, $M_Y(\theta)$ respectively. Then

$$M_{X+Y}(\theta) = M_X(\theta) \cdot M_Y(\theta). \qquad (6.10)$$
Proof

$$M_{X+Y}(\theta) = E\left[e^{\theta(X+Y)}\right] = E\left[e^{\theta X} \cdot e^{\theta Y}\right] = E(e^{\theta X}) \cdot E(e^{\theta Y}) \ \text{[independence]} \ = M_X(\theta) \cdot M_Y(\theta).$$
Corollary If $X_1, X_2, \dots, X_n$ are independent r.v.s,

$$M_{X_1 + X_2 + \cdots + X_n}(\theta) = M_{X_1}(\theta) \cdot M_{X_2}(\theta) \cdots M_{X_n}(\theta). \qquad (6.11)$$
Note: If $X$ is a count r.v. with PGF $G_X(s)$ and MGF $M_X(\theta)$,

$$M_X(\theta) = G_X(e^{\theta}); \quad G_X(s) = M_X(\log s). \qquad (6.12)$$
Here the PGF is generally preferred, so we shall concentrate on the MGF applied to continuous
r.v.s.
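To illustrate (6.12) (an added sketch, not from the notes), the Poisson($\lambda$) PGF $G_X(s) = e^{\lambda(s-1)}$ of (3.7) converts to the corresponding MGF and back under the stated substitutions:

```python
import sympy as sp

theta = sp.symbols('theta')
s, lam = sp.symbols('s lambda', positive=True)

# Poisson(lambda) PGF, as in (3.7).
G = sp.exp(lam * (s - 1))

# (6.12): M_X(theta) = G_X(e^theta) ...
M = G.subs(s, sp.exp(theta))
print(M)                                        # exp(lambda*(exp(theta) - 1))

# ... and G_X(s) = M_X(log s) recovers the PGF.
print(sp.simplify(M.subs(theta, sp.log(s))))    # exp(lambda*(s - 1))
```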
Example
Let $Z_1, \dots, Z_n$ be independent $N(0, 1)$ r.v.s. Show that

$$V = Z_1^2 + \cdots + Z_n^2 \sim \chi^2_n. \qquad (6.13)$$
Solution Let $Z \sim N(0, 1)$. Then

$$M_{Z^2}(\theta) = E\left[e^{\theta Z^2}\right] = \int_{-\infty}^{\infty} e^{\theta z^2}\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}z^2}\,dz = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2}(1 - 2\theta)z^2\}\,dz.$$

Assuming $\theta < \tfrac{1}{2}$, substitute $y = \sqrt{1 - 2\theta}\,z$. Then

$$M_{Z^2}(\theta) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}y^2} \cdot \frac{1}{\sqrt{1 - 2\theta}}\,dy = (1 - 2\theta)^{-\frac{1}{2}}, \quad \theta < \tfrac{1}{2}. \qquad (6.14)$$
Hence

$$M_V(\theta) = (1 - 2\theta)^{-\frac{1}{2}} \cdot (1 - 2\theta)^{-\frac{1}{2}} \cdots (1 - 2\theta)^{-\frac{1}{2}} = (1 - 2\theta)^{-n/2}, \quad \theta < \tfrac{1}{2}.$$
Now $\chi^2_n$ has the p.d.f.

$$\frac{1}{2^{n/2}\,\Gamma(\tfrac{n}{2})}\,w^{\frac{n}{2}-1} e^{-\frac{1}{2}w}, \quad w \geq 0; \ n \text{ a positive integer.}$$
Its MGF is

$$\int_0^{\infty} e^{\theta w}\,\frac{1}{2^{n/2}\,\Gamma(\tfrac{n}{2})}\,w^{\frac{n}{2}-1} e^{-\frac{1}{2}w}\,dw = \int_0^{\infty} \frac{1}{2^{n/2}\,\Gamma(\tfrac{n}{2})}\,w^{\frac{n}{2}-1} \exp\{-\tfrac{1}{2}w(1 - 2\theta)\}\,dw$$

(substituting $t = \tfrac{1}{2}w(1 - 2\theta)$, $\theta < \tfrac{1}{2}$)

$$= (1 - 2\theta)^{-\frac{n}{2}}\,\frac{1}{\Gamma(\tfrac{n}{2})} \int_0^{\infty} t^{\frac{n}{2}-1} e^{-t}\,dt = (1 - 2\theta)^{-\frac{n}{2}}, \quad \theta < \tfrac{1}{2},$$

which equals $M_V(\theta)$.
So we deduce that $V \sim \chi^2_n$. Also, from $M_{Z^2}(\theta)$ we deduce that $Z^2 \sim \chi^2_1$.
If $V_1 \sim \chi^2_{n_1}$, $V_2 \sim \chi^2_{n_2}$ and $V_1, V_2$ are independent, then

$$M_{V_1 + V_2}(\theta) = M_{V_1}(\theta) \cdot M_{V_2}(\theta) = (1 - 2\theta)^{-\frac{n_1}{2}} (1 - 2\theta)^{-\frac{n_2}{2}} \quad (\theta < \tfrac{1}{2}) \quad = (1 - 2\theta)^{-(n_1 + n_2)/2}.$$

So $V_1 + V_2 \sim \chi^2_{n_1 + n_2}$. [This was also shown in Example 3, §5.8.2.]
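The χ² result can also be seen by simulation (an added sketch; the choices of $n$ and sample size are arbitrary): sums of $n$ squared independent $N(0, 1)$ draws should match the quantiles of $\chi^2_n$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 5, 100_000

# V = Z_1^2 + ... + Z_n^2 for independent standard normals, as in (6.13).
v = (rng.standard_normal((reps, n)) ** 2).sum(axis=1)

# Compare empirical quantiles with the chi-squared_n quantiles.
for q in (0.25, 0.50, 0.90):
    print(q, round(np.quantile(v, q), 3), round(stats.chi2.ppf(q, df=n), 3))
```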
6.3 Bivariate MGF
The bivariate MGF (or joint MGF) of the continuous r.v.s $(X, Y)$ with joint p.d.f. $f(x, y)$, $-\infty < x, y < \infty$, is defined as

$$M_{X,Y}(\theta_1, \theta_2) = E\left[e^{\theta_1 X + \theta_2 Y}\right] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{\theta_1 x + \theta_2 y} f(x, y)\,dx\,dy, \qquad (6.15)$$
provided the integral converges absolutely (there is a similar definition for the discrete case). If $M_{X,Y}(\theta_1, \theta_2)$ exists near the origin, for $|\theta_1| < \theta_{10}$, $|\theta_2| < \theta_{20}$ say, then it can be shown that

$$\left[\frac{\partial^{r+s} M_{X,Y}(\theta_1, \theta_2)}{\partial\theta_1^r\,\partial\theta_2^s}\right]_{\theta_1 = \theta_2 = 0} = E(X^r Y^s). \qquad (6.16)$$
The bivariate MGF can also be used to find the MGF of $aX + bY$, since

$$M_{aX+bY}(\theta) = E\left[e^{\theta(aX+bY)}\right] = E\left[e^{(a\theta)X + (b\theta)Y}\right] = M_{X,Y}(a\theta, b\theta). \qquad (6.17)$$
Example Bivariate Normal distribution
Using MGFs:
(i) show that if $(U, V) \sim N(0, 0; 1, 1; \rho)$, then $\rho(U, V) = \rho$, and deduce $\rho(X, Y)$, where $(X, Y) \sim N(\mu_x, \mu_y; \sigma_x^2, \sigma_y^2; \rho)$;
(ii) for the variables $(X, Y)$ in (i), find the distribution of a linear combination $aX + bY$, and generalise the result obtained to the multivariate Normal case.
Solution
(i) We have

$$\begin{aligned} M_{U,V}(\theta_1, \theta_2) &= E(e^{\theta_1 U + \theta_2 V}) \\ &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{\theta_1 u + \theta_2 v}\,\frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[u^2 - 2\rho uv + v^2\right]\right\} du\,dv \\ &= \frac{1}{2\pi\sqrt{1-\rho^2}} \int\!\!\int \exp\{\dots\}\,du\,dv \\ &= \dots = \exp\{\tfrac{1}{2}(\theta_1^2 + 2\rho\theta_1\theta_2 + \theta_2^2)\}. \end{aligned}$$
Then

$$\frac{\partial M_{U,V}(\theta_1, \theta_2)}{\partial\theta_1} = \exp\{\dots\}\,(\theta_1 + \rho\theta_2),$$

$$\frac{\partial^2 M_{U,V}(\theta_1, \theta_2)}{\partial\theta_2\,\partial\theta_1} = \exp\{\dots\}\,(\theta_1 + \rho\theta_2)(\rho\theta_1 + \theta_2) + \rho\exp\{\dots\}.$$

So

$$E(UV) = \left[\frac{\partial^2 M_{U,V}(\theta_1, \theta_2)}{\partial\theta_2\,\partial\theta_1}\right]_{\theta_1 = \theta_2 = 0} = \rho.$$
Since $E(U) = E(V) = 0$ and $\text{Var}(U) = \text{Var}(V) = 1$, we have that the correlation coefficient of $U, V$ is

$$\rho(U, V) = \frac{\text{Cov}(U, V)}{\sqrt{\text{Var}(U)\cdot\text{Var}(V)}} = \frac{E(UV) - E(U)E(V)}{1} = \rho.$$
Now let

$$X = \mu_x + \sigma_x U, \quad Y = \mu_y + \sigma_y V.$$

Then, as we have seen in Example 1, §5.8.2,

$$(U, V) \sim N(0, 0; 1, 1; \rho) \implies (X, Y) \sim N(\mu_x, \mu_y; \sigma_x^2, \sigma_y^2; \rho).$$

It is readily shown that a correlation coefficient remains unchanged under a linear transformation of variables, so $\rho(X, Y) = \rho(U, V) = \rho$.
(ii) We have that

$$M_{X,Y}(\theta_1, \theta_2) = E\left[e^{\theta_1(\mu_x + \sigma_x U) + \theta_2(\mu_y + \sigma_y V)}\right] = e^{(\theta_1\mu_x + \theta_2\mu_y)}\,M_{U,V}(\theta_1\sigma_x, \theta_2\sigma_y)$$

$$= \exp\{(\theta_1\mu_x + \theta_2\mu_y) + \tfrac{1}{2}(\theta_1^2\sigma_x^2 + 2\rho\theta_1\theta_2\sigma_x\sigma_y + \theta_2^2\sigma_y^2)\}.$$
So, for a linear combination of $X$ and $Y$,

$$M_{aX+bY}(\theta) = M_{X,Y}(a\theta, b\theta) = \exp\{(a\mu_x + b\mu_y)\theta + \tfrac{1}{2}(a^2\sigma_x^2 + 2ab\,\text{Cov}(X, Y) + b^2\sigma_y^2)\theta^2\},$$

which is the MGF of $N(a\mu_x + b\mu_y,\ a^2\sigma_x^2 + 2ab\,\text{Cov}(X, Y) + b^2\sigma_y^2)$, i.e.

$$aX + bY \sim N(aE(X) + bE(Y),\ a^2\text{Var}(X) + 2ab\,\text{Cov}(X, Y) + b^2\text{Var}(Y)). \qquad (6.18)$$
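The result (6.18) is easy to check by simulation (an added sketch; the parameter values below are arbitrary choices, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = [1.0, -2.0]
cov = [[4.0, 1.5],        # Var(X) = 4, Cov(X, Y) = 1.5
       [1.5, 9.0]]        # Var(Y) = 9
a, b = 2.0, 3.0

xy = rng.multivariate_normal(mu, cov, size=200_000)
w = a * xy[:, 0] + b * xy[:, 1]

# Empirical mean and variance vs the values predicted by (6.18).
print(w.mean(), a * mu[0] + b * mu[1])                                       # ~ -4
print(w.var(), a**2 * cov[0][0] + 2 * a * b * cov[0][1] + b**2 * cov[1][1])  # ~ 115
```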
More generally, let $(X_1, \dots, X_n)$ be multivariate normally distributed. Then, by induction,

$$\sum_{i=1}^{n} a_i X_i \sim N\left(\sum_{i=1}^{n} a_i E(X_i),\ \sum_{i=1}^{n} a_i^2\,\text{Var}(X_i) + 2\sum_{i<j} a_i a_j\,\text{Cov}(X_i, X_j)\right). \qquad (6.19)$$
(If the Xs are also independent, the covariance terms vanish but then there is a simpler
derivation (see HW 8).)
6.4 Sequences of r.v.s
6.4.1 Continuity theorem
First we state (without proof) the following:
Theorem
Let $X_1, X_2, \dots$ be a sequence of r.v.s (discrete or continuous) with c.d.f.s $F_{X_1}(x), F_{X_2}(x), \dots$ and MGFs $M_{X_1}(\theta), M_{X_2}(\theta), \dots$, and suppose that, as $n \to \infty$,

$$M_{X_n}(\theta) \to M_X(\theta) \quad \text{for all } \theta,$$

where $M_X(\theta)$ is the MGF of some r.v. $X$ with c.d.f. $F_X(x)$. Then

$$F_{X_n}(x) \to F_X(x) \quad \text{as } n \to \infty$$

at each $x$ where $F_X(x)$ is continuous.
Example
Using MGFs, discuss the limit of $\text{Bin}(n, p)$ as $n \to \infty$, $p \to 0$ with $np = \lambda > 0$ fixed.
Solution Let $X_n \sim \text{Bin}(n, p)$, with PGF $G_{X_n}(s) = (ps + q)^n$. Then

$$M_{X_n}(\theta) = G_{X_n}(e^{\theta}) = (pe^{\theta} + q)^n = \left\{1 + \frac{\lambda}{n}(e^{\theta} - 1)\right\}^n \quad \text{where } \lambda = np.$$

Let $n \to \infty$, $p \to 0$ in such a way that $\lambda$ remains fixed. Then

$$M_{X_n}(\theta) \to \exp\{\lambda(e^{\theta} - 1)\} \quad \text{as } n \to \infty,$$

since

$$\left(1 + \frac{a}{n}\right)^n \to e^a \quad \text{as } n \to \infty, \ a \text{ constant}, \qquad (6.20)$$

i.e.

$$M_{X_n}(\theta) \to \text{MGF of Poisson}(\lambda) \qquad (6.21)$$

(use (6.12), replacing $s$ by $e^{\theta}$ in the Poisson PGF (3.7)). So, invoking the above continuity theorem,

$$\text{Bin}(n, p) \to \text{Poisson}(\lambda) \qquad (6.22)$$

as $n \to \infty$, $p \to 0$ with $np = \lambda > 0$ fixed. Hence in large samples, the binomial distribution can be approximated by the Poisson distribution. As a rule of thumb: the approximation is acceptable when $n$ is large, $p$ small, and $\lambda = np \leq 5$.
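Numerically (an added sketch; the parameter values are arbitrary), the $\text{Bin}(n, p)$ and $\text{Poisson}(np)$ pmfs are already very close in this regime:

```python
import numpy as np
from scipy import stats

n, p = 1000, 0.003          # lambda = np = 3: n large, p small
k = np.arange(15)

binom_pmf = stats.binom.pmf(k, n, p)
pois_pmf = stats.poisson.pmf(k, n * p)

# Largest pointwise discrepancy between the two pmfs.
print(np.abs(binom_pmf - pois_pmf).max())   # of order 1e-4
```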
6.4.2 Asymptotic normality
Let $\{X_n\}$ be a sequence of r.v.s (discrete or continuous). If two quantities $a$ and $b$ can be found such that

$$\text{c.d.f. of } \frac{(X_n - a)}{b} \to \text{c.d.f. of } N(0, 1) \quad \text{as } n \to \infty, \qquad (6.23)$$

$X_n$ is said to be asymptotically normally distributed with mean $a$ and variance $b^2$, and we write

$$\frac{X_n - a}{b} \xrightarrow{a} N(0, 1) \quad \text{or} \quad X_n \stackrel{a}{\sim} N(a, b^2). \qquad (6.24)$$

Notes: (i) $a$ and $b$ need not be functions of $n$; but often $a$ and $b^2$ are the mean and variance of $X_n$ (and so are functions of $n$).
(ii) In large samples we use $N(a, b^2)$ as an approximation to the distribution of $X_n$.
6.5 Central limit theorem
A restricted form of this celebrated theorem will now be stated and proved.
Theorem
Let $X_1, X_2, \dots$ be a sequence of independent identically distributed r.v.s, each with mean $\mu$ and variance $\sigma^2$. Let

$$S_n = X_1 + X_2 + \cdots + X_n, \quad Z_n = \frac{(S_n - n\mu)}{\sigma\sqrt{n}}.$$

Then

$$Z_n \xrightarrow{a} N(0, 1), \quad \text{i.e. } P(Z_n \leq z) \to P(Z \leq z) \text{ as } n \to \infty, \text{ where } Z \sim N(0, 1),$$

and $S_n \stackrel{a}{\sim} N(n\mu, n\sigma^2)$.
Proof Let $Y_i = X_i - \mu$ ($i = 1, 2, \dots$). Then $Y_1, Y_2, \dots$ are i.i.d. r.v.s, and

$$S_n - n\mu = X_1 + \cdots + X_n - n\mu = Y_1 + \cdots + Y_n.$$

So

$$M_{S_n - n\mu}(\theta) = M_{Y_1}(\theta) \cdot M_{Y_2}(\theta) \cdots M_{Y_n}(\theta) = \{M_Y(\theta)\}^n,$$
and

$$M_{Z_n}(\theta) = M_{\frac{S_n - n\mu}{\sigma\sqrt{n}}}(\theta) = E\left[\exp\left\{\frac{S_n - n\mu}{\sigma\sqrt{n}}\,\theta\right\}\right] = E\left[\exp\left\{(S_n - n\mu)\left(\frac{\theta}{\sigma\sqrt{n}}\right)\right\}\right] = M_{S_n - n\mu}\left(\frac{\theta}{\sigma\sqrt{n}}\right) = \left\{M_Y\left(\frac{\theta}{\sigma\sqrt{n}}\right)\right\}^n.$$

Note that

$$E(Y) = E(X - \mu) = 0; \quad E(Y^2) = E\{(X - \mu)^2\} = \sigma^2.$$
Then

$$M_Y(\theta) = 1 + E(Y)\,\frac{\theta}{1!} + E(Y^2)\,\frac{\theta^2}{2!} + E(Y^3)\,\frac{\theta^3}{3!} + \cdots = 1 + \tfrac{1}{2}\sigma^2\theta^2 + o(\theta^2)$$
(where $o(\theta^2)$ denotes a function $g(\theta)$ such that $\frac{g(\theta)}{\theta^2} \to 0$ as $\theta \to 0$). So

$$M_{Z_n}(\theta) = \left\{1 + \tfrac{1}{2}\sigma^2\left(\frac{\theta^2}{n\sigma^2}\right) + o\left(\tfrac{1}{n}\right)\right\}^n = \left\{1 + \tfrac{1}{2}\theta^2 \cdot \tfrac{1}{n} + o\left(\tfrac{1}{n}\right)\right\}^n$$

(where $o(\tfrac{1}{n})$ denotes a function $h(n)$ such that $\frac{h(n)}{1/n} \to 0$ as $n \to \infty$).

Using the standard result (6.20), we deduce that

$$M_{Z_n}(\theta) \to \exp(\tfrac{1}{2}\theta^2) \quad \text{as } n \to \infty,$$

which is the MGF of $N(0, 1)$. So

$$\text{c.d.f. of } Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}} \to \text{c.d.f. of } N(0, 1) \quad \text{as } n \to \infty,$$

i.e.

$$Z_n \xrightarrow{a} N(0, 1) \quad \text{or} \quad S_n \stackrel{a}{\sim} N(n\mu, n\sigma^2). \qquad (6.25)$$

$\blacksquare$
Corollary
Let $\bar{X}_n = \dfrac{1}{n}\displaystyle\sum_{i=1}^{n} X_i$. Then

$$\bar{X}_n \stackrel{a}{\sim} N\left(\mu, \frac{\sigma^2}{n}\right). \qquad (6.26)$$
Proof $\bar{X}_n = W_1 + \cdots + W_n$, where $W_i = \frac{1}{n}X_i$ and $W_1, \dots, W_n$ are i.i.d. with mean $\frac{\mu}{n}$ and variance $\frac{\sigma^2}{n^2}$. So

$$\bar{X}_n \stackrel{a}{\sim} N\left(n \cdot \frac{\mu}{n},\ n \cdot \frac{\sigma^2}{n^2}\right) = N\left(\mu, \frac{\sigma^2}{n}\right).$$
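A simulation sketch of the corollary (added here; exponential(1), for which $\mu = \sigma = 1$, is an arbitrary choice): the standardised sample mean is compared with the $N(0, 1)$ c.d.f. at a few points.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps = 50, 100_000

# Sample means of n i.i.d. exponential(1) variables (mu = sigma = 1).
xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
z = (xbar - 1.0) / (1.0 / np.sqrt(n))      # (mean - mu) / (sigma / sqrt(n))

# Empirical P(Z_n <= z) vs the N(0, 1) c.d.f.
for q in (-1.645, 0.0, 1.645):
    print(q, round((z <= q).mean(), 4), round(stats.norm.cdf(q), 4))
```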
(Note: The theorem can be generalised to (a) independent r.v.s with different means and variances, and (b) dependent r.v.s, but extra conditions on the distributions are required.)
Example 1
Using the central limit theorem, obtain an approximation to Bin(n, p) for large n.
Solution Let $S_n \sim \text{Bin}(n, p)$. Then

$$S_n = X_1 + X_2 + \cdots + X_n,$$

where

$$X_i = \begin{cases} 1, & \text{if the } i\text{th trial yields a success} \\ 0, & \text{if the } i\text{th trial yields a failure.} \end{cases}$$

Also, $X_1, X_2, \dots, X_n$ are independent r.v.s with

$$E(X_i) = p, \quad \text{Var}(X_i) = pq.$$

So

$$S_n \stackrel{a}{\sim} N(np, npq),$$

i.e., for large $n$, the binomial c.d.f. is approximated by the c.d.f. of $N(np, npq)$.
[As a rule of thumb: the approximation is acceptable when $n$ is large and $p \approx \tfrac{1}{2}$, such that $np > 5$.]
Example 2
As Example 1, but for the $\chi^2_n$ distribution.
Solution Let $V_n \sim \chi^2_n$. Then we can write

$$V_n = Z_1^2 + \cdots + Z_n^2,$$

where $Z_1^2, \dots, Z_n^2$ are independent r.v.s and

$$Z_i \sim N(0, 1), \quad Z_i^2 \sim \chi^2_1; \quad E(Z_i^2) = 1, \quad \text{Var}(Z_i^2) = 2.$$

So

$$V_n \stackrel{a}{\sim} N(n, 2n).$$
Note: These are not necessarily the best approximations for large $n$. Thus

(i)

$$P(S_n \leq s) \approx P\left(Z \leq \frac{s + \tfrac{1}{2} - np}{\sqrt{npq}}\right) = \Phi\left(\frac{s + \tfrac{1}{2} - np}{\sqrt{npq}}\right), \quad \text{where } Z \sim N(0, 1).$$

The $\tfrac{1}{2}$ is a continuity correction, to take account of the fact that we are approximating a discrete distribution by a continuous one.

(ii)

$$\sqrt{2V_n} \overset{\text{approx}}{\sim} N(\sqrt{2n - 1},\ 1).$$
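The effect of the continuity correction in (i) can be seen numerically (an added sketch; parameter values are arbitrary):

```python
import numpy as np
from scipy import stats

n, p, s = 40, 0.5, 23
mu, sd = n * p, np.sqrt(n * p * (1 - p))

exact = stats.binom.cdf(s, n, p)                 # exact P(S_n <= s)
plain = stats.norm.cdf((s - mu) / sd)            # normal approx, no correction
corrected = stats.norm.cdf((s + 0.5 - mu) / sd)  # with the +1/2 correction

print(exact, plain, corrected)   # the corrected value lies closer to exact
```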
6.6 Characteristic function
The MGF does not exist unless all the moments of the distribution are finite, so many distributions (e.g. t, F) do not have MGFs. Another generating function is therefore often used.
The characteristic function of a continuous r.v. $X$ is

$$C_X(\theta) = E(e^{i\theta X}) = \int_{-\infty}^{\infty} e^{i\theta x} f(x)\,dx, \qquad (6.27)$$

where $\theta$ is real and $i = \sqrt{-1}$. $C_X(\theta)$ always exists, and has similar properties to $M_X(\theta)$. The CF uniquely determines the p.d.f.:

$$f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} C_X(\theta)\,e^{-i\theta x}\,d\theta \qquad (6.28)$$
(cf. Fourier transform). The CF is particularly useful in studying limiting distributions. How-
ever, we do not consider the CF further in this module.