Probability Theory for Econometricians
by
Anil K. Bera

1.1 Introduction.
If you look around, you will notice the world is full of uncertainty. Even with
the enormous amount of past information available, we can never tell the exact
weather conditions of tomorrow. The same is true for many economic variables, such
as stock prices, exchange rates, inflation, unemployment, interest rates, mortgage
rates, etc. [If you knew the exact future price of a major stock, you could make a
million! In that case, of course, you wouldn't be taking this course.] Then what is
the role of statistics in this uncertain world? The basic foundation of statistics
rests on the idea that there is an underlying principle or common rule in
the midst of all the chaos and irregularities. Statistics is a science to formulate
these common rules in a systematic way. Econometrics is the field of science
which deals with the application of statistics to economics. Statistics is applicable
to all branches of science and the humanities. You might have heard of fields like
sociometry, psychometry, cliometrics and biometrics. These are applications of
statistics to sociology, psychology, history and biology, respectively. The application of
statistics in economics is somewhat controversial, since, unlike the physical or biological
sciences, in economics we can't conduct purely controlled random experiments. In most
cases what we have is historical data on certain economic variables. For all practical
purposes, we can view these data as the result of some random experiment and then
use statistical tools to analyze them. For example, regarding stock price
movements, based on the available data we can try to find the underlying probability
distribution. This distribution will depend on some unknown parameters which
can be estimated using the data. We can also test some hypotheses regarding the
parameters, or we can even test whether the (assumed) probability distribution is
valid or not.
Just like any other science, in statistics there are many approaches: classical
and Bayesian, parametric and nonparametric, etc. These are not always substitutes
for each other, and in many cases they can be successfully used as complements to
each other. However, in this course we will concentrate on the classical parametric
approach.
Example 2.1.1:
You know that with real numbers we can do a lot of operations, like addition
(+), subtraction (−), multiplication (×), etc. Similar operations can also be done
with sets, e.g., we can "add" (sort of) two sets, subtract one set from another,
etc. Two very important operations are "union" and "intersection" of sets.
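These operations can be tried out directly; the short sketch below (Python's built-in set type supports all of them, and the names are only illustrative) mirrors the union, intersection, and "subtraction" just described.

```python
# Union, intersection, and difference ("subtraction") of two sets.
A = {1, 2, 3}
B = {3, 4, 5}

union = A | B          # A ∪ B: elements in A or B (or both)
intersection = A & B   # A ∩ B: elements in both A and B
difference = A - B     # elements of A that are not in B

print(union)         # {1, 2, 3, 4, 5}
print(intersection)  # {3}
print(difference)    # {1, 2}
```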
of sets A_1, A_2, A_3, as

∪_{i=1}^{3} A_i = A_1 ∪ A_2 ∪ A_3.

As in the case of "union", we can also define the operation "∩" for more than two
sets. For example,

∩_{i=1}^{n} A_i = A_1 ∩ A_2 ∩ ... ∩ A_n = {x | x ∈ A_i, for all i = 1, 2, ..., n}

∩_{i=1}^{∞} A_i = A_1 ∩ A_2 ∩ A_3 ∩ ...
Figure 2.1.1
Figure 2.1.2
Continuing with Example 2.1.1, suppose those students taking Econ 472 have
already attended Econ 402, i.e., there is no student in the Econ 472 class who is taking
Econ 402 now. Then if we talk about F ∩ G, the set will be empty. We will call
such a set a null set and will denote it by φ. By definition, for any set A, A ∪ φ = A,
A ∩ φ = φ.
Figure 2.1.3
These identities are known as De Morgan's laws. Try to prove the following generalizations:

Let us now link up set theory with the concepts of "event" and "probability".
Suppose we throw one coin twice. The coin has two sides, head (H) and tail (T).
What are the possible outcomes?

Both tails (T T)
Both heads (H H)
Tail then head (T H)
Head then tail (H T)

Collect these together in a set D = {(T T), (H H), (T H), (H T)}; this is the
collection of all possible outcomes. We may be interested in the following special
outcomes:
So far we have considered sets which are collections of single elements, e.g., we
had a set C = {1, 2, 3, ...}. We can also think of a set whose elements are themselves sets,
i.e., a set of sets. We can call this a collection or a class of sets. By giving a
different structure to this class of sets, we can define many concepts, such as ring
and field. For our future purposes, all we need is the concept of a σ-field (sigma-field).
This will be denoted by A (script A). A σ-field is nothing but a collection
of sets A_1, A_2, A_3, ... satisfying the following properties:

(i) If A_1, A_2, A_3, ... ∈ A, then ∪_{i=1}^{∞} A_i ∈ A.
(ii) If A ∈ A, then A^c ∈ A.

In other words, A is closed under the formation of countable unions and under
complementation. From the above two conditions, it is clear that for A to be a
σ-field, the null set φ and the space Ω must belong to A.
Example 2.1.8:
Ω = R (real line)
A = {countable unions of intervals like (a, b]}
A is called the Borel field, and members of A are called Borel sets in R.
As you can guess, the word "random" is associated with some sort of uncertainty.
If we toss a coin, we know the possibilities: head (H) or tail (T); but we are
uncertain about exactly which one will appear. Therefore, "tossing a coin" can be
regarded as a random experiment where the possibilities are known but not the
exact outcome. In probability theory, the collection of all possible outcomes is
known as the sample space.
Example 2.2.1:
(i) Toss a coin. The sample space is Ω = {H, T}.
(ii) Toss two coins, or one coin twice; the sample space is
Ω = {(HH), (TT), (HT), (TH)}.
Instead of assigning symbols, we can give these outcomes some numbers (real
numbers). For example, for the above Example (i), we can define
X = 0 if the outcome is T
  = 1 if the outcome is H
Let us first formally define "probability". For Example (i), we have the sample
space Ω = {H, T}. The σ-field defined on Ω is A = {φ, Ω, {H}, {T}}. Elements
of A are called the events. "Probability" is nothing but assigning real numbers
(satisfying some conditions) to each of these events.
P : A → [0, 1]
satisfying the following axioms:
(i) P(Ω) = 1
(ii) If A_1, A_2, A_3, ... ∈ A are disjoint (i.e., A_i ∩ A_j = φ for all i ≠ j), then

P(∪_{i=1}^{∞} A_i) = Σ_{i=1}^{∞} P(A_i).
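As a quick sanity check, the two axioms can be verified numerically for the single-toss example. The sketch below (variable names are illustrative) lists the σ-field on Ω = {H, T} and checks P(Ω) = 1 and additivity for the disjoint events {H} and {T}.

```python
from fractions import Fraction

# Probability measure on the sigma-field of Omega = {H, T}, represented
# as a dict mapping events (frozen sets) to their probabilities.
omega = frozenset({"H", "T"})
P = {
    frozenset(): Fraction(0),
    frozenset({"H"}): Fraction(1, 2),
    frozenset({"T"}): Fraction(1, 2),
    omega: Fraction(1),
}

# Axiom (i): P(Omega) = 1.
assert P[omega] == 1

# Axiom (ii) for the disjoint events {H} and {T}: additivity.
A, B = frozenset({"H"}), frozenset({"T"})
assert A & B == frozenset()      # the events are disjoint
assert P[A | B] == P[A] + P[B]   # P(A ∪ B) = P(A) + P(B)
print("axioms hold")
```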
Example 2.2.2:
Ω = {H, T}
In other words, X(·) is a measurable function from the sample space to the
real line. "Measurability" is defined by requiring that the inverse image under X is
an element of the σ-field, i.e., an event. Recall that probability is defined only
for events. By requiring that X is measurable, in a sense, we are ensuring that it
has a probability distribution.
Example 2.2.3: Toss a coin twice; then the sample space Ω and a σ-field A can
be defined as before. Define X as the number of heads:
X = 0 for (TT)
  = 1 for (HT) or (TH)
  = 2 for (HH)
Figure 2.2.1 (the inverse image X^{-1})
Corresponding to (Ω, A, P), there exists another probability space (R, B, P^X),
where, with X : Ω → R, the induced probability P^X is defined as
P^X(B) = P(X^{-1}(B)),
P^X : B → [0, 1].
The last two columns describe the probability distribution of the random
variable X. Sometimes we will simply denote it by P(X).
x P(X)
0 1/4
1 1/2
2 1/4
Most of the time, probability distributions (of discrete random variables) are
presented this way. From the above discussion, it is clear that each such probability
distribution originates from an Ω, the sample space of a random experiment.
Definition 2.2.3: The listing of the values along with the corresponding probabilities
is called the probability distribution of a random variable.
Note: Strictly speaking, this definition applies to "discrete" random variables only.
Later, we will define "discrete" and "continuous" random variables.
Note: We will use "Pr(·)" to denote the probability of an event without defining the
set explicitly, and P(·) or P^X(·) when the set is explicitly stated in the argument.
Also note that the probability spaces for P and P^X are, respectively, (Ω, A, P) and
(R, B, P^X).
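The induced distribution P^X can be computed mechanically from the sample space. A small sketch for the two-toss example (X = number of heads; the helper names are illustrative):

```python
from fractions import Fraction
from itertools import product

# Sample space of two coin tosses; each outcome has probability 1/4.
omega = list(product("HT", repeat=2))
X = lambda w: w.count("H")  # random variable: number of heads

# Induced distribution: P^X(x) = P({w : X(w) = x}).
PX = {}
for w in omega:
    PX[X(w)] = PX.get(X(w), Fraction(0)) + Fraction(1, 4)

# X takes the values 0, 1, 2 with probabilities 1/4, 1/2, 1/4.
print(PX)
```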
Let us now provide a formal definition of the distribution function. Let
W(x) = {ω ∈ Ω | X(ω) ≤ x}.
Since X is measurable, W(x) ∈ A. In the probability space (R, B, P^X), we can
write the probability of W(x) as
P(W(x)) = P^X((−∞, x]).
This is well defined since (−∞, x] ∈ B. This probability is called the distribution
function of X, i.e.,

F(x) = Pr(X ≤ x) = P^X((−∞, x]).

Or simply

x    F(x)
0    1/4
1    3/4
2    1

If we plot it, F(x) will look as in Figure 2.3.1. Note that it is a step function.
Also notice the discontinuities at x = 0, 1 and 2.
(ii) F(x) is a nondecreasing function of x, i.e., if x_1 > x_2, then F(x_1) ≥ F(x_2).
Proof:
Figure 2.3.1
Let A_n = {ω ∈ Ω | X(ω) ≤ −n}. Then

F(−∞) = lim_{n→∞} F(−n) = lim_{n→∞} P(A_n)
      = P(lim_{n→∞} A_n)   (why?)
      = P(φ) = 0.   (why?)

Note: The first (why?) follows from the "continuity" property of P(·). It says:
if {A_n} is a monotone sequence of events, then P(lim_{n→∞} A_n) = lim_{n→∞} P(A_n).
(Try to prove this; see Workout Examples-I, Question 6.)
(v) For all x, F(x) is continuous to the right, or right continuous. [What this
really means is that F(x + 0) = F(x), where F(x + 0) = lim_{ε↓0} F(x + ε).]

Proof: Define the set

A_n = {ω ∈ Ω | X(ω) ≤ x + 1/n},

so that F(x + 1/n) = P(A_n). Then

F(x + 0) = lim_{ε↓0} F(x + ε) = lim_{n→∞} F(x + 1/n)
         = lim_{n→∞} P(A_n) = P(lim_{n→∞} A_n) = F(x).

Similarly, define

B_n = {ω ∈ Ω | X(ω) ≤ x − 1/n}.

However, lim_{n→∞} B_n = {ω ∈ Ω | X(ω) < x}. Hence,

F(x) − F(x − 0) = Pr(X = x).

For our example,

Pr(X = 0) = 1/4 > 0
Pr(X = 1) = 1/2 > 0
Pr(X = 2) = 1/4 > 0.

If Pr(X = x) = 0 for all x, then F(x) will be continuous, since in that case
F(x) = F(x + 0) = F(x − 0) for all x.
Once we have defined the distribution function, we can talk about the "probability
mass function" (for discrete variables) and the "probability density function"
(for continuous variables).
Let Ω contain a finite (or countably infinite) number of elements. Here by
countably infinite we mean one-to-one correspondence with the set N =
{1, 2, 3, ...}. To see an example, consider an experiment of tossing a coin until we
get a head. Then Ω = {H, TH, TTH, ...}. If we define X as the number of trials
to get a head, then X = 1, 2, 3, .... Denote that as Ω = {ω_1, ω_2, ω_3, ...}. Therefore,
Ω contains discrete points. For any event A ∈ A, we define the probability

P(A) = Σ_{ω_i ∈ A} P(ω_i).

A random variable X constructed on Ω will also take discrete values. Let us
now denote the range of X as X and the associated probability space as (X, B, P^X).
Therefore, we will have a discrete random variable X with a discrete probability
distribution P^X. Note that

P^X(X) = 1.
Example 2.4.1:

x    P^X
0    1/4
1    1/2
2    1/4
P^X(X) = Σ_{i=1}^{3} Pr(X = x_i) = 1.
Example 2.4.2:
(i) Toss a coin n times and let X = # of heads. Then X takes (n + 1) values, namely,
X = 0, 1, 2, ..., n. The probability distribution of X, with the corresponding
points in the sample space, can be written as

ω             X    P^X
TTTT...TTT    0    (1/2)^n
HTTT...TTT    1    (1/2)^n
THTT...TTT    1    (1/2)^n
...                (the n outcomes with X = 1 add up to n(1/2)^n)
distribution. Check here that if we add P^X over all the values of X, it is equal
to one.
(ii) Let us now consider our earlier example of tossing a coin until we get a head,
and define X = # of tosses required. Then X will take a countably infinite number of
values with the following probability distribution.

ω      x    P^X
H      1    1/2
TH     2    (1/2)^2
TTH    3    (1/2)^3
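One can check numerically that the probabilities (1/2)^x in the table above sum to one; a minimal sketch:

```python
# Partial sums of Pr(X = x) = (1/2)**x, x = 1, 2, 3, ...
# The tail beyond x = N has total mass (1/2)**N, so the sum tends to 1.
total = sum(0.5 ** x for x in range(1, 60))
print(total)  # very close to 1
```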
(iii) Now suppose X takes n values, (x_1, x_2, ..., x_n) = {x_i, i = 1, 2, ..., n}. Then

F(x) = Pr(X ≤ x) = Σ_{x_i ≤ x} p_i.

Any set of p_i's can serve our purpose. All we need is to satisfy the following two
conditions:

(i) p_i ≥ 0 ∀i.
(ii) Σ_i p_i = 1.

As we noted before, when the distribution is discrete there will be jumps in F(x),
and therefore it will not be continuous and hence not differentiable. Now suppose
F(x) is continuous and differentiable except at a few points, and define

f(x) = dF(x)/dx.
Note that for the discrete case, this probability can be written as
Figure 2.4.1
Example 2.4.3:
Let

F(x) = 0, for x < 0
     = x, for x ∈ [0, 1]
     = 1, for x > 1,

as given in Figure 2.4.2. Then

f(x) = 0, for x < 0
     = 1, for x ∈ [0, 1]
     = 0, for x > 1.

Next, let

X = expenditure on cars
Pr(X < 0) = 0
Pr(X = 0) = 0.5
f(x) = 0.5e^{−x} for x > 0.
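For this mixed distribution, the point mass at zero and the continuous part together must account for all the probability: 0.5 + ∫_0^∞ 0.5e^{−x} dx = 1. A numerical check (simple midpoint rule, truncating the upper limit at 40, where the tail is negligible):

```python
import math

# Continuous part: integrate 0.5*exp(-x) over (0, 40] by the midpoint rule.
n, upper = 200_000, 40.0
h = upper / n
cont = sum(0.5 * math.exp(-(i + 0.5) * h) * h for i in range(n))

total = 0.5 + cont  # point mass at 0 plus the continuous part
print(total)  # very close to 1
```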
Figure 2.4.2
Figure 2.4.3
Figure 2.4.4
P(A ∩ B | B) = (#cases for A ∩ B) / (#cases for B)
             = (#cases for A ∩ B / #cases in Ω) / (#cases for B / #cases in Ω)
             = P(A ∩ B | Ω) / P(B | Ω).

For example, let

Ω = {(HH), (TT), (HT), (TH)}

and A = {(HT)}, B = {(HT), (TH)}, so A ∩ B = {(HT)}.
Therefore, P(A) = 1/4, P(B) = 1/2, P(A ∩ B) = 1/4.

Let us first intuitively find the conditional probabilities. For (A|B), we know
that either (HT) or (TH) has appeared, and we want to find the probability that
(HT) has occurred. Since all the elements of Ω have equal probability, P(A|B) = 1/2.
Similarly, P(B|A) = 1, since (HT) has already occurred. Now let us use the formula
to get the conditional probabilities.

P(B|A) = P(A ∩ B)/P(A) = (1/4)/(1/4) = 1 ≠ P(B)   (Interpret this result)

Here the probability of A (or B) changes after it has been given that B (or A) has
appeared. In such a case we say that the two events A and B are dependent.

P(A|B) = P(A ∩ B)/P(B) = (1/4)/(1/2) = 1/2 ≠ P(A)   (Interpret this result)
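These conditional probabilities can be reproduced by brute-force counting over the equally likely outcomes; a sketch (the `prob` helper is illustrative):

```python
from fractions import Fraction

# Two tosses of a fair coin; all four outcomes are equally likely.
omega = [("H", "H"), ("T", "T"), ("H", "T"), ("T", "H")]
A = {("H", "T")}
B = {("H", "T"), ("T", "H")}

def prob(event):
    # Classical probability: favorable cases over total cases.
    return Fraction(len(event & set(omega)), len(omega))

P_A_given_B = prob(A & B) / prob(B)  # 1/2
P_B_given_A = prob(A & B) / prob(A)  # 1
print(P_A_given_B, P_B_given_A)
```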
Proof:

(i) P(A|B) = P(A ∩ B)/P(B) ≥ 0.

(ii) P(Ω|B) = P(Ω ∩ B)/P(B) = P(B)/P(B) = 1.

(iii) Let A_1, A_2, A_3, ... be a sequence of disjoint events; then

P(∪_{i=1}^{∞} A_i | B) = Σ_{i=1}^{∞} P(A_i|B).

Note that the (A_i ∩ B)'s are disjoint, since (A_i ∩ B) ∩ (A_j ∩ B) = A_i ∩ A_j ∩ B = φ for
i ≠ j.
Note: Conditional distributions are very useful in many practical applications,
such as:
(iii) Wage differentials: wage distributions could be different for unionized and
non-unionized workers.
Proof: We have

A = A ∩ (∪_{i=1}^{n} B_i) = ∪_{i=1}^{n} (A ∩ B_i).

Note that (A ∩ B_i), i = 1, 2, ..., n are disjoint events (why?). Therefore,

P(A) = Σ_{i=1}^{n} P(A ∩ B_i) = Σ_{i=1}^{n} P(B_i)·P(A|B_i),

i.e.,

P(A) = Σ_{i=1}^{n} P(B_i)P(A|B_i).   (1)
Theorem (Bayes) 2.6.1: Let {B_i}, i = 1, 2, ..., n be n disjoint events in A, i.e.,
B_i ∩ B_j = φ, ∀i ≠ j, and let ∪_{i=1}^{n} B_i = Ω. Then for any event A ∈ A for which
P(A|B_j) is defined,

P(B_j|A) = P(A|B_j)P(B_j) / Σ_{i=1}^{n} P(A|B_i)P(B_i).   (2)

Proof:

P(B_j|A) = P(B_j ∩ A)/P(A)
         = P(A|B_j)P(B_j)/P(A)   (using the definition of conditional probability)
         = P(A|B_j)P(B_j) / Σ_{i=1}^{n} P(A|B_i)P(B_i)   (using (1) above).
Example 2.6.1: The customer service manager for PROTRAC is responsible for
expediting late orders. To do this job effectively, when an order is late he must
determine whether the lateness is caused by an ordering error or a delivery error. If an
order is late, one or the other of these two types of errors must have occurred.
Because of the way in which this system is designed, both errors cannot occur on
the same order. From past experience, he knows that an ordering error will cause
8 out of 20 deliveries to be late, whereas a delivery error will cause 8 out of 10
deliveries to be late. Historically, out of 1000 orders, 30 ordering errors and 10
delivery errors have occurred.
Assume that an order is late. If the customer service manager wishes to look
first for the type of error that has the largest probability of occurring, should he
look for an ordering error or a delivery error?

Solution:

Let
A = the event that an order is late
B_1 = the event that an ordering error is made
B_2 = the event that a delivery error is made
B_3 = the event that no error is made

The problem is to find the maximum of P(B_1|A) and P(B_2|A). From the data
in the problem it seems reasonable to assess the following probabilities: P(B_1) =
0.03, P(B_2) = 0.01, P(B_3) = 0.96, P(A|B_1) = 0.40, P(A|B_2) = 0.80,
P(A|B_3) = 0.

P(B_1|A) = P(A|B_1)P(B_1) / [P(A|B_1)P(B_1) + P(A|B_2)P(B_2) + P(A|B_3)P(B_3)]

P(B_2|A) = P(A|B_2)P(B_2) / [P(A|B_2)P(B_2) + P(A|B_1)P(B_1) + P(A|B_3)P(B_3)]

Since P(A|B_3) = 0, we have

P(B_1|A) = (0.40)(0.03)/[(0.40)(0.03) + (0.80)(0.01)] = 0.012/0.020 = 0.6
P(B_2|A) = (0.80)(0.01)/0.020 = 0.4.

Thus the customer service manager should first check whether an ordering
error has been made. Here we should note that P(A|B_2) > P(A|B_1), yet
P(B_1|A) > P(B_2|A); i.e., although the probability of being late is higher when there
is a delivery error, the probability that a lateness will be caused by a delivery error
is lower.
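The PROTRAC calculation is easy to mechanize; a sketch of the Bayes computation using the probabilities assessed above:

```python
# Priors and likelihoods from the PROTRAC example.
P_B = [0.03, 0.01, 0.96]          # ordering error, delivery error, no error
P_A_given_B = [0.40, 0.80, 0.0]   # Pr(late | each cause)

# Law of total probability, equation (1): P(A) = sum of P(B_i) P(A|B_i).
P_A = sum(p * lik for p, lik in zip(P_B, P_A_given_B))

# Bayes theorem, equation (2): posterior probability of each cause.
posterior = [p * lik / P_A for p, lik in zip(P_B, P_A_given_B)]

print(P_A)        # ≈ 0.02
print(posterior)  # the ordering error is the more likely cause
```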
Note: Q.7 of Workout Examples-I is almost the same as the above Bayes
theorem. The only difference is that instead of n events it considers an infinite
sequence of events {B_i}, i = 1, 2, ..., ∞.
Note: The Bayes theorem was originally proved by Thomas Bayes in 1763 (although
many people doubt this, thinking the Bayes theorem is not due to Bayes!).
The theorem did not have much influence until the appearance of the classic
book: Harold Jeffreys (1950), "Theory of Probability".
In the Bayes theorem, if we interpret "A" as the sample and "B" as our "prior
information", then the theorem tells us how to revise our prior opinion in the light
of the occurrence of the sample.

Rev. T. Bayes (1763), "An Essay Towards Solving a Problem in the Doctrine
of Chances," Philosophical Transactions of the Royal Society of London, Vol. 53,
pp. 370-418.
Here X is the number of heads and Y the number of tails in two tosses of a coin.

event    x    y    P^XY
φ        0    0    0
φ        0    1    0
(TT)     0    2    1/4
φ        1    0    0
(HT)     1    1    1/4
(TH)     1    1    1/4
φ        1    2    0
(HH)     2    0    1/4
φ        2    1    0
φ        2    2    0

X\Y    0      1      2      P^X
0      0      0      1/4    1/4 = Pr(X = 0)
1      0      1/2    0      1/2 = Pr(X = 1)
2      1/4    0      0      1/4 = Pr(X = 2)
P^Y    1/4    1/2    1/4    1.0
     = Pr(Y = 0)  = Pr(Y = 1)  = Pr(Y = 2)
The joint probability distribution P^XY is graphically presented in Figure 2.7.1.
In general, we can define
P^XY(A_1 × A_2) = P(X ∈ A_1, Y ∈ A_2),
so that P^X and P^Y are induced probabilities defined on (R, B), and the
joint induced probability P^XY is defined on (R², B²). Therefore, the induced
probability space is (R², B², P^XY).
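The joint table above (X = # of heads, Y = # of tails) and its marginals can be generated programmatically; a sketch:

```python
from fractions import Fraction
from itertools import product

# Joint distribution of X = # heads and Y = # tails in two tosses.
PXY = {}
for w in product("HT", repeat=2):
    key = (w.count("H"), w.count("T"))
    PXY[key] = PXY.get(key, Fraction(0)) + Fraction(1, 4)

# Marginals: P^X(x) = sum over y of P^XY(x, y), and similarly for Y.
PX = {x: sum(p for (xx, y), p in PXY.items() if xx == x) for x in (0, 1, 2)}
PY = {y: sum(p for (x, yy), p in PXY.items() if yy == y) for y in (0, 1, 2)}

print(PX)  # matches the P^X column of the table
print(PY)  # matches the P^Y row of the table
```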
Figure 2.7.1
Note that, as we have defined in the univariate case, for the bivariate situation
the joint distribution function of X and Y can be defined as

F(x, y) = Pr(X ≤ x, Y ≤ y).

Let A = (−∞, x] × (−∞, y]. Then

F(x, y) = P^XY(A).
Figure 2.7.2
Recall that for two events A and B,

P(A|B) = P(A ∩ B)/P(B).

Similarly, for two random variables we can define the "conditional probability" of
X ∈ A_1 given Y ∈ A_2, which we will denote as P^{X|Y}(A_1|A_2), as

P^{X|Y}(A_1|A_2) = P^XY(A_1 × A_2)/P^Y(A_2).

Similarly for P^{Y|X}. If P^{X|Y}(A_1|A_2) is the same as P^X(A_1) for all A_1 and A_2, we say X and Y are
independent. In that case

P^XY(A_1 × A_2) = P^X(A_1)·P^Y(A_2).
F(x, y) = G(x)·H(y),

where G(x) and H(y) are the distribution functions of X and Y, respectively.

Proof: Sufficient part: Assume F(x, y) = G(x)·H(y). For any given intervals,
Thus we have
This is true for any intervals I_1 and I_2, and therefore for any A_1 and A_2 ∈ B
(why?), i.e.,
Necessary part: Assume X and Y are independent, and therefore, for any A_1 and
A_2,
Figure 2.7.4
Now let us assume that Ω is discrete. Then both X and Y will be discrete.
Suppose X takes values {x_i}, i = 1, 2, 3, ..., and Y takes values {y_j}, j =
1, 2, 3, .... That is, X and Y can take a finite or countably infinite number of
values. We denote

p_ij = Pr(X = x_i, Y = y_j),

which can be arranged in a table with the values y_1, y_2, y_3, ..., y_j, ... of Y in the
columns (X\Y). The p_ij's must satisfy

(i) p_ij ≥ 0 for all i, j;
(ii) Σ_i Σ_j p_ij = 1,

and the joint distribution function is F(x, y) = Σ_{x_i ≤ x} Σ_{y_j ≤ y} p_ij.
For the continuous case, we define the joint probability density function (pdf)
as

f(x, y) = ∂²F(x, y)/∂x∂y,

if F(x, y) is differentiable except on a set of points with measure zero. Conversely,

f(x, y) ≥ 0,
∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1.

(Compare this with the definition of the marginal distribution for the discrete case.)

Marginal pdf of X: g(x) = ∫_{−∞}^{∞} f(x, y) dy.
Distribution function of X: G(x) = ∫_{−∞}^{x} g(u) du.
Similarly, the marginal pdf and distribution function of Y are, respectively, given
by h(y) = ∫_{−∞}^{∞} f(x, y) dx and H(y) = ∫_{−∞}^{y} h(v) dv.
Pr[X ≤ x | y − δ ≤ Y ≤ y + δ] = Pr[X ≤ x, y − δ ≤ Y ≤ y + δ] / Pr[y − δ ≤ Y ≤ y + δ].

Similarly, using

lim_{δ→0} (1/2δ) ∫_{y−δ}^{y+δ} f(u, v) dv = f(u, y)

in both the numerator and the denominator, and letting δ → 0, we obtain

∫_{−∞}^{x} f(u, y) du / h(y).

This is defined as the conditional distribution function of X, i.e.,

F(x|y) = ∫_{−∞}^{x} f(u, y) du / h(y).
Differentiating,

f(x, y) = ∂²F(x, y)/∂x∂y = ∂/∂x [(1/2)x² + xy] = x + y.

Check that
(i) f(x, y) ≥ 0.

F(x, y) = ∫_0^x ∫_0^y f(u, v) dv du = ∫_0^x ∫_0^y (u + v) dv du
        = ∫_0^x [uv + v²/2]_0^y du = ∫_0^x (uy + y²/2) du
        = [u²y/2 + uy²/2]_0^x = (1/2)xy(x + y).

Marginal pdf of X:

g(x) = ∫_0^1 (x + v) dv = x + 1/2,   0 ≤ x ≤ 1.
Note that X and Y are not independent (why?). Given f(x, y), we can also find
probabilities like

Pr[0 ≤ X ≤ 1/2, 3/4 ≤ Y ≤ 1] = ∫_{3/4}^{1} [∫_0^{1/2} f(x, y) dx] dy,

Pr[0 ≤ X ≤ 1/2] = ∫_0^{1/2} g(x) dx,

Pr[0 ≤ X ≤ 1/2 | 3/4 ≤ Y ≤ 1] = Pr[0 ≤ X ≤ 1/2, 3/4 ≤ Y ≤ 1] / Pr[3/4 ≤ Y ≤ 1].

Q: Find the above three values.
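These three values can be approximated numerically for f(x, y) = x + y on the unit square, with g(x) = x + 1/2 as the marginal of X. A rough midpoint-rule sketch (grid size is arbitrary):

```python
# Midpoint-rule approximation on the unit square for f(x, y) = x + y.
n = 400
h = 1.0 / n
mid = [(i + 0.5) * h for i in range(n)]  # cell midpoints in (0, 1)

f = lambda x, y: x + y
joint = sum(f(x, y) * h * h
            for x in mid if x <= 0.5
            for y in mid if y >= 0.75)       # Pr[0<=X<=1/2, 3/4<=Y<=1]
marg_x = sum((x + 0.5) * h for x in mid if x <= 0.5)   # Pr[0<=X<=1/2]
marg_y = sum((y + 0.5) * h for y in mid if y >= 0.75)  # Pr[3/4<=Y<=1]
cond = joint / marg_y                         # conditional probability

print(joint, marg_x, cond)  # ≈ 9/64, 3/8, 9/22
```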
Consider our experiment of tossing a coin twice. Let us set up the following
game. As before, let X = # of heads. If

X = 0 you get $10.00
  = 1 you pay $12.00
  = 2 you get $10.00
Will you agree to play this game? The answer really depends on whether you
expect to lose or gain. The question is how we compute this expectation.
Let us formally define a random variable X on the probability space (Ω, A, P)
with distribution function F(x). Then the expectation (or mathematical expectation)
of X is said to exist if and only if ∫_{−∞}^{∞} |x| dF(x) < ∞, and it is defined as

E(X) = ∫_{−∞}^{∞} x dF(x).

For a continuous distribution with pdf f(x), this expectation can be written as

E(X) = ∫_{−∞}^{∞} x f(x) dx.

[Recall f(x) = dF(x)/dx, i.e., f(x)dx = dF(x).] For a discrete distribution with pmf
Pr(X = x_i) = p_i, i = 1, 2, ..., ∞, E(X) can be expressed as

E(X) = Σ_{i=1}^{∞} x_i p_i.
For our example, the expected value of playing the game can be calculated using
the above formula. Let Z = payoff. Then

Z = 10 with prob. 1/4
  = −12 with prob. 1/2
  = 10 with prob. 1/4

Hence

E(Z) = 10 × 1/4 − 12 × 1/2 + 10 × 1/4 = −1.

Therefore, on the average you would lose $1.00. That is, every time you play this
game you are not really going to lose a dollar, but if you play a large number
(N) of times, you can expect to lose N dollars overall.
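The claim that you lose about one dollar per play on average can be checked by simulation; a sketch (seeded for reproducibility, with the sample size chosen arbitrarily):

```python
import random

random.seed(0)

def payoff():
    # Toss a fair coin twice; the payoff depends on X = number of heads.
    x = sum(random.random() < 0.5 for _ in range(2))
    return {0: 10, 1: -12, 2: 10}[x]

n = 200_000
avg = sum(payoff() for _ in range(n)) / n
print(avg)  # close to E(Z) = -1 by the law of large numbers
```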
(i) If X = C, a constant, then E(X) = C.

Proof: Here X = C w.p. 1 and X ≠ C w.p. 0. Hence

E(X) = C·1 = C.

(ii) If E(X) exists and C is a finite constant, then E(CX) exists and E(CX) =
CE(X).

Proof: E(X) exists, therefore ∫|x| dF(x) < ∞. Now E(CX) will exist if
∫|Cx| dF(x) < ∞. But

∫|Cx| dF(x) = |C| ∫|x| dF(x) < ∞,

i.e., E(CX) exists.

(iii) Let E(X) and E(Y) exist for two random variables X and Y defined on
the same probability space (Ω, A, P). Then E(X + Y) exists and E(X + Y) =
E(X) + E(Y).

Proof:

(iv) If E(X) exists, then for any finite real numbers a and b, E(a + bX) exists
and E(a + bX) = a + bE(X).

Proof: Left as an exercise.

Note: All the above cases can be obtained as special cases of E(aX + bY) =
aE(X) + bE(Y).
Now consider a continuous function g(X).

Recall: X : Ω → R and g : R → R. If we denote by g(X) the composition g(X(ω)), then
g(X) : Ω → R.

Now the question is: if X is a random variable, is g(X) a random variable? The
answer is yes. Pick a set A ∈ B [recall the σ-field B on R] and define the set

{ω | g(X(ω)) ∈ A} = {ω | X(ω) ∈ g^{-1}(A)} = {ω | ω ∈ X^{-1}(g^{-1}(A))} ∈ A.

Then

E(g(X)) = ∫ g(x) dF(x).
3.2 Moments.

If E(X^r) exists, we call it the rth raw moment of X, or the rth moment around zero, and
denote it by μ'_r. Therefore, we have

μ'_r = E(X^r),

and if E(X^s) exists for some s > r, then E(X^r) also exists, since |x|^r ≤ |x|^s + 1
(for 0 ≤ |x| < 1 we have |x|^r < 1, and for |x| ≥ 1 we have |x|^r ≤ |x|^s).

Next we define g(x) = (x − a)^r. If E[g(X)] exists, we call it the rth moment
of X around a. If we take a = E(X), then,
if this exists, we call it the rth central moment of X, and we will denote it by μ_r.
Note that

μ_0 = 1
μ_1 = 0 (why?)
μ_2 = E(X − μ'_1)² = E[X − E(X)]²
    = E(X²) − [E(X)]² (why?)
    = μ'_2 − μ'_1².
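The identity μ_2 = μ'_2 − μ'_1² is easy to verify on a concrete discrete distribution, e.g. X = # of heads in two tosses:

```python
from fractions import Fraction

# Distribution of X = # of heads in two tosses of a fair coin.
dist = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

raw = lambda r: sum(p * x**r for x, p in dist.items())       # mu'_r
mu1 = raw(1)                                                 # E(X) = 1
central2 = sum(p * (x - mu1) ** 2 for x, p in dist.items())  # mu_2

print(mu1, central2)  # mean and variance of X
```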
μ_2 is called the variance of X and is denoted by V(X). We can find a relationship
between the raw and the central moments.
Theorem 3.2.2: If μ'_r exists, then μ_r(a) = E(X − a)^r also exists.

Proof: Since μ'_r exists, then so do μ'_1, μ'_2, ..., μ'_{r−1}. Now

|x − a|^r ≤ (|x| + |a|)^r = |x|^r + C(r,1)|x|^{r−1}|a| + ... + C(r,r−1)|x||a|^{r−1} + |a|^r.   (1)

This implies

∫|x − a|^r dF(x) ≤ ∫|x|^r dF(x) + ... + C(r,r−1)|a|^{r−1} ∫|x| dF(x) + |a|^r < ∞.

Taking a = μ'_1, the first few central moments are

r = 1: μ_1 = μ'_1 − μ'_0 μ'_1 = 0
r = 2: μ_2 = μ'_2 − μ'_1²
r = 3: μ_3 = μ'_3 − 3μ'_2 μ'_1 + 2μ'_1³ (why?)
r = 4: μ_4 = μ'_4 − 4μ'_3 μ'_1 + 6μ'_2 μ'_1² − 3μ'_1⁴ (why?)

Conversely, writing

x^r = [(x − a) + a]^r = (x − a)^r + C(r,1)(x − a)^{r−1} a + C(r,2)(x − a)^{r−2} a² + ...,

we get

∫|x|^r dF(x) ≤ ∫|x − a|^r dF(x) + ... + C(r,r−1)|a|^{r−1} ∫|x − a| dF(x) + |a|^r < ∞.

(Here C(r,k) denotes the binomial coefficient.)
Similarly, the raw moments in terms of the central moments are

μ'_r = μ_r + C(r,1) μ_{r−1} μ'_1 + C(r,2) μ_{r−2} μ'_1² + ...

μ'_3 = μ_3 + 3μ_2 μ'_1 + μ'_1³
μ'_4 = μ_4 + 4μ_3 μ'_1 + 6μ_2 μ'_1² + μ'_1⁴

Q.: Check the above results.
Lastly, we put

g(X) = X^(r) = X(X − 1)(X − 2)...(X − r + 1).

Then E[g(X)] is called the rth factorial moment of X and is denoted by μ_(r), if it
exists. It is called a factorial moment since x^(r) = x!/(x − r)!. We can show that if the
raw moment μ'_r exists, so does μ_(r). This is because

x^(1) = x
x^(2) = x² − x
x^(3) = x³ − 3x² + 2x
x^(4) = x⁴ − 6x³ + 11x² − 6x.
From the above, taking expectations,

μ_(2) = μ'_2 − μ'_1.

Conversely,

x = x^(1)
x² = x^(2) + x^(1)
x³ = x^(3) + 3x^(2) + x^(1)
We have
Extra Note:
1. You might have noticed in Section 3.1 that we say the expectation of X exists iff
∫_{−∞}^{∞} |x| dF(x) < ∞, while E(X) is defined as ∫_{−∞}^{∞} x dF(x). The question is:
why do we need this stronger condition for the existence of the expectation of X?
Consider E(X) when X is a discrete random variable taking a countably infinite
number of values x_1, x_2, ... with probabilities p_1, p_2, ..., respectively; then
E(X) = Σ_{i=1}^{∞} x_i p_i. Without absolute convergence, the value of such a sum can
depend on the order in which its terms are added, so the expectation would not be
well defined.
2. At this stage you might be wondering what the uses of all these kinds of
moments are. From the moments we can get a very good idea of the probability
distribution of a random variable. As we will see later, different measures
of central tendency, dispersion, skewness, and peakedness of distributions can be
described by moments.
∫_a^b g(x) dF(x) = lim_{n→∞} Σ_{i=1}^{n} g(ξ_i)[F(x_i) − F(x_{i−1})],

where the interval (a, b] has been divided into n subintervals (x_{i−1}, x_i], i.e., a =
x_0 < x_1 < x_2 < ... < x_n = b, and ξ_i is an arbitrary point in the subinterval
(x_{i−1}, x_i]. This is a generalization of the standard Riemann integral you already
know, namely: if F(·) is differentiable with F'(x) = f(x), then

∫_a^b g(x) dF(x) = ∫_a^b g(x) f(x) dx,

which is a standard Riemann integral. On the other hand, if F(·) is a step function
(like our distribution function in the discrete case) with jumps at x_1, x_2, ..., then

∫_a^b g(x) dF(x) = Σ_i g(x_i)[F(x_i) − F(x_{i−1})]
                 = Σ_{i=1}^{n} g(x_i) Pr(X = x_i).
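The Stieltjes sum above can be evaluated directly on a grid. The sketch below approximates ∫ x dF(x) against the step function F of the two-toss X, recovering E(X) = Σ x_i Pr(X = x_i) = 1 (grid and interval choices are arbitrary):

```python
# Step distribution function of X = # heads in two tosses:
# jumps of 1/4, 1/2, 1/4 at x = 0, 1, 2.
def F(x):
    return 0.0 if x < 0 else 0.25 if x < 1 else 0.75 if x < 2 else 1.0

a, b, n = -1.0, 3.0, 4000
grid = [a + (b - a) * i / n for i in range(n + 1)]
g = lambda x: x  # so the integral is E(X)

# Riemann-Stieltjes sum with xi_i taken as the left endpoint.
stieltjes = sum(g(grid[i - 1]) * (F(grid[i]) - F(grid[i - 1]))
                for i in range(1, n + 1))
print(stieltjes)  # ≈ E(X) = 1
```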
Therefore, the coefficient of t^i gives Pr(X = i). From P_X(t) we can also get the
factorial moments. In general,

[d^r P_X(t)/dt^r]_{t=1} = μ_(r).

Proof:
Q.E.D.
The above proof uses the fact that X_1 and X_2 are independent.
M_X(t) = E(e^{tX}) = ∫ (1 + tx + (t²/2!)x² + (t³/3!)x³ + ...) dF(x)
       = Σ_{j=0}^{∞} (t^j/j!) ∫ x^j dF(x)
       = Σ_{j=0}^{∞} (t^j/j!) μ'_j.

M_X(t) is called the moment generating function (m.g.f.) since, as we see from above,
the coefficient of t^j/j! gives μ'_j. Another way to find μ'_j is

[d^j M_X(t)/dt^j]_{t=0} = μ'_j   (check).
For the gamma distribution,

M_X(t) = E(e^{tX}) = ∫_0^∞ e^{tx} · (1/(Γ(α)β^α)) x^{α−1} e^{−x/β} dx
       = (1/(Γ(α)β^α)) ∫_0^∞ x^{α−1} e^{−(x/β)(1−βt)} dx
       = 1/(1 − βt)^α,   t < 1/β.
Therefore,

[dM_X(t)/dt]_{t=0} = (−α)(1 − βt)^{−α−1}(−β)|_{t=0} = αβ  ⟹  μ'_1 = αβ,

[d²M_X(t)/dt²]_{t=0} = (−α)(−α − 1)(1 − βt)^{−α−2}(−β)²|_{t=0} = α(α + 1)β²  ⟹  μ'_2 = α(α + 1)β².

Hence μ_2 = μ'_2 − μ'_1² = αβ². Unfortunately, the m.g.f.'s of many distributions do not exist,
as the following example shows:
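The gamma moments derived from the m.g.f. can be corroborated by numerically differentiating M_X(t) = (1 − βt)^{−α} at t = 0 with central differences (the values of α and β below are illustrative):

```python
alpha, beta = 2.0, 3.0
M = lambda t: (1.0 - beta * t) ** (-alpha)  # gamma m.g.f., valid for t < 1/beta

h = 1e-5
m1 = (M(h) - M(-h)) / (2 * h)            # ≈ mu'_1 = alpha*beta
m2 = (M(h) - 2 * M(0.0) + M(-h)) / h**2  # ≈ mu'_2 = alpha*(alpha+1)*beta**2

# Mean, second raw moment, and variance mu_2 = mu'_2 - mu'_1**2 = alpha*beta**2.
print(m1, m2, m2 - m1**2)
```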
Let Pr(X = j) = (6/π²)(1/j²), j = 1, 2, .... This is a proper probability distribution,
since

Σ_{j=1}^{∞} (6/π²)(1/j²) = (6/π²)(1 + 1/2² + 1/3² + ...) = (6/π²)(π²/6) = 1.

However, for any t > 0,

M_X(t) = Σ_{j=1}^{∞} e^{tj} (6/π²)(1/j²),

and since e^{tj}/j² → ∞ as j → ∞, the terms do not even go to zero.
This sum does not exist, and therefore M_X(t) is not defined.
However, there is a function which always exists and from which we can find
the moments. That function is called the characteristic function.
The Cauchy distribution has pdf

f(x) = (1/π) · λ/(λ² + (x − μ)²),   −∞ < x < ∞,  0 < λ < ∞.

For μ = 0, λ = 1,

f(x) = (1/π) · 1/(1 + x²),   −∞ < x < ∞.

Then

E(X) = (1/π) [∫_{−∞}^{0} x/(1 + x²) dx + ∫_{0}^{∞} x/(1 + x²) dx].

However,

∫_{−∞}^{0} x/(1 + x²) dx = (1/2) ln(1 + x²)|_{−∞}^{0} = −∞   (does not exist)

and

∫_{0}^{∞} x/(1 + x²) dx = (1/2) ln(1 + x²)|_{0}^{∞} = ∞   (does not exist),

so E(X) does not exist, even though, by symmetry,

lim_{a→∞} ∫_{−a}^{a} x/(1 + x²) dx = 0.
The characteristic function of the Cauchy distribution, however, does exist. It can
be shown that

φ_X(t) = (1/π) ∫_{−∞}^{∞} e^{itx}/(1 + x²) dx = e^{−|t|}.   (*)

Closely related is the Laplace (double exponential) density f(x) = (1/2)e^{−|x|},
−∞ < x < ∞. Its characteristic function is

φ_X(t) = ∫_{−∞}^{∞} e^{itx} (1/2)e^{−|x|} dx = (1/2) ∫_{−∞}^{∞} (cos tx + i sin tx) e^{−|x|} dx
       = ∫_0^∞ cos tx · e^{−x} dx.

Now

∫_0^∞ e^{−x} cos tx dx = [e^{−x}(t sin tx − cos tx)/(1 + t²)]_0^∞ = 1/(1 + t²).

Hence φ_X(t) = 1/(1 + t²).
In a similar way, we can show that the moment generating function for the
Laplace distribution is given by

M_X(t) = 1/(1 − t²),   |t| < 1.

Inversion theorem:

F(x_0 + a) − F(x_0 − a) = lim_{T→∞} (1/π) ∫_{−T}^{T} (sin at / t) e^{−itx_0} φ_X(t) dt.
Proof: We have

φ_X(t) = ∫ e^{itx} dF(x).

Therefore,

d^r φ_X(t)/dt^r = i^r ∫ x^r e^{itx} dF(x).

Result 5.3.5:

Proof:

φ_{X,Y}(t_1, t_2) = E(e^{it_1 X + it_2 Y}) = ∫∫ e^{it_1 x + it_2 y} dF(x, y)
Result 5.3.7: φ_X(t) is real iff X follows a symmetric distribution around zero.

Proof:

i.e., X and −X have the same c.f., i.e., the same distribution. Hence X is symmetrically
distributed around zero.

Example: Let f(x) = 1, x ∈ [0, 1]. Then

φ_X(t) = ∫_0^1 e^{itx} dx = (e^{it} − 1)/(it),

and

(1/i)[dφ_X(t)/dt]_{t=0} = 1/2 = μ'_1.
U = (X^λ − 1)/λ   (Box-Cox transformation, where λ is a constant)

Here the basic principle is: first find the distribution function of U, F(u); then
differentiate it w.r.t. u and obtain

f(u) = dF(u)/du.

The approach is best demonstrated through an example.

Example 6.1.1: Let

f(x) = (3/2)x²,  −1 ≤ x ≤ 1
     = 0,  otherwise.
Check that

∫_0^1 f(u) du = (3/2) ∫_0^1 u^{1/2} du = 1.
Now, to evaluate the above integral, consider the change of variable method. We can
write

∫ dF(x) = ∫ f(x) dx = ∫ f(h(u)) |dh(u)/du| du = ∫ f(u) du.

When the transformation is not one-to-one, with P monotone pieces,

f(u) = Σ_{i=1}^{P} δ_i(u) f[g_i^{-1}(u)] |dg_i^{-1}(u)/du|
     = Σ_{i=1}^{P} δ_i(u) f[h_i(u)] |dh_i(u)/du|,

where δ_i(u) indicates whether u lies in the range of the ith piece.
For a proof of this result see André I. Khuri (1993), Advanced Calculus with
Applications in Statistics, John Wiley & Sons, pp. 246-249.

Let X ~ N(μ, σ²), i.e.,

f(x) = (1/(√(2π) σ)) e^{−(x−μ)²/(2σ²)},   −∞ < x < ∞.

Let U = a + bX. Therefore, the inverse function h(u) is

X = −a/b + U/b = h(U).

Now |dh(u)/du| = 1/|b|; therefore,

f(u) = f(h(u)) |dh(u)/du| = f(−a/b + u/b) · (1/|b|)
     = (1/(√(2π) σ |b|)) e^{−(u − a − bμ)²/(2b²σ²)}.

Now fix the range of u; since −∞ < x < ∞ and u = a + bx, we should have
−∞ < u < ∞. Therefore,

f(u) = (1/(√(2π) σ |b|)) e^{−(u − a − bμ)²/(2b²σ²)},   −∞ < u < ∞,

i.e., U ~ N(a + bμ, b²σ²).
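The result U = a + bX ~ N(a + bμ, b²σ²) is easy to corroborate by simulation (seeded for reproducibility; the parameter values are illustrative):

```python
import random

random.seed(1)
mu, sigma, a, b = 2.0, 1.5, 1.0, -3.0

# Draw U = a + b*X with X ~ N(mu, sigma^2).
n = 100_000
u = [a + b * random.gauss(mu, sigma) for _ in range(n)]

mean = sum(u) / n
var = sum((x - mean) ** 2 for x in u) / n
print(mean, var)  # should be near a + b*mu and (b*sigma)**2
```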
Let us now consider the case when the transformation is not one-to-one. For
example, let X ~ N(0, 1), and suppose we want to find the probability densities of |X| and
X². These functions are not one-to-one.
First consider the case of U = g(X) = |X|. We can partition the range of X
into two parts so that the function is one-to-one separately in those two (P = 2)
Therefore,
We know the joint pdf of X_1 and X_2, and we are interested in finding the joint pdf
of U_1 and U_2. We denote the inverse functions as

X_1 = h_1(U_1, U_2)   and   X_2 = h_2(U_1, U_2).

Suppose the joint pdf of X_1 and X_2 is given by f(x_1, x_2). The first step is to find
the Jacobian

|J| = |∂(x_1, x_2)/∂(u_1, u_2)| = |∂(h_1(u_1, u_2), h_2(u_1, u_2))/∂(u_1, u_2)|.   (1)

The second step is to obtain the joint pdf f(u_1, u_2) = f[h_1(u_1, u_2), h_2(u_1, u_2)] |J|.
The third and last step is to find the ranges of U_1 and U_2 using the relationship
between (X_1, X_2) and (U_1, U_2).

So, to summarize the three steps (for the one variable case):

(1) Find the inverse function h(u).
(2) Obtain f(u) = f[h(u)] |J|.
(3) Find the range of U using U = g(X) and the range of X, which is already known.
And then from the joint p.d.f. f(u_1, u_2), we will obtain the p.d.f. of U_1 by
integrating out u_2 (why?).
We can use either the characteristic function (c.f.) or the moment generating
function (m.g.f.), if it exists. In this approach we find the generating function of U = g(X) and
compare it with the generating function of a well-known distribution, trying to see a match.
The best way to demonstrate this approach is through an example.
U = a + bX.

φ_X(t) = (1/(√(2π) σ)) ∫_{−∞}^{∞} e^{itx} e^{−(x−μ)²/(2σ²)} dx.

Completing the square in the exponent,

itx − (x − μ)²/(2σ²) = −(1/(2σ²))[x − (μ + σ²it)]² + itμ − σ²t²/2,

and since the remaining integral is that of a normal density,

φ_X(t) = e^{itμ − σ²t²/2}.
That is, X ~ N(μ, σ²) with

f(x) = (1/(√(2π) σ)) e^{−(x−μ)²/(2σ²)},   −∞ < x < ∞,

if and only if

φ_X(t) = e^{itμ − σ²t²/2}.