Slides 5
ECON509 (Bilkent)
This Version: 16 November 2014
Introduction
In this part of the lecture notes, we will focus on properties of random samples and
consider some important statistics and their distributions, which you will encounter
in your future econometrics courses.
In addition, we will also introduce key convergence concepts, which are at the
foundation of asymptotic theory. You might find these concepts a bit too abstract,
but they are used in econometrics pretty frequently.
The joint pdf or pmf of a random sample X1, ..., Xn is

f(x1, ..., xn | θ) = ∏_{i=1}^n f(xi | θ),

where we assume that the population pdf or pmf is a member of a parametric family,
and θ is the vector of parameters.
The random sampling model in Definition 5.1.1 is sometimes called sampling from
an infinite population.
Suppose we obtain the values of X1 , ..., Xn sequentially.
First, the experiment is performed and X1 = x1 is observed.
Then, the experiment is repeated and X2 = x2 is observed.
The assumption of independence implies that the probability distribution of X2 is
unaffected by the fact that X1 = x1 was observed first.
The sample variance is

S² = (1/(n-1)) ∑_{i=1}^n (Xi - X̄)².
Theorem (5.2.4): Let x1, ..., xn be any numbers and x̄ = (x1 + ... + xn)/n. Then,
(a) min_a ∑_{i=1}^n (xi - a)² = ∑_{i=1}^n (xi - x̄)²,
(b) (n-1)s² = ∑_{i=1}^n (xi - x̄)² = ∑_{i=1}^n xi² - n x̄².

To see part (a), write

∑_{i=1}^n (xi - a)² = ∑_{i=1}^n (xi - x̄ + x̄ - a)²
= ∑_{i=1}^n (xi - x̄)² + 2 ∑_{i=1}^n (xi - x̄)(x̄ - a) + ∑_{i=1}^n (x̄ - a)²
= ∑_{i=1}^n (xi - x̄)² + n (x̄ - a)²,

since ∑_{i=1}^n (xi - x̄) = 0. Hence, the minimising value of ∑_{i=1}^n (xi - a)² is given by
a = x̄.
Part (b) can easily be proved by expanding the binomial and taking the sum.
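As a quick sanity check of part (a), the decomposition can be verified numerically. This is a sketch only; the sample `x` and the candidate values of `a` below are arbitrary choices, not from the text.

```python
# Check: sum_i (x_i - a)^2 = sum_i (x_i - xbar)^2 + n * (xbar - a)^2,
# so the sum of squares is minimised at a = xbar.
x = [2.0, 5.0, 7.0, 11.0]            # arbitrary sample
n = len(x)
xbar = sum(x) / n

def ssq(a):
    """Sum of squared deviations of the sample about a."""
    return sum((xi - a) ** 2 for xi in x)

for a in (-3.0, 0.0, 4.0, 9.5):      # arbitrary candidate values
    assert abs(ssq(a) - (ssq(xbar) + n * (xbar - a) ** 2)) < 1e-9
    assert ssq(xbar) <= ssq(a)       # xbar minimises the sum of squares
```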
Lemma (5.2.5): Let X1 , ..., Xn be a random sample from a population and let g (x )
be a function such that E [g (X1 )] and Var (g (X1 )) exist. Then
"
#
n
g (X i )
= nE [g (X1 )]
i =1
and
n
Var
g (X i )
i =1
g (X i )
i =1
i =1
i =1
since Xi are distributed identically. Note that independence is not required here.
Then,

Var( ∑_{i=1}^n g(Xi) ) = ∑_{i=1}^n Var(g(Xi)) + ∑_{i≠j} Cov(g(Xi), g(Xj)) = n Var(g(X1)),

where Cov(g(Xi), g(Xj)) = 0 for all i ≠ j,
due to independence and that Var (g (Xi )) is the same for all Xi , due to their
distributions being identical.
Theorem (5.2.6): Let X1, ..., Xn be a random sample from a population with mean μ
and variance σ² < ∞. Then,
1. E[X̄] = μ,
2. Var(X̄) = σ²/n,
3. E[S²] = σ².
Proof: First,

E[X̄] = E[ (1/n) ∑_{i=1}^n Xi ] = (1/n) ∑_{i=1}^n E[Xi] = (1/n) n μ = μ.

Then,

Var(X̄) = Var( (1/n) ∑_{i=1}^n Xi ) = (1/n²) ∑_{i=1}^n Var(Xi) = (1/n²) n σ² = σ²/n.
Finally,

E[S²] = E[ (1/(n-1)) ( ∑_{i=1}^n Xi² - n X̄² ) ]
= (1/(n-1)) ( n E[X1²] - n E[X̄²] )
= (1/(n-1)) ( n (σ² + μ²) - n (σ²/n + μ²) )
= (1/(n-1)) ( n σ² + n μ² - σ² - n μ² )
= ((n-1) σ²)/(n-1) = σ².
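A small Monte Carlo illustration of Theorem (5.2.6); this is a sketch only, with the population N(3, 4), the sample size n = 10, and the replication count all chosen arbitrarily.

```python
import random
import statistics

random.seed(0)
mu, sigma, n, reps = 3.0, 2.0, 10, 20000

means, variances = [], []
for _ in range(reps):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.mean(x))
    variances.append(statistics.variance(x))  # S^2, with the 1/(n-1) divisor

avg_mean = statistics.mean(means)        # should be close to mu = 3
var_mean = statistics.variance(means)    # should be close to sigma^2/n = 0.4
avg_s2 = statistics.mean(variances)      # should be close to sigma^2 = 4
```

Note that the unbiasedness of S² holds for every n; only the Monte Carlo error shrinks with more replications.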
In particular, if X1 , ..., Xn all have the same distribution with mgf MX (t ), then
MZ (t ) = [MX (t )]n .
Now the sequence we have is actually X1/n, ..., Xn/n. Observe that if

E[e^{tX}] = MX(t),

then

E[e^{t(X/n)}] = E[e^{(t/n)X}] = MX(t/n).

Then, for Z = X̄, Theorem (4.6.7) gives

MZ(t) = MX̄(t) = [MX(t/n)]^n.
For a random sample from a N(μ, σ²) population,

MX̄(t) = [MX(t/n)]^n = [ exp( μ(t/n) + σ²(t/n)²/2 ) ]^n
= exp( n [ μ(t/n) + σ²(t/n)²/2 ] )
= exp( μt + (σ²/n) t²/2 ).

Thus, X̄ ∼ N(μ, σ²/n).
In this example, it was helpful to use Theorem (5.2.7) because the expression for
MX̄(t) turned out to be a familiar mgf. It cannot, of course, be guaranteed that this
will always be the case. However, when it is, this result makes derivation of the
distribution of X̄ very easy.
Another example is the sample mean for a gamma(α, β) sample. The mgf of X̄ in
this case is given by

MX̄(t) = [ ( 1/(1 - β(t/n)) )^α ]^n = ( 1/(1 - (β/n)t) )^{nα},

which is the mgf of a gamma(nα, β/n) random variable.
Now, we focus on the case where the population distribution is the normal
distribution.
The results we introduce in this section will be very useful when you deal with linear
regression models.
We have already talked about distributional properties of the sample mean and
variance. Given the extra assumption of normality, we are now in a position to
determine their full distributions.
The chi squared distribution will come up frequently within this context, so we start
by introducing some important properties related to this distribution.
Remember that the chi squared pdf is a special case of the gamma pdf and is given
by

f(x) = ( 1/(Γ(p/2) 2^{p/2}) ) x^{(p/2)-1} e^{-x/2},   0 < x < ∞,
where p is called the degrees of freedom.
Lemma (5.3.2): a. If Z ∼ N(0,1), then Z² ∼ χ²₁.
In other words, the square of a standard normal random variable is a chi squared
random variable.
b. If X1, ..., Xn are independent and Xi ∼ χ²_{pi}, then

X1 + ... + Xn ∼ χ²_{p1+...+pn};
that is, independent chi squared variables add to a chi squared variable, and the
degrees of freedom also add.
To prove the second part, remember that a χ²_p random variable is a gamma(p/2, 2)
random variable. By the result obtained in Example (4.6.8), the sum of independent
gamma(αi, β) random variables is a gamma(α1 + ... + αn, β) random variable.
Then, X1 + ... + Xn above is a gamma((p1 + ... + pn)/2, 2) random variable, which
is a χ²_{p1+...+pn} random variable.
Part (a) can be proved by obtaining the pdf of the transformation Y = Z² and then
confirming that this is the pdf of a χ²₁ random variable.
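Both parts of the lemma are easy to check by simulation, using the fact that a χ²_p variable has mean p and variance 2p. A sketch with arbitrary simulation sizes:

```python
import random

random.seed(1)
reps = 100000

# Part (a): Z^2 for Z ~ N(0,1) should behave like a chi-squared(1) variable.
z2 = [random.gauss(0.0, 1.0) ** 2 for _ in range(reps)]

# Part (b): the sum of 5 independent chi-squared(1) variables should be
# chi-squared(5).
s5 = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(5)) for _ in range(reps)]

mean_z2 = sum(z2) / reps                               # chi2(1): mean 1
var_z2 = sum((v - mean_z2) ** 2 for v in z2) / reps    # chi2(1): variance 2
mean_s5 = sum(s5) / reps                               # chi2(5): mean 5
var_s5 = sum((v - mean_s5) ** 2 for v in s5) / reps    # chi2(5): variance 10
```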
A nice result about normally distributed random variables is that zero covariance
implies independence, which does not necessarily hold for other distribution
functions.
Lemma (5.3.3): Let Xj ∼ N(μj, σj²), j = 1, ..., n, be independent. For constants aij and
brj (j = 1, ..., n; i = 1, ..., k; r = 1, ..., m), where k + m ≤ n, define

Ui = ∑_{j=1}^n aij Xj,   i = 1, ..., k,

Vr = ∑_{j=1}^n brj Xj,   r = 1, ..., m.

(a) Ui and Vr are independent if and only if Cov(Ui, Vr) = 0.
(b) The vectors (U1, ..., Uk) and (V1, ..., Vm) are independent if and only if Ui is
independent of Vr for all pairs i, r.
For part (a), note that

Cov(Ui, Vr) = Cov( ∑_{j=1}^n aij Xj, ∑_{j=1}^n brj Xj ) = ∑_{j=1}^n aij brj σj²,

by the independence of the Xj.
To illustrate part (a), let X1 and X2 be independent N(0,1) random variables, with
joint pdf

f(x1, x2) = (1/(2π)) e^{-(1/2)(x1² + x2²)},   -∞ < x1, x2 < ∞.
Define

u = a1 x1 + a2 x2   and   v = b1 x1 + b2 x2,

which imply

x1 = (b2 u - a2 v)/(a1 b2 - b1 a2)   and   x2 = (a1 v - b1 u)/(a1 b2 - b1 a2),

with Jacobian

J = 1/(a1 b2 - b1 a2).
The joint pdf of (U, V) is then

f_{U,V}(u, v) = (1/(2π)) (1/|a1 b2 - b1 a2|) exp( -[(b2 u - a2 v)² + (a1 v - b1 u)²] / (2 (a1 b2 - b1 a2)²) ),

for -∞ < u, v < ∞. Now,

(b2 u - a2 v)² + (a1 v - b1 u)² = (b1² + b2²) u² + (a1² + a2²) v² - 2 (a1 b1 + a2 b2) uv.
But we know that a1 b1 + a2 b2 = 0. Hence, this shows that the joint pdf factorises
into a function of u and a function of v. Therefore, U and V are independent!

A similar technique can be utilised to prove part (b). Specifically, by using a
transformation argument, it can be shown that the joint pdf of the vectors (U1, ..., Uk)
and (V1, ..., Vm) will factorise, which will prove independence. We skip this.
The main message here is that, if we start with independent normal random
variables, zero covariance and independence are equivalent for linear functions of
these random variables. Therefore, checking independence comes down to checking
covariances.

Part (b), on the other hand, makes it possible to infer overall independence of
normal vectors by just checking pairwise independence, which is not valid for general
random variables.

Thinking about part (a), I find it intuitively helpful that the normal distribution is
completely determined by its mean and variance, while other moments do not
matter. Hence, ensuring zero covariance is equivalent to ensuring independence,
since we do not have to care about the remaining moments of the distribution.
We can show the usefulness of Lemma 5.3.3 by proving the independence of
X̄ and S² when X1, ..., Xn is sampled from a normal population distribution.

It can be shown that S² can be written as a function of (X2 - X̄, ..., Xn - X̄). Now,
if we can show that these random variables are uncorrelated with X̄, then by the
normality assumption and Lemma 5.3.3, we can conclude independence.
We have

X̄ = (1/n) ∑_{i=1}^n Xi

and

Xj - X̄ = ∑_{i=1}^n (δij - 1/n) Xi,

where

δij = 1 if i = j, and 0 otherwise.
Then, taking E[Xi] = 0 without loss of generality,

Cov(X̄, Xj - X̄) = E[ ( (1/n) ∑_{i=1}^n Xi ) ( ∑_{i=1}^n (δij - 1/n) Xi ) ] = (σ²/n) ∑_{i=1}^n (δij - 1/n).

Now,

∑_{i=1}^n (δij - 1/n) = 1 - (1/n) ∑_{i=1}^n 1 = 1 - 1 = 0.

Hence,

Cov(X̄, Xj - X̄) = 0,

and so, X̄ and Xj - X̄ are independent by Lemma 5.3.3.
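The zero covariance above can also be checked empirically. A sketch; the sample size, the coordinate j, and the replication count are arbitrary choices:

```python
import random

random.seed(2)
n, reps, j = 5, 50000, 2   # j indexes X_j (0-based), an arbitrary coordinate

xbars, devs = [], []
for _ in range(reps):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    xbar = sum(x) / n
    xbars.append(xbar)
    devs.append(x[j] - xbar)          # the deviation X_j - Xbar

m1 = sum(xbars) / reps
m2 = sum(devs) / reps
cov = sum((a - m1) * (b - m2) for a, b in zip(xbars, devs)) / reps
# Cov(Xbar, X_j - Xbar) = 0 in theory; the estimate should be near zero.
```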
Convergence Concepts
Convergence Concepts
The tools you will learn in this section are the fundamental building blocks of what
is known as asymptotic theory or large sample theory. Although a bit abstract
at first sight, these results are at the core of many proofs you will encounter in
econometrics articles.
The three important concepts we will consider in what follows are
1. almost sure convergence,
2. convergence in probability,
3. convergence in distribution.

In the first instance we will consider the case where a sequence of random variables
X1, ..., Xn exhibits the iid property. This is one (and the simplest) of many possible
dependence settings.
Convergence Concepts
Almost Sure Convergence
Remember that random variables are functions defined on the sample space, e.g.
X(ω), ω ∈ Ω. Our interest will be in a sequence of random variables indexed by
sample size, i.e. Xn(ω).

To motivate the following discussion, consider pointwise convergence:

lim_{n→∞} Xn(ω) = X(ω)   for all ω ∈ Ω.
Convergence Concepts
Almost Sure Convergence
The idea is this: pointwise convergence may fail for some points in Ω. However, the
number of such points is so small that we can safely assign zero measure (or zero
probability) to the set of these points.
Other ways of expressing this definition are

P( lim_{n→∞} |Xn - X| > ε ) = 0

or

P( ω ∈ Ω : lim_{n→∞} Xn(ω) = X(ω) ) = 1.
Convergence Concepts
Almost Sure Convergence
This time, we have a difference. Convergence fails on a very small set E ⊂ Ω such
that P(E) = 0.

That P(E) = 0 is due to the set being so small that we can safely assign zero
probability to it.

Remember that in earlier lectures we stated that for a continuously distributed
random variable, the probability of a single point is always equal to zero. This is
similar, in spirit, to the situation at hand.

This type of convergence is also called convergence almost everywhere and
convergence with probability 1.
The following notation is common:

Xn →^{a.s.} X,   Xn →^{wp1} X,   lim_{n→∞} Xn = X a.s.
Note that, at the cost of sloppy notation, the argument ω of Xn(ω) is usually dropped.
Also, Xn(ω) need not converge to a function. It can also simply converge to some
constant, say, a.
Convergence Concepts
Almost Sure Convergence
Consider the sample mean as a leading case:

Xn(ω) = (1/n) ∑_{i=1}^n Zi(ω).
Then, almost sure convergence is a statement about the joint distribution of the entire
sequence {Zi(ω)}.

White (2001, p.19): The probability measure P determines the joint distribution
of the entire sequence {Zi(ω)}. A sequence Xn(ω) converges almost surely if the
probability of obtaining a realisation of the sequence {Zi(ω)} for which convergence
to X(ω) occurs is unity.
Convergence Concepts
Almost Sure Convergence
The definition extends to vectors Xn = (X1,n, ..., Xd,n)′: we say Xn →^{a.s.} X if
Xd,n →^{a.s.} Xd for each component. Equivalently, Xn →^{a.s.} X if

lim_{n→∞} ||Xn(ω) - X(ω)|| = 0,

for all ω ∈ Ω outside a set with zero probability.
Whenever you see the terms almost sure, with probability one, or almost
everywhere, you should remember that the relationship referred to holds
everywhere except on some set with zero probability.
Convergence Concepts
Almost Sure Convergence
Now that we have a convergence concept in our arsenal, the next question is: under
which conditions can we use this result for statistics of interest?

One important result is Kolmogorov's Strong Law of Large Numbers (SLLN).

Theorem (Kolmogorov's Strong Law of Large Numbers): If X1, X2, ... are
independent and identically distributed random variables and the common mean is
finite, i.e. E[|X1|] < ∞ with E[X1] = μ, then
lim_{n→∞} (1/n) ∑_{i=1}^n Xi = μ   a.s.

or

(1/n) ∑_{i=1}^n Xi →^{a.s.} μ.
Note that the sample size will never be equal to ∞. However, when the sample size
is large enough, the sample mean will be very close to the population mean, μ.

What is remarkable about this result is that, as long as we have iid random
variables, the only condition is that the random variables have finite mean.

For the time being, suffice it to say that one can relax the iid assumption, but that
means we will require more than just finite means. There is a trade-off between
the amount of dependence and the strength of the moment assumptions you have to
make in order to attain convergence.
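The SLLN is easy to watch at work. A sketch for a Uniform(0,1) sample, whose mean is 1/2 (the seed and checkpoints are arbitrary choices):

```python
import random

random.seed(3)
mu, N = 0.5, 200000

total = 0.0
gaps = []                    # |sample mean - mu| at a few checkpoints
for n in range(1, N + 1):
    total += random.random()           # Uniform(0,1) draw, E|X| < infinity
    if n in (100, 10000, N):
        gaps.append(abs(total / n - mu))
# the deviation of the sample mean from mu shrinks as n grows
```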
Convergence Concepts
Convergence in Probability
Definition: A sequence of random variables X1, X2, ... converges in probability to a
random variable X if, for every ε > 0,

lim_{n→∞} P(|Xn - X| > ε) = 0.

An equivalent statement is that given ε > 0 and δ > 0, there exists an N, which
depends on both δ and ε, such that P(|Xn - X| > ε) < δ for all n > N. This
is merely a restatement using the formal definition of a limit.

One could also write

lim_{n→∞} P({ω : |Xn(ω) - X(ω)| > ε}) = 0.
Convergence Concepts
Convergence in Probability
It is not easy to see the difference between the two modes of convergence. However,
let's give it a try!

Almost sure convergence states that we have pointwise convergence for all ω ∈ Ω
except for a small, zero-measure set E. Importantly, this set is independent of n.

Convergence in probability states that as the sample size goes towards ∞, the
probability that Xn will deviate from X by more than ε decreases towards zero.
However, for any sample size, there may be a positive probability that Xn will deviate
by more than ε. In other words, for some ω ∈ En, with P(En) > 0, |Xn - X|
will be larger than ε.

Importantly, there is nothing that restricts En to be the same for all n. Hence, the
set on which Xn deviates from X may change as n increases.

The good news is, the deviation probability goes to zero: P(En) → 0 as n → ∞.
Convergence Concepts
Convergence in Probability
Consider again the sample mean,

Xn = (1/n) ∑_{i=1}^n Zi.
White (2001, p.24): With almost sure convergence, the probability measure P
takes into account the joint distribution of the entire sequence {Zi}, but with
convergence in probability, we only need concern ourselves sequentially with the joint
distribution of the elements of {Zi} that actually appear in Xn, typically the first n.
Convergence Concepts
Convergence in Probability

[Figure: Xn(ω) - X(ω) for some given ω, Case 1; trajectory plotted against n = 20, ..., 200.]

[Figure: Xn(ω) - X(ω) for some given ω, Case 2; trajectory plotted against n = 20, ..., 200.]
Convergence Concepts
Convergence in Probability
Let the sample space Ω be the closed interval [0,1] with the uniform probability
distribution, and define

X1(ω) = ω + I_{[0,1]}(ω),   X2(ω) = ω + I_{[0,1/2]}(ω),
X3(ω) = ω + I_{[1/2,1]}(ω),   X4(ω) = ω + I_{[0,1/3]}(ω),
X5(ω) = ω + I_{[1/3,2/3]}(ω),   X6(ω) = ω + I_{[2/3,1]}(ω),

etc. Let X(ω) = ω.

Do we have convergence in probability? Observe that

X1(ω) - X(ω) = I_{[0,1]}(ω),   X2(ω) - X(ω) = I_{[0,1/2]}(ω),
X3(ω) - X(ω) = I_{[1/2,1]}(ω),   X4(ω) - X(ω) = I_{[0,1/3]}(ω),
X5(ω) - X(ω) = I_{[1/3,2/3]}(ω),   X6(ω) - X(ω) = I_{[2/3,1]}(ω),

etc.
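This is the classic example of a sequence that converges in probability but not almost surely: the indicator intervals sweep across [0,1] in blocks (block k consists of the k intervals of length 1/k), so P(|Xn - X| > ε) equals the current interval length and tends to zero, yet every ω is covered in every block, so Xn(ω) fails to converge for each ω. A deterministic sketch of both facts (the sample point ω = 2/5 and the number of blocks are arbitrary choices):

```python
from fractions import Fraction

omega = Fraction(2, 5)       # an arbitrary sample point in [0, 1]
K = 20                       # number of blocks to inspect (arbitrary)

lengths = []                 # P(|X_n - X| > eps) within block k is 1/k
hits_per_block = []          # times the indicator equals 1 at omega in block k
for k in range(1, K + 1):
    lengths.append(Fraction(1, k))
    hits = sum(
        1
        for j in range(k)
        if Fraction(j, k) <= omega <= Fraction(j + 1, k)
    )
    hits_per_block.append(hits)

# lengths -> 0 (convergence in probability), but every block hits omega at
# least once, so the deviation equals 1 infinitely often (no a.s. convergence).
```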
Convergence Concepts
Convergence in Probability
Example (5.5.7): Let the sample space be the closed interval [0,1] with the
uniform probability distribution. Define random variables

Xn(ω) = ω + ω^n

and

X(ω) = ω.
Convergence Concepts
Convergence in Probability
When Xn converges in probability to X, we write

Xn →^p X   or   plim_{n→∞} Xn = X,

as shorthand.
Associated with convergence in probability is the Weak Law of Large Numbers
(WLLN).

Theorem (Weak Law of Large Numbers): If X1, X2, ... are iid random variables
with common mean μ < ∞ and variance σ² < ∞, then

(1/n) ∑_{i=1}^n Xi →^p μ,

as n → ∞.
(Bilkent)
ECON509
41 / 82
Convergence Concepts
Convergence in Probability
Proof: The proof uses Chebychev's Inequality. Remember this says that if X is a
random variable and g(x) is a nonnegative function, then, for any r > 0,

P(g(X) ≥ r) ≤ E[g(X)]/r.

Now, consider

P(|X̄n - μ| ≥ ε) = P((X̄n - μ)² ≥ ε²) ≤ E[(X̄n - μ)²]/ε² = Var(X̄n)/ε² = σ²/(nε²).

Hence,

P(|X̄n - μ| < ε) ≥ 1 - σ²/(nε²),

and

lim_{n→∞} [1 - σ²/(nε²)] = 1.

Hence,

lim_{n→∞} P(|X̄n - μ| < ε) = 1.
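The Chebychev bound in the proof can be compared with the actual deviation frequency by simulation. A sketch with arbitrary choices of the population N(0,1), n, and ε:

```python
import random

random.seed(4)
mu, sigma, n, eps, reps = 0.0, 1.0, 50, 0.3, 20000

exceed = 0
for _ in range(reps):
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    if abs(xbar - mu) >= eps:
        exceed += 1

freq = exceed / reps                       # empirical P(|Xbar - mu| >= eps)
bound = sigma ** 2 / (n * eps ** 2)        # Chebychev bound sigma^2/(n eps^2)
# the bound holds but is typically far from tight
```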
Convergence Concepts
Convergence in Probability
This is perhaps a good time to stop and reflect a little on these new concepts.

What the Weak and Strong LLNs are saying is that, under certain conditions, the
sample mean converges to the population mean as n → ∞. This is known as
consistency: one would say that the sample mean is a consistent estimator of the
population mean.

In actual applications, this means that if the sample size is large enough, then the
sample mean is close to the population mean. So n does not have to be that close
to infinity. On the other hand, as mentioned at the beginning, how large the
sample size should be in order to be considered large enough is a different
question in its own right. We will not deal with this here.
Sometimes, consistency is compared to unbiasedness.

An estimator θ̂ of a population value θ is an unbiased estimator if and only if
E[θ̂] = θ.

What are the things we might want to estimate? One example would be the parameters
of a distribution family. For example, we might know that the data are distributed
N(μ, σ²), but we may not know the particular values of μ and σ². In this case,
we would estimate these parameters.
Convergence Concepts
Convergence in Probability
The plim operator has some nice properties that make it much more convenient to
deal with, compared to the expectation operator. In particular, let X1, X2, ... and
Y1, Y2, ... be two random sequences and a1, a2, ... be some non-stochastic sequence.
Then, for example,

plim_{n→∞} (Xn/Yn) = (plim_{n→∞} Xn) / (plim_{n→∞} Yn),

provided plim_{n→∞} Yn ≠ 0, whereas in general

E[Xn/Yn] ≠ E[Xn]/E[Yn].
Convergence Concepts
Convergence in Probability
Note that one concept does not usually imply the other. In other words, a consistent
estimator can be biased, while an unbiased estimator can be inconsistent.

Suppose we are trying to estimate the population parameter θ. Consider the
following estimators.
1
2
Convergence Concepts
Convergence in Probability
As far as economists and most econometricians are concerned, one would not
care too much about whether convergence is achieved almost surely or in probability.
As long as convergence is achieved, the rest is not important.

However, in some cases it might be easier to prove the LLN for one of the two
convergence types. This is no problem, as →^{a.s.} implies →^{p} anyway.
Convergence Concepts
Convergence in Probability
Define the sample variance

Sn² = (1/(n-1)) ∑_{i=1}^n (Xi - X̄n)².

Using Chebychev's Inequality,

P(|Sn² - σ²| ≥ ε) ≤ E[(Sn² - σ²)²]/ε² = Var(Sn²)/ε²,

so a sufficient condition for Sn² →^p σ² is that Var(Sn²) → 0 as n → ∞.
Convergence Concepts
Convergence in Probability
Although we might have convergence results for some sample means Xn and Yn, we
might actually be interested in the convergence properties of a function of these, say
g(Xn, Yn).

Fortunately, we have the following useful result.

If (Xn, Yn) converges almost surely (in probability) to (X, Y), if g(x, y) is a
continuous function over some set D, and if the images of Ω under
[Xn(ω), Yn(ω)] and [X(ω), Y(ω)] are in D, then g(Xn, Yn) converges almost
surely (in probability) to g(X, Y).
More pragmatically,

Xn →^{a.s.} X  ⟹  g(Xn) →^{a.s.} g(X),

Xn →^{p} X  ⟹  g(Xn) →^{p} g(X).
Example (5.5.5) (Consistency of S): If Sn² is a consistent estimator of σ², then
one can show that the sample standard deviation Sn = √(Sn²) = h(Sn²) is a consistent
estimator of σ.
Convergence Concepts
Convergence in Probability
A related concept is convergence in Lp: Xn converges in Lp to X if

lim_{n→∞} E[|Xn - X|^p] = 0.

Just as a reference, Lp convergence does not imply almost sure convergence, nor
does almost sure convergence imply Lp convergence. However, Lp convergence
implies convergence in probability.
Convergence Concepts
Convergence in Probability
A uniform strong LLN strengthens the pointwise result: under suitable conditions
on g and the parameter space Θ,

lim_{n→∞} sup_{θ∈Θ} | (1/n) ∑_{i=1}^n g(Xi, θ) - E[g(X1, θ)] | = 0   a.s.

This is not a simple generalisation of the strong LLN. Generally, an LLN for each and
every θ individually would not imply a uniform LLN (except under certain assumptions).
Convergence Concepts
Convergence in Distribution
Convergence Concepts
Convergence in Distribution
Definition: A sequence of random variables X1, X2, ... converges in distribution to a
random variable X if

lim_{n→∞} F_{Xn}(x) = F_X(x)

at all points x where F_X(x) is continuous. Common notation includes

Xn →^d X,   Xn →^d F_X,   Xn →^L F_X.

It is important to underline that it is not Xn that converges to a distribution.
Instead, it is the distribution of Xn that converges to the distribution of X.
Convergence Concepts
Convergence in Distribution
Convergence in probability to a constant a,

Xn →^p a,

is equivalent to

F_{Xn}(x) = P(Xn ≤ x) → 0 if x < a,  and  → 1 if x > a.
Convergence Concepts
Convergence in Distribution
We now introduce one of the most useful theorems we have considered so far.

Theorem (5.5.15) (Central Limit Theorem): Let X1, X2, ... be a sequence of iid
random variables with E[Xi] = μ < ∞ and 0 < Var(Xi) = σ² < ∞. Define

X̄n = (1/n) ∑_{i=1}^n Xi.
Let Gn(x) denote the cdf of

√n (X̄n - μ)/σ.

Then, for -∞ < x < ∞,

lim_{n→∞} Gn(x) = ∫_{-∞}^x (1/√(2π)) e^{-y²/2} dy.

In other words,

(1/√n) ∑_{i=1}^n (Xi - μ)/σ →^d N(0, 1).
Convergence Concepts
Convergence in Distribution
This is a powerful result! We start with the iid and finite mean and variance
assumptions. In return, the Central Limit Theorem (CLT) promises us that the
distribution of the properly standardised sample mean,

(1/√n) ∑_{i=1}^n (Xi - μ)/σ,

will converge to the standard normal distribution as the sample size tends to infinity.

As before, the sample size will never be equal to ∞. BUT, for large enough samples,
(1/√n) ∑_{i=1}^n (Xi - μ)/σ will be approximately standard normal. As n becomes
larger, this approximate result will become more accurate.
As with LLNs, it is possible to obtain CLTs for non-iid data. However, this will
require one to make stronger assumptions regarding the moments of the sequence of
random variables. The trade-off between dependence and moment assumptions is
always there.
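A simulation sketch of the CLT, deliberately starting from a skewed population (Exponential(1), for which μ = σ = 1); the simulation sizes are arbitrary choices:

```python
import math
import random

random.seed(5)
n, reps = 200, 20000

z = []
for _ in range(reps):
    s = sum(random.expovariate(1.0) for _ in range(n))
    z.append((s - n * 1.0) / math.sqrt(n))   # (1/sqrt(n)) sum (X_i - mu)/sigma

mean_z = sum(z) / reps
var_z = sum((v - mean_z) ** 2 for v in z) / reps
p_neg = sum(1 for v in z if v <= 0) / reps   # should approach Phi(0) = 0.5
```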
Convergence Concepts
Convergence in Distribution
Let's prove the CLT. However, before we do that, we have to revisit Taylor
expansions.

Definition (5.5.20): If a function g(x) has derivatives of order r, that is,
g^{(r)}(x) = (d^r/dx^r) g(x) exists, then for any constant a, the Taylor polynomial of order r
about a is

Tr(x) = ∑_{i=0}^r [g^{(i)}(a)/i!] (x - a)^i.

This polynomial is used to obtain a Taylor expansion of order r about x = a.
This is given by

g(x) = Tr(x) + R,

where R = g(x) - Tr(x) is the remainder.
Convergence Concepts
Convergence in Distribution
If g^{(r)}(a) = (d^r/dx^r) g(x)|_{x=a} exists, then

lim_{x→a} [g(x) - Tr(x)] / (x - a)^r = 0.

This says that the remainder, g(x) - Tr(x), always tends to zero faster than the
highest-order term of the approximation.

Importantly, this also means that as x tends to a, the remainder term approaches 0.
Convergence Concepts
Convergence in Distribution
The strategy is to show that the mgf of √n (X̄n - μ)/σ converges to the mgf of a
N(0,1) random variable, which will prove that the distribution of √n (X̄n - μ)/σ
converges to the standard normal distribution.

Now, let Yi = (Xi - μ)/σ. Then,

M_{Yi}(t) = E[e^{tYi}] = E[e^{t(Xi - μ)/σ}] = e^{-tμ/σ} E[e^{tXi/σ}] = e^{-tμ/σ} M_{Xi}(t/σ).
Convergence Concepts
Convergence in Distribution
Note that

√n (X̄n - μ)/σ = (1/√n) ∑_{i=1}^n Yi = √n Ȳn.

Since Xi are iid, Yi are also iid. In addition, E[Yi] = 0 and Var(Yi) = 1.

Now, due to the independence and identical distribution assumptions,

M_{√n Ȳn}(t) = E[e^{(t/√n)(Y1 + ... + Yn)}] = E[e^{(t/√n)Y1}] ⋯ E[e^{(t/√n)Yn}] = [M_{Yi}(t/√n)]^n.

Let's expand M_{Yi}(t/√n) around t = 0:

M_{Yi}(t/√n) = M_{Yi}(0) + [ (d/dt) M_{Yi}(t/√n) ]_{t=0} t + (1/2) [ (d²/dt²) M_{Yi}(t/√n) ]_{t=0} t² + R_{Yi}(t/√n),

where

R_{Yi}(t/√n) = ∑_{k=3}^∞ [ (d^k/dt^k) M_{Yi}(t/√n) ]_{t=0} t^k / k!.
Convergence Concepts
Convergence in Distribution
Since the mgf exists in a neighbourhood of 0, the Taylor remainder result above
implies that R_{Yi}(t/√n) goes to zero faster than (t/√n)². Hence,

lim_{n→∞} n R_{Yi}(t/√n) = 0,   (1)

for all t, including t = 0, since R_{Yi}(0/√n) = 0.
Convergence Concepts
Convergence in Distribution
Now, since E[Yi] = 0 and Var(Yi) = 1,

[ (d/dt) M_{Yi}(t/√n) ]_{t=0} = E[Yi]/√n = 0,

[ (d²/dt²) M_{Yi}(t/√n) ]_{t=0} = E[Yi²]/n = 1/n.

Therefore,

[ M_{Yi}(t/√n) ]^n = [ 1 + 0·t + (1/2)(1/n) t² + R_{Yi}(t/√n) ]^n
= [ 1 + (1/n)( t²/2 + n R_{Yi}(t/√n) ) ]^n.
Convergence Concepts
Convergence in Distribution
Recall that if lim_{n→∞} an = a, then

lim_{n→∞} (1 + an/n)^n = e^a.

Let an = t²/2 + n R_{Yi}(t/√n). Then, by (1),

lim_{n→∞} [ M_{Yi}(t/√n) ]^n = lim_{n→∞} [ 1 + (1/n)( t²/2 + n R_{Yi}(t/√n) ) ]^n = e^{t²/2}.

But this is the mgf of the N(0,1) distribution! Hence, the CLT is proved.
Convergence Concepts
Convergence in Distribution
Example (5.5.16): Suppose (X1, ..., Xn) is a random sample from a negative
binomial(r, p) distribution. For this distribution, one can show that

E[Xi] = r(1-p)/p   and   Var(Xi) = r(1-p)/p²,

for all i. Then, the CLT tells us that

√n ( X̄ - r(1-p)/p ) / √( r(1-p)/p² ) →^d N(0, 1).
Convergence Concepts
Convergence in Distribution
One can also do exact calculations, but these would be difficult. Take r = 10, p = .5
and n = 30. Now, consider

P(X̄ ≤ 11) = P( ∑_{i=1}^{30} Xi ≤ 330 ) = ∑_{x=0}^{330} C(300 + x - 1, x) (1/2)^{300} (1/2)^x = .8916,

where the middle equality uses the fact that the sum of 30 iid negative binomial(10, .5)
variables is negative binomial(300, .5). The CLT instead approximates P(X̄ ≤ 11) by
treating √30 (X̄ - 10)/√20 as N(0, 1).
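The exact sum in Example (5.5.16) is easy to evaluate with the pmf recursion of the negative binomial, and the CLT approximation with the normal cdf. A sketch (the .8916 target is the value reported above):

```python
import math

# Sum of 30 iid negative binomial(10, 1/2) variables is negative
# binomial(300, 1/2): P(X = x) = C(300 + x - 1, x) (1/2)^300 (1/2)^x.
r, p, cutoff = 300, 0.5, 330

prob = p ** r            # P(X = 0) = p^r (about 4.9e-91, still representable)
cdf = prob
for x in range(cutoff):
    prob *= (r + x) / (x + 1) * (1 - p)   # recursion: P(X = x+1) from P(X = x)
    cdf += prob          # exact P(sum <= 330), i.e. P(Xbar <= 11)

# CLT approximation: P(Xbar <= 11) ~ Phi( sqrt(30) (11 - 10) / sqrt(20) )
zval = math.sqrt(30) * (11 - 10) / math.sqrt(20)
approx = 0.5 * (1 + math.erf(zval / math.sqrt(2)))
# cdf ~ 0.8916 (the value reported above); approx should be close to it
```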
Convergence Concepts
Convergence in Distribution
If Xn →^d X and g is a continuous function, then g(Xn) →^d g(X).

Theorem (Slutsky's Theorem): If Xn →^d X and Yn →^p k, a constant, then

Yn Xn →^d kX,

Xn + Yn →^d X + k.
Convergence Concepts
Convergence in Distribution
A leading example: suppose √n (X̄n - μ)/σ →^d N(0, 1) and σ/Sn →^p 1. Then, by
Slutsky's Theorem,

√n (X̄n - μ)/Sn = (σ/Sn) · [ √n (X̄n - μ)/σ ] →^d N(0, 1),

since the first factor converges in probability to 1 while the second converges in
distribution to X ∼ N(0, 1), where, clearly, E[X] = 0 while Var(X) = 1. Hence,

√n (X̄n - μ)/Sn →^d N(0, 1).
Convergence Concepts
The Delta Method
When talking about the CLT, our focus has been on the limiting distribution of
some standardised random variable.

There are many instances, however, when we are not specifically interested in the
distribution of the standardised random variable itself, but rather of some function of
it.

The delta method comes in handy in such cases. This method utilises our knowledge
of the limiting distribution of a random variable in order to find the limiting
distribution of a function of this random variable.

In essence, this method is a combination of Slutsky's Theorem and Taylor's
approximation.
Convergence Concepts
The Delta Method
Theorem (Delta Method): Let Yn be a sequence of random variables such that
√n (Yn - θ) →^d N(0, σ²). For a given function g and a specific value of θ, suppose
that g′(θ) exists and g′(θ) ≠ 0. Then,

√n [g(Yn) - g(θ)] →^d N( 0, σ² [g′(θ)]² ).

Proof: The first-order Taylor expansion of g(Yn) about Yn = θ is

g(Yn) = g(θ) + g′(θ)(Yn - θ) + R,

where R →^p 0 as n → ∞, and therefore

√n [g(Yn) - g(θ)] = g′(θ) √n (Yn - θ) + √n R →^d g′(θ) N(0, σ²) = N( 0, [g′(θ)]² σ² ).
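A simulation check of the delta method, using the arbitrary choice g(y) = y² with θ = 2 and σ = 1, so that the limit variance should be σ²[g′(θ)]² = 16. For a normal population, Ȳn can be drawn directly as N(θ, σ²/n):

```python
import math
import random

random.seed(6)
theta, sigma, n, reps = 2.0, 1.0, 400, 20000
gprime = 2 * theta                       # derivative of g(y) = y^2 at theta

vals = []
for _ in range(reps):
    ybar = random.gauss(theta, sigma / math.sqrt(n))   # exact law of Ybar_n
    vals.append(math.sqrt(n) * (ybar ** 2 - theta ** 2))

mean_v = sum(vals) / reps
var_v = sum((v - mean_v) ** 2 for v in vals) / reps
# delta method: sqrt(n)(g(Ybar) - g(theta)) ~ N(0, sigma^2 g'(theta)^2) = N(0, 16)
```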
Convergence Concepts
The Delta Method
If g′(θ) = 0, we instead take a second-order expansion:

g(Yn) = g(θ) + g′(θ)(Yn - θ) + (g″(θ)/2)(Yn - θ)² + R.

As before, R →^p 0 as Yn →^p θ. However, this time g′(θ) = 0. So,

g(Yn) - g(θ) ≈ (g″(θ)/2)(Yn - θ)²,   as n → ∞.
Convergence Concepts
The Delta Method
Now, since

√n (Yn - θ)/σ →^d N(0, 1),

we have

n (Yn - θ)²/σ² →^d χ²₁,   as n → ∞,

and, therefore,

n [g(Yn) - g(θ)] →^d σ² (g″(θ)/2) χ²₁.
Convergence Concepts
Some More Large Sample Results
So far we have worked with iid random sequences only. We now introduce large
sample results for different types of distributions. The following is largely based on
White (2001).

Let's start with independent, heterogeneously distributed random variables.

The failure of the identical distribution assumption results from stratifying
(grouping) the population in some way. The independence assumption remains valid
provided that sampling within and across the strata is random.
Theorem (Markov's Law of Large Numbers): Let X1, ..., Xn be a sequence of
independent random variables, with E[Xi] = μi < ∞ for all i. If for some δ > 0,

∑_{i=1}^∞ E[ |Xi - μi|^{1+δ} ] / i^{1+δ} < ∞,

then

(1/n) ∑_{i=1}^n Xi - (1/n) ∑_{i=1}^n μi →^{a.s.} 0.
Convergence Concepts
Some More Large Sample Results
Note what the conclusion says: (1/n) ∑_{i=1}^n Xi - (1/n) ∑_{i=1}^n μi →^{a.s.} 0.

Remember that Kolmogorov's strong law only requires the existence of the first-order
moment. So, moving away from the iid assumption usually comes at the cost of
stronger moment assumptions.

The Xi are heterogeneous, so their expected values are not identical, and we now have
n^{-1} ∑_{i=1}^n μi lying around. Compare this with the iid case,

(1/n) ∑_{i=1}^n E[Xi] = (1/n) ∑_{i=1}^n μ = μ.
Convergence Concepts
Some More Large Sample Results
For a CLT with independent, heterogeneously distributed variables, define

μ̄n = (1/n) ∑_{i=1}^n E[Xi]   and   σ̄n² = (1/n) ∑_{i=1}^n Var(Xi).

Then

√n (X̄n - μ̄n)/σ̄n →^d N(0, 1),   as n → ∞,

provided that, for every ε > 0,

lim_{n→∞} (1/σ̄n²) (1/n) ∑_{i=1}^n ∫_{(x - μi)² > n ε σ̄n²} (x - μi)² dFi(x) = 0.   (2)
Convergence Concepts
Some More Large Sample Results
Condition (2) restricts the tail behaviour of each (x - μi)² relative to the total
variation ∑_{i=1}^n Var(Xi), so that no single term dominates the sum.
Convergence Concepts
Some More Large Sample Results
If, in addition, σ̄n² > δ > 0 for all n sufficiently large, then

√n (X̄n - μ̄n)/σ̄n →^d N(0, 1).

The conditions of this result are easier to follow. We now need the existence of even
higher-order moments (more than second order).
Convergence Concepts
Order Notation
For non-stochastic sequences, if

f(x)/g(x) → 0   as x → ∞,

we write f(x) = o{g(x)}, while if

f(x)/g(x) → a constant   as x → ∞,

we write f(x) = O{g(x)}.
Convergence Concepts
Order Notation
Similarly, for a random sequence Xn, if

Xn / f(n) →^p 0,

then we write

Xn = op{f(n)},

while if

Xn / f(n) →^p X,   where X is a constant,

we write

Xn = Op{f(n)}.
The same applies to almost sure convergence.
Convergence Concepts
Order Notation
Then, instead of

X̄ →^p μ   or   X̄ →^{a.s.} μ,

we can write

X̄ = μ + op(1)   or   X̄ = μ + oa.s.(1),

respectively.

Moreover, if

√n (X̄ - μ)/σ →^d N(0, 1),

then

√n (X̄ - μ)/σ = Op(1).
Convergence Concepts
Order Notation
So,

Xn/n → 0 = O(1).

Xn/n → O(1),   Xn → O(n).
Convergence Concepts
Order Notation
Xn/n = O(1).