
ECON509 Probability and Statistics

Slides 5

Bilkent
This Version: 16 November 2014


Introduction

In this part of the lecture notes, we will focus on properties of random samples and
consider some important statistics and their distributions, which you will encounter
in your future econometrics courses.
In addition, we will also introduce key convergence concepts, which are at the
foundation of asymptotic theory. You might find these concepts a bit too abstract,
but they are used in econometrics quite frequently.


Basic Concepts of Random Samples

We start with a definition.


Definition (5.1.1): The random variables X1, ..., Xn are called a random sample of
size n from the population f(x) if X1, ..., Xn are mutually independent random
variables and the marginal pdf or pmf of each Xi is the same function f(x).
Alternatively, X1, ..., Xn are called independent and identically distributed random
variables with pdf or pmf f(x). This is commonly abbreviated to iid random
variables.
In many experiments there are n > 1 repeated observations made on the variable,
where X1 is the first observation, X2 is the second observation, etc.
In that case, each Xi has a marginal distribution given by f(x).
In addition, the value of one observation has no effect on, or relationship with, any of
the other observations:
X1 , ..., Xn are mutually independent.


Basic Concepts of Random Samples

Then, the joint pdf or pmf is given by

f(x1, ..., xn | θ) = f(x1 | θ) f(x2 | θ) ··· f(xn | θ) = ∏_{i=1}^n f(xi | θ),

where we assume that the population pdf or pmf is a member of a parametric family
and θ is the vector of parameters.
The random sampling model in Definition 5.1.1 is sometimes called sampling from
an infinite population.
Suppose we obtain the values of X1, ..., Xn sequentially.
First, the experiment is performed and X1 = x1 is observed.
Then, the experiment is repeated and X2 = x2 is observed.
The assumption of independence implies that the probability distribution of X2 is
unaffected by the fact that X1 = x1 was observed first.


Sums of Random Variables from a Random Sample


Suppose we have drawn a sample (X1 , ..., Xn ) from the population. We might want
to obtain some summary statistics.
This summary might be defined as a function
T (x1 , ..., xn ),
which might be real or vector-valued.
So,
Y = T (X1 , ..., Xn ),
is a random variable or a random vector.
We can use techniques similar to those introduced for functions of random variables
to investigate the distributional properties of Y.
Thanks to (X1 , ..., Xn ) possessing the iid property, the distribution of Y will be
tractable.
This distribution is usually derived from the distribution of the variables in the
random sample. Hence, it is called the sampling distribution of Y .


Sums of Random Variables from a Random Sample


Definition (5.2.1): Let X1, ..., Xn be a random sample of size n from a population
and let T(x1, ..., xn) be a real-valued or vector-valued function whose domain
includes the sample space of (X1, ..., Xn). Then, the random variable or random
vector Y = T(X1, ..., Xn) is called a statistic. The probability distribution of a
statistic Y is called the sampling distribution of Y.
Let's consider some commonly used statistics.
Definition (5.2.2): The sample mean is the arithmetic average of the values in a
random sample. It is usually denoted by

X̄ = (1/n) Σ_{i=1}^n Xi.

Definition (5.2.3): The sample variance is the statistic defined by

S² = (1/(n − 1)) Σ_{i=1}^n (Xi − X̄)².

The sample standard deviation is the statistic defined by

S = √(S²).


Sums of Random Variables from a Random Sample


Theorem (5.2.4): Let x1, ..., xn be any numbers and x̄ = (x1 + ... + xn)/n. Then,
a. min_a Σ_{i=1}^n (xi − a)² = Σ_{i=1}^n (xi − x̄)²,
b. (n − 1) s² = Σ_{i=1}^n (xi − x̄)² = Σ_{i=1}^n xi² − n x̄².

Proof: To prove part (a), consider

Σ_{i=1}^n (xi − a)² = Σ_{i=1}^n (xi − x̄ + x̄ − a)²
                   = Σ_{i=1}^n (xi − x̄)² + 2 Σ_{i=1}^n (xi − x̄)(x̄ − a) + Σ_{i=1}^n (x̄ − a)²
                   = Σ_{i=1}^n (xi − x̄)² + n (x̄ − a)²,

where the cross term vanishes because Σ_{i=1}^n (xi − x̄) = 0.
Clearly, the value of a that minimises Σ_{i=1}^n (xi − a)² is given by

a = x̄.

Part (b) can easily be proved by expanding the binomial and taking the sum.


Sums of Random Variables from a Random Sample

Lemma (5.2.5): Let X1, ..., Xn be a random sample from a population and let g(x)
be a function such that E[g(X1)] and Var(g(X1)) exist. Then

E[ Σ_{i=1}^n g(Xi) ] = n E[g(X1)]

and

Var( Σ_{i=1}^n g(Xi) ) = n Var(g(X1)).

Proof: This is straightforward. First,

E[ Σ_{i=1}^n g(Xi) ] = Σ_{i=1}^n E[g(Xi)] = n E[g(X1)],

since the Xi are distributed identically. Note that independence is not required here.


Sums of Random Variables from a Random Sample

Then,

Var( Σ_{i=1}^n g(Xi) ) = Σ_{i=1}^n Var(g(Xi)) + Σ_{i≠j} Cov(g(Xi), g(Xj))
                      = Σ_{i=1}^n Var(g(Xi)) + 0 = n Var(g(X1)),

where we have used the fact that

Cov(g(Xi), g(Xj)) = 0 for all i ≠ j,

due to independence, and that Var(g(Xi)) is the same for all Xi, due to their
distributions being identical.


Sums of Random Variables from a Random Sample

Theorem (5.2.6): Let X1, ..., Xn be a random sample from a population with mean μ
and variance σ² < ∞. Then,
1. E[X̄] = μ,
2. Var(X̄) = σ²/n,
3. E[S²] = σ².

Proof: Now,

E[X̄] = E[ (1/n) Σ_{i=1}^n Xi ] = (1/n) Σ_{i=1}^n E[Xi] = (1/n) n μ = μ.

Then,

Var(X̄) = Var( (1/n) Σ_{i=1}^n Xi ) = (1/n²) Σ_{i=1}^n Var(Xi) = (1/n²) n σ² = σ²/n.

Sums of Random Variables from a Random Sample

Finally,

E[S²] = E[ (1/(n − 1)) ( Σ_{i=1}^n Xi² − n X̄² ) ]
      = (1/(n − 1)) ( n E[X1²] − n E[X̄²] )
      = (1/(n − 1)) [ n (σ² + μ²) − n (σ²/n + μ²) ]
      = (n − 1) σ² / (n − 1) = σ².

An aside: You might already know that, since E[X̄] = μ and E[S²] = σ², one
would refer to X̄ and S² as unbiased estimators of μ and σ², respectively.
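A quick Monte Carlo sketch of Theorem (5.2.6) is given below; the normal population, sample size, seed and number of replications are arbitrary illustrative choices, not part of the original notes.

# Sketch: simulate many samples and check E[X-bar] = mu, Var(X-bar) = sigma^2/n
# and E[S^2] = sigma^2 empirically.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 2.0, 3.0, 25, 200_000

x = rng.normal(mu, sigma, size=(reps, n))   # reps independent samples of size n
xbar = x.mean(axis=1)                        # sample means
s2 = x.var(axis=1, ddof=1)                   # sample variances with 1/(n-1)

print(xbar.mean(), mu)                # E[X-bar] close to mu
print(xbar.var(), sigma**2 / n)       # Var(X-bar) close to sigma^2/n
print(s2.mean(), sigma**2)            # E[S^2] close to sigma^2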


Sums of Random Variables from a Random Sample


Now, observe that, writing Y = X1 + ... + Xn,

M_{X̄}(t) = E[e^{t X̄}] = E[e^{t(X1 + ... + Xn)/n}] = E[e^{(t/n)Y}] = M_Y(t/n).

Of course, since X1, ..., Xn are identically distributed, M_{Xi}(t) is the same function
for each i. Therefore, we have the following result.
Theorem (5.2.7): Let X1, ..., Xn be a random sample from a population with mgf
M_X(t). Then, the mgf of the sample mean is

M_{X̄}(t) = [M_X(t/n)]^n.

The proof is an application of Theorem (4.6.7).
Proof: Remember that Theorem (4.6.7) says that if X1, ..., Xn are mutually
independent random variables with mgfs M_{X1}(t), ..., M_{Xn}(t) and if
Z = X1 + ... + Xn, then the mgf of Z is

M_Z(t) = M_{X1}(t) ··· M_{Xn}(t).


Sums of Random Variables from a Random Sample

In particular, if X1, ..., Xn all have the same distribution with mgf M_X(t), then

M_Z(t) = [M_X(t)]^n.

Now the sequence we actually have is X1/n, ..., Xn/n. Observe that if E[e^{tX}] = M_X(t), then

E[e^{t(X/n)}] = E[e^{(t/n)X}] = M_X(t/n).

Then, for Z = X̄, Theorem (4.6.7) gives

M_Z(t) = M_{X̄}(t) = [M_X(t/n)]^n.


Sums of Random Variables from a Random Sample


Example (5.2.8) (Distribution of the mean): Let X1, ..., Xn be a random sample
from a N(μ, σ²) population. Then the mgf of the sample mean is

M_{X̄}(t) = [ exp( μ(t/n) + σ²(t/n)²/2 ) ]^n = exp( n[ μ(t/n) + σ²(t/n)²/2 ] ) = exp( μt + (σ²/n) t²/2 ).

Thus, X̄ ~ N(μ, σ²/n).
In this example, it was helpful to use Theorem (5.2.7) because the expression for
M_{X̄}(t) turned out to be a familiar mgf. It cannot, of course, be guaranteed that this
will always be the case for any X̄. However, when this is the case, this result makes
derivation of the distribution of X̄ very easy.
Another example is the sample mean of a gamma(α, β) sample. The mgf of X̄ in
this case is given by

M_{X̄}(t) = [ 1/(1 − β(t/n)) ]^{nα} = [ 1/(1 − (β/n)t) ]^{nα},

which reveals that X̄ ~ gamma(nα, β/n).
Of course, there are cases where this method is not useful, due to the mgf of X̄ not
being recognisable. Remedies for such cases are considered in Casella and Berger
(2001, pp. 215-217). We will not cover these.
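The sketch below (with arbitrary parameter values and seed) checks the gamma case by simulation: sample means of gamma(α, β) draws are compared with the gamma(nα, β/n) distribution implied by the mgf argument above.

# Sketch: quantiles of simulated sample means versus the claimed gamma(n*alpha, beta/n) law.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, beta, n, reps = 2.0, 1.5, 10, 100_000

xbar = rng.gamma(shape=alpha, scale=beta, size=(reps, n)).mean(axis=1)

qs = [0.1, 0.5, 0.9]
print(np.quantile(xbar, qs))                            # simulated quantiles
print(stats.gamma.ppf(qs, a=n * alpha, scale=beta / n))  # gamma(n*alpha, beta/n) quantiles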

Sampling from the Normal Distribution

Now, we focus on the case where the population distribution is the normal
distribution.
The results we introduce in this section will be very useful when you deal with linear
regression models.
We have already talked about distributional properties of the sample mean and
variance. Given the extra assumption of normality, we are now in a position to
determine their full distributions.
The chi squared distribution will come up frequently within this context, so we start
by introducing some important properties related to this distribution.
Remember that the chi squared pdf is a special case of the gamma pdf and is given
by

f(x) = [1 / (Γ(p/2) 2^{p/2})] x^{(p/2)−1} e^{−x/2},   0 < x < ∞,

where p is called the degrees of freedom.


Sampling from the Normal Distribution


Lemma (5.3.2) (Facts about chi squared random variables): We use the notation
χ²_p to denote a chi squared random variable with p degrees of freedom.
a. If Z is a N(0, 1) random variable, then

Z² ~ χ²_1.

In other words, the square of a standard normal random variable is a chi squared
random variable.
b. If X1, ..., Xn are independent and Xi ~ χ²_{pi}, then

X1 + ... + Xn ~ χ²_{p1+...+pn};

that is, independent chi squared variables add to a chi squared variable, and the
degrees of freedom also add.

To prove the second part, remember that a χ²_p random variable is a gamma(p/2, 2)
random variable. By the result obtained in Example (4.6.8), the sum of independent
gamma(αi, β) random variables is a gamma(α1 + ... + αn, β) random variable.
Then, X1 + ... + Xn above is a gamma((p1 + ... + pn)/2, 2) random variable, which
is a χ²_{p1+...+pn} random variable.
Part (a) can be proved by obtaining the pdf of the transformation Y = Z² and then
confirming that this is the pdf of a χ²_1 random variable.
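A small simulation sketch of both facts (illustrative sample sizes and seed; not part of the original notes) is given below.

# Sketch: Z^2 behaves like a chi-squared(1) draw, and independent chi-squared
# variables add, with the degrees of freedom adding as well.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
reps = 200_000

z2 = rng.standard_normal(reps) ** 2                        # squares of standard normals
print(stats.kstest(z2, stats.chi2(df=1).cdf).statistic)    # small => close to chi2(1)

x = rng.chisquare(df=3, size=reps) + rng.chisquare(df=5, size=reps)
print(stats.kstest(x, stats.chi2(df=8).cdf).statistic)     # sum behaves like chi2(3+5)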

Sampling from the Normal Distribution

A nice result about normally distributed random variables is that zero covariance
implies independence, which does not necessarily hold for other distributions.
Lemma (5.3.3): Let Xj ~ N(μj, σj²), j = 1, ..., n, be independent. For constants aij and
brj (j = 1, ..., n; i = 1, ..., k; r = 1, ..., m), where k + m ≤ n, define

Ui = Σ_{j=1}^n aij Xj,   i = 1, ..., k,
Vr = Σ_{j=1}^n brj Xj,   r = 1, ..., m.

a. The random variables Ui and Vr are independent if and only if Cov(Ui, Vr) = 0.
Furthermore, Cov(Ui, Vr) = Σ_{j=1}^n aij brj σj².
b. The random vectors (U1, ..., Uk) and (V1, ..., Vm) are independent if and only if Ui is
independent of Vr for all pairs i, r (i = 1, ..., k; r = 1, ..., m).


Sampling from the Normal Distribution

Again, we consider the case where Xj ~ N(0, 1), for simplicity. Then,

Cov(Ui, Vr) = E[Ui Vr] = E[ ( Σ_{j=1}^n aij Xj ) ( Σ_{j=1}^n brj Xj ) ] = E[ Σ_{j=1}^n aij brj Xj² ] = Σ_{j=1}^n aij brj,

due to independence. The implication from independence to zero covariance is also
immediate (Theorem (4.5.5)). In addition, since Ui and Vr are linear combinations
of normal random variables, they are also normally distributed (Corollary 4.6.10).
Now, it is a bit more involved to show that we indeed have independence. Consider
the case where n = 2, as the more general proof is similar in spirit but more
complicated.
The joint pdf is given by

f_{X1,X2}(x1, x2) = (1/(2π)) e^{−(1/2)(x1² + x2²)},   −∞ < x1, x2 < ∞.

Sampling from the Normal Distribution

Consider the transformation given by

u = a1 x1 + a2 x2   and   v = b1 x1 + b2 x2,

which imply

x1 = (b2 u − a2 v)/(a1 b2 − b1 a2)   and   x2 = (a1 v − b1 u)/(a1 b2 − b1 a2).

The Jacobian of the transformation is the determinant of the matrix of partial
derivatives of (x1, x2) with respect to (u, v):

J = (b2 a1 − b1 a2)/(a1 b2 − b1 a2)² = 1/(a1 b2 − b1 a2).

Then, the new joint pdf is

f_{U,V}(u, v) = f_{X1,X2}( (b2 u − a2 v)/(a1 b2 − b1 a2), (a1 v − b1 u)/(a1 b2 − b1 a2) ) · |1/(a1 b2 − b1 a2)|.

Sampling from the Normal Distribution


Therefore,
fU ,V (u, v )

1
exp
2

1
2 (a1 b2

1
a1 b2
where

b1 a2

b1 a2 )2

(b2 u

a2 v ) + (a1 v

b1 u )

< u, v < .

Now,

(b2 u

a2 v )2 + (a1 v

b1 u )2 = b12 + b22 u 2 + a12 + a22 v 2

2 (a1 b1 + a2 b2 ) uv .

But we know that (a1 b1 + a2 b2 ) = 0. Hence, this shows that the joint pdf factorises
into a function of u and a function of v . Therefore, U and V are independent!
A similar technique can be utilised to prove part (b). Specically, by using a
transformation argument, it can be shown that the joint pdf of vectors (U1 , ..., Uk )
and (V1 , ..., Vm ) will factorise, which will prove independence. We skip this.


Sampling from the Normal Distribution

The main message here is that, if we start with independent normal random
variables, zero covariance and independence are equivalent for linear functions of
these random variables. Therefore, checking independence comes down to checking
covariances.
Part (b), on the other hand, makes it possible to infer overall independence of
normal vectors by just checking pairwise independence, which is not valid for general
random variables.
Thinking about part (a), I find it intuitively easier to remember that the normal
distribution is completely determined by its mean and variance, while other moments
do not matter. Hence, ensuring zero covariance is equivalent to ensuring
independence, since we do not have to care about the remaining moments of the
distribution.


Sampling from the Normal Distribution

We can show the usefulness of Lemma 5.3.3 by proving the independence of
X̄ and S² when X1, ..., Xn is sampled from a normal population distribution.
It can be shown that S² can be written as a function of (X2 − X̄, ..., Xn − X̄). Now,
if we can show that these random variables are uncorrelated with X̄, then by the
normality assumption and Lemma 5.3.3 we can conclude independence.
We have

X̄ = (1/n) Σ_{i=1}^n Xi   and   Xj − X̄ = Σ_{i=1}^n (δij − 1/n) Xi,

where

δij = 1 if i = j, and δij = 0 otherwise.

Sampling from the Normal Distribution


For simplicity (and without loss of generality) consider the case Xj ~ N(0, σ²) for all
j. Then,

Cov(X̄, Xj − X̄) = E[ ( (1/n) Σ_{i=1}^n Xi ) ( Σ_{i=1}^n (δij − 1/n) Xi ) ]
              = E[ (1/n) Σ_{i=1}^n (δij − 1/n) Xi² ]
              = (σ²/n) Σ_{i=1}^n (δij − 1/n).

Now,

Σ_{i=1}^n (δij − 1/n) = 1 − (1/n) Σ_{i=1}^n 1 = 1 − 1 = 0.

Hence,

Cov(X̄, Xj − X̄) = 0,

and so X̄ and Xj − X̄ are independent for all j, which yields the desired result.
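As a quick illustration, the sketch below (with arbitrary sample size, populations and seed) compares the empirical correlation between X̄ and S² for normal samples, where it is essentially zero because the two are independent, with that for samples from a skewed exponential population, where independence fails.

# Sketch: correlation between the sample mean and sample variance under
# normal sampling (close to 0) versus exponential sampling (clearly positive).
import numpy as np

rng = np.random.default_rng(3)
n, reps = 10, 200_000

x = rng.normal(0.0, 1.0, size=(reps, n))
print(np.corrcoef(x.mean(axis=1), x.var(axis=1, ddof=1))[0, 1])   # close to 0

y = rng.exponential(1.0, size=(reps, n))                            # skewed population
print(np.corrcoef(y.mean(axis=1), y.var(axis=1, ddof=1))[0, 1])   # clearly positive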

Convergence Concepts

The underlying idea in this section is to understand what happens to sequences of
random variables, or summary statistics, when we let the sample size go to
infinity.
This is, of course, an idealised concept, if you like, because the sample size never
goes to infinity. However, the idea is to attain a grasp of what happens when the
sample size becomes large enough.
This is important because, although the key results are based on the case where
the sample size approaches ∞, they are actually relevant for finite sample sizes
which are large enough.
How large is "large enough" is very much related to the data, its dependence
structure, the econometric model at hand, etc., so it is difficult to give a proper rule
of thumb.


Convergence Concepts

The tools you will learn in this section are the fundamental building blocks of what
is known as asymptotic theory or large sample theory. Although a bit abstract
at first sight, these results are at the core of many proofs you will encounter in
econometrics articles.
The three important concepts we will consider in what follows are
1. almost sure convergence,
2. convergence in probability,
3. convergence in distribution.

In the first instance we will consider the case where a sequence of random variables
X1, ..., Xn exhibits the iid property. This is one (and the simplest) of many possible
dependence settings.


Convergence Concepts
Almost Sure Convergence

Remember that random variables are functions defined on the sample space, e.g.
X(ω). Our interest will be in a sequence of random variables, indexed by sample
size, i.e. Xn(ω).
To motivate the following discussion, consider pointwise convergence:

lim_{n→∞} Xn(ω) = X(ω)   for all ω ∈ Ω,

where Ω is, as before, the sample space.
Notice that convergence occurs for all ω! There is not much of a probabilistic
statement here.
This is the strongest form of convergence we can have on the sample space. But it
is not relevant for probabilistic statements.


Convergence Concepts
Almost Sure Convergence

A slightly restricted version of pointwise convergence is almost sure convergence.
Definition (Almost Sure Convergence): A sequence of random variables X1, X2, ...
defined on a probability space (Ω, F, P) converges almost surely to a random
variable X if

lim_{n→∞} Xn(ω) = X(ω)

for each ω ∈ Ω, except for ω ∈ E, where P(E) = 0.

The idea is this: pointwise convergence fails for some points in Ω. However, the
number of such points is so small that we can safely assign zero measure (or zero
probability) to the set of these points.
Other ways of expressing this definition are

P( lim_{n→∞} |Xn − X| > ε ) = 0   for every ε > 0,

or

P( { ω ∈ Ω : lim_{n→∞} Xn(ω) = X(ω) } ) = 1.

Convergence Concepts
Almost Sure Convergence

This time, we have a difference. Convergence fails on a very small set E such that
P(E) = 0.
That P(E) = 0 is due to the set being so small that we can safely assign zero
probability to it.
Remember that in earlier lectures we stated that, for a continuously distributed
random variable, the probability of a single point is always equal to zero. This is
similar, in spirit, to the situation at hand.
This type of convergence is also called convergence almost everywhere and
convergence with probability 1.
The following notation is common:

Xn →a.s. X,   Xn →wp1 X,   lim_{n→∞} Xn = X a.s.

Note that, at the cost of sloppy notation, the argument of Xn(ω) is usually dropped.
Also, Xn(ω) need not converge to a function. It can also simply converge to some
constant, say, a.

Convergence Concepts
Almost Sure Convergence

Usually, Xn(ω) is some sample average. For example, let Zi(ω), i = 1, ..., n, be
some random variables and let

Xn(ω) = (1/n) Σ_{i=1}^n Zi(ω).

Then, almost sure convergence is a statement on the joint distribution of the entire
sequence {Zi(ω)}.

White (2001, p.19): The probability measure P determines the joint distribution
of the entire sequence {Zi(ω)}. A sequence Xn(ω) converges almost surely if the
probability of obtaining a realisation of the sequence {Zi(ω)} for which convergence
to X(ω) occurs is unity.


Convergence Concepts
Almost Sure Convergence

We can extend this concept to vectors. Let Xn = (X1,n, ..., XD,n)′ and
X = (X1, ..., XD)′. Then

Xn →a.s. X

if and only if

Xd,n →a.s. Xd   for all d = 1, ..., D.

So almost sure convergence has to occur component by component.

A more compact notation is available. Define the Euclidean norm

||X|| = √(X1² + ... + XD²),

and observe that |Xd| ≤ ||X|| for each component d.
Then, Xn →a.s. X if

lim_{n→∞} ||Xn(ω) − X(ω)|| = 0,

except for ω ∈ E, where P(E) = 0.

Whenever you see the terms "almost sure", "with probability one", or "almost
everywhere", you should remember that the relationship that is referred to holds
everywhere except for some set with zero probability.

Convergence Concepts
Almost Sure Convergence

Now that we have a convergence concept in our arsenal, the next question is: under
which conditions can we use this result for statistics of interest?
One important result is Kolmogorov's Strong Law of Large Numbers (SLLN).
Theorem (Kolmogorov's Strong Law of Large Numbers): If X1, X2, ... are
independent and identically distributed random variables and the common mean is
finite, i.e. E[|X1|] < ∞ with E[X1] = μ, then

lim_{n→∞} (1/n) Σ_{i=1}^n Xi = μ   a.s.,   or   (1/n) Σ_{i=1}^n Xi →a.s. μ.

Note that the sample size will never be equal to ∞. However, when the sample size
is large enough, the sample mean will be very close to the population mean, μ.
What is remarkable about this result is that, as long as we have iid random
variables, the only condition is that the random variables have a finite mean.
For the time being, suffice it to say that one can relax the iid assumption, but that
means that we will require more than just finite means. There is a trade-off between
the amount of dependence and the strength of the moment assumptions one has to
make in order to attain convergence.
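A minimal simulation sketch of the SLLN in action is given below; the Uniform(0,1) population and sample sizes are illustrative choices.

# Sketch: one realisation of the running sample mean of iid Uniform(0,1) draws.
# By Kolmogorov's SLLN the path settles down at the population mean 0.5.
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0.0, 1.0, size=100_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
print(running_mean[[99, 999, 9_999, 99_999]])   # drifts towards 0.5 as n grows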

Convergence Concepts
Convergence in Probability

The next convergence type is convergence in probability. Its definition is similar to
that of almost sure convergence, but in essence it is a much weaker convergence
concept.
Definition (Convergence in Probability): A sequence of random variables
X1, X2, ... converges in probability to a random variable X if, for every ε > 0,

lim_{n→∞} P(|Xn − X| > ε) = 0.

An equivalent statement is that, given ε > 0 and δ > 0, there exists an N, which
depends on both δ and ε, such that P(|Xn − X| > ε) < δ for all n > N. This is
merely a restatement using the formal definition of a limit.
One could also write

lim_{n→∞} P({ω : |Xn(ω) − X(ω)| > ε}) = 0.

Convergence Concepts
Convergence in Probability

It is not easy to see the difference between the two modes of convergence. However,
let's give it a try!
Almost sure convergence states that we have pointwise convergence for all ω ∈ Ω
except for a small, zero-measure set E. Importantly, this set is independent of n.
Convergence in probability states that as the sample size goes towards ∞, the
probability that Xn will deviate from X by more than ε decreases towards zero.
However, for any sample size, there is a positive probability that Xn will deviate by
more than ε. In other words, for some ω ∈ En, such that P(En) > 0, |Xn − X|
will be larger than ε.
Importantly, there is nothing that restricts En to be the same for all n. Hence, the
set on which Xn deviates from X may change as n increases.
The good news is, the deviation probability slowly goes to zero; hence P(En) will
eventually shrink to zero.


Convergence Concepts
Convergence in Probability

Now, as before, let Zi, i = 1, ..., n, be some random variables and let

Xn = (1/n) Σ_{i=1}^n Zi.

White (2001, p.24): With almost sure convergence, the probability measure P
takes into account the joint distribution of the entire sequence {Zi}, but with
convergence in probability, we only need concern ourselves sequentially with the joint
distribution of the elements of {Zi} that actually appear in Xn, typically the first n.


Convergence Concepts
Convergence in Probability

Consider the behaviour of Xn(ω) − X(ω) for some fixed ω as n → ∞ on the next
two slides. Which one makes you think that Xn(ω) − X(ω) →p 0? Which one makes
you think that Xn(ω) − X(ω) →a.s. 0?


Convergence Concepts
Convergence in Probability

[Figure: Xn(ω) − X(ω) plotted against n (up to n = 200) for some given ω, Case 1.]

Convergence Concepts
Convergence in Probability

[Figure: Xn(ω) − X(ω) plotted against n (up to n = 200) for some given ω, Case 2.]

Convergence Concepts
Convergence in Probability

Example (5.5.8) (Convergence in probability but not almost surely): Let
Ω = [0, 1] with the uniform probability distribution.
Define the following sequence:

X1(ω) = ω + I_[0,1](ω),       X2(ω) = ω + I_[0,1/2](ω),
X3(ω) = ω + I_[1/2,1](ω),     X4(ω) = ω + I_[0,1/3](ω),
X5(ω) = ω + I_[1/3,2/3](ω),   X6(ω) = ω + I_[2/3,1](ω),

etc. Let X(ω) = ω.
Do we have convergence in probability? Observe that

X1(ω) − X(ω) = I_[0,1](ω),       X2(ω) − X(ω) = I_[0,1/2](ω),
X3(ω) − X(ω) = I_[1/2,1](ω),     X4(ω) − X(ω) = I_[0,1/3](ω),
X5(ω) − X(ω) = I_[1/3,2/3](ω),   X6(ω) − X(ω) = I_[2/3,1](ω),

etc.


Convergence Concepts
Convergence in Probability

Now, due to the uniform probability distribution assumption, P(ω ∈ [a, b]) = b − a,
where 0 ≤ a ≤ b ≤ 1. Then, as n → ∞, P(ω : I_[a,b](ω) > ε) → 0, since
b − a → 0 as n → ∞.

Do we have almost sure convergence? No. There is no value of ω ∈ Ω for which
Xn(ω) → ω. For every ω, the value Xn(ω) alternates between the values ω and
ω + 1 infinitely often.

For example, if ω = 3/8, X1(ω) = 11/8, X2(ω) = 11/8, X3(ω) = 3/8,
X4(ω) = 3/8, X5(ω) = 11/8, X6(ω) = 3/8, etc.
No pointwise convergence occurs for this sequence.
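A small sketch of this construction makes both claims concrete; the interval-indexing helper below is my own hypothetical implementation of the shrinking-interval scheme, not something from the notes. The interval lengths (and hence the deviation probabilities) shrink to zero, yet the path at a fixed ω keeps jumping back to ω + 1.

# Sketch of the "typewriter" sequence in Example (5.5.8).
import numpy as np

def interval(n):
    # n-th interval in the scheme [0,1], [0,1/2], [1/2,1], [0,1/3], [1/3,2/3], [2/3,1], ...
    k = 1
    while n >= k:          # locate the block (of length k) containing index n
        n -= k
        k += 1
    return n / k, (n + 1) / k

omega = 3.0 / 8.0
x = [omega + (a <= omega <= b) for a, b in (interval(i) for i in range(12))]
print(x)                                                       # alternates between 3/8 and 11/8
print([b - a for a, b in (interval(i) for i in range(12))])    # interval lengths shrink to 0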


Convergence Concepts
Convergence in Probability

Example (5.5.7): Let the sample space Ω be the closed interval [0, 1] with the
uniform probability distribution. Define random variables

Xn(ω) = ω + ω^n   and   X(ω) = ω.

For every ω ∈ [0, 1), ω^n → 0 as n → ∞ and Xn(ω) → ω = X(ω).

However, Xn(1) = 2 for every n, so Xn(1) does not converge to 1 = X(1).

But, since the convergence occurs on the set [0, 1) and P([0, 1)) = 1 (remember
that when we have a continuous random variable, the probability of a single point is
equal to zero),

Xn →a.s. X.


Convergence Concepts
Convergence in Probability

One would usually write

Xn →p X   or   plim_{n→∞} Xn = X

as shorthand.
Associated with convergence in probability is the Weak Law of Large Numbers
(WLLN).
Theorem (Weak Law of Large Numbers): If X1, X2, ... are iid random variables
with common mean μ < ∞ and variance σ² < ∞, then

(1/n) Σ_{i=1}^n Xi →p μ,

as n → ∞.


Convergence Concepts
Convergence in Probability

Proof: The proof uses Chebychev's Inequality. Remember this says that if X is a
random variable and if g(x) is a nonnegative function, then, for any r > 0,

P(g(X) ≥ r) ≤ E[g(X)] / r.

Now, consider

P(|X̄n − μ| ≥ ε) = P((X̄n − μ)² ≥ ε²) ≤ E[(X̄n − μ)²]/ε² = Var(X̄n)/ε² = σ²/(nε²),

where we use r = ε² and g(X̄n) = (X̄n − μ)².

The above result implies that

P(|X̄n − μ| < ε) = 1 − P(|X̄n − μ| ≥ ε) ≥ 1 − σ²/(nε²),

and

lim_{n→∞} [1 − σ²/(nε²)] = 1.

Hence,

lim_{n→∞} P(|X̄n − μ| < ε) = 1.
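The sketch below (illustrative population, tolerance and sample sizes) shows the deviation probability P(|X̄n − μ| ≥ ε) shrinking with n, together with the Chebychev bound σ²/(nε²) used in the proof.

# Sketch: empirical deviation probability versus the Chebychev bound for growing n.
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, eps, reps = 1.0, 2.0, 0.25, 100_000

for n in (10, 100, 1000):
    xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    prob = np.mean(np.abs(xbar - mu) >= eps)
    print(n, prob, sigma**2 / (n * eps**2))   # empirical probability vs bound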

Convergence Concepts
Convergence in Probability

This is perhaps a good time to stop and reflect a little bit on these new concepts.
What the Weak and Strong LLNs are saying is that, under certain conditions, the
sample mean converges to the population mean as n → ∞. This is known as
consistency: one would say that the sample mean is a consistent estimator of the
population mean.
In actual applications, this means that if the sample size is large enough, then the
sample mean is close to the population mean. So n does not have to be that close
to infinity. On the other hand, as mentioned at the beginning, how large the
sample size should be in order to be considered a "large enough" sample is a
different question in its own right. We will not deal with this here.
Sometimes, consistency is compared to unbiasedness.
An estimator θ̂ of a population value θ is an unbiased estimator if and only if
E[θ̂] = θ.
What are the things we might want to estimate? One example would be the parameters
of a distribution family. For example, we might know that the data are distributed
N(μ, σ²), but we may not know the particular values of μ and σ². In this case,
we would estimate these parameters.

Convergence Concepts
Convergence in Probability

The plim operator has some nice properties that make it much more convenient to
deal with than the expectation operator. In particular, let X1, X2, ... and
Y1, Y2, ... be two random sequences and a1, a2, ... be some non-stochastic sequence.
Then, we have the following.
1. plim_{n→∞} (Xn / Yn) = (plim_{n→∞} Xn) / (plim_{n→∞} Yn), provided the limit in the denominator is nonzero,
while we usually have E[Xn / Yn] ≠ E[Xn] / E[Yn];
2. plim_{n→∞} (Xn + Yn) = plim_{n→∞} Xn + plim_{n→∞} Yn;
3. the plim of a non-random sequence is equal to its limit: plim_{n→∞} an = lim_{n→∞} an.

Convergence Concepts
Convergence in Probability

Note that one concept does not necessarily imply the other. In other words, a consistent
estimator can be biased, while an unbiased estimator can be inconsistent.
Suppose we are trying to estimate the population parameter μ. Consider the
following estimators.
1. μ̂ = X̄ + 20/n: consistent but biased.
2. μ̃ = X, where P(X = μ + 100) = P(X = μ − 100) = .50: unbiased but inconsistent.

Convergence Concepts
Convergence in Probability

Returning to the discussion at hand, it is important to acknowledge that neither
almost sure convergence nor convergence in probability (nor any convergence
type) says anything about the distribution of the sequence X1, X2, .... For example,
it might be such that the distribution of Xi changes as i varies. This is fine.
So far, we have only considered LLNs that work when the sequence is drawn from an
iid population. If this assumption is violated, we can still probably have convergence
of the sample mean to the population mean, but we will have to find an appropriate
LLN that works for the particular population distribution we have.
A useful result relating almost sure convergence and convergence in probability is
that

Xn →a.s. X  ⟹  Xn →p X.

Convergence in probability, however, does not imply almost sure convergence.
Obviously, for some constant K,

K →a.s. K   and   K →p K.

Convergence Concepts
Convergence in Probability

As far as economists and most econometricians are concerned, one would not
care too much about whether convergence is achieved almost surely or in probability.
As long as convergence is achieved, the rest is not that important.
However, in some cases it might be easier to prove the LLN for one of the two
convergence types. This is no problem, as →a.s. implies →p anyway.

In addition, almost sure convergence might be "slower" than convergence in
probability, in the sense that it might require a larger sample size before the sample
mean is close enough to the population mean.


Convergence Concepts
Convergence in Probability

Example (5.5.3) (Consistency of S²): Suppose we have a sequence X1, X2, ... of
iid random variables with E[Xi] = μ and Var(Xi) = σ² < ∞. If we define

Sn² = (1/(n − 1)) Σ_{i=1}^n (Xi − X̄n)²,

what condition do we require in order to prove a Weak Law of Large Numbers
(WLLN) for Sn²?
Again, use Chebychev's Inequality to obtain

P(|Sn² − σ²| ≥ ε) = P((Sn² − σ²)² ≥ ε²) ≤ E[(Sn² − σ²)²]/ε² = Var(Sn²)/ε²,

since E[Sn²] = σ². Therefore, a sufficient condition for weak convergence of Sn² to σ² is that
lim_{n→∞} Var(Sn²) = 0.

Convergence Concepts
Convergence in Probability

Although we might have convergence results for some sample means X̄n and Ȳn, we
might actually be interested in the convergence properties of a function of these, say
g(X̄n, Ȳn).
Fortunately, we have the following useful result.
If (Xn, Yn) converges almost surely (in probability) to (X, Y), if g(x, y) is a
continuous function over some set D, and if the images of Ω under
[Xn(ω), Yn(ω)] and [X(ω), Y(ω)] are in D, then g(Xn, Yn) converges almost
surely (in probability) to g(X, Y).
More pragmatically,

Xn →a.s. X  ⟹  g(Xn) →a.s. g(X),
Xn →p X  ⟹  g(Xn) →p g(X).

Example (5.5.5) (Consistency of S): If Sn² is a consistent estimator of σ², then
one can show that the sample standard deviation Sn = √(Sn²) = h(Sn²) is a consistent
estimator of σ.

Interestingly, it can be shown that Sn is, in fact, a biased estimator of σ! However,
the bias disappears asymptotically.

Convergence Concepts
Convergence in Probability

Before we move on to a different type of convergence, let us, for the sake of
completeness, introduce one more type of convergence.
Definition (Lp Convergence): Let 0 < p < ∞, let X1, X2, ... be a sequence of
random variables with E[|Xn|^p] < ∞ and let X be a random variable with
E[|X|^p] < ∞. Then, Xn converges in Lp to X if

lim_{n→∞} E[|Xn − X|^p] = 0.

Just as a reference, Lp convergence does not imply almost sure convergence, nor
does almost sure convergence imply Lp convergence. However, Lp convergence
implies convergence in probability.


Convergence Concepts
Convergence in Probability

Finally, we present another type of LLN before we move on to convergence in
distribution.
Theorem (Uniform Strong LLN): If X1, X2, ... are iid random variables, if g(x, θ) is
continuous over X × Θ, where X is the range of X1 and Θ is a closed and bounded
set, and if

E[ sup_{θ∈Θ} |g(X1, θ)| ] < ∞,

then

lim_{n→∞} sup_{θ∈Θ} | (1/n) Σ_{i=1}^n g(Xi, θ) − E[g(X1, θ)] | = 0   a.s.

Moreover, E[g(X1, θ)] is a continuous function of θ.

Therefore, the worst deviation of the sample average from the population average
E[g(X1, θ)] that one can find over all θ ∈ Θ converges to zero almost surely.

This is not a simple generalisation of the strong LLN. Generally, an LLN for each and
every θ individually would not imply the uniform LLN (except under certain assumptions).


Convergence Concepts
Convergence in Distribution

So far, we have dealt with results concerning the sample mean.


The main theme has been the convergence of the sample mean to the population
mean.
This is useful, but we can get much more.
For example, we can get convergence in distribution, as well.


Convergence Concepts
Convergence in Distribution

Definition (Convergence in Distribution): A sequence of random variables
X1, X2, ... converges in distribution to a random variable X if

lim_{n→∞} F_{Xn}(x) = F_X(x)

at every x where F_X(x) is continuous.

This is also called convergence in law. The following shorthand notation is used to
denote convergence in distribution:

Xn →d X,   Xn →d F_X,   Xn →L F_X.

It is important to underline that it is not Xn that converges to a distribution.
Instead, it is the distribution of Xn that converges to the distribution of X.


Convergence Concepts
Convergence in Distribution

As far as sequences of random vectors are concerned, a sequence of random vectors
Xn = (X1,n, ..., Xd,n) converges in distribution to a random vector X if

lim_{n→∞} F_{Xn}(x1, ..., xd) = F_X(x1, ..., xd)

at every x = (x1, ..., xd) where F_X(x1, ..., xd) is continuous.

Importantly, convergence in probability implies convergence in distribution.
Theorem (5.5.12): If the sequence of random variables X1, X2, ... converges in
probability to a random variable X, the sequence also converges in distribution to X.
Consequently, almost sure convergence implies convergence in distribution, as well.
Theorem (5.5.13): The sequence of random variables X1, X2, ... converges in
probability to a constant a if and only if the sequence also converges in distribution
to a. Equivalently, the statement

P(|Xn − a| > ε) → 0 for every ε > 0

is equivalent to

F_{Xn}(x) = P(Xn ≤ x) → 0 if x < a,   and   F_{Xn}(x) → 1 if x > a.

Convergence Concepts
Convergence in Distribution

We now introduce one of the most useful theorems we have considered so far.
Theorem (5.5.15) (Central Limit Theorem): Let X1, X2, ... be a sequence of iid
random variables with E[Xi] = μ < ∞ and 0 < Var(Xi) = σ² < ∞. Define

X̄n = (1/n) Σ_{i=1}^n Xi.

Let Gn(x) denote the cdf of √n (X̄n − μ)/σ. Then, for any x, −∞ < x < ∞,

lim_{n→∞} Gn(x) = ∫_{−∞}^x (1/√(2π)) e^{−y²/2} dy.

In other words,

(1/√n) Σ_{i=1}^n (Xi − μ)/σ →d N(0, 1).

Convergence Concepts
Convergence in Distribution

This is a powerful result! We start with the iid and finite mean and variance
assumptions. In return, the Central Limit Theorem (CLT) promises us that the
distribution of a properly standardised version of the sample mean, given by

(1/√n) Σ_{i=1}^n (Xi − μ)/σ,

will converge to the standard normal distribution as the sample size tends to infinity.
As before, the sample size will never be equal to ∞. BUT, for large enough samples,
(1/√n) Σ_{i=1}^n (Xi − μ)/σ will be approximately standard normal. As n becomes larger, this
approximate result will become more accurate.
As with LLNs, it is possible to obtain CLTs for non-iid data. However, this will
require one to make stronger assumptions regarding the moments of the sequence of
random variables. The trade-off between dependence and moment assumptions is
always there.
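As a minimal illustration (with an arbitrary skewed population, sample size and seed), the sketch below standardises sample means of exponential draws and compares them with the standard normal distribution.

# Sketch: standardised sample means of a skewed exponential population are already
# close to N(0,1) for moderate n.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, reps = 50, 200_000
mu, sigma = 1.0, 1.0                       # exponential(1): mean 1, standard deviation 1

x = rng.exponential(1.0, size=(reps, n))
z = np.sqrt(n) * (x.mean(axis=1) - mu) / sigma

print(stats.kstest(z, stats.norm.cdf).statistic)    # small distance from N(0,1)
print(np.mean(z <= 1.645), stats.norm.cdf(1.645))   # tail probabilities roughly match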


Convergence Concepts
Convergence in Distribution

Let's prove the CLT. However, before we do that, we have to revisit Taylor
expansions.
Definition (5.5.20): If a function g(x) has derivatives of order r, that is,
g^(r)(x) = (d^r/dx^r) g(x) exists, then for any constant a, the Taylor polynomial of order r
about a is

T_r(x) = Σ_{i=0}^r [g^(i)(a)/i!] (x − a)^i.

This polynomial is used in order to obtain a Taylor expansion of order r about
x = a. This is given by

g(x) = T_r(x) + R,

where R = g(x) − T_r(x) is the remainder of the approximation.

Convergence Concepts
Convergence in Distribution

Now, a useful major result is as follows.

Theorem (5.5.21): If g^(r)(a) = (d^r/dx^r) g(x)|_{x=a} exists, then

lim_{x→a} [g(x) − T_r(x)] / (x − a)^r = 0.

This says that the remainder, g(x) − T_r(x), always tends to zero faster than the
highest-order term of the approximation.
Importantly, this also means that as x tends to a, the remainder term approaches 0.


Convergence Concepts
Convergence in Distribution

We can now prove the CLT.

Proof: We will do the proof for the case where the mgf exists for |t| < h, for some
positive h. The CLT can be proved without assuming existence of the mgf, using
characteristic functions instead; however, this would be much more complicated.
Let E[Xi] = μ and Var(Xi) = σ². The aim is to show that as n → ∞ the mgf of
√n (X̄n − μ)/σ converges to the mgf of a N(0, 1) random variable, which will prove that the
distribution of √n (X̄n − μ)/σ converges to the standard normal distribution.

Now, let Yi = (Xi − μ)/σ. Then,

M_{Yi}(t) = E[e^{t Yi}] = E[e^{t(Xi − μ)/σ}] = e^{−tμ/σ} E[e^{(t/σ)Xi}] = e^{−tμ/σ} M_{Xi}(t/σ).

Convergence Concepts
Convergence in Distribution

Let Mn(t) be the mgf of √n (X̄n − μ)/σ = √n Ȳn.

Since the Xi are iid, the Yi are also iid. In addition, E[Yi] = 0 and Var(Yi) = 1.
Now, due to the independence and identical distribution assumptions,

Mn(t) = E[e^{t(Y1 + ... + Yn)/√n}] = E[e^{(t/√n)Y1} ··· e^{(t/√n)Yn}]
      = E[e^{(t/√n)Y1}] ··· E[e^{(t/√n)Yn}] = [M_{Yi}(t/√n)]^n.

Let's expand M_{Yi}(t/√n) around t = 0:

M_{Yi}(t/√n) = M_{Yi}(0) + M′_{Yi}(0) (t/√n) + (1/2) M″_{Yi}(0) (t/√n)² + R_Y(t/√n),

where the remainder is

R_Y(t/√n) = Σ_{k=3}^∞ [M^(k)_{Yi}(0)/k!] (t/√n)^k.

Convergence Concepts
Convergence in Distribution

These expansions exist because the mgf of Xi exists in a neighbourhood of 0
(|t| < h for some h), which implies that M_{Yi}(t/√n) exists for each fixed t once n is
large enough.
We know from Theorem (5.5.21) that, for fixed t,

lim_{n→∞} R_Y(t/√n)/(t/√n)² = lim_{t/√n→0} R_Y(t/√n)/(t/√n)² = 0,

where we need to have t ≠ 0.

But, as will become clear in a moment, we are interested in the behaviour of

R_Y(t/√n)/(1/√n)².

Since the above result is based on fixed t, we have

lim_{n→∞} R_Y(t/√n)/(1/√n)² = lim_{n→∞} n R_Y(t/√n) = 0,     (1)

as well, for all t, including t = 0, since R_Y(0/√n) = 0.

Convergence Concepts
Convergence in Distribution

Now, notice that

M′_{Yi}(0) = E[Yi] = 0,   M″_{Yi}(0) = E[Yi²] = Var(Yi) = 1,

and M_{Yi}(0) = 1 by definition.

Therefore,

M_{Yi}(t/√n) = 1 + 0·(t/√n) + (1/2)(t/√n)² + R_Y(t/√n) = 1 + (1/n)[ t²/2 + n R_Y(t/√n) ],

so that

Mn(t) = [ 1 + (1/n)( t²/2 + n R_Y(t/√n) ) ]^n.

Convergence Concepts
Convergence in Distribution

Remember that for any sequence an, if lim_{n→∞} an = a, then

lim_{n→∞} (1 + an/n)^n = e^a.

Let an = t²/2 + n R_Y(t/√n) and observe that, by (1), lim_{n→∞} an = t²/2. Then,

lim_{n→∞} Mn(t) = lim_{n→∞} [ 1 + (1/n)( t²/2 + n R_Y(t/√n) ) ]^n = e^{t²/2}.

But this is the mgf of the N(0, 1) distribution! Hence, the CLT is proved.


Convergence Concepts
Convergence in Distribution

Example (5.5.16): Suppose (X1, ..., Xn) is a random sample from a negative
binomial(r, p) distribution. For this distribution, one can show that

E[Xi] = r(1 − p)/p   and   Var(Xi) = r(1 − p)/p²,

for all i.
Then, the CLT tells us that

√n ( X̄ − r(1 − p)/p ) / √( r(1 − p)/p² ) →d N(0, 1).

Hence, in a large sample, this quantity should be approximately standard normally
distributed.


Convergence Concepts
Convergence in Distribution

One can also do exact calculations, but these would be difficult. Take r = 10, p = .5
and n = 30. Now, consider

P(X̄ ≤ 11) = P( Σ_{i=1}^{30} Xi ≤ 330 ) = Σ_{x=0}^{330} C(300 + x − 1, x) (1/2)^{300} (1/2)^x = .8916,

which follows from the fact that Σ_{i=1}^{30} Xi is negative binomial(nr, p) (you do not
have to prove the results presented in this bullet point!).
Such calculations would be tough, even using a computer, as we are considering
factorials of very large numbers.
We could also use the CLT to obtain the following approximation:

P(X̄ ≤ 11) = P( √30 (X̄ − 10)/√20 ≤ √30 (11 − 10)/√20 ) ≈ P(Z ≤ 1.2247) = .8888,

where Z ~ N(0, 1).
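For what it is worth, modern statistical libraries evaluate such negative binomial cdfs without difficulty, so both numbers above can be reproduced directly. The sketch below does this with scipy; treat it as an illustration rather than part of the original notes.

# Sketch: exact probability versus the CLT approximation from Example (5.5.16).
import numpy as np
from scipy import stats

r, p, n = 10, 0.5, 30
exact = stats.nbinom.cdf(330, n * r, p)       # sum of the Xi is negative binomial(nr, p)

mu = r * (1 - p) / p                           # population mean, 10
var = r * (1 - p) / p**2                       # population variance, 20
z = np.sqrt(n) * (11 - mu) / np.sqrt(var)      # about 1.2247
approx = stats.norm.cdf(z)

print(exact, approx)                           # roughly .8916 and .8888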

Convergence Concepts
Convergence in Distribution

Two useful results are given next.

Theorem: If Xn is a sequence of random vectors each with support X, g(x) is
continuous on X and Xn →d X, then

g(Xn) →d g(X).

Theorem (5.5.17) (Slutsky's Theorem): If Xn →d X and Yn →p k, where k is a
constant, then
1. Yn Xn →d kX,
2. Xn + Yn →d X + k.


Convergence Concepts
Convergence in Distribution

Example (5.5.18): Suppose that

√n (X̄n − μ)/σ →d N(0, 1),

but the value of σ is unknown. What to do?

In Example (5.5.3), we have seen that if lim_{n→∞} Var(Sn²) = 0, then Sn² →p σ². One
can show that this implies that Sn →p σ, and hence σ/Sn →p 1.

Then, by Slutsky's Theorem,

√n (X̄n − μ)/Sn = (σ/Sn) · √n (X̄n − μ)/σ →d N(0, 1),

since σ/Sn →p 1 and √n (X̄n − μ)/σ →d N(0, 1).

If you find this confusing, try to see this as

√n (X̄n − μ)/Sn →d X, where X ~ 1 · N(0, 1).

The mean of X is equal to 1 · 0, while Var(X) = 1² · 1. Moreover, clearly, X is
normally distributed. Hence,

√n (X̄n − μ)/Sn →d N(0, 1).
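A minimal simulation sketch of this studentised mean is given below; the skewed exponential population, sample size and seed are arbitrary illustrative choices. Replacing σ by Sn leaves the large-sample standard normal behaviour intact.

# Sketch: the studentised sample mean is approximately N(0,1) in large samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n, reps = 100, 100_000
mu = 1.0

x = rng.exponential(mu, size=(reps, n))                       # sigma unknown in practice
t = np.sqrt(n) * (x.mean(axis=1) - mu) / x.std(axis=1, ddof=1)
print(stats.kstest(t, stats.norm.cdf).statistic)              # close to standard normal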

Convergence Concepts
The Delta Method

When talking about the CLT, our focus has been on the limiting distribution of
some standardised random variable.
There are many instances, however, when we are not specifically interested in the
distribution of the standardised random variable itself, but rather in some function of
it.
The delta method comes in handy in such cases. This method utilises our knowledge
of the limiting distribution of a random variable in order to find the limiting
distribution of a function of this random variable.
In essence, this method is a combination of Slutsky's Theorem and Taylor
approximation.


Convergence Concepts
The Delta Method

Theorem (5.5.24) (Delta Method): Let Yn be a sequence of random variables
that satisfies

√n (Yn − θ) →d N(0, σ²).

For a given function g(·) and a specific value of θ, suppose that g′(θ) exists and
g′(θ) ≠ 0. Then,

√n [g(Yn) − g(θ)] →d N(0, σ² [g′(θ)]²).

Proof: The first-order Taylor expansion of g(Yn) about Yn = θ is

g(Yn) = g(θ) + g′(θ)(Yn − θ) + R,

where R is the remainder and R →p 0 as Yn →p θ.

Then,

√n [g(Yn) − g(θ)] ≈ g′(θ) √n (Yn − θ),   where √n (Yn − θ) →d N(0, σ²) as n → ∞,

and therefore,

√n [g(Yn) − g(θ)] →d g′(θ) N(0, σ²) = N(0, [g′(θ)]² σ²).

Implicitly, what we need here is that Yn →p θ, as this makes sure that R →p 0.
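A simulation sketch of the delta method with the illustrative choice g(x) = 1/x is given below; the population, sample size and seed are arbitrary. Since g′(μ) = −1/μ², the theorem suggests √n (1/X̄ − 1/μ) is approximately N(0, σ²/μ⁴).

# Sketch: simulated variance of sqrt(n)*(1/X-bar - 1/mu) versus the delta-method variance.
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, n, reps = 2.0, 1.0, 500, 100_000

xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
t = np.sqrt(n) * (1.0 / xbar - 1.0 / mu)

print(t.var(), sigma**2 / mu**4)   # simulated variance vs sigma^2 * g'(mu)^2 = 1/16
print(t.mean())                    # approximately zero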



Convergence Concepts
The Delta Method

In some cases, one might have

g′(θ) = 0.

In this case, the delta method as stated above will not work.
However, this problem can be solved by using a second-order delta method.
Consider the second-order expansion

g(Yn) = g(θ) + g′(θ)(Yn − θ) + [g″(θ)/2](Yn − θ)² + R.

As before, R →p 0 as Yn →p θ. However, this time g′(θ) = 0.

So,

g(Yn) − g(θ) ≈ [g″(θ)/2](Yn − θ)²,   as n → ∞.

Convergence Concepts
The Delta Method

Now,

√n (Yn − θ)/σ →d N(0, 1),

which implies that

n (Yn − θ)²/σ² →d χ²_1.

Hence,

n [g(Yn) − g(θ)] ≈ [g″(θ)/2] n (Yn − θ)²,   where n (Yn − θ)² →d σ² χ²_1, as n → ∞,

and, therefore,

n [g(Yn) − g(θ)] →d [g″(θ)/2] σ² χ²_1.

Convergence Concepts
The Delta Method

The next theorem follows.

Theorem (5.5.26) (Second-Order Delta Method): Let Yn be a sequence of
random variables that satisfies

√n (Yn − θ) →d N(0, σ²).

For a given function g and a specific value of θ, suppose that g′(θ) = 0, g″(θ)
exists and g″(θ) ≠ 0. Then,

n [g(Yn) − g(θ)] →d [g″(θ)/2] σ² χ²_1.

Convergence Concepts
Some More Large Sample Results

So far we have worked with iid random sequences only. We now introduce large-sample
results for a different type of distribution. The following is largely based on
White (2001).
Let's start with independent, heterogeneously distributed random variables.
The failure of the identical distribution assumption results from stratifying
(grouping) the population in some way. The independence assumption remains valid
provided that sampling within and across the strata is random.
Theorem (Markov's Law of Large Numbers): Let X1, ..., Xn be a sequence of
independent random variables, with E[Xi] = μi < ∞, for all i. If for some δ > 0,

Σ_{i=1}^∞ E[ |Xi − μi|^{1+δ} ] / i^{1+δ} < ∞,

then

(1/n) Σ_{i=1}^n Xi − (1/n) Σ_{i=1}^n μi →a.s. 0.

Convergence Concepts
Some More Large Sample Results

Notice some important differences.

First, note that we have relaxed the iid assumption: the sequence is not identically
distributed. Instead, we have heterogeneously distributed random variables. This
comes at a cost: we now have to ensure that moments of order higher than one
are also bounded. The following corollary makes this easier to see.
Corollary 3.9 (White (2001)): Let X1, ..., Xn be a sequence of independent
random variables such that E[ |Xi|^{1+δ} ] ≤ Δ < ∞ for some δ > 0 and all i. Then,

(1/n) Σ_{i=1}^n Xi − (1/n) Σ_{i=1}^n μi →a.s. 0.

Remember that Kolmogorov's strong law only requires the existence of the first
order moment.

So, moving away from the iid assumption usually comes at the cost of stronger
moment assumptions.
The Xi are heterogeneous, so their expected values are not identical. So now we have
n^{-1} Σ_{i=1}^n μi lying around. Compare this with the iid case,

(1/n) Σ_{i=1}^n E[Xi] = (1/n) Σ_{i=1}^n μ = μ.

Convergence Concepts
Some More Large Sample Results

There is also a CLT associated with independent, heterogeneously distributed random
sequences. The following is based on Theorem 5.6 from White (2001). For
conciseness, define

μ̄ = (1/n) Σ_{i=1}^n E[Xi]   and   σ̄² = (1/n) Σ_{i=1}^n Var(Xi),

i.e. the average mean and the average variance, respectively.

Theorem (Lindeberg-Feller): Let X1, ..., Xn be a sequence of independent random
scalars with E[Xi] = μi < ∞, Var(Xi) = σi², where 0 < σi² < ∞, and distribution
functions Fi, i = 1, 2, .... Then,

√n (X̄ − μ̄)/σ̄ →d N(0, 1)

if and only if, for every ε > 0,

lim_{n→∞} (1/σ̄²) (1/n) Σ_{i=1}^n ∫_{(x − μi)² > ε n σ̄²} (x − μi)² dFi(x) = 0.     (2)

Now, this looks a lot more complicated!

Convergence Concepts
Some More Large Sample Results

Again, let's proceed step by step.

First of all, the normalised term that converges in distribution is almost the same as
the one under the iid assumption. The only difference is that we now use the
average mean and the average variance. This is not surprising, since we are dealing with
heterogeneous random variables.
Second, the conditions of the theorem seem to be very similar to those of the CLT for iid
processes, except for (2).
This is known as the Lindeberg condition, and it requires the average contribution of
the extreme tails to the variance of the Xi to be zero in the limit.
The integral is actually the contribution to the variance of Xi from the region where

(x − μi)² / (n σ̄²) = (x − μi)² / Σ_{i=1}^n Var(Xi) > ε.

Convergence Concepts
Some More Large Sample Results

There is an equivalent result, which is easier to follow. This is given in Theorem
5.10 (White, 2001).
Theorem (Liapounov): Let X1, ..., Xn be a sequence of independent random scalars
with E[Xi] = μi < ∞, Var(Xi) = σi² < ∞ and

E[ |Xi − μi|^{2+δ} ] ≤ Δ < ∞, for some δ > 0 and all i.

If

σ̄² > 0

for all n sufficiently large, then

√n (X̄ − μ̄)/σ̄ →d N(0, 1).

The conditions of this result are easier to follow. We now need the existence of even
higher order moments (of order greater than two).

Convergence Concepts
Order Notation

We finish this part by introducing the order notation.

This so-called big-O, little-o notation is used to determine at which speed random
variables approach some bounded limit.
Put differently, the notation is related to the order of magnitude of terms as n → ∞.
Let f(x) and g(x) be two functions.
If

f(x)/g(x) → 0   as   x → ∞,

then f is of smaller order than g and we write

f(x) = o{g(x)}.

If

lim_{x→∞} |f(x)/g(x)| ≤ constant,

then we will write

f(x) = O{g(x)}.

Convergence Concepts
Order Notation

There is also a corresponding notation for random variables.

Let X1, X2, ... be a sequence of random variables and f be a real function.
If

Xn / f(n) →p 0,

then we write

Xn = op{f(n)},

while if

Xn / f(n) →p X,   where X is a constant,

we write

Xn = Op{f(n)}.

The same applies to almost sure convergence.

Convergence Concepts
Order Notation

It is important to know the following two representations of convergence in
probability and distribution, based on order notation.
Let X̄ = n^{-1} Σ_{i=1}^n Xi.

Then, instead of

X̄ →p μ   or   X̄ →a.s. μ,

one can also write

X̄ = μ + op(1)   or   X̄ = μ + oa.s.(1),

respectively.
Moreover, if

√n (X̄ − μ)/σ →d N(0, 1),

then one can also write

√n (X̄ − μ)/σ = Op(1).

Convergence Concepts
Order Notation

You might find it not so clear at first, but if

Xn = o(n^λ)

for some λ, then

Xn = O(n^λ).

To see this, notice that

Xn / n^λ → 0 = O(1).

So,

Xn / n^λ = O(1)  ⟺  Xn = O(n^λ).

Convergence Concepts
Order Notation

It might now be obvious to you that, if

Xn → 0,

then

Xn = o(1).

Now, consider

Xn = O(n^λ)  ⟹  Xn / n^λ = O(1).

Then, for any δ > 0,

Xn / n^{λ+δ} = o(1)  ⟹  Xn = o(n^{λ+δ}).
