
Notes on Asymptotic Theory: Convergence in Probability and Distribution


Introduction to Econometric Theory
Econ. 770
Jonathan B. Hill
Dept. of Economics
University of North Carolina - Chapel Hill
November 19, 2011

Introduction

Let $(\Omega, \mathcal{F}, P)$ be a probability space. Throughout $\theta$ is a parameter of interest like the mean, variance, correlation, or distribution parameters like Poisson $\lambda$, Binomial $p$, or exponential $\lambda$. Throughout $\{\hat\theta_n\}_{n=1}^{\infty}$ is a sequence of estimators of $\theta$ based on a sample of data $\{X_i\}_{i=1}^{n}$ with sample size $n \geq 1$. Assume $\hat\theta_n$ is $\mathcal{F}$-measurable for any $n$. Unless otherwise noted, assume the $X_i$'s have the same mean and variance: $X_i \sim (\mu, \sigma^2)$. If appropriate, we may have a bivariate sample $\{X_i, Y_i\}_{i=1}^{n}$ where $X_i \sim (\mu_x, \sigma_x^2)$ and $Y_i \sim (\mu_y, \sigma_y^2)$.
Examples include the sample mean, variance, or correlation:

$$\text{Sample Mean}: \quad \bar X_n := \frac{1}{n}\sum_{i=1}^{n} X_i$$

$$\text{Sample Variance \#1}: \quad s_n^2 := \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar X_n\right)^2$$

$$\text{Sample Variance \#2}: \quad \hat\sigma_n^2 := \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar X_n\right)^2$$

$$\text{Sample Correlation}: \quad \hat\rho_n := \frac{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar X_n\right)\left(Y_i - \bar Y_n\right)}{\hat\sigma_{x,n}\,\hat\sigma_{y,n}}$$

Similarly, we may estimate a probability by using a sample relative frequency:

$$\hat P_n(a) := \frac{1}{n}\sum_{i=1}^{n} I\left(X_i \leq a\right), \quad \text{the sample percentage of } X_i \leq a$$

Notice $\hat P_n(a)$ estimates $P(X_i \leq a)$.
We will look at estimator properties: what $\hat\theta_n$ is on average for any sample size $n$; and what $\hat\theta_n$ becomes as the sample size $n$ grows. In every case above the estimator is a variant of a straight average (e.g. $\hat P_n(a) = \frac{1}{n}\sum_{i=1}^{n} I(X_i \leq a)$ is a straight average of $I(X_i \leq a)$), or a function of a straight average (e.g. $\hat\sigma_n := (\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar X_n)^2)^{1/2}$, the square root of the average $(X_i - \bar X_n)^2$). We therefore pay particular attention to the sample mean.
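
For concreteness, a minimal Python sketch computing each estimator above from one simulated sample (the simulated design here is my own choice, loosely borrowed from the examples later in these notes):

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed for reproducibility
n = 100
x = rng.normal(7, 20, size=n)           # X_i ~ N(7, 400)
y = 4.3 + 2 * x + rng.normal(0, 30, size=n)

xbar = x.mean()                                    # sample mean
s2 = ((x - xbar) ** 2).sum() / (n - 1)             # sample variance #1 (1/(n-1))
sig2_hat = ((x - xbar) ** 2).mean()                # sample variance #2 (1/n)
ybar = y.mean()
rho_hat = ((x - xbar) * (y - ybar)).mean() / np.sqrt(
    sig2_hat * ((y - ybar) ** 2).mean())           # sample correlation
p_hat = (x <= 7).mean()                            # relative frequency estimate of P(X_i <= 7)
```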

Unbiasedness

Defn. We say $\hat\theta_n$ is an unbiased estimator of $\theta$ if $E[\hat\theta_n] = \theta$. Define bias as

$$\mathcal{B}\left(\hat\theta_n\right) := E\left[\hat\theta_n\right] - \theta$$

An unbiased estimator has zero bias: $\mathcal{B}(\hat\theta_n) = 0$. If we had an infinite number of samples of size $n$, then the average estimate $\hat\theta_n$ across all samples would be $\theta$. An asymptotically unbiased estimator satisfies $\mathcal{B}(\hat\theta_n) \to 0$ as $n \to \infty$.

Claim (Weighted Average): Let $X_i$ have a common mean $\mu := E[X_i]$. Then the weighted average $\hat\mu_n := \sum_{i=1}^{n}\omega_i X_i$ is an unbiased estimator of $\mu := E[X_i]$ if $\sum_{i=1}^{n}\omega_i = 1$.

Proof:

$$E\left[\sum_{i=1}^{n}\omega_i X_i\right] = \sum_{i=1}^{n}\omega_i E\left[X_i\right] = \mu\sum_{i=1}^{n}\omega_i = \mu. \quad \text{QED}.$$

Corollary (Straight Average): The sample mean $\bar X_n := \frac{1}{n}\sum_{i=1}^{n} X_i$ is a weighted average with flat or uniform weights $\omega_i = 1/n$, hence trivially $\sum_{i=1}^{n}\omega_i = 1$, hence $E[\bar X_n] = \mu$.
The problem then arises as to which weighted average $\sum_{i=1}^{n}\omega_i X_i$ may be preferred in practice, since any with unit summed weights is unbiased. We will discuss the concept of efficiency below, but the minimum mean-squared-error unbiased estimator has uniform weights if $X_i \sim iid(\mu, \sigma^2)$. That is:

Claim (Sample Mean is Best): Let $X_i \sim iid(\mu, \sigma^2)$. Then $\bar X_n$ is the best linear unbiased estimator of $\mu$ (i.e. it is BLUE).

Proof: We want to solve

$$\min_{\omega_1,\dots,\omega_n} E\left(\sum_{i=1}^{n}\omega_i X_i - \mu\right)^2 \quad \text{subject to} \quad \sum_{i=1}^{n}\omega_i = 1$$

The Lagrangian is

$$\mathcal{L}\left(\omega, \lambda\right) := E\left(\sum_{i=1}^{n}\omega_i\left(X_i - \mu\right)\right)^2 + \lambda\left(1 - \sum_{i=1}^{n}\omega_i\right)$$

where by independence $E(\sum_{i=1}^{n}\omega_i(X_i - \mu))^2 = \sigma^2\sum_{i=1}^{n}\omega_i^2$, hence

$$\mathcal{L}\left(\omega, \lambda\right) := \sigma^2\sum_{i=1}^{n}\omega_i^2 + \lambda\left(1 - \sum_{i=1}^{n}\omega_i\right)$$

The first order conditions are

$$\frac{\partial}{\partial\omega_i}\mathcal{L}\left(\omega,\lambda\right) = 2\sigma^2\omega_i - \lambda = 0 \quad \text{and} \quad \frac{\partial}{\partial\lambda}\mathcal{L}\left(\omega,\lambda\right) = 1 - \sum_{i=1}^{n}\omega_i = 0$$

Therefore $\omega_i = \lambda/(2\sigma^2)$ is a constant that sums to $\sum_{i=1}^{n}\omega_i = 1$. Write $\omega_i = \lambda/(2\sigma^2) =: \omega$. Since $\sum_{i=1}^{n}\omega_i = \sum_{i=1}^{n}\omega = n\omega = 1$ it follows $\omega_i = \omega = 1/n$. QED.

Remark: As in many cases here and below, uncorrelatedness can be substituted for independence since the same proof applies: $E[X_iX_j] = E[X_i]E[X_j]$ for all $i \neq j$. We can also replace uncorrelatedness with a condition that restricts the total correlation across all $X_i$ and $X_j$ for $i \neq j$, but such generality is typically only exploited in time series settings (where $X_j$ is at a different time period).

Claim (Sample Variance): Let $X_i \sim iid(\mu, \sigma^2)$. The estimator $s_n^2$ is unbiased and $\hat\sigma_n^2$ is negatively biased but asymptotically unbiased.

Proof: Notice

$$\frac{n-1}{n}s_n^2 = \hat\sigma_n^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar X_n\right)^2 = \frac{1}{n}\sum_{i=1}^{n}\left(\left(X_i - \mu\right) - \left(\bar X_n - \mu\right)\right)^2$$

$$= \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \mu\right)^2 - 2\left(\bar X_n - \mu\right)\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \mu\right) + \left(\bar X_n - \mu\right)^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \mu\right)^2 - \left(\bar X_n - \mu\right)^2$$

By the iid assumption and the fact that $\bar X_n$ is unbiased

$$E\left(\bar X_n - \mu\right)^2 = V\left[\bar X_n\right] = \frac{1}{n^2}\sum_{i=1}^{n}V\left[X_i\right] = \frac{1}{n^2}\,n\sigma^2 = \frac{\sigma^2}{n}$$

Further, by definition $\sigma^2 := E[(X_i - \mu)^2]$ hence

$$E\left[\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \mu\right)^2\right] = \frac{1}{n}\sum_{i=1}^{n}E\left[\left(X_i - \mu\right)^2\right] = \frac{1}{n}\sum_{i=1}^{n}\sigma^2 = \sigma^2$$

Therefore

$$\frac{n-1}{n}E\left[s_n^2\right] = E\left[\hat\sigma_n^2\right] = \sigma^2 - \frac{\sigma^2}{n} = \frac{n-1}{n}\sigma^2$$

This implies each claim: $E[s_n^2] = \sigma^2$ ($s_n^2$ is unbiased), $E[\hat\sigma_n^2] = \sigma^2(n-1)/n < \sigma^2$ ($\hat\sigma_n^2$ is negatively biased), and $E[\hat\sigma_n^2] = \sigma^2(n-1)/n \to \sigma^2$ ($\hat\sigma_n^2$ is asymptotically unbiased). QED.
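
A quick simulation check of the claim, a minimal sketch (my own) with $\sigma^2 = 1$: the average of $s_n^2$ across many samples should be near 1, while the average of $\hat\sigma_n^2$ should be near $(n-1)/n$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 10, 200_000
x = rng.normal(0, 1, size=(reps, n))               # sigma^2 = 1
dev2 = (x - x.mean(axis=1, keepdims=True)) ** 2    # (X_i - X_bar)^2 per sample
print(dev2.sum(axis=1).mean() / (n - 1))           # E[s_n^2]:          approx 1.0
print(dev2.mean())                                 # E[sigma_hat_n^2]:  approx (n-1)/n = 0.9
```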

Example: We simulate 100 samples of $N(75, 4)$ with sample size $n = 20$. In Figure 1 we plot $\bar X_n$ for each sample. The simulation average of all $\bar X_n$'s is 74.983941 and the simulation variance of all $\bar X_n$'s is .21615195. In Figure 2 we plot $\hat\mu_n = \sum_{i=1}^{n}\omega_i X_i$ for each sample with weights $\omega_i = i/\sum_{j=1}^{n} j$. The simulation average of all $\hat\mu_n$'s is 74.982795 and the simulation variance of all $\hat\mu_n$'s is .30940776. Thus, both display the same property of unbiasedness, but $\bar X_n$ exhibits less dispersion across samples.

Figure 1: $\bar X_n$ across 100 samples [plot omitted]

Figure 2: $\hat\mu_n$ across 100 samples [plot omitted]
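
This experiment is easy to replicate; a minimal sketch, assuming (as reconstructed above) that the Figure 2 weights are $\omega_i = i/\sum_{j=1}^{n} j$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 20, 100
w = np.arange(1, n + 1) / np.arange(1, n + 1).sum()   # weights w_i = i / sum_j j

x = rng.normal(75, 2, size=(reps, n))                 # 100 samples of N(75, 4)
xbar = x.mean(axis=1)                                 # straight average per sample
muhat = x @ w                                         # weighted average per sample

print(xbar.mean(), xbar.var())     # approx 75 and sigma^2/n = 0.2
print(muhat.mean(), muhat.var())   # approx 75, but a larger variance (approx 0.26)
```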

Convergence in Mean-Square or $L_2$-Convergence

Defn. We say $\hat\theta_n \in \mathbb{R}$ converges to $\theta$ in mean-square if

$$\text{MSE}\left(\hat\theta_n\right) := E\left[\hat\theta_n - \theta\right]^2 \to 0$$

We also write $\hat\theta_n \xrightarrow{ms} \theta$ and $\hat\theta_n \to \theta$ in mean-square.

If $\hat\theta_n$ is unbiased for $\theta$ then

$$\text{MSE}\left(\hat\theta_n\right) = E\left[\hat\theta_n - E\left[\hat\theta_n\right]\right]^2 = V\left[\hat\theta_n\right]$$

Convergence in mean-square certainly does not require unbiasedness. In general, the MSE is

$$\text{MSE}\left(\hat\theta_n\right) = E\left[\hat\theta_n - \theta\right]^2 = E\left[\left(\hat\theta_n - E\left[\hat\theta_n\right]\right) + \left(E\left[\hat\theta_n\right] - \theta\right)\right]^2$$

$$= E\left[\hat\theta_n - E\left[\hat\theta_n\right]\right]^2 + \left(E\left[\hat\theta_n\right] - \theta\right)^2 + 2E\left[\hat\theta_n - E\left[\hat\theta_n\right]\right]\left(E\left[\hat\theta_n\right] - \theta\right)$$

$$= E\left[\hat\theta_n - E\left[\hat\theta_n\right]\right]^2 + \left(E\left[\hat\theta_n\right] - \theta\right)^2$$

since $E[\hat\theta_n] - \theta$ is just a constant and $E[\hat\theta_n - E[\hat\theta_n]] = E[\hat\theta_n] - E[\hat\theta_n] = 0$. Hence MSE is the variance plus bias squared:

$$\text{MSE}\left(\hat\theta_n\right) = E\left[\hat\theta_n - E\left[\hat\theta_n\right]\right]^2 + \left(E\left[\hat\theta_n\right] - \theta\right)^2 = V\left[\hat\theta_n\right] + \mathcal{B}\left(\hat\theta_n\right)^2$$
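
The decomposition can be verified numerically; a minimal sketch (my own), using $\hat\sigma_n^2$ as the estimator of $\sigma^2 = 1$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 10, 200_000
x = rng.normal(0, 1, size=(reps, n))
sig2_hat = ((x - x.mean(axis=1, keepdims=True)) ** 2).mean(axis=1)

mse = ((sig2_hat - 1.0) ** 2).mean()                            # direct MSE vs sigma^2 = 1
var_plus_bias2 = sig2_hat.var() + (sig2_hat.mean() - 1.0) ** 2  # variance + bias^2
print(mse, var_plus_bias2)                                      # agree up to simulation noise
```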

If $\hat\theta_n \in \mathbb{R}^k$ then we write

$$\text{MSE}\left(\hat\theta_n\right) := E\left[\left(\hat\theta_n - \theta\right)\left(\hat\theta_n - \theta\right)'\right] \to 0$$

hence component-wise convergence. We may similarly write convergence in $L_2$-norm

$$E\left\|\hat\theta_n - \theta\right\|_2 \to 0 \quad \text{where } \left\|x\right\|_2 := \left(\sum_{i=1}^{k}x_i^2\right)^{1/2}$$

or convergence in matrix (spectral) norm:

$$E\left\|\left(\hat\theta_n - \theta\right)\left(\hat\theta_n - \theta\right)'\right\| \to 0 \quad \text{where } \left\|A\right\| \text{ is the square root of the largest eigenvalue of } A'A$$

Both imply convergence with respect to each element: $E[\hat\theta_{i,n} - \theta_i]^2 \to 0$.

Defn. We say $\hat\theta_n \in \mathbb{R}$ has the property of $L_p$-convergence, or convergence in $L_p$-norm, to $\theta$ if for $p > 0$

$$E\left|\hat\theta_n - \theta\right|^p \to 0$$

Clearly $L_2$-convergence and mean-square convergence are equivalent.
Claim (Sample Mean): Let $X_i \sim iid(\mu, \sigma^2)$. Then $\bar X_n \to \mu$ in mean square.

Proof: $E(\bar X_n - \mu)^2 = V[\bar X_n] = \sigma^2/n \to 0$. QED.

We only require uncorrelatedness since $V[\bar X_n] = \sigma^2/n$ still holds.

Claim (Sample Mean): Let $X_i \sim (\mu, \sigma^2)$ be uncorrelated. Then $\bar X_n \to \mu$ in mean square.

Proof: $E(\bar X_n - \mu)^2 = V[\bar X_n] = \sigma^2/n \to 0$. QED.

In fact, we only need all cross covariances to not be too large as the sample size grows.

Claim (Sample Mean): Let $X_i \sim (\mu, \sigma^2)$ satisfy $\frac{1}{n^2}\sum_{i \neq j}\text{Cov}(X_i, X_j) \to 0$. Then $\bar X_n \to \mu$ in mean square.

Proof: $E(\bar X_n - \mu)^2 = V[\bar X_n] = \frac{\sigma^2}{n} + \frac{2}{n^2}\sum_{i < j}\text{Cov}(X_i, X_j) \to 0$. QED.

Remark: In micro-economic contexts involving cross-sectional data this type of correlatedness is evidently rarely or never entertained. Typically we assume the $X_i$'s are uncorrelated. It is, however, profoundly popular in macroeconomic and finance contexts where data are time series. A very large class of time series random variables satisfies both $\text{Cov}(X_i, X_j) \neq 0$ $\forall i \neq j$ and $\frac{1}{n^2}\sum_{i \neq j}\text{Cov}(X_i, X_j) \to 0$, and therefore exhibits $\bar X_n \to \mu$ in mean square.

If $X_i \sim iid(\mu, \sigma^2)$ then $\bar X_n \to \mu$ in $L_p$-norm for any $p \in (1, 2]$, but proving the result for non-integer $p \in (1, 2)$ is quite a bit more difficult. There are many types of "maximal inequalities", however, that can be used to prove

$$E\left|\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \mu\right)\right|^p \leq \frac{K}{n^{p-1}}E\left|X_i - \mu\right|^p \quad \text{for } p \in (1, 2), \text{ where } K > 0 \text{ is a finite constant.}$$

Claim (Sample Mean): Let $X_i \sim (\mu, \sigma^2)$ be iid. Then $\bar X_n \to \mu$ in $L_p$-norm for any $p \in (1, 2)$.

Proof:

$$E\left|\bar X_n - \mu\right|^p = E\left|\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \mu\right)\right|^p \leq \frac{K}{n^{p-1}}E\left|X_i - \mu\right|^p \to 0$$

since $p > 1$. QED.

Example: We simulate $N(7, 400)$ with sample sizes $n = 5, 15, 25, \dots, 1000$. In Figure 3 we plot $\bar X_n$ and $V[\bar X_n] = 400/n$ over sample size $n$. Notice the high volatility for small $n$.

Figure 3: $\bar X_n$ and $V[\bar X_n]$ over sample size $n$ [plot omitted]
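
A minimal sketch replicating this experiment (my own code; only the design $N(7, 400)$ and the grid of sample sizes come from the example):

```python
import numpy as np

rng = np.random.default_rng(4)
ns = np.arange(5, 1001, 10)                              # n = 5, 15, 25, ..., 995
xbars = np.array([rng.normal(7, 20, size=n).mean() for n in ns])
var_xbar = 400 / ns                                      # V[X_bar] = sigma^2 / n
# plotting xbars against ns reproduces the Figure 3 pattern:
# wide swings for small n, settling near mu = 7 as V[X_bar] shrinks
```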

Convergence in Probability: WLLN

Defn. We say $\hat\theta_n$ converges in probability to $\theta$ if

$$\lim_{n\to\infty} P\left(\left|\hat\theta_n - \theta\right| > \varepsilon\right) = 0 \quad \forall \varepsilon > 0$$

We variously write

$$\hat\theta_n \xrightarrow{p} \theta \quad \text{and} \quad \text{plim}\,\hat\theta_n = \theta \tag{1}$$

and we say $\hat\theta_n$ is a consistent estimator of $\theta$.

Since probability convergence is convergence in the sequence $\{P(|\hat\theta_n - \theta| \leq \varepsilon)\}_{n=1}^{\infty}$, by the definition of a limit it follows for every $\delta > 0$ there exists $N \geq 0$ such that

$$P\left(\left|\hat\theta_n - \theta\right| \leq \varepsilon\right) \geq 1 - \delta \quad \forall n \geq N$$

That is, for a large enough sample size $\hat\theta_n$ is guaranteed to be as close to $\theta$ as we choose (i.e. the $\varepsilon$) with as great a probability as we choose (i.e. $1 - \delta$).
Claim (Law of Large Numbers = LLN): If $X_i \sim iid(\mu, \sigma^2)$ then $\bar X_n \xrightarrow{p} \mu$.

Proof: By Chebyshev's inequality and independence, for any $\varepsilon > 0$

$$P\left(\left|\bar X_n - \mu\right| > \varepsilon\right) \leq \frac{E\left(\bar X_n - \mu\right)^2}{\varepsilon^2} = \frac{\sigma^2}{n\varepsilon^2} \to 0. \quad \text{QED}$$
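
The Chebyshev bound itself can be eyeballed by simulation; a minimal sketch (mine, with $\varepsilon = 5$ chosen so the bound is nontrivial):

```python
import numpy as np

rng = np.random.default_rng(5)
eps, reps = 5.0, 10_000
for n in (10, 100, 1000):
    xbar = rng.normal(7, 20, size=(reps, n)).mean(axis=1)   # sigma^2 = 400
    freq = (np.abs(xbar - 7) > eps).mean()                  # P(|X_bar - mu| > eps)
    bound = 400 / (n * eps ** 2)                            # sigma^2 / (n eps^2)
    print(n, freq, bound)                                   # freq <= bound, both -> 0
```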

Remark 1: We call this a Weak Law of Large Numbers [WLLN] since convergence is in probability. A Strong LLN based on a stronger form of convergence is given below.

Remark 2: We only need uncorrelatedness to get $E(\bar X_n - \mu)^2 = \sigma^2/n \to 0$. The WLLN, however, extends to many forms of dependent random variables.

Remark 3: In the iid case we only need $E|X_i| < \infty$, although the proof is substantially more complicated. Even for non-iid data we typically only need $E|X_i|^{1+\delta} < \infty$ for infinitesimal $\delta > 0$ (pay close attention to scholarly articles you read, and to your own assumptions: usually far stronger assumptions are imposed than are actually required).
The weighted average $\sum_{i=1}^{n}\omega_{i,n}X_i$ is also consistent as long as the weights decay with the sample size. Thus we write the weight as $\omega_{i,n}$.

Claim: If $X_i \sim iid(\mu, \sigma^2)$ then $\sum_{i=1}^{n}\omega_{i,n}X_i \xrightarrow{p} \mu$ if $\sum_{i=1}^{n}\omega_{i,n} = 1$ and $\sum_{i=1}^{n}\omega_{i,n}^2 \to 0$.

Proof: By Chebyshev's inequality, independence and $\sum_{i=1}^{n}\omega_{i,n} = 1$, for any $\varepsilon > 0$

$$P\left(\left|\sum_{i=1}^{n}\omega_{i,n}X_i - \mu\right| > \varepsilon\right) \leq \frac{1}{\varepsilon^2}E\left(\sum_{i=1}^{n}\omega_{i,n}\left(X_i - \mu\right)\right)^2 = \frac{1}{\varepsilon^2}\sum_{i=1}^{n}\omega_{i,n}^2 E\left[\left(X_i - \mu\right)^2\right] = \frac{\sigma^2}{\varepsilon^2}\sum_{i=1}^{n}\omega_{i,n}^2 \to 0$$

which proves the claim. QED.

An example is $\bar X_n$ with $\omega_{i,n} = 1/n$, but also the weights $\omega_{i,n} = i/\sum_{j=1}^{n} j$ used in Figure 2.
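
For the Figure 2 weights both conditions are easy to check numerically, since $\sum_{i=1}^{n}\omega_{i,n}^2 = \frac{2(2n+1)}{3n(n+1)} \approx \frac{4}{3n} \to 0$; a minimal sketch:

```python
import numpy as np

for n in (10, 100, 1000, 10000):
    w = np.arange(1, n + 1) / np.arange(1, n + 1).sum()  # w_i = i / sum_j j
    print(n, w.sum(), (w ** 2).sum())   # sums to 1; sum of squares -> 0 like 4/(3n)
```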

Example: We simulate $N(75, 20)$ with sample sizes $n = 5, 15, 25, \dots, 10000$. In Figures 4 and 5 we plot $\bar X_n$ and $\hat\mu_n = \sum_{i=1}^{n}\omega_{i,n}X_i$ over sample size $n$. Notice the high volatility for small $n$.
Figure 4: $\bar X_n$ over sample size $n$ [plot omitted]

Figure 5: $\hat\mu_n$ over sample size $n$ [plot omitted]

Claim (Slutsky Theorem): Let $\hat\theta_n \in \mathbb{R}^k$. If $\hat\theta_n \xrightarrow{p} \theta$ and $g : \mathbb{R}^k \to \mathbb{R}^l$ is continuous (except possibly with countably many discontinuity points) then $g(\hat\theta_n) \xrightarrow{p} g(\theta)$.

Corollary: Let $\hat\theta_{i,n} \xrightarrow{p} \theta_i$, $i = 1, 2$. Then $\hat\theta_{1,n} + \hat\theta_{2,n} \xrightarrow{p} \theta_1 + \theta_2$, $\hat\theta_{1,n}\hat\theta_{2,n} \xrightarrow{p} \theta_1\theta_2$, and if $\theta_2 \neq 0$ and $\liminf_{n\to\infty}|\hat\theta_{2,n}| > 0$ then $\hat\theta_{1,n}/\hat\theta_{2,n} \xrightarrow{p} \theta_1/\theta_2$.

Claim: If $X_i \sim iid(\mu, \sigma^2)$ and $E[X_i^4] < \infty$ then $s_n^2 \xrightarrow{p} \sigma^2$.

Proof: Note

$$\frac{n-1}{n}s_n^2 = \hat\sigma_n^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar X_n\right)^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \mu\right)^2 - \left(\bar X_n - \mu\right)^2$$

By the LLN $\bar X_n \xrightarrow{p} \mu$, therefore by the Slutsky Theorem $(\bar X_n - \mu)^2 \xrightarrow{p} 0$. By $E[X_i^4] < \infty$ it follows $(X_i - \mu)^2$ is iid with a finite variance, hence it satisfies the LLN: $\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2 \xrightarrow{p} E[(X_i - \mu)^2] = \sigma^2$. Hence $\hat\sigma_n^2 \xrightarrow{p} \sigma^2$, and since $n/(n-1) \to 1$ the Slutsky Theorem gives $s_n^2 = \frac{n}{n-1}\hat\sigma_n^2 \xrightarrow{p} \sigma^2$. QED.
Claim: If $X_i \sim iid(\mu_x, \sigma_x^2)$ and $Y_i \sim iid(\mu_y, \sigma_y^2)$ and $E[X_i^2Y_i^2] < \infty$ then the sample correlation $\hat\rho_n \xrightarrow{p} \rho$, the population correlation.

Example: We simulate $X_i \sim N(7, 400)$ and $\epsilon_i \sim N(0, 900)$, and construct $Y_i = 4.3 + 2X_i + \epsilon_i$. The true correlation is

$$\rho = \frac{E\left[X_iY_i\right] - E\left[X_i\right]E\left[Y_i\right]}{\sqrt{V\left[X_i\right]}\sqrt{V\left[Y_i\right]}} = \frac{4.3\,E\left[X_i\right] + 2E\left[X_i^2\right] - 7\left(4.3 + 2 \times 7\right)}{20\sqrt{4 \times 400 + 900}}$$

$$= \frac{4.3 \times 7 + 2\left(400 + 7^2\right) - 7\left(4.3 + 2 \times 7\right)}{20\sqrt{4 \times 400 + 900}} = .8$$

We estimate the correlation for samples with size $n = 5, 15, 25, \dots, 10000$. Figure 6 demonstrates consistency and therefore the Slutsky Theorem.

Figure 6: Correlation $\hat\rho_n$ over sample size $n$ [plot omitted]
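
A minimal sketch replicating the Figure 6 experiment (the design $Y_i = 4.3 + 2X_i + \epsilon_i$ is taken from the example above):

```python
import numpy as np

rng = np.random.default_rng(6)
for n in (5, 50, 500, 5000, 10000):
    x = rng.normal(7, 20, size=n)                 # X_i ~ N(7, 400)
    e = rng.normal(0, 30, size=n)                 # eps_i ~ N(0, 900)
    y = 4.3 + 2 * x + e
    print(n, np.corrcoef(x, y)[0, 1])             # approaches rho = .8
```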

Almost Sure Convergence: SLLN

Defn. We say $\hat\theta_n$ converges almost surely to $\theta$ if

$$P\left(\lim_{n\to\infty}\hat\theta_n = \theta\right) = 1$$

This is identical to

$$\lim_{n\to\infty} P\left(\sup_{m \geq n}\left|\hat\theta_m - \theta\right| > \varepsilon\right) = 0 \quad \forall \varepsilon > 0$$

We variously write

$$\hat\theta_n \xrightarrow{a.s.} \theta \quad \text{and} \quad \hat\theta_n \to \theta \ a.s.$$

and we say $\hat\theta_n$ is strongly consistent for $\theta$.

We have the following relationships.

Claim: $\hat\theta_n \xrightarrow{ms} \theta$ implies $\hat\theta_n \xrightarrow{p} \theta$; $\hat\theta_n \xrightarrow{a.s.} \theta$ implies $\hat\theta_n \xrightarrow{p} \theta$.

Proof: $P(|\hat\theta_n - \theta| > \varepsilon) \leq \varepsilon^{-2}E(\hat\theta_n - \theta)^2$ by Chebyshev's inequality. If $E(\hat\theta_n - \theta)^2 \to 0$ (i.e. $\hat\theta_n \xrightarrow{ms} \theta$) then $P(|\hat\theta_n - \theta| > \varepsilon) \to 0$ where $\varepsilon > 0$ is arbitrary. Therefore $\hat\theta_n \xrightarrow{p} \theta$.

Next, $P(|\hat\theta_n - \theta| > \varepsilon) \leq P(\sup_{m \geq n}|\hat\theta_m - \theta| > \varepsilon)$ since $\sup_{m \geq n}|\hat\theta_m - \theta| \geq |\hat\theta_n - \theta|$. Therefore if $P(\sup_{m \geq n}|\hat\theta_m - \theta| > \varepsilon) \to 0$ $\forall \varepsilon > 0$ (i.e. $\hat\theta_n \xrightarrow{a.s.} \theta$) then $P(|\hat\theta_n - \theta| > \varepsilon) \to 0$ $\forall \varepsilon > 0$ (i.e. $\hat\theta_n \xrightarrow{p} \theta$). QED.

If $\hat\theta_n$ is bounded wp1 then $\hat\theta_n \xrightarrow{p} \theta$ if and only if $E[\hat\theta_n] \to \theta$, which is asymptotic unbiasedness (see Bierens). By the Slutsky Theorem $\hat\theta_n \xrightarrow{p} \theta$ implies $(\hat\theta_n - \theta)^2 \xrightarrow{p} 0$, hence $E[(\hat\theta_n - \theta)^2] \to 0$: convergence in probability implies convergence in mean-square. This proves the following (and gives almost sure convergence as the "strongest" form: the one that implies all the rest).

Claim (a.s. =⇒ i.p. =⇒ m.s.): Let $\hat\theta_n$ be bounded wp1: $P(|\hat\theta_n| \leq K) = 1$ for finite $K > 0$. Then $\hat\theta_n \xrightarrow{a.s.} \theta$ implies $\hat\theta_n \xrightarrow{p} \theta$, which implies asymptotic unbiasedness and $\hat\theta_n \xrightarrow{ms} \theta$.

Claim (Strong Law of Large Numbers = SLLN): If $X_i \sim iid(\mu, \sigma^2)$ then $\bar X_n \xrightarrow{a.s.} \mu$.

Remark: The Slutsky Theorem carries over to strong convergence.

Example:

Let $X_i \sim iid(\mu, \sigma^2)$ and define

$$\hat\theta_n := \frac{1}{1 + \left|\bar X_n\right|}$$

Then $P(|\hat\theta_n| \leq 1) = 1$. Moreover, under the iid assumption $\bar X_n \xrightarrow{a.s.} \mu$ by the SLLN, hence by the Slutsky Theorem

$$\hat\theta_n \xrightarrow{a.s.} \frac{1}{1 + \left|\mu\right|}$$

Therefore

$$\hat\theta_n \xrightarrow{p} \frac{1}{1 + \left|\mu\right|}$$

and $E[\hat\theta_n] \to \theta = 1/(1 + |\mu|)$ and

$$E\left[\hat\theta_n - \theta\right]^2 \to 0$$
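
A minimal sketch of this example along one simulated realization, assuming the transform $\hat\theta_n = 1/(1 + |\bar X_n|)$ as written above and an arbitrary $\mu = 3$:

```python
import numpy as np

rng = np.random.default_rng(7)
mu = 3.0
x = rng.normal(mu, 2, size=100_000)
xbar_path = np.cumsum(x) / np.arange(1, x.size + 1)   # X_bar_n along one realization
theta_path = 1 / (1 + np.abs(xbar_path))              # theta_hat_n along the path
print(theta_path[-1], 1 / (1 + abs(mu)))              # path settles at 1/(1+|mu|) = 0.25
```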

Convergence in Distribution: CLT

Defn. We say $\hat\theta_n$ converges in distribution to a distribution $F$, or to a random variable $Z$ with distribution $F$, if

$$\lim_{n\to\infty} P\left(\hat\theta_n \leq c\right) = F(c) \quad \text{for every } c \text{ on the support of } F.$$

Thus, while $\hat\theta_n$ may itself not be distributed $F$, asymptotically it is. We write

$$\hat\theta_n \xrightarrow{d} F \quad \text{or} \quad \hat\theta_n \xrightarrow{d} Z \text{ where } Z \sim F.$$

The notation $\hat\theta_n \xrightarrow{d} Z$ is a bit awkward, because $F$ characterizes infinitely many random variables. We are therefore saying there is some random draw $Z$ from $F$ that $\hat\theta_n$ is becoming. Which random draw is not specified.

Central Limit Theorem

By far the most famous result concerns the sample mean $\bar X_n$. Convergence of some estimator $\hat\theta_n$ in a monumentally large number of cases reduces to convergence of a sample mean of something, call it $Z_i$. This carries over to the sample correlation, regression model estimation methods like Ordinary Least Squares, GMM, and Maximum Likelihood, as well as non-parametric estimation, and on and on.

As usual, we limit ourselves to the iid case. The following substantially carries over to non-iid data, and based on a rarely cited obscure fact does not even require a finite variance (I challenge you to find a proof of this, or to ever discover any econometrics textbook that accurately states this).
Claim (Central Limit Theorem = CLT): If $X_i \sim iid(\mu, \sigma^2)$ then

$$Z_n := \sqrt{n}\,\frac{\bar X_n - \mu}{\sigma} \xrightarrow{d} N(0, 1)$$

Remark 1: This is famously cited as the Lindeberg-Lévy CLT. Historically, however, the proof arose in different camps sometime between 1910-1930 (covering Lindeberg, Lévy, Chebyshev, Markov and Lyapunov).

Remark 2: Notice by construction $Z_n := \sqrt{n}(\bar X_n - \mu)/\sigma$ is a standardized sample mean because $E[\bar X_n] = \mu$ by identical distributedness and $V[\bar X_n] = \sigma^2/n$ by independence and identical distributedness. Thus

$$Z_n := \sqrt{n}\,\frac{\bar X_n - \mu}{\sigma} = \frac{\bar X_n - E\left[\bar X_n\right]}{\sqrt{V\left[\bar X_n\right]}}$$

Therefore $Z_n$ has mean 0 and variance 1:

$$E\left[Z_n\right] = \sqrt{n}\,\frac{E\left[\bar X_n\right] - \mu}{\sigma} = 0 \qquad E\left[Z_n^2\right] = n\,\frac{E\left(\bar X_n - \mu\right)^2}{\sigma^2} = n\,\frac{\sigma^2/n}{\sigma^2} = 1$$

Thus, even as $n \to \infty$ the random variable $Z_n \sim (0, 1)$. Although this is a long way from proving $Z_n$ has a definable distribution, even in the limit, it does help to point out that the term $\sqrt{n} \to \infty$ is necessary to stabilize $Z_n$, for otherwise we simply have $\bar X_n - \mu \xrightarrow{p} 0$.

Remark 3: Asymptotically $Z_n := \sqrt{n}(\bar X_n - \mu)/\sigma$ has the standard normal density $(2\pi)^{-1/2}\exp\{-z^2/2\}$.
Proof: Define $Z_i := (X_i - \mu)/\sigma$, hence

$$Z_n = \sqrt{n}\,\frac{\bar X_n - \mu}{\sigma} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} Z_i$$

We will show the characteristic function $E[e^{itZ_n}] \to e^{-t^2/2}$. The latter is the characteristic function of a standard normal, while characteristic functions and distributions have a unique correspondence: only standard normals have a characteristic function like $e^{-t^2/2}$.

By independence and identical distributedness

$$E\left[e^{itZ_n}\right] = E\left[\prod_{i=1}^{n} e^{itZ_i n^{-1/2}}\right] = \prod_{i=1}^{n} E\left[e^{itZ_i n^{-1/2}}\right] = \left(E\left[e^{itZ_1 n^{-1/2}}\right]\right)^n \tag{2}$$

Now expand $e^{itZ_1 n^{-1/2}}$ around $itZ_1 n^{-1/2} = 0$ by a second order Taylor expansion:

$$e^{itZ_1 n^{-1/2}} = 1 + \frac{itZ_1 n^{-1/2}}{1!} + \frac{\left(itZ_1 n^{-1/2}\right)^2}{2!} + r_n = 1 + \frac{itZ_1 n^{-1/2}}{1!} - \frac{1}{2!}\frac{t^2 Z_1^2}{n} + r_n$$

where $r_n$ is a remainder term that is a function of $itZ_1 n^{-1/2}$. Now take the expectations as in (2), and note $E[Z_1] = E[X_1 - \mu]/\sigma = 0$ and $E[Z_1^2] = E[(X_1 - \mu)^2]/\sigma^2 = \sigma^2/\sigma^2 = 1$:

$$E\left[e^{itZ_1 n^{-1/2}}\right] = 1 + \frac{it\,E\left[Z_1\right]n^{-1/2}}{1!} - \frac{1}{2!}\frac{t^2 E\left[Z_1^2\right]}{n} + E\left[r_n\right] = 1 - \frac{1}{2}\frac{t^2}{n} + \delta_n \quad \text{where } \delta_n := E\left[r_n\right]$$

Note $e^{itZ_1 n^{-1/2}}$ is a bounded random variable, in particular $|e^{itZ_1 n^{-1/2}}| \leq 1$ wp1 (see Bierens), so even if $Z_1$ does not have higher moments we know $|\delta_n| < \infty$. Further $\delta_n \to 0$ because $E[e^{itZ_1 n^{-1/2}}] \to 1$.

Now take the $n$-power in (2): by the Binomial expansion

$$\left(E\left[e^{itZ_1 n^{-1/2}}\right]\right)^n = \left(1 - \frac{1}{2}\frac{t^2}{n} + \delta_n\right)^n = \left(1 - \frac{1}{2}\frac{t^2}{n}\right)^n + \sum_{j=1}^{n}\binom{n}{j}\left(1 - \frac{1}{2}\frac{t^2}{n}\right)^{n-j}\delta_n^j$$

The first term satisfies

$$\left(1 - \frac{1}{2}\frac{t^2}{n}\right)^n \to e^{-t^2/2}$$

because the sequence $\{(1 + a/n)^n\}_{n \geq 1}$ converges: $(1 + a/n)^n \to e^a$ (simply put $a = -t^2/2$). For the second term notice for large enough $n$ we have $|1 - \frac{1}{2}t^2/n| \leq 1$, hence

$$\left|\sum_{j=1}^{n}\binom{n}{j}\left(1 - \frac{1}{2}\frac{t^2}{n}\right)^{n-j}\delta_n^j\right| \leq \sum_{j=1}^{n}\binom{n}{j}\left|\delta_n\right|^j = \left(1 + \left|\delta_n\right|\right)^n - 1$$

See Bierens for details that verify $(1 + |\delta_n|)^n - 1 \to 0$. QED.

Example (Bernoulli): The most striking way to demonstrate the CLT is to begin with the least normal of data, a Bernoulli random variable which is discrete and takes only two finite values, and show $\sqrt{n}(\bar X_n - \mu)/\sigma \xrightarrow{d} N(0, 1)$, a continuous random variable with infinite support.

We simulate $X_i \sim \text{Bernoulli}(.2)$ for $n = 5, 50, 500, 10000$ and compute

$$Z_n := \sqrt{n}\,\frac{\bar X_n - \mu}{\sigma} = \sqrt{n}\,\frac{\bar X_n - .2}{\sqrt{.2 \times .8}} = \sqrt{n}\,\frac{\bar X_n - .2}{.4}$$

In order to show the small sample distribution of $Z_n$ we need a sample of $Z_n$'s, so we repeat the simulation 1000 times. We plot the relative frequencies of the sample of $Z_n$'s for each $n$. Let $\{Z_{n,j}\}_{j=1}^{1000}$ be the simulated sample of $Z_n$'s. The relative frequencies are the percentage $\frac{1}{1000}\sum_{j=1}^{1000} I(c_k \leq Z_{n,j} < c_{k+1})$ for interval endpoints $c_k = [-5, -4.9, -4.8, \dots, 4.9, 5.0]$. See Figure 7. For the sake of comparison in Figure 8 we plot the relative frequencies for one sample of 1000 iid standard normal random variables $N(0, 1)$.

Another way to see how $Z_n$ becomes a standard normal random variable is to compute the quantile $q_n$ such that $P(Z_n \leq q_n) = .975$. A standard normal satisfies $P(Z \leq 1.96) = .975$. We call $q_n$ an empirical quantile since it is based on a simulated set of samples. We simulate 10,000 samples for each size $n = 5, 105, 205, \dots, 5005$ and compute $q_n$. See Figure 9. As $n$ increases $q_n \to 1.96$.
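
A minimal sketch of the quantile experiment, using the fact that $n\bar X_n$ is Binomial$(n, .2)$:

```python
import numpy as np

rng = np.random.default_rng(8)
mu, sig = 0.2, np.sqrt(0.2 * 0.8)
for n in (5, 105, 505, 1005, 5005):
    xbar = rng.binomial(n, 0.2, size=10_000) / n      # 10,000 sample means X_bar_n
    z = np.sqrt(n) * (xbar - mu) / sig                # standardized means Z_n
    print(n, np.quantile(z, 0.975))                   # empirical q_n -> 1.96
```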
Figure 7: Standardized Means for Bernoulli; panels show 1000 $Z_n$'s for $n$ = 5, 50, 500, 5000 [plots omitted]

Figure 8: Standard Normal [plot omitted]
Figure 9: Empirical Quantiles $q_n$ over sample size $n$ [plot omitted]
