Introduction
Let $X_1,\ldots,X_n$ be an iid sample with mean $\mu := E[X]$ and variance $\sigma^2 := E[(X-\mu)^2]$.

Sample Mean:
$$\bar X := \frac{1}{n}\sum_{i=1}^{n} X_i$$

Sample Variance #1:
$$s^2 := \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar X\right)^2$$

Sample Variance #2:
$$\hat\sigma^2 := \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar X\right)^2$$

Sample Correlation:
$$\hat\rho := \frac{\hat\sigma_{XY}}{\hat\sigma_X\,\hat\sigma_Y}, \qquad \hat\sigma_{XY} := \frac{1}{n}\sum_{i=1}^{n}\left(X_i-\bar X\right)\left(Y_i-\bar Y\right)$$
Similarly, we may estimate a probability by using a sample relative frequency:
$$\hat P(A) := \frac{1}{n}\sum_{i=1}^{n} I\left(X_i \in A\right), \quad \text{the sample percentage of } X_i \in A.$$
Notice $\hat P(A)$ estimates $P(X \in A)$.
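The estimators above are straightforward to compute. The following is a minimal sketch in plain Python (function names are our own choices, not from the notes):

```python
# Minimal sketch of the estimators above; names are our own choices.
def sample_mean(x):
    return sum(x) / len(x)                         # (1/n) * sum of x_i

def sample_var_unbiased(x):                        # divisor n - 1
    m = sample_mean(x)
    return sum((xi - m) ** 2 for xi in x) / (len(x) - 1)

def sample_var_biased(x):                          # divisor n
    m = sample_mean(x)
    return sum((xi - m) ** 2 for xi in x) / len(x)

def sample_corr(x, y):
    mx, my = sample_mean(x), sample_mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return sxy / (sample_var_biased(x) * sample_var_biased(y)) ** 0.5

def rel_freq(x, in_A):                             # P-hat(A): share of x_i in A
    return sum(1 for xi in x if in_A(xi)) / len(x)

x = [1.0, 2.0, 3.0, 4.0]
print(sample_mean(x))                              # 2.5
print(sample_var_unbiased(x))                      # 5/3
print(rel_freq(x, lambda v: v > 2.5))              # 0.5
```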
We will look at estimator properties: what $\hat\theta$ is on average for any sample size $n$, and what $\hat\theta$ becomes as the sample size $n$ grows. In every case above the estimator is, or is built from, a sample mean of some transformation of the data.
Unbiasedness

Defn. An estimator $\hat\theta$ of $\theta$ is unbiased if $E[\hat\theta] = \theta$.

Claim (Weighted Average): Let $\tilde X := \sum_{i=1}^{n}\omega_i X_i$ where the weights satisfy $\sum_{i=1}^{n}\omega_i = 1$. Then $E[\tilde X] = \mu$.

Proof:
$$E\left[\sum_{i=1}^{n}\omega_i X_i\right] = \sum_{i=1}^{n}\omega_i E[X_i] = \mu\sum_{i=1}^{n}\omega_i = \mu. \quad QED.$$
Corollary (Straight Average): The sample mean $\bar X := \frac{1}{n}\sum_{i=1}^{n}X_i$ is a weighted average with flat or uniform weights $\omega_i = 1/n$, hence trivially $\sum_{i=1}^{n}\omega_i = 1$, hence $E[\bar X] = \mu$.
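As a quick numerical illustration (our own sketch, not part of the notes): any fixed weights summing to one give a simulation average of the weighted estimator near $\mu$, here with deliberately non-uniform weights.

```python
# Sketch (ours): unbiasedness of a non-uniform weighted average of
# N(mu, sigma^2) draws, checked by simulation.
import random

random.seed(0)
n, mu, sigma = 5, 10.0, 2.0
weights = [0.4, 0.3, 0.1, 0.1, 0.1]        # sum to 1, deliberately non-uniform
assert abs(sum(weights) - 1.0) < 1e-12

reps = 20000
avg_est = 0.0
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    avg_est += sum(w * x for w, x in zip(weights, xs)) / reps

print(avg_est)   # close to mu = 10
```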
The problem then arises as to which weighted average $\sum_{i=1}^{n}\omega_i X_i$ may be preferred in practice, since any with unit summed weights is unbiased. We will discuss the concept of efficiency below, but the minimum mean-squared-error unbiased weighted average is the straight average: $\omega_i = 1/n$.
Proof:
We want to solve
$$\min_{\omega_1,\ldots,\omega_n} E\left(\sum_{i=1}^{n}\omega_i X_i - \mu\right)^2 \quad \text{subject to} \quad \sum_{i=1}^{n}\omega_i = 1.$$
The Lagrangian is
$$\mathcal{L}(\omega,\lambda) := E\left(\sum_{i=1}^{n}\omega_i X_i - \mu\right)^2 + \lambda\left(1 - \sum_{i=1}^{n}\omega_i\right),$$
where by independence $E(\sum_{i=1}^{n}\omega_i X_i - \mu)^2 = \sigma^2\sum_{i=1}^{n}\omega_i^2$, hence
$$\mathcal{L}(\omega,\lambda) = \sigma^2\sum_{i=1}^{n}\omega_i^2 + \lambda\left(1 - \sum_{i=1}^{n}\omega_i\right).$$
The first-order conditions are
$$\frac{\partial}{\partial\omega_i}\mathcal{L}(\omega,\lambda) = 2\sigma^2\omega_i - \lambda = 0 \quad \text{and} \quad \frac{\partial}{\partial\lambda}\mathcal{L}(\omega,\lambda) = 1 - \sum_{i=1}^{n}\omega_i = 0.$$
Therefore $\omega_i = \lambda/(2\sigma^2)$ is a constant that sums to $\sum_{i=1}^{n}\omega_i = 1$. Write $\omega_i = \lambda/(2\sigma^2) =: c$; then $nc = 1$, hence $\omega_i = 1/n$. QED.
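A numeric sanity check of this result (ours, not from the notes): since $Var(\sum\omega_i X_i) = \sigma^2\sum\omega_i^2$ under independence, flat weights minimize the variance among unit-sum weights.

```python
# Sketch (ours): Var = sigma^2 * sum(w_i^2) is smallest at flat weights 1/n
# among weight vectors summing to 1.
n = 4
sigma2 = 1.0

def var_of(weights):
    return sigma2 * sum(w * w for w in weights)

flat = [1.0 / n] * n
others = [[0.7, 0.1, 0.1, 0.1], [0.5, 0.5, 0.0, 0.0], [0.25, 0.35, 0.15, 0.25]]
for w in others:
    assert abs(sum(w) - 1.0) < 1e-12
    assert var_of(w) >= var_of(flat)
print(var_of(flat))   # 0.25 = sigma^2 / n
```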
Remark:
As in many cases here and below, independence can be replaced by uncorrelatedness since the same proof applies: $E[X_iX_j] = E[X_i]E[X_j]$ for all $i \neq j$. We can also replace uncorrelatedness with a condition that restricts the total correlation across all $i$ and $j$ for $i \neq j$, but such generality is typically only exploited in time series settings (where $X_i$ is observed at a different time period $i$).
Notice
$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i-\bar X\right)^2 = \frac{1}{n}\sum_{i=1}^{n}\left[(X_i-\mu) - (\bar X-\mu)\right]^2$$
$$= \frac{1}{n}\sum_{i=1}^{n}\left(X_i-\mu\right)^2 - 2\left(\bar X-\mu\right)\frac{1}{n}\sum_{i=1}^{n}\left(X_i-\mu\right) + \left(\bar X-\mu\right)^2$$
$$= \frac{1}{n}\sum_{i=1}^{n}\left(X_i-\mu\right)^2 - 2\left(\bar X-\mu\right)^2 + \left(\bar X-\mu\right)^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i-\mu\right)^2 - \left(\bar X-\mu\right)^2,$$
using $\frac{1}{n}\sum_{i=1}^{n}(X_i-\mu) = \bar X - \mu$.
By the iid assumption and the fact that $\bar X$ is unbiased,
$$E\left(\bar X - \mu\right)^2 = Var\left(\bar X\right) = \frac{1}{n^2}\sum_{i=1}^{n}Var(X_i) = \frac{1}{n^2}\,n\sigma^2 = \frac{\sigma^2}{n}$$
and
$$E\left[\frac{1}{n}\sum_{i=1}^{n}\left(X_i-\mu\right)^2\right] = \frac{1}{n}\sum_{i=1}^{n}E\left(X_i-\mu\right)^2 = \frac{1}{n}\sum_{i=1}^{n}\sigma^2 = \sigma^2.$$
Therefore
$$E\left[\hat\sigma^2\right] = \sigma^2 - \frac{\sigma^2}{n} = \frac{n-1}{n}\sigma^2.$$
This implies each claim: $E[s^2] = \frac{n}{n-1}E[\hat\sigma^2] = \sigma^2$ ($s^2$ is unbiased), $E[\hat\sigma^2] = \sigma^2(n-1)/n < \sigma^2$ ($\hat\sigma^2$ is negatively biased), and $E[\hat\sigma^2] = \frac{n-1}{n}\sigma^2 \to \sigma^2$ ($\hat\sigma^2$ is asymptotically unbiased). QED.
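The bias factor $(n-1)/n$ is easy to confirm by simulation; here is a minimal sketch (ours, not from the notes) with $n = 10$ and $\sigma^2 = 1$, so the simulation average of $\hat\sigma^2$ should be near $0.9$.

```python
# Sketch (ours): the divisor-n variance estimator averages near
# (n-1)/n * sigma^2 across many simulated samples.
import random

random.seed(1)
n, mu, sigma2 = 10, 0.0, 1.0
reps = 40000
acc = 0.0
for _ in range(reps):
    xs = [random.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
    xbar = sum(xs) / n
    acc += sum((x - xbar) ** 2 for x in xs) / n
mean_sig2hat = acc / reps
print(mean_sig2hat)   # near (n-1)/n * sigma2 = 0.9
```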
Example:
We simulate 100 samples of $N(75, 4)$ with sample size $n = 20$, and compute the straight average $\bar X$ and an unbiased non-uniform weighted average for each sample. The simulation average of the estimates is 74.983941, and the reported simulation dispersion measure is .30940776. In Figures 1 and 2 we plot the estimates across samples. Both estimators display the same property of unbiasedness, but $\bar X$ exhibits less dispersion across samples.

Figure 1 and Figure 2: estimates across the 100 simulated samples (plots omitted).
Defn.
$$MSE(\hat\theta) := E\left[\hat\theta - \theta\right]^2.$$
If $MSE(\hat\theta) \to 0$ we also write $\hat\theta \xrightarrow{ms} \theta$ and say $\hat\theta \to \theta$ in mean-square.

Expand the MSE around $E[\hat\theta]$:
$$MSE(\hat\theta) = E\left[\hat\theta - \theta\right]^2 = E\left[\hat\theta - E[\hat\theta] + E[\hat\theta] - \theta\right]^2$$
$$= E\left[\hat\theta - E[\hat\theta]\right]^2 + \left(E[\hat\theta] - \theta\right)^2 + 2\left(E[\hat\theta] - \theta\right)E\left[\hat\theta - E[\hat\theta]\right]$$
$$= E\left[\hat\theta - E[\hat\theta]\right]^2 + \left(E[\hat\theta] - \theta\right)^2,$$
since $E[\hat\theta] - \theta$ is just a constant and $E[\hat\theta - E[\hat\theta]] = E[\hat\theta] - E[\hat\theta] = 0$. Hence the MSE is the variance plus bias squared:
$$MSE(\hat\theta) = Var\left(\hat\theta\right) + B\left(\hat\theta\right)^2, \qquad B\left(\hat\theta\right) := E[\hat\theta] - \theta.$$
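The variance-plus-bias-squared identity can be verified directly on any finite collection of estimates; the numbers below are hypothetical, chosen only to illustrate the decomposition.

```python
# Direct numeric check (ours) of MSE = variance + bias^2 on a small
# hypothetical sample of estimates of a known theta.
theta = 5.0
estimates = [4.8, 5.3, 5.1, 4.9, 5.4]     # hypothetical draws of theta-hat

n = len(estimates)
mean_est = sum(estimates) / n
mse = sum((e - theta) ** 2 for e in estimates) / n
var = sum((e - mean_est) ** 2 for e in estimates) / n
bias = mean_est - theta
print(mse, var, bias ** 2)                # mse equals var + bias^2
```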
If $\hat\theta \in \mathbb{R}^k$ then we write
$$MSE(\hat\theta) := E\left\|\hat\theta - \theta\right\|^2 \to 0, \qquad \text{where } \|a\| := \left(\sum_{i=1}^{k}a_i^2\right)^{1/2}.$$
This implies convergence with respect to each element: $E(\hat\theta_i - \theta_i)^2 \to 0$.

Defn.
We say $\hat\theta \in \mathbb{R}^k$ has the property of $L_p$-convergence, or convergence in $p$-norm, to $\theta$ if for $p > 0$
$$E\left\|\hat\theta - \theta\right\|^p \to 0.$$
Clearly $L_2$-convergence and mean-square convergence are equivalent.
Claim (Sample Mean): $\bar X \to \mu$ in mean square.
Proof:
Let $X_i \sim iid(\mu, \sigma^2)$. Then $E(\bar X - \mu)^2 = Var(\bar X) = \sigma^2/n \to 0$. QED.

We only require uncorrelatedness since $Var(\bar X) = \sigma^2/n$ still holds.

Claim (Sample Mean): Let $X_i \sim (\mu, \sigma^2)$ be uncorrelated. Then $\bar X \to \mu$ in mean square.
Proof:
$E(\bar X - \mu)^2 = Var(\bar X) = \sigma^2/n \to 0$. QED.

In fact, we only need all cross covariances to not be too large as the sample size grows.

Claim (Sample Mean): Let $X_i \sim (\mu, \sigma^2)$ satisfy $\frac{1}{n^2}\sum_{i \neq j} Cov(X_i, X_j) \to 0$. Then $\bar X \to \mu$ in mean square.
Proof:
$$E\left(\bar X - \mu\right)^2 = Var\left(\bar X\right) = \frac{\sigma^2}{n} + \frac{2}{n^2}\sum_{i<j}Cov(X_i, X_j) \to 0. \quad QED.$$
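Mean-square convergence of the sample mean is visible in simulation, since $E(\bar X - \mu)^2 = \sigma^2/n$; this sketch (ours) compares the simulated MSE at two sample sizes.

```python
# Sketch (ours): the simulated E(xbar - mu)^2 tracks sigma^2/n,
# shrinking by a factor of 10 from n = 10 to n = 100.
import random

random.seed(2)
mu, sigma2 = 3.0, 4.0

def mse_of_mean(n, reps=5000):
    acc = 0.0
    for _ in range(reps):
        xbar = sum(random.gauss(mu, sigma2 ** 0.5) for _ in range(n)) / n
        acc += (xbar - mu) ** 2
    return acc / reps

m10, m100 = mse_of_mean(10), mse_of_mean(100)
print(m10, m100)   # near sigma2/10 = 0.4 and sigma2/100 = 0.04
```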
Remark:
In micro-economic contexts involving cross-sectional data this type of correlatedness is evidently rarely or never entertained. Typically we assume the $X_i$'s are uncorrelated. It is, however, profoundly popular in macroeconomic and other time series contexts.
Claim (Sample Mean): Let $X_i \sim (\mu, \sigma^2)$ be iid. Then $\bar X \to \mu$ in $L_p$-norm for $0 < p \le 2$.
Proof:
By Lyapunov's inequality, which applies since $p/2 \le 1$,
$$E\left|\bar X - \mu\right|^p \le \left(E\left(\bar X - \mu\right)^2\right)^{p/2} = \left(\frac{\sigma^2}{n}\right)^{p/2} \to 0. \quad QED.$$
Example:
We simulate $N(7, 400)$ with sample sizes $n = 5, 15, 25, \ldots, 1000$. In Figure 3 we plot the estimates against the true values $\mu = 7$ and $\sigma^2 = 400$ over sample size $n$. Notice the high volatility for small $n$.

Figure 3: estimates over sample size $n$ (plot omitted).
Defn.
We say $\hat\theta$ converges in probability to $\theta$ if
$$\lim_{n\to\infty} P\left(\left|\hat\theta - \theta\right| > \epsilon\right) = 0 \quad \forall\, \epsilon > 0.$$
We variously write
$$\hat\theta \xrightarrow{p} \theta \quad \text{and} \quad \hat\theta - \theta = o_p(1).$$

Claim (Sample Mean): If $X_i \sim (\mu, \sigma^2)$ then $\bar X \xrightarrow{p} \mu$.
Proof:
By Chebyshev's inequality, for any $\epsilon > 0$,
$$P\left(\left|\bar X - \mu\right| > \epsilon\right) \le \frac{E\left(\bar X - \mu\right)^2}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2} \to 0. \quad QED.$$
Remark 1:
We call this a Weak Law of Large Numbers [WLLN] since convergence is in probability. A Strong LLN based on a stronger form of convergence is given below.

Remark 2:
We only need uncorrelatedness to get $E(\bar X - \mu)^2 = \sigma^2/n \to 0$. The WLLN, however, extends to many forms of dependent random variables.

Remark 3:
In the iid case we only need $E|X| < \infty$, although the proof is substantially more complicated. Even for non-iid data we typically only need $E|X|^{1+\delta} < \infty$ for infinitesimal $\delta > 0$ (pay close attention to scholarly articles you read, and to your own assumptions: usually far stronger assumptions are imposed than are actually required).
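The WLLN and the Chebyshev bound used in its proof can both be seen in simulation; this sketch (ours) compares simulated tail probabilities with the bound $\sigma^2/(n\epsilon^2)$.

```python
# Sketch (ours): simulated tail probabilities P(|xbar - mu| > eps) fall
# with n and respect the Chebyshev bound sigma^2 / (n * eps^2).
import random

random.seed(3)
mu, sigma2, eps = 0.0, 1.0, 0.5
reps = 4000

def tail_prob(n):
    hits = 0
    for _ in range(reps):
        xbar = sum(random.gauss(mu, sigma2 ** 0.5) for _ in range(n)) / n
        hits += abs(xbar - mu) > eps
    return hits / reps

probs = {n: tail_prob(n) for n in (4, 16, 64)}
for n, p in probs.items():
    print(n, p, sigma2 / (n * eps * eps))   # probability vs. Chebyshev bound
```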
The weighted average $\sum_{i=1}^{n}\omega_{n,i}X_i$ is also consistent as long as the weights decay with the sample size. Thus we write the weight as $\omega_{n,i}$.

Claim: If $X_i \sim iid(\mu, \sigma^2)$ then $\sum_{i=1}^{n}\omega_{n,i}X_i \xrightarrow{p} \mu$ if $\sum_{i=1}^{n}\omega_{n,i} = 1$ and $\sum_{i=1}^{n}\omega_{n,i}^2 \to 0$.

Proof:
By Chebyshev's inequality, independence and $\sum_{i=1}^{n}\omega_{n,i} = 1$, for any $\epsilon > 0$
$$P\left(\left|\sum_{i=1}^{n}\omega_{n,i}X_i - \mu\right| > \epsilon\right) \le \frac{1}{\epsilon^2}E\left(\sum_{i=1}^{n}\omega_{n,i}X_i - \mu\right)^2 = \frac{1}{\epsilon^2}E\left(\sum_{i=1}^{n}\omega_{n,i}\left(X_i - \mu\right)\right)^2$$
$$= \frac{1}{\epsilon^2}\sum_{i=1}^{n}\omega_{n,i}^2\,E\left(X_i - \mu\right)^2 = \frac{\sigma^2}{\epsilon^2}\sum_{i=1}^{n}\omega_{n,i}^2 \to 0. \quad QED.$$
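A concrete family of decaying weights (our own example) satisfying both conditions is the triangular scheme $\omega_{n,i} = 2i/(n(n+1))$: the weights sum to one while the sum of squares vanishes like $4/(3n)$.

```python
# Sketch (ours): triangular weights w_{n,i} = 2i / (n(n+1)) satisfy both
# consistency conditions: unit sum, and sum of squares -> 0.
def weights(n):
    return [2.0 * i / (n * (n + 1)) for i in range(1, n + 1)]

sum_sq = {}
for n in (10, 100, 1000):
    w = weights(n)
    assert abs(sum(w) - 1.0) < 1e-9     # unit-sum condition
    sum_sq[n] = sum(wi * wi for wi in w)
    print(n, sum_sq[n])                 # roughly 4 / (3n)
```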
Example:
We simulate $N(75, 20)$ with sample sizes $n = 5, 15, 25, \ldots, 10000$.

Figure 4 and Figure 5: estimates over sample sizes $n = 5, \ldots, 10000$ (plots omitted; the vertical axes run from 70 to 79).
Corollary:
Let $\hat\theta_i \xrightarrow{p} \theta_i$, $i = 1, 2$. Then $\hat\theta_1 \pm \hat\theta_2 \xrightarrow{p} \theta_1 \pm \theta_2$, $\hat\theta_1\hat\theta_2 \xrightarrow{p} \theta_1\theta_2$, and $\hat\theta_1/\hat\theta_2 \xrightarrow{p} \theta_1/\theta_2$ provided $\theta_2 \neq 0$. This is a version of the Slutsky Theorem used below.
Claim: If $X_i \sim iid(\mu, \sigma^2)$ and $E[X^4] < \infty$ then $\hat\sigma^2 \xrightarrow{p} \sigma^2$.
Proof:
Note
$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar X\right)^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \mu\right)^2 - \left(\bar X - \mu\right)^2.$$
By the LLN $\bar X \xrightarrow{p} \mu$, therefore by the Slutsky Theorem $(\bar X - \mu)^2 \xrightarrow{p} 0$. By $E[X^4] < \infty$ it follows $(X_i - \mu)^2$ is iid with a finite variance, hence it satisfies the LLN:
$$\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \mu\right)^2 \xrightarrow{p} E\left[\left(X - \mu\right)^2\right] = \sigma^2. \quad QED.$$
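Consistency of the variance estimator is easy to see numerically; this sketch (ours) uses the same $N(7, 400)$ design as the examples in these notes.

```python
# Sketch (ours): the divisor-n variance estimator settles near the true
# sigma^2 = 400 as the sample size grows.
import random

random.seed(4)
mu, sigma2 = 7.0, 400.0

def sig2hat(n):
    xs = [random.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / n

small, big = sig2hat(20), sig2hat(200000)
print(small, big)   # the n = 200000 estimate is close to 400
```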
Claim: The sample correlation is consistent: $\hat\rho \xrightarrow{p} \rho$, the population correlation.
Example:
We simulate $X \sim N(7, 400)$ and $\epsilon \sim N(0, 900)$ and construct $Y = 4.3 + 2X + \epsilon$. The true correlation is
$$\rho = \frac{E[XY] - E[X]E[Y]}{\sqrt{Var(X)}\sqrt{Var(Y)}} = \frac{4.3\,E[X] + 2\,E[X^2] - 7\left(4.3 + 2\cdot 7\right)}{20\sqrt{4\cdot 400 + 900}}$$
$$= \frac{4.3\cdot 7 + 2\cdot 400 + 2\cdot 7^2 - 7\left(4.3 + 2\cdot 7\right)}{20\sqrt{4\cdot 400 + 900}} = \frac{800}{20\cdot 50} = .8.$$
We estimate the correlation for samples with size $n = 5, 15, 25, \ldots, 10000$. Figure 6 demonstrates consistency and therefore the Slutsky Theorem.

Figure 6: sample correlation over sample size $n$ (plot omitted; $\hat\rho$ settles near .8 on a vertical axis running from 0.50 to 1.00).
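The computation above is easy to replicate; this sketch (ours) simulates one large sample from the same design and checks that $\hat\rho$ lands near $.8$.

```python
# Sketch (ours): X ~ N(7, 400), e ~ N(0, 900), Y = 4.3 + 2X + e;
# the sample correlation should be close to the population value .8.
import random

random.seed(5)
n = 100000
xs = [random.gauss(7.0, 20.0) for _ in range(n)]
ys = [4.3 + 2.0 * x + random.gauss(0.0, 30.0) for x in xs]

mx, my = sum(xs) / n, sum(ys) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
sx2 = sum((x - mx) ** 2 for x in xs) / n
sy2 = sum((y - my) ** 2 for y in ys) / n
rho_hat = sxy / (sx2 * sy2) ** 0.5
print(rho_hat)   # near .8
```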
Defn.
We say $\hat\theta$ converges almost surely (with probability one) to $\theta$ if
$$P\left(\lim_{n\to\infty}\hat\theta_n = \theta\right) = 1.$$
This is identical to
$$\lim_{n\to\infty} P\left(\max_{m \ge n}\left|\hat\theta_m - \theta\right| > \epsilon\right) = 0 \quad \forall\, \epsilon > 0.$$
We variously write
$$\hat\theta \xrightarrow{a.s.} \theta \quad \text{and} \quad \hat\theta \to \theta \ wp1.$$

Claim: $\hat\theta \xrightarrow{a.s.} \theta$ implies $\hat\theta \xrightarrow{p} \theta$; similarly, $\hat\theta \to \theta$ in mean square implies $\hat\theta \xrightarrow{p} \theta$.
Proof:
If $\hat\theta \xrightarrow{a.s.} \theta$ then $P(\max_{m \ge n}|\hat\theta_m - \theta| > \epsilon) \to 0$, hence $P(|\hat\theta_n - \theta| > \epsilon) \to 0$ where $\epsilon > 0$ is arbitrary; therefore $\hat\theta \xrightarrow{p} \theta$. If $E(\hat\theta - \theta)^2 \to 0$ then by Chebyshev's inequality $P(|\hat\theta - \theta| > \epsilon) \to 0 \ \forall\, \epsilon > 0$ (i.e. $\hat\theta \xrightarrow{p} \theta$). QED.

Claim (SLLN): If $X_i \sim iid(\mu, \sigma^2)$ then $\bar X \xrightarrow{a.s.} \mu$.
Remark:
The converse implications fail in general: convergence in probability implies neither almost sure nor mean-square convergence.

Example:
Let $X_i \sim iid(\mu, \sigma^2)$ with $X_i \ge 0$, and define
$$\hat\theta := \frac{1}{1 + \bar X}.$$
Then $P(|\hat\theta| \le 1) = 1$. Moreover, under the iid assumption $\bar X \xrightarrow{a.s.} \mu$ by the SLLN, hence by the Slutsky Theorem
$$\hat\theta = \frac{1}{1 + \bar X} \xrightarrow{a.s.} \frac{1}{1 + \mu}.$$
Therefore
$$\hat\theta \xrightarrow{p} \frac{1}{1 + \mu},$$
and $E[\hat\theta] \to \theta = 1/(1 + \mu)$ and
$$E\left[\hat\theta - \theta\right]^2 \to 0.$$
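Assuming the example's statistic is $1/(1+\bar X)$, the Slutsky-type limit is easy to check in simulation; this sketch (ours) uses $\mu = 3$, so the target is $1/(1+3) = 0.25$.

```python
# Sketch (ours): by the LLN and Slutsky, 1/(1 + xbar) -> 1/(1 + mu);
# here mu = 3, so the limit is 0.25.
import random

random.seed(6)
mu, n = 3.0, 100000
xbar = sum(random.gauss(mu, 1.0) for _ in range(n)) / n
theta_hat = 1.0 / (1.0 + xbar)
print(theta_hat)   # near 0.25
```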
Defn.
We say $\hat\theta$ converges in distribution to a distribution $F$, or to a random variable $Z$ with distribution $F$, if $P(\hat\theta \le z) \to F(z)$ at every continuity point $z$ of $F$. We write
$$\hat\theta \xrightarrow{d} F \quad \text{or} \quad \hat\theta \xrightarrow{d} Z \ \text{where} \ Z \sim F.$$
6.1 Convergence of some estimators

By far the most famous result concerns the sample mean $\bar X$. Convergence of an estimator $\hat\theta$ in a monumentally large number of cases reduces to convergence of a sample mean of something, call it $Z_i$. This carries over to the sample correlation, regression model estimation methods like Ordinary Least Squares, GMM, and Maximum Likelihood, as well as non-parametric estimation, and on and on.

As usual, we limit ourselves to the iid case. The following substantially carries over to non-iid data, and based on a rarely cited obscure fact does not even require a finite variance (I challenge you to find a proof of this, or to ever discover any econometrics textbook that accurately states this).
Claim (Central Limit Theorem = CLT): If $X_i \sim iid(\mu, \sigma^2)$ then
$$Z_n := \frac{\sqrt{n}\left(\bar X - \mu\right)}{\sigma} \xrightarrow{d} N(0, 1).$$

Remark 1:
This is famously cited as the Lindeberg-Lévy CLT. Historically, however, the proof arose in different camps sometime between 1910 and 1930 (covering Lindeberg, Lévy, Chebyshev, Markov and Lyapunov).
Remark 2:
Notice by construction $Z_n := \sqrt{n}(\bar X - \mu)/\sigma$ is a standardized sample mean, because $E[\bar X] = \mu$ by identical distributedness and $Var(\bar X) = \sigma^2/n$ by independence and identical distributedness. Thus
$$Z_n := \frac{\bar X - E[\bar X]}{\sqrt{Var(\bar X)}} = \frac{\bar X - \mu}{\sigma/\sqrt{n}} = \frac{\sqrt{n}\left(\bar X - \mu\right)}{\sigma}.$$
Therefore $Z_n$ has mean 0 and variance 1:
$$E\left[Z_n\right] = \frac{\sqrt{n}\left(E[\bar X] - \mu\right)}{\sigma} = 0$$
$$Var\left(Z_n\right) = \frac{n}{\sigma^2}Var\left(\bar X\right) = \frac{n}{\sigma^2}\cdot\frac{\sigma^2}{n} = 1.$$
Thus, even as $n \to \infty$ the random variable $Z_n \sim (0, 1)$. Although this is a long way from proving $Z_n$ has a definable distribution, even in the limit, it does help to point out that the term $\sqrt{n} \to \infty$ is necessary to stabilize $\bar X - \mu$, for otherwise we simply have $\bar X - \mu \xrightarrow{p} 0$.
Remark 3:
Asymptotically $Z_n := \sqrt{n}(\bar X - \mu)/\sigma$ has the standard normal density $(2\pi)^{-1/2}\exp\{-z^2/2\}$.
Proof:
Define $Y_i := (X_i - \mu)/\sigma$, hence
$$Z_n = \frac{\sqrt{n}\left(\bar X - \mu\right)}{\sigma} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}Y_i.$$
We will show the characteristic function $E[e^{itZ_n}] \to e^{-t^2/2}$. The latter is the characteristic function of a standard normal, while characteristic functions and distributions have a unique correspondence: only standard normals have a characteristic function like $e^{-t^2/2}$.
By independence,
$$E\left[e^{itZ_n}\right] = E\left[\prod_{i=1}^{n}e^{itY_i/\sqrt{n}}\right] = \prod_{i=1}^{n}E\left[e^{itY_i/\sqrt{n}}\right] = \left(E\left[e^{itY/\sqrt{n}}\right]\right)^n. \tag{2}$$
Expand $e^{itY/\sqrt{n}}$ around $t = 0$:
$$E\left[e^{itY/\sqrt{n}}\right] = 1 + \frac{it}{\sqrt{n}}\,\frac{E[Y]}{1!} + \frac{(it)^2}{n}\,\frac{E[Y^2]}{2!} + E\left[r_n\right] = 1 - \frac{t^2}{2n} + E\left[r_n\right],$$
using $E[Y] = 0$ and $E[Y^2] = 1$, where $r_n$ denotes the expansion remainder. It is easy to prove the standardized remainder is a bounded random variable, bounded by 1 wp1 (see Bierens), so even if $Y$ does not have higher moments the expansion is valid; further $nE[r_n] \to 0$.

Now take the $n$-power in (2): by the Binomial expansion one can verify
$$E\left[e^{itZ_n}\right] = \left(1 - \frac{t^2}{2n} + E\left[r_n\right]\right)^n \to e^{-t^2/2},$$
equivalently, $(1 + a_n/n)^n \to e^{a}$ whenever $a_n \to a$, here with $a_n = -t^2/2 + nE[r_n] \to -t^2/2$. QED.
Example (Bernoulli):
The most striking way to demonstrate the CLT is to begin with the least normal of data: a Bernoulli random variable, which is discrete. Let $X_i \sim iid$ Bernoulli$(p)$, so $\mu = p$, $\sigma^2 = p(1-p)$, and
$$Z_n := \frac{\sqrt{n}\left(\bar X - p\right)}{\sqrt{p(1-p)}}.$$
In order to show the small sample distribution of $Z_n$ we need a sample of $Z_n$'s, so we repeat the simulation 1000 times. We plot the relative frequencies of the sample of $Z_n$'s for each $n$. Let $\{Z_{n,j}\}_{j=1}^{1000}$ be the simulated sample of $Z_n$'s. The relative frequencies are the percentage $\frac{1}{1000}\sum_{j=1}^{1000}I\left(c_i < Z_{n,j} \le c_{i+1}\right)$ for interval endpoints $c_i \in [-5, -4.9, -4.8, \ldots, 4.9, 5]$. See Figure 7. For the sake of comparison, in Figure 8 we plot the relative frequencies for one sample of 1000 iid standard normal random variables.

Figure 7: relative frequencies of 1000 $Z_n$'s for $n = 5, 50, 500, 5000$ (plots omitted).
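The Bernoulli experiment is simple to reproduce; this sketch (ours, with $p = .5$ as an assumed choice) builds 1000 standardized $Z_n$'s for one $n$ and checks they have roughly mean 0 and variance 1, as the CLT requires.

```python
# Sketch (ours): 1000 standardized means of n Bernoulli(p) draws;
# their sample mean and variance should be near 0 and 1.
import random

random.seed(7)
p, n, reps = 0.5, 500, 1000
zs = []
for _ in range(reps):
    s = sum(1 if random.random() < p else 0 for _ in range(n))
    xbar = s / n
    zs.append((n ** 0.5) * (xbar - p) / (p * (1 - p)) ** 0.5)

zmean = sum(zs) / reps
zvar = sum((z - zmean) ** 2 for z in zs) / reps
print(zmean, zvar)   # near 0 and 1
```

Binning the `zs` on the grid $[-5, -4.9, \ldots, 5]$ reproduces the relative-frequency plots of Figure 7.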
Figure 8: relative frequencies for one sample of 1000 iid standard normal draws (plot omitted).
Figure 9: empirical quantiles $q_n$ over sample size $n = 5, \ldots, 5005$ (plot omitted; the vertical axis runs from 1.7 to 2.3).