Let $\{X_n : n = 1, 2, \ldots\}$ be a sequence of random variables indexed by $n$ and let $F_n(x) = \Pr(X_n \le x)$ be the distribution function of $X_n$. Further, let $X$ be a random variable with distribution function $F(x) = \Pr(X \le x)$ and let $c$ be a constant.

Almost sure convergence: $X_n \xrightarrow{a.s.} c$ if $\Pr(\lim_{n\to\infty} X_n = c) = 1$

Convergence in probability: $X_n \xrightarrow{p} c$ if $\lim_{n\to\infty} \Pr(|X_n - c| < \varepsilon) = 1$ for every $\varepsilon > 0$

Convergence in quadratic mean: $X_n \xrightarrow{q.m.} c$ if $\lim_{n\to\infty} E[(X_n - c)^2] = 0$

Convergence in distribution: $X_n \xrightarrow{d} X$ if $\lim_{n\to\infty} F_n(x) = F(x)$ at all points $x$ at which $F(x)$ is continuous [$F$ is the asymptotic distribution of $X_n$]
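A minimal simulation sketch of these definitions (my addition, not part of the original notes; it assumes NumPy, with illustrative parameter choices): for $X_n$ equal to the mean of $n$ i.i.d. Uniform(0,1) draws and $c = 1/2$, the estimated $\Pr(|X_n - c| < \varepsilon)$ approaches 1 and the estimated $E[(X_n - c)^2]$ approaches 0, as convergence in probability and in quadratic mean require.

```python
import numpy as np

rng = np.random.default_rng(0)
c, eps, reps = 0.5, 0.05, 500

for n in [10, 100, 1_000, 10_000]:
    xbar = rng.uniform(0.0, 1.0, size=(reps, n)).mean(axis=1)  # 'reps' draws of X_n
    p_close = np.mean(np.abs(xbar - c) < eps)   # estimates Pr(|X_n - c| < eps) -> 1
    mse = np.mean((xbar - c) ** 2)              # estimates E[(X_n - c)^2]      -> 0
    print(f"n={n:6d}  Pr(|Xn-c|<eps)~{p_close:.3f}  E[(Xn-c)^2]~{mse:.2e}")
```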
The definitions of convergence in probability, quadratic mean (mean square) and a.s. (almost sure) can be generalised to the case in which $c$ is not a constant but a random variable whose distribution does not depend on $n$. For example, we say that $X_n \xrightarrow{p} X$ if $X_n - X \xrightarrow{p} 0$.

If a random sequence converges in q.m., then it converges in probability and in distribution. Using Chebyshev's inequality, we can prove that $X_n \xrightarrow{q.m.} X \Rightarrow X_n \xrightarrow{p} X$. Conversely, $X_n \xrightarrow{d} X \Rightarrow X_n \xrightarrow{p} X$ only if $X$ is a constant (the distribution of $X$ is degenerate).
As with non-stochastic limits, these concepts extend immediately to vectors and matrices of finite dimension. For example, convergence in probability is said to hold for a random vector if it holds for each of its components. For $X_n = (X_{1n}, \ldots, X_{kn})'$ and $X = (X_1, \ldots, X_k)'$, convergence in distribution holds if and only if

$$\sum_{j=1}^{k} b_j X_{jn} \xrightarrow{d} \sum_{j=1}^{k} b_j X_j$$

for every fixed real vector $(b_1, \ldots, b_k)'$ (the Cramér–Wold device).
Convergence of transformations

1. Continuous Mapping Theorem: Let $X_1, X_2, \ldots$ and $X$ be random $k$-vectors and let $g$ be a vector-valued continuous function on $\mathbb{R}^k$. Then:
(a) $X_n \xrightarrow{p} X \Rightarrow g(X_n) \xrightarrow{p} g(X)$
(b) $X_n \xrightarrow{a.s.} X \Rightarrow g(X_n) \xrightarrow{a.s.} g(X)$
The result also holds for convergence in distribution, which is what the following example uses.
e.g. $X_n \xrightarrow{d} X \sim N(0, I_k) \Rightarrow X_n' X_n \xrightarrow{d} X'X \sim \chi^2(k)$ (illustrated numerically after the delta-method proof below)
2. Slutsky's Theorem: Let $\{X_n\}$ and $\{Y_n\}$ be sequences of random vectors such that $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{p} c$ (a constant). Then
(a) $X_n + Y_n \xrightarrow{d} X + c$
(b) $Y_n' X_n \xrightarrow{d} c' X$
(also illustrated in the sketch below)
3. Cramér's Linear Transformation Theorem: Let $\{X_n\}$ be a sequence of random vectors and $\{A_n\}$ be a sequence of random square matrices. Then $X_n \xrightarrow{d} X$ and $A_n \xrightarrow{p} A$ implies $A_n X_n \xrightarrow{d} AX$.
Example: $X_n \xrightarrow{d} N(\mu, \Sigma)$ and $A_n \xrightarrow{p} A \Rightarrow A_n X_n \xrightarrow{d} N(A\mu, A\Sigma A')$

4. $X_n \xrightarrow{d} X$ and $(X_n - Y_n) \xrightarrow{p} 0 \Rightarrow Y_n \xrightarrow{d} X$

5. $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{p} 0 \Rightarrow X_n' Y_n \xrightarrow{p} 0$

6. Delta Method: Let $\{X_n\}$ be a sequence of random $p$-vectors and $g : \mathbb{R}^p \to \mathbb{R}^q$ a function with continuous first derivatives. Then

$$\sqrt{n}\,(X_n - c) \xrightarrow{d} X \;\Rightarrow\; \sqrt{n}\,(g(X_n) - g(c)) \xrightarrow{d} GX, \quad \text{where } G = \left.\frac{\partial g}{\partial x'}\right|_{x=c}$$
Proof: Consider any real vector $\lambda$ ($q \times 1$) and form the function $h(x) = \lambda' g(x)$, with $h(\cdot)$ differentiable. Then, using the mean-value theorem, there exists $c_n$ between $X_n$ and $c$ such that

$$h(X_n) - h(c) = \left.\frac{\partial h(x)}{\partial x'}\right|_{x = c_n} (X_n - c)$$

and therefore

$$\sqrt{n}\,[h(X_n) - h(c)] = \left.\frac{\partial h(x)}{\partial x'}\right|_{x = c_n} \sqrt{n}\,(X_n - c).$$

Since $c_n$ is between $X_n$ and $c$, and since $X_n \xrightarrow{p} c$, we have $c_n \xrightarrow{p} c$. Then, given that $\partial h(x)/\partial x'$ is a continuous function,

$$\left.\frac{\partial h(x)}{\partial x'}\right|_{x = c_n} \xrightarrow{p} \left.\frac{\partial h(x)}{\partial x'}\right|_{x = c}.$$

Since $\sqrt{n}\,(X_n - c) \xrightarrow{d} X$, Cramér's theorem gives

$$\sqrt{n}\,[h(X_n) - h(c)] \xrightarrow{d} \left.\frac{\partial h(x)}{\partial x'}\right|_{x = c} X,$$

i.e.

$$\lambda' \{\sqrt{n}\,[g(X_n) - g(c)]\} \xrightarrow{d} \lambda' \left.\frac{\partial g(x)}{\partial x'}\right|_{x = c} X.$$

Since this is true for any $\lambda$, we conclude that

$$\sqrt{n}\,[g(X_n) - g(c)] \xrightarrow{d} \left.\frac{\partial g(x)}{\partial x'}\right|_{x = c} X.$$
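The results above can be checked by simulation. A Monte Carlo sketch (my addition, not part of the notes; it assumes NumPy and SciPy, with parameter values chosen only for illustration) of the CMT example in item 1, of Slutsky's theorem via the t-ratio, and of the delta method with $g(x) = x^2$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
reps, n = 2_000, 1_000

# Item 1 (CMT): Z_n = sqrt(n) * mean of n i.i.d. mean-zero, identity-covariance
# 3-vectors converges to N(0, I_3), so the continuous map Z_n'Z_n -> chi^2(3).
x = rng.choice([-1.0, 1.0], size=(reps, n, 3))       # Rademacher components
z = np.sqrt(n) * x.mean(axis=1)
q = np.sum(z * z, axis=1)
print("CMT, KS distance to chi2(3):", stats.kstest(q, stats.chi2(df=3).cdf).statistic)

# Item 2 (Slutsky): sqrt(n)*(Xbar - mu) -> N(0, sigma^2) and s -> sigma in
# probability, so the t-ratio -> N(0, 1); its |t| > 1.96 tail mass is ~0.05.
y = rng.exponential(1.0, size=(reps, n))             # mu = sigma = 1, skewed
t = np.sqrt(n) * (y.mean(axis=1) - 1.0) / y.std(axis=1, ddof=1)
print("Slutsky, tail mass:", np.mean(np.abs(t) > 1.96))

# Item 6 (delta method): with g(x) = x^2 at c = mu, G = 2*mu, so
# sqrt(n)*(Xbar^2 - mu^2) should have standard deviation near 2*mu*sigma.
mu, sigma = 2.0, 1.0
xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
lhs = np.sqrt(n) * (xbar**2 - mu**2)
print("Delta, sd:", lhs.std(ddof=1), "target:", 2 * mu * sigma)
```

Each printed quantity should sit close to its asymptotic target: a small KS distance, a tail mass near 0.05, and a standard deviation near $2\mu\sigma = 4$.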
Example 1: Let $X_1, X_2, \ldots, X_n$ be a random sample from $D(\mu, \sigma^2)$. Then

$$\sqrt{n}\,(\bar{X}_n - \mu) \xrightarrow{d} X, \qquad X \sim N(0, \sigma^2),$$

where $S_n = \sum_{i=1}^{n} X_i$ and $\bar{X}_n = \frac{1}{n} S_n = \frac{1}{n} \sum_{i=1}^{n} X_i$.
By the law of large numbers, $\frac{1}{n}\sum_{t=1}^{n} X_t \xrightarrow{p} E(X_t)$ as $n \to \infty$ if the $X_t$ are i.i.d.

Example 2: In the simple regression model $Y_t = \alpha + \beta X_t + u_t$, with $y_t = Y_t - \bar{Y}$ and $x_t = X_t - \bar{X}$,

$$\hat{\beta} = \frac{\sum y_t x_t}{\sum x_t^2} = \beta + \frac{\sum x_t u_t}{\sum x_t^2},$$

so $\hat{\beta} \xrightarrow{p} \beta$ provided $\frac{1}{n}\sum x_t u_t \xrightarrow{p} 0$ and $\frac{1}{n}\sum x_t^2$ converges to a positive limit.
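A quick sketch of Example 2 (my addition; NumPy assumed, with an artificial data-generating process in which $u_t$ is drawn independently of $x_t$, so $E(x_t u_t) = 0$): the slope estimate settles down at $\beta$ as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, beta = 1.0, 0.5
for n in [50, 500, 5_000, 50_000]:
    x = rng.normal(size=n)
    u = rng.normal(size=n)                       # independent of x, so E(x_t u_t) = 0
    y = alpha + beta * x + u
    xd, yd = x - x.mean(), y - y.mean()          # deviations from sample means
    bhat = (xd @ yd) / (xd @ xd)                 # = beta + sum(x*u)/sum(x^2)
    print(f"n={n:6d}  bhat={bhat:.4f}")
```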
Consistency
An estimator $\hat{\theta}_n$ is said to be consistent for $\theta$ if $\hat{\theta}_n \xrightarrow{p} \theta$ as $n \to \infty$.

A sufficient (but not necessary) condition for consistency is that

$$\lim_{n\to\infty} E(\hat{\theta}_n) = \theta \quad \text{and} \quad \lim_{n\to\infty} \mathrm{var}(\hat{\theta}_n) = 0.$$
These conditions imply that as $n$ tends to infinity the sampling distribution of $\hat{\theta}_n$ becomes less and less dispersed and eventually collapses (becomes degenerate) at $\theta$.
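As an illustration of the sufficient condition (my addition, not from the notes; NumPy assumed), consider the variance estimator $\tilde{\sigma}^2_n = \frac{1}{n}\sum_i (X_i - \bar{X})^2$: it is biased in finite samples, but $E(\tilde{\sigma}^2_n) \to \sigma^2$ and $\mathrm{var}(\tilde{\sigma}^2_n) \to 0$, hence it is consistent.

```python
import numpy as np

rng = np.random.default_rng(5)
sigma2, reps = 4.0, 10_000
for n in [5, 50, 500]:
    x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
    s2 = x.var(axis=1)                           # divides by n: biased in finite samples
    print(f"n={n:4d}  E(s2)~{s2.mean():.3f}  var(s2)~{s2.var():.4f}")
# E(s2) climbs toward sigma^2 = 4 while var(s2) -> 0, so s2 is consistent.
```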
Example 3: Multiple Linear Regression. $y = X\beta + u$, $E(u) = 0$, $E(uu') = \sigma^2 I_n$. The OLS estimator can be written as

$$\hat{\beta} = \beta + \left(\frac{1}{n} X'X\right)^{-1} \frac{1}{n} X'u,$$

where

$$\frac{1}{n} X'u = \frac{1}{n} \sum_{t=1}^{n} \begin{pmatrix} u_t \\ x_{2t} u_t \\ \vdots \\ x_{kt} u_t \end{pmatrix}$$

and

$$\mathrm{plim}\left(\frac{1}{n}\sum_{t=1}^{n} u_t\right) = E\left(\frac{1}{n}\sum_{t=1}^{n} u_t\right) = 0, \qquad \mathrm{plim}\left(\frac{1}{n}\sum_{t=1}^{n} x_{jt} u_t\right) = 0$$

(provided that the error is uncorrelated with the regressors). Thus, if in addition $\mathrm{plim}\left(\frac{1}{n} X'X\right) = Q$, a finite nonsingular matrix,

$$\mathrm{plim}\left(\frac{1}{n} X'u\right) = 0 \quad \text{and} \quad \mathrm{plim}\,\hat{\beta} = \beta.$$
For the sample mean $\bar{X}_n$, let $\mu_n = E(\bar{X}_n)$ and $\sigma_n^2 = n\,\mathrm{var}(\bar{X}_n)$. If the $X_t$ are i.i.d.$(\mu, \sigma^2)$, then $\sqrt{n}\,(\bar{X}_n - \mu)/\sigma \xrightarrow{d} N(0,1)$, i.e. $\bar{X}_n \stackrel{a}{\sim} N(\mu, \sigma^2/n)$ (Lindeberg–Lévy); $\sigma^2$ is the asymptotic variance of $\sqrt{n}\,\bar{X}_n$. (This extends directly to random vectors.)

More generally, let $\{X_t\}$ be independent with $E(X_t) = \mu_t$, $\mathrm{var}(X_t) = \sigma_t^2$ and $E(|X_t - \mu_t|^{2+\delta}) < \infty$ for some $\delta > 0$ and all $t$. If $\sigma_n^2 = \frac{1}{n}\sum_{t=1}^{n} \sigma_t^2 > 0$ for $n$ sufficiently large, then

$$\frac{\sqrt{n}\,(\bar{X}_n - \mu_n)}{\sigma_n} \xrightarrow{d} N(0, 1).$$
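A sketch (my addition; assumes NumPy and SciPy) of the Lindeberg–Lévy CLT for a heavily skewed population: the Kolmogorov–Smirnov distance between the standardized sample mean and $N(0,1)$ shrinks as $n$ grows.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
reps = 5_000
for n in [3, 30, 300]:
    x = rng.exponential(1.0, size=(reps, n))     # mu = sigma = 1, heavily skewed
    z = np.sqrt(n) * (x.mean(axis=1) - 1.0)      # standardized sample mean
    d = stats.kstest(z, stats.norm.cdf).statistic
    print(f"n={n:4d}  KS distance to N(0,1) = {d:.3f}")
```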
A useful multivariate CLT (Lindeberg–Feller): Let $\{X_t\}$ be a sequence of independent random vectors with $E(X_t) = 0$, $\mathrm{var}(X_t) = \Sigma_t$ and distribution functions $F_t(x) = P(X_t \le x)$. Suppose further that

$$\lim_{n\to\infty} \frac{1}{n}\sum_{t=1}^{n} \Sigma_t = \Sigma \ne 0$$

and, for every $\varepsilon > 0$ (the Lindeberg condition),

$$\lim_{n\to\infty} \frac{1}{n}\sum_{t=1}^{n} \int_{|x| > \varepsilon\sqrt{n}} x\,x' \, dF_t(x) = 0.$$

Then,

$$\frac{1}{\sqrt{n}}\sum_{t=1}^{n} X_t \xrightarrow{d} N(0, \Sigma).$$
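An illustrative construction for this theorem (my addition; NumPy assumed): independent 2-vectors whose covariance matrices $\Sigma_t$ vary with $t$ but average to a fixed $\Sigma$, so the normalized sum has covariance close to $\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 1_000, 2_000
scale = 1.0 + 0.5 * np.sin(np.arange(n))             # Sigma_t = scale_t^2 * I_2 varies with t
x = rng.normal(size=(reps, n, 2)) * scale[None, :, None]
s = x.sum(axis=1) / np.sqrt(n)                       # n^(-1/2) * sum of the X_t
print(np.cov(s.T))                                   # ~ Sigma = mean(scale_t^2) * I_2
print(np.mean(scale**2))                             # the target diagonal value (~1.125)
```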
Convergence of moments
Consider a random sequence $\{X_n\}$ such that $X_n \xrightarrow{d} X$ (i.e. $\lim_{n\to\infty} F_n(x) = F(x)$). The asymptotic moments of $X_n$ are

$$E_a(X_n^r) \equiv E(X^r) = \int x^r \, dF(x), \qquad r \ge 1.$$

The limit of the $r$-th moment is defined as

$$\lim_{n\to\infty} E(X_n^r) = \lim_{n\to\infty} \int x^r \, dF_n(x).$$

In general, there is no reason why $E_a(X_n^r)$ and $\lim_{n\to\infty} E(X_n^r)$ should be equal. Even if $X_n \xrightarrow{a.s.} X$, it may happen that

$$\lim_{n\to\infty} E(X_n^r) \ne E_a(X_n^r).$$
Proposition 1: If $E(|X_n|^r) \le \Delta < \infty$ for all $n$ (uniformly bounded), then $\lim_{n\to\infty} E(X_n^q) = E_a(X_n^q)$ for any $q < r$. In particular, if $E(X_n^2) \le \Delta < \infty$, then $\lim_{n\to\infty} E(X_n) = E_a(X_n)$.
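A classic counterexample makes the distinction concrete (the numerical sketch is my addition; NumPy assumed). Take a single $U \sim$ Uniform(0,1) and set $X_n = n$ if $U < 1/n$ and $X_n = 0$ otherwise. Then $X_n \xrightarrow{a.s.} 0$, so $E_a(X_n) = 0$, yet $E(X_n) = n \cdot (1/n) = 1$ for every $n$; Proposition 1 does not apply because $E(X_n^2) = n$ is unbounded.

```python
import numpy as np

rng = np.random.default_rng(8)
u = rng.uniform(size=100_000)                    # one U per sample path
for n in [10, 100, 1_000]:
    xn = np.where(u < 1.0 / n, float(n), 0.0)    # X_n = n if U < 1/n, else 0
    print(f"n={n:5d}  E(X_n)~{xn.mean():.3f}  Pr(X_n = 0)~{np.mean(xn == 0):.4f}")
# E(X_n) stays near 1 even though X_n = 0 with probability -> 1, so the limit
# of the moments (1) differs from the asymptotic moment (0).
```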
If $\{\hat{\theta}_n\}$ is a sequence of estimators of $\theta$, then $\hat{\theta}_n$ is asymptotically unbiased if $E_a(\hat{\theta}_n) = \theta$ (if $\mathrm{plim}\,\hat{\theta}_n = \theta$, then $E_a(\hat{\theta}_n) = \theta$).

$\hat{\theta}_n$ is asymptotically normal if $\sqrt{n}\,(\hat{\theta}_n - \theta) \xrightarrow{d} N(0, V)$, where $V$ is the asymptotic covariance matrix of $\sqrt{n}\,(\hat{\theta}_n - \theta)$. If $\hat{\theta}_n$ is consistent, this class of estimators is known as the consistent asymptotically normal (CAN) class.
Returning to the regression example, write

$$\sqrt{n}\,(\hat{\beta} - \beta) = \left(\frac{1}{n} X'X\right)^{-1} \frac{1}{\sqrt{n}} \sum_{t=1}^{n} x_t u_t$$

($x_t$ is $k \times 1$), where $\{x_t u_t\}$ is an independent sequence of random vectors with $E(x_t u_t) = 0$ and $\mathrm{var}(x_t u_t) = \sigma^2 x_t x_t' \equiv \sigma^2 Q_t$. Hence, applying the Lindeberg–Feller CLT to $\{x_t u_t\}$, we get

$$\frac{1}{\sqrt{n}} \sum_{t=1}^{n} x_t u_t \xrightarrow{d} N(0, \sigma^2 Q), \qquad \sigma^2 Q = \lim_{n\to\infty} \frac{1}{n} \sum_{t=1}^{n} \mathrm{var}(x_t u_t),$$

so that, combining this with $\mathrm{plim}\left(\frac{1}{n} X'X\right) = Q$ and Cramér's theorem, $\sqrt{n}\,(\hat{\beta} - \beta) \xrightarrow{d} N(0, \sigma^2 Q^{-1})$, i.e.
$$\hat{\beta} \stackrel{a}{\sim} N\!\left(\beta, \tfrac{1}{n}\,\sigma^2 Q^{-1}\right).$$

In practice $\sigma^2$ and $Q^{-1}$ are unknown and they are replaced by

$$\hat{\sigma}^2 = \frac{1}{n-k}\,\hat{u}'\hat{u} \quad \text{and} \quad \left(\frac{1}{n} X'X\right)^{-1}$$

respectively. Thus, the asymptotic covariance matrix of $\hat{\beta}$ is estimated by $\hat{\sigma}^2 (X'X)^{-1}$, which is also the finite-sample estimator of $\mathrm{var}(\hat{\beta})$. The asymptotic normality of $\hat{\beta}$ ensures that conventional $t$ and $F$ tests are asymptotically valid. Critical values should be taken from $N(0,1)$ rather than $t(n-k)$, and from $\chi^2(q)/q$ rather than $F(q, n-k)$.
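A final sketch (my addition; NumPy assumed, with a deliberately non-normal error distribution): the usual $t$-ratio for a zero slope coefficient rejects at the $N(0,1)$ critical value 1.96 about 5% of the time, as asymptotic validity predicts.

```python
import numpy as np

rng = np.random.default_rng(9)
n, reps, rej = 500, 2_000, 0
for _ in range(reps):
    x = np.column_stack([np.ones(n), rng.normal(size=n)])
    u = rng.exponential(1.0, size=n) - 1.0       # skewed, mean-zero, non-normal errors
    y = x @ np.array([1.0, 0.0]) + u             # true slope is 0
    xtx_inv = np.linalg.inv(x.T @ x)
    bhat = xtx_inv @ x.T @ y
    resid = y - x @ bhat
    s2 = resid @ resid / (n - 2)                 # sigma^2-hat with n - k = n - 2
    t = bhat[1] / np.sqrt(s2 * xtx_inv[1, 1])
    rej += np.abs(t) > 1.96
print(rej / reps)                                # empirical size, close to 0.05
```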