
ECON509 Probability and Statistics

Slides 5

Bilkent
This Version: 16 November 2014


Introduction

In this part of the lecture notes, we will focus on properties of random samples and
consider some important statistics and their distributions, which you will encounter
in your future econometrics courses.
In addition, we will also introduce key convergence concepts, which are at the
foundation of asymptotic theory. You might find these concepts a bit too abstract,
but they are used in econometrics quite frequently.


Basic Concepts of Random Samples

We start with a definition.


Definition (5.1.1): The random variables X1, ..., Xn are called a random sample of
size n from the population f(x) if X1, ..., Xn are mutually independent random
variables and the marginal pdf or pmf of each Xi is the same function f(x).
Alternatively, X1, ..., Xn are called independent and identically distributed random
variables with pdf or pmf f(x). This is commonly abbreviated to iid random
variables.
In many experiments there are n > 1 repeated observations made on the variable,
where X1 is the first observation, X2 is the second observation, etc.
In that case, each Xi has a marginal distribution given by f(x).
In addition, the value of one observation has no effect on, or relationship with, any of
the other observations:
X1 , ..., Xn are mutually independent.


Basic Concepts of Random Samples

Then, the joint pdf or pmf is given by

f(x1, ..., xn | θ) = f(x1 | θ) f(x2 | θ) ··· f(xn | θ) = ∏_{i=1}^n f(xi | θ),

where we assume that the population pdf or pmf is a member of a parametric family
and θ is the vector of parameters.
The random sampling model in Definition 5.1.1 is sometimes called sampling from
an infinite population.
Suppose we obtain the values of X1, ..., Xn sequentially.
First, the experiment is performed and X1 = x1 is observed.
Then, the experiment is repeated and X2 = x2 is observed.
The assumption of independence implies that the probability distribution of X2 is
unaffected by the fact that X1 = x1 was observed first.


Sums of Random Variables from a Random Sample


Suppose we have drawn a sample (X1 , ..., Xn ) from the population. We might want
to obtain some summary statistics.
This summary might be defined as a function
T (x1 , ..., xn ),
which might be real or vector-valued.
So,
Y = T (X1 , ..., Xn ),
is a random variable or a random vector.
We can use techniques similar to those introduced for functions of random variables
to investigate the distributional properties of Y.
Thanks to (X1 , ..., Xn ) possessing the iid property, the distribution of Y will be
tractable.
This distribution is usually derived from the distribution of the variables in the
random sample. Hence, it is called the sampling distribution of Y .


Sums of Random Variables from a Random Sample


Definition (5.2.1): Let X1, ..., Xn be a random sample of size n from a population
and let T(x1, ..., xn) be a real-valued or vector-valued function whose domain
includes the sample space of (X1, ..., Xn). Then, the random variable or random
vector Y = T(X1, ..., Xn) is called a statistic. The probability distribution of a
statistic Y is called the sampling distribution of Y.
Let's consider some commonly used statistics.
Definition (5.2.2): The sample mean is the arithmetic average of the values in a
random sample. It is usually denoted by

X̄ = (1/n) Σ_{i=1}^n Xi.

Definition (5.2.3): The sample variance is the statistic defined by

S² = (1/(n − 1)) Σ_{i=1}^n (Xi − X̄)².

The sample standard deviation is the statistic defined by

S = √(S²).


Sums of Random Variables from a Random Sample


Theorem (5.2.4): Let x1, ..., xn be any numbers and x̄ = (x1 + ... + xn)/n. Then,
a. min_a Σ_{i=1}^n (xi − a)² = Σ_{i=1}^n (xi − x̄)²,
b. (n − 1) s² = Σ_{i=1}^n (xi − x̄)² = Σ_{i=1}^n xi² − n x̄².

Proof: To prove part (a), consider

Σ_{i=1}^n (xi − a)² = Σ_{i=1}^n (xi − x̄ + x̄ − a)²
                   = Σ_{i=1}^n (xi − x̄)² + 2 Σ_{i=1}^n (xi − x̄)(x̄ − a) + Σ_{i=1}^n (x̄ − a)²
                   = Σ_{i=1}^n (xi − x̄)² + n (x̄ − a)²,

where the cross term vanishes because Σ_{i=1}^n (xi − x̄) = 0.
Clearly, the value of a that minimises Σ_{i=1}^n (xi − a)² is given by

a = x̄.

Part (b) can easily be proved by expanding the binomial and taking the sum.


Sums of Random Variables from a Random Sample

Lemma (5.2.5): Let X1, ..., Xn be a random sample from a population and let g(x)
be a function such that E[g(X1)] and Var(g(X1)) exist. Then

E[ Σ_{i=1}^n g(Xi) ] = n E[g(X1)]

and

Var( Σ_{i=1}^n g(Xi) ) = n Var(g(X1)).

Proof: This is straightforward. First,

E[ Σ_{i=1}^n g(Xi) ] = Σ_{i=1}^n E[g(Xi)] = n E[g(X1)],

since the Xi are distributed identically. Note that independence is not required here.


Sums of Random Variables from a Random Sample

Then,

Var( Σ_{i=1}^n g(Xi) ) = Σ_{i=1}^n Var(g(Xi)) + Σ_{i≠j} Cov(g(Xi), g(Xj))
                      = Σ_{i=1}^n Var(g(Xi)) + 0 = n Var(g(X1)),

where we have used the fact that

Cov(g(Xi), g(Xj)) = 0 for all i ≠ j,

due to independence, and that Var(g(Xi)) is the same for all Xi, due to their
distributions being identical.


Sums of Random Variables from a Random Sample

Theorem (5.2.6): Let X1, ..., Xn be a random sample from a population with mean μ
and variance σ² < ∞. Then,
1. E[X̄] = μ,
2. Var(X̄) = σ²/n,
3. E[S²] = σ².

Proof: Now,

E[X̄] = E[ (1/n) Σ_{i=1}^n Xi ] = (1/n) Σ_{i=1}^n E[Xi] = (1/n) n μ = μ.

Then,

Var(X̄) = Var( (1/n) Σ_{i=1}^n Xi ) = (1/n²) Σ_{i=1}^n Var(Xi) = (1/n²) n σ² = σ²/n.

Sums of Random Variables from a Random Sample

Finally,

E[S²] = E[ (1/(n − 1)) ( Σ_{i=1}^n Xi² − n X̄² ) ]
      = (1/(n − 1)) ( n E[X1²] − n E[X̄²] )
      = (1/(n − 1)) [ n (σ² + μ²) − n (σ²/n + μ²) ]
      = (n − 1) σ² / (n − 1) = σ².

An aside: You might already know that, since E[X̄] = μ and E[S²] = σ², one
would refer to X̄ and S² as unbiased estimators of μ and σ², respectively.
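A quick Monte Carlo sketch of Theorem (5.2.6) is given below; the normal population, sample size, seed and number of replications are arbitrary illustrative choices, not part of the original notes.

# Sketch: simulate many samples and check E[X-bar] = mu, Var(X-bar) = sigma^2/n
# and E[S^2] = sigma^2 empirically.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 2.0, 3.0, 25, 200_000

x = rng.normal(mu, sigma, size=(reps, n))   # reps independent samples of size n
xbar = x.mean(axis=1)                        # sample means
s2 = x.var(axis=1, ddof=1)                   # sample variances with 1/(n-1)

print(xbar.mean(), mu)                # E[X-bar] close to mu
print(xbar.var(), sigma**2 / n)       # Var(X-bar) close to sigma^2/n
print(s2.mean(), sigma**2)            # E[S^2] close to sigma^2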


Sums of Random Variables from a Random Sample


Now, observe that, writing Y = X1 + ... + Xn,

M_{X̄}(t) = E[e^{t X̄}] = E[e^{t(X1 + ... + Xn)/n}] = E[e^{(t/n)Y}] = M_Y(t/n).

Of course, since X1, ..., Xn are identically distributed, M_{Xi}(t) is the same function
for each i. Therefore, we have the following result.
Theorem (5.2.7): Let X1, ..., Xn be a random sample from a population with mgf
M_X(t). Then, the mgf of the sample mean is

M_{X̄}(t) = [M_X(t/n)]^n.

The proof is an application of Theorem (4.6.7).
Proof: Remember that Theorem (4.6.7) says that if X1, ..., Xn are mutually
independent random variables with mgfs M_{X1}(t), ..., M_{Xn}(t) and if
Z = X1 + ... + Xn, then the mgf of Z is

M_Z(t) = M_{X1}(t) ··· M_{Xn}(t).


Sums of Random Variables from a Random Sample

In particular, if X1, ..., Xn all have the same distribution with mgf M_X(t), then

M_Z(t) = [M_X(t)]^n.

Now the sequence we actually have is X1/n, ..., Xn/n. Observe that if E[e^{tX}] = M_X(t), then

E[e^{t(X/n)}] = E[e^{(t/n)X}] = M_X(t/n).

Then, for Z = X̄, Theorem (4.6.7) gives

M_Z(t) = M_{X̄}(t) = [M_X(t/n)]^n.


Sums of Random Variables from a Random Sample


Example (5.2.8) (Distribution of the mean): Let X1, ..., Xn be a random sample
from a N(μ, σ²) population. Then the mgf of the sample mean is

M_{X̄}(t) = [ exp( μ(t/n) + σ²(t/n)²/2 ) ]^n = exp( n[ μ(t/n) + σ²(t/n)²/2 ] ) = exp( μt + (σ²/n) t²/2 ).

Thus, X̄ ~ N(μ, σ²/n).
In this example, it was helpful to use Theorem (5.2.7) because the expression for
M_{X̄}(t) turned out to be a familiar mgf. It cannot, of course, be guaranteed that this
will always be the case for any X̄. However, when this is the case, this result makes
derivation of the distribution of X̄ very easy.
Another example is the sample mean of a gamma(α, β) sample. The mgf of X̄ in
this case is given by

M_{X̄}(t) = [ 1/(1 − β(t/n)) ]^{nα} = [ 1/(1 − (β/n)t) ]^{nα},

which reveals that X̄ ~ gamma(nα, β/n).
Of course, there are cases where this method is not useful, due to the mgf of X̄ not
being recognisable. Remedies for such cases are considered in Casella and Berger
(2001, pp. 215-217). We will not cover these.
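The sketch below (with arbitrary parameter values and seed) checks the gamma case by simulation: sample means of gamma(α, β) draws are compared with the gamma(nα, β/n) distribution implied by the mgf argument above.

# Sketch: quantiles of simulated sample means versus the claimed gamma(n*alpha, beta/n) law.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, beta, n, reps = 2.0, 1.5, 10, 100_000

xbar = rng.gamma(shape=alpha, scale=beta, size=(reps, n)).mean(axis=1)

qs = [0.1, 0.5, 0.9]
print(np.quantile(xbar, qs))                            # simulated quantiles
print(stats.gamma.ppf(qs, a=n * alpha, scale=beta / n))  # gamma(n*alpha, beta/n) quantiles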

Sampling from the Normal Distribution

Now, we focus on the case where the population distribution is the normal
distribution.
The results we introduce in this section will be very useful when you deal with linear
regression models.
We have already talked about distributional properties of the sample mean and
variance. Given the extra assumption of normality, we are now in a position to
determine their full distributions.
The chi squared distribution will come up frequently within this context, so we start
by introducing some important properties related to this distribution.
Remember that the chi squared pdf is a special case of the gamma pdf and is given
by

f(x) = [1 / (Γ(p/2) 2^{p/2})] x^{(p/2)−1} e^{−x/2},   0 < x < ∞,

where p is called the degrees of freedom.


Sampling from the Normal Distribution


Lemma (5.3.2) (Facts about chi squared random variables): We use the notation
χ²_p to denote a chi squared random variable with p degrees of freedom.
a. If Z is a N(0, 1) random variable, then

Z² ~ χ²_1.

In other words, the square of a standard normal random variable is a chi squared
random variable.
b. If X1, ..., Xn are independent and Xi ~ χ²_{pi}, then

X1 + ... + Xn ~ χ²_{p1+...+pn};

that is, independent chi squared variables add to a chi squared variable, and the
degrees of freedom also add.

To prove the second part, remember that a χ²_p random variable is a gamma(p/2, 2)
random variable. By the result obtained in Example (4.6.8), the sum of independent
gamma(αi, β) random variables is a gamma(α1 + ... + αn, β) random variable.
Then, X1 + ... + Xn above is a gamma((p1 + ... + pn)/2, 2) random variable, which
is a χ²_{p1+...+pn} random variable.
Part (a) can be proved by obtaining the pdf of the transformation Y = Z² and then
confirming that this is the pdf of a χ²_1 random variable.
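A small simulation sketch of both facts (illustrative sample sizes and seed; not part of the original notes) is given below.

# Sketch: Z^2 behaves like a chi-squared(1) draw, and independent chi-squared
# variables add, with the degrees of freedom adding as well.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
reps = 200_000

z2 = rng.standard_normal(reps) ** 2                        # squares of standard normals
print(stats.kstest(z2, stats.chi2(df=1).cdf).statistic)    # small => close to chi2(1)

x = rng.chisquare(df=3, size=reps) + rng.chisquare(df=5, size=reps)
print(stats.kstest(x, stats.chi2(df=8).cdf).statistic)     # sum behaves like chi2(3+5)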

Sampling from the Normal Distribution

A nice result about normally distributed random variables is that zero covariance
implies independence, which does not necessarily hold for other distributions.
Lemma (5.3.3): Let Xj ~ N(μj, σj²), j = 1, ..., n, be independent. For constants aij and
brj (j = 1, ..., n; i = 1, ..., k; r = 1, ..., m), where k + m ≤ n, define

Ui = Σ_{j=1}^n aij Xj,   i = 1, ..., k,
Vr = Σ_{j=1}^n brj Xj,   r = 1, ..., m.

a. The random variables Ui and Vr are independent if and only if Cov(Ui, Vr) = 0.
Furthermore, Cov(Ui, Vr) = Σ_{j=1}^n aij brj σj².
b. The random vectors (U1, ..., Uk) and (V1, ..., Vm) are independent if and only if Ui is
independent of Vr for all pairs i, r (i = 1, ..., k; r = 1, ..., m).


Sampling from the Normal Distribution

Again, we consider the case where Xj ~ N(0, 1), for simplicity. Then,

Cov(Ui, Vr) = E[Ui Vr] = E[ ( Σ_{j=1}^n aij Xj ) ( Σ_{j=1}^n brj Xj ) ] = E[ Σ_{j=1}^n aij brj Xj² ] = Σ_{j=1}^n aij brj,

due to independence. The implication from independence to zero covariance is also
immediate (Theorem (4.5.5)). In addition, since Ui and Vr are linear combinations
of normal random variables, they are also normally distributed (Corollary 4.6.10).
Now, it is a bit more involved to show that we indeed have independence. Consider
the case where n = 2, as the more general proof is similar in spirit but more
complicated.
The joint pdf is given by

f_{X1,X2}(x1, x2) = (1/(2π)) e^{−(1/2)(x1² + x2²)},   −∞ < x1, x2 < ∞.

Sampling from the Normal Distribution

Consider the transformation given by

u = a1 x1 + a2 x2   and   v = b1 x1 + b2 x2,

which imply

x1 = (b2 u − a2 v)/(a1 b2 − b1 a2)   and   x2 = (a1 v − b1 u)/(a1 b2 − b1 a2).

The Jacobian of the transformation is the determinant of the matrix of partial
derivatives of (x1, x2) with respect to (u, v):

J = (b2 a1 − b1 a2)/(a1 b2 − b1 a2)² = 1/(a1 b2 − b1 a2).

Then, the new joint pdf is

f_{U,V}(u, v) = f_{X1,X2}( (b2 u − a2 v)/(a1 b2 − b1 a2), (a1 v − b1 u)/(a1 b2 − b1 a2) ) · |1/(a1 b2 − b1 a2)|.

Sampling from the Normal Distribution


Therefore,
fU ,V (u, v )

1
exp
2

1
2 (a1 b2

1
a1 b2
where

b1 a2

b1 a2 )2

(b2 u

a2 v ) + (a1 v

b1 u )

< u, v < .

Now,

(b2 u

a2 v )2 + (a1 v

b1 u )2 = b12 + b22 u 2 + a12 + a22 v 2

2 (a1 b1 + a2 b2 ) uv .

But we know that (a1 b1 + a2 b2 ) = 0. Hence, this shows that the joint pdf factorises
into a function of u and a function of v . Therefore, U and V are independent!
A similar technique can be utilised to prove part (b). Specically, by using a
transformation argument, it can be shown that the joint pdf of vectors (U1 , ..., Uk )
and (V1 , ..., Vm ) will factorise, which will prove independence. We skip this.


Sampling from the Normal Distribution

The main message here is that, if we start with independent normal random
variables, zero covariance and independence are equivalent for linear functions of
these random variables. Therefore, checking independence comes down to checking
covariances.
Part (b), on the other hand, makes it possible to infer overall independence of
normal vectors by just checking pairwise independence, which is not valid for general
random variables.
Thinking about part (a), I find it intuitively easier to remember that the normal
distribution is completely determined by its mean and variance, while other moments
do not matter. Hence, ensuring zero covariance is equivalent to ensuring
independence, since we do not have to care about the remaining moments of the
distribution.


Sampling from the Normal Distribution

We can show the usefulness of Lemma 5.3.3 by proving the independence of
X̄ and S² when X1, ..., Xn is sampled from a normal population distribution.
It can be shown that S² can be written as a function of (X2 − X̄, ..., Xn − X̄). Now,
if we can show that these random variables are uncorrelated with X̄, then by the
normality assumption and Lemma 5.3.3 we can conclude independence.
We have

X̄ = (1/n) Σ_{i=1}^n Xi   and   Xj − X̄ = Σ_{i=1}^n (δij − 1/n) Xi,

where

δij = 1 if i = j, and δij = 0 otherwise.

Sampling from the Normal Distribution


For simplicity (and without loss of generality) consider the case Xj ~ N(0, σ²) for all
j. Then,

Cov(X̄, Xj − X̄) = E[ ( (1/n) Σ_{i=1}^n Xi ) ( Σ_{i=1}^n (δij − 1/n) Xi ) ]
              = E[ (1/n) Σ_{i=1}^n (δij − 1/n) Xi² ]
              = (σ²/n) Σ_{i=1}^n (δij − 1/n).

Now,

Σ_{i=1}^n (δij − 1/n) = 1 − (1/n) Σ_{i=1}^n 1 = 1 − 1 = 0.

Hence,

Cov(X̄, Xj − X̄) = 0,

and so X̄ and Xj − X̄ are independent for all j, which yields the desired result.
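As a quick illustration, the sketch below (with arbitrary sample size, populations and seed) compares the empirical correlation between X̄ and S² for normal samples, where it is essentially zero because the two are independent, with that for samples from a skewed exponential population, where independence fails.

# Sketch: correlation between the sample mean and sample variance under
# normal sampling (close to 0) versus exponential sampling (clearly positive).
import numpy as np

rng = np.random.default_rng(3)
n, reps = 10, 200_000

x = rng.normal(0.0, 1.0, size=(reps, n))
print(np.corrcoef(x.mean(axis=1), x.var(axis=1, ddof=1))[0, 1])   # close to 0

y = rng.exponential(1.0, size=(reps, n))                            # skewed population
print(np.corrcoef(y.mean(axis=1), y.var(axis=1, ddof=1))[0, 1])   # clearly positive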

Convergence Concepts

The underlying idea in this section is to understand what happens to sequences of
random variables, or summary statistics, when we let the sample size go to
infinity.
This is, of course, an idealised concept, if you like, because the sample size never
goes to infinity. However, the idea is to attain a grasp of what happens when the
sample size becomes large enough.
This is important because, although the key results are based on the case where
the sample size approaches ∞, they are actually relevant for finite sample sizes
which are large enough.
How large is "large enough" is very much related to the data, its dependence
structure, the econometric model at hand, etc., so it is difficult to give a proper rule
of thumb.


Convergence Concepts

The tools you will learn in this section are the fundamental building blocks of what
is known as asymptotic theory or large sample theory. Although a bit abstract
at first sight, these results are at the core of many proofs you will encounter in
econometrics articles.
The three important concepts we will consider in what follows are
1. almost sure convergence,
2. convergence in probability,
3. convergence in distribution.

In the first instance we will consider the case where a sequence of random variables
X1, ..., Xn exhibits the iid property. This is one (and the simplest) of many possible
dependence settings.


Convergence Concepts
Almost Sure Convergence

Remember that random variables are functions defined on the sample space, e.g.
X(ω). Our interest will be in a sequence of random variables, indexed by sample
size, i.e. Xn(ω).
To motivate the following discussion, consider pointwise convergence:

lim_{n→∞} Xn(ω) = X(ω)   for all ω ∈ Ω,

where Ω is, as before, the sample space.
Notice that convergence occurs for all ω! There is not much of a probabilistic
statement here.
This is the strongest form of convergence we can have on the sample space. But it
is not relevant for probabilistic statements.


Convergence Concepts
Almost Sure Convergence

A slightly restricted version of pointwise convergence is almost sure convergence.
Definition (Almost Sure Convergence): A sequence of random variables X1, X2, ...
defined on a probability space (Ω, F, P) converges almost surely to a random
variable X if

lim_{n→∞} Xn(ω) = X(ω)

for each ω ∈ Ω, except for ω ∈ E, where P(E) = 0.

The idea is this: pointwise convergence fails for some points in Ω. However, the
number of such points is so small that we can safely assign zero measure (or zero
probability) to the set of these points.
Other ways of expressing this definition are

P( lim_{n→∞} |Xn − X| > ε ) = 0   for every ε > 0,

or

P( { ω ∈ Ω : lim_{n→∞} Xn(ω) = X(ω) } ) = 1.

Convergence Concepts
Almost Sure Convergence

This time, we have a difference. Convergence fails on a very small set E such that
P(E) = 0.
That P(E) = 0 is due to the set being so small that we can safely assign zero
probability to it.
Remember that in earlier lectures we stated that, for a continuously distributed
random variable, the probability of a single point is always equal to zero. This is
similar, in spirit, to the situation at hand.
This type of convergence is also called convergence almost everywhere and
convergence with probability 1.
The following notation is common:

Xn →a.s. X,   Xn →wp1 X,   lim_{n→∞} Xn = X a.s.

Note that, at the cost of sloppy notation, the argument of Xn(ω) is usually dropped.
Also, Xn(ω) need not converge to a function. It can also simply converge to some
constant, say, a.

Convergence Concepts
Almost Sure Convergence

Usually, Xn(ω) is some sample average. For example, let Zi(ω), i = 1, ..., n, be
some random variables and let

Xn(ω) = (1/n) Σ_{i=1}^n Zi(ω).

Then, almost sure convergence is a statement on the joint distribution of the entire
sequence {Zi(ω)}.

White (2001, p.19): The probability measure P determines the joint distribution
of the entire sequence {Zi(ω)}. A sequence Xn(ω) converges almost surely if the
probability of obtaining a realisation of the sequence {Zi(ω)} for which convergence
to X(ω) occurs is unity.


Convergence Concepts
Almost Sure Convergence

We can extend this concept to vectors. Let Xn = (X1,n, ..., XD,n)′ and
X = (X1, ..., XD)′. Then

Xn →a.s. X

if and only if

Xd,n →a.s. Xd   for all d = 1, ..., D.

So almost sure convergence has to occur component by component.

A more compact notation is available. Define the Euclidean norm

||X|| = √(X1² + ... + XD²),

and observe that |Xd| ≤ ||X|| for each component d.
Then, Xn →a.s. X if

lim_{n→∞} ||Xn(ω) − X(ω)|| = 0,

except for ω ∈ E, where P(E) = 0.

Whenever you see the terms "almost sure", "with probability one", or "almost
everywhere", you should remember that the relationship that is referred to holds
everywhere except for some set with zero probability.

Convergence Concepts
Almost Sure Convergence

Now that we have a convergence concept in our arsenal, the next question is: under
which conditions can we use this result for statistics of interest?
One important result is Kolmogorov's Strong Law of Large Numbers (SLLN).
Theorem (Kolmogorov's Strong Law of Large Numbers): If X1, X2, ... are
independent and identically distributed random variables and the common mean is
finite, i.e. E[|X1|] < ∞ with E[X1] = μ, then

lim_{n→∞} (1/n) Σ_{i=1}^n Xi = μ   a.s.,   or   (1/n) Σ_{i=1}^n Xi →a.s. μ.

Note that the sample size will never be equal to ∞. However, when the sample size
is large enough, the sample mean will be very close to the population mean, μ.
What is remarkable about this result is that, as long as we have iid random
variables, the only condition is that the random variables have a finite mean.
For the time being, suffice it to say that one can relax the iid assumption, but that
means that we will require more than just finite means. There is a trade-off between
the amount of dependence and the strength of the moment assumptions one has to
make in order to attain convergence.
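A minimal simulation sketch of the SLLN in action is given below; the Uniform(0,1) population and sample sizes are illustrative choices.

# Sketch: one realisation of the running sample mean of iid Uniform(0,1) draws.
# By Kolmogorov's SLLN the path settles down at the population mean 0.5.
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0.0, 1.0, size=100_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
print(running_mean[[99, 999, 9_999, 99_999]])   # drifts towards 0.5 as n grows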

Convergence Concepts
Convergence in Probability

The next convergence type is convergence in probability. Its definition is similar to
that of almost sure convergence, but in essence it is a much weaker convergence
concept.
Definition (Convergence in Probability): A sequence of random variables
X1, X2, ... converges in probability to a random variable X if, for every ε > 0,

lim_{n→∞} P(|Xn − X| > ε) = 0.

An equivalent statement is that, given ε > 0 and δ > 0, there exists an N, which
depends on both δ and ε, such that P(|Xn − X| > ε) < δ for all n > N. This is
merely a restatement using the formal definition of a limit.
One could also write

lim_{n→∞} P({ω : |Xn(ω) − X(ω)| > ε}) = 0.

Convergence Concepts
Convergence in Probability

It is not easy to see the difference between the two modes of convergence. However,
let's give it a try!
Almost sure convergence states that we have pointwise convergence for all ω ∈ Ω
except for a small, zero-measure set E. Importantly, this set is independent of n.
Convergence in probability states that as the sample size goes towards ∞, the
probability that Xn will deviate from X by more than ε decreases towards zero.
However, for any sample size, there is a positive probability that Xn will deviate by
more than ε. In other words, for some ω ∈ En, such that P(En) > 0, |Xn − X|
will be larger than ε.
Importantly, there is nothing that restricts En to be the same for all n. Hence, the
set on which Xn deviates from X may change as n increases.
The good news is, the deviation probability slowly goes to zero; hence P(En) will
eventually shrink to zero.


Convergence Concepts
Convergence in Probability

Now, as before, let Zi, i = 1, ..., n, be some random variables and let

Xn = (1/n) Σ_{i=1}^n Zi.

White (2001, p.24): With almost sure convergence, the probability measure P
takes into account the joint distribution of the entire sequence {Zi}, but with
convergence in probability, we only need concern ourselves sequentially with the joint
distribution of the elements of {Zi} that actually appear in Xn, typically the first n.


Convergence Concepts
Convergence in Probability

Consider the behaviour of Xn(ω) − X(ω) for some fixed ω as n → ∞ on the next
two slides. Which one makes you think that Xn(ω) − X(ω) →p 0? Which one makes
you think that Xn(ω) − X(ω) →a.s. 0?


Convergence Concepts
Convergence in Probability

[Figure: Xn(ω) − X(ω) plotted against n (up to n = 200) for some given ω, Case 1.]

Convergence Concepts
Convergence in Probability

[Figure: Xn(ω) − X(ω) plotted against n (up to n = 200) for some given ω, Case 2.]

Convergence Concepts
Convergence in Probability

Example (5.5.8) (Convergence in probability but not almost surely): Let
Ω = [0, 1] with the uniform probability distribution.
Define the following sequence:

X1(ω) = ω + I_[0,1](ω),       X2(ω) = ω + I_[0,1/2](ω),
X3(ω) = ω + I_[1/2,1](ω),     X4(ω) = ω + I_[0,1/3](ω),
X5(ω) = ω + I_[1/3,2/3](ω),   X6(ω) = ω + I_[2/3,1](ω),

etc. Let X(ω) = ω.
Do we have convergence in probability? Observe that

X1(ω) − X(ω) = I_[0,1](ω),       X2(ω) − X(ω) = I_[0,1/2](ω),
X3(ω) − X(ω) = I_[1/2,1](ω),     X4(ω) − X(ω) = I_[0,1/3](ω),
X5(ω) − X(ω) = I_[1/3,2/3](ω),   X6(ω) − X(ω) = I_[2/3,1](ω),

etc.


Convergence Concepts
Convergence in Probability

Now, due to the uniform probability distribution assumption, P(ω ∈ [a, b]) = b − a,
where 0 ≤ a ≤ b ≤ 1. Then, as n → ∞, P(ω : I_[a,b](ω) > ε) → 0, since
b − a → 0 as n → ∞.

Do we have almost sure convergence? No. There is no value of ω ∈ Ω for which
Xn(ω) → ω. For every ω, the value Xn(ω) alternates between the values ω and
ω + 1 infinitely often.

For example, if ω = 3/8, X1(ω) = 11/8, X2(ω) = 11/8, X3(ω) = 3/8,
X4(ω) = 3/8, X5(ω) = 11/8, X6(ω) = 3/8, etc.
No pointwise convergence occurs for this sequence.
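A small sketch of this construction makes both claims concrete; the interval-indexing helper below is my own hypothetical implementation of the shrinking-interval scheme, not something from the notes. The interval lengths (and hence the deviation probabilities) shrink to zero, yet the path at a fixed ω keeps jumping back to ω + 1.

# Sketch of the "typewriter" sequence in Example (5.5.8).
import numpy as np

def interval(n):
    # n-th interval in the scheme [0,1], [0,1/2], [1/2,1], [0,1/3], [1/3,2/3], [2/3,1], ...
    k = 1
    while n >= k:          # locate the block (of length k) containing index n
        n -= k
        k += 1
    return n / k, (n + 1) / k

omega = 3.0 / 8.0
x = [omega + (a <= omega <= b) for a, b in (interval(i) for i in range(12))]
print(x)                                                       # alternates between 3/8 and 11/8
print([b - a for a, b in (interval(i) for i in range(12))])    # interval lengths shrink to 0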


Convergence Concepts
Convergence in Probability

Example (5.5.7): Let the sample space Ω be the closed interval [0, 1] with the
uniform probability distribution. Define random variables

Xn(ω) = ω + ω^n   and   X(ω) = ω.

For every ω ∈ [0, 1), ω^n → 0 as n → ∞ and Xn(ω) → ω = X(ω).

However, Xn(1) = 2 for every n, so Xn(1) does not converge to 1 = X(1).

But, since the convergence occurs on the set [0, 1) and P([0, 1)) = 1 (remember
that when we have a continuous random variable, the probability of a single point is
equal to zero),

Xn →a.s. X.


Convergence Concepts
Convergence in Probability

One would usually write

Xn →p X   or   plim_{n→∞} Xn = X

as shorthand.
Associated with convergence in probability is the Weak Law of Large Numbers
(WLLN).
Theorem (Weak Law of Large Numbers): If X1, X2, ... are iid random variables
with common mean μ < ∞ and variance σ² < ∞, then

(1/n) Σ_{i=1}^n Xi →p μ,

as n → ∞.


Convergence Concepts
Convergence in Probability

Proof: The proof uses Chebychev's Inequality. Remember this says that if X is a
random variable and if g(x) is a nonnegative function, then, for any r > 0,

P(g(X) ≥ r) ≤ E[g(X)] / r.

Now, consider

P(|X̄n − μ| ≥ ε) = P((X̄n − μ)² ≥ ε²) ≤ E[(X̄n − μ)²]/ε² = Var(X̄n)/ε² = σ²/(nε²),

where we use r = ε² and g(X̄n) = (X̄n − μ)².

The above result implies that

P(|X̄n − μ| < ε) = 1 − P(|X̄n − μ| ≥ ε) ≥ 1 − σ²/(nε²),

and

lim_{n→∞} [1 − σ²/(nε²)] = 1.

Hence,

lim_{n→∞} P(|X̄n − μ| < ε) = 1.
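The sketch below (illustrative population, tolerance and sample sizes) shows the deviation probability P(|X̄n − μ| ≥ ε) shrinking with n, together with the Chebychev bound σ²/(nε²) used in the proof.

# Sketch: empirical deviation probability versus the Chebychev bound for growing n.
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, eps, reps = 1.0, 2.0, 0.25, 100_000

for n in (10, 100, 1000):
    xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    prob = np.mean(np.abs(xbar - mu) >= eps)
    print(n, prob, sigma**2 / (n * eps**2))   # empirical probability vs bound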

Convergence Concepts
Convergence in Probability

This is perhaps a good time to stop and reflect a little bit on these new concepts.
What the Weak and Strong LLNs are saying is that, under certain conditions, the
sample mean converges to the population mean as n → ∞. This is known as
consistency: one would say that the sample mean is a consistent estimator of the
population mean.
In actual applications, this means that if the sample size is large enough, then the
sample mean is close to the population mean. So n does not have to be that close
to infinity. On the other hand, as mentioned at the beginning, how large the
sample size should be in order to be considered a "large enough" sample is a
different question in its own right. We will not deal with this here.
Sometimes, consistency is compared to unbiasedness.
An estimator θ̂ of a population value θ is an unbiased estimator if and only if
E[θ̂] = θ.
What are the things we might want to estimate? One example would be the parameters
of a distribution family. For example, we might know that the data are distributed
N(μ, σ²), but we may not know the particular values of μ and σ². In this case,
we would estimate these parameters.

Convergence Concepts
Convergence in Probability

The plim operator has some nice properties that make it much more convenient to
deal with than the expectation operator. In particular, let X1, X2, ... and
Y1, Y2, ... be two random sequences and a1, a2, ... be some non-stochastic sequence.
Then, we have the following.
1. plim_{n→∞} (Xn / Yn) = (plim_{n→∞} Xn) / (plim_{n→∞} Yn), provided the limit in the denominator is nonzero,
while we usually have E[Xn / Yn] ≠ E[Xn] / E[Yn];
2. plim_{n→∞} (Xn + Yn) = plim_{n→∞} Xn + plim_{n→∞} Yn;
3. the plim of a non-random sequence is equal to its limit: plim_{n→∞} an = lim_{n→∞} an.

Convergence Concepts
Convergence in Probability

Note that one concept does not necessarily imply the other. In other words, a consistent
estimator can be biased, while an unbiased estimator can be inconsistent.
Suppose we are trying to estimate the population parameter μ. Consider the
following estimators.
1. μ̂ = X̄ + 20/n: consistent but biased.
2. μ̃ = X, where P(X = μ + 100) = P(X = μ − 100) = .50: unbiased but inconsistent.

Convergence Concepts
Convergence in Probability

Returning to the discussion at hand, it is important to acknowledge that neither
almost sure convergence nor convergence in probability (nor any convergence
type) says anything about the distribution of the sequence X1, X2, .... For example,
it might be such that the distribution of Xi changes as i varies. This is fine.
So far, we have only considered LLNs that work when the sequence is drawn from an
iid population. If this assumption is violated, we can still probably have convergence
of the sample mean to the population mean, but we will have to find an appropriate
LLN that works for the particular population distribution we have.
A useful result relating almost sure convergence and convergence in probability is
that

Xn →a.s. X  ⟹  Xn →p X.

Convergence in probability, however, does not imply almost sure convergence.
Obviously, for some constant K,

K →a.s. K   and   K →p K.

Convergence Concepts
Convergence in Probability

As far as economists and most econometricians are concerned, one would not
care too much about whether convergence is achieved almost surely or in probability.
As long as convergence is achieved, the rest is not that important.
However, in some cases it might be easier to prove the LLN for one of the two
convergence types. This is no problem, as →a.s. implies →p anyway.

In addition, almost sure convergence might be "slower" than convergence in
probability, in the sense that it might require a larger sample size before the sample
mean is close enough to the population mean.


Convergence Concepts
Convergence in Probability

Example (5.5.3) (Consistency of S²): Suppose we have a sequence X1, X2, ... of
iid random variables with E[Xi] = μ and Var(Xi) = σ² < ∞. If we define

Sn² = (1/(n − 1)) Σ_{i=1}^n (Xi − X̄n)²,

what condition do we require in order to prove a Weak Law of Large Numbers
(WLLN) for Sn²?
Again, use Chebychev's Inequality to obtain

P(|Sn² − σ²| ≥ ε) = P((Sn² − σ²)² ≥ ε²) ≤ E[(Sn² − σ²)²]/ε² = Var(Sn²)/ε²,

since E[Sn²] = σ². Therefore, a sufficient condition for weak convergence of Sn² to σ² is that
lim_{n→∞} Var(Sn²) = 0.

Convergence Concepts
Convergence in Probability

Although we might have convergence results for some sample means X̄n and Ȳn, we
might actually be interested in the convergence properties of a function of these, say
g(X̄n, Ȳn).
Fortunately, we have the following useful result.
If (Xn, Yn) converges almost surely (in probability) to (X, Y), if g(x, y) is a
continuous function over some set D, and if the images of Ω under
[Xn(ω), Yn(ω)] and [X(ω), Y(ω)] are in D, then g(Xn, Yn) converges almost
surely (in probability) to g(X, Y).
More pragmatically,

Xn →a.s. X  ⟹  g(Xn) →a.s. g(X),
Xn →p X  ⟹  g(Xn) →p g(X).

Example (5.5.5) (Consistency of S): If Sn² is a consistent estimator of σ², then
one can show that the sample standard deviation Sn = √(Sn²) = h(Sn²) is a consistent
estimator of σ.

Interestingly, it can be shown that Sn is, in fact, a biased estimator of σ! However,
the bias disappears asymptotically.

Convergence Concepts
Convergence in Probability

Before we move on to a different type of convergence, let us, for the sake of
completeness, introduce one more type of convergence.
Definition (Lp Convergence): Let 0 < p < ∞, let X1, X2, ... be a sequence of
random variables with E[|Xn|^p] < ∞ and let X be a random variable with
E[|X|^p] < ∞. Then, Xn converges in Lp to X if

lim_{n→∞} E[|Xn − X|^p] = 0.

Just as a reference, Lp convergence does not imply almost sure convergence, nor
does almost sure convergence imply Lp convergence. However, Lp convergence
implies convergence in probability.


Convergence Concepts
Convergence in Probability

Finally, we present another type of LLN before we move on to convergence in
distribution.
Theorem (Uniform Strong LLN): If X1, X2, ... are iid random variables, if g(x, θ) is
continuous over X × Θ, where X is the range of X1 and Θ is a closed and bounded
set, and if

E[ sup_{θ∈Θ} |g(X1, θ)| ] < ∞,

then

lim_{n→∞} sup_{θ∈Θ} | (1/n) Σ_{i=1}^n g(Xi, θ) − E[g(X1, θ)] | = 0   a.s.

Moreover, E[g(X1, θ)] is a continuous function of θ.

Therefore, the worst deviation of the sample average from the population average
E[g(X1, θ)] that one can find over all θ ∈ Θ converges to zero almost surely.

This is not a simple generalisation of the strong LLN. Generally, an LLN for each and
every θ individually would not imply the uniform LLN (except under certain assumptions).


Convergence Concepts
Convergence in Distribution

So far, we have dealt with results concerning the sample mean.


The main theme has been the convergence of the sample mean to the population
mean.
This is useful, but we can get much more.
For example, we can get convergence in distribution, as well.


Convergence Concepts
Convergence in Distribution

Definition (Convergence in Distribution): A sequence of random variables
X1, X2, ... converges in distribution to a random variable X if

lim_{n→∞} F_{Xn}(x) = F_X(x)

at every x where F_X(x) is continuous.

This is also called convergence in law. The following shorthand notation is used to
denote convergence in distribution:

Xn →d X,   Xn →d F_X,   Xn →L F_X.

It is important to underline that it is not Xn that converges to a distribution.
Instead, it is the distribution of Xn that converges to the distribution of X.


Convergence Concepts
Convergence in Distribution

As far as sequences of random vectors are concerned, a sequence of random vectors
Xn = (X1,n, ..., Xd,n) converges in distribution to a random vector X if

lim_{n→∞} F_{Xn}(x1, ..., xd) = F_X(x1, ..., xd)

at every x = (x1, ..., xd) where F_X(x1, ..., xd) is continuous.

Importantly, convergence in probability implies convergence in distribution.
Theorem (5.5.12): If the sequence of random variables X1, X2, ... converges in
probability to a random variable X, the sequence also converges in distribution to X.
Consequently, almost sure convergence implies convergence in distribution, as well.
Theorem (5.5.13): The sequence of random variables X1, X2, ... converges in
probability to a constant a if and only if the sequence also converges in distribution
to a. Equivalently, the statement

P(|Xn − a| > ε) → 0 for every ε > 0

is equivalent to

F_{Xn}(x) = P(Xn ≤ x) → 0 if x < a,   and   F_{Xn}(x) → 1 if x > a.

Convergence Concepts
Convergence in Distribution

We now introduce one of the most useful theorems we have considered so far.
Theorem (5.5.15) (Central Limit Theorem): Let X1, X2, ... be a sequence of iid
random variables with E[Xi] = μ < ∞ and 0 < Var(Xi) = σ² < ∞. Define

X̄n = (1/n) Σ_{i=1}^n Xi.

Let Gn(x) denote the cdf of √n (X̄n − μ)/σ. Then, for any x, −∞ < x < ∞,

lim_{n→∞} Gn(x) = ∫_{−∞}^x (1/√(2π)) e^{−y²/2} dy.

In other words,

(1/√n) Σ_{i=1}^n (Xi − μ)/σ →d N(0, 1).

Convergence Concepts
Convergence in Distribution

This is a powerful result! We start with the iid and finite mean and variance
assumptions. In return, the Central Limit Theorem (CLT) promises us that the
distribution of a properly standardised version of the sample mean, given by

(1/√n) Σ_{i=1}^n (Xi − μ)/σ,

will converge to the standard normal distribution as the sample size tends to infinity.
As before, the sample size will never be equal to ∞. BUT, for large enough samples,
(1/√n) Σ_{i=1}^n (Xi − μ)/σ will be approximately standard normal. As n becomes larger, this
approximate result will become more accurate.
As with LLNs, it is possible to obtain CLTs for non-iid data. However, this will
require one to make stronger assumptions regarding the moments of the sequence of
random variables. The trade-off between dependence and moment assumptions is
always there.
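As a minimal illustration (with an arbitrary skewed population, sample size and seed), the sketch below standardises sample means of exponential draws and compares them with the standard normal distribution.

# Sketch: standardised sample means of a skewed exponential population are already
# close to N(0,1) for moderate n.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, reps = 50, 200_000
mu, sigma = 1.0, 1.0                       # exponential(1): mean 1, standard deviation 1

x = rng.exponential(1.0, size=(reps, n))
z = np.sqrt(n) * (x.mean(axis=1) - mu) / sigma

print(stats.kstest(z, stats.norm.cdf).statistic)    # small distance from N(0,1)
print(np.mean(z <= 1.645), stats.norm.cdf(1.645))   # tail probabilities roughly match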


Convergence Concepts
Convergence in Distribution

Let's prove the CLT. However, before we do that, we have to revisit Taylor
expansions.
Definition (5.5.20): If a function g(x) has derivatives of order r, that is,
g^(r)(x) = (d^r/dx^r) g(x) exists, then for any constant a, the Taylor polynomial of order r
about a is

T_r(x) = Σ_{i=0}^r [g^(i)(a)/i!] (x − a)^i.

This polynomial is used in order to obtain a Taylor expansion of order r about
x = a. This is given by

g(x) = T_r(x) + R,

where R = g(x) − T_r(x) is the remainder of the approximation.

Convergence Concepts
Convergence in Distribution

Now, a useful major result is as follows.

Theorem (5.5.21): If g^(r)(a) = (d^r/dx^r) g(x)|_{x=a} exists, then

lim_{x→a} [g(x) − T_r(x)] / (x − a)^r = 0.

This says that the remainder, g(x) − T_r(x), always tends to zero faster than the
highest-order term of the approximation.
Importantly, this also means that as x tends to a, the remainder term approaches 0.


Convergence Concepts
Convergence in Distribution

We can now prove the CLT.

Proof: We will do the proof for the case where the mgf exists for |t| < h, for some
positive h. The CLT can be proved without assuming existence of the mgf, using
characteristic functions instead; however, this would be much more complicated.
Let E[Xi] = μ and Var(Xi) = σ². The aim is to show that as n → ∞ the mgf of
√n (X̄n − μ)/σ converges to the mgf of a N(0, 1) random variable, which will prove that the
distribution of √n (X̄n − μ)/σ converges to the standard normal distribution.

Now, let Yi = (Xi − μ)/σ. Then,

M_{Yi}(t) = E[e^{t Yi}] = E[e^{t(Xi − μ)/σ}] = e^{−tμ/σ} E[e^{(t/σ)Xi}] = e^{−tμ/σ} M_{Xi}(t/σ).

Convergence Concepts
Convergence in Distribution

Let Mn(t) be the mgf of √n (X̄n − μ)/σ = √n Ȳn.

Since the Xi are iid, the Yi are also iid. In addition, E[Yi] = 0 and Var(Yi) = 1.
Now, due to the independence and identical distribution assumptions,

Mn(t) = E[e^{t(Y1 + ... + Yn)/√n}] = E[e^{(t/√n)Y1} ··· e^{(t/√n)Yn}]
      = E[e^{(t/√n)Y1}] ··· E[e^{(t/√n)Yn}] = [M_{Yi}(t/√n)]^n.

Let's expand M_{Yi}(t/√n) around t = 0:

M_{Yi}(t/√n) = M_{Yi}(0) + M′_{Yi}(0) (t/√n) + (1/2) M″_{Yi}(0) (t/√n)² + R_Y(t/√n),

where the remainder is

R_Y(t/√n) = Σ_{k=3}^∞ [M^(k)_{Yi}(0)/k!] (t/√n)^k.

Convergence Concepts
Convergence in Distribution

These expansions exist because the mgf of Xi exists in a neighbourhood of 0
(|t| < h for some h), which implies that M_{Yi}(t/√n) exists for each fixed t once n is
large enough.
We know from Theorem (5.5.21) that, for fixed t,

lim_{n→∞} R_Y(t/√n)/(t/√n)² = lim_{t/√n→0} R_Y(t/√n)/(t/√n)² = 0,

where we need to have t ≠ 0.

But, as will become clear in a moment, we are interested in the behaviour of

R_Y(t/√n)/(1/√n)².

Since the above result is based on fixed t, we have

lim_{n→∞} R_Y(t/√n)/(1/√n)² = lim_{n→∞} n R_Y(t/√n) = 0,     (1)

as well, for all t, including t = 0, since R_Y(0/√n) = 0.

Convergence Concepts
Convergence in Distribution

Now, notice that

M′_{Yi}(0) = E[Yi] = 0,   M″_{Yi}(0) = E[Yi²] = Var(Yi) = 1,

and M_{Yi}(0) = 1 by definition.

Therefore,

M_{Yi}(t/√n) = 1 + 0·(t/√n) + (1/2)(t/√n)² + R_Y(t/√n) = 1 + (1/n)[ t²/2 + n R_Y(t/√n) ],

so that

Mn(t) = [ 1 + (1/n)( t²/2 + n R_Y(t/√n) ) ]^n.

Convergence Concepts
Convergence in Distribution

Remember that for any sequence an, if lim_{n→∞} an = a, then

lim_{n→∞} (1 + an/n)^n = e^a.

Let an = t²/2 + n R_Y(t/√n) and observe that, by (1), lim_{n→∞} an = t²/2. Then,

lim_{n→∞} Mn(t) = lim_{n→∞} [ 1 + (1/n)( t²/2 + n R_Y(t/√n) ) ]^n = e^{t²/2}.

But this is the mgf of the N(0, 1) distribution! Hence, the CLT is proved.


Convergence Concepts
Convergence in Distribution

Example (5.5.16): Suppose (X1, ..., Xn) is a random sample from a negative
binomial(r, p) distribution. For this distribution, one can show that

E[Xi] = r(1 − p)/p   and   Var(Xi) = r(1 − p)/p²,

for all i.
Then, the CLT tells us that

√n ( X̄ − r(1 − p)/p ) / √( r(1 − p)/p² ) →d N(0, 1).

Hence, in a large sample, this quantity should be approximately standard normally
distributed.


Convergence Concepts
Convergence in Distribution

One can also do exact calculations, but these would be difficult. Take r = 10, p = .5
and n = 30. Now, consider

P(X̄ ≤ 11) = P( Σ_{i=1}^{30} Xi ≤ 330 ) = Σ_{x=0}^{330} C(300 + x − 1, x) (1/2)^{300} (1/2)^x = .8916,

which follows from the fact that Σ_{i=1}^{30} Xi is negative binomial(nr, p) (you do not
have to prove the results presented in this bullet point!).
Such calculations would be tough, even using a computer, as we are considering
factorials of very large numbers.
We could also use the CLT to obtain the following approximation:

P(X̄ ≤ 11) = P( √30 (X̄ − 10)/√20 ≤ √30 (11 − 10)/√20 ) ≈ P(Z ≤ 1.2247) = .8888,

where Z ~ N(0, 1).
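For what it is worth, modern statistical libraries evaluate such negative binomial cdfs without difficulty, so both numbers above can be reproduced directly. The sketch below does this with scipy; treat it as an illustration rather than part of the original notes.

# Sketch: exact probability versus the CLT approximation from Example (5.5.16).
import numpy as np
from scipy import stats

r, p, n = 10, 0.5, 30
exact = stats.nbinom.cdf(330, n * r, p)       # sum of the Xi is negative binomial(nr, p)

mu = r * (1 - p) / p                           # population mean, 10
var = r * (1 - p) / p**2                       # population variance, 20
z = np.sqrt(n) * (11 - mu) / np.sqrt(var)      # about 1.2247
approx = stats.norm.cdf(z)

print(exact, approx)                           # roughly .8916 and .8888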

Convergence Concepts
Convergence in Distribution

Two useful results are given next.

Theorem: If Xn is a sequence of random vectors each with support X, g(x) is
continuous on X and Xn →d X, then

g(Xn) →d g(X).

Theorem (5.5.17) (Slutsky's Theorem): If Xn →d X and Yn →p k, where k is a
constant, then
1. Yn Xn →d kX,
2. Xn + Yn →d X + k.


Convergence Concepts
Convergence in Distribution

Example (5.5.18): Suppose that

√n (X̄n − μ)/σ →d N(0, 1),

but the value of σ is unknown. What to do?

In Example (5.5.3), we have seen that if lim_{n→∞} Var(Sn²) = 0, then Sn² →p σ². One
can show that this implies that Sn →p σ, and hence σ/Sn →p 1.

Then, by Slutsky's Theorem,

√n (X̄n − μ)/Sn = (σ/Sn) · √n (X̄n − μ)/σ →d N(0, 1),

since σ/Sn →p 1 and √n (X̄n − μ)/σ →d N(0, 1).

If you find this confusing, try to see this as

√n (X̄n − μ)/Sn →d X, where X ~ 1 · N(0, 1).

The mean of X is equal to 1 · 0, while Var(X) = 1² · 1. Moreover, clearly, X is
normally distributed. Hence,

√n (X̄n − μ)/Sn →d N(0, 1).
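A minimal simulation sketch of this studentised mean is given below; the skewed exponential population, sample size and seed are arbitrary illustrative choices. Replacing σ by Sn leaves the large-sample standard normal behaviour intact.

# Sketch: the studentised sample mean is approximately N(0,1) in large samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n, reps = 100, 100_000
mu = 1.0

x = rng.exponential(mu, size=(reps, n))                       # sigma unknown in practice
t = np.sqrt(n) * (x.mean(axis=1) - mu) / x.std(axis=1, ddof=1)
print(stats.kstest(t, stats.norm.cdf).statistic)              # close to standard normal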

Convergence Concepts
The Delta Method

When talking about the CLT, our focus has been on the limiting distribution of
some standardised random variable.
There are many instances, however, when we are not specifically interested in the
distribution of the standardised random variable itself, but rather in some function of
it.
The delta method comes in handy in such cases. This method utilises our knowledge
of the limiting distribution of a random variable in order to find the limiting
distribution of a function of this random variable.
In essence, this method is a combination of Slutsky's Theorem and Taylor
approximation.


Convergence Concepts
The Delta Method

Theorem (5.5.24) (Delta Method): Let Yn be a sequence of random variables
that satisfies

√n (Yn − θ) →d N(0, σ²).

For a given function g(·) and a specific value of θ, suppose that g′(θ) exists and
g′(θ) ≠ 0. Then,

√n [g(Yn) − g(θ)] →d N(0, σ² [g′(θ)]²).

Proof: The first-order Taylor expansion of g(Yn) about Yn = θ is

g(Yn) = g(θ) + g′(θ)(Yn − θ) + R,

where R is the remainder and R →p 0 as Yn →p θ.

Then,

√n [g(Yn) − g(θ)] ≈ g′(θ) √n (Yn − θ),   where √n (Yn − θ) →d N(0, σ²) as n → ∞,

and therefore,

√n [g(Yn) − g(θ)] →d g′(θ) N(0, σ²) = N(0, [g′(θ)]² σ²).

Implicitly, what we need here is that Yn →p θ, as this makes sure that R →p 0.
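A simulation sketch of the delta method with the illustrative choice g(x) = 1/x is given below; the population, sample size and seed are arbitrary. Since g′(μ) = −1/μ², the theorem suggests √n (1/X̄ − 1/μ) is approximately N(0, σ²/μ⁴).

# Sketch: simulated variance of sqrt(n)*(1/X-bar - 1/mu) versus the delta-method variance.
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, n, reps = 2.0, 1.0, 500, 100_000

xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
t = np.sqrt(n) * (1.0 / xbar - 1.0 / mu)

print(t.var(), sigma**2 / mu**4)   # simulated variance vs sigma^2 * g'(mu)^2 = 1/16
print(t.mean())                    # approximately zero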



Convergence Concepts
The Delta Method

In some cases, one might have

g′(θ) = 0.

In this case, the delta method as stated above will not work.
However, this problem can be solved by using a second-order delta method.
Consider the second-order expansion

g(Yn) = g(θ) + g′(θ)(Yn − θ) + [g″(θ)/2](Yn − θ)² + R.

As before, R →p 0 as Yn →p θ. However, this time g′(θ) = 0.

So,

g(Yn) − g(θ) ≈ [g″(θ)/2](Yn − θ)²,   as n → ∞.

Convergence Concepts
The Delta Method

Now,

√n (Yn − θ)/σ →d N(0, 1),

which implies that

n (Yn − θ)²/σ² →d χ²_1.

Hence,

n [g(Yn) − g(θ)] ≈ [g″(θ)/2] n (Yn − θ)²,   where n (Yn − θ)² →d σ² χ²_1, as n → ∞,

and, therefore,

n [g(Yn) − g(θ)] →d [g″(θ)/2] σ² χ²_1.

Convergence Concepts
The Delta Method

The next theorem follows.

Theorem (5.5.26) (Second-Order Delta Method): Let Yn be a sequence of
random variables that satisfies

√n (Yn − θ) →d N(0, σ²).

For a given function g and a specific value of θ, suppose that g′(θ) = 0, g″(θ)
exists and g″(θ) ≠ 0. Then,

n [g(Yn) − g(θ)] →d [g″(θ)/2] σ² χ²_1.

Convergence Concepts
Some More Large Sample Results

So far we have worked with iid random sequences only. We now introduce large-sample
results for a different type of distribution. The following is largely based on
White (2001).
Let's start with independent, heterogeneously distributed random variables.
The failure of the identical distribution assumption results from stratifying
(grouping) the population in some way. The independence assumption remains valid
provided that sampling within and across the strata is random.
Theorem (Markov's Law of Large Numbers): Let X1, ..., Xn be a sequence of
independent random variables, with E[Xi] = μi < ∞, for all i. If for some δ > 0,

Σ_{i=1}^∞ E[ |Xi − μi|^{1+δ} ] / i^{1+δ} < ∞,

then

(1/n) Σ_{i=1}^n Xi − (1/n) Σ_{i=1}^n μi →a.s. 0.

Convergence Concepts
Some More Large Sample Results

Notice some important differences.

First, note that we have relaxed the iid assumption: the sequence is not identically
distributed. Instead, we have heterogeneously distributed random variables. This
comes at a cost: we now have to ensure that moments of order higher than one
are also bounded. The following corollary makes this easier to see.
Corollary 3.9 (White (2001)): Let X1, ..., Xn be a sequence of independent
random variables such that E[ |Xi|^{1+δ} ] ≤ Δ < ∞ for some δ > 0 and all i. Then,

(1/n) Σ_{i=1}^n Xi − (1/n) Σ_{i=1}^n μi →a.s. 0.

Remember that Kolmogorov's strong law only requires the existence of the first
order moment.

So, moving away from the iid assumption usually comes at the cost of stronger
moment assumptions.
The Xi are heterogeneous, so their expected values are not identical. So now we have
n^{-1} Σ_{i=1}^n μi lying around. Compare this with the iid case,

(1/n) Σ_{i=1}^n E[Xi] = (1/n) Σ_{i=1}^n μ = μ.

Convergence Concepts
Some More Large Sample Results

There is also a CLT associated with independent, heterogeneously distributed random
sequences. The following is based on Theorem 5.6 from White (2001). For
conciseness, define

μ̄ = (1/n) Σ_{i=1}^n E[Xi]   and   σ̄² = (1/n) Σ_{i=1}^n Var(Xi),

i.e. the average mean and the average variance, respectively.

Theorem (Lindeberg-Feller): Let X1, ..., Xn be a sequence of independent random
scalars with E[Xi] = μi < ∞, Var(Xi) = σi², where 0 < σi² < ∞, and distribution
functions Fi, i = 1, 2, .... Then,

√n (X̄ − μ̄)/σ̄ →d N(0, 1)

if and only if, for every ε > 0,

lim_{n→∞} (1/σ̄²) (1/n) Σ_{i=1}^n ∫_{(x − μi)² > ε n σ̄²} (x − μi)² dFi(x) = 0.     (2)

Now, this looks a lot more complicated!

Convergence Concepts
Some More Large Sample Results

Again, let's proceed step by step.

First of all, the normalised term that converges in distribution is almost the same as
the one under the iid assumption. The only difference is that we now use the
average mean and the average variance. This is not surprising, since we are dealing with
heterogeneous random variables.
Second, the conditions of the theorem seem to be very similar to those of the CLT for iid
processes, except for (2).
This is known as the Lindeberg condition, and it requires the average contribution of
the extreme tails to the variance of the Xi to be zero in the limit.
The integral is actually the contribution to the variance of Xi from the region where

(x − μi)² / (n σ̄²) = (x − μi)² / Σ_{i=1}^n Var(Xi) > ε.

Convergence Concepts
Some More Large Sample Results

There is an equivalent result, which is easier to follow. This is given in Theorem
5.10 (White, 2001).
Theorem (Liapounov): Let X1, ..., Xn be a sequence of independent random scalars
with E[Xi] = μi < ∞, Var(Xi) = σi² < ∞ and

E[ |Xi − μi|^{2+δ} ] ≤ Δ < ∞, for some δ > 0 and all i.

If

σ̄² > 0

for all n sufficiently large, then

√n (X̄ − μ̄)/σ̄ →d N(0, 1).

The conditions of this result are easier to follow. We now need the existence of even
higher order moments (of order greater than two).

Convergence Concepts
Order Notation

We finish this part by introducing the order notation.

This so-called big-O, little-o notation is used to determine at which speed random
variables approach some bounded limit.
Put differently, the notation is related to the order of magnitude of terms as n → ∞.
Let f(x) and g(x) be two functions.
If

f(x)/g(x) → 0   as   x → ∞,

then f is of smaller order than g and we write

f(x) = o{g(x)}.

If

lim_{x→∞} |f(x)/g(x)| ≤ constant,

then we will write

f(x) = O{g(x)}.

Convergence Concepts
Order Notation

There is also a corresponding notation for random variables.

Let X1, X2, ... be a sequence of random variables and f be a real function.
If

Xn / f(n) →p 0,

then we write

Xn = op{f(n)},

while if

Xn / f(n) →p X,   where X is a constant,

we write

Xn = Op{f(n)}.

The same applies to almost sure convergence.

Convergence Concepts
Order Notation

It is important to know the following two representations of convergence in
probability and distribution, based on order notation.
Let X̄ = n^{-1} Σ_{i=1}^n Xi.

Then, instead of

X̄ →p μ   or   X̄ →a.s. μ,

one can also write

X̄ = μ + op(1)   or   X̄ = μ + oa.s.(1),

respectively.
Moreover, if

√n (X̄ − μ)/σ →d N(0, 1),

then one can also write

√n (X̄ − μ)/σ = Op(1).

Convergence Concepts
Order Notation

You might find it not so clear at first, but if

Xn = o(n^λ)

for some λ, then

Xn = O(n^λ).

To see this, notice that

Xn / n^λ → 0 = O(1).

So,

Xn / n^λ = O(1)  ⟺  Xn = O(n^λ).

Convergence Concepts
Order Notation

It might now be obvious to you that, if

Xn → 0,

then

Xn = o(1).

Now, consider

Xn = O(n^λ)  ⟹  Xn / n^λ = O(1).

Then, for any δ > 0,

Xn / n^{λ+δ} = o(1)  ⟹  Xn = o(n^{λ+δ}).
