Lecture 2:
Convergence Concepts
This lecture covers some additional concepts of probability that address the limiting behavior
of sequences of random variables. In particular, we consider concepts and results related to
convergence of such sequences. At least some of this material should be familiar from previous
courses, but some of it may be new as well.
2.1 Convergence in Probability and Convergence in Distribution

Before dealing with convergence of random variables, first recall the definition of convergence for a sequence of real numbers.

Convergence of Real Numbers

Recall that a sequence of real numbers {an : n ≥ 1} converges to a (written an → a as n → ∞) if for every ε > 0, there exists N ≥ 1 such that |an − a| ≤ ε for every n ≥ N.

Convergence in Probability and Convergence in Distribution
For our purposes, there are two main notions of convergence for random variables. Let {Xn : n ≥ 1} be a sequence of random variables, and let X be a random variable. Let F_{Xn} and F_X denote their cdfs.
We say that {Xn : n ≥ 1} converges in probability to X (written Xn →P X as n → ∞) if for every ε > 0, P(|Xn − X| > ε) → 0 as n → ∞.
We say that {Xn : n ≥ 1} converges in distribution to X (written Xn →D X as n → ∞) if F_{Xn}(x) → F_X(x) at every point x where F_X is continuous.
Note that convergence in distribution is defined by convergence of cdfs, rather than the values of the actual random variables. For this reason, it is sometimes simply written as F_{Xn} →D F_X or Xn →D F_X. We may also replace F_X with its common name if it has one, e.g., Xn →D N(0, 1) if F_X is the cdf of a standard normal random variable.
Note: Convergence in distribution is also called convergence in law or weak convergence.
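To see the first definition in action, here is a quick Monte Carlo sketch (a toy example of our own, not one from the lecture): with Xn the sample mean of n Uniform(0, 1) draws and X = 1/2, the estimated probabilities P(|Xn − 1/2| > ε) should shrink toward 0 as n grows.

```python
import random

# Estimate P(|Xn - 1/2| > eps) by Monte Carlo, where Xn is the mean of
# n Uniform(0,1) draws. If Xn ->P 1/2, the estimates shrink toward 0 in n.
def prob_deviation(n, eps=0.05, reps=2000, seed=0):
    rng = random.Random(seed)
    count = 0
    for _ in range(reps):
        xbar = sum(rng.random() for _ in range(n)) / n
        if abs(xbar - 0.5) > eps:
            count += 1
    return count / reps

estimates = [prob_deviation(n) for n in (10, 100, 1000)]
print(estimates)  # should decrease toward 0
```

Here Var(Xn) = 1/(12n), so the deviation probability drops quickly as n grows.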
The next theorems address the relationship between these two types of convergence. The
proofs of these theorems and the other theorems in this section can be found at the end of
this lecture in Section 2.4.
Theorem 2.1.1. If Xn →P X, then Xn →D X.
Thus, convergence in probability is stronger than convergence in distribution. However, in
the case where the limiting random variable X is actually a constant, they are equivalent.
Theorem 2.1.2. Let a ∈ R be a constant. Then Xn →P a if and only if Xn →D a.
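As a numerical illustration of Theorem 2.1.2 (using an example sequence chosen here, not one from the lecture), take Xn ~ N(a, 1/n). Then Xn →D a, and the theorem says Xn →P a as well; computing the exact tail probabilities P(|Xn − a| > ε) from the normal cdf confirms this.

```python
import math

# Exact normal cdf via the error function (no external libraries needed).
def normal_cdf(x, mean, sd):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

# Assumed example: Xn ~ N(a, 1/n). Compute P(|Xn - a| > eps) exactly
# and watch it shrink to 0, as Theorem 2.1.2 predicts.
a, eps = 2.0, 0.1
probs = []
for n in (10, 100, 1000):
    sd = 1.0 / math.sqrt(n)
    tail = normal_cdf(a - eps, a, sd) + (1.0 - normal_cdf(a + eps, a, sd))
    probs.append(tail)
print(probs)  # strictly decreasing toward 0
```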
Notice that to apply Slutsky's theorem, the sequence of random variables Yn in the theorem statement must be converging to a constant.
2.2 The Weak Law of Large Numbers and the Central Limit Theorem
We now state two extremely important asymptotic results: the weak law of large numbers
and the central limit theorem. Note that we use the abbreviation iid to mean independent
and identically distributed (i.e., independent with the same cdf).
Note: DeGroot & Schervish use the term random sample to refer to a collection of
iid random variables. This terminology is not standard.
Example 2.2.4: In Example 2.2.2, the CLT tells us that √n(X̄n − θ) →D N[0, θ(1 − θ)], noting that θ(1 − θ) = Var(X1). Informally, this tells us that for large n, the distribution of X̄n is approximately a normal distribution with mean θ and variance θ(1 − θ)/n.
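The example can be checked by simulation. The sketch below assumes (an assumption on our part, since Example 2.2.2 is not reproduced in this excerpt) that the setup is iid Bernoulli(θ) sampling with θ = 0.3, and verifies that simulated draws of √n(X̄n − θ) have mean near 0 and variance near θ(1 − θ) = 0.21.

```python
import math
import random
import statistics

# Assumed setup: iid Bernoulli(theta) observations, theta = 0.3.
# The CLT says sqrt(n) * (Xbar_n - theta) is approximately
# N(0, theta * (1 - theta)) for large n.
theta, n, reps = 0.3, 500, 4000
rng = random.Random(1)
draws = []
for _ in range(reps):
    successes = sum(1 for _ in range(n) if rng.random() < theta)
    draws.append(math.sqrt(n) * (successes / n - theta))

print(statistics.mean(draws))       # near 0
print(statistics.pvariance(draws))  # near theta * (1 - theta) = 0.21
```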
Note: It may appear as though the WLLN as stated above is implied by the CLT, since

X̄n = (1/√n)[√n(X̄n − μ)] + μ,

in which case Slutsky's theorem would give X̄n →D μ, and hence X̄n →P μ by Theorem 2.1.2.
2.3 Delta Method
Let {Yn : n ≥ 1} be a sequence of random variables such that √n(Yn − a) →D Z for some random variable Z and some constant a ∈ R. Let g : R → R be a function. What can we say about the asymptotic behavior of g(Yn)?
Proof. Formally, √n[g(Yn) − g(a)] = g′(Yn*) √n(Yn − a) for some Yn* between Yn and a, by the mean value theorem. Note that for any ε > 0, P(|Yn* − a| > ε) ≤ P(|Yn − a| > ε), and P(|Yn − a| > ε) → 0 since Yn →P a. Then Yn* →P a, so g′(Yn*) →P g′(a) by Theorem 2.1.3. Since √n(Yn − a) →D Z, the result follows by Slutsky's theorem.
Delta Method (Normal Case)
The following special case is by far the most common use of the delta method.
Corollary 2.3.2 (Delta Method, Normal Case). Let {Yn : n ≥ 1} be a sequence of random variables such that √n(Yn − a) →D N(0, σ²) for some constants a ∈ R and σ² > 0. Let g : R → R be differentiable at a with g′(a) ≠ 0. Then √n[g(Yn) − g(a)] →D N(0, σ² [g′(a)]²).

For example, if √n(X̄n − 30) →D N(0, 300), then taking g(y) = 1/y (so that g′(30) = −1/900) yields

√n[(X̄n)⁻¹ − 1/30] →D N(0, 1/2700),

noting that 300(1/900)² = 1/2700. Thus, for large n, (X̄n)⁻¹ is approximately normal, with mean 1/30 and variance 1/(2700n).
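The limit in this example can also be checked by simulation. The distribution of the underlying observations is not specified in this excerpt, so the sketch below assumes Gamma observations with shape 3 and scale 10, which have mean 30 and variance 300, matching the numbers above.

```python
import math
import random
import statistics

# Assumed sample distribution: Gamma(shape=3, scale=10), with mean 30 and
# variance 300. The delta method with g(y) = 1/y predicts that
# sqrt(n) * (1/Xbar_n - 1/30) is approximately N(0, 1/2700) for large n.
n, reps = 500, 1500
rng = random.Random(42)
vals = []
for _ in range(reps):
    xbar = sum(rng.gammavariate(3, 10) for _ in range(n)) / n
    vals.append(math.sqrt(n) * (1 / xbar - 1 / 30))

print(statistics.mean(vals))       # near 0
print(statistics.pvariance(vals))  # near 1/2700, i.e. about 0.00037
```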
2.4 Proofs
This section contains several proofs that are not particularly insightful but are provided for
the sake of completeness. There is no need to study these proofs in detail.
Proof of Theorem 2.1.1. Let F_{Xn} and F_X denote the cdfs of Xn and X, and let x ∈ R be any point where F_X is continuous. Now let ε > 0. Then there exists δ > 0 such that

F_X(x) − ε/2 ≤ F_X(x − δ) and F_X(x + δ) ≤ F_X(x) + ε/2    (2.4.1)
by the definition of continuity and the fact that the cdf is nondecreasing. Now observe that
P(Xn ≤ x) ≤ P(X ≤ x + δ) + P(|Xn − X| > δ),    (2.4.2)
P(X ≤ x − δ) ≤ P(Xn ≤ x) + P(|Xn − X| > δ),    (2.4.3)
for every n ≥ 1, noting for each line that if the event inside the left-hand probability occurs,
then at least one of the events inside the right-hand probabilities occurs. Combining (2.4.2)
and (2.4.3) yields
F_X(x − δ) − P(|Xn − X| > δ) ≤ F_{Xn}(x) ≤ F_X(x + δ) + P(|Xn − X| > δ),    (2.4.4)
noting the definitions of F_{Xn} and F_X. Then combining (2.4.1) and (2.4.4) yields
F_X(x) − ε/2 − P(|Xn − X| > δ) ≤ F_{Xn}(x) ≤ F_X(x) + ε/2 + P(|Xn − X| > δ).
Since Xn →P X, there exists N ≥ 1 such that P(|Xn − X| > δ) < ε/2 for every n ≥ N. Then

F_X(x) − ε ≤ F_{Xn}(x) ≤ F_X(x) + ε

for all n ≥ N, which establishes that F_{Xn}(x) → F_X(x).
Proof of Theorem 2.1.2. By Theorem 2.1.1, we only need to show that Xn →D a implies Xn →P a. Let F_{Xn} denote the cdf of Xn, and let F_X denote the cdf of the random variable X = a, which is

F_X(x) = 1(x ≥ a), i.e., F_X(x) = 0 if x < a and F_X(x) = 1 if x ≥ a.
If Xn →D a, then F_{Xn}(x) → 1(x ≥ a) for all x ≠ a. Now let ε > 0. Then

P(|Xn − a| > ε) = P(Xn < a − ε) + P(Xn > a + ε)
              = P(Xn < a − ε) + [1 − P(Xn ≤ a + ε)]
              ≤ F_{Xn}(a − ε) + [1 − F_{Xn}(a + ε)] → 0,

since F_{Xn}(a − ε) → 1(a − ε ≥ a) = 0 and F_{Xn}(a + ε) → 1(a + ε ≥ a) = 1.
Proof of Theorem 2.1.6. Note that (trivially) Xn + a →D X + a and aXn →D aX. Then we may take a = 0 without loss of generality, since Xn + Yn = (Xn + a) + (Yn − a) and XnYn = Xn(Yn − a) + aXn. Now let ε > 0. To prove the first result, let x ∈ R be any point where F_X is continuous. Then there exists δ > 0 such that

F_X(x) − ε/3 ≤ F_X(x − δ) and F_X(x + δ) ≤ F_X(x) + ε/3    (2.4.5)

and such that F_X is continuous at x − δ and x + δ. Now note that for each n ≥ 1,

F_{Xn}(x − δ) − P(|Yn| > δ) ≤ P(Xn + Yn ≤ x) ≤ F_{Xn}(x + δ) + P(|Yn| > δ)    (2.4.6)

by the same argument as in (2.4.2), (2.4.3), and (2.4.4). Then there exists N ≥ 1 such that
F_{Xn}(x − δ) ≥ F_X(x − δ) − ε/3,
F_{Xn}(x + δ) ≤ F_X(x + δ) + ε/3,
P(|Yn| > δ) ≤ ε/3    (2.4.7)

for every n ≥ N.
Combining (2.4.5), (2.4.6), and (2.4.7) then yields F_X(x) − ε ≤ P(Xn + Yn ≤ x) ≤ F_X(x) + ε for every n ≥ N, which establishes the first result. To prove the second result, choose c > 0 such that F_X is continuous at c and −c, F_X(−c) ≤ ε/5, and F_X(c) ≥ 1 − ε/5. Then there exists N ≥ 1 such that

F_{Xn}(−c) ≤ 2ε/5,    (2.4.8)
F_{Xn}(c) ≥ 1 − 2ε/5,    (2.4.9)