
Lecture 9: Weak Convergence - II

1. Convergence in distribution of real-valued random variables
1.1 Convergence in distribution (weak convergence)
1.2 Convergence in distribution of transformed random variables
1.3 Subsequence approach to weak convergence
2. Convergence in distribution of random variables with values in a metric space
2.1 Convergence in distribution (weak convergence)
2.2 Weak convergence of transformed random variables
2.3 Subsequence approach to weak convergence
3. Invariance principle
3.1 Process of step-sums of i.i.d. random variables
3.2 Brownian motion (Wiener process)
3.3 Central Limit Theorem and Donsker Invariance Principle
3.4 Examples

1. Convergence in distribution
1.1 Convergence in distribution (weak convergence)

$\langle \Omega_n, \mathcal{F}_n, P_n \rangle$ are probability spaces;
$X_n$ is a real-valued random variable defined on the probability space $\langle \Omega_n, \mathcal{F}_n, P_n \rangle$ for every $n = 0, 1, 2, \ldots$;
$F_n(x) = P(X_n \le x)$, $x \in \mathbb{R}^1$, is the distribution function of the random variable $X_n$ for every $n = 0, 1, 2, \ldots$;
$C_F$ is the set of continuity points of a distribution function $F(x)$;
$F_n(A) = P(X_n \in A)$, $A \in \mathcal{B}_1$, is the distribution of the random variable $X_n$ for every $n = 0, 1, 2, \ldots$.
Definition 9.1. Random variables $X_n$ converge in distribution to $X_0$ as $n \to \infty$ ($X_n \xrightarrow{d} X_0$ as $n \to \infty$) or, equivalently, their distribution functions $F_n$ weakly converge to $F_0$ as $n \to \infty$ ($F_n \Rightarrow F_0$ as $n \to \infty$) iff

$F_n(x) \to F_0(x)$ as $n \to \infty$, $x \in C_{F_0}$.   (1)

Alternatively, the term weak convergence is used instead of the term convergence in distribution, and the corresponding notation $X_n \Rightarrow X_0$ as $n \to \infty$ is applied directly to the random variables $X_n$ instead of their distribution functions.
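
The restriction to continuity points in Definition 9.1 can be seen on a small example. The sketch below assumes the hypothetical sequence $X_n = 1/n$ for $n \ge 1$ and $X_0 = 0$ (not taken from the lecture); the function names are illustrative only.

```python
def cdf_point_mass(a, x):
    """Distribution function of the constant random variable X = a."""
    return 1.0 if x >= a else 0.0


def F(n, x):
    """F_n(x) for the assumed example X_n = 1/n (n >= 1), X_0 = 0."""
    a = 0.0 if n == 0 else 1.0 / n
    return cdf_point_mass(a, x)


# F_0 has its only jump at x = 0, so C_{F_0} is the real line without 0.
for x in (-0.5, 0.0, 0.5):
    values = [F(n, x) for n in (1, 10, 100, 1000)]
    print(f"x = {x:+.1f}: F_n(x) = {values}, F_0(x) = {F(0, x)}")
```

At $x = 0$ the values $F_n(0)$ stay equal to 0 while $F_0(0) = 1$, yet $X_n \xrightarrow{d} X_0$ still holds, because $x = 0$ is not a continuity point of $F_0$.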
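Theorem 9.1 (Skorokhod)*. Let random variables $X_n \xrightarrow{d} X_0$ as $n \to \infty$. Then it is possible to construct on some probability space $\langle \Omega, \mathcal{F}, P \rangle$ random variables $\tilde{X}_n$, $n = 0, 1, 2, \ldots$, such that
(a) $P(X_n \le x) = P(\tilde{X}_n \le x)$, $x \in \mathbb{R}^1$, for every $n = 0, 1, \ldots$;
(b) $\tilde{X}_n \xrightarrow{a.s.} \tilde{X}_0$ as $n \to \infty$.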
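
One standard way to obtain such a construction on the real line is the quantile transform $\tilde{X}_n = F_n^{-1}(U)$ with a single uniform random variable $U$; the lecture does not say which construction it has in mind, so this is only an illustration. The sketch assumes a hypothetical family of exponential distributions with rates $1 + 1/n$ converging to rate 1.

```python
import math
import random


def exp_quantile(rate, u):
    """Quantile function F^{-1}(u) of the exponential distribution with the given rate."""
    return -math.log(1.0 - u) / rate


def rate(n):
    # Assumed example family: rate 1 + 1/n for n >= 1, limiting rate 1 for n = 0.
    return 1.0 if n == 0 else 1.0 + 1.0 / n


def coupled_value(n, u):
    """Value of the coupled variable F_n^{-1}(u); all n share the same uniform u."""
    return exp_quantile(rate(n), u)


rng = random.Random(0)
for _ in range(3):
    u = rng.random()  # one outcome omega, represented by U(omega)
    path = [coupled_value(n, u) for n in (1, 10, 100, 1000)]
    print([round(v, 4) for v in path], "->", round(coupled_value(0, u), 4))
```

Each coupled variable has the prescribed exponential distribution, and for every fixed value of $U$ the sequence converges, which is the pointwise version of statement (b).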
1.2 Weak convergence via convergence of transformed random variables

$F(x)$ is a distribution function and $F(A)$ is the corresponding probability measure on the Borel $\sigma$-algebra of subsets of the real line;
$f(x)$ is a Borel function $\mathbb{R}^1 \to \mathbb{R}^1$;
$C_f$ is the set of continuity points of the function $f$.
Theorem 9.2 (8.4). Random variables $X_n \xrightarrow{d} X_0$ as $n \to \infty$ if and only if random variables $Y_n = f(X_n) \xrightarrow{d} Y_0 = f(X_0)$ as $n \to \infty$ for any real-valued measurable function $f(x)$, $x \in \mathbb{R}^1$, that is a.s. continuous with respect to the measure $F_0(A)$, i.e., such that $F_0(C_f) = 1$.
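
The "only if" direction can be checked numerically on a simple case. The sketch below assumes a hypothetical example that is not in the lecture: $X_n$ uniform on the grid $\{1/n, \ldots, n/n\}$, $X_0$ uniform on $[0, 1]$, and the continuous function $f(x) = x^2$, for which $P(f(X_0) \le y) = \sqrt{y}$ on $[0, 1]$.

```python
import math


def cdf_Yn(n, y):
    """P(f(X_n) <= y) for X_n uniform on {1/n, ..., n/n} and f(x) = x**2."""
    return sum(1 for k in range(1, n + 1) if (k / n) ** 2 <= y) / n


def cdf_Y0(y):
    """P(f(X_0) <= y) for X_0 uniform on [0, 1] and f(x) = x**2."""
    return math.sqrt(min(max(y, 0.0), 1.0))


for y in (0.1, 0.25, 0.5, 0.9):
    approximations = [round(cdf_Yn(n, y), 4) for n in (10, 100, 1000)]
    print(y, approximations, round(cdf_Y0(y), 4))
```

The distribution functions of $Y_n = f(X_n)$ approach that of $Y_0 = f(X_0)$ at every $y$, as the theorem predicts for a continuous $f$.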
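Theorem 9.2 (8.7). Random variables $X_n \xrightarrow{d} X_0$ as $n \to \infty$ if and only if $E f(X_n) \to E f(X_0)$ as $n \to \infty$ for any real-valued bounded measurable function $f(x)$, $x \in \mathbb{R}^1$, that is a.s. continuous with respect to the measure $F_0(A)$, i.e., such that $F_0(C_f) = 1$.

For $A \in \mathcal{B}_1$, $\partial A$ is the boundary of the set $A$, i.e., the set of points $x \in \mathbb{R}^1$ such that $(x - \varepsilon, x + \varepsilon) \cap A \ne \emptyset$ and $(x - \varepsilon, x + \varepsilon) \cap \overline{A} \ne \emptyset$ for any $\varepsilon > 0$, where $\overline{A}$ is the complement of $A$.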
Theorem 9.3. Random variables $X_n \xrightarrow{d} X_0$ as $n \to \infty$ if and only if $P(X_n \in A) \to P(X_0 \in A)$ as $n \to \infty$ for any $A \in \mathcal{B}_1$ such that $F_0(\partial A) = 0$.
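(a) $P(X_n \in A) = E I_A(X_n)$;
(b) $\partial A$ is the set of discontinuity points of the function $I_A(x)$;
(c) $F_n(y) = E I_{(-\infty, y]}(X_n)$;
(d) the function $I_{(-\infty, y]}(x)$ has the single discontinuity point $y$, i.e., the set $(-\infty, y]$ has the boundary $\{y\}$;
(e) $F_0(\{y\}) = F_0(y) - F_0(y - 0)$ and, thus, $F_0(\{y\}) = 0$ iff $y \in C_{F_0}$.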
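
The two characterizations above (via expectations and via sets with $F_0(\partial A) = 0$) can be illustrated with exact sums. The sketch assumes a hypothetical example not taken from the lecture: $X_n$ uniform on the grid $\{1/n, \ldots, n/n\}$ and $X_0$ uniform on $[0, 1]$, so that $E f(X_n)$ is a Riemann sum.

```python
import math


def expect(n, f):
    """E f(X_n) for X_n uniform on the grid {1/n, 2/n, ..., n/n}."""
    return sum(f(k / n) for k in range(1, n + 1)) / n


def in_A(x):
    """Indicator of A = [0, 1/2]; its boundary {0, 1/2} has F_0-measure 0."""
    return 1.0 if 0.0 <= x <= 0.5 else 0.0


# Bounded continuous f: E cos(X_0) is the integral of cos over [0, 1], i.e. sin(1).
print([round(expect(n, math.cos), 5) for n in (10, 100, 1000)], round(math.sin(1.0), 5))

# P(X_n in A) = E I_A(X_n) should approach P(X_0 in A) = 1/2.
print([round(expect(n, in_A), 5) for n in (11, 101, 1001)])
```

Both printed sequences approach their limits, in line with Theorem 9.2 (8.7) for the cosine and with Theorem 9.3 for the indicator of $A$.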
1.3 Subsequence approach to weak convergence
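Let $\{a_n\}$ be a sequence of real numbers. Then $a_n \to a_0$ as $n \to \infty$ if and only if from an arbitrary subsequence $n_k \to \infty$ as $k \to \infty$ one can select a subsequence $n'_r = n_{k_r} \to \infty$ as $r \to \infty$ such that (a) $a_{n'_r} \to a$ as $r \to \infty$, and (b) the limit $a = a_0$ does not depend on the choice of the subsequences $n_k$ and $n'_r$.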
It follows from the definition of weak convergence that, in order to prove that distribution functions $F_n(x)$ weakly converge as $n \to \infty$, one can use the following subsequence approach.
First, an arbitrary subsequence $n_k \to \infty$ as $k \to \infty$ should be selected. Second, it should be shown that a subsequence $n'_r = n_{k_r}$ can be selected from the first subsequence such that $F_{n'_r}(\cdot) \Rightarrow F(\cdot)$ as $r \to \infty$, where $F(x)$ is a distribution function. Third, it should be shown that the distribution function $F(x) \equiv F_0(x)$ does not depend on the choice of the subsequences $n_k$ and $n'_r$. Then $F_n(\cdot) \Rightarrow F_0(\cdot)$ as $n \to \infty$.
Indeed, in this case for any $x \in C_{F_0}$ we get $F_{n'_r}(x) \to F(x) = F_0(x)$ as $r \to \infty$. Thus, by the statement for sequences of real numbers given above, $F_n(x) \to F_0(x)$ as $n \to \infty$.
Let $R$ be a subset of $\mathbb{R}^1$. A set $S \subseteq R$ is dense in $R$ if $\inf_{y \in S} |x - y| = 0$ for every $x \in R$.

Let us choose a set $S$ dense in $\mathbb{R}^1$. Note that $S = \{s_1, s_2, \ldots\}$ can be a countable subset of $\mathbb{R}^1$. Due to continuity from the right, any distribution function $F(x)$ is completely determined by its values at points of the set $S$. In this sense, $S$ can be referred to as a defining set.
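Let $F_n(x)$, $n = 1, 2, \ldots$, be an arbitrary sequence of distribution functions.
(1) Let now $n_k \ge 0$ be an arbitrary sequence such that $n_k \to \infty$ as $k \to \infty$. Using Cantor's diagonal method, it is always possible to find a subsequence $n'_r = n_{k_r}$ such that $F_{n'_r}(x) \to F(x)$ as $r \to \infty$ for all $x \in S$, where the limits $F(x) \in [0, 1]$:
(a) a subsequence $n^{(1)}_k$ of $n_k$ is selected so that the bounded sequence $F_{n^{(1)}_1}(s_1), F_{n^{(1)}_2}(s_1), F_{n^{(1)}_3}(s_1), \ldots$ converges;
(b) $F_{n^{(1)}_k}(s_1) \to F(s_1)$ as $k \to \infty$;
(c) a subsequence $n^{(2)}_r = n^{(1)}_{k_r}$, $r = 1, 2, \ldots$, of $n^{(1)}_k$ is selected such that
(d) $F_{n^{(2)}_r}(s_2) \to F(s_2)$ as $r \to \infty$;
(e) $\ldots$;
(f) $n'_l = n^{(l)}_l$, $l = 1, 2, \ldots$;
(g) $F_{n'_l}(s_m) \to F(s_m)$ as $l \to \infty$ for every $m = 1, 2, \ldots$.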
(2) The function $F(x)$, defined on the set $S$, is non-decreasing. Using this fact, one can always define this function at every point $x \in \mathbb{R}^1 \setminus S$ as the right limit of the values $F(x_k)$ for some sequence of points $x_k \in S$, $x_k > x$, $x_k \to x$. The function $F(x)$, defined on $\mathbb{R}^1$ in this way, is non-decreasing, continuous from the right, and $F(x) \in [0, 1]$ for every $x \in \mathbb{R}^1$. So, it is a distribution function.
(3) However, it can be an improper distribution function, i.e., it can be such that $F(+\infty) - F(-\infty) < 1$, where $F(\pm\infty) = \lim_{x \to \pm\infty} F(x)$.
(4) For example, let $F_n(x) = I_{[a_n, \infty)}(x)$, where $a_n \to a_0 = \infty$ as $n \to \infty$. In this case, $F_n(x) \to F_0(x) \equiv 0$ as $n \to \infty$.
(5) In order to prove that $F(x)$ is a proper distribution function (for any subsequences $n_k$ and $n'_r$ chosen as described above), one should require that the initial family of distribution functions $F_n(x)$, $n = 0, 1, \ldots$, be tight, which means that the following condition holds:
T: $\lim_{K \to \infty} \sup_{n \ge 0} (F_n(-K - 0) + 1 - F_n(K)) = \lim_{K \to \infty} \sup_{n \ge 0} F_n(\overline{[-K, K]}) = 0$.
(6) Condition T implies that, for any subsequence $n_k \to \infty$ as $k \to \infty$, a subsequence $n'_r = n_{k_r}$ can be selected from the first subsequence in such a way that the distribution functions $F_{n'_r}(\cdot) \Rightarrow F(\cdot)$ as $r \to \infty$, where $F(x)$ is a proper distribution function.
(7) The family of distribution functions $F_n(x)$, $n = 0, 1, \ldots$, is tight if and only if it is relatively compact, i.e., for any subsequence $n_k \to \infty$ as $k \to \infty$, a subsequence $n'_r = n_{k_r}$ can be selected from the first subsequence in such a way that the distribution functions $F_{n'_r}(\cdot) \Rightarrow F(\cdot)$ as $r \to \infty$, where $F(x)$ is a proper distribution function.

(8) Now, let us also require convergence of the distribution functions $F_n(x)$ at points of the defining set $S$:
W: $F_n(x) \to F_0(x)$ as $n \to \infty$, for $x \in S$.
(9) Note that the limits in W are some numbers from the interval $[0, 1]$. The function $F_0(x)$, defined in W, is automatically non-decreasing. But it is not required that the corresponding limits of $F_0(x)$, as $x$ tends to $-\infty$ or $+\infty$, be equal to 0 and 1, respectively. The function $F_0(x)$ can be continued to the whole real line, as described above, by using right limits. It is a proper or improper distribution function.
(10) Condition W also implies that $F_n(x) \to F_0(x)$ as $n \to \infty$, for $x \in C_{F_0}$.
(11) If condition T holds, then, according to (7), the family of distribution functions $F_n(x)$, $n = 0, 1, \ldots$, is relatively compact. Condition W implies, in this case, that all limits $F(x) = F_0(x)$, $x \in S$, and also that $F(x)$ is a proper distribution function. Since $S$ is a defining set, $F(x) = F_0(x)$, $x \in \mathbb{R}^1$. So, the distribution function $F(x)$ does not depend on the choice of the subsequences $n_k$ and $n'_r$.
(12) Summarizing the remarks made above, one can conclude that, in order to prove weak convergence of distribution functions $F_n(\cdot) \Rightarrow F_0(\cdot)$ as $n \to \infty$, it is sufficient to assume that both conditions T and W hold.
(13) Moreover, it can be easily shown that conditions T and W are not only sufficient but also necessary for weak convergence:
(a) the sufficiency statement was proved above;
(b) let $F_n(\cdot) \Rightarrow F_0(\cdot)$ as $n \to \infty$;
(c) then condition W holds for any countable set $S = \{s_k\}$ that is dense in $\mathbb{R}^1$ and such that $s_k \in C_{F_0}$, $k = 1, 2, \ldots$;
(d) obviously, the family of distribution functions is relatively compact and, therefore, it is tight, i.e., condition T also holds.
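
Condition T and the escaping-mass example in remark (4) can be compared numerically. The sketch below makes two assumptions of its own: a tight family $N(0, 1 + 1/n)$ chosen purely for illustration, and a finite range of $n$ standing in for the supremum over all $n$. The second family consists of the point masses at $a_n = n$ from remark (4).

```python
import math


def normal_tail_mass(sigma, K):
    """Mass that the normal law N(0, sigma^2) puts outside [-K, K]."""
    return math.erfc(K / (sigma * math.sqrt(2.0)))


def point_mass_tail(a, K):
    """Mass that the unit point mass at a puts outside [-K, K]."""
    return 1.0 if abs(a) > K else 0.0


N = 1000  # finite stand-in for the supremum over all n (an assumption of this sketch)
for K in (1, 5, 10, 100):
    tight = max(normal_tail_mass(math.sqrt(1 + 1 / n), K) for n in range(1, N + 1))
    escaping = max(point_mass_tail(float(n), K) for n in range(1, N + 1))
    print(K, f"{tight:.3e}", escaping)
```

For the normal family the worst-case tail mass vanishes as $K$ grows, while for the point masses it stays equal to 1 for every $K < N$, which is why the limit in remark (4) is the improper function $F_0(x) \equiv 0$.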
2. Convergence in distribution of random variables
with values in a metric space
2.1 Convergence in distribution (weak convergence)
Let $\mathcal{X}$ be a metric space with a metric $d(x, y)$ ((a) $0 \le d(x, y) = d(y, x)$; (b) $d(x, y) = 0 \Leftrightarrow x = y$; (c) $d(x, y) + d(y, z) \ge d(x, z)$).
The space $\mathcal{X}$ is complete if for any fundamental sequence of points $x_n \in \mathcal{X}$, i.e., a sequence such that $d(x_n, x_m) \to 0$ as $n, m \to \infty$, there exists a point $x \in \mathcal{X}$ such that $d(x_n, x) \to 0$ as $n \to \infty$.
The space $\mathcal{X}$ is separable if there exists a countable subset $Y = \{y_1, y_2, \ldots\} \subseteq \mathcal{X}$ such that $\min_{k \le n} d(y_k, x) \to 0$ as $n \to \infty$ for any point $x \in \mathcal{X}$.

The term Polish space is used to indicate that $\mathcal{X}$ is a complete separable metric space. Below, $\mathcal{X}$ is always a Polish space.
A closed set $K$ is a compact (set) in a Polish space $\mathcal{X}$ if there exists a countable set $Y = \{y_1, y_2, \ldots\} \subseteq \mathcal{X}$ such that

$\sup_{x \in K} \min_{k \le n} d(y_k, x) \to 0$ as $n \to \infty$.

Let $\mathcal{B}_{\mathcal{X}}$ be the Borel $\sigma$-algebra of subsets of $\mathcal{X}$ (the minimal $\sigma$-algebra containing every ball $B_r(x) = \{y : d(x, y) \le r\}$ in the space $\mathcal{X}$).
The space $\mathbb{R}^m$ is a particular example of a Polish space, with the metric

$d(x, y) = |x - y| = \sqrt{(x_1 - y_1)^2 + \cdots + (x_m - y_m)^2}$.

Another example that we will be interested in is the functional space $C[0, 1]$ of real-valued continuous functions $x = \langle x(t), t \in [0, 1] \rangle$ with the uniform metric

$d_U(x, y) = \sup_{t \in [0, 1]} |x(t) - y(t)|$.

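A random variable $X = X(\omega)$ defined on a probability space $\langle \Omega, \mathcal{F}, P \rangle$ with values in the Polish space $\mathcal{X}$ is a measurable function acting $\Omega \to \mathcal{X}$, i.e., such that $X^{-1}(A) \in \mathcal{F}$ for any $A \in \mathcal{B}_{\mathcal{X}}$.
The distribution of a random variable $X$ is the probability measure $F_X(A) = P(X \in A) = P(X^{-1}(A))$, $A \in \mathcal{B}_{\mathcal{X}}$.
In the case of $\mathcal{X} = \mathbb{R}^m$, a random variable $X = (X_1, \ldots, X_m)$ is a random vector.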

In the case of $\mathcal{X} = C[0, 1]$, a random variable $X = \langle X(t), t \in [0, 1] \rangle$ is a real-valued continuous stochastic process.
Let $X_n$, $n = 0, 1, \ldots$, be a sequence of random variables (defined on probability spaces $\langle \Omega_n, \mathcal{F}_n, P_n \rangle$) that take values in $\mathcal{X}$. We denote by $F_n(A) = P\{X_n \in A\}$, $A \in \mathcal{B}_{\mathcal{X}}$, the distribution of the random variable $X_n$.
Let $\partial A$ denote the boundary of the set $A$, i.e., the set of points $x$ such that every ball $B_r(x)$, with centre at $x$ and radius $r > 0$, has non-empty intersections with both sets $A$ and $\overline{A}$ (the complement of $A$). If $F_0(\partial A) = 0$, then $A$ is called a set of continuity for the distribution $F_0$. The class of such sets, $\mathcal{B}(F_0)$, is an algebra of subsets from $\mathcal{B}_{\mathcal{X}}$.
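Definition 9.2. Random variables $X_n$ converge in distribution to $X_0$ as $n \to \infty$ ($X_n \xrightarrow{d} X_0$ as $n \to \infty$) or, equivalently, distributions $F_n$ weakly converge to $F_0$ as $n \to \infty$ ($F_n \Rightarrow F_0$ as $n \to \infty$) iff

$F_n(A) \to F_0(A)$ as $n \to \infty$, $A \in \mathcal{B}(F_0)$.

Definition 9.3. Random variables $X_n$ a.s. converge to $X_0$ as $n \to \infty$ ($X_n \xrightarrow{a.s.} X_0$ as $n \to \infty$) iff

$P(\lim_{n \to \infty} X_n(\omega) = X_0(\omega)) = 1$.

Lemma 9.1. If $X_n \xrightarrow{a.s.} X_0$ as $n \to \infty$, then $X_n \xrightarrow{d} X_0$ as $n \to \infty$.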
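Theorem 9.4 (Skorokhod)**. Let random variables $X_n \xrightarrow{d} X_0$ as $n \to \infty$. Then it is possible to construct on some probability space $\langle \Omega, \mathcal{F}, P \rangle$ random variables $\tilde{X}_n$, $n = 0, 1, 2, \ldots$, such that
(a) $P(X_n \in A) = P(\tilde{X}_n \in A)$, $A \in \mathcal{B}_{\mathcal{X}}$, for every $n = 0, 1, \ldots$;
(b) $\tilde{X}_n \xrightarrow{a.s.} \tilde{X}_0$ as $n \to \infty$, in the sense of Definition 9.3.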
2.2 Convergence in distribution of transformed random variables
If $f(x)$ is a measurable real-valued function defined on a Polish space $\mathcal{X}$ (the inverse image of any Borel set in $\mathbb{R}^1$ is a Borel set in $\mathcal{X}$), and $X$ is a random variable with values in $\mathcal{X}$, then $f(X)$ is a real-valued random variable.
$F(A)$ is a probability measure on the Borel $\sigma$-algebra $\mathcal{B}_{\mathcal{X}}$;
$f(x)$ is a Borel function $\mathcal{X} \to \mathbb{R}^1$;
$C_f$ is the set of continuity points of the function $f$.
Theorem 9.5. Random variables $X_n \xrightarrow{d} X_0$ as $n \to \infty$ if and only if $E f(X_n) \to E f(X_0)$ as $n \to \infty$ for any real-valued bounded measurable function $f(x)$, $x \in \mathcal{X}$, that is a.s. continuous with respect to the measure $F_0(A)$, i.e., such that $F_0(C_f) = 1$.
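(a) Let us assume that $E f(X_n) \to E f(X_0)$ as $n \to \infty$ for any real-valued bounded Borel function $f$ defined on $\mathcal{X}$ and such that $F_0(C_f) = 1$.
(b) The indicator function $I_A(x)$ of a Borel set $A$ is a measurable function, and its set of discontinuity points is $\partial A$. The condition $F_0(\partial A) = 0$ means that $I_A(x)$ is an a.s. continuous function with respect to the probability measure $F_0$.
(c) Thus, $E I_A(X_n) = F_n(A) \to E I_A(X_0) = F_0(A)$ as $n \to \infty$ for all sets of continuity for the limiting distribution $F_0$.
(d) Conversely, let $X_n \xrightarrow{d} X_0$ as $n \to \infty$. By Theorem 9.4, construct on some probability space $\langle \Omega, \mathcal{F}, P \rangle$ random variables $\tilde{X}_n$, $n = 0, 1, 2, \ldots$, such that $P(X_n \in A) = P(\tilde{X}_n \in A)$, $A \in \mathcal{B}_{\mathcal{X}}$, for every $n = 0, 1, \ldots$, and $\tilde{X}_n \xrightarrow{a.s.} \tilde{X}_0$ as $n \to \infty$.
(e) $f(\tilde{X}_n) \xrightarrow{a.s.} f(\tilde{X}_0)$ as $n \to \infty$, since $F_0(C_f) = 1$.
(f) By the Lebesgue (dominated convergence) theorem, $E f(\tilde{X}_n) \to E f(\tilde{X}_0)$ as $n \to \infty$.
(g) $P(f(\tilde{X}_n) \le x) = P(f(X_n) \le x)$, $x \in \mathbb{R}^1$, for every $n = 0, 1, \ldots$.
(h) Thus, $E f(X_n) \to E f(X_0)$ as $n \to \infty$.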
Theorem 9.6. Random variables $X_n \xrightarrow{d} X_0$ as $n \to \infty$ if and only if random variables $Y_n = f(X_n) \xrightarrow{d} Y_0 = f(X_0)$ as $n \to \infty$ for any real-valued measurable function $f(x)$, $x \in \mathcal{X}$, that is a.s. continuous with respect to the measure $F_0(A)$, i.e., such that $F_0(C_f) = 1$.
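(a) Let us assume that $f(X_n) \xrightarrow{d} f(X_0)$ as $n \to \infty$ for any real-valued Borel function $f$ defined on $\mathcal{X}$ and such that $F_0(C_f) = 1$.
(b) Then, in particular, $f(X_n) \xrightarrow{d} f(X_0)$ as $n \to \infty$ for any real-valued bounded Borel function $f$ defined on $\mathcal{X}$ and such that $F_0(C_f) = 1$.
(c) Then, by the Helly theorem, $E f(X_n) \to E f(X_0)$ as $n \to \infty$, and, by Theorem 9.5, $X_n \xrightarrow{d} X_0$ as $n \to \infty$.
(d) Conversely, let $X_n \xrightarrow{d} X_0$ as $n \to \infty$. By Theorem 9.4, construct on some probability space $\langle \Omega, \mathcal{F}, P \rangle$ random variables $\tilde{X}_n$, $n = 0, 1, 2, \ldots$, such that $P(X_n \in A) = P(\tilde{X}_n \in A)$, $A \in \mathcal{B}_{\mathcal{X}}$, for every $n = 0, 1, \ldots$, and $\tilde{X}_n \xrightarrow{a.s.} \tilde{X}_0$ as $n \to \infty$.
(e) Let $\tilde{Y}_n = f(\tilde{X}_n)$, $n = 0, 1, \ldots$.
(f) $P(\tilde{Y}_n \le x) = P(\tilde{X}_n \in f^{-1}((-\infty, x])) = P(X_n \in f^{-1}((-\infty, x])) = P(Y_n \le x)$.
(g) Let $A = \{\omega : \tilde{X}_n(\omega) \to \tilde{X}_0(\omega)\}$ and $B = \{\omega : \tilde{X}_0(\omega) \in C_f\}$. Then $P(A) = 1$, $P(B) = 1$ and, therefore, $P(A \cap B) = 1$.
(h) If $\omega \in A \cap B$, then $\tilde{Y}_n(\omega) = f(\tilde{X}_n(\omega)) \to \tilde{Y}_0(\omega) = f(\tilde{X}_0(\omega))$.
(i) Thus, $\tilde{Y}_n \xrightarrow{a.s.} \tilde{Y}_0$ and, therefore, by Lemma 9.1, $\tilde{Y}_n \xrightarrow{d} \tilde{Y}_0$.
(j) Thus, $Y_n \xrightarrow{d} Y_0$.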
2.3 Subsequence approach to weak convergence
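Theorem 9.7**. Distributions $F_n(\cdot) \Rightarrow F_0(\cdot)$ as $n \to \infty$ if and only if any subsequence $n_k \to \infty$ as $k \to \infty$ contains a subsequence $n'_r = n_{k_r} \to \infty$ as $r \to \infty$ such that $F_{n'_r}(\cdot) \Rightarrow F_0(\cdot)$ as $r \to \infty$.
The notions of tightness and relative compactness for a family of distributions play a principal role in the theory. Let us introduce the following condition:
T: There exists a sequence of compact sets $K_m \subseteq \mathcal{X}$, $m = 1, 2, \ldots$, such that $\lim_{m \to \infty} \sup_{n \ge 0} F_n(\overline{K}_m) = 0$, where $\overline{K}_m$ is the complement of $K_m$.
Definition 9.4. A family of distributions $F_n$, $n \ge 0$, is tight if condition T holds.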
Definition 9.5. A family of distributions $F_n$, $n \ge 0$, is relatively compact if any subsequence $n_k \to \infty$ as $k \to \infty$ contains a subsequence $n'_r = n_{k_r} \to \infty$ as $r \to \infty$ such that the distributions $F_{n'_r}$ weakly converge to some probability measure $F$ as $r \to \infty$ (possibly depending on the subsequences $n_k$ and $n'_r$).
The following theorem plays a fundamental role in the theory.
Theorem 9.8 (Prokhorov)**. A family of probability measures $F_n$, $n \ge 0$, is relatively compact if and only if it is tight.
The notion of a defining class for a distribution is also important.
Definition 9.6. A class of sets $D_F$ from the $\sigma$-algebra $\mathcal{B}_{\mathcal{X}}$ is a defining class for a probability measure $F$ if any probability measure $F'$ that takes the same values as $F$ on sets from the class $D_F$ coincides with $F$.
Let us introduce the following condition:
W: $F_n(A) \to F_0(A)$ as $n \to \infty$ for $A \in D_{F_0}$, where $D_{F_0}$ is some defining class for the distribution $F_0$.
Theorem 9.9 (Prokhorov)**. Conditions T and W are necessary and sufficient for the weak convergence $F_n \Rightarrow F_0$ as $n \to \infty$.

(a) Let conditions T and W hold. It follows from condition T and Theorem 9.8 that the family of distributions $F_n$ is relatively compact;
(b) by condition W, any weakly converging subsequence $F_{n'_r}$ has the same limiting distribution $F \equiv F_0$;
(c) thus, by Theorem 9.7, $F_n \Rightarrow F_0$ as $n \to \infty$;
(d) conversely, if $F_n \Rightarrow F_0$ as $n \to \infty$, then the family of distributions $F_n$, $n \ge 0$, is relatively compact;
(e) therefore, due to Theorem 9.8, this family of distributions is also tight;
(f) also, the class of sets of continuity for the distribution $F_0$ is a defining class for this distribution ($D_{F_0} = \mathcal{B}(F_0)$ is an algebra and $\sigma(D_{F_0}) = \mathcal{B}_{\mathcal{X}}$);
(g) thus, condition W holds.
3. Invariance principle
3.1 Process of step-sums of i.i.d. random variables
$\langle \Omega, \mathcal{F}, P \rangle$ is a probability space;
$X_n$, $n = 1, 2, \ldots$, are independent identically distributed (i.i.d.) random variables defined on the probability space $\langle \Omega, \mathcal{F}, P \rangle$;
$Z = N(0, 1)$ is a standard normal random variable with the distribution function

$F(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2} \, dy$, $-\infty < x < \infty$.


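CLT: If $E|X_1|^2 < \infty$, $E X_1 = a$, $\mathrm{Var}\, X_1 = \sigma^2 > 0$, then random variables

$W_n = \frac{X_1 + \cdots + X_n - an}{\sigma \sqrt{n}} \xrightarrow{d} Z$ as $n \to \infty$,   (1)

which means that

$P(W_n \le x) \to F(x)$ as $n \to \infty$, $x \in \mathbb{R}^1$.   (2)
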

Let us construct the continuous stochastic process $X_n = \langle W_n(t), t \in [0, 1] \rangle$ based on the normalized partial sums $W_{n,k} = (X_1 + \cdots + X_k - ak)/(\sigma \sqrt{n})$, $k = 0, 1, \ldots, n$ (so that $W_{n,0} = 0$ and $W_{n,n} = W_n$), in the following way:
(1) Define the values $W_n(k/n) = W_{n,k}$, $k = 0, 1, \ldots, n$;
(2) Define the values $W_n(t) = W_{n,k} + n(t - k/n)(W_{n,k+1} - W_{n,k})$, $k/n \le t \le (k+1)/n$, $k = 0, \ldots, n - 1$, i.e., by connecting the points $(k/n, W_{n,k})$ and $((k+1)/n, W_{n,k+1})$ by a segment of a straight line, for every $k = 0, 1, \ldots, n - 1$.
One can consider the continuous stochastic process $X_n$ as a random variable taking values in the Polish space $\mathcal{X} = C[0, 1]$.
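
The two construction steps can be sketched in a few lines. The sketch assumes that the knot value at $k/n$ is the partial sum $X_1 + \cdots + X_k - ak$ divided by $\sigma\sqrt{n}$ (the normalization under which the value at $t = 1$ equals $W_n$ from the CLT); the function name `polygonal_process` and the sample of $\pm 1$ steps are illustrative assumptions.

```python
import math
import random


def polygonal_process(xs, a, sigma):
    """Return W_n as a function on [0, 1] built from the sample xs = (X_1, ..., X_n)."""
    n = len(xs)
    scale = sigma * math.sqrt(n)
    knots = [0.0]                       # value at t = 0
    for x in xs:
        knots.append(knots[-1] + (x - a) / scale)   # value at t = k/n

    def W(t):
        k = min(int(t * n), n - 1)      # index of the segment [k/n, (k+1)/n] containing t
        return knots[k] + n * (t - k / n) * (knots[k + 1] - knots[k])

    return W


rng = random.Random(1)
sample = [rng.choice((-1.0, 1.0)) for _ in range(4)]   # steps with a = 0, sigma = 1
W_demo = polygonal_process(sample, a=0.0, sigma=1.0)
print([round(W_demo(k / 4), 3) for k in range(5)], round(W_demo(0.125), 3))
```

The returned function is continuous and piecewise linear, so each sample of $X_1, \ldots, X_n$ yields one point of $C[0, 1]$.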
3.2 Brownian motion (Wiener process)
Let us consider a stochastic process $X_0 = \langle W_0(t), t \in [0, 1] \rangle$, which:
(a) is a continuous process;
(b) has independent increments, i.e., for any $0 \le t_0 \le t_1 \le \cdots \le t_m \le 1$ the random variables $W_0(t_{k+1}) - W_0(t_k)$, $k = 0, 1, \ldots, m - 1$, are independent, for every $m \ge 1$;
(c) has increments $W_0(t + s) - W_0(t)$ with a Gaussian distribution with expectation zero and variance $s$, for $0 \le t \le t + s \le 1$.

The process $X_0$ is called a Brownian motion (or Wiener process). It can be considered as a random variable taking values in the Polish space $\mathcal{X} = C[0, 1]$.
The distribution of this random variable, $F_0(A) = P(X_0 \in A)$, on the Borel $\sigma$-algebra of the space $\mathcal{X} = C[0, 1]$ is called the Wiener measure.
3.3 Central Limit Theorem and Donsker Invariance Principle
Let $f(\cdot)$ be a measurable real-valued function (functional) acting from the space $C[0, 1]$ to $\mathbb{R}^1$;
$C_f$ is the set of continuity of the functional $f(\cdot)$ in the uniform metric, i.e., the set of continuous functions $x_0(\cdot) = \langle x_0(t), t \in [0, 1] \rangle$ such that $f(x_n(\cdot)) \to f(x_0(\cdot))$ as $n \to \infty$ if $d_U(x_n(\cdot), x_0(\cdot)) = \sup_{t \in [0, 1]} |x_n(t) - x_0(t)| \to 0$ as $n \to \infty$.
Theorem 9.10 (Donsker Invariance Principle). If $E|X_1|^2 < \infty$, $E X_1 = a$, $\mathrm{Var}\, X_1 = \sigma^2 > 0$, then random variables

$f(X_n) \xrightarrow{d} f(X_0)$ as $n \to \infty$   (3)

for any functional $f(\cdot)$ a.s. continuous with respect to the Wiener measure, i.e., such that $F_0(C_f) = 1$.
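(a) $(W_n(t_1), \ldots, W_n(t_m)) \xrightarrow{d} (W_0(t_1), \ldots, W_0(t_m))$ as $n \to \infty$ for every $0 \le t_1 \le \cdots \le t_m \le 1$, $m \ge 1$;
(b) $\lim_{c \to 0} \sup_{n \ge 0} P(\Delta_c(W_n(\cdot)) > \varepsilon) = 0$, $\varepsilon > 0$, where $\Delta_c(W_n(\cdot)) = \sup_{0 \le t \le t + s \le t + c \le 1} |W_n(t + s) - W_n(t)|$;
(c) $\lim_{N \to \infty} \sup_{n \ge 0} P(\sup_{0 \le t \le 1} |W_n(t)| > N) = 0$;
(d) relation (a) implies that condition W holds for the random variables $X_n$;
(e) relations (b) and (c) imply that condition T holds for the random variables $X_n$.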
3.4 Examples
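(1) $f(x(\cdot)) = m_{t', t''}(x(\cdot)) = \sup_{t' \le t \le t''} x(t)$, $0 \le t' \le t'' \le 1$;
$C_f = C[0, 1]$;
$m_{t', t''}(W_n(\cdot)) \xrightarrow{d} m_{t', t''}(W_0(\cdot))$ as $n \to \infty$.

(2) $f(x(\cdot)) = \tau_a(x(\cdot)) = \inf(t : x(t) \ge a) \wedge 1$, $a \ge 0$;
$C_f = \{x(\cdot) : m_{0, t'} \ne m_{0, t''} = a, \ 0 \le t' < t'' \le 1\}$;
$\tau_a(W_n(\cdot)) \xrightarrow{d} \tau_a(W_0(\cdot))$ as $n \to \infty$.

(3) $f(x(\cdot)) = I_g(x(\cdot)) = \int_0^1 g(x(s)) \, ds$, where $g$ is a real-valued continuous function defined on the real line;
$C_f = C[0, 1]$;
$I_g(W_n(\cdot)) \xrightarrow{d} I_g(W_0(\cdot))$ as $n \to \infty$.
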
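
Example (1) with $t' = 0$, $t'' = 1$ can be checked by simulation. The sketch below relies on several assumptions that are not in the lecture: steps equal to $\pm 1$ with probability 1/2 (so $a = 0$, $\sigma = 1$), the normalization of the path by $\sqrt{n}$, the starting value $W_0(0) = 0$, and the classical reflection-principle formula $P(\sup_{0 \le t \le 1} W_0(t) \le x) = 2\Phi(x) - 1$ for $x \ge 0$. It is a Monte Carlo estimate, so the agreement is only approximate.

```python
import math
import random


def max_of_scaled_walk(n, rng):
    """sup over [0, 1] of the polygonal path for +-1 steps; the maximum is attained at a knot."""
    s, best = 0, 0
    for _ in range(n):
        s += rng.choice((-1, 1))
        best = max(best, s)
    return best / math.sqrt(n)


def estimate_cdf_of_max(x, n=200, paths=2000, seed=0):
    """Monte Carlo estimate of P(m_{0,1}(W_n) <= x)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(paths) if max_of_scaled_walk(n, rng) <= x)
    return hits / paths


def brownian_max_cdf(x):
    """2 * Phi(x) - 1 for x >= 0, written through the error function."""
    return math.erf(x / math.sqrt(2.0))


print(estimate_cdf_of_max(1.0), round(brownian_max_cdf(1.0), 4))
```

The same limit would be expected for any other step distribution with finite variance, which is the sense in which the principle is called an invariance principle.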


LN Problems
1. Prove that a family of distribution functions $F_n(x)$, $n = 0, 1, \ldots$, is tight if it is relatively compact.
2. Prove that weak convergence $X_n \xrightarrow{d} X_0$ as $n \to \infty$ implies condition T (i.e., that condition T is not only sufficient but also necessary) by direct estimation of $\lim_{K \to \infty} \sup_{n \ge 0} P(|X_n| \ge K)$.
3. Prove that $C_f = C[0, 1]$ in Example (1) above.
4. Prove that $C_f = \{x(\cdot) : m_{0, t'} \ne m_{0, t''} = a, \ 0 \le t' < t'' \le 1\}$ in Example (2) above.
5. Prove that $C_f = C[0, 1]$ in Example (3) above.