
A GLIVENKO-CANTELLI TYPE THEOREM: AN APPLICATION OF THE CONVERGENCE THEORY OF STOCHASTIC SUPREMA *

Gabriella Salinetti, Università "La Sapienza", 00100 Roma
Roger J-B Wets, University of California, Davis, CA 95616

Abstract. The uniform convergence of empirical processes on certain classes of sets follows from the convergence theory for random lower semicontinuous functions studied in the context of stochastic optimization. In the process, a richer class of sets for which one can prove this type of result is exhibited.

Keywords: Glivenko-Cantelli lemma, narrow (weak, weak*) convergence
Date: January, 1990
Printed: May 8, 2009
Annals of Operations Research 30 (1991), 157-168.

* Research supported in part by grants from Ministero Pubblica Istruzione, the Air Force Office of Scientific Research and the National Science Foundation.

1. Introduction and problem setting

Let $\{X^\nu, \nu = 1, 2, \ldots\}$ be an iid (independent and identically distributed) sequence of random variables defined on a probability space $(\Omega, \mathcal{A}, \mu)$ with values in a locally compact, separable metric space $E$, and let $P$ be the distribution induced by the $X^\nu$, $\nu = 1, 2, \ldots$, on $\mathcal{B}$, the Borel field on $E$. Typically, the $X^\nu$ model the observation process of a statistical experiment. The empirical random measure $P^\nu : \mathcal{B} \times \Omega \to [0, 1]$ associated with the first $\nu$ random variables $X^1, \ldots, X^\nu$ is

$$P^\nu(B, \omega) = \frac{1}{\nu} \sum_{i=1}^{\nu} 1_{[X^i(\cdot) \in B]}(\omega), \qquad B \in \mathcal{B},$$

with $1_D$ the indicator function of the set $D$. When the $X^\nu$ are real-valued random variables, the classical Glivenko-Cantelli theorem asserts the $\mu$-almost sure uniform convergence of the empirical distribution functions $P^\nu$ to $P$, i.e.,

$$\sup_{\alpha \in \mathbb{R}} \left| P^\nu((-\infty, \alpha], \omega) - P((-\infty, \alpha]) \right| \to 0 \quad \mu\text{-a.s.}$$
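To make the classical statement concrete, here is a minimal numerical sketch that is not part of the original argument. It assumes real-valued observations drawn from the uniform distribution on [0, 1], an illustrative choice with a continuous distribution function, and the helper names `uniform_cdf` and `sup_distance` are hypothetical. Because the empirical distribution function is a step function, the supremum can be evaluated by checking both sides of each jump.

```python
import random

def uniform_cdf(alpha):
    """Distribution function of the uniform law on [0, 1] (assumed example)."""
    return min(1.0, max(0.0, alpha))

def sup_distance(sample, cdf):
    """sup over alpha of |P^nu((-inf, alpha]) - P((-inf, alpha])|.

    Assumes `cdf` is continuous, so the supremum is found by comparing
    cdf(x) with the empirical value just before and at each sample point.
    """
    xs = sorted(sample)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs):
        p = cdf(x)
        # empirical value is i/n just before x and (i + 1)/n at x
        worst = max(worst, abs(p - i / n), abs((i + 1) / n - p))
    return worst

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, only for reproducibility
    for n in (10, 100, 1000, 10000):
        sample = [rng.random() for _ in range(n)]
        print(n, round(sup_distance(sample, uniform_cdf), 4))
```

Running the script prints the uniform distance for increasing sample sizes; on a typical run the values shrink as the sample grows, which is what the theorem predicts almost surely.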

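Theorems of the Glivenko-Cantelli type [3, 5, 10] are concerned with the almost sure (a.s.) uniform convergence of the random measures $\{P^\nu, \nu \in \mathbb{N}\}$ to the distribution $P$ for a given subclass of sets $\mathcal{C}$ of $\mathcal{B}$. More precisely, they assert that for certain classes of sets $\mathcal{C} \subset \mathcal{B}$, for $\mu$-almost all $\omega$ in $\Omega$,

$$\sup_{C \in \mathcal{C}} \left| P^\nu(C, \omega) - P(C) \right| \to 0 \quad \text{as } \nu \to \infty. \tag{1.1}$$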
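As an illustration of a supremum over a class of sets, the following sketch estimates the quantity in (1.1) for one specific class. Everything in it is an assumption made for the example: the observations are uniform on the unit square, the class consists of the closed rectangles $[0, a] \times [0, b]$, the supremum is only approximated on a finite grid of corners, and the function names are hypothetical.

```python
import random

def empirical_measure(points, a, b):
    """Fraction of sample points in the closed rectangle [0, a] x [0, b]."""
    hits = sum(1 for (x, y) in points if x <= a and y <= b)
    return hits / len(points)

def sup_gap_on_grid(points, steps=10):
    """Largest |P^nu(C) - P(C)| over rectangles C = [0, a] x [0, b] whose
    corner (a, b) lies on a finite grid. P is uniform on the unit square,
    so P(C) = a * b. The grid only approximates the supremum over the class.
    """
    worst = 0.0
    for i in range(steps + 1):
        for j in range(steps + 1):
            a, b = i / steps, j / steps
            worst = max(worst, abs(empirical_measure(points, a, b) - a * b))
    return worst

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, only for reproducibility
    for n in (50, 500, 5000):
        pts = [(rng.random(), rng.random()) for _ in range(n)]
        print(n, round(sup_gap_on_grid(pts), 4))
```

The grid is a design shortcut: it keeps the sketch short, at the price of reporting a lower bound on the true supremum over all such rectangles.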

An assertion of this type certainly demands that for all $C \in \mathcal{C}$ and $\omega \in \Omega \setminus N$,

$$P^\nu(C, \omega) \to P(C). \tag{1.2}$$

Thus, a minimal requirement is that the $P^\nu$ a.s. converge narrowly (weakly) to $P$ and that $\mathcal{C}$ is a subset of the continuity set of $P$, $\operatorname{cont} P := \{ B \in \mathcal{B} \mid P(\operatorname{bdry} B) = 0 \}$. Almost sure uniform convergence has been proved for particular subclasses $\mathcal{C} \subset \operatorname{cont} P$ by relying mostly on the geometrical properties of the class $\mathcal{C}$. We are going to show that it is possible to obtain these results as special cases of a general theorem that is topological in nature, viz., as a consequence of a certain compactness of $\mathcal{C}$.

A key step in the derivation is to identify probability measures (defined on $\mathcal{B}$) with their restriction to the space $\mathcal{F}$ of closed subsets of $E$. We know from earlier results [8] that such (restricted) functions are upper semicontinuous on $\mathcal{F}$ with respect to the topology of set-convergence. From this point of view, stating a Glivenko-Cantelli type theorem boils down to finding conditions that guarantee that a certain sequence of (random) upper semicontinuous functions converges uniformly (almost surely). This is elaborated in section 3 and the implications for empirical processes are collected in section 4. Section 2 is a compilation of facts about set-convergence and the (hypo-) epi-convergence of functions.

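2. Preliminaries

Let $\mathcal{F} = \mathcal{F}(E)$ be the class of closed subsets of $E$. A sequence $\{F^\nu \in \mathcal{F}, \nu \in \mathbb{N}\}$ (topologically) converges to the (closed) set $F$ if $\operatorname{Lim\,sup} F^\nu \subset F \subset \operatorname{Lim\,inf} F^\nu$ with

$$\operatorname{Lim\,inf} F^\nu := \{ x \in E \mid x \text{ limit point of } \{x^\nu\}_{\nu=1}^{\infty},\ x^\nu \in F^\nu \text{ for all but finitely many } \nu \},$$

$$\operatorname{Lim\,sup} F^\nu := \{ x \in E \mid x \text{ cluster point of } \{x^\nu\}_{\nu=1}^{\infty},\ x^\nu \in F^\nu \text{ for infinitely many } \nu \}.$$

It is well known that this convergence induces a topology $\mathcal{T}$ on $\mathcal{F}$. With $\mathcal{G}$, the class of open subsets of $E$, and $\mathcal{K}$, the class of compact subsets of $E$, and for any $D \subset E$,

$$\mathcal{F}^D := \{ F \in \mathcal{F} \mid F \cap D = \emptyset \} \quad \text{and} \quad \mathcal{F}_D := \{ F \in \mathcal{F} \mid F \cap D \neq \emptyset \},$$

the sets

$$\{ \mathcal{F}^K \cap \mathcal{F}_{G_1} \cap \cdots \cap \mathcal{F}_{G_s} \mid K \in \mathcal{K},\ G_l \in \mathcal{G},\ l = 1, \ldots, s,\ s \text{ finite} \}$$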

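determine a base for the topology $\mathcal{T}$. The topological space $(\mathcal{F}, \mathcal{T})$ is separated (Hausdorff), has a countable base and is compact, and consequently metrizable; see, e.g., [1, 2].

A sequence of functions $\{f^\nu : E \to \overline{\mathbb{R}}, \nu \in \mathbb{N}\}$ hypo-converges to $f : E \to \overline{\mathbb{R}}$, written $f^\nu \xrightarrow{h} f$, if for every $x$

$$f(x) \le \liminf_{\nu} f^\nu(x^\nu) \quad \text{for some } x^\nu \to x,$$

$$f(x) \ge \limsup_{\nu} f^\nu(x^\nu) \quad \text{for all } x^\nu \to x.$$

The functions $f^\nu$ hypo-converge to $f$ on $C \subset E$ if the $f^\nu$ hypo-converge relative to $C$. Hypo-convergence of the functions $f^\nu$ to $f$ implies and is implied by the identities $\operatorname{Lim\,sup} \operatorname{hypo} f^\nu = \operatorname{hypo} f = \operatorname{Lim\,inf} \operatorname{hypo} f^\nu$, with $\operatorname{hypo} g = \{ (x, \alpha) \mid \alpha \le g(x),\ x \in E \} \subset E \times \mathbb{R}$ the hypograph of the function $g$. Hypo-convergence provides a minimal framework that guarantees the convergence of the suprema. It is easy to verify that if $C \subset E$ is compact, then

$$f^\nu \xrightarrow{h} f \implies \sup_{x \in C} f^\nu(x) \to \sup_{x \in C} f(x). \tag{2.1}$$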
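The role of the moving sequences $x^\nu \to x$ in the definition can be seen on a small example, sketched here under assumptions of our own choosing. Take $C = [0, 1]$ and the hypothetical "tent" functions $f^\nu(x) = \max(0, 1 - |\nu x - 1|)$. They converge pointwise to 0 on $C$, yet each has supremum 1, so pointwise convergence alone does not give convergence of the suprema. The lim sup condition for hypo-convergence to 0 fails at $x = 0$ along $x^\nu = 1/\nu$; the hypo-limit is the function equal to 1 at 0 and to 0 elsewhere, whose supremum over $C$ is indeed 1.

```python
def tent(nu, x):
    """f^nu(x) = max(0, 1 - |nu * x - 1|): a bump of height 1 centred at 1/nu."""
    return max(0.0, 1.0 - abs(nu * x - 1.0))

def sup_on_grid(nu, steps=10000):
    """Approximate the supremum of f^nu over C = [0, 1] on a uniform grid."""
    return max(tent(nu, k / steps) for k in range(steps + 1))

if __name__ == "__main__":
    for nu in (10, 100, 1000):
        print(
            nu,
            tent(nu, 0.5),       # fixed point: the values tend to 0
            tent(nu, 1.0 / nu),  # moving point x^nu = 1/nu: the values stay near 1
            sup_on_grid(nu),     # the supremum over C does not tend to 0
        )
```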

This next proposition encapsulates the flavor of our approach.

2.1. Proposition. A sequence of extended real-valued functions $\{f^\nu, \nu \in \mathbb{N}\}$ defined on a metric space $E$ converges uniformly to a function $f$ on a compact $C \subset E$ if and only if the sequence of functions $\{|f^\nu - f|, \nu \in \mathbb{N}\}$ hypo-converges on $C$ to the function that is identically 0 on $C$.

Proof. It is immediate that the uniform convergence of the sequence $\{f^\nu, \nu \in \mathbb{N}\}$ to a function $f$ on a compact subset $C \subset E$, i.e., $\sup_{x \in C} |f(x) - f^\nu(x)| \to 0$, implies the hypo-convergence on $C$ of the sequence $\{|f^\nu - f|, \nu \in \mathbb{N}\}$ to the function that is identically 0 on $C$. For the converse, observe that $|f^\nu - f| \xrightarrow{h} 0$ on $C$ means that for all $x \in C$ and $\varepsilon > 0$, there exist $V_{x,\varepsilon}$, a neighborhood of $x$, and $\nu(x)$ such that for all $x' \in V_{x,\varepsilon}$ and all $\nu \ge \nu(x)$, $|f^\nu(x') - f(x')| \le \varepsilon$. With $x$ varying over $C$, the $V_{x,\varepsilon}$ determine an open cover of $C$. Since $C$ is compact, this open cover admits a finite subcover, say $\{ V_{x_k,\varepsilon},\ k \in K \}$ with $K$ finite. Let $\nu_\varepsilon = \max_k \nu(x_k)$. It follows that for all $x \in C$ and all $\nu \ge \nu_\varepsilon$, $|f^\nu(x) - f(x)| \le \varepsilon$.

When $E$ is a locally compact separable metric space, hypo-convergence induces a topology, the hypo-topology, on the space of upper semicontinuous (usc) functions [1, 2]. This provides the functional framework for the study of statistical processes with usc realizations, in particular for processes whose realizations are empirical distributions. Indeed, a stochastic process $\{ g(x, \cdot),\ x \in E \}$ defined on a complete probability space $(\Omega, \mathcal{A}, \mu)$ with usc realizations $g(\cdot, \omega)$ and measurable (i.e., $(x, \omega) \mapsto g(x, \omega)$ is $\mathcal{B} \otimes \mathcal{A}$-measurable) is an upper semicontinuous random function, i.e., a random function that is usc with respect to $x$ and hypo-measurable (measurable with respect to the Borel field generated by the hypo-topology on the space of usc functions) [9, theorem 6.1], [8].

Almost sure hypo-convergence of the stochastic processes $\{ (g^\nu(x, \cdot),\ x \in E),\ \nu \in \mathbb{N} \}$ to the process $\{ g(x, \cdot),\ x \in E \}$, denoted $g^\nu \xrightarrow{h} g$ a.s., means that there exists a $\mu$-null set $N$ such that

$$g^\nu(\cdot, \omega) \xrightarrow{h} g(\cdot, \omega) \qquad \forall\, \omega \in \Omega \setminus N.$$

The next proposition provides a characterization of a.s. hypo-convergence in terms of the convergence of the suprema. An earlier proof, given in [6], was based on a characterization of the a.s. convergence of measurable set-valued mappings; a more direct proof follows.

2.2. Proposition [6]. Let $\{ g;\ g^\nu, \nu \in \mathbb{N} \}$ be a family of hypo-measurable stochastic processes with usc realizations and values in $E$, a separable metric space. Let $\mathcal{U}$ be a base of open sets for the topology on $E$. Then the $g^\nu$ a.s. hypo-converge to $g$ if and only if for all open sets $B$ in $\mathcal{U}$ and all $\alpha \in \mathbb{R}$ there exists a subset $N(B, \alpha)$ of $\Omega$ with $\mu(N(B, \alpha)) = 0$ such that

$$\text{(i)} \quad \{ \omega \mid \sup_{x \in B} g(x, \omega) > \alpha \} \subset \{ \omega \mid \sup_{x \in B} g^\nu(x, \omega) > \alpha \text{ a.a.} \} \cup N(B, \alpha),$$

$$\text{(ii)} \quad \{ \omega \mid \sup_{x \in \operatorname{cl} B} g(x, \omega) < \alpha \} \subset \{ \omega \mid \sup_{x \in \operatorname{cl} B} g^\nu(x, \omega) < \alpha \text{ a.a.} \} \cup N(B, \alpha),$$

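where $\operatorname{cl} B$ is the closure of $B$ in $E$ and a.a. stands for almost always, meaning that the inequality is satisfied for all but a finite number of $\nu$'s.

Proof. Suppose that

$$g^\nu(\cdot, \omega) \xrightarrow{h} g(\cdot, \omega) \qquad \forall\, \omega \in \Omega \setminus N, \quad \mu(N) = 0.$$

Let $B$ be an open ball and $\alpha \in \mathbb{R}$. Let $\omega$ be such that $\sup_{x \in B} g(x, \omega) > \alpha$. Then there exists $\bar{x} \in B$ such that $g(\bar{x}, \omega) > \alpha$. By hypo-convergence (see above) with $x = \bar{x}$, there exists $x^\nu \to \bar{x}$ with $g(\bar{x}, \omega) \le \liminf_\nu g^\nu(x^\nu, \omega)$. Moreover, for $\nu$ sufficiently large, $x^\nu \in B$ since $B$ is open. Thus, for all $\nu$ sufficiently large, we have that $\sup_{x \in B} g^\nu(x, \omega) \ge g^\nu(x^\nu, \omega) > \alpha$, and this yields (i).

To show (ii), let $\omega$ be such that $\sup_{x \in \operatorname{cl} B} g(x, \omega) < \alpha$. Because $E$ is a separable metric space, we can always find a countable base of $E$ of relatively compact open balls. Arguing by contradiction, passing to a subsequence if necessary, assume that $\sup_{x \in \operatorname{cl} B} g^\nu(x, \omega) \ge \alpha$ for all $\nu$ (in the subsequence). Since $\operatorname{cl} B$ is compact and $g^\nu(\cdot, \omega)$ is usc, this yields a sequence $\{x^\nu\} \subset \operatorname{cl} B$ with $g^\nu(x^\nu, \omega) = \sup_{x \in \operatorname{cl} B} g^\nu(x, \omega) \ge \alpha$. Since all the $x^\nu$ belong to $\operatorname{cl} B$, they cluster at some point $\bar{x} \in \operatorname{cl} B$. From hypo-convergence we have

$$\alpha > \sup_{x \in \operatorname{cl} B} g(x, \omega) \ge g(\bar{x}, \omega) \ge \limsup_\nu g^\nu(x^\nu, \omega) \ge \alpha;$$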
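this is a contradiction, and thus (ii) must hold.

To show the if-part, let $N(B_i, \alpha_j)$ be the sets of null $\mu$-measure that appear in (i) and (ii), with $B_i$ varying over a countable base (for $E$) of open balls and the $\alpha_j$ in a countable dense subset $A$ of $\mathbb{R}$. Set $N = \bigcup_{i,j} N(B_i, \alpha_j)$. Note that $\mu(N) = 0$. We show that if (i) and (ii) are satisfied, then for all $\omega \in \Omega \setminus N$, $g^\nu(\cdot, \omega) \xrightarrow{h} g(\cdot, \omega)$.

Let $\omega \in \Omega \setminus N$ and $x \in E$. Let $\{B_s(x), s = 1, \ldots\}$ be a countable fundamental system of neighborhoods of $x$ and let $\{\alpha_s\}_s$ be a sequence in $A$ converging to $g(x, \omega)$ with $\alpha_s < g(x, \omega)$ for all $s$. For each $s$, $g(x, \omega) > \alpha_s$ implies $\sup_{x' \in B_s} g^\nu(x', \omega) > \alpha_s$ for $\nu$ sufficiently large, say $\nu > \nu_s$. Hence, for all $\nu > \nu_s$ there exists $x_s^\nu \in B_s$ with $g^\nu(x_s^\nu, \omega) > \alpha_s$. Choosing $\nu_s \uparrow \infty$ as $s \to \infty$, we can generate a sequence $\{x^\nu\}$ such that

$$x^\nu \in B_s \quad \text{and} \quad g^\nu(x^\nu, \omega) > \alpha_s \quad \text{for } \nu_s < \nu \le \nu_{s+1}.$$

It follows that $x^\nu \to x$ and $\liminf_\nu g^\nu(x^\nu, \omega) \ge \lim_s \alpha_s = g(x, \omega)$. We have just proved that for each $\omega \in \Omega \setminus N$ the lim inf-condition for hypo-convergence is satisfied.

We obtain the lim sup-condition for hypo-convergence from (ii). Let $\omega \in \Omega \setminus N$, $x \in E$, $x^\nu \to x$. Let $\{\alpha_s\}_s$ be a sequence in $A$ converging to $g(x, \omega)$ with $\alpha_s > g(x, \omega)$ for all $s$. For each $s$, since $g(\cdot, \omega)$ is usc at $x$, there exists $\operatorname{cl} B_s$ such that $\sup_{x' \in \operatorname{cl} B_s} g(x', \omega) < \alpha_s$. By (ii) we also have that $\sup_{x' \in \operatorname{cl} B_s} g^\nu(x', \omega) < \alpha_s$ for $\nu$ sufficiently large, say $\nu > \nu_s$. Also, $g^\nu(x^\nu, \omega) < \alpha_s$ for $\nu$ sufficiently large, say $\nu > \nu_s'$. It follows that $\limsup_\nu g^\nu(x^\nu, \omega) \le \alpha_s$. The argument repeated for every $s$ yields $\limsup_\nu g^\nu(x^\nu, \omega) \le \lim_s \alpha_s = g(x, \omega)$ and completes the proof.

3. Uniform convergence of probability measures and a.s. uniform convergence of random measures

For a probability measure $P$ on $\mathcal{B}$, the restriction $D$ on $\mathcal{F}$ with $D(F) = P(F)$ for all $F \in \mathcal{F}$ is a usc function on the topological space $(\mathcal{F}, \mathcal{T})$ [7]. Moreover, for the family of probability measures $\{ P;\ P^\nu, \nu \in \mathbb{N} \}$, narrow convergence $P^\nu \xrightarrow{n} P$ is equivalent to the hypo-convergence of the (usc-)restrictions $\{ D;\ D^\nu, \nu \in \mathbb{N} \}$ [7]:

$$P^\nu \xrightarrow{n} P \quad \text{if and only if} \quad D^\nu \xrightarrow{h} D.$$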

The uniform convergence of the probability measures $P^\nu$ to $P$ on a subset $\mathcal{C}$ of $\mathcal{F}$ can thus be expressed in terms of the convergence of suprema, namely,

$$\sup_{C \in \mathcal{C}} |D^\nu(C) - D(C)| \to 0.$$

In view of proposition 2.1, a necessary and sufficient condition for this to hold is that the function $|D^\nu - D|$ hypo-converges on $\mathcal{C}$ to the null function. Because $|D^\nu - D| \ge 0$, it follows from the definition of hypo-convergence that this is equivalent to demanding that for all $C \in \mathcal{C}$ and every sequence $C^\nu$ $\mathcal{T}$-converging to $C$,

$$\limsup_\nu |D^\nu(C^\nu) - D(C^\nu)| = 0. \tag{3.1}$$

Hypo-convergence of the $D^\nu$ to $D$ is not enough to guarantee (3.1), but it will do if the set $\mathcal{C}$ is a bicompact subset of the continuity set, $\operatorname{cont} D = \operatorname{cont} P \cap \mathcal{F}$, of $D$; note that $\operatorname{cont} D = \{ F \in \mathcal{F} \mid D(\operatorname{bdry} F) = 0 \}$.
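The need to stay inside the continuity set can be checked on a standard example, sketched below under our own assumptions: $E = \mathbb{R}$, $P^\nu$ the point mass at $1/\nu$, and $P$ the point mass at 0, so that the $P^\nu$ converge narrowly to $P$. The closed set $C = [-1, 0]$ has $P(\operatorname{bdry} C) = 1$, so it lies outside $\operatorname{cont} P$, and the gap $|D^\nu(C) - D(C)|$ stays equal to 1. The closed set $[-1, 1/2]$ has a $P$-null boundary, and the gap vanishes as soon as $\nu \ge 2$. The helper names `dirac` and `gap` are hypothetical.

```python
def dirac(point):
    """Point mass at `point`, evaluated on closed intervals [a, b]."""
    def measure(a, b):
        return 1.0 if a <= point <= b else 0.0
    return measure

def gap(nu, a, b):
    """|D^nu(C) - D(C)| for C = [a, b], with P^nu = dirac(1/nu), P = dirac(0)."""
    return abs(dirac(1.0 / nu)(a, b) - dirac(0.0)(a, b))

if __name__ == "__main__":
    for nu in (1, 2, 10, 1000):
        # first column: C = [-1, 0], outside cont P; second: C = [-1, 0.5]
        print(nu, gap(nu, -1.0, 0.0), gap(nu, -1.0, 0.5))
```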

3.1. Definition. A subset $\mathcal{C}$ of $\mathcal{F}$ is bicompact if it is compact with respect to $\mathcal{T}$ and if for any set $C \in \mathcal{C}$ and any sequence $\{ C^\nu \in \mathcal{C}, \nu \in \mathbb{N} \}$ with the $C^\nu$ $\mathcal{T}$-converging to $C$, the complements of the interiors of the $C^\nu$ also $\mathcal{T}$-converge to the complement of the interior of $C$, i.e., $\operatorname{cpl}(\operatorname{int} C^\nu) \to \operatorname{cpl}(\operatorname{int} C)$.

We call such a set ($\mathcal{T}$-)bicompact because the second condition corresponds to the compactness of the class $\{ \operatorname{int} C \mid C \in \mathcal{C} \}$ with respect to the topology on the open subsets of $E$ induced by the $\mathcal{T}$-convergence of their complements. Classes of sets that are bicompact have been identified in [4]. They include, for example, convex sets (assuming $E$ is a linear space), and any collection of sets that can be obtained as the (lower) level sets $\operatorname{lev}_\alpha f := \{ x \mid f(x) \le \alpha \}$ of a continuous function.
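3.2. Proposition. Suppose $\mathcal{C} \subset \operatorname{cont} D$ is bicompact and $D^\nu \xrightarrow{h} D$. Then $|D^\nu - D| \xrightarrow{h} 0$ on $\mathcal{C}$.

Proof. From $D^\nu \xrightarrow{h} D$, it follows that for any $F$ and sequence $F^\nu \to F$, $\limsup_\nu D^\nu(F^\nu) \le D(F)$. In turn, this implies that for any open set $G$ and any sequence $G^\nu$ of open sets such that $\operatorname{cpl} G^\nu \to \operatorname{cpl} G$,

$$P(G) \le \liminf_{\nu \in \mathbb{N}} P^\nu(G^\nu).$$

Because $\mathcal{C}$ is bicompact, for the closed sets $F$ and $F^\nu$ in $\mathcal{C}$ with $F^\nu \to F$, and with $G = \operatorname{int} F$, $G^\nu = \operatorname{int} F^\nu$, we have that $\operatorname{cpl} G^\nu \to \operatorname{cpl} G$, and thus

$$P(\operatorname{int} F) \le \liminf_\nu P^\nu(\operatorname{int} F^\nu). \tag{3.2}$$

By (a) upper semicontinuity of $D$ at $\operatorname{cpl}(\operatorname{int} F)$, (b) $\operatorname{cpl}(\operatorname{int} F^\nu) \to \operatorname{cpl}(\operatorname{int} F)$, and (c) $\mathcal{C} \subset \operatorname{cont} D$, we have

$$\begin{aligned}
\limsup_\nu D(F^\nu) \le D(F) &= P(\operatorname{int} F) = 1 - P(\operatorname{cpl}(\operatorname{int} F)) = 1 - D(\operatorname{cpl}(\operatorname{int} F)) \\
&\le 1 - \limsup_\nu D(\operatorname{cpl}(\operatorname{int} F^\nu)) = \liminf_\nu P(\operatorname{int} F^\nu) \le \liminf_\nu D(F^\nu),
\end{aligned}$$

so that $D(F^\nu) \to D(F)$. For any $\varepsilon > 0$ and $\nu$ sufficiently large, it follows that

$$D^\nu(F^\nu) - D(F^\nu) = \left[ D^\nu(F^\nu) - D(F) \right] + \left[ D(F) - D(F^\nu) \right] < \varepsilon.$$

Moreover, because $D$ is usc at $F$, relation (3.2) and $F \in \mathcal{C} \subset \operatorname{cont} D$, for $\nu$ sufficiently large we have

$$D^\nu(F^\nu) - D(F^\nu) > P(\operatorname{int} F) - D(F^\nu) - \varepsilon/2 > P(\operatorname{int} F) - D(F) - \varepsilon = -\varepsilon.$$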

The last two strings of inequalities hold for every $\varepsilon > 0$; they imply that $\limsup_\nu |D^\nu(F^\nu) - D(F^\nu)| = 0$, and this completes the proof.

By relying on the correspondence between $P$'s and $D$'s ([7]), we can rephrase this result in terms of the probability measures $P^\nu$.

3.3. Proposition. Let $\{ P;\ P^\nu, \nu \in \mathbb{N} \}$ be a family of probability measures defined on $\mathcal{B}$. Suppose that the $P^\nu$ converge narrowly to $P$. Then the $P^\nu$ converge uniformly to $P$ on any bicompact subset of $\mathcal{F} \cap \operatorname{cont} P$.

The extension of this result to the a.s.-convergence of random probability measures is immediate. Given a probability space $(\Omega, \mathcal{A}, \mu)$, the stochastic process $\{ P(B, \cdot),\ B \in \mathcal{B} \}$ whose realizations are probability measures defined on $\mathcal{B}$ is called a random probability measure. Because every random probability measure $P$ is uniquely defined by its restriction to the closed sets (again [7]), we can identify such a stochastic process with one that involves the corresponding (usc) functions: $\{ D(F, \cdot),\ F \in \mathcal{F} \}$. We note that in view of our earlier observations, this is a measurable process with usc realizations. For a family $\{ P;\ P^\nu, \nu \in \mathbb{N} \}$ of random probability measures, almost sure narrow convergence means that there exists a set $N$ of $\mu$-measure 0 such that for all $\omega \in \Omega \setminus N$:

$$P^\nu(\cdot, \omega) \xrightarrow{n} P(\cdot, \omega),$$

or, equivalently,

$$D^\nu(\cdot, \omega) \xrightarrow{h} D(\cdot, \omega) \qquad \forall\, \omega \in \Omega \setminus N.$$

We now rely on proposition 3.3 to conclude that if $P^\nu(\cdot, \omega) \xrightarrow{n} P(\cdot, \omega)$ a.s., then the $P^\nu$ also a.s. converge uniformly on every bicompact subset $\mathcal{C}$ of $\mathcal{F}$ that is contained in $\operatorname{cont} P$.

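3.4. Proposition. Let $\{ P^\nu, \nu \in \mathbb{N} \}$ be a sequence of random probability measures, and $P$ a probability measure, all defined on $\mathcal{B}$. Suppose that $P^\nu(\cdot, \omega) \xrightarrow{n} P(\cdot)$ a.s. Then they converge a.s. uniformly on every bicompact subset of $\mathcal{F} \cap \operatorname{cont} P$.

4. Glivenko-Cantelli type results

In the statistical framework described in section 1, it follows from the strong law of large numbers that for iid observations $\{ X^\nu, \nu \in \mathbb{N} \}$, the sequence of random empirical measures $\{ P^\nu, \nu \in \mathbb{N} \}$ converges narrowly to the distribution $P$, i.e.,

$$P^\nu(B, \cdot) \to P(B) \quad \text{a.s.}, \qquad \forall\, B \in \mathcal{B}, \tag{4.1}$$

i.e., for every $B \in \mathcal{B}$ there exists a $\mu$-null subset of $\Omega$, say $N_B$, such that

$$P^\nu(B, \omega) \to P(B) \qquad \forall\, \omega \in \Omega \setminus N_B.$$

For the corresponding random usc functions, the restrictions of the $P^\nu$ to $\mathcal{F}$, we have that

$$D^\nu(F, \omega) \to D(F) \qquad \forall\, \omega \in \Omega \setminus N_F, \quad \forall\, F \in \mathcal{F}.$$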

We thus have a.s.-pointwise convergence of the stochastic processes $D^\nu$ (indexed by $\mathcal{F}$) to the constant-valued stochastic process $\{ D(F, \cdot),\ F \in \mathcal{F} \}$ with $D(F, \omega) = D(F)$ for all $\omega$. However, as pointed out in section 1, to obtain a.s. uniform convergence, a minimal requirement is the a.s. narrow convergence of the empirical random probability measures or, equivalently, the a.s. hypo-convergence of the corresponding random usc functions $D^\nu$ to $D$. In general, for measurable stochastic processes with usc realizations, a.s. convergence (as stochastic processes) and a.s. hypo-convergence are not equivalent; neither implies the other [8, section 3]. But for random probability measures, their specific properties (monotonicity) allow us to show that a.s. convergence (in the classical sense of stochastic processes) is in fact enough to ensure a.s. hypo-convergence.

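Before we get to this, let us identify maximal elements for the sets in the (countable) base

$$\mathcal{S} := \{ \mathcal{F}^K \cap \mathcal{F}_{G_1} \cap \cdots \cap \mathcal{F}_{G_s} \mid K \in \mathcal{K},\ G_l \in \mathcal{G},\ l = 1, \ldots, s,\ s \text{ finite} \}$$

for the topology $\mathcal{T}$; recall that $\mathcal{K}$ and $\mathcal{G}$ consist of the compact and open subsets of $E$. Let $\mathcal{H}$ be a nonempty element of $\mathcal{S}$ with $\operatorname{cl} \mathcal{H}$ its $\mathcal{T}$-closure; note that $\mathcal{H}$ is nonempty if for $l = 1, \ldots, s$, $G_l \not\subset K$. We are going to show that $\operatorname{cl}(\operatorname{cpl} K) =: \bar{F}$ is a maximal element with respect to inclusion in $\operatorname{cl} \mathcal{H}$, i.e., $\bar{F} \in \operatorname{cl} \mathcal{H}$ and $\bar{F} \supset F$ for all $F \in \operatorname{cl} \mathcal{H}$.

Pick $\varepsilon > 0$ such that for $l = 1, \ldots, s$, $G_l \not\subset K_\varepsilon := \{ x \in E \mid \operatorname{dist}(x, K) < \varepsilon \}$. Now choose $\varepsilon^\nu \downarrow 0$ such that for all $\nu$, $\varepsilon^\nu < \varepsilon$, and let $F^\nu = \operatorname{cpl} K_{\varepsilon^\nu}$. It is obvious that the $F^\nu \in \mathcal{H}$ and that $F^\nu \to \bar{F}$. Thus $\bar{F} \in \operatorname{cl} \mathcal{H}$. Now, for every $F \in \mathcal{H}$, we have that $F \cap K = \emptyset$ and consequently $F \subset \operatorname{cl}(\operatorname{cpl} K) = \bar{F}$. Moreover, for every $F \in \operatorname{cl} \mathcal{H}$ there exist $F^\nu \to F$ with $F^\nu \in \mathcal{H}$, so that for all $\nu$, $F^\nu \subset \bar{F}$. It follows that $F = \lim_\nu F^\nu \subset \bar{F}$.

4.1. Theorem. Suppose $\{ D;\ D^\nu, \nu \in \mathbb{N} \}$ is a family of random usc functions defined on $\mathcal{F}$ obtained by restricting probability measures $\{ P;\ P^\nu, \nu \in \mathbb{N} \}$ (defined on $\mathcal{B}$) to $\mathcal{F}$. If

$$D^\nu(F, \cdot) \to D(F, \cdot) \quad \text{a.s.}, \qquad \forall\, F \in \mathcal{F},$$

then there exists a set $N$ of $\mu$-measure 0 such that

$$D^\nu(\cdot, \omega) \xrightarrow{h} D(\cdot, \omega) \qquad \forall\, \omega \in \Omega \setminus N.$$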

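Proof. In view of proposition 2.2, it suffices to check that the inclusions (i) and (ii) of proposition 2.2 are satisfied for every open set $\mathcal{H}$ in the base

$$\mathcal{S} := \{ \mathcal{F}^K \cap \mathcal{F}_{G_1} \cap \cdots \cap \mathcal{F}_{G_s} \mid K \in \mathcal{K},\ G_l \in \mathcal{G},\ l = 1, \ldots, s,\ s \text{ finite} \}$$

for the topology $\mathcal{T}$ (on $\mathcal{F}$), and all $\alpha \in \mathbb{R}$. More precisely, we have to show that there exists a set $N(\mathcal{H}, \alpha)$ of $\mu$-measure null such that

$$\text{(i')} \quad \{ \omega \mid \sup_{F \in \mathcal{H}} D(F, \omega) > \alpha \} \subset \{ \omega \mid \sup_{F \in \mathcal{H}} D^\nu(F, \omega) > \alpha \text{ a.a.} \} \cup N(\mathcal{H}, \alpha),$$

$$\text{(ii')} \quad \{ \omega \mid \sup_{F \in \operatorname{cl} \mathcal{H}} D(F, \omega) < \alpha \} \subset \{ \omega \mid \sup_{F \in \operatorname{cl} \mathcal{H}} D^\nu(F, \omega) < \alpha \text{ a.a.} \} \cup N(\mathcal{H}, \alpha),$$

where a.a. stands, as in proposition 2.2, for almost always. Let $F_{\mathcal{H}}$ be a maximal element in $\operatorname{cl} \mathcal{H}$ with respect to inclusion ($\subset$). Because $D$ and the $D^\nu$ are usc on $\mathcal{F}$, they attain their maximum on every closed subset of the compact space $\mathcal{F}$, and thus

$$D(F_{\mathcal{H}}, \omega) = \sup_{F \in \operatorname{cl} \mathcal{H}} D(F, \omega) \quad \text{and} \quad D^\nu(F_{\mathcal{H}}, \omega) = \sup_{F \in \operatorname{cl} \mathcal{H}} D^\nu(F, \omega),$$

so that (ii') follows from the pointwise a.s. convergence of the $D^\nu$ to $D$ at $F_{\mathcal{H}}$.

Let $\mathcal{H} = \mathcal{F}^K \cap \mathcal{F}_{G_1} \cap \cdots \cap \mathcal{F}_{G_s}$ be a nonempty element of the countable base $\mathcal{S}$. In the remarks that precede this theorem, we noted that $\mathcal{H} \neq \emptyset$ implies the existence of a sequence of strictly positive numbers $\{ \varepsilon^\nu, \nu \in \mathbb{N} \}$ monotonically decreasing to 0 such that for all $\nu$, the set

$$\mathcal{H}^\nu = \mathcal{F}^{K + \varepsilon^\nu B} \cap \mathcal{F}_{G_1} \cap \cdots \cap \mathcal{F}_{G_s}$$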

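is nonempty. The sequence of sets $\{ \mathcal{H}^\nu, \nu \in \mathbb{N} \}$ is contained in $\mathcal{H}$ and $\mathcal{H} = \bigcup_\nu \mathcal{H}^\nu$. Since for all $\nu$, $\sup_{\mathcal{H}} D^\nu(\cdot, \omega)$ is always at least as large as $\sup_{\mathcal{H}^\nu} D^\nu(\cdot, \omega)$, we have

$$\begin{aligned}
\{ \omega \mid \sup_{F \in \mathcal{H}} D(F, \omega) > \alpha \}
&= \{ \omega \mid \sup_{F \in \mathcal{H}} D(F, \omega) > \alpha,\ \sup_{F \in \mathcal{H}} D^\nu(F, \omega) > \alpha \} \\
&\quad \cup \{ \omega \mid \sup_{F \in \mathcal{H}} D(F, \omega) > \alpha,\ \sup_{F \in \mathcal{H}} D^\nu(F, \omega) \le \alpha \} \\
&\subset \{ \omega \mid \sup_{F \in \mathcal{H}} D^\nu(F, \omega) > \alpha \} \cup \{ \omega \mid \sup_{F \in \mathcal{H}} D(F, \omega) > \alpha,\ \sup_{F \in \mathcal{H}^\nu} D^\nu(F, \omega) \le \alpha \}.
\end{aligned}$$

Because $(\mathcal{F}, \mathcal{T})$ is a compact space and the functions $D^\nu$ are usc, $\sup_{\mathcal{H}^\nu} D^\nu(\cdot, \omega) = \sup_{\operatorname{cl} \mathcal{H}^\nu} D^\nu(\cdot, \omega)$ and this last supremum is attained at $A^\nu := \operatorname{cl} \operatorname{cpl}(K + \varepsilon^\nu B)$; see the remarks that precede the theorem. Thus

$$\{ \omega \mid \sup_{F \in \mathcal{H}} D(F, \omega) > \alpha \} \subset \{ \omega \mid \sup_{F \in \mathcal{H}} D^\nu(F, \omega) > \alpha \} \cup \{ \omega \mid \sup_{F \in \mathcal{H}} D(F, \omega) > \alpha,\ D^\nu(A^\nu, \omega) \le \alpha \}.$$

Because this holds for every $\nu$, it follows that

$$\{ \omega \mid \sup_{F \in \mathcal{H}} D(F, \omega) > \alpha \} \subset \{ \omega \mid \sup_{F \in \mathcal{H}} D^\nu(F, \omega) > \alpha \text{ a.a.} \} \cup \{ \omega \mid \sup_{F \in \mathcal{H}} D(F, \omega) > \alpha,\ D^\nu(A^\nu, \omega) \le \alpha \text{ a.a.} \}.$$

But this last set can only be a set of measure zero. Indeed, let

$$N(\mathcal{H}, \alpha) = \{ \omega \mid D^\nu(A^\nu, \omega) \not\to D(A^\nu, \omega) \}.$$

This set is of $\mu$-measure null; that follows from pointwise convergence and the fact that there are only countably many sets $A^\nu$. If $\omega$ does not belong to $N(\mathcal{H}, \alpha)$ and $\sup_{\mathcal{H}} D(\cdot, \omega) > \alpha$, then there exists $F \in \mathcal{H}$ such that $D(F, \omega) > \alpha$. For $\nu$ sufficiently large, $F \subset A^\nu$, and from pointwise convergence and the exclusion of $N(\mathcal{H}, \alpha)$, it follows that $\omega$ cannot belong to

$$\{ \omega \mid \sup_{F \in \mathcal{H}} D(F, \omega) > \alpha,\ D^\nu(A^\nu, \omega) \le \alpha \text{ a.a.} \}.$$

And hence

$$\{ \omega \mid \sup_{F \in \mathcal{H}} D(F, \omega) > \alpha \} \subset \{ \omega \mid \sup_{F \in \mathcal{H}} D^\nu(F, \omega) > \alpha \text{ a.a.} \} \cup N(\mathcal{H}, \alpha),$$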

which completes the proof of (i').

We can now apply this result in the Glivenko-Cantelli framework.

4.2. Theorem. Let $\{ X^\nu, \nu \in \mathbb{N} \}$ be any sequence of iid random variables defined on a probability space $(\Omega, \mathcal{A}, \mu)$ with values in $E$. Then the empirical random measures

$$P^\nu(B, \omega) = \frac{1}{\nu} \sum_{i=1}^{\nu} 1_{[X^i(\cdot) \in B]}(\omega), \qquad B \in \mathcal{B},$$

a.s. converge narrowly to the common distribution $P$ of these random variables. Moreover, they a.s. converge uniformly on every class of closed sets contained in $\operatorname{cont} P$ that is $\mathcal{T}$-bicompact.

Proof. An immediate consequence of the preceding theorem and proposition 3.4.

The approach that we followed directs our attention to the fact that two basic ingredients enter into play in obtaining the a.s. uniform convergence of empirical measures. First, a condition is needed to ensure the a.s. narrow convergence. This role is played here by the iid condition. A second condition is needed to guarantee the passage from a.s. narrow convergence to uniform convergence. This is a condition that must guarantee that the class of sets under scrutiny has a certain property. We have seen here that bicompactness (with respect to the topology of set convergence) is a natural requirement.

References

[1] Hedy Attouch, Variational Convergence for Functions and Operators, Pitman, London, 1984.
[2] Szymon Dolecki, Gabriella Salinetti & Roger J-B Wets, Convergence of functions: equi-semicontinuity, Transactions of the American Mathematical Society 276 (1983), 409-429.
[3] Peter Gänssler & Winfried Stute, Seminar on Empirical Processes, Birkhäuser Verlag, Basel, 1987, DMV Seminar, Band 9.
[4] Roberto Lucchetti, Gabriella Salinetti & Roger J-B Wets, Uniform convergence of probability measures: topological criteria, manuscript, University of California-Davis, November 1988.
[5] David Pollard, Convergence of Stochastic Processes, Springer Verlag, Berlin, 1984.
[6] Gabriella Salinetti, Funzioni aleatorie semicontinue e misure aleatorie, Rendiconti del Seminario Matematico di Milano LVII (1987).
[7] Gabriella Salinetti & Roger J-B Wets, On the hypo-convergence of probability measures, in Optimization and Related Fields. Proceedings, Erice 1984, R. Conti, E. De Giorgi & F. Giannessi, eds., Springer Verlag Lecture Notes in Mathematics 1190, Berlin, 1986, 371-395.
[8] Gabriella Salinetti & Roger J-B Wets, On the convergence in distribution of measurable multifunctions (random sets), normal integrands, stochastic processes and stochastic infima, Mathematics of Operations Research 11 (1986), 385-419.
[9] Gabriella Salinetti & Roger J-B Wets, Random semicontinuous functions, in Nonlinear Stochastic Processes, L.M. Ricciardi, ed., Manchester University Press, Manchester, 1988, I.I.A.S.A. WP-86-47, Laxenburg, Austria, August 1986.
[10] Galen R. Shorack & Jon Wellner, Empirical Processes and Applications to Statistics, John Wiley & Sons, New York, 1986.
