Chapter 9

Convergence of Random Sequences

9.1 Sequences of R.V.'s

We often encounter sequences of r.v.'s and are interested in the behavior of these sequences, such as whether they converge or not, and if so, in what sense. Since a r.v. is a function (albeit a special one) from the sample space S to the (extended) real numbers R, a sequence of r.v.'s is just a sequence of functions. Hence, the convergence concepts associated with sequences of functions are also valid for sequences of r.v.'s. However, r.v.'s are a special type of function, usually characterized by their PDF's, CDF's, or moments rather than by the specific values they take at each outcome s ∈ S. Hence, there are certain convergence concepts that are particularly devised for sequences of r.v.'s.

Consider a sequence of r.v.'s

{X_n} = {X_n, n ≥ 1} = {X_1, X_2, ..., X_n, ...}.    (9.1)

We assume that each X_i has a PDF f_i(x) and a CDF F_i(x). We are interested in the ways the random sequence {X_n} converges as n → ∞ to a r.v. X with PDF f_X(x) and CDF F_X(x). The types of convergence which will be discussed are the following:

1. Almost sure convergence (a.k.a. convergence with probability 1),
2. Convergence in probability (a.k.a. stochastic convergence),
3. Convergence in quadratic mean (a.k.a. mean-square convergence),
4. Convergence in distribution (a.k.a. convergence in law).

First, however, we will briefly review the convergence concepts for sequences of real numbers and sequences of (deterministic) functions.

Definition 9.1 The sequence of real numbers {s_n} converges to s if, for any ε > 0, there exists a positive integer N (= N(ε)) such that

|s_n − s| < ε    ∀ n ≥ N,    (9.2)

where the symbol ∀ denotes "for all".

That {s_n} converges to s is denoted as lim_{n→∞} s_n = s. Note that the positive integer N in the above definition depends on the choice of ε; hence it can be denoted as N(ε) to express this dependence. For example, the smaller ε is, the larger N might have to be.

The next question is: What can we say about the convergence of a sequence of real-valued functions, say {X_n(t), n ≥ 1}, defined for t ∈ [a,b]? Suppose the functions in the sequence are evaluated at, say, t = t_1 to yield the sequence of real numbers X_1(t_1), X_2(t_1), X_3(t_1), .... Now, if this sequence of real numbers converges according to the above definition, that is, if

lim_{n→∞} X_n(t_1) = X(t_1),    (9.3)

then the sequence {X_n(t)} is said to converge to X(t) at t = t_1. Such convergence is called pointwise convergence, and if it is true for every point in [a,b], then the sequence of functions {X_n(t)} is said to converge pointwise to a function X(t) on the interval [a,b]. Note that the function X(t) is obtained/defined as the limiting values of the sequence of functions {X_n(t)} for t ∈ [a,b]. Also note that in this case the integer N depends not only on the selected ε, but also on t, the point at which the sequence of functions is evaluated. Thus, the pointwise convergence of the sequence of functions {X_n(t)} to X(t) for t ∈ [a,b] is defined as: for each t in [a,b] and ε > 0, there exists an N (= N(ε,t)) such that |X_n(t) − X(t)| < ε for all n ≥ N.

A stronger type of convergence for a sequence of functions {X_n(t)} is the notion of uniform convergence, which is defined as follows: The sequence of functions X_1(t), X_2(t), X_3(t), ... defined on [a,b] is said to converge uniformly to X(t) on [a,b] if and only if, for any ε > 0, there exists a positive integer N (= N(ε)) such that

|X_n(t) − X(t)| < ε    (9.4)

for all n ≥ N and for all t ∈ [a,b]. Note that N in this case is a function of ε but not of t. So for the given ε, there exists a single N such that the definition holds for all t in [a,b]; hence the convergence is called uniform.
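The ε-N dependence in Definition 9.1 can be made concrete numerically. The following minimal Python sketch (illustrative only; the concrete sequence s_n = 1/n with limit s = 0 is assumed for definiteness) computes the smallest N(ε) and verifies the definition over a finite range of n:

import math

def smallest_N(eps):
    # For s_n = 1/n and limit s = 0, |s_n - s| = 1/n < eps holds
    # for all n >= N exactly when N > 1/eps.
    return math.floor(1.0 / eps) + 1

for eps in (0.1, 0.01, 0.001):
    N = smallest_N(eps)
    # Spot-check the definition on n = N, ..., 10N - 1.
    assert all(abs(1.0 / n - 0.0) < eps for n in range(N, 10 * N))
    print("eps =", eps, " N(eps) =", N)

The output (N = 11, 101, 1001) shows exactly the behavior noted above: the smaller ε is, the larger N must be.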
Note that the above definitions of pointwise convergence and uniform convergence can also be stated for sequences of functions that are defined on semi-infinite or infinite intervals such as {0 ≤ t < ∞} or {−∞ < t < ∞}.

The following example illustrates the distinction between pointwise convergence and uniform convergence.

Example 9.1 Consider the sequence of real-valued functions defined on t ∈ [0,1]:

X_n(t) = { 1, 0 ≤ t ≤ 1/n
         { 0, 1/n < t ≤ 1,        n = 1, 2, 3, ....    (9.5)

See Fig. 9.1 for the first few functions in this sequence. It is easy to see that this sequence of functions converges pointwise to

X(t) = { 1, t = 0
       { 0, 0 < t ≤ 1.    (9.6)

For t = 0, all functions in the sequence are equal to 1; hence the sequence clearly converges to 1 = X(0). For any t in (0,1], the first n_0 elements of the sequence of functions take value 1 and the rest take value 0, where n_0 is such that 1/(n_0+1) < t. So once an ε > 0 is selected, we can let N = n_0, thus assuring that |X_n(t) − 0| = 0 < ε for n > N = n_0 and the specific t selected. Hence, clearly, the sequence of functions is pointwise convergent to the limiting function given above.

However, the sequence is not uniformly convergent, because for any given ε > 0 (with ε < 1) there does not exist a single N such that |X_n(t) − 0| < ε for n > N, for all t in (0,1]. This is so because, no matter how large an N is selected, there will always be (infinitely many) values of t in (0,1] for which |X_n(t) − 0| < ε for n > N is not true. For a specific t, we can find a large enough N, but we cannot find an N for which the above is true for all t. In other words, the N selected also depends on t. Thus, this sequence of functions does not converge uniformly. EOE

Figure 9.1: A sequence of functions defined on [0,1].

Another simple example of a sequence of functions that converges pointwise but not uniformly is the Fourier series representation of a periodic square-wave that takes the halfway values at the discontinuities. (Actually, this could be any periodic function with discontinuities.) If we let the partial sum of n terms of the Fourier series be defined as X_n(t), then we can show that the sequence X_n(t) converges pointwise to the square-wave. Note that at points of discontinuity the Fourier series converges to the halfway values. Hence, if the square-wave is already defined to take the halfway values at the discontinuities, then the pointwise convergence holds everywhere; otherwise it would hold everywhere except at the points of discontinuity. However, we know from Fourier analysis that, due to what is known as the Gibbs phenomenon, the partial sum of the Fourier series displays oscillations (overshoots and undershoots) around a discontinuity; with an increasing number of terms in the sum, the oscillations get compressed in t, but the magnitude of the oscillations remains constant. Hence, for small enough ε > 0, no matter how large n is, that is, no matter how many terms there are in the Fourier series sum, there will always be some t close enough to the point of discontinuity where the sum will not be within the ε neighborhood of the square-wave. Thus the Fourier series sum does not converge uniformly to the square-wave, although it converges pointwise. See Fig. 9.2 for plots of the square-wave and the partial sum of the Fourier series with an increasing number of terms in the sum.

Since r.v.'s are real-valued functions defined on the sample space S, it is natural to talk about pointwise convergence of a sequence of r.v.'s {X_n}. This corresponds to the sequence of real numbers {X_1(s), X_2(s), X_3(s), ...} converging to a real number for each s ∈ S.
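The failure of uniform convergence in Example 9.1 can also be checked numerically. The sketch below (ours; an arbitrary fixed point t = 0.05 and a logarithmic grid standing in for (0,1] are assumed) shows X_n(0.05) settling at 0 while the supremum of |X_n(t)| over the grid stays at 1 for every n:

import numpy as np

def X_n(n, t):
    # X_n(t) = 1 for 0 <= t <= 1/n, and 0 for 1/n < t <= 1 (Example 9.1).
    return np.where(t <= 1.0 / n, 1.0, 0.0)

t_fixed = 0.05                      # a fixed point in (0, 1]
t_grid = np.logspace(-8, 0, 2000)   # grid reaching down to t = 1e-8

for n in [1, 10, 100, 1000, 10000]:
    pointwise = X_n(n, np.array([t_fixed]))[0]   # -> 0 once n > 1/t_fixed
    sup_norm = X_n(n, t_grid).max()              # stays 1 for every n
    print(f"n = {n:5d}: X_n(0.05) = {pointwise}, sup over grid = {sup_norm}")

On the true interval (0,1] the supremum is exactly 1 for every n, since t can always be taken smaller than 1/n; the grid merely makes this visible for the n values tested.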
But this definition of convergence may be excessively stringent for r.v.'s. For example, suppose that at some points s ∈ S the sequence {X_n} fails to converge, so that at these points the limit function is undefined. If the set of points on which the sequence fails to converge is small enough, this does not seem to be a sufficient reason for throwing away the whole sequence as non-convergent, considering that in many instances the behavior of a r.v. on a set of probability zero may be ignored completely. For example, if we consider a r.v. as an equivalence class of functions, then pointwise convergence is too restrictive for a sequence of r.v.'s. So for r.v.'s it is useful to devise weaker forms of convergence.

Figure 9.2: Periodic square-wave and its partial Fourier sum with increasing n.

9.2 Almost Sure Convergence

Almost sure (a.s.) convergence is also known as convergence almost everywhere (a.e.) or convergence with probability (w.p.) 1. This type of convergence represents our willingness to ignore the failure of pointwise convergence on a sufficiently small subset of the sample space S. It is defined as:

Definition 9.2 Let {X_n} be a sequence of r.v.'s defined on the probability space (S, F, P). The sequence of r.v.'s {X_n} is said to converge almost surely (a.s.) to the r.v. X if the sequence of numbers {X_n(s)} converges to the limit X(s) for all s ∈ S except those belonging to a set of probability zero, that is, if

P({s : lim_{n→∞} X_n(s) = X(s)}) = 1.    (9.7)

Symbolically, convergence almost surely is denoted by

X_n → X a.s.   or   X_n →^{a.s.} X,    (9.8)

where a.s. may be replaced by a.e. (for "almost everywhere") or by w.p. 1 (for "with probability 1").

Almost sure convergence is slightly weaker than pointwise (and uniform) convergence. Since the set, say A, on which pointwise convergence fails may be empty, in which case we have pointwise convergence, it is clear that if a sequence of r.v.'s converges pointwise, then it also converges almost surely. On the other hand, it is easy to construct a sequence of r.v.'s which converges almost surely, but not pointwise. For this we could take any sequence of r.v.'s that converges pointwise and redefine the values of the sequence of functions (r.v.'s) on a set of outcomes s ∈ S with probability zero to be oscillating, hence not converging; this would create a new sequence of r.v.'s that converges almost surely but not pointwise.

With pointwise convergence, if a sequence converges to a limit function, that function is defined exactly and uniquely. In contrast to this, convergence almost surely uniquely defines only an equivalence class of functions, similar to the equivalence class definition of r.v.'s.

The following theorems provide equivalent definitions for convergence almost surely. They are useful in relating a.s. convergence to other types of convergence. Proofs will be omitted, as they are beyond the scope of this presentation.

Theorem 9.1 A sequence {X_n} of r.v.'s converges a.s. to X if and only if for any ε > 0

lim_{n→∞} P(∪_{m≥n} {|X_m − X| ≥ ε}) = 0.    (9.9)

Theorem 9.2 A sequence {X_n} of r.v.'s converges a.s. to X if and only if for any ε > 0 and δ > 0 there exists an integer N (= N(ε,δ)) such that

P(∩_{m≥N} {|X_m − X| < ε}) ≥ 1 − δ.    (9.10)

9.3 Convergence in Probability

Convergence in probability, also known as convergence in measure or stochastic convergence, is defined as follows:

Definition 9.3 The sequence {X_n} of r.v.'s converges in probability to the r.v.
X if and only if, for any ε > 0,

lim_{n→∞} P(|X_n − X| > ε) = 0.    (9.11)

An equivalent statement would be: The sequence {X_n} of r.v.'s converges in probability to the r.v. X if and only if for any ε > 0 and δ > 0 there exists an integer N (= N(ε,δ)) such that

P(|X_n − X| > ε) < δ    ∀ n ≥ N.    (9.12)

Convergence in probability is denoted as X_n →^P X or plim X_n = X.

Note that convergence in probability appears to be similar to convergence almost surely, but in fact they are quite different. Convergence almost surely implies convergence in probability, but the converse is not true:

Theorem 9.3 If a sequence {X_n} of r.v.'s converges a.s. to X, then it also converges in probability to X. Moreover, if a sequence converges in probability to X, then there exists a subsequence of it that converges a.s. to X.

The following counterexample exhibits a sequence that converges in probability but not almost surely.

Example 9.2 Let S = [0,1], let F be the Borel sets on [0,1], and let P assign to each interval its length. For n = 1, 2, 3, ..., let i be the largest integer such that i(i−1)/2 < n, and let j = n − i(i−1)/2. Thus, the correspondence between n and (i,j) is one-to-one, with j = 1, 2, ..., i. In terms of (i,j), define the events

A_n = ((j−1)/i, j/i],    (9.13)

so that, as n runs through 1, 2, 3, ..., the intervals A_n sweep [0,1] in blocks: (0,1]; (0,1/2], (1/2,1]; (0,1/3], (1/3,2/3], (2/3,1]; and so on. Define the sequence of r.v.'s

X_n(s) = { 1, s ∈ A_n
         { 0, otherwise.    (9.14)

Thus,

P(A_n) = 1/i.    (9.15)

This sequence of r.v.'s {X_n} converges in probability to the r.v. X = 0. Take any ε > 0. First consider ε ≥ 1, which is certainly allowed. For ε ≥ 1,

P(|X_n − X| > ε) = P(X_n > ε) = 0   for all n,    (9.16)

so lim_{n→∞} P(|X_n − X| > ε) = 0 for ε ≥ 1. Now consider 0 < ε < 1. We have

P(|X_n − X| > ε) = P(X_n = 1) = 1/i.    (9.17)

Now, since i(i−1)/2 < n ≤ i(i+1)/2, it follows that (√(8n+1) − 1)/2 ≤ i. Therefore,

P(|X_n − X| > ε) = 1/i ≤ 2/(√(8n+1) − 1) → 0   as n → ∞.    (9.18)

Thus, {X_n} converges to X = 0 in probability.

However, it does not converge almost surely, because for any s ∈ S such that s ≠ 0, if we evaluate the sequence of r.v.'s at this s, the resulting sequence of real numbers will consist of a bunch of 0's followed by a 1, then another bunch of 0's, then a 1, then a bigger bunch of 0's, then a 1, and so on. No matter how far we go, there will always be a 1 following each bunch of 0's; we will never get rid of the 1's. This is so because of the way the A_n sets are defined: for each i, they sweep S = [0,1] completely; hence for each i, for some j = 1, 2, ..., i, one of the X_n's will take the value 1 for any s ∈ (0,1]. In other words, the sequence of real numbers, consisting of bunches of 0's followed by an occasional but certainly occurring 1, does not converge. (Technically, no matter how large we select N, there is always an n > N such that X_n(s) = 1.) And this is true for all s ≠ 0 in S. Thus, {X_n} does not converge at any point in S (except at s = 0) to X = 0. So, clearly, the probability of the set of points at which the sequence of real numbers converges is not 1; it is actually 0. This completes the proof of the claim that the sequence {X_n} does not converge almost surely.

Note, however, that there is a subsequence (as implied by the general assertion stated above) of {X_n} which converges almost surely (even pointwise); actually there are many such subsequences. For example, the subsequence {X_1, X_2, X_4, X_7, X_11, ...} converges almost surely. The reader can easily verify this. EOE

A sequence that converges in probability, like one that converges almost surely, does not define a unique limit r.v. Both types of convergence define only an equivalence class of r.v.'s.

With the counterexample presented above, we just showed that convergence in probability, that is, P(|X_n − X| > ε) converging to 0 as n → ∞, is not sufficient for almost sure convergence. The following theorem, however, indicates that if P(|X_n − X| > ε) converges to 0 fast enough, then that is sufficient for almost sure convergence.

Theorem 9.4 A sequence {X_n} of r.v.'s converges almost surely to X if for any ε > 0

Σ_{n=1}^∞ P(|X_n − X| > ε) < ∞.    (9.19)

We can easily verify that in the above example P(|X_n − X| > ε) converges to 0, but not at the rate required by this theorem.

9.4 Convergence in Quadratic Mean

Convergence in quadratic mean (q.m.), also called mean-square (m.s.) convergence, is defined as follows:

Definition 9.4 The sequence of r.v.'s {X_n} is said to converge in quadratic mean to the r.v. X if E{|X_n|²} < ∞, E{|X|²} < ∞, and
lim_{n→∞} E{|X_n − X|²} = 0.    (9.20)

An equivalent statement would be: The sequence {X_n} of r.v.'s converges in quadratic mean to the r.v. X if and only if for any δ > 0 there exists an integer N (= N(δ)) such that

E{|X_n − X|²} < δ    ∀ n ≥ N.    (9.21)

Convergence in quadratic mean is usually denoted as

X_n →^{q.m.} X   or   X_n →^{m.s.} X   or   l.i.m. X_n = X,    (9.22)

where l.i.m. denotes "limit in the mean".

Convergence in q.m. is one of the most commonly used types of convergence. Similar to the other types of convergence, convergence in q.m. defines the limit r.v. only up to an equivalence class. Note that, in convergence in q.m., what converges is the sequence of second moments of |X_n − X|; thus it is again true that in each type of convergence a different type of entity is converging. If, instead of 2 as the exponent in the definition of convergence in q.m., we take r > 0, say a positive integer, then the resulting convergence is called convergence in the r-th mean.

The following theorems provide the relationship between convergence in q.m. and the other types of convergence already considered.

Theorem 9.5 If the sequence {X_n} of r.v.'s converges to X in quadratic mean, then it also converges in probability.

Proof: It follows from the Chebyshev inequality that

P(|X_n − X| ≥ ε) ≤ E{|X_n − X|²}/ε²    (9.23)

for any ε > 0. The RHS goes to 0, since {X_n} converges in q.m. Therefore the LHS goes to 0. Thus, the sequence converges in probability.

Theorem 9.6 For 0 < r < s, if the sequence {X_n} of r.v.'s converges to X in the s-th mean, then it also converges to X in the r-th mean.

On the other hand, convergence a.s. (or in probability) does not imply convergence in q.m. To see this, let S = [0,1], with P assigning to each interval its length, and consider the sequence of r.v.'s

X_n(s) = { n, 0 < s ≤ 1/n
         { 0, otherwise.    (9.24)

For any s ∈ S and ε > 0, we can easily find an N (= N(s,ε)) (specifically, any N ≥ 1/s will do for s > 0, and any N will do for s = 0) such that

|X_n(s) − 0| = 0 < ε    ∀ n ≥ N.    (9.25)

That is, {X_n} converges pointwise to X = 0, although not uniformly. (Of course, it also converges a.s.) On the other hand,

E{|X_n − 0|²} = n² P(0 < s ≤ 1/n) = n² (1/n) = n,    (9.26)

which, of course, diverges to ∞ as n → ∞ rather than converging to 0. So the sequence {X_n} does not converge in q.m. Since pointwise convergence implies convergence a.s. and convergence in probability, this example has also shown that convergence a.s. or in probability does not imply convergence in q.m.

9.5 Convergence in Distribution

Convergence in distribution, also known as convergence in law, is defined as follows:

Definition 9.5 A sequence {X_n} of r.v.'s is said to converge in distribution to a r.v. X if the distribution function (CDF) of X_n, F_{X_n}(x), converges pointwise to F_X(x), the distribution function of X, at all continuity points of F_X(x), that is, if

lim_{n→∞} F_{X_n}(x) = F_X(x)    (9.27)

at all continuity points of F_X(x).

Convergence in distribution is denoted as X_n →^d X. The following is an example of a sequence of r.v.'s that converges in distribution.

Example 9.3 Let the r.v. X_n be Laplacian distributed with parameter a = n. That is, it has PDF

f_{X_n}(x) = (n/2) e^{−n|x|}.    (9.28)

Determine whether or not the sequence {X_n} converges in distribution. And, if so, to what?

First we find the CDF of X_n:

F_{X_n}(x) = { (1/2) e^{nx},       x < 0
             { 1 − (1/2) e^{−nx},  x ≥ 0.    (9.29)

Plots of the CDF's F_{X_n}(x) are shown in Fig. 9.4. It is easy to see that

lim_{n→∞} F_{X_n}(x) = { 0,   x < 0
                        { 1/2, x = 0
                        { 1,   x > 0.    (9.30)

Figure 9.4: A sequence of Laplacian CDF's.

Thus, the sequence {X_n} of r.v.'s converges in distribution to the r.v. X with distribution function

F_X(x) = { 0, x < 0
         { 1, x ≥ 0,    (9.31)

which is the CDF of the r.v. X = 0. Note that F_X(x) has a discontinuity at x = 0 and F_{X_n}(x) → F_X(x) everywhere except at x = 0. Hence, the definition of convergence in distribution is satisfied. Therefore, X_n →^d X = 0.

It is left as an exercise to the reader to determine whether {X_n} converges in probability and in quadratic mean. EOE

The following theorem establishes the relation between convergence in distribution and the other types of convergence.

Theorem 9.9 If X_n →^P X, then X_n →^d X.

The proof will be omitted. Since convergence a.s. and convergence in q.m.
imply convergence in probability, it follows from the above theorem that a.s. convergence as well as convergence in q.m. imply convergence in distribution. Thus, convergence in distribution is the weakest of the four types of convergence discussed so far. The following counterexample indicates that convergence in distribution does not imply convergence in probability.

Example 9.4 We will now construct a sequence {X_n} of r.v.'s that is convergent in distribution but not in probability. Let X ~ N(0,1), that is, X is Gaussian with 0 mean and 1 variance. Define X_n as

X_n = { X,  n even
      { −X, n odd.    (9.32)

Then the sequence corresponds to

{−X, X, −X, X, −X, ...}.    (9.33)

The odd members of the sequence have the CDF

F_{−X}(x) = P(−X ≤ x) = P(X ≥ −x)    (9.34)
          = 1 − F_X(−x) = 1 − (1 − F_X(x))    (9.35)
          = F_X(x),    (9.36)

where we used the symmetry of the zero-mean Gaussian CDF, F_X(−x) = 1 − F_X(x). This is identical to the CDF of the even members of the sequence. In other words, −X is also N(0,1). That is,

F_{X_n}(x) = F_X(x),  with X_n ~ N(0,1) for all n ≥ 1.    (9.37)

Hence, X_n →^d X.

Now consider

|X_n − X| = { 0,    n even
            { 2|X|, n odd.    (9.38)

Take an arbitrary ε > 0, say ε = 1. Then

P(|X_n − X| ≥ 1) = 0   for n even,    (9.39)
P(|X_n − X| ≥ 1) = P(|X| ≥ 1/2) > 0   for n odd.    (9.40)

Thus, lim_{n→∞} P(|X_n − X| ≥ 1) does not exist. Therefore, {X_n} does not converge in probability to X. EOE

Convergence in distribution is used in the Central Limit Theorem, which will be discussed in a later section.

Figure 9.5: Summary of convergence relationships (uniform convergence implies pointwise convergence; pointwise convergence implies a.s. convergence; a.s. convergence and q.m. convergence each imply convergence in probability; convergence in probability implies convergence in distribution).

9.6 Further Comments on Convergence of Sequences of R.V.'s

1. In the previous sections we discussed various types of convergence for random sequences and their relationships with each other. The diagram in Fig. 9.5 summarizes these relationships. It is possible to use these relationships in establishing convergences of certain types. For example, if we prove or know that a sequence converges a.s., then it follows by the theorems stated above that it also converges in probability and in distribution.

2. When a sequence of r.v.'s converges in more than one sense, the limiting r.v. in each convergence is the same. In other words, the sequence cannot converge to one limit in one sense and a different limit in another sense. In whatever senses it converges, it must be to the same limiting r.v.

3. Note that in many instances the limiting r.v. is just a constant such as 0 or 1. But, of course, a constant is also a r.v. (a degenerate one). Therefore, in general we view the limit of a random sequence as a r.v.

4. Note that in all types of convergence of random sequences, to be able to verify whether or not the convergence definition holds, we need to know the limiting r.v. X. This might pose a problem: given the sequence {X_n}, how do we know what X to use in verifying the definition? If X is specified, we can of course try that. If it is not specified, then by evaluating the r.v.'s in the sequence or their CDF's with increasing n, usually it is not difficult to identify what limit X the sequence might converge to, if it converges at all. Hence, usually we can easily identify X, the possible limit the sequence might converge to. However, corresponding to each type of convergence studied (a.s., in probability, in q.m., and in distribution), there is also a mutual convergence concept, defined by using

|X_m − X_n| → 0,   m, n → ∞,    (9.41)

instead of |X_n − X| → 0, n → ∞, in the definition.
In each type of convergence, the mutual convergence is equivalent to the corresponding ordinary convergence. Since the limiting r.v. X is not needed in implementing the mutual convergence definitions, we can use these definitions if the potential limiting r.v. X is not easily identifiable for a given random sequence {X_n}. But usually the application of the mutual convergence definition is more involved than that of the ordinary convergence; hence the latter would be preferred if X can be identified.

5. In determining whether a random sequence converges in a certain sense or not, there sometimes is a tendency to guess the result and give an incomplete or incorrect justification. Students are advised to avoid that and to obtain the answer by applying the definitions precisely; anything else is bound to lead to incorrect results. For example, to determine convergence in probability, we simply have to determine the probability P(|X_n − X| > ε) as a function of n and see if it converges to 0 as n → ∞. To determine convergence in q.m., we have to determine E{|X_n − X|²} as a function of n and see if it converges to 0 as n → ∞. And to determine convergence in distribution, we have to find F_{X_n}(x), the CDF of X_n, and see if it converges to a CDF at the continuity points of that CDF as n → ∞. Unfortunately, there are no magical shortcuts for establishing the convergence of random sequences other than the application of the definitions and the use of the established relationships among the different types of convergence.

6. We note that when dealing with sequences of r.v.'s, usually we are given the statistics, that is, the PDF or CDF of the r.v.'s in the sequence. These statistics are sufficient to determine whether the sequence converges in distribution, in probability, and/or in quadratic mean. However, these statistics may or may not be sufficient to determine the convergence of the sequence a.s. (i.e., w.p. 1). Through the use of Theorem 9.4, and based on the rate of decrease of the probabilities P(|X_n − X| > ε), we may be able to determine a.s. convergence. However, in general, to determine a.s. convergence, the r.v.'s X_n(s) have to be specified as functions of s, the outcomes in the sample space. When the specification of the r.v.'s as functions is not available, which frequently is the case, we cannot arbitrarily assume these specifications. In that case, to determine a.s. convergence, the best we can do is to check whether the condition of Theorem 9.4 is satisfied or not. If it is not satisfied, we simply cannot say whether the sequence converges a.s. or not. The random sequence in the example below illustrates this point.

Example 9.5 Consider the following sequence of 0-1 r.v.'s {X_n} for which

P(X_n = 0) = 1 − 1/n,    (9.42)
P(X_n = 1) = 1/n,    (9.43)

for n = 1, 2, 3, .... Determine whether or not the random sequence {X_n} converges to 0, (a) in distribution, (b) in probability, (c) in quadratic mean, (d) almost surely, in that order.

a) Based on the probabilities given, the CDF of X_n is given by

F_{X_n}(x) = { 0,        x < 0
             { 1 − 1/n,  0 ≤ x < 1
             { 1,        x ≥ 1.    (9.44)

As n → ∞, F_{X_n}(x) converges to

F_X(x) = { 0, x < 0
         { 1, x ≥ 0,    (9.45)

which is the CDF of X = 0. Therefore, X_n →^d 0.

b) We have

P(|X_n − 0| > ε) = P(X_n > ε) = P(X_n = 1)    (9.46)
                 = 1/n → 0    (9.47)

for 0 < ε < 1; it is equal to 0 for ε ≥ 1. Hence, in either case, P(|X_n − 0| > ε) → 0. Therefore, X_n →^P 0.
c) We compute

E{|X_n − 0|²} = E{X_n²} = 0² (1 − 1/n) + 1² (1/n)    (9.48)
              = 1/n → 0,   n → ∞.    (9.49)

Therefore, X_n →^{q.m.} 0.

d) We cannot say whether {X_n} converges to 0 almost surely or not, because we are not given X_n(s): S → R explicitly. Moreover, the rate of convergence of P(|X_n − 0| > ε) = 1/n is not fast enough to ensure convergence almost surely according to Theorem 9.4, since Σ 1/n diverges. EOE

9.7 Weak Law of Large Numbers (WLLN)

In the view of many, the Weak Law of Large Numbers (WLLN) is the single most important result in Probability Theory, because it provides justification for our intuitive notions about probability within the mathematical framework established as Probability Theory. For example, if we are interested in the value of a random quantity, that is, a quantity that takes a different value each time it is measured, how would we attempt to find its value? Without the benefit of Probability Theory, any reasonable person would take several measurements of this quantity, take the arithmetic average of these measurements, and tend to use that average as the value of the random quantity. How many measurements would we have to take, and how much confidence would we have in using that average as the value of the quantity? Again, any reasonable person would opt to take as many measurements as time and resources allow, because one would have greater confidence in the average if it is based on a greater number of measurements. That is what our intuition tells us. Is that the right thing to do? What is the justification for it? Now that we have established the mathematical framework of Probability Theory, we should be able to answer these questions precisely. The Weak Law of Large Numbers provides the answers to these questions. Hence, it is the bridge between our intuition and the theory.

In the mathematical framework of Probability Theory, we would represent the random quantity of interest by a r.v., say X. If we want a single value for it, it would be E{X} = μ, the expected value of X. But we do not know the PDF or CDF of X, so we cannot find E{X} nor its variance σ². Taking n independent measurements of the random quantity corresponds to taking realizations/samples from independent identically distributed copies of X, which we can represent by the i.i.d. r.v.'s {X_1, X_2, ..., X_n}, and the arithmetic average of the measurements corresponds to taking a realization/sample from

Y_n = (X_1 + X_2 + ... + X_n)/n,    (9.50)

which we call the sample mean. Thus, the arithmetic mean of n observations corresponds to a single realization/sample of the r.v. Y_n, the sample mean. Why, then, are we using this single sample/realization of the sample mean Y_n to estimate/approximate E{X}? Why would we be more confident with this estimate/approximation for larger n? We can answer these questions by studying the statistics of Y_n, the sample mean. Since the {X_i} are i.i.d. and each is distributed as X, we have

E{Y_n} = (1/n) Σ_{i=1}^n E{X_i}    (9.51)
       = (1/n) n E{X} = E{X} = μ.    (9.52)

Thus, the expected value of Y_n is equal to the expected value E{X} = μ that we are trying to estimate/approximate. But that was also true for a single realization/sample. The key question is: Is the realization of the sample mean close to E{X} with a large probability, so that we can be confident that we have a good estimate for E{X}? That, of course, depends on the variance of the r.v. being evaluated, i.e., the sample mean. Therefore, we need to find the variance of Y_n:

Var{Y_n} = (1/n²) Σ_{i=1}^n Var{X_i}    (9.53)
         = (1/n²) n σ² = σ²/n.    (9.54)

Thus, the variance of the sample mean is one n-th of the variance of the original r.v. X. That, we know, will cause individual realizations/samples to be closer to the mean with a larger probability.
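This 1/n shrinkage of the spread is easy to observe in simulation. The following minimal Python sketch (with an exponential distribution standing in, arbitrarily, for the unknown X; in that case μ = 1 and σ² = 1) estimates Var{Y_n} empirically for several values of n:

import numpy as np

rng = np.random.default_rng(0)
sigma2 = 1.0      # variance of the exponential(1) stand-in for X
trials = 20000    # independent realizations of the sample mean Y_n

for n in [1, 10, 100, 1000]:
    samples = rng.exponential(scale=1.0, size=(trials, n))
    Y_n = samples.mean(axis=1)   # each row yields one realization of Y_n
    print(f"n = {n:4d}: empirical Var(Y_n) = {Y_n.var():.5f}, sigma^2/n = {sigma2 / n:.5f}")

The empirical variances track σ²/n closely, and they would do so for any choice of distribution for X with finite variance.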
Specifically, using the Chebyshev inequality, we have, for any ε > 0,

P(|Y_n − μ| ≥ ε) ≤ σ²/(nε²),    (9.55)

from which we note that, no matter how small ε might be, as n → ∞ the RHS goes to 0, which then implies that the LHS also goes to zero. That means that as n, the number of measurements the arithmetic average is based on, is increased, the probability that the arithmetic average of those measurements is outside an ε range of the true mean E{X} = μ can be made as close to 0 as desired, no matter how small ε is. Equivalently, by increasing n, the probability that the arithmetic average of the measurements is within an ε range of the true mean E{X} can be made as close to 1 as desired, no matter how small ε is. Of course, n may have to be very large for this to be true for a small ε. But that is not important; the important point is that, by the above result, as n → ∞ the sample mean Y_n converges in probability to the true mean E{X} = μ. That is, Y_n →^P μ. This result is called the Weak Law of Large Numbers. (It is called the "weak law" because it involves convergence in probability, a weaker type of convergence; the "strong law" corresponds to almost sure convergence, a stronger type of convergence.)

Thus, our intuition is correct (as proved by the theory) in trying to use the arithmetic average of measurements to estimate/approximate the value (the expected value) of a random quantity. It is also correct in believing that the larger the number of measurements the arithmetic average is based on, the more confidence we have in the estimate. All of this is essentially proved above. The following example will demonstrate what the WLLN implies in a specific case.

Example 9.6 Consider the random quantity represented by the r.v. X. We want to estimate/approximate E{X} based on the arithmetic average of n independent measurements of this random quantity.

If we base the sample mean on 100 observations of X, i.e., take n = 100, find a lower bound for the probability that our estimate is within 0.5σ of the true mean E{X}, where σ is the standard deviation of X, which is not known. That is, ε = 0.5σ. Then, using the Chebyshev inequality given above, we have

P(|Y_100 − E{X}| ≥ 0.5σ) ≤ σ²/(100 · 0.25σ²) = 0.04.    (9.56)

Considering the complement event, we have

P(|Y_100 − E{X}| < 0.5σ) ≥ 0.96.    (9.57)

Thus, we can say that we are "96% confident that the sample mean with 100 measurements falls within 0.5σ of the true mean E{X}". With 500 observations, we can say: P(|Y_500 − E{X}| < 0.5σ) ≥ 0.992.

We can also pose the question as follows: What is the minimum value for n if we want to be 99% confident that the sample mean is within 0.1σ of the true mean? So we want P(|Y_n − E{X}| < 0.1σ) ≥ 0.99. Thus, 1 − 0.99 = 0.01 = σ²/(n · 0.01σ²) = 100/n. That implies n = 10,000. So, with fairly stringent specifications, the number of observations has to be quite large. EOE

A couple of remarks are in order here. First, as pointed out in the above example, n may have to be fairly large if we expect to have the estimate close to the true value with a high probability. So the WLLN may not appear to be a strong result, but we should keep in mind that no information about the distribution of X was used in getting this result. If we happen to know the distribution of X, hence that of Y_n, then we could compute the probability P(|Y_n − E{X}| < ε) exactly, rather than getting a lower bound for it using the Chebyshev inequality. But without that information the Chebyshev inequality is the best we can do. The second point we want to make is that the WLLN says that, with a certain n, the sample mean will be within ε of the true mean with a certain probability.
But it does not guarantee that the particular sample of Y_n that we observe will necessarily be within the ε range of the true mean. So it is quite possible (with nonzero probability) that the particular sample we happen to observe may be outside the ε range. The WLLN (or the Chebyshev inequality) gives only a probability that it is within that range. The higher that probability is, the more confident we can be that it is within the range, but we cannot be absolutely certain that it is.

Special Case: Bernoulli Trials

A special case of the WLLN is the Bernoulli trials. Again, we can pose the problem intuitively. Suppose we are interested in determining the probability of an event A, that is, P(A). How can we determine this or obtain an estimate/approximate value for it? Our intuition tells us to perform the random experiment n times (assuming we can do that), count the number of times event A occurs, say it is k, form the ratio k/n, called the relative frequency, and tend to believe that the relative frequency gives an estimate of P(A). We also believe that the larger n is, the more confidence we will have in the accuracy of the estimate for P(A). Again, the questions are: Is this the right thing to do? What is the justification for it? The answer, once again, lies in the WLLN. Actually, this is a special case of the WLLN discussed above.

Consider the 0-1 r.v., also called the indicator function, associated with the event A:

X = I_A = { 1, if A occurs
          { 0, otherwise.    (9.58)

That is, X takes value 1 if event A occurs and 0 if it does not occur. It follows that E{X} = P(A). Thus, the quantity we are trying to determine is E{X}. The i.i.d. copies of the r.v. X, {X_1, X_2, ..., X_n}, correspond to independent trials of this experiment, called Bernoulli trials. Then K_n = X_1 + X_2 + ... + X_n corresponds to the number of times event A occurs in n independent trials of the random experiment. The r.v. K_n is a binomially distributed r.v. with parameters (n, p), where p = P(A). For a binomially distributed r.v. we know that E{K_n} = np and Var{K_n} = np(1 − p).

The quantity we called the relative frequency, that is, the ratio of the number of times event A occurs to n, corresponds to the r.v. K_n/n = (X_1 + X_2 + ... + X_n)/n, which is precisely what we called the sample mean in the general formulation of the WLLN. It follows that

lim_{n→∞} P(|K_n/n − p| > ε) = 0,    (9.59)

or, equivalently,

lim_{n→∞} P(|K_n/n − p| ≤ ε) = 1.    (9.60)

Thus we have: the probability that the r.v. K_n/n, that is, the relative frequency, takes a value outside the ε range of p = P(A) goes to 0 as n → ∞; or, equivalently, the probability that it takes a value within the ε range of p = P(A) goes to 1 as n → ∞. In other words, K_n/n →^P P(A). Thus, once again, our intuition is correct in believing that a realization of the relative frequency gives an estimate of the desired probability P(A), and that we can be more confident of the accuracy of the estimate if it is based on a large number of trials.

In the special case of Bernoulli trials, since we know exactly what the distribution of the r.v. K_n is, except for the parameter p, we can find the probability P(|K_n/n − p| < ε) exactly in terms of p. Hence, we do not have to settle for the lower bound provided by the Chebyshev inequality.
Therefore, we have, exactly,

P(|K_n/n − p| ≤ ε) = Σ_{k: |k − np| ≤ nε} C(n,k) p^k (1−p)^{n−k},    (9.61)

where C(n,k) is the binomial coefficient. For comparison, the Chebyshev inequality applied to K_n/n, which has mean p and variance p(1−p)/n, gives

P(|K_n/n − p| ≥ ε) ≤ p(1−p)/(nε²),    (9.62)

or, in terms of the complement,

P(|K_n/n − p| < ε) ≥ 1 − p(1−p)/(nε²),    (9.63)

while the Gaussian (De Moivre-Laplace) approximation to the binomial distribution (see the next section) gives

P(|K_n/n − p| < ε) ≈ 2Φ(ε √(n/(p(1−p)))) − 1,    (9.64)

where Φ(·) is the unit Gaussian CDF.

As a numerical illustration, suppose a fair coin is tossed n times and we want the probability that the relative frequency of heads is within ε = 0.05 of p = 0.5. Since p(1−p) = 0.25,

p(1−p)/(nε²) = 0.25/(0.0025 n) = 100/n,    (9.65)

so the Chebyshev bound (9.63) and the Gaussian approximation (9.64) become

P(|K_n/n − 0.5| < 0.05) ≥ 1 − 100/n,    (9.66)
P(|K_n/n − 0.5| < 0.05) ≈ 2Φ(0.1√n) − 1.    (9.67)

Evaluating the Chebyshev bound for increasing n (with a negative bound replaced by the trivial bound 0):

n = 50:    P(|K_n/n − 0.5| < 0.05) ≥ 0,    (9.68)
n = 100:   P(|K_n/n − 0.5| < 0.05) ≥ 0,    (9.69)
n = 400:   P(|K_n/n − 0.5| < 0.05) ≥ 0.75,    (9.70)
n = 1000:  P(|K_n/n − 0.5| < 0.05) ≥ 0.90.    (9.71)

So we see that, for n = 50 and n = 100, the lower bound provided by the Chebyshev inequality is totally useless. The Gaussian approximations of this probability for the same values of n are:

n = 50:    P(|K_n/n − 0.5| < 0.05) ≈ 0.52,    (9.72)
n = 100:   P(|K_n/n − 0.5| < 0.05) ≈ 0.68,    (9.73)
n = 400:   P(|K_n/n − 0.5| < 0.05) ≈ 0.95,    (9.74)
n = 1000:  P(|K_n/n − 0.5| < 0.05) ≈ 0.998.    (9.75)

We can interpret one line of these equations (say (9.74)) as follows: if we toss the coin 400 times and form the ratio, then this number will be in [0.45, 0.55] with probability 0.95, whereas the Chebyshev inequality only says that this probability will be greater than 0.75. We see that the bound provided by the Chebyshev inequality is useless for small n and not very tight for large n. But again, without the knowledge of the distribution of the r.v., that is the best we can do. EOE

As mentioned above, without the knowledge of p, normally we cannot use the Chebyshev inequality or the Gaussian approximation to compute the desired probability for a specific n and ε. In the general WLLN case, we were able to use the Chebyshev inequality by letting ε = cσ, where σ is the standard deviation of the r.v. involved. But letting ε be a specific numerical value rather than a multiple of σ leaves p in the expressions; hence, without p we cannot compute them. But we can get yet another bound for the probability. Continuing with the above example, for given n and ε and unknown p, the Gaussian approximation implies

P(|K_n/n − p| < ε) ≈ erf(ε √(n/(2p(1−p))))    (9.76)
                   = 1 − 2Q(ε √(n/(p(1−p)))),    (9.77)

where Q(x) = 1 − Φ(x). Now, without knowing p we cannot compute the RHS, but we know that the quantity p(1−p) is maximum when p = 0.5. Therefore, ε√(n/(2p(1−p))) is minimum when p = 0.5. Then, since erf(·) is a monotone increasing function, the RHS of the above equation takes its minimum value for p = 0.5. Thus, we conclude that the LHS is greater than the value of the RHS corresponding to p = 0.5. Therefore, we have

P(|K_n/n − p| < ε) ≥ erf(ε√(2n)) = 1 − 2Q(2ε√n),    (9.78)

which is valid for any n and ε. This inequality provides a lower bound for the probability that the relative frequency K_n/n is within the ε range of the unknown p = P(A), without having to know p.

In the previous example, suppose that the coin is not fair and that p = P(H) is unknown and is to be determined/estimated using the relative frequency. For n = 400 we have

P(|K_400/400 − p| < 0.05) ≥ erf(0.05 √800) = erf(√2) ≈ 0.95.    (9.79)

Then we conclude that we are "95% confident that the relative frequency we observe in 400 tosses is within 0.05 of the true value p = P(H)". Hence, the relative frequency is within 0.05 of the unknown p with a high enough (0.95) probability.

Strong Law of Large Numbers (SLLN)

The Weak Law of Large Numbers considers conditions under which the sample mean of a sequence of r.v.'s converges in probability to the arithmetic mean of the expected values of the r.v.'s. That is,

(1/n) Σ_{i=1}^n (X_i − E{X_i}) →^P 0.    (9.80)

Above we proved the version of the WLLN for i.i.d. r.v.'s; specifically,

(1/n) Σ_{i=1}^n X_i →^P μ,    (9.81)

where E{X_i} = μ. The Strong Law of Large Numbers (SLLN), on the other hand, deals with the convergence of such sequences to the same limit almost surely. That is, the SLLN asserts that

(1/n) Σ_{i=1}^n (X_i − E{X_i}) →^{a.s.} 0.    (9.82)

There are several different versions of the SLLN, each starting from a different set of assumptions. Some of these are Kolmogorov's first and second SLLNs and Borel's SLLN.
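The exact probability (9.61), the Chebyshev bound (9.66), and the Gaussian approximation (9.67) for the fair-coin example can be checked side by side numerically. A short sketch (ours; SciPy's binom and norm are used purely for convenience):

from scipy.stats import binom, norm

p, eps = 0.5, 0.05
for n in [50, 100, 400, 1000]:
    lo, hi = n * (p - eps), n * (p + eps)
    # Exact: sum the binomial PMF over all k with |k/n - p| <= eps.
    exact = sum(binom.pmf(k, n, p) for k in range(int(lo), int(hi) + 1) if lo <= k <= hi)
    # Chebyshev lower bound, clipped at the trivial bound 0.
    cheb = max(0.0, 1.0 - p * (1.0 - p) / (n * eps**2))
    # De Moivre-Laplace (Gaussian) approximation.
    gauss = 2.0 * norm.cdf(eps * (n / (p * (1.0 - p))) ** 0.5) - 1.0
    print(f"n = {n:4d}: exact = {exact:.3f}, Chebyshev >= {cheb:.2f}, Gaussian ~ {gauss:.3f}")

The output reproduces the values quoted in (9.68)-(9.75), with the exact binomial probability sitting close to the Gaussian approximation and well above the Chebyshev bound.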
9.8 Central Limit Theorem

The Central Limit Theorem (CLT) provides an explanation for the fact that quite a few random quantities encountered in nature or in physical processes are Gaussian (normally) distributed. The CLT states that the sum of several r.v.'s of arbitrary distribution tends to be Gaussian distributed as the number of r.v.'s tends to ∞, and the convergence is quite fast. Since the random quantities in nature or in physical processes are usually the sum of small independent effects (e.g., thermal noise and shot noise), the fact that these random quantities tend to be Gaussian distributed is theoretically justified by the CLT.

There are different versions of the CLT that start with different assumptions. Here we will state and prove one simple version. Suppose the X_i are i.i.d. r.v.'s with mean μ and variance σ². Let

S_n = Σ_{i=1}^n X_i.    (9.83)

S_n has mean nμ and variance nσ². Now define the normalized sum

Z_n = (S_n − nμ)/(√n σ).    (9.84)

So Z_n is a zero-mean, unit-variance r.v. for all n.

The CLT asserts that Z_n →^d N(0,1) as n → ∞. In practice, the convergence to N(0,1) is quite fast. For finite n, this also means S_n is approximately N(nμ, nσ²) for X_i having any distribution. For example, in the Bernoulli trials discussed above, X_i = I_A, so K_n (which corresponds to S_n here) is approximately N(np, np(1−p)). This Gaussian approximation of a binomially distributed r.v. is known as the De Moivre-Laplace approximation. Note that the unnormalized r.v. S_n has a mean and variance that grow without bound as n → ∞; hence, we can consider S_n only for finite n.

We now provide a proof of the fact that Z_n converges in distribution to N(0,1). Consider the characteristic function (CF) of the normalized r.v. Z_n:

Φ_{Z_n}(ω) = E{e^{jωZ_n}} = E{e^{jω Σ_{i=1}^n (X_i − μ)/(√n σ)}}    (9.85)
           = Π_{i=1}^n E{e^{j(ω/√n)(X_i − μ)/σ}}    (9.86)
           = [Φ_W(ω/√n)]^n,    (9.87)

where Φ_W is the CF of the normalized r.v.

W = (X_i − μ)/σ.    (9.88)

Note that in line (9.86) of the above equation we used the fact that the X_i's are independent. The power series expansion of Φ_W(ω/√n) is given by

Φ_W(ω/√n) = 1 + j(ω/√n) m_1 + (j(ω/√n))²/2! m_2 + ...,    (9.89)

where

m_k = E{W^k}.    (9.90)

It follows that

m_1 = 0   and   m_2 = 1.    (9.91)

Thus, for large n, the above series can be approximated as

Φ_W(ω/√n) ≈ 1 − ω²/(2n).    (9.92)

Substituting this into the expression for the CF of Z_n above, we get

lim_{n→∞} Φ_{Z_n}(ω) = lim_{n→∞} [1 − ω²/(2n)]^n = e^{−ω²/2},    (9.93)

which is the CF of a N(0,1) r.v. Thus, the CF of Z_n converges pointwise to the N(0,1) CF. This then implies that the CDF of Z_n converges pointwise to the N(0,1) CDF. Therefore, we have Z_n →^d N(0,1) as n → ∞. This completes the proof of this simple version of the CLT involving i.i.d. r.v.'s.

There are different versions of the CLT based on other assumptions, such as the X_i being independent but not identically distributed. In this case it becomes necessary to add conditions to ensure that each X_i has a negligible effect on the asymptotic distribution, so that no term dominates the sum. This is accomplished by placing lower or upper bounds on certain moments of the X_i.

We note that the CLT involves the convergence of the distribution function, not the density function. For X_i that are discrete r.v.'s, the Z_n are also discrete. Hence, the PDF of Z_n will be a sum of impulses, clearly not converging to a Gaussian PDF. But the distribution function of Z_n, being a step function, will converge to a Gaussian distribution function. However, if the X_i are continuous r.v.'s, then under some regularity conditions it can be shown that the PDF of Z_n converges to a unit normal PDF.

It can be verified by examples that the convergence of the distribution of Z_n to a Gaussian distribution is remarkably fast. For example, for X_i's uniformly distributed on [0,1], the distribution of Z_4 or Z_5 is already quite close to a Gaussian distribution. In fact, some computers generate Gaussian random numbers by a normalized sum of 12 uniform random numbers. Specifically, for X_i uniformly distributed on [0,1], having μ = 1/2 and σ² = 1/12, the r.v.

Y = (Σ_{i=1}^{12} X_i) − 6    (9.94)

is used to approximate N(0,1), a Gaussian r.v. with zero mean and unit variance.
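The 12-uniform construction (9.94), and the speed of convergence it relies on, can be checked with a few lines of Python. In this sketch (ours; the Kolmogorov-Smirnov statistic is used as a convenient measure of distance from the standard normal CDF):

import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(0)
# Y = (sum of 12 uniforms on [0,1]) - 6, as in (9.94).
y = rng.uniform(0.0, 1.0, size=(100000, 12)).sum(axis=1) - 6.0
print("sample mean ~", round(y.mean(), 4), " sample variance ~", round(y.var(), 4))
# The KS distance from N(0,1) stays small even though only 12 terms are
# summed, illustrating how fast Z_n converges in distribution.
print("KS statistic vs N(0,1):", kstest(y, "norm").statistic)

The sample mean and variance come out near 0 and 1, and the KS statistic is small, consistent with the remark that 12 terms already give a serviceable Gaussian approximation.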
9.9 Problems

1. Let S = [0,1] be the sample space, let F be the Borel sets on [0,1], and let the probability function be defined as P((a,b]) = b − a for 0 ≤ a < b ≤ 1. Define the random sequence

X_n(s) = { …
         { 0, otherwise.

Determine whether or not {X_n} converges
a) in distribution,
b) in probability,
c) in quadratic mean,
d) almost surely.
(Answer the questions in the order they are asked.)

2. Consider the random sequence {X_n}, where X_n is a r.v. uniformly distributed on (0, …). Determine whether or not {X_n} converges
a) in distribution,
b) in probability,
c) in quadratic mean,
d) almost surely.
If so, to what? (Answer the questions in the order they are asked.)

3. Suppose A is a r.v. uniformly distributed on (0,1) and the random sequence {X_n}, for n ≥ 1, is defined as follows: X_n = …. Determine whether or not the sequence {X_n} converges
a) in distribution,
b) in probability,
c) in quadratic mean,
d) almost surely.
If so, to what? (Answer the questions in the order they are asked.)

4. Suppose A is a r.v. uniformly distributed on [0,1] and the random sequence {X_n}, for n ≥ 2, is defined as follows: X_n = …. Determine whether or not the sequence {X_n} converges
a) in distribution,
b) in probability,
c) in quadratic mean.

6. Suppose A is a r.v. uniformly distributed on …, and the random sequence {X_n} is defined as …. Determine whether or not the sequence {X_n} converges
a) in distribution,
b) in probability,
c) in quadratic mean.

7. Consider the random sequence {X_n}, where the r.v. X_n has the probability density function

f_{X_n}(x) = { …
             { 0, otherwise.

a) Find the cumulative distribution function of X_n.
b) Determine whether or not {X_n} converges to a limiting r.v. in distribution. If so, to what? Specify the limiting r.v. X. If not, explain why.

8. Let {X_n} be a sequence of independent 0-1 r.v.'s taking values 0 and 1 with equal probability.
a) Suppose the r.v.'s Y_n are defined as Y_n = …. Determine whether or not {Y_n} converges, and if so, in what sense.
b) Repeat part (a) for the random sequence {Z_n}, where Z_n = ….

9. Suppose the X_i's are independent and uniformly distributed on (0,1). Consider

Y_n = min{X_1, X_2, ..., X_n},   n ≥ 1.

a) Determine the CDF of Y_n.
b) Does the sequence {Y_n} converge in distribution? If it does, to what?

10. (Thomas 1971) Let {X_n} be a sequence of i.i.d. r.v.'s that are uniformly distributed on (0,a). Consider the following sequences:

Y_n = max{X_1, X_2, ..., X_n},
Z_n = n(a − Y_n).

a) Determine whether {Y_n} converges in probability or not.
b) Determine whether {Z_n} converges in distribution or not.
Hint: lim_{n→∞} (1 − x/n)^n = e^{−x}.

11. Suppose {X_n} is a sequence of non-negative r.v.'s and that g(·) is a non-negative, non-decreasing function. Show that
a) P(X_n ≥ λ) ≤ E{g(X_n)}/g(λ) for any λ > 0 and any n;
b) X_n →^P 0 if E{g(X_n)} → 0.

12. Suppose {X_n} is a sequence of i.i.d. r.v.'s where each X_i takes value +1 with probability p and −1 with probability 1 − p. Consider the random walk process {Y_n}, where Y_n = X_1 + X_2 + ... + X_n, n = 1, 2, 3, ....
a) Find the mean and variance of Y_n.
b) Find C[m,n], the autocovariance of {Y_n}.
c) Find P(Y_{2n} = 16).

13. Let X_1, X_2, ..., X_n be … r.v.'s, where |ρ| < 1. Find the mean and variance of their sample mean

M_n = (X_1 + X_2 + ... + X_n)/n.

14. Suppose that in a random experiment P(A) = p for some event A. The experiment is repeated successively and independently. Determine N, the smallest number of times the experiment must be repeated, so that the relative frequency is within a 0.02 neighborhood of p with at least 0.98 probability. (Use the Gaussian approximation to the binomial distribution.)

15. Determine and sketch the PDF of Z, the sum of four i.i.d. r.v.'s uniformly distributed on (0,1).
Compare the numerical value of f_Z(z) with the Gaussian PDF having the same mean and variance as Z, at the mean and at one and two standard deviations away from the mean.

16. A fair die is tossed 100 times. Let X_i be the number (face) that shows on the i-th toss.
a) Using the Chebyshev inequality, find a bound on the probability that the total number of spots (the sum of the faces that show), that is, Y = X_1 + X_2 + ... + X_100, is between 316 and 384.
b) Using the central limit theorem (that is, the Gaussian assumption with the calculated mean and variance), estimate the probability that Y is between 316 and 384. Compare your answer with that of part (a).

17. Suppose the lifetime of a component is exponentially distributed with PDF

f_X(x) = 4 e^{−4x} u(x).

When the component fails, it is immediately replaced by another one with independent and identical statistics, and the same is done when that one fails, as long as spare components are available.
a) Using the Chebyshev inequality, determine a bound for the probability that, with a total of 20 components, the system will still be functioning at t = 10. Is this probability a lower bound or an upper bound?
b) Again using the Chebyshev inequality, determine the minimum number of components needed such that the probability that the system will still be functioning at t = 10 is greater than 0.98.
c) Repeat part (b), this time using the central limit theorem (that is, Gaussian) approximation with the calculated mean and variance.
