
Chapter 4 The First and Second Moment Methods
When analyzing randomized algorithms or data structures with random inputs, we are usually interested in time complexities, storage requirements or the value of a particular parameter characterizing the algorithm. These quantities are random (non-deterministic) since either the input is assumed to vary or the algorithm itself makes random decisions. We are often satisfied with average values; however, for many applications this might be too simplistic. In many instances these quantities of interest, though random, behave in a very deterministic manner for large inputs. The first and second moment methods are the most popular probabilistic tools used to derive such relationships. We discuss them in this chapter together with some interesting applications.
IMAGINE that $X_n$ is a random variable representing a quantity of interest (e.g., the number of pattern occurrences in a random text of length $n$). Usually, it is not difficult to compute the average $E[X_n]$ of $X_n$, but it is equally easy to come up with examples that show how poor such a measure of variability of $X_n$ could be. Often, however, we can discover more refined information about $X_n$; e.g., that with high probability $X_n \sim E[X_n]$ as $n \to \infty$. In this case, the random variable $X_n$ converges (in probability or almost surely) to a deterministic value $a_n = E[X_n]$ as $n$ becomes larger and larger (cf. Example 2).

Consider the following example: Let $M_n = \max\{C_1, \ldots, C_n\}$ where the $C_i$ are dependent random variables (e.g., $M_n$ may represent the longest path in a digital tree). Again, $M_n$ varies in a random fashion, but for large $n$ we shall find out that $M_n \sim a_n$ where $a_n$ is a (deterministic) sequence (cf. Example 1). More precisely, for every $\varepsilon > 0$ the probability of $(1-\varepsilon)a_n \le M_n \le (1+\varepsilon)a_n$ becomes closer and closer to 1; thus, according to the definition from Section 2.2.2, we say that $M_n \stackrel{pr}{\to} a_n$ or $M_n \to a_n$ (pr.).

These and other problems can be tackled and solved successfully by two simple probabilistic methods called the first and second moment methods. They resemble Boole's and Bonferroni's inequalities, but instead of bounding the probability of a union of events by marginal probabilities they use the first and the second moments. We also briefly discuss the fourth moment method recently proposed by B. Berger [38].

Here is the plan for the chapter: We first discuss some theoretical underpinnings of the methods, which we illustrate through a handful of examples. Then, we present several applications such as a Markov approximation of a stationary distribution, computing the number of primes dividing a given number, and estimating the height in tries, PATRICIA tries, digital search trees and suffix trees.
4.1 The Methods

Let us start by recalling Markov's inequality (2.6) from Chapter 2: For a nonnegative random variable $X$, we can bound the probability that $X$ exceeds $t > 0$ as follows: $\Pr\{X \ge t\} \le E[X]/t$. If, in addition, $X$ is an integer-valued random variable, then after setting $t = 1$ we obtain

$$\Pr\{X > 0\} \le E[X], \tag{4.1}$$

which is the first moment method. We can derive it in another manner (without resorting to the Markov inequality). Observe that

$$\Pr\{X > 0\} = \sum_{k=1}^{\infty} \Pr\{X = k\} \le \sum_{k=0}^{\infty} k \Pr\{X = k\} = E[X].$$

The above implies Boole's inequality (3.6). Indeed, let $A_i$ ($i = 1, \ldots, n$) be events, and set $X = I(A_1) + \cdots + I(A_n)$ where, as before, $I(A) = 1$ if $A$ occurs, and zero otherwise. Since $\{X > 0\} = \bigcup_{i=1}^n A_i$ and $E[X] = \sum_{i=1}^n \Pr\{A_i\}$, inequality (3.6) follows.

In a typical application of (4.1), we expect to show that $E[X] \to 0$, and hence that $X = 0$ occurs with high probability (whp). Throughout, we shall often write whp instead of the more formal "convergence in probability". We illustrate the first moment method on a simple example.

Example 4.1: Maximum of Dependent Random Variables
Let $M_n = \max\{X_1, \ldots, X_n\}$ where the $X_i$ are dependent but identically distributed random variables with distribution function $F(\cdot)$. We assume that the distribution function is such that for all $c > 1$

$$1 - F(cy) = \gamma(y)\,(1 - F(y)),$$

where $\gamma(y) \to 0$ as $y \to \infty$. We shall prove that whp $M_n/a_n \lesssim 1$ (that is, $M_n \le (1+\varepsilon)a_n$ for any $\varepsilon > 0$), where $a_n$ is the smallest solution of the following equation:

$$n(1 - F(a_n)) = 1,$$

provided $a_n \to \infty$ as $n \to \infty$. Indeed, observe that $\{M_n > x\} = \bigcup_{i=1}^n A_i$ where $A_i = \{X_i > x\}$. Thus, by the first moment method,

$$\Pr\{M_n > x\} \le n E[I(A_1)] = n(1 - F(x)).$$

Let us now set $x = (1+\varepsilon)a_n$ for any $\varepsilon > 0$, where $a_n$ is defined as above. Then

$$\Pr\{M_n > (1+\varepsilon)a_n\} \le n(1 - F((1+\varepsilon)a_n)) = \gamma(a_n)\, n (1 - F(a_n)) = \gamma(a_n) \to 0,$$

since $a_n \to \infty$. Thus $M_n \le (1+\varepsilon)a_n$ whp, as desired (cf. also Chapter 3.3.2).
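The first moment bound (4.1) can be checked numerically in a case where everything is known in closed form. The sketch below assumes, purely for illustration, that $X$ is binomial with parameters $n$ and $p$; the function names are hypothetical and not taken from the text.

```python
def prob_positive_binomial(n: int, p: float) -> float:
    """Exact Pr{X > 0} for X ~ Binomial(n, p) (illustrative assumption)."""
    return 1.0 - (1.0 - p) ** n


def first_moment_bound(n: int, p: float) -> float:
    """The first moment bound E[X] = n * p from inequality (4.1)."""
    return n * p


if __name__ == "__main__":
    # Choose p = 1/n^2 so that E[X] = 1/n -> 0, hence X = 0 whp.
    for n in (10, 100, 1000):
        p = 1.0 / n ** 2
        exact = prob_positive_binomial(n, p)
        bound = first_moment_bound(n, p)
        print(f"n={n:5d}  Pr(X>0)={exact:.6f}  E[X]={bound:.6f}")
```

In this regime the bound is nearly tight, which is typical when $E[X] \to 0$.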
The first moment method uses only the knowledge of the average value of $X$ to bound the probability. A better estimate of the probability can be obtained if we know the variance of $X$. Indeed, Chebyshev already noticed that (cf. (2.7))

$$\Pr\{|X - E[X]| \ge t\} \le \frac{\mathrm{Var}[X]}{t^2}.$$

Setting $t = |E[X]|$ in the Chebyshev inequality we arrive at

$$\Pr\{X = 0\} \le \frac{\mathrm{Var}[X]}{E[X]^2}. \tag{4.2}$$

Indeed, we have

$$\Pr\{X = 0\} \le \Pr\{X(X - 2E[X]) \ge 0\} = \Pr\{|X - E[X]| \ge |E[X]|\} \le \frac{\mathrm{Var}[X]}{E[X]^2}.$$
Inequality (4.2) is known as (Chebyshev's version of) the second moment method. We now derive a refinement of (4.2). Shepp [374] proposed to apply Schwarz's inequality (2.8) to obtain the following:

$$E[X]^2 = E[I(X \ne 0)X]^2 \le E[I(X \ne 0)]\,E[X^2] = \Pr\{X \ne 0\}\,E[X^2],$$

which leads to a refinement of the second moment inequality

$$\Pr\{X = 0\} \le \frac{\mathrm{Var}[X]}{E[X^2]}. \tag{4.3}$$

Actually, another formulation of Shepp's inequality due to Chung and Erdős is quite useful. Consider $X_n = I(A_1) + \cdots + I(A_n)$ for a sequence of events $A_1, \ldots, A_n$, and observe that $\{X_n > 0\} = \bigcup_{i=1}^n A_i$. Rewriting (4.3) as

$$\Pr\{X_n > 0\} \ge \frac{E[X_n]^2}{E[X_n^2]},$$

we obtain after some simple algebra

$$\Pr\Big\{\bigcup_{i=1}^n A_i\Big\} \ge \frac{\left(\sum_{i=1}^n \Pr\{A_i\}\right)^2}{\sum_{i=1}^n \Pr\{A_i\} + \sum_{i \ne j} \Pr\{A_i \cap A_j\}}. \tag{4.4}$$

In a typical application, if we are able to prove that $\mathrm{Var}[X_n]/E[X_n]^2 \to 0$, then $X_n \stackrel{pr}{\to} E[X_n]$ as $n \to \infty$, that is, for any $\varepsilon > 0$

$$\lim_{n\to\infty} \Pr\{|X_n - E[X_n]| \ge \varepsilon E[X_n]\} = 0.$$
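To see how the three versions of the second moment method compare, the following sketch evaluates them for a sum of $n$ independent indicators, each occurring with probability $p$. The independence assumption and the function name are illustrative choices made here so that every quantity has a closed form; they are not part of the text.

```python
def second_moment_bounds(n: int, p: float):
    """Compare bounds (4.2)-(4.4) for X = sum of n independent indicators.

    Returns (exact Pr{X = 0}, Chebyshev bound, Shepp bound,
    Chung-Erdos lower bound on Pr{X > 0}).
    """
    mean = n * p
    var = n * p * (1.0 - p)
    second_moment = var + mean ** 2

    exact_zero = (1.0 - p) ** n          # Pr{X = 0}, known exactly here
    chebyshev = var / mean ** 2          # bound (4.2)
    shepp = var / second_moment          # bound (4.3)

    # Chung-Erdos (4.4): S1 = sum Pr{A_i}, S2 = sum over i != j of Pr{A_i and A_j}
    s1 = n * p
    s2 = n * (n - 1) * p * p
    chung_erdos = s1 ** 2 / (s1 + s2)
    return exact_zero, chebyshev, shepp, chung_erdos


if __name__ == "__main__":
    for n, p in [(10, 0.3), (100, 0.05), (1000, 0.01)]:
        print(n, p, second_moment_bounds(n, p))
```

For indicator sums the Chung-Erdős bound is just Shepp's bound rewritten, since $S_1 + S_2 = E[X_n^2]$; the code makes this identity easy to observe.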
We present below two simple examples, delaying a discussion of more sophisticated problems till the applications in Section 4.2.

Example 4.2: Given Pattern Occurrences in a Random Text
Let $H$ be a given string (pattern) of length $m$, and $T$ be a random string of length $n$. Both strings are over a finite alphabet, and we assume that $m = o(n)$. Our goal is to find a typical (in a probabilistic sense) behavior of the number of pattern occurrences in $T$. We denote this by $O_n$. Is it true that $O_n \sim E[O_n]$ whp? We compute the mean $E[O_n]$ and bound the variance $\mathrm{Var}[O_n]$. Let $I_i$ be a random variable equal to 1 if the pattern $H$ occurs at position $i$, and 0 otherwise. Clearly $O_n = I_1 + I_2 + \cdots + I_{n-m+1}$, and hence

$$E[O_n] = \sum_{i=1}^{n-m+1} E[I_i] = (n-m+1)P(H),$$

where $P(H) = \Pr\{T_i^{i+m-1} = H\}$ for some $1 \le i \le n-m+1$, that is, the probability that the pattern $H$ occurs at position $i$ of the text $T$. The variance is also easy to bound since

$$\mathrm{Var}[O_n] = \sum_{i=1}^{n-m+1} \mathrm{Var}[I_i] + 2\sum_{1 \le i < j \le n-m+1} \mathrm{Cov}[I_i, I_j] \le \sum_{i=1}^{n-m+1} E[I_i] + 2\sum_{1 \le i < j \le n-m+1} \mathrm{Cov}[I_i, I_j],$$

where

$$\mathrm{Cov}[I_i, I_j] = E[I_i I_j] - E[I_i]E[I_j]$$

and we used $\mathrm{Var}[I_i] \le E[I_i^2] = E[I_i]$. Let us compute the covariance $\mathrm{Cov}[I_i, I_j]$. Observe that $\mathrm{Cov}[I_i, I_j] = 0$ for $|j - i| > m$, and otherwise $\mathrm{Cov}[I_i, I_j] \le E[I_i I_j] \le E[I_i] = P(H)$. Since every $i$ has at most $m$ indices $j > i$ with $j - i \le m$, we obtain

$$\mathrm{Var}[O_n] \le (n-m+1)P(H) + 2\sum_{i < j,\ j-i \le m} E[I_i] \le (2m+1)(n-m+1)P(H).$$

In view of the above and Chebyshev's inequality we obtain for any $\varepsilon > 0$

$$\Pr\{|O_n/E[O_n] - 1| > \varepsilon\} \le \frac{(2m+1)(n-m+1)P(H)}{\varepsilon^2 (n-m+1)^2 P^2(H)} = \frac{2m+1}{\varepsilon^2 (n-m+1)P(H)} \to 0,$$
provided $m = o(n)$ (we remind the reader that $H$ is given). This proves that $O_n/E[O_n] \to 1$ whp. A fuller discussion of this problem can be found in Section 7.6.2.

In the next example, we consider a deterministic problem and use a randomized technique to solve it. It is adapted from Alon and Spencer [9].

Example 4.3: Erdős Distinct Sum Problem
Consider a set of positive integers $\{x_1, \ldots, x_k\} \subseteq \{1, \ldots, n\}$. Let $f(n)$ denote the maximal $k$ such that there exists a set $\{x_1, \ldots, x_k\}$ with distinct sums (that is, all subset sums are different). The simplest set with distinct sums is $\{2^i : i \le \log_2 n\}$, which also shows that $f(n) \ge 1 + \lfloor \log_2 n \rfloor$. Erdős asked to prove that

$$f(n) \le \log_2 n + C$$

for some constant $C$. We shall show that

$$f(n) \le \log_2 n + \tfrac{1}{2}\log_2 \log_2 n + O(1).$$

Indeed, let $\{x_1, \ldots, x_k\}$ be a distinct sum set, and let $B_i$ be independent random variables taking values 0 and 1 with equal probability $\frac{1}{2}$. Consider

$$X = B_1 x_1 + \cdots + B_k x_k.$$

Certainly,

$$E[X] = \frac{x_1 + \cdots + x_k}{2}, \qquad \mathrm{Var}[X] = \frac{x_1^2 + \cdots + x_k^2}{4} \le \frac{n^2 k}{4}.$$

Using Chebyshev's inequality with $t = \lambda\sqrt{\mathrm{Var}[X]}$ we obtain, after reversing the inequality,

$$\Pr\{|X - E[X]| < \lambda n \sqrt{k}/2\} \ge 1 - \frac{1}{\lambda^2}$$

for some $\lambda > 1$. On the other hand, we should observe that, due to the assumption that $\{x_1, \ldots, x_k\}$ is a distinct sum set, the probability that $X$ has a particular value is equal either to 0 or to $2^{-k}$. Since there are at most $\lambda n\sqrt{k} + 1$ values within $|X - E[X]| < \lambda n\sqrt{k}/2$, we obtain

$$\Pr\{|X - E[X]| < \lambda n\sqrt{k}/2\} \le 2^{-k}\left(\lambda n \sqrt{k} + 1\right).$$

Comparing the above two inequalities leads to

$$n \ge \frac{2^k\left(1 - \lambda^{-2}\right) - 1}{\lambda\sqrt{k}}.$$

After setting the optimal $\lambda = \sqrt{3}$ we arrive at

$$f(n) \le \log_2 n + \tfrac{1}{2}\log_2 \log_2 n + O(1),$$

which completes the proof of the claim.
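For very small $n$ the quantity $f(n)$ can be found by exhaustive search, which gives a feel for how close the lower bound $1 + \lfloor \log_2 n \rfloor$ is. The sketch below is a brute-force illustration only (exponential time); the function names are hypothetical.

```python
from itertools import combinations
from math import floor, log2


def has_distinct_sums(xs) -> bool:
    """True if all subset sums of the positive integers xs are different."""
    sums = {0}  # sums reachable so far (empty subset included)
    for x in xs:
        shifted = {s + x for s in sums}
        if sums & shifted:  # some sum is reachable both with and without x
            return False
        sums |= shifted
    return True


def max_distinct_sum_set_size(n: int) -> int:
    """Brute-force f(n): the largest k with a distinct-sum subset of {1..n}."""
    for k in range(n, 0, -1):
        for xs in combinations(range(1, n + 1), k):
            if has_distinct_sums(xs):
                return k
    return 0


if __name__ == "__main__":
    for n in range(1, 13):
        print(n, max_distinct_sum_set_size(n), 1 + floor(log2(n)))
```

Already at $n = 7$ the powers of two are not optimal: the set $\{3, 5, 6, 7\}$ has distinct sums.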
Finally, we discuss the fourth moment method due to Berger [38]. We shall follow the presentation of Coffman and Lueker [63]. We start with a simple (deterministic) inequality. Consider for $x \ge 0$ the function

$$f(x) = \frac{qx - x^3}{q^{3/2}},$$

where $q > 0$ is a parameter. It is easy to check that the function reaches its maximum value $f_{\max} = 2/(3\sqrt{3})$ at $x = \sqrt{q/3}$. Multiplying $f(x) \le f_{\max}$ by $x \ge 0$ and rearranging, we can write

$$x \ge \frac{3\sqrt{3}}{2\sqrt{q}}\left(x^2 - \frac{x^4}{q}\right).$$

If we now replace $x$ by $|Y|$ for a random variable $Y$ and take the expectation of both sides of the above, then we arrive at

$$E[|Y|] \ge \frac{3\sqrt{3}}{2\sqrt{q}}\left(E[Y^2] - \frac{E[Y^4]}{q}\right).$$

After setting $q = 3E[Y^4]/E[Y^2]$, we finally obtain Berger's fourth moment inequality

$$E[|Y|] \ge \frac{E[Y^2]^{3/2}}{E[Y^4]^{1/2}}. \tag{4.5}$$
We illustrate the fourth moment method on a simple example, while the reader is encouraged to read Berger [38] for a considerably deeper analysis.

Example 4.4: Discrepancy of a Sum of Random Variables
As in [63], consider $Y = X_1 + \cdots + X_n$ where $X_i \in \{-1, 0, 1\}$ are i.i.d. and $\Pr\{X_i = -1\} = \Pr\{X_i = +1\} = p$ while $\Pr\{X_i = 0\} = 1 - 2p$. Actually, we allow $p$ to vary with $n$ provided $np \ge 1$. We wish to find a lower bound for $E[|Y|]$. Observe that $E[Y^2] = 2np$ (since $E[X_i] = 0$ and $E[X_i^2] = 2p$) and

$$E[Y^4] = \sum_{i=1}^n \sum_{j=1}^n \sum_{k=1}^n \sum_{l=1}^n E[X_i X_j X_k X_l].$$

When all indices $i, j, k, l$ are different, then clearly $E[X_i X_j X_k X_l] = 0$. In fact, $E[X_i X_j X_k X_l]$ does not vanish only if all four indices are the same or two of them are equal to one value and the other two are equal to another value. But there are only $O(n^2)$ such index tuples, thus $E[Y^4] = \Theta(n^2 p^2)$ (recall that $np \ge 1$), and by the fourth moment inequality we prove that

$$E[|Y|] = \Omega(\sqrt{np}).$$

In fact, one can prove that $E[|Y|] = \Theta(\sqrt{np})$.
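The fourth moment inequality (4.5) can be verified exactly for the sum in Example 4.4 by computing the distribution of $Y$ through repeated convolution. This is only an illustrative sketch; the helper names are hypothetical and the parameter values in the demo are arbitrary.

```python
def distribution_of_sum(n: int, p: float) -> dict:
    """Exact distribution of Y = X_1 + ... + X_n, X_i in {-1, 0, 1},
    with Pr{X_i = -1} = Pr{X_i = +1} = p and Pr{X_i = 0} = 1 - 2p."""
    step = {-1: p, 0: 1.0 - 2.0 * p, 1: p}
    dist = {0: 1.0}
    for _ in range(n):
        new_dist = {}
        for y, weight in dist.items():
            for dx, px in step.items():
                new_dist[y + dx] = new_dist.get(y + dx, 0.0) + weight * px
        dist = new_dist
    return dist


def moments(n: int, p: float):
    """Return (E|Y|, E[Y^2], E[Y^4]) computed from the exact distribution."""
    dist = distribution_of_sum(n, p)
    e_abs = sum(abs(y) * w for y, w in dist.items())
    e2 = sum(y ** 2 * w for y, w in dist.items())
    e4 = sum(y ** 4 * w for y, w in dist.items())
    return e_abs, e2, e4


def fourth_moment_lower_bound(e2: float, e4: float) -> float:
    """Right-hand side of (4.5): E[Y^2]^(3/2) / E[Y^4]^(1/2)."""
    return e2 ** 1.5 / e4 ** 0.5


if __name__ == "__main__":
    for n, p in [(50, 0.1), (200, 0.01), (400, 0.25)]:
        e_abs, e2, e4 = moments(n, p)
        print(n, p, e_abs, fourth_moment_lower_bound(e2, e4))
```

The printed pairs show $E[|Y|]$ next to its lower bound; both scale like $\sqrt{np}$.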
We summarize this section by presenting all the results derived so far.

Theorem 4.1 Let $X$ be a random variable.

[First Moment Method] If $X$ is nonnegative and integer-valued, then
$$\Pr\{X > 0\} \le E[X].$$

[Chebyshev's Second Moment Method]
$$\Pr\{X = 0\} \le \frac{\mathrm{Var}[X]}{E[X]^2}.$$

[Shepp's Second Moment Method]
$$\Pr\{X = 0\} \le \frac{\mathrm{Var}[X]}{E[X^2]}.$$

[Chung and Erdős' Second Moment Method] For any set of events $A_1, \ldots, A_n$
$$\Pr\Big\{\bigcup_{i=1}^n A_i\Big\} \ge \frac{\left(\sum_{i=1}^n \Pr\{A_i\}\right)^2}{\sum_{i=1}^n \Pr\{A_i\} + \sum_{i \ne j} \Pr\{A_i \cap A_j\}}.$$

[Fourth Moment Method] For an arbitrary random variable $Y$
$$E[|Y|] \ge \frac{E[Y^2]^{3/2}}{E[Y^4]^{1/2}},$$
provided it possesses the first four moments.
4.2 Applications
It is time to apply what we have learned so far to some interesting problems. We start this section with a kth order Markov approximation of a stationary distribution (cf. Section 4.2.1). Then we shall present the Hardy and Ramanujan result [181] concerning the number of primes dividing a large integer $n$ (cf. Section 4.2.2), and finally we consider the height in digital trees and suffix trees (cf. Sections 4.2.3-4.2.6). In Chapter 6 we shall discuss other applications of information theoretical flavor in which the methods of this chapter are used.

4.2.1 Markov Approximation of a Stationary Distribution
In information theory and other applications, one often approximates a stationary process by a kth order Markov chain. Such an approximation can be easily derived through the Markov inequality, hence the first moment method. We discuss it next. The presentation below is adapted from Cover and Thomas [69].
Let $\{X_k\}_{k=-\infty}^{\infty}$ be a two-sided stationary process with the underlying distribution $P(x_1^n) = \Pr\{X_1^n = x_1^n\}$. We consider the random variable $P(X_0^{n-1})$ and estimate its probabilistic behavior through a kth order Markov approximation. The latter is defined as follows:

$$P^k(X_0^{n-1}) := P(X_0^{k-1}) \prod_{i=k}^{n-1} P(X_i \mid X_{i-k}^{i-1})$$

for fixed $k$. Let us evaluate the ratio $P^k(X_0^{n-1})/P(X_0^{n-1})$ by the first moment method. First of all, we have

$$E\left[\frac{P^k(X_0^{n-1})}{P(X_0^{n-1})}\right] = \sum_{x_0^{n-1}} P(x_0^{n-1}) \frac{P^k(x_0^{n-1})}{P(x_0^{n-1})} = \sum_{x_0^{n-1}} P^k(x_0^{n-1}) \le 1.$$

Now, by Markov's inequality, for any $a_n \to \infty$

$$\Pr\left\{\frac{1}{n}\log\frac{P^k(X_0^{n-1})}{P(X_0^{n-1})} \ge \frac{1}{n}\log a_n\right\} = \Pr\left\{\frac{P^k(X_0^{n-1})}{P(X_0^{n-1})} \ge a_n\right\} \le \frac{1}{a_n}.$$

Taking $a_n = n^2$, we conclude by the Borel-Cantelli lemma that the event

$$\frac{1}{n}\log\frac{P^k(X_0^{n-1})}{P(X_0^{n-1})} \ge \frac{1}{n}\log a_n$$

occurs only finitely often with probability one. Hence,

$$\limsup_{n\to\infty} \frac{1}{n}\log\frac{P^k(X_0^{n-1})}{P(X_0^{n-1})} \le 0 \quad (a.s.).$$

Actually, we can also find an upper bound on $P(X_0^{n-1})$. Let us consider $P(X_0^{n-1} \mid X_{-\infty}^{-1})$ as an approximation of $P(X_0^{n-1})$. We shall use Markov's inequality, so we must first evaluate

$$E\left[\frac{P(X_0^{n-1})}{P(X_0^{n-1} \mid X_{-\infty}^{-1})}\right] = E_{X_{-\infty}^{-1}}\left[E_{X_0^{n-1}}\left[\frac{P(X_0^{n-1})}{P(X_0^{n-1} \mid X_{-\infty}^{-1})} \,\Big|\, X_{-\infty}^{-1}\right]\right] = E_{X_{-\infty}^{-1}}\left[\sum_{x_0^{n-1}} \frac{P(x_0^{n-1})}{P(x_0^{n-1} \mid X_{-\infty}^{-1})}\, P(x_0^{n-1} \mid X_{-\infty}^{-1})\right] \le 1.$$

By the same argument as above we conclude that

$$\limsup_{n\to\infty} \frac{1}{n}\log\frac{P(X_0^{n-1})}{P(X_0^{n-1} \mid X_{-\infty}^{-1})} \le 0 \quad (a.s.).$$

In summary, we formulate the following lemma (cf. [8, 69]).

Lemma 4.2 (Algoet and Cover 1988 Sandwich Lemma) Let $P$ be a stationary measure and $P^k$ its kth order Markov approximation, as defined above. Then

$$\limsup_{n\to\infty} \frac{1}{n}\log\frac{P^k(X_0^{n-1})}{P(X_0^{n-1})} \le 0 \quad (a.s.), \tag{4.6}$$

$$\limsup_{n\to\infty} \frac{1}{n}\log\frac{P(X_0^{n-1})}{P(X_0^{n-1} \mid X_{-\infty}^{-1})} \le 0 \quad (a.s.). \tag{4.7}$$

In fact, the proof of the above lemma allows us to conclude the following: Given $\varepsilon > 0$, there exists $N_\varepsilon$ such that for $n > N_\varepsilon$ with probability at least $1 - \varepsilon$ we have

$$\frac{1}{n^2} P^k(X_0^{n-1}) \le P(X_0^{n-1}) \le n^2 P(X_0^{n-1} \mid X_{-\infty}^{-1}).$$
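The first half of the sandwich argument can be checked by enumeration on a toy source. The sketch below assumes a hypothetical two-state stationary Markov chain with one forbidden transition and, for brevity, takes $k = 0$, so that the approximation $P^0$ is the product of the stationary marginals. All names and numbers are illustrative assumptions, not taken from the text.

```python
from itertools import product

# Hypothetical chain: the transition 1 -> 1 is forbidden.
P = [[0.5, 0.5],
     [1.0, 0.0]]
PI = [2.0 / 3.0, 1.0 / 3.0]  # stationary distribution: PI = PI * P


def true_probability(x) -> float:
    """P(x) under the stationary Markov chain."""
    prob = PI[x[0]]
    for a, b in zip(x, x[1:]):
        prob *= P[a][b]
    return prob


def order0_approximation(x) -> float:
    """P^0(x): product of stationary marginals (the case k = 0)."""
    prob = 1.0
    for a in x:
        prob *= PI[a]
    return prob


def expected_ratio(n: int) -> float:
    """E[P^0(X)/P(X)], i.e. the sum of P^0(x) over the support of P."""
    return sum(order0_approximation(x)
               for x in product((0, 1), repeat=n)
               if true_probability(x) > 0.0)


def tail_probability(n: int, a: float) -> float:
    """Pr{P^0(X)/P(X) >= a} under the true measure P."""
    total = 0.0
    for x in product((0, 1), repeat=n):
        px = true_probability(x)
        if px > 0.0 and order0_approximation(x) >= a * px:
            total += px
    return total


if __name__ == "__main__":
    n = 8
    print("E[ratio] =", expected_ratio(n))
    for a in (1.0, 2.0, 4.0):
        print(a, tail_probability(n, a), "<=", 1.0 / a)
```

Because some strings have zero probability under $P$, the expected ratio is strictly below 1 here, and Markov's inequality then gives the $1/a_n$ tail bound used in the proof.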
The "sandwich lemma" was proved by Algoet and Cover [8], who used it to establish the existence of the entropy rate for general stationary processes. We shall return to it in Chapter 6.

4.2.2 Excursion into Number Theory
We present below Turán's proof of the Hardy and Ramanujan result concerning the number of primes dividing a number chosen randomly between 1 and $n$. We denote this number by $\nu(n)$, and we prove that $n$ has close to $\ln\ln n$ prime factors. We use the second moment method, thus we again solve a deterministic problem by a probabilistic tool. We shall follow the exposition of Alon and Spencer [9].

Let throughout this section $p$ denote a prime number. We need two well-known results from number theory (cf. [181, 413]), namely,

$$\sum_{p \le x} \frac{1}{p} = \ln\ln x + C + o(1), \tag{4.8}$$

$$\pi(x) = \frac{x}{\ln x}(1 + o(1)), \tag{4.9}$$

where $C = 0.26149\ldots$ is a constant, and $\pi(x)$ denotes the number of primes smaller than $x$.

We now choose $x$ randomly from the set $\{1, \ldots, n\}$. For a prime $p$ we set $X_p = 1$ if $p \mid x$, and zero otherwise. We ask for the number of prime factors $\nu(x)$ of $x$. Observe that

$$X := \sum_{p \le n} X_p = \nu(x).$$
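Since $x$ can be chosen in $n$ different ways, and in $\lfloor n/p \rfloor$ cases it will be divisible by $p$, we easily find that

$$E[X_p] = \frac{\lfloor n/p \rfloor}{n} = \frac{1}{p} + O\left(\frac{1}{n}\right),$$

and by (4.8) we also have

$$E[X] = \sum_{p \le n}\left(\frac{1}{p} + O\left(\frac{1}{n}\right)\right) = \ln\ln n + O(1).$$

Now we bound the variance

$$\mathrm{Var}[X] = \sum_{p \le n} \mathrm{Var}[X_p] + \sum_{p \ne q \le n} \mathrm{Cov}[X_p, X_q] \le E[X] + \sum_{p \ne q \le n} \mathrm{Cov}[X_p, X_q],$$

since $\mathrm{Var}[X_p] \le E[X_p]$. In the above $p$ and $q$ are two distinct primes. Observe that $X_p X_q = 1$ if and only if $p \mid x$ and $q \mid x$, which further implies that $pq \mid x$. In view of this we have

$$\mathrm{Cov}[X_p, X_q] = E[X_p X_q] - E[X_p]E[X_q] = \frac{\lfloor n/(pq) \rfloor}{n} - \frac{\lfloor n/p \rfloor}{n}\cdot\frac{\lfloor n/q \rfloor}{n} \le \frac{1}{pq} - \left(\frac{1}{p} - \frac{1}{n}\right)\left(\frac{1}{q} - \frac{1}{n}\right) \le \frac{1}{n}\left(\frac{1}{p} + \frac{1}{q}\right).$$

Then by (4.9)

$$\sum_{p \ne q \le n} \mathrm{Cov}[X_p, X_q] \le \frac{2\pi(n)}{n}\sum_{p \le n}\frac{1}{p} \le \frac{2n(\ln\ln n + O(1))}{n \ln n} = O\left(\frac{\ln\ln n}{\ln n}\right) \to 0.$$

Finally, by Chebyshev's inequality with $t = \omega(n)\sqrt{\ln\ln n}$, where $\omega(n) \to \infty$ arbitrarily slowly, we arrive at the main result

$$\Pr\{|\nu(x) - \ln\ln n| > \omega(n)\sqrt{\ln\ln n}\} \le \frac{c}{\omega(n)},$$

where $c$ is a constant. This is summarized in the theorem below.

Theorem 4.3 (Hardy and Ramanujan, 1920; Turán 1934) For $\omega(n) \to \infty$ arbitrarily slowly, the number of $x \in \{1, \ldots, n\}$ such that

$$|\nu(x) - \ln\ln n| > \omega(n)\sqrt{\ln\ln n}$$

is $O(n/\omega(n)) = o(n)$.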
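The concentration of $\nu(x)$ around $\ln\ln n$ is easy to observe empirically with a sieve. The following is a minimal sketch; the function names and the choice of a constant `omega` in place of a slowly growing $\omega(n)$ are illustrative assumptions.

```python
from math import log, sqrt


def distinct_prime_factor_counts(n: int) -> list:
    """Return a list nu with nu[x] = number of distinct primes dividing x."""
    nu = [0] * (n + 1)
    for p in range(2, n + 1):
        if nu[p] == 0:  # no smaller prime divides p, so p is prime
            for multiple in range(p, n + 1, p):
                nu[multiple] += 1
    return nu


def fraction_of_exceptions(n: int, omega: float) -> float:
    """Fraction of x in {1..n} with |nu(x) - ln ln n| > omega * sqrt(ln ln n)."""
    nu = distinct_prime_factor_counts(n)
    center = log(log(n))
    threshold = omega * sqrt(center)
    bad = sum(1 for x in range(1, n + 1) if abs(nu[x] - center) > threshold)
    return bad / n


if __name__ == "__main__":
    n = 10 ** 5
    nu = distinct_prime_factor_counts(n)
    print("mean of nu:", sum(nu[1:]) / n, " ln ln n:", log(log(n)))
    for omega in (0.5, 1.0, 2.0, 3.0):
        print(omega, fraction_of_exceptions(n, omega))
```

Since $\ln\ln n$ grows extremely slowly, the $O(1)$ term in $E[X]$ is clearly visible in the printed mean even for fairly large $n$.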
In the next three subsections, we shall analyze three digital trees, namely tries, PATRICIA tries and digital search trees (in short: DST), and their generalizations such as b-tries. As we discussed in Chapter 1.1, these digital trees are built over a set $\mathcal{X} = \{X^1, \ldots, X^n\}$ of $n$ independent strings of (possibly) infinite length. Each string $X^i$ is generated by either a memoryless source or a Markovian source. We concentrate on estimating the typical length of the height in digital trees. We recall that the height is the longest path in a tree.

4.2.3 Height in Independent Tries
Here, we consider a trie built over $n$ strings generated independently by a probabilistic source. The reader may recall that in Theorem 1.3 we express the height $H_n$ in terms of the alignments $C_{ij}$ as

$$H_n = \max_{1 \le i < j \le n}\{C_{ij}\} + 1, \tag{4.10}$$

where $C_{ij}$ is defined as the length of the longest common prefix of strings $X^i$ and $X^j$. We shall use this relation to assess the typical probabilistic behavior of the height $H_n$ for memoryless and Markovian sources. Throughout this section, we apply the first moment method to bound the height from above, and the Chung-Erdős second moment method to establish a lower bound. The results of the next three subsections are mostly adapted from Devroye [88, 93, 95], Pittel [326] and Szpankowski [399, 400, 401]. However, the reader is advised to study also a series of papers by Arratia and Waterman [20, 21, 22, 24] on similar topics.

Memoryless Source.
Let us first assume that the strings from $\mathcal{X}$ are generated by a memoryless source. To simplify the analysis, we only consider a binary alphabet $\mathcal{A} = \{0, 1\}$ and assume that a memoryless source outputs an independent sequence of 0's and 1's with probabilities $p$ and $q = 1 - p$, respectively. Observe that $P_2 = p^2 + q^2$ is the probability that two independently generated strings agree on a given position (i.e., either both symbols are 0 or both are 1). We also write $Q_2 = 1/P_2$ to simplify our exposition. Clearly, the alignment $C_{ij}$ has a geometric distribution, that is, for all $i, j \in \{1, \ldots, n\}$ and $k \ge 0$

$$\Pr\{C_{ij} = k\} = P_2^k(1 - P_2),$$
and hence $\Pr\{C_{ij} \ge k\} = P_2^k$. We now derive a typical behavior of $H_n$. We start with an upper bound. By the first moment method (or equivalently the Boole inequality), for any nonnegative integer $k$

$$\Pr\{H_n > k\} = \Pr\Big\{\max_{1 \le i < j \le n}\{C_{ij}\} \ge k\Big\} \le n^2 \Pr\{C_{ij} \ge k\},$$
since the number of pairs $(i, j)$ is bounded by $n^2$. Set now $k = \lfloor 2(1+\varepsilon)\log_{Q_2} n \rfloor$ for any $\varepsilon > 0$. Then

$$\Pr\{H_n > 2(1+\varepsilon)\log_{Q_2} n\} \le \frac{n^2}{n^{2(1+\varepsilon)}} = \frac{1}{n^{2\varepsilon}} \to 0.$$

Thus $H_n/(2\log_{Q_2} n) \le 1$ (pr.). We now match the above upper bound and prove that $H_n/(2\log_{Q_2} n) = 1$ (pr.) by showing that $\Pr\{H_n > 2(1-\varepsilon)\log_{Q_2} n\} \to 1$ for any $\varepsilon > 0$. We set $k = \lfloor 2(1-\varepsilon)\log_{Q_2} n \rfloor$. We use the Chung-Erdős formulation of the second moment method with $A_{ij} = \{C_{ij} \ge k\}$. Observe that the sums in (4.4) are over pairs $(i, j)$. Let us define

$$S_1 = \sum_{1 \le i < j \le n} \Pr\{A_{ij}\}, \qquad S_2 = \sum_{(i,j) \ne (l,m)} \Pr\{A_{ij} \cap A_{lm}\},$$

where the summation in $S_2$ is over all pairs $1 \le i, j \le n$, $1 \le l, m \le n$ such that $(i, j) \ne (l, m)$. Then

$$\Pr\{H_n > k\} \ge \frac{S_1^2}{S_1 + S_2}.$$

The following is obvious for $n \ge 2$:

$$S_1 = \sum_{1 \le i < j \le n} \Pr\{A_{ij}\} = \frac{n(n-1)}{2}P_2^k \ge \frac{n^2}{4}P_2^k.$$
The sum $S_2$ is a little harder to deal with. We must consider two cases: (i) all indices $i, j, l, m$ are different, (ii) either $i = l$ (i.e., we have $(i, j)$ and $(i, m)$) or $j = m$ (so we have $(i, j)$ and $(l, j)$). Let us split $S_2 = S_2' + S_2''$ such that $S_2'$ is the summation over all different indices (as in (i) above), and $S_2''$ covers the latter case. Notice that $C_{ij}$ and $C_{lm}$ are independent when the indices are different, hence

$$S_2' \le \frac{n^4}{4}P_2^{2k}.$$

To evaluate $S_2''$ we must compute the probability $\Pr\{C_{ij} \ge k, C_{im} \ge k\}$. But, as in Chapter 3.3.1, we easily see that

$$\Pr\{C_{ij} \ge k, C_{im} \ge k\} = (p^3 + q^3)^k = P_3^k,$$

since the probability of having the same symbol at a given position for three strings $X^i$, $X^j$ and $X^m$ is equal to $P_3 = p^3 + q^3$. But there are no more than $n^3/6$ triples of indices, each giving rise to at most 6 ordered pairs of pairs with one common index, thus

$$S_2'' \le n^3(p^3 + q^3)^k.$$
In summary,

$$S_2 = \sum_{(i,j) \ne (l,m)} \Pr\{C_{ij} \ge k, C_{lm} \ge k\} \le \frac{n^4}{4}P_2^{2k} + n^3(p^3 + q^3)^k.$$

To proceed further, we need to bound the sum $S_2''$. To accomplish this, we need a useful inequality that we discuss next.
Lemma 4.4 For all $s \ge t > 0$ the following holds

$$(p^s + q^s)^{1/s} \le (p^t + q^t)^{1/t}, \tag{4.11}$$

where $0 < q = 1 - p < 1$.
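Before turning to the proof, inequality (4.11) can be sanity-checked numerically on a grid of values. This is only an illustrative check, not a substitute for the proof; the function name is hypothetical.

```python
def power_norm(p: float, s: float) -> float:
    """f(s) = (p^s + q^s)^(1/s) with q = 1 - p, as in Lemma 4.4."""
    q = 1.0 - p
    return (p ** s + q ** s) ** (1.0 / s)


if __name__ == "__main__":
    exponents = [0.5, 1.0, 2.0, 3.0, 5.0, 10.0]
    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        values = [power_norm(p, s) for s in exponents]
        # Each row should be nonincreasing from left to right.
        print(p, [round(v, 4) for v in values])
```

In the analysis of the height only the case $s = 3$, $t = 2$ is needed, which gives $p^3 + q^3 \le (p^2 + q^2)^{3/2}$.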
Proof. Let $f(x) = (p^x + q^x)^{1/x}$ for $x > 0$. Then

$$f'(x) = \frac{f(x)}{x}\left(\frac{p^x \ln p + q^x \ln q}{p^x + q^x} - \ln f(x)\right).$$

For $p \ge q$ define $a = q/p \le 1$. Since $p^x + q^x \ge p^x$, we have $\ln f(x) \ge \ln p$, and therefore

$$\frac{p^x \ln p + q^x \ln q}{p^x + q^x} - \ln f(x) \le \frac{1}{1 + a^x}\ln p + \frac{1}{1 + a^{-x}}\ln q - \ln p = \frac{1}{1 + a^{-x}}(\ln q - \ln p) \le 0.$$

Thus $f'(x) \le 0$ for all $x > 0$, so $f(x)$ is a decreasing function and the lemma is proved. In Figure 4.1 we plot the function $f(s) = (p^s + q^s)^{1/s}$ for $s = 2$ and $s = 3$.

Figure 4.1: Function $(p^3 + q^3)^{1/3}$ (lower curve) plotted against $(p^2 + q^2)^{1/2}$, for $0 \le p \le 1$.

Now, the rest is easy: Since $S_2' \le S_1^2$ (pairs with all indices different correspond to independent events), an application of the Chung-Erdős formula (4.4) leads to

$$\Pr\{H_n > k\} \ge \frac{1}{1/S_1 + S_2'/S_1^2 + S_2''/S_1^2} \ge \frac{1}{4n^{-2}P_2^{-k} + 1 + 16(p^3 + q^3)^k/(nP_2^{2k})} \ge \frac{1}{1 + 4n^{-2}P_2^{-k} + 16n^{-1}P_2^{-k/2}} = \frac{1}{1 + 4n^{-2\varepsilon} + 16n^{-\varepsilon}} = 1 - O(1/n^{\varepsilon}) \to 1,$$

where the third inequality is a consequence of (4.11), which gives $p^3 + q^3 \le P_2^{3/2}$, and the equality that follows it is obtained after setting $k = 2(1-\varepsilon)\log_{Q_2} n$. Thus, we have shown that $H_n/(2\log_{Q_2} n) \ge 1$ (pr.), which completes our proof of

$$\Pr\{2(1-\varepsilon)\log_{Q_2} n \le H_n \le 2(1+\varepsilon)\log_{Q_2} n\} = 1 - O(n^{-\varepsilon}) \tag{4.12}$$

for any $\varepsilon > 0$, where $Q_2^{-1} = P_2 = p^2 + q^2$. In passing we should observe that the above implies the convergence in probability $H_n/(2\log_{Q_2} n) \to 1$, but the rate of convergence is too slow to use the Borel-Cantelli lemma and to conclude almost sure convergence of $H_n/(2\log_{Q_2} n)$. Nevertheless, at the end of this section we provide an argument (after Kingman [231] and Pittel [326]) that allows us to justify such a convergence (cf. Theorem 4.7 for a complete statement of the result).
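The prediction $H_n \approx 2\log_{Q_2} n$ can be compared with a small simulation. The sketch below computes the height directly from relation (4.10), using the fact that the longest common prefix over all pairs is attained by two neighbors in sorted order. The function names, the truncation of strings to a finite length, and the seed are assumptions made for illustration.

```python
import math
import random


def longest_common_prefix(a: str, b: str) -> int:
    """Length of the longest common prefix of two strings."""
    k = 0
    for x, y in zip(a, b):
        if x != y:
            break
        k += 1
    return k


def trie_height(strings) -> int:
    """H_n = max over pairs of C_ij, plus 1, as in (4.10).

    The maximum over all pairs equals the maximum over adjacent
    strings in lexicographic order, so sorting suffices.
    """
    ordered = sorted(strings)
    return max(longest_common_prefix(a, b)
               for a, b in zip(ordered, ordered[1:])) + 1


def random_strings(n: int, length: int, p: float, rng: random.Random):
    """n strings from a binary memoryless source with Pr{0} = p,
    truncated to `length` symbols (an assumption; the text allows
    infinite strings)."""
    return ["".join("0" if rng.random() < p else "1" for _ in range(length))
            for _ in range(n)]


def predicted_height(n: int, p: float) -> float:
    """2 * log_{Q_2} n with Q_2 = 1 / (p^2 + q^2)."""
    q2 = 1.0 / (p * p + (1.0 - p) ** 2)
    return 2.0 * math.log(n) / math.log(q2)


if __name__ == "__main__":
    rng = random.Random(2024)
    for n in (100, 1000, 10000):
        h = trie_height(random_strings(n, 128, 0.5, rng))
        print(n, h, round(predicted_height(n, 0.5), 2))
```

Because the convergence in (4.12) holds only at rate $O(n^{-\varepsilon})$, the observed height fluctuates by a few levels around the prediction even for large $n$.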
Memoryless Source for b-Tries.
We now generalize the above to b-tries in which every (external) node is capable of storing up to $b$ strings. As in Theorem 1.3 we express the height $H_n$ in terms of the alignments $C_{i_1,\ldots,i_{b+1}}$ as

$$H_n = \max_{1 \le i_1 < \cdots < i_{b+1} \le n}\{C_{i_1,\ldots,i_{b+1}}\} + 1,$$

where $C_{i_1,\ldots,i_{b+1}}$ is the length of the longest common prefix of strings $X^{i_1}, \ldots, X^{i_{b+1}}$. We follow the same route as for $b = 1$, so we only outline the proof. First of all, observe that

$$\Pr\{C_{i_1,\ldots,i_{b+1}} \ge k\} = P_{b+1}^k,$$

where $P_{b+1} = p^{b+1} + q^{b+1}$ represents the probability of a match in a given position of $b + 1$ strings $X^{i_1}, \ldots, X^{i_{b+1}}$. As before, we write $Q_{b+1} = 1/P_{b+1}$. Our goal is to prove that $H_n \sim (b+1)\log_{Q_{b+1}} n$ (pr.). To simplify the analysis, we let $D = \{\mathbf{i} = (i_1, \ldots, i_{b+1}) : i_k \ne i_l \text{ whenever } k \ne l\}$ and $A_{\mathbf{i}} = \{C_{i_1,\ldots,i_{b+1}} \ge k\}$.

The upper bound is an easy application of the first moment method. We have for $k = \lfloor (1+\varepsilon)(b+1)\log_{Q_{b+1}} n \rfloor$

$$\Pr\{H_n > k\} = \Pr\Big\{\max_{D}\{C_{i_1,\ldots,i_{b+1}}\} \ge k\Big\} \le n^{b+1}\Pr\{C_{i_1,\ldots,i_{b+1}} \ge (1+\varepsilon)(b+1)\log_{Q_{b+1}} n\} = \frac{1}{n^{\varepsilon(b+1)}} \to 0,$$

which is the desired result.

The lower bound is slightly more intricate due to complicated indexing. But in principle it is nothing else than the second moment method as in the case $b = 1$. Let us denote by $S_1$ and $S_2$ the two sums appearing in the Chung-Erdős formula. The first sum can be estimated as follows:

$$S_1 = \sum_{\mathbf{i} \in D} \Pr\{A_{\mathbf{i}}\} = \binom{n}{b+1}P_{b+1}^k \ge \frac{(n-b)^{b+1}}{(b+1)!}P_{b+1}^k.$$

The second sum $S_2$ we upper bound by

$$S_2 = \sum_{\mathbf{i} \ne \mathbf{j} \in D} \Pr\{A_{\mathbf{i}} \cap A_{\mathbf{j}}\} \le \frac{1}{[(b+1)!]^2}n^{2(b+1)}P_{b+1}^{2k} + \sum_{i=1}^{b}\binom{b+1}{i}n^{2(b+1)-i}P_{2(b+1)-i}^k,$$

where the first term accounts for pairs of index sets with no common index, and the $i$th term of the sum for pairs sharing exactly $i$ indices. Using the Chung-Erdős formula we obtain

$$\Pr\{H_n > k\} \ge \frac{1}{1/S_1 + S_2/S_1^2} \ge \left(\frac{(b+1)!}{(n-b)^{b+1}P_{b+1}^k} + 1 + O(1/n) + \sum_{i=1}^{b}\binom{b+1}{i}\frac{[(b+1)!]^2\, n^{2(b+1)-i}\, P_{2(b+1)-i}^k}{(n-b)^{2(b+1)}P_{b+1}^{2k}}\right)^{-1}.$$

Setting now $k = \lfloor (1-\varepsilon)(b+1)\log_{Q_{b+1}} n \rfloor$ and using inequality (4.11) we finally arrive at

$$\Pr\{H_n > (1-\varepsilon)(b+1)\log_{Q_{b+1}} n\} \ge \frac{1}{c_1 n^{-(b+1)\varepsilon} + 1 + O(1/n) + \sum_{i=1}^{b} c_2(b)n^{-i\varepsilon}} \ge \frac{1}{1 + c(b)n^{-\varepsilon}} = 1 - O(n^{-\varepsilon}),$$

where $c_1$, $c_2(b)$ and $c(b)$ are constants. In summary, as in the case $b = 1$, we conclude that $H_n \sim (b+1)\log_{Q_{b+1}} n$ (pr.), or more precisely for any $\varepsilon > 0$,

$$\Pr\{(1-\varepsilon)(b+1)\log_{Q_{b+1}} n \le H_n \le (1+\varepsilon)(b+1)\log_{Q_{b+1}} n\} = 1 - O\left(\frac{1}{n^{\varepsilon}}\right), \tag{4.13}$$

where $Q_{b+1}^{-1} = P_{b+1} = p^{b+1} + q^{b+1}$ (cf. Theorem 4.7 for a precise formulation of this result).
Markovian Source.
Our last generalization deals with the Markovian model. We now assume that all strings $X^1, \ldots, X^n$ are generated independently by a stationary Markovian source over a finite alphabet $\mathcal{A}$ of size $V$. More precisely, the underlying Markov chain is stationary with the transition matrix $\mathbf{P}$ and the stationary distribution $\pi$ satisfying $\pi = \pi\mathbf{P}$. For simplicity, we also assume $b = 1$. We shall follow the footsteps of the previous analysis once we find formulas for $\Pr\{C_{ij} \ge k\}$ and $\Pr\{C_{ij} \ge k, C_{lm} \ge k\}$. In principle, it is not that difficult. We need, however, to adopt a new approach, which Arratia and Waterman [21] called the "analysis by pattern", Jacquet and Szpankowski [207] named the "string-ruler approach", and which was already used in Pittel [326]. The idea is to choose a given pattern $w \in \mathcal{A}^k$ of length $k$, and measure relationships between strings $X^1, \ldots, X^n$ by comparing them to $w$. In particular, it is easy to see that

$$C_{ij} \ge k \;\Rightarrow\; \exists\, w \in \mathcal{A}^k : \; X_1^i X_2^i \ldots X_k^i = X_1^j X_2^j \ldots X_k^j = w,$$

that is, if $C_{ij} \ge k$, then there exists a word $w$ such that the prefixes of length $k$ of $X^i$ and $X^j$ are the same and equal to $w$. Thus

$$\Pr\{C_{ij} \ge k\} = \sum_{w \in \mathcal{A}^k} [\Pr\{w\}]^2, \tag{4.14}$$

$$\Pr\{C_{ij} \ge k, C_{lm} \ge k\} = \sum_{w \in \mathcal{A}^k}\sum_{u \in \mathcal{A}^k} [\Pr\{w\}]^2[\Pr\{u\}]^2, \qquad i \ne l,\ j \ne m, \tag{4.15}$$

$$\Pr\{C_{ij} \ge k, C_{im} \ge k\} = \sum_{w \in \mathcal{A}^k} [\Pr\{w\}]^3, \qquad i = l,\ j \ne m. \tag{4.16}$$

To complete our analysis, we must compute the probabilities $\Pr\{C_{ij} \ge k\}$ and $\Pr\{C_{ij} \ge k, C_{lm} \ge k\}$ as $k \to \infty$. As shown in (4.14)-(4.16) these probabilities depend on $\Pr\{w\}$ where $w \in \mathcal{A}^k$ is a word of length $k$. Let $w = w_{j_1}w_{j_2}\ldots w_{j_k}$. Then, in the Markovian model

$$\Pr\{w\} = \pi_{j_1}p_{j_1,j_2}p_{j_2,j_3}\cdots p_{j_{k-1},j_k}.$$

Thus, for any $r$ we have

$$\sum_{w \in \mathcal{A}^k} [\Pr\{w\}]^r = \sum_{1 \le j_1, j_2, \ldots, j_k \le V} \pi_{j_1}^r p_{j_1,j_2}^r \cdots p_{j_{k-1},j_k}^r.$$

We can rewrite the above succinctly using the Schur product of matrices. Let $\mathbf{P} \circ \mathbf{P} \circ \cdots \circ \mathbf{P} := \mathbf{P}_{[r]}$ be the $r$-th Schur product (i.e., element-wise product) of $\mathbf{P}$. We also write $\psi = (1, 1, \ldots, 1)$ for the unit vector, and $\pi_{[r]} = (\pi_1^r, \ldots, \pi_V^r)$ for the $r$-th power of the stationary vector. In terms of the matrices just introduced, the above formula becomes

$$\sum_{w \in \mathcal{A}^k} [\Pr\{w\}]^r = \langle \pi_{[r]}, \mathbf{P}_{[r]}^{k-1}\psi \rangle,$$
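Table 4.1: Spectral properties of non-negative matrices

Let $\mathbf{r}$ and $\mathbf{l}$ be, respectively, the right eigenvector and the left eigenvector of a matrix $\mathbf{A}$ associated with the eigenvalue $\lambda$, that is,

$$\mathbf{l}\mathbf{A} = \lambda\mathbf{l}, \qquad \mathbf{A}\mathbf{r} = \lambda\mathbf{r}.$$

To avoid heavy notation, we do not specify whether vectors are column or row vectors since this should be clear from the context. Consider now a nonnegative matrix $\mathbf{A}$ (all elements of $\mathbf{A}$ are nonnegative). We also assume it is irreducible (cf. [191, 316]; the reader may think of $\mathbf{A}$ as a transition matrix of an irreducible Markov chain). Let $\lambda_1, \ldots, \lambda_m$ be the eigenvalues of $\mathbf{A}$ associated with the eigenvectors $\mathbf{r}_1, \ldots, \mathbf{r}_m$. We assume that $|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_m|$.

Theorem 4.5 (Perron-Frobenius) Let $\mathbf{A}$ be a $V \times V$ irreducible nonnegative matrix. Then
- $\mathbf{A}$ has a positive real eigenvalue $\lambda_1$ of the largest value, that is, $\lambda_1 \ge |\lambda_{i \ne 1}|$.
- The eigenvector $\mathbf{r}_1$ associated with $\lambda_1$ has all positive coordinates.
- $\lambda_1$ is of multiplicity one.
- $\lambda_1 > |\lambda_{i \ne 1}|$ if there exists $k$ such that all entries of $\mathbf{A}^k$ are strictly positive or the main-diagonal entries are strictly positive.

Assume first that all eigenvalues are of multiplicity one. Then the left eigenvectors $\mathbf{l}_1, \mathbf{l}_2, \ldots, \mathbf{l}_V$ are orthogonal with respect to the right eigenvectors $\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_V$, that is, $\langle \mathbf{l}_i, \mathbf{r}_j \rangle = 0$ for $i \ne j$, where $\langle \mathbf{x}, \mathbf{y} \rangle$ denotes the inner (scalar) product of $\mathbf{x}$ and $\mathbf{y}$. (Indeed, $\langle \mathbf{l}_i, \mathbf{r}_j \rangle = \frac{\lambda_j}{\lambda_i}\langle \mathbf{l}_i, \mathbf{r}_j \rangle = 0$ since $\lambda_i \ne \lambda_j$.) Setting $\langle \mathbf{l}_i, \mathbf{r}_i \rangle = 1$ for all $1 \le i \le V$, we can write for any vector $\mathbf{x} = \langle \mathbf{l}_1, \mathbf{x} \rangle \mathbf{r}_1 + \sum_{i=2}^{V} \langle \mathbf{l}_i, \mathbf{x} \rangle \mathbf{r}_i$, which yields

$$\mathbf{A}\mathbf{x} = \langle \mathbf{l}_1, \mathbf{x} \rangle \lambda_1 \mathbf{r}_1 + \sum_{i=2}^{V} \langle \mathbf{l}_i, \mathbf{x} \rangle \lambda_i \mathbf{r}_i.$$

Since $\mathbf{A}^k$ has eigenvalues $\lambda_1^k, \lambda_2^k, \ldots, \lambda_V^k$, then (dropping the assumption about the eigenvalues $\lambda_2, \ldots, \lambda_V$ being simple) we arrive at

$$\mathbf{A}^k\mathbf{x} = \langle \mathbf{l}_1, \mathbf{x} \rangle \mathbf{r}_1 \lambda_1^k + \sum_{i=2}^{V} q_i(k)\langle \mathbf{l}_i, \mathbf{x} \rangle \mathbf{r}_i \lambda_i^k,$$

where $q_i(k)$ is a polynomial in $k$ ($q_i(k) \equiv 1$ when the eigenvalues $\lambda_2, \ldots, \lambda_V$ are simple). In particular, for irreducible non-negative matrices, by the above and the Perron-Frobenius theorem, $\mathbf{A}^k\mathbf{x} = \beta\lambda_1^k(1 + O(\rho^k))$ for some $\beta$ and $\rho < 1$.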
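where $\langle \mathbf{x}, \mathbf{y} \rangle$ is the scalar product of vectors $\mathbf{x}$ and $\mathbf{y}$. Assuming that the underlying Markov chain is aperiodic and irreducible, by the Perron-Frobenius Theorem 4.5 (cf. Table 4.1) the above can be represented as

$$\langle \pi_{[r]}, \mathbf{P}_{[r]}^{k-1}\psi \rangle = \lambda_{[r]}^{k-1}\langle \pi_{[r]}, \mathbf{r} \rangle\langle \mathbf{l}, \psi \rangle(1 + O(\rho^k))$$

for some $\rho < 1$, where $\mathbf{l}$ and $\mathbf{r}$ are the left and the right principal eigenvectors of $\mathbf{P}_{[r]}$, and $\lambda_{[r]}$ is its largest eigenvalue. In summary, there exists a constant $\beta$ such that

$$\sum_{w \in \mathcal{A}^k} [\Pr\{w\}]^r = \beta\lambda_{[r]}^k(1 + O(\rho^k)), \tag{4.17}$$

hence

$$\Pr\{C_{ij} \ge k\} = \beta_1\lambda_{[2]}^k(1 + O(\rho^k)), \tag{4.18}$$

$$\Pr\{C_{ij} \ge k, C_{lm} \ge k\} = \beta_1^2\lambda_{[2]}^{2k}(1 + O(\rho^k)), \qquad i \ne l,\ j \ne m, \tag{4.19}$$

$$\Pr\{C_{ij} \ge k, C_{im} \ge k\} = \beta_2\lambda_{[3]}^k(1 + O(\rho^k)), \qquad i = l,\ j \ne m, \tag{4.20}$$

for some constants $\beta_1$ and $\beta_2$. We need the following inequality on $\lambda_{[r]}^{1/r}$ that extends Lemma 4.4 to Markov models and is interesting in its own right.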
Lemma 4.6 (Karlin and Ost, 1985) Let $\mathbf{P}_{[r]}$ be the $r$th order Schur product of a transition matrix $\mathbf{P}$ of an aperiodic irreducible Markov chain. The function

$$F(r) = \lambda_{[r]}^{1/r}$$

is decreasing for $r > 0$, where $\lambda_{[r]}$ is the largest eigenvalue of $\mathbf{P}_{[r]}$.
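Proof. We follow the arguments of Karlin and Ost [219]. From (4.17) we observe that

$$\lim_{k \to \infty}\left(\sum_{w \in \mathcal{A}^k} [\Pr\{w\}]^r\right)^{1/k} = \lambda_{[r]},$$

which is at most 1 when $r \ge 1$. But (either by a probabilistic argument or an algebraic one)

$$\sum_{w \in \mathcal{A}^k} [\Pr\{w\}]^{r+s} \le \sum_{w \in \mathcal{A}^k} [\Pr\{w\}]^r \sum_{w \in \mathcal{A}^k} [\Pr\{w\}]^s,$$

hence together with the above we conclude that

$$\lambda_{[r+s]} \le \lambda_{[r]}\lambda_{[s]}. \tag{4.21}$$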
Furthermore, it is easy to see that $\log\lambda_{[r]}$ is convex as a function of $r$. Let now $f(s) = \log\lambda_{[s]}$ and set for $r < s < 2r$

$$s = \frac{2r - s}{r}\,r + \frac{s - r}{r}\,2r.$$

By convexity

$$f(s) \le \frac{2r - s}{r}f(r) + \frac{s - r}{r}f(2r).$$

But (4.21) implies $f(2r) \le 2f(r)$, hence the above yields

$$\frac{f(s)}{s} \le \frac{f(r)}{r},$$

and therefore $f(r)/r$ is decreasing. This proves the lemma.

The rest is easy, and we imitate our derivation from the above. The reader is asked in Exercise 2 to complete the proof of the following for any $\varepsilon > 0$

$$\Pr\{2(1-\varepsilon)\log_{Q_2} n \le H_n \le 2(1+\varepsilon)\log_{Q_2} n\} = 1 - O(n^{-\varepsilon}), \tag{4.22}$$

where $Q_2^{-1} = \lambda_{[2]}$ is the largest eigenvalue of $\mathbf{P} \circ \mathbf{P} = \mathbf{P}_{[2]}$. Thus $H_n \sim 2\log_{Q_2} n$ (pr.).
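The matrix form of $\sum_{w} [\Pr\{w\}]^r$ and the monotonicity of $\lambda_{[r]}^{1/r}$ can both be checked numerically on a small chain. The sketch below uses plain Python lists and a simple power iteration; the two-state transition matrix, its stationary vector, and all function names are hypothetical choices made for illustration.

```python
from itertools import product


def schur_power(P, r):
    """r-th Schur (element-wise) power P_[r] of the matrix P."""
    return [[x ** r for x in row] for row in P]


def mat_vec(A, v):
    """Matrix-vector product A v for list-of-lists A."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def sum_word_probs_power(pi, P, k, r):
    """Brute force: sum of Pr{w}^r over all words w of length k."""
    total = 0.0
    for w in product(range(len(pi)), repeat=k):
        prob = pi[w[0]]
        for a, b in zip(w, w[1:]):
            prob *= P[a][b]
        total += prob ** r
    return total


def sum_via_schur(pi, P, k, r):
    """Matrix form <pi_[r], P_[r]^(k-1) psi>, psi being the all-ones vector."""
    A = schur_power(P, r)
    v = [1.0] * len(pi)
    for _ in range(k - 1):
        v = mat_vec(A, v)
    return sum((x ** r) * y for x, y in zip(pi, v))


def largest_eigenvalue(A, iterations=500):
    """Power iteration; assumes all entries of A are strictly positive."""
    v = [1.0] * len(A)
    lam = 1.0
    for _ in range(iterations):
        w = mat_vec(A, v)
        lam = max(w)               # v is normalized so that max(v) = 1
        v = [x / lam for x in w]
    return lam


if __name__ == "__main__":
    P = [[0.9, 0.1], [0.4, 0.6]]   # hypothetical transition matrix
    pi = [0.8, 0.2]                # its stationary distribution
    print(sum_word_probs_power(pi, P, 6, 2), sum_via_schur(pi, P, 6, 2))
    for r in (1, 2, 3, 4):
        lam = largest_eigenvalue(schur_power(P, r))
        print(r, lam, lam ** (1.0 / r))
```

For this chain the values $\lambda_{[r]}^{1/r}$ decrease with $r$, in line with Lemma 4.6, and $1/\lambda_{[2]}$ plays the role of $Q_2$ in (4.22).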
Almost Sure Convergence.
For simplicity of derivations, we again assume here $b = 1$. In (4.12), (4.13) and (4.22) we proved that whp the height $H_n$ of a trie is asymptotically equal to $2\log_{Q_2} n$ with the rate of convergence $O(n^{-\varepsilon})$. This rate does not yet justify an application of the Borel-Cantelli Lemma in order to improve the result to almost sure convergence. Nevertheless, we shall show in this section that $H_n/(2\log_{Q_2} n) \to 1$ (a.s.) thanks to the fact that $H_n$ is a nondecreasing sequence. We apply here a method suggested by H. Kesten and reported in Kingman [231]. First of all, observe that for any $n$

$$H_n \le H_{n+1},$$

that is, $H_n$, though random, is a nondecreasing sequence. Furthermore, the rate of convergence $O(n^{-\varepsilon})$ and the Borel-Cantelli Lemma justify almost sure convergence of $H_n/(2\log_{Q_2} n)$ along the exponential skeleton $n = s2^r$ for some integers $s$ and $r$. Indeed, we have

$$\sum_{r=0}^{\infty} \Pr\left\{\left|\frac{H_{s2^r}}{2\log_{Q_2}(s2^r)} - 1\right| \ge \varepsilon\right\} < \infty.$$

We must extend the above to every $n$. Fix $s$. For every $n$ we find such $r$ that

$$s2^r \le n \le (s+1)2^r.$$

Since the logarithm is a slowly varying function and $H_n$ is a nondecreasing sequence, we have

$$\limsup_{n \to \infty}\frac{H_n}{2\log_{Q_2} n} \le \limsup_{r \to \infty}\frac{H_{(s+1)2^r}}{2\log_{Q_2}((s+1)2^r)}\cdot\frac{2\log_{Q_2}((s+1)2^r)}{2\log_{Q_2}(s2^r)} = 1 \quad (a.s.).$$

In a similar manner, we prove

$$\liminf_{n \to \infty}\frac{H_n}{2\log_{Q_2} n} \ge \liminf_{r \to \infty}\frac{H_{s2^r}}{2\log_{Q_2}((s+1)2^r)} = 1 \quad (a.s.).$$

We summarize the main findings of this section in the following theorem.

Theorem 4.7 (Pittel, 1985; Szpankowski, 1991) Consider a b-trie built over $n$ independent strings generated according to a stationary Markov chain with the transition matrix $\mathbf{P}$. Then

$$\lim_{n \to \infty}\frac{H_n}{\ln n} = \frac{b+1}{\ln\lambda_{[b+1]}^{-1}} \quad (a.s.),$$

where $\lambda_{[b+1]}$ is the largest eigenvalue of the $(b+1)$-st order Schur product of $\mathbf{P}$. In particular, in the Bernoulli model $\lambda_{[b+1]} = \sum_{i=1}^{V} p_i^{b+1}$ where $p_i$ is the probability of generating the $i$th symbol from the alphabet $\mathcal{A} = \{1, \ldots, V\}$.
4.2.4 Height in Independent PATRICIA Tries

In Chapter 1.1 we described how to obtain a PATRICIA trie from a regular trie: in PATRICIA we compress a path from the root to a terminal node by avoiding (compressing) unary nodes (cf. Figure 1.1). In this section, we look at the height $H_n$ of PATRICIA and derive its typical probabilistic behavior. To simplify our analysis, we again assume a binary alphabet $A = \{0, 1\}$ with $p \ge q = 1 - p$. Actually, we shall write $p_{\max} = \max\{p, q\}$ and $Q_{\max} = 1/p_{\max}$. We again assume the memoryless model.

We start with an upper bound. Following Pittel [326] we argue that for fixed $k$ and $b$ the event $H_n \ge k + b - 1$ implies that there exist $b$ strings, say $X^{i_1}, \ldots, X^{i_b}$, such that their common prefix is of length at least $k$. In other words,
$$H_n \ge k + b - 1 \implies \exists_{i_1, \ldots, i_b}\; C_{i_1, i_2, \ldots, i_b} \ge k.$$
This is true since in PATRICIA there are no unary nodes. Hence, if there is a path of length at least $k + b - 1$, then there must be $b$ strings sharing a common prefix of length $k$. But, as in the analysis of $b$-tries, we know that
$$\Pr\{C_{i_1, i_2, \ldots, i_b} \ge k\} = P_b^k,$$
where $P_b = p^b + q^b$. Thus, for fixed $b$ and $k(b) = b(1+\varepsilon)\log_{P_b^{-1}} n$
$$\Pr\{H_n \ge k + b - 1\} \le n^b \Pr\{C_{i_1, i_2, \ldots, i_b} \ge k\} = O(n^{-\varepsilon}).$$
The above is true for all values of $b$, hence we now allow $b$ to increase arbitrarily slowly to infinity. Observe then
$$\lim_{b\to\infty} k(b) = (1+\varepsilon)\lim_{b\to\infty} \frac{b\log n}{\log P_b^{-1}} = (1+\varepsilon)\lim_{b\to\infty} \frac{\log n}{\log P_b^{-1/b}} = (1+\varepsilon)\frac{\log n}{\log p_{\max}^{-1}} = (1+\varepsilon)\log_{Q_{\max}} n,$$
since $p_{\max}^b \le P_b \le 2p_{\max}^b$ and hence $\lim_{b\to\infty} P_b^{1/b} = p_{\max}$. In summary, for any $\varepsilon > 0$ we just proved that
$$\Pr\{H_n \ge (1+\varepsilon)\log_{Q_{\max}} n\} = O(n^{-\varepsilon}),$$
which is the desired upper bound.

Not surprisingly, we shall use the second moment method to prove a lower bound. Let "1" occur with probability $p_{\max}$. We show that with high probability the following $k$ strings: $X^{i_1} = 10\ldots$, $X^{i_2} = 110\ldots$, $\ldots$, $X^{i_k} = 11\cdots10\ldots$ appear among all $n$ strings $X^1, \ldots, X^n$, where string $X^{i_j}$ has the first $j$ symbols equal to 1 followed by a 0. Observe that if this is true, then $H_n \ge k + 1$ whp, and we select $k = (1-\varepsilon)\log_{Q_{\max}} n$ to complete the proof.

Let now $\mathbf{i} = (i_1, i_2, \ldots, i_k)$, and define $Z_{\mathbf{i}} = 1$ if $X^{i_1} = 10\ldots$, $X^{i_2} = 110\ldots$, $\ldots$, $X^{i_k} = 11\cdots10\ldots$, and zero otherwise. Let also
$$Z = \sum_{\mathbf{i} \in D} Z_{\mathbf{i}},$$
where $D = \{\mathbf{i} = (i_1, \ldots, i_k) : i_j \ne i_l \text{ whenever } j \ne l\}$. Observe that
$$\Pr\{Z > 0\} \le \Pr\{H_n > k\}.$$
We estimate the left-hand side of the above by the second moment method. First of all
$$E[Z] = \binom{n}{k} p_{\max}^{k(k+1)/2}(1 - p_{\max})^k,$$
since the probability of $X^{i_j}$ is $p_{\max}^{j}(1 - p_{\max})$, and the strings are independent. In a similar manner we compute the variance
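$$\mathrm{Var}[Z] \le E[Z] + \sum_{\mathbf{i} \ne \mathbf{j};\ \mathbf{i}, \mathbf{j} \in D} \mathrm{Cov}[Z_{\mathbf{i}} Z_{\mathbf{j}}].$$
The covariance $\mathrm{Cov}[Z_{\mathbf{i}} Z_{\mathbf{j}}]$ depends on how many indices are the same in $\mathbf{i}$ and $\mathbf{j}$. If, say, $0 < l < k$ indices are the same, then
$$\mathrm{Cov}[Z_{\mathbf{i}} Z_{\mathbf{j}}] \le E[Z_{\mathbf{i}} Z_{\mathbf{j}}] \le \binom{n}{2k-l} p_{\max}^{k(k+1) - l(l+1)/2}(1 - p_{\max})^{2k-l}.$$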
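Thus, by the second moment method
$$\Pr\{Z = 0\} \le \frac{\mathrm{Var}[Z]}{E[Z]^2} \le \frac{1}{E[Z]} + \frac{1}{E[Z]^2}\sum_{l=1}^{k-1} \binom{n}{2k-l} p_{\max}^{k(k+1) - l(l+1)/2}(1 - p_{\max})^{2k-l}. \tag{4.23}$$
Let us now estimate the terms of the above. First of all, for $k = (1-\varepsilon)\ln n/\ln p_{\max}^{-1}$ we obtain
$$E[Z] = \binom{n}{k} p_{\max}^{k(k+1)/2}(1 - p_{\max})^k = \exp\left( (1-\varepsilon)\frac{\ln^2 n}{\ln p_{\max}^{-1}} - \frac{(1-\varepsilon)^2}{2}\frac{\ln^2 n}{\ln p_{\max}^{-1}} - O(\ln n \ln\ln n) \right) = \exp\left( \frac{1}{2}(1-\varepsilon^2)\frac{\ln^2 n}{\ln p_{\max}^{-1}} - O(\ln n \ln\ln n) \right) \to \infty,$$
where $\varepsilon$ is an arbitrarily small positive number. The $l$th term of (4.23) can be estimated as follows:
$$\frac{\binom{n}{2k-l} p_{\max}^{k(k+1) - l(l+1)/2}(1 - p_{\max})^{2k-l}}{E[Z]^2} = \frac{\binom{n}{2k-l}}{\binom{n}{k}^2}\, p_{\max}^{-l(l+1)/2}(1 - p_{\max})^{-l} = O\left( n^{-l} p_{\max}^{-l^2/2}(1 - p_{\max})^{-l} \right) \to 0,$$
where the convergence to zero is true for $l \le k = (1-\varepsilon)\ln n/\ln p_{\max}^{-1}$ by the same arguments as above (indeed, the function $f(l) = n^l p_{\max}^{l^2/2}$ is nondecreasing with respect to $l$ for $l \le \ln n/\ln p_{\max}^{-1}$). Thus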
$$\Pr\{H_n > (1-\varepsilon)\ln n/\ln p_{\max}^{-1}\} \ge 1 - O\left(\exp(-\alpha\ln^2 n + \ln\ln n)\right)$$
for some $\alpha > 0$. In summary, we prove the following result (the almost sure convergence below is established in the same fashion as in the previous section).

Theorem 4.8 (Pittel, 1985) Consider a PATRICIA trie built over $n$ independent strings generated by a memoryless source (Bernoulli model) with $p_i$ being the probability of generating the $i$th symbol from the alphabet $A = \{1, \ldots, V\}$. Then
$$\lim_{n\to\infty} \frac{H_n}{\log n} = \frac{1}{\log p_{\max}^{-1}} \quad \text{(a.s.)},$$
where $p_{\max} = \max\{p_1, p_2, \ldots, p_V\}$.
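The upper bound in this section rests on the fact that $b/\log P_b^{-1}$ approaches $1/\log p_{\max}^{-1}$ as $b$ grows. A short numerical sketch of that limit follows; the function name and the chosen values of $p$ and $b$ are illustrative assumptions, not taken from the text.

```python
import math

def patricia_upper_ratio(p, b):
    """b / ln(1/P_b) with P_b = p^b + q^b, i.e. the coefficient multiplying
    (1 + eps) ln n in k(b) for a binary memoryless source."""
    q = 1.0 - p
    P_b = p ** b + q ** b
    return b / math.log(1.0 / P_b)

p = 0.7
# Limiting coefficient 1 / ln(1/p_max) appearing in Theorem 4.8.
limit = 1.0 / math.log(1.0 / max(p, 1.0 - p))
ratios = [patricia_upper_ratio(p, b) for b in (2, 4, 8, 16, 32, 64)]
for b, r in zip((2, 4, 8, 16, 32, 64), ratios):
    print(f"b={b:2d}  b/ln(1/P_b)={r:.6f}  limit={limit:.6f}")
```

The ratios decrease toward the limit from above, which is why letting $b$ grow slowly tightens the bound from the trie constant (the case $b = 2$ uses $P_2$) to the PATRICIA constant.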
4.2.5 Height in Independent Digital Search Trees

To complete our discussion of digital trees, we shall analyze the height of a digital search tree and prove a result similar to the one presented in Theorem 4.8. We start with a lower bound, and use an approach similar to the one discussed for PATRICIA tries. Actually, the lower bound from Section 4.2.4 works fine for digital search trees. Nevertheless, we provide another derivation so the reader can again see the second moment method at work. Set $Z_0 = n$, and let $Z_i$ for $i \ge 1$ be the number of strings whose prefix of length $i$ consists of 1's. We recall that 1 occurs with probability $p_{\max}$. Let also $k = (1-\varepsilon)\ln n/\ln p_{\max}^{-1}$. Observe that
$$Z_k > 0 \implies H_n > k,$$
where $H_n$ is the height in a digital search tree. We prove that $Z_k > 0$ whp when $k = (1-\varepsilon)\ln n/\ln p_{\max}^{-1}$ using the second moment method. Thus, we must find $E[Z_k]$ and $\mathrm{Var}[Z_k]$. All strings $X^1, \ldots, X^n$ are independent, hence $Z_1$ has the binomial distribution with parameters $n$ and $p_{\max}$, that is, $Z_1 \sim \mathrm{binomial}(n, p_{\max})$. In general, $Z_i \sim \mathrm{binomial}(Z_{i-1}, p_{\max})$. Using conditional expectation we easily prove that (the reader is asked to provide details in Ex. 8)
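$$E[Z_k] = np_{\max}^k, \tag{4.24}$$
$$\mathrm{Var}[Z_k] = np_{\max}^k(1 - p_{\max}^k). \tag{4.25}$$
Then, for $Q_{\max} = 1/p_{\max}$
$$\Pr\{H_n \le (1-\varepsilon)\log_{Q_{\max}} n\} \le \Pr\{Z_k = 0\} \le \frac{\mathrm{Var}[Z_k]}{E[Z_k]^2} \le \frac{1}{np_{\max}^k} = n^{-\varepsilon}.$$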
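The moments (4.24) and (4.25) can be checked by iterating the conditional-expectation step once per level. The sketch below is an illustration only (the function name is hypothetical); it uses the law of total variance for $Z_i \sim \mathrm{binomial}(Z_{i-1}, p)$ and compares the result with the closed forms.

```python
def dst_moments(n, p, k):
    """Mean and variance of Z_k, where Z_0 = n and Z_i ~ binomial(Z_{i-1}, p)."""
    mean, var = float(n), 0.0
    for _ in range(k):
        # Law of total variance:
        #   Var[Z_i] = E[Z_{i-1}] p (1 - p) + p^2 Var[Z_{i-1}]
        var = mean * p * (1.0 - p) + p * p * var
        # Tower property: E[Z_i] = p E[Z_{i-1}]
        mean = p * mean
    return mean, var

n, p, k = 1000, 0.7, 5
mean, var = dst_moments(n, p, k)
closed_mean = n * p ** k                     # (4.24)
closed_var = n * p ** k * (1.0 - p ** k)     # (4.25)
print(mean, closed_mean)
print(var, closed_var)
print("Chebyshev bound on Pr{Z_k = 0}:", var / mean ** 2)
```

Since the variance never exceeds the mean here, the ratio $\mathrm{Var}[Z_k]/E[Z_k]^2$ is at most $1/E[Z_k]$, which is the step used in the displayed bound.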
The Chebyshev bound above shows that for every $\varepsilon > 0$ we have $H_n/\ln n \ge (1-\varepsilon)/\ln p_{\max}^{-1}$ whp.

The upper bound is slightly more complicated. Let us consider a given word $w$ (of possibly infinite length) whose prefix of length $k$ is denoted as $w_k$. Define $T_n(w)$ to be the length of a path in a digital search tree that follows the symbols of $w$ until it reaches the last node of the tree on its path. For example, referring to Figure 1.1 we have $T_4(00000\ldots) = 2$. The following relationship is quite obvious:
$$\Pr\{T_n(w) \ge k\} = \Pr\{T_{n-1}(w) \ge k\} + \Pr\{T_{n-1}(w) = k-1\}\Pr\{w_k\},$$
since when inserting a new string into the tree we either do not follow the path of $w$ (the first term above) or we follow the path, leading to the second term of the recurrence. Observe now that for given $w$
$$T_n(w) \ge T_{n-1}(w),$$
so that $T_n(w)$ is a nondecreasing sequence with respect to $n$. Since, in addition, $\Pr\{w_k\} \le p_{\max}^k$ and $\Pr\{T_{n-1}(w) = k-1\} \le \Pr\{T_{n-1}(w) \ge k-1\}$, we can iterate the above recurrence to get
$$\Pr\{T_n(w) \ge k\} \le \Pr\{T_{n-2}(w) \ge k\} + 2p_{\max}^k \Pr\{T_{n-1}(w) \ge k-1\},$$
so that after iterations with respect to $n$ the above leads to
$$\Pr\{T_n(w) \ge k\} \le np_{\max}^k \Pr\{T_{n-1}(w) \ge k-1\}.$$
Iterating now with respect to $k$ yields
$$\Pr\{T_n(w) \ge k\} \le n^k p_{\max}^{k^2}.$$
Set now $k = (1+\varepsilon)\log n/\log p_{\max}^{-1}$ to obtain
$$\Pr\{H_n > (1+\varepsilon)\log_{p_{\max}^{-1}} n\} \le \sum_{w \in A^k} \Pr\{T_n(w) \ge k\} \le 2^k n^k p_{\max}^{k^2} = \exp\left(k\ln 2 + k\ln n - k^2\ln p_{\max}^{-1}\right) = \exp\left(-k(1+\varepsilon)\ln n + k\ln n + k\ln 2\right) \le \exp(-\alpha\ln^2 n) \to 0$$
for some $\alpha > 0$ as long as $\varepsilon > \ln 2/\ln n$.
Theorem 4.9 (Pittel, 1985) Consider a digital search tree built over $n$ independent strings generated by a memoryless source (Bernoulli model) with $p_i$ being the probability of generating the $i$th symbol from the alphabet $A = \{1, \ldots, V\}$. Then
$$\lim_{n\to\infty} \frac{H_n}{\log n} = \frac{1}{\log p_{\max}^{-1}} \quad \text{(a.s.)},$$
where $p_{\max} = \max\{p_1, p_2, \ldots, p_V\}$.
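Theorem 4.9 can be explored empirically with a small simulation. The sketch below is only an illustration under stated assumptions: a binary alphabet, nodes identified by the path leading to them, the root at depth 0, and string symbols drawn lazily (which is harmless because the symbols are independent and only the consumed prefix matters). The function name, seed and parameters are hypothetical.

```python
import math
import random

def dst_height(n, p, rng):
    """Height of a digital search tree built from n random binary strings.

    Symbol '1' has probability p. Each node is identified by its path string.
    """
    nodes = set()
    height = 0
    for _ in range(n):
        path = ""
        # Follow the string's symbols past occupied nodes.
        while path in nodes:
            path += "1" if rng.random() < p else "0"
        nodes.add(path)                  # the first free slot stores the string
        height = max(height, len(path))
    return height

rng = random.Random(1)
n, p = 2000, 0.7
h = dst_height(n, p, rng)
predicted = math.log(n) / math.log(1.0 / max(p, 1.0 - p))
print("simulated height:", h, " first-order prediction:", round(predicted, 2))
```

For moderate $n$ the simulated height typically deviates from the first-order term by an additive lower-order amount, so the comparison is a plausibility check rather than a verification of the limit.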
4.2.6 Height in a Suffix Tree

Finally, we consider the height of a suffix tree built from the first $n$ suffixes of a (possibly infinite) string $X_1^\infty$ generated by a memoryless source. As before, we assume a binary alphabet with $p$ and $q = 1 - p$ being the probabilities of generating the symbols. We denote by $X(i) = X_i^\infty$ the $i$th suffix, where $1 \le i \le n$. (We recall that a suffix tree is a trie built from dependent strings, namely, $X(1), \ldots, X(n)$.) We restrict the analysis to the memoryless model. The reader is asked in Exercise 7 to extend this analysis to the mixing model.
The analysis of the height follows the same footsteps as in the case of tries (i.e., we use the first moment method to prove an upper bound and the second moment method to prove a lower bound) except that we must find the probability law governing the self-alignment $C_{ij}$ that represents the length of the longest prefix common to $X(i)$ and $X(j)$. We recall that the height $H_n$ is related to $C_{ij}$ by $H_n = \max_{i \ne j}\{C_{ij}\} + 1$. We need to know the probability of $A_{ij} = \{C_{ij} \ge k\}$. Let $d = |i - j|$ and consider the suffix $X(i)$ and the suffix $X(i+d)$. Below, we assume that $j = i + d$. When $d \ge k$, then $X(i)$ and $X(i+d)$ are independent in the memoryless model, hence as in tries we easily find that
$$\Pr\{C_{i,i+d} \ge k\} = P_2^k, \qquad d \ge k,$$
where $P_2 = p^2 + q^2$. The problem arises when $d < k$. We need to identify conditions under which $k$ symbols starting at position $i$ are the same as $k$ symbols following position $i + d$, that is,
$$X_i^{i+k-1} = X_{i+d}^{i+d+k-1},$$
where $X_i^{i+k-1}$ and $X_{i+d}^{i+d+k-1}$ are $k$-length prefixes of suffixes $X(i)$ and $X(i+d)$, respectively. The following simple combinatorial lemma provides a solution.

Lemma 4.10 (Lothaire, 1982) Let $X_1^\infty$ be a string whose $i$th and $j$th suffixes are $X(i) = X_i^\infty$ and $X(j) = X_j^\infty$, respectively, where $j - i = d > 0$. Let also $Z$ be the longest common prefix of length $k \ge d$ of $X(i)$ and $X(j)$. Then there exists a word $w \in A^d$ of length $d$ such that
$$Z = w^{\lfloor k/d \rfloor}\bar{w}, \tag{4.26}$$
where $\bar{w}$ is a prefix of $w$, and $w^l$ is the string resulting from the concatenation of $l = \lfloor k/d \rfloor$ copies of the word $w$.

We leave the formal proof of the above lemma to the reader (cf. Exercise 9). Its meaning, however, should be quite clear. To illustrate Lemma 4.10, let us assume that $X_1^{k+d} = X_1^{3d+2} = x_1 x_2 \ldots x_d\, x_1 x_2 \ldots x_d\, x_1 x_2 \ldots x_d\, x_1 x_2$, where $k = 2d + 2$ and $d > 2$. One easily identifies $w = x_1 x_2 \ldots x_d$ and $\bar{w} = x_1 x_2$. The common prefix $Z$ of $X_1^{k}$ and $X_{d+1}^{k+d}$ can be represented as $Z_1^{2d+2} = w^2\bar{w}$, as stated in Lemma 4.10.
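The periodic structure asserted by Lemma 4.10 can be demonstrated on concrete strings. The following sketch is illustrative only: the function name is hypothetical and positions are 0-based (the text uses 1-based positions). It extracts $w$ and $\bar{w}$ whenever two windows at distance $d$ coincide.

```python
def periodic_decomposition(x, i, d, k):
    """If x[i:i+k] == x[i+d:i+d+k] with k >= d > 0, return (w, l, w_bar) such
    that x[i:i+k] == w * l + w_bar as in (4.26); otherwise return None."""
    if not (0 < d <= k) or i + d + k > len(x):
        raise ValueError("need 0 < d <= k and both windows inside x")
    z = x[i:i + k]
    if z != x[i + d:i + d + k]:
        return None                    # the two suffixes do not share k symbols
    w, l, r = z[:d], k // d, k % d
    w_bar = w[:r]                      # prefix of w of length r = k - d*floor(k/d)
    assert z == w * l + w_bar          # the content of Lemma 4.10
    return w, l, w_bar

# The illustration from the text with d = 3, k = 2d + 2 = 8 and w = "abc".
x = "abc" * 3 + "ab"
print(periodic_decomposition(x, 0, 3, 8))
```

The internal assertion never fails when the windows match, because equality of the two windows forces the common prefix to have period $d$.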
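In view of the above, we can now derive a formula for the probability $\Pr\{C_{i,i+d} \ge k\}$ which is pivotal to our analysis of the height. An application of Lemma 4.10 leads to
$$\Pr\{C_{i,i+d} \ge k\} = \sum_{w \in A^{k \wedge d}} \Pr\{w^{\lfloor k/d \rfloor + 1}\bar{w}\} \tag{4.27}$$
$$= \left(p^{\lfloor k/d \rfloor + 1} + q^{\lfloor k/d \rfloor + 1}\right)^{d-r}\left(p^{\lfloor k/d \rfloor + 2} + q^{\lfloor k/d \rfloor + 2}\right)^{r}, \tag{4.28}$$
where $r = k - d\lfloor k/d \rfloor$ and $k \wedge d = \min\{d, k\}$ (so word $w$ has length $k \wedge d$). Observe that by restricting the length of $w$ to $k \wedge d$, we cover also the case $k < d$, in which case we have $\Pr\{C_{i,i+d} \ge k\} = (p^2 + q^2)^k$.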
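Formula (4.28) can be checked by brute force for small $d$ and $k$. The sketch below (function names are hypothetical) enumerates all binary windows of length $k + d$, sums the probabilities of those whose first $k$ symbols agree with the $k$ symbols starting $d$ positions later, and compares the total with the right-hand side of (4.28).

```python
from itertools import product

def overlap_prob_bruteforce(p, d, k):
    """Pr{C_{i,i+d} >= k} by enumerating all binary windows of length k + d."""
    q, total = 1.0 - p, 0.0
    for x in product((0, 1), repeat=k + d):
        if x[:k] == x[d:d + k]:
            ones = sum(x)
            total += p ** ones * q ** (k + d - ones)
    return total

def overlap_prob_formula(p, d, k):
    """Right-hand side of (4.28)."""
    q = 1.0 - p
    l, r = k // d, k % d               # l = floor(k/d), r = k - d*l
    return ((p ** (l + 1) + q ** (l + 1)) ** (d - r)
            * (p ** (l + 2) + q ** (l + 2)) ** r)

for d, k in [(1, 3), (2, 5), (3, 4), (4, 2), (3, 3)]:
    print(d, k, overlap_prob_bruteforce(0.7, d, k), overlap_prob_formula(0.7, d, k))
```

The pair $(d, k) = (4, 2)$ exercises the case $k < d$, where the formula reduces to $(p^2 + q^2)^k$.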
To see why (4.27) implies (4.28), say for $d < k$, we observe with $l = \lfloor k/d \rfloor$ that $d - r$ symbols of $w$ must be repeated $l + 1$ times in $w^{l+1}$, while the first $r$ symbols of $w$ (those forming $\bar{w}$) must occur $l + 2$ times. The probability of having the same symbol at a given position in $l + 1$ strings is $p^{l+1} + q^{l+1}$. Thus (4.28) follows.

Before we proceed, we need an inequality on (4.28) which we formulate in the form of a lemma.

Lemma 4.11 For $d < k$ we have
$$\Pr\{C_{i,i+d} \ge k\} \le (p^2 + q^2)^{(k+d)/2} = P_2^{\frac{k+d}{2}}. \tag{4.29}$$

Proof. The result follows from Lemma 4.4 (cf. (4.11)) and (4.28). Indeed, let $P_l = p^l + q^l$. Then
$$\Pr\{C_{i,i+d} \ge k\} = P_{l+1}^{d-r} P_{l+2}^{r} \le P_2^{\frac{1}{2}[(d-r)(l+1) + r(l+2)]} = P_2^{\frac{k+d}{2}},$$
since $k = dl + r$.

Now, we are ready to prove an upper bound for the height. We use the Boole inequality and the above to obtain
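$$\Pr\{\max_{i,d}\{C_{i,i+d}\} \ge k\} \le n\left( \sum_{d=1}^{k} \Pr\{C_{i,i+d} \ge k\} + \sum_{d=k+1}^{n} \Pr\{C_{i,i+d} \ge k\} \right) \le n\left( \sum_{d=1}^{k} P_2^{\frac{k+d}{2}} + \sum_{d=k+1}^{n} P_2^{k} \right) \le nP_2^{\frac{k+1}{2}}\left(1 - \sqrt{P_2}\right)^{-1} + n^2 P_2^{k},$$
where the second inequality is a consequence of Lemma 4.11. From the above we conclude that
$$\Pr\{H_n > 2(1+\varepsilon)\log_{Q_2} n\} = \Pr\{\max_{i,d}\{C_{i,i+d}\} \ge 2(1+\varepsilon)\log_{Q_2} n\} \le \frac{c}{n^{\varepsilon}},$$
where (as before) $Q_2 = P_2^{-1}$ and $c$ is a constant. Thus whp we have $H_n \le 2(1+\varepsilon)\log_{Q_2} n$.

For the lower bound, we use the Chung and Erdős second moment method, as one can guess. We need, however, to overcome some difficulties. Let $D = \{(i,j) : |i - j| > k\}$. Observe that
$$\Pr\{H_n > k\} = \Pr\Big\{\bigcup_{i \ne j} A_{ij}\Big\} \ge \Pr\Big\{\bigcup_{(i,j) \in D} A_{ij}\Big\} \ge \frac{S_1^2}{S_1 + S_2},$$
where
$$S_1 = \sum_{(i,j) \in D} \Pr\{A_{ij}\}, \qquad S_2 = \sum_{(i,j) \ne (l,m) \in D} \Pr\{A_{ij} \cap A_{lm}\}.$$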
The first sum is easy to compute from what we have learned so far. Since $|i - j| > k$ in $D$ we immediately find that
$$S_1 \ge (n^2 - (2k+1)n)P_2^k.$$
To compute $S_2$ we split it into three terms, namely $S_{21}$, $S_{22}$ and $S_{23}$, such that the summation in $S_{2i}$ ($i = 1, 2, 3$) is over the set $D_i$ defined as follows:
$$D_1 = \{(i,j), (l,m) \in D : \min\{|l-i|, |l-j|\} \ge k \text{ and } \min\{|m-i|, |m-j|\} \ge k\},$$
$$D_2 = \{(i,j), (l,m) \in D : \min\{|l-i|, |l-j|\} \ge k \text{ and } \min\{|m-i|, |m-j|\} < k, \text{ or } \min\{|l-i|, |l-j|\} < k \text{ and } \min\{|m-i|, |m-j|\} \ge k\},$$
$$D_3 = \{(i,j), (l,m) \in D : \min\{|l-i|, |l-j|\} < k \text{ and } \min\{|m-i|, |m-j|\} < k\}.$$
Using the above and similar arguments as in the case of tries, we obtain
$$S_{21} = \sum_{(i,j),(l,m) \in D_1} \Pr\{A_{ij} \cap A_{lm}\} \le n^4 P_2^{2k},$$
$$S_{22} = \sum_{(i,j),(l,m) \in D_2} \Pr\{A_{ij} \cap A_{lm}\} \le 8k \sum_{(i,j),(i,m) \in D_2} \Pr\{A_{ij} \cap A_{im}\} \le 8kn^3 P_3^k \le 8kn^3 P_2^{\frac{3}{2}k},$$
$$S_{23} = \sum_{(i,j),(l,m) \in D_3} \Pr\{A_{ij} \cap A_{lm}\} \le 16k^2 \sum_{(i,j),(i,j) \in D_3} \Pr\{A_{ij} \cap A_{ij}\} \le 16k^2 n^2 P_2^k,$$
where, as always, $P_3 = p^3 + q^3$. The last inequality for $S_{22}$ follows from Lemma 4.4. The rest is a matter of algebra. Proceeding as in Section 4.2.3 we arrive at
$$\Pr\{H_n > 2(1-\varepsilon)\log_{Q_2} n\} \ge \frac{1}{1 + c_1 n^{-\varepsilon}\log n + c_2 n^{-2\varepsilon}} = 1 - O\left(\frac{\log n}{n^{\varepsilon}}\right),$$
for some constants $c_1, c_2$. As before we can extend this convergence in probability to almost sure convergence by considering an exponential skeleton $n = s2^r$. Thus, we just proved the following interesting result.

Theorem 4.12 (Devroye, Rais and Szpankowski, 1992) Consider a suffix tree built from $n$ suffixes of a string generated by a memoryless source with $p_i$ being the probability of generating the $i$th symbol from the alphabet $A = \{1, \ldots, V\}$. Then
$$\lim_{n\to\infty} \frac{H_n}{\log n} = \frac{2}{\log P_2^{-1}} \quad \text{(a.s.)},$$
where $P_2 = \sum_{i=1}^{V} p_i^2$, in agreement with the bounds $2(1 \pm \varepsilon)\log_{Q_2} n$ derived above.

In passing we point out that results of Sections 4.2.3-4.2.6 can be extended to the mixing probabilistic model (cf. [326, 400, 401]). The reader is asked to try to prove such extensions in Exercises 6-7. Additional details can be also found in Chapter 6.
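The relation $H_n = \max_{i \ne j}\{C_{ij}\} + 1$ makes the suffix tree height easy to simulate without building the tree. The sketch below is a naive illustration under stated assumptions: the function name, seed and padding length are hypothetical, the source is binary and memoryless, and the text is padded so that the first $n$ suffixes have room for long common prefixes.

```python
import math
import random

def suffix_tree_height(text, n):
    """max_{i != j} C_ij + 1 over the first n suffixes of text (naive search)."""
    def has_repeat(k):
        # True if two of the first n suffixes share a prefix of length k.
        seen = set()
        for i in range(n):
            piece = text[i:i + k]
            if piece in seen:
                return True
            seen.add(piece)
        return False

    k = 0
    while has_repeat(k + 1):   # a repeat of length k+1 implies one of length k
        k += 1
    return k + 1

rng = random.Random(7)
n, p = 2000, 0.7
text = "".join("1" if rng.random() < p else "0" for _ in range(n + 200))
h = suffix_tree_height(text, n)
P2 = p ** 2 + (1.0 - p) ** 2
print("simulated height:", h, " 2 log n / log(1/P2):", round(2 * math.log(n) / math.log(1 / P2), 2))
```

As with the other heights, the comparison with the first-order term is only indicative for moderate $n$.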
4.3 Extensions and Exercises

4.1 Prove the following alternative formulation of the second moment method. Let $\{A_i\}_{i=1}^n$ be a set of identically distributed events such that $n\Pr\{A_i\} \to \infty$. If
$$\sum_{i \ne j} \frac{\Pr\{A_i \mid A_j\}}{n\Pr\{A_i\}} \to 1$$
as $n \to \infty$, then $\Pr\{\bigcup_{i=1}^n A_i\} \to 1$.

4.2 Consider the height of a $b$-trie in a Markovian model and prove
$$\Pr\{(b+1)(1-\varepsilon)\log_{Q_{b+1}} n \le H_n \le (b+1)(1+\varepsilon)\log_{Q_{b+1}} n\} = 1 - O(n^{-\varepsilon}),$$
where $Q_{b+1}^{-1} = \lambda_{[b+1]}$ is the largest eigenvalue of $P_{[b+1]} = P \circ \cdots \circ P$.

4.3 Find another proof for the lower bound in the derivation of the height in PATRICIA tries.
4.4 △ Extend the analysis of PATRICIA tries to a Markovian source. What is the equivalent of $p_{\max}$ in the Markovian model? (cf. Chapter 6.)

4.5 △ Extend the analysis of digital search trees to a Markovian source.

4.6 △ (Pittel 1985) Establish typical behaviors of heights in all three digital trees in the mixing model (cf. Chapter 6).

4.7 △ (Szpankowski 1993) Extend the analysis of the height in a suffix tree (cf. Section 4.2.6) to the mixing model.

4.8 Prove (4.24) and (4.25).

4.9 Prove Lemma 4.10.

4.10 △ (Pittel 1985) Consider the fill-up level $F_n$ (cf. Section 1.1) in tries, PATRICIA tries and digital search trees. Prove the following theorem:

Theorem 4.13 (Pittel, 1985) Consider tries, PATRICIA tries and digital search trees built over $n$ independent strings generated by a memoryless source with $p_i$ being the probability of generating the $i$th symbol from the alphabet $A = \{1, \ldots, V\}$. Then
$$\lim_{n\to\infty} \frac{F_n}{\log n} = \frac{1}{\log p_{\min}^{-1}} \quad \text{(a.s.)},$$
where $p_{\min} = \min\{p_1, p_2, \ldots, p_V\}$.
4.11 ▽ Consider again digital trees as above, including suffix trees. Using the same notation as in Exercise 10 prove or disprove the following result.

Theorem 4.14 (Pittel, 1985) Consider digital trees built over $n$ independent strings generated by a memoryless source with $p_i$ being the probability of generating the $i$th symbol from the alphabet $A = \{1, \ldots, V\}$. Then
$$\lim_{n\to\infty} \frac{s_n}{\log n} = \frac{1}{\log p_{\min}^{-1}} \quad \text{(a.s.)},$$
where $p_{\min} = \min\{p_1, p_2, \ldots, p_V\}$.

? Can this theorem be extended to suffix trees?
4.12 Consider the following "comparison" of two random strings. Let $X_1^n$ and $Y_1^m$ of length $n$ and $m < n$, respectively, be generated by a binary memoryless source with probabilities $p$ and $q = 1 - p$ of the two symbols. Define $C_i$ to be the number of matches between $X_i^{i+m-1}$ and $Y$, that is,
$$C_i = \sum_{j=1}^{m} \mathrm{equal}(X_{i+j-1}, Y_j),$$
where $\mathrm{equal}(x, y)$ is one when $x = y$ and zero otherwise (i.e., the Hamming distance). Define $M_{m,n} = \max_{1 \le i \le n-m+1}\{C_i\}$. Prove that if $\log n = o(m)$ then
$$\lim_{n\to\infty} \frac{M_{m,n}}{m} = P_2 \quad \text{(a.s.)},$$
where $P_2 = p^2 + q^2$.

4.13 △ (Atallah, Jacquet and Szpankowski 1993) Consider the same problem as in Exercise 12 above. Let $\log n = \Theta(m)$ and $P_3 = p^3 + q^3$. Define
$$\beta = \frac{(P_2 - P_3)(P_2 - 3P_2^2 + 2P_3)}{6(P_3 - P_2^2)}.$$
Prove that if … then $M_{m,n} - mP_2 \sim \sqrt{2m(P_2 - P_3)\log n}$ (pr.) as $n, m \to \infty$.

4.14 Consider an $n$-dimensional unit cube $I_n = \{0, 1\}^n$, i.e., a binary sequence $(x_1, \ldots, x_n)$ is regarded as a vertex of the unit cube in $n$ dimensions. Let $T_n$ be the cover time of a random walk, that is, the time required by a simple walk to visit all $2^n$ vertices of the cube. Prove that $T_n \sim (1 + n)2^n \log 2$ whp.
Iof pattern occurrences in a random text of length n). Usually, it is not di

4.1 The Methods

nE I (A1 )] = n(1 F (x)):

n(1 F ((1 + ")an )) = (an )n(1 F (an )) = (an ) ! 0;

since $a_n \to \infty$. Thus $M_n \le (1+\varepsilon)a_n$ whp, as desired (cf. also Chapter 3.3.2).

$$E[X]^2 = E[I(X \ne 0)X]^2 \le E[I(X \ne 0)]\,E[X^2] = \Pr\{X \ne 0\}\,E[X^2];$$

we obtain after some simple algebra

$$\Pr\Big\{\bigcup_{i=1}^{n} A_i\Big\} \ge \frac{\left(\sum_{i=1}^{n} \Pr\{A_i\}\right)^2}{\sum_{i=1}^{n} \Pr\{A_i\} + \sum_{i \ne j} \Pr\{A_i \cap A_j\}}.$$

(n m + 1 + 2m2 )P (H ) "2 (n m + 1)2 P 2 (H ) 1 2 2 + "2 (n m m 1)2 P (H ) ! 0; "2 (n m + 1)P (H ) +

whp. A fuller discussion of this problem can be found in Section 7.6.2.

f (n) log2 n + log2 log2 n + O(1):

Comparing the above two inequalities leads to

After setting the optimal = 3 we arrive at

f (n) log2 n + log2 log2 n + O(1);

After setting q = 3E Y 4 ]=E Y 2 ], we nally obtain Berger's fourth moment inequality

n n n n XX XX i=1 j =1 k=1 l=1

E jY j] = (pnp): In fact, one can prove that E jY j] = (pnp).

We summarize this section by presenting all the results derived so far.

4.2.1 Markov Approximation of a Stationary Distribution

3 2 n 1) X P (x0 n 1 1 7 = EX 1 6 1 4 n 1 P (x jX 1 )5 1: P (x0 1jX 1 ) 0 x0 1

4.2.2 Excursion into Number Theory

and by (4.8) we also have

slowly, we arrive at the main result

Prfj (x) ln ln nj > !(n) ln ln ng

where c is a constant. This is summarized in the theorem below.

4.2.3 Height in Independent Tries

PrfAij g; PrfAij \ Alm g;

where the summation in S2 is over all pairs 1 i; j Then

The following is obvious for n 2

n, 1 l; m n such that (i; j ) 6= (l; m). 2 kg S S1 S : 1+ 2

PrfAij g = n(n 1)P2k 2

kg. But, as in Chap-

k; Ci;m kg = (p3 + q3 )k = P3k ;

4 k; Clm kg n P22k + n3 (p3 + q3 )k : 4

Lemma 4.4 For all $s \ge t > 0$ the following holds

Proof. Let $f(x) = (p^x + q^x)^{1/x}$ for $x > 0$, and then

Using the Chung-Erdős formula we obtain
$$\Pr\{H_n > k\} \ge \frac{1}{1/S_1 + S_2/S_1^2}.$$

Table 4.1: Spectral properties of non-negative matrices lA = l ; Ar = r:

diagonal entries are strictly positive.

k k 1 2](1 + O( )); 2 2k k 1 2](1 + O( )) k k 2 3](1 + O( ))

(4.18) (4.19) (4.20)

is decreasing for $r > 0$, where $\lambda_{[r]}$ is the largest eigenvalue of $P_{[r]}$.

But (either by a probabilistic argument or an algebraic one)

hence together with the above we conclude that
