Concentration inequalities and martingale inequalities: a survey
Fan Chung
Linyuan Lu
Abstract
We examine a number of generalized and extended versions of concentration inequalities and martingale inequalities. These inequalities are effective for analyzing processes with quite general conditions, as illustrated by examples for an infinite Pólya process and webgraphs.
Introduction
analysis in the sparse part of the graph. For the setup of the martingales, a uniform upper bound for the incremental changes is often too poor to be of any use. Furthermore, the graph is dynamically evolving, and therefore the probability space changes at each tick of the clock.
In spite of these difficulties, it is highly desirable to extend the classical concentration inequalities and martingale inequalities so that rigorous analysis for
random graphs with general degree distributions can be carried out. Indeed, in
the course of studying general random graphs, a number of variations and generalizations of concentration inequalities and martingale inequalities have been
scattered around. It is the goal of this survey to put together these extensions
and generalizations to present a more complete picture. We will examine and
compare these inequalities, and complete proofs will be given. Needless to say, this survey is far from complete, since all the work is quite recent and the selection is heavily influenced by our personal learning experience on this topic.
Indeed, many of these inequalities have been included in our previous papers
[9, 10, 11, 12].
In addition to numerous variations of the inequalities, we also include an example of an application to a generalization of Pólya's urn problem. Due
to the fundamental nature of these concentration inequalities and martingale
inequalities, they may be useful for many other problems as well.
This paper is organized as follows:
1. Introduction: overview, recent developments, and summary.
2. Binomial distribution and its asymptotic behavior: the normalized binomial distribution and the Poisson distribution.
3. General Chernoff inequalities: sums of independent random variables in five different concentration inequalities.
4. More concentration inequalities: five more variations of the concentration inequalities.
5. Martingales and Azuma's inequality: basics for martingales and proofs for Azuma's inequality.
6. General martingale inequalities: four general versions of martingale inequalities with proofs.
7. Supermartingales and submartingales: modifying the definitions for martingales while preserving the effectiveness of the martingale inequalities.
8. The decision tree and relaxed concentration inequalities: instead of the worst-case incremental bound (the Lipschitz condition), only certain local conditions are required.
9. A generalized Pólya's urn problem: an application of these general concentration and martingale inequalities to an infinite Pólya process.
For webgraphs generated by the preferential attachment scheme, the concentration for the power law degree distribution can be derived in a similar
way.
$$S_n = \sum_{i=1}^{n} X_i, \qquad (1)$$
where the $X_i$ are independent indicator variables with $\Pr(X_i = 1) = p$ and $\Pr(X_i = 0) = 1-p$. Recall that the standard normal distribution has density $\frac{1}{\sqrt{2\pi}}e^{-x^2/2}$ for $-\infty < x < \infty$.
[Figures: the binomial distribution and its normalized probability density (value vs. probability).]
Theorem 1 The binomial distribution $B(n, p)$ for $S_n$, as defined in (1), satisfies, for two constants $a$ and $b$,
$$\lim_{n\to\infty} \Pr\Big(a < \frac{S_n - np}{\sigma} < b\Big) = \int_a^b \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, dx,$$
where $\sigma = \sqrt{np(1-p)}$, provided $np(1-p) \to \infty$ as $n \to \infty$.
When $np$ is upper bounded (by a constant), the above theorem is no longer true. For example, for $p = \lambda/n$, the limit distribution of $B(n,p)$ is the so-called Poisson distribution $P(\lambda)$:
$$\Pr(X = k) = \frac{\lambda^k}{k!}\, e^{-\lambda}, \qquad \text{for } k = 0, 1, 2, \ldots.$$
The Poisson distribution $P(\lambda)$ has expectation $E(X) = \lambda$ and variance $Var(X) = \lambda$.
Indeed, with $p = \lambda/n$,
$$\lim_{n\to\infty} \binom{n}{k} p^k (1-p)^{n-k} = \lim_{n\to\infty} \frac{\lambda^k \prod_{i=0}^{k-1}\big(1 - \frac{i}{n}\big)}{k!}\, e^{-p(n-k)} = \frac{\lambda^k}{k!}\, e^{-\lambda}.$$
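This convergence is easy to check numerically. The following sketch (parameter values chosen only for illustration) compares the binomial and Poisson probabilities directly:

```python
from math import comb, exp, factorial

def binom_pmf(n, p, k):
    # Pr(S_n = k) for the binomial distribution B(n, p)
    return comb(n, k) * p**k * (1.0 - p)**(n - k)

def poisson_pmf(lam, k):
    # Pr(X = k) for the Poisson distribution P(lambda)
    return lam**k / factorial(k) * exp(-lam)

lam = 3.0
n = 10_000   # with p = lambda/n, B(n, p) should be close to P(lambda)
gap = max(abs(binom_pmf(n, lam / n, k) - poisson_pmf(lam, k))
          for k in range(20))
```

For $\lambda = 3$ and $n = 10{,}000$, the pointwise gap between the two distributions is already well below $10^{-3}$, in line with the $O(\lambda^2/n)$ rate of convergence.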
[Figure 4: the Poisson distribution $P(3)$ (value vs. probability).]
If the random variable under consideration can be expressed as a sum of independent variables, it is possible to derive good estimates. The binomial distribution is one such example, where $S_n = \sum_{i=1}^{n} X_i$ and the $X_i$'s are independent and identically distributed. In this section, we consider sums of independent variables that are not necessarily identical. To control the probability of how close a sum of random variables is to its expected value, various concentration inequalities come into play.
$$\Pr(X \le E(X) - \lambda) \le e^{-\lambda^2/2E(X)} \qquad \text{(Lower tail)}$$
$$\Pr(X \ge E(X) + \lambda) \le e^{-\lambda^2/\big(2(E(X) + \lambda/3)\big)} \qquad \text{(Upper tail)}$$
We remark that the term $\lambda/3$ appearing in the exponent of the bound for the upper tail is significant. This covers the case when the limit distribution is Poisson as well as normal.
There are many variations of the Chernoff inequalities. Due to the fundamental nature of these inequalities, we will state several versions and then prove
the strongest version from which all the other inequalities can be deduced. (See
Figure 7 for the flowchart of these theorems.) In this section, we will prove
Theorem 8 and deduce Theorems 6 and 5. Theorems 10 and 11 will be stated
and proved in the next section. Theorems 9, 7, 13, 14 on the lower tail can be
deduced by reflecting X to X.
The following inequality is a generalization of the Chernoff inequalities for
the binomial distribution:
Theorem 5 [9] Let $X_1, \ldots, X_n$ be independent random variables with
$$\Pr(X_i = 1) = p_i, \qquad \Pr(X_i = 0) = 1 - p_i.$$
For $X = \sum_{i=1}^{n} a_i X_i$ with all $a_i > 0$, we have $E(X) = \sum_{i=1}^{n} a_i p_i$, and we define $\nu = \sum_{i=1}^{n} a_i^2 p_i$. Then we have
$$\Pr(X \le E(X) - \lambda) \le e^{-\lambda^2/2\nu}, \qquad (2)$$
$$\Pr(X \ge E(X) + \lambda) \le e^{-\lambda^2/\big(2(\nu + a\lambda/3)\big)}, \qquad (3)$$
where $a = \max\{a_1, a_2, \ldots, a_n\}$.
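As a sanity check on (3), one can simulate weighted sums of independent indicator variables and compare the empirical upper tail with the bound; the weights, probabilities, and sample sizes below are arbitrary choices for illustration:

```python
import random
from math import exp

random.seed(12345)

# X = sum a_i X_i with Pr(X_i = 1) = p_i; weights and probabilities arbitrary
a = [1.0, 2.0, 3.0] * 10
p = [0.3, 0.5, 0.2] * 10
mean = sum(ai * pi for ai, pi in zip(a, p))      # E(X) = sum a_i p_i
nu = sum(ai * ai * pi for ai, pi in zip(a, p))   # nu = sum a_i^2 p_i
a_max = max(a)

lam = 10.0
bound = exp(-lam**2 / (2 * (nu + a_max * lam / 3)))  # upper-tail bound (3)

trials = 20000
hits = 0
for _ in range(trials):
    x = sum(ai for ai, pi in zip(a, p) if random.random() < pi)
    hits += x >= mean + lam
empirical = hits / trials
```

The empirical tail frequency sits well below the bound, as expected: Chernoff-type bounds are typically conservative by a polynomial factor.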
[Figure 7: flowchart of the implications among the concentration inequalities; upper tails: Theorems 5, 6, 8, 10, 11; lower tails: Theorems 4, 7, 9, 13, 14.]
for $1 \le i \le n$. We consider the sum $X = \sum_{i=1}^{n} X_i$ with expectation $E(X) = \sum_{i=1}^{n} E(X_i)$ and variance $Var(X) = \sum_{i=1}^{n} Var(X_i)$. Then we have
$$\Pr(X \ge E(X) + \lambda) \le e^{-\lambda^2/\big(2(Var(X) + M\lambda/3)\big)}.$$
In the other direction, we have the following inequality.

Theorem 7 If $X_1, X_2, \ldots, X_n$ are non-negative independent random variables, we have the following bound for the sum $X = \sum_{i=1}^{n} X_i$:
$$\Pr(X \le E(X) - \lambda) \le e^{-\lambda^2/\big(2\sum_{i=1}^{n} E(X_i^2)\big)}.$$

The strongest version, from which the others can be deduced, is the following.

Theorem 8 Let $X_1, \ldots, X_n$ be independent random variables satisfying $X_i \le M$, for $1 \le i \le n$. Let $X = \sum_{i=1}^{n} X_i$ and $\|X\| = \sqrt{\sum_{i=1}^{n} E(X_i^2)}$. Then
$$\Pr(X \ge E(X) + \lambda) \le e^{-\lambda^2/\big(2(\|X\|^2 + M\lambda/3)\big)}.$$

Theorem 9 Let $X_1, \ldots, X_n$ be independent random variables satisfying $X_i \ge -M$, for $1 \le i \le n$. Let $X$ and $\|X\|$ be as above. Then
$$\Pr(X \le E(X) - \lambda) \le e^{-\lambda^2/\big(2(\|X\|^2 + M\lambda/3)\big)}.$$
Before we give the proof of Theorem 8, we will first show the implications of Theorems 8 and 9. Namely, we will show that the other concentration inequalities can be derived from Theorems 8 and 9.

Fact: Theorem 8 $\Longrightarrow$ Theorem 6:
Define $X_i' = X_i - E(X_i)$ and $X' = \sum_{i=1}^{n} X_i'$, so that $X_i' \le M$ for $1 \le i \le n$. We also have
$$\|X'\|^2 = \sum_{i=1}^{n} E(X_i'^2) = \sum_{i=1}^{n} Var(X_i) = Var(X).$$
Applying Theorem 8 to $X'$, we obtain
$$\Pr(X \ge E(X) + \lambda) = \Pr(X' \ge E(X') + \lambda) \le e^{-\lambda^2/\big(2(\|X'\|^2 + M\lambda/3)\big)} = e^{-\lambda^2/\big(2(Var(X) + M\lambda/3)\big)}.$$
Fact: Theorem 8 $\Longrightarrow$ Theorem 5: Let $Y_i = a_i X_i$, so that $Y_i \le a = \max_i a_i$. Then
$$\|Y\|^2 = \sum_{i=1}^{n} E(Y_i^2) = \sum_{i=1}^{n} a_i^2 p_i = \nu,$$
and inequality (3) follows from Theorem 8 with $M = a$.
Proof of Theorem 8: Since the $X_i$ are independent,
$$E(e^{tX}) = E\big(e^{t\sum_{i=1}^{n} X_i}\big) = \prod_{i=1}^{n} E(e^{tX_i}).$$
We define
$$g(y) = \frac{2(e^y - 1 - y)}{y^2}, \qquad g(0) = 1.$$
The function $g$ satisfies the following:
1. $g(y) \le 1$, for $y < 0$;
2. $g(y)$ is monotone increasing, for $y \ge 0$;
3. for $0 \le y < 3$,
$$g(y) = 2\sum_{k=2}^{\infty} \frac{y^{k-2}}{k!} \le \sum_{k=2}^{\infty} \frac{y^{k-2}}{3^{k-2}} = \frac{1}{1 - y/3},$$
since $k! \ge 2 \cdot 3^{k-2}$ for $k \ge 2$.
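These three properties of $g$ are easy to confirm numerically; a minimal sketch:

```python
from math import exp

def g(y):
    # g(y) = 2 (e^y - 1 - y) / y^2, extended continuously by g(0) = 1
    if y == 0.0:
        return 1.0
    return 2.0 * (exp(y) - 1.0 - y) / (y * y)

# Property 1: g(y) <= 1 for y < 0
ok_negative = all(g(-0.1 * i) <= 1.0 for i in range(1, 100))
# Property 2: g is monotone increasing for y >= 0
monotone = all(g(0.01 * i) <= g(0.01 * (i + 1)) for i in range(300))
# Property 3: g(y) <= 1/(1 - y/3) on [0, 3)
ok_upper = all(g(0.01 * i) <= 1.0 / (1.0 - 0.01 * i / 3.0) + 1e-12
               for i in range(295))
```

Of course the grid check is not a proof; the series argument above is what actually establishes the bound.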
Then we have
$$E(e^{tX}) = \prod_{i=1}^{n} E(e^{tX_i}) = \prod_{i=1}^{n} E\Big(\sum_{k=0}^{\infty} \frac{t^k X_i^k}{k!}\Big) = \prod_{i=1}^{n} E\Big(1 + tX_i + \frac{1}{2} t^2 X_i^2\, g(tX_i)\Big)$$
$$\le \prod_{i=1}^{n} \Big(1 + tE(X_i) + \frac{1}{2} t^2 E(X_i^2)\, g(tM)\Big) \le \prod_{i=1}^{n} e^{tE(X_i) + \frac{1}{2} t^2 E(X_i^2)\, g(tM)} = e^{tE(X) + \frac{1}{2} t^2 g(tM)\, \|X\|^2},$$
using $X_i \le M$ and the properties of $g$. Hence, for $0 \le tM < 3$,
$$\Pr(X \ge E(X) + \lambda) = \Pr\big(e^{tX} \ge e^{tE(X) + t\lambda}\big) \le e^{-tE(X) - t\lambda}\, E(e^{tX}) \le e^{-t\lambda + \frac{1}{2} t^2 g(tM)\, \|X\|^2} \le e^{-t\lambda + \frac{1}{2} t^2 \|X\|^2 \frac{1}{1 - tM/3}}.$$
To minimize the above expression, we choose $t = \lambda/(\|X\|^2 + M\lambda/3)$. Then $tM < 3$, and we have
$$\Pr(X \ge E(X) + \lambda) \le e^{-t\lambda + \frac{1}{2} t^2 \|X\|^2 \frac{1}{1 - tM/3}} = e^{-\lambda^2/\big(2(\|X\|^2 + M\lambda/3)\big)}.$$
This proves Theorem 8. Theorem 10 asserts that, for independent random variables satisfying $X_i \le E(X_i) + a_i + M$,
$$\Pr(X \ge E(X) + \lambda) \le e^{-\lambda^2/\big(2(Var(X) + \sum_{i=1}^{n} a_i^2 + M\lambda/3)\big)}.$$
To deduce it, define $X_i' = X_i - E(X_i) - a_i$ and $X' = \sum_{i=1}^{n} X_i'$, so that $X_i' \le M$ for $1 \le i \le n$. We have
$$X' - E(X') = \sum_{i=1}^{n} \big(X_i' - E(X_i')\big) = \sum_{i=1}^{n} (X_i' + a_i) = \sum_{i=1}^{n} \big(X_i - E(X_i)\big) = X - E(X).$$
Thus,
$$\|X'\|^2 = \sum_{i=1}^{n} E(X_i'^2) = \sum_{i=1}^{n} E\big((X_i - E(X_i) - a_i)^2\big) = \sum_{i=1}^{n} \big(Var(X_i) + a_i^2\big) = Var(X) + \sum_{i=1}^{n} a_i^2,$$
and the stated bound follows from Theorem 8 applied to $X'$.
In the same way, with upper bounds $X_i \le E(X_i) + M_i$, one obtains, for each $1 \le k \le n$,
$$\Pr(X \ge E(X) + \lambda) \le e^{-\lambda^2/\big(2(Var(X) + \sum_{i=k}^{n} (M_i - M_k)^2 + M_k\lambda/3)\big)},$$
by applying Theorem 10 with $a_i = M_i - M_k$ for $i \ge k$ (assuming the $M_i$ are sorted so that $M_i \le M_k$ for $i < k$), since then
$$\sum_{i=1}^{n} a_i^2 = \sum_{i=k}^{n} (M_i - M_k)^2.$$
For example, consider independent random variables with
$$\Pr(X_i = 1) = p, \quad \Pr(X_i = 0) = 1 - p, \qquad \text{for } 1 \le i \le n-1,$$
and
$$\Pr(X_n = \sqrt{n}) = p, \quad \Pr(X_n = 0) = 1 - p.$$
We have
$$E(X) = \sum_{i=1}^{n} E(X_i) = (n-1)p + \sqrt{n}\,p$$
and
$$Var(X) = \sum_{i=1}^{n} Var(X_i) = (n-1)p(1-p) + np(1-p) = (2n-1)p(1-p).$$
Thus, with $M = (1-p)\sqrt{n}$,
$$\Pr(X \ge E(X) + \lambda) \le e^{-\lambda^2/\big(2((2n-1)p(1-p) + (1-p)\sqrt{n}\,\lambda/3)\big)}.$$
For constant $p \in (0,1)$ and $\lambda = \Theta(n^{1/2+\epsilon})$, the term $(1-p)\sqrt{n}\,\lambda/3$ dominates the denominator of the exponent. Compare the two bounds
$$e^{-\lambda^2/\big(2(Var(X) + \sum_{i=1}^{n} a_i^2 + M\lambda/3)\big)} \qquad \text{and} \qquad e^{-\lambda^2/\big(2(Var(X) + \sum_{i=k}^{n} (M_i - M_k)^2 + M_k\lambda/3)\big)}.$$
Continuing the above example, we choose $M_1 = M_2 = \cdots = M_{n-1} = p$, and the corresponding bound becomes
$$e^{-\lambda^2/\big(2(p(2-p)n + p\lambda/3)\big)},$$
in which the coefficient of $\lambda$ in the denominator no longer grows with $n$.
consideration and can be difficult to deal with in various cases. In this survey we will use the following definition, which is concise and essentially equivalent in the finite case.

Suppose that $\Omega$ is a probability space with a probability distribution $p$. Let $F$ denote a $\sigma$-field on $\Omega$. (A $\sigma$-field on $\Omega$ is a collection of subsets of $\Omega$ which contains $\emptyset$ and $\Omega$, and is closed under unions, intersections, and complementation.) In a $\sigma$-field $F$ of $\Omega$, the smallest set in $F$ containing an element $x$ is the intersection of all sets in $F$ containing $x$. A function $f : \Omega \to \mathbb{R}$ is said to be $F$-measurable if $f(x) = f(y)$ for any $y$ in the smallest set containing $x$. (For more terminology on martingales, the reader is referred to [17].)

If $f : \Omega \to \mathbb{R}$ is a function, we define the expectation $E(f) = E(f(x) \mid x \in \Omega)$ by
$$E(f) = E(f(x) \mid x \in \Omega) := \sum_{x \in \Omega} f(x)\, p(x).$$
If $F$ is a $\sigma$-field on $\Omega$, we define the conditional expectation $E(f \mid F) : \Omega \to \mathbb{R}$ by
$$E(f \mid F)(x) := \frac{1}{\sum_{y \in S(x)} p(y)} \sum_{y \in S(x)} f(y)\, p(y), \qquad (4)$$
where $S(x)$ is the smallest set in $F$ containing $x$.
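On a finite space, (4) is simply a weighted average of $f$ over the atoms of the $\sigma$-field. The following sketch, with a hypothetical four-point space and two atoms, illustrates the definition and the tower property $E(E(f \mid F)) = E(f)$:

```python
# A four-point probability space Omega = {0, 1, 2, 3} with distribution p,
# and a sigma-field whose atoms (smallest sets) are {0, 1} and {2, 3}.
p = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
atoms = [{0, 1}, {2, 3}]
f = {0: 5.0, 1: 1.0, 2: 2.0, 3: 7.0}

def expectation(f, p):
    # E(f) = sum_x f(x) p(x)
    return sum(f[x] * p[x] for x in p)

def cond_expectation(f, p, atoms):
    # E(f | F)(x): p-weighted average of f over the smallest set S(x)
    # containing x, normalized by p(S(x)) -- equation (4).
    ef = {}
    for S in atoms:
        mass = sum(p[y] for y in S)
        avg = sum(f[y] * p[y] for y in S) / mass
        for x in S:
            ef[x] = avg
    return ef

ef = cond_expectation(f, p, atoms)
# Tower property: E(E(f | F)) = E(f).
tower_gap = abs(expectation(ef, p) - expectation(f, p))
```

Note that $E(f \mid F)$ is constant on each atom, which is exactly the $F$-measurability required of a martingale sequence.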
$$\Pr(|X_n - X_0| \ge \lambda) \le 2e^{-\lambda^2/\big(2\sum_{i=1}^{n} c_i^2\big)}, \qquad (5)$$
where $c = (c_1, \ldots, c_n)$ is the Lipschitz vector.
Theorem 17 Let $X_1, X_2, \ldots, X_n$ be independent random variables satisfying
$$|X_i - E(X_i)| \le c_i, \qquad \text{for } 1 \le i \le n.$$
Then we have the following bound for the sum $X = \sum_{i=1}^{n} X_i$:
$$\Pr(|X - E(X)| \ge \lambda) \le 2e^{-\lambda^2/\big(2\sum_{i=1}^{n} c_i^2\big)}.$$
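For the simplest case $c_i = 1$ (a sum of independent $\pm 1$ steps), the bound of Theorem 17 can be compared against simulation; sample sizes here are arbitrary:

```python
import random
from math import exp, sqrt

random.seed(7)

n = 100               # number of independent +/-1 steps; here c_i = 1
lam = 2.0 * sqrt(n)   # deviation threshold lambda

# Theorem-17-style bound: Pr(|X - E(X)| >= lam) <= 2 exp(-lam^2 / (2 sum c_i^2))
bound = 2 * exp(-lam**2 / (2 * n))

trials = 20000
hits = sum(
    abs(sum(random.choice((-1, 1)) for _ in range(n))) >= lam
    for _ in range(trials)
)
empirical = hits / trials   # should sit well below the bound
```

With $\lambda = 2\sqrt{n}$ the bound evaluates to $2e^{-2} \approx 0.27$, while the true two-sided tail of a centered sum of 100 fair $\pm 1$ steps is roughly $0.05$; the inequality is valid but not tight.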
Since $e^{tx}$ is convex, for $|x| \le c$ we have
$$e^{tx} \le \frac{1}{2c}\big(e^{tc} - e^{-tc}\big)x + \frac{1}{2}\big(e^{tc} + e^{-tc}\big).$$
Therefore,
$$E\Big(\frac{1}{2c_i}\big(e^{tc_i} - e^{-tc_i}\big)(X_i - X_{i-1}) + \frac{1}{2}\big(e^{tc_i} + e^{-tc_i}\big) \,\Big|\, F_{i-1}\Big) = \frac{1}{2}\big(e^{tc_i} + e^{-tc_i}\big) \le e^{t^2 c_i^2/2}.$$
Here we apply the conditions $E(X_i - X_{i-1} \mid F_{i-1}) = 0$ and $|X_i - X_{i-1}| \le c_i$. Hence,
$$E(e^{tX_i} \mid F_{i-1}) \le e^{t^2 c_i^2/2}\, e^{tX_{i-1}}.$$
Inductively, we have
$$E(e^{tX}) = E\big(E(e^{tX_n} \mid F_{n-1})\big) \le e^{t^2 c_n^2/2}\, E(e^{tX_{n-1}}) \le \cdots \le \prod_{i=1}^{n} e^{t^2 c_i^2/2}\, E(e^{tX_0}) = e^{\frac{1}{2} t^2 \sum_{i=1}^{n} c_i^2 + tE(X)},$$
since $X_0 = E(X)$. Therefore,
$$\Pr(X \ge E(X) + \lambda) = \Pr\big(e^{t(X - E(X))} \ge e^{t\lambda}\big) \le e^{-t\lambda}\, E\big(e^{t(X - E(X))}\big) \le e^{-t\lambda}\, e^{\frac{1}{2} t^2 \sum_{i=1}^{n} c_i^2} = e^{-t\lambda + \frac{1}{2} t^2 \sum_{i=1}^{n} c_i^2}.$$
We choose $t = \lambda / \sum_{i=1}^{n} c_i^2$. Then
$$\Pr(X \ge E(X) + \lambda) \le e^{-t\lambda + \frac{1}{2} t^2 \sum_{i=1}^{n} c_i^2} = e^{-\lambda^2/\big(2\sum_{i=1}^{n} c_i^2\big)}.$$
Many problems which can be set up as a martingale do not satisfy the Lipschitz
condition. It is desirable to be able to use tools similar to the Azuma inequality
in such cases. In this section, we will first state and then prove several extensions
of the Azuma inequality (see Figure 9).
[Figure 9: implication flowchart of the martingale inequalities; upper tails: Theorems 19, 21, 23; lower tails: Theorems 20, 22, 24; all extending Theorem 18.]
The upper-tail versions of these inequalities bound $\Pr(X \ge E(X) + \lambda)$ by, respectively,
$$e^{-\lambda^2/\big(2(\sum_{i=1}^{n} \sigma_i^2 + M\lambda/3)\big)}, \qquad e^{-\lambda^2/\big(2\sum_{i=1}^{n} (\sigma_i^2 + M_i^2)\big)},$$
$$e^{-\lambda^2/\big(2(\sum_{i=1}^{n} (\sigma_i^2 + a_i^2) + M\lambda/3)\big)}, \qquad e^{-\lambda^2/\big(2(\sum_{i=1}^{n} \sigma_i^2 + \sum_{M_i > M} (M_i - M)^2 + M\lambda/3)\big)},$$
where $\sigma_i^2$ bounds $Var(X_i \mid F_{i-1})$, and $M$, $M_i$, $a_i$ bound the martingale differences. In the proofs we again use the function $g(y)$ and the bound $g(b) \le \frac{1}{1 - b/3}$ for $0 \le b < 3$.
Since $E(X_i - X_{i-1} \mid F_{i-1}) = 0$, we have
$$E\big(e^{t(X_i - X_{i-1} - a_i)} \mid F_{i-1}\big) = 1 - ta_i + E\Big(\sum_{k=2}^{\infty} \frac{t^k}{k!}(X_i - X_{i-1} - a_i)^k \,\Big|\, F_{i-1}\Big)$$
$$= 1 - ta_i + E\Big(\frac{t^2}{2}(X_i - X_{i-1} - a_i)^2\, g\big(t(X_i - X_{i-1} - a_i)\big) \,\Big|\, F_{i-1}\Big)$$
$$\le 1 - ta_i + \frac{t^2}{2} g(tM)\, E\big((X_i - X_{i-1} - a_i)^2 \mid F_{i-1}\big)$$
$$= 1 - ta_i + \frac{t^2}{2} g(tM)\big(E((X_i - X_{i-1})^2 \mid F_{i-1}) + a_i^2\big)$$
$$\le 1 - ta_i + \frac{t^2}{2} g(tM)(\sigma_i^2 + a_i^2) \le e^{-ta_i + \frac{t^2}{2} g(tM)(\sigma_i^2 + a_i^2)}.$$
Inductively, we have
$$E(e^{tX}) = E\big(E(e^{tX_n} \mid F_{n-1})\big) \le \cdots \le e^{\frac{1}{2} t^2 g(tM) \sum_{i=1}^{n} (\sigma_i^2 + a_i^2)}\, e^{tE(X)}.$$
Therefore,
$$\Pr(X \ge E(X) + \lambda) = \Pr\big(e^{tX} \ge e^{tE(X) + t\lambda}\big) \le e^{-tE(X) - t\lambda}\, E(e^{tX}) \le e^{-t\lambda + \frac{1}{2} t^2 g(tM) \sum_{i=1}^{n} (\sigma_i^2 + a_i^2)} \le e^{-t\lambda + \frac{1}{2} \frac{t^2}{1 - tM/3} \sum_{i=1}^{n} (\sigma_i^2 + a_i^2)}.$$
We choose $t = \lambda/\big(\sum_{i=1}^{n} (\sigma_i^2 + a_i^2) + M\lambda/3\big)$. Then $tM < 3$ and
$$\Pr(X \ge E(X) + \lambda) \le e^{-t\lambda + \frac{1}{2} \frac{t^2}{1 - tM/3} \sum_{i=1}^{n} (\sigma_i^2 + a_i^2)} = e^{-\lambda^2/\big(2(\sum_{i=1}^{n} (\sigma_i^2 + a_i^2) + M\lambda/3)\big)}.$$
The corresponding lower-tail inequalities bound $\Pr(X \le E(X) - \lambda)$ by, respectively,
$$e^{-\lambda^2/\big(2(\sum_{i=1}^{n} (\sigma_i^2 + a_i^2) + M\lambda/3)\big)}, \qquad e^{-\lambda^2/\big(2\sum_{i=1}^{n} (\sigma_i^2 + M_i^2)\big)}, \qquad e^{-\lambda^2/\big(2(\sum_{i=1}^{n} \sigma_i^2 + \sum_{M_i > M} (M_i - M)^2 + M\lambda/3)\big)}.$$
[Figure: implications among the supermartingale and submartingale inequalities, Theorems 25, 26, 27, 29, generalizing Theorem 22.]
These supermartingale and submartingale inequalities bound the tails $\Pr(X_n \ge X_0 + \lambda)$ and $\Pr(X_n \le X_0 - \lambda)$ by, respectively,
$$e^{-\lambda^2/\big(2((X_0 + \lambda)(\sum_{i=1}^{n} \phi_i) + M\lambda/3)\big)}, \qquad e^{-\lambda^2/\big(2(X_0 (\sum_{i=1}^{n} \phi_i) + M\lambda/3)\big)},$$
$$e^{-\lambda^2/\big(2(\sum_{i=1}^{n} (\sigma_i^2 + a_i^2) + (X_0 + \lambda)(\sum_{i=1}^{n} \phi_i) + M\lambda/3)\big)}, \qquad e^{-\lambda^2/\big(2(\sum_{i=1}^{n} (\sigma_i^2 + a_i^2) + X_0 (\sum_{i=1}^{n} \phi_i) + M\lambda/3)\big)},$$
for any $X_0$, under variance conditions of the form $Var(X_i \mid F_{i-1}) \le \sigma_i^2 + \phi_i X_{i-1}$.
Along the same lines as before, for $t > 0$,
$$E(e^{tX_i} \mid F_{i-1}) \le e^{tE(X_i \mid F_{i-1}) + \frac{g(tM)}{2} t^2 E\big((X_i - E(X_i \mid F_{i-1}) - a_i)^2 \mid F_{i-1}\big)} \le e^{tE(X_i \mid F_{i-1}) + \frac{g(tM)}{2} t^2 \big(Var(X_i \mid F_{i-1}) + a_i^2\big)}$$
$$\le e^{tX_{i-1} + \frac{g(tM)}{2} t^2 (\sigma_i^2 + \phi_i X_{i-1} + a_i^2)} = e^{\big(t + \frac{g(tM)}{2} \phi_i t^2\big) X_{i-1}}\, e^{\frac{t^2}{2} g(tM)(\sigma_i^2 + a_i^2)},$$
using the supermartingale condition $E(X_i \mid F_{i-1}) \le X_{i-1}$ and the variance condition $Var(X_i \mid F_{i-1}) \le \sigma_i^2 + \phi_i X_{i-1}$. Now define $t_n = t$ and, for $i = n, n-1, \ldots, 1$,
$$t_{i-1} = t_i + \frac{g(t_0 M)}{2} \phi_i t_i^2,$$
so that $t = t_n \le t_{n-1} \le \cdots \le t_0$. Since $g$ is monotone increasing,
$$E(e^{t_i X_i} \mid F_{i-1}) \le e^{\big(t_i + \frac{g(t_i M)}{2} \phi_i t_i^2\big) X_{i-1}}\, e^{\frac{t_i^2}{2} g(t_0 M)(\sigma_i^2 + a_i^2)} \le e^{t_{i-1} X_{i-1}}\, e^{\frac{t_i^2}{2} g(t_0 M)(\sigma_i^2 + a_i^2)}.$$
Inductively,
$$E(e^{t_n X_n}) \le e^{t_0 X_0}\, e^{\frac{g(t_0 M)}{2} \sum_{i=1}^{n} t_i^2 (\sigma_i^2 + a_i^2)} \le e^{t_0 X_0}\, e^{\frac{g(t_0 M)}{2} t_0^2 \sum_{i=1}^{n} (\sigma_i^2 + a_i^2)}.$$
Note that
$$t_n = t_0 - \sum_{i=1}^{n} (t_{i-1} - t_i) = t_0 - \sum_{i=1}^{n} \frac{g(t_0 M)}{2} \phi_i t_i^2 \ge t_0 - \frac{g(t_0 M)}{2} t_0^2 \sum_{i=1}^{n} \phi_i.$$
Hence,
$$\Pr(X_n \ge X_0 + \lambda) \le e^{-t_n (X_0 + \lambda)}\, E(e^{t_n X_n}) \le e^{-\big(t_0 - \frac{g(t_0 M)}{2} t_0^2 \sum_{i=1}^{n} \phi_i\big)(X_0 + \lambda) + t_0 X_0 + \frac{g(t_0 M)}{2} t_0^2 \sum_{i=1}^{n} (\sigma_i^2 + a_i^2)}$$
$$\le e^{-t_0 \lambda + \frac{g(t_0 M)}{2} t_0^2 \big(\sum_{i=1}^{n} (\sigma_i^2 + a_i^2) + (X_0 + \lambda) \sum_{i=1}^{n} \phi_i\big)}.$$
Now we choose $t_0 = \lambda/\big(\sum_{i=1}^{n} (\sigma_i^2 + a_i^2) + (X_0 + \lambda) \sum_{i=1}^{n} \phi_i + M\lambda/3\big)$. Since $t_0 M < 3$, we have $g(t_0 M) \le \frac{1}{1 - t_0 M/3}$, and therefore
$$\Pr(X_n \ge X_0 + \lambda) \le e^{-t_0 \lambda + \frac{t_0^2}{2(1 - t_0 M/3)} \big(\sum_{i=1}^{n} (\sigma_i^2 + a_i^2) + (X_0 + \lambda) \sum_{i=1}^{n} \phi_i\big)} = e^{-\lambda^2/\big(2(\sum_{i=1}^{n} (\sigma_i^2 + a_i^2) + (X_0 + \lambda) \sum_{i=1}^{n} \phi_i + M\lambda/3)\big)}.$$
The lower-tail (submartingale) inequality is proved similarly. For $t > 0$,
$$E(e^{-tX_i} \mid F_{i-1}) \le e^{-tE(X_i \mid F_{i-1}) + \frac{g(tM)}{2} t^2 E\big((X_i - E(X_i \mid F_{i-1}) - a_i)^2 \mid F_{i-1}\big)} \le e^{-tE(X_i \mid F_{i-1}) + \frac{g(tM)}{2} t^2 \big(Var(X_i \mid F_{i-1}) + a_i^2\big)} \le e^{\big(-t + \frac{g(tM)}{2} \phi_i t^2\big) X_{i-1}}\, e^{\frac{g(tM)}{2} t^2 (\sigma_i^2 + a_i^2)},$$
using the submartingale condition $E(X_i \mid F_{i-1}) \ge X_{i-1}$ and $Var(X_i \mid F_{i-1}) \le \sigma_i^2 + \phi_i X_{i-1}$. Now define, for $i = n, n-1, \ldots, 1$,
$$t_{i-1} = t_i - \frac{g(t_n M)}{2} \phi_i t_i^2,$$
so that $t_0 \le t_1 \le \cdots \le t_n$, and
$$E(e^{-t_i X_i} \mid F_{i-1}) \le e^{\big(-t_i + \frac{g(t_i M)}{2} \phi_i t_i^2\big) X_{i-1}}\, e^{\frac{g(t_i M)}{2} t_i^2 (\sigma_i^2 + a_i^2)} \le e^{-t_{i-1} X_{i-1}}\, e^{\frac{g(t_n M)}{2} t_i^2 (\sigma_i^2 + a_i^2)}.$$
Inductively,
$$\Pr(X_n \le X_0 - \lambda) = \Pr\big(e^{-t_n X_n} \ge e^{-t_n (X_0 - \lambda)}\big) \le e^{t_n (X_0 - \lambda)}\, E(e^{-t_n X_n}) \le e^{t_n (X_0 - \lambda)}\, e^{-t_0 X_0 + \frac{g(t_n M)}{2} t_n^2 \sum_{i=1}^{n} (\sigma_i^2 + a_i^2)}.$$
We note that
$$t_0 = t_n - \sum_{i=1}^{n} (t_i - t_{i-1}) = t_n - \sum_{i=1}^{n} \frac{g(t_n M)}{2} \phi_i t_i^2 \ge t_n - \frac{g(t_n M)}{2} t_n^2 \sum_{i=1}^{n} \phi_i.$$
Thus, we have
$$\Pr(X_n \le X_0 - \lambda) \le e^{-t_n \lambda + \frac{g(t_n M)}{2} t_n^2 \big(\sum_{i=1}^{n} (\sigma_i^2 + a_i^2) + X_0 \sum_{i=1}^{n} \phi_i\big)}.$$
We choose $t_n = \lambda/\big(\sum_{i=1}^{n} (\sigma_i^2 + a_i^2) + X_0 \sum_{i=1}^{n} \phi_i + M\lambda/3\big)$. Since $t_n M < 3$, we have $g(t_n M) \le \frac{1}{1 - t_n M/3}$, and therefore
$$\Pr(X_n \le X_0 - \lambda) \le e^{-t_n \lambda + \frac{t_n^2}{2(1 - t_n M/3)} \big(\sum_{i=1}^{n} (\sigma_i^2 + a_i^2) + X_0 \sum_{i=1}^{n} \phi_i\big)} = e^{-\lambda^2/\big(2(\sum_{i=1}^{n} (\sigma_i^2 + a_i^2) + X_0 \sum_{i=1}^{n} \phi_i + M\lambda/3)\big)}.$$
It remains to verify that $t_0 \ge 0$. Indeed,
$$t_0 \ge t_n \Big(1 - \frac{g(t_n M)}{2} t_n \sum_{i=1}^{n} \phi_i\Big) \ge t_n \Big(1 - \frac{t_n}{2(1 - t_n M/3)} \sum_{i=1}^{n} \phi_i\Big) = t_n \Big(1 - \frac{\lambda \sum_{i=1}^{n} \phi_i}{2\big(\sum_{i=1}^{n} (\sigma_i^2 + a_i^2) + X_0 \sum_{i=1}^{n} \phi_i\big)}\Big) \ge 0,$$
provided $\lambda \le 2X_0 + 2\sum_{i=1}^{n} (\sigma_i^2 + a_i^2)\big/\sum_{i=1}^{n} \phi_i$.
We allow $p_{uv}$ to be zero, and thus include the case of having fewer than $r$ outputs, for some fixed $r$. Let $\Omega_i$ denote the probability space obtained after the first $i$ decisions. Suppose $\Omega = \Omega_n$ and $X$ is the random variable on $\Omega$. Let $\pi_i : \Omega \to \Omega_i$ be the projection mapping each point to the subset of points with the same first $i$ decisions. Let $F_i$ be the $\sigma$-field generated by $Y_1, Y_2, \ldots, Y_i$. (In fact, $F_i = \pi_i^{-1}(2^{\Omega_i})$ is the full $\sigma$-field via the projection $\pi_i$.) The $F_i$ form a natural filter:
$$\{\emptyset, \Omega\} = F_0 \subset F_1 \subset \cdots \subset F_n = F.$$
The leaves of the decision tree are exactly the elements of $\Omega$. Let $X_0, X_1, \ldots, X_n = X$ denote the sequence of decisions to evaluate $X$. Note that $X_i$ is $F_i$-measurable, and can be interpreted as a labeling on the nodes at depth $i$.
There is a one-to-one correspondence between the following:
1. A sequence of random variables $X_0, X_1, \ldots, X_n$ such that $X_i$ is $F_i$-measurable, for $i = 0, 1, \ldots, n$.
2. A vertex labeling of the decision tree $T$, $f : V(T) \to \mathbb{R}$.
In order to simplify and unify the proofs for various general types of martingales, we introduce here a definition for a function $f : V(T) \to \mathbb{R}$. We say that $f$ satisfies an admissible condition $P = \{P_v\}$ if $P_v$ holds at every vertex $v$.
Examples of admissible conditions:
1. Supermartingale: For $1 \le i \le n$, we have
$$E(X_i \mid F_{i-1}) \le X_{i-1}.$$
Thus the admissible condition $P_u$ holds if
$$\sum_{v \in C(u)} p_{uv} f(v) \le f(u),$$
where $C(u)$ is the set of all children of $u$ and $p_{uv}$ is the transition probability on the edge $uv$.
2. Submartingale: For $1 \le i \le n$, we have
$$E(X_i \mid F_{i-1}) \ge X_{i-1}.$$
In this case, the admissible condition of the submartingale is
$$f(u) \le \sum_{v \in C(u)} p_{uv} f(v).$$
For any labeling $f$ on $T$ and a fixed vertex $r$, we can define a new labeling $f_r$ as follows:
$$f_r(u) = \begin{cases} f(r) & \text{if } u \text{ is a descendant of } r, \\ f(u) & \text{otherwise.} \end{cases}$$
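The subtree-unification $f \mapsto f_r$ is straightforward to implement on an explicit tree; the parent-pointer encoding below is just one hypothetical representation:

```python
def subtree_unify(f, parent, r):
    # f: dict vertex -> label; parent: dict vertex -> parent (None at the root).
    # Returns f_r: every descendant of r (including r itself) gets the label
    # f[r]; all other vertices keep their original label.
    def under_r(u):
        # walk up the parent pointers to test whether u lies below r
        while u is not None:
            if u == r:
                return True
            u = parent[u]
        return False
    return {u: (f[r] if under_r(u) else f[u]) for u in f}

# Tiny example tree:    0
#                      / \
#                     1   2
#                    / \
#                   3   4
parent = {0: None, 1: 0, 2: 0, 3: 1, 4: 1}
f = {0: 10, 1: 20, 2: 30, 3: 40, 4: 50}
fr = subtree_unify(f, parent, r=1)
```

After unification at $r = 1$, the labels of vertices 3 and 4 collapse to $f(1)$, which is exactly why the martingale-type conditions below $r$ hold with equality.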
A property $P$ is said to be invariant under subtree-unification if, for any tree labeling $f$ satisfying $P$ and any vertex $r$, the labeling $f_r$ also satisfies $P$. We have the following theorem.

Theorem 31 The eight properties stated in the preceding examples (supermartingale, submartingale, martingale, $c$-Lipschitz, bounded variance, general bounded variance, upper-bounded, and lower-bounded) are all invariant under subtree-unification.
Proof: We note that these properties are all admissible conditions. Let $P$ denote any one of them. For any node $u$, if $u$ is not a descendant of $r$, then $f_r$ and $f$ have the same value on $u$ and on its children, so $P_u$ holds for $f_r$ since $P_u$ holds for $f$.

If $u$ is a descendant of $r$, then $f_r(u)$, as well as the values at its children, all equal $f(r)$. We verify $P_u$ in each case. Assume that $u$ is at level $i$ of the decision tree $T$.
1. For the supermartingale, submartingale, and martingale properties, we have
$$\sum_{v \in C(u)} p_{uv} f_r(v) = \sum_{v \in C(u)} p_{uv} f(r) = f(r) \sum_{v \in C(u)} p_{uv} = f(r) = f_r(u),$$
so the defining (in)equalities hold with equality.
2. For the $c$-Lipschitz property, $|f_r(u) - f_r(v)| = |f(r) - f(r)| = 0 \le c_i$ for any child $v$ of $u$.
3. For the bounded variance property, we have
$$\sum_{v \in C(u)} p_{uv} f_r^2(v) - \Big(\sum_{v \in C(u)} p_{uv} f_r(v)\Big)^2 = \sum_{v \in C(u)} p_{uv} f^2(r) - \Big(\sum_{v \in C(u)} p_{uv} f(r)\Big)^2 = f^2(r) - f^2(r) = 0 \le \sigma_i^2.$$
4. For the general bounded variance property, the same computation gives
$$0 \le \sigma_i^2 + \phi_i f_r(u).$$
5. For the upper-bounded property, we have, for any child $v$ of $u$,
$$f_r(v) - \sum_{w \in C(u)} p_{uw} f_r(w) = f(r) - f(r) = 0 \le a_i + M.$$
6. For the lower-bounded property, we have, for any child $v$ of $u$,
$$\sum_{w \in C(u)} p_{uw} f_r(w) - f_r(v) = f(r) - f(r) = 0 \le a_i + M.$$
Therefore, $P_v$ holds for $f_r$ at every vertex $v$. This completes the proof.
For two admissible conditions $P$ and $Q$, we define $P \wedge Q$ to be the property that holds exactly when both $P$ and $Q$ hold. If both admissible conditions $P$ and $Q$ are invariant under subtree-unification, then $P \wedge Q$ is also invariant under subtree-unification.

For any vertex $u$ of the tree $T$, an ancestor of $u$ is a vertex lying on the unique path from the root to $u$. For an admissible condition $P$, the associated bad set $B_i$ over the $X_i$ is defined to be
$$B_i = \{v \mid \text{the depth of } v \text{ is } i, \text{ and } P_u \text{ does not hold for some ancestor } u \text{ of } v\}.$$
Lemma 1 For a filter $F$:
$$\{\emptyset, \Omega\} = F_0 \subset F_1 \subset \cdots \subset F_n = F,$$
suppose each random variable $X_i$ is $F_i$-measurable, for $0 \le i \le n$. For any admissible condition $P$, let $B_i$ be the associated bad set of $P$ over $X_i$. Then there are random variables $Y_0, \ldots, Y_n$ satisfying:
1. $Y_i$ is $F_i$-measurable.
2. $Y_0, \ldots, Y_n$ satisfy condition $P$.
3. $\{x : Y_i(x) \ne X_i(x)\} \subset B_i$, for $0 \le i \le n$.
We modify $f$ and define $f'$ on $T$ as follows. For any vertex $u$,
$$f'(u) = \begin{cases} f(u) & \text{if } f \text{ satisfies } P_v \text{ for every ancestor } v \text{ of } u \text{ (including } u \text{ itself)}, \\ f(v) & \text{otherwise, where } v \text{ is the ancestor of smallest depth for which } f \text{ fails } P_v. \end{cases}$$
Proof: The random variables $Y_i$ induced by the labeling $f'$ are $F_i$-measurable, satisfy $P$, and differ from the $X_i$ only on the bad sets $B_i$. In particular, if the $X_i$ satisfy the $c$-Lipschitz condition outside the bad set $B$, then $|Y_i - Y_{i-1}| \le c_i$ for all $i$, and hence
$$\Pr(|X_n - X_0| \ge \lambda) \le \Pr(|Y_n - Y_0| \ge \lambda) + \Pr(X_n \ne Y_n) \le 2e^{-\lambda^2/\big(2\sum_{i=1}^{n} c_i^2\big)} + \Pr(B).$$
For $c = (c_1, c_2, \ldots, c_n)$ a vector with positive entries, a martingale is said to be near-$c$-Lipschitz with an exceptional probability $\eta$ if
$$\sum_{i} \Pr(|X_i - X_{i-1}| \ge c_i) \le \eta. \qquad (6)$$
A near-$c$-Lipschitz martingale then satisfies
$$\Pr(|X - E(X)| \ge a) \le 2e^{-a^2/\big(2\sum_{i=1}^{n} c_i^2\big)} + \eta.$$
Now, we can apply the same technique to relax all the theorems in the
previous sections.
Here are the relaxed versions of Theorems 20, 25, and 27.
Theorem 34 For a filter $F$:
$$\{\emptyset, \Omega\} = F_0 \subset F_1 \subset \cdots \subset F_n = F,$$
suppose a random variable $X_i$ is $F_i$-measurable, for $0 \le i \le n$. Let $B$ be the bad set associated with the following admissible conditions:
$$E(X_i \mid F_{i-1}) \le X_{i-1}, \qquad Var(X_i \mid F_{i-1}) \le \sigma_i^2, \qquad X_i - E(X_i \mid F_{i-1}) \le a_i + M.$$
Then
$$\Pr(X_n \ge X_0 + \lambda) \le e^{-\lambda^2/\big(2(\sum_{i=1}^{n} (\sigma_i^2 + a_i^2) + M\lambda/3)\big)} + \Pr(B).$$

The relaxed versions of the other supermartingale and submartingale inequalities are analogous. Under the conditions
$$E(X_i \mid F_{i-1}) \le X_{i-1}, \qquad Var(X_i \mid F_{i-1}) \le \phi_i X_{i-1}, \qquad X_i - E(X_i \mid F_{i-1}) \le M,$$
we have
$$\Pr(X_n \ge X_0 + \lambda) \le e^{-\lambda^2/\big(2((X_0 + \lambda)(\sum_{i=1}^{n} \phi_i) + M\lambda/3)\big)} + \Pr(B).$$
Under the conditions
$$E(X_i \mid F_{i-1}) \le X_{i-1}, \qquad Var(X_i \mid F_{i-1}) \le \sigma_i^2 + \phi_i X_{i-1}, \qquad X_i - E(X_i \mid F_{i-1}) \le a_i + M,$$
we have
$$\Pr(X_n \ge X_0 + \lambda) \le e^{-\lambda^2/\big(2(\sum_{i=1}^{n} (\sigma_i^2 + a_i^2) + (X_0 + \lambda)(\sum_{i=1}^{n} \phi_i) + M\lambda/3)\big)} + \Pr(B).$$
For the lower tails, under
$$E(X_i \mid F_{i-1}) \ge X_{i-1}, \qquad Var(X_i \mid F_{i-1}) \le \sigma_i^2, \qquad E(X_i \mid F_{i-1}) - X_i \le a_i + M,$$
we have
$$\Pr(X_n \le X_0 - \lambda) \le e^{-\lambda^2/\big(2(\sum_{i=1}^{n} (\sigma_i^2 + a_i^2) + M\lambda/3)\big)} + \Pr(B),$$
and under
$$E(X_i \mid F_{i-1}) \ge X_{i-1}, \qquad Var(X_i \mid F_{i-1}) \le \phi_i X_{i-1}, \qquad E(X_i \mid F_{i-1}) - X_i \le M,$$
we have
$$\Pr(X_n \le X_0 - \lambda) \le e^{-\lambda^2/\big(2(X_0 (\sum_{i=1}^{n} \phi_i) + M\lambda/3)\big)} + \Pr(B),$$
for all $X_0$.
Theorem 39 For a filter $F$:
$$\{\emptyset, \Omega\} = F_0 \subset F_1 \subset \cdots \subset F_n = F,$$
suppose a non-negative random variable $X_i$ is $F_i$-measurable, for $0 \le i \le n$. Let $B$ be the bad set associated with the following admissible conditions:
$$E(X_i \mid F_{i-1}) \ge X_{i-1}, \qquad Var(X_i \mid F_{i-1}) \le \sigma_i^2 + \phi_i X_{i-1}, \qquad E(X_i \mid F_{i-1}) - X_i \le a_i + M.$$
Then
$$\Pr(X_n \le X_0 - \lambda) \le e^{-\lambda^2/\big(2(\sum_{i=1}^{n} (\sigma_i^2 + a_i^2) + X_0 (\sum_{i=1}^{n} \phi_i) + M\lambda/3)\big)} + \Pr(B),$$
for $\lambda < X_0$.
process has a similar flavor to the preferential attachment scheme, one of the main models for generating the webgraph among other information networks (see Barabási et al. [4, 6]).

In Subsection 9.1, we will show that the infinite Pólya process generates a power law distribution, so that the expected fraction of bins having $k$ balls is asymptotic to $ck^{-\beta}$, where $\beta = 1 + 1/(1-p)$ and $c$ is a constant. The concentration result giving probabilistic error estimates for the power law distribution is then established in Subsection 9.2.
9.1
To analyze the infinite Pólya process, we let $n_t$ denote the number of bins at time $t$ and let $e_t$ denote the number of balls at time $t$. We have
$$e_t = t + \alpha,$$
where $\alpha$ denotes the initial number of balls. The number of bins $n_t$, however, is a sum of $t$ random indicator variables,
$$n_t = \alpha + \sum_{i=1}^{t} s_i,$$
where
$$\Pr(s_j = 1) = p, \qquad \Pr(s_j = 0) = 1 - p.$$
It follows that
$$E(n_t) = \alpha + pt.$$
To get a handle on the actual value of $n_t$, we use the binomial concentration inequality as described in Theorem 4. Namely, with probability at least $1 - 2e^{-\lambda^2/(2pt + 2\lambda/3)}$, we have $|n_t - \alpha - pt| \le \lambda$.
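The process itself is easy to simulate. The sketch below assumes a starting configuration of a single initial ball in one bin (so $\alpha = 1$, an arbitrary choice) and checks $n_t$ against $E(n_t) = \alpha + pt$:

```python
import random

random.seed(2024)

def infinite_polya(p, t):
    # Infinite Polya process: at each tick, with probability p a new bin is
    # created with one ball; otherwise the new ball joins an existing bin
    # with probability proportional to the bin's size.
    bins = [1]     # alpha = 1: a single initial ball in one bin (an assumption)
    owner = [0]    # owner[b] = bin index of ball b; picking a ball uniformly
                   # at random gives a size-biased choice of bin
    for _ in range(t):
        if random.random() < p:
            bins.append(1)
            owner.append(len(bins) - 1)
        else:
            i = owner[random.randrange(len(owner))]
            bins[i] += 1
            owner.append(i)
    return bins

p, t = 0.4, 20000
bins = infinite_polya(p, t)
n_t = len(bins)            # number of bins
e_t = sum(bins)            # number of balls: always t + alpha
expected_bins = 1 + p * t  # E(n_t) = alpha + p t
```

Since $n_t - \alpha$ is a sum of independent indicators, the observed bin count stays within a few standard deviations (about $\sqrt{p(1-p)t}$) of its mean, in agreement with the binomial concentration bound.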
Let $m_{k,t}$ denote the number of bins with $k$ balls at time $t$. For $k \ge 2$ we have
$$E(m_{k,t} \mid F_{t-1}) = m_{k,t-1}\Big(1 - \frac{(1-p)k}{t-1+\alpha}\Big) + m_{k-1,t-1}\, \frac{(1-p)(k-1)}{t-1+\alpha}, \qquad (7)$$
and, taking expectations,
$$E(m_{k,t}) = E(m_{k,t-1})\Big(1 - \frac{(1-p)k}{t-1+\alpha}\Big) + E(m_{k-1,t-1})\, \frac{(1-p)(k-1)}{t-1+\alpha} \qquad (8)$$
(for $k = 1$, the last term is replaced by $p$, the probability of creating a new bin).
It follows that
$$M_1 = \lim_{t\to\infty} \frac{E(m_{1,t})}{t} = \frac{p}{2-p}.$$
Now we assume that $\lim_{t\to\infty} E(m_{k-1,t})/t$ exists, and we apply the fact again with $b_t = b = k(1-p)$ and $c_t = E(m_{k-1,t-1})(1-p)(k-1)/(t-1+\alpha)$, so that $c = M_{k-1}(1-p)(k-1)$. Thus the limit $\lim_{t\to\infty} E(m_{k,t})/t$ exists and is equal to
$$M_k = M_{k-1}\, \frac{(1-p)(k-1)}{1 + k(1-p)} = M_{k-1}\, \frac{k-1}{k + \frac{1}{1-p}}. \qquad (9)$$
From (9) we have
$$\frac{M_k}{M_{k-1}} = \frac{k-1}{k + \frac{1}{1-p}} = 1 - \frac{1 + \frac{1}{1-p}}{k + \frac{1}{1-p}} = 1 - \frac{\beta}{k} + O\Big(\frac{1}{k^2}\Big),$$
where
$$\beta = 1 + \frac{1}{1-p} = 2 + \frac{p}{1-p}.$$
Consequently, $M_k$ is asymptotic to $c\, k^{-\beta}$ for some constant $c$.
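Both the recursion (9) and the power-law behavior $M_k/M_{k-1} = 1 - \beta/k + O(1/k^2)$ can be checked numerically; the Gamma-function closed form used in the sketch follows by telescoping the recursion:

```python
from math import lgamma, exp

def M(k, p):
    # M_k from recursion (9): M_1 = p/(2-p), M_k = M_{k-1} * (k-1)/(k + 1/(1-p))
    m = p / (2 - p)
    for j in range(2, k + 1):
        m *= (j - 1) / (j + 1 / (1 - p))
    return m

p = 0.5
beta = 1 + 1 / (1 - p)   # power-law exponent; beta = 3 for p = 1/2
k = 1000

# Ratio test: M_k / M_{k-1} should equal 1 - beta/k up to O(1/k^2)
ratio_gap = abs(M(k, p) / M(k - 1, p) - (1 - beta / k)) * k**2

# Closed form by telescoping: M_k = (p/(2-p)) Gamma(k) Gamma(2+q) / Gamma(k+1+q)
# with q = 1/(1-p); lgamma avoids overflow for large k.
q = 1 / (1 - p)
closed = p / (2 - p) * exp(lgamma(k) + lgamma(2 + q) - lgamma(k + 1 + q))
rel_err = abs(closed - M(k, p)) / M(k, p)
```

The scaled ratio gap stays bounded as $k$ grows, which is exactly the $O(1/k^2)$ correction term; the closed form agrees with the recursion to floating-point accuracy.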
9.2
Since the expected value can be quite different from the actual number of bins
with k balls at time t, we give a (probabilistic) estimate of the difference.
We will prove the following theorem.
Theorem 40 For the infinite Pólya process, asymptotically almost surely the number of bins with $k$ balls at time $t$ is
$$M_k(t+\alpha) + O\Big(2\sqrt{k^3 (t+\alpha) \ln(t+\alpha)}\Big).$$
Recall that $M_1 = \frac{p}{2-p}$ and
$$M_k = \frac{p}{2-p}\cdot \frac{\Gamma(k)\, \Gamma\big(2 + \frac{1}{1-p}\big)}{\Gamma\big(k + 1 + \frac{1}{1-p}\big)} \propto k^{-\big(1 + \frac{1}{1-p}\big)}.$$
In other words, almost surely the distribution of the bin sizes for the infinite Pólya process follows a power law with exponent $\beta = 1 + \frac{1}{1-p}$.
Proof: We have shown that
$$\lim_{t\to\infty} \frac{E(m_{k,t})}{t} = M_k,$$
where $M_k$ is defined recursively in (9). It suffices to show that $m_{k,t}$ concentrates around its expected value. We shall prove the following claim.

Claim: For any fixed $k \ge 1$ and any $c > 0$, with probability at least $1 - 2(t+\alpha+1)^{k-1} e^{-c^2}$, we have
$$|m_{k,t} - M_k(t+\alpha)| \le 2kc\sqrt{t+\alpha}.$$

To see that the claim implies Theorem 40, we choose $c = \sqrt{k \ln(t+\alpha)}$. Note that
$$2(t+\alpha+1)^{k-1} e^{-c^2} = 2(t+\alpha+1)^{k-1}(t+\alpha)^{-k} = o(1).$$
From the claim, with probability $1 - o(1)$, we have
$$|m_{k,t} - M_k(t+\alpha)| \le 2\sqrt{k^3 (t+\alpha) \ln(t+\alpha)},$$
as desired. It remains to prove the claim.
Proof of the Claim: We shall prove it by induction on $k$.

The base case $k = 1$: From equation (8), we have
$$E\big(m_{1,t} - M_1(t+\alpha) \mid F_{t-1}\big) = E(m_{1,t} \mid F_{t-1}) - M_1(t+\alpha)$$
$$= m_{1,t-1}\Big(1 - \frac{1-p}{t-1+\alpha}\Big) + p - M_1(t+\alpha)$$
$$= \big(m_{1,t-1} - M_1(t-1+\alpha)\big)\Big(1 - \frac{1-p}{t-1+\alpha}\Big) + p - M_1(1-p) - M_1$$
$$= \big(m_{1,t-1} - M_1(t-1+\alpha)\big)\Big(1 - \frac{1-p}{t-1+\alpha}\Big),$$
since $p - M_1(1-p) - M_1 = 0$.
Let
$$X_{1,t} = \frac{m_{1,t} - M_1(t+\alpha)}{\prod_{j=1}^{t}\big(1 - \frac{1-p}{j-1+\alpha}\big)},$$
and consider the martingale formed by $X_{1,0}, X_{1,1}, \ldots, X_{1,t}$; the computation above shows $E(X_{1,t} \mid F_{t-1}) = X_{1,t-1}$. Since $|m_{1,t} - m_{1,t-1}| \le 2$ and $M_1 = \frac{p}{2-p} < 1$, we have
$$|X_{1,t} - X_{1,t-1}| \le \frac{4}{\prod_{j=1}^{t}\big(1 - \frac{1-p}{j-1+\alpha}\big)} \qquad (10)$$
and
$$Var(X_{1,t} \mid F_{t-1}) = \frac{Var(m_{1,t} \mid F_{t-1})}{\prod_{j=1}^{t}\big(1 - \frac{1-p}{j-1+\alpha}\big)^2} \le \frac{4}{\prod_{j=1}^{t}\big(1 - \frac{1-p}{j-1+\alpha}\big)^2}. \qquad (11)$$
We apply the upper-tail martingale inequality with
$$\sigma_i^2 = \frac{4}{\prod_{j=1}^{i}\big(1 - \frac{1-p}{j-1+\alpha}\big)^2}, \qquad M = \frac{4}{\prod_{j=1}^{t}\big(1 - \frac{1-p}{j-1+\alpha}\big)}, \qquad a_i = 0,$$
and obtain
$$\Pr\big(X_{1,t} \ge E(X_{1,t}) + \lambda\big) \le e^{-\lambda^2/\big(2(\sum_{i=1}^{t} \sigma_i^2 + M\lambda/3)\big)}.$$
Note that
$$\prod_{j=1}^{i}\Big(1 - \frac{1-p}{j-1+\alpha}\Big) = \prod_{j=1}^{i} \frac{j+\alpha-2+p}{j+\alpha-1} = \frac{\Gamma(\alpha)\, \Gamma(i+\alpha-1+p)}{\Gamma(\alpha-1+p)\, \Gamma(i+\alpha)} \approx C(i+\alpha)^{-(1-p)},$$
where $C = \frac{\Gamma(\alpha)}{\Gamma(\alpha-1+p)}$. We choose
$$\lambda = \frac{2c\sqrt{t+\alpha}}{\prod_{j=1}^{t}\big(1 - \frac{1-p}{j-1+\alpha}\big)} \approx 2C^{-1} c\, (t+\alpha)^{3/2-p}, \qquad \text{so that} \qquad \lambda^2 = \frac{4c^2(t+\alpha)}{\prod_{j=1}^{t}\big(1 - \frac{1-p}{j-1+\alpha}\big)^2}.$$
Then
$$\sum_{i=1}^{t} \sigma_i^2 = \sum_{i=1}^{t} \frac{4}{\prod_{j=1}^{i}\big(1 - \frac{1-p}{j-1+\alpha}\big)^2} \approx \sum_{i=1}^{t} 4C^{-2}(i+\alpha)^{2-2p} \approx \frac{4C^{-2}}{3-2p}(t+\alpha)^{3-2p} < 4C^{-2}(t+\alpha)^{3-2p}.$$
We note that
$$M\lambda/3 \approx \frac{8}{3} C^{-2} c\, (t+\alpha)^{5/2-2p} < 2C^{-2}(t+\alpha)^{3-2p},$$
provided $4c/3 < \sqrt{t+\alpha}$. We have
$$\Pr\big(X_{1,t} \ge \lambda_1 + \lambda\big) \le e^{-\lambda^2/\big(2(\sum_{i=1}^{t} \sigma_i^2 + M\lambda/3)\big)} < e^{-c^2},$$
where $\lambda_1 = E(X_{1,t})$. Since $\lambda_1$ is much smaller than $\lambda$, we can replace $\lambda_1 + \lambda$ by $\lambda$ without loss of generality. Thus, with probability at least $1 - e^{-c^2}$, we have $X_{1,t} \le \lambda$; that is,
$$m_{1,t} - M_1(t+\alpha) \le 2c\sqrt{t+\alpha}. \qquad (12)$$
We remark that inequality (12) holds for any $c > 0$. In fact, it is trivial when $4c/3 > \sqrt{t+\alpha}$, since $|m_{1,t} - M_1(t+\alpha)| \le 2t$ always holds.

Similarly, by applying Theorem 23 to the martingale, the following lower bound holds:
$$m_{1,t} - M_1(t+\alpha) \ge -2c\sqrt{t+\alpha}. \qquad (13)$$
By the induction hypothesis, with probability at least $1 - 2(t+\alpha+1)^{k-2} e^{-c^2}$ we have
$$m_{k-1,t} \ge M_{k-1}(t+\alpha) - 2(k-1)c\sqrt{t+\alpha}.$$
On this event,
$$E\big(m_{k,t} - M_k(t+\alpha) - 2(k-1)c\sqrt{t+\alpha} \,\big|\, F_{t-1}\big) \ge \Big(1 - \frac{(1-p)k}{t-1+\alpha}\Big)\Big(m_{k,t-1} - M_k(t-1+\alpha) - 2(k-1)c\sqrt{t-1+\alpha}\Big),$$
using the fact that $M_k \le M_{k-1}$, as seen from (9). Normalizing $m_{k,t} - M_k(t+\alpha) - 2(k-1)c\sqrt{t+\alpha}$ as before defines $X_{k,t}$. Therefore, $X_{k,0}, X_{k,1}, \ldots, X_{k,t}$ forms a submartingale with failure probability at most $2t(t+\alpha+1)^{k-2} e^{-c^2}$.
Similar to inequalities (10) and (11), it can easily be shown that
$$|X_{k,t} - X_{k,t-1}| \le \frac{4}{\prod_{j=1}^{t}\big(1 - \frac{(1-p)k}{j-1+\alpha}\big)} \qquad (14)$$
and
$$Var(X_{k,t} \mid F_{t-1}) \le \frac{4}{\prod_{j=1}^{t}\big(1 - \frac{(1-p)k}{j-1+\alpha}\big)^2}.$$
We apply the relaxed martingale inequality with
$$\sigma_i^2 = \frac{4}{\prod_{j=1}^{i}\big(1 - \frac{(1-p)k}{j-1+\alpha}\big)^2}, \qquad M = \frac{4}{\prod_{j=1}^{t}\big(1 - \frac{(1-p)k}{j-1+\alpha}\big)}, \qquad a_i = 0,$$
and obtain
$$\Pr\big(X_{k,t} \ge E(X_{k,t}) + \lambda\big) \le e^{-\lambda^2/\big(2(\sum_{i=1}^{t} \sigma_i^2 + M\lambda/3)\big)} + \Pr(B).$$
Note that
$$\prod_{j=1}^{i}\Big(1 - \frac{(1-p)k}{j-1+\alpha}\Big) = \prod_{j=1}^{i} \frac{j+\alpha-1-(1-p)k}{j+\alpha-1} = \frac{\Gamma(\alpha)\, \Gamma\big(i+\alpha-(1-p)k\big)}{\Gamma\big(\alpha-(1-p)k\big)\, \Gamma(i+\alpha)} \approx C_k (i+\alpha)^{-(1-p)k},$$
where $C_k = \frac{\Gamma(\alpha)}{\Gamma(\alpha-(1-p)k)}$. We choose
$$\lambda = \frac{2c\sqrt{t+\alpha}}{\prod_{j=1}^{t}\big(1 - \frac{(1-p)k}{j-1+\alpha}\big)}, \qquad \text{so that} \qquad \lambda^2 = \frac{4c^2(t+\alpha)}{\prod_{j=1}^{t}\big(1 - \frac{(1-p)k}{j-1+\alpha}\big)^2}.$$
Then
$$\sum_{i=1}^{t} \sigma_i^2 = \sum_{i=1}^{t} \frac{4}{\prod_{j=1}^{i}\big(1 - \frac{(1-p)k}{j-1+\alpha}\big)^2} \approx \sum_{i=1}^{t} 4C_k^{-2}(i+\alpha)^{2k(1-p)} \approx \frac{4C_k^{-2}}{1 + 2k(1-p)}(t+\alpha)^{1+2k(1-p)} < 4C_k^{-2}(t+\alpha)^{1+2k(1-p)}.$$
We note that
$$M\lambda/3 \approx \frac{8}{3} C_k^{-2} c\, (t+\alpha)^{1/2+2k(1-p)} < 2C_k^{-2}(t+\alpha)^{1+2k(1-p)},$$
provided $4c/3 < \sqrt{t+\alpha}$. We have
$$\Pr\big(X_{k,t} \ge \lambda\big) \le e^{-\lambda^2/\big(2(\sum_{i=1}^{t} \sigma_i^2 + M\lambda/3)\big)} + \Pr(B) < e^{-c^2} + (t+\alpha)^{k-1} e^{-c^2} \le (t+\alpha+1)^{k-1} e^{-c^2}.$$
Hence, with probability at least $1 - (t+\alpha+1)^{k-1} e^{-c^2}$,
$$m_{k,t} - M_k(t+\alpha) \le 2kc\sqrt{t+\alpha}. \qquad (15)$$
We remark that inequality (15) holds for any $c > 0$. In fact, it is trivial when $4c/3 > \sqrt{t+\alpha}$, since $|m_{k,t} - M_k(t+\alpha)| \le 2(t+\alpha)$ always holds.
It can easily be shown that $X'_{k,t}$ is nearly a supermartingale. Similarly, applying Theorem 38 to $X'_{k,t}$, we obtain the following lower bound:
$$m_{k,t} - M_k(t+\alpha) \ge -2kc\sqrt{t+\alpha}. \qquad (16)$$
This completes the induction, and hence the proof of the Claim.
References
[1] J. Abello, A. Buchsbaum, and J. Westbrook, A functional approach to external graph algorithms, Proc. 6th European Symposium on Algorithms, pp. 332-343, 1998.
[2] W. Aiello, F. Chung and L. Lu, A random graph model for massive graphs, Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, (2000), 171-180.
[3] W. Aiello, F. Chung and L. Lu, Random evolution in massive graphs, extended abstract in The 42nd Annual Symposium on Foundations of Computer Science, October 2001; paper version in Handbook of Massive Data Sets (Eds. J. Abello et al.), Kluwer Academic Publishers (2002), 97-122.
[4] R. Albert and A.-L. Barabási, Statistical mechanics of complex networks, Reviews of Modern Physics 74 (2002), 47-97.
[5] N. Alon and J. H. Spencer, The Probabilistic Method, Wiley and Sons, New York, 1992.
[6] A.-L. Barabási and R. Albert, Emergence of scaling in random networks, Science 286 (1999), 509-512.
[7] N. Alon, J.-H. Kim and J. H. Spencer, Nearly perfect matchings in regular simple hypergraphs, Israel J. Math. 100 (1997), 171-187.
[8] H. Chernoff, A note on an inequality involving the normal distribution, Ann. Probab. 9 (1981), 533-535.
[9] F. Chung and L. Lu, Connected components in random graphs with given expected degree sequences, Annals of Combinatorics 6 (2002), 125-145.
[10] F. Chung and L. Lu, The average distances in random graphs with given expected degrees, Proceedings of the National Academy of Sciences 99 (2002), 15879-15882.
[11] F. Chung, L. Lu and V. Vu, The spectra of random graphs with given expected degrees, Proceedings of the National Academy of Sciences 100, no. 11 (2003), 6313-6318.
[12] F. Chung and L. Lu, Coupling online and offline analyses for random power law graphs, Internet Mathematics 1 (2004), 409-461.
[13] F. Chung and L. Lu, Complex Graphs and Networks, CBMS Lecture Notes, in preparation.
[14] F. Chung, S. Handjani and D. Jungreis, Generalizations of Pólya's urn problem, Annals of Combinatorics 7 (2003), 141-153.
[15] W. Feller, Martingales, in An Introduction to Probability Theory and its Applications, Vol. 2, Wiley, New York, 1971.
[16] R. L. Graham, D. E. Knuth and O. Patashnik, Concrete Mathematics, Second edition, Addison-Wesley, Reading, MA, 1994.
[17] S. Janson, T. Łuczak, and A. Ruciński, Random Graphs, Wiley-Interscience, New York, 2000.
[18] N. Johnson and S. Kotz, Urn Models and their Applications: An Approach to Modern Discrete Probability Theory, Wiley, New York, 1977.
[19] J. H. Kim and V. Vu, Concentration of multivariate polynomials and its applications, Combinatorica 20 (3) (2000), 417-434.
[20] C. McDiarmid, Concentration, in Probabilistic Methods for Algorithmic Discrete Mathematics, 195-248, Algorithms Combin. 16, Springer, Berlin, 1998.
[21] M. Mitzenmacher, A brief history of generative models for power law and lognormal distributions, Internet Mathematics 1 (2004), 226-251.
[22] N. C. Wormald, The differential equation method for random processes and greedy algorithms, in Lectures on Approximation and Randomized Algorithms (M. Karonski and H. J. Proemel, eds.), pp. 73-155, PWN, Warsaw, 1999.