
A Modular Proof of ALN

September 26, 2017

Abstract
We describe the $\tilde{O}(\sqrt{\log n})$-distortion embedding of metrics of negative type into $\ell_2$ due
to Arora, Lee and Naor [ALN05]. This yields a general tool for rounding natural SDP
relaxations of certain classes of combinatorial optimization problems. A well-studied example is
the Generalized Sparsest Cut problem, where this approach gives the best known approximation
ratio of $\tilde{O}(\sqrt{\log n})$. The crux of their argument is to construct a family of embeddings for several
different growth ratios and glue them together in a non-trivial way.

1 Introduction

Embedding one metric space into another is a well-studied problem in Computer
Science. Much of the research in this area deals with embedding finite metrics into simpler target
metrics such as $\ell_1$ or distributions over tree metrics. Among these targets, $\ell_1$ is particularly interesting,
as the class of $\ell_1$ metrics is exactly the cone of cut metrics. There are numerous combinatorial optimization
problems that seek an optimum cut in a graph with respect to some objective function. Such problems
are equivalent to optimizing that objective function over all $n$-point $\ell_1$ metrics, which is generally
NP-hard. A workaround is to optimize over all $n$-point metrics instead, and hope to embed the
resulting metric into $\ell_1$ without much loss. One of the first results in this area is due to [Bou85],
who showed that every $n$-point metric embeds (tightly) into $\ell_1$ (more generally, into any $\ell_p$ space) with
distortion $O(\log n)$.
A natural extension of this idea is to optimize over some smaller class of metrics which (a) contains
$\ell_1$; and (b) can be optimized over (say, via SDP) in polynomial time. A candidate class that has received
significant attention is the class of metrics of negative type (see Definition 2), which can be optimized
over by semidefinite programming. But in order to exploit it for better approximation algorithms,
one must show that it embeds into $\ell_1$ with low distortion. In a major breakthrough, Arora, Rao
and Vazirani [ARV04] showed that every $n$-point metric of negative type with diameter $1$ has a non-expanding
embedding into $\ell_1$ such that the sum of distances decreases by only a factor of $O(\sqrt{\log n})$. As an
application, they obtained an $O(\sqrt{\log n})$-approximation algorithm for the Uniform Sparsest Cut problem.
Building on the work of [ARV04] and its improvement by James Lee [Lee05], Chawla, Gupta and
Räcke [CGR05] showed that every $n$-point metric of negative type can be embedded into $\ell_2 \subseteq \ell_1$
with distortion $O(\log^{3/4} n)$. Arora, Lee and Naor [ALN05] improved the bound to the current best of
$\tilde{O}(\sqrt{\log n})$:
Theorem 1.1 Let $(X, d)$ be an $n$-point metric space of negative type. Then $(X, d)$ embeds into $L_2$
(and hence into $\ell_2$) with distortion $\tilde{O}(\sqrt{\log n})$.
As an application, they give a factor-$\tilde{O}(\sqrt{\log n})$ approximation algorithm for the Generalized Sparsest
Cut problem. They use the well-known fact that the integrality gap of the standard SDP relaxation of the
Generalized Sparsest Cut problem is equal to the least distortion required to embed every $n$-point
metric of negative type into $\ell_1$. In the current writeup, we refrain from going into these applications,
and address the structural question of embedding metrics of negative type:

Question 1 How well can we embed a metric of negative type into `1 ?

The best (unconditional) lower bound known on the required distortion is $\Omega(\log\log n)$, due to [KR06].
On the other hand, the best algorithm that we have gives a $\tilde{O}(\sqrt{\log n})$-distortion embedding, directly into
$\ell_2$, which is tight up to polylog factors [Enf70] (see Figure 1). An important open problem is to close
this gap, likely by showing a better lower bound. In this writeup, we give a slightly simpler and modular
proof of Theorem 1.1.

Figure 1: A schematic view of $\ell_1$, $\ell_2$ and the metrics of negative type.

2 Preliminaries

We are given an $n$-point metric space $(X, d)$. We assume that the distances are normalized, so that the
minimum distance is $1$ and the maximum distance is $\Delta$. We assume that $\log\log n \ge 4$ is a large enough
integer, and that $\log \Delta$ is also an integer.

Definition 1 A map $f : X \to Y$ between metric spaces $(X, d_X)$ and $(Y, d_Y)$ is said to be an embedding
of distortion $D$ iff for all pairs $x, y \in X$, $\frac{1}{D} \cdot d_X(x, y) \le d_Y(f(x), f(y)) \le d_X(x, y)$.
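
To make Definition 1 concrete, here is a minimal Python sketch of a distortion checker (a hypothetical helper of our own, not part of [ALN05]); d_X gives the original distances over pairs of distinct points, f the images, and norm the target norm:

    def distortion(pairs, d_X, f, norm):
        # Smallest D with d_X(x,y)/D <= ||f(x)-f(y)|| <= d_X(x,y) over the
        # given pairs (Definition 1); inf if f is not non-expanding.
        D = 1.0
        for x, y in pairs:
            stretch = norm(f(x), f(y))
            if stretch == 0.0 or stretch > d_X(x, y) * (1 + 1e-12):
                return float("inf")          # collapsed pair, or expansion
            D = max(D, d_X(x, y) / stretch)  # worst contraction so far
        return D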

In the writeup, the target space $(Y, d_Y)$ will always be a normed space. Thus, we will write
$d_Y(f(x), f(y))$ as $\|f(x) - f(y)\|$ when $Y$ is clear from context. We will only need to consider a
special case of an $L_2$ metric: for a given finite-dimensional $\ell_2$ space $H$ and a probability space
$(\Omega, \mu)$, we denote by $L_2(H, \Omega, \mu)$ the Hilbert space of $H$-valued random variables $Z$ with norm
$\|Z\|_{L_2(H,\Omega,\mu)} = \sqrt{\mathbb{E}_\mu[\|Z\|_2^2]}$. When the parameters are clear from context, we will denote the space
$L_2(H, \Omega, \mu)$ simply by $L_2$. To state the main result, let us first formally define metrics of negative
type.

Definition 2 The metric $(X, d)$ is said to be of negative type if there exists a mapping $h : X \to \ell_2$
such that $d(x, y) = \|h(x) - h(y)\|_2^2$ for all $x, y \in X$.
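
As a concrete aside (not needed later), Definition 2 can be tested numerically via Schoenberg's classical criterion: $(X, d)$ is of negative type iff the matrix $G_{ij} = \frac{1}{2}(d(x_i, x_0) + d(x_j, x_0) - d(x_i, x_j))$ is positive semidefinite, in which case any factorization of $G$ recovers the map $h$. A minimal NumPy sketch, with a star metric as our own example:

    import numpy as np

    def negative_type_embedding(D, tol=1e-9):
        # Schoenberg's criterion: (X, d) is of negative type iff
        # G[i][j] = (D[i][0] + D[0][j] - D[i][j]) / 2 is PSD. If so,
        # return h with D[i][j] = ||h[i] - h[j]||^2; otherwise None.
        G = (D[:, [0]] + D[[0], :] - D) / 2.0
        w, V = np.linalg.eigh(G)
        if w.min() < -tol:
            return None                            # not of negative type
        return V * np.sqrt(np.clip(w, 0.0, None))  # rows are the h(x)

    # Path metric of a star with three leaves (an l1 metric, hence negative type).
    D = np.array([[0, 1, 1, 1],
                  [1, 0, 2, 2],
                  [1, 2, 0, 2],
                  [1, 2, 2, 0]], dtype=float)
    h = negative_type_embedding(D)
    assert np.allclose(((h[:, None, :] - h[None, :, :]) ** 2).sum(-1), D)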

Though the writeup deals with low-distortion $L_2$ embeddings of metrics of negative type, we will
only use this fact in a black-box application of the following theorem, which yields a collection of sets called
zero-sets. The theorem is a combination of the result of [ARV04] as strengthened by [Lee05],
and the reweighting method of [CGR05].

Theorem 2.1 (CGR Technique) There exists an absolute constant $c^*$ such that, given any negative
type metric $(X, d)$ and a parameter $r$, there is an efficient algorithm to compute a distribution on
zero-sets $W_r \subseteq X$ such that for all $x, y \in X$ for which $2^{r-1} \le d(x, y) < 2^{r+2}$, with constant probability
$x \in W_r$ and $d(y, W_r) \ge 2^{r-c^*}/\sqrt{\log |X|}$.

This theorem is essentially the same as Theorem 4.1 in [CGR05], but we include a proof in Section 6
for completeness. The majority of the work is done in two parts: (i) repeatedly obtaining embeddings by
invoking Theorem 2.1 on more and more refined subspaces $(X', d)$ of $(X, d)$; and (ii) gluing all the
resulting embeddings using what is called a gluing lemma.
For expressions $A$ and $B$, when we write $A \gtrsim B$, we mean that $A \ge cB$ for some absolute constant
$c$. Similarly, $A \lesssim B$ means $A \le c'B$ for some absolute constant $c'$. When we write $A \approx B$, we mean that
$A \in [B, 2B)$. We use the phrase 'scale $s$' to denote distances in the range $[2^s, 2^{s+1})$ (see Definition
4, for instance). For convenience, we denote the set of all relevant scales by $\mathcal{S} = \{0, 1, \ldots, \log \Delta\}$. The
radius-$R$ ball around a point $x \in X$ is defined to be $B(x, R) = \{z \in X : d(x, z) \le R\}$.

2.1 Growth Ratio

The goal of this sub-section is to define the notion of growth ratio, which is the cornerstone of the
entire argument. For convenience, let $\mathcal{G} = \{2^{2^i} : i \in \{1, \ldots, \log\log n\}\}$, where $i$ ranges over the first
$\log\log n$ natural numbers. Recall that $c^*$ is the constant from Theorem 2.1.

Definition 3 The growth ratio of a point $x$ at scale $s$ is defined as:
$$\Pi_s(x) := \operatorname*{arg\,min}_{K \in \mathcal{G}} \left( \forall K' \in \mathcal{G} \text{ s.t. } K' \ge K, \quad \frac{|B(x, 2^{s+7} \cdot \log n)|}{|B(x, 2^{s-2-c^*}/\sqrt{\log K'})|} \le K' \right).$$

Note that $\Pi_s$ is well defined. Indeed, when $K' = n$, we have $\frac{|B(x, 2^{s+7} \log n)|}{|B(x, 2^{s-2-c^*}/\sqrt{\log n})|} \le n$. Thus, $4 \le \Pi_s(x) \le n$ for
all $x \in X$. We note that the definition is somewhat convoluted, but we want the following properties
to be satisfied by it:

• For a point $x$ and a scale $s$, if $\Pi_s(x) = O(1)$, then the balls around $x$ of radius $R \approx 2^{s-2}$ are
approximately of the same size (see Observation 2.2); and
• the ratio of the sizes of the balls of radius roughly $2^s \cdot \log n$ and $2^s/\sqrt{\log \Pi_s(x)}$ is around $\Pi_s(x)$ on
a logarithmic scale, whenever $\Pi_s(x) \gg 1$ (see Observation 2.3).

In the remainder of the section, we quantify the above-mentioned properties stemming
from our definition of growth ratio.
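
As a sanity check, the following brute-force Python transcription of Definition 3 may help (ball_size, dist and c_star are our own placeholders for the metric and the constant $c^*$; logs are base 2):

    import math

    def ball_size(dist, points, x, r):
        # |B(x, r)| under the metric dist
        return sum(1 for z in points if dist(x, z) <= r)

    def growth_ratio(dist, points, x, s, c_star=1.0):
        # Definition 3: the least K in G = {2^(2^i)} such that every
        # K' >= K in G passes the ball-ratio test.
        n = len(points)
        G = [2 ** (2 ** i) for i in range(1, int(math.log2(math.log2(n))) + 1)]
        big = ball_size(dist, points, x, 2 ** (s + 7) * math.log2(n))
        for K in G:
            if all(big <= Kp * ball_size(dist, points, x,
                                         2 ** (s - 2 - c_star) / math.sqrt(math.log2(Kp)))
                   for Kp in G if Kp >= K):
                return K
        return G[-1]  # largest ratio in G (= n when log log n is an integer)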
Observation 2.2 Let $x \in X$ be a point and $s$ a scale such that $\Pi_s(x) = 4$. Then $\frac{|B(x, 2^{s-1})|}{|B(x, 2^{s-2})|} \le 4$.

Proof: Notice that $B(x, 2^{s-1}) \subseteq B(x, 2^{s+7} \cdot \log n)$ and $B(x, 2^{s-2-c^*}/\sqrt{\log \Pi_s(x)}) \subseteq B(x, 2^{s-2})$. By
definition of the growth ratio,
$$\frac{|B(x, 2^{s-1})|}{|B(x, 2^{s-2})|} \le \frac{|B(x, 2^{s+7} \cdot \log n)|}{|B(x, 2^{s-2-c^*}/\sqrt{\log \Pi_s(x)})|} \le \Pi_s(x) = 4.$$

Observation 2.3 Let $x \in X$ be a point and $s$ a scale such that $\Pi_s(x) > 4$. Then
$$\Pi_s(x) \ge \frac{|B(x, 2^{s+7} \log n)|}{|B(x, 2^{s-2-c^*}/\sqrt{\log \Pi_s(x)})|} > \sqrt{\Pi_s(x)}.$$

Proof: Since $\Pi_s(x) > 4$ and the growth ratio cannot be further reduced (i.e., $K' = \sqrt{\Pi_s(x)} \in \mathcal{G}$ fails the test in Definition 3), we know that
$$\frac{|B(x, 2^{s+7} \log n)|}{|B(x, 2^{s-2-c^*}/\sqrt{\log \sqrt{\Pi_s(x)}})|} > \sqrt{\Pi_s(x)}.$$
Noting that $B(x, 2^{s-2-c^*}/\sqrt{\log \Pi_s(x)}) \subseteq B(x, 2^{s-2-c^*}/\sqrt{\log \sqrt{\Pi_s(x)}})$,
$$\Pi_s(x) \ge \frac{|B(x, 2^{s+7} \log n)|}{|B(x, 2^{s-2-c^*}/\sqrt{\log \Pi_s(x)})|} \ge \frac{|B(x, 2^{s+7} \log n)|}{|B(x, 2^{s-2-c^*}/\sqrt{\log \sqrt{\Pi_s(x)}})|} > \sqrt{\Pi_s(x)}.$$

We note that Observations 2.2 and 2.3 are the only properties we need from the growth ratio; as long
as these properties hold, we could work with any alternate definition of growth ratio. Finally, given
a pair of points $x, y \in X$, we will also need a 'symmetrized' version of the growth ratio:
$\Pi^*(x, y) := \max(\Pi_s(x), \Pi_s(y))$, where $s$ is the scale such that $d(x, y) \approx 2^s$.

2.2 Distortion as a Function of Points

For convenience, and with an overload of notation, we think of the distortion of an embedding as a function
which maps a pair of points to their respective stretch under that embedding. Since we will be dealing
with the stretch incurred between different pairs of points at various scales, this definition allows
us to state intermediate observations succinctly.

Definition 4 We say that an embedding $f$ into some normed space is of distortion $D$ at scale $s$ and
growth ratio $G$ iff for all $x, y \in X$:

• $\|f(x) - f(y)\| \le d(x, y)$; and

• if $d(x, y) \approx 2^s$ and $\Pi^*(x, y) = G$, then $\frac{d(x,y)}{D(x,y)} \le \|f(x) - f(y)\| \le d(x, y)$.

We say that $f$ is of distortion $D$ at scale $s$ iff $f$ is of distortion $D$ at scale $s$ and growth ratio $G$ for all
$G \in \mathcal{G}$. We say that $f$ is of distortion $D$ at growth ratio $G$ iff $f$ is of distortion $D$ at scale $s$ and growth
ratio $G$ for all scales $s \in \mathcal{S}$. We say that $f$ is of distortion $D$ iff $f$ is of distortion $D$ at all scales $s \in \mathcal{S}$.

Given two embeddings $f_1 : X \to L_2$ and $f_2 : X \to L_2$, we define $f_1 \oplus f_2 : X \to L_2$ as the embedding
given by $(f_1 \oplus f_2)(x) = (f_1(x), f_2(x))$. We extend this notation naturally to more
than two embeddings. We work with the above definition of distortion as it allows easy gluing:

Observation 2.4 Let $f_1, \ldots, f_z$ be $L_2$ embeddings of distortions $D_1, \ldots, D_z$. Then the embedding
$f := \frac{1}{\sqrt{z}} \cdot (f_1 \oplus \cdots \oplus f_z)$ is an $L_2$ embedding of distortion $D = \sqrt{z} \cdot \min(D_1, \ldots, D_z)$.
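
As a minimal illustration (our own, for embeddings given concretely as arrays of images): non-expansion of $f$ holds because $\|f(x) - f(y)\|^2 = \frac{1}{z}\sum_i \|f_i(x) - f_i(y)\|^2 \le d(x, y)^2$, while the best single block certifies the stated contraction bound $\sqrt{z} \cdot \min_i D_i$.

    import numpy as np

    def glue(embeddings):
        # Observation 2.4: given a list of n x d_i arrays (row j = image of
        # point j under f_i), concatenate coordinates and rescale by 1/sqrt(z).
        z = len(embeddings)
        return np.hstack(embeddings) / np.sqrt(z)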

3 Ingredients

The key idea is to consider low-distortion embeddings at various scales, and to combine them into a
single low-distortion embedding that works for all scales. A trivial concatenation of the embeddings as
in Observation 2.4 would lose a factor of $\sqrt{\log \Delta}$. To overcome this difficulty, we prove the
following 'gluing lemma', which is a generalization of Lee's gluing lemma [Lee05], which in turn is
based on the Measured Descent technique of [KLMN04]. We defer its proof to Section 5.
Lemma 3.1 (Gluing Lemma) For each $s \in \{0, \ldots, \log \Delta\}$, let $f_s$ be an $L_2$ embedding of distortion
$D$ at scale $s$. Then there is an $L_2$ embedding $f$ of distortion $\hat{D}$, where
$$\hat{D}(x, y) \lesssim \sqrt{\log n} \cdot \max\left(1, \frac{D(x, y)}{\sqrt{\log \Pi^*(x, y)}}\right) \cdot \mathrm{poly}(\log\log n).$$

The last theorem that we need is very similar to Bourgain's argument [Bou85], and its proof (with
slightly different parameters) can be found in the full version of [Lee05].
Theorem 3.2 (A Variant of Bourgain's Argument) For any metric space $(X, d)$, there is an
embedding $\tilde{H}$ of distortion $\tilde{D}$ at growth ratio $4$, where $\tilde{D}(x, y) \lesssim \sqrt{\log n}$.

4 ALN Theorem

The goal of this section is to prove Theorem 1.1. We assume we are given a metric space $(X, d)$ as
described in Section 2. Recall that we defined $\mathcal{G} = \{2^{2^i} : i \in \{1, \ldots, \log\log n\}\}$. We start with the
following core claim, whose proof appears in Section 4.1.
Lemma 4.1 For each scale $m \in \mathcal{S}$ and growth ratio $G \in \mathcal{G}$, there is an embedding $g_m^G$ of distortion
$D_m^G$ at scale $m$ and growth ratio $G$, where $D_m^G(x, y) \lesssim \sqrt{\log \Pi^*(x, y)}$.
Let us first complete the proof of Theorem 1.1 assuming the above lemma.
Proof of Theorem 1.1. For each scale $m$ and parameter $G \in \mathcal{G}$, let $g_m^G$ be the embedding given by
Lemma 4.1. For each scale $m$, consider the embedding $g_m := \frac{1}{\sqrt{|\mathcal{G}|}} \cdot \bigoplus_{G \in \mathcal{G}} g_m^G$ obtained by concatenating
all the embeddings at scale $m$, as described in Observation 2.4.
Claim 4.2 The embedding $g_m$ is of distortion $D$ at scale $m$, where $D(x, y) \lesssim \sqrt{\log \Pi^*(x, y) \cdot \log\log n}$.

Proof: Fix any pair $x, y \in X$. Let $m$ be the scale such that $d(x, y) \approx 2^m$, and let $\Pi^*(x, y) = G'$. From
Observation 2.4,
$$D(x, y) \le \sqrt{|\mathcal{G}|} \cdot \min_{G \in \mathcal{G}} D_m^G(x, y) \le \sqrt{\log\log n} \cdot D_m^{G'}(x, y) \lesssim \sqrt{\log \Pi^*(x, y) \cdot \log\log n},$$
where the last inequality comes from Lemma 4.1.


Finally, we apply the Gluing Lemma 3.1 to the embeddings $g_m$ to obtain an embedding $\hat{H}$ with distortion
$\hat{D}$, where
$$\hat{D}(x, y) \lesssim \sqrt{\log n} \cdot \max\left(1, \frac{D(x, y)}{\sqrt{\log \Pi^*(x, y)}}\right) \cdot \mathrm{poly}(\log\log n) \lesssim \sqrt{\log n} \cdot \mathrm{poly}(\log\log n).$$
This completes the proof of Theorem 1.1. $\square$

4.1 The Core of ALN

The goal of this sub-section is to prove Lemma 4.1. Let $m, G$ be given as required in Lemma 4.1. If
$G = 4$, then we use the embedding guaranteed by Theorem 3.2. Thus, in the remainder of the section,
we assume $G > 4$. We let $s$ be the scale such that $2^s = 2^{m+6} \cdot \log n$. We want to decompose $X$ into
clusters $\mathcal{P}$ of bounded diameter such that, if we denote by $P_x \in \mathcal{P}$ the cluster containing $x \in X$, then
a sufficiently large ball around $x$ is contained in $P_x$ with constant probability. Formally, we start with
the following folklore theorem, which can be distilled out of [FRT04] (see, for example, the discussion
following Definition 1.3 in [KLMN04]). We include its proof in Section 7 for completeness.
Theorem 4.3 (FRT Technique) Given a metric space $(X, d)$, there is an efficient algorithm to
obtain a decomposition $\mathcal{P}$ of $X$ such that (i) the diameter of each cluster $P \in \mathcal{P}$ is bounded by $2^{s+1}$; and
(ii) for any point $x \in X$ and the cluster $P_x \in \mathcal{P}$ containing it, $\Pr\left[B(x, 2^{s-4}/\log n) \subseteq P_x\right] \ge \Omega(1)$.

Now we are ready to describe the core algorithm to obtain the embedding $g_m^G$ (a Python sketch follows the steps below).

1. Let $\mathcal{P}$ be the decomposition obtained by Theorem 4.3.

2. Let $S$ be a random sample such that $|S \cap P| = \min(|P|, G)$ for each block $P \in \mathcal{P}$.

3. Apply Theorem 2.1 to each $S \cap P$ to obtain zero-sets $W_m^{S \cap P}$ for the scale $m$.

4. Let $g_m^G(x) := d(x, W_m^{S \cap P_x})$ be the embedding that we output.
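
A minimal sketch of these four steps (our own illustration; frt_decompose and cgr_zero_set stand in for the black boxes of Theorems 4.3 and 2.1, and the convention g[x] = 0 when the zero-set of a block comes out empty is ours):

    import random

    def core_embedding(points, dist, m, G, frt_decompose, cgr_zero_set):
        partition = frt_decompose(points, dist, m)        # step 1
        g = {}
        for P in partition:
            S_P = random.sample(list(P), min(len(P), G))  # step 2
            W = cgr_zero_set(S_P, dist, m)                # step 3
            for x in P:                                   # step 4
                g[x] = min((dist(x, w) for w in W), default=0.0)
        return g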

The goal now is to show that the embedding $g_m^G$ indeed has the claimed distortion. The following
observation is a direct consequence of the triangle inequality, since $|d(x, W) - d(y, W)| \le d(x, y)$ for
any non-empty set $W$.

Observation 4.4 For any pair of points $x, y \in X$, $\|g_m^G(x) - g_m^G(y)\|_2 \le d(x, y)$.

For the sake of analysis, fix any pair $x, y \in X$ such that $d(x, y) \approx 2^m$ and $\Pi^*(x, y) = G$. Let $\mathcal{E}_{\mathrm{ALN\ works}}$
be the event that $|d(x, W_m^{S \cap P_x}) - d(y, W_m^{S \cap P_y})| \gtrsim d(x, y)/\sqrt{\log G}$. In the remainder of the section, our
aim is to show that the event $\mathcal{E}_{\mathrm{ALN\ works}}$ occurs with constant probability. Indeed, if we can show
this, then $\|g_m^G(x) - g_m^G(y)\|_2 \gtrsim d(x, y)/\sqrt{\log G}$ and Lemma 4.1 follows.

For a pair of points $x', y' \in S \cap P$, for some cluster $P \in \mathcal{P}$, let $\mathcal{E}_{\mathrm{CGR\ works}}(x', y')$ be the event that
$x' \in W_m^{S \cap P}$ and $d(y', W_m^{S \cap P}) > 2^{m-c^*}/\sqrt{\log G}$. From Theorem 2.1, we know that:
Observation 4.5 Let $x', y' \in S \cap P$, for some cluster $P \in \mathcal{P}$, be a pair of points such that $2^{m-1} \le
d(x', y') < 2^{m+2}$. Then $\Pr[\mathcal{E}_{\mathrm{CGR\ works}}(x', y')] \ge \Omega(1)$.

Assume for a moment that $x, y \in S \cap P_x$. From Observation 4.5, we have that $x \in W_m^{S \cap P_x}$ and
$d(y, W_m^{S \cap P_x}) \gtrsim 2^m/\sqrt{\log G} \gtrsim d(x, y)/\sqrt{\log G}$ with constant probability. Thus, $\mathcal{E}_{\mathrm{ALN\ works}}$ indeed
occurs with constant probability, and we are done. But it might be the case that either $x$ or $y$ (or
both) does not appear in $S \cap P_x$. To overcome this challenge, we first show that a sufficiently
large ball around $x$ and $y$ is contained in $P_x$ with constant probability. Next, we argue that $S \cap P_x$
contains points $x'$ and $y'$ close enough to $x$ and $y$, respectively, with high probability. In this way, we
will be able to apply Observation 4.5 to the nearby points $x'$ and $y'$, and the analysis will go through.
Let $\mathcal{E}_{\mathrm{same\ block}}$ be the event that $B(x, 2^{m+2}) = B(x, 2^{s-4}/\log n) \subseteq P_x$. From Theorem 4.3,

Observation 4.6 $\Pr[\mathcal{E}_{\mathrm{same\ block}}] = \Omega(1)$.

Notice that if $\mathcal{E}_{\mathrm{same\ block}}$ occurs, then $B(y, 2^{m+1}) \subseteq P_x$. From now on, we assume that $\mathcal{E}_{\mathrm{same\ block}}$ indeed
occurs, and let $P_x = P_y = P$. Let $\mathcal{E}_{\mathrm{neighbor\ in\ }S}(x)$ be the event that there is a point $x' \in S \cap P$
such that $d(x, x') \le 2^{m-2-c^*}/\sqrt{\log G}$. Similarly, let $\mathcal{E}_{\mathrm{neighbor\ in\ }S}(y)$ be the event that there is a point
$y' \in S \cap P$ such that $d(y, y') \le 2^{m-2-c^*}/\sqrt{\log G}$.

Claim 4.7 $\Pr[\mathcal{E}_{\mathrm{neighbor\ in\ }S}(x) \wedge \mathcal{E}_{\mathrm{neighbor\ in\ }S}(y)] \ge \Omega(1)$.

Proof: First, we give a lower bound on the size of the ball $B(x, 2^{m-2-c^*}/\sqrt{\log G})$:
$$G = \Pi_m(x) \ge \frac{|B(x, 2^{m+7} \log n)|}{|B(x, 2^{m-2-c^*}/\sqrt{\log \Pi_m(x)})|} \ge \frac{|P|}{|B(x, 2^{m-2-c^*}/\sqrt{\log G})|}.$$
Here, the last inequality follows from the fact that $P \subseteq B(x, 2^{s+1}) = B(x, 2^{m+7} \log n)$ (recall that the
diameter of $P$ is bounded by $2^{s+1}$). Rearranging the terms, $|B(x, 2^{m-2-c^*}/\sqrt{\log G})| \ge \frac{1}{G} \cdot |P|$.
Notice that $B(x, 2^{m-2-c^*}/\sqrt{\log G}) \subseteq B(x, 2^{m+2}) \subseteq P$. Recall that $S$ is a random sample of $X$ such
that $|S \cap P| = \min(|P|, G)$. A standard calculation now reveals that, with constant probability, at
least one element of $B(x, 2^{m-2-c^*}/\sqrt{\log G})$ is picked in $S$. Thus, $\Pr[\mathcal{E}_{\mathrm{neighbor\ in\ }S}(x)] \ge \Omega(1)$. Using
the exact same calculation, we can also show that $\Pr[\mathcal{E}_{\mathrm{neighbor\ in\ }S}(y)] \ge \Omega(1)$.
The claim now follows by noting that the events $\mathcal{E}_{\mathrm{neighbor\ in\ }S}(x)$ and $\mathcal{E}_{\mathrm{neighbor\ in\ }S}(y)$ are independent,
since the balls $B(x, 2^{m-2-c^*}/\sqrt{\log G})$ and $B(y, 2^{m-2-c^*}/\sqrt{\log G})$ are disjoint.
Now we are ready to show the following claim, completing our proof.

Claim 4.8 $\Pr[\mathcal{E}_{\mathrm{ALN\ works}}] = \Omega(1)$.

Proof: Assume from now on that the events $\mathcal{E}_{\mathrm{neighbor\ in\ }S}(x)$ and $\mathcal{E}_{\mathrm{neighbor\ in\ }S}(y)$ indeed occur. Let
$x', y'$ be the pair of points thus guaranteed to exist. By the choice of points, $|d(x', y') - d(x, y)| \le
2 \cdot 2^{m-2-c^*}/\sqrt{\log G} \le 2^{m-1-c^*}$. In other words, $2^{m-1} \le d(x', y') < 2^{m+2}$. From Observation 4.5,
$\Pr[\mathcal{E}_{\mathrm{CGR\ works}}(x', y')] \ge \Omega(1)$. Conditioned on this event,
$$|d(x, W_m^{S \cap P}) - d(y, W_m^{S \cap P})| \ge d(y', W_m^{S \cap P}) - d(y, y') - d(x, x') \ge \frac{2^{m-c^*}}{\sqrt{\log G}} - \frac{2 \cdot 2^{m-2-c^*}}{\sqrt{\log G}} = \frac{2^{m-1-c^*}}{\sqrt{\log G}} \gtrsim \frac{d(x, y)}{\sqrt{\log G}}.$$
Here we used that $d(x, W_m^{S \cap P}) \le d(x, x')$, since $x' \in W_m^{S \cap P}$, together with the triangle inequality.
Thus, $\mathcal{E}_{\mathrm{ALN\ works}}$ indeed occurs and the claim follows.

5 Gluing Lemma

The goal of this section is to prove the Gluing Lemma 3.1. From Theorem 3.2, we already have an
embedding $\tilde{H}$ with distortion $\tilde{D}$ at growth ratio $4$, such that $\tilde{D}(x, y) \lesssim \sqrt{\log n}$. Invoking Observation
2.4, it is now sufficient to prove the following theorem:

Theorem 5.1 For each $s \in \{0, \ldots, \log \Delta\}$, let $f_s$ be an $L_2$ embedding of distortion $D$ at scale $s$.
Then, for each $G > 4$, there is an $L_2$ embedding $f_G$ of distortion $\hat{D}_G$ at growth ratio $G$, where
$$\hat{D}_G(x, y) \lesssim \sqrt{\log n} \cdot \max\left(1, \frac{D(x, y)}{\sqrt{\log \Pi^*(x, y)}}\right) \cdot \mathrm{poly}(\log\log n).$$

In the remainder of the section, we prove the above theorem. We assume that we are given $G \in \mathcal{G}$
such that $G > 4$. We will make use of the Shrinking Lemma, which appears as Lemma 5.2 in [MN04].
Intuitively, the lemma says that any metric space can be projected onto an $L_2$ shell in such a way that the
distance between close points does not change by much, while the distance between far enough points is
set roughly to the new diameter.
Lemma 5.2 (Shrinking Lemma) Given any metric space $(Y, d_Y)$ and a parameter $\tau \in \mathbb{R}^+$, there
is an embedding $h_\tau : Y \to L_2$ with distortion $2$, such that:

• $\forall y \in Y$, $\|h_\tau(y)\| = \tau$; and

• $\forall y, y' \in Y$, $\frac{1}{2} \cdot \min(\tau, d_Y(y, y')) \le \|h_\tau(y) - h_\tau(y')\| \le \min(\tau, d_Y(y, y'))$.

For each scale $s$, let $\hat{f}_s$ be the $L_2$ embedding obtained by applying the Shrinking Lemma 5.2 to the
embedding $f_s$ with parameter $\tau = 2^s/\sqrt{\log G}$. Thus:

Observation 5.3 $\|\hat{f}_s(x)\| = 2^s/\sqrt{\log G}$, and $\frac{1}{2} \cdot \min\left(\frac{2^s}{\sqrt{\log G}}, \frac{d(x, y)}{D(x, y)}\right) \le \|\hat{f}_s(x) - \hat{f}_s(y)\| \le d(x, y)$.

5.1 Partitions of Unity

For a parameter $t$, we let $R(x, t)$ be the largest radius $R$ such that $B(x, R)$ contains at most $2^t$ points.
We need an $O(\sqrt{\log G})$-Lipschitz map $\rho$ (a partition of unity) that (i) has a small enough support;
(ii) is identically $1$ on a large enough interval; and (iii) is bounded by $1$ everywhere. To this end, we let
$\rho : \mathbb{R} \to [0, 1]$ be a piecewise linear function with support $\left[\frac{1}{2^{3+c^*} \sqrt{\log G}}, 2^8 \log n\right]$ such that $\rho(q) = 1$
in the range $\left[\frac{1}{2^{2+c^*} \sqrt{\log G}}, 2^7 \log n\right]$. We let $\rho_{s,t}(x) := \rho(R(x, t)/2^s)$ (note that $\rho$ depends on $G$, which
is fixed throughout this section). First, observe that $\rho_{s,t}$ is somewhat smooth:
Observation 5.4 For any pair $x, y \in X$, $|\rho_{s,t}(x) - \rho_{s,t}(y)| \lesssim \sqrt{\log G} \cdot d(x, y)/2^s$.

Proof: Since $\rho$ is piecewise linear with slope $O(\sqrt{\log G})$,
$$|\rho_{s,t}(x) - \rho_{s,t}(y)| \lesssim \sqrt{\log G} \cdot \frac{|R(x, t) - R(y, t)|}{2^s} \le \sqrt{\log G} \cdot \frac{d(x, y)}{2^s}.$$
Here, the last inequality follows by noting that $B(y, R(x, t) - d(x, y)) \subseteq B(x, R(x, t))$, and hence
$R(x, t) \le d(x, y) + R(y, t)$. Symmetrically, we obtain $R(y, t) \le d(x, y) + R(x, t)$.
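
For concreteness, a direct Python transcription of the bump $\rho$ (our own sketch; c_star and n are parameters here, and logs are base 2):

    import math

    def rho(q, G, c_star=1.0, n=1024):
        # Piecewise linear: 0 outside [2^-(3+c*)/sqrt(log G), 2^8 log n],
        # 1 on [2^-(2+c*)/sqrt(log G), 2^7 log n], linear in between.
        lo0 = 2.0 ** (-(3 + c_star)) / math.sqrt(math.log2(G))  # support start
        lo1 = 2.0 ** (-(2 + c_star)) / math.sqrt(math.log2(G))  # plateau start
        hi1 = 2.0 ** 7 * math.log2(n)                           # plateau end
        hi0 = 2.0 ** 8 * math.log2(n)                           # support end
        if q <= lo0 or q >= hi0:
            return 0.0
        if q < lo1:
            return (q - lo0) / (lo1 - lo0)  # rising edge, slope O(sqrt(log G))
        if q <= hi1:
            return 1.0
        return (hi0 - q) / (hi0 - hi1)      # falling edge

    def rho_st(x, s, t, R, G):
        # rho_{s,t}(x) = rho(R(x, t) / 2^s), with R the ball-radius function
        return rho(R(x, t) / 2.0 ** s, G)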
Exploiting the smoothness of $\rho_{s,t}$, we claim that the embedding $\varphi_{s,t}$ defined as $\varphi_{s,t}(x) := \rho_{s,t}(x) \cdot \hat{f}_s(x)$
is nice in the following sense. First, we claim that $\varphi_{s,t}$ is non-expanding up to a constant.
Claim 5.5 For all $x, y \in X$, $\|\varphi_{s,t}(x) - \varphi_{s,t}(y)\| \lesssim d(x, y)$.

Proof: By definition of the embedding,
$$\|\varphi_{s,t}(x) - \varphi_{s,t}(y)\| = \|\rho_{s,t}(x) \cdot \hat{f}_s(x) - \rho_{s,t}(y) \cdot \hat{f}_s(y)\| \le |\rho_{s,t}(x) - \rho_{s,t}(y)| \cdot \|\hat{f}_s(x)\| + \rho_{s,t}(y) \cdot \|\hat{f}_s(x) - \hat{f}_s(y)\| \lesssim \sqrt{\log G} \cdot \frac{d(x, y)}{2^s} \cdot \frac{2^s}{\sqrt{\log G}} + 1 \cdot d(x, y) \lesssim d(x, y).$$
Here, the second inequality follows from Observation 5.4, the fact that the range of $\rho_{s,t}$ is $[0, 1]$, and
Observation 5.3.
Next, we claim that $\varphi_{s,t}$ does not shrink the distance of a pair $(x, y)$ too much when $\Pi^*(x, y) = G$,
$d(x, y) \approx 2^s$ and $\rho_{s,t}(x) = 1$.

Claim 5.6 Fix any pair $x, y \in X$ of points such that $\Pi^*(x, y) = G$. Let $s, t$ be such that $d(x, y) \approx 2^s$
and $\rho_{s,t}(x) = 1$. Then $\|\varphi_{s,t}(x) - \varphi_{s,t}(y)\| \gtrsim d(x, y)/\max(D(x, y), \sqrt{\log G})$.

Proof: Notice that in this regime, from Observation 5.3,
$$\|\hat{f}_s(x) - \hat{f}_s(y)\| \ge \frac{1}{4} \cdot \frac{d(x, y)}{\max(D(x, y), \sqrt{\log G})}.$$
From our choice of $t$ (so that $\varphi_{s,t}(x) = \hat{f}_s(x)$), on one hand,
$$\|\varphi_{s,t}(x) - \varphi_{s,t}(y)\| = \|\hat{f}_s(x) - \rho_{s,t}(y) \cdot \hat{f}_s(y)\| \ge \|\hat{f}_s(x)\| - \rho_{s,t}(y) \cdot \|\hat{f}_s(y)\| = (1 - \rho_{s,t}(y)) \cdot \frac{2^s}{\sqrt{\log G}},$$
where the last equality comes from Observation 5.3. On the other hand,
$$\|\varphi_{s,t}(x) - \varphi_{s,t}(y)\| \ge \|\hat{f}_s(x) - \hat{f}_s(y)\| - (1 - \rho_{s,t}(y)) \cdot \|\hat{f}_s(y)\| \ge \frac{1}{4} \cdot \frac{d(x, y)}{\max(D(x, y), \sqrt{\log G})} - (1 - \rho_{s,t}(y)) \cdot \frac{2^s}{\sqrt{\log G}},$$
where again the last inequality comes from Observation 5.3. Averaging the two bounds gives the
desired claim.

5.2 Ensembling the Embedding


The embedding $\varphi_t$ is now defined as $\varphi_t(x) := \bigoplus_s \varphi_{s,t}(x)$. The final embedding is then $\varphi(x) :=
\bigoplus_{t \in \{0, \ldots, \log n\}} \varphi_t(x)$. In the remainder of the section, we prove that $\varphi$ is indeed an embedding with the
claimed properties.

Lemma 5.7 For each pair of points $x, y \in X$, $\|\varphi(x) - \varphi(y)\|_2 \lesssim \sqrt{\log n \cdot \log\log n} \cdot d(x, y)$.

Proof: Notice that for a fixed $t$, if $\rho_{s,t}(x) \ne 0$, then $R(x, t) \in \left[\frac{2^{s-3-c^*}}{\sqrt{\log G}}, 2^{s+8} \log n\right]$. In other words,
$2^s \in [R(x, t)/(2^8 \cdot \log n), R(x, t) \cdot 2^{3+c^*} \cdot \sqrt{\log G}]$.
Thus, there are only $O(\log(\sqrt{\log G} \cdot \log n)) = O(\log\log n)$ values of $s$ such that $\rho_{s,t}(x) \ne 0$. Similarly,
$\rho_{s,t}(y) \ne 0$ for at most $O(\log\log n)$ values of $s$. First, we claim that for a fixed $t$, $\|\varphi_t(x) - \varphi_t(y)\|_2 \le
O(\sqrt{\log\log n}) \cdot d(x, y)$. Indeed,
$$\|\varphi_t(x) - \varphi_t(y)\|_2^2 = \sum_s \|\rho_{s,t}(x) \cdot \hat{f}_s(x) - \rho_{s,t}(y) \cdot \hat{f}_s(y)\|_2^2 \le O(\log\log n) \cdot \max_s \|\rho_{s,t}(x) \cdot \hat{f}_s(x) - \rho_{s,t}(y) \cdot \hat{f}_s(y)\|_2^2 \lesssim \log\log n \cdot d(x, y)^2,$$
where the last inequality comes from Claim 5.5. Now,
$$\|\varphi(x) - \varphi(y)\| \le \sqrt{\sum_t \|\varphi_t(x) - \varphi_t(y)\|_2^2} \lesssim \sqrt{\log n \cdot \log\log n} \cdot d(x, y).$$

Lemma 5.8 For all $x, y \in X$ such that $\Pi^*(x, y) = G$, $\|\varphi(x) - \varphi(y)\| \gtrsim d(x, y) \cdot \frac{\sqrt{\log \Pi^*(x, y)}}{\max(D(x, y), \sqrt{\log \Pi^*(x, y)})}$.

Proof: Let $s$ be the scale such that $d(x, y) \approx 2^s$, and let $t$ be such that $\rho_{s,t}(x) = 1$. From Claim 5.6,
$\|\varphi_{s,t}(x) - \varphi_{s,t}(y)\| \gtrsim d(x, y)/\max(D(x, y), \sqrt{\log G})$ in this regime.
We want to argue that there are many choices of $t$ such that $\rho_{s,t}(x) = 1$. Let $I_x$ be the interval of $t$
for which $\rho_{s,t}(x) = 1$. Since the blocks $\varphi_{s,t}$ occupy disjoint coordinates of $\varphi$ (as in Observation 2.4),
$\|\varphi(x) - \varphi(y)\| \gtrsim \sqrt{|I_x|} \cdot d(x, y)/\max(D(x, y), \sqrt{\log G})$. In the remainder of the proof, we show that
$|I_x| \gtrsim \log \Pi^*(x, y)$, completing the proof.
By the definition of $\rho_{s,t}(x)$,
$$I_x = \left\{ t \in \mathbb{Z} \;\middle|\; R(x, t) \in \left[\frac{2^{s-2-c^*}}{\sqrt{\log G}},\ 2^{s+7} \cdot \log n\right] \right\} \supseteq \mathbb{Z} \cap \left[\log\left|B\left(x, \frac{2^{s-2-c^*}}{\sqrt{\log G}}\right)\right|,\ \log|B(x, 2^{s+7} \cdot \log n)|\right].$$
Thus,
$$|I_x| \ge \left\lfloor \log \frac{|B(x, 2^{s+7} \log n)|}{|B(x, 2^{s-2-c^*}/\sqrt{\log G})|} \right\rfloor \gtrsim \log G = \log \Pi^*(x, y),$$
where the last inequality follows from Observation 2.3 (which bounds the ratio from below by $\sqrt{G}$).
Theorem 5.1 now follows by normalizing the embedding so as to ensure that it is non-expanding.

6 CGR Technique

The goal of this section is to prove the stated version of Theorem 2.1. We begin by introducing the
notation used in [CGR05], and then adapt it to our setting.
Given a metric $(X, d)$, let $\mathcal{P}$ be a partition of $X$ into clusters. Given any point $x \in X$, we denote by $P_x \in \mathcal{P}$
the cluster containing $x$. Given a pair of points $x, y \in X$, we say that $\mathcal{P}$ separates $(x, y)$ iff $P_x \ne P_y$
and $d(y, X \setminus P_y) \ge d(x, y)/(2^{c^*-1} \cdot \sqrt{\log |X|})$ for some large enough constant $c^*$.

Theorem 6.1 (Theorem 4.1 in [CGR05]) Given a metric $(X, d)$ of negative type and a scale $m$,
there is an efficient algorithm to find a collection $\mathcal{Q}_m$ of partitions, such that for any pair $x, y \in X$
of points with $d(x, y) \approx 2^m$, a uniformly randomly selected partition $P \sim \mathcal{Q}_m$ separates $(x, y)$ with
constant probability.

Now we are ready to complete the proof of Theorem 2.1.

Proof of Theorem 2.1. For the given parameter $r$, we obtain the zero-set $W_r$ as follows (a Python sketch follows the proof):

1. Pick $m$ uniformly at random from $\{r-1, r, r+1\}$

2. Let $\mathcal{Q}_m$ be the collection of partitions returned by Theorem 6.1.

3. Let $P \sim \mathcal{Q}_m$ be a partition selected uniformly at random from $\mathcal{Q}_m$.

4. $W_r$ is the union of a sub-collection of the clusters of $P$, where each cluster is picked independently with probability $1/2$.

Now we show that $W_r$ is indeed a zero-set with the claimed properties. Fix any pair $x, y \in X$ of points
with $2^{r-1} \le d(x, y) < 2^{r+2}$. Notice that $d(x, y) \approx 2^m$ with probability $1/3$. Conditioned on this, from Theorem
6.1, $P$ separates $(x, y)$ with constant probability. Moreover, from our choice of the zero-set $W_r$, we have
$P_x \subseteq W_r$ and $P_y \cap W_r = \emptyset$ with probability $1/4$. Hence, with constant probability, $x \in W_r$ and
$d(y, W_r) \ge d(x, y)/(2^{c^*-1} \cdot \sqrt{\log |X|}) \ge 2^{r-c^*}/\sqrt{\log |X|}$. $\square$
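
A minimal sketch of this sampler (our own illustration; cgr_partitions(m) stands in for the collection $\mathcal{Q}_m$ of Theorem 6.1, given as a list of partitions, each a list of clusters):

    import random

    def sample_zero_set(r, cgr_partitions):
        m = random.choice([r - 1, r, r + 1])  # step 1
        Q_m = cgr_partitions(m)               # step 2 (black box)
        P = random.choice(Q_m)                # step 3: uniform partition
        W = set()
        for cluster in P:                     # step 4: keep each w.p. 1/2
            if random.random() < 0.5:
                W.update(cluster)
        return W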

7 FRT Technique

The goal of this section is to prove Theorem 4.3 by utilizing the FRT technique. First, we perturb the
distances so that no two pairwise distances are equal. For the given scale $s$, we obtain a partition $\mathcal{P}$
as follows (a Python sketch appears after the list):

1. Pick $R$ uniformly at random from $[2^{s-1}, 2^s)$

2. Pick a uniformly random permutation $\sigma$ of $X$

3. For each element $x \in X$, in the order given by $\sigma$: pick all not-yet-assigned elements within
distance $R$ of $x$, make them a new block, and add the block to $\mathcal{P}$

4. We denote by $P_x \in \mathcal{P}$ the block containing $x$
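
A direct transcription in Python (our own sketch; dist is the perturbed metric):

    import random

    def frt_partition(points, dist, s):
        R = random.uniform(2 ** (s - 1), 2 ** s)  # step 1: random radius
        order = list(points)
        random.shuffle(order)                      # step 2: random order
        unassigned, partition = set(points), []
        for x in order:                            # step 3: carve balls
            block = {z for z in unassigned if dist(x, z) <= R}
            if block:
                partition.append(block)
                unassigned -= block
        return partition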


Lemma 7.1 For every $\tau$ with $2^{s-1} \ge \tau \ge 0$, $\Pr[B(x, \tau) \not\subseteq P_x] \le \frac{\tau}{2^{s-3}} \cdot \log n$.

Proof: We say that a point $z$ potentially cuts the ball $B(x, \tau)$ iff $\emptyset \ne B(z, R) \cap B(x, \tau) \subsetneq B(x, \tau)$. By
definition, $\Pr_R[z \text{ potentially cuts } B(x, \tau)] \le 2\tau/2^{s-1} = \tau/2^{s-2}$.

We say that $z$ cuts the ball $B(x, \tau)$ iff $z$ potentially cuts $B(x, \tau)$ and it is the first point in $\sigma$ to do
so. Notice that $z$ cuts $B(x, \tau)$ only if $z$ is the first point among the points of $B(x, d(x, z))$ in $\sigma$. We
denote by $\mathcal{E}_z$ the event that $z$ appears first among the points of $B(x, d(x, z))$ in $\sigma$. Thus,
$$\Pr_{\sigma, R}[z \text{ cuts } B(x, \tau)] \le \Pr[\mathcal{E}_z] \cdot \Pr_R[z \text{ potentially cuts } B(x, \tau) \mid \mathcal{E}_z] \le \frac{1}{|B(x, d(x, z))|} \cdot \frac{\tau}{2^{s-2}}.$$
Applying the union bound,
$$\Pr[B(x, \tau) \not\subseteq P_x] = \Pr\left[\bigvee_z z \text{ cuts } B(x, \tau)\right] \le \sum_z \Pr[z \text{ cuts } B(x, \tau)] \le \sum_z \frac{1}{|B(x, d(x, z))|} \cdot \frac{\tau}{2^{s-2}} \le \frac{\tau}{2^{s-3}} \cdot \log n.$$
Here, the last inequality follows from the well-known inequality $\sum_{i=1}^{n} 1/i \le 2 \log n$ (order the points
$z$ by their distance from $x$, so that the $k$-th closest point has $|B(x, d(x, z))| \ge k$).
Notice that by our construction, the radius of each cluster is bounded by $2^s$, and hence its diameter is
bounded by $2^{s+1}$. Theorem 4.3 now follows by applying Lemma 7.1 with $\tau = 2^{s-4}/\log n$.

References
[ALN05] Sanjeev Arora, James R. Lee, and Assaf Naor. Euclidean distortion and the sparsest cut.
In Proceedings of the Thirty-seventh Annual ACM Symposium on Theory of Computing,
STOC ’05, pages 553–562, New York, NY, USA, 2005. ACM.
[ARV04] Sanjeev Arora, Satish Rao, and Umesh Vazirani. Expander flows, geometric embeddings
and graph partitioning. In Proceedings of the Thirty-sixth Annual ACM Symposium on
Theory of Computing, STOC ’04, pages 222–231, New York, NY, USA, 2004. ACM.
[Bou85] J. Bourgain. On Lipschitz embedding of finite metric spaces in Hilbert space. Israel Journal
of Mathematics, 52(1):46–52, March 1985.
[CGR05] Shuchi Chawla, Anupam Gupta, and Harald Räcke. Embeddings of negative-type met-
rics and an improved approximation to generalized sparsest cut. In Proceedings of the
Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’05, pages 102–
111, Philadelphia, PA, USA, 2005. Society for Industrial and Applied Mathematics.
[Enf70] Per Enflo. On the nonexistence of uniform homeomorphisms between $L_p$-spaces. Arkiv för
Matematik, 8(2):103–105, 1970.
[FRT04] Jittat Fakcharoenphol, Satish Rao, and Kunal Talwar. A tight bound on approximating
arbitrary metrics by tree metrics. Journal of Computer and System Sciences, 69(3):485–497,
2004. Special Issue on STOC 2003.
[KLMN04] R. Krauthgamer, J. R. Lee, M. Mendel, and A. Naor. Measured descent: a new embedding
method for finite metrics. In 45th Annual IEEE Symposium on Foundations of Computer
Science, pages 434–443, Oct 2004.

[KR06] Robert Krauthgamer and Yuval Rabani. Improved lower bounds for embeddings into $\ell_1$.
In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms,
SODA '06, pages 1010–1017, Philadelphia, PA, USA, 2006. Society for Industrial and
Applied Mathematics.

[Lee05] James R. Lee. On distance scales, embeddings, and efficient relaxations of the cut cone.
In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms,
SODA ’05, pages 92–101, Philadelphia, PA, USA, 2005. Society for Industrial and Applied
Mathematics.

[MN04] Manor Mendel and Assaf Naor. Euclidean quotients of finite metric spaces. Advances in
Mathematics, 189(2):451–494, 2004.

