E. R. Elenberg
Beyond Triangles
1/20
Introduction
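3-profile
The four possible induced subgraphs on 3 vertices:
- H0: no edges (empty)
- H1: one edge
- H2: two edges (wedge)
- H3: three edges (triangle)
Definition
Let $n_i$ be the number of $H_i$'s in a graph $G$. The vector
$n(G) = [n_0, n_1, n_2, n_3]$ is called the 3-profile of $G$.
- Always sums to $\binom{|V|}{3}$, the total number of 3-subgraphs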
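To make the definition concrete, here is a minimal brute-force sketch that counts the induced subgraph on every vertex triple. The function name `three_profile` and the edge-list input format are assumptions made for illustration; the loop is cubic in |V| and is only meant for small examples, not as the method used in the talk.

```python
from itertools import combinations
from math import comb


def three_profile(num_vertices, edges):
    """Return [n0, n1, n2, n3] by inspecting every vertex triple."""
    edge_set = {frozenset(e) for e in edges}
    profile = [0, 0, 0, 0]
    for triple in combinations(range(num_vertices), 3):
        # The number of edges among the three vertices (0..3) selects H0..H3.
        k = sum(frozenset(pair) in edge_set for pair in combinations(triple, 2))
        profile[k] += 1
    return profile


# Path 0-1-2 plus the isolated vertex 3.
example = three_profile(4, [(0, 1), (1, 2)])
print(example)                       # → [1, 2, 1, 0]
print(sum(example) == comb(4, 3))    # → True
```

The final check illustrates the remark that the entries always sum to the number of vertex triples.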
Examples
[Figure: examples of H0, H1, H2, and H3]
Related Terms
For each $v \in V$:
Definition
The local 3-profile counts how many times $v$ participates in each
$H_i$ with 2 other vertices.
Definition
The ego 3-profile is the 3-profile of the ego graph $N(v)$.
- Graph induced by the set of neighbors $\Gamma(v)$
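The two definitions can be contrasted with a small brute-force sketch. The function names and the edge-list input are hypothetical, and this version does not distinguish the position of v inside a wedge; it simply counts edges per triple.

```python
from itertools import combinations


def profile_of_triples(triples, edge_set):
    """Count H0..H3 over the given vertex triples."""
    counts = [0, 0, 0, 0]
    for triple in triples:
        k = sum(frozenset(pair) in edge_set for pair in combinations(triple, 2))
        counts[k] += 1
    return counts


def local_three_profile(v, num_vertices, edges):
    """Profile over all triples that contain v plus two other vertices."""
    edge_set = {frozenset(e) for e in edges}
    others = [u for u in range(num_vertices) if u != v]
    triples = ((v, a, b) for a, b in combinations(others, 2))
    return profile_of_triples(triples, edge_set)


def ego_three_profile(v, edges):
    """Profile of the graph induced by the neighbors of v (v itself excluded)."""
    edge_set = {frozenset(e) for e in edges}
    neighbors = sorted({u for e in edge_set if v in e for u in e if u != v})
    return profile_of_triples(combinations(neighbors, 3), edge_set)


edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
print(local_three_profile(0, 4, edges))  # → [0, 0, 2, 1]
print(ego_three_profile(0, edges))       # → [0, 1, 0, 0]
```

In the example, vertex 0 sits in one triangle and two wedges, while its three neighbors induce a single edge.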
Motivation
Introduction
large graph
Contributions
Related Work
Subgraph counting
Graphlets
Outline
Introduction
3-profile Sparsifier
Edge Sub-sampling Process
Concentration Bound
3-PROF Algorithm
Experiments
Conclusions
Edge sub-sampling: each edge is kept independently with probability $p$
[Figure: Markov chain from subgraph types in the Original graph to subgraph types in the Sub-sampled graph. The transition probabilities are the entries of the matrix below: column $j$ gives the probabilities that an $H_j$ becomes $H_0, \dots, H_3$.]

$$
\text{Estimator} =
\begin{bmatrix}
1 & 1-p & (1-p)^2 & (1-p)^3 \\
0 & p & 2p(1-p) & 3p(1-p)^2 \\
0 & 0 & p^2 & 3p^2(1-p) \\
0 & 0 & 0 & p^3
\end{bmatrix}^{-1}
\cdot \text{Sub-sampled}
$$
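As an illustration of the estimator, the sketch below builds the 4x4 transition matrix and recovers an estimate of the original 3-profile from a sub-sampled one. Because the matrix is upper triangular, back-substitution is enough and no linear algebra library is needed. The function names are hypothetical, and `p` is assumed to be strictly positive.

```python
import random


def transition_matrix(p):
    """Column j holds the probabilities that an H_j becomes H_0..H_3."""
    q = 1.0 - p
    return [
        [1.0, q, q * q, q ** 3],
        [0.0, p, 2 * p * q, 3 * p * q * q],
        [0.0, 0.0, p * p, 3 * p * p * q],
        [0.0, 0.0, 0.0, p ** 3],
    ]


def estimate_profile(subsampled_profile, p):
    """Solve M x = y by back-substitution (M is upper triangular, p > 0)."""
    m = transition_matrix(p)
    x = [0.0] * 4
    for i in range(3, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, 4))
        x[i] = (subsampled_profile[i] - tail) / m[i][i]
    return x


def subsample_edges(edges, p, rng):
    """Keep each edge independently with probability p."""
    return [e for e in edges if rng.random() < p]


# With p = 1 nothing is dropped and the estimate equals the input profile.
print(estimate_profile([1, 2, 3, 4], 1.0))  # → [1.0, 2.0, 3.0, 4.0]
```

A typical use would be to call `subsample_edges` once, compute the 3-profile of the smaller graph, and pass that profile to `estimate_profile` with the same `p`.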
Main Result
Theorem (3-profile sparsifiers)
For all $(\epsilon, p)$-balanced graphs, the $\ell_\infty$-norm of the 3-profile
sparsifier error is bounded by $\epsilon \binom{|V|}{3}$ with high probability.
Definition
A graph is $(\epsilon, p)$-balanced if the majority of triangles, wedges,
or single-edges do not depend on one common edge.
Proof Sketch:
- Apply multivariate polynomial concentration inequalities [Kim,
Vu 00] to each estimator
$f(G, p) = e_1 e_2 e_4 + e_4 e_5 e_6 + \dots$
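The polynomial in the proof sketch is only partly shown, so the following is a sketch under an assumption: each monomial is taken to be the product of the three 0/1 edge indicators of one triangle, which makes f the number of triangles that survive sub-sampling. Since the edges are kept independently, each monomial then has expectation $p^3$. All names below are hypothetical.

```python
import random
from itertools import combinations


def sample_indicators(edges, p, rng):
    """Draw one 0/1 indicator per edge; 1 means the edge is kept."""
    return {tuple(sorted(e)): int(rng.random() < p) for e in edges}


def triangle_polynomial(triangles, indicator):
    """Sum over triangles of the product of their three edge indicators."""
    total = 0
    for tri in triangles:
        term = 1
        for pair in combinations(sorted(tri), 2):
            term *= indicator[pair]
        total += term
    return total


# Complete graph on 4 vertices: 6 edges, 4 triangles.
k4_edges = list(combinations(range(4), 2))
k4_triangles = list(combinations(range(4), 3))

all_kept = {e: 1 for e in k4_edges}
print(triangle_polynomial(k4_triangles, all_kept))   # → 4

one_dropped = dict(all_kept)
one_dropped[(0, 1)] = 0  # removes the two triangles that use edge (0, 1)
print(triangle_polynomial(k4_triangles, one_dropped))  # → 2
```

Dropping a single edge changes every monomial that contains it, which is the kind of dependence on a common edge that the balance condition is meant to control.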
3-PROF
Vertex program in the Gather-Apply-Scatter framework
1. For each vertex $v$: Gather and Apply vertex IDs to store $\Gamma(v)$

$$
n_{3,v} = \frac{1}{2} \sum_{a \in \Gamma(v)} n_{3,va}, \qquad
n^c_{2,v} = \frac{1}{2} \sum_{a \in \Gamma(v)} n^c_{2,va}, \qquad \dots
$$
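The aggregation step can be sketched on a single machine as follows. This is not the GraphLab vertex program; it only mirrors the arithmetic of summing per-edge counts over the neighbors of v and halving. The per-edge quantities are assumptions here: $n_{3,va}$ is taken to be the number of common neighbors of v and a, and $n^c_{2,va}$ the number of neighbors of v (other than a) that are not adjacent to a. The remaining equations elided on the slide are not covered.

```python
def build_adjacency(edges):
    """Store the neighbor set of every vertex (the role of step 1)."""
    adjacency = {}
    for u, w in edges:
        adjacency.setdefault(u, set()).add(w)
        adjacency.setdefault(w, set()).add(u)
    return adjacency


def local_counts(adjacency):
    """Per-vertex triangle and centered-wedge counts from per-edge counts."""
    n3, n2c = {}, {}
    for v, nbrs in adjacency.items():
        tri_sum = 0
        wedge_sum = 0
        for a in nbrs:
            common = len(nbrs & adjacency[a])    # assumed n_{3,va}
            tri_sum += common
            wedge_sum += len(nbrs) - common - 1  # assumed n^c_{2,va}
        # Each triangle or centered wedge at v is seen from two neighbors.
        n3[v] = tri_sum // 2
        n2c[v] = wedge_sum // 2
    return n3, n2c


adjacency = build_adjacency([(0, 1), (0, 2), (0, 3), (1, 2)])
n3, n2c = local_counts(adjacency)
print(n3[0], n2c[0])  # → 1 2
```

In the example, vertex 0 belongs to the triangle {0, 1, 2} and is the center of the wedges 1-0-3 and 2-0-3.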
Implementation
GraphLab PowerGraph v2.2
Multicore server
- 256 GB RAM, 72 logical cores
EC2 cluster (Amazon Web Services)
- 20 c3.8xlarge, 60 GB RAM, 32 logical cores each
Datasets
Name          Vertices      Edges (undirected)
Twitter       41,652,230    1,202,513,046
PLD           39,497,204      582,567,291
LiveJournal    4,846,609       42,851,237
Wikipedia      3,515,067       42,375,912
DBLP             317,080        1,049,866
[Figure: Accuracy [exact/approx] for p = 0.7, 0.4, 0.1, 0.01 (legend: edge, empty)]
[Figure: Trian on PLD]
[Figure: Ego-par on 12 nodes for 100, 1K, and 10K egos, log-scale axis, with annotations ">10000 sec" and ">1000 sec"]
[Figure: Ego-par on 12, 16, and 20 nodes for 10k egos]
Summary
[Equation residue: terms $n_{3,va}$ over $a \in \Gamma(v)$, $F_2(v)$, and $3F_3(v)$]
[Figure: Accuracy [exact/approx] for triangles, wedges, edge, and empty at p = 0.7, 0.5, 0.3, 0.1]
[Figure, two panels: 3-prof (p = 1, 0.5, 0.1) and Trian (p = 1, 0.5, 0.1) on 12, 16, and 20 nodes]
[Figure, two panels with axis scales 10^10 and 10^11: 3-prof (p = 0.1) and Trian (p = 1, 0.5, 0.1) on 12, 16, and 20 nodes]
[Figure, panels EGO-SER and EGO-PAR: Ego-ser on 16 nodes and Ego-par on 12, 16, and 20 nodes for 100 egos]