
P2P and multimedia applications over the Internet

Notes on the course

Fiandrino Claudio

July 4, 2011

Contents

1 P2P systems
1.1 Introduction
1.2 Definition
1.3 Time evolution of applications
1.4 Issues
1.4.1 General Issues
1.4.2 Issues for ISP
1.4.3 Issues for Users
1.5 Overlay network
1.6 Family of systems
1.7 Napster
1.8 Gnutella
1.8.1 Analysis
1.8.2 Messages
1.8.3 Characteristics
1.8.4 Performance evaluation
1.9 Chord
1.9.1 Analysis
1.9.2 Example
1.9.3 Issues
1.9.4 Load balance
1.9.5 Comparison between Chord and Gnutella
1.10 CAN
1.10.1 Routing
1.10.2 Join
1.10.3 Performances
1.10.4 Leaving of a node and failures
1.11 Tapestry
1.12 BitTorrent
1.12.1 Analysis
1.12.2 Policies
1.12.3 Case study: Flash Crowd
1.13 Skype
1.14 P2P Streaming systems
1.14.1 Tree-based systems
1.14.2 Mesh-based systems

2 Random graphs
2.1 Introduction and definitions
2.2 Erdős-Rényi Model
2.2.1 Average degree
2.2.2 Degree distribution
2.3 Bender-Canfield Model
2.3.1 Node reachability
2.3.2 Small-world effect
2.3.3 Clustering
2.4 Heavy-Tailed Distribution
2.5 Watts-Strogatz model
2.5.1 Clustering analysis
2.5.2 Small-world analysis
2.6 Theory of evolving networks
2.7 Resume scheme
Chapter 1
P2P systems
1.1 Introduction
From the P2P analysis point of view, the Internet is a structure already defined
and perfectly working: only users are taken into account, and they are called
hosts or peers. Hosts communicate thanks to the Internet, which can be seen
as the transport medium that carries data; therefore the analysis focuses on
layers 4 and 7 of the OSI stack. Indeed, it is necessary to have a knowledge
of the transport layer to understand and predict the behavior of the network,
but it is also necessary to know what kind of features users may require from
the application layer, since they operate with applications.
1.2 Definition
P2P (peer-to-peer) systems are systems in which users both receive and provide
part of the service. This is a general definition, and indeed the concept of
service still has to be specified. The important thing is that hosts also contribute
to service provisioning: it means that the service is distributed and not cen-
tralized like a web browsing application. Depending on the type of service,
users provide different things using their resources.
Sharable resources
In this section the attention will focus on the kinds of sharable resources.
A first type are content resources: users share content that they have on
their machines. If there are no other users with that content, the quality of
service will be very bad while, if a lot of hosts share the same content, the
service will be excellent. An example of application is Napster, where the
content is music. Types of content may indeed be various; grouping them,
it is possible to introduce the following classification:
. file sharing;
. directories.
File sharing groups a lot of possible contents: music, games, videos, films,
e-books. Directories are typically parts of a distributed database that, once
received, is redistributed so that anyone can access that part (Skype).
Another possible sharable resource is CPU: in this context the compu-
tational power is shared. For example, if an application requires a very huge
computational capacity not owned by a single machine, it can be distributed
among Internet hosts to use their computational power to process a single
part of the application (e.g. applications that search for new forms of life
and require shared power for signal processing).
The last possible shareable resource is bandwidth: an example is the case
in which a host owns a very popular film requested by a lot of other peers;
if it has to distribute it to everyone, a very large bandwidth is required at
the access link. Perhaps it is better if he distributes parts of that film to
other users that in turn redistribute them: in this way the aggregate bandwidth
actually exploited is greater. Examples of applications are BitTorrent, P2P TV and gaming.
1.3 Time evolution of applications
At the beginning, the Internet was in a certain sense peer-to-peer: flat topology,
distributed features and protocols. Growing up, it moved to the client-
server paradigm, in which someone provides some service requested by other
users: web browsing is a typical client-server application. ISPs developed
applications in that sense and that choice implied having asymmetric access:
upload and download treated separately, typically assigning much more
bandwidth to download (ADSL). Indeed, usually there is one server with several
clients.
With the development of peer-to-peer applications the situation changed
in a fairly symmetric way, and now there is no strict division between download
and upload bandwidth because, if peers have to redistribute contents, they
need an application able to exploit in particular the upload bandwidth.
Moreover, with the technological evolution of devices, the much greater com-
putational power has made it possible to push some tasks down from the core
network to the edges.
1.4 Issues
1.4.1 General Issues
Peer-to-peer systems suffer from critical issues. One is churning, the high
variability in time of the system. Indeed, hosts can freely join or leave, so
the quantity of available content changes very frequently. For example, for
P2P TV, resources have to be balanced between the quantity that a peer can
redistribute and the quantity that he needs.
Furthermore, a perfect knowledge of participants is required, such as
their IP address which, due to churning, can change over time. This knowledge
is not strictly necessary in other applications.
If a peer is hidden behind a NAT or a firewall, further information is
required, in particular the public IP address of the NAT. The reason is that
NATs were developed for a client-server kind of application. Firewalls, in-
stead, can deny the access of a machine to the P2P application.
Every P2P system has to deal with the join issue: when users want to join
the network, they require some information like the address of the first neighbor.
If, in a certain moment, there are no peers in the network, the service can
not be provided. In order to join it is possible to:
. access a web page which contains a list of peers that are active or recently
active: the new peer contacts them as soon as he finds one up;
. connect to some server that is always on.
These mechanisms are centralized techniques: an application that uses them
is BitTorrent.
1.4.2 Issues for ISP
ISPs have to cope with the following troubles:
. traffic engineering: to improve the service, having in mind the goal
of satisfying user requirements, ISPs can balance traffic (symmetric or
asymmetric access means different amounts of traffic in the network);
. capacity problems: many applications generate a lot of traffic and ISPs,
when they exchange traffic with other ISPs, have to respect the cost policies stip-
ulated; moreover, the quantity of traffic can be huge because ap-
plications do not care about the physical topology, so being neighbors
in the peer network does not imply belonging to the same ISP: the
consequence is that, in general, ISPs are crossed many times;
. competitive services: ISPs can have their own telephony company which
provides a non-free service; of course they also carry data traffic and, if that
traffic is Skype traffic, which is a free VoIP service, they may penalize
it since it is a competitor.
1.4.3 Issues for Users
Considering users, they have to deal with:
. legal issues: some services, for example file sharing, may incur this
issue because contents are distributed violating copyright;
. security and privacy issues: some applications may be malicious
and exchange potentially risky traffic (viruses, malware, spyware).
1.5 Overlay network
The layer 7 network that connects peers is called the overlay network. The
overlay network is completely independent from the physical network and
can be fully meshed or not (if peers do not know all the other peers,
but only have a partial view of the topology). The picture below reports
an example.
[Figure: example of an overlay network whose peers are spread over ISP 1, ISP 2 and ISP 3.]
Links are logical, of course, and two peers connected by a link of the
overlay network are neighbors; they may belong to different ISPs, which means
that physically they can be located very far away. Links can be created
in different ways, with direct TCP connections for example, or with UDP
connections plus some further information.
The overlay network is used to implement functions, different from appli-
cation to application, and it is possible to have more than one overlay network
nested together. Some examples are:
Gnutella: queries for files use the overlay network; files are retrieved through a direct TCP connection.
BitTorrent: files are retrieved through the overlay network.
1.6 Family of systems
According to the following classication, it is possible to distinguish:
. unstructured P2P systems: they are systems in which the topology is
not regular, but a random graph (neighbors are randomly chosen); an
example is Gnutella;
. structured P2P systems: in these systems the topology is regular; an
example is Chord;
. hierarchical P2P systems: a hierarchy is created among peers, distin-
guishing high priority peers (super peers) and ordinary peers; super
peers are connected together in a structured way while ordinary peers
are connected with unstructured topology; an example is Skype.
1.7 Napster
Napster can be considered the first P2P system, developed by Shawn
Fanning with Sean Parker and released in 1999. Actually it was not
really a P2P system, since users were not connected together (they had to join
servers), but it has some peculiar characteristics of P2P systems. Those
servers contained, in a database, lists of the shareable contents that users had
on their PCs. The architecture was something like a star whose central nodes
were the servers; it is briefly shown in the following picture.
[Figure: Napster star architecture, with the server and its database at the centre and the users around it.]
Properties
. Information that users declare: ID, IP address, port number, list
of sharable contents.
. Fundamental function: query for a given content.
How it worked
When a user wanted to retrieve some content like a song, he sent his request to
the server; at that point the server looked for the content in the database
to know who held it. If someone had it, it returned to the initial user all the
information regarding the user that had the content: in this way the two
hosts could exchange the content using a direct connection.
1.8 Gnutella
Gnutella is not an application or a system, but a protocol that other
applications implement (for example Shareaza, Bearshare, LimeWire). The
topology is unstructured and there is no distinction among peers: it is server-
less. Moreover, each node can request or distribute contents: this kind of
peer is called a servent.
It is assumed that users share contents stored on their PCs, so they
first have to declare to the network the knowledge of their contents. The pur-
pose of Gnutella is to make queries in a smart way. A query, to discover the
requested file, has to search over the lists of contents held by peers; such a search
is realized thanks to flooding: the initial node can send the request only to
its neighbors, they forward it to their neighbors and so on. It implies that
each node does not have a global view of the network.
1.8.1 Analysis
To analyze a P2P protocol, the attention has to be focused on the following
aspects:
. how users join;
. maintenance: a fundamental task to deal with churning;
. search: discovering some content in the network (a typical task for file
sharing applications);
. download: when a search succeeds, how the file is downloaded.
Joining
The protocol does not specify a procedure: usually on a web page there is
a list of peers that are active or were recently seen active. The new user has to connect
to that page and download that list; then he simply has to try to contact
the users present on the list until he is able to find one of them active:
at that point he can open a connection and wait for the acknowledge. To
be contacted, each peer has to declare its ID, its IP address and its port
number.
The graphical explanation is reported below.
Step 1: contact the web page.
Step 2: download the list of peers.
Step 3: contact a peer.
Step 4: wait for the acknowledge.
Step 5: the new user is a peer.
Steps 1-4 are called signalling procedure: after that the new user becomes a
peer and, at the beginning, he has just one neighbour (the peer contacted by
means of the web page); in Gnutella, two peers are neighbors when they have
established a TCP connection (at that time using TCP was very peculiar).
Since it is possible to contact each peer present in the list, the topology is
randomly created.
Maintenance
When a peer is connected, he has to discover other neighbors to have a good
connectivity; indeed, if the only neighbor that he has switches off, he is
no longer connected to the network. The goals of the maintenance mechanism
are:
. guarantee a good connectivity;
. give the possibility of changing neighbors (in order to discover peers with
more contents).
The second feature implies that the overlay changes a lot over time, due to this
fact and to churning.
To reach the two purposes, the following mechanism is provided:
. from time to time a ping message is sent to check if neighbors are alive;
. when a ping message is received:
. with a pong message a neighbor signals that it is alive;
. the peer forwards the ping to all its neighbors (they will answer
with a pong just to the peer that forwarded the ping, not to
the initial sender);
. when a pong message is received, it is forwarded to the peers that previously
sent a ping.
This mechanism is called flooding or discovery method because the new peer,
thanks to pings and pongs, can discover new neighbors.
The algorithm stops thanks to the TTL field of both messages; it allows to:
. avoid messages that run forever in the network;
. discover a part of the topology, not the complete knowledge of the
network.
Since each message has an almost unique identifier (it is selected ran-
domly from a large set, so the probability of having two messages with the
same ID is negligible), a peer does not have to forward a message (either ping
or pong) if it has already received it; this choice has been taken:
. to avoid useless propagation of messages;
. to have a small cache in which to store messages (possible only if useless
messages are not propagated).
The mechanism does not specify the policy with which a new peer operates,
once it has discovered new peers with pongs: contact all of them, just a part
chosen randomly, or a part chosen following some criterion.
Search mechanism
The search method is implemented with flooding, like the maintenance mech-
anism. When a peer wants to search for a given file, it has to send a query
message to its neighbors; the message contains all the fundamental information
on the file. Nodes that receive the query check if they have that content:
. if not, they have to forward the message to their neighbors (as before,
the message has a unique ID, so if a peer receives it more than once, it just
ignores the message);
. if yes, they have to answer with a query hit message.
The node that has the content does not forward the query message any further;
notice that the query hit uses the reverse path to reach the initial node.
The reverse path is exactly the path followed by the query message and it
is extremely important since, as mentioned, each node does not have a
global view of the topology.
Download
When a query succeeds, the initial requester has to download the file; it is
able to do it, since the query hit message contains all the information on the
node that holds the content. In particular, the peculiar features are:
. IP address;
. peer ID;
. port number.
The download uses the HTTP protocol and it happens directly between the
requester and the peer that holds the file: it means that contents are not
distributed over the overlay network, just queries are.
1.8.2 Messages
Messages, or descriptors, are used to implement the functions mentioned before,
like maintenance and search. They are composed of a header (common to all
messages) and a payload (different from function to function): the header
occupies bytes 0-22, while the payload starts at byte 23 and has variable length.
Header
The header is composed of the following fields: Descriptor ID (bytes 0-15),
PT (byte 16), TTL (byte 17), Hops (byte 18) and Length (bytes 19-22), where:
. descriptor ID is the unique identifier;
. PT is the payload type;
. TTL is a counter decremented at each hop crossed;
. Hops is a counter incremented at each hop crossed;
. Length is the field that specifies the length of the payload (since the payload
is variable, its length is not known a priori).
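As a minimal sketch (assuming the little-endian byte order of the classic Gnutella 0.4 specification; the function name is only illustrative), the 23-byte header can be parsed as follows:

import struct

def parse_gnutella_header(data: bytes) -> dict:
    # bytes 0-15: descriptor ID, 16: payload type, 17: TTL, 18: hops,
    # 19-22: payload length (little-endian unsigned integer)
    if len(data) < 23:
        raise ValueError("a Gnutella header is 23 bytes long")
    (length,) = struct.unpack("<I", data[19:23])
    return {"descriptor_id": data[0:16], "payload_type": data[16],
            "ttl": data[17], "hops": data[18], "payload_length": length}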
Payload
Ping This message has no payload.
Pong
Fields: port number (bytes 0-1), IP address (bytes 2-5), number of shared files (bytes 6-9), number of shared kilobytes (bytes 10-13).
The last two fields represent the sharing capability of the node (in number of
files and KB): this information helps to decide which peer it is convenient to
be connected to.
Query
Fields: minimum speed (bytes 0-1), search criterion (from byte 2, variable length), where:
. minimum speed is the rate at which the peer wants to receive the file
(measured in kbit/s);
. search criterion is the field that contains the information used to search for the
content; since the protocol says nothing about it, each application can specify
its own policy, and this is a good choice because the more general the
search criterion is, the easier the search will be.
Query Hit
Fields: number of hits (byte 0), port (bytes 1-2), IP address (bytes 3-6), speed (bytes 7-10), result set (bytes 11 to N), servent ID (bytes N to N+16), where:
. the number of hits field represents how many contents satisfy the search;
. speed represents the minimum speed (see the query message);
. the result set contains, for each hit: file index (bytes 0-3), file size (bytes 4-7) and file name (variable length).
Push If the node that holds the file is behind a firewall, the requester
servent is not able to contact it: in this situation it sends a push message
to its neighbors. Once the message reaches the final node (always by flooding),
the connection between the two servents is opened by that peer and not by
the requester. A push message is composed of: servent ID (bytes 0-15),
file index (bytes 16-19), IP address (bytes 20-23), port number (bytes 24-25).
1.8.3 Characteristics
Network aspects
From the network point of view, the main characteristics to keep in mind are:
. scalability with the number of peers: the system scales very well be-
cause it is completely distributed;
. robustness with respect to churning/failures: the system is very robust
both to churning and failures because the maintenance is realized with
flooding and the connectivity is very high.
Users aspects
From the users' point of view, the main characteristic they are interested in
is the efficiency, or response time. It depends on the popularity
of the content:
. if it is very or quite popular, the hit will probably happen before the
TTL goes to 0;
. if it is not popular, finding the content before the TTL goes to 0 is not
guaranteed.
In the first case the efficiency is guaranteed, while in the second case it is not.
Costs
Since there is a lot of traffic to deal with, from the network point of view
the protocol is extremely costly: this is the main drawback of Gnutella.
Considering users, the algorithm is simple and, in terms of resources
consumed, it is cheap, since the storage capacity devoted to the protocol is
small. The only things to manage are:
. the neighbors;
. the cache.
1.8.4 Performance evaluation
To evaluate Gnutella performance, the analysis focuses on the flooding procedure.
Each arrow color represents a different step of the procedure; the result is a sort
of tree:
[Figure: flooding tree rooted at A: A contacts B, C and D; B contacts E and F; C contacts G and H; D contacts I and L.]
To perform some analysis, first the parameters have to be declared; they are:
. ν is the number of neighbors of each peer (in the previous picture
ν = 3: for example, A can contact B, C and D while C can contact G,
H and A); it is assumed constant;
. H is the number of hops: it represents the depth (number of levels) of the
tree;
. N is the number of peers;
. T is the average time to contact a peer; it is a random variable de-
pending on:
. layers 3-4;
. the physical distance;
. the number of routers crossed;
. possible congestion in the network;
. p represents the popularity of the file: it is the probability that a given peer
holds that content.
Number of contacted peers
Since ν is assumed to be constant, at each level of the tree each node
can contact exactly ν other nodes; to obtain an approximation of the number
of contacted peers c, the following assumptions are taken:
. common neighbors are neglected, therefore each node contacts (ν − 1)
peers (all the sons of the tree apart from the father);
. the value (ν − 1) is approximated with ν.
In conclusion, at each step the number grows by a factor ν:

c = ν + ν^2 + ν^3 + ... + ν^H = Σ_{i=1}^{H} ν^i

Example Taking realistic values for H and ν it is possible to determine a realistic
value of c:

ν = 4, H = 7  ⟹  c ≈ 22k

If the message is a ping, peers will answer with a pong; therefore for each
ping, in a scenario like the preceding one, about 2c ≈ 44k messages are exchanged.
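As a quick check of these numbers, the geometric sum can be evaluated directly (a small sketch; the names nu and contacted_peers are mine):

def contacted_peers(nu: int, hops: int) -> int:
    # c = nu + nu^2 + ... + nu^hops, with common neighbors neglected
    return sum(nu ** i for i in range(1, hops + 1))

c = contacted_peers(4, 7)
print(c)       # 21844, i.e. roughly 22k contacted peers
print(2 * c)   # about 44k messages if every ping is answered by a pong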
Time needed to contact peers
To compute it, first an assumption has to be taken: at each level of the tree,
the time to contact a peer (from the father node to its sons) is fixed and equal to T.
Implicitly, it means that the time required to send the messages sequentially is
considered negligible with respect to the time needed to reach the neighbors.
Under that assumption, considering each level of the tree independent,
the propagations occur in parallel and so:

Avg time = H · T

In a time (H · T), ν^H nodes are reached.

Example Considering:

H = 7, T = 200 ms  ⟹  Avg time = 0.2 · 7 = 1.4 s

Therefore it is possible to say that the response from a huge number
of peers is received quite quickly.

Probability of not finding a content
This is an inefficiency of the system perceived by users. In general, the
number of copies of a given content with popularity p is (N · p). It means
that each peer has an independent probability p of having that content.
Considering c the number of contacted peers, the probability of not find-
ing the content is:

P(not find) = (1 − p)^c

Choosing a target F under which P(not find) must be kept:

P(not find) < F  ⟹  (1 − p)^c < F

Taking the logarithm:

c · log(1 − p) < log F  ⟹  c > log F / log(1 − p)
Example Considering ν = 4:

Value of H   Value of c
1            4
2            20
3            84
4            340
5            1360
6            5460
7            21844

Maintaining ν = 4 and considering F = 0.01:

p = 0.05 (5%)  ⟹  c > 90, take H = 4
p = 0.01 (1%)  ⟹  c > 458, take H = 5
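A small sketch of this sizing rule (function and variable names are mine): given the target F and the popularity p, compute the minimum number of peers to contact and the corresponding depth H for ν = 4.

import math

def min_contacted(F: float, p: float) -> int:
    # smallest c such that (1 - p)^c < F
    return math.floor(math.log(F) / math.log(1 - p)) + 1

def min_depth(c_required: int, nu: int) -> int:
    # smallest H such that nu + nu^2 + ... + nu^H >= c_required
    total, H = 0, 0
    while total < c_required:
        H += 1
        total += nu ** H
    return H

for p in (0.05, 0.01):
    c_req = min_contacted(0.01, p)
    print(p, c_req, min_depth(c_req, 4))
# prints approximately: p = 0.05 -> c = 90, H = 4; p = 0.01 -> c = 459, H = 5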
Performance
Performance principally means the average number of hops required before having
the first hit. For example:

P(1) = P(find the file at the first hop) = 1 − (1 − p)^ν

Proceeding:

P(2) = (1 − P(1)) · [1 − (1 − p)^(ν^2)]
P(3) = (1 − P(1)) · (1 − P(2)) · [1 − (1 − p)^(ν^3)]

The average number of hops is Σ_i i · P(i); hence the average time to send a request is:

(Σ_i i · P(i)) · T

while the average time to receive an answer is:

(Σ_i i · P(i)) · 2T
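The following sketch (names are mine) evaluates these expressions numerically, truncating the sum at the TTL, for ν = 4, p = 0.01 and T = 200 ms:

def avg_hops(nu: int, p: float, max_hops: int) -> float:
    # per-level hit probabilities P(i) as defined above, truncated at max_hops
    probs = []
    for i in range(1, max_hops + 1):
        miss_before = 1.0
        for q in probs:
            miss_before *= (1 - q)
        probs.append(miss_before * (1 - (1 - p) ** (nu ** i)))
    return sum(i * q for i, q in enumerate(probs, start=1))

T = 0.2  # seconds per hop
h = avg_hops(4, 0.01, 7)
print(h, h * T, h * 2 * T)  # average hops, time to send the request, time to get the answer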
1.9 Chord
Chord is a structured system (on the overlay), which implies that churning
is a big issue since the topology is fixed. So the choice of the topology is
very relevant: it cannot be a star because in a P2P system in general there
are no role distinctions like the one introduced by the star topology with
the central node. Moreover, other regular structured topologies are not so
good since they introduce a concept of priority based on the geographical
position. The topology actually used is a ring.
The attention must be focused on the P2P technology, so the application
layer and the network layer are not considered; using a diagram, the stack
would be:
Application
P2P Technology
Layer 3/4
The P2P technology concerns features like overlay creation and maintenance,
the join operation and the management of messages.
Chord is similar to Gnutella since it is a protocol, but it distributes the
information about contents and not the request for a given file. For example,
it is possible that the peer that knows where a certain content is located is
not its holder: the two aspects are completely separated.
1.9.1 Analysis
A regular structure like the ring gives, implicitly, a knowledge about the
distance between nodes. This fact is very useful to help the join operation:
a new peer that wants to be connected just has to know in which position
he should be placed. The distance knowledge is not provided physically:
it would be too complex to manage. Moreover, it would introduce some differences
from one peer to another: if the application that runs this protocol becomes
very popular in a given country, nodes belonging to that country would be
placed physically near each other with respect to a node belonging to another country.
The density would be different.
On the contrary, supposing to have a knowledge of distance at the over-
lay level allows considering peers physically located very far away as neighbors.
The way in which nodes are placed on the ring is to apply a function T
to a list of information about the peer: the outcome is deterministic and
uniformly distributed over an interval. This outcome is a number of m bits,
so the identifier space of the ring is the interval [0, 2^m − 1], containing
2^m positions.
Peer Info → (T) → Node ID
The function T is realized thanks to cryptography (SHA-1):
. because it makes it difficult to obtain the peer information list from the Node ID;
. it allows mapping a lot of information into a uniformly distributed space,
avoiding proximity among peers;
. although the mapping is random over the interval [0, 2^m − 1], the func-
tion is deterministic, so receiving two identical inputs it will provide
the same output (collisions are possible).
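A minimal sketch of this mapping (using SHA-1 from Python's standard hashlib; the choice of m and of the peer-information string is only illustrative):

import hashlib

def node_id(peer_info: str, m: int = 6) -> int:
    # deterministic, roughly uniform mapping of peer information onto [0, 2^m - 1]
    digest = hashlib.sha1(peer_info.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

print(node_id("192.0.2.7:6346"))  # the same input always gives the same position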
The Node ID represents the final position of the peer on the ring; thanks
to that topology, each peer has just two neighbors, called predecessor (i − 1
in the following picture) and successor (i + 1 in the picture); therefore it is
possible to define the neighbors as the closest active peers of the considered node
(i).
[Figure: identifier ring from 0 to 2^m − 1 with node i, its predecessor i − 1 and its successor i + 1.]
Join
Up to now, the join operation can occur with the following steps:
. the new node applies the function T to his peer information list, ob-
taining as a result his own position (N7);
. he should know another peer and contact it (N24);
. this peer contacts his successor and so on, until the right position of the
new node is reached;
. when the successor and predecessor of the new node are found, the con-
nection is established and the node becomes a peer.
Graphically:
[Figure: the new node N7 contacts N24, the request travels along the ring, and N7 is finally inserted between its predecessor and successor.]
How information is distributed
Unlike Gnutella, in Chord the information of where contents are located is
distributed among peers. Each peer knows that information thanks to keys,
which are generated by applying a function G to metadata (data that describe
synthetically the content). Graphically:

Metadata → (G) → Key

Keys are values generated with the same properties of Node IDs, there-
fore they are uniformly distributed in the same interval [0, 2^m − 1]. An im-
portant thing to remark is that T and G, starting from different inputs
(peer information list and metadata), are both able to map different kinds of
outputs (Node IDs and keys) into the same interval.
To associate keys to Node IDs, the rule used is to assign a key to the
nearest peer succeeding the key value.
Queries
When a node N wants to retrieve a content, it runs the function G over the
metadata, obtaining the key. Since it knows only its neighbors, it forwards
the query to them, and each of them redistributes it in turn. In this way, sooner or
later, the peer that holds the key searched by N is found.
If there are n peers globally, the expected number of steps to find the one with the right
key is n/2. This holds just because both keys and Node IDs are
uniformly distributed. Therefore the order of complexity is quite high with
respect to Gnutella, but Chord guarantees that the content is surely found
(in Gnutella it depends).
Shortcuts The query process has been improved by using shortcuts: in
practice each node does not have just the knowledge about his neighbors,
but knows the location of more peers. Those peers are not chosen randomly,
but with a specific rule: each time, the space in which a file may possibly be
searched must be divided in two parts.
The principal advantage of using shortcuts is that the search, instead of
being linear (complexity n), becomes dichotomic and therefore the complex-
ity is log n. The main drawback is that a sort of routing table is required:
in Chord it is called finger table. For a given node N, it has m entries and it
is built as:
Index   Value         Successor
1       N + 2^0       successor(N + 1)
2       N + 2^1       successor(N + 2)
3       N + 2^2       successor(N + 4)
...
i       N + 2^(i-1)   successor(N + 2^(i-1))
...
m       N + 2^(m-1)   successor(N + 2^(m-1))
The value of m is critical: if it is large, the probability of having conflicts
(the same output value obtained by applying the function to different inputs) is negligible;
on the other side, high values of m imply:
. a large number of bits used;
. a long finger table.
1.9.2 Example
Given the following picture with m = 6, so that the ring has 2^6 = 64 positions:
[Figure: Chord ring with nodes N4, N8, N14, N21, N32, N39, N42, N48, N51, N56; keys K10 (stored at N14), K24 and K30 (at N32), K38 (at N39) and K54 (at N56).]
consider the case in which N8 is looking for K54. The finger table of N8 is:
Index Value Successor
1 8+1=9 N14
2 8+2=10 N14
3 8+4=12 N14
4 8+8=16 N21
5 8+16=24 N32
6 8+32=40 N42
In this case the query is forwarded to N42, which is the nearest peer; the
finger table of N42 is:
Index Value Successor
1 42+1=43 N48
2 42+2=44 N48
3 42+4=46 N48
4 42+8=50 N51
5 42+16=58 N4
6 42+32=74 (mod 64 = 10) N14
At this moment, the nearest peer is N51; its finger table is:
Index Value Successor
1 51+1=52 N56
2 51+2=53 N56
3 51+4=55 N56
4 51+8=59 N4
5 51+16=67 (mod 64 = 3) N4
6 51+32=83 (mod 64 = 19) N21
Since key 54 lies between the values 53 and 55, the peer selected is N56: in
three hops the key is found.
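As a cross-check of the tables above, a small sketch (a simplified model with a plain successor function, not the real Chord protocol messages; names are mine) rebuilds the finger table of N8 for the ring of the example (m = 6):

from bisect import bisect_left

def successor(nodes, value, m):
    # first active node met going clockwise from `value` on the 2^m ring
    nodes = sorted(nodes)
    i = bisect_left(nodes, value % (2 ** m))
    return nodes[i % len(nodes)]

def finger_table(n, nodes, m):
    # entries (index, value, successor) as in the table of the previous section
    return [(i + 1, (n + 2 ** i) % (2 ** m), successor(nodes, n + 2 ** i, m))
            for i in range(m)]

nodes = [4, 8, 14, 21, 32, 39, 42, 48, 51, 56]
for index, value, succ in finger_table(8, nodes, 6):
    print(index, value, "N%d" % succ)
# 1 9 N14, 2 10 N14, 3 12 N14, 4 16 N21, 5 24 N32, 6 40 N42, as above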
Join procedure with shortcuts
If a new node wants to connect to the P2P application, it runs the function
T to discover its Node ID: assume it is N26. In the example, it has to be
placed between N21 and N32. If, for example, it contacts N4 to discover its
successor and predecessor, this search is made thanks to shortcuts, exactly like
a query: first the successor of N26 is found, and then, contacting N32, it is
possible to discover N21, which will be the predecessor of N26 (at the moment
it is the predecessor of N32). After this preliminary step, all finger tables
have to be updated.
Procedure
1. ask some nodes to retrieve the successor(n) and the predecessor(n);
2. create the finger table of n and update the finger tables of the other nodes;
the update operation is very complex;
3. redistribution of keys.
1.9.3 Issues
A possible problem of consistency takes place when finger tables are up-
dated: for example, if a node is searching for a key held by a given node N, but
the finger tables that point to N are not updated, the content will not be found.
Another issue is the failure of a peer. When a peer disappears due to a simple
switch off, notifications are sent to the other nodes, but if a node fails,
how are the notifications sent?
To avoid some of those issues, it is possible to introduce some redundancy:
each node maintains a list of several successors and not only the knowledge of
one predecessor and one successor. If, for some reason, the immediate successor
fails, the considered node contacts one of the other successors.
Stabilization procedure
It is run periodically: each peer n asks its successor n + 1 who its
predecessor is; if the answer is n, then peer n is actually the
predecessor of n + 1. Otherwise, if the answer is some other peer p, two possible anomalies
take place:
1. in the case p > n (p lies between n and n + 1 on the ring): the information
held by n is wrong and node n has to update its finger table, since its own
successor is p and not n + 1;
2. in the case p < n (n lies between p and n + 1 on the ring): the information
held by n + 1 is wrong and node n + 1 has to update its finger table, since
its own predecessor is n and not p.
1.9.4 Load balance
The amount of work that each peer has to deal with depends on how keys are
associated to nodes. Let A and B be two adjacent nodes on the ring, with B the
successor of A, and let x be the fraction of the ring between them:

x = (B − A) / 2^m

This parameter x is simply the fraction of the ring that peer B is in
charge of; the larger x is, the larger the number of keys assigned to B can be, so
that node has to deal with a larger amount of work. In other words, it is
also possible to say that x is the probability that B is storing a given key:
since keys are uniformly distributed over the space (normalized to [0, 1]), the
probability of holding a key is proportional to the portion of space
that a node is in charge of.
Assuming that there are k keys in the system, the probability that B is
not in charge of any key is:

P(B has no keys) = (1 − x)^k

while the probability that B has exactly i keys is:

P(B has i keys) = C(k, i) · x^i · (1 − x)^(k − i)
The distribution of that probability is something like:

[Figure: distribution of the number of keys held by a peer, with a region 1 on the left and a region 2 on the right of the peak.]

Region 1 represents nodes that hold few keys, while region 2 describes
peers with a huge amount of work to deal with; since the distribution is
symmetric with a low variance, the load is assigned quite fairly to nodes.
The mean number of keys stored in peer B is:

E[# keys] = k · x

and, if there are N active peers, due to their uniform distribution over the
ring:

x = 1/N

Therefore:

E[# keys] = k · (1/N) = k/N
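A small simulation sketch of this load model (all names are mine): place N random node IDs and k random keys on the ring and count how many keys each node stores.

import random
from bisect import bisect_left

def simulate(N=100, k=10_000, m=32, seed=1):
    random.seed(seed)
    ring = 2 ** m
    nodes = sorted(random.sample(range(ring), N))
    counts = [0] * N
    for _ in range(k):
        key = random.randrange(ring)
        counts[bisect_left(nodes, key) % N] += 1   # key goes to the first node clockwise
    return counts

counts = simulate()
print(sum(counts) / len(counts))   # exactly k/N = 100 on average
print(min(counts), max(counts))    # spread of the per-node load around that mean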
A perfectly fair assignment of keys to nodes, on average, may actually not be desirable: if,
for example, peer A has much more bandwidth than peer B,
it would be better to assign more keys to A in order to provide a better service
to all users.
1.9.5 Comparison between Chord and Gnutella
                              Chord                     Gnutella
scalability                   very good                 very good
robustness (to churning)      poor                      very good
overlay maintenance           complex / less costly     simple / costly
performance (users)           service guaranteed        no service guaranteed
responsiveness                O(log n)                  O(H)
performance (network)         efficient (shortcuts),    inefficient (flooding),
                              O(log n)                  O(ν^H)
node: complexity              small                     very small
node: storage size            order of m                order of ν
node: load                    balanced                  depends on ν
node: contents                no user dependency        user dependency
Robustness in Chord is poor since the routing is deterministic (short-
cuts): if churning is high, updating the finger tables implies consistency prob-
lems. Indeed, structured systems suffer from an intrinsic issue due to the fact
that peers have a quite large knowledge of the topology: this implies that
the state information is large, therefore its accuracy has to be very high,
otherwise the system will not be reliable.
The response time is similar for both protocols, but actually they
are not comparable because one is a structured system and the other one is
unstructured: Chord uses deterministic routing to find contents while
Gnutella uses flooding.
1.10 CAN
CAN (Content Addressable Network) uses the same basic approach as Chord:
peers, thanks to a hash function, are mapped onto a space like keys. Moreover,
the space is the same for both keys and peers; the main difference is that the
space is not mono-dimensional as in Chord, but it can have d dimensions.

Peer Info → (T) → Node ID
Contents → (G) → Keys

For example, with d = 2, the space has two dimensions identified by
two coordinates x and y.
The way in which keys are assigned to peers is based on distance:
the space is divided fairly among peers and each one controls his own region. It implies
that all keys placed in a given region are assigned to the peer that is in
charge of that region. Graphically, peers are marked in blue while keys are in
orange:

[Figure: two-dimensional coordinate space divided into rectangular regions, each owned by a peer, with keys falling inside the regions.]
1.10.1 Routing
When a peer is looking for a given key, he follows the shortest path to contact
the peer that is in charge of the region where the key is placed. Implicitly,
it means that peers have a detailed knowledge about their neighbors (with
a routing table): indeed, to select the shortest path, they have to choose
among them the best one that guarantees the reachability of the
key.
1.10.2 Join
Once a new host has run the hash function, he is able to know his own final
position in the space. First he has to download, from a web page for
example, a list of active peers. Then he contacts one of them: this node, by
contacting his neighbors, determines the position of the new peer in the same
way in which queries are performed. When the right position is discovered,
the node that is in charge of that region has to partition it, assigning
a portion to the new node. Regions describe the load that each peer deals
with, therefore a high width means a high load. Graphically, the pictures show the
scenario before and after the arrival of a new peer (marked in yellow):

[Figure: before, peer A owns a large region containing 2 keys; after, the region is split and the new peer B takes part of it.]

At first, peer A was in charge of a huge area with 2 keys. After the arrival
of peer B, the area has been reduced and nodes A and B have to deal with
one key each. In practice, step 3 of the Chord join procedure (the redistribution of keys)
is realized here in a hidden way, just by dividing the area.
It could happen that the hash function returns very similar values for
two different peers: in this scenario it is possible that one of the two nodes is
in charge of a region, but it does not physically belong to that region. For
example:
[Figure: peer B is in charge of a region (in yellow) to which it does not belong.]
B is in charge of the yellow region although it does not belong to it. This
phenomenon is due to the fact that the algorithm tries to obtain a fair
distribution of the load and, therefore, to divide the areas regularly.
1.10.3 Performances
The complexity of a query request or a join can be evaluated by means of
the average path length:

average path length = (d/4) · n^(1/d)

The formula says that, in order to have a complexity that is not too high, d must be
taken sufficiently large, but large values of d imply having many dimensions
and, therefore, many neighbors to contact each time a message is sent.
The parameter d is much more critical than the parameter m
analyzed in Chord: indeed, the complexity of Chord grows as log n inde-
pendently of m, while the complexity of CAN is directly determined by the value
of d.
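A quick sketch (names mine) of how the average path length (d/4) · n^(1/d) behaves as d grows, for a fixed number of peers n:

def can_avg_path_length(d: int, n: int) -> float:
    return (d / 4) * n ** (1 / d)

n = 1_000_000
for d in (2, 3, 4, 6, 10):
    print(d, round(can_avg_path_length(d, n), 1))
# the path shortens as d grows, but each node also has to track about 2d neighbors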
1.10.4 Leaving of a node and failures
When a node leaves, notifications must be sent to his neighbors in order
to decide which of them has to take care of the leaving peer's region.
Periodically, peers send messages containing information to their neighbors:
among this information there is also the width of their area. Indeed, the criterion that
peers use to incorporate a region is simple: the neighbor with the smallest
area will be the new owner. This is done to maintain some uniformity in the
space.
When a message is sent and after some time a timeout expires without
any notification having been received, the peer realizes that some problem has occurred.
To recover, a timer is started and that peer waits for some other informa-
tion about the neighbor that seems to have failed. If nothing arrives, the takeover
procedure takes place. The timer is proportional to the area owned by the
neighbor of the node that seems to have failed, therefore being in charge of a small
area allows entering the recovery procedure quickly. The takeover consists of:
. sending takeover messages to all neighbors of the node that is assumed
to have failed (it implies that each peer also has knowledge about the neighbors
of his neighbors);
. assigning to one of them the area of the failed node.
All these managing mechanisms are asynchronous and are only needed in
structured systems, which are very complex to manage.
1.11 Tapestry
Tapestry adopts the same method as Chord and CAN: peers and keys are
mapped onto the same space. The peculiarity is that the space is composed
of 160 bits organized into 40 hexadecimal digits.
To know the distances among nodes, the digits that represent a peer are com-
pared; for example, considering node 4227:
. node 4228 has distance 1, so it is a Layer 4 neighbor (1 digit differ-
ent);
. node 42A2 has distance 2, so it is a Layer 3 neighbor (2 digits
different);
. node 43C9 has distance 3, so it is a Layer 2 neighbor (3 digits
different);
. node 6FA0 has distance 4, so it is a Layer 1 neighbor (4 digits
different).
Therefore:
. Layer 4: 422x;
. Layer 3: 42xx;
. Layer 2: 4xxx;
. Layer 1: xxxx;
where x ∈ [0, F].
If each digit is a peer, the knowledge near the considered node is very
detailed while it is reduced going far away: this mechanism is called mesh
routing and allows reducing complexity.
Routing
It is very similar to the longest prefix match: if peer 5230 queries 42A1:

[Figure: routing path 5230 → 400F (L1) → 4277 (L2) → 42A2 (L3) → 42A1 (L4).]

The search space is reduced the deeper the query goes into the layers, but this advantage has
a cost: the maintenance of tables that are potentially large. If β is the base
of the digits (16 here), the complexity is O(log_β n).
It could happen that the table is not completely full: it means that
some digits are not associated with any peer. This is very risky because the
algorithm was designed for a stable number of peers, and this implies that
it is not robust to churning.
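A toy sketch of this prefix-based forwarding (not the real Tapestry routing tables; names are mine): every hop resolves exactly one more digit of the target identifier, as in the example above.

def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route(source: str, target: str, known_nodes) -> list:
    # greedy prefix routing: each hop must extend the matched prefix by one digit
    path, current = [source], source
    while current != target:
        need = shared_prefix_len(current, target) + 1
        candidates = [n for n in known_nodes if shared_prefix_len(n, target) >= need]
        if not candidates:
            break  # hole in the routing mesh: the query cannot progress
        current = min(candidates, key=lambda n: shared_prefix_len(n, target))
        path.append(current)
    return path

print(route("5230", "42A1", ["400F", "4277", "42A2", "42A1"]))
# ['5230', '400F', '4277', '42A2', '42A1'], one resolved digit per hop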
1.12 BitTorrent
BitTorrent is a very popular system and it is a bit different from the
systems mentioned previously. The objective is to distribute files of huge size
to a potentially high number of customers. The peculiar feature is that the
content is not stored by a single given user, but it is distributed among peers that
share, among them, the bandwidth to download it. The overlay, therefore,
is designed for this purpose and not for making queries.
The content is divided into small pieces called chunks: to consume the
file, all of them have to be downloaded so, from a peer's point of view, they have
the same importance. The usual size of chunks is around 64-256 kbit:
they are quite small. The neighborhood (overlay) is established randomly, so
peers are forced to both download (new chunks) and upload (chunks held).
Transmission occurs by means of TCP.
1.12.1 Analysis
The distributor that wants to share the file has to create a .torrent file by
means of a hash function: indeed, the .torrent is simply a file which indexes
all chunks, including the hash keys that guarantee the correctness of the chunks
and, therefore, of the file. The .torrent also contains other information, such as:
the file name, the file size, the number of chunks in which the file is
divided and the address of the tracker.
After the creation of the .torrent, the distributor has to upload it to a
website from which peers can download it and start to receive the file. There
is a central authority that maintains the list of active peers that are sharing
the content: it is called the tracker. The tracker is not connected to the overlay;
its purpose is just to help peers download the file and, for reliability, it is better
to have more than one tracker managing the overlay for each file.

[Figure: peer A, the website and the tracker: 1. the distributor uploads the .torrent to the website; 2. A requests it; 3. A downloads the .torrent; 4. A contacts the tracker; 5. the tracker returns a list of peers.]

The list downloaded from the tracker is usually composed of 40 peers: they
will become the neighborhood of peer A.
Definitions
. seeders: peers that hold the whole content; they are very important
for the good behaviour of the system because every chunk can be downloaded
from a seeder;
. leechers: peers that hold just a part of the content;
. swarm: the totality of peers (seeders and leechers) that share the file;
. choked peers: these nodes are not allowed to receive content from a
given peer;
. unchoked peers: these nodes are allowed to receive content from a
given peer.
Among the list of 40 peers downloaded from the tracker, the node selects
just 4 peers: they are effectively the ones it is in contact with.
1.12.2 Policies
This section describes the policies with which a peer selects the 4 nodes to
exchange traffic with, and how the chunks to be downloaded are selected.

Selection of chunks
Peers distribute a map that shows which chunks they hold; this map is sent
to the peer's neighbors, so that they can decide which chunk should be downloaded.
The policy is simple: the rarest chunk is selected, and this is done for two
reasons:
. to avoid the risk that a rare chunk disappears from the network;
. to speed up the download.
Chunks are subdivided into sub-blocks, which are composed of around 10
TCP packets (≈ 16 kbit). If some neighbors have the same chunk, it is
possible to open more TCP connections to download sub-blocks in parallel (typically 5
at a time). In this way a higher download bit rate is expected
because the bandwidth is enlarged: indeed, if the connection established for
downloading one sub-block is very slow, the effect on the global rate is
mitigated by the other connections.
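A minimal sketch of rarest-first selection (names are mine): given the chunk maps advertised by the neighbors, pick, among the chunks still missing, the one held by the fewest neighbors.

from collections import Counter

def rarest_first(my_chunks, neighbor_maps):
    # neighbor_maps: peer -> set of chunk indexes that the peer holds
    availability = Counter()
    for chunks in neighbor_maps.values():
        availability.update(chunks)
    missing = [(count, chunk) for chunk, count in availability.items()
               if chunk not in my_chunks]
    return min(missing)[1] if missing else None

neighbor_maps = {"peer1": {0, 1, 2}, "peer2": {1, 2, 3}, "peer3": {2, 3}}
print(rarest_first({2}, neighbor_maps))  # 0: only one neighbor holds chunk 0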
Selection of peers
Actually BitTorrent introduces two overlays:
. one for the list of 40 peers downloaded from the tracker (green peers);
. a second one that contains the 4 peers (marked in orange) with which a given
peer (the blue one) is in contact.
The following picture shows this concept:

[Figure: Overlay 1 (the 40 peers from the tracker), Overlay 2 (the 4 active peers) and the underlying physical network.]
The selection is based on the tit-for-tat technique: it depends on how much
peers contributed in the past. The global advantage is that connections
with large bandwidth are favoured, and the local advantage is that the sys-
tem forces each peer to share more, because in this way it will receive a
better service (avoiding free riders: peers that want just to download and not to
contribute). In conclusion, tit-for-tat:
. improves cooperation among peers;
. provides fairness.
Due to tit-for-tat, there is the distinction between choked and unchoked peers:
if a node has contributed very little in the past, it will probably be put in
the choked list. Each peer has his own choked list, recomputed every time
window (10 s for example), in which nodes are ordered by how much they have
shared: unchoked peers occupy the first positions.
The main drawback is that, at the beginning, each node would receive a
very bad service since he is not yet able to contribute much. This is avoided
thanks to optimistic unchoking: each time, one choked peer is unchoked.
Indeed, when a peer receives requests from others, the ones that it will serve
are peers that have lots of chunks (they have lots of rare chunks and they
can contribute to sharing very well). It means that the rarest-first approach
cannot be used by beginning users: they have to choose the chunks to download
randomly; then, when their number is sufficiently high, they can start to
use the rarest-first approach, since their contribution will be enough.
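A simplified sketch of the periodic choking decision (a model of the idea, not the real client algorithm; the names and the parameters are mine): keep unchoked the peers that uploaded the most to us in the last window, plus one optimistically unchoked peer.

import random

def recompute_unchoked(upload_from, regular_slots=3, seed=None):
    # upload_from: peer -> bytes received from that peer in the last time window
    rng = random.Random(seed)
    ranked = sorted(upload_from, key=upload_from.get, reverse=True)
    unchoked = set(ranked[:regular_slots])       # best contributors (tit-for-tat)
    choked = [p for p in ranked if p not in unchoked]
    if choked:
        unchoked.add(rng.choice(choked))         # optimistic unchoke
    return unchoked

stats = {"p1": 900, "p2": 750, "p3": 20, "p4": 0, "p5": 400}
print(recompute_unchoked(stats, seed=0))  # p1, p2, p5 plus one randomly unchoked peer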
Tit-for-tat tries to improve fairness by balancing how much a peer can con-
tribute with the service he desires, but it is possible that, due to the asymmetry
of network flows, it reduces the performance of the system. Imagine that
two peers are exchanging chunks belonging to the same content: if the com-
munication follows two different paths, it is possible that one of them is
bottlenecked. It implies that one of the two peers (A) has a very slow
upload rate with respect to the other (B); therefore B cannot exploit
his bandwidth completely, because the mechanism tries to punish A, which
has a low contribution.
To improve efficiency and performance, the end game mechanism has
been introduced: for each chunk, the last sub-blocks are requested by the peer
in broadcast to his neighbors. Once the first positive answer is received, the
other requests are aborted. This technique avoids the situation in which, being unlucky, the
receiver waits too long for the download from a slower peer: indeed, since
only one chunk at a time can be downloaded, waiting just for the last
sub-blocks is a waste of time that can be avoided. This implies that the
download is sped up.
1.12.3 Case study: Flash Crowd
Suppose that a content is very popular and the purpose is to distribute it
to the largest possible number of customers. Assume:
. the number of peers interested in it is n = 2^k;
. two cases are available:
1. a client/server scenario;
2. a scenario in which the content is redistributed by peers;
. the content distributed is an atomic entity;
. all peers have the same upload bandwidth b.
If the size of the content is s, the time needed to download/upload the
content is:

T = s/b

Plotting on the x axis the number of peers contacted at each step and on the
y axis the time, in the peer-to-peer scenario the number of served peers
doubles at every step: 2 at time T, 4 at 2T, 8 at 3T, and so on up to 2^k.

[Figure: staircase plot of the number of peers (2, 4, 8, ..., 2^k) versus time (T, 2T, 3T, ...).]
Case 1
Considering the client/server scenario, the service capacity needed is:

[Figure: service capacity C(t) versus time in the client/server case; the capacity is the global capacity B of the server.]

where B is the global capacity of the server, and B > b.

Case 2
In the other approach:

[Figure: service capacity C(t) versus time in the P2P case, starting from the single upload bandwidth b and growing as more peers complete the download.]

It implies that this method is very effective: in a very short time it reaches
the client/server approach.
Now consider the case of parallel download: each peer divides his
upload bandwidth in two, in such a way that two other peers can download the
content simultaneously. This time, the time to complete a download is:

T_x = s / (b/2) = 2s/b = 2T

The graph will be:

[Figure: staircase plot of the number of peers (3 at 2T, 9 at 4T, ...) versus time when the upload bandwidth is split in two.]
If the content is a chunk, comparing the two graphs it is immediately
clear that it is better not to divide the bandwidth while distributing it: this allows
speeding up the download, because more peers are reached in less time.
Moreover, it now becomes clear why the size of chunks is kept small: if s
is small, T is also small, and if the download time is small, the redistribution
takes place quickly, improving performance.
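A small sketch comparing the two growth patterns of this example (a toy model, names are mine): serving one peer at a time at full rate versus serving two peers in parallel at half rate.

def peers_served(total_time, T, split):
    # split = 1: each holder serves one new peer every T (population doubles)
    # split = 2: each holder serves two new peers every 2T (population triples every 2T)
    period = split * T
    steps = int(total_time // period)
    return (split + 1) ** steps

T = 1.0
for t in (2, 4, 6, 8):
    print(t, peers_served(t, T, split=1), peers_served(t, T, split=2))
# after 8T: 256 peers served without splitting versus 81 with the bandwidth split in two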
The source (colored in blue in both graphs) is the peer that works for
the longest time, but the last step (which is the most effective, because it
allows reaching half of the peers interested in the content) works just for a while: it
implies that the potential bandwidth (2^k · b) is not completely exploited. A
way to improve it is having independent distribution trees: they represent
the paths followed by chunks to reach peers.
The most effective step is, as mentioned before, the last one because it
allows reaching a large number of peers: this is a reason why the rarest
chunk selection is implemented. Indeed, in the first steps the chunk is very
rare, so it is better to distribute it, otherwise it can disappear from the
network; at the end it is very popular and the risk of a loss is negligible.
1.13 Skype
Skype is a very popular system that adopts proprietary solutions, therefore
the design is closed and everything is encrypted. The knowledge about this
system has been obtained thanks to reverse engineering. In this system, directories
of people are distributed and they are managed only by super-peers. The reasons
for its success are:
. very good design and high quality (also in presence of NATs/firewalls);
. users are encouraged to use it since a lot of people already use it.
The overlay is hierarchical and distinguishes:
. normal peers;
. super-peers, which are very well connected.
An example is:

[Figure: hierarchical overlay with super-peers connected among themselves and normal peers attached to them.]
Super-peers are chosen by election among normal peers, and it is possible to
force the software not to be elected; super-peers must have:
. a public IP address;
. bandwidth to share.
Super-peers are in charge of managing their normal peers: they know when
peers are on/off line, they help peers to find other contacts and to communicate
in presence of NATs/firewalls. However, each normal peer can
contact more than one super-peer for reliability.
Users are not identified by their IP address, but by an identifier:
this helps people to use the application regardless of the place in which they
are. Indeed, at home they can use one PC and at the office another one,
but for the application the user is the same. This purpose is reached
through an authentication method: each time, the user has to declare his
identity before being connected. Due to this fact, it is possible to distinguish
two classes of signalling:
. one to login and authenticate;
. one to look for other users.
In general, UDP is used as transport protocol: since the human voice
requires a low bandwidth, to avoid fluctuations it is better to use UDP, which does
not provide congestion control although it is not reliable. Of course, when
it is needed (in particular in presence of NATs and firewalls), it is possible to
use TCP; the signalling traffic, instead, is always sent through TCP.
A communication between two hosts not behind a NAT happens like this:
. the initiator asks his super-peer for information (IP address and port
number) about the peer he wants to talk with;
. the super-peer provides that information;
. a connectivity test takes place: the initiator tries to open a direct con-
nection;
. if possible, they can start to communicate.
If the initiator is behind a NAT, the connectivity test fails because the infor-
mation retrieved from the super-peer is different from the actual information
seen by the receiver: the answer is therefore negative, and the current IP address
and port number are specified in the message. In this way the initiator,
using those new parameters, appears as if it were not behind the NAT. In the
case in which the destination is behind the NAT, it cannot be reached:
therefore the initiator contacts the super-peer, telling it that the destination
has to start the talk.
When both are behind a NAT, they also have to retrieve their public
information from the super-peers before starting the communication. It is possible
to conclude that the reachability in Skype is very high: indeed, super-nodes
can also work as relay nodes in presence of NATs or firewalls; in this case
the two links are completely independent and the transport protocols used can
be different. The solutions discussed are called Simple Traversal of UDP through
NATs (STUN) and Traversal Using Relay NAT (TURN).
With Skype it is also possible to contact the fixed telephone network (proce-
dures called SkypeIn/SkypeOut) by using gateways: in this case the quality
perceived is the same as on the fixed telephone, because a different codec is used
(G.729). Usually the voice codec is selected from a list; its main features are:
. bit rate: 10-32 kbit/s;
. fixed inter-packet gap (IPG): 30 ms.
Moreover, to deal with losses, Skype introduces redundancy.
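From these two figures one can estimate the voice payload carried by each packet (a rough back-of-the-envelope sketch using only the numbers quoted above):

for bit_rate_kbit in (10, 32):
    ipg_s = 0.030                    # fixed inter-packet gap of 30 ms
    payload_bits = bit_rate_kbit * 1000 * ipg_s
    print(bit_rate_kbit, "kbit/s ->", payload_bits / 8, "bytes of voice per packet")
# roughly 37.5 to 120 bytes of payload every 30 ms, before headers and redundancy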
1.14 P2P Streaming systems
P2P streaming systems are systems that provide, in principle, a multimedia service,
and the fundamental assumption is that the user interested in the
content consumes it in real time, that is, he consumes it while downloading.
Therefore, several efforts are made in this sense: avoiding service interruptions
and reducing the delay are some of them.
Services provided are video, audio, or both video and audio. These sys-
tems can be distinguished based on the kind of service provided:
. VoD: video on demand (example: catalogue of video channels);
. real-time TV (examples: live sports events, interactive TV).
The fundamental distinction is the delay: in the second case it is much
tighter than in the previous category. For real-time TV the latency, therefore, is
very short: it is the gap between the moment in which the video is generated
and the moment in which the video is consumed. Regardless of this classifica-
tion, there is a delay to take into account in any case: it is the delay due to
the distribution of the content. Therefore, the peers that compose the
neighborhood of a given node are just the ones interested in the same part
of the content. Peers are not forced to be synchronized, but in general they
are interested in consuming the same part of the content more or less at the
same time.
Reasons why these systems are now popular are:
. the possibility of distributing content everywhere at the same time (think
of foreign communities, or places with little infrastructure where only
the Internet arrives);
. scenarios where traditional distribution is a closed market or too expensive;
. small distributors: small communities interested at a given moment,
where the number of users is large but sparse, for example scientific
contests.
Another classification of these systems is based on the type of overlay
used:
. tree-based;
. mesh-based (similar to BitTorrent).
The overlay is in charge of the distribution of contents; the tasks it performs
are:
. finding the content;
. finding the neighbors.
Users share their upload bandwidth to distribute contents.
1.14.1 Tree-based systems
These systems were proposed as an alternative to IP multicast distribution,
which uses routers to reach more than one user simultaneously. That method
suffered because routers were assumed to have much more capability than
they actually have; moreover, IP multicast suffers from the following issues:
. routers become bottlenecks;
. address allocation;
. group maintenance;
. security.
Hosts are divided into the source (which generates the content), destinations
and intermediate hosts. An example of topology is:
[Figure: a distribution tree rooted at the source, with peers P1-P9 as internal nodes and leaves.]
If each node is in charge of distributing the content only to its children, the
bottleneck problem disappears because the bandwidth required from each
node is not too high.
Tree construction The parameters to define are:
. the number of levels of the tree (the number of hops to reach the last
layer of the tree);
. the fan-out: the maximum number of children that each node can
have.
Based on the number of levels, it is possible to impose an upper bound on the
delay: it will be small if the number of levels is small. Based on the fan-out,
instead, it is possible to impose a limit on the upload bandwidth: too
many children are difficult to manage. Indeed, the maximum fan-out is:
\[ F_{out} = \frac{\text{upload capacity}}{\text{bit rate of one video stream}} \]
The important thing to remember is that the upload bandwidth cannot be
completely exploited, because some of it is needed for signalling.
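As a quick numeric illustration of this formula (the capacity, bit-rate and signalling values below are assumptions, not taken from the notes):

def max_fan_out(upload_capacity_kbps, video_bitrate_kbps, signalling_fraction=0.1):
    """Maximum number of children a node can serve.

    A fraction of the upload capacity is reserved for signalling, so it
    cannot be used to push video data."""
    usable = upload_capacity_kbps * (1.0 - signalling_fraction)
    return int(usable // video_bitrate_kbps)

# Hypothetical values: 2 Mbit/s upload capacity, 400 kbit/s video stream.
print(max_fan_out(2000, 400))   # -> 4 children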
Tree maintenance This is a very critical point because trees suffer from
an intrinsic vulnerability: when a node switches off, the topology is split,
therefore some parts of the tree may incur a service interruption.
Potential problem Based on their position in the tree, nodes contribute
more or less to the distribution of contents, apart from the nodes placed in
the last layer: they do not contribute at all. This implies some unfairness.
End-system multicast (ESM)
This system was not originally designed with a P2P approach and it uses two overlays:
. one in charge of the tree maintenance: it is based on a mesh
topology (maintenance information is distributed by flooding);
. the second in charge of finding and distributing contents: it is based on
the tree topology.
The approach is distributed, but peers actually maintain a global view of
the network.
Join operation
. After a bootstrap phase (where the joining node downloads a list of
active peers from a web page), it contacts some of them;
. a join message is sent through the mesh overlay to all peers (in this
way everyone knows that a new peer wants to join);
. the same happens for the leave step: a leave message is propagated
through the mesh topology.
Periodically each peer sends a message by flooding: in this way nodes can
build a neighbor table, because messages contain information such as the peer
id from which the message has been received, its IP address, the id of
the message and a timestamp. This helps to detect when a node leaves the
network after a failure: if after a timeout (checked with respect to the last
timestamp received from a given node) no message arrives from that node,
the peer probes it directly; if no answer is received, the peer is in charge
of notifying the departure by flooding, otherwise it just updates its neighbor
table.
Once the mesh topology is created, a distance-vector algorithm is used to
select the subset of the mesh that forms the distribution tree.
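A minimal sketch of the failure-detection logic described above; the field names, the timeout value and the callback functions are illustrative assumptions:

import time

neighbor_table = {}   # hypothetical structure: peer_id -> last timestamp heard via flooding
TIMEOUT = 30.0        # seconds, assumed value

def on_flood_message(peer_id, timestamp):
    """Refresh the entry of the peer that originated the flooded message."""
    neighbor_table[peer_id] = timestamp

def check_neighbors(probe, notify_leave_by_flooding):
    """Probe silent peers; if they do not answer, announce their departure by flooding."""
    now = time.time()
    for peer_id, last_seen in list(neighbor_table.items()):
        if now - last_seen > TIMEOUT:
            if probe(peer_id):                    # the peer answered: refresh its entry
                neighbor_table[peer_id] = now
            else:                                 # no answer: remove it and notify the others
                del neighbor_table[peer_id]
                notify_leave_by_flooding(peer_id)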
Multi-tree systems
These are still tree-based systems, but more than one tree is used; they are
also called second-generation systems.
They were developed to deal with the issues of single-tree systems, such as:
. little robustness to churn (part of the tree becomes isolated);
. inefficient use of the bandwidth (the last layer of the tree does not
contribute).
The content is organized in m sub-streams and each one is served by a
different tree. In this way, nodes that are in the final layer of one tree can
act as sources in another tree: this improves efficiency and robustness because,
when a node leaves, problems arise not in all trees but only in those
in which the node is not a leaf. An important advantage is that managing
several trees does not imply too much complexity.
There is a balance between what a peer receives and how much it contributes:
each tree carries a fraction 1/m of the stream, so in one tree a peer acts like
an internal node, forwarding its sub-stream to up to m children (so that its
upload roughly matches one full stream rate), while in the other trees it acts
like a leaf, only receiving content.
A drawback is that the parameter m cannot be adapted over time (to the
capacity, to the number of peers): it has to be decided a priori.
Peer join A new peer that wants to join the network has to:
. find its position in each of the m trees;
. join as an internal node in one tree; the parent will be the node
with the lowest depth that can accept a further child.
The higher the position, the more the peer will contribute to the distribution.
Peer leaving The departure of a peer placed as a leaf does not cause
problems; if it is an interior peer it does: its children have to re-perform the
join operation.
Descriptors Multi-tree systems were designed for multiple-description codecs:
the original information is coded into several descriptors, each of which can
be decoded independently. If the user is able to receive all of them, he
can consume the content with high quality; if he is able to catch just
a part of them, he is still able to consume the content, but with a lower
quality.
Multi-tree systems thus ensure that a node leaving is not a critical issue:
it may happen that some descriptors are lost, but there is no service
interruption, the content is simply received with a lower quality.
The drawback is compression efficiency: the combination of multi-trees and
multiple descriptors is a bit less efficient, because multiple-description coding
needs more bandwidth to reach the same quality.
1.14.2 Meshed-based systems
These systems take inspiration from BitTorrent, although the purpose is
different. Pieces of the streaming content are distributed to neighbors and
there is a tracker whose role is to let peers join by sending them a list of
active peers. As in BitTorrent, there is no structured overlay: nodes are not
forced to be placed in a given position according to a given topology. The
overlay, indeed, is a randomly created mesh. Maintenance is provided
by a gossip algorithm: peers send the list of their neighbors to their
neighbors, and the presence of a peer is periodically notified through Hello
messages. Having a small neighborhood limits a peer in two ways:
. it can distribute/receive less;
. it can more easily end up out of service.
Gossiping is not flooding: no rules are imposed to guarantee that a given
piece of update information reaches all nodes.
As in BitTorrent, the subset of the neighborhood with which a peer actually
exchanges traffic is smaller: the peer selects those neighbors based on:
. their workload or capacity;
. path characteristics: RTT, loss probability (these are time-varying
parameters and have to be measured);
. content availability.
[Figure: the mesh topology, a peer's neighbors, and the subset of neighbors with which it exchanges traffic.]
Data delivery Contents are divided into pieces called chunks, which are
treated independently and can therefore follow different distribution trees.
Policies to distribute chunks are local, so there is no network-wide coordination;
scheduling mechanisms are basically:
. push: decisions are taken by the transmitter;
. pull: decisions are taken by the receiver.
With push, the transmitting peer sends a chunk to a neighbor without
negotiation, while with pull it is the receiver that requests the desired chunk:
this implies having some knowledge about what the neighbors hold.
Push                               Pull
short delays (no negotiation)      requires more signalling
multiple copies (waste of bw)      no multiple copies
possible losses                    larger delays
Push may suffer losses when, due to multiple copies, the bandwidth is not
enough.
Strategies Let:
. u and v be two neighboring peers;
. c(u) be the set of chunks held by u;
. C ⊆ c(u) be the set of chunks sent by u.
Strategies are methods to decide what to transmit or request, i.e., how to
choose the chunks C and the neighbor v.
. peer-first selection:
  . random selection;
  . random selection of a useful peer (one that needs something
    from u, i.e., a peer v such that c(u) \ c(v) ≠ ∅);
    here it is very important to keep in mind the order of selection:
    if the peer is selected first, there are constraints on the chunks to
    deliver, while, on the contrary, a chunk-then-peer selection implies
    constraints on the peer;
  . most deprived peer (the one that can receive the most chunks from
    u);
. chunk-first selection:
  . random selection;
  . random useful selection;
  . latest blind chunk (the most recent chunk with respect to source
    generation is sent: it is the one peers need most urgently, the one
    with the tightest delay constraint);
. latest useful.
Another concept similar to BitTorrent is that the latest chunks are held
by few peers, so it is better to distribute them quickly to make them safe (less
easily lost).
Examples
. random peer / latest blind chunk: this combination always pushes the latest
chunk; if the source is greedy, the perceived service is good because,
with latest blind, it does not matter which chunks the receiver already holds;
its properties are:
  . little overhead;
  . minimum delay;
  . possible losses and duplicates;
. most deprived peer / latest useful chunk: first the peer that holds the least
is selected, then the latest chunk it is missing is sent to it; this implies that peers
must have knowledge of the chunks held by their neighbors; its properties are:
  . large overhead;
  . larger delays (a sketch of both policies is given below).
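A minimal sketch of the two combinations above, assuming every peer knows the chunk sets c(v) of its neighbors; the data structures and the example values are illustrative assumptions:

import random

def random_peer_latest_blind(c, u, neighbors):
    """Pick a random neighbor and push the newest chunk u holds (duplicates possible)."""
    v = random.choice(neighbors)
    return v, max(c[u])                  # chunk ids are assumed to grow with time

def most_deprived_latest_useful(c, u, neighbors):
    """Pick the neighbor missing the most chunks and push its newest missing chunk."""
    v = max(neighbors, key=lambda w: len(c[u] - c[w]))
    missing = c[u] - c[v]
    if not missing:
        return None                      # nothing useful to send
    return v, max(missing)

# Example with hypothetical chunk sets.
c = {"u": {1, 2, 3, 4}, "a": {1, 2}, "b": {1, 2, 3}}
print(random_peer_latest_blind(c, "u", ["a", "b"]))
print(most_deprived_latest_useful(c, "u", ["a", "b"]))   # -> ('a', 4)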
Performance There are two complementary indices:
. diffusion rate r(t): the probability that a generic peer receives a chunk
within a time smaller than t; it gives an idea of the delays (given a time,
how many peers can be reached?);
. diffusion delay: the delay that a chunk takes to reach a fraction
1 − ε of the peers; fixing ε, e.g. 5%, the diffusion delay measures the time
needed to reach 95% of the peers (given a population, how much time does
it take to reach a part of it?).
Relation delay-losses Since users consume the content while they are
downloading it, the delay should be as short as possible:
[Figure: the source emits packets 1, 2, 3 spaced by Δt; after crossing the layer-3 network, the peer receives them with different delays.]
The delay of the layer-3 network is the combination of scheduling and
buffering policies, possible congestion and propagation delay; delays at
layers 4 and 7 also have to be considered, therefore each packet is received with
a different delay: the variability of the delay is called jitter. Moreover, it can
happen that packets are received out of order:
[Figure: the source emits packets 1, 2, 3, but the peer receives them in the order 1, 3, 2.]
When the first packet has started to be played, the codec needs the second
one to be ready exactly Δt later, and so on; therefore out-of-order packets are
very dangerous: they decrease the perceived quality. To deal with this, it is
possible to introduce an initial playout delay: it is artificial and used only to
increase the probability of receiving the right chunks before playing them.
[Figure: the peer receives packets in the order 1, 3, 2 but, thanks to the playout delay, plays them in the order 1, 2, 3 with the correct spacing Δt.]
The trade-off between delays and losses is:
. higher playout delay → no losses → high perceived quality;
. lower playout delay → possible losses → low perceived quality.
Losses can be due to:
. packets/chunks never received;
. packets/chunks received late.
The second category is more critical, because packets received late are useless
and represent a waste of resources (in terms of bandwidth). The following
picture highlights this fact:
[Figure: the playout buffer around the chunk currently played; chunks before it are useless information or losses, chunks after it are either owned or not yet received.]
Chunks that have not been received and that, in sequence order, precede the
chunk currently being played are useless, while chunks that follow it can still
be received: a buffer is therefore needed to store them. Adopting a latest-blind
selection policy, the chunk to request is the one shown in orange in the previous
picture. The use of the buffer provides some synchronization: peers are interested
in the same content at the same time, so in a situation like the following:
[Figure: the playout buffers of Peer 1 and Peer 2, whose playback points are far apart.]
the two peers are not interested in communicating with each other. The
consequence is that chunks do not all have the same relevance, as they do in
BitTorrent: some of them are more urgent and others can become useless if
not received in time. The information peers exchange to communicate which
chunks they can transmit is the buffer map (BM): a bitmap that describes the
chunks owned and not owned by a given peer. For example:
1 0 0 1 1
The exchange of buffer maps can happen:
. periodically (choosing the period is an issue: a long period implies delays,
a short one implies overhead);
. at each received chunk.
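A minimal sketch of a buffer map and of how a receiver could use it to pick a chunk to pull; the window size and the encoding are assumptions chosen to match the example bitmap above:

def build_buffer_map(owned, window_start, window_size=8):
    """Encode which chunks in [window_start, window_start + window_size) a peer owns."""
    return [1 if window_start + i in owned else 0 for i in range(window_size)]

def latest_useful(my_owned, neighbor_map, window_start):
    """Pull policy: newest chunk the neighbor owns and we are still missing."""
    candidates = [window_start + i
                  for i, bit in enumerate(neighbor_map)
                  if bit == 1 and window_start + i not in my_owned]
    return max(candidates) if candidates else None

# Hypothetical chunk ids, with the buffer window starting at chunk 10.
neighbor_bm = build_buffer_map({10, 13, 14}, window_start=10, window_size=5)
print(neighbor_bm)                                            # [1, 0, 0, 1, 1] as in the text
print(latest_useful({10, 11}, neighbor_bm, window_start=10))  # -> 14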
When a peer has to redistribute a chunk it has just received (peer A with
neighbors B, C and D in the following example),
the temporal diagram, considering a pull policy, is:
[Figure: pull case. A sends its buffer map to B and to C; B and C send back their requests; only then does A send the chunk to B and to C. The delays A-B and A-C include the buffer-map exchange and the request.]
It is possible to conclude that the exchange of buffer maps introduces a further
delay. Considering a push policy, instead, the temporal diagram is:
[Figure: push case. A sends the chunk to B and to C as soon as it receives it, without waiting for buffer maps and requests; the delays A-B and A-C are shorter than in the pull case.]
The delay is reduced, but if B and C do not actually need that particular chunk,
bandwidth is wasted for nothing.
There are some proposals to reduce the delay while using a pull policy:
. select the peer based on RTT: the buffer-map exchange phase allows
the RTT to be measured, so selecting the peer based on that measure
can lead to better decisions; a possible drawback is that distance-based
decisions may degenerate into a partition of the network in terms of
connectivity, since locality is introduced;
. select the peer based on probability: with probability p select randomly,
with probability 1 − p use the RTT measures (see the sketch after this list);
. wait an amount of time t before selecting the same peer again: this
inhibits the repeated selection of the same peer, so as not to favour it
too much;
. bandwidth-aware policy: reduce delays by exploiting the bandwidth as well
as possible, especially when peers have different upload bandwidths, because
delivery is preferentially directed to the nodes that have more of it (in the
corresponding picture, peers with more bandwidth are drawn with a larger
size); statistically, this allows the number of hops to be reduced because the
resulting trees are much shorter; the main issue is the detection of the upload
bandwidth.
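A minimal sketch of the probabilistic, RTT-based peer selection mentioned in the list above; the value of p, the data structures and the RTT values are assumptions:

import random

def select_peer(neighbors, rtt, p=0.3, recently_used=None):
    """Pull-side peer selection: random with probability p, lowest RTT otherwise.

    Recently used peers are excluded to avoid always favouring the same node."""
    recently_used = recently_used or set()
    candidates = [v for v in neighbors if v not in recently_used] or neighbors
    if random.random() < p:
        return random.choice(candidates)          # random pick: limits locality/partitioning
    return min(candidates, key=lambda v: rtt[v])  # otherwise prefer the closest peer

# Hypothetical RTT measurements (seconds) obtained from the buffer-map exchange.
rtt = {"a": 0.08, "b": 0.02, "c": 0.15}
print(select_peer(["a", "b", "c"], rtt, recently_used={"b"}))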
Issues
. Fairness: very evident with the bandwidth-aware policy, some nodes may
distribute much more than they receive.
. Depending on the codec, it is not possible to increase the download
bandwidth beyond a certain point, therefore the quality is bounded.
. Content awareness: to improve efficiency it is possible to change codec,
but then chunks do not all have the same importance, so the more relevant
ones have to be transmitted in such a way that their reception is
ensured.
. Costs for ISPs.
Chapter 2
Random graphs
2.1 Introduction and definitions
Random graphs are created through rules that introduce randomness: they
are used to model and describe systems with many components and high
complexity. Application fields are:
. modelling the Internet (layer-3 network);
. modelling the web, WWW (the interconnection of pages through links, a layer-7
network);
. network design;
. biology;
. social networking.
P2P systems are based on overlays: a way to model them is through random
graphs. This kind of model is used to:
. understand the system;
. tune parameters;
. make design choices;
. evaluate performance (in simulation, for example, to evaluate scalability).
Definitions
. Graph: composed of:
  . nodes/vertices;
  . edges/links;
. neighbor: a node connected directly through a link;
. degree: the number of neighbors of a given node;
. component: a subset of nodes connected to each other through links
(more than one component in the graph implies a disconnected network,
because two nodes picked from two different components cannot reach
each other);
. giant component: a finite fraction of the nodes belonging to the same
component (if the number of nodes is high and there is a giant component,
the network has very good connectivity; in a biological scenario, when the
goal is to isolate viruses, the presence of a giant component is bad because
it makes infections spread easily: low connectivity is better);
. clustering: the property that the probability of two nodes being neighbors
increases if they have at least one neighbor in common;
. clustering coefficient: the average probability that two neighbors
of a given node are neighbors of each other too;
. radius (around a node): the distance (in number of hops) needed to reach
any node from the given node.
2.2 Erdos-Renyi Model
Given:
. n: the number of nodes;
. p: the probability that a link between two nodes exists,
the resultant graph is called G(n, p). An equivalent definition is: G(n, p) is a
set of graphs with n nodes, in which each graph appears with a probability
that depends on its number of links. Indeed, considering:
. n: nodes;
. m: links;
there are many possible combinations, each appearing with probability:
\[ P(G) = p^{m}\,(1-p)^{M-m} \]
where M is the total number of possible links. Analysing each term:
. p^{m} is the probability that exactly the m links of G are present;
. (1-p)^{M-m} is the probability that all the other links are absent.
P(G) is therefore the probability that a given graph G appears; over the same
set of nodes several graphs can be built, and for another graph G' with m'
links, P(G') = p^{m'}(1-p)^{M-m'}, which differs from P(G) whenever m' ≠ m.
M is the number of links of a full-mesh topology:
\[ M = \frac{n\,(n-1)}{2} \]
where the division by 2 is necessary since directions of links do not count.
2.2.1 Average degree
The degree of a node is the number of links it has; it depends on the graph
and it is a random variable. The average degree can be computed from the
average number of links divided by the number of nodes:
. M · p is the average number of links generated by the process (the
number of potential links times the success probability);
. n is the number of nodes.
Actually this is not enough: to be precise, one link consists of two
link-ends connecting two nodes, therefore the average number of link-ends
generated is 2 · M · p. In conclusion:
\[ \text{avg degree} = \frac{2\,M\,p}{n} = \frac{n\,(n-1)\,p}{n} = (n-1)\,p \]
The average degree is also written as z or <k>. For large n:
\[ z = (n-1)\,p \approx n\,p \]
Values of z
. z = 1 is the critical value;
. z > 1: with high probability there is a giant component;
. z < 1: there is no giant component.
Clustering coefficient
The clustering coefficient is simply:
\[ c = p \]
therefore:
\[ c = p = \frac{z}{n} \]
for large values of n.
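A small simulation sketch (with arbitrary parameter values) that generates a G(n, p) graph and checks empirically that the average degree is close to (n − 1)·p:

import itertools
import random

def gnp(n, p, seed=0):
    """Erdos-Renyi graph as an adjacency dictionary: each possible link exists with probability p."""
    random.seed(seed)
    adj = {u: set() for u in range(n)}
    for u, v in itertools.combinations(range(n), 2):
        if random.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

n, p = 2000, 0.005
adj = gnp(n, p)
avg_degree = sum(len(neigh) for neigh in adj.values()) / n
print(avg_degree, (n - 1) * p)   # the two values should be close (z ~ 10)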
2.2.2 Degree distribution
Let k be the random variable describing the degree and P_k the probability that
the degree of a node is equal to k; then:
\[ P_k = \binom{n-1}{k}\, p^{k}\,(1-p)^{n-1-k} \]
where:
. n - 1 is the total number of possible experiments: all the nodes minus
the one considered;
. k is the number of successful experiments.
For large n (with z fixed):
\[ P_k = \frac{z^{k}}{k!}\, e^{-z} \]
which is a Poisson distribution with parameter z: it means that E[k] = z.
This approximation is due to the fact that the binomial distribution tends
to a Poisson distribution for large n and small k.
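A quick numeric check of this approximation, with arbitrarily chosen values of n and p:

from math import comb, exp, factorial

n, p = 1000, 0.005            # hypothetical G(n, p), so z = (n - 1) p ~ 5
z = (n - 1) * p

for k in range(8):
    binomial = comb(n - 1, k) * p**k * (1 - p)**(n - 1 - k)
    poisson = z**k / factorial(k) * exp(-z)
    print(k, round(binomial, 4), round(poisson, 4))   # the two columns nearly coincide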
2.3 Bender-Canfield Model
This model deals with random graphs that have a given, possibly non-Poisson,
degree distribution. Graphs are built in two steps:
. assign edge-ends to the nodes (each node receives a number of edge-ends
drawn from the given degree distribution);
. randomly connect pairs of edge-ends.
This is a different way of building random graphs with respect to the Erdos-
Renyi model; connections are independent and no notion of locality is
present.
The following sections deal with properties derived from this model.
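A minimal sketch of this two-step construction (often called the configuration model); the degree sequence used below is an arbitrary example:

import random

def configuration_model(degrees, seed=0):
    """Random multigraph with the given degree sequence.

    Step 1: node i receives degrees[i] edge-ends (stubs).
    Step 2: the stubs are shuffled and paired at random.
    Self-loops and parallel edges are possible and simply kept."""
    random.seed(seed)
    stubs = [node for node, d in enumerate(degrees) for _ in range(d)]
    assert len(stubs) % 2 == 0, "the sum of the degrees must be even"
    random.shuffle(stubs)
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]

# Example: 6 nodes with a hand-picked (non-Poisson) degree sequence.
print(configuration_model([3, 3, 2, 2, 1, 1]))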
2.3.1 Node reachability
The node reachability property studies the possibility of having a giant
component: if nodes are easily reached, the probability of having a giant
component increases; on the contrary, bad reachability implies low connectivity,
and the giant component will not be present.
Consider a given node and study which nodes it can reach in 1 hop and in
2 hops:
. 1-hop neighbors: their number is the degree of the node;
. 2-hop neighbors: to compute their number, the degree distribution
of the 1-hop neighbors is required; in principle each node would be picked
with the same probability P_k, but 1-hop neighbors are not reached uniformly
at random: a node with a higher degree has a proportionally higher probability
of being reached, so the rule is k · P_k.
To understand this concept, consider a star topology in which the n nodes
are arranged as follows:
. the center has degree n - 1;
. the other n - 1 nodes have degree 1.
The degree distribution is therefore
\[ P_1 = \frac{n-1}{n}, \qquad P_{n-1} = \frac{1}{n} \]
(in the original bar plot the height of each bar is proportional to the fraction
of nodes with that degree, so that it is normalized). Starting from the center,
the degree perceived at its neighbors is 1, but starting from any other node the
degree perceived is n - 1, because the center is easy to reach.
If each node counts proportionally to its degree, the center counts
\[ (n-1)\,P_{n-1} \]
because it is reached many times, while any other node counts
\[ 1 \cdot P_1 \]
In conclusion, the distribution of the degree of a node reached from the
initial node is proportional to
\[ k\,P_k \]
Of course this is not a distribution, because it does not sum to 1. Moreover,
from a 1-hop neighbor the initial node itself is reachable and must not be
counted, therefore the number of new nodes reachable through a neighbor of
degree k is k - 1. It follows that the probability of reaching k new nodes
through a given neighbor is proportional to the probability that the neighbor
has degree k + 1:
\[ q_k \propto (k+1)\,P_{k+1} \]
To be a distribution it must be normalized:
\[ q_k = \frac{(k+1)\,P_{k+1}}{\sum_j j\,P_j} \]
The average of this distribution is:
\[ \mathrm{Avg}_q = \sum_{k=0}^{\infty} k\,q_k = \sum_{k=0}^{\infty} \frac{k\,(k+1)\,P_{k+1}}{\sum_j j\,P_j} \]
By substituting i = k + 1 (so that k = 0 corresponds to i = 1):
\[ \mathrm{Avg}_q = \frac{\sum_{i=1}^{\infty} P_i\, i\,(i-1)}{\sum_j j\,P_j} = \frac{\sum_{i=1}^{\infty} P_i\,(i^2 - i)}{\sum_j j\,P_j} \]
By splitting the numerator into two sums:
\[ \mathrm{Avg}_q = \frac{\sum_{i=1}^{\infty} P_i\, i^2 - \sum_{i=1}^{\infty} P_i\, i}{\sum_j j\,P_j} \]
Now:
. \sum_{i=1}^{\infty} P_i\, i^2 is the second moment <k^2>;
. \sum_{i=1}^{\infty} P_i\, i and \sum_j j\,P_j are the first moment (average) <k>.
Therefore:
\[ \mathrm{Avg}_q = \frac{<k^2> - <k>}{<k>} \]
This represents the average number of second-hop nodes discovered through a
single neighbor. So far only the 2-hop neighbors of one 1-hop neighbor of the
given node have been considered (the original picture highlighted these paths
in red). Of course the initial node has more neighbors, so to compute exactly
z_2, the average number of nodes reached in two hops, all of them have to be
considered: it is enough to multiply the quantity above by the number of
1-hop neighbors of the initial node, which is the degree <k> (also denoted z_1
to emphasize that it counts the 1-hop reachable neighbors):
\[ z_2 = \frac{<k^2> - <k>}{<k>}\; z_1 = \frac{<k^2> - <k>}{<k>}\,<k> \;=\; <k^2> - <k> \]
The formula shows how the number of reachable nodes grows: the dominant
term is <k^2>.
Example
If the distribution is Poisson (as in the Erdos-Renyi model), the variance is
equal to the mean value:
\[ <k> = <k^2> - (<k>)^2 \;\Longrightarrow\; <k^2> = (<k>)^2 + <k> \]
Therefore:
\[ z_2 = <k^2> - <k> = (<k>)^2 + <k> - <k> = (<k>)^2 \]
Starting from z_2, by iteration it is possible to show that:
\[ z_m = \frac{<k^2> - <k>}{<k>}\; z_{m-1} \]
Since:
. z_2 = <k^2> - <k>;
. z_1 = <k>;
the result is:
\[ z_m = \frac{z_2}{z_1}\, z_{m-1} = \left(\frac{z_2}{z_1}\right)^{m-1} z_1 \]
By analysing the ratio z_2 / z_1:
. if z_2 / z_1 < 1, when m grows (the distance grows) the number of newly
discovered nodes stays bounded, so there is bad connectivity: there is no
giant component;
. if z_2 / z_1 > 1, on the contrary, the number of reachable nodes keeps
growing and there is a giant component;
. if z_2 / z_1 = 1, there is the so-called critical condition: the behaviour is
difficult to study.
Example
Focusing on the Erdos-Renyi model, for which z_2 = (<k>)^2, the critical
condition is:
\[ \frac{z_2}{z_1} = 1 \;\Longrightarrow\; \frac{(<k>)^2}{<k>} = 1 \;\Longrightarrow\; <k> = 1 \]
The condition that leads to a giant component is therefore:
\[ <k> \; > 1 \]
Since z_2 / z_1 = <k>, it follows that:
\[ z_m = (<k>)^{m-1}\, z_1 = (<k>)^{m-1} <k> = (<k>)^m \]
which means that the process of discovering reachable nodes grows geometrically.
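A small sketch that, for an arbitrarily chosen degree distribution, computes z_1 and z_2 and checks the giant-component condition z_2 / z_1 > 1:

def moments(pk):
    """First and second moments of a degree distribution given as {k: P_k}."""
    m1 = sum(k * prob for k, prob in pk.items())
    m2 = sum(k * k * prob for k, prob in pk.items())
    return m1, m2

# Hypothetical degree distribution: 30% isolated nodes, the rest with degree 1 to 3.
pk = {0: 0.3, 1: 0.3, 2: 0.25, 3: 0.15}
m1, m2 = moments(pk)
z1, z2 = m1, m2 - m1
print(z1, z2, "giant component expected" if z2 / z1 > 1 else "no giant component")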
2.3.2 Small-world effect
This effect says that, in a network with a large number of users, the distance
between them is relatively small because some of the users are very well
connected.
Assuming:
\[ \frac{z_2}{z_1} \gg 1 \tag{2.1} \]
there is certainly a giant component, therefore the network is very well
connected. Now let m be the distance between a given node and any other:
each iteration (1, 2, ..., m) allows a very large number of new nodes to be
discovered, but it is the last iteration, the one that reaches nodes at distance
m, that discovers most of them. As a consequence, the count is dominated by
the last hop. If n is the number of nodes, when z_l \approx n the most distant
nodes are reached and, thanks to hypothesis (2.1), they can certainly be
reached. In formulas:
\[ z_l = \left(\frac{z_2}{z_1}\right)^{l-1} z_1 = n \]
By taking the logarithm:
\[ (l-1)\,\log\frac{z_2}{z_1} = \log\frac{n}{z_1} \;\Longrightarrow\; l - 1 = \frac{\log n/z_1}{\log z_2/z_1} \]
In conclusion:
\[ l = \frac{\log n/z_1}{\log z_2/z_1} + 1 \]
where l is the average distance inside the network, also called the diameter.
The parameter l grows as the logarithm of n: if the number of nodes is very
large, l does not grow too much, therefore the small-world effect is ensured.
It also means that randomly built graphs have short distances.
Since in the Erdos-Renyi model z_1 = <k> = z and z_2 = (<k>)^2 = z^2:
\[ l = \frac{\log n/z_1}{\log z} + 1 \;\approx\; \frac{\log n/z_1}{\log z} \;=\; \frac{\log n - \log z}{\log z} \;\approx\; \frac{\log n}{\log z} \]
This behavior also holds for tree topologies, while for regular structures:
. the ring has an average distance that grows linearly with n (it is about n/2);
. a grid topology with n^2 nodes has an average distance that grows with n,
i.e., with the square root of the number of nodes.
It means that regular structures have intrinsically worse performance because
they:
. have larger distances;
. are less robust to churn (maintenance is hard).
Example
Consider an average per-hop delay D = 0.2 s; in order not to exceed a
maximum average delay R = 1 s, the distance l must satisfy (the product
l · D is the average delay needed to reach the farthest node):
\[ l \cdot D < R \]
Using l \approx \log n / \log z:
\[ \frac{\log n}{\log z}\, D < R \;\Longrightarrow\; \log z > \frac{D}{R}\,\log n \]
Consider (base-10 logarithms):
. n = 10^4 \Rightarrow \log z > 4 \cdot 0.2 = 0.8 \Rightarrow z > 6.3;
. n = 10^6 \Rightarrow \log z > 6 \cdot 0.2 = 1.2 \Rightarrow z > 15.8.
It means that the required degree grows by a factor of about 2.5 (i.e., 10^{0.4})
every time the number of nodes grows by a factor of 100.
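The same computation as a short script; the delay values are those of the example, and base-10 logarithms are assumed, consistently with the numbers obtained above:

from math import log10

def min_degree(n, per_hop_delay, max_delay):
    """Smallest average degree z such that (log n / log z) * D < R."""
    return 10 ** (per_hop_delay / max_delay * log10(n))

for n in (10**4, 10**6):
    print(n, round(min_degree(n, per_hop_delay=0.2, max_delay=1.0), 1))
# -> 6.3 for n = 10^4 and 15.8 for n = 10^6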
Focusing on the critical condition, it is possible to say that:
\[ \frac{z_2}{z_1} = 1 \;\Longrightarrow\; z_2 = z_1 \]
Therefore:
\[ <k^2> - <k> = <k> \;\Longrightarrow\; <k^2> - 2<k> = 0 \]
which can be written as:
\[ \sum_{k=0}^{\infty} k\,(k-2)\,P_k = 0 \]
By analysing this expression, it is clear that the terms with k = 0, 1, 2 have
no effect on the final result (the occurrence of the giant component), because:
. terms with k = 0 are isolated nodes;
. in terms of reachability, k = 1 and k = 2 behave in the same way:
neither creates new branches (a degree-2 node just merges its two links
into a single longer one, while a degree-1 node is a dead end).
2.3.3 Clustering
The following analysis holds for any distribution, not only the Poisson one;
the clustering property measures the probability that two neighbors of a given
node are themselves neighbors. For this to happen, the link between the two
neighbors B and C of a node A must be established (in the original picture
this is the orange link B-C).
Therefore the clustering coefficient describes how much locality is introduced
into the network. Considering that:
. node B has degree k_i;
. node C has degree k_j,
the clustering coefficient is given by:
\[ c = \frac{<k_i>\,<k_j>}{n\,z} \]
where:
. the numerator represents all the ways in which the two nodes can be
connected;
. the denominator, given by the number of nodes n multiplied by the average
degree z, accounts for the links present in the network.
For the 1-hop neighbors the relevant degree distribution is q_k, and it is
independent for different nodes; therefore:
\[ c = \frac{<k_i>\,<k_j>}{n\,z} = \frac{1}{n\,z}\left(\frac{<k^2> - <k>}{<k>}\right)^2 \]
By multiplying and dividing by z^2:
\[ c = \frac{z}{n}\left(\frac{<k^2> - <k>}{(<k>)^2}\right)^2 \]
Now, the quantity (<k>)^2 is added and subtracted in the numerator:
\[ c = \frac{z}{n}\left(\frac{<k^2> - (<k>)^2 + (<k>)^2 - <k>}{(<k>)^2}\right)^2 \]
In this way it is possible to recognize the variance within the numerator.
Since the coefficient of variation is defined as:
\[ c_v = \frac{\sqrt{\mathrm{Var}}}{\text{avg}} = \frac{\sqrt{<k^2> - (<k>)^2}}{<k>} \]
its square appears inside the clustering coefficient:
\[ (c_v)^2 = \frac{<k^2> - (<k>)^2}{(<k>)^2} \]
Therefore:
\[ c = \frac{z}{n}\left((c_v)^2 + \frac{<k> - 1}{<k>}\right)^2 \]
Since the clustering coefficient depends on the square of the coefficient of
variation, the dominant contribution is the variance. In conclusion the
variance is extremely important: it ensures high connectivity and introduces
locality.
[Diagram: the variance determines both the presence of the giant component and the clustering coefficient.]
Example
Using these formulas for the Erdos-Renyi model:
\[ (c_v)^2 = \frac{\mathrm{Var}}{(<k>)^2} = \frac{<k>}{(<k>)^2} = \frac{1}{z} \]
Therefore:
\[ c = \frac{z}{n}\left(\frac{1}{z} + \frac{z-1}{z}\right)^2 = \frac{z}{n}\cdot 1 = \frac{n\,p}{n} = p \]
Indeed, p is the probability that two given nodes are connected by a link, so
it is also the clustering coefficient.
2.4 Heavy-Tailed Distribution
The heavy-tailed distribution (also called power law) is used to represent
phenomena like P2P systems, the topology of the Internet, how long a client
stays connected, and social networks: they all share the feature that their
distribution does not decrease exponentially, therefore they cannot be
represented by a Poisson distribution. It means that the probability of having
large values is not negligible; the distribution is:
\[ P_k \propto k^{-\alpha} \]
and \alpha typically takes values:
\[ 2 < \alpha < 3 \]
In mathematical terms, such a system has a finite average but infinite variance,
since the second moment diverges:
\[ \int k^{2}\, P_k \, dk \to \infty \]
This behavior is not really good, because both the small-world and the
clustering properties depend largely on the variance. However, the distribution
comes from measurements and the tail is typically difficult to estimate precisely.
Scale-free property
The scale-free property says that after rescaling the variable, k → a·k, the
shape of the distribution does not change (a power law is only multiplied by
a constant factor). Moreover, the mean value is not very representative of
the systems described above: think of connection times. There are a few users
with very long connection times, while the majority of users have short
connection times.
2.5 Watts-Strogatz model
This model represents a family of random graphs obtained as an intermediate
solution between pure random graphs and regular structures. This interpolation
provides the peculiar properties of both families:
. regular structures (lattices): notion of locality (clustering);
. random graphs: small-world effect.
Starting from a regular structure (a ring, for example), a Watts-Strogatz
graph is built by introducing randomness.
The connectivity of a given node (marked in blue in the original picture) is:
. m nodes in the clockwise direction;
. m nodes in the counter-clockwise direction.
Therefore, each node has degree 2m. In the plain ring the average distance
between nodes grows linearly with the number of nodes n: thanks to shortcuts
(as in Chord) it is possible to reduce it. Indeed, the process to obtain a
Watts-Strogatz graph is:
. for each node:
  . take each of its clockwise links;
  . rewire it randomly with probability p (or keep it with probability
    1 − p).
[Figure: the original ring lattice and the graph obtained after randomly rewiring some of its links.]
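A minimal sketch of this construction; the ring size, m and p below are arbitrary example values, and duplicate links produced by rewiring are simply discarded by the set:

import random

def watts_strogatz(n, m, p, seed=0):
    """Ring of n nodes, each linked to its m clockwise neighbors, then randomly rewired.

    Each clockwise link is kept with probability 1 - p or rewired with
    probability p to a uniformly chosen endpoint (avoiding self-loops)."""
    random.seed(seed)
    edges = set()
    for u in range(n):
        for step in range(1, m + 1):
            v = (u + step) % n
            if random.random() < p:                              # rewire this link
                v = random.choice([w for w in range(n) if w != u])
            edges.add((min(u, v), max(u, v)))
    return edges

print(len(watts_strogatz(n=20, m=2, p=0.1)), "links")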
The properties mentioned before (small-world effect and clustering) depend
on p:
. if it is large, the system tends to be a pure random graph (for p → 1 it
tends to an Erdos-Renyi graph);
. if it is small, the system tends to be a regular structure with high
clustering (long fixed routes to reach the farthest nodes).
2.5.1 Clustering analysis
When p = 0, the clustering coefficient is:
\[ c = \frac{3\,(m-1)}{2\,(2m-1)} \]
so it depends essentially on m and it is very high (a constant that does not
vanish as n grows, while for Erdos-Renyi it is of the order of 10^{-4} for
typical sizes). It means that the probability of two nodes being neighbors is
high if they have a common neighbor; in the picture of the original notes,
Indeed, look at the following picture:
the green nodes are neighbors and have a common neighbor: the blue node.
This behavior has to taken into account not just considering the degree of
68 CHAPTER 2. Random graphs
a node, but considering the degree for all of them: the result is a very high
locality.
When p > 0:
c =
3 (m1)
2 (2m1)
(1 p)
3
it means that when p increases, the connectivity based on locality decreases.
2.5.2 Small-world analysis
The small-world property describes the distance between nodes. In regular
structures the average distance depends on the number of nodes; for a grid:
. with 2 dimensions, it is O(n^{1/2});
. with 3 dimensions, it is O(n^{1/3}).
In general, for a d-dimensional lattice:
\[ l \sim O(n^{1/d}) \]
Consider the classic plot of clustering and average distance as functions of p:
in the region at the top left (small p) the graph behaves like a regular
structure, while the bottom right region (large p) describes random graphs. In
the centre there is a zone in which both the small-world property and
clustering are satisfied.
Considering the ring, by introducing a few shortcuts (few with respect to the
number of links) the small-world property starts to be ensured, because those
shortcuts connect very distant nodes. When the number of shortcuts increases,
their marginal benefit decreases: it is better to introduce few of them, use
them just to reach the farthest regions, and then use the locality connections
to reach the destination.
With shortcuts, the size of the regions obtained by splitting the ring is given by:
\[ \frac{n}{n\,p} = \frac{1}{p} \]
(a quantity that is linear in 1/p), where:
. n is the size of the space (number of nodes);
. n·p is the number of shortcuts introduced.
To ensure the small-world property:
\[ \frac{1}{p} < n \;\Longrightarrow\; p > \frac{1}{n} \]
If the network is large, the small-world property is therefore ensured even with
a small p.
To guarantee clustering (this condition comes from the (1 − p)^3 term in the
clustering coefficient):
\[ p < 1 \]
In conclusion, to have the small-world effect and clustering simultaneously,
it is necessary to have:
\[ \frac{1}{n} < p < 1 \]
This model has been largely used to model P2P systems: for example, in
P2P streaming systems too much locality leads to bad performance, because
a chunk takes a very long time to reach the entire network (so the delay
increases). To deal with this, neighbors are sometimes picked at random: this
can be seen as a shortcut. The same happens in BitTorrent: to diversify the
downloadable content, neighbors are not always selected based on the
tit-for-tat procedure, but are sometimes selected randomly.
2.6 Theory of evolving networks
This model deals with the evolution of the network: how the overlay
evolves in time. The algorithm is:
. define the final number of nodes n of the graph;
. start with m_0 nodes, where m_0 < n (m_0 is the initial condition);
. at each step a node is added: it takes n − m_0 steps to build the final
topology.
The time evolution is characterized by the fact that, at each step, nodes
have a different degree: depending on the policy adopted, the system can
evolve differently. The simplest policy is to attach new nodes to the ones
that have a higher degree: this helps to reach more nodes with shorter paths.
Definitions
. s: the time at which a node is introduced; it represents its age (older
nodes have more chances to be well connected);
. k_s: the degree of the node introduced at time s; it is described by a
differential equation and, to simplify the math, it is assumed to be a
continuous function k_s(t);
. m: the number of links of each new node.
The evolution of the system is described by:
\[ \frac{\partial k_s(t)}{\partial t} = m\,\Pi(k_s(t)) \tag{2.2} \]
The increase of the degree depends on the number of links added at each step
and, through \Pi, on the degree itself. The term \Pi(\cdot) is a function that
describes how new nodes connect to the already existing network: it is the
connection policy and can be considered the term that drives the system
evolution. At the beginning the degree is:
\[ k_s(s) = m \]
Barabasi-Albert criterion
This approach states that scale-free networks are built with a preferential
attachment criterion. The algorithm is:
. start with an initial graph;
. at each step a new node is attached with m links;
. links are preferentially attached to nodes based on their degree:
\[ \Pi(k_s(t)) = \frac{k_s(t)}{\sum_j k_j(t)} \tag{2.3} \]
The term \sum_j k_j(t) is a normalization coefficient: it accounts for the
total degree present in the network.
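A minimal simulation sketch of this preferential-attachment growth; the sizes below are arbitrary, and the degree-proportional choice is implemented by picking a uniformly random end of an existing link:

import random

def barabasi_albert(n, m, m0=None, seed=0):
    """Grow a graph by preferential attachment and return the degree of each node.

    Picking a uniformly random end of an existing link selects a node with
    probability proportional to its degree, i.e. Pi(k) = k / sum_j k_j."""
    random.seed(seed)
    m0 = m0 or m + 1
    degree = [m0 - 1] * m0                           # initial graph: a small clique
    link_ends = [u for u in range(m0) for _ in range(m0 - 1)]
    for new in range(m0, n):
        targets = set()
        while len(targets) < m:
            targets.add(random.choice(link_ends))    # degree-proportional choice
        degree.append(0)
        for t in targets:
            degree[new] += 1
            degree[t] += 1
            link_ends.extend([new, t])
    return degree

deg = barabasi_albert(n=5000, m=3)
print(max(deg), sorted(deg)[len(deg) // 2])   # a few large hubs, while the median degree stays small

The resulting degree sequence is strongly skewed: the oldest nodes become hubs while most nodes keep a degree close to m, in agreement with the power law derived below.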
By substituting (2.3) into (2.2), one obtains:
\[ \frac{\partial k_s(t)}{\partial t} = \frac{m\,k_s(t)}{2mt + 2m_0 <k_s>} \tag{2.4} \]
where:
. 2mt represents the link-ends already introduced in the network by the
growth process;
. 2m_0 <k_s> accounts for the degree initially present, m_0 being the size of
the initial graph and <k_s> the average degree at the beginning.
The denominator is, globally, the normalization coefficient seen in (2.3).
Equation (2.4) shows that at each step t the total degree increases by 2m,
because each of the m new links contributes to the degree of two different
nodes.
At the beginning:
\[ k_s(s) = m, \qquad <k_s> = 2m \]
For large t the solution of (2.4) is:
\[ k_s(t) \approx m\left(\frac{t}{s}\right)^{1/2} \quad \text{for } t \to \infty \]
This shows that the degree increases as a square root of time; the denominator
s identifies the node considered: the degree is higher if the node is older,
therefore it depends on the age of the node. Consider a node s' older than s,
with s' < s < t. The ratio is:
\[ \frac{k_{s'}(t)}{k_s(t)} \approx \left(\frac{s}{s'}\right)^{1/2} \]
Looking at large values of t, the degree distribution becomes:
\[ P_k = 2\,m^2\,k^{-3} \]
therefore the probability that a node has degree k follows a heavy-tailed
distribution: the scale-free property is ensured. As far as the small-world
property and the clustering are concerned:
\[ l \approx \frac{\log n}{\log \log n}, \qquad c = \frac{m}{8\,n}\,(\log n)^2 \]
The small-world property is expected because there are a few very well
connected nodes: the oldest ones. The clustering behaves similarly to the
Erdos-Renyi model, in that it decreases with the number of nodes.
2.7 Resume scheme

Model                        Small-world                            Clustering
ER                           l ≈ log n / log z                      c = p ∼ 1/n
RG with empirical distr.     l = log(n/z_1) / log(z_2/z_1) + 1      c = (z/n) ((c_v)^2 + (z-1)/z)^2
WS (p = 0)                   l ∼ n: not ensured                     c = 3(m-1) / (2(2m-1))
WS (p > 0)                   ensured                                high clustering
BA                           ensured                                low clustering
For random graphs with an empirical degree distribution, both the small-world
property and the clustering depend on the variance; with a power-law
distribution the scale-free property is ensured.
For Watts-Strogatz the value of p should be chosen as:
\[ \frac{1}{n} < p < 1 \]