
Peer-to-Peer Netw. Appl. DOI 10.1007/s12083-011-0108-4

A peer-to-peer system for on-demand sharing of capacity across network applications


Georgios Exarchakos & Nick Antonopoulos

Received: 14 March 2009 / Accepted: 24 August 2011 © Springer Science+Business Media, LLC 2011

Abstract As a plethora of distributed applications emerge, new computing platforms are necessary to support their extra and sometimes evolving requirements. This research derives its motive from deficiencies of real networked applications deployed on platforms unable to fully support their characteristics and proposes a network architecture to address that issue. Hoverlay is a system that enables the logical movement of nodes from one network to another, aiming to relieve requesting nodes that experience high workload. Node migration and a dynamic server overlay differentiate Hoverlay from Condor-based architectures, which exhibit more static links between managers and nodes. In this paper, we present a number of important extensions to the basic Hoverlay architecture, which collectively enhance the degree of control owners have over their nodes and the overall level of cooperation among servers. Furthermore, we carried out extensive simulations, which show that Hoverlay outperforms Condor and Flock of Condors in both success rate and average successful query path length at a negligible increase in messages.

Keywords Peer-to-peer · Computational resource sharing · Resource migration · Keyword-based control

1 Introduction

Heterogeneous distributed applications deployed on different networks may have quite variable network throughput requirements during their lifetime. We define the Network Capacity as the number of user queries a node can process within a time unit; it depends on the combined communication and computation throughput of that node. If the application's workload exceeds the available network capacity (overloaded situation), new nodes are required to join the network to serve the demand. Conversely, when the application produces traffic that fewer nodes could efficiently serve (underloaded situation), the network application may free some of them and increase the utilization of the remaining nodes. Hoverlay is a P2P management system that enables the sharing of reusable resources, specifically network capacity. Network capacity is a non-replicable, reusable, stochastically available resource: only one instance of it may exist within a network and no more than one user may use it at a time [1]. This architecture facilitates the cooperation of heterogeneous networks to improve the utilization of the spare (currently unused) capacity in the whole system. The overlay consists of a set of interconnected servers, each of which represents the nodes of an underlying network. In this paper we extend the CSOA model presented in [2] with a detailed presentation of the resource matching functionality of servers and with a mechanism for improved server collaboration based on keyword-driven descriptions of underlying applications. Moreover, this paper builds on top of [2] with an extensive evaluation section comparing Hoverlay with competitive systems. That model, CSOA, was named after the initials of a phrase describing

G. Exarchakos · N. Antonopoulos
University of Surrey, Guildford, Surrey GU2 7XH, UK
N. Antonopoulos e-mail: n.antonopoulos@surrey.ac.uk

G. Exarchakos (*)
Electrical Engineering, Eindhoven University of Technology, Eindhoven 5612 AZ, The Netherlands
e-mail: g.exarchakos@tue.nl


its functionality. Now, the same model is identified with a unique name: Hoverlay. The main contributions of this work are:

- A distributed cooperation mechanism among free resources and requestors in a multi-domain network. Hoverlay relies on resource migration; hence, control over resource mobility becomes important. The proposed mechanism discourages closed-group formations via rewiring.
- An extensive experimental study of the behaviour of Hoverlay against its more static counterpart, Flock of Condors. We claim that resource migration can significantly improve the success rate and discovery latency of a distributed environment, especially in cases of flash crowds at certain areas.

The paper has the following structure. After a literature review (Section 2) that justifies the scope of this work, we give a brief overview of Hoverlay (Section 3) and continue with the cooperation mechanism in Section 4. Sections 5, 6 and 7 deal with the setup of the experiments and their results in two scenarios respectively. Conclusions and future work close the paper in Section 8.

2 Related work

Any centralized approach to interconnecting the servers of all underlying networks would suffer from the high workload caused by frequent queries and advertisements of the requested and available capacity respectively. The high rates of leave/join actions of servers and nodes would cause significant extra update overhead, and a failure of the central manager would make any capacity sharing impossible. Adopting a more distributed approach using P2P networks solves the single-point-of-failure and reliability problems of the centralised one. Replication of resources [3, 4] may increase the throughput performance of the overlay network since it increases the availability of the same resource [5]. Advertisement [6] or gossiping may direct the query faster to the resource provider, thus reducing the latency. DHT-based P2P systems such as CAN [7], Chord [8], Pastry [9] and Tapestry [10] can guarantee successful discovery, if the requested resource is available in the network, within O(log n) messages [11]. However, replication, gossiping/advertisement and informed resource discovery techniques of P2P networks are not applicable to the discovery of network capacity, since it is a non-replicable, reusable resource whose availability fluctuates highly. Organizing the overlay servers in a structured P2P network would require the use of its lookup function each time a node joins the system, resulting in a high maintenance cost. Existing research on high throughput computing has produced several solutions to the issue of reusable resource

discovery, especially storage capacity and CPU cycles. Condor [12, 13] is one of the most mature high throughput computing technologies. It is a distributed job scheduler providing the appropriate services for submission and execution of jobs on remote idle resources, even if they are under different administrative domains. A central manager receives the advertisements of the available resources and tries to submit the queued jobs to the appropriate ones, which report back to the manager the execution state of each job. The central manager along with the idle resources constitutes the Condor pool. Flocking [14, 15] was introduced to statically link several Condor pools and share resources between them. Manual configuration of a pool's neighbours is required, thus limiting the adaptivity of the system in case of dynamic changes in resource availability. It is also assumed that the pool managers run on reliable machines [16], since their failure can prevent the execution of new jobs. These problems can be approached with a Pastry-based [9] self-organizing overlay of Condor pools [16]. The Condor pools are organized into a ring and proactively advertise their idle resources to their neighbours so that they can choose these advertised resources whenever necessary. Unfortunately, this P2P-based flock of Condors requires a substantial maintenance overhead for updating the proximity-aware routing tables, since it is based on advertisements of available resources. If the availability of the resources changes very frequently, these updates need to be equally frequent, leading to high maintenance costs. Finally, an important feature of Condor flocking is that the execution machines are always managed by the same managers. Thus, every new discovery of the same or similar remote resources by the same manager follows the same procedure. Given that the required network capacity could frequently exceed the locally available one, the local manager would forward equally frequent queries seeking almost the same amount of capacity, resulting in a significant number of messages. The benefits of P2P overlays [11, 17] for the discovery of reusable resources have been identified and used in P-Grid. P-Grid, recognising the update overhead posed by advertisements of available resources on DHT-based overlays, uses a tree-based distributed storage system of the requesting resources' advertisements [18]. The resource providers locate in this tree the requestors they can serve and offer themselves for use. While other structured P2P networks hash the indexing keys, thus limiting the searching capabilities, P-Grid enables complex queries. The organization of this overlay raises a number of concerns about its scalability in case of large, highly dynamic networks, since an update of one advertisement could propagate to many peers. Sensor networks are another field that uses the benefits of P2P networks to achieve reliable cooperation of networked


sensors. Recent research on P2P-based end-to-end bandwidth allocation [19] proposes a wireless unstructured overlay of sensors. Initially a central peer possesses all the bandwidth and distributes it on demand, and every query is broadcast to all peers. This system cannot be applied to the case of network capacity sharing since it assumes that the available bandwidth within the whole network is known a priori and that the topology of the network remains the same. Finally, it is suitable for wireless environments where the cost of broadcasting is the same as that of unicasting. All the systems described above are efficient in the context they were developed for, but they are insufficient in the context of network capacity. Network characteristics may change extremely fast, so any advertisement and/or indexing scheme could result in frequent updates with a high cost in messages.

3 Architecture overview

In this section we provide a brief summary of the Hoverlay functionality; for more details please see [2]. A Hoverlay server uses only a random list of other server IDs (Neighbour List) to share its own resources and to discover new ones in the overlay. Whenever necessary, a local (requesting) server forwards queries originating from underlying nodes (internal queries) to its neighbours and waits for an answer. A Node Pool, embedded in every server, keeps records of available underutilized nodes and tries to satisfy an internal query using that capacity, reserving as much as possible. Any extra amount of capacity, not provided by the local pool, is queried from neighbouring servers. Each request has a lifetime, which is the maximum time a requesting server may wait for answers from the overlay. A query terminates if its lifetime expires or an answer is received. In the case of an external query (sent by another overlay server), servers try to satisfy it completely using capacity in their local Node Pool only. If there is sufficient local capacity, the pool reserves a subset of the available resources that collectively represent at least as much capacity as the query requires and initiates a handshaking protocol to deliver those resources to the query originator server. Otherwise (not enough capacity available), it forwards the same query to its server neighbours without reserving any resources.

Fig. 1 Hoverlay system architecture and components

Figure 1 illustrates the main Hoverlay components. All three server components (Neighbour List, Node Pool and Query Processor) handle incoming queries and answers. The Underlying Network Relocator resides in every underlying node, accomplishing its logical movement as well as monitoring its workload:

- Neighbour List (NL) manages a server's direct links to neighbouring servers and determines the next destinations of a forwarded query. It applies the server forwarding policy (if flooding, all neighbours are selected; a subset otherwise). As stated in the previous sections, the deployment of informed techniques is not an efficient approach to service discovery in a highly intermittent environment such as Hoverlay; thus, at the implementation level, the servers use blind search methods. The proposed architecture is designed to operate under no centralized monitoring layer or neighbour-list updating scheme; though there is no guarantee of good direct links between servers, the maintenance cost of these lists stays minimal. This may improve system applicability in large-scale networks. A server's neighbour list is refreshed upon receiving an answer, replacing the oldest neighbour with the answer originator server. Thus, the answer rate drives the rewiring rate; if most of the system's resources are busy, answers and, thus, rewirings are rare. Symmetrically, low-loaded environments generate few queries and trigger even fewer rewirings. The neighbour lists get refreshed in situations with normal load. These frequent updates help keep the overlay connected.
- Node Pool (NP) keeps a record of free nodes until they are reused. These records primarily focus on access and commission details. Internal queries may reserve any amount of available capacity, whereas external ones can only reserve capacity that fully satisfies their requirements. Node Pools keep some of the available capacity (safety capacity) for use by underlying busy nodes only. No external query is satisfied if the capacity available in a Node Pool is lower than its safety level. Safety capacity is used to serve only internal queries produced by small fluctuations of the workload of underlying nodes; it prevents a large number of queries from being forwarded to the overlay. The safety capacity size is a percentage of the average capacity requested by the underlying nodes within the last few time units (time frame). This percentage and time frame are application specific and are configured by server administrators. An implementation of Hoverlay may use any kind of well-established resource specification format (e.g. Condor ClassAds [20]) adopted by all servers. (A sketch of this reservation logic is given after this list.)
- Query Processor (QP) processes any incoming query and performs all the communication activities of a server. It caches any internal or external query for a given period of time and interacts with the Node Pool to satisfy it if possible, responding back, or forwarding it to neighbouring servers otherwise. In the case of internal queries, it waits for the answers. As soon as an answer is received, it merges the discovered capacity with that reserved in the local Node Pool, if any, and acknowledges back.
- Underlying Network Relocator (UNR) resides in underlying network nodes and is responsible for controlling node relocations from one network to another. It is used by the remote servers that nodes migrate to.
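To make the reservation rules above concrete, the following is a minimal Python sketch of how a Node Pool might serve internal and external queries, including the safety-capacity guard. The class, field and method names (NodePool, reserve_internal, reserve_external) are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class FreeNode:
    node_id: str
    capacity: int                           # spare capacity units offered by this node
    kel: set = field(default_factory=set)   # Keyword Exclusion List (see Section 4)

class NodePool:
    def __init__(self, safety_capacity=0):
        self.nodes = []                     # free nodes registered with this server
        self.safety_capacity = safety_capacity

    def free_capacity(self):
        return sum(n.capacity for n in self.nodes)

    def reserve_internal(self, requested):
        """Internal queries may take any amount of available capacity, including a
        partial amount; the remainder is then sought in the overlay."""
        reserved, still_needed = [], requested
        for node in sorted(self.nodes, key=lambda n: -n.capacity):
            if still_needed <= 0:
                break
            reserved.append(node)
            still_needed -= node.capacity
        for node in reserved:
            self.nodes.remove(node)
        return reserved, max(still_needed, 0)   # remainder to query the overlay for

    def reserve_external(self, requested):
        """External queries are served only if they can be fully satisfied without
        dipping below the safety-capacity level; otherwise they are forwarded."""
        if self.free_capacity() - requested < self.safety_capacity:
            return None
        reserved, _ = self.reserve_internal(requested)
        return reserved
```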

4 Keyword-driven resource selection

Unlike Condor, which practically relies on resource availability and reliability, Hoverlay is by definition a system to be deployed in highly dynamic environments. This paper introduces a mechanism based on the use of keywords to enhance the collaboration of nodes by improving the success rate and resource utilisation of the system without creating closed groups that may fragment the topology. The aim of the following mechanism is not to shorten the paths between requestor and provider but rather to ensure that the most suitable resources are well visible to nodes. The next paragraphs present this mechanism accompanied by a worked example.

4.1 Keyword components

Answer refinement is necessary in the context of P2P networks that are dedicated to specific applications. Not all discovered nodes are able to participate in an underlying network; even if they are able to provide the requested resources, they may be inappropriate to take over a specific task. Therefore, keywords help to specify any special requirement of the requestor. These keywords are simple words or (key, value) pairs.

The three keyword containers of the model are the Keyword List (KL), the Popular Keyword List (PKL) and the Keyword Exclusion List (KEL). Every node has a Keyword Exclusion List to specify the traffic it is not willing to serve (i.e. incompatible applications). Every server uses a Keyword List denoting the applications its underlying network runs; queries originating from that server carry its Keyword List. The same server also collects the most frequent keywords of the queries it receives in its Popular Keyword List. These keywords are the most frequently occurring keywords in the Keyword Lists of servers within its vicinity. The pool of a server cannot contain a node whose Keyword Exclusion List has a keyword in common with the Keyword List of the server. This ensures that each pool holds only nodes that can serve traffic of the underlying network. Each query carries the Keyword List of its originating server. When a server receives a query, it tries to find nodes in its local pool whose Keyword Exclusion Lists have no common keywords with the query's Keyword List. This guarantees that the nodes found will be useful in the context of the requesting server and thus compatible with the application running in the requestor's underlying network.

4.2 Query matching

Hoverlay is to be deployed in highly dynamic environments to support primarily the traffic needs of networked applications. It does not assume guarantees of resource availability and therefore cannot fully rely on the service capacity of registered nodes. This makes kbps and MHz two feasible parameters of their capacity. Upon receiving a query, a server starts the matching process on its Node Pool. Based on the S field of the query (see [2]), it tries to discover nodes that cumulatively or individually satisfy those capacity parameters. Though the required network throughput can easily be compared against that offered by free nodes, CPU speed and usage comparisons in heterogeneous environments are more difficult. Hoverlay assumes that multiplying the CPU usage by its clock speed gives a rough estimate of the required processing capacity. More detailed comparisons need information on both the hardware (e.g. cache, CPU architecture, MIPS, memory speed, I/O latency) and software (e.g. language or executable, compiler version, operating system) environments of nodes. Such matchmaking is well documented and used by Condor, which, however, differs in purpose; Condor focuses on job completion whereas the approach proposed here focuses on traffic and enables nodes to request extra capacity if necessary. CompuP2P [21], a computational resource sharing paradigm deployed on dynamic P2P networks, also uses cycles/second to represent processor capacity. A small sketch of this keyword and capacity matching appears below.
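As an illustration of the matching rule just described, here is a minimal Python sketch. It assumes the query carries a required network throughput (kbps), a required CPU share and clock speed (MHz), and the originator's Keyword List; the function and field names are illustrative, not taken from the paper.

```python
def keyword_compatible(node_kel, query_kl):
    """A node is usable only if its Keyword Exclusion List shares
    no keyword with the query's Keyword List."""
    return len(set(node_kel) & set(query_kl)) == 0

def capacity_matches(node, required_kbps, required_cpu_share, required_mhz):
    """Rough matching rule of Section 4.2: compare network throughput directly
    and approximate processing capacity as CPU usage x clock speed."""
    offered_processing = node["cpu_share"] * node["mhz"]
    required_processing = required_cpu_share * required_mhz
    return (node["kbps"] >= required_kbps and
            offered_processing >= required_processing)

# Example values loosely following the matchmaking scenario given below
# (a free node resembling Node C versus the requestor's query); the KEL is illustrative.
node_c = {"kbps": 200, "cpu_share": 0.70, "mhz": 1000, "kel": {"a", "c", "e"}}
query = {"kbps": 100, "cpu_share": 0.20, "mhz": 2000, "kl": {"d", "f"}}

if keyword_compatible(node_c["kel"], query["kl"]) and \
   capacity_matches(node_c, query["kbps"], query["cpu_share"], query["mhz"]):
    print("the free node can migrate to the requestor's network")
```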


The following scenario demonstrates a simple matchmaking process on a server with two free nodes in its pool. There are three nodes involved in this interaction: the requestor and the two available ones. Their configuration is as follows:

1. Node A (requestor): Its network capacity is NC = 100 kbps and processor capacity PC = {50%, 2,000 MHz}; its overload threshold is NC = 100 kbps and PC = {40%, 2,000 MHz} for the network and processor respectively. Assuming that its load is NC = 100 kbps and PC = {60%, 2,000 MHz}, it seeks resources satisfying NC = 100 kbps and PC = {20%, 2,000 MHz}. That is, the query that reaches the server hosting nodes B and C looks for a set of nodes each of which has a minimum network throughput of 100 kbps and which collectively provide {40%, 2,000 MHz} of processor capacity.
2. Node B (free): Its overload threshold published in its server's Pool is NC = 20 kbps and PC = {20%, 2,000 MHz}. This node cannot match the requirements as its network throughput is well below the requested one.
3. Node C (free): Its overload threshold published in its server's Pool is NC = 200 kbps and PC = {70%, 1,000 MHz}. Its network throughput is enough to cover A's requirements. The product of its CPU usage with clock speed is bigger than that of the query, that is, 70%*1,000 MHz > 20%*2,000 MHz. Therefore, Node C migrates to the requesting node's underlying network to take on part of the requestor's workload.

4.3 Answer and safety capacity selection

In this section we introduce two contributions regarding the answer selection and the filtering of nodes in a server's pool for filling its safety capacity. That is, we deploy keyword-based heuristics to select one among multiple concurrently received answers and to assign nodes from the Node Pool to the safety capacity container.

While the system retains its principle of using the first received answer to each query, this keyword-driven collaboration changes a server's behaviour in case of multiple answers arriving in the same time-unit. In brief, a server chooses the one providing nodes compatible with the most applications of servers in its vicinity, as detailed below. This is a self-less approach, as each server tries to collect nodes useful to other servers, maximising its contribution to the network. To achieve that, each server does the following:

- It calculates the union of the KELs of all nodes in every answer it receives in the same time-unit referring to the same query. Let us assume that a server received N answers, labelled j for 1 ≤ j ≤ N, and that the j-th answer carries a set of R_j discovered nodes. The Keyword Exclusion List of that answer is the union of the exclusion keywords (KEL_n) of each node: KEL_j = ∪_{x=1..R_j} KEL_{n,x}.
- It selects the answer with the minimum intersection of that union (see above) with the server's PKL. Among all answers received in the same time-unit for the same query, a server accepts the answer A such that |KEL_A ∩ PKL| = min_{1 ≤ k ≤ N} |KEL_k ∩ PKL|. (A small sketch of this selection appears below.)
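A minimal Python sketch of this answer-selection rule, assuming each answer is represented as a list of per-node Keyword Exclusion Lists and the server's PKL as a set of keywords; all names and the example values are illustrative:

```python
def answer_kel(answer):
    """Union of the Keyword Exclusion Lists of all nodes carried by an answer."""
    union = set()
    for node_kel in answer:
        union |= set(node_kel)
    return union

def select_answer(answers, pkl):
    """Accept the answer whose combined KEL overlaps least with the server's
    Popular Keyword List, i.e. whose nodes exclude the fewest popular applications."""
    return min(answers, key=lambda a: len(answer_kel(a) & set(pkl)))

# Example: two answers received in the same time-unit for the same query.
a1 = [{"b", "e", "g"}]            # one node
a2 = [{"c", "h"}]                 # one node with a disjoint exclusion list
pkl = {"b", "g"}
chosen = select_answer([a1, a2], pkl)   # a2: empty intersection with the PKL
```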

While servers tend to collect resources most useful for the whole network, they try to keep the least useful ones to fill their safety capacity. Every node arriving in a server's domain is compatible with the underlying application. However, after calculations similar to the above, a server may detect the nodes that cannot serve the traffic of most of the underlying networks in its vicinity. Such nodes are the ones with the maximum intersection of their KEL with the server's PKL. The following pseudocode provides more details on the safety capacity functionality. Assume that SF is the current set of nodes comprising the server's safety capacity, TargetSF is the level of safety capacity that the server aims for, and Pool is the set of nodes in its pool:
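A minimal Python sketch of this selection under the stated definitions of SF, TargetSF and Pool, assuming each node exposes a KEL and a capacity value; the helper names are illustrative and do not reproduce the paper's pseudocode figure:

```python
def refill_safety_capacity(pool, sf, target_sf, pkl):
    """Move the nodes least useful to the vicinity (largest |KEL ∩ PKL|) into the
    safety-capacity set SF until its total capacity reaches TargetSF; every other
    node stays in the pool, available for on-demand migration."""
    candidates = list(pool) + list(sf)
    # Least useful first: larger overlap with the popular keywords of the vicinity.
    candidates.sort(key=lambda n: len(set(n["kel"]) & set(pkl)), reverse=True)
    new_sf, level = [], 0
    for node in candidates:
        if level >= target_sf:
            break
        new_sf.append(node)
        level += node["capacity"]
    kept = {id(n) for n in new_sf}
    new_pool = [n for n in candidates if id(n) not in kept]
    return new_pool, new_sf
```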


The above algorithm runs on every action on the pool. That is, when a new node arrives in the pool or is removed from it, the server tries to optimize its safety capacity by running that algorithm.

4.4 Worked example

Without loss of generality, the following example demonstrates a simplified interaction scenario involving the exchange of the keyword-based components explained before. The example consists of 5 servers with two neighbours each. Let us also assume that queries are forwarded to all of a server's neighbours and travel a maximum of two hops. The frequency of the received keywords appears in the PKL inside a parenthesis next to each keyword. For simplicity, this example omits the query matching and focuses on the functionality of the keyword components. Table 1 describes the initial configuration of the servers.

Each query carries the KL of its originator. Assuming that server D sends a query to its neighbours A and E carrying its KL, the query will be: Q1 = {D→A&E, (d,f)}. This query triggers a sequence of actions within the system:

1. Q1 reaches A: Server A already has keyword {d} in its PKL, whose frequency increases by 1; keyword {f} is also added. Then, the query matching process finds node n1 in the pool which, assumedly, can satisfy the capacity requirements of Q1, but the intersection of its KEL with the query's KL is non-empty: KEL_n1 ∩ KL_Q1 = {d}. Thus server A cannot help with an answer and prepares Q2 to be sent to its neighbours D and E, asking for the same capacity and with the same KL. However, one of the neighbours is the query originator and thus Q2 goes only to E: Q2 = {A→E, (d,f)}. At the end of this interaction server A is: A = {(D,E), (f), (b(3),d(2),c(1),f(1)), (n1)}.
2. Q1 reaches E: Following the same principles, E has to update its PKL and check its Node Pool for resource availability. Though the intersection of node n2's KEL with the query's KL is empty, KEL_n2 ∩ KL_Q1 = {}, assuming that this node cannot cover the required capacity, server E prepares Q3 to send to its neighbours B and C: Q3 = {E→B&C, (d,f)}. Server E is now: E = {(B,C), (c,j), (f(5),d(1)), (n2)}.
3. Q2 reaches E: Server E has already processed the same query and thus rejects it without any further processing.
4. Q3 reaches B: Server B updates its PKL and checks its Node Pool for resource availability. Node n3 cannot help as KEL_n3 ∩ KL_Q3 = {f}, but assuming that n4 offers enough capacity to serve the requirements, it is also a compatible solution as KEL_n4 ∩ KL_Q3 = {}. Thus, server B prepares answer A1 = {B→D, (b,e,g)}. Server B is now: B = {(A,C), (a,g), (c(2),d(1),e(1),f(1)), (n3, n4)}.
5. Q3 reaches C: Server C has three nodes in its Node Pool; the KELs of two of them have an empty intersection with the query's KL: KEL_n5 ∩ KL_Q3 = {}, KEL_n6 ∩ KL_Q3 = {d} and KEL_n7 ∩ KL_Q3 = {}. Assuming that neither n5 nor n7 alone is able to satisfy the requirements, server C prepares an answer including both nodes, as their collective capacity covers the required one: A2 = {C→D, ((a,b),(a,c,e))}. Server C is now: C = {(A,D), (b,h), (a(2),d(1),f(1)), (n5, n6, n7)}.
6. A1 and A2 reach D: Server D receives both answers in the same time unit. Therefore, it has to select the one with the least popular keywords in the KELs of the discovered nodes: KEL_n4 ∩ PKL_D = {b,g} for A1 and (KEL_n5 ∪ KEL_n7) ∩ PKL_D = {b} for A2. Answer A1 offers resources that are more compatible with servers in D's vicinity; thus, A2 is rejected. Both answer originators are then notified of server D's decision.

The condition of all servers after the actions above appears in Table 2. Node n4 was moved from server B to server D to help an overloaded underlying node. Assuming that n4 is released after some time and that the configuration of server D has stayed unchanged, the server has to decide which of the nodes n4 and n8 to keep as safety capacity. Calculating the intersections KEL_n4 ∩ PKL_D = {b,g} and KEL_n8 ∩ PKL_D = {g}, it appears that n4 is more useful to servers in its vicinity. Thus, assuming that n8 provides enough safety capacity for that server, n4 stays in the Node Pool for on-demand migration. This way, servers in a dynamic environment of non-replicable reusable resources (e.g.

Table 1 Initial configuration of the worked example

Server  Neighbours  KL    PKL              Node pool
A       D,E         f     b(3),c(1),d(1)   n1 = {d,e}
B       A,C         a,g   c(2),e(1)        n3 = {a,c,f}, n4 = {b,e,g}
C       A,D         b,h   a(2)             n5 = {a,b}, n6 = {b,d}, n7 = {a,c,e}
D       A,E         d,f   b(2),g(2)        n8 = {a,g}
E       B,C         c,j   f(4)             n2 = {a,c}

Table 2 Final configuration of the worked example

Server  Neighbours  KL    PKL                   Node pool
A       D,E         f     b(3),d(2),c(1),f(1)   n1 = {d,e}
B       A,C         a,g   c(2),d(1),e(1),f(1)   n3 = {a,c,f}
C       A,D         b,h   a(2),d(1),f(1)        n5 = {a,b}, n6 = {b,d}, n7 = {a,c,e}
D       A,E         d,f   b(2),g(2)             n8 = {a,g}
E       B,C         c,j   f(5),d(1)             n2 = {a,c}

Hoverlay) realise an altruistic approach to sharing resources, decoupling the connectivity of servers from the semantic layer.

5 Experiments and evaluation

With regard to the principal concepts and functionalities of Hoverlay, the following set of experiments serves as a proof of concept and as a basis for a detailed evaluation. At the heart of Hoverlay is an unstructured P2P network of servers (pools of resources) designed to support on-demand resource migration between networks. Both this architectural element (dynamic P2P overlay of pools) and the sharing technique (resource migration) are the two main differences compared to other architectures. Existing competitive resource sharing systems appropriate to serve as benchmarks are listed below:

- Condor: a local pool (manager) that facilitates resource sharing within an individual network (centralised architecture). Experimentation with Condor systems may provide useful material for evaluating possible costs (e.g. traffic, latency) introduced by Hoverlay, as it proposes an arbitrary interconnection of similar systems.
- Flock of Condors: this category represents Condor-like systems with managers interconnected via an unstructured P2P overlay. They basically assume static links between pools and no mobility of resources, as opposed to Hoverlay. This family of systems is the closest paradigm to Hoverlay.


Hoverlay, via its dynamic unstructured P2P overlay of servers, supports resource volatility and, via resource migration, aims at reducing discovery latency. All experiments below use a set of performance metrics against which these three resource sharing paradigms (Condors, Flock of Condors and Hoverlay) are assessed:

- Success rate: percentage of successful queries over the total number of those generated in the system. Due to workload fluctuations, in certain cases answers delivered to requesting nodes may be unnecessary, as their load has fallen to normal levels while waiting for a response; such cases are not excluded from that percentage. This metric serves as an indication of Hoverlay's efficiency in finding the capacity required by overloaded underlying networks.
- Hops per answered query: query latency represents the elapsed time from query generation till answer delivery to the requesting node. However, it depends on several factors such as connection speeds, processing power and memory of intermediate servers on query paths. As these factors are difficult to predict, the path length (hops) of a successful query can be used to estimate its latency, without loss of generality assuming that all hops are temporally equal. This metric corresponds to the average path length of successful queries from their requesting nodes to the first provider server. Late responses that a server may receive for an already satisfied query do not contribute to it.
- Satisfied user queries: number of additional user queries (requested capacity) that overloaded networks have managed to satisfy with extra capacity discovered via a given overlay. If an underlying node gets a response with its requested capacity, that system has made it possible for these user queries to be processed successfully.
- Messages: number of messages (traffic) produced in the system by any operation (registrations, queries, answers and acknowledgements).
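A minimal sketch of how such per-run metrics could be accumulated, assuming the simulator reports events for generated queries, first answers (with hop counts) and messages; all names are illustrative:

```python
class Metrics:
    """Accumulates the four evaluation metrics over one simulation run."""
    def __init__(self):
        self.generated = 0           # queries generated
        self.successful = 0          # queries whose first answer arrived in time
        self.hops = []               # path length of each successful query
        self.satisfied_capacity = 0  # requested capacity actually served
        self.messages = 0            # registrations, queries, answers, acks

    def on_query(self):
        self.generated += 1

    def on_message(self):
        self.messages += 1

    def on_first_answer(self, hop_count, capacity):
        # Late answers to an already satisfied query are ignored by the caller.
        self.successful += 1
        self.hops.append(hop_count)
        self.satisfied_capacity += capacity

    def success_rate(self):
        return self.successful / self.generated if self.generated else 0.0

    def avg_hops_per_answered_query(self):
        return sum(self.hops) / len(self.hops) if self.hops else 0.0
```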

5.1 Simulation practices

For these evaluation purposes, a C++ object-oriented simulator (called Omeosis) was developed. It simulates time as a sequence of timeslots during which any message (query, answer, registration and acknowledgement) may travel for a single hop only, and it ensures their concurrent processing and propagation. It assumes that no connection introduces any extra delay; thus, any message produced during a timeslot reaches its next destination on the following timeslot. Timeslots are equivalent to iterations of the main loop. Therefore, every iteration executes three phases:

1. Set global workload: add or remove workload on a random subset of underlying nodes. Each node of this subset takes on a chunk of that workload (w) defined as


a random integer within w ∈ {x : 1 ≤ x ≤ max(c)}, where max(c) corresponds to the maximum capacity a node can have. This chunk distribution finishes as soon as the whole additional workload of this iteration is consumed; this determines the size of that subset. Thus, there is a non-negligible probability that a) a node takes on two or more chunks, b) multiple underlying nodes of a single server take on at least one chunk, or even c) no underlying node of a server takes on any chunk.
2. Generate queries: once a message reaches a node or server, another appropriate one (if necessary) is queued in its output buffer for delivery on the following timeslot. Apart from the output buffer, messages are also stored in caches with the appropriate expiry time. If an underlying node is still overloaded, as soon as the waiting time of its cached query expires a new one with the same requirements is generated and again cached for an exponentially increasing period (i.e. TTL + 2^(repetitions+1)). Therefore, message queues in output buffers follow the order of the incoming ones. Exponential increase of the waiting time before a retry is a usual practice in request submission to networks and helps them avoid bursts of requests, especially during high-load situations.
3. Send produced messages: message delivery also triggers their processing upon reaching their destination. This processing may result in more messages, generated as replies to or forwarding of the received ones; these messages are to be delivered on the following time-unit. While both phases above may be executed in parallel or sequentially in any order, this one has to follow both. This is only an implementation requirement, as all the messages produced by the previous phases can be sent out with one pass per node.

Every network component (server or underlying node) incorporates a set of modules which facilitate communication (input and output buffers), message caching, time event handling (reactions to time progress such as cache cleaning and message regeneration) and message processing. Servers use a pool, which can reserve resources upon request, free resources if a response is not accepted, or otherwise release them. Apart from time events, underlying nodes react to workload changes, too. An increase of their load may trigger the suitable query generation module. Symmetrically, a drop in their load may force the rejection of pending queries in their cache. All experiments used the same initial configuration of servers and nodes, achieved by choosing the same parameters and seed for the random number generator used throughout. Both the server overlay size and the number of underlying nodes are user inputs and remain fixed during each experiment. The simulator first creates the server overlay and carries on with the underlying nodes. As soon as it generates a server, it populates its Neighbour List with a
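A minimal Python sketch of this three-phase timeslot loop, including the exponential back-off for repeated queries; the class and function names are illustrative and do not mirror the Omeosis code:

```python
import random

class Node:
    """Minimal underlying-node stand-in for the simulation skeleton."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.load = 0
        self.repetitions = 0
        self.wait = 0                  # timeslots until the cached query expires

    def overloaded(self):
        return self.load > self.capacity

def set_global_workload(nodes, delta, max_capacity):
    """Phase 1: spread the per-timeslot workload change over random nodes in chunks."""
    remaining, sign = abs(delta), (1 if delta >= 0 else -1)
    while remaining > 0:
        node = random.choice(nodes)
        chunk = min(random.randint(1, max_capacity), remaining)
        node.load = max(0, node.load + sign * chunk)   # a node may be hit repeatedly
        remaining -= chunk

def generate_queries(nodes, ttl=7):
    """Phase 2: overloaded nodes whose cached query expired re-issue it
    and back off exponentially before the next retry."""
    queries = []
    for node in nodes:
        if node.overloaded() and node.wait <= 0:
            node.repetitions += 1
            node.wait = ttl + 2 ** (node.repetitions + 1)
            queries.append(("query", node, node.load - node.capacity))
        else:
            node.wait -= 1
    return queries

# Phase 3 (send produced messages) would flush the output buffers filled in phase 2;
# one simulated timeslot applies the three phases in this order.
nodes = [Node(capacity=random.randint(5, 10)) for _ in range(100)]
set_global_workload(nodes, delta=300, max_capacity=10)
pending = generate_queries(nodes)
```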

random subset of the previously created ones; thus, server popularity follows the order in which they were created, resulting in a power-law network. The global workload fluctuates based on a pattern predefined by user input; applying a positive or negative workload per timeslot on underlying nodes implements a rise or drop, respectively, of the global workload. A monitoring module records all actions triggered by any event (message deliveries, workload changes, lifetime expirations) and finally creates appropriate output files in both analytical and concise forms.

5.2 Experiments configuration

The experiments configuration is as follows:

- Network sizes: 10,000 servers (i.e. independent networks) and 50,000 nodes uniformly distributed among the servers. During the initial setup of connections, a node links to any server with probability 1/10,000. Therefore, servers with no underlying nodes cannot generate queries or provide answers but may increase the hops per answered query and the number of messages. All nodes are initially free and available in their local pools.
- Capacity: each node represents capacity (c) of between 5 and 10 units inclusive (5 ≤ c ≤ 10). The capacity density function, which gives the fraction of nodes representing a certain amount of capacity, follows a geometric distribution: d(c) = (1/2)^(c−4). Multiplying the number of nodes by the sum of the c·d(c) products gives a close estimate of the system capacity: C ≈ 50,000 · Σ_{c=5}^{10} c·(1/2)^(c−4) ≈ 300,000 capacity units.

- Connectivity & time-to-live: each server connects to a maximum of 3 other random servers. Its Neighbour List is initially configured during the server's creation with links to other existing servers. That is, the probability of a server attracting a new link from another one increases exponentially with the latter's age, resulting in a power-law incoming-degree distribution. This Neighbour List size is small enough to increase the average path length between any two servers, making access to any resource difficult for the vast majority of underlying nodes. The TTL of every query is fixed to 7; that is, each query may access a maximum of Σ_{t=0}^{7} 3^t = 3,280 (32.8% of all) servers. In practice, this percentage is lower, as the average number of servers another one may reach is 26.29; that is, only 0.26% of the Hoverlay overlay size.
- Workload: given the globally available initial capacity, the system-wide workload should have both valleys


and peaks, fluctuating from 0% to as much as 150% of the global capacity. This helps evaluate the system under several situations such as workload increase/decrease and long-lasting strenuous high-load or relaxed low-load phases.

As explained above, server-to-server degrees follow a power-law distribution, as only a small number of servers are very popular; that is a result of the way new servers join their overlay. This coincides with real systems based on preferential attachment of new nodes onto older and more stable ones, e.g. Gnutella WebCaches [22]. It is a reasonable network topology for Hoverlay, which is expected to have power-law properties as strong providers will attract more links. The initial network configuration achieves a Poisson distribution of links from nodes to servers. Node migration (in the Hoverlay case only) may distort this distribution: while good providers (pools with plenty of resources and, thus, a high node-originated incoming degree) attract more and more links, they lose their resources faster. Finally, the distribution of global capacity onto nodes (number of nodes with the same capacity) follows a geometric density function, complying with the idea that most Hoverlay users offer low-capacity resources. Despite the theoretical maximum number of reachable servers for each query (see above), monitoring Hoverlay at its initial phase shows that this number does not exceed 57, far lower than the theoretical one. That is the maximum number of servers a query may visit via outbound Neighbour Lists even with infinite TTL; thus, the server overlay appears to have a number of cyclic paths. Without revisiting servers, a server may access all its reachable servers with an average path length of 7.5 hops. Therefore, TTL = 7 appears to be an appropriate query path length for this network configuration. The deployed discovery protocol is set to a well-known and usual benchmark in the peer-to-peer scientific community: flooding. This mechanism minimises doubts about result accuracy as it explores the whole vicinity of each requesting server. Being more selective in query propagation at this
Fig. 2 Global workload and capacity in (a) uniform query distribution and (b) hotspot query distribution environments: added or removed workload per timeslot (top layer) and cumulative workload and system capacity (bottom layer)

evaluation stage, factors such as selectivity heuristics could distort the results (e.g. k-walkers have an unstable success rate, and the results would be unclear and non-conclusive regarding the benefits and costs of Hoverlay). Flooding on a static overlay ensures that queries from a server, either in a Condor-based or a Hoverlay architecture, may explore the same servers; this eliminates one factor of result differentiation: the deployed search technique.

5.3 Query distribution

The system evaluation was based on two environments selected to test different aspects:

- Uniform query distribution: every node in the whole system has equal probability with any other to generate a query.
- Hotspot query distribution: queries are generated from a specific small subset of servers. This means that servers that have generated queries in the past have a high probability of generating again.
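A minimal sketch of how query originators could be drawn under the two distributions; the hotspot bias value is an illustrative assumption rather than the exact simulator parameter:

```python
import random

def pick_originator_uniform(nodes):
    """Uniform query distribution: every node is equally likely to originate a query."""
    return random.choice(nodes)

def pick_originator_hotspot(nodes, hotspot_nodes, hotspot_bias=0.9):
    """Hotspot query distribution: with high probability the originator comes from
    the small hotspot subset, so past originators tend to originate again."""
    if hotspot_nodes and random.random() < hotspot_bias:
        return random.choice(hotspot_nodes)
    return random.choice(nodes)
```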

While global capacity remains fixed, as no node joins or leaves the overlay throughout the experiments, the global workload fluctuates as shown in the two-layered Fig. 2. Both layers share the same x-axis (timeslots) but their y-axes have the same units (capacity) and different scales. The bottom layer describes the workload added or removed per timeslot whereas the top one illustrates the system-wide capacity and the cumulative load applied to the system. A fixed positive or negative new load produces linear increases or decreases of the global load over the same intervals. After the initialization phase, the Hoverlay load fluctuates between 3/2 and 1/3 of its capacity.

6 Uniform query distribution

6.1 Query success rate

Figure 3 presents the success rates of Condors, Flock of Condors and Hoverlay. Based on its plots, Hoverlay outperforms

Fig. 3 Success rate of disconnected Condors, Flock of Condors and Hoverlay in a uniform query distribution environment

both disconnected Condors and their Flock with regard to the percentage of successful queries over the total number of generated queries. Throughout the experiment Hoverlay achieves a better success rate than Condors (by up to 50%), confirming that interconnecting individual networks contributes to the satisfaction of more user requests. For most of the experiment duration, when the system is normally and heavily loaded, Hoverlay manages to satisfy a bigger portion of queries (on average 5% more) compared to Flock of Condors. Any explanation of this improvement in success rate achieved with Hoverlay should derive from the fundamental difference of the two systems: resource migration. Indeed, the reasoning is two-faceted: a) resource reservations in Flock of Condors last longer and b) resource migrations are also translated to query migrations. In detail:

- Service capacity is a highly dynamic resource; one of the system design requirements was that it should not rely on guarantees that migrated resources have the capacity their provider pools claim to have. A requesting node may reject discovered but unnecessary or unsuitable capacity. Resource migration eases the re-registration of this capacity with the requestor's pool, avoiding extra messages and latency to return it back to its provider pool. In the case of Hoverlay, once capacity is discovered an answer travels from the remote provider (Server A) to the requestor server (Server B), which then, at the same time, acknowledges the provider and wraps that answer to forward it to the underlying node. If for any reason that capacity is not used, it registers with Server B; providers get acknowledgements on the next timeslot (reservations last 2 timeslots). In the case of Flock of Condors, however, rejected resources need to return back to the provider with the acknowledgement to their response. Before Server B acknowledges Server A, it has to wait for the acknowledgement from the underlying node. Thus, discovered resources need to stay reserved for 4 timeslots before being released. While Flock of Condors keeps resources reserved for 2 extra, practically useless, timeslots, Hoverlay provides them to requesting nodes serving extra load.


- The overlay topology of these experiments is a power-law one. With a uniform distribution of load on nodes, every node initially has the same probability of generating a query. The majority of servers are at the edges of the overlay (leaf servers) and, hence, most of the queries come from those edges. Due to this overlay topology, most of the query paths are directed towards high-incoming-degree servers. These queries, in combination with resource migrations, force capacity to move from the centre of the overlay to its edges. A further increase of global workload will most likely generate queries from those overlay edges. With an adequate TTL and due to the small-world phenomenon in the topology, queries may traverse the whole overlay. However, if resources do not migrate, an extra increase of their workload generates queries forwarded always via the same server. A good portion of resources are managed by servers in the overlay centre which, however, have a shorter horizon, less accessible capacity and a worse success rate.

Uniform resource distribution in scale-free networks does not work to the benefit of the success rate in highly dynamic environments with highly dynamic resources. Two vertical lines divide the area of Fig. 3 into three phases (A, B, C): A and C are marked with (+) and B with a (−). These marks denote the areas in which Hoverlay is more (A and C) or less (B) successful than Flock of Condors. When the whole system handles low global workload (Phase B), Flock of Condors reaches an even 20% higher success rate than Hoverlay. This deterioration is superficial, due to the very low number of produced queries and the even lower number of satisfied ones for all three systems. Moreover, the difference between the number of satisfied queries of Hoverlay and Flock of Condors is negligible compared to that of Phases A or C. Therefore, migrating resources can help satisfy more user requests even in a static network (i.e. without rewiring). Condors, as a set of disconnected pools preventing access to remote resources, have from 10% to as much as 50% lower success rate compared to the other two architectures. For the first few timeslots, Flock of Condors and Hoverlay reach 100% success by seeking both local and remote capacity. Some servers contain no local free


capacity, even on the first few timeslots, in which case Condors cannot serve requests from their underlying nodes; hence the lower success rate than Flock of Condors or Hoverlay. The global workload at the beginning of Phase A steadily increases and therefore no fresh nodes appear in server pools. Gradually all resources within a requestor's vicinity are exhausted and a) more queries fail, b) more new queries browse the overlay and c) more servers regenerate the unsatisfied ones. Capacity exhaustion increases the number of queries and their repetitions, deteriorating the success rate of all three systems. Symmetrically, applying negative workload on nodes makes the global cumulative workload drop; some of these nodes (resources) become underloaded and available for on-demand migration or local re-commission via their pools. In some other cases, despite their workload drop, they may remain overloaded but reactively adjust their requested capacity downwards. That is, responses to past queries may no longer be necessary and thus the discovered capacity may either stay in its new local pool or be partially used by the requesting node. The unused portion of that capacity may serve extra load of the underlying network without extra requests. All systems during Phase B generate very few queries and satisfy even fewer. Flock of Condors satisfies negligibly more queries than Hoverlay, but this difference is substantial compared to the number of queries they generate. This makes the success rate of Flock of Condors 20% better than that of Hoverlay. However, it is a misleading conclusion if not accompanied by the following observation. As shown in Fig. 2, from timeslot 80 till around 100, the global workload drops and stabilises at almost 1/4 of the global capacity. On timeslot 96 (left border of Phase B), the system's workload approaches 2/3 of its capacity. That is the point after which Flock of Condors becomes more successful. As underlying nodes lose workload, some either become underloaded or normally loaded, or even remain overloaded at the same or lower workload levels. There are no new queries, only repeated ones. As fresh capacity becomes available, repeated queries get satisfied, improving the success rate of both systems.
Fig. 4 Average number of query hops before the first answer is discovered, for disconnected Condors, Flock of Condors and Hoverlay in a uniform query distribution environment

While in the case of Condor flocking unnecessary capacity returns back to its provider, Hoverlay moves it to the requesting server. Within Phase B, this migration is useless, as that capacity may only be used on a workload increase, or even potentially harmful for the system success rate, as capacity may be moved away from places accessible by requesting servers. This is the case for those few repeated queries. The global workload change affected most of the loaded nodes but not all; some remain overloaded and thus keep regenerating queries. As the workload drops and before its stabilisation at its lowest level, repeated queries from several servers move capacity out of the vicinity of other servers which keep propagating queries even during the steady-workload period. Without any workload increase that would trigger query propagation from other servers, no fresh capacity can migrate into their horizon, whereas Flock of Condors repositions free capacity to its initial provider and thus probably close to the repeated query generators. This clearance of requesting servers' horizons from available resources explains the success rate swap between the Flock of Condors and Hoverlay architectures, with the former's being higher than the latter's. However, that cannot justify why their difference in Phase B is well bigger than in the other two phases. That difference can be justified considering the minimal number of queries on which this percentage is based. Flock of Condors keeps, in low global workload cases, the resources at their initial distribution among servers, bringing a bigger impact on success rate. Overloaded nodes and their queries start increasing with the global workload. At the end of Phase B a good portion of those queries are successful due to the available capacity generated during the last workload drop. Therefore, the impact of that factor gets lower and the success rate of both systems increases.

6.2 Average path length of successful queries

Figure 4 presents the average number of hops successful queries had to travel before discovering their first answer. The graphs confirm that Hoverlay manages to achieve a better success rate with shorter query paths, especially on workload fluctuations. It helps queries get responses from

Fig. 5 User-end perceived satisfaction: (left) capacity requested and satisfied from server overlays and (right) requested capacity per timeslot

0.5 to as much as 2 hops sooner than Flock of Condors does. This 2-hop improvement takes place around the timeslots at which the global workload started decreasing: fresh capacity appears close to the regenerators of repeated queries. The average path lengths of Flock of Condors and Hoverlay do not follow the pattern of the success rate and stay well below the query TTL. The power-law topology of the overlay, in the absence of rewiring, stays the same throughout the experiment. Most query paths start from the edges of the network and finish at its centre. Thus, while the workload increases, resources in the centre become scarce; the biggest portion of the capacity is close to leaf servers. Long successful paths are fewer than short ones and, thus, their average remains below the mean value of 4.5 (1 hop from node to server plus TTL/2). As soon as the workload starts dropping, the effect of resource migrations becomes clearer. Fresh capacity appears in the pools of the domains it migrated to, closer to requestors. On the contrary, Flock of Condors places that capacity back at its originators and thus repeated queries need to travel further to re-discover it. This explains why Flock of Condors exhibits path length bursts in the first few timeslots after the workload starts decreasing. While the workload keeps dropping, free capacity increases close to the requestors' vicinity, servers generate no new queries and more and more repeated ones are cancelled. This keeps the average hop count at low levels. In the case of the disconnected set of Condors, queries can only travel from underlying nodes to their local servers; hence, one hop. Till timeslot 44 successful queries of both Flock of Condors and Hoverlay exhibit almost the same hop count. Workload increases linearly for the first 44 timeslots and all
Fig. 6 Cost in messages: (left) total number of messages produced and (right) total number of generated queries

migrated nodes join the requesting underlying networks. Given that both systems use flooding tested on the same topologies, all servers explore their whole horizon and thus the average hop count shows minimal deviation.

6.3 Traded capacity and cost in messages

In general, Hoverlay satisfies more load than Flock of Condors (Fig. 5 left) though both systems request almost the same amount (Fig. 5 right). This improvement comes at almost the same cost in messages (Fig. 6 left). Unlike flooding, the number of messages spent by random walkers is highly correlated to the success rate. Therefore, Hoverlay could outperform flocks of Condors if a search mechanism able to detect migrations were used. Condors satisfy much less requested capacity than the other two networked systems; however, Condors have a minimal overall cost in messages as their queries can only travel one hop. Following similar patterns as the success rate, the satisfied capacity of Hoverlay is more than that of Flock of Condors in Phases A and C. Flock of Condors' superiority in Phase B becomes insignificant as the requested capacity during that period is much lower than that of the remaining two phases. Figures 5 left and 6 left and right have a common characteristic: almost vertical deep decreases and high increases in the plotted lines. These radical changes happen on timeslots in which the new load swaps from positive to negative values and vice versa. Though within the 45–65 timeslot interval the global workload drops linearly, the number of messages, number of queries and requested capacity are non-zero and follow similar patterns. Similarly, the pattern of the 45–65 interval tops the new positive workload of the 66–79


timeslots. Differences in those patterns appear as portions of those repeated queries get satisfied or are stopped as unnecessary (especially after workload drops). Condors, due to the lack of query forwarding in the server overlay, exhibit a much lower number of messages. Their low success rate forces them to repeat many queries and finally overpass both Flock of Condors and Hoverlay in number of messages and queries. To sum up, this experiment shows that even on fixed topologies, in high workload situations and with the same search technique (exhaustive flooding) deployed, Hoverlay is more efficient than Flock of Condors in terms of:

- success rate (percentage of successful queries),
- satisfied capacity (portion of requested capacity that was satisfied), and
- average path length of successful queries before they hit their first answer.

This is basically the positive effect of resource migration on requesting networks, and it comes at practically no cost in messages. However, under certain circumstances, when the global load gets lower than the global capacity, the resource distribution is skewed and may negatively affect the success rate. Considering the low total and even lower satisfied number of queries in those cases, that deterioration of success rate is almost insignificant.

7 Hotspot query distribution

To set up these experiments a few more parameters are necessary to generate hotspots throughout. Number of spots: a subset of servers is picked to belong to three, not necessarily neighbouring, hotspot areas. In fact, each of these areas consists of a centroid (server) and all other servers within its direct or indirect vicinity accessible via incoming or outgoing links. Spot radius: any server within a hotspot is a maximum of five hops away from its centroid. Spot lifetime: each spot area lives for 70 timeslots. At any moment, there are three hotspot areas. Thus, in 210 timeslots the areas change three times. The new set of areas is picked randomly and may overlap with the previous set. A sketch of this hotspot construction is given below.
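A minimal Python sketch of building one hotspot area as described (a random centroid plus every server reachable within the radius over incoming or outgoing links), using a breadth-first search; the function name and adjacency representation are illustrative:

```python
import random
from collections import deque

def build_hotspot(servers, neighbours, radius=5):
    """Return the set of servers forming one hotspot area: a random centroid plus
    all servers at most `radius` hops away via incoming or outgoing links."""
    centroid = random.choice(servers)
    area, frontier = {centroid}, deque([(centroid, 0)])
    while frontier:
        server, depth = frontier.popleft()
        if depth == radius:
            continue
        for other in neighbours.get(server, ()):   # links treated as undirected here
            if other not in area:
                area.add(other)
                frontier.append((other, depth + 1))
    return area

# Example: three (possibly overlapping) hotspot areas, refreshed every 70 timeslots.
# hotspots = [build_hotspot(servers, neighbours) for _ in range(3)]
```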

7 Hotspot query distribution

To set up these experiments, a few more parameters are necessary to generate hotspots:

- Number of spots: a subset of servers is picked to belong to three, not necessarily neighbouring, hotspot areas. Each of these areas consists of a centroid (server) and all other servers in its direct or indirect vicinity, accessible via incoming or outgoing links.
- Spot radius: any server within a hotspot is at most five hops away from its centroid.
- Spot lifetime: each spot area lives for 70 timeslots. At any moment there are three hotspot areas; thus, in 210 timeslots the areas change three times. Each new set of areas is picked randomly and may overlap with the previous set.
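Purely as an illustration of this setup (the simulator's actual generator is not shown in the paper), hotspot areas of this kind could be produced by a breadth-first expansion around randomly chosen centroids; the graph representation, function and parameter names below are assumptions of this sketch.

import random
from collections import deque

def pick_hotspots(servers, neighbours, num_spots=3, radius=5, seed=None):
    # Pick `num_spots` centroid servers and expand each into a hotspot area.
    # `neighbours` maps a server id to the servers reachable over its incoming
    # or outgoing links (links are treated as undirected for area membership).
    rng = random.Random(seed)
    centroids = rng.sample(list(servers), num_spots)
    areas = []
    for centroid in centroids:
        area = {centroid}
        frontier = deque([(centroid, 0)])
        while frontier:  # breadth-first expansion up to `radius` hops
            server, dist = frontier.popleft()
            if dist == radius:
                continue
            for nb in neighbours.get(server, ()):
                if nb not in area:
                    area.add(nb)
                    frontier.append((nb, dist + 1))
        areas.append(area)
    return areas  # areas may overlap, as allowed in the experiments

A new triplet of areas would then be drawn in this way every 70 timeslots, i.e. at the end of each spot lifetime.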

The TTL is set to a maximum of 7 hops, as longer paths would unrealistically overload the network. The workload in these three small areas cannot be as high as the load a whole network could take, since much less capacity is accessible from the hotspots within TTL = 7 (see Fig. 2b). Most of the observations from the results below share the same analysis as in the previous experiments. Detailed explanations are given only for observations that differentiate these results from those of section 6.

7.1 Query success rate

As shown in Fig. 7, all systems experience a generally low success rate, but Hoverlay is the most successful (up to 20%) compared to the Condor-based systems. Hotspot areas are very small compared to the network size and the global workload is uniformly distributed among their nodes. Resource migrations in Hoverlay help each hotspot area serve more queries and increase the number of resources that can take over future workload. Condor-based systems are task-oriented; each node is assigned a very specific job and cannot take over another job from the same flock. A remote resource needs to return its control back to the originating flock after the completion of a job. Hoverlay resources, on the other hand, remain available at the server they migrated to and can be given extra workload by their new manager server. Newly migrated nodes join their new networks without getting overloaded, so they can take on some more workload without exceeding their threshold. This reduces both the number of queries and the requested capacity, and improves the success rate. At timeslot 70, a hotspot relocation takes place to areas that cumulatively contain more available capacity. This makes the success rates of all systems increase abruptly. Although there is now enough accessible and available capacity, only about 50% of queries are successful. Workload already applied onto certain servers remains on them until it is satisfied. Therefore, the previously highly loaded hotspots (as is the case during timeslots 0–70) regenerate queries alongside the new ones. Around timeslot 80, although the global workload starts decreasing, the success rate of all systems also drops quickly, which is justified by the following reasoning:
Fig. 7 Success rate of Condors, Flock of Condors and Hoverlay in Hotspot areas


1. The calculation of the success rate includes queries from both past and new hotspots.
2. Past hotspots do not generate new queries but only repeat unsatisfied ones.
3. New hotspots have much more capacity and less workload than past ones; thus, most of their queries get satisfied.
4. As soon as their workload starts dropping (timeslot 80 onwards), new hotspots stop producing or repeating queries.
5. Queries then come from old hotspots only, which, however, have not managed to discover more capacity despite exploring their entire vicinity.

The decreasing workload helps the success rate to increase. Unlike in the previous experiments, Hoverlay remains more successful until timeslot 140. The largest share of the global workload is still on servers of past hotspots. Hence, workload is removed primarily from the old hotspot areas and fresh capacity remains in those areas. In the case of Flock of Condors, however, free capacity returns to its original host and owner. This may move capacity outside the hotspot areas, making it difficult to reach from the servers inside them.

7.2 Average path length of successful queries

Based on the left part of Fig. 8, Hoverlay outperforms Flock of Condors in terms of the average path length of successful queries. While the former fluctuates between 1 and 3 hops, the latter reaches up to 8 hops, with paths that are on average 2 hops longer for most of the experiment. Initially, queries tend to travel far to discover resources, as the low-capacity hotspots are charged with a high workload. This causes a non-smooth increase of the average path length. Resource migration helps Hoverlay reduce the requested capacity per query and thus achieve success within a few hops. As hotspots exhaust all reachable capacity, query success drops to zero for the Condor-based systems and thus no average hop count is recorded for them. Within the first few timeslots, Hoverlay moves all discovered capacity inside the hotspot areas, keeping query paths shorter than those of the Condor-based systems and thereby even gradually reducing them.
Fig. 8 Average number of hops (left) of successful queries and average satisfied requested capacity (right)

The second interval starts with a relatively small workload increase compared to the capacity available within the second hotspot triplet, and thus both Flock of Condors and Hoverlay exhibit similar path lengths. Once the workload starts dropping, all queries come from the previous triplet of hotspots. Hence, fresh capacity returns to its owner: (a) outside the hotspot borders in Flock of Condors, or (b) inside them in Hoverlay. This explains why the latter satisfies repeated queries faster than the former. In the final phase of this experiment, the success rate of Flock of Condors approaches zero and thus the average path length is calculated over a small sample, causing a fluctuation in the graph. This experiment shows the important gains of Hoverlay over Flock of Condors, especially in cases of workload bursts at specific areas of a network. It is worth mentioning that the number of messages spent by the two systems is almost the same, as flooding is deployed in both. Finally, the results are further confirmed by the right side of Fig. 8, which illustrates the gains in satisfied user queries. The locality of fetched resources allows the discovery of more capacity when the workload increases.

8 Conclusion

Hoverlay is a system that enables the logical movement of nodes from one network to another, aiming to relieve nodes experiencing high workload. Remote nodes migrate into the requesting node's domain to take over some of that excessive workload. It is an arbitrary network of servers (overlay), each of which represents a single underlying network. All servers use blind search techniques to discover free resources in other networks and move them into the requesting network. It is designed to be tolerant to node and server failures, since it minimizes the maintenance costs of the server and node components. Node migration and the dynamic server overlay differentiate Hoverlay from Condor-based architectures, which exhibit more static links between managers and nodes. In this paper, we have presented a number of important extensions to the basic Hoverlay architecture which collectively enhance the degree of control owners have over their nodes and the overall
level of cooperation among servers. The former is achieved through the concept of exclusion lists, used by nodes in order to avoid participation in incompatible applications. This is in fact a general-purpose method that can be used to enforce node mobility constraints imposed by their owners. Cooperation among servers is enhanced in two ways: first, each server chooses nodes based on their overall compatibility with its predecessor servers; second, servers reserve safety capacity with the nodes that are likely to be the least useful in the same context (a brief illustrative sketch of this selection logic is given below).

The keyword-based cooperation of nodes in a distributed environment, even in the context of P2P systems, is a well-researched area. However, unlike other studies on semantic cooperation, Hoverlay does not use keywords to refine servers' neighbour lists based on the type of resources they provide. In highly dynamic environments, such rewiring may quickly create closed groups of servers sharing the same type of resources and fragment the network. Queries trapped in these groups cannot discover new capacity that may appear outside the group, thus deteriorating the success rate. The proposed scheme instead improves the visibility and mobility of widely compatible and acceptable resources and restricts the less useful ones. These latter resources tend to gather at the edges of the overlay, which practically deteriorates their access to popular capacity in the overlay's centre and, consequently, their success rate.

An extensive simulator was developed to evaluate the conceptual characteristics of Hoverlay in static environments against the benchmark of Condors and Flock of Condors. A series of simulations were run and the results showed that Hoverlay performs better than disconnected Condors and Flock of Condors, achieving important improvements in both success rate and average successful query path length at a negligible expense in messages. Another significant contribution of this paper is the second set of experiments on the two systems under flash crowds at specific areas of a network. In these scenarios the gains of Hoverlay are clear and suggest that resource migration is more robust to network changes.

There are plenty of interesting avenues to explore in further work. Currently we are experimenting with a new type of blind search mechanism that can trace resource mobility without logging statistical data on nodes. This concept is expected to allow Hoverlay to discover remote/migrated resources and to converge to network changes very fast. We are also developing heuristics to pro-actively move underutilised nodes from pools to other servers in order to cater for future overload scenarios before they actually occur. These heuristics are based on real-time extraction of topology features.
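As a purely illustrative sketch (not the actual Hoverlay implementation), the compatibility-driven selection summarised above could look as follows in Python; the exclusion-list representation, the function name and the scoring rule are assumptions made only for this example.

def rank_candidates(candidates, path_keywords):
    # Illustrative ranking of a server's candidate nodes by keyword compatibility.
    # `candidates` maps a node id to its exclusion list: keywords of applications
    # its owner does not allow it to participate in.
    # `path_keywords` lists the application keywords of the query's predecessor
    # servers; the first entry is the keyword of the requesting server.
    requesting_kw = path_keywords[0]
    ranked = []
    for node, exclusions in candidates.items():
        if requesting_kw in exclusions:
            continue  # the owner forbids participation in the requesting application
        # Compatibility score: how many of the traversed applications the node accepts.
        score = sum(1 for kw in path_keywords if kw not in exclusions)
        ranked.append((score, node))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    # The most widely compatible nodes are offered first; the least compatible
    # ones stay at the bottom and can be kept back as reserved safety capacity.
    return [node for score, node in ranked]

For instance, rank_candidates({'n1': {'video'}, 'n2': set()}, ['compute', 'video']) would offer n2 before n1, since n2 is compatible with both applications seen on the query path.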


Dr. Nick Antonopoulos is a Senior Lecturer at the Department of Computing, University of Surrey, UK. He holds a BSc in Physics from the University of Athens in 1993, an MSc in Information Technology from Aston University in 1994 and a PhD in Computer Science from the University of Surrey in 2000. He has worked as a networks consultant and was the co-founder and director of a company developing Web-based management information systems. He has over 9 years of academic experience, during which he has designed and managed advanced Masters programmes in computer science at the University of Surrey. He has published over 50 articles in fully refereed journals and international conferences and has received a number of best paper awards. He is the organiser and chair of the 1st international workshop on Computational P2P Networks. He is on the editorial board of the Springer journal Peer-to-Peer Networking and Applications (effective from 2009) and on the advisory editorial board of the IGI Global Handbook of Research on Telecommunications Planning and Management for Business. He is a Fellow of the UK Higher Education Academy and a full member of the British Computer Society. His research interests include emerging technologies such as large-scale distributed systems and peer-to-peer networks, software agent architectures and security. Contact him at the Department of Computing, University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom; n.antonopoulos@surrey.ac.uk

Dr. George Exarchakos is a Researcher in Autonomic Networks at the Eindhoven University of Technology, The Netherlands. His research interests span from distributed network feature extraction and modelling to network-aware video distribution and P2P computing. He finished his PhD in the Department of Computing at the University of Surrey in 2009. He completed the BSc in Informatics and Telecommunications at the University of Athens in 2004 and, the same year, joined the MSc in Advanced Computing at Imperial College London, completing it in 2005. He collaborated with the Network Operation Centre of the University of Athens from 2003 until 2004. His achievements include an authored book on Networks for Pervasive Services published by Springer in 2011 and an edited handbook of research on P2P and Grid Systems for Service-Oriented Computing published by IGI Global in 2010. He has received best paper awards in international network and multimedia conferences since the start of his PhD in 2005. He has contributed more than 20 articles to peer-reviewed journals and international conferences. Contact him at the Department of Electrical Engineering, Eindhoven University of Technology, 5612 AZ Eindhoven, The Netherlands, or via email at g.exarchakos@ieee.org
