
A distributed resource discovery algorithm for P2P grids

Javad Akbari Torkestani


Department of Computer Engineering, Arak Branch, Islamic Azad University, Arak, Iran
ARTICLE INFO
Article history:
Received 11 April 2012
Received in revised form
3 August 2012
Accepted 3 August 2012
Available online 10 August 2012
Keywords:
Grid
P2P grid
Resource discovery
Resource allocation
Learning automata
ABSTRACT
Centralized or hierarchical administration of the classical grid resource discovery approaches is unable
to efficiently manage the highly dynamic large-scale grid environments. Peer-to-peer (P2P) overlay
represents a dynamic, scalable, and decentralized prospect of the grids. Structured P2P methods do not
fully support the multi-attribute range queries and unstructured P2P resource discovery methods suffer
from the network-wide broadcast storm problem. In this paper, a decentralized learning automata-
based resource discovery algorithm is proposed for large-scale P2P grids. The proposed method
supports the multi-attribute range queries and forwards the resource queries through the shortest path
ending at the grid peers more likely having the requested resource. Several simulation experiments are
conducted to show the efficiency of the proposed algorithm. Numerical results reveal the superiority of
the proposed model over the other methods in terms of the average hop count, average hit ratio, and
control message overhead.
© 2012 Elsevier Ltd. All rights reserved.
1. Introduction
Grid systems interconnect a collection of heterogeneous and autonomous systems from
multiple, geographically distributed administrative domains to make it possible to share
existing resources. Grid is a broad concept that is often traced back to the parallel
systems of the 1970s, the large-scale cluster systems of the 1980s, and the distributed
systems of the 1990s. Grids therefore inherit much from the traditional computing models.
However, they have distinguishing characteristics such as large scalability, heterogeneity
and diversity, autonomy, dynamicity and open-endedness, and task complexity (Yu et al.,
2005). Grid systems are mostly based on a centralized or hierarchical administration
(Deng et al., 2009). Traditional centralized or hierarchical grid architectures are unable
to effectively manage large-scale, heterogeneous, and highly dynamic grid resources
(Deng et al., 2009). The unique characteristics of the P2P architecture (e.g., distributed
administration, large-scale size, and so on) enable it to cope with the dynamicity,
scalability, and availability problems of grids. In P2P networks, unlike the traditional
client-server models, each peer can simultaneously act as a client and as a server.
Depending on how the peers are organized and on the communication protocols, P2P
systems are mainly subdivided into structured and unstructured classes. The former uses a
rigid structure to interconnect the peers, while the latter lets the peers join or leave
at random. Hybrid approaches have also been proposed to keep the benefits and overcome
the drawbacks of both classes (Trunfio et al., 2007). File sharing, real-time data
streaming, and cycle stealing are some well-known representative services provided by P2P
networks (Deng et al., 2009). Generally, grid and P2P systems are both resource sharing
environments with different advantages. Integrating the grid system with the philosophy
and techniques of the P2P architecture is a promising approach, called P2P grid, to
alleviate the disadvantages of the traditional grid systems (Merz and Gorunova, 2007;
Kocak and Lacks, 2012; Deng et al., 2009).
The main objective of resource sharing systems is to pool together the software and
hardware resources from multiple administrative domains and to provide a huge collection
of available resources that can be assigned to user applications. Resource management is
therefore an integral part of these systems, and resource discovery is a key service that
locates the system resources across a large-scale distributed system (Trunfio et al.,
2007). Classical approaches to the grid resource discovery problem are generally based on
centralized or hierarchical architectures. This may shorten the average response time of
local requests. However, centralized and hierarchical architectures make the grid system
inefficient and susceptible to failure as the grid size rapidly grows in distributed
environments (Trunfio et al., 2007; Kocak and Lacks, 2012). Centralized and hierarchical
resource discovery approaches suffer from a single point of failure, performance
bottlenecks in highly dynamic systems, and lack of scalability in large-scale distributed
systems (Deng et al., 2009). Designing efficient resource discovery algorithms is
therefore a crucial problem. P2P overlay technology emerged as a scalable alternative to
the traditional grid systems, offering several advantages over the centralized approaches
(Trunfio et al., 2007; Merz and Gorunova, 2007). P2P grid exploits the synergy between the
grid system and the P2P network to efficiently manage grid resources and services in
large-scale distributed environments. The literature argues that grid and P2P systems will
eventually converge (Trunfio et al., 2007). Resource location and discovery in P2P grids
has attracted research interest in recent years and several studies have been conducted.
Contents lists available at SciVerse ScienceDirect
journal homepage: www.elsevier.com/locate/jnca
Journal of Network and Computer Applications
Journal of Network and Computer Applications 35 (2012) 2028–2036
1084-8045/$ - see front matter © 2012 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.jnca.2012.08.001
Tel.: 98 861 3663041-9.
E-mail address: j-akbari@iau-arak.ac.ir
Following the division of P2P architectures into structured and unstructured, P2P grid
resource discovery approaches are also grouped into structured and unstructured. The
unstructured underlying P2P architectures proposed in the literature for grid resource
discovery are generally classified as flat P2P networks (Iamnitchi and Foster, 2003; Talia
and Trunfio, 2005), tree-based overlays (Marzolla et al., 2005), and cluster-based networks
(Mastroianni et al., 2005a, 2005b; Puppin et al., 2005). The following briefly reviews
some well-known architectures. A fully decentralized P2P architecture was proposed by
Iamnitchi and Foster (2003) for resource discovery in grid systems. In this architecture,
the resource discovery process is divided into four parts: membership protocol, overlay
construction, preprocessing, and request processing. Resource information is stored in one
or more peers. The user sends its request to the local peer. The peer checks its table to
see if it has a matching resource description. If so, it responds to the user. Otherwise,
it forwards the request to another peer. This process repeats until the requested resource
is found or the TTL of the request expires. Talia and Trunfio (2005) proposed a P2P
architecture for resource discovery in OGSA-compliant grids. The proposed model has two
layers. The lower layer is composed of a number of hierarchical index services and the
upper one is a P2P layer including peer services and contact services. Each index service
publishes the resource information of its own virtual organization. Peer services are
responsible for resource discovery and contact services for organizing the peer services
in a P2P network. Mastroianni et al. (2005a, 2005b) designed a regional resource
information service for P2P grids based on the super-peer concept. This model comprises
two types of peer: super-peer and regular peer. Each super-peer is associated with a
number of local regular peers. Super-peers are connected by an overlay P2P network. A
regular peer sends its request to its local super-peer. The super-peer returns the
response if it finds a local peer providing the requested service. Otherwise, it forwards
the request to its neighboring super-peers. Another super-peer based resource discovery
scheme was proposed by Puppin et al. (2005). In this scheme, grid nodes are partitioned
into clusters, each having one or more super-peers. This model includes two main
components: agent and aggregator. Each aggregator plays the role of a super-peer
responsible for data collection, query processing and forwarding, and information
indexing. A P2P network connects the neighboring super-peers. In each cluster, the agent
publishes the information of the provided resources. A tree structure-based grid resource
discovery approach was proposed by Marzolla et al. (2005).
Structured P2P grid resource discovery approaches are generally based on a distributed
indexing service that relies on hashing. To maintain the rigid structure, structured P2P
systems require a self-organization mechanism that imposes a heavy burden on the systems
(Trunfio et al., 2007). Cai et al. (2003) proposed a multi-attribute addressable network
for grid information services called MAAN, which is an extension of the structured P2P
Chord system (Stoica et al., 2001). Andrzejak and Xu (2002) designed a scalable, efficient
range query scheme for grid information services that is based on an extension of the CAN
distributed hash table (DHT) system (Ratnasamy et al., 2001). In DHT-based systems, to
locate a resource (or to map the file attribute to the network address), the resource is
initially associated with an ID (key) by using a hash table. Then, a lookup function
calculates the value of the key (or where the resource is stored). A decentralized
single-dimensional DHT-based information discovery technique supporting multi-attribute
queries was proposed by Schmidt and Parashar (2003). In this method, each resource having
multiple attributes is mapped to the node whose ID is obtained by interleaving the binary
representations of the attribute values. Ratnasamy et al. (2003) presented a load
distribution approach based on a uniform hash function. Besides the underlying DHT, a
binary tree structured overlay is also used to allow efficient range query resolution.
Each resource is registered only at the leaf node whose range contains its attribute
value. Spence et al. (2003) extended the Pastry (Rowstron and Druschel, 2001) indexing and
routing system. This model allows multi-dimensional search by preparing a separate Pastry
ring for each resource attribute. Key values are managed in a tree-like structure whose
leaves are the nodes. Non-leaf nodes summarize the range of values of their children. To
find the responsible node, the key value of the tree-like structure is mapped into the
Pastry ring structure. Merz and Gorunova (2007) proposed a hybrid fault-tolerant resource
discovery mechanism for P2P grid environments combining efficient Chord-like spanning tree
algorithms and robust epidemic algorithms (Eugster et al., 2004). Deng et al. (2009)
designed an ACO (Ant Colony Optimization) based resource discovery algorithm for
large-scale P2P grid systems. The proposed ACO-based algorithm avoids the notorious global
flooding problem by sending the packets along the routes that are frequently travelled by
the ants. This considerably reduces the network load. Moreover, the proposed method
supports multi-attribute range queries. To improve the grid performance, this algorithm
can use multiple ants searching for the resources in parallel. Kocak and Lacks (2012)
proposed a resource discovery protocol in which the network routers are in charge of the
resource discovery process. In this protocol, besides the routing table, each router is
equipped with a resource table. The resource table maps the IP addresses to the available
computing resource values. Discovery packets are encapsulated within the TCP/IP packets
and look up the resource tables to find the requested resources.
As mentioned earlier, unstructured P2P resource discovery methods suffer from the
network-wide broadcast storm problem. Flooding the resource queries makes the unstructured
approaches inappropriate for current large-scale grid systems. On the other hand,
structured methods do not perform well in highly dynamic networks or with multi-attribute
range queries. In this paper, a decentralized resource discovery algorithm is proposed for
large-scale P2P grids to relieve the problems of the previous methods. Taking advantage of
learning automata, the proposed resource discovery algorithm finds the shortest path (the
path with the minimum hop count) connecting the user to the resource providing peer. In
this method, the communication link that each peer uses to route the query towards the
resource provider is selected at random by the automaton. If the route that is selected at
a given stage is shorter than the average length of the routes selected so far, the
algorithm rewards the selected route; otherwise the route is penalized. Therefore, as the
proposed algorithm proceeds, it converges to the route with the minimum expected length.
The proposed algorithm supports the high dynamicity of scalable P2P grids, where the peers
frequently and unpredictably join, leave, and rejoin the system. To show the performance
of the proposed resource discovery algorithm, several simulation experiments are conducted
under several grid scenarios. The results of the proposed algorithm are compared with
those of KL (Kocak and Lacks, 2012) and DWC (Deng et al., 2009). Simulation results show
that the proposed algorithm outperforms the other methods in terms of the average hop
count, average hit ratio, and control message overhead.
The rest of the paper is organized as follows. In the next
section, learning automata theory is briefly reviewed. In Section 3,
a learning automata-based algorithm is proposed for resource
discovery in P2P grid environments. In Section 4, the performance
of the proposed algorithm is evaluated through simulation
experiments, and Section 5 concludes the paper.
2. Learning automata theory
A learning automaton (Narendra and Thathachar, 1989; Thathachar and Harita, 1987) is an
adaptive decision-making unit that improves its performance by learning how to choose the
optimal action from a finite set of allowed actions through repeated interactions with a
random environment. The action is chosen at random based on a probability distribution
kept over the action-set, and at each instant the chosen action serves as the input to the
random environment. The environment in turn responds to the taken action with a
reinforcement signal. The action probability vector is updated based on the reinforcement
feedback from the environment. The objective of a learning automaton is to find the
optimal action from the action-set so that the average penalty received from the
environment is minimized. Learning automata have been found to be useful in systems where
incomplete information about the environment exists. Learning automata have also been
shown to perform well in complex, dynamic and random environments with a large amount of
uncertainty. Learning automata have a wide variety of applications in combinatorial
optimization problems (Akbari Torkestani, 2012, 2012j; Akbari Torkestani and Meybodi,
2012), computer and communication networks (Akbari Torkestani and Meybodi, 2011a, 2011b;
Akbari Torkestani, 2012h, 2012b, 2012f, 2012g, 2012e), grid computing (Akbari Torkestani,
2012a), and Web engineering (Akbari Torkestani, 2012c, 2012d, 2012k). Fig. 1 shows the
relationship between the learning automaton and the random environment.
The environment can be described by a triple $\langle \alpha, \beta, c \rangle$, where
$\alpha = \{\alpha_1, \alpha_2, \ldots, \alpha_r\}$ represents the finite set of the inputs,
$\beta = \{\beta_1, \beta_2, \ldots, \beta_m\}$ denotes the set of the values that can be
taken by the reinforcement signal, and $c = \{c_1, c_2, \ldots, c_r\}$ denotes the set of
the penalty probabilities, where the element $c_i$ is associated with the given action
$\alpha_i$. If the penalty probabilities are constant, the random environment is said to be
a stationary random environment, and if they vary with time, the environment is called a
non-stationary environment. Depending on the nature of the reinforcement signal $\beta$,
environments can be classified into P-model, Q-model and S-model. The environments in
which the reinforcement signal can only take the two binary values 0 and 1 are referred to
as P-model environments. Another class of environment allows the reinforcement signal to
take a finite number of values in the interval [0,1]. Such an environment is referred to
as a Q-model environment. In S-model environments, the reinforcement signal lies in the
interval $[a, b]$.
Learning automata can be classified into two main families (Narendra and Thathachar,
1989): fixed structure learning automata and variable structure learning automata.
Variable structure learning automata are represented by a triple
$\langle \beta, \alpha, T \rangle$, where $\beta$ is the set of inputs, $\alpha$ is the set
of actions, and $T$ is the learning algorithm. The learning algorithm is a recurrence
relation which is used to modify the action probability vector. Let
$\alpha_i(k) \in \alpha$ and $p(k)$ denote the action selected by the learning automaton
and the probability vector defined over the action set at instant $k$, respectively. Let
$a$ and $b$ denote the reward and penalty parameters, which determine the amount of
increase and decrease of the action probabilities, respectively. Let $r$ be the number of
actions that can be taken by the learning automaton. At each instant $k$, the action
probability vector $p(k)$ is updated by the linear learning algorithm given in Eq. (1) if
the selected action $\alpha_i(k)$ is rewarded by the random environment, and it is updated
as given in Eq. (2) if the taken action is penalized.
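$$p_j(k+1) = \begin{cases} p_j(k) + a\,[1 - p_j(k)] & j = i \\ (1 - a)\,p_j(k) & \forall j \neq i \end{cases} \tag{1}$$

$$p_j(k+1) = \begin{cases} (1 - b)\,p_j(k) & j = i \\ \dfrac{b}{r - 1} + (1 - b)\,p_j(k) & \forall j \neq i \end{cases} \tag{2}$$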
If $a = b$, the recurrence Eqs. (1) and (2) are called the linear reward-penalty
($L_{R-P}$) algorithm; if $a \gg b$, the given equations are called linear
reward-$\varepsilon$ penalty ($L_{R-\varepsilon P}$); and finally, if $b = 0$, they are
called linear reward-inaction ($L_{R-I}$). In the latter case, the action probability
vector remains unchanged when the taken action is penalized by the environment.
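
As an illustration, the linear update of Eqs. (1) and (2) can be sketched in a few lines of Python. This is only a minimal sketch: the function name `update_probabilities`, the list representation of $p(k)$ and the default parameter values are assumptions made for the example and are not taken from the paper. The penalty branch assumes at least two actions ($r \ge 2$).

```python
def update_probabilities(p, i, rewarded, a=0.1, b=0.1):
    """Return the action probability vector after one learning step.

    p        : list of action probabilities (assumed to sum to 1)
    i        : index of the action that was selected
    rewarded : True applies Eq. (1), False applies Eq. (2)
    a, b     : reward and penalty parameters (illustrative values)
    """
    r = len(p)
    if rewarded:
        # Eq. (1): move probability mass towards the selected action.
        return [pj + a * (1.0 - pj) if j == i else (1.0 - a) * pj
                for j, pj in enumerate(p)]
    # Eq. (2): shrink the selected action and share b among the others.
    return [(1.0 - b) * pj if j == i else b / (r - 1) + (1.0 - b) * pj
            for j, pj in enumerate(p)]
```

Setting `b=0` gives the reward-inaction behaviour: a penalized step returns the vector unchanged. In both branches the updated probabilities still sum to one.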
A variable action-set learning automaton is an automaton in which the number of actions
available at each instant changes with time. It has been shown in Thathachar and Harita
(1987) that a learning automaton with a changing number of actions is absolutely expedient
and also $\varepsilon$-optimal when the reinforcement scheme is $L_{R-I}$. Such an
automaton has a finite set of $r$ actions, $\alpha = \{\alpha_1, \alpha_2, \ldots, \alpha_r\}$.
$A = \{A_1, A_2, \ldots, A_m\}$ denotes the set of action subsets and
$A(k) \subseteq \alpha$ is the subset of all the actions that can be chosen by the learning
automaton at each instant $k$. The selection of the particular action subsets is randomly
made by an external agency according to the probability distribution
$C(k) = \{C_1(k), C_2(k), \ldots, C_m(k)\}$ defined over the possible subsets of the
actions, where $C_i(k) = \mathrm{prob}[A(k) = A_i \mid A_i \in A,\ 1 \le i \le 2^{r} - 1]$.
$\hat{p}_i(k) = \mathrm{prob}[\alpha(k) = \alpha_i \mid A(k),\ \alpha_i \in A(k)]$ denotes
the probability of choosing action $\alpha_i$, conditioned on the event that the action
subset $A(k)$ has already been selected and $\alpha_i \in A(k)$ too. The scaled probability
$\hat{p}_i(k)$ is defined as

$$\hat{p}_i(k) = \frac{p_i(k)}{K(k)} \tag{3}$$

where $K(k) = \sum_{\alpha_i \in A(k)} p_i(k)$ is the sum of the probabilities of the
actions in subset $A(k)$, and $p_i(k) = \mathrm{prob}[\alpha(k) = \alpha_i]$.
The procedure of choosing an action and updating the action probabilities in a variable
action-set learning automaton can be described as follows. Let $A(k)$ be the action subset
selected at instant $k$. Before choosing an action, the probabilities of all the actions
in the selected subset are scaled as defined in Eq. (3). The automaton then randomly
selects one of its possible actions according to the scaled action probability vector
$\hat{p}(k)$. Depending on the response received from the environment, the learning
automaton updates its scaled action probability vector. Note that only the probabilities
of the available actions are updated. Finally, the probability vector of the actions of
the chosen subset is rescaled as $p_i(k+1) = \hat{p}_i(k+1) \cdot K(k)$, for all
$\alpha_i \in A(k)$. The absolute expediency and $\varepsilon$-optimality of the method
described above have been proved in Thathachar and Harita (1987).

Fig. 1. The relationship between the learning automaton and its random environment.
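
The scale, select, update, and rescale cycle of a variable action-set automaton can be sketched as follows. This is a hedged illustration under the $L_{R-I}$ scheme, not the author's implementation: the function name `variable_action_step`, the dictionary representation of the probability vector, and the `environment` callback (which returns `True` when the chosen action is rewarded) are all assumptions introduced for the example.

```python
import random


def variable_action_step(p, available, environment, a=0.1, rng=random):
    """One L_R-I step restricted to the action subset `available`.

    p           : dict mapping every action to its probability
    available   : actions that may be chosen at this instant, A(k)
    environment : hypothetical callback, True if the action is rewarded
    Returns the chosen action and the new probability dict.
    """
    K = sum(p[x] for x in available)            # K(k), sum over A(k)
    if K <= 0:
        raise ValueError("available actions have zero total probability")
    scaled = {x: p[x] / K for x in available}   # scaling of Eq. (3)
    actions = list(scaled)
    chosen = rng.choices(actions, weights=[scaled[x] for x in actions])[0]
    if environment(chosen):
        # Reward: Eq. (1) applied to the scaled vector only.
        scaled = {x: (q + a * (1.0 - q) if x == chosen else (1.0 - a) * q)
                  for x, q in scaled.items()}
    # L_R-I: a penalized action leaves the scaled vector unchanged.
    new_p = dict(p)
    for x, q in scaled.items():
        new_p[x] = q * K                        # rescale by K(k)
    return chosen, new_p
```

Because the scaled vector is multiplied back by $K(k)$, the probabilities of the unavailable actions are untouched and the full vector still sums to one.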
3. Resource discovery algorithm
Resource management is one of the key design issues in grid
systems. Classic grid resource discovery methods are generally
administered in a hierarchical or centralized manner, while grid
environments are highly dynamic, large scale, and naturally
distributed. A peer-to-peer overlay network is a distributed,
dynamic, and scalable approach to connect the grid nodes.
Existing P2P resource discovery approaches are generally classified
into unstructured and structured. The former suffers
from the network-wide broadcast storm problem, and the latter
one does not fully support the multi-attribute range queries. The
aim of this paper is to design a learning automata-based resource
discovery algorithm for P2P grids to cope with the problems of
the previous structured and unstructured methods.
Let graph $G\langle P, L \rangle$ denote the topology graph of the P2P network, where
$P = \{p_1, p_2, \ldots, p_n\}$ denotes the set of peers, and $L \subseteq P \times P$
denotes the set of communication links connecting the peers. Let
$R = \{r_1, r_2, \ldots, r_m\}$ denote the set of available resource types. In this method,
a network of learning automata $A = \{A_1, A_2, \ldots, A_n\}$ isomorphic to the P2P network
graph $G\langle P, L \rangle$ is formed by assigning a learning automaton $A_i$ to each peer
$p_i$. Each learning automaton comprises $m$ action-sets (and action probability vectors),
where $m$ denotes the number of available resource types. The action-set of automaton
$A_i$ is denoted as $\alpha_i = \{\alpha_{ik} \mid 1 \le k \le m\}$. Let
$\alpha_{ik} \in \alpha_i$ denote the action-set of the $k$th action probability vector of
learning automaton $A_i$ assigned to peer $p_i$. The action-set $\alpha_{ik}$ includes an
action $\alpha_{ik}^{j}$ for each neighboring peer $p_j$ of peer $p_i$. Let us assume that
$D_i$ denotes the number of peers that are directly connected to (are neighbors of) peer
$p_i$. Hence, for each action probability vector $1 \le k \le m$, action-set
$\alpha_{ik} = \{\alpha_{ik}^{j} \mid \forall (p_i, p_j) \in L\}$ includes $D_i$ different
actions. Selection of action $\alpha_{ik}^{j}$ means that automaton $A_i$ selects the
connection $(p_i, p_j)$ to forward the query message of resource $r_k$. Let
$p_{ik}^{j} \in p_{ik}$ denote the choice probability of action $\alpha_{ik}^{j}$ by
automaton $A_i$ (or communication link $(p_i, p_j)$) to locate the peer providing resource
$r_k$. For each action-set $\alpha_{ik}$, all actions (communication links) are initially
chosen with the same probability $1/D_i$. This is due to the fact that the learning
automaton has no a priori knowledge of the resource location. So, at first it impartially
selects the links at random.
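
The initial state of the network of learning automata can be illustrated with a short Python sketch. It builds, for every peer and every resource type, a uniform probability vector over the neighboring peers. The function name `init_automata`, the nested-dictionary layout, and the assumption that links are undirected are choices made for this example only.

```python
def init_automata(links, resource_types):
    """Build one uniform probability vector per peer and resource type.

    links          : iterable of (peer_i, peer_j) pairs, assumed undirected
    resource_types : iterable of resource type identifiers
    Returns automata[peer][resource][neighbour] = 1 / D_i.
    """
    neighbours = {}
    for i, j in links:
        neighbours.setdefault(i, set()).add(j)
        neighbours.setdefault(j, set()).add(i)
    resource_types = list(resource_types)
    automata = {}
    for i, nbrs in neighbours.items():
        # Every link of peer i starts with the same probability 1 / D_i.
        automata[i] = {k: {j: 1.0 / len(nbrs) for j in sorted(nbrs)}
                       for k in resource_types}
    return automata
```

For example, a peer with three neighbors starts with probability 1/3 on each link for every resource type, which reflects the lack of prior knowledge about resource locations.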
P2P grid is a highly dynamic, scalable environment where the peers frequently and
unpredictably enter, depart, and rejoin the system. Under such circumstances, the topology
of the P2P network, and consequently that of the isomorphic network of learning automata,
changes frequently. The action probability vectors must be updated upon a topological
change. When a peer $p_j$ joins the P2P system, it sends an ERQ (Enter ReQuest) message to
all its neighboring peers. The ERQ message includes the sender ID, resource information,
and neighbors' IDs. Upon receiving an ERQ message, each neighboring peer $p_i$ calls
procedure ERQ($p_i$) shown in Fig. 2. In this procedure, each neighboring peer $p_i$ checks
the information of the resources provided by the newly joined peer. If the newly arrived
peer brings one or more new types of resource to the system, each neighboring peer $p_i$
creates a new action-set for every new type of resource. In this case, since all peers
must be aware of the new resource types, the ERQ message is flooded within the network. To
do so, each peer resends the received ERQ message to its neighboring peers until the TTL
of the ERQ message expires. Regardless of whether a new resource type is provided, each
neighboring peer $p_i$ must update all its action-sets and action probability vectors by
adding a new action $\alpha_{ik}^{j}$ for each resource $k$, as shown in Lines 04–10 of
Fig. 2.
When a peer $p_j$ decides to leave the P2P grid system, it sends a DRQ (Departure ReQuest)
message to all its neighboring peers. Each neighboring peer $p_i$ calls procedure
DRQ($p_i$) shown in Fig. 3 as soon as it receives a DRQ message. Neighboring peer $p_i$
removes the action corresponding to the leaving peer $p_j$ from all its action-sets. To do
this, the choice probability of all remaining actions must be increased in proportion to
the choice probability of the removed action (see Lines 04–09 of Fig. 3). For each
resource $r_k \in R$ provided by leaving peer $p_j$, action-set $\alpha_{ik}$ of all
learning automata $A_i \in A$ must be removed if no other peer can provide such a resource.
In this case, each peer that receives the DRQ message resends it to all its neighboring
peers. In the proposed method, the action-sets play the role of the resource and routing
tables that are used in the other approaches. The aim of procedures ERQ and DRQ is to keep
the routing and resource information of the grid system up to date. Sending a DRQ message
is not mandatory to remove a peer from the grid system. The action-sets of the learning
automata are updated by removing the resources that are provided by the leaving peer as
soon as one routing query fails to access the leaving peer. The first neighboring peer
that cannot connect to the leaving peer generates the DRQ message. In this way, the
proposed resource discovery mechanism tolerates peer failures.
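
The probability adjustment performed when a neighbor departs can be sketched as below. Since the pseudo code of Fig. 3 is not reproduced in the text, this is only one plausible reading: the remaining probabilities are divided by their sum, so that each grows in proportion to its current value and the vector sums to one again. The function name `remove_neighbour` and the dictionary layout are assumptions of the example.

```python
def remove_neighbour(prob_vector, leaving):
    """Drop the action of a departed peer and renormalise the rest.

    prob_vector : dict mapping neighbour ID to choice probability
    leaving     : ID of the peer that left (hypothetical identifier)
    Returns a new dict; the input is not modified.
    """
    remaining = {j: q for j, q in prob_vector.items() if j != leaving}
    if not remaining:
        return {}                       # no neighbour left to forward to
    total = sum(remaining.values())     # equals 1 - p(removed action)
    if total <= 0:
        # Degenerate case: fall back to a uniform vector (an assumption).
        return {j: 1.0 / len(remaining) for j in remaining}
    return {j: q / total for j, q in remaining.items()}
```

With probabilities 0.5, 0.3 and 0.2, removing the last action yields 0.625 and 0.375, so the relative preference between the surviving links is preserved.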
Fig. 2. Pseudo code of procedure ERQ (enter request).
When a user asks the P2P grid system for a resource of type $r_k$, its resource query is
initially submitted to its local peer $p_i$ as an RSQ (ReSource Query) message. The RSQ
message includes the source peer $p_d$, the travelled path $G^{s}_{k}$, receiver ID
$\mathcal{R}_{id}$, path length $L^{s}_{k}$, and dynamic validity threshold $v^{s}_{k}$.
Source peer $p_d$ is the peer to which the resource query is submitted. Travelled path
$G^{s}_{k}$ is a stack structure comprising the IDs of the peers traversed at stage $k$ to
locate resource $r_k$. At each stage $k$, each activated peer must append its ID to the
travelled path $G^{s}_{k}$ by a PUSH operation
($G^{s}_{k} \leftarrow G^{s}_{k} \cup ID(p_i)$). Path length $L^{s}_{k}$ is defined as the
number of peers traversed at stage $k$ to locate the requested resource. Dynamic validity
threshold $v^{s}_{k}$ denotes the average number of peers traversed during the $s-1$
earlier stages to find the resource location. Validity threshold $v^{s}_{k}$ is initially
(i.e., for the first stage) set to the number of peers. In the RSQ message that is received
at source $p_d$, stack $G^{s}_{k}$ is empty, receiver ID $\mathcal{R}_{id}$ is the ID of
$p_d$, and path length $L^{s}_{k}$ is set to zero.
Upon receiving an RSQ message at each stage $s$, each peer $p_i$ calls procedure
RSQ($p_i$, $s$, $k$) shown in Fig. 4. Procedure RSQ($p_i$, $s$, $k$) is run locally at each
peer $p_i$ for locating resource $r_k$ at stage $s$. In this procedure, source peer $p_d$
first checks its resource table, stored as the action-sets, to see if resource $r_k$ is
provided by the P2P grid system. If that is not the case (i.e., if
$\alpha_{ik} \notin \alpha_i$), peer $p_d$ returns an error message stating that the
requested resource is not available and terminates the procedure. Otherwise, every peer
$p_i$ increments path length $L^{s}_{k}$ by one and checks its available resources to see
if it is able to provide the requested resource itself. If so, peer $p_i$ (which is
hereafter called the resource providing peer and denoted as $p_{d'}$) returns its location
by an RLC (Resource LoCation) message to the source peer to which the user submitted its
resource request. The RLC message is composed of five parts: resource providing peer
$p_{d'}$, receiver ID $\mathcal{R}_{id}$, path $G^{s}_{k}$ including the peers travelled
from the source peer to the resource providing peer in stack order, updated validity
threshold $v^{s}_{k+1}$, and reinforcement signal $\beta^{s}_{k}$.
Fig. 3. Pseudo code of procedure DRQ (departure request).
Fig. 4. Pseudo code of procedure RSQ (resource query).
Reinforcement signal $\beta^{s}_{k}$ is used to update the internal state of the activated
learning automata based on the optimality of the path travelled to locate the resource. To
set the reinforcement signal $\beta^{s}_{k}$, resource providing peer $p_{d'}$ compares
path length $L^{s}_{k}$ with validity threshold $v^{s}_{k}$. If
$L^{s}_{k} \le v^{s}_{k}$, then $\beta^{s}_{k}$ is set to zero and all learning automata
corresponding to the peers included in $G^{s}_{k}$ are rewarded. Otherwise, it is set to
one and all learning automata are penalized. At each stage $k$, validity threshold
$v^{s}_{k+1}$ is updated as

$$v^{s}_{k+1} = \frac{(k-1)\,v^{s}_{k} + L^{s}_{k}}{k} \tag{4}$$
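
The decision made by the resource providing peer can be sketched as a small Python function that returns the reinforcement signal and the updated threshold of Eq. (4). The function and argument names are hypothetical, and the stage counter is assumed to start at 1, so that after the first stage the threshold equals the first observed path length.

```python
def reinforcement_and_threshold(path_length, threshold, stage):
    """Return (beta, new_threshold) for one resource discovery stage.

    path_length : number of peers traversed at this stage
    threshold   : running average path length of the earlier stages
    stage       : 1-based stage counter (an assumption of this sketch)
    """
    # beta = 0 rewards the path, beta = 1 penalizes it.
    beta = 0 if path_length <= threshold else 1
    # Eq. (4): running average over all stages seen so far.
    new_threshold = ((stage - 1) * threshold + path_length) / stage
    return beta, new_threshold
```

For instance, with an initial threshold of 256 peers, a first path of length 5 is rewarded and the threshold becomes 5; a second path of length 7 is then penalized and the threshold becomes 6.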
Otherwise (if peer $p_i$ cannot provide the requested resource), peer $p_i$ activates its
corresponding automaton $A_i$. Learning automaton $A_i$ updates action-set $\alpha_{ik}$
and action probability vector $p_{ik}$ by temporarily disabling the actions corresponding
to the peers selected so far (included in $G^{s}_{k}$), as described earlier in Section 2
and procedure DRQ. This avoids loop formation and repeated peers in $G^{s}_{k}$. Then,
learning automaton $A_i$ randomly chooses one of its possible actions from $\alpha_{ik}$
based on $p_{ik}$, if any. If there is no action left in action-set $\alpha_{ik}$,
travelled path $G^{s}_{k}$ is traced back to find a peer with a non-empty action-set. This
is done by sending a TRB (TRacing Back) message to the peer appended to stack $G^{s}_{k}$
before the current peer. This peer resumes the resource discovery process and chooses one
of its possible actions from its non-empty action-set. Let us assume that automaton $A_i$
chooses action $\alpha_{ik}^{j}$. This implies that peer $p_j$ is the next peer to which
the task of resource location is entrusted. The selected action is temporarily removed
from the action-set $\alpha_{ik}$. Peer $p_i$ sends an RSQ message to peer $p_j$ through
communication link $(p_i, p_j)$. This process continues until the resource providing peer
$p_{d'}$ is found.
As mentioned earlier, resource providing peer $p_{d'}$ sends the location of the requested
resource to the user along traversed path $G^{s}_{k}$ by an RLC message. To do so, the
resource providing peer $p_{d'}$ extracts the peer appended to stack $G^{s}_{k}$ before
itself (e.g., peer $p_i$) and sends an RLC message to it. Upon receiving an RLC message,
each peer $p_i$ calls procedure RLC shown in Fig. 5. In this procedure, the reinforcement
signal $\beta^{s}_{k}$ is first checked and the internal state of automaton $A_i$ is
updated by applying Eq. (1) to $p_{ik}$ if $\beta^{s}_{k}$ is zero and Eq. (2) otherwise.
After updating the action probability vector, the action-set must be restored by enabling
the disabled actions. Then, peer $p_i$ extracts the peer appended to stack $G^{s}_{k}$
before itself (e.g., $p_j$) and sends an RLC message to it (see Lines 07–09 of Fig. 5).
This procedure repeats until the RLC message is received at source peer $p_d$. When the
RLC message is received at source peer $p_d$, the resource discovery process is over and
source peer $p_d$ can be connected to resource providing peer $p_{d'}$ through $G^{s}_{k}$.
Upon receiving a TRB message, peer $p_i$ calls procedure
TRB($p_i$, s, k). In this procedure, learning automaton $A_i$ checks its
action-set to see if it is empty. If so, peer $p_i$ decrements the path
length $L_s^k$ by one and sends a TRB message to the peer $p_j$ that was
added to $G_s^k$ before peer $p_i$. This process is repeated until a
peer with a non-empty action-set is found. In this case, the learning
automaton corresponding to the found peer selects one of its
actions at random according to $p_{ik}$, and resumes the resource
discovery process by sending an RSQ message to the selected peer
(see Lines 07–09 of Fig. 6).
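The forwarding and tracing-back behaviour described above can be sketched roughly as follows. This is only an illustrative sketch, not the pseudo code of Figs. 5 and 6: the function name `discover`, the dictionaries `neighbors` and `probs`, the `visited` set (used here to keep cycles out of the path), and the message-count limit `ttl` are all assumptions introduced for the example.

```python
import random

def discover(neighbors, probs, providers, source, rng, ttl=64):
    """Route one query from `source`; return (path, messages) or (None, messages).

    neighbors: dict peer -> list of adjacent peers (the automaton's actions)
    probs:     dict peer -> dict neighbor -> action probability (all > 0)
    providers: set of peers that hold the requested resource
    """
    path = [source]                      # stack of traversed peers
    visited = {source}                   # assumption: a peer is never re-entered
    enabled = {source: list(neighbors[source])}
    messages = 0                         # RSQ and TRB messages sent so far
    while path:
        peer = path[-1]
        if peer in providers:
            return path, messages
        if messages >= ttl:
            break
        candidates = [n for n in enabled[peer] if n not in visited]
        if not candidates:
            # Empty action-set: trace back to the previous peer on the stack.
            path.pop()
            if path:
                messages += 1            # one TRB message
            continue
        # Sample among the enabled actions; choices() rescales the weights.
        weights = [probs[peer][n] for n in candidates]
        nxt = rng.choices(candidates, weights=weights, k=1)[0]
        enabled[peer].remove(nxt)        # temporarily disable the chosen action
        enabled[nxt] = list(neighbors[nxt])
        visited.add(nxt)
        path.append(nxt)
        messages += 1                    # one RSQ message
    return None, messages

# Tiny example: a chain 0-1-2-3 with a dead-end branch 1-4; peer 3 is the provider.
neighbors = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
probs = {p: {n: 1.0 / len(ns) for n in ns} for p, ns in neighbors.items()}
path, messages = discover(neighbors, probs, {3}, 0, random.Random(1))
print(path, messages)
```

In this toy topology the returned path is always 0, 1, 2, 3; the message count is larger when the dead-end peer 4 is tried first, because one RSQ and one TRB message are spent on it.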
4. Experimental results
In this section, several simulation experiments are performed to
investigate the efficiency of the proposed resource discovery algorithm,
called LARD (short for Learning Automata-based Resource
Discovery algorithm), under three different grid sizes: small, medium,
and large scale P2P grids. The small scale P2P grid system is
composed of 256 peers and 1024 resources of 4 different resource
types (each resource type having four classes). The medium scale P2P
grid system is composed of 2048 peers and 8192 resources. The large
scale P2P grid system is composed of 16,384 peers and 65,536
resources. In real scenarios, large scale P2P grids may include tens of
thousands of peers or even more. However, in this paper, large scale grid
systems are composed of 16,384 peers. Resources are generally of
4 different types: CPU, memory, disk, and operating system. CPU,
memory, and disk can be of four different capacities: low, moderate,
high, and very high. The operating system can also be of four different
types on different machines. Therefore, grid resources are generally of
4 different types and 16 classes.
Fig. 5. Pseudo code of procedure RLC (resource location).
Fig. 6. The pseudo code of procedure TRB (tracing back).
Resources are evenly and randomly distributed among the
peers. 1024, 8192, and 65,536 resource queries are submitted to
randomly chosen peers of the small, medium, and large scale
systems, respectively. Queries are for different resource types selected at
random. P2P network topologies are generated as follows. For
small, medium, and large scale systems, peers are randomly and
evenly distributed within a square simulation area of size
250 × 250, 1000 × 1000, and 4000 × 4000 units, respectively. Neighboring
peers are connected together if the Euclidean distance between
them is less than or equal to 20, 40, and 80 units for small, medium,
and large scale P2P grid systems, respectively. The nominal
bandwidth of the network connecting every two peers is assumed
to be 10 Mbps. To improve the precision of the reported results,
each experiment is independently repeated 50 times and the
obtained results are averaged over these runs. The performance
of the proposed resource discovery algorithm is compared with
that of KL (a resource discovery method proposed by Kocak and
Lacks (2012) in which the network routers are responsible for
locating the grid resources) and DWC (an ant colony-based
resource discovery algorithm proposed by Deng et al. (2009)) in
terms of the following metrics of interest.
Hop count: This metric is defined as the average number of
peers that are traversed to locate the requested resource. Hop
count is affected by the network routing mechanism, resource
distribution, and prior knowledge of the resource location.
Hit ratio: This is defined as the percentage of successful
resource discoveries. A resource discovery is successful if at
least one resource providing peer can be found for the
requested resource before the TTL expires.
Control message overhead: This metric is defined as the number
of (extra) control messages required for the resource discovery
process. It is measured as the number of control
messages that must be sent per second.
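The topology construction and the three metrics described above could be sketched as follows. This is a minimal illustration under stated assumptions rather than the simulator used in the experiments: the function names `build_topology` and `summarize`, the per-query record layout `(found, hops, control_msgs)`, and the choice to average hops over successful queries only are all hypothetical. The pairwise distance loop is quadratic in the number of peers, so it is practical for the small scale setting but slow for the large one.

```python
import math
import random

def build_topology(n_peers, side, radius, rng):
    """Place peers uniformly at random in a side x side square and link
    every pair whose Euclidean distance is at most `radius`."""
    pos = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n_peers)]
    links = {i: set() for i in range(n_peers)}
    for i in range(n_peers):
        for j in range(i + 1, n_peers):
            dx = pos[i][0] - pos[j][0]
            dy = pos[i][1] - pos[j][1]
            if math.hypot(dx, dy) <= radius:
                links[i].add(j)
                links[j].add(i)
    return pos, links

def summarize(records, duration_s):
    """records: list of (found, hops, control_msgs), one tuple per query.

    Returns (average hop count, hit ratio in percent, control messages per second).
    Assumption: hops are averaged over successful discoveries only.
    """
    hits = [r for r in records if r[0]]
    hit_ratio = 100.0 * len(hits) / len(records)
    avg_hops = sum(r[1] for r in hits) / len(hits) if hits else float("nan")
    overhead = sum(r[2] for r in records) / duration_s
    return avg_hops, hit_ratio, overhead

# Small scale setting from the text: 256 peers, 250 x 250 area, link radius 20.
pos, links = build_topology(256, 250.0, 20.0, random.Random(0))
avg_degree = sum(len(v) for v in links.values()) / len(links)
print(f"average degree: {avg_degree:.2f}")
```

The medium and large scale settings follow by changing the arguments to (2048, 1000, 40) and (16384, 4000, 80), respectively.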
In our experiments, the learning algorithm is $L_{R-P}$ with the
same reward and penalty parameters (learning rate). Obviously,
the effectiveness of the proposed algorithm directly depends on
the choice of a proper learning rate. By a proper choice of the
learning rate, a trade-off between the cost of the algorithm (control
message overhead) and the solution optimality (hit ratio and hop
count) can be made. Depending on the nature of the application,
different learning rates can be chosen. If an application sacrifices
the cost in favor of the solution optimality, a small learning rate is
preferred; otherwise, a larger one can be chosen. Several experiments
were initially conducted to determine the best value of the
learning rate. To find such a proper value, different learning rates
ranging from 0.05 to 0.5 were tested on small, medium, and large
scale P2P grids. The obtained results showed that the best results
are achieved when the learning rate is set to 0.075, 0.080, and
0.090 for small, medium, and large scale systems, respectively.
Therefore, the learning rate is set to the above mentioned values
in different P2P grid scales for further experiments.
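For illustration, one step of a linear reward-penalty update can be sketched as below. The sketch uses the textbook form of the scheme (Narendra and Thathachar, 1989) and assumes that it corresponds to Eqs. (1) and (2), which are not reproduced in this section; the function name `lrp_update` and the `LEARNING_RATE` table are hypothetical, while the three rate values are those reported above.

```python
# Learning rates reported in the text for each grid scale.
LEARNING_RATE = {"small": 0.075, "medium": 0.080, "large": 0.090}

def lrp_update(p, chosen, rewarded, a, b=None):
    """Return the action probability vector after one L_{R-P} step.

    p:        list of action probabilities (sums to 1, at least two actions)
    chosen:   index of the action that was selected
    rewarded: True for a favourable response, False for a penalty
    a, b:     reward and penalty parameters; b defaults to a (equal rates)
    """
    if b is None:
        b = a
    r = len(p)
    if rewarded:
        # Move probability mass toward the chosen action.
        return [pj + a * (1.0 - pj) if j == chosen else (1.0 - a) * pj
                for j, pj in enumerate(p)]
    # Penalty: shrink the chosen action and spread the mass over the others.
    return [(1.0 - b) * pj if j == chosen else b / (r - 1) + (1.0 - b) * pj
            for j, pj in enumerate(p)]

# Example: four neighbours, uniform start, medium scale learning rate.
p = [0.25, 0.25, 0.25, 0.25]
p = lrp_update(p, chosen=0, rewarded=True, a=LEARNING_RATE["medium"])
print(p, sum(p))
```

Both branches keep the vector normalized, so repeated rewards along a good path gradually concentrate the probability on the corresponding neighbour; a smaller rate makes this concentration slower but less sensitive to individual responses.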
4.1. Hop count
The aim of this experiment is to show the ability of different
resource discovery algorithms to locate the nearest peer providing
the requested resource. Fig. 7 compares the
average hop count of the proposed resource discovery algorithm
with KL (Kocak and Lacks, 2012) and DWC (Deng et al., 2009) for
different grid scale scenarios. From the results shown in this
figure, it can be seen that the average hop count increases as the
system scale (network size) increases. One possible reason might
be that the resources are distributed over a wider area and so the
distance (number of hops) between the user and the resource
increases. The results shown in Fig. 7 are averaged over the
number of resource queries submitted to the system. Each
experiment is repeated 50 times and the results are also averaged
over the number of runs. Comparing the results given in Fig. 7, it
is clear that the proposed algorithm significantly outperforms the
other algorithms in terms of the number of hops: KL (Kocak and
Lacks, 2012) lags far behind LARD, and DWC (Deng et al., 2009)
has the largest hop count. One reason is that the proposed
algorithm avoids cycles and redundant peers in the
constructed path. The results also show that the gap between the
proposed algorithm and the other methods becomes more significant
as the system scale increases. Contrary to KL (Kocak
and Lacks, 2012) and DWC (Deng et al., 2009), no significant
growth can be seen in the number of hops of the proposed
algorithm as the network size increases. This is because the
proposed algorithm is fully distributed and independent of
the network size.
4.2. Hit ratio
Hit ratio, which represents the rate of successful discoveries, is a
very important measure for evaluating the effectiveness of a resource
discovery algorithm. This set of experiments is performed to
investigate the hit ratio of different algorithms under different
grid scales. The obtained results are shown in Fig. 8. From the
results shown in this figure, it is obvious that the hit ratio of
the proposed resource discovery algorithm is higher than that of
the other approaches. Comparing the results shown in Fig. 8, we
find that the gap between the proposed algorithm and the other
methods becomes larger as the network size grows. This shows
the higher scalability of LARD. By taking advantage of learning
automata, the proposed method is able to memorize the shortest
path toward the resource. This path is stored as the probability
vectors of the learning automata. When a path is constructed to
connect a requesting peer to a resource provider, it can also be used to
connect the intermediate peers for queries for the same resource.
Among different possible paths toward the same resources, LARD
converges to the shortest path. That is why LARD selects the more
probable paths and has a higher hit ratio.
Fig. 7. Average hop count under different grid scales. (Plot: hop count, axis from 0 to 20, versus grid size (small, medium, and large scale) for KL, DWC, and LARD.)
4.3. Control message overhead
These experiments are conducted to measure and compare the
control message overhead of different resource discovery mechanisms.
The experimental results are depicted in Fig. 9. Comparing the
results shown in this figure, it can be seen that the proposed
algorithm LARD has the lowest rate of control message overhead
and DWC (Deng et al., 2009) has the highest one. KL (Kocak and
Lacks, 2012) encapsulates the resource discovery packets within
the TCP/IP packets and so it imposes considerably fewer
extra control packets on the system than DWC (Deng
et al., 2009). As mentioned earlier, the main objective of the
proposed algorithm is to alleviate the impact of the network-wide
broadcast storm problem (to reduce the number of broadcasts).
The proposed algorithm sends the resource query messages only
to the peers that are much more likely to have the requested
resources. As the proposed algorithm proceeds, the resource
queries are forwarded along the shortest paths connecting the
peers with a probability as close to one as possible. This
meaningfully decreases the rate of extra message overhead required for
resource discoveries. As shown in Fig. 9, the rate of control
message overhead increases as the grid becomes larger. This is
expected because the hop count, and hence the number of rebroadcasts,
increases as the P2P network size increases.
5. Conclusion
In this paper, a decentralized learning automata-based
resource discovery algorithm was proposed for large-scale
unstructured P2P grids. This algorithm was designed to relieve
the negative impacts of the global flooding problem on the
network performance and to support multi-attribute range
queries. In this method, the resource queries are forwarded
through the shortest paths ending at the grid peers more likely
to have the requested resources. In the proposed algorithm, each
peer is equipped with a learning automaton and the network of
learning automata is responsible for routing the query toward
the resource provider through the shortest path. The proposed
algorithm supports the high dynamicity of scalable P2P
grids. Several simulation experiments were conducted on small,
medium, and large scale P2P grid environments to show the
performance of the proposed resource discovery algorithm. The
obtained results were compared with those of KL (Kocak and
Lacks, 2012) and DWC (Deng et al., 2009) in terms of average hop
count, average hit ratio, and control message overhead. Numerical
results showed that the proposed algorithm outperformed KL
(Kocak and Lacks, 2012) and DWC (Deng et al., 2009) in small,
medium, and large scale grids alike. The more significant gap between
the hop count, hit ratio, and message overhead of the proposed
algorithm and those of the others for large scale grid environments shows
the higher scalability of the proposed algorithm as compared to
KL (Kocak and Lacks, 2012) and DWC (Deng et al., 2009).
References
Akbari Torkestani J. A new approach to the job scheduling problem in
computational grids, Cluster Computing, in press, 2012a.
Akbari Torkestani J. LAAP: a learning automata-based adaptive polling scheme for
clustered wireless ad hoc networks, Wireless Personal Communication, in
press, 2012b.
Akbari Torkestani J. An adaptive learning automata-based ranking function
discovery algorithm, Journal of Intelligent Information Systems, in press,
2012c.
Akbari Torkestani J. An adaptive focused web crawling algorithm based on
learning automata, Applied Intelligence, in press, 2012d.
Akbari Torkestani J. Backbone formation in wireless sensor networks, Sensors and
Actuators A: Physical, in press, 2012e.
Akbari Torkestani J. Mobility prediction in mobile wireless networks. Journal of
Network and Computer Applications 2012f;35:1633–45.
Akbari Torkestani J. A stable virtual backbone for wireless MANETS,
Telecommunication Systems Journal, in press, 2012g.
Akbari Torkestani J. An adaptive backbone formation algorithm for wireless sensor
networks. Computer Communications 2012h;35:1333–44.
Akbari Torkestani J. Degree constrained minimum spanning tree problem in
stochastic graph. Journal of Cybernetics and Systems 2012i;43(1):1–21.
Akbari Torkestani J. An adaptive heuristic to the bounded-diameter minimum
spanning tree problem, Soft Computing, in press, 2012j.
Akbari Torkestani J. An adaptive learning to rank algorithm: learning automata
approach. Decision Support Systems, in press, 2012k.
Akbari Torkestani J, Meybodi MR. LLACA: an adaptive localized clustering
algorithm for wireless ad hoc networks based on learning automata. Journal of
Computers & Electrical Engineering 2011a;37:461–74.
Akbari Torkestani J, Meybodi MR. A link stability-based multicast routing protocol
for wireless mobile ad hoc networks. Journal of Network and Computer
Applications 2011b;34(4):1429–40.
Akbari Torkestani J, Meybodi MR. Finding minimum weight connected dominating
set in stochastic graph based on learning automata. Information Sciences
2012;200:57–77.
Andrzejak A, Xu Z. Scalable, efficient range queries for grid information services. In:
Proceedings of 2nd international conference on P2P computing, pp. 33–40,
2002.
Cai M, Frank M, Chen J, Szekely P. MAAN: a multi-attribute addressable network
for grid information services. In: Proceedings of 4th international workshop on
grid computing, pp. 184–191, 2003.
Deng Y, Wang F, Ciura A. Ant colony optimization inspired resource discovery in
P2P grid systems. Journal of Supercomputing 2009;49:4–21.
Eugster PT, Guerraoui R, Kermarrec AM, Massoulie L. From epidemics to
distributed computing. IEEE Computer 2004;37(5):60–7.
Iamnitchi A, Foster IT. A P2P approach to resource location in grid environments,
grid resource management. In: Weglarz J, Nabrzyski J, Schopf J, Stroinski M,
editors. Kluwer; 2003.
Kocak T, Lacks D. Design and analysis of a distributed grid resource discovery
protocol. Cluster Computing 2012;15(1):37–52.
Marzolla M, Mordacchini M, Orlando S. Resource discovery in a dynamic grid
environment. In: Proceedings of DEXA workshop, pp. 356–360, 2005.
Fig. 9. Control message overhead under different grid scales. (Plot: control message overhead, axis from 0 to 0.07, versus grid size (small, medium, and large scale) for KL, DWC, and LARD.)
Fig. 8. Average hit ratio under different grid scales. (Plot: hit ratio, axis from 0.8 to 1, versus grid size (small, medium, and large scale) for KL, DWC, and LARD.)
Mastroianni C, Talia D, Verta O. A super-peer model for building resource
discovery services in grids: design and simulation analysis. In: Proceedings
of European grid conference, LNCS, vol. 3470, pp. 132–143, 2005a.
Mastroianni C, Talia D, Verta O. A super-peer model for resource discovery services
in large-scale grids. Future Generation Computer Systems 2005b;21:1235–48.
Merz P, Gorunova K. Fault-tolerant resource discovery in P2P grids. Journal of Grid
Computing 2007;5:319–35.
Narendra KS, Thathachar MAL. Learning automata: an introduction. New York:
Prentice-Hall; 1989.
Puppin D, Moncelli S, Baraglia R, Tonelotto N, Silvestri F. A grid information service
based on P2P. In: Proceedings of 11th Euro-Par conference, LNCS, vol. 3648,
pp. 454–464, 2005.
Ratnasamy S, Hellerstein JM, Shenker S. Range queries over DHTs, IRB-TR-03-009,
Intel Corporation, 2003.
Ratnasamy S, Francis P, Handley M, Karp RM, Shenker S. A scalable
content-addressable network. In: Proceedings of ACM SIGCOMM 2001 conference on
applications, technologies, architectures, and protocols for computer
communication, pp. 161–172, 2001.
Rowstron A, Druschel P. Pastry: scalable, decentralized object location and routing
for large scale P2P systems. In: Proceedings of IFIP/ACM international
conference on distributed systems platforms, middleware, LNCS, vol. 2218,
pp. 329–350, 2001.
Schmidt C, Parashar M. Flexible information discovery in decentralized distributed
systems. In: Proceedings of 12th international symposium on
high-performance distributed computing, pp. 226–235, 2003.
Spence D, Harris T. XenoSearch: distributed resource discovery in the XenoServer
open platform. In: Proceedings of the 12th IEEE international symposium on
high performance distributed computing, pp. 216–225, 2003.
Stoica I, Morris R, Karger DR, Frans Kaashoek M, Balakrishnan H. Chord: a scalable
P2P lookup service for internet applications. In: Proceedings of ACM SIGCOMM
2001 conference on applications, technologies, architectures, and protocols for
computer communication, pp. 149–160, 2001.
Talia D, Trunfio P. P2P protocols and grid services for resource discovery on grids.
In: Grandinetti L, editor. Grid computing: the new frontier of high
performance computing, advances in parallel computing, Vol. 14. Elsevier Science;
2005. p. 83–105.
Thathachar MAL, Harita BR. Learning automata with changing number of actions.
IEEE Transactions on Systems, Man, and Cybernetics 1987;SMC-17:1095–100.
Trunfio P, Talia D, Papadakis H, Fragopoulou P, Mordacchini M, Pennanen M, Popov
K, Vlassov V, Haridi S. P2P resource discovery in grids: models and systems.
Future Generation Computer Systems 2007;23:864–78.
Yu H, Bai X, Marinescu DC. Workflow management and resource discovery for an
intelligent grid. Parallel Computing 2005;31:797–811.