
Honeysort

a framework for adaptive distributed algorithms

Presented by Yookyung Jo
Motivation
• Demand for distributed protocols and algorithms that adapt to dynamic system conditions, preferably in a decentralized manner
• Distributed algorithms are sensitive to dynamic system conditions
– Distributed sorting : network bandwidth, memory, CPU
– Image compression : (low CPU & high bandwidth) vs. (high CPU & low bandwidth)
Motivation (2)
• Existing approaches to adaptive distributed computing in the Grid require :
– Elaborate tuning of adaptation parameters
– Centralized control of adaptation
• Solution : social insects' collective decision-making mechanism
– Individual insects' local, simple behavior => desired global patterns
Honey bees’ foraging :
collective decision making
• Honey bees’ foraging behavior
– Given multiple nectar sources
– All bees converge on the best nectar source
– Without centralized control
• Distributed algorithms
– Various candidate algorithms are tried
– All machines converge to the best
algorithm
State transition :
[Figure: state transition diagram. States per nectar source: Dancing (d1, d2, d3), Foraging (a1, a2, a3); shared state: Following (f)]
Translation
Honey bee => Distributed algorithm
• N honey bees => N distributed machines or processes
• M nectar sources => M different local algorithms
• Foraging => Algorithm execution
• Dancing => Algorithm advertising
• Following => Algorithm selection
Difference equation

$$a_i(t) = a_i(t-1)\,(1 - pf_i) + \frac{sq_i\, d_i(t-1)}{\sum_j sq_j\, d_j(t-1)}\, f(t-1)$$

where
Positive feedback :
$$d_i(t-1) = pd_i\, a_i(t-1)$$
$$f(t-1) = \sum_j pf_j\, a_j(t-1)$$
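As a minimal sketch, one update of the difference equation can be written as a function over parallel parameter vectors. The function name `step` and the vector layout are assumptions made for illustration, and `sq` is read here as the quality of each algorithm, which the slides do not spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One step a(t-1) -> a(t) of the difference equation.
// sq, pd, pf, a are indexed by candidate algorithm (hypothetical layout).
std::vector<double> step(const std::vector<double>& sq, const std::vector<double>& pd,
                         const std::vector<double>& pf, const std::vector<double>& a) {
    const std::size_t m = a.size();
    assert(sq.size() == m && pd.size() == m && pf.size() == m);

    double followers = 0.0;  // f(t-1) = sum_j pf_j * a_j(t-1)
    double weight = 0.0;     // sum_j sq_j * d_j(t-1), with d_j = pd_j * a_j
    for (std::size_t j = 0; j < m; ++j) {
        followers += pf[j] * a[j];
        weight += sq[j] * pd[j] * a[j];
    }

    std::vector<double> next(m);
    for (std::size_t i = 0; i < m; ++i) {
        // Share of the followers recruited by algorithm i (0 if nobody dances).
        const double share = weight > 0.0 ? sq[i] * pd[i] * a[i] / weight : 0.0;
        next[i] = a[i] * (1.0 - pf[i]) + share * followers;
    }
    return next;
}
```

Iterating `step` from a mixed starting distribution keeps the total number of machines constant while shifting them toward the better-advertised algorithm.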
Difference equation analysis
• The only stable equilibrium point :

$$(a_1, \dots, a_i, \dots, a_m) = (0, \dots, N, \dots, 0)$$

where i satisfies

$$\frac{sq_i\, pd_i}{pf_i} > \frac{sq_j\, pd_j}{pf_j} \quad \text{for all } j \neq i.$$
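The equilibrium condition above amounts to picking the algorithm with the largest ratio sq·pd/pf. A small sketch of that selection follows; the function name `predicted_winner` is hypothetical, and it assumes every pf value is positive.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index i with the largest sq_i * pd_i / pf_i (assumes all pf_i > 0).
std::size_t predicted_winner(const std::vector<double>& sq, const std::vector<double>& pd,
                             const std::vector<double>& pf) {
    assert(!sq.empty() && pd.size() == sq.size() && pf.size() == sq.size());
    std::size_t best = 0;
    for (std::size_t i = 1; i < sq.size(); ++i) {
        // Cross-multiplied form of sq_i*pd_i/pf_i > sq_best*pd_best/pf_best.
        if (sq[i] * pd[i] * pf[best] > sq[best] * pd[best] * pf[i]) best = i;
    }
    return best;
}
```

Note that a high-quality algorithm can still lose if machines abandon it too readily (large pf) or rarely advertise it (small pd).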
Brush up : sorting
• Many candidate algorithms with different computational complexity
– Quick sort (average case), heap sort … : O(N log(N))
– Insertion sort, shell sort … : O(N^2) in the worst case
• Sensitive to the initial input data
– Insertion sort : O(N) on nearly sorted data, O(N^2) on random data
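To make the input sensitivity concrete, the following sketch counts how many element moves insertion sort performs. The move counter is added here purely for illustration and is not part of the slides.

```cpp
#include <cstddef>
#include <vector>

// Insertion sort that returns the number of element moves it performed.
// Few moves on nearly sorted input, about N^2/2 on reversed input.
std::size_t insertion_sort(std::vector<int>& v) {
    std::size_t moves = 0;
    for (std::size_t i = 1; i < v.size(); ++i) {
        const int key = v[i];
        std::size_t j = i;
        // Shift larger elements one slot to the right.
        while (j > 0 && v[j - 1] > key) {
            v[j] = v[j - 1];
            --j;
            ++moves;
        }
        v[j] = key;
    }
    return moves;
}
```

On input with a single adjacent pair out of order the inner loop runs once in total, which is why insertion sort can beat an O(N log(N)) algorithm on nearly sorted data.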
Sorting : Input data
• Nearly sorted input is common
– A database table with highly correlated
attributes

Employee database table (columns : Name, Salary, Work years, Age, … ; rows : A, B, …)
– Already sorted by "Work years"
– Sort it according to "Salary"
===> Sorting problem on nearly sorted data
Honeysort
• Master : sample sort, data distribution

3 7 4 1 11 9 15 14 20 24 17 19

• Worker : while (receive_data()) {
    local_sort();                          // quick sort, insertion sort...
    if (pd) advertise_sort();              // with probability pd (dancing)
    if (pf) select_a_new_local_sort();     // with probability pf (following)
    send_back_the_data();
}

Advertising message : {local sorting algorithm used, speed}
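The worker loop above could be fleshed out roughly as follows. This is a sketch under several assumptions: networking is left out, `Worker`, `Advert`, and `handle_chunk` are hypothetical names, `std::sort` stands in for quick sort, speed is measured in elements per second, and a follower picks among the adverts it has heard with probability proportional to the advertised speed.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

enum class LocalSort { Quick, Insertion };

// Hypothetical advertising message: {local sorting algorithm used, speed}.
struct Advert {
    LocalSort algo;
    double speed;  // elements per second (assumed unit)
};

// Hypothetical worker state.
struct Worker {
    LocalSort current;
    double pd;  // probability of advertising after a chunk
    double pf;  // probability of selecting a new local sort after a chunk
    std::mt19937 rng;
};

void insertion_sort_chunk(std::vector<int>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        const int key = v[i];
        std::size_t j = i;
        while (j > 0 && v[j - 1] > key) { v[j] = v[j - 1]; --j; }
        v[j] = key;
    }
}

// Handles one received chunk: sort it, maybe advertise, maybe follow an advert.
void handle_chunk(Worker& w, std::vector<int>& chunk,
                  const std::vector<Advert>& heard, std::vector<Advert>& outbox) {
    assert(w.pd >= 0.0 && w.pd <= 1.0 && w.pf >= 0.0 && w.pf <= 1.0);

    const auto t0 = std::chrono::steady_clock::now();
    if (w.current == LocalSort::Quick) {
        std::sort(chunk.begin(), chunk.end());  // stand-in for quick sort
    } else {
        insertion_sort_chunk(chunk);
    }
    const std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
    const double speed = static_cast<double>(chunk.size()) / std::max(dt.count(), 1e-9);

    // Dancing: advertise the algorithm just used, with probability pd.
    if (std::bernoulli_distribution(w.pd)(w.rng)) {
        outbox.push_back({w.current, speed});
    }
    // Following: with probability pf, adopt an advertised algorithm,
    // chosen with probability proportional to its advertised speed.
    if (!heard.empty() && std::bernoulli_distribution(w.pf)(w.rng)) {
        std::vector<double> weights;
        for (const Advert& ad : heard) weights.push_back(ad.speed);
        std::discrete_distribution<std::size_t> pick(weights.begin(), weights.end());
        w.current = heard[pick(w.rng)].algo;
    }
}
```

Weighting the choice by advertised speed mirrors the recruitment term of the difference equation, where each dance counts in proportion to the quality it reports.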
Honeysort
[Figure legend: data transfer; bee communication (peer communication of algorithm evaluation)]
Experimental setup
• # of worker machines : 6-40
• Candidate local sorting algorithms : quick sort, insertion sort
• Machines : csil-linux#.cs.uiuc.edu
• TCP socket connection
• Sorting data : 2 4byte-integers
• Written in C++
Experiment :
Performance with fixed # of segments
[Figure panels: Random data, Pre-sorted data]
Experiment :
Performance with fixed chunk size (1000)
[Figure panels: Random data, Pre-sorted data]
Explorative model
If the characteristics of the input data change after convergence, the new best algorithm has no means of attracting machines, because no machine is running it and so none advertises it
=>
Give every candidate algorithm a non-zero influx of machines (constant r)

$$a_i(t) = a_i(t-1)\,(1 - pf_i) + \frac{sq_i\, d_i(t-1) + r}{\sum_j sq_j\, d_j(t-1) + r}\, f(t-1)$$
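A sketch of one update of the explorative model follows. It assumes that r is added once per candidate algorithm inside the sum, so that the recruitment shares add up to one and the number of machines is conserved; the slide's notation leaves this open. The function name `explorative_step` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One step of the explorative model with exploration constant r > 0.
// Assumption: the denominator is sum_j (sq_j * d_j + r).
std::vector<double> explorative_step(const std::vector<double>& sq,
                                     const std::vector<double>& pd,
                                     const std::vector<double>& pf,
                                     const std::vector<double>& a, double r) {
    const std::size_t m = a.size();
    assert(sq.size() == m && pd.size() == m && pf.size() == m && r > 0.0);

    double followers = 0.0;  // f(t-1)
    double weight = 0.0;     // sum_j (sq_j * d_j(t-1) + r)
    for (std::size_t j = 0; j < m; ++j) {
        followers += pf[j] * a[j];
        weight += sq[j] * pd[j] * a[j] + r;
    }

    std::vector<double> next(m);
    for (std::size_t i = 0; i < m; ++i) {
        // Even an algorithm with a_i = 0 receives a share r / weight.
        const double share = (sq[i] * pd[i] * a[i] + r) / weight;
        next[i] = a[i] * (1.0 - pf[i]) + share * followers;
    }
    return next;
}
```

Starting from a converged state such as (N, 0), the unused algorithm now regains a few machines each step, so it can be re-evaluated if the input changes.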
Experiment :
Performance with varying degree of homogeneous randomization on input data
Experiment :
Performance with varying degree of input data character change
Experiment :
Evolution of local sorting algorithm with heterogeneously changing input data
Experiment :
Sensitivity to the initial distribution of local sorting algorithms
Conclusion
• Honeysort exhibits
– good performance
• Close to the best algorithm on homogeneous
input
• Better than all other algorithms on
heterogeneous input
– Desirable properties
• Dynamic adaptation without pre-tuning
• Decentralized control (decision making)
Conclusion(2)
• More applications
– Applying the honey bee adaptive distributed framework requires that
• The job can be divided and run concurrently
• Different candidate algorithms exist, with different performance under varying conditions
• Data transfer cost is far smaller than the cost of the job itself
• Preferably, a large number of machines participate
– 3D rendering algorithms…
Conclusion(3)
• Future work
– To run Honeysort on a large number of clients to see the benefit of decentralized control
– To find a good application scenario for Honeysort (e.g. stream data sorting) and experiment with it
– To apply the honey bee adaptive framework to other problems such as 3D rendering