Le Havre University, Faculty of Sciences and Technology, Laboratory of Computer Sciences. Practical training report from the 8th of March 2004 to the 10th of June 2004.
0.1. Acknowledgments I want to thank all members of the laboratory who I worked with, particularly:
M. Cyrille BERTELLE, headmaster of the DEA-ITA and one of my tutors all along this time.
M. Damien OLIVIER, my tutor and mentor, who helped me with every problem or trouble I encountered.
M. Frédéric GUINAND, who encouraged me to continue my studies at the University of Le Havre.
M. Antoine DUTOT, PhD student and author of AntCO², with whom I worked during the entire project and who was always ready to tackle new problems.
M. Guillaume PREVOST and M. Sylvain LEREBOURG, PhD students, who always answered my questions about the ProActive API in a constructive way.
Mme Emna BOUAZIZI, M. Majed ABDOULI, M. Jérôme HAUBERT, M. Denis MERON, M. Samy SAMGHOUNI, M. Pierrick TRANOUEZ and everybody else who was very patient with me.
Special thanks go also to:
My classmates, Mme. Mélanie DERRE, Mme. Noemy PICARD, M. Mahmoud ABIDER, M. Mokrane BOUARABA, M. Thomas DE CONTES, M. Jean-Claude DE SOUZA, M. Frederic DUCHAUSSOY, M. Mathieu GALLET, M. Jean Baptiste GASHUMBA, M. Anis HAJ SAID, M. Nizar IDOUDI, M. Mohamedou OULD BIBI, M. Yoann PIGNE, M. Gauthier PITOIS, M. Mathieu PRIGENT, M. Frank SANNIER
Romy FISCH, my sister, and Chantal WEIS, my wife, who helped to revise this document.
0.2. Abstracts 0.2.1. Abstract Entity-based simulations are inherently very synchronous and require a lot of computing power. This document describes an entity model for large-scale distributable simulations. It tries to break down this synchronism so that different tasks can be processed independently on distinct nodes of the network. A related architecture is also presented. As the number of entities grows, problems of overloaded CPUs or heavily loaded network connections appear. In order to set up load-balancing between the distinct nodes, an intelligent ant-based algorithm is used.
0.2.2. Résumé Les simulations centrées individus sont souvent très synchrones et ont besoin d'une puissance de calcul formidable. Ce document décrit un modèle d'entités pour de grandes simulations distribuées qui essaye de casser ledit synchronisme en vue de pouvoir faire tourner indépendamment différentes tâches sur des noeuds distincts du réseau. Une architecture adéquate sera présentée. Lorsque le nombre d'entités augmente, des problèmes de surcharge des CPU ou d'embouteillages de connexion réseau peuvent surgir. En vue d'une équilibration des charges entre les noeuds, un algorithme de fourmis intelligent sera utilisé.
0.2.3. Kurzfassung Individuum-basierte Simulationen sind oft sehr synchron und benötigen sehr viel Rechenkraft. Dieses Dokument beschreibt ein individuum-basiertes Modell für große verteilte Simulationen. Dieses versucht, hinsichtlich einer unabhängigen Ausführung verschiedener Prozesse auf den einzelnen Knoten des Netzwerkes diesen Synchronismus zu brechen. In diesem Zusammenhang wird auch eine dementsprechende Architektur präsentiert. Mit steigender Anzahl an simulierten Einheiten können Rechnerüberlastungen oder Transmissionsengpässe von Netzwerkverbindungen auftreten. Diesbezüglich wird ein intelligenter Ameisenalgorithmus eingesetzt, um einen Lastenausgleich zwischen den einzelnen Knoten zu schaffen.
Figure 1: Space division
Figure 2: Environment "torus" representation
Figure 3: Communication scheme
Figure 4: Simulation cycle
Figure 5: Entity transaction scheme
Figure 6: Simulation cycle
Figure 7: Simulation cycle with migration
Figure 8: Circular view scheme
Figure 9: View by circular zone with arc (discrete values)
Figure 10: Communication graph with AntCO²
Figure 11: Communication scheme between AntCO² and DEDIS
Figure 12: Synchronization by ProActive
Figure 13: Interface to AntCO² - Master part
Figure 14: Interface to AntCO² - Slave part
Figure 15: Simulation time response versus number of entities per machine
Figure 16: Emerging groups
Figure 17: Group splitting
Figure 18: Entities around an obstacle
Figure 19: DEDIS post-simulation visualization
Figure 20: AntCO² graph visualizer
Figure 21: AntCO² analyze
1. INTRODUCTION
This document describes the experiences, reflections and studies of my practical training at the computer science laboratory between the 8th of March 2004 and the 10th of June 2004. It is primarily dedicated to my project, a distributed entity-based simulator.
1.1. LIH 1.1.1. Presentation Under the leadership of Joël COLLOC, the Laboratory of Informatics of the University of Le Havre today counts 2 PU, 2 HDR, 18 MCF and 13 PhD students.
Its main research is carried out in the following areas:
Evolutionary systems on life and environment
Distributed artificial intelligence - Multi-agent systems
Real-time database management systems
1.1.2. Members 1.1.2.1. Professors
Alain Cardon, Joël Colloc
1.1.2.2. Lecturers
Laurent Amanton, Dominique Archambault, Mustapha Arfi, Stefan Balev, Cyrille Bertelle, Hadhoum Boukachour, Michel Coletta, Jean-Yves Colin, Marianne Deboysson-Flouret, Claude Duvallet, Dominique Fournier, Thierry Galinho Da Silva, Frédéric Guinand, Véronique Jay, Bruno Mermet, Moustapha Nakechbandi, Damien Olivier, Patrick Person, Jean-Luc Ponty, Bruno Sadeg, Frédéric Serin, Gaële Simon
1.1.2.3. PhD Students
Majed Abdouli, Emna Bouazizi, Roland Coma, Xavier Denis, Antoine Dutot, Jérôme Haubert, Sylvain Lerebourg, Denis Meron, Guillaume Prevost, Aurélia Rabia, Samy Semghouni, Pierrick Tranouez
1.1.3. Location Laboratoire d'Informatique du Havre (EA3219) 25 rue Philippe Lebon, BP 540 F-76058 Le Havre cedex
1.2. Related projects One of the pillars of the research at the LIH is life-based information technology models 1 for developing distributed or parallel applications. One of the main goals is to understand and explain the way certain natural complex systems work and to reproduce their behaviour in digital simulations. Secondly, it aims at setting up new conceptual models inspired by models taken from the real world and their specific mechanisms. As a result, it tries to define new approaches which are more appropriate for distributed and parallel systems.
The different projects I want to mention are:
Antoine DUTOT: Ant Algorithms for Adaptative Dynamic Distribution, http://www-lih.univ-lehavre.fr/~dutot/Ess2003/index.html
Guillaume PREVOST: Individual-based simulations in estuarial environments, http://www-lih.univ-lehavre.fr/~prevost/sujet.html
Sylvain LEREBOURG: Decentralized models for organisational stream simulations in complex environments, http://www-lih.univ-lehavre.fr/~lerebourg/
1 http://www-lih.univ-lehavre.fr/Recherches/Themes/miv.html
2. DESCRIPTION
2.1. Context Due to the variety and diversity of organisms that live in an aquatic ecosystem, the characteristics of each species, and the properties of their environment, it is very hard to predict what influence any change to the system may have. For obvious reasons such tests cannot be performed on real objects. On the one hand, they would be too time-consuming because the evolution of the global system is very slow; on the other hand, wrong tests may be harmful. Hence the need to perform these tests in a virtual environment, using virtual entities. There, a failed test does not result in a catastrophe. The main objective is to end up with a computer simulation of an aquatic ecosystem.
As mentioned above, such an ecosystem consists of a huge number of interacting entities. It is a complex system because the whole is more than the sum of its parts: the overall behaviour of the system cannot be explained even by complete knowledge of each kind of participating entity. The behaviour of a single entity depends on its nearby environment. Thus, every entity is influenced by other entities, which it may influence in return.
In order to obtain significant results, the number of necessary entities can easily exceed one hundred thousand. This is too much for a simulation running on a single-processor system. To overcome this problem, distributed simulation systems should be considered. Spreading the entities over many machines could speed up the entire simulation considerably.
But an ecosystem is alive, which means that some entities die and others are born. For the simulation, this means that the number of entities residing on a given machine is not stable but constantly changing. Hence there is a need for some kind of dynamic load-balancing between the participating computers.
Generally speaking, entities which are close to each other exchange more data than distant ones. This data exchange can also be seen as a type of communication. Such entities form what is called a "heavy communicating cluster". The appearance of such groups and their related movements has not been programmed explicitly. Their emergence is due to the individual behaviour of their constituent entities.
By extracting the necessary data from the simulation, a communication graph can be established. The mentioned load-balancing is based upon the latter.
2.2. Objective DEDIS is the acronym of "Distributed Environment for Dynamic Individual-based Simulations". The main goal of the DEDIS project is to find a way to dynamically distribute an entity-based simulation among multiple computers.
One of the major problems of such models is the huge number of entities which have to participate and communicate with each other. As more and more entities are added and the simulation runs over longer periods of time, it can quickly become unmanageable on a single machine. A possible solution is to distribute the entities dynamically among different computers.
This immediately raises the question of which computer will simulate which part of the simulation, with which entities. The distribution has to take into account not only the number of entities which reside on a given node, but also their need for communication with others and the node's calculation power. This explains why the simulation has to monitor parameters like the communication variation of each entity or of groups of entities, their number and the average load of their node.
2.3. Requirements #1 All code has to be written in Java.
#2 The communication part should be based on the ProActive API. ProActive is a Java library for parallel, distributed, and concurrent computing, also featuring mobility and security in a uniform framework. With a reduced set of simple primitives, ProActive provides a comprehensive API allowing to simplify the programming of applications that are distributed on Local Area Networks, on workstation clusters, or on Internet Grids. 2
2 http://www-sop.inria.fr/oasis/ProActive
3. SIMULATION MODEL
In this section I will describe the model I designed in order to obtain a dynamic, distributable and large-scale simulation.
3.1. Environment First of all, I defined the environment in which the entities live. In order to keep things clear and simple, I study the case where the space is reduced to a fixed-size 2D ground, divided into cells.
Figure 1: Space division
This grid does not influence the way entities execute. It only helps them to speed up the search for neighbours. As mentioned by [REY01], a straightforward implementation of a neighbour search algorithm has a complexity of O(n²), because a given entity has to query all remaining entities for their position; only a suitable spatial data structure makes it possible to reduce this cost to nearly O(n). As a result, the proposed space division merely serves to locate entities very quickly.
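To illustrate how such a space division speeds up neighbour search, here is a minimal sketch of a cell grid. The class name CellGrid, its methods and the choice of scanning the 3x3 block of cells around a position are assumptions made for this example, not the DEDIS implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical uniform grid that buckets entity ids by cell. */
final class CellGrid {
    private final int cellSize;
    // Key: packed cell coordinates; value: ids of the entities in that cell.
    private final Map<Long, List<Integer>> cells = new HashMap<>();

    CellGrid(int cellSize) {
        this.cellSize = cellSize;
    }

    private static long key(int cx, int cy) {
        return ((long) cx << 32) | (cy & 0xffffffffL);
    }

    /** Registers an entity at position (x, y). */
    void add(int entityId, int x, int y) {
        long k = key(Math.floorDiv(x, cellSize), Math.floorDiv(y, cellSize));
        cells.computeIfAbsent(k, unused -> new ArrayList<>()).add(entityId);
    }

    /** Returns the ids found in the cell containing (x, y) and its 8 neighbouring cells. */
    List<Integer> candidatesNear(int x, int y) {
        int cx = Math.floorDiv(x, cellSize);
        int cy = Math.floorDiv(y, cellSize);
        List<Integer> result = new ArrayList<>();
        for (int dx = -1; dx <= 1; dx++) {
            for (int dy = -1; dy <= 1; dy++) {
                List<Integer> bucket = cells.get(key(cx + dx, cy + dy));
                if (bucket != null) {
                    result.addAll(bucket);
                }
            }
        }
        return result;
    }
}
```

With such a structure an entity only inspects the few cells around it instead of asking every other entity for its position, which is where the reduction from O(n²) towards O(n) comes from when entities are spread reasonably evenly.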
Two types of environments may be considered:
Closed environment: The environment is closed on every side, e.g. if an entity reaches the left border it cannot move any further and is blocked by the environment border.
Open environment: In this case the borders are not closed but open, which means that an entity leaving the environment on the left side will re-enter it on the right side. The environment may thus be seen as a "torus":
Figure 2: Environment "torus" representation
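The wrap-around of an open environment can be sketched as follows. The helper class Torus and its method names are hypothetical; the sketch only assumes integer coordinates and an environment of a known size along each axis.

```java
/** Hypothetical helper for coordinates in an open ("torus") environment. */
final class Torus {
    private Torus() {}

    /** Maps any integer coordinate into [0, size), wrapping at both borders. */
    static int wrap(int coordinate, int size) {
        return Math.floorMod(coordinate, size);
    }

    /** Shortest distance along one axis when that axis wraps around. */
    static int axisDistance(int a, int b, int size) {
        int direct = Math.abs(wrap(a, size) - wrap(b, size));
        return Math.min(direct, size - direct);
    }
}
```

Math.floorMod is used rather than the % operator because % returns a negative result for negative coordinates, which is exactly the case of an entity leaving on the left side.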
3.2. Entities Every piece of the simulation needing special processing or interactions is called an entity. Each of them has individual needs and reacts or evolves in a different way. Every entity has its own life; it is responsible for its behaviour. It may be influenced by nearby entities but the latter must not take a decision in its stead.
Each entity is tied to a cell of the environment. As mentioned above, the partitioning of the space is only used to speed up neighbour search. In fact every entity may "look around" to find nearby entities. Depending on its neighbourhood it might take a certain decision, for example to move away from its current position or to send a message to other entities.
3.3. Simulation architecture Because such a simulation needs a minimum of synchronization, I decided to use a master-slave scheme with a global clock and a kind of entity directory.
In the next few paragraphs I will give a more detailed description of the role of each part. I will also discuss their advantages and disadvantages and point out where I encountered problems.
3.3.1. The master The master's first role is to manage all its slaves. This comprises the knowledge about the registered clients and the ability to dispatch incoming messages among them.
As already mentioned, the master includes a global clock, on which the slaves are synchronized. This implies that only the master can initiate, pause or stop a simulation.
Furthermore, the master is responsible for maintaining a lookup table for the slave-entity relation. This allows a slave, or the master itself, to quickly find out on which slave a given entity is located or, conversely, which entities are located on a given slave.
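A minimal sketch of such a two-way lookup table is given below. The class EntityDirectory, the use of integer entity ids and of string slave identifiers are assumptions made for illustration only.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical entity directory kept by the master. */
final class EntityDirectory {
    private final Map<Integer, String> slaveOfEntity = new HashMap<>();
    private final Map<String, Set<Integer>> entitiesOfSlave = new HashMap<>();

    /** Records that the entity now lives on the given slave (also usable after a migration). */
    void place(int entityId, String slaveId) {
        String previous = slaveOfEntity.put(entityId, slaveId);
        if (previous != null) {
            entitiesOfSlave.get(previous).remove(entityId);
        }
        entitiesOfSlave.computeIfAbsent(slaveId, unused -> new HashSet<>()).add(entityId);
    }

    /** Forgets an entity, for example when it dies. */
    void remove(int entityId) {
        String slave = slaveOfEntity.remove(entityId);
        if (slave != null) {
            entitiesOfSlave.get(slave).remove(entityId);
        }
    }

    /** On which slave is the entity located? Returns null if unknown. */
    String slaveOf(int entityId) {
        return slaveOfEntity.get(entityId);
    }

    /** Which entities are located on the slave? */
    Set<Integer> entitiesOn(String slaveId) {
        Set<Integer> found = entitiesOfSlave.get(slaveId);
        return found == null ? Collections.<Integer>emptySet()
                             : Collections.unmodifiableSet(found);
    }
}
```

Keeping both directions in separate maps makes each of the two questions a constant-time lookup, at the price of updating two structures whenever an entity is born, migrates or dies.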
3.3.2. The slave The slaves are the workhorses of the simulation. They are controlled by the master and have to register with it before the simulation can begin. Each slave holds a certain number of entities and is responsible for their execution. Each slave has to keep an up-to-date repository of the entities it owns.
Moreover, it has to offer different services to its own entities as well as to other slaves. An entity may want to communicate with another one, but as it does not know the exact location of that entity, it must be able to delegate this job to the slave it is located on. The latter must offer a kind of local communication service, which guarantees that messages and queries passed across the network arrive at their destination. This same service also allows inter-slave communication.
For performance reasons, I decided not to assign a thread to each entity, but rather to let each slave do its own scheduling. This makes the slave machines a lot more reliable and decreases their response time.
While working on a prototype slave, I encountered blocking problems related to ProActive's request management. After posting a remote request, active objects [PRO04, p. 8] block until the answer has been returned. This means that when two slaves send a request to each other, each of them waits for the other's response. This blocks the entire simulation, because the master waits for the slaves' results. Thus, in order to exclude mutual blocking of slaves, I built each slave from two threads: the first executes incoming requests from other slaves or from the master, and the second schedules the execution of the slave's entities and only makes outgoing requests.
Figure 3: Communication scheme (participants: Master, Slave1, Slave2; steps: execute 1, execute 2; each slave launches a new thread internally; legend: slave-slave communication, slave-master communication)
3.4. Environment distribution When thinking about distributing the simulation, one must also consider how the environment, which is common to all entities, can be distributed and shared among all participating machines. Different scenarios may be considered:
3.4.1. Static distribution With a static environment distribution, every client will statically receive a certain part of the environment. This implies that the entire environment is not really shared, but divided and then spread among the slaves. Regarding the objectives of the project, this method presents two main disadvantages:
The number of client machines is static and cannot change during the execution.
Fixing the environment distribution will also fix the location of the entities, which means that the distribution of the entities will mainly depend on their position and not on their communication volume with others.
These disadvantages conflict with the project's objective of dynamic entity distribution; this method is therefore rejected.
3.4.2. Dynamic distribution Distributing the environment dynamically would mean that the cells are distributed among the clients depending on the client load. Again, this approach does not take into account how much the entities communicate with each other.
Furthermore, determining dynamically which cell to place on which machine introduces an overhead in calculation time and also implies that cells must be able to migrate from one machine to another, which means more communication. Given the scale the simulation may reach, this method does not fit our needs either and can be excluded.
3.4.3. No distribution As the environment only serves to determine quickly which entities reside on a given cell, I think there is no real need to distribute the environment. As a solution, I propose that every client machine should possess its own copy of the entire environment. The environment can be seen as a simple lookup table which maintains a list of entities for a given pair (x, y) of coordinates. This is quite similar to what [MOR96] describes in his paper about large-scale distributed simulations.
When a given entity wants to access another one, whether local or remote, it has to delegate the request to its local communication service. This is similar to, but less expensive than, the model proposed in [DIE98], in which an entity on a given machine communicates with a remote entity via a proxy object representing the distant entity.
3.5. Simulation model Having clarified things this far, I now want to take a closer look at the simulation itself. Given the requirement that it be dynamic, distributable and large-scale, the following questions have to be answered:
In general, entity simulations are very synchronous. How can one break this synchronism without losing data integrity?
Because entities may die or be born, the scenario's characteristics are dynamic. Thus intelligent distribution is difficult [BRU98] because on the one hand the available computational power of each machine is difficult to quantify and on the other hand network behaviour may be very chaotic. So the major question is: how can entities be distributed equally among all participating computers?
Figure 4: Simulation cycle (phases shown: execution, synchronisation, commit, synchronisation)
3.5.1. Entity transactions During the simulation, the execution of a given entity may depend on the values of some attributes of other entities. Thus the major problem is to make sure that the state of the queried entities does not change while it is being read. Since the simulation runs in a distributed context, one possible solution is to give every entity two states: an old one, the "read" state, and the current one, the "write" state. In order to implement this, a commit operation, which copies the values from the write state to the read state, has to be introduced.
Figure 5: Entity transaction scheme (reads use variable ABC_read, writes use variable ABC_write; commit copies the value)
This technique makes it possible to partially break the simulation's synchronism. When an entity executes and needs to access other entities, it is only allowed to obtain data from their read state. Nobody can modify the write state except the entity itself. Thus an entire simulation cycle is composed as shown in the following figure:
Figure 6: Simulation cycle (phases shown: execution, synchronisation, commit, synchronisation)
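The read/write state scheme with its commit operation can be sketched for a single attribute as follows. The class BufferedValue and its method names are hypothetical; a real entity would hold one such pair per attribute or copy a whole state object at once.

```java
/** Hypothetical entity attribute with separate "read" and "write" states. */
final class BufferedValue {
    private double readState;   // what other entities are allowed to see
    private double writeState;  // what only the owning entity modifies

    BufferedValue(double initial) {
        readState = initial;
        writeState = initial;
    }

    /** Value visible to everyone during the execution phase. */
    double read() {
        return readState;
    }

    /** Called by the owning entity only. */
    void write(double value) {
        writeState = value;
    }

    /** Commit phase: publish the write state as the new read state. */
    void commit() {
        readState = writeState;
    }
}
```

During the execution phase every reader sees the value from the previous cycle, regardless of the order in which entities execute; the new value only becomes visible after the commit phase.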
3.5.2. Entity migration To solve the problem with the changing number of entities and the load of the client machines, a dynamic distribution of the entities must be set up. This implies that an entity must be able to migrate from one slave to another one.
The state of a given entity is determined solely by the values of its attributes. The simplest way to migrate an entity is therefore to capture its relevant attributes and to transfer them to the destination machine before removing the entity from the source. Once the data has arrived, the destination slave instantiates an object of the same class as the entity and copies the attribute values into it.
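The capture-and-rebuild idea can be sketched as follows. The interface Migratable, the example entity Fish with its three attributes, and the helper Migration.rebuild are all hypothetical names introduced for this illustration; the actual transfer over the network is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical contract for an entity whose state is fully described by its attributes. */
interface Migratable {
    Map<String, Object> captureState();
    void restoreState(Map<String, Object> state);
}

/** Hypothetical example entity. */
final class Fish implements Migratable {
    double x, y, energy;

    @Override
    public Map<String, Object> captureState() {
        Map<String, Object> state = new HashMap<>();
        state.put("x", x);
        state.put("y", y);
        state.put("energy", energy);
        return state;
    }

    @Override
    public void restoreState(Map<String, Object> state) {
        x = (Double) state.get("x");
        y = (Double) state.get("y");
        energy = (Double) state.get("energy");
    }
}

final class Migration {
    private Migration() {}

    /** Destination side: instantiate the same type and copy the attribute values into it. */
    static <T extends Migratable> T rebuild(Supplier<T> factory, Map<String, Object> state) {
        T copy = factory.get();
        copy.restoreState(state);
        return copy;
    }
}
```

A call such as Migration.rebuild(Fish::new, state) on the destination slave yields a new object with the same attribute values, after which the source slave can drop the original.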
In addition to the execution and commit cycles of the simulation, a migration phase has to be introduced. During this phase entities have to decide whether to migrate to another machine.
Finally, the question arises of how to decide when and where to migrate. This decision depends on various parameters: the average load of the machine on which an entity resides, the total number of entities on this machine, and the amount of communication the entity has to carry out with others.
This is where the DEDIS project has to be interfaced with the AntCO² project, which offers load-balancing suggestion services. Further details about the connection to AntCO² will be discussed in the next chapter.
3.6. Interactions There are two ways for entities to communicate with each other: either an entity "pushes" information towards another one or it "pulls" information from it.
3.6.1. Push A push interaction is characterized by the fact that a given entity "pushes" information towards another entity. The simplest way to make this work is by using a kind of messaging system.
Pushing information from one entity to another is not a very flexible method and does not really reflect the emitter entity's independence. Nevertheless, this method has some advantages, mainly when a specific entity has to be contacted or an asynchronous action is needed.
It is also important to notice that during the push method, the emitter has to communicate actively with its neighbours. As a matter of fact, it needs two actions: one from the emitter entity, which has to send data to the receiver, and one from the receiver entity, which has to look up the newly arrived message and process it.
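The two actions of a push interaction can be sketched with a simple mailbox. The class Mailbox and the use of plain strings as messages are assumptions for this example; the thread-safe queue is chosen because messages may arrive from a different thread than the one executing the receiver.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical mailbox owned by a receiver entity. */
final class Mailbox {
    private final Queue<String> pending = new ConcurrentLinkedQueue<>();

    /** Action 1, emitter side: deposit a message for the receiver. */
    void push(String message) {
        pending.add(message);
    }

    /** Action 2, receiver side: take out everything that has arrived so far, in arrival order. */
    List<String> drain() {
        List<String> messages = new ArrayList<>();
        String next;
        while ((next = pending.poll()) != null) {
            messages.add(next);
        }
        return messages;
    }
}
```

The emitter never waits for the receiver: the message simply stays in the mailbox until the receiver's next cycle, which is what makes the push method suitable for asynchronous actions.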
3.6.2. Pull Pulling information from another entity means reading its state, or at least a part of it. During this action, only the receiver entity needs to participate actively in data exchange. As a matter of fact, the emitter entity is not required to perform any operation.
The main disadvantage of this method is that the receiver, the entity that wants to query another one, does not have direct references to its neighbours, nor does it know their location. Consequently, in order to acquire references to the entities which surround it, the receiver entity has to post a query to every other machine, asking for the content of all the cells in its neighbourhood.
Different query strategies have been considered:
3.6.2.1. Cell by cell This is the first and simplest strategy. The receiver entity defines a group of cells whose content it wants to obtain. It then loops through this list and, for each entry, has its local communication service query all other participating slaves.
This method is really slow because a new query has to be emitted for each selected cell. It can be improved by sending a pair of coordinates to each slave and retrieving a single result set and, more importantly, by emitting the definition of a whole group of cells. Thus the entire result is obtained in one go.
3.6.2.2. By rectangular zone With this technique, the requesting entity queries its neighbouring cells only once by transmitting the definition of a rectangular zone of the field. Hence it transmits coordinates of the top left point as well as the width and the height of the zone. It then receives result sets from each participating slave and merges them into one array. The latter includes all relevant information.
This method speeds up the communication part because only one request is made and only one, possibly large, result set is retrieved. A lot of small queries take much more time than one big query.
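The rectangular zone query can be sketched as follows. The types RectZone, ZoneQueryable and ZoneQuery are hypothetical; in particular, ZoneQueryable stands in for a slave that would really be reached over the network, and entities are represented by integer ids only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical rectangular query zone: top-left corner plus width and height, in cells. */
final class RectZone {
    final int left, top, width, height;

    RectZone(int left, int top, int width, int height) {
        this.left = left;
        this.top = top;
        this.width = width;
        this.height = height;
    }

    boolean contains(int cellX, int cellY) {
        return cellX >= left && cellX < left + width
            && cellY >= top && cellY < top + height;
    }
}

/** Hypothetical stand-in for a slave; a real one would be a remote object. */
interface ZoneQueryable {
    List<Integer> entitiesIn(RectZone zone);
}

final class ZoneQuery {
    private ZoneQuery() {}

    /** Sends the zone once to every slave and merges the result sets into one list. */
    static List<Integer> queryAll(List<? extends ZoneQueryable> slaves, RectZone zone) {
        List<Integer> merged = new ArrayList<>();
        for (ZoneQueryable slave : slaves) {
            merged.addAll(slave.entitiesIn(zone));
        }
        return merged;
    }
}
```

Compared with the cell-by-cell strategy, the number of requests per slave drops from one per cell to one per zone.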
3.6.2.3. By circular zone with arc This strategy is based on the fact that in real life an entity normally is not able to capture everything that happens around it. As described by [JLP04], the view of an entity is defined by a radius and an angle.
Figure 8: Circular view scheme (labels: angle, distance)
The query method is the same as the one for the rectangular zone described above, hence its efficiency is roughly the same as well. Of course calculations have to be mapped onto discrete values; as can be seen in Figure 9, the covered zone is somewhat angular.
Figure 9: View by circular zone with arc (discrete values)
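Whether a neighbour lies inside a view defined by a radius and an angle can be tested as in the following sketch. It works on continuous coordinates rather than on the discrete cells of Figure 9, and it assumes that the arc is centred on the entity's heading; the class FieldOfView and its parameters are hypothetical.

```java
/** Hypothetical field-of-view test: a radius and an opening angle around a heading. */
final class FieldOfView {
    private FieldOfView() {}

    /**
     * @param heading   viewing direction in radians
     * @param radius    maximum viewing distance
     * @param viewAngle total opening angle in radians (the arc is centred on the heading)
     */
    static boolean sees(double x, double y, double heading, double radius, double viewAngle,
                        double targetX, double targetY) {
        double dx = targetX - x;
        double dy = targetY - y;
        if (dx * dx + dy * dy > radius * radius) {
            return false; // outside the circle
        }
        double bearing = Math.atan2(dy, dx);
        // Smallest signed difference between bearing and heading, in [-pi, pi].
        double diff = Math.atan2(Math.sin(bearing - heading), Math.cos(bearing - heading));
        return Math.abs(diff) <= viewAngle / 2.0;
    }
}
```

The distance test is done on squared values to avoid a square root, and the angular difference is normalised so that headings near the -pi/pi boundary are handled correctly.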
3.7. Behaviour How do entities live and how do they act? The following few lines will try to give a quick overview of the entity behaviour model I set up.
As explained in the previous chapter, each entity can receive messages from others, so the first thing that an entity does during its cycle is to read these messages and process them. Depending on the content of a message the entity may decide to initiate an action; for example, when it receives a "KILL" message, it dies.
Its next step is to scan its environment for nearby entities. The nature as well as the state of a neighbour determines whether that neighbour attracts or repels the entity. The calculation of the final direction and speed of the entity is based on vector calculations as described by [REY01]. Of course I had to tweak the attraction and repulsion coefficients according to the entity considered. For example, a prey is repelled more strongly by a predator than by an obstacle.
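As an illustration of such a vector calculation, the following sketch sums unit vectors towards the neighbours, each scaled by a coefficient whose sign decides between attraction and repulsion. This is a deliberately simplified rule written for this example, not the formulation of [REY01] nor the coefficients actually used in DEDIS; the class Steering and its parameters are hypothetical.

```java
/** Hypothetical steering rule: weighted sum of unit vectors towards or away from neighbours. */
final class Steering {
    private Steering() {}

    /**
     * @param self       position {x, y} of the entity
     * @param neighbours positions {x, y} of the neighbours
     * @param weights    one coefficient per neighbour: positive attracts, negative repels
     * @return the resulting direction, normalised to length 1, or {0, 0} if the forces cancel
     */
    static double[] direction(double[] self, double[][] neighbours, double[] weights) {
        double sumX = 0.0;
        double sumY = 0.0;
        for (int i = 0; i < neighbours.length; i++) {
            double dx = neighbours[i][0] - self[0];
            double dy = neighbours[i][1] - self[1];
            double dist = Math.hypot(dx, dy);
            if (dist == 0.0) {
                continue; // same position: no defined direction
            }
            sumX += weights[i] * dx / dist;
            sumY += weights[i] * dy / dist;
        }
        double norm = Math.hypot(sumX, sumY);
        return norm == 0.0 ? new double[] {0.0, 0.0}
                           : new double[] {sumX / norm, sumY / norm};
    }
}
```

With a predator weighted at -2.0 and an obstacle at -1.0, the resulting direction points away from both but is dominated by the predator, which mirrors the prey example above.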
Another task of some entities, for example the predator entities, is to kill other ones. In this case, a message is sent to the entity concerned. Other scenarios can be considered, for example a moving entity which informs its neighbours where it will move next or what its goals or needs are.
4. INTERFACE TO AntCO²
This part describes how DEDIS and the AntCO² project of Antoine DUTOT 3 interact. It can be seen as the glue that binds our projects together.
4.1. Interaction As said in [BDG03], AntCO² aims to provide some kind of load-balancing for dynamic communication-based graphs. Thus the idea arises to couple DEDIS with AntCO² in such a way that both applications remain independent up to a certain degree. DEDIS hosts the main simulation, whereas AntCO²'s goal is to provide a dynamic suggestion on how and where to migrate entities. In other words, AntCO² offers a service to DEDIS.
4.2. Communication 4.2.1. General concept As AntCO² is accessed by DEDIS only as a service, the communication between both parts is unidirectional. This means that DEDIS can send requests to AntCO², but the reverse is not permitted.
Figure 10: Communication graph with AntCO² (nodes: DEDIS Master, DEDIS Slave, AntCO2; multiplicities: {1}, {1..*}, {1})
The figure above shows in what direction the communication takes place, but it hides the fact that a given slave is communicating with other slaves at the same time and that AntCO² may be distributed. As a matter of fact, a single AntCO² part might manage different slaves.
We use the ProActive API 4 as the communication layer between any distributed parts. It allows us to synchronize AntCO² easily with the slaves and gain a maximum of transparency between the two applications.
Viewed from the side of the application layer, each DEDIS slave has to inform AntCO² about what happens in the simulation and must hold a remote interface to an AntCO² part.
As can be observed on the figure below, the DEDIS application forms a complete graph. This means that every node knows all other nodes and has to communicate with them. For example, if an entity wants to know its neighbours, its slave will query all other slaves in order to satisfy the request.
Figure 11: Communication scheme between AntCO² and DEDIS
As for AntCO², the final architecture has not yet been set up. Nevertheless, the most generic way of looking at it is to consider AntCO² as distributed in some way and to connect to it through ProActive. This approach ensures that communication is completely independent of AntCO²'s later architecture.
Using ProActive also offers the advantage that it leaves the infrastructure used by DEDIS and AntCO² independent. As shown on the figure above, there may be, for example, 7 DEDIS slaves and only 3 AntCO² parts. Again, as each part is distributed, this does not fix which part runs on which machine.
4.2.2. Scenario
A possible communication scenario between DEDIS and AntCO 2 could look like this:
1. AntCO 2 starts up with an empty graph and registers itself in the RMI registry.
2. The DEDIS master starts up, registers itself in the RMI registry and connects to the AntCO 2 master part.
3. Each time a DEDIS slave is launched and registers with the DEDIS master, the latter adds a color to AntCO 2 and passes the returned URI to the slave. The slave then has to establish a connection to the given AntCO 2 part.
4. The simulation is initialized and started. The following events might happen:
An entity requests a migration suggestion.
An entity decides to migrate to another node.
A new entity is born.
An active entity dies and disappears.
...
Each of these events needs to communicate with the node's relative AntCO 2 part, either to inform it about changes or to query it for a new color suggestion.
4.2.3. Mutual blocking problem
When a node moves from one slave to another, AntCO 2 must be informed of this movement. In order to maintain a minimum of synchronisation between DEDIS and AntCO 2, the migration of the node on AntCO 2 must have terminated before the DEDIS slave continues its work.
If AntCO 2 migrates the given node faster than DEDIS, no problem will arise, but if the migration on AntCO 2 is slower than the one on DEDIS, the latter has to wait for AntCO 2 .
In order to solve this problem, we use ProActive's support for asynchronous method invocation through future objects [PRO04, p. 16]. By changing the method "moveNodeTo" to return a result-object and by making the DEDIS slave access it, the latter will block until AntCO 2 has finished its migration.
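The blocking behaviour can be illustrated with a small sketch. It uses `java.util.concurrent.CompletableFuture` from the Java standard library as a stand-in for ProActive's future objects; it does not use the ProActive API itself. The classes `AntCo2PartSketch`, `MoveResult` and `SlaveMigrationSketch`, as well as the signature given to `moveNodeTo`, are assumptions made for illustration.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical result object returned by the migration call.
final class MoveResult {
    final String nodeId;
    final String newColor;
    MoveResult(String nodeId, String newColor) { this.nodeId = nodeId; this.newColor = newColor; }
}

// Hypothetical stand-in for a remote AntCO2 part.
final class AntCo2PartSketch {
    // Returns immediately; the migration itself runs asynchronously.
    CompletableFuture<MoveResult> moveNodeTo(String nodeId, String targetColor) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(50); // simulate a slow migration on the AntCO2 side
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return new MoveResult(nodeId, targetColor);
        });
    }
}

final class SlaveMigrationSketch {
    // The slave starts the AntCO2 migration, does its own work, then waits for the result.
    static MoveResult migrate(AntCo2PartSketch part, String nodeId, String targetColor) {
        CompletableFuture<MoveResult> pending = part.moveNodeTo(nodeId, targetColor);
        // ... the DEDIS-side migration would happen here, in parallel ...
        return pending.join(); // blocks only if AntCO2 has not finished yet
    }

    public static void main(String[] args) {
        MoveResult r = migrate(new AntCo2PartSketch(), "entity-7", "color-2");
        System.out.println(r.nodeId + " now on " + r.newColor);
    }
}
```

The explicit `join()` plays the role of the first access to the returned result object: if AntCO 2 is faster than DEDIS the call returns at once, otherwise the slave waits.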
4.3. Interface
AntCO 2 has to offer two distinct interfaces to the outside. The first one is a general administrative interface, named "InterfaceAntCO2Master", through which colors can be added, removed or updated. The "weight" parameter quantifies the computational power of a node.
The second interface, which is tied directly to a certain color, allows nodes to be added and removed. Nodes can also be moved or connected to other nodes. The "suggestColor" method queries AntCO 2's state for a given node. Besides the suggested color, the result contains a kind of trust index, which is AntCO 2's self-evaluation of its color suggestion. This is needed because a node may keep switching between two colors. AntCO 2 detects this "blinking" and decreases the returned trust index, on which the calling entity bases its decision whether to migrate or not. Indeed, AntCO 2 only makes a proposal, and the entity concerned has no obligation to follow it.
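The two interfaces and the use of the trust index can be sketched as follows. Only the names `InterfaceAntCO2Master`, `suggestColor`, `moveNodeTo` and the "weight" parameter come from the text; every other name, all signatures, the value range of the trust index and the threshold rule in `MigrationPolicy` are assumptions made for illustration.

```java
// Administrative interface. Method names and signatures are assumed.
interface InterfaceAntCO2Master {
    String addColor(String colorId, double weight);   // assumed to return the URI of the AntCO2 part
    void updateColor(String colorId, double weight);  // weight: computational power of the node
    void removeColor(String colorId);
}

// Hypothetical acknowledgement returned once a move has completed (see 4.2.3).
final class MoveAck {
    final String nodeId;
    MoveAck(String nodeId) { this.nodeId = nodeId; }
}

// Result of suggestColor: the suggested color plus AntCO2's trust index.
final class ColorSuggestion {
    final String color;
    final double trust; // assumed to lie in [0, 1]; lowered when the node is "blinking"
    ColorSuggestion(String color, double trust) { this.color = color; this.trust = trust; }
}

// Per-color interface. The interface name is hypothetical.
interface InterfaceAntCO2Color {
    void addNode(String nodeId);
    void removeNode(String nodeId);
    MoveAck moveNodeTo(String nodeId, String targetColor);
    void connect(String nodeA, String nodeB);
    ColorSuggestion suggestColor(String nodeId);
}

// Hypothetical decision rule on the entity side: AntCO2 only proposes, the entity decides.
final class MigrationPolicy {
    static boolean shouldMigrate(String currentColor, ColorSuggestion s, double minTrust) {
        return !s.color.equals(currentColor) && s.trust >= minTrust;
    }
}
```

A simple threshold is only one possible policy; an entity could equally weigh the trust index against its own migration cost.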
This chapter will describe the different results obtained during simulation runs.
5.1. Scalability
During the setup of the distributed simulation environment, different performance tests were executed in order to study the overall behaviour of the simulation as a function of the number of entities on each machine.
Early experiments consisted of simulation runs on a single machine with fixed parameters (meaning a fixed number of entities), the same execution environment (the simulation was considered to be the only really active process on the machine), the same kind of entities, and an identical field size. As this experiment ran on a single machine, there was no need to take external communication into account. Fixing all these parameters allowed me to test different communication-independent algorithms, such as the entity scheduling and execution algorithm on each slave.
Each time acceptable results were obtained for a given set of parameters, the number of entities on the machine was increased in order to see how the behaviour of the machine evolved.
In a second phase, the tests were extended to more than one machine, thereby introducing network communication. The rest of the parameters were held constant, except for the number of entities per machine. Unfortunately, the first results were very poor. Simulation runs were extremely slow and, as the number of entities rose, the time it took to simulate a given number of cycles grew exponentially. I therefore introduced caching mechanisms for repeated requests and improved the communication between the slaves.
The next figure shows how the response time of the simulation evolves when the number of participating machines and the environment size are constant and only the number of entities changes. For this experiment I used a Pentium IV M at 1.8 GHz as the master machine and five Pentium IV machines at 2.4 GHz as slaves, each equipped with 512 MB of RAM. The machines were connected to a 100 Mbit/s switched Ethernet network.
Only one kind of entity was used for this simulation. The entities were designed to consider all neighbours within a range of 20 cells and then to move towards the closest one. Although this implies a huge number of communications with surrounding entities, the network load did not exceed 5%.
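The behaviour of these test entities can be sketched as follows. This is a minimal sketch under stated assumptions, not the DEDIS code: the class names `Cell` and `SeekerSketch`, the Chebyshev distance on the grid, and the rule of moving one cell per cycle are all assumptions; only the range of 20 cells and the "move towards the closest neighbour" rule come from the text.

```java
import java.util.List;

// Hypothetical grid position.
final class Cell {
    final int x, y;
    Cell(int x, int y) { this.x = x; this.y = y; }
}

final class SeekerSketch {
    static final int RANGE = 20; // view range in cells, as stated in the report

    // Chebyshev distance on the grid (an assumption).
    static int distance(Cell a, Cell b) {
        return Math.max(Math.abs(a.x - b.x), Math.abs(a.y - b.y));
    }

    // Closest neighbour within RANGE, or null if none is visible.
    static Cell closest(Cell self, List<Cell> others) {
        Cell best = null;
        for (Cell o : others) {
            int d = distance(self, o);
            if (d == 0 || d > RANGE) continue; // skip the own cell and out-of-range entities
            if (best == null || d < distance(self, best)) best = o;
        }
        return best;
    }

    // One simulation step: move one cell towards the closest neighbour, or stay put.
    static Cell step(Cell self, List<Cell> others) {
        Cell target = closest(self, others);
        if (target == null) return self;
        return new Cell(self.x + Integer.signum(target.x - self.x),
                        self.y + Integer.signum(target.y - self.y));
    }
}
```

In the distributed setting the list of other entities is itself the result of a neighbour query to all slaves, which explains the large number of messages per cycle.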
As can be observed, the behaviour appears to be roughly linear. Note also that the X-axis shows only the number of entities per machine: since five slaves participate, each step of 100 entities per machine corresponds to 500 additional entities in total.
[Plot: 500x500, 100 cycles, log; X-axis: entities per machine (0 to 1100); Y-axis: t (ms) (0 to 60000); series: run 1, run 2]
Figure 15: Simulation time response versus number of entities per machine
At the time I performed these runs, the simulation contained two types of entities: predators and prey. Both were able to see other entities within a certain radius. The prey tried to form groups with other prey and to run away from predators. The latter, of course, chased the prey. Because I wanted to take measurements under constant conditions during the experiment runs, predators were not allowed to eat the prey they caught.
5.2. Entities
The next stage consisted of analysing the behaviour of the entities more closely, especially emerging group effects. For this, I set up different rules, for example moving toward entities of the same type, moving away from enemies or reaching a certain location.
5.2.1. Grouping
Based on Reynolds' "boids" [RAY87], as described in [JLP04], I first tried to obtain a grouping effect. The strength of this effect depends on several parameters, for example the repulsion coefficient, the attraction coefficient, and the view angle and distance.
Figure 16: Emerging groups
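The role of these parameters can be illustrated with a simplified attraction/repulsion rule. It is written in the spirit of boids but is neither Reynolds' full model nor the rule set used in DEDIS: the classes `Vec` and `GroupingSketch`, the formula `attraction - repulsion / distance`, and the symmetric field of view around the heading are assumptions made for illustration.

```java
import java.util.List;

// Hypothetical 2D vector helper.
final class Vec {
    final double x, y;
    Vec(double x, double y) { this.x = x; this.y = y; }
    Vec plus(Vec o) { return new Vec(x + o.x, y + o.y); }
    Vec minus(Vec o) { return new Vec(x - o.x, y - o.y); }
    Vec times(double k) { return new Vec(x * k, y * k); }
    double length() { return Math.hypot(x, y); }
}

final class GroupingSketch {
    // Steering vector of one entity. `heading` must be non-zero; `viewAngle` is in radians.
    static Vec steer(Vec pos, Vec heading, List<Vec> others,
                     double attraction, double repulsion,
                     double viewDistance, double viewAngle) {
        Vec sum = new Vec(0, 0);
        for (Vec other : others) {
            Vec d = other.minus(pos);
            double dist = d.length();
            if (dist == 0 || dist > viewDistance) continue;   // too far away to be seen
            double cos = (d.x * heading.x + d.y * heading.y) / (dist * heading.length());
            double angle = Math.acos(Math.max(-1.0, Math.min(1.0, cos)));
            if (angle > viewAngle / 2) continue;              // outside the field of view
            Vec unit = d.times(1.0 / dist);
            // Constant attraction, repulsion growing as the other entity gets closer.
            sum = sum.plus(unit.times(attraction - repulsion / dist));
        }
        return sum;
    }
}
```

With this rule two entities settle at a distance of `repulsion / attraction`, so raising the attraction coefficient tightens the groups, while a narrow view angle or short view distance makes them form more slowly.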
5.2.2. Ungrouping
At the next stage, I added more entities and slightly changed the parameters and rules. Then, hoping to see some groups of prey split and rejoin, I also released some predators.
Figure 17: Group splitting
In the figure above one can see a predator (the big spot) approaching a group. As it moves closer to the group, the prey try to move away from it. This leads to a division of the group into two.
5.2.3. Obstacles
Next, I introduced obstacles into the simulation. These are entities like the predators and the prey, but they do not move. Obstacles are thus passive entities which influence others through their presence.
Figure 18 shows two obstacles surrounded by a group of prey. The major difficulty was to tune the repulsion coefficient in such a way that the entities did not run into the obstacle but were still able to surround it.
Figure 18: Entities around an obstacle
5.2.4. Inactive entities
An obstacle is modelled as an ensemble of distinct passive entities. Thus the total number of entities in a simulation grows very fast when many obstacles are added. Since these passive entities do not need any processing power, I decided to give every entity a flag which tells the simulator whether it is active or passive. Depending on this flag, the entity is executed or not. As a result, the overall performance of simulations with multiple obstacles improved.
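A minimal sketch of such a flag-based scheduler is given below. The class names `SimEntity`, `PreySketch`, `ObstaclePartSketch` and `SchedulerSketch` are hypothetical and only illustrate the idea of skipping passive entities during a cycle.

```java
import java.util.List;

// Hypothetical base class: every entity carries an active/passive flag.
abstract class SimEntity {
    final boolean active;
    SimEntity(boolean active) { this.active = active; }
    abstract void execute();
}

// An active entity that does some work each cycle.
final class PreySketch extends SimEntity {
    int steps;
    PreySketch() { super(true); }
    @Override void execute() { steps++; }
}

// One cell of an obstacle: passive, never scheduled.
final class ObstaclePartSketch extends SimEntity {
    ObstaclePartSketch() { super(false); }
    @Override void execute() {
        throw new IllegalStateException("passive entities are never executed");
    }
}

final class SchedulerSketch {
    // Runs one cycle and returns the number of entities actually executed.
    static int runCycle(List<SimEntity> entities) {
        int executed = 0;
        for (SimEntity e : entities) {
            if (!e.active) continue; // passive entities cost only this check
            e.execute();
            executed++;
        }
        return executed;
    }
}
```

Passive entities still occupy memory and are still visited once per cycle, so the saving is in execution time only.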
Although this works well enough, I am not satisfied with this solution. Rather than defining an obstacle as an ensemble of entities, I would prefer to give it a fixed size. This would require several changes to the global entity model. For example, a single entity would be allowed to occupy more than one cell. In this way, rectangular obstacles could be defined as single entities with a width and a height attribute. More generally, any specialized obstacle could be set up by implementing a certain "ObstacleInterface" in order to make it compatible with the rest of the simulation.
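Such an interface could look like the sketch below. Only the name `ObstacleInterface` and the width and height attributes come from the text; the method `occupies` and the class `RectangleObstacle` are assumptions about how the proposed design might be expressed.

```java
// Interface name from the report; the single method is an assumption.
interface ObstacleInterface {
    // True if the obstacle covers the given cell.
    boolean occupies(int x, int y);
}

// A rectangular obstacle as one entity covering width x height cells.
final class RectangleObstacle implements ObstacleInterface {
    final int left, top, width, height;

    RectangleObstacle(int left, int top, int width, int height) {
        this.left = left;
        this.top = top;
        this.width = width;
        this.height = height;
    }

    @Override
    public boolean occupies(int x, int y) {
        return x >= left && x < left + width
            && y >= top && y < top + height;
    }
}
```

A 4x2 rectangle then counts as one entity instead of eight passive ones, and other obstacle shapes only need their own `occupies` test.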
5.3. Communication graphs
Each simulation run writes log files. For each cycle, these log files contain information about the position of every entity and about its communication with other entities. For convenience and compatibility I chose the XML format, which also has the advantage that the log files remain easily readable.
In order to test AntCO 2, the log files had to be transformed into dynamic graphs. After this step, two different tests were carried out: the first one loads the dynamic graph into AntCO 2's graph visualizer, and the second one feeds the same dynamic graph into the ant analyser.
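One way to picture this transformation is to compare the communication links of consecutive cycles and emit an event for each link that appears or disappears. The sketch below assumes the log has already been parsed into one set of links per cycle; the XML layout itself is not shown, and the classes `GraphEvent` and `DynamicGraphSketch`, the event kinds "add" and "remove", and the edge naming are hypothetical, not the actual DEDIS or AntCO 2 format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical event of a dynamic graph: an edge appears or disappears at a cycle.
final class GraphEvent {
    final int cycle;
    final String kind; // "add" or "remove"
    final String edge;
    GraphEvent(int cycle, String kind, String edge) {
        this.cycle = cycle;
        this.kind = kind;
        this.edge = edge;
    }
}

final class DynamicGraphSketch {
    // Canonical key for an undirected communication link between two entities.
    static String edge(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "-" + b : b + "-" + a;
    }

    // edgesPerCycle.get(i) holds the links observed in cycle i.
    static List<GraphEvent> toEvents(List<Set<String>> edgesPerCycle) {
        List<GraphEvent> events = new ArrayList<>();
        Set<String> previous = new TreeSet<>();
        for (int cycle = 0; cycle < edgesPerCycle.size(); cycle++) {
            Set<String> current = new TreeSet<>(edgesPerCycle.get(cycle));
            for (String e : current)
                if (!previous.contains(e)) events.add(new GraphEvent(cycle, "add", e));
            for (String e : previous)
                if (!current.contains(e)) events.add(new GraphEvent(cycle, "remove", e));
            previous = current;
        }
        return events;
    }
}
```

Storing only the changes keeps the dynamic graph compact when most links persist from one cycle to the next.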
Figure 19: DEDIS post-simulation visualization
Figure 19 above shows the actual positions of the entities in a given test simulation run. The associated dynamic graph, imported into AntCO 2's graph visualizer, can be seen in Figure 20 below. The left image represents the initial situation, whereas the right one shows the positions and the communications at step 92.
Figure 20: AntCO 2 graph visualizer
One can easily see that the left-hand image consists of many isolated entities. Later on, however, they form different clusters with heavy communication links.
In the next figure, ants are released to populate this dynamic graph and detect the different clusters. The figure shows four different ant populations. Some clusters are colonized by two or more different populations. This is due to the agitated nature of the dynamic graph, which originates from the simulation: clusters join and split very fast.
These analyses are somewhat "static". In fact, the ants' movements are synchronized with the dynamic changes of the graph. This means that a calculation phase for the ants is followed immediately by a simulation cycle, which is expressed by a set of changes that occur inside the graph.
Once DEDIS and AntCO 2 run in an active communication mode, meaning that AntCO 2 analyzes the simulation's communication graph in real time, both parts will be decoupled and will no longer be synchronous. This implies that the ants might move faster or slower than the simulation, the first being a positive result, the second a negative one.
6. CONCLUSION
6.1. The project
This project has been very fascinating and rewarding. The dynamic side, the distribution of the entities during the simulation, was especially challenging. Even during harder periods, when dealing with the distribution of the simulation itself and with a lot of node inter-blocking problems, it allowed me to develop my personal skills.
Other parts of the project turned into a kind of adventure, requiring a lot of research and many experiments until a way was found to make things operational and coherent. This concerns above all the modelling of the entities. Incoherent, uncontrolled or unpredictable movements were just some of the problems encountered. It was not always easy to find a solution.
Whichever way I look at it, this was a really interesting and fascinating project.
6.2. General conclusion
I am very satisfied with the experience gained during this practical training. Working in a laboratory hand in hand with other researchers gave me the opportunity to learn about essential research techniques. I was able to fill gaps concerning document research and project interfacing. Furthermore, I was shown once again how important good communication is between group members working on related projects.
Last but not least, I want to mention how pleased I was to work on the given project. I know that theory is the key to every project, and I have to admit that I am not a great theorist, which is why the implementation part, and thus the fact of having achieved something, was very important to me. Encountering concrete problems and difficulties motivated me the most.
7. REFERENCES & ACRONYMS
7.1. Acronyms
API Application Programmer Interface
HTTP HyperText Transport/Transfer Protocol
LAN Local Area Network
RPC Remote Procedure Call
RMI Remote Method Invocation
SQL Structured Query Language
WWW World Wide Web
XML eXtensible Markup Language
7.2. Links
University of Le Havre http://www.univ-lehavre.fr
Laboratory of Informatics http://www-lih.univ-lehavre.fr
ProActive API http://www-sop.inria.fr/oasis/ProActive
7.3. References
[BDG03] BERTELLE Cyrille, DUTOT Antoine, GUINAND Frédéric, OLIVIER Damien, In DOA 2003 International Symposium on Distributed Objects and Applications, Catania (Sicily), October 2003, [online] http://www-lih.univ-lehavre.fr/~dutot/Papers/doa2003.pdf
[BDG04] BERTELLE Cyrille, DUTOT Antoine, GUINAND Frédéric, OLIVIER Damien, Distribution of Agent Based Simulation with Colored Ant Algorithm, In ESS 2002 European Simulation Symposium, pages 39-43, Dresden (Germany), October 2002, [online] http://www-lih.univ-lehavre.fr/~dutot/Papers/ess2002.ps