
Evolutionary Design of Experiments using the MapReduce Framework

James Decraene, Fanchao Zeng, Malcolm Yoke Hean Low, Wentong Cai School of Computer Engineering Nanyang Technological University 50 Nanyang Avenue, Singapore 639798
{jdecraene,fczeng,yhlow,aswtcai}@ntu.edu.sg

Yong Yong Cheng, Chwee Seng Choo Operations Research Laboratory DSO National Laboratories 20 Science Park Drive, Singapore 118230
{yyongche,cchweese}@dso.org.sg

Keywords: Design of experiments, agent-based simulations, evolutionary computation, mapreduce, cloud computing

Abstract
We examine cloud computing, using the MapReduce framework, to assist the evolutionary design of experiments method (EvoDOE). Cloud computing has recently attracted considerable attention due to the massive and scalable computational resources it can deliver. These features may potentially benefit EvoDOE, a highly computationally intensive methodology in which many computer simulations are generated and evaluated using evolutionary computation techniques. To assist this research, we implement a selection of distributed evolutionary computation techniques using the MapReduce framework. The aim of this paper is to identify the evolutionary computing model which may most efficiently exploit cloud computing for EvoDOE. Multiple series of experiments are conducted using a case study from the military domain. Specifically, red teaming experiments using an agent-based simulation of a maritime anchorage protection scenario are performed.

1. INTRODUCTION

Cloud computing [22] has recently attracted considerable attention due to the numerous potential benefits it may offer. Cost reduction is possible as no infrastructure has to be acquired upfront; instead, the resources are purchased from a third-party provider on a utility basis. Scalability is achieved through dynamic and on-demand provisioning of computational infrastructures in real time. Other advantages include high availability and fault tolerance. Finally, the computational resources are delivered via the Internet, relieving end-users from the software and hardware implementation. These features are significantly valuable for the scientific community, where computational requirements for experimental purposes are rapidly growing.

In this paper, we examine how cloud computing can be best exploited in the evolutionary design of experiments (EvoDOE). EvoDOE is a recent design of experiments method which utilizes evolutionary algorithms to drive the experimental design process. This technique is most adequate when dealing with complex systems (e.g. real biochemical systems, agent-based simulations) characterized by non-linear dynamics and high dimensionality. EvoDOE is also utilized as an optimization method where the optimized system is often tackled as a black box; this is sometimes due to practical constraints where real-time system information cannot be accessed/modified, or cannot be understood due to the system complexity. EvoDOE has been successfully applied to conduct both in-vivo and in-silico experiments [21, 3]. To assist this research, we implement a selection of distributed evolutionary computation models using the Hadoop MapReduce framework [6]. The selected distributed evolutionary algorithms are adapted to comply with the programming model of MapReduce (the model was inspired by the Map and Reduce functions originating from functional programming). To evaluate our MapReduce evolutionary algorithms, we consider a military red teaming case study which has been previously studied in [25, 18, 26, 8]. In this case study a maritime anchorage protection scenario is utilized and modelled using the agent-based simulation platform MANA [16]. The paper is organized as follows: some background material on the EvoDOE method and on studies combining both MapReduce and evolutionary computation techniques is first briefly presented. Our evolutionary framework and MapReduce implementations of the different algorithms are then described. Finally, a series of experiments is analysed and discussed.

2. EVOLUTIONARY DESIGN OF EXPERIMENTS

When examining stochastic complex systems under practical constraints (i.e. time, financial/computational resources, etc.), traditional design of experiments techniques such as the Near Orthogonal Latin Hypercube design may be too limited to gain sufficient insights [3]. It was recently proposed to further reduce the search space of possible designs through the use of evolutionary computation techniques. In this approach, the experimenter first devises a set of objectives (or system behaviours of interest). These (typically quantitative) objectives are then used by an evolutionary algorithm which, through an

iterative process, successively evaluates, selects and generates new sets of candidate designs (i.e. solutions). This evolutionary process is reiterated until the objectives are met. For instance, in biochemical research, such a method was utilized for the design of drug molecules [3]. A fully automated system was built where the experiments were designed and evaluated by evolutionary algorithms. Robotic manipulators were utilized to conduct the wet-lab experiments. In this paper, we employ EvoDOE for Computational Red Teaming (CRT). CRT is a computer-simulation-based approach employed to identify the potential weaknesses of military operational plans. In CRT, military agent-based simulation models are evolved to exhibit pre-specified/desirable output behaviours (i.e., when Red defeats Blue, hence Red Teaming). Agent behavioural parameters (e.g., troop clustering/cohesion, response to injured teammates, aggressiveness, stealthiness, etc.) are evolved to optimize the Red agents' collective efficiency (e.g., maximize damage to target facilities) against the Blue team. Example studies include [13, 27, 4, 8].

Figure 1. The MapReduce model.

Jin et al. [14] claimed that, as devised, the MapReduce model cannot directly support the implementation of parallel genetic algorithms. As a result, MapReduce was extended to include an additional Reduce process. The iterative cycle is as follows: during the Map phase, multiple instances of the genetic algorithm are executed in parallel. The local optimal solutions of each population are collected during the first Reduce phase. An additional collection and sorting of the local optimal solutions is conducted during the second Reduce phase. The resulting set of global optimal solutions is then utilized to initiate the next generation. Llora et al. [17] presented a different approach in which several evolutionary algorithms were adapted to support the MapReduce model (in contrast with Jin et al., who adapted the MapReduce model and not the evolutionary algorithms themselves). The parallelization of the evolutionary algorithms was here conducted using a decentralized and distributed selection approach [5]. This method avoided the requirement of a second Reduce process (i.e., a single selection operation is conducted over the aggregation of the different pools of solutions). Note that in contrast with Jin et al.'s and Llora et al.'s approaches, the objective function is here the simulation of stochastic agent-based models. The resolution (i.e., level of abstraction) of the simulations is the key factor (i.e., the bulk of the work) determining the computational requirements of the evolutionary experiments. The above studies provide guidance for translating evolutionary algorithms into MapReduce operations. The approach proposed by Llora et al. was examined in [10]. In this paper, we further evaluate evolutionary algorithms using MapReduce through examining different distributed evolutionary computational models. The evolutionary framework and MapReduce evolutionary computational models are presented in the next section.

3. MAPREDUCE AND EVOLUTIONARY COMPUTATION

One of the core technologies underlying cloud computing, enabling the benefits outlined earlier, is the MapReduce programming model [6]. This model is composed of two distinct phases (Fig. 1). Map: the input data is partitioned into subsets (<key, value> pairs) and distributed across multiple compute nodes. The data subsets are processed in parallel by the different nodes. A set of intermediate files results from the Map phase and is processed during the Reduce phase. Reduce: multiple compute nodes process the intermediate files, which are then collated (sorted using the key values) to produce the output files. Similarly to the Map processes, the Reduce operations are distributed (and executed in parallel) over multiple compute nodes. The relative simplicity of the MapReduce programming model facilitates the efficient parallel distribution of computationally expensive jobs. This parallelism also enables recovery from failure during the operations (this is particularly relevant when considering a distributed environment where some nodes may fail during a run). Map/Reduce operations may be replicated (if an operation fails, its replica is then retrieved). Also, failed operations may automatically be rescheduled. These fault-tolerant features are inherent properties of cloud computing frameworks such as Apache Hadoop; thus the user is not required to handle such issues. Recent studies have combined evolutionary computation and the MapReduce programming model [14, 17].
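The two-phase model described above can be illustrated with a minimal single-process sketch of the MapReduce contract: map emits <key, value> pairs, the framework groups them by key, and reduce collates each group. This is an illustrative simplification, not the Hadoop API:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records, mapper):
    # Each input record is mapped to zero or more <key, value> pairs.
    return [pair for record in records for pair in mapper(record)]

def reduce_phase(pairs, reducer):
    # Pairs are sorted and grouped by key, then each group is collated.
    pairs.sort(key=itemgetter(0))
    return [reducer(key, [v for _, v in group])
            for key, group in groupby(pairs, key=itemgetter(0))]

# Classic word-count example.
mapper = lambda line: [(word, 1) for word in line.split()]
reducer = lambda word, counts: (word, sum(counts))

lines = ["map reduce map", "reduce"]
print(reduce_phase(map_phase(lines, mapper), reducer))
# → [('map', 2), ('reduce', 2)]
```

In Hadoop, the sorting and grouping between the two phases (the shuffle), as well as the distribution of both phases over compute nodes, are handled by the framework.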

4. METHOD

This section first briefly describes the evolutionary framework; a selection of distributed evolutionary algorithm models is then presented.

4.1. The Evolutionary Framework

A brief description of the evolutionary framework, coined CASE (Complex Adaptive Systems Evolver), is provided in this section. This framework was also described and evaluated (against additional system features such as optimization under constraint, multi-objective optimization and cloud computing) in [11, 10, 9]. CASE is composed of three main components, which are distinguished as follows:

1. The model generator: this component takes as inputs a base simulation model specified in the eXtensible Markup Language (XML) and a set of model specification text files. According to these inputs, new XML simulation models are generated and sent to the simulation engine for evaluation. Thus, as currently devised, only simulation models specified in XML are supported. Moreover, the model generator may consider constraints over the evolvable parameters (this feature is optional). These constraints are specified in a text file by the user. The constraints (due, for instance, to interactions between evolvable simulation parameters) aim at increasing the plausibility of the generated simulation models (e.g., through introducing cost trade-offs for specific parameter values).

2. The simulation engine: the set of XML simulation models is received and executed by the stochastic simulation engine. Each simulation model is replicated a number of times to account for statistical fluctuations. A set of result files detailing the outcomes of the simulations (in the form of numerical values, for instance) is generated. These measurements are used to evaluate the generated models, i.e., these figures are the fitness (or cost) values utilized by the evolutionary algorithm (EA) to direct the search.

3. The evolutionary algorithm: the set of simulation results and associated model specification files are received by the evolutionary algorithm, which in turn processes the results and produces a new generation of model specification files. The generation of these new model specifications is driven by the user-specified (multi-)objectives (e.g., maximize/minimize some quantitative values capturing the target system behaviour). The algorithm iteratively generates models which, through the evolutionary search, incrementally best exhibit the desired outcome behaviour. The model specification files are sent back to the model generator; this completes the search iteration. This component is the key module responsible for the automated analysis and modelling of simulations.

Further details about the input file settings can be found in [9]. Finally, a demonstration video of CASE can be visualized at http://www.youtube.com/watch?v=d2Day_MEruc.

4.2. MapReduce Implementations

Two models are considered for the parallelization of evolutionary algorithms using the MapReduce framework:

4.2.1. Distributed Model

In this model, the execution of the simulation models is distributed over the Map tasks. A single Reduce task is conducted to retrieve and collate the simulation results. During the Reduce phase, the evolutionary selection, recombination and generation of solutions are also conducted (Fig. 2). Since evolutionary algorithms (EAs) are iterative (generational) search methods, the scalability of this distributed model is limited by the EA population size. Previous studies [10] also suggested that, due to some Hadoop and network overheads, this scalability is dependent on the complexity of the simulation models: when executing fine-grained jobs, little gain (in terms of running time requirements) is achieved by using a large number of computers.

Figure 2. Distributed Evolutionary Algorithm using MapReduce.

Each search iteration (generation) spawns a distinct MapReduce job, which itself launches a number of Map tasks. Each Map task is assigned the execution of a pre-defined number of simulation models. In this iterative model, the output data resulting from the Reduce task is supplied as the input data to the next MapReduce job. A selection of modern EAs is individually investigated using the distributed model: the Non-dominated Sorting Genetic Algorithm II (NSGAII) [7], the Strength Pareto Evolutionary Algorithm 2 (SPEA2) [29], the Hypervolume Estimation Algorithm for Multi-objective Optimization (HypE) [2], Multi-objective Differential Evolution (MODE) [15] and the Sampling-based HyperVolume-oriented Algorithm (SHV) [1]. The EA parameter settings are set according to the most commonly used parameter values reported in the literature for two-objective optimization problems. Note that these parameter values may not reflect the optimal settings for the case study described in Section 5.1.
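The per-generation job structure described above can be sketched as follows. The simulation, selection and variation operators below are illustrative stand-ins (neither the names nor the truncation/mutation scheme belong to the actual EAs used), and the Map tasks are emulated sequentially rather than submitted as Hadoop jobs:

```python
import random

MODELS_PER_MAP_TASK = 20  # cap per Map task, as in our experimental setup

def run_simulation(model):
    # Stand-in for executing one agent-based simulation model;
    # returns a fitness value (lower is better).
    return sum((x - 0.5) ** 2 for x in model)

def map_task(models):
    # One Map task executes its assigned batch of simulation models.
    return [(tuple(m), run_simulation(m)) for m in models]

def reduce_task(results, pop_size):
    # The single Reduce task collates the results, selects survivors and
    # generates the next population (truncation selection + mutation).
    results.sort(key=lambda kv: kv[1])
    parents = [list(k) for k, _ in results[:pop_size // 2]]
    children = [[x + random.gauss(0, 0.05) for x in p] for p in parents]
    return parents + children

population = [[random.random() for _ in range(4)] for _ in range(40)]
for generation in range(10):  # each iteration corresponds to one MapReduce job
    batches = [population[i:i + MODELS_PER_MAP_TASK]
               for i in range(0, len(population), MODELS_PER_MAP_TASK)]
    results = [kv for batch in batches for kv in map_task(batch)]
    population = reduce_task(results, len(population))
```

The Reduce output (the new population) becomes the input of the next job, mirroring the generational cycle of Fig. 2.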

4.2.2. Island-based Model

The island-based model [23] is a distributed evolutionary computation technique in which multiple islands execute, in parallel, a distinct EA instance and maintain their own population of solutions. These islands periodically exchange a selection of their solutions with each other during a process called migration. In contrast with the distributed model, the scalability of the island-based model is not limited by the EA population size, as multiple populations of candidate solutions can be evolved simultaneously. Nevertheless, increasing the global population size may also slow down the search convergence [5]. Also, the island-based model introduces additional parameters to set the migration process: 1) Interval: the number of generations/search iterations between migrations; 2) Size: the number of migrating solutions; 3) Policy: the selection of migrating (from the source island) and displaced (at the destination island) solutions; and 4) Topology: the selection/direction of exchanging islands. These parameters may require fine-tuning to optimize the efficiency of the island-based model.
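A minimal sketch of the island-based model with the four migration parameters above. The fitness function and the per-island EA are illustrative stand-ins, and the migration policy shown ("best emigrate, worst displaced" over a ring topology) is one assumed choice among many:

```python
import random

def fitness(solution):
    # Stand-in objective: distance to the origin (lower is better).
    return sum(x * x for x in solution)

def evolve_one_generation(population):
    # Stand-in for one generation of any EA instance on an island:
    # truncation selection followed by Gaussian mutation.
    population = sorted(population, key=fitness)
    parents = population[:len(population) // 2]
    children = [[x + random.gauss(0, 0.1) for x in p] for p in parents]
    return parents + children

def migrate_ring(islands, size):
    # Ring topology: each island sends copies of its `size` best solutions
    # to its right-hand neighbour, displacing the neighbour's worst ones.
    emigrants = [sorted(pop, key=fitness)[:size] for pop in islands]
    for i, pop in enumerate(islands):
        pop.sort(key=fitness)
        pop[-size:] = [s[:] for s in emigrants[(i - 1) % len(islands)]]

# Assumed settings echoing the experimental ranges: 5 islands of 20
# solutions, a migration interval of 5 generations, a migration size of 5.
islands = [[[random.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
           for _ in range(5)]
for generation in range(1, 21):
    islands = [evolve_one_generation(pop) for pop in islands]
    if generation % 5 == 0:  # migration interval
        migrate_ring(islands, size=5)
```

Interval, size, policy and topology each correspond to one explicit knob here (the loop condition, the `size` argument, the sorting in `migrate_ring`, and the `(i - 1) % len(islands)` neighbour index respectively).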

Figure 3. Island-based Evolutionary Algorithm using MapReduce. The number of islands is denoted by m. Each island has a single Reduce task which collects simulation results from n Map tasks.

In our MapReduce implementation (Fig. 3), islands are distinguished by the Reduce tasks. The Reduce tasks collect the island-specific solutions (sorted by k, see Fig. 1) and execute the evolutionary selection and recombination processes. According to the migration settings, the Reduce tasks can also select a subset of solutions to be sent to a neighbouring island. In addition to exploring different migration settings, we also examine particular EA configurations, which are motivated as follows:

Multiple EA parameter settings: NSGAII is examined when multiple parameter settings are employed on different islands. Parameter tuning is a time-consuming effort which may affect the EA performance. The island-based model is here employed to examine, in parallel, different sets of values for specific NSGAII parameters (crossover rate and index, mutation probability and index). This may relieve the user from having to tune the EA. Specifically, two sets of parameter values are used to promote two types of search behaviour: exploitation and exploration. A similar study [19] was applied using different EAs.

Multiple EAs: this method is relatively similar to the previous one, as an ensemble of different EAs is employed over the different islands. According to [24], no single EA may consistently outperform all other EAs. An alternative is thus to employ multiple EAs, each of which is executed on a distinct island. The strengths of the different EAs may then be combined to provide consistently competitive results. Such techniques were investigated in [20].

Search space partition: another island-based strategy is to partition the search space. Each island focuses on a pre-defined region of the search space (the entire search space is thus explored collectively by the different islands). This may potentially simplify the search process for each EA instance (through reducing the search space dimensionality). A related study can be found in [12].

A ring topology is utilized in all island-based model experiments. The next section reports the model utilized and the experimental results obtained using the above MapReduce evolutionary computational models.

5. EXPERIMENTS

This section first presents the red teaming case study; the experimental results are then reported and discussed.

5.1. Model

The scenario (Fig. 4) describes a military operation that involves 5 Red (enemy) ships, 7 Blue (friendly) ships and 10 Green (neutral) ships. Each Green ship is a trading ship and a high-value target for the Red ships. The Blue ships protect the Green ships against Red attacks. The goal of the evolutionary search is to discover Red tactics which can destroy most Green ships whilst allowing the Red ships to escape (i.e. Red is not suicidal). In CASE, each candidate solution (a distinct simulation model) is represented by a vector of real values defining the different evolvable Red behavioural parameters (Table 2). As the number of decision variables increases, the search space becomes dramatically larger. Behavioural or psychological elements are included in the evolvable decision variables. The aggressiveness determines the reaction of individual vessels upon detecting an

Parameter                              Min      Max
Vessel home position (x,y)             (0,0)    (399,39)
Intermediate waypoint position (x,y)   (0,40)   (399,159)
Vessel final position (x,y)            (0,160)  (399,199)
Determination                          20       100
Aggressiveness                         -100     100
Cohesiveness                           -100     100

Table 2. Evolvable Red parameters. The Red routes (defined by series of waypoints) are subjected to evolution. The final positions of the Red craft are constrained to the opposite region (with respect to the initial area) to simulate escapes from the anchorage following successful attacks.

the true Pareto front is here unknown. In the next section we report the experimental results obtained using the above model.
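The candidate encoding can be sketched from the bounds of Table 2. The parameter names below are hypothetical (CASE's actual model specifications are XML files), and the Gaussian perturbation merely stands in for an EA variation operator:

```python
import random

# Bounds taken from Table 2; the waypoint coordinates are flattened to
# (x, y) components. Names are illustrative, not CASE's actual schema.
BOUNDS = {
    "home_x": (0, 399),      "home_y": (0, 39),
    "waypoint_x": (0, 399),  "waypoint_y": (40, 159),
    "final_x": (0, 399),     "final_y": (160, 199),
    "determination": (20, 100),
    "aggressiveness": (-100, 100),
    "cohesiveness": (-100, 100),
}

def random_candidate():
    # Sample an initial Red tactic uniformly within the legal ranges.
    return {p: random.uniform(lo, hi) for p, (lo, hi) in BOUNDS.items()}

def repair(candidate):
    # Clip evolved values back into their legal ranges, e.g. so that the
    # final position stays in the escape region opposite the home area.
    return {p: min(max(candidate[p], lo), hi) for p, (lo, hi) in BOUNDS.items()}

mutated = {p: v + random.gauss(0, 50) for p, v in random_candidate().items()}
legal = repair(mutated)
assert all(lo <= legal[p] <= hi for p, (lo, hi) in BOUNDS.items())
```

Clipping is only one possible repair policy; reflection or re-sampling are common alternatives for enforcing such box constraints.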

Figure 4. Maritime anchorage protection scenario modelled using the agent-based simulation platform MANA. The map covers an area of 100 by 50 nautical miles (1 nm = 1.852 km). 7 Blue vessels conduct patrols to protect an anchorage of 10 Green commercial ships against Red ships. The Red ships intend to break through Blue's defence, inflict damage on the anchored Green ships and, finally, escape to the Red safety area. Fixed agent behavioural parameters are listed in Table 1.
(a) Fixed Blue parameters

Parameter             Value
Detection range (nm)  24
# hits to be killed   2
Weapon hit prob.      0.8
# patrolling agents   7
Speed (unit)          100
Weapon range (nm)     8
Determination         50, 0
Aggressiveness        0, 100
Cohesiveness          0

(b) Fixed Red parameters

Parameter             Value
Detection range (nm)  8
# hits to be killed   1
Weapon hit prob.      0.8
# agents              5
Speed (unit)          100
Weapon range (nm)     5

Table 1. (a): Fixed Blue parameters. Value pairs are specified for the determination and aggressiveness properties. In this model, Blue changes its behaviour upon detecting Red, i.e., Blue targets Red, with aggressiveness being increased, when the latter is within Blue's detection range. (b): Fixed Red parameters. The behavioural parameters are not specified as these parameters are subjected to evolution.

5.2. Results

The experiments were performed using a 37-node Apache Hadoop cluster. The master node is a dedicated physical computer, and the other 36 slave nodes are made up of 30 non-dedicated physical computers and 6 dedicated virtual machines. Since most of the slave nodes are non-dedicated laboratory computers, the experiments can be affected by nodes that are, occasionally, being utilized by other users. This accounts for the hazards that are unavoidable in a distributed heterogeneous computing environment. We deliberately did not use a dedicated computing environment, in order to test the fault-tolerant features of the Apache Hadoop framework. To evaluate the quality of the (multi-objective) solutions through the evolutionary search, the hypervolume indicator [28] is utilized. This method is currently considered the state-of-the-art technique to evaluate Pareto fronts. The indicator measures the size of the objective space subset dominated by the Pareto front approximation. A hundred search iterations were conducted in all evolutionary runs. A single Map task may include up to 20 simulation models for execution. The experimental settings and results are summarized in Table 3. The following observations are made: We can observe that some series of experiments (i.e. S8, S9, S14) clearly exhibited weaker performances when considering their final mean hypervolume indicator values. With regard to the other series, hypervolume indicator values averaging -31 with a standard deviation of 2 can be noted. However, these series achieved such Pareto optimality performances with different real-time requirements. Indeed, series S3, S4, S5, S6 and S7 took almost 5 hours to complete. This is naturally due to the significantly larger population size of 500 used in these experiments. Nevertheless the results of these series are comparable with the results of S1 and S2, which

adversary (from coward to very aggressive). Cohesiveness influences the propensity of vessels to maneuver as a group or not (from independent to very cohesive), whereas determination stands for the agents' willingness to follow the defined routes (go to the next waypoint, with a minimum value of 20 to prevent inaction from occurring). The efficiency of the search is measured by the number of Green casualties with respect to the number of Red casualties. In other words, the search objectives are: 1) to minimize the number of Green (commercial) vessels alive and 2) to minimize the number of Red casualties. Considering the current scenario, these objectives are thus conflicting. Moreover,

Series  Evolutionary algorithm(s)         PS   Im  MI   MS   Mean      Std. dev.  Best      Ex. time  Fail.
S1      NSGAII                            200  1   N/A  N/A  -30.9626  1.9704     -31.1959  2:05:05   0
S2      HypE                              200  1   N/A  N/A  -32.9987  1.7614     -34.1106  2:00:27   10
S3      MODE                              500  1   N/A  N/A  -29.5069  1.6910     -32.4411  4:58:59   0
S4      NSGAII                            500  1   N/A  N/A  -32.7307  1.6672     -32.9662  4:50:19   0
S5      SHV                               500  1   N/A  N/A  -30.0603  2.9022     -31.6328  4:59:22   0
S6      HypE                              500  1   N/A  N/A  -32.4202  2.0611     -36.2019  5:03:19   11
S7      SPEA2                             500  1   N/A  N/A  -31.7765  2.4230     -36.4701  5:15:06   6
S8      NSGAII                            100  2   5    5    -26.3269  1.7694     -28.0703  2:13:12   2
S9      NSGAII                            100  4   5    5    -18.8606  1.7622     -19.4497  4:11:35   1
S10     MODE, NSGAII, SHV, HypE, SPEA2    100  5   5    5    -33.5938  1.4767     -34.6156  5:14:49   0
S11     MODE, NSGAII, SHV, HypE, SPEA2    100  5   10   5    -33.0039  0.2235     -33.2138  5:18:46   5
S12     MODE, NSGAII, SHV, HypE, SPEA2    100  5   5    10   -30.1750  0.8312     -30.9952  5:11:48   0
S13     MODE, NSGAII, SHV, HypE, SPEA2    100  5   10   10   -29.4991  1.0213     -30.1967  5:08:55   0
S14     NSGAIIp1, NSGAIIp2                100  2   5    5    -26.8779  2.4649     -29.5718  1:55:44   0
S15     NSGAIIp1, NSGAIIp3                100  2   5    5    -32.1435  0.6754     -32.6782  2:11:28   11
S16     NSGAII, HypE                      100  2   5    5    -31.8141  2.0893     -34.6772  1:52:23   0
S17     NSGAII, HypE                      100  2   5    10   -31.7359  1.7304     -32.4516  2:14:33   13
S18     NSGAII, HypE                      100  2   10   5    -33.0292  1.7969     -33.8587  2:04:39   5
S19     NSGAII, HypE                      100  2   10   10   -32.0045  2.0890     -32.6467  1:57:24   1
S20     NSGAIISP                          100  2   5    5    -31.3845  2.9795     -33.4357  1:54:49   0

Table 3. Summary of experimental settings and results averaged over 5 distinct evolutionary runs. PS is the population size. The number of islands is denoted by Im, MI is the migration interval (in number of search iterations), and MS is the migration size (number of exchanged solutions). An island-based model is employed when Im > 1, in which case PS stands for the population size per island. The mean indicates the hypervolume indicator value (lower is better), obtained at the final search iteration and averaged over the 5 runs. Specific sets of parameter settings are distinguished for NSGAII: p1 = {xr = 0.9, xi = 20, mp = 0.1, mi = 20}, p2 = {xr = 0.3, xi = 20, mp = 0.8, mi = 20} and p3 = {xr = 0.3, xi = 20, mp = 0.8, mi = 5}, with xr, xi, mp and mi being the crossover rate, crossover index, mutation probability and mutation index respectively. SP identifies the search partitioning experiment where only half of the parameters were evolved on each island. Ex. time stands for the real wall-clock time (averaged over 5 runs) required to complete a single evolutionary run. Fail. indicates the total number of computer failures (e.g. due to computer shutdown) that occurred during the 5 experimental series. The bold figures identify the absolute best results whereas the underlined ones discriminate the best results accounting for the running times.

employed a population size of 200 (and took 2 hours to complete). This suggests that using a simple distributed model with a large population size would not deliver any improvement in terms of search convergence speed (some representative hypervolume indicator value dynamics can be observed in Fig. 5). Further experiments are required to determine the optimal population size (accounting for both Pareto optimality and experimental time requirements) for such distributed evolutionary algorithms.
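For the two-objective minimization case considered here, the hypervolume indicator can be computed with a simple sweep over the non-dominated points. The sketch below returns the (positive) dominated area; the reference point and the example front are assumptions for illustration, and the negated, lower-is-better convention of Table 3 is not reproduced:

```python
def pareto_front(points):
    # Keep the non-dominated points of a two-objective minimization problem.
    front = [p for p in points
             if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)]
    return sorted(set(front))

def hypervolume_2d(points, ref):
    # Area dominated by the front and bounded by the reference point `ref`.
    front = pareto_front(points)
    volume, prev_f1 = 0.0, ref[0]
    for f1, f2 in sorted(front, reverse=True):  # sweep f1 from ref towards best
        volume += (prev_f1 - f1) * (ref[1] - f2)
        prev_f1 = f1
    return volume

# Hypothetical front, e.g. objectives (# Green ships alive, # Red casualties).
front = [(1, 4), (2, 2), (5, 1)]
print(hypervolume_2d(front, ref=(10, 5)))
# → 30.0
```

Larger dominated areas indicate better fronts; exact computation is cheap in two dimensions, whereas higher-dimensional cases motivate the Monte Carlo estimation used by SHV [1].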
Although S3, S4, S5, S6 and S7 took a longer time to complete, one may also argue that these series may have achieved competitive results at an earlier stage. However, when examining Fig. 5, we can observe that the (representative) series S6 did not exhibit any improved search convergence speed when compared with S1 and S2. Similarly, series S10, S11, S12 and S13, which employed a multi-population/EA scheme, achieved results comparable to series S3, ..., S7 in terms of both Pareto optimality and running time requirements. However we can note, particularly for S11 and S12, that the standard deviation (0.2235 and 0.8312 respectively) is significantly lower than in any other evolutionary runs. This suggests that this scheme may potentially yield interesting benefits in terms of performance consistency. This would increase the end-user confidence in using such stochastic search algorithms. The results of the dual EA/population schemes (S16, ..., S19) were fairly consistent. This indicates that this model is, to some extent, quite insensitive to the different migration settings employed. Also note that a number of computer failures occurred in some of the experimental series. Nevertheless, the Apache Hadoop framework successfully dealt with those failures and rescheduled the jobs accordingly, without affecting the overall experimental running time and simulation results. This demonstrates the fault-tolerant features of the cloud computing paradigm.

Although the series S20 exhibited competitive results, it achieved the worst standard deviation (2.9795). This suggests that the space partition scheme may not be as promising as some of the other approaches described above. Further exhaustive experiments remain necessary to investigate this particular island-based model.
[Figure 5 plot: hypervolume indicator value versus generation (0-100) for series S1, S2, S6, S10 and S15.]

Acknowledgements
We thank the Defence Research and Technology Office, Ministry of Defence, Singapore, for sponsoring the Evolutionary Computing Based Methodologies for Modeling, Simulation and Analysis project, which is part of the Defense Innovative Research Programme FY08.

REFERENCES
[1] J. Bader, K. Deb, and E. Zitzler. Faster Hypervolume-based Search using Monte Carlo Sampling. Multiple Criteria Decision Making for Sustainable Energy and Transportation Systems, pages 313-326, 2010.
[2] J. Bader and E. Zitzler. A Hypervolume-Based Optimizer for High-Dimensional Objective Spaces. New Developments in Multiple Objective and Goal Programming, pages 35-54, 2010.
[3] J.N. Cawse, G. Gazzola, and N. Packard. Efficient Discovery and Optimization of Complex High-throughput Experiments. Catalysis Today, 159(1):55-63, 2010.
[4] C.S. Choo, C.L. Chua, and S.H.V. Tay. Automated Red Teaming: a Proposed Framework for Military Application. In Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, pages 1936-1942. ACM, 2007.
[5] K. De Jong and J. Sarma. On Decentralizing Selection Algorithms. In Proceedings of the Sixth International Conference on Genetic Algorithms, pages 17-23, 1995.
[6] J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. Commun. ACM, 51:107-113, January 2008.
[7] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182-197, 2002.
[8] J. Decraene, M. Chandramohan, M.Y.H. Low, and C.S. Choo. Evolvable Simulations Applied to Automated Red Teaming: A Preliminary Study. In Proceedings of the 42nd Winter Simulation Conference, pages 1444-1455. ACM, 2010.
[9] J. Decraene, M.Y.H. Low, F. Zeng, S. Zhou, and W. Cai. Automated Modeling and Analysis of Agent-based Simulations using the CASE Framework. In Proceedings of the 11th International Conference on Control, Automation, Robotics and Vision (ICARCV), pages 346-351. IEEE, 2010.

Figure 5. Hypervolume indicator value (averaged over 5 distinct runs) dynamics for a selection (for clarity) of the evolutionary experiments.

Overall, the results suggest that the distributed model of series S2 (using HypE) and the dual population schemes of series S15, ..., S19 are relatively promising in terms of Pareto optimality, performance consistency and experimental running time requirements. Although the results presented above are informative, further experimental work remains critically necessary. Indeed, this study only examined a single case study. Additional simulation models of differing complexity must be investigated to provide more significant results. Nevertheless, conducting such empirical investigations is a highly time-consuming endeavour (the above experiments required between 10 and 25 hours for the evaluation of a single distributed evolutionary computation model using a 37-node cluster). Thus, future work would greatly benefit from using very large scale cloud computing facilities.

6. CONCLUSIONS

An initial empirical investigation of evolutionary design of experiments using the MapReduce framework was presented. Multiple distributed evolutionary computation models were adapted and implemented using MapReduce. The experiments examined an agent-based simulation model of a maritime anchorage protection scenario. The experimental results highlighted the potential weaknesses and strengths of the different distributed evolutionary models. Nevertheless, further exhaustive experiments are still required to gain confidence in the results.

[10] J. Decraene, Y.C. Yong, M.Y.H. Low, S. Zhou, W. Cai, and C.S. Choo. Evolving Agent-based Simulations in the Clouds. In Third International Workshop on Advanced Computational Intelligence (IWACI), pages 244-249. IEEE, 2010.
[11] J. Decraene, F. Zeng, M.Y.H. Low, S. Zhou, and W. Cai. Research Advances in Automated Red Teaming. In Proceedings of the 2010 Spring Simulation Multiconference (SpringSim), pages 47:1-47:8. ACM, 2010.
[12] D. Gong and Y. Zhou. Multi-population Genetic Algorithms with Space Partition for Multi-objective Optimization Problems. International Journal of Computer Science and Network Security, 6(2A):52-58, 2006.
[13] A. Ilachinski. Artificial War: Multiagent-based Simulation of Combat. World Scientific Pub Co Inc, 2004.
[14] C. Jin, C. Vecchiola, and R. Buyya. MRPGA: An Extension of MapReduce for Parallelizing Genetic Algorithms. In ESCIENCE '08: Proceedings of the 2008 Fourth IEEE International Conference on eScience, pages 214-221, Washington, DC, USA, 2008. IEEE Computer Society.
[15] C. Kwan, F. Yang, and C. Chang. A Differential Evolution Variant of NSGA-II for Real World Multiobjective Optimization. In Proceedings of the 3rd Australian Conference on Progress in Artificial Life, ACAL'07, pages 345-356, Berlin, Heidelberg, 2007. Springer-Verlag.
[16] M. Lauren and R. Stephen. Map-aware Non-uniform Automata (MANA) - A New Zealand Approach to Scenario Modelling. Journal of Battlefield Technology, 5:27-31, 2002.
[17] X. Llora, A. Verma, R. Campbell, and D. Goldberg. When Huge Is Routine: Scaling Genetic Algorithms and Estimation of Distribution Algorithms via Data-Intensive Computing. Parallel and Distributed Computational Intelligence, pages 11-41, 2010.
[18] M. Y. H. Low, M. Chandramohan, and C. S. Choo. Multi-Objective Bee Colony Optimization Algorithm to Automated Red Teaming. In Proceedings of the 41st Winter Simulation Conference, pages 1798-1808. ACM, 2009.
[19] R. Mallipeddi, P.N. Suganthan, Q.K. Pan, and M.F. Tasgetiren. Differential Evolution Algorithm with Ensemble of Parameters and Mutation Strategies. Applied Soft Computing, 11(2):1679-1696, 2011.

[20] M.G. Tasgetiren, P.N. Suganthan, and Q.K. Pan. An Ensemble of Discrete Differential Evolution Algorithms for Solving the Generalized Traveling Salesman Problem. Applied Mathematics and Computation, 215(9):33563368, 2010. [21] J. Tate, B. Woolford-Lim, I. Bate, and X. Yao. Comparing Design of Experiments and Evolutionary Approaches to Multi-objective Optimisation of Sensornet Protocols. In IEEE Congress on Evolutionary Computation (CEC09)., pages 11371144. IEEE, 2009. [22] L.M. Vaquero, L. Rodero-Merino, J. Caceres, and M. Lindner. A Break in the Clouds: Towards a Cloud Denition. SIGCOMM Comput. Commun. Rev., 39:50 55, December 2008. [23] D. Whitley, R. Soraya, and R.B. Heckendorn. The Island Model Genetic Algorithm: On Separability, Population Size and Convergence. Journal of Computing and Information Technology, 7:3347, 1998. [24] D.H. Wolpert and W.G. Macready. No Free Lunch Theorems for Optimization. IEEE Transactions on Evolutionary Computation, 1(1):6782, 1997. [25] A. C. H. Wong, C. L. Chua, Y. K. Lim, S. C. Kang, C. L. J. Teo, T. Lampe, P. Hingston, and B. Abbott. Team 1: Applying Automated Red Teaming in a Maritime Scenario. In Scythe 3: Proceedings and Bulletin of the International Data Farming Community, pages 35, 2007. [26] Y. L. Xu, M. Y. H. Low, and C. S. Choo. Enhancing Automated Red Teaming with Evolvable Simulation. In Proceedings of the rst ACM/SIGEVO Summit on Genetic and Evolutionary Computation, pages 687694. ACM, 2009. [27] A. Yang, HA Abbass, and R. Sarker. Characterizing Warfare in Red Teaming. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 36(2):268285, 2006. [28] E. Zitzler, D. Brockhoff, and L. Thiele. The Hypervolume Indicator Revisited: On the Design of Paretocompliant Indicators Via Weighted Integration. In Proceedings of The 4th International Conference on Evolutionary Multi-criterion Optimization, Lecture notes in computer science, volume 4403, pages 862876. Springer, 2007. [29] E. Zitzler, M. Laumanns, L. 
Thiele, et al. SPEA2: Improving the Strength Pareto Evolutionary Algorithm. In Proceedings of EUROGEN 2001 - Evolutionary Methods for Design, Optimisation and Control with Applications to Industrial Problems, pages 95100, 2001.
