WS 2006/07
Seminar work
Seminar "Grid Computing 2" (SE 2.0, 703822) SIMULATING THE GRID
Abstract In the scientific world, with its increasing number of large scientific collaborations, the demand for computational power grows steadily. One feasible way to provide the resources is Grid [2] computing. Analyzing widespread grids in order to understand their large-scale behavior and to optimize the use of resources is a crucial task, for which simulation offers a low-cost, effective solution. In this seminar work we give an overview of the vast field of simulation in general and of the simulation of grids and grid-like environments in particular. First, the most important simulation techniques are highlighted and some of the numerous existing simulation tools are introduced. Next, related work in the grid simulation field is presented. We then pick GridSim, one of the freely available tools, to explain the handling and usage of a simulation tool by example. The detailed explanation includes installation, setup and working examples as well as a small experiment performed using our selected tool.
1. Introduction
Grids are complex, heterogeneous, disparate systems that span several sites and organizations. Numerous research groups from universities, research labs and industry are working with Grids, and there are teaching programs that focus on Grid computing, so Grids are widespread nowadays. Still, their large-scale behavior is poorly understood. Students often have no access to Grid testbeds, or only to small ones on which a large-scale evaluation of scalability problems is not possible. Even with access to a large testbed there are some points to consider: Using a real testbed incurs real cost; for example, analyzing new models and algorithms requires a large number of tests involving as many resources as are available. A real testbed may also not provide a repeatable and controllable environment for specific hypothetical problems; for example, the experiment may compete with regular Grid users for the resources that are currently available. By simulating a Grid environment it is possible to develop an understanding of the overall system behavior under different user and resource constraints. Hours of real job time can be simulated in seconds, provided the simulation has sufficient processor power. On the network level it is possible to find bottlenecks and unused links, and the dataflow between resources and users can be analyzed without stressing and overloading the actual infrastructure. In a business environment a simulation can help to optimize the cost of a workflow; in a more scientific setting a simulation can help to find out where to place resources (e.g. for caching) before they are needed, how to use existing resources in an optimal way, and to test different scheduling, brokerage, policing and security strategies.
Finally, a simulation can answer the question whether moving the computation of a problem to a Grid infrastructure is worth the usually considerable overhead (such as joining virtual organizations, restructuring software, hiring additional personnel or outsourcing, and dealing with security issues) or whether participating in a Grid is not worth the cost.
The rest of this seminar work is structured as follows: Section 2 gives a theoretical basis for the topic by explaining the most important simulation techniques that are applied in real simulations, while section 3 concentrates on practice, presenting a compilation and brief analysis of several selected, widely used simulation tools. Section 4 gives a brief overview of related work in this huge field of research. In section 5, which makes up about half of this work, we present GridSim, one of the simulation tools from section 3, in detail, showing the installation, working examples and more detailed background information. Finally, section 6 concludes this seminar work.
lightweight tables. Updates of the table entries are triggered by events and calculated according to mathematical equations. Figure 1 shows a possible implementation of event queuing. A time wheel is a fast implementation using a round-robin queue, for example a ring buffer with a constant number of time steps and a counter that proceeds modulo the size of the buffer. Each position of the time wheel corresponds to a specific point in time and can hold a list of events. During execution the counter is advanced to the next time step with a non-empty list, where the events are evaluated (consumed).
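The time wheel described above can be sketched in a few lines of Java. This is a minimal illustration under our own assumptions, not the implementation shown in Figure 1: the class name TimeWheel, the use of strings as events and the restriction that a delay must fit into one turn of the wheel are all hypothetical choices.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal time wheel: a ring buffer of event lists, indexed by time step modulo its size. */
class TimeWheel {
    private final List<List<String>> slots = new ArrayList<>();
    private long now = 0; // current simulation time step

    TimeWheel(int size) {
        for (int i = 0; i < size; i++) {
            slots.add(new ArrayList<>());
        }
    }

    /** Schedules an event 'delay' steps ahead; the delay must fit into one turn of the wheel. */
    void schedule(int delay, String event) {
        if (delay < 0 || delay >= slots.size()) {
            throw new IllegalArgumentException("delay must be in [0, " + slots.size() + ")");
        }
        slots.get((int) ((now + delay) % slots.size())).add(event);
    }

    /** Advances the counter to the next non-empty slot and consumes its events. */
    List<String> advance() {
        for (int step = 0; step < slots.size(); step++) {
            List<String> pending = slots.get((int) ((now + step) % slots.size()));
            if (!pending.isEmpty()) {
                now += step;                        // skip the empty time steps
                List<String> consumed = new ArrayList<>(pending);
                pending.clear();
                return consumed;
            }
        }
        return new ArrayList<>();                   // nothing is scheduled
    }

    long now() {
        return now;
    }
}
```

Because every pending event lies less than one turn ahead of the counter, scanning the slots in ring order visits the events in time order without any sorting.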
A special type of discrete simulation is agent-based simulation (a model that is used, for example, in many bioscience applications). There is no underlying equation; instead, the individual entities (such as routers, network links, data producers or consumers) in the model are represented directly (rather than by distance functions or average computation time formulas) and possess an internal state and a set of behaviors or rules that determine how the agent's state is updated from one time step to the next.
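To make the idea of agents with an internal state and an update rule more concrete, the following Java sketch models a producer that injects jobs into a router. All names (Agent, RouterAgent, ProducerAgent, AgentSimulation) and the fixed rates are hypothetical and chosen only for illustration; they are not taken from any of the tools discussed here.

```java
import java.util.List;

/** An agent owns an internal state and a rule that updates it once per time step. */
interface Agent {
    void step(int time);
}

/** Hypothetical router agent: its state is the number of jobs waiting in its queue. */
class RouterAgent implements Agent {
    private final int capacityPerStep; // jobs the router can forward per time step
    private int queued = 0;            // internal state: jobs currently waiting
    private int forwarded = 0;         // internal state: jobs forwarded so far

    RouterAgent(int capacityPerStep) {
        this.capacityPerStep = capacityPerStep;
    }

    void accept(int jobs) {
        queued += jobs;
    }

    @Override
    public void step(int time) {
        // Rule: forward as many queued jobs as the capacity allows.
        int sent = Math.min(queued, capacityPerStep);
        queued -= sent;
        forwarded += sent;
    }

    int queued() {
        return queued;
    }

    int forwarded() {
        return forwarded;
    }
}

/** Hypothetical producer agent: injects a fixed number of jobs into a router each step. */
class ProducerAgent implements Agent {
    private final RouterAgent target;
    private final int jobsPerStep;

    ProducerAgent(RouterAgent target, int jobsPerStep) {
        this.target = target;
        this.jobsPerStep = jobsPerStep;
    }

    @Override
    public void step(int time) {
        target.accept(jobsPerStep);
    }
}

class AgentSimulation {
    /** Updates every agent once per time step, in list order. */
    static void run(List<Agent> agents, int steps) {
        for (int time = 0; time < steps; time++) {
            for (Agent agent : agents) {
                agent.step(time);
            }
        }
    }
}
```

With a producer that injects three jobs per step and a router that forwards two, the queue grows by one job per step, so an overloaded router shows up directly in the agent's state rather than in an averaged formula.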
3.1 Bricks
The first [33] Grid resource scheduling simulator is Bricks [4]. It is a discrete event simulator written in Java. It allows the definition of different network topologies and the simulation of resource allocation for multiple clients and servers with different strategies. Bricks offers an interface for replacing components of the scheduling units; in addition, real existing computing components can be imported into the simulated environment. The simulator has been validated by comparing measurements on a simulated Grid with real monitoring using NWS.
3.4 GangSim
GangSim [14] is a scheduling algorithm simulator for huge systems with hundreds of thousands of computers and storage systems. It simulates components like sites and virtual organizations with different usage policies. This discrete event simulator periodically evaluates the state of all simulated components. GangSim is derived from Ganglia, a monitoring toolkit; in the simulation, the monitoring reports are generated by component models.
3.5 GridNet
GridNet [22] is a Data Grid simulation tool for dynamic data replication strategies. It has been developed to evaluate the performance of novel scalable distribution topologies that adapt replica placement to meet the needs of a large number of users who continuously change their data. GridNet is written in C++ and built on top of the event-driven network simulator ns [32]. The basic Grid network specification (nodes, links and messages) has been taken from ns, which allows different network topologies underlying the simulated environment to be modeled. The simulator modules are composed of objects that are mapped into ns. Data exchange is defined on the application level and passed down to ns nodes as a stream of packets.
3.7 HuskySim
HuskySim [38] is a discrete event Grid simulator toolkit developed in Java. Its purpose is to simulate both static and dynamic assignment of job execution to different processors, as well as a hybrid approach combining static and dynamic techniques.
3.8 MicroGrid
The goal of the MicroGrid [11] [35] project is to develop and implement simulation tools that provide a vehicle for the convenient scientific study of grid topologies and application performance issues. The MicroGrid provides a virtual grid infrastructure supporting controlled, repeatable experiments. Computation and applications are emulated; actual code is executed on virtualized resources using some physical CPU. The network is simulated at packet level by discrete events. MicroGrid has been designed to emulate Globus Grids; all Globus components run on virtual hosts. The emulation runs continuously; however, it is possible to manage a virtual time that is passed to the applications through system calls.
3.9 OptorSim
OptorSim [13] was designed to test various replication optimization strategies in a simulated Grid environment before they are deployed in the real Grid, especially simulating data access optimization algorithms. It uses discrete event simulation. This open source simulator has been developed in the framework of the European DataGrid (EDG) as a joint effort of ITC-first, University of Glasgow and CERN [29]. The architecture, shown in Figure 3, is based on the EDG model where sites provide computational and data storage resources, both modeled as computing elements (CEs), resource brokers schedule the jobs to CEs and routers without CEs. Each site handles its file content with replica managers; replicas are automatically created and destroyed using replica optimizers with different algorithms. The network topology can be described by enumerating the nodes and specifying the bandwidth; in addition, several file access patterns are available for configuration.
Emulab [23] is maintained by the University of Utah. It is an emulator that runs on over 200 nodes, most of them PCs. It uses the same script language as the ns [32] network simulator, which is a widely used simulation tool for network topologies. Emulab can be used as a frontend to PlanetLab; however, not all functionality of PlanetLab can be accessed.
4. Related Work
The large body of related work on Grid simulation shows the importance of research in this field. We have picked out some articles to demonstrate that new simulators are constantly being developed for various reasons, and some work related to the tools presented in the previous section to underline their relevance. In [7] the authors report on calculations of loading, scaling and utilization behaviors of computational grids, based on simulations. Of particular interest is that the Grid overlay network is represented by a topologically detailed graph and that a special case of agent-based simulation has been applied. Agents are used to represent computational jobs, users and resources. Job tokens can be encapsulated and sent between resource agents through FIFO queues. These mobile agents represent grid jobs; each communicating agent can be either an injector (producer), a filter (e.g. a router with incoming and outgoing queues) or a consumer of jobs. The agents in the system are simulated using a stochastic Monte Carlo procedure. Grid graphs can be visualized and, for example, colored according to usage. On a macroscopic scale such a simulation seems to deliver excellent results. In [28] the effects of several scheduling and replica optimization strategies are considered. The simulation is based on the real environment of the UK Grid for Particle Physics. The authors point out that several existing simulators like ChicagoSim, GridSim and GridNet concentrate on scheduling problems, while what they need is a simulation that combines scheduling with the optimization of replication strategies to obtain the best performance from all the Grid's resources. This has been achieved with OptorSim [13]. Looking at the number of simulators for scheduling and replication optimization, none of which really addresses security, it seems that simulating security is not a big issue in Grid simulation.
To fill this gap, [6] presents GS3, a simulation tool that focuses on security issues. The main motivation is that the swelling number of applications and the consequent increase in the amount of critical data on grids have considerably raised the stakes for an efficient security architecture; still, establishing security solutions for computational grids remains in its initial stages, as there are a number of impediments to the successful implementation of these security designs on a real grid. ARMS is an agent-based resource management system for grid computing in which agents are organized into a hierarchy and cooperate with each other to discover available grid resources using decentralized resource advertisement and discovery. Since a large-scale application of ARMS is not available, ARMSim, an ARMS performance modeling and simulation environment, is presented in [16]. [33] is an excellent, up-to-date survey of Grid research tools. It includes an analysis of simulators (Bricks, SimGrid, GridSim, GangSim, OptorSim), emulators (MicroGrid, Grid eXplorer) and a new category: real-life experimental platforms, i.e. testbeds built explicitly for Grid research (DAS2000, Grid5000). Of particular interest are the validation issues raised in this survey; for example, the validation of MicroGrid is held in high esteem.
In [31] an overview of established simulation tools can be found. A tabular overview is based on [3], a paper often referenced in related work. The analyzed tools are OptorSim, P2Psim, PlanetSim, Peersim and GridSim. Part of the work deals with agent-based simulators, mostly used in artificial intelligence and biological research, such as SWARM, RePast, JAS and Diet Agents. The Catnet simulator, developed by the authors themselves, is presented there. Of special interest is the modeling of a Grid infrastructure as an overlay network, namely as ALNs (Application Layer Networks). The TRUST website [37] shows several testbeds used for simulation research. The common objective is to study security issues in different aspects of distributed systems. The testbeds listed are: the Cyber Defense Technology Experimental Research (DETER) network testbed, a multi-institution project jointly funded by DHS, DARPA and NSF that deals with self-propagating malicious code (worms), Distributed Denial of Service (DDoS), and attacks against routing hardware and inter-/intra-autonomous system routing protocols; the Secure Network Embedded Systems Testbed for secure sensor networks; the PlanetLab testbed for networking, peer-to-peer networking and distributed systems; and the Electric Power Grid Testbed for power infrastructure protection. In [12] a short overview of simulation languages, simulation libraries and specific simulators can be found. The authors compare some established tools and point out the following limitations: The Bricks system is useful for simulating client-server-like global computing multi-user systems, but focuses on centralized overall system performance and service rates. The MicroGrid system is a Globus emulator; it expects applications and schedulers to be constructed using the Globus toolkit, and the evaluation of large-scale Grid scenarios and configurations takes a huge amount of real time. SimGrid supports the modeling of resources that are time-shared and is restricted to single-user environments.
It is targeted at developing schedulers that minimize the application execution time span. The authors are closely associated with GridSim and show some of the usage potential of this tool. In [20] and [26] a lot of information on how to simulate with GridSim is given, together with comprehensible examples. [8] is a website that provides an excellent compilation of presentations, publications, explanations and downloads regarding the simulation of data access optimization algorithms with OptorSim. In [25] algorithms implemented for a real scheduler are simulated using a mathematical system model of behavior (instead of the usual event-driven simulation or emulation). It is interesting to see that a stand-alone simulation without a comparison to a validated simulator or verification on a real testbed leaves some room for skepticism. [21], an IBM webpage, presents OptimalGrid, a research prototype of a grid-enabled collaboration framework. One of its capabilities is the simulation of a Grid on a stand-alone machine.
- investigate effective resource allocation techniques based on computational economy
- simulate millions of resources and thousands of users with varied requirements
- study scalability and efficiency of systems
- explore how significantly the local economy and the global positioning (time zone) of a particular resource play a role
- explore pricing and demand/supply situations
- analyze policies on large-scale distributed computing systems including the Internet, e-commerce, e-trading, ...
- new allocation/scheduling policy can be made and integrated into the GridSim Toolkit
- has the infrastructure or framework to support advance reservation of a grid system
- incorporates a functionality that reads workload traces taken from supercomputers for simulating a realistic grid environment
- incorporates an auction model into GridSim
- incorporates a datagrid extension into GridSim
- incorporates a network extension into GridSim
- incorporates a background network traffic functionality based on a probabilistic distribution. This is useful for simulating over a public network where the network is congested.
- incorporates multiple regional (unique timezone) GridInformationService (GIS) entities connected in a network topology
- adds ant build file to compile GridSim source files
// the GridSim class files
// GridSim and SimJava API Documentation
// GridSim examples
// the GridSim and SimJava2 jar archives
// the GridSim Java source code
// framework for the Auction model
// framework for the DataGrid model
// framework for the Network model
GridSim Toolkit v4.0 contains a simple Ant buildfile for compiling the GridSim classes. Ant is a Java-based build tool similar to 'make' that uses XML configuration files (http://ant.apache.org/). Ant can be used in both Windows and Unix/Linux environments. To use the Toolkit with Eclipse, just add the JAR file gridsim.jar to the project as an external JAR.
5.4 SimJava
SimJava is a discrete-event, process-oriented simulation package. It is an API that augments Java with building blocks for defining and running simulations. The original SimJava was based on HASE++, a C++ simulation library. HASE++ was in turn based on Jade's SIM++ [27]. It should be noted that SimJava is a general simulation package and not exclusively intended for network or grid environments. Each system is considered to be a set of interacting processes, or entities as they are called in SimJava. These entities are connected by ports and communicate with each other by passing events. A central system class controls all the threads, advances the simulation time, and delivers the events. The simulation is recorded through trace messages produced by the entities [27]. The simulation time progresses on the basis of sent and received events.
Before we can start coding, we have to create a simulation model of our real-life system. Defining the entities of the system under study so that valuable data can be measured is the trickiest part. In SimJava, entities are represented by the class Sim_entity. This class encapsulates all the functionality that should be available to entities in the simulation. For the modeller to define an entity, or rather an entity type, he must subclass Sim_entity. The subclass will then be implemented to contain the entity's desired behaviour. This behaviour is provided by means of the body() method, which must be overridden in the subclass [27]. The initialization takes place in the entity's constructor, where at least one port must be created, by means of which the entity communicates with others by scheduling events. Ports are represented by instances of Sim_port, which exist pairwise, since a sending port always needs a receiving port and vice versa. Statistical gatherers can be created as well in case the entity provides valuable information that helps to improve or analyze the real system. As previously mentioned, entities interact by sending each other events, represented by the Sim_event class. The sending entity schedules these objects on its port and the recipient catches the incoming events through its linked port.
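The way a central system class advances simulation time from event to event can be illustrated with a small plain-Java sketch. It deliberately does not use the SimJava API: the names Event, Entity and Kernel are hypothetical, entities are plain callbacks instead of threads, and ports are replaced by addressing the target entity by name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** A timestamped message for a named entity (illustrative, not SimJava's Sim_event). */
class Event implements Comparable<Event> {
    final double time;
    final String target;
    final String payload;

    Event(double time, String target, String payload) {
        this.time = time;
        this.target = target;
        this.payload = payload;
    }

    @Override
    public int compareTo(Event other) {
        return Double.compare(time, other.time); // earliest event first
    }
}

/** An entity reacts to events and may schedule new ones through the kernel. */
interface Entity {
    void handle(Event event, Kernel kernel);
}

/** Central controller: keeps the future event list, advances the clock, delivers events. */
class Kernel {
    private final Map<String, Entity> entities = new HashMap<>();
    private final PriorityQueue<Event> futureEvents = new PriorityQueue<>();
    private final List<String> trace = new ArrayList<>();
    private double clock = 0.0;

    void register(String name, Entity entity) {
        entities.put(name, entity);
    }

    /** Schedules an event 'delay' time units after the current simulation time. */
    void schedule(double delay, String target, String payload) {
        futureEvents.add(new Event(clock + delay, target, payload));
    }

    /** Runs until no events are left; the clock jumps from one event time to the next. */
    void run() {
        while (!futureEvents.isEmpty()) {
            Event event = futureEvents.poll();
            clock = event.time;
            trace.add(event.target + "@" + clock); // simple trace message
            Entity entity = entities.get(event.target);
            if (entity != null) {
                entity.handle(event, this);
            }
        }
    }

    double clock() {
        return clock;
    }

    List<String> trace() {
        return trace;
    }
}
```

A "Computer" entity could, for instance, be registered as `(e, k) -> k.schedule(30.0, "Disk", e.payload)`, so that every request it receives reaches a "Disk" entity 30 time units later. No time passes between events, which is why hours of simulated time can be covered in seconds.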
All of the simulation's behaviour is modelled through the SimJava runtime functions in the entity's body() method. These functions can be separated into six families:
- send events and/or data to a port: sim_schedule()
- wait functions: sim_wait(), sim_wait_for()
- check an entity's deferred queue: sim_waiting(), sim_select(), sim_get_next()
- busy processing: sim_process(), sim_process_for(), sim_process_until()
- inactive/pause (the time spent does not count towards the entity's utilisation): sim_pause(), sim_pause_for(), sim_pause_until()
- cancel or complete events: sim_cancel(), sim_completed()
We won't explain the function families in more detail here. Instead, we give an example of a very simple entity representing a disk to illustrate the concept. The average latency of the disk is given as a delay parameter. The constructor's first parameter is the entity name, which is passed to the superclass constructor; this constructor must always be called. The disk receives an access request as an event through its port, spends time processing this access and marks the event as completed.
import eduni.simjava.*;

// The class for the two disks
class Disk extends Sim_entity {
    private Sim_port in;
    private double delay;

    Disk(String name, double delay) {
        super(name);
        this.delay = delay;
        in = new Sim_port("In");
        add_port(in);
    }

    public void body() {
        while (Sim_system.running()) {
            Sim_event e = new Sim_event();
            sim_get_next(e);     // Get the next event
            sim_process(delay);  // Process the event
            sim_completed(e);    // The event has completed service
        }
    }
}
After modelling and coding the entities we have to initiate the simulation. There are basically 4 steps to proceed:
1. initialize Sim_system, the static controlling class
2. make instances of the defined entities
3. link the entities' ports
4. run the simulation
Again, we give an example of what such a task could look like. We pick up our previously defined Disk example and add two more entities: a Source class representing a user and a self-explanatory Computer class. A graphic illustrates the relationship between the entities. We do not show the newly defined entity classes (Source and Computer), nor do we bother about the usefulness of this simulation, since it is only meant to reflect the main concepts of SimJava.
The user triggers events to the computer, which in turn passes the events on to the disks in a random fashion. We create another class called ProcessorSubsystem whose main() method starts the simulation.
public class ProcessorSubsystem {
    public static void main(String[] args) {
        // Initialise Sim_system
        Sim_system.initialise();

        // Instantiate the entities
        Source source = new Source("User", 50);
        Computer computer = new Computer("Computer", 30);
        Disk disk1 = new Disk("Disk1", 60);
        Disk disk2 = new Disk("Disk2", 110);

        // Link the entities' ports
        Sim_system.link_ports("User", "Out", "Computer", "In");
        Sim_system.link_ports("Computer", "Out1", "Disk1", "In");
        Sim_system.link_ports("Computer", "Out2", "Disk2", "In");

        // Run the simulation
        Sim_system.run();
    }
}
Having discussed the principles of SimJava (Sim_system, Sim_entity, Sim_port, Sim_event), we finish this introduction with some additional SimJava features that we mention only briefly.
- predicates (event filter)
- simulation verification by means of Sim_trace
  - 3 types of trace
    - default (Sim_system information)
    - in entity by modeler (sim_trace(), sim_trace_level())
    - mark events of interest (track_event(), track_events())
  - trace detail: set_trace_detail(boolean default, boolean entity, boolean event)
- random generators (distributions)
  - add them to the entity by the add_generator() method
- statistical measurements by means of Sim_stat
  - 3 types of measures
    - rate based (occurrence of an event over a period of time)
    - state based (reflect the entity's state over a period of time)
    - interval based (time intervals that were experienced by events)
  - default and custom measurements
- conditions (focus the measurement on the transient or the stable system)
  - 2 types of conditions
    - transient: set_transient_condition() (event completion, elapsed time, min-max method)
    - termination: set_termination_condition() (event completion, elapsed time, confidence interval accuracy)
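The difference between the three types of measures listed above can be shown with a small calculation. The following plain-Java sketch does not use Sim_stat; the classes ServiceRecord and Measures, and the choice of throughput, utilisation and waiting time as representatives of the three types, are our own assumptions for illustration.

```java
import java.util.List;

/** One served request: when it arrived, when service started and when it ended. */
class ServiceRecord {
    final double arrival;
    final double start;
    final double end;

    ServiceRecord(double arrival, double start, double end) {
        this.arrival = arrival;
        this.start = start;
        this.end = end;
    }
}

class Measures {
    /** Rate based: completed requests per unit of observed time (throughput). */
    static double throughput(List<ServiceRecord> log, double period) {
        return log.size() / period;
    }

    /** State based: fraction of the observed time the entity spent in the busy state. */
    static double utilisation(List<ServiceRecord> log, double period) {
        double busy = 0.0;
        for (ServiceRecord record : log) {
            busy += record.end - record.start;
        }
        return busy / period;
    }

    /** Interval based: mean time a request waited between arrival and start of service. */
    static double meanWaitingTime(List<ServiceRecord> log) {
        if (log.isEmpty()) {
            return 0.0;
        }
        double waited = 0.0;
        for (ServiceRecord record : log) {
            waited += record.start - record.arrival;
        }
        return waited / log.size();
    }
}
```

For three requests served during the intervals 0 to 2, 2 to 5 and 6 to 8 within 10 time units, where only the second request had to wait (for one time unit), this gives a throughput of 0.3 requests per time unit, a utilisation of 0.7 and a mean waiting time of one third of a time unit.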
Figure 7: GridResource
GridInformationService (Grid Information Service) A Grid Information Service (GIS) is an entity that provides grid resource registration, indexing and discovery services. [27] This class is similar to the GRIS service in the Globus Toolkit.
Figure 8: GridInformationService
GridSim (users, source or a broker by instantiating or subclassing) This class is mainly responsible for the initialization, running and stopping of the overall simulation. Worth mentioning are the two static functions init() and startGridSimulation().
Figure 9: GridSim
AllocPolicy (a new scheduler by subclassing) New scheduling algorithms can be added to a GridResource entity by extending this class and implementing the required abstract methods.
Figure 10: AllocPolicy
Gridlet (Applications; not derived from Sim_entity) A Gridlet is a package that contains all the information related to the job and its execution management details, such as the job length expressed in MI (million instructions), the size of input and output files, and the job owner id. Individual users model their application by creating Gridlets for processing them on Grid resources [27].
Figure 11: GridletList
Many more classes are included in this toolkit. Some of them are meant for direct instantiation, others act as abstract base classes for extending the functionality. The technical requirements determine their usage. As with SimJava, covering the whole toolkit in detail is far beyond the scope of this seminar paper. For that reason, we conclude with a rather simple example that includes some reporting functionality.
5.6 Example
We create 3 grid resources (computing environment) and 3 users after the initialization. Each user creates an application represented as a list of Gridlets. Once the simulation is started, each user asks the GIS (Grid Information Service) for a list of registered resources and sends one Gridlet at a time to a randomly chosen resource. After the last user has shut down, the main function prints the status and the history of the Gridlets to the command line. The activity diagram on the next page illustrates the progress of the simulation.
Now we take a look at the sources of the example in order to gain a deeper understanding. We do not present the activities in chronological order but function by function, and begin with the simulation's entry point, the main function.
public static void main(String[] args) {
    try {
        // First step: Initialize the GridSim package. It should be called
        // before creating any entities. We can't run this example without
        // initializing GridSim first. We will get run-time exceptions
        int num_user = 3;              // number of grid users
        Calendar calendar = Calendar.getInstance();
        boolean trace_flag = false;    // mean don't trace GridSim events

        // list of files or processing names to be excluded from any
        // statistical measures
        String[] exclude_from_file = { "" };
        String[] exclude_from_processing = { "" };

        // the name of a report file to be written. We don't want to write
        // anything here. See other examples of using the ReportWriter class
        String report_name = null;

        // Initialize the GridSim package
        GridSim.init(num_user, calendar, trace_flag, exclude_from_file,
                     exclude_from_processing, report_name);

        // Second step: Creates one or more GridResource objects
        GridResource resource0 = createGridResource("Resource_0");
        GridResource resource1 = createGridResource("Resource_1");
        GridResource resource2 = createGridResource("Resource_2");
        int total_resource = 3;

        // Third step: Creates grid users
        Example user0 = new Example("User_0", 560.00, total_resource);
        Example user1 = new Example("User_1", 250.00, total_resource);
        Example user2 = new Example("User_2", 150.00, total_resource);

        // Fourth step: Starts the simulation
        GridSim.startGridSimulation();

        // Final step: Prints the Gridlets when simulation is over
        printGridletList(user0.getGridletList(), "User_0");
        printGridletList(user1.getGridletList(), "User_1");
        printGridletList(user2.getGridletList(), "User_2");
    } catch (Exception e) {
        System.out.println("Unwanted errors happen");
    }
}
With the comments and the introduction so far, the principles of the main function should be obvious. Next, we look at the createGridResource() function, which returns a GridResource object. We call that function three times. Hence, we create three identical GridResources, each with three Machines operating in time-shared mode.
private static GridResource createGridResource(String name) {
    // 1. We need an object of MachineList to store one or more Machines
    MachineList mList = new MachineList();

    // 2. A Machine contains one or more PEs or CPUs.
    PEList peList1 = new PEList();

    // 3. Create PEs and add these into an object of PEList.
    peList1.add( new PE(0, 377) );   // id and MIPS Rating
    peList1.add( new PE(1, 377) );   // from hpc420.hpcc.jp
    peList1.add( new PE(2, 377) );
    peList1.add( new PE(3, 377) );

    // 4. Create a Machine with its id and list of PEs
    mList.add( new Machine(0, peList1) );   // First Machine

    // 5. Repeat the process from 2 if we want to create more Machines
    PEList peList2 = new PEList();
    peList2.add( new PE(0, 377) );
    peList2.add( new PE(1, 377) );
    peList2.add( new PE(2, 377) );
    peList2.add( new PE(3, 377) );
    mList.add( new Machine(1, peList2) );   // Second Machine

    PEList peList3 = new PEList();
    peList3.add( new PE(0, 377) );
    peList3.add( new PE(1, 377) );
    mList.add( new Machine(2, peList3) );   // Third Machine

    // 6. Create a ResourceCharacteristics object with properties
    String arch = "Sun Ultra";      // system architecture
    String os = "Solaris";          // operating system
    double time_zone = 9.0;         // time zone of this resource
    double cost = 3.0;              // the cost (unit per PE sec.)
    ResourceCharacteristics resConfig = new ResourceCharacteristics(
        arch, os, mList, ResourceCharacteristics.TIME_SHARED, time_zone, cost);

    // 7. Finally, we need to create a GridResource object
    double baud_rate = 100.0;       // communication speed
    long seed = 11L*13*17*19*23+1;
    double peakLoad = 0.0;          // load during peak hour
    double offPeakLoad = 0.0;       // load during off-peak hour
    double holidayLoad = 0.0;       // load during holiday

    // incorporates weekends and holidays
    LinkedList Weekends = new LinkedList();
    Weekends.add(new Integer(Calendar.SATURDAY));
    Weekends.add(new Integer(Calendar.SUNDAY));
    LinkedList Holidays = new LinkedList();

    // declared outside the try block so that it is still in scope for the return
    GridResource gridRes = null;
    try {
        gridRes = new GridResource(name, baud_rate, seed, resConfig, peakLoad,
                                   offPeakLoad, holidayLoad, Weekends, Holidays);
    } catch (Exception e) {
        // creation failed; the caller receives null
        System.out.println("Could not create GridResource " + name);
    }
    return gridRes;
}
We continue with the class Example, a subclass of GridSim, which acts as an individual user. Since we have already learned that GridSim is a subclass of Sim_entity, we know that the constructor is responsible for the initialization and that the overridden body() function determines its behavior.
class Example extends GridSim {
    private Integer ID;
    private String name;
    private GridletList list, receiveList;
    private int totalResource;

    // the GridSim superclass constructor may throw an Exception
    public Example(String name, double baud_rate, int total_resource)
            throws Exception {
        super(name, baud_rate);
        this.name = name;
        this.totalResource = total_resource;
        this.receiveList = new GridletList();

        // Gets an ID for this entity
        this.ID = new Integer( getEntityId(name) );

        // Creates a list of Gridlets or Tasks for this grid user
        this.list = createGridlet( this.ID.intValue() );
        System.out.println(name + ":Creating " + this.list.size() + " Gridlets");
    }
The constructor calls a function named createGridlet(), in which the user's Gridlets (the application, if you will) are created.
private GridletList createGridlet(int userID) { // Creates a container to store Gridlets GridletList list = new GridletList(); // We create three Gridlets or jobs/tasks manually double length = 3500.0; long input_size = 300, output_size = 300; Gridlet gridlet1 = new Gridlet(0, length, input_size, output_size); Gridlet gridlet2 = new Gridlet(1, 5000, 500, 500); Gridlet gridlet3 = new Gridlet(2, 9000, 900, 900); // setting the owner of these Gridlets gridlet1.setUserID(userID); gridlet2.setUserID(userID); gridlet3.setUserID(userID); // Store the Gridlets into a list list.add(gridlet1); list.add(gridlet2); list.add(gridlet3); // We create <=5 Gridlets with GridSimRandom and GriSimStandardPE // sets the PE MIPS Rating GridSimStandardPE.setRating(100); // creates 5 Gridlets int max = 5; int count = GridSim.rand.intSample(max);
WS 2006/07
        for (int i = 1; i < count+1; i++) {
            // the Gridlet length is determined from a random value and the
            // current MIPS Rating for a PE
            length = GridSimStandardPE.toMIs(GridSim.rand.doubleSample()*50);

            // determines the Gridlet input file size that varies within the range
            // 100 + (10% to 40%)
            input_size = (long) GridSimRandom.real(100, 0.10, 0.40,
                                                   GridSim.rand.doubleSample());

            // determines the Gridlet output size that varies within the range
            // 250 + (10% to 50%)
            output_size = (long) GridSimRandom.real(250, 0.10, 0.50,
                                                    GridSim.rand.doubleSample());

            // creates a new Gridlet object and sets the owner
            Gridlet gridlet = new Gridlet(2+i, length, input_size, output_size);
            gridlet.setUserID(userID);

            // adds the Gridlet to the list
            list.add(gridlet);
        }
        return list;
    }
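The two helper calls used for the random Gridlets can be read as plain arithmetic. The following stand-alone sketch does not use the GridSim library; the class and method names are hypothetical. It rests on two assumptions about the library's semantics: that GridSimStandardPE.toMIs multiplies a time on the standard PE by the configured MIPS rating, and that GridSimRandom.real(d, fLess, fMore, q) maps q from [0, 1] linearly onto the interval from (1 - fLess) * d to (1 + fMore) * d. The comments in the listing describe these ranges more loosely, so the exact bounds should be checked against the GridSim API documentation.

```java
import java.util.Random;

// Hypothetical stand-alone sketch; it does not call the GridSim library.
final class GridletSizeSketch {

    // Assumption: seconds on the standard PE times its MIPS rating
    // gives a length in million instructions (MI).
    static double toMIs(double standardPeSeconds, double mipsRating) {
        return standardPeSeconds * mipsRating;
    }

    // Assumption: q in [0, 1] is mapped linearly onto
    // [(1 - fLess) * d, (1 + fMore) * d].
    static double real(double d, double fLess, double fMore, double q) {
        return d * (1.0 - fLess + (fLess + fMore) * q);
    }

    public static void main(String[] args) {
        Random rand = new Random(42); // fixed seed for repeatable runs
        double rating = 100;          // same rating as in the listing
        for (int i = 0; i < 3; i++) {
            double length = toMIs(rand.nextDouble() * 50, rating);
            long inputSize = (long) real(100, 0.10, 0.40, rand.nextDouble());
            long outputSize = (long) real(250, 0.10, 0.50, rand.nextDouble());
            System.out.println("length = " + length + " MI, input = "
                    + inputSize + " bytes, output = " + outputSize + " bytes");
        }
    }
}
```

Under these assumptions a rating of 100 and a factor of 50 yield Gridlet lengths below 5000 MI, which is the same order of magnitude as the three manually created Gridlets.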
The following function completes our trip through the sources. The body() function, the core of every simulation entity, determines the example's behavior. As already mentioned, we first wait for GIS to give us the list of registered resources and then send our Gridlets to them for calculation. When all Gridlets have been received, we are finished and shut the entity down.
    public void body() {
        int resourceID[] = new int[this.totalResource];
        double resourceCost[] = new double[this.totalResource];
        String resourceName[] = new String[this.totalResource];
        LinkedList resList;
        ResourceCharacteristics resChar;

        // waiting to get list of resources. Since GridSim package uses
        // multi-threaded environment, your request might arrive earlier
        // before one or more grid resource entities manage to register
        // themselves to GridInformationService (GIS) entity.
        // Therefore, it's better to wait in the first place
        while (true) {
            // need to pause for a second to wait for GridResources to finish
            // registering to GIS
            super.gridSimHold(1.0);
            resList = super.getGridResourceList();
            if (resList.size() == this.totalResource)
                break;
            else {
                System.out.println(this.name + ":Waiting to get resources ...");
            }
        }
        // a loop to get all the resources available
        for (int i = 0; i < this.totalResource; i++) {
            // Resource list contains list of resource IDs not resource objects
            resourceID[i] = ( (Integer)resList.get(i) ).intValue();

            // Requests to resource entity to send its characteristics
            super.send(resourceID[i], GridSimTags.SCHEDULE_NOW,
                       GridSimTags.RESOURCE_CHARACTERISTICS, this.ID);

            // waiting to get a resource characteristics
            resChar = (ResourceCharacteristics) super.receiveEventObject();
            resourceName[i] = resChar.getResourceName();
            resourceCost[i] = resChar.getCostPerSec();
            System.out.println(this.name + ":Received from " + resourceName[i]
                    + ", with id = " + resourceID[i]);

            // record this event into "stat.txt" file
            super.recordStatistics("\"Received ResourceCharacteristics from "
                    + resourceName[i] + "\"", "");
        }
        Gridlet gridlet;
        String info;

        // a loop to get one Gridlet at a time and send it to a random grid
        // resource entity. Then waits for a reply
        int id = 0;
        for (int i = 0; i < this.list.size(); i++) {
            gridlet = (Gridlet) this.list.get(i);
            info = "Gridlet_" + gridlet.getGridletID();
            id = GridSim.rand.intSample(this.totalResource);
            System.out.println(this.name + ":Sending " + info + " to "
                    + resourceName[id] + " with id = " + resourceID[id]);

            // Sends one Gridlet to a grid resource specified in "resourceID"
            super.gridletSubmit(gridlet, resourceID[id]);

            // Records this event into "stat.txt" file for statistical purposes
            super.recordStatistics("\"Submit " + info + " to "
                    + resourceName[id] + "\"", "");

            // waiting to receive a Gridlet back from resource entity
            gridlet = super.gridletReceive();
            System.out.println(this.name + ":Receiving Gridlet "
                    + gridlet.getGridletID());

            // Records this event into "stat.txt" file for statistical purposes
            super.recordStatistics("\"Received " + info + " from "
                    + resourceName[id] + "\"", gridlet.getProcessingCost());

            // stores the received Gridlet into a new GridletList object
            this.receiveList.add(gridlet);
        }

        // shut down all the entities, including GridStatistics entity
        super.shutdownGridStatisticsEntity();
        super.shutdownUserEntity();
        super.terminateIOEntities();
        System.out.println(this.name + ":%%%% Exiting body()");
    }
Having gone through the interesting parts of our simulation code, we now list some lines of the simulation output.
$ java -classpath ../../jars/gridsim.jar:. Example6
Starting Example6
Initialising...
Creates one Grid resource with name = Resource_0
Creates one Grid resource with name = Resource_1
Creates one Grid resource with name = Resource_2
Creating a grid user entity with name = User_0, and id = 17
User_0:Creating 3 Gridlets
Creating a grid user entity with name = User_1, and id = 20
User_1:Creating 6 Gridlets
Creating a grid user entity with name = User_2, and id = 23
User_2:Creating 5 Gridlets
Starting GridSim version 4.0
Entities started.
User_0:Waiting to get list of resources ...
...
...
User_2:Sending Gridlet_1 to Resource_0 with id = 5
User_0:Receiving Gridlet 1
User_0:Sending Gridlet_2 to Resource_2 with id = 13
User_1:Receiving Gridlet 1
User_1:Sending Gridlet_2 to Resource_2 with id = 13
User_2:Receiving Gridlet 1
User_2:Sending Gridlet_2 to Resource_1 with id = 9
...
...
User_1:%%%% Exiting body()
GridInformationService: Notify all GridSim entities for shutting down.
Sim_system: No more future events
Gathering simulation data.
Simulation completed.

========== OUTPUT for User_0 ==========
Gridlet ID    LENGTH    STATUS     Resource ID    Cost
0             3500.0    SUCCESS    9              27.851458885941646
1             5000.0    SUCCESS    9              39.78779840848807
2             9000.0    SUCCESS    13             71.6180371352786

Time below denotes the simulation time.
Time (sec)    Description Gridlet #0
------------------------------------------
0,00          Creates Gridlet ID #0
0,00          Assigns the Gridlet to User_0 (ID #17)
58,68         Allocates this Gridlet to Resource_1 (ID #9) with cost = $3.0/sec
58,68         Sets the submission time to 58,68
58,68         Sets Gridlet status from Created to InExec
58,68         Sets the execution start time to 58,68
67,964        Sets Gridlet status from InExec to Success
67,964        Sets the wall clock time to 9,284 and the actual CPU time to 9,284
67,964        Sets the length's finished so far to 3500.0
...
...
Finish Example6
Furthermore, we list a fragment of the GridSim_stat.txt file, a log file containing statistical data that GridSim's GridStatistics object creates during the initialization.
...
24.12                 "Received ResourceCharacteristics from Resource_0"    User_1
34.68                 "Received ResourceCharacteristics from Resource_2"    User_0
34.68                 "Submit Gridlet_0 to Resource_1"                      User_0
34.68                 "Received ResourceCharacteristics from Resource_1"    User_1
34.68                 "Received ResourceCharacteristics from Resource_0"    User_2
45.24                 "Received ResourceCharacteristics from Resource_2"    User_1
45.24                 "Submit Gridlet_0 to Resource_1"                      User_1
45.24                 "Received ResourceCharacteristics from Resource_1"    User_2
55.800000000000004    "Received ResourceCharacteristics from Resource_2"
55.800000000000004    "Submit Gridlet_0 to Resource_2"                      User_2
91.96381962864722     "Received Gridlet_0 from Resource_1"                  User_0
91.96381962864722     "Submit Gridlet_1 to Resource_1"                      User_0
113.08381962864723    "Received Gridlet_0 from Resource_2"                  User_2
113.08381962864723    "Submit Gridlet_1 to Resource_1"                      User_2
...
So what are these numbers all about? To find out, we analyze User_0's first Gridlet (Gridlet_0), which was sent to Resource_1. First, recall that this Gridlet has the following characteristics:
instruction length: 3500.0 MI (Million Instructions per PE)
input and output size: 300 bytes
Properties of Resource_1:
cost: 3 (Units per PE Second)
baud rate: 100 bps (Bits per Second)
speed: 377 MIPS (Million Instructions per Second)
The log shows that User_0 submitted Gridlet_0 to Resource_1 exactly 34.68 seconds after the simulation started. Resource_1 received this Gridlet 24 seconds later, at timestamp 58.68. After a calculation time of 9.284 seconds, the resource sent the Gridlet back to the user. This return transfer again took 24 seconds, from second 67.964 to second 91.964.
We can easily calculate the submission time, the calculation time and the costs now.
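time_submission  = input_size in bytes * 8 / baud_rate in bps
                 = 300 bytes * 8 / 100 bps
                 = 24 seconds

time_calculation = instruction_length in MI / speed in MIPS
                 = 3500.0 MI / 377 MIPS
                 = 9.284 seconds

cost_calculation = time_calculation in seconds * cost in units per second
                 = 9.284 seconds * 3 units per second
                 = 27.852 units

The simulation output lists a cost of 27.851 units for this Gridlet; the small difference arises because the simulator multiplies the unrounded calculation time by the cost.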
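The same arithmetic can be checked with a few lines of Java. The sketch below is only an illustration: the class and method names are hypothetical and it does not use the GridSim library. It takes the values quoted above (300 bytes, 100 bps, 3500 MI, 377 MIPS, 3 units per second, submission at 34.68 seconds) and rebuilds the timeline of Gridlet_0.

```java
import java.util.Locale;

// Hypothetical helper class that reproduces the hand calculation.
final class GridletTimingSketch {

    // Transfer time in seconds for a payload of sizeBytes over a link of baudRateBps.
    static double transferSeconds(long sizeBytes, double baudRateBps) {
        return sizeBytes * 8 / baudRateBps;
    }

    // Calculation time in seconds for lengthMI million instructions at speedMIPS.
    static double computeSeconds(double lengthMI, double speedMIPS) {
        return lengthMI / speedMIPS;
    }

    // Cost in units for computing the given number of seconds.
    static double cost(double seconds, double unitsPerSecond) {
        return seconds * unitsPerSecond;
    }

    public static void main(String[] args) {
        double submitted = 34.68;                     // taken from the log fragment
        double transfer = transferSeconds(300, 100);  // input and output are both 300 bytes
        double compute = computeSeconds(3500.0, 377);

        double arrived = submitted + transfer;        // Gridlet reaches Resource_1
        double finished = arrived + compute;          // calculation done
        double returned = finished + transfer;        // Gridlet back at User_0

        System.out.println(String.format(Locale.ROOT,
                "transfer = %.3f s, compute = %.3f s, cost = %.3f units",
                transfer, compute, cost(compute, 3)));
        System.out.println(String.format(Locale.ROOT,
                "arrived = %.3f, finished = %.3f, returned = %.3f",
                arrived, finished, returned));
    }
}
```

Because the calculation time is not rounded before it is multiplied by the cost, the sketch yields the cost listed in the simulation output rather than the value obtained from the rounded 9.284 seconds.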
6. Conclusion
It is hard to find a tool that covers all aspects needed to simulate a Grid; each one seems to be best suited to a specific task. As a general rule, a simulator should be as coarse-grained as possible and concentrate on specific issues, since it is not possible to simulate a real Grid in full. The selection of the tool therefore depends on what needs to be analyzed and on how many resources are available. A close look at the presented tools reveals that simulators are more specialized than emulators; for example, there is a large number of established scheduling simulators. It might be too early to speak of a trend, but the newest simulation research shows a tendency towards security. A major issue is that the correctness of the obtained results is arguable. However, even if the simulation of a specific algorithm does not produce the same performance data as a real Grid, it can still be very helpful, for example for comparing different algorithms and indicating a trend that can be expected in the real world.