Larry Mellon
David Itkin
Science Applications International Corporation
1100 N. Glebe Rd. Suite 1100
Arlington, VA 22201
(703) 907-2553/2560
mellon@jade.saic.com, itkin@jade.saic.com
Keywords:
cluster computing; low latency; large scale; data distribution
ABSTRACT: Simulations tend to be distributed for one of two major reasons: the size and performance requirements
of the simulation preclude execution within a single computer, or key users/components of the simulation are located
at physically separate facilities. It is generally recognized that distributed execution entails a considerable degree of
overhead to effectively link the distributed components of a simulation, and that this overhead tends to exhibit poor
scaling behaviour. This paper postulates that the limited scalability of distributed simulations is primarily caused by
the high cost of communication within a LAN/WAN environment and the high latency in accessing the global state
information required to efficiently distribute data and manage time. An approach is described in which users of a
simulation remain distributed, but the majority of model components are executed within a tightly-coupled cluster
computing environment. Such an environment provides lower communication costs and very low latency between
simulation hosts. With an accurate view of current global state information made possible by low-latency
communication, key infrastructure algorithms may be made significantly more efficient. This in turn allows for
increased system scalability.
data transfer. First, technologies such as SCRAMnet have long been a critical part of realtime control systems. Second, a number of similar hardware devices have been transitioned from the parallel-processing hardware community to the PC and workstation markets. Point-to-point devices now exist, as do cards implementing the IEEE distributed shared memory protocol (SCI). Third, a number of research groups have successfully demonstrated direct application access to standard networking cards, thus bypassing the cost of going through the OS layers and the associated data copies. A sample of such clustering technologies, drawn from literature surveys and experiments, is given in Table 1 [11]. Finally, industry efforts exist to standardize a proposed interface for low-latency cluster communication, the VI architecture, as well as the related System-Area Network (SAN) concept. While initial ASTT experiments have encountered a degree of immaturity in the implementations of some devices, the projected low latencies are delivered. Given the industry backing of tightly-coupled cluster computing, the future of this technology looks promising.

Figure 1: Basic cluster architecture. (Figure shows models on cluster Hosts 1..N joined by low-latency cluster communication, with remote controllers on remote hosts attached through agents across a high-latency WAN.)

3.2 Optimizations possible within a cluster

Three basic areas of executing a distributed simulation may be improved via clustered computing. First, the mechanics of running an exercise are considerably simpler. From a system perspective, ensuring correct versions of the OS, configuration files and models within a cluster is easier than in an exercise run at multiple sites under multiple system administration strategies. It is also possible to upgrade the hardware incrementally, as opposed to many parallel processors where the entire system must be upgraded at once. Also note that upgrades are from off-the-shelf computers, thus allowing the system to continually include the latest and fastest of the rapidly improving PC and workstation markets. Upgrades may also include asymmetrical memory sizes at each processor, allowing better tailoring of resources to models. Second, the models themselves benefit from a tightly-coupled environment, in that lower latency between model components allows more accurate modeling for some federations. Third, the simulation infrastructure of the system gains many advantages from tightly-coupled clustering. These infrastructure improvements form the basic research areas of this ASTT effort and are outlined later in this paper.
Name                                        | Type                      | Latency    | Bandwidth
ATM                                         | Fast switched network     | 20 us      | 155 Mbit/s
SCRAMNet (Systran)                          | Distributed shared memory | 250-800 ns | 16.7 MB/s
Myrinet (Myricom)                           | Fast switched network     | 7.2 us     | 1.28 Gbit/s
U-Net                                       | Software solution         | 29 us      | 90 Mbit/s
High Performance Parallel Interface (HIPPI) | Fast switched network     | 160 ns     | 800 Mbit/s
Scalable Coherent Interface (SCI)           | Distributed shared memory | 2.7 us     | 1 Gbit/s

Table 1: Sample clustering techniques
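To put the Table 1 figures in perspective, the sketch below estimates the time to deliver a single state update under a first-order cost model (time = latency + size/bandwidth). The per-technology numbers are taken from Table 1; the model itself and the 100-byte update size are illustrative assumptions, and software/protocol overheads are ignored.

```python
# First-order cost model for one state update across a cluster fabric:
#   time = fixed latency + payload size / bandwidth
# Values below are the Table 1 entries converted to seconds and bytes/s.
# This is a back-of-the-envelope comparison, not a benchmark.

FABRICS = {
    # name: (latency_s, bandwidth_bytes_per_s)
    "ATM":      (20e-6,  155e6 / 8),
    "SCRAMNet": (800e-9, 16.7e6),      # worst case of the 250-800 ns range
    "Myrinet":  (7.2e-6, 1.28e9 / 8),
    "U-Net":    (29e-6,  90e6 / 8),
    "HIPPI":    (160e-9, 800e6 / 8),
    "SCI":      (2.7e-6, 1e9 / 8),
}

def transfer_time(fabric: str, size_bytes: int) -> float:
    """Estimated seconds to deliver size_bytes over the named fabric."""
    latency, bandwidth = FABRICS[fabric]
    return latency + size_bytes / bandwidth

if __name__ == "__main__":
    # For a small (100-byte, assumed) entity-state update, the fixed
    # per-message latency is a large share of the delivery cost on the
    # switched fabrics, which is why sub-microsecond technologies matter.
    for name in sorted(FABRICS):
        print(f"{name:10s} {transfer_time(name, 100) * 1e6:7.2f} us")
```

Under this simple model a 100-byte update costs roughly 3.5 us on SCI versus roughly 25 us on ATM, consistent with the paper's emphasis on per-message latency rather than raw bandwidth for simulation-sized state updates.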
Figure 2: Experimental Architecture Components. (Figure shows tagged data packets passing through the channel API onto channels over the communication fabric, with inter-GAK traffic coordinating the hosts and in/out packets at each host.)

5.1 Key abstractions

The primary abstractions used within the experimental architecture are described below. Where possible, terminology has been drawn from existing sources. Terms from the DMSO RTI 2.0 and Thema internal architectures are used extensively.

5.1.1 Communications within a cluster

Of the various low-latency hardware systems under consideration, a number of different communication styles are supported. For example, SCI and SCRAMnet provide forms of shared memory across distributed hosts, Myrinet supports only point-to-point traffic, and ATM provides for point-to-point or point-to-multipoint traffic. For the purposes of this architecture, the term communication fabric (drawn from the networking community, and in particular, ATM) is used to encompass this broad set of physical communication devices.

5.1.2 Addressing of data

Some mechanism must exist to label each state update such that it may be determined which other entities require this data. A number of techniques have been used to accomplish this task: the JPSD predicate mechanism [3], routing spaces within the RTI, and categories within Thema [5], among others. While each technique listed above presents a substantially different interface to the user, internally all three produce an abstract tag based on the user-presented information. This tag is used to generically group like elements for efficient bundling onto limited channel resources, and/or as the mechanism to establish what subset of the simulation shared state is required at each host.

To support experimentation with a varied set of addressing schemes, and to avoid including the effects of differing user interfaces in our experiments, abstract tags are used within this research. For the purposes of our analysis, we assume that some other component of the system has assigned a tag to any given state update, and that remote simulation hosts likewise state what data they require in terms of abstract tags. Tags are then mapped to channels using an algorithm that attempts to maximize similarities in the tags assigned to a channel. If an efficient mapping is achieved, simulation hosts joining channel X to obtain tag Y data will receive a minimal number of non-relevant state updates. Central to such algorithms is the ability to analyze current tag production and consumption patterns. Note that such tag usage information is resident at each host in the system, and thus presents a global knowledge problem. The architectural component responsible for mapping tags to channels is thus referred to as the Global Addressing Knowledge (GAK) component, as drawn from the RTI 2.0 internal design. Note that while the desired abstraction and generality are gained, application-level knowledge is lost which could potentially be used to further optimize GAK-level algorithms.

5.1.3 Remote controllers

The experimentation architecture provides for remote controller connectivity via a local agent construct. An agent is attached to the cluster on behalf of a remote controller. Agents interact with clustered models strictly as another abstract entity producing and consuming data. Using the entity interface, an agent subscribes to a subset of the simulation shared state. From that subset, an application-specific view is created. A view is a transformation of the simulation state into what is viewable at the remote controller site. It may be a simple aggregation of platform-level data into platoon-level data, the same data presented at a lower resolution, or a pixelized representation of the simulation state required by the remote controller's display unit.

From that view (local to the cluster), the agent may use application-specific knowledge regarding the remote controller to transmit changes to the view. Further, predictive contracts may be constructed to effectively hide the latency between agent updates to the view and displays of the view to the controller.

Also note the applicability of remote controller agents to other remote component problems. This basic agent construct is also used in the RTI 2.0 design to allow local clustering of hosts for hierarchical system scaling techniques. Within this architecture, multiple clusters and remote models would be handled in the same fashion.

5.2 Key components

GAK: The GAK is responsible for an efficient mapping of tagged data to the set of available channels. Efficiency is determined in large part by what the underlying hardware will efficiently support, and is also affected by the cost of determining said mapping. The network factors which must be considered are: the raw cost of a transmission, the number of channels effectively supportable by the hardware, the cost of joining or leaving a channel, unnecessary receptions, and unnecessary transmissions. Note that the tag assignment algorithm may be described as a producer/consumer mapping problem, and is believed to be NP-complete. Further, the set of producers and consumers is dynamic, thus a good mapping will change over time. Combined, these two factors will result in GAK algorithms which only approximate a perfect mapping. Without this loosening, the GAK can consume cycles and bandwidth well in excess of target goals, and may also never 'catch up' to the best current mapping.

Channel: Channels are the communication abstraction for distributed hosts. Channels may have 0..N producers and 0..N consumers. Channels may restrict the number of producers and/or consumers to best match a given hardware system. They may also bundle data for extra efficiency. Channels present a single-transmit, multiple-recipient API to clients, which is then implemented in whatever form the hardware efficiently supports. Due to hardware restrictions, there may also be a limit on the number of available channels.

5.3 Other components

For the purposes of functional isolation, a number of other components are described in this architecture. Functions such as predictive contracts, load balancing, variable-resolution data and similar techniques are expected to be performed within any large-scale distributed simulation. These functions are abstracted into a single layer of the architecture, Data Transmission Optimizations [6]. These DTOs are not research tasks under this program, thus their effects are abstracted into a simple reduction in the number of required data transmissions.

Interest Management is another abstracted component. Some portion of the overall system is responsible for translating data requirements and data availability from application-specific terms into infrastructure-neutral terms suitable for inter-host network communications. In the JIMS CI, this is done by the MMF. In the RTI 1.3 and 2.0, this is done as a combination of application and RTI functionality. For the purposes of our architecture, we assume that some other component has done the above conversion and all data updates are tagged with an application-neutral identifier.

The combined effects of interest management and DTOs are contained within the Inter-Host Data Manager. This component represents all local host operations involved in managing the remote sharing of simulation state. It generates tagged state changes ready for distribution and provides tag production/consumption data to the GAK. Test inputs for GAK experiments are in terms of this component, and may be extracted from actual data or artificially generated to test GAK behaviour under specific conditions.

6. Research Areas

To test the cluster computing hypothesis stated earlier, three major factors must be demonstrated. First, that
clustering fabrics may effectively support the data exchange characteristics of distributed simulations. Second, that it is possible to effectively link a remote controller to a model executing in a clustered environment. Third, that GAK algorithms built to exploit low-latency access to distributed addressing information will reduce system overhead in distributing data. Full definitions of these factors and initial results are to be presented in a following paper. A summary of ASTT's current research into these factors is given below.

Channel analysis: The dominance of point-to-point communication in commercial clustering technology requires analysis of the technologies in terms of simulation data exchange characteristics, which tend more toward point-to-multipoint communication. A set of simple tests will be conducted to establish the number of channels a given hardware solution may efficiently support, and the number of publishers and subscribers per channel. These tests will be executed across a sample set of clustering hardware solutions. The results will be channeled into the GAK design process.

Remote controllers: The linkage between agents and remote controllers is application-specific, and as such, impossible to prove in the general case. A series of tests will be conducted to establish levels of effectiveness in hiding latency for a sample application class.

GAK implementations: A set of GAK algorithms optimized for cluster execution is under construction in ASTT. GAK experiments are drawn from addressing schemes proposed in [4] and from tag-mapping approaches from research fields outside of distributed simulation. Promising approaches have been found in various online matching problems [7]. A GAK analysis testbed and sample GAK algorithms are currently under construction. Metrics and GAK test data descriptions are given below.

7. GAK Metrics and Testing Data

Experiments will be conducted that compare the effectiveness of GAK algorithms running in a cluster environment. This will provide a relative measure of effectiveness for each of the GAK algorithms. In order to understand the effectiveness of the GAK algorithms in absolute terms, a best-case and a worst-case GAK will also be measured. The best-case GAK will have unlimited communication channels into which to map tags. The worst-case GAK will have only one communication channel, making it equivalent to a broadcast solution. For each experiment scenario, the GAK algorithm under test will be measured against the performance of both the best- and worst-case GAK baselines for the same test inputs. A number of GAK-specific metrics will be used to capture implementation-level comparison data. In addition, a simple system-level metric will be used: the reduction of system overhead incurred for data distribution. This encompasses computational resources consumed by the GAK at each host, the number of data packets sent and received, and the number of inter-GAK messages.

Subscription and publication test data needs to be generated in order to test the GAK algorithms. The dynamic subscription and publication behavior of the test data can be generated in one of several ways: either real-world data can be obtained and analyzed [8], or the data can be generated by characterizing the time-varying distribution of subscriptions and publications [9]. By using measured data, it is easier to demonstrate the validity of the results. By using stochastic data, more diverse application behavior can be generated.

For the initial experiments we have chosen a middle ground. We will generate the data using a simulation of entities that have parameterized kinematic and sensor models. These entities will roam over a sectorized battlefield, and subscribe and publish data to cells in the sectorized grid. By using simulated entities, it should be possible to generate subscription and publication behavior that closely matches real-world use and is flexible enough to adequately test the GAK algorithms.

Analysis tools will be built that provide insight into how effectively each GAK algorithm mapped tags to communication channels. The tools will provide both static and dynamic views of GAK effectiveness. In addition, the tools will use standard off-line grouping analysis algorithms [10] to provide a measure of the inherent grouping exhibited by the test data.

8. Summary

Tightly-coupled cluster computing appears to support an efficient simulation infrastructure. Scalability improvements are possible within the infrastructure, and WAN communication becomes a minor portion of the overall system, requiring no special tuning for execution. An experimentation approach was given in which the scope of the ongoing ASTT research effort is defined and relationships to other distributed simulation techniques are established. Experiments are underway to provide performance comparisons between data distribution algorithms within a cluster and against theoretical maximums.

9. Acknowledgments

A number of the abstractions used in the experimentation architecture are extended from work begun under a DMSO RTI 2.0 design contract. Steve Bachinsky, Glen Tarbox, Richard Fujimoto et al. were invaluable in analyzing the basic GAK construct and potential implementation options, which in turn are based on work with Ed Powell. Darrin West contributed to the clustering concept. All work was funded via DARPA, with guidance from Dell Lunceford, PM.

ten years. His research interests include parallel simulation and distributed systems.

David Itkin is a Senior Engineer at Science Applications International Corporation. David Itkin has been active in the DoD simulation community for 9 years.

10. References