ABSTRACT
Condor is a workload management system that specializes in matching jobs (applications) to computational resources. This paper applies the virtual workspace concept to Condor so that execution environments (universes) can be created dynamically at run-time rather than statically at design-time.
INTRODUCTION
Computing resources must often satisfy a number of criteria, such as performance, reliability, and scalability, to achieve the objectives of the user(s). In Grid computing, harnessing large amounts of computing power and resources in a scalable way for several users at once can be difficult. In such an environment, resource consumers have requirements and preferences for how they would like their applications and services to leverage the resources made available by resource providers. Resource providers must ensure the resources meet a certain quality of service (e.g., making resources securely and reliably available).
In the past, control over the availability, quantity, and software configurations of resources has been limited to the resource provider. With virtualization, it becomes possible for resource providers to offer more control of the resources to a user without giving up control of the underlying physical infrastructure. Users can more easily create execution environments that meet the needs of their applications and jobs within the policies defined by the resource providers. Such a relationship, enabled by virtualization, is both cost-effective and flexible for the resource provider and consumer [1].
The virtual workspace term, initially coined in [2] for use with the Globus Toolkit [3], describes a dynamically provisioned execution environment; such an environment can encompass several physical resources. Generically, this concept could also be applied to Condor, “a specialized workload management system for compute-intensive jobs” [4]. Condor currently abstracts the resources of a single physical machine into
virtual machines which can run multiple jobs at the same time [5]. A "universe" is used
to statically describe the execution environment in which the jobs are expected to run.
This approach assumes the resources (whether real or virtual) have to all be allocated in
advance. While there is support for adding more resources to an existing pool via the
Glide-in mechanism, the user still has to dedicate the use of these other physical
resources.
In this paper, an execution environment (universe) can be dynamically created at run-time by users to more flexibly and cost-effectively use and manage existing resources using virtualization. Two of the unique
implementation details described in this paper are the use of Microsoft Windows and
Microsoft Virtual Server 2005 R2 for the virtual machine manager (VMM) on the host
operating system (instead of being Linux-based using Xen or VMware) and the use of
differencing virtual hard disks. More details about virtual workspaces and similar
attempts to virtualize Condor are described in Related Work. The implementation details
of the work performed for a dynamic Condor universe are provided along with
performance test results. Future enhancements are included for making this work-in-progress more robust.
RELATED WORK
While virtualization has many uses, such as software development and testing, the work outlined in this paper most directly applies to Grid computing, cluster, and Condor systems.
Grid Computing
The use of virtualization in Grid computing has been proposed before, touting the
benefits of legacy application support, improved security, and the ability to deploy customized execution environments; mechanisms for creating and managing virtual machines are also described [6]. The virtual workspace
concept [7] extended [6] to present "a unified abstraction" and address additional issues
associated with the complexities of managing such an environment in the Grid. Two key
differences between the Grid-related work mentioned and this paper are the emphasis on Windows-based virtualization technologies and the focus on Condor rather than Globus. Condor's Glide-in mechanism [8] works with the Globus Toolkit to temporarily make Globus resources available to a user's Condor pool. This has the advantage of being able to submit Condor jobs using Condor's standard tools, but it is expected that the user acquires these remote resources before the jobs are executed.
Using virtualization allows the existing “local” Condor resources to be leveraged as the
jobs require.
Clusters
Many of the same motivations that exist for this work have also been applied to clusters [9, 10], but these efforts focus more on dynamically provisioning homogeneous execution environments across an entire system. In [9], the resources are assumed to physically exist and the software is deployed by
re-imaging the machine. In [10], virtualization is used to provision the software on the
cluster(s) but the time required to stage in the virtual image(s) is costly. The use of the
“differencing” virtual hard disk image type in this work offers a mitigating solution to this staging cost.
Condor
Previous work has used virtualization to turn idle Windows campus machines into the Unix-based machines required by researchers [12]. The
solution leveraged coLinux to run a Condor compute node through a Windows device
driver [13]. While some of the same motivation exists for this work, using a
virtualization technology such as Virtual Server 2005 R2 allows other operating systems
and versions to be used and provides more flexible ways to programmatically control the
dynamic environment.
IMPLEMENTATION
This section outlines how Condor components describe resource availability and match jobs to resources, and introduces a flexible extension for dynamically describing, deploying, and using virtual execution resources in the Condor universe.
In Condor, one or more machines (resources) along with jobs (resource requests)
are part of a collection, known as a pool. The resources in the pool have one or more of
the following roles: Central Manager, Execute, and/or Submit. The Central Manager
collects information and negotiates how jobs are matched to available resources. Submit
resources allow jobs to be submitted to the Condor pool through a description of the job
and its requirements. Execute resources run jobs submitted by users after having been matched to them by the Central Manager. This work introduces virtualization into Condor. Each Execute resource describes the extent to which it can be
virtualized (to the Central Manager) and is responsible for hosting additional (virtual)
resources. The Submit resource(s) takes a workflow of jobs and requirements and
initiates the deployment of the virtual resources plus signals its usage (start/stop) to the
host/execute machine. The Central Manager is responsible for storing virtual machine
metadata used for scheduling. For this implementation, a single machine is used for the Central Manager role. Descriptions of the virtual resources are sent to the Central Manager via authorized use of condor_advertise. Attributes about the
virtual Execute resources, such as the operating system (and version), available memory
and disk space, and more specific data about the status of the virtual machine are
included. Currently, the “host” Execute resource invokes condor_advertise for each virtual machine it can host. This allows virtual resources to appear almost indistinguishable from real physical resources and to be included in Condor's resource scheduling. Note that the real resources are running while the virtual resources are not; they have only been described.
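As an illustrative sketch of this advertisement step (not the paper's actual code), the ClassAd below uses standard Condor machine-ad attributes; VirtualMachineState is a hypothetical attribute standing in for the virtual machine status data mentioned above, and the host and file names are placeholders:

    import subprocess
    import tempfile

    # A machine ClassAd for a described-but-not-running virtual Execute
    # resource. MyType, Name, OpSys, Memory, etc. are standard machine-ad
    # attributes; VirtualMachineState is hypothetical for this sketch.
    ad = '''MyType = "Machine"
    TargetType = "Job"
    Name = "vm-w2k@host.example.com"
    Machine = "host.example.com"
    OpSys = "WINNT50"
    Arch = "INTEL"
    Memory = 128
    State = "Unclaimed"
    Activity = "Idle"
    VirtualMachineState = "Described"
    '''

    with tempfile.NamedTemporaryFile("w", suffix=".ad", delete=False) as f:
        f.write(ad)

    # Push the ad to the Central Manager's collector as a machine (startd) ad.
    subprocess.run(["condor_advertise", "UPDATE_STARTD_AD", f.name], check=True)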
Using the standard Condor tools, such as condor_status, users can view the
resources (real and virtual) available in the pool. Users can then create workflows (using
Windows Workflow Foundation [16]) for one or more jobs that they intend to run on the
provided resources. Since the virtual resource(s) may not be running when a job is
submitted, the initial scheduling will fail. Fortunately, Condor provides a SOAP-based
API for submitting and querying jobs [15]. Using this Condor API via workflows,
unsuccessful job submissions can be checked against the intended attributes of the advertised virtual resources.
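The paper drives this check from Windows Workflow Foundation via the SOAP interface [15]; the sketch below expresses the same check with the command-line tools instead, reusing the hypothetical VirtualMachineState attribute from the earlier sketch:

    import subprocess

    def find_described_resources(opsys):
        # Query the collector for advertised virtual resources that match
        # the job's OpSys requirement but have not yet been started.
        constraint = 'OpSys == "%s" && VirtualMachineState == "Described"' % opsys
        out = subprocess.run(
            ["condor_status", "-constraint", constraint,
             "-format", "%s\n", "Name"],
            capture_output=True, text=True, check=True)
        return out.stdout.split()

    # If the submission failed to match, see whether a suitable virtual
    # resource has been described and could be started on its host.
    candidates = find_described_resources("WINNT50")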
The user can indicate specific job requirements in the workflow. These
requirements can optionally specify the location of the files required to run the virtual
machine for consumer flexibility (assuming the provider has allowed it). These files
provide the operating system and necessary configuration (including Condor) for
executing the job. The workflow is invoked by the Submit machine. If the virtual
resource is specified by the workflow, the workflow manager on the Submit machine
either transfers the virtual machine files to the Execute resource or provides the Execute
resource with the location and protocol for retrieving the virtual machine files. (The
automatic copying of virtual images was not completely implemented for this paper.) For
performance, it is expected that host Execute machines have base virtual images local to
the resource that provide the operating system and Condor. Additional software and
configuration can be added in a separate file that stores only the blocks modified from a parent hard disk (file); these are called differencing virtual disks. This provides a flexible balance, allowing resource providers to supply base images while giving resource consumers the ability to customize them.
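A minimal sketch of creating such a consumer-specific differencing disk programmatically, assuming pywin32 and hypothetical file paths (the COM calls are as documented for the Virtual Server 2005 R2 API, not taken from the paper):

    import win32com.client  # pywin32

    # Connect to the local Virtual Server 2005 R2 instance via COM.
    vs = win32com.client.Dispatch("VirtualServer.Application")

    # Create a child disk that records only the blocks that diverge from
    # the provider's read-only base image (hypothetical paths).
    task = vs.CreateDifferencingVirtualHardDisk(
        r"C:\VMs\w2k-condor-diff.vhd",   # new differencing (child) disk
        r"C:\VMs\w2k-base.vhd")          # parent base image with OS + Condor
    task.WaitForCompletion(-1)           # block until the disk is created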
The workflow, running on the Submit machine, also provides the logic for starting
the virtual resource on the host. Microsoft Virtual Server 2005 R2 provides an API for
managing local and remote virtual machines. The workflow leverages this API for
starting the virtual resources. For this paper, it is assumed that virtual resources are started from a “cold” state. The result is that startup times are as long as a normal boot time for the guest operating system.
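A corresponding sketch of the startup call, again via the Virtual Server COM API and with a hypothetical virtual machine name:

    import win32com.client

    vs = win32com.client.Dispatch("VirtualServer.Application")

    # Locate the registered virtual Execute resource and cold-boot it; the
    # job can match once the guest OS and its Condor daemons come up.
    vm = vs.FindVirtualMachine("condor-w2k")  # hypothetical VM name
    task = vm.Startup()
    task.WaitForCompletion(5 * 60 * 1000)     # wait up to five minutes (ms)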
PERFORMANCE TESTS
For testing, a single physical machine running Windows XP was used in the Central Manager, Execute, and Submit roles. Two
virtual Execute machines, running Debian Linux 3.1 and Windows 2000, each with 128
MB RAM, were created. A virtual network was created to allow communication between the host and the virtual machines. A MEME [17] job was submitted to the Condor pool using the standard Condor command-line tools (e.g. condor_submit). The test input and configuration options were used, resulting in job submission, execution, and result times of less than one minute.
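A submit description along these lines would exercise the test; the executable name, arguments, and file names below are placeholders rather than the actual MEME test configuration:

    import subprocess, tempfile

    # Hypothetical submit description for the MEME test job, requesting the
    # Windows 2000 virtual Execute resource ("WINNT50" is Condor's OpSys
    # value for Windows 2000); the Linux case is analogous with "LINUX".
    submit = """universe                = vanilla
    executable              = meme.exe
    arguments               = test.fasta -mod zoops
    requirements            = (OpSys == "WINNT50")
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    output                  = meme.out
    error                   = meme.err
    log                     = meme.log
    queue
    """

    with tempfile.NamedTemporaryFile("w", suffix=".sub", delete=False) as f:
        f.write(submit)
    subprocess.run(["condor_submit", f.name], check=True)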
Next, a workflow was constructed that submitted the same MEME job to the pool, specifically requesting a Windows 2000 or Linux resource. The same test input and configuration options took 6 to 8 minutes on average. Since the virtual machines are programmatically started only after an initial job schedule fails, and are currently starting from a cold state, the start times include the setup and also reflect the time for the operating system to boot. There
is also an unresolved issue with the (5 minute) cycle time between scheduling attempts when using the SOAP interface [18].
Additionally, the Windows 2000 virtual machine was created as a base image
(932 MB) with a differencing virtual disk that included Condor and other support
software (684 MB). Since the differencing disks use a sector bitmap to indicate which
sectors are within the current disk (1’s) or on the parent (0’s), the specification [11]
suggests it may be possible to achieve performance improvements. The differencing disk also lent itself well to compression: the 684 MB difference disk was compressed to 116 MB using standard ZIP compression. This file could be transferred over a standard broadband Internet connection in a reasonable amount of time.
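Because every .vhd file ends with a 512-byte footer whose disk-type field is defined by the specification [11], an image can be checked for the differencing type directly; a minimal sketch (the path is a placeholder):

    import struct

    def vhd_disk_type(path):
        # Per the VHD specification [11], the footer is the last 512 bytes
        # of the file; the big-endian uint32 at offset 60 encodes the disk
        # type (2 = fixed, 3 = dynamic, 4 = differencing).
        with open(path, "rb") as f:
            f.seek(-512, 2)          # seek to the footer
            footer = f.read(512)
        if footer[:8] != b"conectix":
            raise ValueError("not a valid VHD footer")
        (disk_type,) = struct.unpack(">I", footer[60:64])
        return {2: "fixed", 3: "dynamic", 4: "differencing"}.get(disk_type, "unknown")

    print(vhd_disk_type(r"C:\VMs\w2k-condor-diff.vhd"))  # hypothetical path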
FUTURE WORK
Several enhancements would make this work-in-progress more robust. For example, security was not considered. Also, the current times for
executing short running jobs are not acceptable. Another improvement would be to start
the virtual machines from a “hot” or paused state. Since the virtual machines used in this exercise obtained their addresses via DHCP, they would either need static IPs, or the system would need additional knowledge of when the virtual machines are un-paused. The virtual hard
disk(s) may be further compressed using a specific compression algorithm that takes the
disk format into account. Performance considerations could also be given to differencing
hard disks that are chained together for application extensibility purposes.
CONCLUSION
This work demonstrates that a Condor execution environment (universe) can be dynamically created and used at run-time in a way that balances the interests of the resource providers and consumers.
REFERENCES
1. Keahey, K., Foster, I., Freeman, T., Zhang, X. Virtual Workspaces: Achieving
Quality of Service and Quality of Life in the Grid. CCGRID 2006, Singapore,
May 2006.
2. Keahey, K., Foster, I., Freeman, T., Zhang, X., Galron, D. Virtual Workspaces in
the Grid. Europar 2005, Lisbon, Portugal, September, 2005.
3. http://workspace.globus.org/vm
4. http://www.cs.wisc.edu/condor/description.html
5. http://www.bo.infn.it/alice/alice-doc/mll-doc/condor/node4.html
6. Figueiredo, R., Dinda, P., Fortes, J. A Case for Grid Computing on Virtual Machines.
7. Keahey, K., Ripeanu, M., Doering, K. Dynamic Creation and Management of
Runtime Environments in the Grid.
8. http://www.cs.wisc.edu/condor/CondorWeek2005/presentations/user_tutorial.ppt
9. Chase, J., Irwin, D., Grit, L., Moore, J., Sprenkle, S. Dynamic Virtual Clusters in
a Grid Site Manager.
10. Zhang, X., Keahey, K., Foster, I., Freeman, T. Virtual Cluster Workspaces for
Grid Applications.
11. Virtual Hard Disk Image Format Specification. October 11, 2006 – Version 1.0.
Microsoft.
12. Sumanth, J. Running Condor in a Virtual Environment with coLinux.
http://www.cs.wisc.edu/condor/CondorWeek2006/presentations/sumanth_condor
_colinux.ppt
13. Santosa, M., Schaefer, A. Build a heterogeneous cluster with coLinux and openMosix. http://www-128.ibm.com/developerworks/linux/library/l-colinux/index.html
14. Condor Version 6.9.2 Manual. http://www.cs.wisc.edu/condor/manual/v6.9/
15. http://www.cs.wisc.edu/condor/birdbath/
16. http://wf.netfx3.com/content/WFIntro.aspx
17. MEME. http://meme.sdsc.edu
18. https://lists.cs.wisc.edu/archive/condor-users/2006-May/msg00296.shtml