
Cluster Computing

What is a Cluster?
A collection of independent computer systems working together
as if they were a single system
A set of interconnected stand-alone computers that work
cooperatively as a single integrated computing resource
Coupled through a scalable, high bandwidth, low latency
interconnect
The compute nodes are connected by a LAN/SAN, are typically
homogeneous, and run Unix/Linux with distributed control.
Suited to HPC
Key Benefits of Clusters
High performance: running cluster-enabled programs
Scalability: add servers to the cluster, add more clusters to the
network, or add CPUs to an SMP node as the need arises
High throughput
System availability (SA): offer inherent high system availability
due to the redundancy of hardware, operating systems, and
applications
Cost-effective
Classification of
Cluster Computers

Clusters Classification
Based on Focus (in Market)
High performance (HP) clusters
Grand challenge applications
High availability (HA) clusters
Mission critical applications
Clusters Classification
Based on Workstation/PC Ownership
Dedicated clusters
Non-dedicated clusters
Adaptive parallel computing
Can be used for CPU cycle stealing
Clusters Classification
Based on Node Architecture
Clusters of PCs (CoPs)
Clusters of Workstations (COWs)
Clusters of SMPs (CLUMPs)
Clusters Classification
Based on Node Components Architecture &
Configuration:
Homogeneous clusters
All nodes have similar configuration
Heterogeneous clusters
Nodes based on different processors and running different
OSes
Clusters Classification
Based on Levels of Clustering
Group clusters (# nodes: 2-99)
A set of dedicated/non-dedicated computers, mainly
connected by a SAN such as Myrinet
Departmental clusters (# nodes: 99-999)
Organizational clusters (# nodes: many 100s)
Internet-wide clusters = Global clusters
(# nodes: 1000s to many millions)
Metacomputing
Clusters and Their
Commodity Components
Cluster Components
Nodes
Multiple high performance components:
PCs
Workstations
SMPs
Distributed HPC systems leading to Metacomputing
They can be based on different architectures and run
different OSes
Cluster Components
Processors
There are many (CISC/RISC/VLIW/Vector..)
Intel: Pentiums, Xeon, Merced.
Sun: SPARC, ULTRASPARC
HP PA
IBM RS6000/PowerPC
SGI MIPS
Digital Alphas
Integrating memory, processing and networking into a single chip
IRAM (CPU & Mem): (http://iram.cs.berkeley.edu)
Alpha 21364 (CPU, Memory Controller, NI)
Cluster Components
OS
State of the art OS:
Tend to be modular: they can easily be extended, and new subsystems
can be added without modifying the underlying OS structure
Multithreading has added a new dimension to parallel processing
Popular OS used on nodes of clusters:
Linux(Beowulf)
Microsoft NT (Illinois HPVM)
SUN Solaris (Berkeley NOW)
IBM AIX (IBM SP2)
..
Cluster Components
High Performance Networks
Ethernet (10Mbps)
Fast Ethernet (100Mbps)
Gigabit Ethernet (1Gbps)
SCI (Dolphin; 12 µs MPI latency)
ATM
Myrinet (1.2Gbps)
Digital Memory Channel
FDDI
Cluster Components
Network Interfaces
Dedicated processing power and storage embedded in the
network interface (e.g., the Myricom NIC)
An I/O card today; on chip tomorrow?
[Figure: a Myricom NIC, containing its own processor (P) and memory (M),
sits on the I/O bus (S-Bus, 50 MB/s) of a Sun Ultra 170 host and connects
to the Myricom network at 160 MB/s.]
Cluster Components
Network Interfaces
Network interface card
Myrinet has NIC
User-level access support: VIA
Alpha 21364 processor integrates processing, memory
controller, network interface into a single chip..
Cluster Components
Communication Software
Traditional OS-supported facilities (but heavyweight due to
protocol processing):
Sockets (TCP/IP), pipes, etc. (see the sketch below)
Lightweight protocols (user-level): minimal interface into the OS
User processes transmit into and receive from the network
directly, without OS intervention
Communication protection domains established by interface
card and OS
Treat message loss as an infrequent case
Active Messages (Berkeley), Fast Messages (UI), ...
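To make the contrast concrete, here is a minimal sketch of the traditional, heavyweight path using the standard BSD sockets API in C. The peer address, port and message are made up for illustration; the point is that every call below crosses into the OS kernel and its protocol stack, which is exactly the overhead the user-level protocols above avoid.

/* Minimal TCP client over the traditional sockets API (illustrative only).
 * The peer address and port are hypothetical placeholders. */
#include <stdio.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);           /* kernel-managed endpoint */
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in peer = {0};
    peer.sin_family = AF_INET;
    peer.sin_port   = htons(5000);                       /* hypothetical port */
    inet_pton(AF_INET, "192.168.1.10", &peer.sin_addr);  /* hypothetical node */

    if (connect(fd, (struct sockaddr *)&peer, sizeof peer) < 0) {
        perror("connect"); close(fd); return 1;
    }

    const char msg[] = "hello from a cluster node";
    send(fd, msg, sizeof msg, 0);                        /* TCP/IP processing in the kernel */

    char buf[128];
    ssize_t n = recv(fd, buf, sizeof buf, 0);            /* blocks inside the OS */
    if (n > 0) printf("received %zd bytes\n", n);

    close(fd);
    return 0;
}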
Cluster Components
Cluster Middleware
Resides between OS and applications and offers an infrastructure
for supporting:
Single System Image (SSI)
System Availability (SA)
SSI makes a collection of computers appear as a single machine
(globalized view of system resources)
SA supports checkpointing, process migration, etc.
Cluster Components
Middleware Components
Hardware
DEC Memory Channel, DSM (Alewife, DASH), SMP techniques
OS/gluing layers
Solaris MC, Unixware, Glunix
Applications and Subsystems
System management and electronic forms
Runtime systems (software DSM, PFS etc.)
Resource management and scheduling (RMS):
CODINE, LSF, PBS, NQS, etc.
Cluster Components
Programming Environments
Threads (PCs, SMPs, NOW, ..)
POSIX Threads
Java Threads
Message Passing Interface (MPI) (example below)
Available on Linux, NT, and many supercomputers
Parallel Virtual machine (PVM)
Software DSMs (Shmem)
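As a concrete illustration of the message-passing style, here is a minimal MPI program in C: rank 0 sends one integer to rank 1. The payload and process count are illustrative; the calls themselves (MPI_Init, MPI_Comm_rank, MPI_Send, MPI_Recv, MPI_Finalize) are standard MPI.

/* Minimal MPI example: rank 0 sends a value to rank 1.
 * Compile with an MPI wrapper compiler (e.g. mpicc) and launch with the
 * cluster's MPI starter (e.g. mpirun -np 2 ./a.out). */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

    if (rank == 0 && size > 1) {
        int payload = 42;                    /* illustrative data */
        MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int payload;
        MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", payload);
    }

    MPI_Finalize();
    return 0;
}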
Cluster Components
Development Tools
Compilers
C/C++/Java, ...
RAD (rapid application development tools):
GUI based tools for parallel processing modeling
Debuggers
Performance monitoring and analysis tools
Visualization tools
Cluster Programming Environments
Shared Memory Based
DSM
Threads/OpenMP (enabled for clusters; sketch below)
Java threads (HKU JESSICA, IBM cJVM)
Message Passing Based
Parallel Virtual Machine (PVM)
Message Passing Interface (MPI)
Parametric Computations
Nimrod-G
Automatic Parallelising Compilers
Parallel Libraries & Computational Kernels (e.g., NetSolve)
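For the shared-memory side, a minimal OpenMP sketch in C (problem size and loop body are made up): threads on one node share the array and the reduction clause combines their partial sums. Running this style across a cluster requires a cluster-enabled OpenMP or software-DSM runtime such as those listed above.

/* Minimal OpenMP example: parallel sum of an array on a shared-memory node.
 * Compile with OpenMP enabled (e.g. gcc -fopenmp). */
#include <stdio.h>
#include <omp.h>

#define N 1000000                 /* illustrative problem size */

static double a[N];

int main(void)
{
    double sum = 0.0;

    for (int i = 0; i < N; i++)
        a[i] = 1.0;               /* fill with dummy data */

    /* Threads share the array; reduction(+:sum) combines partial sums. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %.0f using up to %d threads\n", sum, omp_get_max_threads());
    return 0;
}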
Cluster Components
Applications
Sequential
Parallel/distributed (cluster-aware applications)
Grand challenge applications
Weather Forecasting
Quantum Chemistry
Molecular Biology Modeling
Engineering Analysis (CAD/CAM)
.
Web servers, data-mining
Cluster Middleware
and
Single System Image
Single System Image (SSI)
A single system image is the illusion, created by software or
hardware, that presents a collection of computers as a single
computing resource
SSI
Supported by a middleware layer that resides between the OS
and user-level environment
Middleware consists of essentially 2 sublayers of S/W
infrastructure
SSI infrastructure
Glue together OSs on all nodes to offer unified access to
system resources
System availability infrastructure
Enable cluster services such as checkpointing (saving a
snapshot of the application's state, so that it can restart
from that point in case of failure), automatic failover,
recovery from failure, & fault-tolerant support among all
nodes of the cluster
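To make the checkpointing idea concrete, here is a minimal application-level sketch in C (the file name and state layout are made up): the program periodically writes its state to disk, and on restart resumes from the last saved iteration. SSI/availability middleware typically performs this transparently at the system level and couples it with process migration and failover.

/* Minimal application-level checkpointing sketch (illustrative only).
 * The state is just an iteration counter and one accumulator in a file. */
#include <stdio.h>

struct state { long iter; double acc; };
static const char *CKPT = "app.ckpt";          /* hypothetical checkpoint file */

static int restore(struct state *s)            /* returns 1 if a checkpoint exists */
{
    FILE *f = fopen(CKPT, "rb");
    if (!f) return 0;
    int ok = fread(s, sizeof *s, 1, f) == 1;
    fclose(f);
    return ok;
}

static void save(const struct state *s)        /* write a snapshot to disk */
{
    FILE *f = fopen(CKPT, "wb");
    if (!f) return;
    fwrite(s, sizeof *s, 1, f);
    fclose(f);
}

int main(void)
{
    struct state s = {0, 0.0};
    if (restore(&s))                            /* resume after a failure */
        printf("restarting from iteration %ld\n", s.iter);

    for (; s.iter < 1000000; s.iter++) {
        s.acc += 1.0;                           /* stand-in for real work */
        if (s.iter % 100000 == 0)
            save(&s);                           /* periodic checkpoint */
    }
    printf("done, acc = %.0f\n", s.acc);
    return 0;
}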
Middleware Design Goals
Complete transparency
Let users see a single cluster system
Single entry point, ftp, telnet, software loading...
Scalable performance
Easy growth of cluster
no change of API and automatic load distribution
Enhanced availability
Automatic recovery from failures
Employ checkpointing and fault tolerant technologies
Handle consistency of data when replicated..
Single System Image
Benefits:
Transparent usage of system resources
Improved reliability and higher availability
Simplified system management
Reduction in the risk of operator errors
User need not be aware of the underlying system architecture
to use these machines effectively
Illusion of a single system
Single system: the entire cluster is viewed by users as one
system that has multiple processors.
Single control: logically, an end user or system user utilizes
services from one place through a single interface.
Symmetry: a user can use a cluster service from any node,
i.e. all cluster services and functionalities are symmetric across all
nodes and all users, except those protected by access rights.
Location-transparent: the user is not aware of the whereabouts
of the physical device that eventually provides a service.
Process view of a Cluster node
Home node: node where the process P resided when it was
created
Local node: node where P currently resides. All other nodes are
remote nodes.
Host Node: used for user logins through Telnet, rlogin, FTP and
HTTP
Compute node : performs computational jobs
I/O node: serves file I/O requests.
Desired SSI Services
Single entry point: enables users to log in to the cluster as one
virtual host, which serves the logon sessions.
Single file hierarchy: the illusion of a single, huge file-system image
that transparently integrates local and global disks and other file
devices (e.g., AFS, Solaris MC Proxy)
Single control point: the administrator should be able to configure,
monitor, test and control the entire cluster and each individual
node from a single point, i.e. manage from a single GUI
Single virtual networking
Single memory space - DSM
Single job management: submission of jobs from any node to a
single job management system.
Single user interface: like a workstation/PC windowing environment
SSI Levels
Single system support can exist at different levels within
a system, with one level able to be built on top of another

Application and Subsystem Level

Operating System Kernel Level

Hardware Level
Availability Support Functions
Single I/O space (SIO):
Any node can access any peripheral or disk devices without the
knowledge of physical location.
Single process space (SPS)
Any process can create processes on any node, and they can
communicate through signals, pipes, etc., as if they were on a
single node
Checkpointing and process migration
Saves the process state and intermediate results in memory or disk;
process migration for load balancing
Reduction in the risk of operator errors

Relationship among Middleware
Modules
Strategies for SSI
Build as a layer on top of existing OS (e.g. Glunix)
Benefits:
Makes the system quickly portable, tracks vendor software upgrades,
and reduces development time
New systems can be built quickly by mapping new services onto the
functionality provided by the layer beneath, e.g. Glunix/Solaris-MC
Build SSI at the kernel level (True Cluster OS)
Good, but can't leverage OS improvements from the vendor
e.g. Unixware and Mosix (built using BSD Unix)
Resource Management and
Scheduling (RMS)
RMS is the act of distributing applications among computers to
maximize their throughput
Enable the effective and efficient utilization of the resources
available
Software components
Resource manager
Locating and allocating computational resources, authentication,
process creation and migration
Resource scheduler
Queuing applications, resource location and assignment. It instructs
the resource manager what to do and when (policy)
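A toy sketch in C of the policy/mechanism split just described (node names, load figures, and the per-job load increment are made up): the scheduler drains a FIFO job queue, applies a least-loaded-node policy, and tells the resource manager where to start each job. Production RMS packages such as PBS, LSF or CODINE implement far richer policies and the actual remote process creation.

/* Toy scheduler: pick the least-loaded node for each queued job and hand
 * the decision to the resource manager (here just a printf placeholder). */
#include <stdio.h>

struct node { const char *name; double load; };   /* hypothetical node table */

static int least_loaded(const struct node *nodes, int n)
{
    int best = 0;
    for (int i = 1; i < n; i++)
        if (nodes[i].load < nodes[best].load)
            best = i;
    return best;
}

int main(void)
{
    struct node nodes[] = { {"node1", 0.7}, {"node2", 0.2}, {"node3", 0.9} };
    const char *queue[] = { "jobA", "jobB", "jobC" };   /* FIFO job queue */
    int nnodes = 3, njobs = 3;

    for (int j = 0; j < njobs; j++) {
        int target = least_loaded(nodes, nnodes);       /* policy decision */
        /* A real resource manager would now authenticate the user and
         * create the process on the chosen node; we only log the choice. */
        printf("dispatch %s -> %s\n", queue[j], nodes[target].name);
        nodes[target].load += 0.3;                      /* assumed per-job load */
    }
    return 0;
}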
Reasons for using RMS
Provide an increased, and reliable, throughput of user
applications on the systems
Load balancing
Utilizing spare CPU cycles
Providing fault tolerant systems
Manage access to powerful systems, etc.
Basic architecture of RMS: client-server system
RMS Components
[Figure: RMS workflow via a Web front end. The server contains PBS-Libra &
PBS Web and manages a network of dedicated cluster nodes.]
1. User submits a job script via the World-Wide Web
2. Server receives the job request and ascertains the best node
3. Server dispatches the job to the optimal node
4. Node runs the job and returns results to the server
5. User reads results from the server via the WWW
Libra: An example cluster scheduler
[Figure: Libra architecture. A user application submits jobs to the cluster
management system (PBS) on the server (master node). The Libra scheduler,
working with the PBS node monitor (pbs_mom), provides job input control,
job dispatch control, budget check control, a best node evaluator, deadline
control and a node querying module, and dispatches jobs to the cluster
worker nodes (node 1 .. node N).]
Services provided by RMS
Process Migration
A computational resource has become too heavily loaded
Fault tolerance concerns
Checkpointing
Scavenging Idle Cycles
70% to 90% of the time most workstations are idle
Fault Tolerance
Minimization of Impact on Users
Load Balancing
Multiple Application Queues
