Computer cluster
From Wikipedia, the free encyclopedia

Not to be confused with data cluster.

A computer cluster is a group of linked computers that work together so closely that in many respects they form a single computer. The components of a cluster are commonly, but not always, connected to each other through fast local area networks. Clusters are usually deployed to improve performance and availability over that of a single computer, while typically being much more cost-effective than single computers of comparable speed or availability.[1]

Contents

1 Categorization
  1.1 High-availability (HA) clusters
  1.2 Load-balancing clusters
  1.3 Compute clusters
2 Implementations
  2.1 Consumer game consoles
3 History
4 Technologies
5 See also
6 References
7 Further reading
8 External links

Categorization

High-availability (HA) clusters

High-availability clusters (also known as failover clusters) are implemented primarily to improve the availability of the services that the cluster provides. They operate by having redundant nodes, which are used to provide service when system components fail. The most common size for an HA cluster is two nodes, the minimum required to provide redundancy. HA cluster implementations attempt to use redundancy of cluster components to eliminate single points of failure. There are commercial implementations of high-availability clusters for many operating systems. The Linux-HA project is one commonly used free software HA package for the Linux operating system.
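For illustration, the core failover logic of a two-node HA cluster can be reduced to a heartbeat check: the standby node monitors the primary and takes over only after several consecutive missed heartbeats. A minimal sketch in Python; the heartbeat address and the takeover action are hypothetical placeholders, not the API of Linux-HA or any other package:

    import socket
    import time

    PRIMARY = ("10.0.0.1", 9999)   # hypothetical heartbeat address of the primary node

    def primary_alive(timeout=2.0):
        # A heartbeat here is simply the ability to open a TCP connection.
        try:
            with socket.create_connection(PRIMARY, timeout=timeout):
                return True
        except OSError:
            return False

    def standby_loop():
        misses = 0
        while True:
            misses = 0 if primary_alive() else misses + 1
            if misses >= 3:
                # Require several consecutive misses so a single dropped
                # packet does not trigger a spurious failover.
                print("primary down - taking over the service address")
                break
            time.sleep(1.0)

    standby_loop()

Real HA packages add fencing (forcibly isolating the failed node) so a half-dead primary cannot keep writing to shared storage while the standby takes over.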

Load-balancing clusters

In a load-balancing cluster, multiple computers are linked together to share computational workload and function as a single virtual computer. Although the cluster physically consists of multiple machines, to the user it behaves as one system. Requests initiated by users are managed by, and distributed among, the standalone computers that form the cluster. This balances the computational work across the machines and improves the performance of the cluster as a whole.
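The simplest distribution policy is round-robin: each new request goes to the next machine in the rotation. A toy sketch in Python, with hypothetical node names standing in for real servers:

    import itertools

    backends = ["node1", "node2", "node3"]    # the standalone machines in the cluster
    rotation = itertools.cycle(backends)      # endless round-robin iterator

    def dispatch(request):
        # Users see one virtual machine; internally each request is
        # handed to whichever node comes up next in the rotation.
        node = next(rotation)
        print(f"request {request!r} -> {node}")
        return node

    for r in range(6):
        dispatch(r)    # spreads evenly: node1, node2, node3, node1, node2, node3

Production load balancers refine this with weights and health checks, but the principle is the same.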

Compute clusters

Often clusters are used primarily for computational purposes, rather than for I/O-oriented operations such as web service or databases. For instance, a cluster might support computational simulations of weather or vehicle crashes. The primary distinction among compute clusters is how tightly coupled the individual nodes are. A single job may require frequent communication among nodes, which implies that the cluster shares a dedicated network, is densely located, and probably has homogeneous nodes. This design is usually referred to as a Beowulf cluster. At the other extreme, a job may use one or a few nodes and need little or no inter-node communication; this latter category is sometimes called "grid" computing. Tightly coupled compute clusters are designed for work that might traditionally have been called "supercomputing". Middleware such as MPI (Message Passing Interface) or PVM (Parallel Virtual Machine) permits compute clustering programs to be portable across a wide variety of clusters.
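As a sketch of what such portability looks like in practice, the following program uses the mpi4py Python bindings to MPI (assuming an MPI implementation and mpi4py are installed); the same code runs unchanged whether its processes share one machine or are spread across cluster nodes:

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()    # this process's id within the parallel job
    size = comm.Get_size()    # total number of processes in the job

    # Each process sums a strided slice of 0..999, then the partial
    # sums are combined - the communication pattern typical of
    # tightly coupled compute clusters.
    partial = sum(range(rank, 1000, size))
    total = comm.reduce(partial, op=MPI.SUM, root=0)

    if rank == 0:
        print(f"sum over {size} processes: {total}")    # 499500 for any process count

Launched with, for example, mpiexec -n 4 python sum_demo.py, where sum_demo.py is this file.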

Implementations

The TOP500 organization's semiannual list of the 500 fastest computers usually includes many clusters. TOP500 is a collaboration between the University of Mannheim, the University of Tennessee, and the National Energy Research Scientific Computing Center at Lawrence Berkeley National Laboratory. As of July 2011 the top supercomputer is the K computer in Kobe, Japan, with a performance of 8.162 PFlops measured with the LINPACK benchmark.

Clustering can provide significant performance benefits relative to price. The System X supercomputer at Virginia Tech, the 28th most powerful supercomputer on Earth as of June 2006,[2] is a 12.25 TFlops computer cluster of 1100 Apple XServe G5 2.3 GHz dual-processor machines (4 GB RAM, 80 GB SATA HD) running Mac OS X and using an InfiniBand interconnect. The cluster initially consisted of Power Mac G5s; the rack-mountable XServes are denser than desktop Macs, reducing the aggregate size of the cluster. The total cost of the previous Power Mac system was $5.2 million, a tenth of the cost of slower mainframe supercomputers. (The Power Mac G5s were sold off.)

The central concept of a Beowulf cluster is the use of commercial off-the-shelf (COTS) computers to produce a cost-effective alternative to a traditional supercomputer. One project that took this to an extreme was the Stone Soupercomputer.

It is worth noting, however, that Flops (floating point operations per second) are not always the best metric for supercomputer speed. Clusters can have very high Flops, but they cannot access all data in the cluster as a whole at once. Therefore clusters are excellent for parallel computation, but much poorer than traditional supercomputers at non-parallel computation.

JavaSpaces is a specification from Sun Microsystems that enables clustering computers via a distributed shared memory.

Consumer game consoles

Due to the increasing computing power of each generation of game consoles, a novel use has emerged in which they are repurposed into high-performance computing (HPC) clusters. Examples of game console clusters include Sony PlayStation clusters and Microsoft Xbox clusters. It has been suggested on a news website that countries which are restricted from buying supercomputing technologies may be obtaining game systems to build computer clusters for military use.[3]

History

The history of cluster computing is best captured by a footnote in Greg Pfister's In Search of Clusters: "Virtually every press release from DEC mentioning clusters says 'DEC, who invented clusters...'. IBM did not invent them either. Customers invented clusters, as soon as they could not fit all their work on one computer, or needed a backup. The date of the first is unknown, but it would be surprising if it was not in the 1960s, or even late 1950s."[4]

The formal engineering basis of cluster computing as a means of doing parallel work of any sort was arguably invented by Gene Amdahl of IBM, who in 1967 published what has come to be regarded as the seminal paper on parallel processing: Amdahl's Law. Amdahl's Law describes mathematically the speedup one can expect from parallelizing any given, otherwise serially performed, task on a parallel architecture (stated formally below). The paper defined the engineering basis for both multiprocessor computing and cluster computing, where the primary differentiator is whether the interprocessor communications are supported "inside" the computer (for example, on a customized internal communications bus or network) or "outside" the computer, on a commodity network.

Consequently, the history of early computer clusters is more or less directly tied to the history of early networks, as one of the primary motivations for the development of a network was to link computing resources, creating a de facto computer cluster. Packet switching networks were conceptually invented by the RAND Corporation in 1962. Using the concept of a packet-switched network, the ARPANET project succeeded in creating, in 1969, what was arguably the world's first commodity-network-based computer cluster by linking four different computer centers (each of which was something of a "cluster" in its own right, but probably not a commodity cluster). The ARPANET project grew into the Internet, which can be thought of as "the mother of all computer clusters" (as the union of nearly all of the compute resources, including clusters, that happen to be connected). It also established the paradigm in use by all computer clusters in the world today: the use of packet-switched networks to perform interprocessor communications between processor (sets) located in otherwise disconnected frames.

The development of customer-built and research clusters proceeded hand in hand with that of both networks and the Unix operating system from the early 1970s, as both TCP/IP and the Xerox PARC project created and formalized protocols for network-based communications. The Hydra operating system was built for a cluster of DEC PDP-11 minicomputers called C.mmp at Carnegie Mellon University in 1971. However, it was not until circa 1983 that the protocols and tools for easily doing remote job distribution and file sharing were defined (largely within the context of BSD Unix, as implemented by Sun Microsystems) and hence became generally available commercially, along with a shared filesystem.

The first commercial clustering product was ARCnet, developed by Datapoint in 1977. ARCnet was not a commercial success, and clustering per se did not really take off until Digital Equipment Corporation released their VAXcluster product in 1984 for the VAX/VMS operating system. The ARCnet and VAXcluster products supported not only parallel computing, but also shared file systems and peripheral devices. The idea was to provide the advantages of parallel processing while maintaining data reliability and uniqueness. VAXcluster, now VMScluster, is still available on OpenVMS systems from HP running on Alpha and Itanium systems. Two other noteworthy early commercial clusters were the Tandem Himalaya (a circa 1994 high-availability product) and the IBM S/390 Parallel Sysplex (also circa 1994, primarily for business use).
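For reference, Amdahl's Law, named at the start of this section, can be stated compactly in LaTeX notation: if a fraction p of a task can be parallelized and it is run on N processors, the expected speedup is

    S(N) = \frac{1}{(1 - p) + \frac{p}{N}}

so the speedup is capped at 1/(1 - p) no matter how many processors are added; a task that is 95% parallelizable, for example, can never run more than 20 times faster.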

No history of commodity computer clusters would be complete without noting the pivotal role played by the development of Parallel Virtual Machine (PVM) software in 1989. This open source software, based on TCP/IP communications, enabled the instant creation of a virtual supercomputer, a high-performance compute cluster, made out of any TCP/IP-connected systems. Free-form heterogeneous clusters built on top of this model rapidly achieved total throughput in FLOPS that greatly exceeded that available even with the most expensive "big iron" supercomputers. PVM and the advent of inexpensive networked PCs led, in 1993, to a NASA project to build supercomputers out of commodity clusters. In 1995 the Beowulf cluster, a cluster built on top of a commodity network for the specific purpose of "being a supercomputer" capable of performing tightly coupled parallel HPC computations, was invented,[5] which spurred the independent development of grid computing as a named entity, although grid-style clustering had been around at least as long as the Unix operating system and the ARPANET, whether or not it, or the clusters that used it, were named.

Technologies

- MPI is a widely available communications library that enables parallel programs to be written in C, Fortran, Python, OCaml, and many other programming languages.
- The GNU/Linux world supports various cluster software; for application clustering, there are Beowulf, distcc, and MPICH. Linux Virtual Server and Linux-HA are director-based clusters that allow incoming requests for services to be distributed across multiple cluster nodes. MOSIX, openMosix, Kerrighed, and OpenSSI are full-blown clusters integrated into the kernel that provide automatic process migration among homogeneous nodes. OpenSSI, openMosix, and Kerrighed are single-system image implementations.
- Microsoft Windows Compute Cluster Server 2003, based on the Windows Server platform, provides pieces for high-performance computing such as the Job Scheduler, the MSMPI library, and management tools. NCSA's recently installed Lincoln is a cluster of 450 Dell PowerEdge 1855 blade servers running Windows Compute Cluster Server 2003. This cluster debuted at #130 on the Top500 list in June 2006.
- gridMathematica provides distributed computations over clusters, including data analysis, computer algebra, and 3D visualization. It can make use of other technologies such as Altair PBS Professional, Microsoft Windows Compute Cluster Server, Platform LSF, and Sun Grid Engine.[6]
- gLite is a set of middleware technologies created by the Enabling Grids for E-sciencE (EGEE) project.
- Another example of consumer game products being added to high-performance computing is the Nvidia Tesla Personal Supercomputer workstation, which gets its processing power by harnessing the power of multiple graphics accelerator processor chips.
- Algorithmic skeletons are a high-level parallel programming model for parallel and distributed computing that takes advantage of common programming patterns to hide the complexity of parallel and distributed applications. Starting from a basic set of patterns (skeletons), more complex patterns can be built by combining the basic ones.
- Global Storage Architecture (GSA), a highly scalable cloud-based NAS solution, combines proprietary IBM HPC technology (storage and server hardware and IBM's high-performance shared-disk clustered file system, GPFS) with open source components like Linux, Samba, and CTDB to deliver distributed storage solutions. GSA exports the clustered file system through industry-standard protocols like CIFS, NFS, FTP, and HTTP. All of the GSA nodes in the grid export all files of all file systems simultaneously.[7]

See also

This "see also" section may contain an excessive number of suggestions. Please ensure that only the most relevant suggestions are given and that they are not red links, and consider integrating suggestions into the article itself. (June 2011)

- Grid computing
- Botnet
- DEGIMA (computer cluster)
- Computer cluster in virtual machines
- Clustered file system
- Distributed data store
- Flash mob computing
- GPU cluster
- HP ServiceGuard
- Peer-to-peer
- Red Hat Cluster Suite
- RoS (computing)
- Server farm
- Compile farm
- Single system image
- Solaris Cluster
- Symmetric multiprocessing
- Two-node cluster
- Veritas Cluster Server


References

1. ^ Bader, David; Pennington, Robert (June 1996). "Cluster Computing: Applications". Georgia Tech College of Computing. Retrieved 2007-07-13.
2. ^ TOP500 List - June 2006 (1-100) | TOP500 Supercomputing Sites
3. ^ Farah, Joseph (2000-12-19). "Why Iraq's buying up Sony PlayStation 2s". World Net Daily.
4. ^ Pfister, Gregory (1998). In Search of Clusters (2nd ed.). Upper Saddle River, NJ: Prentice Hall PTR. p. 36. ISBN 0-13-899709-8.
5. ^ http://www.beowulf.org/overview/history.html
6. ^ gridMathematica Cluster Integration
7. ^ Chari, Srini (2009). "Mastering the Odyssey of Scale from Nano to Peta: The Smart Use of High Performance Computing (HPC) Inside IBM". Denbury, CT: IBM. p. 5.

Further reading

- Mark Baker, et al., Cluster Computing White Paper [1], 11 Jan 2001.
- Karl Kopper: The Linux Enterprise Cluster: Build a Highly Available Cluster with Commodity Hardware and Free Software, No Starch Press, ISBN 1-59327-036-4

- Robert W. Lucke: Building Clustered Linux Systems, Prentice Hall, 2005, ISBN 0-13-144853-6

- Evan Marcus, Hal Stern: Blueprints for High Availability: Designing Resilient Distributed Systems, John Wiley & Sons, ISBN 0-471-35601-8

- Greg Pfister: In Search of Clusters, Prentice Hall, ISBN 0-13-899709-8
- Rajkumar Buyya (editor): High Performance Cluster Computing: Architectures and Systems, Volume 1, ISBN 0-13-013784-7, Prentice Hall, NJ, USA, 1999.

- Rajkumar Buyya (editor): High Performance Cluster Computing: Programming and Applications, Volume 2, ISBN 0-13-013785-5, Prentice Hall, NJ, USA, 1999.

External links


- IEEE task force on cluster computing
- www.ClusterGate.RU - information on midrange computer clusters
- J2EE Clustering
- LanderCluster
- Survey of Popular Clustering Technologies
- Build windows cluster with iSCSI
- Reliable Scalable Cluster Technology, IBM Tivoli System Automation Wiki


Base One International Corporation


.NET database and distributed computing tools


Database-Centric Grid and Cluster Computing


Applications built with Base One's Database Library can be interactive (with an end-user sitting in front of a PC) or batch-oriented, such as grid and cluster computing services. In Base One's architecture, distributed computing applications are governed by a database representation of the tasks to be done and the rules for running them. Tasks to be done can be long-running jobs or, using the built-in high speed queuing facilities, short-running, asynchronous transactions.

Base One's database-centric architecture is unique in that it does not use a "master/slave" messaging approach to distributed computing. In addition, unlike some cluster architectures, it does not model a cluster of computers as communicating through a single, universally addressable, main memory. Instead, each node (machine) in Base One's "virtual supercomputer" is considered to be fully independent and uses one or more shared, recoverable databases to convey intermediate processing results. This architecture permits efficient, safe reuse of database connections between completely unrelated applications.

Base One's parallel processing software implements a decentralized model in which each processing node (computer) is responsible for finding its own work, as opposed to a centralized model where a "master" machine decides which "slave" machines should do what work next and sends them messages. With Base One's implementation, applications look for work by examining a shared database at regular intervals and whenever a task completes. Once activated, an automated batch processing application decides what to run next from the communal pool of pending work stored in the database.

In Base One's model, all that is required to get tasks done in the background (by a pool of available computers) is to add or change a record in the database, specifying the module to be launched and a list of parameters to be passed. In this way, interactive and batch applications can easily "submit" batch jobs for other machines to discover and run in parallel. (Database access itself is done synchronously, i.e. an application waits until any data storage or retrieval operation has completed before proceeding.) Any application built with Base One's Database Library can incorporate the basic facilities for batch processing, as well as those for submitting, scheduling, and monitoring batch jobs. Using Base One's Internet Server (BIS), operations staff can launch jobs and monitor results from remote locations - while getting the performance benefits of running jobs on a cluster close to the database.

The database-centric approach leverages the indexing, transaction processing, integrity, recovery, and security capabilities provided by high-end database systems. Eliminating the concept of a master processor also removes a potential performance bottleneck and point of failure, resulting in applications that can be extended more easily and can adapt dynamically to changing workloads. This architecture makes it straightforward to design, implement, and manage distributed applications that are scalable, fault tolerant, and highly reliable.

Base One's architecture is well suited to all forms of standard, back office database processing. With its uniform high-level database API, built-in Data Dictionaries, and integration with Microsoft Visual Studio, Base One makes it simple to develop distributed applications. Even without any inherent demand for database technology, computationally intensive applications can benefit from Base One's architecture - by using databases for communication and storage of temporary data. The built-in administration facilities, which tie into DBMS and network security systems, support both interactive users (people) and batch users (computers). This allows grid and cluster computing security to be controlled through the same, familiar mechanisms already used by the operations staff to control the security of the organization's interactive applications.
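A minimal sketch of this database-as-work-queue pattern, in Python with SQLite standing in for the shared DBMS; the tasks table, its columns, and the module names are illustrative inventions, not Base One's actual schema or API:

    import sqlite3
    import time

    conn = sqlite3.connect("work_queue.db")
    conn.execute("CREATE TABLE IF NOT EXISTS tasks ("
                 " id INTEGER PRIMARY KEY,"
                 " module TEXT, params TEXT,"
                 " status TEXT DEFAULT 'pending')")

    # Submitting a background job is just adding a record.
    with conn:
        conn.execute("INSERT INTO tasks (module, params) VALUES (?, ?)",
                     ("simulate", "seed=42"))

    def claim_task(conn):
        # Claim one pending task inside a transaction; the guarded UPDATE
        # ensures two polling workers cannot claim the same row.
        with conn:
            row = conn.execute("SELECT id, module, params FROM tasks"
                               " WHERE status = 'pending' ORDER BY id LIMIT 1").fetchone()
            if row is None:
                return None
            claimed = conn.execute("UPDATE tasks SET status = 'running'"
                                   " WHERE id = ? AND status = 'pending'",
                                   (row[0],)).rowcount
            return row if claimed else None

    # Each worker node finds its own work by polling; no master assigns it.
    while (task := claim_task(conn)) is not None:
        task_id, module, params = task
        print(f"running {module}({params})")       # launch the named module here
        with conn:
            conn.execute("UPDATE tasks SET status = 'done' WHERE id = ?", (task_id,))
        time.sleep(0.1)                            # stand-in for the polling interval

A real deployment would keep polling indefinitely on many machines; here the loop simply drains the queue and exits.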


Cluster computing

In computers, clustering is the use of multiple computers, typically PCs or UNIX workstations, multiple storage devices, and redundant interconnections, to form what appears to users as a single highly available system. Cluster computing can be used for load balancing as well as for high availability. Advocates of clustering suggest that the approach can help an enterprise achieve 99.999% availability in some cases. One of the main ideas of cluster computing is that, to the outside world, the cluster appears to be a single system.

A common use of cluster computing is to load balance traffic on high-traffic Web sites. A Web page request is sent to a "manager" server, which then determines which of several identical or very similar Web servers to forward the request to for handling. Having a Web farm (as such a configuration is sometimes called) allows traffic to be handled more quickly.

Clustering has been available since the 1980s, when it was used in DEC's VMS systems. IBM's Sysplex is a cluster approach for a mainframe system. Microsoft, Sun Microsystems, and other leading hardware and software companies offer clustering packages that are said to offer scalability as well as availability. As traffic or availability assurance increases, all or some parts of the cluster can be increased in size or number.

Cluster computing can also be used as a relatively low-cost form of parallel processing for scientific and other applications that lend themselves to parallel operations. An early and well-known example was the Beowulf project, in which a number of off-the-shelf PCs were used to form a cluster for scientific applications.
