
INTRODUCTION

A grid is a collection of distributed computing resources over a local or wide area network that appears to an end-user or application as one large virtual computing system.

Shreenath Acharya, SJEC, Vamanjoor

The grid computing middleware software manages and executes all activities related to the identification, allocation, de-allocation and consolidation of computing resources for end-users, transparently.
It aims at transparent sharing of all computational resources in a grid of computing systems, so as to meet the dynamic demands of the end-user community.
A grid computing environment is a necessity for many end-users who cannot afford huge computational resources, both hardware and software.
Most large organizations have a network of computing resources that are under-utilized at some locations and over-utilized at others. Connecting them in a grid and offering transparent, ubiquitous services across geographical areas allows under-utilized resources to serve specific needs of greater priority.


Thus, a dynamically equitable distribution of resources across a large area, meeting requirements that change from time to time, becomes possible.
A grid assumes a WAN (Wide Area Network) of large computing resources as its nodes, and end-users connect to the WAN from anywhere (home or office).
The grid aims at optimum utilization of resources: minimizing cost while maintaining the utilization of the existing resources.
Thus, it provides a win-win situation to both the owner organizations and the end-users by way of ubiquitous computing.


The grid is an infrastructure that enables the integrated, collaborative use of high-end computers, networks, databases, and other scientific resources, including instruments, owned and managed by various organizations.
Grid applications involve large amounts of data and/or computing and often
require secure resource sharing across the organizational boundaries.
Grid computing definition:
A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities.


Grid computing combines a decentralized architecture for resource management with a layered architecture, in a specific hierarchy, for the implementation of the various services of the grid.
Thus, a grid computing system can have any configuration starting with a Local
Area Network (LAN), or a bigger Metropolitan Area Network (MAN) or a large
Wide Area Network (WAN) at the national scale or even an international network
spanning several countries and continents.
That is, it can span a single organization, many organizations, or a service provider's space.
A grid can focus on the pooled assets of one organization or a pool of Multiple
Virtual Organizations (MVOs), all of which use common protocols enabling the
grid to offer services and run applications in a secure and controlled way.
Resources can be pooled temporarily, for minutes, days or weeks.


The Data Centre, the Grid and Distributed/High Performance Computing
Before grid computing existed, individual data centres had been operationalized.
Before data centres existed, each user organization maintained its own servers and its own specialized software, an expensive and redundant approach.
Data centres eliminated the need for separate servers maintained by individual user organizations: the organizations share the data centre's resources.
However, an individual data centre may not maintain and offer all possible resources, hardware or software.
User organizations or individual users connected to a single data centre can use only the resources available in that particular data centre, not the resources available in another centre belonging to a different organization.


Grid computing enables multiple data centres, of the same or different organizations, to be networked into a grid, so as to offer all the hardware and software resources in all the data centres to any user of any of these organizations, however remote they may be.
Grid computing includes concepts of distributed computing, high performance
computing, and disposable computing, depending upon the exact nature and scale of
the application of the grid.
A virtual supercomputer can be created out of a grid comprising its servers, workstations and even PCs, to deliver higher processing power.
Thus, the grid can provide a metacomputing environment for its users, treating CPU power, disk space and bandwidth as commodities to be utilized by the users of the grid as and when they require.
That is, grid computing provides a computational utility to its consumers.


Cluster Computing and Grid Computing


In clusters, the resource allocation is performed by a centralized resource manager and
scheduling system.
All the nodes of a cluster work cooperatively together, as a single unified resource.
In the case of grid, each node has its own resource manager and it does not aim at
providing a single system view.
A cluster comprises multiple interconnected independent nodes that cooperatively work together as a single unified resource: all user interactions with a cluster go through a centralized system that manages the allocation of resources to application jobs.
Some grids are collections of clusters, e.g., the NSF TeraGrid and the World Wide Grid, which has several clusters as nodes located in different countries (Canada, Japan and Australia).
Cluster management systems (such as Sun Grid Engine) have centralized control, complete control over individual components and complete knowledge of user requests and system state; they are therefore not grids.


Metacomputing-the Precursor of Grid Computing


Financial resources are finite, but computational needs are infinite.
The demand for computational resources keeps increasing indefinitely; whatever the availability of resources, the need for more remains, leading to what is called metacomputing.
The idea "Why not try and utilize the potential of the hundreds of thousands of computers which are somehow connected with each other?" led to metacomputing.
The term metacomputing indicates the use of powerful computing resources transparently available to the user via a networked environment.


How to Achieve Metacomputing?


Three essential steps to achieve the goals of metacomputing are:
Step 1: To integrate the large number of individual hardware and software resources
into a combined networked resource.
Step 2: To deploy and implement a middleware to provide transparent view of the
resources available.
Step 3: To develop and deploy optimal applications on the distributed metacomputing environment to take advantage of the resources.
Linking remote resources is not in question; what may be questioned is the viability of the link speeds, etc. for realistic application execution.
Similarly, the ability and feasibility of a metacomputing environment to execute the components of an application in parallel are also contentious.
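The middleware of Step 2 can be pictured as a registry that hides the physical machines from the user. The sketch below is a toy illustration under invented names (`ResourceRegistry`, the first-fit policy, the node names); it is not any real grid middleware API.

```python
# Toy sketch of metacomputing Step 2: a middleware registry that gives
# users a transparent view of distributed resources. Names and the
# first-fit allocation policy are illustrative assumptions.

class ResourceRegistry:
    def __init__(self):
        self._resources = []

    def register(self, name, cpus):
        """Step 1: integrate an individual resource into the pool."""
        self._resources.append({"name": name, "cpus": cpus, "free": cpus})

    def allocate(self, cpus_needed):
        """Return a node that can satisfy the request, hiding the
        physical location and identity of machines from the caller."""
        for r in self._resources:
            if r["free"] >= cpus_needed:
                r["free"] -= cpus_needed
                return r["name"]
        return None        # no single node can satisfy the request

registry = ResourceRegistry()
registry.register("lab-cluster", cpus=8)
registry.register("campus-server", cpus=32)
print(registry.allocate(16))   # the user never chose a machine
```

An application (Step 3) only ever asks the registry for capacity; which machine answers is the middleware's decision.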


If distributed processing is essential, i.e. data must be processed in different distributed locations, metacomputing may not be required.
Metacomputing comes to the fore when single-point usage of large, remotely located resources is required.
Metacomputing is not always the efficient choice; it is usually better and more efficient to run independent jobs on different nodes in a network.
However, for many other reasons, linking and sharing geographically distributed computing resources becomes necessary in many multidisciplinary scientific research projects, and also in industry, for linking different components of an organization together.


What is a Metacomputer?
The terms grid and computational grid are used to describe a metacomputer.
Metacomputing encompasses two broad categories:
Seamless access to high-performance computing
Linking of computing resources, instruments and other resources.


What does a Metacomputer Consist of?


A metacomputer is a virtual computer: it has a virtual computing architecture.
Its components together provide a single virtual computer image.

A metacomputer consists of:
(a) processors and memory,
(b) network and communication software,
(c) remote data access and retrieval, and
(d) virtual environment.

(a) Processors and memory: The primary resources of a metacomputer are the processors and the associated memory units. They form the basic computational power.
A metacomputer is a single virtual view of several processors and their associated memory units.


(b) Network and communication software: The metacomputer comprises a network and the related communication links which connect the physically distributed processors.
 The links between machines could be via modems, ISDN, Ethernet, FDDI, ATM (Asynchronous Transfer Mode) or any other networking technology.
 Networks are required to have high bandwidth and low latency, so that they provide a rapid and reliable communication link between the various processors or nodes.
 The associated communication software for effective communication bridges the gaps between different computers, between computers and people, and also between different people.
(c) Remote data access and retrieval: In a metacomputer, the data stored in the secondary storage devices of each node is required to be accessed remotely and retrieved on demand, and may run up to petabytes.
 Not only retrieval but also replication, mirroring, etc. will be required for the purposes of recovery and business continuity.
 The ability to access remote data without knowledge of its location is an essential requirement of metacomputing;
that is, distributed database functionality is an essential requirement of a metacomputer.
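Location-transparent access can be sketched as a replica catalogue that maps a logical dataset name to its physical copies. The catalogue contents, the dataset name and the replica-selection rule below are all invented for illustration; real grids use far richer catalogue and transfer services.

```python
# Minimal sketch of location-transparent remote data access: the user
# asks for a logical dataset name and a catalogue resolves it to one of
# several replicas. All names here are invented for illustration.

REPLICA_CATALOGUE = {
    "climate/run42": ["siteA:/data/run42", "siteB:/mirror/run42"],
}

def open_dataset(logical_name):
    """Resolve a logical name to a physical replica; the caller never
    needs to know where the data actually lives."""
    replicas = REPLICA_CATALOGUE.get(logical_name)
    if not replicas:
        raise FileNotFoundError(logical_name)
    return replicas[0]   # a real system would pick the nearest replica

print(open_dataset("climate/run42"))
```

Replication and mirroring, as the text notes, fall out of the same structure: each logical name simply maps to more than one physical copy.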

(d) Virtual environment: The large interconnected communicating network of computers, processors, memory and disks requires something on the lines of an operating system, which can be used to configure, manage and maintain the metacomputing environment.

The virtual environment spans the metacomputer and makes the multiple distributed computers usable as a single system, to both the system administrator and the individual users; computer systems (and scientific instruments, if any) located thousands of miles apart appear as a single system.


Evolution of Metacomputing Projects


Early metacomputing, the precursor of grid computing, produced a few projects which achieved its goals, especially distributed computing with resource sharing across a large network.
The two early metacomputing projects are:
(i) FAFNER
(ii) I-WAY
The two projects addressed almost opposite situations and computational requirements.
1. Project FAFNER: A project aimed at factoring large numbers in parallel, over a large network of contributors, in the context of encryption for Public Key Infrastructure (PKI).
Public Key Infrastructure (PKI) is meant for secure communication with digital signatures.
Every signatory user has a key pair: a public key, which is open to the public, and a private key, which is totally secret, known to no one else.

The public key and private key are mathematically related, so that a message encrypted with a recipient's public key can be decrypted only with the recipient's private key.
The algorithm developed for this purpose is by Rivest, Shamir and Adleman, shortened to RSA, where keys are generated mathematically, in part by combining prime numbers.
The security of the RSA algorithm is based on the fact that it is very difficult to factor extremely large numbers, especially those with hundreds of digits.
RSA keys are typically 512-bit (about 154-digit) keys, and the use of this technology has made integer factorization an active research area.
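The key-pair mathematics above can be shown with a standard textbook-sized example. The primes here are deliberately tiny; real keys use primes hundreds of digits long, which is exactly why factoring n breaks the system.

```python
# Toy RSA key generation from two small primes, showing why the security
# of RSA rests on the difficulty of factoring n = p * q.

p, q = 61, 53                  # the secret primes
n = p * q                      # public modulus (3233)
phi = (p - 1) * (q - 1)        # Euler's totient of n (3120)
e = 17                         # public exponent, coprime to phi
d = pow(e, -1, phi)            # private exponent: e*d ≡ 1 (mod phi)

message = 65
cipher = pow(message, e, n)    # encrypt with the public key (e, n)
plain = pow(cipher, d, n)      # decrypt with the private key (d, n)
print(plain)                   # recovers 65
```

Anyone who can factor n = 3233 back into 61 and 53 can recompute phi and hence d; with hundreds-of-digit primes, that factorization is the hard part.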
The factorization challenge provides a test bed for factoring implementations and one of the largest collections of factorization results from many different experts worldwide.
Since factorization is a computationally heavy job, parallel factorization algorithms were developed so that factoring could be computed in parallel on several processors, on a network of computational resources, i.e. processors, memory and storage.
These algorithms do not require much communication after the initial set-up, which makes it possible for many contributors to provide a small part of a large factorization project.
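The "split the work, then no communication" pattern can be sketched with plain trial division standing in for the far more sophisticated sieving methods actually used. The function names and the thread pool here are illustrative; FAFNER's contributors were independent machines, not threads.

```python
# Sketch of the parallel-factorization idea: split the candidate-divisor
# range among independent workers, each needing no communication after
# the initial split. Trial division is a stand-in for real sieving.

from concurrent.futures import ThreadPoolExecutor

def find_factors(n, lo, hi):
    """Each worker scans only its own slice of candidate divisors."""
    return [d for d in range(max(2, lo), hi) if n % d == 0]

def parallel_factor(n, workers=4):
    """Split the divisor range 2..sqrt(n) into independent chunks."""
    limit = int(n ** 0.5) + 1
    step = max(1, (limit - 2) // workers + 1)
    chunks = [(n, s, min(s + step, limit)) for s in range(2, limit, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(find_factors, *zip(*chunks))
    return sorted(d for part in results for d in part)

print(parallel_factor(5959))   # 5959 = 59 * 101; finds the factor 59
```

Each chunk is self-contained, so workers report results without talking to each other, which is what let FAFNER spread a large factorization over many small contributions.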

Initially, the code for factoring and the related information/data were distributed by e-mail to all the concerned individuals.
Subsequently, the project FAFNER (started by Bellcore Labs, Syracuse University and Co-operating Systems) was initiated to factor RSA-130 on the Web, using a numerical technique called the Number Field Sieve (NFS) factorization method and using web servers for computation.
A web interface for NFS was produced.
A contributor uses a web form to invoke server-side CGI scripts (written in Perl).
Contributors could access (through web pages) a wide range of support services for the sieving step of the factorization.
The activities supported on the web were: software distribution, project documentation, user registration, dissemination of sieving tasks, collection of relations, and real-time sieving status reports.
Cluster management was done by CGI scripts, directing individual sieving workstations through appropriate day/night sleep cycles to minimize the impact on the owners of the workstations used in the cluster.

Contributors downloaded and built a sieving software daemon (a web client), which used the HTTP protocol to get values and to post the resulting relations back to a CGI script on the web server.
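That fetch/compute/post cycle can be sketched as follows. The server URL, query parameters and JSON field names below are invented placeholders, not the real FAFNER protocol; only the shape of the loop (GET a task, sieve locally, POST the relations) mirrors the text.

```python
# Hypothetical sketch of a FAFNER-style client cycle: fetch a sieving
# assignment over HTTP, then package the results as a POST back to the
# server's CGI script. All URLs and field names are invented.

import json
import urllib.request

SERVER = "http://example.org/cgi-bin/fafner.pl"   # placeholder URL

def fetch_task(server=SERVER):
    """GET a sieving assignment (a range of values) from the server."""
    with urllib.request.urlopen(server + "?op=get_task") as resp:
        return json.load(resp)

def build_report(task_id, relations, server=SERVER):
    """Package the resulting relations as an HTTP POST request."""
    body = json.dumps({"task_id": task_id, "relations": relations}).encode()
    return urllib.request.Request(
        server, data=body,
        headers={"Content-Type": "application/json"})
```

A daemon would loop: `task = fetch_task()`, sieve the assigned range, then send `build_report(...)` with `urllib.request.urlopen`.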
The approach was successful due to several factors:
Even single workstations with small memory (4 MB) were allowed to perform useful work, using small sieves and small boundaries;
Anonymous registration was supported: users could contribute their hardware resources anonymously;
A consortium of sites was deployed to run the CGI script package locally;
Monitoring was done by the RSA-130 web servers hierarchically, round the clock, with minimum human intervention.


2. Project I-WAY: I-WAY (Information Wide Area Year) was an experimental high performance network, linking many servers, which addressed virtualization environments.
I-WAY was developed in 1995 with the objective of integrating existing high bandwidth networks with telephone systems.
The servers, datasets and software environments located at 17 different U.S. sites were integrated by connecting them with 10 networks of different bandwidths and protocols, using different routing and switching technologies.
The network, based on ATM (Asynchronous Transfer Mode), provided the backbone supporting both TCP/IP over ATM and direct ATM-oriented protocols.
To standardize I-WAY software interface management, key sites installed a Point of Presence (I-POP) computer system to serve as their respective gateways to I-WAY.
These I-POP systems were Unix workstations configured homogeneously and containing a standard software environment, I-Soft, which helped overcome problems and issues related to heterogeneity, scalability, security and performance.
The I-POP machines provided uniform authentication, resource reservation and process creation.

Each I-POP system was accessible from the Internet and operated within its site's firewall.
It also had an ATM interface, which allowed monitoring and management of the site's ATM switch.
A resource scheduler, called the Computational Resource Broker (CRB), was used, consisting of user-to-CRB, CRB-to-CRB and CRB-to-local-scheduler protocols.
A central scheduler maintained queues of jobs and tables indicating the state of local machines, allocating jobs to machines.
Multiple local schedulers also operated, for local scheduling.
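The central scheduler's bookkeeping, a job queue plus a machine-state table, can be sketched as below. The class name, machine names and the simple "first idle machine" policy are invented for illustration; the real CRB was considerably more elaborate.

```python
# Toy sketch in the spirit of I-WAY's central scheduler: a queue of
# pending jobs and a table of machine states, with queued jobs handed
# out to idle machines. Names and the policy are illustrative.

from collections import deque

class CentralScheduler:
    def __init__(self, machines):
        self.queue = deque()                         # pending jobs
        self.state = {m: "idle" for m in machines}   # machine-state table

    def submit(self, job):
        self.queue.append(job)

    def dispatch(self):
        """Allocate queued jobs to idle machines; return assignments."""
        assigned = []
        for machine, status in self.state.items():
            if status == "idle" and self.queue:
                job = self.queue.popleft()
                self.state[machine] = "busy"
                assigned.append((job, machine))
        return assigned

sched = CentralScheduler(["popA", "popB"])
sched.submit("render-job")
sched.submit("sim-job")
print(sched.dispatch())
```

In I-WAY the local schedulers then managed each machine's own queue; the central broker only decided which site got which job.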
The AFS file system was used for file movement and processing functions.
To support user-level tools, a software library, Nexus, was used to perform appropriately chosen automatic configuration mechanisms.
Applications supported were supercomputing, virtual reality and multi-virtual reality, in addition to GUI and web video. Many features of I-WAY were inputs to Globus.
Grid computing can be used in metacomputing mode for scientific applications.

Scientific, Business and e-governance Grids


The grid computing approach facilitates the scientific community and also other computing communities, such as business or government.
Thus, we can have all the categories of grids: scientific grids, business grids and e-governance grids.
1. Scientific grids: The users may be only scientists belonging to scientific organizations.
2. Business grids or e-governance grids: The users may belong to any citizen group using business services or government services.
The number of such users can be very large, compared to the restricted users of scientific grids.
Therefore, the challenges of setting up and operating business grids or e-governance grids are different from, and much greater than, those of setting up scientific grid computing environments.
The number of citizens being very large, the Internet is the only communication network available to citizens for accessing a business grid or an e-governance grid.
The user interfaces, the access speeds and the data sizes involved will be large.

Web Services and Grid Computing


The users of such grids, viz. citizens, will require web services over the Internet.
The customers will not be interested in, or required to know, any details of hardware or software resource locations or resource allocation management.
For the grid computing environment to provide all this, integrating web services with the grid architecture becomes a necessity.
The Open Grid Services Architecture (OGSA) becomes essential to offer effective, stateful web services, based on Service-Oriented Architecture (SOA), on the grid.


Business Computing and the Grid: a Potential Win-win Situation
Originally, grid computing was utilized for solving very large computational problems in scientific research.
For example: computation of weather forecasting models, molecular modelling, bioinformatics, drug design, etc.
For this purpose, a massive integration of computer systems, called a computational grid, is utilized through the grid computing architecture.
The key objective in any business proposition or government services for citizens
is cost reduction and better quality of service which can be achieved by harnessing
the grid computing approach.


The goal of grid computing is to provide the users with a single view and a
single mechanism that can be utilized to support any number of computing tasks:
The grid leverages its extensive information capabilities to support the
processing and storage requirements to complete the task, and all this is done
across the globe with clusters of computer systems, but the user sees only a
single virtual computer undertaking his/her own individual computational
requirements to the fullest satisfaction.
Maximum resource utilization is achieved, providing the fastest and cheapest service and maximum end-user satisfaction through quality of service.
Thus, by adopting the grid computing approach, an organization can ensure maximum utilization of its computational resources, saving costs and simultaneously improving the quality of service, thereby meeting its business objectives.


In a grid environment, a large collection of geographically distributed computational resources can be made to work together cohesively because of defined protocols, connectivity, coordination, resource allocation, resource management and security.
The participating computer systems of a grid could be located in the same room or
distributed across the globe, they may be supporting homogeneous or
heterogeneous hardware platforms, they may be running similar or dissimilar
operating systems, and they may be owned by one or more organizations.
The goal of the grid computing approach, to provide the end user with a single view of one large computing resource, requires high speed network connectivity, a resource now more easily and less expensively available.


The promise of grid computing for business purposes is based on three factors:
1. The ability of grid computing technology to ensure more cost-effective use of a
given amount of computer resources.
2. A methodology to solve any difficult or large problem by using a grid as a large
computer.
3. All the computing resources of a grid, such as CPUs, disk storage systems and software packages, can be cooperatively and even synergistically harnessed and managed in collaboration towards a common business objective.


e-Governance and the Grid


E-governance is a potential application of grid computing similar to e-business.
In the case of e-governance, the citizen becomes the end-user, and therefore, citizen
services of the government become the most important and high priority application
of the grid.
Citizen services are delivered as e-governance services through the web.
The e-governance services may be web-enabled services or could be delivered as web
services (on SOA).
When delivered as web services, the grid has to support the web services, as the
resources required by a large number of web services call for robust computing
infrastructure such as the grid.
The new grid architecture standard, OGSA, offers stateful web services integrated with grid technology; grid tools such as the Globus Toolkit support the OGSA standard, thus enabling stateful web services for e-governance.
Thus, it is possible to develop and offer e-governance web services based on the OGSA standard in the grid environment.
