
CHAPTER-1

INTRODUCTION
The initial applications were built recently, following different distributed computing approaches:
(1) A Service Oriented Architecture (SOA) based training system
(2) A modular real-time data analyzer
(3) A cluster-based simulator
Cloud technologies, however, are currently designed mainly for developing new applications. Early
Cloud providers focused on developers and technology startups when they designed their
offerings. Software architects looking to build new applications can design the components,
processes and workflow for their solution according to the new Cloud-related concepts.
However, building new applications that are architected from scratch for the Cloud is only
slowly gaining traction, and only a few enterprise applications currently take real advantage of
the Cloud's elasticity. Distributing an application across multiple clouds largely reduces the
risks related to data security and storage, as well as power and equipment breakdown. This is
one of the reasons that led to bringing together several clouds (owned by different providers) to
form what is known as Sky Computing.
Infrastructure-as-a-service (IaaS) cloud computing is revolutionizing how we approach
computing. It represents a fundamental change from the grid computing assumption that control
over remote resources stays with the site, which decides how those resources are configured.
Reconciling those choices between multiple user groups proved to be complex, time consuming,
and expensive. With IaaS, compute resource consumers can eliminate the expense inherent in
acquiring, managing, and operating IT infrastructure and instead lease resources on a
pay-as-you-go basis. IT infrastructure providers can exploit economies of scale to mitigate the
cost of buying and operating resources and avoid the complexity required to manage multiple
customer-specific environments and applications. These developments helped give rise to an
emerging computing pattern known as Sky Computing. The main advantage of cloud computing is
that it reduces the cost of hardware, software and licenses for all users. Users can further
benefit from low cost and high resource utilization by using sky computing.

Department of Computer Science and Engineering, VVIT Page 1


1.1 What Is Sky Computing
Sky Computing is an emerging computing model in which resources from multiple cloud
providers are leveraged to create large-scale distributed infrastructures.

Fig1: Sky Computing

Federation of multiple clouds
Creates large-scale infrastructures
Allows running software that requires large computational power
Sky providers are consumers of cloud providers
Virtual datacenter-less dynamic clouds



1.2 Cloud computing vs. Grid Computing vs. Sky Computing
A. Control on Resources
Grid Computing- When remote resources are used for regular computing, the assumption is
that control over the resources stays with the site. This choice is not always useful for remote
users, who might need a different OS or login access.
Cloud computing- A cloud computing infrastructure is a complex system with a large
number of shared resources. These are subject to unpredictable requests and can be affected by
external events beyond our control. Cloud resource management requires complex policies and
decisions for multi-objective optimization. The strategies for cloud resource management
associated with the three cloud delivery models, Infrastructure as a Service (IaaS), Platform as a
Service (PaaS) and Software as a Service (SaaS), differ from one another.
Sky computing- Sky computing lets users control resources themselves, so trust
relationships within sky computing are the same as those within a traditional non-distributed
site, simplifying how remote resources interact.
B. Scalability
Grid Computing- It is hard to scale.
Cloud computing- The ability to scale on demand is one of the biggest advantages of
cloud computing. When considering the range of benefits of the cloud, it is often difficult to
conceptualize the power of scaling on demand, but organizations of all kinds enjoy tremendous
benefits when they implement auto scaling correctly.
Sky computing- It is dynamically scalable, as resources are distributed over several clouds.
C. Security
Grid Computing- Grid systems and applications require standard security functions such as
authentication, access control, integrity, privacy and non-repudiation. A grid security
architecture should also satisfy constraints such as single sign-on, protection of credentials,
and interoperability with local security solutions. Security technologies such as the security
model of the Globus Toolkit have tightened grid security to some extent.
Cloud computing- Security is weaker here: users' data may be disclosed to unauthorized
systems, and accounts can sometimes be hijacked when intruders gain unauthorized access.



Sky computing- When we deploy a single appliance with a specific provider, we rely on the
basic security and contextualization measures this provider has implemented to integrate the
appliance into a provider-specific networking and security context. When appliances are deployed
across different providers, security relationships are more complex and require
provider-independent methods to establish a security and configuration context.
D. Challenges
Grid Computing- It distributes resources over large, geographically distributed
environments and accesses heterogeneous devices and machines. Administering the grid is
therefore a major challenge, and little grid-enabled software is available.
Cloud computing- The cloud services providers are faced with large, fluctuating loads
that challenge the claim of cloud elasticity. In some cases, when they can predict a spike, they
can provision resources in advance.
Sky computing- The client must be connected to a trusted networking domain, with explicit
trust and configuration relationships set up between the resources, so that the client can
securely take ownership of a customized infrastructure for an agreed time period. Achieving this
is a major challenge in sky computing.
E. Applications
Grid Computing- Grid portals, Load balancing, Resource broker etc.
Cloud computing- Big data analytics, File storage, Disaster recovery, Backup etc.
Sky computing- Seasonal e-commerce web servers, event-based alert systems etc.



CHAPTER-2
SKY COMPUTING ARCHITECTURE
The goal is to create a model that enables intensive computing in Cloud networks. This
is to be achieved by enlarging the set of available resources in a way that overcomes problems
such as elevated latency between nodes. The model must also work across Cloud providers in
order to combine their resources.

Fig 2: Architecture of Sky Computing

To achieve this, there must be a structure capable of receiving instructions, processing them,
and returning results from all the different underlying cloud systems. The upper layer, Sky
Computing, integrates the last level of Infrastructure as a Service with the next layer of
Software as a Service. It schedules and distributes resources to the tasks and requests that are
submitted. This is a critical layer, as it must be as comprehensive as possible in features and
capabilities.



2.1 CREATING A SKY COMPUTING DOMAIN
Several building blocks underpin the creation of a sky environment. While leveraging
cloud computing, we can in principle trust the configuration of remote resources, but these
resources will typically be connected via untrusted WANs. Furthermore, they won't be configured
to recognize and trust each other. So, we need to connect them to a trusted networking domain
and configure explicit trust and configuration relationships between them. In short, we must
provide an end-user environment that represents a uniform abstraction, such as a virtual cluster
or a virtual site, independent of any particular cloud provider and that can be instantiated
dynamically. We next examine the mechanisms that can accomplish this.
2.1.1 CREATING A TRUSTED NETWORK ENVIRONMENT
Network connectivity is particularly challenging for both users and providers. It's
difficult to offer APIs that reconfigure the network infrastructure to adjust to users' needs
without giving them privileged access to core network equipment, something providers wouldn't do
owing to obvious security risks. Without network APIs, establishing communication among
resources in distinct providers is difficult for users. Deploying a virtual cluster spanning
resources in different providers faces challenges in terms of network connectivity, performance,
and management:
Connectivity: Resources in independently administered clouds are subject to different
connectivity constraints due to packet filtering and network address translation; techniques to
overcome such limitations are necessary. Due to sky computing's dynamic, distributed nature,
reconfiguring core network equipment isn't practical because it requires human intervention in
each provider. Network researchers have developed many overlay networks to address the
connectivity problem involving resources in multiple sites, including NAT-aware network
libraries and APIs, virtual networks (VNs), and peer-to-peer (P2P) systems.
Performance: Overlay network processing negatively affects performance. To minimize
performance degradation, compute resources should avoid overlay network processing when it's
not necessary.



For example, requiring overlay network processing in every node (as with P2P systems)
slows down communication among nodes on the same LAN segment. In addition, overlay
network processing is CPU intensive and can take valuable compute cycles from applications. A
detailed study of overlay network processing performance is available elsewhere.
Service levels: Sky computing requires on-demand creation of mutually isolated networks
over heterogeneous resources (compute nodes and network equipment) distributed across distant
geographical locations and under different administrative domains. In terms of SLAs, this has
security as well as performance implications. Core network routers and other devices are
designed for a single administrative domain, and management coordination is very difficult in
multisite scenarios. Overlay networks must be easily deployable and agnostic with respect to
network equipment vendors.
To address these issues and provide connectivity across different providers at low
performance cost, we developed the Virtual Networks (ViNe) networking overlay [6]. ViNe offers
end-to-end connectivity among nodes on the overlay, even if they're in private networks or
guarded by firewalls. We architected ViNe to support multiple, mutually isolated VNs, which
providers can dynamically configure and manage, thus offering users a well-defined security
level. In performance terms, ViNe can offer throughputs greater than 800 Mbps with
sub-millisecond latency, and can handle most traffic crossing LAN boundaries as well as Gigabit
Ethernet traffic with low overhead.
ViNe is user-level network routing software that creates overlay networks using the
Internet infrastructure. A machine running ViNe software becomes a ViNe router (VR), working
as a gateway to overlay networks for machines connected to the same LAN segment. We
recommend delegating overlay network processing to a specific machine when deploying ViNe
so that the additional network processing doesn't steal compute cycles from compute nodes, a
scenario that can occur if all nodes become VRs. ViNe offers flexibility in deployment, as
exemplified in the following scenarios.



ViNe-enabled providers: Providers deploy a VR in each LAN segment. The ability to
dynamically and programmatically configure ViNe overlays lets providers offer APIs for virtual
networking without compromising the physical network infrastructure configuration. The cost
for a provider is one dedicated machine (which could be a VM) per LAN segment and can be a
small fraction of the network cost charged to users. IaaS providers offer VN services in this case.

Fig 3: ViNe Routing

End-user clusters: In the absence of ViNe services from providers, users can enable
ViNe as an additional VM that they start and configure to connect different cloud providers. This
user-deployed VR would handle the traffic crossing the cluster nodes' LAN boundaries. ViNe's
cost in this case is an additional VM per user.
Isolated VMs: A VR can't be used as a gateway by machines that don't belong to the
same LAN segment. In this case, every isolated VM (or a physical machine, such as the user's
client machine) must become a VR. ViNe's cost is the additional network processing that
compute nodes perform, which can take compute cycles from applications.
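
The gateway rule behind these three scenarios can be illustrated with a small sketch. This is not ViNe code: the `Node` class, the `overlay_path` function, and the machine names are hypothetical, and the sketch only assumes what the text states, namely that machines on the same LAN segment talk directly, that other traffic passes through the VR of each LAN segment, and that an isolated machine acts as its own VR.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Node:
    name: str
    lan: str                  # label of the LAN segment the machine sits on
    is_router: bool = False   # True when the machine itself runs the overlay router


def overlay_path(src: Node, dst: Node, routers: Dict[str, Node]) -> List[str]:
    """List the machines a packet visits on its way from src to dst."""
    if src.lan == dst.lan:
        # Same LAN segment: deliver directly and skip overlay processing.
        return [src.name, dst.name]
    path = [src.name]
    for node in (src, dst):
        # An isolated machine is its own router; otherwise use the LAN's router.
        gateway = node if node.is_router else routers[node.lan]
        if gateway.name != path[-1]:
            path.append(gateway.name)
    if path[-1] != dst.name:
        path.append(dst.name)
    return path


# Hypothetical deployment: one router per LAN segment plus an isolated client.
routers = {"uf": Node("vr-uf", "uf", True), "uc": Node("vr-uc", "uc", True)}
uf1, uf2, uc1 = Node("uf1", "uf"), Node("uf2", "uf"), Node("uc1", "uc")
laptop = Node("laptop", "home", is_router=True)

print(overlay_path(uf1, uf2, routers))     # direct delivery on the same LAN
print(overlay_path(uf1, uc1, routers))     # via the routers of both LANs
print(overlay_path(laptop, uc1, routers))  # isolated client routes for itself
```

The sketch makes the cost trade-off visible: only the cross-site paths touch a router, so dedicating one machine per LAN segment keeps overlay processing off the compute nodes.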



2.1.2 DYNAMIC CONFIGURATION AND TRUST
When we deploy a single appliance with a specific provider, we rely on basic security and
contextualization measures this provider has implemented to integrate the appliance into a
provider-specific networking and security context (for example, to let the appliance owner log
in). However, when we deal with a group of appliances, potentially deployed across different
providers, configuration and security relationships are more complex and require provider-
independent methods to establish a security and configuration context. In earlier work, we
describe a context broker service that dynamically establishes a security and configuration
context exchange between several distributed appliances. Orchestrating this exchange relies on
the collaboration of three parties:
IaaS providers, who provide generic contextualization methods that securely deliver to
deployed appliances the means of contacting a context broker and authenticating themselves to it
as members of a specific context.

                        University of       University of          Purdue University
                        Chicago (UC)        Florida (UF)           (PU)
Xen version             3.1.0               3.1.0                  3.0.3
Guest kernel            2.6.18-x86_64       2.6.18-i686            2.6.16-i686
Nimbus version          2.2                 2.1                    2.1
CPU architecture        AMD Opteron 248     Intel Xeon Prestonia   Intel Xeon Irwindale
CPU clock               2.2 GHz             2.4 GHz                2.8 GHz
CPU cache               1 Mbyte             512 Kbytes             2 Mbytes
Virtual CPUs per node   2                   2                      2
Memory                  3.5 Gbytes          3.5 Gbytes             1.5 Gbytes
Networking              Public              Private                Public
Table 1: Service-level agreement and instances at each cloud provider.

End users provide context information via a simple generic schema and method that's the
same for every appliance used with this provider. Adopting this simple schema lets every
provider deliver basic context information to every appliance.

Appliance providers, who provide methods that let an appliance supply information to and
receive it from a context broker and integrate information conveyed by templates describing
application-specific roles. Appliances can integrate the information using any configuration
method from any appliance provider. The information in the templates is application-specific
and potentially different from appliance to appliance, but the templates themselves are uniform,
and any context broker can process them.
Deployment orchestrators (context brokers), who provide generic methods of security
context establishment and information exchange based on information the appliance templates
provide.
A typical contextualization process works as follows. Before a user deploys appliances,
he or she registers a context object with a context broker. This object is identified by an identifier
and a secret. The IaaS provider securely conveys the identifier and secret (along with ways to
contact the context broker) on deployment. This gives the appliance a way to authenticate itself
to the context broker, which can then orchestrate security context establishment as well as
information exchange between all appliances in the context (external sources can provide
additional security and configuration information to the security broker).
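
The flow above (register a context, deliver the identifier and secret to each appliance, then authenticate and exchange information) can be sketched as a toy in-memory broker. This is an illustration only and not the Nimbus context broker: the `ContextBroker` class, its method names, and the information dictionaries are all assumptions, and a real broker would be a network service using stronger credentials.

```python
import hmac
import secrets
from typing import Dict, Tuple


class ContextBroker:
    """Toy broker that keeps every context in memory (hypothetical API)."""

    def __init__(self) -> None:
        self._secrets: Dict[str, str] = {}               # context id -> secret
        self._members: Dict[str, Dict[str, dict]] = {}   # context id -> appliance info

    def register_context(self) -> Tuple[str, str]:
        """Called by the user before deployment; returns (identifier, secret)."""
        context_id = secrets.token_hex(8)
        secret = secrets.token_hex(16)
        self._secrets[context_id] = secret
        self._members[context_id] = {}
        return context_id, secret

    def _authenticate(self, context_id: str, secret: str) -> None:
        expected = self._secrets.get(context_id)
        # Constant-time comparison avoids leaking how much of the secret matched.
        if expected is None or not hmac.compare_digest(expected, secret):
            raise PermissionError("unknown context or wrong secret")

    def join(self, context_id: str, secret: str, appliance: str, info: dict) -> None:
        """Called by an appliance at boot to publish its own information."""
        self._authenticate(context_id, secret)
        self._members[context_id][appliance] = info

    def peers(self, context_id: str, secret: str, appliance: str) -> Dict[str, dict]:
        """Return what the other appliances in the same context published."""
        self._authenticate(context_id, secret)
        members = self._members[context_id]
        return {name: info for name, info in members.items() if name != appliance}


broker = ContextBroker()
context_id, secret = broker.register_context()   # step done by the user
# The IaaS provider would hand (context_id, secret) to each appliance on deployment.
broker.join(context_id, secret, "node-1", {"ip": "10.0.0.1"})
broker.join(context_id, secret, "node-2", {"ip": "10.0.0.2"})
print(broker.peers(context_id, secret, "node-1"))
```

Because appliances only need the identifier, the secret, and a way to reach the broker, nothing in this exchange depends on which provider launched them.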
Defining this exchange in terms of such roles lets any appliance contextualize with any
provider (or across providers). For example, using the Nimbus toolkit implementation of a
context broker, we could dynamically deploy clusters of appliances on Nimbus's Science Clouds
(including multiple Science Cloud providers) as well as Amazon EC2 [7].

2.2 BUILDING METACLOUDS



Next, let's look at how we can exploit resource availability across different Science
Clouds, offering different SLAs, to construct a sky environment: a virtual cluster large enough
to support an application execution. Rather than simply selecting the provider with the largest
available resources, we select IaaS allocations from a few different providers and build a sky
environment on top of those allocations using the ViNe network overlay and the Nimbus context
exchange tools.

Fig 4: A virtual cluster interconnected with ViNe.


The Science Clouds testbed comprises multiple IaaS providers configured in
the academic space and providing different SLAs to users; Science Cloud providers grant access
to resources to scientific projects, free of charge and upon request. Apart from providing a
platform on which scientific applications explore cloud computing, the Science Clouds testbed
creates a laboratory in which different IaaS providers use compatible technologies to provide
offerings, letting us experiment with sky computing.
Our sky computing study uses resources on three sites: University of Chicago (UC),
University of Florida (UF), and Purdue University (PU). All sites use the same virtualization
implementation (Xen), and although the versions and kernels differ slightly, VM images are
portable across sites. All sites use Nimbus so that VM images are contextualization-compliant
across those sites.
Consequently, the sites are also API-compliant, but, as Table 1 shows, they offer different
SLAs. Although all sites offer an immediate lease, the provided instances (defined in terms of



CPU, memory, and so on) are different. More significantly from a usability viewpoint, the UC
and PU clouds provide public IP leases to the deployed VMs, whereas UF doesn't. To construct a
sky virtual cluster over the testbed we just described, a user with access to the Science Clouds
testbed takes the following steps:
Preparation: Obtain a Xen VM image configured to support an environment the
application requires as well as the ViNe VM image (the ViNe image is available from the
Science Clouds Marketplace). Make sure both images are contextualized (that is, capable of
providing and integrating context information). The user must upload both images to each
provider site.
Deployment: Start a ViNe VM in each site (the ViNe VMs provide virtual routers for
the network overlay). In addition, start the desired number of compute VMs at each provider site.
The contextualized images are configured to automatically (and securely) contact the context
broker to provide appropriate networking and security information and adjust network routes to
use VRs to reach nodes crossing site boundaries. The configuration exchange includes VMs on
different provider sites so that all VMs can behave as a single virtual cluster.
Usage: Upload inputs and start the desired application (typically, by simply logging into
the virtual cluster and using a command-line interface).
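
The three steps above can be written down as an ordered list of actions. The sketch below only enumerates the actions; it does not call any Nimbus or ViNe interface. The function name `deployment_plan`, the action tuples, and the image labels `app-image` and `vine-image` are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Tuple

Action = Tuple[str, str, str]   # (verb, site, target), all labels hypothetical


def deployment_plan(compute_vms_per_site: Dict[str, int]) -> List[Action]:
    """Enumerate the preparation, deployment, and usage actions in order."""
    plan: List[Action] = []
    # Preparation: both contextualized images go to every provider site.
    for site in compute_vms_per_site:
        plan.append(("upload", site, "app-image"))
        plan.append(("upload", site, "vine-image"))
    # Deployment: one virtual router per site, then the compute VMs.
    for site, count in compute_vms_per_site.items():
        plan.append(("start", site, "vine-router"))
        for index in range(count):
            plan.append(("start", site, f"compute-{index}"))
    # Usage: the application is started once on the resulting virtual cluster.
    plan.append(("run", "virtual-cluster", "application"))
    return plan


for action in deployment_plan({"UC": 2, "UF": 2, "PU": 2}):
    print(action)
```

Listing the actions this way shows that the per-site overhead is fixed (two uploads and one router VM), while the number of compute VMs is the only part that grows with the cluster size.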
To experiment with the scalability of virtual clusters deployed in different settings, we
configured two clusters: a Hadoop cluster, using the Hadoop MapReduce framework, version
0.16.2, and a message passing interface (MPI) cluster using MPICH2 version 1.0.7. We used each
virtual cluster to run parallel versions of the Basic Local Alignment Search Tool (Blast), a
popular bioinformatics application that searches for, aligns, and ranks nucleotide or protein
sequences that are similar to those in an existing database of known sequences. We configured
the Hadoop cluster with Blast version 2.2.18 and the MPI cluster with the publicly available
mpiBlast version 1.5.0beta1.
Both versions have master-slave structures with low communication-to-computation
ratios. The master coordinates sequence distribution among workers, monitoring their health and
combining the output. The runs used in the evaluation consisted of executing Blast on 960
sequences averaging 1,116.82 nucleotides per sequence against a 2007 non-redundant (NR)
protein sequence database from the US National Center for Biotechnology Information (NCBI)
in 1 fragment (3.5 Gbytes of total data).
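
A minimal sketch of the master's role, assuming a simple round-robin split, is given below. The text does not say how Hadoop or mpiBlast actually assign sequences, so the `distribute` and `run_master` functions and the round-robin policy are illustrative assumptions, and `search` stands in for the real Blast call.

```python
from typing import Callable, List, Sequence


def distribute(sequences: Sequence[str], n_workers: int) -> List[List[int]]:
    """Round-robin split: worker w gets the indices w, w + n_workers, and so on."""
    return [list(range(w, len(sequences), n_workers)) for w in range(n_workers)]


def run_master(sequences: Sequence[str], n_workers: int,
               search: Callable[[str], object]) -> List[object]:
    """Hand each worker its share and combine the outputs in input order."""
    results: List[object] = [None] * len(sequences)
    for indices in distribute(sequences, n_workers):
        # In a real cluster each worker would process its share in parallel.
        for i in indices:
            results[i] = search(sequences[i])
    return results


shares = distribute(["seq"] * 960, 30)
print(len(shares), len(shares[0]))            # 30 workers with 32 sequences each
print(run_master(["ACGT", "AC", "A"], 2, len))
```

Since each sequence is searched independently, the only communication is handing out work and collecting results, which is why the communication-to-computation ratio stays low.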

                            University of Chicago     University of Florida    Purdue University
Sequential Execution Time   36 hours and 20 minutes   43 hours and 6 minutes   34 hours and 49 minutes
Normalization Factor        1.184                     1                        1.24

Table-2: Normalized single processor performance at each site

We deployed the two virtual clusters in two settings: on the UF cloud only (one-site experiment)
and on all three sites using the same number of processors (three-site experiment). For three-site
experiments, we balanced the number of hosts in each site executing Blast, that is, one host in
each site, two hosts in each site, and so on, up to five hosts in each site. (Choosing random
numbers of nodes from different sites would, in effect, weigh the three-site experiment's
performance toward comparing the UF site and the site with the most processors.) The SLAs
expressed as instances from each metacloud provider are different (PU instances outperform UC
instances, which outperform UF instances), which makes it difficult to compare providers. To
establish a comparison base between the SLAs each provider offers, we used the performance of
the sequential execution on a UF processor of the Blast job described earlier to define a
normalized performance benchmark: 1 UC processor is equivalent to 1.184 UF processors,
whereas 1 PU processor is equivalent to 1.24 UF processors. For example, an experiment with 10
UF processors, 10 UC processors, and 10 PU processors should provide the performance of a
cluster with 34.24 UF processors. We used these factors to normalize the number of processors.
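
The normalization arithmetic can be checked with a few lines of Python. The factors are the ones stated in the text; the dictionary and function names are illustrative only.

```python
from typing import Dict

# Normalization factors from the text: UF processors per processor at each site.
FACTORS: Dict[str, float] = {"UF": 1.0, "UC": 1.184, "PU": 1.24}


def uf_equivalent(processors: Dict[str, int]) -> float:
    """Convert a mix of processors into the equivalent number of UF processors."""
    return sum(FACTORS[site] * count for site, count in processors.items())


# 10 * 1.0 + 10 * 1.184 + 10 * 1.24 = 34.24 UF-equivalent processors
print(round(uf_equivalent({"UF": 10, "UC": 10, "PU": 10}), 2))
```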

Figure 4 shows the speedup of Blast execution on various numbers of testbed processors in
different deployment settings versus the execution on one processor at UF. A sequential
execution on one UF processor that took 43 hours and 6 minutes was reduced to 1 hour
and 42 minutes using Hadoop on 15 instances (30 processors) of the UF cloud, a 25.4-fold
speedup. It was reduced to 1 hour and 29 minutes using Hadoop on five instances in each of the
three sites (30 processors), a 29-fold speedup. Overall, the performance difference between a
virtual cluster deployed in a single cloud provider and a virtual cluster deployed in three distinct
cloud providers interconnected across a WAN through a VN is minimal for Blast executed with
either Hadoop or MPI. Also, comparison with ideal performance (assuming perfect
parallelization, that is, where an N-CPU cluster would provide an N-fold speedup relative to
sequential execution) shows that the application parallelizes well.
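
The reported speedups follow directly from the execution times. The sketch below redoes that arithmetic; the helper names are illustrative, and the efficiency figure for the three-site run is a derived illustration that assumes the normalized count of 34.24 UF-equivalent processors from the earlier example.

```python
def minutes(hours: int, mins: int) -> int:
    """Convert a duration given as hours and minutes into minutes."""
    return 60 * hours + mins


def speedup(sequential_min: float, parallel_min: float) -> float:
    """Speedup is sequential time divided by parallel time."""
    return sequential_min / parallel_min


sequential = minutes(43, 6)                         # one UF processor
one_site = speedup(sequential, minutes(1, 42))      # 30 UF processors
three_site = speedup(sequential, minutes(1, 29))    # 10 processors at each site

print(round(one_site, 1))     # about 25.4, matching the text
print(round(three_site))      # about 29, matching the text

# Efficiency relative to ideal N-fold speedup (1.0 would be perfect scaling).
print(one_site / 30)          # one-site run on 30 UF processors
print(three_site / 34.24)     # three-site run, normalized processor count
```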
In the data presented, we refer only to the VMs used to create the application platform
and not to those additional ones used to run VRs. Running those routers (one per site) constitutes
an additional cost in resource usage. This cost is relatively small and depends on network traffic,
as detailed elsewhere. We can further amortize this cost by sharing the router with other cloud
users (the provider could offer it as another service) or running it in one of the compute nodes.
Our experiments aimed to study the feasibility of executing a parallel application across
multiple cloud providers. In this context, our two main objectives were to demonstrate that end
users can deploy a sky computing environment with full control, and that the environment
performs well enough to execute a real-world application. We've successfully combined open
source and readily available cloud (Nimbus toolkit) and VN (ViNe) technologies to let users
launch virtual clusters with nodes that are automatically configured and connected through
overlays. The observed impact of network virtualization overheads was low, and we could
sustain the performance of a single-site cluster using a cluster across three sites. This illustrates
sky computing's potential in that even when the necessary resources are unavailable in a single
cloud, we can use multiple clouds to get the required computation power.

CHAPTER-3
GRID AND CLOUD COMPUTING



Evolution of Distributed computing: Scalable computing over the Internet - Technologies
for network based systems - clusters of cooperative computers - Grid computing
Infrastructures - cloud computing - service oriented architecture - Introduction to Grid
Architecture and standards - Elements of Grid - Overview of Grid Architecture.
3.1 DISTRIBUTED COMPUTING
A distributed computing system consists of multiple autonomous computers that
communicate through a computer network. Distributed computing utilizes a network of many
computers, each accomplishing a portion of an overall task, to achieve a computational result
much more quickly than with a single computer. Distributed computing is any computing that
involves multiple computers, remote from each other, that each have a role in a computation
problem or information processing.

Fig 5: Distribution of Computing (diagram labels: Agent, Cooperation, Distribution,
Subscription, Internet, Job Request, Resource Management, Large-scale Application)

A distributed system is one in which hardware or software components located at
networked computers communicate and coordinate their actions only by message passing.
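
Coordination by message passing alone can be imitated on one machine. In the sketch below, threads stand in for networked computers and queues stand in for the network; this substitution, along with the `worker` and `distributed_sum` names, is an assumption made so the example is self-contained. The workers share no variables and learn about their task only from the messages they receive.

```python
import queue
import threading
from typing import List


def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Process messages until a None message signals shutdown."""
    while True:
        message = inbox.get()
        if message is None:
            break
        task_id, numbers = message
        outbox.put((task_id, sum(numbers)))   # reply with a partial result


def distributed_sum(data: List[int], n_workers: int = 3) -> int:
    """Split a sum into portions, send one to each worker, combine the replies."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    outbox: queue.Queue = queue.Queue()
    threads = [threading.Thread(target=worker, args=(inbox, outbox))
               for inbox in inboxes]
    for thread in threads:
        thread.start()
    for task_id, inbox in enumerate(inboxes):
        inbox.put((task_id, data[task_id::n_workers]))   # this worker's portion
        inbox.put(None)                                  # then ask it to stop
    partials = [outbox.get() for _ in range(n_workers)]
    for thread in threads:
        thread.join()
    return sum(value for _, value in partials)


print(distributed_sum(list(range(101))))   # sum of 0..100
```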

In the term distributed computing, the word distributed means spread out across space.
Thus, distributed computing is an activity performed on a spatially distributed system. These
networked computers may be in the same room, same campus, same country, or in different
continents.

Characteristics:

Resource Sharing

Openness

Concurrency

Scalability

Fault Tolerance

Transparency

Architecture:

Client-server

3-tier architecture

N-tier architecture

loose coupling, or tight coupling

Peer-to-peer

Space based

3.1.1 APPLICATIONS OF DISTRIBUTED COMPUTING



Database Management System
Distributed computing using mobile agents
Local intranet
Internet (World Wide Web)
JAVA Remote Method Invocation (RMI)

Distributed computing using mobile agents:

Fig 6: Distributed Computing Using Mobile Programs

Mobile agents can wander around a network, using free resources for their own
computations.

Local intranet



A portion of the Internet that is separately administered and supports internal sharing of
resources (file/storage systems and printers) is called a local intranet.

Fig 7: Local Intranet

Internet (World Wide Web)



The Internet is a global system of interconnected computer networks that use the
standardized Internet Protocol Suite.

Fig 8: Messages over the Internet

JAVA Remote Method Invocation (RMI)


Embedded in the Java language:

Object variant of remote procedure call
Adds naming compared with RPC (Remote Procedure Call)
Restricted to Java environments

Fig 9: Java RMI

3.2 GRID COMPUTING



Grid computing is a form of distributed computing whereby a "super and virtual
computer" is composed of a cluster of networked, loosely coupled computers, acting in
concert to perform very large tasks. Grid computing (Foster and Kesselman, 1999) is a
growing technology that facilitates the execution of large-scale, resource-intensive
applications on geographically distributed computing resources. It facilitates flexible,
secure, coordinated large-scale resource sharing among dynamic collections of
individuals, institutions, and resources, and enables communities (virtual organizations)
to share geographically distributed resources as they pursue common goals.

Criteria for a grid

Coordinates resources that are not subject to centralized control.
Uses standard, open, general-purpose protocols and interfaces.
Delivers nontrivial qualities of service.

Benefits

Exploits underutilized resources
Resource load balancing
Virtualizes resources across an enterprise
Data Grids, Compute Grids
Enables collaboration for virtual organizations

Grid Applications

Data and computationally intensive applications

This technology has been applied to computationally-intensive scientific,
mathematical, and academic problems like drug discovery, economic forecasting, and seismic
analysis, and to back office data processing in support of e-commerce.

A chemist may utilize hundreds of processors to screen thousands of compounds per hour.

Teams of engineers worldwide pool resources to analyze terabytes of structural data.

Meteorologists seek to visualize and analyze petabytes of climate data
with enormous computational demands.

Resource sharing

Computers, storage, sensors, networks,

Sharing always conditional: issues of trust, policy, negotiation, payment,

Coordinated problem solving

distributed data analysis, computation, collaboration,

Grid Topologies

Intragrid

Local grid within an organization
Trust based on personal contracts

Extragrid

Resources of a consortium of organizations connected through a (Virtual) Private Network
Trust based on Business to Business contracts

Intergrid

Global sharing of resources through the internet
Trust based on certification

COMPUTATIONAL GRID

A computational grid is a hardware and software infrastructure that provides
dependable, consistent, pervasive, and inexpensive access to high-end computational
capabilities.

The Grid: Blueprint for a New Computing Infrastructure, Kesselman & Foster



Example: Science Grid (US Department of Energy).

DATA GRID

A data grid is a grid computing system that deals with data: the controlled sharing and
management of large amounts of distributed data.
Data Grid is the storage component of a grid environment. Scientific and engineering
applications require access to large amounts of data, and often this data is widely
distributed. A data grid provides seamless access to the local or remote data required to
complete compute intensive calculations.

Example:

Biomedical Informatics Research Network (BIRN)

Southern California Earthquake Center (SCEC)

3.2.1 METHODS OF GRID COMPUTING

Distributed Supercomputing
High-Throughput Computing
On-Demand Computing
Data-Intensive Computing
Collaborative Computing
Logistical Networking

Distributed Supercomputing

Combines multiple high-capacity resources on a computational grid into a single, virtual
distributed supercomputer.
Tackles problems that cannot be solved on a single system.

High-Throughput Computing

Uses the grid to schedule large numbers of loosely coupled or independent tasks, with the
goal of putting unused processor cycles to work.
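
As a rough illustration of the high-throughput pattern, the sketch below dispatches a batch of independent tasks to a pool of workers. It is only an analogy: a local thread pool stands in for grid nodes, and screen_compound, its scoring formula, and run_batch are hypothetical names invented for this example, not part of any grid middleware.

```python
from concurrent.futures import ThreadPoolExecutor


def screen_compound(compound_id):
    """Hypothetical stand-in for one independent, loosely coupled task."""
    score = (compound_id * 37) % 101  # made-up scoring formula
    return compound_id, score


def run_batch(compound_ids, workers=4):
    """Run every task on whichever worker is free and collect the results."""
    # The tasks share no state, so they can run in any order on any worker.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(screen_compound, compound_ids))


if __name__ == "__main__":
    results = run_batch(range(10))
    best = max(results, key=results.get)
    print("best compound:", best, "score:", results[best])
```

Because no task depends on another, adding workers raises throughput without changing the program, which is the property a grid scheduler exploits when it fills idle processor cycles.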

On-Demand Computing

Uses grid capabilities to meet short-term requirements for resources that are not locally
accessible.
Models real-time computing demands.

Collaborative Computing

Concerned primarily with enabling and enhancing human-to-human interactions.

Applications are often structured in terms of a virtual shared space.

Data-Intensive Computing

The focus is on synthesizing new information from data that is maintained in geographically
distributed repositories, digital libraries, and databases.
Particularly useful for distributed data mining.

Logistical Networking

Logistical networks focus on exposing storage resources inside networks by optimizing the
global scheduling of data transport and data storage.

Contrasts with traditional networking, which does not explicitly model storage resources
in the network.

3.3 GRID ARCHITECTURE

The Hourglass Model

Focuses on architecture issues

Proposes a set of core services as basic infrastructure

Core services are used to construct diverse high-level, domain-specific solutions

Design principles

Keep participation cost low

Enable local control

Support for adaptation

IP hourglass model

Grid computing is a growing technology that facilitates the execution of large-scale,
resource-intensive applications on geographically distributed computing resources. It
facilitates flexible, secure, coordinated, large-scale resource sharing among dynamic
collections of individuals, institutions, and resources, and it enables communities (virtual
organizations) to share geographically distributed resources as they pursue common goals.

3.3.1 LAYERED GRID ARCHITECTURE

Collective: coordinating multiple resources (ubiquitous infrastructure services and
application-specific distributed services)

Fig 10: Internet Protocol Architecture

Resource: sharing single resources (negotiating access, controlling use)

Connect: talking to things (communication through Internet protocols, and security)

Fabric: controlling things locally (access to, and control of, resources)

3.3.2 DATA GRID ARCHITECTURE

App: Discipline-Specific Data Grid Application

Collective (app): Coherency control, replica selection, task management, virtual data
catalog, virtual data code catalog

Collective (generic): Replica catalog, replica management, co-allocation, certificate
authorities, metadata catalogs

Resource: Access to data, access to computers, access to network performance data

Connect: Communication, service discovery, authentication, authorization, delegation

Fabric: Storage systems, clusters, networks, network caches

3.4 SIMULATION TOOLS

GridSim: job scheduling

SimGrid: single-client, multi-server scheduling

Bricks: scheduling

GangSim: Ganglia VO

OptorSim: data grid simulations

G3S (Grid Security Services Simulator): security services

GridSim is a Java-based toolkit for modeling and simulation of distributed resource
management and scheduling for conventional Grid environments.

GridSim is based on SimJava, a general purpose discrete-event simulation package
implemented in Java.

All components in GridSim communicate with each other through message passing
operations defined by SimJava.
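
The idea of entities that interact only by passing timestamped messages through a discrete-event kernel can be sketched in a few lines. The code below is not GridSim or SimJava code and does not use their APIs; Simulator, make_resource, and the message tuples are hypothetical names chosen for illustration, and the resource is assumed to serve every job without contention.

```python
import heapq


class Simulator:
    """Minimal discrete-event kernel: a clock plus a time-ordered message queue."""

    def __init__(self):
        self.now = 0.0
        self._queue = []   # entries are (delivery_time, sequence, destination, message)
        self._seq = 0      # tie-breaker so simultaneous messages keep send order
        self.entities = {}

    def register(self, name, handler):
        self.entities[name] = handler

    def send(self, delay, dest, message):
        heapq.heappush(self._queue, (self.now + delay, self._seq, dest, message))
        self._seq += 1

    def run(self):
        log = []
        while self._queue:
            # Jump the clock straight to the next scheduled message.
            self.now, _, dest, message = heapq.heappop(self._queue)
            log.append((self.now, dest, message))
            self.entities[dest](self, message)
        return log


def make_resource(mips):
    """Resource entity: answers each job after length / MIPS time units."""
    def handle(sim, message):
        kind, length_mi = message
        if kind == "job":
            sim.send(length_mi / mips, "user", ("done", length_mi))
    return handle


def user(sim, message):
    """User entity: only receives completion notices in this sketch."""


def demo():
    sim = Simulator()
    sim.register("resource", make_resource(mips=100))
    sim.register("user", user)
    sim.send(1.0, "resource", ("job", 500))  # 500 MI job arriving at t = 1
    sim.send(2.0, "resource", ("job", 200))  # 200 MI job arriving at t = 2
    return sim.run()


if __name__ == "__main__":
    for time, dest, message in demo():
        print(time, dest, message)
```

The simulated clock advances from one message to the next instead of ticking in real time, which is what makes simulating long schedules cheap.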

3.4.1 SALIENT FEATURES OF GRIDSIM

It allows modeling of heterogeneous types of resources.

Resources can be modeled operating under space- or time-shared mode.

Resource capability can be defined in the form of a MIPS (Million Instructions Per
Second) benchmark rating.

Resources can be located in any time zone.

Weekends and holidays can be mapped depending on the resource's local time to model
non-Grid (local) workload.

Resources can be booked for advance reservation.

Applications with different parallel application models can be simulated.

Application tasks can be heterogeneous and they can be CPU or I/O intensive.

There is no limit on the number of application jobs that can be submitted to a resource.

Multiple user entities can submit tasks for execution simultaneously in the same resource,
which may be time-shared or space-shared. This feature helps in building schedulers that
can use different market-driven economic models for selecting services competitively.

Network speed between resources can be specified.
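
The MIPS rating and the space-shared and time-shared modes listed above can be made concrete with a small calculation. This is a simplified sketch, not GridSim code: it assumes a single processing element, job lengths given in millions of instructions (MI), and all jobs arriving at time zero. The function names are hypothetical.

```python
def space_shared(lengths_mi, mips):
    """Space-shared: jobs run one at a time, in order, each at the full MIPS rate."""
    finish, clock = [], 0.0
    for length in lengths_mi:
        clock += length / mips
        finish.append(clock)
    return finish


def time_shared(lengths_mi, mips):
    """Time-shared: all unfinished jobs split the MIPS rate equally."""
    remaining = {i: float(length) for i, length in enumerate(lengths_mi)}
    finish = [0.0] * len(lengths_mi)
    clock = 0.0
    while remaining:
        active = len(remaining)
        shortest = min(remaining.values())
        # Each job gets mips / active, so the shortest one needs this long to finish.
        clock += shortest * active / mips
        for i in list(remaining):
            remaining[i] -= shortest
            if remaining[i] <= 1e-9:
                finish[i] = clock
                del remaining[i]
    return finish


if __name__ == "__main__":
    jobs = [100, 200, 300]  # job lengths in MI
    print("space-shared:", space_shared(jobs, mips=100))
    print("time-shared: ", time_shared(jobs, mips=100))
```

For three jobs of 100, 200, and 300 MI on a 100 MIPS resource, the space-shared finish times are 1, 3, and 6 time units, and the time-shared finish times are 3, 5, and 6. The total work is the same in both modes, but short jobs finish later when the processor is shared.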

CHAPTER-4

SERVICE ORIENTED ARCHITECTURE

A method of design, deployment, and management of both applications and the software
infrastructure where:

All software is organized into business services that are network accessible and
executable.

Service interfaces are based on public standards for interoperability.

4.1 CHARACTERISTICS OF SOA

Quality of service, security and performance are specified.

Software infrastructure is responsible for managing.

Services are cataloged and discoverable.

Data are cataloged and discoverable.

Protocols use only industry standards.

4.1.1 WHAT IS SERVICE

A Service is a reusable component.

A Service changes business data from one state to another.

A Service is the only way data is accessed.

If you can describe a component in WSDL, it is a Service.

4.1.2 WHY GETTING SOA WILL BE DIFFICULT

Managing for Projects

Software: 1-4 years

Hardware: 3-5 years

Communications: 1-3 years

Project Managers: 2-4 years

Reliable funding: 1-4 years

User turnover: 30%/year

Security risks: 1 minute or less

Managing for SOA

Data: forever.

Infrastructure: 10+ years.

4.1.3 WHY MANAGING BUSINESS SYSTEMS IS DIFFICULT

40 Million lines of code in Windows XP is unknowable.

Testing an application (3 million lines) requires >10^15 tests.

Probability of correct data entry for a supply item is <65%.

There are >100 formats that identify a person in DoD.

Output / Office Worker: >30 e-messages /day.

4.1.4 HOW TO VIEW ORGANISING FOR SOA

Fig 11: Organizing for SOA

4.1.5 SOA MUST REFLECT TIMING

Fig 12: Timing of SOA

4.1.6 SOA MUST REFLECT CONFLICTING INTERESTS

Fig 13: Conflicting Interests of SOA

4.2 ORGANIZATION OF SERVICES

1) Infrastructure Services

2) Data Services

3) Security Services

4) Computing Services

5) Communication Services

6) Application Services

4.2.1 ORGANIZATION OF INFRASTRUCTURE SERVICES

Fig 14: Infrastructure Services

4.2.2 ORGANIZATION OF DATA SERVICES

Fig 15: Data Services

Data Interoperability Policies

Data are an enterprise resource.

Single-point entry of unique data.

Enterprise certification of all data definitions.

Data stewardship defines data custodians.

Zero defects at point of entry.

Deconflict data at source, not at higher levels.

Data aggregations come from source data, not from reports.

Data Concepts

Data Element Definition

Text associated with a unique data element within a data dictionary that describes
the data element, gives it a specific meaning, and differentiates it from other data
elements. The definition is precise, concise, non-circular, and unambiguous.

Data Element Registry

A label kept by a registration authority that describes a unique meaning and
representation of data elements, including registration identifiers, definitions,
names, value domains, syntax, ontology and metadata attributes.
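
The two concepts above can be pictured as a small data structure. The sketch below is illustrative only and is not taken from any metadata-registry standard: the class names, the fields chosen (identifier, name, definition, value domain), and the sample element are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    identifier: str       # registration identifier, unique in the registry
    name: str             # unique name of the data element
    definition: str       # precise, concise, non-circular definition text
    value_domain: tuple   # the values the element is allowed to take


class DataElementRegistry:
    """Hypothetical registry that keeps one entry per identifier and per name."""

    def __init__(self):
        self._by_id = {}
        self._names = set()

    def register(self, element):
        # Reject duplicates so every element keeps a single, unique meaning.
        if element.identifier in self._by_id:
            raise ValueError("identifier already registered: " + element.identifier)
        if element.name in self._names:
            raise ValueError("name already registered: " + element.name)
        self._by_id[element.identifier] = element
        self._names.add(element.name)

    def validate(self, identifier, value):
        """Return True if the value belongs to the element's value domain."""
        return value in self._by_id[identifier].value_domain


if __name__ == "__main__":
    registry = DataElementRegistry()
    registry.register(DataElement(
        "DE-001", "unit_of_issue",
        "Unit in which a supply item is issued.",
        ("EA", "BX", "KG"),
    ))
    print(registry.validate("DE-001", "EA"))
    print(registry.validate("DE-001", "LB"))
```

Checking values against the registered value domain at the moment of entry is one way to approach the "zero defects at point of entry" policy listed earlier.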

Data and Services Deployment Principles

Data, services and applications belong to the Enterprise.

Information is a strategic asset.

Data and applications cannot be coupled to each other.

Interfaces must be independent of implementation.

Data must be visible outside of the applications.

Semantics and syntax are defined by a community of interest.

Data must be understandable and trusted.

4.2.3 ORGANIZATION OF SECURITY SERVICES

Fig 16: Security Services

Security Services

Conduct Attack/Event Response

Ensure timely detection and appropriate response to attacks.

Manage measures required to minimize the network's vulnerability.

Secure Information Exchanges

Secure information exchanges that occur on the network with a level of protection
that is matched to the risk of compromise.

Provide Authorization and Non-Repudiation Services

Identify and confirm a user's authorization to access the network.

4.2.4 ORGANIZATION OF COMPUTING SERVICES

Fig 17: Computing Services

Computing Services

Provide Adaptable Hosting Environments

Global facilities for hosting to the edge.

Virtual environments for data centers.

Distributed Computing Infrastructure

Data storage and shared spaces for information sharing.

Shared Computing Infrastructure Resources

Access shared resources regardless of access device.

4.2.5 ORGANIZATION OF COMMUNICATION SERVICES

Fig 18: Communication Services

Communication Services

Provide Information Transport

Transport information, data and services anywhere.

Ensure transport between end-user devices and servers.

Expand the infrastructure for on-demand capacity.

4.2.6 ORGANIZATION OF APPLICATION SERVICES

Fig 19: Application Services

Application Services and Tools

Provide Common End User Interface Tools

Application generators, test suites, error identification, application components
and standard utilities.

Common end-user Interface Tools

E-mail, collaboration tools, information dashboards, Intranet portals, etc.

4.3 SOA PROTOCOLS

Universal Description, Discovery, and Integration (UDDI) defines the publication and
discovery of web service implementations.

The Web Services Description Language (WSDL) is an XML-based language that describes
Web Services.

SOAP (originally the Simple Object Access Protocol) is a key SOA messaging protocol in
which a network node (the client) sends a request to another node (the server).

The Lightweight Directory Access Protocol (LDAP) is a protocol for querying and
modifying directory services.

Extract, Transform, and Load (ETL) is a process of moving data from a legacy system
and loading it into a SOA application.
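
The three ETL stages named above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions: the legacy system is represented by a small CSV export, the target by an in-memory SQLite table, and LEGACY_EXPORT, the users table, and the cleaning rules are all invented for the example.

```python
import csv
import io
import sqlite3

# Hypothetical export from a legacy system, with inconsistent name formatting.
LEGACY_EXPORT = """id,name,joined
1, alice ,2014-01-05
2,BOB,2015-03-17
"""


def extract(text):
    """Extract: read the raw rows out of the legacy export."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert types and normalize names to a single format."""
    return [(int(r["id"]), r["name"].strip().title(), r["joined"]) for r in rows]


def load(conn, records):
    """Load: write the cleaned records into the target store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT, joined TEXT)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(text, conn):
    load(conn, transform(extract(text)))


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    run_etl(LEGACY_EXPORT, connection)
    print(connection.execute("SELECT id, name, joined FROM users ORDER BY id").fetchall())
```

Keeping the three stages as separate functions means the cleaning rules can be changed or tested without touching either the legacy source or the target store.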

CHAPTER-5

CONCLUSION
Clouds provide the components for novel types of IT systems or novel implementations
of familiar IT system architectures. Sky computing refers to such systems and their use, in
particular to combined clouds capable of providing environments, workflows, enterprise IT, etc.
as a service.
The design and management of combined clouds face challenges and need fundamental and
system-oriented advances. This is a new area for IT research, essential for standards and for
the next generation of IT business.
Sky computing can be used to create large-scale distributed infrastructures. Our approach
relies on Nimbus for resource management, contextualization, and fast cluster instantiation;
ViNe for all-to-all connectivity; and Hadoop for dynamic cluster extension. It provides both
infrastructure and application elasticity.
Through the communication platform, students can communicate with their teacher at
any convenient time, and vice versa, at minimal cost. This helps teachers understand how
teaching is progressing and the students' level of knowledge of the course. The teacher can
also answer questions or send messages to students freely through this communication platform.
In practice, these technical means narrow the gap between students and teachers and produce
satisfactory results.
Sky computing is an emerging computing model where resources from multiple cloud
providers are leveraged to create large-scale distributed infrastructures. These infrastructures
provide resources to execute computations requiring large computational power, such as
scientific software. Establishing a sky computing system is challenging due to differences among
providers in terms of hardware, resource management, and connectivity. Furthermore, scalability,
balanced distribution of computation, and measures to recover from faults are essential for
applications to achieve good performance. This work shows how resources across two
experimental projects, the FutureGrid experimental testbed in the United States and Grid'5000,
an infrastructure for large-scale parallel and distributed computing research composed of 9
sites in France, can be combined and used to support large-scale, distributed experiments.
Several open-source technologies are integrated to address these challenges.
