Seminar paper
Grid und Web 2.0 (703524)
Course instructor: T. Fahringer
Name: Thomas Zangerl
Seminar “Grid und Web 2.0“ Winter term 2007/08
Abstract
The aim of this document is to compare Web 2.0 mashups, which have become very popular
in recent years, with current Grid workflow technologies. To that end, the document surveys
the basic working principles of the underlying technologies, accompanied by reflections on
issues and limitations of the approaches and some practical examples. Finally, mashups and
Grid workflows are compared, and differences as well as similarities and common problems
are highlighted.
1. Introduction
The rapid spread of Web 2.0 technologies (like AJAX) in recent years has given rise to
many interesting services which provide intuitive and responsive applications of a kind that
had never been seen on the web before. The use, or rather simulation, of asynchronous
communication over HTTP using AJAX ([3]) allowed for fast and impressive chat, e-mail or
map applications on the web that felt like desktop programs without the need for cumbersome
browser-plugin installations.
Those web applications have become increasingly popular, and many of them also provide
APIs which people have used either to display the various Web 2.0 services on their
own websites or to combine different Web 2.0 services on a single portal. One of the pioneers
in offering an intuitive and popular API (and, to this day, one of the most successful AJAX
applications) has been Google Maps (http://maps.google.com/), whose popularity increased
further once the service was embedded in many private and commercial websites.
Hence, other web application providers, among them major players like Yahoo or Amazon,
have followed Google's example and published APIs to their Web 2.0 services.
This process has resulted in a large set of easy-to-use APIs which allow individuals to
integrate third-party services into their own web presence in a relatively straightforward way.
A huge wave of recombinations („remixes“) of different Web 2.0 services on user-made
sites has followed. These have been called mashups, by analogy with remixes of music and
video clips.
Grid workflows, on the other hand, also describe a novel approach to using existing resources
(practical and working Grid applications exist, just as practical and working Web 1.0
applications existed before the Web 2.0 boom and to some degree still exist today), but they
receive significantly less attention than the nearly universally popular Web 2.0 applications.
An important reason for that might be that the Grid itself is not nearly as widespread as the
web in terms of general usage, although some distributed programs like seti@home
(http://setiathome.berkeley.edu/) or folding@home (http://folding.stanford.edu/) have gained
considerable popularity. However, they have never entirely lost their reputation as
„toy applications“.
The authors of [6] describe several historical phases which the development of Grid
technology has undergone. They mention a „Pre Web Phase“, a „Pre Grid Phase“, an „Early
Grid Phase“ and a „Grid Standards Phase“, in which important Grid middleware abstractions
like Condor and the Java CoG Kit emerged. Then follows the „Web Service Phase“, in which
the Grid became integrated with web service technologies, and the current „Web Upperware
Phase“, in which abstractions should be taken further and workflows, web services and Grid
technology should join forces.
Given that the important Grid middleware Globus Toolkit ([7]) is based on web services in its
recent version 4 and that there have been major efforts to extend workflow description
languages like BPEL towards web service orchestration (e.g. WS-BPEL [8]), using workflows
on the Grid with these extensions seems quite natural.
Indeed, a lot of different approaches towards workflow support implementation have been
taken, leading either to performance and capacity utilization improvements or to better
usability of Grid resources by the end user.
However, these Grid workflow systems still face some Grid-specific problems, which are
described in more detail, along with some examples from current research, in the second part
of this document. First, an introduction to Web 2.0 mashups is given in order to establish a
thorough basis for comparing the two technologies.
Since the XMLHttpRequest has proved very useful for reloading content in many
scenarios, other browsers have implemented similar objects. In the meantime, due to the
practical importance of this object, the W3C is working towards standardizing the
XMLHttpRequest ([9]).
JavaScript serves as the connecting language for the use of such objects and other functions, as
well as the basis for any event-driven page display.
Furthermore, it allows easy integration of third-party APIs by just including the JavaScript
source and afterwards calling the functions defined in it. For example, in order to include a
Google Maps „mapplet“ centered on Innsbruck, Austria on a webpage, the following
minimalist JavaScript code would suffice (see [10]):
<script type="text/javascript"
src="http://www.google.com/jsapi?key=pageAPIKey"></script>
<script type="text/javascript">
google.load("maps", "2");
</script>
Listing 1: Sample JavaScript code for embedding Google Maps on a user page
While JavaScript gives the developer the means to write event-driven code and dynamic
content updates, the Document Object Model (DOM, [11]) is the model implemented
by the different browsers that handles those events and dynamic content updates.
The DOM provides access to the page structure as a tree model not unlike that of XML. That
model allows, among other things, changing nodes in the object tree (e.g.
<div> elements on an (X)HTML page), handling events, changing formatting ad hoc by
modifying CSS properties, and serialising (parts of) documents to XML.
The DOM itself is just a page model built by the browser upon parsing and is intended to
be accessible from many programming languages; in the context of Web 2.0
applications, however, it is almost always used in combination with JavaScript.
Since its standardization by the W3C, the DOM has replaced the multitude of proprietary
JavaScript models implemented by different browsers and now serves as a uniform
programming interface for dynamic webpages (for example, DOM was introduced as a
DHTML substitute in Microsoft's Internet Explorer with version 5 [12]).
XHTML and CSS provide the means for graphical formatting and hence the UI view of the
AJAX programming model.
For example, it would be an easy task to get the geo-location of a photo from the photo service
Flickr ([13]):
flickr.photos.geo.getLocation(apiKey, photoID);
Listing 2: API method signature for getting the geo-location of a photo from Flickr
Such a query would return something like (see [13]):
<photo id="123">
<location latitude="-17.685895" longitude="-63.36914" accuracy="6" />
</photo>
Now we have structured information about longitude and latitude, which can easily be parsed
using any XML parser. This way, latitude and longitude could be obtained and used as input
to a Google Maps JavaScript function.
There is one small drawback: no JavaScript implementation of the Flickr API exists (there
are, however, some user-built workarounds). This problem can be solved either by using a
server-side scripting language such as PHP and passing the latitude and longitude information
to the page containing the Google mapplet as a request attribute, or by performing the call
using JSON (JavaScript Object Notation, [14]).
In order to obtain a JSON answer from services like Flickr, it suffices to send a standard
REST GET request to a special URL on the domain of the Web 2.0 service with some
agreed-upon request parameters. For Flickr, it suffices to add a „format=json“ request
parameter at the end of the URL:
http://api.flickr.com/services/feeds/photos_public.gne?tags=tagToQueryFor&lang=de-at&format=json
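As a minimal sketch, the request URL above can also be assembled programmatically, which avoids encoding mistakes in the parameters. The helper name `buildFeedUrl` is an assumption made for this illustration, and `tagToQueryFor` is the placeholder tag from the text rather than a real tag.

```javascript
// Build the feed URL from the text with the standard URL API.
// buildFeedUrl is a hypothetical helper, not part of any Flickr library.
function buildFeedUrl(tag, lang) {
  const url = new URL("http://api.flickr.com/services/feeds/photos_public.gne");
  url.searchParams.set("tags", tag);      // tag to query for
  url.searchParams.set("lang", lang);     // language of the feed
  url.searchParams.set("format", "json"); // ask for a JSON answer
  return url.toString();
}

console.log(buildFeedUrl("tagToQueryFor", "de-at"));
```

Using `searchParams` rather than string concatenation means that tags containing spaces or other reserved characters are percent-encoded automatically.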
JSON itself is just a format for transmitting structured data, hence an alternative to XML,
with the one advantage that it represents a subset of JavaScript which can be deserialized with
a simple call of JavaScript's eval() function. Hence, no separate parser is required to construct
JavaScript objects from JSON.
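The deserialization step can be sketched in a few lines. The payload below is a hypothetical stand-in modelled on the XML geo-location answer shown earlier, not the documented response format of the service. `JSON.parse` is used here in place of `eval()`, since `eval()` would also execute any code embedded in the response.

```javascript
// Hypothetical JSON payload, modelled on the XML geo-location answer above;
// the format actually returned by the service may differ.
const responseText =
  '{"photo": {"id": "123", "location": ' +
  '{"latitude": -17.685895, "longitude": -63.36914, "accuracy": 6}}}';

// Turn the JSON text into a JavaScript object and pick out the coordinates.
function extractLocation(jsonText) {
  const data = JSON.parse(jsonText); // throws a SyntaxError on malformed input
  const { latitude, longitude } = data.photo.location;
  return { latitude, longitude };
}

const coords = extractLocation(responseText);
console.log(coords.latitude, coords.longitude);
```

The resulting numbers could then be handed to a map API in the same script, without any XML parsing in between.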
Many services provide their APIs for different programming languages and frameworks,
ranging from ASP.NET and PHP to JavaScript, Python and Ruby on Rails.
Some, like the online trading service eBay, offer their API for a multitude of programming
languages; others, like Google with its map service, specialise in just one.
This makes data interchange somewhat chaotic when more than one service is used to build
a mashup. It would also hardly be practical to use a new programming language for every
service one intends to use in the mashup.
If supported by the service, JSON and, even more so, SOAP provide a nice bridge between
different languages; SOAP, for instance, can be used in any programming language that
implements the SOAP procedure call and has sufficient tools to parse the XML answers and
construct the XML request messages.
When building mashups, the developer always depends on the providers of the
services she is reusing. If a service provider decides to change its API, the mashup
page could suddenly become inoperative. Likewise, if the service provided by the original
Web 2.0 application becomes unreachable or overloaded for some reason, the
mashup is affected as well.
The APIs may be available for one programming language or another; if one is
unlucky, one has to deal with two or more different programming languages or
struggle with workarounds in order to unify the APIs.
REST is a concept that has worked very well for the WWW for years, but using it for
the CRUD operations (Create, Read, Update, Delete), when one needs to store or
manipulate data, brings its own set of problems, since it was not designed for this
task (think of caches etc.). SOAP, on the other hand, which was designed primarily for
use with web services, is often criticised for being slow and complicated.
Issues generally criticized about AJAX applications, mostly concerning usability, also
apply to mashups. For instance, it is often pointed out that browser controls do not have
the expected effects in Web 2.0 applications. Using the „Back“ button, for example,
very often destroys the current state and navigates back to the latest physical page in
the browser's history, or even shows an error message, instead of just undoing the last
change. This problem can be dealt with (as is done, for example, in Google applications), but
implementing specific handling functions for all browser controls of all browsers
might become cumbersome. AJAX cannot hide the fact that it is quite simply a route
around the limitations of a web model that was not originally designed to support the
features promised by Web 2.0 applications.
2.4 Outlook
Mashups have drawn a great part of their success from their ease of creation, which follows
from the availability of a multitude of more or less self-explanatory APIs. Yet, building a
mashup still requires knowing some basics about JavaScript, server-side programming
languages (PHP, ASP) or even web-service technologies like SOAP. In order to reach (nearly)
everyone who might contribute with his or her creativity, mashup creation has to become even
easier.
An example of very user-friendly mashup creation that already exists is Yahoo Pipes
(http://pipes.yahoo.com/pipes/). It allows users to aggregate and filter RSS feeds and to pass
the received data on to other Web 2.0 applications, while watching the result on the fly in a
graphical user interface.
While this editor is incredibly easy to use, the applications that can be created with it do not
allow for the same degree of flexibility as normal mashups (which themselves are not as
versatile as traditional web applications, which in turn are not as powerful as desktop
applications). So at the moment, Yahoo Pipes is a convenient editor for nice and small „toy
applications“.
However, as semantic technologies and new web standards continue to emerge, automation of
data exchange will become easier and more efficient, and the creation of usable graphical web
interfaces more straightforward.
This will allow mashups to become a platform on which everybody can create, deploy and publish
applications. Creating a mashup tomorrow might be as easy as uploading a video to YouTube
today, provided that the current trend towards intuitiveness continues and that a large user
base can be gained by allowing for just a little more than the creation of nice, but ultimately
useless, „toy applications“.
This leads to the question of what is meant by a business process. Hammer and Champy provide a
straightforward definition ([17]):
„[A process is defined as] a collection of activities that takes one or more kinds of input and
creates an output that is of value to the customer.“
Now that the notions of workflow and business process are established, it remains to define
„workflow management“ ([16]):
„A system that completely defines, manages and executes “workflows” through the execution
of software whose order of execution is driven by a computer representation of the workflow
logic.“
So, mapping your processes to a workflow means understanding which computations and
other tasks, like copying data or taking sensor measurements, you have to perform in what order
to obtain „an output that is of value to the customer“. Here, the customer is the scientist or any
other person who needs a Grid for computationally expensive tasks, and the output is the final
computational result.
In scientific and other Grid applications, the different processes that have to be executed in
order to compute that final result (for a given data set or even just for a description of desired
result data) can be mapped to a workflow and thus automated in their entire execution.
If one wants to execute these scientific workflows on the Grid, one has to deal with the many
particularities of Grid environments, namely that they are often opportunistic, unreliable,
shared within or even among virtual organizations (VOs) and perhaps untrusted for some
critical data.
[6] defines this Grid workflow instantiation W_i as a quadruple (G_r, G_s, Q_u, W_m),
where G_r denotes the Grid resources, G_s the Grid services, Q_u the quality expectations of
the user and W_m the abstract workflow model which one wants to map.
The Grid resources are the physical sites with their different properties, like computational
resources, disk space or disk quota, network bandwidth, process queuing times etc.
The Grid services component in the instantiation is responsible for assisting in Grid specific
tasks, such as data staging, authentication, VO mapping, replica services, information services
for (re-)locating resources and perhaps matchmaking.
The user might expect a certain precision or a maximum execution time, which is expressed
in the quality expectations Q_u.
Figure 3 shows a simple instantiation of the workflow in Figure 2 that adds copying of the
data to a Grid resource and the retrieval of the final result from that resource.
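The quadruple and the added staging steps can be sketched as a small data structure and a function. All site names, runtime estimates and field names below are hypothetical and serve only to illustrate the idea; real workflow systems perform this mapping in far more elaborate ways.

```javascript
// Hypothetical instantiation input W_i = (G_r, G_s, Q_u, W_m).
const instantiationInput = {
  resources: [ // G_r: sites with their properties
    { site: "siteA", estimatedRuntimeMinutes: 90 },
    { site: "siteB", estimatedRuntimeMinutes: 30 },
  ],
  services: { transfer: "file-staging service" }, // G_s: helper services
  quality: { maxRuntimeMinutes: 60 },             // Q_u: user expectations
  model: ["analyse"],                             // W_m: abstract workflow steps
};

// Map the abstract model to one resource and wrap it in data staging steps.
function instantiate({ resources, services, quality, model }) {
  // Pick the first resource that meets the user's quality expectations.
  const target = resources.find(
    (r) => r.estimatedRuntimeMinutes <= quality.maxRuntimeMinutes
  );
  if (!target) {
    throw new Error("no resource satisfies the quality expectations");
  }
  return [
    { step: "stage-in", via: services.transfer, to: target.site },
    ...model.map((task) => ({ step: "execute", task, on: target.site })),
    { step: "stage-out", via: services.transfer, from: target.site },
  ];
}

const concrete = instantiate(instantiationInput);
console.log(concrete.map((s) => s.step).join(" -> "));
```

The abstract model contains only the computation; copying the input to the chosen site and retrieving the result appear only in the concrete workflow.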
The instantiation of the abstract workflow, and hence the mapping to resources and services, is
done with varying granularity by different workflow systems. The user just needs to define
the abstract workflow in some textual or graphical representation. The workflow system
transforms that representation into a concrete workflow, which can be executed by a
meta-scheduler that supports interdependencies among jobs, for instance Condor
DAGMan ([2]).
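To illustrate what supporting interdependencies among jobs involves, the following sketch orders the jobs of a small concrete workflow with a simple topological sort, so that no job is released before its prerequisites. The job names are hypothetical, and the code makes no claim about how Condor DAGMan is implemented.

```javascript
// Jobs and the jobs they depend on; all names are hypothetical.
const dependencies = {
  stageIn: [],
  simulate: ["stageIn"],
  analyse: ["stageIn"],
  merge: ["simulate", "analyse"],
  stageOut: ["merge"],
};

// Topological sort: repeatedly release every job whose prerequisites are done.
function executionOrder(deps) {
  const remaining = new Map(
    Object.entries(deps).map(([job, pre]) => [job, new Set(pre)])
  );
  const order = [];
  while (remaining.size > 0) {
    const ready = [...remaining.keys()].filter(
      (job) => remaining.get(job).size === 0
    );
    if (ready.length === 0) {
      throw new Error("unsatisfiable dependencies: the graph is not a DAG");
    }
    for (const job of ready) {
      order.push(job);
      remaining.delete(job);
    }
    // Mark the released jobs as finished for everything still waiting.
    for (const pre of remaining.values()) {
      ready.forEach((job) => pre.delete(job));
    }
  }
  return order;
}

console.log(executionOrder(dependencies).join(", "));
```

Jobs that become ready in the same round (here `simulate` and `analyse`) do not depend on each other, so a scheduler would be free to run them at the same time.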
There are several ways to represent abstract workflows textually, most of which incorporate
some form of XML-based syntax. Given the trend towards web services in Grid applications,
it is, however, a bit surprising that standards like WS-BPEL do not enjoy stronger backing from
the Grid community. One potential reason given in [18] is that Grid technology is currently
embraced mostly by science, while BPEL has its strong background in business, and
companies are still somewhat reluctant to use the Grid in a production environment.
However, there is some discussion going on about BPEL integration, especially as the
„machine code“ for workflows modelled in a GUI component.
Elements for variable assignment and collections assist in defining the XML workflow
representation.
Standard programming operators like sum (<math:sum>), product etc. allow easy
joining of workflow results or simple computations.
Conditional statements execute different branches (or different XML-child elements)
based on some precondition.
Choice terminates with the first child element that finishes execution without error,
behaving like a transaction.
The sequential tag describes sequential execution of its child elements.
The parallel construct tells the workflow engine to execute its child elements in
parallel.
Looping (with the for-tag) is interesting, since it effectively breaks the DAG model of
a workflow. However, it was introduced nonetheless since iterative computations play
a very important role in Grid computing.
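The control-flow constructs listed above have close analogues in ordinary programming languages. As a rough sketch, and not a description of Karajan's implementation, they can be mimicked with JavaScript promises. The function names mirror the tags, the child tasks are hypothetical stand-ins for Grid activities, and `choice` is assumed to try its children one after another.

```javascript
// Each child is a function that returns a promise.

// sequential: run the children one after another.
async function sequential(children) {
  const results = [];
  for (const child of children) {
    results.push(await child());
  }
  return results;
}

// parallel: start all children at once and wait for all of them.
function parallel(children) {
  return Promise.all(children.map((child) => child()));
}

// choice: return the result of the first child that finishes without error.
async function choice(children) {
  let lastError = new Error("choice has no children");
  for (const child of children) {
    try {
      return await child();
    } catch (error) {
      lastError = error; // remember the failure and try the next alternative
    }
  }
  throw lastError;
}

// Hypothetical tasks: one kind succeeds with a value, the other always fails.
const ok = (value) => async () => value;
const failing = async () => {
  throw new Error("site unavailable");
};

sequential([
  ok("a"),
  () => parallel([ok("b"), ok("c")]),
  () => choice([failing, ok("d")]),
]).then((results) => console.log(JSON.stringify(results)));
```

Because every construct takes and returns the same kind of value, the constructs nest freely, which is the property that makes such a small set of tags expressive.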
Because the development of the Karajan language was initially based on GridAnt, the notion
of tasks is used to denote different types of workflow activities in the Grid, not unlike
the Ant tasks in a build process. Currently there is support for task::execute, which maps to
running a job on the Grid, task::transfer, which stages a file to or from some site, and
task::authentication for authentication. While only those tasks are natively supported, one may
execute arbitrary Java classes by using the <executeJava> element.
The following simple example (taken from [20]) uses execute and transfer tasks to run ls on
the remote host and afterwards transfer stdout and stderr to the local machine in parallel.
<project>
  <include file="cogkit.xml"/>
  <execute executable="/bin/ls" arguments="-al"
           stdout="stdout" stderr="stderr"
           host="hot.mcs.anl.gov" provider="GT2"/>
  <echo message="Job completed. Transferring stdout and stderr"/>
  <parallel>
    <transfer srchost="hot.mcs.anl.gov" srcfile="stdout"
              desthost="localhost" provider="gridftp"/>
    <transfer srchost="hot.mcs.anl.gov" srcfile="stderr"
              desthost="localhost" provider="gridftp"/>
  </parallel>
  <echo message="Stdout and stderr transferred"/>
</project>
A rather elegant feature of Karajan, which can be seen in this example, is that the tags can be
used universally. Here, we use <parallel> for parallel transfer, but it can also be used to
denote parallel execution.
The „provider“ attribute specifies the Grid middleware that is used in combination with
Karajan. This adds middleware transparency for the workflow programmer and facilitates
porting workflows to new middleware, since providers that are not yet recognized may be added
with support from the Java CoG Kit API.
Pegasus, reduced afterwards and executed on the Grid using Condor's DAGMan
meta-scheduler.
Pegasus includes methods which take the error-proneness of Grid resources and the
opportunistic environment into account. It can do „just-in-time“ planning, which means that
only partial abstract workflows are transformed into concrete workflows and scheduled.
Once they have completed, the Grid state is evaluated again and the next partial workflow is
scheduled.
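The loop behind just-in-time planning can be sketched as follows. This is only an illustration of the idea under simplifying assumptions, not Pegasus code: the function names, the shape of the Grid state and the site names are all hypothetical.

```javascript
// Plan and run one partial workflow at a time, re-reading the Grid state
// before each part instead of planning the whole workflow up front.
function runJustInTime(partialWorkflows, evaluateGridState, plan, execute) {
  const results = [];
  for (const partial of partialWorkflows) {
    const state = evaluateGridState();         // current view of the Grid
    const concretePart = plan(partial, state); // map only this part to resources
    results.push(execute(concretePart));       // wait for it before going on
  }
  return results;
}

// Hypothetical Grid whose free site changes between the two parts.
const gridStates = [{ free: "siteA" }, { free: "siteB" }];
let tick = 0;

const log = runJustInTime(
  [["extract"], ["transform", "load"]],
  () => gridStates[Math.min(tick++, gridStates.length - 1)],
  (partial, state) => partial.map((task) => `${task}@${state.free}`),
  (concretePart) => concretePart.join(", ")
);
console.log(log);
```

In this run the second part is mapped to a different site than the first, because the state was read again after the first part had completed; a plan made entirely up front could not have reacted to that change.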
Triana ([1]) provides a graphical environment for workflow definition. Users can drag and
drop processes onto Triana's desktop and connect them with arrows, which indicate the workflow
direction (figure 4). The process components are grouped into modules that provide
functionality for specific areas of computation (there are modules for GriPhyN, audio, image
processing and so on). Internally, Triana uses an XML format and directed cyclic graphs for
representation (which means that loops are allowed). Triana can import WS-BPEL workflows
using „pluggable“ readers.
The major advantage of Triana lies in its user-friendliness: users need not write XML syntax
for workflow specification but can use a GUI with existing data analysis tools that just have to
be piped together to form a result, much as Yahoo Pipes can be used for simple
mashups.
Of course, many other workflow enactment systems exist; enumerating and explaining them
all, however, goes beyond the scope of this document.
Pegasus and Triana were picked as examples because the former distinguishes between
abstract and concrete workflows and features virtual data and just-in-time planning, while the
latter comes with an intuitive user interface for modelling workflows and provides a nice
workflow counterpart to the Yahoo Pipes application mentioned in section 2.4.
known to be working.
be done, in order to solve a problem without specifying how it has to be done. This
adds much flexibility to Grid applications.
4. Conclusion
Initially, there seems to be little similarity between Grid workflows and Web 2.0 mashups.
However, a closer look reveals some interesting common points, as well as some fundamental
differences. So, in order to recall some important characteristics about the two upcoming
technologies, table 1 summarizes the most important properties.
From the table, it can be seen that the underlying technologies of Grid workflows and
mashups were defined in different ways: while Grid workflows have a strong research
background and many scientific papers deal with the topic, AJAX followed a grassroots
development.
Interestingly, however, many Web 2.0 applications, and hence also mashups, use quite similar
programming paradigms, and AJAX itself became integrated into programming frameworks,
making it a de facto standard for modern web development.
Grid workflow languages, however, do not follow a uniform specification standard. WS-BPEL,
a workflow language predestined to be used along with web services, is practically irrelevant
at present, and meta-schedulers like Condor's DAGMan are not really widespread in
the workflow community either.
However, Grid workflows and mashups share a very strong common point: Both are an
abstraction of existing technology. Mashups can be created by combining calls to the
programming APIs of various existing applications, while workflow engines allow specifying
what is to be done, without knowing much about the Grid environment or the algorithms used
in the processing chain. Graphical tools exist for both fields of application; structured web
information can be combined using e.g. Yahoo Pipes, while standardized program
components can be combined using e.g. the workflow system Triana. So, both new paradigms
have in common that they simplify the usage of existing technology (i.e. web application
creation and Grid computations) and hence contribute to the vision of universal participation
in those fields.
5. References
[1] D. Churches et al., „Programming Scientific and Distributed Workflow with Triana
Services“, Concurrency and Computation: Practice and Experience, Vol. 18, pp. 1021–1037,
2006
[3] J. J. Garrett, „Ajax: A New Approach to Web Applications“, Adaptive Path LLC,
18 February 2005,
http://www.adaptivepath.com/ideas/essays/archives/000385.php
[4] R. T. Fielding, „Architectural Styles and the Design of Network-based Software
Architectures“, Dissertation, University of California, Irvine, 2000.
[8] „Web Services Business Process Execution Language Version 2.0“, OASIS
Standard, 2007-04-11
[14] „The application/json Media Type for JavaScript Object Notation (JSON)“, RFC
4627: http://tools.ietf.org/html/rfc4627
[16] The Workflow Reference Model. The Workflow Management Coalition, January 1995.
[19] G. von Laszewski, I. Foster, J. Gawor and P. Lane, „A Java Commodity Grid
Kit“, Concurrency and Computation: Practice and Experience, Vol. 13, 2001
[21] E. Deelman et al., „Pegasus: Mapping Scientific Workflows onto the Grid“,
presented at: Across Grids Conference, 2004.
[22] I. Foster, J. Voeckler, M. Wilde, and Y. Zhao, „Chimera: A Virtual Data System for
Representing, Querying, and Automating Data Derivation“, presented at: Scientific and
Statistical Database Management, 2002.