
Seminar "Grid und Web 2.0", Winter term 2007/08

Seminar paper
Grid und Web 2.0 (703524)

Mashups vs. Grid-Workflows

Course instructor: T. Fahringer

Name / Matriculation number
Thomas Zangerl


Abstract

The aim of this document is to compare Web 2.0 mashups, which have become amazingly
popular in recent years, with current Grid workflow technologies. To that end, the document
surveys the basic functional principles of the underlying technologies, accompanied by
reflections on issues and limitations of the approaches and some practical examples. Finally,
mashups and Grid workflows are compared, and differences as well as similarities and
common problems are highlighted.

1. Introduction
The rapid spread of Web 2.0 technologies (like AJAX) in the past years has given rise to
many interesting services which provide intuitive and responsive applications of a kind never
seen on the web before. The use, or rather simulation, of asynchronous communication over
HTTP using AJAX ([3]) has allowed for fast and impressive chat, e-mail or map applications
on the web that feel like desktop programs, without the need for cumbersome browser-plugin
installations.
Those web applications have become increasingly popular, and many of them also provide
APIs which people have used either to display the various Web 2.0 services on their own
websites or to combine different Web 2.0 services on a single portal. One of the pioneers
with respect to the availability of an intuitive and popular API (and, to this day, one of
the most successful AJAX applications) has been Google Maps (http://maps.google.com/), the
popularity of which increased once the service found itself embedded on many private and
also commercial websites.
Hence, other web application providers, among them major players like Yahoo and Amazon,
have followed Google's example and published APIs to their Web 2.0 services.

This process has resulted in the availability of a large set of easy-to-use APIs which allow
individuals to integrate third-party services relatively straightforwardly into their own web
presence.
A huge wave of recombinations ("remixes") of different Web 2.0 services on user-made sites
has followed; these have been called mashups, after the remixes of music and video clips.

Grid workflows, on the other hand, also describe a novel approach to using existing resources
(practical and working Grid applications exist, as practical and working Web 1.0 applications
had existed before the Web 2.0 boom and to some degree still exist today), but they receive
significantly less attention than the nearly universally popular Web 2.0 applications.
An important reason for that might be that today the Grid itself isn't nearly as
widespread as the web in terms of general usage, although some distributed programs like
seti@home (http://setiathome.berkeley.edu/) or folding@home (http://folding.stanford.edu/)
have gained quite some popularity. However, they have never entirely lost their reputation as
"toy applications".

In [6], the authors describe the historical phases that the development of Grid technology
has undergone. They mention a "Pre-Web Phase", a "Pre-Grid Phase", an "Early
Grid Phase" and a "Grid Standards Phase", in which important Grid middleware abstractions
like Condor and the Java CoG Kit emerged. Then follows the "Web Service Phase", in which
the Grid became integrated with web service technologies, and the current "Web Upperware


Phase", in which abstractions are to be taken further and workflows, web services and Grid
technology are to join forces.

Given that the important Grid middleware Globus Toolkit ([7]) is based on web services in its
recent version 4, and that there have been major efforts to extend workflow description
languages like BPEL towards web service orchestration (e.g. WS-BPEL [8]), using workflows
on the Grid with these extensions seems quite natural.

Indeed, many different approaches towards implementing workflow support have been
taken, leading either to improvements in performance and capacity utilization or to better
usability of Grid resources for the end user.

However, these Grid workflow systems still face some Grid-specific problems, which will be
described in more detail, along with some examples from current research, in the second part
of this document. First, an introduction to Web 2.0 mashups will be given in order to
establish a thorough base for comparison between the two technologies.

2. Applications for everyone: Web 2.0 Mashups

2.1 Underlying technology


It is very hard to summarize the fundamentals of Web 2.0 technology, since AJAX, the
principle used for most Web 2.0 services, wasn't created as part of a standardization effort,
but merely by "best practice": somebody came up with ideas that enabled a better user
experience on web pages, and these ideas were then copied by many web developers.
AJAX as a principle consists of the combined use of several web technologies that had
already existed before AJAX itself was coined as a term: [3] mentions XHTML, CSS
(Cascading Style Sheets), JavaScript, DOM (the Document Object Model), XML, XSLT and
the XMLHttpRequest object. The latter was originally implemented by Microsoft for the
Internet Explorer browser and allows incremental refresh of page content instead of
reloading the whole page. The data for this incremental refresh can be retrieved using
standard web technology via Representational State Transfer (REST, i.e. "GET http://...",
[4]) or via web service technology such as SOAP [5].
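To make the mechanism concrete, the following is a minimal sketch of such an incremental refresh (the endpoint URL, its parameters and the element id "news" are invented for illustration; the XMLHttpRequest object itself exists only inside a browser):

```javascript
// Build a REST-style GET URL from a base endpoint and query parameters.
function buildRestUrl(base, params) {
  var parts = [];
  for (var key in params) {
    parts.push(encodeURIComponent(key) + "=" + encodeURIComponent(params[key]));
  }
  return base + "?" + parts.join("&");
}

// Fetch new content and patch it into the page without a full reload.
// Browser-only: XMLHttpRequest is not available outside the browser.
function refreshNews() {
  var url = buildRestUrl("http://example.org/news", { format: "xml", count: "10" });
  var xhr = new XMLHttpRequest();
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
      // Incremental refresh: only the news <div> is replaced, not the page.
      document.getElementById("news").innerHTML = xhr.responseText;
    }
  };
  xhr.open("GET", url, true); // true = asynchronous request
  xhr.send(null);
}
```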

Since the XMLHttpRequest object has proved very useful for content reloading in many
scenarios, other browsers have implemented similar objects. In the meantime, due to the
practical importance of this object, the W3C is working towards standardization of
XMLHttpRequest ([9]).

JavaScript serves as the connecting language for the use of such objects and other functions,
as well as the basis for any event-driven page display.
Furthermore, it allows easy integration of third-party APIs by simply including the JavaScript
source and afterwards calling the functions defined in it. For example, in order to include a
Google Maps "mapplet" centered above Innsbruck, Austria on a webpage, the following
minimalist JavaScript code would suffice (see [10]):

<script type="text/javascript"
src="http://www.google.com/jsapi?key=pageAPIKey"></script>

<script type="text/javascript">
  google.load("maps", "2");

  // Call this function once the page has been loaded
  function initialize() {
    var map = new google.maps.Map2(document.getElementById("map"));
    map.setCenter(new google.maps.LatLng(47.2667, 11.3833), 13);
  }
  google.setOnLoadCallback(initialize);
</script>

Listing 1: Sample JavaScript code for embedding Google Maps on a user page

While JavaScript provides the developer with the means to use event-driven code and dynamic
content updates, the Document Object Model (DOM, [11]) represents the model, implemented
by the different browsers, that handles those events and dynamic content updates.
DOM provides access to the page structure in a tree model quite similar to XML. That
model allows, among other things, changing nodes in the object tree (which may be, e.g.,
<div> elements on an (X)HTML page), handling events, ad-hoc changes of formatting by
changing CSS properties and the serialization of (parts of) documents to XML.
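As a small, hedged illustration of these DOM operations (the element ids "status" and "refresh" are invented and not tied to any real service):

```javascript
// Update a node in the DOM tree: change its text content and its CSS
// formatting, as a mashup page would do after receiving new data.
function showMessage(text) {
  var node = document.getElementById("status");
  node.textContent = text;     // change the node's content
  node.style.color = "green";  // ad-hoc change of a CSS property
  return node;
}

// Typical event handling: wire a click event to the update function.
function wireRefresh() {
  document.getElementById("refresh").onclick = function () {
    showMessage("Content updated");
  };
}
```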

DOM itself represents just a page model built by the browser upon parsing and is intended to
be accessible from many programming languages; in the context of Web 2.0 applications,
however, it is almost always used in combination with JavaScript.

Since its standardization by the W3C, DOM has replaced the multitude of proprietary
JavaScript models implemented by different browsers and now serves as a uniform
programming interface for dynamic webpages (for example, DOM was introduced as a
DHTML substitute in Microsoft's Internet Explorer with version 5 [12]).

What DOM is to structured JavaScript interfaces, XML is to structured data. XML is
important for automatic data exchange, because it is easily parseable by machines. Hence,
data represented in XML can be processed, aggregated, transformed (using XSLT) and
integrated into other data by machines.

XHTML and CSS provide the means for graphical formatting and hence the UI view of the
AJAX programming model.

2.2 Gluing it together into mashups


Since there are easy-to-use JavaScript APIs for most of the Web 2.0 services, it is very
easy to put various JavaScript calls to different services together on a single page and
form a new service. Because all data flows in a structured way (namely as XML), data
retrieved from the services can be easily combined. The simplicity of creating a new mashup
using different web service APIs is best shown with a (very) small example.

For example, it would be an easy task to get the geo-location of a photo from the photo
service Flickr ([13]):


flickr.photos.geo.getLocation(apiKey, photoID);

Listing 2: API method signature for getting the geo-location of a photo from Flickr
Such a query would return something like (see [13]):

<photo id="123">
<location latitude="-17.685895" longitude="-63.36914" accuracy="6" />
</photo>

Listing 3: Sample return data from Flickr

Now we have structured information about longitude and latitude, which can easily be parsed
using any XML parser. This way, latitude and longitude could be obtained and used as input
to the Google Maps JavaScript function

map.setCenter(new google.maps.LatLng(latitude, longitude)).
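As a hedged sketch of this glue step, the reply from Listing 3 can be reduced to a coordinate pair as follows (the regular expressions stand in for a real XML parser, which a production mashup would use):

```javascript
// Extract latitude and longitude from the Flickr XML reply (Listing 3).
// A real mashup would use an XML parser; regular expressions keep the
// sketch self-contained.
function parseLocation(xml) {
  var lat = /latitude="([-0-9.]+)"/.exec(xml);
  var lng = /longitude="([-0-9.]+)"/.exec(xml);
  if (!lat || !lng) return null;
  return { latitude: parseFloat(lat[1]), longitude: parseFloat(lng[1]) };
}

// The sample answer from Listing 3:
var reply = '<photo id="123">' +
            '<location latitude="-17.685895" longitude="-63.36914" accuracy="6" />' +
            '</photo>';
var loc = parseLocation(reply);
// With the map object from Listing 1, the result would then be fed to:
// map.setCenter(new google.maps.LatLng(loc.latitude, loc.longitude));
```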

There is the small drawback that no official JavaScript Flickr API implementation exists
(there are, however, some user-built workarounds). This problem can be solved by using some
server-side scripting language such as PHP and passing the latitude and longitude information
to the page containing the Google mapplet as a request attribute, or by performing the call
using JSON (JavaScript Object Notation, [14]).

In order to obtain a JSON answer from services like Flickr, it suffices to send a standard
REST GET request to a special URL on the Web 2.0 service's domain with some agreed-upon
request parameters. For Flickr, it suffices to add a "format=json" request parameter at the
end of the URL:

http://api.flickr.com/services/feeds/photos_public.gne?tags=tagToQueryFor&
lang=de-at&format=json

Listing 4: URL used to obtain a JSON answer from Flickr

JSON itself is just a format for transmitting structured data, hence an alternative to XML,
with the one advantage that it represents a subset of JavaScript and can be deserialized with
a simple call to JavaScript's eval() function. Hence, no parsing is required to construct
JavaScript objects from JSON.
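A small sketch of this deserialization (the sample data is invented; JSON.parse is a safer alternative available in newer browsers, since eval() will execute any code contained in the input):

```javascript
// A JSON answer as a service might return it (made-up sample data).
var json = '{"photo": {"id": "123", "latitude": -17.685895, "longitude": -63.36914}}';

// Deserialization with eval(): the JSON text is itself JavaScript.
// The parentheses make the parser treat the braces as an object literal.
var viaEval = eval("(" + json + ")");

// Safer alternative in newer browsers: JSON.parse only accepts data,
// whereas eval() would run arbitrary code from an untrusted source.
var viaParse = JSON.parse(json);
```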

Many services provide their APIs for different programming languages, ranging from
ASP.NET and PHP to JavaScript, Python and Ruby on Rails.
Some, like the online trading service eBay, offer their API for a multitude of programming
languages; others, like Google with its map service, specialize in just one.

This leads to somewhat chaotic data interchange when more than one service is used to build
a mashup. Also, it would be of little use to adopt a new programming language for every
service one intends to use for the mashup.

If supported by the service, JSON and, even more so, SOAP provide a nice bridge among
different languages; SOAP, for instance, can be used in any programming language that
implements the SOAP procedure call and has sufficient tools to parse the XML answers and
construct the XML request messages.

2.3 Drawbacks and limitations


The last lines of the previous section have already set the tone for some of the limitations:

- When building mashups, the developer is always dependent on the providers of the
services she is reusing. If a service provider decides to change its API, the mashup
page could suddenly become inoperative. Also, if the service provided by the original
Web 2.0 application becomes unreachable or overloaded for some reason, the mashup
will be affected as well.

- The APIs may be available for one or another programming language; if one is
unlucky, one has to deal with two or more different programming languages or
struggle with workarounds in order to unify the APIs.

- JavaScript as a frequently used binding element has its limitations regarding
flexibility and performance, since it is an interpreted language running in the user's
browser, where security considerations stand against too much flexibility. Using
server-side programming languages instead (if the API permits it), however, takes
away much of the simplicity of mashup creation.

- REST is a concept that has worked very well for the WWW for years, but using it for
the CRUD operations (Create, Read, Update, Delete), if one needs to store or
manipulate data, brings its own set of problems, since REST has not been designed for
this task (think of caches etc.). SOAP, on the other hand, which is principally designed
for use with web services, is often criticized for being slow and complicated.

- Mashups by definition require that the XMLHttpRequest is made to third-party
domains (if a page doesn't get its structured XML data from different Web 2.0
applications, it's probably not a mashup). Since JavaScript code gets executed in the
user's browser, the browser's view of this is that a script from site A tries to connect
to site B, which would be a severe security issue were it allowed. Hence, such
cross-site calls get blocked by the browser.
At present there are some proposals to circumvent this restriction on the browser
side, but for now there are only two possibilities to deal with it. One is using
JSON instead of the XMLHttpRequest, with the drawback that not many services
support JSON at the moment and the use of JSON increases cross-browser
incompatibilities even more. The other is to use a proxy on the mashup domain that
relays the XMLHttpRequest to the Web 2.0 application used for the mashup.
The major disadvantage of this approach is that traffic is generated on the mashup
server that could otherwise be avoided (Figure 1).

- Issues generally criticized about AJAX applications, mostly concerning usability, also
apply to mashups. For instance, it is often pointed out that browser controls don't show
the expected effects in Web 2.0 applications. For example, using the "Back" button
very often destroys the current state and navigates back to the latest physical page in
the browser's history, or even shows an error message, instead of just undoing the last
change. This problem can be dealt with (and it is, e.g., in Google applications); however,
implementing specific handling functions for all browser controls of all browsers
might become cumbersome. AJAX cannot hide the fact that it is quite simply a route
around the limitations of a web model originally not designed to support the features
promised by Web 2.0 applications.

Figure 1: Use of a proxy on the mashup server in order to deal with XMLHttpRequest
security restrictions
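The relaying step of such a proxy can be sketched as a URL rewrite on the mashup server (the path prefix "/proxy/flickr" is invented, and the surrounding HTTP plumbing is omitted):

```javascript
// Rewrite a request that arrives at the mashup's own domain into the
// URL of the third-party service. The browser only ever talks to the
// mashup server, so the cross-site restriction is never triggered.
function rewriteProxyUrl(localPath) {
  var prefix = "/proxy/flickr"; // invented prefix served by the mashup
  if (localPath.indexOf(prefix) !== 0) return null; // not a proxied path
  return "http://api.flickr.com/services" + localPath.substring(prefix.length);
}
```

The mashup server would fetch the rewritten URL itself and pass the answer back to the browser, which is precisely the additional traffic this approach generates.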

2.4 Outlook
Mashups have drawn a great part of their success from their ease of creation, which follows
from the availability of a multitude of more or less self-explanatory APIs. Yet, building a
mashup still requires knowing some basics about JavaScript, server-side programming
languages (PHP, ASP) or even web service technologies like SOAP. In order to reach (nearly)
everyone who might contribute with his or her creativity, mashup creation has to become even
easier.
An example of very user-friendly mashup creation that already exists is Yahoo Pipes
(http://pipes.yahoo.com/pipes/). It allows users to aggregate and filter RSS feeds and to pass
the received data on to other Web 2.0 applications, while watching the result on the fly in a
graphical user interface.
While this editor is incredibly easy to use, the applications that can be created with it don't
allow for the same degree of flexibility as normal mashups do (which themselves aren't as
versatile as traditional web applications, which again aren't as powerful as desktop
applications). So at the moment, Yahoo Pipes is a convenient editor for nice but small "toy
applications".
However, as semantic technologies and new web standards continue to emerge, automation of
data exchange will become easier and more efficient, and the creation of usable graphical web
interfaces more straightforward.


This will allow mashups to become a platform for everybody to create, deploy and publish
applications. Mashup creation tomorrow might be as easy as uploading a video to YouTube
today, given that the current tendency towards intuitiveness continues and a large user base
can be gained by allowing for just a little more than the creation of nice but ultimately
useless "toy applications".


3. Grid Workflows: fast applications for everyone?

3.1 What is a workflow?


Like many other frequently used terms, "workflow" has become quite a buzzword and risks
losing its meaning in fuzziness. Accordingly, a multitude of definitions of workflows exists;
for the context of Grid execution, however, the abbreviated, simple definition in [16] shall
suffice:

"[By saying workflow we mean] the computerized facilitation or automation of a business
process, in whole or part."

This leads to the question of what is meant by a business process. Hammer and Champy
provide a straightforward definition ([17]):

„[A process is defined as] a collection of activities that takes one or more kinds of input and
creates an output that is of value to the customer.“

Now that the notions of workflow and business process are established, it remains to define
"workflow management" ([16]):

„A system that completely defines, manages and executes “workflows” through the execution
of software whose order of execution is driven by a computer representation of the workflow
logic.“

So, mapping your processes to a workflow means understanding which computations and
other tasks, like copying data or taking sensor measurements, you have to do in what order to
gain "an output that is of value to the customer". Here, the customer is the scientist or any
other person performing computationally expensive tasks in need of a Grid, and the output is
the final computational result.

In scientific and other Grid applications, the different processes that have to be executed in
order to compute that final result (for a given data set, or even just for a description of the
desired result data) can be mapped to a workflow, and their execution can thus be entirely
automated.

3.2 How to execute workflows on the Grid?


Workflow mapping for such scientific applications motivates the use of directed acyclic
graphs (DAGs), which have long been in use for Grid schedule modelling.
Such a workflow, which defines the processes for a scientific computation (figure 2), is called
an abstract workflow. It can be mapped onto Grid resources, i.e. different sites with different
performance characteristics that execute the processes in the given order.
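As a sketch of the idea (in JavaScript, for consistency with the earlier listings; the process names are invented), an abstract workflow can be represented as a DAG, and a valid execution order derived from it by topological sorting:

```javascript
// An abstract workflow as a directed acyclic graph: each entry maps a
// process to the processes it depends on. The process names are invented.
var workflow = {
  stageData:  [],
  preprocess: ["stageData"],
  compute:    ["preprocess"],
  visualize:  ["compute"]
};

// Derive a valid execution order by topological sorting: a process may
// only run after all of its predecessors have finished.
function executionOrder(dag) {
  var done = {}, order = [];
  function visit(node) {
    if (done[node]) return;
    done[node] = true;
    dag[node].forEach(visit); // schedule dependencies first
    order.push(node);
  }
  Object.keys(dag).forEach(visit);
  return order;
}
```

A workflow system performs essentially this step, but against real Grid sites rather than an in-memory list.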


Figure 2: Simple abstract scientific workflow

If one wants to execute these scientific workflows on the Grid, one has to deal with the many
particularities of Grid environments, namely that they are often opportunistic, unreliable,
shared within or even among VOs and perhaps untrusted for some critical data.
[6] defines such a Grid workflow instantiation W_i as a quadruple (G_r, G_s, Q_u, W_m),
where G_r denotes the Grid resources, G_s the Grid services, Q_u the quality expectations of
the user and W_m the abstract workflow model which one wants to map.
The Grid resources are the physical sites with their different properties, such as computational
resources, disk space or disk quota, network bandwidth, process queuing times etc.
The Grid services component in the instantiation is responsible for assisting in Grid-specific
tasks, such as data staging, authentication, VO mapping, replica services, information services
for (re-)locating resources and perhaps matchmaking.
The user might expect a certain precision or a maximum execution time, which is expressed
in the quality expectations Q_u.
Figure 3 shows a simple instantiation of the workflow in Figure 2 that adds copying of the
data to a Grid resource and the retrieval of the final result from that resource.

The instantiation of the abstract workflow, and hence the mapping to resources and services,
is done with varying granularity by different workflow systems. The user just needs to define
the abstract workflow in some textual or graphical representation. That representation is
transformed by the workflow system into a concrete workflow, which can be executed by
some meta-scheduler that supports interdependencies among jobs, for instance Condor
DAGMan ([2]).


Figure 3: Simple concrete Grid workflow

There are several ways to textually represent abstract workflows, most of which incorporate
some form of XML-based syntax. Given the trend towards web services in Grid applications,
it is, however, a bit surprising that standards like WS-BPEL don't enjoy stronger backing by
the Grid community. One potential reason given in [18] is that currently Grid technology is
mostly embraced by science, while BPEL has its strong background in business, and
companies are still a bit reluctant concerning Grid usage in a production environment.
However, there is some discussion going on about BPEL integration, especially as the
"machine code" for workflows modelled in a GUI component.

3.3 The Java CoG Kit Karajan engine


The Java CoG Kit workflow engine Karajan ([6]) shall be roughly explained as an illustrative
example, because it is rather simple and built upon established Grid middleware technology,
namely the Java CoG Kit [19].
The Java CoG Kit maps Grid functionality to Java classes, partly building upon services
provided by the Globus Toolkit ([7]), partly using Java classes to perform the desired
functionality (for example in authentication).
Karajan uses its own XML-like format to specify workflows for execution. It supports
several forms of workflow patterns, mostly by simply embedding commands into respective
XML elements (like <sequential>some things to do</sequential> for sequential execution of
some commands).
There are some predefined XML tags to assist in the creation of different workflow patterns:

- Elements for variable assignment and collections assist in defining the XML workflow
representation.
- Standard programming operators like sum (<math:sum>), product etc. allow easy
joining of workflow results or simple computations.
- Conditional statements execute different branches (i.e. different XML child elements)
based on some precondition.
- Choice terminates with the first child element that finishes execution without error,
behaving like a transaction.
- The sequential tag describes sequential execution of its child elements.
- The parallel construct tells the workflow engine to execute its child elements in
parallel.


- Looping (with the for tag) is interesting, since it effectively breaks the DAG model of
a workflow. It was introduced nonetheless, since iterative computations play a very
important role in Grid computing.

Because the development of the Karajan language was initially based on GridAnt, the notion
of tasks is used to denote different types of workflow activities in the Grid, not unlike the
Ant tasks in a build process. Currently there is support for task::execute, which maps to
running a job on the Grid, task::transfer, which stages a file to or from some site, and
task::authentication for authentication. While only those tasks are natively supported, one may
execute arbitrary Java classes by using the <executeJava> element.

The following simple example (taken from [20]) uses execute and transfer tasks to execute ls
on a remote host and afterwards transfer stdout and stderr to the local machine in parallel.

<project>
<include file="cogkit.xml"/>
<execute executable="/bin/ls" arguments="-al"
stdout="stdout" stderr="stderr"
host="hot.mcs.anl.gov" provider="GT2"/>
<echo message="Job completed. Transferring stdout and stderr"/>

<parallel>
<transfer srchost="hot.mcs.anl.gov" srcfile="stdout"
desthost="localhost" provider="gridftp"/>
<transfer srchost="hot.mcs.anl.gov" srcfile="stderr"
desthost="localhost" provider="gridftp"/>
</parallel>
<echo message="Stdout and stderr transferred"/>
</project>

Listing 5: Karajan example with parallel transfer

A rather elegant feature of Karajan, which can be seen in this example, is that the tags can be
used universally. Here, we use <parallel> for parallel transfer, but it can also be used to
denote parallel execution.
The "provider" attribute specifies the Grid middleware that is used in combination with
Karajan. This adds middleware transparency for the workflow programmer and facilitates
porting workflows to new middleware, since providers that are not yet recognized may be
added with support from the Java CoG Kit API.

3.4 Other Grid workflow approaches


Besides the Karajan engine, which derives its charm from its simplicity, there are many other
interesting approaches towards Grid workflows.
The Pegasus project ([21]) is designed to map abstract workflows to the Grid. To specify
those abstract workflows, it uses the Virtual Data Language introduced by Chimera ([22]).
The idea behind Chimera is the notion of "virtual data": data is seldom derived directly from
sensor measurements, but often from other, already existing data using some deterministic
process. Hence, the data transformations and derivations that lead to the final result can be
described. Chimera is used to generate an abstract workflow based on a virtual data
language description of the problem, which is transformed into a concrete workflow by
Pegasus, reduced afterwards and executed on the Grid using Condor's DAGMan
meta-scheduler.
Pegasus includes methods which take the error-proneness of Grid resources and the
opportunistic environment into account. It can do "just-in-time" planning, which means that
only partial abstract workflows are transformed into concrete workflows and scheduled;
once they have completed, the Grid state is evaluated again and the next partial workflow is
scheduled.
Triana ([1]) provides a graphical environment for workflow definition. Users can drag and
drop processes onto Triana's desktop and connect them with arrows, which indicate workflow
directions (figure 4). The process components are grouped into modules that provide
functionality for specific areas of computation (there are modules for GriPhyN, audio, image
processing and so on). Internally, Triana uses an XML format and directed cyclic graphs for
representation (which means loops are allowed). Triana can import WS-BPEL workflows
using "pluggable" readers.
The major advantage of Triana lies in its user-friendliness: users need not write XML syntax
for workflow specification but can use a GUI with existing data analysis tools that just have
to be piped together to form a result, not unlike the way Yahoo Pipes can be used for simple
mashups.

Of course, many other workflow enactment systems exist; enumerating and explaining them
all, however, goes beyond the scope of this document.
Pegasus and Triana were picked as examples because the former distinguishes between
abstract and concrete workflows and features virtual data and just-in-time planning, while the
latter comes with an intuitive user interface for modelling workflows and provides a nice
workflow counterpart to the Yahoo Pipes application mentioned in section 2.4.

3.5 Why use workflows at all?


To many people, the term workflow may sound like just another buzzword, and the efforts put
into making workflows available on the Grid may seem a futile attempt to follow a current
trend. However, workflow enactment engines provide some real advantages over traditional
job schedulers:

1) Clearly, workflows can help improve the general usability of Grids.
The vision of the Grid states that remote parallel processing should be available to
everyone and be considered as natural in the homes of the future as electricity from the
power grid is in the homes of today. However, if Grid applications are to be suitable
for a broad audience, they have to be easy to handle.
This can be achieved by adding further layers of abstraction like workflows. Triana,
for example, allows workflows to be described by dragging, dropping and combining
graphical representations of pre-implemented processes. Thus it shows the potential
of improving Grid usability by employing workflows, allowing everybody capable of
using e.g. a spreadsheet application to define his or her own Grid application.
2) Workflows often describe data transformations (filtering data, feeding data into an
equation, etc.) from input data to some desired result. In some cases, one might find
oneself in a situation where the output data of one workflow can serve as the input data
of another. For example, in figure 4, the gradient edge result is used as input to writeGIF.
This means that workflows, once defined, may be reused as part of larger workflows,
which may again be integrated into an even larger workflow (note that this is not
unlike the notion of subroutines in programming). So it becomes easier to create large
Grid applications once one has appropriate building blocks that are known to be
working.

Figure 4: Simple workflow modelled with Triana's GUI

3) Intuitively, obtaining computational results with the help of workflows should be
slower than just getting them with traditional job schedulers. However, the
computational overhead for the instantiation of concrete workflows, the allocation of
resources and the interpretation of the workflow language turns out to be negligible
compared to the costs of the non-trivial actual tasks. On the contrary, the workflow
engine may even have positive effects on the computation time: it might be more
reactive to changes in the Grid environment and perform faster reschedules in the case
of workflows containing many short-running jobs. In such cases, the use of Grid
workflow enactment engines may result in faster program execution.
4) If a Grid application is coded in a huge C source file or in a slightly smaller Java
program, it might prove difficult for anyone not involved in programming it to figure
out what the program exactly does. Even the programmer himself might have
forgotten much about the program when looking at it some months after creating it.
Even if the source code is well documented, it still takes at least some time to
understand the workflow behind it. By using simple XML description dialects or even
a graphical representation, one can be faster in understanding the application's
structure and explaining it to third parties.
5) Workflows give users the ability to abstractly model the execution order of, and the
dependencies among, the processes that are to be executed on the Grid. Each process
is treated as a black box, so different algorithms capable of solving a problem
can easily be exchanged within a workflow (by simply replacing the respective
program, Triana component, etc.). Hence, workflows allow specifying what needs to
be done in order to solve a problem, without specifying how it has to be done. This
adds much flexibility to Grid applications.
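To illustrate points 4) and 5), the following sketch shows what such a simple XML workflow description could look like. The dialect is invented for illustration (the element names workflow, task and depends do not belong to any particular engine) and is merely loosely modelled after DAG-based languages such as the Java CoG Kit's Karajan:

```xml
<!-- Hypothetical DAG-style workflow description (invented dialect).
     Each task is a black box; swapping the executable of "analyse"
     exchanges the algorithm without touching the workflow structure. -->
<workflow name="particle-analysis">
  <task id="fetch"   executable="/bin/fetch-data" args="run42.dat"/>
  <task id="analyse" executable="/bin/analyse"    args="run42.dat">
    <depends on="fetch"/>
  </task>
  <task id="plot"    executable="/bin/make-plots" args="results.dat">
    <depends on="analyse"/>
  </task>
</workflow>
```

Even without knowing the concrete engine, the structure (fetch, then analyse, then plot) is immediately visible, which is exactly the readability and exchangeability advantage argued for above.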

4. Conclusion
At first glance, there seems to be little similarity between Grid workflows and Web 2.0
mashups. A closer look, however, reveals some interesting common points as well as some
fundamental differences. In order to recall the most important characteristics of the two
emerging technologies, table 1 summarizes their key properties.

                     Web 2.0 Mashups                      Grid workflows

Objective            Reusing existing web applications    Modelling the way in which data
                     or data as part of a new             should be processed or obtained
                     application; combining               on the Grid
                     applications/data
Difference to        Abstraction through API calls        Abstract workflow descriptions
traditional          defined by the original              independent of actual Grid
approaches           application; no knowledge about      resources and/or the process
                     the application's internals          implementations
                     required
How is it done?      API calls, often using JavaScript    Modelling the workflow, mostly
                                                          in an XML language
Standardized?        No formal standard, but AJAX,        Proprietary solutions for most
                     REST and RSS are de-facto            engines, often some XML
                     standards                            language; WS-BPEL hardly ever
                                                          used
Current field of     Businesses and private users         Mostly academia
application
Direction of         Bottom-up (some service does it,     Top-down (scientific papers,
development          others follow)                       prototypes, ...)
Vision               Allow everyone to build a web        Allow everyone to build a Grid
                     application                          application

Table 1: Properties of mashups and Grid workflows

From the table it can be seen that the underlying technologies of Grid workflows and
mashups developed in different ways: while Grid workflows have a strong research
background and many scientific papers deal with the topic, AJAX followed a grassroots
development.
Interestingly, however, many Web 2.0 applications, and hence also mashups, use quite similar
programming paradigms, and AJAX itself has become integrated into programming frameworks,
making it a de-facto standard for modern web development.
Grid workflow languages, by contrast, do not follow a uniform specification standard: WS-BPEL,
a workflow language predestined for use with web services, is practically irrelevant
at present, and meta-schedulers like Condor's DAGMan are not really widespread in
the workflow community either.

However, Grid workflows and mashups share one very strong common point: both are
abstractions of existing technology. Mashups can be created by combining calls to the
programming APIs of various existing applications, while workflow engines allow specifying
what is to be done without knowing much about the Grid environment or the algorithms used
in the processing chain. Graphical tools exist in both fields of application: structured web
information can be combined using e.g. Yahoo Pipes, while standardized program
components can be combined using e.g. the workflow system Triana. Thus, both new paradigms
simplify the usage of existing technology (web application creation and Grid
computation, respectively) and hence contribute to the vision of universal participation
in those fields.


5. References
[1] D. Churches et al., "Programming Scientific and Distributed Workflow with Triana
Services", Concurrency and Computation: Practice and Experience, Vol. 18, pp. 1021-1037,
2006.

[2] DAGMan (Directed Acyclic Graph Manager):
http://www.cs.wisc.edu/condor/dagman/

[3] J. J. Garrett, "Ajax: A New Approach to Web Applications", Adaptive Path LLC,
18 February 2005.
http://www.adaptivepath.com/ideas/essays/archives/000385.php

[4] R. T. Fielding, "Architectural Styles and the Design of Network-based Software
Architectures", Dissertation, University of California, Irvine, 2000.

[5] SOAP Version 1.2, W3C Recommendation, 2007-04-27.
http://www.w3.org/TR/soap12-part1/

[6] G. von Laszewski, M. Hategan, "Grid Workflow - An Integrated Approach", proposal
for a book chapter, Argonne National Laboratory, 2005.
http://www.mcs.anl.gov/~gregor/papers/vonLaszewski-workflow-draft.pdf

[7] I. Foster, "Globus Toolkit Version 4: Software for Service-Oriented Systems",
IFIP International Conference on Network and Parallel Computing,
Springer-Verlag LNCS 3779, pp. 2-13, 2006.

[8] "Web Services Business Process Execution Language Version 2.0", OASIS
Standard, 2007-04-11.

[9] "The XMLHttpRequest Object", W3C Working Draft, 2007-06-18.
http://www.w3.org/TR/2007/WD-XMLHttpRequest-20070618/

[10] Google Maps API documentation:
http://www.google.com/apis/maps/documentation/index.html

[11] "Document Object Model (DOM) Level 3 Core Specification", W3C
Recommendation, Version 1.0, 2004-04-07.
http://www.w3.org/TR/DOM-Level-3-Core/Overview.html

[12] "About the W3C Document Object Model":
http://msdn2.microsoft.com/en-us/library/ms533043.aspx

[13] Flickr API: http://www.flickr.com/services/api/

[14] "The application/json Media Type for JavaScript Object Notation (JSON)", RFC
4627: http://tools.ietf.org/html/rfc4627

[15] "How To Make Your Own Mashup": http://www.programmableweb.com/code

[16] The Workflow Reference Model, The Workflow Management Coalition, January 1995.

[17] M. Hammer, J. Champy, "Reengineering the Corporation: A Manifesto for Business
Revolution", Harper Business, New York, 1993.

[18] G. Fox, D. Gannon, "Workflow in Grid Systems", Concurrency and
Computation: Practice and Experience, Vol. 18, pp. 1009-1019, 2006.

[19] G. von Laszewski, I. Foster, J. Gawor and P. Lane, "A Java Commodity Grid
Kit", Concurrency and Computation: Practice and Experience, Vol. 13, 2001.

[20] Java CoG Kit Workflow Guide:
http://wiki.cogkit.org/index.php/Java_CoG_Kit_Workflow_Guide

[21] E. Deelman et al., "Pegasus: Mapping Scientific Workflows onto the Grid",
presented at: Across Grids Conference, 2004.

[22] I. Foster, J. Voeckler, M. Wilde and Y. Zhao, "Chimera: A Virtual Data System for
Representing, Querying, and Automating Data Derivation", presented at: Scientific and
Statistical Database Management, 2002.
