
Computer Networks 35 (2001) 473–497

www.elsevier.com/locate/comnet

The Ninja architecture for robust Internet-scale systems and services
Steven D. Gribble *, Matt Welsh, Rob von Behren, Eric A. Brewer, David Culler,
N. Borisov, S. Czerwinski, R. Gummadi, J. Hill, A. Joseph, R.H. Katz, Z.M. Mao,
S. Ross, B. Zhao
Computer Science Division, University of California, Berkeley, CA 94720-1776, USA

Abstract
The Ninja project seeks to enable the broad innovation of robust, scalable, distributed Internet services, and to permit the emerging class of extremely heterogeneous devices to seamlessly access these services. Our architecture consists of four basic elements: bases, which are powerful workstation cluster environments with a software platform that simplifies scalable service construction; units, which are the devices by which users access the services; active proxies, which are transformational elements that are used for unit- or service-specific adaptation; and paths, which are an abstraction through which units, services, and active proxies are composed. © 2001 Elsevier Science B.V. All rights reserved.

Keywords: Distributed systems; Scalable services; Pervasive computing; Thin clients; Ninja architecture

1. Introduction

The emerging Internet landscape is populated by rich services of immense scale that are offered to a diverse spectrum of clients. This presents exciting opportunities for innovation in the kinds of services that can be created, but also presents tremendous design and engineering challenges. The traditional suite of information stores, commerce sites, network services, and search engines are being combined in novel ways to provide new services that aggregate and transform many sources of information. In addition, services are presenting themselves in a multitude of forms to match the particular capabilities of PCs, PDAs, Webphones, and other devices; this adaptation to diversity raises new notions of service composition and content transformation. Moreover, these new services may be utilized by millions of users.

In this opportunity for innovation and vast delivery lies a deep engineering challenge: a successful service may need to scale to huge levels of load over a short period and it must be continuously available. The Ninja project seeks to address these two goals (enabling broad innovation of service design, and easily constructing scalable, robust services) through a distributed service architecture that deals with huge throughput demands and availability requirements in a generic fashion, while facilitating service composition.

* Corresponding author. Present address: Department of Computer Science and Engineering, University of Washington, 114 Sieg Hall, Seattle, WA 98195-2350, USA. Tel.: +1-206-685-1958.
E-mail address: gribble@cs.washington.edu (S.D. Gribble).

1389-1286/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved.
PII: S1389-1286(00)00179-1

The distributed service architecture tackles the problem of ease of authoring scalable, robust services at several levels. At the network architecture level, structure is imposed on the Internet by partitioning it into three tiers (scalable service platforms, transformational intermediaries between devices and services, and the devices themselves) to facilitate state management and consistency while operating in the presence of failures. Deep processing power and persistent storage are provided within the service platform through the use of well-engineered clusters on fast, dedicated networks, while soft state and functional transformations are provided close to the devices. A service is rendered along a path crossing all the tiers. These paths are the natural unit of adaptation, optimization, and management.

At the language level, services are written in a type-safe language (Java) to reduce errors and to facilitate composition at well-defined interfaces. Code mobility is harnessed to dynamically upload services into the platform. At the system level, a platform provides a set of interfaces and dictates a programming discipline that yields efficiently pipelined services that are robust to excessive load, replicated to achieve high absolute throughput, and tolerant of node failures. Services describe themselves to a service discovery service, which itself must scale, so that they can be composed programmatically to yield new services. It is the structure and careful design of the overall platform that simplifies the task of authoring services, because they inherit the approach to scalability, availability, fault-tolerance, data consistency, and persistence from the platform.

We begin in Section 2 with an overview of the entire Ninja platform architecture and an introduction of its basic terms and concepts. Section 3 develops the core service platform, called a base, including the programming model for services, the execution vehicle, and the approach to scalable, persistent state. Section 4 describes the characteristics of emerging devices, called units, which fundamentally rely upon the infrastructure. Section 5 describes the role and function of the transformational intermediaries, called active proxies. Section 6 lays the foundation for service composition, showing how services describe themselves and locate other services in the wide area. Section 7 illustrates how services are composed across the platform tiers through the path concept. Section 8 puts these concepts together in four distinct services. The remaining sections discuss related work and future directions.

2. Overview of the Ninja architecture

In Fig. 1, we provide a high-level illustration of the Ninja architecture, decomposing it into four basic elements: bases, which are the rich environments that are engineered to support scalable and robust services; units, which are the numerous, heterogeneous devices that we wish to support; active proxies, which are transformational elements used for device- or service-specific adaptation; and paths, an abstraction used to compose the other elements. We motivate and explore each of these in turn.

Fig. 1. The Ninja platform architecture. The architecture consists of bases (services running on clusters of workstations), active proxies (stateless or soft-state intermediaries between units and services), units (heterogeneous devices and sensors), and paths (a composition chain across units, proxies, and services in bases).

2.1. Building robust services in bases

We define a service as software embedded in the Internet infrastructure that exports a network-accessible, typed, programmatic interface, and that provides strong operational guarantees (such as high availability). The task of building and maintaining services is extremely challenging, since if they are to be depended upon, they must have the essential properties of scalability, availability, fault-tolerance, and data consistency and persistence, all in the face of voluminous and potentially growing traffic demands. Unfortunately, there is currently a lack of suitable reusable building blocks and design methodologies for service construction.

We address this challenge in part by constraining the execution environment of services: we mandate that the core of the service must run in a well-engineered cluster of workstations, which we call a base. Clusters [3] are a natural platform for building Internet services: each cluster node represents an independent failure boundary, which means that replication of computation and data can be used to provide fault-tolerance. A cluster permits incremental scalability as nodes can be added to increase capacity. Coupled with high-performance system area networks, clusters can deliver excellent performance for relatively low cost. Modern cluster networks can achieve greater than 1 Gb/s throughput with 10–100 μs latency.

Designing software to run on clusters of workstations is known to be difficult [15]. To simplify the task of authoring new services, we have constructed a cluster-based software platform (called vSpace) that allows service authors to concentrate on application-specific functionality, rather than on details related to scalability, fault-tolerance, and composability. Services authored to run on the vSpace platform inherit the essential service properties from the platform, greatly reducing the size and complexity of service code.

vSpace supports the dynamic uploading of new services by trusted or untrusted third parties; we believe this open infrastructure is an important property necessary to sustain the distributed innovation that has led to the current success of the Internet. Authors can construct their services locally, but then upload their services into bases that are externally maintained.

2.2. Device diversity

The spectrum of network-attached client devices is growing in diversity and scale. In addition to PCs, laptops, and the now familiar class of PDAs and mobile devices, networks of even more resource-constrained tiny devices such as sensors and actuators are being attached to the Internet. This large family of client devices, which we call units, may have limited connectivity and low or intermittent bandwidth, poor computational abilities, and may be able to handle only a small set of data formats and network protocols. We believe that there will be a very large number of units attached to the network, reaching scales of hundreds of millions, and eventually, billions of devices.

Units, by nature, are typically not useful without supporting infrastructure. We assume that units can be easily lost or broken, implying that any state that they manage must be replicated in a durable environment, such as a service running in a base, which can provide vast amounts of highly available, durable storage. Inexpensive or small units may not have enough computational ability to handle the rich set of data types and the growing set of protocols deployed in the Internet, implying that such units must rely on surrogates that adapt content and protocols on their behalf.

Because some units are mobile, they may experience regular periods of disconnected operation.
Bases can assist such weakly connected units with the consistency management of data shared across units. Similarly, while a unit is disconnected, a service running on a base can act as the unit's surrogate by responding to requests based on the most recent information in the service's persistent data store.

2.3. Adaptation

The growing number of devices with Internet access capabilities presents a unique set of problems to the designers of Internet-based services. As the demand for continuous access to content is increasing, access to services is being demanded in new environments such as automobiles [26] and kiosks in airplane seats [25], and through new devices such as Web-phones. Constructing a service that can be easily and securely used from this diversity of contexts and devices is daunting, because of the huge variation in computational power, network connectivity, and interface capabilities of the devices. Additionally, the weak computational ability of small devices such as pagers and PDAs prevents them from using cryptographic protocols such as SSL to access secure services. In today's Internet, this diversity in client capabilities simply means that most services are inaccessible to clients other than standard home PCs or office workstations.

Rather than forcing services to adapt their content and access protocols to the abilities of all current and future devices, we place transformational intermediaries, called active proxies, between devices and services to shield them from each other. An active proxy can transform data types through a process called distillation, adapt protocols (e.g., by converting an SSL connection into a less expensive shared-key encrypted channel for CPU-constrained devices), or even adapt the value of content by removing sensitive information before content is displayed on an untrusted access point. Examples of active proxies include wireless basestations, network gateways, firewalls, caching proxies, and transformational proxies. Devices may migrate to a new geographic or administrative domain, and in the process may need to begin using a new active proxy.

2.4. The composition of services

Instead of constructing a set of isolated, vertical services that can handle a fixed set of devices, our architecture supports the dynamic composition of horizontal services into a path, as well as adaptation along that path. A path is a flow of typed data through multiple services across the wide area, including the interposition of transformational operators to adapt the data into the form expected by the next service or device along the path. An essential feature of services that enables path composition is programmatic access; services must export typed, programmatically accessible interfaces, as opposed to the untyped, unstructured user interfaces common to services today.

Since paths can be established dynamically, the path creation infrastructure can perform data flow optimization by examining many different potential paths before deciding on a particular one to use. During the course of this examination, it can weigh the costs of the various paths, and choose a path that optimizes for quality of service, resource consumption, or some other metric. By allowing the optimization process to continue through the lifetime of a given path, the infrastructure adapts the path to the changing characteristics of the execution environment. For example, if a network link becomes overloaded while data are flowing through the path, this flow may be redirected through a different channel to improve the quality of service.

A necessary step in forming a path is being able to locate services to place in that path. The Ninja architecture includes a service discovery service (SDS) that allows both human users and programs to locate appropriate services across the wide area based on service attribute queries. All services publish descriptions of themselves to SDS instances running in their local base. These instances are organized in a hierarchical structure, matching the administrative structure of the network. Summary information about known services is exchanged through this hierarchy; searches similarly propagate through the hierarchy until matching information is found.
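The text does not specify how the path creation infrastructure compares candidate paths. As one illustrative possibility, and not a description of the Ninja path infrastructure itself, the Java sketch below models units, active proxies, and services as nodes of a directed graph whose edges carry a single non-negative cost (for instance, estimated latency or load), and selects the cheapest path with Dijkstra's algorithm. The class `PathPlanner`, its methods, and the scalar cost model are hypothetical; a real planner would also need to check that each hop's output type matches the input type expected by the next hop.

```java
import java.util.*;

// Hypothetical planner: nodes are units, proxies, or services; edge weights are abstract costs.
class PathPlanner {
    private final Map<String, Map<String, Double>> hops = new HashMap<>();

    // Declares that data can flow from 'from' to 'to' at the given non-negative cost.
    void addHop(String from, String to, double cost) {
        if (cost < 0) throw new IllegalArgumentException("Dijkstra requires non-negative costs");
        hops.computeIfAbsent(from, k -> new HashMap<>()).put(to, cost);
        hops.computeIfAbsent(to, k -> new HashMap<>());
    }

    // Returns the cheapest node sequence from source to sink, or an empty list if none exists.
    List<String> cheapestPath(String source, String sink) {
        Map<String, Double> dist = new HashMap<>();
        Map<String, String> prev = new HashMap<>();
        PriorityQueue<Map.Entry<String, Double>> frontier =
            new PriorityQueue<>(Map.Entry.<String, Double>comparingByValue());
        dist.put(source, 0.0);
        frontier.add(Map.entry(source, 0.0));
        while (!frontier.isEmpty()) {
            Map.Entry<String, Double> head = frontier.poll();
            String node = head.getKey();
            double d = head.getValue();
            if (d > dist.get(node)) continue;   // stale entry: a cheaper route was already found
            if (node.equals(sink)) break;       // sink settled: its distance is final
            for (Map.Entry<String, Double> hop : hops.getOrDefault(node, Map.of()).entrySet()) {
                double candidate = d + hop.getValue();
                if (candidate < dist.getOrDefault(hop.getKey(), Double.POSITIVE_INFINITY)) {
                    dist.put(hop.getKey(), candidate);
                    prev.put(hop.getKey(), node);
                    frontier.add(Map.entry(hop.getKey(), candidate));
                }
            }
        }
        if (!dist.containsKey(sink)) return List.of();
        LinkedList<String> path = new LinkedList<>();
        for (String at = sink; at != null; at = prev.get(at)) path.addFirst(at);
        return path;
    }
}
```

For example, with hops pda→proxyA (cost 1), pda→proxyB (4), proxyA→mail (5), and proxyB→mail (1), `cheapestPath("pda", "mail")` returns `[pda, proxyB, mail]`, whose total cost of 5 beats the cost of 6 via proxyA. Re-running the search whenever edge costs change is one simple way to approximate the continued optimization over a path's lifetime described above.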
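The SDS query protocol is likewise left unspecified here. The sketch below is a hedged, in-memory illustration of the behaviour described for the SDS hierarchy: each node of a hypothetical `SdsNode` tree stores the service descriptions published to it, every node keeps a summary of all attribute terms published anywhere in its subtree, and a search starts at the local node and widens to successive ancestors until a match is found. Describing services as sets of `attribute=value` strings and keeping summaries as exact term sets are assumptions; a deployed system would likely use a more compact, possibly lossy, summary.

```java
import java.util.*;

// Hypothetical SDS node; a service is described by terms such as "type=printer".
class SdsNode {
    private final SdsNode parent;
    private final List<SdsNode> children = new ArrayList<>();
    private final Map<String, Set<String>> localServices = new HashMap<>(); // service -> its terms
    private final Set<String> summary = new HashSet<>(); // every term published in this subtree

    SdsNode(SdsNode parent) {
        this.parent = parent;
        if (parent != null) parent.children.add(this);
    }

    // Registers a service locally and propagates its terms into this node's and all ancestors' summaries.
    void publish(String service, Set<String> terms) {
        localServices.put(service, new HashSet<>(terms));
        for (SdsNode n = this; n != null; n = n.parent) n.summary.addAll(terms);
    }

    // Searches this subtree first, then widens to each ancestor's subtree until something matches.
    List<String> search(Set<String> query) {
        SdsNode alreadySearched = null;
        for (SdsNode n = this; n != null; alreadySearched = n, n = n.parent) {
            List<String> hits = new ArrayList<>();
            n.searchDown(query, alreadySearched, hits);
            if (!hits.isEmpty()) return hits;
        }
        return List.of();
    }

    private void searchDown(Set<String> query, SdsNode skip, List<String> hits) {
        if (!summary.containsAll(query)) return; // the summary rules out this whole subtree
        localServices.forEach((service, terms) -> {
            if (terms.containsAll(query)) hits.add(service); // one service must match every term
        });
        for (SdsNode child : children) {
            if (child != skip) child.searchDown(query, null, hits); // skip the subtree already searched
        }
    }
}
```

Because a summary only records that each query term appears somewhere in a subtree, the per-service check in `searchDown` is still needed to confirm that a single service matches every term.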

3. Bases: scalable platforms for Internet services

We have constructed a software platform that runs on cluster-of-workstation bases to help alleviate the challenges of scalable, high-performance service construction. The platform consists of a programming model and I/O substrate geared towards obtaining high concurrency, and a cluster-based execution environment (vSpace) that provides facilities for service component replication, load-balancing, and fault-tolerance. In addition, we provide services with a cluster-based, scalable storage platform (distributed data structures, or DDSs) that exposes a coherent image of persistent data across the physical nodes of a cluster. We describe each of these three elements in turn.

3.1. A programming model and I/O substrate for high-concurrency services

Popular Internet services must be able to handle a very high throughput, perhaps even reaching tens of thousands of requests per second in the extreme case. A service must remain robust under this extreme load, and it must also gracefully handle temporary bursts during which the offered load exceeds the capacity of the service. We call the process of achieving this robustness conditioning the service. A necessary (but not sufficient) step in conditioning is selecting an appropriate programming model and concurrency strategy that allows the service author and the service's execution environment to observe and manage constrained resources such as threads and client tasks.

Our programming model imposes a particular abstraction on services, illustrated in Fig. 2. Given a request from a wide-area client, the service processes that request through a sequence of logically distinct stages, each of which is separated by a high- or variable-latency operation. For example, a Web server might have three stages: reading and parsing an HTTP request from a browser, retrieving the requested file from the file system, and returning a formatted response to the browser. We impose the constraint that all data sharing between these stages is done using pass-by-value semantics, for example, through the exchange of messages containing the data to be shared. This constraint acts to decouple the stages, allowing them to be isolated from each other, and perhaps be physically separated across address spaces or physical machine boundaries.

Fig. 2. Splitting a service into stages. Our programming model views a service as a sequence of stages, separated by high- or variable-latency operations. Stages only share data using pass-by-value semantics, for example, by exchanging messages.

Given these separated stages, our programming model offers four design patterns that authors and the service infrastructure can apply to compose and condition these stages (Fig. 3):

Wrap. The wrap pattern places a queue in front of a stage, and assigns some number of threads to the stage in order to process tasks that arrive on the queue. The queue conditions the stage to load; excess work that cannot be absorbed by the stage's threads is buffered in the queue. This queue also serves to expose scheduling and admission control mechanisms to the stage: because the queue is apparent, the code in the stage can decide the order in which to process tasks, and it can also choose to drop tasks in the case of excessive or long-lasting overload. Because threads are dedicated to the stage, applying the wrap pattern allows the stage to execute independently of other stages.

Pipeline. The pipeline pattern takes a wrapped stage, and splits it into two pipelined, wrapped stages. Pipelining further decouples a stage, and allows for functional parallelism across processors or cluster nodes. Pipelining permits optimizations such as having a single thread repeatedly execute the same code while processing many tasks from a queue, thereby increasing instruction locality.

Combine. The combine pattern is the logical inverse of the pipeline pattern. Given two previously independent, wrapped stages, the combine operator fuses the code of the two stages into a single, wrapped stage. Combine permits resource sharing and fate sharing between these previously independent stages.

Replicate. Given a wrapped stage, the replicate pattern duplicates that stage on a number of independent processors or cluster nodes. Replication is used to eliminate bottlenecks; by replicating a stage, the resources that can be applied to its bottleneck are augmented, hopefully increasing the throughput of the stage. Replication also duplicates the stage's functionality across multiple failure boundaries, introducing the potential for fault-tolerance.

Fig. 3. The four design patterns. The four design patterns (wrap, pipeline, combine, and replicate) can be applied to stages of a service to condition it against load, failures, and limited or bottleneck resources.

We have implemented a programming library that makes it simple for both service authors and the service's execution environment to apply these patterns to pieces of code. All network communication and disk I/O provided by this library are built using a nonblocking, asynchronous, event-driven style of programming. This event-driven style nicely matches the task-driven composition of stages, and also permits each node to scale to the point where it can handle many thousands of concurrent tasks, network connections, and disk interactions.

3.1.1. Java-based I/O substrate implementation

The Ninja base architecture makes extensive use of the Java [19] programming language, which provides strong typing, platform independence, code mobility, and automatic memory management. These language properties are greatly beneficial for engineering robust Internet services, eliminating many common sources of bugs. Java also provides flexibility in terms of service deployment across multiple architectures. We make use of optimizing Java compilers including OpenJIT [30] and the IBM JIT compiler [27].

Implementing the base platform in Java presented two important challenges. The first was the lack of nonblocking I/O mechanisms in the Java core libraries. We overcame this by implementing our own nonblocking I/O library using native code wrappers to existing system calls, for example, nonblocking sockets and select. The second was providing efficient access to specialized interfaces, such as user-level network interfaces to the cluster's system area network. The native code interface provided by Java is ill-suited for these interfaces, as they require fast access to hardware resources and pinned I/O buffers outside of the Java heap. We have developed an extension to the Java environment, Jaguar [46], which performs a compile-time specialization of Java bytecode to perform low-level operations directly, while maintaining type safety and portability. We have used Jaguar to implement a Java interface to the VIA [7,41] cluster network interface, which obtains 80 μs round-trip latency and over 488 Mbit/s bandwidth over the Myrinet [32] system area network. This is equivalent to the performance of VIA as accessed from C and is more than an order of magnitude better than what is possible when using Java's native code interface. Jaguar has also been used to implement fast object serialization and memory-mapped file access.

3.2. The vSpace execution platform

vSpace is an execution environment for scalable Internet services which operates on a cluster of workstations (see Fig. 4). vSpace services are
constructed using the programming model described in the previous section; services are constructed as a graph of workers, each of which consists of a fixed-size thread pool, an incoming event queue, and a set of methods that implement the worker's logic. A vSpace service is described by a formal service definition, which precisely specifies the set of workers in the service, their code, and resource requirements. The act of service publication resolves intra-service dependencies and effectively "freezes" the code used by this particular version of the service. This allows the entire service to be treated as a versioned, immutable entity which is ready for deployment and composition with other services. Further modifications to the service code result in a new version of the service, and do not affect previously published versions.

Fig. 4. Software architecture of a base. A base consists of a cluster, the nodes of which run the vSpace execution environment. Services are implemented as a graph of workers which communicate through a typed task dispatch mechanism. vSpace load balances tasks across workers based on information from a cluster load monitor. Workers are replicated across nodes for scalability and availability, and share global state through a consistent, scalable storage platform (distributed data structures, or DDS).

Workers correspond directly to stages, as described in the previous section. Workers communicate by asynchronously pushing typed messages onto other workers' queues. Worker instances and workers of different types are pipelined, executing in parallel on the multiple CPUs and physical nodes in the cluster. vSpace uses the replicate design pattern to instantiate copies of workers across multiple cluster nodes; each worker instance, called a clone, uses the same code base and shares global persistent state through a distributed data structure (described below). A set of worker clones of the same type is called a clone group. vSpace automatically spawns and destroys clones in response to observed system load; workers' queue lengths and worker execution times are both used to determine the current load. Scalability and fault-tolerance are obtained by replicating clones across multiple physical resources (such as the nodes of the cluster), and by providing a mechanism for failure detection and clone restart.

A worker may send one or more outgoing tasks to a named clone group, in which case the outgoing tasks are load-balanced across the clones in that group. Optionally, the sender may specify a particular clone as the destination for a task. This is used as a locality optimization to allow the result of a previous task dispatch to return to the original sender.

3.3. Distributed data structures

A distributed data structure (DDS) [20] is a self-managing storage layer designed to run on a cluster of workstations at the scale required by Internet service workloads. A DDS has all of the previously mentioned service properties (high throughput, high concurrency, availability, incremental scalability, and strict consistency of its data), but provides a narrow data structure interface. Service authors see the DDS as a conventional data
structure, such as a hash table, a tree, or a log. Behind this interface, the DDS platform hides all of the mechanisms used to access, partition, replicate, scale, and recover the data in the DDS (illustrated in Fig. 5). The DDS greatly simplifies service construction by hiding the complexity of robustly managing scalable persistent state that is partitioned and replicated across the cluster.

We have implemented a distributed hash table as an example of a DDS. All operations on elements inside this distributed hash table are atomic, in that any operation completes entirely, or not at all. The hash table has one-copy equivalence, so although data elements in the hash table are replicated across multiple hash table nodes (or bricks), workers that use the hash table see a single, logical data item. Two-phase commit is used to keep all replicas coherent. We have not yet implemented transactions across multiple elements or operations, but we believe that the atomic consistency provided by our current distributed hash table is already strong enough to support a large class of interesting services.

To demonstrate the scalability and fault-tolerance of the distributed hash table, we have run a number of performance analyses on a large cluster of workstations (the UC Berkeley Millennium cluster [13], consisting of two hundred and twelve 500 MHz Pentium CPUs across 67 SMPs, each with either 500 MB or 1 GB of physical memory and two 15 GB hard drives, all connected by a Gigabit switched Ethernet). Fig. 6 demonstrates linear scaling in throughput of the distributed hash table as the number of brick nodes serving data is increased; note that for this experiment, most data were resident in a physical memory cache on brick nodes, rather than forcing a read from disk per request.

Fig. 5. High-level view of a DDS. A DDS is a self-managing data repository running on a cluster of workstations. All service instances (S) in the cluster see the same consistent image of the DDS; as a result, any WAN client (C) can communicate with any service instance.

Fig. 6. Throughput scalability. This benchmark shows the linear scaling of throughput as a function of the number of bricks serving in a distributed hash table; note that both axes have logarithmic scales. As we added more bricks to the DDS, we increased the number of workers using the DDS until throughput saturated.
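The wrap and pipeline patterns of Section 3.1 can be illustrated with standard `java.util.concurrent` classes. The sketch below is not the Ninja library's API: the class `WrappedStage`, its constructor parameters, and the drop-when-full admission policy are assumptions chosen to mirror the text's description of a queue placed in front of a stage, threads dedicated to that stage, and the option to drop tasks under overload. It relies on `ArrayBlockingQueue.offer`, which returns `false` rather than blocking when the queue is full. Tasks should be immutable values (for example, `String`s), in keeping with the pass-by-value constraint between stages.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical wrapped stage: a bounded task queue drained by threads dedicated to this stage.
class WrappedStage<T> {
    private final BlockingQueue<T> queue;
    private final Thread[] threads;
    private final AtomicInteger dropped = new AtomicInteger();

    WrappedStage(String name, int threadCount, int queueCapacity, Consumer<T> handler) {
        queue = new ArrayBlockingQueue<>(queueCapacity);
        threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                try {
                    while (true) {
                        T task = queue.take(); // wait for the next task on this stage's queue
                        try {
                            handler.accept(task);
                        } catch (RuntimeException e) {
                            // a failing task must not kill the stage's thread; a real stage would log it
                        }
                    }
                } catch (InterruptedException e) {
                    // interrupted by shutdown(): let the thread exit
                }
            }, name + "-" + i);
            threads[i].setDaemon(true);
            threads[i].start();
        }
    }

    // Admission control: reject (and count) a task instead of blocking when the queue is full.
    boolean enqueue(T task) {
        if (queue.offer(task)) return true;
        dropped.incrementAndGet();
        return false;
    }

    int droppedCount() { return dropped.get(); }

    void shutdown() { for (Thread t : threads) t.interrupt(); }
}
```

A pipeline is obtained by letting one stage's handler enqueue its output on the next stage, e.g. a `parse` stage whose handler is `s -> respond.enqueue(s.trim())`. Under the same assumptions, combine would call both handlers inside one stage, and replicate would place several identical stages behind a dispatcher.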
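The clone groups of Section 3.2 can be sketched in the same spirit. The text states only that tasks sent to a named clone group are load-balanced across its clones, using load information such as queue lengths, and that a sender may instead name a particular clone. The hypothetical `CloneGroup` below uses a shortest-queue policy, which is an assumption rather than vSpace's documented algorithm, and represents each hypothetical `WorkerClone` only by its queue of pending tasks.

```java
import java.util.*;

// Hypothetical clone: one replica of a worker, represented here only by its queue of pending tasks.
class WorkerClone {
    final String id;
    final Deque<Object> pending = new ArrayDeque<>();
    WorkerClone(String id) { this.id = id; }
}

// Hypothetical named clone group that load-balances outgoing tasks across its clones.
class CloneGroup {
    private final String name;
    private final Map<String, WorkerClone> clones = new LinkedHashMap<>();

    CloneGroup(String name) { this.name = name; }

    void addClone(WorkerClone clone) { clones.put(clone.id, clone); } // e.g., a clone spawned under load
    void removeClone(String id) { clones.remove(id); }                // e.g., after a detected failure

    // Default dispatch: choose the clone with the shortest pending queue (an assumed policy).
    WorkerClone dispatch(Object task) {
        WorkerClone target = clones.values().stream()
            .min(Comparator.comparingInt((WorkerClone c) -> c.pending.size()))
            .orElseThrow(() -> new IllegalStateException("clone group " + name + " is empty"));
        target.pending.add(task);
        return target;
    }

    // Locality optimization: route a task to one named clone, e.g. the original sender.
    WorkerClone dispatchTo(String cloneId, Object task) {
        WorkerClone target = clones.get(cloneId);
        if (target == null) throw new NoSuchElementException("no clone " + cloneId + " in " + name);
        target.pending.add(task);
        return target;
    }
}
```

With two empty clones, two `dispatch` calls place one task on each. A following `dispatchTo` on the first clone, as when a result must return to its original sender, lengthens that clone's queue, so the next `dispatch` goes to the second clone.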
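Finally, the atomic, one-copy-equivalent writes of the distributed hash table in Section 3.3 can be illustrated with a minimal two-phase commit over in-memory replicas. This is a single-threaded sketch under stated assumptions, not the DDS implementation: the classes `Brick` and `ReplicatedHashTable`, hash-modulo partitioning of keys across replica groups, and the rule that a write aborts whenever any replica is unavailable are all illustrative choices, and failure recovery is not modelled.

```java
import java.util.*;

// Hypothetical storage brick holding replicas of some hash-table partitions.
class Brick {
    final Map<String, String> data = new HashMap<>();
    private final Map<String, String> staged = new HashMap<>(); // writes voted "yes" but not yet committed
    boolean up = true;

    // Phase 1: vote yes only if this brick is up and no other write to the key is in flight.
    boolean prepare(String key, String value) {
        if (!up || staged.containsKey(key)) return false;
        staged.put(key, value);
        return true;
    }

    // Phase 2, commit: make the staged value visible.
    void commit(String key) {
        String value = staged.remove(key);
        if (value != null) data.put(key, value);
    }

    // Phase 2, abort: discard the staged value.
    void abort(String key) { staged.remove(key); }
}

// Hypothetical coordinator: partitions keys over replica groups and writes each key with two-phase commit.
class ReplicatedHashTable {
    private final List<List<Brick>> replicaGroups;

    ReplicatedHashTable(List<List<Brick>> replicaGroups) { this.replicaGroups = replicaGroups; }

    private List<Brick> groupFor(String key) {
        return replicaGroups.get(Math.floorMod(key.hashCode(), replicaGroups.size()));
    }

    // Atomic put: every replica in the key's group applies the write, or none does.
    boolean put(String key, String value) {
        List<Brick> group = groupFor(key);
        List<Brick> prepared = new ArrayList<>();
        for (Brick brick : group) {
            if (!brick.prepare(key, value)) {
                for (Brick p : prepared) p.abort(key); // undo only the "yes" votes
                return false;
            }
            prepared.add(brick);
        }
        for (Brick brick : group) brick.commit(key);
        return true;
    }

    // Any live replica can answer a read, because committed replicas hold identical values.
    String get(String key) {
        for (Brick brick : groupFor(key)) if (brick.up) return brick.data.get(key);
        return null;
    }
}
```

If one of three replicas is marked down, `put` returns `false` and the other two replicas keep their previous value, so in this sketch a reader sees either the old value on every replica or the new value on every replica.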

Fig. 7. Availability and recovery. This benchmark shows the read throughput of a three-node hash table as a deliberate single-node fault is induced, and afterwards as recovery is performed.

In Fig. 7, we show the read throughput of a three-node distributed hash table as a fault is deliberately induced in one node, and as that failed node undergoes recovery. This figure shows that the read throughput of this hash table degrades to two-thirds of its initial throughput as one of the three nodes crashes, but quickly returns to its full throughput as the crashed node completes its recovery.

We have also experimented with scaling the capacity of a distributed hash table by creating and populating a single hash table with over 1 TB of data spread over 128 CPUs and 128 disks. This 1 TB hash table took 1.5 h to populate, achieving a write throughput of 256 MB/s (2 MB/s per disk). The disk write performance was seek-limited, as random keys were inserted into the hash table for this experiment.

Our DDS implementation makes use of the exposed queues and events (as described in Section 3.1) to implement efficient internal task scheduling. Exposing the queues to the DDS code makes it possible for each DDS brick to "peek" into its queue of incoming requests and schedule them based on resource availability. For example, incoming read requests for which data are present in the buffer cache can be scheduled before those requiring disk access. This technique leads to higher throughput, as head-of-line blocking is reduced. The use of event queues also makes it possible to reorder disk accesses for greater locality and to perform prefetching, similar to optimizations used in filesystems and database storage managers.

4. Units

The space of units is extremely diverse, with large variations in CPU, memory, and storage capabilities, communication bandwidths and latencies, and user interfaces. In this section, we briefly circumscribe this space by describing a representative set of units. We then focus on a particularly interesting new class of units, networked sensors, which are the most constrained in terms of capabilities and resources.

PCs and laptops are examples of extremely capable units, in that they have liberal amounts of CPU and memory resources, persistent storage, and sophisticated display capabilities. However, laptops still must be capable of dealing with mobility, disconnected operation, and low-bandwidth or unreliable communication over wireless networks.

PDAs represent a class of device with limited computation, displays, user interfaces, and persistent storage. Cell phones are currently distinct from PDAs in that they have much more limited computational abilities and they are essentially continuously connected to the network. There is, however, a strong trend towards the convergence of PDA and cell phone capabilities, yielding a class of units that has the minimal graphical user interface, storage, and programmability of a PDA, but with the continuous connectivity of a cell phone.

The most limited form of units that we consider are networked sensors and actuators. These devices have extremely limited computational
482 S.D. Gribble et al. / Computer Networks 35 (2001) 473±497

resources, almost no storage capabilities and no human interface. In addition to limited communication bandwidth, communication is extremely expensive for these devices, since power is their most critical resource and communication consumes significant power. As an example of networked sensors, we have developed a device, called a mote, that contains a microcontroller, a radio, and light and temperature sensors. This device (which is slightly larger than a quarter) can be placed in the environment or carried by an individual, and reports information collected from its sensors to services for analysis. We report further on our experiences with this device below.

4.1. Characteristics of networked sensors

The characteristics of networked sensors require a design methodology focused on extreme efficiency, both in terms of computation and power. As an example of a networked sensor, we have assembled a "mote" that includes an ATMEL 8535 4 MHz microcontroller with 512 B of SRAM and 8 KB of flash memory, an RF radio with 10 kbps throughput, a light and temperature sensor, and three LEDs for visual feedback of information (Fig. 8).

Fig. 8. A TinyOS-based mote. This "mote" includes a 4 MHz microcontroller, a software-driven radio, and an application that coordinates with neighboring motes to discover an ad hoc sensor network routing topology.

Somewhat surprisingly, the programming model that we have designed for these tiny devices is very similar to that of high-throughput services in vSpace, although we use this model for the sake of power and computational efficiency, rather than throughput and load conditioning. Power is the most precious resource on these devices, and communication is the most expensive operation in terms of power consumption. Given this, the ability to put hardware components into a standby state can save significant amounts of power. To maximize the opportunity for putting the device's CPU into standby, we have developed an event-driven software architecture for our operating systems and applications. The presence of blocking operations could limit the system's ability to switch into a low-power mode, especially if hardware polling is used to complete a blocking operation. In contrast, with an event-driven system, all processing occurs in response to hardware events. This allows the processor to enter standby mode between events, as no computation needs to be done until the next hardware event occurs.

We have also observed that our networked sensors must be able to handle significant amounts of concurrency. Sensors are typically I/O-centric, and must be capable of supporting multiple, simultaneous flows of information. Flows can be local to a sensor (e.g., the interaction between a CPU and the physical sensor devices or the radio used for communication), or they may span multiple sensors in a sensor network. For example, networked sensors may cooperate to propagate each other's data towards a central collection point. In this case, the microcontroller's interaction with its sensors must be overlapped with its operation of the radio and execution of networking protocols. To exacerbate the situation, on many microcontrollers the CPU must directly interact with the radio (compared with PCs, which typically have dedicated NICs to service the communications device), introducing real-time constraints.

To address these challenges, we have developed the TinyOS operating environment for networked sensors. TinyOS has a component-based architecture in which each hardware and software component exports an interface that contains the set of commands that it accepts as well as the set of events that it fires (Fig. 9). Internally, a software component is given a statically allocated storage frame. While handling a command, a component can emit tasks that TinyOS's scheduler must execute. Tasks are similar to vSpace workers, but they share the state of the component that created them rather than sharing state through the passing of typed
Java objects. Unlike general-purpose threads, TinyOS tasks execute to completion and are atomic with respect to each other. TinyOS includes a two-level scheduling mechanism that allows high-priority events to preempt low-priority tasks. Real-time constraints (e.g., servicing the radio) are met by using high-priority events, while less critical operations (such as gathering data from a temperature sensor) are serviced with the remaining low-priority CPU time.

Fig. 9. A TinyOS software component. Each TinyOS software component accepts and emits commands and events. Commands flow from higher level layers to lower levels, and events flow from lower level layers to higher levels.

The use of the TinyOS component model and scheduler greatly simplifies the composition of multiple components on a sensor. To demonstrate this, we have built an application in which our motes self-assemble in an ad hoc network and communicate their routing information to a statically configured active proxy node. This routing discovery application, as well as the particular operating system tuned for this hardware, is composed of several TinyOS components. The components are composed together using a CAD tool (which represents commands and events with CAD symbols), and structural VHDL is exported by the CAD tool. This VHDL is used at compile time to assemble the system image that is downloaded into the device's flash memory. A particularly interesting feature of these devices is the ability to wirelessly reprogram them by sending system images over the sensor network.

5. Active proxies

Active proxies in the Ninja architecture serve the purpose of performing "impedance matching" between client devices and services by adapting data and access protocols to the devices' and services' needs. Because active proxies can execute in an environment local to devices, they can perform context-aware optimizations and transformations on behalf of devices. We believe that active proxies bring three essential properties to the Ninja architecture: dynamic service adaptation, secure access to information, and the fusion of multiple devices. We describe each of these in turn.

5.1. Dynamic service adaptation

In the Ninja architecture, active proxies assume the responsibility for mitigating the heterogeneity of units by translating both network protocols and data formats between clients and services. At the network protocol level, active proxies can communicate with clients through protocols specially designed for low-computation, low-power, or poorly connected devices. This is important since common service communication protocols such as Java RMI, Ninja RPC, and Jini assume that clients are well connected and computationally powerful. Similarly, active proxies can be used to help establish connections between clients and services by performing more complicated tasks associated with cryptographic handshakes [16]. Additionally, active proxies can distill service content into a format more suitable for small devices [17]. Content presentation can be tailored for small screen layouts, and image resolution and bit-depth can be reduced to suit both the limited display and the limited network capabilities of these devices [15,17]. For example, an HTML representation of the content can be rendered as WML for a WAP [44]-enabled phone, as a custom application format, or even as voice. Active proxies may perform this filtering at
the application level (e.g., by selectively dropping MPEG frames in a video stream), or at the protocol level (e.g., by delaying or compressing data to increase actual or perceived throughput, based on knowledge of the network conditions that the device is currently experiencing).

5.2. Secure service access from diverse clients

Current security models of infrastructure services assume that both the user's access device and the software running on it can be trusted not to intercept or send private information elsewhere. Unfortunately, this is not the case for many access points, including public kiosks. A subverted kiosk is able to record all keystrokes (such as typed passwords), monitor network traffic to extract personal information such as account numbers or mailing addresses, or perform active attacks by hijacking connections, even if the network transmission is encrypted. To avoid such attacks, trusted active proxies can perform context-aware transformations on data before it arrives at a kiosk to reduce the content value. Proxies can also introduce alternative authentication mechanisms (such as one-time passwords) so that users will not need to divulge passwords or other personal information to untrusted infrastructure. In Section 8.2, we describe in detail an example framework that has this functionality.

PDAs are problematic because they are generally power-constrained, computationally limited devices with little memory and poor networking capabilities. Performing the industry-standard SSL handshake on one such device (a Palm Pilot) takes 5–10 s. This latency imposes an intolerable delay on connection setup, which is particularly undesirable if network connectivity is intermittent. An SSL implementation that uses elliptic curve cryptography [8] is feasible on a Palm Pilot V, but few Internet services support that option. Active proxies can be used to adapt the security requirements of services to the capabilities of the device. Trusted active proxies can present units with power- and computation-efficient security protocols, while communicating with end services through standard protocols.

5.3. Multiple device fusion

In addition to enabling basic access, active proxies can be used to combine the capabilities of several devices. This is useful for both content and security adaptation. For example, the limited GUI of a PDA can be supplemented by the richer, larger display of a public kiosk, by placing most of the application on the kiosk while displaying and entering sensitive personal information on the PDA. Entering form data using a pen-based interface is tedious at best and even more cumbersome using number pads on devices such as cellular telephones. Active proxies can split the trust between the PDA and the public terminal by fusing the devices together to provide one logical channel with secure access to the end service. This device fusion is possible only because the active proxy is aware of the context in which the devices are being used, and thus serves as another example of context-aware adaptation.

6. Service location across the wide area

The service discovery service (SDS) [10] serves two important and complementary roles: it provides a mechanism by which services can announce their presence to the infrastructure, and it provides a mechanism by which both human users and programs can locate these announced services across the wide area. While designing the SDS, we focused on providing a fully secure, semantically rich service location system that would successfully scale to the wide area. The SDS is a scalable, fault-tolerant, and secure information repository, providing clients with directory-style access to all available services. Services describe themselves to local SDS instances; these descriptions are published and aggregated across a wide-area hierarchy, and clients can query this hierarchy of SDS instances in order to locate services.

In addition to serving as a location mechanism, the SDS also plays an important role in helping clients determine the trustworthiness of services, and vice versa. This role is critical in an open environment, where there are many opportunities for misuse, both from fraudulent services and
misbehaving clients. To address security concerns, the SDS controls the set of agents that have the ability to discover services, allowing capability-based access control, i.e., hiding the existence of services rather than (or in addition to) disallowing access to a located service.

As a globally distributed, wide-area service, the SDS must surmount challenges that are not faced by services that operate solely inside a base. The global SDS service must be robust against network partitions and component failures, it must address the potential bandwidth limitations between remote SDS entities, and it must arrange its individual SDS instance components into a hierarchy to distribute the query workload (implying that queries must be routed across this hierarchy).

6.1. Design

The SDS system (see Fig. 10) is composed of three main components: clients, services, and SDS servers. Clients want to discover the services that are running in the network. SDS servers solicit information from the services and then use it to fulfill client queries. To provide scalability in both the number of services and the volume of client requests, SDS servers are organized into a hierarchical structure. Services and requests are associated with SDS servers according to each server's domain, the network extent that it covers.

Fig. 10. Components of the Berkeley SDS. Dashed lines correspond to periodic multicast communication between components, while solid lines correspond to one-time transport connections.

To propagate information across potentially heterogeneous service architectures, we use an "announce/listen" model of data propagation. Servers cache data contained in periodic multicast messages. Component failures are tolerated through the course of normal operation, removing the need for a separate recovery procedure: recovery is enabled simply by listening to channel announcements [2].

To provide both flexibility and simplicity in the service query mechanism, the SDS uses XML to encode both service descriptions and queries. XML allows the encoding of arbitrary structures of hierarchical named values, and supports validating service descriptions against well-defined schemas (document type definitions). This mechanism lets SDS servers validate selected service descriptions while still allowing existing service description schemas to evolve.

SDS servers are responsible for sending authenticated messages to the well-known global SDS multicast channel, including announcing multicast addresses to be used for service announcements, the rate at which announcements should be repeated, and contact information for the certificate authority and the capability manager. On each channel, a measurement-based periodicity estimation algorithm determines the optimal send rate for messages, that is, the rate that produces the desired trade-off between update latency and bandwidth utilization. Each server can spawn a child server in order to hand off a portion of its load. Parents monitor child servers through heartbeat packets, and will restart a crashed child
server. Because servers keep a cache of announced service descriptions, a restarted server restores its data by listening to the channel.

To provide authentication, privacy, and access control, SDS servers work with a certificate authority (CA) and a capability manager (CM) using secure communication protocols. The CA is a trusted source which provides proof of the binding between a principal and its public and encryption keys, in the form of a certificate. The CM manages individual access control lists (ACLs) on behalf of each authenticated service. Communication between SDS components uses appropriate security measures while minimizing the performance penalty. SDS server announcements require authentication and are therefore signed, with an embedded timestamp. Service providers encrypt their description broadcasts with a symmetric key, which accompanies the message as a data block encrypted with the server's public key. This allows caching of the symmetric key in the common case, and simple recovery of the symmetric key during failure recovery. Communication among servers and clients relies on a separate authenticated transport channel.

A client queries its SDS server over an authenticated channel, passing in an XML service template and its access rights in the form of capabilities. The server uses an internal XML database (called XSet [51]) to find services accessible to the client that satisfy the query, and returns them to the client.

In order to make their services known, service providers listen on the global multicast channel to routinely determine which SDS server is currently responsible for them. Providers periodically broadcast service descriptions to a multicast channel, using the address and broadcast rate defined by the server announcement messages. Providers are also responsible for contacting a capability manager and defining access control information for their services.

6.2. Wide-area operation

The SDS wide-area hierarchy is designed to scale up in both query volume and number of available services, while adapting to changes in the underlying entities. The primary goal is to allow queries from all clients to reach services on all SDS servers. In our approach, servers dynamically arrange themselves into a multi-level hierarchy, summary information is propagated up to parent servers, and queries are partitioned among and forwarded to the relevant servers.

The actual organization of the hierarchy can depend on many criteria, such as administrative domains or network topology. We believe that the mechanism should support the existence of multiple hierarchies, and actual usage should be based on policy. Individual servers can choose to participate in more than one hierarchy by keeping multiple routing tables, one for each hierarchy.

To prevent upper-level servers in the hierarchy from being overwhelmed by update or query traffic, the SDS architecture filters information while it propagates upward. In particular, the information is summarized in a way that allows queries to determine which, if any, branch contains potential matches.

To accomplish this lossy aggregation, we use hash summarization, where information is summarized using a unique N-to-M mapping of data values. Complicating this procedure is the SDS's use of the subset query model, where matching documents can be identified by a partial list of service characteristics. Our solution is to hash a limited number of tag subsets, each subset containing a single tag or a cross-product of two tags. This limits the computation required for summarization. To address the issue of storage space for summarizations, we use Bloom filters [5]. Bloom filters collapse hashed summarizations into a fixed-size table, accepting a greater possibility of false positives in return for lower storage requirements.

In summary, SDS servers dynamically organize themselves into potentially multiple hierarchies for data partitioning and query routing. Each server uses multiple hash functions on various subsets of tags in service announcements, and uses the results to set bits in a bit vector. Servers that are internal nodes in the hierarchy combine the bit vectors from themselves and their child servers, and associate the result with this branch at their parent node. After receiving a query, each server checks its own bit vector for a match, and failing that checks its children's vectors to determine which branch to forward
the query to. A server resolves a query against the vector by hashing it multiple times and checking whether all the matching bits are set in the bit vector. A missing bit guarantees a true miss, while a match could signal either a false positive or a true hit.

The distribution of data across the wide area exposes a trade-off between consistency and performance. Strict consistency is difficult to achieve in the face of frequent updates, given the wide area's constraints on network bandwidth, transmission latency, and the greater possibility of network partitions. Therefore, the SDS system provides loose consistency guarantees about service location information across the wide area.

6.3. Performance

We have measured the performance of a single SDS server on an Intel Pentium II 350 MHz with 128 MB RAM, running on Linux 2.0.36 using the Blackdown JDK 1.1.7 and the TYA JIT compiler. The results are presented in Table 1. This table shows that the primary sources of latency are the authenticated transport connections and capability checking using the Cryptix Java security library. We expect both of these components to decrease significantly as a result of ongoing research. Furthermore, XML query processing has been shown to scale logarithmically with the size of the data set [51]. Finally, using these performance numbers, we estimate that a single SDS server (using off-the-shelf components) can handle a user community of about 500 clients sending queries at a rate of one query per minute per client.

Table 1
Secure query latency breakdown

Description                             Latency (ms)
Query encryption (client-side)                   5.3
Query decryption (server-side)                   5.2
Authenticated transport overhead                18.3
Query XML processing                             9.8
Capability checking                             18.0
Query result encryption (server-side)            5.6
Query result decryption (client-side)            5.4
Query unaccounted overhead                      14.4
Total (secure XML query)                        82.0

7. Paths: composition of services across the wide area

The primary goal of paths is to facilitate the composition of services. To be most useful, the infrastructure should attempt to automate as many parts of the path creation process as possible. In our design, an automatic path creation (APC) facility automates the task of finding paths between system components, creating the network connections between components, fine-tuning the performance of the data flow, and handling error conditions. Whenever possible, the APC facility protects users from the failure of individual path components or communication links. The ideal situation would be to provide the illusion that the user is accessing a single robust service providing the composed functionality. Because the APC facility handles large numbers of concurrent users, we designed its path construction algorithms to scale well as the number of components increases, even though the number of possible paths may grow exponentially as components are added.

A path is composed of a sequence of operators that perform computations on data and connectors that provide protocol translations between operators. A connector is a channel through which operators can pass application data units (ADUs). The connector hides potential differences in network protocols from the operators, and allows them to communicate as long as the output data type of the upstream operator matches the input data type of the downstream operator. Each connector is characterized by a specific transport protocol.

Operators perform computation on data flowing along the path. Operators are strongly typed: they have a clear definition of the input they accept and the outputs they produce. Operators have various attributes such as supported communication protocols, computational requirements, or required external data (e.g., a remote database). In addition, operators have associated cost metrics, which describe the run-time performance of the operator and are used for optimization during path creation. The type and attributes for each operator are combined to form an XML description of the operator. These descriptions are used to
determine which combinations of operators could make a valid path.

The Ninja architecture provides two classes of operators: long-lived and dynamically created. Long-lived operators are standard Ninja services, and hence support both the data persistence and fault-tolerance properties previously discussed for services. These are registered with and located through the service discovery service (SDS). Dynamically created operators are lightweight, short-lived transformation elements created by the APC facility as required. These operators, which run in active proxies, have only soft state and hence can simply be restarted if the active proxy fails.

While the reliability of both long-lived and dynamic operators helps to guarantee that a path can be reconstructed when a failure occurs, this does not safeguard against the loss of data that was already in the path when the failure occurred. Hence, applications that use paths must provide their own mechanisms for guaranteed or in-order data delivery if this is required.

7.1. An example of a path

As a motivating example, consider a map service that provides driving directions in response to a user-specified address. This example illustrates the composition of two operators with a service, and shows how active proxies that are selected by the path creation process are used to perform protocol and data format translations between clients and services. To allow access to the overall audio driving direction service from a cell phone, the APC facility might create a path as follows:
1. The user initiates a call from a cellular phone. The user speaks the address to which she wishes to get driving directions. An RTP-based audio connector is used to send this audio to the first operator in the path.
2. A speech-to-text operator, running in an active proxy, is used to convert the spoken audio into structured text using a grammar specifically chosen for this context (address input). The structured text emitted from this operator is passed along a TCP-based reliable bytestream connector to a map service.
3. A map service, running in a base, receives the address and returns structured text representing driving directions to the specified address. These directions are passed along a TCP-based reliable bytestream connector to the next operator in the path.
4. A text-to-speech translator, running in the same active proxy as the speech-to-text operator, transforms the textual driving directions into audio. An RTP-based audio connector is used to send this audio to the user's cell phone.
5. The user hears the driving directions spoken to her over her phone.

7.2. Path construction

To create a Ninja path, a user provides the APC facility with a specification of the endpoints of the required path, a partially ordered list of operators that must be included in the path, and an acceptable range of costs for the path in terms of latency, computation, or memory requirements. This information is used to construct an optimal path for the user's specific requirements. The path construction process consists of four steps. As shown in Fig. 11, path construction is a process of continuous feedback and optimization. The details of each step are described below.

Fig. 11. Path construction process. Path execution is an iterative process of optimization. The Ninja APC facility guarantees the availability and fault-tolerance of a constructed path by rebuilding its physical or logical path.

Step 1: Logical path creation. A logical path consists of an ordered sequence of operators that
are joined with connectors. During logical path creation, the APC facility searches through the XML descriptions of the operators to find valid sequences that could perform the computation requested by the user. The result of logical path creation is a list of possible operator sequences. Note that since some operators may be commutative (image format transcoders, for example), the space of all possible logical paths is large. Hence, only a small number of logical paths are considered initially.

Step 2: Physical path creation. A physical path is a mapping of a particular logical path onto physical nodes which execute the operators. Nodes for long-lived operators are chosen from the known services that provide the desired functionality, as located using the SDS. Nodes for short-lived operators are chosen according to the computational capabilities of the node and the cost of using that node in the path. The APC facility constructs a physical path from a logical path by finding the lowest-cost nodes that meet the user's requirements.

Step 3: Path instantiation and execution. Once the nodes of the path have been selected, the APC facility starts any required dynamic operators and sets up appropriate connectors between the various operators. Once all nodes in the path are set up, data flow is started. In addition, a control channel (used for reporting error conditions and performance information) is established between the operator nodes and the APC facility. During the lifetime of the path, the APC facility actively monitors the operator nodes to make sure that they are functional. Operator nodes report problems with their neighboring nodes in the path to the APC facility, so that the path is repaired when necessary. The APC facility monitors the performance of the path, and reroutes the data flow if new conditions make the original path suboptimal. The control channel is used for exception handling, controlling parameters of path components, and monitoring and analyzing path performance; thus, it needs to be independent of data paths and highly robust.

Step 4: Path tear-down. When a path is no longer needed, the user informs the APC facility that it should be removed. The APC facility then stops the data flow, removes connectors, and shuts down any dynamic operators. As a performance optimization, the APC facility may cache commonly used logical and physical paths for reuse at a later time.

7.3. APC implementation and evaluation

We have developed an initial prototype of the APC facility that supports both long-lived and dynamic operators. In addition, we have a special class of dynamic operators that can be used to wrap existing services. This allows the APC system to make use of older services that cannot communicate directly with our connectors.

Each operator holds references to an input connector and an output connector, each of which speaks a specific transport protocol. All connectors implement a common Java interface. To interact with previous and subsequent operators in the operator chain, each operator invokes the read and write methods of this interface to receive its input data and send its output data. TCP, UDP, and RTP connectors are supported in the current prototype.

Our current implementation encompasses the full range of path creation described previously. Logical paths are created by searching the XML descriptions of the available operators to find the smallest number of operators that can perform the desired data flow. A physical path is then selected by placing operators on the least loaded nodes of the network.

Machine failures are automatically detected by the APC service, and running operators are restarted on other nodes. Fault detection is achieved either by time-out of a heartbeat beacon or by catching an I/O exception when reading or writing data from or to the failed machine. Our prototype does not presently exploit the possibilities for performance tuning through dynamic reconstruction of paths.

8. Example services

Having completed the description of the Ninja architecture, in this section of the paper we describe a number of interesting applications that we
have built on top of it. These applications demonstrate the capability of the Ninja architecture to facilitate the simple construction of robust, scalable services that are accessible by a diversity of devices. They also illustrate the opportunity that our architecture provides for the widespread innovation of both services and devices.

8.1. The Ninja Jukebox

The Ninja Jukebox [18] was an early application built using our architecture, and it demonstrates some of our platform's key features. The Jukebox allows a community of users to build a distributed repository of digital music, and provides a collaborative filtering mechanism based on users' music preferences. Cluster nodes are harnessed to rip MP3 files from their local CD-ROM drives, and to act as servers for streaming MP3 to clients. One node acts as the music directory and maintains a soft-state index of the songs published by each cluster node; the Jukebox client application contacts the directory to obtain a list of songs, and streams MP3 directly from the appropriate node using HTTP.

The Ninja Jukebox is based on MultiSpace [21], an early design prototype of the base service platform. MultiSpace nodes, each running a JVM, communicate through the use of NinjaRMI, an extensible variant of Java remote method invocation [38]. Each component in the Jukebox application exports a NinjaRMI interface, which is invoked either internally to the cluster or externally by the Jukebox client application (which also makes use of NinjaRMI). NinjaRMI provides support for strong authentication and encryption, which is used to control access to the Jukebox service. Each song in the Jukebox can have an associated access control list (ACL) authorizing a particular set of users to listen to it.

Constructing this application as a set of strongly typed, distributed components greatly simplified service construction and facilitated evolution, as new components could be added to the service as needed. An example of service evolution was our addition of the Jukebox query engine [45], which allows users to search for music in the Jukebox based on musical similarity between songs. The user provides a query song and a set of parameters to use for the search, as well as the number of results to return; the query engine returns the songs in the Jukebox that sound the most similar to the query song. The search is a k-nearest-neighbor search in a multidimensional space of features previously extracted from each song. The query engine runs on the same MultiSpace platform as the Jukebox itself, and its user interface is integrated into the Jukebox client.

8.2. An active proxy framework for accessing services through untrusted devices

A more general service that we have implemented is an active proxy framework that provides secure multi-modal access to Internet services from units [23]. Consider the case of users accessing their stock trading accounts from public access terminals. Instead of relying on the terminal to protect their secure information, users can direct private or sensitive information, such as portfolio values or account numbers, to their personal PDA, while using the rich GUI capabilities of the public terminal to initiate requests and display generic stock information (e.g., stock price fluctuations and historical graphs). Users initiate trading operations through the untrusted public terminals, but then confirm them using their trusted portable devices. Network connections to the users' PDAs are provided either by the environment, as with kiosks that offer infrared network connections, or by the devices themselves, for example, by directly initiating a connection from a wireless-data-enabled PDA.

The proxy is implemented as a collection of vSpace workers that abstract the functionality of security adaptation, service adaptation, and device fusion. By combining generic content and security transformation functions with service-specific rules, the proxy architecture decouples device capabilities from service requirements and simplifies the addition of new devices and services. The service uses XML as a standard data representation; one vSpace worker transforms requests from the untrusted access device into an XML
representation. Another worker provides access control filtering on these requests, possibly injecting secure information into the request: for example, this worker may convert a one-time password provided by the user through the untrusted terminal into a password that must be supplied to the service being accessed.

A third worker transforms the request from its generic XML representation into whatever protocol is necessary to access the service. For example, if the service is web-based, this worker will convert the XML into an HTML form to be submitted to the service's web server. This worker receives the content returned from the service and transforms it back into XML. A fourth worker performs a sequence of filtering operations on the data in order to remove any sensitive information that should not be revealed to the untrusted device. A final worker transforms this filtered XML into whatever protocol and data format is needed to render the content on the untrusted terminal; this final transformation is driven by device-specific XSL style sheets.

The framework currently allows access to both the Datek Online [12] and YahooContest [48] stock trading services, and we are adding access to an HTML-based mail service. Adding support for a new service merely requires authoring a script to convert the service's content into an XML representation. For example, the YahooContest service consists of half a dozen scripts, each approximately 250 lines long or shorter. Rendering content for different client formats requires authoring the appropriate XSL style sheets. Our example stock services consist of half a dozen style sheets for each device format, each ranging from 50 to 300 lines in length. Fig. 12 shows the output of the YahooContest service rendered as WML for a WAP browser running on a trusted Palm Pilot. In this example, because the device is trusted, sensitive information such as the number of owned shares has not been removed by the proxy. In Fig. 13, we show the output of the Datek trading service rendered as HTML on an untrusted Web browsing kiosk; note in particular that sensitive information such as account numbers and the number of purchased shares has been removed.

Fig. 12. WML for PDA. WML holdings page customized for PDA display.

Fig. 13. Security-filtered HTML. Stock holdings filtered for display on a public kiosk.

8.3. NinjaMail

Electronic mail was one of the "killer apps" of the early Internet, and even today, the number of
users with e-mail access is growing exponentially. At the same time, these users are expecting more complex functionality, such as embedded multimedia and anytime/anywhere access. These two trends have implications for the requirements of modern e-mail servers. Hotmail alone has over 61 million active users [33]; if it offered just 50 MB of storage to each user, its servers would have to handle over 3 petabytes of data.

The goal of the NinjaMail [42] project is to build a scalable and feature-rich e-mail service on top of Ninja. NinjaMail was built to act as a general e-mail infrastructure that other applications and services could use to provide more specific functionality, as depicted in Fig. 14. This loose coupling of the separate components allows for more flexibility and extensibility than traditional e-mail servers.

Fig. 14. The NinjaMail architecture.

At NinjaMail's core, the MailStore module handles storage operations such as saving and retrieving messages, pushing out notifications of e-mail events, updating message metadata, and performing simple per-user message metadata searches. A message's metadata consists of its mutable attributes, which record its flags and current folder. Access modules support specific communication methods between users and NinjaMail, including an SMTP module for pushing messages into the MailStore, and POP and HTML modules for user message access.

Each of the above modules is a separate worker running in the cluster; scalability is achieved by running multiple clones of each worker. We found decomposing the NinjaMail system into a set of workers to be a natural programming model, and the typed task dispatching allowed the components to be easily composed. We also created an event mechanism that allowed extension modules to register with the MailStore service to receive notifications when particular events, such as e-mail receipt, occur. This allowed very diverse services to be built, such as an instant messaging notifier of new e-mail.

8.4. Sanctio

Recently, there has been an ongoing controversy over access rights to proprietary instant messaging networks, such as AOL's AIM network [1]. Many companies have tried to compose their own services with these existing networks; however, the owners of the proprietary networks have attempted to prevent such composition, as it diminishes their perceived market penetration.

We have built a service called Sanctio, an instant messaging gateway that provides protocol translation between popular instant messaging protocols (such as Mirabilis' ICQ and AOL's AIM), conventional e-mail, and voice messaging over cellular telephones. Sanctio obviates this controversy by bridging these previously proprietary networks into an instant messaging internetwork. Sanctio runs on a vSpace base and acts as a middleman between all of these messaging protocols, routing and translating messages between the networks (Fig. 15). In addition to protocol translation, Sanctio can also transform the content of messages. We have built a "Web scraper" that allows us to compose AltaVista's BabelFish natural language translation service with Sanctio, so the service can perform language translation (such as English to French) as well as protocol translation. A Spanish-speaking ICQ user can send a message to an English-speaking AIM user, with Sanctio providing both language and protocol translation.

Users can take advantage of unmodified commercial client application software in order to use Sanctio, or they can use software that we have constructed for mobile devices such as Palm Pilots. This software interacts with the Sanctio service through an active proxy. The proxy presents a very
simple text-based messaging protocol to the Palm Pilot, but interacts with Sanctio using the more sophisticated AIM or ICQ protocols.

Because a user of the service may be reached at a number of different addresses (potentially one for each of the networks that Sanctio can communicate with), Sanctio must keep a large table of bindings between users and their current transport addresses on these networks. We used a DDS distributed hash table for this purpose.

Fig. 15. Sanctio messaging proxy. The Sanctio messaging proxy service is composed of language translation and instant message protocol translation workers in a base. Sanctio allows unmodified instant messaging clients that speak different protocols to communicate with each other; Sanctio can also perform natural language translation on the text of the messages.

9. Related work

A number of projects share aspects of the Ninja vision of seamlessly interconnecting devices and Internet services. Related work can generally be characterized as addressing specific aspects of this problem space (such as supporting scalable services or embedding intelligence in the network), rather than taking Ninja's vertical approach to building a general-purpose Internet services platform. As the number of related projects in this domain is extremely large (spanning operating systems, programming languages, networks, embedded systems, and distributed computing platforms), we limit our discussion here to those projects which have taken a particularly complementary approach to the Ninja system design.

Flexible middleware systems, which support distributed computing across heterogeneous resources, are directly related to Ninja's goal of tying together Internet services with diverse small devices. CORBA [40] and DCOM [14] provide platform-independent, object-based network communication, although both systems are designed for tightly coupled distributed applications and do not directly support composition and aggregation of components. Jini [39,43] takes a Java-centric view, exploiting bytecode mobility to deliver stub code which implements a private communication protocol between client and service; these stubs export a programming model based on remote method invocation (RMI) [38]. Although Jini's literature describes a holistic distributed computing model not unlike that of Ninja, the system has been developed mainly for use within a workgroup, and does not provide security or scalability for the wide area. eSpeak [22] is another Java-based middleware system, which aims to scale to the wide area and to integrate PKI into its nonstandard messaging layer. Neither system addresses service scalability and fault tolerance, or access from impoverished devices which cannot run a Java-based communication protocol.

The goals of the Ninja Base environment are reflected by various application servers, including IBM WebSphere [24] and BEA WebLogic [4]. These systems strive to simplify the construction of scalable, fault-tolerant Internet services, generally requiring that applications be constructed as a set of Java components using an interface such as Enterprise Java Beans (EJB) [37]. EJB components are expected to be stateless or to manage their own state persistence. EJB components usually interact
with a database to achieve the latter. vSpace differs from these application servers mainly by mandating an event-driven programming style (which facilitates high concurrency) and by using the DDS layer for persistence. The Ninja Base environment was inspired by earlier work on TACC [15] and SNS [9], both cluster-based Internet service platforms.

Harnessing intelligence in the network to transform and aggregate data across services has been investigated by several projects. Active networks [47] allow code to be injected into network routers to deploy new network protocols, implement traffic shaping, and perform packet filtering. An important distinction between these projects and Ninja's active proxies is the level at which data processing occurs: active networks operate at the transport or packet level, while active proxies operate using higher-level application semantics. As such, active proxies are not solely intended to implement protocols or perform packet-level operations; rather, they are used to perform service composition and aggregation, as well as soft-state transformations (such as HTML filtering, as demonstrated by the security proxy). While much of the work on mobile agents [28] has focused on supporting distributed artificial intelligence, active proxies share many of the same systems-level concerns, such as code mobility, naming, security, and coordination.

Many projects have used transcoding to adapt service content to better suit small devices [6,15,29,34–36,49,50]. Additionally, a number of projects have attempted to develop universal interfaces for large classes of devices, including the recent WAP protocol stack [44]. Instead of assuming that a single standard will be adopted by all devices, the Ninja architecture allows multiple standards to be bridged by using active proxies as transformational intermediaries.

There are several additional technologies that we would like to explore as interesting examples of units. For example, Java Rings [11] and smart cards allow minimal computation, communication, and storage, but have no user interfaces. DIMM PC devices (matchbox-sized PCs on a single chip) could be used as mobile, computationally powerful devices that lack a user interface. We also believe that Universal Plug and Play and Jini-based devices could be easily integrated into the Ninja architecture.

10. Discussion and future directions

If Ninja succeeds in enabling connectivity between Internet services and arbitrarily small devices, a range of new research directions arises. The Ninja goal of moving intelligence into the network infrastructure, and opening up the infrastructure to allow anyone to push new components into it, raises questions about management, security, and service composition.

The first important concern is how to manage resources in a highly dynamic, decentralized network of active proxies. Operators should not be allowed to consume arbitrary amounts of network bandwidth, CPU, or memory; however, such restrictions cannot be enforced only on a per-site basis, as a given operator may consume substantial resources in aggregate across many active proxies. Without aggregate limits, malicious operators could be used to launch distributed denial-of-service attacks against particular bases as well as against the network itself. In the same vein, the infrastructure should prevent abuses of its content delivery mechanisms for unsolicited advertising or "spam"; already there are reports of people receiving unwanted advertisements via text paging to cellphones. If Ninja makes this problem worse, rather than better, the technology will not be adopted on a wide scale, or the infrastructure will remain closed.

New business models emerge in the world of ubiquitous network-based services. Today's model of funding Websites through advertising revenue is inappropriate when services capture bits rather than eyeballs. Subscription and micropayment-based models are possible alternatives. In either case, retaining user privacy is an important concern as data and payments flow across the infrastructure. We envision a new "service marketplace" where both individual operators and entire vertically integrated services are made available on a per-use or subscription basis. Another interesting model is that of a computational economy [31], where active proxies, services, and
user agents participate in an automated marketplace where the commodities are CPU cycles, memory, and bandwidth. Service authors earn revenue by making their service available to others, and active proxies earn revenue by hosting services on behalf of users. Apart from the business implications, computational economies can be used to implement resource management, load balancing, and quality-of-service contracts.

Accessing powerful Internet services from small devices raises new challenges for user interface design. Ideally, service-to-device integration will be seamless. When failures do occur, however, the user may need some way to inspect or control the path of network components producing the fault. Exerting control over a network of active proxies from a device as limited as a text pager is difficult at best. Currently, networked devices are bound to a particular service; for example, a cellphone is used primarily for making phone calls. If the Ninja vision is realized, devices will become more versatile and the choices for using them more varied. Users will need some way to select between services and perhaps control a user profile used by those services.

Perhaps the largest challenge to face is that of automatically composing service components to meet the needs of particular devices. Expressing the transformation, caching, or aggregation properties of a Ninja operator in a type system is simple and potentially allows operators to be automatically chained into a path. However, the types must be expressive enough to capture the relevant semantics of an operator. For example, an English-to-French translation operator may take input of type English text and produce output of type French text; however, this alone does not imply translation between the two, as the operator might always output "Je ne sais pas traduire ce texte" ("I do not know how to translate this text"). Apart from strict type-matching, operator selection also depends upon consideration of an operator's quality, performance, and cost. Automatic path creation becomes a problem of balancing user requirements with other system demands, such as resource availability. Performing this operation efficiently and in a decentralized manner suggests several avenues for future research.

11. Conclusions

The Ninja architecture represents an important first step towards opening up the infrastructure of scalable, robust, adaptive Internet services. By opening the infrastructure, Ninja hopes to reclaim the distributed innovation that was responsible for the unprecedented success and widespread adoption of the Internet in the form of the World Wide Web. Unlike today's web, the service landscape envisioned by Ninja is one of active services and extremely diverse, mobile devices.

In this paper, we described the essential elements of this open architecture: robust service environments on clusters of workstations (bases), diverse devices (units), adaptive intermediaries to isolate services from units (active proxies), and an abstraction for the composition of these three elements (paths). In addition to describing our design and implementation of these components, we presented four innovative services that exploit the capabilities offered by this open infrastructure.

Acknowledgements

This work is supported, in part, by the Defense Advanced Research Projects Agency (grant DABT 63-98-C-0038) and the National Science Foundation (grant RI EIA-9802069). Support is provided as well by Intel Corporation, Ericsson, Philips, Sun Microsystems, IBM, Nortel Networks, and Compaq.

References

[1] America Online, The AOL Instant Messaging (AIM) Network. http://aim.aol.com/.
[2] E. Amir, S. McCanne, R. Katz, An active service framework and its application to real-time multimedia transcoding, in: Proceedings of ACM SIGCOMM '98, October 1998, pp. 178–189.
[3] T.E. Anderson, D.E. Culler, D. Patterson, A case for NOW (networks of workstations), IEEE Micro 12 (1) (1995) 54–64.
[4] BEA Systems, BEA WebLogic Application Servers. http://www.bea.com/products/weblogic/.
[5] B. Bloom, Space/time tradeoffs in hash coding with allowable errors, Commun. ACM 13 (7) (1970) 422–426.
[6] C. Brooks, M.S. Mazer, S. Meeks, J. Miller, Application-specific proxy servers as HTTP stream transducers, in: Proceedings of the Fourth International World Wide Web Conference, December 1995.
[7] P. Buonadonna, A. Geweke, D. Culler, An implementation and analysis of the virtual interface architecture, in: Proceedings of SC'98, November 1998.
[8] Certicom, Elliptic Curve Cryptography for Palm VII. http://www.certicom.com/press/98/dec0298.htm, December 1998.
[9] Y. Chawathe, E.A. Brewer, System support for scalable and fault tolerant Internet services, in: Proceedings of the IFIP International Conference on Distributed Systems Platforms and Open Distributed Processing (Middleware '98), Lake District, UK, September 1998.
[10] S. Czerwinski, B.Y. Zhao, T. Hodes, A. Joseph, R. Katz, An architecture for a secure service discovery service, in: Proceedings of MobiCom '99, ACM, Seattle, WA, August 1999.
[11] Dallas Semiconductor Designs, The Java Ring. http://www.ibutton.com/store/jringfacts.html.
[12] Datek Corporation, Datek Online Trading Service. http://www.datek.com, January 2000.
[13] UC Berkeley CS Division, The Millennium Project (home page), 1999. http://millennium.berkeley.edu.
[14] G. Eddon, H. Eddon, Inside Distributed COM, Microsoft Press, Redmond, WA, 1998.
[15] A. Fox, S.D. Gribble, Y. Chawathe, E.A. Brewer, P. Gauthier, Cluster-based scalable network services, in: Proceedings of the 16th ACM Symposium on Operating Systems Principles, St.-Malo, France, October 1997.
[16] A. Fox, S.D. Gribble, Security on the move: indirect authentication using Kerberos, in: Proceedings of the Second International Conference on Wireless Networking and Mobile Computing (MobiCom '96), Rye, NY, November 1996.
[17] A. Fox, S.D. Gribble, E.A. Brewer, E. Amir, Adapting to network and client variability via on-demand dynamic distillation, in: Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VII), Cambridge, MA, October 1996.
[18] I. Goldberg, S.D. Gribble, D. Wagner, E.A. Brewer, The Ninja Jukebox, in: Proceedings of the Second USENIX Symposium on Internet Technologies and Systems, Boulder, CO, USA, October 1999.
[19] J. Gosling, B. Joy, G. Steele, The Java Language Specification, Addison-Wesley, Reading, MA, 1996.
[20] S.D. Gribble, E.A. Brewer, J.M. Hellerstein, D. Culler, Scalable, distributed data structures for Internet service construction, in: Proceedings of the Fourth USENIX Symposium on Operating System Design and Implementation (OSDI 2000), San Diego, CA, USA, October 2000.
[21] S.D. Gribble, M. Welsh, E.A. Brewer, D. Culler, The MultiSpace: an evolutionary platform for infrastructural services, in: Proceedings of the 1999 USENIX Annual Technical Conference, Monterey, CA, USA, June 1999.
[22] Hewlett Packard, eSpeak: The Universal Language of E-Services. http://www.e-speak.net/.
[23] J. Hill, S. Ross, D. Culler, A. Joseph, A security architecture for the post-PC world. Available at http://www.cs.berkeley.edu/jhill/papers/SecPaper.ps.
[24] IBM Corporation, IBM WebSphere Application Server. http://www-4.ibm.com/software/webservers/.
[25] InfoWorld, Boeing to Put Net in the Air. http://www.infoworld.com/articles/hn/xml/00/04/27/000427enboeing.xml, April 2000.
[26] InfoWorld, E-cars take to the streets; wireless connections link road warriors to the Net. http://www.infoworld.com/articles/hn/xml/00/03/13/000313hnauto.xml, March 2000.
[27] K. Ishizaki, M. Kawahito, T. Yasue, M. Takeuchi, T. Ogasawara, T. Suganuma, T. Onodera, H. Komatsu, T. Nakatani, Design, implementation, and evaluation of optimizations in a just-in-time compiler, in: Proceedings of the ACM 1999 Java Grande Conference, June 1999.
[28] N. Jennings, K. Sycara, M. Wooldridge, A roadmap of agent research and development, Autonomous Agents and Multi-Agent Systems 1 (1) (1998) 7–38.
[29] M. Liljeberg et al., Enhanced services for World Wide Web in mobile WAN environment, Technical Report C-1996-28, University of Helsinki CS Department, April 1996.
[30] S. Matsuoka, H. Ogawa, K. Shimura, Y. Kimura, K. Hotta, H. Takagi, OpenJIT: a reflective Java JIT compiler, in: Proceedings of OOPSLA '98, Workshop on Reflective Programming in C++ and Java, 1998. http://openjit.is.titech.ac.jp/.
[31] M.S. Miller, K.E. Drexler, Markets and computation: agoric open systems, in: B. Huberman (Ed.), The Ecology of Computation, Elsevier, Amsterdam, 1998.
[32] Myricom Corporation, Myrinet: a gigabit per second local area network, IEEE Micro, February 1995.
[33] PC World Communications, April 2000. http://www.pcworld.com/pcwtoday/article/0,1510,16045+1+0,00.html.
[34] Y. Sato, DeleGate Server, March 1994. http://wall.etl.go.jp/delegate/.
[35] M.A. Schickler, M.S. Mazer, C. Brooks, Pan-browser support for annotations and other meta-information on the World Wide Web, in: Proceedings of the Fifth International World Wide Web Conference (WWW-5), May 1996.
[36] B. Schilit, T. Bickmore, Digestor: device-independent access to the World Wide Web, in: Proceedings of the Sixth International World Wide Web Conference (WWW-6), Santa Clara, CA, April 1997.
[37] Sun Microsystems, Enterprise Java Beans Technology. http://java.sun.com/products/ejb/.
[38] Sun Microsystems, Java Remote Method Invocation: Distributed Computing for Java. http://java.sun.com/.
[39] Sun Microsystems, Jini Connection Technology. http://www.sun.com/jini/.
[40] The Object Management Group (OMG), The Common Object Request Broker Architecture. http://www.corba.org.
[41] Virtual Interface Architecture Organization, Virtual Interface Architecture Specification version 1.0, December 1997. http://www.viarch.org.
[42] J.R. von Behren, S. Czerwinski, A.D. Joseph, E.A. Brewer, J. Kubiatowicz, NinjaMail: the design of a high-performance clustered, distributed e-mail system, in: Proceedings of the First International Workshop on Scalable Web Services, Toronto, Canada, August 2000.
[43] J. Waldo, Jini Architecture Overview. Available at http://java.sun.com/products/jini/whitepapers.
[44] WAP Forum, Wireless Application Protocol (WAP) Forum. http://www.wapforum.org.
[45] M. Welsh, N. Borisov, J. Hill, R. von Behren, A. Woo, Querying large collections of music for similarity, Technical Report UCB/CSD-00-1096, U.C. Berkeley Computer Science Division, November 1999.
[46] M. Welsh, D. Culler, Jaguar: enabling efficient communication and I/O in Java, Concurrency: Practice and Experience, 2000, Java for High-Performance Network Computing (special issue). http://www.cs.berkeley.edu/mdw/papers/jaguar-journal.ps.gz.
[47] D.J. Wetherall, J. Guttag, D.L. Tennenhouse, ANTS: a toolkit for building and dynamically deploying network protocols, in: Proceedings of IEEE OPENARCH'98, San Francisco, CA, April 1998.
[48] Yahoo Finance, Yahoo Finance Investment Challenge, 2000. http://contest.finance.yahoo.com/t1?u/.
[49] K.-P. Yee, Shoduoka Mediator Service, 1995. http://www.shoduoka.com.
[50] B. Zenel, D. Duchamp, A general purpose proxy filtering mechanism applied to the mobile environment, in: Proceedings of the Third Annual ACM/IEEE Conference on Mobile Computing and Networking (MobiCom '97), ACM, New York, USA, 1997.
[51] B.Y. Zhao, A.D. Joseph, XSet: a lightweight database for Internet applications, May 2000. http://www.cs.berkeley.edu/ravenben/publications/saint.pdf.
