Elysium Technologies Private Limited: Churn-Resilient Protocol For Massive Data Dissemination in P2P Networks

Elysium Technologies Private Limited
ISO 9001:2008 A leading Research and Development Division

Madurai | Chennai | Trichy | Coimbatore | Kollam | Singapore Website: elysiumtechnologies.com, elysiumtechnologies.info Email: info@elysiumtechnologies.com
IEEE Project List 2011 - 2012
20
Churn-Resilient Protocol for Massive Data Dissemination in P2P Networks
Massive data dissemination is often disrupted by frequent join and departure or failure of client nodes in a peer-to-peer (P2P) network. We propose a new churn-resilient protocol (CRP) to assure alternating path and data proximity to accelerate the data dissemination process under network churn. The CRP enables the construction of proximity-aware P2P content delivery systems. We present new data dissemination algorithms using this proximity-aware overlay design. We simulated P2P networks up to 20,000 nodes to validate the claimed advantages. Specifically, we make four technical contributions: 1) The CRP scheme promotes proximity awareness, dynamic load balancing, and resilience to node failures and network anomalies. 2) The proximity-aware overlay network has a 28-50 percent speed gain in massive data dissemination, compared with the use of scope-flooding or epidemic tree schemes in unstructured P2P networks. 3) The CRP-enabled network requires only 1/3 of the control messages used in a large CAM-Chord network. 4) Even with 40 percent of node failures, the CRP network guarantees atomic broadcast of all data items. These results clearly demonstrate the scalability and robustness of CRP networks under churn conditions. The scheme appeals especially to web-scale applications in digital content delivery, network worm containment, and consumer relationship management over hundreds of datacenters in cloud computing services.
21
Cloud Technologies for Bioinformatics Applications
Executing a large number of independent jobs, or jobs comprising a large number of tasks that perform minimal inter-task communication, is a common requirement in many domains. Various technologies, ranging from classic job schedulers to the latest cloud technologies such as MapReduce, can be used to execute these many tasks in parallel. In this paper, we present our experience in applying two cloud technologies, Apache Hadoop and Microsoft DryadLINQ, to two bioinformatics applications with the above characteristics. The applications are a pairwise Alu sequence alignment application and an Expressed Sequence Tag (EST) sequence assembly program. First, we compare the performance of these cloud technologies using the above applications and also compare them with a traditional MPI implementation in one application. Next, we analyze the effect of inhomogeneous data on the scheduling mechanisms of the cloud technologies. Finally, we present a comparison of performance of the cloud technologies under virtual and nonvirtual hardware platforms.
22
Collective Receiver-Initiated Multicast for Grid Applications
Grid applications often need to distribute large amounts of data efficiently from one cluster to multiple others (multicast). Existing sender-initiated methods arrange nodes in optimized tree structures, based on external network monitoring data. This dependence on monitoring data severely impacts both ease of deployment and adaptivity to dynamically changing network conditions. In this paper, we present Robber, a collective, receiver-initiated, high-throughput multicast approach inspired by the BitTorrent protocol. Unlike BitTorrent, Robber is specifically designed to maximize the throughput between multiple cluster computers. Nodes in the same cluster work together as a collective that tries to steal data from peer clusters. Instead of using potentially outdated monitoring data, Robber automatically adapts to the currently achievable bandwidth ratios. Within a collective, nodes automatically tune the amount of data they steal remotely to their relative performance. Our experimental evaluation compares Robber to BitTorrent, to Balanced Multicasting, and to its predecessor MOB. Balanced Multicasting optimizes multicast trees based on external monitoring data, while MOB uses collective, receiver-initiated multicast with static load balancing. We show that both Robber and MOB outperform BitTorrent. They are competitive with Balanced Multicasting as long as the network bandwidth remains stable, and outperform it by wide margins when bandwidth changes dynamically. In large environments and heterogeneous clusters, Robber outperforms MOB.

Madurai
Elysium Technologies Private Limited 230, Church Road, Annanagar, Madurai, Tamilnadu 625 020. Contact: 91452 4390702, 4392702, 4394702. eMail: info@elysiumtechnologies.com
Trichy
Elysium Technologies Private Limited 3rd Floor, SI Towers, 15, Melapudur, Trichy, Tamilnadu 620 001. Contact: 91431 - 4002234. eMail: elysium.trichy@gmail.com
Kollam
Elysium Technologies Private Limited Surya Complex, Vendor Junction, Kollam, Kerala 691 010. Contact: 91474 2723622. eMail: elysium.kollam@gmail.com

23
Comparing Hardware Accelerators in Scientific Applications: A Case Study
Multicore processors and a variety of accelerators have allowed scientific applications to scale to larger problem sizes. We present a performance, design methodology, platform, and architectural comparison of several application accelerators executing a Quantum Monte Carlo application. We compare the application's performance and programmability on a variety of platforms including CUDA with Nvidia GPUs, Brook+ with ATI graphics accelerators, OpenCL running on both multicore and graphics processors, C++ running on multicore processors, and a VHDL implementation running on a Xilinx FPGA. We show that OpenCL provides application portability between multicore processors and GPUs, but may incur a performance cost. Furthermore, we illustrate that graphics accelerators can make simulations involving large numbers of particles feasible.
24
Computing Localized Power-Efficient Data Aggregation Trees for Sensor Networks
We propose localized, self-organizing, robust, and energy-efficient data aggregation tree approaches for sensor networks, which we call Localized Power-Efficient Data Aggregation Protocols (L-PEDAPs). They are based on topologies, such as LMST and RNG, that can approximate a minimum spanning tree and can be efficiently computed using only position or distance information of one-hop neighbors. The actual routing tree is constructed over these topologies. We also consider different parent selection strategies while constructing a routing tree. We compare each topology and parent selection strategy and conclude that the best among them is the shortest path strategy over the LMST structure. Our solution also involves route maintenance procedures that will be executed when a sensor node fails or a new node is added to the network. The proposed solution is also adapted to consider the remaining power levels of nodes in order to increase the network lifetime. Our simulation results show that by using our power-aware localized approach, we can achieve almost the same performance as a centralized solution in terms of network lifetime, and close to 90 percent of an upper bound derived here.
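As an illustration of one topology named above, the sketch below builds a Relative Neighborhood Graph (RNG) using the standard textbook definition: an edge (u, v) is kept unless some third node w is closer to both u and v than they are to each other. The function name and the centralized all-pairs loop are assumptions made for brevity. The abstract describes a localized computation from one-hop neighbor information, which this sketch does not reproduce.

```python
import math
from itertools import combinations


def rng_edges(points):
    """Return RNG edges as index pairs (i, j) with i < j.

    points: list of (x, y) node positions.
    Centralized illustration only; not the localized L-PEDAP procedure.
    """
    def dist(i, j):
        return math.dist(points[i], points[j])

    edges = []
    for i, j in combinations(range(len(points)), 2):
        d_ij = dist(i, j)
        # The edge is blocked if a witness k lies strictly closer to both ends.
        blocked = any(
            max(dist(i, k), dist(j, k)) < d_ij
            for k in range(len(points))
            if k not in (i, j)
        )
        if not blocked:
            edges.append((i, j))
    return edges
```

For three collinear nodes at (0, 0), (1, 0), and (2, 0), the long edge between the outer nodes is dropped because the middle node is closer to both.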
25
Conflicts and Incentives in Wireless Cooperative Relaying: A Distributed Market Pricing Framework
Extensive research in recent years has shown the benefits of cooperative relaying in wireless networks, where nodes overhear and cooperatively forward packets transmitted between their neighbors. Most existing studies focus on physical-layer optimization of the effective channel capacity for a given transmitter-receiver link; however, the interaction among simultaneous flows between different endpoint pairs, and the conflicts arising from their competition for a shared pool of relay nodes, are not yet well understood. In this paper, we study a distributed pricing framework, where sources pay relay nodes to forward their packets, and the payment is shared equally whenever a packet is successfully relayed by several nodes at once. We formulate this scenario as a Stackelberg (leader-follower) game, in which sources set the payment rates they offer, and relay nodes respond by choosing the flows to cooperate with. We provide a systematic analysis of the fundamental structural properties of this generic model. We show that multiple follower equilibria exist in general due to the nonconcave nature of their game, yet only one equilibrium possesses certain continuity properties that further lead to a unique system equilibrium among the leaders. We further demonstrate that the resulting equilibria are reasonably efficient in several typical scenarios.
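The payment rule stated in the abstract (a source's payment is shared equally among all relays that successfully forward the same packet) can be sketched as follows. The function and parameter names are hypothetical, and the sketch covers only the sharing rule, not the Stackelberg game analysis.

```python
def split_payment(payment_rate, successful_relays):
    """Share one packet's payment equally among the relays that forwarded it.

    payment_rate: amount the source offers per relayed packet (assumed unit-free).
    successful_relays: iterable of relay identifiers that succeeded.
    """
    relays = list(successful_relays)
    if not relays:
        return {}  # nobody relayed the packet, so nothing is paid out
    share = payment_rate / len(relays)
    return {relay: share for relay in relays}
```

The more relays that cooperate with the same flow, the smaller each share becomes, which is the source of the competition among relays that the abstract analyzes.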
26
Consensus and Mutual Exclusion in a Multiple Access Channel
We consider deterministic feasibility and time complexity of two fundamental tasks in distributed computing: consensus and mutual exclusion. Processes have different labels and communicate through a multiple access channel. The adversary wakes up some processes in possibly different rounds. In any round, every awake process either listens or transmits. The message of a process i is heard by all other awake processes, if i is the only process to transmit in a given round. If more than one process transmits simultaneously, there is a collision and no message is heard. We consider three characteristics that may or may not exist in the channel: collision detection (listening processes can distinguish collision from silence), the availability of a global clock showing the round number, and the knowledge of the number n of all processes. If none of the above three characteristics is available in the channel, we prove that consensus and mutual exclusion are infeasible; if at least one of them is available, both tasks are feasible, and we study their time complexity. Collision detection is shown to cause an exponential gap in complexity: if it is available, both tasks can be performed in time logarithmic in n, which is optimal, and without collision detection both tasks require linear time. We then investigate both consensus and mutual exclusion in the absence of collision detection, but under alternative presence of the two other features. With global clock, we give an algorithm whose time complexity linearly depends on n and on the wake-up time, and an algorithm whose complexity does not depend on the wake-up time and differs from the linear lower bound only by a factor O(log^2 n). If n is known, we also show an algorithm whose complexity differs from the linear lower bound only by a factor O(log^2 n).
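The channel model described above can be sketched as a single-round function. The name `channel_round` and the tuple-shaped return value are illustrative assumptions. Only the rules stated in the abstract are encoded: a lone transmitter is heard, simultaneous transmitters collide, and listeners can tell collision from silence only when collision detection is available.

```python
def channel_round(transmissions, collision_detection=False):
    """Return what every listening process perceives in one round.

    transmissions: dict mapping the label of each transmitting process
    to its message (processes that listen are simply absent).
    """
    if len(transmissions) == 1:
        # Exactly one transmitter: its message is heard by all listeners.
        (sender, message), = transmissions.items()
        return ("message", sender, message)
    if len(transmissions) > 1 and collision_detection:
        # Collision is distinguishable from silence only with detection.
        return ("collision", None, None)
    # No transmitter, or an undetected collision: listeners hear nothing.
    return ("silence", None, None)
```

Without collision detection, a round with two transmitters and a round with none are indistinguishable to listeners, which is the property behind the complexity gap the abstract reports.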
27
Cooperative Channelization in Wireless Networks with Network Coding
In this paper, we address congestion of multicast traffic in multihop wireless networks through a combination of network coding and resource reservation. Network coding reduces the number of transmissions required in multicast flows, thus allowing a network to approach its multicast capacity. In addition, it efficiently repairs errors in multicast flows by combining packets lost at different destinations. However, under conditions of extremely high congestion the repair capability of network coding is seriously degraded. In this paper, we propose cooperative channelization, in which portions of the transmission media are allocated to links that are congested at the point where network coding cannot efficiently repair loss. A health metric is proposed to allow comparison of need for channelization of different multicast links. Cooperative channelization considers the impact of channelization on overall network performance before resource reservation is triggered. Our results show that cooperative channelization improves overall network performance while being well suited for wireless networks using network coding.
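The repair idea mentioned above (combining packets lost at different destinations) is commonly illustrated with XOR coding. The sketch below assumes two equal-length packets and two receivers that each lost a different one. The packet contents and variable names are made up for the example, and the paper's actual coding scheme may differ.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equal-length packets."""
    if len(a) != len(b):
        raise ValueError("packets must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


p1 = b"PKT-ONE!"
p2 = b"PKT-TWO!"

# Receiver A lost p1 but holds p2; receiver B lost p2 but holds p1.
# One coded retransmission repairs both losses.
coded = xor_bytes(p1, p2)
recovered_at_a = xor_bytes(coded, p2)
recovered_at_b = xor_bytes(coded, p1)
```

A single coded transmission replaces two separate retransmissions, which is where the saving in transmissions comes from.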
28
Cooperative Search and Survey Using Autonomous Underwater Vehicles (AUVs)
In this work, we study algorithms for cooperative search and survey using a fleet of Autonomous Underwater Vehicles (AUVs). Due to the limited energy, communication range/bandwidth, and sensing range of the AUVs, underwater search and survey with multiple AUVs brings about several new challenges since a large amount of data needs to be collected by each AUV, and any AUV may fail unexpectedly. To address the challenges and meet our objectives of minimizing the total survey time and traveled distance of AUVs, we propose a cooperative rendezvous scheme called Synchronization-Based Survey (SBS) to facilitate cooperation among a large number of AUVs when surveying a large area. In SBS, AUVs form an intermittently connected network (ICN) in that they periodically meet each other for data aggregation, control signal dissemination, and AUV failure detection/recovery. Numerical analysis and simulations have been performed to compare the performance of three variants of SBS schemes, namely, Alternating Column Synchronization (ACS), Strict Line Synchronization (SLS), and X Synchronization (XS). The results show that XS can outperform other SBS schemes in terms of the survey time and the traveled distance of AUVs. We also compare XS with nonsynchronization-based survey and the lower bound on the survey time and traveled distance. The results show that XS achieves close-to-optimal performance.
29
Coordinating Computation and I/O in Massively Parallel Sequence Search
With the explosive growth of genomic information, the searching of sequence databases has emerged as one of the most computation- and data-intensive scientific applications. Our previous studies suggested that parallel genomic sequence-search possesses highly irregular computation and I/O patterns. Effectively addressing these runtime irregularities is thus the key to designing scalable sequence-search tools on massively parallel computers. While the computation scheduling for irregular scientific applications and the optimization of noncontiguous file accesses have been well-studied independently, little attention has been paid to the interplay between the two. In this paper, we systematically investigate the computation and I/O scheduling for data-intensive, irregular scientific applications within the context of genomic sequence search. Our study reveals that the lack of coordination between computation scheduling and I/O optimization could result in severe performance issues. We then propose an integrated scheduling approach that effectively improves sequence-search throughput by gracefully coordinating the dynamic load balancing of computation and high-performance noncontiguous I/O.
30
Coordinating Power Control and Performance Management for Virtualized Server Clusters
Today's data centers face two critical challenges. First, various customers need to be assured by meeting their required service-level agreements such as response time and throughput. Second, server power consumption must be controlled in order to avoid failures caused by power capacity overload or system overheating due to increasingly high server density. However, existing work controls power and application-level performance separately, and thus, cannot simultaneously provide explicit guarantees on both. In addition, as power and performance control strategies may come from different hardware/software vendors and coexist at different layers, it is more feasible to coordinate various strategies to achieve the desired control objectives than relying on a single centralized control strategy. This paper proposes Co-Con, a novel cluster-level control architecture that coordinates individual power and performance control loops for virtualized server clusters. To emulate the current practice in data centers, the power control loop changes hardware power states with no regard to the application-level performance. The performance control loop is then designed for each virtual machine to achieve the desired performance even when the system model varies significantly due to the impact of power control. Co-Con configures the two control loops rigorously, based on feedback control theory, for theoretically guaranteed control accuracy and system stability. Empirical results on a physical testbed demonstrate that Co-Con can simultaneously provide effective control on both application-level performance and underlying power consumption.
31
Cyclic Reduction Tridiagonal Solvers on GPUs Applied to Mixed-Precision Multigrid
We have previously suggested mixed precision iterative solvers specifically tailored to the iterative solution of sparse linear equation systems as they typically arise in the finite element discretization of partial differential equations. These schemes have been evaluated for a number of hardware platforms, in particular, single-precision GPUs as accelerators to the general purpose CPU. This paper reevaluates the situation with new mixed precision solvers that run entirely on the GPU: We demonstrate that mixed precision schemes constitute a significant performance gain over native double precision. Moreover, we present a new implementation of cyclic reduction for the parallel solution of tridiagonal systems and employ this scheme as a line relaxation smoother in our GPU-based multigrid solver. With an alternating direction implicit variant of this advanced smoother, we can extend the applicability of the GPU multigrid solvers to very ill-conditioned systems arising from the discretization on anisotropic meshes, that previously had to be solved on the CPU. The resulting mixed-precision schemes are always faster than double precision alone, and outperform tuned CPU solvers consistently by almost an order of magnitude.
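For readers unfamiliar with cyclic reduction, the following is a minimal serial sketch of the textbook method for a tridiagonal system a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i]. It eliminates the even-indexed unknowns, solves the half-size system recursively, and back-substitutes. It assumes nonzero pivots (for example, a diagonally dominant matrix). It is not the GPU implementation described in the abstract, and the function name is an assumption.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by recursive cyclic reduction.

    a: sub-diagonal (a[0] ignored), b: diagonal,
    c: super-diagonal (c[-1] ignored), d: right-hand side.
    """
    n = len(b)
    if n == 0:
        return []
    if n == 1:
        return [d[0] / b[0]]
    a = [0.0] + list(a[1:])
    c = list(c[:-1]) + [0.0]

    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        # Eliminate x[i-1] using equation i-1.
        alpha = a[i] / b[i - 1]
        new_a = -alpha * a[i - 1]
        new_b = b[i] - alpha * c[i - 1]
        new_d = d[i] - alpha * d[i - 1]
        new_c = 0.0
        if i + 1 < n:
            # Eliminate x[i+1] using equation i+1.
            gamma = c[i] / b[i + 1]
            new_b -= gamma * a[i + 1]
            new_c = -gamma * c[i + 1]
            new_d -= gamma * d[i + 1]
        ra.append(new_a)
        rb.append(new_b)
        rc.append(new_c)
        rd.append(new_d)

    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)  # odd-indexed unknowns
    for i in range(0, n, 2):
        # Back-substitute the even-indexed unknowns.
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

Within each level, every reduced equation and every back-substitution is independent of the others, which is what makes the method attractive for parallel hardware.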
32
Data Fusion with Desired Reliability in Wireless Sensor Networks
Energy-efficient and reliable transmission of sensory information is a key problem in wireless sensor networks. To save more energy, in-network processing such as data fusion is a widely used technique, which, however, may often lead to unbalanced information among nodes in the data fusion tree. Traditional schemes aim at providing reliable transmission to individual data packets from source node to the sink, but seldom offer the desired reliability to a data fusion tree. In this paper, we explore the problem of Minimum Energy Reliable Information Gathering (MERIG) when performing data fusion. By adaptively using redundant transmission on fusion routes without acknowledgments, packets with more information are delivered with higher reliability. For different data fusion topologies, such as star, chain, and tree, we provide optimal solutions to compute the number of transmissions for each node. We also propose practical, distributed approximation algorithms for chain and tree topologies. Analytical proofs and simulation results show that energy-efficient information reliability can be guaranteed in an unreliable wireless environment with the help of our proposed schemes.
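To make the idea of redundant transmission without acknowledgments concrete, the sketch below computes how many independent transmissions over a lossy link are needed to reach a target delivery probability, using 1 - (1 - p)^k >= r. The independence assumption, the single-link setting, and the function name are assumptions for illustration. The paper's optimal per-node solutions for star, chain, and tree topologies are not reproduced here.

```python
import math


def transmissions_needed(link_success_prob, target_reliability):
    """Smallest k with 1 - (1 - p)**k >= r, assuming independent losses."""
    p, r = link_success_prob, target_reliability
    if not 0.0 < p <= 1.0:
        raise ValueError("link_success_prob must be in (0, 1]")
    if not 0.0 <= r < 1.0:
        raise ValueError("target_reliability must be in [0, 1)")
    if p == 1.0:
        return 1  # a perfect link needs a single transmission
    return max(1, math.ceil(math.log(1.0 - r) / math.log(1.0 - p)))
```

Under this model a packet carrying more fused information would simply be assigned a higher target reliability, and therefore more transmissions, than a packet carrying less.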
33
Data Replication in Data Intensive Scientific Applications with Performance Guarantee
Data replication has been well adopted in data intensive scientific applications to reduce data file transfer time and bandwidth consumption. However, the problem of data replication in Data Grids, an enabling technology for data intensive applications, has proven to be NP-hard and even nonapproximable, making this problem difficult to solve. Meanwhile, most of the previous research in this field is either theoretical investigation without practical consideration, or heuristics-based with little or no theoretical performance guarantee. In this paper, we propose a data replication algorithm that not only has a provable theoretical performance guarantee, but also can be implemented in a distributed and practical manner. Specifically, we design a polynomial time centralized replication algorithm that reduces the total data file access delay by at least half of that reduced by the optimal replication solution. Based on this centralized algorithm, we also design a distributed caching algorithm, which can be easily adopted in a distributed environment such as Data Grids. Extensive simulations are performed to validate the efficiency of our proposed algorithms. Using our own simulator, we show that our centralized replication algorithm performs comparably to the optimal algorithm and other intuitive heuristics under different network parameters. Using GridSim, a popular distributed Grid simulator, we demonstrate that the distributed caching technique significantly outperforms an existing popular file caching technique in Data Grids, and it is more scalable and adaptive to the dynamic change of file access patterns in Data Grids.
34
Dealing with Nonuniformity in Data Centric Storage for Wireless Sensor Networks
In-network storage of data in Wireless Sensor Networks (WSNs) is considered a promising alternative to external storage since it helps reduce the communication overhead inside the network. Recent approaches to data storage rely on Geographic Hash Tables (GHT) for efficient data storage and retrieval. These approaches, however, assume that sensors are uniformly distributed in the sensor field, which is seldom true in real applications. Also, they do not allow tuning the redundancy level in the storage according to the importance of the data to be stored. To deal with these issues, we propose an approach based on two mechanisms. The first is aimed at estimating the real network distribution. The second exploits a data dispersal method based on the estimated network distribution. Experiments through simulation show that our approach approximates quite closely the real distribution of sensors and that our dispersal protocol appreciably reduces data losses due to unbalanced data load.
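The basic Geographic Hash Table idea that the abstract builds on can be sketched as follows: a data key is hashed to a point in the sensor field, and the data is stored at the node closest to that point. The SHA-256 mapping, the rectangular field, and both function names are assumptions chosen for the example. The uniform hash shown here is the baseline whose weakness under nonuniform deployments the paper addresses, not the proposed dispersal protocol.

```python
import hashlib
import math


def key_to_location(key, width, height):
    """Hash a data key to a pseudo-uniform point in a width x height field."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    x = int.from_bytes(digest[:8], "big") / 2**64 * width
    y = int.from_bytes(digest[8:16], "big") / 2**64 * height
    return (x, y)


def home_node(key, nodes, width, height):
    """Return the node position closest to the key's hashed location."""
    target = key_to_location(key, width, height)
    return min(nodes, key=lambda pos: math.dist(pos, target))
```

When sensors are clustered in one part of the field, many hashed points fall in sparse regions and are absorbed by the few nodes that border them, which produces the unbalanced load mentioned in the abstract.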
35
Decomposing Workload Bursts for Efficient Storage Resource Management
The growing popularity of hosted storage services and shared storage infrastructure in data centers is driving the recent interest in resource management and QoS in storage systems. The bursty nature of storage workloads raises significant performance and provisioning challenges, leading to increased resource requirements, management costs, and energy consumption. We present a novel workload shaping framework to handle bursty workloads, where the arrival stream is dynamically decomposed to isolate its bursts, and then rescheduled to exploit available slack. We show how decomposition reduces the server capacity requirements and power consumption significantly, while affecting QoS guarantees minimally. We present an optimal decomposition algorithm, RTT, and a recombination algorithm, Miser, and show the benefits of the approach by evaluating the performance of several storage workloads using both simulation and a Linux implementation.
36
Design and Evaluation of MPI File Domain Partitioning Methods under Extent-Based File Locking Protocol
MPI collective I/O has been an effective method for parallel shared-file access and maintaining the canonical orders of structured data in files. Its implementation commonly uses a two-phase I/O strategy that partitions a file into disjoint file domains, assigns each domain to a unique process, redistributes the I/O data based on their locations in the domains, and has each process perform I/O for the assigned domain. The partitioning quality determines the maximal performance achievable by the underlying file system, as the shared-file I/O has long been impeded by the cost of the file system's data consistency control, particularly due to the conflicted locks. This paper proposes a few file domain partitioning methods designed to reduce lock conflicts under the extent-based file locking protocol. Experiments from four I/O benchmarks on the IBM GPFS and Lustre parallel file systems show that the partitioning method producing minimum lock conflicts achieves the highest performance. The benefit of removing conflicted locks can be so significant that write bandwidth differences of more than thirty times are observed between the best and worst methods.
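One way a file domain partitioning could avoid lock conflicts is to place every domain boundary on a multiple of the file system's lock granularity, so that no two processes write into the same lock unit. The sketch below illustrates that idea only. The function name, the `lock_unit` parameter, and the even distribution of units are assumptions, and the abstract does not confirm that any of the paper's methods works exactly this way.

```python
def aligned_file_domains(file_size, num_procs, lock_unit):
    """Split [0, file_size) into contiguous per-process domains.

    Every internal boundary falls on a multiple of lock_unit (bytes),
    so no lock unit is shared by two processes.
    Returns a list of (start, end) byte ranges, one per process.
    """
    total_units = -(-file_size // lock_unit)  # ceiling division
    base, extra = divmod(total_units, num_procs)
    domains = []
    start = 0
    for rank in range(num_procs):
        units = base + (1 if rank < extra else 0)
        end = min(start + units * lock_unit, file_size)
        domains.append((start, end))
        start = end
    return domains
```

For a 1000-byte file, three processes, and a 256-byte lock unit, this yields the ranges (0, 512), (512, 768), and (768, 1000). A naive even split at bytes 333 and 667 would instead leave two lock units shared between neighboring processes.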
37
Design and Evaluation of Multiple-Level Data Staging for Blue Gene Systems
Parallel applications currently suffer from a significant imbalance between computational power and available I/O bandwidth. Additionally, the hierarchical organization of current Petascale systems contributes to an increase of the I/O subsystem latency. In these hierarchies, file access involves pipelining data through several networks with incremental latencies and higher probability of congestion. Future Exascale systems are likely to share this trait. This paper presents a scalable parallel I/O software system designed to transparently hide the latency of file system accesses to applications on these platforms. Our solution takes advantage of the hierarchy of networks involved in file accesses, to maximize the degree of overlap between computation, file I/O-related communication, and file system access. We describe and evaluate a two-level hierarchy for Blue Gene systems consisting of client-side and I/O node-side caching. Our file cache management modules coordinate the data staging between application and storage through the Blue Gene networks. The experimental results demonstrate that our architecture achieves significant performance improvements through a high degree of overlap between computation, communication, and file I/O.
38
Design and Performance Evaluation of Image Processing Algorithms on GPUs
In this paper, we construe key factors in design and evaluation of image processing algorithms on the massively parallel graphics processing units (GPUs) using the compute unified device architecture (CUDA) programming model. A set of metrics, customized for image processing, is proposed to quantitatively evaluate algorithm characteristics. In addition, we show that a range of image processing algorithms map readily to CUDA using multiview stereo matching, linear feature extraction, JPEG2000 image encoding, and nonphotorealistic rendering (NPR) as our example applications. The algorithms are carefully selected from major domains of image processing, so they inherently contain a variety of subalgorithms with diverse characteristics when implemented on the GPU. Performance is evaluated in terms of execution time and is compared to the fastest host-only version implemented using OpenMP. It is shown that the observed speedup varies extensively depending on the characteristics of each algorithm. Intensive analysis is conducted to show the appropriateness of the proposed metrics in predicting the effectiveness of an application for parallel implementation.
39
Design of Distributed Heterogeneous Embedded Systems in DDFCharts
The use of formal models of computation in dealing with increasing complexity of embedded systems design is gaining attention. A successful model of computation must be able to handle both control-dominated and data-dominated behaviors, which are most often simultaneously present in complex embedded systems. Besides behavioral heterogeneity, direct support for modeling distributed systems is also desirable, since an increasing number of embedded systems belong to this category. In this paper, we present distributed DFCharts (DDFCharts), a language based on a formal model that targets distributed heterogeneous embedded systems. Its top hierarchical level is made suitable to capture distributed systems. Behavioral heterogeneity is addressed by composing finite-state machines (FSMs) and synchronous dataflow graphs (SDFGs). We illustrate modeling in DDFCharts with practical examples and describe its implementation on a heterogeneous target architecture.
40
Dynamic Resource Provisioning in Massively Multiplayer Online Games
Today's Massively Multiplayer Online Games (MMOGs) can include millions of concurrent players spread across the world and interacting with each other within a single session. Faced with high resource demand variability and with misfit resource renting policies, the current industry practice is to overprovision for each game tens of self-owned data centers, making the market entry affordable only for big companies. Focusing on the reduction of entry and operational costs, we investigate a new dynamic resource provisioning method for MMOG operation using external data centers as low-cost resource providers. First, we identify in the various types of player interaction a source of short-term load variability, which complements the long-term load variability due to the size of the player population. Then, we introduce a combined MMOG processor, network, and memory load model that takes into account both the player interaction type and the population size. Our model is best used for estimating the MMOG resource demand dynamically, and thus, for dynamic resource provisioning based on the game world entity distribution. We evaluate several classes of online predictors for MMOG entity distribution and propose and tune a neural network-based predictor to deliver good accuracy consistently under real-time performance constraints. Using trace-based simulation, we assess the impact of the data center policies on the quality of resource provisioning. We find that the dynamic resource provisioning can be much more efficient than its static alternative even when the external data centers are busy, and that data centers with policies unsuitable for MMOGs are penalized by our dynamic resource provisioning method. Finally, we present experimental results showing the real-time parallelization and load balancing of a real game prototype using data center resources provisioned using our method and show its advantage against a rudimentary client threshold approach.
41
Edge Self-Monitoring for Wireless Sensor Networks
Local monitoring is an effective mechanism for the security of wireless sensor networks (WSNs). Existing schemes assume the existence of a sufficient number of active nodes to carry out monitoring operations. Such an assumption, however, is often difficult to satisfy in a large-scale sensor network. In this work, we focus on designing an efficient scheme integrated with good self-monitoring capability as well as providing an infrastructure for various security protocols using local monitoring. To the best of our knowledge, we are the first to present the formal study on optimizing network topology for edge self-monitoring in WSNs. We show that the problem is NP-complete even under the unit disk graph (UDG) model and give the upper bound on the approximation ratio in various graph models. We provide polynomial-time approximation scheme (PTAS) algorithms for the problem in some specific graphs, for example, the monitoring-set-bounded graph. We further design two distributed polynomial algorithms with provable approximation ratio. Through comprehensive simulations, we evaluate the effectiveness of our design.
42
Efficient Adaptive Scheduling of Multiprocessors with Stable Parallelism Feedback
With the proliferation of multicore computers and multiprocessor systems, an imminent challenge is to efficiently schedule parallel applications on these resources. In contrast to conventional static scheduling, adaptive schedulers that dynamically allocate processors to jobs possess good potential for improving processor utilization and speeding up job execution. In this paper, we focus on adaptive scheduling of malleable jobs with periodic processor reallocations based on parallelism feedback of the jobs and allocation policy of the system. We present an efficient adaptive scheduler ACDEQ that provides parallelism feedback using an adaptive controller A-CONTROL and allocates processors based on the well-known Dynamic Equipartitioning algorithm (DEQ). Compared to A-GREEDY, an existing adaptive scheduler that experiences feedback instability and thus incurs unnecessary scheduling overheads, we show that A-CONTROL achieves much more stable feedback among other desirable control-theoretic properties. Furthermore, we analyze algorithmically the performance of ACDEQ in terms of its response time and processor waste for an individual job as well as makespan and total response time for a set of jobs. To the best of our knowledge, ACDEQ is the first multiprocessor scheduling algorithm that offers both control-theoretic and algorithmic guarantees. We further evaluate ACDEQ via simulations by using Downey's parallel job model augmented with internal parallelism variations. The results confirm its improved performance over AGDEQ, and they show that ACDEQ excels especially when the scheduling overhead becomes high.
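The Dynamic Equipartitioning (DEQ) policy mentioned above is commonly described as follows: processors are split equally among jobs, any job that requests no more than its equal share receives exactly its request, and the freed processors are re-divided among the remaining jobs. The sketch below follows that common description with integer processor counts. The function name and the tie-breaking by sorted job name are assumptions. In ACDEQ the requests would come from A-CONTROL's feedback, which is not modeled here.

```python
def deq_allocate(requests, total_processors):
    """Allocate processors with a Dynamic Equipartitioning style policy.

    requests: dict mapping job name -> requested processor count.
    Returns a dict mapping job name -> allotted processor count.
    """
    allocation = {}
    pending = dict(requests)
    remaining = total_processors
    while pending:
        share = remaining // len(pending)
        satisfied = {job: req for job, req in pending.items() if req <= share}
        if not satisfied:
            # Every pending job wants more than the fair share: split equally,
            # handing leftover processors out one by one in name order.
            leftover = remaining - share * len(pending)
            for idx, job in enumerate(sorted(pending)):
                allocation[job] = share + (1 if idx < leftover else 0)
            break
        for job, req in satisfied.items():
            allocation[job] = req
            remaining -= req
            del pending[job]
    return allocation
```

With 12 processors and requests of 2, 10, and 10, the small job receives its 2 processors and the other two jobs split the remaining 10 evenly, so no job is given more than it asked for.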
43
Enabling Public Auditability and Data Dynamics for Storage Security in Cloud Computing
Cloud Computing has been envisioned as the next-generation architecture of IT Enterprise. It moves the application software and databases to the centralized large data centers, where the management of the data and services may not be fully trustworthy. This unique paradigm brings about many new security challenges, which have not been well understood. This work studies the problem of ensuring the integrity of data storage in Cloud Computing. In particular, we consider the task of allowing a third party auditor (TPA), on behalf of the cloud client, to verify the integrity of the dynamic data stored in the cloud. The introduction of TPA eliminates the involvement of the client through the auditing of whether his data stored in the cloud are indeed intact, which can be important in achieving economies of scale for Cloud Computing. The support for data dynamics via the most general forms of data operation, such as block modification, insertion, and deletion, is also a significant step toward practicality, since services in Cloud Computing are not limited to archive or backup data only. While prior works on ensuring remote data integrity often lack the support of either public auditability or dynamic data operations, this paper achieves both. We first identify the difficulties and potential security problems of direct extensions with fully dynamic data updates from prior works and then show how to construct an elegant verification scheme for the seamless integration of these two salient features in our protocol design. In particular, to achieve efficient data dynamics, we improve the existing proof of storage models by manipulating the classic Merkle Hash Tree construction for block tag authentication. To support efficient handling of multiple auditing tasks, we further explore the technique of bilinear aggregate signature to extend our main result into a multiuser setting, where TPA can perform multiple auditing tasks simultaneously. Extensive security and performance analyses show that the proposed schemes are highly efficient and provably secure.
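The classic Merkle Hash Tree mentioned above can be sketched in a few lines of Python. This is only the textbook construction over raw blocks with SHA-256: the paper applies the tree to block tags and combines it with signatures, none of which is reproduced here. The helper names and the rule of duplicating the last node on odd-sized levels are assumptions of this sketch.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level):
    # Pair up nodes; duplicate the last one when the level has odd size
    # (one common convention, assumed here).
    if len(level) % 2:
        level = level + [level[-1]]
    return level, [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(blocks):
    """Root hash of a non-empty list of byte blocks."""
    level = [_h(b) for b in blocks]
    while len(level) > 1:
        _, level = _next_level(level)
    return level[0]


def merkle_proof(blocks, index):
    """Sibling hashes needed to recompute the root from blocks[index]."""
    level = [_h(b) for b in blocks]
    proof = []
    while len(level) > 1:
        padded, parents = _next_level(level)
        sibling = index ^ 1
        # Record the sibling hash and whether it sits to the left.
        proof.append((padded[sibling], sibling < index))
        level, index = parents, index // 2
    return proof


def verify(block, proof, root):
    """Recompute the path from one block to the root and compare."""
    node = _h(block)
    for sibling, sibling_is_left in proof:
        node = _h(sibling + node) if sibling_is_left else _h(node + sibling)
    return node == root


if __name__ == "__main__":
    data = [b"block-0", b"block-1", b"block-2"]
    root = merkle_root(data)
    print(verify(b"block-2", merkle_proof(data, 2), root))
```

An auditor holding only the root can check one block with a logarithmic number of sibling hashes, which is what makes the structure attractive for verifying individual blocks of dynamic data.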
44
Energy Conscious Scheduling for Distributed Computing Systems under Different Operating Conditions
Traditionally, the primary performance goal of computer systems has focused on reducing the execution time of applications while increasing throughput. This performance goal has been mostly achieved by the development of high-density computer systems. As witnessed recently, these systems provide very powerful processing capability and capacity. They often consist of tens or hundreds of thousands of processors and other resource-hungry devices. The energy consumption of these systems has become a major concern. In this paper, we address the problem of scheduling precedence-constrained parallel applications on multiprocessor computer systems and present two energy-conscious scheduling algorithms using dynamic voltage scaling (DVS). A number of recent commodity processors are capable of DVS, which enables processors to operate at different voltage supply levels at the expense of sacrificing clock frequencies. In the context of scheduling, this multiple voltage facility implies that there is a trade-off between the quality of schedules and energy consumption. To effectively balance these two performance goals, we have devised a novel objective function and a variant of it. The main difference between the two algorithms is in their measurement of energy consumption. The extensive comparative evaluations conducted as part of this work show that the performance of our algorithms is very compelling in terms of both application completion time and energy consumption.
45
Energy-Efficient Localized Routing in Random Multihop Wireless Networks
A number of energy-aware routing protocols have been proposed to seek the energy efficiency of routes in multihop wireless networks. Among them, several geographical localized routing protocols were proposed to help make smarter routing decisions using only local information and to reduce the routing overhead. However, none of the proposed localized routing methods can guarantee the energy efficiency of its routes. In this paper, we first give a simple localized routing algorithm, called Localized Energy-Aware Restricted Neighborhood routing (LEARN), which can guarantee the energy efficiency of its route if it can find the route successfully. We then theoretically study its critical transmission radius in random networks which can guarantee that LEARN routing finds a route for any source and destination pair asymptotically almost surely. We also extend the proposed routing into three-dimensional (3D) networks and derive its critical transmission radius in 3D random networks. Simulation results confirm our theoretical analysis of LEARN routing and demonstrate its energy efficiency in large-scale random networks.
46
Exploiting Dynamic Resource Allocation for Efficient Parallel Data Processing in the Cloud
In recent years, ad hoc parallel data processing has emerged to be one of the killer applications for Infrastructure-as-a-Service (IaaS) clouds. Major Cloud computing companies have started to integrate frameworks for parallel data processing in their product portfolio, making it easy for customers to access these services and to deploy their programs. However, the processing frameworks which are currently used have been designed for static, homogeneous cluster setups and disregard the particular nature of a cloud. Consequently, the allocated compute resources may be inadequate for big parts of the submitted job and unnecessarily increase processing time and cost. In this paper, we discuss the opportunities and challenges for efficient parallel data processing in clouds and present our research project Nephele. Nephele is the first data processing framework to explicitly exploit the dynamic resource allocation offered by today's IaaS clouds for both task scheduling and execution. Particular tasks of a processing job can be assigned to different types of virtual machines which are automatically instantiated and terminated during the job execution. Based on this new framework, we perform extended evaluations of MapReduce-inspired processing jobs on an IaaS cloud system and compare the results to the popular data processing framework Hadoop.
47
Exploiting Memory Access Patterns to Improve Memory Performance in Data-Parallel Architectures
The introduction of General-Purpose computation on GPUs (GPGPUs) has changed the landscape for the future of parallel computing. At the core of this phenomenon are massively multithreaded, data-parallel architectures possessing impressive acceleration ratings, offering low-cost supercomputing together with attractive power budgets. Even given the numerous benefits provided by GPGPUs, there remain a number of barriers that delay wider adoption of these architectures. One major issue is the heterogeneous and distributed nature of the memory subsystem commonly found on data-parallel architectures. Application acceleration is highly dependent on being able to utilize the memory subsystem effectively so that all execution units remain busy. In this paper, we present techniques for enhancing the memory efficiency of applications on data-parallel architectures, based on the analysis and characterization of memory access patterns in loop bodies; we target vectorization via data transformation to benefit vector-based architectures (e.g., AMD GPUs) and algorithmic memory selection for scalar-based architectures (e.g., NVIDIA GPUs). We demonstrate the effectiveness of our proposed methods with kernels from a wide range of benchmark suites. For the benchmark kernels studied, we achieve consistent and significant performance improvements (up to 11.4x and 13.5x over baseline GPU implementations on each platform, respectively) by applying our proposed methodology.
48
Fast and Cost-Effective Online Load-Balancing in Distributed Range-Queriable Systems
Distributed systems such as Peer-to-Peer overlays have been shown to efficiently support the processing of range queries over large numbers of participating hosts. In such systems, uneven load allocation has to be effectively tackled in order to minimize overloaded peers and optimize their performance. In this work, we identify the two basic methodologies used to achieve load balancing, iterative key redistribution between neighbors and node migration, and describe their relative advantages and disadvantages. Based on this analysis, we propose NIXMIG, a hybrid method that adaptively utilizes these two extremes to achieve both fast and cost-effective load balancing in distributed systems that support range queries. We theoretically prove its convergence and, as a case study, we offer an implementation on top of a Skip Graph, where we thoroughly validate our findings in a variety of static, dynamic, and realistic workloads. We compare NIXMIG with an existing load-balancing algorithm proposed by Karger and Ruhl [1], and our experimental analysis shows that NIXMIG can be as much as three times faster, requiring only one sixth and one third of message and item exchanges, respectively, to bring the system to a balanced state.
49
FDAC: Toward Fine-Grained Distributed Data Access Control in Wireless Sensor Networks
Distributed sensor data storage and retrieval have gained increasing popularity in recent years for supporting various applications. While a distributed architecture makes a wireless sensor network (WSN) more robust and fault-tolerant, such an architecture also poses a number of security challenges, especially when applied in mission-critical applications such as battlefield and e-healthcare. First, as sensor data are stored and maintained by individual sensors and unattended sensors are easily subject to strong attacks such as physical compromise, it is significantly harder to ensure data security. Second, in many mission-critical applications, fine-grained data access control is a must, as illegal access to the sensitive data may cause disastrous results and/or be prohibited by the law. Last but not least, sensor nodes usually are resource-constrained, which limits the direct adoption of expensive cryptographic primitives. To address the above challenges, we propose, in this paper, a distributed data access control scheme that is able to enforce fine-grained access control over sensor data and is resilient against strong attacks such as sensor compromise and user collusion. The proposed scheme exploits a novel cryptographic primitive called attribute-based encryption (ABE), tailoring and adapting it for WSNs with respect to both performance and security requirements. The feasibility of the scheme is demonstrated by experiments on real sensor platforms. To the best of our knowledge, this paper is the first to realize distributed fine-grained data access control for WSNs.
50
Flexible Robust Group Key Agreement
A robust group key agreement protocol (GKA) allows a set of players to establish a shared secret key, regardless of network/node failures. Current constant-round GKA protocols are either efficient and nonrobust or robust but not efficient; assuming a reliable broadcast communication medium, the standard encryption-based group key agreement protocol can be robust against an arbitrary number of node faults, but the size of the messages broadcast by every player is proportional to the number of players. In contrast, nonrobust group key agreement can be achieved with each player broadcasting just constant-sized messages. We propose a novel 2-round group key agreement protocol, which tolerates up to T node failures, using O(T)-sized messages for any T. We show that the new protocol implies a fully robust group key agreement with logarithmic-sized messages and expected round complexity close to 2, assuming random node faults. The protocol can be extended to withstand malicious insiders at small constant factor increases in bandwidth and computation. The proposed protocol is secure under the (standard) Decisional Square Diffie-Hellman assumption.
51
Group Strategyproof Multicast in Wireless Networks
We study the dissemination of common information from a source to multiple nodes within a multihop wireless network, where nodes are equipped with uniform omnidirectional antennas and have a fixed cost per packet transmission. While many nodes may be interested in the dissemination service, their valuation or utility for such a service is usually private information. A desirable routing and charging mechanism encourages truthful utility reports from the nodes. We provide both negative and positive results toward such mechanism design. We show that in order to achieve the group strategyproof property, a compromise in routing optimality or budget-balance is inevitable. In particular, the fraction of optimal routing cost that can be recovered through node charges cannot be significantly higher than 1/2. To answer the question whether constant-ratio cost recovery is possible, we further apply a primal-dual schema to simultaneously build a routing solution and a cost-sharing scheme, and prove that the resulting mechanism is group strategyproof and guarantees 1/4-approximate cost recovery against an optimal routing scheme.
52
HaRP: Rapid Packet Classification via Hashing Round-Down Prefixes
Packet classification is central to a wide array of Internet applications and services, with its approaches mostly involving either hardware support or optimization steps needed by software-oriented techniques (to add precomputed markers and insert rules in the search data structures). Unfortunately, an approach with hardware support is expensive and has limited scalability, whereas one with optimization fails to handle incremental rule updates effectively. This work deals with rapid packet classification, realized by hashing round-down prefixes (HaRP) in a way that the source and the destination IP prefixes specified in a rule are rounded down to designated prefix lengths (DPL) for indexing into hash sets. HaRP exhibits superb hash storage utilization, able to not only outperform those earlier software-oriented classification techniques but also well accommodate dynamic creation and deletion of rules. HaRP makes it possible to hold all its search data structures in the local cache of each core within a contemporary processor, dramatically elevating its classification performance. Empirical results measured on an AMD 4-way 2.8 GHz Opteron system (with 1 MB cache for each core) under six filter data sets (each with up to 30 K rules) obtained from a public source unveil that HaRP enjoys up to some 3.6x the throughput level achievable by the best known decision tree-based counterpart, HyperCuts (HC).
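The idea of rounding rule prefixes down to designated prefix lengths and using the result as a hash key can be sketched with Python's standard `ipaddress` module. The set of lengths in `DPLS`, the rule format, and the brute-force probing of every length pair are assumptions chosen for illustration; the actual HaRP data structures and rule priorities are not reproduced.

```python
import ipaddress

DPLS = (8, 16, 24, 32)  # hypothetical designated prefix lengths


def round_down(prefix, dpls=DPLS):
    """Shorten a prefix to the longest designated length not exceeding it."""
    net = ipaddress.ip_network(prefix, strict=False)
    eligible = [length for length in dpls if length <= net.prefixlen]
    if not eligible:
        return None  # shorter than every DPL; such rules are ignored below
    return net.supernet(new_prefix=max(eligible))


def build_index(rules):
    """rules: iterable of (src_prefix, dst_prefix, action) tuples."""
    index = {}
    for src, dst, action in rules:
        key = (round_down(src), round_down(dst))
        entry = (ipaddress.ip_network(src, strict=False),
                 ipaddress.ip_network(dst, strict=False), action)
        index.setdefault(key, []).append(entry)
    return index


def classify(index, src_ip, dst_ip):
    """Probe one hash key per DPL pair, then check the full rule prefixes."""
    s, d = ipaddress.ip_address(src_ip), ipaddress.ip_address(dst_ip)
    matches = []
    for ls in DPLS:
        for ld in DPLS:
            key = (ipaddress.ip_network(f"{src_ip}/{ls}", strict=False),
                   ipaddress.ip_network(f"{dst_ip}/{ld}", strict=False))
            for rule_src, rule_dst, action in index.get(key, ()):
                if s in rule_src and d in rule_dst:
                    matches.append(action)
    return matches


if __name__ == "__main__":
    idx = build_index([("10.1.0.0/20", "192.168.1.0/24", "allow")])
    print(classify(idx, "10.1.5.9", "192.168.1.7"))
```

Because every rule lands under a key of one of a few fixed lengths, a lookup needs only a bounded number of hash probes, and adding or deleting a rule touches a single bucket.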
53
hiCUDA: High-Level GPGPU Programming
Graphics Processing Units (GPUs) have become a competitive accelerator for applications outside the graphics domain, mainly driven by the improvements in GPU programmability. Although the Compute Unified Device Architecture (CUDA) is a simple C-like interface for programming NVIDIA GPUs, porting applications to CUDA remains a challenge to average programmers. In particular, CUDA places on the programmer the burden of packaging GPU code in separate functions, of explicitly managing data transfer between the host and GPU memories, and of manually optimizing the utilization of the GPU memory. Practical experience shows that the programmer needs to make significant code changes, often tedious and error-prone, before getting an optimized program. We have designed hiCUDA, a high-level directive-based language for CUDA programming. It allows programmers to perform these tedious tasks in a simpler manner and directly to the sequential code, thus speeding up the porting process. In this paper, we describe the hiCUDA directives as well as the design and implementation of a prototype compiler that translates a hiCUDA program to a CUDA program. Our compiler is able to support real-world applications that span multiple procedures and use dynamically allocated arrays. Experiments using nine CUDA benchmarks show that the simplicity hiCUDA provides comes at no expense to performance.
54
Hybrid Core Acceleration of UWB SIRE Radar Signal Processing
To move High-Performance Computing (HPC) closer to forward operating environments and missions, the Army Research Laboratory is developing approaches using hybrid, asymmetric core computing. By blending capabilities found in Graphics Processing Units (GPUs) and traditional von Neumann multicore Central Processing Units (CPUs), approaches are being developed and optimized to provide at or near real-time processing speeds for research project applications. Algorithms are designed to partition work to resources best designed to handle the processing load. The use of commodity resources allows the design to be flexible throughout the life cycle without the costly and time-consuming delays associated with Application-Specific Integrated Circuit (ASIC) development. This paradigm allows for rapid technology transfer to end users. In this paper, we describe a synchronous impulse reconstruction radar imaging algorithm that has been designed for hybrid CPU-GPU processing. We discuss various optimizations such as asynchronous task partitioning between the CPU and GPU as well as data movement reduction. We also discuss analysis and design of the algorithms within the context of two programming models: NVIDIA's CUDA and AMD's ATI Brook+. Finally, we report on the speedup achieved by this approach that allowed us to take a code once restricted to postprocessing and transform it into one that exceeds real-time performance requirements.
55
Impact of Traffic Influxes: Revealing Exponential Intercontact Time in Urban VANETs
Intercontact time between moving vehicles is one of the key metrics in vehicular ad hoc networks (VANETs) and central to forwarding algorithms and the end-to-end delay. Due to prohibitive costs, little work has conducted experimental studies on intercontact time in urban vehicular environments. In this paper, we carry out an extensive experiment involving thousands of operational taxis in Shanghai city. Studying the taxi trace data on the frequency and duration of transfer opportunities between taxis, we observe that the tail distribution of the intercontact time, that is, the time gap separating two contacts of the same pair of taxis, exhibits an exponential decay over a large range of timescales. This observation is in sharp contrast to recent empirical data studies based on human mobility, in which the distribution of the intercontact time obeys a power law. By analyzing a simplified mobility model that captures the effect of hot areas in the city, we rigorously prove that common traffic influxes, where large volumes of traffic converge, play a major role in generating the exponential tail of the intercontact time. Our results thus provide fundamental guidelines on the design of new vehicular mobility models in urban scenarios, new data forwarding protocols, and their performance analysis.
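The intercontact time defined above, and its empirical tail distribution, can be computed from a contact log with a short Python sketch. The input format (a list of start and end times for one pair of vehicles) and the choice to measure each gap from the end of one contact to the start of the next are assumptions of this sketch, not details taken from the trace study.

```python
def intercontact_times(contacts):
    """Gaps between successive contacts of one pair.

    contacts: list of (start, end) times for a single pair of vehicles.
    """
    ordered = sorted(contacts)
    gaps = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end:       # skip overlapping records
            gaps.append(next_start - prev_end)
    return gaps


def tail_distribution(gaps, t):
    """Empirical P(T > t): the fraction of gaps longer than t."""
    if not gaps:
        return 0.0
    return sum(gap > t for gap in gaps) / len(gaps)


if __name__ == "__main__":
    gaps = intercontact_times([(0, 10), (40, 50), (110, 120)])
    print(gaps, tail_distribution(gaps, 45))
```

Plotting the logarithm of `tail_distribution(gaps, t)` against `t` gives a roughly straight line for an exponential tail, whereas a power law is straight only when `t` is also on a logarithmic axis.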
56
Integrating Caching and Prefetching Mechanisms in a Distributed Transactional Memory
We present a distributed transactional memory system that exploits a new opportunity to automatically hide network latency by speculatively prefetching and caching objects. The system includes an object caching framework, language extensions to support our approach, and symbolic prefetches. To our knowledge, this is the first prefetching approach that can prefetch objects whose addresses have not been computed or predicted. Our approach makes aggressive use of both prefetching and caching of remote objects to hide network latency while relying on the transaction commit mechanism to preserve the simple transactional consistency model that we present to the developer. We have evaluated this approach on three distributed benchmarks, five scientific benchmarks, and several microbenchmarks. We have found that our approach enables our benchmark applications to effectively utilize multiple machines and benefit from prefetching and caching. We have observed a speedup of up to 7.26x for distributed applications on our system using prefetching and caching and a speedup of up to 5.55x for parallel applications on our system.
57
Interlacing Bypass Rings to Torus Networks for More Efficient Networks
We introduce a new technique for generating more efficient networks by systematically interlacing bypass rings to torus networks (iBT networks). The resulting network can improve the original torus network by reducing the network diameter, node-to-node distances, and by increasing the bisection width without increasing wiring and other engineering complexity. We present and analyze the statement that a 3D iBT network proposed by our technique outperforms 4D torus networks of the same node degree. We found that interlacing rings of sizes 6 and 12 to all three dimensions of a torus network with meshes 30 x 30 x 36 generates the best network of all possible networks, including 4D torus and hypercube of approximately 32,000 nodes. This demonstrates that strategically interlacing bypass rings into a 3D torus network enhances the torus network more effectively than adding a fourth dimension, although we may generalize the claim. We also present a node-to-node distance formula for the iBT networks.
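For reference, the baseline figures of a plain torus, before any bypass rings are interlaced, follow from standard formulas and can be checked with a few lines of Python. The function names are illustrative, and the iBT distance formula from the paper is not reproduced here.

```python
from math import prod


def torus_nodes(dims):
    """Number of nodes in a torus with the given ring size per dimension."""
    return prod(dims)


def torus_diameter(dims):
    """Longest shortest path: half of each ring, summed over dimensions."""
    return sum(k // 2 for k in dims)


def torus_degree(dims):
    """Links per node: two per dimension when the ring has more than 2 nodes."""
    return sum(2 if k > 2 else (1 if k == 2 else 0) for k in dims)


if __name__ == "__main__":
    dims = (30, 30, 36)
    print(torus_nodes(dims), torus_diameter(dims), torus_degree(dims))
```

For the 30 x 30 x 36 mesh this gives 32,400 nodes, which matches the "approximately 32,000 nodes" in the abstract, with a diameter of 48 hops and node degree 6 for the unmodified torus.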
58
Joint Optimization of Complexity and Overhead for the Routing in Hierarchical Networks
The hierarchical network structure was proposed in the early 1980s and has become popular nowadays. The routing complexity and the routing table size are the two primary performance measures in a dynamic route guidance system. Although various algorithms exist for finding the best routing policy in a hierarchical network, hardly any work exists on studying and evaluating the aforementioned measures for a hierarchical network. In this paper, we propose a new mathematical framework that computes the averages of the routing complexity and the routing table size and expresses them as functions of the hierarchical network parameters, such as the number of hierarchical levels and the subscriber density (cluster population) for each hierarchical level.
59
Key Predistribution Schemes for Establishing Pairwise Keys with a Mobile Sink in Sensor Networks
Security services such as authentication and pairwise key establishment are critical to sensor networks. They enable sensor nodes to communicate securely with each other using cryptographic techniques. In this paper, we propose two key predistribution schemes that enable a mobile sink to establish a secure data-communication link, on the fly, with any sensor nodes. The proposed schemes are based on the polynomial pool-based key predistribution scheme, the probabilistic generation key predistribution scheme, and the Q-composite scheme. The security analysis in this paper indicates that these two proposed predistribution schemes assure, with high probability and low communication overhead, that any sensor node can establish a pairwise key with the mobile sink. Comparing the two proposed key predistribution schemes with the Q-composite scheme, the probabilistic key predistribution scheme, and the polynomial pool-based scheme, our analytical results clearly show that our schemes perform better in terms of network resilience to node capture than existing schemes if used in wireless sensor networks with mobile sinks.
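Polynomial-based key predistribution generally rests on a symmetric bivariate polynomial f(x, y) = f(y, x) over a finite field: each node stores f(id, y), and two nodes derive the same key by evaluating their stored share at the other's identifier. The Python sketch below shows only that textbook building block. The modulus, the degree, the node identifiers, and the use of `random` instead of a cryptographic generator are assumptions for illustration, and the two schemes proposed in the paper are not reproduced.

```python
import random

P = 2_147_483_647  # prime modulus (2**31 - 1), chosen for illustration


def make_symmetric_poly(degree, rng):
    """Random coefficients a[i][j] with a[i][j] == a[j][i]."""
    size = degree + 1
    coeffs = [[0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i, size):
            coeffs[i][j] = coeffs[j][i] = rng.randrange(P)
    return coeffs


def share_for(coeffs, node_id):
    """Coefficients of the univariate share g(y) = f(node_id, y)."""
    size = len(coeffs)
    return [sum(coeffs[i][j] * pow(node_id, i, P) for i in range(size)) % P
            for j in range(size)]


def pairwise_key(share, other_id):
    """Evaluate the stored share at the other party's identifier."""
    return sum(c * pow(other_id, j, P) for j, c in enumerate(share)) % P


if __name__ == "__main__":
    rng = random.Random(7)       # not cryptographically secure; demo only
    poly = make_symmetric_poly(3, rng)
    sink, sensor = 101, 2024     # hypothetical identifiers
    k1 = pairwise_key(share_for(poly, sink), sensor)
    k2 = pairwise_key(share_for(poly, sensor), sink)
    print(k1 == k2)
```

Both sides compute f(101, 2024) without exchanging any secret, and an attacker must capture more shares than the polynomial degree to recover f, which is why node-capture resilience is the natural metric for comparing such schemes.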
60
LBMP: A Logarithm-Barrier-Based Multipath Protocol for Internet Traffic Management
Traffic management is the adaptation of source rates and routing to efficiently utilize network resources. Recently, the complicated interactions between different Internet traffic management modules have been elegantly modeled by distributed primal-dual utility maximization, which sheds new light on developing effective management protocols. For single-path routing with given routes, the dual is a strictly concave network optimization problem. Unfortunately, the general form of multipath utility optimization is not strictly concave, making its solution quite unstable. Decomposition-based techniques like TRaffic-management Using Multipath Protocol (TRUMP) alleviate the instability, but their convergence is not guaranteed, nor is their optimality. They are also inflexible in differentiating the control at different links. In this paper, we address the above issues through a novel logarithm-barrier-based approach. Our approach jointly considers user utility and routing/congestion control. It translates the multipath utility maximization into a sequence of unconstrained optimization problems, with infinite logarithm barriers being deployed at the constraint boundary. We demonstrate that setting up barriers is much simpler than choosing traditional cost functions and, more importantly, it makes the optimal solution achievable. We further demonstrate a distributed implementation, together with the design of a practical Logarithm-Barrier-based Multipath Protocol (LBMP). We evaluate the performance of LBMP through both numerical analysis and packet-level simulations. The results show that LBMP achieves high throughput and fast convergence over diverse representative network topologies. Such performance is comparable to TRUMP, and is often better. Moreover, LBMP is flexible in differentiating the control at different links, and its optimality and convergence are theoretically guaranteed.
61
Lightweight Chip Multi-Threading (LCMT): Maximizing Fine-Grained Parallelism On-Chip
Irregular and dynamic applications, such as graph problems and agent-based simulations, often require fine-grained parallelism to achieve good performance. However, current multicore processors only provide architectural support for coarse-grained parallelism, making it necessary to use software-based multithreading environments to effectively implement fine-grained parallelism. Although these software-based environments have demonstrated superior performance over heavyweight, OS-level threads, they are still limited by the significant overhead involved in thread management and synchronization. In order to address this, we propose a Lightweight Chip Multi-Threaded (LCMT) architecture that further exploits thread-level parallelism (TLP) by incorporating direct architectural support for an unlimited number of dynamically created lightweight threads with very low thread management and synchronization overhead. The LCMT architecture can be implemented atop a mainstream architecture with minimum extra hardware to leverage existing legacy software environments. We compare the LCMT architecture with a Niagara-like baseline architecture. Our results show up to 1.8X better scalability, 1.91X better performance, and more importantly, 1.74X better performance per watt, using the LCMT architecture for irregular and dynamic benchmarks, when compared to the baseline architecture. The LCMT architecture delivers similar performance to the baseline architecture for regular benchmarks.
62
Load Balance with Imperfect Information in Structured Peer-to-Peer Systems
With the notion of virtual servers, peers participating in a heterogeneous, structured peer-to-peer (P2P) network may host different numbers of virtual servers, and by migrating virtual servers, peers can balance their loads proportional to their capacities. The existing decentralized load balance algorithms designed for heterogeneous, structured P2P networks either explicitly construct auxiliary networks to manipulate global information or implicitly demand that the P2P substrates be organized in a hierarchical fashion. Without relying on any auxiliary networks and independent of the geometry of the P2P substrates, we present, in this paper, a novel load balancing algorithm that is unique in that each participating peer uses only partial knowledge of the system to estimate the probability distributions of the capacities of peers and the loads of virtual servers, resulting in imperfect knowledge of the system state. With the imperfect system state, peers can compute their expected loads and reallocate their loads in parallel. Through extensive simulations, we compare our proposal to prior load balancing algorithms.
63
Many Task Computing for Real-Time Uncertainty Prediction and Data Assimilation in the Ocean
Uncertainty prediction for ocean and climate predictions is essential for multiple applications today. Many-Task Computing can play a significant role in making such predictions feasible. In this manuscript, we focus on ocean uncertainty prediction using the Error Subspace Statistical Estimation (ESSE) approach. In ESSE, uncertainties are represented by an error subspace of variable size. To predict these uncertainties, we perturb an initial state based on the initial error subspace and integrate the corresponding ensemble of initial conditions forward in time, including stochastic forcing during each simulation. The dominant error covariance (generated via SVD of the ensemble) is used for data assimilation. The resulting ocean fields are used as inputs for predictions of underwater sound propagation. ESSE is a classic case of Many Task Computing: It uses dynamic heterogeneous workflows and ESSE ensembles are data intensive applications. We first study the execution characteristics of a distributed ESSE workflow on a medium size dedicated cluster, examine in more detail the I/O patterns exhibited and throughputs achieved by its components as well as the overall ensemble performance seen in practice. We then study the performance/usability challenges of employing Amazon EC2 and the Teragrid to augment our ESSE ensembles and provide better solutions faster.
64
Mars: Accelerating MapReduce with Graphics Processors
We design and implement Mars, a MapReduce runtime system accelerated with graphics processing units (GPUs). MapReduce is a simple and flexible parallel programming paradigm originally proposed by Google, for the ease of large-scale data processing on thousands of CPUs. Compared with CPUs, GPUs have an order of magnitude higher computation power and memory bandwidth. However, GPUs are designed as special-purpose coprocessors and their programming interfaces are less familiar than those on the CPUs to MapReduce programmers. To harness GPUs' power for MapReduce, we developed Mars to run on NVIDIA GPUs, AMD GPUs as well as multicore CPUs. Furthermore, we integrated Mars into Hadoop, an open-source CPU-based MapReduce system. Mars hides the programming complexity of GPUs behind the simple and familiar MapReduce interface, and automatically manages task partitioning, data distribution, and parallelization on the processors. We have implemented six representative applications on Mars and evaluated their performance on PCs equipped with GPUs as well as multicore CPUs. The experimental results show that the GPU-CPU coprocessing of Mars on an NVIDIA GTX280 GPU and an Intel quad-core CPU outperformed Phoenix, the state-of-the-art MapReduce on the multicore CPU, with a speedup of up to 72 times and 24 times on average, depending on the applications. Additionally, integrating Mars into Hadoop enabled GPU acceleration for a network of PCs.
65
Massively LDPC Decoding on Multicore Architectures
Unlike usual VLSI approaches necessary for the computation of intensive Low-Density Parity-Check (LDPC) code decoders, this paper presents flexible software-based LDPC decoders. Algorithms and data structures suitable for parallel computing are proposed in this paper to perform LDPC decoding on multicore architectures. To evaluate the efficiency of the proposed parallel algorithms, LDPC decoders were developed on recent multicores, such as off-the-shelf general-purpose x86 processors, Graphics Processing Units (GPUs), and the CELL Broadband Engine (CELL/B.E.). Challenging restrictions, such as memory access conflicts, latency, coalescence, or unknown behavior of thread and block schedulers, were unraveled and worked out. Experimental results for different code lengths show throughputs in the order of 1 ~ 2 Mbps on the general-purpose multicores, and ranging from 40 Mbps on the GPU to nearly 70 Mbps on the CELL/B.E. The analysis of the obtained results allows us to conclude that the CELL/B.E. performs better for short to medium length codes, while the GPU achieves superior throughputs with larger codes. They achieve throughputs that in some cases approach very well those obtained with VLSI decoders. From the analysis of the results, we can predict a throughput increase with the rise of the number of cores. Index Terms: LDPC, data-parallel computing, multicore, graphics
66
Maximizing the Number of Broadcast Operations in Random Geometric Ad Hoc Wireless Networks
We consider static ad hoc wireless networks whose nodes, equipped with the same initial battery charge, may dynamically change their transmission range. When a node v transmits with range r(v), its battery charge is decreased by B × r(v)^2, where B > 0 is a fixed constant. The goal is to provide a range assignment schedule that maximizes the number of broadcast operations from a given source (this number is denoted by the length of the schedule). This maximization problem, denoted by MAX LIFETIME, is known to be NP-hard and the best algorithm yields worst-case approximation ratio Θ(log n), where n is the number of nodes of the network. We consider random geometric instances formed by selecting n points independently and uniformly at random from a square of side length sqrt(n) in the Euclidean plane. We present an efficient algorithm that constructs a range assignment schedule having length not smaller than 1\2 of the optimum with high probability. Then we design an efficient distributed version of the above algorithm, where nodes initially know n and their own position only. The resulting schedule guarantees the same approximation ratio achieved by the centralized version, thus obtaining the first distributed algorithm having provably good performance for this problem.
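As a rough illustration of the energy model in this abstract (not the paper's scheduling algorithm), the sketch below counts how many transmissions one node can afford at a fixed range. The function name and the sample values are assumptions made for the example.

```python
import math


def broadcasts_supported(initial_charge: float, tx_range: float, b: float) -> int:
    """Number of transmissions a node can afford at a fixed range.

    Each transmission with range r costs b * r**2 units of charge,
    following the energy model stated in the abstract (b > 0).
    """
    if b <= 0 or tx_range <= 0:
        raise ValueError("b and tx_range must be positive")
    cost_per_transmission = b * tx_range ** 2
    return math.floor(initial_charge / cost_per_transmission)


# Hypothetical values: the same battery at two different ranges.
long_range = broadcasts_supported(100.0, 5.0, 1.0)
short_range = broadcasts_supported(100.0, 2.5, 1.0)
print(long_range, short_range)
```

Because the cost grows with the square of the range, halving the range quadruples the number of affordable transmissions, which is why the choice of range assignment matters for schedule length.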
67
Measuring Client-Perceived Pageview Response Time of Internet Services
As e-commerce services are growing exponentially, businesses need quantitative estimates of client-perceived response times to continuously improve the quality of their services. Current server-side nonintrusive measurement techniques are limited to nonsecured HTTP traffic. In this paper, we present the design and evaluation of a monitor,
namely sMonitor, which is able to measure client-perceived response times for both HTTP and HTTPS traffic. At the heart of sMonitor is a novel size-based analysis method that parses live packets to delimit different web pages and to infer their response times. The method is based on the observation that most HTTP(S)-compatible browsers send significantly larger requests for container objects than for embedded objects. sMonitor is designed to operate accurately in the presence of complicated browser behaviors, such as parallel downloading of multiple web pages and HTTP pipelining, as well as packet losses and delays. It only requires passively collecting network traffic in and out of the monitored secured services. We conduct comprehensive experiments across a wide range of operating conditions using live secured Internet services, on PlanetLab, and on controlled networks. The experimental results demonstrate that sMonitor is able to keep the estimation error within 6.7 percent, in comparison with the actual measured time at the client side.
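The size-based idea described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the monitor's actual method: the threshold value, the function name, and the (timestamp, size) input format are all hypothetical, and a real monitor would work on live packets and handle pipelining and losses.

```python
from typing import List, Tuple

# Hypothetical threshold in bytes; a real monitor would have to calibrate it.
CONTAINER_REQUEST_MIN_BYTES = 600

Request = Tuple[float, int]  # (timestamp in seconds, request size in bytes)


def delimit_pages(requests: List[Request],
                  threshold: int = CONTAINER_REQUEST_MIN_BYTES) -> List[List[Request]]:
    """Group requests into pages by request size.

    A request at or above the threshold is treated as a container-object
    request and starts a new page; smaller requests are treated as
    embedded objects of the current page.
    """
    pages: List[List[Request]] = []
    for timestamp, size in sorted(requests):
        if size >= threshold or not pages:
            pages.append([])
        pages[-1].append((timestamp, size))
    return pages


trace = [(0.0, 700), (0.1, 300), (0.2, 320), (1.5, 800), (1.6, 310)]
for page in delimit_pages(trace):
    print(len(page), "requests starting at", page[0][0])
```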
68
Metadata Distribution and Consistency Techniques for Large-Scale Cluster File Systems
Most supercomputers nowadays are based on large clusters, which call for sophisticated, scalable, and decentralized metadata processing techniques. From the perspective of maximizing metadata throughput, an ideal metadata distribution policy should automatically balance namespace locality and even distribution without manual intervention. None of the existing metadata distribution schemes is designed to strike such a balance. We propose a novel metadata distribution policy, Dynamic Dir-Grain (DDG), which seeks to balance the requirements of keeping namespace locality and even distribution of the load by dynamically partitioning the namespace into size-adjustable hierarchical units. Extensive simulation and measurement results show that DDG policies with a proper granularity significantly outperform traditional techniques such as the Random policy and the Subtree policy by 40 percent to 62 times. In addition, from the perspective of file system reliability, metadata consistency is an equally important issue. However, it is complicated by dynamic metadata distribution. Metadata consistency of cross-metadata server operations cannot be solved by traditional metadata journaling on each server. While the traditional two-phase commit (2PC) algorithm can be used, it is too costly for distributed file systems. We propose a consistent metadata processing protocol, S2PC-MP, which combines the two-phase commit algorithm with metadata processing to reduce overheads. Our measurement results show that S2PC-MP not only ensures fast recovery, but also greatly reduces fail-free execution overheads.
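For readers unfamiliar with the baseline mentioned in this abstract, the following is a minimal in-memory sketch of the textbook two-phase commit algorithm. It is not S2PC-MP, whose details the abstract does not give; the class and function names are hypothetical, and logging, timeouts, and recovery are left out.

```python
from typing import List


class Participant:
    """A metadata server taking part in a cross-server operation."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> bool:
        # Phase 1: vote yes only if the local update could be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Return True if the operation committed on every participant."""
    votes = [p.prepare() for p in participants]  # phase 1: collect votes
    if all(votes):
        for p in participants:                   # phase 2: commit everywhere
            p.commit()
        return True
    for p in participants:                       # phase 2: abort everywhere
        p.abort()
    return False


servers = [Participant("mds-a"), Participant("mds-b", can_commit=False)]
print(two_phase_commit(servers), [s.state for s in servers])
```

Every operation pays for two rounds of messages to all participants even when nothing fails, which is the kind of fail-free overhead the abstract says 2PC incurs.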
69
Minimum-Delay Service Provisioning in Opportunistic Networks
Opportunistic networks are created dynamically by exploiting contacts between pairs of mobile devices that come within communication range. While forwarding in opportunistic networking has been explored, investigations into asynchronous service provisioning on top of opportunistic networks are unique contributions of this paper. Mobile devices are typically heterogeneous, possess disparate physical resources, and can provide a variety of services. During opportunistic contacts, the pairing peers can cooperatively provide (avail of) their (other peers') services. This service provisioning paradigm is a key feature of the emerging opportunistic computing paradigm. We develop an analytical model to study the behaviors of service-seeking nodes (seekers) and service-providing nodes (providers) that spawn and execute service requests, respectively. The model considers the case in which seekers can spawn parallel executions on multiple providers for any given request, and determines: 1) the delays at different stages of service provisioning; and 2) the optimal number of parallel executions that minimizes the expected execution time. The analytical model is validated through simulations, and exploited to investigate the performance of service provisioning over a wide range of parameters.
70
Multicloud Deployment of Computing Clusters for Loosely Coupled MTC Applications
Cloud computing is gaining acceptance in many IT organizations, as an elastic, flexible, and variable-cost way to deploy their service platforms using outsourced resources. Unlike traditional utilities where a single provider scheme is a common practice, the ubiquitous access to cloud resources easily enables the simultaneous use of different clouds. In this paper, we explore this scenario to deploy a computing cluster on top of a multicloud infrastructure, for solving loosely coupled Many-Task Computing (MTC) applications. In this way, the cluster nodes can be provisioned with resources from different clouds to improve the cost effectiveness of the deployment, or to implement high-availability strategies. We prove the viability of this kind of solution by evaluating the scalability, performance, and cost of different configurations of a Sun Grid Engine cluster, deployed on a multicloud infrastructure spanning a local data center and three different cloud sites: Amazon EC2 Europe, Amazon EC2 US, and ElasticHosts. Although the testbed deployed in this work is limited to a reduced number of computing resources (due to hardware and budget limitations), we have complemented our analysis with a simulated infrastructure model, which includes a larger number of resources, and runs larger problem sizes. Data obtained by simulation show that performance and cost results can be extrapolated to large-scale problems and cluster infrastructures.
71
Multipath Routing and Max-Min Fair QoS Provisioning under Interference Constraints in Wireless Multihop Networks
In this paper, we investigate the problem of flow routing and fair bandwidth allocation under interference constraints for multihop wireless networks. We first develop a novel isotonic routing metric, RI3M, considering the influence of interflow and intraflow interference. The isotonicity of the routing metric is proved using virtual network decomposition. Second, in order to ensure QoS, an interference-aware max-min fair bandwidth allocation algorithm, LMX:M3F, is proposed where multiple paths (determined by using the routing metric) coexist for each user to the base station. In order to solve the algorithm, we develop an optimization formulation that is modeled as a multicommodity flow problem where the lexicographically largest bandwidth allocation vector is found among all optimal allocation vectors while considering constraints of interference on the flows. We compare our RI3M routing metric and LMX:M3F bandwidth allocation algorithm with various interference-based routing metrics and interference-aware bandwidth allocation algorithms established in the literature. We show that RI3M and LMX:M3F succeed in improving network performance in terms of delay, packet loss ratio, and bandwidth usage.
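To make the max-min fairness notion concrete, here is a sketch of the textbook progressive-filling procedure on a single shared link. It is a deliberately simplified stand-in, not LMX:M3F: there are no multiple paths and no interference constraints, and the function name and flow labels are hypothetical.

```python
from typing import Dict


def max_min_fair(capacity: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Max-min fair shares of one link among flows with given demands."""
    allocation = {flow: 0.0 for flow in demands}
    unsatisfied = dict(demands)
    remaining = capacity
    while unsatisfied and remaining > 1e-12:
        fair_share = remaining / len(unsatisfied)
        # Flows that want no more than the fair share are fully satisfied.
        satisfied = {f: d for f, d in unsatisfied.items() if d <= fair_share}
        if not satisfied:
            # Every remaining flow wants more than the fair share: split evenly.
            for f in unsatisfied:
                allocation[f] = fair_share
            break
        for f, d in satisfied.items():
            allocation[f] = d
            remaining -= d
            del unsatisfied[f]
    return allocation


print(max_min_fair(10.0, {"a": 2.0, "b": 4.0, "c": 10.0}))
```

In the sample call, the two small flows receive their full demands and the large flow receives whatever capacity is left, so no flow can gain bandwidth without taking it from a flow that already has less or equal.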
72
Multispanning Tree Zone-Ordered Label-Based Routing Algorithms for Irregular Networks
In this paper, a diverse range of routing algorithms is classified into a new family of routings called zone-ordered label-based routing algorithms. The proposed classification is based on three common steps (factors) for generating such routings, namely, graph labeling, deadlock-free zones, and zone ordering. The main goal of this classification is to define several new routing concepts and streamline the knowledge on routing algorithms. Following the classification, a novel methodology is proposed to generate routing algorithms for irregular networks. The methodology uses the
three mentioned steps to generate deadlock-free routings. Consequently, the methodology-based routings fall into the category of zone-ordered label-based routings. However, the graph labeling method (first step) used in the methodology is based on multiple spanning tree construction on the network. The simulation results show that constructing further spanning trees may result in routing algorithms with better performance.
73
Network Immunization with Distributed Autonomy-Oriented Entities
Many communication systems, e.g., the Internet, can be modeled as complex networks. For such networks, immunization strategies are necessary for preventing malicious attacks or viruses from percolating from a node to its neighboring nodes following their connectivities. In recent years, various immunization strategies have been proposed and demonstrated, most of which rest on the assumptions that the strategies can be executed in a centralized manner and/or that the complex network at hand is reasonably stable (its topology will not change over time). In other words, it would be difficult to apply them in a decentralized network environment, as often found in the real world. In this paper, we propose a decentralized and scalable immunization strategy based on a self-organized computing approach called autonomy-oriented computing (AOC) [1], [2]. In this strategy, autonomous behavior-based entities are deployed in a decentralized network, and are capable of collectively finding those nodes with high degrees of connectivity (i.e., those that can readily spread viruses). Through experiments involving both synthetic and real-world networks, we demonstrate that this strategy can effectively and efficiently locate highly connected nodes in decentralized complex network environments of various topologies, and it is also scalable in handling large-scale decentralized networks. We have compared our strategy with some of the well-known strategies, including acquaintance and covering strategies on both synthetic and real-world networks.
74
New Theory for Deadlock-Free Multicast Routing in Wormhole-Switched Virtual-Channelless Networks-on-Chip
A new theory for deadlock-free multicast routing, intended especially for on-chip interconnection networks (NoCs), is presented in this paper. The NoC router hardware solution that enables deadlock-free multicast routing without utilizing virtual channels is introduced formally. The special characteristic of the NoC is that wormhole packets can cut through at flit level and can be interleaved in the same channel with flits of different packets by multiplexing them using a rotating flit-by-flit arbitration. The routing paths of each flit can be guaranteed correct because flits belonging to the same packet are labeled with the same local ID-tag on every communication channel. Hence, the multicast deadlock problem can be solved at each router by further applying a hold-release tagging mechanism to control and manage conflicting multicast requests.
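The flit-level interleaving described above can be pictured with a small software model. This is a sketch under simplifying assumptions, not the router hardware: it shows only rotating (round-robin) arbitration and per-packet ID tags on one channel, omits the hold-release mechanism, and uses hypothetical function names and flit labels.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def interleave_flits(packets: Dict[int, List[str]]) -> List[Tuple[int, str]]:
    """Multiplex the flits of several packets onto one channel.

    packets maps a local ID tag to that packet's flits in order. The
    arbiter rotates over the packets and forwards one flit per turn,
    tagging each flit with its packet's ID so that the receiver can
    separate the interleaved streams again.
    """
    queues: Deque[Tuple[int, Deque[str]]] = deque(
        (tag, deque(flits)) for tag, flits in packets.items() if flits
    )
    channel: List[Tuple[int, str]] = []
    while queues:
        tag, flits = queues.popleft()
        channel.append((tag, flits.popleft()))
        if flits:  # packet still has flits: rejoin at the back of the rotation
            queues.append((tag, flits))
    return channel


def demultiplex(channel: List[Tuple[int, str]]) -> Dict[int, List[str]]:
    """Rebuild each packet from the tagged flits seen on the channel."""
    rebuilt: Dict[int, List[str]] = {}
    for tag, flit in channel:
        rebuilt.setdefault(tag, []).append(flit)
    return rebuilt


packets = {0: ["H0", "B0", "T0"], 1: ["H1", "T1"]}
channel = interleave_flits(packets)
print(channel)
print(demultiplex(channel) == packets)
```

Because every flit carries its packet's tag, the order of flits within each packet survives the interleaving even though the channel is shared.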
75
Nonnegative Tensor Factorization Accelerated Using GPGPU
This article presents an optimized algorithm for Nonnegative Tensor Factorization (NTF), implemented in the CUDA (Compute Unified Device Architecture) framework, that runs on contemporary graphics processors and exploits their massive parallelism. The NTF implementation is primarily targeted for analysis of high-dimensional spectral images, including dimensionality reduction, feature extraction, and other tasks related to spectral imaging; however, the algorithm and its implementation are not limited to spectral imaging. The speedups measured on real spectral images are around 60x to 100x compared to a traditional C implementation compiled with an optimizing compiler. Since common problems in the field of spectral imaging may take hours on a state-of-the-art CPU, the speedup achieved using a
graphics card is attractive. The implementation is publicly available in the form of a dynamically linked library, including an interface to MATLAB, and thus may be of help to researchers and engineers using NTF on large problems.
78
On Movement-Assisted Connectivity Restoration in Wireless Sensor and Actor Networks
In wireless sensor and actor networks (WSANs), a set of static sensor nodes and a set of (mobile) actor nodes form a network that performs distributed sensing and actuation tasks. In [1], Abbasi et al. presented DARA, a Distributed Actor Recovery Algorithm, which restores the connectivity of the inter-actor network by efficiently relocating some mobile actors when an actor fails. To restore 1- and 2-connectivity of the network, two algorithms are developed in [1]. Their basic idea is to find the smallest set of actors that needs to be repositioned to restore the required level of connectivity, with the objective of minimizing the movement overhead of relocation. Here, we show that the algorithms proposed in [1] will not work smoothly in all scenarios as claimed and give counterexamples for some algorithms and theorems proposed in [1]. We then present a general actor relocation problem and propose methods that will work correctly for several subsets of the problems. Specifically, our method does result in an optimum movement strategy with minimum movement overhead for the problems studied in [1].
79
On the Cost of Network Inference Mechanisms
A number of network path delay, loss, or bandwidth inference mechanisms have been proposed over the past decade. Concurrently, several network measurement services have been deployed over the Internet and intranets. We consider inference mechanisms that use O(n) end-to-end measurements to predict the O(n^2) end-to-end pairwise measurements among n nodes, and investigate when it is beneficial to use them in measurement services. In particular, we address the following questions: 1) For which measurement request patterns would using an inference mechanism be advantageous? 2) How does a measurement service determine the set of hosts that should utilize inference mechanisms, as opposed to those that are better served using direct end-to-end measurements? We explore three solutions that identify groups of hosts which are likely to benefit from inference. We compare these solutions in terms of effectiveness and algorithmic complexity. Results with synthetic data sets and data sets from a popular peer-to-peer system demonstrate that our techniques accurately identify host subsets that benefit from inference, in significantly less time than an algorithm that identifies optimal subsets. The measurement savings are large when measurement request patterns exhibit small-world characteristics, which is often the case. (Part of this work (focusing on one of three solutions presented in this paper) appeared in [1]).
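The gap between linear and quadratic measurement counts is easy to see with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the figure of 8 probes per host is an assumed constant for the linear scheme, not a number from the paper.

```python
def pairwise_measurements(n: int) -> int:
    """Direct measurements needed to cover every unordered pair of n hosts."""
    return n * (n - 1) // 2


def inference_measurements(n: int, per_host: int = 1) -> int:
    """Measurements under a hypothetical linear scheme: per_host probes per host."""
    return per_host * n


for n in (10, 100, 1000):
    print(n, pairwise_measurements(n), inference_measurements(n, per_host=8))
```

With these assumed numbers, inference costs more than direct measurement for a group of 10 hosts but far less for 1000, which is one way to see why a service has to decide which hosts should use inference at all.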
80
Online Capacity Identification of Multitier Websites Using Hardware Performance Counters
Understanding server capacity is crucial to system capacity planning, configuration, and QoS-aware resource management. Conventional stress testing approaches measure server capacity offline in terms of application-level performance metrics like response time and throughput. They are limited in measurement accuracy and timeliness. In a multitier website, the resource bottleneck often shifts between tiers as the client access pattern changes. This makes the problem of online capacity measurement even more challenging. This paper presents an online measurement approach based on low-level hardware performance metrics such as instruction execution rate and cache access behavior. Such metrics together define a system's internal running state. The measurement approach uses machine learning techniques to infer application-level performance at each tier from a set of selected hardware performance counters. A coordinated predictor is induced over individual tier-wide models to make global system performance predictions and identify the bottleneck when the system becomes overloaded. Experiments were conducted on a two-tier Tomcat/MySQL-configured website using TPC-W benchmarks. Experimental results demonstrated that this approach was able to achieve an overload prediction accuracy of higher than 90 percent for an a priori known input traffic mix and over 85 percent accuracy even for traffic causing frequent bottleneck shifting. It costs less than 0.5 percent runtime overhead for data collection and no more than 50 ms for each online decision.
81
Optimization of Rate Allocation with Distortion Guarantee in Sensor Networks
Lossy compression techniques are commonly used by long-term data-gathering applications that attempt to identify trends or other interesting patterns in an entire system, since a data packet need not always be completely and immediately transmitted to the sink. In these applications, a nonterminal sensor node jointly encodes its own sensed data and the data received from its nearby nodes. The tendency for these nodes to have a high spatial correlation means that these data packets can be efficiently compressed together using a rate-distortion strategy. This paper addresses the optimal rate-distortion allocation problem, which determines an optimal bit rate of each sensor based on
the target overall distortion to minimize the network transmission cost. We propose an analytically optimal rate-distortion allocation scheme, and we also extend it to a distributed version. Based on the presented allocation schemes, a greedy heuristic algorithm is proposed to build the most efficient data transmission structure to further reduce the transmission cost. The proposed methods were evaluated using simulations with real-world data sets. The simulation results indicate that the optimal allocation strategy can reduce the transmission cost to 6 to 15 percent of that for the uniform allocation scheme.
82
Parallel Implementation of the Irregular Terrain Model (ITM) for Radio Transmission Loss Prediction
The Irregular Terrain Model (ITM), also known as the Longley-Rice model, predicts long-range average transmission loss of a radio signal based on atmospheric and geographic conditions. Due to variable terrain effects and constantly changing atmospheric conditions, which can dramatically influence radio wave propagation, there is a pressing need for computational resources capable of running hundreds of thousands of transmission loss calculations per second. Multicore processors, like the NVIDIA Graphics Processing Unit (GPU) and IBM Cell Broadband Engine (BE), offer improved performance over mainstream microprocessors for ITM. We study architectural features of the Tesla C870 GPU and Cell BE and evaluate the effectiveness of architecture-specific optimizations and parallelization strategies for ITM on these platforms. We assess the GPU implementations that utilize both global and shared memories along with fine-grained parallelism. We assess the Cell BE implementations that utilize direct memory access, double buffering, and SIMDization. With these optimization strategies, we achieve less than a second of computation time on each platform, which is not feasible with a general-purpose processor, and we observe that the GPU delivers better performance than the Cell BE in terms of total execution time and performance per watt by factors of 2.3x and 1.6x, respectively.
83
Parameter Exploration in Science and Engineering Using Many-Task Computing
Robust scientific methods require the exploration of the parameter space of a system (some of which can be run in parallel on distributed resources), and may involve complete state space exploration, experimental design, or numerical optimization techniques. Many-Task Computing (MTC) provides a framework for performing robust design, because it supports the execution of a large number of otherwise independent processes. Further, scientific workflow engines facilitate the specification and execution of complex software pipelines, such as those found in real science and engineering design problems. However, most existing workflow engines do not support a wide range of experimentation techniques, nor do they support a large number of independent tasks. In this paper, we discuss Nimrod/K, a set of add-in components and a new runtime machine for a general workflow engine, Kepler. Nimrod/K provides an execution architecture based on the tagged dataflow concepts developed in the 1980s for highly parallel machines. This is embodied in a new Kepler Director that supports many-task computing by orchestrating execution of tasks on clusters, Grids, and Clouds. Further, Nimrod/K provides a set of Actors that facilitate the various modes of parameter exploration discussed above. We demonstrate the power of Nimrod/K to solve real problems in cardiac science.
84
PARTIC: Power-Aware Response Time Control for Virtualized Web Servers
Both power and performance are important concerns for enterprise data centers. While various management strategies have been developed to effectively reduce server power consumption by transitioning hardware components to lower
power states, they cannot be directly applied to today's data centers that rely on virtualization technologies. Virtual machines running on the same physical server are correlated because the state transition of any hardware component will affect the application performance of all the virtual machines. As a result, reducing power solely based on the performance level of one virtual machine may cause another to violate its performance specification. This paper proposes PARTIC, a two-layer control architecture designed based on well-established control theory. The primary control loop adopts a multi-input multi-output control approach to maintain load balancing among all virtual machines so that they can have approximately the same performance level relative to their allowed peak values. The secondary performance control loop then manipulates CPU frequency for power efficiency based on the uniform performance level achieved by the primary loop. Empirical results demonstrate that PARTIC can effectively reduce server power consumption while achieving required application-level performance for virtualized enterprise servers.
85
Passive Network Performance Estimation for Large-Scale, Data-Intensive Computing
Distributed computing applications are increasingly utilizing distributed data sources. However, the unpredictable cost of data access in large-scale computing infrastructures can lead to severe performance bottlenecks. Providing predictability in data access is, thus, essential to accommodate the large set of newly emerging large-scale, data-intensive computing applications. In this regard, accurate estimation of network performance is crucial to meeting the performance goals of such applications. Passive estimation based on past measurements is attractive for its relatively small overhead compared to relying on explicit probing. In this paper, we take a passive approach to network performance estimation. Our approach is different from existing passive techniques that rely either on past direct measurements of pairs of nodes or on topological similarities. Instead, we exploit secondhand measurements collected by other nodes without any topological restrictions. We present Overlay Passive Estimation of Network performance (OPEN), a scalable framework providing end-to-end network performance estimation based on secondhand measurements, and discuss how OPEN achieves cost-effective estimation in a large-scale infrastructure. Our extensive experimental results show that OPEN estimation is applicable to the replica and resource selections commonly used in distributed computing.
86
Performance Analysis of Cloud Computing Services for Many-Tasks Scientific Computing
Cloud computing is an emerging commercial infrastructure paradigm that promises to eliminate the need for maintaining expensive computing facilities by companies and institutes alike. Through the use of virtualization and resource time sharing, clouds serve with a single set of physical resources a large user base with different needs. Thus, clouds have the potential to provide to their owners the benefits of an economy of scale and, at the same time, become an alternative for scientists to clusters, grids, and parallel production environments. However, the current commercial clouds have been built to support web and small database workloads, which are very different from typical scientific computing workloads. Moreover, the use of virtualization and resource time sharing may introduce significant performance penalties for the demanding scientific computing workloads. In this work, we analyze the performance of cloud computing services for scientific computing workloads. We quantify the presence in real scientific computing workloads of Many-Task Computing (MTC) users, that is, of users who employ loosely coupled applications comprising many tasks to achieve their scientific goals. Then, we perform an empirical evaluation of the performance of four commercial cloud computing services including Amazon EC2, which is currently the largest commercial cloud. Last, we compare through trace-based simulation the performance characteristics and cost models of clouds and other
scientific computing platforms, for general and MTC-based scientific computing workloads. Our results indicate that the current clouds need an order of magnitude in performance improvement to be useful to the scientific community, and show which improvements should be considered first to address this discrepancy between offer and demand.
87
Performance Evaluation of Convolution on the Cell Broadband Engine Processor
Convolution represents a major computational load for many scientific and engineering applications, including seismic surface simulations and seismic imaging. Since convolution presents a heavy computational load, increasing its efficiency can significantly enhance the performance of associated applications. In this work, we present an in-depth analysis of the convolution algorithm and its complexity in order to develop adequate parallel algorithms. The implementation of these algorithms and their evaluation on the IBM Cell Broadband Engine (BE) processor reveals the gains and losses achieved by parallelizing the direct convolution. The performance results show that despite the complexity of the convolution processing, a speedup gain of at least 71.4 is obtained. The parallel vectorized algorithm requires the development effort of considering three independent vectorization strategies. Given the wide availability of Cell processors, the proposed parallelization approach can be widely adopted by any convolution-based application.
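For reference, direct convolution computed straight from its definition looks like the sketch below. It is a plain sequential baseline with a hypothetical function name, not the paper's vectorized Cell implementation.

```python
from typing import List, Sequence


def direct_convolution(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Full linear convolution computed directly from the definition.

    out[n] is the sum over k of signal[k] * kernel[n - k], so the output
    has len(signal) + len(kernel) - 1 samples.
    """
    if not signal or not kernel:
        return []
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


print(direct_convolution([1.0, 2.0, 3.0], [0.0, 1.0, 0.5]))
```

The two nested loops give a cost proportional to the product of the input lengths. Each output sample depends only on the inputs and not on other outputs, which is what makes the direct form a natural candidate for parallel and vectorized execution.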
88
Performance of Acyclic Stochastic Networks with Network Coding

Network coding allows a network node to code the information flows before forwarding them. While it has been theoretically proved that network coding can achieve maximum network throughput, the theoretical results usually do not consider the burstiness of data traffic, delays, and the stochastic nature of information processing and transmission. There is currently no theory to systematically model and evaluate the performance of network coding, especially when nodes' capacity (i.e., coding and transmission) becomes stochastic. Without such a theory, the performance of network coding under various system settings is far from clear. To fill this gap, we develop an analytical approach by extending the stochastic network calculus theory to tackle the special difficulties in the evaluation of network coding. We prove the new properties of the stochastic network calculus and design an algorithm to obtain the performance bounds for acyclic stochastic networks with network coding. The tightness of the theoretical bounds is validated with simulation.
89
Predictable High-Performance Computing Using Feedback Control and Admission Control
Historically, batch scheduling has dominated the management of High-Performance Computing (HPC) resources. One of the most significant limitations of this approach is an inability to predict both the start time and end time of jobs. Although existing research such as resource reservation and queue-time prediction partially addresses this issue, a more predictable HPC system is needed, particularly for an emerging class of adaptive real-time HPC applications. This paper presents a design and implementation of a predictable HPC system using feedback control and admission control. By creating a virtualized application layer and opportunistically multiplexing concurrent applications through the application of formal control theory, we regulate a job's progress such that the job meets its deadline without requiring exclusive access to resources even in the presence of a wide class of unexpected events. Admission control regulates access to resources when oversubscribed. Our experimental results using five widely used applications show that the feedback and admission controllers achieve a highly predictable HPC system. The designed feedback controller regulates the HPC job's progress accurately, close to the prediction by theory, thereby showing the successful application of classic control theory to HPC workloads. In week-long experiments, over 90 percent of jobs met deadlines and the jobs missing deadlines still finished close to the requested deadlines (12.4 percent error).
90
Privacy in VoIP Networks: Flow Analysis Attacks and Defense
(A short version of this paper appears in IEEE INFOCOM 2009: http://www.research.ibm.com/people/i/iyengar/INFOCOM2009-kanon.pdf.) Peer-to-peer VoIP (voice over IP) networks, exemplified by Skype [5], are becoming increasingly popular due to their significant cost advantage and richer call forwarding features than traditional public switched telephone networks. One of the most important features of a VoIP network is privacy (for VoIP clients). Unfortunately, most peer-to-peer VoIP networks neither provide personalization nor guarantee a quantifiable privacy level. In this paper, we propose novel flow analysis attacks that demonstrate the vulnerabilities of peer-to-peer VoIP networks to privacy attacks. We then address two important challenges in designing privacy-aware VoIP networks: Can we provide personalized privacy guarantees for VoIP clients that allow them to select privacy requirements on a per-call basis? How can VoIP protocols be designed to support customizable privacy guarantees? This paper proposes practical solutions to address these challenges using a quantifiable k-anonymity metric and privacy-aware VoIP route setup and route maintenance protocols. We present a detailed experimental evaluation that demonstrates the performance and scalability of our protocol, while meeting customizable privacy guarantees.
91
Privacy Preserving Collaborative Enforcement of Firewall Policies in Virtual Private Networks
The widely deployed Virtual Private Network (VPN) technology allows roaming users to build an encrypted tunnel to a VPN server, which, henceforth, allows roaming users to access some resources as if their computers were residing on their home organization's network. Although VPN technology is very useful, it imposes security threats on the remote network because its firewall does not know what traffic is flowing inside the VPN tunnel. To address this issue, we propose VGuard, a framework that allows a policy owner and a request owner to collaboratively determine whether the request satisfies the policy without the policy owner knowing the request and the request owner knowing the policy. We first present an efficient protocol, called Xhash, for oblivious comparison, which allows two parties, where each party has a number, to compare whether they have the same number, without disclosing their numbers to each other. Then, we present the VGuard framework that uses Xhash as the basic building block. The basic idea of VGuard is to first convert a firewall policy to nonoverlapping numerical rules and then use Xhash to check whether a request matches a rule. Compared with the Cross-Domain Cooperative Firewall (CDCF) framework, which represents the state of the art, VGuard is not only more secure but also orders of magnitude more efficient. On real-life firewall policies, for processing packets, our experimental results show that VGuard is three to four orders of magnitude faster than CDCF.
92
Processor Array Architectures for Scalable Radix 4 Montgomery Modular Multiplication Algorithm
This paper presents a systematic methodology for exploring possible processor arrays of scalable radix 4 modular Montgomery multiplication algorithm. In this methodology, the algorithm is first expressed as a regular iterative expression, then the algorithm data dependence graph and a suitable affine scheduling function are obtained. Four possible processor arrays are obtained and analyzed in terms of speed, area, and power consumption. To reduce power consumption, we applied low power techniques for reducing the glitches and the Expected Switching Activity (ESA) of high fan-out signals in our processor array architectures. The resulting processor arrays are compared to other efficient ones in terms of area, speed, and power consumption.
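For readers unfamiliar with the underlying arithmetic, the following Python sketch shows a generic digit-serial radix-4 Montgomery multiplication, which consumes two bits of one operand per iteration. It is a software illustration only and does not reflect the processor array structures or scheduling functions of the paper; the function name and parameters are assumptions, and the final reduction uses `%` where hardware would use a conditional subtraction.

```python
def montgomery_radix4(a: int, b: int, m: int, n_digits: int) -> int:
    """Return a * b * 4**(-n_digits) mod m for an odd modulus m.

    `a` is scanned as n_digits radix-4 digits, least significant first.
    Requires Python 3.8+ for pow() with a negative exponent.
    """
    if m % 2 == 0:
        raise ValueError("modulus must be odd")
    if not 0 <= a < 4 ** n_digits:
        raise ValueError("a must fit in n_digits radix-4 digits")
    m_prime = (-pow(m, -1, 4)) % 4      # -m^(-1) mod 4
    s = 0
    for i in range(n_digits):
        a_i = (a >> (2 * i)) & 3        # next radix-4 digit of a
        s += a_i * b                    # accumulate partial product
        q = (s * m_prime) & 3           # chosen so s + q*m is divisible by 4
        s = (s + q * m) >> 2            # exact division by the radix
    return s % m                        # final reduction into [0, m)
```

Each loop iteration corresponds to one step of the regular iterative form that a processor array would unroll in space or time.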
93
QoS-Aware Dynamic Adaptation for Cooperative Media Streaming in Mobile Environments
Media streaming is expected to be one of the most promising services in mobile environments. Effective data streaming management techniques are, therefore, in strong demand. In an earlier paper, the ideas and benefits of two-level cooperative media streaming with headlight prefetching and dynamic chaining were demonstrated [1]. Though complementary to each other, they operate in session-wide static and distinctive modes. Moreover, users do not have control over the quality and cost levels of the streaming services. The performance degradation or cost increment can reach an unacceptable level under fast or highly unstable moving patterns. In this paper, we propose the QoS-based dynamic adaptation techniques for the flexible employment and smooth integration of headlight prefetching and dynamic chaining to continuously provide quality streaming services to mobile users. The QoS-aware dynamic headlight prefetching is for the cooperation between streaming access points to dynamically adjust the prefetching scheme in response to the fast changing moving patterns. Adaptive P2P media streaming is for the cooperation between mobile users such that multiple peers can be used as streaming sources to increase the likelihood of successful chaining. Furthermore, a QoS-based technique is developed to dynamically trigger and proportionally adjust the prefetching degree when the stability and quality of P2P streaming service vary. With extensive simulation and performance evaluation, we demonstrate that the proposed dynamic adaptation techniques significantly improve the service quality and streaming performance of cooperative media streaming in mobile environments.
94
Quasi-Output-Buffered Switches
It is well known that output-buffered switches have better performance than other switch architectures. However, output-buffered switches also suffer from the notorious scalability problem, and direct constructions of large output-buffered switches are difficult. In this paper, we study the problem of constructing scalable switches that have comparable performance (in the sense of 100 percent throughput and first-in first-out (FIFO) delivery of packets from the same flow) to output-buffered switches. For this, we propose a new concept, called quasi-output-buffered switch. Like an output-buffered switch, a quasi-output-buffered switch is a deterministic switch that achieves 100 percent throughput and delivers packets from the same flow in the FIFO order. Using the three-stage Clos network, we show that one can recursively construct a larger quasi-output-buffered switch with a set of smaller quasi-output-buffered switches. By recursively expanding the three-stage Clos network, we obtain a quasi-output-buffered switch with only 2 × 2 switches. Such a switch is called a packet-pair switch in this paper as it always transmits packets in pairs. By computer simulations, we show that packet-pair switches have better delay performance than most load-balanced switches with comparable construction complexity.
95
Rumor Riding: Anonymizing Unstructured Peer-to-Peer Systems
Although anonymizing Peer-to-Peer (P2P) systems often incurs extra traffic costs, many systems try to mask the identities of their users for privacy considerations. Existing anonymity approaches are mainly path-based: peers have to pre-construct an anonymous path before transmission. The overhead of maintaining and updating such paths is significantly high. We propose Rumor Riding (RR), a lightweight and non-path-based mutual anonymity protocol for decentralized P2P systems. Employing a random walk mechanism, RR takes advantage of lower overhead by mainly using the symmetric cryptographic algorithm. We conduct comprehensive trace-driven simulations to evaluate the effectiveness and efficiency of this design, and compare it with previous approaches. We also introduce some early experiences on RR implementations.
96
Satisfiability Modulo Graph Theory for Task Mapping and Scheduling on Multiprocessor Systems
Task graph scheduling on multiprocessor systems is a representative multiprocessor scheduling problem. A solution to this problem consists of the mapping of tasks to processors and the scheduling of tasks on each processor. An optimal solution can be obtained by exploring the entire design space of all possible mapping and scheduling choices. Since the problem is NP-hard, scalability becomes the main concern in solving the problem optimally. In this paper, a SAT-based optimization framework is proposed to address this problem, in which a SAT solver is enhanced by integrating it with a scheduling analysis tool in a branch and bound manner to prune the solution space efficiently. Performance evaluation results show that our technique achieves an average performance improvement of more than an order of magnitude compared to state-of-the-art techniques. We further build a cycle-accurate network-on-chip simulator based on SystemC to verify the effectiveness of the proposed technique on realistic multiprocessor systems.
97
Sensor Placement Algorithms for Fusion-Based Surveillance Networks
Mission-critical target detection imposes stringent performance requirements for wireless sensor networks, such as high detection probabilities and low false alarm rates. Data fusion has been shown as an effective technique for improving system detection performance by enabling efficient collaboration among sensors with limited sensing capability. Due to the high cost of network deployment, it is desirable to place sensors at optimal locations to achieve maximum detection performance. However, for sensor networks employing data fusion, optimal sensor placement is a nonlinear and nonconvex optimization problem with prohibitively high computational complexity. In this paper, we present fast sensor placement algorithms based on a probabilistic data fusion model. Simulation results show that our algorithms can meet the desired detection performance with a small number of sensors while achieving up to sevenfold speedup over the optimal algorithm.
98
Site-Based Partitioning and Repartitioning Techniques for Parallel PageRank Computation
The PageRank algorithm is an important component in effective web search. At the core of this algorithm are repeated sparse matrix-vector multiplications where the involved web matrices grow in parallel with the growth of the web and are stored in a distributed manner due to space limitations. Hence, the PageRank computation, which is frequently repeated, must be performed in parallel with high efficiency and low preprocessing overhead while considering the initial distributed nature of the web matrices. Our contributions in this work are twofold. We first investigate the application of state-of-the-art sparse matrix partitioning models in order to attain high efficiency in parallel PageRank computations with a particular focus on reducing the preprocessing overhead they introduce. For this purpose, we evaluate two different compression schemes on the web matrix using the site information inherently available in links. Second, we consider the more realistic scenario of starting with initially distributed data and extend our algorithms to cover the repartitioning of such data for efficient PageRank computation. We report performance results using our parallelization of a state-of-the-art PageRank algorithm on two different PC clusters with 40 and 64 processors. Experiments show that the proposed techniques achieve considerably high speedups while incurring a preprocessing overhead of several iterations (for some instances even less than a single iteration) of the underlying sequential PageRank algorithm.
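The "repeated sparse matrix-vector multiplications" at the core of PageRank can be sketched sequentially as below. This is a plain single-machine power iteration over an adjacency list, not the partitioned parallel algorithm evaluated in the paper; the damping factor of 0.85, the iteration count, and the uniform handling of dangling pages are conventional assumptions rather than details taken from the abstract.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power iteration over a sparse link structure (page -> outgoing links)."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Teleportation term shared by every page.
        new = {p: (1.0 - damping) / n for p in pages}
        dangling = 0.0
        for p in pages:
            out = links.get(p, [])
            if out:
                share = damping * rank[p] / len(out)
                for t in out:            # one sparse row of the multiplication
                    new[t] += share
            else:
                dangling += damping * rank[p]
        for p in pages:                  # spread dangling mass uniformly
            new[p] += dangling / n
        rank = new
    return rank


web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
```

In a parallel setting, the inner loop over outgoing links is the part whose communication cost depends on how pages are partitioned across processors.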
99
Speed Improves Delay-Capacity Trade-Off in MotionCast
In this paper, we study a unified mobility model for mobile multicast (MotionCast) with n nodes, and k destinations for each multicast session. This model considers nodes which can either serve in a local region or move around globally, with a restricted speed R. In other words, there are two particular forms: Local-based Speed-Restricted Model (LSRM) and Global-based Speed-Restricted Model (GSRM). We find that there is a special turning point when mobility speed varies from zero to the scale of network. For LSRM, as R increases, the delay-capacity trade-off ratio decreases iff R is greater than the turning point $\Theta(\sqrt{1/k})$. For GSRM, as R increases, the trade-off ratio decreases iff R is smaller than the turning point, where the turning point is located at $\Theta(k^{0.25}/\sqrt{n})$ when $k = o(n^{2/3})$, and at $\Theta(k/n)$ when $k = \omega(n^{2/3})$. As k increases from 1 to $n - 1$, the region that mobility can improve delay-capacity trade-off is enlarged. When $R = \Theta(1)$, the optimal delay-capacity trade-off ratio is achieved. This paper presents a general approach to study the performance of wireless networks under more flexible mobility models.
100
Supporting Scalable and Adaptive Metadata Management in Ultralarge-Scale File Systems
This paper presents a scalable and adaptive decentralized metadata lookup scheme for ultralarge-scale file systems (more than Petabytes or even Exabytes). Our scheme logically organizes metadata servers (MDSs) into a multilayered query hierarchy and exploits grouped Bloom filters to efficiently route metadata requests to desired MDSs through the hierarchy. This metadata lookup scheme can be executed at the network or memory speed, without being bounded by the performance of slow disks. An effective workload balance method is also developed in this paper for server reconfigurations. This scheme is evaluated through extensive trace-driven simulations and a prototype implementation in Linux. Experimental results show that this scheme can significantly improve metadata management scalability and query efficiency in ultralarge-scale storage systems.
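To make the idea of routing metadata requests with Bloom filters concrete, here is a minimal Python sketch in which each metadata server is summarized by one filter and a lookup is forwarded only to servers whose filter reports a possible hit. The grouping of filters and the multilayered hierarchy of the paper are not modeled; the class, the `route` helper, the server names, and the filter sizes are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; a Python int serves as the bit array."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = 0

    def _positions(self, item: str):
        # Derive k bit positions from salted SHA-256 digests.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def route(path: str, filters: dict) -> list:
    """Return the ids of servers whose filter reports a possible hit."""
    return [mds for mds, bf in filters.items() if bf.might_contain(path)]


filters = {"mds0": BloomFilter(), "mds1": BloomFilter()}
filters["mds0"].add("/home/alice/report.txt")
filters["mds1"].add("/var/log/syslog")
```

Because a Bloom filter never yields false negatives, the server that actually holds the metadata is always among the candidates; false positives only cost an extra query.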
101
Supporting Scalable and Adaptive Metadata Management in Ultralarge-Scale File Systems
This paper presents a scalable and adaptive decentralized metadata lookup scheme for ultralarge-scale file systems (more than Petabytes or even Exabytes). Our scheme logically organizes metadata servers (MDSs) into a multilayered query hierarchy and exploits grouped Bloom filters to efficiently route metadata requests to desired MDSs through the hierarchy. This metadata lookup scheme can be executed at the network or memory speed, without being bounded by the performance of slow disks. An effective workload balance method is also developed in this paper for server reconfigurations. This scheme is evaluated through extensive trace-driven simulations and a prototype implementation in Linux. Experimental results show that this scheme can significantly improve metadata management scalability and query efficiency in ultralarge-scale storage systems.
102
TASA: Tag-Free Activity Sensing Using RFID Tag Arrays
Radio Frequency IDentification (RFID) has attracted considerable attention in recent years for its low cost, general availability, and location sensing functionality. Most existing schemes require the tracked persons to be labeled with RFID tags. This requirement may not be satisfied for some activity sensing applications due to privacy and security concerns and uncertainty of objects to be monitored, e.g., group behavior monitoring in warehouses with privacy limitations, and abnormal customers in banks. In this paper, we propose TASA (Tag-free Activity Sensing using RFID tag Arrays) for location sensing and frequent route detection. TASA frees the monitored objects from carrying RFID tags and recovers and checks frequent trajectories online by capturing the Received Signal Strength Indicator (RSSI) series for passive RFID tag arrays where objects traverse. In order to improve the accuracy of estimated trajectories and accelerate location sensing, TASA introduces reference tags with known positions. With the readings from reference tags, TASA can locate objects more accurately. Extensive experiments show that TASA is an effective approach for certain activity sensing applications.
103
The Complexity of Optimal Job Co-Scheduling on Chip Multiprocessors and Heuristics-Based Solutions
In Chip Multiprocessor (CMP) architectures, it is common that multiple cores share some on-chip cache. The sharing may cause cache thrashing and contention among co-running jobs. Job co-scheduling is an approach to tackling the problem by assigning jobs to cores appropriately so that the contention and consequent performance degradations are minimized. Job co-scheduling includes two tasks: the estimation of co-run performance, and the determination of suitable co-schedules. Most existing studies in job co-scheduling have concentrated on the first task but rely on simple techniques (e.g., trying different schedules) for the second. This paper presents a systematic exploration of the second task. The paper uncovers the computational complexity of the determination of optimal job co-schedules, proving its NP-completeness. It introduces a set of algorithms, based on graph theory and Integer/Linear Programming, for computing optimal co-schedules or their lower bounds in scenarios with or without job migrations. For complex cases, it empirically demonstrates the feasibility of approximating the optimal effectively by proposing several heuristics-based algorithms. These discoveries may facilitate the assessment of job co-schedulers by providing necessary baselines, as well as shed light on the development of co-scheduling algorithms in practical systems.
104
The Small World of File Sharing
Web caches, content distribution networks, peer-to-peer file-sharing networks, distributed file systems, and data grids all have in common that they involve a community of users who use shared data. In each case, overall system performance can be improved significantly by first identifying and then exploiting the structure of the community's data access patterns. We propose a novel perspective for analyzing data access workloads that considers the implicit relationships that form among users based on the data they access. We propose a new structure, the interest-sharing graph, that captures common user interests in data and justify its utility with studies on four data-sharing systems: a high-energy physics collaboration, the Web, the Kazaa peer-to-peer network, and a BitTorrent file-sharing community. We find small-world patterns in the interest-sharing graphs of all four communities. We investigate analytically and experimentally some of the potential causes that lead to this pattern and conclude that user preferences play a major role. The significance of small-world patterns is twofold: it provides a rigorous support to intuition and it suggests the potential to exploit these naturally emerging patterns. As a proof of concept, we design and evaluate an information dissemination system that exploits the small-world interest-sharing graphs by building an interest-aware network overlay. We show that this approach leads to improved information dissemination performance.
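One way to picture an interest-sharing graph is sketched below in Python: users become nodes, and two users are linked when they accessed at least a minimum number of common items. The abstract does not spell out the exact linking rule or time window, so the `min_shared` threshold and both function names are assumptions; the clustering coefficient is included because high clustering is one of the usual small-world indicators.

```python
from itertools import combinations


def interest_sharing_graph(accesses: dict, min_shared: int = 1) -> dict:
    """Map each user to the set of users sharing >= min_shared items."""
    graph = {user: set() for user in accesses}
    for u, v in combinations(accesses, 2):
        if len(accesses[u] & accesses[v]) >= min_shared:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def clustering_coefficient(graph: dict, node: str) -> float:
    """Fraction of a node's neighbor pairs that are themselves linked."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    return 2.0 * links / (k * (k - 1))


accesses = {
    "u1": {"f1", "f2"},
    "u2": {"f2", "f3"},
    "u3": {"f2", "f4"},
    "u4": {"f5"},
}
graph = interest_sharing_graph(accesses)
```

Here the three users who all requested `f2` form a fully connected cluster, while `u4` stays isolated.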
105
ThriftStore: Finessing Reliability Trade-Offs in Replicated Storage Systems
This paper explores the feasibility of a storage architecture that offers the reliability and access performance characteristics of a high-end system, yet is cost-efficient. We propose ThriftStore, a storage architecture that integrates two types of components: volatile, aggregated storage and dedicated, yet low-bandwidth durable storage. On the one hand, the durable storage forms a back end that enables the system to restore the data the volatile nodes may lose. On the other hand, the volatile nodes provide a high-throughput front end. Although integrating these components has the potential to offer a unique combination of high throughput and durability at a low cost, a number of concerns need to be addressed to architect and correctly provision the system. To this end, we develop analytical and simulation-based tools to evaluate the impact of system characteristics (e.g., bandwidth limitations on the durable and the volatile nodes) and design choices (e.g., the replica placement scheme) on data availability and the associated system costs (e.g., maintenance traffic). Moreover, to demonstrate the high-throughput properties of the proposed architecture, we prototype a GridFTP server based on ThriftStore. Our evaluation demonstrates an impressive, up to 800 Mbps transfer throughput for the new GridFTP service.
106
Throughput Optimization in Multihop Wireless Networks with Multipacket Reception and Directional Antennas
Recent advances in the physical layer have enabled the simultaneous reception of multiple packets by a node in wireless networks. We address the throughput optimization problem in wireless networks that support multipacket reception (MPR) capability. The problem is modeled as a joint routing and scheduling problem, which is known to be NP-hard. The scheduling subproblem deals with finding the optimal schedulable sets, which are defined as subsets of links that can be scheduled or activated simultaneously. We demonstrate that any solution of the scheduling subproblem can be built with $|E| + 1$ or fewer schedulable sets, where $|E|$ is the number of links of the network. This result is in contrast with previous works that stated that a solution of the scheduling subproblem is composed of an exponential number of schedulable sets. Due to the hardness of the problem, we propose a polynomial time scheme based on a combination of linear programming and approximation algorithm paradigms. We illustrate the use of the scheme to study the impact of design parameters on the performance of MPR-capable networks, including the number of transmit interfaces, the beamwidth, and the receiver range of the antennas.
107
Timely Result-Data Offloading for Improved HPC Center Scratch Provisioning and Serviceability
Modern High-Performance Computing (HPC) centers are facing a data deluge from emerging scientific applications. Supporting large data entails a significant commitment of the high-throughput center storage system, scratch space. However, the scratch space is typically managed using simple purge policies, without sophisticated end-user data services to balance resource consumption and user serviceability. End-user data services such as offloading are performed using point-to-point transfers that are unable to reconcile the center's purge and users' delivery deadlines, unable to adapt to changing dynamics in the end-to-end data path, and are not fault-tolerant. Such inefficiencies can be prohibitive to sustaining high performance. In this paper, we address the above issues by designing a framework for the timely, decentralized offload of application result data. Our framework uses an overlay of user-specified intermediate and landmark sites to orchestrate a decentralized fault-tolerant delivery. We have implemented our techniques within a production job scheduler (PBS) and data transfer tool (BitTorrent). Our evaluation using both a real implementation and supercomputer job log-driven simulations shows that the offloading times can be significantly reduced (90.4 percent for a 5 GB data transfer) and that the exposure window can be minimized while also meeting center-user service level agreements.
108
Toward Efficient and Simplified Distributed Data Intensive Computing
While the capability of computing systems has been increasing at Moore's Law, the amount of digital data has been increasing even faster. There is a growing need for systems that can manage and analyze very large data sets, preferably on shared-nothing commodity systems due to their low expense. In this paper, we describe the design and implementation of a distributed file system called Sector and an associated programming framework called Sphere that processes the data managed by Sector in parallel. Sphere is designed so that the processing of data can be done in place over the data whenever possible. Sometimes, this is called data locality. We describe the directives Sphere supports to improve data locality. In our experimental studies, the Sector/Sphere system has consistently performed about 2-4 times faster than Hadoop, the most popular system for processing very large data sets.
109
Traceback of DDoS Attacks Using Entropy Variations
Distributed Denial-of-Service (DDoS) attacks are a critical threat to the Internet. However, the memoryless feature of the Internet routing mechanisms makes it extremely hard to trace back to the source of these attacks. As a result, there is no effective and efficient method to deal with this issue so far. In this paper, we propose a novel traceback method for DDoS attacks that is based on entropy variations between normal and DDoS attack traffic, which is fundamentally different from commonly used packet marking techniques. In comparison to the existing DDoS traceback methods, the proposed strategy possesses a number of advantages: it is memory nonintensive, efficiently scalable, robust against packet pollution, and independent of attack traffic patterns. The results of extensive experimental and simulation studies are presented to demonstrate the effectiveness and efficiency of the proposed method. Our experiments show that accurate traceback is possible within 20 seconds (approximately) in a large-scale attack network with thousands of zombies.
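The notion of an "entropy variation" between normal and attack traffic can be illustrated with a short Python sketch that computes the Shannon entropy of the flow distribution in one observation window and flags a large deviation from a baseline. This only shows the detection signal at a single router; the traceback procedure itself is not described in the abstract and is not modeled. The function names, the flow labels, and the threshold of 0.5 bits are assumptions.

```python
import math
from collections import Counter


def flow_entropy(packet_flows: list) -> float:
    """Shannon entropy, in bits, of the flow distribution in one window."""
    counts = Counter(packet_flows)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def is_suspicious(window: list, baseline_entropy: float,
                  threshold: float = 0.5) -> bool:
    """Flag a window whose entropy deviates strongly from the baseline."""
    return abs(flow_entropy(window) - baseline_entropy) > threshold


# Four flows with equal shares: entropy is log2(4) = 2 bits.
normal = ["f1", "f2", "f3", "f4"] * 25
# One flow dominates the window, so the entropy collapses.
attack = ["f1"] * 97 + ["f2", "f3", "f4"]
baseline = flow_entropy(normal)
```

Only per-flow counters are kept per window, which is in line with the low memory footprint the abstract claims for the approach.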
110
Traffic-Aware Relay Node Deployment: Maximizing Lifetime for Data Collection Wireless Sensor Networks
Wireless sensor networks have been widely used for ambient data collection in diverse environments. While in many such networks the nodes are randomly deployed in massive quantity, there is a broad range of applications advocating manual deployment. A typical example is structure health monitoring, where the sensors have to be placed at critical locations to fulfill civil engineering requirements. The raw data collected by the sensors can then be forwarded to a remote base station (the sink) through a series of relay nodes. In the wireless communication context, the operation time of a battery-limited relay node depends on its traffic volume and communication range. Hence, although not bounded by the civil-engineering-like requirements, the locations of the relay nodes have to be carefully planned to achieve the maximum network lifetime. The deployment has to not only ensure connectivity between the data sources and the sink, but also accommodate the heterogeneous traffic flows from different sources and the dominating many-to-one traffic pattern. Inspired by the uniqueness of such application scenarios, in this paper, we present an in-depth study on the traffic-aware relay node deployment problem. We develop optimal solutions for the simple case of one source node, both with single and multiple traffic flows. We show, however, that the general form of the deployment problem is difficult, and the existing solutions that guarantee only connectivity cannot be directly applied here. We then transform our problem into a generalized version of the Euclidean Steiner Minimum Tree problem (ESMT). Nevertheless, we face further challenges as its solution is in continuous space and may yield fractional numbers of relay nodes, where simple rounding of the solution can lead to poor performance. We thus develop algorithms for discrete relay node assignment, together with local adjustments that yield high-quality practical solutions.
Our solution has been evaluated through both numerical analysis and ns-2 simulations and compared with state-of-the-art approaches. The results show that for all test cases where the continuous space optimal solution can be computed within acceptable time frames, the network lifetime achieved by our solution is very close to the upper bound of the optimal solution (the difference is less than 13.5 percent). Moreover, it achieves up to 6-14 times improvement over the existing traffic-oblivious strategies.
111
Transforming Complete Coverage Algorithms to Partial Coverage Algorithms for Wireless Sensor Networks
The complete area coverage problem in Wireless Sensor Networks (WSNs) has been extensively studied in the literature. However, many applications do not require complete coverage all the time. For such applications, one effective method to save energy and prolong network lifetime is to partially cover the area. This method for prolonging network lifetime has recently attracted much attention. However, due to the hardness of verifying the coverage ratio, all the existing centralized or distributed but nonparallel algorithms for partial coverage have very high time complexities. In this work, we propose a framework which can transform almost any existing complete coverage algorithm to a partial coverage one with any coverage ratio by running a complete coverage algorithm to find full coverage sets with virtual radii and converting the coverage sets to partial coverage sets via adjusting sensing radii. Our framework can preserve the characteristics of the original algorithms and the conversion process has low time complexity. The framework also guarantees some degree of uniform partial coverage of the monitored area.
112
Understanding Disconnection and Stabilization of Chord
Previous analytical work [16], [17] on the resilience of P2P networks has been restricted to disconnection arising from simultaneous failure of all neighbors in routing tables of participating users. In this paper, we focus on a different technique for maintaining consistent graphs, Chord's successor sets and periodic stabilizations, under both static and dynamic node failure. We derive closed-form models for the probability that Chord remains connected under both types of node failure and show the effect of using different stabilization interval lengths (i.e., exponential, uniform, and constant) on the probability of partitioning in Chord.
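To give a feel for why successor sets matter, the Python sketch below evaluates a back-of-the-envelope estimate for static failure: if every node fails independently with probability p, a node loses its entire successor list of length r with probability p^r, and treating nodes as independent gives (1 - p^r)^n for the chance that no node is cut off. This is a simplifying assumption for illustration and is not the closed-form model derived in the paper; the function name and parameters are hypothetical.

```python
def prob_no_node_isolated(n: int, r: int, p: float) -> float:
    """Estimate the chance that none of n nodes loses all r successors.

    Assumes independent node failures with probability p and ignores
    stabilization, so it only approximates the static-failure case.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return (1.0 - p ** r) ** n


# Longer successor lists sharply improve the estimate for the same p.
short_list = prob_no_node_isolated(n=1000, r=10, p=0.5)
long_list = prob_no_node_isolated(n=1000, r=20, p=0.5)
```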
113
Utilization-Based Resource Partitioning for Power-Performance Efficiency in SMT Processors
Simultaneous multithreading (SMT) increases processor throughput by allowing parallel execution of several threads. However, fully sharing processor resources may cause resource monopolization by a single thread or other misallocations, resulting in overall performance degradation. Static resource partitioning techniques have been suggested, but are not as effective as dynamic ones since program behavior does change over the course of its execution. In this paper, we propose an Adaptive Resource Partitioning Algorithm (ARPA) that dynamically assigns resources to threads according to changes in thread behavior. ARPA analyzes the resource usage efficiency of each thread in a given time period and assigns more resources to threads which can use them more efficiently. Its purpose is to improve the efficiency of resource utilization, thereby improving overall instruction throughput. Our simulation results on a set of 42 multiprogramming workloads show that ARPA outperforms the traditional fetch policy ICOUNT by 55.8 percent with regard to overall instruction throughput and achieves a 33.8 percent improvement over Static Partitioning. It also outperforms the current best dynamic resource allocation technique, Hill-climbing, by 5.7 percent. Considering fairness accorded to each thread, ARPA attains 43.6, 18.5, and 9.2 percent improvements over ICOUNT, Static Partitioning, and Hill-climbing, respectively, using a common fairness metric. We also explore the energy efficiency of dynamically controlling the number of powered-on reorder buffer entries for ARPA. Compared with ARPA, our energy-aware resource partitioning algorithm achieves 10.6 percent energy savings, while the performance loss is negligible.
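The general idea of shifting resources toward threads that use them more efficiently can be sketched as one repartitioning step per time period. The Python below is a hypothetical simplification and not ARPA itself: efficiency is taken to be committed instructions per allocated entry, and the step size, the minimum share, and the function name are assumptions.

```python
def repartition(alloc: dict, committed: dict, step: int = 1,
                min_share: int = 1) -> dict:
    """Move `step` entries from the least to the most efficient thread.

    alloc:     thread id -> resource entries currently assigned
    committed: thread id -> instructions committed in the last period
    """
    efficiency = {t: committed[t] / alloc[t] for t in alloc}
    best = max(efficiency, key=efficiency.get)
    worst = min(efficiency, key=efficiency.get)
    new_alloc = dict(alloc)
    # Never starve a thread below its minimum share.
    if best != worst and new_alloc[worst] - step >= min_share:
        new_alloc[worst] -= step
        new_alloc[best] += step
    return new_alloc


alloc = {"t0": 32, "t1": 32}
committed = {"t0": 6400, "t1": 1600}
next_alloc = repartition(alloc, committed)
```

Calling the function once per period lets the partition drift gradually, which avoids large swings when a thread's behavior changes only briefly.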
114
Video Streaming Distribution in VANETs
Streaming applications will rapidly develop and contribute a significant amount of traffic in the near future. A problem, scarcely addressed so far, is how to distribute video streaming traffic from one source to all nodes in an urban vehicular network. This problem significantly differs from previous work on broadcast and multicast in ad hoc networks because of the highly dynamic topology of vehicular networks and the strict delay requirements of streaming applications. We present a solution for intervehicular communications, called Streaming Urban Video (SUV), that 1) is fully distributed and dynamically adapts to topology changes, and 2) leverages the characteristics of streaming applications to yield a highly efficient, cross-layer solution.
115
Yet Another Simple Solution for the Concurrent Programming Control Problem
As multicore processors are becoming increasingly common everywhere, the future computing systems and devices are becoming inevitably concurrent. Also, on the applications side, automation is steadily infiltrating into everyday life, and hence, most software systems are becoming increasingly complex and concurrent. As a result, recent developments and projections indicate that we are entering into the era of concurrent programming. Synchronizing asynchronous concurrent processes in accessing a shared resource is an important issue. Among the synchronization issues, mutual exclusion is fundamental. Solutions to most higher level synchronization problems rely on the assurance of mutual exclusion. Several algorithms with varying characteristics are proposed in the literature to solve the mutual exclusion problem. This paper presents two new algorithms to solve the mutual exclusion problem. The algorithms are simple and have many nice properties.