Professional Documents
Culture Documents
1, JANUARY-MARCH 2015
Abstract—Despite the advances in hardware for hand-held mobile devices, resource-intensive applications (e.g., video and image
storage and processing or map-reduce type) still remain off bounds since they require large computation and storage capabilities.
Recent research has attempted to address these issues by employing remote servers, such as clouds and peer mobile devices.
For mobile devices deployed in dynamic networks (i.e., with frequent topology changes because of node failure/unavailability and
mobility as in a mobile cloud), however, challenges of reliability and energy efficiency remain largely unaddressed. To the best of our
knowledge, we are the first to address these challenges in an integrated manner for both data storage and processing in mobile
cloud, an approach we call k-out-of-n computing. In our solution, mobile devices successfully retrieve or process data, in the most
energy-efficient way, as long as k out of n remote servers are accessible. Through a real system implementation we prove the feasibility
of our approach. Extensive simulations demonstrate the fault tolerance and energy efficiency performance of our framework in larger
scale networks.
Index Terms—Mobile computing, cloud computing, mobile cloud, energy-efficient computing, fault-tolerant computing
1 INTRODUCTION
where E is a set of edges indicating the communication The first constraint (Eq. (2)) selects exactly n nodes as
links among nodes. Each node has an associated failure prob- storage nodes; the second constraint (Eq. (3)) indicates that
ability P ½fi where fi is the event that causes node vi to fail. each node has access to k storage nodes; the third constraint
Relationship Matrix R is a N N matrix defining the rela- (Eq (4)) ensures that jth column of R can have a non-zero
tionship between nodes and storage nodes. More precisely, element if only if Xj is 1; and constraints (Eq (5)) are binary
each element Rij is a binary variable—if Rij is 0, node i will requirements for the decision variables.
not retrieve data from storage node j; if Rij is 1, node i will
retrieve fragment from storage node j. Storage node list X is 2.3 Formulation of k-Out-of-n Data Processing
a binary vector containing storage nodes, i.e., Xi ¼ 1 indi- Problem
cates that vi is a storage node. The objective of this problem is to find n nodes in V as pro-
The Expected Transmission Time Matrix D is defined as a cessor nodes such that energy consumption for processing a
N N matrix where element Dij corresponds to the ETT for job of M tasks is minimized. In addition, it ensures that the
transmitting a fixed size packet from node i to node j consid- job can be completed as long as k or more processors nodes
ering the failure probabilities of nodes in the network, i.e., finish the assigned tasks. Before a client node starts process-
multiple possible paths between node i and node j. The ETT ing a data object, assuming the correctness of erasure cod-
metric [7] has been widely used for estimating transmission ing, it first needs to retrieve and decode k data fragments
time between two nodes in one hop. We assign each edge of because nodes can only process the decoded plain data
graph G a positive estimated transmission time. Then, the object, but not the encoded data fragment.
path with the shortest transmission time between any two In general, each node may have different energy cost
nodes can be found. However, the shortest path for any pair depending on their energy sources; e.g., nodes attached
of nodes may change over time because of the dynamic to a constant energy source may have zero energy cost
topology. ETT, considering multiple paths due to nodes while nodes powered by battery may have relatively
failures, represents the “expected” transmission time, or high energy cost. For simplicity, we assume the network
“expected” transmission energy between two nodes. is homogeneous and nodes consume the same amount of
Scheduling Matrix S is an L N M matrix where ele- energy for processing the same task. As a result, only
ment Slij ¼ 1 indicates that task j is scheduled at time l on the transmission energy affects the energy efficiency of the
node i; otherwise, Slij ¼ 0. l is a relative time referenced to final solution. We leave the modeling of the general case
the starting time of a job. Since all tasks are instantiated as future work.
from the same function, we assume they spend approxi- Before formulating the problem, we define some func-
mately the same processing time on any node. Given the tions: (1) f1 ðiÞ returns 1 if node i in S has at least one
terms and notations, we are ready to formally describe the task; otherwise, it returns 0; (2) f2 ðjÞ returns the number
k-out-of-n data allocation and k-out-of-n data processing of instances of task j in S; and (3) f3 ðz; jÞ returns the
problems. transmission cost of task j when it is scheduled for the
zth time. We now formulate the k out of n data
2.2 Formulation of k-Out-of-n Data Allocation processing problem as shown in Eqs. (6), (7), (8), (9),
Problem (10), and (11).
In this problem, we are interested in finding n storage nodes The objective function (Eq (6)) minimizes the total trans-
denoted by S ¼ fs1 ; s2 ; . . . sn g; S V such that the total mission cost for all processor nodes to retrieve their tasks. l
expected transmission cost from any node to its k closest stor- represents the time slot of executing a task; i is the index of
age nodes — in terms of ETT—is minimized. We formulate nodes in the network; j is the index of the task of a job. We
this problem as an ILP in Eqs. (1), (2), (3), (4), and (5). note here that T r , the Data Retrieval Time Matrix, is a N M
matrix, where the element Tijr corresponds to the estimated
X
N X
N
time for node i to retrieve task j. T r is computed by sum-
Ropt ¼ arg min Dij Rij (1)
R ming the transmission time (in terms of ETT available in D)
i¼1 j¼1
from node i to its k closest storage nodes of the task.
X
N
L X
X N X
M
Subject to : Xj ¼ n (2) minimize Slij Tijr (6)
j¼1
l¼1 i¼1 j¼1
X
N
Rij ¼ k 8i (3) X
N
j¼1 Subject to : f1 ðiÞ ¼ n (7)
i
Xj Rij 0 8i (4)
f2 ðjÞ ¼ n k þ 1 8j (8)
X
N 3.2.1 Failure by Energy Depletion
Slij 1 8l; j (10) Estimating the remaining energy of a battery-powered
i¼1 device is a well-researched problem [8]. We adopt the
remaining energy estimation algorithm in [8] because of its
X
M X
M simplicity and low overhead. The algorithm uses the his-
f3 ðz1 ; jÞ f3 ðz2 ; jÞ 8z1 z2 : (11) tory of periodic battery voltage readings to predict the bat-
j¼1 j¼1 tery remaining time. Considering that the error for
estimating the battery remaining time follows a normal dis-
tribution [9], we find the probability that the battery
The first constraint (Eq. (7)) ensures that n nodes in the
remaining time is less than T by calculating the cumulative
network are selected as processor nodes. The second con-
distributed function (CDF) at T . Consequently, the pre-
straint (Eq. (8)) indicates that each task is replicated
dicted battery remaining time x is a random variable fol-
n k þ 1 times in the schedule such that any subset of k
lowing a normal distribution with mean m and standard
processor nodes must contain at least one instance of
deviation s, as given by
each task. The third constraint (Eq. (9)) states that each
task is replicated at most once to each processor node.
P fiB ¼ P ½Rem: time < T j Current Energy
The fourth constraint (Eq. (10)) ensures that no duplicate Z T
instances of a task execute at the same time on different 1 1 xm 2
¼ fðx; m; s 2 Þdx,fðx; m; s 2 Þ ¼ pffiffiffiffiffiffi e2ð s Þ :
nodes. The fifth constraint (Eq. (11)) ensures that a set of 1 s 2p
all tasks completed at earlier time should consume lower
energy than a set of all tasks completed at later time. In 3.2.2 Failure by Temporary Disconnection
other words, if no processor node fails and each task com-
Nodes can be temporarily disconnected from a network, e.
pletes at the earliest possible time, these tasks should con-
g., because of the mobility of nodes, or simply when users
sume the least energy.
turn off the devices. The probability of temporary discon-
nection differs from application to application, but this
3 ENERGY EFFICIENT AND FAULT TOLERANT DATA information can be inferred from the history: a node grad-
ALLOCATION AND PROCESSING ually learns its behavior of disconnection and cumulatively
This section presents the details of each component in our creates a probability distribution of its disconnection.
framework. Then, given the current time t, we can estimate the proba-
bility that a node is disconnected
from the network by
3.1 Topology Discovery the time t þ T as follows: P fiC ¼ P ½Node i disconnected
Topology Discovery is executed during the network between t and t þ T .
initialization phase or whenever a significant change of
the network topology is detected (as detected by the 3.3.3 Failure by Application-Dependent Factors
Topology Monitoring component). During Topology Dis- Some applications require nodes to have different roles.
covery, one delegated node floods a request packet In a military application for example, some nodes are
throughout the network. Upon receiving the request equipped with better defense capabilities and some nodes
packet, nodes reply with their neighbor tables and failure may be placed in high-risk areas, rendering different fail-
probabilities. Consequently, the delegated node obtains ure probabilities among nodes. Thus, we define the fail-
global connectivity information and failure probabilities ure probability P ½fiA for application-dependent factors.
of all nodes. This topology information can later be que- This type of failure is, however, usually explicitly known
ried by any node. prior to the deployment.
Fig. 3. k-out-of-n data processing example with N ¼ 9; n ¼ 5; k ¼ 3. (a) and (c) are two different task allocations and (b) and (d) are their tasks sched-
uling respectively. In both cases, node 3; 4; 6; 8; and 9 are selected as processor nodes and each task is replicated to three different processor nodes.
(e) shows that shifting tasks reduce the job completion time from 6 to 5.
task completes, all other instances will be canceled. The A task can be moved to an earlier time slot as long as no
task allocation can be formulated as an ILP as shown in duplicate task is running at the same time, e.g., in Fig. 3d,
Eqs. (12), (13), (14), (15), and (16). In the formulation, Rij task 1 on node 6 can be safely moved to time slot 2 because
is a N M matrix which predefines the relationship there is no task 1 scheduled at time slot 2.
between processor nodes and tasks; each element Rij is a The ILP problem shown in Eqs. (17), (18), (19), and (20)
binary variable indicating whether task j is assigned to finds M unique tasks from Rij that have the minimal trans-
processor node i. X is a binary vector containing proces- mission cost. The decision variable W is an N M matrix
sor nodes, i.e., Xi ¼ 1 indicates that vi is a processor where Rij ¼ 1 indicates that task j is selected to be executed
node. The objective function minimizes the transmission on processor node i. The first constraint (Eq. (18)) ensures
time for n processor nodes to retrieve all their tasks. The that each task is scheduled exactly one time. The second
first constraint (Eq. (13)) indicates that n of the N nodes constraint (Eq. (19)) indicates that Wij can be set only if task
will be selected as processor nodes. The second constraint j is allocated to node i in Rij . The last constraint (Eq. (20)) is
(Eq. (14)) replicates each task to ðn k þ 1Þ different pro- a binary requirement for decision matrix W .
cessor nodes. The third constraint (Eq. (15)) ensures that
the jth column of R can have a non-zero element if only X
N X
M
if Xj is 1; and the constraints (Eq. (16)) are binary require- WE ¼ arg min Tij Rij Wij (17)
W i¼1 j¼1
ments for the decision variables.
X
N X
M X
N
Ropt ¼ arg min Tijr Rij (12) Subject to: Wij ¼ 1 8j (18)
i¼1
R i¼1 j¼1
X
N Rij Wij 0 8i; j (19)
Subject to: Xi ¼ n (13)
i¼1
Wij 2 f0; 1g 8i; j (20)
X
N
Rij ¼ n k þ 1 8j (14)
i¼1
minimize Y (21)
Xi Rij 0 8i (15) Y
N X
X M
Xj and Rij 2 f0; 1g8i; j (16) Subject to: Tij Rij Wij Emin (22)
i¼1 j¼1
P PM
previously, Emin ¼ N i¼1 j¼1 Tij Rij WE , is used as the predefine one node as a topology_delegate Vdel who is respon-
“upper bound” for searching P a task schedule. sible for maintaining the global topology information. If p of
If we define Li ¼ M j¼1 Wij as the number of tasks a node is greater than the threshold t 1 , the node changes its
assigned to node i, Li indicates the completion time of node state to U and piggybacks its ID on a beacon message.
i. Then, our objective becomes to minimize the largest number Whenever a node with state U finds that its p becomes
of tasks in one node, written as minfmaxi2½1;N fLi gg. To solve smaller than t 1 , it changes its state back to NU and puts
this min-max problem, we formulate the problem as shown ID in a beacon message. Upon receiving a beacon mes-
in Eqs. (21), (22), (23), (24), (25), and (26). sage, nodes check the IDs in it. For each ID, nodes add the
The objective function minimizes integer variable Y , ID to set ID if the ID is positive; otherwise, remove the ID.
which is the largest number of tasks on one node. Wij is a If a client node finds that the size of set ID becomes greater
decision variable similar to Wij defined previously. The first than t 2 , a threshold for “significant” global topology
constraint (Eq. (22)) ensures that the schedule cannot con- change, the node notifies Vdel ; and Vdel executes the Topol-
sume more energy that the Emin calculated previously. The ogy Discovery protocol. To reduce the amount of traffic, cli-
second constraint (Eq. (23)) schedules each task exactly ent nodes request the global topology from Vdel , instead of
once. The third constraint (Eq. (25)) forces Y to be the larg- running the topology discovery by themselves. After Vdel
est number of tasks on one node. The last constraint completes the topology update, all nodes reset their status
(Eq. (26)) is a binary requirement for decision matrix W . variables back to NU and set p ¼ 0.
Once tasks are scheduled, we then rearrange tasks—tasks
are moved to earlier time slots as long as there is free time Algorithm 2. Distributed Topology Monitoring
slot and no same task is executed on other node simulta-
1: At each beacon interval:
neously. Algorithm 1 depicts the procedure. Note that
2: if p > t1 and s 6¼ U then
k-out-of-n data processing ensures that k or more functional
processing nodes complete all tasks of a job with probabil- 3: s U
ity 1. In general, it may be possible that a subset of processing 4: Put þ ID to a beacon message.
nodes, of size less than k, complete all tasks. 5: end if
6: if p t1 and s ¼ U then
Algorithm 1. Schedule Re-Arrangement 7: s NU
8: Put ID to a beacon message.
1: L ¼ last time slot in the schedule 9: end if
2: for time t ¼ 2 ! L do 10:
3: for each scheduled task J in time t do 11: Upon receiving a beacon message on Vi :
4: n processor node of task J 12: for each ID in the received beacon message do
5: while n is idle at t 1 AND 13: if ID > 0 then
6: J is NOT scheduled on any node at t 1 do S
14: ID ID fIDg:
7: Move J from t to t 1 15: else
8: t¼t1 16: ID ID n fIDg:
9: end while 17: end if
10: end for 18: end for
11: end for 19: if jfIDgj > t2 then
20: Notify Vdel and Vdel initiate topology discovery
21: end if
3.6 Topology Monitoring 22: Add the ID in Vi0 s beacon message.
The Topology Monitoring component monitors the network
topology continuously and runs in distributed manner on
all nodes. Whenever a client node needs to create a file, the
Topology Monitoring component provides the client with 4 SYSTEM EVALUATION
the most recent topology information immediately. When This section investigates the feasibility of running our
there is a significant topology change, it notifies the frame- framework on real hardware. We compare the performance
work to update the current solution. We first give several of our framework with a random data allocation and proc-
notations. A term s refers to a state of a node, which can be essing scheme (Random), which randomly selects storage/
either U and NU. The state becomes U when a node finds processor nodes. Specifically, to evaluate the k-out-of-n data
that its neighbor table has drastically changed; otherwise, a allocation on real hardware, we implemented a Mobile Dis-
node keeps the state as NU. We let p be the number of tributed File System (MDFS) on top of our k-out-of-n com-
entries in the neighbor table that has changed. A set ID con- puting framework. We also test our k-out-of-n data
tains the node IDs with p greater than t 1 , a threshold param- processing by implementing a face recognition application
eter for a “significant” local topology change. that uses our MDFS.
The Topology Monitoring component is simple yet Fig. 4 shows an overview of our MDFS. Each file is
energy-efficient as it does not incur significant communica- encrypted and encoded by erasure coding into n1 data
tion overhead—it simply piggybacks node ID on a beacon fragments, and the secret key for the file is decomposed
message. The protocol is depicted in Algorithm 2. We into n2 key fragments by key sharing algorithm. Any
CHEN ET AL.: ENERGY-EFFICIENT FAULT-TOLERANT DATA STORAGE AND PROCESSING IN MOBILE CLOUD 35
Fig. 4. An overview of improved MDFS. Fig. 6. Current consumption on Smartphone in different states.
maximum distance separable code can be used to encode the analyze a set of images. In average, it took about 3-4 seconds
data and the key; in our experiment, we adopt the well- to process an image of size 2 MB. Processing a sequence of
developed Reed-Solomon code and Shamir’s Secret Shar- images, e.g., a video stream, the time may increase in an
ing algorithm. The n1 data fragments and n2 key frag- order of magnitude. The peak memory usage of our applica-
ments are then distributed to nodes in the network. When tion was around 3 MB. In addition, for a realistic energy
a node needs to access a file, it must retrieve at least k1 consumption model in simulations, we profiled the energy
file fragments and k2 key fragments. Our k-out-of-n data consumption of our application (e.g., WiFi-idle, transmis-
allocation allocates file and key fragments optimally sion, reception, and 100 percent-cpu-utilization). Fig. 5
when compared with the state-of-art [10] that distributes shows our experimental setting and Fig. 6 shows the energy
fragments uniformly to the network. Consequently, our profile of our smartphone in different operating states. It
MDFS achieves higher reliability (since our framework shows that Wi-Fi component draws significant current dur-
considers the possible failures of nodes when determining ing the communication(sending/receiving packets) and the
storage nodes) and higher energy efficiency (since storage consumed current stays constantly high during the trans-
nodes are selected such that the energy consumption for mission regardless the link quality.
retrieving data by any node is minimized). Fig. 7 shows the overhead induced by encoding data.
We implemented our system on HTC Evo 4G Smart- Given a file of 4.1 MB, we encoded it with different k values
phone, which runs Android 2.3 operating system using 1G while keeping parameter n ¼ 8. The left y-axis is the size of
Scorpion CPU, 512 MB RAM, and a Wi-Fi 802.11 b/g inter- each encoded fragment and the right y-axis is the percent-
face. To enable the Wi-Fi AdHoc mode, we rooted the age of the overhead. Fig. 8 shows the system reliability with
device and modified a config file—wpa_supplicant.conf. respect to different k while n is constant. As expected,
The Wi-Fi communication range on HTC Evo 4G is 80-100 smaller k=n ratio achieves higher reliability while incurring
m. Our data allocation was programmed with 6,000 lines of more storage overhead. An interesting observation is that
Java and C++ code. the change of system reliability slows down at k ¼ 5 and
The experiment was conducted by eight students who reducing k further does not improve the reliability much.
carry smartphones and move randomly in an open space. Hence, k ¼ 5 is a reasonable choice where overhead is low
These smartphones formed an Ad-Hoc network and the ( 60 percent of overhead) and the reliability is high
longest node to node distance was three hops. Students ( 99 percent of the highest possible reliability).
took pictures and stored in our MDFS. To evaluate the To validate the feasibility of running our framework on a
k-out-of-n data processing, we designed an application that commercial smartphone, we measured the execution time
searches for human faces appearing in all stored images. of our MDFS application in Fig. 9. For this experiment we
One client node initiates the processing request and all varied network size N and set n ¼ d0:6N e, k ¼ d0:6ne,
selected processor nodes retrieve, decode, decrypt, and
Fig. 7. A file of 4:1 MB is encoded with n fixed to eight and k swept from 1
Fig. 5. Energy measurement setting. to 8.
36 IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 3, NO. 1, JANUARY-MARCH 2015
Fig. 8. Reliability with respect to different k=n ratio and failure probability. Fig. 10. Execution time of different components with respect to various
network size. File retrieval time of Random and our algorithm (KNF) is
also compare here.
k1 ¼ k2 ¼ k, and n1 ¼ n2 ¼ n. As shown, nodes spent much
longer time in distributing/retrieving fragments than other
components such as data encoding/decoding. We also mobility trace contains 4 hours of data with 1 Hz sampling
observe that the time for distributing/retrieving fragments rate. Nodes beacon every 30 seconds.
increased with the network size. This is because fragments We compare our KNF with two other schemes—a
are more sparsely distributed, resulting in longer paths to greedy algorithm (Greedy) and a random placement algo-
distribute/retrieve fragments. We then compared the data rithm (Random). Greedy selects nodes with the largest
retrieval time of our algorithm with the data retrieval time number of neighbors as storage/processor nodes because
of random placement. Fig. 10 shows that our framework nodes with more neighbors are better candidates for cluster
achieved 15 to 25 percent lower data retrieval time than heads and thus serve good facility nodes. Random selects
Random. To validate the performance of our k-out-of-n data storage or processor nodes randomly. The goal is to evalu-
processing, we measured the completion rate of our face- ate how the selected storage nodes impact the performance.
recognition job by varying the number of failure node. The We measure the following metrics: consumed energy for
face recognition job had an average completion rate of retrieving data, consumed energy for processing a job, data
95 percent in our experimental setting. retrieval rate, completion time of a job, and completion rate
of a job. We are interested in the effects of the following
parameters—mobility model, node speed, k=n ratio, t 2 , and
5 SIMULATION RESULTS number of failed nodes, and scheduling. The default values for
We conducted simulations to evaluate the performance of the parameters are: N ¼ 26, n ¼ 7, k ¼ 4, t 1 ¼ 3, t 2 ¼ 20;
our k-out-of-n framework (denoted by KNF) in larger scale our default mobility model is RPGM with node-speed 1 m/
networks. We consider a network of 400400 m2 where up s. A node may fail due to two independent factors: depleted
to 45 mobile nodes are randomly deployed. The communi- energy or an application-dependent failure probability;
cation range of a node is 130 m, which is measured on our specifically, the energy associated with a node decreases as
smartphones. Two different mobility models are tested— the time elapses, and thus increases the failure probability.
Markovian Waypoint Model and Reference Point Group Each node is assigned a constant application-dependent
Mobility (RPGM). Markovian Waypoint is similar to Ran- failure probability.
dom Waypoint Model, which randomly selects the way- We first perform simulations for k-out-of-n data alloca-
point of a node, but it accounts for the current waypoint tion by varying the first four parameters and then simulate
when it determines the next waypoint. RPGM is a group the k-out-of-n data processing with different number of
mobility model where a subset of leaders are selected; each failed nodes. We evaluate the performance of data process-
leader moves based on Markovian Waypoint model and ing only with the number of node failures because data
other non-leader nodes follow the closest leader. Each processing relies on data retrieval and the performance of
data allocation directly impacts the performance of data
processing. If the performance of data allocation is already
bad, we can expect the performance of data processing will
not be any better.
The simulation is performed in Matlab. The energy pro-
file is taken from our real measurements on smartphones;
the mobility trace is generated according to RPGM mobility
model; and the linear programming problem is solved by
the Matlab optimization toolbox.
spend higher energy in retrieving data compared with the have to communicate with some storage nodes farther
static network. It also shows that the energy consumption away, leading to higher energy consumption. Although
for RPGM is smaller than that for Markov. The reason is lower k=n is beneficial for both retrieval rate and energy effi-
that a storage node usually serves the nodes in its proxim- ciency, it requires more storage and longer data distribution
ity; thus when nodes move in a group, the impact of mobil- time. A 1 MB file with k=n ¼ 0:6 in a network of eight nodes
ity is less severe than when all nodes move randomly. In all may take 10 seconds or longer to be distributed (as shown
scenarios, KNF consumes lower energy than others. in Fig. 10).
Fig. 12. Effect of k=n ratio on data retrieval rate when n ¼ 7. Fig. 14. Effect of t 2 and node speed on data retrieval rate.
38 IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 3, NO. 1, JANUARY-MARCH 2015
Fig. 15. Effect of t 2 and node speed on energy efficiency. Fig. 17. Effect of node failure on energy efficiency with fail-fast.
Fig. 16. Effect of node failure on energy efficiency with fail-slow. Fig. 18. Effect of node failure on completion ratio with fail-slow.
Fig. 20. Effect of node failure on completion time with fail-slow. Fig. 22. Comparison of performance before and after scheduling algo-
rithm on job completion time.
Chen et al. [22], [23] distribute data and process the distrib- [5] S. M. George, W. Zhou, H. Chenji, M. Won, Y. Lee, A. Pazarloglou,
R. Stoleru, and P. Barooah, “DistressNet: A wireless Ad Hoc and
uted data in a dynamic network. Both the distributed data sensor network architecture for situation management in disaster
and processing tasks are allocated in an energy-efficient and response,” IEEE Commun. Mag., vol. 48, no. 3, pp. 128–136, Mar.
reliable manner, but how to optimally schedule the task to 2010.
further reduce energy and job makespan is not considered. [6] D. W. Coit and J. Liu, “System reliability optimization with k-out-
of-n subsystems,” Int. J. Rel., Quality Safety Eng., vol. 7, no. 2,
Compared with the previous two works, this paper propose pp. 129–142, 2000.
an efficient k-out-of-n task scheduling algorithm that [7] D. S. J. D. Couto, “High-throughput routing for multi-hop wire-
reduces the job completion time and minimizes the energy less networks,” Ph.D. dissertation, Dept. Elect. Eng. Comput. Sci.,
MIT, Cambridge, MA, USA, 2004.
wasted in executing duplicated tasks on multiple processor [8] Y. Wen, R. Wolski, and C. Krintz, “Online prediction of battery
nodes. Furthermore, the tradeoff between the system reli- lifetime for embedded and mobile devices,” in Proc. 3rd Int. Conf.
ability and the overhead, in terms of more storage space Power-Aware Comput. Syst., 2005, pp. 57–72.
and redundant tasks, is analyzed. [9] A. Leon-Garcia, Probability, Statistics, and Random Processes for
Electrical Engineering. Englewood Cliffs, NJ, USA: Prentice
Cloud computing in a small-scale network with battery- Hall, 2008.
powered devices has also gained attention recently. Cloudlet [10] S. Huchton, G. Xie, and R. Beverly, “Building and evaluating a k-
is a resource-rich cluster that is well-connected to the Inter- resilient mobile distributed file system resistant to device compro-
net and is available for use by nearby mobile devices [1]. A mise,” in Proc. Mil. Commun. Conf., 2011, pp. 1315–1320.
[11] A. G. Dimakis, K. Ramchandran, Y. Wu, and C. Su, “A survey on
mobile device delivers a small Virtual Machine overlay to a network codes for distributed storage,” Proc. IEEE, vol. 99, no. 3,
cloudlet infrastructure and lets it take over the computation. pp. 476–489, Mar. 2011.
Similar works that use VM migration are also done in Clone- [12] D. Leong, A. G. Dimakis, and T. Ho, “Distributed storage alloca-
tion for high reliability,” in Proc. IEEE Int. Conf. Commun., 2010,
Cloud [2] and ThinkAir [3]. MAUI uses code portability pro- pp. 1–6.
vided by Common Language Runtime to create two [13] M. Aguilera, R. Janakiraman, and L. Xu, “Using erasure codes effi-
versions of an application: one runs locally on mobile devi- ciently for storage in a distributed system,” in Proc. Int. Conf.
ces and the other runs remotely [24]. MAUI determines Dependable Syst. Netw., 2005, pp. 336–345.
[14] M. Alicherry and T. Lakshman, “Network aware resource alloca-
which processes to be offloaded to remote servers based on tion in distributed clouds,” in Proc. IEEE Conf. Comput. Commun.,
their CPU usages. Serendipity considers using remote 2012, pp. 963–971.
computational resource from other mobile devices [4]. Most [15] A. Beloglazov, J. Abawajy, and R. Buyya, “Energy-aware resource
allocation heuristics for efficient management of data centers for
of these works focus on minimizing the energy, but do not cloud computing,” Future Generation Comput. Syst., vol. 28, no. 5,
address system reliability. pp. 755–768, 2012.
[16] C. Liu, X. Qin, S. Kulkarni, C. Wang, S. Li, A. Manzanares,
and S. Baskiyar, “Distributed energy-efficient scheduling for
7 CONCLUSIONS data-intensive applications with deadline constraints on data
grids,” in Proc. IEEE Int. Perform., Comput. Commun. Conf.,
We presented the first k-out-of-n framework that jointly 2008, pp. 26–33.
addresses the energy-efficiency and fault-tolerance chal- [17] D. Shires, B. Henz, S. Park, and J. Clarke, “Cloudlet seeding:
lenges. It assigns data fragments to nodes such that other Spatial deployment for high performance tactical clouds,” in Proc.
World Congr. Comput. Sci., Comput. Eng., Applied Comput., 2012,
nodes retrieve data reliably with minimal energy consump- pp. 1–7.
tion. It also allows nodes to process distributed data such [18] D. Neumann, C. Bodenstein, O. F. Rana, and R. Krishnaswamy,
that the energy consumption for processing the data is mini- “STACEE: Enhancing storage clouds using edge devices,” in Proc.
1st ACM/IEEE Workshop Autonomic Comput. Economics, 2011,
mized. Through system implementation, the feasibility of pp. 19–26.
our solution on real hardware was validated. Extensive sim- [19] D. Huang, X. Zhang, M. Kang, and J. Luo, “MobiCloud: Building
ulations in larger scale networks proved the effectiveness of secure cloud framework for mobile computing
our solution. andcommunication,” in Proc. IEEE 5th Int. Symp. Serv. Oriented
Syst. Eng., 2010, pp. 27–34.
[20] P. Stuedi, I. Mohomed, and D. Terry, “WhereStore: Location-
ACKNOWLEDGMENTS based data storage for mobile devices interacting with the cloud,”
in Proc. 1st ACM Workshop Mobile Cloud Comput. Serv.: Soc. Netw.
This work was supported by Naval Postgraduate School Beyond, 2010, pp. 1:1–1:8.
under Grant No. N00244-12-1-0035 and the US National Sci- [21] S. Sobti, N. Garg, F. Zheng, J. Lai, Y. Shao, C. Zhang, E. Ziskind, A.
ence Foundation (NSF) under Grants #1127449, #1145858 Krishnamurthy, and R. Y. Wang, “Segank: A distributed mobile
storage system,” in Proc. 3rd USENIX Conf. File Storage Technol.,
and #0923203. 2004, pp. 239–252.
[22] C. Chen, M. Won, R. Stoleru, and G. Xie, “Resource allocation for
REFERENCES energy efficient k-out-of-n system in mobile ad hoc networks,” in
Proc. 22nd Int. Conf. Comput.r Commun. Netw., 2013, pp. 1–9.
[1] M. Satyanarayanan, P. Bahl, R. Caceres, and N. Davies, “The case [23] C. A. Chen, M. Won, R. Stoleru, and G. Xie, “Energy-efficient
for VM-based cloudlets in mobile computing,” IEEE Pervasive fault-tolerant data storage and processing in dynamic network,”
Comput., vol. 8, no. 4, pp. 14–23, Oct.-Dec. 2009. in Proc. 14th ACM Int. Symp. Mobile Ad Hoc Netw. Comput., 2013,
[2] B.-G. Chun, S. Ihm, P. Maniatis, M. Naik, and A. Patti, pp. 281–286.
“CloneCloud: Elastic execution between mobile device and [24] E. Cuervo, A. Balasubramanian, D.-k. Cho, A. Wolman, S. Saroiu,
cloud,” in Proc. 6th Conf. Comput. Syst., 2011, pp. 301–314. R. Chandra, and P. Bahl, “MAUI: Making smartphones last longer
[3] S. Kosta, A. Aucinas, P. Hui, R. Mortier, and X. Zhang, “ThinkAir: with code offload,” in Proc. 8th Int. Conf. Mobile Syst., Appl., Serv.,
Dynamic resource allocation and parallel execution in the cloud 2010, pp. 49–62.
for mobile code offloading,” in Proc. IEEE Conf. Comput. Commun.,
2012, pp. 945–953.
[4] C. Shi, V. Lakafosis, M. H. Ammar, and E. W. Zegura,
“Serendipity: Enabling remote computing among intermittently
connected mobile devices,” in Proc. 13th ACM Int. Symp. Mobile
Ad Hoc Netw. Comput., 2012, pp. 145–154.
CHEN ET AL.: ENERGY-EFFICIENT FAULT-TOLERANT DATA STORAGE AND PROCESSING IN MOBILE CLOUD 41
Chien-An Chen received the BS and MS degrees Geoffery G. Xie received the BS degree in com-
in electrical engineering from the University of puter science from Fudan University, China, and
California, Los Angeles, in 2009 and 2010, respec- the PhD degree in computer sciences from the
tively. He is currently working toward the PhD University of Texas, Austin. He is a professor in
degree in computer science and engineering at the Computer Science Department at the US
Texas A&M University. His research interests Naval Postgraduate School. He was a visiting sci-
include mobile computing, energy-efficient wire- entist in the School of Computer Science at
less network, and cloud computing on mobile Carnegie Mellon University, from 2003 to 2004,
devices. and recently visited the Computer Laboratory of
the University of Cambridge, United Kingdom. He
has published more than 60 articles in various
areas of networking. He was an editor of the Computer Networks journal
from 2001 to 2004. He cochaired the ACM SIGCOMM Internet Network
Myounggyu Won received the PhD degree in Management Workshop in 2007, and is currently a member of the work-
computer science from Texas A&M University at shop’s steering committee. His current research interests include net-
College Station, in 2013. He is a postdoctoral work analysis, routing design and theories, underwater acoustic
networks, and abstraction driven design and analysis of enterprise net-
researcher in the Department of Information and
works. He is a member of the IEEE.
Communication Engineering at Daegu Gyeongbuk
Institute of Science and Technology (DGIST),
South Korea. His research interests include wire-
less mobile systems, vehicular ad hoc networks,
intelligent transportation systems, and wireless " For more information on this or any other computing topic,
sensor networks. He received the Graduate please visit our Digital Library at www.computer.org/publications/dlib.
Research Excellence Award from the Department
of Computer Science and Engineering at Texas A&M University in 2012.