
IEEE COMMUNICATIONS SURVEYS & TUTORIALS 1
Deep Learning in Mobile and Wireless Networking: A Survey
Chaoyun Zhang, Paul Patras, and Hamed Haddadi
Abstract: The rapid uptake of mobile devices and the rising popularity of mobile applications and services pose unprecedented demands on mobile and wireless networking infrastructure. Upcoming 5G systems are evolving to support exploding mobile traffic volumes, agile management of network resources to maximize user experience, and extraction of fine-grained real-time analytics. Fulfilling these tasks is challenging, as mobile environments are increasingly complex, heterogeneous, and evolving. One potential solution is to resort to advanced machine learning techniques to help manage the rise in data volumes and algorithm-driven applications. The recent success of deep learning underpins new and powerful tools that tackle problems in this space.
In this paper we bridge the gap between deep learning and mobile and wireless networking research, by presenting a comprehensive survey of the crossovers between the two areas. We first briefly introduce essential background and state-of-the-art in deep learning techniques with potential applications to networking. We then discuss several techniques and platforms that facilitate the efficient deployment of deep learning onto mobile systems. Subsequently, we provide an encyclopedic review of mobile and wireless networking research based on deep learning, which we categorize by different domains. Drawing from our experience, we discuss how to tailor deep learning to mobile environments. We complete this survey by pinpointing current challenges and open future directions for research.
Index Terms: Deep Learning, Machine Learning, Mobile Networking, Wireless Networking, Mobile Big Data, 5G Systems, Network Management.

arXiv:1803.04311v1 [cs.NI] 12 Mar 2018

C. Zhang and P. Patras are with the Institute for Computing Systems Architecture (ICSA), School of Informatics, University of Edinburgh, Edinburgh, UK. Emails: {chaoyun.zhang, paul.patras}@ed.ac.uk. H. Haddadi is with the Dyson School of Design Engineering at Imperial College London. Email: h.haddadi@imperial.ac.uk.

I. INTRODUCTION
Internet connected mobile devices are penetrating every aspect of individuals' life, work, and entertainment. The increasing number of smartphones and the emergence of evermore diverse applications trigger a surge in mobile data traffic. Indeed, the latest industry forecasts indicate that the annual worldwide IP traffic consumption will reach 3.3 zettabytes ($10^{15}$ MB) by 2021, with smartphone traffic exceeding PC traffic by the same year [1]. Given the shift in user preference towards wireless connectivity, current mobile infrastructure faces great capacity demands. In response to this increasing demand, early efforts propose to agilely provision resources [2] and tackle mobility management distributively [3]. In the long run, however, Internet Service Providers (ISPs) must develop intelligent heterogeneous architectures and tools that can spawn the 5th generation of mobile systems (5G) and gradually meet more stringent end-user application requirements [4], [5].
The growing diversity and complexity of mobile network architectures have made monitoring and managing the multitude of network elements intractable. Therefore, embedding versatile machine intelligence into future mobile networks is drawing unparalleled research interest [6], [7]. This trend is reflected in machine learning (ML) based solutions to problems ranging from radio access technology (RAT) selection [8] to malware detection [9], as well as the development of networked systems that support machine learning practices (e.g. [10], [11]). ML enables systematic mining of valuable information from traffic data and automatically uncovers correlations that would otherwise have been too complex for human experts to extract [12]. As the flagship of machine learning, deep learning has achieved remarkable performance in areas such as computer vision [13] and natural language processing (NLP) [14]. Networking researchers are also beginning to recognize the power and importance of deep learning, and are exploring its potential to solve problems specific to the mobile networking domain [15], [16].
Embedding deep learning into the 5G mobile and wireless networks is well justified. In particular, data generated by mobile environments are heterogeneous, as they are usually collected from various sources, have different formats, and exhibit complex correlations [17]. Traditional ML tools require expensive hand-crafted feature engineering to make accurate inferences and decisions based on such data. Deep learning reduces the reliance on domain expertise, as it employs hierarchical feature extraction, by which it efficiently distills information and obtains increasingly abstract correlations from the data, while minimizing the data pre-processing effort. Graphics Processing Unit (GPU)-based parallel computing further enables deep learning to make inferences within milliseconds. This facilitates network analysis and management with high accuracy and in a timely manner, overcoming the runtime limitations of traditional mathematical techniques (e.g. convex optimization, game theory, meta heuristics).
Despite growing interest in deep learning in the mobile networking domain, existing contributions are scattered across different research areas and a comprehensive yet concise survey is lacking. This article fills this gap between deep learning and mobile and wireless networking, by presenting an up-to-date survey of research that lies at the intersection between these two fields. Beyond reviewing the most relevant literature, we discuss the key pros and cons of various deep learning architectures, and outline deep learning model selection strategies in view of solving mobile networking problems. We further investigate methods that tailor deep learning to individual mobile networking tasks, to achieve best
[Figure 1: block diagram of the survey structure. Sec. II: Related Work & Our Scope; Sec. III: Deep Learning 101; Sec. IV: Enabling Deep Learning in Mobile & Wireless Networking; Sec. V: Deep Learning: State-of-the-Art; Sec. VI: Deep Learning Applications in Mobile & Wireless Network; Sec. VII: Tailored Deep Learning to Mobile Networks; Sec. VIII: Future Research Perspectives.]
Fig. 1: Diagrammatic view of the organization of this survey.
performance under complex environments. We wrap up this paper by pinpointing future research directions and important problems that remain unsolved and are worth pursuing with deep neural networks. Our ultimate goal is to provide a definitive guide for networking researchers and practitioners, who intend to employ deep learning to solve problems of interest.
Survey Organization: We structure this article in a top-down manner, as shown in Figure 1. We begin by discussing work that gives a high-level overview of deep learning, future mobile networks, and networking applications built using deep learning, which helps define the scope and contributions of this paper (Section II). Since deep learning techniques are relatively new in the mobile networking community, we provide a basic deep learning background in Section III, highlighting immediate advantages in addressing mobile networking problems. There exist many factors that enable implementing deep learning for mobile networking applications (including dedicated deep learning libraries, optimization algorithms, etc.). We discuss these enablers in Section IV, aiming to help mobile network researchers and engineers in choosing the right software and hardware platforms for their deep learning deployments.
In Section V, we introduce and compare state-of-the-art deep learning models and provide guidelines for selecting among candidates for solving networking problems. In Section VI we review recent deep learning applications to mobile and wireless networking, which we group by different scenarios ranging from mobile traffic analytics to security, and emerging applications. We then discuss how to tailor deep learning models to mobile networking problems (Section VII) and highlight open issues related to deep learning adoption in networking research (Section VIII). We conclude this article with a brief discussion on the interplay between mobile networks and deep neural networks (Section IX). We list the abbreviations used throughout this paper in Table I.

II. RELATED HIGH-LEVEL ARTICLES AND THE SCOPE OF THIS SURVEY
Mobile networking and deep learning problems have been researched mostly independently. Only recently have crossovers between the two areas emerged. Several notable works paint a comprehensive picture of the deep learning and/or mobile networking research landscape. We categorize these works into (i) pure overviews of deep learning techniques, (ii) reviews of analyses and management techniques in modern mobile networks, and (iii) reviews of works at the intersection between deep learning and computer networking. We summarize these earlier efforts in Table II and in this section discuss the most representative publications in each class.
A. Overviews of Deep Learning and its Applications
The era of big data is triggering wide interest in deep learning across different research disciplines [25]–[28] and a growing number of surveys and tutorials are emerging (e.g. [21], [22]). LeCun et al. give a milestone overview of deep learning, introduce several popular models, and look ahead at the potential of deep neural networks [18]. Schmidhuber undertakes an encyclopedic survey of deep learning, likely the most comprehensive thus far, covering the evolution, methods, applications, and open research issues [19]. Liu et al. summarize the underlying principles of several deep learning models, and review deep learning developments in selected applications, such as speech processing, pattern recognition, and computer vision [20].
Arulkumaran et al. present several architectures and core algorithms for deep reinforcement learning, including deep Q-networks, trust region policy optimization, and asynchronous advantage actor-critic [24]. Their survey highlights the remarkable performance of deep neural networks in different control problems (e.g., video gaming, Go board game play, etc.). Zhang et al. survey developments in deep learning for recommender systems [65], which have potential to play an important role in mobile advertising. As deep learning becomes increasingly popular, Goodfellow et al. provide a comprehensive tutorial of deep learning in a book that covers prerequisite knowledge, underlying principles, and popular applications [23].

B. Surveys on Future Mobile Networks
The emerging 5G mobile networks incorporate a host of new techniques to overcome the performance limitations of current deployments and meet new application requirements. Progress to date in this space has been summarized through surveys, tutorials, and magazine papers (e.g. [4], [5], [34], [35], [43]). Andrews et al. highlight the differences between 5G and prior mobile network architectures, conduct a comprehensive review of 5G techniques, and discuss research challenges facing future developments [34]. Agiwal et al. review new architectures for 5G networks, survey emerging wireless technologies, and point out research problems that remain unsolved [4]. Gupta et al. also review existing work on 5G cellular network architectures, subsequently proposing a framework that incorporates networking ingredients such as Device-to-Device (D2D) communication, small cells, cloud computing, and the IoT [5].
Intelligent mobile networking is becoming a popular research area and related work has been reviewed in the literature (e.g. [7], [30], [33], [48], [50]–[53]). Jiang et al. discuss the potential of applying machine learning to 5G network applications including massive MIMO and smart grids [7]. This work further identifies several research gaps between ML and 5G that remain unexplored. Li et al. discuss opportunities and challenges of incorporating artificial intelligence (AI) into future network architectures and highlight the significance of AI in the 5G era [52]. Klaine et al. present several successful ML practices in SONs, discuss the pros and cons of different algorithms, and identify future research directions in this area [51].

TABLE I: List of abbreviations in alphabetical order.
Acronym | Explanation
5G | 5th Generation mobile networks
A3C | Asynchronous Advantage Actor-Critic
AdaNet | Adaptive learning of neural Network
AE | Auto-Encoder
AI | Artificial Intelligence
AMP | Approximate Message Passing
ANN | Artificial Neural Network
ASR | Automatic Speech Recognition
BSC | Base Station Controller
BP | Back-Propagation
CDR | Call Detail Record
CNN or ConvNet | Convolutional Neural Network
ConvLSTM | Convolutional Long Short-Term Memory
CPU | Central Processing Unit
CSI | Channel State Information
CUDA | Compute Unified Device Architecture
cuDNN | CUDA Deep Neural Network library
D2D | Device to Device communication
DAE | Denoising Auto-Encoder
DBN | Deep Belief Network
DenseNet | Dense Convolutional Network
DQN | Deep Q-Network
DRL | Deep Reinforcement Learning
DT | Decision Tree
ELM | Extreme Learning Machine
GAN | Generative Adversarial Network
GP | Gaussian Process
GPS | Global Positioning System
GPU | Graphics Processing Unit
GRU | Gated Recurrent Unit
HMM | Hidden Markov Model
HTTP | HyperText Transfer Protocol
IDS | Intrusion Detection System
IoT | Internet of Things
ISP | Internet Service Provider
LTE | Long-Term Evolution
LS-GAN | Loss-Sensitive Generative Adversarial Network
LSTM | Long Short-Term Memory
LSVRC | Large Scale Visual Recognition Challenge
MDP | Markov Decision Process
MEC | Mobile Edge Computing
ML | Machine Learning
MLP | Multilayer Perceptron
MIMO | Multi-Input Multi-Output
MTSR | Mobile Traffic Super-Resolution
NFL | No Free Lunch theorem
NLP | Natural Language Processing
NMT | Neural Machine Translation
NPU | Neural Processing Unit
nn-X | neural network neXt
PCA | Principal Components Analysis
QoE | Quality of Experience
RBM | Restricted Boltzmann Machine
ResNet | Residual Network
RFID | Radio Frequency Identification
RNC | Radio Network Controller
RNN | Recurrent Neural Network
SARSA | State-Action-Reward-State-Action
SGD | Stochastic Gradient Descent
SON | Self-Organising Network
SNR | Signal-to-Noise Ratio
SVM | Support Vector Machine
TPU | Tensor Processing Unit
VAE | Variational Auto-Encoder
VR | Virtual Reality
WGAN | Wasserstein Generative Adversarial Network
WSN | Wireless Sensor Network
TABLE II: Summary of existing surveys, magazine papers, and books related to deep learning and mobile networking. The symbol ✓ indicates a publication is in the scope of a domain; ✗ marks papers that do not directly cover that area, but from which readers may retrieve some related insights. Publications related to both deep learning and mobile networks are shaded.
Publication | One-sentence summary | Scope (machine learning: deep learning, other ML methods; mobile networking: mobile big data, 5G technology)
LeCun et al. [18] | A milestone overview of deep learning. | ✓
Schmidhuber [19] | A comprehensive deep learning survey. | ✓
Liu et al. [20] | A survey on deep learning and its applications. | ✓
Deng et al. [21] | An overview of deep learning methods and applications. | ✓
Deng [22] | A tutorial on deep learning. | ✓
Goodfellow et al. [23] | An essential deep learning textbook. | ✓ ✗
Arulkumaran et al. [24] | A survey of deep reinforcement learning. | ✓ ✗
Chen et al. [25] | An introduction to deep learning for big data. | ✓ ✗ ✗
Najafabadi [26] | An overview of deep learning applications for big data analytics. | ✓ ✗ ✗
Hordri et al. [27] | A brief survey of deep learning for big data applications. | ✓ ✗ ✗
Gheisari et al. [28] | A high-level literature review on deep learning for big data analytics. | ✓ ✗
Yu et al. [29] | A survey on networking big data. | ✓
Alsheikh et al. [30] | A survey on machine learning in wireless sensor networks. | ✓ ✓
Tsai et al. [31] | A survey on data mining in IoT. | ✓ ✓
Cheng et al. [32] | An introduction to mobile big data and its applications. | ✓ ✗
Bkassiny et al. [33] | A survey on machine learning in cognitive radios. | ✓ ✗ ✗
Andrews et al. [34] | An introduction and outlook of 5G networks. | ✓
Gupta et al. [5] | A survey of 5G architecture and technologies. | ✓
Agiwal et al. [4] | A survey of 5G mobile networking techniques. | ✓
Panwar et al. [35] | A survey of 5G networks features, research progress and open issues. | ✓
Elijah et al. [36] | A survey of 5G MIMO systems. | ✓
Buzzi et al. [37] | A survey of 5G energy-efficient techniques. | ✓
Peng et al. [38] | An overview of radio access networks in 5G. | ✗ ✓
Niu et al. [39] | A survey of 5G millimeter wave communications. | ✓
Wang et al. [2] | 5G backhauling techniques and radio resource management. | ✓
Giust et al. [3] | An overview of 5G distributed mobility management. | ✓
Foukas et al. [40] | A survey and insights on network slicing in 5G. | ✓
Taleb et al. [41] | A survey on 5G edge architecture and orchestration. | ✓
Mach and Becvar [42] | A survey on MEC. | ✓
Mao et al. [43] | A survey on mobile edge computing. | ✓ ✓
Wang et al. [44] | An architecture for personalized QoE management in 5G. | ✓ ✓
Han et al. [45] | Insights to mobile cloud sensing, big data, and 5G. | ✓ ✓
Singh et al. [46] | A survey on social networks over 5G. | ✗ ✓ ✓
Chen et al. [47] | An introduction to 5G cognitive systems for healthcare. | ✗ ✗ ✗ ✓
Buda et al. [48] | Machine learning aided use cases and scenarios in 5G. | ✓ ✓ ✓
Imran et al. [49] | An introduction to big data analysis for self-organizing networks (SON) in 5G. | ✓ ✓ ✓
Keshavamurthy et al. [50] | Machine learning perspectives on SON in 5G. | ✓ ✓ ✓
Klaine et al. [51] | A survey of machine learning applications in SON. | ✗ ✓ ✓ ✓
Jiang et al. [7] | Machine learning paradigms for 5G. | ✗ ✓ ✓ ✓
Li et al. [52] | Insights into intelligent 5G. | ✗ ✓ ✓ ✓
Bui et al. [53] | A survey of future mobile networks analysis and optimization. | ✗ ✓ ✓ ✓
Kasnesis et al. [54] | Insights into employing deep learning for mobile data analysis. | ✓ ✓
Alsheikh et al. [17] | Applying deep learning and Apache Spark for mobile data analytics. | ✓ ✓
Cheng et al. [55] | Survey of mobile big data analysis and outlook. | ✓ ✓ ✓ ✗
Wang and Jones [56] | A survey of deep learning-driven network intrusion detection. | ✓ ✓ ✓ ✗
Kato et al. [57] | Proof-of-concept deep learning for network traffic control. | ✓ ✓
Zorzi et al. [58] | An introduction to machine learning driven network optimisation. | ✓ ✓ ✓
Fadlullah et al. [59] | A comprehensive survey of deep learning for network traffic control. | ✓ ✓ ✓ ✗
Zheng et al. [6] | An introduction to big data-driven 5G optimisation. | ✓ ✓ ✓ ✓
Mohammadi et al. [60] | A survey of deep learning in IoT data analytics. | ✓ ✓ ✓
Ahad et al. [61] | A survey of neural networks in wireless networks. | ✓ ✗ ✗ ✓
Gharaibeh et al. [62] | A survey of smart cities. | ✓ ✓ ✓ ✓
Lane et al. [63] | An overview and introduction of deep learning-driven mobile sensing. | ✓ ✓ ✓
Ota et al. [64] | A survey of deep learning for mobile multimedia. | ✓ ✓ ✓
C. Deep Learning Driven Networking Applications
A growing number of papers survey recent works that bring deep learning into the computer networking domain. Alsheikh et al. identify benefits and challenges of using big data for mobile analytics and propose a Spark based deep learning framework for this purpose [17]. Wang and Jones discuss evaluation criteria, data streaming and deep learning practices for network intrusion detection, pointing out research challenges inherent to such applications [56]. Zheng et al. put forward a big data-driven mobile network optimisation framework in 5G, to enhance QoE performance [6]. More recently, Fadlullah et al. deliver a survey on the progress of deep learning in a broad range of areas, highlighting its potential application to network traffic control systems [59]. Their work also highlights several unsolved research issues worthy of future study.
Ahad et al. introduce techniques, applications, and guidelines on applying neural networks to wireless networking problems [61]. Although it identifies several limitations of neural networks, this article focuses largely on older neural network models, ignoring recent progress in deep learning and successful applications in current mobile networks. Lane et al. investigate the suitability and benefits of employing deep learning in mobile sensing, and emphasize the potential for accurate inference on mobile devices [63]. Ota et al. report novel deep learning applications in mobile multimedia [64]. Their survey covers state-of-the-art deep learning practices in mobile health and wellbeing, mobile security, mobile ambient intelligence, language translation, and speech recognition. Mohammadi et al. survey recent deep learning techniques for Internet of Things (IoT) data analytics [60]. They comprehensively review existing efforts that incorporate deep learning into the IoT domain and shed light on current research challenges and future directions.

D. Our Scope
The objective of this survey is to provide a comprehensive view on state-of-the-art deep learning practices in the mobile networking area. By this we aim to answer the following key questions:
1) Why is deep learning promising for solving mobile networking problems?
2) What are the cutting-edge deep learning models relevant to mobile and wireless networking?
3) What are the most recent successful deep learning applications in the mobile networking domain?
4) How can researchers tailor deep learning to specific mobile networking problems?
5) Which are the most important and promising directions worthy of further study?
The research papers and books we mentioned previously only partially answer these questions. This article goes beyond these previous works and specifically focuses on the crossovers between deep learning and mobile networking. While our main scope remains the mobile networking domain, for completeness we also discuss deep learning applications to wireless networks, and identify emerging application domains intimately connected to these areas. Overall, our paper distinguishes itself from earlier surveys from the following perspectives:
(i) We particularly focus on deep learning applications for mobile network analysis and management, instead of broadly discussing deep learning methods (as, e.g., in [18], [19]) or centering on a single application domain, e.g. mobile big data analysis with a specific platform [17].
(ii) We discuss cutting-edge deep learning techniques from the perspective of mobile networks (e.g., [66], [67]), focusing on their applicability to this area, whilst giving less attention to conventional deep learning models that may be out-of-date.
(iii) We analyze similarities between existing non-networking problems and those specific to mobile networks; based on this analysis we provide insights into both best deep learning architecture selection strategies and adaptation approaches so as to exploit the characteristics of mobile networks for analysis and management tasks.
To the best of our knowledge, this is the first time that mobile network analysis and management are jointly reviewed from a deep learning angle. We also provide for the first time insights into how to tailor deep learning to mobile networking problems.

III. DEEP LEARNING 101
We begin with a brief introduction to deep learning, highlighting the basic principles behind computation techniques in this field, as well as key advantages that lead to their success. Deep learning is essentially a sub-branch of ML where algorithms hierarchically extract knowledge from raw data through multiple layers of nonlinear processing units, in order to make predictions or take actions according to some target objective. The major benefit of deep learning over traditional ML is automatic feature extraction, thus avoiding expensive hand-crafted feature engineering. We illustrate the relation between deep learning, machine learning, and artificial intelligence (AI) at a high level in Fig. 2.

[Figure 2: nested Venn diagram. AI (examples: rule engines, expert systems, evolutionary algorithms) contains Machine Learning (examples: supervised learning, unsupervised learning, reinforcement learning), which contains Deep Learning (examples: MLP, CNN, RNN); applications in mobile & wireless networks (our scope) lie within Deep Learning.]
Fig. 2: Venn diagram of the relation between deep learning, machine learning, and AI. This survey particularly focuses on deep learning applications in mobile and wireless networks.

The discipline traces its origins 75 years back, when threshold logic was employed to produce a computational model
for neural networks [68]. However, it was only in the late 1980s that neural networks (NNs) gained interest, as Williams and Hinton showed that multi-layer NNs could be trained effectively by back-propagating errors [69]. LeCun and Bengio subsequently proposed the now popular Convolutional Neural Network (CNN) architecture [70], but progress stalled due to computing power limitations of systems available at that time. Following the recent success of GPUs, CNNs have been employed to dramatically reduce the error rate in the Large Scale Visual Recognition Challenge (LSVRC) [71]. This has drawn unprecedented interest in deep learning and breakthroughs continue to appear in a wide range of computer science areas.

A. Fundamental Principles of Deep Learning
The key aim of deep neural networks is to approximate complex functions through a composition of simple and predefined operations of units (or neurons). Such an objective function can be of almost any type, such as a mapping between images and their class labels (classification), computing future stock prices based on historical values (regression), or even deciding the next optimal chess move given the current status on the board (control). The operations performed are usually defined by a weighted combination of a specific group of hidden units with a non-linear activation function, depending on the structure of the model. Such operations along with the output units are named "layers". The neural network architecture resembles the perception process in a brain, where a specific set of units are activated given the current environment, influencing the output of the neural network model. In mathematical terms, the architecture of deep neural networks is usually differentiable, therefore the weights (or parameters) of the model can be learned by minimizing a loss function using gradient descent methods through back-propagation, following the fundamental chain rule [69]. We illustrate the principles of the learning and inference processes of a deep neural network in Fig. 3, where we use a CNN as an example.

[Figure 3: inputs x pass through units in Hidden Layer 1, Hidden Layer 2, and Hidden Layer 3 to outputs y. Forward passing (inference) runs from inputs to outputs; backward passing (learning) runs from outputs to inputs.]
Fig. 3: Illustration of the learning and inference processes of a 4-layer CNN. w(·) denote weights of each hidden layer, σ(·) is an activation function, λ refers to the learning rate, ∗(·) denotes the convolution operation and L(w) is the loss function to be optimized.

B. Advantages of Deep Learning
We recognize several benefits of employing deep learning to address network engineering problems, namely:
1) It is widely acknowledged that, while vital to the performance of traditional ML algorithms, feature engineering is costly [72]. A key advantage of deep learning is that it can automatically extract high-level features from data that has complex structure and inner correlations. The learning process does not need to be designed by a human, which tremendously simplifies prior feature handcrafting [18]. The importance of this is amplified in the context of mobile networks, as mobile data is usually generated by heterogeneous sources, is often noisy, and exhibits non-trivial spatial and temporal patterns [17], which would otherwise require outstanding human effort to label.
2) Deep learning is capable of handling large amounts of data. Mobile networks generate high volumes of different types of data at fast pace. Training traditional ML algorithms (e.g., Support Vector Machine (SVM) [73] and Gaussian Process (GP) [74]) sometimes requires storing all the data in memory, which is computationally infeasible under big data scenarios. Furthermore, the performance of ML does not grow significantly with large volumes of data and plateaus relatively fast [23]. In contrast, Stochastic Gradient Descent (SGD) employed to train NNs only requires sub-sets of data at each training step, which guarantees deep learning's scalability with big data. Deep neural networks further benefit as training with big data helps prevent model over-fitting.
3) Traditional supervised learning is only effective when sufficient labeled data is available. However, most current mobile systems generate unlabeled or semi-labeled data [17]. Deep learning provides a variety of methods that allow exploiting unlabeled data to learn useful patterns in an unsupervised manner, e.g., Restricted Boltzmann Machine (RBM) [75], Generative Adversarial Network (GAN) [76]. Applications include clustering [77], data distributions approximation [76], un/semi-supervised learning [78], [79], and one/zero shot learning [80], [81] amongst others.
4) Compressive representations learned by deep neural networks can be shared across different tasks, while this is limited or difficult to achieve in other ML paradigms (e.g., linear regression, random forest, etc.). Therefore, a single model can be trained to fulfill multiple objectives, without requiring complete model retraining for different tasks. We argue that this is essential for mobile network engineering, as it reduces computational and memory requirements of mobile systems when performing multi-task learning applications [82].
Although deep learning can have unique advantages when addressing mobile network problems, it requires certain system and software support, in order to be effectively deployed in mobile networks. We review and discuss such enablers in the next section.
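The forward (inference) and backward (learning) passes described in Section III-A, together with the mini-batch SGD training mentioned in Section III-B, can be made concrete with a small sketch. The example below is illustrative only and is not taken from the survey: it swaps the CNN of Fig. 3 for a one-hidden-layer fully connected network, and the toy dataset, the function names (`make_toy_data`, `forward`, `train_mlp`), and all hyperparameters are assumptions chosen for brevity. Each update applies $w \leftarrow w - \lambda \nabla_w L(w)$, with the gradient obtained through the chain rule.

```python
import numpy as np


def make_toy_data(n=200, seed=1):
    """Hypothetical toy task: the label is 1 when both inputs share the same sign."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=(n, 2))
    y = (x[:, 0] * x[:, 1] > 0).astype(float).reshape(-1, 1)
    return x, y


def forward(params, x):
    """Forward pass (inference): weighted sums followed by non-linear activations."""
    w1, b1, w2, b2 = params
    h = np.tanh(x @ w1 + b1)                    # hidden layer activations
    p = 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))    # sigmoid output in (0, 1)
    return h, p


def train_mlp(x, y, hidden=8, lr=0.5, epochs=200, batch_size=16, seed=0):
    """Train with mini-batch SGD; returns the parameters and the loss after each epoch."""
    rng = np.random.default_rng(seed)
    params = [
        rng.normal(0.0, 0.5, size=(x.shape[1], hidden)),  # w1
        np.zeros(hidden),                                 # b1
        rng.normal(0.0, 0.5, size=(hidden, 1)),           # w2
        np.zeros(1),                                      # b2
    ]
    losses = []
    for _ in range(epochs):
        order = rng.permutation(len(x))
        for start in range(0, len(x), batch_size):
            # Only a sub-set of the data is needed at each training step.
            idx = order[start:start + batch_size]
            xb, yb = x[idx], y[idx]
            h, p = forward(params, xb)
            # Backward pass (learning): chain rule for the mean cross-entropy loss.
            dz2 = (p - yb) / len(idx)           # gradient at the output pre-activation
            dh = dz2 @ params[2].T              # propagate back to the hidden layer
            dz1 = dh * (1.0 - h ** 2)           # tanh'(z) = 1 - tanh(z)^2
            grads = [xb.T @ dz1, dz1.sum(axis=0), h.T @ dz2, dz2.sum(axis=0)]
            for param, grad in zip(params, grads):
                param -= lr * grad              # w <- w - lr * dL/dw
        # Track the loss over the full dataset once per epoch.
        _, p_all = forward(params, x)
        p_all = np.clip(p_all, 1e-12, 1.0 - 1e-12)
        loss = -np.mean(y * np.log(p_all) + (1.0 - y) * np.log(1.0 - p_all))
        losses.append(float(loss))
    return params, losses


if __name__ == "__main__":
    x, y = make_toy_data()
    _, losses = train_mlp(x, y)
    print("loss after first epoch:", losses[0])
    print("loss after last epoch: ", losses[-1])
```

Because each step touches only `batch_size` samples, memory use during training does not depend on the dataset size, which is the scalability property that item 2 of Section III-B attributes to SGD.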
TABLE III: Summary of tools and techniques that enable deploying deep learning in mobile systems.
Technique | Examples | Scope | Functionality | Performance improvement | Energy consumption | Economic cost
Advanced parallel computing | GPU, TPU [83], CUDA [84], cuDNN [85] | Mobile servers, workstations | Enable fast, parallel training/inference of deep learning models in mobile applications | High | High | Medium (hardware)
Dedicated deep learning library | TensorFlow [86], Theano [87], Caffe [88], Torch [89] | Mobile servers and devices | High-level toolboxes that enable network engineers to build purpose-specific deep learning architectures | Medium | Associated with hardware | Low (software)
Fog computing | nn-X [90], ncnn [91], Kirin 970 [92], Core ML [93] | Mobile devices | Support edge-based deep learning computing | Medium | Low | Medium (hardware)
Fast optimization algorithms | Nesterov [94], Adagrad [95], RMSprop, Adam [96] | Training deep architectures | Accelerate and stabilize the model optimization process | Medium | Associated with hardware | Low (software)
Distributed machine learning systems | MLbase [97], Gaia [10], Tux2 [11], Adam [98], GeePS [99] | Distributed data centers, cross-server | Support deep learning frameworks in mobile systems across data centers | High | High | High (hardware)
IV. ENABLING DEEP LEARNING IN MOBILE NETWORKING

5G systems seek to provide high throughput and ultra-low latency communication services,
to improve users' QoE [4]. Implementing deep learning to build intelligence into 5G systems
so as to meet these objectives is expensive, since powerful hardware and software are
required to support training and inference in complex settings. Fortunately, several tools
are emerging, which make deep learning in mobile networks tangible; namely, (i) advanced
parallel computing, (ii) distributed machine learning systems, (iii) dedicated deep
learning libraries, (iv) fast optimization algorithms, and (v) fog computing. We summarize
these advances in Table III and review them in what follows.

A. Advanced Parallel Computing

Compared to traditional machine learning models, deep neural networks have significantly
larger parameter spaces, intermediate outputs, and numbers of gradient values. Each of
these needs to be updated during every training step, requiring powerful computation
resources. The training and inference processes involve huge amounts of matrix
multiplications and other operations, though these can be massively parallelized.
Traditional Central Processing Units (CPUs) have a limited number of cores, thus they only
support restricted computing parallelism. Employing CPUs for deep learning implementations
is highly inefficient and will not satisfy the low-latency requirements of mobile systems.

Engineers address this issue by exploiting the power of GPUs. GPUs were originally designed
for high performance video games and graphical rendering, but new techniques such as
Compute Unified Device Architecture (CUDA) [84] and the CUDA Deep Neural Network library
(cuDNN) [85] developed by NVIDIA add flexibility to this type of hardware, allowing users
to customize their usage for specific purposes. GPUs usually incorporate thousands of cores
and perform exceptionally well in the fast matrix multiplications required for training
neural networks. This provides higher memory bandwidth over CPUs and dramatically speeds up
the learning process. Recent advanced Tensor Processing Units (TPUs) developed by Google
even demonstrate 15-30× higher processing speeds and 30-80× higher performance-per-watt
than CPUs and GPUs [83].

There are a number of toolboxes that can assist the computational optimization of deep
learning on the server side. Spring and Shrivastava introduce a hashing based technique
that substantially reduces the computation requirements of deep network implementations
[100]. Mirhoseini et al. employ a reinforcement learning scheme to enable machines to learn
the optimal operation placement over mixed hardware for deep neural networks. Their
solution achieves up to 20% faster computation speed than human experts' designs of such
placements [101].

Importantly, these systems are easy to deploy, therefore mobile network engineers do not
need to rebuild mobile servers from scratch to support deep learning computing. This makes
implementing deep learning in mobile systems feasible and accelerates the processing of
mobile data streams.

B. Distributed Machine Learning Systems

Mobile data is collected from heterogeneous sources (e.g., mobile devices, network probes,
etc.), and stored in multiple distributed data centers. With the increase of data volumes,
it is impractical to move all mobile data to a central data center to run deep learning
applications [10]. Running network-wide deep learning algorithms would therefore require
distributed machine learning systems that support different interfaces (e.g., operating
systems, programming languages, libraries), so as to enable training and evaluation of deep
models across geographically distributed servers simultaneously, with high efficiency and
low overhead.

Deploying deep learning in a distributed fashion will inevitably introduce several
system-level problems, which require answering the following questions:

Consistency – When employing multiple machines to train one model, how to guarantee that
model parameters and computational processes are consistent across all servers?
Fault tolerance – How to deal with equipment breakdowns in a large-scale distributed
machine learning system?
Communication – How to optimize communication between
nodes in a cluster and how to avoid congestion?
Storage – How to design efficient storage mechanisms tailored to different environments
(e.g., distributed clusters, single machines, GPUs), given I/O and data processing
diversity?
Resource management – How to assign workloads to nodes in a cluster, while making sure
they work in a well-coordinated manner?
Programming model – How to design programming interfaces so that users can deploy machine
learning, and how to support multiple programming languages?

There exist several distributed machine learning systems that facilitate deep learning in
mobile networking applications. Kraska et al. introduce a distributed system named MLbase,
which makes it possible to intelligently specify, select, optimize, and parallelize ML
algorithms [97]. Their system helps non-experts deploy a wide range of ML methods, allowing
optimization and running ML applications across different servers. Hsieh et al. develop a
geo-distributed ML system called Gaia, which breaks the throughput bottleneck by employing
an advanced communication mechanism over Wide Area Networks (WANs), while preserving the
accuracy of ML algorithms [10]. Their proposal supports versatile ML interfaces (e.g.,
TensorFlow, Caffe), without requiring significant changes to the ML algorithm itself. This
system enables deployments of complex deep learning applications over large-scale mobile
networks.

Xing et al. develop a large-scale machine learning platform to support big data
applications [102]. Their architecture achieves efficient model and data parallelization,
enabling parameter state synchronization with low communication cost. Xiao et al. propose a
distributed graph engine for ML named TUX2, to support data layout optimization across
machines and reduce cross-machine communication [11]. They demonstrate remarkable
performance in terms of runtime and convergence on a large dataset with up to 64 billion
edges. Chilimbi et al. build a distributed, efficient, and scalable system named "Adam",^2
tailored to the training of deep models [98]. Their architecture demonstrates impressive
performance in terms of throughput, delay, and fault tolerance. Another dedicated
distributed deep learning system called GeePS is developed by Cui et al. [99]. Their
framework allows data parallelization on distributed GPUs, and demonstrates higher training
throughput and a faster convergence rate.

C. Dedicated Deep Learning Libraries

Building a deep learning model from scratch can prove complicated for engineers, as this
requires definitions of forwarding behaviors and gradient propagation operations at each
layer, in addition to CUDA coding for GPU parallelization. With the growing popularity of
deep learning, several dedicated libraries simplify this process. Most of these toolboxes
work with multiple programming languages, and are built with GPU acceleration and automatic
differentiation support. This eliminates the need for hand-crafted definitions of gradient
propagation. We summarize these libraries below.

TensorFlow^3 is a machine learning library developed by Google [86]. It enables deploying
computation graphs on CPUs, GPUs, and even mobile devices [103], allowing ML implementation
on both single and distributed architectures. Although originally designed for ML and deep
neural network applications, TensorFlow is also suitable for other data-driven research
purposes. Detailed documentation and tutorials for Python exist, while other programming
languages such as C, Java, and Go are also supported. Building upon TensorFlow, several
dedicated deep learning toolboxes were released to provide higher-level programming
interfaces, including Keras^4 and TensorLayer [104].

Theano is a Python library that allows users to efficiently define, optimize, and evaluate
numerical computations involving multi-dimensional data [87]. It provides both GPU and CPU
modes, which enables users to tailor their programs to individual machines. Theano has a
large user group and support community, and was one of the most popular deep learning
tools, though its popularity is decreasing as its core ideas and attributes are absorbed by
TensorFlow.

Caffe(2) is a dedicated deep learning framework developed by Berkeley AI Research [88], and
the latest version, Caffe2,^5 was recently released by Facebook. It allows training neural
networks on multiple GPUs within distributed systems, and supports deep learning
implementations on mobile operating systems, such as iOS and Android. Therefore, it has the
potential to play an important role in future mobile edge computing.

(Py)Torch is a scientific computing framework with wide support for machine learning models
and algorithms [89]. It was originally developed in the Lua language, but developers later
released an improved Python version [105]. It is a lightweight toolbox that can run on
embedded systems such as smart phones, but lacks comprehensive documentation. PyTorch is
now officially supported and maintained by Facebook and mainly employed for research
purposes.

MXNET is a flexible deep learning library that provides interfaces for multiple languages
(e.g., C++, Python, Matlab, R, etc.) [106]. It supports different levels of machine
learning models, from logistic regression to GANs. MXNET provides fast numerical
computation for both single machine and distributed ecosystems. It wraps workflows commonly
used in deep learning into high-level functions, such that standard neural networks can be
easily constructed without substantial coding effort. MXNET is the official deep learning
framework at Amazon.

^2 Note that this is distinct from the Adam optimizer discussed in Sec. IV-D.
^3 TensorFlow, https://www.tensorflow.org/
^4 Keras deep learning library, https://github.com/fchollet/keras
^5 Caffe2, https://caffe2.ai/
^6 MS Cognitive Toolkit, https://www.microsoft.com/en-us/cognitive-toolkit/
^7 Deeplearning4j, http://deeplearning4j.org
^8 Lasagne, https://github.com/Lasagne

Although less popular, there are other excellent deep learning libraries, such as CNTK,^6
Deeplearning4j,^7 and Lasagne,^8
which can also be employed in mobile systems. The choice among these depends on the
specific application.

D. Fast Optimization Algorithms

The objective functions to be optimized in deep learning are usually complex, as they
include a sum of an extremely large number of data-wise likelihood functions. As the depth
of the model increases, such functions usually exhibit high non-convexity with multiple
local minima, critical points, and saddle points. In this case, conventional Stochastic
Gradient Descent (SGD) algorithms are slow in terms of convergence, which will restrict
their applicability to latency constrained mobile systems.

Over recent years, a set of new optimization algorithms have been proposed to speed up and
stabilize the optimization process. Sutskever et al. introduce a variant of the SGD
optimizer with Nesterov's momentum, which evaluates gradients after the current velocity is
applied [94]. Their method demonstrates a faster convergence rate when optimizing convex
functions. Adagrad adapts the learning rate of each model parameter according to its update
frequency. This is suitable for handling sparse data and significantly outperforms SGD in
terms of robustness [95]. RMSprop is a popular SGD based method proposed by Hinton. RMSprop
divides the learning rate by the root of an exponentially decaying average of squared
gradients and does not require one to set the learning rate for each training step [107].
Kingma and Ba propose an adaptive learning rate optimizer named Adam that incorporates
momentum through the first-order moment of the gradient [96]. This algorithm is fast in
terms of convergence, highly robust to model structures, and is considered the first choice
if one cannot decide which algorithm should be used. Andrychowicz et al. suggest that the
optimization process can even be learned dynamically [108]. They pose gradient descent as a
trainable learning problem, which demonstrates good generalization ability in neural
network training. Wen et al. propose a training algorithm tailored to distributed systems
[109]. They quantize float gradient values to {-1, 0, +1} during training, which
theoretically requires 20 times less gradient communication between nodes. The authors
prove that such a gradient approximation mechanism allows the objective function to
converge to optima with probability 1, while in their experiments only a 2% accuracy loss
is observed on average when training GoogLeNet [110].

E. Fog Computing

The Fog computing paradigm presents a new opportunity to implement deep learning in mobile
systems. Fog computing refers to a set of techniques that permit deploying applications or
data storage at the edge of networks [111], e.g., on individual mobile devices. This
reduces the communications overhead, offloads data traffic, reduces user-side latency, and
lightens the server-side computational burden [112].

There exist several efforts that attempt to shift deep learning computing from the cloud
side to mobile devices. For example, Gokhale et al. develop a mobile coprocessor named
neural network neXt (nn-X), which accelerates deep neural network execution on mobile
devices, while retaining low energy consumption [90]. Bang et al. introduce a low-power and
programmable deep learning processor to deploy mobile intelligence on edge devices [113].
Their hardware only consumes 288 µW but achieves 374 GOPS/W efficiency. A Neurosynaptic
Chip called TrueNorth is proposed by IBM [114]. Their solution seeks to support
computationally intensive applications on embedded battery-powered mobile devices. Qualcomm
introduces a Snapdragon neural processing engine to enable deep learning computational
optimization tailored to mobile devices.^9 Their hardware allows developers to execute
neural network models on Snapdragon 820 boards to serve a variety of applications. In close
collaboration with Google, Movidius develops an embedded neural network computing framework
that allows user-customized deep learning deployments at the edge of mobile networks. Their
products can achieve satisfactory runtime efficiency, while operating with ultra-low power
requirements. More recently, Huawei officially announced the Kirin 970 as a mobile AI
computing system on chip.^10 Their innovative framework incorporates dedicated Neural
Processing Units (NPUs), which dramatically accelerate neural network computing, enabling
classification of 2,000 images per second on mobile devices.

Beyond these hardware advances, there are also software platforms that seek to optimize
deep learning on mobile devices (e.g., [115]). We compare and summarize all these platforms
in Table IV.^11 In addition to the mobile versions of TensorFlow and Caffe, Tencent
released a lightweight, high-performance neural network inference framework tailored to
mobile platforms, which relies on CPU computing.^12 This toolbox performs better than all
known CPU-based open source frameworks in terms of inference speed. Apple has developed
"Core ML", a private ML framework to facilitate mobile deep learning implementation on iOS
11.^13 This lowers the entry barrier for developers wishing to deploy ML models on Apple
equipment. Yao et al. develop a deep learning framework called DeepSense dedicated to
mobile sensing related data processing, which provides a general machine learning toolbox
that accommodates a wide range of edge applications. It has moderate energy consumption and
low latency, thus being amenable to deployment on smartphones.

The techniques and toolboxes mentioned above make the deployment of deep learning practices
in mobile network applications feasible. In what follows, we briefly introduce several
representative deep learning architectures and discuss their applicability to mobile
networking problems.

^9 Qualcomm Helps Make Your Mobile Devices Smarter With New Snapdragon Machine Learning
Software Development Kit:
https://www.qualcomm.com/news/releases/2016/05/02/qualcomm-helps-make-your-mobile-devices-smarter-new-snapdragon-machine
^10 Huawei announces the Kirin 970 - new flagship SoC with AI capabilities,
http://www.androidauthority.com/huawei-announces-kirin-970-797788/
^11 Adapted from https://mp.weixin.qq.com/s/3gTp1kqkiGwdq5olrpOvKw
^12 ncnn is a high-performance neural network inference framework optimized for the mobile
platform, https://github.com/Tencent/ncnn
^13 Core ML: Integrate machine learning models into your app,
https://developer.apple.com/documentation/coreml
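
To make the adaptive optimizers of Sec. IV-D more concrete, the following Python sketch
implements the Adam update rule of Kingma and Ba [96] on a toy problem. The quadratic
objective, the function names adam_minimize and quad_grad, and the chosen step size are
illustrative assumptions; the moment estimates and bias corrections follow the commonly
published form of the algorithm.

```python
import math


def adam_minimize(grad_fn, theta, steps=200, lr=0.1,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a function given its gradient, using the Adam update rule."""
    theta = list(theta)
    m = [0.0] * len(theta)   # first-moment (momentum) estimate
    v = [0.0] * len(theta)   # second-moment (squared gradient) estimate
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        for i, gi in enumerate(g):
            m[i] = beta1 * m[i] + (1.0 - beta1) * gi
            v[i] = beta2 * v[i] + (1.0 - beta2) * gi * gi
            m_hat = m[i] / (1.0 - beta1 ** t)   # bias correction
            v_hat = v[i] / (1.0 - beta2 ** t)
            theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


def quad_grad(theta):
    """Gradient of the toy objective f(x, y) = (x - 3)^2 + 10 (y + 1)^2."""
    x, y = theta
    return [2.0 * (x - 3.0), 20.0 * (y + 1.0)]
```

For example, adam_minimize(quad_grad, [0.0, 0.0], steps=500) moves the parameters close to
the minimizer (3, -1), even though the two coordinates have very different curvature. The
per-parameter scaling by the second-moment estimate is what distinguishes this family from
plain SGD.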
TABLE IV: Comparison of mobile deep learning platforms.

Platform | Developer | Mobile hardware supported | Speed | Code size | Mobile compatibility | Open-sourced
--- | --- | --- | --- | --- | --- | ---
TensorFlow | Google | CPU | Slow | Medium | Medium | Yes
Caffe | Facebook | CPU | Slow | Large | Medium | Yes
ncnn | Tencent | CPU | Medium | Small | Good | Yes
CoreML | Apple | CPU/GPU | Fast | Small | Only iOS 11+ supported | No
DeepSense | Yao et al. | CPU | Medium | Unknown | Medium | No
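
The gradient quantization to {-1, 0, +1} attributed to Wen et al. [109] in Sec. IV-D can
be sketched as follows. This is an illustrative reading rather than the authors' code: we
assume that each entry keeps its sign with probability proportional to its magnitude and
that one shared scale s (the largest magnitude) is sent alongside the ternary values; the
names ternarize and dequantize are hypothetical.

```python
import random


def ternarize(grad, rng):
    """Stochastically map each gradient entry to s * {-1, 0, +1}.

    Assumption: s is the largest magnitude in grad, and an entry g keeps
    its sign with probability |g| / s, so the expected value equals g.
    """
    s = max(abs(g) for g in grad)
    if s == 0.0:
        return 0.0, [0] * len(grad)
    signs = []
    for g in grad:
        keep = rng.random() < abs(g) / s      # Bernoulli(|g| / s)
        signs.append((1 if g > 0 else -1) if keep else 0)
    return s, signs


def dequantize(s, signs):
    """Rebuild an (unbiased, noisy) gradient estimate on the receiver side."""
    return [s * t for t in signs]
```

Under these assumptions a node transmits one float plus a ternary value per entry instead
of one 32-bit float per entry. Averaging many dequantized samples recovers the original
gradient, which is the intuition behind using such estimates inside SGD.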
V. DEEP LEARNING: STATE-OF-THE-ART

Revisiting Fig. 2, machine learning methods can be naturally categorized into three
classes, namely supervised learning, unsupervised learning, and reinforcement learning.
Deep learning architectures have achieved remarkable performance in all these areas. In
this section, we introduce the key principles underpinning several deep learning models and
discuss their largely unexplored potential to solve mobile networking problems. We
illustrate and summarize the most salient architectures that we present in Fig. 4 and
Table V, respectively.

A. Multilayer Perceptron

The Multilayer Perceptron (MLP) is the initial Artificial Neural Network (ANN) design,
which consists of at least three layers of operations [130]. Units in each layer are
densely connected, hence a substantial number of weights must be configured. We show an MLP
with two hidden layers in Fig. 4(a). The MLP can be employed for supervised, unsupervised,
and even reinforcement learning purposes. Although this structure was the most popular
neural network in the past, its popularity is decreasing as it entails high complexity
(fully-connected structure), modest performance, and low convergence efficiency. MLPs are
mostly used as a baseline or integrated into more complex architectures (e.g., the final
layer in CNNs used for classification). Building an MLP is straightforward, and it can be
employed, e.g., to assist with feature extraction in models built for specific objectives
in mobile network applications. The advanced Adaptive learning of neural Network (AdaNet)
enables MLPs to dynamically train their structures to adapt to the input [116]. This new
architecture can be potentially explored for analyzing continuously changing mobile
environments.

B. Boltzmann Machine

Restricted Boltzmann Machines (RBMs) [75] were originally designed for unsupervised
learning purposes. They are essentially a type of energy-based undirected graphical model,
which includes a visible layer and a hidden layer, and where each unit can only take binary
values (i.e., 0 and 1). RBMs can be effectively trained using the contrastive divergence
algorithm [131] through multiple steps of Gibbs sampling [132]. We illustrate the structure
and the training process of an RBM in Fig. 4(b). RBM-based models are usually employed to
initialize the weights of a neural network in more recent applications. The pre-trained
model can be subsequently fine-tuned for supervised learning purposes using a standard BP
algorithm. A stack of RBMs is called a Deep Belief Network (DBN) [117], which performs
layer-wise training and achieves superior performance as compared to MLPs in many
applications, including time series forecasting [133], ratio matching [134], and speech
recognition [135]. Such structures can even be extended to a convolutional architecture, to
learn hierarchical spatial representations [118].

C. Auto-Encoders

Auto-Encoders (AEs) are also designed for unsupervised learning and attempt to copy inputs
to outputs. The underlying principle of an AE is shown in Fig. 4(c). AEs are frequently
used to learn compact representations of data for dimension reduction [136]. Extended
versions can be further employed to initialize the weights of a deep architecture, e.g.,
the Denoising Auto-Encoder (DAE) [119], and generate virtual examples from a target data
distribution, e.g., Variational Auto-Encoders (VAEs) [120]. AEs can be further employed to
address network security problems, as several research papers confirm their effectiveness
in detecting anomalies under different circumstances [137]–[139], which we will further
discuss in subsection VI-E.

The structures of RBMs and AEs are based upon MLPs, CNNs or RNNs. Their goals are similar,
while their learning processes are different. Both can be exploited to extract patterns
from unlabeled mobile data, which may be subsequently employed for various supervised
learning tasks, e.g., routing [140], mobile activity recognition [141], periocular
verification [142] and base station user number prediction [143].

D. Convolutional Neural Networks

Instead of employing full connections between layers, Convolutional Neural Networks (CNNs
or ConvNets) employ a set of locally connected kernels (filters) to capture correlations
between different data regions. This approach improves traditional MLPs by leveraging three
important ideas, namely, sparse interactions, parameter sharing, and equivariant
representations [23]. This reduces the number of model parameters significantly and
maintains the affine invariance (i.e., recognition results are robust to the affine
transformation of objects). We illustrate the operation of one 2D convolutional layer in
Fig. 4(d).

Owing to the properties mentioned above, CNNs achieve remarkable performance in imaging
applications. Krizhevsky et al. [71] exploit a CNN to classify images on the ImageNet
dataset [144]. Their method reduces the top-5 error by 39.7% and revolutionizes the image
classification field. GoogLeNet [110] and ResNet [121] significantly increase the depth of
CNN structures, and propose inception and residual learning techniques to address problems
such as over-fitting
[Figure 4, panels (a)-(g):]
(a) Structure of an MLP with 2 hidden layers (blue circles). [Labels: Input layer
(Feature 1 to Feature 6), Hidden layers, Output layer.]
(b) Graphical model and training process of an RBM. v and h denote visible and hidden
variables, respectively. [Labels: Hidden variables, Visible variables; steps: 1. Sample,
2. Sample, 3. Update weights.]
(c) Operating principle of an auto-encoder, which seeks to reconstruct the input from the
hidden layer. [Labels: Input layer, Hidden layer, Output layer, Minimise.]
(d) Operating principle of a convolutional layer.
(e) Recurrent layer – x1:t is the input sequence, indexed by time t, st denotes the state
vector and ht the hidden outputs. In a simple RNN layer, the input may directly connect to
its output without hidden states. [Labels: Inputs x1, x2, x3, ..., xt; States S1, S2, S3,
..., St; Outputs h1, h2, h3, ..., ht.]
(f) Underlying principle of a generative adversarial network (GAN). [Labels: Generator
Network (Inputs, Outputs), Discriminator Network (Outputs from the generator, Real data),
Output: Real/Fake.]
(g) Typical deep reinforcement learning architecture. The agent is a neural network model
that approximates the required function. [Labels: Agent, Environment, States, Rewards,
Actions, Observed state, Outputs: Policy/Q values.]
Fig. 4: Typical structure and operation principles of MLP, RBM, AE, CNN, RNN, GAN, and DRL.
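
The RBM training process of Fig. 4(b) (sample, sample again, update weights) corresponds to
one step of contrastive divergence. The Python sketch below shows a single-step variant
(CD-1) for a small binary RBM. It is a minimal illustration under our own assumptions: the
first sampling step draws hidden units from the visible data, the second draws a visible
reconstruction from those hidden units, and the names sample_hidden, sample_visible and
cd1_update are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sample_hidden(v, W, c, rng):
    """Return P(h_j = 1 | v) for each hidden unit and a binary sample."""
    probs = [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
             for j in range(len(c))]
    return probs, [1 if rng.random() < p else 0 for p in probs]


def sample_visible(h, W, b, rng):
    """Return P(v_i = 1 | h) for each visible unit and a binary sample."""
    probs = [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
             for i in range(len(b))]
    return probs, [1 if rng.random() < p else 0 for p in probs]


def cd1_update(v0, W, b, c, rng, lr=0.1):
    """One CD-1 step: updates W, b and c in place from a data vector v0."""
    ph0, h0 = sample_hidden(v0, W, c, rng)     # 1. sample h given the data
    _, v1 = sample_visible(h0, W, b, rng)      # 2. sample a reconstruction
    ph1, _ = sample_hidden(v1, W, c, rng)
    for i in range(len(b)):                    # 3. update weights and biases
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b[i] += lr * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])
```

Repeating cd1_update over a dataset pushes the model towards assigning high probability to
the observed patterns; using more Gibbs steps per update gives CD-k.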
TABLE V: Summary of different deep learning architectures. GAN and DRL are shaded, since they are built upon other models.

Model | Learning scenarios | Example architectures | Suitable problems | Pros | Cons | Potential applications in mobile networks
--- | --- | --- | --- | --- | --- | ---
MLP | Supervised, unsupervised, reinforcement | ANN, AdaNet [116] | Modeling data with simple correlations | Naive structure and straightforward to build | High complexity, modest performance and slow convergence | Modeling multi-attribute mobile data; auxiliary or component of other deep architectures
RBM | Unsupervised | DBN [117], Convolutional DBN [118] | Extracting robust representation | Can generate virtual samples | Difficult to train well | Learning representations from unlabeled mobile data; model weight initialization; network flow prediction
AE | Unsupervised | DAE [119], VAE [120] | Learning sparse and compact representations | Powerful and effective unsupervised learning | Expensive to pretrain with big data | Model weight initialization; mobile data dimension reduction; mobile anomaly detection
CNN | Supervised, unsupervised, reinforcement | AlexNet [71], ResNet [121], 3D-ConvNet [122], GoogLeNet [110], DenseNet [123] | Spatial data modeling | Weight sharing; affine invariance | High computational cost; challenging to find optimal hyper-parameters; requires deep structures for complex tasks | Spatial mobile data analysis
RNN | Supervised, unsupervised, reinforcement | LSTM [124], Attention based RNN [125], ConvLSTM [126] | Sequential data modeling | Expertise in capturing temporal dependencies | High model complexity; gradient vanishing and exploding problems | Individual traffic flow analysis; network-wide (spatio-)temporal data modeling
GAN | Unsupervised | WGAN [67], LS-GAN [127] | Data generation | Can produce lifelike artifacts from a target distribution | Training process is unstable (convergence difficult) | Virtual mobile data generation; assisting supervised learning tasks in network data analysis
DRL | Reinforcement | DQN [128], deep Policy Gradient [129], A3C [66] | Control problems with high-dimensional inputs | Ideal for high-dimensional environment modeling | Slow in terms of convergence | Mobile network control and management
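
The "weight sharing" advantage listed for CNNs in Table V can be made concrete with a small
Python sketch. It implements a plain 2D "valid" convolution (strictly a cross-correlation,
i.e., without kernel flipping, which we assume here for simplicity) and compares its
parameter count with that of a fully connected layer producing an output of the same size.
All function names are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def conv_param_count(kh, kw):
    """Weights of one convolutional filter: shared across all positions."""
    return kh * kw


def dense_param_count(in_h, in_w, out_h, out_w):
    """Weights of a fully connected layer between the same input and output."""
    return in_h * in_w * out_h * out_w
```

For a 28×28 input and a 3×3 kernel the output is 26×26; the fully connected alternative
would need dense_param_count(28, 28, 26, 26) = 529,984 weights, against 9 for the shared
kernel. Because the same kernel is applied everywhere, shifting the input pattern shifts
the output accordingly, which is the equivariance property mentioned in Sec. V-D.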
and gradient vanishing, introduced by "depth". Their structure is further improved by the
Dense Convolutional Network (DenseNet) [123], which reuses feature maps from each layer,
thereby achieving significant accuracy improvements over other CNN based models, while
requiring fewer layers. CNNs have also been extended to video applications. Ji et al.
propose 3D convolutional neural networks for video activity recognition [122],
demonstrating superior accuracy as compared to 2D CNNs. More recent research focuses on
learning the shape of convolutional kernels [145], [146]. These dynamic architectures can
automatically focus on important regions in input maps. Such properties are particularly
important in analyzing large-scale mobile environments exhibiting clustering behaviors
(e.g., a surge of mobile traffic associated with a popular event).

Given the high similarity between image and spatial mobile data (e.g., mobile traffic
snapshots, users' mobility, etc.), CNN-based models have huge potential for network-wide
mobile data analysis. This is a promising future direction that we further discuss in
Sec. VIII.

E. Recurrent Neural Network

Recurrent Neural Networks (RNNs) are designed for modeling sequential data. At each time
step, they produce output via recurrent connections between hidden units [23], as shown in
Fig. 4(e). Gradient vanishing and exploding problems are frequently reported in traditional
RNNs, which make them particularly hard to train [147]. The Long Short-Term Memory (LSTM)
mitigates these issues by introducing a set of "gates" [124], which has been proven
successful in many applications (e.g., speech recognition [148], text categorization [149],
and wearable activity recognition [150]). Sutskever et al. introduce attention mechanisms
to RNN models, which achieve outstanding accuracy in tokenized predictions [125]. Shi et
al. substitute the dense matrix multiplication in LSTMs with convolution operations,
designing a Convolutional Long Short-Term Memory (ConvLSTM) [126]. Their proposal reduces
the complexity of the traditional LSTM and demonstrates significantly lower prediction
errors in precipitation nowcasting.

Mobile networks produce massive sequential data from various sources, such as data traffic
flows, and the evolution of mobile network subscribers' trajectories and application
latencies. Exploring the RNN family is promising to enhance the analysis of time series
data in mobile networks.
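
The recurrent update and the vanishing gradient problem discussed above can be illustrated
with a scalar RNN in Python. This is a simplified sketch under our own assumptions (a tanh
state update with scalar weights w_x, w_s and a linear read-out w_h; these names do not
come from the survey), using the notation of Fig. 4(e) for inputs x, states s and outputs h.

```python
import math


def rnn_forward(xs, w_x, w_s, w_h, s0=0.0):
    """Unroll a scalar RNN over the input sequence xs."""
    states, outputs = [], []
    s = s0
    for x in xs:
        s = math.tanh(w_x * x + w_s * s)   # state depends on the previous state
        states.append(s)
        outputs.append(w_h * s)            # output h_t read from the state
    return states, outputs


def state_gradient(states, w_s):
    """d s_T / d s_0 by the chain rule: product of w_s * (1 - s_t^2)."""
    g = 1.0
    for s in states:
        g *= w_s * (1.0 - s * s)
    return g
```

Each factor in state_gradient has magnitude at most |w_s|, so for |w_s| < 1 the influence
of the initial state shrinks at least geometrically with the sequence length (and can grow
without bound in multi-dimensional settings with large recurrent weights). Gated cells such
as the LSTM are designed to mitigate exactly this effect.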
F. Generative Adversarial Network

The Generative Adversarial Network (GAN) is a framework that trains generative models using
an adversarial process. It simultaneously trains two models: a generative one G that seeks
to approximate the target data distribution from training data, and a discriminative model
D that estimates the probability that a sample comes from the real training data rather
than the output of G [76]. Both G and D are normally neural networks. The training
procedure for G aims to maximize the probability of D making a mistake. Each of them is
trained iteratively while fixing the other one, so that finally G can produce data close to
a target distribution (the same as that of the training examples), if the model converges.
We show the overall structure of a GAN in Fig. 4(f).

The training process of traditional GANs is highly sensitive to model structures, learning
rates, and other hyper-parameters. Researchers are usually required to employ numerous ad
hoc 'tricks' to achieve convergence. There exist several solutions for mitigating this
problem, e.g., the Wasserstein Generative Adversarial Network (WGAN) [67] and the
Loss-Sensitive Generative Adversarial Network (LS-GAN) [127], but research on the theory of
GANs remains shallow. Recent work confirms that GANs can promote the performance of some
supervised tasks (e.g., super-resolution [151], object detection [152], and face completion
[153]) by minimizing the divergence between inferred and real data distributions.
Exploiting the unsupervised learning abilities of GANs is promising in terms of generating
synthetic mobile data for simulations, or assisting specific supervised tasks in mobile
network applications. This becomes more important in tasks where appropriate datasets are
lacking, given that operators are generally reluctant to share their network data.

G. Deep Reinforcement Learning

Deep Reinforcement Learning (DRL) refers to a set of methods that approximate value
functions (deep Q learning) or policy functions (policy gradient method) through deep
neural networks. An agent (neural network) continuously interacts with an environment and
receives reward signals as feedback. The agent selects an action at each time step, which
will change the state of the environment. The training goal of the neural network is to
optimize its parameters, such that it can select actions that potentially lead to the best
future return. We illustrate this principle in Fig. 4(g). DRL is well-

applicability of traditional reinforcement learning algorithms.

DRL techniques broaden the ability of traditional reinforcement learning algorithms to
handle high dimensionality, in scenarios previously considered intractable. Employing DRL
is thus promising to address network management and control problems under complex,
changeable, and heterogeneous mobile environments. We further discuss this potential in
Sec. VIII.

VI. DEEP LEARNING DRIVEN MOBILE AND WIRELESS NETWORKS

Deep learning has a wide range of applications in mobile networking fields. We structure
and group deep learning applications in different networking domains and characterize their
contributions. In what follows, we present the important publications across all areas and
compare their designs and principles.

A. Deep Learning Driven Mobile Data Analysis

The development of mobile technology (e.g., smartphones, augmented reality, etc.) is
forcing mobile operators to evolve mobile network infrastructures. As a consequence, both
the cloud side and the edge side of mobile networks are becoming increasingly sophisticated
to cater for users' requirements, which foster and consume huge amounts of mobile data
every day. These data can be either generated by sensors of mobile devices that record
individual users' behaviors, or from mobile network infrastructure which reflects the
dynamics in urban environments. Appropriately mining these data can benefit
multidisciplinary research fields, ranging from mobile network management and social
analysis, to public transportation, personal services provision, and so on [32].

However, operators may be overwhelmed by managing and analyzing massive amounts of
heterogeneous mobile data. Deep learning is probably the most powerful methodology for the
analysis of mobile big data. We begin this subsection by introducing characteristics of
mobile big data, then present a holistic review on deep learning-driven mobile data
analysis.

Characteristics of Mobile Data. Yazti and Krishnaswamy propose to categorize mobile data
into two groups, namely network-level data and app-level data [161]. The key difference
between them is that the former is usually collected by the
suited to problems that have a huge number of possible states edge mobile devices, while the latter one can be obtained
(i.e., environments are high-dimensional). Representative DRL through network infrastructure. We summarize these two types
methods include Deep Q-Networks (DQNs) [128], deep policy of data and their information comprised in Table VIII. Before
gradient methods [129], and Asynchronous Advantage Actor- discussing the mobile data analytics, we illustrate the data
Critic [66]. These perform remarkably in AI gaming, robotics, collection process in Figure 5.
and autonomous driving [154]–[157], and have made inspiring Mobile operators receive massive amount of network-level
deep learning breakthroughs over recent years. mobile data generated by the networking infrastructure. These
On the other hand, many mobile networking problems can data not only deliver a global view of mobile network
be formulated as Markov Decision Processes (MDPs), where performance (e.g. throughput, end-to-end delay, jitter, etc.),
reinforcement learning can play an important role (e.g., base but also log individual session time, communication types,
station on-off switching strategies [158], routing [159], and sender and receiver through Call Detail Records (CDRs). The
adaptive tracking control [160]). Some of these problems network-level data usually exhibits significant spatio-temporal
nevertheless involve high-dimensional inputs, which limits the variations resulting from users’ behaviors [162], which can
be utilized for network diagnosis and management, users’ mobility analysis and public transportation planning [163].

On the other hand, app-level data is directly recorded by sensors or mobile applications installed in various mobile devices. These data are frequently collected through crowd-sourcing schemes from heterogeneous sources, such as Global Positioning Systems (GPS), mobile cameras and video recorders, and portable medical monitors. Mobile devices act as sensor hubs, which are responsible for data gathering and pre-processing, and subsequently distribute the data to specific venues as required [32]. App-level data usually directly or indirectly reflects users’ behaviors, such as mobility, preferences and social links [55]. Analyzing app-level data from individuals can help reconstruct one’s personality and preferences, which can be used for recommender systems and user targeting. Access to these data usually involves privacy and security issues, which have to be addressed appropriately.

Mobile big data include several unique characteristics, such as spatio-temporal diversity, heterogeneity and personality [32]. Some network-level data (e.g. mobile traffic) can be viewed as pictures taken by panoramic cameras, which provide a city-scale sensing system for urban sensing. These images comprise information associated with the movements of large numbers of individuals [162], hence exhibiting significant spatio-temporal diversity. Additionally, as modern smart phones can accommodate multiple sensors and applications, one device can generate heterogeneous data simultaneously. Some of these data involve explicit identity information of individuals. Inappropriate sharing and use can raise significant privacy issues. Therefore, extracting useful patterns from multi-source mobile data without compromising users’ privacy remains a challenging endeavor.

Compared to traditional data analysis techniques, deep learning embraces several unique features to address the aforementioned challenges [17]. Namely:
1) Deep learning achieves remarkable performance in various data analysis tasks, on both structured and unstructured data. Some types of mobile data can be represented as image-like (e.g. [163]) or sequential data [164].
2) Deep learning performs remarkably well in feature extraction from raw data. This saves tremendous effort of hand-crafted feature engineering on mobile data, which enables employees to spend more time on model design, and less time sorting through the data itself.
3) Deep learning offers excellent tools (e.g. RBM, AE, GAN) for handling unlabeled data, which is common in mobile network logs.
4) Multi-modal deep learning allows features to be learned over multiple modalities [165], which makes it powerful in modeling data collected from heterogeneous sensors and data sources.
These advantages make deep learning a powerful tool for mobile data analysis.

TABLE VI: The taxonomy of mobile big data.

Mobile data | Source | Information
Network-level data | Infrastructure | Infrastructure locations, capability, equipment holders, etc.
Network-level data | Performance indicators | Data traffic, end-to-end delay, QoE, jitter, etc.
Network-level data | Call detail records (CDR) | Session start and end time, type, sender and receiver, session results, etc.
Network-level data | Radio information | Signal power, frequency, spectrum, modulation, etc.
App-level data | Device | Device type, usage, MAC address, etc.
App-level data | Profile | User setting, personal information, etc.
App-level data | Sensors | Mobility, temperature, magnetic field, movement, etc.
App-level data | Application | Picture, video, voice, health condition, preference, etc.
App-level data | System log | Log record, software or hardware failure, etc.

Fig. 5: Illustration of the mobile data collection process in cellular, WiFi and wireless sensor networks. BSC: Base Station Controller; RNC: Radio Network Controller.
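The remark above that some mobile data can be represented as image-like or sequential data can be illustrated with a small sketch. It assumes, purely for illustration, that traffic records arrive as `(time_step, grid_row, grid_col, volume)` tuples on a regular city grid; the record layout and the helper names `traffic_snapshots` and `cell_series` are hypothetical and not taken from any of the surveyed works.

```python
def traffic_snapshots(records, grid_size, n_steps):
    """Bin (time_step, row, col, volume) records into one 2-D grid per step.

    Returns a list of n_steps grids. Each grid is a list of rows, and each
    cell holds the total traffic volume observed there in that time step.
    Records falling outside the grid or the time window are ignored.
    """
    rows, cols = grid_size
    frames = [[[0.0] * cols for _ in range(rows)] for _ in range(n_steps)]
    for t, r, c, volume in records:
        if 0 <= t < n_steps and 0 <= r < rows and 0 <= c < cols:
            frames[t][r][c] += volume
    return frames


def cell_series(frames, row, col):
    """Read one grid cell across time: the sequential view of the same data."""
    return [frame[row][col] for frame in frames]


# Hypothetical toy records: (time_step, grid_row, grid_col, traffic_volume).
records = [(0, 0, 0, 5.0), (0, 0, 0, 2.5), (0, 1, 1, 1.0), (1, 0, 0, 4.0)]
frames = traffic_snapshots(records, grid_size=(2, 2), n_steps=2)
series = cell_series(frames, 0, 0)
```

Each element of `frames` plays the role of one "picture" that a convolutional model could consume, while `series` is the kind of per-location sequence a recurrent model would take as input.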
TABLE VII: A summary of work on network-level mobile data analysis.

Domain | Reference | Applications | Model
Network prediction | Pierucci and Micheli [166] | QoE prediction | MLP
Network prediction | Gwon and Kung [167] | Inferring Wi-Fi flow patterns | Sparse coding + Max pooling
Network prediction | Nie et al. [168] | Wireless mesh network traffic prediction | DBN + Gaussian models
Network prediction | Moyo and Sibanda [169] | TCP/IP traffic prediction | MLP
Network prediction | Wang et al. [170] | Mobile traffic forecasting | AE + LSTM
Network prediction | Zhang and Patras [171] | Long-term mobile traffic forecasting | ConvLSTM + 3D-CNN
Network prediction | Zhang et al. [163] | Mobile traffic super-resolution | CNN + GAN
Traffic classification | Wang [172] | Traffic classification | MLP, stacked AE
Traffic classification | Wang et al. [173] | Encrypted traffic classification | CNN
Traffic classification | Lotfollahi et al. [174] | Encrypted traffic classification | CNN
Traffic classification | Wang et al. [175] | Malware traffic classification | CNN
CDR mining | Liang et al. [164] | Metro density prediction | RNN
CDR mining | Felbo et al. [176] | Demographics prediction | CNN
CDR mining | Chen et al. [177] | Tourists’ next visit location prediction | MLP, RNN
CDR mining | Lin et al. [178] | Human activity chains generation | Input-Output HMM + LSTM
Others | Xu et al. [179] | Wi-Fi hotspot classification | CNN
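Among the entries of Table VII, the long-term forecasting work [171] is described in this survey as blending predictions with the historical mean to extend the number of reliable prediction steps. The sketch below only illustrates that general idea and is not the authors' scheme: the geometric weighting, the `decay` parameter and the function name `blend_with_historical_mean` are assumptions made for this example.

```python
def blend_with_historical_mean(model_forecast, historical_mean, decay=0.8):
    """Blend a multi-step forecast with a historical mean, step by step.

    The weight on the model forecast is 1.0 at the first step and is
    multiplied by `decay` at every later step, so distant steps lean
    increasingly on the historical mean. (Hypothetical weighting rule.)
    """
    blended = []
    weight = 1.0
    for predicted, mean in zip(model_forecast, historical_mean):
        blended.append(weight * predicted + (1.0 - weight) * mean)
        weight *= decay
    return blended


# Toy example: the model predicts 10.0 at every step, the historical mean is 20.0.
blended = blend_with_historical_mean([10.0, 10.0, 10.0], [20.0, 20.0, 20.0], decay=0.5)
```

With `decay=0.5`, the first step trusts the model fully, the second mixes both sources equally, and the third is dominated by the historical mean.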
1) Network-Level Mobile Data Analysis: Network-level mobile data refers to logs recorded by Internet service providers, including infrastructure metadata, network performance indicators and call detail records (CDRs) (see Table VI). The recent remarkable success of deep learning has ignited global interest in exploiting this technique for mobile network-level data analysis, so as to optimize mobile network configurations, thereby improving end-users’ QoE. These works can be generally categorized into four subjects according to applications, namely, network prediction, network traffic classification, CDR mining and radio analysis. In what follows, we review the successes deep learning achieves in these subjects. Before going into the details of the literature, we compare these works in Table VII for better illustration.

Network Prediction: Network prediction refers to inferring mobile network traffic or performance indicators given historical measurements or related data. Pierucci and Micheli investigate the relationship between key objective metrics and QoE [166]. They employ MLPs to predict users’ QoE in mobile communications given the average user throughput, the number of active users in the cells, the average data volume per user, and the channel quality indicator, demonstrating high prediction accuracy. Network traffic forecasting is another field where deep learning is gaining importance. By leveraging sparse coding and max-pooling, Gwon and Kung develop a semi-supervised deep learning model to classify received frame/packet patterns and infer the original properties of flows in a WiFi network [167]. Their proposal demonstrates superior performance over traditional ML techniques. Nie et al. investigate the traffic demand patterns in wireless mesh networks [168]. They design a DBN along with Gaussian models to precisely estimate traffic distributions.

In [170], Wang et al. propose to use an AE-based architecture and LSTMs to model spatial and temporal correlations of mobile traffic distribution respectively. In particular, their AE consists of a global stacked AE and multiple local stacked AEs for spatial feature extraction, dimension reduction and training parallelism. The compressed representations extracted are subsequently processed by LSTMs to perform the final forecasting. Experiments on a real-world dataset demonstrate their superior performance over SVM and the Autoregressive Integrated Moving Average model. An important work presented in [171] extends mobile traffic forecasting to the long term. The authors combine ConvLSTM and 3D CNN to construct spatio-temporal neural networks to capture the complex features of mobile traffic in the city of Milan. They further introduce a fine-tuning scheme and a lightweight approach that blends predictions with the historical mean, which significantly extends the number of reliable prediction steps. More recently, Zhang et al. propose an original Mobile Traffic Super-Resolution (MTSR) technique to infer network-wide fine-grained mobile traffic consumption given coarse-grained counterparts obtained by probing, thereby reducing traffic measurement overheads [163]. Inspired by image super-resolution techniques, they design a deep zipper network along with a Generative Adversarial Network (GAN) to perform precise MTSR and improve the fidelity of inferred traffic snapshots. Their experiments over a real-world dataset show that their architecture can improve the granularity of mobile traffic snapshots by up to 100×, while significantly outperforming other interpolation techniques.

Traffic Classification: The task of traffic classification is to identify specific applications or protocols among the traffic in networks. Wang recognizes the powerful feature learning ability of deep neural networks [172]. They use a deep AE to identify protocols on a TCP flow dataset and achieve excellent precision and recall rates. Work in [173] proposes to use a 1D-CNN for encrypted traffic classification. The authors suggest that a 1D-CNN works well for modeling sequential data and has lower complexity, thus being promising in addressing the traffic classification problem. Similarly, Lotfollahi et al. present Deep Packet, based on CNN, for encrypted traffic classification [174]. Their framework reduces the amount of hand-crafted feature engineering and achieves great accuracy. CNN has also been used to identify malware traffic, where work in [175] regards traffic data as images and the unusual patterns that malware traffic exhibits are classified by representation learning. Similar work on mobile malware detection will be further discussed in subsection VI-E.

CDR Mining: Telecommunications equipment is generating massive CDR data every day. These data describe specific instances of telecommunication transactions such as phone number, cell
ID, session start/end time, traffic consumption, etc. Using deep learning to mine useful information from CDR data can serve a variety of functions. For example, Liang et al. propose Mercury to estimate metro density from streaming CDR data using RNN [164]. They treat the trajectory of a mobile phone user as a sequence of locations, and RNN-based models work well in handling such sequential data. Likewise, Felbo et al. use CDR data to study demographics [176]. They employ CNN to predict the age and gender of mobile users, which demonstrates superior accuracy over other ML tools. More recently, Chen et al. compare different ML models to predict tourists’ next locations of visit by analyzing CDR data [177]. Their experiments suggest that an RNN-based predictor significantly outperforms traditional ML methods, including Naive Bayes, SVM, RF and MLP.

2) App-Level Mobile Data Analysis: Triggered by the increasing popularity of the Internet of Things (IoT), current mobile devices install an increasing number of applications and sensors that can collect massive amounts of app-level mobile data [180]. Employing artificial intelligence to extract useful information from these data can extend the capability of devices [64], [181], [182], thus greatly benefiting users themselves, mobile operators as well as device manufacturers. This has become an important and popular research direction in the mobile networking domain. Nonetheless, mobile devices usually operate in noisy, uncertain and unstable environments, where their users move fast and change their underlying location and activity contexts frequently. As a result, adopting traditional machine learning tools for app-level mobile data analysis usually does not work well. Advanced deep learning practices provide a powerful solution for app-level data mining, as they demonstrate better precision and higher robustness in IoT applications [183].

There exist two approaches for app-level mobile data analysis, namely (i) cloud-based computing and (ii) edge-based computing. We illustrate the difference between these scenarios in Fig. 6. Cloud-based computing treats mobile devices as data collectors that constantly send data to servers with limited local data preprocessing. Servers gather the data received for model training and inference, subsequently sending results back to each device (or locally storing the analyzed results without dissemination, depending on application requirements). The drawback of this scenario is that users need Internet access to send or receive messages from servers, which will generate extra data transmission overhead and may result in severe latency for edge applications. The edge-based computing scenario refers to offloading pre-trained models from the cloud to individual mobile devices such that they can make inferences locally. While this scenario requires fewer interactions with servers, its applicability is limited by the capability of hardware and batteries. Therefore, it can only support tasks that require light computations.

Many researchers employ deep learning for app-level mobile data analysis. We group these works according to their application domains, namely mobile healthcare, mobile pattern recognition and mobile Natural Language Processing (NLP) and Automatic Speech Recognition (ASR). Table VIII summarizes the works at a high level and we will discuss several representative works next.

Mobile Health: There is an increasing variety of wearable health monitoring devices being introduced to the market. Equipped with medical sensors, these devices can capture the body conditions of their carriers in a timely manner, providing real-time feedback (e.g. heart rate, blood pressure, breath status etc.), or triggering alarms to remind users to take medical action [235]. This allows timely and in-depth reports to be provided to patients and doctors.

Liu and Du design a deep learning-driven MobiEar to help deaf people be aware of emergencies [184]. Their proposal accepts acoustic signals as input, allowing users to register different acoustic events of interest. MobiEar operates efficiently on smart phones and only requires infrequent communication with servers for updates. Likewise, Liu et al. develop UbiEar, which operates on the Android platform to assist hard-of-hearing users in recognizing acoustic events without requiring location information [185]. Their design adopts a lightweight CNN architecture for inference acceleration while demonstrating accuracy comparable to that of traditional CNN models.

Hosseini et al. design an edge computing system for health monitoring and treatment [190]. They use CNNs to extract features from mobile sensor data, which plays an important role in their epileptogenicity localization application. Stamate et al. develop a mobile Android app called cloudUPDRS to manage Parkinson’s symptoms for patients [191]. In their work, MLPs are employed to determine the acceptance of data collected by smart phones, so as to maintain high-quality data samples. Their proposed method outperforms other ML methods such as GPs and RFs. Quisel et al. suggest that deep learning can be effectively exploited for mobile health data analysis [192]. They exploit CNNs and RNNs to classify lifestyle and environmental traits of volunteers. Their models demonstrate superior prediction accuracy over RFs and logistic regression on six datasets.

As deep learning performs remarkably well in medical data analysis [236], we expect more and more deep learning driven health care devices to emerge and help improve physical monitoring and illness diagnosis.

Mobile Pattern Recognition: Recent advanced mobile devices offer people a portable intelligent assistant, which fosters a diverse set of applications that can classify surrounding objects (e.g. [194]–[196], [199], [200], [207]) or users’ behaviors (e.g. [150], [203], [206], [212], [213], [237], [238]) based on mobile cameras or sensors. We review and compare recent works on mobile pattern recognition in this part.

Object classification in pictures taken by mobile devices is drawing increasing research interest. Li et al. develop DeepCham as a mobile object recognition framework [194]. Their architecture involves a crowd-sourcing labeling process to reduce hand-labeling effort and a collaborative training
instance generation pipeline tailored to deployment on mobile devices. Evaluations on the prototype system suggest that their framework is efficient and effective in terms of training and inference. Tobías et al. investigate the applicability of employing CNN schemes on mobile devices for object recognition tasks [195]. They conduct experiments on 3 deployment scenarios (i.e. GPU, CPU and mobile) on 2 benchmark datasets, which suggest that deep learning models can be efficiently embedded in mobile devices to perform real-time inference.

Fig. 6: Illustration of two deployment approaches for app-level mobile data analysis, namely cloud-based (left) and edge-based (right). The cloud-based approach makes inferences in the cloud and sends results to edge devices. On the contrary, the edge-based approach deploys models on edge devices, which can make local inferences.

14 Human profile source: https://lekeart.deviantart.com/art/male-body-profile-251793336

Mobile classifiers can also assist applications in Virtual Reality (VR). A CNN framework is proposed in [199] for facial expression recognition when users are wearing head-mounted displays in a VR environment. The authors suggest that this system can assist a real-time user interaction mode. Rao et al. incorporate a deep learning object detector into a mobile augmented reality (AR) system [201]. Their system achieves outstanding performance in detecting and enhancing geographic objects in outdoor environments. Another work that focuses on mobile AR applications is introduced in [239], where the authors characterize the tradeoffs between accuracy, latency, and energy efficiency of object detection.

Activity recognition is another interesting field which relies on data collected by mobile motion sensors [238], [240]. Heterogeneous sensors (e.g. motion, accelerometer, infrared, etc.) equipped on mobile devices can collect versatile types of data that capture different activity information. The data collected will be delivered to servers for model training and the model will be subsequently deployed for domain-specific tasks. Essential features of sensor data can be automatically extracted by neural networks, making deep learning powerful in performing complex activity recognition.

The first work based on deep learning appeared in 2014, where Zeng et al. employ a CNN to capture local dependency and preserve scale invariance of motion sensor data [203]. They evaluate their proposal on 3 offline datasets, where it yields higher accuracy than statistical methods and Principal Component Analysis (PCA). Almaslukh et al. employ a deep AE to perform human activity recognition by analyzing an offline smart phone dataset collected by accelerometer and gyroscope sensors [204]. Li et al. consider different scenarios for activity recognition [205]. In their implementation, Radio Frequency IDentification (RFID) data is directly sent to a CNN model for recognizing human activities. While their mechanism achieves high accuracy in different applications, experiments suggest that the RFID-based method does not work well for metal objects or liquid containers. Bhattacharya and Lane [206] exploit an RBM to predict human activities given 7 types of sensor data collected by a smart watch. Experiments on prototype devices show that the approach can efficiently fulfill the recognition objective under tolerable power requirements. Ordóñez and Roggen architect an advanced ConvLSTM to fuse data gathered from multiple sensors and perform activity recognition [150]. By leveraging the structures of CNN and LSTM, ConvLSTM can automatically compress spatio-temporal sensor data into low-dimensional representations without heavy data post-processing effort. Wang et al. exploit Google Soli to architect a mobile user-machine interaction platform [208]. By analyzing radio frequency signals captured by millimeter-wave radars, their architecture is able to recognize 11 types of gestures with high accuracy. Their models are trained on the server side, while making inferences locally on mobile devices.

Mobile NLP and ASR: Recent remarkable achievements obtained by deep learning in Natural Language Processing (NLP) and Automatic Speech Recognition (ASR) are shifting their applications to mobile devices.

Powered by deep learning, the intelligent assistant Siri
TABLE VIII: A summary of works on app-level mobile data analysis.

Subject | Reference | Application | Deployment | Model
Mobile Healthcare | Liu and Du [184] | Mobile ear | Edge-based | CNN
Mobile Healthcare | Liu et al. [185] | Mobile ear | Edge-based | CNN
Mobile Healthcare | Jindal [186] | Heart rate prediction | Cloud-based | DBN
Mobile Healthcare | Kim et al. [187] | Cytopathology classification | Cloud-based | CNN
Mobile Healthcare | Sathyanarayana et al. [188] | Sleep quality prediction | Cloud-based | MLP, CNN, LSTM
Mobile Healthcare | Li and Trocan [189] | Health conditions analysis | Cloud-based | Stacked AE
Mobile Healthcare | Hosseini et al. [190] | Epileptogenicity localisation | Cloud-based | CNN
Mobile Healthcare | Stamate et al. [191] | Parkinson’s symptoms management | Cloud-based | MLP
Mobile Healthcare | Quisel et al. [192] | Mobile health data analysis | Cloud-based | CNN, RNN
Mobile Healthcare | Khan et al. [193] | Respiration surveillance | Cloud-based | CNN
Mobile Pattern Recognition | Li et al. [194] | Mobile object recognition | Edge-based | CNN
Mobile Pattern Recognition | Tobías et al. [195] | Mobile object recognition | Edge-based & Cloud-based | CNN
Mobile Pattern Recognition | Pouladzadeh and Shirmohammadi [196] | Food recognition system | Cloud-based | CNN
Mobile Pattern Recognition | Tanno et al. [197] | Food recognition system | Edge-based | CNN
Mobile Pattern Recognition | Kuhad et al. [198] | Food recognition system | Cloud-based | MLP
Mobile Pattern Recognition | Teng and Yang [199] | Facial recognition | Cloud-based | CNN
Mobile Pattern Recognition | Wu et al. [200] | Mobile visual search | Edge-based | CNN
Mobile Pattern Recognition | Rao et al. [201] | Mobile augmented reality | Edge-based | CNN
Mobile Pattern Recognition | Ohara et al. [202] | WiFi-driven indoor change detection | Cloud-based | CNN, LSTM
Mobile Pattern Recognition | Zeng et al. [203] | Activity recognition | Cloud-based | CNN, RBM
Mobile Pattern Recognition | Almaslukh et al. [204] | Activity recognition | Cloud-based | AE
Mobile Pattern Recognition | Li et al. [205] | RFID-based activity recognition | Cloud-based | CNN
Mobile Pattern Recognition | Bhattacharya and Lane [206] | Smart watch-based activity recognition | Edge-based | RBM
Mobile Pattern Recognition | Antreas and Angelov [207] | Mobile surveillance system | Edge-based & Cloud-based | CNN
Mobile Pattern Recognition | Ordóñez and Roggen [150] | Activity recognition | Cloud-based | ConvLSTM
Mobile Pattern Recognition | Wang et al. [208] | Gesture recognition | Edge-based | CNN, RNN
Mobile Pattern Recognition | Gao et al. [209] | Eating detection | Cloud-based | DBM, MLP
Mobile Pattern Recognition | Zhu et al. [210] | User energy expenditure estimation | Cloud-based | CNN, MLP
Mobile Pattern Recognition | Sundsøy et al. [211] | Individual income classification | Cloud-based | MLP
Mobile Pattern Recognition | Chen and Xue [212] | Activity recognition | Cloud-based | CNN
Mobile Pattern Recognition | Ha and Choi [213] | Activity recognition | Cloud-based | CNN
Mobile Pattern Recognition | Edel and Köppe [214] | Activity recognition | Edge-based | Binarized-LSTM
Mobile Pattern Recognition | Okita and Inoue [215] | Multiple overlapping activities recognition | Cloud-based | CNN+LSTM
Mobile Pattern Recognition | Alsheikh et al. [17] | Activity recognition using Apache Spark | Cloud-based | MLP
Mobile Pattern Recognition | Mittal et al. [216] | Garbage detection | Edge-based & Cloud-based | CNN
Mobile Pattern Recognition | Seidenari et al. [217] | Artwork detection and retrieval | Edge-based | CNN
Mobile Pattern Recognition | Zeng et al. [218] | Mobile pill classification | Edge-based | CNN
Mobile Pattern Recognition | Lane and Georgiev [63] | Mobile activity recognition, emotion recognition and speaker identification | Edge-based | MLP
Mobile Pattern Recognition | Yao et al. [219] | Car tracking, heterogeneous human activity recognition and user identification | Edge-based | CNN, RNN
Mobile Pattern Recognition | Zeng [220] | Mobile object recognition | Edge-based | Unknown
Mobile Pattern Recognition | Katevas et al. [221] | Notification attendance prediction | Edge-based | RNN
Mobile Pattern Recognition | Radu et al. [141] | Activity recognition | Edge-based | RBM, CNN
Mobile Pattern Recognition | Wang et al. [222], [223] | Activity and gesture recognition | Cloud-based | Stacked AE
Mobile Pattern Recognition | Cao et al. [224] | Mood detection | Cloud-based | GRU
Mobile NLP and ASR | Wu et al. [225] | Google’s Neural Machine Translation | Cloud-based | LSTM
Mobile NLP and ASR | Siri (footnote 15) | Speech synthesis | Edge-based | Mixture density networks
Mobile NLP and ASR | McGraw et al. [226] | Personalised speech recognition | Edge-based | LSTM
Mobile NLP and ASR | Prabhavalkar et al. [227] | Embedded speech recognition | Edge-based | LSTM
Mobile NLP and ASR | Yoshioka et al. [228] | Mobile speech recognition | Cloud-based | CNN
Mobile NLP and ASR | Ruan et al. [229] | Shifting from typing to speech | Cloud-based | Unknown
Others | Georgiev et al. [82] | Multi-task mobile audio sensing | Edge-based | MLP
Others | Ignatov et al. [230] | Mobile images quality enhancement | Cloud-based | CNN
Others | Lu et al. [231] | Information retrieval from videos in wireless network | Cloud-based | CNN
Others | Lee et al. [232] | Reducing distraction for smartwatch users | Cloud-based | MLP
Others | Vu et al. [233] | Transportation mode detection | Cloud-based | RNN
Others | Fang et al. [234] | Transportation mode detection | Cloud-based | MLP
developed by Apple employs deep mixture density networks [241] to fix its robotic voice [242]. This enables it to synthesize a more human-like voice, which sounds more natural to humans. An Android app released by Google is developed to support mobile personalized speech recognition [226]. It quantizes parameters for LSTM model compression, allowing the app to run on low-power mobile phones. Likewise, Prabhavalkar et al. propose a mathematical RNN compression technique that reduces the size of an LSTM acoustic model by two thirds while compromising only negligible accuracy [227]. This allows building both memory- and energy-efficient ASR applications on mobile devices.

Yoshioka et al. present their deep learning proposal submitted to the CHiME-3 challenge [243]. Their framework incorporates a network-in-network architecture into a CNN model, which allows ASR to be performed for mobile multi-microphone devices used in noisy environments [228]. Mobile ASR can also accelerate text input on mobile devices, where Ruan et al.’s study shows that, with the help of ASR, input rates of English and Mandarin are 3.0 and 2.8 times faster than typing on keyboards [229]. They conduct experiments on a test-bed app based on the iOS system, and demonstrate impressive accuracy and acceleration by shifting from typing to speech on mobile devices.

Other applications: Beyond the aforementioned subjects, deep learning also plays an important role in other applications in app-level data analysis. For instance, Ignatov et al. show that deep learning can enhance the quality of pictures taken by mobile phones. By employing a CNN, they successfully improve the quality of images obtained by different mobile devices to the level of a digital single-lens reflex camera [230]. Lu et al. focus on video post-processing under wireless networks [231], where their framework exploits a customized AlexNet to answer queries about object detections. This framework further involves an optimizer, which instructs mobile devices to offload videos to reduce query response time.

Another interesting application is presented in [232], where Lee et al. show that deep learning can help smartwatch users reduce distraction by eliminating unnecessary notifications. Specifically, they use an 11-layer MLP to predict the importance of a notification. Fang et al. exploit an MLP to extract features from high-dimensional and heterogeneous sensor data [234], which achieves high transportation mode recognition accuracy.

B. Deep Learning Driven Mobility Analysis and User Localization

Understanding movement patterns of a group of human beings is becoming crucial for epidemiology, urban planning, public service provisioning and mobile network resource management [244]. Location-based services and applications

Mobility Analysis: Since deep learning is able to capture spatial dependency in sequential data, it is becoming a powerful tool for mobility analysis. The applicability of deep learning for trajectory prediction is studied in [248]. By sharing representations learned by RNN and Gate Recurrent Unit (GRU), the framework can perform multi-task learning on both social network and mobile trajectory modeling. Specifically, the authors first use deep learning to reconstruct social network representations of users, subsequently employing RNN and GRU models to learn patterns of mobile trajectories with different time granularity. Importantly, these two components jointly share the representations learned, which tightens the overall architecture and enables efficient implementation. Ouyang et al. argue that mobility data are normally high-dimensional, which may be problematic for traditional ML models. Instead, they build upon deep learning advances and propose an online learning scheme to train a hierarchical CNN architecture, allowing model parallelism for data stream processing [247]. By analyzing usage detail records, their framework “DeepSpace” predicts individuals’ trajectories with much higher accuracy on a real-world dataset as compared to naive CNNs.

Instead of focusing on individual trajectories, Song et al. shed light on mobility analysis on a larger scale [249]. In their work, LSTM networks are exploited to jointly model the city-wide movement patterns of a large group of people and vehicles. Their multi-task architecture demonstrates superior prediction accuracy over vanilla LSTM. City-wide mobile patterns are also researched in [250], where the authors architect deep spatio-temporal residual networks to forecast the movement of crowds. In order to capture the unique characteristics of spatio-temporal correlations associated with human mobility, their framework abandons RNN-based models and constructs three ResNets to extract nearby and distant spatial dependencies of human mobility in a city. This scheme learns temporal features and fuses all representations extracted by all models for the final prediction. By incorporating external event information, their proposal achieves the highest accuracy among all deep learning and non-deep learning methods.

Lin et al. consider generating human movement chains from cellular data to support transportation planning [178]. In particular, they first employ an input-output Hidden Markov Model (HMM) to label activity profiles for CDR data pre-processing. Subsequently, an LSTM is designed for activity chain generation, given the labeled activity sequences. They further synthesize urban mobility plans using the generative model, and the simulation yields a decent fit to the real data.

User Localization: Deep learning is also playing an important role in user localization. To overcome the variability and coarse-grained limitations of signal strength based methods, Wang et al. propose a deep learning driven fingerprinting system named “DeepFi” to perform indoor localization based on
(e.g. mobile AR, GPS) demand precise individual positioning Channel State Information (CSI) [253]. Their toolbox yields
technology [245]. As a result, research on user localization much higher accuracy than traditional methods including in-
is evolving rapidly, cultivating a set of emerging positioning cluding FIFS [264], Horus [265], and Maximum Likelihood
techniques [246]. In this subsection, we give an overview on [266]. The same group of authors extend their work in [222],
related work in Table IX and discuss them next. [223] and [254], [255], where they update the localization
system, such that it can work with calibrated phase information
TABLE IX: A summary of work on deep learning driven mobility analysis and indoor localization.

| Domain | Reference | Application | Model | Key contribution |
|---|---|---|---|---|
| Mobility analysis | Ouyang et al. [247] | Mobile user trajectory prediction | CNN | Online framework for data stream processing |
| Mobility analysis | Yang et al. [248] | Social networks & mobile trajectories modeling | RNN, GRU | Multi-task learning |
| Mobility analysis | Song et al. [249] | City-wide mobility prediction & transportation modelling | Multi-task LSTM | Multi-task learning |
| Mobility analysis | Zhang et al. [250] | Citywide crowd flows prediction | Deep spatio-temporal residual networks (CNN-based) | Exploitation of spatio-temporal characteristics of mobility external events |
| Mobility analysis | Lin et al. [178] | Human activity chains generation | Input-Output HMM + LSTM | Generative model |
| User localization | Subramanian and Sadiq [251] | Mobile movement prediction | MLP | Require less location update and paging signaling costs |
| User localization | Ezema and Ani [252] | Mobile location estimation | MLP | Operate with received signal strength in Global System for Mobile Communications |
| User localization | Wang et al. [253] | Indoor fingerprinting | RBM | First deep learning driven indoor localization based on CSI |
| User localization | Wang et al. [222], [223] | Indoor localization | RBM | Work with calibrated phase information of CSI |
| User localization | Wang et al. [254] | Indoor localization | CNN | Use more robust angle of arriving for estimation |
| User localization | Wang et al. [255] | Indoor localization | RBM | Bi-modal framework using both angle of arriving and average amplitudes of CSI |
| User localization | Nowicki and Wietrzykowski [256] | Indoor localization | Stacked AE | Require less effort for system tuning or filtering |
| User localization | Wang et al. [257], [258] | Indoor localization | Stacked AE | Device-free framework, multi-task learning |
| User localization | Mohammadi et al. [259] | Indoor localization | VAE+DQN | Handle unlabeled data; reinforcement learning aided semi-supervised learning |
| User localization | Kumar et al. [260] | Indoor vehicles localization | CNN | Focus on vehicles applications |
| User localization | Zheng and Weng [261] | Outdoor navigation | Developmental network | Online learning scheme; edge-based |
| User localization | Zhang et al. [262] | Indoor and outdoor localization | Stacked AE | Operate under both indoor and outdoor environments |
| User localization | Vieira et al. [263] | Massive MIMO fingerprint-based positioning | CNN | Operate with massive MIMO channels |
of CSI [222], [223]. They further use a more sophisticated CNN [254] and a bi-modal framework [255] to improve the accuracy.
Nowicki and Wietrzykowski [256] propose to exploit deep learning to reduce the effort required for indoor localization. Their framework obtains satisfactory prediction performance, while significantly reducing the effort for system tuning or filtering. Wang et al. suggest that the objective of indoor localization can be achieved without the help of mobile devices. In [258], the authors employ an AE to learn useful patterns from WiFi signals. Through automatic feature extraction, they produce a predictor that can fulfill multiple tasks simultaneously, including indoor localization, activity recognition, and gesture recognition. Kumar et al. use deep learning to address the problem of indoor vehicle localization [260]. They employ CNNs to analyze visual signals to localize vehicles in a car park. This can help driver assistance systems operate in underground environments where the system has limited available vision.
Most mobile devices can only produce unlabeled position data, therefore unsupervised and semi-supervised learning become essential. Mohammadi et al. address this problem by leveraging DRL and a VAE. In particular, their framework envisions a virtual agent in indoor environments [259]. This agent constantly receives state information during training, including signal strength indicators, the current location of the agent, and the real (labeled data) and inferred (via a VAE) distance to the target. The agent can virtually move in eight directions at each time step. Each time it takes an action, it receives a reward signal, identifying whether it moves in a correct direction. By employing deep Q learning, the agent can finally accurately localize the user given both labeled and unlabeled data. This work represents an important step towards IoT data analysis applications, as it only requires limited supervision.
Beyond indoor localization, there also exist several research works that apply deep learning to the outdoor scenario. For example, Zheng and Weng introduce a lightweight developmental network for outdoor navigation applications on mobile devices [261]. Compared to the original CNN, their architecture requires 100 times fewer weights to update while maintaining decent accuracy. This enables efficient outdoor navigation implementation on mobile devices. Work in [262] studies localization under both indoor and outdoor environments. The authors use an AE to pre-train a four-layer MLP, in order to avoid hand-crafted feature engineering. The MLP can subsequently be used to estimate the coarse position of targets. They further introduce an HMM to fine-tune the prediction based on temporal properties of the data. This improves the estimation accuracy of both indoor and outdoor positioning with Wi-Fi signals.
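The idea of refining coarse position estimates with an HMM, as described for [262], can be illustrated with a minimal sketch. This is not the implementation used in [262]: the three-cell layout, the transition matrix, the uniform prior, and the per-step probabilities are all hypothetical, and the classifier outputs are simply treated as emission scores. The sketch uses standard Viterbi decoding to show how a temporal model can suppress an isolated, physically implausible jump in a sequence of per-step predictions.

```python
import math

def viterbi_smooth(emissions, transition, prior):
    """Return the most likely cell sequence under a simple HMM.

    emissions:  list of T rows; row t holds the coarse estimator's
                probabilities over K cells (used here as emission scores).
    transition: K x K matrix, transition[i][j] = P(cell j next | cell i now).
    prior:      list of K initial cell probabilities.
    All inputs are illustrative assumptions, not values from the survey.
    """
    eps = 1e-12  # keeps log() finite for zero probabilities
    n_cells = len(prior)
    score = [math.log(prior[k] + eps) + math.log(emissions[0][k] + eps)
             for k in range(n_cells)]
    backptrs = []
    for obs in emissions[1:]:
        new_score, ptr = [], []
        for j in range(n_cells):
            # Best predecessor cell i for landing in cell j at this step.
            best_i = max(range(n_cells),
                         key=lambda i: score[i] + math.log(transition[i][j] + eps))
            ptr.append(best_i)
            new_score.append(score[best_i]
                             + math.log(transition[best_i][j] + eps)
                             + math.log(obs[j] + eps))
        score = new_score
        backptrs.append(ptr)
    # Backtrack from the best final cell.
    path = [max(range(n_cells), key=lambda k: score[k])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path

# Hypothetical corridor with three cells; a user cannot jump from cell 0 to cell 2.
transition = [[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]]
prior = [1 / 3, 1 / 3, 1 / 3]
# Hypothetical per-step outputs of a coarse estimator; step 2 is an outlier.
emissions = [[0.8, 0.1, 0.1],
             [0.7, 0.2, 0.1],
             [0.3, 0.1, 0.6],
             [0.7, 0.2, 0.1],
             [0.8, 0.1, 0.1]]

raw = [max(range(3), key=lambda k: row[k]) for row in emissions]
smoothed = viterbi_smooth(emissions, transition, prior)
print(raw)       # per-step argmax, includes the jump to cell 2
print(smoothed)  # decoded path, jump removed by the transition model
```

In this toy setting the per-step argmax places the user in cell 2 at the third step, whereas the decoded path keeps the user in cell 0 throughout, because the assumed transition model makes a direct jump between cells 0 and 2 essentially impossible.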
C. Deep Learning Driven Wireless Sensor Networks
A Wireless Sensor Network (WSN) consists of a set of unique or heterogeneous sensors that are distributed over geographical regions. These sensors collaboratively monitor physical or environmental status (e.g. temperature, pressure, motion and pollution) and transmit the data collected to centralized servers through wireless channels (see the purple circle in Fig. 5 for an illustration). A WSN typically involves three core tasks, namely sensing, communication and analysis. Deep learning is becoming increasingly popular for WSN data analysis. In what follows, we review works on deep learning driven WSNs. Note that this is distinct from the mobile data analysis we discuss in subsection VI-A, as in this subsection we only focus on WSN applications. Before starting, we summarize these works in Table X.

TABLE X: A summary of work on deep learning driven WSNs.

| Reference | Application | Model |
|---|---|---|
| Chuang and Jiang [267] | Node localization | MLP |
| Bernas and Płaczek [268] | Indoor localization | MLP |
| Payal et al. [269] | Node localization | MLP |
| Dong et al. [270] | Underwater localization | MLP |
| Yan et al. [271] | Smoldering and flaming combustion identification | MLP |
| Wang et al. [272] | Temperature correction | MLP |
| Lee et al. [273] | Online query processing | CNN |
| Li and Serpen [274] | Self-adaptive WSN | Hopfield network |
| Khorasani and Naji [275] | Data aggregation | MLP |
| Li et al. [276] | Distributed data mining | MLP |

There exist two data processing scenarios in WSNs, namely centralized and decentralized. The former simply treats sensors as data collectors, which are only responsible for gathering data and sending them to a central server for processing. The latter assumes that sensors have some computational ability. The main server offloads part of the jobs to the edge and each sensor performs data processing individually. Work in [275] focuses on the centralized scheme, where the authors apply a 3-layer MLP to reduce data redundancy while maintaining essential points for data aggregation. These data are sent to a central server for analysis. In contrast, Li et al. propose to distribute data mining to individual sensors [276]. They partition a deep neural network into different layers and offload layer operations to sensor nodes. Their simulations suggest that by pre-processing with neural networks, their framework obtains high fault detection accuracy, while reducing power consumption at the central server.
Chuang and Jiang exploit neural networks to localize sensor nodes in WSNs [267]. To adapt the deep learning model to a specific network topology, they employ an online training scheme and correlated topology-trained data, enabling efficient model implementations and accurate location estimations. Based on this, Bernas and Płaczek architect an ensemble system that involves multiple MLPs for location estimation in different regions of interest [268]. In this scenario, node locations inferred by multiple MLPs are fused by a fusion algorithm, which improves the localization accuracy, particularly benefiting sensor nodes that are around the boundaries of regions. A comprehensive comparison of different training algorithms applied to MLP-based node localization is presented in [269]. Their experiments suggest that the Bayesian regularization algorithm in general yields the best performance. Dong et al. consider an underwater node localization scenario [270]. Since acoustic signals suffer from loss caused by absorption, scattering, noise, and interference, underwater localization is not straightforward. By adopting a deep neural network, their framework successfully addresses the aforementioned challenges and achieves higher inference accuracy as compared to SVM and generalized least square methods.
Deep learning has also been exploited for identification of smoldering and flaming combustion phases in forests. In [271], Yan et al. embed a set of sensors into a forest to monitor CO2, smoke, and temperature. They suggest that various burning scenarios will emit different gases, which can be exploited to classify smoldering and flaming combustion. Wang et al. apply deep learning to correct inaccurate measurements of air temperature [272]. They discover a close relationship between solar radiation and actual air temperature, which can be effectively learned by neural networks.
Missing data or de-synchronization are common in WSN data collection. These may lead to serious problems in analysis due to the resulting inconsistency. Lee et al. address this problem by plugging a query refinement component into a deep learning based WSN analysis system [273]. They employ exponential smoothing to infer missing data, thereby maintaining the integrity of data for deep learning analysis without significantly compromising accuracy. To enhance the intelligence of WSNs, Li and Serpen embed an artificial neural network into a WSN, allowing it to react agilely to potential changes following deployment in the field [274]. To this end, they employ a minimum weakly-connected dominating set to represent the topology of the WSN, and subsequently use a Hopfield recurrent neural network as a static optimizer to adapt the network infrastructure to potential changes as necessary. This work represents an important step toward embedding machine intelligence into WSNs.
Turning attention again to Table X, it is interesting to see that the majority of deep learning practices on WSNs employ MLP models. Since the MLP is straightforward to architect and performs reasonably well, it remains a good candidate for WSN applications. On the other hand, since most of the sensor data collected is sequential, we expect that RNN-based models can play a more important role in this area. This can be a promising future research direction.

D. Deep Learning Driven Network Control
In this part, we turn our attention to mobile network control problems. Due to its powerful function approximation mechanism, deep learning has made remarkable breakthroughs in improving traditional reinforcement learning [24] and imitation learning [277]. These advances have the potential to solve mobile network control problems which were previously complex or intractable [278], [279]. Recall that in reinforcement learning, an agent continuously interacts with the environment to learn
the best action. With constant exploration and exploitation, the agent learns to perform without feedback to maximize its expected return. Imitation learning follows a different learning paradigm called “learning by demonstration”. This learning paradigm relies on a ‘teacher’ (usually a human) who tells the agent what action should be executed under certain observations during training. After sufficient demonstrations, the agent learns a policy that imitates the behavior of its teacher and can operate by itself without supervision.
Beyond these two learning scenarios, analysis-based control is gaining traction in mobile networking. Specifically, this scheme uses ML models for network data analysis, and subsequently exploits the results to aid network control. Unlike the prior scenarios, the analysis-based control paradigm does not directly output the actions. Instead, it extracts useful information and delivers it to an extra agent that executes the actions. We illustrate the principles of the three control paradigms in Fig. 7. We review the works that have been proposed so far in this subsection and summarize them in Table XI.

[Figure 7 panels: Reinforcement Learning (neural network agent, mobile network environment, state representation, actions, rewards, observed state); Imitation Learning (neural network agent, mobile network environment, demonstration, desired actions, predicted actions, observations); Analysis-based Control (neural network analysis, mobile network environment, analysis results, outer controller, observations).]
Fig. 7: Principles of three control approaches applied in mobile and wireless networks control, namely reinforcement learning (above), imitation learning (middle), and analysis-based control (below).

Network Optimization: Network Optimization refers to the management of network resources and functions for a given environment to improve the network performance. Deep learning has recently achieved several successful results in this area. For example, Liu et al. exploit a DBN to discover the correlations between multi-commodity flow demand information and link usage in wireless networks [302]. Based on the predictions, they remove the links that are unlikely to be scheduled, so as to reduce the size of data for the demand constrained energy minimization. Their method reduces runtime by up to 50%, without compromising optimality. Subramanian and Banerjee propose to use deep learning to predict the health condition of heterogeneous devices in machine to machine communications [280]. The results obtained are subsequently exploited for optimizing health aware policy change decisions. He et al. employ deep reinforcement learning to address caching and interference alignment problems in wireless networks [281], [282]. In particular, they treat time-varying channels as finite-state Markov channels and apply deep Q networks to learn the best user selection policy in interference alignment wireless networks. This novel framework demonstrates significantly higher sum rate and energy efficiency over existing approaches.

Routing: Deep learning can also improve the efficiency of routing rules. Lee et al. exploit a 3-layer deep neural network to classify node degree given detailed information of the routing node [284]. The classification results along with temporary routes are exploited for the following virtual route generation using the Viterbi algorithm. Mao et al. employ a DBN to decide the next routing node to construct a software defined router [140]. By considering Open Shortest Path First as the optimal routing strategy, their method achieves up to 95% accuracy, while significantly reducing overhead and delay, and achieving higher throughput with a signaling interval of 240 milliseconds. A similar conclusion is obtained in [285], where the authors employ Hopfield neural networks for routing, achieving better usability and survivability in mobile ad hoc network application scenarios.

Scheduling: Several studies investigate scheduling with deep learning. Zhang et al. introduce a deep Q learning-powered hybrid dynamic voltage and frequency scaling scheduling mechanism to reduce the energy consumption in real-time systems (e.g. Wi-Fi communication, IoT, video applications) [287]. In their proposal, an AE is employed to approximate the Q function, and the framework performs experience replay [303] to stabilize the training process and accelerate convergence. Simulations demonstrate that this method reduces energy consumption by 4.2% over a traditional Q learning based method. Similarly, the work in [288] uses deep Q learning for scheduling in roadside communication networks. In particular, interactions between vehicular environments, including the sequence of actions, observations, and reward signals are formulated as an MDP. By approximating the Q value function, the agent learns a scheduling policy that achieves fewer incomplete requests, lower latency and busy time, and longer battery lifetime compared to traditional scheduling
TABLE XI: A summary of work on deep learning driven network control.

| Domain | Reference | Application | Control approach | Model |
|---|---|---|---|---|
| Network optimization | Liu et al. [302] | Demand constrained energy minimization | Analysis-based | DBN |
| Network optimization | Subramanian and Banerjee [280] | Machine to machine system optimization | Analysis-based | Deep multi-modal network |
| Network optimization | He et al. [281], [282] | Caching and interference alignment | Reinforcement learning | Deep Q learning |
| Network optimization | Masmar and Evans [283] | mmWave communication performance optimization | Reinforcement learning | Deep Q learning |
| Routing | Lee et al. [284] | Virtual route assignment | Analysis-based | MLP |
| Routing | Yang et al. [285] | Routing optimization | Analysis-based | Hopfield neural networks |
| Routing | Mao et al. [140] | Software defined routing | Imitation learning | DBN |
| Routing | Tang et al. [286] | Wireless network routing | Imitation learning | CNN |
| Scheduling | Zhang et al. [287] | Hybrid dynamic voltage and frequency scaling scheduling | Reinforcement learning | Deep Q learning |
| Scheduling | Atallah et al. [288] | Roadside communication network scheduling | Reinforcement learning | Deep Q learning |
| Scheduling | Chinchali et al. [289] | Cellular network traffic scheduling | Reinforcement learning | Policy gradient |
| Scheduling | Atallah et al. [288] | Roadside communication networks scheduling | Reinforcement learning | Deep Q learning |
| Resource allocation | Sun et al. [290] | Resource management over wireless networks | Imitation learning | MLP |
| Resource allocation | Xu et al. [291] | Resource allocation in cloud radio access networks | Reinforcement learning | Deep Q learning |
| Resource allocation | Ferreira et al. [292] | Resource management in cognitive space communications | Reinforcement learning | Deep State-Action-Reward-State-Action (SARSA) |
| Resource allocation | Ye and Li [293] | Resource allocation in vehicle-to-vehicle communication | Reinforcement learning | Deep Q learning |
| Radio control | Naparstek and Cohen [294] | Dynamic spectrum access | Reinforcement learning | Deep Q learning |
| Radio control | O’Shea and Clancy [295] | Radio control and signal detection | Reinforcement learning | Deep Q learning |
| Radio control | Wijaya et al. [296], [297] | Intercell-interference cancellation and transmit power optimisation | Imitation learning | RBM |
| Others | Mao et al. [298] | Adaptive video bitrate | Reinforcement learning | A3C |
| Others | Oda et al. [299], [300] | Mobile actor node control | Reinforcement learning | Deep Q learning |
| Others | Kim [301] | IoT load balancing | Analysis-based | DBN |
methods.

More recently, Chinchali et al. present a policy gradient based scheduler to optimize cellular network traffic flow [289]. Specifically, they cast the scheduling problem as an MDP and employ RF to predict network throughput, which is subsequently used as a component of the reward function. Evaluations over a realistic network simulator demonstrate that their proposal can dynamically adapt to traffic variation, which enables mobile networks to carry 14.7% more data traffic while outperforming heuristic schedulers by more than 2×.

Resource Allocation: Sun et al. use a deep neural network to approximate the mapping between the input and output of the Weighted Minimum Mean Square Error resource allocation algorithm [304] under interference-limited wireless network environments [290]. Through effective imitation learning, the neural network approximation achieves performance close to that of its teacher, while only requiring 256 shorter runtime. Deep learning has also been applied to cloud radio access networks, where Xu et al. employ deep Q learning to determine the on/off mode of remote radio heads given the current mode and user demand [291]. Comparisons with single base station association and fully coordinated association methods suggest that the proposed DRL controller allows the system to satisfy user demand while requiring significantly less energy. In addition, Ferreira et al. employ deep State-Action-Reward-State-Action (SARSA) to address resource allocation management in cognitive communications [292]. By forecasting the effects of radio parameters, their framework avoids wasted trials of bad parameters, which reduces the computational resources required.

Radio Control: In [294], the authors address the dynamic spectrum access problem in multichannel wireless network environments using deep reinforcement learning. In this setting, they incorporate an LSTM into a deep Q network to maintain and memorize the historical observations, allowing the architecture to perform precise state estimation given partial observations. They distribute the training process to each user, which enables effective training parallelization and the learning of good policies for individual users. Experiments demonstrate that this framework achieves double the channel throughput when compared to a benchmark method. The work in [295] sheds light on the radio control and signal detection problems. In particular, the authors introduce a radio signal search environment based on Gym Reinforcement Learning. Their agent exhibits a steady learning process and is able to learn a radio signal search policy.

Other applications: Beyond the applications previously discussed, deep learning is also playing an important role in other
network control problems. Mao et al. develop the Pensieve system that generates adaptive video bit rate algorithms using deep reinforcement learning [298]. Specifically, Pensieve employs a state-of-the-art deep reinforcement learning algorithm, A3C, which takes the bandwidth, bit rate and buffer size as input, and selects the bit rate that leads to the best expected return. The model is trained in an offline setting and deployed on an adaptive bit rate server, demonstrating that the system outperforms the best existing scheme by 12%-25% in terms of QoE. This work represents the first important step towards implementing DRL in real network systems. Kim links deep learning with the load balancing problem of IoT [301]. He suggests that a DBN can effectively analyze network load and process structural configuration, thereby achieving efficient load balancing in IoT.

E. Deep Learning Driven Network Security
With the increasing popularity of wireless connectivity, protecting users, network equipment and data from malicious attacks, unauthorized access and information leakage becomes crucial. Cybersecurity systems shelter mobile devices and users through firewalls, anti-virus software, and Intrusion Detection Systems (IDSs) [305]. The firewall is an access security gateway between two networks. It allows or blocks the uplink and downlink network traffic based on pre-defined rules. Anti-virus software detects and removes computer viruses, worms, Trojans and other malware. IDSs identify unauthorized and malicious activities or rule violations in information systems. Each performs its own functions to protect network communication, central servers and edge devices.
Modern cyber security systems benefit increasingly from deep learning [334], since it can enable the system to (i) automatically learn signatures and patterns from experience and generalize to future intrusions (supervised learning); or (ii) identify patterns that clearly differ from regular behavior (unsupervised learning). This dramatically reduces the effort of pre-defining rules for discriminating intrusions. Beyond protecting networks from attacks, deep learning can also play the role of “attacker”, having huge potential for stealing or cracking users’ passwords or information. We summarize these works in Table XII and discuss them next.

Anomaly Detection: Anomaly detection, which aims at identifying network events (e.g. attacks, unexpected access and use of data) that do not conform to expected behaviors, is becoming a key technique in IDSs. Many researchers put effort into this area by exploiting the outstanding unsupervised ability of AEs [306]. For example, Thing investigates features of attacks and threats that exist in IEEE 802.11 networks [139]. He employs a stacked AE to categorize network traffic into 5 types (i.e. legitimate, flooding, injection and impersonation traffic), achieving 98.67% overall accuracy. The AE is also exploited in [307], where Aminanto and Kim use an MLP and a stacked AE for feature selection and extraction, demonstrating remarkable performance. Similarly, Feng et al. use an AE to detect abnormal spectrum usage in wireless communication [308]. The experiments suggest that the detection accuracy can significantly benefit from the depth of AEs.
Distributed attack detection is also an important issue in mobile network security. Khan et al. focus on detecting flooding attacks in wireless mesh networks [309]. They simulate a wireless environment with 100 nodes, and artificially inject intermediate and severe distributed flooding attacks to generate a synthetic dataset. Their deep learning based methods achieve excellent false positive and false negative rates. Distributed attacks are also studied in [310], where the authors focus on an IoT scenario. Another work in [311] employs MLPs to detect distributed denial of service attacks. By characterizing typical patterns of attack incidents, their model works well in detecting both known and unknown distributed denial of service intrusions.
Martin et al. propose a conditional VAE to identify intrusion incidents in IoT [312]. In order to improve detection performance, their VAE infers missing features associated with incomplete measurements, which are common in an IoT environment. The true data labels are embedded into the decoder layers to assist final classification. Evaluations on the well-known NSL-KDD dataset [335] demonstrate that their model achieves remarkable accuracy in identifying denial of service, probing, remote to user and user to root attacks, outperforming traditional ML methods by 0.18 in terms of F1 score.

Malware Detection: Nowadays, mobile devices carry a considerable amount of private information. This information can be stolen and exploited by malicious apps installed on smartphones for ill-conceived purposes [336]. Deep learning is being exploited for analyzing and detecting such threats. Yuan et al. use both labeled and unlabeled mobile apps to train an RBM [313]. By learning from 300 samples, their model can classify Android malware with remarkable accuracy, outperforming traditional ML tools by up to 19%. Their follow-up research in [314], named Droiddetector, further improves the detection accuracy by 2%. Similarly, Su et al. analyze essential features of Android apps, namely requested permission, used permission, sensitive application programming interface calls, action and app components [315]. They employ DBNs to extract features of malware and an SVM for classification, achieving high accuracy and only requiring 6 seconds per inference instance.
Hou et al. attack the malware detection problem from a different perspective. Their research points out that signature-based detection is insufficient to deal with sophisticated Android malware [316]. To address this problem, they propose Component Traversal, which can automatically execute code routines to construct weighted directed graphs. By employing a Stacked AE for graph analysis, their framework Deep4MalDroid can accurately detect Android malware that is intentionally repackaged and obfuscated to bypass signatures and hinder attempts to analyze its inner operations. This work is followed by that of Martinelli et al., who exploit CNNs to discover the relationship between app types and extracted syscall traces from real mobile devices [317]. The CNN has also been used in [318],
TABLE XII: A summary of work on deep learning driven network security.

| Domain | Reference | Application | Problem considered | Learning paradigm | Model |
|---|---|---|---|---|---|
| Intrusion detection | Azar et al. [306] | Cyber security applications | Malware classification & denial of service, probing, remote to user & user to root | Unsupervised & supervised | Stacked AE |
| Intrusion detection | Thing [139] | IEEE 802.11 network anomaly detection and attack classification | Flooding, injection and impersonation attack | Unsupervised & supervised | Stacked AE |
| Intrusion detection | Aminanto and Kim [307] | Wi-Fi impersonation attack detection | Flooding, injection and impersonation attack | Unsupervised & supervised | MLP, AE |
| Intrusion detection | Feng et al. [308] | Spectrum anomaly detection | Additive white Gaussian noise | Unsupervised & supervised | AE |
| Intrusion detection | Khan et al. [309] | Flooding attack detection in wireless mesh networks | Intermediate and severe distributed flood attack | Supervised | MLP |
| Intrusion detection | Diro and Chilamkurti [310] | IoT distributed attack detection | Denial of service, probing, remote to user & user to root | Supervised | MLP |
| Intrusion detection | Saied et al. [311] | Distributed denial of service attack detection | Known and unknown distributed denial of service attack | Supervised | MLP |
| Intrusion detection | Martin et al. [312] | IoT intrusion detection | Denial of service, probing, remote to user & user to root | Unsupervised & supervised | Conditional VAE |
| Malware detection | Yuan et al. [313] | Android malware detection | Apps in contagio mobile and Google Play Store | Unsupervised & supervised | RBM |
| Malware detection | Yuan et al. [314] | Android malware detection | Apps in contagio mobile and Google Play Store and Genome Project | Unsupervised & supervised | DBN |
| Malware detection | Su et al. [315] | Android malware detection | Apps in Drebin, Android Malware Genome Project and the Contagio Community and Google Play Store | Unsupervised & supervised | DBN + SVM |
| Malware detection | Hou et al. [316] | Android malware detection | App samples from Comodo Cloud Security Center | Unsupervised & supervised | Stacked AE |
| Malware detection | Martinelli [317] | Android malware detection | Apps in Drebin, Android Malware Genome Project and Google Play Store | Supervised | CNN |
| Malware detection | McLaughlin et al. [318] | Android malware detection | Apps in Android Malware Genome project and Google Play Store | Supervised | CNN |
| Malware detection | Chen et al. [319] | Malicious application detection in the edge network | Publicly-available malicious applications | Unsupervised & supervised | RBM |
| Malware detection | Wang et al. [175] | Malware traffic classification | Traffic extracted from 9 types of malware | Supervised | CNN |
| Botnet detection | Oulehla et al. [320] | Mobile botnet detection | Client-server and hybrid botnets | Unknown | Unknown |
| Botnet detection | Torres et al. [321] | Botnet detection | Spam, HTTP and unknown traffic | Supervised | LSTM |
| Botnet detection | Eslahi et al. [322] | Mobile botnet detection | HTTP botnet traffic | Supervised | MLP |
| Botnet detection | Alauthaman et al. [323] | Peer-to-peer botnet detection | Waledac and Storm Bots | Supervised | MLP |
| Privacy | Shokri and Shmatikov [324] | Privacy preserving deep learning | Avoiding sharing data in collaborative model training | Supervised | MLP, CNN |
| Privacy | Phong et al. [325] | Privacy preserving deep learning | Addressing information leakage introduced in [324] | Supervised | MLP |
| Privacy | Ossia et al. [326] | Privacy-preserving mobile analytics | Offloading feature extraction from cloud | Supervised | CNN |
| Privacy | Abadi et al. [327] | Deep learning with differential privacy | Preventing exposure of private information in training data | Supervised | MLP |
| Privacy | Osia et al. [328] | Privacy-preserving personal model training | Offloading personal data from clouds | Unsupervised & supervised | MLP & Latent Dirichlet Allocation [329] |
| Privacy | Servia et al. [330] | Privacy-preserving model inference | Breaking down large models for privacy-preserving analytics | Supervised | CNN |
Stealing information from Breaking the ordinary and differentially Unsuper-
Hitaj et al. [331] GAN
collaborative deep learning private collaborative deep learning vised
Attacker Generating passwords from leaked Unsuper-
Hitaj et al. [327] Password guessing GAN
password set vised
Reconstructing functions of polyalphabetic
Greydanus [332] Enigma learning Supervised LSTM
cipher
MLP, AE,
Maghrebi [333] Breaking cryptographic Side channel attacks Supervised
CNN, LSTM
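Table XII lists the differentially private training of Abadi et al. [327]. The core of a differentially private SGD step (clip each per-example gradient, sum, add Gaussian noise, then average) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear model with squared loss, the function names, and the parameters `max_norm` and `noise_multiplier` are all hypothetical choices made here, and the privacy accounting that turns the noise level into a formal guarantee is omitted.

```python
import math
import random


def clip_by_norm(grad, max_norm):
    """Scale grad so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    return [g * max_norm / norm for g in grad]


def dp_sgd_step(weights, batch, lr, max_norm, noise_multiplier, rng):
    """One noisy SGD step for a linear model y ~ w . x with squared loss.

    Assumption: the model and loss are illustrative stand-ins for a
    deep network; only the clip / noise / average pattern matters here.
    """
    summed = [0.0] * len(weights)
    for x, y in batch:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        per_example_grad = [2.0 * err * xi for xi in x]
        # Bound the influence of any single training example.
        clipped = clip_by_norm(per_example_grad, max_norm)
        summed = [s + c for s, c in zip(summed, clipped)]
    # Gaussian noise calibrated to the clipping bound.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [w - lr * g / len(batch) for w, g in zip(weights, noisy)]


rng = random.Random(0)
weights = [0.0, 0.0]
batch = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 0.0)]
for _ in range(200):
    weights = dp_sgd_step(weights, batch, lr=0.1, max_norm=1.0,
                          noise_multiplier=0.5, rng=rng)
print(weights)
```

Clipping bounds how much one example can move the model, which is what lets a fixed noise level mask the contribution of any individual record. A larger `noise_multiplier` strengthens that masking at the cost of accuracy.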
where the authors draw inspiration from NLP and treat the disassembled byte-code of an app as text for analysis. Their experiments demonstrate that a CNN can effectively learn to detect sequences of opcodes that are indicative of malware. Chen et al. incorporate location information into the detection framework and exploit an RBM for feature extraction and classification [319]. Their proposal improves on the performance of other ML methods.

Botnet Detection: A botnet is a network that consists of machines compromised by bots. These machines are usually under the control of a botmaster who takes advantage of bots to harm public services and systems [337]. Detecting botnets is challenging and is becoming a pressing task in cyber security.

Deep learning is playing an important role in this area. For example, Oulehla et al. propose to employ neural networks to extract features from mobile botnet behaviors [320]. They design a parallel detection framework for detecting both client-server and hybrid botnets and demonstrate encouraging performance. Torres et al. investigate the common behavior patterns that botnets exhibit across their life cycle using an LSTM [321]. They employ both under-sampling and over-sampling to address the class imbalance between botnet and normal traffic in the dataset, which is common in anomaly detection problems. A similar problem is also studied in [322] and [323], where the authors use standard MLPs to perform mobile and peer-to-peer botnet detection respectively, achieving high overall accuracy.

Privacy: Preserving user privacy while training and evaluating a deep neural network is another important research issue [338]. Initial research is conducted in [324], where the authors enable participants to train and evaluate a neural network without sharing their input data. This preserves each individual’s privacy while benefiting all users, as they collaboratively improve the model performance. Their framework is revisited and improved in [325], where another group of researchers employ additively homomorphic encryption to address the information leakage problem ignored in [324] without compromising model accuracy. This significantly reinforces the security of the system.

Osia et al. focus on privacy-preserving mobile analytics using deep learning. They design a client-server framework based on the Siamese architecture [339], which accommodates a feature extractor on mobile devices and a corresponding classifier in the cloud [326]. By offloading feature extraction from the cloud, their system offers strong privacy guarantees. The work in [327] shows that deep neural networks can be trained with differential privacy. The authors introduce a differentially private SGD to avoid disclosure of private information in the training data. Experiments on two publicly-available image recognition datasets demonstrate that their algorithm is able to maintain users’ privacy, with a manageable cost in terms of complexity, efficiency, and performance.

Servia et al. consider training deep neural networks on distributed devices without violating privacy constraints [330]. Specifically, the authors retrain an initial model locally, tailored to individual users. This avoids transferring personal data to untrusted entities, hence users’ privacy is guaranteed. Osia et al. focus on protecting users’ personal data from the inference perspective. In particular, they break the entire deep neural network into a feature extractor (on the client side) and an analyzer (on the cloud side) to minimize the exposure of sensitive information. Through local processing of raw input data, sensitive personal information is transformed into abstract features, which avoids direct disclosure to the cloud. Experiments on gender classification and emotion detection suggest that their framework can effectively preserve users’ privacy, while maintaining remarkable inference accuracy.

Attacker: Though it has fewer applications, deep learning has also been used for security attacks, ranging from compromising users’ private information to guessing an enigma. In [331], Hitaj et al. suggest that collaboratively learning a deep model is not reliable. By training a GAN, their attacker is able to affect the collaborative learning process and lure the victims to disclose private information by injecting fake training samples. Their GAN even successfully breaks the differentially private collaborative learning in [327]. The authors further investigate the use of GANs for password guessing. In [340], they design PassGAN to learn the distribution of a set of leaked passwords. Once trained on a dataset, PassGAN is able to match over 46% of passwords in a different testing set, without user intervention or cryptography knowledge. This novel technique has the potential to revolutionize current password guessing algorithms.

Greydanus breaks a decryption rule using an LSTM network [332]. The author treats decryption as a sequence-to-sequence translation task, and trains a framework with a large set of enigma pairs. The proposed LSTM demonstrates remarkable performance in learning polyalphabetic ciphers. Maghrebi et al. exploit various deep learning models (i.e. MLP, AE, CNN, LSTM) to construct a precise profiling system to perform side channel key recovery attacks [333]. Surprisingly, deep learning based methods demonstrate overwhelming performance over other template machine learning attacks in terms of efficiency in breaking both unprotected and protected Advanced Encryption Standard implementations.

F. Emerging Deep Learning Driven Mobile Network Applications
In this part, we review work that applies deep learning to other mobile networking areas, which are beyond the scope of all the aforementioned subjects. These emerging applications are very interesting and open several new research directions. Table XIII gives a short summary of these works.

Deep Learning Driven Signal Processing: Deep learning is also gaining increasing attention in the signal processing area, especially in Multi-Input Multi-Output (MIMO) applications. MIMO is a technique that enables the cooperation and utilization of multiple radio antennas in wireless networks. Multi-user MIMO systems aim at accommodating a large number of users and devices simultaneously within a
condensed area while maintaining high throughput and consistent performance. This is now a fundamental technique in current wireless communications, including both cellular and WiFi networks. By incorporating deep learning, MIMO is evolving into a more intelligent system which can optimize its performance based on awareness of its environment.

Samuel et al. suggest that a deep neural network can be a good estimator for transmitted vectors in a MIMO channel. By unfolding a projected gradient descent method, they design an MLP-based detection network to perform binary MIMO detection [341]. The Detection Network can be implemented on multiple channels after a single training. Simulations demonstrate that their architecture achieves near-optimal accuracy while requiring light computation and no prior knowledge of the Signal-to-Noise Ratio (SNR). Yan et al. employ deep learning to solve a similar problem from a different perspective [342]. By considering the characteristic invariance of signals, they exploit an AE as a feature extractor, and subsequently use an Extreme Learning Machine (ELM) to classify signal sources in a MIMO orthogonal frequency division multiplexing system. Their proposal achieves higher detection accuracy than several traditional methods while yielding similar complexity.

Wijaya et al. consider applying deep learning to a different scenario [296], [297]. The authors propose to use non-iterative neural networks to perform transmit power control at base stations, thereby preventing the degradation of network performance introduced by inter-cell interference. The neural network is trained to estimate the probability of each candidate transmit power. At every packet transmission, the transmit power with the highest activation probability is selected. Simulations demonstrate that their framework significantly outperforms the belief propagation algorithm that is routinely used for transmit power control in MIMO environments, while incurring lower computational cost.

Deep learning is enabling progress in radio signal analysis. In [343], O’Shea et al. employ an LSTM to replace sequence translation routines between radio transmitter and receiver. Though their framework works well in ideal environments, its performance drops significantly when realistic channel effects are introduced. Later, they consider a different scenario in [344], where they exploit a regularized AE to enable reliable communications over an impaired channel between signal senders and receivers. They further incorporate a radio transformer network on the decoder side for signal reconstruction, thereby achieving receiver synchronization. Simulations demonstrate that their framework is reliable and can be efficiently implemented. West and O’Shea later turn their attention to modulation recognition. In [345], they compare the recognition accuracy of different deep learning architectures, including traditional CNNs, ResNet, Inception CNN and LSTM. Their experiments suggest that LSTM is the best candidate for modulation recognition since it achieves the highest accuracy. Due to its superior performance, LSTM is also employed for a similar task in [346].

Network Data Monetization: Gonzalez and Vallina employ unsupervised deep learning to generate real-time accurate user profiles [351] using an on-network machine learning platform, Net2Vec [355]. Specifically, they analyze user browsing data in real time and generate user profiles using product categories. The profiles can subsequently be associated with the products that are of interest to the users, and this information can then be used for online advertising.

IoT In-Network Computation: Instead of treating IoT nodes as producers of data or end-consumers of processed information, Kaminski et al. embed neural networks into an IoT network, allowing IoT nodes to collaboratively process the generated data in synchronization with the data flows [352]. This enables low-latency communication in IoT networks, while offloading data storage and processing from the cloud. In particular, they map each hidden unit of a pre-trained neural network to a node in the IoT network, and investigate the optimal projection which leads to the minimum communication overhead. Their framework performs a function similar to in-network computation in WSNs, which opens new research directions in fog computing.

Mobile Crowdsensing: Xiao et al. point out that there exist malicious mobile users who intentionally provide false sensing data to servers to save costs and preserve their privacy, making mobile crowdsensing systems vulnerable [353]. They formulate the server-users system as a Stackelberg game, where the server plays the role of a leader that is responsible for evaluating the sensing effort of individuals, by analyzing the accuracy of each sensing report. Users are paid according to the evaluation of their effort, hence cheating users will be punished with zero reward. To design the optimal payment policy, the server employs a deep Q network to derive knowledge from experienced sensing reports and payment policies, without requiring knowledge of specific sensing models. Simulations demonstrate superior performance in terms of sensing quality, resilience to attacks and server utility over traditional Q learning and random payment strategies.

VII. TAILORING DEEP LEARNING TO MOBILE NETWORKS
Although deep learning performs remarkably in many mobile networking areas, the No Free Lunch (NFL) theorem indicates that there is no single model that can work universally well in all problems [356]. This implies that for any specific mobile and wireless networking problem, we may need to adapt different deep learning architectures so as to achieve better performance. In this section, we focus on how to tailor deep learning to mobile network applications from three perspectives, namely, mobile devices and systems, distributed data centers, and changing mobile network environments.

A. Tailoring Deep Learning to Mobile Devices and Systems
The ultra-low latency requirements of future 5G mobile networks demand runtime efficiency of operations in mobile systems, including deep learning driven applications. However, running complex deep learning on mobile systems may violate certain latency constraints. On the other hand,
TABLE XIII: A summary of emerging deep learning driven mobile network applications.

Reference | Application | Model
Samuel et al. [341] | MIMO detection | MLP
Yan et al. [342] | Signal detection on a MIMO-OFDM system | AE+ELM
Vieira et al. [263] | Massive MIMO fingerprint-based positioning | CNN
Neumann et al. [347] | MIMO channel estimation | CNN
Wijaya et al. [296], [297] | Intercell-interference cancellation and transmit power optimization | RBM
O’Shea et al. [348] | Optimisation of representations and encoding/decoding processes | AE
O’Shea et al. [348] | Sparse linear inverse problem of MIMO signal | LAMP, LVAMP, CNN
O’Shea et al. [343] | Radio traffic sequence recognition | LSTM
O’Shea et al. [344] | Learning to communicate over an impaired channel | AE + radio transformer network
Rajendran et al. [346] | Automatic modulation classification | LSTM
West and O’Shea [345] | Modulation recognition | CNN, ResNet, Inception CNN, LSTM
O’Shea et al. [349] | Modulation recognition | Radio transformer network
O’Shea and Hoydis [350] | Modulation classification | CNN
Gonzalez and Vallina [351] | Network data monetisation | Unknown
Kaminski et al. [352] | In-network computation for IoT | MLP
Xiao et al. [353] | Mobile crowdsensing | Deep Q learning
Dörner et al. [354] | Continuous data transmission | AE
TABLE XIV: Summary of works on deep learning for mobile devices and systems.

Reference | Methods | Target model
Iandola et al. [360] | Filter size shrinking, reducing input channels and late downsampling | CNN
Howard et al. [361] | Depth-wise separable convolution | CNN
Zhang et al. [362] | Point-wise group convolution and channel shuffle | CNN
Zhang et al. [363] | Tucker decomposition | AE
Cao et al. [364] | Data parallelisation by RenderScript | RNN
Chen et al. [365] | Space exploration for data reusability and kernel redundancy removal | CNN
Rallapalli et al. [366] | Memory optimisations | CNN
Lane et al. [367] | Runtime layer compression and deep architecture decomposition | MLP, CNN
Huynh et al. [368] | Caching, Tucker decomposition and computation offloading | CNN
Wu et al. [369] | Parameters quantisation | CNN
Bhattacharya and Lane [370] | Sparsification of fully-connected layers and separation of convolutional kernels | MLP, CNN
Georgiev et al. [82] | Representation sharing | MLP
Cho and Brand [371] | Convolution operation optimisation | CNN
Guo and Potkonjak [372] | Filters and classes pruning | CNN
Li et al. [373] | Cloud assistance and incremental learning | CNN
Zen et al. [374] | Weight quantisation | LSTM
Falcao et al. [375] | Parallelisation and memory sharing | Stacked AE
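To make the saving behind the depth-wise separable convolution listed in Table XIV [361] concrete, the following sketch counts the weights of a standard convolution and of its depth-wise separable counterpart. It assumes square kernels and ignores biases. The function names and the example layer sizes are illustrative choices, not values taken from [361].

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights of a standard convolution: one k x k x c_in filter per output channel."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights of a depth-wise separable convolution.

    A depth-wise stage applies one k x k filter per input channel, and a
    point-wise (1 x 1) stage then mixes the channels.
    """
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def mult_adds(params, out_h, out_w):
    """Multiply-accumulate count when every weight is used once per output position."""
    return params * out_h * out_w


# Hypothetical layer: 3 x 3 kernels, 64 input channels, 128 output channels.
k, c_in, c_out = 3, 64, 128
std = standard_conv_params(k, c_in, c_out)
sep = separable_conv_params(k, c_in, c_out)
print(std, sep, sep / std)
print(mult_adds(std, 56, 56), mult_adds(sep, 56, 56))
```

The ratio of separable to standard cost equals 1/c_out + 1/k^2, so with 3 x 3 kernels the separable layer needs roughly one eighth to one ninth of the weights and multiply-accumulates once the number of output channels is large.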
most current mobile devices are limited by the capability of their hardware. This means that implementing complex deep learning architectures on such equipment without tuning may be computationally infeasible. To address this issue, numerous researchers have dedicated themselves to improving existing deep learning architectures [357], such that they will not violate any latency and energy constraints [358], [359]. Before the review, we outline these works in Table XIV.

Iandola et al. design a compact SqueezeNet for embedded systems. SqueezeNet achieves accuracy similar to AlexNet while using 50 times fewer parameters [360]. Howard et al. extend this work and introduce an efficient family of streamlined CNNs called MobileNet, which uses depth-wise separable convolution operations to drastically reduce the computation required and the model size [361]. This new design results in small models which can run with low latency, enabling them to satisfy the requirements of mobile and embedded vision applications. They further introduce two hyper-parameters, a width multiplier and a resolution multiplier, which allow an appropriate trade-off between accuracy and efficiency. The ShuffleNet proposed by Zhang et al. improves the accuracy of MobileNet by employing point-wise group convolution and channel shuffle while retaining similar model complexity [362]. They discover that more convolution groups can reduce the computation requirement, hence one can increase the group number to expand the information encoded by models for performance improvement under certain computational capacity constraints.

Zhang et al. focus on reducing the parameters of fully-connected layers for mobile multimedia feature learning [363]. By applying Tucker decomposition to weight sub-tensors in the model, the dimensionality of parameters is significantly reduced while maintaining decent reconstruction capability. Tucker decomposition has also been employed in [368], where the authors seek to approximate the model with fewer parameters to save memory. Mobile optimizations are further studied for RNN models. In [364], Cao et al. use a mobile toolbox called RenderScript (footnote 16) to parallelize specific data structures and enable mobile GPUs to perform computational accelerations. Their proposal significantly reduces the latency of running RNN models on Android smartphones. Chen et al. shed light on implementing CNNs on iOS mobile devices [365]. In particular, they reduce the latency of model execution through two techniques, namely space exploration for data re-usability

Footnote 16: Android RenderScript, https://developer.android.com/guide/topics/renderscript/compute.html.
and kernel redundancy removal. The former alleviates the high bandwidth requirement of convolutional layers while the latter reduces the memory and computational requirements with negligible performance degradation. These drastically reduce the overhead and latency of running CNNs on mobile devices.

Rallapalli et al. investigate offloading very deep CNNs from clouds to edge devices by using memory optimization on both mobile CPUs and GPUs [366]. Their framework allows deep CNNs with large memory requirements to run at high speed in a mobile object detection application. Lane et al. develop a software accelerator, DeepX, to assist deep learning implementations on mobile devices by exploiting two inference-time resource control algorithms, i.e. runtime layer compression and deep architecture decomposition [367]. Specifically, the runtime layer compression technique controls the memory and computation runtime during the inference phase, by extending model compression principles. This is important in mobile devices, since offloading inference to edges is more practical on current hardware platforms. Further, the deep architecture decomposition designs “decomposition plans” which seek to allocate data and model operations to local and remote processors optimally, tailored to individual neural network structures. By combining these two, DeepX enables maximization of energy and runtime efficiency under certain computation and memory constraints.

Beyond these works, researchers also successfully adapt deep learning architectures through other designs and sophisticated optimizations, such as parameter quantization [369], [374], sparsification and separation [370], representation and memory sharing [82], [375], convolution operation optimization [371], pruning [372], and cloud assistance [373]. These techniques will be of great significance for embedding deep neural networks into mobile systems and devices.

B. Tailoring Deep Learning to Distributed Data Containers
Mobile systems generate and consume massive mobile data every day, which may involve similar content but is distributed around the world. Moving all these data to centralized servers to perform model training and evaluation inevitably introduces communication and storage overheads, which is difficult to scale. However, mobile data generated from different locations usually exhibit different characteristics associated with human culture, mobility, geographical topology, etc. To obtain a robust deep learning model for mobile network applications, training the model with diverse data becomes necessary. Moreover, completely accommodating the full training/inference process on a cloud will introduce non-negligible computational overheads. Hence, appropriately offloading model executions to distributed data centers or edge devices can dramatically alleviate the burden on the cloud.

Generally, there exist two solutions to address this problem, namely, (i) decomposing the model itself to train (or make inference with) its components individually; or (ii) scaling the training process to perform model updates at different locations associated with data containers. Both schemes allow one to train a single model without having to centralize all data. We illustrate the principles of these two solutions in Fig. 8 and review related techniques in this subsection.

Model Parallelism: Large-scale distributed deep learning is first studied in [95], where the authors develop a framework named DistBelief, which enables training complex neural networks on thousands of machines. In their framework, the full model is partitioned into smaller components and distributed over various machines. Only nodes with edges (e.g. connections between layers) that cross boundaries between machines require communication for parameter update and inference. This system further involves a parameter server which enables each model replica to obtain the latest parameters during training. Experiments demonstrate that the proposed framework achieves significantly faster training speed on a CPU cluster than on a single GPU, while achieving state-of-the-art performance on ImageNet [144] classification.

Teerapittayanon et al. propose distributed deep neural networks tailored to distributed systems, which include cloud servers, fog layers and geographically distributed devices [376]. In particular, they scale the overall neural network architecture and distribute its components hierarchically from cloud to end devices. The model exploits local aggregators and binary weights to reduce computation, storage, and communication overheads, while maintaining decent accuracy. Experiments on a multi-view multi-camera dataset demonstrate that their proposal can perform efficient cloud-based training and local inference over a distributed computing system. Importantly, without violating latency constraints, the distributed deep neural network obtains essential benefits associated with distributed systems, such as fault tolerance and privacy.

Coninck et al. consider distributing deep learning over IoT for classification applications [377]. Specifically, they deploy a small neural network on local devices to perform coarse classification, which enables fast response, with filtered data sent to central servers. If the local model fails to classify, the larger neural network in the cloud is activated to perform fine-grained classification. The overall architecture maintains comparable accuracy, while significantly reducing the latency introduced by large model inference.

Decentralized methods can also be applied to deep reinforcement learning. In [378], Omidshafiei et al. consider a multi-agent system with partial observability and limited communication, which is common in mobile network systems. They combine a set of sophisticated methods and algorithms, including hysteretic learners, deep recurrent Q networks, concurrent experience replay trajectories and distillation, to enable multi-agent coordination using a single joint policy under a set of decentralized partially observable MDPs. Their framework can potentially play an important role in addressing control problems in distributed mobile systems. As controllers in such systems have strictly limited observability and strict communication constraints, the whole process can be formulated as a partially observable MDP.

Training Parallelism: Training parallelism is also essential for mobile systems, as mobile data usually arrive asynchronously from different sources. Effectively training models while main-
[Figure 8. Panel titles: Model Parallelism; Training Parallelism. Labels: Machine/Device 1 to 4, Data Collection, Asynchronous SGD, Time.]
Fig. 8: The underlying principles of model parallelism (left) and training parallelism (right).
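The training parallelism panel of Fig. 8 relies on asynchronous SGD, in which a server applies gradients that workers computed on stale copies of the parameters. The toy simulation below sketches that behaviour under strong simplifying assumptions made here for illustration: a single scalar parameter, one quadratic loss per worker, and a fixed round-robin reporting order. It is not the HOGWILD or Downpour SGD implementation, and all names are hypothetical.

```python
def grad(w, target):
    """Gradient of the per-worker loss 0.5 * (w - target) ** 2."""
    return w - target


def async_sgd(targets, lr=0.05, steps=2000, w0=0.0):
    """Simulate a parameter server receiving stale gradients.

    Each worker holds a snapshot of the parameter it pulled at its
    previous turn, so its gradient lags the server by
    len(targets) - 1 updates.
    """
    w = w0
    snapshots = [w0] * len(targets)   # parameters each worker last pulled
    for t in range(steps):
        i = t % len(targets)          # worker i reports in at step t
        g = grad(snapshots[i], targets[i])  # computed on a stale copy
        w -= lr * g                   # server applies it without waiting
        snapshots[i] = w              # worker pulls fresh parameters
    return w


w = async_sgd([1.0, 2.0, 3.0, 4.0])
print(w)  # settles near 2.5, the minimiser of the average loss
```

With a small learning rate the stale updates still drive the parameter to a neighbourhood of the minimiser. Raising `lr` or the number of workers (and hence the staleness) eventually makes the iteration unstable, which is the trade-off that the variance-reduced and robust variants discussed in this subsection aim to ease.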
taining consistency, fast convergence, and accuracy remains challenging [379].

A practical method to address this problem is to perform asynchronous SGD. The basic idea is to allow the server that maintains a model to accept stale (delayed) information (e.g. data, gradient updates) from workers. At each update iteration, the server only needs to wait for a smaller number of workers. This is essential for training a deep neural network over distributed machines in mobile systems. Asynchronous SGD is first studied in [380], where the authors propose a lock-free parallel SGD named HOGWILD, which demonstrates significantly faster convergence than locking counterparts. The Downpour SGD in [95] improves the robustness of the training process when worker nodes break down, as each model replica requests the latest version of the parameters. Hence a small number of failed machines will not have a significant impact on the training process. A similar idea has been employed in [381], where Goyal et al. investigate the usage of a set of techniques (i.e. learning rate adjustment, warm-up, batch normalization) for large minibatches, which offers important insights into training large-scale deep neural networks on distributed systems. Eventually, their framework can train an accurate network on ImageNet in 1 hour, which is impressive in comparison with traditional algorithms.

Zhang et al. argue that most asynchronous SGD algorithms suffer from slow convergence due to the inherent variance of stochastic gradients [382]. They propose an improved SGD with variance reduction to speed up convergence. Their algorithm significantly outperforms other asynchronous SGD methods in terms of convergence when training deep neural networks on the Google Cloud Computing Platform. The asynchronous method has also been applied to deep reinforcement learning. In [66], the authors create multiple environments which allow agents to perform asynchronous updates to the main structure. The new algorithm, A3C, breaks the sequential dependency, reduces the reliance on experience replay, and significantly speeds up the training of the traditional Actor-Critic algorithm. In [383], Hardy et al. further study learning a neural network in a distributed manner over cloud and edge devices. In particular, they propose a training algorithm, AdaComp, which compresses worker updates to the targeted model. This significantly reduces the communication overhead between cloud and edge, while retaining good fault tolerance when worker breakdowns occur.

Federated learning has recently emerged as a parallelism approach that enables mobile devices to collaboratively learn a shared model while retaining all training data on individual devices [384], [385]. Beyond offloading training data from central servers, it performs model updates under a Secure Aggregation protocol [386], which decrypts the average update only if enough users have participated, without inspecting individual phones’ updates. This allows the model to aggregate updates in a secure, efficient, scalable, and fault-tolerant way.

C. Tailoring Deep Learning to Changing Mobile Network Environments
Mobile network environments usually exhibit changing patterns over time. For instance, the spatial distribution of mobile data traffic over a region varies significantly at different times of the day [387]. Applying a deep learning model in changing mobile environments requires a lifelong learning ability to continuously absorb new features from mobile environments, without forgetting old but essential patterns. Moreover, new smartphone-targeted viruses are spreading fast via mobile networks and severely jeopardize users’ privacy and commercial profits. These pose unprecedented challenges to current anomaly detection systems and anti-virus software, as such frameworks are required to react to new threats in a timely manner using limited information. To this end, the model should have a transfer learning ability, which enables fast transfer of knowledge from pre-trained models to different jobs or datasets. This will allow models to work well with
[Figure 9. Panel titles: Deep Lifelong Learning; Deep Transfer Learning. Labels: Deep Learning Model, Model 1 to Model n, Knowledge 1 to Knowledge n, Knowledge Base, Knowledge Transfer, Support, Learning Task 1 to Learning Task n, Time.]
Fig. 9: The underlying principles of deep lifelong learning (left) and deep transfer learning (right). Lifelong learning retains the knowledge learned, while transfer learning exploits labeled data from the source domain to help target domain learning without knowledge retention.
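One common form of the transfer learning shown in the right panel of Fig. 9 is feature reuse: layers learned on a source task are frozen, and only a small new output layer is fitted to a handful of target-task samples. The sketch below illustrates the idea under loud assumptions. `pretrained_features` is a hand-written stand-in for frozen layers, not a trained network, and the target task, learning rate and epoch count are hypothetical values chosen for the example.

```python
def pretrained_features(x):
    """Stand-in for frozen layers learned on a source task (hypothetical)."""
    return [1.0, x, x * x]


def fit_head(samples, lr=0.05, epochs=2000):
    """Fit only a new linear head on top of the frozen features with SGD."""
    head = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, y in samples:
            phi = pretrained_features(x)   # the extractor is never updated
            err = sum(h * p for h, p in zip(head, phi)) - y
            head = [h - lr * err * p for h, p in zip(head, phi)]
    return head


def predict(head, x):
    """Apply the frozen extractor followed by the fitted head."""
    return sum(h * p for h, p in zip(head, pretrained_features(x)))


# Target task with only five labeled samples: y = 1 + 2 * x ** 2.
target_samples = [(x, 1.0 + 2.0 * x * x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
head = fit_head(target_samples)
print(head, predict(head, 0.25))
```

Because only three head weights are trained, five samples are enough to recover the target function here. When the frozen features match the new task less well, fine-tuning some of the transferred layers is the usual next step.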
limited threat samples (one-shot learning) or limited metadata in [391]. In particular, they abandon the memory modules
description of new threats (zero-shot learning). Therefore, in [389] and design a self-organizing architecture with
both lifelong learning and transfer learning are essential for recurrent neurons for processing time-varying patterns. A
applications in changeable mobile network environments. We variant of the Growing When Required network is employed
illustrated these two learning paradigms in Fig. 9 and review in each layer to to predict neural activation sequences from the
essential research in this subsection. previous network layer, which allows learning the time-vary
correlations between input and label, without requirements of
Deep Lifelong Learning Lifelong learning mimics human a predefined number of classes. Importantly, their framework
behaviors and seeks to build a machine that can continuously is robust, as it has tolerance to missing and corrupted sample
adapt to new environments [388]. An ideal lifelong learning labels which is common in mobile data.
machine is able to retain knowledge from previous learning Another interesting deep lifelong learning architecture is
experience to aid future learning and problem solving with presented in [392], where Tessler et al. manage to build a
necessary adjustments, which is of great significance in learn- DQN agent which can retain learned skills in playing the
ing problems in mobile network environments. famous computer game Minecraft. The overall framework
There exist several research efforts which adapt traditional includes a pre-trained model, Deep Skill Network, which
deep learning to a lifelong learning machine. For example, is trained a-priori on various sub-tasks of the game. When
Lee et al. propose a dual-memory deep learning architec- learning a new task, the old knowledge is maintained by
ture for lifelong learning of everyday human behaviors over incorporating reusable skills through a Deep Skill module,
non-stationary data streams [389]. To enable the pre-trained which consists of a Deep Skill Network array and a multi-skill
model to retain old knowledge while training with new data, distillation network. These allow the agent to selectively
their architecture includes two memory buffers, namely deep transfer knowledge to solve a new task. Experiments
memory and fast memory. The deep memory is composed of demonstrate that their proposal significantly outperforms
several deep networks, which are built when the amount of traditional double DQN in terms of accuracy and convergence
data from an unseen distribution is accumulated and reaches by reusing and transferring old skills in new task learning.
a threshold. The fast memory component is a small neural This technique has potential to be employed in solving mobile
network, which is updated immediately when coming across networking problems, as it can continuously new knowledge.
a new data sample. These two memory modules allow to
perform continuous learning without forgetting old knowledge. Deep Transfer Learning Unlike lifelong learning, the transfer
Experiments on a non-stationary image data stream prove learning only seeks to use knowledge from a specific domain
the effectiveness of this model, as it significantly outper- to aid target domain learning. Applying transfer learning can
forms other online deep learning algorithms. The memory accelerate the new learning process, as the new task does not
mechanism has also been applied in [390]. In particular, require to learn from scratch. This is essential to mobile net-
the authors introduce a differentiable neural computer, which work environments, as they require to agilely respond to new
allows neural network to dynamically read from and write to network patterns and threats. Now it has important application
an external memory module. This enables lifelong lookup and in computer network domain [51], such as Web mining [393],
forgetting of knowledge from external sources, as humans do. caching [394] and base station sleeping mechanism [158].
Parisi et al. consider a different lifelong learning scenario There exist two extreme transfer learning paradigms,
namely one-shot learning and zero-shot learning. The one- traffic and application popularity is highly irregular (see an
shot learning refers to a learning method that gains as much example in Fig. 10) [400], [401], which is particularly difficult
information as possible about a category from only one or a to capture their complex correlations.
handful of samples, given a pre-trained model [395]. On the
other hand, zero-shot learning does not require any sample 3D Mobile Traffic Surface 2D Mobile Traffic Snapshot
from a category [396]. It aims at learning a new distribution
given meta description of the new category and correlation
with existing training data. Though research toward deep
one-shot learning [80], [397] and deep zero-shot learning
[398], [399] are novel, both paradigms are quite promising
in detecting new threats or patterns in mobile networks.
VIII. F UTURE R ESEARCH P ERSPECTIVES

Although deep learning is achieving increasingly promising
results in the mobile networking domain, several essential Fig. 10: An example of 3D mobile traffic surface (left) and
open research issues exist and are worthy of attention in 2D projection (right) on Milan, Italy. Figures adapted from
the future. In what follows, we discuss these challenges and [163] using data from [402].
pinpoint important mobile networking problems that can be
solved by deep learning. These can deliver insights into future Recent research suggests that data collected by mobile
mobile networking research. sensors (e.g. mobile traffic) over a city can be regarded
as pictures taken by panoramic cameras, which provide a
A. Serving Deep Learning with Massive and High-Quality city-scale sensing system for urban surveillance [403]. These
Data traffic sensing images enclose information associated with
Deep neural networks rely on massive and high-quality movements of large-scale of individuals [162]. We recognize
data to gain satisfying performance. When training a large that mobile traffic data, from both spatial and temporal di-
and complex architecture, data volume and quality are very mensions, have important similarity with videos, images or
important, as deeper model usually has a huge set of param- speech sequence, as been confirmed in [163]. We illustrate
eters to be learned and configured. This issue remains true their analogies in Fig. 11.
in mobile network applications. Unfortunately, unlike some
popular research areas such as computer vision and NLP, Mobile Traffic Evolutions
t t+s Video
there still lacks high-quality and large-scale labeled datasets
for mobile network applications, as service provides prefer
Latitude
Latitude
...
to keep their data confidential and reluctant to release their
datasets. This to some extent, restricts the development of Longitude Longitude
deep learning in the mobile networking domain. Moreover, Image
due to limitations of mobile sensors and network equipment,
mobile data collected are usually subjected to loss, redundancy, Mobile Traffic
Latitude
Snapshot
mislabeling and class imbalance, and thus cannot be directly
employed for training purpose.
Longitude
To establish an intelligent 5G mobile network architecture, Zooming
efficient and mature streamlining and platforms for mobile data Speech Signal
Amplitute
processing are in demand, to serve deep learning applications.
Traffic volume
Mobile Traffic
This requires considerable amount of research efforts for data Series
collection, transmission, cleaning, clustering and transforma-
tion. We appeal to researchers and companies to release more Time Frequency
datasets, which can dramatically advance the deep learning

applications in the mobile network area and benefit a wide Fig. 11: Analogies between mobile traffic data (left) and other
range of communities from both academic and industry. data (right).
Specifically, the evolution of large -scale mobile traffic

B. Deep Learning in Spatio-Temporal Mobile Traffic Data highly resemble videos, as they are both composed of a
Mining sequence of “frames”. If we focus on individual traffic snap-
Accurate analysis of mobile traffic data over a geographical shot, the spatial mobile traffic distribution is by analogy with
region is becoming increasingly essential for event localiza- images. Moreover, if we zoom into a narrow location to mea-
tion, network resource allocation, context-based advertising sure its long-term traffic consumption, we can observe that a
and urban planning [387]. However, due to the mobility of single traffic series looks similar to natural language sequence.
smartphone users, the spatio-temporal distribution of mobile These imply that, to some extent, well-established tools for
computer vision (e.g. CNN) or NLP (e.g. RNN, LSTM) can be promising candidates for mobile traffic analysis [163].

Beyond the similarity, we observe several properties that mobile traffic snapshots uniquely embrace, making them distinct from images or language sequences. Namely,
1) Values of neighboring pixels in fine-grained traffic snapshots normally do not change significantly, while this happens quite often in edge areas of natural images.
2) Single mobile traffic series usually exhibit regular periodicity (both daily and weekly), but such a property does not hold for pixels in videos.
3) Due to users' mobility, the mobile traffic consumption in a region is more likely to stay or shift to neighboring areas in the near future, while the values of pixels might not shift in videos.
Such properties of mobile traffic simplify their spatio-temporal correlations, which can be exploited as prior knowledge for model design. We recognize several unique advantages of employing deep learning for mobile traffic mining:
1) Deep learning performs remarkably in imaging and NLP applications, while mobile traffic has significant analogies to these data but with less spatio-temporal complexity.
2) LSTM captures well the temporal correlations and dependencies of time series data, while irregular and stochastic user mobility could be learned using the transformation abilities of deformable CNN [146].
3) Advanced GPU computing enables fast training of neural networks, and parallelism techniques can support low-latency mobile traffic analysis.
We expect that deep learning can effectively exploit the spatio-temporal dynamics to overcome the limitations of existing tools such as Exponential Smoothing [404] and the Autoregressive Integrated Moving Average model [405]. This may raise research on mobile traffic analysis to a higher level.

C. Deep Unsupervised Learning Powered Mobile Data Analysis

We observe that current deep learning practices in mobile networks largely employ supervised learning and reinforcement learning, while the potential of deep unsupervised learning is yet to be explored. However, mobile networks generate considerable amounts of unlabeled data every day, and data labeling is costly and requires domain-specific knowledge. To facilitate the analysis of raw mobile network data, unsupervised learning becomes essential in extracting insights from unlabeled data [406], so as to optimize the mobile network functionality to improve QoE.

Deep learning is resourceful in terms of unsupervised learning, as it provides excellent unsupervised tools such as AE, RBM and GAN. These models in general require light feature engineering and are thus promising for learning from heterogeneous and unstructured mobile data. In particular, deep AEs work well for unsupervised anomaly detection. One can use normal data only to train an AE, and regard test data that yield significantly higher reconstruction error than the training error as outliers [407]. Though less popular, RBM can perform layer-wise unsupervised pre-training, which can accelerate the overall model training process. GANs are good at imitating data distributions, and can thus be employed as simulators to mimic real mobile network environments. Recent research reveals that GANs can protect communications by crafting their own custom cryptography to avoid eavesdropping [408]. All these tools require further research to fulfill their full potential in the mobile networking domain.

D. Deep Reinforcement Learning Powered Mobile Network Control

Currently, many mobile network control problems have been solved by constrained optimization, dynamic programming and game theory approaches. Unfortunately, these methods either make strong assumptions about the objective functions (e.g. function convexity) or data distribution (e.g. Gaussian or Poisson distributed), or suffer from high time and space complexity. However, as mobile networks become increasingly complex, such assumptions sometimes turn unrealistic. The objective functions may further be affected by a large set of variables, which poses severe computational and memory challenges to these mathematical approaches.

On the other hand, apart from the Markov property, deep reinforcement learning does not make strong assumptions about the target system. It employs function approximation, which effectively addresses the problem of large state-action spaces, enabling reinforcement learning to scale to network control problems that were previously intractable. Inspired by its remarkable achievements in Atari [128] and the game of Go [409], a few researchers have started to apply DRL to complex network control problems, as reviewed in Sec. VI-D. However, we believe that this work only displays a small part of DRL's talent, and its potential in mobile network control remains largely unexplored. For instance, it can be exploited to extract rich features from cellular networks to enable intelligent switching on/off of base stations, reducing the infrastructure's energy footprint [410]. This should be feasible, as DeepMind trained a DRL agent which reduces Google's data center cooling bill by 40% (see footnote 17). These exciting applications make us believe DRL will raise the performance of mobile network control to a higher level in the future.

17 DeepMind AI Reduces Google Data Center Cooling Bill by 40%, https://deepmind.com/blog/deepmind-ai-reduces-google-data-centre-cooling-bill-40/

IX. CONCLUSIONS

Deep learning is playing an increasingly important role in the mobile and wireless networking domain. In this paper, we provided a comprehensive survey of recent work that lies at the intersection between these two different areas. We summarized both basic concepts and advanced principles of various deep learning models, then correlated the deep learning and mobile networking disciplines by reviewing work across different application scenarios. We discussed how to tailor deep learning models to general mobile networking applications, an aspect entirely overlooked by previous surveys. We concluded by pinpointing several open research issues and promising directions, which may lead to valuable future
research results. We hope this article will become a definitive guide for researchers and practitioners interested in applying machine intelligence to complex problems in mobile network environments.

ACKNOWLEDGEMENT

We would like to thank Zongzuo Wang for sharing valuable insights on deep learning, which helped improve the quality of this paper.

REFERENCES

[1] Cisco. Cisco Visual Networking Index: Forecast and Methodology, 2016-2021, June 2017.
[2] Ning Wang, Ekram Hossain, and Vijay K Bhargava. Backhauling 5G small cells: A radio resource management perspective. IEEE Wireless Communications, 22(5):41–49, 2015.
[3] Fabio Giust, Luca Cominardi, and Carlos J Bernardos. Distributed mobility management for future 5G networks: overview and analysis of existing approaches. IEEE Communications Magazine, 53(1):142–149, 2015.
[4] Mamta Agiwal, Abhishek Roy, and Navrati Saxena. Next generation 5G wireless networks: A comprehensive survey. IEEE Communications Surveys & Tutorials, 18(3):1617–1655, 2016.
[5] Akhil Gupta and Rakesh Kumar Jha. A survey of 5G network: Architecture and emerging technologies. IEEE Access, 3:1206–1232, 2015.
[6] Kan Zheng, Zhe Yang, Kuan Zhang, Periklis Chatzimisios, Kan Yang, and Wei Xiang. Big data-driven optimization for mobile networks toward 5G. IEEE Network, 30(1):44–51, 2016.
[7] Chunxiao Jiang, Haijun Zhang, Yong Ren, Zhu Han, Kwang-Cheng Chen, and Lajos Hanzo. Machine learning paradigms for next-generation wireless networks. IEEE Wireless Communications, 24(2):98–105, 2017.
[8] Duong D Nguyen, Hung X Nguyen, and Langford B White. Reinforcement learning with network-assisted feedback for heterogeneous RAT selection. IEEE Transactions on Wireless Communications, 2017.
[9] Fairuz Amalina Narudin, Ali Feizollah, Nor Badrul Anuar, and Abdullah Gani. Evaluation of machine learning classifiers for mobile malware detection. Soft Computing, 20(1):343–357, 2016.
[10] Kevin Hsieh, Aaron Harlap, Nandita Vijaykumar, Dimitris Konomis, Gregory R Ganger, Phillip B Gibbons, and Onur Mutlu. Gaia: Geo-distributed machine learning approaching LAN speeds. In USENIX Symposium on Networked Systems Design and Implementation (NSDI), pages 629–647, 2017.
[11] Wencong Xiao, Jilong Xue, Youshan Miao, Zhen Li, Cheng Chen, Ming Wu, Wei Li, and Lidong Zhou. Tux2: Distributed graph computation for machine learning. In USENIX Symposium on Networked Systems Design and Implementation (NSDI), pages 669–682, 2017.
[12] Monica Paolini and Senza Fili. Mastering Analytics: How to benefit from big data and network complexity: An Analyst Report. RCR Wireless News, 2017.
[13] Anastasia Ioannidou, Elisavet Chatzilari, Spiros Nikolopoulos, and Ioannis Kompatsiaris. Deep learning advances in computer vision with 3D data: A survey. ACM Computing Surveys (CSUR), 50(2):20, 2017.
[14] Richard Socher, Yoshua Bengio, and Christopher D Manning. Deep learning for NLP (without magic). In Tutorial Abstracts of ACL 2012, pages 5–5. Association for Computational Linguistics.
[15] IEEE Network special issue: Exploring Deep Learning for Efficient and Reliable Mobile Sensing. http://www.comsoc.org/netmag/cfp/exploring-deep-learning-efficient-and-reliable-mobile-sensing, 2017. [Online; accessed 14-July-2017].
[16] Mowei Wang, Yong Cui, Xin Wang, Shihan Xiao, and Junchen Jiang. Machine learning for networking: Workflow, advances and opportunities. IEEE Network, 2017.
[17] Mohammad Abu Alsheikh, Dusit Niyato, Shaowei Lin, Hwee-Pink Tan, and Zhu Han. Mobile big data analytics using deep learning and Apache Spark. IEEE Network, 30(3):22–29, 2016.
[18] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
[19] Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.
[20] Weibo Liu, Zidong Wang, Xiaohui Liu, Nianyin Zeng, Yurong Liu, and Fuad E Alsaadi. A survey of deep neural network architectures and their applications. Neurocomputing, 234:11–26, 2017.
[21] Li Deng, Dong Yu, et al. Deep learning: methods and applications. Foundations and Trends in Signal Processing, 7(3–4):197–387, 2014.
[22] Li Deng. A tutorial survey of architectures, algorithms, and applications for deep learning. APSIPA Transactions on Signal and Information Processing, 3, 2014.
[23] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT Press, 2016.
[24] Kai Arulkumaran, Marc Peter Deisenroth, Miles Brundage, and Anil Anthony Bharath. A brief survey of deep reinforcement learning. arXiv:1708.05866, 2017. To appear in IEEE Signal Processing Magazine, Special Issue on Deep Learning for Image Understanding.
[25] Xue-Wen Chen and Xiaotong Lin. Big data deep learning: challenges and perspectives. IEEE Access, 2:514–525, 2014.
[26] Maryam M Najafabadi, Flavio Villanustre, Taghi M Khoshgoftaar, Naeem Seliya, Randall Wald, and Edin Muharemagic. Deep learning applications and challenges in big data analytics. Journal of Big Data, 2(1):1, 2015.
[27] NF Hordri, A Samar, SS Yuhaniz, and SM Shamsuddin. A systematic literature review on features of deep learning in big data analytics. International Journal of Advances in Soft Computing & Its Applications, 9(1), 2017.
[28] Mehdi Gheisari, Guojun Wang, and Md Zakirul Alam Bhuiyan. A survey on deep learning in big data. In Computational Science and Engineering (CSE) and Embedded and Ubiquitous Computing (EUC), IEEE International Conference on, volume 2, pages 173–180, 2017.
[29] Shui Yu, Meng Liu, Wanchun Dou, Xiting Liu, and Sanming Zhou. Networking for big data: A survey. IEEE Communications Surveys & Tutorials, 19(1):531–549, 2017.
[30] Mohammad Abu Alsheikh, Shaowei Lin, Dusit Niyato, and Hwee-Pink Tan. Machine learning in wireless sensor networks: Algorithms, strategies, and applications. IEEE Communications Surveys & Tutorials, 16(4):1996–2018, 2014.
[31] Chun-Wei Tsai, Chin-Feng Lai, Ming-Chao Chiang, Laurence T Yang, et al. Data mining for Internet of things: A survey. IEEE Communications Surveys and Tutorials, 16(1):77–97, 2014.
[32] Xiang Cheng, Luoyang Fang, Xuemin Hong, and Liuqing Yang. Exploiting mobile big data: Sources, features, and applications. IEEE Network, 31(1):72–79, 2017.
[33] Mario Bkassiny, Yang Li, and Sudharman K Jayaweera. A survey on machine-learning techniques in cognitive radios. IEEE Communications Surveys & Tutorials, 15(3):1136–1159, 2013.
[34] Jeffrey G Andrews, Stefano Buzzi, Wan Choi, Stephen V Hanly, Angel Lozano, Anthony CK Soong, and Jianzhong Charlie Zhang. What will 5G be? IEEE Journal on Selected Areas in Communications, 32(6):1065–1082, 2014.
[35] Nisha Panwar, Shantanu Sharma, and Awadhesh Kumar Singh. A survey on 5G: The next generation of mobile communication. Physical Communication, 18:64–84, 2016.
[36] Olakunle Elijah, Chee Yen Leow, Tharek Abdul Rahman, Solomon Nunoo, and Solomon Zakwoi Iliya. A comprehensive survey of pilot contamination in massive MIMO–5G system. IEEE Communications Surveys & Tutorials, 18(2):905–923, 2016.
[37] Stefano Buzzi, I Chih-Lin, Thierry E Klein, H Vincent Poor, Chenyang Yang, and Alessio Zappone. A survey of energy-efficient techniques for 5G networks and challenges ahead. IEEE Journal on Selected Areas in Communications, 34(4):697–709, 2016.
[38] Mugen Peng, Yong Li, Zhongyuan Zhao, and Chonggang Wang. System architecture and key technologies for 5G heterogeneous cloud radio access networks. IEEE Network, 29(2):6–14, 2015.
[39] Yong Niu, Yong Li, Depeng Jin, Li Su, and Athanasios V Vasilakos. A survey of millimeter wave communications (mmwave) for 5G: opportunities and challenges. Wireless Networks, 21(8):2657–2676, 2015.
[40] Xenofon Foukas, Georgios Patounas, Ahmed Elmokashfi, and Mahesh K Marina. Network slicing in 5G: Survey and challenges. IEEE Communications Magazine, 55(5):94–100, 2017.
[41] Tarik Taleb, Konstantinos Samdanis, Badr Mada, Hannu Flinck, Sunny Dutta, and Dario Sabella. On multi-access edge computing: A survey of the emerging 5G network edge architecture & orchestration. IEEE Communications Surveys & Tutorials, 2017.
[42] Pavel Mach and Zdenek Becvar. Mobile edge computing: A survey on architecture and computation offloading. IEEE Communications Surveys & Tutorials, 2017.
[43] Yuyi Mao, Changsheng You, Jun Zhang, Kaibin Huang, and Khaled B Letaief. A survey on mobile edge computing: The communication perspective. IEEE Communications Surveys & Tutorials, 2017.
[44] Ying Wang, Peilong Li, Lei Jiao, Zhou Su, Nan Cheng, Xuemin Sherman Shen, and Ping Zhang. A data-driven architecture for personalized QoE management in 5G wireless networks. IEEE Wireless Communications, 24(1):102–110, 2017.
[45] Qilong Han, Shuang Liang, and Hongli Zhang. Mobile cloud sensing, big data, and 5G networks make an intelligent and smart world. IEEE Network, 29(2):40–45, 2015.
[46] Sukhdeep Singh, Navrati Saxena, Abhishek Roy, and HanSeok Kim. f. IETE Technical Review, 34(1):30–39, 2017.
[47] Min Chen, Jun Yang, Yixue Hao, Shiwen Mao, and Kai Hwang. A 5G cognitive system for healthcare. Big Data and Cognitive Computing, 1(1):2, 2017.
[48] Teodora Sandra Buda, Haytham Assem, Lei Xu, Danny Raz, Udi Margolin, Elisha Rosensweig, Diego R Lopez, Marius-Iulian Corici, Mikhail Smirnov, Robert Mullins, et al. Can machine learning aid in delivering new use cases and scenarios in 5G? In Network Operations and Management Symposium (NOMS), 2016 IEEE/IFIP, pages 1279–1284, 2016.
[49] Ali Imran, Ahmed Zoha, and Adnan Abu-Dayya. Challenges in 5G: how to empower SON with big data for enabling 5G. IEEE Network, 28(6):27–33, 2014.
[50] Bharath Keshavamurthy and Mohammad Ashraf. Conceptual design of proactive SONs based on the big data framework for 5G cellular networks: A novel machine learning perspective facilitating a shift in the SON paradigm. In System Modeling & Advancement in Research Trends (SMART), International Conference, pages 298–304. IEEE, 2016.
[51] Paulo Valente Klaine, Muhammad Ali Imran, Oluwakayode Onireti, and Richard Demo Souza. A survey of machine learning techniques applied to self organizing cellular networks. IEEE Communications Surveys and Tutorials, 2017.
[52] Rongpeng Li, Zhifeng Zhao, Xuan Zhou, Guoru Ding, Yan Chen, Zhongyao Wang, and Honggang Zhang. Intelligent 5G: When cellular networks meet artificial intelligence. IEEE Wireless Communications, 2017.
[53] Nicola Bui, Matteo Cesana, S Amir Hosseini, Qi Liao, Ilaria Malanchini, and Joerg Widmer. A survey of anticipatory mobile networking: Context-based classification, prediction methodologies, and optimization techniques. IEEE Communications Surveys & Tutorials, 2017.
[54] Panagiotis Kasnesis, Charalampos Patrikakis, and Iakovos Venieris. Changing the game of mobile data analysis with deep learning. IT Professional, 2017.
[55] Xiang Cheng, Luoyang Fang, Liuqing Yang, and Shuguang Cui. Mobile big data: The fuel for data-driven wireless. IEEE Internet of Things Journal, 2017.
[56] Lidong Wang and Randy Jones. Big data analytics for network intrusion detection: A survey. International Journal of Networks and Communications, 7(1):24–31, 2017.
[57] Nei Kato, Zubair Md Fadlullah, Bomin Mao, Fengxiao Tang, Osamu Akashi, Takeru Inoue, and Kimihiro Mizutani. The deep learning vision for heterogeneous network traffic control: proposal, challenges, and future perspective. IEEE Wireless Communications, 24(3):146–153, 2017.
[58] Michele Zorzi, Andrea Zanella, Alberto Testolin, Michele De Filippo De Grazia, and Marco Zorzi. Cognition-based networks: A new perspective on network optimization using learning and distributed intelligence. IEEE Access, 3:1512–1530, 2015.
[59] Zubair Fadlullah, Fengxiao Tang, Bomin Mao, Nei Kato, Osamu Akashi, Takeru Inoue, and Kimihiro Mizutani. State-of-the-art deep learning: Evolving machine intelligence toward tomorrow's intelligent network traffic control systems. IEEE Communications Surveys & Tutorials, 2017.
[60] Mehdi Mohammadi, Ala Al-Fuqaha, Sameh Sorour, and Mohsen Guizani. Deep learning for IoT big data and streaming analytics: A survey. arXiv preprint arXiv:1712.04301, 2017.
[61] Nauman Ahad, Junaid Qadir, and Nasir Ahsan. Neural networks in wireless networks: Techniques, applications and guidelines. Journal of Network and Computer Applications, 68:1–27, 2016.
[62] Ammar Gharaibeh, Mohammad A Salahuddin, Sayed J Hussini, Abdallah Khreishah, Issa Khalil, Mohsen Guizani, and Ala Al-Fuqaha. Smart cities: A survey on data management, security and enabling technologies. IEEE Communications Surveys & Tutorials, 2017.
[63] Nicholas D Lane and Petko Georgiev. Can deep learning revolutionize mobile sensing? In Proceedings of the 16th International Workshop on Mobile Computing Systems and Applications, pages 117–122. ACM, 2015.
[64] Kaoru Ota, Minh Son Dao, Vasileios Mezaris, and Francesco GB De Natale. Deep learning for mobile multimedia: A survey. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 13(3s):34, 2017.
[65] Shuai Zhang, Lina Yao, and Aixin Sun. Deep learning based recommender system: A survey and new perspectives. arXiv preprint arXiv:1707.07435, 2017.
[66] Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pages 1928–1937, 2016.
[67] Martin Arjovsky, Soumith Chintala, and Leon Bottou. Wasserstein generative adversarial networks. In International Conference on Machine Learning (2017), 2017.
[68] W. McCulloch and W. Pitts. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, (5).
[69] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by back-propagating errors. Nature, 323(6088):533–538, 1986.
[70] Yann LeCun, Yoshua Bengio, et al. Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks, 3361(10):1995, 1995.
[71] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
[72] Pedro Domingos. A few useful things to know about machine learning. Commun. ACM, 55(10):78–87, 2012.
[73] Ivor W Tsang, James T Kwok, and Pak-Ming Cheung. Core vector machines: Fast SVM training on very large data sets. Journal of Machine Learning Research, 6:363–392, 2005.
[74] Carl Edward Rasmussen and Christopher KI Williams. Gaussian processes for machine learning, volume 1. MIT Press Cambridge, 2006.
[75] Nicolas Le Roux and Yoshua Bengio. Representational power of restricted Boltzmann machines and deep belief networks. Neural Computation, 20(6):1631–1649, 2008.
[76] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
[77] Florian Schroff, Dmitry Kalenichenko, and James Philbin. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 815–823, 2015.
[78] Diederik P Kingma, Shakir Mohamed, Danilo Jimenez Rezende, and Max Welling. Semi-supervised learning with deep generative models. In Advances in Neural Information Processing Systems, pages 3581–3589, 2014.
[79] Russell Stewart and Stefano Ermon. Label-free supervision of neural networks with physics and domain knowledge. In AAAI, pages 2576–2582, 2017.
[80] Danilo Rezende, Ivo Danihelka, Karol Gregor, Daan Wierstra, et al. One-shot generalization in deep generative models. In International Conference on Machine Learning, pages 1521–1529, 2016.
[81] Richard Socher, Milind Ganjoo, Christopher D Manning, and Andrew Ng. Zero-shot learning through cross-modal transfer. In Advances in Neural Information Processing Systems, pages 935–943, 2013.
[82] Petko Georgiev, Sourav Bhattacharya, Nicholas D Lane, and Cecilia Mascolo. Low-resource multi-task audio sensing for mobile and embedded devices via shared deep neural network representations. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 1(3):50, 2017.
[83] Norman P Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, et al. In-datacenter performance analysis of a tensor processing unit. arXiv preprint arXiv:1704.04760, 2017.
[84] John Nickolls, Ian Buck, Michael Garland, and Kevin Skadron. Scalable parallel programming with CUDA. Queue, 6(2):40–53, 2008.
[85] Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, and Evan Shelhamer. cuDNN: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759, 2014.
[86] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving,
Michael Isard, et al. TensorFlow: A system for large-scale machine learning. In OSDI, volume 16, pages 265–283, 2016.
[87] Theano Development Team. Theano: A Python framework for fast computation of mathematical expressions. arXiv e-prints, abs/1605.02688, May 2016.
[88] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
[89] R. Collobert, K. Kavukcuoglu, and C. Farabet. Torch7: A Matlab-like environment for machine learning. In BigLearn, NIPS Workshop, 2011.
[90] Vinayak Gokhale, Jonghoon Jin, Aysegul Dundar, Berin Martini, and Eugenio Culurciello. A 240 G-ops/s mobile coprocessor for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 682–687, 2014.
[91] ncnn is a high-performance neural network inference framework optimized for the mobile platform. https://github.com/Tencent/ncnn, 2017. [Online; accessed 25-July-2017].
[92] Huawei announces the Kirin 970- new flagship SoC with AI capabilities. http://www.androidauthority.com/huawei-announces-kirin-970-797788/, 2017. [Online; accessed 01-Sep-2017].
[93] Core ML: Integrate machine learning models into your app. https://developer.apple.com/documentation/coreml, 2017. [Online; accessed 25-July-2017].
[94] Ilya Sutskever, James Martens, George E Dahl, and Geoffrey E Hinton. On the importance of initialization and momentum in deep learning. ICML (3), 28:1139–1147, 2013.
[95] Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Andrew Senior, Paul Tucker, Ke Yang, Quoc V Le, et al. Large scale distributed deep networks. In Advances in neural information processing systems, pages 1223–1231, 2012.
[96] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. International Conference on Learning Representations (ICLR), 2015.
[97] Tim Kraska, Ameet Talwalkar, John C Duchi, Rean Griffith, Michael J Franklin, and Michael I Jordan. MLbase: A distributed machine-learning system. In CIDR, volume 1, pages 2–1, 2013.
[98] Trishul M Chilimbi, Yutaka Suzue, Johnson Apacible, and Karthik Kalyanaraman. Project adam: Building an efficient and scalable deep learning training system. In OSDI, volume 14, pages 571–582, 2014.
[99] Henggang Cui, Hao Zhang, Gregory R Ganger, Phillip B Gibbons, and Eric P Xing. Geeps: Scalable deep learning on distributed GPUs with a GPU-specialized parameter server. In Proceedings of the Eleventh European Conference on Computer Systems, page 4. ACM, 2016.
[100] Ryan Spring and Anshumali Shrivastava. Scalable and sustainable deep learning via randomized hashing. ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2017.
[101] Azalia Mirhoseini, Hieu Pham, Quoc V Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Naveen Kumar, Mohammad Norouzi, Samy Bengio, and Jeff Dean. Device placement optimization with reinforcement learning. International Conference on Machine Learning, 2017.
[102] Eric P Xing, Qirong Ho, Wei Dai, Jin Kyu Kim, Jinliang Wei, Seunghak Lee, Xun Zheng, Pengtao Xie, Abhimanu Kumar, and Yaoliang Yu. Petuum: A new platform for distributed machine learning on big data. IEEE Transactions on Big Data, 1(2):49–67, 2015.
[103] Moustafa Alzantot, Yingnan Wang, Zhengshuang Ren, and Mani B Srivastava. RSTensorFlow: GPU enabled tensorflow for deep learning on commodity Android devices. In Proceedings of the 1st International Workshop on Deep Learning for Mobile Systems and Applications, pages 7–12. ACM, 2017.
[104] Hao Dong, Akara Supratak, Luo Mai, Fangde Liu, Axel Oehmichen, Simiao Yu, and Yike Guo. TensorLayer: A versatile library for efficient deep learning development. In Proceedings of the 2017 ACM on Multimedia Conference, MM ’17, pages 1201–1204, 2017.
[105] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in pytorch. 2017.
[106] Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274, 2015.
[107] Sebastian Ruder. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747, 2016.
[108] Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W Hoffman, David Pfau, Tom Schaul, and Nando de Freitas. Learning to learn by gradient descent by gradient descent. In Advances in Neural Information Processing Systems, pages 3981–3989, 2016.
[109] Wei Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. TernGrad: Ternary gradients to reduce communication in distributed deep learning. In Advances in neural information processing systems, 2017.
[110] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015.
[111] Flavio Bonomi, Rodolfo Milito, Preethi Natarajan, and Jiang Zhu. Fog computing: A platform for internet of things and analytics. In Big Data and Internet of Things: A Roadmap for Smart Environments, pages 169–186. Springer, 2014.
[112] Jiachen Mao, Xiang Chen, Kent W Nixon, Christopher Krieger, and Yiran Chen. MoDNN: Local distributed mobile computing system for deep neural network. In 2017 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1396–1401. IEEE, 2017.
[113] Suyoung Bang, Jingcheng Wang, Ziyun Li, Cao Gao, Yejoong Kim, Qing Dong, Yen-Po Chen, Laura Fick, Xun Sun, Ron Dreslinski, et al. 14.7 a 288µw programmable deep-learning processor with 270kb on-chip weight storage using non-uniform memory hierarchy for mobile intelligence. In IEEE International Conference on Solid-State Circuits (ISSCC), pages 250–251, 2017.
[114] Filipp Akopyan. Design and tool flow of ibm’s truenorth: an ultra-low power programmable neurosynaptic chip with 1 million neurons. In Proceedings of the 2016 on International Symposium on Physical Design, pages 59–60. ACM, 2016.
[115] Seyyed Salar Latifi Oskouei, Hossein Golestani, Matin Hashemi, and Soheil Ghiasi. Cnndroid: GPU-accelerated execution of trained deep convolutional neural networks on Android. In Proceedings of the 2016 ACM on Multimedia Conference, pages 1201–1205. ACM, 2016.
[116] Corinna Cortes, Xavi Gonzalvo, Vitaly Kuznetsov, Mehryar Mohri, and Scott Yang. Adanet: Adaptive structural learning of artificial neural networks. ICML, 2017.
[117] Geoffrey E Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural computation, 18(7):1527–1554, 2006.
[118] Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Y Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th annual international conference on machine learning, pages 609–616. ACM, 2009.
[119] Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, and Pierre-Antoine Manzagol. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11(Dec):3371–3408, 2010.
[120] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. International Conference on Learning Representations (ICLR), 2014.
[121] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
[122] Shuiwang Ji, Wei Xu, Ming Yang, and Kai Yu. 3D convolutional neural networks for human action recognition. IEEE transactions on pattern analysis and machine intelligence, 35(1):221–231, 2013.
[123] Gao Huang, Zhuang Liu, Kilian Q Weinberger, and Laurens van der Maaten. Densely connected convolutional networks. IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[124] Felix A Gers, Jürgen Schmidhuber, and Fred Cummins. Learning to forget: Continual prediction with LSTM. 1999.
[125] Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. In Advances in neural information processing systems, pages 3104–3112, 2014.
[126] Shi Xingjian, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-Kin Wong, and Wang-chun Woo. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Advances in neural information processing systems, pages 802–810, 2015.
[127] Guo-Jun Qi. Loss-sensitive generative adversarial networks on Lipschitz densities. arXiv preprint arXiv:1701.06264, 2017.
[128] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
[129] David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis
Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
[130] Ronan Collobert and Samy Bengio. Links between perceptrons, MLPs and SVMs. In Proceedings of the twenty-first international conference on Machine learning, page 23. ACM, 2004.
[131] Geoffrey E Hinton. Training products of experts by minimizing contrastive divergence. Neural computation, 14(8):1771–1800, 2002.
[132] George Casella and Edward I George. Explaining the gibbs sampler. The American Statistician, 46(3):167–174, 1992.
[133] Takashi Kuremoto, Masanao Obayashi, Kunikazu Kobayashi, Takaomi Hirata, and Shingo Mabu. Forecast chaotic time series data by DBNs. In Image and Signal Processing (CISP), 2014 7th International Congress on, pages 1130–1135. IEEE, 2014.
[134] Yann Dauphin and Yoshua Bengio. Stochastic ratio matching of RBMs for sparse high-dimensional inputs. In Advances in Neural Information Processing Systems, pages 1340–1348, 2013.
[135] Tara N Sainath, Brian Kingsbury, Bhuvana Ramabhadran, Petr Fousek, Petr Novak, and Abdel-rahman Mohamed. Making deep belief networks effective for large vocabulary continuous speech recognition. In Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on, pages 30–35, 2011.
[136] Yoshua Bengio et al. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127, 2009.
[137] Mayu Sakurada and Takehisa Yairi. Anomaly detection using autoencoders with nonlinear dimensionality reduction. In Proceedings Workshop on Machine Learning for Sensory Data Analysis (MLSDA), page 4. ACM, 2014.
[138] Miguel Nicolau, James McDermott, et al. A hybrid autoencoder and density estimation model for anomaly detection. In International Conference on Parallel Problem Solving from Nature, pages 717–726. Springer, 2016.
[139] Vrizlynn LL Thing. IEEE 802.11 network anomaly detection and attack classification: A deep learning approach. In IEEE Wireless Communications and Networking Conference (WCNC), pages 1–6, 2017.
[140] Bomin Mao, Zubair Md Fadlullah, Fengxiao Tang, Nei Kato, Osamu Akashi, Takeru Inoue, and Kimihiro Mizutani. Routing or computing? the paradigm shift towards intelligent computer network packet transmission based on deep learning. IEEE Transactions on Computers, 2017.
[141] Valentin Radu, Nicholas D Lane, Sourav Bhattacharya, Cecilia Mascolo, Mahesh K Marina, and Fahim Kawsar. Towards multimodal deep learning for activity recognition on mobile devices. In Proceedings of ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct, pages 185–188, 2016.
[142] Ramachandra Raghavendra and Christoph Busch. Learning deeply coupled autoencoders for smartphone based robust periocular verification. In Image Processing (ICIP), 2016 IEEE International Conference on, pages 325–329, 2016.
[143] Jing Li, Jingyuan Wang, and Zhang Xiong. Wavelet-based stacked denoising autoencoders for cell phone base station user number prediction. In IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), pages 833–838, 2016.
[144] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015.
[145] Yunho Jeon and Junmo Kim. Active convolution: Learning the shape of convolution for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[146] Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, and Yichen Wei. Deformable convolutional networks. arXiv preprint arXiv:1703.06211, 2017.
[147] Yoshua Bengio, Patrice Simard, and Paolo Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE transactions on neural networks, 5(2):157–166, 1994.
[148] Alex Graves, Navdeep Jaitly, and Abdel-rahman Mohamed. Hybrid speech recognition with deep bidirectional LSTM. In Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on, pages 273–278, 2013.
[149] Rie Johnson and Tong Zhang. Supervised and semi-supervised text categorization using LSTM for region embeddings. In International Conference on Machine Learning, pages 526–534, 2016.
[150] Francisco Javier Ordóñez and Daniel Roggen. Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition. Sensors, 16(1):115, 2016.
[151] Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[152] Jianan Li, Xiaodan Liang, Yunchao Wei, Tingfa Xu, Jiashi Feng, and Shuicheng Yan. Perceptual generative adversarial networks for small object detection. In IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[153] Yijun Li, Sifei Liu, Jimei Yang, and Ming-Hsuan Yang. Generative face completion. In IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[154] Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, and Sergey Levine. Continuous deep Q-learning with model-based acceleration. In International Conference on Machine Learning, pages 2829–2838, 2016.
[155] Matej Moravčík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson, and Michael Bowling. Deepstack: Expert-level artificial intelligence in heads-up no-limit poker. Science, 356(6337):508–513, 2017.
[156] Sergey Levine, Peter Pastor, Alex Krizhevsky, Julian Ibarz, and Deirdre Quillen. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. The International Journal of Robotics Research, page 0278364917710318, 2016.
[157] Ahmad EL Sallab, Mohammed Abdou, Etienne Perot, and Senthil Yogamani. Deep reinforcement learning framework for autonomous driving. Electronic Imaging, 2017(19):70–76, 2017.
[158] Rongpeng Li, Zhifeng Zhao, Xianfu Chen, Jacques Palicot, and Honggang Zhang. Tact: A transfer actor-critic learning framework for energy saving in cellular radio access networks. IEEE Transactions on Wireless Communications, 13(4):2000–2011, 2014.
[159] Hasan AA Al-Rawi, Ming Ann Ng, and Kok-Lim Alvin Yau. Application of reinforcement learning to routing in distributed wireless networks: a review. Artificial Intelligence Review, 43(3):381–416, 2015.
[160] Yan-Jun Liu, Li Tang, Shaocheng Tong, CL Philip Chen, and Dong-Juan Li. Reinforcement learning design-based adaptive tracking control with less learning parameters for nonlinear discrete-time MIMO systems. IEEE Transactions on Neural Networks and Learning Systems, 26(1):165–176, 2015.
[161] Demetrios Zeinalipour Yazti and Shonali Krishnaswamy. Mobile big data analytics: research, practice, and opportunities. In Mobile Data Management (MDM), 2014 IEEE 15th International Conference on, volume 1, pages 1–2. IEEE, 2014.
[162] Diala Naboulsi, Marco Fiore, Stephane Ribot, and Razvan Stanica. Large-scale mobile traffic analysis: a survey. IEEE Communications Surveys & Tutorials, 18(1):124–161, 2016.
[163] Chaoyun Zhang, Xi Ouyang, and Paul Patras. ZipNet-GAN: Inferring fine-grained mobile traffic patterns via a generative adversarial neural network. In Proceedings of the 13th ACM Conference on Emerging Networking Experiments and Technologies. ACM, 2017.
[164] Victor C Liang, Richard TB Ma, Wee Siong Ng, Li Wang, Marianne Winslett, Huayu Wu, Shanshan Ying, and Zhenjie Zhang. Mercury: Metro density prediction with recurrent neural network on streaming CDR data. In Data Engineering (ICDE), 2016 IEEE 32nd International Conference on, pages 1374–1377, 2016.
[165] Jiquan Ngiam, Aditya Khosla, Mingyu Kim, Juhan Nam, Honglak Lee, and Andrew Y Ng. Multimodal deep learning. In Proceedings of the 28th international conference on machine learning (ICML-11), pages 689–696, 2011.
[166] Laura Pierucci and Davide Micheli. A neural network for quality of experience estimation in mobile communications. IEEE MultiMedia, 23(4):42–49, 2016.
[167] Youngjune Gwon and HT Kung. Inferring origin flow patterns in Wi-Fi with deep learning. In ICAC, pages 73–83, 2014.
[168] Laisen Nie, Dingde Jiang, Shui Yu, and Houbing Song. Network traffic prediction based on deep belief network in wireless mesh backbone networks. In Wireless Communications and Networking Conference (WCNC), 2017 IEEE, pages 1–5, 2017.
[169] Vusumuzi Moyo et al. The generalization ability of artificial neural networks in forecasting TCP/IP traffic trends: How much does the size of learning rate matter? International Journal of Computer Science and Application, 2015.
[170] Jing Wang, Jian Tang, Zhiyuan Xu, Yanzhi Wang, Guoliang Xue, Xing Zhang, and Dejun Yang. Spatiotemporal modeling and prediction in cellular networks: A big data enabled deep learning approach. In
INFOCOM–36th Annual IEEE International Conference on Computer Communications, 2017.
[171] Chaoyun Zhang and Paul Patras. Long-term mobile traffic forecasting using deep spatio-temporal neural networks. arXiv preprint arXiv:1712.08083, 2017.
[172] Zhanyi Wang. The applications of deep learning on traffic identification. BlackHat USA, 2015.
[173] Wei Wang, Ming Zhu, Jinlin Wang, Xuewen Zeng, and Zhongzhen Yang. End-to-end encrypted traffic classification with one-dimensional convolution neural networks. In Intelligence and Security Informatics (ISI), 2017 IEEE International Conference on, pages 43–48, 2017.
[174] Mohammad Lotfollahi, Ramin Shirali, Mahdi Jafari Siavoshani, and Mohammdsadegh Saberian. Deep packet: A novel approach for encrypted traffic classification using deep learning. arXiv preprint arXiv:1709.02656, 2017.
[175] Wei Wang, Ming Zhu, Xuewen Zeng, Xiaozhou Ye, and Yiqiang Sheng. Malware traffic classification using convolutional neural network for representation learning. In Information Networking (ICOIN), 2017 International Conference on, pages 712–717. IEEE, 2017.
[176] Bjarke Felbo, Pål Sundsøy, Alex 'Sandy' Pentland, Sune Lehmann, and Yves-Alexandre de Montjoye. Using deep learning to predict demographics from mobile phone metadata. International Conference on Learning Representations (ICLR), workshop track, 2016.
[177] Nai Chun Chen, Wanqin Xie, Roy E Welsch, Kent Larson, and Jenny Xie. Comprehensive predictions of tourists’ next visit location based on call detail records using machine learning and deep learning methods. In Big Data (BigData Congress), 2017 IEEE International Congress on, pages 1–6, 2017.
[178] Ziheng Lin, Mogeng Yin, Sidney Feygin, Madeleine Sheehan, Jean-Francois Paiement, and Alexei Pozdnoukhov. Deep generative models of urban mobility. 2017.
[179] Chang Xu, Kuiyu Chang, Khee-Chin Chua, Meishan Hu, and Zhenxiang Gao. Large-scale Wi-Fi hotspot classification via deep learning. In Proceedings of the 26th International Conference on World Wide Web Companion, pages 857–858. International World Wide Web Conferences Steering Committee, 2017.
[180] Ala Al-Fuqaha, Mohsen Guizani, Mehdi Mohammadi, Mohammed Aledhari, and Moussa Ayyash. Internet of things: A survey on enabling technologies, protocols, and applications. IEEE Communications Surveys & Tutorials, 17(4):2347–2376, 2015.
[181] Suranga Seneviratne, Yining Hu, Tham Nguyen, Guohao Lan, Sara Khalifa, Kanchana Thilakarathna, Mahbub Hassan, and Aruna Seneviratne. A survey of wearable devices and challenges. IEEE Communications Surveys & Tutorials, 2017.
[182] He Li, Kaoru Ota, and Mianxiong Dong. Learning IoT in edge: Deep learning for the Internet of Things with edge computing. IEEE Network, 32(1):96–101, 2018.
[183] Nicholas D Lane, Sourav Bhattacharya, Petko Georgiev, Claudio Forlivesi, and Fahim Kawsar. An early resource characterization of deep learning on wearables, smartphones and internet-of-things devices. In Proceedings of the 2015 International Workshop on Internet of Things towards Applications, pages 7–12. ACM, 2015.
[184] Sicong Liu and Junzhao Du. Poster: Mobiear-building an environment-independent acoustic sensing platform for the deaf using deep learning. In Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services Companion, pages 50–50. ACM, 2016.
[185] Liu Sicong, Zhou Zimu, Du Junzhao, Shangguan Longfei, Jun Han, and Xin Wang. Ubiear: Bringing location-independent sound awareness to the hard-of-hearing people with smartphones. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 1(2):17, 2017.
[186] Vasu Jindal. Integrating mobile and cloud for PPG signal selection to monitor heart rate during intensive physical exercise. In Proceedings of the International Workshop on Mobile Software Engineering and Systems, pages 36–37. ACM, 2016.
[187] Edward Kim, Miguel Corte-Real, and Zubair Baloch. A deep semantic mobile application for thyroid cytopathology. In Proc. SPIE, volume 9789, page 97890A, 2016.
[188] Aarti Sathyanarayana, Shafiq Joty, Luis Fernandez-Luque, Ferda Ofli, Jaideep Srivastava, Ahmed Elmagarmid, Teresa Arora, and Shahrad Taheri. Sleep quality prediction from wearable data using deep learning. JMIR mHealth and uHealth, 4(4), 2016.
[189] Honggui Li and Maria Trocan. Personal health indicators by deep learning of smart phone sensor data. In Cybernetics (CYBCONF), 2017 3rd IEEE International Conference on, pages 1–5, 2017.
[190] Mohammad-Parsa Hosseini, Tuyen X Tran, Dario Pompili, Kost Elisevich, and Hamid Soltanian-Zadeh. Deep learning with edge computing for localization of epileptogenicity using multimodal rs-fMRI and EEG big data. In Autonomic Computing (ICAC), 2017 IEEE International Conference on, pages 83–92, 2017.
[191] Cosmin Stamate, George D Magoulas, Stefan Küppers, Effrosyni Nomikou, Ioannis Daskalopoulos, Marco U Luchini, Theano Moussouri, and George Roussos. Deep learning parkinson’s from smartphone data. In Pervasive Computing and Communications (PerCom), 2017 IEEE International Conference on, pages 31–40, 2017.
[192] Tom Quisel, Luca Foschini, Alessio Signorini, and David C Kale. Collecting and analyzing millions of mhealth data streams. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1971–1980. ACM, 2017.
[193] Usman Mahmood Khan, Zain Kabir, Syed Ali Hassan, and Syed Hassan Ahmed. A deep learning framework using passive WiFi sensing for respiration monitoring. arXiv preprint arXiv:1704.05708, 2017.
[194] Dawei Li, Theodoros Salonidis, Nirmit V Desai, and Mooi Choo Chuah. Deepcham: Collaborative edge-mediated adaptive deep learning for mobile object recognition. In Edge Computing (SEC), IEEE/ACM Symposium on, pages 64–76, 2016.
[195] Luis Tobías, Aurélien Ducournau, François Rousseau, Grégoire Mercier, and Ronan Fablet. Convolutional neural networks for object recognition on mobile devices: A case study. In Pattern Recognition (ICPR), 2016 23rd International Conference on, pages 3530–3535. IEEE, 2016.
[196] Parisa Pouladzadeh and Shervin Shirmohammadi. Mobile multi-food recognition using deep learning. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 13(3s):36, 2017.
[197] Ryosuke Tanno, Koichi Okamoto, and Keiji Yanai. DeepFoodCam: A DCNN-based real-time mobile food recognition system. In Proceedings of the 2nd International Workshop on Multimedia Assisted Dietary Management, pages 89–89. ACM, 2016.
[198] Pallavi Kuhad, Abdulsalam Yassine, and Shervin Shirmohammadi. Using distance estimation and deep learning to simplify calibration in food calorie measurement. In Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA), 2015 IEEE International Conference on, pages 1–6, 2015.
[199] Teng Teng and Xubo Yang. Facial expressions recognition based on convolutional neural networks for mobile virtual reality. In Proceedings of the 15th ACM SIGGRAPH Conference on Virtual-Reality Continuum and Its Applications in Industry-Volume 1, pages 475–478. ACM, 2016.
[200] Wu Liu, Huadong Ma, Heng Qi, Dong Zhao, and Zhineng Chen. Deep learning hashing for mobile visual search. EURASIP Journal on Image and Video Processing, 2017(1):17, 2017.
[201] Jinmeng Rao, Yanjun Qiao, Fu Ren, Junxing Wang, and Qingyun Du. A mobile outdoor augmented reality method combining deep learning object detection and spatial relationships for geovisualization. Sensors, 17(9):1951, 2017.
[202] Kazuya Ohara, Takuya Maekawa, and Yasuyuki Matsushita. Detecting state changes of indoor everyday objects using Wi-Fi channel state information. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 1(3):88, 2017.
[203] Ming Zeng, Le T Nguyen, Bo Yu, Ole J Mengshoel, Jiang Zhu, Pang Wu, and Joy Zhang. Convolutional neural networks for human activity recognition using mobile sensors. In Mobile Computing, Applications and Services (MobiCASE), 2014 6th International Conference on, pages 197–205. IEEE, 2014.
[204] Bandar Almaslukh, Jalal AlMuhtadi, and Abdelmonim Artoli. An effective deep autoencoder approach for online smartphone-based human activity recognition. International Journal of Computer Science and Network Security (IJCSNS), 17(4):160, 2017.
[205] Xinyu Li, Yanyi Zhang, Ivan Marsic, Aleksandra Sarcevic, and Randall S Burd. Deep learning for RFID-based activity recognition. In Proceedings of the 14th ACM Conference on Embedded Network Sensor Systems CD-ROM, pages 164–175. ACM, 2016.
[206] Sourav Bhattacharya and Nicholas D Lane. From smart to deep: Robust activity recognition on smartwatches using deep learning. In Pervasive Computing and Communication Workshops (PerCom Workshops), 2016 IEEE International Conference on, pages 1–6, 2016.
[207] Antreas Antoniou and Plamen Angelov. A general purpose intelligent surveillance system for mobile devices using deep learning. In Neural Networks (IJCNN), 2016 International Joint Conference on, pages 2879–2886. IEEE, 2016.
[208] Saiwen Wang, Jie Song, Jaime Lien, Ivan Poupyrev, and Otmar Hilliges. Interacting with Soli: Exploring fine-grained dynamic gesture
recognition in the radio-frequency spectrum. In Proceedings of the 29th Annual Symposium on User Interface Software and Technology, pages 851–860. ACM, 2016.
[209] Yang Gao, Ning Zhang, Honghao Wang, Xiang Ding, Xu Ye, Guanling Chen, and Yu Cao. ihear food: Eating detection using commodity bluetooth headsets. In Connected Health: Applications, Systems and Engineering Technologies (CHASE), 2016 IEEE First International Conference on, pages 163–172, 2016.
[210] Jindan Zhu, Amit Pande, Prasant Mohapatra, and Jay J Han. Using deep learning for energy expenditure estimation with wearable sensors. In E-health Networking, Application & Services (HealthCom), 2015 17th International Conference on, pages 501–506. IEEE, 2015.
[211] Pål Sundsøy, Johannes Bjelland, B Reme, A Iqbal, and Eaman Jahani. Deep learning applied to mobile phone data for individual income classification. ICAITA doi, 10, 2016.
[212] Yuqing Chen and Yang Xue. A deep learning approach to human activity recognition based on single accelerometer. In Systems, Man, and Cybernetics (SMC), 2015 IEEE International Conference on, pages 1488–1492, 2015.
[213] Sojeong Ha and Seungjin Choi. Convolutional neural networks for human activity recognition using multiple accelerometer and gyroscope sensors. In Neural Networks (IJCNN), 2016 International Joint Conference on, pages 381–388. IEEE, 2016.
[214] Marcus Edel and Enrico Köppe. Binarized-BLSTM-RNN based human activity recognition. In Indoor Positioning and Indoor Navigation (IPIN), 2016 International Conference on, pages 1–7. IEEE, 2016.
[215] Tsuyoshi OKITA and Sozo INOUE. Recognition of multiple overlapping activities using compositional cnn-lstm model. In Proceedings of the 2017 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2017 ACM International Symposium on Wearable Computers, pages 165–168. ACM, 2017.
[216] Gaurav Mittal, Kaushal B Yagnik, Mohit Garg, and Narayanan C Krishnan. Spotgarbage: smartphone app to detect garbage using deep learning. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pages 940–945. ACM, 2016.
[217] Lorenzo Seidenari, Claudio Baecchi, Tiberio Uricchio, Andrea Ferracani, Marco Bertini, and Alberto Del Bimbo. Deep artwork detection and retrieval for automatic context-aware audio guides. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 13(3s):35, 2017.
[218] Xiao Zeng, Kai Cao, and Mi Zhang. Mobiledeeppill: A small-footprint mobile deep learning system for recognizing unconstrained pill images. In Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services, pages 56–67. ACM, 2017.
[219] Shuochao Yao, Shaohan Hu, Yiran Zhao, Aston Zhang, and Tarek Abdelzaher. Deepsense: A unified deep learning framework for time-series mobile sensing data processing. In Proceedings of the 26th International Conference on World Wide Web, pages 351–360. International World Wide Web Conferences Steering Committee, 2017.
[220] Xiao Zeng. Mobile sensing through deep learning. In Proceedings of the 2017 Workshop on MobiSys 2017 Ph. D. Forum, pages 5–6. ACM, 2017.
[221] Kleomenis Katevas, Ilias Leontiadis, Martin Pielot, and Joan Serrà. Practical processing of mobile sensor data for continual deep learning
Processing (ICASSP), 2016 IEEE International Conference on, pages 5955–5959, 2016.
[227] Rohit Prabhavalkar, Ouais Alsharif, Antoine Bruguier, and Lan McGraw. On the compression of recurrent neural networks with an application to LVCSR acoustic modeling for embedded speech recognition. In Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on, pages 5970–5974, 2016.
[228] Takuya Yoshioka, Nobutaka Ito, Marc Delcroix, Atsunori Ogawa, Keisuke Kinoshita, Masakiyo Fujimoto, Chengzhu Yu, Wojciech J Fabian, Miquel Espi, Takuya Higuchi, et al. The NTT CHiME-3 system: Advances in speech enhancement and recognition for mobile multi-microphone devices. In IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pages 436–443, 2015.
[229] Sherry Ruan, Jacob O Wobbrock, Kenny Liou, Andrew Ng, and James Landay. Speech is 3x faster than typing for english and mandarin text entry on mobile devices. arXiv preprint arXiv:1608.07323, 2016.
[230] Andrey Ignatov, Nikolay Kobyshev, Kenneth Vanhoey, Radu Timofte, and Luc Van Gool. DSLR-quality photos on mobile devices with deep convolutional networks. arXiv preprint arXiv:1704.02470, 2017.
[231] Zongqing Lu, Noor Felemban, Kevin Chan, and Thomas La Porta. Demo abstract: On-demand information retrieval from videos using deep learning in wireless networks. In Internet-of-Things Design and Implementation (IoTDI), 2017 IEEE/ACM Second International Conference on, pages 279–280, 2017.
[232] Jemin Lee, Jinse Kwon, and Hyungshin Kim. Reducing distraction of smartwatch users with deep learning. In Proceedings of the 18th International Conference on Human-Computer Interaction with Mobile Devices and Services Adjunct, pages 948–953. ACM, 2016.
[233] Toan H Vu, Le Dung, and Jia-Ching Wang. Transportation mode detection on mobile devices using recurrent nets. In Proceedings of the 2016 ACM on Multimedia Conference, pages 392–396. ACM, 2016.
[234] Shih-Hau Fang, Yu-Xiang Fei, Zhezhuang Xu, and Yu Tsao. Learning transportation modes from smartphone sensors based on deep neural network. IEEE Sensors Journal, 17(18):6111–6118, 2017.
[235] Daniele Ravì, Charence Wong, Benny Lo, and Guang-Zhong Yang. A deep learning approach to on-node sensor data analytics for mobile or wearable devices. IEEE journal of biomedical and health informatics, 21(1):56–64, 2017.
[236] Riccardo Miotto, Fei Wang, Shuang Wang, Xiaoqian Jiang, and Joel T Dudley. Deep learning for healthcare: review, opportunities and challenges. Briefings in Bioinformatics, page bbx044, 2017.
[237] Charissa Ann Ronao and Sung-Bae Cho. Human activity recognition with smartphone sensors using deep learning neural networks. Expert Systems with Applications, 59:235–244, 2016.
[238] Jindong Wang, Yiqiang Chen, Shuji Hao, Xiaohui Peng, and Lisha Hu. Deep learning for sensor-based activity recognition: A survey. arXiv preprint arXiv:1707.03502, 2017.
[239] Xukan Ran, Haoliang Chen, Zhenming Liu, and Jiasi Chen. Delivering deep learning to mobile devices via offloading. In Proceedings of the Workshop on Virtual Reality and Augmented Reality Network, pages 42–47. ACM, 2017.
[240] Vishakha V Vyas, KH Walse, and RV Dharaskar. A survey on human activity recognition using smartphone. International Journal, 5(3), 2017.
[241] Heiga Zen and Andrew Senior. Deep mixture density networks for
predictions. In Proceedings of the 1st International Workshop on Deep acoustic modeling in statistical parametric speech synthesis. In Acous-
Learning for Mobile Systems and Applications. ACM, 2017. tics, Speech and Signal Processing (ICASSP), 2014 IEEE International
[222] Xuyu Wang, Lingjun Gao, and Shiwen Mao. PhaseFi: Phase finger- Conference on, pages 3844–3848, 2014.
printing for indoor localization with a deep learning approach. In [242] Siri Team. Deep Learning for Siri’s Voice: On-device Deep Mix-
Global Communications Conference (GLOBECOM), 2015 IEEE, pages ture Density Networks for Hybrid Unit Selection Synthesis. https:
1–6, 2015. //machinelearning.apple.com/2017/08/06/siri-voices.html, 2017. [On-
[223] Xuyu Wang, Lingjun Gao, and Shiwen Mao. CSI phase fingerprinting line; accessed 16-Sep-2017].
for indoor localization with a deep learning approach. IEEE Internet [243] Jon Barker, Ricard Marxer, Emmanuel Vincent, and Shinji Watan-
of Things Journal, 3(6):1113–1123, 2016. abe. The third ’CHiME’speech separation and recognition challenge:
[224] Bokai Cao, Lei Zheng, Chenwei Zhang, Philip S Yu, Andrea Piscitello, Dataset, task and baselines. In Automatic Speech Recognition and
John Zulueta, Olu Ajilore, Kelly Ryan, and Alex D Leow. Deepmood: Understanding (ASRU), 2015 IEEE Workshop on, pages 504–511,
Modeling mobile phone typing dynamics for mood detection. 2017. 2015.
[225] Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad [244] Kai Zhao, Sasu Tarkoma, Siyuan Liu, and Huy Vo. Urban human
Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, mobility data mining: An overview. In Big Data (Big Data), 2016
Klaus Macherey, et al. Google’s neural machine translation system: IEEE International Conference on, pages 1911–1920, 2016.
Bridging the gap between human and machine translation. arXiv [245] Shixiong Xia, Yi Liu, Guan Yuan, Mingjun Zhu, and Zhaohui Wang.
preprint arXiv:1609.08144, 2016. Indoor fingerprint positioning based on Wi-Fi: An overview. ISPRS
[226] Ian McGraw, Rohit Prabhavalkar, Raziel Alvarez, Montse Gonzalez International Journal of Geo-Information, 6(5):135, 2017.
Arenas, Kanishka Rao, David Rybach, Ouais Alsharif, Haşim Sak, [246] Pavel Davidson and Robert Piche. A survey of selected indoor
Alexander Gruenstein, Françoise Beaufays, et al. Personalized speech positioning methods for smartphones. IEEE Communications Surveys
recognition on mobile devices. In Acoustics, Speech and Signal & Tutorials, 2016.
[247] Xi Ouyang, Chaoyun Zhang, Pan Zhou, and Hao Jiang. DeepSpace: [267] Po-Jen Chuang and Yi-Jun Jiang. Effective neural network-based node
An online deep learning framework for mobile big data to understand localisation scheme for wireless sensor networks. IET Wireless Sensor
human mobility patterns. arXiv preprint arXiv:1610.07009, 2016. Systems, 4(2):97–103, 2014.
[248] Cheng Yang, Maosong Sun, Wayne Xin Zhao, Zhiyuan Liu, and [268] Marcin Bernas and Bartłomiej Płaczek. Fully connected neural net-
Edward Y Chang. A neural network approach to jointly modeling social works ensemble with signal strength clustering for indoor localization
networks and mobile trajectories. ACM Transactions on Information in wireless sensor networks. International Journal of Distributed
Systems (TOIS), 35(4):36, 2017. Sensor Networks, 11(12):403242, 2015.
[249] Xuan Song, Hiroshi Kanasugi, and Ryosuke Shibasaki. DeepTransport: [269] Ashish Payal, Chandra Shekhar Rai, and BV Ramana Reddy. Analysis
Prediction and simulation of human mobility and transportation mode of some feedforward artificial neural network training algorithms
at a citywide level. In IJCAI, pages 2618–2624, 2016. for developing localization framework in wireless sensor networks.
[250] Junbo Zhang, Yu Zheng, and Dekang Qi. Deep spatio-temporal Wireless Personal Communications, 82(4):2519–2536, 2015.
residual networks for citywide crowd flows prediction. In Thirty- [270] Yuhan Dong, Zheng Li, Rui Wang, and Kai Zhang. Range-based
First Association for the Advancement of Artificial Intelligence (AAAI) localization in underwater wireless sensor networks using deep neural
Conference on Artificial Intelligence, 2017. network. In IPSN, pages 321–322, 2017.
[251] J Venkata Subramanian and M Abdul Karim Sadiq. Implementation [271] Xiaofei Yan, Hong Cheng, Yandong Zhao, Wenhua Yu, Huan Huang,
of artificial neural network for mobile movement prediction. Indian and Xiaoliang Zheng. Real-time identification of smoldering and
Journal of science and Technology, 7(6):858–863, 2014. flaming combustion phases in forest using a wireless sensor network-
[252] Longinus S Ezema and Cosmas I Ani. Artificial neural network based multi-sensor system and artificial neural network. Sensors,
approach to mobile location estimation in gsm network. International 16(8):1228, 2016.
Journal of Electronics and Telecommunications, 63(1):39–44, 2017. [272] Baowei Wang, Xiaodu Gu, Li Ma, and Shuangshuang Yan. Temper-
[253] Xuyu Wang, Lingjun Gao, Shiwen Mao, and Santosh Pandey. DeepFi: ature error correction based on BP neural network in meteorological
Deep learning for indoor fingerprinting using channel state information. wireless sensor network. International Journal of Sensor Networks,
In Wireless Communications and Networking Conference (WCNC), 23(4):265–278, 2017.
2015 IEEE, pages 1666–1671, 2015. [273] Ki-Seong Lee, Sun-Ro Lee, Youngmin Kim, and Chan-Gun Lee.
[254] Xuyu Wang, Xiangyu Wang, and Shiwen Mao. CiFi: Deep convolu- Deep learning–based real-time query processing for wireless sensor
tional neural networks for indoor localization with 5 GHz Wi-Fi. In network. International Journal of Distributed Sensor Networks,
2017 IEEE International Conference on Communications (ICC), pages 13(5):1550147717707896, 2017.
1–6. [274] Jiakai Li and Gursel Serpen. Adaptive and intelligent wireless sensor
networks through neural networks: an illustration for infrastructure
[255] Xuyu Wang, Lingjun Gao, and Shiwen Mao. BiLoc: Bi-modal deep
adaptation through hopfield network. Applied Intelligence, 45(2):343–
learning for indoor localization with commodity 5GHz WiFi. IEEE
362, 2016.
Access, 5:4209–4220, 2017.
[275] Fereshteh Khorasani and Hamid Reza Naji. Energy efficient data aggre-
[256] Michał Nowicki and Jan Wietrzykowski. Low-effort place recognition
gation in wireless sensor networks using neural networks. International
with WiFi fingerprints using deep learning. In International Conference
Journal of Sensor Networks, 24(1):26–42, 2017.
Automation, pages 575–584. Springer, 2017.
[276] Chunlin Li, Xiaofu Xie, Yuejiang Huang, Hong Wang, and Changxi
[257] Xiao Zhang, Jie Wang, Qinghua Gao, Xiaorui Ma, and Hongyu Wang. Niu. Distributed data mining based on deep neural network for wireless
Device-free wireless localization and activity recognition with deep sensor network. International Journal of Distributed Sensor Networks,
learning. In Pervasive Computing and Communication Workshops 11(7):157453, 2015.
(PerCom Workshops), 2016 IEEE International Conference on, pages
[277] Jonathan Ho and Stefano Ermon. Generative adversarial imitation
1–5, 2016.
learning. In Advances in Neural Information Processing Systems, pages
[258] Jie Wang, Xiao Zhang, Qinhua Gao, Hao Yue, and Hongyu Wang. 4565–4573, 2016.
Device-free wireless localization and activity recognition: A deep [278] Michele Zorzi, Andrea Zanella, Alberto Testolin, Michele De Filippo
learning approach. IEEE Transactions on Vehicular Technology, De Grazia, and Marco Zorzi. COBANETS: A new paradigm for
66(7):6258–6267, 2017. cognitive communications systems. In Computing, Networking and
[259] Mehdi Mohammadi, Ala Al-Fuqaha, Mohsen Guizani, and Jun-Seok Communications (ICNC), 2016 International Conference on, pages 1–
Oh. Semi-supervised deep reinforcement learning in support of IoT 7. IEEE, 2016.
and smart city services. IEEE Internet of Things Journal, 2017. [279] Mehdi Roopaei, Paul Rad, and Mo Jamshidi. Deep learning control
[260] Anil Kumar Tirumala Ravi Kumar, Bernd Schäufele, Daniel Becker, for complex and large scale cloud systems. Intelligent Automation &
Oliver Sawade, and Ilja Radusch. Indoor localization of vehicles using Soft Computing, pages 1–3, 2017.
deep learning. In 2016 IEEE 17th International Symposium on World [280] Shivashankar Subramanian and Arindam Banerjee. Poster: Deep
of Wireless, Mobile and Multimedia Networks (WoWMoM), pages 1–6. learning enabled M2M gateway for network optimization. In Proceed-
[261] Zejia Zhengj and Juyang Weng. Mobile device based outdoor nav- ings of the 14th Annual International Conference on Mobile Systems,
igation with on-line learning neural network: A comparison with Applications, and Services Companion, pages 144–144. ACM, 2016.
convolutional neural network. In Proceedings of the IEEE Conference [281] Ying He, Chengchao Liang, F Richard Yu, Nan Zhao, and Hongxi Yin.
on Computer Vision and Pattern Recognition Workshops, pages 11–18, Optimization of cache-enabled opportunistic interference alignment
2016. wireless networks: A big data deep reinforcement learning approach. In
[262] Wei Zhang, Kan Liu, Weidong Zhang, Youmei Zhang, and Jason Gu. 2017 IEEE International Conference on Communications (ICC), pages
Deep neural networks for wireless localization in indoor and outdoor 1–6.
environments. Neurocomputing, 194:279–287, 2016. [282] Ying He, Zheng Zhang, F Richard Yu, Nan Zhao, Hongxi Yin, Vic-
[263] Joao Vieira, Erik Leitinger, Muris Sarajlic, Xuhong Li, and Fredrik tor CM Leung, and Yanhua Zhang. Deep reinforcement learning-based
Tufvesson. Deep convolutional neural networks for massive MIMO optimization for cache-enabled opportunistic interference alignment
fingerprint-based positioning. In 28th Annual IEEE International wireless networks. IEEE Transactions on Vehicular Technology, 2017.
Symposium on Personal, Indoor and Mobile Radio Communications [283] Faris B Mismar and Brian L Evans. Deep reinforcement learning
(IEEE PIMRC 2017). IEEE–Institute of Electrical and Electronics for improving downlink mmwave communication performance. arXiv
Engineers Inc., 2017. preprint arXiv:1707.02329, 2017.
[264] Jiang Xiao, Kaishun Wu, Youwen Yi, and Lionel M Ni. FIFS: [284] YangMin Lee. Classification of node degree based on deep learning and
Fine-grained indoor fingerprinting system. In 2012 21st International routing method applied for virtual route assignment. Ad Hoc Networks,
Conference on Computer Communications and Networks (ICCCN), 58:70–85, 2017.
pages 1–7. [285] Hua Yang, Zhimei Li, and Zhiyong Liu. Neural networks for MANET
[265] Moustafa Youssef and Ashok Agrawala. The Horus WLAN location AODV: an optimization approach. Cluster Computing, pages 1–9,
determination system. In Proceedings of the 3rd international confer- 2017.
ence on Mobile systems, applications, and services, pages 205–218. [286] Fengxiao Tang, Bomin Mao, Zubair Md Fadlullah, Nei Kato, Osamu
ACM, 2005. Akashi, Takeru Inoue, and Kimihiro Mizutani. On removing routing
[266] Mauro Brunato and Roberto Battiti. Statistical learning theory for loca- protocol from future wireless networks: A real-time deep learning
tion fingerprinting in wireless LANs. Computer Networks, 47(6):825– approach for intelligent traffic control. IEEE Wireless Communications,
845, 2005. 2017.
[287] Qingchen Zhang, Man Lin, Laurence T Yang, Zhikui Chen, and Peng [307] Muhamad Erza Aminanto and Kwangjo Kim. Detecting impersonation
Li. Energy-efficient scheduling for real-time systems based on deep Q- attack in WiFi networks using deep learning approach. In Interna-
learning model. IEEE Transactions on Sustainable Computing, 2017. tional Workshop on Information Security Applications, pages 136–147.
[288] Ribal Atallah, Chadi Assi, and Maurice Khabbaz. Deep reinforcement Springer, 2016.
learning-based scheduling for roadside communication networks. In [308] Qingsong Feng, Zheng Dou, Chunmei Li, and Guangzhen Si. Anomaly
Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks detection of spectrum in wireless communication via deep autoencoder.
(WiOpt), 2017 15th International Symposium on, pages 1–8. IEEE, In International Conference on Computer Science and its Applications,
2017. pages 259–265. Springer, 2016.
[289] Sandeep Chinchali, Pan Hu, Tianshu Chu, Manu Sharma, Manu Bansal, [309] Muhammad Altaf Khan, Shafiullah Khan, Bilal Shams, and Jaime
Rakesh Misra, Marco Pavone, and Katti Sachin. Cellular network traffic Lloret. Distributed flood attack detection mechanism using artificial
scheduling with deep reinforcement learning. In National Conference neural network in wireless mesh networks. Security and Communica-
on Artificial Intelligence (AAAI), 2018. tion Networks, 9(15):2715–2729, 2016.
[290] Haoran Sun, Xiangyi Chen, Qingjiang Shi, Mingyi Hong, Xiao Fu, [310] Abebe Abeshu Diro and Naveen Chilamkurti. Distributed attack
and Nikos D Sidiropoulos. Learning to optimize: Training deep detection scheme using deep learning approach for Internet of Things.
neural networks for wireless resource management. arXiv preprint Future Generation Computer Systems, 2017.
arXiv:1705.09412, 2017. [311] Alan Saied, Richard E Overill, and Tomasz Radzik. Detection of
[291] Zhiyuan Xu, Yanzhi Wang, Jian Tang, Jing Wang, and Mustafa Cenk known and unknown DDoS attacks using artificial neural networks.
Gursoy. A deep reinforcement learning based framework for power- Neurocomputing, 172:385–393, 2016.
efficient resource allocation in cloud RANs. In 2017 IEEE Interna- [312] Manuel Lopez-Martin, Belen Carro, Antonio Sanchez-Esguevillas, and
tional Conference on Communications (ICC), pages 1–6. Jaime Lloret. Conditional variational autoencoder for prediction and
[292] Paulo Victor R Ferreira, Randy Paffenroth, Alexander M Wyglinski, feature recovery applied to intrusion detection in IoT. Sensors,
Timothy M Hackett, Sven G Bilén, Richard C Reinhart, and Dale J 17(9):1967, 2017.
Mortensen. Multi-objective reinforcement learning-based deep neural [313] Zhenlong Yuan, Yongqiang Lu, Zhaoguo Wang, and Yibo Xue. Droid-
networks for cognitive space communications. In Cognitive Commu- Sec: deep learning in Android malware detection. In ACM SIGCOMM
nications for Aerospace Applications Workshop (CCAA), 2017, pages Computer Communication Review, volume 44, pages 371–372. ACM,
1–8. IEEE, 2017. 2014.
[293] Hao Ye and Geoffrey Ye Li. Deep reinforcement learning for resource [314] Zhenlong Yuan, Yongqiang Lu, and Yibo Xue. Droiddetector: Android
allocation in v2v communications. arXiv preprint arXiv:1711.00968, malware characterization and detection using deep learning. Tsinghua
2017. Science and Technology, 21(1):114–123, 2016.
[294] Oshri Naparstek and Kobi Cohen. Deep multi-user reinforcement learn- [315] Xin Su, Dafang Zhang, Wenjia Li, and Kai Zhao. A deep learning
ing for dynamic spectrum access in multichannel wireless networks. approach to Android malware feature learning and detection. In
arXiv preprint arXiv:1704.02613, 2017. Trustcom/BigDataSE/ISPA, 2016 IEEE, pages 244–251, 2016.
[295] Timothy J O’Shea and T Charles Clancy. Deep reinforcement learning
[316] Shifu Hou, Aaron Saas, Lifei Chen, and Yanfang Ye. Deep4MalDroid:
radio control and signal detection with KeRLym, a gym RL agent.
A deep learning framework for Android malware detection based on
arXiv preprint arXiv:1605.09221, 2016.
linux kernel system call graphs. In Web Intelligence Workshops (WIW),
[296] Michael Andri Wijaya, Kazuhiko Fukawa, and Hiroshi Suzuki.
IEEE/WIC/ACM International Conference on, pages 104–111, 2016.
Intercell-interference cancellation and neural network transmit power
[317] Fabio Martinelli, Fiammetta Marulli, and Francesco Mercaldo. Eval-
optimization for MIMO channels. In Vehicular Technology Conference
uating convolutional neural network for effective mobile malware
(VTC Fall), 2015 IEEE 82nd, pages 1–5, 2015.
detection. Procedia Computer Science, 112:2372–2381, 2017.
[297] Michael Andri Wijaya, Kazuhiko Fukawa, and Hiroshi Suzuki. Neural
[318] Niall McLaughlin, Jesus Martinez del Rincon, BooJoong Kang,
network based transmit power control and interference cancellation for
Suleiman Yerima, Paul Miller, Sakir Sezer, Yeganeh Safaei, Erik
MIMO small cell networks. IEICE Transactions on Communications,
Trickel, Ziming Zhao, Adam Doupé, et al. Deep android malware
99(5):1157–1169, 2016.
detection. In Proceedings of the Seventh ACM on Conference on Data
[298] Hongzi Mao, Ravi Netravali, and Mohammad Alizadeh. Neural adap-
and Application Security and Privacy, pages 301–308. ACM, 2017.
tive video streaming with pensieve. In Proceedings of the Conference
of the ACM Special Interest Group on Data Communication, pages [319] Yuanfang Chen, Yan Zhang, and Sabita Maharjan. Deep learning for
197–210. ACM, 2017. secure mobile edge computing. arXiv preprint arXiv:1709.08025, 2017.
[299] Tetsuya Oda, Ryoichiro Obukata, Makoto Ikeda, Leonard Barolli, and [320] Milan Oulehla, Zuzana Komínková Oplatková, and David Malanik.
Makoto Takizawa. Design and implementation of a simulation system Detection of mobile botnets using neural networks. In Future Tech-
based on deep Q-network for mobile actor node control in wireless nologies Conference (FTC), pages 1324–1326. IEEE, 2016.
sensor and actor networks. In Advanced Information Networking and [321] Pablo Torres, Carlos Catania, Sebastian Garcia, and Carlos Garcia
Applications Workshops (WAINA), 2017 31st International Conference Garino. An analysis of recurrent neural networks for botnet detection
on, pages 195–200. IEEE, 2017. behavior. In Biennial Congress of Argentina (ARGENCON), 2016
[300] Tetsuya Oda, Donald Elmazi, Miralda Cuka, Elis Kulla, Makoto Ikeda, IEEE, pages 1–6, 2016.
and Leonard Barolli. Performance evaluation of a deep Q-network [322] Meisam Eslahi, Moslem Yousefi, Maryam Var Naseri, YM Yussof,
based simulation system for actor node mobility control in wireless NM Tahir, and H Hashim. Mobile botnet detection model based on
sensor and actor networks considering three-dimensional environment. retrospective pattern recognition. International Journal of Security and
In International Conference on Intelligent Networking and Collabora- Its Applications, 10(9):39–+, 2016.
tive Systems, pages 41–52. Springer, 2017. [323] Mohammad Alauthaman, Nauman Aslam, Li Zhang, Rafe Alasem,
[301] Hye-Young Kim and Jong-Min Kim. A load balancing scheme based and MA Hossain. A p2p botnet detection scheme based on decision
on deep-learning in IoT. Cluster Computing, 20(1):873–878, 2017. tree and adaptive multilayer neural networks. Neural Computing and
[302] Lu Liu, Yu Cheng, Lin Cai, Sheng Zhou, and Zhisheng Niu. Deep Applications, pages 1–14, 2016.
learning based optimization in wireless network. In 2017 IEEE [324] Reza Shokri and Vitaly Shmatikov. Privacy-preserving deep learning.
International Conference on Communications (ICC), pages 1–6. In Proceedings of the 22nd ACM SIGSAC conference on computer and
[303] Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver. Pri- communications security, pages 1310–1321. ACM, 2015.
oritized experience replay. arXiv preprint arXiv:1511.05952, 2015. [325] Yoshinori Aono, Takuya Hayashi, Lihua Wang, Shiho Moriai, et al.
[304] Qingjiang Shi, Meisam Razaviyayn, Zhi-Quan Luo, and Chen He. An Privacy-preserving deep learning: Revisited and enhanced. In Inter-
iteratively weighted MMSE approach to distributed sum-utility maxi- national Conference on Applications and Techniques in Information
mization for a MIMO interfering broadcast channel. IEEE Transactions Security, pages 100–110. Springer, 2017.
on Signal Processing, 59(9):4331–4340, 2011. [326] Seyed Ali Ossia, Ali Shahin Shamsabadi, Ali Taheri, Hamid R Rabiee,
[305] Anna L Buczak and Erhan Guven. A survey of data mining and Nic Lane, and Hamed Haddadi. A hybrid deep learning architecture for
machine learning methods for cyber security intrusion detection. IEEE privacy-preserving mobile analytics. arXiv preprint arXiv:1703.02952,
Communications Surveys & Tutorials, 18(2):1153–1176, 2016. 2017.
[306] Mahmood Yousefi-Azar, Vijay Varadharajan, Len Hamey, and Uday [327] Martín Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya
Tupakula. Autoencoder-based feature learning for cyber security Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential
applications. In Neural Networks (IJCNN), 2017 International Joint privacy. In Proceedings of the 2016 ACM SIGSAC Conference on
Conference on, pages 3854–3861. IEEE, 2017. Computer and Communications Security, pages 308–318. ACM, 2016.
[328] Seyed Ali Osia, Ali Shahin Shamsabadi, Ali Taheri, Kleomenis Kat- [350] Timothy J., O’Shea, and Jakob Hoydis. An introduction to deep
evas, Hamed Haddadi, and Hamid R Rabiee. Private and scalable learning for the physical layer. arXiv preprint arXiv:1702.00832, 2017.
personal data analytics using a hybrid edge-cloud deep learning. IEEE [351] Roberto Gonzalez, Alberto Garcia-Duran, Filipe Manco, Mathias
Computer Magazine Special Issue on Mobile and Embedded Deep Niepert, and Pelayo Vallina. Network data monetization using Net2Vec.
Learning, 2018. In Proceedings of the SIGCOMM Posters and Demos, pages 37–39.
[329] David M Blei, Andrew Y Ng, and Michael I Jordan. Latent dirichlet ACM, 2017.
allocation. Journal of machine Learning research, 3(Jan):993–1022, [352] Nichoas Kaminski, Irene Macaluso, Emanuele Di Pascale, Avishek
2003. Nag, John Brady, Mark Kelly, Keith Nolan, Wael Guibene, and Linda
[330] Sandra Servia-Rodriguez, Liang Wang, Jianxin R Zhao, Richard Doyle. A neural-network-based realization of in-network computation
Mortier, and Hamed Haddadi. Personal model training under privacy for the Internet of Things. In 2017 IEEE International Conference on
constraints. In Proceedings of the 3rd ACM/IEEE International Con- Communications (ICC), pages 1–6.
ference on Internet-of-Things Design and Implementation, Apr 2018. [353] Liang Xiao, Yanda Li, Guoan Han, Huaiyu Dai, and H Vincent Poor.
[331] Briland Hitaj, Giuseppe Ateniese, and Fernando Perez-Cruz. Deep A secure mobile crowdsensing game with deep reinforcement learning.
models under the GAN: Information leakage from collaborative deep IEEE Transactions on Information Forensics and Security, 2017.
learning. arXiv preprint arXiv:1702.07464, 2017. [354] Sebastian Dörner, Sebastian Cammerer, Jakob Hoydis, and Stephan ten
[332] Sam Greydanus. Learning the enigma with recurrent neural networks. Brink. Deep learning-based communication over the air. arXiv preprint
arXiv preprint arXiv:1708.07576, 2017. arXiv:1707.03384, 2017.
[333] Houssem Maghrebi, Thibault Portigliatti, and Emmanuel Prouff. Break- [355] Roberto Gonzalez, Filipe Manco, Alberto Garcia-Duran, Jose Mendes,
ing cryptographic implementations using deep learning techniques. In Felipe Huici, Saverio Niccolini, and Mathias Niepert. Net2Vec: Deep
International Conference on Security, Privacy, and Applied Cryptog- learning for the network. In Proceedings of the Workshop on Big Data
raphy Engineering, pages 3–26. Springer, 2016. Analytics and Machine Learning for Data Communication Networks,
[334] Donghwoon Kwon, Hyunjoo Kim, Jinoh Kim, Sang C Suh, Ikkyun pages 13–18, 2017.
Kim, and Kuinam J Kim. A survey of deep learning-based network [356] David H Wolpert and William G Macready. No free lunch theorems for
anomaly detection. Cluster Computing, pages 1–13, 2017. optimization. IEEE transactions on evolutionary computation, 1(1):67–
[335] Mahbod Tavallaee, Ebrahim Bagheri, Wei Lu, and Ali A Ghorbani. 82, 1997.
A detailed analysis of the kdd cup 99 data set. In Computational [357] Yu Cheng, Duo Wang, Pan Zhou, and Tao Zhang. A survey of
Intelligence for Security and Defense Applications, 2009. CISDA 2009. model compression and acceleration for deep neural networks. arXiv
IEEE Symposium on, pages 1–6, 2009. preprint arXiv:1710.09282, 2017. To appear in IEEE Signal Processing
[336] Kimberly Tam, Ali Feizollah, Nor Badrul Anuar, Rosli Salleh, and Magazine.
Lorenzo Cavallaro. The evolution of android malware and Android [358] Nicholas D Lane, Sourav Bhattacharya, Akhil Mathur, Petko Georgiev,
analysis techniques. ACM Computing Surveys (CSUR), 49(4):76, 2017. Claudio Forlivesi, and Fahim Kawsar. Squeezing deep learning into
[337] Rafael A Rodríguez-Gómez, Gabriel Maciá-Fernández, and Pedro mobile and embedded devices. IEEE Pervasive Computing, 16(3):82–
García-Teodoro. Survey and taxonomy of botnet research through life- 88, 2017.
cycle. ACM Computing Surveys (CSUR), 45(4):45, 2013. [359] Jie Tang, Dawei Sun, Shaoshan Liu, and Jean-Luc Gaudiot. Enabling
[338] Menghan Liu, Haotian Jiang, Jia Chen, Alaa Badokhon, Xuetao Wei, deep learning on IoT devices. Computer, 50(10):92–96, 2017.
and Ming-Chun Huang. A collaborative privacy-preserving deep [360] Forrest N Iandola, Song Han, Matthew W Moskewicz, Khalid Ashraf,
learning system in distributed mobile environment. In Computational William J Dally, and Kurt Keutzer. SqueezeNet: AlexNet-level accu-
Science and Computational Intelligence (CSCI), 2016 International racy with 50x fewer parameters and< 0.5 MB model size. International
Conference on, pages 192–197. IEEE, 2016. Conference on Learning Representations (ICLR), 2017.
[339] Sumit Chopra, Raia Hadsell, and Yann LeCun. Learning a similarity [361] Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko,
metric discriminatively, with application to face verification. In Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam.
Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Mobilenets: Efficient convolutional neural networks for mobile vision
Computer Society Conference on, volume 1, pages 539–546, 2005. applications. arXiv preprint arXiv:1704.04861, 2017.
[340] Briland Hitaj, Paolo Gasti, Giuseppe Ateniese, and Fernando Perez- [362] Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. ShuffleNet:
Cruz. PassGAN: A deep learning approach for password guessing. An extremely efficient convolutional neural network for mobile devices.
arXiv preprint arXiv:1709.00440, 2017. arXiv preprint arXiv:1707.01083, 2017.
[341] Neev Samuel, Tzvi Diskin, and Ami Wiesel. Deep MIMO detection. [363] Qingchen Zhang, Laurence T Yang, Xingang Liu, Zhikui Chen, and
arXiv preprint arXiv:1706.01151, 2017. Peng Li. A tucker deep computation model for mobile multimedia
[342] Xin Yan, Fei Long, Jingshuai Wang, Na Fu, Weihua Ou, and Bin feature learning. ACM Transactions on Multimedia Computing, Com-
Liu. Signal detection of MIMO-OFDM system based on auto encoder munications, and Applications (TOMM), 13.
and extreme learning machine. In Neural Networks (IJCNN), 2017 [364] Qingqing Cao, Niranjan Balasubramanian, and Aruna Balasubrama-
International Joint Conference on, pages 1602–1606. IEEE, 2017. nian. MobiRNN: Efficient recurrent neural network execution on
[343] Timothy J O’Shea, Seth Hitefield, and Johnathan Corgan. End-to-end mobile GPU. 2017.
radio traffic sequence recognition with recurrent neural networks. In [365] Chun-Fu Chen, Gwo Giun Lee, Vincent Sritapan, and Ching-Yung
Signal and Information Processing (GlobalSIP), 2016 IEEE Global Lin. Deep convolutional neural network on iOS mobile devices. In
Conference on, pages 277–281, 2016. Signal Processing Systems (SiPS), 2016 IEEE International Workshop
[344] Timothy J O’Shea, Kiran Karra, and T Charles Clancy. Learning to on, pages 130–135, 2016.
communicate: Channel auto-encoders, domain specific regularizers, and [366] S Rallapalli, H Qiu, A Bency, S Karthikeyan, R Govindan, B Man-
attention. In Signal Processing and Information Technology (ISSPIT), junath, and R Urgaonkar. Are very deep neural networks feasible on
2016 IEEE International Symposium on, pages 223–228, 2016. mobile devices. IEEE Trans. Circ. Syst. Video Technol, 2016.
[345] Nathan E West and Tim O’Shea. Deep architectures for modulation recognition. In Dynamic Spectrum Access Networks (DySPAN), 2017 IEEE International Symposium on, pages 1–6, 2017.
[346] Sreeraj Rajendran, Wannes Meert, Domenico Giustiniano, Vincent Lenders, and Sofie Pollin. Distributed deep learning models for wireless signal classification with low-cost spectrum sensors. arXiv preprint arXiv:1707.08908, 2017.
[347] David Neumann, Wolfgang Utschick, and Thomas Wiese. Deep channel estimation. In WSA 2017; 21th International ITG Workshop on Smart Antennas; Proceedings of, pages 1–6. VDE, 2017.
[348] Timothy J O’Shea, Tugba Erpek, and T Charles Clancy. Deep learning based MIMO communications. arXiv preprint arXiv:1707.07980, 2017.
[349] Timothy J O’Shea, Latha Pemula, Dhruv Batra, and T Charles Clancy. Radio transformer networks: Attention models for learning to synchronize in wireless systems. In Signals, Systems and Computers, 2016 50th Asilomar Conference on, pages 662–666, 2016.
[367] Nicholas D Lane, Sourav Bhattacharya, Petko Georgiev, Claudio Forlivesi, Lei Jiao, Lorena Qendro, and Fahim Kawsar. DeepX: A software accelerator for low-power deep learning inference on mobile devices. In Information Processing in Sensor Networks (IPSN), 2016 15th ACM/IEEE International Conference on, pages 1–12, 2016.
[368] Loc N Huynh, Rajesh Krishna Balan, and Youngki Lee. DeepMon: Building mobile GPU deep learning models for continuous vision applications. In Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services, pages 186–186. ACM, 2017.
[369] Jiaxiang Wu, Cong Leng, Yuhang Wang, Qinghao Hu, and Jian Cheng. Quantized convolutional neural networks for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4820–4828, 2016.
[370] Sourav Bhattacharya and Nicholas D Lane. Sparsification and separation of deep learning layers for constrained resource inference on wearables. In Proceedings of the 14th ACM Conference on Embedded Network Sensor Systems CD-ROM, pages 176–189. ACM, 2016.
[371] Minsik Cho and Daniel Brand. Mec: Memory-efficient convolution for deep neural network. arXiv preprint arXiv:1706.06873, 2017.
[372] Jia Guo and Miodrag Potkonjak. Pruning filters and classes: Towards on-device customization of convolutional neural networks. In Proceedings of the 1st International Workshop on Deep Learning for Mobile Systems and Applications, pages 13–17. ACM, 2017.
[373] Shiming Li, Duo Liu, Chaoneng Xiang, Jianfeng Liu, Yingjian Ling, Tianjun Liao, and Liang Liang. Fitcnn: A cloud-assisted lightweight convolutional neural network framework for mobile devices. In Embedded and Real-Time Computing Systems and Applications (RTCSA), 2017 IEEE 23rd International Conference on, pages 1–6, 2017.
[374] Heiga Zen, Yannis Agiomyrgiannakis, Niels Egberts, Fergus Henderson, and Przemysław Szczepaniak. Fast, compact, and high quality lstm-rnn based statistical parametric speech synthesizers for mobile devices. arXiv preprint arXiv:1606.06061, 2016.
[375] Gabriel Falcao, Luís A Alexandre, J Marques, Xavier Frazão, and Joao Maria. On the evaluation of energy-efficient deep learning using stacked autoencoders on mobile gpus. In Parallel, Distributed and Network-based Processing (PDP), 2017 25th Euromicro International Conference on, pages 270–273. IEEE, 2017.
[376] Surat Teerapittayanon, Bradley McDanel, and HT Kung. Distributed deep neural networks over the cloud, the edge and end devices. In Distributed Computing Systems (ICDCS), 2017 IEEE 37th International Conference on, pages 328–339, 2017.
[377] Elias De Coninck, Tim Verbelen, Bert Vankeirsbilck, Steven Bohez, Pieter Simoens, Piet Demeester, and Bart Dhoedt. Distributed neural networks for Internet of Things: the big-little approach. In Internet of Things. IoT Infrastructures: Second International Summit, IoT 360° 2015, Rome, Italy, October 27-29, 2015, Revised Selected Papers, Part II, pages 484–492. Springer, 2016.
[378] Shayegan Omidshafiei, Jason Pazis, Christopher Amato, Jonathan P How, and John Vian. Deep decentralized multi-task multi-agent reinforcement learning under partial observability.
[379] Suyog Gupta, Wei Zhang, and Fei Wang. Model accuracy and runtime tradeoff in distributed deep learning: A systematic study. In Data Mining (ICDM), 2016 IEEE 16th International Conference on, pages 171–180, 2016.
[380] Benjamin Recht, Christopher Re, Stephen Wright, and Feng Niu. Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In Advances in neural information processing systems, pages 693–701, 2011.
[381] Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, large Minibatch SGD: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677, 2017.
[382] Shuai Zheng, Ruiliang Zhang, and James T Kwok. Asynchronous distributed semi-stochastic gradient optimization. In AAAI, 2016.
[383] Corentin Hardy, Erwan Le Merrer, and Bruno Sericola. Distributed deep learning on edge-devices: feasibility via adaptive compression. arXiv preprint arXiv:1702.04683, 2017.
[384] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, volume 54, pages 1273–1282, Fort Lauderdale, FL, USA, 20–22 Apr 2017.
[385] B McMahan and Daniel Ramage. Federated learning: Collaborative machine learning without centralized training data. Google Research Blog, 2017.
[386] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. Practical secure aggregation for privacy preserving machine learning. Cryptology ePrint Archive, Report 2017/281, 2017. https://eprint.iacr.org/2017/281.
[387] Angelo Furno, Marco Fiore, and Razvan Stanica. Joint spatial and temporal classification of mobile traffic demands. In Proc. IEEE INFOCOM, 2017.
[388] Zhiyuan Chen and Bing Liu. Lifelong machine learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 10(3):1–145, 2016.
[389] Sang-Woo Lee, Chung-Yeon Lee, Dong-Hyun Kwak, Jiwon Kim, Jeonghee Kim, and Byoung-Tak Zhang. Dual-memory deep learning architectures for lifelong learning of everyday human behaviors. In IJCAI, pages 1669–1675, 2016.
[390] Alex Graves, Greg Wayne, Malcolm Reynolds, Tim Harley, Ivo Danihelka, Agnieszka Grabska-Barwińska, Sergio Gómez Colmenarejo, Edward Grefenstette, Tiago Ramalho, John Agapiou, et al. Hybrid computing using a neural network with dynamic external memory. Nature, 538(7626):471–476, 2016.
[391] German I Parisi, Jun Tani, Cornelius Weber, and Stefan Wermter. Lifelong learning of human actions with deep neural network self-organization. Neural Networks, 2017.
[392] Chen Tessler, Shahar Givony, Tom Zahavy, Daniel J Mankowitz, and Shie Mannor. A deep hierarchical approach to lifelong learning in Minecraft. In AAAI, pages 1553–1561, 2017.
[393] Daniel López-Sánchez, Angélica González Arrieta, and Juan M Corchado. Deep neural networks and transfer learning applied to multimedia web mining. In Distributed Computing and Artificial Intelligence, 14th International Conference, volume 620, page 124. Springer, 2018.
[394] Ejder Baştuğ, Mehdi Bennis, and Mérouane Debbah. A transfer learning approach for cache-enabled wireless networks. In Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), 2015 13th International Symposium on, pages 161–166. IEEE, 2015.
[395] Li Fei-Fei, Rob Fergus, and Pietro Perona. One-shot learning of object categories. IEEE transactions on pattern analysis and machine intelligence, 28(4):594–611, 2006.
[396] Mark Palatucci, Dean Pomerleau, Geoffrey E Hinton, and Tom M Mitchell. Zero-shot learning with semantic output codes. In Advances in neural information processing systems, pages 1410–1418, 2009.
[397] Oriol Vinyals, Charles Blundell, Tim Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. In Advances in Neural Information Processing Systems, pages 3630–3638, 2016.
[398] Soravit Changpinyo, Wei-Lun Chao, Boqing Gong, and Fei Sha. Synthesized classifiers for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5327–5336, 2016.
[399] Junhyuk Oh, Satinder Singh, Honglak Lee, and Pushmeet Kohli. Zero-shot task generalization with multi-task deep reinforcement learning. arXiv preprint arXiv:1706.05064, 2017.
[400] Huandong Wang, Fengli Xu, Yong Li, Pengyu Zhang, and Depeng Jin. Understanding mobile traffic patterns of large scale cellular towers in urban environment. In Proc. ACM IMC, pages 225–238, 2015.
[401] Cristina Marquez, Marco Gramaglia, Marco Fiore, Albert Banchs, Cezary Ziemlicki, and Zbigniew Smoreda. Not all Apps are created equal: Analysis of spatiotemporal heterogeneity in nationwide mobile service usage. In Proceedings of the 13th ACM Conference on Emerging Networking Experiments and Technologies. ACM, 2017.
[402] Gianni Barlacchi, Marco De Nadai, Roberto Larcher, Antonio Casella, Cristiana Chitic, Giovanni Torrisi, Fabrizio Antonelli, Alessandro Vespignani, Alex Pentland, and Bruno Lepri. A multi-source dataset of urban life in the city of Milan and the province of Trentino. Scientific data, 2, 2015.
[403] Liang Liu, Wangyang Wei, Dong Zhao, and Huadong Ma. Urban resolution: New metric for measuring the quality of urban sensing. IEEE Transactions on Mobile Computing, 14(12):2560–2575, 2015.
[404] Denis Tikunov and Toshikazu Nishimura. Traffic prediction for mobile network using Holt-Winters exponential smoothing. In Proc. SoftCOM, Split–Dubrovnik, Croatia, September 2007.
[405] Hyun-Woo Kim, Jun-Hui Lee, Yong-Hoon Choi, Young-Uk Chung, and Hyukjoon Lee. Dynamic bandwidth provisioning using ARIMA-based traffic forecasting for Mobile WiMAX. Computer Communications, 34(1):99–106, 2011.
[406] Muhammad Usama, Junaid Qadir, Aunn Raza, Hunain Arif, Kok-Lim Alvin Yau, Yehia Elkhatib, Amir Hussain, and Ala Al-Fuqaha. Unsupervised machine learning for networking: Techniques, applications and research challenges. arXiv preprint arXiv:1709.06599, 2017.
[407] Chong Zhou and Randy C Paffenroth. Anomaly detection with robust deep autoencoders. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 665–674. ACM, 2017.
[408] Martín Abadi and David G Andersen. Learning to protect communications with adversarial neural cryptography. arXiv preprint arXiv:1610.06918, 2016.
[409] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis. Mastering the game of Go without human knowledge. Nature, 550(7676):354–359, 2017.
[410] Jingjin Wu, Yujing Zhang, Moshe Zukerman, and Edward Kai-Ning Yung. Energy-efficient base-stations sleep-mode techniques in green cellular networks: A survey. IEEE communications surveys & tutorials, 17(2):803–826, 2015.
