Review: A Deep Learning Approach to Network Intrusion Detection

Mayra Macas, Student ID: 11721083

Abstract—One of the key challenges in artificial intelligence (AI) is how to get informal knowledge into a computer. Several AI projects have sought to hard-code knowledge about the world in formal languages. Nevertheless, a major source of difficulty in many real-world AI applications is that many factors of variation influence every single piece of data we are able to observe. Most applications require us to disentangle the factors of variation and discard the ones that we do not care about. Of course, it can be very difficult to extract such high-level, abstract features from raw data. When it is nearly as difficult to obtain a representation as to solve the original problem, representation learning does not, at first glance, seem to help us. Deep learning (DL) solves this central problem in representation learning by introducing representations that are expressed in terms of other, simpler representations. Nowadays DL is gaining much popularity and wide use in various computer science fields, such as object recognition, speech recognition, computer vision and so forth. Nevertheless, the application of DL in network systems has only just started to receive research attention. In this study, we are concerned with the investigation of the various deep learning techniques employed for network intrusion detection.

Index Terms—Machine learning, Deep Neural Network, Intrusion Detection Systems

M. Macas is with the Department of Computer Science, Zhejiang University, China; e-mail: mayramacas11@zju.edu.cn.

I. INTRODUCTION

Most organizations nowadays face a rapid evolution of information and communication technologies (ICTs), converting them into modern organizations. However, the growing dependence on these information systems has also led to an increase in different types of cyber-terrorism and cyber attacks, which in many cases have proven difficult for IT technicians to control and detect in time. How to identify network attacks is a key problem that should never be overlooked. As an important and active security mechanism, Intrusion Detection has become a key technology of network security. The objective of Intrusion Detection Systems (IDS) is to identify unusual access or attacks on internal networks. IDSs are usually hybrid and combine anomaly detection and misuse detection modules [1]. The anomaly detection module classifies the abnormal network traffic. The misuse detection module classifies attack patterns with known signatures or extracts new signatures from the attack-labeled data coming from the anomaly module [2]. Nevertheless, IDSs produce high false-positive rates in the identification of novel attacks.

Within this context, the scientific community has studied and designed models based on deep learning (DL). DL has become practical because of improvements in CPUs and in neural network algorithms [3]. The use of DL for attack detection in cyberspace can be resilient to small mutations of attacks or to novel attacks because of its high-level feature extraction capability [4]. The self-taught and compression capabilities of deep learning architectures are key mechanisms for discovering hidden patterns in the training data, so that attacks can be discriminated from benign traffic [5] [6] [7]. Existing DL approaches utilized in intrusion detection have been categorized into unsupervised and supervised deep learning methods; the difference between them is whether the training data are labeled [8]. Specifically, unsupervised deep learning methods using unlabeled data include auto-encoders [4] [5] [9] [14], the restricted Boltzmann machine (RBM) [15], the deep belief network (DBN) [10] and the recurrent neural network (RNN) [11]. The supervised deep learning methods include the convolutional neural network (CNN), which is combined with a multi-layer perceptron (MLP) in [13]. With this motivation, deep learning based approaches can help to overcome the challenges of developing an efficient Intrusion Detection System.

The remainder of this analysis is organized as follows. Section 2 provides an overview of the most popular DL algorithms employed for network intrusion detection, while Section 3 constitutes a literature survey describing existing methodologies together with the datasets used for their evaluation. In Section 4 we conclude the paper.

II. BACKGROUND

Deep Learning (DL) models are based on Artificial Neural Networks (ANNs). These computational models are more or less designed in analogy with biological brain modules. Nowadays ANNs are back in fashion under the term DL. One of the milestones in this story occurred in 2006, when Geoffrey Hinton's lab was able to efficiently train a deep network that could reconstruct high-dimensional input vectors from a small central layer [16]. After this discovery by Hinton et al., other deep networks using the same principle have been introduced successfully in classification tasks [17]. Deep network IDSs can be classified based on the way their architectures and techniques are utilized. In this section we focus on the most popular DL algorithms employed for network intrusion detection. Figure 1 shows a classification of deep learning IDSs.

Fig. 1: Classification of deep learning IDS

A. Supervised Networks

Supervised learning is the most common type of learning, not only in Deep Learning but also in Machine Learning in general. In this type of learning, the desired output of the training data is known in advance, and it is supplied to the model during training. Therefore, each training sample is represented as a pair consisting of an input vector denoted by x and a desired output value denoted by y. The algorithm should infer a function which can be used for mapping samples with unknown outputs. Within a probabilistic framework, this kind of model is also known as a discriminative model, because it models the conditional probability distribution P(y | x), which can be used for predicting y from x [19].
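As a concrete illustration of this discriminative setting, the following minimal sketch (written in Python with PyTorch purely for illustration; the 41-feature input dimension, the hidden width, the binary normal/attack labels and the placeholder data are assumptions rather than details taken from the surveyed papers) trains a small feed-forward network to model P(y | x) with a cross-entropy loss:

import torch
import torch.nn as nn

# Illustrative discriminative classifier: maps an input vector x to class scores
# and is trained to model P(y | x) through a cross-entropy (negative log-likelihood) loss.
model = nn.Sequential(
    nn.Linear(41, 64),     # 41 input features and 64 hidden units are assumptions
    nn.ReLU(),
    nn.Linear(64, 2),      # two classes; the softmax is applied inside the loss
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(256, 41)            # placeholder labeled records (input vectors x)
y = torch.randint(0, 2, (256,))     # placeholder desired outputs y (0 = normal, 1 = attack)

for epoch in range(10):             # simplified training loop
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)     # negative log-likelihood of P(y | x)
    loss.backward()
    optimizer.step()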
1) Deep Neural Networks (DNNs): An Artificial Neural Network (ANN) is a generic term encompassing any structure of interconnected neurons that send information to each other. Deep-learning networks are distinguished from the more commonplace single-hidden-layer neural networks by their depth, that is, the number of node layers through which data passes in a multistep process of pattern recognition. Note that a network with just one hidden layer cannot actually be considered deep [18]. The main goal of using a neural net is to arrive at the point of least error as fast as possible. In order to achieve this during the implementation of a deep learning concept for IDS, Roy et al. [12] used a multilayer feed-forward network. The feed-forward network consisted of input layers, about 400 hidden layer neurons, and output neurons. The activation functions used were the rectifier activation function and the softmax activation function. The authors proclaimed that, with the loss set as cross entropy, this classification model could be used to detect future intrusion attacks.

2) Deep Convolutional Neural Networks (DCNNs): A convolutional neural network (CNN) is a type of discriminative deep architecture with one or more convolutional and pooling layers arranged to form a multilayer neural network. The architecture of this type of network is similar to that of other networks in that it contains several hidden layers, where each layer applies an affine transformation to the input data followed by a non-linearity. DCNNs leverage three key ideas: local receptive fields, shared weights and pooling layers [19]. The CNN has been found highly effective and is commonly used in computer vision and image recognition. Therefore, using CNNs for image conversion and feature learning in intrusion detection is practicable. Li et al. [13] used CNNs to automatically learn features from a graphic transformation of NSL-KDD data obtained via a graphics conversion technique. Results showed that the CNN model is sensitive to the image conversion of attack data and that it can be used for intrusion detection.

B. Unsupervised Networks

Unlike the supervised models presented in Section 2.1, this section is focused on another type of networks, the unsupervised models. These models aim to discover the hidden structure of unlabeled data. From a probabilistic point of view, this kind of learning is related to the problem of density estimation, which deals with the estimation of the probability distribution of the input data. One of the breakthroughs in the development of DL techniques was the use of pre-training to allow a more efficient training of deep networks [16]. The idea is that each block of a deep network can be pre-trained using an unsupervised model. Each block captures regularities from its input distribution without requiring labeled data. This process is done layer-wise, and the parameters learned in each block serve as a good initialization of the weights of that block in a deep neural network. This idea can also be regarded from a probabilistic point of view. Consider random input-output sample pairs (x, y) in a neural network. Learning a mapping function between them involves modeling an approximation of the probability distribution P(y | x) by maximizing its likelihood. If the true P(x) and P(y | x) are related, learning P(x) may facilitate the modeling of the real target distribution P(y | x) [19].

1) Deep Boltzmann Machine (DBM): When a DBM is trained with a large supply of unlabeled data and fine-tuned with labeled data, it acts as a good classifier. Its structure is an offspring of the general Boltzmann machine (BM), which is a network of units that make stochastic decisions to determine their on and off states. The BM algorithm is simple to train but turns out to be slow in the process. Reducing the number of hidden layers of a DBM to one yields a Restricted Boltzmann Machine (RBM). According to Salakhutdinov [20], deep Boltzmann machines are interesting for several reasons. First, like deep belief networks, DBMs have the potential of learning internal representations that become increasingly complex. Second, high-level representations can be built from a large supply of unlabeled sensory inputs, and very limited labeled data can then be used to only slightly fine-tune the model for a specific task at hand. Finally, unlike deep belief networks, the approximate inference procedure, in addition to an initial bottom-up pass, can incorporate top-down feedback, allowing deep Boltzmann machines to better propagate uncertainty, and hence deal more robustly with ambiguous inputs. Fiore et al. [15] used the Restricted Boltzmann Machine to implement a semi-supervised anomaly detection system in which the classifier was trained with normal traffic data only, so that knowledge about anomalous behaviors was constructed and evolved in a dynamic way. The results revealed that when the classifier was tested on a network widely different from the one from which the training data were taken, the results decayed. This suggests the need for further investigation of the nature of anomalous traffic and its intrinsic differences from normal traffic.
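To make the normal-traffic-only training idea more tangible, the sketch below approximates it with scikit-learn's BernoulliRBM (this is only an illustration of the principle, not the discriminative RBM actually used by Fiore et al. [15]); the feature dimension, the placeholder data and the 5th-percentile threshold are assumptions:

import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.neural_network import BernoulliRBM

# Placeholder data: rows are traffic records, columns are numeric features (assumed).
normal_train = np.random.rand(1000, 41)    # normal traffic only
test_traffic = np.random.rand(200, 41)     # mixed normal / possibly anomalous traffic

scaler = MinMaxScaler()                    # the RBM expects inputs scaled to [0, 1]
rbm = BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=20)
rbm.fit(scaler.fit_transform(normal_train))          # unsupervised training on normal data only

# Records that the "normal" model explains poorly (low pseudo-likelihood) are flagged.
normal_scores = rbm.score_samples(scaler.transform(normal_train))
test_scores = rbm.score_samples(scaler.transform(test_traffic))
threshold = np.percentile(normal_scores, 5)          # illustrative threshold choice
is_anomalous = test_scores < threshold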
2) Deep Belief Networks (DBNs): A DBN is a deep generative model composed of a visible layer and multiple hidden layers of latent variables. There are connections between the layers but not between units within each layer. A DBN can be used as a feature extraction method for dimensionality reduction. On the other hand, when class labels are associated with the feature vectors, the DBN is used for classification [19]. Gao et al. [10] applied the DBN in the intrusion detection domain. According to the authors, the DBN not only learns high-dimensional representations but also performs classification tasks in an efficient manner. An unsupervised greedy learning algorithm can be used to pre-train and fine-tune a DBN in order to learn a similarity representation over the nonlinear, high-dimensional input data, something that largely facilitates the classification tasks. Therefore, the output of an intrusion detection model could be improved when using a DBN.

3) Auto-Encoders: Auto-Encoders [19] are shallow neural networks that are trained to reconstruct their input using an intermediate representation (code). Auto-Encoders and RBMs are very similar models because they share the same architecture: an input layer that represents the data, a hidden layer that represents the code to be learnt, and the weighted connections between them. However, the Auto-Encoder is trained by minimizing the reconstruction error, so an extra layer is added on top to represent a reconstruction of the original data. Both sets of weights are tied, which means that they are actually the same weights. These models also have the capability of extracting good representations from unlabeled data that work well to initialize deeper networks [18]. One of the main problems of the Auto-Encoder model is that it could potentially learn a useless identity transformation when the representation size (the hidden layer) is larger than the input layer (the so-called over-complete case). To overcome this potential limitation, there is a simple variant of the model called the Denoising Auto-Encoder (DAE) [21] that works well in practice. The basic idea is to feed the encoder/decoder system with a corrupted version of the original input, in order to force the system to reconstruct the clean version. This small change allows the DAE to learn useful representations, even in the over-complete case. The Auto-Encoder model is widely used in intrusion detection, and according to Yu et al. [9] it is possible to apply unsupervised deep learning techniques to automatically learn essential features from raw network traffic. To that end, the authors employed a variant of the DAE-based deep learning architecture, called Stacked Denoising Autoencoders (SDA), to detect traffic generated by botnets. The SDA approach also exhibited the best performance in binary classification on the datasets. While SDA and DBN have similar training principles, SDA appears to be comparable or superior to DBN. The authors claimed that an SDA employing the denoising criterion can learn significantly higher-level representations (features) from raw traffic data, and that deep learning approaches have remarkable capabilities for the intrusion detection task.

4) Recurrent Neural Networks (RNNs): RNNs can be used for either unsupervised or supervised learning. When used in unsupervised learning mode, prediction of the data sequence from previous data samples is possible, but this sometimes causes difficulty in the training phase [18]. The RNN model architecture is a feedback loop linking layer to layer, with the ability to store data from previous inputs and in that way increase the reliability of the model [19]. The depth of an RNN can be as large as the length of the input data sequence. RNNs are very powerful for modeling sequence data (e.g., speech or text). With these precedents, Yin et al. [11] proposed a deep learning approach for intrusion detection using recurrent neural networks (RNN). The authors proclaimed that the RNN model could be suitable for the development of a classification model with higher accuracy and performance than the traditional machine learning classification methods in binary and multiclass classification.
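Before moving on to the literature survey, a minimal sketch of such a recurrent classifier may be useful (Python/PyTorch; reading the 41 record features as a length-41 sequence, the hidden size of 80 and the five output categories are illustrative assumptions rather than the exact configuration of Yin et al. [11]):

import torch
import torch.nn as nn

class RNNIDS(nn.Module):
    # Illustrative recurrent classifier: a record is read as a short sequence and the
    # final hidden state is mapped to attack-category scores.
    def __init__(self, hidden_size=80, num_classes=5):
        super().__init__()
        self.rnn = nn.RNN(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):               # x: (batch, 41) numeric features, assumed
        seq = x.unsqueeze(-1)           # -> (batch, 41, 1): one feature per time step
        _, h_n = self.rnn(seq)          # h_n: (1, batch, hidden_size)
        return self.fc(h_n.squeeze(0))  # class scores

model = RNNIDS()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(64, 41)                 # placeholder records
y = torch.randint(0, 5, (64,))          # placeholder labels (Normal, DoS, R2L, U2R, Probe)
for epoch in range(5):                  # forward propagation followed by back propagation
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()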
III. LITERATURE REVIEW

Based on the background presented above, this section focuses mainly on providing information about intrusion detection technologies in conjunction with deep learning techniques. Figure 2 shows the classification of the selected deep learning algorithms for Network Intrusion Detection.

Fig. 2: Classification of selected DL algorithms for Network Intrusion Detection

In the early 2000s, several studies identified two issues in developing an effective and flexible Network Intrusion Detection System (NIDS) for unknown future attacks. The first issue was that it was difficult to extract features from the network traffic dataset for anomaly detection. The second issue was that a labeled (trained) traffic dataset from real network traffic was not available. In order to overcome these issues, Fiore et al. [15] used the Restricted Boltzmann Machine to implement a semi-supervised anomaly detection system in which the classifier was trained with normal traffic data only, so that knowledge about anomalous behaviors was constructed and evolved in a dynamic way. In the experimental phase, the accuracy of the RBM was tested in classifying normal data and data infected by a bot. A second experiment trained the RBM with the KDD Cup 99 dataset and tested it against real-world data. To randomize the order of the test data, the experiment was repeated 10 times. The experiment confirmed that training and testing a classifier on data from two different networks affects the performance. The results revealed that when the classifier was tested on a network widely different from the one from which the training data were taken, the results decayed. This suggests the need for further investigation of the nature of anomalous traffic and its intrinsic differences from normal traffic.

The Deep Belief Network (DBN) learning method consists of the unsupervised training of each Restricted Boltzmann Machine (RBM) layer, feeding the output vector of the last RBM layer as the input vector of a BP (back-propagation) neural network, and then performing supervised training of the BP neural network. Gao et al. [10] applied the DBN in the intrusion detection domain. This deep learning model not only learns high-dimensional representations but also performs classification tasks in an efficient manner. The unsupervised greedy learning algorithm can be used to pre-train and fine-tune a DBN in order to learn a similarity representation over the nonlinear, high-dimensional input data, something that largely facilitates the classification tasks. In the experiment phase, the KDD Cup 1999 dataset was used; it was distributed into 494021 records of training data and 11850 records of test data. The model was denoted as DBN_i, where i signifies the number of Restricted Boltzmann Machine (RBM) layers. Accuracy rates of 74.2%, 82.65%, 90.07%, 93.4%, 86.82%, and 82.30% were achieved by DBN_1, DBN_2, DBN_3, DBN_4, SVM and ANN, respectively. The detection rates of the shallow DBN_1 and DBN_2 were not better than those of the SVM and the ANN, but the deeper DBN_3, with its additional RBM layer, outscored them both. Therefore, the performance of a DBN model using an RBM network with three or more layers outperforms SVM and ANN.

Meanwhile, Li et al. [4] demonstrated that the use of the AutoEncoder deep learning method is effective for achieving data dimension reduction and that it can improve the detection accuracy. In the experimental model the authors applied the AutoEncoder in order to convert complicated high-dimensional data into low-dimensional codes with a nonlinear mapping, thereby reducing the dimensionality of the data and extracting its main features, and then applied the DBN learning method to detect malicious code. The comparative analysis between the single DBN and the DBN + AutoEncoder yielded accuracy rates of 89.75% and 91.4%, respectively. Consequently, the hybrid malicious code detection model was superior to the single DBN. Javaid et al. [6] applied deep learning techniques such as the sparse AutoEncoder and SMR. Based on those techniques, this research introduced two main steps: feature extraction and supervised classification. The first step was to collect unlabeled network traffic data. The next step was to apply the extracted features to a labeled traffic dataset, which could be collected in a confined, isolated, and private network environment, for supervised classification. The primary goal of this research was to evaluate the performance of deep learning based on accuracy. To evaluate the classification accuracy, this research evaluated the NSL-KDD training dataset using 2-class (normal and attack), 5-class (normal and four different attack categories), and 23-class (normal and twenty-two different attacks) settings. According to the results, deep learning showed better accuracy performance for the 2-class setting compared to SMR; however, there was no significant improvement for the 5-class and 23-class settings. The accuracy for the 2-class setting was 88.39%, which was much higher than that of SMR (78.06%). It should be noted that the highest accuracy achieved using the NB-Tree methodology was 82%.

Ashfaq et al. [7] improved classifier performance for intrusion detection systems (IDS) through a novel fuzziness-based semi-supervised learning approach that utilizes unlabeled samples assisted by a supervised learning algorithm. A single hidden layer feed-forward neural network (SLFN) was trained to output a fuzzy membership vector, and the categorization of the unlabeled samples (into low, mid, and high fuzziness categories) was performed using the fuzzy quantity. The classifier was retrained after incorporating each category separately into the original training set. During the experiment phase, the authors performed the necessary scaling to normalize the data. Two subsets were extracted from the training file. The division of training samples and unlabeled samples followed a 10:90 ratio, where 10% of the NSL-KDD dataset was used as labeled data and the remaining 90% as unlabeled data. According to the results, the accuracy obtained by the proposed algorithm was the highest compared to those obtained with J48, Naive Bayes, NB Tree, Random Forest, Random Tree, Multi-layer Perceptron, and Support Vector Machine (SVM). Another cybersecurity work utilizing the NSL-KDD dataset is that of Diro et al. [5]. This work uses a self-taught deep learning scheme in which unsupervised feature learning is applied to the training data. It utilizes a novel model of parallel training and parameter sharing by local fog nodes, and it detects network attacks in distributed fog-to-things networks following a deep learning approach. The outputs of the model training on the distributed fog nodes are the attack detection models and their associated local learning parameters. These local parameters are sent to the coordinating fog node for global update and re-propagation. This sharing scheme results in better learning, as it enables the sharing of the best parameters and in this way avoids local overfitting. The results of the experiment give rise to two conclusions. First, the distributed model has a better performance than the centralized model, since the increased number of nodes in the distributed network of fog systems leads to a 3% increase in the overall detection accuracy, from around 96% to over 99%. Second, the detection rate likewise shows that deep learning is better than classic machine learning for both binary and multi-class classification.

Yu et al. [9] demonstrated the application of unsupervised deep learning techniques to automatically learn essential features from raw network traffic. The authors implemented a Stacked Denoising Autoencoders (SDA) based deep learning architecture to detect traffic generated by botnets. The procedure of the SDA was split into two stages. The first stage, unsupervised layer-wise pre-training, is a greedy layer-wise training process. In the second stage, the supervised fine-tuning stage, a logistic regression layer for classification was added on top of the stacked denoising autoencoders. In the experiment phase, the UNB ISCX IDS 2012 dataset was used. Two datasets of different sizes were constructed: one consisting of 43% of the UNB ISCX IDS 2012 dataset and another using the original dataset in its entirety. The experiment was divided into three parts: binary classification using the SDA-based deep neural network, multi-class classification using the SDA-based deep neural network, and classification using different deep learning architectures.
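Before turning to the results, the two-stage procedure just described can be sketched roughly as follows (Python/PyTorch; the layer sizes, masking-noise level, epoch counts and placeholder feature matrix are assumptions for illustration and do not reproduce the exact payload representation or architecture of Yu et al. [9]):

import torch
import torch.nn as nn

def pretrain_denoising_layer(n_in, n_out, data, noise=0.3, epochs=20, lr=1e-3):
    # Stage 1: greedy layer-wise pre-training of one denoising auto-encoder layer.
    encoder, decoder = nn.Linear(n_in, n_out), nn.Linear(n_out, n_in)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=lr)
    for _ in range(epochs):
        mask = (torch.rand_like(data) > noise).float()      # masking noise
        recon = decoder(torch.relu(encoder(data * mask)))
        loss = nn.functional.mse_loss(recon, data)          # reconstruct the clean input
        opt.zero_grad()
        loss.backward()
        opt.step()
    return encoder

X = torch.rand(1024, 41)                 # placeholder unlabeled traffic features (assumed)
y = torch.randint(0, 2, (1024,))         # placeholder labels, used only for fine-tuning

enc1 = pretrain_denoising_layer(41, 32, X)
enc2 = pretrain_denoising_layer(32, 16, torch.relu(enc1(X)).detach())

# Stage 2: supervised fine-tuning with a logistic-regression (softmax) layer on top.
model = nn.Sequential(enc1, nn.ReLU(), enc2, nn.ReLU(), nn.Linear(16, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
for _ in range(20):
    opt.zero_grad()
    loss_fn(model(X), y).backward()
    opt.step()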
In the binary classification using the SDA-based deep neural network, the accuracy rates achieved were over 99%. These results indicate that the SDA approach could learn highly significant features from the raw payloads of the network application layer. In the second part of the experiment, the multi-class classification (8 classes) using the SDA-based deep neural network, the SDA approach performed equally well as in the binary classification task on both the 43% dataset and the whole dataset. Specifically, the SDA approach achieved a 98.11% accuracy rate on the whole dataset and a 97.96% accuracy rate on the 43% dataset. The SDA approach performed better on the larger dataset for the multi-class classification task, a fact also observed in the binary classification. In the last part of the experiment, the classification using different deep learning architectures, the authors used the 2-class and 8-class settings on the 43% and the whole dataset. The following deep learning approaches were evaluated: SDA, SAE, DBN, and AE-CNN, achieving accuracy rates of 99.41%, 99.26%, 99.29% and 98.46% on the 43% dataset (2-class). Similarly, the accuracy rates achieved on the whole dataset for the 2-class case were 99.48%, 99.42%, 99.39%, and 98.54%, respectively. Regarding the 8-class case, accuracy rates of 97.96%, 98.51%, 98.04% and 96.37% on the 43% dataset were achieved by SDA, SAE, DBN, and AE-CNN, whereas the corresponding rates on the whole dataset were 98.11%, 97.96%, 97.55% and 93.58%, respectively. According to the results, SDA achieved better overall performance in almost all experiments except the multi-class classification on the 43% dataset. Furthermore, while the other approaches performed worse as the dataset size increased, the SDA method achieved its highest accuracy on the whole dataset. The SDA approach also demonstrated the best performance in the binary classification on both datasets. While SDA and DBN have similar training principles, SDA appears to be comparable or superior to DBN, and the AE-CNN method always yields worse performance than the others. The authors demonstrated that an SDA employing the denoising criterion can learn significantly higher-level representations (features) from raw traffic data, and that deep learning approaches have remarkable capabilities for the intrusion detection task.

Yin et al. [11] proposed a deep learning approach for intrusion detection using recurrent neural networks (RNN). The principal goal of this study was to demonstrate that the RNN is suitable for the development of a classification model with higher accuracy and performance than the traditional machine learning classification methods in binary and multiclass classification. To evaluate the classification accuracy, this research used the NSL-KDD dataset in the experimental phase. In this phase, the training of the RNN model consisted of two stages: forward propagation and back propagation. The experiment consisted of two parts: the study of the performance of the RNN-IDS model for binary classification (normal, anomaly) and for five-category classification (Normal, DoS, R2L, U2R and Probe). In the binary classification and the five-category classification, they compared the performance of the RNN model with that of an ANN, naive Bayesian, random forest, support vector machine, random tree and multilayer perceptron. The result in the binary classification showed that the RNN model achieved a detection rate of 83.28% when given 100 epochs. Meanwhile, the algorithms J48, Naive Bayesian, Random Forest, Multi-layer Perceptron, Support Vector Machine and ANN obtained a detection rate of 81.2%. In the five-category classification the RNN model obtained an accuracy rate of 81.29%, which was better than the 79.9% obtained by using J48, Naive Bayesian, Random Forest, Multi-layer Perceptron, Support Vector Machine and ANN. In order to increase the accuracy rate of the RNN model, they applied a reduced-size technique to the dataset, reaching in that way 97.09% (gain rate: 17.19%).

Roy et al. [12] used a multilayer feed-forward network to implement a deep learning concept for IDS. The feed-forward network consisted of input layers, about 400 hidden layer neurons, and output neurons. The activation functions used were the rectifier activation function and the softmax activation function. In the experimental phase, the authors compared the performance of the Deep Neural Network (DNN) with that of a Support Vector Machine (SVM). The KDD Cup 1999 dataset was divided into two parts, training and validation: 75% of the dataset was assigned to the training frame and 25% to the validation frame. The results of the experiment provided the following information: the training and validation models have a very high R2 value (0.999944). This high value indicates that the adopted model is highly accurate. In effect, the DNN has better performance than the SVM, since its accuracy rate was around 99.99% compared to 84.63%. With the loss set as cross entropy, this classification model could be used to detect future intrusion attacks.

Li et al. [13] employed a visual conversion of the NSL-KDD dataset format to evaluate the performance of Convolutional Neural Networks (CNN) in detecting novel attacks. In the experimental phase, the data were categorized into five classes: Normal, DoS, Probe, U2R, and R2L. The NSL-KDD feature attributes were classified into three groups: basic features, traffic features, and content features. Each sample of the NSL-KDD dataset contained 41 features (integer, float, symbolic, or binary). To convert the NSL-KDD data format into a visual image type, they mapped the various types of features into a binary vector space and then transformed the binary vector into an image. After that mapping, each NSL-KDD record turned into a binary vector with 464 dimensions. They then turned every 8 bits into a grayscale pixel, so that the 464-dimensional binary vector became an 8 x 8 grayscale image, with vacant pixels padded by 0. The NSL-KDD dataset was divided into training and test sets. In order to test intrusion detection and the ability to discover new attacks, 17 additional attack types were incorporated into the test set. The results showed that the proposed method had good performance on the NSL-KDD Test21 dataset in comparison to J48, Naive Bayes, NB Tree, Random Forest, Random Tree, Multi-layer Perceptron, and SVM. The CNN model obtained an accuracy rate of 81.57%, whereas the other algorithms achieved accuracy rates of 63.97%, 55.77%, 66.16%, 62.26%, 58.51%, 57.34%, 42.29% and 81.57%, respectively.
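As an illustration of this record-to-image conversion, the sketch below (Python/NumPy) assumes that each record has already been encoded into the 464-dimensional binary vector mentioned above, since the exact feature-to-bit encoding is not detailed here:

import numpy as np

def record_to_image(bits_464):
    # Turn one 464-dimensional binary NSL-KDD record into an 8 x 8 grayscale image:
    # every 8 bits become one pixel value, and vacant pixels are padded with 0.
    bits = np.asarray(bits_464, dtype=np.uint8)
    pixels = np.packbits(bits)                       # 464 bits -> 58 pixel values in [0, 255]
    pixels = np.pad(pixels, (0, 64 - pixels.size))   # pad with zeros up to 64 pixels
    return pixels.reshape(8, 8)

# Illustrative usage with a random binary vector standing in for a real encoded record.
example = np.random.randint(0, 2, size=464)
image = record_to_image(example)                     # 8 x 8 uint8 array, ready for a CNN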
One of the most recent works in cybersecurity is that of Shone et al. [14]. In this study the authors proposed a non-symmetric deep auto-encoder (NDAE) for unsupervised feature learning. The classifier model was constructed from stacked NDAEs and evaluated on the KDD Cup 99 and NSL-KDD datasets. Fundamentally, this involves the proposed shift from the (symmetric) encoder-decoder paradigm towards utilizing just the encoder phase (non-symmetric). The reasoning behind this is that, given the correct learning structure, it is possible to reduce both computational and time overheads with minimal impact on accuracy and efficiency. The NDAE was used as a hierarchical unsupervised feature extractor. The model was realized by stacking NDAEs in order to create a deep learning hierarchy. Stacking the NDAEs offered a layer-wise unsupervised representation learning algorithm, which allowed the model to learn the complex relationships between different features. It also had feature extraction capabilities; hence, it was able to refine the model by prioritizing the most descriptive features. The model was implemented in GPU-enabled TensorFlow and evaluated using the benchmark KDD Cup 99 and NSL-KDD datasets. The authors combined the deep learning power of stacked NDAEs with a shallow learning classifier: Random Forest (RF) was used as the shallow learning classifier in order to increase the classification power of the stacked autoencoders. The model trained the RF classifier using the encoded representations learned by the stacked NDAEs to classify network traffic into normal data and known attacks. In the evaluation phase, they compared the stacked NDAE model against the mainstream DBN technique. These comparisons demonstrated that the model offers up to a 5% improvement in accuracy and up to a 98.81% reduction in training time.

IV. CONCLUSION

Motivated by the challenges and the current state-of-the-art for deep learning applications in a multitude of areas, in this work we have surveyed and classified the deep learning techniques employed for network intrusion detection. The results of the examined models demonstrate that the application of deep learning in cybersecurity, and in particular in IDS, is promising. In spite of the significant advancements, however, there is still much room for future research on a variety of open issues, such as new approaches to testing the reliability and efficiency of knowledge-based and behavioral approaches in intrusion detection.

REFERENCES

[1] H. Debar, M. Dacier and A. Wespi. Towards a taxonomy of intrusion-detection systems. Computer Networks, 31(8), 805-822.
[2] S. Axelsson (2000). Intrusion detection systems: A survey and taxonomy (Vol. 99). Technical report.
[3] T. A. Tang, L. Mhamdi, D. McLernon, S. A. R. Zaidi and M. Ghogho (2016, October). Deep Learning Approach for Network Intrusion Detection in Software Defined Networking. In Wireless Networks and Mobile Communications (WINCOM), 2016 International Conference on (pp. 258-263). IEEE.
[4] Y. Li, R. Ma and R. Jiao (2015). A hybrid malicious code detection method based on deep learning, 9(5).
[5] A. A. Diro and N. Chilamkurti. Distributed attack detection scheme using deep learning approach for Internet of Things. Future Generation Computer Systems.
[6] A. Javaid, Q. Niyaz and M. Alam (2016, May). A deep learning approach for network intrusion detection system. In Proceedings of the 9th EAI International Conference on Bio-inspired Information and Communications Technologies (formerly BIONETICS) (pp. 21-26). ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering).
[7] R. A. R. Ashfaq, X. Z. Wang, J. Z. Huang, H. Abbas and Y. L. He (2017). Fuzziness based semi-supervised learning approach for intrusion detection system. Information Sciences, 378, 484-497.
[8] B. Krawczyk (2016). Learning from imbalanced data: open challenges and future directions. Progress in Artificial Intelligence, 5(4), 221-232.
[9] Y. Yu, J. Long and Z. Cai (2017, October). Session-Based Network Intrusion Detection Using a Deep Learning Architecture. In Modeling Decisions for Artificial Intelligence (pp. 144-155). Springer, Cham.
[10] N. Gao, L. Gao, Q. Gao and H. Wang (2014, November). An intrusion detection model based on deep belief networks. In Advanced Cloud and Big Data (CBD), 2014 (pp. 247-252). IEEE.
[11] C. Yin, Y. Zhu, J. Fei and X. He. A Deep Learning Approach for Intrusion Detection Using Recurrent Neural Networks. IEEE Access, 5, 21954-21961.
[12] S. S. Roy, A. Mallik, R. Gulati, M. S. Obaidat and P. V. Krishna (2017, January). A Deep Learning Based Artificial Neural Network Approach for Intrusion Detection. In International Conference on Mathematics and Computing (pp. 44-53). Springer, Singapore.
[13] Z. Li, Z. Qin, K. Huang, X. Yang and S. Ye (2017, November). Intrusion Detection Using Convolutional Neural Networks for Representation Learning. In International Conference on Neural Information Processing (pp. 858-866). Springer, Cham.
[14] N. Shone, T. N. Ngoc, V. D. Phai and Q. Shi (2018). A deep learning approach to network intrusion detection. IEEE Transactions on Emerging Topics in Computational Intelligence, 2(1), 41-50.
[15] U. Fiore, F. Palmieri, A. Castiglione and A. De Santis (2013). Network anomaly detection with the restricted Boltzmann machine. Neurocomputing, 122, 13-23.
[16] G. E. Hinton, S. Osindero and Y. W. Teh (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527-1554.
[17] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786), 504-507.
[18] Y. LeCun, Y. Bengio and G. Hinton (2015). Deep learning. Nature, 521(7553), 436.
[19] I. Goodfellow, Y. Bengio and A. Courville (2016). Deep learning (Vol. 1). Cambridge: MIT Press.
[20] R. Salakhutdinov (2010). Learning deep Boltzmann machines using adaptive MCMC. In Proceedings of the 27th International Conference on Machine Learning (ICML-10) (pp. 943-950).
[21] P. Vincent, H. Larochelle, Y. Bengio and P. A. Manzagol (2008, July). Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning (pp. 1096-1103). ACM.
