
2016 11th Asia Joint Conference on Information Security

Classifier Ensemble Design with Rotation Forest to Enhance Attack Detection of IDS in Wireless Network

Bayu Adhi Tama, Kyung-Hyune Rhee
Laboratory of Information Security and Internet Applications,
IT Convergence and Applications Engineering, Pukyong National University
E-mail: bayuat@pukyong.ac.kr, khrhee@pknu.ac.kr

Corresponding author


Abstract—This paper is devoted to discovering the appropriate base classifier algorithms when employing Rotation Forest as an ensemble learning method for intrusion detection systems (IDS) in wireless networks. Twenty different classification algorithms are involved in the experiment, and their detection performances are assessed using the area under the receiver operating characteristic curve (AUC) as the performance metric. The performance results of the ensemble learner are evaluated, including its significant improvement when using diverse machine learning algorithms as base classifiers. From the experimental results and the classifier significance test, it can be concluded that Rotation Forest brings a significant performance improvement over the base classifiers.

I. INTRODUCTION

In today's digital environment, many devices are wirelessly connected to each other to bring anytime, anywhere connectivity to users. However, apart from this advantage, wireless networking suffers from security exposures such as DoS attacks [1], [2], leakage of communication confidentiality, and so forth. A plethora of attacks on wireless networks has been recognized, and some of them are continuously rising. Intrusion detection systems (IDS) in wireless networks have attracted much attention, for instance in today's wireless networks supporting vehicular ad-hoc networks (VANETs). An IDS offers a remarkable protection system against attacks, which have rapidly proliferated over time.

Constructing an intrusion detection model is still a challenging task, so a number of methods have been proposed to enhance the performance of IDS [3]. Such methods, as categorized by [4], include statistics-based, pattern-based, rule-based, state-based, and heuristic-based approaches. Most IDS models are developed using data mining and machine learning techniques. These techniques lie in several learning paradigms, i.e., supervised, unsupervised, and semi-supervised learning. In particular, supervised learning deals with creating a model and predicting future unknown attacks using labeled data samples.
However, to achieve high detection accuracy, supervised learning algorithms require more computational effort. A potential technique to increase the performance of machine learning algorithms is to build an ensemble. Therefore, in this paper we employ and evaluate the performance of an ensemble learner, called Rotation Forest (RF) [5], with different base classifiers. The performance of an ensemble learner is influenced by the heterogeneity of the base classifiers constructing the ensemble; various base classifiers may lead to different accuracy on different data samples.

The aim of this evaluation study is to provide the existing literature with a benchmark for designing an IDS model using ensemble learning with improved accuracy. The remainder of this paper is organized as follows. Section II presents the state-of-the-art classifier ensembles used in IDS, whilst Section III covers an overview of Rotation Forest, the dataset used in the experiment, the base classifiers, and the performance evaluation metric. Experimental results are detailed in Section IV, and finally we draw some concluding remarks in the last section.

II. RELATED WORK

Table I depicts the state-of-the-art classifier ensemble schemes found in previous studies. However, these previous studies have several limitations, which we describe as follows.

As seen in the table, most studies employed an old-fashioned dataset, namely KDD Cup 99 or NSL-KDD [6], to evaluate IDS. Hence, in order to distinguish our study from the previous ones, we considered the GPRS dataset [7], which is purpose-built for evaluating IDS in wireless networks. This dataset may prove a valuable tool for research on alternative 802.11 settings, i.e., mesh grids and VANETs, since some of the respective attacks are based on similar principles [8].

Moreover, RF has been underexplored in the existing literature. As a relatively new ensemble algorithm, RF could be used to enhance the detection accuracy of the base classifiers used to construct an IDS. With respect to base classifiers, we consider 20 different machine learning algorithms, so that the diverse models of each classifier, representing knowledge from different perspectives, can be further explored and analyzed.

Finally, most studies do not provide a statistical test to show the level of significance of their experimental results. Such a test will statistically verify the hypothesis of improved performance of the ensemble classifier over the base classifier. Therefore, we also provide a statistical significance test to demonstrate a significant difference between the performance of the ensemble and that of its base classifiers.

III. METHOD AND DATASET

In this section, the RF ensemble scheme is briefly explained, the twenty machine learning algorithms used as base classifiers are presented, and the dataset used to assess classifier performance is covered. Finally, the model selection and performance metrics used in the experiment are detailed.

A. Overview of Rotation Forest

Rotation Forest aims at constructing accurate and diverse classifiers. It applies feature extraction using Principal Component Analysis (PCA) to subsets of features and reconstructs a full feature set for each classifier forming the ensemble [5]. The structure of the RF algorithm is described as follows [5].

Let $X$ be the objects in the training dataset (an $N \times n$ matrix) and $Y$ the labels of the training set (an $N \times 1$ matrix), with $F$ the set of $n$ features. It is assumed that $Y$ takes values from the set of class labels $\{\omega_1, \ldots, \omega_c\}$, that the feature set is partitioned into $K$ subsets, and that the number of classifiers in the ensemble is $L$, denoted $\{D_1, \ldots, D_L\}$. Similar to Bagging and Random Forest, all classifiers are trained in parallel.
In the training phase, the following steps are required to construct the training set for classifier $D_i$.
1) Split $F$ randomly into $K$ subsets, so that each feature subset contains $M = n/K$ features.
2) Let $F_{i,j}$ denote the $j$-th feature subset for the training set of classifier $D_i$, and let $X_{i,j}$ be the dataset $X$ restricted to the features in $F_{i,j}$ (for $j = 1, \ldots, K$). A nonempty random subset of classes is eliminated from $X_{i,j}$, and a bootstrap sample of 75% is then drawn from $X_{i,j}$ to form a new set $X'_{i,j}$. PCA is applied to $X'_{i,j}$ to obtain the coefficients in a matrix $C_{i,j}$; each coefficient vector $a^{(1)}_{i,j}, \ldots, a^{(M_j)}_{i,j}$ has size $M_j \times 1$.
3) Organize the coefficient vectors in a sparse rotation matrix $R_i$ as follows:

$$
R_i =
\begin{bmatrix}
a^{(1)}_{i,1}, \ldots, a^{(M_1)}_{i,1} & [0] & \cdots & [0] \\
[0] & a^{(1)}_{i,2}, \ldots, a^{(M_2)}_{i,2} & \cdots & [0] \\
\vdots & \vdots & \ddots & \vdots \\
[0] & [0] & \cdots & a^{(1)}_{i,K}, \ldots, a^{(M_K)}_{i,K}
\end{bmatrix} \tag{1}
$$

At this point we construct $R^a_i$ by rearranging the columns of $R_i$ so as to match the order of features in $F$, and build classifier $D_i$ using $(XR^a_i, Y)$ as the training set.

In the classification phase, for a given sample $x$, let $d_{i,j}(xR^a_i)$ be the probability assigned by classifier $D_i$ to the hypothesis that $x$ comes from class $\omega_j$. The average combination method is then:

$$
\mu_j(x) = \frac{1}{L} \sum_{i=1}^{L} d_{i,j}(xR^a_i), \quad j = 1, \ldots, c. \tag{2}
$$
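To make the procedure concrete, the following is a minimal sketch of the training and classification phases, assuming scikit-learn and NumPy. It follows steps 1)-3) and Eq. (2), but for brevity omits the random class-elimination step; the class name RotationForestSketch and the decision-tree inner learner are illustrative choices, not the paper's implementation.

```python
# Minimal Rotation Forest sketch (assumes scikit-learn, NumPy).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

class RotationForestSketch:
    def __init__(self, n_classifiers=10, n_subsets=4, random_state=0):
        self.L = n_classifiers            # number of classifiers D_1..D_L
        self.K = n_subsets                # number of feature subsets
        self.rng = np.random.RandomState(random_state)
        self.members = []                 # list of (rotation matrix, classifier D_i)

    def _rotation_matrix(self, X):
        n = X.shape[1]
        subsets = np.array_split(self.rng.permutation(n), self.K)
        R = np.zeros((n, n))
        for idx in subsets:
            # 75% bootstrap sample of the rows, restricted to this feature subset
            rows = self.rng.choice(len(X), size=int(0.75 * len(X)), replace=True)
            pca = PCA().fit(X[np.ix_(rows, idx)])
            # place the PCA loadings in the block of R matching the subset;
            # indexing by the original feature positions already yields R_i^a
            R[np.ix_(idx, idx)] = pca.components_.T
        return R

    def fit(self, X, y):
        for _ in range(self.L):
            R = self._rotation_matrix(X)
            clf = DecisionTreeClassifier(random_state=0).fit(X @ R, y)
            self.members.append((R, clf))
        return self

    def predict_proba(self, X):
        # average combination method, Eq. (2)
        probs = [clf.predict_proba(X @ R) for R, clf in self.members]
        return np.mean(probs, axis=0)

    def predict(self, X):
        return self.predict_proba(X).argmax(axis=1)
```

On data shaped like the GPRS training set (e.g., X of shape (9600, 15)), `RotationForestSketch(n_classifiers=10).fit(X, y)` mirrors the construction of $D_1, \ldots, D_L$ described above.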


B. Dataset

For this study we employed the GPRS dataset [7], which comprises two distinct wireless network topologies, i.e., WEP/WPA and WPA2. The WEP/WPA dataset is composed of an attack class (37.5%) and a normal class (62.5%), with 15 variables and 1 class label variable; we consider 9600 instances for the training set. The WPA2 dataset comprises 7500 instances with an attack class (40%) and a normal class (60%); it has 16 variables and 1 class label attribute.
C. Base Classifiers

We chose 20 machine learning algorithms in order to evaluate their performance in combination with the RF classifier. We also took care to preserve the diversity and heterogeneity of the algorithms while choosing them. For the sake of simplicity, we assign an identification number (ID) to each classifier, as shown in Table II.

TABLE II
BASE CLASSIFIERS

ID | Algorithm | ID | Algorithm
01 | LibSVM | 11 | OneR
02 | Naive Bayes | 12 | PART Decision Learner
03 | Logistic | 13 | Conjunctive Rule
04 | Multilayer Perceptron | 14 | Best First Decision Tree
05 | RBF Network | 15 | Decision Stump
06 | Convolutional Neural Network | 16 | Functional Tree
07 | Simple Logistic | 17 | J48
08 | SMO | 18 | Random Tree
09 | FURIA | 19 | Simple Cart
10 | JRip | 20 | REPTree
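Since the algorithms in Table II are WEKA implementations, the sketch below shows how such a heterogeneous pool might be mirrored in code; the scikit-learn stand-ins (e.g., DecisionTreeClassifier for J48, SVC for SMO) are rough analogues assumed for illustration, not exact equivalents.

```python
# A hedged sketch of a heterogeneous base-classifier pool (assumes scikit-learn).
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

base_classifiers = {
    "02-NaiveBayes": GaussianNB(),
    "03-Logistic": LogisticRegression(max_iter=1000),
    "04-MLP": MLPClassifier(max_iter=500),
    "08-SMO": SVC(probability=True),       # SMO trains an SVM; SVC is a stand-in
    "17-J48": DecisionTreeClassifier(),    # J48 is a C4.5 tree; CART is a stand-in
}
# Each entry can serve as the inner learner of the Rotation Forest sketch above.
```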

D. Model Selection and Performance Metrics

As recommended by [17], we use five iterations of twofold cross-validation (5×2cv). This method is found to be more robust than the usual k-fold cross-validation, since it overcomes the problem of underestimated variance. To assess the results of the experiments on RF with respect to the 20 base classifiers, we employ a metric commonly used in IDS research, i.e., the area under the ROC curve ($AUC$). Since most IDS research concerns a two-class classification problem, i.e., normal or attack, the outcomes are labeled either as positive (p) or negative (n).

For binary classification, a confusion matrix is commonly used to analyze how well a classifier can recognize the different classes. The outcomes of the confusion matrix for binary classification are defined as true positives ($TP$), true negatives ($TN$), false positives ($FP$), and false negatives ($FN$). The aforementioned performance metric, $AUC$, is calculated as follows:

$$
AUC = \int \frac{TP}{P} \, d\frac{FP}{N} = \int \frac{TP}{TP+FN} \, d\frac{FP}{FP+TN} \tag{3}
$$
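As an illustration of the protocol, the sketch below runs 5×2cv and collects AUC values, assuming scikit-learn; the synthetic data is a stand-in for the GPRS dataset, whose loading is not specified in the paper, and the decision-tree learner is an arbitrary example.

```python
# 5x2cv with AUC scoring (assumes scikit-learn; synthetic stand-in data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=9600, n_features=15, random_state=0)

aucs = []
for rep in range(5):                                  # five repetitions...
    skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=rep)
    for train_idx, test_idx in skf.split(X, y):       # ...of twofold CV
        clf = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
        scores = clf.predict_proba(X[test_idx])[:, 1]
        aucs.append(roc_auc_score(y[test_idx], scores))

# ten performance values in total, as described in Section IV
print(f"median AUC = {np.median(aucs):.4f}, mean AUC = {np.mean(aucs):.4f}")
```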



TABLE I
CLASSIFIER ENSEMBLES FOR IDS IN THE EXISTING LITERATURE

Study | Ensemble approach | Base classifier(s) | Dataset | Performance metric(s) | Significance test
[9] | Weighted ensemble | Classification and regression trees, Bayesian networks | KDD Cup 99 | Accuracy | No
[10] | Majority voting | Neural network, support vector machine, multivariate regression splines | KDD Cup 99 | Accuracy | No
[11] | Weighted voting | Decision tree, support vector machine | KDD Cup 99 | Accuracy | No
[12] | Adaboost | Decision stump | KDD Cup 99 | Detection rate, false alarm rate | No
[13] | Product rule | Decision tree | Private | AUC | No
[14] | Min, max, product rule voting | k-means, ν-SVC | KDD Cup 99 | False alarm rate, detection rate | No
[15] | Bagging | Multilayer perceptron, RBF neural network | Private | Accuracy | No
[16] | Voting | Neural network, decision tree | KDD Cup 99 | True positive rate, false positive rate, precision, recall, F1 | No
This study | Rotation forest | 20 classifier algorithms | GPRS | AUC | Yes


E. Classifier Significance Test

For further evaluation, we tested the significance of the classifiers by using the non-parametric Wilcoxon matched-pairs signed-rank test. The null hypothesis is that the mean of the average $AUC$ equals the mean of the average $E\text{-}AUC$, and the alternative hypothesis is that they differ. We defined the significance level $\alpha = 0.05$, which corresponds to a confidence level of 95%.
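A minimal sketch of this test, assuming SciPy, is shown below; the two arrays stand for the ten paired 5×2cv AUC values of a base classifier and its RF ensemble (the numbers here are illustrative, not taken from Tables III-IV).

```python
# Wilcoxon matched-pairs signed-rank test on paired AUC values (assumes SciPy).
from scipy.stats import wilcoxon

auc_single   = [0.68, 0.69, 0.67, 0.70, 0.68, 0.69, 0.68, 0.67, 0.70, 0.69]  # illustrative
auc_ensemble = [0.83, 0.84, 0.83, 0.85, 0.84, 0.83, 0.84, 0.83, 0.85, 0.84]  # illustrative

stat, p_value = wilcoxon(auc_single, auc_ensemble)
print(f"p = {p_value:.5f}; reject H0 at alpha = 0.05: {p_value < 0.05}")
```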
IV. EXPERIMENTAL RESULTS

In this section we compare and report the performance results of the classifier ensemble applied to the WEP/WPA and WPA2 datasets. We show that by incorporating a classifier ensemble, the final performance is expected to rise significantly. In the last subsection, we evaluate the classifiers' significance using standard statistical analysis.

As mentioned before, all algorithms are evaluated using 5×2cv. This procedure partitions the data into two parts, one used as the training set and the other as the testing set; the process is then repeated five times, yielding ten performance values. The $AUC$ values reported in this section are the medians and averages of the 5×2cv runs.
Table III and Table IV show the performance of the 20 algorithms as single classifiers and as part of the RF ensemble, applied to the WEP/WPA and WPA2 datasets, respectively. $AUC$ and $E\text{-}AUC$ denote the $AUC$ value of the single classifier and of the classifier ensemble, respectively. The improvement column shows the relative improvement that the classifier ensemble gives over the single classifier, whilst the significance level of the statistical test for each classifier is denoted by the p-value.

For the WEP/WPA dataset, RF has brought remarkable improvement for OneR (OR) (11), Decision Stump (DS) (15), and Functional Tree (FT) (16). RF-OR, RF-DS, and RF-FT outperform the base classifiers OR (11), DS (15), and FT (16) in terms of $AUC$ value. Among them, OR (11) receives the biggest improvement (22.51%) after the application of RF.

For the WPA2 dataset, the implementation of RF has also brought substantial improvement for Conjunctive Rule (CR) (13), OR (11), and DS (15). RF-CR, RF-OR, and RF-DS outperform the base classifiers CR (13), OR (11), and DS (15) in terms of $AUC$ value. Among these three base classifiers, CR (13) gets the biggest improvement (35.69%) after the implementation of RF.

To further evaluate the performance of RF with different base classifiers, we take the median $AUC$ of the five-times twofold cross-validation. Fig. 1 and Fig. 2 depict the median performance values for the WEP/WPA and WPA2 datasets, respectively. From Fig. 1 it is clear that the implementation of RF outperforms most base classifiers, except SMO (08). On the other hand, Fig. 2 confirms that the implementation of RF does not outperform Logistic (03), Simple Logistic (07), and FURIA (09). However, across the entire output domain, the implementation of RF with OR (11) or CR (13) receives the biggest improvement.
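For clarity, the improvement column is the relative gain of $E\text{-}AUC$ over $AUC$; a quick check against the OneR (11) row of Table III:

```python
# Illustrative check of the "Improvement (%)" column, using the
# OneR (11) row of Table III as an example.
auc_single, auc_ensemble = 0.6841, 0.8381
improvement = (auc_ensemble - auc_single) / auc_single * 100
print(f"{improvement:.2f}%")   # -> 22.51%, matching Table III
```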

Fig. 1. 5×2cv median $AUC$ per classifier for the WEP/WPA dataset


TABLE III
MEAN AUC VALUES ON THE WEP/WPA DATASET

ID | AUC | E-AUC | Improvement (%) | p-value
01 | 0.8031 | 0.8222 | 2.37 | 0.00512
02 | 0.8224 | 0.8512 | 3.51 | 0.09296
03 | 0.8673 | 0.8677 | 0.05 | 0.00512
04 | 0.9485 | 0.9499 | 0.15 | 0.0466
05 | 0.9382 | 0.9413 | 0.32 | 0.05876
06 | 0.9300 | 0.9409 | 1.17 | 0.00512
07 | 0.8726 | 0.8729 | 0.03 | 0.80258
08 | 0.7808 | 0.7704 | -1.33 | 0.09296
09 | 0.8092 | 0.7817 | 5.09 | 0.00512
10 | 0.7781 | 0.8502 | 9.27 | 0.00512
11 | 0.6841 | 0.8381 | 22.51 | 0.00512
12 | 0.9469 | 0.9502 | 0.35 | 0.00512
13 | 0.7078 | 0.7844 | 10.83 | 0.00512
14 | 0.9459 | 0.9499 | 0.42 | 0.02852
15 | 0.7078 | 0.7873 | 11.24 | 0.00512
16 | 0.8325 | 0.9327 | 12.04 | 0.00512
17 | 0.9332 | 0.9449 | 1.25 | 0.05876
18 | 0.8029 | 0.8132 | 1.29 | 0.00512
19 | 0.9242 | 0.9502 | 2.82 | 0.00512
20 | 0.9413 | 0.9497 | 0.90 | 0.00694

TABLE IV
MEAN AUC VALUES ON THE WPA2 DATASET

ID | AUC | E-AUC | Improvement (%) | p-value
01 | 0.9174 | 0.9222 | 0.52 | 0.09296
02 | 0.8761 | 0.9067 | 3.49 | 0.00512
03 | 0.9389 | 0.9378 | -0.12 | 0.44726
04 | 0.9661 | 0.9715 | 0.56 | 0.01242
05 | 0.9217 | 0.9516 | 3.25 | 0.00512
06 | 0.9380 | 0.9470 | 0.95 | 0.0466
07 | 0.9321 | 0.9298 | -0.25 | 0.50926
08 | 0.8517 | 0.8551 | 0.39 | 0.96012
09 | 0.9116 | 0.8551 | -6.21 | 0.00512
10 | 0.9235 | 0.9313 | 0.84 | 0.00694
11 | 0.7064 | 0.9176 | 29.90 | 0.00512
12 | 0.9707 | 0.9764 | 0.59 | 0.00512
13 | 0.5000 | 0.6785 | 35.69 | 0.00512
14 | 0.9744 | 0.9772 | 0.29 | 0.00694
15 | 0.7458 | 0.7738 | 3.76 | 0.00512
16 | 0.9626 | 0.9751 | 1.30 | 0.00512
17 | 0.9684 | 0.9744 | 0.62 | 0.00512
18 | 0.9759 | 0.9770 | 0.11 | 0.00512
19 | 0.9754 | 0.9774 | 0.21 | 0.00512
20 | 0.9708 | 0.9757 | 0.51 | 0.00512

Fig. 2. 5×2cv median $AUC$ per classifier for the WPA2 dataset

According to the above results, some general remarks can be drawn as follows.
1) The implementation of Rotation Forest as a classifier ensemble yields significant enhancements, as indicated in Tables III and IV and Figs. 1 and 2. It can be concluded that the classifier ensemble prevails over the base classifier on intrusion detection system datasets.
2) Selecting an appropriate base classifier is an essential task when implementing Rotation Forest for intrusion detection systems. This contrasts with Rodriguez et al. [5], who suggested using a decision tree (J4.8) as the base classifier. Based on our experimental results, Rotation Forest also performs well using other weak learner algorithms.
3) Among the twenty ensembles applied to intrusion detection systems in wireless networks, RF-Multilayer Perceptron (RF-MP) and RF-Best First Decision Tree (RF-BFDT) are the best choices for the WEP/WPA standard, while RF-BFDT and RF-Simple Cart (RF-SC) are the best choices for the WPA2 standard. In addition, we tested the significance between RF-MP and RF-BFDT, as well as between RF-BFDT and RF-SC, using the Wilcoxon matched-pairs signed-rank test. The p-values are 0.85716 and 0.28462, respectively. These results reveal that their $AUC$ values do not differ significantly at $p \le 0.05$; thus they share the same performance.

V. CONCLUSION

Choosing the right base classifier to form an ensemble is highly important. In this paper we carried out a comparative study on choosing the appropriate base classifier when using Rotation Forest for intrusion detection systems (IDS) in wireless networks. We assessed the performance of twenty machine learning algorithms, either as single classifiers or as part of an ensemble, using the area under the ROC curve (AUC) as the performance metric. Experimental results reveal that the performance of weak classifiers, i.e., OneR and Conjunctive Rule, can be improved significantly by the application of ensemble learning. In addition, we found that Rotation Forest with Best First Decision Tree is suitable for IDS, since it achieves satisfactory performance on both the WEP/WPA and WPA2 datasets.

ACKNOWLEDGMENT

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) No. NRF-2014R1A2A1A11052981.


REFERENCES

[1] B. A. Tama and K.-H. Rhee, "Performance analysis of multiple classifier system in DoS attack detection," in Information Security Applications. Springer, 2015, pp. 339–347.
[2] B. A. Tama and K.-H. Rhee, "Data mining techniques in DoS/DDoS attack detection: A literature review," Information, vol. 18, no. 8, p. 3739, 2015.
[3] B. A. Tama and K.-H. Rhee, "A combination of PSO-based feature selection and tree-based classifiers ensemble for intrusion detection systems," in Advances in Computer Science and Ubiquitous Computing. Springer, 2015, pp. 489–495.
[4] H.-J. Liao, C.-H. R. Lin, Y.-C. Lin, and K.-Y. Tung, "Intrusion detection system: A comprehensive review," Journal of Network and Computer Applications, vol. 36, no. 1, pp. 16–24, 2013.
[5] J. J. Rodriguez, L. I. Kuncheva, and C. J. Alonso, "Rotation forest: A new classifier ensemble method," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 10, pp. 1619–1630, 2006.
[6] M. Tavallaee, E. Bagheri, W. Lu, and A.-A. Ghorbani, "A detailed analysis of the KDD CUP 99 data set," in The Second IEEE Symposium on Computational Intelligence for Security and Defence Applications, 2009.
[7] D. W. Vilela, E. Ferreira, A. A. Shinoda, N. V. de Souza Araujo, R. de Oliveira, and V. E. Nascimento, "A dataset for evaluating intrusion detection systems in IEEE 802.11 wireless networks," in IEEE Colombian Conference on Communications and Computing (COLCOM). IEEE, 2014, pp. 1–5.
[8] C. Kolias, G. Kambourakis, A. Stavrou, and S. Gritzalis, "Intrusion detection in 802.11 networks: Empirical evaluation of threats and a public dataset," 2015.
[9] S. Chebrolu, A. Abraham, and J. P. Thomas, "Feature deduction and ensemble design of intrusion detection systems," Computers & Security, vol. 24, no. 4, pp. 295–307, 2005.
[10] S. Mukkamala, A. H. Sung, and A. Abraham, "Intrusion detection using an ensemble of intelligent paradigms," Journal of Network and Computer Applications, vol. 28, no. 2, pp. 167–182, 2005.
[11] S. Peddabachigari, A. Abraham, C. Grosan, and J. Thomas, "Modeling intrusion detection system using hybrid intelligent systems," Journal of Network and Computer Applications, vol. 30, no. 1, pp. 114–132, 2007.
[12] W. Hu, W. Hu, and S. Maybank, "Adaboost-based algorithm for network intrusion detection," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 38, no. 2, pp. 577–583, 2008.
[13] J. B. Cabrera, C. Gutierrez, and R. K. Mehra, "Ensemble methods for anomaly detection and distributed intrusion detection in mobile ad-hoc networks," Information Fusion, vol. 9, no. 1, pp. 96–119, 2008.
[14] G. Giacinto, R. Perdisci, M. Del Rio, and F. Roli, "Intrusion detection in computer networks by a modular ensemble of one-class classifiers," Information Fusion, vol. 9, no. 1, pp. 69–82, 2008.
[15] M. Govindarajan and R. Chandrasekaran, "Intrusion detection using neural based hybrid classification methods," Computer Networks, vol. 55, no. 8, pp. 1662–1671, 2011.
[16] S. S. S. Sindhu, S. Geetha, and A. Kannan, "Decision tree based light weight intrusion detection using a wrapper approach," Expert Systems with Applications, vol. 39, no. 1, pp. 129–141, 2012.
[17] J. Demsar, "Statistical comparisons of classifiers over multiple data sets," The Journal of Machine Learning Research, vol. 7, pp. 1–30, 2006.

