I. I NTRODUCTION
In today's digital environment, many devices are wirelessly connected to each other to bring anytime, anywhere connections to users. However, apart from this advantage, such connectivity suffers from security exposures such as DoS attacks [1], [2], leakage of communication confidentiality, and so forth. A plethora of attacks in wireless networks has been recognized, and some of them are continuously rising. Intrusion detection systems (IDS) in wireless networks have attracted much attention, for instance in today's wireless networks supporting vehicular ad-hoc networks (VANET). An IDS offers a remarkable protection system against attacks, which have rapidly proliferated over time.

Constructing an intrusion detection model is still a challenging task, and thus a number of methods have been proposed to enhance the performance of IDS [3]. Such methods, as categorized by [4], include statistic-based, pattern-based, rule-based, state-based, and heuristic-based approaches. Most IDS models are developed using data mining and machine learning techniques. These techniques lie in several learning paradigms, i.e. supervised, unsupervised, and semi-supervised learning. In particular, supervised learning deals with creating a model and predicting future unknown attacks using labeled data samples.

However, to achieve high detection accuracy, supervised learning algorithms require more computational effort. A potential technique to increase the performance of machine learning algorithms is to build an ensemble. Therefore,
in this paper we employ and evaluate the performance of the rotation forest (RF) ensemble over a wide range of base classifiers.

978-1-5090-2285-4/16 $31.00 © 2016 IEEE
DOI 10.1109/AsiaJCIS.2016.13
We statistically verify the hypothesis of the improved performance of the ensemble classifier over its base classifier; accordingly, we also provide a statistical significance test to demonstrate a significant difference between the performance of the ensemble and that of its base classifier.

TABLE II
BASE CLASSIFIERS
ID   Algorithm
01   LibSVM
02   Naive Bayes
03   Logistic
04   Multilayer Perceptron
05   RBF Network
06   Convolutional Neural Network
07   Simple Logistic
08   SMO
09   FURIA
10   JRip
11   OneR
12   PART Decision Learner
13   Conjunctive Rule
14   Best First Decision Tree
15   Decision Stump
16   Functional Tree
17   J48
18   Random Tree
19   Simple Cart
20   REPTree
B. Dataset

For this study we employed the GPRS dataset [7], which comprises two distinct wireless network topologies, i.e. WEP/WPA and WPA2. The WEP/WPA dataset is composed of an attack class (37.5%) and a normal class (62.5%), with 15 variables and 1 class label variable; we consider 9600 instances for the training set. The WPA2 dataset comprises 7500 instances with an attack class (40%) and a normal class (60%); it has 16 variables and 1 class label attribute.
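The class ratios above determine the per-class instance counts. A quick arithmetic check, using only the figures stated in the text:

```python
# Implied class counts for the two GPRS sub-datasets described above
# (instance counts and class ratios taken directly from the text).
datasets = {
    "WEP/WPA": {"instances": 9600, "attack_ratio": 0.375},
    "WPA2":    {"instances": 7500, "attack_ratio": 0.40},
}

for name, d in datasets.items():
    attacks = int(d["instances"] * d["attack_ratio"])   # attack-class instances
    normals = d["instances"] - attacks                  # normal-class instances
    print(name, attacks, normals)
# WEP/WPA: 3600 attack / 6000 normal; WPA2: 3000 attack / 4500 normal
```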
C. Base Classifiers

We chose 20 machine learning algorithms in order to evaluate their performance as base classifiers for the RF ensemble. We also took care to preserve the diversity and heterogeneity of the algorithms while choosing them. For the sake of simplicity, we assign an identification number (ID) to each classifier, as shown in Table II.
The principal component coefficients computed for each feature subset are organized into a sparse block rotation matrix

R_i = \begin{bmatrix}
a_{i,1}^{(1)}, \ldots, a_{i,1}^{(M_1)} & [0] & \cdots & [0] \\
[0] & a_{i,2}^{(1)}, \ldots, a_{i,2}^{(M_2)} & \cdots & [0] \\
\vdots & \vdots & \ddots & \vdots \\
[0] & [0] & \cdots & a_{i,K}^{(1)}, \ldots, a_{i,K}^{(M_K)}
\end{bmatrix} \quad (1)

At this point we need to construct R_i^a by rearranging the columns of R_i so as to match the order of features in F, and build classifier D_i using (X R_i^a, Y) as the training set. In the classification phase, for a given sample x, let d_{i,j}(x R_i^a) be the probability assigned by classifier D_i to the hypothesis that x comes from class \omega_j.
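The construction of R_i in (1) and the training step on (X R_i^a, Y) can be sketched as follows. This is a minimal illustration of the rotation forest rotation step (one ensemble member only), not the authors' Weka-based implementation; the subset count K, bootstrap fraction, and toy data are assumptions for the example.

```python
# Sketch of one rotation-forest member: split features into K subsets, run PCA
# on a bootstrap sample of each subset, assemble the block rotation matrix R_i
# of Eq. (1) (already arranged in original feature order, i.e. R_i^a), and
# train classifier D_i on the rotated data (X R_i^a, Y).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def build_rotation_matrix(X, K=4):
    n, d = X.shape
    features = rng.permutation(d)              # random partition of the features
    subsets = np.array_split(features, K)      # K disjoint feature subsets
    R = np.zeros((d, d))
    for subset in subsets:
        boot = rng.choice(n, size=int(0.75 * n), replace=True)  # bootstrap sample
        pca = PCA().fit(X[np.ix_(boot, subset)])
        # place the principal axes a_{i,k}^{(1..M_k)} as a diagonal block;
        # indexing by the original feature positions yields R_i^a directly
        R[np.ix_(subset, subset)] = pca.components_.T
    return R

# toy data standing in for the GPRS feature matrix (hypothetical, 8 features)
X = rng.normal(size=(200, 8))
y = (X[:, 0] + X[:, 3] > 0).astype(int)

R = build_rotation_matrix(X, K=4)
D_i = DecisionTreeClassifier(random_state=0).fit(X @ R, y)  # D_i on (X R_i^a, Y)
print(D_i.score(X @ R, y))
```

A full rotation forest repeats this for i = 1..L members and averages the class probabilities d_{i,j}(x R_i^a) over the ensemble.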
TABLE I
CLASSIFIER ENSEMBLES FOR IDS IN THE EXISTING LITERATURE

Study        Ensemble approach              Base classifier(s)                                        Dataset      Performance metric(s)  Significant test
[9]          Weighted ensemble              Classification and regression trees, Bayesian networks    KDD Cup 99   Accuracy               No
[10]         Majority voting                Neural network, support vector machine,                   KDD Cup 99   Accuracy               No
                                            multivariate regression splines
[11]         Weighted voting                Decision tree, support vector machine                     KDD Cup 99   Accuracy               No
[12]         Adaboost                       Decision stump                                            KDD Cup 99                          No
[13]         Product rule                   Decision tree                                             Private                             No
[14]         Min, max, product rule voting  k-means, ν-SVC                                            KDD Cup 99                          No
[15]         Bagging                        Multilayer perceptron, RBF neural network                 Private                             No
[16]         Voting                         Neural network, decision tree                             KDD Cup 99                          No
This study   Rotation forest                20 classifier algorithms                                  GPRS         AUC                    Yes

The AUC summarizes detection performance in terms of the true positive rate TP/P and the false positive rate FP/N, where P = TP + FN and N = FP + TN:

AUC = \frac{1}{2}\left(1 + \frac{TP}{TP + FN} - \frac{FP}{FP + TN}\right) \quad (3)
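The single-point AUC of (3) can be computed directly from a confusion matrix. The counts below are purely illustrative, not taken from the paper's experiments:

```python
# Worked example of the single-point AUC of Eq. (3):
# AUC = (1 + TP/P - FP/N) / 2, with P = TP + FN and N = FP + TN.
def point_auc(tp, fn, fp, tn):
    p = tp + fn          # actual positives (attack instances)
    n = fp + tn          # actual negatives (normal instances)
    tpr = tp / p         # true-positive (detection) rate
    fpr = fp / n         # false-alarm rate
    return (1 + tpr - fpr) / 2

# e.g. 3400 of 3600 attacks detected, 300 false alarms among 6000 normals
print(point_auc(tp=3400, fn=200, fp=300, tn=5700))  # -> 0.9472...
```

Note that (1 + TPR - FPR)/2 equals (TPR + TNR)/2, i.e. the balanced accuracy of the detector.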
For the WEP/WPA dataset, RF has brought remarkable improvement to OneR (OR) (11), Decision Stump (DS) (15), and Functional Tree (FT) (16): RF-OR, RF-DS, and RF-FT outperform the base classifiers OR (11), DS (15), and FT (16) in terms of AUC value. Among them, OR (11) receives the biggest improvement (22.51%) after the application of RF.

For the WPA2 dataset, the implementation of RF has also brought substantial improvement to Conjunctive Rule (CR) (13), OR (11), and DS (15): RF-CR, RF-OR, and RF-DS outperform the base classifiers CR (13), OR (11), and DS (15) in terms of AUC value. Among these three base classifiers, CR (13) gets the biggest improvement (35.69%) after the implementation of RF.

To further evaluate the performance of RF with different base classifiers, we take the median AUC of five times twofold cross-validation. Fig. 1 and Fig. 2 depict the median performance values for the WEP/WPA dataset and the WPA2 dataset, respectively. From Fig. 1 it is clear that the
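The five-times-twofold protocol above yields ten AUC values per classifier, whose median is then compared. A minimal sketch of that protocol, using a stand-in classifier and synthetic data rather than the paper's Weka setup and GPRS dataset:

```python
# Sketch of 5x twofold cross-validation: each of five repetitions splits the
# data into two halves, giving 10 test-fold AUC values; the median is the
# performance indicator compared between a base classifier and its RF ensemble.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

# synthetic stand-in with a 60/40 class ratio similar to the GPRS data
X, y = make_classification(n_samples=600, weights=[0.6, 0.4], random_state=0)

aucs = []
for rep in range(5):                                    # five repetitions
    cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=rep)
    for train, test in cv.split(X, y):                  # twofold split
        clf = DecisionTreeClassifier(random_state=0).fit(X[train], y[train])
        aucs.append(roc_auc_score(y[test], clf.predict_proba(X[test])[:, 1]))

print(len(aucs), np.median(aucs))                       # 10 folds -> median AUC
```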
TABLE III
MEAN VALUE OF AUC ON WEP/WPA DATASET
(E-AUC: AUC of the RF ensemble built on the base classifier)

ID   AUC      E-AUC    Improvement (%)   p-value
01   0.8031   0.8222    2.37             0.00512
02   0.8224   0.8512    3.51             0.09296
03   0.8673   0.8677    0.05             0.00512
04   0.9485   0.9499    0.15             0.0466
05   0.9382   0.9413    0.32             0.05876
06   0.9300   0.9409    1.17             0.00512
07   0.8726   0.8729    0.03             0.80258
08   0.7808   0.7704   -1.33             0.09296
09   0.8092   0.7817    5.09             0.00512
10   0.7781   0.8502    9.27             0.00512
11   0.6841   0.8381   22.51             0.00512
12   0.9469   0.9502    0.35             0.00512
13   0.7078   0.7844   10.83             0.00512
14   0.9459   0.9499    0.42             0.02852
15   0.7078   0.7873   11.24             0.00512
16   0.8325   0.9327   12.04             0.00512
17   0.9332   0.9449    1.25             0.05876
18   0.8029   0.8132    1.29             0.00512
19   0.9242   0.9502    2.82             0.00512
20   0.9413   0.9497    0.90             0.00694

TABLE IV
MEAN VALUE OF AUC ON WPA2 DATASET
(E-AUC: AUC of the RF ensemble built on the base classifier)

ID   AUC      E-AUC    Improvement (%)   p-value
01   0.9174   0.9222    0.52             0.09296
02   0.8761   0.9067    3.49             0.00512
03   0.9389   0.9378   -0.12             0.44726
04   0.9661   0.9715    0.56             0.01242
05   0.9217   0.9516    3.25             0.00512
06   0.9380   0.9470    0.95             0.0466
07   0.9321   0.9298   -0.25             0.50926
08   0.8517   0.8551    0.39             0.96012
09   0.9116   0.8551   -6.21             0.00512
10   0.9235   0.9313    0.84             0.00694
11   0.7064   0.9176   29.90             0.00512
12   0.9707   0.9764    0.59             0.00512
13   0.5000   0.6785   35.69             0.00512
14   0.9744   0.9772    0.29             0.00694
15   0.7458   0.7738    3.76             0.00512
16   0.9626   0.9751    1.30             0.00512
17   0.9684   0.9744    0.62             0.00512
18   0.9759   0.9770    0.11             0.00512
19   0.9754   0.9774    0.21             0.00512
20   0.9708   0.9757    0.51             0.00512
V. C ONCLUSION
ACKNOWLEDGMENT
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government