This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2018.2801792, IEEE Access
N. R. Sabar is with the Department of Computer Science and Information Technology, La Trobe University, Melbourne, VIC, Australia. Email: n.sabar@latrobe.edu.au
X. Yi and A. Song are with the School of Computer Science and Information Technology, RMIT University, Australia. Email: {xun.yi, andy.song}@rmit.edu.au
2169-3536 (c) 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Abstract: Cyber security in the context of big data is known to be a critical problem and presents a great challenge to the research community. Machine learning algorithms have been suggested as candidates for handling big data security problems. Among these algorithms, support vector machines (SVMs) have achieved remarkable success on various classification problems. However, to establish an effective SVM, the user needs to define the proper SVM configuration in advance, which is a challenging task that requires expert knowledge and a large amount of manual trial-and-error effort. In this work, we formulate the SVM configuration process as a bi-objective optimisation problem in which accuracy and model complexity are treated as two conflicting objectives. We propose a novel hyper-heuristic framework for bi-objective optimisation that is independent of the problem domain. This is the first time that a hyper-heuristic has been developed for this problem. The proposed hyper-heuristic framework consists of a high-level strategy and low-level heuristics. The high-level strategy uses the search performance to decide which low-level heuristic should be used to generate a new SVM configuration. Each low-level heuristic uses different rules to explore the SVM configuration search space effectively. To address bi-objective optimisation, the proposed framework adaptively integrates the strengths of decomposition- and Pareto-based approaches to approximate the Pareto set of SVM configurations. The effectiveness of the proposed framework has been evaluated on two cyber security problems: Microsoft malware big data classification and anomaly intrusion detection. The obtained results demonstrate that the proposed framework is highly effective and compares favourably with its counterparts and other algorithms.

Index Terms: Hyper-heuristics, Big data, Cyber security, Optimisation.

I. Introduction

The rapid advancement of technologies and networks such as mobile, social and Internet of Things (IoT) platforms generates massive amounts of digital information. In this context, the term big data has emerged to describe these massive amounts of digital information. Big data refers to large and complex datasets containing both structured and unstructured data that are generated on a daily basis and need to be analysed in short periods of time [49]. The term big data is different from big database: big data indicates that the data is too big, too fast, or too hard for existing tools to handle. Big data is commonly described by three characteristics: volume, variety and velocity (aka the 3Vs). The 3Vs define properties or dimensions of data, where volume refers to the extreme size of the data, variety indicates that the data was generated from diverse sources, and velocity refers to the speed of data creation, streaming and aggregation [49]. The complexity and challenge of big data are mainly due to the expansion of all three characteristics (3Vs) rather than the volume alone [14]. Learning from big data allows researchers, analysts and organisations to make better and faster decisions that enhance their operations and quality of life [38]. Given its practical applications and challenges, this field has attracted the attention of researchers and practitioners from various communities, including academia, industry and government agencies [14].

However, big data has created new issues related not only to the 3Vs characteristics, but also to data security. It has been indicated that big data not only increases the scale of the challenges related to security, but also creates new and different cyber-security threats that need to be addressed in effective and intelligent ways. Indeed, security is known to be the prime concern for any organisation when learning from big data [47]. Examples of big data cyber-security challenges are malware detection, authentication and steganalysis [45]. Among these challenges, malware detection is the most critical challenge in big data cyber-security. The term malware (short for malicious software) refers to various malicious computer programs, such as ransomware, viruses and scareware, that can infect computers and leak important information via networks, email or websites [53]. Researchers and organisations have acknowledged the damage that such malicious programs can cause, and new methods are therefore needed to prevent them. Yet, despite the fact that malware is a crucial issue in big data, very little research has been done in this area [47]. Examples of malware detection methods include signature-based detection methods [22], behaviour-monitoring detection methods [54] and pattern-based detection methods [19],[53]. However, most existing malware detection methods were designed for small-scale datasets and are unable to handle big data within a moderate amount of time. In addition, these methods can be easily evaded by attackers, are very costly to maintain and have very low success rates [53].

To address the above issues, machine learning (ML) algorithms have been proposed for classifying unknown
patterns and malicious software [45],[53]. ML has shown promising results in classifying and identifying unknown malware. Support vector machines (SVMs) are among the most popular ML algorithms and have shown remarkable success in various real-world applications [15]. The popularity of SVMs is due to their strong performance and scalability [40]. However, despite these advantages, the performance of an SVM is strongly affected by its selected configuration [9]. A typical SVM configuration includes the selection of the soft margin parameter (or penalty) and the kernel type as well as its parameters. In the literature, various methodologies have been developed for selecting SVM configurations. These methodologies can be classified based on the formulation of the SVM configuration problem and the optimisation method used [9], [12]. An SVM configuration formulation can rely on either a single criterion, in which case k-fold cross-validation is used to assess the performance of the generated configuration, or multiple criteria, in which case more than one criterion, such as the model accuracy and model complexity, is used to evaluate the generated configuration [46]. The available optimisation methods include grid search methods, gradient-based methods and meta-heuristic methods. Grid search methods are easy to implement and have shown good results [13]. However, they are computationally expensive, which limits their applicability to big data problems. Gradient-based methods are very efficient, but their main shortcomings are that they require the objective function to be differentiable and that they strongly depend on the initial point [4]. Meta-heuristic methods have been suggested to overcome the drawbacks of grid search methods and gradient-based methods [56], [5], [28]. However, the performance of a meta-heuristic method strongly depends on the selected parameters and operators, the selection of which is known to be a very difficult and time-consuming process. In addition, only one kernel is used in most works, and the search is performed over the parameter space of that kernel.

This work presents a novel bi-objective hyper-heuristic framework for SVM configuration optimisation. Hyper-heuristics are well suited to this task because they are independent of the particular problem at hand and can often obtain highly competitive configurations. Our proposed hyper-heuristic framework integrates several key components that differentiate it from existing works in finding an effective SVM configuration for big data cyber security. First, the framework considers a bi-objective formulation of the SVM configuration problem, in which the accuracy and model complexity are treated as two conflicting objectives. Second, the framework controls the selection of both the kernel type and the kernel parameters as well as the soft margin parameter. Third, the hyper-heuristic framework combines the strengths of decomposition- and Pareto-based approaches in an adaptive manner to find an approximate Pareto set of SVM configurations.

The performance of the proposed framework is validated and compared with that of state-of-the-art algorithms on two cyber security problems: Microsoft malware big data classification and anomaly intrusion detection. The empirical results demonstrate the effectiveness of the proposed framework on both problems.

The remainder of this paper is organised as follows. In the next section (Section II), we present a brief overview of related work. The definition and formulation of SVMs are presented in Section III. In Section IV, we describe the proposed hyper-heuristic framework and its main components. In Section V, we discuss the experimental setup, including the benchmark instances and the parameter settings of the proposed framework. In Section VI, we provide the computational results of our framework and compare the framework with other algorithms. Finally, the conclusion of this paper is presented in Section VII.

II. Related work

In this section, we briefly discuss related work on malware detection methods and meta-learning methods. It also includes a review of hyper-heuristics for classification problems.

A. Malware detection methods

A recent survey by Ye et al. [53] classified malware detection methods into three types: signature-based detection methods, pattern-based detection methods and cloud-based detection methods. Most existing detection methods use signatures to detect malware [21], [22]. A signature is a unique short string of bytes defined for each known malware program, so that it can be used to detect that program in files encountered in the future [22]. Although signature-based detection methods are able to detect malware, they require constant updating to add the signatures of new malware to the signature database. In addition, they can be easily evaded by malware developers through encryption, polymorphism or obfuscation [53]. Furthermore, the signature database is usually created manually by domain experts, which is known to be a tedious and time-consuming task [16].

Pattern-based detection methods check whether a given file contains a set of patterns. The patterns are extracted by domain experts to distinguish malware from benign files [2], [10], [35]. However, the analysis of malware and the extraction of patterns by domain experts is error-prone and requires a huge amount of time [19]. This indicates that manual analysis and extraction are major issues in developing pattern-based detection methods, because the amount of malware grows very fast [53].

Cloud-based detection methods use a server to store detection software, so that malware detection can be performed in a client-server manner using a cloud-based architecture [54], [41], [53]. However, cloud-based detection methods are highly affected by the available number of cluster nodes and the running time of the detection methods [29]. This can slow down the detection process, and thus multiple malware programs cannot be easily detected.
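To illustrate why signature-based detection is simple yet easy to evade, the sketch below scans a byte buffer for known byte signatures and then shows that a trivial XOR encoding of the same bytes hides the match. The family names and signature bytes are invented for illustration. A scanner with many signatures would more likely use a multi-pattern matcher such as the Aho-Corasick algorithm [2] instead of one substring search per signature.

```python
from typing import Dict, List

# Invented signature database: malware family name -> byte signature.
SIGNATURES: Dict[str, bytes] = {
    "FamilyA": bytes.fromhex("deadbeef0102"),
    "FamilyB": b"\x90\x90\xeb\xfe",
}


def scan(data: bytes, signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Return the names of all signatures that occur verbatim in data."""
    return [name for name, sig in signatures.items() if sig in data]


def xor_encode(data: bytes, key: int) -> bytes:
    """Trivial obfuscation: XOR every byte with a one-byte key."""
    return bytes(b ^ key for b in data)


sample = b"header" + bytes.fromhex("deadbeef0102") + b"payload"
print(scan(sample))                    # ['FamilyA']
print(scan(xor_encode(sample, 0x5A)))  # [] : the encoded copy no longer matches
```

Because the encoded file still decodes to the same program at run time, a byte-level signature alone cannot recognise it, which is the evasion problem noted above.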
The main components of the proposed hyper-heuristic framework are discussed in the following subsections.

B. Solution representation

In our framework, each solution represents one configuration (θ ∈ Θ) of the SVM, which is represented in the form of a one-dimensional array, as shown in Figure 3. In this figure, C is the margin parameter (or penalty), KF is the index of the selected kernel function, and k1, k2, ..., kKF are the parameters of that kernel function.

C. Population initialisation

The population of solutions (PS) is randomly initialised. We use the following equation to assign a random value to each decision variable in a given solution (x):

$$x_i^p = l_i^p + \mathrm{Rand}_i^p(0,1) \times \left(u_i^p - l_i^p\right), \qquad p = 1, 2, \ldots, |PS|, \quad i = 1, 2, \ldots, d \qquad (9)$$

where l_i^p and u_i^p are the lower and upper bounds of the ith decision variable, Rand_i^p(0,1) is a random number between 0 and 1, and d is the number of decision variables.

1) Select: The selection step involves selecting one heuristic from the existing pool of heuristics using a selection mechanism. In this work, we use the Multi-Armed Bandit (MAB) [39] as an on-line heuristic selection mechanism. In MAB, the past performance of each heuristic is saved, and these performances are then used to decide which heuristic should be selected. Each heuristic is associated with two variables: the empirical reward q_i and the confidence level n_i. The empirical reward q_i represents the average reward obtained during the search process using this heuristic; a higher empirical reward is better. The confidence level n_i is the number of times that the ith heuristic has previously been applied. Based on these two variables, MAB calculates the confidence interval for each heuristic and then selects the heuristic with the highest value using the following formula (Equation (11)):

$$\arg\max_{i \in \{LLH_1, \ldots, LLH_n\}} \left( q_i(t) + c \sqrt{\frac{2 \log \sum_{j=LLH_1}^{LLH_n} n_j(t)}{n_i(t)}} \right) \qquad (11)$$
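The MAB selection rule of Equation (11) can be sketched as a small class that stores q_i and n_i for each low-level heuristic. This is a minimal sketch under stated assumptions, not the authors' implementation: the reward signal is left to the caller, c = 1.0 is only a placeholder default, and applying every heuristic once before using the formula (so that n_i > 0 and the square root is defined) is an assumed convention.

```python
import math
from typing import List


class UCBSelector:
    """Multi-armed bandit heuristic selection in the spirit of Equation (11).

    q[i] is the empirical (average) reward of heuristic i and n[i] is the
    number of times heuristic i has been applied. The constant c and the
    reward signal are placeholders: the text does not specify them here.
    """

    def __init__(self, num_heuristics: int, c: float = 1.0) -> None:
        self.q: List[float] = [0.0] * num_heuristics
        self.n: List[int] = [0] * num_heuristics
        self.c = c

    def select(self) -> int:
        # Assumed convention: try every heuristic once so that n[i] > 0.
        for i, count in enumerate(self.n):
            if count == 0:
                return i
        total = sum(self.n)
        scores = [
            self.q[i] + self.c * math.sqrt(2.0 * math.log(total) / self.n[i])
            for i in range(len(self.n))
        ]
        # arg max over LLH_1 ... LLH_n
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, i: int, reward: float) -> None:
        # Running average keeps q[i] equal to the mean reward of heuristic i.
        self.n[i] += 1
        self.q[i] += (reward - self.q[i]) / self.n[i]


# Toy run: heuristic 2 always earns reward 1, the others earn 0.
sel = UCBSelector(3, c=0.5)
for _ in range(200):
    chosen = sel.select()
    sel.update(chosen, 1.0 if chosen == 2 else 0.0)
print(sel.n)  # heuristic 2 is chosen far more often than 0 or 1
```

The exploration term shrinks as n_i grows, so a heuristic that keeps earning rewards is exploited, while rarely used heuristics are still revisited occasionally as the total count increases.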
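The solution representation of Figure 3 and the random initialisation of Equation (9) can be illustrated together. In this sketch the kernel pool, the bound values and the rule that truncates the real-valued KF to an integer kernel index are hypothetical choices; the text only states that a solution holds C, KF and the kernel parameters. The population size of 20 is the final value from Table II.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical kernel pool: (name, number of kernel parameters it uses).
KERNELS: List[Tuple[str, int]] = [
    ("linear", 0),
    ("rbf", 1),         # k1 = gamma
    ("polynomial", 2),  # k1 = gamma, k2 = degree
]

# Hypothetical bounds for the decision variables [C, KF, k1, k2].
LOWER = [0.01, 0.0, 1e-4, 1.0]
UPPER = [1000.0, float(len(KERNELS)), 10.0, 5.0]


def init_population(size: int, lower: Sequence[float], upper: Sequence[float],
                    rng: random.Random) -> List[List[float]]:
    """Equation (9): x_i^p = l_i^p + Rand(0, 1) * (u_i^p - l_i^p)."""
    return [
        [lo + rng.random() * (up - lo) for lo, up in zip(lower, upper)]
        for _ in range(size)
    ]


def decode(x: Sequence[float]) -> Tuple[float, str, List[float]]:
    """Turn a vector [C, KF, k1, k2, ...] into (C, kernel name, kernel params)."""
    c = x[0]
    # KF is real-valued during the search; truncate and clamp it to a valid index.
    index = min(max(int(x[1]), 0), len(KERNELS) - 1)
    name, n_params = KERNELS[index]
    return c, name, list(x[2:2 + n_params])


population = init_population(20, LOWER, UPPER, random.Random(42))  # |PS| = 20
print(decode(population[0]))
```

Keeping KF as a real number lets the same arithmetic operators act on every position of the array; the decoding step is where the discrete kernel choice is recovered.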
[Figure: Hyper-heuristic framework. The high-level strategy runs Start, Select LLH, Apply LLH, Accept solution?, Terminate? (yes: Stop; no: continue); a domain barrier separates it from the low-level heuristics.]
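The "Accept solution?" decision in the figure corresponds to the acceptance step of this section: a new solution replaces a neighbouring sub-problem when its scalarised value g^te is lower, and the archive is then updated with the non-dominated solutions. The text does not define g^te in this excerpt, so the sketch below assumes the weighted Tchebycheff form common in decomposition-based methods, g^te(x | λ, z*) = max_j λ_j |f_j(x) − z*_j|, where z* holds the best value seen for each objective. It also stores only objective vectors (error, number of support vectors) instead of full SVM configurations. In practice the two objectives would probably need normalising first, because they live on very different scales.

```python
from typing import List, Sequence, Tuple

# Objective vector of one SVM configuration: (error, number of support vectors);
# both objectives are minimised. Storing only objective vectors is a simplification.
Objectives = Tuple[float, float]


def tchebycheff(f: Sequence[float], weights: Sequence[float],
                ideal: Sequence[float]) -> float:
    """Assumed form of g^te: max_j weights_j * |f_j - ideal_j|."""
    return max(w * abs(fj - zj) for fj, w, zj in zip(f, weights, ideal))


def accept(child: Objectives, population: List[Objectives],
           weights: List[Sequence[float]], neighbours: Sequence[int],
           ideal: List[float]) -> int:
    """Replace each neighbouring sub-problem y whenever g^te(child) < g^te(y)."""
    # Keep the ideal point z* equal to the best value seen for each objective.
    for j, fj in enumerate(child):
        ideal[j] = min(ideal[j], fj)
    replaced = 0
    for k in neighbours:
        if tchebycheff(child, weights[k], ideal) < tchebycheff(population[k], weights[k], ideal):
            population[k] = child
            replaced += 1
    return replaced


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance for minimisation."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_archive(archive: List[Objectives],
                   candidates: List[Objectives]) -> List[Objectives]:
    """Keep the non-dominated members of archive + candidates (duplicates dropped)."""
    pool = list(dict.fromkeys(archive + candidates))
    return [p for p in pool if not any(dominates(q, p) for q in pool)]
```

For example, with weight vectors (0.9, 0.1), (0.5, 0.5) and (0.1, 0.9), a child with objectives (0.15, 12.0) replaces the second sub-problem's solution (0.20, 20.0) but not the third one's (0.30, 10.0).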
from only the archive with probability p_a. A fixed set of neighbouring solutions for each sub-problem is determined using the Euclidean distances between any two solutions based on their weight vectors.
• Heuristic application. In this task, the selected heuristic is applied to the created mating pool to evolve a new set of solutions.

3) Accept solution: The acceptance step checks whether the newly generated solutions should be accepted. In this work, we first compare each solution x with its neighbouring sub-problems y; x replaces y if it is superior in terms of the scalarisation function, that is, if g^te(x, λ) < g^te(y, λ). Next, we update the archive using the non-dominated solutions.

4) Terminate: This step checks whether the total number of iterations has been reached and, if so, terminates the search process and returns the set of non-dominated solutions. Otherwise, it starts a new iteration.

F. Low-level heuristics

The low-level heuristics (LLHs) are a set of problem-specific rules that operate directly on a given solution. Each LLH takes one or more solutions as input and then modifies them to generate a new solution. In this work, we utilise various sets of heuristics within the proposed framework. These heuristics have been demonstrated to be suitable for different problems and even for different stages of the same problem. These heuristics are chosen

$$x = x + \mathcal{N}(\mathit{Mean}, \sigma^2) \qquad (14)$$

where Mean = 0 and the variance is σ² = 0.5.

2) Differential Mutation 1

$$x = x_1 + F \times (x_2 - x_3) \qquad (15)$$

3) Differential Mutation 2

$$x = x_1 + F \times (x_2 - x_3) + F \times (x_4 - x_5) \qquad (16)$$

4) Differential Mutation 3

$$x = x_1 + F \times (x_1 - x_2) + F \times (x_3 - x_4) \qquad (17)$$

where x_1, x_2, x_3, x_4 and x_5 are five different solutions selected from the mating pool in accordance with the solution selection process discussed in Section IV-E2, and F is a scaling factor whose value is fixed to 0.9 in this study.

5) Arithmetic Crossover

$$x = \lambda \times x_1 + (1 - \lambda) \times x_2 \qquad (18)$$

where λ is a randomly generated number in the range λ ∈ [0, 1], x_1 is the solution of the current sub-problem, and x_2 is the best solution in its neighbourhood.
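Equations (14) to (18) reduce to simple element-wise vector operations, sketched below. F = 0.9 is the value stated in the text. The other details are assumptions for this sketch: σ² = 0.5 in Equation (14) is treated as a variance (standard deviation √0.5 ≈ 0.707), λ in Equation (18) is drawn once per offspring, and the `repair` helper that clips variables back into their bounds is hypothetical. Choosing x_1 to x_5 from the mating pool is left to the caller.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]

F = 0.9                  # scaling factor of Equations (15)-(17), as stated in the text
SIGMA = math.sqrt(0.5)   # assumption: sigma^2 = 0.5 in Equation (14) is a variance


def gaussian_mutation(x: Sequence[float], rng: random.Random) -> Vector:
    """Equation (14): add zero-mean Gaussian noise to every decision variable."""
    return [xi + rng.gauss(0.0, SIGMA) for xi in x]


def differential_mutation_1(x1: Sequence[float], x2: Sequence[float],
                            x3: Sequence[float]) -> Vector:
    """Equation (15): x = x1 + F (x2 - x3)."""
    return [a + F * (b - c) for a, b, c in zip(x1, x2, x3)]


def differential_mutation_2(x1: Sequence[float], x2: Sequence[float], x3: Sequence[float],
                            x4: Sequence[float], x5: Sequence[float]) -> Vector:
    """Equation (16): x = x1 + F (x2 - x3) + F (x4 - x5)."""
    return [a + F * (b - c) + F * (d - e)
            for a, b, c, d, e in zip(x1, x2, x3, x4, x5)]


def differential_mutation_3(x1: Sequence[float], x2: Sequence[float],
                            x3: Sequence[float], x4: Sequence[float]) -> Vector:
    """Equation (17), as printed: x = x1 + F (x1 - x2) + F (x3 - x4)."""
    return [a + F * (a - b) + F * (c - d) for a, b, c, d in zip(x1, x2, x3, x4)]


def arithmetic_crossover(x1: Sequence[float], x2: Sequence[float],
                         rng: random.Random) -> Vector:
    """Equation (18): x = lam * x1 + (1 - lam) * x2, with lam drawn from [0, 1]."""
    lam = rng.random()
    return [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]


def repair(x: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> Vector:
    """Hypothetical repair step: clip each variable back into its bounds."""
    return [min(max(v, lo), up) for v, lo, up in zip(x, lower, upper)]
```

Because every operator maps vectors to vectors of the same length, the high-level strategy can call any of them interchangeably, which is what allows the bandit to treat them as arms.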
6) Polynomial Mutation

$$x = \begin{cases} x_1 + \sigma \times (b - a), & \text{if } \mathrm{Rand} \le 0.5 \\ x_1, & \text{otherwise} \end{cases} \qquad (19)$$

and

$$\sigma = \begin{cases} (2 \times \mathrm{Rand})^{\frac{1}{\eta+1}} - 1, & \text{if } \mathrm{Rand} \le 0.8 \\ 1 - (2 - 2 \times \mathrm{Rand})^{\frac{1}{\eta+1}}, & \text{otherwise} \end{cases}$$

where η is set to 0.3, and a and b are the lower and upper bounds, respectively, on the value of the ith decision variable.

G. Archive

The archive saves the set of non-dominated solutions and is updated in each iteration. In this work, the newly generated solutions are first added to the archive. Then, following the concept of NSGA-II [18], we use the non-dominated sorting procedure to divide the archive into several levels of non-domination, such that solutions in the first level have the highest priority to be selected, those in the second level have the second-highest priority, and so on. To ensure that the selected solutions are distributed along the Pareto front (PF), we may also select some solutions at the lowest level, depending on the crowding distance measure.

V. Experimental setup

This section summarises the benchmark instances that were used to assess the proposed framework and the parameter settings.

A. Benchmark instances

In this work, we evaluated our proposed framework on two different cyber security problems with a broad range of structures and sizes.

1) Microsoft malware big data classification: The first experimental evaluation uses the Microsoft malware big data classification problem, which was introduced for BIG 2015 and hosted on Kaggle¹. Microsoft provided a total of 500 GB of data of known malware files representing a mix of 9 families (classes), split into a training set and a testing set. A total of 10,868 malware samples are included in the training set, and 10,783 malware samples are included in the testing set. Each sample is a binary file with the extension ".bytes", and the corresponding disassembled file in assembly language (text) has the extension ".asm". The ultimate goal is to train the classification algorithm on the training samples so that it classifies each of the testing samples into one of the 9 categories (malware families) such that the logloss function below is minimised:

$$\mathrm{logloss} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij} \log(p_{ij}) \qquad (20)$$

where N represents the number of training samples, M is the number of classes, log is the natural logarithm, and y_ij is a true label that takes a value of 1 if i is in class j and 0 otherwise. p_ij is the estimated probability that i belongs to class j. Further description can be found on the Kaggle web site.

2) Anomaly intrusion detection: In the second experimental evaluation, we used the NSL-KDD² anomaly intrusion detection instances. NSL-KDD includes selected records from the KDDCUP99 dataset, which was collected by monitoring incoming network traffic. NSL-KDD has been used by many researchers to develop network-based intrusion detection systems (NIDSs). The NSL-KDD problem instance consists of 125,973 training samples and 22,544 testing samples, each classified as either normal or anomalous (i.e., a network attack).

¹ https://www.kaggle.com/c/malware-classification/data
² http://nsl.cs.unb.ca/NSL-KDD/

B. Parameter settings

The proposed framework has a few parameters that need to be determined in advance. To this end, we conducted a preliminary investigation to set the values of these parameters. We tested different values for each parameter while keeping the other parameters fixed. Table II shows the parameter settings investigated in our work as well as the final selected values.

TABLE II: The parameter settings of our framework
Parameter                      | Investigated range | Final value
Maximum number of generations  | 5–150              | 100
Population size, PS            | 5–30               | 20
Archive size                   | 5–20               | 10
p_n                            | 0.1–0.8            | 0.5
p_na                           | 0.1–0.8            | 0.3
p_a                            | 0.1–0.8            | 0.2

VI. Results and comparisons

In this section, we present the results of the experiments that we conducted to evaluate the proposed HH-SVM framework described in this paper. We conducted two experimental tests. In the first test, HH-SVM was compared with each low-level heuristic individually. In the second test, the results of HH-SVM were compared with those of other algorithms proposed in the literature.

A. HH-SVM compared with individual low-level heuristics

This section compares the proposed HH-SVM with each low-level heuristic (LLH). Our aim is to assess the benefits of the proposed hyper-heuristic framework and the effects of using multiple LLHs on the search performance. To this end, we tested each LLH separately, which yields seven algorithms to compare: HH-SVM, LLH1, LLH2, LLH3, LLH4, LLH5, and LLH6. All
algorithms were executed under identical conditions, and the same base components were utilised on both problem instances (BIG 2015 and NSL-KDD). The average results over 31 independent runs are compared in Table III. The BIG 2015 results are compared in terms of logloss (Equation (20)), for which lower values are better, whereas the NSL-KDD results are compared based on accuracy, for which higher values are better. In the table, the best results achieved among all algorithms are indicated in bold font. From the results, we can see that HH-SVM outperforms all other algorithms (LLH1, LLH2, LLH3, LLH4, LLH5, and LLH6) on both BIG 2015 and NSL-KDD. Table IV reports the numbers of support vectors (NSV) for HH-SVM and the compared algorithms on both instances, for which lower values are better. As seen from this table, the proposed HH-SVM framework produced lower NSV values for both BIG 2015 and NSL-KDD than LLH1, LLH2, LLH3, LLH4, LLH5, and LLH6. These positive results justify the use of the proposed hyper-heuristic framework and of the pool of heuristics (LLHs).

TABLE III: Comparison of the HH-SVM results against the results of all low-level heuristics (LLH1 to LLH6) individually
Algorithm | BIG 2015 (logloss) | NSL-KDD (accuracy, %)
HH-SVM    | 0.0031             | 85.69
LLH1      | 0.0332             | 77.24
LLH2      | 0.0223             | 66.45
LLH3      | 0.0214             | 80.01
LLH4      | 0.0208             | 79.22
LLH5      | 0.0227             | 80.37
LLH6      | 0.0216             | 76.93

TABLE IV: The NSV values obtained by HH-SVM and the individual low-level heuristics (LLH1 to LLH6)
Algorithm | BIG 2015 | NSL-KDD
HH-SVM    | 20       | 8
LLH1      | 33       | 12
LLH2      | 34       | 17
LLH3      | 34       | 20
LLH4      | 42       | 16
LLH5      | 41       | 22
LLH6      | 38       | 21

To further verify these results, we conducted statistical tests using the Wilcoxon test with a significance level of 0.05. The p-values for the HH-SVM results versus those of LLH1, LLH2, LLH3, LLH4, LLH5, and LLH6 are reported in Table V. In this table, a p-value of less than 0.05 indicates that HH-SVM is statistically superior to the algorithm considered for comparison, whereas a value greater than 0.05 indicates that the performance of our proposed HH-SVM framework is not significantly superior. From the table, we can see that all p-values are less than 0.05, indicating that HH-SVM is statistically superior to LLH1, LLH2, LLH3, LLH4, LLH5, and LLH6 on both BIG 2015 and NSL-KDD.

TABLE V: The p-values of HH-SVM compared with the individual low-level heuristics
HH-SVM vs. | BIG 2015 p-value | NSL-KDD p-value
LLH1       | 0.001            | 0.000
LLH2       | 0.000            | 0.010
LLH3       | 0.020            | 0.011
LLH4       | 0.000            | 0.000
LLH5       | 0.012            | 0.000
LLH6       | 0.022            | 0.000

B. HH-SVM compared with other algorithms

In this section, the results of HH-SVM are compared with those reported in the literature. For BIG 2015, we consider the following algorithms in the comparison:
• XGBoost (AE) [55]
• Random Forest (RF) [23]
• Optimised XGBoost (OXB) [1]

For the NSL-KDD instance, the accuracy results obtained by HH-SVM are compared against those of the following algorithms:
• Gaussian Naive Bayes Tree (GNBT) [48]
• Fuzzy Classifier (FC) [27]
• Decision Tree (DT) [33]

The results of HH-SVM and the other algorithms for the BIG 2015 and NSL-KDD problem instances are summarised in Table VI and Table VII, respectively. Following the literature, the results for BIG 2015 in Table VI are given as the logloss values achieved by the various algorithms, whereas in Table VII, all algorithms are compared in terms of accuracy. In the logloss comparisons, a lower value indicates better performance, whereas in the accuracy comparisons, a higher value indicates better performance. The best result obtained among the compared algorithms is indicated in bold in both tables. As shown in Table VI, HH-SVM has a lower logloss value than AE, RF and OXB on the BIG 2015 instance, whereas in Table VII, the accuracy of HH-SVM is higher than those of GNBT, FC and DT on the NSL-KDD instance. The results demonstrate that HH-SVM is an effective methodology for addressing cyber security problems. The good performance of HH-SVM can be attributed to its ability to design and optimise different SVMs for different problem instances and for different stages of the solution process.

TABLE VI: Comparison of the logloss results of HH-SVM and other algorithms
Algorithm | BIG 2015 (logloss)
HH-SVM    | 0.0031
AE        | 0.0748
RF        | 0.0988557
OXB       | 0.0063
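The logloss values reported for BIG 2015 follow Equation (20). A minimal sketch of that metric is given below; labels are passed as class indices, which is equivalent to the one-hot y_ij of Equation (20) because only the true-class term of each row is non-zero. Clipping probabilities to [eps, 1 − eps] and renormalising each row are safeguards assumed here to avoid log(0); they are not stated in the text.

```python
import math
from typing import Sequence


def multiclass_logloss(labels: Sequence[int],
                       probs: Sequence[Sequence[float]],
                       eps: float = 1e-15) -> float:
    """Equation (20): -(1/N) * sum_i sum_j y_ij * log(p_ij).

    labels[i] is the true class index of sample i (the j with y_ij = 1), and
    probs[i][j] is the predicted probability p_ij. Only the true-class term of
    each row contributes, because y_ij is zero for every other class.
    """
    if len(labels) != len(probs) or not labels:
        raise ValueError("labels and probs must be non-empty and of equal length")
    total = 0.0
    for label, row in zip(labels, probs):
        # Assumed safeguard: clip to [eps, 1 - eps] and renormalise the row.
        clipped = [min(max(p, eps), 1.0 - eps) for p in row]
        total += math.log(clipped[label] / sum(clipped))
    return -total / len(labels)


# Nine malware families with uniform predictions give approximately log(9), about 2.197.
uniform = [[1.0 / 9] * 9 for _ in range(3)]
print(multiclass_logloss([0, 4, 8], uniform))
```

The uniform baseline makes the scale of Table VI easier to read: a classifier that knows nothing about the nine families scores about 2.197, so values such as 0.0031 correspond to predictions that place almost all probability mass on the correct family.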
TABLE VII: Comparison of the accuracy results of HH-SVM and other algorithms
Algorithm | NSL-KDD (accuracy, %)
HH-SVM    | 85.69
GNBT      | 82.02
FC        | 82.74
DT        | 80.14

VII. Conclusion

In this work, we proposed a hyper-heuristic SVM optimisation framework for big data cyber security problems. We formulated the SVM configuration process as a bi-objective optimisation problem in which accuracy and model complexity are treated as two conflicting objectives, and we solved this problem with the proposed hyper-heuristic framework. The framework integrates the strengths of decomposition- and Pareto-based approaches to approximate the Pareto set of configurations. Our framework has been tested on two benchmark cyber security problem instances: Microsoft malware big data classification and anomaly intrusion detection. The experimental results demonstrate the effectiveness and potential of the proposed framework in achieving competitive, if not superior, results compared with other algorithms.

References

[1] Mansour Ahmadi, Dmitry Ulyanov, Stanislav Semenov, Mikhail Trofimov, and Giorgio Giacinto. Novel feature extraction, selection and fusion for effective malware family classification. In Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy, pages 183–194. ACM, 2016.
[2] Alfred V Aho and Margaret J Corasick. Efficient string matching: an aid to bibliographic search. Communications of the ACM, 18(6):333–340, 1975.
[3] Shawkat Ali and Kate A Smith-Miles. A meta-learning approach to automatic kernel selection for support vector machines. Neurocomputing, 70(1):173–186, 2006.
[4] Nedjem-Eddine Ayat, Mohamed Cheriet, and Ching Y Suen. Automatic model selection for the optimization of support vector machine kernels. Pattern Recognition, 38(10):1733–1745,
[10] David Brumley, Cody Hartwig, Zhenkai Liang, James Newsome, Dawn Song, and Heng Yin. Automatically identifying trigger-based behavior in malware. Botnet Detection, pages 65–88, 2008.
[11] Edmund K Burke, Matthew Hyde, Graham Kendall, Gabriela Ochoa, Ender Özcan, and John R Woodward. A classification of hyper-heuristic approaches. In Handbook of Metaheuristics, pages 449–468. Springer, 2010.
[12] Athanassia Chalimourda, Bernhard Schölkopf, and Alex J Smola. Experimentally optimal ν in support vector regression for different noise models and parameter settings. Neural Networks, 17(1):127–141, 2004.
[13] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3):27, 2011.
[14] Min Chen, Shiwen Mao, and Yunhao Liu. Big data: A survey. Mobile Networks and Applications, 19(2):171–209, 2014.
[15] Nello Cristianini and John Shawe-Taylor. An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, 2000.
[16] Mohsen Damshenas, Ali Dehghantanha, and Ramlan Mahmoud. A survey on malware propagation, analysis, and detection. International Journal of Cyber-Security and Digital Forensics (IJCSDF), 2(4):10–29, 2013.
[17] Kalyanmoy Deb. Multi-objective optimization using evolutionary algorithms, volume 16. John Wiley & Sons, 2001.
[18] Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and TAMT Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182–197, 2002.
[19] Manuel Egele, Theodoor Scholte, Engin Kirda, and Christopher Kruegel. A survey on automated dynamic malware-analysis techniques and tools. ACM Computing Surveys (CSUR), 44(2):6, 2012.
[20] Agoston E Eiben, James E Smith, et al. Introduction to evolutionary computing, volume 53. Springer, 2003.
[21] Eric Filiol. Malware pattern scanning schemes secure against black-box analysis. Journal in Computer Virology, 2(1):35–50, 2006.
[22] Eric Filiol, Grégoire Jacob, and Mickaël Le Liard. Evaluation methodology and theoretical model for antiviral behavioural detection strategies. Journal in Computer Virology, 3(1):23–37, 2007.
[23] Luba Gloukhov, Cody Wild, and David Reilly. Malware classification: Distributed data mining with Spark. In Association for the Advancement of Artificial Intelligence, pages 1–6. www.aaai.org, 2015.
[24] Taciana AF Gomes, Ricardo BC Prudêncio, Carlos Soares, André LD Rossi, and André Carvalho. Combining meta-learning and search techniques to select parameters for support vector machines. Neurocomputing, 75(1):3–13, 2012.
[25] Kieran Greer. A stochastic hyperheuristic for unsupervised matching of partial information. Advances in Artificial Intel-
2005. ligence, 2012:13, 2012.
[5] Yukun Bao, Zhongyi Hu, and Tao Xiong. A particle swarm [26] Frank Hutter, Holger H Hoos, Kevin Leyton-Brown, and
optimization and pattern search based memetic algorithm for Thomas Stützle. Paramils: an automatic algorithm configu-
svms parameters optimization. Neurocomputing, 117:98–106, ration framework. Journal of Artificial Intelligence Research,
2013. 36(1):267–306, 2009.
[6] Rodrigo C Barros, Márcio P Basgalupp, André CPLF de Car- [27] Pavel Krömer, Jan Platoš, Václav Snášel, and Ajith Abraham.
valho, and Alex A Freitas. A hyper-heuristic evolutionary Fuzzy classification by evolutionary algorithms. In Systems,
algorithm for automatically designing decision-tree algorithms. Man, and Cybernetics (SMC), 2011 IEEE International Con-
In Proceedings of the 14th annual conference on Genetic and ference on, pages 313–318. IEEE, 2011.
evolutionary computation, pages 1237–1244. ACM, 2012. [28] Ana Carolina Lorena and Andre CPLF De Carvalho. Evolu-
[7] Márcio P Basgalupp, Rodrigo C Barros, Tiago S da Silva, and tionary tuning of support vector machine parameter values in
André CPLF de Carvalho. Software effort prediction: a hyper- multiclass problems. Neurocomputing, 71(16):3326–3334, 2008.
heuristic decision-tree based approach. In Proceedings of the [29] Mohammad M Masud, Tahseen M Al-Khateeb, Kevin W
28th Annual ACM Symposium on Applied Computing, pages Hamlen, Jing Gao, Latifur Khan, Jiawei Han, and Bhavani
1109–1116. ACM, 2013. Thuraisingham. Cloud-based malware detection for evolving
[8] Márcio P Basgalupp, Rodrigo C Barros, and Vili Podgor- data streams. ACM transactions on management information
elec. Evolving decision-tree induction algorithms with a multi- systems (TMIS), 2(3):16, 2011.
objective hyper-heuristic. In Proceedings of the 30th Annual [30] Pericles BC Miranda, Ricardo BC Prudencio, Andre CPLF
ACM Symposium on Applied Computing, pages 110–117. ACM, Carvalho, and Carlos Soares. Combining meta-learning with
2015. multi-objective particle swarm algorithms for support vector
[9] Asa Ben-Hur and Jason Weston. A users guide to support vector machine parameter selection: An experimental analysis. In
machines. Data mining techniques for the life sciences, pages Neural Networks (SBRN), 2012 Brazilian Symposium on, pages
223–239, 2010. 1–6. IEEE, 2012.
[31] Péricles BC Miranda, Ricardo BC Prudencio, Andre Carlos PLF de Carvalho, and Carlos Soares. Multi-objective optimization and meta-learning for support vector machine parameter selection. In Neural Networks (IJCNN), The 2012 International Joint Conference on, pages 1–8. IEEE, 2012.
[32] Péricles BC Miranda, Ricardo BC Prudêncio, André PLF De Carvalho, and Carlos Soares. A hybrid meta-learning architecture for multi-objective optimization of support vector machine parameters. Neurocomputing, 143:27–43, 2014.
[33] Mehdi Mohammadi, Bijan Raahemi, Ahmad Akbari, and Babak Nassersharif. New class-dependent feature transformation for intrusion detection systems. Security and Communication Networks, 5(12):1296–1311, 2012.
[34] José Carlos Ortiz-Bayliss, Hugo Terashima-Marín, and Santiago Enrique Conant-Pablos. Learning vector quantization for variable ordering in constraint satisfaction problems. Pattern Recognition Letters, 34(4):423–432, 2013.
[35] Chhabi Rani Panigrahi, Mayank Tiwari, Bibudhendu Pati, and Rajendra Prasath. Malware detection in big data using fast pattern matching: A Hadoop based comparison on GPU. In Mining Intelligence and Knowledge Exploration, pages 407–416. Springer, 2014.
[36] Matthias Reif, Faisal Shafait, and Andreas Dengel. Meta-learning for evolutionary parameter optimization of classifiers. Machine Learning, 87(3):357–380, 2012.
[37] Alejandro Rosales-Pérez, Jesus A Gonzalez, Carlos A Coello Coello, Hugo Jair Escalante, and Carlos A Reyes-Garcia. Surrogate-assisted multi-objective model selection for support vector machines. Neurocomputing, 150:163–172, 2015.
[38] Nasser R Sabar, Jemal Abawajy, and John Yearwood. Heterogeneous cooperative co-evolution memetic differential evolution algorithm for big data optimization problems. IEEE Transactions on Evolutionary Computation, 21(2):315–327, 2017.
[39] Nasser R Sabar, Masri Ayob, Graham Kendall, and Rong Qu. A dynamic multiarmed bandit-gene expression programming hyper-heuristic for combinatorial optimization problems. IEEE Transactions on Cybernetics, 45(2):217–228, 2015.
[40] Bernhard Schölkopf and Alexander J Smola. Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press, 2002.
[41] Sagar Shaw, Manish Kumar Gupta, and Sanjay Chakraborty. Cloud based malware detection. In Proceedings of the 5th International Conference on Frontiers in Intelligent Computing: Theory and Applications: FICTA 2016, volume 1, page 485. Springer, 2017.
[42] Kevin Sim, Emma Hart, and Ben Paechter. A hyper-heuristic classifier for one dimensional bin packing problems: Improving classification accuracy by attribute evolution. Parallel Problem Solving from Nature-PPSN XII, pages 348–357, 2012.
[43] Carlos Soares, Pavel B Brazdil, and Petr Kuba. A meta-learning method to select the kernel width in support vector regression. Machine Learning, 54(3):195–209, 2004.
[44] Dawn Song, David Brumley, Heng Yin, Juan Caballero, Ivan Jager, Min Kang, Zhenkai Liang, James Newsome, Pongsin Poosankam, and Prateek Saxena. BitBlaze: A new approach to computer security via binary analysis. Information Systems Security, pages 1–25, 2008.
[45] Shan Suthaharan. Big data classification: Problems and challenges in network intrusion prediction with machine learning. ACM SIGMETRICS Performance Evaluation Review, 41(4):70–73, 2014.
[46] Thorsten Suttorp and Christian Igel. Multi-objective optimization of support vector machines. Multi-Objective Machine Learning, pages 199–220, 2006.
[47] Colin Tankard. Big data security. Network Security, 2012(7):5–8, 2012.
[48] Mahbod Tavallaee, Ebrahim Bagheri, Wei Lu, and Ali A Ghorbani. A detailed analysis of the KDD CUP 99 data set. In Computational Intelligence for Security and Defense Applications, 2009. CISDA 2009. IEEE Symposium on, pages 1–6. IEEE, 2009.
[49] Chun-Wei Tsai, Chin-Feng Lai, Han-Chieh Chao, and Athanasios V Vasilakos. Big data analytics: a survey. Journal of Big Data, 2(1):21, 2015.
[50] Vladimir Vapnik. The nature of statistical learning theory. Springer Science & Business Media, 2013.
[51] Alan Vella, David Corne, and Chris Murphy. Hyper-heuristic decision tree induction. In Nature & Biologically Inspired Computing, 2009. NaBIC 2009. World Congress on, pages 409–414. IEEE, 2009.
[52] Ricardo Vilalta and Youssef Drissi. A perspective view and survey of meta-learning. Artificial Intelligence Review, 18(2):77–95, 2002.
[53] Yanfang Ye, Tao Li, Donald Adjeroh, and S Sitharama Iyengar. A survey on malware detection using data mining techniques. ACM Computing Surveys (CSUR), 50(3):41, 2017.
[54] Yanfang Ye, Tao Li, Shenghuo Zhu, Weiwei Zhuang, Egemen Tas, Umesh Gupta, and Melih Abdulhayoglu. Combining file content and file relations for cloud based malware detection. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 222–230. ACM, 2011.
[55] Mahmood Yousefi-Azar, Vijay Varadharajan, Len Hamey, and Uday Tupakula. Autoencoder-based feature learning for cyber security applications. In Neural Networks (IJCNN), 2017 International Joint Conference on, pages 3854–3861. IEEE, 2017.
[56] Jun Zhang, Zhi-hui Zhan, Ying Lin, Ni Chen, Yue-jiao Gong, Jing-hui Zhong, Henry SH Chung, Yun Li, and Yu-hui Shi. Evolutionary computation meets machine learning: A survey. IEEE Computational Intelligence Magazine, 6(4):68–75, 2011.
[57] Qingfu Zhang and Hui Li. MOEA/D: A multiobjective evolutionary algorithm based on decomposition. IEEE Transactions on Evolutionary Computation, 11(6):712–731, 2007.
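The conclusion (Section VII) summarises the SVM configuration task as a bi-objective trade-off between accuracy and model complexity, approximated by combining decomposition- and Pareto-based ideas. The sketch below illustrates those two ideas in isolation; it is not the authors' hyper-heuristic framework. It assumes both objectives are minimised (classification error, and complexity taken as a normalised support-vector count, an assumption made only for illustration), uses hypothetical objective values rather than results from the paper, and swaps in a plain Tchebycheff scalarisation over evenly spaced weight vectors, the kind used by MOEA/D [57], for the decomposition step. The helper names `dominates`, `pareto_front`, `tchebycheff` and `decomposition_best` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Objective vector for one candidate SVM configuration. Both objectives are
# minimised: (classification error, model complexity). Treating complexity as a
# normalised support-vector count is an assumption made for illustration.
Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """Return True if a Pareto-dominates b: no worse on every objective, better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: Sequence[Objectives]) -> List[Objectives]:
    """Pareto-based view: keep the configurations that no other configuration dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def tchebycheff(f: Objectives, weights: Objectives, ideal: Objectives) -> float:
    """Decomposition-based view: Tchebycheff scalarisation of f for one weight vector."""
    return max(w * abs(fi - zi) for w, fi, zi in zip(weights, f, ideal))


def decomposition_best(points: Sequence[Objectives], n_weights: int = 5) -> List[Objectives]:
    """Solve one scalar sub-problem per evenly spaced weight vector and return each winner."""
    if n_weights < 2:
        raise ValueError("n_weights must be at least 2")
    # Ideal point: best value observed on each objective.
    ideal = (min(p[0] for p in points), min(p[1] for p in points))
    winners = []
    for i in range(n_weights):
        w_error = i / (n_weights - 1)
        weights = (w_error, 1.0 - w_error)
        winners.append(min(points, key=lambda p: tchebycheff(p, weights, ideal)))
    return winners


# Hypothetical (error, complexity) pairs; these are NOT results from the paper.
candidates = [(0.05, 0.90), (0.08, 0.40), (0.12, 0.20), (0.10, 0.45), (0.20, 0.10), (0.15, 0.25)]

print(pareto_front(candidates))
# [(0.05, 0.9), (0.08, 0.4), (0.12, 0.2), (0.2, 0.1)]
print(decomposition_best(candidates, n_weights=3))
# [(0.2, 0.1), (0.12, 0.2), (0.05, 0.9)]
```

Here the Pareto filter returns the four non-dominated configurations directly, while each weight vector of the decomposition selects one of them as the solution of its own scalar sub-problem. A Tchebycheff sub-problem with a zero weight can in general return a weakly dominated point when ties occur, which is one reason decomposition is often paired with an explicit non-dominance check.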
2169-3536 (c) 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.