
A Cooperative Intrusion Detection System Based on Improved Parallel SVM

Hongle Du, Shaohua Teng, Xiufen Fu, Wei Zhang, Yuanfang Pu


Faculty of Computer, Guangdong University of Technology, Guangzhou, Guangdong, China
DhI5597@163.com, {shteng, xffu, weizhang}@gdut.edu.cn, baobaopyf@163.com

Abstract

It is important for high-speed, large-scale networks that the training time of the Support Vector Machine (SVM) be shortened and its storage requirement reduced. An intrusion detection method based on parallel SVM is proposed and a detection system model is constructed in this paper. First, the original training dataset gathered from the network is divided into three subsets according to the network protocol (TCP, UDP, and ICMP). Second, every subset is partitioned into multiple subsets that are sent to parallel SVMs, and multiple results are obtained from the SVM trainers. The incremental learning algorithm of SVM is used to train the new data sets instead of reconstructing the SVM over the whole data. This method improves training efficiency by reducing the size of the training subsets. Finally, simulation experiments are done with the KDD CUP 1999 data set. The experimental results show that the training time of SVM is shortened while the detection accuracy obtained by our method matches that obtained by other methods.

1. Introduction

Pervasive computing, also called ubiquitous computing, is the next-generation computing environment proposed by Weiser in 1991 [1]. In the era of pervasive computing, information and communication technology will be deeply embedded into people's daily lives. The goal of pervasive computing is to make computing pervasively and unobtrusively available to the user anywhere, at any time. Intrusion detection is the second line of defense in network security. Depending on the type of analysis, intrusion detection systems are classified as either signature-based or anomaly-based. Signature-based schemes (also called misuse-based) seek defined patterns, or signatures, within the analyzed data. For this purpose, a signature database corresponding to known attacks is specified a priori. Anomaly detection needs to establish the user's normal behavior patterns in the protected system and generates an alarm whenever the deviation between an observation at a given instant and the normal behavior exceeds a predefined threshold. Another possibility for anomaly detection is to model the "abnormal" behavior of the system and to raise an alarm when the difference between the observed behavior and the expected one falls within a given limit. Anomaly detection can detect unknown attacks. Both methods need to establish profiles of user behaviors.

Network attacks seriously threaten network security nowadays. Because pervasive computing is not limited by time and space, it suits applications with small data volumes, so it is important to reduce the quantity of data sent in pervasive computing. An incremental learning method is applied in this paper. First, the network data flow is divided into TCP, UDP, and ICMP data flows according to the network protocol. The three data flows are trained separately with the parallel SVM algorithm, and three types of detection agents (TCP, UDP, and ICMP) are obtained. Finally, simulation experiments are done with the KDD CUP 1999 data set. The experimental results show a better detection effect with cooperative network intrusion detection based on the parallel SVM algorithm.

2. Parallel algorithm of SVM

2.1. Incremental learning of SVM

Support Vector Machine is a popular topic in statistical machine learning [2]. There are two problems in machine learning: 1) training on large-scale data sets requires large memory space and very long training time; and 2) a complete data set is not always available, so we have to learn online. In fact, data samples are accumulated during learning to improve the learning accuracy; thus, incremental learning arises. The key to incremental learning is to retain the information of the original samples while coping with the newly arriving samples. Syed [3] proposed the incremental SVM learning algorithm for the first time. The idea is to acquire the support vectors by training on an initial sample set. The new data sets and the previous support vectors together form the new training samples, which are trained to produce new support vectors. The support vector set obtained earlier may change during incremental learning [4], for example when the new samples carry classification information that the previous training set does not contain. The change is related to the KKT (Karush-Kuhn-Tucker) conditions. Because the samples that violate the KKT conditions will change the previous support vector set, these samples are added to the previous data set. The new samples that meet the KKT conditions are discarded, because they do not change the previous support vector set [4].


Three situations in violation of the KKT conditions are given in [4]; they can be summed up as y_i·f(x_i) < 1.
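For concreteness, the incremental step can be sketched as follows. This is a minimal illustration only: scikit-learn's SVC as the base trainer, the helper names, and the linear kernel are our assumptions, not part of the original paper.

import numpy as np
from sklearn.svm import SVC

def train_svm(X, y):
    # Train a base SVM classifier; the kernel choice is an assumption.
    model = SVC(kernel="linear", C=1.0)
    model.fit(X, y)
    return model

def incremental_update(model, X_sv, y_sv, X_new, y_new):
    # Keep the previous support vectors; of the new samples, keep only
    # those violating the KKT conditions, i.e. y_i * f(x_i) < 1.
    margins = y_new * model.decision_function(X_new)
    violators = margins < 1
    X_train = np.vstack([X_sv, X_new[violators]])
    y_train = np.concatenate([y_sv, y_new[violators]])
    # Samples with y_i * f(x_i) >= 1 are discarded: they would not
    # change the previous support vector set.
    return train_svm(X_train, y_train)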
2.2. Parallel SVM

Another method for handling large-scale training sets with SVM is the parallel learning algorithm. Following divide and conquer, the problem is divided into several sub-problems. After every sub-problem is processed, the results from all sub-problems are fused. The advantages of the parallel learning algorithm are reduced training time and good extensibility. The structure of the traditional Cascade SVM algorithm is an inverted binary tree, shown in Fig. 1(a), in which each node is a classifier. The results obtained from the classifiers are combined pairwise; training is then done again and new classifiers are generated. This step is repeated until one result remains and a single classifier is formed. The algorithm is described in detail as follows:
Step 1: The original training set is divided into 2^n subsets (for example, in Fig. 1(a): TD1, TD2, TD3, TD4), so the original problem is decomposed into 2^n sub-problems. The 2^n sub-problems are solved with parallel SVMs, generating 2^n sub-classifiers and 2^n subsets of support vectors.
Step 2: The 2^n support vector sets are combined pairwise; for example, SVs1 and SVs2 are combined, and SVs3 and SVs4 are combined. New classifiers are obtained by running the SVM algorithm on the 2^(n-1) new training sets, and 2^(n-1) new results and support vector sets are generated (SVs5 and SVs6).
Step 3: Repeat Step 2 until only one support vector set SVs remains (SVs7).
Step 4: Add the SVs back into every original training subset (2^n subsets), return to Step 2, and obtain the new support vector set SVs'.
Step 5: The algorithm stops if the condition |SVs' - SVs| ≤ ε (ε is a threshold) is satisfied; otherwise, return to Step 2 and train again.

[Figure 1. The architecture of Cascade SVM: (a) the traditional cascade structure; (b) the improved structure proposed in this paper.]
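Steps 1-3 of the cascade can be sketched as below, reusing train_svm and numpy from the previous sketch. The loop here is sequential, whereas the paper runs each level's trainings in parallel.

def support_vectors(model, X, y):
    # Extract the support-vector samples and labels of a fitted model.
    idx = model.support_
    return X[idx], y[idx]

def cascade_pass(subsets):
    # Steps 1-3: train one SVM per subset, then merge support vector
    # sets pairwise and retrain until a single set SVs remains.
    # Assumes len(subsets) is a power of two, as in Fig. 1(a).
    level = [support_vectors(train_svm(X, y), X, y) for X, y in subsets]
    while len(level) > 1:
        merged = []
        for (Xa, ya), (Xb, yb) in zip(level[::2], level[1::2]):
            X = np.vstack([Xa, Xb])
            y = np.concatenate([ya, yb])
            merged.append(support_vectors(train_svm(X, y), X, y))
        level = merged
    return level[0]  # (X_sv, y_sv) of the final classifier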
2.3. Improved parallel cascade algorithm

The Cascade algorithm is a parallel algorithm of SVM [5]. As Fig. 1(a) shows, a six-layer architecture is needed when there are 2^5 nodes. This wastes resources, because fewer and fewer nodes are used in the lower layers. The training time is also lengthened because the support vector set is added directly into the original training subsets, which forces new training subsets to be reconstructed in every cycle. The improved parallel cascade model presented in this paper is shown in Fig. 1(b). The original training dataset is divided into n subsets that are sent to the nodes; the nodes train them, and the classifiers and corresponding support vector sets are generated. The redundant data are filtered out by a data fusion algorithm, such as D-S evidence theory; in this paper we simply keep one copy of each duplicated support vector and throw the others away. All support vectors are then combined into a training dataset, training is redone, and the new support vector set SVs is obtained. Finally, we get the subsets TD1', TD2', TD3', ..., TDn', the samples of TD1, TD2, TD3, ..., TDn that do not satisfy the KKT conditions. The new training subsets are reconstructed by adding the SVs into these subsets, and the above process is repeated.
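The duplicate-removal fusion used here in place of D-S evidence theory is simple enough to sketch directly (numpy as before; matching on exact duplicates is our reading of "select one from duplicated support vectors"):

def fuse_support_vectors(sv_sets):
    # Stack all n support vector sets, then keep one copy of each
    # duplicated (sample, label) row and discard the rest.
    X = np.vstack([X_sv for X_sv, _ in sv_sets])
    y = np.concatenate([y_sv for _, y_sv in sv_sets])
    Xy = np.unique(np.hstack([X, y[:, None]]), axis=0)
    return Xy[:, :-1], Xy[:, -1]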
The distributed SVM proposed in this paper is an improved cascade SVM. We use a master-slave structure. The master computer takes charge of splitting the training dataset, handing out the subsets, fusing the training sub-results, and training after fusion; the slave computers take charge of training the subsets and determining whether or not the KKT conditions are met. The algorithm is described in detail as follows:

Step 1: The original training set is divided into n subsets (for example, in Fig. 1(b): TD1, TD2, TD3, ..., TDn), so the original problem is decomposed into n sub-problems.
Step 2: The n sub-problems are solved with distributed SVMs; n sub-classifiers and n subsets of support vectors (SVs1, SVs2, ..., SVsn) are obtained.
Step 3: The support vector sets (SVs1, SVs2, ..., SVsn) are fused by the data fusion step, and a new training set TD is generated.
Step 4: TD is trained with SVM to obtain a new support vector set SVs.
Step 5: The SVs are added into every training subset (n subsets) with the incremental learning of Section 2.1, giving n new training subsets. The above steps are repeated and a new support vector set SVs' is generated.
Step 6: The algorithm stops if the condition |SVs' - SVs| ≤ ε (ε is a threshold) is satisfied; otherwise, return to Step 4 and start a new training process.
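A sequential sketch of the master's loop for Steps 1-6, built from the helpers above. In the paper, Step 2 runs on the slave computers; interpreting |SVs' - SVs| as the change in the number of support vectors is our assumption.

def improved_cascade(subsets, epsilon=50):
    # subsets: list of (X, y) training subsets TD1, ..., TDn.
    prev_count = None
    while True:
        # Step 2: train every subset (distributed to slaves in the paper).
        sv_sets = [support_vectors(train_svm(X, y), X, y) for X, y in subsets]
        # Step 3: fuse the n support vector sets into a training set TD.
        X_td, y_td = fuse_support_vectors(sv_sets)
        # Step 4: train TD to get the new global support vector set SVs.
        model = train_svm(X_td, y_td)
        X_sv, y_sv = support_vectors(model, X_td, y_td)
        # Step 6: stop once the support vector set has stabilized.
        if prev_count is not None and abs(len(X_sv) - prev_count) <= epsilon:
            return model
        prev_count = len(X_sv)
        # Step 5: rebuild each subset from its KKT violators plus SVs,
        # following the incremental rule of Section 2.1.
        new_subsets = []
        for X, y in subsets:
            viol = (y * model.decision_function(X)) < 1
            new_subsets.append((np.vstack([X[viol], X_sv]),
                                np.concatenate([y[viol], y_sv])))
        subsets = new_subsets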
3. Parallel SVM-based Cooperative Intrusion Detection

Three key problems of a cooperative algorithm are task partitioning, algorithm selection, and result fusion. Task partitioning must preserve as much classification information as possible. Result fusion combines the results from every agent to improve the system performance. The main work of this paper is task partitioning and algorithm selection.

SVM is well suited to pattern recognition problems with small samples, nonlinearity, and high dimensionality. However, it requires large storage space and long training time for large-scale datasets. Combining the cascade algorithm with the incremental learning algorithm of SVM, we give the model of cooperative intrusion detection based on parallel SVM shown in Fig. 2. The network data flows are divided into TCP flows, UDP flows, and ICMP flows depending on the network protocol. Before every data flow is trained by the improved parallel algorithm of Section 2, the data is preprocessed: cleaning, attribute selection, and formatting. Finally, the training data is stored in a database.

There are three processes in every detection agent: construction, self-adaptation, and detection. The pre-processing of every detection agent is similar in the training stage and the predicting stage. Therefore, taking the detection agent for the TCP data flow as an example, we discuss the realization process of every detection agent. The training and testing process of parallel SVM is shown in Fig. 3.

[Figure 2. The module of cooperative intrusion detection]
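The protocol split that feeds the three agents in Fig. 2 is a simple dispatch; a sketch (the record layout, a dict with a "protocol" key, is assumed for illustration):

def split_by_protocol(records):
    # Route each preprocessed record to the TCP, UDP, or ICMP agent.
    flows = {"tcp": [], "udp": [], "icmp": []}
    for rec in records:
        flows[rec["protocol"]].append(rec)
    return flows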
To construct the detection agent shown in Fig. 3, a three-layer architecture is used in the training module. The first layer is the control center, which splits the task and supervises the running state of every training computer. The second layer is the training computers, which train the sub-tasks and determine whether the KKT conditions are met. The third layer is data fusion, which fuses all results obtained from the training computers and reconstructs the new training dataset. New training subsets are then reconstructed for every training computer by using the incremental learning algorithm of SVM from Section 2. Every detection agent has the ability of self-learning, called self-adaptation: the improved parallel algorithm trains the SVM by using the incremental learning algorithm of parallel SVM in Section 2. Therefore, the method is suitable for incremental learning or online learning; in other words, it has good self-adaptive capacity.
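The self-adaptation step can be sketched as an online update that folds a fresh batch of labeled traffic into the agent's model via the incremental rule of Section 2.1 (helper names from the earlier sketches; the batch source is hypothetical):

class DetectionAgent:
    # One agent per protocol (TCP, UDP, ICMP), holding a model and its
    # current support vector set.
    def __init__(self, model, X_sv, y_sv):
        self.model, self.X_sv, self.y_sv = model, X_sv, y_sv

    def self_adapt(self, X_batch, y_batch):
        # Retrain on the old support vectors plus the new samples that
        # violate the KKT conditions, then refresh the stored SV set.
        viol = (y_batch * self.model.decision_function(X_batch)) < 1
        X = np.vstack([self.X_sv, X_batch[viol]])
        y = np.concatenate([self.y_sv, y_batch[viol]])
        self.model = train_svm(X, y)
        self.X_sv = X[self.model.support_]
        self.y_sv = y[self.model.support_]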
There are two methods to deal with the testing set: 1) training is done in the control center, and every detection computer gets the detection model from the control center during testing; 2) training is done in the detection computer itself. Both methods have advantages and disadvantages. With the first method, only the control center changes when the result is updated with a newly added training set; however, every detection computer needs to reread the training model from the control center whenever the detection computers restart, so the performance of every detection agent is affected by the network speed and the size of the training model. The second method does not need to reread the training model when the detection computers restart; however, every detection computer needs to change its training model when the model is updated with a newly added training set. This paper does the experiments with the second method because of the local network environment.

[Figure 3. The training and testing process of parallel SVM]

4. Experiment and Result

4.1. Experiment data sets

The KDD CUP 1999 data [6] are standard data sets for intrusion detection, including training data sets and test data sets. Among the 41 attributes of the KDD CUP 1999 data, some are numeric, but others are given as character strings. However, SVM can only accept numeric input. Therefore, before training, we have to make the input data numeric and normalize them. Simple substitution is used to transform the symbols into numerical data: all symbolic attributes, such as protocol-type, service, and flag, are replaced by different numerical values. For example, the three types of protocols (TCP, UDP, and ICMP) are denoted by 1, 2, and 3, and the 71 types of service are substituted with 1, 2, ..., 71. A record is labeled by either 1 or -1, where a normal record is 1 and an abnormal record is -1. The input data set is normalized with LIBSVM [7].
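A sketch of this substitution for one raw KDD CUP 1999 record. Only the protocol map is fully given in the paper; the service and flag values shown are truncated illustrations.

PROTOCOL = {"tcp": 1, "udp": 2, "icmp": 3}
SERVICE = {"http": 1, "smtp": 2, "ftp": 3}   # ... continues up to 71 services
FLAG = {"SF": 1, "REJ": 2, "S0": 3}          # illustrative values only

def encode_record(raw):
    # raw: one comma-split KDD record; fields 1-3 are symbolic and the
    # last field is the label ("normal." or an attack name).
    rec = list(raw)
    rec[1] = PROTOCOL[rec[1]]   # protocol_type -> 1/2/3
    rec[2] = SERVICE[rec[2]]    # service -> 1..71
    rec[3] = FLAG[rec[3]]       # flag -> numeric code
    label = 1 if rec[-1].startswith("normal") else -1
    return [float(v) for v in rec[:-1]], label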
In order to shorten the training time and ensure the representativeness of the chosen data, records are selected at a fixed interval. The training data set is obtained as follows: from the first 10 percent of the records in the KDD CUP 1999 training data set, we select one record in every 15, so 32935 records in total are selected for training. From the KDD CUP 1999 test data set, one record is selected in every 20, and 15552 records in total are selected for testing. The training and testing sets are then divided into six subsets according to the network protocol: TCP train set, UDP train set, ICMP train set, TCP test set, UDP test set, and ICMP test set. Table 1 shows the distribution of the experimental data.
Table 1: The distribution of experimental data

                 TCP                   UDP                   ICMP
Data type    Test set  Train set   Test set  Train set   Test set  Train set
Total        11921     19004       2686      2133        16496     28364
normal       4387      7683        1648      1914        37        129

4.2. Experiment Results and Analysis

In Table 2, T-time is the training time, where each number is the maximum training time of a classifier in each layer; SVs is the number of support vectors; AR (Accuracy Rate) is the rate of test records correctly detected by the classifier; C-Num is the number of records predicted correctly; N-cycle is the number of cycles. In the following experiments, ε is 50.
Table 2: Experiment results with PISVM

                 TCP data set                      UDP data set                     ICMP data set
              T-time(s)  SVs   AR(%)    C-Num   T-time(s)  SVs   AR(%)    C-Num   T-time(s)  SVs  AR(%)    C-Num
First cycle   21+201     8701  85.7059  10217   1+2        1021  61.3552  1648    1+0.5      300  99.9394  16486
Second cycle  62+241     8781  85.7142  10218   2+2.5      1092  61.3552  1648    1+1.5      351  99.9454  16487
Third cycle   74+256     8795  85.7142  10218   2+3        1128  61.3552  1648    1+1.5      363  99.9454  16487
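As a consistency check between Table 1 and Table 2, AR is C-Num divided by the size of the corresponding test set: for the TCP data set in the third cycle, 10218/11921 = 85.7142%; for UDP, 1648/2686 = 61.3552%; and for ICMP, 16487/16496 = 99.9454%.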
In Table 2, TD1, TD2, TD3, and TD4 are four training subsets whose records are selected, one in every 4, from the first, second, third, and fourth parts of the TCP train set; the UDP and ICMP train sets are dealt with in the same way. Because every training set has no redundant data, the data fusion step is omitted. The experiment results obtained from the different training data sets are shown in Table 2. The prediction accuracy differs across the data sets: the prediction accuracy on the ICMP data set is 99.9454% after three cycles and the prediction accuracy on the TCP data set is 85.7142%, but for the UDP data set the prediction accuracy is only 61.3552%. After carefully analyzing the experimental data sets, we find: first, the ICMP training set includes 28364 records and the TCP training set includes 19004 records, but the UDP training set includes only 2133 records. Second, there are 28235 (99.5452%) attack records in the ICMP training set and 11311 (59.5190%) attack records in the TCP training set, but only 219 (10.2672%) attack records in the UDP training set, while its testing set contains proportionally more attack records.
In Table 3, TD1, TD2, TD3, and TD4 are training subsets whose records all come from the training data set: TDi is obtained by selecting one record in every 4 from the ith part of the training set; for example, the records of TD1 come from every 4th record of the first part. There are 12351, 12351, 12350, and 12350 records in TD1, TD2, TD3, and TD4, respectively. The experiment results of three cycles are shown in Table 3, which shows that Cascade spends much more training time in every cycle.

Table 3: Experiment results with Cascade

              T-time(s)     SVs    AR(%)    C-Num
First cycle   108+87+332    10106  81.4905  25346
Second cycle  841+763+783   10725  83.59    25999
Third cycle   870+798+780   10764  83.59    25999
The comparison of experimental results between PISVM and Cascade is given in Table 4. From Table 4, we can see that the training time (the total over all three cycles) of PISVM is less than that of Cascade, and the accuracy of the cooperative algorithm in this paper is higher than that of Cascade. The numbers of support vectors are 8975, 1128, and 363 (TCP classifier, UDP classifier, and ICMP classifier) with PISVM, against 10725 with Cascade. More support vectors lead to longer testing time and larger storage space; because TCP has more support vectors than the other protocols, its testing time is longer than the others.

Table 4: Experiment results comparison between PISVM and Cascade

         N-cycle  T-time(s)  SVs            AR(%)    C-Num
PISVM    3        855        8975+1128+363  91.1584  27343
Cascade  3        5362       10725          83.59    25999

5. Conclusion

Different detection agents are constructed for the different network protocols, and the architecture of cooperative intrusion detection is given in this paper. How to split the training dataset will be our future work.

Acknowledgement

This work was supported by the Guangdong Provincial Natural Science Foundation (Grant No. 06021484, Grant No. 9151009001000007) and the Yuexiu Zone, Guangzhou City science & technology project (Grant No. 2007-GX-075).

6. References

[1] M. Weiser, The computer for the 21st century, Scientific American (International Edition), v 265, n 3, Sept. 1991, pp. 66-75.
[2] V. Vapnik, The Nature of Statistical Learning Theory, New York: Springer-Verlag, 1995.
[3] L. Tan, T. Sherwood, A high throughput string matching architecture for intrusion detection and prevention, Proc. of the 32nd Inter. Symposium on Computer Architecture, Washington, IEEE Computer Society, 2005: 112-122.
[4] W. Zhou, L. Zhang, L. Jiao, An Analysis of SVMs Generalization Performance, Chinese Journal of Electronics, 2001, 29 (5): 590-594.
[5] Y. Liu, D. Tian, X. Yu, et al., Large-scale network Intrusion Detection Algorithm Based on Distributed learning, Chinese Journal of Software, 2008, 19(4): 993-1003.
[6] KDD CUP 1999 data, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
[7] C. Chang and C. Lin, LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
