Figure 2. The module of cooperative intrusion detection (diagram not reproduced; legible block labels: preprocessing, parallel training, training module)

To construct the detection agent shown in Fig. 3, a three-layer architecture is used in the training module. The first layer is the control center, which splits the task and supervises the running state of every training computer. The second layer consists of the training computers, which train the sub-tasks and determine whether the KKT condition is met. The third layer is data fusion, which fuses all the results obtained from the training computers and reconstructs a new training dataset. New training subsets are then reconstructed for every training computer by using the incremental learning algorithm of SVM in Section 3. Every detection agent has the ability of self-learning, called a self-adaptor. The improved parallel algorithm is used to train the SVM by using the incremental learning algorithm of parallel SVM in Section 2. Therefore, the method is suitable for incremental learning or online learning; in other words, it has a good self-adaptive capacity.

There are two methods to deal with the testing set. 1) The training is done in the control center, and every detection computer gets the detection model from the control center in the testing process. 2) The training is done in the detection computer. Both methods have advantages and disadvantages. With the first method, only the control center changes when the result is updated by adding a new training set. However, every detection computer needs to reread the training model from the control center when it restarts. Therefore, the performance of every detection agent is affected by the speed of the network and the size of the training model.

4. Experiment and Result

4.1. Experiment data sets

The KDD CUP 1999 [6] data are standard data sets for intrusion detection, including training data sets and test data sets. Among the 41 attributes of the KDD CUP 1999 data, some are numerical, but others are given as character strings. However, SVM can accept numerical input only. Therefore, before training, the input data must be converted to numerical type and normalized. Simple substitution is used to transform the symbols into numerical data. All symbols, such as protocol-type, service, and flag, are replaced by different numerical values. For example, the three types of protocols (TCP, UDP and ICMP) are described by 1, 2, and 3. The 71 types of service are substituted with 1, 2, ..., 71. A record is labeled by either 1 or -1, where a normal record is 1 and an abnormal record is -1. The input data set is normalized with Libsvm [7].

In order to shorten the training time and ensure the representativeness of the data chosen, the same interval is used to select data. We select four training data sets as follows. The training data set is obtained from the first 10 percent of the records in the training data set of KDD CUP 1999, from which we select one record out of every 15 records. Thus, a total of 32935 records are selected for training. In the test data set of KDD CUP 1999, one record is selected out of every 20 records, and a total of 15552 records are selected for testing. The training sets and testing sets are then divided into six subsets according to network protocol: TCP train set, UDP train set, ICMP train set, TCP test set, UDP test set and ICMP test set. Table 1 shows the distribution of the experimental data.

Table 1: The distribution of experimental data

Data type   TCP                   UDP                   ICMP
            Test set  Train set   Test set  Train set   Test set  Train set
Total       11921     19004       2686      2133        16496     28364
Normal      4387      7683        1648      1914        37        129

4.2. Experiment Results and Analysis

In Table 2, T-time is the training time, and every number is the maximum training time of the classifiers in each layer; SVs is the number of support vectors; AR (Accuracy Rate) is the rate of the test records detected with the classifier; C-Num is the number of records predicted correctly; N-cycle is the number of cycles. In the following experiments, e is 50.
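The data preparation described in Section 4.1 and the AR and C-Num quantities defined above can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation. The dictionary-based record layout, the helper names (build_codes, encode_record, sample_every, split_by_protocol, accuracy_rate), the choice to number service and flag symbols in order of first appearance, and the reading of AR as C-Num divided by the number of test records are all assumptions made here. Normalization with Libsvm is not shown.

```python
from collections import defaultdict

# Protocol codes follow the text: TCP, UDP and ICMP are described by 1, 2 and 3.
PROTOCOL_CODES = {"tcp": 1, "udp": 2, "icmp": 3}


def build_codes(symbols):
    """Assign 1, 2, ... to symbols in order of first appearance.

    The paper only says the symbols are replaced by different numbers;
    the ordering used here is an assumption.
    """
    codes = {}
    for symbol in symbols:
        if symbol not in codes:
            codes[symbol] = len(codes) + 1
    return codes


def encode_record(record, service_codes, flag_codes):
    """Replace symbolic fields by numbers and map the label to 1 or -1."""
    encoded = dict(record)
    encoded["protocol_type"] = PROTOCOL_CODES[record["protocol_type"]]
    encoded["service"] = service_codes[record["service"]]
    encoded["flag"] = flag_codes[record["flag"]]
    # Assumed label format: the string "normal", optionally with a trailing dot.
    is_normal = record["label"].rstrip(".") == "normal"
    encoded["label"] = 1 if is_normal else -1
    return encoded


def sample_every(records, interval):
    """Keep one record out of every `interval` records, starting with the first."""
    return records[::interval]


def split_by_protocol(encoded_records):
    """Group encoded records by protocol code (1 = TCP, 2 = UDP, 3 = ICMP)."""
    subsets = defaultdict(list)
    for record in encoded_records:
        subsets[record["protocol_type"]].append(record)
    return dict(subsets)


def accuracy_rate(true_labels, predicted_labels):
    """Return (C-Num, AR), reading AR as C-Num / number of test records."""
    c_num = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return c_num, c_num / len(true_labels)
```

Under these assumptions, a training list would be thinned with sample_every(records, 15) and a test list with sample_every(records, 20), after which split_by_protocol yields the per-protocol subsets on which separate classifiers can be trained.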
Table 2: Experiment results with PISVM