Svein J. Knapskog
Sylvain Gombault
Département Réseaux, Sécurité et Multimédia,
TELECOM Bretagne, France
I. INTRODUCTION
Network security is becoming increasingly important
as networks become heavily involved in people's daily lives and
in all business processes within most organizations. As an
important technique in the defense-in-depth network security
framework [1], intrusion detection has become a widely studied topic in computer networks in recent years. In general,
the techniques for intrusion detection fall into two major
categories: signature-based detection and anomaly detection.
Signature-based detection identifies malicious behavior by
matching it against pre-defined descriptions of attacks (signatures). Anomaly detection, on the other hand, defines a profile
of a subject's normal activities and attempts to identify any
unacceptable deviation as a potential attack. Any observable
behavior of a system or a network, e.g., network traffic, audit
logs, system calls, can be used as the subject information.
Intrusion Detection Systems (IDS) can also be categorized
as host-based IDSs and network-based IDSs according to
the target environment for the monitoring. Host-based IDSs
usually monitor the host system behavior by examining the
information of the system, such as CPU time, system calls,
keystroke and command sequences. Examples are [2][3][4][5].
Network-based IDSs, on the other hand, monitor network traffic.
Data preprocessing is very important for anomaly intrusion detection and for many data mining related tasks. Data
normalization is an essential step of data preprocessing for
most anomaly detection algorithms that learn from the statistical
attributes extracted from the audit data. Data normalization
scales the values of each continuous attribute into a
well-proportioned range so that no single attribute
can dominate the others. In the KDD Cup 1999 data, for
example, the attribute dst_bytes (number of data
bytes from destination to source) ranges from 0 to 2,293,370
or even larger, while the attribute same_srv_rate (fraction
of connections to the same service) only ranges from 0 to 1.
If the attributes are not normalized into the same (or a similar)
scale, one attribute (e.g., dst_bytes) may overwhelm all the
others; effectively only that attribute is considered
during detection, and statistical detection methods thus
become ineffective. Except for reference [1], which needs the original
attributes to mine detection rules, the references that
use statistical attributes ([10], [11], [13], [17], [18] and [19])
did not normalize the attributes before training and detection.
References [14] and [15] used a statistical normalization that
converts the data to a standard normal distribution, while
reference [16] converted the data into the range [0, 1].
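To make the scale effect concrete, the following sketch computes a Euclidean distance between two un-normalized records; the two attribute names are the KDD features discussed above, but the values are made up for illustration:

```python
import math

# Two hypothetical connection records: dst_bytes spans millions,
# same_srv_rate only the interval [0, 1].
a = {"dst_bytes": 2_000_000.0, "same_srv_rate": 0.1}
b = {"dst_bytes": 2_000_500.0, "same_srv_rate": 0.9}

def euclid(x, y):
    """Plain Euclidean distance over the shared attribute names."""
    return math.sqrt(sum((x[k] - y[k]) ** 2 for k in x))

d = euclid(a, b)
# Contribution of the small-scale attribute to the squared distance:
# (0.1 - 0.9)^2 = 0.64 against dst_bytes' 500^2 = 250,000.
share = (a["same_srv_rate"] - b["same_srv_rate"]) ** 2 / d ** 2
# share is about 2.6e-6: same_srv_rate is effectively invisible.
```

Even though the two records differ sharply in same_srv_rate, the large-scale attribute supplies essentially all of the distance, which is exactly the domination problem described above.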
In this paper, we systematically evaluate the impact of
different attribute normalization schemes on detection
performance with three anomaly detection algorithms: PCA
(Principal Component Analysis), k-NN (k-Nearest Neighbor),
and one-class SVM (Support Vector Machine). We examine
four different attribute normalization schemes:
range [0,1] normalization, statistical normalization, frequency normalization and ordinal normalization. The KDD
Cup 1999 data are used for the evaluation. Extensive experiments show that attribute normalization substantially improves
detection performance, and that statistical normalization outperforms
the other schemes when the data set is large. We also compare
the performance of the three anomaly detection algorithms in
this paper. For practical validation, we detect DDoS attacks with the
statistical normalization method in a real network, and the
testing results show its effectiveness.
Our contributions are twofold. First, attribute normalization
is very important for many anomaly detection tasks, yet it is
often ignored. The comparison of different attribute normalization
schemes presented in this paper provides
useful references not only for the anomaly intrusion detection
problem, but also for general classification problems that use
statistical attributes. To the best of our knowledge, this is the
first study that evaluates the impact of attribute normalization
on classification performance. Second, we analyze the
merits as well as the demerits of the anomaly detection algorithms
k-NN, PCA and SVM and suggest the detection environments for which each is most suitable.
The remainder of this paper is organized as follows. Section
2 describes the attribute normalization schemes. Section 3
briefly introduces the anomaly detection algorithms used in
this paper. Extensive experiments based on the KDD Cup 1999
data are given in detail in Section 4. Experiments based on
real network traffic for DDoS detection are described in Section 5.
Fig. 1.

A. Range [0,1] normalization

Range normalization linearly scales each attribute value into [0,1]:

x_i = \frac{v_i - \min(v_i)}{\max(v_i) - \min(v_i)}    (1)

B. Statistical normalization

Statistical normalization converts each attribute to zero mean and unit variance:

x_i = \frac{v_i - \mu}{\sigma}    (2)

where μ and σ are the mean and the standard deviation of the attribute.
C. Ordinal normalization
Ordinal normalization first ranks the continuous values of an
attribute and then normalizes the ranks into [0,1]. Let r be the
rank of a given value of an attribute; the ordinal normalization
is defined as

x_i = \frac{r - 1}{\max(r) - 1}    (4)
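The schemes whose definitions appear above can be sketched in a few lines of Python. Frequency normalization is omitted because its defining equation is not reproduced in this excerpt; the function names are ours, and ties in the ordinal scheme are assumed to share a rank:

```python
def range_norm(values):
    # Eq. (1): x_i = (v_i - min) / (max - min), mapping into [0, 1].
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def statistical_norm(values):
    # Eq. (2): x_i = (v_i - mean) / std, giving zero mean, unit variance.
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

def ordinal_norm(values):
    # Eq. (4): replace each value by its rank r, then scale via
    # (r - 1) / (max(r) - 1); equal values share one rank.
    rank = {v: i + 1 for i, v in enumerate(sorted(set(values)))}
    top = max(rank.values())
    return [(rank[v] - 1) / (top - 1) for v in values]

vals = [0.0, 10.0, 50.0, 100.0]
# range_norm(vals)   -> [0.0, 0.1, 0.5, 1.0]
# ordinal_norm(vals) -> [0.0, 1/3, 2/3, 1.0]
```

Note that the ordinal result depends only on the ordering of the values, so a single extreme outlier (e.g., a huge dst_bytes value) no longer stretches the scale the way it does under range normalization.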
\frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{m} \lambda_i} \geq \alpha    (7)

where α is the ratio of variation in the subspace to the
total variation in the original space. We form an m × k
(usually k ≪ m, for data reduction) matrix U whose columns
consist of the k eigenvectors. The representation of the data
by principal components consists of projecting the data onto
the k-dimensional subspace according to the following rules
[22]
Y_i = (X_i - \mu)U = \Phi U    (8)

where μ is the mean vector of the training data and Φ = X_i − μ.
The number of principal eigenvectors U1 , U2 , ..., Uk , used
to represent the distribution of the original data, is determined
by (7).
Given an incoming vector T that represents a test sample,
we project it onto the k-dimensional subspace representing the
normal behavior according to the rules defined by (8). The
distance between the test data vector and its reconstruction
onto the subspace is the distance between the mean-adjusted
input data vector Φ = T − μ and

r = (T - \mu)UU^T = \Phi UU^T    (9)
If the test data vector is normal, that is, if the test data
vector is very similar to the training vectors corresponding to
normal behavior, the test data vector and its reconstruction will
be very similar and the distance between them will be very
small. Our intrusion identification model is based on this
property. As PCA seeks a projection that best represents the data
in a least-squares sense, we use the squared Euclidean distance
in the experiments to measure the distance between these two
vectors:

\epsilon = \|\Phi - r\|^2    (10)

ε is characterized as the anomaly index. If ε is below a
predefined threshold, the vector is then identified as normal.
Otherwise it is identified as anomalous.
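The procedure of (7)–(10) can be sketched in pure Python. This toy uses power iteration with deflation instead of a real eigensolver, and all names are ours:

```python
import random

def top_eigvec(cov, iters=500):
    # Power iteration: dominant eigenvector of a symmetric matrix.
    m = len(cov)
    v = [random.random() + 0.1 for _ in range(m)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def anomaly_index(train, t, k=1):
    """Squared reconstruction error eps = ||phi - r||^2 (Eq. 10) of t
    after projecting onto the top-k principal subspace of train."""
    m = len(train[0])
    mu = [sum(x[j] for x in train) / len(train) for j in range(m)]
    centered = [[x[j] - mu[j] for j in range(m)] for x in train]
    cov = [[sum(row[i] * row[j] for row in centered) / len(train)
            for j in range(m)] for i in range(m)]
    U = []
    for _ in range(k):            # extract k eigenvectors by deflation
        u = top_eigvec(cov)
        lam = sum(u[i] * sum(cov[i][j] * u[j] for j in range(m))
                  for i in range(m))
        cov = [[cov[i][j] - lam * u[i] * u[j] for j in range(m)]
               for i in range(m)]
        U.append(u)
    phi = [t[j] - mu[j] for j in range(m)]   # mean-adjusted input
    r = [0.0] * m                            # r = phi U U^T (Eq. 9)
    for u in U:
        proj = sum(phi[j] * u[j] for j in range(m))
        for j in range(m):
            r[j] += proj * u[j]
    return sum((phi[j] - r[j]) ** 2 for j in range(m))

# Training vectors on the line y = 2x: the top principal direction
# captures all the variance, so on-line points reconstruct perfectly.
train = [[float(i), 2.0 * i] for i in range(-5, 6)]
```

A point on the training line, e.g. `anomaly_index(train, [1.0, 2.0])`, yields ε ≈ 0, while a point orthogonal to it such as `[2.0, -1.0]` yields ε = 5, so a small threshold separates normal from anomalous vectors.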
B. Anomaly detection with K-Nearest Neighbor (K-NN)
k-Nearest Neighbor (k-NN) is a method for classifying
objects based on the closest training examples in the feature space.
It is easy to implement and has been demonstrated to be effective for
many classification tasks [22]. For a given k, k-NN ranks the
neighbors of a test vector T among the training samples, and
uses the class labels of the k nearest neighbors to predict
the class of the test vector. The Euclidean distance is usually used
for measuring the similarity between two vectors:
d(T, X_i) = \|T - X_i\| = \sqrt{\sum_{j=1}^{m} (t_j - x_{ij})^2}    (11)
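For anomaly detection (as opposed to labeled classification), a common k-NN variant scores a test vector by its distance (11) to the k-th nearest normal training sample. The sketch below illustrates that idea; the naming and the scoring rule are ours, not necessarily the paper's exact variant:

```python
import math

def knn_score(train, t, k=3):
    # Distance from t to its k-th nearest neighbor among the normal
    # training vectors, using the Euclidean distance of Eq. (11).
    dists = sorted(math.dist(t, x) for x in train)
    return dists[k - 1]

normal = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
near = knn_score(normal, (0.5, 0.5), k=2)   # inside the normal cluster
far = knn_score(normal, (5.0, 5.0), k=2)    # far from every sample
# near < far, so thresholding the score flags (5, 5) as anomalous.
```

A test vector is declared anomalous when its score exceeds a threshold chosen on the normal training data, mirroring the anomaly-index thresholding used for PCA above.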
C. Anomaly detection with one-class Support Vector Machine (SVM)

The one-class SVM finds a hyperplane that separates the mapped training data from the origin with maximum margin, by solving

\min_{\omega, \xi, \rho} \; \frac{1}{2}\|\omega\|^2 + \frac{1}{\nu l}\sum_i \xi_i - \rho    (12)

subject to

(\omega \cdot \phi(X_i)) \geq \rho - \xi_i, \quad \xi_i \geq 0    (13)

where φ is a kernel map that transforms the training
examples to another space, l is the number of training samples
and ν ∈ (0, 1] bounds the fraction of outliers.
After ω and ρ are obtained by solving the problem, the decision function is

f(x) = \mathrm{sgn}((\omega \cdot \phi(x)) - \rho)    (14)
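Once ω and ρ are known, classifying a new vector is just the sign test of (14). A minimal sketch with a linear kernel (φ taken as the identity); the weights below are made-up for illustration, not learned:

```python
def one_class_decision(w, rho, x):
    # Eq. (14) with a linear kernel: f(x) = sgn(<w, x> - rho).
    # +1 means the vector falls on the "normal" side, -1 is anomalous.
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score - rho >= 0 else -1

# Hypothetical solution of (12)-(13), not computed here:
w, rho = [0.5, 0.5], 0.4
assert one_class_decision(w, rho, [1.0, 1.0]) == 1    # normal side
assert one_class_decision(w, rho, [0.1, 0.2]) == -1   # outlier
```

In practice one would solve (12)–(13) with an off-the-shelf solver (e.g., the one-class mode of an SVM library) rather than hand-pick ω and ρ.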
Class     Total (#)    Training (#)    Test (#)
Normal    98,278       7,000           10,000
DoS       391,458      0               78,291
Probe     4,107        0               4,107
R2L       1,126        0               1,126
U2R       52           0               52
[Fig. 2 – Fig. 6. ROC curves (detection rate (%) vs. false positive rate (%)) comparing the original data with the range [0,1], statistical, ordinal and frequency normalization schemes.]

[Fig. 9. Detection results with SVM, k-NN and PCA (detection rate (%) vs. false positive rate (%)).]
After the raw data was collected and the
attributes were constructed, we randomly selected 5,000 normal connections for training, and 10,000 normal connections
as well as 36,380 DDoS attack connections for testing.
Based on the results of Section 4, we use k-NN for
DDoS attack detection with and without statistical attribute
normalization for comparison; the results are summarized
in Fig. 10. From the figure, it is clear that statistical attribute
normalization substantially improves the detection results, and that k-NN
achieves good results with statistical attribute normalization.
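The effect reported here can be reproduced in miniature: fit the statistical normalization (per-attribute mean and standard deviation) on normal training data only, then score with k-NN. The data below are synthetic, not the real DDoS traces, and all names are ours:

```python
import math

def fit_zscore(train):
    # Per-attribute mean and standard deviation, learned only on the
    # normal training connections (statistical normalization, Eq. (2)).
    m = len(train[0])
    mu = [sum(x[j] for x in train) / len(train) for j in range(m)]
    sd = [(sum((x[j] - mu[j]) ** 2 for x in train) / len(train)) ** 0.5
          for j in range(m)]
    return mu, sd

def zscore(x, mu, sd):
    return [(v - m) / s for v, m, s in zip(x, mu, sd)]

def knn_score(train, t, k=3):
    # Distance to the k-th nearest training vector.
    return sorted(math.dist(t, x) for x in train)[k - 1]

# Synthetic normal traffic: a byte count on a large scale and a service
# rate in [0, 1] that move together.
normal = [[1000.0 + 10 * i, 0.4 + 0.02 * i] for i in range(10)]
probe = [1045.0, 0.49]     # a genuinely normal-looking connection
attack = [1050.0, 0.0]     # normal byte count, abnormal rate

mu, sd = fit_zscore(normal)
z_normal = [zscore(x, mu, sd) for x in normal]

# Raw space: the byte attribute dominates, so the attack scores even
# lower than the normal probe and would be missed.
raw_attack = knn_score(normal, attack)
raw_probe = knn_score(normal, probe)

# Normalized space: the abnormal rate stands out and the attack scores
# far higher than the normal probe.
z_attack = knn_score(z_normal, zscore(attack, mu, sd))
z_probe = knn_score(z_normal, zscore(probe, mu, sd))
```

In this toy setting the un-normalized score ranks the attack as more normal than a genuinely normal connection, while after statistical normalization the attack's score is an order of magnitude above the normal one, which is the same qualitative behavior as the real-network results above.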
[Fig. 7. ROC curves (detection rate (%) vs. false positive rate (%)) comparing the original data with the range [0,1], statistical, ordinal and frequency normalization schemes.]

Fig. 8. Comparison: Overall detection results with k-NN, PCA and SVM, using statistical normalization.

[Fig. 10. DDoS detection results with k-NN: statistical normalization vs. original data (detection rate (%) vs. false positive rate (%)).]