1 M-Tech Research Scholar, LNCT, Bhopal (chaturvedisushilkumar@yahoo.co.in)
2 Department of CSE, LNCT, Bhopal (vineet_rich@yahoo.com)
3 Department of CSE, SRCEM, Banmore (girishniru@yahoo.com)
Abstract- As networks have expanded dramatically, security has come to be considered a major issue. Many methods are currently used to increase network security, such as encryption, VPNs and firewalls, but all of these are too static to give effective protection against attack and counter-attack. We apply data mining algorithms to the anomaly detection problem. In this work our aim is to use data mining techniques, including classification trees and support vector machines, for anomaly detection. The results of experiments on the 1999 KDD Cup data show that the C4.5 algorithm performs better than SVM in both detection of network anomalies and false alarm rate.

Keywords- Data Mining; Support Vector Machines; Classification Tree; Anomaly Detection Systems (ADS)

I. INTRODUCTION

In recent years computer technology has been utilized by many people all over the world in several areas. With the development of Internet technology, network security has become a global focus. Traditional security measures such as firewalls, VPNs and data encryption are insufficient to detect attacks by crackers. Intrusion detection, however, is a dynamic technique that can give dynamic protection to the network in monitoring, attack and counter-attack [1]. Depending on where the data set is collected, an Anomaly Detection System (ADS) can be classified as host-based or network-based [2].

Host Based ADS:- These systems run on the system being monitored. Their data come from the records of different host system activities, including audit records of the OS, system logs, application program information, and so on.

Network Based ADS:- These systems are placed on the network, near the system or systems being monitored. They examine the network traffic and determine whether it falls within acceptable boundaries. Their data come from network segments, such as Internet packets.

Intrusion detection techniques are classified into two categories [3]:

1. Anomaly Detection: Anomaly detection refers to storing features of a user's usual behavior in a database and then comparing the user's current behavior with the stored features. If the deviation is large enough, the current behavior is considered abnormal.

2. Misuse Detection: Misuse detection refers to confirming attack incidents by matching features against a library of known attack features.

We decided to use data mining to solve the problem of network intrusion for the following reasons [1, 4, 5, 6]:
International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459, Volume 2, Issue 5, May 2012)
Data mining can process huge amounts of data.
It is more useful for finding ignored and hidden information.
Data mining algorithms can perform data summarization and visualization that help security analysis in various areas [7].

II. RELATED WORK

Denning was among the first to work on the application of data mining to network security, proposing a model of a real-time intrusion-detection expert system [8]. The concept behind the model is that exploitation of a system's vulnerabilities involves abnormal usage of the system, and this abnormality can be detected by looking for abnormal patterns in the audit records. The proposed model is capable of detecting break-ins, penetrations, and other forms of computer anomaly. In this paper we use two methods of anomaly detection: SVM (Support Vector Machine) and C4.5, an extended version of the classification algorithm ID3. Both methods are supervised algorithms. We compare them on the basis of detection rate and false alarm rate.

III. DATA MINING ALGORITHMS

3. If the selected attribute is discrete (categorical), the node is branched with all possible values. If the attribute is continuous, a cut point with the highest information gain is selected.
4. After splitting, check whether each new node is a leaf (all of its data belong to the same class); otherwise, the new node becomes the root of a sub-tree.
5. Repeat all the above steps until all new nodes are leaves.

Algorithm C4.5 (D)
Input: an attribute-valued dataset D
Tree = {}
if D is "pure" OR other stopping criteria met then
    terminate
end if
for all attribute a ∈ D do
    Compute information-theoretic criteria if we split on a
end for
a_best = Best attribute according to above computed criteria
Tree = Create a decision node that tests a_best in the root
D_v = Induced sub-datasets from D based on a_best
for all D_v do
    Tree_v = C4.5(D_v)
    Attach Tree_v to the corresponding branch of Tree
end for
return Tree
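The cut-point selection for a continuous attribute described in step 3 can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation: the function names `entropy` and `best_cut_point`, the use of midpoints between sorted values as candidate thresholds, and the toy `durations` data are all assumptions made for the example. Plain information gain is used, as the text states.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut_point(values, labels):
    """Return (threshold, gain) with the highest information gain
    for the binary split `value <= threshold`."""
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    best_threshold, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        threshold = (lo + hi) / 2  # assumed: midpoint between neighbours
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        # Weighted entropy of the two partitions after the split
        remainder = (len(left) * entropy(left)
                     + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if gain > best_gain:
            best_threshold, best_gain = threshold, gain
    return best_threshold, best_gain


# Hypothetical toy data: connection duration versus class label
durations = [0, 1, 2, 30, 45, 60]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]
threshold, gain = best_cut_point(durations, labels)
```

On this toy data the two classes are perfectly separated by duration, so the chosen threshold falls between 2 and 30 and the resulting partitions are pure, which corresponds to the leaf condition in step 4.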
For a two-class linearly separable learning task, the aim of SVC is to find a hyperplane that separates the two classes of given samples with a maximal margin, which has been proved to offer the best generalization ability. Support vector machines are a set of related supervised learning methods used for classification and prediction [11].

A margin can be defined as the amount of space, or separation, between the two classes as defined by a hyperplane. Geometrically, the margin corresponds to the shortest distance between the closest data points and any point on the hyperplane. Figure 1 shows the optimal hyperplane for a linearly separable case.

[Figure 1: Optimal hyperplane]

The hyperplane can be written as [12]:

$w^T x + b = 0$    (1)

where $w = \{w_1, w_2, \ldots, w_n\}$ is the weight vector for the n attributes $A = \{A_1, A_2, \ldots, A_n\}$, $b$ is a scalar, and $x = \{x_1, x_2, \ldots, x_n\}$ are the values of the attributes. R* is the desired directional geometric distance from the sample x* to the optimal hyperplane [13, 14]. For more details on support vector machines, refer to [15, 16].

IV. EXPERIMENTS

We tested our work using the 1999 KDD Cup network anomaly data set [17]. It originated from the 1998 DARPA Intrusion Detection Evaluation Program managed by MIT Lincoln Labs.

The first stage is pre-processing, in which the data are partitioned into training and testing sets. In the next step, we applied C4.5 and SVM to the training dataset in order to build and train the models. Finally, the trained models are evaluated on the testing dataset to calculate their efficiency.

The training data set consists of seven weeks of traffic with around 5 million connections, and the testing data consists of two weeks of traffic with around 300,000 connections. The data contains four main categories of attacks:

Denial-of-service (DoS), such as smurf, apache2, pod, etc.
Remote-to-local (R2L), such as imap, worm, phf, etc.
User-to-root (U2R), such as perl, rootkit, and so on.
Probing, such as nmap, portsweep, etc.

Data mining algorithms can lead to better results if the data under analysis have been normalized [18].

Detection of attacks can be measured by the following metrics:

False positive (FP): or false alarm, corresponds to the number of instances detected as attacks that are in fact normal.
False negative (FN): corresponds to the number of instances detected as normal that are actually attacks; in other words, these attacks are the target of intrusion detection systems.
True positive (TP): corresponds to the number of instances detected as attacks that are in fact attacks.
True negative (TN): corresponds to the number of instances detected as normal that are actually normal.

The accuracy of an intrusion detection system is measured in terms of detection rate and false alarm rate. In this work, we use the 1999 KDD Cup dataset, which consists of 311129 records.
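The two evaluation measures can be computed from the four counts defined above. The sketch below assumes the commonly used definitions, detection rate = TP / (TP + FN) and false alarm rate = FP / (FP + TN); the paper does not spell these formulas out, so they are an assumption here. The function names, the "normal" label string and the toy label lists are also hypothetical.

```python
def confusion_counts(actual, predicted, normal_label="normal"):
    """Count TP, FP, TN, FN, treating every non-normal label as an attack."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        actual_attack = a != normal_label
        predicted_attack = p != normal_label
        if predicted_attack and actual_attack:
            tp += 1  # attack correctly flagged
        elif predicted_attack and not actual_attack:
            fp += 1  # false alarm on normal traffic
        elif not predicted_attack and actual_attack:
            fn += 1  # missed attack
        else:
            tn += 1  # normal traffic correctly passed
    return tp, fp, tn, fn


def detection_rate(tp, fn):
    """Fraction of actual attacks that were detected (assumed definition)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def false_alarm_rate(fp, tn):
    """Fraction of normal instances flagged as attacks (assumed definition)."""
    return fp / (fp + tn) if (fp + tn) else 0.0


# Hypothetical labels for six connections
actual = ["normal", "smurf", "nmap", "normal", "perl", "normal"]
predicted = ["normal", "smurf", "normal", "smurf", "perl", "normal"]
tp, fp, tn, fn = confusion_counts(actual, predicted)
dr = detection_rate(tp, fn)
far = false_alarm_rate(fp, tn)
```

In this toy case two of the three attacks are detected and one of the three normal connections raises a false alarm, so a good detector is one that pushes the first ratio up while keeping the second down.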
Table 1 given below shows the percentage of data. Then, 15% of the data is extracted by sampling. 70% of this new set was assigned to the training set, and 40% was dedicated to test data.

TABLE 2
DETECTION RATE COMPARISON OF DIFFERENT ATTACKS THROUGH C4.5 AND SVM
REFERENCES

[1] M. Xue, C. Zhu, "Applied Research on Data Mining Algorithm in Network Intrusion Detection," jcai, pp. 275-277, 2009 International Joint Conference on Artificial Intelligence, 2009.
[11] http://en.wikipedia.org/wiki/Support_vector_machine
[13] R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, Wiley, 2001.
[17] http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
[20] Prabhjeet Kaur, Amit Kumar Sharma, Sudesh Kumar Prajapat, "MADAM ID for Intrusion Detection Using Data Mining," IJRIM, Volume 2, Issue 2 (February 2012) (ISSN 2231-4334).