
Systematic Approach to Intrusion Evaluation Using Rough Set Based Classification

R. Ravinder Reddy*, Dr. Y. Ramadevi (1), Dr. K.V.N. Sunitha (2)

*, (1) Department of Computer Science and Engineering, CBIT, Hyderabad 500075, India
* Assistant Professor and corresponding author, email: ravindra_rkk@cbit.ac.in
(1) Professor and Head, email: yrd@cbit.ac.in
(2) Principal, BVRIT for Women, Bachupally, Hyderabad 500090, Telangana, India, email: k.v.n.sunitha@gmail.com

ABSTRACT
In a data-driven world, characterizing user behavior correctly is a challenging task. Intrusion
detection systems are used for this task, but in most cases they are inaccurate and time
consuming. This work identifies such behavior effectively and accurately using a rough set based
approach combined with attribute scaling. Intrusion detection is a classification problem: it
differentiates normal behavior from anomalous behavior. Not all attributes need to be involved
in the evaluation, so rough set theory is used to select the relevant attributes. The selected
attributes may not be on comparable scales, and scaling them improves detection performance.
In this approach, rough set based feature selection and attribute scaling are combined with
classification to increase the capability of intrusion detection and decrease detection time.
Keywords: Classification, rough sets, intrusion detection, attribute scaling
1. INTRODUCTION
Intrusion detection is the process of identifying abnormal behavior in a system. In the early
days, detection was done by inspecting log records; Anderson [13] detected the first intrusions
by auditing log records. This process is time consuming. Later, Denning [15] proposed the first
IDS model, based on user profiles built by inspecting audit data. With these approaches,
preparing the model and testing for intrusions takes a large amount of time, whereas intrusion
detection should be both timely and accurate. In the late 1990s, data mining based approaches
[14] increased the detection rate and decreased the detection time. Feature selection techniques
[11] were subsequently applied to reduce the dimensionality of the feature vector, which reduced
the detection time considerably without affecting the detection accuracy. Rough set theory based
approaches have shown significant performance gains through their feature selection techniques.
In this work, rough set based feature selection is used to reduce the dimensionality of the
dataset. The resulting dataset contains heterogeneous features, which severely affect the
accuracy of the model. Proper scaling of the data enhances the performance of the model.

2. RELATED WORK
2.1 Intrusion Detection
Intrusion Detection Systems (IDS) have become indispensable in the security infrastructure, so
that intrusions can be detected before they cause large-scale damage. The rapid growth of
computer network activity has increased the rate of network attacks. Advances in networking have
increased its use in all areas, including financial transactions, which puts at risk the core
properties of critical information: confidentiality, integrity and availability (the CIA
triangle) [8]. Intrusion detection involves the supervision of computers or networks to prevent
unauthorized access, activity, or change of data.
Based on the detection technique, IDS are classified into misuse, anomaly and hybrid.
Misuse detection looks for known signatures. Anomaly-based network intrusion detection can
detect known as well as unknown attacks [3]. Hybrid techniques combine misuse and anomaly
detection. Based on the detection source, IDS are classified into host-based and network-based.

2.2 Rough set theory


Intrusion detection is a classification problem. Not all features need to participate in
classification; an (optimal) subset of the features is used to evaluate intrusion detection.
Feature selection is necessary to reduce the dimensionality of the data [11]. It is the first
step in the classification process and selects the optimal features from the dataset, so it
requires a good feature selection technique. Feature selection is an essential task for the
timely detection of intrusions. In this work, a rough set based feature selection technique for
intrusion detection is presented.
Identifying intruder behavior in the network as well as in the system is an arduous and
time-consuming task. The main aim of feature selection is to determine a minimal feature
subset from a problem domain while retaining high accuracy in representing the original
features [3]. The rough set approach is mostly used for dimensionality reduction by removing
unnecessary features [7]. Zhang et al. [4] explained the capability of Rough Set Theory (RST)
in determining the categories of attacks in IDS according to classification rules. Their
findings showed that rough set classification attained high detection accuracy and that the
feature ranking was fast. Most existing IDS use all the data features to detect intrusions, and
very few researchers in the IDS literature address the importance of a small feature subset.
The feature subset obtained influences the accuracy of intrusion detection.
2.3 Testbed
In the initial stages of IDS development, standard datasets were not available [2]. KDD
initiated the process by designing a standard intrusion dataset. Earlier, TCPDUMP data was used
for the evaluation of IDS. Before the KDDCUP99 [6] dataset, IDS data was normally collected from
three sources: data packets from networks (for NIDS), command sequences from user input, and
low-level system information such as system call sequences, log files, and CPU/memory usage
(for HIDS). Evaluation of IDS needs a standard dataset. In this work, the following standard
benchmark datasets are used to conduct the experiments for network-based IDS.
1. KDDCUP99
2. UNB ISCX
3. HTTP-CSIC
The KDDCUP99 dataset has been used in many intrusion detection applications and shows good
results. The ISCX and CSIC datasets were introduced later to enhance IDS performance.
The ISCX dataset is prepared from packets captured in real time, while the CSIC dataset is
prepared for web traffic applications.
3. METHODOLOGY
Intrusion detection datasets contain both numerical and categorical attributes. The ranges of
the attributes affect the classification performance of the system. The main advantage of
scaling is that it prevents attributes with larger numeric ranges from dominating those with
smaller numeric ranges. Another advantage is that it avoids numerical difficulties during
calculation. Rough set theory is used to reduce the number of features: the QuickReduct
algorithm is used to obtain a reduced feature subset from the dataset.
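The greedy search behind QuickReduct can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes all attributes are already discrete (continuous attributes would need discretization first), and the toy records, attribute meanings and function names are hypothetical.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, labels, attrs):
    """Degree of dependency: fraction of rows lying in the positive region,
    i.e. in equivalence classes whose members all share one decision label."""
    positive = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)

def quick_reduct(rows, labels):
    """Greedily add the attribute that raises the dependency degree most,
    until the subset is as informative as the full attribute set."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    while dependency(rows, labels, reduct) < target:
        best_attr, best_gamma = None, -1.0
        for a in all_attrs:
            if a in reduct:
                continue
            gamma = dependency(rows, labels, reduct + [a])
            if gamma > best_gamma:
                best_attr, best_gamma = a, gamma
        reduct.append(best_attr)
    return reduct

# Hypothetical toy records: (protocol, flag, service)
rows = [
    ("tcp", "SF", "http"),
    ("tcp", "S0", "http"),
    ("udp", "SF", "dns"),
    ("tcp", "S0", "ftp"),
    ("udp", "SF", "http"),
    ("tcp", "SF", "ftp"),
]
labels = ["normal", "anomaly", "normal", "anomaly", "normal", "normal"]

selected = quick_reduct(rows, labels)
print("selected attribute indices:", selected)
```

In this toy data the second attribute alone separates the two classes, so the search stops after one step. Because the search is greedy, the subset it returns is not guaranteed to be the smallest possible reduct.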
Data scaling is then applied to the reduced dataset, covering both the numerical and the
categorical features, to enhance the detection rate of the system. The scaled dataset is used
to build an accurate intrusion classification model. The process flow of the system is shown in
Figure 1.
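The text does not state which scaling method is used. As one possible reading, the sketch below applies min-max scaling to numerical columns and one-hot encoding to categorical columns, so that every resulting value lies in [0, 1]. Both choices, the column layout and the function names are assumptions made for illustration.

```python
def min_max_scale(column):
    """Map numeric values linearly onto [0, 1]; a constant column maps to 0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def one_hot(column):
    """Encode a categorical column as one 0/1 indicator per category."""
    categories = sorted(set(column))
    return [[1.0 if v == c else 0.0 for c in categories] for v in column]

def scale_dataset(rows, categorical):
    """Scale every column of rows; categorical holds the indices of the
    columns that contain symbols rather than numbers."""
    scaled_columns = []
    for idx, col in enumerate(zip(*rows)):
        if idx in categorical:
            # one-hot yields several indicator columns for one input column
            scaled_columns.extend(zip(*one_hot(col)))
        else:
            scaled_columns.append(min_max_scale(col))
    return [list(record) for record in zip(*scaled_columns)]

# Hypothetical records: (duration, protocol, src_bytes)
rows = [(0, "tcp", 200), (5, "udp", 50000), (10, "tcp", 100200)]
scaled = scale_dataset(rows, categorical={1})
for record in scaled:
    print(record)
```

After scaling, the byte count no longer outweighs the duration simply because its raw values are several orders of magnitude larger.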
Algorithm: Data scaling based IDS model
Input: Intrusion dataset
Output: Intrusion detection accuracy

Begin
1. The intrusion dataset is given to the model
2. Rough set theory is used to reduce the features
3. Data scaling is applied to the reduced dataset
4. Data mining based classification is performed
5. The results are analyzed
End
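The five steps can be strung together as in the sketch below. It is only an illustration under stated assumptions: the reduct is supplied by hand rather than computed, min-max scaling stands in for the unspecified scaling method, and a 1-nearest-neighbour classifier stands in for the unnamed data mining classifier. All records and names are hypothetical.

```python
import math

def select_features(rows, keep):
    """Step 2 stand-in: keep only the columns listed in a precomputed reduct."""
    return [[row[i] for i in keep] for row in rows]

def fit_ranges(rows):
    """Record the (min, max) of every column on the training data."""
    return [(min(col), max(col)) for col in zip(*rows)]

def scale(rows, ranges):
    """Step 3: min-max scale each column using the training ranges."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, (lo, hi) in zip(row, ranges)] for row in rows]

def predict_1nn(train_x, train_y, x):
    """Step 4 stand-in: label of the nearest training record (Euclidean)."""
    nearest = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return train_y[nearest]

def accuracy(y_true, y_pred):
    """Step 5: fraction of correctly classified records."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Step 1: hypothetical records (duration_s, src_bytes, unused_flag)
train = [(1, 1000, 0), (2, 50000, 0), (60, 1200, 0), (55, 48000, 0)]
train_y = ["normal", "normal", "anomaly", "anomaly"]
test = [(58, 900, 0), (3, 20000, 0)]
test_y = ["anomaly", "normal"]

reduct = [0, 1]  # hypothetical output of rough set feature selection
train_x = select_features(train, reduct)
test_x = select_features(test, reduct)

ranges = fit_ranges(train_x)
train_s, test_s = scale(train_x, ranges), scale(test_x, ranges)

scaled_acc = accuracy(test_y, [predict_1nn(train_s, train_y, x) for x in test_s])
raw_acc = accuracy(test_y, [predict_1nn(train_x, train_y, x) for x in test_x])
print(f"accuracy without scaling: {raw_acc:.2f}, with scaling: {scaled_acc:.2f}")
```

The toy data is built so that, without scaling, the byte count dominates the distance and the duration is effectively ignored. The comparison only illustrates the mechanism and says nothing about the size of the effect on real intrusion datasets.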

Figure 1: Data flow in the system (Intrusion Dataset → Rough set feature selection → Data scaling → Classification)

4. RESULT ANALYSIS
Accuracy alone is not sufficient to measure the performance of a classifier. The F-measure is
the harmonic mean of precision and recall, and the G-mean is the geometric mean of the
accuracies on the normal and anomaly classes.
\[ F\text{-measure} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \]
The G-mean considers the accuracies of both classes at the same time: it is the geometric mean
of sensitivity and specificity. It is used when the performance on both classes is expected to
be high simultaneously, which makes it a good indicator of overall performance and very useful
for imbalanced datasets.
\[ G\text{-mean} = \sqrt{\text{Sensitivity} \times \text{Specificity}} \]
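Both measures follow directly from the confusion matrix of a binary classifier. The sketch below treats the anomaly class as positive, which is an assumption, and the counts are hypothetical rather than taken from the experiments.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Compute precision, recall, specificity, F-measure and G-mean
    from the four confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    specificity = tn / (tn + fp)       # accuracy on the negative class
    f_measure = 2 * precision * recall / (precision + recall)
    g_mean = math.sqrt(recall * specificity)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_measure": f_measure,
        "g_mean": g_mean,
    }

# Hypothetical counts: 100 anomalous and 900 normal records
metrics = binary_metrics(tp=90, fp=30, tn=870, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the precision is 0.75 and the recall is 0.90, so the F-measure falls between the two, while the G-mean stays high because both classes are classified well.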

The experiments were conducted on the three intrusion datasets using a data mining
classification algorithm. The results shown in Table 1 are for the original datasets.
Table 1: Intrusion analysis on the original dataset
Dataset     Accuracy (%)  Precision  Recall  F-Measure  Kappa  G-Mean
KDDCUP99    97.37         0.974      0.974   0.974      0.94   0.974
ISCX        99.56         1          0.996   0.998      0.561  0.997
HTTP-CSIC   96.52         0.966      0.965   0.965      0.929  0.965
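Table 1 reports a Kappa value without defining it. Assuming it denotes Cohen's kappa, which corrects the observed agreement for the agreement expected by chance, it can be computed from the confusion matrix as sketched below. The counts are hypothetical and are not the ones behind Table 1.

```python
def cohen_kappa(tp, fp, tn, fn):
    """Cohen's kappa for a binary confusion matrix."""
    n = tp + fp + tn + fn
    observed = (tp + tn) / n
    # chance agreement: product of predicted and actual class proportions
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical, heavily imbalanced counts: 10 anomalous, 990 normal records
tp, fn, fp, tn = 5, 5, 1, 989
acc = (tp + tn) / (tp + fn + fp + tn)
kappa = cohen_kappa(tp, fp, tn, fn)
print(f"accuracy: {acc:.3f}, kappa: {kappa:.3f}")
```

On strongly imbalanced data, accuracy can be very high while kappa stays moderate, because a classifier gains most of its accuracy from the majority class. This may be one reason a dataset can show high accuracy together with a low Kappa value, although that cannot be confirmed from the reported figures alone.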

The dataset is then refined using rough set based feature selection and data scaling. The
results, shown in Table 2, indicate that the performance of the model has increased.
Table 2: Intrusion analysis with scaling
Dataset     Accuracy (%)  Precision  Recall  F-Measure  Kappa  G-Mean
KDDCUP99    99.95         0.999      0.998   0.998      0.998  0.998
ISCX        100           1          1       1          1      1
HTTP-CSIC   99.98         0.999      0.998   0.998      0.998  0.998

The ROC curve plotted for the ISCX dataset [5], on which the classifier achieves its maximum
performance, is shown in Figure 2.
Figure 2: ROC Curve for ISCX dataset
5. CONCLUSION
In the evaluation of IDS, both empirical risk and structural risk are important considerations.
Adopting rough set based feature selection reduced the dimensionality of the dataset, which
decreased the empirical risk of the model. Data scaling improved the detection performance of
the model. Experiments conducted on the intrusion datasets showed a performance gain.
REFERENCES
1. Sen S, Clark JA, “Evolutionary computation techniques for intrusion detection in mobile ad
hoc networks”, Computer Networks, 55, pp:41–57, 2011.
2. Shiravi A, Shiravi H, Tavallaee M, Ghorbani AA, “Toward developing a systematic
approach to generate benchmark datasets for intrusion detection”, Computers & Security,
Vol. 31, pp: 357–374, 2012.
3. Langley P, “Selection of relevant features in machine learning”, In Proceedings of the
AAAI Fall Symposium on Relevance, pp. 1-5, 1994.
4. L. Zhang, G. Zhang, L. Yu, J. Zhang, and Y. Bai, “Intrusion Detection Using Rough Set
Classification”, Journal of Zhejiang University Science, 5(9), pp. 1076-1083, 2004.
5. http://iec.csic.es/dataset/
6. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
7. Pawlak Z., “Rough sets”, International Journal of Computer and Information Sciences,
vol.11, pp: 341-356, 1982.
8. Carlos A. Catania, Carlos Garcia Garino, “Automatic network intrusion detection: Current
techniques and open issues”, Computers & Electrical Engineering 38, pp: 1062-1072,
2012.
9. C.A Catania, C.G Garino, “Automatic network intrusion detection: Current techniques and
open issues”, Computers & Electrical Engineering 38 (5), pp: 1062-1072, 2012.
10. L. H. Zhang, G. H. Zhang, L. Yu, J. Zhang and Y.C. Bai, “Intrusion detection using rough
set classification”, Journal of Zhejiang University Science, 5(9), 1076-1086, 2004.
11. Dash M & Liu H., “Feature Selection for Classification”, Intelligent Data Analysis, Vol. 1,
No. 3, pp. 131-156, 1997.
12. Lee W, Stolfo S. J., and Mok K. W, “A data mining framework for building intrusion
detection models”, In Proceedings of the 1999 IEEE Symposium on Security and Privacy,
1999.
13. J. P. Anderson, “Computer Security Threat Monitoring and Surveillance”, Technical
Report, April 1980.
14. Lee W and Stolfo S.J, “Data Mining techniques for intrusion detection”, In: Proc. of the 7th
USENIX security symposium, San Antonio, TX, 1998.
15. Denning D, “An Intrusion-Detection Model”, IEEE Transactions on Software Engineering,
Vol. SE-13, No. 2, pp.222-232, 1987.
