Table of Contents
Chapter 1: Introduction
Clustering Techniques
(Unsupervised Learning)
Decision Trees
Response Rules
Research Goal
Chapter 2: Background
Related Works
Overview
Chapter 5: Conclusion
| 24 April 2011
References
Introduction
Network Security and Intrusion
Introduction
Intrusion Prevention and Detection
A firewall is used to prevent attacks, but it cannot examine all the details of
the traffic on the network. A firewall follows the rules set by the administrator
and responds if a rule matches the traffic. No magic and no intelligence.
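The rule-following behaviour described above can be sketched as a first-match lookup. This is only an illustration, not a real firewall: the `Rule` fields, the example rule set and the default-deny policy are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """One administrator-defined rule (hypothetical fields for illustration)."""
    protocol: Optional[str]   # None matches any protocol
    dst_port: Optional[int]   # None matches any destination port
    action: str               # "allow" or "deny"


def decide(rules, protocol, dst_port, default="deny"):
    """Return the action of the first rule that matches the packet header."""
    for rule in rules:
        if rule.protocol not in (None, protocol):
            continue
        if rule.dst_port not in (None, dst_port):
            continue
        return rule.action
    return default  # no rule matched: fall back to the default policy


# Hypothetical rule set: allow web traffic, block telnet.
RULES = [
    Rule("tcp", 80, "allow"),
    Rule("tcp", 23, "deny"),
]

print(decide(RULES, "tcp", 80))   # matched by the first rule
print(decide(RULES, "udp", 53))   # no rule matches, default policy applies
```

The function only inspects the header fields named in a rule, which is why a firewall alone cannot recognise an attack carried inside traffic that the rules permit.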
Introduction
Intrusion Detection System
Note: An Intrusion Detection System must be comprehensive and reliable, able to detect
intrusions in real time and store their signature patterns.
Introduction
Research Goal
Background
Intrusion Detection Methods
Traditional Method
Signature-Based
- Manual and highly susceptible to human error.
- Cannot detect emerging cyber threats
Misuse
- Automatic
- High accuracy in detecting known attacks
- Inability to detect attacks whose instances have not yet been observed.
Anomaly
- Automatic
- High false alarms
- Computationally expensive
Our focus shall be automatic discovery of network intrusion using data mining based approaches.
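The contrast between misuse and anomaly detection can be sketched as follows. This is a minimal illustration under stated assumptions: the signature strings, the baseline numbers and the z-score threshold are all made up, and real detectors use far richer features.

```python
from statistics import mean, pstdev

# Hypothetical signature store: patterns of attacks that were already observed.
KNOWN_SIGNATURES = {"flag=S0;land=1", "service=ecr_i;src_bytes>=1256"}


def misuse_detect(pattern):
    """Misuse detection: flag only patterns that match a stored signature."""
    return pattern in KNOWN_SIGNATURES


def anomaly_detect(value, baseline, threshold=3.0):
    """Anomaly detection: flag values far from the normal baseline (z-score)."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Connections per second during normal operation (made-up numbers).
normal_counts = [10, 12, 11, 9, 10, 13, 11, 10]

print(misuse_detect("flag=S0;land=1"))     # known signature
print(misuse_detect("never-seen-before"))  # unknown attack is missed
print(anomaly_detect(200, normal_counts))  # far from the baseline
```

The misuse detector never raises a false alarm on an unknown pattern but also never catches a new attack, while the anomaly detector catches anything unusual, including harmless but rare behaviour.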
Background
Related Works
Techniques used in data mining based intrusion detection: Single and Multiple Techniques
Single Techniques perform analysis and detection based on a single data mining
method.
Multiple Techniques perform the same functions using combined data mining methods,
sometimes as an ensemble.
These data mining methods include unsupervised learning methods and supervised
learning methods.
Clustering Based Methods. Able to detect new attacks not seen before, which implies
that attack types with unknown pattern signatures can be detected (Eskin, 2000; Portnoy
et al., 2001; Nong et al., 2001; Guan et al., 2003).
Hybrid Techniques. A combination of data mining methods often yields better results
(Abadeh et al., 2005; Bashah, 2005).
Experimenting with a generalized, publicly available dataset such as the KDD Cup 99 dataset.
2.
These two steps will be achieved independently by (A) building a Classifier System, steps a - d, and
(B) building a Knowledge Based System, steps e - f.
a.
b. Filtering discrepancies in the results, and clustering (performing unsupervised learning on) the outliers.
c. Performing intelligent training, validation and testing on the outlier classes using an ANN with a target of 0.05100 accuracy.
d. Modelling the result using rule generation techniques such as decision trees.
e. Developing an integrated signature knowledge base containing known signature rules and responses, integrated with newly
established ones.
f. Developing a response controller that generates active responses and alerts, frequent patterns and traffic relationships, severity
levels and some useful forensic information.
Diagram labels: Classifier System, Knowledge Based System.
Probing
22 Attack Classes
Software. The software used for the analysis is WEKA 3.6.2, an open-source application
with exploratory and visual features, developed at the University of Waikato, Hamilton,
New Zealand.
It integrates different algorithms.
There are features for pre-processing, clustering, classification, association,
attribute selection and visualization.
The first stage applied unsupervised clustering techniques to the formatted and normalized KDD Cup 99
dataset. The algorithms used for the experiment are K-Means and Expectation Maximization
(EM). They were chosen for their ability to group traffic into clusters without a target
variable; the dataset is clustered into five groups: normal, DoS, probe, U2R and R2L.
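How K-Means groups records without a target variable can be sketched in a few lines. This is a plain-Python version of Lloyd's algorithm written for illustration only; it is not WEKA's implementation, and the toy points, the seed and `k = 2` are assumptions (the experiment itself uses five clusters on KDD Cup 99 records).

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two numeric records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # use k of the records as seeds
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every record to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:                         # an empty cluster keeps its centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignments


def sum_squared_errors(points, centroids, assignments):
    """Within-cluster sum of squared errors, the quantity K-Means minimises."""
    return sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignments))


# Toy records with two numeric features each (made-up values).
TRAFFIC = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, assignments = k_means(TRAFFIC, k=2)
print(assignments, sum_squared_errors(TRAFFIC, centroids, assignments))
```

No class label is used anywhere in the loop; labels only enter afterwards, when the clusters are compared with the known classes.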
To confirm the result of this clustering, an intelligent algorithm is needed to validate the
results obtained. We therefore modelled Multilayer Perceptron (MLP) and Radial
Basis Function (RBF) Artificial Neural Networks with different parameter settings, with momentum set to
0.2 and learning rate to 0.3. The best MLP and RBF were chosen for each of the training and
testing exemplars. When this step is completed, an alarm is produced which triggers the rule
generation engine. Modelling is done per attack type to avoid poor detection of U2R and
other intrusion types that are rare in the KDD dataset. The results obtained could then be
integrated.
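The role of the two training parameters can be illustrated with the textbook gradient-descent-with-momentum update. Only the values 0.2 and 0.3 come from the text; the toy objective is an assumption, and the exact update used inside WEKA may differ in detail.

```python
LEARNING_RATE = 0.3   # value quoted in the text
MOMENTUM = 0.2        # value quoted in the text


def momentum_step(weight, velocity, gradient, lr=LEARNING_RATE, momentum=MOMENTUM):
    """One weight update: the new step blends the previous step with the gradient."""
    velocity = momentum * velocity - lr * gradient
    return weight + velocity, velocity


def minimise(gradient_fn, weight=0.0, steps=50):
    """Repeat the update from a starting weight and return the final weight."""
    velocity = 0.0
    for _ in range(steps):
        weight, velocity = momentum_step(weight, velocity, gradient_fn(weight))
    return weight


# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3); its minimum is at w = 3.
w_final = minimise(lambda w: 2.0 * (w - 3.0))
print(w_final)
```

The learning rate scales how far each gradient moves the weight, while the momentum term carries over a fraction of the previous move, which damps oscillation between successive updates.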
Since an ideal automatic intrusion detection system needs to respond intelligently in real time to
abnormal events, a rule-generating algorithm is needed to serve as the basis for
response control. Therefore, a decision-tree, rule-based response control system was modelled.
The decision tree algorithms tested were Naive Bayes Tree (NBTree), J48 (C4.5) and Classification and
Regression Tree (Simple CART). To generate a set of distinctive rules for each intrusion, the
attack patterns for each attack type were investigated using decision trees in stage three.
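A decision tree produces rules by repeatedly choosing the test that best separates the classes. The sketch below shows one such choice using Gini impurity, the criterion used by CART; the records are made up, and pruning and the other parts of a full tree learner are left out.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means the node is pure)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) of the best binary split `value < threshold`."""
    pairs = sorted(zip(values, labels))
    best = (None, gini(labels))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                      # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for val, lab in pairs if val < threshold]
        right = [lab for val, lab in pairs if val >= threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Made-up records: a `count` feature and the attack label of each connection.
counts = [1, 2, 3, 4, 6, 9]
attacks = ["ipsweep", "ipsweep", "ipsweep", "portsweep", "portsweep", "portsweep"]
print(best_threshold(counts, attacks))   # (3.5, 0.0)
```

Each chosen test becomes one condition of a rule, so a path from the root to a leaf reads as `IF count < 3.5 THEN intrusion = ipsweep`.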
Learning Phases: Learning has two phases, a training phase and a testing phase. The training phase adapts the
model to the training examples, while the testing phase evaluates the ability of the learned
classifier to generalize to new data points. The testing option used for modelling was percentage split.
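A percentage split can be sketched as below. The 66% training share, the seed and the function name are assumptions made for the example; the split percentage used in the experiment is not stated.

```python
import random


def percentage_split(records, train_percent=66.0, seed=1):
    """Shuffle the records and cut them into a training part and a testing part."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_percent / 100.0)
    return shuffled[:cut], shuffled[cut:]


# 100 dummy record identifiers stand in for the dataset.
train_set, test_set = percentage_split(range(100), train_percent=66.0)
print(len(train_set), len(test_set))   # 66 34
```

The model is fitted on `train_set` only, so accuracy measured on `test_set` reflects how well it generalizes to records it has not seen.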
Clustering Models Result
                                    K-Means    EM
Percentage Accuracy                 47.4169    63.2616
Correctly Classified Exemplars      8,903      11,878
Incorrectly Classified Exemplars    9,873      6,898
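The percentage accuracy figures follow directly from the counts of correctly and incorrectly classified exemplars, as this short check shows. The function name is ours; the numbers are the ones reported above.

```python
def percentage_accuracy(correct, incorrect):
    """Share of correctly classified exemplars, as a percentage."""
    return 100.0 * correct / (correct + incorrect)


RESULTS = {"K-Means": (8903, 9873), "EM": (11878, 6898)}

for name, (correct, incorrect) in RESULTS.items():
    print(f"{name}: {percentage_accuracy(correct, incorrect):.4f}")
# K-Means: 47.4169
# EM: 63.2616
```

Both algorithms were evaluated on the same 18,776 exemplars (8,903 + 9,873 = 11,878 + 6,898).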
Unsupervised Learning/Clustering Confusion Matrices

K-Means (rows: actual class; columns: assigned cluster)
          DOS     NORMAL  PROBE   R2L     No Class
DOS       3133    36      682     359     1531
NORMAL    81      4000    1856    3768    6
PROBE     533     140     305     2       126
R2L       15      700     0       1465    1
U2R       0       10      0       27      0

EM (rows: actual class; columns: assigned cluster)
          DOS     NORMAL  PROBE   R2L     No Class
DOS       3094    215     688     174     1570
NORMAL    53      7428    1681    543     6
PROBE     527     0       311     139     129
R2L       0       991     144     1045    1
U2R       0       0       2       35      0
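The link between the confusion matrices and the reported accuracy can be checked with a short script. The layout is an assumption: each block of five numbers is read as one cluster column (DoS, Normal, Probe, R2L, No Class) with rows DoS, Normal, Probe, R2L, U2R. Under that reading the diagonals reproduce the correctly classified counts reported for the two algorithms.

```python
# Rows are actual classes (DoS, Normal, Probe, R2L, U2R); columns are the
# clusters labelled DoS, Normal, Probe, R2L and "No Class" (assumed layout).
K_MEANS = [
    [3133,   36,  682,  359, 1531],
    [  81, 4000, 1856, 3768,    6],
    [ 533,  140,  305,    2,  126],
    [  15,  700,    0, 1465,    1],
    [   0,   10,    0,   27,    0],
]
EM = [
    [3094,  215,  688,  174, 1570],
    [  53, 7428, 1681,  543,    6],
    [ 527,    0,  311,  139,  129],
    [   0,  991,  144, 1045,    1],
    [   0,    0,    2,   35,    0],
]


def correctly_classified(matrix):
    """Sum the diagonal over the columns that carry a class label."""
    labelled_columns = len(matrix[0]) - 1      # the last column is "No Class"
    return sum(matrix[i][i] for i in range(labelled_columns))


def matrix_accuracy(matrix):
    """Percentage of exemplars that fall on the labelled diagonal."""
    total = sum(sum(row) for row in matrix)
    return 100.0 * correctly_classified(matrix) / total


for name, matrix in (("K-Means", K_MEANS), ("EM", EM)):
    print(name, correctly_classified(matrix), f"{matrix_accuracy(matrix):.4f}")
# K-Means 8903 47.4169
# EM 11878 63.2616
```

Under this reading neither algorithm produces a cluster labelled U2R, so none of the 37 U2R exemplars can be counted as correctly classified.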
Artificial Neural Networks Models

                                MLP (Layers)          RBF (Number of Iterations)
Dataset           Metrics       1         2           1         2         3         4
DoS & Normal      Time(s)       237.81    357.84      12.39     12.88     12.66     13.6
                  Accuracy      -         98.9862     86.4754   90.0129   90.0129   89.9914
                  TP Rate       0.977     0.99        0.865     0.9       0.9       0.9
                  FP Rate       0.036     0.006       0.105     0.085     0.085     0.085
Probe & Normal    Time(s)       137.91    212.6       11.01     10.58     11.2      10.87
                  Accuracy      -         99.168      94.792    94.7304   95.5008   95.6549
                  TP Rate       0.989     0.992       0.948     0.947     0.955     0.957
                  FP Rate       0.015     0.028       0.009     0.419     0.008     0.008
R2L & Normal      Time(s)       116.97    176.59      10.82     10.63     11.27     10.76
                  Accuracy      -         98.5706     87.2478   90.639    89.1256   89.0975
                  TP Rate       0.983     0.986       0.872     0.906     0.891     0.891
                  FP Rate       0.033     0.033       0.043     0.039     0.041     0.041
U2R & Normal      Time(s)       98.61     154.76      10.22     10.17     10.01     10.7
                  Accuracy      -         99.7274     99.3186   99.3186   99.3186   99.3186
                  TP Rate       0.997     0.997       0.993     0.993     0.993     0.993
                  FP Rate       0.397     0.397       0.993     0.993     0.993     0.993
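The TP Rate and FP Rate metrics can be computed from a confusion matrix as sketched below. The two-class matrix is made up, and the class-weighted average is an assumption about how the summary figures were produced.

```python
def per_class_rates(matrix, index):
    """TP rate and FP rate of one class in a square confusion matrix.

    Rows are actual classes and columns are predicted classes; every class is
    assumed to have at least one exemplar, and there are at least two classes.
    """
    total = sum(sum(row) for row in matrix)
    actual = sum(matrix[index])
    tp = matrix[index][index]
    fp = sum(row[index] for row in matrix) - tp
    return tp / actual, fp / (total - actual)


def weighted_rates(matrix):
    """Average the per-class rates, weighting each class by its share of exemplars."""
    total = sum(sum(row) for row in matrix)
    tp_avg = fp_avg = 0.0
    for i, row in enumerate(matrix):
        tp_i, fp_i = per_class_rates(matrix, i)
        weight = sum(row) / total
        tp_avg += weight * tp_i
        fp_avg += weight * fp_i
    return tp_avg, fp_avg


# Made-up result for 90 attack and 10 normal connections.
EXAMPLE = [
    [88, 2],   # actual attack: 88 detected, 2 missed
    [4, 6],    # actual normal: 4 false alarms, 6 correctly passed
]
print(per_class_rates(EXAMPLE, 0))
print(weighted_rates(EXAMPLE))
```

With this weighting the average TP rate equals the overall accuracy (94 of 100 here), while the average FP rate is dominated by the errors made on the smaller class. That is one way a model can combine a very high TP Rate with a high FP Rate on an unbalanced dataset.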
Dataset   Metrics          Simple CART   C4.5
DoS       Time to Model    10.71         1.05
          Size of Tree     13            68
          Accuracy         100           100
          TP Rate          1             1
          FP Rate          0             0
Probe     Time to Model    1.22          0.17
          Size of Tree     13            19
          Accuracy         99.7952       99.0964
          TP Rate          0.988         0.991
          FP Rate          0.007         0.007
R2L       Time to Model    1.87          0.35
          Size of Tree     9             3
          Accuracy         100           99.8471
          TP Rate          1             0.998
          FP Rate          0             0.002
U2R       Time to Model    0.04          0.01
          Size of Tree     3             9
          Accuracy         72.7273       72.7273
          TP Rate          0.727         0.727
          FP Rate          0.152         0.152
The Expectation Maximization (EM) algorithm recorded a log likelihood of 7.71061, while K-Means
recorded a sum of squared errors of 20901.44549244482 after seventeen iterations. From the accuracy
and confusion matrices presented, EM performed better than K-Means, with a percentage
accuracy of 63.2616. The inability of the two algorithms to classify U2R attacks, as shown in Clustering
Models Result and Confusion Matrices, informed our decision to model the dataset by attack type in
the next stage of the experiment.
In the Artificial Neural Networks Models Table, the results for DoS show the MLP with two hidden
layers and the RBF with one iteration performing better than all other MLPs and RBFs
(MLP Accuracy = 97.7135, RBF Time = 12.35).
For Probe, the MLP with one hidden layer and the RBF with two iterations performed better than
all other MLPs and RBFs (MLP Accuracy = 98.8906, high TP Rate, lower FP Rate, RBF Time = 10.58).
For R2L, the MLP with two hidden layers and the RBF with two iterations performed better than
all other MLPs and RBFs (MLP Accuracy = 98.5706, high TP Rate, lower FP Rate, RBF Time = 10.01).
Lastly, for U2R, the MLP with two hidden layers and the RBF with three iterations were the best of
all our models, recording a percentage accuracy of 99.7274 with the highest TP Rate and lowest
FP Rate, and a modelling time of 10.01.
For the decision tree models, the Simple CART algorithm performed best of the three algorithms
tested, based on accuracy, moderate tree size, generalization and shorter modelling time compared to
NBTree. Its lowest percentage accuracy was 72.7273 at U2R, its tree size fell
between 3 and 13, its lowest TP Rate was 0.727, its highest FP Rate was 0.227, and its modelling time was
moderate, peaking at 10.71 at DoS. NBTree produced the worst results for all the metrics considered.
Although C4.5 has accuracy comparable to Simple CART, its tree size is large. Therefore, the Simple
CART algorithm was used to generate the response rules for the IDS.
S/No  Rules

DoS
1  IF ((flag = REJ V flag = RSTO V flag = S0) ^ land < 0.5) THEN intrusion = neptune
2  IF ((flag = REJ V flag = RSTO V flag = S0) ^ land >= 0.5) THEN intrusion = land
3  IF ((flag != REJ V flag != RSTO V flag != S0) ^ src_bytes >= 1256.0 ^ protocol_type = icmp) THEN intrusion = pod
4  IF ((flag != REJ V flag != RSTO V flag != S0) ^ src_bytes >= 1256.0 ^ protocol_type != icmp) THEN intrusion = back

Probe
1  IF (count < 3.5 ^ (service = eco_i V service = ecr_i)) THEN intrusion = ipsweep
2  IF (count < 3.5 ^ (service != eco_i ^ service != ecr_i) ^ serror_rate < 0.5 ^ flag = SF) THEN intrusion = satan
3  IF (count < 3.5 ^ (service != eco_i ^ service != ecr_i) ^ serror_rate < 0.5 ^ flag != SF) THEN intrusion = portsweep
4  IF (count >= 3.5 ^ dst_host_count < 128.5 ^ dst_host_same_src_port_rate < 0.56) THEN intrusion = portsweep

R2L
1  IF (service = pop_3 V service = telnet) THEN intrusion = guess_passwd
2  IF ((service = pop_3 V service = telnet) ^ num_failed_logins < 0.5 ^ (flag = REJ V flag = RSTO)) THEN intrusion = phf
3  IF ((service = pop_3 V service = telnet) ^ num_failed_logins < 0.5 ^ (flag != REJ V flag != RSTO) ^ (service = http V service = login)) THEN intrusion = ftp_write
4  IF ((service = pop_3 V service = telnet) ^ num_failed_logins < 0.5 ^ (flag != REJ V flag != RSTO) ^ (service != http V service != login)) THEN intrusion = warezmaster
5  IF ((service != pop_3 V service != telnet) ^ num_failed_logins >= 0.5) THEN intrusion = guess_passwd

U2R
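To show how such rules could drive a response controller, the four DoS rules are transcribed below as a Python function. This is a sketch under assumptions: the record is a plain dict keyed by KDD Cup 99 feature names, the third flag is taken to be the KDD Cup 99 value `S0`, and the negated flag condition is read as "none of the three flags". The function name and the `None` result for unmatched records are ours.

```python
# Connection flags tested by DoS rules 1 and 2.
REJECT_FLAGS = {"REJ", "RSTO", "S0"}


def classify_dos(record):
    """Apply the four DoS rules to a dict of KDD Cup 99 features."""
    if record["flag"] in REJECT_FLAGS:
        # Rules 1 and 2: the `land` feature separates land from neptune.
        return "land" if record["land"] >= 0.5 else "neptune"
    if record["src_bytes"] >= 1256.0:
        # Rules 3 and 4: large payloads, split by protocol.
        return "pod" if record["protocol_type"] == "icmp" else "back"
    return None  # no DoS rule fires for this record


# Made-up connection record for illustration.
example = {"flag": "S0", "land": 0, "src_bytes": 0, "protocol_type": "tcp"}
print(classify_dos(example))   # neptune
```

Because each rule ends in a named intrusion, the result can be used as a key into a table of responses, which matches the role the rules play as the basis for response control.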
System architecture diagram. Recoverable component labels: source host, Clusterer, Rules generator, Signature Knowledge Base, Response Controller, target host.
Conclusion
Conclusion and Future Works
The outcome of our experiment indicates that the proposed AIR-IDS is feasible:
clustering techniques, artificial neural networks and decision
trees are a suitable combination for improving intrusion detection in an
increasingly connected world where networks experience more attacks
daily.
As to the response, the rules themselves are not the response to intrusions
but the basis for determining the appropriate response to apply on the
network to combat the activities of hackers, intruders or criminals. In future, we
will:
- experiment with more algorithms and ensemble them, if need be, based on
their advantages on the KDD dataset and some domain-specific real-life datasets,
and conclude on the appropriate way to optimize the algorithms used;
- develop a knowledge base of responses and signatures;
- develop and implement a system that can detect, prevent and report
intrusions in a network environment.
References
Abadeh, M.S., Habibi, J. & Lucas, C., 2005. Intrusion Detection Using a Fuzzy Genetic-Based Learning Algorithm, Journal
of Network and Computer Application, 30: 414-428.
Aldous, D., 1991. The continuum random tree. I, The Annals of Probability, pp. 1-28.
Bace, R.G., 2001. Intrusion Detection, Technical Publishing (ISBN 1-57870-185-6).
Beghdad, R., 2008. Critical Study of Neural Networks in Detecting Intrusions, Journal of Computers and Security, 27: 168-175.
Breiman, L., 2001. Random Forests, Machine Learning, vol. 45, no. 1, pp. 5-32.
Chang, C. & Lin, C., 2001. LIBSVM: a library for support vector machines. Available at
http://www.csie.ntu.edu.tw/cjlin/libsvm. [Accessed 28th Nov., 2009].
Denning, D.E., 1987. An Intrusion Detection Model, IEEE Transactions on Software Engineering, SE-13: 222-232.
Eskin, E., 2000. Anomaly detection over noisy data using learned probability distributions. In Proc. 17th Int. Conf. Machine
Learning, pages 255-262, San Francisco, CA.
Gomez, J. & Dasgupta, D., 2002. Evolving Fuzzy Classifiers for Intrusion Detection, The Proceedings of the IEEE
Workshop on Information Assurance, pp. 68-75.
Guan, Y., Ghorbani, A.A. & Belacel, N., 2003. Y-means: a clustering method for intrusion detection. In Canadian
Conference on Electrical and Computer Engineering, pages 1-4.
Hawrylkiw, D., 2010. Intrusion Detection FAQ: Network Intrusion and use of Automated response, SANS Institute.
Available at http://www.sans.org/resources/idfaq/auto_res.php. [Accessed 5 October, 2010].
Heady, R., Luger, G., Maccabe, A. & Servilla, M., 1990. The architecture of a network level intrusion detection system,
Technical report, Computer Science Department, University of New Mexico.
Hofmann, A., Schmitz, A. & Sick, B., 2003. Intrusion Detection in Computer Networks with Neural and Fuzzy Classifiers,
ICANN/ICONIP 2003, 2714(2003), p. 174.
Javitz, H.S. & Valdes, S., 1993. The NIDES Statistical Component: Description and Justification, Technical Report,
Computer Science Laboratory, SRI International.
John, G. & Langley, P., 1995. Estimating continuous distributions in Bayesian classifiers. In Proceedings of the Eleventh
Conference on Uncertainty in Artificial Intelligence, pp. 338-345.
Lee, W. & Stolfo, S.J., 1998. Data Mining Approaches for Intrusion Detection, Proceedings of the 7th USENIX Security
Symposium, San Antonio, Texas, January 26-29, 1998.
Nong, Y. & Xiangyang, L., 2001. A scalable clustering technique for intrusion signature recognition. In Proc. 2nd IEEE
SMC Information Assurance Workshop, pages 1-4.
Pfahringer, B., 2000. Winning the KDD 99 Classification Cup: Bagged Boosting, Journal of SIGKDD Explorations, 1, pp.
65-66.
Portnoy, L., Eskin E., & Stolfo S., 2001. Intrusion detection with unlabeled data using clustering, In ACM Workshop on
Data Mining Applied to Security, Philadelphia, PA, November 2001
Quinlan, J., 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann.
Kohavi, R., 1996. Scaling up the accuracy of naive-Bayes classifiers: A decision-tree hybrid. In Proceedings of the
Second International Conference on Knowledge Discovery and Data Mining, vol. 7.
Ruck, D., Rogers, S., Kabrisky, M., Oxley, M. & Suter, B., 1990. The multilayer perceptron as an approximation to a Bayes
optimal discriminant function, IEEE Transactions on Neural Networks, vol. 1, no. 4, pp. 296-298.
Siraj, M.M., Maarof, M.A. & Hashim, Z. M., 2009. "Intelligent Clustering with PCA and Unsupervised Learning Algorithm in
Intrusion Alert Correlation," ias, vol. 1, pp.679-682, 2009 Fifth International Conference on Information Assurance and
Security.
Song, D., Heywood, M.I. & Zincir-Heywood, A.N. 2005. Training Genetic Programming on Half a Million Patterns: An
Example from Anomaly Detection, IEEE Transactions on Evolutionary Computation, 9: 225-239.
Successful Real-Time Security Monitoring, Riptech Inc. white paper, September 2001
Tavallaee, M. Bagheri, E., Lu, W., & Ghorbani, A.A. 2009. A Detailed Analysis of the KDD CUP 99 Data Set, Proceeding of
2009 IEEE Symposium on Computational Intelligence in Security and Defence Application.
Waikato environment for knowledge analysis (weka) version 3.6.2. Available at: http://www.cs.waikato.ac.nz/ml/weka/,
[Accessed 24 April., 2008].
Yeung, D.Y. & Ding, Y., 2003. Host-Based Intrusion Detection Using Dynamic and Static Behavioral Models, Journal of
Pattern Recognition, 36: 229-243.
Zhong, S., Khoshgoftaar, T.M. & Seliya, N., 2004. Evaluating clustering techniques for network intrusion detection. In 10th
ISSAT Int. Conf. on Reliability and Quality Design, pp. 173-177, Las Vegas, Nevada, USA, August 2004.
Thank You
Framework for An Intelligent Rule-Based
Network Intrusion Detection System