Intrusion Detection Systems Using Decision Trees and Support Vector Machines

Sandhya Peddabachigari, Ajith Abraham*, Johnson Thomas

Department of Computer Science, Oklahoma State University, USA
Abstract
Security of computers and the networks that connect them is of increasing importance. Intrusion detection is one mechanism for providing security to computer networks. Although several intrusion detection mechanisms already exist, their performance needs to be improved. Data mining techniques are a new approach to intrusion detection. In this paper we investigate and evaluate decision tree data mining techniques as an intrusion detection mechanism and compare them with Support Vector Machines (SVM). Intrusion detection with decision trees and SVM was tested on the benchmark 1998 DARPA Intrusion Detection data set. Our research shows that decision trees give better overall performance than SVM.
1. Introduction
Attacks on the nation's computer infrastructure are becoming an increasingly serious problem. Computer security is defined as the protection of computing systems against threats to confidentiality, integrity, and availability [Sum97]. Confidentiality (or secrecy) means that information is disclosed only according to policy; integrity means that information is not destroyed or corrupted and that the system performs correctly; availability means that system services are available when they are needed. Computing system refers to computers, computer networks, and the information they handle. Security threats come from different sources such as natural forces (such as floods), accidents (such as fire), failure of services (such as power) and people known as intruders. There are two types of intruders: external intruders, who are unauthorized users of the machines they attack, and internal intruders, who have permission to access the system with some restrictions. Traditional prevention techniques such as user authentication, data encryption, avoiding programming errors and firewalls are used as the first line of defense for computer security. If a password is weak and is compromised, user authentication cannot prevent unauthorized use. Firewalls are vulnerable to configuration errors and to ambiguous or undefined security policies, and they are generally unable to protect against malicious mobile code, insider attacks and unsecured modems. Programming errors cannot be fully avoided, because system and application software is growing rapidly in complexity and leaves behind exploitable weaknesses. Intrusion detection is therefore required as an additional wall for protecting systems. Intrusion detection is useful not only for detecting successful intrusions but also for providing important information for timely countermeasures.
Corresponding author email: ajith.abraham@ieee.org
An intrusion is defined [Hlm90] as any set of actions that attempt to compromise the integrity, confidentiality or availability of a resource. This includes a deliberate unauthorized attempt to access information, manipulate information, or render a system unreliable or unusable. An attacker can gain illegal access to a system by fooling an authorized user into providing information that can be used to break into it. An attacker can deliver to a user a piece of software that is actually a Trojan horse containing malicious code that gives the attacker system access. Bugs in trusted programs can be exploited by an attacker to gain unauthorized access to a computer system. Some legitimate actions, when taken to the extreme, can lead to system failure. An attacker can gain access because of an error in the configuration of a system. In some cases it is possible to fool a system into giving access by misrepresenting oneself; an example is sending a TCP packet with a forged source address that makes the packet appear to come from a trusted host. Intrusions are classified [Sun96] into six types:
1. Attempted break-ins, which are detected by atypical behavior profiles or violations of security constraints.
2. Masquerade attacks, which are detected by atypical behavior profiles or violations of security constraints.
3. Penetration of the security control system, which is detected by monitoring for specific patterns of activity.
4. Leakage, which is detected by atypical use of system resources.
5. Denial of service, which is detected by atypical use of system resources.
6. Malicious use, which is detected by atypical behavior profiles, violations of security constraints, or use of special privileges.
1.1 Intrusion Detection
The process of monitoring the events occurring in a computer system or network and analyzing them for signs of intrusion is known as intrusion detection.
Intrusion detection is classified into two types: misuse intrusion detection and anomaly intrusion detection.
1. Misuse intrusion detection uses well-defined patterns of attacks that exploit weaknesses in system and application software to identify intrusions. These patterns are encoded in advance and matched against user behavior to detect intrusions.
2. Anomaly intrusion detection uses normal usage behavior patterns to identify intrusions. The normal usage patterns are constructed from statistical measures of system features, for example the CPU and I/O activity of a particular user or program. The behavior of the user is observed, and any deviation from the constructed normal behavior is detected as an intrusion.
There are two options for securing a system completely: either prevent the threats and vulnerabilities that come from flaws in the operating system and application programs, or detect them, take action to prevent them in the future, and repair the damage. In practice it is extremely difficult and expensive, if not impossible, to write a completely secure system, and moving the entire world to such a system would be an equally difficult task. Cryptographic methods can be compromised if passwords and keys are stolen. No matter how secure a system is, it is vulnerable to insiders who abuse their privileges. There is also an inverse relationship between the level of access control and efficiency: more access controls make a system less user-friendly and less likely to be used. An intrusion detection system is a program (or set of programs) that analyzes what happens or has happened during an execution and tries to find indications that the computer has been misused. An intrusion detection system does not eliminate the need for preventive mechanisms; it works as the last line of defense in securing the system.
Data mining approaches are a relatively new technique for intrusion detection. There is a wide variety of data mining algorithms drawn from the fields of statistics, pattern recognition, machine learning, and databases. Previous research on data mining approaches for intrusion detection identified several types of algorithms as useful techniques. Classification is one of the data mining techniques that have been investigated as useful for intrusion detection models. In this paper we investigate the decision tree as an intrusion detection model. Comparing the decision tree model with existing models shows its advantages and drawbacks; here we compare decision trees with support vector machines. We investigate and evaluate intelligent systems as systems for intrusion detection. Our specific objectives are to:
1. investigate and test the decision tree as an intrusion detection model;
2. compare and evaluate its performance with that of a support vector machine intrusion detection model.
2. Literature Review
James Anderson [And80] first proposed that audit trails should be used to monitor threats. At that time, the available system security procedures focused on denying unauthorized sources access to sensitive data. Dorothy Denning [Den87] first proposed the concept of intrusion detection as a solution to the problem of providing a sense of security in computer systems. The basic idea is that intrusion behavior involves abnormal usage of the system. The model is a rule-based pattern matching system: models of normal system usage are constructed and verified against actual usage, and any significant deviation from normal usage is flagged as abnormal. This model served as an abstract model for further developments in the field, is known as the generic intrusion detection model, and is depicted in Figure 1 [Kum95].
Figure 1: A Generic Intrusion Detection Model (figure not reproduced; its labeled elements are audit trail/network packets/application trails, event generator, activity profile, rule set and clock, with the operations update profiles, generate new profiles dynamically, generate anomaly records, assert new rules and modify existing rules).
Statistical approaches compare the recent behavior of a user of a computer system with previously observed behavior, and any significant deviation is considered an intrusion. This approach requires the construction of a model of normal user behavior. IDES (Intrusion Detection Expert System) [Lun90] first exploited the statistical approach for detecting intruders. It uses the intrusion detection model proposed by Denning [Den87] and audit trail data as suggested by Anderson [And80]. IDES maintains profiles, each of which describes a subject's normal behavior with respect to a set of intrusion detection measures. Profiles are updated periodically, allowing the system to learn new behavior as users alter their behavior. Current user behavior is compared with these profiles, and significant deviations are reported as intrusions. IDES also uses the expert system concept to detect misuse intrusions. The system was later developed into NIDES (Next-generation Intrusion Detection Expert System) [Lun93]. The advantage of this approach is that it adaptively learns the behavior of users and is thus potentially more sensitive than human experts. The system also has several disadvantages. An intruder can gradually train the system so that abnormal behavior comes to be treated as normal, leaving the intrusion undetected. Determining the threshold above which an intrusion should be detected is difficult: setting the threshold too low results in false positives (normal behavior detected as an intrusion), and setting it too high results in false negatives (an intrusion left undetected). Attacks that depend on the order of events cannot be detected, because statistical analysis is insensitive to the order of events. Predictive pattern generation uses a rule base of user profiles defined as statistically weighted event sequences [Tcl90]. This method of intrusion detection attempts to predict future events based on events that have already occurred.
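The profile-and-threshold idea behind the statistical approach can be sketched as follows. This is a minimal illustration, not the IDES algorithm: it assumes a single numeric measure per subject (for example, hypothetical CPU seconds per session), keeps a running mean and standard deviation, and flags a new value whose distance from the mean exceeds a threshold expressed in standard deviations. The names `Profile` and `is_anomalous` and the default threshold of 3.0 are our own assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Profile:
    """Running mean and variance of one activity measure for one subject."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean (Welford)

    def update(self, x: float) -> None:
        # Folding new observations in lets the profile follow gradual changes in
        # behavior, which is also how an intruder could slowly "train" it.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def deviation(self, x: float) -> float:
        """Distance of x from the profile mean, in standard deviations."""
        s = self.std()
        if s == 0.0:
            return 0.0 if x == self.mean else math.inf
        return abs(x - self.mean) / s


def is_anomalous(profile: Profile, x: float, threshold: float = 3.0) -> bool:
    # Lower threshold: more false positives; higher threshold: more false negatives.
    return profile.deviation(x) > threshold


p = Profile()
for cpu_seconds in [10, 12, 11, 9, 10, 13, 11]:  # hypothetical per-session values
    p.update(cpu_seconds)
print(is_anomalous(p, 11), is_anomalous(p, 40))  # False True
```

Raising `threshold` until the value 40 is no longer flagged shows the false-negative side of the trade-off described above.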
This system develops sequential rules of the form E1 E2 E3 → (E4 = 94%; E5 = 6%), where the various Es are events derived from the security audit trail and the percentages on the right-hand side of the rule represent the probability of occurrence of each consequent event given the occurrence of the antecedent sequence. Thus, for the observed sequence E1 followed by E2 followed by E3, the probability of event E4 occurring next is 94% and that of E5 is 6%. The rules are generated inductively with an information-theoretic algorithm that measures the applicability of rules in terms of coverage and predictive power. An intrusion is detected if the observed sequence of events matches the left-hand side of a rule but the following events deviate significantly from its right-hand side. The main advantages of this approach are its ability to detect and respond quickly to anomalous behavior and the relative ease of detecting users who try to train the system during its learning period. The main problem is its inability to detect intrusions whose particular sequence of events has not been recognized and turned into rules.
The state transition analysis approach constructs a graphical representation of intrusion behavior as a series of state changes leading from an initial secure state to a target compromised state. Using the audit trail as input, an analysis tool can compare the state changes produced by the user with state transition diagrams of known penetrations. State transition diagrams form the basis of a rule-based expert system for detecting penetrations, called the State Transition Analysis Tool (STAT) [Por92]. The STAT prototype is implemented as USTAT (Unix State Transition Analysis Tool) [Ilg92] on UNIX-based systems. The main advantage of the method is that it detects intrusions independently of the specific audit trail records: its rules are derived from the effects that sequences of audit records have on the system state, whereas rule-based methods use the sequences of audit records themselves. It is also able to detect cooperative attacks, variations of known attacks and attacks spanning multiple user sessions. Its disadvantages are that it can construct patterns only from sequences of events, not from more complex forms, and that some attacks cannot be detected because they cannot be modeled with state transitions.
Keystroke monitoring uses a user's keystrokes to determine intrusion attempts. The main approach is to pattern-match the sequence of keystrokes against predefined sequences to detect intrusions. The main problems with this approach are the lack of operating system support for capturing keystroke sequences and the many ways of expressing the same attack as a sequence of keystrokes. Some shells, such as bash and ksh, support user-definable aliases, which make intrusion attempts difficult to detect with this technique unless some semantic analysis of the commands is used. Automated attacks by malicious executables cannot be detected by this technique, as it only analyzes keystrokes.
IDES [Lun90] used expert system methods for misuse intrusion detection and statistical methods for anomaly detection. The IDES expert system component evaluates audit records as they are produced. The audit records are viewed as facts, which map to rules in the rule base. Firing a rule increases the suspicion rating of the user corresponding to that record. Each user's suspicion rating starts at zero and is increased with each suspicious record; once the suspicion rating surpasses a predefined threshold, an intrusion is detected. The expert system method has some disadvantages. An intrusion scenario that does not trigger a rule will not be detected by the rule-based approach.
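The suspicion-rating mechanism of the IDES expert system component can be sketched as a loop over audit records. This is a simplified illustration under stated assumptions, not the IDES rule base: the record fields (`user`, `event`, `status`, `path`), the two rules and their weights are hypothetical, and each rule is reduced to a predicate over a single record plus a fixed increment.

```python
from typing import Callable, Dict, Iterable, List, Tuple

AuditRecord = Dict[str, str]
# A rule is (name, predicate over a single audit record, suspicion increment).
Rule = Tuple[str, Callable[[AuditRecord], bool], int]


def detect(records: Iterable[AuditRecord], rules: List[Rule], threshold: int) -> List[str]:
    """Return users whose suspicion rating surpasses the threshold, in detection order."""
    rating: Dict[str, int] = {}
    flagged: List[str] = []
    for rec in records:
        user = rec["user"]
        rating.setdefault(user, 0)  # each user's rating starts at zero
        for _name, fires, weight in rules:
            if fires(rec):  # the record, viewed as a fact, matches this rule
                rating[user] += weight
        if rating[user] > threshold and user not in flagged:
            flagged.append(user)
    return flagged


# Hypothetical rules and record fields, for illustration only.
rules: List[Rule] = [
    ("failed_login", lambda r: r.get("event") == "login" and r.get("status") == "fail", 1),
    ("shadow_read", lambda r: r.get("event") == "open" and r.get("path") == "/etc/shadow", 3),
]
records = [{"user": "alice", "event": "login", "status": "fail"}] * 3 + [
    {"user": "bob", "event": "open", "path": "/tmp/notes"},
]
print(detect(records, rules, threshold=2))  # ['alice']
```

The record for `bob` fires no rule and so leaves his rating at zero, which is exactly the weakness noted above: a scenario that triggers no rule goes undetected.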
Maintaining and updating a complex rule-based system can be difficult. The rules in the expert system have to be formulated by a security professional, so the strength of the system depends on the ability of the security personnel.
The model-based approach attempts to model intrusions at a higher level of abstraction than audit trail records. This allows administrators to represent penetrations abstractly, shifting to the expert system the burden of determining which audit records are part of a suspect sequence. This differs from the rule-based expert system technique, which simply attempts to pattern-match audit records to expert rules. Garvey and Lunt's [Gl91] model-based approach consists of three parts: an anticipator, a planner and an interpreter. The anticipator generates the next set of behaviors to be verified in the audit trail based on the currently active models and passes these sets to the planner. The planner determines how the hypothesized behavior is reflected in the audit data and translates it into a system-dependent audit trail match. The interpreter then searches for this data in the audit trail. The system collects information in this way until a threshold is reached, and then signals an intrusion attempt. The advantage of this model is that it can predict the intruder's next move from the intrusion model, which can be used to take preventive measures, to decide what to look for next, and to verify the intrusion hypothesis. It also reduces the amount of data to be processed, because the planner and interpreter filter the data based on their knowledge of what to look for, which improves efficiency. The main drawback is that an intrusion can be detected only if its pattern occurs in the behavior the system is looking for.
The Pattern Matching approach [Ks94] encodes known intrusion signatures as patterns that are then matched against the audit data. Intrusion signatures are classified using structural interrelationships among the elements of the signatures. The patterned signatures are matched against the audit trails, and any matched pattern is detected as an intrusion. Intrusions can be understood and characterized in terms of the structure of events needed to detect them. A model of pattern matching is implemented using colored Petri nets in IDIOT [Ks95]: each intrusion signature is represented as a Petri net, and its start and final states define what constitutes a match. This system has several advantages. The system can be clearly separated into different parts, so different solutions can be substituted for each component without changing the overall structure of the system. Pattern specifications are declarative: an intrusion signature is specified by defining what needs to be matched rather than how it is matched. Declarative specification of intrusion patterns also enables them to be exchanged across operating systems with different audit trails. The approach has a few problems. Constructing patterns from attack scenarios is difficult and requires human expertise. Only attack scenarios that are known and have been constructed into patterns can be detected, which is the common problem of misuse detection.
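The start-state/final-state notion of signature matching can be sketched with a much simpler structure than a colored Petri net. In this illustrative sketch each signature is an ordered list of event types tracked per session as a small state machine, with unrelated events between steps ignored; it is not IDIOT's matcher. The signature name, event types and session identifiers are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Event = Tuple[str, str]  # (session id, event type)


def match_signatures(events: Sequence[Event],
                     signatures: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Report (session, signature) pairs whose steps occurred in order.

    Each signature is tracked per session as a tiny state machine: state i means
    the first i steps have been seen; state len(steps) is the final (matched)
    state. Unrelated events between steps are ignored.
    """
    state: Dict[Tuple[str, str], int] = {}
    matches: List[Tuple[str, str]] = []
    for session, etype in events:
        for name, steps in signatures.items():
            key = (session, name)
            i = state.get(key, 0)
            if i < len(steps) and etype == steps[i]:
                state[key] = i + 1
                if i + 1 == len(steps):  # reached the final state
                    matches.append(key)
    return matches


# Hypothetical signature: copy a shell, make it setuid, then execute the copy.
signatures = {"setuid_shell": ["copy_shell", "chmod_setuid", "exec_copy"]}
events = [
    ("s1", "copy_shell"), ("s2", "chmod_setuid"), ("s1", "ls"),
    ("s2", "copy_shell"), ("s1", "chmod_setuid"), ("s1", "exec_copy"),
]
print(match_signatures(events, signatures))  # [('s1', 'setuid_shell')]
```

Session `s2` performs two of the steps in the wrong order and is not reported, which illustrates both the precision of misuse detection and its blindness to patterns that were never encoded.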
3. Data Mining Techniques for Intrusion Detection

Data mining is a relatively new approach to intrusion detection. Data mining is defined [Gsr98] as the semi-automatic discovery of patterns, associations, changes, anomalies, rules, and statistically significant structures and events in data. There are many types of data mining algorithms, including classification, link analysis, clustering, association, rule abduction, deviation analysis, and sequence analysis. Using these algorithms, data mining extracts knowledge from large data sets and represents it in the intrusion detection model. This approach treats intrusion detection as a data analysis process, whereas the previous approaches were knowledge engineering processes.
Figure 2: Data mining process of building Intrusion detection models
Data mining approaches for intrusion detection were first implemented in Mining Audit Data for Automated Models for Intrusion Detection (MADAM ID) [Lsm98]. The data mining process of building intrusion detection models is depicted in Figure 2 [Lee99]. First, raw data is converted into ASCII network packet information, which in turn is converted into connection-level information. These connection-level records contain within-connection features such as service and duration. Data mining algorithms are applied to this data to create models that detect intrusions. The algorithms used in this approach are RIPPER (a rule-based classification algorithm), a meta-classifier, the frequent episode algorithm and association rules. These algorithms are applied to audit data to compute models that accurately capture the actual behavior of intrusions as well as normal activities. The RIPPER algorithm [Coh96] was used to learn the classification model that identifies normal and abnormal behavior. The frequent episode algorithm and association rules together are used to construct frequent patterns from audit data records. These frequent patterns are statistical summaries of network and system activity that measure the correlations among system features and the sequential co-occurrence of events. From the constructed frequent patterns, the consistent patterns of normal activity and the unique intrusion patterns are identified, analyzed, and then used to construct additional features, which make learning the detection model more efficient. The RIPPER classification algorithm is then used to learn the detection model. A meta-classifier learns the correlation of intrusion evidence from multiple detection models and produces a combined detection model.
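The frequent-pattern step that both MADAM ID and ADAM rely on starts from counting how often combinations of feature values co-occur. The sketch below is a simplified stand-in, not the frequent episode or association rule algorithms used in those systems: it counts every item set of up to two `feature=value` items by brute force and keeps those whose support (fraction of records containing the set) reaches a threshold. The feature names and records are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List


def frequent_itemsets(records: List[Dict[str, str]], min_support: float,
                      max_size: int = 2) -> Dict[FrozenSet[str], float]:
    """Return item sets of "feature=value" items whose support is >= min_support.

    Support is the fraction of records containing every item of the set. All
    combinations up to max_size are counted by brute force; Apriori-style miners
    prune candidates instead.
    """
    counts: Counter = Counter()
    for rec in records:
        items = sorted(f"{k}={v}" for k, v in rec.items())
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    n = len(records)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


# Hypothetical connection records reduced to two categorical features.
conns = [{"service": "http", "flag": "SF"}] * 3 + [
    {"service": "smtp", "flag": "SF"},
    {"service": "http", "flag": "REJ"},
]
for itemset, support in frequent_itemsets(conns, min_support=0.5).items():
    print(sorted(itemset), support)
```

In an ADAM-like setting, item sets mined this way from attack-free traffic would form the normal profile, and frequent item sets found later that are absent from it would be passed on as suspicious.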
The main advantage of this system is the automation of data analysis through data mining, which lets it learn rules inductively instead of relying on manually encoded intrusion patterns. However, some novel attacks may not be detected.
Audit Data Analysis and Mining (ADAM) [Bcj01] combines association rules and a classification algorithm to discover attacks in audit data. Association rules are used to gather knowledge about the nature of the audit data, since information about patterns within individual records can improve classification efficiency. The system has two phases, a training phase and a detection phase. In the training phase, a database of frequent item sets is created from attack-free data only; it serves as a profile against which frequent item sets found later are compared. Next, a sliding-window on-line algorithm finds frequent item sets in the last D connections and compares them with those stored in the attack-free database, discarding those deemed normal. In this phase a classifier is also trained to learn the model for detecting attacks. In the detection phase, a dynamic algorithm produces item sets that are considered suspicious, and the previously trained classifier labels each of them as an attack, a false alarm (normal event) or unknown. Unknown items are those that can be classified neither as false alarms nor as known attacks. This method attempts to detect only anomaly attacks.
3.1 Neural Networks
Neural networks have been used in both anomaly intrusion detection and misuse intrusion detection. For anomaly intrusion detection, neural networks were modeled to learn the typical characteristics of system users and identify statistically significant variations from a user's established behavior. In misuse intrusion detection, the neural network receives data from the network stream and analyzes the information for instances of misuse.
In the first application of neural networks to intrusion detection [Dbs92], the system learns to predict the next command based on a sequence of previous commands by a user, using a shifting window of the w most recent commands. The predicted command is compared with the user's actual command, and any deviation is signaled as an intrusion. If w is too small there will be many false positives, and if it is too large some attacks may not be detected. NNID (Neural Network Intrusion Detector) [Rlm98] identifies users based on the distribution of the commands they use. The system has three phases. In the first phase it collects training data from the audit logs of each user over some period and constructs a vector from that data representing each user's command execution. In the second phase, a neural network is trained to identify the user from these command distribution vectors. In the final phase, the network identifies the user for each new command distribution vector; if the user identified by the network differs from the actual user, an anomaly intrusion is signaled. A neural network for misuse detection is implemented [Can98] in two ways. The first approach incorporates the neural network component into an existing or modified expert system: the neural network filters the incoming data for suspicious events and forwards them to the expert system, which improves the effectiveness of the detection system. The second approach uses the neural network as a stand-alone misuse detection system: the neural network receives data from the network stream and analyzes it for misuse intrusions. This approach has several advantages. It can learn the characteristics of misuse attacks and identify instances that are unlike any the network has observed before.
It has a high degree of accuracy in recognizing known suspicious events. Neural networks work well on noisy data, and their inherent speed is helpful in real-time intrusion detection. The approach also has problems, the main one being the training of the neural networks: the training phase requires a very large amount of data.
3.2 Support Vector Machines
Support Vector Machines [Mjs02] have been proposed as a novel technique for intrusion detection. A Support Vector Machine (SVM) maps real-valued input feature vectors into a higher-dimensional feature space through a nonlinear mapping. SVMs are powerful tools for solving classification, regression and density estimation problems. They are developed on the principle of structural risk minimization [Vla95], which seeks a hypothesis h for which the lowest probability of error can be guaranteed. For classification, structural risk minimization is achieved by finding the hyperplane that separates the data with the maximum margin. Computing this hyperplane, i.e. training an SVM, leads to a quadratic optimization problem [Vla95], [Joa98]. To handle data that are not linearly separable, SVMs use a kernel function, which transforms linear algorithms into nonlinear ones via a map into a feature space. There are many kernel functions, including polynomial, radial basis function and two-layer sigmoid neural network kernels. The user chooses one of these functions when training the classifier, and training selects support vectors along the surface of this function. SVMs classify data using these support vectors, which are the members of the set of training inputs that outline a hyperplane in feature space. The SVM intrusion detection system is implemented in two phases: training and testing.
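The training and testing phases of a maximum-margin classifier can be sketched for the linear (no kernel) case. Rather than solving the quadratic program directly, this sketch uses the Pegasos stochastic sub-gradient method, a different but standard way to train a linear soft-margin SVM; it is an illustration, not the SVM package or kernel used in these experiments. The regularization constant `lam`, the epoch count and the toy two-feature data are assumptions.

```python
import random
from typing import List, Sequence


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200, seed: int = 0) -> List[float]:
    """Pegasos stochastic sub-gradient training of a linear soft-margin SVM.

    Labels must be +1/-1. A constant 1.0 feature is appended so the last
    weight acts as a (regularized) bias term.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for _ in range(len(data)):
            t += 1
            x, label = rng.choice(data)
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Regularization step: shrink w toward zero (favors a large margin).
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Hinge-loss step: only examples inside the margin move w.
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0.0 else -1


# Toy training data: +1 could stand for "attack", -1 for "normal" (hypothetical).
X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)                 # training phase
print([predict(w, x) for x in X])          # testing phase: [1, 1, 1, -1, -1, -1]
```

Because each trained model separates two classes, a five-class problem such as the one in this paper would typically need several of them, for example one classifier per class against the rest.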
The main advantage of this method is the speed of SVMs, since the capability of detecting intrusions in real time is very important. SVMs can learn a larger set of patterns and scale better, because the classification complexity does not depend on the dimensionality of the feature space. SVMs can also update the training patterns dynamically whenever a new pattern appears during classification. The main disadvantage is that an SVM handles only binary classification, whereas intrusion detection requires multi-class classification.
3.3 Decision Trees
Decision tree induction is one of the classification algorithms in data mining. A classification algorithm inductively learns a model from a pre-classified data set. Each data item is defined by the values of its attributes, and classification may be viewed as a mapping from a set of attributes to a particular class. The decision tree classifies a given data item using the values of its attributes. The tree is initially constructed from a set of pre-classified data. The main approach is to select the attribute that best divides the data items into their classes and to partition the data items according to the values of this attribute. This process is applied recursively to each partitioned subset and terminates when all the data items in the current subset belong to the same class. A node of a decision tree specifies an attribute by which the data is to be partitioned. Each node has a number of edges, each labeled with a possible value of the attribute in the parent node. An edge connects either two nodes or a node and a leaf. Leaves are labeled with a decision value for categorizing the data. Induction of the decision tree uses training data described in terms of the attributes. The main problem is deciding which attribute best partitions the data into the various classes.
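The recursive partitioning procedure can be sketched as a small ID3-style learner that picks each split with the entropy-based information gain criterion ID3 uses, which is explained next. This is a minimal sketch, not C4.5: it handles only categorical attributes and has no pruning, continuous-attribute or missing-value handling. The attribute names, values and class labels are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Union

Row = Dict[str, str]
Tree = Union[str, dict]  # a leaf is a class label; an inner node is a dict


def entropy(labels: List[str]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows: List[Row], labels: List[str], attr: str) -> float:
    # Entropy of the whole set minus the weighted entropy of the partitions.
    n = len(rows)
    parts: Dict[str, List[str]] = {}
    for row, label in zip(rows, labels):
        parts.setdefault(row[attr], []).append(label)
    remainder = sum(len(p) / n * entropy(p) for p in parts.values())
    return entropy(labels) - remainder


def build_tree(rows: List[Row], labels: List[str], attrs: List[str]) -> Tree:
    if len(set(labels)) == 1:  # all items in one class: make a leaf
        return labels[0]
    if not attrs:  # no attribute left: majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    node = {"attr": best, "branches": {}}
    rest = [a for a in attrs if a != best]
    for value in {row[best] for row in rows}:  # one edge per attribute value
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["branches"][value] = build_tree([rows[i] for i in idx],
                                             [labels[i] for i in idx], rest)
    return node


def classify(tree: Tree, row: Row, default: str = "unknown") -> str:
    # Follow the branch for each test outcome until a leaf is reached.
    while isinstance(tree, dict):
        tree = tree["branches"].get(row.get(tree["attr"]))
        if tree is None:  # value never seen during training
            return default
    return tree


rows = [{"service": "http", "flag": "SF"}, {"service": "http", "flag": "S0"},
        {"service": "ftp", "flag": "SF"}, {"service": "ftp", "flag": "S0"}]
labels = ["normal", "dos", "normal", "dos"]
tree = build_tree(rows, labels, ["service", "flag"])
print(classify(tree, {"service": "smtp", "flag": "S0"}))  # dos
```

Here `flag` separates the classes perfectly (gain 1 bit) while `service` carries no information (gain 0), so the learned tree tests only `flag`, and it still classifies a connection to an unseen service.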
The ID3 algorithm [Qun86] uses an information-theoretic approach to solve this problem. Information theory uses the concept of entropy, which measures the impurity of a set of data items. Entropy is small when the class distribution is uneven, that is, when all the data items belong to one class, and larger when the class distribution is more even, that is, when the data items are spread across more classes. Information gain measures the utility of each attribute in classifying the data items and is computed from entropy: it is the decrease from the entropy of the complete set of data items to the weighted average entropy of the subsets produced by partitioning on that attribute. The attribute with the largest information gain is therefore considered the most useful for classifying the data items. To classify an unknown object, one starts at the root of the decision tree and follows the branch indicated by the outcome of each test until a leaf node is reached; the class at that leaf is the resulting classification. Decision tree induction has been implemented with several algorithms, including ID3 [Qun86], its later extensions C4.5 [Qun93] and C5.0, and CART [Bre84]. Of particular interest to this work is the C4.5 decision tree algorithm. C4.5 avoids overfitting the data by determining how deeply to grow the decision tree, handles continuous attributes, can choose an appropriate attribute selection measure, handles training data with missing attribute values, and improves computational efficiency. C4.5 builds the tree from a set of data items by using the best attribute to divide the data items into subsets, and then applies the same procedure to each subset recursively. The best attribute for dividing the subset at each stage is selected using the information gain of the attributes.
3.3.1 Decision Trees as Intrusion Detection Model
Intrusion detection can be considered a classification problem in which each connection or user is identified, based on existing data, either as one of the attack types or as normal. Decision trees can solve this classification problem because they learn a model from the data set and can classify new data items into one of the classes specified in the data set. Decision trees can be used for misuse intrusion detection, as they learn a model from the training data and can classify future data as one of the attack types or as normal based on the learned model. Decision trees work well with large data sets, which is important given the large amounts of data that flow across computer networks. Their high performance makes them useful in real-time intrusion detection. Decision trees construct easily interpretable models, which a security officer can inspect and edit. These models can also be used in rule-based models with minimal processing. Generalization accuracy is another useful property of decision trees for intrusion detection: after the intrusion detection models are built, there will always be new attacks on the system that are small variations of known attacks, and the generalization accuracy of decision trees makes it possible to detect these new intrusions.
4. Experimentation Setup and Performance Evaluation
4.1 Intrusion Data
The KDD Cup 1999 intrusion detection contest data [KDD99] is used in our experiments. This data was prepared by the 1998 DARPA Intrusion Detection Evaluation program by MIT Lincoln Labs [MIT], which acquired nine weeks of raw TCP dump data. The raw data was processed into about five million connection records. The data set contains 24 attack types.
All these attacks fall into four main categories.
1. Denial of Service (DOS): In this type of attack, an attacker makes some computing or memory resource too busy or too full to handle legitimate requests, or denies legitimate users access to a machine. Examples are Apache2, Back, Land, Mailbomb, SYN Flood, Ping of death, Process table, Smurf and Teardrop.
2. Remote to Local (R2L): In this type of attack, an attacker who does not have an account on a remote machine sends packets to that machine over a network and exploits some vulnerability to gain local access as a user of that machine. Examples are Dictionary, Ftp_write, Guest, Imap, Named, Phf, Sendmail and Xlock.
3. User to Root (U2R): In this type of attack, an attacker starts out with access to a normal user account on the system and exploits a vulnerability to gain root access to the system. Examples are Eject, Loadmodule, Ps, Xterm, Perl and Fdformat.
4. Probing: In this type of attack, an attacker scans a network of computers to gather information or find known vulnerabilities. An attacker with a map of the machines and services available on a network can use this information to look for exploits. Examples are Ipsweep, Mscan, Saint, Satan and Nmap.
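Grouping named attacks into the four categories above is a table lookup. The paper's own C++ preprocessing program is not shown, so the following is only a minimal sketch. It assumes that raw labels are normalized by lowercasing, stripping an optional trailing period and replacing spaces with underscores. It also lists only the example attacks named above, not the complete data set vocabulary. `ATTACK_CATEGORY` and `to_category` are hypothetical names.

```python
# Hypothetical lookup from attack name to one of the four categories.
# Only the example attacks named in the text are included.
ATTACK_CATEGORY = {
    **{name: "DOS" for name in (
        "apache2", "back", "land", "mailbomb", "syn_flood",
        "ping_of_death", "process_table", "smurf", "teardrop")},
    **{name: "R2L" for name in (
        "dictionary", "ftp_write", "guest", "imap", "named",
        "phf", "sendmail", "xlock")},
    **{name: "U2R" for name in (
        "eject", "loadmodule", "ps", "xterm", "perl", "fdformat")},
    **{name: "Probe" for name in (
        "ipsweep", "mscan", "saint", "satan", "nmap")},
}


def normalize(label: str) -> str:
    """Assumed normalization: trim, drop a trailing '.', lowercase, spaces -> '_'."""
    return label.strip().rstrip(".").lower().replace(" ", "_")


def to_category(label: str) -> str:
    """Map a raw connection label to 'Normal' or one of the four attack categories."""
    key = normalize(label)
    if key == "normal":
        return "Normal"
    try:
        return ATTACK_CATEGORY[key]
    except KeyError:
        raise ValueError(f"unknown attack label: {label!r}") from None


if __name__ == "__main__":
    for raw in ("Smurf", "SYN Flood", "phf.", "normal", "Nmap"):
        print(raw, "->", to_category(raw))
```

Unknown labels raise an error instead of being silently assigned to a class. A silent assignment could distort the class sizes that the sampling step depends on.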
The original data contains 744 MB of data with 4,940,000 records. The data set has 41 attributes for each connection record plus one class label. Some of these are derived features that are useful in distinguishing normal connections from attacks. Some features examine only the connections in the past two seconds that have the same destination host as the current connection, and they calculate statistics related to protocol behavior, service, etc. These are called same host features. Other features examine only the connections in the past two seconds that have the same service as the current connection, and they are called same service features. Together, the same host and same service features are called the time-based traffic features of the connection records. Connection records were also sorted by destination host, and features were constructed using a window of 100 connections to the same host instead of a time window. These are called host-based traffic features. Unlike DOS and Probe attacks, R2L and U2R attacks do not show sequential patterns. R2L and U2R attacks are embedded in the data portions of packets, whereas DOS and Probe attacks involve many connections in a short amount of time. Therefore, features that look for suspicious behavior in the data portion, such as the number of failed logins, were constructed. These are called content features.

4.2 Experimentation Setup and Results Analysis

Our experiments have two phases, a training phase and a testing phase. In the training phase, the system constructs a model from the training data that aims for maximum generalization accuracy (accuracy on unseen data). In the testing phase, the test data is classified with the constructed model to detect intrusions. We wrote a C++ program that maps the 24 attack types to the four attack classes. The main purpose of the intrusion detection models is to classify each record as normal or as one of the four attack types.
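The time-based and host-based traffic features described above can be sketched as sliding-window counts. The following is a minimal illustration, not the KDD feature extraction code. It assumes that each connection is reduced to a start time in seconds, a destination host and a service (a hypothetical `Conn` type) and that connections arrive sorted by time. For each connection, it counts three things: earlier connections within the past two seconds to the same host, earlier connections within the past two seconds to the same service, and same-host connections among the previous 100 connections. The real feature set computes many more statistics over these windows, such as error rates.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Conn:
    """Hypothetical minimal connection record."""
    time: float      # start time in seconds
    dst_host: str
    service: str


def traffic_features(conns, time_window=2.0, host_window=100):
    """For time-sorted connections, return (same_host, same_srv, host_based) per connection.

    same_host:  earlier connections at most `time_window` s older, same destination host
    same_srv:   earlier connections at most `time_window` s older, same service
    host_based: connections to the same host among the previous `host_window` connections
    """
    recent = deque()                     # connections still inside the time window
    last_n = deque(maxlen=host_window)   # the previous host_window connections
    features = []
    for c in conns:
        # Drop connections that are more than time_window seconds older.
        while recent and c.time - recent[0].time > time_window:
            recent.popleft()
        same_host = sum(1 for p in recent if p.dst_host == c.dst_host)
        same_srv = sum(1 for p in recent if p.service == c.service)
        host_based = sum(1 for p in last_n if p.dst_host == c.dst_host)
        features.append((same_host, same_srv, host_based))
        recent.append(c)
        last_n.append(c)
    return features


if __name__ == "__main__":
    conns = [Conn(0.0, "A", "http"), Conn(0.5, "A", "ftp"),
             Conn(1.0, "B", "http"), Conn(3.5, "A", "http")]
    print(traffic_features(conns))
```

The last connection in the usage example is more than two seconds after all the others, so its time-based counts are zero. Its host-based count still sees the two earlier connections to host A.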
The data set for our experiments contained 11,982 records randomly sampled from the full data set. This data set has five classes. The number of records sampled from each class is proportional to the class size, except that the smallest class is included completely. The data set was then divided into training data with 5,092 records and testing data with 6,890 records. All the intrusion detection models are trained and tested with the same data. As the data set has five different classes, we perform a 5-class classification. Normal data belongs to class 1, probe to class 2, denial of service (DOS) to class 3, user to root (U2R) to class 4 and remote to local (R2L) to class 5.

We used the WEKA (Waikato Environment for Knowledge Analysis) software for both the decision trees and SVM. WEKA accepts data in ARFF (Attribute-Relation File Format). An ARFF file is an ASCII text file that contains a list of instances sharing a set of attributes. Our ARFF file contains only nominal (categorical) and numeric values. The data is first saved in comma-separated (.CSV) format through a spreadsheet. The next step is to add the data set's name with the @relation tag, the attribute information with @attribute tags, and an @data line before the records. For nominal attributes, all possible values must be declared at the start of the file. We therefore wrote some C++ programs that preprocess the data to obtain the values of all the nominal attributes. Our data set had seven nominal attributes. We ran our experiments on an AMD Athlon 1.67 GHz processor with 992 MB of RAM.

4.2.1 Decision Tree

To compare decision tree performance with SVM, which is a binary classifier, we used binary decision tree classifiers, although decision trees are capable of handling a 5-class classification problem. We constructed five different classifiers. The data is partitioned into two classes, Normal and Attack, where Attack is the collection of the four attack classes (Probe, DOS, U2R and R2L). The objective is to separate normal and attack patterns. We repeat this process for all five classes. A classifier is first constructed using the training data, and the constructed classifier is then used to classify the testing data as normal or attack. Table 1 summarizes the results on the test data. It shows the training and testing times of the classifier in seconds for each of the five classes, together with the accuracy in percent.
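The decision trees in these experiments choose each split with the entropy-based information gain described earlier for ID3 and C4.5. The following is a minimal sketch of that computation for nominal attributes only. It uses a hypothetical toy data set and omits C4.5's gain ratio normalization, its continuous-attribute thresholds and its pruning, so it is not WEKA's implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a non-empty list of class labels."""
    n = len(labels)
    # Sum of p * log2(1/p) over the class proportions p = k / n.
    return sum((k / n) * math.log2(n / k) for k in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy of the whole set minus the weighted average entropy
    of the subsets obtained by splitting on attribute index `attr`."""
    n = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    weighted = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted


def best_attribute(rows, labels):
    """Index of the attribute with the largest information gain."""
    return max(range(len(rows[0])), key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy records: (protocol, flag) with a binary label.
rows = [("tcp", "SF"), ("tcp", "S0"), ("udp", "SF"), ("tcp", "S0")]
labels = ["normal", "attack", "normal", "attack"]
print(entropy(labels))                    # 1.0
print(information_gain(rows, labels, 0))  # about 0.311 (protocol)
print(information_gain(rows, labels, 1))  # 1.0 (flag separates the classes)
print(best_attribute(rows, labels))       # 1
```

In the toy data, the flag attribute splits the records into two pure subsets, so it has the largest gain and would be tested at the root.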
Class                 Normal  Probe   DOS     U2R     R2L
Training time (sec)   1.53    3.09    1.92    1.16    2.36
Testing time (sec)    0.03    0.02    0.03    0.03    0.03
Accuracy (%)          99.64   99.86   96.83   68.00   84.19
Table 1: Performance of Decision Tree

4.2.2 Support Vector Machines

As SVMs can handle only binary classification problems, we employ five SVMs for the 5-class intrusion detection problem. We divided the data into two classes, Normal and Attack, where Attack is the collection of the four attack classes (Probe, DOS, U2R and R2L). Each classifier is learned from the training data and then used to classify the test data into normal or attack patterns. This process is repeated for all classes. The results are summarized in Table 2. It shows the training and testing times in seconds for each of the five classes, together with the accuracy in percent.
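Sections 4.2.1 and 4.2.2 both train five binary classifiers. Read as one-vs-rest, each classifier takes one class in turn against the other four. The following is a minimal sketch of that relabeling. It assumes the five class names used in this paper, and the `train` callback is a hypothetical stand-in for the WEKA learner (a decision tree or a polynomial-kernel SVM).

```python
from typing import Callable, Dict, List, Sequence

CLASSES = ["Normal", "Probe", "DOS", "U2R", "R2L"]


def one_vs_rest_labels(labels: Sequence[str], positive: str) -> List[int]:
    """1 for records of the `positive` class, 0 for records of any other class."""
    return [1 if y == positive else 0 for y in labels]


def train_per_class(X, labels: Sequence[str], train: Callable) -> Dict[str, object]:
    """Train one binary model per class with a caller-supplied train(X, y) function."""
    return {c: train(X, one_vs_rest_labels(labels, c)) for c in CLASSES}


def binary_accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Percentage of records whose binary label was predicted correctly."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


if __name__ == "__main__":
    labels = ["Normal", "DOS", "Probe", "DOS"]
    print(one_vs_rest_labels(labels, "DOS"))            # [0, 1, 0, 1]
    # Stand-in "model": just the number of positive records for each class.
    print(train_per_class(None, labels, lambda X, y: sum(y)))
    print(binary_accuracy([0, 1, 1, 1], [0, 1, 0, 1]))  # 75.0
```

Passing the learner in as a function keeps the relabeling identical for the decision tree and SVM runs, which matches the requirement that both models see the same data.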
Class                 Normal  Probe   DOS     U2R     R2L
Training time (sec)   5.02    1.33    19.24   3.05    2.02
Testing time (sec)    0.13    0.13    2.11    0.95    0.13
Accuracy (%)          99.64   98.57   99.78   40.00   34.00
Table 2: Performance of the SVM

The kernel option defines the feature space in which the training set examples are classified. Both our trial-and-error experiments and a previous study [Aa02] showed that the polynomial kernel often performs well on most data sets, so we used the polynomial kernel in our experiments. We observed that different polynomial degrees give different performance for different classes of data, and the results are presented in Table 3. We therefore used a different polynomial degree for each class.
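A polynomial kernel has the general form K(x, y) = (gamma * <x, y> + coef0)^d, where d is the degree compared in Table 3. The following minimal sketch evaluates that formula directly. The `gamma` and `coef0` values are assumptions, because the paper reports only the degree. WEKA's own polynomial kernel may be parameterized differently.

```python
from typing import Sequence


def polynomial_kernel(x: Sequence[float], y: Sequence[float],
                      degree: int = 2, gamma: float = 1.0, coef0: float = 1.0) -> float:
    """K(x, y) = (gamma * <x, y> + coef0) ** degree."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


if __name__ == "__main__":
    x, y = [1.0, 2.0], [3.0, 0.5]
    for d in (1, 2, 3):  # the degrees compared in Table 3
        print(d, polynomial_kernel(x, y, degree=d))  # 5.0, 25.0, 125.0
```

With coef0 = 1, the degree-d kernel implicitly includes every monomial of the input features up to degree d. This is why changing the degree can change per-class accuracy so sharply.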
Polynomial degree   Normal  Probe   DOS     U2R     R2L
1                   99.64   98.57   70.99   40.00   33.92
2                   99.64   64.85   99.92   40.00   31.44
3                   99.64   61.72   99.78   40.00   28.06
Table 3: Classification accuracy of different polynomial kernel degrees

4.2.3 Comparison of Decision tree with SVM

To evaluate the performance of the decision tree intrusion detection model, we compared it with SVM in terms of accuracy and of training and testing times. Table 4 summarizes the results. The decision tree gives better accuracy than SVM for the Probe, R2L and U2R classes and worse accuracy for the DOS class. For the Normal class, both give the same accuracy. The accuracy differences between the decision tree and SVM are small for the Normal, Probe and DOS classes but significant for the U2R and R2L classes. These two classes have little training data compared with the other classes, so we conclude that the decision tree gives good accuracy with small training data sets. Testing times are also lower for the decision tree than for SVM for every class. Training times are lower for the Normal, DOS and U2R classes, while SVM trains faster on the Probe and R2L classes.
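The size of the accuracy differences discussed above can be checked directly against Table 4. The following short sketch copies the accuracies from that table and computes the decision tree minus SVM difference per class, in percentage points. It adds no new results.

```python
# Accuracy (%) per class, copied from Table 4.
DECISION_TREE = {"Normal": 99.64, "Probe": 99.86, "DOS": 96.83, "U2R": 68.00, "R2L": 84.19}
SVM = {"Normal": 99.64, "Probe": 98.57, "DOS": 99.78, "U2R": 40.00, "R2L": 34.00}


def accuracy_gaps(a, b):
    """Per-class difference a - b in percentage points, rounded to 2 decimals."""
    return {cls: round(a[cls] - b[cls], 2) for cls in a}


if __name__ == "__main__":
    for cls, gap in accuracy_gaps(DECISION_TREE, SVM).items():
        print(f"{cls:<6} {gap:+.2f}")
```

The script prints +0.00, +1.29, -2.95, +28.00 and +50.19 for Normal, Probe, DOS, U2R and R2L. These figures match the observation that the gaps are large only for U2R and R2L.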
         Decision Tree                             SVM
Class    Training (s)  Testing (s)  Accuracy (%)   Training (s)  Testing (s)  Accuracy (%)
Normal   1.53          0.03         99.64          5.02          0.13         99.64
Probe    3.09          0.02         99.86          1.33          0.13         98.57
DOS      1.92          0.03         96.83          19.24         2.11         99.78
U2R      1.16          0.03         68.00          3.05          0.95         40.00
R2L      2.36          0.03         84.19          2.02          0.13         34.00
Table 4: Performance Comparison of Decision tree with SVM

The graph in Figure 3 compares the accuracy of the decision tree and SVM on the R2L class. The R2L data contains 563 data points, and because it is difficult to show all of them in one graph, 30 data points were used. In the graph, a classification value of 1 represents a correct classification and a value of 2 represents a misclassification. The graph shows that SVM misclassified many more points than the decision tree, which classified most of them correctly. We conclude that the decision tree performs much better than SVM on R2L data.
Figure 3: Performance Comparison of Decision tree with SVM (classification value, where 1 = correct and 2 = misclassified, plotted against data points for the Actual labels, the decision tree and SVM)
5. Conclusions
In this research we have investigated some new techniques for intrusion detection and evaluated their performance on the benchmark KDD Cup 99 intrusion data. We first explored a decision tree as an intrusion detection model. We also conducted experiments with support vector machines (SVM) and compared the decision tree's performance with this model. As the decision tree was used as a binary classifier, we employed five classifiers for 5-class classification. The empirical results indicate that the decision tree gives better accuracy than SVM for the Probe, U2R and R2L classes. It gives the same accuracy for the Normal class and slightly worse accuracy than SVM for the DOS class. The U2R and R2L classes have little training data, and the decision tree performs better than SVM on both of them. This indicates that the decision tree works well with small training data. The results also show that the testing times of the decision tree classifiers are lower than those of SVM for every class, and their training times are lower for most classes.
Moreover, a decision tree is capable of multi-class classification, which a single SVM cannot do directly because it is a binary classifier. Multi-class classification is a very useful feature for intrusion detection models. With cyber attacks increasing, it is essential to build effective intrusion detection models with good accuracy and real-time performance. Data mining is a relatively new approach to intrusion detection. More data mining techniques should be investigated and evaluated as intrusion detection models. Future work will include a hybrid approach that combines different models to overcome their individual limitations and to improve performance through their complementary features. Ensemble approaches with various combinations of classifiers should also be investigated to improve performance.

References
[Aa02] A. B. M. S. Ali and A. Abraham. An Empirical Comparison of Kernel Selection for Support Vector Machines. 2nd International Conference on Hybrid Intelligent Systems: Design, Management and Applications, The Netherlands, 2002.
[And80] J. P. Anderson. Computer Security Threat Monitoring and Surveillance. Technical report, James P. Anderson Co., Fort Washington, Pennsylvania, April 1980.
[Bcj01] D. Barbara, J. Couto, S. Jajodia and N. Wu. ADAM: A Testbed for Exploring the Use of Data Mining in Intrusion Detection. SIGMOD Record, 30(4):15-24, 2001.
[Bre84] L. Breiman, J. Friedman, R. Olshen and C. Stone. Classification and Regression Trees. Wadsworth Inc., 1984.
[Can98] J. Cannady. Artificial Neural Networks for Misuse Detection. National Information Systems Security Conference, 1998.
[Coh96] W. Cohen. Learning Trees and Rules with Set-Valued Features. American Association for Artificial Intelligence (AAAI), 1996.
[Den87] D. E. Denning. An Intrusion Detection Model. IEEE Transactions on Software Engineering, pp. 222-228, February 1987.
[Dbs92] H. Debar, M. Becker and D. Siboni. A Neural Network Component for an Intrusion Detection System. Proceedings of the IEEE Computer Society Symposium on Research in Security and Privacy, 1992.
[Gl91] T. D. Garvey and T. F. Lunt. Model Based Intrusion Detection. Proceedings of the 14th National Computer Security Conference, pp. 372-385, October 1991.
[Gsr98] R. Grossman, S. Kasif, R. Moore, D. Rocke and J. Ullman. Data Mining Research: Opportunities and Challenges. A Report of Three NSF Workshops on Mining Large, Massive, and Distributed Data, January 1998.
[Hlm90] R. Heady, G. Luger, A. Maccabe and M. Servilla. The Architecture of a Network Level Intrusion Detection System. Technical report, Department of Computer Science, University of New Mexico, August 1990.
[Joa98] T. Joachims. Making Large-Scale SVM Learning Practical. LS-8 Report, University of Dortmund, 1998.
[Ilg92] K. Ilgun. USTAT: A Real-Time Intrusion Detection System for UNIX. Master's Thesis, University of California, Santa Barbara, November 1992.
[KDD99] KDD Cup 99 Intrusion Detection Data Set. <http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz>
[Ks94] S. Kumar and E. H. Spafford. An Application of Pattern Matching in Intrusion Detection. Technical Report CSD-TR-94-013, Purdue University, 1994.
[Ks95] S. Kumar. Classification and Detection of Computer Intrusions. PhD Thesis, Department of Computer Science, Purdue University, August 1995.
[Lee99] W. Lee. A Data Mining Framework for Constructing Features and Models for Intrusion Detection Systems. PhD Thesis, Computer Science Department, Columbia University, June 1999.
[Lsm98] W. Lee and S. Stolfo. Data Mining Approaches for Intrusion Detection. Proceedings of the 7th USENIX Security Symposium, 1998.
[Lsm99] W. Lee, S. Stolfo and K. Mok. A Data Mining Framework for Building Intrusion Detection Models. Proceedings of the IEEE Symposium on Security and Privacy, 1999.
[Lun90] T. F. Lunt, A. Tamaru, F. Gilham et al. A Real-Time Intrusion-Detection Expert System (IDES). Final Technical Report, Project 6784, SRI International, 1990.
[Lun93] T. Lunt. Detecting Intruders in Computer Systems. Proceedings of the 1993 Conference on Auditing and Computer Technology, 1993.
[MIT] MIT Lincoln Laboratory. <http://www.ll.mit.edu/IST/ideval/>
[Mjs02] S. Mukkamala, G. Janoski and A. Sung. Intrusion Detection Using Neural Networks and Support Vector Machines. Proceedings of the IEEE International Joint Conference on Neural Networks, pp. 1702-1707, 2002.
[Por92] P. A. Porras. STAT: A State Transition Analysis Tool for Intrusion Detection. Master's Thesis, Computer Science Department, University of California, Santa Barbara, July 1992.
[Qun86] J. R. Quinlan. Induction of Decision Trees. Machine Learning, 1:81-106, 1986.
[Qun93] J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
[Rlm98] J. Ryan, M. J. Lin and R. Miikkulainen. Intrusion Detection with Neural Networks. Advances in Neural Information Processing Systems 10, Cambridge, MA: MIT Press, 1998.
[Sum97] R. C. Summers. Secure Computing: Threats and Safeguards. McGraw-Hill, New York, 1997.
[Sun96] A. Sundaram. An Introduction to Intrusion Detection. ACM Crossroads, Vol. 2, No. 4, April 1996.
[Tcl90] H. S. Teng, K. Chen and S. C. Lu. Security Audit Trail Analysis Using Inductively Generated Predictive Rules. Proceedings of the 11th National Conference on Artificial Intelligence Applications, pp. 24-29, IEEE Service Center, Piscataway, NJ, March 1990.
[Vla95] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.
Corresponding author email: ajith.abraham@ieee.org