
ABSTRACT

Botnets have become one of the major forms of malware on the internet and a cause of great concern for cyberspace. Recent malicious attempts aim at financial gain through a large pool of compromised hosts, called software robots or simply bots. A group of bots, referred to as a botnet, is remotely controlled by a server and can be used for sending spam mail, stealing information such as passwords, and launching DDoS attacks. One study shows that about 40% of all computers connected to the internet are infected by bots and under the control of attackers. In this dissertation work, we propose a botnet detection mechanism that monitors group activities in DNS traffic. The mechanism is robust because bots act as a group, so they can be detected by looking at their group activities in DNS traffic. The proposed mechanism can detect botnets while the bots are connecting to their server.

CONTENTS

1. INTRODUCTION
   1.1 WHAT IS A BOTNET?
   1.2 GROWTH OF BOTNET
   1.3 BOTNET ATTACKS AND THEFTS
2. ARCHITECTURE OF BOTNET
   2.1 BOTNET LIFE CYCLE
   2.2 COMMAND AND CONTROL ARCHITECTURE
       2.2.1 CENTRALIZED C&C MODEL
       2.2.2 P2P-BASED C&C MODEL
       2.2.3 HIERARCHICAL C&C MODEL
   2.3 RALLYING MECHANISM
       2.3.1 HARD-CODED IP ADDRESS
       2.3.2 DYNAMIC DNS DOMAIN NAME
       2.3.3 DISTRIBUTED DNS SERVICE
3. BOTNET DETECTION MECHANISMS
   3.1 SIGNATURE BASED
   3.2 DNS BASED
   3.3 ANOMALY BASED
   3.4 RELATED WORK
4. ISSUES RELATED TO BOTNET
   4.1 RALLY PROBLEM AND IRC SERVER
   4.2 C&C SERVER MIGRATION
   4.3 FEATURES OF BOTNET DNS
5. PROPOSED ALGORITHM
   5.1 BOTNET DNS QUERY DETECTION ALGORITHM
6. IMPLEMENTATION AND TESTING
   6.1 DATA SET
   6.2 IMPLEMENTATION AND TESTING
   6.3 RESULTS
7. CONCLUSION AND FUTURE WORK
REFERENCES

CHAPTER 1 INTRODUCTION

The latest threat to internet technology is the botnet. In this report, we explore today's most intriguing manifestation of advanced malware: the botnet. A botnet is a group of computers that have been compromised and run a remote-control bot application. The bot herder sends commands to the droves of compromised systems, which obey them. Botnets are increasingly used to commit crimes by proxy, and can often go undetected within systems, waiting for a trigger event. The main purpose of a botnet is to provide a malicious attack platform that is manipulated by the botmaster.

1.1 WHAT IS A BOTNET?

A botnet is a collection of compromised computers, termed bots, that are used for malicious purposes. A computer becomes a bot when it runs a file, typically from a drive-by download, that has bot software embedded in it. Botnets are controlled via protocols such as IRC and HTTP. The term bot comes from the word robot, because a robot is a machine or device that operates automatically by remote control. Today's bots are written in C, C++, Delphi, and Perl. A bot is considered malware. Malware, i.e. malicious software, is a generic term for a range of software programs that are usually designed for harmful purposes. The main purpose of bots is to infect systems and use the compromised computers for malicious purposes. A bot has many different ways of infecting someone's computer. The owner of a personal computer may not know that the computer has been infected and converted into a bot. Even after one infection has occurred, other bots may still infect the same system.

1.2 GROWTH OF BOTNET

A botnet is a large pool of compromised hosts that are controlled by a botmaster. Recent botnets use an Internet Relay Chat (IRC) server as their C&C server. The botmaster issues commands to the botnet over the IRC C&C channel. Although most botnets have been shown to use IRC for the C&C process, the traffic among the bots, the C&C server and the botmaster is hard to distinguish from normal traffic and can pass as legitimate. According to CipherTrust [2], as many as 172,000 new bots are recruited every day, which means that about 5 million new bots appear every month. Symantec [3] reported that the number of bots observed in a day is 30,000 on average. The total number of bot-infected systems has been measured at between 800,000 and 900,000. A single botnet of more than 140,000 hosts was found in the wild, and botnet-driven attacks have been responsible for single DDoS attacks of more than 10 Gbps.

1.3 BOTNET ATTACKS AND THEFTS

The various types of attacks [4] by botnets are as follows:

Denial-of-Service attacks
Adware
Spyware
Spamming
Click Fraud
Access number replacements
Fast-Flux

CHAPTER 2 ARCHITECTURE OF BOTNET

2.1 BOTNET LIFE CYCLE

There are four phases [5] of botnet creation and maintenance: Initial Infection, Secondary Injection, Malicious Activity, and Maintenance and Upgrade.

Figure.1 - A General Botnet Life-Cycle [6]

Initial infection is the exploit that an attacker uses to get the bot software running on the host computer for the first time. In secondary injection, the bot running on the infected host receives commands from the botmaster via the command-and-control network. It then autonomously carries out whatever malicious activities the secondary injection specifies, including spreading itself to vulnerable peers. It occasionally reports in to the botmaster for maintenance and upgrades, that is, for updates to the mandate of the secondary injection.

2.2 COMMAND AND CONTROL ARCHITECTURE

The backbone of a botnet is its command and control (C&C) channel, which is responsible for setting up the botnet, controlling the activities of the bots, issuing commands, and ultimately reaching the botmaster's goals. The C&C channel is stable during the operation of a botnet: once a botnet is established, the channel remains the same throughout its operation. On the other hand, once the C&C channel is detected, the whole botnet is exposed.

Typical botnet C&C models are:

2.2.1 CENTRALIZED C&C MODEL:

All bots communicate with the same server. If the C&C server becomes unavailable for whatever reason, such as a hardware or software failure, a shutdown, or blocking by the authorities, the botnet is neutralised. The bots still run hidden on the infected machines, but they can no longer receive commands or send data to the botmaster, which effectively makes them useless [7].

Figure.2 - Centralized C&C Botnet Model [7]

2.2.2 P2P BASED C&C MODEL:


Commands in a P2P structure are introduced to one bot in the botnet and then propagated to all the other agents. The advantage of this structure is that it is very hard to shut down because there is no centralised C&C server. Its weaknesses are that communication can be unpredictable, causing latency in the delivery of commands, and that the IP addresses of the whole botnet can be exposed.

Figure.3 - P2P C&C Botnet Model [7]

2.2.3 HIERARCHICAL C&C MODEL:

The hierarchical C&C model uses a subordinate structure that employs bot agents as proxies to distribute commands to the entire botnet. The advantage is that no single bot knows the location and size of the whole botnet. The disadvantage is that, because there is no direct contact with every bot, there can be a great deal of command latency [7].

Figure.4 - Hierarchical C&C Botnet Model [7]

2.3 RALLYING MECHANISM

The rallying mechanism defines how a bot locates the C&C server(s). There are three main methods [7]:

2.3.1 HARD-CODED IP ADDRESS

The IP addresses of the C&C servers are hard-coded in the bot's binary files. This type of rallying mechanism is very easy to detect and block using firewalls, and because of this recent bots do not use it.

2.3.2 DYNAMIC DNS DOMAIN NAME

The botmaster buys a domain name and the bots use this domain to obtain the IP address of a C&C server. This method has become more widely used because authorities can often shut down botnet C&C servers; with it, the botmaster can set up a new server and update DNS to point to it. Domain flux can also be used with this method.

2.3.3 DISTRIBUTED DNS SERVICE

The botmaster runs their own DNS server, which resolves the IP addresses of the C&C servers. The bots have the IP address of this DNS server hard-coded in their binary files. This method is the most difficult to detect.

CHAPTER 3 BOTNET DETECTION MECHANISMS

Botnet detection techniques can be classified into three categories:

3.1 SIGNATURE BASED:

Signature-based botnet detection uses the signatures and behaviours of existing botnets. The basic idea is to extract feature information from the packets in the traffic and match it against the patterns registered in a knowledge base of existing bots. Such matching is easy to carry out by simply comparing every byte in the packet, but it has several drawbacks. First, it cannot identify unknown bots. Second, the knowledge base must constantly be updated with new signatures, which raises the management cost and reduces performance. Third, new bots may launch attacks before their signatures are added to the knowledge base. For example, Snort [10] is an open source IDS that monitors network traffic for signs of intrusion by searching for matches against a predefined set of rules and signatures. A major weakness of signature-based detection is that it is limited to known botnets.

3.2 DNS BASED:

DNS-based detection techniques operate on DNS traffic. The robustness and potential threat of fast-flux service networks (FFSN) make it necessary to focus detection algorithms on DNS traffic. To maintain and hide their bots, botmasters use DNS queries in multiple botnet stages, such as the rallying process after infection, malicious attack initiation, and C&C server update. There are two major factors to distinguish botnet DNS queries from legitimate DNS queries.


DNS-based detection techniques rely on the DNS information generated by a botnet. To access the C&C server, bots carry out DNS queries to locate the particular C&C server, which is typically hosted by a Dynamic DNS (DDNS) provider. It is therefore feasible to detect botnet DNS traffic by monitoring DNS and detecting DNS traffic anomalies.

3.3 ANOMALY BASED:

Unlike normal internet traffic, botnets often generate a high volume of traffic, which may cause high network latency, and traffic on unusual ports. These network traffic anomalies, along with other unique botnet behaviours, have been used for botnet detection.

3.4 RELATED WORK

There has been some research and analysis on bots and botnets, covering their behaviours, statistics, and traffic measurements. Dagon et al. [12] identified key metrics for measuring the utility of a botnet and described various topological structures that botnets may use to coordinate attacks. Using these performance metrics, they considered the ability of different response techniques to degrade or disrupt botnets. Their study used DNS redirection to monitor botnets. Binkley and Singh [11] proposed an anomaly-based algorithm for detecting IRC-based botnet meshes. The algorithm combines an IRC mesh detection component with a TCP scan detection heuristic called the TCP work weight. Their method can detect IRC channels containing hosts with a high work weight, but some of those hosts may not be botnet members (false positives), and, as the authors note, many borderline cases need additional analysis.

Botnets are constructed and managed in several stages, such as bot infection, C&C server rallying, and other types of malicious activity. Defence against botnet attacks is a very complicated task. Only a few works have been done in this area, and further improvements are needed for practical use.

CHAPTER 4 ISSUES RELATED TO BOTNET

4.1 RALLY PROBLEM AND IRC SERVER

The main problem for a botmaster is rallying, i.e. how to gather the infected hosts. Botmasters want their botnets to be invisible and portable, and therefore they use DNS for rallying. Other rallying methods are possible, but most of them cannot provide both mobility and invisibility at the same time.

4.2 C&C SERVER MIGRATION

If a botnet uses only a single C&C server, the botnet can easily be detected. A botmaster therefore uses several C&C servers together with Dynamic DNS (DDNS), a resolution service that automatically detects a change in a server's IP address and updates the DNS record accordingly, to keep the botnet portable [13]. Even if the root C&C server stops working properly or a link fails, candidate C&C servers can substitute for it. A botmaster therefore frequently migrates its C&C server.

4.3 FEATURES OF BOTNET DNS

Infected hosts automatically access the C&C server by its domain name, so a DNS resource record (RR) query is used.

Figure.5 - Differences between Botnet and Legitimate DNS [13]

First, only botnet members send queries for the domain name of the C&C server, so the group of querying hosts has a fixed size; legitimate users never query the C&C server's domain name. The number of distinct IP addresses that query a botnet domain is therefore normally fixed. Legitimate sites, on the other hand, are usually queried by anonymous, random users. Second, the fixed members of a botnet act and migrate together at the same time; the group activity of a botnet derives from this property. DNS queries from a botnet occur temporarily and simultaneously, whereas most legitimate DNS queries occur continuously and not simultaneously. Botnet queries appear only in the specific situations mentioned above, so they appear intermittently. Third, a botnet usually uses DDNS for its C&C server, whereas legitimate sites do not commonly use DDNS.


CHAPTER 5 PROPOSED ALGORITHM

In this project, we developed a botnet DNS query detection algorithm that uses the differences between botnet DNS and legitimate DNS summarised in Figure 5.

5.1 BOTNET DNS QUERY DETECTION ALGORITHM

The algorithm is divided into three parts:

1. INSERT-DNS-QUERY

In this stage of the algorithm, we create a database for storing DNS query data, which include the source IP address of the query, the domain name queried and the timestamp at which the query was received. We group the DNS query data by domain name and timestamp.

Figure.6 Insert-DNS-Query

This stage of the algorithm works as follows:
1. First, there is a storage file for storing the queries.
2. We insert the domain name and the source IP address (which accessed that domain name) for each time period (t) of one hour.
3. That is, for each one-hour time period (t):
   a. Insert each domain name accessed in that time period and, for each domain, create an IP list containing the source IP addresses that accessed it.
   b. There must be no duplicate domain names, and for each unique domain name there must be no duplicate IP addresses in its IP list.
4. In this way, Insert-DNS-Query is implemented.

To implement this stage we worked as follows:
1. Let the input data set be named.log and the time period be one hour.
2. While there are lines left in named.log:
3. Take the next line from the input file and extract the time (hour), the source IP address and the query (i.e. the domain name).
4. Create a new file for each time period (i.e. for each hour), put the domain name and IP address into a string, and insert the string into that file.
5. End of while.
6. For each newly created file:
7. If a string containing the same domain name and the same IP address appears more than once, delete the duplicates.
8. End of for loop.
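The Insert-DNS-Query steps can be sketched in Python as follows. This is a minimal in-memory sketch rather than the file-based implementation described above: the function name `insert_dns_queries`, the `(hour, source_ip, domain)` record layout and the sample addresses are assumptions made for illustration. A set per domain plays the role of the duplicate-free IP list.

```python
from collections import defaultdict


def insert_dns_queries(records):
    """Group DNS query records by one-hour period and domain name.

    records: iterable of (hour, source_ip, domain) tuples (hypothetical layout).
    Returns {hour: {domain: set of source IPs}}.
    """
    buckets = defaultdict(lambda: defaultdict(set))
    for hour, source_ip, domain in records:
        # A set silently drops a repeated (domain, IP) pair, which covers
        # the "no duplicate IP address in the IP list" requirement.
        buckets[hour][domain].add(source_ip)
    # Convert to plain dicts so later lookups do not create empty entries.
    return {hour: dict(domains) for hour, domains in buckets.items()}


# Hypothetical sample records, not taken from the data set.
records = [
    (13, "10.0.0.1", "example.org"),
    (13, "10.0.0.1", "example.org"),  # duplicate query, stored once
    (13, "10.0.0.2", "example.org"),
    (14, "10.0.0.1", "example.org"),
]
table = insert_dns_queries(records)
```

Keeping one dictionary per hour mirrors the one-file-per-time-period layout, so the later stages can process each period independently.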

2. DELETE-DNS-QUERY

This stage of the algorithm removes redundant DNS queries. It works as follows:

First, we created a whitelist file named legi.txt containing some legitimate queries, as shown below:

Figure.7 whitelist file: legi.txt

This file is simply a text file of manually inserted whitelisted domain names, one domain name per line. Then,
1. For each file created in the previous stage (i.e. for each time period):
2. Match each domain name against the whitelisted domain names in legi.txt.
3. If the domain name is found in legi.txt, delete all instances of that domain name from the file.
4. Else, if the size of the IP list for the domain name (i.e. the number of occurrences of the domain name) does not exceed the defined threshold, delete all instances of that domain name from the file.
5. End of for loop.

This stage of the algorithm reduces the processing overhead and saves memory.
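The filtering performed by Delete-DNS-Query can be sketched in Python as below. The function name `delete_dns_queries`, the in-memory dictionary standing in for one time-period file, and the sample domains are assumptions for illustration; the default threshold of 5 is the IP-list size threshold reported in Section 6.2.

```python
def delete_dns_queries(domain_ips, whitelist, min_ips=5):
    """Drop whitelisted domains and domains with too few querying IPs.

    domain_ips: {domain: set of source IPs} for one time period.
    whitelist:  set of legitimate domain names (the role of legi.txt).
    min_ips:    IP-list size threshold; a domain is kept only if its
                IP list is strictly larger than this value.
    """
    return {
        domain: ips
        for domain, ips in domain_ips.items()
        if domain not in whitelist and len(ips) > min_ips
    }


# Hypothetical data for a single time period.
period = {
    "bot.example": {f"10.0.0.{i}" for i in range(8)},       # 8 IPs, kept
    "www.example.com": {f"10.0.1.{i}" for i in range(10)},  # whitelisted
    "rare.example": {"10.0.0.1"},                           # below threshold
}
kept = delete_dns_queries(period, whitelist={"www.example.com"})
```

In practice the whitelist set would be loaded from legi.txt, one domain name per line.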

3. DETECT-BotDNS-QUERY

This is the final stage of our algorithm; it detects BotDNS queries. Suppose there are two time periods t1 and t2, each of one hour. If the same domain name is queried in both t1 and t2, we count the IP addresses that queried it in each period. Suppose the number of IP addresses is A in t1 and B in t2 for that domain name. We then count the IP addresses that queried the domain name in both t1 and t2; suppose this number of common IP addresses is C. We calculate the similarity (S) using the following formula:

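$$
S =
\begin{cases}
\dfrac{1}{2}\left(\dfrac{C}{A} + \dfrac{C}{B}\right) & \text{if } A \neq 0 \text{ and } B \neq 0 \\[6pt]
-1 & \text{if } A = 0 \text{ or } B = 0
\end{cases}
$$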
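As a worked example with hypothetical counts (not taken from the data set), suppose a domain name is queried by A = 10 source IP addresses in t1 and by B = 8 in t2, and all 8 of the latter also appeared in t1, so C = 8:

$$
S = \frac{1}{2}\left(\frac{8}{10} + \frac{8}{8}\right) = \frac{1}{2}\,(0.8 + 1.0) = 0.9
$$

With the similarity threshold of 0.8 used in Section 6.2, this domain name would be treated as a BotDNS query. If instead only 2 addresses were common to two periods of 10 addresses each, S would be 0.5 (0.2 + 0.2) = 0.2, well below the threshold.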

If the calculated similarity (S) is equal to -1, the domain name appeared only once (in only one of the two time periods), so we cannot draw any conclusion about the query; we put it into a newly created file, blacklist.txt. If S is greater than or equal to the predefined similarity threshold, we classify the query as a BotDNS query and put it into a newly created file, botlist.txt. Otherwise we put the query into another file, whitelist.txt.

The basic idea is as follows: assume there is a domain name DN that is requested by multiple source IP addresses in a certain time slot t; we then measure how many of those source IP addresses request DN again in each time slot after t. The input to this stage is the set of per-time-period files produced by the previous stage, Delete-DNS-Query. This completes the implementation of the algorithm.
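The similarity computation and the three-way classification can be sketched in Python as follows. The function names `similarity` and `classify` and the sample IP sets are assumptions for illustration; the returned labels correspond to the output files blacklist.txt, botlist.txt and whitelist.txt, and the default threshold of 0.8 is the value reported in Section 6.2.

```python
def similarity(ips_t1, ips_t2):
    """Similarity S between the IP sets that queried one domain in t1 and t2."""
    a, b = len(ips_t1), len(ips_t2)
    if a == 0 or b == 0:
        return -1.0  # domain seen in only one of the two periods
    c = len(ips_t1 & ips_t2)  # IPs common to both periods
    return 0.5 * (c / a + c / b)


def classify(ips_t1, ips_t2, threshold=0.8):
    """Return the name of the list a domain belongs to after comparison."""
    s = similarity(ips_t1, ips_t2)
    if s == -1.0:
        return "blacklist"  # undecidable: appeared only once
    if s >= threshold:
        return "botlist"    # same group of hosts queried again
    return "whitelist"      # mostly different hosts in each period


# Hypothetical IP sets for one domain in three time periods.
t1 = {f"10.0.0.{i}" for i in range(10)}      # 10 hosts
t2 = {f"10.0.0.{i}" for i in range(8)}       # 8 hosts, all also in t1
t3 = {f"10.0.0.{i}" for i in range(8, 18)}   # 10 hosts, 2 shared with t1
```

Using set intersection for C keeps the comparison independent of the order in which queries were logged.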


CHAPTER 6 IMPLEMENTATION AND TESTING

6.1 DATA SET

We used an offline data set: a DNS log file from our Tezpur University server, in the following format:

Figure.8 Format of DNS log file (input data-set): named.log

The log file is named named.log. Each line of the log file contains, in this order, the month name (e.g. Nov), the date, the timestamp (i.e. the time at which the query was requested), the server name, the client's IP address (i.e. the source IP address), and the query (i.e. the domain name requested). The order is shown above in Figure 8. We then run our DNS query detection algorithm on this data set and calculate our results.
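Extracting the hour, source IP address and domain name from one log line might look like the Python sketch below. The exact layout in Figure 8 is not reproduced here, so the sketch assumes whitespace-separated fields in the order listed above and an HH:MM:SS timestamp; the function name `parse_log_line` and the sample line are hypothetical, and a real named.log line may need a different split.

```python
def parse_log_line(line):
    """Parse one log line into (hour, source_ip, domain), or None.

    Assumed field order (whitespace-separated): month, date, HH:MM:SS
    timestamp, server name, client IP address, queried domain name.
    """
    fields = line.split()
    if len(fields) < 6:
        return None  # malformed or truncated line
    _month, _date, timestamp, _server, source_ip, domain = fields[:6]
    hour_text = timestamp.split(":")[0]
    if not hour_text.isdigit():
        return None  # timestamp not in the assumed HH:MM:SS form
    return int(hour_text), source_ip, domain


# Hypothetical line in the assumed layout.
sample = "Nov 21 13:05:42 dns1 10.0.0.5 example.org"
parsed = parse_log_line(sample)
```

Returning None for lines that do not fit lets the caller skip them instead of aborting a run over several hundred thousand queries.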


6.2 IMPLEMENTATION AND TESTING

To evaluate the proposed algorithm, we took four offline DNS log files from the Tezpur University server, each covering six hours. The format of the log files is shown in Figure 8. We also obtained the filtered result from the server, which shows how many BotDNS queries each log file contains. In our implementation the time period is one hour, the IP-list size threshold is 5, and the similarity threshold is 0.8, a value that lies between the similarity of botnet domains and the maximum similarity of legitimate domains. We tested the algorithm on each log file. Each six-hour log file contains about 500,000 queries, so processing took about 5 to 6 hours per log file. According to the available reference result, each log file contains 13 to 14 BotDNS queries. Our algorithm found about 92% to 94% of the BotDNS queries present in each log file, with no false positives. We therefore report a detection rate of about 92%.

6.3 RESULTS

The results for one of the log files are shown below:


Figure.9 input data set: named.log

This input log file contained 534,496 queries. It covers Nov. 21, 2011, from 1 p.m. to 7 p.m. The whitelist file used for matching is shown below:

Figure.10 whitelist input file: legi.txt

We used 31 whitelisted domain names in legi.txt for testing our algorithm.


This log file originally contained 13 BotDNS queries, of which our algorithm found 12, a detection rate of about 92%. The three output files are:

Figure.11 botlist.txt: output file containing the BotDNS queries

Figure.12 blacklist.txt: output file containing the blacklisted queries (i.e. those queries which appeared only once in the whole log file)

Figure.13 whitelist.txt: output file containing legitimate queries whose IP-list size is greater than the threshold of 5
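The reported percentage follows directly from the counts for this log file (12 of 13 known BotDNS queries found). The short Python sketch below makes the arithmetic explicit; the function name `detection_rate` is an assumption for illustration.

```python
def detection_rate(found, known):
    """Fraction of known BotDNS queries that the algorithm reported."""
    if known <= 0:
        raise ValueError("known must be positive")
    return found / known


rate = detection_rate(12, 13)
print(f"{rate:.1%}")  # prints 92.3%
```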

CHAPTER 7 CONCLUSION AND FUTURE WORK

Since 1989, botnets have evolved from a benign assistant tool into the predominant threat on the present-day internet. Although the number of bots in each botnet seems to be decreasing, the financial damage botnets can cause keeps increasing as internet bandwidth grows. Instead of using a centralized, IRC-based C&C channel to perform multiple nefarious attacks, botnets have gradually developed into more complicated, stealthy, module-based packages that perform particular malicious activities with diverse C&C protocols and structures. The proposed botnet DNS query detection algorithm enables us to identify botnets through their DNS queries; in our tests it detected about 92% of the known BotDNS queries. However, detecting botnets that migrate to another C&C server requires a different approach. As future work, we therefore have to develop a migrating-botnet detection algorithm by modifying the botnet DNS query detection algorithm.


REFERENCES

[1] C. A. Schiller, J. Binkley, D. Harley, G. Evron, T. Bradley, C. Willems, M. Cross, BOTNETS: The Killer Web Application, copyright 2007 Syngress Publishing, Inc., a division of Elsevier, Inc.
[2] Ciphertrust, secure computing, http://www.ciphertrust.com
[3] Symantec Co., http://www.symantec.com
[4] http://en.wikipedia.org/wiki/Botnet
[5] Z. Zhu, G. Lu and Y. Chen, Botnet Research Survey, In Proc. of the Annual IEEE International Computer Software and Applications Conference, July 2008.
[6] X. Zang, A. Tangpong, G. Kesidis and D. J. Miller, Botnet Detection through Fine Flow Classification, CSE Dept Technical Report No. CSE11-001, Departments of CS&E and EE, The Pennsylvania State University, PA, Jan. 31, 2011.
[7] A. Shaikh, Botnet Analysis and Detection System, Technical Report, School of Computing, Edinburgh Napier University, November 2010.
[8] F. Naseem, M. Shafqat, U. Sabir and A. Shahzad, A Survey of Botnet Technology and Detection, In the proceedings of the International Journal of Video & Image Processing and Network Security IJVIPNS-IJENS Vol: 10 No: 01, 2009.
[9] A. V. Barsamian, Network Characterization For Botnet Detection Using Statistical-Behavioral methods, Technical Report, Thayer School of Engineering, Dartmouth College, Hanover, New Hampshire, June, 2009.
[10] http://www.snort.org
[11] J. Binkley and S. Singh, An algorithm for anomaly-based botnet detection, In Proceedings of USENIX (SRUTI), 2006.
[12] D. Dagon, C. Zou, and W. Lee, Modeling botnet propagation using time zones, In NDSS 2006, Feb 2006.
[13] H. Choi, H. Lee, H. Lee, H. Kim, Botnet Detection by Monitoring Group Activities in DNS Traffic, In Proc. of the Seventh International Conference on Computer and Information Technology, October 2007.
