Security views
1. Malware update
2.
Ang Chiong Teck, a student at Nanyang Technological University in Singapore, has received a jail sentence of four months.
3. More compromises of personal and financial information occur
An Ameriprise Financial employee's laptop that contained customer information was stolen out of a car. Ameriprise Financial has sent letters to 158,000 customers informing them accordingly. No customer SSNs were stored on the laptop, but unfortunately a file that contained the names and SSNs of 68,000 current and former financial advisers was.
Providence Home Services is notifying 265,000 current and
former patients that their medical information fell into unauthorized hands when disks and tapes containing this information were pilfered from the car of an employee. Information
about numerous current and former employees was also on
the stolen disks and tapes. There is no evidence that the stolen information has been used for identity fraud. Having
employees take home disks and tapes is a standard business
continuity-related procedure for Providence Home Services.
Providence Home Services has set up a hotline to answer inquiries from those whose information was compromised.
A perpetrator gained unauthorized access to a computer at the University of Delaware's School of Urban Affairs and Public Policy; SSNs of 159 graduate students were stored on that system. Additionally, someone pilfered a backup hard drive from the University's Department of Entomology and Wildlife Ecology; the hard drive contained personal data. The university has notified all affected individuals.
4. Violence Against Women and DOJ Reauthorization Act bans annoying postings and messages
By signing the Violence Against Women and Department of
Justice Reauthorization Act, President Bush also signed into
5. Proposed US legislation would require deletion of personal information on US Web sites
Proposed federal legislation currently being considered by the
US Congress would require every US Web site to delete all information about visitors to the site, including names, street and
email addresses, telephone numbers, and so forth, if the information is no longer needed for a bona fide business reason. The
provisions for personal information deletion in the proposed
legislation, the Eliminate Warehousing of Consumer Internet
Data Act of 2006, are intended to fight identity theft because
Web sites that contain personal information are often major
targets for computer criminals. Some speculate that one effect
of this requirement would be to reduce concern about search
engines storing information about users' search terms, something for which the US Department of Justice (DOJ) recently subpoenaed Yahoo, Google, and other search engine providers.
As the bill is currently worded, "personal information" does not refer to search terms or Internet addresses. If this proposed legislation is passed, violations of this law could be punished by the FTC as deceptive business practices, whether a Web site is run by a business or an individual.
The proposed legislation described in this news item appears to be another step forward in fighting identity theft. If
there is no genuine business-related reason to keep personal
information on a Web site, it is only logical that this information be removed. Controversies and challenges concerning the
interpretation of "no longer needed" will, of course, surface.
Still, requiring Web site operators to purge unneeded personal
information will help ensure that at least some targets of
opportunity for would-be identity thieves will disappear.
6. Financial Services Authority Report highlights need for banks to boost on-line security
In its Financial Risk 2006 Report, the UK's Financial Services Authority (FSA) found that 50% of Internet users are very concerned about the risk of fraud. The report stated that banks should strive harder to alleviate these concerns by educating Web users about on-line security. Of the 1500 people asked about their on-line habits, many reported that they used good security practices, yet a fourth did not remember when their security software, such as anti-virus software, was last updated. Industry group Apacs found that Internet fraud losses rose to GBP 14.4 million during the first half of 2005, more than triple the figure for the same period the previous year.
A critical point made in the FSA report was that if customers
were expected to absorb the costs for on-line fraud, 77% would
avoid on-line banking altogether. Recent reports of criminal
gangs stealing millions from the government through tax
credit scams involving the Department for Work and Pensions
and Network Rail have fueled on-line customers' concerns about on-line security.
It would be very difficult to disagree with the findings and
recommendations of the recent FSA report. Banks rely on
on-line transactions, yet they often do not go far enough in ensuring that these transactions are secure. I especially worry
about the threat of keystroke sniffers being installed on users' computers; few users know what keystroke loggers are, let
alone how to detect them. Losses from fraudulent on-line
transactions are starting to mount, as indicated in this and
several previous news items. As these losses grow, banks
and other financial institutions will be virtually forced to pay
more attention to on-line security.
7. Washington State and Microsoft sue anti-spyware vendor
Microsoft and the State of Washington each filed lawsuits in
the US District Court for the Western District of Washington
against Secure Computer and its principals. The charges include violation of Washington's Computer Spyware Act and
three other laws. Secure Computer allegedly used scare tactics
that included putting misleading links on Google's Web site, producing unwanted pop-up advertising, and spamming.
Secure Computer implied that its software came from or
was endorsed by Microsoft and then went further by using
a Windows feature to pop up warnings on PCs, informing
the users that their system had been compromised and that
they should run a spyware scan. Users were later advised to
buy Secure Computer's Spyware Cleaner for USD 49.95 to
remove the malware that was supposedly installed on their
computers. The program does not work, however. Washington
state law establishes a fine of up to USD 100,000 per violation.
If Secure Computer has actually done what it is being
accused of having done, the lawsuits brought against this
company are a just punishment. As I have said so many times
before, computer users are for the most part incredibly naïve
concerning security issues; it would not be difficult for an
8. Morgan Stanley offers to settle with the SEC
US-based investment bank Morgan Stanley has offered to
settle with the Securities and Exchange Commission (SEC)
for USD 15 million to resolve a matter related to Morgan Stanley's having destroyed potential electronic evidence. The
company did not comply with an order to keep electronic
messages that pertained to a lawsuit that had been filed
against it. Morgan Stanley claims that backup tapes on which
the email messages in question were stored were accidentally
overwritten. The SEC has not decided whether to accept
Morgan Stanley's settlement offer.
This is a truly fascinating case. Morgan Stanley somehow
got its wires crossed and deleted evidence that the SEC
ordered it to hand over. I do not blame the SEC for taking its
time in deciding how to deal with this investment bank.
If the SEC accepts Morgan Stanley's offer, Morgan Stanley will
not only get away relatively cheaply (remember, USD 15 million is small change for a company such as Morgan Stanley),
but other companies faced with the dilemma of having to
hand over evidence that they know will be used against them
will also be tempted to accidentally erase the evidence. On
the other hand, offering to pay the SEC right up front not
only appears to be a magnanimous move on Morgan Stanley's part, but it also promises to close one of the many complicated
cases with which I am sure that the SEC is having to deal.
9. FTC settles with CardSystems Solutions and ChoicePoint
ID verification services vendor CardSystems Solutions has settled charges brought by the Federal Trade Commission (FTC)
that this company failed to secure sensitive customer data.
The charges followed a major security incident that led to
more than 260,000 individual cases of identity fraud. CardSystems Solutions had been obtaining data from the magnetic
stripes of credit and debit cards and storing them without
deploying ample security safeguards. The company, bought
by Pay By Touch late last year, has agreed to implement
a wide-ranging security program and undergo independent
security audits every two years for 20 years.
In settling with the FTC, ID verification services vendor
ChoicePoint must pay USD 10 million in civil penalties and
USD 5 million for consumer damages. The USD 10 million is
the FTC's largest civil fine to date. ChoicePoint was charged
with not sufficiently screening its clients for legitimacy and
with data-handling methods that violated the Fair Credit
Reporting Act, the FTC Act, other federal laws, and privacy
rights. The settlement requires ChoicePoint to establish a security program that includes verifying the legitimacy of clients
for their services, auditing its clients' use of the information
obtained, and making visits to client sites. ChoicePoint now
10. Lawsuits are not curtailing illegal downloads
Surveys of 3000 on-line users in Spain, Germany, and the UK by the industry group Jupiter, and studies by the International Federation of the Phonographic Industries (IFPI), indicate that despite almost 20,000 people being sued in illegal song downloading cases in 17 countries, illegal file sharing activity has remained close to the same for the past two years. Approximately 335 legal download stores and on-line music services have two million songs legally available, double the amount from the previous year, with 420 million singles legally downloaded in 2005 and sales exceeding USD 1 billion in 2005, up from USD 380 million in 2004. More rapid growth is predicted this year. According to the surveys, 35% of illegal file sharers have cut back on their activity, 14% have increased their activity, and 33% of them buy less music than those who obtain their music through legal channels. With approximately 870 million song files available
through illegal downloading on the net, the music industry
is having a difficult time persuading song-swappers to get
their music legally. The music industry is threatening to sue
Internet service providers (ISPs) if they do not start identifying
and stopping customers who ignore copyright restrictions. In
its Digital Music Report, the IFPI stated that music downloads
for mobile phones had reached USD 400 million annually,
which comprises 40% of the digital music business. Meanwhile, the pluses and minuses of Digital Rights
Management technology, something that limits what consumers can do with their music once they have purchased
it, are still being debated.
The entertainment industry faces a continuing uphill struggle in its war against piracy. Using lawsuits as a mechanism for
reducing illegal downloads may not be working, but it nevertheless was a logical course of action to pursue. I suspect that much
of the reason that lawsuits are not working better than they are
is that most of the lawsuits have targeted individuals instead of
organizations. As such, many individuals who illegally download movies and music are probably not even aware of the
many lawsuits that have been filed over these types of activities, so there is little or no intimidation factor. It is also logical to assume that the entertainment industry will in the not-too-distant future shift its strategy by increasingly going after ISPs who
do not prevent users from performing illegal downloads. The
road to success for this possible strategy is also not certain,
however; in the past numerous ISPs have been able to win court
battles against the RIAA and other entertainment industry entities when they have been directed to hand over names of illegal
file sharers. Again, the entertainment industry does indeed
have a long way to go.
11. Russian stock exchange operations disrupted by virus
A virus halted computing operations at the main Russian
stock exchange. The Russian Trading System (RTS) halted
operations in its three markets for slightly over one hour after an
unidentified virus infected computing systems there. The
infection produced a massive amount of outgoing traffic that
disrupted normal network operations. The virus reportedly
insight, or understanding the mechanisms involved, "an accurate simulation without the precision is quite acceptable," Fisher explains.
The value of abstract models becomes apparent in the context of activities at CERT, Fisher's old employer. Concerned
with broader issues of Internet security and broad security responses, the organization wants to understand the basic
mechanisms involved in an attack rather than getting into detail. Malicious activities such as distributed denial of service
attacks occur across huge numbers of machines. "These models don't depend on details like the topology of the Internet or who is connected to whom," Fisher explains. "They can be abstracted away. You are concerned about the number of machines vulnerable to attack, and the number of machines capable of launching attacks."
It is these types of networks, with large numbers of autonomous nodes, where emergent behavior is prevalent, Fisher says. Outcomes for the whole system derive from local events. As one node's influence affects its neighbor, the neighbor's
behavior in turn will affect other neighbors. This emergent
behavior is similar in different domains. The spread of viruses
in large computer networks, for example, can be similar to the
spread of biological epidemics even though the details in
each domain (people vs computers) will be totally different.
Fisher developed a software language called Easel designed
to help model emergent behaviors in Internet security and the
critical national infrastructure. It works on the basis that, although you could not reasonably model every single node
on the Internet, for example, you must model enough nodes
to reflect emergent behavior. Although Fisher does most of
his work in the range of 20-1000 nodes, the Macintosh-based
tool can model abstract networks up to 32,000 nodes in size.
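The passage describes topology-free, abstract models of attack propagation. A minimal Python sketch of that idea follows; Easel itself is not shown here, and the vulnerability fraction, probing behavior, and parameter names are our assumptions, not Fisher's model:

```python
import random

def simulate(nodes=1000, vulnerable=0.3, steps=50, seed=42):
    """Abstract epidemic-style model with no network topology: we track
    only which machines are vulnerable and which are infected, as the
    passage suggests ('the number of machines vulnerable to attack, and
    the number of machines capable of launching attacks')."""
    rng = random.Random(seed)
    # Each node is vulnerable with fixed probability; one machine starts infected.
    vuln = [rng.random() < vulnerable for _ in range(nodes)]
    infected = {0}
    history = []
    for _ in range(steps):
        new = set()
        for _ in infected:
            # Each infected machine probes one random machine per step.
            target = rng.randrange(nodes)
            if vuln[target]:
                new.add(target)
        infected |= new
        history.append(len(infected))
    return history

h = simulate()
```

Running the sketch yields the characteristic S-shaped growth in infected-machine counts that makes such abstract models useful despite their lack of topological detail.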
Keywords: Corporate Governance; Information Security; Risk management; Sarbanes-Oxley; Social engineering

Abstract
In a previous article [von Solms, 2000], the development of Information Security up to the
1. Introduction
2. Corporate Governance and Information Security
3. The relationship between Corporate Governance and Information Security
4. Information Security and Information Security Governance
From the previous discussion, and many other references,
there can be no doubt that the developments in the field of
good Corporate Governance over the last three to four years
have escalated the importance of Information Security to
higher levels. It is not only the fact that the spotlight was on
Information Security which resulted in this, but also the
5.
6.
7. Summary
References
www.elsevier.com/locate/cose
EVENTS
For a more detailed listing of IS security and audit events, please refer to the events diary on www.compseconline.com
CSI NET SEC 06
12-14 June 2006
Scottsdale, Arizona, USA
www.csinetsec.com
INFOSECURITY CANADA
14-16 June 2006
Toronto, Canada
www.infosecuritycanada.com
INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS 2006
25-28 June 2006
Philadelphia, PA, USA
www.dsn.org
18TH ANNUAL FIRST CONFERENCE
25-30 June 2006
Baltimore, Maryland, USA
www.first.org/conference/2006
KEYWORDS
Security;
Intrusion detection;
Correlation;
Alert analysis;
Reduction;
Attack scenario
Abstract With the growing deployment of networks and the Internet, the importance of network security has increased. Recently, however, systems that detect intrusions, which are important in security countermeasures, have been unable to provide proper analysis or an effective defense mechanism. Instead, they have overwhelmed human operators with a large volume of intrusion detection alerts. This paper presents a fast and efficient system for analyzing alerts. Our system basically depends on probabilistic correlation. However, we enhance the probabilistic correlation by applying more systematically defined similarity functions and also present a new correlation component that is absent in other correlation models. The system can produce meaningful information by aggregating and correlating the large volume of alerts and can detect large-scale attacks such as distributed denial of service (DDoS) at an early stage. We measured the processing rate of each elementary component and carried out a scenario-based test in order to analyze the efficiency of our system. Although the system is still imperfect, we were able to reduce the numerous redundant alerts to 5.5% of the original volume, without distorting the meaning, through two-phase reduction. This ability reduces the management overhead drastically and makes analysis and correlation easy. Moreover, we were able to construct attack scenarios for multistep attacks and detect large-scale attacks in real time.
© 2005 Elsevier Ltd. All rights reserved.
Introduction
Cyber attacks are escalating as the mission-critical infrastructures for governments, companies, institutions, and millions of everyday users become increasingly reliant on interdependent computer networks and the Internet. Moreover, current cyber attacks show a tendency to become more precise, distributive, and large-scale (CERT Coordination Center; Bugtraq). However, recent intrusion detection systems (IDSs), which are important in security countermeasures, have been unable to provide proper analysis or an effective security mechanism for defending against such cyber attacks because of several limitations.
First, as network traffic increases, the intrusion
detection alerts produced by IDSs are increasing
exponentially. In spite of this increase, most IDSs
neglect the overhead of human operators, who are
overwhelmed by the large volume of alerts. Second, human operators are fully responsible for
analyzing a network's status and the trends of
cyber attacks. Third, although cyber attacks can
produce multiple correlated alerts (Kendall, 1999;
CERT Coordination Center), IDSs are generally unable to detect such attacks as a complex single
attack but regard each alert as a separate attack.
Therefore, at an early stage, it is difficult to detect large-scale attacks such as a distributed denial of service (DDoS) or a worm.
These limitations are caused by the absence of
a mechanism that can preprocess and correlate the
massive number of alerts from IDSs. In fact, preprocessing and correlation of alerts are essential for
human operators because the information reproduced by this means can reduce the overhead of
human operators and help them react appropriately
(Bloedorn et al., 2001).
In proposing a fast and efficient system that
analyzes intrusion detection alerts via correlation,
we focused on providing human operators with
a level of flexibility that matches the topology and
status of a network. Our system basically depends
on the probabilistic correlation proposed in Valdes
and Skinner (2001) rather than the fixed rule-based
correlation of Perrochon et al. (2000), Cuppens
(2001a,b), Cuppens et al. (2002), Lee (1999), and
Lee et al. (2000). Compared with other models,
our model, which is similar to the probabilistic
correlation, has several advantages.
First, we considered the time similarity, though
this major measure of correlation is disregarded in
other models, and we used a mathematical function that computes the time similarity on the basis
of Browne's result in Browne et al. (2001). To process the time information more systematically, we also applied the result to our system.
Second, for immediate analysis of the status of
a managed network and the trends of cyber
attacks, we used a situator, which can grasp the
trend of attacks being generated in the network by
analyzing the relations between the source and
the destination, as one of our components. With
a situator, we could detect large-scale attacks
such as a DDoS or worm in the early stage, and we
could respond to such threats as soon as possible.
Third, we implemented our model and tested it
for various attack scenarios. Moreover, as a result
of our improvement, the system has the capability
of real-time processing and is therefore more
practical than other models.
The remainder of this paper is organized as
follows. In the next section, we describe the architecture of our proposed system and the details of
each component. Then, we describe the correlation hierarchy and similarity functions of our
system. Further, we compare our system with
other correlation systems and illustrate the performance of our system, which is followed by an
overview of previously proposed correlation mechanisms. Finally, in the last section, we summarize
the paper and discuss future work.
System architecture
Our system consists of five components, as
shown in Fig. 1: Filter, Control center, Aggregator,
Correlator, and Situator. We attached the filter to
the sensor in each managed network and operated
the other components from the control center.
Control center
The control center receives filtered alerts as
a Thread Event from the filter and saves them in
Figure 1 Overall system architecture: a sensor and filter in each managed network, and the control center containing the aggregator, correlator, situator, and databases.
Figure 2 Internal architecture and processing flow of the filter: alerts from the sensor enter an alert queue via the Alert Receiver module; the ThreadEvent Maker module builds the ThreadEvent table; and the ThreadEvent Sender module, driven by a timer, forwards thread events to the control center.
a database before forwarding them to the aggregator and the situator for further processing. When
the system is started, the control center initializes
a runtime environment by connecting to a database, setting the parameters and so on. The viewers that are used for inspecting the processed
information in each component and the data structure are also defined in the control center.
Filter
The filter gathers alerts from the sensor in each
managed network and eliminates redundancies
among those alerts. The features that the filter
uses to eliminate the redundancies are the source
and class of the attack. The filter merges the
redundant alerts into Thread Events and forwards
them to the control center at regular intervals.
The filter consists of three modules: an Alert
Receiver, a ThreadEvent Maker, and a ThreadEvent
Sender. The alert receiver forms one process and
the other two modules behave as multiple threads
in a single process. Fig. 2 shows the internal architecture and processing flow of the filter.
Alert receiver: the primary sensor of our system is Snort, a widely deployed NIDS. The alert receiver receives alerts from the sensor in the form of an Alertpkt struct type, and sends them to the alert queue. The alert queue saves the alerts in the order of arrival.
ThreadEvent Maker: after receiving the alerts from the queue, the ThreadEvent Maker compares them with previous alerts. If exact matches exist
Aggregator
The aggregator compares the similarity of features
between the thread events transferred from each
filter. If common features exist between two
thread events, the aggregator merges them into
one meta event named an Aggregation Event. The
aggregator can merge thread events that could not be merged into a similar thread event in the filter, because the aggregator has a longer merging interval than the filter.
Fig. 4 illustrates a diagram of the processing flow
of the aggregator. When new thread events are
transferred into the control center, the network
module that is communicating with each managed
network calls the aggregator. The aggregator then
extracts the previous aggregation events generated
for a certain period of time from the database and,
using the similarity functions defined in section
Figure 3 Processing flow of the ThreadEvent Maker: alert events are dequeued from the alert queue; if the ThreadEvent table has a meta ThreadEvent with the same source IP and attack class, a sub ThreadEvent is created and attached to the matching meta ThreadEvent.
Correlator
By analyzing the timing and causal relation between aggregation events, the correlator can catch
attack scenarios that are carried out in multiple
steps and accumulate a store of knowledge about
new attack patterns. We therefore suppressed the
minimum expectation of similarity on the source
and destination of the attack as shown in Table 2.
We also enforced the similarity expectation on the
source and destination of the attack. Moreover, to
correlate various attacks with the same destination, the minimum expectation and similarity expectation of the attack class are set to low.
As shown in Fig. 5, which illustrates a diagram of
the processing flow of the correlator, the correlator
only processes aggregation events. When a new
Figure 4 Processing flow of the aggregator: the control center's network module receives thread events from each managed network's sensor, saves them to the thread event database, and calls the aggregator, which selects previous meta events and updates or creates aggregation events in the aggregation event database.
Table 1 Expectation and minimum expectation of similarity for each feature (aggregator)

Feature            Expectation   Minimum
Source IP          Medium        High
Source port        Low           Low
Destination IP     Medium        High
Destination port   Low           Low
Attack class       High          High
Time               Medium        Medium
aggregation event is transferred from the aggregator, the correlator selects the previously generated
Correlation Events within a certain period. If there
is a matching event in the lists of selected events,
a new aggregation event is merged into that correlation event with the time information. Otherwise,
a new correlation event is generated.
The correlator can provide us with important
information about the similarity of an attack class.
For example, the attack scenarios detected in the
correlator may consist of related attacks, and
these attacks can be considered as similar. Therefore, to construct a more precise matrix, the
results of the correlator should be fed back to
the similarity matrix.
Situator
The situator grasps the trend of attacks being
generated in the network by analyzing the relations between the source and the destination. This
capability enables early detection of the large-scale attacks that originate from many attackers around the world, such as a DDoS or a worm, and it
reduces the response time.
The situator can detect three types of attack:
1:N, N:1, and M:N. The 1:N attack means an attack
that originates from a single source to multiple destinations, such as a network scan and a service scan.
In contrast, the N:1 attack means an attack that originates from multiple sources to a single destination.
One example of an N:1 attack is a DDoS, and
such attacks tend to increase without warning.
Therefore, by analyzing the attack trends in a network, we can detect attacks at an early stage. As
with a worm or virus, an M:N attack has the entire
network as its destination. While this type of attack generates a small number of events for a specific source and destination, it generates a great
number of events in the entire network.
Fig. 6 shows the internal architecture and a simple flow diagram of the situator. The situator saves
each thread event that is transferred from the filters
in candidate lists first. If the number of thread
events saved in each candidate list exceeds a predefined threshold, the situator classifies them into
a corresponding situation and generates the Situation Events. A human operator can reconfigure the
threshold according to the status of the managed
network or the trend of the current attacks. Fig. 7
shows a more detailed flow diagram of the situator.
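The candidate-list counting that the situator performs can be sketched as follows. This is a simplification of our own: it counts events per source or destination against a single operator-tunable threshold, rather than maintaining the separate 1:N, N:1, and M:N candidate lists of Figs. 6 and 7:

```python
from collections import Counter

def situate(thread_events, threshold=3):
    """Classify attack trends from thread events: a source appearing in
    at least `threshold` events suggests a 1:N attack (e.g. a scan);
    a destination appearing that often suggests an N:1 attack (e.g. DDoS).
    A human operator can retune `threshold` for the managed network."""
    by_src = Counter(e['src'] for e in thread_events)   # 1:N candidates
    by_dst = Counter(e['dst'] for e in thread_events)   # N:1 candidates
    one_to_n = [s for s, n in by_src.items() if n >= threshold]
    n_to_one = [d for d, n in by_dst.items() if n >= threshold]
    return one_to_n, n_to_one

# One source hitting three destinations (scan-like), and three sources
# hitting one destination (DDoS-like).
evts = [{'src': 'a', 'dst': d} for d in ('x', 'y', 'z')] + \
       [{'src': s, 'dst': 'v'} for s in ('p', 'q', 'r')]
scan_srcs, ddos_dsts = situate(evts)
```

An M:N (worm-like) situation would additionally require counting distinct source-destination pairs across the whole network, which the sketch omits.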
Hierarchy of correlation
Each component of our system achieves the following hierarchy of correlation, and, to obtain correlation at different stages of the hierarchy, we use multiple event types. For example, we can infer
thread events (within a sensor correlation) and
then merge them into aggregation events or
situation events for center-level inspection. The
aggregation events are correlated into the correlation events again in order to construct the attack
scenarios of multistep attacks. Fig. 8 shows the
overall hierarchy of correlation.
Thread event: a thread event is the primary information unit in our system. The filter first eliminates the redundancies among a large number of
raw alerts and merges them into a small number
of thread events. In this process, the filter does
not use the similarity functions defined in the section Similarity functions. Instead, the filter simply
compares the source and the attack class. The
thread events generated in the filter are transferred to the control center for further processing.
Aggregation event: by setting the minimum expectation of similarity on the source, the destination IP address, and the attack class as high, and by
relaxing the similarity expectation for the time as
Table 2 Expectation and minimum expectation of similarity for each feature (correlator)

Feature            Expectation   Minimum
Source IP          High          High
Source port        Low           Low
Destination IP     High          High
Destination port   Low           Low
Attack class       Low           Low
Time               Low           Low
Figure 5 Processing flow of the correlator: using the similarity matrix, the correlator calculates similarity for incoming aggregation events, updates or creates correlated meta events in the correlated meta event database, and feeds the results back to the similarity matrix.
Similarity functions
Insofar as we consider the similarity of features,
the minimum expectation of similarity, and the
expectation of similarity, our correlation approach
is similar to the probabilistic alert correlation
Figure 6 Internal architecture and simple flow diagram of the situator: thread events feed the N:1, 1:N, and M:N situators, each with its own candidate lists; a DDoS detector (N:1) and a worm detector (M:N) classify the resulting situations.
IP address similarity
If the sources of two different events (or attacks)
belong to the same sub-network, there is greater
probability that the same attacker launched the
two events. This probability may increase exponentially as the matching address becomes longer.
We can infer, therefore, that the similarity of IP addresses agrees with a log scale. The similarity function for the IP address is defined as follows, and its value can be readjusted to a realistic level through more experiments.
IPsimilarity (string IP1, string IP2) {
If perfect match, return 1;
If C class match, return 0.8;
If B class match, return 0.4;
If A class match, return 0.2;
Return 0;
}
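The pseudocode above translates directly into a runnable form; the classful octet comparison follows the paper, while the dotted-quad parsing is our assumption:

```python
def ip_similarity(ip1, ip2):
    """Return similarity of two IPv4 addresses on the paper's log-like
    scale: longer matching classful prefixes score higher."""
    a, b = ip1.split('.'), ip2.split('.')
    if a == b:
        return 1.0          # perfect match
    if a[:3] == b[:3]:
        return 0.8          # class C match (first three octets)
    if a[:2] == b[:2]:
        return 0.4          # class B match (first two octets)
    if a[:1] == b[:1]:
        return 0.2          # class A match (first octet)
    return 0.0
```

For example, two hosts on the same /24 score 0.8, reflecting the intuition that a matching longer prefix makes a common attacker exponentially more likely.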
Port similarity
There may be a high probability that the list of
input events will become a subset of the meta
events, because newly input events are generally at a lower level than the earlier meta events. We therefore
defined the similarity between the port list of two
events as the mean of the similarity between each
input event and the meta event. If the input event has a port list L1 = {x1, x2, ..., xn} and the meta event has a port list L2 = {y1, y2, ..., ym}, then the similarity S between L1 and L2 is defined as follows.
Si = max(1 <= j <= m) Similarity(xi, yj)

S = (1/n) * sum(i = 1..n) Si

Figure 7 Detailed flow diagram of the situator: for each thread event, if a matching entry exists in the 1:N or N:1 candidate lists its count is increased; otherwise a new candidate entry is created.
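The port-list similarity defined above (each input-event port matched against its best counterpart in the meta event, then averaged) can be sketched as follows; the per-port similarity function is an assumption of ours, since the paper does not reproduce it here:

```python
def port_list_similarity(l1, l2, port_sim=None):
    """S = (1/n) * sum_i max_j sim(x_i, y_j): the average, over the input
    event's port list l1, of the best match in the meta event's list l2."""
    if port_sim is None:
        # Exact-match per-port similarity (an assumed placeholder for the
        # paper's per-port function).
        port_sim = lambda x, y: 1.0 if x == y else 0.0
    if not l1 or not l2:
        return 0.0
    return sum(max(port_sim(x, y) for y in l2) for x in l1) / len(l1)

# Both input ports appear in the meta event's list, so S = 1.0.
s = port_list_similarity([80, 443], [80, 8080, 443])
```

Because the average runs over the input event's list, the measure behaves like a subset test: an input list fully contained in the meta list scores 1.0 regardless of extra meta-event ports.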
Time similarity
The time information is important in alert correlation, and time similarity has great significance
when we calculate the overall similarity. For
example, if most features of the two events are
similar, and if the two events are extremely
2
Aggregator
Correlator
Aggregation
Event
Correlation
Event
Thread Event
Alert
Alert
Thread Event
Alert
Situation
Event
Alert
Alert
Alert
Figure 8
M:N Situation
N:1 Situation
Thread Event
Situator
1:N Situation
176
S. Lee et al.
The time similarity is therefore computed from the elapsed time t2 - t1 between the two events.
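A function of this kind can be sketched as follows. Exponential decay and the 60 s half-life are illustrative assumptions rather than the paper's exact definition:

```python
def time_similarity(t1: float, t2: float, half_life: float = 60.0) -> float:
    """Similarity that halves for every `half_life` seconds separating two events.

    The decay shape and the 60 s half-life are assumed tuning choices;
    identical timestamps give 1.0, and the value approaches 0.0 as the
    elapsed time t2 - t1 grows.
    """
    return 0.5 ** (abs(t2 - t1) / half_life)
```

Events one half-life apart score 0.5, so widely separated events contribute little to the overall similarity.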
Overall similarity
After calculating the similarity of each feature, we
need to calculate the overall similarity in order to
decide whether the two events can be correlated.
When calculating the overall similarity, the expectation of similarity and the minimum expectation
of similarity serve as a weight and a necessary condition, respectively. By using the
expectation of similarity, we can attach importance to the significant features. The minimum
expectation of similarity is used as a threshold
value. For instance, certain features can be required to match exactly or approximately for an
event to be considered as a candidate for correlation with another. The minimum expectation thus
expresses necessary, but not sufficient, conditions for correlation.
If any overlapping feature matches at a value less
than the minimum similarity for the feature, the
overall similarity between two events is zero.
Otherwise, the overall similarity is the weighted
average of the similarities of the overlapping features, using the respective expectations of similarity as weights.
As with the probabilistic approach (Valdes and
Skinner, 2001), we can define the overall similarity
between a new event, X, and an earlier event, Y,
as follows.
SIM(X, Y) = ( Σ_j E_j · SIM(X_j, Y_j) ) / ( Σ_j E_j )
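The gating and weighting rules above can be sketched as follows; the feature names and numeric values used for illustration are not taken from the paper:

```python
def overall_similarity(feature_sims, expectations, min_expectations):
    """Weighted average of per-feature similarities, gated by minimum expectations.

    All three arguments are dicts keyed by feature name. The expectations
    E_j act as weights; the minimum expectations act as necessary conditions.
    """
    # Necessary condition: every overlapping feature must reach its minimum.
    for feature, sim in feature_sims.items():
        if sim < min_expectations.get(feature, 0.0):
            return 0.0
    total_weight = sum(expectations.get(f, 0.0) for f in feature_sims)
    if total_weight == 0:
        return 0.0
    # Weighted average using the expectations of similarity as weights.
    return sum(expectations.get(f, 0.0) * s
               for f, s in feature_sims.items()) / total_weight
```

For example, with similarities {ip: 0.8, port: 1.0, time: 0.5} and weights {ip: 2, port: 1, time: 1}, the overall similarity is (2·0.8 + 1.0 + 0.5)/4 = 0.775; raising the minimum expectation on ip above 0.8 forces the result to zero.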
Performance evaluation
To assess the processing power and efficiency of our
system, we measured the reduction ratio of the
filter and the correlator, and the processing time of
each component. To evaluate the correlation performance, we also conducted the following scenario-based tests using known techniques and exploits.
Scenario #1: stealth scan of a specific host
Scenario #2: Buffer Overflow attack on an FTP server
Scenario #3: CGI attack on a Web server
Scenario #4: Buffer Overflow attack on an RPC service
Scenario #5: network scan of multiple hosts [1:N attack]
Scenario #6: DDoS attack [N:1 attack]
Scenario #7: attack using a worm and virus [N:M attack]
Although useful for evaluating the performance
of the IDS, the DARPA data set is unsuitable for
evaluating the correlation system. We therefore
conducted known attack scenarios to inspect
whether our correlation component successfully
detects multistep attacks and large-scale attacks
in the early stage. In this paper, we only present
the results of Scenarios #2, #4, and #6.
Reduction ratio of filter and aggregator
We measured the reduction ratio of the filter and
the aggregator by dividing the number of events
generated in the filter or the aggregator by the
total number of alerts generated in the test
period. The results are shown in Table 3.
When we set the timer in the filter to 1 min, the
average reduction ratio of the filter is 11.1%. In
the case of ICMP Nachi Worm by Ping CyberKit,
the filter on average merged 20 alerts into a single
thread event, and the maximum reduction ratio
was 5%.
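The reported ratios follow directly from the counts in Table 3; a quick check:

```python
# Reduction ratios of Table 3, computed from the raw counts.
total_alerts = 599_403
thread_events = 66_775        # produced by the filter
aggregation_events = 33_173   # produced by the aggregator

filter_ratio = thread_events / total_alerts            # about 11.1%
aggregator_ratio = aggregation_events / total_alerts   # about 5.5%
```

Merging 20 alerts into a single thread event likewise corresponds to a 1/20 = 5% ratio, matching the maximum reported for the ICMP Nachi Worm case.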
Processing time of each component
To assess the processing efficiency of each component, we measured the time required for a single
thread event (which may include several alerts) to
be processed completely in each component.
As shown in Table 4, our system has real-time
processing capability, since a single thread event
can be processed completely by all components
within 0.8 s. Furthermore, in contrast to other systems, in which most processes are conducted manually, the automation in our system can drastically
reduce the management overhead of human operators.
Table 3  Reduction ratio of filter and aggregator

Total # of alerts: 599,403

Filter
  # of thread events: 66,775
  Average reduction ratio (# of thread events / # of alerts): 11.1%

Aggregator
  # of aggregation events: 33,173
  Average reduction ratio (# of aggregation events / # of alerts): 5.5%
Table 4  Processing time of each component

Component      Processing time (s)
               0.0097
               0.5398
               0.0887
               0.1691
Total          0.8103
Figure 9
Figure 10
Fig. 10 shows the thread events that are generated as a consequence of a Buffer Overflow attack
on an FTP server. As shown in Fig. 10, current IDSs
usually provide the detected alerts as they are.
Figure 11
Figure 12
Figure 13
A DDoS attack, which is carried out in order to interrupt the service provision or normal operation of a specific host,
causes a great deal of overhead in a managed
network. Most IDSs, however, cannot detect such an
attack in the early stage.
We emulated a DDoS attack with the aid of an
ICMP Flooder. As shown in Fig. 16, the ICMP Flooder
continuously transfers large packets to the target
of the attack.
Fig. 17 shows the thread events that were transferred to the control center. The thread events
generated by our emulated DDoS attack are the
events included in the rectangles. As can be seen
in the Count field of the table, a DDoS attack
usually generates a large volume of alerts within
a short period. When such a large volume of
alerts is transferred to the control center without
preprocessing (that is, without merging in the filter),
as is the case in most IDSs, human operators may be
easily overwhelmed and react inappropriately.
Our system, however, can reduce the numerous
alerts to a small volume that human operators
can easily handle. For example, more than a thousand alerts can be merged into just 10 thread
events, as shown in Fig. 17.
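The filter's merging step can be sketched as grouping alerts that share a signature and source within a time window. The grouping key, the window length, and the alert fields used here are assumptions, since this excerpt does not specify the filter's exact keys:

```python
def merge_alerts(alerts, window=60.0):
    """Merge raw alerts into thread events.

    Alerts sharing the same (signature, source) within `window` seconds of the
    previous alert are collapsed into one thread event carrying a count, in the
    spirit of the Count field described above.
    """
    threads = []
    open_threads = {}  # (signature, source) -> index into threads
    for alert in sorted(alerts, key=lambda a: a["time"]):
        key = (alert["signature"], alert["source"])
        idx = open_threads.get(key)
        if idx is not None and alert["time"] - threads[idx]["last"] <= window:
            threads[idx]["count"] += 1          # extend the existing thread
            threads[idx]["last"] = alert["time"]
        else:
            open_threads[key] = len(threads)    # start a new thread event
            threads.append({**alert, "count": 1, "last": alert["time"]})
    return threads
```

A burst of identical alerts from one source thus collapses to a single thread event whose Count records the volume of the flood.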
Furthermore, in the case of alert flooding
(i.e. when a large volume of alerts is generated in
the managed network), the attack count of each
thread event also reflects the average and maximum
reduction ratio of our system.

Figure 14
Related work
Several alert aggregation and correlation techniques
(Perrochon et al., 2000; Cuppens, 2001a,b; Cuppens
et al., 2002; Debar and Wespi, 2001; Valdes and Skinner, 2001; Porras et al., 2002; Ning et al., 2002; Morin
et al., 2002) have been proposed to facilitate the
analysis of intrusions. In their own ways, these
approaches try to find relationships between
alerts and to generate significant information.
Figure 15
Perrochon et al. (2000) used predefined rules to correlate alerts and to find attack scenarios. Cuppens (2001a,b) and Cuppens et al. (2002) used
the LAMBDA language to specify attack scenarios and
used Prolog predicates to correlate alerts based on
the IDMEF data model. In Debar and Wespi (2001), an aggregation and correlation component was built into
a Tivoli Enterprise Console. In Valdes and Skinner
(2001), a probabilistic method was used to correlate
alerts by using the similarity between their features.
Porras et al. (2002) proposed a mission-impact-based
approach to analyzing the security alerts produced
by spatially distributed heterogeneous information
security (INFOSEC) devices. They intended to provide analysts with a powerful capability to automatically fuse together and isolate the INFOSEC alerts
that represent the greatest threat to the health
and security of their networks. Ning et al. (2002) developed three utilities to facilitate the analysis of
large sets of correlated alerts. In Morin et al. (2002), M2D2, a formal data model for IDS alert correlation, was proposed.
Figure 16
Figure 17
Figure 18
Acknowledgement
This work was supported by the Korea Science and
Engineering Foundation (KOSEF) through the Advanced Information Technology Research Center
(AITrc) and the University IT Research Center Project.
References
Bloedorn E, Christiansen AD, Hill W, Skorupka C, Talbot LM, Tivel J. Data mining for network intrusion detection: how to get started. MITRE Technical Report; August 2001.
Browne H, Arbaugh W, McHugh J, Fithen W. A trend analysis of exploitations. In: Proceedings of the 2001 IEEE symposium on security and privacy; May 2001. p. 214–29.
Bugtraq. Security Focus online, <http://online.securityfocus.com/archive/1>.
CERT Coordination Center. CERT/CC advisories. Carnegie Mellon Software Engineering Institute. Online, <http://www.cert.org/advisories/>.
Cuppens F. Cooperative intrusion detection. In: International symposium Information Superiority: Tools for Crisis and Conflict-Management. Paris, France; September 2001a.
Cuppens F. Managing alerts in a multi-intrusion detection environment. In: 17th annual computer security applications conference (ACSAC). New Orleans; December 2001b.
Cuppens F, Autrel F, Miege A, Benferhat S. Correlation in an intrusion detection process. In: Internet security communication workshop (SECI02). Tunis, Tunisia; September 2002.
Debar H, Wespi A. Aggregation and correlation of intrusion-detection alerts. In: Proceedings of 2001 international workshop on recent advances in intrusion detection. Davis, CA; October 2001.
Kendall K. A database of computer attacks for the evaluation of intrusion detection systems. Master's thesis. Massachusetts Institute of Technology; June 1999.
Lee W. A framework for constructing features and models for intrusion detection systems. PhD thesis. Columbia University; June 1999.
Lee W, Nimbalkar RA, Yee KK, Patil SB, Desai PH, Tran TT, et al. A data mining and CIDF-based approach for detecting novel and distributed intrusions. In: Proceedings of 2000 international workshop on recent advances in intrusion detection (RAID'00). Toulouse, France; October 2000.
Morin B, Me L, Debar H, Ducasse M. M2D2: a formal data model for IDS alert correlation. In: Proceedings of the fifth international symposium on recent advances in intrusion detection (RAID'02). LNCS 2516. Zurich, Switzerland; October 16–18, 2002. p. 115–37.
Ning P, Cui Y, Reeves DS. Analyzing intensive intrusion alerts via correlation. In: Proceedings of the fifth international symposium on recent advances in intrusion detection (RAID'02). LNCS 2516. Zurich, Switzerland; October 2002. p. 74–94.
NMAP network mapping tool, <http://www.insecure.org/nmap/>.
www.elsevier.com/locate/cose
Institute for Development and Research in Banking Technology, Castle Hills, Road Number 1,
Masab Tank, Hyderabad-500057, India
K. R. School of Information Technology, Indian Institute of Technology, Mumbai-400076, India
Received 20 January 2005; revised 16 August 2005; accepted 23 September 2005
KEYWORDS
Authentication;
Bilinear pairings;
Smart card;
Password;
Timestamp
Abstract  The paper presents a remote user authentication scheme using the
properties of bilinear pairings. In the scheme, the remote system receives a user
login request and allows login to the remote system if the login request is valid.
The scheme prohibits the scenario of many logged-in users with the same login-ID, and provides a flexible password change option to the registered users without
any assistance from the remote system.
2005 Elsevier Ltd. All rights reserved.
Introduction
Password authentication is an important technique
for verifying the legitimacy of a user. The technique is
regarded as one of the most convenient methods
for remote user authentication. Based on their
computational complexity, password-based authentication schemes are classified into two broad
categories.
0167-4048/$ - see front matter 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.cose.2005.09.002
user login request. The login request verification requires the user identity and the remote system's public key corresponding to the remote system's
secret key.
- The scheme prevents the scenario of many
logged-in users with the same login-ID. Typically, a registered user can share his password or
secret component with others, and thus all who
know the password or secret component corresponding to the user's login-ID can log in to the
remote system. This commonly happens in a digital library, where a subscriber can share his
login-ID and password with others, and many
users (who know the login-ID and password) can
download or view the digital documents. In
our scheme, the login request is generated by
the smart card using its stored secret component, without any human intervention. It is
extremely difficult to extract the secret
component from the smart card, and thus the
user cannot share it with others. Even if the
legitimate user's password is shared with
others, the other person cannot log in to the
system without the smart card. Once a valid
user logs into the remote system, his smart
card remains inside the terminal until the user
logs out. If the user pulls the card out of
the terminal after logging into the remote system,
the login session expires immediately.
Thus, the scheme can successfully prevent
the scenario of many logged-in users with the
same login-ID.
- The scheme can resist the replay, forgery and
insider attacks.
The rest of the paper is organised as follows. In
the next section, we give some preliminaries of
bilinear pairings. In the section following that, we
propose our scheme, and we analyse it in the section
Correctness, performance and security.
Finally, we conclude the paper in the last section.
Preliminaries
Bilinear pairings
Suppose G1 is an additive cyclic group generated
by P, whose order is a prime q, and G2 is a multiplicative cyclic group of the same order. A map
ê : G1 × G1 → G2 is called a bilinear mapping if it
satisfies the following properties:

1. Bilinear: ê(aP, bQ) = ê(P, Q)^(ab) for all P, Q ∈ G1 and a, b ∈ Z_q*.
2. Non-degenerate: there exist P, Q ∈ G1 such that ê(P, Q) ≠ 1.
3. Computable: there is an efficient algorithm to compute ê(P, Q) for all P, Q ∈ G1.
We note that G1 is the group of points on an
elliptic curve and G2 is a multiplicative subgroup
of a finite field. Typically, the mapping e^ will be
derived from either the Weil or the Tate pairing
on an elliptic curve over a finite field.
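A toy stand-in for such a map can be built over integer groups, purely to illustrate the bilinearity and non-degeneracy properties; the groups, modulus, and generator below are illustrative and offer none of the security of the Weil or Tate pairing:

```python
# Toy "pairing": G1 is the additive group Z_q (generator P = 1), G2 is the
# order-q multiplicative subgroup of Z_p* generated by g, and e(x, y) = g^(x*y).
q = 11   # common prime order of G1 and G2
p = 23   # field modulus; 23 = 2*11 + 1, so Z_23* has a subgroup of order 11
g = 4    # generator of that order-11 subgroup (4 = 2^2, and 2^11 = 1 mod 23)

def pairing(x: int, y: int) -> int:
    """e(x, y) = g^(x*y) in G2; bilinear because the exponents multiply."""
    return pow(g, (x * y) % q, p)

# Bilinearity: e(aP, bQ) == e(P, Q)^(ab)
P, Q, a, b = 3, 5, 7, 2
assert pairing((a * P) % q, (b * Q) % q) == pow(pairing(P, Q), (a * b) % q, p)
# Non-degeneracy: e(P, Q) != 1 for suitable P, Q
assert pairing(1, 1) != 1
```

Real schemes replace Z_q with an elliptic-curve group and g^(x*y) with a pairing computed by Miller's algorithm; only the algebraic shape carries over.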
Mathematical problems
Proposed scheme
There are three entities in the proposed scheme,
namely the user, users smart card and the remote
system. The scheme consists of mainly three
phases e the setup phase, the registration phase
and the authentication phase.
Setup phase
Suppose G1 is an additive cyclic group of order
prime q, and G2 is a multiplicative cyclic group of
the same order. Suppose P is a generator of G1,
e^ : G1 !G1 G2 is a bilinear mapping and H: {0,
Registration phase
This phase is executed by the following steps when
a new user wants to register with the RS.
R1. Suppose a new user Ui wants to register with
the RS.
R2. Ui submits his identity IDi and password PWi to
the RS.
R3. On receiving the registration request, the RS
computes RegIDi = s·H(IDi) + H(PWi), where s is the secret key of the RS.
R4. The RS personalizes a smart card with the
parameters IDi, RegIDi, H(·) and sends the
smart card to Ui over a secure channel.
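The registration computation can be sketched in a toy additive group Z_q, with SHA-256 reduced mod q standing in for the map-to-point hash H; the group and modulus are illustrative assumptions, not the elliptic-curve setting of the scheme:

```python
import hashlib

q = 1_000_003  # a prime, used as the toy group order (illustrative only)

def H(msg: str) -> int:
    """Toy map-to-group hash: SHA-256 reduced mod q."""
    return int.from_bytes(hashlib.sha256(msg.encode()).digest(), "big") % q

def register(s: int, identity: str, password: str) -> int:
    """Step R3: the RS computes RegID = s*H(ID) + H(PW) with its secret key s."""
    return (s * H(identity) + H(password)) % q
```

The RS would then personalize the smart card with the resulting RegID (step R4).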
Authentication phase
This phase is executed every time a user
logs into the RS. The phase is further divided into
the login and verification phases. In the login
phase, the user sends a login request to the RS. The
login request comprises a dynamic coupon,
called DID, which depends on the user's ID,
password and the RS's secret key. The RS allows the
user to access the system only after successful
verification of the login request.
Login phase
The user Ui inserts the smart card into a terminal and
keys in IDi and PWi. If IDi is identical to the one
stored in the smart card, the smart card performs
the following operations:

L1. Computes DIDi = T·RegIDi, where T is the user
system's timestamp.
L2. Computes Vi = T·H(PWi).
L3. Sends the login request ⟨IDi, DIDi, Vi, T⟩ to the
RS over a public channel.
Verification phase
Let the RS receive the login message ⟨IDi, DIDi, Vi,
T⟩ at time T* (≥ T). The RS performs the following
operations to verify the login request:

V1. Verifies the validity of the time interval between T* and T. If (T* − T) ≤ ΔT, where ΔT denotes the expected valid transmission delay, the RS proceeds with the verification; otherwise, it rejects the login request.
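The login computations and the V1 freshness check can be sketched in the same toy-group style, with modular arithmetic in Z_q standing in for the elliptic-curve operations. The closing assertion shows the relation DID - V = T·s·H(ID), which is the algebraic fact a pairing-based verification can exploit; the remaining verification steps are only hinted at by this relation:

```python
import hashlib

q = 1_000_003  # toy prime group order (illustrative only)

def H(msg: str) -> int:
    return int.from_bytes(hashlib.sha256(msg.encode()).digest(), "big") % q

def login_request(identity: str, password: str, reg_id: int, T: int):
    """Steps L1-L3: the smart card computes DID = T*RegID and V = T*H(PW)."""
    did = (T * reg_id) % q
    v = (T * H(password)) % q
    return identity, did, v, T

def v1_fresh(T: int, T_star: int, delta: int = 5) -> bool:
    """Step V1: accept only if the transmission delay T* - T is within delta."""
    return 0 <= T_star - T <= delta

s = 42                                       # RS secret key (toy value)
reg_id = (s * H("alice") + H("secret")) % q  # from the registration phase
ID, DID, V, T = login_request("alice", "secret", reg_id, T=1000)
assert v1_fresh(T, T_star=1003)              # arrives within the allowed delay
# DID - V = T*s*H(ID): a relation the RS can check using its secret s.
assert (DID - V) % q == (T * s * H("alice")) % q
```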
Performance
In order to compare the performance of our
scheme with the existing public-key based remote
user authentication schemes, we consider the
schemes (Chang and Liao, 1994; Shen et al.,
2003), which are based on ElGamal's (1985)
signature scheme and use smart cards. The smart
card personalization cost for the registration process of our scheme is comparable to that of the schemes in (Chang
and Liao, 1994; Shen et al., 2003). The login phase
in (Chang and Liao, 1994; Shen et al., 2003) requires four discrete logarithm operations, one
scalar multiplication and one hash computation;
whereas the verification phase requires two discrete logarithm operations, one scalar multiplication, one hash computation and one inverse
operation. Our scheme needs two scalar multiplications of elliptic curve point and one hash to
point operation in the login phase; whereas two bilinear pairing operations, one scalar multiplication
of curve point, one point addition and one hash to
point operation in the verification phase. As the
pairing operation is costly (Barreto et al., 2002),
the verification phase of our scheme incurs
a higher computation cost than the verification phase in (Chang and Liao, 1994; Shen et al.,
2003). However, the verification process is performed
by the RS, which has ample computational resources, so
the computation cost of the verification process
is not a constraint. The computation cost at the
user's system (e.g., the smart card) is the crucial issue,
and the login phase of our scheme is more efficient
than the login phase in (Chang and Liao, 1994;
Shen et al., 2003). Furthermore, our scheme
claims the following characteristics:
The proposed scheme can successfully prevent the
scenario of many logged-in users with the same
login-ID.
Security
Here, we show that the proposed scheme can
withstand the following attacks.
Replay attack
Suppose an adversary replays an intercepted valid
login request and the RS receives the request at
time Tnew. The attack cannot work, because it fails
step (V1) of the verification phase: the time
interval (Tnew − T) exceeds the expected transmission delay ΔT.
Forgery attack
A valid user login message consists of IDi, DIDi, Vi
and T, where DIDi = T·RegIDi and Vi = T·H(PWi).
The RegIDi is stored in the smart card by the RS at the
time of Ui's registration, and it is extremely
difficult to extract RegIDi from the smart card. An
adversary cannot construct a valid RegIDi (= s·H(IDi) + H(PWi)) without the knowledge of the RS's
secret key s and the user's password. If an adversary
intercepts a valid login message ⟨IDi, DIDi, Vi, T⟩,
he cannot successfully resend it later, because the timestamp will
be different the next time, and the replayed message fails step
(V1) of the verification phase.
Conclusion
We proposed a remote user authentication scheme
using the properties of bilinear pairings. The
scheme prevents forgery attacks by employing a dynamic login request in
every login session. The use of a smart card not
only makes the scheme secure but also prevents
the users from distributing their login-IDs, which
effectively prohibits the scenario of many logged-in
users with the same login-ID. Moreover, the scheme
provides a flexible password change option, whereby
registered users can change their passwords without any assistance from the remote system.
References
Barreto PSLM, Kim HY, Lynn B, Scott M. Efficient algorithms for pairing-based cryptosystems. In: Advances in cryptology – Crypto'02, LNCS, vol. 2442. Springer-Verlag; 2002. p. 354–68.
Boneh D, Franklin M. Identity-based encryption from the Weil pairing. In: Advances in cryptology – Crypto'01, LNCS, vol. 2139. Springer-Verlag; 2001. p. 213–29.
Chang CC, Wu TC. Remote password authentication with smart cards. IEE Proceedings-E 1993;138(3):165–8.
Chang CC, Liao WY. A remote password authentication scheme based upon ElGamal's signature scheme. Computers & Security 1994;13(2):137–44.
Cocks C. An identity based encryption scheme based on quadratic residues. In: Cryptography and coding, LNCS, vol. 2260. Springer-Verlag; 2001. p. 360–3.
ElGamal T. A public key cryptosystem and a signature scheme based on the discrete logarithms. IEEE Transactions on Information Theory 1985;31(4):469–72.
Frey G, Ruck H. A remark concerning m-divisibility and the discrete logarithm in the divisor class group of curves. Mathematics of Computation 1994;62:865–74.
Hess F. Efficient identity based signature schemes based on pairings. In: Selected areas in cryptography '02, LNCS, vol. 2595. Springer-Verlag; 2003. p. 310–24.
Hsieh BT, Sun HM, Hwang T. On the security of some password authentication protocols. Informatica 2003;14(2):195–204.
Hwang JJ, Yeh TC. Improvement on Peyravian–Zunic's password authentication schemes. IEICE Transactions on Communications 2002;E85-B(4):823–5.
IEEE P1363.2 draft D12: standard specifications for password-based public key cryptographic techniques. IEEE P1363 working group; 2003.
Ku WC, Chen CM, Lee HL. Weaknesses of Lee–Li–Hwang's hash-based password authentication scheme. ACM Operating Systems Review 2003;37(4):9–25.
Ku WC. A hash-based strong-password authentication scheme without using smart cards. ACM Operating Systems Review 2004;38(1):29–34.
Lamport L. Password authentication with insecure communication. Communications of the ACM 1981;24(11):770–2.
Lee CC, Li LH, Hwang MS. A remote user authentication scheme using hash functions. ACM Operating Systems Review 2002;36(4):23–9.
Menezes A, Okamoto T, Vanstone S. Reducing elliptic curve logarithms to logarithms in a finite field. IEEE Transactions on Information Theory 1993;39:1639–46.
Menezes A, van Oorschot PC, Vanstone S. Handbook of applied cryptography. CRC Press; 1996.
Peyravian M, Zunic N. Methods for protecting password transmission. Computers & Security 2000;19(5):466–9.
Shamir A. Identity-based cryptosystems and signature schemes. In: Advances in cryptology – Crypto'84, LNCS, vol. 196. Springer-Verlag; 1984. p. 47–53.
Shen JJ, Lin CW, Hwang MS. A modified remote user authentication scheme using smart cards. IEEE Transactions on Consumer Electronics 2003;49(2):414–6.
Shimizu A, Horioka T, Inagaki H. A password authentication method for contents communication on the Internet. IEICE Transactions on Communications 1998;E81-B(8):1666–73.
Manik Lal Das received his M. Tech.
degree in 1998. He is working in Institute for Development and Research in
Banking Technology, Hyderabad as Research Officer and pursuing his Ph.D.
degree in K. R. School of Information
Technology, Indian Institute of Technology, Bombay, India. He has published over 15 research articles in
refereed journals and conferences. He is
a member of Cryptology Research Society of India and Indian Society for Technical Education. His research interests include Cryptography and Information Security.
Ashutosh Saxena received his M.Sc.
(1990), M. Tech. (1992) and Ph.D. in
Computer Science (1999) from Devi
Ahilya University, Indore. Presently,
he is working as Associate Professor
in Institute for Development and Research in Banking Technology, Hyderabad. He is on the Editorial Committees
of various International Journals and
Conferences, and is a Life Member of
Computer Society of India and Cryptology Research Society of India and Member of IEEE Computer
Society. He has authored and co-authored more than 50 research papers published in national/international journals and
Conferences. His main research interest is in the areas of Authentication Technologies, Smart Cards, Key Management and
Security Issues in Banking.
Ved P. Gulati received his Ph.D. degree from Indian Institute of Technology, Kanpur, India. Presently, he is
a consultant advisor in Tata Consultancy Services, Hyderabad, India. He
was Director of Institute for Development and Research in Banking Technology, Hyderabad, India from 1997
to 2004. He is a member of IEEE, Cryptology Research Society of India and
Computer Society of India. His research interests include Payment Systems, Security Technologies, and Financial Networks.
Deepak B. Phatak received his Ph.D.
degree from Indian Institute of Technology, Bombay, India. He is Subrao
M. Nilekani Chair Professor with K. R.
School of Information Technology, Indian Institute of Technology Bombay,
India. His research interests include
Data Bases, System performance evaluation, Smart Cards and Information
Systems.
www.elsevier.com/locate/cose
KEYWORDS
Computer security;
Education;
Courseware;
Laboratory projects;
Minix
Abstract  To address national needs for computer security education, many universities have incorporated computer and information security courses into their undergraduate
and graduate curricula. In these courses, students learn how to design, implement,
analyze, test, and operate a system or a network to achieve security. Pedagogical
research has shown that effective laboratory exercises are critically important to
the success of these types of courses. However, such effective laboratories do
not exist in computer security education.
Intrigued by the successful practice in operating system and network course
education, we adopted a similar practice, i.e., building our laboratories on
an instructional operating system. We use the Minix operating system as the lab basis,
and in each lab we require students to add a different security mechanism to the
system. Benefiting from the instructional operating system, we design our lab exercises in such a way that students can focus on one or a few specific security concepts while doing each exercise. A similar approach has proved to be effective
in teaching operating system and network courses, but it has not yet been used
in teaching computer security courses.
2005 Elsevier Ltd. All rights reserved.
Introduction
The high priority that information security education warrants has been recognized since the early
1990s. In 2001, Eugene Spafford, director of the
Center for Education and Research in Information
Assurance and Security (CERIAS) at Purdue University, testified before Congress that to ensure safe
doi:10.1016/j.cose.2005.09.011
computing, the security (and other desirable properties) must be designed in from the start. To do
that, we need to be sure all of our students
understand the many concerns of security, privacy,
integrity, and reliability (Spafford, 1997).
To address these needs, many universities have
incorporated computer and information security
courses into their undergraduate and graduate
curricula. In many curricula, computer security
and network security are two core courses. These
courses teach students how to design, implement,
analyze, test, and operate a system or a network
with the goal of making it secure. Pedagogical
research has shown that students' learning is
enhanced if they can engage in a significant
amount of hands-on exercise. Therefore, effective laboratory exercises (or course projects) are
critically important to the success of computer
security education.
Traditional courses, such as operating systems,
compilers, and networking, have effective laboratory exercises as the result of 20 years of maturation. In contrast, laboratory designs in security
education courses are still embryonic. A variety of
approaches are currently used; three of the most
frequently used designs are the following: (1) the
free-style approach, i.e., instructors allow students to pick any security-related topic they are
interested in for the course projects; (2) the dedicated computing environment approach, i.e., students conduct security implementation, analysis
and testing (Hill et al., 2001; Mayo and Kearns,
1999) in a contained environment; and (3) the
build-it-from-scratch approach, i.e., students
build a secure system from scratch (Mitchener
and Vahdat, 2001).
Free-style design projects are effective for
creative students; however, most students become
frustrated with this strategy because of the difficulty of finding an interesting topic. With the
dedicated environment approach, projects can
be very interesting, but they come with the logistical burdens of
the laboratory: obtaining, setting up, and managing the computing environment. In addition,
course size is constrained by the size of the
dedicated environment. The third design approach
requires students to spend a considerable amount of
time on activities that are irrelevant to computer
security education but are essential for a meaningful and functional system.
The lack of an effective and efficient laboratory
for security courses motivated us to consider
practices adopted by the traditional mature
courses, e.g., operating systems (OS) and compilers. In OS courses, a widely adopted successful
practice is using an instructional OS (e.g., Minix).
lectures to identify, exploit, and fix those
vulnerabilities.
Our approach is open-ended, i.e., we can add
more laboratory projects to this framework without affecting the others. The projects presented in
this paper are the result of three years of maturation,
with more components added each year. We are
also planning to design a number of network
security projects for Minix based on Minix's
existing networking functionality.
The paper is organized as follows: the next
section briefly describes our computer security
course. Then the design of our courseware is
described, followed by a description of
each of our laboratory projects. Further, the
experiences and lessons we have gained during
our three years of practice are presented. Finally, the last
section concludes the paper and describes
future work.
Pedagogical approach
Lecturing on the theories, principles and techniques of
computer security is not enough for students to
understand system security. Students must be able
to put what they have learned to use. We use the
learning-by-doing approach. Other studies have shown
that this type of active learning
approach has a higher chance of having a lasting
effect on students than letting students passively
listen to lectures without reinforcement (Meyers
and Jones, 1993).
More specifically, we use the Minix OS as
our base system to develop assignments that
give students hands-on experience with the theories taught in class. For example, when teaching
the Set-UID concept of Unix, we developed an assignment for students to play with this security
mechanism, figure out why it is needed, and
understand how it is implemented.
We have developed two types of assignments:
small assignments and comprehensive assignments. Each small assignment focuses on one
specific concept, such as Set-UID or access control. These assignments are usually small; they do
not need much programming and take only one or two
weeks; therefore, we can have several small projects covering a variety of concepts in system security. However, being able to deal with each
individual concept is not enough; students also need
to learn how to put the concepts together. We have therefore developed comprehensive assignments, which cover
a number of concepts in one assignment. They
are ideal candidates for final projects.
Course prerequisites
Because this course focuses on system security, we
require students to have an appropriate systems background. Students taking the course are expected
to have taken a graduate-level operating systems course. They should also be proficient in C programming.
Figure 1  (Minix instructional operating system; vulnerabilities pool; laboratory modules: Preparation, Privilege (Set-UID), Access Control, Capability, Sandboxing, Encrypted File System)
W. Du et al.
Table 1  Course projects
Laboratory setup
We use Minix on Solaris in our course. All of the
laboratory exercises are conducted in the Sun
Solaris environment using the C language. Except
for giving students extra disk space (100 MB) to
store the files of the Minix system, Minix poses no
special requirements on the general Solaris
computing environment.
The Minix operating system can also be installed in simulated environments such as VMware,
Bochs, and so on. Installing the operating system
on VMware is not a difficult process, and no superuser privilege is needed to run Minix on VMware.
Therefore, this could be another installation option. Both approaches can be used in our laboratory designs. However, we preferred the
Solaris approach, so that students do not need to buy
a VMware license or use freeware that is
not yet stable.
We have designed a variety of course projects
on Minix. Depending on the course schedule and
the students' familiarity with Unix and their proficiency in C programming, instructors might want
to choose a subset of the projects we designed.
Currently, we are still developing more assignments, and we will also solicit contributions from
(Table: comparison of instructional OSs (Minix, Nachos, Xinu) and commercial OSs (Linux, BSD, SunOS, Windows) on the criteria Complete, Complex, Superuser privilege, and Modularized.)
other people. Our goal is to create a pool of lab assignments, such that different instructors can
choose the subset to meet the requirements of
their syllabi.
Preparation
In this warm-up project, students get familiar with the Minix operating system: installing and compiling the Minix OS, conducting simple administration tasks (e.g., adding/removing users), and learning to use and modify some common utilities. More importantly, we want students to understand the Minix kernel. For our system security course, students need to understand in detail only the system calls, the file system, and the data structures of the i-node and the process table. They do not need to study non-security modules such as process scheduling and memory management. Students meeting the prerequisites should be comfortable with the Minix environment in 2-3 weeks.
The following is a list of sample tasks we used. In practice, instructors can choose different tasks to achieve the same goals:
Compile and install Minix, then add three user
accounts to the system.
Change the password verification procedure,
such that a user is blocked for 15 min after
three failed trials.
Implement system calls to enable users to print out attributes in the i-node and process table. Appropriate security checking should be implemented to ensure that a user cannot steal information from other accounts.
Our experience shows that it is better to guide students through the above tasks in one or two lab sessions, in which a teaching assistant can provide immediate help. These lab sessions are especially necessary when students have significantly different backgrounds.
Set-UID programs
Set-UID is an important security concept in Unix operating systems. It is a good example to show students how privileges are escalated in a system. In this project, students learn the Set-UID concept and its implementation. Students also learn how an attacker can escalate his privileges by exploiting a vulnerable Set-UID program.
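The real/effective UID distinction at the heart of Set-UID can be illustrated with a short sketch using standard POSIX calls (the helper names below are ours, not part of the course material):

```c
#include <unistd.h>

/* In a Set-UID program, getuid() returns the invoking user's (real)
 * UID while geteuid() returns the file owner's (effective) UID;
 * privileges are derived from the effective UID. */
int running_setuid(void)
{
    return getuid() != geteuid();
}

/* Drop an elevated effective UID back to the real UID.
 * Returns 0 on success, -1 on failure. A vulnerable Set-UID program
 * that never does this keeps its owner's privileges for its whole
 * lifetime, which is exactly what an attacker hopes to exploit. */
int drop_privileges(void)
{
    return seteuid(getuid());
}
```

Calling `drop_privileges()` as early as possible is a common defensive idiom in Set-UID programs that only need their elevated privilege briefly.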
Students need to finish the following tasks: (1) figure out why the passwd, chsh, and su commands need to be Set-UID programs, and what will happen if
Storing the ACLs: This is another challenging part of the project. Students need to think about where exactly they should store the access control list. The current Minix implementation does not seem to have a place to store a full access control list, so students need to solve this issue. A hint we give them is to use some unused entries in i-nodes, or to store the access control lists in separate files.
ACL management: In addition to implementing the full ACL in the kernel, students also need to implement the corresponding utilities, such that users can manage the access control lists of their own files.
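To make the design space concrete, the following is one plausible fixed-size ACL representation of the kind a student design might use; the struct layout, names, and entry limit are purely illustrative, not the actual Minix modification:

```c
#define ACL_READ  0x4
#define ACL_WRITE 0x2
#define ACL_EXEC  0x1
#define ACL_MAX_ENTRIES 8   /* small enough to fit in spare i-node space */

struct acl_entry {
    int uid;              /* user the entry applies to */
    unsigned short perm;  /* bitwise OR of ACL_* flags */
};

struct acl {
    int nentries;
    struct acl_entry entries[ACL_MAX_ENTRIES];
};

/* Return nonzero iff `uid` is granted every bit in `want`.
 * A user not listed in the ACL gets no access. */
int acl_check(const struct acl *a, int uid, unsigned short want)
{
    for (int i = 0; i < a->nentries; i++)
        if (a->entries[i].uid == uid)
            return (a->entries[i].perm & want) == want;
    return 0;
}

/* Add or update an entry; returns -1 when the fixed table is full. */
int acl_set(struct acl *a, int uid, unsigned short perm)
{
    for (int i = 0; i < a->nentries; i++)
        if (a->entries[i].uid == uid) {
            a->entries[i].perm = perm;
            return 0;
        }
    if (a->nentries >= ACL_MAX_ENTRIES)
        return -1;
    a->entries[a->nentries].uid = uid;
    a->entries[a->nentries].perm = perm;
    a->nentries++;
    return 0;
}
```

The fixed-size table mirrors the hint about reusing unused i-node entries; the separate-file option trades that size limit for extra disk reads on every permission check.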
Capability
Capability is another important concept in computer security. The goal of this project is to help students understand the concept. We defined a set of capabilities in this project, with each capability representing whether a process can invoke a specific system call. Students need to implement these capabilities in Minix. Specifically, their capability mechanism should achieve the following functionalities: (1) Permission granting based on capabilities. (2) Capability copying: a process should be able to copy its capabilities to another process. (3) Capability reduction/restoration: a process should be able to reduce its current capabilities and later restore them. For example, a process can temporarily remove its own Set-UID capability, but later add it back. Of course, a process cannot assign a new capability to itself. (4) Capability revocation: root should be able to revoke capabilities from processes.
In this project, students need to take care of the following issues:
Capability list representation: Students need to think about how to represent the set of defined capabilities. They also need to think about how to associate capabilities with each process. The final representation should conveniently support the required functionalities (e.g., copying, removing, etc.).
Storing the capabilities: This is another challenging part of the project, where students need to think about where the capabilities should be stored. One option is to add an entry to the process table to store them. A potential issue is how feasible it is to extend the process table (note that the process table is a kernel data structure used by many other components).
Capability revocation: Students need to think about how to revoke an object's capability. They must be careful not to introduce vulnerabilities in this part.
Capability management: Students need to take care of two types of users, normal users and superusers. They need to consider how to manage these two types of users and which functionalities are associated with each of them.
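The required functionalities can be prototyped with a bitmask per process. The sketch below is our own illustration with hypothetical capability names; it shows why keeping a separate "granted" set makes restoration safe: a process can re-enable only capabilities it once held, never assign itself new ones.

```c
typedef unsigned long capset_t;   /* one bit per guarded system call */

#define CAP_SETUID (1UL << 0)
#define CAP_KILL   (1UL << 1)
#define CAP_CHOWN  (1UL << 2)

struct proc_caps {
    capset_t granted;   /* everything the process was ever given */
    capset_t current;   /* what it may use right now */
};

/* Capability copying (functionality 2). */
void cap_copy(const struct proc_caps *src, struct proc_caps *dst)
{
    dst->granted = src->granted;
    dst->current = src->current;
}

/* Temporarily remove a capability (functionality 3, reduction). */
void cap_reduce(struct proc_caps *p, capset_t c)
{
    p->current &= ~c;
}

/* Restore a capability, but only if it was originally granted:
 * a process must not be able to hand itself new capabilities. */
int cap_restore(struct proc_caps *p, capset_t c)
{
    if ((p->granted & c) != c)
        return -1;
    p->current |= c;
    return 0;
}

/* Root-initiated revocation (functionality 4): the capability is
 * removed from both sets, so it cannot be restored later. */
void cap_revoke(struct proc_caps *p, capset_t c)
{
    p->current &= ~c;
    p->granted &= ~c;
}

/* Permission check at system-call entry (functionality 1). */
int cap_permits(const struct proc_caps *p, capset_t c)
{
    return (p->current & c) == c;
}
```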
This project enhanced the students' understanding of the capability concept. At the beginning, most students had trouble mapping the capability concept to the real world. We did not tell the students how capabilities should be implemented, but instead asked them to design their own capability mechanisms. This requires them to figure out how the capabilities should be represented in the system, where to store them, how the system can use them to conduct access control, etc. Once students have figured out all of these issues, the implementation becomes relatively easy; therefore, the amount of coding for this project is not significant, and students are able to accomplish the task within two weeks. Had it not been for Minix, students would have needed to spend a lot of time implementing a meaningful system in which the effect of the capability mechanism could be demonstrated.
We encouraged students to design features beyond the basic requirements. Students were highly motivated: some implemented a more generic capability-based access control mechanism than the required one, and some allowed new capabilities to be defined by the superuser.
Sandbox
A sandbox is an environment in which the actions of an untrusted process are restricted according to a security policy (Bishop, 2002). Such restriction protects the system from untrusted applications. In Unix, chroot can be used to achieve a simple sandbox.
The command chroot newroot cmd causes cmd to be executed relative to newroot, i.e., the root directory is changed to newroot for cmd and any of its child processes. Any program running within this sandbox can only access files within the subdirectory tree of newroot.
Some Unix systems allow a normal user to run a chroot sandbox (by making chroot a Set-UID program). However, this can introduce a serious problem: malicious users may create a login environment with their own shadow and passwd files under newroot, which will help them gain
encryption/decryption operations should be transparent to users. Implementing an EFS requires students to combine techniques such as encryption, key management, authentication, access control, and security in OS kernels and file systems; therefore, it is a comprehensive project, and we assign it as a final project.
Minix has a complete file system, so students can build the EFS on top of it. As we mentioned before, the Minix file system is reasonably easy to understand; students can start building their own EFS after they understand how the file system works.
This project is a good candidate for the final comprehensive project because it covers a variety of security-related concepts and properties:
the advantages and disadvantages of their designs, so they can evaluate their own designs.
Using encryption and hashing algorithms: Although students are provided with code for encryption and hashing algorithms, they still need to learn how to use it correctly. Because AES is a block cipher, students need to deal with the issues related to block size and padding; otherwise, their read/write system calls might not function correctly.
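For instance, a PKCS#7-style padding routine handles the case where the plaintext length is not a multiple of the 16-byte AES block (this is our illustration; the assignment does not prescribe a particular padding scheme):

```c
#include <stddef.h>

#define AES_BLOCK 16

/* PKCS#7-style padding: append k bytes of value k so the total
 * length becomes a multiple of AES_BLOCK. A full extra block is
 * added when `len` is already aligned, so unpadding is unambiguous.
 * `buf` must have room for len + AES_BLOCK bytes. Returns new length. */
size_t pad_block(unsigned char *buf, size_t len)
{
    size_t k = AES_BLOCK - (len % AES_BLOCK);
    for (size_t i = 0; i < k; i++)
        buf[len + i] = (unsigned char)k;
    return len + k;
}

/* Strip the padding after decryption; returns the original length,
 * or (size_t)-1 if the trailing bytes are not valid padding. */
size_t unpad_block(const unsigned char *buf, size_t len)
{
    if (len == 0 || len % AES_BLOCK != 0)
        return (size_t)-1;
    unsigned char k = buf[len - 1];
    if (k == 0 || k > AES_BLOCK)
        return (size_t)-1;
    for (size_t i = 0; i < k; i++)
        if (buf[len - 1 - i] != k)
            return (size_t)-1;
    return len - k;
}
```

A read() system call that returns the decrypted buffer without calling the unpad step will hand userland up to 16 bytes of padding garbage, which is exactly the kind of malfunction the assignment warns about.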
Security analysis: After most of the students had finished their designs, we gave them several incorrect designs that we have encountered in the past and asked them to determine whether those designs are secure; if not, they had to show how to break those EFSs.
Project simplification
For students who do not have sufficient background in operating system kernel programming, we need to customize our projects. We divide the EFS project into three projects:
1. Project 1: Encryption algorithms. This project gets students familiar with the AES algorithm. Students need to implement a user-level program to encrypt and decrypt files.
2. Project 2: Kernel modification. The second project asks students to modify the corresponding system calls, such that some special files are always read and written using encryption. However, to simplify this project, we ask them to always use a fixed key for the encryption; the key can be hard-coded in their programs.
3. Project 3: Key management. This project deals with the key management issue that is intentionally left out of the previous project. Students now need to find a place to store the key; they need to decide whether to use the same key for all files or one key per file; and they need to deal with authentication issues, etc.
Vulnerability analysis
Vulnerability analysis strengthens system security by identifying and analyzing security flaws in computer systems. This project intends to expose students to this critical approach. We have two goals: the first is to let students gain first-hand experience with software vulnerabilities, become familiar with a list of common security flaws, and understand how a seemingly-not-so-harmful flaw in a program can become a risk to a system. The second is to give students opportunities to practice their vulnerability analysis and testing skills. Students learn a number of methodologies in class, such as vulnerability hypothesis, penetration testing methodology, code inspection techniques, and blackbox and whitebox testing (Pfleeger et al., 1989). They need to practice these methodologies in this project.
To achieve our goals, we modify the Minix source code and intentionally introduce a set of vulnerabilities, which we call the injected vulnerabilities. The revised Minix system is then given to students, along with some hints, such as a list of possible vulnerabilities, the possible locations of the vulnerable programs, etc. Their task is to find and verify these vulnerabilities.
The injected vulnerabilities cover a wide spectrum, including buffer overflow, race conditions, security holes in the access control mechanisms, security holes in Set-UID programs, information leakage, and denial of service. These vulnerabilities reflect system flaws caused by incorrect design, implementation, and configuration. All of them are collected from real commercial Unix operating systems, such as SunOS, HP-UX and Linux, and are then ported to Minix. We have ported nine vulnerabilities so far, six at the user level and three at the kernel level. We will port other typical vulnerabilities to Minix in the future.
Students in this project need to accomplish the following tasks:
Identify vulnerabilities. This is a warm-up practice to help students get familiar with the environment in which the vulnerabilities live.
Exploit vulnerabilities. This is a challenging and interesting part of the project, in which students write attack programs aimed at these vulnerabilities. A demonstration is needed to show what unauthorized privileges can be obtained.
Fix vulnerabilities. Students need to design solutions that eliminate or remedy the identified vulnerabilities.
Preparation: From our experience, the preparation project is crucial to the success of the subsequent assignments. Some students who overlooked this assignment found themselves in trouble later. In fact, when we used the proposed approach for the first time, we did not give students this assignment because we thought it was not necessary. As a result, students later spent a great deal of time figuring out how to achieve the tasks in this assignment. Most of the students told us that they spent 80% of their time getting familiar with the system; once they knew how Minix works, they could finish the required tasks quickly. Therefore, when we used the approach again, we used several lectures to cover the necessary materials and asked the TA to devote a significant amount of time to helping the students finish this assignment. The preparation part is extremely important: if students fail it, they will spend enormously more time on the subsequent projects. This was very clear when we compared the performance of the students in our 2003 course with that of the students in 2002. We plan to integrate the materials related to Minix into the lectures, so students can be better prepared.
Background knowledge: We also realized that some students in the class were not familiar with the Unix environment because they had been using Windows most of the time. This brings some challenges, because these students do not know how to set up the PATH environment variable, how to search for a file, etc. We plan to develop materials to help students get over this obstacle.
Cheating: Cheating did occur, especially on the final encrypted file system project. We now have a list of questions that we ask during students' demonstrations. These questions not only help us evaluate students' projects, but have also been quite effective so far in identifying cheating. Example questions include "where do you save the keys, and why?", "can your implementation work on large files, and how did you handle that?", etc. Students who simply copy others' implementations will most likely be unable to answer these questions.
adopted by other people. This requires us to provide detailed documentation, instructions, and a pool of different projects covering a wide range of security concepts.
References
Ashton P. SMX: the Solaris port of Minix; 1996.
Bishop M. Computer security: art and science. Addison-Wesley; 2002.
Bochs, <http://bochs.sourceforge.net>; 2002.
Christopher WA, Procter SJ, Anderson TE. The Nachos instructional operating system. In: Proceedings of the Winter 1993 USENIX conference, San Diego, CA, USA, January 25-29, 1993. p. 481-9. Available from: http://http.cs.berkeley.edu/~tea/nachos.
Comer D. Operating system design: the XINU approach. Prentice Hall; 1984.
Hill JMD, Carver CA Jr, Humphries JW, Pooch UW. Using an isolated network laboratory to teach advanced networks and security. In: Proceedings of the 32nd SIGCSE technical symposium on computer science education, Charlotte, NC, USA, February 2001. p. 36-40.
Landwehr CE, Bull AR, McDermott JP, Choi WS. A taxonomy of computer program security flaws. ACM Computing Surveys September 1994;26(3):211-54.
Mayo J, Kearns P. A secure unrestricted advanced systems laboratory. In: Proceedings of the 30th SIGCSE technical symposium on computer science education, New Orleans, USA, March 24-28, 1999. p. 165-9.
Meyers C, Jones TB. Promoting active learning: strategies for the college classroom. San Francisco, CA: Jossey-Bass; 1993.
Mitchener WG, Vahdat A. A chat room assignment for teaching network security. In: Proceedings of the 32nd SIGCSE technical symposium on computer science education, Charlotte, NC, USA, February 2001. p. 31-5.
Pfleeger C, Pfleeger S, Theofanos M. A methodology for penetration testing. Computers and Security 1989;8(7):613-20.
Spafford EH. February 1997 testimony before the United States House of Representatives subcommittee on technology, computer and network security, 2000. Available from: http://www.house.gov/science/hearing.htm.
Tanenbaum A. Operating systems: design and implementation. 2nd ed. Prentice Hall; 1996.
Tanenbaum A, <http://www.cs.vu.nl/~ast/minix.html>; 1996.
VMware, <http://www.vmware.com>; 1996.
Wenliang Du received the B.S. degree in Computer Science from the University of Science and Technology of China, Hefei, China, in 1993, and the M.S. and Ph.D. degrees from the Computer Science Department at Purdue University, West Lafayette, Indiana, USA, in 1999 and 2001, respectively. During his studies at Purdue, he did research in the Center for Education and Research in Information Assurance and Security (CERIAS). Dr. Du is currently an assistant professor in the Department of Electrical Engineering and Computer Science at Syracuse University, Syracuse, New York, USA. His research background is in computer and network security. In particular, he is interested in wireless sensor network security and privacy-preserving data mining. He is also interested in developing instructional laboratories for security education using instructional operating systems. His research has been supported by the National Science Foundation and the Army Research Office.
Mingdong Shang received his B.S. degree in Electrical and Mechanical Engineering from Beijing University of Aeronautics and Astronautics in 1998. He is currently a Ph.D. student in the Department of Electrical Engineering and Computer Science at Syracuse University. His research interests include computer security and network security, and he has been focusing on developing a Minix-based instructional laboratory environment and lab exercises for computer and network security courses.
Haizhi Xu received his B.S. and M.S. degrees, both in computer engineering, from Harbin Institute of Technology, Harbin, China, in 1995 and 1997, respectively. He is a Ph.D. candidate at Syracuse University, Syracuse, NY, USA, majoring in computer engineering. His current research interests are computer system security, intrusion detection and mitigation, and operating systems.
www.elsevier.com/locate/cose
KEYWORDS
Threshold
cryptography;
Signature schemes;
Multi-secret;
Traceability;
Multiple signing
policies
Abstract In recent years, a great deal of work has been done on threshold signature schemes, and many excellent schemes have been proposed. In Eurocrypt'94, Li et al. [Threshold-multisignature schemes where suspected forgery implies traceability of adversarial shareholders. In: Advances in cryptology - Proceedings of EUROCRYPT '94; 1994. p. 413-9] proposed a threshold signature scheme with traceability, which allows us to trace back to find the signer without revealing the secret keys. In 2001, Lee [Threshold signature scheme with multiple signing policies. IEE Proc Comput Digit Tech 2001;148(2):95-9] proposed a threshold signature scheme with multiple signing policies, which allows multiple secret keys to be shared among a group of users, with each secret key having its own threshold value. In this paper, based on these schemes, we present a traceable threshold signature scheme with multiple signing policies, which not only inherits their properties but also fixes their weaknesses.
2005 Elsevier Ltd. All rights reserved.
Introduction
In order to keep a secret efficiently and safely, Shamir (1979) and Blakley (1979) independently presented (l, n) threshold secret sharing schemes in 1979. In such a scheme, the dealer splits the secret x into shares x_1, ..., x_n among the players, and sends each share to the corresponding player. As
* Corresponding author. Tel.: +86 21 62835602; fax: +86 21 62933504.
E-mail address: cao-zf@cs.sjtu.edu.cn (Z. Cao).
0167-4048/$ - see front matter 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.cose.2005.11.006
will allow any one, or any selected subset, of the launch enable codes to be activated in this scheme. To date, many efficient schemes for sharing more than one secret have been proposed (Blundo et al., 1994; Lee, 2001).
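The (l, n) split-and-reconstruct idea just described can be sketched with toy parameters; a small prime field and plain polynomial evaluation stand in for the cryptographically sized modulus and secure randomness a real scheme requires (all names and values here are illustrative):

```c
/* Toy Shamir (l, n) secret sharing over GF(P), illustration only. */
#define P 2087   /* small prime field modulus */

static long mod_p(long a) { long r = a % P; return r < 0 ? r + P : r; }

/* Modular inverse via Fermat's little theorem: a^(P-2) mod P. */
static long inv_p(long a)
{
    long r = 1, b = a % P, e = P - 2;
    while (e) {
        if (e & 1) r = r * b % P;
        b = b * b % P;
        e >>= 1;
    }
    return r;
}

/* Dealer: share x_i = f(i) for f(t) = secret + a1*t + ... mod P,
 * a degree-(l-1) polynomial, so any l shares determine f. */
long make_share(long secret, const long *coef, int l, long i)
{
    long y = 0, t = 1;
    for (int j = 0; j < l; j++) {
        long c = (j == 0) ? secret : coef[j - 1];
        y = mod_p(y + c * t);
        t = t * i % P;
    }
    return y;
}

/* Reconstruct the secret f(0) from l points (x_k, y_k) by
 * Lagrange interpolation evaluated at 0. */
long reconstruct(const long *x, const long *y, int l)
{
    long s = 0;
    for (int k = 0; k < l; k++) {
        long num = 1, den = 1;
        for (int j = 0; j < l; j++) {
            if (j == k) continue;
            num = mod_p(num * mod_p(-x[j]));        /* (0 - x_j) */
            den = mod_p(den * mod_p(x[k] - x[j]));  /* (x_k - x_j) */
        }
        s = mod_p(s + y[k] * (num * inv_p(den) % P));
    }
    return s;
}
```

Any l of the n shares recover the secret, while l - 1 shares leave every field element equally possible, which is the threshold property the signature schemes below build on.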
Digital signatures are a major research topic in modern cryptography and computer security; the signer needs to take full responsibility for his digital signatures. In 1991, Desmedt and Frankel (1991) combined digital signatures and threshold secret sharing schemes to propose the concept of the threshold signature. As in threshold secret sharing schemes, in a threshold signature scheme the responsibility for signing a document is shared by a group of signers. A threshold signature scheme is designed so that the signature can be created only when the number of participating players reaches the given threshold value.
More precisely, a typical (l, n) threshold signature scheme has the three basic properties:
Any l or more players in the group can cooperate with each other to generate a valid group signature, while revealing no information about their sub-secret keys or the secret key.
Any l - 1 or fewer players in the group cannot create a valid group signature.
Any verifier can verify the group signature knowing only the group public key.
However, Li et al. (1994) have pointed out that most of the (l, n) threshold digital signature schemes proposed so far suffer from the so-called conspiracy attack: any l or more players can cooperate to impersonate any other set of players and forge a signature. To prevent this attack, they added a random number to the shadow to form the sub-secret key held by each player. The additional random number gives the (l, n) threshold signature scheme the property of traceability, which means that we can trace adversarial signers if forgery is suspected. Unfortunately, Michels and Horster (1996) showed that a signer cannot be sure who his cosigners are in Li et al.'s (1994) scheme, and this weakness violates the traceability property.
Corresponding to multi-secret sharing schemes, there are threshold signature schemes with multiple group secret keys. In this kind of scheme, different secret keys can be used to sign documents, depending on the significance of the documents. Once the number of cooperating users is greater than or equal to the threshold value of the group secret key, they can cooperate to sign the document. In 2001, Lee proposed an efficient threshold signature scheme with multiple signing policies. However, in Lee's scheme there are n group secret keys S_0, S_1, ..., S_{n-1}; if the group secret key S_0 is exposed, then the whole scheme is broken. Furthermore, the partial signatures cannot be verified.
In this paper, based on Li et al.'s scheme and Lee's scheme, we present a traceable threshold signature scheme with multiple signing policies. The proposed scheme allows the players to apply different group secret keys to sign documents, and only two sub-secret keys need to be kept by each player. Furthermore, in the proposed scheme we can trace back to find the signer without revealing the secret keys. In addition, the exposure of any group secret key cannot harm the security of the other unexposed group secret keys.
The rest of this paper is organized as follows. In the next section, we first review Li et al.'s scheme and Lee's scheme. Then we propose our scheme and discuss its security. Finally, conclusions are drawn.
Li et al.'s scheme
Li et al. (1994) proposed two (l, n) threshold signature schemes with traceable signers: the first needs a mutually trusted dealer, while the second does not. In this section we only review their first scheme, which needs a mutually trusted dealer (Michels and Horster, 1996).
The dealer picks two large primes p and q with q | (p - 1), a generator g of GF(p) of order q, and a polynomial f(x) = sum_{i=0}^{l-1} a_i x^i mod q with a_i in Z_q* (i = 0, 1, ..., l - 1). Then the dealer determines x = f(0) = a_0 as the group secret key and computes y = g^x mod p as the group public key. The secret share of each player P_i (1 <= i <= n) with identity ID_i is u_i = b_i + f(ID_i) mod q, using a random value b_i in Z_q*, and the public keys are y_i = g^{u_i} mod p and z_i = g^{b_i} mod p.
If a group B with |B| = t of players would like to generate a signature on a message m, then each player P_i (i in B) picks k_i in Z_q* and broadcasts r_i = g^{k_i} mod p. Once all r_i are available, each player and the designated combiner (DC) compute R = prod_{i in B} r_i mod p.
Lee's scheme
Let D be the set of the public values x_1, ..., x_{n+l}, and let B be the union of B_l and B_a. If a group B_l with |B_l| = l of players would like to generate a signature on a message m with threshold value l, then each player P_i (i in B_l) picks r_i in {1, ..., N - 1} and broadcasts u_i = r_i^{L^n} mod N. Once all u_i are available, each player and the DC compute
U = prod_{i in B_l} u_i mod N = R^{L^n} mod N,
e = H(m, U),
where R = prod_{i in B_l} r_i mod N. Then each player P_i (i in B_l) computes
z_i = r_i K_i^{L^{n-l} d_i e} mod N,
where d_i = prod_{j in D, j not in B} (x_i - x_j) * prod_{j in B, j != i} (0 - x_j). Each player P_i (i in B_l) sends the values m and z_i to the DC, who can compute the group signature by
Z = prod_{i in B_l} z_i * prod_{i in B_a} W_i mod N = R a^{d L^{n-l} e} mod N,
where W_i = K_i^{L^{n-l} d_i e} mod N. The group secret keys and public keys have the form S_i = a^{d L^{n-l_i}} mod N and Y_i = a^{d L^{n-i}} mod N, and the signature can be verified by checking U = Z^L Y_{n-l}^e mod N.
Initialization phase
First, the dealer selects the following parameters:
(1) two large primes c and c', with c' | (c - 1);
(2) a generator g of GF(c) of order c';
(3) a number N = pq (with p = 2p' + 1 and q = 2q' + 1), where p, q, p' and q' are large primes, and defines lambda(N) = 2p'q';
The dealer publishes (c, c', g, N, L, L', a, b, H_1, H_2) as the group public parameters and keeps {p, q, p', q', lambda(N), d, f(x)} from being revealed. Let A (|A| = n) be the set of all players' public values x_i in the group, let C (|C| = n - 1) be the set of all public shadows' public values x_i, and let D be the union of A and C.
Then the dealer computes the n group secret keys S_i (i = 0, ..., n - 1) and the corresponding n public group keys Y_i (i = 0, ..., n - 1) as follows:
S_i = a^{d L^i L'^{n-i}} mod N   (1)
Y_i = a^{d L^{n-i} L'^{n-i}} mod N   (2)
Signing phase
Each player P_i (i in B_l) computes
g_i = a^{r_i} mod N,
z_i = a^{s_i e} mod N,
l_i = H_2(a, b, z_i, y_i^e, g_i, g'_i),
w_i = r_i - l_i s_i e,
t_i = b_i + e r_i mod c'.   (10)
Each player and the DC then compute
Z = prod_{i in B_l} z_i * prod_{i in B_a} W_i mod N,   (13)
T = sum_{i in B_l} t_i mod c',   (14)
where W_i = K_i^{L^{n-l_i} d_i e} mod N and d_i = prod_{j in D, j not in B} (x_i - x_j) * prod_{j in B, j != i} (0 - x_j). The tuple (m, Z, T, U, B_l) is the group signature of the signers in B_l on the message m. This signature can be checked by computing e = H_1(m, U, B_l) and verifying that
1 = Z^{L^n} Y_{n-l}^{-e} mod N   (15)
g^T = prod_{i in B_l} v_i * U^e mod c   (16)
hold.
Theorem 1. If 1 = Z^{L^n} Y_{n-l}^{-e} mod N and g^T = prod_{i in B_l} v_i * U^e mod c, then (m, Z, T, U, B_l) is a valid group signature on m.
Proof. Since
W_i = K_i^{L^{n-l_i} d_i e} mod N = a^{L^{n-l} s_i d_i e} mod N,
we have
Z = prod_{i in B_l} a^{s_i e d_i L^{n-l}} * prod_{i in B_a} a^{s_i e d_i L^{n-l} L'^l} mod N = a^{sum_{i in B} s_i e d_i L^{n-l} L'^l} mod N;
also, we have
s_i = (f(x_i)/2) * prod_{j in D, j != i} (x_i - x_j)^{-1} mod p'q'
and
d_i = prod_{j in D, j not in B} (x_i - x_j) * prod_{j in B, j != i} (0 - x_j).
Consequently,
Z^{L^n} Y_{n-l}^{-e} = a^{d e L^{2n-l} L'^l} * a^{-d e L^{2n-l} L'^l} mod N = 1 mod N.
Similarly, since T = sum_{i in B_l} t_i mod c', we have
g^T = prod_{i in B_l} v_i u_i^e mod c = prod_{i in B_l} v_i * U^e mod c,
so equation (16) also holds.
Security discussions
Conclusions
Based on the schemes of Li et al. and Lee, we have devised a traceable threshold signature scheme with multiple signing policies. In the proposed scheme, the exposure of any group secret key cannot harm the security of the other unexposed group secret keys. Moreover, our scheme has the traceability property.
Acknowledgements
This research is supported by the National Natural Science Foundation of China for Distinguished Young Scholars under Grant No. 60225007, the National Research Fund for the Doctoral Program of Higher Education of China under Grant No. 20020248024, and the Science and Technology Research Project of Shanghai under Grant Nos. 04JC14055 and 04DZ07067.
References
Blakley GR. Safeguarding cryptographic keys. In: Proceedings of the AFIPS National Computer Conference, vol. 48, Arlington, VA, June 1979. p. 313-7.
Blundo C, Santis AD, Crescenzo GD, Gaggia AG, Vaccaro U. Multi-secret sharing schemes. In: Desmedt YG, editor. Advances in cryptology - Crypto '94 proceedings. LNCS 839. Berlin: Springer-Verlag; 1994. p. 150-63.
Desmedt Y, Frankel Y. Shared generation of authenticators and signatures. In: Advances in cryptology - Crypto '91 proceedings; 1991. p. 457-69.
Lee NY. Threshold signature scheme with multiple signing policies. IEE Proc Comput Digit Tech March 2001;148(2):95-9.
Li C, Hwang T, Lee N. Threshold-multisignature schemes where suspected forgery implies traceability of adversarial shareholders. In: Advances in cryptology - Proceedings of EUROCRYPT '94; 1994. p. 413-9.
Michels M, Horster P. On the risk of disruption in several multiparty signature schemes. In: Advances in cryptology - Proceedings of Asiacrypt '96; 1996. p. 334-45.
Shamir A. How to share a secret. Commun ACM 1979;22(11):612-3.
Simmons GJ. An introduction to shared secret and/or shared control schemes and their application. In: Contemporary cryptology. IEEE Press; 1991. p. 441-97.
Shoup V. Practical threshold signatures. In: Preneel B, editor. EUROCRYPT 2000. LNCS 1807; 2000. p. 207-20.
Jun Shao received his B.S. degree in Computer Science from Northwestern Polytechnical University in 2003. Currently, he is a doctoral candidate in the Department of Computer Science and Engineering, Shanghai Jiao Tong University. His research interests lie in cryptography and network security.
Zhenfu Cao is a professor and doctoral supervisor in the Department of Computer Science and Engineering, Shanghai Jiao Tong University. His main research areas are number theory, modern cryptography, and information security. He is the recipient of the Youth Award and Research Fund of the Chinese Academy of Sciences (1986), the first-prize Award for Science and Technology in Chinese Universities (2001), and the National Outstanding Youth Fund of China (2002), etc.
KEYWORDS
RFID;
Access control;
Authentication;
Security;
APF
Abstract The objective of this paper is to propose an idea called APF (Authentication Processing Framework) as one way to address the growing concern that unauthorized readers may access the tag (transponder), which could result in violations of the information stored in the tag. On one hand, we discuss the importance of RFID systems; on the other hand, we discuss the security implications that RFID systems have for consumers' privacy and security. In this paper, we weigh these two issues, the importance of the RFID system and the RFID security implications. Having done that, we recommend our idea, APF, as a good method to overcome the above-mentioned problem.
Introduction
A typical RFID system consists of a tag, a reader, an antenna and a host system. Most RFID tags are passive, which means that they are battery-less and obtain the power to operate from the reader, while some tags are battery-powered, which means they are active and do not need power from the reader to function. RFID tags are tiny computer chips connected to miniature antennae that can be affixed to physical objects
* 101 Domiru-Tsuda, 3-25-41 Tsudamachi, Kodaira-shi, Tokyo, Japan. Tel./fax: +81 423 43 4403.
E-mail addresses: ayoadejohn@yahoo.com, ayoade@nict.go.jp
(Berthon, 2000). In the most commonly touted applications of RFID, the microchip contains an Electronic Product Code (EPC) with sufficient capacity to provide unique identifiers for all items produced worldwide. When an RFID reader emits a radio signal, tags in the vicinity respond by transmitting their stored data to the reader.
With passive (battery-less) RFID tags, the read-range can vary from less than an inch to 20-30 feet, while active (self-powered) tags can have a much longer read-range.
Typically, the data are sent to a distributed computing system involved in, perhaps, supply chain management or inventory control (Spychips, 2003).
doi:10.1016/j.cose.2005.11.008
RFID system has many beneficial uses as it can be
applied to many areas of our day to day activities.
It supports many versatile applications including
entrance gate control at transport facilities, custody control and so on. However, the major barrier
that the RFID system is facing presently is the issue
of possibility of privacy violation which could be as
a result of illegal access.
Since RFID tags respond automatically to any reader (that is, they transmit without the knowledge of the bearer), this property can be used to track a specific user or object over wide areas. While expectations are growing for the use of RFID systems in various fields, opposition to their use without the knowledge of the user is increasing (CASPIAN).
Furthermore, if personal identity were linked
with unique RFID tag numbers, individuals could be
profiled and tracked without their knowledge or
consent. For example, a tag embedded in a shoe
could serve as a de facto identifier for the person
wearing it. Even if item-level information remains
generic, identifying items people wear or carry
could associate them with, for example, particular
events like political rallies (Spychips, 2003).
Our main goal is to find a solution to the privacy problem of illegal access by readers to the tags in the RFID system.
Moreover, RFID has been around for many years. The first notable application was identifying aircraft as friend or foe. Since then, RFID has been deployed in a number of applications, such as identifying and tracking animals via implanted tags; tracking transport containers; access control systems; keyless entry systems for vehicles; and automatic collection of road tolls (Allan, 2003).
Many other RFID applications may emerge.
Consider an airport setting. Both boarding passes
and luggage labels could be tagged with RFID
devices. Before take-off, an RFID enabled airplane
could verify that all boarding passes issued were
on the plane and that all luggage associated with those passes was in the hold. Within an airport, tracking
passengers by their boarding passes could improve
both security and customer service. Of course, in
other environments this would be an undesirable
violation of privacy (Weis, 2003).
Regarding violation of consumers' privacy, we can refer to the above example. Since many airlines operate in the airport with different workers, there could be malicious workers at different airlines with ulterior motives to violate consumers' privacy. Such malicious workers would tend to access and monitor consumers' private information.
J. Ayoade
Therefore, a preventive method should be put in place to deter the violation of consumers' privacy.
a. Kill command idea: The standard mode of operation proposed by the Auto-ID Center is for tags to be killed upon purchase of the tagged product. With their proposed tag design, a tag can be killed by sending it a special kill command. However, there are many environments in which simple measures like the kill command are unworkable or undesirable for privacy enforcement. For example, consumers may wish RFID tags to remain operative while in their possession.
b. Faraday cage approach: An RFID tag may be shielded from scrutiny using what is known as a Faraday cage, a container made of metal mesh or foil which is impenetrable by radio signals (of certain frequencies). There have been reports that some thieves have used foil-lined bags in retail shops to defeat shoplifting-detection mechanisms (Liu et al., 2004).
c. The active jamming approach: Active jamming is a physical means of shielding tags from view. In this approach, the user employs a radio frequency device that actively broadcasts radio signals so as to block the operation of any nearby RFID readers. However, this approach could be illegal; if the broadcast power is too high, it could disrupt all nearby RFID systems, and it could also be dangerous and cause problems in restricted areas such as hospitals and trains.
d. The blocker tag approach: The blocker tag is a tag that replies with simulated signals when queried by a reader, so that the reader cannot trust the received signals. Like active
Figure 1 Registration of tags (ID/key pairs, e.g., ID 1 with key 10, ID 2 with key 11, ..., ID N) with the Authentication Processing Framework.
Figure 2 Registration of readers (ID/key pairs, e.g., ID R1 with key 110, ID R2 with key 111, ..., ID RN) with the Authentication Processing Framework.
Figure 3 The registration/access-control process: tags (ID1 ... IDN) and readers (ID1 ... IDN) register with the Authentication Processing Framework; readers then send challenges to tags and receive responses. O means access granted, X means access denied.
Figure 4 The APF procedure: (1) register decryption keys with the APF; (2) issue a command to access the tag; (3) challenge; (4) response; (5) get the encrypted data. Alternatives shown for comparison: kill command, Faraday cage, active jamming, blocker tag.
Figure 5 Access granted from the APF database.
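The registration and challenge-response exchange sketched in the figures can be illustrated in code. The following is a minimal sketch, not the paper's actual protocol: the class and method names are invented, and the use of HMAC-SHA256 for the response is an assumption. It shows only the idea that the framework grants access exclusively to readers whose keys were registered in advance.

```python
import hashlib
import hmac
import secrets

class APF:
    """Hypothetical Authentication Processing Framework: stores the
    keys registered by tags and readers (cf. Figures 1 and 2)."""
    def __init__(self):
        self.tag_keys = {}      # tag ID -> registered key
        self.reader_keys = {}   # reader ID -> registered key

    def register_tag(self, tag_id, key):
        self.tag_keys[tag_id] = key

    def register_reader(self, reader_id, key):
        self.reader_keys[reader_id] = key

    def authorize(self, reader_id, challenge, response):
        # Access is granted ("O") only if the reader is registered and
        # its response proves knowledge of its registered key;
        # otherwise access is denied ("X"), cf. Figure 3.
        key = self.reader_keys.get(reader_id)
        if key is None:
            return False
        expected = hmac.new(key, challenge, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)

# Registration (Figures 1 and 2).
apf = APF()
reader_key = secrets.token_bytes(16)
apf.register_tag("ID1", secrets.token_bytes(16))
apf.register_reader("ID R1", reader_key)

# Challenge-response (Figure 3): the tag challenges, the reader responds.
challenge = secrets.token_bytes(16)
response = hmac.new(reader_key, challenge, hashlib.sha256).digest()
print(apf.authorize("ID R1", challenge, response))  # registered reader
print(apf.authorize("ID R9", challenge, response))  # unregistered reader
```

An unregistered reader ("ID R9") fails because the framework holds no key for it, which mirrors the O/X access-control matrix of Figure 3.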
Conclusion
In conclusion, information in tags can be protected from being read by unauthorized readers through the authentication procedures of the APF system described above. It is imperative to prevent unauthorized access to the tag in order to protect the privacy of its bearer and the confidential information stored in it. Moreover, the above framework performs mutual authentication, which makes it a system able to keep unauthorized or malicious readers from accessing the information stored in RFID tags.
References
Adopting fair information practices to low cost RFID systems, <http://www.guir.berkeley.edu/pubs/ubicomp2002/
privacyworkshop/papers/UBICOM2002_RFIDv3.doc>.
Allan Alex. RFID and privacy, <http://www.whitegum.com/
journal/rfidspch.htm>; November 2003.
Juels A, Rivest RL, Szydlo M. The blocker tag: selective
blocking of RFID tags for consumer privacy, <http://www.
rsasecurity.com/rsalabs/staff/bios/ajuels/publications/
blocker/blocker.pdf>; 2003.
Berthon Alain. Security in RFID, <http://www.nepc.sanc.
org.sg/html/techReport/N327.doc>; July, 2000.
C.A.S.P.I.A.N., <http://www.nocards.org>.
E-ZPASS Regional Consortium Service Center, <http://www.
ezpass.com>.
Liu Dingzhe, Kobara Kazukuni, Imai Hideki. Pretty-simple privacy enhanced RFID and its application. In: (SCIS 2004) The symposium on cryptography and information security, Sendai, Japan; January 2004.
Nakamura Naoshi. Future of the Internet RFID, <http://www.
gbde.org/acrobat/rfid03.pdf>.
Ohkubo Miyako, Suzuki Koutarou, Kinoshita Shingo. Hash-chain based forward-secure privacy protection scheme for low-cost RFID. In: (SCIS 2004) The symposium on cryptography and information security, Sendai, Japan; January 2004.
Position statement on the use of RFID on consumer products,
<http://www.spychips.org/jointrfid_position_paper.html>;
November 2003.
Weis Stephen A. Security and privacy in radio-frequency identification devices, <http://theory.lcs.mit.edu/~sweis/masters.pdf>; May 2003.
Dr. John Ayoade is an expert researcher in the Security Advancement Group of the National Institute of Information and
Communications Technology, Tokyo, Japan.
He obtained his Ph.D. degree in Information Systems under a Japanese government scholarship at the Graduate School of Information Systems, University of Electro-Communications, Tokyo, Japan.
Dr. Ayoade's research work focuses on information and communications security and privacy. He has wide experience in university teaching, involving lectures and practicals in the principles and practice of telecommunications and network policies, coupled with sound theoretical and practical knowledge of Computer Science. He has presented and published papers in many conferences and journals.
Dr. Ayoade is happily married to his loving and caring wife
Oluwatomi and they are blessed with a daughter and a son,
Opeyemi and Ayodeji, respectively.
www.elsevier.com/locate/cose
KEYWORDS
Hurst parameter;
Traffic;
Time series;
Distributed denial-of-service flood attacks;
Anomaly detection
Introduction
The Internet is the infrastructure that supports
computer communications. It has become the electricity of modern society because
* Tel.: 86 21 62233389; fax: 86 21 62232517.
E-mail addresses: mli@ee.ecnu.edu.cn, ming_lihk@yahoo.
com.
URL: http://www.ee.ecnu.edu.cn/teachers/mli/js_lm(Eng).
htm.
0167-4048/$ - see front matter 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.cose.2005.11.007
detection system (IDS) and intrusion prevention
system (IPS) are desired (Kemmerer and Vigna,
2002; Householder et al., 2002; Schultz, 2004; Sorensen, 2004; Gong, 2003; Li, 2004; Streilein et al.,
2003; Bencsath and Vajda, 2004; Feinstein et al.,
2003; Oh and Lee, 2003; Liston, 2004).
There are several categories of denial-of-service (DOS) attacks (Gong, 2003). The CERT Coordination Center (CERT/CC) divides DOS attacks into three categories: (1) flood (i.e., bandwidth) attacks, (2) protocol attacks, and (3) logical attacks. This paper considers flood attacks.
A DDOS flood attack sends attack packets at a site (the victim) with a huge amount of traffic whose sources are distributed over the world, so as to effectively jam its entrance and block access by legitimate users or significantly degrade its performance. It never tries to break into the victim's system, making security defenses at the protected site irrelevant (DDoS; Dittrich-a; Dittrich-b; Dittrich-c; Dittrich-d; Dietrich et al.; Geng et al., 2002).
Usually, IDSs are classified into two categories: misuse detection and anomaly detection. Solutions based on misuse detection rely primarily on a library of known signatures to match against network traffic; hence, unknown signatures from new variants of an attack mean a 100% miss. Therefore, anomaly detectors play a role in the detection of DDOS flood attacks. As far as anomaly detection is concerned, quantitatively characterizing the statistical abnormalities of abnormal traffic is fundamental.
A traffic stream is a packet flow. A packet consists of a number of fields, such as protocol type, source IP, destination IP, ports, flag setting (in the case of TCP or UDP), message type (in the case of ICMP), timestamp, and data length (packet size). Each may serve as a feature of a packet. The
literature discussing traffic features is rich (see
e.g. Li, 2004; Streilein et al., 2003; Bencsath and
Vajda, 2004; Feinstein et al., 2003; Oh and Lee,
2003; Cho and Park, 2003; Cho and Cha, 2004; Lan
et al., 2003; Paxson and Floyd, 1995; Li et al.,
2003; Beran, 1994; Willinger and Paxson, 1998;
Willinger et al., 1995; Csabai, 1994; Tsybakov
and Georganas, 1998; MIT; Garber, 2000; Kim
et al., 2004; Mahajan et al., 2002; Kim et al.,
2004; Bettati et al., 1999). For instance, Mahajan
et al. (2002) consider flow rate, Kim et al. (2004)
use head message, Oh and Lee (2003) alone consider 86 features of traffic (not from a statistics view
though), and so on. To the best of our knowledge, however, taking the Hurst parameter H into account in characterizing the abnormality of traffic series in packet size under DDOS flood attacks is rarely seen, except for Li (2004), where the autocorrelation function (ACF) of traffic series in packet size (traffic for short) with long-range dependence (LRD) is taken as the statistical feature. As a supplement to Li (2004), this paper specifically studies how the H of traffic varies under DDOS flood attacks. In this regard, the following two questions are fundamental.
(1) Is the H of traffic when a site is under DDOS flood attacks (abnormal traffic for short) significantly different from that of normal (i.e., attack-free) traffic?
(2) What is the change trend of the H of traffic when a site suffers from DDOS flood attacks?
We will answer the above questions from the points of view of processing data traffic and of theoretic inference and analysis.
In the rest of the paper, section Test data sets describes the test data. We briefly introduce data traffic and use a series of normal traffic from ACM to explain how its H normally varies in section Brief of data traffic. The answer to question (1) is given in section Using H to describe abnormality of traffic under DDOS flood attacks. Then, in section Change trend of traffic under DDOS flood attacks, we use a pair of series (one normal traffic, the other abnormal) provided by MIT Lincoln Laboratory to demonstrate that the averaged H of abnormal traffic tends to be significantly smaller than that of normal traffic, and briefly discuss this abnormality from the viewpoint of Fourier analysis. The answer to question (2) is given in that section. Section Conclusions concludes the paper.
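Since the analysis that follows turns on estimating H, a brief sketch may help. The aggregated-variance method shown here is one standard estimator for long-range dependent series, chosen for simplicity; it is not necessarily the estimator used in this paper, and the function name and scales are illustrative only.

```python
import numpy as np

def hurst_aggregated_variance(x, scales=(2, 4, 8, 16, 32)):
    """Estimate the Hurst parameter H of a series x with the
    aggregated-variance method: for an LRD series the variance of the
    m-aggregated series decays as Var(X^(m)) ~ m^(2H - 2), so the
    slope of log Var against log m gives 2H - 2."""
    x = np.asarray(x, dtype=float)
    log_m, log_var = [], []
    for m in scales:
        n = len(x) // m
        agg = x[:n * m].reshape(n, m).mean(axis=1)  # block means
        log_m.append(np.log(m))
        log_var.append(np.log(agg.var()))
    slope, _ = np.polyfit(log_m, log_var, 1)
    return slope / 2 + 1

# White noise has no long-range dependence, so the estimate is near 0.5.
rng = np.random.default_rng(0)
print(round(hurst_aggregated_variance(rng.standard_normal(4096)), 2))
```

Averaging such estimates over consecutive blocks of a traffic series yields the averaged H studied in the sections that follow.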
Change trend of averaged Hurst parameter of traffic under DDOS flood attacks
demonstrate a case of how the H of traffic varies under DDOS attacks. Though whether or not the MIT test data are standardized is worth further discussion, as stated in McHugh (2000), they are valuable and can serve as test data for research on the abnormality of abnormal traffic, because available traffic data under DDOS flood attacks are rare.
Γ(2 - 2H)cos(πH) / [πH(2H - 1)]

H̄(n) = (1/M) Σ_{m=0}^{M-1} H_m(n)
Figure 1 Demonstrating statistically invariable H. (a) A real-traffic time series x(i); (b) estimate H(n); (c) histogram of H(n).
Beran (1994, p. 55). Thus, a consequence of the Lemma is that the difference between H̄_y and H̄_x is considerable, where H̄_y and H̄_x are the average H values of y and x, respectively. Hence, H is a parameter that can be used to describe the abnormality of traffic under DDOS flood attacks. This gives the answer to question (1) in Section Introduction.
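The resulting detection rule can be stated in a few lines. This is a toy formulation, not the paper's implementation: the function name and the margin delta are assumptions, and the baseline value in the example is made up for illustration.

```python
def flag_ddos(h_observed_avg, h_normal_avg, delta=0.1):
    """Toy decision rule: flag a DDOS flood attack when the averaged
    Hurst parameter of observed traffic falls noticeably below the
    attack-free baseline (H_bar_y < H_bar_x). delta is an assumed
    illustrative margin, not a value from the paper."""
    return h_observed_avg < h_normal_avg - delta

# 0.774 is the averaged H reported for the abnormal MIT series;
# 0.9 is a purely hypothetical attack-free baseline.
print(flag_ddos(0.774, 0.9))
```

In practice both averages would be computed over sliding windows of the same traffic feature (packet size), with the baseline taken from known attack-free periods.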
Figure 2 Demonstrating H(n) of attack-free traffic OM-W1-1-1999AF. (a) Time series of OM-W1-1-1999AF; (b) estimate H(n) of OM-W1-1-1999AF; (c) histogram of H(n) of OM-W1-1-1999AF.
H̄_y < H̄_x, with H̄_y = 0.774.    (8)
The above inequality exhibits a case of the change trend of the H of traffic under DDOS flood attacks. It actually follows a general rule, as can be seen from the following analysis.
Figure 3 Demonstrating H(n) of abnormal traffic OM-W2-1-1999AC. (a) Time series of OM-W2-1-1999AC; (b) estimate H(n) of OM-W2-1-1999AC; (c) histogram of H(n) of OM-W2-1-1999AC.
Conclusions
Acknowledgement
This work was supported in part by the National
Natural Science Foundation of China under the
project grant number 60573125. MIT Lincoln Laboratory is highly appreciated.
References
Adas A. Traffic models in broadband networks. IEEE Communications Magazine 1997;35(7):82e9.
Bencsath B, Vajda I. Protection against DDoS attacks based on
traffic level measurements. In: International symposium on
collaborative technologies and systems. Waleed W. Smari,
William McQuay; 2004. p. 22e8.
Bendat JS, Piersol AG. Random data: analysis and measurement
procedure. 2nd ed. John Wiley & Sons; 1986.
Beran J, Sherman R, Taqqu MS, Willinger W. Long-range dependence in variable bit-rate video traffic. IEEE Transactions on Communications February-April 1995;43(2-4):1566-79.
Beran J. Statistics for long-memory processes. Chapman & Hall;
1994.
Bettati R, Zhao W, Teodor D. Real-time intrusion detection and
suppression in ATM networks. In: Proceedings of the first
USENIX workshop on intrusion detection and network monitoring; April 1999.
Caccia DC, Percival D, Cannon MJ, Raymond G,
Bassingthwaighte JB. Analyzing exact fractal time series:
evaluating dispersional analysis and rescaled range methods.
Physica A 1997;246(3e4):609e32.
Carmona R, Hwang W-L, Torresani B. Practical time-frequency
analysis: Gabor and wavelet transforms with an implementation in S. Academic Press; 1999. p. 244e7.
Cho S, Cha S. SAD: web session anomaly detection based on
parameter estimation. Computers & Security 2004;23(4):
312e9.
Cho S-B, Park H-J. Efficient anomaly detection by modeling privilege flows using hidden Markov model. Computers & Security
2003;22(1):45e55.
Coulouris G, Dollimore J, Kindberg T. Distributed systems:
concepts and design. 3rd ed. Addison-Wesley; 2001.
Csabai I. 1/f noise in computer network traffic. Journal of Physics A: Mathematical and General 1994;27(12):L417e21.
Data are available from: <http://www.acm.org/sigcomm/ITA/>.
Distributed denial of service (DDoS) attacks/tools, <http://
staff.washington.edu/dittrich/misc/ddos/>.
Dietrich S, Long N, Dittrich D. An analysis of the Shaft distributed denial of service tool, <http://www.adelphi.edu/~spock/shaft_analysis.txt>.
Dittrich D. The DoS projects Trinoo distributed denial of
service attack tool, <http://staff.washington.edu/dittrich/
misc/trinoo.analysis> (Dittrich-a).
Dittrich D. The Tribe Flood Network distributed denial of
service attack tool, <http://staff.washington.edu/dittrich/
misc/tfn.analysis.txt> (Dittrich-b).
Dittrich D. The Stacheldraht distributed denial of service
attack tool, <http://staff.washington.edu/dittrich/misc/
stacheldraht.analysis.txt> (Dittrich-c).
Dittrich D. The Mstream distributed denial of service attack
tool, <http://staff.washington.edu/dittrich/misc/mstream.
analysis.txt> (Dittrich-d).
Feinstein L, Schnackenberg D, Balupari R, Kindred D. Statistical
approaches to DDoS attack detection and response. In:
DARPA information survivability conference and exposition, vol. I, April 22e24, 2003. Washington, DC; 2003.
p. 303e14.
Garber L. Denial-of-service attacks rip the Internet. Computer
April 2000;33(4):12e7.
Geng X, Huang Y, Whinston AB. Defending wireless infrastructure against the challenge of DDoS attacks. Mobile Networks
and Applications 2002;7:213e23.
Gong F. Deciphering detection techniques: part III denial of
service detection. White Paper. McAfee Network Security
Technologies Group; January 2003.
Householder A, Houle K, Dougherty C. Computer attack trends challenge Internet security. Supplement to Computer. IEEE Security & Privacy April 2002;35(4):5-7.
Kemmerer RA, Vigna G. Intrusion detection: a brief history and
overview. Supplement to Computer. IEEE Security & Privacy
April 2002;35(4):27e30.
Kim SS, Reddy ALN, Vannucci M. Detecting traffic anomalies at the source through aggregate analysis of packet header data. In: Proceedings of Networking 2004. LNCS, vol. 3042, Athens, Greece; May 2004. p. 1047-59.
Kim Y, Lau WC, Chuah MC, Chao HJ. PacketScore: statisticsbased overload control against distributed denial-of-service
attacks. In: IEEE Infocom 2004, Hong Kong; 2004.
Lan K, Hussain A, Dutta D. Effect of malicious traffic on the network. In: Proceedings of passive and active measurement
workshop, April 2003, La Jolla, California; 2003.
Leland E, Taqqu M, Willinger W, Wilson DV. On the self-similar
nature of ethernet traffic, (extended version). IEEE/ACM
Transactions on Networking February 1994;2(1):1e15.
Li Ming, Chi C-H. A correlation-based computational method
for simulating long-range dependent data. Journal of the
Franklin Institute SeptembereNovember 2003;340(6e7):
503e14.
Li S-Q, Hwang C-L. Queue response to input correlation functions: continuous spectral analysis. IEEE/ACM Transactions
on Networking December 1993;1(6):678e92.
Li Ming, Zhao W, Jia WJ, Chi C-H, Long DY. Modeling autocorrelation functions of self-similar teletraffic in communication networks based on optimal approximation in
Hilbert space. Applied Mathematical Modelling 2003;
27(3):155e68.
Li Ming, Chi C-H, Long DY. Fractional Gaussian noise: a tool of
characterizing traffic for detection purpose. In: Content computing LNCS, vol. 3309. Springer; November 2004. p. 94e103.
Li Ming. An approach for reliably identifying signs of DDoS flood
attacks based on LRD traffic pattern recognition. Computers
& Security 2004;23(7):549e58.
Lighthill MJ. An introduction to Fourier analysis and generalised
functions. Cambridge University Press; 1958.
Liston K. Intrusion detection FAQ: can you explain traffic analysis and anomaly detection? <www.sans.org/resources/idfaq/
anomaly_detection.php>; 6 July, 2004.
Livny M, Melamed B, Tsiolis AK. The impact of autocorrelation on
queuing systems. Management Science 1993;39:322e39.
McDysan D. QoS & traffic management in IP & ATM networks. McGraw-Hill; 2000.
Mahajan R, Bellovin S, Floyd S, Ioannidis J, Paxson V, Shenker S.
Controlling high bandwidth aggregates in the network.
Computer Communications Review July 2002;32(3):62e73.
Mandelbrot BB. Fast fractional Gaussian noise generator. Water
Resources Research 1971;7(3):543e53.
Mandelbrot BB. Gaussian self-affinity and fractals. Springer;
2001.
McHugh J. Testing intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory. ACM Transactions on Information and System Security November 2000;3(4):262-94.
Michiel H, Laevens K. Teletraffic engineering in a broad-band era.
Proceedings of the IEEE December 1997;85(12):2007e33.
<http://www.ll.mit.edu/IST/ideval>.
Muniandy SV, Lim SC. On some possible generalizations of fractional Brownian motion. Physics Letters A 2000;266:140e5.
Muniandy SV, Lim SC. Modelling of locally self-similar processes
using multifractional Brownian motion of RiemanneLiouville
type. Physical Review E 2001;63:046104.
Oh SH, Lee WS. An anomaly intrusion detection method by clustering normal user behavior. Computers & Security 2003;
22(7):596e612.
Paxson V, Floyd S. Wide-area traffic: the failure of Poisson modeling. IEEE/ACM Transactions on Networking June 1995;3(3):
226e44.
Paxson V. Fast, approximate synthesis of fractional Gaussian
noise for generating self-similar network traffic. Computer
Communications Review October 1997;27(5):5e18.
Pitts JM, Schormans JA. Introduction to IP and ATM design and
performance: with applications and analysis software.
John Wiley; 2000. p. 287e93.
Schultz E. Intrusion prevention. Computers & Security 2004;
23(4):265e6.
Sorensen S. Competitive overview of statistical anomaly detection.
White Paper. Juniper Networks Inc., www.juniper.net; 2004.
Stallings W. Data and computer communications. 4th ed. Macmillan; 1994.
Stallings W. High-speed networks: TCP/IP and ATM design
principles. Prentice Hall; 1998 [chapter 8].
Streilein WW, Fried DJ, Cunningham RK. Detecting flood-based denial-of-service attacks with SNMP/RMON. In: Workshop on statistical and machine learning techniques in computer intrusion detection, September 24-26, 2003. George Mason University; 2003.
Tsybakov B, Georganas ND. Self-similar processes in communications networks. IEEE Transactions on Information Theory
September 1998;44(5):1713e25.
Willinger W, Paxson V. Where mathematics meets the Internet.
Notices of the American Mathematical Society August 1998;
45(8):961e70.
Willinger W, Taqqu MS, Leland WE, Wilson DV. Self-similarity in
high-speed packet traffic: analysis and modeling of ethernet
traffic measurements. Statistical Science 1995;10(10):
67e85.
Willinger W, Paxson V, Riedi RH, Taqqu MS. Long-range dependence
and data network traffic. In: Doukhan P, Oppenheim G,
Taqqu MS, editors. Long-range dependence: theory and applications. Birkhauser; 2002.
Ming Li completed his undergraduate program in electronic
engineering at Tsinghua University. He received the M.S. degree
in mechanics from China Ship Scientific Research Center and
Ph.D. degree in Computer Science from City University of
Hong Kong, respectively. In March 2004, he joined East China
Normal University (ECNU) as a professor after several years experiences in National University of Singapore and City University
of Hong Kong. He is currently a Division Head for Communications & Information Systems at ECNU. His current research
interests include teletraffic modeling and its applications to
anomaly detection and guaranteed quality of service, fractal
time series, testing and measurement techniques. He has published over 50 papers in international journals and international
conferences in those areas.
www.elsevier.com/locate/cose
KEYWORDS
Reverse engineering;
Software protection;
Process metrics;
Binary code;
Complexity metrics
Abstract Reverse engineering of binary code files has become increasingly easy to perform. Binary reverse engineering and subsequent software exploitation activities represent a significant threat to the intellectual property content of commercially supplied software products. Protection technologies integrated within software products offer a viable means of deterring the software exploitation threat. However, the absence of metrics, measures, and models to characterize the software exploitation process prevents quantitative assessment of the extent of protection technology suitable for application to a particular software product. This paper examines a framework for collecting reverse engineering measurements, the execution of a reverse engineering experiment, and the analysis of the findings to determine the primary factors that affect the software exploitation process. The results of this research form a foundation for the specification of metrics, the gathering of additional measurements, and the development of predictive models to characterize the software exploitation process.
© 2005 Elsevier Ltd. All rights reserved.
Introduction
Deployed software products are known to be susceptible to software exploitation through reverse engineering of their binary code (executable) files. Numerous accounts of commercial companies reverse engineering their competitors' products, for the purpose of gaining competitive advantage, have been published (Bull et al., 1995; Chen, 1995;
* Corresponding author.
E-mail address: isutherl@glam.ac.uk (I. Sutherland).
0167-4048/$ - see front matter 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.cose.2005.11.002
formats (Tilley, 2000) are routinely published, (2) hex editors, disassemblers, and software in-circuit emulator tools are readily available via Internet sources, and (3) similar attack scenarios involving reverse engineering of binary code files are readily accessible through numerous hacking websites. There are also legitimate reasons for reverse engineering code, in such cases as legacy systems (Muller et al., 2000; Cifuentes and Fitzgerald, 2000), and so there is a body of published academic material (Weide et al., 1995; Interrante and Basrawala, 1988; Demeyer et al., 1999; Wills and Cross, 1996; Gannod et al., 1988) to which a software exploiter could refer, although the main focus of this effort is at the source code level (Muller et al., 2000).
The commercial software product developer is forced to employ various protection technologies to protect both the intellectual property content and the software development investment represented by the software asset released into the marketplace. The developer must determine protection technologies that are both affordable and supply adequate protection against the reverse engineering threat for a desired period of performance.
The absence of predictive models characterizing the binary reverse engineering software exploitation process precludes an objective and quantitative assessment of the time from first release of the software asset until software exploitation can be expected to successfully extract useful information content. As with parametric software development estimation models (e.g., COCOMO), the size and complexity of the binary code file to be reverse engineered are considered prime contributing factors to the time and effort required to execute the reverse engineering activity. Additionally, the skill level of the software exploiter is also considered a primary contributing factor. This paper describes the execution of an experiment to derive empirical data to validate a set of proposed attributes that are believed to be the primary factors affecting the binary reverse engineering process.
Background
An insider is assumed to have access to developmental information resources pertaining to the commercial software product, including the product source code. An outsider does not have access to this information and must resort to analysis of available software product resources, which may be little more than the binary code file as released from the original developer. The outsider is forced to execute a binary reverse engineering activity beginning with the binary code file and concluding when some desired end goal has been achieved.
The entry criterion is defined as the time when the outsider first obtains a copy of the binary code file and can commence the reverse engineering process. The commercial software product vendor must assume that this entry criterion coincides with the first market release of the product.
The exit criterion is determined by the time when the outsider has satisfied a particular end goal for the software exploitation process. Unlike software development activities, where the singular end goal is to deliver a reasonably well-tested software product to an end user given the available funding and schedule resources, binary reverse engineering activities may have multiple software exploitation end goals (Kalb). The first software exploitation end goal is defined as obtaining sufficient information regarding the software product's operational function, performance, capabilities, and limitations. Satisfying this first end goal enables the software exploiter to transfer the information gathered to other software products that are either in development or already deployed. The second software exploitation end goal builds upon the first and is defined as enabling minor modifications to alter/enhance the deployed software product. Satisfying this second end goal enables (1) circumvention of existing performance limiters and protection technologies to enhance the operational performance of the deployed software product, and/or (2) insertion of malicious code artefacts to corrupt its execution. The third software exploitation end goal builds upon the previous two and is defined as enabling major modifications to enhance the operational performance of the deployed software product. Satisfying this third end goal enables a significant alteration of the deployed software product's functional and operational performance characteristics.
Regardless of the particular software exploitation end goal, the software exploitation process must be defined in order to base upon it a series of experiments that enable the capture of measurement data. This process commences when the exploiter acquires the binary code file that is the subject of the reverse engineering activity. For network-centric computing, this acquisition step is performed rather expediently and may require no more effort than locating the particular executable or load file that will be the subject of subsequent reverse engineering activities. For commercial software
Assertions
Prior to executing the reverse engineering experiment, a set of assertions was identified to be validated once experimental results had been obtained. The first assertion was that a statistical model could illustrate the relationship between the education and technical ability of the software exploiter and their ability to successfully reverse engineer a software product. The second assertion was that the complexity of the binary code file is related to the complexity of the human-readable source code. The reverse engineering experiment uses the Halstead and McCabe software complexity metrics to explore this relationship.
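For readers unfamiliar with these metrics, the sketch below computes rough approximations of McCabe's cyclomatic complexity (one plus the number of decision points) and Halstead's vocabulary, length, and volume for a small source function. It illustrates what the metrics count, computed here over Python source for convenience; it is not the tooling used in the experiment, and the operator/operand classification is a simplification.

```python
import ast
import math

SRC = """
def gcd(a, b):
    while b:
        if a > b:
            a, b = b, a
        a, b = b, a % b
    return a
"""

tree = ast.parse(SRC)

# McCabe cyclomatic complexity: one plus the number of decision points.
decisions = sum(isinstance(n, (ast.If, ast.While, ast.For, ast.BoolOp))
                for n in ast.walk(tree))
mccabe = 1 + decisions

# Halstead: distinct and total operators/operands, approximated here as
# AST operator nodes versus names and constants.
operators, operands = [], []
for n in ast.walk(tree):
    if isinstance(n, (ast.operator, ast.cmpop, ast.boolop, ast.unaryop)):
        operators.append(type(n).__name__)
    elif isinstance(n, (ast.Name, ast.Constant)):
        operands.append(getattr(n, "id", None) or repr(getattr(n, "value", "")))

n1, n2 = len(set(operators)), len(set(operands))   # distinct
N1, N2 = len(operators), len(operands)             # totals
vocabulary = n1 + n2
length = N1 + N2
volume = length * math.log2(vocabulary)

print("McCabe:", mccabe)
print("Halstead volume:", round(volume, 1))
```

In the experiment these metrics are applied to the test object programs, the point being that a binary code file's complexity should track such source-level measures.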
Experiment
The reverse engineering experiment requires a set
of test subjects to perform a sequence of tasks
relating to the reverse engineering of a set of
224
binary code files. The test subjects progress and
success during each task are monitored using
a variety of techniques to enable a series of
deductions to be made concerning the effort
required to reverse engineer a binary code file of
known size and complexity. To expediently execute the reverse engineering experiment, each
task was allotted a specific amount of time. The
progress of each test subject towards achieving
the task objective is then assessed. This approach
avoids the potentially open-ended alternative of allowing each test subject to work towards a completion criterion, consuming as much time as required to finish the task.
The set of test subjects included 10 student
volunteers attending the University of Glamorgan.
This included six undergraduates (three second-year students and three third-year students), three master's students, and one post-master's student, providing the diversity of education/technical skills required by the experiment.
Prior to the commencement of the experiment the
test subjects were informed that the nature of the
experiment related to reverse engineering of
executable programs that contained simple algorithms. The test subjects were provided with a reading list and a copy of the platform used (Red Hat 7.2 GNU/Linux), along with its documentation.
The reverse engineering experiment is partitioned into three stages: an initial assessment of the test subjects' knowledge/skill base, execution of the reverse engineering tasks on a set of test objects, and a post-experiment assessment to obtain feedback on the experiment. A set of six test object programs was developed, comprising (1) Hello World, (2) Date, (3) Bubble Sort, (4) Prime Number, (5) LIBC, and (6) GCD (Table 1). The test object programs were purposely selected to be easily recognizable algorithms, to be of approximately the same size so as to afford reasonable reverse engineering progress in a restricted amount of time, and to be free of proprietary software elements so as to avoid the legal infringements associated with reverse engineering binary code files.
A subset of the six test object programs was compiled with the debug option enabled (Program Set A), while another subset was compiled with the debug option disabled (Program Set B). This approach gives the test subjects the opportunity to reverse engineer the same test object twice, enabling an assessment of the value that debug information retained in the binary code file adds to the reverse engineering process.
The initial assessment of the test subjects' knowledge/skill base requires each test subject to complete a questionnaire. The questionnaire inquired as to the number of years of experience the test subject possessed with UNIX and the C programming language. The majority of test subjects had at least one year's experience with both. The questionnaire also included a series of multiple-choice questions, focused on UNIX commands relating to reverse engineering, to provide an assessment of each test subject's level of experience/capability.
The execution of the reverse engineering experiment required each test subject to perform
a static, dynamic, and modification task on each of
the test object programs within a constrained time
limit. Test object filenames were selected so as
not to reveal the function of the binary. Each test
subject was supplied with a tutorial worksheet
that provided general guidance during each specific task. For example, the static task tutorial
worksheet requested each test subject to determine the size of the binary, determine the creation
time of the binary, speculate as to the type of
information contained in the file, identify all
strings and any constants present in the executable, and generate the assembly language for the
program. The dynamic task tutorial worksheet
requested each test subject to determine if any
input is required by the binary, describe the output
produced by the binary, identify any command line
arguments required by the binary, and describe
the function/purpose of the binary. The modify task tutorial worksheet requested each test subject to perform a specific modification to the test object program, requiring the development and insertion of a software patch into the binary code file. For example, the test subjects were requested to modify the Hello World binary so that upon execution the program would output "World Hello", or to modify the Bubble Sort binary so that it sorts in descending rather than ascending order. During the time
allotted for each task the test subjects were
required to perform the work requested and record their findings on the tutorial worksheets
provided for that task. Upon expiration of the
allotted time the tutorial worksheets were collected and replaced with the next tutorial worksheet in the experiment.
Test subjects were provided with Program Set A
during the morning session of the reverse engineering experiment. The experiment developers were present to observe its execution and any interaction between test subjects. Test subjects were allowed to interact during the lunchtime break since it was decided
Table 1  Experiment schedule.

Session     Event                    Test object    Task     Duration  Total
                                                             (min)     duration (min)
Morning     Initial assessment
session     Program Set A            Hello World    Static   15        35
            (debug option enabled)                  Dynamic  10
                                                    Modify   10
                                     Date           Static   10        30
                                                    Dynamic  10
                                                    Modify   10
                                     Bubble Sort    Static   15        45
                                                    Dynamic  15
                                                    Modify   15
                                     Prime Number   Static   15        45
                                                    Dynamic  15
                                                    Modify   15
Lunch
Afternoon   Program Set B            Hello World    Static   10        30
session     (debug option disabled)                 Dynamic  10
                                                    Modify   10
                                     Date           Static   10        30
                                                    Dynamic  10
                                                    Modify   10
                                     GCD            Static   15        45
                                                    Dynamic  15
                                                    Modify   15
                                     LIBC           Static   15        45
                                                    Dynamic  15
                                                    Modify   15
            Exit questionnaire
Results
The measurements collected during the reverse
engineering experiment are analyzed to validate
the two assertions defined at the beginning of this paper (section Assertions).
Education/technical ability
The first assertion to be validated by the experimental results concerned whether a statistical model could illustrate the relationship between the education and technical ability of the software exploiter and their ability to successfully reverse engineer a software product. This assertion is validated through analysis of the initial questionnaire and tutorial worksheet responses. The education/technical ability (Fig. 1, ability) is derived from the initial questionnaire responses for each test subject and is normalized to values between 0 and 3 (Table 2) based on their experience with operating systems, platforms, and the range of commands used during the reverse engineering experiment. The ability to successfully reverse engineer a software product (Fig. 1, score) is derived from the tutorial worksheet responses for each test subject and is normalized by applying a consistent grading scheme per question response (Table 2) and then averaging over all of the responses (3 tasks × 8 test objects) for that particular test subject. The education/technical ability and the ability to successfully reverse engineer a software product are plotted against the test subjects' identification numbers. Although the two graphs do not coincide one-for-one, a correlation coefficient of 0.7236642 was computed, illustrating a statistically significant relationship between the educational/technical ability of the software exploiter and their ability to successfully reverse engineer the binary code file of a software product. This result provides validation evidence for the first experiment assertion.

Figure 1  Normalized ability and score plotted per test subject.

Table 2  Grading scheme used to normalize responses.
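The coefficient reported above is a plain Pearson correlation over the two normalized series. A minimal sketch, using hypothetical ability/score values rather than the experiment's data:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical normalized ability/score pairs for 10 test subjects;
# these are illustrative values, not the experiment's data.
ability = [0.5, 1.0, 1.2, 1.5, 1.8, 2.0, 2.2, 2.4, 2.6, 3.0]
score   = [0.4, 0.9, 0.8, 1.6, 1.4, 1.9, 1.7, 2.5, 2.1, 2.8]
r = pearson(ability, score)   # close to +1: strongly related series
```

A coefficient near 0.72, as reported, indicates that the two series track each other well without coinciding point for point.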
Complexity/size metric
The second assertion to be validated by the experimental results concerned the relationship between the complexity of the binary code file and the complexity of the human-readable source code. This assertion is validated by correlating the tutorial worksheet responses (regarding the reverse engineering of the eight test objects) against the Halstead and McCabe metrics computed on the human-readable source code (the six software programs that, when compiled, produced the eight test objects). The tutorial worksheet responses for the static, dynamic, and modification tasks were normalized using the grading scheme (Table 2) and then averaged to produce the mean grade per test object (3 tasks × 10 test subjects). The Halstead and McCabe metrics were computed using the source code for each of the test objects. The mean grade per test object is correlated with each of the individual metric items to determine the extent of any dependencies (Tables 3 and 4).
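For reference, the Halstead items appearing in Tables 3 and 4 all derive from four base counts: distinct operators n1, distinct operands n2, and their totals N1 and N2. A sketch using the standard Halstead formulas (the authors' exact tooling is not described in the paper):

```python
import math

def halstead(n1, n2, N1, N2):
    """Standard Halstead measures from the four base counts:
    n1/n2 distinct operators/operands, N1/N2 total occurrences."""
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)
    difficulty = (n1 / 2.0) * (N2 / n2)
    level = 1.0 / difficulty            # level is the reciprocal of difficulty
    effort = difficulty * volume        # elementary mental discriminations
    time_s = effort / 18.0              # Stroud number S = 18 per second
    intelligence = level * volume
    return {"vocabulary": vocabulary, "length": length, "volume": volume,
            "level": level, "difficulty": difficulty, "effort": effort,
            "time": time_s, "intelligence": intelligence}
```

Note the internal consistency these formulas impose on the tables: effort is difficulty times volume, which matches, for example, the Hello World column (1.499 × 18 ≈ 27).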
The statistical analysis reveals that there are no
significant positive correlations between the
source code metrics and the ability of the software
exploiter to successfully reverse engineer a software product. The lack of correlation illustrates
that source code artefacts that contribute to size
and complexity metrics do not impact the reverse
engineering process applied to binary code files.
For example, the amount of branching (decision
points) within a source code file is the basis of
the McCabe cyclomatic complexity metric and
has significant bearing on unit-level testing of
the software module. Comparatively, branching
instructions (jump instructions) within a binary
code file are easily disassembled and understood
by the software exploiter.
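McCabe's cyclomatic complexity mentioned above is essentially one plus the number of decision points. A rough token-counting approximation for C source, not a full parser, where if/for/while/case, the short-circuit operators, and the ternary operator are assumed to be the decision points:

```python
import re

# Branching keywords plus short-circuit (&&, ||) and ternary (?) operators.
DECISION_TOKENS = re.compile(r"\b(if|for|while|case)\b|&&|\|\||\?")

def cyclomatic_approx(c_source):
    """Approximate McCabe cyclomatic complexity as 1 + decision points."""
    return 1 + len(DECISION_TOKENS.findall(c_source))

# A Prime Number-style test object: three decision points, complexity 4.
src = """
int is_prime(int n) {
    if (n < 2) return 0;
    for (int i = 2; i * i <= n; i++)
        if (n % i == 0) return 0;
    return 1;
}
"""
```

Such source-level branch counts are precisely what, per the analysis above, fail to predict reverse engineering effort, since the corresponding jump instructions disassemble trivially.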
Conclusion
The reverse engineering experiment as defined
within this paper represents a framework for the
experimental collection of measurement data in
Table 3  Mean grade per test object versus source code metrics (Program Set A).

Metric                      Hello World  Date   Bubble Sort  Prime Number  Correlation
Mean grade per test object  1.483        1.300  0.786        0.867
Lines of code               6            10     9            21            0.5802
Software length(a)          7            27     14           33            0.3958
Software vocabulary(a)      6            14     11           15            0.5560
Software volume(a)          18           103    48           130           0.4006
Software level(a)           0.667        0.167  2.5          0.094         0.4833
Software difficulty(a)      1.499        5.988  5.988        10.638        0.7454
Effort(a)                   27           618    120          1435          0.3972
Intelligence(a)             12           17     19           15            0.6744
Software time(a)            0.001        0.001  0.001        0.001         0
Language level(a)           8            2.86   7.68         1.83          0.1909
Cyclomatic complexity       1            1      1            3             0.4802

(a) Halstead metrics.
exploitation process will enable commercial software product developers to quantitatively predict the time, following product deployment, at which a software exploiter is anticipated to have achieved a given exploitation end goal.
The reverse engineering experiment also provides quantitative evidence that industry-accepted source code size and complexity metrics are not suitable for characterizing the size and complexity of binary code files for the purpose of estimating the time required to perform software exploitation activities. Literature research conducted at the commencement of this project did not identify binary size and complexity metrics that could have been used instead of the source code size and
Table 4  Mean grade per test object versus source code metrics (Program Set B).

Metric                      Hello World  Date   GCD    LIBC   Correlation
Mean grade per test object  1.350        1.558  1.700  1.008
Lines of code               6            10     49     665    0.3821
Software length(a)          7            27     40     59     0.3922
Software vocabulary(a)      6            14     20     21     0.0904
Software volume(a)          18           103    178    275    0.4189
Software level(a)           0.667        0.167  0.131  0.134  0.1045
Software difficulty(a)      1.499        5.988  7.633  7.462  0.0567
Effort(a)                   27           618    2346   5035   0.5952
Intelligence(a)             12           17     17     19     0.1935
Software time(a)            0.001        0.001  0.2    0.4    0.5755
Language level(a)           8            2.86   2.43   2.3    0.0743
Cyclomatic complexity       1            1      3      11     0.7844

(a) Halstead metrics.
complexity metrics. Size and complexity metrics that directly characterize binary code files must therefore be defined. Such metrics are required to support the development of a software exploitation predictive model; a follow-on research project has been proposed to define these metrics and then use the existing reverse engineering experiment framework to gather measurements that corroborate them.
Acknowledgment
The researchers wish to thank the sponsor of this
project, who requested to remain anonymous, for
the generous funding of this project and for providing funding for the follow-on research project.
Iain Sutherland is a lecturer in the Information Security Research Group at the School of Computing, University of Glamorgan, UK. His main research interests are Information Security
and Computer Forensics. Dr. Sutherland received his Ph.D.
from Cardiff University.
George E. Kalb is an Instructor and Institute Fellow at the Johns
Hopkins University Information Security Institute, US. His research interests are in the domains of binary reverse engineering and tamper resistance technologies. He has a B.A. in Physics
and Chemistry from University of Maryland and an M.S. in Computer Science from Johns Hopkins University.
Andrew Blyth is currently the Head of the Information Security
Research Group at the School of Computing, University of Glamorgan, UK. His research interests include network and operating systems security, and reverse engineering. Dr. Blyth
received his Ph.D. from Newcastle University.
Gaius Mulley is a senior lecturer at the University of Glamorgan.
He is the author of GNU Modula-2 and the groff html device
driver grohtml. His research interests also include performance
of micro-kernels and compiler design. Dr. Mulley received his
Ph.D. and B.Sc.(Hons) from the University of Reading.
abstract

This paper addresses methods for combating spam, focusing especially on those based on the economic motivations of unsolicited commercial e-mail. Considering the fact that to date no machine has passed the Turing test, well-known blacklist and whitelist solutions can be generalized by greylists. An outline of a simple SMTP anti-spam application following these ideas and running on a UNIX machine is offered. Some problems regarding the application are discussed, together with some of the results obtained after a two-month test period.
© 2006 Elsevier Ltd. All rights reserved.

Keywords: Spam, Anti-spam filter, Whitelist, Greylist, UNIX, Sendmail
1. Introduction
Spam is the word commonly used to refer to unsolicited commercial e-mail (UCE) or unsolicited bulk e-mail (UBE). As well as causing a certain displeasure for users, spam is a waste of money and Internet resources (Grimes, 2004; Spam). Furthermore, owing to its content, its distribution methods, and the way it usually forges its sources, it can be regarded as fraudulent (Hinde, 2003). Currently, more than half of all circulating Internet e-mails are spam. Forecasts point to an even worse situation in the future: while the number of legitimate e-mails in 2007 is expected to remain the same as now, it is believed that spam will double (Spam filters, 2004).
There is readily available software to fight spam. Anti-spam
software is in constant evolution, and so are the tools used to
generate it. Indeed, a fight has arisen on the spam battlefield similar to the one between other computing opponents, such
as viruses and antivirus software. Just as a computer virus
has a life cycle comparable to the life cycle of its biological
2.
There are supporters of the former (Mertz) as well as of the latter (Grimes, 2004; Hinde, 2003). However, a problem as multifaceted as spam probably requires a combined solution.
With regard to the legal front, some difficulties have been
encountered. They may be due to the international character
of Internet and the lenience of some recently enacted laws
(Asaravala). Severe anti-spam legislation could perhaps lead to a lack of competitiveness against less scrupulous neighbouring countries. Opt-in and opt-out models are also being debated, as well as public registries where the addresses of
people who do not want e-mail marketing are to be included.
Some of these measures may be counter-productive, however,
since spammers can use them maliciously for their own
benefit.
Regarding the technological aspect, several measures have
been devised and put into practice. Among them are the
following:
0) Preventive methods: such as trying to prevent spammers from including one's e-mail address in their lists.
1) Blacklists: these are lists of e-mail or machine addresses
from which it is known that spam is sent. They may be
personal or public, local or distributed. When a message
arrives coming from an address or machine listed on the
blacklist, it is rejected.
2) Honeypots: in connection with blacklists, these consist of
invented e-mail addresses. Their aim is to attract as
much spam as possible in order to alert other users or
take further measures. They are based on spam usually
being distributed in bulk. Characteristic features (fingerprints) are obtained from received messages. User software connects to the honeypot to find out if the relevant message has already been received there.
3) Whitelists: their operation is the opposite of blacklists.
They consist of a list of addresses from which all mail is
accepted. Mail coming from other addresses is transferred
to a low-priority folder (Ookoboiny). A few commercial implementations are available and some of them are evaluated in PC Magazine (2004).
4) Content filters: these compute a score for each incoming
message as a function of some previously user-established criteria. If the score of the message is greater than
a given threshold, the message is considered spam.
5) Bayesian filters (Graham): Statistics about the content of
the message are used for the purpose of being able to classify it as spam or not. Users must train their filters to make
them learn which messages are spam and which are
not. This method is appealing because it is adaptable; that is, it learns from its user's concept of spam as more and more messages are processed.
6) Neural networks (Vinther): if a human being is easily capable of detecting spam, perhaps artificial intelligence
should be tried out. Although no systems are currently
available commercially, some efforts have been made.
7) Sender ID: this method is devised to get rid of forged sender information (domain spoofing). It simply asks the presumed
sender domain for IP addresses from which that message
can be sent. The message is considered spam if the e-mail
connection did not come from one of those (Sender ID
Framework).
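Item 5 above can be made concrete. A sketch of the naive-Bayes combination popularized by Graham's "A plan for spam", using hypothetical per-token probabilities rather than trained values:

```python
def combined_spam_probability(token_probs):
    """Combine per-token spam probabilities p(spam|token) with the
    naive-Bayes formula:  P = prod(p) / (prod(p) + prod(1 - p))."""
    p_spam = 1.0
    p_ham = 1.0
    for p in token_probs:
        p_spam *= p
        p_ham *= (1.0 - p)
    return p_spam / (p_spam + p_ham)

# Hypothetical probabilities for a message's most telling tokens;
# a trained filter would estimate these from the user's mail corpus.
probs = [0.99, 0.95, 0.80, 0.20]
score = combined_spam_probability(probs)   # close to 1: likely spam
```

The adaptability noted above comes from re-estimating the per-token probabilities as the user keeps classifying messages.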
3. Spam on spam
Recently, there has been much debate about the economic aspects of fighting spam (McWilliams). Clear evidence in support
of how profitable a spam-based commercial campaign can be
is seen in spam e-mails that advertise spam services. Sometimes, e-mails offering bulk marketing programs via e-mail
are received. Their price ranges between a few dollar cents and one dollar for each thousand e-mails sent, depending on whether the spammers are in charge of the design. If one is interested in merely buying e-mail addresses to send spam e-mails, the price is in the region of a hundred dollars; it depends on whether the addresses are classified by country, Internet domain, field of activity, etc., and whether they are verified (i.e., the addresses are not dead). The promised response rate also ranges between 1% and a more realistic 1/10,000.
Quoting the spammers' own advertisement: "You sell a product or service for 10 euros. You decide to promote this product or service on the Internet to 10 million people, only 1% decides buy your product (sic), do the math and see how much money you would make. [...] You would make one million euros sending 10 million emails. You understand now why you receive so much email every day in your mailbox: Advertising on Internet is extremely lucrative." Just as certain animals or plants
produce hundreds of eggs or seeds, a spammer spreads a huge
amount of messages, despite knowing that the vast majority
will not bear fruit.
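The advertisement's arithmetic is easy to check, and the contrast with the more realistic response rate mentioned above is instructive. A sketch using the quoted figures:

```python
def spam_campaign_revenue(messages_sent, response_rate, unit_price):
    """Expected gross revenue of a bulk e-mail campaign."""
    return messages_sent * response_rate * unit_price

# Figures from the quoted advertisement: 10 million messages,
# a wildly optimistic 1% response, a 10-euro product.
optimistic = spam_campaign_revenue(10_000_000, 0.01, 10)      # about 1,000,000 euros

# The more realistic 1/10,000 response rate mentioned above.
realistic = spam_campaign_revenue(10_000_000, 1 / 10_000, 10)  # about 10,000 euros
```

Even the realistic case comfortably covers the quoted sending cost (a few cents to one dollar per thousand messages puts 10 million messages somewhere between a few hundred and ten thousand dollars), which is exactly why the seed-scattering strategy pays.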
4.
Considering the above ideas, a low-cost and highly customizable anti-spam application has been developed at the
Computer Science Department at the University of Salamanca. To accomplish this, only a web server (Apache) and
an e-mail management program (Sendmail) were needed.
A simple C program and a dozen lines added to the configuration file of Sendmail (sendmail.cf) sufficed for work to
begin.
An SMTP anti-spam barrier was chosen. SMTP (Klensin)
stands for Simple Mail Transfer Protocol. As its name suggests,
SMTP is very straightforward. The simplest of all SMTP working procedures to deliver mail is shown in Fig. 1. Mail transfer
begins with the recipient machine greeting and introducing
itself. A dialogue with the senders machine follows, which
is quite easy to understand. After the sender has issued the
QUIT command and the other part has acknowledged it, the
connection is closed.
It must be remarked that the sender machine's statements about its name or the sender's address may be false; there is no guarantee that they are real. Once the mail has been processed, the stated machine name, along with the IP address from which the connection was established, will appear in the Received: headers of the e-mail. The rest of the headers are probably kept unchanged (Fig. 2). It is important to mention that both the origin address (MAIL FROM) and the destination address (RCPT TO) may differ from the ones in the headers of the message (From: and To:, respectively). This is why the former are sometimes known as SMTP envelope addresses. The specification states that any notification or error detected once the SMTP connection is closed has to be addressed to the MAIL FROM address.
The SMTP anti-spam filter developed therefore has to decide whether the connection is good with only two pieces of
Fig. 1  The simplest SMTP delivery dialogue between sender.part.com and recipient.part.com (the recipient listening on port 25).
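The client half of the dialogue in Fig. 1 can be written out as the command sequence a sender would issue. A sketch with hypothetical host and mailbox names:

```python
def smtp_dialogue(helo_name, mail_from, rcpt_to, body_lines):
    """The client half of the simplest SMTP delivery (RFC 2821):
    the commands sender.part.com would issue to recipient.part.com."""
    return ([f"HELO {helo_name}",
             f"MAIL FROM:<{mail_from}>",
             f"RCPT TO:<{rcpt_to}>",
             "DATA"]
            + body_lines
            + [".", "QUIT"])

# Nothing forces HELO or MAIL FROM to be truthful; only the connecting
# IP address is known to be real, which is why the filter keys on the
# envelope addresses rather than the From:/To: headers.
cmds = smtp_dialogue("sender.part.com", "alice@sender.part.com",
                     "bob@recipient.part.com", ["Subject: hi", "", "hello"])
```

An SMTP-level filter such as the one described here makes its accept/reject decision after RCPT TO, before the DATA phase ever begins.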
5.
Users can customize the filter to suit their needs. They must create a file named .blacklist in their home directory. An
example of such a file can be seen in Fig. 4. The file syntax is
very simple. It is a text file with independent entries on different lines. Each entry has two parts separated by a colon (:).
The type of entry comes after the colon and can be PASSWORD, BLACK, GREY or WHITE. A PASSWORD entry is used
to set the password of the user. BLACK, GREY or WHITE deal
with address lists. When a sender requests the system to deliver an e-mail to a local user and the address does not have
a password, the local user's .blacklist file is scanned sequentially. The first time that the left part of a line matches the
SMTP MAIL FROM address, the right part will show what to do
with the request (i.e. which list it belongs to). If the end of
the file is reached, the address is considered to be GREY. As
can be seen in the figure, regular expressions can be used to
specify addresses. This is not a minor enhancement. Joining
the three lists in a single file and using regular expressions afford the application great flexibility.
For example, one can decide to allow all incoming mail
from the mars.com domain with the line *.mars.com:WHITE. If
later one finds that spam is arriving from alienvacation@mars.com,
6. Application description
pera:PASSWORD
*.mars.com:WHITE
alienvacation@mars.com:BLACK
granny@popularwebmail.com:WHITE
*popularwebmail.com:GREY
:WHITE
compromised@recipient.part.com:GREY
*[@.]recipient.part.com:WHITE
Fig. 4  An example of a .blacklist file.
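The first-match scan described above can be sketched as follows. This is an illustrative reimplementation, not the authors' C source; glob patterns via fnmatch stand in for the paper's regular expressions, and an unmatched address defaults to GREY:

```python
import fnmatch

def classify(mail_from, blacklist_lines):
    """First-match scan of .blacklist lines of the form 'pattern:TYPE'
    (TYPE is PASSWORD, BLACK, GREY or WHITE).  PASSWORD entries are
    skipped here; an address matching no line is considered GREY."""
    for line in blacklist_lines:
        line = line.strip()
        if not line or ":" not in line:
            continue
        pattern, _, entry_type = line.rpartition(":")
        if entry_type == "PASSWORD":
            continue                      # sets the user's password, not an address
        if fnmatch.fnmatch(mail_from, pattern):
            return entry_type
    return "GREY"

# Hypothetical rules; the BLACK exception precedes the WHITE wildcard
# because the scan stops at the first match.
rules = ["pera:PASSWORD",
         "alienvacation@mars.com:BLACK",
         "*@mars.com:WHITE",
         "granny@popularwebmail.com:WHITE"]
```

Because matching is first-match, the order of entries carries meaning: a narrow BLACK exception placed before a broad WHITE pattern blocks one sender while admitting the rest of the domain.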
may be repeated until the input string does not match its left
part. Then, the output string is the resulting input string. If
there is no match, the output string equals the input string.
For further information, readers are referred to Sendmail
documentation (The whole scoop in the configuration file).
Rules are grouped in procedures. Different procedures are
invoked in different parts of the SMTP connection. check_rcpt
is the name of the procedure which is called when an SMTP
RCPT TO address has been received (Sendmail 8.8). In the version
of Sendmail used to develop the test filter, check_rcpt calls another procedure whose name is Local_check_rcpt. These procedures are widely used to avoid mail relaying. However, they can
also be used to implement a spam filter. For example, Fig. 5 illustrates how Local_check_rcpt was used in the test example.
Sendmail must be restarted for the changes to take effect.
A general explanation of Fig. 5 follows. On the first line, the
CheckAS key is defined as an executable program located at /
root/ANTISPAM/antispam. The program takes an argument consisting of the recipient's login, a colon, and what was read from
the SMTP MAIL FROM statement. The program writes on the
standard output BLACK, GREY, WHITE, GOODPASS or BADPASS according to the argument passed. The present version
looks up the user's .blacklist file as described above. The second line builds up a set, named ProgramaAS, containing all local users whom the administrator wants included in the anti-spam program. In the example of the figure, only
user gyermo is included. Following these two lines, the relevant core of the modifications is shown. When Sendmail
receives the SMTP RCPT TO statement, check_rcpt is invoked,
which, in turn, calls Local_check_rcpt. On the fifth line, the
rule adds what was stated in MAIL FROM plus the word local
if the mail is for a local user. The sixth line more or less states
that if the receiving user is local and belongs to the anti-spam
program, the CheckAS program will be run with the corresponding argument. The seventh line works the same as the
sixth, but for the password case. The eighth, ninth and tenth
Fig. 5  The modifications made to Local_check_rcpt in sendmail.cf, lines [01]-[08] (rule fragments):

$: $1 $| $&{rcpt_addr} $| $&{rcpt_mailer}
$1 $| $( CheckAS $2:$&{mail_addr} $)
$1 $| $( CheckAS $&{rcpt_addr}:$&{mail_addr} $)
$#error $@ 5.7.1 $: "550 " $&{mail_addr} " blacklisted (spam)"
$#error $@ 5.1.0 $: "550 " $&{mail_addr} " blocked." " Info: http://tejo.fis.usal.es/~gyermo/as.htm"
$#error $@ 5.1.1 $: "550 Wrong or expired password"
$1

7. Results
8. Problems
9. Acknowledgements
This work has been partially supported by the Spanish Ministerio de Ciencia y Tecnología (FEDER funds, grant BFM2002-00033) and by the Junta de Castilla y León (grant SA107/03).
References
Asaravala A, et al. With this law, you can spam. Available from:
<http://www.wired.com/news/business/0,1367,62020,00.
html>.
Cournane A, Hunt R. An analysis of the tools used for the generation and prevention of spam. Computers & Security 2004;23:154-66.
Dominus MJ. My life with spam: Part 3. Available from: <http://
www.perl.com/pub/a/2000/03/spam3.html>.
Graham P. A plan for spam. Available from: <http://www.
paulgraham.com/spam.html>.
Grimes GA. Issues with spam. Computer Fraud & Security 2004;
5:126.
Hinde S. Spam: the evolution of a nuisance. Computers & Security 2003;22:474-8.
Klensin J. Simple Mail Transfer Protocol (RFC 2821). Available
from: <http://www.ietf.org/rfc/rfc2821.txt>.
Manes S. Kill spam with your own two hands. Available from:
<http://www.forbes.com/forbes/2003/0623/136_print.html>.
McWilliams B. Swollen orders show spam's allure. Available
from: <http://www.wired.com/news/business/
0,1367,59907,00.html>.
Mertz D. Spam filtering techniques, six approaches to eliminating
unwanted e-mail. Available from: <http://www-106.ibm.com/
developerworks/linux/library/l-spamf.html>.
Miller MJ. Forward thinking. How spam solutions lead to more
problems. PC Magazine December 2003:7.