Article info

Article history:
Received 12 September 2017
Received in revised form 23 May 2018
Accepted 28 May 2018

Keywords:
Classification
Computer security
Phishing
Machine learning
Web security
Security awareness

Abstract

In the era of electronic and mobile commerce, massive numbers of financial transactions are conducted online on a daily basis, which has created potential opportunities for fraud. A common fraudulent activity that involves creating a replica of a trustworthy website to deceive users and illegally obtain their credentials is website phishing. Website phishing is a serious online fraud, costing banks, online users, governments, and other organisations severe financial damages. One conventional approach to combat phishing is to raise awareness and educate novice users on the different tactics utilised by phishers by conducting periodic training or workshops. However, this approach has been criticised for not being cost-effective, as phishing tactics are constantly changing and it may require high operational costs. Another anti-phishing approach is to legislate or amend existing cyber security laws that prosecute online fraudsters, without minimising the severity of phishing. A more promising anti-phishing approach is to prevent phishing attacks using intelligent machine learning (ML) technology. Using this technology, a classification system is integrated in the browser, where it detects phishing activities and communicates these to the end user. This paper reviews and critically analyses legal, training, educational and intelligent anti-phishing approaches. More importantly, ways to combat phishing by intelligent and conventional approaches are highlighted, along with these approaches' differences, similarities, and positive and negative aspects from the user and performance perspectives. Different stakeholders such as computer security experts, researchers in web security, as well as business owners may benefit from this review on website phishing.

© 2018 Elsevier Inc. All rights reserved.
Contents

1. Introduction
2. Phishing background
    2.1. Phishing history
    2.2. Phishing process
    2.3. Phishing as a classification problem
3. Common traditional anti-phishing methods
    3.1. Legal anti-phishing legislations
    3.2. Simulated training
    3.3. User experience: Anti-phishing online communities
    3.4. Discussion of non-intelligent anti-phishing solutions
4. Computerised anti-phishing techniques
    4.1. Databases (blacklist and whitelist)
    4.2. Intelligent anti-phishing techniques based on ML
        4.2.1. Decision trees and rule induction
        4.2.2. Associative classification (AC)
        4.2.3. Neural network (NN)
        4.2.4. Support vector machine (SVM)
        4.2.5. Fuzzy logic
        4.2.6. CANTINA term frequency inverse document frequency approach
5. Conclusions
References

* Corresponding author.
E-mail addresses: P12047781@myemail.dmu.ac.uk (I. Qabajeh), fadi.fayez@manukau.ac.nz (F. Thabtah), chiclana@dmu.ac.uk (F. Chiclana).
https://doi.org/10.1016/j.cosrev.2018.05.003
1574-0137/© 2018 Elsevier Inc. All rights reserved.
I. Qabajeh et al. / Computer Science Review 29 (2018) 44–55
In this section, we examine the literature on phishing and critically analyse different techniques based on the above categories. The focus, however, will be on the intelligent anti-phishing solutions, since these are believed to be the way forward in shielding the web from phishing threats, and promising results have recently been derived by this category in [6,30–33], [2,4], among others.

3.1. Legal anti-phishing legislations

Governments have been slow to respond to and oppose phishing. California State in the US was the first to issue anti-phishing legislation in 2005 [34]. This legislation stated that it is unlawful to use any electronic means such as websites, emails, or any other methods to ask or solicit information from online users by posing as a business without the authority of that business. Other US states such as Texas have also introduced new cybercrime legislation that includes phishing, and in 2005 the General Assembly of Virginia added phishing attacks to its list of computer crimes [35]. These new laws empowered companies such as America Online to file lawsuits in Virginia against phishers in 2006 [36]. However, most states in the US have not legislated specific laws incriminating phishing and usually prosecute phishers using other computing crime laws such as fraud.

At the Federal level in the US, lawmakers and congressional representatives have not passed anti-phishing legislation either. There were a few attempts between 2004 and 2006 following the Anti-Phishing Act of 2004 to pass specific bills incriminating phishing
and instigating tougher prison sentences, but these bills were stopped at the committee level in Congress. Nevertheless, federal law enforcement can incriminate phishers using other laws that are related to identity theft and fraud such as "18 U.S.C. section 1028" [36]. Businesses have also joined the Government in fighting phishing. For example, in 2005 Microsoft filed over 115 lawsuits in Washington's Western District Court accusing a single Internet user of utilising various deceptive methods to access some of the company's users' information (add reference). In mid-2006, the then president George W. Bush established a new cybercrime identity theft task force [37], with a single goal: reduce the risks of cybercrime, especially phishing.

The United Kingdom (UK) has followed the US by strengthening its legal system against severe cybercrimes, including fraud and identity theft. In 2006, the UK introduced the new Fraud Act, which increased prison sentences to up to ten years for online fraud offences [38]. This same act prohibits possession of a phishing website with the intent to deceive users and commit fraud. Further, Microsoft decided to collaborate with law enforcement agencies outside the US to bring phishers to justice. In doing so, the company signed an agreement with the Australian government to train law enforcement agents in preventing phishing [39]. Also, in 2010 Canada introduced an Anti-spam Act that incriminates cybercrime and aims to protect Canadian online consumers and businesses when trading globally [40].

3.2. Simulated training

One of the easy, yet helpful, policies to oppose cybercrime is to educate users on the ways employed to access their information. When novice users are aware of the circumstances around phishing, they may be able to minimise this risk or stop it as early as possible. Unfortunately, ordinary web browsing users are unaware of how phishing attacks start or how to visually recognise an untrustworthy website and differentiate it from one that is trustworthy [11,12]. Moreover, basic security indicators and their anti-phishing software counterparts are still vague for many online shoppers [24]. Consequently, these factors increase the pace of phishing and motivate phishers to launch further attacks. For instance, a security survey conducted by Julie et al. [41] revealed the lack of knowledge on cybercrimes, including phishing, held by online users. In addition, some respondents in the survey showed security awareness yet were reluctant to use their financial information for payment purposes, even on trustworthy websites.

There have been a number of studies on educating people as to the severity of phishing. For example, Arachchilage and Love [42] investigated whether mobile games can be a helpful method for raising awareness of phishing attacks. The authors evaluated the learning curves of users who played a mobile game about phishing developed by Arachchilage and Cole [43], and assessed whether an interactive mobile platform is effective in educating users in contrast to traditional security training. A comparison of user responsiveness to phishing has also been conducted using the developed mobile game, along with a website designed by APWG. The results showed that users who played the anti-phishing mobile game were able to spot non-genuine websites with a higher rate of accuracy than users who only used the APWG website.

There are a number of organisations and research studies, such as Arachchilage et al. [44] and Ronald et al. [45], that have adopted a related type of training to warn users of phishing. This training involves sending participants simulated malicious emails from a genuine source to evaluate their exposure to phishing. At the end of the training, participants are given the training material and informed about their vulnerability to phishing.

Embedded training is another way to measure a user's vulnerability to phishing. This training often mimics primary daily business processes performed by employees, experimenting to measure a certain outcome (Arachchilage and Love [42]). In phishing, Arachchilage et al. [44] used the embedded training methodology to measure phishing awareness at a university. The authors sent malicious emails from the administrator to participants without informing them of the training material content. These emails urged users to click on a link that would redirect them to a malicious website where they would input their login credentials. The aim was to identify the number of users who would actually click on the link. During the experiment, users were interrupted immediately when they clicked the link and were then provided with the training material. The embedded training proposed by the authors was based on a preliminary pilot study conducted by them on a limited number of university students.

3.3. User experience: Anti-phishing online communities

One of the approaches to reducing the impact of phishing on online users and organisations is to build an anti-phishing community to monitor recent phishing activities and provide news to the different stakeholders. Users' experiences are practical and based on real cases related to different types of phishing. Such efforts by users and organisations have resulted in new proactive online communities and data repositories. These accumulated resources are of interest since they can be employed to study ways to make the Internet safer and free from phishing.

The Monitoring and Takedown (MaT) approach enables individuals who recognise phishing activity to report it via public anti-phishing communities including APWG, PhishTank, Millersmiles, and Symantec among others [8,27,46,47]. These anti-phishing communities allow users to report phishing content and warn other users and organisations as well. Users can also report phishing content to the Federal Trade Commission's Complaint Department, becoming directly part of the campaign towards combating phishing. Many reputable companies also have an internet fraud department that allows users to report any fraudulent or suspicious activity such as phishing. PhishTank was created in 2006 as a subsidiary of OpenDNS in order to provide the parent company, as well as the online community, with a phishing repository. This large collection of stored phishing websites has given computer security experts, users, researchers and business owners extensive information about phishing attacks and the features of their associated emails and websites. Another example of a good use of user experiences is Cloudmark, which is an alerting-based anti-phishing method with a user rating system [48]. When a user visits a website and experiences any kind of threat, they can rate that website to alert other online users. Finally, Web of Trust (WOT) is another example of an anti-phishing approach based on the user feedback rating model [49].

3.4. Discussion of non-intelligent anti-phishing solutions

Legislators in the US, UK and Canada, among other countries, have approved legislative bills that include serious jail sentences for convicted phishers. This has been made clear in several high-profile cases, especially in the US. Nevertheless, these legislative bills have not achieved a decrease in phishing attacks. On the contrary, phishing has now become more severe than ever, and businesses as well as individual users have suffered substantial financial losses as a result. One of the primary reasons that legal action has not been as effective as expected in minimising phishing is that a phishing website often has a short life span (normally about two days), which helps the phisher to disappear quickly once the fraud has been committed, making law enforcement difficult.
As previously mentioned, raising awareness of phishing risks and educating users has shown promising initial results [45]. Computer security scholars have adopted different ways to communicate the harm phishing may cause to society: some used web-based material to teach novice users phishing fraud techniques [8,27]; others, such as Arachchilage et al. [44], developed contextual and embedded training based on simulated phishing emails coming from genuine sources; and others developed educational material on phishing based on mobile games in order to increase motivation among users [42].

Even though educating users may positively affect the global efforts of combating phishing, this approach demands high costs and requires users to be equipped with computer security knowledge. Large organisations and governments are periodically investing in the development of anti-phishing materials in both hard and soft forms as well as websites and mobile applications. However, since phishing techniques keep evolving, small to medium enterprises might not have the resources large organisations have to enable them to invest in their users' education. Therefore, a large portion of the online community realistically cannot afford the continuous additional costs of updating current anti-phishing material. Furthermore, phishing techniques are becoming more sophisticated because of the group efforts of phishers who employ systematic attack strategies, which makes it harder for even security experts and specialised law enforcement agents to keep their skills updated. This leaves ordinary users vulnerable, even if they are equipped with basic knowledge about phishing. Thus, more advanced, cheaper and intelligent approaches are needed alongside educational and legislative solutions to further reduce phishing attacks. We have seen thoughtful attempts that evolved from user experiences, user ratings, and users' social networking (such as Phishtank, Cloudmark, and APWG among others) helping novice users and enterprises avoid falling prey to phishing. The effectiveness of these user-community-based approaches relies mainly on the following factors: (1) User experience; (2) User knowledge; (3) User honesty; and (4) Accessibility and validity of the user community's website data. Unfortunately, these factors are difficult to measure and validate, so relying on user experience and knowledge alone requires great care. We hypothesise that considering "only" users' experience in judging a website's legitimacy is not enough to combat phishing, although it can be a supporting approach to a more advanced intelligent solution based on ML/DM.

4. Computerised anti-phishing techniques

Anti-spam software tools that can block suspicious emails have been developed; however, these programmes constantly block a large number of genuine emails and classify them as junk emails [11,12]. Emails misclassified as spam are simply false positive instances. Thus, one of the ultimate goals of a computerised anti-phishing tool is to reduce false positives and increase true positives so users can be confident of their mailbox's filter results without having to manually check their junk email folder.

4.1. Databases (blacklist and whitelist)

A database-driven approach to fight phishing, called blacklist, was developed by several research projects [2,50,51]. This approach is based on using a predefined list containing domain names or URLs for websites that have been recognised as harmful. A blacklisted website may lose up to 95% of its usual traffic, which will hinder the website's revenue capacity and eventually profit [23]. This is the primary reason that web masters and web administrators give great attention to the problem of blacklisting. According to Mohammad et al. [11,12], there are two types of blacklists in computer security:

• Domain/URL Based. These are real-time URL lists that contain malicious domain names and normally look for spam URLs within the body of emails.
• Internet Protocol Based. These are real-time URL or domain server blacklists that contain IP addresses whose status changes in real time. Often, mailbox providers, such as Yahoo for example, check domain server blacklists to evaluate whether the sending server (source) is run by someone who allows other users to send from their own source.

Users, businesses, or computer software enterprises can create blacklists. Whenever a website is about to be browsed, the browser checks the URL in the blacklist. If the URL exists in the blacklist, a certain action is taken to warn the user of the possibility of a security breach. Otherwise, no action will be taken as the website's URL is not recognised as harmful. Currently, there are a few hundred blacklists which are publicly available, among which we can mention the ATLAS blacklist from Arbor Networks, BLADE Malicious URL Analysis, DGA list, CYMRU Bogon list, Scumware.org list, OpenPhish list, Google blacklist, and Microsoft blacklist [52]. Since any user or small to large organisation can create blacklists, the currently publicly available blacklists have different levels of security effectiveness, particularly with respect to two factors:

1. How often the blacklist gets updated and its consistent availability.
2. The quality of results with respect to an accurate phishing detection rate.

Marketers, users, and businesses tend to prefer the Google and Microsoft blacklists over other publicly available blacklists because of their lower false positive rates. A study by [2] analysing blacklists concluded that they contain on average 47% to 83% phishing websites.

Blacklists are often stored on servers, but can also be available locally on a computer [25]. Thus, the process of checking whether a URL is part of the blacklist is executed whenever a website is about to be visited by the user, in which case the server or local machine uses a particular search method to verify the process and derive an action. The blacklist usually gets updated periodically. For example, the Microsoft blacklist is normally updated every nine hours to six days, whereas the Google blacklist gets updated every twenty hours to twelve days [11,12]. Hence, the time window needed to amend the blacklist by including new malicious URLs, or excluding possible false positive URLs, may allow phishers to launch and succeed in their phishing attacks. In other words, phishers have significant time to initiate a phishing attack before their websites get blocked. This is an obvious limitation of using the blacklist approach in tracking false websites [18]. Another study by APWG revealed that over 75% of phishing domains were genuinely serving legitimate websites; blocking these domains means that trustworthy websites are added to the blacklist, which drastically reduces their revenue and harms their reputation [9].

After the creation of blacklists, many automated anti-phishing tools normally used by software companies such as McAfee, Google, and Microsoft were proposed. For instance, The Anti-Phishing Explorer 9, McAfee Site Advisor, and Google Safe Base are three common anti-phishing tools based on the blacklist approach. Moreover, companies such as VeriSign developed anti-phishing internet crawlers that gather massive numbers of websites to identify clones in order to assist in differentiating between legitimate and phishing websites.

There have been some attempts to look into creating whitelists, i.e. legitimate URL databases, in contrast to blacklists [53]. Unfortunately, since the majority of newly created websites are initially identified as "suspicious", this creates a burden on the whitelist
approach. To overcome this issue, the websites expected to be visited by the user should exist in the whitelist. This is sometimes problematic in practice because of the large number of possible websites that a user might browse. The whitelist approach is simply impractical since the websites "known" in advance may differ from those actually visited during the browsing process. Human decision-making is a dynamic process, and users often change their minds and start browsing new websites that they initially never intended to visit.

One of the earliest whitelists was proposed by Chen and Guo [53], and was based on users browsing trusted websites. The whitelist monitors the user's login attempts, and if a repeated login is successfully executed this method prompts the user to insert that website into the whitelist. One clear limitation of Chen and Guo's method is that it assumes that users are dealing with trustworthy websites, which unfortunately is not always the case.

PhishZoo is another whitelist technique, developed by Afroz and Greenstadt [5]. This technique constructs a website profile using a fuzzy hashing approach in which the website is represented by several criteria that differentiate one website from another, including images, HTML source code, URL, and SSL certificate. PhishZoo works as follows:

1. When the user browses a new website, PhishZoo makes a specific profile for that website.
2. The new website's profile is contrasted with existing profiles in the PhishZoo whitelist.
    • If a full match is found, the newly browsed website is marked trustworthy.
    • If a partial match is found, the website will not be added since it is suspicious.
    • If no match is found but the SSL certificate is matched, PhishZoo will instantly amend the existing profile in the whitelist.
    • If no match is found, a new profile will be created for the website in the whitelist.

Recently, Lee et al. [31] investigated the personal security images whitelist approach and its impact on internet banking users' security. The authors recruited 482 users to conduct a pilot study on a simulated bank website. The results revealed that over 70% of the users during the simulated experiments had given their login credentials despite their personal security image test not being performed. Results also revealed that novice users do not pay high levels of attention to the use of personal images in e-banking, which can be seen as a possible shortcoming of this anti-phishing approach.

4.2. Intelligent anti-phishing techniques based on ML

Since phishing is a typical classification problem, ML and DM techniques seem appropriate for deriving knowledge from website features that can assist in minimising the problem. The key to success in developing automated anti-phishing classification systems is the website's features. Since there are a tremendous number of features linked with a website, a necessary step to enhance the predictive system's performance is to pre-process the set of features in order to pick the "most" effective. Feature effectiveness can be measured using different computational intelligence methods such as information gain, correlation analysis, and chi-square among others [54,55].

Once an initial feature set is chosen, the intelligent algorithm can be applied to the selected features to come up with the predictive system. There are many ML and DM algorithms for classification that have been developed by scholars in the last two to three decades as covered in Chapter 2. Most of these algorithms use one of the following major classification approaches in deriving their predictive systems:

(1) Decision trees (ID3, C4.5 and successors) [56].
(2) Probabilistic models (Naïve Bayes, Bayesian Network and successors) [57].
(3) Rule-based classification
    a. Associative classification (AC)
        i. Classification based on Association (CBA and successors) [58].
        ii. Classification based on Multiple Association (CMAR and successors) [59].
        iii. Multiclass Classification-based Association (MCAR and successors) [60].
    b. Rule induction such as FOIL, RIPPER and successors [61].
    c. Covering or greedy, such as PRISM [62] and eDRI [29,30].
(4) Neural Networks (NN) methods and their successors [63].
(5) Support Vector Machine (SVM) [64,65].
(6) Fuzzy Logic (FL) [66].
(7) Boosting and bagging methods, and their successors [67].
(8) Search methods such as Genetic Algorithms (GA) [68].

The rest of this section critically analyses intelligent anti-phishing attempts based on ML. We show how these approaches derive a classification anti-phishing system along with their benefits and weaknesses.

4.2.1. Decision trees and rule induction

Fette et al. [69] explored email phishing utilising the C4.5 decision tree classifier among other methods including Random Forest, SVM and Naïve Bayes. As a result, a new Random Forest method called "Phishing Identification by Learning on Features of Email Received" (PILFER) was developed. Experiments on a set of 860 phishy and 695 ham emails were conducted. The features identified for distinguishing phishing emails included: IP URLs, time of space, HTML messages, number of connections inside the email, and JavaScript. The authors claim that PILFER can be improved towards grouping messages by joining all ten features discovered in the classifier apart from "Spam filter output".

Mohammad et al. [25] investigated a number of rule induction algorithms on the problem of website phishing classification. The authors compared RIPPER, C4.5 (Rules), CBA, and PRISM on a security dataset they collected containing 2500 instances and 16 features. A special hand-crafted rule to collect the data was developed by the authors based on simple statistical analysis performed on the initial dataset's features. Experiments with the four rule-based classification methods showed that there are eight effective features that can be employed by the classification algorithm in combating phishing: SSL and HTTPS, Domain-age, Site-traffic, Long-URL, Request-URL Sub-domain, Multi-sub-domain, Suffix–prefix, and IP-address.

Khadi and Shinde [5] studied the problem of email-based phishing and proposed a potential solution based on combining a RIPPER classifier with fuzzy logic. The role of fuzzy logic is to pick the main features of the email and rank them based on a probability score. Meanwhile, the role of RIPPER is to automatically use these features to classify emails as ham or phishy. Two components of the email were utilised by Khadi and Shinde: the email message (spelling errors, embedded link) and URL (IP address, Length, Long URL, Suffix_Prefix, Crawler URL, Non matching URL). Moreover, very limited data consisting of just 100 instances from Phishtank was used in experiments involving the WEKA software tool. No comparison with other fuzzy logic or rule-based classifications was conducted by the authors. Results showed that there are
twelve rules generated by RIPPER from the dataset with an 85.4% prediction rate.

Aburrous et al. [26] investigated rule induction methods to seek their applicability for categorising websites based on phishing features. Website features were initially manually classified into six criteria as described in an earlier report on phishing by Aburrous et al. [22]. Using WEKA, a number of experiments with four classification algorithms (RIPPER, PART, PRISM, C4.5) were conducted against 1006 instances downloaded from Phishtank. The focus of the experiments was the classification accuracy of the classifiers produced. Results revealed that rule induction is a promising approach because it was able to detect, on average, 83% of phishing websites. The authors suggested that the results obtained could be further enhanced if careful feature selection were employed.

4.2.2. Associative classification (AC)

The two AC methods CBA and MCAR have been evaluated on a Phishtank dataset to seek their applicability in detecting phishing ([58,60], Aburrous et al., 2010b). Aburrous et al. (2010b) used a dataset consisting of over 1000 instances with 27 different features and applied CBA, MCAR, and four other rule-based classifiers using the WEKA DM tool. The aim was to assist security managers within organisations by building an intelligent anti-phishing tool within browsers that can detect phishing as accurately as possible. Experimental results of the six ML algorithms revealed that AC methods generated more rules than the rest of the algorithms, yet produced more predictive classifiers. More specifically, the AC systems produced showed high correlations among features linked with three major criteria: URL, Domain Identity, and Encryption. Nevertheless, the massive number of rules derived by MCAR and CBA may overwhelm end-users since they might not be able to control the anti-phishing system. Furthermore, the authors did not implement the AC rules within a browser to evaluate their real performance, which makes it difficult to measure the success or failure of their classification systems.

Recently, more domain specific AC anti-phishing systems have been created [4,18]. These new models take into account not only the two class values of the phishing problem (legitimate, phishy) but also consider a harder case to detect: the “suspicious” class label. Instances that cannot be fully classified as phishy or as legitimate are very hard to detect by typical ML algorithms, thus increasing their false positive rates. Abdelhamid et al. [18] and [4] have therefore enhanced current intelligent classification systems with two distinct advantages: (1) extending the phishing problem to include suspicious cases, making it more realistic; and (2) proposing a new multi-label learning phase that can discover disjunctive in addition to conjunctive rules. These additional disjunctive rules are tossed out by existing AC methods. This new multi-label phase enhances predictive power and provides more useful knowledge to the end-user. The authors used a dataset that has 16 features and over 1500 instances, comparing the performance of their classifiers with other rule-based classifiers with respect to the knowledge derived and its accuracy. The authors employed the chi-square testing method to measure feature goodness and discriminate among features with respect to their impact on phishing. Processed data results showed highly competitive performance of the new multi-label associative classifiers when compared with CBA, MCAR, rule induction, and decision trees.

4.2.3. Neural network (NN)

One of the common ways to train a NN is trial and error [32]. However, this methodology has been criticised because of the time spent to tune the parameters and the requirement of an available domain expert. Thabtah et al. [30] proposed a NN anti-phishing model based on self-structuring the classification system rather than using trial and error. The algorithm proposed by the authors updated several parameters, like the learning rate, in a dynamic way before adding a new neuron to the hidden layer. The process of updating these NN features is performed during the building of the classification model and based on the network environment, behaviour of the desired error rate, and the computed error rate at that point. The dynamic NN model was applied to detect phishing on a large dataset from UCI containing over 11 000 websites [12]. Experiments using different epoch sizes (100, 200, 500, 1000) have been conducted, and the results obtained exhibited better predictive systems when compared to Bayesian Network and Decision Trees.

The ANN Back Propagation algorithm [70] was investigated on a security dataset concerning website phishing by Mohammad et al. [71]. The authors collected a dataset with over 2000 instances from different legitimate and phishing sources. Processing the dataset, they tried to measure the correlation between the features and target attributes using basic univariate statistical analysis (frequency of feature values and the target attribute values). Finally, they applied the Back Propagation ANN algorithm to derive anti-phishing models. The results of the study indicated that ANN is a promising approach for combating phishing, particularly since the results showed increased accuracy of the models generated from the Back Propagation algorithm when compared with decision trees and probabilistic approaches.

Mohammad et al. [32] have developed an anti-phishing NN model that relies on constantly improving the learned predictive model based on previous training experiences. Since phishers continuously update their deception methods, new features become apparent while others become insignificant. In order to cope with these changes, the authors proposed a self-structuring NN classification algorithm that deals with the vitality of phishing features. The algorithm employs validation data to track the performance of the constructed network model and make the appropriate decision based on results obtained against the validation dataset. For instance, when the achieved error against the network is lower than the minimum achieved error, the algorithm saves the network’s weights and continues the training process. However, when the achieved error is larger than the minimum achieved error so far, the algorithm continues the training process without saving the weights. Other important network parameters are also updated when necessary during the building of the classification model without waiting until the model has been entirely built. Results obtained against a phishing dataset of thirty features and over 10 000 instances showed that the self-structuring NN model is able to generate anti-phishing models more accurately than traditional classification approaches such as C4.5 and probabilistic approaches.

Recently, a new machine learning technique based on Long Short Term Memory (LSTM) ANN was proposed to deal with spear phishing posts on social media [72]. The LSTM model was trained on different posts on social media that were represented as word vectors. The author enhanced the classification model by using clustering techniques. Experimental results revealed that the LSTM ANN classification model is more accurate than manual classification and other models obtained from former email attack campaigns.

Feed Forward NN (FFNN) was applied to an email phishing classification problem by Jameel and George [33]. A basic implementation of a multilayer FFNN based on Back Propagation was used to differentiate suspicious from legitimate emails. Eighteen binary features were extracted from the email (header and HTML body) and made available as the training dataset attributes. These features were given values based on human rules developed by security domain experts. To derive the NN models, 6000 emails were used. The results obtained showed that FFNN is able to categorise emails with high speed and with an error rate below 2%. However, the authors have not yet embedded their FFNN into browsers for live testing.

Table 1
Phishing features per category [22].
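
The validation-based rule described in Section 4.2.3 for the self-structuring NN of Mohammad et al. [32] (save the weights only when the validation error falls below the minimum seen so far, otherwise keep training without saving) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it assumes a single logistic neuron as a stand-in for the network, features encoded as -1/1, and labels 1 (phishy) and 0 (legitimate), and it does not add neurons or adapt the learning rate. All function names (`predict`, `error_rate`, `train_with_checkpoints`) are hypothetical and chosen for illustration only.

```python
import math


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow on extreme values.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, features):
    """Output of a single logistic neuron; weights[0] is the bias."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return sigmoid(z)


def error_rate(weights, data):
    """Fraction of (features, label) pairs that are misclassified."""
    wrong = sum(1 for x, y in data if (predict(weights, x) >= 0.5) != (y == 1))
    return wrong / len(data)


def train_with_checkpoints(train, validation, epochs=50, learning_rate=0.5):
    """Train by gradient descent and keep the weights with the lowest
    validation error observed so far (hypothetical illustration)."""
    weights = [0.0] * (len(train[0][0]) + 1)
    best_weights = list(weights)
    best_error = error_rate(weights, validation)
    for _ in range(epochs):
        for x, y in train:
            delta = learning_rate * (y - predict(weights, x))
            weights[0] += delta
            for i, xi in enumerate(x):
                weights[i + 1] += delta * xi
        current = error_rate(weights, validation)
        if current < best_error:
            # Lower than the minimum achieved error: save the weights.
            best_error = current
            best_weights = list(weights)
        # Otherwise training simply continues without saving the weights.
    return best_weights, best_error
```

Calling `train_with_checkpoints(train, validation)` on lists of `(features, label)` pairs returns the saved weights together with their validation error, so a later epoch that performs worse on the validation data never overwrites the best model found.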
Once a new website identity and its structural features are captured (Abnormal URL, Abnormal anchors, Server Form Handler, Abnormal certificate in SSL, Abnormal DNS, Abnormal cookies), then a SVM algorithm is trained on a historical dataset consisting of the same features in order to derive the new website type. Experimental results on six features using the proposed SVM indicated that the first helps towards increasing the detection rate since malicious websites are not correlated. Furthermore, the SVM model achieved just over 83% prediction rate, and therefore more investigation is needed into the feature selection phase by including other features that could improve the performance of the classifier.

4.2.5. Fuzzy logic

Phishing in electronic banking (Ebanking) applications has been investigated by Aburrous et al. [22] utilising Fuzzy Logic. A simulated phishing email was sent by the authors with the help of the security manager at Jordan Ahli Bank to measure security indicators of phishing among a sample of 120 employees after obtaining the necessary authorisation (www.ahlionline.com.jo). The email urged the chosen employees to reactivate their accounts by logging in because server maintenance conducted the previous two days required account reactivations. Shocking results were obtained: 37% of the targeted employees submitted their credentials without investigation, of which 7% were Information Technology employees. The authors’ goal with the simulated email was to determine features that users may look for inside the email when they suspect phishing, to be used within a FL system to help in differentiating types of email.

FL has been used as an anti-phishing model to help classify websites into legitimate or phishy in [22]. The authors claimed that FL could be effective in identifying phishing activities because it provides a simple way of dealing with intervals rather than specific numeric values. Their proposed FL classification model was built manually to categorise websites using the six criteria listed in Table 1. Each of those criteria contains a number of phishing indicators as described in the same table. Each feature in the dataset was assigned three possible values by the authors: Phishy, Genuine, and Doubtful. Limited results indicated that there are two effective indicators to distinguish phishiness in websites: Domain Identity and URL.

Almomani et al. [17] proposed a promising solution to deal with a vital type of email phishing attack called zero-day. This type of email phishing attack involves the utilisation of hosts by attackers that do not appear inside the blacklists of phishing emails. The authors developed a detection system named phishing dynamic evolving neural fuzzy framework (PDENF). This system was able to successfully flag phishing emails using classification rules learnt by semi-supervised learning techniques. In particular, the authors have used clustering to ease the process of classification using a neural fuzzy technique.

A fuzzy based ANN model was proposed in 2015 by Nguyen et al. [6] to classify websites based on a smaller set of phishing features related to the website’s URL (PrimaryDomain, SubDomain, PathDomain) and its rank (PageRank, AlexaRank, AlexaReputation). The proposed fuzzy ANN model does not use any rule set; rather, it employs a computational function to split data instances (websites) into “genuine” and “non-genuine” categories. Their model was tested against 21 600 websites from legitimate and phishing sources such as Phishtank and DMOZ. They also compared the generated results with those of Aburrous et al. [26] and Zhang and Yuan (2008). It was discovered that their fuzzy NN model was able to slightly enhance the phishing detection rate.

4.2.6. CANTINA term frequency inverse document frequency approach

Carnegie Mellon Anti-phishing and Network Analysis Tool (CANTINA) is a content based anti-phishing method that determines suspicious websites using the statistical measure of Term Frequency Inverse Document Frequency (TF–IDF). Term Frequency (TF) is a statistical formula that measures keyword significance in a document while Inverse Document Frequency (IDF) measures the importance of that keyword across a large collection of documents [28]. CANTINA evaluates the website content (links, anchor tags, forms tags, images, text, etc.) for TF–IDF to produce a lexical
Table 2
Common anti-phishing methods based on ML.
Method name | ML technique | First Author | Reference
Dynamic rule induction | Rule induction learning | Qabajeh Issa | Qabajeh, et al. 2014
Enhanced dynamic rule induction | Rule induction and covering approaches | Thabtah Fadi | Thabtah et al. [29,30]
Classification based association | AC | Aburrous Maher | Aburrous et al. [23,26]
Multi-label classifier based associative classification | AC | Abdelhamid Neda | Abdelhamid et al. [18]
Self-structuring neural network | NN | Mohammad Rami | Mohammad et al. [25,32]
Neural network trained with back-propagation | NN | Mohammad Rami | Mohammad et al. [71]
Feed forward neural network | NN | Jameel Noor Ghazi | Jameel and George [33]
Fuzzy DM | Fuzzy logic | Aburrous Maher | Aburrous et al. [22]
Fuzzy DM | Fuzzy logic | Khadi Anindita | Khadi and Shinde [4]
PILFER | Decision tree | Fette Ian | Fette et al. [69]
Page classifier | SVM | Pan Ying | Pan and Ding [74]
PDENF | Fuzzy and clustering | Almomani, Ammar | Almomani et al. [14,17]
CANTINA | Term frequency and inverse document frequency | Sanglerdsinlapachai Nuttapong | Sanglerdsinlapachai and Rungsawang [75]
Biased SVM, LIBSVM, ANN, self-organising map | NN, SVM and other ML techniques | Basnet Ram | Basnet et al. [15]
signature of the website. This signature (top ranked TF–IDF keywords) will be passed into the search engine to seek their rank in domain names and decide the type of the website. The description of the CANTINA based classification process is as follows:

1. Parse the webpage.
2. Compute the TF–IDF for the common terms of the website.
3. Select the top five terms according to the computed scores of all TF–IDF terms.
4. Add the top five terms to the URL to locate the lexical signature.
5. Input the lexical signature into a search engine.
6. Check whether the domain name of the current website matches the domain names of the top N search results (often N = 30).
7. Return “Legitimate” when there is a match or “Phishy” when there is no match.

When the search returns an empty set, the current website is classified as “phishy”. To overcome the “no results” problem the authors merged TF–IDF with other content features such as “IP Address”, “domain age”, “suspicious Images”, “suspicious Link”, and “suspicious URL”.

Sanglerdsinlapachai and Rungsawang [75] have used CANTINA TF–IDF, and added a few more features such as “Forms” and “Top pages similarity linked with the domain”, and removed features such as “domain age” and “known images”. A dataset consisting of 200 websites was used in the experiments, and three DM methods were applied to the dataset. Results obtained, despite being limited, revealed that the reduced feature set maintained a similar detection rate to that of the CANTINA feature set. Moreover, adding the new features slightly enhanced the detection rate for most of the learning methods considered in the experiments.

Table 2 shows a brief summary of the common anti-phishing approaches that are based on automated learning along with the name of the method, the learning approach used, the first author, and their reference details.

5. Conclusions

Website phishing classification is a fundamental problem due to the very large number of online transactions performed by businesses, individuals and governments. While many users are vulnerable to phishing attacks, playing catch-up to the phishers’ evolving strategies is not an option. There have been different approaches to combat phishing ranging from legal, educational, simulation, online community forums, black lists and machine learning among others. Unlike existing phishing reviews that were based around only intelligent techniques such as machine learning and data mining, this paper focuses on raising awareness and educating users on phishing from training and legal perspectives. This indeed will equip individuals with knowledge and skills that may prevent phishing on a wider context within the community. In this paper, we review conventional anti-phishing approaches such as law enforcement, user training, and education and then critically analyse their different methods. Then the attention is directed to reviewing predictive ML methods, particularly rule-based methods, decision trees, associative classification, SVM, NN, and computational intelligence. We contrast the ways these methods detect phishing activities, their performance and their advantages and disadvantages.

While many countries such as the USA have taken a lead to criminalise phishing activities and put together more severe legislation, it is still hard to find attackers, basically since phishing attacks have a short life span. Despite this limitation, it is still crucial that law enforcement agencies improve their information sharing work as well as jurisdiction. Moreover, educating novice users using visual cues can partly improve their abilities to detect phishing; however, many novice users still do not pay close attention to visual cues when browsing the internet, which makes them vulnerable to phishing attacks. Users need to be exposed to repetitive training about phishing attacks since phishers continuously change their deception tactics.

Online phishing communities gather data that allow users to share information about phishing attacks such as blacklisted URLs, which serve as a useful information centre for users. However, this approach necessitates good awareness about web security indicators, and blacklisted URLs become outdated because updates are not performed in real time.

Finally, anti-phishing methods based around ML, especially AC and rule induction, are suitable to combat phishing due to their high detection rate and, more importantly, the easy to understand outcomes they offer (If-Then rules). These rules empower novice users as well as security experts to understand and manage security indicators. However, adding a visualisation layer into ML learning methods is advantageous to novice users as they may react quickly to visual cues.

In the near future, we intend to design and implement a knowledge base using rule induction that can warn online users in real time of any possibility of phishing attacks.

References

[1] N. Abdelhamid, F. Thabtah, Associative classification approaches: Review and comparison, J. Inf. Knowl. Manage. (JIKM) 13 (3) (2014) 1450027.
[2] S. Sheng, M. Holbrook, N.A.G. Arachchilage, L. Cranor, J. Downs, Who falls for phish?: a demographic analysis of phishing susceptibility and effectiveness of interventions, in: CHI ’10 Proceedings of the 28th International Conference on Human Factors in Computing Systems, ACM, New York, NY, USA, 2010.
[3] N. Abdelhamid, Multi-label rules for phishing classification, Appl. Comput. Inf. 11 (1) (2015) 29–46.
[4] A. Khadi, S. Shinde, Detection of phishing websites using data mining techniques, Int. J. Eng. Res. Technol. 2 (12) (2014).
[5] Afroz, R. Greenstadt, PhishZoo: Detecting phishing websites by looking at them, in: Fifth International Conference on Semantic Computing (September 18–September 21), IEEE, Palo Alto, California USA, 2011.
[6] L.A.T. Nguyen, B.L. To, H.K. Nguyen, An efficient approach for phishing detection using neuro-fuzzy model, J. Autom. Control Eng. 3 (6) (2015).
[7] McCall, Gartner, Inc. 2011. http://www.gartner.com/newsroom/id/565125 [Accessed 05.06.16].
[8] D. Jevans, Anti-Phishing Working Group (APWG): http://www.antiphishing.org/ [Accessed 20.06.16], 2003.
[9] G. Aaron, R. Manning, APWG Phishing Reports, 2014. http://docs.apwg.org/reports/apwg_trends_report_q4_2014.pdf [Accessed 20.03.16].
[10] V. Suganya, A review on phishing attacks and various anti phishing techniques, Int. J. Comput. Appl. (0975–8887) 139 (1) (2016) 20–23.
[11] R. Mohammad, F. Thabtah, L. McCluskey, Tutorial and critical analysis of phishing websites methods, Comput. Sci. Rev. J. 17 (2015) 1–24. Elsevier.
[12] R. Mohammad, F. Thabtah, L. McCluskey, Phishing websites dataset. 2015, Available: https://archive.ics.uci.edu/ml/datasets/Phishing+Websites Accessed January 2016.
[13] K.R. Sahu, J. Dubey, A Survey on phishing attacks, Int. J. Comput. Appl. (0975–8887) 88 (10) (2014) 42–45.
[14] A. Almomani, B.B. Gupta, S. Atawneh, A. Meulenberg, E. Almomani, A survey of phishing email filtering techniques, IEEE Commun. Surv. Tutor. 15 (4) (2013) 2070–2090.
[15] R. Basnet, S. Mukkamala, A.H. Sung, Detection of phishing attacks: A machine learning approach, Soft Comput. Appl. Ind. (2008) 373–383.
[16] B.B. Gupta, N.A.G. Arachchilage, K.E. Psannis, Defending against phishing attacks: taxonomy of methods, current issues and future directions, Telecommun. Syst. 2017 (2017) 1–21.
[17] A. Almomani, B.B. Gupta, TC. Wan, A. Altaher, S. Manickam, Phishing dynamic evolving neural fuzzy framework for online detection zero-day phishing email. 2013. arXiv preprint arXiv:1302.0629 (2013).
[18] N. Abdelhamid, F. Thabtah, A. Ayesh, Phishing detection based associative classification data mining, Expert Syst. Appl. J. 41 (2014) 5948–5959.
[19] B.B. Gupta, A. Tewari, AK. Jain, D.P. Agrawal, Fighting against phishing attacks: state of the art and future challenges, 2017.
[20] M. Rader, S. Rahman, Exploring historical and emerging phishing techniques and mitigating the associated security risks, Int. J. Netw. Secur. Appl. (IJNSA) 5 (4) (2015). http://dx.doi.org/10.5121/ijnsa.2013.540223. July 2013.
[21] McFredies, P (n.d.) Phishing. 2016. http://www.wordspy.com/words/phishing.asp [Accessed 15.05.16].
[22] M. Aburrous, A. Hossain, K. Dahal, F. Thabtah, Intelligent quality performance assessment for E-Banking security using fuzzy logic, in: Proceedings of the 7th IEEE International Conference on Information Technology (ITNG 2008). Las Vegas, USA, 2008.
[23] M. Aburrous, M. Hossain, K.P. Dahal, F. Thabtah, Associative Classification techniques for predicting e-banking phishing websites, in: Proceedings of the 2010 International Conference on Information Technology, Las Vegas, Nevada, USA, 2010, pp. 176–181.
[24] I. Qabajeh, F. Thabtah, F. Chiclana, Dynamic classification rules data mining method, J. Manage. Anal. 2 (3) (2015) 233–253. Wiley.
[25] R. Mohammad, F. Thabtah, L. McCluskey, Intelligent rule based phishing websites classification, J. Inf. Secur. (ISSN: 17518709) (2) (2014) 1–17. IET.
[26] M. Aburrous, M. Hossain, K.P. Dahal, F. Thabtah, Experimental case studies for investigating e-banking phishing techniques and attack strategies, J. Cogn. Comput. 2 (3) (2010) 242–253. Springer Verlag.
[27] PhishTank, 2011. PhishTank. http://www.phishtank.com/ [Accessed 16.01.16].
[28] I.H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2005.
[29] F. Thabtah, R. Mohammad, L. McCluskey, A dynamic self-structuring neural network model to combat phishing, in: The Proceedings of the 2016 IEEE World Congress on Computational Intelligence. Vancouver, Canada, 2016.
[30] F. Thabtah, I. Qabajeh, F. Chiclana, Constrained dynamic rule induction learning, Expert Syst. Appl. 63 (2016) 74–85.
[31] J. Lee, L. Bauer, L.M. Mazurek, The effectiveness of security images in internet banking, IEEE Internet Comput. 19 (1) (2015) 54–62.
[32] R. Mohammad, F. Thabtah, L. McCluskey, Predicting phishing websites based on self-structuring neural network, J. Neural Comput. Appl. (ISSN: 0941-0643) 25 (2) (2014) 443–458. Springer.
[33] N.Gh. Jameel, L. George, Detection of phishing emails using feed forward neural network, J. Comput. Appl. 77 (7) (2013) 10–15.
[34] Information Week (n.d.), 2016. http://www.informationweek.com/california-enacts-tough-anti-phishing-law-/d/d-id/1036636? [Accessed 17.03.16].
[35] General Assembly of Virginia, 2005. CHAPTER 827. http://leg1.state.va.us/cgi-bin/legp504.exe?051+ful+CHAP0827 [Accessed 01.04.16].
[36] G.H. Pike, Lost data: The legal challenges, Inf. Today 23 (10) (2006) 1–3.
[37] Executive-Order-13402, 2006. Executive Order 13402. http://www.gpo.gov/fdsys/pkg/FR-2006-05-15/pdf/06-4552.pdf [Accessed 19.05.16].
[38] BBC News, 2005. http://news.bbc.co.uk/2/hi/uk_news/england/lancashire/4396914.stm [Accessed 11.04.16].
[39] Government of Australia, Hackers, Fraudsters and Botnets: Tackling the Problem of Cyber Crime. Report on Inquiry into Cyber Crime, 2011.
[40] ClickDimensions, 2014. www.clickdimensions.com/sites/default/files/PDF/WhitePaper-CASL.pdf [Accessed 12.05.16].
[41] S.D. Julie, H. Mandy, L.F. Cranor, Behavioral response to phishing risk, in: The Anti-Phishing Working Groups, 2nd Annual ECrime Researchers Summit, eCrime ’07, ACM, New York, NY, USA, 2007.
[42] N.A.G. Arachchilage, S. Love, A game design framework for avoiding phishing attacks, Comput. Hum. Behav. 29 (3) (2013) 706–714.
[43] N.A.G. Arachchilage, M. Cole, Design a mobile game for home computer users to prevent from phishing attacks, in: 2011 International Conference on Information Society (i-Society), 2011, pp. 485–489.
[44] N.A.G. Arachchilage, Y. Rhee, S. Sheng, S.H. Hasan, A. Acquisti, L.F. Cranor, J. Hong, Getting users to pay attention to anti-phishing education: evaluation of retention and transfer, in: ECrime ’07 Proceedings of the Anti-Phishing Working Groups 2nd Annual ECrime Researchers Summit, ACM, Pittsburgh, PA, USA, 2007.
[45] D.J.C. Ronald, C. Curtis, F.J. Aaron, Phishing for user security awareness, Comput. Secur. 26 (1) (2007) 73–80.
[46] M. Bright, MillerSmiles. 2011, [Online] Available at: http://www.millersmiles.co.uk/ [Accessed 09.01.16].
[47] B. Nahorney, The MessageLabs Intelligence Annual Security Report: 2009 Security Year in Review. 2015. http://www.symantec.com/content/en/us/enterprise/other_resources/intelligence-report-06-2015.en-us.pdf [Accessed 09.06.16].
[48] Cloudmark Org. Cloudmark. 2002. http://www.cloudmark.com/en/home [Accessed 10.02.16].
[49] WOT, Web of Trust. 2006. http://www.mywot.com/ [Accessed 24.03.16].
[50] Google Safe-Browsing, 2010. Google Safe Browsing. http://code.google.com/p/google-safe-browsing/ [Accessed 10.04.16].
[51] McAfee SiteAdvisor, 2006. McAfee SiteAdvisor. http://www.siteadvisor.com/ [Accessed 19 February 2016].
[52] Return Path, 2016. https://blog.returnpath.com/blacklist-basics-the-top-email-blacklists-you-need-to-know-v2/ [Accessed 22.03.16].
[53] J. Chen, C. Guo, Online detection and prevention of phishing attacks (Invited Paper), in: First International Conference on Communications and Networking in China, ChinaCom ’06, IEEE, Beijing, 2006.
[54] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, I. Witten, The WEKA data mining software: An update, SIGKDD Explor. 11 (1) (2009).
[55] H. Liu, R. Setiono, Chi2: Feature selection and discretization of numeric attribute, in: Proceedings of the Seventh IEEE International Conference on Tools with Artificial Intelligence, November 5–8, 1995, p. 388.
[56] J. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, CA, 1993.
[57] R.O. Duda, P.E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, New York, 1973.
[58] B. Liu, W. Hsu, Y. Ma, Integrating classification and association rule mining, in: Proceedings of the Knowledge Discovery and Data Mining Conference- KDD, 1998, pp. 80–86. New York.
[59] W. Li, J. Han, J. Pei, 2001 CMAR: Accurate and efficient classification based on multiple-class association rule, in: Proceedings of the IEEE International Conference on Data Mining-ICDM, pp. 369–376.
[60] F. Thabtah, P. Cowling, Y. Peng, MCAR: Multi-class classification based on association rule approach, in: Proceedings of the 3rd IEEE International Conference on Computer Systems and Applications, 2005, pp. 1–7.
[61] W.W. Cohen, Fast effective rule induction, in: Proceedings of the Twelfth International Conference on Machine Learning, Morgan Kaufmann, Tahoe City, California, 1995.
[62] J. Cendrowska, PRISM: An algorithm for inducing modular rules, Int. J. Man-Mach. Stud. 27 (4) (1987) 349–370.
[63] Grossberg, Nonlinear neural networks: Principles, mechanisms, and architectures, Neural Netw. 1 (1) (1988) 17–61.
[64] H. Joachims, Making Large-Scale Support Vector Machine Learning Practical, Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, MA, 1999.
[65] J. Platt, Fast training of SVM using sequential optimization, in: B. Scholkopf, C. Burges, A. Smola (Eds.), Advances in Kernel Methods–Support Vector Learning, MIT Press, Cambridge, 1998, pp. 185–208.
[66] L.A. Zadeh, “Fuzzy Sets,” Information and Control 8 (3) (1965) 338–353. http://dx.doi.org/10.1016/S0019-9958(65)90241-X.
[67] Y. Freund, R.E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, J. Comput. Syst. Sci. 55 (1) (1997) 119–139.
[68] E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, MA: Addison Wesley., 1989.
[69] I. Fette, N. Sadeh, A. Tomasic, Learning to detect phishing emails, in: Proceedings of the 16th international conference on World Wide Web. 2007, pp. 649–656.
[70] David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams, Learning representations by back-propagating errors, Nature 323 (6088) (1986) 533–536.
[71] R.M. Mohammad, F. Thabtah, L. McCluskey, Predicting phishing websites using neural network trained with back-propagation, in: World Congress in Computer Science, Computer Engineering, and Applied Computing, Las Vegas, 2013, pp. 682–686.
[72] J. Seymour, P. Tully, Generative Models for Spear Phishing Posts on Social Media. Technical report, 2018.
[73] S. Abu-Nimeh, D. Nappa, X. Wang, Nair, A comparison of machine learning techniques for phishing detection, in: The 2nd Annual Anti-Phishing Working Groups eCrime Researchers, eCrime ’07, ACM, New York, NY, USA, 2007.
[74] Y. Pan, X. Ding, Anomaly based web phishing page detection, in: The 22nd Annual Computer Security Applications Conference, (ACSAC), IEEE, Miami Beach, Florida, USA, 2006.
[75] N. Sanglerdsinlapachai, A. Rungsawang, Using domain top-page similarity feature in machine learning-based web, in: Third International Conference on Knowledge Discovery and Data Mining, IEEE, 2010.