Abstract - Social network services have become one of the dominant human communication and interaction paradigms. However, the emergence of highly stealthy attacks perpetrated by bots in social networks leads to an increasing need for efficient detection methodologies. The bots' objectives can be as varied as those of traditional human criminality, with bots acting as agents of multiple scams. Bots may operate as independent entities that create fake (extremely convincing) profiles or hijack the profile of a real person using his or her infected computer. Detecting social network bots may be extremely difficult using human common sense or automated algorithms that evaluate social relations. However, bots are not able to fake the characteristic human behavior interactions over time. The pseudo-periodicity mixed with random and sometimes chaotic actions characteristic of human behavior is still very difficult to emulate/simulate. Nevertheless, this human uniqueness is very easy to differentiate from other behavioral patterns. Therefore, novel behavior analysis and identification methodologies are necessary for an accurate detection of social network bots. In this work, we propose a new paradigm that, by jointly analyzing the multiple scales of users' interactions within a social network, can accurately discriminate the characteristic behaviors of humans and bots within a social network. Consequently, different behavior patterns can be built for the different social network bot classes and typical human interactions, enabling the accurate detection of one of the most recent stealthy Internet threats.

Keywords - Facebook user behavior, human social-networking behavior, social-network bots, bot detection, Facebook interactions model.

I. INTRODUCTION

Together with the growing predominance of social networking services in human communication, a set of scam attacks perpetrated by bots [1], [2] has emerged. We are currently witnessing a major increase in cybercrime attacks against individuals, targeting their personal data and financial assets [3], [4]. Most of the current Internet malware threats disseminate themselves using social engineering and, mainly, using social networks [5], since social networking provides an open field for illicit activities [6]. Social networking sites are always improving their security, but this is a constant race behind the leading criminals [7]. Existing botnets [8], [9] can use social networking services to spread themselves but, more importantly, can use social networks to impersonate the owner of the controlled machine in order to obtain valuable personal information or force the person to interact with unwanted individuals or services. One of the better documented examples of social networking services abuse for malicious purposes was Koobface [10], [11]. Being currently the largest social networking service, Facebook is the main vector of attack via social networking services [12]. Current techniques to detect bots within a social network rely on automated algorithms that evaluate social relations. Based on graph-theory techniques, they try to detect unnatural relations in social networks [13], [9]. Another technique used to detect bot activity measures mouse movements and keystrokes produced while interacting in the generation of online contents. In [14] this class of behavioral analysis was applied in blogging activities, but it can also be easily applied to social network interfaces. The main downside of this approach is that it must rely on software loaded on the client browser, which can be difficult to implement and certainly impossible to generalize to all users due to confidentiality constraints. A viable solution should only rely on ubiquitous statistics that do not compromise the users' privacy, namely counting the number of social interactions per time interval (e.g., number of posts, number of likes, number of photo uploads).

It is extremely difficult to program a bot to replicate the characteristic human behavior of social interactions over time. Human actions have an inherent pseudo-periodicity mixed with random (and sometimes chaotic) actions, which are almost impossible to emulate/simulate. Nevertheless, this human uniqueness is very easy to differentiate from other behavioral patterns. Therefore, in this paper, we propose a new methodology that, by jointly analyzing the multiple scales of the users' interactions within a social network, can accurately discriminate the characteristic behaviors of humans and bots within a social network. Consequently, different behavior signatures can be used to accurately detect bots acting within a social network.

The proposed methodology applies the concept of multiscaling analysis based on scalograms to the statistical processes that describe the interaction of a user within the social network. Scalograms reveal much information about the nature of non-stationary processes that was previously hidden, so they are applied in many different scientific areas: diagnosis of special events in structural behavior during earthquake excitation, ground motion analysis, transient building response to wind storms, analysis of bridge response due to vortex shedding, among others [15].

The remaining part of this paper is organized as follows: Section II presents some important background on multiscaling analysis; Section III presents the characteristic behaviors
of humans and bots within a social network and how they differ; Section IV presents the results of a proof-of-concept of the proposed detection methodologies and, finally, Section V presents some brief conclusions about the presented detection methodologies.

Figure 1. Multiscale analysis example: (top-left) number of Facebook posts in 20-minute intervals, (bottom-left) scalogram / user activity energy per timescale over time, (bottom-middle) average user activity energy over time and (bottom-right) activity energy standard deviation over time. [Axes: time (days) and timescale (hours).]

II. MULTISCALING ANALYSIS

The main purpose of a multiscaling analysis is to identify the most important timescales of (pseudo-periodic) activity and quantify the constancy of that pseudo-periodicity. In order to achieve that objective, it is necessary to quantify the activity over time for multiple timescales.

Wavelets are mathematical functions that are used to divide a given signal into its different timescale components. Wavelets enable the analysis of each one of the signal components in an appropriate scale. Starting with a mother wavelet $\psi(t)$, a family $\psi_{\tau,s}(t)$ of wavelet daughters can be obtained by simply scaling and translating $\psi(t)$:

\[ \psi_{\tau,s}(t) = \frac{1}{\sqrt{|s|}}\, \psi\!\left(\frac{t-\tau}{s}\right) \tag{1} \]

where $s$ is a scaling or dilation factor that controls the width of the wavelet (the factor $1/\sqrt{|s|}$ is introduced to guarantee energy preservation, $\|\psi_{\tau,s}\| = \|\psi\|$) and $\tau$ is a translation parameter controlling the time location of the wavelet. Scaling a wavelet simply means stretching it (if $|s| > 1$) or compressing it (if $|s| < 1$), while translating it simply means shifting its position in time.

Given a signal $x(t)$, its Continuous Wavelet Transform (CWT) with respect to the wavelet $\psi$ is a function of time ($\tau$) and scale ($s$), $W_{x;\psi}(\tau, s)$, obtained by projecting $x(t)$ onto the wavelet family $\{\psi_{\tau,s}\}$:

\[ W_{x;\psi}(\tau, s) = \int_{-\infty}^{+\infty} x(t)\, \frac{1}{\sqrt{|s|}}\, \psi\!\left(\frac{t-\tau}{s}\right) dt \tag{2} \]

By analogy with the terminology used in the Fourier case, the energy components of the signal are given by the square of the CWT components of the signal and the (local) Wavelet Power Spectrum (sometimes called Scalogram or Wavelet Periodogram) is defined as the normalized energy over time and scales:

\[ E_x(\tau, s) = 100\, \frac{|W_{x;\psi}(\tau, s)|^2}{\sum_{\tau'} \sum_{s'} |W_{x;\psi}(\tau', s')|^2} \tag{3} \]

An example of a scalogram can be observed in Figure 1 (bottom-left).

III. SOCIAL-NETWORKING BEHAVIOR CHARACTERIZATION

A. Single User Behavior Inference

Within the context of this paper, signal $x(t)$ is a counting process that quantifies the number of social network interactions (i.e., number of posts, number of likes and number of posted photos) in a time interval (e.g., 30 minutes, as used in Section IV) over time. An example of a counting process (Facebook posts) can be observed in Figure 1 (top-left). In order to characterize the user multiscale behavior over time, it is possible to estimate (i) the Average Activity Energy of signal $x(t)$, $\bar{E}_x(s)$, by averaging the normalized energy of the signal over time for all timescales (see (4)) and (ii) the Activity Energy Standard Deviation of signal $x(t)$, $\sigma_{E_x}(s)$, by calculating the standard deviation of the normalized energy of the signal over time for all timescales (see (5)):

\[ \bar{E}_x(s) = \frac{1}{N} \sum_{i=1}^{N} E_x(\tau_i, s), \quad \forall s \tag{4} \]

\[ \sigma_{E_x}(s) = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} \left( E_x(\tau_i, s) - \bar{E}_x(s) \right)^2}, \quad \forall s \tag{5} \]

An example of these metrics can be observed in Figure 1, in the bottom middle and right plots, respectively.
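The single-user pipeline described above (counting process, CWT, normalized scalogram, then the per-timescale average and standard deviation) can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions: the mother wavelet is not named in this excerpt, so a real-valued Mexican hat is assumed; the integral in (2) is approximated by a Riemann sum; and all function names and the synthetic user are hypothetical.

```python
import numpy as np


def counting_process(timestamps_h, bin_hours, total_hours):
    """Count interactions per fixed-width time bin (the signal x(t))."""
    n_bins = int(np.ceil(total_hours / bin_hours))
    edges = np.arange(n_bins + 1) * bin_hours
    counts, _ = np.histogram(timestamps_h, bins=edges)
    return counts.astype(float)


def mexican_hat(t):
    """Real-valued mother wavelet (an assumed choice, not from the paper)."""
    return (1.0 - t**2) * np.exp(-0.5 * t**2)


def cwt(x, scales_h, bin_hours):
    """Riemann-sum approximation of Eq. (2): rows are scales, columns are times."""
    n = len(x)
    out = np.empty((len(scales_h), n))
    for k, s in enumerate(scales_h):
        half = int(np.ceil(5 * s / bin_hours))      # sample the wavelet on [-5s, 5s]
        t = np.arange(-half, half + 1) * bin_hours  # offsets t - tau, in hours
        psi = mexican_hat(t / s) / np.sqrt(abs(s))  # daughter wavelet, Eq. (1)
        # Correlate x with the daughter wavelet, then keep the n samples
        # aligned with the original time axis.
        full = np.convolve(x, psi[::-1], mode="full") * bin_hours
        out[k] = full[half:half + n]
    return out


def scalogram(W):
    """Eq. (3): energy normalized over all times and scales (sums to 100)."""
    energy = np.abs(W) ** 2
    total = energy.sum()
    return 100.0 * energy / total if total > 0 else energy


def activity_metrics(E):
    """Eqs. (4) and (5): mean and sample std of E_x(tau_i, s) over time, per scale."""
    return E.mean(axis=1), E.std(axis=1, ddof=1)


# Synthetic user (made-up data): a few posts per day around 21:00, with jitter.
rng = np.random.default_rng(0)
days = 20
posts = np.concatenate(
    [d * 24 + 21 + rng.normal(0, 1.5, size=rng.poisson(4)) for d in range(days)]
)
x = counting_process(posts, bin_hours=0.5, total_hours=days * 24)
scales = np.arange(1, 41, dtype=float)  # scale parameter s, in hours
E = scalogram(cwt(x, scales, bin_hours=0.5))
avg_energy, std_energy = activity_metrics(E)
```

Note that the correspondence between the scale parameter and an equivalent period depends on the chosen wavelet, so the scale axis of this sketch is not guaranteed to line up with the timescale axis used in the paper's figures.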
[Figure: curves labeled "Human Posts (2011-2012)" and "Human Posts (2007-2008)" versus timescale (hours).]

B. User Group Behavior Inference

[...] of user u in the social network, we can quantify the mean and variance values of the (i) group average activity energy (see (6) and (7)) and (ii) group activity energy standard deviation (see (8) and (9)), for all timescales and users within the group:
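Equations (6) to (9) are not reproduced in this excerpt, so the following is only a sketch of one plausible reading: it assumes the group-level mean and variance are taken across users, independently at each timescale, from per-user curves such as the average activity energy or the activity energy standard deviation. The function name, the unbiased variance estimator, and the toy values are all assumptions made for illustration.

```python
import numpy as np


def group_statistics(per_user_curves):
    """Mean and sample variance across users, for each timescale.

    per_user_curves: array of shape (n_users, n_scales), one row per user,
    e.g. each user's average activity energy per timescale.
    """
    curves = np.asarray(per_user_curves, dtype=float)
    group_mean = curves.mean(axis=0)
    group_var = curves.var(axis=0, ddof=1)  # assumed: unbiased (N - 1) estimator
    return group_mean, group_var


# Toy input: 3 hypothetical users, 4 timescales (values are made up).
avg_energy_by_user = np.array([
    [1.0, 2.0, 4.0, 2.0],
    [1.0, 4.0, 6.0, 2.0],
    [1.0, 3.0, 5.0, 2.0],
])
mean_u, var_u = group_statistics(avg_energy_by_user)
```

The same function can be applied to a matrix of per-user standard deviation curves to obtain the second pair of group statistics.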
[Figures 3 and 5: top panels plot average activity energy and bottom panels plot activity energy standard deviation, both versus timescale (hours), for the Human, Exponential-Bot and Periodic-Bot series.]

Figure 3. Multiscale characteristics of human and bot posts on Facebook in 2011-2012 dataset: (top) average energy activity over time for all scales and (bottom) standard deviation of energy activity over time for all scales, with 98% confidence intervals.

Figure 5. Multiscale characteristics of human and bot photo uploads on Facebook in 2011-2012 dataset: (top) average energy activity over time for all scales and (bottom) standard deviation of energy activity over time for all scales, with 98% confidence intervals.
Figure 4. Multiscale characteristics of human and bot likes on Facebook in 2011-2012 dataset: (top) average energy activity over time for all scales and (bottom) standard deviation of energy activity over time for all scales, with 98% confidence intervals.

[...] to observe that bots and humans have distinct multiscale behaviors. Periodic bots have their average activity energy centered on a 24-hour timescale and have low energy variation. Exponential bots and human users have a similar average activity energy distribution over the timescales, although human users still have slightly higher energy around the 24-hour timescale. However, the variation of activity energy over time is much higher in human users.

The multiscale characteristics of human activities in social networks have a pseudo-periodicity of 24 hours; however, the human analysis reveals an inherent chaotic and unpredictable behavior shown by the much higher variation of activity energy [...]

[...] human from bot users within a social network.

V. CONCLUSION

[...] networks. The results obtained reveal that multiscale behavior signatures can be built for different social network bot classes and typical human interactions, which will enable the development of accurate tools for the detection of social network bots.

REFERENCES

[1] S. Sengupta, "Bots Raise Their Heads Again on Facebook," The New York Times - Bits Blog, Jul. 2012.
[2] E. Gamma, "Your Facebook Friends May Be Evil Bots," InfoWorld, Apr. 2013.
[3] E. Kraemer-Mbula, P. Tang, and H. Rush, "The cybercrime ecosystem: Online innovation in the shadows?" Technological Forecasting and Social Change, vol. 80, no. 3, Mar. 2013, pp. 541-555.
[4] W. Kim, O.-R. Jeong, C. Kim, and J. So, "The dark side of the internet: Attacks, costs and responses," Information Systems, Special Issue on WISE 2009 - Web Information Systems Engineering, vol. 36, no. 3, 2011, pp. 675-705.
[5] S. Abraham and I. Chengalur-Smith, "An overview of social engineering malware: Trends, tactics, and implications," Technology in Society, vol. 32, no. 3, Aug. 2010, pp. 183-196.
[6] G. R. Weir, F. Toolan, and D. Smeed, "The threats of social networking: Old wine in new bottles?" Information Security Technical Report, vol. 16, no. 2, May 2011, pp. 38-43.
[7] P. Jagnere, "Vulnerabilities in social networking sites," in 2nd IEEE International Conference on Parallel Distributed and Grid Computing (PDGC 2012), Dec. 2012, pp. 463-468.
[8] S. S. Silva, R. M. Silva, R. C. Pinto, and R. M. Salles, "Botnets: A survey," Computer Networks, Feb. 2013, pp. 378-403.