
Modelling Cascades Over Time in Microblogs
Wei Xie, Feida Zhu, Siyuan Liu and Ke Wang*
Living Analytics Research Centre
Singapore Management University

* Ke Wang is from Simon Fraser University; this work was done while the author was visiting the
Living Analytics Research Centre at Singapore Management University.

Motivation

Business applications such as viral marketing have driven much research on predicting
whether a cascade will go viral.

In real life, very few cascades are truly viral.

Previous research* shows that temporal features are the key predictor of cascade size.

* Justin Cheng, Lada A. Adamic, P. Alex Dow, Jon M. Kleinberg, Jure Leskovec:
Can cascades be predicted? WWW 2014: 925-936

Time-aware Cascade Model


[Figure: a cascade at time t and at time t + dt. Users u0 to u3 carry re-share times t0 to t3 in both panels; u4 gains a re-share time t4 in the second panel; u5 has none in either.]

Time-aware Cascade Model


$$P_i(t) = h_i\left(t, \{t_j\}_{u_j \in \mathrm{Followee}^{(i)}(t)}; \theta\right) dt$$

$$P(C(t + dt)) = P(C(t + dt) \mid C(t)) \, P(C(t))$$

$$P(C(t_0)) = 1$$

$$P(C(t + dt) \mid C(t)) = \prod_{u_i \in X(t)} P_i(t) \prod_{u_i \in \bar{X}(t)} \left(1 - P_i(t)\right)$$

$X(t)$: users who have re-shared; $\bar{X}(t)$: users who haven't yet.
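A minimal sketch of one step of this likelihood, assuming the per-user hazard values are already available and reading the first product as running over the users who re-share during the step. The function name `step_log_likelihood` and its dictionary input are hypothetical, not taken from the slides; the sum is kept in log space to avoid underflow.

```python
import math


def step_log_likelihood(hazards, resharers, dt):
    """Return log P(C(t + dt) | C(t)) for a single time step.

    hazards   -- dict: user id -> hazard value h_i at time t (hypothetical input),
                 covering every user who had not re-shared before t
    resharers -- set of user ids that re-share during the step (assumed reading)
    dt        -- step length, small enough that h_i * dt stays below 1
    """
    log_p = 0.0
    for user, h in hazards.items():
        p = h * dt  # P_i(t) = h_i(...) dt
        if not 0.0 <= p < 1.0:
            raise ValueError(f"h_i * dt must lie in [0, 1), got {p} for {user}")
        if user in resharers:
            if p == 0.0:
                return float("-inf")  # a zero-probability event was observed
            log_p += math.log(p)      # factor P_i(t)
        else:
            log_p += math.log1p(-p)   # factor 1 - P_i(t)
    return log_p


# Hypothetical step: u4 re-shares, u5 does not.
print(step_log_likelihood({"u4": 0.5, "u5": 0.1}, {"u4"}, dt=0.1))
```

Summing these step terms over successive steps gives the log of P(C(t)) through the recursion, starting from log P(C(t_0)) = 0.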

Observations in Twitter
Observation 1. Only the first re-sharer matters.

$$P_i(t) = h_i(t, t_{j^*}; \theta) \, dt$$

where

$$j^* = \arg\min_j \{ t_j \mid u_j \in \mathrm{Followee}^{(i)}(t) \}$$

Observation 2. The chance that a tweet is retweeted decreases as time goes by.

$$P_i(t) = h_i(\tau; \theta) \, dt$$

where $\tau = t - t_{j^*}$ and $h_i(\cdot)$ is a decreasing function.
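The two observations reduce the hazard's input to a single elapsed time. A minimal sketch, assuming the re-share times of a user's followees are given as a plain list; the helper name `elapsed_since_first_resharer` is hypothetical.

```python
def elapsed_since_first_resharer(t, followee_times):
    """Return tau = t - t_{j*}, or None if no followee has re-shared by time t.

    followee_times -- re-share times t_j of the followees of user i
                      (hypothetical input; entries later than t are ignored)
    """
    seen = [t_j for t_j in followee_times if t_j <= t]
    if not seen:
        return None
    return t - min(seen)  # only the earliest re-sharer (Observation 1) matters


print(elapsed_since_first_resharer(10.0, [7.0, 3.0, 12.0]))  # 7.0
```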

Hazard Function Design


$$h(t) = \lim_{dt \to 0} \frac{P(t < T \le t + dt \mid T > t)}{dt} = \frac{f(t)}{1 - F(t)}$$

$$H(t) = \int_0^t h(u)\,du = \int_0^t \frac{F'(u)}{1 - F(u)}\,du = -\log(1 - F(u))\Big|_0^t = -\log(1 - F(t))$$

$$F(t) = 1 - e^{-H(t)}$$

Exponential distribution: $H(t) = \lambda t$, $F(t) = 1 - e^{-\lambda t}$

Weibull distribution: $H(t) = (\lambda t)^k$, $F(t) = 1 - e^{-(\lambda t)^k}$

For both: $H(\infty) = \infty \Rightarrow F(\infty) = 1 - e^{-\infty} = 1$

Hazard Function Design

$$H(\tau) = \lambda \left(1 - (\tau + 1)^{-\alpha}\right)$$

$\lambda$: scale parameter; $\alpha$: shape parameter

$$h(\tau) = \frac{dH(\tau)}{d\tau} = \lambda \alpha (\tau + 1)^{-(\alpha + 1)}$$

$H(\infty) = \lambda \Rightarrow F(\infty) = 1 - e^{-\lambda}$: the scale parameter $\lambda$ describes the eventual re-tweeting probability.
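A small numerical sketch of this hazard, assuming the names `lam` (scale) and `alpha` (shape); the function names and parameter values are illustrative only. It shows the cumulative hazard saturating at the scale parameter, so the re-tweeting probability levels off below 1.

```python
import math


def cumulative_hazard(tau, lam, alpha):
    """H(tau) = lam * (1 - (tau + 1)^(-alpha))."""
    return lam * (1.0 - (tau + 1.0) ** (-alpha))


def hazard(tau, lam, alpha):
    """h(tau) = dH/dtau = lam * alpha * (tau + 1)^(-(alpha + 1))."""
    return lam * alpha * (tau + 1.0) ** (-(alpha + 1.0))


def retweet_probability(tau, lam, alpha):
    """F(tau) = 1 - exp(-H(tau))."""
    return 1.0 - math.exp(-cumulative_hazard(tau, lam, alpha))


lam, alpha = 0.5, 1.5  # illustrative values, not fitted parameters
for tau in (0.0, 1.0, 10.0, 1000.0):
    print(tau, hazard(tau, lam, alpha), retweet_probability(tau, lam, alpha))
print("limit:", 1.0 - math.exp(-lam))  # eventual re-tweeting probability
```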

Hazard Rate Illustration

[Figure: retweeting rate (axis 0 to 20) against time in minutes (up to 60), with a mark at tC.]

[Figure: hazard rate (axis 0 to 16e-4) against time in minutes (0 to 60), comparing the empirical rate with the estimated rate.]

Dataset
From a Singapore-based Twitter data set, we extract all retweets to construct retweeting
cascades, giving 2,425,348 cascades in total.

Probabilistic Model Fitting


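TMt Threshold Model

$$h_i(t) = s\left(\left|\mathrm{Followee}^{(i)}(t)\right|\right), \quad \text{where } s(x) \propto \frac{1}{1 + e^{-a(x - b)}}$$

TCM-CH Constant Hazard

$$H(\tau) = \lambda \tau, \qquad h(\tau) = \frac{dH(\tau)}{d\tau} = \lambda$$

TCM-EH Exponential Hazard

$$H(\tau) = \lambda \left(1 - e^{-k\tau}\right), \qquad h(\tau) = \frac{dH(\tau)}{d\tau} = \lambda k e^{-k\tau}$$

TCM-LH Long-tail Hazard (our proposed model)

$$H(\tau) = \lambda \left(1 - (\tau + 1)^{-\alpha}\right), \qquad h(\tau) = \frac{dH(\tau)}{d\tau} = \lambda \alpha (\tau + 1)^{-(\alpha + 1)}$$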
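The slides do not state how the hazards are fitted. One standard option for survival models is maximum likelihood: a re-share observed after delay τ contributes log h(τ) − H(τ), and a user who has still not re-shared after delay τ contributes −H(τ); both follow from F = 1 − e^{−H}. The sketch below evaluates that log-likelihood for the three TCM hazards. The function names, toy delays and parameter values are all hypothetical.

```python
import math


def constant_hazard(tau, lam):
    """TCM-CH: returns (H, h) = (lam * tau, lam)."""
    return lam * tau, lam


def exponential_hazard(tau, lam, k):
    """TCM-EH: returns (H, h) with H = lam * (1 - exp(-k * tau))."""
    decay = math.exp(-k * tau)
    return lam * (1.0 - decay), lam * k * decay


def long_tail_hazard(tau, lam, alpha):
    """TCM-LH: returns (H, h) with H = lam * (1 - (tau + 1)^(-alpha))."""
    base = tau + 1.0
    return lam * (1.0 - base ** (-alpha)), lam * alpha * base ** (-(alpha + 1.0))


def log_likelihood(model, params, observed, censored):
    """Survival log-likelihood for re-share delays.

    observed -- delays tau at which a user re-shared
    censored -- delays tau up to which a user had still not re-shared
    """
    total = 0.0
    for tau in observed:
        H, h = model(tau, *params)
        total += math.log(h) - H  # log density: log h(tau) - H(tau)
    for tau in censored:
        H, _ = model(tau, *params)
        total -= H                # log survival: -H(tau)
    return total


# Toy delays in minutes and illustrative parameters, not values from the data set.
observed, censored = [0.5, 2.0, 30.0], [60.0, 60.0]
for name, model, params in [
    ("TCM-CH", constant_hazard, (0.01,)),
    ("TCM-EH", exponential_hazard, (0.5, 0.1)),
    ("TCM-LH", long_tail_hazard, (0.5, 1.5)),
]:
    print(name, log_likelihood(model, params, observed, censored))
```

Maximising this quantity over the parameters, for example with a generic numerical optimiser, would give one fitted hazard per model.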

Probabilistic Model Fitting


For each cascade, observe its development in the first $T_0$ for training, and the next $\Delta T$ for testing.
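A minimal sketch of this split, assuming each cascade is a list of re-share timestamps measured from the root post and assuming half-open windows; `split_cascade`, `t0` and `delta_t` are hypothetical names.

```python
def split_cascade(reshare_times, t0, delta_t):
    """Split re-share times (measured from the root post) into train and test windows.

    Training window: [0, t0). Testing window: [t0, t0 + delta_t).
    The half-open boundaries are an assumption; the slides do not specify them.
    """
    train = [t for t in reshare_times if 0 <= t < t0]
    test = [t for t in reshare_times if t0 <= t < t0 + delta_t]
    return train, test


train, test = split_cascade([1, 5, 30, 61, 100, 200], t0=60, delta_t=60)
print(len(train), len(test))  # 3 2
```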

Probabilistic Model Fitting

Predicting Cascade Growth

Virality Prediction

Thanks

Our work is based on previous cascade models:

J. Goldenberg, B. Libai, and E. Muller. Talk of the network: A complex systems look at the
underlying process of word-of-mouth. Marketing Letters, 12(3):211-223, 2001.

M. Gomez-Rodriguez, D. Balduzzi, and B. Schölkopf. Uncovering the temporal dynamics of
diffusion networks. In Proceedings of the 28th International Conference on Machine Learning,
ICML 2011, Bellevue, Washington, USA, June 28 - July 2, 2011, pages 561-568, 2011.

S. A. Myers, C. Zhu, and J. Leskovec. Information diffusion and external influence in networks.
In The 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,
KDD '12, Beijing, China, August 12-16, 2012, pages 33-41, 2012.

M. Gomez-Rodriguez, J. Leskovec, and B. Schölkopf. Modeling information propagation with
survival theory. In ICML (3), pages 666-674, 2013.

N. Du, L. Song, M. Gomez-Rodriguez, and H. Zha. Scalable influence estimation in
continuous-time diffusion networks. In Advances in Neural Information Processing Systems 26: 27th
Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting
held December 5-8, 2013, Lake Tahoe, Nevada, United States, pages 3147-3155, 2013.
