
An Investigation into Stochastic Processes

for Modelling Human-Generated Data


A 20-cp 3rd year project

Author: Tim Jones
Supervisor: Dr. Gordon Ross

Rendered on Thursday 25th April, 2013
Acknowledgement of Sources
For all ideas taken from other sources (books, articles, internet), the source of the ideas is mentioned in the
main text and fully referenced at the end of the report.
All material which is quoted essentially word-for-word from other sources is given in quotation marks
and referenced.
Pictures and diagrams copied from the internet or other sources are labelled with a reference to the web
page or book, article etc.
Signed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Date . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Contents
1 Introduction
2 A Zoo of Stochastic Processes
  2.1 Markov Chains
    2.1.1 A Formal Definition
    2.1.2 An Intuitive Interpretation
    2.1.3 The Poisson Process
  2.2 Hidden Markov Models
  2.3 The Markov-Modulated Poisson Process
3 Potential Modelling Techniques
  3.1 Fitting a Non-Homogeneous Poisson Process
  3.2 Fitting a Markov-Modulated Poisson Process
    3.2.1 A derivation of the classical Viterbi algorithm
    3.2.2 A First Approximation of the Viterbi Algorithm for the Markov-Modulated Poisson Process
    3.2.3 Applying the Algorithm
    3.2.4 An Alternative Approach
  3.3 A Diagnosis
4 Conclusion
A Fun with Integrals
Chapter 1
Introduction
Modelling human-generated random data is notoriously difficult. Human event data arises in a variety of real-world situations, for example when inspecting network traffic or email communications. Being able to spot anomalies has important applications in botnet detection and in spotting suspicious behaviour on social networks. In this project, we attempted to model the behaviour of one user of twitter using various random processes. Ideally, we would find a good model for twitter in general, but one user makes a worthy starting point, and twitter provides an easily accessible representation of such data without trawling through network logs or other people's email accounts. The user was observed over the course of around 6 months, posting (emitting) a little over 3,300 tweets. The goal is to find some kind of model to fit these data, without using a hideously large number of parameters.

The data are visualised in Figure 1.1. We see a little seasonality - the user seems to be going to sleep at some time and waking up at another - as well as some burstiness: the user will produce a very rapid series of tweets in a very short time. Being able to detect these things would let us intuitively decide whether a fit is good or not, but principally we'll be reliant on statistical tests to judge how good our model is.

Chapter 2 will define a series of stochastic processes which we can use for modelling the data, and then chapter 3 will apply the more relevant ones. Of particular note is the Markov-Modulated Poisson Process defined in 2.3, which several authors [1][2][3] have postulated will provide a good model due to its doubly-stochastic nature: the intuition being that the tweets are a stochastic process, and the parameters of that stochastic process themselves follow another stochastic process. We will attempt to fit this and a series of other models, before concluding that a DTHMM with lognormal emissions provides the best fit.

Appendix A also describes a minor result which was spotted during this project, and which may have rather useful applications in other fields.

Figure 1.1: The raw data gathered from the twitter user. Each blue cross marks a tweet at a particular time. The 24 hours of a day run across the x-axis, and the days within our 6 months run up the y-axis, so a point at (17.5, 256) is a tweet at 5:30pm, 256 days into the observation.
Chapter 2
A Zoo of Stochastic Processes
2.1 Markov Chains
2.1.1 A Formal Definition
A stochastic process [4, p590] is a collection {X_t : t ∈ T} (sometimes X(t) for continuous T) of random variables. These collections may be indexed arbitrarily, but tend to be used to describe the evolution of some random series of events, using T as some representation of either discrete or continuous time; for instance, X_t may be the number of observed emissions from a radioactive source after t minutes, or the licence plate of the t-th car to go past a speed camera.
A Discrete Time Markov Chain (DTMC) is a particular type of discrete-time stochastic process - one which obeys the Markov Property. We usually set T = {0, 1, 2, ...} = ℕ and we say that (X_t)_{t∈ℕ} obeys the Markov Property iff

    ∀t ∈ T, ∀s ∈ S,  P(X_{t+1} = s | X_0, X_1, ..., X_t) = P(X_{t+1} = s | X_t)  [5]

We refer to S as the state-space, each s ∈ S is a state, and X_t represents the state at time t. The Markov Property essentially states that "given the present, the future is conditionally independent of the past" [5]. There is also a Continuous Time Markov Chain (CTMC, sometimes CTMP for Continuous Time Markov Process). We set T = ℝ⁺, let h be some positive value close to 0, and define a similar Markov Property:

    ∀s ∈ S,  P(X(t + h) = s | {X(τ) : τ ≤ t}) = P(X(t + h) = s | X(t))
We'll be dealing exclusively with the discrete-space case in this project, ie a Markov Chain where S is discrete, though we'll need to use both continuous and discrete times. It is popular for many authors to set S ⊆ ℤ, though our states can be integers, real numbers, popes, or any other completely arbitrary non-empty set. Where S and T are discrete, we define δ = (δ_i)_{i∈S}, the initial probability vector, and Π = (π_ij)_{(i,j)∈S²}, the matrix of transition probabilities, such that

    δ_i = P(X_0 = i)
    ∀t ∈ ℕ,  π_ij = P(X_{t+1} = j | X_t = i)

In this case, Π is constant - it does not depend on time - a property referred to as homogeneity. Every DTMC can be uniquely defined by the triple (S, δ, Π). Naturally, the row sums of Π should be 1, ie

    ∀i ∈ S,  Σ_{j∈S} π_ij = 1

For CTMCs, rather than a transition probability matrix, a transition rate matrix Q = (q_ij)_{(i,j)∈S²} is defined for small h as

    ∀i, j ∈ S, i ≠ j,  P(X(t + h) = j | X(t) = i) = q_ij h + o(h),

where o(h) is some function such that o(h)/h → 0 as h → 0. This gives us that

    ∀i, j ∈ S, ∀τ > 0,  P(X(t + τ) = j | X(t) = i) = (e^{τQ})_ij,

where (e^{τQ})_ij is the (i, j)-th element of the matrix exponential of τQ. With the initial probability vector δ defined as before, we can uniquely define any CTMC with the triple (S, δ, Q). Q's row sums are always 0, with all off-diagonal elements being non-negative, and all diagonal elements being non-positive, ie

    ∀i, j ∈ S,  i ≠ j ⟹ q_ij ≥ 0
    ∀i ∈ S,  q_ii = −Σ_{j≠i} q_ij
2.1.2 An Intuitive Interpretation
Whilst the above defines the Markov Process, it fails to describe it in any intuitive way. Before attempting to use them, it is important to be able to deal with both the continuous and discrete-time Markov Chains in an intuitive way. My personal preferred method is to use edge-weighted directed graphs [6]. Each state in S is given a node in the graph, and for each i, j ∈ S the edge (i, j) is given weight π_ij in the discrete case and q_ij in the continuous case. Generally, in the continuous case we don't draw transition rates from each node to itself, since these are implied by all other nodes.

As an example, let's use the following definitions, with δ and Π indexed in the order the elements of S are written:

    S = {John Paul II, Adrian I}
    δ = (1, 0)

    Π =        JP2  A1
        JP2  ( 0.3  0.7 )
        A1   ( 0.9  0.1 )
This defines a Discrete Time Markov Chain with the graph in Figure 2.1.

Figure 2.1: A DTMC's graph - the initial probability vector may be represented by arrows entering from the outside, but this is not universal
We can then say that a DTMC will hop from node to node at each time step, with probabilities defined by the weights of the edges between the current node and its neighbours, and can be simulated by Algorithm 1. A CTMC is slightly more complex: on arriving in a state i, a random time T ~ Exp(−q_ii) is generated. The CTMC will remain in state i for T units of time, then jump to state j ≠ i with probability q_ij / (−q_ii). We can simulate a CTMC with Algorithm 2.
input : (S, δ, Π), a Markov Chain, and T, a maximum number of steps to simulate
output: x, a vector of states x_i, recording the sequence of states visited by the chain
begin
    x_1 ← σ wp δ_σ  /* x1 takes on state sigma with probability delta sigma */
    for n ← 2 to T do
        x_n ← σ wp π_{x_{n−1},σ}  /* xn takes on state sigma with the relevant probability */
    end
    return x
end
Algorithm 1: A Simulation Algorithm for the generic Markov Chain
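Algorithm 1 is compact enough to sketch directly in executable form. The following Python sketch (Python is used here purely for illustration; the project's own tooling is described in section 3.2.3) simulates the papal chain of Figure 2.1. The function name and interface are mine, not part of any library.

```python
import random

def simulate_dtmc(states, delta, Pi, T, rng=None):
    """Algorithm 1: simulate a DTMC (S, delta, Pi) for T steps."""
    rng = rng or random.Random()
    # x_1 takes on state sigma with probability delta_sigma
    idx = rng.choices(range(len(states)), weights=delta)[0]
    path = [states[idx]]
    for _ in range(T - 1):
        # x_n takes on state sigma with probability pi_{x_{n-1}, sigma}
        idx = rng.choices(range(len(states)), weights=Pi[idx])[0]
        path.append(states[idx])
    return path

# The two-pope chain of Figure 2.1: delta = (1, 0), so we always start at JP2
states = ["JP2", "A1"]
delta = [1.0, 0.0]
Pi = [[0.3, 0.7],
      [0.9, 0.1]]
path = simulate_dtmc(states, delta, Pi, 1000, rng=random.Random(0))
```

Over a long run this chain should spend a little over half its time in JP2, since the stationary distribution of Π here is (9/16, 7/16).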
input : (S, δ, Q), a CTMC, and T, a maximum time for which to simulate the process
output: t, a vector of pairs (t_i, x_i), recording the time and destination of the i-th transition
begin
    t_0 ← 0
    x_0 ← σ wp δ_σ  /* x0 takes on state sigma with probability delta sigma */
    n ← 0
    while t_n < T do
        τ ← Exp(−q_{x_n x_n})  /* tau takes on an exponentially distributed random value */
        n ← n + 1
        t_n ← t_{n−1} + τ
        x_n ← σ wp q_{x_{n−1} σ} / (−q_{x_{n−1} x_{n−1}})  /* xn takes on state sigma with the given probability */
    end
    return ((t_0, x_0), ..., (t_{n−1}, x_{n−1}))
end
Algorithm 2: A Simulation Algorithm for the generic CTMC
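Algorithm 2 can be sketched the same way. This is again an illustrative Python sketch of my own, assuming every diagonal entry of Q is strictly negative (so every state is eventually left):

```python
import random

def simulate_ctmc(states, delta, Q, T, rng=None):
    """Algorithm 2: simulate a CTMC (S, delta, Q) up to time T.

    Returns a list of (time, state) pairs: the initial state at time 0,
    then the time and destination of each transition.
    """
    rng = rng or random.Random()
    m = len(states)
    x = rng.choices(range(m), weights=delta)[0]
    t = 0.0
    trace = [(t, states[x])]
    while True:
        # Holding time in state x is Exp(-q_xx)
        t += rng.expovariate(-Q[x][x])
        if t >= T:
            return trace
        # Jump to state j != x with probability q_xj / (-q_xx)
        x = rng.choices(range(m), weights=[Q[x][j] if j != x else 0.0
                                           for j in range(m)])[0]
        trace.append((t, states[x]))

# A two-state chain with mean holding times of 1 and 1/2 time units
Q = [[-1.0, 1.0],
     [2.0, -2.0]]
trace = simulate_ctmc(["a", "b"], [0.5, 0.5], Q, 100.0,
                      rng=random.Random(0))
```

With only two states, every jump must land in the other state, which gives an easy sanity check on the output.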
2.1.3 The Poisson Process
The simplest form of CTMC is the homogeneous Poisson process, where S = ℕ and ∀i ∈ S, q_ii = −λ, q_{i,i+1} = λ. We call λ the rate of this process. If N(t) is a Poisson process, we then have that, for small h,

    ∀i ∈ ℕ, ∀t,  P(N(t + h) = i + 1 | N(t) = i) = λh + o(h)

Note that N is the associated counting process, in which N(t) counts the number of events up to time t. N is a stochastic process, not a distribution or a random variable. The graph for this process is similarly simple, shown in Figure 2.2.

Figure 2.2: The graph of a Poisson process - the states 0, 1, 2, 3, ... in a chain, each with a rate-λ transition to its successor

Implicit from this, we see that the process can only increase - once leaving a state, we never return - so we can define the emission times t_n as t_n = min{t : N(t) = n}. We can also define τ_n = t_n − t_{n−1}, the inter-arrival time for the n-th jump, for n ∈ {1, 2, ...}. We refer to these as emissions and arrivals since a Poisson process generally models a counting process whereby we record the times at which we observe events happening, eg the times at which radioactive particles are detected from a radioactive material. The most crucial property of the Poisson process for this work, implicit from Algorithm 2, is that

    ∀n ∈ ℕ,  τ_n ~ Exp(λ)

That is, all inter-arrival times follow an Exponential distribution. A modification of the generic CTMC simulation algorithm can be made for simulating a Poisson process. Since it's only possible to jump from state i to state i + 1, we need not record the destinations of each jump; jump i will always take us to state i. This is detailed in Algorithm 3.
input : λ, a Poisson process rate, and T, a maximum time for which to simulate
output: t, the emission times of a Poisson process of rate λ terminating before time T
begin
    t_0 ← 0
    n ← 0
    while t_n < T do
        n ← n + 1
        τ_n ← Exp(λ)  /* tau takes on an exponentially distributed random value */
        t_n ← t_{n−1} + τ_n
    end
    return (t_0, ..., t_{n−1})
end
Algorithm 3: A Simulation Algorithm for the Poisson Process
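A minimal Python sketch of Algorithm 3 follows; unlike the pseudocode, this version returns only the emission times strictly inside [0, T), dropping the initial t_0 = 0.

```python
import random

def simulate_poisson(lam, T, rng=None):
    """Algorithm 3: emission times of a rate-lam Poisson process on [0, T)."""
    rng = rng or random.Random()
    times, t = [], 0.0
    while True:
        # Inter-arrival times are Exp(lam)
        t += rng.expovariate(lam)
        if t >= T:
            return times
        times.append(t)

# E[N(T)] = lam * T, so this should yield roughly 500 emissions
emissions = simulate_poisson(5.0, 100.0, rng=random.Random(0))
```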
We can also define an inhomogeneous Poisson process. Everything remains as before, except rather than having a rate parameter λ ∈ ℝ⁺, we have a rate function λ : ℝ⁺ → ℝ⁺, where, for sufficiently small h,

    ∀i ∈ ℕ, ∀t ∈ ℝ⁺,  P(N(t + h) = i + 1 | N(t) = i) = λ(t)h + o(h)

This process is no longer homogeneous; however, it remains a powerful and more generic tool. Simulating one of these is, however, less simple than for the homogeneous case. The algorithm for doing so is based on Bernoulli thinning [7] and is equivalent to Algorithm 4.

To simulate an inhomogeneous Poisson process, we first simulate a homogeneous Poisson process of constant rate λ_max with Algorithm 3, where λ_max is some upper bound on the inhomogeneous rate function λ, and then keep each emission at time t_i with a probability proportional to the inhomogeneous rate function λ(t_i), using Algorithm 4.
input : λ : [0, T] → [0, λ_max], a desired rate function for the resulting Poisson process, and t, the emission times of a Poisson process of rate no greater than λ_max, terminating before time T, indexed from 1 to n
output: t′, the emission times for a single realisation of an inhomogeneous Poisson process with rate function λ
begin
    j ← 0
    for i ← 1 to n do
        r ← U(0, 1)  /* r takes on a uniformly distributed random value in [0,1] */
        if r < λ(t_i)/λ_max then
            t′_j ← t_i
            j ← j + 1
        end
    end
    return t′
end
Algorithm 4: A Thinning Algorithm for Poisson Processes
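Algorithms 3 and 4 combine naturally into one routine; a hedged Python sketch, with an arbitrary bounded rate function λ(t) = 2 + sin(t) chosen purely for illustration:

```python
import math
import random

def thin(rate, lam_max, T, rng=None):
    """Algorithms 3 + 4: simulate a homogeneous process at rate lam_max,
    then keep the emission at time t with probability rate(t) / lam_max."""
    rng = rng or random.Random()
    kept, t = [], 0.0
    while True:
        t += rng.expovariate(lam_max)
        if t >= T:
            return kept
        if rng.random() < rate(t) / lam_max:
            kept.append(t)

# An arbitrary bounded rate function: lam(t) = 2 + sin(t) <= 3
emissions = thin(lambda t: 2 + math.sin(t), 3.0, 100.0,
                 rng=random.Random(0))
```

The expected count is the integral of the rate over [0, 100], here a shade over 200 emissions.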
2.2 Hidden Markov Models
The Hidden Markov Model is an extension of the Markov Chain, in which the chain itself may not be fully observable. With discrete time and state space, rather than having a directly observable Markov Chain, it is assumed that the process has an underlying unobservable Markov Chain. An observation space, Y, is defined, along with a set of probability densities on Y, {p_s : s ∈ S}. At each time i, if the underlying Markov Chain is in state x_i, then a single observation, y_i, will be emitted by the HMM according to the distribution defined by p_{x_i}, before the chain jumps into a new state and makes another emission.

The traditional example is that of the unfair casino. Imagine the dealer at the casino has two six-sided dice. Die one is fair, but die two is weighted such that it never shows a 6, and shows the numbers 1 to 5 with equal probabilities. The dealer will clandestinely swap the dice with probability 0.1 before rolling. He selects his first die uniformly. We can represent this as follows:

    S = {1, 2}
    δ = (0.5, 0.5)

    Π = ( 0.9  0.1 )
        ( 0.1  0.9 )

    Y = {1, 2, 3, 4, 5, 6}
    ∀y ∈ Y,  p_1(y) = 1/6
    ∀y ∈ Y \ {6},  p_2(y) = 1/5

We can only observe the results of the dice rolls, but what we're interested in is the sequence of states entered by the model. Which die is the dealer using at which time? Further to this, suppose we didn't know the above parameters in advance. We may also want to estimate them from our observations in some way.
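The casino's parameters plug straight into a simulation; the sketch below (illustrative Python, not from any library) emits both the hidden state sequence and the visible rolls, which is exactly the pairing the Viterbi algorithm of chapter 3 tries to reconstruct.

```python
import random

def simulate_casino(T, rng=None):
    """Simulate T rolls of the unfair-casino HMM defined above.

    Returns (states, rolls): the hidden die in use and the visible
    roll at each time step.
    """
    rng = rng or random.Random()
    state = rng.choices([1, 2], weights=[0.5, 0.5])[0]  # delta = (0.5, 0.5)
    states, rolls = [], []
    for _ in range(T):
        states.append(state)
        if state == 1:
            rolls.append(rng.randint(1, 6))   # fair die: p_1(y) = 1/6
        else:
            rolls.append(rng.randint(1, 5))   # loaded die never shows a 6
        if rng.random() < 0.1:                # swap dice with probability 0.1
            state = 3 - state
    return states, rolls

states, rolls = simulate_casino(500, rng=random.Random(0))
```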
2.3 The Markov-Modulated Poisson Process
The Markov-Modulated Poisson Process (MMPP) is a particular type of Hidden Markov Model. We assume that, underlying some inhomogeneous Poisson process, there is a CTMC. Each state is tied to a fixed Poisson process rate, so S = {λ_1, ..., λ_n}. The instantaneous rate of the Poisson process is then defined by the state in which the underlying CTMC resides. We can define such a process by letting t = ((t_0, s_0), ..., (t_n, s_n)) be a trace of the CTMC as generated by Algorithm 2, [n − 1] = {1, 2, ..., n − 1}, and Λ : [0, T] → S be the step function generated by simulating the underlying CTMC, ie

    Λ(τ) = s_i  for τ ∈ [t_i, t_{i+1}), i ∈ {0} ∪ [n − 1]
           s_n  for τ ∈ [t_n, T]

We then have that an inhomogeneous Poisson process of rate Λ is a single realisation of this MMPP. In order to generate multiple traces, it is necessary for us to produce a new rate function from the underlying CTMC for each trace.
Though this is correct, I personally find that it is simpler to imagine an MMPP as the concatenation of a series of Poisson processes, each generated within a particular state of the underlying CTMC. The underlying CTMC enters state s_1 at time t_1. It generates emissions as a Poisson process of rate s_1 for t_2 − t_1 time units. It then stops acting like this Poisson process, and starts generating emissions as a Poisson process of rate s_2 for t_3 − t_2 time units.

It's not immediately obvious that this intuition matches the definition, so it's worth arguing that this is the case.
Let λ_1 : [0, T_1) → ℝ⁺ and λ_2 : [0, T_2) → ℝ⁺ be two rate functions. Let T = T_1 + T_2, and define λ, the concatenation λ = λ_1 || λ_2, as

    λ(τ) = λ_1(τ)        for τ ∈ [0, T_1)
           λ_2(τ − T_1)  for τ ∈ [T_1, T)

Let {N_1(t) : t ∈ [0, T_1)} be a Poisson process of rate λ_1, similarly for N_2 and N, and let Ñ be defined thus:

    Ñ(τ) = N_1(τ)                   for τ ∈ [0, T_1)
           N_1(T_1) + N_2(τ − T_1)  for τ ∈ [T_1, T)
Ñ represents my intuition that an MMPP can be formed by gluing together emissions from shorter fixed-rate Poisson processes, and N represents the actual definition given above. If we can show that Ñ and N are identically distributed, then we can inductively apply this result to arbitrarily long concatenations, and hence to the generic MMPP.

Let τ ∈ [0, T_1) and h > 0 be such that τ + h < T_1. We then have that

    P(Ñ(τ + h) − Ñ(τ) = 1) = P(N_1(τ + h) − N_1(τ) = 1)
                            = λ_1(τ)h + o(h)
                            = λ(τ)h + o(h)

So Ñ is identical to a Poisson process of rate λ within the range [0, T_1).
Now let τ ∈ [T_1, T) and h > 0 be such that τ + h < T. We have that

    P(Ñ(τ + h) − Ñ(τ) = 1) = P(N_2(τ − T_1 + h) + N_1(T_1) − N_2(τ − T_1) − N_1(T_1) = 1)
                            = P(N_2(τ − T_1 + h) − N_2(τ − T_1) = 1)
                            = λ_2(τ − T_1)h + o(h)
                            = λ(τ)h + o(h)

So Ñ is identical to a Poisson process of rate λ within the range [T_1, T). We need not deal with the case where τ and τ + h are either side of T_1, since we can always choose an h small enough that this is not the case - these probabilities assume that h is close to 0. So we have that Ñ is distributed identically to N.
So concatenating two Poisson processes of rates λ_1 and λ_2 produces a new Poisson process of rate λ_1 || λ_2, and it follows from simple induction that an MMPP can be thought of as the concatenation of multiple bounded-length homogeneous Poisson processes, with rates determined by the state of the underlying CTMC, and time bounds determined by the length of time spent in each state of the underlying CTMC.
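The concatenation view translates directly into a simulation routine. The following is a hedged Python sketch of my own (assuming every state's rate is strictly positive and every diagonal of Q is strictly negative): while the CTMC holds in a state, we run a homogeneous Poisson process at that state's rate.

```python
import random

def simulate_mmpp(rates, delta, Q, T, rng=None):
    """Simulate an MMPP as a concatenation of fixed-rate Poisson
    processes: while the CTMC holds in state i, emissions arrive
    at rate rates[i]. Assumes every rate is strictly positive."""
    rng = rng or random.Random()
    m = len(rates)
    state = rng.choices(range(m), weights=delta)[0]
    t, emissions = 0.0, []
    while t < T:
        # Holding time in the current state is Exp(-q_ii)
        end = min(t + rng.expovariate(-Q[state][state]), T)
        # A homogeneous Poisson process of rate rates[state] on [t, end)
        e = t
        while True:
            e += rng.expovariate(rates[state])
            if e >= end:
                break
            emissions.append(e)
        t = end
        # Jump to state j != i with probability q_ij / (-q_ii)
        state = rng.choices(range(m),
                            weights=[Q[state][j] if j != state else 0.0
                                     for j in range(m)])[0]
    return emissions

emissions = simulate_mmpp([0.5, 5.0], [0.5, 0.5],
                          [[-0.1, 0.1], [0.2, -0.2]],
                          1000.0, rng=random.Random(0))
```

For this example Q, the chain spends about two thirds of its time in the slow state, so the long-run emission rate is roughly (2/3)(0.5) + (1/3)(5) = 2 per time unit.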
Chapter 3
Potential Modelling Techniques
Now that we've defined a selection of random processes, we can start discussing what might be appropriate to fit to our data. Before fitting a model to the data, however, it is important to reassure ourselves that, given a realisation of a known process with known parameters, we can recover those parameters from the realisation to a reasonable degree of accuracy. The first step in any consideration will be to ensure that we have some method of recovery.
3.1 Fitting a Non-Homogeneous Poisson Process
The simplest place to start would be a homogeneous Poisson process, though a cursory glance at our data in Figure 1.1 suggests that this would not be appropriate - the user is clearly tweeting at different rates at different times of day, and is not homogeneous. Let's try a non-homogeneous Poisson process. We start by simulating a Poisson process of known rate function, and seeing how well we can recover it. Let the rate function be defined as

    λ(t) = 5   for 0 ≤ t < 30
           10  for 30 ≤ t < 50
           5   for 50 ≤ t < 100

The process was simulated for 100 hours, producing a trace such as the one displayed in Figure 3.1a. We can fit a step function by observing multiple traces of these Poisson processes, taking the differences in emission times, then attempting to cluster them with the k-means algorithm [18]. Other approaches are possible, but because of how well a step function can approximate an arbitrary function, and how easy it is to find implementations of k-means, these will suffice as an early heuristic.

Figure 3.1: A trace of a Poisson process, and its estimated rate. (a) The simulated Poisson process; (b) its estimated rate.

The average number of emissions
per trace is the integral of the rate with respect to time over the time interval for which we observe, ie if N is the number of observed emissions in a single trace,

    E(N) = ∫₀¹⁰⁰ λ(t) dt = 600

So if we run 6 traces we'll observe roughly 3,600 emissions, a little more than our real data set. It's possible to attempt to fit this function with less data, but more data will give a more accurate result. Fitting a function to these gives us the results in Figure 3.1b. Eyeballing this, we see that it's not a terrible fit on this first attempt, but if we try a more complex rate function, a situation similar to Figure 3.2 happens. The estimations are well off the mark. We could instead attempt to fit a polynomial function with maximum likelihood or least squares or similar, but this requires more parameters, and completely ignores bursts. These bursts occur at random times throughout the day, and last for random lengths of time, but they all take on the same form. This heavily restricts our function. We can't simply say that at some particular time there is always a burst, but we also cannot deny their existence by smoothing them out, since these bursts, by their very nature, account for the majority of the observed tweets.

Clearly, a different approach is needed.

Figure 3.2: A slightly more complex rate function, whose estimation barely resembles the original
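The clustering step described above can be sketched with a toy one-dimensional Lloyd's k-means (the project used an existing k-means implementation [18]; this stand-in is mine, for illustration only). An Exp(λ) gap has mean 1/λ, so the reciprocals of the cluster centres estimate the rates of the steps.

```python
import random

def kmeans_1d(xs, k, iters=50, rng=None):
    """A toy 1-D Lloyd's k-means, enough to cluster inter-arrival times."""
    rng = rng or random.Random()
    centres = rng.sample(xs, k)
    for _ in range(iters):
        # Assign each point to its nearest centre
        clusters = [[] for _ in range(k)]
        for x in xs:
            clusters[min(range(k), key=lambda i: abs(x - centres[i]))].append(x)
        # Move each centre to the mean of its cluster
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return centres

# Inter-arrival times pooled from two rates (5 and 10 per hour); each
# recovered centre estimates the mean gap 1/lambda of one step.
rng = random.Random(0)
gaps = ([rng.expovariate(5.0) for _ in range(3000)]
        + [rng.expovariate(10.0) for _ in range(2000)])
est_rates = sorted(1 / c for c in kmeans_1d(gaps, 2, rng=rng))
```

Note that exponential gap distributions with nearby rates overlap heavily, so the recovered centres need not be accurate; this is consistent with the poor estimates seen in Figure 3.2.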
3.2 Fitting a Markov-Modulated Poisson Process
An MMPP seems ideal, then, capturing the simplicity of a step function and letting us model the idea of randomly distributed bursts throughout a day. Indeed, several authors have postulated that such a model would be ideal for simulating such data [1][2][3], but there have so far been no actual quantifiable studies of its relevance. This could be for a number of reasons, partially due to the lack of algorithms, but also possibly because these studies focus on somewhat smaller data sets. The twitter user in this study is hugely active, yielding a vast quantity of data to fit, and making the true quality of the model all the clearer.

Fitting a Hidden Markov Model of any kind relies on two main algorithms, Baum-Welch [14] and Viterbi [16]. The Baum-Welch algorithm is an expectation-maximisation algorithm for estimating the transition probabilities/rates and the emission probabilities, given a set of possible emissions, a number of states to fit and an observed sequence of emissions. The algorithm runs iteratively over the observed data, incrementally increasing the likelihood of the estimated model given the observations, and as such it needs an artificially defined stopping condition. In this project, we'll either use some fixed number of iterations, or stop when the log-likelihood increases by less than 10⁻⁶ between some pair of iterations, at which point we say that the model has converged sufficiently. Viterbi will take an observed sequence of emissions and the parameters of an HMM, usually those estimated by Baum-Welch, and produce the most likely state in which each of these emissions happened. The efficacy of these algorithms hinges on knowing the number of states, an issue which will be discussed later.

The HiddenMarkov package hosted on CRAN [8] is the only easily-accessible Hidden Markov Model package which supports the Markov-Modulated Poisson Process, but it does not contain any implementation of the Viterbi algorithm, and no implementation of Viterbi for the MMPP can be easily found. We have two ways of working around this: either write our own, or discretise the process.

Since the times between emissions in an MMPP are usually exponentially distributed, we could work in discrete time by letting y_t be the time between the t-th and (t + 1)-th emissions, though this sacrifices some information. We no longer consider the possibility of making multiple transitions between emissions, and we ignore the intrinsic link between the times between emissions and the times between state transitions, but this model may also yield some useful results. We will try both of these approaches.
3.2.1 A derivation of the classical Viterbi algorithm
The Viterbi algorithm for a standard Discrete Time Hidden Markov Model relies on a known, finite observation space Y, a known or estimated distribution on Y for each state s, p_s, known or estimated transition probabilities π_{i,j}, and known or estimated initial probabilities for each state s, δ_s. The goal is, given a sequence of T observations (y_n)_{n∈[T]}, y_i ∈ Y, to find the sequence of states (x̂_n)_{n∈[T]} satisfying

    x̂ = argmax_{x̄ ∈ S^T} P(x̄ | y)

We call x̂ the Viterbi Path.

Let V_{t,s} be the probability of the most probable state sequence responsible for the first t observations which ends in state s, that is,

    V_{t,s} = max_{x̄ ∈ S^{t−1}} P((x̄_1, x̄_2, ..., x̄_{t−1}, s) | (y_1, ..., y_t))

We have that V_{1,s} is the probability of both being in state s at time 1 and seeing observation y_1 from state s. This gives us that

    V_{1,s} = P(y_1 | x_1 = s) P(x_1 = s)

Recall that δ_s is the probability of being in state s at time 1, ie P(x_1 = s), and that p_s(y_1) is the probability of observing y_1 from state s, ie P(y_1 | x_1 = s). In practice, these won't be known, but estimates of them will be given by the Baum-Welch algorithm, so we will use their maximum-likelihood estimates to give us

    V_{1,s} = p_s(y_1) δ_s
Given V_{τ,s} for τ < t, we can find V_{t,s} by noting that the Markov Property implies a form of memorylessness. Given the present, the future is conditionally independent of the past. As such, we need only consider V_{t−1,s′} for each s′, as well as our known parameters.

The probability of the most likely path that leads us to state s at time t is given by the probability of the most likely path that led us to some state s′ at time t − 1, which then jumped to s at time t, and then emitted y_t from state s. The probability of jumping from s′ to s is π_{s′,s}. The probability of emitting y_t from state s is p_s(y_t). The probability of the most likely path that leads us to s′ at time t − 1 is V_{t−1,s′}. Hence,

    V_{t,s} = p_s(y_t) max_{s′∈S} (π_{s′,s} V_{t−1,s′})

Using this recurrence, we can find V_{t,s} ∀t ∈ [T], s ∈ S by a standard dynamic programming algorithm. From here, we can then work backwards to find the Viterbi path: x̂_T = argmax_{s∈S} V_{T,s} - the most likely final state is the state in which the path of maximum probability ends.

Let

    T_{t,s} = argmax_{s′∈S} (π_{s′,s} V_{t−1,s′}),

ie T_{t,s} is the state from which we are most likely to have come at time t − 1, given that we are in state s at time t. We can then see that x̂_{t−1} = T_{t, x̂_t}. Note the similarities in the definitions of V and T - both can be calculated simultaneously: V is the maximum, T is the argument that maximises. T_{1,s} is never used, so need never be defined.

Since we have an expression for x̂_{t−1} in terms of x̂_t and an expression for x̂_T, we can then recover x̂, the Viterbi Path. Algorithm 5 gives this in full.
Data: (S, δ, Π, Y, p), a DTHMM
input : y, an observed sequence of T emissions, indexed from 1 to T
output: x̂, the most likely sequence of states generating these emissions
begin
    for s ∈ S do
        V_{1,s} ← p_s(y_1) δ_s
    end
    for t ← 2 to T do
        for s ∈ S do
            V_{t,s} ← p_s(y_t) max_{s′∈S} (π_{s′,s} V_{t−1,s′})
            T_{t,s} ← argmax_{s′∈S} (π_{s′,s} V_{t−1,s′})
        end
    end
    x̂_T ← argmax_{s∈S} (V_{T,s})
    for t ← T to 2 do
        x̂_{t−1} ← T_{t, x̂_t}
    end
    return x̂
end
Algorithm 5: The Viterbi Algorithm for DTHMMs
This algorithm is only valid in discrete time. For the continuous-time MMPP, modifications are necessary.
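Algorithm 5 translates almost line-for-line into Python. The sketch below is my own illustration: it works in log-space to avoid the numerical underflow that a raw product of probabilities causes on long sequences (a standard modification, not part of the pseudocode), and uses the unfair casino from section 2.2 as a test input.

```python
import math

def _log(p):
    return math.log(p) if p > 0 else float("-inf")

def viterbi(delta, Pi, p, y):
    """Algorithm 5: most likely state sequence for observations y.

    delta: initial probabilities; Pi: transition matrix;
    p(s, obs): emission probability p_s(obs). States are indices.
    """
    m, T = len(delta), len(y)
    V = [_log(delta[s]) + _log(p(s, y[0])) for s in range(m)]
    back = [[0] * m]                      # back[t][s]: best predecessor of s
    for t in range(1, T):
        newV, ptr = [], []
        for s in range(m):
            best = max(range(m), key=lambda sp: V[sp] + _log(Pi[sp][s]))
            ptr.append(best)
            newV.append(V[best] + _log(Pi[best][s]) + _log(p(s, y[t])))
        back.append(ptr)
        V = newV
    x = [max(range(m), key=lambda s: V[s])]   # most likely final state
    for t in range(T - 1, 0, -1):             # walk the backpointers
        x.append(back[t][x[-1]])
    return x[::-1]

# The unfair casino: state 0 is the fair die, state 1 never rolls a 6
delta, Pi = [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]]
p = lambda s, y: (1 / 6) if s == 0 else (1 / 5 if y != 6 else 0.0)
path = viterbi(delta, Pi, p, [6, 1, 6, 2, 6, 3])
```

A sequence containing frequent sixes forces the fair die, while a long run with no sixes at all is better explained by the loaded die.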
3.2.2 A First Approximation of the Viterbi Algorithm for the Markov-Modulated
Poisson Process
Recall the dependencies of the DTHMM Viterbi algorithm. We require knowledge of a finite Y, p_s for each s, and π_ij for each pair of states i, j.

In an MMPP, we observe a Poisson process whose rate varies randomly between various known rates - the rates being our states. Let S = {λ_1, ..., λ_m}. Our observations can be interpreted as exponential random variables of these rates. Let τ_0 = 0, and let τ_i be the time of the i-th Poisson emission for i ∈ [n]. Let y_i = τ_i − τ_{i−1} for i ∈ [n]. Given that the underlying CTMC was in state λ_s at time τ_i, we have that y_i ~ Exp(λ_s). From the properties of the generic CTMC, the probability that the process is in state j at time τ_i, given that it was in state i at time τ_{i−1}, is given by (e^{Q y_i})_{λ_i,λ_j}. This gives a quasi-discretised model, where all jumps and emissions indeed happen in discrete time, but the transition probabilities acknowledge continuity. To ease notation, we'll write (e^{Q y_i})_{λ_i,λ_j} as (e^{Q y_i})_{i,j}, omitting the λs.

The state space S and transition rates Q are estimated by the Baum-Welch algorithm as before, so after running Baum-Welch over an observed trace, we can start to find the most likely state at each emission. Note that this Viterbi Path is not the most likely sequence of state transitions; it is instead the most likely state in which the underlying CTMC resides at the time of each emission.
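The quasi-discretised transition probabilities (e^{Qy})_{i,j} can be computed with an off-the-shelf matrix exponential. A small sketch using SciPy's scipy.linalg.expm (SciPy being part of the tooling described in section 3.2.3); the particular Q here is an arbitrary two-state example:

```python
import numpy as np
from scipy.linalg import expm

# A transition rate matrix Q for a two-state CTMC (rows sum to 0)
Q = np.array([[-0.5, 0.5],
              [1.0, -1.0]])

# (e^{Q y})_{i,j}: the probability of being in state j a time y after
# being in state i, whatever path was taken in between
y = 2.0
P = expm(Q * y)

# Each row of e^{Q y} is a probability distribution over the states
assert np.allclose(P.sum(axis=1), 1.0)
assert (P >= 0).all()
```

As y grows, every row of e^{Qy} converges to the chain's stationary distribution, here (2/3, 1/3).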
Since our emissions are continuous, we don't have any notion of "most probable" - if we model height continuously, the probability that I meet someone exactly 1.8m tall is the same as the probability that I meet someone exactly 18m tall; they're both 0. So instead we'll base our likelihood calculations off probability density, capturing the idea that, even though I don't know for certain that I'll meet either of the two, it's more likely for me to meet the 1.8m-tall person.

We let p_s(t) = λ_s e^{−tλ_s}, the probability density of an exponential random variable of rate λ_s evaluated at t. Let V_{t,s} be the probability density of the most likely path that leads us to emitting y_t from state s. We have that

    V_{1,s} = δ_s p_s(y_1)
The probability density of the most likely path that leads us to waiting for time y_1 before making an emission is given by the probability of starting in state s, multiplied by the probability density of waiting y_1 for an emission from state s. The memoryless property of a CTMC allows us to consider only V_{t−1,s′} when calculating V_{t,s}. We have that

    V_{t,s} = p_s(y_t) max_{s′∈S} (V_{t−1,s′} (e^{Q y_t})_{s′,s})

The probability density of the most likely path that leads us to waiting for time y_t between the (t−1)-th and t-th emissions in state s is given by the probability density of the most likely path that takes us to state s′ for the (t−1)-th emission, followed by jumping (along any arbitrary path) into state s for emission t, multiplied by the probability density of emitting y_t in state s.

From here, we can proceed as normal. We define T as before to record our most likely states at each transition, and work backwards to find x̂, producing Algorithm 6.
The reason that this is an approximation is the fact that the algorithm assumes that either a jump happens
instantaneously, or not at all - we always evaluate p
s
(y
t
), rather than p
s
(y
t
)
1
, where represents the
time we wait for all the relevant transitions to occur. The times between state transitions are exponentially
distributed, and we are dealing with the most likely outcomes. The most likely outcome of any exponential
distribution is 0 so, if the underlying CTMC jumps from one state to another, the most likely time for that
jump to happen is immediately, so = 0. If multiple jumps happen, their individual times are exponentially
distributed, but their sum does not have a mode of 0.
This rst approximation is in fact very powerful. Approximating in this way assumes that multiple
transitions between emissions are rare, alternatively that emissions within each state are more frequent than
transitions out of that state. The converse can be true when the tweeter is asleep, his emission rate is
near 0, but he will have a positive transition rate for when he wakes up and in this case the algorithm will
spot large periods of inactivity as estimate them as being the inactive state. If there are multiple states with
low rate emissions but high rate transitions between them, then we would expect to see very few emissions
occurring on a path through these states, so arguably the information on the exact route through these
states doesnt exist, and cannot be recovered by any algorithm.
As a nal note, we can further rene the tted model based on the results of the Viterbi algorithm by
changing the estimated rate of each state to the observed rate of the emissions estimated to occur in that
state. By the nature of these estimation algorithms, these results are likely to dier, and with a large sample
¹ 𝔱 (tinco) represents the voiceless alveolar stop in the Tengwar alphabet, as devised by J.R.R. Tolkien. When dealing with time so frequently, we eventually run out of ways to write the letter t.
Data: (S, π, Q, p), an MMPP
input : y, the absolute times of an observed sequence of T emissions, indexed from 1 to T
output: x, the most likely sequence of states in which the underlying CTMC resides for each emission in y
begin
    for t ← 2 to T do
        Δ_{t−1} ← y_t − y_{t−1}
    end
    T ← T − 1
    for s ∈ S do
        V_{1,s} ← p_s(Δ_1) · π_s
    end
    for t ← 2 to T do
        A ← e^{QΔ_t}
        for s ∈ S do
            V_{t,s} ← p_s(Δ_t) · max_{s′∈S}(A_{s′,s} · V_{t−1,s′})
            𝒯_{t,s} ← argmax_{s′∈S}(A_{s′,s} · V_{t−1,s′})
        end
    end
    x_T ← argmax_{s′∈S}(V_{T,s′})
    for t ← T to 2 do
        x_{t−1} ← 𝒯_{x_t,t}
    end
    return x
end
Algorithm 6: An Approximate Viterbi Algorithm for MMPPs
size, what we observe is usually closer to the truth than what we assert. In practice, if Baum-Welch gives a good estimation of the underlying MC, the difference between the two is minor.
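A minimal Python sketch of Algorithm 6 follows; it works in log-space to avoid numerical underflow on long traces, and the function and variable names are mine rather than the HiddenMarkov package's.

```python
import numpy as np
from scipy.linalg import expm

def mmpp_viterbi(y, Q, rates, pi):
    """Approximate Viterbi path for an MMPP, per Algorithm 6.

    y     -- absolute emission times (length T)
    Q     -- generator matrix of the underlying CTMC, shape (n, n)
    rates -- Poisson emission rate of each state, shape (n,)
    pi    -- initial state distribution, shape (n,)
    Returns the most likely state for each inter-arrival gap (length T-1).
    """
    rates = np.asarray(rates, dtype=float)
    deltas = np.diff(np.asarray(y, dtype=float))  # inter-arrival times
    T, n = len(deltas), len(rates)

    def logp(d):
        # log-density of a gap of length d in each state
        return np.log(rates) - rates * d

    log_v = np.log(pi) + logp(deltas[0])          # log-space avoids underflow
    back = np.zeros((T, n), dtype=int)
    for t in range(1, T):
        log_a = np.log(expm(Q * deltas[t]))       # transition probs over the gap
        scores = log_v[:, None] + log_a           # scores[s_prev, s]
        back[t] = scores.argmax(axis=0)
        log_v = scores.max(axis=0) + logp(deltas[t])

    x = np.zeros(T, dtype=int)
    x[-1] = log_v.argmax()
    for t in range(T - 1, 0, -1):
        x[t - 1] = back[t, x[t]]
    return x
```

On a toy two-state example, a run of closely spaced emissions is assigned to the fast state and long gaps to the slow one, which is exactly the behaviour the approximation argument above predicts.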
3.2.3 Applying the Algorithm
The algorithm was written in R and added to the pre-existing HiddenMarkov [8] package, which was then recompiled to be loaded into an R environment. A Python script was written to call into this library from a more convenient language using RPy2 [12], which also allowed mathematical, statistical and visualisation functions to be loaded in from matplotlib [9] and SciPy [13].
As before, we first simulate a model, then see if we can recover it, reassuring ourselves that the Viterbi implementation is correct. The simulated model had the following parameters:
(λ₁, λ₂, λ₃) = (0.01, 0.5, 2)

S = {λ₁, λ₂, λ₃}

        ( −1/20    1/60    1/30 )
    Q = (  1/10   −2/15    1/30 )
        (  1/10    1/60   −7/60 )

(rows and columns ordered λ₁, λ₂, λ₃)

π = (1/3, 1/3, 1/3)
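A simulation of an MMPP with parameters like these can be sketched directly; the helper below is a hypothetical illustration, not the HiddenMarkov implementation used in the project.

```python
import numpy as np

def simulate_mmpp(Q, rates, pi, t_max, seed=None):
    """Simulate an MMPP on [0, t_max]; returns (emission_times, emission_states)."""
    rng = np.random.default_rng(seed)
    Q = np.asarray(Q, dtype=float)
    n = len(rates)
    state = rng.choice(n, p=np.asarray(pi) / np.sum(pi))
    t, times, states = 0.0, [], []
    while t < t_max:
        sojourn = rng.exponential(1.0 / -Q[state, state])  # time spent in state
        # emissions form a Poisson process of rate rates[state] while we stay
        s = t + rng.exponential(1.0 / rates[state])
        while s < min(t + sojourn, t_max):
            times.append(s)
            states.append(state)
            s += rng.exponential(1.0 / rates[state])
        t += sojourn
        # jump to another state with probability proportional to Q[state, j]
        others = np.delete(np.arange(n), state)
        weights = Q[state, others]
        state = rng.choice(others, p=weights / weights.sum())
    return np.array(times), np.array(states)
```

Simulating the three-state model above for 6,000 hours yields a few thousand emissions, in line with the size of the Twitter data set.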
The expected course of an MMPP is somewhat harder to calculate, but 6,000 hours gave a number of emissions similar to that in our data. The resulting trace can be seen in Figure 3.3.
Figure 3.3: A trace of an MMPP with the states shaded in colour²
Running the Baum-Welch algorithm over the trace with a known state size of 3 for 208 iterations, at which point the model had converged sufficiently, gives us the following predictions, rounded to 2 significant figures:
(λ₁, λ₂, λ₃) = (0.0066, 0.53, 2.00)

S = {λ₁, λ₂, λ₃}

        ( −0.061    0.022    0.039 )
    Q = (  0.10    −0.14     0.053 )
        (  0.094    0.020   −0.11  )

(rows and columns ordered λ₁, λ₂, λ₃)

π = (1.00, 0.00, 0.00)
On inspection, most of these parameters give a reasonable approximation of the originals, but π seems to be well off the mark. The reason for this is fairly simple: the simulated trace had to start somewhere, and in this case it started in state 1. The algorithm was only given one trace, so to minimise the probability of error, it estimated that the underlying CTMC always starts in the state in which it was estimated to start. The initial distribution of the underlying CTMC isn't hugely relevant, however. We're interested in how the user tweets in the long term, and regardless of initial distribution this Markov chain will reach an invariant distribution.
Running the modified Viterbi algorithm over the process gives the estimate in Figure 3.4.
Figure 3.4: The same trace as Figure 3.3, but with the shading matching estimated states, rather than actual states
² Originally, all the diagrams in this document were vector graphics, but diagrams like this are so intricate that their vector versions can crash PDF renderers.
Comparing the estimated path to the original in a way that clearly displays the results seems daunting, but we can do this in what I find a fairly beautiful way by simply aligning the two traces along the same scale, rendering them as bitmaps, then taking the difference in the colours between the two images using GIMP [10, eqn 8.15]. Any black pixels are locations where the original two images match perfectly; all others are mistakes. Stripping the top row of colours will show mistakes over time. Figure 3.5 is a black and white thresholded [11] version of the preceding, where all colours appear as white lines. The image is 91.2% black, so my estimated model matches the simulation 91.2% of the time.
Figure 3.5: A comparison between the estimated state sequences of Figure 3.4 (top) and 3.3 (middle). Their image difference is shown at the bottom, with adjustments for clarity.
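The same agreement figure can also be computed directly from the two state sequences, skipping the image step; the sketch below samples the two piecewise-constant paths on a regular grid, which is the programmatic analogue of differencing the shaded traces pixel by pixel (the function name and grid sampling are my own).

```python
import numpy as np

def match_fraction(true_states, est_states, times, t_max, resolution=10000):
    """Fraction of the observation window on which two piecewise-constant
    state paths agree, sampled at `resolution` evenly spaced time points."""
    grid = np.linspace(0.0, t_max, resolution, endpoint=False)
    # index of the most recent emission at or before each grid point
    idx = np.clip(np.searchsorted(times, grid, side="right") - 1, 0, None)
    return np.mean(np.asarray(true_states)[idx] == np.asarray(est_states)[idx])
```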
Now that we've verified that our algorithms are capable of recovering known MMPPs to a high degree of accuracy, it's time to feed our data into them.
The data were preprocessed such that rather than recording exact times and dates of tweets, they instead recorded the times, in hours since the beginning of the observation, at which each tweet occurred. This sequence was then fed into an MMPP data structure, over which Baum-Welch and Viterbi were run based on the assumption of 3 states. This assumption is arbitrary, but serves as a nice starting point to generate some early results; we'll address the problem of selecting the number of states later. The resulting model was as follows:
(λ₁, λ₂, λ₃) = (0.0289, 1.28, 15.5)

S = {λ₁, λ₂, λ₃}

        ( −0.123    0.120    0.004 )
    Q = (  0.441   −1.09     0.648 )
        (  0.00     5.25    −5.25  )

(rows and columns ordered λ₁, λ₂, λ₃)

π = (1.00, 0.00, 0.00)
And, with the states shaded in the colours represented above, the transitions were as represented in Figure 3.6.
Figure 3.6: The twitter data, with predicted states shaded: λ₁ in red, λ₂ in yellow, λ₃ in green. Since λ₃ is a burst state which, on average, only lasts for 12 minutes and contains several tweets, it can be difficult to see the green behind those tweets
From this distance, we already see at least some kind of sensible behaviour: the user seems to have regular sleeping patterns, tweeting during the day and not tweeting at night-time. The three states correspond to intuitive states for a human to occupy. In state λ₁ the user is asleep or otherwise away from the computer, tweeting on average once every 50 hours, but only staying there for 10 hours on average. In state λ₂, the user is awake, online, and going about his day as usual; as an active twitter user he tweets about 1.3 times per hour. On average, every hour, the user will then either go to bed with probability 0.4, or enter into a conversation with someone and produce a burst with probability 0.6. This already seems a little odd, and shows the limitation of using a homogeneous model: we seem to be suggesting here that the user goes to bed on average every 1.5 hours, which doesn't fit our notions of how humans actually behave. Bursts are tweets produced at a rate of 15 per hour, or one every 4 minutes, and last around 12 minutes on average.
We can test whether this model gives an accurate representation of our data by performing a Kolmogorov-Smirnov (KS) [15] test on the data for each state. This test takes the maximum difference between the cumulative distribution functions of the two data sets, which is then compared to a critical value from the Kolmogorov distribution. The test generally requires a large data set to give meaningful results, but we certainly have one here.
We take our null hypothesis to be that the emissions within each state follow an exponential distribution whose rate matches the observed rate. Running this test, we find, however, that they very much do not: the p-values for each state were less than 2.2 × 10⁻¹⁶; to say that this is low would be an understatement.
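With SciPy, a test of this kind for a single state's inter-arrival times might look like the following sketch; the data here are synthetic placeholders, not the project's, and for genuinely exponential gaps the test passes comfortably.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Placeholder inter-arrival times; in the project these would be the gaps
# between the emissions that the Viterbi path assigns to one state.
gaps = rng.exponential(scale=1 / 1.28, size=2000)

rate = 1 / gaps.mean()  # observed rate of the state
# H0: gaps ~ Exponential(rate); SciPy parameterises expon by scale = 1/rate
stat, p = stats.kstest(gaps, "expon", args=(0, 1 / rate))
print(f"KS statistic {stat:.4f}, p-value {p:.4f}")
```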
Whilst 3 states makes sense intuitively, there's no reason why this is definitely the case, so let's try different numbers of states. We can find the optimal number based on the Bayesian Information Criterion [17]. Given an estimated model, we can define the Bayesian Information Criterion, BIC, as

    BIC = −2 ln L + k ln n

where ln L is the log-likelihood of the fitted model, i.e. the natural logarithm of the probability of observing the data given that the model is correct, k is the number of parameters fitted by the model, and n is the number of data points used to fit the model. Faced with the choice of two different models, we select the one with the lower BIC.
An MMPP of |S| states has k = |S|² + |S| − 1 free parameters: Q contains |S|² elements, but each of the |S| diagonal elements can be determined from the row on which it resides; we fit |S| emission rates, each of which is freely choosable; and π has |S| elements, one of which can be determined from the other |S| − 1. The number of data points is one less than the number of tweets we've observed, and the log-likelihood is returned by the Baum-Welch algorithm.
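Putting the criterion and the parameter count together, the selection step can be sketched as follows; the log-likelihoods here are made-up placeholders, not the fitted values.

```python
import math

def mmpp_bic(log_likelihood, n_states, n_points):
    """BIC for an MMPP with k = |S|^2 + |S| - 1 free parameters:
    |S|^2 - |S| off-diagonal entries of Q, |S| emission rates,
    and |S| - 1 free entries of the initial distribution."""
    k = n_states ** 2 + n_states - 1
    return -2 * log_likelihood + k * math.log(n_points)

# Choose the state count with the smallest BIC (log-likelihoods made up):
log_liks = {2: -820.0, 3: -760.0, 4: -710.0}
bics = {s: mmpp_bic(ll, s, 1500) for s, ll in log_liks.items()}
best = min(bics, key=bics.get)
```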
500 iterations of Baum-Welch were run over the data for varying numbers of states until the BIC stopped decreasing. The optimum occurred at 4 states with a BIC of 1470, described as follows:
(λ₁, λ₂, λ₃, λ₄) = (0.0223, 0.511, 6.14, 28.3)

S = {λ₁, λ₂, λ₃, λ₄}

        ( −0.111    0.0950   0.0160   0.000 )
    Q = (  0.262   −1.21     0.950    0.000 )
        (  0.287    2.94    −3.67     0.445 )
        (  0.000    0.000    5.56    −5.56  )

(rows and columns ordered λ₁, λ₂, λ₃, λ₄)

π = (1.00, 0.00, 0.00, 0.00)
Fitted by the same methods, we see the results in Figure 3.7. Unfortunately, the Kolmogorov-Smirnov tests still fail: the greatest p-value for any state was 3.85 × 10⁻⁶. Fitting more states would probably give a better fit, but due to the quadratic scaling it would also start sending the number of parameters up to absurd levels, and the Bayesian Information Criterion would not improve. This is the best MMPP we can fit, and it still isn't very good.
3.2.4 An Alternative Approach
From this point, it is starting to seem like the tweeter does not follow an MMPP, but there is one last thing
we can attempt in this vein. A Discrete Time Hidden Markov Model can still have a continuous observation
Figure 3.7: The twitter data, with predicted states shaded: λ₁ in red, λ₂ in yellow, λ₃ in green and λ₄ in cyan. As with λ₃ in Figure 3.4, λ₄'s emissions can be difficult to see.
space, and all the previous methods will work, just with probability density in place of probability. Rather than observing a series of times, we will instead observe a series of time differences, and look for any small number of states that lets us gather these differences into the same exponential distribution. The results from such a model will be similar, but lose some information about how long our tweeter actually spends in each state. The crucial difference between the two is that in an MMPP, emissions and transitions take place along the same timeline: compare the Viterbi algorithms for both the MMPP and DTHMM, and note that the transition probabilities between a DTHMM's states do not depend on the observed emissions, whilst the transition probabilities in the MMPP version do.
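This dependence can be made concrete with SciPy: the MMPP's effective one-step transition matrix e^{QΔt} changes with the observed gap, whereas a DTHMM reuses one fixed matrix for every step. A quick illustration using the 3-state Q fitted earlier (diagonal signs restored):

```python
import numpy as np
from scipy.linalg import expm

# The 3-state generator fitted earlier (diagonal signs restored)
Q = np.array([[-0.123,  0.120,  0.004],
              [ 0.441, -1.090,  0.648],
              [ 0.000,  5.250, -5.250]])

A_short = expm(Q * 0.1)   # transition probabilities over a 6-minute gap
A_long = expm(Q * 10.0)   # transition probabilities over a 10-hour gap
```

Over a 6-minute gap the matrix is close to the identity, so the chain almost certainly stays put; over a 10-hour gap it has moved much closer to the invariant distribution.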
So we go through the same procedure again. The intuitive 3 states didn't result in anything worthwhile, so we apply the BIC-based method again to arrive at an optimum of 5 states and a BIC of 1446, resulting in a DTHMM with the following parameters:
S = {1, 2, 3, 4, 5}

(λ₁, λ₂, λ₃, λ₄, λ₅) = (0.108, 1.18, 4.27, 14.8, 32.4)

π = (1, 0, 0, 0, 0)

        ( 0.295   0.380   0.000   0.312   0.013 )
        ( 0.153   0.573   0.006   0.268   0.000 )
    Γ = ( 0.051   0.052   0.579   0.156   0.161 )
        ( 0.176   0.250   0.286   0.260   0.027 )
        ( 0.007   0.000   0.196   0.012   0.785 )

Y = ℝ⁺

∀s ∈ S, y ∈ Y : p_s(y) = λ_s e^{−λ_s y}
Which, when we add the Viterbi path to the plot, gives us Figure 3.8. We perform some new KS tests to find one p-value of 0.03, and four others below 10⁻⁷. From this, we can definitively conclude that the data cannot be well-clustered into a small number of exponentially distributed subsets. We can then conclude that these data do not follow any kind of simple Poisson process, in spite of appearances.
3.3 A Diagnosis
Whilst this gives a fairly strong negative result, that this tweeter is not a Poisson process, it doesn't give any real explanation as to why. What went wrong? Inspecting the fitted models, we see that the first few emissions don't quite fit in with the rest, but removing these anomalous results does little to the results of the Kolmogorov-Smirnov tests; they remain heavily negative in all cases.
Let's start looking at the emissions estimated to occur in each state. Since the DTHMM gave slightly better p-values, we'll use that as a jumping-off point.
Figure 3.8: The twitter data, with predicted states shaded: λ₁ in red, λ₂ in yellow, λ₃ in green, λ₄ in cyan and λ₅ in black. Burst states are, as usual, very difficult to see.
And now, observing Figure 3.9, we see an issue. The fits are close, but the heads of the distributions are light and their tails heavy. What we really need is a distribution which will allow for these tails, such as the lognormal distribution. If X follows a lognormal distribution, then ln(X) follows a normal distribution [19].
Summing lognormal distributions into a Markov Modulated Renewal Process in the same way that we sum exponential distributions into a Poisson process carries all manner of problems, however. The lognormal distribution is not memoryless, nor is its mode 0, so the resulting processes within each state will be somewhat harder to fit, meaning that simple fitting algorithms like Baum-Welch and Viterbi require much more sophisticated modifications to work correctly in continuous time.
The algorithms for fitting a discrete model to the data using a DTHMM with lognormal emissions are, however, completely unchanged, so let's do that. We take the natural logarithm of all the differences in emission times, fit multiple models and evaluate their BICs. Here, each state requires 2 parameters, a mean and a standard deviation, so the number of free parameters in an s-state model is now s² + 2s − 1. We find that the optimal number of states is 3, and go ahead again with the KS tests, making minor corrections to the parameters for the estimated emissions within each state, resulting in p-values of 0.312, 0.156 and 0.420, and with densities shown in Figure 3.10. At last, we have some statistically significant results, with the following parameters:
S = {1, 2, 3}

(μ₁, μ₂, μ₃) = (3.33, 1.33, 2.34)

(σ₁, σ₂, σ₃) = (1.47, 1.86, 0.366)

π = (0, 1, 0)

        ( 0.926   0.069   0.006 )
    Γ = ( 0.047   0.906   0.047 )
        ( 0.000   0.865   0.135 )

Y = ℝ

∀s ∈ S : p_s(y) = exp(−(y − μ_s)² / (2σ_s²)) / (σ_s √(2π))
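The log-scale fitting and corrected KS test described above can be sketched as follows; the gaps here are synthetic placeholders rather than the project's data, with the real ones coming from the Viterbi path for a single state.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Placeholder gaps for one state; the real ones come from the Viterbi path.
gaps = rng.lognormal(mean=-1.3, sigma=1.9, size=1500)

logs = np.log(gaps)                        # y_i = ln(t_i)
mu, sigma = logs.mean(), logs.std(ddof=1)  # normal fit on the log scale
stat, p = stats.kstest(logs, "norm", args=(mu, sigma))
```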
Given a sequence of emissions from this DTHMM, y, the inter-arrival times are then fitted with t_i = e^{y_i}. Perhaps a little strangely, shading the states onto our graph as in Figure 3.11 gives a less obvious seasonality to the tweeter's behaviour. We still have observable and highlighted bursts, as well as some regularity to the user's sleeping patterns, but they're less obvious here than with the MMPP. More work would certainly throw up more results; perhaps this requires a continuous time model for such things to be seen. Regardless, this is a positive result for a problem whose solution has so far only been speculated upon, and one which shows that these speculations are very likely to be wrong.
Figure 3.9: The estimated densities of emissions in each state (blue), alongside the actual density of an exponential random variable with rate equal to the rate of the observed emissions in that state
Figure 3.10: The estimated densities of the logarithms of emissions in each state (blue), alongside the actual
density of a normal random variable of mean and variance equal to those of the logarithms of the observed
emissions in that state
Figure 3.11: The state transitions estimated by Viterbi for a DTHMM with lognormal emissions: state 1 in red, 2 in yellow, 3 in green
Chapter 4
Conclusion
We conclude, then, in a swarm of negative results, but with an implementation of an as yet unwritten algorithm, and exactly one statistically valid result for an as yet unsolved problem. Further refinements can, of course, be made. Various path-finding algorithms can probably be adapted for a CTMC to give the most likely sequence of states between two known endpoints, which would allow us to find the modal transition time between states and create a more accurate MMPP Viterbi. We could even try for a Markov Modulated Renewal Process, with inter-arrival times defined by some arbitrary distribution whose parameters are defined by the states of an underlying CTMC.
I hope that this project has shone a light into what was once a dark area, and that the resulting methods prove useful to others for detecting botnets in networks, suspicious social network activity, or simply creating some rather beautiful diagrams. All code used for this project, as well as the LaTeX used to generate this document and SVGs of most of the graphics, are hosted on GitHub at https://github.com/Ymbirtt/maths_project. If you, for instance, had difficulty seeing some of the diagrams, or want more concrete information on exactly what happened in this project, I'd recommend finding it all there.
Appendix A
Fun with Integrals
Recall Figure 3.1a. It is a trace of an inhomogeneous Poisson process with the following rate:

    λ(t) = 5    for 0 ≤ t < 30
           10   for 30 ≤ t < 50
           5    for 50 ≤ t < 100
Figure A.1 shows a Poisson process of rate λ, alongside the integral of λ. With only 600 samples, the two functions seem similar. Let N be an inhomogeneous Poisson process of rate λ : ℝ⁺ → ℝ⁺, let δ > 0, and consider the following:

    P(N(t + δ) − N(t) = 1) = δλ(t) + o(δ)
    P(N(t + δ) − N(t) > 1) = o(δ)

    E[N(t + δ) − N(t)] = Σ_{i=1}^∞ i · P(N(t + δ) − N(t) = i)
                       = δλ(t) + o(δ)

    lim_{δ→0} E[N(t + δ) − N(t)] / δ = lim_{δ→0} (δλ(t) + o(δ)) / δ = λ(t)
So if N is a Poisson process of rate λ, then it's also a function whose value we expect to increase at a rate of λ, meaning that, with enough samples, N would approximate an indefinite integral of the function λ. Currently, the standard method for evaluating difficult integrals computationally is to use the Monte-Carlo dartboard algorithm [21, 2], though this is only capable of evaluating a single, definite integral. If we had an approximation of the indefinite integral function, we could evaluate arbitrarily many definite integrals by just looking up values from the function. Figure A.2 shows an approximation of the integral of sin(x/5) + 1, taken by simulating 50 Poisson processes for a total of roughly 5,000 emissions, each emission only incrementing the process by 0.02, rather than 1. Here, we see a very close approximation of the true integral. The dartboard algorithm usually uses tens of thousands of randomly generated points to evaluate a single definite integral.
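A sketch of this estimator follows, simulating each inhomogeneous process by thinning [7]; averaging n_runs counting processes is equivalent to letting each emission increment the count by 1/n_runs (0.02 for 50 runs, as above). The function name and grid are my own.

```python
import numpy as np

def poisson_integral(f, t_max, f_max, n_runs=50, seed=None):
    """Approximate the indefinite integral of a non-negative function f on
    [0, t_max]: average n_runs inhomogeneous Poisson counting processes of
    rate f, each simulated by thinning against the bound f_max >= f."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, t_max, 500)
    total = np.zeros_like(grid)
    for _ in range(n_runs):
        t, events = 0.0, []
        while True:
            t += rng.exponential(1.0 / f_max)  # candidate point at rate f_max
            if t > t_max:
                break
            if rng.random() < f(t) / f_max:    # keep with probability f(t)/f_max
                events.append(t)
        total += np.searchsorted(np.array(events), grid)  # counting process N(t)
    return grid, total / n_runs
```

For f(x) = sin(x/5) + 1 on [0, 30], the true integral is 5(1 − cos(x/5)) + x, and the averaged trace tracks it closely.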
Of course, the actual practicalities of such an algorithm and its relevance to higher-order integrals are yet to be confirmed, though it could be an interesting area for further study.
Figure A.1: A trace of an inhomogeneous Poisson process used to estimate an integral, alongside its true
integral, plotted with 620 Poisson emissions
Figure A.2: A trace of an inhomogeneous Poisson process used to estimate an integral, alongside its true
integral, plotted with 5153 Poisson emissions
Bibliography
[1] Ihler, A., Hutchins, J., & Smyth, P. Learning to detect events with Markov-Modulated Poisson Processes.
ACM Transactions on Knowledge Discovery from Data, 1(3), 13. 2007
http://dl.acm.org/citation.cfm?doid=1297332.1297337
[2] R.D. Malmgren, D.B. Stouffer, A.E. Motter, and L.A.N. Amaral, A Poissonian explanation for heavy
tails in e-mail communication,
PNAS, 105(47):18153-18158, 2008
http://www.pnas.org/content/105/47/18153
[3] Scott, S. L. and Smyth, P. 2003. The Markov modulated Poisson process and Markov Poisson cascade
with applications to web traffic data.
Bayesian Statistics 7, 671-680.
http://dl.acm.org/citation.cfm?doid=1297332.1297337
[4] Joseph L. Doob, The Development of Rigor in Mathematical Probability (1900-1950)
The American Mathematical Monthly, Vol. 103, No. 7 (Aug. - Sep., 1996), pp. 586-595
http://www.jstor.org/stable/2974673
[5] Weisstein, Eric W. Markov Chain. From MathWorldA Wolfram Web Resource.
http://mathworld.wolfram.com/MarkovChain.html
[6] Weisstein, Eric W. Graph. From MathWorldA Wolfram Web Resource.
http://mathworld.wolfram.com/Graph.html
[7] Lewis, P. A. W. and Shedler, G. S. (1979), Simulation of nonhomogeneous Poisson processes by thinning.
Naval Research Logistics, 26: 403413.
http://onlinelibrary.wiley.com/doi/10.1002/nav.3800260304/abstract
[8] David Harte. HiddenMarkov v1.7-0, a Hidden Markov Model library written in R and Fortran,
http://cran.r-project.org/web/packages/HiddenMarkov/index.html
[9] Hunter, J. D. Matplotlib, a 2D graphics environment
IEEE Computer Soc, Vol. 9, No. 3 (2007), pp. 90-95
http://matplotlib.org/
[10] The GIMP Development Team. The GIMP docs, the documentation for the GNU Image Manipulation
Program,
http://docs.gimp.org/en/gimp-concepts-layer-modes.html
[11] The GIMP Development Team. The GIMP docs, the documentation for the GNU Image Manipulation
Program,
http://docs.gimp.org/en/gimp-tool-threshold.html
[12] Moriera W, Warnes GR, Gautier L. RPy v2-2.3, a Python to R interface
http://rpy.sourceforge.net/
[13] Jones E, Oliphant T, Peterson P & others, SciPy, open source scientific tools for Python, 2001-
http://www.scipy.org
[14] Baum L.E, Petrie T, Soules G, & Weiss N, A maximization technique occurring in the statistical analysis
of probabilistic functions of Markov chains.
Annals of Mathematical Statistics 41(1), 164-171, 1970
http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.aoms/1177697196
[15] Weisstein, Eric W. Kolmogorov-Smirnov Test. From MathWorldA Wolfram Web Resource.
http://mathworld.wolfram.com/Kolmogorov-SmirnovTest.html
[16] Forney G.D, Jr., The Viterbi algorithm,
Proceedings of the IEEE, vol. 61, no. 3, pp. 268-278, March 1973
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1450960&isnumber=31166
[17] Schwarz, G. (1978) Estimating the Dimension of a Model
Annals of Statistics, 6, 461-464.
http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.aos/
1176344136
[18] Jones E, Oliphant T, Peterson P & others, SciPy's k-means algorithm, http://docs.scipy.org/doc/scipy/reference/cluster.vq.html
[19] Weisstein, Eric W. Log Normal Distribution. From MathWorldA Wolfram Web Resource.
http://mathworld.wolfram.com/LogNormalDistribution.html
[20] Yiying Lu, Lifting a Dreamer
http://www.yiyinglu.com/?portfolio=lifting-a-dreamer-aka-twitter-fail-whale
[21] Caflisch R.E, Monte Carlo and quasi-Monte Carlo methods,
Acta Numerica vol. 7, Cambridge University Press, 1998, pp. 1-49.
http://websrv.cs.fsu.edu/~mascagni/Caflisch_1998_Acta_Numerica.pdf