Pairs Trading: An Implementation of The Stochastic Spread and Cointegration Approach

University of Amsterdam
Master Thesis
Pairs Trading:
An implementation of the Stochastic Spread and Cointegration Approach
Author: Nick Huurman (5631335)
Supervisors: Prof. dr. C.G.H. Diks, Dr. S.A. Broda
August 10, 2012
Contents
1 Introduction
2 Cointegration approach
  2.1 Integration, cointegration and error correction
  2.2 Theoretical framework for pairs trading
  2.3 Johansen cointegration test
3 Stochastic spread model
  3.1 The state-space model
  3.2 Kalman Filter
  3.3 The EM Algorithm
4 Trading design
  4.1 Trading period
  4.2 Pairs selection
    4.2.1 Cointegration approach
    4.2.2 Stochastic spread approach
  4.3 Mean-Variance optimization
5 Evaluation
  5.1 Sharpe ratio
  5.2 Results
    5.2.1 Stochastic Spread Approach
    5.2.2 Cointegration Approach
  5.3 Results using DAX index
6 Conclusion
Chapter 1
Introduction
History shows us that using a market neutral trading strategy can be a good way to invest your
money. Typically, such a strategy performs in a steady manner, regardless of whether the market
goes up or down, and returns come with low volatility (Vidyamurhty, 2004). These favourable
characteristics are achieved by trading a market neutral portfolio, which can be constructed by
going long and short in two assets that have the same beta (hence, a portfolio with zero beta),
which is also referred to as a spread portfolio.
This thesis evaluates one particular market neutral trading strategy that has already been used (and has proved its value) on Wall Street for 25 years, namely pairs trading. Recent studies show that pairs trading performs exceptionally well in turbulent markets, where mispricing of stocks is more common (Gatev et al., 2006; Do et al., 2006; Baronyan et al., 2010). Baronyan et al. (2010) even reported a 40 per cent net annual profit in the first year (2008) of the financial crisis. This result shows that pairs trading, despite its 25-year existence, is still profitable and therefore very relevant to investigate, especially given the recent turbulence in the stock market.
The concept of pairs trading is relatively simple and can be summarized as follows. To begin, an investor has to find two securities whose prices have historically moved together and are therefore in a relative equilibrium. Then, when the price difference between the two securities widens, so that the securities are out of their relative equilibrium, the trader takes a long position in the cheap security and a short position in the expensive security. Based on the past price dynamics, the investor expects the prices to converge back to their relative equilibrium. If they do, the long and short positions are unwound and a profit is made.
The main difficulties in constructing a profitable pairs trading strategy evidently lie in choosing the right method for selecting a suitable pair of securities, and in deciding how and when to take a position in the selected pair. A recent thesis by Yakop (2011) investigates a broad range of selection and trading methods that are appropriate for pairs trading. He concludes that the model-based approaches perform best. Therefore, this thesis investigates and analyses two different model-based approaches to pairs trading. The first is the cointegration approach, which is based on the error correction model. The second is the stochastic spread approach, as introduced in Elliot et al. (2005).
The returns of the pairs selected by both methods are calculated using two different trading strategies. The first is a dynamic model for the number of positions taken in the spread, based on the mean-variance optimization procedure of Markowitz (1952). The second is the two-standard-deviation approach, which is commonly used in earlier literature (Yakop (2011), Gatev et al. (2006), Vidyamurhty (2004)). The main objective of this thesis is to compare the performance of the cointegration approach with that of the stochastic spread approach when implemented with these two pairs trading strategies.
The thesis is organized as follows. Chapters 2 and 3 outline the two approaches used for modelling the behaviour of a pair. Chapter 4 describes the different trading strategies, and Chapter 5 evaluates the results of both models under these strategies. The last chapter contains the conclusions of this thesis.
Chapter 2
Cointegration approach
The notion of cointegrated time series was first introduced by Engle and Granger (1987) and
is one of the ideas for which they received a Nobel Prize in economics in 2003. Cointegrated
time series possess characteristics that are very useful for pairs trading, such as a long-term
equilibrium with the associated property of mean reversion. In the first section of this chapter
the definitions of integration, cointegration and the error correction model (ECM) for a time
series are given. The second section gives the theoretical framework for pairs trading and in the
third section, the cointegration test proposed by Johansen (1991) is discussed.
2.1 Integration, cointegration and error correction
To begin the theory of cointegration, the definitions of weak stationarity and of an integrated time series are given first:
Definition. An ordered sequence of random variables, i.e., a time series or process $\{x_t\}$, is weakly stationary (or second-order stationary) if its first two moments are constant over time: $E[x_t]$ does not depend on $t$, and the covariance between $x_t$ and $x_{t-l}$ depends only on the lag $l$.
Definition. A time series which has a stationary, invertible ARMA representation after differencing $d$ times is said to be integrated of order $d$, denoted $\{x_t\} \sim I(d)$.
The two definitions above become tangible through the example of a simple VAR model. Consider a $k$-dimensional VAR($p$) time series $\{x_t\}$ with a possible time trend, so that the model is
$$x_t = \mu_t + \phi_1 x_{t-1} + \dots + \phi_p x_{t-p} + a_t,$$
or
$$\phi(B)x_t = \mu_t + a_t,$$
with
$$\phi(B) = I - \phi_1 B - \dots - \phi_p B^p,$$
where $B$ is the back-shift operator, the innovation $a_t$ is assumed to be Gaussian and $\mu_t = \mu_0 + \mu_1 t$, where $\mu_0$ and $\mu_1$ are $k$-dimensional constant vectors. From the definition of weak stationarity, it follows that a necessary condition for the VAR($p$) system above to be weakly stationary is that all zeros of the determinant $|\phi(B)|$ lie outside the unit circle. In that case $\{x_t\}$ is unit-root stationary, i.e., it is not integrated (an $I(0)$ process) (Tsay, 2010). The definition of cointegration as stated in Engle and Granger (1987) is given next.
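The stationarity condition can be checked numerically. The sketch below is a minimal illustration, not part of the original analysis: the function name `var_is_stationary` and the tolerance are hypothetical choices. It relies on the standard equivalence between all zeros of $|\phi(B)|$ lying outside the unit circle and all eigenvalues of the VAR's companion matrix lying strictly inside it.

```python
import numpy as np


def var_is_stationary(phis, tol=1e-8):
    """Check the unit-root stationarity condition of a VAR(p).

    phis: list [phi_1, ..., phi_p] of (k x k) coefficient matrices of
    x_t = mu_t + phi_1 x_{t-1} + ... + phi_p x_{t-p} + a_t.
    The zeros of |phi(B)| lie outside the unit circle exactly when all
    eigenvalues of the companion matrix lie strictly inside it.
    """
    p = len(phis)
    k = phis[0].shape[0]
    companion = np.zeros((k * p, k * p))
    companion[:k, :] = np.hstack(phis)        # first block row: phi_1 ... phi_p
    companion[k:, :-k] = np.eye(k * (p - 1))  # identity blocks shift the lags down
    eigenvalues = np.linalg.eigvals(companion)
    # tol guards against an exact unit root being computed as 0.9999999999999998
    return bool(np.all(np.abs(eigenvalues) < 1.0 - tol)), eigenvalues
```

For example, the VAR(1) with $\phi_1 = \begin{pmatrix} 0.8 & 0.2 \\ 0.2 & 0.8 \end{pmatrix}$ has companion eigenvalues 1 and 0.6, so it is not stationary; its $\Pi = \phi_1 - I$ has rank one, which is exactly the cointegrated case discussed in Section 2.3.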
Definition. The components of the vector $\{x_t\}$ are said to be cointegrated of order $d, b$, denoted $\{x_t\} \sim CI(d, b)$, if (i) all components of $\{x_t\}$ are $I(d)$; (ii) there exists a vector $\beta \neq 0$ so that $z_t = \beta' x_t \sim I(d - b)$, $b > 0$. The vector $\beta$ is called the cointegrating vector.
Considering the case where $d = b = 1$, cointegration means that the equilibrium error $z_t$ is $I(0)$: it will rarely drift far from its mean and will often cross it (Engle and Granger, 1987). A convenient way of representing the vector $\{x_t\}$ in terms of stationary series is the error correction model (ECM) representation, which avoids the problem of over-differencing (Tsay, 2010, p. 431). The definition of the ECM is given next (Engle and Granger, 1987):
Definition. A vector time series $\{x_t\}$ has an error correction representation if it can be expressed as
$$A(B)(1 - B)x_t = -\alpha z_{t-1} + u_t,$$
where $u_t$ is a stationary multivariate disturbance, with $A(0) = I$, $A(1)$ has all elements finite and $\alpha \neq 0$.
In this representation of the ECM, only the disequilibrium in the last period, $z_{t-1}$, is an explanatory variable. However, by rearranging terms, any set of lags can be written in this form. Therefore, this representation of the ECM permits any type of gradual adjustment towards a new equilibrium (Engle and Granger, 1987).
2.2 Theoretical framework for pairs trading
Define the observed price of stock $i$ at time $t$ as $\{P_{it}\}$ and let $p_{it} = \ln(P_{it})$ be the corresponding log price. A common assumption in the literature (Tsay, 2010; Vidyamurhty, 2004) is that the time series $\{p_{it}\}$ has a unit root and follows a random walk, $p_{it} = p_{i,t-1} + r_{it}$, where $\{r_{it}\}$ is the return. This unit-root assumption is checked with the Augmented Dickey-Fuller (ADF) unit-root test.
Based on arbitrage pricing theory (APT), if two stocks have similar risk factors, they should have similar returns. In that case, $\{p_{1t}\}$ and $\{p_{2t}\}$ are likely to be driven by a common component and are therefore cointegrated (Tsay, 2010). In formula, there exists a linear combination $w_t = \beta' p_t = p_{1t} - \gamma p_{2t}$, with $p_t = (p_{1t}, p_{2t})'$ and $\beta = (1, -\gamma)'$, which is unit-root stationary and mean reverting.
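These two price series $\{p_{1t}\}$ and $\{p_{2t}\}$ can also be written in ECM form:
$$\begin{pmatrix} p_{1t} - p_{1,t-1} \\ p_{2t} - p_{2,t-1} \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} (w_{t-1} - \mu_w) + \begin{pmatrix} a_{1t} \\ a_{2t} \end{pmatrix}, \tag{2.1}$$
where $\mu_w = E[w_t]$ denotes the mean of $\{w_t\}$; the series $\{w_t\}$ itself is referred to as the spread between the two log stock prices.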
The left-hand side of (2.1) contains the log returns of both price series. Furthermore, the equation states that the returns depend on the stationary series $w_{t-1} - \mu_w$ and are therefore also stationary. Specifically, $w_t - \mu_w$ denotes the deviation from the long-term equilibrium between the two stocks, so the returns of the stocks (left-hand side of (2.1)) depend on the previous deviation from the equilibrium. The coefficients $\alpha_1$ and $\alpha_2$ show the effect of this past deviation on the returns $\{r_{1t}\}$ and $\{r_{2t}\}$, respectively. In practice, $\alpha_1$ and $\alpha_2$ have opposite signs, which reflects the mean-reverting behaviour of the spread.
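To make (2.1) concrete, the following sketch simulates a pair of log prices directly from the ECM; the function names and parameter values are hypothetical. Substituting (2.1) into $w_t = p_{1t} - \gamma p_{2t}$ gives $w_t - \mu_w = (1 + \alpha_1 - \gamma\alpha_2)(w_{t-1} - \mu_w) + (a_{1t} - \gamma a_{2t})$, so the spread is a stationary AR(1) whenever $|1 + \alpha_1 - \gamma\alpha_2| < 1$, even though each log price is $I(1)$.

```python
import numpy as np


def simulate_ecm_pair(alpha1, alpha2, gamma, mu_w, T, sigma=0.01, seed=0):
    """Simulate log prices p1, p2 from the bivariate ECM (2.1); returns a (T, 2) array."""
    rng = np.random.default_rng(seed)
    p = np.zeros((T, 2))
    p[0] = [mu_w, 0.0]                      # start on the equilibrium: w_0 = mu_w
    for t in range(1, T):
        w_prev = p[t - 1, 0] - gamma * p[t - 1, 1]
        shocks = rng.normal(0.0, sigma, size=2)
        # both log returns react to last period's deviation from equilibrium
        p[t, 0] = p[t - 1, 0] + alpha1 * (w_prev - mu_w) + shocks[0]
        p[t, 1] = p[t - 1, 1] + alpha2 * (w_prev - mu_w) + shocks[1]
    return p


def spread_ar_coefficient(alpha1, alpha2, gamma):
    """Implied AR(1) coefficient of w_t - mu_w: 1 + alpha1 - gamma * alpha2."""
    return 1.0 + alpha1 - gamma * alpha2
```

With $\alpha_1 = -0.1$, $\alpha_2 = 0.05$ and $\gamma = 1$ (opposite signs, as described above) the implied coefficient is 0.85, and the sample AR(1) coefficient of the simulated spread should be close to it.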
2.3 Johansen cointegration test
For testing purposes, the ECM representation of a $k$-dimensional VAR($p$) time series $\{x_t\}$ becomes
$$\Delta x_t = d_t + \Pi x_{t-1} + \Phi_1^* \Delta x_{t-1} + \dots + \Phi_{p-1}^* \Delta x_{t-p+1} + a_t,$$
where the deterministic regressor $\{d_t\}$ (constant/trend) is added and $t = p+1, \dots, T$. Furthermore,
$$\Phi_j^* = -\sum_{i=j+1}^{p} \phi_i, \quad j = 1, \dots, p-1, \qquad \text{and} \qquad \Pi = \phi_1 + \dots + \phi_p - I = -\phi(1).$$
The term $\Pi x_{t-1}$ is referred to as the error correction term, which plays a key role in the study of cointegration (Tsay, 2010). If we assume that $\{x_t\}$ is at most $I(1)$, then $\Delta x_t$ is an $I(0)$ process. One can now consider three cases of interest for the ECM:
1. Rank($\Pi$) = 0. Hence, $\Pi = 0$ and $x_t$ is not cointegrated.
2. Rank($\Pi$) = $k$. Hence, $|\phi(1)| \neq 0$, $x_t$ contains no unit roots, and one can simply study $x_t$ itself (which is $I(0)$).
3. $0 <$ Rank($\Pi$) $= m < k$. Hence, $x_t$ has $m$ linearly independent cointegrating vectors and $k - m$ unit roots. One can write $\Pi = \alpha\beta'$, where $\alpha$ and $\beta$ are $k \times m$ matrices with Rank($\alpha$) = Rank($\beta$) = $m$.
As the three cases show, the rank of the matrix $\Pi$ determines whether the time series $\{x_t\}$ is cointegrated. Therefore, a likelihood ratio (LR) test for the rank of $\Pi$ is described next, known as the Johansen cointegration test. The hypotheses of this test are $H_0$: Rank($\Pi$) $= m$ versus $H_a$: Rank($\Pi$) $> m$. The value of $m$ starts at zero and is increased by one each time the null hypothesis is rejected. If the null hypothesis is rejected for every $m < k$, $\{x_t\}$ has the properties of the second case above.
The LR test statistic proposed by Johansen is defined as
$$LR_{tr}(m) = -(T - p) \sum_{i=m+1}^{k} \ln(1 - \hat\lambda_i),$$
where the $\hat\lambda_i$ (which should be small for $i > m$) are the squared canonical correlations between $\hat u_t$ and $\hat v_t$, the residuals of regressing $\Delta x_t$ and $x_{t-1}$, respectively, on the deterministic terms and the lagged differences $\Delta x_{t-1}, \dots, \Delta x_{t-p+1}$. This test is also referred to as the trace cointegration test. The asymptotic null distribution of this test is not $\chi^2$ but a Dickey-Fuller-type distribution, which depends on $k - m$ and the deterministic components (Tsay, 2010).
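A compact NumPy sketch of the trace statistic follows. It assumes an unrestricted constant as the only deterministic term, and `johansen_trace` is a hypothetical helper rather than a library function. It returns the statistics only; in practice they are compared with tabulated Dickey-Fuller-type critical values, which this sketch does not provide.

```python
import numpy as np


def johansen_trace(x, p=1):
    """Trace statistics LR_tr(m), m = 0..k-1, for a VECM with an unrestricted constant.

    x: (T, k) array of levels, rows ordered in time; p: VAR order, so the
    regression includes p - 1 lagged differences. Hypothetical helper.
    """
    x = np.asarray(x, dtype=float)
    T, k = x.shape
    dx = np.diff(x, axis=0)          # dx[i] = x[i + 1] - x[i]
    z0 = dx[p - 1:]                  # Delta x_t, t = p+1..T
    z1 = x[p - 1:-1]                 # x_{t-1}
    n = z0.shape[0]                  # effective sample size T - p
    # deterministic term (constant) plus lagged differences Delta x_{t-j}, j = 1..p-1
    z2 = np.hstack([np.ones((n, 1))] + [dx[p - 1 - j:-j] for j in range(1, p)])
    # residuals u_t and v_t of regressing Delta x_t and x_{t-1} on z2
    u = z0 - z2 @ np.linalg.lstsq(z2, z0, rcond=None)[0]
    v = z1 - z2 @ np.linalg.lstsq(z2, z1, rcond=None)[0]
    s00, s11, s01 = u.T @ u / n, v.T @ v / n, u.T @ v / n
    # squared canonical correlations = eigenvalues of S11^-1 S10 S00^-1 S01
    mat = np.linalg.solve(s11, s01.T @ np.linalg.solve(s00, s01))
    lam = np.sort(np.real(np.linalg.eigvals(mat)))[::-1]
    trace = np.array([-n * np.sum(np.log(1.0 - lam[m:])) for m in range(k)])
    return trace, lam
```

For a cointegrated pair, `trace[0]` should be large and `trace[1]` small; the selection rule of Section 4.2.1 compares both with critical values.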
Chapter 3
Stochastic spread model

In this chapter I describe a mean-reverting Gaussian Markov chain model for the spread, namely the stochastic spread model, which is based on the paper by Elliot et al. (2005). Later in this thesis, the returns of this stochastic spread approach, implemented as a pairs trading strategy, are compared with those of the cointegration approach described above, using historical data.
3.1 The state-space model
At any given time, a pairs trading portfolio is associated with a quantity called the spread, which is the difference between the quoted prices of the securities used. If the spread of the portfolio is significantly different from its mean, a position in both securities is taken with the expectation that the spread will revert to its mean (Vidyamurhty, 2004).
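To explicitly model the mean-reverting behaviour of the spread, a state process $\{x_k \mid k = 0, 1, 2, \dots\}$ is introduced, where $x_k$ denotes the value of some variable at time $t_k = k\tau$ for $k = 0, 1, 2, \dots$. We assume that $\{x_k\}$ is mean reverting:
$$x_{k+1} - x_k = (a - b x_k)\tau + \sigma\sqrt{\tau}\,\varepsilon_{k+1}, \tag{3.1}$$
where $\sigma \geq 0$, $b > 0$, $a \in \mathbb{R}$ and the $\varepsilon_k$ are i.i.d. $N(0, 1)$. The above equation is a discretized Ornstein-Uhlenbeck process: $dX(t) = (a - bX(t))\,dt + \sigma\,dW(t)$.
Furthermore, it is easy to see that $x_k \sim N(\mu_k, \sigma_k^2)$, with
$$\mu_k = E(x_k) = a\tau + (1 - b\tau)\mu_{k-1} = \dots = \frac{a}{b} - \frac{a}{b}(1 - b\tau)^k + (1 - b\tau)^k \mu_0,$$
and
$$\sigma_k^2 = \operatorname{Var}(x_k) = (1 - b\tau)^2 \sigma_{k-1}^2 + \sigma^2\tau = \dots = \sigma^2\tau\,\frac{1 - (1 - b\tau)^{2k}}{1 - (1 - b\tau)^2} + (1 - b\tau)^{2k}\sigma_0^2.$$
From these two equations the long-term mean and variance can be derived. For $k \to \infty$ (provided $|1 - b\tau| < 1$):
$$\mu_k \to \frac{a}{b}, \qquad \sigma_k^2 \to \frac{\sigma^2\tau}{1 - (1 - b\tau)^2}.$$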
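The two moment formulas can be checked against their one-step recursions. This sketch (hypothetical function names; the parameter values in the check are arbitrary) iterates $\mu_k = a\tau + (1 - b\tau)\mu_{k-1}$ and $\sigma_k^2 = (1 - b\tau)^2\sigma_{k-1}^2 + \sigma^2\tau$ and compares the result with the closed forms and the long-run limits.

```python
import numpy as np


def ou_moments(a, b, sigma, tau, mu0, var0, k):
    """Mean and variance of x_k for the discretized OU process (3.1),
    computed by the one-step recursions and by the closed forms."""
    B = 1.0 - b * tau
    mu_rec, var_rec = mu0, var0
    for _ in range(k):
        mu_rec = a * tau + B * mu_rec
        var_rec = B**2 * var_rec + sigma**2 * tau
    mu_closed = a / b - (a / b) * B**k + B**k * mu0
    var_closed = sigma**2 * tau * (1 - B**(2 * k)) / (1 - B**2) + B**(2 * k) * var0
    return (mu_rec, var_rec), (mu_closed, var_closed)


def ou_long_run(a, b, sigma, tau):
    """Limits for k -> infinity, valid when |1 - b*tau| < 1."""
    B = 1.0 - b * tau
    return a / b, sigma**2 * tau / (1 - B**2)
```

Note that $\sigma^2\tau / (1 - (1 - b\tau)^2)$ simplifies to $\sigma^2 / (b(2 - b\tau))$, which tends to the continuous-time Ornstein-Uhlenbeck variance $\sigma^2 / (2b)$ as $\tau \to 0$.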
The state equation can be rewritten in the following way:
$$x_k = A + Bx_{k-1} + C\varepsilon_k, \tag{3.2}$$
where $A = a\tau$, $B = 1 - b\tau$ and $C = \sigma\sqrt{\tau}$.
The latent variable $\{x_k\}$ defined above is used in the measurement equation, which defines the observed spread $\{y_k\}$ as a mean-reverting process with noise:
$$y_k = x_k + D\omega_k, \tag{3.3}$$
where $D > 0$ and $\omega_k \sim N(0, 1)$.
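A short simulation of (3.2) and (3.3) is sketched below. It assumes that the $\varepsilon_k$ and $\omega_k$ are independent standard normal draws, and the helper names are hypothetical; the mapping from $(a, b, \sigma, \tau)$ to $(A, B, C)$ follows the definitions above.

```python
import numpy as np


def ou_to_state_space(a, b, sigma, tau):
    """Map the parameters of (3.1) to A, B, C of the state equation (3.2)."""
    return a * tau, 1.0 - b * tau, sigma * np.sqrt(tau)


def simulate_spread(A, B, C, D, n, x0=0.0, seed=0):
    """Draw the hidden state x_k from (3.2) and the observed spread y_k from (3.3)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    prev = x0
    for k in range(n):
        prev = A + B * prev + C * rng.standard_normal()   # state equation (3.2)
        x[k] = prev
    y = x + D * rng.standard_normal(n)                     # measurement equation (3.3)
    return x, y
```

With $0 < B < 1$ the simulated state fluctuates around $A / (1 - B) = a / b$, and $y_k - x_k$ is pure measurement noise with variance $D^2$.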
The model described above has three major advantages from a theoretical point of view. The first one is rather obvious: the model is mean reverting, which is exactly what is required of the spread between two stocks for a successful pairs trading strategy. The second advantage is that the model for the spread is continuous in time, which is convenient for forecasting purposes. Critical questions for pairs trading, such as the expected holding period of the portfolio and the expected return of the strategy, can therefore be answered. The third advantage is that the model is completely tractable: all the parameters can be estimated using the Kalman filter and a maximum likelihood procedure called the EM algorithm. In the next two sections, the Kalman filter and the EM algorithm are discussed in detail.
3.2 Kalman Filter
To estimate the dynamical system of the stochastic spread model, a very useful tool called the Kalman filter (named after R.E. Kalman (Kalman, 1960)) is introduced. The Kalman filter is an algorithm for calculating linear least squares forecasts of the state on the basis of the data observed up to date $t$,
$$\hat x_{t+1|t} = \hat E[x_{t+1} \mid \mathcal{Y}_t],$$
where $\mathcal{Y}_t = (y_t, y_{t-1}, \dots, y_1)$. The state $x_t$ itself is unobserved, so, in the absence of exogenous regressors, the information set contains only the observed spreads. The Kalman filter calculates these forecasts recursively, generating $\hat x_{1|0}, \hat x_{2|1}, \dots, \hat x_{t|t-1}$ in succession (Hamilton, 1994).
In this thesis, the Kalman filter is described as a four-step procedure, based on the description given in Chapter 13 of Hamilton (1994) and the paper of Elliot et al. (2005). For convenience, the key features of a general state-space system are given first:
$$x_{t+1} = A + Bx_t + C\varepsilon_{t+1}, \tag{3.4}$$
$$y_t = x_t + D\omega_t, \tag{3.5}$$
where $\{\omega_t\}$ and $\{\varepsilon_t\}$ are both white noise processes.
For now it is assumed that the values of $A$, $B$, $C$ and $D$ are known; later these parameters are estimated with the EM algorithm of Shumway and Stoffer (1982).
To begin the Kalman filtering, the starting point of the recursion has to be set. Typically, the starting point is set as $\hat x_{1|0} = E[x_1]$, which is just the unconditional mean of $x_1$. The associated mean squared error (MSE) of this starting point is therefore $P_{1|0} = E[(x_1 - \hat x_{1|0})^2]$.
After defining the starting point, the next step is to calculate the forecast for the following point in time:
$$\hat x_{k+1|k} = \hat E[x_{k+1} \mid \mathcal{Y}_k] = A + B\hat x_{k|k}, \tag{3.6}$$
and the corresponding MSE is
$$P_{k+1|k} = E[(x_{k+1} - \hat x_{k+1|k})^2] = B^2 P_{k|k} + C^2. \tag{3.7}$$
The second step of the Kalman filter is to forecast the observation $y_k$:
$$\hat y_{k|k-1} = \hat E[y_k \mid \mathcal{Y}_{k-1}] = \hat x_{k|k-1}. \tag{3.8}$$
The MSE of this forecast is therefore equal to:
$$E[(y_{k+1} - \hat y_{k+1|k})^2] = P_{k+1|k} + D^2. \tag{3.9}$$
Next, the inference about the current value of $x_k$ is updated on the basis of the observation $y_k$ to produce
$$\hat x_{k|k} = \hat E(x_k \mid y_k, \mathcal{Y}_{k-1}) = \hat E(x_k \mid \mathcal{Y}_k). \tag{3.10}$$
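Using the formula for updating a linear projection (Hamilton, 1994, p. 379) results in
$$\hat x_{k|k} = \hat x_{k|k-1} + E[(x_k - \hat x_{k|k-1})(y_k - \hat y_{k|k-1})]\,\big(E[(y_k - \hat y_{k|k-1})^2]\big)^{-1}(y_k - \hat y_{k|k-1}). \tag{3.11}$$
Since the covariance term equals $P_{k|k-1}$ and the variance term equals $P_{k|k-1} + D^2$, this update can be written one period ahead as
$$\hat x_{k+1|k+1} = \hat x_{k+1|k} + K_{k+1}(y_{k+1} - \hat x_{k+1|k}), \tag{3.12}$$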
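where $K$ stands for the Kalman gain and is given by
$$K_{k+1} = \frac{P_{k+1|k}}{P_{k+1|k} + D^2}. \tag{3.13}$$
The estimate $\hat x_{k+1|k+1}$ denotes the best linear forecast of $x_{k+1}$ given $\mathcal{Y}_{k+1}$. The MSE of this updated estimate is $P_{k+1|k+1} = (1 - K_{k+1})P_{k+1|k}$, which is the input needed for (3.7) in the next step of the recursion.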
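The recursion (3.6) to (3.13) can be written as a single loop. This is a minimal scalar sketch: the function name and argument order are our own, and it includes the update $P_{k|k} = (1 - K_k)P_{k|k-1}$ that feeds (3.7).

```python
import numpy as np


def kalman_filter(y, A, B, C, D, x10, P10):
    """Scalar Kalman filter for x_{k+1} = A + B x_k + C eps, y_k = x_k + D omega.

    x10, P10: starting point x_{1|0} and its MSE P_{1|0}.
    Returns predicted (x_{k|k-1}, P_{k|k-1}) and filtered (x_{k|k}, P_{k|k})
    paths and the Kalman gains.
    """
    n = len(y)
    x_pred, P_pred = np.empty(n), np.empty(n)
    x_filt, P_filt = np.empty(n), np.empty(n)
    gain = np.empty(n)
    xp, Pp = x10, P10
    for k in range(n):
        x_pred[k], P_pred[k] = xp, Pp
        # forecast of y_k is x_{k|k-1} (3.8); its MSE is P_{k|k-1} + D^2 (3.9)
        gain[k] = Pp / (Pp + D**2)                  # Kalman gain (3.13)
        x_filt[k] = xp + gain[k] * (y[k] - xp)      # update (3.12)
        P_filt[k] = (1.0 - gain[k]) * Pp            # MSE of the updated estimate
        # one-step-ahead prediction (3.6) and its MSE (3.7)
        xp = A + B * x_filt[k]
        Pp = B**2 * P_filt[k] + C**2
    return x_pred, P_pred, x_filt, P_filt, gain
```

Because the gain lies strictly between 0 and 1, each filtered value is a weighted average of the model prediction and the noisy observation.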
3.3 The EM Algorithm
The Kalman filter assumes that the parameters of the state-space model are specified in advance. Normally, this is not the case and these parameters have to be estimated. One widely used estimation method is described in the paper of Shumway and Stoffer (1982) and is also used in this thesis. In that paper, the parameters are estimated by maximum likelihood using the EM algorithm. This estimation method is discussed next.
To estimate the parameters of the state-space model defined by (3.4) and (3.5), the joint log likelihood of this model has to be specified. Because the system depends on the unobserved time series $\{x_k\}$, the likelihood function is not straightforward to specify. To solve this problem, the EM algorithm conditions on the observed time series $y_1, \dots, y_n$. Let us define the estimated parameters at the $(r+1)$st iteration as the values $\theta = (A, B, C^2, D^2)$ which maximize
$$G(\theta) = E_r[\log L \mid y_1, \dots, y_n], \tag{3.14}$$
where the conditional expectation $E_r$ is taken under the $r$th iterate values $A(r)$, $B(r)$, $C^2(r)$ and $D^2(r)$. Furthermore, $\log L$ is the joint log likelihood of the complete data. When the conditional mean and covariance functions of the Kalman filter are conditioned on the full dataset $\mathcal{Y}_n$, they give the smoothed estimators of $\{x_k\}$:
$$\hat x_{k|n} = E(x_k \mid \mathcal{Y}_n), \qquad P_{k|n} = E[(x_k - \hat x_{k|n})^2], \qquad P_{k,k-1|n} = E[(x_k - \hat x_{k|n})(x_{k-1} - \hat x_{k-1|n})].$$
The EM algorithm is a two-step iterative procedure that finds a stationary value $\hat\theta$ of the likelihood function in the following way:
Step 1 (the E-step): compute, with $\hat\theta = \theta_j$,
$$Q(\theta, \hat\theta) = E_{\hat\theta}[\log L \mid y_1, \dots, y_n];$$
Step 2 (the M-step): find
$$\theta_{j+1} \in \arg\max_{\theta} Q(\theta, \hat\theta).$$
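A minimal EM sketch for the scalar model (3.4) and (3.5) follows. Assumptions, all our own: the E-step uses the Rauch-Tung-Striebel fixed-interval smoother together with the lag-one covariance identity $P_{k,k-1|n} = J_{k-1}P_{k|n}$ (one common way to obtain the smoothed moments, not necessarily the exact recursions of Shumway and Stoffer (1982)); the prior of $x_1$ is held fixed at a hypothetical choice ($\hat x_{1|0} = y_1$, $P_{1|0}$ equal to the sample variance of $y$); and all function names are hypothetical. Because the M-step maximizes $Q$ exactly, the log likelihood should not decrease from one iteration to the next.

```python
import numpy as np


def kalman_smoother(y, A, B, C2, D2, m0, P0):
    """Kalman filter plus Rauch-Tung-Striebel smoother for
    x_k = A + B x_{k-1} + C eps_k,  y_k = x_k + D omega_k,  x_1 ~ N(m0, P0).
    Returns x_{k|n}, P_{k|n}, lag-one covariances P_{k,k-1|n} (entry 0 unused)
    and the Gaussian log likelihood of y."""
    n = len(y)
    xp, Pp, xf, Pf = (np.empty(n) for _ in range(4))
    loglik = 0.0
    for k in range(n):
        xp[k] = m0 if k == 0 else A + B * xf[k - 1]          # (3.6)
        Pp[k] = P0 if k == 0 else B**2 * Pf[k - 1] + C2     # (3.7)
        F = Pp[k] + D2                                       # innovation variance (3.9)
        v = y[k] - xp[k]                                     # innovation
        loglik -= 0.5 * (np.log(2.0 * np.pi * F) + v**2 / F)
        xf[k] = xp[k] + Pp[k] / F * v                        # update (3.12)-(3.13)
        Pf[k] = Pp[k] - Pp[k] ** 2 / F
    xs, Ps, Pcs = xf.copy(), Pf.copy(), np.zeros(n)
    for k in range(n - 2, -1, -1):
        J = Pf[k] * B / Pp[k + 1]                            # smoother gain
        xs[k] = xf[k] + J * (xs[k + 1] - xp[k + 1])
        Ps[k] = Pf[k] + J**2 * (Ps[k + 1] - Pp[k + 1])
        Pcs[k + 1] = J * Ps[k + 1]                           # Cov(x_{k+1}, x_k | y_1..y_n)
    return xs, Ps, Pcs, loglik


def em_step(y, theta, m0, P0):
    """One EM iteration; returns new (A, B, C^2, D^2) and the log likelihood at the old theta."""
    A, B, C2, D2 = theta
    xs, Ps, Pcs, loglik = kalman_smoother(y, A, B, C2, D2, m0, P0)
    Exx = Ps + xs**2                        # E[x_k^2 | y]
    Exl = Pcs[1:] + xs[1:] * xs[:-1]        # E[x_k x_{k-1} | y], k = 2..n
    m = len(y) - 1                          # number of transitions
    S1, S0 = xs[1:].sum(), xs[:-1].sum()
    S11, S00, S10 = Exx[1:].sum(), Exx[:-1].sum(), Exl.sum()
    # A, B: normal equations for minimizing sum_k E[(x_k - A - B x_{k-1})^2 | y]
    A, B = np.linalg.solve([[m, S0], [S0, S00]], [S1, S10])
    C2 = (S11 - 2*A*S1 - 2*B*S10 + m*A**2 + 2*A*B*S0 + B**2*S00) / m
    D2 = np.mean((y - xs) ** 2 + Ps)
    return (A, B, C2, D2), loglik


def fit_em(y, theta0, n_iter=100):
    """Iterate EM; the prior of x_1 is held fixed (hypothetical choice)."""
    y = np.asarray(y, dtype=float)
    m0, P0 = y[0], np.var(y)
    theta, history = theta0, []
    for _ in range(n_iter):
        theta, ll = em_step(y, theta, m0, P0)
        history.append(ll)
    return theta, history
```

Tracking `history` is a cheap sanity check: a decreasing log likelihood would point to a bug in the E-step or M-step.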

Figure 3.1 shows a simulated spread (generated with the parameters in Elliot et al. (2005)) and the fitted values of this spread using the stochastic spread approach.
Figure 3.1: The fitted values of the Stochastic Spread approach (green line) and simulated spread (blue line). Horizontal axis: Days; vertical axis: Spread.
Chapter 4
Trading design
This chapter discusses the trading design used in this thesis. The first section describes the trading period. The second section sets out the pairs selection criteria for the two model-based approaches described in the previous chapters. The third section discusses the mean-variance optimization theory of Markowitz (1952), which is used to determine the optimal number of positions in the spread.
4.1 Trading period
The data used in this thesis contain daily closing prices of the stocks of the Amsterdam Stock Exchange (AEX) in the period from 1 January 2006 until 30 December 2011, obtained from Thomson Reuters through Datastream Advance. Since an equilibrium between two stocks is not very likely to hold over the whole period covered by the dataset, the data are divided into short blocks, each consisting of a formation period and an adjacent trading period. The length of the formation period is chosen arbitrarily and set to 128, 256 and 512 days. The adjacent trading period is set to half the number of trading days of the formation period, as in earlier literature (Gatev et al. (2006), Yakop (2011)). In the trading period, positions in the spread are opened following the mean-variance optimization procedure (discussed at the end of this chapter) and the two-standard-deviation strategy. Any remaining open positions in the spread are closed at the end of the trading period.
A rolling window of 40 trading days is used to start a new formation period, i.e., a new formation period begins every 40 trading days. As a result, after the first 128, 256 or 512 days (the different lengths of the formation period), all the remaining days in the dataset are used for trading and no trading opportunities are lost.
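The windowing scheme can be expressed as a list of index triples. In this sketch, the function name and the rule of dropping a final window whose trading period does not fit entirely in the sample are our own assumptions; formation periods start every 40 days and each trading period is half as long as its formation period.

```python
def rolling_windows(n_days, formation, step=40):
    """Return (formation_start, trading_start, trading_end) triples, end-exclusive.

    The trading period is half the formation period and a new formation
    period starts every `step` days. Windows whose trading period would run
    past the end of the sample are dropped (an assumption of this sketch).
    """
    trading = formation // 2
    windows = []
    start = 0
    while start + formation + trading <= n_days:
        windows.append((start, start + formation, start + formation + trading))
        start += step
    return windows
```

For example, `rolling_windows(300, 128)` yields three windows, starting at days 0, 40 and 80, each with a 64-day trading period.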
4.2 Pairs selection
This section describes the criteria for selecting a suitable pair for the different methods.
4.2.1 Cointegration approach
As mentioned in Chapter 2, $\{p_{it}\}$ is assumed to have a unit root and to follow a random walk model: $p_{it} = p_{i,t-1} + r_{it}$. This assumption is tested with the ADF test, and if the null hypothesis (a unit root) is not rejected, the series $\{p_{it}\}$ is selected.
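The unit-root screen can be illustrated with a plain (non-augmented) Dickey-Fuller regression with a constant, written in NumPy. This is a simplification of the ADF test used in the thesis: no lagged differences are included, the cut-off of $-2.86$ is the commonly tabulated asymptotic 5% critical value for the constant-only case, and the function names are hypothetical.

```python
import numpy as np


def dickey_fuller_t(p):
    """t-statistic of rho in Delta p_t = c + rho * p_{t-1} + e_t (no lagged differences)."""
    p = np.asarray(p, dtype=float)
    dp = np.diff(p)
    X = np.column_stack([np.ones(len(dp)), p[:-1]])
    coef, *_ = np.linalg.lstsq(X, dp, rcond=None)
    resid = dp - X @ coef
    s2 = resid @ resid / (len(dp) - X.shape[1])     # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)               # OLS covariance of (c, rho)
    return coef[1] / np.sqrt(cov[1, 1])


def keep_as_unit_root(p, crit=-2.86):
    """Keep the series if the unit-root null is NOT rejected at roughly the 5% level."""
    return dickey_fuller_t(p) > crit
```

A log price series that behaves like a random walk typically gives a statistic above the cut-off and is kept, while a stationary series gives a strongly negative statistic and is discarded.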
After selecting the time series $\{p_{it}\}$, all the different combinations of pairs are tested for cointegration with the Johansen test procedure. With $x_t = (p_{1t}, p_{2t})'$, the model specified for testing is
$$\Delta x_t = \alpha(\beta' x_{t-1} - \mu_w) + c_0 + \Phi_1^* \Delta x_{t-1} + \varepsilon_t,$$
where $\beta' x_{t-1} = w_{t-1}$, $\mu_w$ is the intercept, $c_0$ is the deterministic trend term and $\alpha$ is the speed-of-adjustment parameter.
Pairs that reject the first hypothesis ($m = 0$) and do not reject the second hypothesis ($m = 1$) are selected as suitable pairs; they have a mean-reverting spread $w_t$ with mean $\mu_w$. The spread portfolio is $w_t = p_{1t} - \gamma p_{2t}$, so against one stock of $\{p_{1t}\}$, $\gamma$ stocks of $\{p_{2t}\}$ are held.
4.2.2 Stochastic spread approach
To select a pair suitable for trading, all the different combinations of spreads are estimated with the EM algorithm and the Kalman filter as discussed in Chapter 3. After estimating the parameters of the model, the parameter $B$ of the state equation is evaluated. If $0 < B < 1$, the spread shows mean-reverting behaviour and the pair is selected for trading. The number of positions taken in the spread is again obtained using the mean-variance optimization strategy discussed below.
4.3 Mean-Variance optimization
This section describes the mean-variance optimization procedure (MV) used to determine the number of positions in a pairs trade. The concept of mean-variance optimization was first introduced by Markowitz (1952). The main purpose of Markowitz's paper was to explain mathematically why investors diversify their portfolios. Markowitz argues that investors do not only maximize the expected return of a portfolio, but also consider the variance of its returns. In this thesis I use Markowitz's expected returns-variance of returns rule to optimize the number of positions held in a spread portfolio.
The rationale behind optimizing the number of positions in a spread portfolio lies in the mean-reverting behaviour of the spread of a pairs trade. No matter how large the deviation from the mean, the spread is always expected to revert to its long-term equilibrium value. In earlier literature on pairs trading, a fixed position in the portfolio is opened after the spread hits a preset threshold some distance (two standard deviations) away from the long-term mean (Yakop (2011), Gatev et al. (2006), Vidyamurhty (2004)). After hitting the threshold value, the position is held until the spread reverts to the mean. When this happens, the position is unwound and a profit is made. In the time between opening and closing the position, however, the spread may have become considerably larger than it was when the trader first opened the position. If this is the case, the trader can generate a much bigger profit by taking on more positions in proportion to the size of the spread.
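For comparison with the mean-variance rule, the two-standard-deviation rule described above can be sketched as follows. Assumptions: the mean and standard deviation come from the formation period, positions are coded as +1 (long the spread), -1 (short the spread) and 0 (flat), and the function name is hypothetical.

```python
import numpy as np


def two_sigma_positions(spread, mean, std, k=2.0):
    """Position held after observing the spread at each date under the threshold rule.

    Open when the spread is more than k standard deviations from the mean,
    unwind when it crosses back over the mean.
    """
    pos = np.zeros(len(spread), dtype=int)
    current = 0
    for t, s in enumerate(spread):
        if current == 0:
            if s > mean + k * std:
                current = -1          # spread too high: short the spread
            elif s < mean - k * std:
                current = 1           # spread too low: long the spread
        elif (current == -1 and s <= mean) or (current == 1 and s >= mean):
            current = 0               # spread has reverted to the mean: unwind
        pos[t] = current
    if len(pos) > 0:
        pos[-1] = 0                   # close any open position at the end of the trading period
    return pos
```

The position size is fixed at one unit regardless of how far the spread moves, which is precisely the limitation that motivates the mean-variance approach.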
In this thesis, the opportunity to generate a higher profit in a trade is explored by varying the number of positions. The positions taken in a spread are optimized using a utility function based on the aforementioned expected returns-variance of returns principle of Markowitz (1952), namely:
$$U_t(wp_{t+1}) = E_t\left[\frac{wp_{t+1} - wp_t}{wp_t}\right] - \lambda\,\operatorname{Var}_t\left[\frac{wp_{t+1} - wp_t}{wp_t}\right],$$
where $wp_t$ is the amount of wealth of the portfolio at time $t$ and $\lambda$ is a constant that measures the risk aversion of the trader (set to one when the strategy is evaluated). Markowitz (1952) stresses that it is essential to find reasonable values for $E_t[(wp_{t+1} - wp_t)/wp_t]$ and $\operatorname{Var}_t[(wp_{t+1} - wp_t)/wp_t]$ using reliable statistical techniques. Both the stochastic spread and the cointegration approach provide such model-based estimates. Now, let $return_{t+1}$ denote the value at time $t+1$ of a portfolio that invested one dollar in the spread at time $t$. Using this definition, the expected return and variance can be evaluated with the following equations:
$$E_t\left[\frac{wp_{t+1} - wp_t}{wp_t}\right] = z_t\,E_t\left[\frac{return_{t+1}}{wp_t}\right],$$
$$\operatorname{Var}_t[r_{t+1}] = z_t^2\,\operatorname{Var}_t\left[\frac{return_{t+1}}{wp_t}\right],$$
where $z_t$ represents the number of positions taken in the spread portfolio. The value of $E_t[return_{t+1}]$ is calculated using the parameters estimated in the formation period. The value of $\operatorname{Var}_t[return_{t+1}]$ is estimated in the formation period and is assumed to be constant in the trading period.
The number of positions taken in the spread at any point in time can now be calculated by maximizing the utility function with respect to $z_t$. The first-order condition is given by:
$$\frac{\partial U_t(z_t)}{\partial z_t} = E_t\left[\frac{return_{t+1}}{wp_t}\right] - 2\lambda z_t \operatorname{Var}_t\left[\frac{return_{t+1}}{wp_t}\right] = 0.$$
Since the second derivative of the utility function is always negative ($\lambda > 0$, $\operatorname{Var}[return_{t+1}] > 0$), solving this first-order condition for $z_t$ gives the number of positions that maximizes the utility function. This optimal value of $z_t$ at any point in time is given by:
$$z_t = \frac{E_t[return_{t+1}]}{2\lambda \operatorname{Var}_t[return_{t+1}]}\,wp_t.$$
The return $r_{t+1}$ of this strategy is given by:
$$r_{t+1} = \frac{wp_{t+1} - wp_t}{wp_t} = z_t\,\frac{return_{t+1}}{wp_t}.$$
When the optimal value of $z_t$ is used, the return of the strategy is as follows:
$$r_{t+1} = \frac{wp_{t+1} - wp_t}{wp_t} = \frac{E_t[return_{t+1}]}{2\lambda \operatorname{Var}_t[return_{t+1}]}\,return_{t+1}.$$
It can be seen that the returns of this strategy do not depend on the value of $wp_t$.
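The optimal position and the resulting return translate directly into code. The function names are hypothetical; `expected_return` and `variance` stand for $E_t[return_{t+1}]$ and $\operatorname{Var}_t[return_{t+1}]$, and `risk_aversion` for $\lambda$.

```python
def optimal_positions(expected_return, variance, wealth, risk_aversion=1.0):
    """z_t = E[return_{t+1}] / (2 * lambda * Var[return_{t+1}]) * wp_t."""
    return expected_return / (2.0 * risk_aversion * variance) * wealth


def strategy_return(expected_return, variance, realised_return, wealth, risk_aversion=1.0):
    """r_{t+1} = z_t * return_{t+1} / wp_t, evaluated at the optimal z_t."""
    z = optimal_positions(expected_return, variance, wealth, risk_aversion)
    return z * realised_return / wealth
```

Doubling `wealth` doubles $z_t$ but leaves the strategy return unchanged, which is the wealth independence noted above.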
Chapter 5
Evaluation
This chapter evaluates the results of the two model-based approaches discussed in Chapters 2 and 3. The structure of this chapter is as follows. First, the Sharpe ratio is defined and a few concerns regarding the calculation of Sharpe ratios, as explained in the master thesis of Yakop (2011), are discussed. The second section gives the results for both approaches. The last section gives out-of-sample results of the different pairs trading strategies.
5.1 Sharpe ratio
A common way to compare the returns of different trading strategies is to calculate the reward-to-variability ratio, nowadays called the Sharpe ratio, introduced by Sharpe (1966). The Sharpe ratio relates the excess expected return of an investment to its return volatility. In formula,
$$SR = \frac{E[r_t] - r_f}{\sigma}, \tag{5.1}$$
where $E[r_t]$ and $\sigma$ are the expected return and standard deviation of the return series $\{r_t\}$, and $r_f$ is the average return earned by the benchmark in the evaluated period. The risk-free rate is usually assumed to be an adequate benchmark for comparing the returns of a strategy. However, as discussed in Yakop (2011), an adequate benchmark should act as an appropriate substitute for pairs trading. Therefore, Yakop (2011) did not use the risk-free rate, but the composite index of the stocks, in this case the AEX index. When calculating the Sharpe ratio with equation (5.1), $r_f$ is therefore set to zero. Afterwards, the calculated Sharpe ratios of the different trading strategies are compared to the Sharpe ratio of the AEX index.
The estimate of the Sharpe ratio, $\widehat{SR}$, is found by substituting $\hat\mu = \frac{1}{T}\sum_{t=1}^{T} r_t$ for $E[r_t]$ and $\hat\sigma = \sqrt{\frac{1}{T}\sum_{t=1}^{T}(r_t - \hat\mu)^2}$ for $\sigma$, which are the estimated mean and standard deviation of the return series. As discussed in Yakop (2011), since $\widehat{SR}$ is based on $\hat\mu$ and $\hat\sigma$ (which are estimated with some error), $\widehat{SR}$ is also estimated with some error. Denoting the vector $(\mu, \sigma)'$ by $\theta$ and the SR formula in equation (5.1) by $g(\theta)$, Lo (2002) shows that the asymptotic distribution of the SR estimator is given by:
$$\sqrt{T}(\widehat{SR} - SR) \overset{a}{\sim} N(0, V_{GMM}), \qquad V_{GMM} = \frac{\partial g}{\partial \theta}\,\Sigma\,\frac{\partial g}{\partial \theta'},$$
where $\Sigma$ is the asymptotic covariance matrix of $\hat\theta$. The estimation of $\Sigma$ and $\partial g / \partial \theta$ and the derivation of the asymptotic distribution are not done in this thesis. Interested readers are referred to Appendix A of Yakop (2011).
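As an illustration of the delta-method result, consider the special case of IID normal returns (a simplification; the thesis follows the general GMM treatment). With $\partial g / \partial \theta = (1/\sigma, -(\mu - r_f)/\sigma^2)$ and asymptotic covariance $\operatorname{diag}(\sigma^2, \sigma^2/2)$ for $(\hat\mu, \hat\sigma)$, $V_{GMM}$ reduces to $1 + SR^2/2$ (Lo, 2002). The sketch below computes $\widehat{SR}$ with $r_f = 0$ by default and this IID standard error; the function names are hypothetical.

```python
import numpy as np


def sharpe_ratio(r, rf=0.0):
    """Estimated Sharpe ratio (5.1) using the 1/T estimators of mean and std."""
    r = np.asarray(r, dtype=float)
    mu = r.mean()
    sigma = np.sqrt(np.mean((r - mu) ** 2))
    return (mu - rf) / sigma


def sharpe_se_iid(sr, T):
    """Asymptotic standard error sqrt(V / T) of the SR estimator for IID normal
    returns, with V = 1 + SR^2 / 2 (delta method, IID special case)."""
    return np.sqrt((1.0 + 0.5 * sr**2) / T)
```

For example, an estimated Sharpe ratio of 2 on 100 observations has an IID standard error of $\sqrt{3/100} \approx 0.17$.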

Furthermore, Yakop (2011) discusses two limitations of the use of Sharpe ratios. The first is that the Sharpe ratio implicitly assumes the return series to be normally distributed, or at least approximately so. In practice, pairs trading strategies produce frequent small positive returns with occasional large losses; because the standard deviation does not capture this excess skewness and kurtosis, the Sharpe ratios are inflated (Lo, 2002).
The second limitation of the Sharpe ratio is that it ignores any underlying serial correlation, which is frequently present in financial time series. The consequence of serial correlation is, again, that it can lead to overestimated Sharpe ratios (Lo, 2002). To resolve this issue, the standard deviation of the return series, $\sigma$, has to be estimated with the Newey-West (1987) heteroskedasticity and autocorrelation consistent estimator of variance (HAC estimator). The HAC estimator is used when calculating the Sharpe ratios of the return series. Its derivation is not given in this thesis, but can also be found in Appendix A of Yakop (2011).
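Taking the paragraph above literally, the sketch below replaces $\sigma$ in (5.1) by the square root of a Newey-West long-run variance with Bartlett weights $1 - j/(L+1)$. The lag length $L$ is left to the user because the thesis does not state one here, and the function names are hypothetical.

```python
import numpy as np


def newey_west_variance(r, lags):
    """Newey-West (Bartlett-kernel) long-run variance of the return series."""
    r = np.asarray(r, dtype=float)
    d = r - r.mean()
    T = len(d)
    lrv = d @ d / T                                  # gamma_0
    for j in range(1, lags + 1):
        gamma_j = d[j:] @ d[:-j] / T                 # sample autocovariance at lag j
        lrv += 2.0 * (1.0 - j / (lags + 1.0)) * gamma_j
    return lrv


def sharpe_ratio_hac(r, lags, rf=0.0):
    """Sharpe ratio (5.1) with sigma replaced by the Newey-West HAC estimate."""
    r = np.asarray(r, dtype=float)
    return (r.mean() - rf) / np.sqrt(newey_west_variance(r, lags))
```

With $L = 0$ this reduces to the ordinary estimate; with positively autocorrelated returns, the HAC Sharpe ratio is smaller in absolute value.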
5.2 Results
In this section the results of pairs trading with the Stochastic Spread approach and the Cointegration approach are given.
5.2.1 Stochastic Spread Approach
As mentioned in Chapter 3, the Stochastic Spread model has three major advantages from a theoretical point of view: it captures mean reversion, it is continuous in time and it is completely tractable. Despite these promising properties, the empirical results turn out to be less favourable.
First of all, it takes a long time to estimate the parameters of one spread, let alone those of the 276 different spreads available in the AEX (which consists of 24 stocks). To give an indication of the time needed: estimating the spreads of a single formation period already takes forty-two minutes. There are seven formation periods in this dataset, so the estimation of all the different pairs in the dataset would take roughly five hours.
This first disadvantage is inconvenient but can be overcome by using faster computers (or patience). However, another disadvantage is more problematic. After estimating all the different spreads, the number of pairs found suitable for pairs trading was minimal. For example, the first formation period resulted in five suitable pairs. This is not much, given that there are 276 different pairs available.
Also, the parameters estimated for the pairs selected by this method suggest that the
model can be simplified to a simple AR(1) model for the spread. Specifically, the parameter D
in the observation equation of the state-space model is estimated to be at most 0.001. This
suggests that the state-space model can be reduced to the state equation, which is just a simple
AR(1) model for the spread. This AR(1) model has already been tested extensively in the context
of pairs trading in Yakop (2011) and will therefore not be analysed further in this thesis.
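Because D ≈ 0 means that the observed spread y_k is essentially the hidden state x_k, the state equation can be estimated directly on the observed spread by ordinary least squares, without the Kalman filter or the EM algorithm. The sketch below assumes the state equation is written as x_{k+1} = A + B x_k + C ε_{k+1} with ε ~ N(0, 1) and 0 < B < 1, consistent with the parameters A, B and C reported in Table 5.1. The simulated parameter values and the function names are hypothetical, chosen only to show that OLS recovers them.

```python
import numpy as np


def simulate_state(A, B, C, n, x0=0.0, seed=0):
    """Simulate the (assumed) state equation x_{k+1} = A + B x_k + C eps_{k+1}."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = x0
    for k in range(n - 1):
        x[k + 1] = A + B * x[k] + C * rng.standard_normal()
    return x


def fit_ar1(spread):
    """OLS fit of the AR(1) spread model.

    Returns (A, B, C, long-run mean, half-life); the half-life assumes 0 < B < 1.
    """
    y = np.asarray(spread, dtype=float)
    X = np.column_stack([np.ones(y.size - 1), y[:-1]])   # regressors: constant, lagged spread
    (A, B), *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    resid = y[1:] - X @ np.array([A, B])
    C = resid.std(ddof=2)                  # two estimated coefficients
    mean = A / (1.0 - B)                   # equilibrium level of the spread
    half_life = np.log(0.5) / np.log(B)    # periods for a deviation to halve
    return A, B, C, mean, half_life


# Hypothetical parameters, not estimates from the thesis.
spread = simulate_state(A=0.1, B=0.9, C=0.05, n=20000, x0=1.0)
A_hat, B_hat, C_hat, mean_hat, half_life = fit_ar1(spread)
```

The long-run mean A/(1 - B) is the level the spread reverts to, which is the quantity a trading rule compares the current spread against.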
So, despite its favourable theoretical properties, the stochastic spread model for pairs trading
suggested by Elliot et al. (2005) does not turn out to be a good approach for pairs trading in
practice.
Parameters of selected Pairs
Values
Number of possible pairs
276
Average number of selected pairs

A
5
0.0062
B
C
0.9845
0.0007
0.2660
Table 5.1: Estimation results of Stochastic Spread Approach
Figure 5.1: Example of a Pair selected with the Stochastic Spread Approach
(a) Fitted values of spread in FP, with B = 0.9827
(b) Actual spread in the trading period
5.2.2 Cointegration Approach
Contrary to the stochastic spread approach, the results of the cointegration approach are useful
for evaluating a pairs trading strategy. To begin the evaluation of the cointegration approach, an
overview of the dataset and the parameters used in the analyses is given in Table 5.2. As Table
5.2 shows, results are estimated for three different formation period lengths (128, 256 and 512
days, respectively) and the adjacent trading periods. For each of these lengths, all possible pairs
(in this case 276 pairs) are tested with the Johansen cointegration trace test described in Section
2.3, at a significance level of 0.05. The average number of pairs found by this test for the
different formation periods is also stated in Table 5.2.
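A pair-selection step of this kind can be sketched in NumPy. This is an illustrative implementation of the Johansen trace statistic, assuming a VECM with an unrestricted constant and one lagged difference (`k_ar_diff=1`); the deterministic specification used in Section 2.3 may differ. The critical value 15.49 is the approximate tabulated 5% value for two series under this specification and should be checked against the table actually used. `johansen_trace`, `cointegrated_pairs` and `TRACE_CRIT_5PCT_TWO_SERIES` are hypothetical names; in practice a library routine such as statsmodels' `coint_johansen` would usually be preferred.

```python
import itertools

import numpy as np

# Assumed approximate 5% critical value of the trace statistic for H0: rank = 0,
# two series, unrestricted constant (hypothetical constant; verify against Section 2.3).
TRACE_CRIT_5PCT_TWO_SERIES = 15.49


def johansen_trace(levels, k_ar_diff=1):
    """Johansen trace statistics for a (T x k) array of (log) price levels.

    Model: dX_t = Pi X_{t-1} + sum_i Gamma_i dX_{t-i} + mu + e_t.
    Returns trace[r] for H0: rank <= r, r = 0..k-1.
    """
    x = np.asarray(levels, dtype=float)
    dx = np.diff(x, axis=0)
    p = k_ar_diff
    z0 = dx[p:]                                        # dX_t
    z1 = x[p:-1]                                       # X_{t-1}
    t_obs = z0.shape[0]
    lagged = [dx[p - i: dx.shape[0] - i] for i in range(1, p + 1)]
    z2 = np.column_stack([np.ones(t_obs)] + lagged)    # constant + lagged differences

    def residuals(z):
        beta, *_ = np.linalg.lstsq(z2, z, rcond=None)
        return z - z2 @ beta

    r0, r1 = residuals(z0), residuals(z1)
    s00 = r0.T @ r0 / t_obs
    s11 = r1.T @ r1 / t_obs
    s01 = r0.T @ r1 / t_obs
    # Eigenvalues of S11^{-1} S10 S00^{-1} S01, computed in symmetric form.
    l_inv = np.linalg.inv(np.linalg.cholesky(s11))
    m = l_inv @ s01.T @ np.linalg.solve(s00, s01) @ l_inv.T
    eig = np.sort(np.linalg.eigvalsh(m))[::-1]
    eig = np.clip(eig, 0.0, 1.0 - 1e-12)
    return np.array([-t_obs * np.log(1.0 - eig[r:]).sum() for r in range(eig.size)])


def cointegrated_pairs(log_prices, names, crit=TRACE_CRIT_5PCT_TWO_SERIES):
    """Test every pair of columns and keep those that reject H0: rank = 0."""
    selected = []
    for i, j in itertools.combinations(range(len(names)), 2):
        trace = johansen_trace(log_prices[:, [i, j]])
        if trace[0] > crit:
            selected.append((names[i], names[j]))
    return selected
```

With 24 columns the loop runs over the 276 combinations mentioned above; only the first trace statistic (H0: no cointegration) is used to select a pair.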
Parameter | Description | Values
 | Number of trading days | 1316
 | Number of stocks | 23
RW | Rolling window | 40
FP | Formation period | 128 days | 256 days | 512 days
TP | Trading period | 64 days | 128 days | 256 days
NT | Number of trading periods | 28 | 23 | 13
NP | Average number of pairs | 19 | 28 | 35
Table 5.2: Parameters for trading strategy

The graphs of Figure 5.2 show the behaviour of two different pairs during the formation and
trading periods. As can be seen from the graphs, both pairs of stocks show periods of divergence
and convergence during the formation period. This mean-reversion behaviour is the key to a
profitable pairs trading strategy and is present in all pairs selected in the formation period.
Unfortunately, some of the pairs selected during the formation period do not exhibit the same
behaviour during the trading period (see graph d). As a result, losses are made on these pairs.
For the pairs trading strategy to be a success, the pairs that do show mean-reversion behaviour
have to make up for the likely losses incurred on these bad pairs.
Figure 5.2: Example of Pairs (spread against days)
(a) Spread FP: Aegon, Heineken
(b) Spread TP: Aegon, Heineken
(c) Spread FP: PostNL, Unibail-Rodamco
(d) Spread TP: PostNL, Unibail-Rodamco
Now I will present the main results of the cointegration approach. Table 5.3 contains the
Sharpe ratios of the cointegration approach using the mean-variance optimization trading
strategy. The Sharpe ratios are calculated from daily returns and therefore look small. Daily
SRs are commonly converted to annual SRs by multiplying them by $\sqrt{250}$. This is known as
time aggregation in finance. However, Lo (2002) shows that, statistically speaking, this rule is
incorrect because of the serial correlation in financial returns, which can result in extreme
overestimation of the SRs. Therefore, only the estimated daily SRs are reported in this thesis.
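Lo's (2002) point can be made concrete. For stationary returns with autocorrelations ρ_k, Lo shows that the q-period Sharpe ratio equals η(q) times the one-period ratio, with η(q) = q / sqrt(q + 2 Σ_{k=1}^{q-1} (q - k) ρ_k), which reduces to √q only when all ρ_k are zero. The sketch below assumes q = 250 trading days and treats autocorrelations beyond those supplied as zero; the function name is hypothetical.

```python
import numpy as np


def lo_annualization_factor(autocorrelations, q=250):
    """Lo (2002) factor eta(q) such that SR_q = eta(q) * SR_daily.

    eta(q) = q / sqrt(q + 2 * sum_{k=1}^{q-1} (q - k) * rho_k).
    With all rho_k = 0 this reduces to sqrt(q), the usual time-aggregation rule.
    """
    rho = np.zeros(q - 1)
    given = np.asarray(autocorrelations, dtype=float)[: q - 1]
    rho[: given.size] = given                 # autocorrelations not supplied are set to 0
    k = np.arange(1, q)
    return q / np.sqrt(q + 2.0 * np.sum((q - k) * rho))
```

Positive autocorrelation makes η(250) smaller than √250, so the simple √250 rule overstates the annual Sharpe ratio; negative autocorrelation has the opposite effect.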
Furthermore, note that the daily returns are calculated without transaction costs. Including
transaction costs in the investigation would require some creativity, since the difference between
the bid and ask price of a stock is not reported (only the daily closing prices are). The fee for
making a transaction is also not commonly known. Therefore, the inclusion of transaction costs
in pairs trading justifies a separate study and is not dealt with further in this thesis.
As can be seen from the average Sharpe ratios in Table 5.3, the mean-variance optimization
suffers large losses for all formation period lengths. This is a remarkable result, since this
strategy is supposed to maximize the value of the portfolio. However, one critical assumption of
this strategy is that the selected pairs are mean reverting. If this assumption is not met and a
pair drifts away, the number of positions taken in the spread increases dramatically and huge
losses are incurred. The results show that too many pairs behave this way. Therefore, the
average Sharpe ratios for all formation period lengths are negative.
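The mechanism behind these losses can be illustrated with a deliberately simplified rule. The exact MV rule of Section 4.3 is not reproduced here; instead the sketch assumes a generic single-asset mean-variance position n_t = (μ - s_t)/(γσ²), with a hypothetical risk-aversion parameter γ, so that the position grows linearly with the deviation of the spread s_t from its mean μ. For a spread that drifts away linearly, the resulting loss grows roughly with the square of the drift, whereas a constant one-unit position loses only linearly. All names and numbers are hypothetical.

```python
import numpy as np


def mv_positions(spread, mu, sigma, gamma=1.0):
    """Hypothetical mean-variance position: units of spread held, (mu - s_t) / (gamma * sigma^2)."""
    return (mu - np.asarray(spread, dtype=float)) / (gamma * sigma**2)


def cumulative_pnl(positions, spread):
    """P&L of holding positions[t] units of the spread from t to t + 1 (no costs)."""
    return float(np.sum(positions[:-1] * np.diff(spread)))


# A pair that stops mean-reverting: the spread drifts away linearly.
spread = 0.1 * np.arange(100)
mv = mv_positions(spread, mu=0.0, sigma=1.0)
unit = -np.ones_like(spread)          # constant one-unit short position, for scale

print(cumulative_pnl(mv, spread))     # about -48.51: loss grows with the square of the drift
print(cumulative_pnl(unit, spread))   # about -9.9: loss grows linearly
```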
Benchmark
SR(AEX)
Descriptive statistics SRs

FP
Average
Max
Min
Count
Std. Dev.
pos. SR
SR
>
Significant
SR(AEX)
at 5%
0.0073
128
-0.0447
0.1461
-0.1523
0.0583
13
0.0046
256
-0.0444
0.0585
-0.1079
0.0436
0.0439
512
-0.0376
0.0041
-0.0665
0.0227
Table 5.3: SRs for different FP with MV optimized positions

On the other hand, the histogram in Figure 5.3 (which shows the distribution of the SR over
the different TPs) shows that if enough of the pairs selected in a formation period do mean-revert,
the SR of the corresponding trading period can be high (an SR of about 0.15). Unfortunately, this
does not happen often enough, and the overall results of this strategy are disappointing.
Figure 5.3: Histogram of the estimated SRs of the MV strategy for a formation period length of
128 days (x-axis: SRs; y-axis: frequency)
To compare the mean-variance strategy with a less risky strategy, I also calculated the Sharpe
ratios using the common two standard deviation (2STD) strategy for opening a position. This
strategy is less risky than the mean-variance optimization, because it only opens one position
at a time. The results of this strategy are stated in Table 5.4. The 2STD strategy yields positive
average Sharpe ratios for all three formation period lengths, with the highest average for the
formation period of 128 days. In contrast to the mean-variance strategy, pairs that do not
converge and drift away from the equilibrium only incur a loss of two times the standard
deviation. These losses are more than offset by the pairs that do behave as expected, which
results in positive average Sharpe ratios for all the different trading period lengths.
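The 2STD trigger can be sketched as a small state machine: open one unit when the spread is more than two standard deviations from its mean, and close the position when the spread returns to the mean. The sketch assumes that the mean μ and standard deviation σ are estimated in the formation period and that a position closed on a given day can only be reopened on a later day. The function name is hypothetical, and any stop-loss or end-of-period closing rule used in the thesis is not modelled.

```python
import numpy as np


def two_std_positions(spread, mu, sigma, k=2.0):
    """One-unit positions in the spread under a k-standard-deviation trigger.

    Open short (-1) when the spread is above mu + k*sigma, long (+1) when it is
    below mu - k*sigma, and close (0) once the spread crosses back to mu.
    mu and sigma are assumed to come from the formation period.
    """
    positions = np.zeros(len(spread))
    pos = 0
    for t, s in enumerate(spread):
        if pos == 0:
            if s > mu + k * sigma:
                pos = -1                      # spread too high: sell the spread
            elif s < mu - k * sigma:
                pos = 1                       # spread too low: buy the spread
        elif (pos == -1 and s <= mu) or (pos == 1 and s >= mu):
            pos = 0                           # back at equilibrium: close
        positions[t] = pos
    return positions


spread = [0.0, 1.0, 2.5, 1.5, 0.5, -0.2, -2.5, -1.0, 0.3, 0.0]
print(two_std_positions(spread, mu=0.0, sigma=1.0))
```

Because the position is never larger than one unit, a pair that drifts away costs at most the change in the spread on that single unit, instead of a loss that grows with the number of units held.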
Benchmark
SR(AEX)

FP
Average
Max
Min
Count
Std. Dev.
pos. SR
SR
>
Significant
SR(AEX)
at 5%
0.0073
128
0.0209
0.0667
-0.0227
0.0233
25
15
11
0.0046
256
0.0147
0.0469
-0.0167
0.0221
19
12
0.0439
512
0.0081
0.0204
-0.0045
0.0092
10
Table 5.4: SR results for different FP with the 2STD trigger for opening a position
5.3 Results using DAX index
To see whether the results of the cointegration approach are robust, a second estimation of the
cointegration approach is done for both trading strategies. The second dataset consists of the
daily closing prices over the last five years of the stocks in the DAX index (which includes the
thirty largest listed German companies). The results of both trading strategies are given in
Table 5.5 below.
As can be seen in Table 5.5, the MV strategy performs even worse on this dataset than it did
on the AEX dataset. The average daily SRs of the MV strategy for the different formation periods
are all negative, and only in one TP (FP: 128 days) does the MV strategy significantly outperform
the DAX index. The 2STD strategy (again) performs better than the MV strategy and generates
small positive average SRs for all formation period lengths. The results of both pairs trading
strategies are much alike across the two datasets. Therefore, it can be concluded that the results
obtained are robust.
Benchmark
Strategy
MV
2STD
SR(DAX)

FP
Average
Max
Min
Count
Std. Dev.
pos. SR
SR
>
Significant
SR(AEX)
at 5%
0.0551
128
-0.0843
-0.0119
-0.1829
0.0486
0.0596
256
-0.0636
0.0165
-0.1056
0.0426
0.0316
512
-0.0429
-0.0225
-0.0536
0.0105
0.0551
128
0.0210
0.0689
-0.0181
0.0215
17
0.0596
256
0.0135
0.0429
-0.0069
0.0159
14
0.0316
512
0.0069
0.0148
-0.0045
0.0030
Table 5.5: SR Results of MV and 2STD for DAX index
Chapter 6
Conclusion
In this thesis, two different model-based approaches for pairs trading were discussed and tested
using two different trading strategies. Results were generated for the daily closing prices of the
stocks in the AEX index over the last five years. Furthermore, an out-of-sample estimation was
carried out to verify whether the results were robust.
The first approach for modelling the behaviour of a pair, the stochastic spread, was first
suggested (but not yet tested) by Elliot et al. (2005). From a theoretical point of view, the
stochastic spread has three major advantages: the model captures mean reversion, is continuous
in time and is completely tractable. Despite these theoretical advantages, the empirical results
turn out to be less favourable in practice. First of all, the stochastic spread approach found very
few pairs suitable for trading. Secondly, the estimated parameters of the state-space form of
the model suggested that the model could be simplified to only the state equation (which is just
an AR(1) model). This renders the estimation of the parameters with the EM algorithm and
Kalman filter unnecessary, since the AR(1) model is embedded in the other approach discussed
in this thesis. Therefore, this thesis only reports a few parameter estimates and graphs of the
spread for this approach, and no actual pairs trading results.
The second approach for modelling the behaviour of a pair is the cointegration approach.
The idea of cointegration was already used for pairs trading in earlier work (Yakop, 2011;
Vidyamurhty, 2004). The approach in this earlier work, however, is more ad hoc and not
based on the error correction model (ECM), which is normally used in econometric research. In
this thesis, the cointegration approach is based on the ECM, and the pairs are tested with the
Johansen cointegration test.
Subsequently, two trading strategies for taking a position in the spread were used to calculate
the results. The first one is the two standard deviations strategy (2STD). This strategy is
commonly used in the literature (Yakop, 2011; Vidyamurhty, 2004; Gatev et al., 2006). The
concept of this strategy is very simple: one takes a position in the spread if it is far enough
(two standard deviations) away from the mean and closes the position when the spread returns
to the equilibrium value. The second strategy is called the mean-variance approach (MV). As
the name suggests, the number of positions taken in the spread is determined by a trade-off
between the deviation of the spread from its mean and the variance of the spread. The spread
is expected to revert to the mean, and the MV strategy uses this assumption to maximize
the portfolio value by varying the number of positions taken in the spread.
The results of both strategies are given in Table 5.4 (2STD) and Table 5.3 (MV). The 2STD
strategy generated small positive returns for all the different formation periods. This result is
typical for a pairs trading strategy and is thus what you would expect. In contrast, the MV
strategy generates large negative SRs in all the formation periods. This is not what you would
expect, because this strategy aims to maximize the portfolio value by varying the number of
positions in the spread and should, consequently, perform well. However, one crucial assumption
for the success of this strategy, namely mean reversion, is not met by a large number of pairs.
For these pairs, the number of positions increases drastically and the losses are substantial. This
leads me to the conclusion that the MV strategy might be too risky (in this case, at least) for
pairs trading. The estimation on the second dataset (DAX index) confirms this, because similar
results were generated. Given that two indices produced similar results, one can conclude that
these results are robust.
Further research in pairs trading should focus on other ways to optimize the trading strategy,
since the MV procedure did not generate the desired results. Furthermore, the inclusion of
transaction costs in pairs trading is a relevant topic that should be taken into account, but
has not been investigated in this thesis. One could also investigate pairs trading with more than
two securities, such as triple or quadruple trading. The cointegration approach discussed in this
thesis could be a good way to investigate this topic, since the existence of a cointegration
relation between three or four stocks can easily be tested within this framework.
Bibliography
Baronyan, S. R., Boduroglu, I. I., and Sener, E. (2010). Investigation of Stochastic Pairs Trading
Strategies under Different Volatility Regimes. The Manchester School, pages 114-134.
Broda, S. (2011). Financial econometrics slides.
Do, B., Faff, R., and Hamza, K. (2006). A New Approach to Modeling and Estimation for Pairs
Trading. Working Paper, pages 1-30.
Elliot, M. J., van der Hoek, J., and Malcolm, W. (2005). Pairs Trading. Quantitative Finance,
5(3):271-276.
Engle, R. F. and Granger, C. W. (1987). Co-integration and Error Correction: Representation,
Estimation and Testing. Econometrica, 55(2):251-276.
Gatev, E., Goetzmann, W. N., and Rouwenhorst, K. G. (2006). Pairs Trading: Performance of
a Relative-Value Arbitrage Rule. Review of Financial Studies, 19(3):797-827.
Hamilton, J. D. (1994). Time Series Analysis. Princeton University Press.
Johansen, S. (1991). Estimation and Hypothesis Testing of Cointegration Vectors in Gaussian
Vector Autoregressive Models. Econometrica, 59(6):1551-1580.
Kalman, R. (1960). A New Approach to Linear Filtering and Prediction Problems. Journal of
Basic Engineering, 82:35-45.
Lo, A. (2002). The Statistics of Sharpe Ratios. Financial Analysts Journal, July/August:36-52.
Markowitz, H. (1952). Portfolio Selection. Journal of Finance, 7(1):77-91.
Sharpe, W. (1966). Mutual Fund Performance. The Journal of Business, 39(1):119-138.
Shumway, R. and Stoffer, D. (1982). An Approach to Time Series Smoothing and Forecasting
using the EM Algorithm. Journal of Time Series Analysis, 3:253-264.
Tsay, R. S. (2010). Analysis of Financial Time Series. John Wiley and Sons, Inc., third edition.
Vidyamurhty, G. (2004). Pairs Trading, Quantitative Methods and Analysis. John Wiley and
Sons, Inc.
Yakop, M. (2011). A Comparative Analysis of Pairs Trading. Master's thesis, University of
Amsterdam.
