Master Thesis
Pairs Trading:
An implementation of the Stochastic Spread and
Cointegration Approach
Supervisors:
Author:
Nick Huurman
5631335
Contents
1 Introduction
2 Cointegration approach
2.1
2.2
2.3
3.1
3.2 Kalman Filter
3.3 The EM Algorithm
4 Trading design
4.1 Trading period
4.2 Pairs selection
4.2.1 Cointegration approach
4.2.2
4.3 Mean-Variance optimization
5 Evaluation
5.1 Sharpe ratio
5.2 Results
5.2.1
5.2.2 Cointegration Approach
5.3
6 Conclusion
Chapter 1
Introduction
History shows that a market-neutral trading strategy can be a good way to invest money. Typically, such a strategy performs steadily regardless of whether the market goes up or down, and its returns come with low volatility (Vidyamurhty, 2004). These favourable characteristics are achieved by trading a market-neutral portfolio. Such a portfolio can be constructed by taking long and short positions of equal size in two assets that have the same beta, which gives a portfolio with zero beta. This is also referred to as a spread portfolio.
This thesis evaluates one particular market-neutral trading strategy that has already been used on Wall Street for 25 years (and has proved its value there), namely pairs trading. Recent studies report that pairs trading performs exceptionally well in turbulent markets, where mispricing of stocks is more common (Gatev et al., 2006; Do et al., 2006; Baronyan et al., 2010). Baronyan et al. (2010) even reported a 40 per cent net annual profit in 2008, the first year of the financial crisis. This result shows that pairs trading, despite its 25-year existence, is still profitable and therefore very relevant to investigate, especially given the recent turbulence in the stock market.
The concept of pairs trading is relatively simple and can be summarized as follows. To begin, an investor has to find two securities whose prices have historically moved together and are therefore in a relative equilibrium. When the price difference between the two securities widens, the securities are out of their relative equilibrium. The trader then takes a long position in the cheap security and a short position in the expensive security. Based on the past price dynamics, the investor expects the prices to converge back to their relative equilibrium. If they do, the long and short positions are unwound and a profit is made.
The main difficulties in constructing a profitable pairs trading strategy evidently lie in two choices: the method for selecting a suitable pair of securities, and how and when to take a position in the selected pair. A recent thesis by Yakop (2011) investigates a broad range of selection and trading methods that are appropriate for pairs trading. He concludes that the model-based approaches perform best. Therefore, this thesis investigates and analyses two different model-based approaches for pairs trading. The first approach is the cointegration approach, which is based on the error correction model. The second method is the stochastic spread approach, as introduced in Elliot et al. (2005).
The results of the pairs selected by both methods are calculated using two different trading strategies. The first is a dynamic model for the number of positions taken in the spread, based on the mean-variance optimization procedure discussed in the paper of Markowitz (1952). The second is the two-standard-deviation approach, which is commonly used in earlier literature (Yakop (2011), Gatev et al. (2006), Vidyamurhty (2004)). The main objective of this thesis is to compare the performance of the cointegration approach with that of the stochastic spread approach when both are implemented with the two aforementioned trading strategies.
The thesis is organized as follows. Chapters 2 and 3 outline the two different approaches used for modelling the behaviour of a pair. Chapter 4 describes the different trading strategies. Chapter 5 evaluates the results of both models with the different trading strategies. The last chapter contains the conclusions of this thesis.
Chapter 2
Cointegration approach
The notion of cointegrated time series was first introduced by Engle and Granger (1987). The work on cointegration was recognized with the 2003 Nobel Prize in economics, which Granger shared with Engle. Cointegrated time series possess characteristics that are very useful for pairs trading, such as a long-term equilibrium with the associated property of mean reversion. The first section of this chapter gives the definitions of integration, cointegration and the error correction model (ECM) for a time series. The second section gives the theoretical framework for pairs trading. The third section discusses the cointegration test proposed by Johansen (1991).
2.1
To begin the theory of cointegration, the definitions of weak stationarity and of an integrated time series are given first:
Definition. A time-ordered sequence of random variables, i.e., a time series or process $\{x_t\}$, is weakly stationary or second-order stationary if the first two moments of the distribution of $\{x_t\}$ are constant and independent of time. That is, its mean and variance are constant, and its autocovariances depend only on the lag.
Definition. A time series which has a stationary, invertible ARMA representation after differencing $d$ times is said to be integrated of order $d$, denoted $\{x_t\} \sim I(d)$.
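The two definitions above become tangible through the example of a simple VAR model. Consider a $k$-dimensional VAR($p$) time series $\{x_t\}$ with a possible time trend, so that the model is
$$x_t = \mu_t + \Phi_1 x_{t-1} + \dots + \Phi_p x_{t-p} + a_t \quad \text{or} \quad \Phi(B)x_t = \mu_t + a_t,$$
with
$$\Phi(B) = I - \Phi_1 B - \dots - \Phi_p B^p,$$
where $B$ denotes the backshift operator.
[…] so that $z_t = \alpha' x_t \sim I(d-b)$.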
Consider the case where $d = b = 1$. Cointegration then means that the equilibrium error $z_t$ is I(0), so $z_t$ will rarely drift far from its mean and will often cross it (Engle and Granger, 1987). A convenient way of representing the vector $\{x_t\}$ as a stationary series is the error correction model (ECM) representation, which avoids the problem of overdifferencing (Tsay (2010), p. 431). The definition of the ECM is given next (Engle and Granger, 1987):
Definition. A vector time series $\{x_t\}$ has an error correction representation if it can be expressed as:
$$A(B)(1-B)x_t = -\gamma z_{t-1} + u_t,$$
where $u_t$ is a stationary multivariate disturbance, $A(0) = I$, $A(1)$ has all elements finite, and $\gamma \neq 0$.
In this representation of the ECM, only the disequilibrium in the last period is an explanatory variable. However, by rearranging terms, any set of lags can be written in this form. Therefore, this representation of the ECM permits any type of gradual adjustment towards a new equilibrium (Engle and Granger, 1987).
2.2
Define the observed price of stock $i$ at time $t$ as $P_{it}$, and let $p_{it} = \ln(P_{it})$ be the corresponding log price. A common assumption about $\{p_{it}\}$ in the literature (Tsay, 2010; Vidyamurhty, 2004) is that the time series $\{p_{it}\}$ has a unit root and follows a random walk:
$$p_{it} = p_{i,t-1} + r_{it},$$
where $\{r_{it}\}$ is the return (this unit-root assumption for $\{p_{it}\}$ will be confirmed […]).
Based on the arbitrage pricing theory (APT), if two stocks have similar risk factors, they should have similar returns. If this is the case, $\{p_{1t}\}$ and $\{p_{2t}\}$ are likely to be driven by a common component and are therefore cointegrated (Tsay, 2010). In formula, there exists a linear combination
$$w_t = \beta' p_t = p_{1t} - \gamma p_{2t},$$
with $p_t = (p_{1t}, p_{2t})'$ and $\beta = (1, -\gamma)'$, that is stationary.
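These two price series $\{p_{1t}\}$ and $\{p_{2t}\}$ can also be written in ECM form:
$$\begin{pmatrix} p_{1t} - p_{1,t-1} \\ p_{2t} - p_{2,t-1} \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} (w_{t-1} - \mu_w) + \begin{pmatrix} a_{1t} \\ a_{2t} \end{pmatrix}, \qquad (2.1)$$
where $\mu_w = E[w_t]$ denotes the mean of $\{w_t\}$. The series $\{w_t\}$ itself is referred to as the spread between the two log prices.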
The left-hand side of the ECM form represents the log returns of both price series. Furthermore, the equation states that the returns depend on the stationary series $w_{t-1} - \mu_w$ and are therefore also stationary. Specifically, $w_t - \mu_w$ measures the deviation from the long-term equilibrium between the two stocks. So the returns of the stocks (left-hand side of 2.1) depend on past deviations from the equilibrium. The coefficients $\alpha_1$ and $\alpha_2$ respectively show the effect of these past deviations on the returns $\{r_{1t}\}$ and $\{r_{2t}\}$. In practice, $\alpha_1$ and $\alpha_2$ will have opposite signs, which reflects the mean-reverting behaviour of the stationary series.
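To make the roles of $\gamma$, $\alpha_1$ and $\alpha_2$ in (2.1) concrete, the following is a minimal NumPy sketch that is not part of the original analysis. It simulates a pair of log prices from the ECM with an assumed, known hedge ratio $\gamma$. It then recovers the loadings by ordinary least squares of the returns on the lagged deviation $w_{t-1} - \bar w$. The function names and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

# Illustrative sketch: helper names and parameter values are assumptions.


def simulate_ecm_pair(T, gamma, alpha1, alpha2, mu_w=0.0, sigma=0.01, seed=0):
    """Simulate log prices (p_1t, p_2t) from the two-stock ECM (2.1)."""
    rng = np.random.default_rng(seed)
    loadings = np.array([alpha1, alpha2])
    p = np.zeros((T, 2))
    for t in range(1, T):
        w_prev = p[t - 1, 0] - gamma * p[t - 1, 1]    # spread w_{t-1}
        shock = rng.normal(scale=sigma, size=2)        # (a_1t, a_2t)
        p[t] = p[t - 1] + loadings * (w_prev - mu_w) + shock
    return p


def estimate_loadings(p, gamma):
    """OLS estimates of (alpha_1, alpha_2) for a known hedge ratio gamma."""
    w = p[:, 0] - gamma * p[:, 1]
    dev = w[:-1] - w[:-1].mean()                       # w_{t-1} minus its sample mean
    r = np.diff(p, axis=0)                             # log returns r_1t, r_2t
    X = np.column_stack([np.ones_like(dev), dev])
    coef, *_ = np.linalg.lstsq(X, r, rcond=None)       # one regression per column of r
    return coef[1]                                     # slope row = (alpha_1, alpha_2)


p = simulate_ecm_pair(T=5000, gamma=1.0, alpha1=-0.05, alpha2=0.05)
alpha_hat = estimate_loadings(p, gamma=1.0)
print(alpha_hat)  # should lie near (-0.05, 0.05)
```

With $\gamma = 1$, the spread follows $\Delta w_t = (\alpha_1 - \gamma\alpha_2)(w_{t-1} - \mu_w) + \text{noise}$. The choice $\alpha_1 - \gamma\alpha_2 = -0.1$ therefore makes it mean reverting.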
2.3
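For testing purposes, the ECM representation for a $k$-dimensional VAR($p$) time series $\{x_t\}$ becomes:
$$\Delta x_t = d_t + \Pi x_{t-1} + \Phi_1^{*}\Delta x_{t-1} + \dots + \Phi_{p-1}^{*}\Delta x_{t-p+1} + a_t,$$
where the deterministic regressor $\{d_t\}$ (constant/trend) is added and $t = p+1, \dots, T$. Furthermore,
$$\Phi_j^{*} = -\sum_{i=j+1}^{p}\Phi_i, \qquad j = 1, \dots, p-1,$$
and
$$\Pi = \Phi_1 + \dots + \Phi_p - I = -\Phi(1).$$
The term $\Pi x_{t-1}$ is referred to as the error correction term, which plays a key role in the cointegration study. Depending on the rank of $\Pi$, three cases can be distinguished:
1. $\mathrm{Rank}(\Pi) = 0$. Then $\Pi = 0$ and $\{x_t\}$ is not cointegrated.
2. $\mathrm{Rank}(\Pi) = k$. Then $\{x_t\}$ contains no unit roots, i.e., $\{x_t\}$ is I(0).
3. $0 < \mathrm{Rank}(\Pi) = m < k$. Hence, $\{x_t\}$ has $m$ linearly independent cointegration vectors and $k - m$ unit roots, which give $k - m$ common stochastic trends.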
As the three cases above show, the rank of the matrix $\Pi$ is sufficient for knowing whether the time series $\{x_t\}$ is cointegrated. Therefore, a likelihood ratio (LR) test for determining the rank of $\Pi$ is described next; it is called the Johansen cointegration test. The hypotheses of this (trace) test are $H_0: \mathrm{Rank}(\Pi) = m$ versus $H_a: \mathrm{Rank}(\Pi) > m$. The value of $m$ starts at zero and is increased by one each time the null hypothesis is rejected. If the null hypothesis is rejected for every $m < k$, $\{x_t\}$ has the properties of the second case specified above.
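The trace statistic is
$$LK_{tr}(m) = -(T-p)\sum_{i=m+1}^{k}\ln(1-\hat\lambda_i),$$
where $\hat\lambda_1 \ge \dots \ge \hat\lambda_k$ are the squared canonical correlations between $\hat u_t$ and $\hat v_t$. These are the residuals obtained by regressing $\Delta x_t$ and $x_{t-1}$, respectively, on $d_t$ and the lagged differences $\Delta x_{t-1}, \dots, \Delta x_{t-p+1}$. The $\hat\lambda_i$ should be small for $i > m$.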
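The trace statistic can be computed directly from the definition above. The sketch below is a minimal NumPy implementation, not the routine used in this thesis. It assumes that $d_t$ is a constant only; other deterministic specifications change both the regressions and the critical values. `johansen_trace` is a hypothetical helper name. For $k = 2$ and this specification, the 5% critical value for $m = 0$ is roughly 15.5 in standard tables. That threshold is used in the usage lines as an assumption and should be checked against the table actually applied.

```python
import numpy as np

# Illustrative sketch: constant-only d_t, hypothetical helper name.


def johansen_trace(x, p=1):
    """Eigenvalues lambda_i (descending) and trace statistics LK_tr(m), m = 0..k-1.

    x: (T, k) array of levels; p: VAR order (p - 1 lagged differences).
    """
    x = np.asarray(x, dtype=float)
    T, k = x.shape
    dx = np.diff(x, axis=0)                  # dx[i] = x_{i+1} - x_i
    y0 = dx[p - 1:]                          # Delta x_t
    y1 = x[p - 1:-1]                         # x_{t-1}
    n = y0.shape[0]                          # effective sample size T - p
    Z = [np.ones((n, 1))]                    # constant d_t
    for j in range(1, p):                    # lagged differences Delta x_{t-j}
        Z.append(dx[p - 1 - j:len(dx) - j])
    Z = np.hstack(Z)

    def resid(Y):
        coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
        return Y - Z @ coef

    u, v = resid(y0), resid(y1)              # residuals u_t and v_t
    S00, S11, S01 = u.T @ u / n, v.T @ v / n, u.T @ v / n
    # squared canonical correlations = eigenvalues of S11^-1 S10 S00^-1 S01
    M = np.linalg.solve(S11, S01.T) @ np.linalg.solve(S00, S01)
    lam = np.sort(np.linalg.eigvals(M).real)[::-1]
    trace = np.array([-n * np.sum(np.log(1.0 - lam[m:])) for m in range(k)])
    return lam, trace


# Usage on a simulated cointegrated pair (illustrative values)
rng = np.random.default_rng(1)
T = 1000
common = np.cumsum(rng.normal(size=T))       # shared random-walk component
spread = np.zeros(T)
for t in range(1, T):
    spread[t] = 0.5 * spread[t - 1] + rng.normal()
x = np.column_stack([0.8 * common + spread, common])
lam, trace = johansen_trace(x, p=2)
reject_rank0 = trace[0] > 15.5               # assumed approximate 5% critical value, k = 2
```

The sequential procedure described above would compare `trace[0]`, `trace[1]`, ... with their critical values in turn and stop at the first $m$ that is not rejected.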
Chapter 3
3.1
At any given time, a pairs trading portfolio is associated with a quantity called the spread, which is the difference between the quoted prices of the securities used. If the spread of the portfolio is significantly different from its mean, a position in both securities is taken with the expectation that the spread will revert to its mean (Vidyamurhty, 2004).
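To model the mean-reverting behaviour of the spread explicitly, a state process $\{x_k \mid k = 0, 1, 2, \dots\}$ is introduced. Here $x_k$ denotes the value of some variable at time $t_k = k\tau$ for $k = 0, 1, 2, \dots$. We assume that $\{x_k\}$ is mean reverting:
$$x_{k+1} - x_k = (a - b\,x_k)\,\tau + \sigma\sqrt{\tau}\,\varepsilon_{k+1}, \qquad (3.1)$$
where $a \ge 0$, $b > 0$ and $\{\varepsilon_k\}$ are i.i.d. $N(0,1)$. Equation (3.1) is a discretized version of the continuous-time mean-reverting process
$$dX(t) = (a - bX(t))\,dt + \sigma\,dW(t),$$
where $W(t)$ is a standard Brownian motion.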
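Hence, $x_k \sim N(\mu_k, \sigma_k^2)$, with
$$\mu_k = E[x_k] = a\tau + (1-b\tau)\mu_{k-1} = \dots = \frac{a}{b} - \frac{a}{b}(1-b\tau)^k + (1-b\tau)^k \mu_0,$$
and
$$\sigma_k^2 = \mathrm{Var}(x_k) = (1-b\tau)^2\sigma_{k-1}^2 + \sigma^2\tau = \dots = \sigma^2\tau\,\frac{1-(1-b\tau)^{2k}}{1-(1-b\tau)^2} + (1-b\tau)^{2k}\sigma_0^2.$$
From these two equations the long-term mean and variance can be derived. For $k \to \infty$ (provided $|1 - b\tau| < 1$):
$$\mu_k \to \frac{a}{b}, \qquad \sigma_k^2 \to \frac{\sigma^2\tau}{1-(1-b\tau)^2}.$$
Setting $A = a\tau$, $B = 1 - b\tau$ and $C = \sigma\sqrt{\tau}$, the state process can be written as
$$x_k = A + B x_{k-1} + C\varepsilon_k. \qquad (3.2)$$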
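The latent variable $\{x_k\}$ defined above is used in the measurement equation, which defines the observed spread $\{y_k\}$:
$$y_k = x_k + D\omega_k, \qquad (3.3)$$
where $\{\omega_k\}$ are i.i.d. $N(0,1)$, independent of $\{\varepsilon_k\}$.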
The model described above has three major advantages from a theoretical point of view. The first one is rather obvious: the model is mean reverting. This is exactly what is required of the spread between two stocks to implement a successful pairs trading strategy. The second advantage is that the model for the spread is continuous in time, which makes it convenient for forecasting purposes. Critical questions for pairs trading, such as the expected holding period of the portfolio and the expected return of the strategy, can therefore be answered. The third advantage is that the model is completely tractable: all the parameters can be estimated using the Kalman filter and a maximum likelihood procedure called the EM algorithm. The next two sections discuss the Kalman filter and the EM algorithm in detail.
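The discretization and the long-run moments of the stochastic spread model can be checked numerically. The following minimal NumPy sketch maps $(a, b, \sigma, \tau)$ to $(A, B, C)$ and simulates the state equation (3.2) and the measurement equation (3.3). It then compares the sample moments of $\{x_k\}$ with the limits $A/(1-B) = a/b$ and $C^2/(1-B^2) = \sigma^2\tau/(1-(1-b\tau)^2)$. The parameter values are purely illustrative and the helper names are hypothetical.

```python
import numpy as np

# Illustrative sketch: helper names and parameter values are assumptions.


def to_state_space(a, b, sigma, tau):
    """Map the parameters of (3.1) to A, B, C of the state equation (3.2)."""
    return a * tau, 1.0 - b * tau, sigma * np.sqrt(tau)


def long_run_moments(A, B, C):
    """Limits of mu_k and sigma_k^2 for k -> infinity (requires |B| < 1)."""
    return A / (1.0 - B), C**2 / (1.0 - B**2)


def simulate_spread(n, A, B, C, D, seed=0):
    """Simulate the hidden state x_k (3.2) and the observed spread y_k (3.3)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = A / (1.0 - B)                       # start at the long-run mean
    eps = rng.standard_normal(n)
    for k in range(1, n):
        x[k] = A + B * x[k - 1] + C * eps[k]   # state equation (3.2)
    y = x + D * rng.standard_normal(n)         # measurement equation (3.3)
    return x, y


A, B, C = to_state_space(a=1.0, b=1.0, sigma=0.2, tau=0.1)   # A = 0.1, B = 0.9
x, y = simulate_spread(200_000, A, B, C, D=0.05)
mean_lr, var_lr = long_run_moments(A, B, C)
print(x.mean(), mean_lr)   # sample mean vs. a/b
print(x.var(), var_lr)     # sample variance vs. long-run variance
```

The condition $0 < B < 1$, which is used later for pair selection, is exactly what makes these long-run moments finite and the simulated spread revert to $a/b$.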
3.2
Kalman Filter
To estimate the dynamical system of the stochastic spread model above, a very useful tool called the Kalman filter is introduced; it is named after R.E. Kalman (Kalman, 1960). The Kalman filter is an algorithm for calculating linear least squares forecasts of the state on the basis of the data observed through time $t$,
$$\hat x_{t+1|t} = E[x_{t+1} \mid \mathcal{Y}_t],$$
where $\mathcal{Y}_t = (y_t, y_{t-1}, \dots, y_1)$. The Kalman filter calculates these forecasts recursively, generating $\hat x_{1|0}, \hat x_{2|1}, \dots, \hat x_{t|t-1}$ in succession.
In this thesis, the Kalman filter is described as a four-step procedure. The description follows chapter 13 of Hamilton (1994) and the paper of Elliot et al. (2005). For convenience, the key features of a general state-space system are given first:
$$x_{t+1} = A + Bx_t + C\varepsilon_{t+1}, \qquad (3.4)$$
$$y_t = x_t + D\omega_t. \qquad (3.5)$$
For now it is assumed that the values of $A$, $B$, $C$ and $D$ are known. Later, these parameters are estimated with the EM algorithm of Shumway and Stoffer (1982).
To begin the Kalman filtering, the starting point of the recursion has to be set. Typically, the starting point of the recursion is set as $\hat x_{1|0} = E[x_1]$, which is just the unconditional mean of $x_1$. The associated mean squared error (MSE) of this starting point is therefore $P_{1|0} = E[(x_1 - \hat x_{1|0})^2]$.
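After the starting point has been defined, the next step is to calculate the forecasts for the following points in time:
$$\hat x_{k+1|k} = E[x_{k+1} \mid \mathcal{Y}_k] = A + B\,E[x_k \mid \mathcal{Y}_k] = A + B\hat x_{k|k}, \qquad (3.6)$$
$$P_{k+1|k} = E[(x_{k+1} - \hat x_{k+1|k})^2] = B^2 P_{k|k} + C^2. \qquad (3.7)$$
The forecast of the observation and its MSE follow from the measurement equation:
$$\hat y_{k+1|k} = E[y_{k+1} \mid \mathcal{Y}_k] = \hat x_{k+1|k}, \qquad (3.8)$$
$$E[(y_{k+1} - \hat y_{k+1|k})^2] = P_{k+1|k} + D^2. \qquad (3.9)$$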
Next, the inference about the current value of $x_k$ is updated on the basis of the observation $y_k$ to produce
$$\hat x_{k|k} = E[x_k \mid y_k, \mathcal{Y}_{k-1}] = E[x_k \mid \mathcal{Y}_k]. \qquad (3.10)$$
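Using the formula for updating a linear projection (Hamilton, 1994, p. 379) results in:
$$\hat x_{k|k} = \hat x_{k|k-1} + E[(x_k - \hat x_{k|k-1})(y_k - \hat y_{k|k-1})]\,\big(E[(y_k - \hat y_{k|k-1})^2]\big)^{-1}(y_k - \hat y_{k|k-1}). \qquad (3.11)$$
Because $y_k - \hat y_{k|k-1} = (x_k - \hat x_{k|k-1}) + D\omega_k$, the covariance in (3.11) equals $P_{k|k-1}$. Shifted one period ahead, this gives
$$\hat x_{k+1|k+1} = \hat x_{k+1|k} + K_{k+1}(y_{k+1} - \hat x_{k+1|k}), \qquad (3.12)$$
where $K$ stands for the Kalman gain and is given by
$$K_{k+1} = \frac{P_{k+1|k}}{P_{k+1|k} + D^2}. \qquad (3.13)$$
The estimate $\hat x_{k+1|k+1}$ is the best linear estimate of $x_{k+1}$ given $\mathcal{Y}_{k+1}$.
3.3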
The EM Algorithm
The Kalman filter assumes that the parameters of the state-space model are specified in advance. Normally this is not the case, and these parameters have to be estimated. One widely used estimation method is described in the paper of Shumway and Stoffer (1982), and it is also used in this thesis. In that paper, the parameters are estimated by maximum likelihood using the EM algorithm. This estimation method is discussed next.
To estimate the parameters of the state-space model defined by (3.4) and (3.5), the joint log likelihood of this model has to be specified. Because the system depends on the unobserved time series $\{x_k\}$, the likelihood function is not straightforward to specify. To solve this problem, the EM algorithm conditions on the observed time series $y_1, \dots, y_n$. Let the estimated parameters at the $(r+1)$st iteration be the values $\theta = (A, B, C^2, D^2)$ that maximize
$$G(\theta) = E_r[\log L \mid y_1, \dots, y_n], \qquad (3.14)$$
where the conditional expectation $E_r$ is evaluated at the $r$th iterative values $A(r)$, $B(r)$, $C^2(r)$ and $D^2(r)$. Furthermore, $\log L$ is the joint log likelihood of the complete data. The Kalman filter conditions the means and covariances on the data up to time $k$. Conditioning them on the full dataset instead gives the smoothed estimators of $\{x_k\}$:
$$\hat x_{k|n} = E[x_k \mid \mathcal{Y}_n],$$
$$P_{k|n} = E[(x_k - \hat x_{k|n})^2],$$
$$P_{k,k-1|n} = E[(x_k - \hat x_{k|n})(x_{k-1} - \hat x_{k-1|n})].$$
The EM algorithm is a two-step iterative procedure that finds a stationary value $\theta$ of the likelihood function in the following way:
step 1 (the E-step): compute, with $\hat\theta = \theta_j$,
$$Q(\theta, \hat\theta) = E_{\hat\theta}[\log L \mid y_1, \dots, y_n];$$
step 2 (the M-step): find $\theta_{j+1} = \arg\max_{\theta} Q(\theta, \hat\theta)$.
[Figure: Spread plotted against Days]
Figure 3.1: The fitted values of Stochastic Spread approach (green line) and simulated spread (blue line)
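The filtering recursion of section 3.2 can be written compactly for the scalar model (3.4) and (3.5). The following minimal NumPy sketch assumes known parameters $A$, $B$, $C$, $D$ with $0 < B < 1$. It starts from the unconditional mean and variance of $x_1$ and applies (3.13), (3.12), (3.6) and (3.7) in turn.

The update $P_{k|k} = (1 - K_k)P_{k|k-1}$ of the filtered MSE is not written out in the text above. It is the standard scalar Kalman step and is included here as an assumption. `kalman_filter` is a hypothetical helper name. The simulated data only mimic the kind of comparison shown in Figure 3.1; they are not the thesis data.

```python
import numpy as np

# Illustrative sketch: hypothetical helper, made-up parameters.


def kalman_filter(y, A, B, C, D):
    """Filtered estimates x_hat_{k|k} and MSEs P_{k|k} for
    x_{k+1} = A + B x_k + C eps_{k+1},  y_k = x_k + D omega_k."""
    n = len(y)
    x_pred = A / (1.0 - B)              # x_hat_{1|0} = E[x_1], unconditional mean
    P_pred = C**2 / (1.0 - B**2)        # P_{1|0}, unconditional variance
    x_filt = np.empty(n)
    P_filt = np.empty(n)
    for k in range(n):
        K = P_pred / (P_pred + D**2)                 # Kalman gain, (3.13)
        x_filt[k] = x_pred + K * (y[k] - x_pred)     # update, (3.12)
        P_filt[k] = (1.0 - K) * P_pred               # filtered MSE (standard step)
        x_pred = A + B * x_filt[k]                   # forecast, (3.6)
        P_pred = B**2 * P_filt[k] + C**2             # forecast MSE, (3.7)
    return x_filt, P_filt


# Usage on simulated data (illustrative parameter values)
rng = np.random.default_rng(0)
A, B, C, D, n = 0.1, 0.9, 0.2, 0.2, 2000
x = np.empty(n)
x[0] = A / (1.0 - B)
for k in range(1, n):
    x[k] = A + B * x[k - 1] + C * rng.standard_normal()
y = x + D * rng.standard_normal(n)

x_hat, P = kalman_filter(y, A, B, C, D)
mse_filter = np.mean((x_hat - x) ** 2)   # error of the filtered state
mse_raw = np.mean((y - x) ** 2)          # error when y_k itself is used
```

Setting $D = 0$ makes the gain equal to one, so the filtered state is simply the observed spread. This is the situation found in section 5.2.1, where $D$ is estimated close to zero and the model collapses to an AR(1) for the spread.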
Chapter 4
Trading design
This chapter discusses the trading strategy used in this thesis. The first section describes the trading period. The second section sets out the pairs selection criteria for the two model-based approaches described in the previous chapters. The third section discusses the mean-variance optimization theory of Markowitz (1952) for determining the optimal number of positions in the spread.
4.1
Trading period
The data used in this thesis contain daily closing prices of the stocks of the Amsterdam Stock Exchange (AEX) from 1 January 2006 until 30 December 2011. They were obtained from Thomson Reuters through Datastream Advance. An equilibrium between two stocks is unlikely to persist over the whole time span of the dataset. Therefore, the data are divided into smaller blocks of formation periods and adjacent trading periods. The formation period length is chosen arbitrarily and set to 128, 256 and 512 days. As in earlier literature (Gatev et al. (2006), Yakop (2011)), the adjacent trading period is half as long as the formation period. In the trading period, positions in the spread are opened following either the mean-variance optimization procedure (discussed at the end of this chapter) or the two-standard-deviation strategy. Any positions that are still open at the end of the trading period are closed.
A rolling window of 40 trading days is used, i.e., a new formation period starts every 40 trading days. The step of 40 days is shorter than every trading period (64, 128 or 256 days). As a result, after the first 128, 256 or 512 days (the different formation period lengths), all the remaining days in the dataset are used for trading and no opportunities are lost.
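The window scheme can be sketched as follows. This is a minimal illustration, assuming that days are indexed consecutively, that a new formation period starts every 40 trading days, and that a trading period running past the end of the data is simply truncated. `rolling_windows` is a hypothetical helper. The dataset length of 1500 days in the usage lines is an assumed round number, not taken from the data.

```python
# Illustrative sketch: hypothetical helper; n_days in the usage is assumed.


def rolling_windows(n_days, formation, step=40):
    """Return (formation_start, formation_end, trading_end) index triples.

    Formation uses days [formation_start, formation_end); trading uses
    days [formation_end, trading_end), with trading = formation // 2 days.
    """
    trading = formation // 2
    windows = []
    start = 0
    while start + formation < n_days:
        f_end = start + formation
        t_end = min(f_end + trading, n_days)     # truncate at the end of the data
        windows.append((start, f_end, t_end))
        start += step
    return windows


for formation in (128, 256, 512):
    w = rolling_windows(1500, formation)
    print(formation, len(w), w[0])
```

The trading windows overlap. A given day can therefore be traded by pairs selected in more than one formation period.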
4.2
Pairs selection
This section describes the criteria for selecting a suitable pair for the different methods.
4.2.1
Cointegration approach
As mentioned in chapter 2, $\{p_{it}\}$ is assumed to have a unit root and to follow a random walk model: $p_{it} = p_{i,t-1} + r_{it}$. This assumption is tested with the ADF test. Only the series for which the null hypothesis of a unit root is not rejected are kept.
After the time series $\{p_{it}\}$ have been selected, all the different combinations of pairs are tested for cointegration with the Johansen test procedure. The model specified for testing is:
$$\Delta x_t = \alpha(w_{t-1} - \mu_w) + c_0 + \Phi^{*}\Delta x_{t-1} + \varepsilon_t.$$
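The unit-root screen can be illustrated with a plain Dickey-Fuller regression. The sketch below is a simplification and not the ADF test used in the thesis. It includes a constant but no lagged differences (the "augmented" part). It compares the t-statistic with the asymptotic 5% critical value of about -2.86 for the constant-only case. `df_tstat` is a hypothetical helper name; in practice a library implementation of the ADF test with lag selection would be used.

```python
import numpy as np

# Illustrative sketch: plain DF regression (no augmentation lags), hypothetical helper.


def df_tstat(y):
    """t-statistic of rho in  Delta y_t = c + rho * y_{t-1} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    s2 = resid @ resid / (len(dy) - X.shape[1])       # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)                 # OLS covariance matrix
    return coef[1] / np.sqrt(cov[1, 1])


CRIT_5PCT = -2.86   # approximate asymptotic 5% critical value (constant included)

rng = np.random.default_rng(0)
log_price = np.cumsum(0.01 * rng.standard_normal(1000))   # random walk (unit root)
stationary = np.zeros(1000)
for t in range(1, 1000):
    stationary[t] = 0.5 * stationary[t - 1] + rng.standard_normal()

keep_price = df_tstat(log_price) > CRIT_5PCT     # True if the unit root is not rejected
keep_stat = df_tstat(stationary) > CRIT_5PCT     # False if the unit root is rejected
```

Only series that behave like `log_price` would pass the screen and go on to the Johansen test.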
4.2.2
To select a pair suitable for trading, the model is estimated for every possible pair's spread with the EM algorithm and the Kalman filter, as discussed in chapter 3. After the parameters of the model have been estimated, the parameter $B$ of the state equation is evaluated. If $0 < B < 1$, the spread shows mean-reverting behaviour and the pair is selected for trading. The number of positions taken in the spread is again obtained using the mean-variance optimization strategy discussed below.
4.3
Mean-Variance optimization
This section describes the mean-variance optimization procedure (MV) used for determining the number of positions in a pairs trade. The concept of mean-variance optimization was first introduced by Markowitz (1952). The main purpose of Markowitz's paper was to explain mathematically why investors diversify their portfolios. Markowitz claims that investors do not only maximize the expected return of a portfolio, but also consider the variance of the returns. In this thesis, I use Markowitz's expected returns-variance of returns rule to optimize the number of positions held in a spread portfolio.
The rationale for optimizing the number of positions in a spread portfolio lies in the mean-reverting behaviour of the spread of a pairs trade. No matter how large the deviation from the mean, the spread is always expected to revert to its long-term equilibrium value. In earlier literature about pairs trading, a fixed position is opened after the spread hits a preset threshold some distance away from the long-term mean, typically two standard deviations (Yakop (2011), Gatev et al. (2006), Vidyamurhty (2004)). After the threshold is hit, the position is held until the spread reverts to the mean. When this happens, the position is unwound and a profit is made. Between opening and closing the position, however, the spread may widen significantly beyond its level at entry. If this is the case, the trader can generate a much bigger profit by taking on more positions in proportion to the size of the spread.
In this thesis, the opportunity to generate a higher profit in a trade is explored by varying the
number of positions. The positions taken in a spread are optimized by using a utility function
based on the aforementioned principle of the expected returns-variance of returns by Markowitz
(1952), namely:
$$U_t(wp_{t+1}) = E_t\left[\frac{wp_{t+1} - wp_t}{wp_t}\right] - \lambda\,\mathrm{Var}_t\left[\frac{wp_{t+1} - wp_t}{wp_t}\right],$$
where $wp_t$ denotes the value of the spread portfolio at time $t$. The parameter $\lambda$ measures the risk aversion of the trader and is set to one when the strategy is evaluated. Markowitz (1952) stresses that it is essential to find reasonable values for $E_t[(wp_{t+1} - wp_t)/wp_t]$ and $\mathrm{Var}_t[(wp_{t+1} - wp_t)/wp_t]$ using reliable statistical techniques. Both the stochastic spread and the cointegration approach provide such statistical estimates. Now let $\{\mathrm{return}_{t+1}\}$ be the value at time $t+1$ of a portfolio that invested one dollar in the spread at time $t$. With this definition, the expected return and variance can be evaluated using the following equations:
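$$E_t\left[\frac{wp_{t+1} - wp_t}{wp_t}\right] = z_t\,E_t\left[\frac{\mathrm{return}_{t+1}}{wp_t}\right],$$
$$\mathrm{Var}_t\left[\frac{wp_{t+1} - wp_t}{wp_t}\right] = z_t^2\,\mathrm{Var}_t\left[\frac{\mathrm{return}_{t+1}}{wp_t}\right],$$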
where {zt } represents the number of positions taken in the spread portfolio. The value of
Et [returnt+1 ] is calculated with the use of the parameters estimated in the formation period.
The value of V art [returnt+1 ] is estimated in the formation period and is assumed to be constant
in the trading period.
The number of positions taken in the spread at any point in time can now be calculated by
maximizing the utility function with respect to {zt }. The first order condition is given by:
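$$\frac{\partial U_t(z_t)}{\partial z_t} = E_t\left[\frac{\mathrm{return}_{t+1}}{wp_t}\right] - 2\lambda z_t\,\mathrm{Var}_t\left[\frac{\mathrm{return}_{t+1}}{wp_t}\right] = 0.$$
The second derivative of the utility function, $-2\lambda\,\mathrm{Var}_t[\mathrm{return}_{t+1}/wp_t]$, is always negative, because $\lambda > 0$ and $\mathrm{Var}_t[\mathrm{return}_{t+1}] > 0$. Solving this first order condition for $z_t$ therefore gives the number of positions in the spread that maximizes the utility function. At any point in time, this optimal value of $z_t$ is given by:
$$z_t^{*} = \frac{E_t[\mathrm{return}_{t+1}]}{2\lambda\,\mathrm{Var}_t[\mathrm{return}_{t+1}]}\,wp_t.$$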
Since $\frac{wp_{t+1} - wp_t}{wp_t} = z_t\,\frac{\mathrm{return}_{t+1}}{wp_t}$, the return of the strategy when the optimal value of $z_t$ is used is:
$$r_{t+1} = \frac{wp_{t+1} - wp_t}{wp_t} = \frac{E_t[\mathrm{return}_{t+1}]}{2\lambda\,\mathrm{Var}_t[\mathrm{return}_{t+1}]}\,\mathrm{return}_{t+1}.$$
It can be seen that the returns of this strategy do not depend on the value of $wp_t$.
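The closed-form solution can be checked numerically. The following minimal sketch evaluates the utility as a function of $z_t$ and computes $z_t^*$. It then verifies that the resulting strategy return does not depend on $wp_t$. The function names are hypothetical and the inputs purely illustrative. In the thesis, $E_t[\mathrm{return}_{t+1}]$ comes from the model parameters and $\mathrm{Var}_t[\mathrm{return}_{t+1}]$ from the formation period; here both are just assumed numbers.

```python
# Illustrative sketch: hypothetical helpers, assumed input values.


def utility(z, exp_ret, var_ret, wp, lam=1.0):
    """U_t as a function of the number of positions z (mean - lam * variance)."""
    mean = z * exp_ret / wp
    var = z**2 * var_ret / wp**2
    return mean - lam * var


def optimal_positions(exp_ret, var_ret, wp, lam=1.0):
    """z_t* = E_t[return] / (2 * lam * Var_t[return]) * wp_t."""
    return exp_ret / (2.0 * lam * var_ret) * wp


def strategy_return(realized_ret, exp_ret, var_ret, lam=1.0):
    """r_{t+1} at the optimum; wp_t has cancelled out."""
    return exp_ret / (2.0 * lam * var_ret) * realized_ret


exp_ret, var_ret = 0.02, 0.04                  # assumed E_t and Var_t of return_{t+1}
z_small = optimal_positions(exp_ret, var_ret, wp=1.0)
z_large = optimal_positions(exp_ret, var_ret, wp=100.0)
realized = 0.03                                # assumed realized return_{t+1}
r_small = z_small * realized / 1.0             # (wp_{t+1} - wp_t) / wp_t with wp_t = 1
r_large = z_large * realized / 100.0           # same with wp_t = 100
```

Because $z_t^*$ is proportional to $E_t[\mathrm{return}_{t+1}]$, the position grows as the expected reversion grows. This is also why a pair that keeps drifting away can lead to ever larger positions.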
Chapter 5
Evaluation
This chapter evaluates the results of the two model-based approaches discussed in chapters 2 and 3. The structure of this chapter is as follows. The first section gives the definition of the Sharpe ratio and discusses a few concerns with the calculation of Sharpe ratios, as explained in the master thesis of Yakop (2011). The second section gives the results for both approaches. The last section gives out-of-sample results of the different pairs trading strategies.
5.1
Sharpe ratio
A common way to compare the returns of different trading strategies is to calculate the reward-to-variability ratio introduced by Sharpe (1966), nowadays called the Sharpe ratio. The Sharpe ratio relates the excess expected return of an investment to its return volatility. In formula,
$$SR = \frac{E[r_t] - r_f}{\sigma}, \qquad (5.1)$$
where $E[r_t]$ and $\sigma$ are the expected return and standard deviation of the return series $\{r_t\}$, and $r_f$ is the average return earned by the benchmark in the evaluated period. The risk-free rate is usually assumed to be an adequate benchmark for comparing the returns of a strategy. However, as discussed in Yakop (2011), an adequate benchmark should act as an appropriate substitute for pairs trading. Therefore, Yakop (2011) did not use the risk-free rate, but the composite index of the stocks, in this case the AEX index. When the Sharpe ratio is calculated with equation 5.1, $r_f$ is therefore set to zero. The calculated Sharpe ratios of the different trading strategies are then compared to the Sharpe ratio of the AEX index.
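The estimated Sharpe ratio, $\widehat{SR}$, is found by substituting $\hat\mu = \frac{1}{T}\sum_{t=1}^{T} r_t$ for $E[r_t]$ and $\hat\sigma = \sqrt{\frac{1}{T}\sum_{t=1}^{T}(r_t - \hat\mu)^2}$ for $\sigma$; these are the estimated mean and standard deviation of the return series. As discussed in Yakop (2011), $\widehat{SR}$ is based on $\hat\mu$ and $\hat\sigma$, which are estimated with some error. Therefore $\widehat{SR}$ is also estimated with some error. Denote the vector $(\mu\ \sigma)'$ by $\theta$ and the SR formula in equation 5.1 by $g(\theta)$. Lo (2002) shows that the asymptotic distribution of the SR estimator is given by:
$$\sqrt{T}\,(\widehat{SR} - SR) \overset{a}{\sim} N(0, V_{GMM}), \qquad V_{GMM} = \frac{\partial g}{\partial \theta}\,\Sigma\,\frac{\partial g}{\partial \theta'},$$
where $\Sigma$ is the asymptotic covariance matrix of $\sqrt{T}(\hat\theta - \theta)$. The estimation of $\frac{\partial g}{\partial \theta}$ and $\Sigma$ and the derivation of the asymptotic distribution are not done in this thesis.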
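As an illustration of (5.1) and of the estimation error discussed above, the sketch below computes $\widehat{SR}$ with the $1/T$ moment estimators and $r_f = 0$, together with an approximate standard error. It assumes i.i.d. returns, for which Lo (2002) gives the asymptotic variance $1 + SR^2/2$; this simplification no longer holds under serial correlation. The function names and the return series are hypothetical.

```python
import numpy as np

# Illustrative sketch: i.i.d. returns assumed; hypothetical helpers and data.


def sharpe_ratio(returns, rf=0.0):
    """Estimated SR from (5.1), with 1/T mean and standard deviation."""
    r = np.asarray(returns, dtype=float)
    mu_hat = r.mean()
    sigma_hat = r.std()            # ddof=0, i.e. the 1/T estimator
    return (mu_hat - rf) / sigma_hat


def sharpe_se_iid(sr, T):
    """Approximate standard error of SR_hat for i.i.d. returns (Lo, 2002)."""
    return np.sqrt((1.0 + 0.5 * sr**2) / T)


daily = np.array([0.004, -0.002, 0.003, 0.001, -0.001, 0.002])   # made-up returns
sr = sharpe_ratio(daily)
se = sharpe_se_iid(sr, len(daily))
```

With daily returns, the result is a daily SR, which is the quantity reported in the tables of this chapter.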
5.2
Results
In this section the results of pairs trading with the Stochastic Spread approach and the Cointegration approach are given.
5.2.1
As mentioned in chapter 3, the stochastic spread model has three major advantages from a theoretical point of view: it captures mean reversion, is continuous in time and is completely tractable. Despite these promising properties, the empirical results turn out to be less favourable.
First of all, it takes a long time to estimate the parameters of one spread, let alone those of the 276 different spreads available in the AEX (consisting of 24 stocks). As an indication of the time needed to estimate these spreads, a single formation period already takes forty-two minutes. There are seven formation periods in this dataset, so estimating all the different pairs in the dataset would take roughly five hours (7 × 42 = 294 minutes).
This first disadvantage is inconvenient but can be overcome with faster computers (or patience). However, another disadvantage is more problematic. After all the different spreads had been estimated, the number of pairs found suitable for pairs trading was minimal. For example, the first formation period resulted in five suitable pairs. This is very few, given that there are 276 different pairs available.
Also, the parameters estimated for the pairs selected by this method suggest that the model can be simplified to a simple AR(1) model for the spread. Specifically, the parameter $D$ in the measurement equation is estimated to be at most 0.001. This suggests that the state-space model can be reduced to the state equation, which is just a simple AR(1) model for the spread. This AR(1) model has already been tested extensively in the context of pairs trading in Yakop (2011) and is therefore not analysed further in this thesis.
So, despite its favourable theoretical properties, the stochastic spread model suggested by Elliot et al. (2005) does not turn out to be a good approach for pairs trading in practice.
Parameters of selected Pairs
Values
276
5
0.0062
B
C
0.9845
0.0007
0.2660
Figure 5.1: Example of a Pair selected with the Stochastic Spread Approach
5.2.2
Cointegration Approach
Contrary to the stochastic spread approach, the cointegration approach yields results that are useful for evaluating a pairs trading strategy. To begin the evaluation of the cointegration approach, table 5.2 gives an overview of the specifics of the dataset and the parameters used in the analyses. As table 5.2 shows, results are estimated for three different formation period lengths (128, 256 and 512 days) and the adjacent trading periods. For each length, all the possible combinations of pairs (in this case 276 pairs) are tested with the Johansen cointegration trace test described in section 2.3, at a significance level of 0.05. Table 5.2 also states the average number of pairs found by this test for the different formation periods.
Parameter   Description         Values
            Number of stocks    23
RW          Rolling window      40
FP          Formation period    128 days    256 days    512 days
TP          Trading period      64 days     128 days    256 days
NT                              28          23          13
NP                              19          28          35
                                1316
[Figure: four panels plotting Spread against Days]
returns, which can result in extreme overestimation of the SRs. Therefore, only the estimated
daily SRs are included in this thesis.
Furthermore, the calculation of the daily returns does not incorporate transaction costs. Including them would require some creativity, since the difference between the bid and ask price of a stock is not reported (only the daily closing prices are). The fee for making a transaction is also not commonly known. Including transaction costs in pairs trading would therefore justify a separate study, and it is not dealt with further in this thesis.
As the average Sharpe ratios of this strategy show, the mean-variance optimization suffers large losses for all formation period lengths. This is a remarkable result, since this strategy is supposed to maximize the value of the portfolio. However, one critical assumption of this strategy is that the selected pairs have the property of mean reversion. If this assumption is not met and a pair drifts away, the number of positions taken in the spread increases dramatically and huge losses are incurred. The results show that too many pairs display this behaviour. Therefore, the average Sharpe ratios for the different formation periods are negative.
FP (days)   Benchmark SR(AEX)   Average SR   Max SR    Min SR    Std. Dev.   Count (pos. SR; SR > SR(AEX); significant at 5%)
128         0.0073              -0.0447      0.1461    -0.1523   0.0583      13
256         0.0046              -0.0444      0.0585    -0.1079   0.0436
512         0.0439              -0.0376      0.0041    -0.0665   0.0227
[Figure: histogram, Frequency against SRs]
Figure 5.3: Histogram of the estimated SRs of the MV strategy of formation period length of 128 days
To compare the mean-variance strategy with a less risky strategy, I also calculated the Sharpe ratios of the common two-standard-deviation (2STD) strategy for opening a position. This strategy is less risky than the mean-variance optimization, because it only opens one position at a time. The results of this strategy are stated in table 5.2.2. The 2STD strategy returns positive average Sharpe ratios for all three formation period lengths, and the formation period of 128 days has the highest average. Unlike under the mean-variance strategy, a pair that does not converge and drifts away from the equilibrium only incurs the loss on a single position. These losses are clearly outweighed by the pairs that do behave as expected. This results in positive average Sharpe ratios for all the different trading periods.
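The 2STD rule can be sketched as a simple state machine. This minimal version is an interpretation, not code from the thesis. It assumes that the mean $\mu$ and standard deviation $\sigma$ of the spread are estimated in the formation period. A position is opened when the spread moves more than two standard deviations away from $\mu$ and closed when the spread crosses back through $\mu$. Any open position is closed at the end of the trading period. The helper names and the toy spread are hypothetical.

```python
import numpy as np

# Illustrative sketch: interpretation of the 2STD rule; hypothetical helpers and data.


def two_std_positions(spread, mu, sigma, k=2.0):
    """Position held after observing spread[t]: +1 long, -1 short, 0 flat."""
    pos = np.zeros(len(spread), dtype=int)
    current = 0
    for t, s in enumerate(spread):
        if current == 0:
            if s > mu + k * sigma:
                current = -1              # spread too wide: short the spread
            elif s < mu - k * sigma:
                current = 1               # spread too narrow: long the spread
        elif (current == -1 and s <= mu) or (current == 1 and s >= mu):
            current = 0                   # spread back at its mean: unwind
        pos[t] = current
    pos[-1] = 0                           # close at the end of the trading period
    return pos


def spread_pnl(spread, pos):
    """Profit from holding pos[t] units of the spread from t to t + 1."""
    spread = np.asarray(spread, dtype=float)
    return float(np.sum(pos[:-1] * np.diff(spread)))


spread = [0.0, 2.5, 1.0, -0.1, 0.0, -2.5, -1.0, 0.5]   # toy trading-period spread
pos = two_std_positions(spread, mu=0.0, sigma=1.0)
pnl = spread_pnl(spread, pos)
```

Because at most one unit of the spread is held at any time, a non-converging pair can only hurt through that single position. Under the MV rule, by contrast, the position size grows with the expected return.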
FP (days)   Benchmark SR(AEX)   Average SR   Max SR    Min SR    Std. Dev.   Count (pos. SR; SR > SR(AEX); significant at 5%)
128         0.0073              0.0209       0.0667    -0.0227   0.0233      25; 15; 11
256         0.0046              0.0147       0.0469    -0.0167   0.0221      19; 12
512         0.0439              0.0081       0.0204    -0.0045   0.0092      10
5.3
To see if the results of the cointegration approach are robust, an second estimation of the
cointegration approach for both trading strategies is done. The second dataset consists of the
daily closing prices from the last five years of the DAX index (which includes the thirty biggest
listed German companies). The results of both trading strategies are given in the table 5.5
below.
As can be seen in table 5.5, the MV strategy performs even worse on this dataset than it did on
the AEX dataset. The average daily SRs of the MV strategy are negative for all formation periods,
and only in one TP (FP: 128 days) does the MV strategy significantly outperform the DAX index.
The 2STD strategy again performs better than the MV strategy and generates small positive average
SRs in all trading periods. For both strategies, the results on the two datasets are much alike
(negative average SRs for MV, small positive average SRs for 2STD). Therefore, it can be
concluded that the results obtained are robust.
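The "Significant at 5%" counts in the tables compare the Sharpe ratio of each pair with that of the index. The sketch below shows one simple way such a comparison could be made; it is an assumption, not necessarily the test used in this thesis. It computes the daily Sharpe ratio with a zero risk-free rate and applies a one-sided z-test based on the i.i.d. standard error sqrt((1 + SR^2/2)/T) from Lo (2002), treating the benchmark SR as a known constant. The function names daily_sharpe and beats_benchmark are hypothetical.

```python
import math

import numpy as np


def daily_sharpe(returns, rf=0.0):
    """Sample Sharpe ratio of daily returns: mean excess return / sample std."""
    excess = np.asarray(returns, dtype=float) - rf
    return excess.mean() / excess.std(ddof=1)


def beats_benchmark(returns, sr_bench, z_crit=1.645):
    """One-sided test of H0: SR <= sr_bench at the 5% level (hypothetical helper).

    Uses the i.i.d. asymptotic standard error sqrt((1 + SR**2 / 2) / T)
    from Lo (2002) and treats the benchmark SR as a known constant.
    """
    T = len(returns)
    sr = daily_sharpe(returns)
    se = math.sqrt((1.0 + 0.5 * sr ** 2) / T)
    return (sr - sr_bench) / se > z_crit


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    pair_returns = rng.normal(0.0005, 0.01, size=1250)  # roughly five years of trading days
    print(daily_sharpe(pair_returns), beats_benchmark(pair_returns, sr_bench=0.0073))
```

Counting, over all pairs, how often beats_benchmark returns true would give a column like "Count significant at 5%"; a test that also accounts for the estimation error in the benchmark SR would be stricter.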
Strategy  FP (days)  Benchmark SR(DAX)  Average  Max      Min      Std. Dev.  Count pos. SR  Count SR > SR(DAX)  Count significant at 5%
MV        128        0.0551             -0.0843  -0.0119  -0.1829  0.0486
MV        256        0.0596             -0.0636  0.0165   -0.1056  0.0426
MV        512        0.0316             -0.0429  -0.0225  -0.0536  0.0105
2STD      128        0.0551             0.0210   0.0689   -0.0181  0.0215     17
2STD      256        0.0596             0.0135   0.0429   -0.0069  0.0159     14
2STD      512        0.0316             0.0069   0.0148   -0.0045  0.0030
Chapter 6
Conclusion
In this thesis, two different model-based approaches for pairs trading were discussed and tested
with two different trading strategies. Results were generated for the daily closing prices of
the stocks in the AEX index over the last five years. Furthermore, an out-of-sample estimation
was done to verify whether the results were robust.
The first approach for modelling the behaviour of a pair, the stochastic spread, was first
suggested (but not yet tested) by Elliott et al. (2005). From a theoretical point of view, the
stochastic spread has three major advantages: the model captures mean reversion, is continuous
in time and is completely tractable. Despite these theoretical advantages, the empirical results
turn out to be less favourable in practice. First, the stochastic spread approach did not find
any pairs suitable for trading. Second, the estimated parameters of the state-space form of the
model suggested that the model could be simplified to only the state equation (which is just an
AR(1) model). This renders the estimation of the parameters with the EM algorithm and the
Kalman filter unnecessary, since the AR(1) model is embedded in the other approach discussed in
this thesis. Therefore, this thesis presents only a few estimates and graphs of the spread for
this approach, and no actual pairs trading results.
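To illustrate why the state equation alone already describes a mean-reverting spread, the sketch below fits the AR(1) model x[k+1] = A + B x[k] + C eps[k+1] by ordinary least squares and derives the long-run mean A/(1 - B) and the half-life ln(0.5)/ln(B). The notation A, B, C is an assumption (it mirrors the state equation of Elliott et al., 2005) rather than a copy of the earlier chapters; the simulated parameter values and the function names are hypothetical.

```python
import numpy as np


def simulate_ar1(A, B, C, n, seed=0):
    """Simulate x[k+1] = A + B*x[k] + C*eps[k+1] with eps ~ N(0, 1); requires B != 1."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = A / (1.0 - B)  # start at the long-run mean
    for k in range(1, n):
        x[k] = A + B * x[k - 1] + C * rng.standard_normal()
    return x


def fit_ar1(x):
    """OLS estimates (A, B, C) of the AR(1) state equation."""
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones(len(x) - 1), x[:-1]])  # constant and lagged spread
    y = x[1:]
    (A, B), *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ np.array([A, B])
    C = resid.std(ddof=2)  # two estimated coefficients
    return A, B, C


def mean_reversion_summary(A, B):
    """Long-run mean A/(1-B) and half-life ln(0.5)/ln(B); only defined for 0 < B < 1."""
    if not 0.0 < B < 1.0:
        raise ValueError("spread is not mean-reverting (need 0 < B < 1)")
    return A / (1.0 - B), np.log(0.5) / np.log(B)


if __name__ == "__main__":
    x = simulate_ar1(A=0.1, B=0.9, C=0.05, n=2000)
    A, B, C = fit_ar1(x)
    mu, half_life = mean_reversion_summary(A, B)
    print(f"A={A:.3f}  B={B:.3f}  C={C:.3f}  mean={mu:.3f}  half-life={half_life:.1f} days")
```

Because the mean reversion is captured entirely by B, a pair is only of interest for trading when 0 < B < 1; mean_reversion_summary raises an error otherwise.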
The second approach for modelling the behaviour of a pair is the cointegration approach.
The idea of cointegration has already been used for pairs trading in earlier work (Yakop, 2011;
Vidyamurthy, 2004). However, the approach in this earlier work is more ad hoc and not based on
the error correction model (ECM), which is normally used in econometric research. In this
thesis, the cointegration approach is based on the ECM, and the pairs are tested with the
Johansen cointegration test.
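A bare-bones version of the Johansen trace test for a pair of price series is sketched below using NumPy only. It assumes a VECM without lagged differences and with an unrestricted constant, and it hard-codes the standard tabulated 5% critical values for that specification with two series (15.49 for H0: r = 0 and 3.84 for H0: r <= 1); the lag length and deterministic terms used in this thesis may differ. The simulated pair with cointegrating vector (1, -2) and all function names are hypothetical.

```python
import numpy as np

# Assumed 5% trace critical values, two series, unrestricted constant:
# first for H0: r = 0, then for H0: r <= 1.
TRACE_CV_5PCT = (15.4943, 3.8415)


def johansen_trace(y):
    """Johansen trace statistics for a VECM without lagged differences.

    y is a (T+1, n) array of price levels. The unrestricted constant is
    concentrated out by demeaning. Returns the trace statistics for
    H0: r <= 0, ..., n-1, the eigenvalues (descending) and the eigenvectors (columns).
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y, axis=0)  # Delta y_t
    ylag = y[:-1]            # y_{t-1}
    T = dy.shape[0]
    r0 = dy - dy.mean(axis=0)
    r1 = ylag - ylag.mean(axis=0)
    s00 = r0.T @ r0 / T
    s01 = r0.T @ r1 / T
    s11 = r1.T @ r1 / T
    # Eigenvalues of S11^{-1} S10 S00^{-1} S01 (squared canonical correlations)
    m = np.linalg.solve(s11, s01.T @ np.linalg.solve(s00, s01))
    eigvals, eigvecs = np.linalg.eig(m)
    order = np.argsort(eigvals.real)[::-1]
    lam = eigvals.real[order]
    vecs = eigvecs.real[:, order]
    trace = np.array([-T * np.sum(np.log(1.0 - lam[r:])) for r in range(len(lam))])
    return trace, lam, vecs


def simulate_vecm(alpha, beta, T, seed=0):
    """Simulate Delta y_t = alpha * (beta' y_{t-1}) + eps_t with eps_t ~ N(0, I)."""
    rng = np.random.default_rng(seed)
    y = np.zeros((T, len(alpha)))
    for t in range(1, T):
        y[t] = y[t - 1] + alpha * (beta @ y[t - 1]) + rng.standard_normal(len(alpha))
    return y


if __name__ == "__main__":
    y = simulate_vecm(np.array([-0.2, 0.1]), np.array([1.0, -2.0]), T=1000)
    trace, lam, vecs = johansen_trace(y)
    for r, (stat, cv) in enumerate(zip(trace, TRACE_CV_5PCT)):
        print(f"H0: r <= {r}: trace = {stat:.2f}, 5% cv = {cv}, reject = {stat > cv}")
    print("normalised cointegrating vector:", vecs[:, 0] / vecs[0, 0])
```

Rejecting H0: r = 0 but not H0: r <= 1 indicates one cointegrating relation; the eigenvector belonging to the largest eigenvalue, normalised on the first stock, then estimates the hedge ratio of the pair.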
Subsequently, two trading strategies for taking a position in the spread were used to calculate
the results. The first is the two-standard-deviation strategy (2STD), which is commonly used in
the literature (Yakop, 2011; Vidyamurthy, 2004; Gatev et al., 2006). The concept of this strategy
is very simple: one takes a position in the spread when it is at least two standard deviations
away from its mean and closes the position when the spread returns
to its equilibrium value. The second strategy is the mean-variance approach (MV). As the name
suggests, the number of positions taken in the spread is determined by a trade-off between the
deviation of the spread from its mean and the variance of the spread. The spread is expected to
revert to its mean, and the MV strategy uses this assumption to maximize the portfolio value by
varying the number of positions taken in the spread.
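The trade-off behind the MV strategy can be illustrated with a one-period mean-variance rule. The sketch below assumes the AR(1) spread s[t+1] = A + B s[t] + C eps[t+1] and a hypothetical risk-aversion parameter gamma; it maximises E[n ds] - (gamma/2) Var[n ds] over the number of units n, which gives n* = E[ds] / (gamma Var[ds]). This is a generic textbook formulation and not necessarily the exact objective used in this thesis; the function name mv_position is hypothetical.

```python
def mv_position(spread, A, B, C, gamma):
    """Units of the spread maximising E[n*ds] - gamma/2 * Var[n*ds] (hypothetical helper).

    ds = s_{t+1} - s_t under the assumed AR(1) spread s_{t+1} = A + B*s_t + C*eps,
    so E[ds] = A + (B - 1)*s_t = (1 - B)*(mu - s_t) with mu = A/(1 - B) and
    Var[ds] = C**2. A positive result means long the spread, a negative one short.
    """
    expected_change = A + (B - 1.0) * spread
    variance = C ** 2
    return expected_change / (gamma * variance)


if __name__ == "__main__":
    # Hypothetical parameters with long-run mean mu = 0.1 / (1 - 0.9) = 1.0
    for s in (1.0, 0.8, 0.6, 0.2):
        print(f"spread = {s:.1f}: n* = {mv_position(s, 0.1, 0.9, 0.05, gamma=10.0):.2f}")
```

With these hypothetical parameters, the optimal position is zero at the mean and grows linearly as the spread moves further below it. The same rule shows how the MV strategy can build up large positions in a pair whose spread keeps drifting away instead of reverting.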
The results of both strategies are in tables 5.2.2 and 5.3. The 2STD strategy generated small
positive average SRs for all the different formation periods. This result is typical for a pairs
trading strategy and is thus what one would expect. In contrast, the MV strategy generates large
negative SRs in all the formation periods. This is unexpected, because this strategy aims to
maximize the portfolio value by varying the number of positions in the spread and should,
consequently, perform well. However, one crucial assumption for the success of this strategy,
namely mean reversion, is not met by a large number of pairs. For these pairs the number of
positions increases drastically and the losses are substantial. This leads me to the conclusion
that the MV strategy might be too risky for pairs trading (at least in this case). The
estimation on the second dataset (DAX index) confirms this, because it generated similar
results. Given that two indices produced similar results, one can conclude that these results
are robust.
Further research in pairs trading should focus on other ways to optimize the trading strategy,
since the MV procedure did not generate the desired results. Furthermore, the inclusion of
transaction costs is a relevant topic that should be taken into account, but it has not been
investigated in this thesis. One could also investigate pairs trading with more than two
securities, such as triple or quadruple trading. The cointegration approach discussed in this
thesis could be a good way to investigate this topic, since the existence of a cointegration
relation between three or four stocks can easily be tested within this framework.
Bibliography
Baronyan, S. R., Boduroglu, I. I., and Sener, E. (2010). Investigation of Stochastic Pairs Trading
Strategies under Different Volatility Regimes. The Manchester School, pages 114-134.
Broda, S. (2011). Financial econometrics slides.
Do, B., Faff, R., and Hamza, K. (2006). A New Approach to Modeling and Estimation for Pairs
Trading. Working Paper, pages 1-30.
Elliott, R. J., van der Hoek, J., and Malcolm, W. P. (2005). Pairs Trading. Quantitative Finance,
5(3):271-276.
Engle, R. F. and Granger, C. W. (1987). Co-integration and Error Correction: Representation,
Estimation and Testing. Econometrica, 55(2):251-276.
Gatev, E., Goetzmann, W. N., and Rouwenhorst, K. G. (2006). Pairs Trading: Performance of
a Relative-Value Arbitrage Rule. Review of Financial Studies, 19(3):797-827.
Hamilton, J. D. (1994). Time Series Analysis. Princeton University Press.
Johansen, S. (1991). Estimation and Hypothesis Testing of Cointegration Vectors in Gaussian
Vector Autoregressive Models. Econometrica, 59(6):1551-1580.
Kalman, R. (1960). A New Approach to Linear Filtering and Prediction Problems. Journal of
Basic Engineering, 82:35-45.
Lo, A. (2002). The Statistics of Sharpe Ratios. Financial Analysts Journal, July/August:36-52.
Markowitz, H. (1952). Portfolio Selection. Journal of Finance, 7(1):77-91.
Sharpe, W. (1966). Mutual Fund Performance. The Journal of Business, 39(1):119-138.
Shumway, R. and Stoffer, D. (1982). An Approach to Time Series Smoothing and Forecasting
using the EM Algorithm. Journal of Time Series Analysis, 3:253-264.
Tsay, R. S. (2010). Analysis of Financial Time Series. John Wiley and Sons, Inc., third edition.
Vidyamurthy, G. (2004). Pairs Trading: Quantitative Methods and Analysis. John Wiley and
Sons, Inc.
Yakop, M. (2011). A Comparative Analysis of Pairs Trading. Master's thesis, University of
Amsterdam.