
THE APPLICATION OF PAIR TRADING TO DIFFERENT STOCKS USING R

A research project submitted in the


partial fulfillment of the
requirements for the degree of
M.Tech
in
Data Science
by
Gunjan Dadhich
A-607

UNDER SUPERVISION OF
PROF. SIBA PANDA

2015-2017

DECLARATION
I hereby declare that the research project titled "The
Application of Pair Trading to Stock Markets", submitted
by me, is based on original work carried out by me. I
certify that it has not been submitted anywhere else. I
further declare that Mukesh Patel School of Technology
Management and Engineering-NMIMS (deemed-to-be university)
will have the copyright on the project report
submitted by me to the college (MPSTME).

Thanking You
Gunjan Dadhich

ACKNOWLEDGMENT
It is my proud privilege to express my gratitude to the
several persons who helped me, directly or indirectly,
to conduct this research project. I express my heartfelt
indebtedness and owe a deep sense of gratitude to my
faculty guides, Prof. Siba Panda and Prof. Sarada
Samantaray, for their sincere guidance and inspiration
in completing this project. I am extremely thankful to
Mr. Anshul Gupta, Mr. Hemant Palivela and all faculty
members of M.Tech Data Science of MPSTME for their
coordination and cooperation and for their kind guidance
and encouragement. I also thank all my friends who
supported and encouraged me to complete this project. I
will always be indebted to them. The study has helped me
to explore more avenues of knowledge related to my topic,
and I am sure it will help me in my future.
Gunjan Dadhich
A-607
M.Tech (Data Science)


ABSTRACT
This project studies the usefulness of a hedge fund trading strategy known as pairs
trading, applied to different stocks. The return of a simplified pairs trading strategy
is modeled using a mean-reverting process for the price spread. According to the
comparative statics of the model, high mean reversion and high volatility of the spread
give rise to a high overall return from trading. Analyzing energy-sector stocks
(specifically HPCL and BPCL) traded on the National Stock Exchange, we present
empirical evidence that pairs trading can produce a relatively stable profit. We use a
static model in which the hedge ratio is calculated from historical prices. We use
linear regression to find the hedge ratio and the ADF test to check for co-integration
between the stocks. We calculate the return from pair trading and plot the trading
signals on the spread: we sell the expensive stock and buy the cheap stock when the
spread moves beyond a certain extent. The data were taken from the Yahoo Finance
website and cleaned and formatted as required using R, which is also used to implement
the pair trading strategy. A second pair, AAPL and QQQ, is used to implement pair
trading with the PairTrading library in R. We also suggest that pair trading may work
well on high-frequency data, with entry and exit points calculated on the
high-frequency data rather than from the static model. We have included HFT processing
as the future scope of this project. This project focuses on an in-depth study of the
pair trading concept and a basic implementation of it in R using the quantmod library.

2. INTRODUCTION
Pair trading, commonly regarded as a form of statistical arbitrage, is among the most
popular trading strategies used by hedge funds, because it offers reduced risk and the
ability to produce returns in any market environment. Pair trading has been around
almost since the invention of markets. Jesse Livermore, one of the most famous traders
of his time, used pair trading back in the late 1800s: he would identify a strong stock
and then short what he called the sister stock. Pair trading came into wide use at
large investment banks and hedge funds in the 1980s, helped by the increased use of
computers.
In a pairs trading strategy, the trader identifies two stocks whose prices are highly
correlated, meaning that the two prices have moved closely together over their price
histories, and then trades by opening a long position in one and a short position in
the other. The strategy depends on the correlation between the two stocks and on the
hedge ratio, a ratio that compares the value of futures positions that have been bought
or sold to the value of the underlying commodity being hedged. It can also refer to the
ratio that compares the value of the part of a security position being hedged with the
size of the entire position. For example, if we are long one unit of P, how many units
of Q should we sell short? That quantity is known as the hedge ratio. In this study we
apply pair trading to two Indian stocks from the same industry, Hindustan Petroleum
Corporation Limited (HPCL) and Bharat Petroleum Corporation Limited (BPCL). We find the
correlation between these two stocks, fit their historical prices to a regression model
to calculate the hedge ratio, and then create the spread for these stocks. The
statistical programming language R is used to implement the strategy, together with the
R package quantmod.
Gatev, Goetzmann, and Rouwenhorst (2006) perform empirical tests of pairs trading on
common stocks. They demonstrate that a pairs trading strategy is profitable even after
taking transaction costs into account. Jurek and Yang (2007) compare the performance of
their optimal mean-reversion strategy with that of Gatev, Goetzmann, and Rouwenhorst
(2006) using simulated data. They demonstrate that their strategy provides even better
performance than that of Gatev, Goetzmann, and Rouwenhorst. Although pairs trading has
been applied primarily as a stock market trading strategy, there is no need to limit
the strategy to that asset class. A pairs trading strategy generally requires only two
highly correlated prices.
High-frequency data may also be used with a pairs trading strategy. In pairs trading, a
trader takes opposing long and short positions in two assets when the difference
between their prices hits a chosen opening threshold. These positions are then closed
when a chosen closing threshold is reached. The difference in prices that the trader
uses to judge when to open and close a position is commonly referred to as the spread
between the pair of assets. The two stocks are expected to move together because they
are close substitutes for each other. Examples of pairs include oil companies, large
financial institutions, and some credit card companies. Pairs trading strategies seek
to exploit temporary mispricing of assets within the market; they therefore rely on
mean reversion and build market-neutral portfolios whose net market exposure is
negligible.
Recently, with the growing popularity of HFT, various studies have examined the
applicability of pairs trading strategies to high-frequency environments. Bowen et al.
(2010) examine the sensitivity of high-frequency strategies to market attributes,
noting that the primary returns to their strategy arise in the first and last hours of
the trading day, when trading volume is expected to be highest.

3. Background of Pairs Trading


3.1 History

The history of pair trading is an interesting one. In the mid-1980s, the Wall Street
quant Nunzio Tartaglia assembled a team of physicists, mathematicians, and computer
scientists to uncover arbitrage opportunities in the equities markets. Tartaglia's
group of former academics used sophisticated statistical methods to develop high-tech
trading programs, executable through automated trading systems, that took the intuition
and trader's skill out of arbitrage and replaced them with disciplined, consistent
filter rules. Among other things, Tartaglia's programs identified pairs of securities
whose prices tended to move together. They traded these pairs with great success in
1987, a year in which the group reportedly made a $50 million profit for the firm.
Although the Morgan Stanley group was disbanded in 1989 after a couple of bad years of
performance, pair trading has since become an increasingly popular market-neutral
investment strategy used by individual and institutional traders as well as hedge
funds. The increased popularity of quantitative statistical arbitrage strategies has
apparently also affected profits. In a New York Times interview, David Shaw, head of
one of the most successful modern quant shops and himself an early Tartaglia protégé,
suggests that recent pickings for quant shops have become slim; he attributes the
success of his firm, D.E. Shaw, to early entry into the business. Tartaglia's own
explanation for pairs trading is psychological. He claims that "Human beings don't like
to trade against human nature, which wants to buy stocks after they go up, not down."
Could pairs traders be the disciplined investors taking advantage of the undisciplined
over-reaction displayed by individual investors?

3.2 The Data Snooping and Effective Market Response


In this project we have not searched over the full strategy space to identify effective
trading rules; rather, we have interpreted practitioners' descriptions of pairs trading
as straightforwardly as possible. Our rules follow the general outline of first finding
stocks that move together and then taking a long-short position when they diverge and
unwinding it when they converge. A test requires that both of these steps be
parameterized in some way. The main questions are how to identify stocks that move
together and how to decide when to open and close a position. We have made
straightforward choices on each of these questions. We open positions at a
one-standard-deviation spread, which might not always cover transaction costs even when
the stock prices converge. Although it is tempting to try potentially more profitable
schemes, the danger of data snooping outweighs the potential insights about higher
profits that could be gained from learning through testing. As with all filter rules
using historical asset pricing data, data snooping is a big concern. One approach to
the data-snooping issue is to test the results out of sample. We use data through the
end of 2013.

3.3 Relative Pricing


Asset pricing can be viewed in absolute or relative terms. Absolute pricing values
stocks from fundamentals such as discounted future cash flows. This is an extremely
difficult process with a wide margin for error. The papers by Bakshi and Chen (1997)
and Lee et al. (1997), for example, are notable attempts to build quantitative
value-investing models. Relative pricing is only somewhat easier. Relative pricing
means that two securities that are close substitutes for each other should sell for the
same price; it does not say what that price will be. Thus relative pricing allows for
bubbles in the economy, but not necessarily for arbitrage or profitable speculation.
The Law of One Price (LOP), and a near-LOP, apply to relative pricing even if that
price is wrong. Ingersoll (1987) defines the LOP as the "proposition ... that two
investments with the same payoff in every state of nature must have the same current
value." In simple words, two securities with the same payoffs in all states of the
world should sell for the same amount. Chen and Knez (1995) extend this by arguing that
closely integrated markets should assign to similar payoffs prices that are close. They
argue that two securities with similar, but not necessarily matching, payoffs across
states should have similar prices. This is of course a weaker condition and subject to
bounds on prices for unusual states; however, it allows the examination of
near-efficient economies or, in Chen and Knez's case, near-integrated markets. Notice
that this theory corresponds to the desire to find two stocks whose prices move
together, as long as we can define states of nature as the time series of observed
historical trading days.
We use an algorithm to select pairs based on the criterion that they have had the same
state prices historically. We then trade pairs whose prices closely match in historical
state space, since the LOP suggests that in an efficient market their prices should be
nearly identical. The current study can thus be viewed as a test of the LOP and
near-LOP in the equity markets, under certain stationarity conditions. We are
effectively testing the integration of very local markets: the markets for specific
individual securities. This is similar to Bossaerts' (1988) test of co-integration of
security prices at the portfolio level. We further speculate that the marginal profits
to be had from risk arbitrage of these temporary deviations are crucial to the
maintenance of first-order efficiency. We could not have the first effect without the
second.
3.4 Co-integrated Prices
The pairs trading technique may be justified within a symmetrical asset-pricing
framework with non-stationary common factors, as in Bossaerts and Green (1989) and
Jagannathan and Viswanathan (1988). If the long and short components fluctuate with
common non-stationary factors, then the prices of the component portfolios will be
co-integrated and the pairs trading strategy would be expected to work. Evidence of
exposure to common non-stationary factors would support a non-stationary factor pricing
framework. Co-integration should not be confused with correlation: correlation
describes the co-movement of returns and captures short-term relationships, whereas
co-integration describes the co-movement of prices and captures a long-term
relationship.
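The difference between correlation and co-integration can be seen on simulated data. The
following is a minimal sketch, not part of the original analysis: the series p1, p2 and
p3, the noise levels and the seed are all assumptions chosen for illustration. Two
prices share one random-walk factor (co-integrated), while a third follows its own
random walk.

```r
# Illustration on simulated data (assumed series, not the HPCL/BPCL prices).
set.seed(42)
n <- 2000

# A shared non-stationary factor (random walk) drives both p1 and p2.
common <- cumsum(rnorm(n))
p1 <- 100 + common + rnorm(n, sd = 0.5)
p2 <- 100 + common + rnorm(n, sd = 0.5)

# p3 follows its own, independent random walk.
p3 <- 100 + cumsum(rnorm(n))

# Spreads between price levels: this is where co-integration shows up.
spread_coint <- p1 - p2   # stays close to zero
spread_indep <- p1 - p3   # wanders without reverting

# Correlation is measured on price changes (short-term co-movement).
ret_cor_coint <- cor(diff(p1), diff(p2))
ret_cor_indep <- cor(diff(p1), diff(p3))

sd_coint <- sd(spread_coint)
sd_indep <- sd(spread_indep)

print(c(ret_cor_coint = ret_cor_coint, ret_cor_indep = ret_cor_indep))
print(c(sd_coint = sd_coint, sd_indep = sd_indep))
```

In this sketch the spread of the co-integrated pair stays in a narrow band, while the
spread against the independent series drifts, which is the long-term property a pairs
trade relies on.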

The space of normalized, cum-dividend prices, i.e. cumulative total returns with
dividends reinvested, is the basic space for the pairs trading strategies in this
project. The main observation about our motivating models of the HPCL-BPCL variety is
that they are known to imply perfect collinearity of prices, which is readily rejected
by the data. On the other hand, Bossaerts (1988) finds evidence of price co-integration
for the US stock market. We would like to keep the notion of the empirically observed
co-movement of prices without unnecessarily restrictive assumptions; hence we proceed
in the spirit of the co-integrated prices method.
More precisely, our matching in price space can be interpreted as follows. Suppose that
prices obey a statistical model of the form

$$p_{it} = \sum_{l=1}^{k} \beta_{il}\, p_{lt} + \epsilon_{it}, \qquad k < n \tag{1}$$

where $\epsilon_{it}$ denotes a weakly dependent error in the sense of Bossaerts (1988).
Assume also that $p_{it}$ is weakly dependent after differencing once. Under these
assumptions, the price vector $p_t$ is co-integrated of order 1 with co-integrating rank
$r = n - k$, in the sense of Engle and Granger (1987) and Bossaerts (1988). Thus, there
exist $r$ linearly independent vectors $\{\alpha_q\}_{q=1,\dots,r}$ such that
$z_q = \alpha_q' p_t$ are weakly dependent. That is, $r$ linear combinations of prices
will not be driven by the $k$ common non-stationary components $p_l$. Note that this
interpretation does not imply that the market is inefficient; on the contrary, it says
that certain assets are weakly redundant, so that any deviation of their price from a
linear combination of the prices of other assets is expected to be temporary and
reverting.
To interpret pairs as co-integrated prices, we need to assume that, for $n \gg k$,
there are co-integrating vectors that have only two nonzero coordinates. In that case
the sum or difference of scaled prices will revert to zero, and a trading rule can be
constructed to exploit the expected temporary deviations. Our strategy relies on
exactly this conclusion. In principle we could construct trading strategies with trios,
quadruples, etc. of stocks, which would presumably capture more co-integrated prices
and yield better profits.
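As a small worked illustration of the two-coordinate case, suppose (purely as an
assumption for this example) that the two nonzero coordinates belong to stocks 1 and 2.
Writing the co-integrating vector as $\alpha$, and introducing $\beta$, $\mu$, $\sigma$
and the threshold multiple $c$ as notation for this sketch only, the combination reduces
to a single spread and a simple deviation rule:

```latex
\alpha = (1,\; -\beta,\; 0,\; \dots,\; 0)', \qquad
z_t = \alpha' p_t = p_{1t} - \beta\, p_{2t}

% If z_t is weakly dependent with mean \mu and standard deviation \sigma,
% a deviation rule opens a position when
|z_t - \mu| \ge c\,\sigma
% shorting the spread if z_t > \mu and buying it if z_t < \mu.
```

Here $\beta$ plays the role of the hedge ratio, and $c = 1$ corresponds to a
one-standard-deviation rule.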
The hypothesis that a linear combination of two stocks can be weakly dependent may be
understood as saying that a co-integrating vector can be partitioned into two parts,
such that the two corresponding portfolios are priced within a weakly dependent error
of another stock. Given the large universe of stocks, this statement is always
empirically valid and provides the basis of our formation procedure.
3.5 The Bankruptcy Risk
The unpredictable risk of bankruptcy is one of the reasons why the returns on
individual securities cannot be taken as stationary. The sensitivity of pairs trading
to the default premium suggests that the strategy may work because we are pairing two
firms, the first of which may have a constant or decreasing probability of bankruptcy
(short end), while the second may have a temporarily increasing probability of
bankruptcy (long end). Improvements in the short end are then followed by improvement
in the long end if that stock survives. In other words, the source of the profit is the
improving ex-post (non-)realization of bankruptcy risk in the long (loser) stock. In
that case, we would expect asymmetry in the profits from the long and short components,
with most of the profits coming from the long end. We have to test long and short
positions separately to see whether this is driving our results.

4. Research Methodology
In this study, we first select the pair of stocks, HPCL and BPCL, and their historical
prices, and check whether the two stocks are correlated. Once correlation is found, we
run a regression model to confirm it and to find the hedge ratio, which tells us how
many units of BPCL to sell for each unit of HPCL held long. This ratio is used to
create the spread between the prices of HPCL and BPCL. We then define our trading
strategy as follows.
1) For each time point in the time series, calculate the risk-adjusted spread
   between the two assets of the pair.
2) Call the amount by which the spread deviates from a measure of the historical
   spread the signal. If the signal is greater than or equal to the opening
   threshold, open a position if not already in one.
3) If the spread is above its historical mean, we expect that stock 1 is
   overpriced and stock 2 is underpriced; thus, we short-sell stock 1 and buy
   stock 2. If instead the spread is below its historical mean, we buy stock 1
   and short-sell stock 2.
4) If the signal is less than the closing threshold, close any existing position
   in the pair.
5) If the signal is greater than the stop-loss threshold, close the position.
6) If a position is open at the last time point in the data series, close the
   position.
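The steps listed above can be sketched as a small state machine. This is an
illustrative sketch only: the function name pair_positions, the threshold values (1,
0.25 and 3 standard deviations) and the toy spread are assumptions, not values used in
this project. The signal is the deviation of the spread from its historical mean in
units of its historical standard deviation, and a position is also closed once the
spread has crossed back through the mean.

```r
# Hypothetical helper: returns the position in the spread at each time point
# (+1 = long stock 1 / short stock 2, -1 = short stock 1 / long stock 2, 0 = flat).
pair_positions <- function(spread, mu, sdv,
                           open_thr = 1, close_thr = 0.25, stop_thr = 3) {
  z   <- (spread - mu) / sdv            # signed, standardized deviation
  pos <- integer(length(z))
  cur <- 0L
  for (i in seq_along(z)) {
    if (cur == 0L) {
      # Steps 2-3: open when the signal reaches the opening threshold
      # (but do not open beyond the stop-loss level).
      if (abs(z[i]) >= open_thr && abs(z[i]) <= stop_thr) {
        cur <- if (z[i] > 0) -1L else 1L
      }
    } else {
      reverted <- cur * z[i] > -close_thr   # Step 4: spread is back near the mean
      stopped  <- abs(z[i]) > stop_thr      # Step 5: stop-loss
      if (reverted || stopped) cur <- 0L
    }
    pos[i] <- cur
  }
  if (length(pos) > 0) pos[length(pos)] <- 0L   # Step 6: flat at the end
  pos
}

# Toy spread with historical mean 0 and standard deviation 1 (assumed).
spread <- c(0.2, 1.5, 0.8, 0.1, -1.2, -3.5, -0.5, 1.1, 0.9)
pos <- pair_positions(spread, mu = 0, sdv = 1)
print(pos)
```

Passing the historical mean and standard deviation in as arguments keeps the rule free
of look-ahead: they can be estimated on one period and applied to another.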
The relation between the two prices is examined over the year-long interval 2012-2013;
the data for this period were downloaded using the quantmod package in R.

Fig 1: Price plots of the two stocks. Red is HPCL and green is BPCL.

4.1 Regression Analysis


Regression is widely used as a statistical tool in economics, finance and trading. R
provides pre-written functions that perform linear regressions in a very
straightforward manner. There are also multiple add-on packages that allow for more
advanced functionality. In this project we only use the lm() function, which is
available in the base installation of R. The following example demonstrates the use of
this function:
outR <- lm(Stock_y ~ Stock_x)
summary(outR)
The call to lm() performs an OLS (Ordinary Least Squares) fit to the function
y = b0 + b1x + e, where e is distributed as N(0, sigma^2).

The ~ sign separates the dependent variable from the independent variables. The
expression Stock_y ~ Stock_x is a formula that specifies a linear model with one
independent variable and an intercept. To fit the same model without the intercept, we
specify the formula as Stock_y ~ Stock_x - 1. This tells R to omit the intercept (force
it to zero). In the trading application we run the model without the intercept, as the
trader is interested only in the slope coefficient relating the two stocks and not in
the intercept.
model1 <- lm(pdtHPC ~ pdtBPC - 1)
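To make the effect of dropping the intercept concrete, here is a minimal sketch on
simulated data. The variables stock_x and stock_y, the true slope of 1.5 and the seed
are assumptions for illustration; they are not the HPCL/BPCL series.

```r
# Simulated data with a known slope and no intercept (illustrative assumption).
set.seed(1)
stock_x <- rnorm(500)
stock_y <- 1.5 * stock_x + rnorm(500, sd = 0.5)

fit_with    <- lm(stock_y ~ stock_x)       # estimates intercept and slope
fit_without <- lm(stock_y ~ stock_x - 1)   # estimates the slope only

print(coef(fit_with))
print(coef(fit_without))

# Without an intercept, the OLS slope has the closed form sum(x*y) / sum(x^2).
slope_manual <- sum(stock_x * stock_y) / sum(stock_x^2)
print(slope_manual)
```

The single coefficient of the no-intercept fit is what is read off as the hedge ratio
in the trading application.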
Whenever a regression is performed, it is important to analyze the residuals (e) of the
fitted model. If everything goes according to plan, the residuals will be normally
distributed with no visible pattern in the data, no autocorrelation and no
heteroskedasticity. The residuals can be extracted from the regression object using its
residuals element.
res <- model1$residuals
plot(res)
acf(res)
The acf() function computes, and by default plots, estimates of the autocovariance or
autocorrelation function. We use it to check whether the residuals of the regression
are autocorrelated.

Fig 2: The plot shows the randomness of the residuals.


Fig 3: The ACF plot showing the autocorrelation of the regression residuals.
The summary() function is used to obtain the results of the linear regression model
fit.
summary(model1)
Along with other quantities, the p-values and t-statistics are used to evaluate the
statistical significance of the coefficients. The smaller the p-value, the stronger the
evidence that the true coefficient is different from zero. The coefficient of the
independent variable is significant in this example. The coefficients can be extracted
from the model object through its coefficients element.
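As a brief sketch of how these quantities can be pulled out of a fitted model
programmatically, the example below uses simulated data (the variables x and y, the
slope of 0.8 and the seed are assumptions) and base R accessors only.

```r
# Simulated no-intercept regression (illustrative data, not the stock series).
set.seed(7)
x <- rnorm(300)
y <- 0.8 * x + rnorm(300)
fit <- lm(y ~ x - 1)

# coef(summary(fit)) is a matrix with one row per coefficient and the columns
# "Estimate", "Std. Error", "t value" and "Pr(>|t|)".
coef_table <- coef(summary(fit))
slope   <- coef_table["x", "Estimate"]
t_stat  <- coef_table["x", "t value"]
p_value <- coef_table["x", "Pr(>|t|)"]
adj_r2  <- summary(fit)$adj.r.squared

print(coef_table)
print(adj_r2)
```

The same accessors apply to a model fitted on price differences, where the estimate in
the first row is the hedge ratio.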

Regression on Prices of HPCL and BPCL (2008-2012)

Residual standard error   102.8 on 1265 degrees of freedom
p-value                   < 2.2e-16
Coefficient               1.582053
Adjusted R-squared        0.9637

Table 1: Regression on the prices of HPCL and BPCL

4.2 PAIR TRADING using two stocks HPCL and BPCL.


We explore a simple two-legged spread between BPCL and HPCL. Once we download and
transform the time series for both stocks, we define a simple trading rule and explore
the trades that our signal generates. Various trade statistics and graphs are
presented.
Step 1: Obtain data via quantmod.

# load quantmod to download the security symbols


require(quantmod)
symbols <- c("HINDPETRO.NS","BPCL.NS")
getSymbols(symbols, from='2010-01-01',to = '2013-12-31')

Now that the price series for HPCL and BPCL are loaded into memory, let's extract some
prices.
The data has the following layout.

BPCL.NS.Open  BPCL.NS.High  BPCL.NS.Low  BPCL.NS.Close  BPCL.NS.Volume  BPCL.NS.Adjusted
       635.5         635.5        635.5          635.5               0           260.825
      635.55         657.6          632            652         1429000           267.597
      656.45        656.45        639.2          640.2         1609600           262.754
         642         648.4        628.1          629.7         1943100           258.444
         631        637.35        614.1            620         1845500           254.463
         619        635.95          619          628.9         1144200           258.116

Table 2: The format of the data downloaded from Yahoo Finance by the quantmod library
The data is on a daily basis.
OPEN: The price of the first trade when the market opened that day.
HIGH: The highest price the stock reached that day.
LOW: The lowest price the stock reached that day.
CLOSE: The price of the last trade when the market closed that day.
VOLUME: The number of shares traded that day.
ADJUSTED CLOSE: Close price adjusted for dividends and splits.
Step 2: Extract prices and time ranges.

# define the training (in-sample) set; column 4 is the closing price
Start_T <- "2010-01-01"
End_T <- "2013-01-01"
Range_T <- paste(Start_T, "::", End_T, sep = "")
tHPCL <- HINDPETRO.NS[, 4][Range_T]
tBPCL <- BPCL.NS[, 4][Range_T]

# define the out-of-sample set
Start_O <- "2013-02-01"
End_O <- "2013-12-01"
Range_O <- paste(Start_O, "::", End_O, sep = "")
oHPCL <- HINDPETRO.NS[, 4][Range_O]
oBPCL <- BPCL.NS[, 4][Range_O]

We have to be careful how we define the in-sample and out-of-sample ranges. We use the
in-sample data to calculate a simple hedge ratio and then apply this hedge ratio to the
out-of-sample data. The in-sample set covers the multi-year history up to the start of
2013, and the out-of-sample set covers the remaining months of 2013; we develop the
model on the historical data and then run it on the out-of-sample data.
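The in-sample/out-of-sample workflow can be sketched end to end on simulated prices.
Everything in this sketch is an assumption made for illustration (the simulated hpcl
and bpcl series, the 500/250 split, the noise level); it only shows that the hedge
ratio and the thresholds are estimated on the first block and then applied unchanged to
the second.

```r
# Simulated price paths standing in for the two stocks (illustrative only).
set.seed(123)
n <- 750
hpcl <- 300 + cumsum(rnorm(n))
bpcl <- 50 + 0.9 * hpcl + rnorm(n, sd = 1)

in_idx  <- 1:500     # in-sample block
out_idx <- 501:n     # out-of-sample block

# Hedge ratio from in-sample price differences, regression without intercept.
d_bpcl <- diff(bpcl[in_idx])
d_hpcl <- diff(hpcl[in_idx])
hr <- as.numeric(coef(lm(d_bpcl ~ d_hpcl - 1))[1])

# The same hedge ratio builds the spread in both periods.
spread_in  <- bpcl[in_idx]  - hr * hpcl[in_idx]
spread_out <- bpcl[out_idx] - hr * hpcl[out_idx]

# Thresholds come from the in-sample spread only.
mean_in <- mean(spread_in)
sd_in   <- sd(spread_in)
upper   <- mean_in + sd_in
lower   <- mean_in - sd_in

# Out-of-sample days on which the rule would sell or buy the spread.
sell_days <- which(spread_out >= upper)
buy_days  <- which(spread_out <= lower)
print(hr)
print(c(n_sell = length(sell_days), n_buy = length(buy_days)))
```

Keeping every estimated quantity tied to the in-sample block is what makes the
out-of-sample run a genuine test rather than a refit.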
Step 3: Compute returns and find hedge ratio.
# compute price differences on in-sample data
pdtBPCL <- diff(tBPCL)[-1]
pdtHPCL <- diff(tHPCL)[-1]
# build the model (regression without intercept)
model <- lm(pdtBPCL ~ pdtHPCL - 1)
# extract the hedge ratio (the slope coefficient)
hr <- as.numeric(model$coefficients[1])
# hr is 0.14799

Regression on Returns of HPCL and BPCL (2008-2012)

Residual standard error   16.34 on 1264 degrees of freedom
p-value                   0.008721
Coefficient               0.14799
Adjusted R-squared        0.5061

Table 3: Regression results on the returns of the two stocks.

Step 4: Construct the spread


Construct the spread between the two stocks after stripping out the effect of the
co-integration coefficient, where the spread at time t is
$S_t = P_{1,t} - \beta_{coint} P_{2,t}$.
For each time point in the time series, calculate the risk-adjusted spread between the
two assets of the pair. The calculation of the spread depends on the hedge ratio
chosen; the regression results are reported in Table 1 and Table 3.
# calculate the spread (in-sample)
Spread_T <- tBPCL - hr * tHPCL

# compute statistics of the spread
Mean_T <- as.numeric(mean(Spread_T, na.rm = TRUE))
Sd_T <- as.numeric(sd(Spread_T, na.rm = TRUE))
# one and two standard deviation bands around the mean
Upper_Thr <- Mean_T + 1 * Sd_T
Lower_Thr <- Mean_T - 1 * Sd_T
Upper_Thr2 <- Mean_T + 2 * Sd_T
Lower_Thr2 <- Mean_T - 2 * Sd_T

# visualize the in-sample spread and its statistics


plot(Spread_T, main = "BPCL vs. HPCL spread (in-sample period)")
abline(h = Mean_T , col = "red", lwd =2)
abline(h = Mean_T + 1 * Sd_T , col = "blue", lwd=2)
abline(h = Mean_T - 1 * Sd_T , col = "blue", lwd=2)
abline(h = Mean_T + 2 * Sd_T , col = "blue", lwd=4)
abline(h = Mean_T - 2 * Sd_T ,col = "blue", lwd=4)
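The abstract mentions an ADF test for co-integration, but the listing does not show
one. As a rough sketch of the idea, the code below computes a plain Dickey-Fuller
t-statistic (no lagged difference terms) with lm() on simulated series. The function
name df_stat, the simulated inputs and the seed are assumptions; a packaged ADF routine
would normally be preferred in practice. The value -2.86 is the approximate
large-sample 5% critical value for the regression with a constant and no trend.

```r
# Dickey-Fuller t-statistic: regress the change in z on its lagged level.
# A strongly negative value is evidence that z is mean-reverting.
df_stat <- function(z) {
  z     <- as.numeric(z)
  dz    <- diff(z)
  z_lag <- z[-length(z)]
  fit   <- lm(dz ~ z_lag)
  coef(summary(fit))["z_lag", "t value"]
}

set.seed(2024)
n <- 1000
# A mean-reverting "spread" (AR(1) with coefficient 0.8) and a random walk.
stationary_spread <- as.numeric(arima.sim(list(ar = 0.8), n = n))
random_walk       <- cumsum(rnorm(n))

stat_stationary <- df_stat(stationary_spread)
stat_rw         <- df_stat(random_walk)

crit_5pct <- -2.86   # approximate 5% critical value (constant, no trend)
print(c(stationary = stat_stationary, random_walk = stat_rw, critical = crit_5pct))
```

Applied to a spread such as Spread_T, a statistic below the critical value would
support treating the spread as mean-reverting.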


Fig 4: Spread for the year 2008 to 2012.

Fig 5: Spread for the period 2012 to 2013.

Let's look at the distribution of the spread.

hist(Spread_T, col = "blue", breaks = 100, main = "Spread Histogram (BPCL vs. HPCL)")
abline(v = Mean_T, col = "red", lwd = 2)

Fig 6: Distribution of spread.

Step 5: Define the trading rule.

We now decide our trading strategy: once the spread exceeds our upper threshold, we
sell BPCL and buy HPCL. Once the spread drops below our lower threshold, we buy BPCL
and sell HPCL.
Ind_Sell <- which(Spread_T >= Mean_T + Sd_T)
Ind_Buy <- which(Spread_T <= Mean_T - Sd_T)

Step 6: Figure out the trades.


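Spread_L <- length(Spread_T)
Prices_B <- rep(NA, Spread_L)   # spread levels at which we buy
Prices_S <- rep(NA, Spread_L)   # spread levels at which we sell
Sp_T <- as.numeric(Spread_T)
Trade_Qty <- 100
Total_P <- 0                    # current position in the spread

for (i in 1:Spread_L) {
  spTemp <- Sp_T[i]
  if (spTemp < Lower_Thr) {
    # spread is cheap: buy unless we are already long
    if (Total_P <= 0) {
      Total_P <- Total_P + Trade_Qty
      Prices_B[i] <- spTemp
    }
  } else if (spTemp > Upper_Thr) {
    # spread is expensive: sell unless we are already short
    if (Total_P >= 0) {
      Total_P <- Total_P - Trade_Qty
      Prices_S[i] <- spTemp
    }
  }
}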
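Once the trades are known, the profit and loss of the spread position can be tallied.
The sketch below is an assumption-laden illustration rather than the calculation behind
the figures in this report: the spread and position vectors are made-up toy values, and
transaction costs are ignored. The position held at one time point earns the change in
the spread up to the next time point.

```r
# Toy inputs (assumed): a spread path and the position held at each time point
# (+1 = long the spread, -1 = short the spread, 0 = flat).
spread   <- c(10, 8, 9, 11, 13, 12, 10)
position <- c( 0, 1, 1,  0, -1, -1,  0)
qty <- 100

# The position held at time t earns the spread change from t to t + 1.
spread_change <- diff(spread)
held    <- position[-length(position)]
pnl     <- qty * held * spread_change
cum_pnl <- cumsum(pnl)

print(pnl)
print(cum_pnl)
```

Plotting cum_pnl against time gives a simple performance curve for the rule.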


Fig 7: The graph shows the points (red dots) where we open a trading position and the
points (yellow dots) where we stop trading.

Fig 8: Performance of the returns.

4.3 PAIR TRADING USING LIBRARY


We now take a different pair, AAPL and QQQ, and check whether the two are
co-integrated.
This time we use the PairTrading library in R.

Fig 9: The two stocks AAPL and QQQ move together.
library(PairTrading)
pair.price <- cbind(tAAPL, tQQQ)

Here we take the adjusted prices of the two stocks, which include the adjustment for
dividends.
reg1 <- EstimateParameters(pair.price, method = lm)
The EstimateParameters function calculates the spread of the two stocks, their hedge
ratio and the premium. It is a pre-defined function in the PairTrading library.
reg1$hedge.ratio
plot(reg1$spread)


Fig 10: The spread from the regression on the prices.


This spread has been calculated by the following formula:
Spread = log(y) - (Alpha + Beta * log(x)), where x and y are the stock prices.
params <- EstimateParametersHistorically(pair.price, period = 180)
This command estimates the hedge ratio from past values; in this case we use the
previous 180 prices to calculate the hedge ratio, and the spread is then created
according to this hedge ratio.
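The idea of re-estimating the hedge ratio over a moving window can be sketched in base
R. This is not the PairTrading package's implementation; it is an illustrative rolling
regression on simulated log prices, where the series log_x and log_y, the true
coefficients and the seed are assumptions. Only the window length of 180 is taken from
the text above.

```r
# Simulated log prices (illustrative): log_y is tied to log_x with slope 1.1.
set.seed(99)
n <- 400
log_x <- log(100) + cumsum(rnorm(n, sd = 0.01))
log_y <- 0.2 + 1.1 * log_x + rnorm(n, sd = 0.01)

period <- 180
hedge_ratio <- rep(NA_real_, n)   # Beta, re-estimated at each time point
premium     <- rep(NA_real_, n)   # Alpha, re-estimated at each time point

for (t in period:n) {
  window <- (t - period + 1):t                 # the most recent 180 observations
  fit <- lm(log_y[window] ~ log_x[window])
  premium[t]     <- coef(fit)[[1]]
  hedge_ratio[t] <- coef(fit)[[2]]
}

# Spread = log(y) - (Alpha + Beta * log(x)), using the rolling estimates.
spread <- log_y - (premium + hedge_ratio * log_x)

print(summary(hedge_ratio))
```

The first 179 values are NA because a full window is not yet available. When such
estimates feed a trading rule, they are typically lagged by one period so that each
decision uses only information available at the time.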
plot(params$spread)

Fig 11: The spread after estimating the hedge ratio from the historical values

signal <- Simple(params$spread,0.02)


signal

barplot(signal,col="blue",space = 0, border =
"blue",xaxt="n",yaxt="n",xlab="",ylab="")
par(new=TRUE)
plot.ts(params$spread, type="l", col = "red",
lwd = 3, main = "Spread & Signal")
abline(h = upperThr, col = "blue", lwd = 2)

return.pairtrading <- Return(pair.price, lag(signal), lag(params$hedge.ratio))
return.pairtrading
plot(100 * cumprod(1 + return.pairtrading),
main = "Performance of pair trading")

Fig 13: the performance of the returns.



5. Results and Conclusion


We examine a hedge fund equity trading strategy, known as pairs trading, based on the
notion of co-integrated prices in a reasonably efficient market. Pairs are stocks that
are close substitutes according to a minimum-distance criterion using a metric in price
space. We find that trading suitably formed pairs of stocks yields profits that are
robust to conservative estimates of transaction costs. We have implemented pair trading
in R using the quantmod library and have identified the opening and closing positions
for the given stocks. We have also shown that a wider price spread increases the
return.

6. Future Scope

In this project we have worked with a static model: the hedge ratio is calculated from
historical prices and the trading strategy is fixed for a long interval of time.
Recently, with the growing popularity of HFT, various studies have examined the
applicability of pairs trading strategies to high-frequency environments. Bowen et al.
(2010) examine the sensitivity of high-frequency strategies to market attributes,
noting that the primary returns to their strategy arise in the first and last hours of
the trading day, when trading volume is expected to be highest.
As future scope, we can build a dynamic model that calculates the hedge ratio
dynamically and adjusts the trading strategy frequently to maximize the profit.

7. References
Bakshi, G. and Z. Chen, 1997, Stock Valuation in Dynamic Economies, working paper,
Ohio State University.
D'Avolio, G., 2002, The Market for Borrowing Stock, Journal of Financial Economics,
66, 271-306.
Bossaerts, P., 1988, Common Nonstationary Components of Asset Prices, Journal of
Economic Dynamics and Control, 12, 347-364.
Jagannathan, R. and S. Viswanathan, 1988, Linear Factor Pricing, Term Structure of
Interest Rates and the Small Firm Anomaly, Working Paper 57, Northwestern University.
Gatev, Goetzmann, and Rouwenhorst, 2006, Pairs Trading: Performance of a Relative
Value Arbitrage Rule, Yale ICF Working Paper No. 08-03.
Kishore, Vayu, 2012, Optimizing Pairs Trading of US Equities in a High Frequency
Setting, Wharton Research Scholars Journal, Paper 92.
http://www.rfortraders.com/, for the R algorithm.
Yahoo Finance, for the data on the different stocks.

8. Coding.
# getting a better understanding of linear regression
x <- rnorm(1000)
y <- (x - 2) + rnorm(1000)
lmout <- lm(y ~ x)
summary(lmout)
# fitted line from one run: y = -2.007 + 0.96x
plot(lmout$residuals)
# the residuals scatter randomly around zero, so the linear fit is adequate
plot(lmout)
# the residual plot is random and does not follow any pattern
res <- lmout$residuals
plot(res, type = "l")
# acf() checks whether the residuals are autocorrelated
acf(res)
lmout$coefficients
# Download the HPCL and BPCL prices from Yahoo Finance
library(quantmod)
symbols <- c("HINDPETRO.NS", "BPCL.NS")
getSymbols(symbols, from = '2010-01-01', to = '2013-01-01')
# optionally save a copy of the data: write.csv(BPCL.NS, "BPCL.CSV")
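summary(HINDPETRO.NS)
# column 6 holds the adjusted closing price
tHPC <- HINDPETRO.NS[, 6]
tBPC <- BPCL.NS[, 6]
head(tHPC)   # or View(tHPC) in RStudio
head(tBPC)
length(tBPC)
length(tHPC)
# the original run keeps only the first 763 rows of BPC, presumably to match the
# length of HPC; merging the two series on their dates would be a safer alignment
tBPC <- tBPC[1:763]
# scatter plot of HPC against BPC
plot(as.numeric(tHPC), as.numeric(tBPC))
plot.ts(tHPC, type = 'l', col = "red", main = "Price HPC(RED) vs. BPC(Green)",
        ylab = 'Adj. Price', ylim = c(100, 500), xlim = c(0, 1000))
par(new = TRUE)
plot.ts(tBPC, type = 'l', col = "green", ylab = 'Adj. Price',
        ylim = c(100, 500), xlim = c(0, 1000))
cor(tHPC, tBPC) #0.2831817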
#We will use the data to compute a simple hedge ratio
#and then we will apply this hedge ratio to the out of sample data.
#CALCULATE THE DAILY PRICE CHANGES (first differences, used here in place of returns)
pdtHPC <- diff(tHPC)[-1]
pdtBPC <- diff(tBPC)[-1]
cor(pdtHPC, pdtBPC) #0.329, there is some correlation
# the plot below gives a good understanding of the relationship between both stocks
# (xlim restricts it to the first 10 observations)
plot.ts(pdtHPC, type = 'l', col = "red", main = "Return HPC(RED) vs. BPC(Green)",
        ylab = 'RETURN', ylim = c(-15, 15), xlim = c(0, 10))
par(new = TRUE)
plot.ts(pdtBPC, type = 'l', col = "green", ylab = 'RETURN',
        ylim = c(-15, 15), xlim = c(0, 10))
#build the model: regress the HPC changes on the BPC changes without an intercept,
#so that the single coefficient is the hedge ratio used in the spread below
length(pdtHPC)
length(pdtBPC)
model <- lm(as.numeric(pdtHPC) ~ as.numeric(pdtBPC) - 1)
summary(model)
res1 <- model$residuals
plot(res1)
acf(res1)
model$coefficients
hr <- as.numeric(model$coefficients[1])
hr
# spread price (in-sample)
# tHPC = X + hr * tBPC
# X = tHPC - hr * tBPC
# X is nothing but the spread
spread_T <- tHPC - hr * tBPC
# mean of the spread
meanT <- as.numeric(mean(spread_T, na.rm = TRUE)) ; meanT
# standard deviation of the spread
sdT <- as.numeric(sd(spread_T, na.rm = TRUE)) ; sdT
# thresholds at one and two standard deviations around the mean
upperThr <- meanT + 1 * sdT
lowerThr <- meanT - 1 * sdT
upperThr2 <- meanT + 2 * sdT
lowerThr2 <- meanT - 2 * sdT
upperThr2
lowerThr2
plot(spread_T, main = "HPC vs. BPC spread (in-sample period)")
abline(h = meanT, col = "red", lwd = 2)
abline(h = upperThr, col = "blue", lwd = 1.5)
abline(h = lowerThr, col = "blue", lwd = 1.5)
abline(h = upperThr2, col = "blue", lwd = 2)
abline(h = lowerThr2, col = "blue", lwd = 2)
# the buy and sell points are added to the spread plot after the trading loop

## Once the spread exceeds our upper threshold we sell the spread,
# i.e. we sell HPCL and buy BPCL (the spread is tHPC - hr * tBPC).
# Once the spread drops below our lower threshold we buy the spread,
# i.e. we buy HPCL and sell BPCL.
ind_Sell <- which(spread_T >= meanT + sdT)
ind_Buy <- which(spread_T <= meanT - sdT)
ind_Sell
spread_T[124] ; spread_T[249] ; spread_T[254] # values above the first SD
ind_Buy
spread_T[1] ; spread_T[125] # values below the first SD
spread_L <- length(spread_T)
# buy and sell prices of the spread, filled in by the trading loop
prices_B <- rep(NA, spread_L)
prices_S <- rep(NA, spread_L)
sp <- as.numeric(spread_T)
spread_L
tradeQty <- 1000
totalP <- 0   # current position in the spread
for (i in 1:spread_L) {
  spTemp <- sp[i]
  if (is.na(spTemp)) next   # skip missing spread values
  if (spTemp < lowerThr) {
    # spread is cheap: buy it unless we are already long
    if (totalP <= 0) {
      totalP <- totalP + tradeQty
      prices_B[i] <- spTemp
    }
  } else if (spTemp > upperThr) {
    # spread is expensive: sell it unless we are already short
    if (totalP >= 0) {
      totalP <- totalP - tradeQty
      prices_S[i] <- spTemp
    }
  }
}
plot(spread_T, main = "HPC vs. BPC spread (in-sample period)")
abline(h = meanT, col = "red", lwd = 2)
abline(h = meanT + 1 * sdT, col = "blue", lwd = 2)
abline(h = meanT - 1 * sdT, col = "blue", lwd = 2)
# buy points in green, sell points in red
points(xts(prices_B, index(spread_T)), col = "green", cex = 1.9, pch = 19)
points(xts(prices_S, index(spread_T)), col = "red", cex = 1.9, pch = 19)
# THE CODE BELOW IS FOR THE STOCKS AAPL AND QQQ, USING THE PairTrading PACKAGE
library(PairTrading)
library(tseries)   # provides adf.test()
# tAAPL and tQQQ are the adjusted close series of AAPL and QQQ; they are assumed
# to be loaded already, e.g. with getSymbols() as for HPC and BPC above
#COMBINING THE PRICES OF THE TWO STOCKS
pair.price <- cbind(tAAPL, tQQQ)
head(pair.price)
summary(pair.price)

#RUNNING THE REGRESSION ON THE LOG VALUES OF THE TWO STOCKS
reg1 <- EstimateParameters(pair.price, method = lm)
reg1
str(reg1)
#LOOKING AT THE HEDGE RATIO
reg1$hedge.ratio
#PLOTTING THE SPREAD AND TESTING IT FOR STATIONARITY
plot(reg1$spread)
adf.test(as.numeric(reg1$spread))
#USING THE FUNCTION TO ESTIMATE THE SPREAD
#AND THE HEDGE RATIO WITH THE HISTORICAL VALUES (180 DAYS)
params <- EstimateParametersHistorically(pair.price, period = 180)
#CALCULATING THE MEAN OF THE SPREAD TO DECIDE THE TRADING STRATEGY
meanT <- as.numeric(mean(params$spread, na.rm = TRUE))
meanT
sdT <- as.numeric(sd(params$spread, na.rm = TRUE))
sdT
upperThr <- meanT + 1 * sdT ; upperThr
lowerThr <- meanT - 1 * sdT
upperThr2 <- meanT + 2 * sdT
lowerThr2 <- meanT - 2 * sdT
#PLOTTING THE SPREAD OF THE GIVEN STOCKS
plot(params$spread)
#GIVING THE SIGNAL TO THE SYSTEM WHEN TO TAKE A POSITION IN THE MARKET
#NOTE: Simple() IS CALLED WITH A FIXED ENTRY LEVEL OF 0.02 ON THE SPREAD;
#THE 1ST SD BAND IS ONLY DRAWN ON THE PLOT
signal <- Simple(params$spread, 0.02)
signal
barplot(signal, col = "blue", space = 0, border = "blue",
        xaxt = "n", yaxt = "n", xlab = "", ylab = "")
par(new = TRUE)
plot.ts(params$spread, type = "l", col = "red",
        lwd = 3, main = "Spread & Signal")
abline(h = upperThr, col = "blue", lwd = 2)
abline(h = lowerThr, col = "blue", lwd = 2)
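#CALCULATING THE RETURN OF THE TRADE.
#THE SIGNAL AND THE HEDGE RATIO ARE LAGGED SO THAT ONLY PAST INFORMATION IS USED
return.pairtrading <- Return(pair.price, lag(signal),
                             lag(params$hedge.ratio))
return.pairtrading
#PLOTTING THE RETURN OF THE TRADE.
plot(100 * cumprod(1 + return.pairtrading),
     main = "Performance of pair trading")
#PLOTTING THE TWO PRICE SERIES ON ONE CHART (EACH ON ITS OWN SCALE)
plot.ts(as.numeric(tAAPL), type = 'l', col = "red",
        main = "AAPL(RED) vs. QQQ(Green)", ylab = "Adj. Price")
par(new = TRUE)
plot.ts(as.numeric(tQQQ), type = 'l', col = "green", axes = FALSE, ylab = "")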

# NOW INCLUDING THE CODE FOR THE PRE DEFINED FUNCTIONS FOR PAIR TRADING
#Calculate the spread between two stock prices.
#Assume that log(price) is a random walk
#Assume that prices has two columns as a matrix
EstimateParameters <- function(price.pair, method = lm)
{
  x <- log(price.pair)
  reg <- method(x[, 2] ~ x[, 1])
  hedge.ratio <- as.numeric(reg$coef[2])
  premium     <- as.numeric(reg$coef[1])
  spread      <- x[, 2] - (hedge.ratio * x[, 1] + premium)
  list(spread = spread, hedge.ratio = hedge.ratio, premium = premium)
}
EstimateParametersHistorically <- function(price.pair, period, method = lm)
{
  Applied <- function(price.pair){
    reg <- EstimateParameters(price.pair, method)
    c(spread = as.numeric(last(reg$spread)), hedge.ratio = reg$hedge.ratio,
      premium = reg$premium)
  }
  as.xts(rollapplyr(price.pair, period, Applied, by.column = FALSE))
}
#Return whether the spread is stationary or not
IsStationary <- function(spread, threshold)
{
  Is.passed.PP.test  <- PP.test(as.numeric(spread))$p.value <= threshold
  Is.passed.adf.test <- adf.test(as.numeric(spread))$p.value <= threshold
  c(PP.test = Is.passed.PP.test, adf.test = Is.passed.adf.test)
}
HedgeRatio2Weight <- function(hedge.ratio)
{
  hedge.ratio <- abs(hedge.ratio) * (-1)
  #
  normalization.factor <- 1 / (1 + abs(hedge.ratio))
  return(cbind(1 * normalization.factor, hedge.ratio * normalization.factor))
}

Return <- function(price.pair, signal.lagged, hedge.ratio.lagged)
{
  #
  signal      <- na.omit(cbind(signal.lagged, -1*(signal.lagged)))
  return.pair <- na.omit(.return(price.pair, type="discrete"))
  weight.pair <- na.omit(HedgeRatio2Weight(hedge.ratio.lagged))
  #
  names(return.pair) <- names(price.pair)
  names(signal)      <- names(price.pair)
  names(weight.pair) <- names(price.pair)
  #
  #as.xts(apply(signal * weight.pair * return.pair, 1, sum) * leverage)
  x <- signal * weight.pair * return.pair
  if(!length(dim(x))){
    xts(rep(NA, nrow(price.pair)), order.by = index(price.pair))
  }else{
    xts(rowSums(x), order.by = index(x))
  }
}
.return <- function(x, type = c("continuous", "discrete"), na.pad = TRUE)
{
  type <- match.arg(type)
  if (type == "discrete") {
    result <- x/lag(x, na.pad = na.pad) - 1
  }else if (type == "continuous") {
    result <- diff(log(x), na.pad = na.pad)
  }
  return(result)
}
Simple <- function(spread, spread.entry)
{
  signal <- ifelse(spread >= spread.entry, -1, NA)
  signal <- ifelse(spread <= -spread.entry, 1, signal)
  return(na.locf(signal))
}
SimpleWithTakeProfit <- function(spread, spread.entry, spread.take.profit)
{
  signal <- ifelse(spread >= abs(spread.entry), -1, 0)
  signal <- ifelse(spread <= -abs(spread.entry), 1, signal)
  take.profit.upper <- abs(spread.take.profit)
  take.profit.lower <- -take.profit.upper
  #Hit take.profit line : 0
  #other case : continue previous position
  for(i in 2:nrow(signal))
  {
    if(signal[i] == 0){
      if(signal[i - 1] == 1){
        if(spread[i] >= take.profit.lower){
          signal[i] <- 0
        }else{
          signal[i] <- signal[i - 1]
        }
      }else if(signal[i - 1] == -1){
        if(spread[i] <= take.profit.upper){
          signal[i] <- 0
        }else{
          signal[i] <- signal[i - 1]
        }
      }
    }
  }
  return(signal)
}
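
To see what the entry rule in Simple() produces, the sketch below applies the same two ifelse() steps to a small made-up spread. It is a base-R stand-in written for illustration only: simple_signal is a hypothetical name, the spread values are invented, and the carry-forward step is written by hand instead of calling na.locf() from zoo.

```r
# toy spread values (hypothetical, not taken from the report's data)
spread <- c(0.00, 0.03, 0.01, -0.01, -0.03, 0.00, 0.025)
entry <- 0.02

# simple_signal() is a hypothetical base-R stand-in for Simple()
simple_signal <- function(spread, spread.entry) {
  # -1 (short the spread) at or above the entry level, otherwise undecided
  signal <- ifelse(spread >= spread.entry, -1, NA)
  # +1 (long the spread) at or below minus the entry level
  signal <- ifelse(spread <= -spread.entry, 1, signal)
  # carry the last decided position forward; leading NAs stay NA here
  for (i in seq_along(signal)[-1]) {
    if (is.na(signal[i])) signal[i] <- signal[i - 1]
  }
  signal
}

sig <- simple_signal(spread, entry)
print(sig)
```

The position stays short from the second observation until the spread falls to -0.03, then stays long until the spread rises above the entry level again. There is no flat state once a position has been opened, which is the gap SimpleWithTakeProfit() is meant to fill.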
****************************END OF REPORT**************************
