STA 457 Pair Trading Project

Team Members
Gursharan Arora (993999893), Sanya Choudhury (993929610), Hasan Ejaz (995630170), Danish Zakir (995690042)

This project starts by listing the chosen pair of stocks and the reason for the selection, then describes the trading signal and the market entry and exit strategies. Finally, a back-test checks whether the strategy worked; its explanation and graphs are included. As instructed in class, we have omitted the basic technical details and formulas.

High Frequency Data Set: The two stocks we used for our high frequency pairs trading were Toronto Dominion Bank (TD) and Bank of Nova Scotia (BNS). Their high frequency prices were found to be highly correlated. The data set started at 9:00 am on 2nd September 2009 and ended at 14:25 on 13th August 2010, with observations at 1 hour intervals.[1]

[Figure: vertical axis from 45 to 75; horizontal axis (Time) from 0 to 1200]

We used R to do this project. The function in R for the Phillips-Perron unit root test is PP.test. We first conducted the Box-Pierce test to check for independence in the stock prices. For both TD and BNS we got p-value < 2.2e-16, which was very small. Hence we rejected the null hypothesis and concluded that the observations in each price series were not independent. Then we checked the stationarity of the two stocks with the Phillips-Perron unit root test. For TD we got p-value = 0.6525 and for BNS we got p-value = 0.3332. Both p-values were large, so we did not reject the null hypothesis. This led to the conclusion that the stock prices had unit roots and were not stationary.

We then fitted a simple linear regression without intercept,[2] with BNS as the response variable and TD as the explanatory variable, and conducted the Box-Pierce test to check for independence of the residuals. We got p-value < 2.2e-16, which was very small. Hence we rejected the null hypothesis and concluded that the residuals were not independent. The Phillips-Perron unit root test on the residuals gave p-value = 0.01, which was small. Therefore we rejected the null hypothesis and concluded that the residuals were stationary. Since the residuals, called the spread, were a stationary linear combination of the two non-stationary stock prices, the two stocks were cointegrated.[3]

In order to calculate the upper and lower bounds for our spread, we used the Ornstein-Uhlenbeck process. This gave us the standard error (estimated standard deviation) and the estimated mean. The variable values were: = 0.02123865 = 0.02257505 = 0.3157525 So, the trading signal was set to mean +/- 2*standard deviation.

Our strategy was to enter the market and short the spread once the value of the spread rose above the upper bound, and to exit the market once the spread returned to the range between the upper and lower bounds. If the value of the spread fell below the lower bound, we entered the market by buying the spread, and again exited once the spread returned to the range between the bounds. In other words, we made transactions whenever the spread moved into or out of the mean +/- 2*standard deviation limits.

The spread was S1-S2, where S1 was the price of the TD stock and S2 was the price of the BNS stock. Thus, shorting the spread consisted of short selling the TD stock and buying the BNS stock. Exiting this position consisted of buying back the TD stock, returning it to the lender we had borrowed it from for the short sale, and selling the BNS stock. Buying the spread consisted of buying the TD stock and short selling the BNS stock. Exiting this position consisted of selling the TD stock, buying back the BNS stock and returning it to the lender we had borrowed it from. We assumed a 0.5% transaction cost on the trades made when exiting the strategy.

[1] Rotman Bloomberg Software

[2] Lecture Notes 5 (Co-integration)
[3] E.P. Chan (2009), Quantitative Trading, pp. 126-127, Wiley Trading.
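The regression without intercept and the mean +/- 2*standard deviation bands can be illustrated with a short sketch. This is Python written for illustration only, not the R code used in the project. The function names and the toy prices are hypothetical, and the sample mean and sample standard deviation of the residuals stand in for the Ornstein-Uhlenbeck estimates, which is an assumption.

```python
from statistics import mean, stdev

def slope_through_origin(x, y):
    """Least-squares slope for the model y = beta * x (no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def residual_spread(x, y):
    """Residuals y - beta * x of the no-intercept regression."""
    beta = slope_through_origin(x, y)
    return [b - beta * a for a, b in zip(x, y)]

def signal_bands(spread, k=2.0):
    """Lower and upper bounds: mean -/+ k * standard deviation.

    Assumption: plain sample moments are used here instead of
    Ornstein-Uhlenbeck estimates.
    """
    m = mean(spread)
    s = stdev(spread)
    return m - k * s, m + k * s

# Hypothetical prices, not taken from the project data.
td = [60.0, 61.0, 62.5, 61.5, 63.0, 64.0]    # explanatory variable
bns = [45.2, 45.9, 46.6, 46.5, 47.1, 48.3]   # response variable

spread = residual_spread(td, bns)
lower, upper = signal_bands(spread)
print(lower, upper)
```

Dropping the intercept forces the fitted line through the origin, so the slope reduces to sum(x*y) / sum(x*x).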

We made 28 transactions between 2nd September 2009 and 13th August 2010, and the resulting profit/loss was -$23.9215.
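The entry and exit rules can be sketched as a small back-test loop. This is an illustrative Python sketch, not the project's R code. The function name, the toy prices and the bounds are hypothetical. It assumes one share per leg, a spread of S1-S2, and a cost of 0.5% of the combined value of both legs charged at each exit; the report does not specify how the cost was applied, so that part is a guess.

```python
def backtest(s1, s2, lower, upper, cost_rate=0.005):
    """Count transactions and accumulate profit/loss for the band strategy.

    position is -1 when short the spread, +1 when long, 0 when flat.
    Assumption: the cost is cost_rate times (s1 + s2) at each exit.
    A position still open at the end of the data is left unrealized.
    """
    position = 0
    entry = 0.0
    transactions = 0
    pnl = 0.0
    for p1, p2 in zip(s1, s2):
        spread = p1 - p2
        if position == 0:
            if spread > upper:        # short the spread: sell S1, buy S2
                position, entry = -1, spread
                transactions += 1
            elif spread < lower:      # buy the spread: buy S1, sell S2
                position, entry = 1, spread
                transactions += 1
        elif lower <= spread <= upper:  # spread back inside the bounds: exit
            pnl += position * (spread - entry) - cost_rate * (p1 + p2)
            position = 0
            transactions += 1
    return transactions, pnl

# Hypothetical prices and bounds, not taken from the project data.
s1 = [60.0, 62.0, 60.5, 59.0, 60.2]
s2 = [45.0, 45.0, 45.0, 45.0, 45.0]
n_trades, profit = backtest(s1, s2, lower=14.5, upper=16.5)
print(n_trades, round(profit, 4))
```

In this sketch the gross gain on a round trip is the distance the spread travels between entry and exit, so the exit cost is weighed against a move that stops as soon as the spread is back inside the bounds.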

[Figure: vertical axis from -3 to 0; horizontal axis (Time) from 0 to 1200]

Medium Frequency Data Set: For the medium frequency, we used the same stocks (TD and BNS). The starting date was 2nd September 2009 and the ending date was 13th August 2010. To obtain the medium frequency data set, we took every tenth observation of the high frequency data. The tests, the model and the trading strategy were the same as for the high frequency data.
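Taking every tenth observation amounts to a strided slice. The sketch below is illustrative Python, not the project's R code. The function name is hypothetical, and starting from the first observation is an assumption, since the report does not say which observation the subsample began with.

```python
def every_nth(observations, n=10, start=0):
    """Keep observations at positions start, start + n, start + 2n, ...

    Assumption: start=0, i.e. the subsample begins at the first observation.
    """
    return observations[start::n]

hourly = list(range(1200))   # stand-in for a series of 1200 hourly prices
medium = every_nth(hourly)
print(len(medium))
```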

[Figure: vertical axis from 45 to 75; horizontal axis (Time) from 0 to 140]

[Figure: vertical axis (Z) from -3 to 0; horizontal axis (Time) from 0 to 140]

For TD, the Box-Pierce test gave p-value < 2.2e-16, and for BNS it also gave p-value < 2.2e-16. The Phillips-Perron test gave p-value = 0.5768 for TD and p-value = 0.2877 for BNS. For the spread (residuals), the Box-Pierce test gave p-value < 2.2e-16 and the Phillips-Perron test gave p-value = 0.2865, which is large, so at this frequency we could not reject the null hypothesis of a unit root in the spread.

In order to calculate the upper and lower bounds for our spread with the medium frequency data set, we used the Ornstein-Uhlenbeck process. This gave us the standard error (estimated standard deviation) and the estimated mean. The variable values were: = -0.1270203 = 0.07858154 = 0.5725724 So, the trading signal was set to mean +/- 2*standard deviation.

We made 4 transactions in all, and our profit/loss was -$4.4366.
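The very small Box-Pierce p-values reported for the price series can be illustrated with a minimal sketch of the statistic. This is illustrative Python, not the project's R code. It assumes a single lag, since the lag used in the project is not stated. With one lag the statistic is Q = n * r1^2, where r1 is the lag-1 sample autocorrelation, and the p-value is the upper tail of a chi-square distribution with one degree of freedom, which equals erfc(sqrt(Q/2)). The function name and both toy series are hypothetical.

```python
import math

def box_pierce_lag1(x):
    """Box-Pierce statistic and p-value using only the lag-1 autocorrelation.

    Assumption: one lag only; the chi-square(1) upper tail is erfc(sqrt(q/2)).
    """
    n = len(x)
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    r1 = sum((x[t] - m) * (x[t + 1] - m) for t in range(n - 1)) / denom
    q = n * r1 ** 2
    p_value = math.erfc(math.sqrt(q / 2.0))
    return q, p_value

# A steadily trending series: neighbouring values are strongly related.
trending = [float(i) for i in range(100)]
# A series whose lag-1 autocorrelation is close to zero.
patterned = [1.0, 1.0, -1.0, -1.0] * 10

q_trend, p_trend = box_pierce_lag1(trending)
q_pat, p_pat = box_pierce_lag1(patterned)
print(p_trend, p_pat)
```

A series that drifts like a price level has a lag-1 autocorrelation near 1, which makes Q large and the p-value tiny, while a series with almost no lag-1 autocorrelation gives a large p-value.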