
Machine Learning for Stock Selection

Robert J. Yan
Charles X. Ling
University of Western Ontario, Canada
{jyan, cling}@csd.uwo.ca

Outline
 Introduction
 The stock selection task
 The Prototype Ranking method
 Experimental results
 Conclusions

Introduction
 Objective:
– Use machine learning to select a small number of “good” stocks to form a portfolio
 Research questions:
– Learning from noisy data
– Learning from imbalanced data
 Our solution: Prototype Ranking
– A specially designed machine learning method

Outline
 Introduction
 The stock selection task
 The Prototype Ranking method
 Experimental results
 Conclusions

Stock Selection Task
Given information prior to week t, predict the performance of stocks in week t
– Training set

Stock ID | Predictor 1: Return of week t-1 | Predictor 2: Return of week t-2 | Predictor 3: Volume ratio of weeks t-2/t-1 | Goal: Return of week t
Learn a ranking function to rank the testing data
– Select the n highest-ranked stocks to buy and the n lowest to short-sell (see the sketch below)
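A minimal sketch of the selection step, assuming we already have a model that outputs a predicted return (or rank score) per stock; the function name and the example numbers are illustrative, not the paper's implementation.

```python
import numpy as np

def form_portfolio(stock_ids, predicted_returns, n):
    """Rank stocks by predicted return; return the n highest (buy) and n lowest (short-sell)."""
    order = np.argsort(predicted_returns)               # ascending by predicted return
    short_ids = [stock_ids[i] for i in order[:n]]       # n lowest-ranked: short-sell
    long_ids = [stock_ids[i] for i in order[::-1][:n]]  # n highest-ranked: buy
    return long_ids, short_ids

# Illustrative usage with made-up predictions for six stocks
ids = ["A", "B", "C", "D", "E", "F"]
pred = np.array([0.012, -0.004, 0.031, 0.007, -0.019, 0.002])
print(form_portfolio(ids, pred, n=2))  # (['C', 'A'], ['E', 'B'])
```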

Outline
 Introduction
 The stock selection task
 The Prototype Ranking method
 Experimental results
 Conclusions

Prototype Ranking

 Prototype Ranking (PR): a machine learning method specially designed for noisy and imbalanced stock data
 The PR System
Step 1. Find good “prototypes” in the training data
Step 2. Use k-NN on the prototypes to rank the test data

Step 1: Finding Prototypes
Prototypes: representative points
– Goal: discover the underlying density/clusters of the training samples by distributing prototypes in the sample space
– Reduce data size
(Figure: prototypes distributed among the training samples; each prototype covers a neighborhood)
Finding prototypes using competitive learning

General competitive learning (a sketch follows below)
 Step 1: Randomly initialize a set of prototypes
 Step 2: For each training sample, find its nearest prototype
 Step 3: Adjust that prototype toward the sample
 Step 4: Output the prototypes
The hidden density of the training data is reflected in the prototypes
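A minimal sketch of the general competitive-learning loop above (not the authors' PR code): each training sample pulls its nearest prototype toward itself, so dense regions of the data end up with more prototypes nearby. The learning rate, epoch count, and initialization are illustrative choices.

```python
import numpy as np

def competitive_learning(samples, n_prototypes=20, lr=0.05, epochs=10, seed=0):
    """Plain competitive learning: move the winning prototype toward each sample."""
    rng = np.random.default_rng(seed)
    # Step 1: randomly initialize prototypes from the training samples
    prototypes = samples[rng.choice(len(samples), n_prototypes, replace=False)].copy()
    for _ in range(epochs):
        for x in samples[rng.permutation(len(samples))]:
            # Step 2: find the nearest prototype (the "winner")
            winner = np.argmin(np.linalg.norm(prototypes - x, axis=1))
            # Step 3: adjust the winner toward the sample
            prototypes[winner] += lr * (x - prototypes[winner])
    # Step 4: output the prototypes; dense regions attract more prototypes
    return prototypes

# Illustrative usage on synthetic 3-predictor data
data = np.random.default_rng(1).normal(size=(500, 3))
protos = competitive_learning(data)
print(protos.shape)  # (20, 3)
```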

Modifications for Stock data

 In Step 1: organize the initial prototypes in a tree structure
– Fast nearest-prototype search
 In Step 2: search for the nearest prototypes in the predictor space
– Better learning effect for the prediction tasks
 In Step 3: adjust the prototypes in the goal attribute space
– Better learning effect on the imbalanced stock data
 In Step 4: prune the prototype tree (see the sketch below)
– Prune child prototypes if they are similar to the parent
– Combine the leaf prototypes to form the final prototypes
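A rough sketch of the Step 4 pruning idea, under assumptions that go beyond the slide: prototypes are stored as a tree of vectors, a child is dropped when it lies within a distance threshold of its parent, and the surviving leaves become the final prototype set. The node class, the Euclidean similarity test, and the threshold are all illustrative, not the paper's exact procedure.

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class ProtoNode:
    vector: np.ndarray                            # prototype position
    children: list = field(default_factory=list)

def prune(node, threshold=0.1):
    """Recursively drop children that are too similar to their parent prototype."""
    kept = []
    for child in node.children:
        prune(child, threshold)
        if np.linalg.norm(child.vector - node.vector) > threshold:
            kept.append(child)                    # keep only sufficiently distinct children
    node.children = kept
    return node

def final_prototypes(node):
    """Combine the remaining leaf prototypes into the final prototype set."""
    if not node.children:
        return [node.vector]
    out = []
    for child in node.children:
        out.extend(final_prototypes(child))
    return out

# Illustrative usage: one child is nearly identical to the root and gets pruned
root = ProtoNode(np.zeros(3), [ProtoNode(np.array([0.05, 0.0, 0.0])),
                               ProtoNode(np.array([1.0, 0.5, -0.2]))])
print(len(final_prototypes(prune(root))))  # 1
```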

Step 2: Predicting Test Data
 Predict each test sample as the weighted average of its k nearest prototypes (see the sketch below)
 Update the model online as new data arrive
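A minimal sketch of the prediction step, assuming distance-weighted k-NN over the learned prototypes: a test stock's score is the weighted average of the goal values (next-week returns) stored with its k nearest prototypes. The inverse-distance weighting, the value of k, and the synthetic data are illustrative.

```python
import numpy as np

def knn_score(x, proto_predictors, proto_goals, k=5, eps=1e-8):
    """Score a test sample as the distance-weighted average goal value of its k nearest prototypes."""
    dists = np.linalg.norm(proto_predictors - x, axis=1)
    nearest = np.argsort(dists)[:k]            # indices of the k nearest prototypes
    weights = 1.0 / (dists[nearest] + eps)     # closer prototypes get larger weights
    return float(np.dot(weights, proto_goals[nearest]) / weights.sum())

# Illustrative usage: 20 prototypes with 3 predictors and one stored goal value each
rng = np.random.default_rng(0)
P = rng.normal(size=(20, 3))                   # prototype positions in predictor space
g = rng.normal(scale=0.02, size=20)            # goal values (e.g., weekly returns)
print(knn_score(np.zeros(3), P, g, k=5))
```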

Outline
 Introduction
 The stock selection task
 The Prototype Ranking method
 Experimental results
 Conclusions

Data
CRSP daily stock database
– The 300 NYSE and AMEX stocks with the largest market cap
– From 1962 to 2004

Testing PR

 Experiment 1: Larger portfolio, lower average


return, lower risk – diversification
 Experiment 2: is PR better than Cooper’s
method?

Results of Experiment 1
(Chart: weekly average return (%), 1978-2004, vs. number of stocks in the portfolio, 0-110)
(Chart: weekly risk, std. (%), 1978-2004, vs. number of stocks in the portfolio, 0-110)
Experiment 2: Comparison to Cooper’s method
 Cooper’s method (CP): a traditional non-ML method for stock selection…
 Compare PR and CP in 10-stock portfolios

Results of Experiment 2
Measures:
 Average Return (Ret.)
 Sharpe Ratio (SR): a risk-adjusted return, SR = Ret. / Std.

(Bar chart: Ret. (%) and SR for the PR 10-stock portfolio vs. the CP 10-stock portfolio)
Outline
 Introduction
 The stock selection task
 The Prototype Ranking method
 Experimental results
 Conclusions

Conclusions
 PR: modified competitive learning and k-NN
for noisy and imbalanced stock data
 PR does well in stock selection
– Larger portfolio, lower return, lower risk
– PR outperforms the non-ML method CP
 Future work: use it to invest and make money!

