Web Science

Predicting Stock Prices with Online Information


Paul Gaskell, Professor Frank McGroarty, Dr. Thanassis Tiropanis

Introduction
On 23 April 2013, a fake tweet reporting explosions at the White House was sent from the hacked Twitter account of the Associated Press. Within minutes the Standard and Poor's 500 index, which tracks 500 of the largest listed companies in the US, fell by nearly 1%. A single tweet briefly wiped nearly 1% off the value of the US stock market. In a sense this is not surprising: financial news services such as Bloomberg and Reuters regularly publish and update indices of media sentiment towards stocks. Over the last 4-5 years researchers have begun to look for models of media sentiment that can be used to predict prices. The results of this research are, however, generally disappointing, because the way language relates to offline events is difficult to model. Language is temporally uncertain, in that a statement can be about an event in the future, past or present. In addition, there is as yet no literature describing how to model word frequency movements over time. The aim of this PhD is to define a methodology that tackles these issues.

Signal Diffusion Mapping


A New Time-Series Analysis Methodology for Modelling and Forecasting Based on Complex Lead-Lag Relationships
Currently, almost all time-series analysis research uses some form of linear regression. The trouble with this is that the temporal relationship between variables is fixed: if there is uncertainty as to when one variable influences another, the analysis cannot pick this up. To model series where the temporal relationship between the variables is uncertain, we developed a new time-series analysis methodology (paper currently under review). It combines concepts from speech processing and polymer physics to model the relationship as a bumpy surface over which information attempts to diffuse from one series into the other. We go on to show that mapping the diffusion rate properly allows us to predict the daily return of the major US and UK stock indices. We show a trading model that could return around 908% over a 14-year period, using just two indices to predict each other.
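The fixed-lag limitation described above can be illustrated with a toy example. The sketch below is not the Signal Diffusion Mapping method, which is not specified here; it only shows, on synthetic data, how a single fixed lag understates a relationship whose lag changes part-way through the sample. The series, the lag values (1 and 3) and the helper names pearson and lagged_corr are all assumptions made for illustration.

```python
import random

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5

def lagged_corr(x, y, lag):
    """Correlation between x[t - lag] and y[t] for one fixed lag."""
    return pearson(x[:len(x) - lag], y[lag:])

# Synthetic data (assumption): y copies x with lag 1 in the first half
# of the sample and with lag 3 in the second half.
rng = random.Random(0)
n = 400
half = n // 2
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = []
for t in range(n):
    lag = 1 if t < half else 3
    y.append(x[t - lag] if t - lag >= 0 else 0.0)

# A single fixed lag over the whole sample sees only part of the link.
full_lag1 = lagged_corr(x, y, 1)

# Within each regime, the correct lag recovers the relationship exactly.
first_half_lag1 = lagged_corr(x[:half], y[:half], 1)
second_half_lag3 = lagged_corr(x[half:], y[half:], 3)

print(f"whole sample, lag 1: {full_lag1:.2f}")
print(f"first half,   lag 1: {first_half_lag1:.2f}")
print(f"second half,  lag 3: {second_half_lag3:.2f}")
```

In this toy setting the dependence of y on x is exact in both halves, yet the whole-sample correlation at lag 1 is far below 1 because the second half contributes only noise at that lag. A method that lets the lag vary over time is aimed at exactly this kind of case.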

Spurious Regressions with Online Text Data


A large number of studies now report correlations between some text-based metric and an offline variable. These studies use metrics built on the assumption that the probability of a word occurring in a set of messages is either independent of, or at most a linear function of, the number of messages. However, a wide range of studies show that word frequencies in text are approximately power-law distributed. We show, firstly, that this property has significant implications for modelling text-based time-series metrics and, secondly, that it renders current regression results in this literature largely invalid. We go on to present a model of word frequency movements that better fits the data.
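A small calculation shows why the linearity assumption is hard to reconcile with power-law word frequencies. The sketch below is not the model of word frequency movements referred to above; it assumes a Zipf distribution with exponent 1 over a hypothetical vocabulary of 10,000 words and treats a set of messages as independent word draws. The function names zipf_probs and occurrence_prob are illustrative.

```python
def zipf_probs(vocab_size, exponent=1.0):
    """Per-draw word probabilities under a Zipf law (assumed form)."""
    weights = [1.0 / rank ** exponent for rank in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def occurrence_prob(p_word, n_tokens):
    """Probability that a word appears at least once in n_tokens
    independent draws, given per-draw probability p_word."""
    return 1.0 - (1.0 - p_word) ** n_tokens

probs = zipf_probs(10_000)

# Compare a common, a mid-frequency and a rare word when the amount of
# text grows tenfold. Under a linear model every ratio would be 10.
ratios = {}
for rank in (1, 100, 10_000):
    p = probs[rank - 1]
    small = occurrence_prob(p, 1_000)
    large = occurrence_prob(p, 10_000)
    ratios[rank] = large / small
    print(f"rank {rank:>6}: P(1k)={small:.4f}  P(10k)={large:.4f}  "
          f"ratio={ratios[rank]:.2f}")
```

Under these assumptions the occurrence probability saturates for frequent words and grows almost linearly only for the rarest ones, so a single linear (or constant) relationship with message volume cannot hold across the vocabulary.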
