Alexander Davies, Zoubin Ghahramani
University of Cambridge

1. Introduction

Sentiment mining of Twitter is useful in a variety of fields and has been shown to have predictive value for box office revenues, political poll outcomes and even the stock market. Most methods use the presence of emoticons in tweets as noisy labels for tweet sentiment; we present a more principled way to incorporate this information, together with a method to analyse geographical variation.

2. Model

We assume that each tweet has a hidden sentiment and that the words in a tweet are drawn from a multinomial distribution that depends only on its sentiment. An example of the probability of the words in the sentence "I love this great car" under two different multinomial distributions is shown below. We also place Bayesian priors on the multinomials, which allow us to specify which words we believe will be likely in each sentiment before we've seen any tweets. This is how we incorporate the emoticon information.
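As an illustrative sketch of this computation, the probabilities below are hypothetical values, not distributions learned from data:

    import math

    # Hypothetical multinomial word distributions for each sentiment
    # (each sums to 1 over this toy vocabulary).
    happy = {"i": 0.2, "love": 0.3, "this": 0.2, "great": 0.25, "car": 0.05}
    sad   = {"i": 0.2, "love": 0.02, "this": 0.2, "great": 0.03, "car": 0.55}

    def log_likelihood(words, dist):
        """Log-probability of a bag of words under one sentiment's multinomial."""
        return sum(math.log(dist[w]) for w in words)

    tweet = "i love this great car".split()
    print("log P(tweet | happy) =", log_likelihood(tweet, happy))
    print("log P(tweet | sad)   =", log_likelihood(tweet, sad))
    # The sentence is far more likely under the happy distribution, because
    # "love" and "great" carry most of the probability mass there.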
3. Geography

To model different geographic areas, we could train models separately for different regions, but some regions have far fewer tweets than others and we won't get a good estimate of the word distribution in those areas. Instead, we can incorporate our knowledge about word distributions from neighbouring regions into our priors: we train the models separately, create priors based on these models, and then retrain the models with the new priors.
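A minimal sketch of this retraining loop, assuming Dirichlet priors (pseudocounts) over each region's word multinomial; the regions, counts, neighbour graph and prior strength are all hypothetical:

    from collections import Counter

    # Hypothetical per-region word counts: Cornwall has very few tweets,
    # so its raw estimate would be unreliable.
    regions = {
        "london":   Counter({"great": 40, "gutted": 10}),
        "cornwall": Counter({"great": 2}),
    }
    neighbours = {"london": ["cornwall"], "cornwall": ["london"]}

    def fit(counts, prior):
        """Posterior mean of a multinomial under a Dirichlet prior,
        expressed as pseudocounts added to the observed counts."""
        vocab = set(counts) | set(prior)
        total = sum(counts.values()) + sum(prior.values())
        return {w: (counts[w] + prior[w]) / total for w in vocab}

    # First pass: train each region separately with a weak flat prior.
    flat = Counter({"great": 1, "gutted": 1})
    first_pass = {r: fit(c, flat) for r, c in regions.items()}

    # Second pass: build each region's new prior from its neighbours'
    # estimates (scaled to a pseudocount budget), then retrain with it.
    strength = 10.0
    for r, counts in regions.items():
        prior = Counter()
        for n in neighbours[r]:
            for w, p in first_pass[n].items():
                prior[w] += strength * p / len(neighbours[r])
        print(r, fit(counts, prior))
    # Cornwall's sparse estimate is pulled towards London's, rather than
    # being determined by a handful of tweets.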
4. Benefits

Naïve Bayes can be constructed as a special case of our method. Ours, however, has two main advantages:

1. It allows correct modelling of uncertainty on labels.
2. It iteratively refines the word distributions.

To illustrate the second advantage, consider the following example dataset:

Tweet 1: "Great :)"
Tweet 2: "Holidays! Great!"

Naïve Bayes will learn that "great" is a happy word, but nothing about "holidays". At the first iteration, our method will learn that "great" is probably a happy word. At the second iteration it will learn that "holidays" is also probably a happy word, as the sketch below illustrates.
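A minimal EM-style sketch of this behaviour on the two tweets above; the vocabulary, pseudocounts and strength of the emoticon prior are hypothetical choices for illustration:

    import math

    tweets = [["great", ":)"], ["holidays", "great"]]
    sentiments = ["happy", "sad"]

    # Emoticon knowledge enters only through the prior pseudocounts:
    # ":)" is believed happy, ":(" sad; the other words are neutral.
    prior = {"happy": {"great": 1.0, "holidays": 1.0, ":)": 6.0, ":(": 0.1},
             "sad":   {"great": 1.0, "holidays": 1.0, ":)": 0.1, ":(": 6.0}}

    def normalise(c):
        z = sum(c.values())
        return {w: v / z for w, v in c.items()}

    theta = {s: normalise(prior[s]) for s in sentiments}

    for it in range(1, 3):
        counts = {s: dict(prior[s]) for s in sentiments}
        for i, tweet in enumerate(tweets):
            # E-step: posterior responsibility of each sentiment for this tweet.
            like = {s: math.prod(theta[s][w] for w in tweet) for s in sentiments}
            z = sum(like.values())
            resp = {s: like[s] / z for s in sentiments}
            for s in sentiments:
                for w in tweet:
                    counts[s][w] += resp[s]  # fractional word counts
            if i == 1:
                print(f"iteration {it}: P(Tweet 2 is happy) = {resp['happy']:.3f}")
        # M-step: re-estimate each sentiment's word distribution.
        theta = {s: normalise(counts[s]) for s in sentiments}

At the first iteration Tweet 2 is a coin flip; once "great" has absorbed happy probability mass from Tweet 1, it pulls Tweet 2, and with it "holidays", towards happy, and further iterations strengthen this.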
5. Results

[Word clouds: examples of high-likelihood happy words (e.g. happy, love, thanks, birthday, great, amazing, amour, feliz) and sad words (e.g. sad, triste, *cries*, gutted, stressed, rip) from the UK.]

6. Conclusion

We have shown that there are advantages to full probabilistic models over basic classifiers trained on noisy labels. We model our assumptions about tweets more faithfully and see performance increases as a result. The framework also makes it easy to incorporate information from neighbouring regions, as we show with a geographic sentiment model.