
Language-independent Bayesian sentiment mining of Twitter

Most methods use the presence of emoticons in tweets as noisy labels for tweet sentiment; we present a more principled way to incorporate this information, and a method to analyze geographical variation.

Sections: 1. Introduction · 2. Model · 3. Geography · 4. Benefits · 5. Results · 6. Conclusion
[Figure: examples of high-likelihood happy and sad words from the UK. Happy words include happy, love, :), :-), thanks, birthday, welcome, luck, great, nice, morning, amazing, hello, lovely, best, awesome and cheers, along with non-English words such as amour (French: love), feliz (Spanish: happy) and kasih (Malay/Indonesian: love). Sad words include sad, :(, :-(, :'(, </3, nooo, poor, miss, *cries*, gutted, poorly, hate, ugh, stressed, headache, *sigh*, rip, horrible and triste (French/Spanish: sad).]
3. Geography
To model different geographic areas, we could train models separately for different regions, but some regions have far fewer tweets than others, so we won't get a good estimate of the word distribution in those areas.

We can instead incorporate knowledge about the word distributions of neighbouring regions into our prior. To do this, we train the models separately, create priors based on these models, and then retrain the models with the new priors.
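As a rough sketch of this idea (an illustration under assumed region names, neighbour weights and smoothing constants, not the authors' exact procedure), the snippet below builds a Dirichlet-style pseudocount prior for one region from its neighbours' word counts and then re-estimates that region's "happy" word distribution as a posterior mean.

```python
from collections import Counter

# Toy per-region word counts for the "happy" class from a first training pass
# (regions and counts are made up for illustration).
happy_counts = {
    "london":    Counter({"great": 40, "love": 35, ":)": 50, "holidays": 8}),
    "cambridge": Counter({"great": 4, ":)": 6}),   # few tweets in this region
    "oxford":    Counter({"love": 5, "lovely": 3, ":)": 4}),
}
neighbours = {"cambridge": ["london", "oxford"]}

def neighbour_prior(region, counts, neighbours, strength=0.5, base=0.1):
    """Pseudocounts for `region` built from its neighbours' word counts."""
    prior = Counter()
    for other in neighbours.get(region, []):
        for word, c in counts[other].items():
            prior[word] += strength * c
    vocab = {w for c in counts.values() for w in c}
    for word in vocab:
        prior[word] += base   # small base count so every word stays possible
    return prior

def retrain(region, counts, prior):
    """Posterior-mean word distribution: prior pseudocounts plus observed counts, normalised."""
    combined = prior + counts[region]
    total = sum(combined.values())
    return {w: c / total for w, c in combined.items()}

prior = neighbour_prior("cambridge", happy_counts, neighbours)
theta = retrain("cambridge", happy_counts, prior)
# "holidays" now has non-zero probability in Cambridge even though it never
# appeared there, because it occurs in neighbouring London.
print(sorted(theta.items(), key=lambda kv: -kv[1])[:4])
```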
Example dataset (used in Section 4, Benefits): Tweet 1: "Great :)"; Tweet 2: "Holidays! Great!"

6. Conclusion

We have shown that there are advantages to full probabilistic models over basic classifiers trained on noisy labels. We model our assumptions about tweets more correctly and see performance increases as a result. The framework also makes it easy to incorporate information from neighbouring regions, as we show with a geographic sentiment model.
1. Introduction

Sentiment mining of Twitter is useful in a variety of fields and has been shown to have predictive value for box office revenues, political poll outcomes and even the stock market.
2. Model

We assume that each tweet has a hidden sentiment and that the words in a tweet are drawn from a multinomial distribution that depends only on that sentiment. An example of the probability of the words in the sentence "I love this great car" under two different multinomial distributions is shown below.

We also place Bayesian priors on the multinomials, which allow us to specify which words we believe will be likely under each sentiment before we've seen any tweets. This is how we incorporate the emoticon information.
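As a minimal illustration of this setup (the word probabilities and the uniform sentiment prior are made-up values; this is a sketch, not the authors' code), the snippet below scores "I love this great car" under a happy and a sad multinomial and combines the per-word likelihoods with the prior via Bayes' rule.

```python
import math

# Hand-picked word probabilities for the two sentiment classes (illustrative
# values only). Each distribution is a multinomial over the small vocabulary.
theta = {
    "happy": {"i": 0.15, "love": 0.25, "this": 0.15, "great": 0.20, "car": 0.10, ":)": 0.14, ":(": 0.01},
    "sad":   {"i": 0.15, "love": 0.05, "this": 0.15, "great": 0.05, "car": 0.10, ":)": 0.01, ":(": 0.49},
}
prior_sentiment = {"happy": 0.5, "sad": 0.5}

def log_likelihood(words, dist):
    """log p(words | sentiment) under a multinomial: sum of log word probabilities."""
    return sum(math.log(dist[w]) for w in words)

def posterior(words):
    """p(sentiment | words) ∝ p(sentiment) * prod_i p(word_i | sentiment)."""
    log_post = {s: math.log(prior_sentiment[s]) + log_likelihood(words, theta[s])
                for s in theta}
    m = max(log_post.values())
    unnorm = {s: math.exp(lp - m) for s, lp in log_post.items()}
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}

tweet = ["i", "love", "this", "great", "car"]
print(posterior(tweet))   # the "happy" class dominates for this sentence
```

In the full model the word distributions are learned from tweets rather than hand-picked, with the emoticon information entering through the priors (for example, as extra prior weight on ":)" under the happy sentiment).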
4. Benefits

Naïve Bayes can be constructed as a special case of our method. Ours, however, has two main advantages:

1. It allows correct modeling of uncertainty on labels.
2. It iteratively refines the word distributions.
To illustrate the second advantage, consider the following example dataset: Tweet 1, "Great :)", and Tweet 2, "Holidays! Great!". Naïve Bayes will learn that "great" is a happy word, but nothing about "holidays". At the first iteration, our method will learn that "great" is probably a happy word; at the second iteration it will also learn that "holidays" is probably a happy word.
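To make the iterative refinement concrete, here is an EM-style sketch on that two-tweet dataset (the pseudocount values, uniform sentiment prior and update rules are illustrative assumptions, not the authors' exact inference). The only prior knowledge given to it is that ":)" is a happy word and ":(" a sad one; Tweet 1 is judged happy from the first pass, and confidence that Tweet 2 is happy, and hence that "holidays" is a happy word, grows over the later iterations as "great" is identified as a happy word.

```python
tweets = [["great", ":)"], ["holidays", "great"]]

# Emoticon-informed pseudocounts: all we assert up front is that ":)" is a happy
# word and ":(" a sad word (the 5.0 / 0.1 values are illustrative assumptions).
BASE = 0.1
pseudo = {"happy": {":)": 5.0}, "sad": {":(": 5.0}}
vocab = {w for t in tweets for w in t} | {":)", ":("}

def estimate_theta(resp):
    """M-step: word distributions from soft counts plus the prior pseudocounts."""
    theta = {}
    for s in pseudo:
        counts = {w: pseudo[s].get(w, BASE) for w in vocab}
        for r, tweet in zip(resp, tweets):
            for w in tweet:
                counts[w] += r[s]
        total = sum(counts.values())
        theta[s] = {w: c / total for w, c in counts.items()}
    return theta

def estimate_resp(theta):
    """E-step: p(sentiment | tweet) with a uniform prior over the two sentiments."""
    resp = []
    for tweet in tweets:
        scores = {s: 1.0 for s in theta}
        for s in theta:
            for w in tweet:
                scores[s] *= theta[s][w]
        total = sum(scores.values())
        resp.append({s: v / total for s, v in scores.items()})
    return resp

# p(happy | Tweet 1) is high from the first iteration because it contains ":)";
# p(happy | Tweet 2) starts at 0.5 and rises as "great" is learned to be happy.
resp = [{"happy": 0.5, "sad": 0.5} for _ in tweets]   # start completely uncertain
for it in range(1, 5):
    theta = estimate_theta(resp)   # refine the word distributions
    resp = estimate_resp(theta)    # re-score both tweets under the new distributions
    print(f"iteration {it}: "
          f"p(happy | 'Great :)') = {resp[0]['happy']:.2f}, "
          f"p(happy | 'Holidays! Great!') = {resp[1]['happy']:.2f}")
```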
Alexander Davies, Zoubin Ghahramani
University of Cambridge
