
review articles

doi:10.1145/2436256.2436274

Communications of the ACM | April 2013 | Vol. 56 | No. 4

Techniques and Applications for Sentiment Analysis

The main applications and challenges of one of the hottest research areas in computer science.

By Ronen Feldman

key insights

• Sentiment analysis offers organizations the ability to monitor various social media sites in real time and act accordingly.

• Aspect-level sentiment analysis is the most fine-grained analysis of review articles and social media snippets with respect to specific objects and their aspects.

• Utilization of sentiment analysis techniques in stock picking can lead to superior returns.

Sentiment analysis (or opinion mining) is defined as the task of finding the opinions of authors about specific entities. The decision-making process of people is affected by the opinions formed by thought leaders and ordinary people. When a person wants to buy a product online he or she will typically start by searching for reviews and opinions written by other people on the various offerings. Sentiment analysis is one of the hottest research areas in computer science. Over 7,000 articles have been written on the topic. Hundreds of startups are developing sentiment analysis solutions and major statistical packages such as SAS and SPSS include dedicated sentiment analysis modules. There is a huge explosion today of ‘sentiments’ available from social media including Twitter, Facebook, message boards, blogs, and user forums. These snippets of text are a gold mine for companies and individuals that want to monitor their reputation and get timely feedback about their products and actions. Sentiment analysis offers these organizations the ability to monitor the different social media sites in real time and act accordingly. Marketing managers, PR firms, campaign managers, politicians, and even equity investors and online shoppers are the direct beneficiaries of sentiment analysis technology.

It is common to classify sentences into two principal classes with regard to subjectivity: objective sentences that
contain factual information and subjective sentences that contain explicit opinions, beliefs, and views about specific entities. Here, I mostly focus on analyzing subjective sentences. However, I refer to the usage of objective sentences when describing a sentiment application for stock picking.

Illustration by Charis Tsevis

As an example, here is a review about a hotel in Manhattan.

“The king suite was spacious, clean, and well appointed. The reception staff, bellmen, and housekeeping were very helpful. Requests for extras from the maid were always provided. The heating and air conditioning functioned well; this was good as the weather was variable. The sofa bed was the best I’ve ever experienced. The king size bed was very comfortable. The building and rooms are very well soundproofed. The neighborhood is the best for shopping, restaurants, and access to subway. Only “complaint” has to do with high-speed Internet access. It’s only available on floors 8–12.”

Overall the review is very positive about the hotel. It refers to many different aspects of the hotel including: heating, air conditioning, staff courtesy, bed, neighborhood, and Internet access. Sentiment analysis systems must be able to provide a sentiment score for the whole review as well as analyze the sentiment of each individual aspect of the hotel.

I present the main research problems related to sentiment analysis and some of the techniques used to solve them, then review some of the major application areas where sentiment analysis is being used today. I conclude with some of the open research problems in this field. Due to limited space, I am not able to cover the whole range of problems and techniques, but refer the reader to some of the extensive reviews written on this topic.20,21,27

In this review, I will focus on five specific problems within the field of sentiment analysis:

• Document-level sentiment analysis;
• Sentence-level sentiment analysis;
• Aspect-based sentiment analysis;
• Comparative sentiment analysis; and,
• Sentiment lexicon acquisition.

Before explaining each of these problems in detail, let’s review a general architecture of a generic sentiment analysis system. The architecture is shown in Figure 1.

The input to the system is a corpus of documents in any format (PDF, HTML, XML, Word, among others). The documents in this corpus are converted to text and are pre-processed using a variety of linguistic tools such as stemming, tokenization, part of speech tagging, entity extraction, and relation extraction. The system may also utilize a set of lexicons and linguistic resources. The main component of the system is the document analysis module, which utilizes the linguistic resources to annotate the pre-processed documents with sentiment annotations. The annotations may be attached to whole documents (for document-based sentiment), to individual sentences (for sentence-based sentiment) or to specific aspects of entities (for aspect-based sentiment). These annotations are the output of the system and they may be presented to the user using a variety of visualization tools.

Document-Level Sentiment Analysis

This is the simplest form of sentiment analysis and it is assumed that the document contains an opinion on one
main object expressed by the author of the document. Numerous papers have been written on this topic. There are two main approaches to document-level sentiment analysis: supervised learning and unsupervised learning.

The supervised approach assumes that there is a finite set of classes into which the document should be classified and training data is available for each class. The simplest case is when there are two classes: positive and negative. Simple extensions can also add a neutral class or have some discrete numeric scale into which the document should be placed (like the five-star system used by Amazon). Given the training data, the system learns a classification model by using one of the common classification algorithms such as SVM, Naïve Bayes, Logistic Regression, or KNN. This classification model is then used to tag new documents into their various sentiment classes. When a numeric value (in some finite range) is to be assigned to the document then regression can be used to predict the value to be assigned to the document (for example, in the Amazon five-star ranking system). Research28 has shown that good accuracy is achieved even when each document is represented as a simple bag of words. More advanced representations utilize TF-IDF, POS (Part of Speech) information, sentiment lexicons, and parse structures.

Unsupervised approaches to document-level sentiment analysis are based on determining the semantic orientation (SO) of specific phrases within the document. If the average SO of these phrases is above some predefined threshold the document is classified as positive and otherwise it is deemed negative. There are two main approaches to the selection of the phrases: a set of predefined POS patterns can be used to select these phrases36 or a lexicon of sentiment words and phrases can be used.34 A classic method to determine the SO of a given word or phrase is to calculate the difference between the PMI (Pointwise Mutual Information) of the phrase with two sentiment words.36 PMI(P,W) measures the statistical dependence between the phrase P and the word W based on their co-occurrence in a given corpus or over the Web (by utilizing Web search queries). The two words used in Turney36 are ‘excellent’ and ‘poor.’ The SO measures whether P is closer in meaning to the positive word (‘excellent’) or the negative word (‘poor’).

A few researchers1,37 have used machine translation to perform document-level sentiment analysis in languages such as Chinese and Spanish that lack the vast linguistic resources available in English. (Their method works by translating the documents to English and then performing sentiment analysis on these documents using a sentiment analyzer in English.)

Figure 1. Architecture of a generic sentiment analysis system. [Diagram components: Corpus; Document Processing; Lexicons and Linguistic Resources; Document Analysis; Sentiment Scores for Entities and Aspects.]

Sentence-Level Sentiment Analysis

A single document may contain multiple opinions even about the same entities. When we want to have a more fine-grained view of the different opinions expressed in the document about the entities we must move to the sentence level.

We assume here that we know the identity of the entity discussed in the sentence. We further assume there is a single opinion in each sentence. This assumption can be relaxed by splitting the sentence into phrases where each phrase contains just one opinion. Before analyzing the polarity of the sentences we must determine if the sentences are subjective or objective. Only subjective sentences will then be further analyzed. (Some approaches also analyze objective sentences, which are more difficult.) Most methods use supervised approaches to classify the sentences into the two classes.40 A bootstrapping approach was suggested in Riloff and Wiebe32 in order to reduce the amount of manual labor needed when preparing a large training corpus. A unique approach based on minimum cuts was proposed in Pang and Lee.26 The main premise of their approach is that neighboring sentences should have the same subjectivity classification.

After we have zeroed in on the subjective sentences we can classify these sentences into positive or negative classes. As mentioned earlier, most approaches to sentence-level sentiment analysis are either based on supervised learning17 or on unsupervised learning.40 The latter approach is similar in nature to that of Turney,36 except that it uses a modified log-likelihood ratio instead of PMI and the number of seed words that are used to find the SO of the words in the sentence is much larger.

Recent research24 has shown that it is advisable to handle different types of sentences by different strategies. Sentences that need unique strategies include conditional sentences, question sentences and sarcastic sentences. Sarcasm is extremely difficult to detect and it exists mainly in political contexts. One solution for identifying sarcastic sentences is described in Tsur et al.35
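
The supervised document-level approach described above (a bag-of-words representation fed to a standard classifier such as Naïve Bayes) can be sketched in a few lines of Python. This is only an illustrative sketch, not the setup used in the cited research: the function names, the tokenizer, and the four-document training set are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into punctuation-stripped word tokens."""
    tokens = (tok.strip(".,!?;:\"'()").lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def train_naive_bayes(labeled_docs):
    """Count documents per class and word occurrences per class (bag of words)."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocabulary.add(tok)
    return class_counts, word_counts, vocabulary


def predict(model, text):
    """Return the class with the highest log-probability under multinomial Naive Bayes."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior of the class
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok in vocabulary:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log(
                    (word_counts[label][tok] + 1) / (total_words + len(vocabulary))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set, for illustration only.
TRAINING = [
    ("the room was clean and the staff were helpful", "positive"),
    ("comfortable bed and great location", "positive"),
    ("the room was dirty and the staff were rude", "negative"),
    ("noisy room and terrible service", "negative"),
]
MODEL = train_naive_bayes(TRAINING)
print(predict(MODEL, "clean room and helpful staff"))
```

A real system would train on thousands of labeled reviews; adding a neutral class only requires a third label in the training data.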

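
The PMI-based semantic orientation used by the unsupervised approach can be sketched as follows. The sketch assumes co-occurrence is counted at the document level in a small local corpus rather than through Web search queries, uses naive substring matching, and adds a small smoothing constant to avoid division by zero; the function names and the four-document corpus are hypothetical.

```python
import math


def semantic_orientation(phrase, documents, pos_word="excellent",
                         neg_word="poor", smoothing=0.01):
    """SO(phrase) = PMI(phrase, pos_word) - PMI(phrase, neg_word).

    Counts are document-level co-occurrences; substring matching is a
    simplification (for example, 'poor' also matches 'poorly').
    """
    docs = [doc.lower() for doc in documents]
    phrase = phrase.lower()

    def hits(*terms):
        # Number of documents that contain every one of the given terms.
        return sum(all(term in doc for term in terms) for doc in docs)

    # The phrase's own frequency and the corpus size cancel out when the
    # two PMI values are subtracted, leaving this single ratio.
    numerator = (hits(phrase, pos_word) + smoothing) * (hits(neg_word) + smoothing)
    denominator = (hits(phrase, neg_word) + smoothing) * (hits(pos_word) + smoothing)
    return math.log2(numerator / denominator)


def classify_document(phrases, documents, threshold=0.0):
    """Label a document positive if the average SO of its phrases exceeds the threshold."""
    if not phrases:
        raise ValueError("at least one phrase is required")
    scores = [semantic_orientation(p, documents) for p in phrases]
    average = sum(scores) / len(scores)
    return "positive" if average > threshold else "negative"


# Hypothetical reference corpus, for illustration only.
DOCS = [
    "excellent hotel with a spacious suite",
    "the spacious suite was excellent",
    "poor service and a noisy room",
    "noisy room, poor value",
]
print(semantic_orientation("spacious suite", DOCS))
print(classify_document(["noisy room"], DOCS))
```

The phrases passed to `classify_document` would come from POS patterns or a sentiment lexicon, as described in the text; the threshold of 0.0 is an arbitrary default.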



Aspect-Based Sentiment Analysis

The two previous approaches work well when either the whole document or each individual sentence refers to a single entity. However, in many cases people talk about entities that have many aspects (attributes) and they have a different opinion about each of the aspects. This often happens in reviews about products or in discussion forums dedicated to specific product categories (such as cars, cameras, smartphones, and even pharmaceutical drugs). As an example here is a review of Kindle Fire taken from the Amazon website:

“As a long-time Kindle fan I was eager to get my hands on a Fire. There are some great aspects; the device is quick and for the most part dead-simple to use. The screen is fantastic with good brightness and excellent color, and a very wide viewing angle. But there are some downsides too; the small bezel size makes holding it without inadvertent page-turns difficult, the lack of buttons makes controls harder, the accessible storage memory is limited to just 5GB.”

Classifying this review as either positive or negative toward the Kindle would totally miss the valuable information encapsulated in it. The author provides feedback about many aspects of the Kindle (like speed, ease of use, screen quality, bezel size, buttons, and storage memory size). Some of these aspects are reviewed positively while some of the others get a negative sentiment.

Aspect-based sentiment analysis (also called feature-based sentiment analysis) is the research problem that focuses on the recognition of all sentiment expressions within a given document and the aspects to which they refer.

The classic approach to identifying all aspects in a corpus of product reviews, which is used by many commercial companies, is to extract all noun phrases (NPs) and then keep just the NPs whose frequency is above some experimentally determined threshold.12 One refinement is to reduce the noise in the found NPs.30 The main idea is to measure for each candidate NP the PMI with phrases that are tightly related to the product category (like phones, printers, or cameras). Only those NPs that have a PMI above a learned threshold are retained. For instance, for the printer category such phrases would be “printer comes with” or “printer has.”

Another approach to aspect identification is to use a phrase dependency parser that utilizes known sentiment expressions to find additional aspects (even infrequent ones).39

We can also view the problem of aspect identification as an information extraction problem and then use a tagged corpus to train a sequence classifier such as a Conditional Random Field (CRF)18 to find the aspects.14

I have just discussed identification of explicit aspects, that is, aspects that are mentioned explicitly in the sentences. However, there are many aspects that are not mentioned explicitly in the sentences but can be inferred from the sentiment expressions that mention them implicitly. These aspects are called implicit aspects. Examples of such aspects are weight, which can be inferred from the fragment “this phone is too heavy,” or size, which can be inferred from “the camera is quite compact.” One way to extract such implicit aspects is suggested in Hai et al.10 where a two-phase co-occurrence association rule mining approach is used to match implicit aspects (sentiment expressions) with explicit aspects.

Once the aspects and the sentiment expressions have been identified, we can use a simple algorithm2 that determines the polarity of each sentiment expression based on a sentiment lexicon, sentiment shifters (such as negation words), and special handling of adversative conjunctions, such as ‘but.’ The final polarity of each aspect is determined by a weighted average of the polarities of all sentiment expressions, each weighted inversely by its distance from the aspect.

Comparative Sentiment Analysis

In many cases users do not provide a direct opinion about one product but instead provide comparable opinions such as in these sentences taken from the user forums of Edmunds.com: “300 C Touring looks so much better than the Magnum,” “I drove the Honda Civic, it does not handle better than the TSX, not even close.” The goal of the sentiment analysis system in this
case is to identify the sentences that contain comparative opinions, and to extract the preferred entity(-ies) in each opinion.

One of the pioneering papers on comparative sentiment analysis is Jindal and Liu.15 This paper found that using a relatively small number of words we can cover 98% of all comparative opinions. These words are:

• Comparative adjectives and adverbs such as: ‘more,’ ‘less,’ and words ending with –er (for example, ‘lighter’).
• Superlative adjectives and adverbs such as: ‘most,’ ‘least,’ and words ending with –est (for example, ‘finest’).
• Additional phrases such as ‘favor,’ ‘exceed,’ ‘outperform,’ ‘prefer,’ ‘than,’ ‘superior,’ ‘inferior,’ ‘number one,’ and ‘up against.’

Since these words lead to a very high recall, but low precision, a naïve Bayes classifier was used to filter out sentences that do not contain comparative opinions. The classifier used sequential patterns as features. The sequential patterns were discovered by the class sequential rule (CSR) mining algorithm. A simple algorithm to identify the preferred entities based on the type of comparative used and the presence of negation is described in Ding et al.3

Sentiment Lexicon Acquisition

As we have seen in the previous discussion, the sentiment lexicon is the most crucial resource for most sentiment analysis algorithms. Here, I briefly mention a few approaches for the acquisition of the lexicon. There are three options for acquiring the sentiment lexicon: manual approaches in which people code the lexicon by hand, dictionary-based approaches in which a set of seed words is expanded by utilizing resources like WordNet,8 and corpus-based approaches in which a set of seed words is expanded by using a large corpus of documents from a single domain.

Clearly, the manual approach is in general not feasible as each domain requires its own lexicon and such a laborious effort is prohibitive. I will focus on the other two approaches. The dictionary-based approach starts with a small set of seed sentiment words suitable for the domain at hand. This set of words is then expanded by using WordNet’s synonyms and antonyms. One elegant algorithm is proposed in Kamp et al.16 The method defines distance d(t1, t2) between terms t1 and t2 as the length of the shortest path between t1 and t2 in WordNet. The orientation of t is defined as SO(t) = (d(t, bad) − d(t, good))/d(good, bad). |SO(t)| is the strength of the sentiment of t, SO(t) > 0 entails t is positive, and t is negative otherwise. The main disadvantage of any dictionary-based algorithm is that the acquired lexicon is domain independent and hence does not capture the specific peculiarities of any specific domain. More advanced dictionary-based approaches are reported in Dragut et al.4 and Peng and Park.29

If we want to create a domain-specific sentiment lexicon we have to use one of the many corpus-based algorithms. A classic work11 in this area introduced the concept of sentiment consistency, which enables one to identify additional adjectives whose polarity is consistent with that of a set of seed adjectives. A set of linguistic connectors (AND, OR, NEITHER-NOR, EITHER-OR) was used to find adjectives that are connected to adjectives with known polarity. Consider the sentence “the phone is both powerful and light.” If we know that ‘powerful’ is a positive word, we can assume that by utilizing the connector AND the word ‘light’ is positive as well. In order to eliminate noise the algorithm created a graph of adjectives by using connections induced by the corpus and after a clustering step, positive and negative clusters are formed.

An approach called double propagation for simultaneous acquisition of a domain-specific sentiment lexicon and a set of aspects was introduced in Qiu et al.31 This approach used the minipar19 parser to parse the sentences in the corpus and find associated aspects and sentiment expressions. The algorithm starts with a seed set of sentiment expressions and uses a set of predefined dependency rules and the minipar parser to find aspects that are connected to the sentiment expressions. It then uses the found aspects to find more sentiment expressions that in turn find more aspects. This mutual bootstrapping process stops when no more aspects or sentiment expressions can be added. For example, in “Kindle Fire has an amazing display,” the adjective ‘amazing’ modifies the noun ‘display,’ so given that ‘amazing’ is a sentiment expression and we have the rule “a noun which is modified by a sentiment expression is an aspect,” we can extract ‘display’ as an aspect. Conversely, if we know ‘display’ is an aspect, then using a similar rule we can infer that ‘amazing’ is a sentiment expression. The algorithm uses several additional constraints to reduce the effect of noise.

Migrating a sentiment lexicon from one domain to another domain was studied in Du et al.5 An algorithm for acquiring a slightly different type of lexicon called a connotation lexicon is reported in Feng et al.9 A connotation lexicon contains words that express sentiment either explicitly or implicitly. For instance, award and promotion have positive connotations and cancer and war have negative connotations.

Applications

The most common application of sentiment analysis is in the area of reviews of consumer products and services. There are many websites that provide automated summaries of reviews about products and about their specific aspects. A notable example of that is “Google Product Search.”

Twitter and Facebook are a focal point of many sentiment analysis applications. The most common application is monitoring the reputation of a specific brand on Twitter and/or Facebook. One application that performs real-time analysis of tweets that contain a given term is tweetfeel (http://www.tweetfeel.com).

Sentiment analysis can provide substantial value to candidates running for various positions. It enables campaign managers to track how voters feel about different issues and how they relate to the speeches and actions of the candidates. An analysis of tweets related to the 2010 campaign can be found at http://www.nytimes.com/interactive/us/politics/2010-twitter-candidates.html.

Another important domain for sentiment analysis is the financial markets. There are numerous news items, articles, blogs, and tweets about each public company. A sentiment analysis system can use these various sources to find articles that discuss the companies and aggregate the sentiment about them as a single score that can be used by an automated trading system. One such system is The Stock Sonar (http://www.thestocksonar.com).7 This system (developed by Digital Trowel) shows graphically the daily positive and negative sentiment about each stock alongside the graph of the price of the stock. An example of such a graph is shown in Figure 2. The sentiment for CHK is extremely negative and indeed the stock went down considerably between April 21, 2012 and May 22, 2012. The graph is interactive, so a click on any point will reveal the events and sentiment expressions behind the various increases in positive or negative sentiment, as shown in Figure 3.

Figure 2. Sentiment graph of Chesapeake Energy (http://www.thestocksonar.com).

StockTwits (http://www.stocktwits.com) is a site that shows all tweets that contain at least one stock ticker (a ‘$’ sign must precede the ticker of the stock to signal it is a ticker). The following are three tweets about Google (Ticker: GOOG) from Sunday, July 29, 2012.

1. IMO, if market up Monday, $PCLN $AAPL look much better for Call options plays than $GOOG. $GOOG needs a little rest
2. $GOOG Monday will probably prove to be a nice shorting opportunity. I’m guessing it will close at or at least trade to 625.
3. Slag $MSFT all u want, but it gets how TV is evolving. The next gen Kinect is something I want to buy, NOT $GOOG TV v.39.3 beta

Detecting sentiment on the first tweet will be done by utilizing comparative sentiment analysis techniques. We will conclude that the writer is positive on PriceLine (PCLN) and Apple (AAPL) and negative on
Google. Analyzing the second tweet will reveal a negative sentiment on Google (shorting opportunity). Since Google closed on Friday, July 27, 2012 at $634.96, the author predicts a down movement of 1.57% to $625. Clearly, we need to be able to get historical prices of stocks to do proper analysis of the tweets. The third and last tweet is the most difficult to analyze since it requires background knowledge not available inside the tweet. We need to know that Kinect is a product of Microsoft (MSFT) and hence the author has a positive opinion on MSFT and a negative opinion on Google (by utilizing the sentiment shifter “NOT”). These examples show some of the challenges facing sentiment analysis systems when trying to analyze short messages that include reference to additional objects (products and stock prices in this case). The systems must utilize background knowledge in order to determine the relationship between the sentiment targets and the other objects.

Figure 3. The negative events for CHK on May 9th (http://www.thestocksonar.com).

An application that utilizes comparative sentiment analysis to assess the market structure of sedan cars and drugs for diabetes is described in Netzer et al.25 In Figure 4 we can see a visual map that shows the various connections between drugs and symptoms. Two types of connections are extracted by the sentiment analysis system: Drug Causes Symptom (negative, shown in red) and Drug Remedies Symptom (positive, shown in blue).

Figure 4. Drugs and symptoms (diabetes forums) based on extractions done by Visual Care (http://www.digitltrowel.com). [Network map linking drugs (for example, Metformin, Byetta, Januvia, Lipitor) to symptoms (for example, nausea, rash, dizziness, headache).]

Research Issues

There are many open research issues in sentiment analysis, including:

1. There is a need for better modeling of compositional sentiment. At the sentence level, this means more accurate calculation of the overall sentence sentiment from the sentiment-bearing words, the sentiment shifters, and the sentence structure.
2. Each product has many names that refer to it even within the same document and clearly across documents. This issue of automatic entity resolution is not yet solved. Another related major hurdle is handling of anaphora resolution in an accurate way. This is a problem for aspect extraction too, that is, how to group aspects; for example, “battery life” and “power usage” refer to the same aspect of a phone.
3. When a document discusses several entities, it is crucial to identify the text relevant to each entity. Current accuracy in identifying the relevant text is far from satisfactory.
4. Although there are some approaches that use classification methods to identify sarcasm, they are not yet integrated within autonomous sentiment analysis systems.
5. Noisy texts (those with spelling/grammatical mistakes, missing/problematic punctuation and slang) are still a big challenge to most sentiment analysis systems.
6. Many of the statements about entities are factual in nature and yet they still carry sentiment. Current sentiment analysis approaches determine the sentiment of subjective statements and overlook such objective statements. There is a need for algorithms that use context to attach sentiment scores to objective (factual) statements. Such statements occur frequently in news articles.

Resources

The following resources contain sentiment lexicons that can be used within sentiment analysis systems:

• General Inquirer lexicon;33 http://www.wjh.harvard.edu/~inquirer/spreadsheet_guide.htm.
• Sentiment lexicon;13 http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html.
• MPQA subjectivity lexicon;38 http://www.cs.pitt.edu/mpqa/subj_lexicon.html.
• SentiWordNet;6 http://sentiwordnet.isti.cnr.it/.
• Emotion lexicon;23 http://www.purl.org/net/emolex.
• Financial Sentiment Lexicons (suited for the determination of the sentiment of financial documents);22 http://nd.edu/~mcdonald/Word_Lists.html.

Acknowledgments

I thank Lyle Ungar, Bing Liu, Benjamin Rosenfeld, and Roy Bar-Haim for helpful comments on drafts of this article.

References

1. Brooke, J., Tofiloski, M. and Taboada, M. Cross-linguistic sentiment analysis: From English to Spanish. In Proceedings of RANLP (2009).
2. Ding, X., Liu, B. and Yu, P.S. A holistic lexicon-based approach to opinion mining. In Proceedings of the Conference on Web Search and Web Data Mining (2008).
3. Ding, X., Liu, B. and Zhang, L. Entity discovery and assignment for opinion mining applications. In Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2009).
4. Dragut, E.C., Yu, C., Sistla, P. and Meng, W. Construction of a sentimental word dictionary. In Proceedings of ACM International Conference on Information and Knowledge Management (2010).
5. Du, W., Tan, S., Cheng, X. and Yun, X. Adapting information bottleneck method for automatic construction of domain-oriented sentiment lexicon. In Proceedings of ACM International Conference on Web Search and Data Mining (2010).
6. Esuli, A. and Sebastiani, F. Determining term subjectivity and term orientation for opinion mining. In Proceedings of Conf. of the European Chapter of the Association for Computational Linguistics (2006).
7. Feldman, R., Rosenfeld, B., Bar-Haim, R. and Fresko, M. The Stock Sonar—Sentiment Analysis of Stocks Based on a Hybrid Approach. IAAI-12 (2011), 1642–1647.
8. Fellbaum, C.D. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA, 1998.
9. Feng, S., Bose, R. and Choi, Y. Learning general connotation of words using graph-based algorithms. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (Edinburgh, Scotland, UK, 2011). 1092–1103.
10. Hai, Z., Chang, K. and Kim, J-j. Implicit feature identification via co-occurrence association rule mining. Computational Linguistics and Intelligent Text Processing (2011), 393–404.
11. Hatzivassiloglou, V. and McKeown, K. Predicting the semantic orientation of adjectives. In Proceedings of the Joint ACL/EACL Conference (1997), 174–181.
12. Hu, M. and Liu, B. Mining and summarizing customer reviews. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2004), 168–177.
13. Hu, M. and Liu, B. Mining opinion features in customer reviews. In Proceedings of AAAI (2004), 755–760.
14. Jakob, N. and Gurevych, I. Extracting opinion targets in a single- and cross-domain setting with conditional random fields. In Proceedings of Conference on Empirical Methods in Natural Language Processing (2010).
15. Jindal, N. and Liu, B. Identifying comparative sentences in text documents. In Proceedings of ACM SIGIR Conf. on Research and Development in

by common words and phrases: Using Mechanical Turk to create an emotion lexicon. In Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text (2010).
24. Narayanan, R., Liu, B. and Choudhary, A. Sentiment analysis of conditional sentences. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (Singapore, 2009). Association for Computational Linguistics, 180–189.
25. Netzer, O., Feldman, R., Fresko, M. and Goldenberg, Y. Mine your own business: Market structure surveillance through text mining. Marketing Science, 2012.
26. Pang, B. and Lee, L. A Sentimental Education: Sentiment Analysis using Subjectivity Summarization based on minimum cuts. In Proceedings of the Association for Computational Linguistics (2004), 271–278.
27. Pang, B. and Lee, L. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval 2, 1-2 (2008), 1–135.
28. Pang, B., Lee, L. and Vaithyanathan, S. Thumbs up? Sentiment Classification using machine learning techniques. In Proceedings of EMNLP-02, 7th Conference on Empirical Methods in Natural Language Processing (Philadelphia, PA, 2002). Association for Computational Linguistics, Morristown, NJ, 79–86.
29. Peng, W. and Park, D.H. Generate adjective sentiment dictionary for social media sentiment analysis using constrained nonnegative matrix factorization. In Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media (2011).
30. Popescu, A.-M. and Etzioni, O. Extracting product features and opinions from reviews. In Proceedings of Conference on Empirical Methods in Natural Language Processing (2005).
31. Qiu, G., Liu, B., Bu, J. and Chen, C. Opinion word expansion and target extraction through double propagation. Computational Linguistics 37, 1 (2011), 9–27.
32. Riloff, E. and Wiebe, J. Learning extraction patterns for subjective expressions. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (2003).
33. Stone, P. The general inquirer: A computer approach to content analysis. Journal of Regional Science 8, 1 (1968).
34. Taboada, M., Brooke, J., Tofiloski, M., Voll, K. and Stede, M. Lexicon-based methods for sentiment analysis. Computational Linguistics 37, 2 (2011), 267–307.
35. Tsur, O., Davidov, D. and Rappoport, A. A great catchy name: Semi-supervised recognition of sarcastic sentences in online product reviews. In Fourth International AAAI Conference on Weblogs and Social Media (2010).
36. Turney, P. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the Association for Computational Linguistics (2002), 417–424.
37. Wan, X. Using bilingual knowledge and ensemble techniques for unsupervised Chinese sentiment analysis. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (Honolulu, Hawaii, 2008). Association for Computational Linguistics, 553–561.
Information Retrieval (2006).
Conclusion 16. Kamps, J., Marx, M., Mokken, R.J. and de Rijke, M.
38. Wilson, T., Wiebe, J. and Hoffmann, P. Recognizing
contextual polarity in phrase-level sentiment analysis.
This article reviewed some of the main Using WordNet to measure semantic orientation of In Proceedings of the Human Language Technology
adjectives. LREC, 2004.
research problems within the field of 17. Kim, S.-M. and Hovy, E. Crystal: Analyzing predictive
Conference and the Conference on Empirical Methods
in Natural Language Processing (2005), 347–354.
sentiment analysis and discussed opinions on the Web. In Proceedings of the Joint 39. Wu, Y., Zhang, Q. Huang, X. and Wu, L. Phrase
Conference on Empirical Methods in Natural Language
several algorithms that aim to solve Processing and Computational Natural Language
dependency parsing for opinion mining. In Proceedings
of Conference on Empirical Methods in Natural
each of these problems. I have also Learning (2007). Language Processing (2009).
18. Lafferty, J., McCallum, A. and Pereira, F. Conditional
described some of the major applica- random fields: Probabilistic models for segmenting
40. Yu, H. and Hatzivassiloglou, V. Towards answering
opinion questions: Separating facts from opinions
tions of sentiment analysis and provid- and labeling sequence data. In Proc. 18th International and identifying the polarity of opinion sentences. In
Conf. on Machine Learning. Morgan Kaufmann, San
ed a few major open challenges. Many Proceedings of the Conference on Empirical Methods
Francisco, CA, 2001, 282–289.
in Natural Language Processing (2003).
of the commercial sentiment analysis 19. Lin, D. Minipar; http://webdocs.cs.ualberta.ca/ lindek/
minipar.htm. 2007.
systems still use simplistic techniques 20. Liu, B., Sentiment analysis and subjectivity. Handbook Ronen Feldman (Ronen.Feldman@huji.ac.il) is a professor
in order to avoid these open challeng- of Natural Language Processing. N. Indurkhya and F.J. of information systems in the School of Business
Damerau, eds. 2010. Administration at The Hebrew University of Jerusalem,
es and hence their performance leaves 21. Liu, B. Sentiment analysis and opinion mining. Israel.
a lot to be desired. Providing satisfac- Synthesis Lectures on Human Language Technologies.
Morgan & Claypool Publishers, 2012.
tory solutions to these challenges will 22. Loughran, T. and Mcdonald, B. When is a liability not a
liability? Textual analysis, dictionaries, and 10-Ks. The
make the area of sentiment analysis Journal of Finance 66, 1 (2011), 35-65.
far more widespread. 23. Mohammad, S.M. and Turney, P.D. Emotions evoked © 2013 ACM 0001-0782/13/04

april 2013 | vol. 56 | no. 4 | communications of the acm 89
