Dylan Valerio
University of the Philippines Diliman
dylan_valerio@yahoo.com
1. INTRODUCTION
The new user-centric Web hosts a large volume of data created by its users. Social media have turned ordinary users into co-creators of web content rather than passive consumers. Opinions expressed in social media, in the form of reviews or opinion posts, constitute an important and interesting topic worthy of exploration and exploitation. With the increasing accessibility of opinion resources such as movie reviews and social network tweets, and the availability of sentiment lexicons, opinion analysis is currently one of the more interesting problems in Natural Language Processing. The new challenge is to mine opinions from large volumes of text and devise suitable learning algorithms to understand the opinions of others.
Sentiment analysis refers to the use of language processing to extract the opinions and beliefs of people from electronic documents. It deals with analyzing and identifying expressions of the writer's opinion and mood. Sentiment analysis has been successfully used in the literature to analyze the subjective content of product reviews [1][2][3]. A task common to sentiment analysis is recognizing the sentiment polarity of a given document. Given a movie review R, sentiment polarity analysis is the binary classification task of inferring whether review R, represented as a set of words, belongs to one of the polarity classes p ∈ {+1, -1}, with +1 implying positive polarity and -1 implying negative polarity.
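As a minimal illustration of this task framing (not the approach proposed later in this paper), a purely word-level scorer could assign a review the sign of its summed word scores; the tiny lexicon below is hypothetical, for illustration only.

```python
# Toy illustration of sentiment polarity classification: a review is
# mapped to +1 (positive) or -1 (negative) by the sign of its summed
# word scores. The lexicon here is a hypothetical placeholder.
TOY_LEXICON = {"riveting": 1.0, "wonderfully": 0.8, "unfunny": -1.0, "badly": -0.7}

def polarity(review_words):
    score = sum(TOY_LEXICON.get(w, 0.0) for w in review_words)
    return 1 if score >= 0 else -1

print(polarity(["a", "riveting", "story"]))           # -> 1
print(polarity(["badly", "made", "and", "unfunny"]))  # -> -1
```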
A common approach to extracting polarity is analyzing word-level features. The occurrence of a word is a factor in computing the polarity of a sentence, which in turn contributes to the polarity of the entire document. This bag-of-words representation has led to varying success. However, we recognize that this representation is oversimplified. Sentences have a structure in which words either diminish, negate, or intensify polarity. Complex style, such as humor, sarcasm, and intelligence, is built on top of this structure; thus words, analyzed only individually, lose their context. Traditional methods, such as mapping words to negative and positive polarities with varying strengths, are ineffective at incorporating structure and context into sentiment analysis.
Consider the following sentences:
1. Enigma is well-made, but it's just too dry and too placid.
2. The story loses its bite in a last-minute happy ending that's even less plausible than the rest of the picture. Much of the way, though, this is a refreshingly novel ride.
At the onset, the first movie review above is positive, but the second clause has a more intense negative polarity, so the review should be considered negative. The second movie review appears to have more negative phrases, but at the end the author reverses the polarity of the document, giving it a positive sentiment. In this study, we discuss different approaches to incorporating structure and context into sentiment polarity classification.
2. RELATED LITERATURE
Nakagawa et al. used conditional random fields with hidden variables to capture word dependencies in sentiment analysis [1]. Instead of individual words, they used dependency trees to group words into subjective phrases that have a root head word. Obtaining the polarity of a phrase is contingent on obtaining the dependencies of words and their interactions with their respective parent words in the dependency tree. The hidden nodes capture information about the dependencies among words as well as the sentiments of phrases. Central to their paper is the idea of polarity reversal, which reverses the polarity of a phrase when the head word is part of a polarity reversal lexicon. They tested their algorithm on 4 Japanese and 4 English corpora and found that their Tree-CRF worked significantly better than the best baseline models that also take polarity reversal into account.
Matsumoto et al. also used dependency subtrees to incorporate structure in their work [3]. They used the frequent pattern mining techniques PrefixSpan and FREQT to extract frequently occurring word subsequences and subtrees. Due to the exponential growth of sequences over a corpus, they constrained their sentences to short clauses (SBAR), then removed some punctuation and words with part-of-speech tags irrelevant to the sentiment analysis task. They then used a feature vector representation of their corpus as input to a linear support vector machine.
Zhang et al. treated sentiment analysis as an information extraction task [2]. First, they used conditional random fields to recognize the features of a reviewed product, for example the camera or the battery life of a mobile phone. Second, they extracted the author's opinion about each recognized feature. Lastly, they extracted the opinion's polarity. The first task, product feature recognition, can be considered an entity recognition task in which the entity is the product feature. Using a state-of-the-art sequence tagger, conditional random fields, the authors delivered results such as [positive] [battery life] and [negative] [camera].
Common to most of the related literature described above is the attempt to capture sentence structure, specifically to recognize polarity reversal. Abu-Jbara and Radev discussed negation detection and resolution in their work [4]. They used an available corpus, the *SEM Shared Task 2012 corpus, which contains annotated works of Sir Arthur Conan Doyle: The Hound of the Baskervilles and The Adventure of Wisteria Lodge. The paper focused on three tasks: negation cue detection, negation scope resolution, and negated event detection.
3. METHODOLOGY
3.1 Data Set
For this paper, we used the corpus available from Rotten
Tomatoes, a website that lets users rate and review movies [15].
There is a total of 10,662 movie reviews, with an even number of
positive and negative sentiments. There are a total of 18,342
unique words. For the negative reviews only, there exist 5761
unique words not found in the positive reviews. On the other
hand, 5422 unique words are found in the negative reviews. A
table of the most frequent words unique to each set is outlined at
Table 1.
Table 1. Most frequent words unique to each sentiment class.

Word unique to Positive    Frequency    Word unique to Negative    Frequency
riveting                   20           unfunny                    26
gem                        17           badly                      25
wonderfully                15           poorly                     19
detailed                   14           disguise                   17
heartwarming               14           pointless                  17
lively                     14           seagal                     17
vividly                    14           bore                       16
polished                   13           benigni                    15
spare                      13           product                    14
tour                       13           pinocchio                  13
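The class-unique vocabularies and frequencies above can be obtained with simple set differences over per-class word counts; the sketch below uses placeholder reviews rather than the actual corpus.

```python
from collections import Counter

# Placeholder corpora; the real data set has 10,662 Rotten Tomatoes reviews.
positive_reviews = [["a", "riveting", "gem"], ["wonderfully", "detailed"]]
negative_reviews = [["unfunny", "and", "pointless"], ["badly", "made"]]

def vocab_counts(reviews):
    # Count word occurrences across all reviews of one class.
    return Counter(w for review in reviews for w in review)

pos, neg = vocab_counts(positive_reviews), vocab_counts(negative_reviews)
# Words appearing in one class but not the other (dict key views support set ops).
only_pos = {w: pos[w] for w in pos.keys() - neg.keys()}
only_neg = {w: neg[w] for w in neg.keys() - pos.keys()}
print(sorted(only_neg))  # -> ['and', 'badly', 'made', 'pointless', 'unfunny']
```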
3.3.1 Phrase-Level Sentiment Propagation
In this approach, sentiment scores are first calculated at the word level, contextual changes are then propagated to the phrase level, and finally the overall sentiment polarity is computed by summing the sentiment scores of the phrases. This approach is often called recursive sentiment analysis. Sentiment scores are propagated in a bottom-up manner, beginning with the prior polarity of each word and ending with the root of the tree, which contains the overall sentiment polarity.
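The bottom-up propagation described above can be sketched as a recursion over a tree of words and phrases; the node structure and the negation handling here are simplifying assumptions, not the paper's exact Java implementation.

```python
# Sketch of recursive (bottom-up) sentiment propagation. Each node carries
# a prior word-level score; a node marked as a negator flips the polarity
# of the subtree it governs. The root score gives the overall polarity.
class Node:
    def __init__(self, prior=0.0, negates=False, children=()):
        self.prior = prior          # prior word-level sentiment score
        self.negates = negates      # True for negators such as "not"
        self.children = list(children)

def propagate(node):
    score = node.prior + sum(propagate(c) for c in node.children)
    return -score if node.negates else score

# "not (very good)": the negator reverses its positive subtree.
tree = Node(negates=True, children=[Node(prior=0.9)])
print(propagate(tree))  # -> -0.9
```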
Aspect Phrases
Aspect extraction is an important preliminary phase since aspect phrases serve as the scope for sentiment word interactions. They are also a critical component in the computation of the overall sentiment score, which in turn determines the output sentiment polarity. Review aspect phrases are essentially non-overlapping word segments that form base noun phrases. For an input movie review, we extract the aspect noun phrases using a phrase chunker implemented in CRF++ [14].
Computation:
Current polarity score of the affected word * -1, if a negator is present
3. Reversers (Long-range Reversers)
The effects of reversers are similar to those of negators. As observed in the movie review domain, reverser terms such as but affect not just individual words but also the phrases preceding them.

Examples
(1) [The concept] is okay, [the scenery] is great and the [acting] is fine but the [movie] is too long

In the above example, the overall sentiment polarity is correctly classified as negative. The three preceding aspect phrases all receive reversed scores when but is propagated.
Computation:
For every preceding phrase, phrase score * -1, if a reverser is present
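A minimal sketch of the two shifter computations above, assuming phrase segmentation and prior scoring are already done; the phrase texts and scores below are illustrative.

```python
# Sketch of the negator and long-range reverser computations. When a
# reverser such as "but" is seen, the scores of all preceding phrases
# are multiplied by -1, matching the computation given above.
REVERSERS = {"but"}

def negate(word_score):
    # Negator computation: current polarity score of the affected word * -1.
    return -word_score

def apply_reversers(phrases):
    """phrases: list of (text, score); a reverser appears as its own item."""
    scores = []
    for text, score in phrases:
        if text in REVERSERS:
            scores = [-s for s in scores]  # reverse every preceding phrase score
        else:
            scores.append(score)
    return sum(scores)

phrases = [("the concept is okay", 0.3), ("the scenery is great", 0.8),
           ("the acting is fine", 0.4), ("but", 0.0),
           ("the movie is too long", -0.6)]
print(round(apply_reversers(phrases), 2))  # -> -2.1 (overall negative)
```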
Sentiment Lexicons
1.) We used SentiWords 1.0 to calculate the prior sentiment score of each word. SentiWords 1.0 is a freely available sentiment lexicon containing 155,000 words, each associated with a real-valued sentiment score. Sentiment scores are learned from SentiWordNet. SentiWords was built using the method described in Guerini et al. (2013) [5] and the dataset presented in Warriner et al. (2013). It was downloaded from https://hlt.fbk.eu/technologies/sentiwords
2.) To obtain the list of contextual shifters, we collected annotated words from the Harvard General Inquirer Lexicon. The lexicon is a collection of syntactic, semantic, and pragmatic information attached to part-of-speech tagged words. It was downloaded from http://www.wjh.harvard.edu/~inquirer/
3.) The prior sentiment score of an aspect phrase depends heavily on the prior scores of the nouns within it. An issue we encountered was that SentiWords 1.0 also assigns high scores to generally neutral nouns, biasing classification toward the mere presence of non-domain words. To circumvent this, we limit the sentiment effect of nouns: for each noun, we check whether it is related to the movie review domain by looking it up in a domain lexicon before obtaining its prior sentiment score from SentiWords. The movie review domain lexicon was obtained from http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
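The noun-filtering step might look like the following sketch; both lexicons below are tiny placeholders standing in for SentiWords 1.0 and the movie-domain lexicon.

```python
# Sketch of the noun-filtering step: a noun contributes its SentiWords
# prior only if it appears in the movie-domain lexicon; other nouns are
# treated as neutral. Lexicon contents here are small placeholders.
SENTIWORDS = {"masterpiece": 0.8, "table": 0.4, "acting": 0.1}   # placeholder priors
MOVIE_DOMAIN = {"masterpiece", "acting", "plot", "director"}     # placeholder domain terms

def noun_prior(word):
    if word not in MOVIE_DOMAIN:
        return 0.0                      # suppress non-domain nouns
    return SENTIWORDS.get(word, 0.0)

print(noun_prior("masterpiece"))  # -> 0.8
print(noun_prior("table"))        # -> 0.0 (neutral noun, lexicon score suppressed)
```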
Sentiment Polarity Classification Algorithm
The algorithm for the phrase-level sentiment analysis is detailed below. The classifier is implemented in Java, with the necessary language processing modules downloaded from The Stanford Natural Language Processing Group at http://nlp.stanford.edu/software/index.shtml
a. Negators
b. Adverb modifiers (diminishers and intensifiers)
c. Adjective modifiers (diminishers and intensifiers)
d. Verb object modifiers
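Steps (b) and (c) above could plausibly be realized as multiplicative scaling of the modified word's score; the modifier list and weights below are illustrative assumptions, not values from this paper.

```python
# Sketch of intensifier/diminisher handling: a modifier scales the score
# of the word it modifies. Weights are illustrative assumptions.
MODIFIER_WEIGHTS = {"very": 1.5, "extremely": 2.0,   # intensifiers
                    "slightly": 0.5, "barely": 0.3}  # diminishers

def modified_score(modifier, base_score):
    # Unknown modifiers leave the score unchanged.
    return MODIFIER_WEIGHTS.get(modifier, 1.0) * base_score

print(round(modified_score("very", 0.6), 2))      # -> 0.9 (intensified)
print(round(modified_score("slightly", 0.6), 2))  # -> 0.3 (diminished)
```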
Threshold    Overall Accuracy    Precision    Recall      F-Score
0.0          62.2801%            79.6435%     59.1226%    0.6787
0.1          63.6875%            74.7655%     61.2135%    0.67314
0.2          64.6493%            71.1069%     62.9830%    0.6680
0.3          65.0481%            67.0732%     64.4725%    0.65747
0.4          65.3061%            62.9925%     66.0600%    0.64490
4. EXPERIMENTAL RESULTS
Threshold    Overall Accuracy    Precision    Recall      F-Score
0.0          61.6676%            78.9533%     58.6702%    0.6732
0.1          63.2527%            73.5884%     60.9824%    0.6670
0.2          63.7029%            69.5554%     62.2670%    0.65710
0.3          63.6654%            65.2223%     63.2527%    0.64222
0.4          64.1156%            61.0955%     65.0230%    0.6300

Method          Features                                    # Features    Accuracy     Precision    Recall       F-Score
Bag of Words    unigrams                                    9826          52.2738%     70.8861%     51.6570%     0.5976
                bigrams                                     12434         44.4444%     22.5504%     40.1168%     0.2887
                unigrams + bigrams                          22260         44.1163%     35.7712%     42.9375%     0.3903
Context         part-of-speech                              13126         56.7192%     59.0930%     56.9536%     0.5800
                sentiment word count (AFINN-111 lexicon)                  62.2128%     56.4463%     63.8050%     0.5990
Method              Features                                              # Features    Accuracy     Precision    Recall       F-Score
Proposed Methods    unigrams with negation                                8630          48.5004%     67.9475%     48.9204%     0.56885
                    unigrams with negation (reduced features)             200           52.5773%     77.7888%     51.7134%     0.6213
                    unigrams with negation (reduced features)             1700          53.7960%     60.1690%     53.3670%     0.56564
                    lemmas                                                              50.2430%     2.3490%      40.9680%     0.0444
                    phrase-level with context shifters (threshold 0.0)                  62.2801%     79.6435%     59.1226%     0.6787
                    phrase-level with context shifters (threshold 0.3)                  65.0481%     67.0732%     64.4725%     0.6422
5. DISCUSSION OF RESULTS
For the bag of words approaches, the unigrams feature is the most
accurate with 52.27% accuracy and F-score of 0.5976. In terms of
precision and recall, the unigram model has a good precision of
70.89% but has lower recall at 51.66%. The use of a bigram
model reduces the unigram accuracy to 44.44% and weakens the
classifier overall. A combination of bigrams and unigrams
resulted in an increase in F-measure on the bigram feature model
(0.29 to 0.39) but no improvement on its overall accuracy.
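For reference, the reported metrics follow the standard confusion-matrix definitions, with positive reviews as the positive class; the counts in the sketch below are hypothetical.

```python
# Standard definitions of the reported evaluation metrics, computed from
# a confusion matrix (positive reviews taken as the positive class).
def metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_score

# Hypothetical counts, for illustration only.
acc, p, r, f = metrics(tp=400, fp=100, fn=150, tn=350)
print(round(acc, 3), round(p, 3), round(r, 3), round(f, 3))  # -> 0.75 0.8 0.727 0.762
```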
Incorporating context using part-of-speech tag features did not improve performance compared to the unigrams model: it has slightly higher recall but suffers greatly in precision. With the use of a sentiment lexicon for polarity features in the next method, overall accuracy increases from 52.27% to 62.21%. Recall also improves to 63.81%, but precision is lower at 56.45%. We observe that adding context features such as part-of-speech tags and polar word counts can improve the recall of bag-of-words models and achieve higher overall accuracy. The observed limitation of these two methods is their difficulty in classifying positive movie reviews, which entails lower precision.
In our proposed methods, unigram lemmas with negation proved only slightly better than plain unigrams. We also observed that feature reduction significantly improved our Naïve Bayes model's performance (increasing accuracy from 48% to 52%). Using only lemmas as features proved devastating to performance (50.24% accuracy but a 0.044 F-score). In the case of the phrase-level model with context shifters, evaluation results show that it outperforms the other classification models, including the bag-of-words approaches. We achieved the highest precision (79.64%) and F-score (0.6787) when the threshold is at its default value of 0.00. Further increasing the threshold to 0.30 reduced the precision to 67%, as expected, but achieved the highest recall (64.47%) and overall accuracy (65.05%).
In general, incorporating negation into the classifier model produces results similar to the two context baseline methods in terms of recall and overall accuracy. However, the negation-aware models perform well in classifying positive movie reviews, achieving better precision and higher F-scores. Modeling word interactions through polarity shifts further boosted sentiment polarity classification performance and achieved the highest results among all models.
6. CONCLUSION
In this paper, we discussed different approaches to sentiment polarity classification. We used Conditional Random Fields as a sequence labeling model for negation cue detection and aspect phrase extraction. Our proposed methods, which take into consideration the context of the reviews, performed significantly better than the traditional bag-of-words models. We achieved the highest performance measures using phrase-level analysis with context shifting. Indeed, the analysis of contextual word interactions can significantly improve classifier performance on the sentiment polarity classification task.
The results obtained with our implementations are highly encouraging and suggest that further modifications can be made and more effective sentiment classification models can be constructed.
REFERENCES
[1] Tetsuji Nakagawa, Kentaro Inui, and Sadao Kurohashi. 2010. Dependency tree-based sentiment classification using CRFs with hidden variables. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (HLT '10). Association for Computational Linguistics, Stroudsburg, PA, USA, 786-794.
[2] Shu Zhang, Wen-Jie Jia, Ying-Ju Xia, Yao Meng, and Hao Yu. 2009. Opinion Analysis of Product Reviews. In Proceedings of the Sixth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD '09), vol. 2, 591-595.
[3] Shotaro Matsumoto, Hiroya Takamura, and Manabu Okumura.
2005. Sentiment classification using word sub-sequences and
dependency sub-trees. In Proceedings of the 9th Pacific-Asia
conference on Advances in Knowledge Discovery and Data
Mining (PAKDD'05), Tu Bao Ho, David Cheung, and Huan Liu
(Eds.). Springer-Verlag, Berlin, Heidelberg, 301-311.
[4] Amjad Abu-Jbara and Dragomir Radev. 2012. UMichigan: a conditional random field model for resolving the scope of negation. In Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval '12). Association for Computational Linguistics, Stroudsburg, PA, USA, 328-334.
[5] Marco Guerini, Lorenzo Gatti, and Marco Turchi. 2013. Sentiment Analysis: How to Derive Prior Polarities from SentiWordNet. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP '13), 1259-1269. Seattle, Washington, USA.