Professional Documents
Culture Documents
Sarcasm is generally characterized as ironic or satir- The sarcasm in these tweets arises from the jux-
ical wit that is intended to insult, mock, or amuse. taposition of a positive sentiment word (e.g., love,
Sarcasm can be manifested in many different ways, enjoyed, adore, pleased) with a negative activity or
but recognizing sarcasm is important for natural lan- state (e.g., being ignored, bus is late, shoveling, and
guage processing to avoid misinterpreting sarcastic being woken up).
statements as literal. For example, sentiment anal- The goal of our research is to identify sarcasm
ysis can be easily misled by the presence of words that arises from the contrast between a positive sen-
that have a strong polarity but are used sarcastically, timent referring to a negative situation. A key chal-
which means that the opposite polarity was intended. lenge is to automatically recognize the stereotypi-
Consider the following tweet on Twitter, which in- cally negative situations, which are activities and
cludes the words yay and thrilled but actually states that most people consider to be unenjoyable or
expresses a negative sentiment: yay! its a holi- undesirable. For example, stereotypically unenjoy-
day weekend and im on call for work! couldnt be able activities include going to the dentist, taking an
more thrilled! #sarcasm. In this case, the hashtag exam, and having to work on holidays. Stereotypi-
#sarcasm reveals the intended sarcasm, but we dont cally undesirable states include being ignored, hav-
always have the benefit of an explicit sarcasm label. ing no friends, and feeling sick. People recognize
these situations as being negative through cultural tuguese. Filatova (2012) presented a detailed de-
norms and stereotypes, so they are rarely accompa- scription of sarcasm corpus creation with sarcasm
nied by an explicit negative sentiment. For example, annotations of Amazon product reviews. Their an-
I feel sick is universally understood to be a nega- notations capture sarcasm both at the document level
tive situation, even without an explicit expression of and the text utterance level. Tsur et al. (2010) pre-
negative sentiment. Consequently, we must learn to sented a semi-supervised learning framework that
recognize phrases that correspond to stereotypically exploits syntactic and pattern based features in sar-
negative situations. castic sentences of Amazon product reviews. They
We present a bootstrapping algorithm that auto- observed correlated sentiment words such as yay!
matically learns phrases corresponding to positive or great! often occurring in their most useful pat-
sentiments and phrases corresponding to negative terns.
situations. We use tweets that contain a sarcasm Davidov et al. (2010) used sarcastic tweets and
hashtag as positive instances for the learning pro- sarcastic Amazon product reviews to train a sarcasm
cess. The bootstrapping algorithm begins with a sin- classifier with syntactic and pattern-based features.
gle seed word, love, and a large set of sarcastic They examined whether tweets with a sarcasm hash-
tweets. First, we learn negative situation phrases tag are reliable enough indicators of sarcasm to be
that follow a positive sentiment (initially, the seed used as a gold standard for evaluation, but found that
word love). Second, we learn positive sentiment sarcasm hashtags are noisy and possibly biased to-
phrases that occur near a negative situation phrase. wards the hardest form of sarcasm (where even hu-
The bootstrapping process iterates, alternately learn- mans have difficulty). Gonzalez-Ibanez et al. (2011)
ing new negative situations and new positive sen- explored the usefulness of lexical and pragmatic fea-
timent phrases. Finally, we use the learned lists tures for sarcasm detection in tweets. They used sar-
of sentiment and situation phrases to recognize sar- casm hashtags as gold labels. They found positive
casm in new tweets by identifying contexts that con- and negative emotions in tweets, determined through
tain a positive sentiment in close proximity to a neg- fixed word dictionaries, to have a strong correlation
ative situation phrase. with sarcasm. Liebrecht et al. (2013) explored N-
gram features from 1 to 3-grams to build a classifier
2 Related Work to recognize sarcasm in Dutch tweets. They made an
interesting observation from their most effective N-
Researchers have investigated the use of lexical gram features that people tend to be more sarcastic
and syntactic features to recognize sarcasm in text. towards specific topics such as school, homework,
Kreuz and Caucci (2007) studied the role that dif- weather, returning from vacation, public transport,
ferent lexical factors play, such as interjections (e.g., the church, the dentist, etc. This observation has
gee or gosh) and punctuation symbols (e.g., ?) some overlap with our observation that stereotypi-
in recognizing sarcasm in narratives. Lukin and cally negative situations often occur in sarcasm.
Walker (2013) explored the potential of a bootstrap- The cues for recognizing sarcasm may come from
ping method for sarcasm classification in social di- a variety of sources. There exists a line of work
alogue to learn lexical N-gram cues associated with that tries to identify facial and vocal cues in speech
sarcasm (e.g., oh really, I get it, no way, etc.) (e.g., (Gina M. Caucci, 2012; Rankin et al., 2009)).
as well as lexico-syntactic patterns. Cheang and Pell (2009) and Cheang and Pell (2008)
In opinionated user posts, Carvalho et al. (2009) performed studies to identify acoustic cues in sarcas-
found oral or gestural expressions, represented us- tic utterances by analyzing speech features such as
ing punctuation and other keyboard characters, to speech rate, mean amplitude, amplitude range, etc.
be more predictive of irony1 in contrast to features Tepperman et al. (2006) worked on sarcasm recog-
representing structured linguistic knowledge in Por- nition in spoken dialogue using prosodic and spec-
1
They adopted the term irony instead of sarcasm to re-
tral cues (e.g., average pitch, pitch slope, etc.) as
fer to the case when a word or expression with prior positive well as contextual cues (e.g., laughter or response to
polarity is figuratively used to express a negative opinion. questions) as features.
While some of the previous work has identi-
Sarcastic Tweets
fied specific expressions that correlate with sarcasm,
none has tried to identify contrast between positive
1 2
sentiments and negative situations. The novel con- 4 3
tributions of our work include explicitly recogniz-
Positive Negative
ing contexts that contrast a positive sentiment with a Seed Word
Sentiment Situation
negative activity or state, as well as a bootstrapped "love" Phrases Phrases
learning framework to automatically acquire posi-
tive sentiment and negative situation phrases. Figure 1: Bootstrapped Learning of Positive Sentiment
and Negative Situation Phrases
3 Bootstrapped Learning of Positive
Sentiments and Negative Situations
in numerous ways, we focus on positive sentiments
Sarcasm is often defined in terms of contrast or say- that are expressed as a verb phrase or as a predicative
ing the opposite of what you mean. Our work fo- expression (predicate adjective or predicate nomi-
cuses on one specific type of contrast that is common nal), and negative activities or states that can be a
on Twitter: the expression of a positive sentiment complement to a verb phrase. Ideally, we would
(e.g., love or enjoy) in reference to a negative like to parse the text and extract verb complement
activity or state (e.g., taking an exam or being ig- phrase structures, but tweets are often informally
nored). Our goal is to create a sarcasm classifier for written and ungrammatical. Therefore we try to rec-
tweets that explicitly recognizes contexts that con- ognize these syntactic structures heuristically using
tain a positive sentiment contrasted with a negative only part-of-speech tags and proximity.
situation. The learning process relies on an assumption that
Our approach learns rich phrasal lexicons of pos- a positive sentiment verb phrase usually appears to
itive sentiments and negative situations using only the left of a negative situation phrase and in close
the seed word love and a collection of sarcastic proximity (usually, but not always, adjacent). Picto-
tweets as input. A key factor that makes the algo- rially, we assume that many sarcastic tweets contain
rithm work is the presumption that if you find a pos- this structure:
itive sentiment or a negative situation in a sarcastic
[+ VERB PHRASE] [ SITUATION PHRASE]
tweet, then you have found the source of the sar-
casm. We further assume that the sarcasm probably This structural assumption drives our bootstrap-
arises from positive/negative contrast and we exploit ping algorithm, which is illustrated in Figure 1.
syntactic structure to extract phrases that are likely The bootstrapping process begins with a single seed
to have contrasting polarity. Another key factor is word, love, which seems to be the most common
that we focus specifically on tweets. The short na- positive sentiment term in sarcastic tweets. Given
ture of tweets limits the search space for the source a sarcastic tweet containing the word love, our
of the sarcasm. The brevity of tweets also probably structural assumption infers that love is probably
contributes to the prevalence of this relatively com- followed by an expression that refers to a negative
pact form of sarcasm. situation. So we harvest the n-grams that follow the
word love as negative situation candidates. We se-
3.1 Overview of the Learning Process lect the best candidates using a scoring metric, and
Our bootstrapping algorithm operates on the as- add them to a list of negative situation phrases.
sumption that many sarcastic tweets contain both a Next, we exploit the structural assumption in the
positive sentiment and a negative situation in close opposite direction. Given a sarcastic tweet that con-
proximity, which is the source of the sarcasm.2 Al- tains a negative situation phrase, we infer that the
though sentiments and situations can be expressed negative situation phrase is preceded by a positive
2
Sarcasm can arise from a negative sentiment contrasted
sentiment. We harvest the n-grams that precede the
with a positive situation too, but our observation is that this is negative situation phrases as positive sentiment can-
much less common, at least on Twitter. didates, score and select the best candidates, and
add them to a list of positive sentiment phrases. 3.4 Learning Negative Situation Phrases
The bootstrapping process then iterates, alternately The first stage of bootstrapping learns new phrases
learning more positive sentiment phrases and more that correspond to negative situations. The learning
negative situation phrases. process consists of two steps: (1) harvesting candi-
We also observed that positive sentiments are fre- date phrases, and (2) scoring and selecting the best
quently expressed as predicative phrases (i.e., pred- candidates.
icate adjectives and predicate nominals). For ex- To collect candidate phrases for negative situa-
ample: Im taking calculus. It is awesome. #sar- tions, we extract n-grams that follow a positive senti-
casm. Wiegand et al. (2013) offered a related ob- ment phrase in a sarcastic tweet. We extract every 1-
servation that adjectives occurring in predicate ad- gram, 2-gram, and 3-gram that occurs immediately
jective constructions are more likely to convey sub- after (on the right-hand side) of a positive sentiment
jectivity than adjectives occurring in non-predicative phrase. As an example, consider the tweet in Figure
structures. Therefore we also include a step in 2, where love is the positive sentiment:
the learning process to harvest predicative phrases
that occur in close proximity to a negative situation I love waiting forever for the doctor #sarcasm
phrase. In the following sections, we explain each
step of the bootstrapping process in more detail. Figure 2: Example Sarcastic Tweet
lexicons could be for sarcasm recognition in tweets. sentiments are not generally indicative of sarcasm.
Since our hypothesis is that sarcasm often arises Third, we labeled a tweet as sarcastic if it contains
from the contrast between something positive and both a positive sentiment term and a negative senti-
something negative, we systematically evaluated the ment term, in any order. The Positive and Negative
positive and negative phrases individually, jointly, Sentiment, Unordered section of Table 2 shows that
and jointly in a specific order (a positive phrase fol- this approach yields low recall, indicating that rela-
lowed by a negative phrase). tively few sarcastic tweets contain both positive and
First, we labeled a tweet as sarcastic if it con- negative sentiments, and low precision as well.
tains any positive term in each resource. The Pos- Fourth, we required the contrasting sentiments to
itive Sentiment Only section of Table 2 shows that occur in a specific order (the positive term must pre-
all three sentiment lexicons achieved high recall (75- cede the negative term) and near each other (no more
78%) but low precision (30-34%). Second, we la- than 5 words apart). This criteria reflects our obser-
beled a tweet as sarcastic if it contains any negative vation that positive sentiments often closely precede
term from each resource. The Negative Sentiment negative situations in sarcastic tweets, so we wanted
Only section of Table 2 shows that this approach to see if the same ordering tendency holds for neg-
yields much lower recall and also lower precision ative sentiments. The Positive and Negative Senti-
of 22-24%, which is what would be expected of a ment, Ordered section of Table 2 shows that this or-
random classifier since 23% of the tweets are sar- dering constraint further decreases recall and only
castic. These results suggest that explicit negative slightly improves precision, if at all. Our hypothe-
sis is that when positive and negative sentiments are In the last experiment, we added the positive pred-
expressed in the same tweet, they are referring to icative expressions and also labeled a tweet as sar-
different things (e.g., different aspects of a product). castic if a positive predicative appeared in close
Expressing positive and negative sentiments about proximity to (within 5 words of) a negative situa-
the same thing would usually sound contradictory tion. The positive predicatives improved recall to
rather than sarcastic. 13%, but decreased precision to 63%, which is com-
parable to the SVM classifiers.
4.3 Evaluation of Bootstrapped Phrase Lists
The next set of experiments evaluates the effective- 4.4 A Hybrid Approach
ness of the positive sentiment and negative situa- Thus far, we have used the bootstrapped lexicons
tion phrases learned by our bootstrapping algorithm. to recognize sarcasm by looking for phrases in our
The results are shown in the Our Bootstrapped Lex- lists. We will refer to our approach as the Contrast
icons section of Table 2. For the sake of compar- method, which labels a tweet as sarcastic if it con-
ison with other sentiment resources, we first eval- tains a positive sentiment phrase in close proximity
uated our positive sentiment verb phrases and neg- to a negative situation phrase.
ative situation phrases independently. Our positive The Contrast method achieved 63% precision but
verb phrases achieved much lower recall than the with low recall (13%). The SVM classifier with un-
positive sentiment phrases in the other resources, but igram and bigram features achieved 64% precision
they had higher precision (45%). The low recall with 39% recall. Since neither approach has high
is undoubtedly because our bootstrapped lexicon is recall, we decided to see whether they are comple-
small and contains only verb phrases, while the other mentary and the Contrast method is finding sarcastic
resources are much larger and contain terms with tweets that the SVM classifier overlooks.
additional parts-of-speech, such as adjectives and In this hybrid approach, a tweet is labeled as sar-
nouns. castic if either the SVM classifier or the Contrast
Despite its relatively small size, our list of neg- method identifies it as sarcastic. This approach im-
ative situation phrases achieved 29% recall, which proves recall from 39% to 42% using the Contrast
is comparable to the negative sentiments, but higher method with only positive verb phrases. Recall im-
precision (38%). proves to 44% using the Contrast method with both
Next, we classified a tweet as sarcastic if it con- positive verb phrases and predicative phrases. This
tains both a positive verb phrase and a negative sit- hybrid approach has only a slight drop in precision,
uation phrase from our bootstrapped lists, in any yielding an F score of 51%. This result shows that
order. This approach produced low recall (11%) our bootstrapped phrase lists are recognizing sarcas-
but higher precision (56%) than the sentiment lex- tic tweets that the SVM classifier misses.
icons. Finally, we enforced an ordering constraint Finally, we ran tests to see if the performance of
so a tweet is labeled as sarcastic only if it contains the hybrid approach (Contrast SVM) is statisti-
a positive verb phrase that precedes a negative situa- cally significantly better than the performance of the
tion in close proximity (no more than 5 words apart). SVM classifier alone. We used paired bootstrap sig-
This ordering constraint further increased precision nificance testing as described in Berg-Kirkpatrick
from 56% to 70%, with a decrease of only 2 points et al. (2012) by drawing 106 samples with repeti-
in recall. This precision gain supports our claim that tion from the test set. These results showed that the
this particular structure (positive verb phrase fol- Contrast SVM system is statistically significantly
lowed by a negative situation) is strongly indicative better than the SVM classifier at the p < .01 level
of sarcasm. Note that the same ordering constraint (i.e., the null hypothesis was rejected with 99% con-
applied to a positive verb phrase followed by a neg- fidence).
ative sentiment produced much lower precision (at
best 40% precision using the Liu05 lexicon). Con- 4.5 Analysis
trasting a positive sentiment with a negative situa- To get a better sense of the strength and limitations
tion seems to be a key element of sarcasm. of our approach, we manually inspected some of the
tweets that were labeled as sarcastic using our boot- constrained context. We plan to explore methods for
strapped phrase lists. Table 3 shows some of the sar- allowing more flexibility and for learning additional
castic tweets found by the Contrast method but not types of phrases and contrasting structures.
by the SVM classifier. We also would like to explore new ways to iden-
tify stereotypically negative activities and states be-
i love fighting with the one i love cause we believe this type of world knowledge is
love working on my last day of summer essential to recognize many instances of sarcasm.
i enjoy tweeting [user] and not getting a reply For example, sarcasm often arises from a descrip-
working during vacation is awesome . tion of a negative event followed by a positive emo-
cant wait to wake up early to babysit ! tion but in a separate clause or sentence, such as:
Table 3: Five sarcastic tweets found by the Contrast Going to the dentist for a root canal this after-
method but not the SVM noon. Yay, I cant wait. Recognizing the intensity
of the negativity may also be useful to distinguish
These tweets are good examples of a positive sen- strong contrast from weak contrast. Having knowl-
timent (love, enjoy, awesome, cant wait) contrast- edge about stereotypically undesirable activities and
ing with a negative situation. However, the negative states could also be important for other natural lan-
situation phrases are not always as specific as they guage understanding tasks, such as text summariza-
should be. For example, working was learned as tion and narrative plot analysis.
a negative situation phrase because it is often neg-
ative when it follows a positive sentiment (I love 6 Acknowledgments
working...). But the attached prepositional phrases
This work was supported by the Intelligence Ad-
(on my last day of summer and during vacation)
vanced Research Projects Activity (IARPA) via De-
should ideally have been captured as well.
partment of Interior National Business Center (DoI
We also examined tweets that were incorrectly la-
/ NBC) contract number D12PC00285. The U.S.
beled as sarcastic by the Contrast method. Some
Government is authorized to reproduce and dis-
false hits come from situations that are frequently
tribute reprints for Governmental purposes notwith-
negative but not always negative (e.g., some peo-
standing any copyright annotation thereon. The
ple genuinely like waking up early). However, most
views and conclusions contained herein are those
false hits were due to overly general negative situa-
of the authors and should not be interpreted as
tion phrases (e.g., I love working there was labeled
necessarily representing the official policies or en-
as sarcastic). We believe that an important direction
dorsements, either expressed or implied, of IARPA,
for future work will be to learn longer phrases that
DoI/NBE, or the U.S. Government.
represent more specific situations.
5 Conclusions
References
Sarcasm is a complex and rich linguistic phe- Taylor Berg-Kirkpatrick, David Burkett, and Dan Klein.
nomenon. Our work identifies just one type of sar- 2012. An empirical investigation of statistical signifi-
casm that is common in tweets: contrast between a cance in nlp. In Proceedings of the 2012 Joint Confer-
positive sentiment and negative situation. We pre- ence on Empirical Methods in Natural Language Pro-
sented a bootstrapped learning method to acquire cessing and Computational Natural Language Learn-
lists of positive sentiment phrases and negative ac- ing, EMNLP-CoNLL 12, pages 9951005.
tivities and states, and show that these lists can be Paula Carvalho, Lus Sarmento, Mario J. Silva, and
Eugenio de Oliveira. 2009. Clues for detecting irony
used to recognize sarcastic tweets.
in user-generated contents: oh...!! its so easy ;-). In
This work has only scratched the surface of pos- Proceedings of the 1st international CIKM workshop
sibilities for identifying sarcasm arising from posi- on Topic-sentiment analysis for mass opinion, TSA
tive/negative contrast. The phrases that we learned 2009.
were limited to specific syntactic structures and we Chih-Chung Chang and Chih-Jen Lin. 2011. LIBSVM:
required the contrasting phrases to appear in a highly A library for support vector machines. ACM Transac-
tions on Intelligent Systems and Technology, 2:27:1 Olutobi Owoputi, Brendan OConnor, Chris Dyer, Kevin
27:27. Gimpel, Nathan Schneider, and Noah A. Smith. 2013.
Henry S. Cheang and Marc D. Pell. 2008. The sound of Improved part-of-speech tagging for online conversa-
sarcasm. Speech Commun., 50(5):366381, May. tional text with word clusters. In The 2013 Confer-
Henry S. Cheang and Marc D. Pell. 2009. Acous- ence of the North American Chapter of the Associa-
tic markers of sarcasm in cantonese and english. tion for Computational Linguistics: Human Language
The Journal of the Acoustical Society of America, Technologies (NAACL 2013).
126(3):13941405. Katherine P. Rankin, Andrea Salazar, Maria Luisa Gorno-
Dmitry Davidov, Oren Tsur, and Ari Rappoport. 2010. Tempini, Marc Sollberger, Stephen M. Wilson, Dani-
Semi-supervised recognition of sarcastic sentences in jela Pavlic, Christine M. Stanley, Shenly Glenn,
twitter and amazon. In Proceedings of the Four- Michael W. Weiner, and Bruce L. Miller. 2009. De-
teenth Conference on Computational Natural Lan- tecting sarcasm from paralinguistic cues: Anatomic
guage Learning, CoNLL 2010. and cognitive correlates in neurodegenerative disease.
Elena Filatova. 2012. Irony and sarcasm: Corpus gener- Neuroimage, 47:20052015.
ation and analysis using crowdsourcing. In Proceed- Joseph Tepperman, David Traum, and Shrikanth
ings of the Eight International Conference on Lan- Narayanan. 2006. Yeah right: Sarcasm recogni-
guage Resources and Evaluation (LREC12). tion for spoken dialogue systems. In Proceedings of
Roger J. Kreuz Gina M. Caucci. 2012. Social and par- the INTERSPEECH 2006 - ICSLP, Ninth International
alinguistic cues to sarcasm. online 08/02/2012, 25:1 Conference on Spoken Language Processing.
22, February. Oren Tsur, Dmitry Davidov, and Ari Rappoport. 2010.
Roberto Gonzalez-Ibanez, Smaranda Muresan, and Nina ICWSM - A Great Catchy Name: Semi-Supervised
Wacholder. 2011. Identifying sarcasm in twitter: A Recognition of Sarcastic Sentences in Online Product
closer look. In Proceedings of the 49th Annual Meet- Reviews. In Proceedings of the Fourth International
ing of the Association for Computational Linguistics: Conference on Weblogs and Social Media (ICWSM-
Human Language Technologies. 2010), ICWSM 2010.
Lars Kai Hansen, Adam Arvidsson, Finn Arup Nielsen, Michael Wiegand, Josef Ruppenhofer, and Dietrich
Elanor Colleoni, and Michael Etter. 2011. Good Klakow. 2013. Predicative adjectives: An unsuper-
friends, bad news - affect and virality in twitter. In vised criterion to extract subjective adjectives. In Pro-
The 2011 International Workshop on Social Comput- ceedings of the 2013 Conference of the North Ameri-
ing, Network, and Services (SocialComNet 2011). can Chapter of the Association for Computational Lin-
Roger Kreuz and Gina Caucci. 2007. Lexical influences guistics: Human Language Technologies, pages 534
on the perception of sarcasm. In Proceedings of the 539, Atlanta, Georgia, June. Association for Compu-
Workshop on Computational Approaches to Figurative tational Linguistics.
Language. T. Wilson, P. Hoffmann, S. Somasundaran, J. Kessler,
Christine Liebrecht, Florian Kunneman, and Antal J. Wiebe, Y. Choi, C. Cardie, E. Riloff, and S. Patward-
Van den Bosch. 2013. The perfect solution for detect- han. 2005a. OpinionFinder: A System for Subjec-
ing sarcasm in tweets #not. In Proceedings of the 4th tivity Analysis. In Proceedings of HLT/EMNLP 2005
Workshop on Computational Approaches to Subjec- Interactive Demonstrations, pages 3435, Vancouver,
tivity, Sentiment and Social Media Analysis, WASSA Canada, October.
2013. Theresa Wilson, Janyce Wiebe, and Paul Hoffmann.
Bing Liu, Minqing Hu, and Junsheng Cheng. 2005. 2005b. Recognizing contextual polarity in phrase-
Opinion observer: Analyzing and comparing opinions level sentiment analysis. In Proceedings of the 2005
on the web. In Proceedings of the 14th International Human Language Technology Conference / Confer-
World Wide Web conference (WWW-2005). ence on Empirical Methods in Natural Language Pro-
Stephanie Lukin and Marilyn Walker. 2013. Really? cessing.
well. apparently bootstrapping improves the perfor-
mance of sarcasm and nastiness classifiers for online
dialogue. In Proceedings of the Workshop on Lan-
guage Analysis in Social Media.
Finn Arup Nielsen. 2011. A new anew: Evaluation of
a word list for sentiment analysis in microblogs. In
Proceedings of the ESWC2011 Workshop on Making
Sense of Microposts: Big things come in small pack-
ages (http://arxiv.org/abs/1103.2903).