Tanushree Mitra
School of Interactive Computing
Georgia Institute of Technology
Atlanta, GA, USA
tmitra3@cc.gatech.edu
David A. Shamma
Internet Experiences Group
Yahoo! Research
Santa Clara, CA, USA
aymans@acm.org
Eric Gilbert
School of Interactive Computing
Georgia Institute of Technology
Atlanta, GA, USA
gilbert@cc.gatech.edu
ABSTRACT
What topics are within bounds for search, but out of bounds for
Twitter, and vice versa? Where do they overlap? Using a ground-up,
empirically-based approach, we compare topics across tweets and
search queries. Using a random sample of tweets and queries, we
perform a deep content analysis. We find substantial and significant
differences on many topics. For example, we observe entertainment,
food and sports topics in tweets far more often than we see them
among search queries. On the other hand, people routinely query for
shopping, yet rarely mention it on Twitter. There is also considerable
overlap: we see references to celebrities, health & beauty, travel &
recreation, education, gaming and weather as often in tweets as in
searches. By identifying where topics converge and diverge, this
work bridges a contextual gap between social and search, informing
modern systems that combine the two, as well as computer-mediated
communication theory.
Keywords
twitter, search queries, social search, topics, cross-site
Categories and Subject Descriptors
H.5.3. [Information Interfaces and Presentation: Group and Or-
ganization Interfaces]: Web-based interaction
1. INTRODUCTION
. . . when one's activity occurs in the presence of other persons,
some aspects of the activity are expressively accentuated and
other aspects, which might discredit the fostered impression,
are suppressed. It is clear that accentuated facts make their
appearance in what I have called a front region; it should
be just as clear that there may be another region, a back
region or back stage, where the suppressed facts make an
appearance.
The Presentation of Self in Everyday Life, pg. 111 [19]
Most Americans find themselves constantly migrating between
search engines and social network sites (SNSs) [44]: between a
place where we think out loud (e.g., Twitter) and a place where
we keep our thoughts to ourselves (e.g., Google). In Goffman's
language, Twitter is the front-stage and search is the back-stage,
the place where we can express thoughts and desires without fear of
social retribution. How do the two stages alter the meaning of what
we ask or post? What topics are within bounds for search, but out
of bounds for Twitter, and vice versa? Where do they overlap? In
other words, what are the topics that people use when they have an
audience (as in Twitter) versus when they interact with a machine
(as with a search engine)?
In this paper, we explore these questions. We believe the answers of-
fer contributions to two bodies of research: computer mediated com-
munication (CMC) theory and social search. Recent research shows,
for instance, that tweets can detect breaking news [43], enhance
situational awareness in an emergency [53], and predict opinion
polls [41]. Yet, all this work is predicated on what kinds of things
people will or will not say on systems like Twitter (i.e., compared to
a baseline like search). We believe our work informs the theoretical
underpinnings of work such as this. Moreover, the present research
has practical implications for social search systems: systems which
are based on the premise that search is not a solitary activity but
is informed by social sources of information [15, 16, 21, 35, 42].
Systems like socialmention
$\chi^2(1, N = 2000) = 26.47, p < 0.003$. For example, consider the
following tweets:
Tweet: It will be love at first bite with these Chocolate and
Raspberry Cream Tarts.
Tweet: Bout To Bless My Stomach w/ This Chick fil A
Chicken Sandwhich #mhmm
One interpretation is that people talk about the good things they eat
more than they look for good things to eat. What motivates people
to talk about food on Twitter? Miyazaki [32] performed a series of
studies to discover whether knowledge of food is shared socially and
whether people use it to maintain relationships. He found positive
results on both points. Our findings corroborate these results.
4.5 Profanity and Pornography
Using profane content is a common practice in online communities,
and Twitter is no exception. What is perhaps surprising is that
profanity is almost always targeted to an audience (Twitter) and
rarely used in queries, $\chi^2(1, N = 2000) = 22.51, p < 0.003$.
Returning to the example quoted in an earlier section, we
see how a user expresses frustration through a tweet, something
which we almost never encounter in search behavior.
Tweet: Sleeping pattern f***ed!! [asterisks in original]
Query: dyssomnia
This behavior pattern is flipped in the Pornography category,
$\chi^2(1, N = 2000) = 43.66, p < 0.003$.
4.6 Business & Finance
The Business/Finance category comprises tweets and search queries
related to business, money, banks and finances. We found a
significant difference between tweets and search queries belonging
to this category, $\chi^2(1, N = 2000) = 10.38, p < 0.003$. We return to its
implications later in the Discussion section.
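
The statistics reported in this section all have one degree of freedom, which is consistent with a 2x2 table of source (tweet or query) against category membership (in or out). The following is a minimal sketch of such a test, not the authors' analysis code. It assumes an even split of 1,000 tweets and 1,000 queries and uses made-up counts; neither the split nor any per-category counts are stated in this section. The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(tweets_in, tweets_total, queries_in, queries_total):
    """Pearson chi-square test of independence (no continuity correction)
    for a 2x2 table: source (tweet/query) x category membership (in/out).

    Assumes every row and column total is non-zero.
    """
    observed = [
        [tweets_in, tweets_total - tweets_in],
        [queries_in, queries_total - queries_in],
    ]
    n = tweets_total + queries_total
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, the upper-tail probability of a
    # chi-square statistic x equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: 30 of 1,000 tweets and 62 of 1,000 queries
# fall into some category. These numbers are illustrative only.
stat, p = chi_square_2x2(30, 1000, 62, 1000)
print(f"chi2(1, N = 2000) = {stat:.2f}, p = {p:.4g}")
```

The closed form for the p-value holds only for one degree of freedom; a table with more rows or columns would need a general chi-square survival function from a statistics library.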
4.7 Technology
Tweets and queries mentioning electronic and digital products, soft-
ware and hardware applications and questions about troubleshooting
technological glitches have been grouped under Technology. We
see a higher percentage of queries than tweets in this category,
$\chi^2(1, N = 2000) = 17.41, p < 0.003$. While we see many
references to technical troubleshooting in search queries, we rarely
see them in tweets.
4.8 Search
Another, perhaps more obvious, category where we find a sweeping
presence in queries is Search, $\chi^2(1, N = 2000) = 98.75, p < 0.003$.
These are mainly navigational queries [5], where the user's
intention is to reach the top level of a website they already have in
mind, such as searching for care.com or facebook.
4.9 Similar Topics
We find no statistically significant difference in 6 of the 18 categories
(excluding Miscellaneous). People tweet about celebrities and they
also search for celebrity news, photos, videos and styles. People
extensively use search engines to plan their vacations (e.g., "grand
canyon vacations lake resorts"). They also tweet about it once they
get there (e.g., "Had an excellent trip to the Somme this week").
In Health/Beauty, tweet and query proportions did not show
significant differences either. But it was illuminating to see queries
oriented towards finding information on medications, treatments
and diseases (e.g., "ringworm treatment"); tweets, on the other hand,
focused on maintaining a healthy and youthful appearance (e.g.,
"My eyelashes are falling out at an alarming rate. V concerned I will
wake up tomorrow with a bald eye"). Finally, we found it difficult to
classify about 9% of the queries and 4% of the tweets in the corpus,
labeling them as Miscellaneous.
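
The recurring cutoff of p < 0.003 is close to what a Bonferroni correction of 0.05 across the 18 categories would give (0.05 / 18 ≈ 0.0028). This section does not state that such a correction was used, so the sketch below treats it as an assumption. It sorts categories into "different" and "similar" from their chi-square statistics. The five statistics are those quoted in Sections 4.5 to 4.8; the last entry is a made-up placeholder, and the names `classify` and `p_value_df1` are hypothetical.

```python
import math

ALPHA = 0.05          # assumed family-wise error rate (not stated in the text)
NUM_CATEGORIES = 18   # categories compared, excluding Miscellaneous
THRESHOLD = ALPHA / NUM_CATEGORIES  # about 0.0028, near the reported 0.003


def p_value_df1(chi2_stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2_stat / 2.0))


def classify(chi2_stat, threshold=THRESHOLD):
    """Label a category by whether tweets and queries differ significantly."""
    return "different" if p_value_df1(chi2_stat) < threshold else "similar"


# Chi-square statistics quoted in Sections 4.5 to 4.8.
reported = {
    "Profanity": 22.51,
    "Pornography": 43.66,
    "Business/Finance": 10.38,
    "Technology": 17.41,
    "Search": 98.75,
}

# Made-up value standing in for a category with no significant difference;
# the text reports no statistics for the six similar categories.
placeholder = {"(hypothetical) similar category": 1.20}

for name, stat in {**reported, **placeholder}.items():
    print(f"{name:32s} chi2 = {stat:6.2f}  p = {p_value_df1(stat):.4g}  {classify(stat)}")
```

Dividing the error rate by the number of comparisons is deliberately conservative: it lowers the chance of declaring a spurious difference in any one of the 18 categories, at the cost of missing small real differences.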
5. DISCUSSION
Our study reveals the topical distribution emerging from two differ-
ent, massive internet systems: 1) Twitter, where words are visible to
an audience; and, 2) a search engine, where the sole interaction is
with a machine and the focus is information seeking. We see a large
proportion of tweets in the Entertainment category, mostly talking
about what someone is listening to or watching, and whether they
like it. People talk more about food and cooking than they look for
good things to eat. They enthusiastically tweet and retweet breaking
news and live sporting events. This behavior is perhaps an attempt
at self-verification [7], verifying whether others attribute the same
meaning to an individual's role performance.
Certain topics seem more appropriate for public consumption than
others. Compared to how often they search for them, people rarely
talk about certain topics on Twitter. For example, in the Shopping
and Business/Finance categories we see lower percentages of tweets
than search queries. Perhaps this is due to the stigma sometimes
attached with brazenly pursuing material possessions. Individuals
might not want to portray an image of a materialistic, self-centered
individual to their Twitter audience. Or, they may not want to reveal
the extent of their wealth [47]. We see this as a potentially deep area
for future research.
We also see a similar phenomenon in the Technology category. We
find several instances of search queries where people look for ways
to troubleshoot a problem. However, we see tweets about these issues
at a much lower rate. Are people hesitant to admit their lack of technical
know-how in front of their Twitter followers despite the help they
may receive, or are they more confident they will find a
better answer through a search engine? Alternatively, describing a
problem in 140 characters on Twitter may be a challenge, in contrast
with the developed language of short search queries.
5.1 Practical Implications
These findings provide a starting point for understanding what kinds of
things people say and what kinds of things they ask search engines. The focus
here is understanding more deeply the context around search and
social media: the intent of a social post versus the intent of a query.
The same text can represent different intents, which presents new
challenges for how algorithms and systems address the structure of
the content itself [31]. Consider the design of a question answering
system dealing with questions specific to the technological domain.
An important first step in such a system is question processing. If the
designer bases this step on technical troubleshooting questions
posed in SNSs, she will find few such questions to work from,
and the system will likely perform poorly. The scenario will be
similar when the domain is switched from Technology to Business &
Finance. If a business & financial advice knowledge base is built
based on questions people ask on SNSs, the system would likely
only cover a small subset of the things people care about.
We saw that people query extensively for shopping; they tweet about their
purchases and product preferences and advertise products, but never
search for advertisements. Consider the following tweet:
Tweet: Just ordered myself a skirt from urban outfitters . . . when
I get some more money in feb im buying some braces too
. . . bring it on!
The user not only reports her recent purchase, but also talks
about her future purchasing intentions. Perhaps her recent
investment in products from Urban Outfitters and her future intention
to buy braces can serve as useful inputs to an advertising system
for targeting relevant ads. Advertising systems often match a user-
entered Web search query against keywords associated with ads for
content-targeted advertising [28]. The challenge associated with
that approach is identifying relevant ads using queries, which are
often short and lack context [6]. Query expansion using external
sources of knowledge, specifically Web search results and a large
taxonomy of commercial topics, is a suggested way to address this
challenge [6]. We believe that Twitter could be an excellent source
of information for query augmentation, especially for those topics
which are more often tweeted about than queried for on search
engines.
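
To make the idea of tweet-based query augmentation concrete, here is a toy sketch. It is not the method of [6] and not a system the authors built: it simply appends to a short query the terms that most often co-occur with the query's terms in a small collection of tweets. The function names, the stopword list, and the example tweets are all made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "i", "in", "is", "it",
             "my", "of", "on", "so", "the", "to"}


def tokenize(text):
    """Lowercase the text and return its word tokens, minus stopwords."""
    return [t for t in re.findall(r"[a-z0-9']+", text.lower())
            if t not in STOPWORDS]


def expand_query(query, tweets, k=3):
    """Return the query terms followed by up to k co-occurring tweet terms.

    A term co-occurs if it appears in a tweet that shares at least one
    term with the query. Each tweet counts a term at most once.
    """
    query_terms = tokenize(query)
    query_set = set(query_terms)

    counts = Counter()
    for tweet in tweets:
        terms = set(tokenize(tweet))
        if terms & query_set:
            counts.update(terms - query_set)

    # Rank by frequency, breaking ties alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return query_terms + [term for term, _ in ranked[:k]]


# Made-up tweets, for illustration only.
sample_tweets = [
    "new braces hurt so much",
    "getting braces in feb",
    "braces hurt today",
    "pizza night",
]
print(expand_query("braces", sample_tweets, k=1))
```

Raw co-occurrence counts favor frequent words, so a practical version would likely weight terms (for example, by inverse document frequency) and, given the findings above, would apply augmentation only to topics that people actually tweet about.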
Perhaps most importantly, consider systems like Bing's social search.
It prompts users to share searches with Facebook friends, irrespec-
tive of the genre of the search query. Given what we have found
regarding the distribution of topics like Business/Finance, should a
system prompt a user to broadcast a query such as the following?
what does it mean if bank of america is taking up to 3
months to see if person qualifies for a mortgage
Based on the shared social information, Bing also finds friends in
your social network who have knowledge relevant to the search
query. This approach may work best when the search query matches
up with something people often talk about on SNSs. Our findings
provide a way forward.
How can we harness the richness of public discourse without com-
promising privacy? Perhaps systems such as these could consider
allowing a user to post certain search queries anonymously to an
SNS. The three characteristics of social translucence theory [14],
namely visibility, awareness, and accountability, would work in interesting
ways in such a system design. The person posting an anonymous
query is not visible to his social connections, but his connections
are aware that it was issued by someone within their social network.
This might permit discussion about otherwise off-limits topics, like
money, shopping, etc. However, will there be accountability attached
to answering a question like this?
5.2 Theoretical Implications
We believe our work makes two important contributions to existing
theory in cross-site CMC research. First, we show how topics differ
between two massive internet systems. This sets up an interesting
follow-on question: Why do such differences occur? While we often
speculate in this paper about personal motivations behind the data
we observe, observational quantitative studies like this one usually
cannot establish motivation. This is likely best tackled with interview
techniques.
We saw that people are more likely to talk about some topics than
search for them. The topical differences were drawn without respect
to individual differences on Twitter: for example, how might these
results differ with audience size? This gives way to new research
questions: What topics would be more popular if people could control
their audience or direct messages towards specific constituencies
(e.g., Google circles)? We need more work on problems like these
to see how differences like the ones we present here manifest under
variations like audience size or composition.
5.3 Limitations
One limitation of our study is that the demographics of the two
channels do not completely overlap [17, 50]. At the same time,
they are by no means entirely disjoint. We hope that future work
will look into overcoming this limitation by finely segmenting user
populations. Moreover, further research needs to be done to explore
deeper questions around motivation and intent. Also, our current
study is limited to highly public SNS interactions. The broad range
of social interactions, which vary from highly public to very private,
is left unexplored. This suggests future work should examine how
the topical space varies as a function of privacy in SNSs; however,
that would require privileged data access.
6. CONCLUSION
Our study addresses the behavioral differences and similarities be-
tween two widely used internet systems: a search engine and Twitter.
It examines how topics in these systems compare and contrast and
draws a connection between what people say versus what they
search for. Addressing a broader issue of cross-site studies in CMC
research, this work bridges a gap between two important internet
systems, and can inform modern CMC research as well as the design
of social search. We hope that this study motivates future researchers
to find the intrinsic reasons behind the differences we observe.
7. ACKNOWLEDGEMENTS
We would like to thank various colleagues for reviewing early drafts
of this work. Also, we extend our gratitude to the funders of this
work.
8. REFERENCES
[1] Douglas G. Altman. Practical Statistics for Medical Research.
Chapman and Hall, 1991.
[2] Steven M. Beitzel, Eric C. Jensen, Abdur Chowdhury, David
Grossman, and Ophir Frieder. Hourly analysis of a very large
topically categorized web query log. In Proc. SIGIR, 2004.
[3] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent
dirichlet allocation. J. Mach. Learn. Res., 2003.
[4] d. boyd and N.B. Ellison. Social Network Sites: Definition,
History, and Scholarship. Journal of Computer-Mediated
Communication, 13(1):210-230, 2008.
[5] Andrei Broder. A taxonomy of web search. SIGIR Forum,
36(2):3-10, September 2002.
[6] Andrei Z. Broder, Peter Ciccolo, Marcus Fontoura, Evgeniy
Gabrilovich, Vanja Josifovski, and Lance Riedel. Search
advertising using web relevance feedback. In Proc. CIKM,
pages 1013-1022, 2008.
[7] Peter J. Burke and Anna Riley. Identities and
self-verification in the small group. Social Psychology
Quarterly, 58(2):61-73, 1995.
[8] Declan Butler. When google got flu wrong. Nature,
494(7436):155, 2013.
[9] Michael D. Byrne, Bonnie E. John, Neil S. Wehrle, and
David C. Crow. The tangled Web we wove: a taskonomy of
WWW use. In Proc. CHI, pages 544-551, 1999.
[10] Mike Cassidy and Matthew Kulick. An update to google
social search. http://googleblog.blogspot.com/
2011/02/update-to-google-social-search.html,
2011. Accessed: 2/2013.
[11] Andy Cockburn and Steve Jones. Which way now? analysing
and easing inadequacies in www navigation. International
Journal of Human-Computer Studies, 45:105-129, 2000.
[12] Russell H. Colley. Defining Advertising Goals for Measured
Advertising Results. Assoc. of Natl. Advertisers, 1961.
[13] Samantha Cook, Corrie Conrad, Ashley L Fowlkes, and
Matthew H Mohebbi. Assessing google flu trends
performance in the united states during the 2009 influenza
virus a (h1n1) pandemic. PLoS One, 6(8):e23610, 2011.
[14] Thomas Erickson and Wendy A. Kellogg. Social translucence:
an approach to designing systems that support social
processes. ACM Trans. Comput.-Hum. Interact., March 2000.
[15] Brynn M Evans and Ed H Chi. Towards a model of
understanding social search. In Proceedings of the 2008 ACM
conference on Computer supported cooperative work, pages
485-494. ACM, 2008.
[16] Brynn M. Evans, Sanjay Kairam, and Peter Pirolli. Do your
friends make you smarter?: An analysis of social strategies in
online information seeking. Inf. Process. Manage., November
2010.
[17] Evergreen Consulting Group. Search Engine Demographics
for 2010.
http://evergreendirect.com/index.php/2010/02/search-engine-
demographics-for-2010. Accessed September 19, 2012.,
2010.
[18] Jeremy Ginsberg, Matthew H Mohebbi, Rajan S Patel,
Lynnette Brammer, Mark S Smolinski, and Larry Brilliant.
Detecting influenza epidemics using search engine query data.
Nature, 457(7232):1012-1014, 2008.
[19] E. Goffman. The presentation of self in everyday life, 1959.
[20] Luis Gravano, Vasileios Hatzivassiloglou, and Richard
Lichtenstein. Categorizing web queries according to
geographical locality. In Proc. CIKM, New York, NY, USA,
2003.
[21] B. Hecht, J. Teevan, M.R. Morris, and D. Liebling.
SearchBuddies: Bringing Search Engines into the
Conversation. In Proc. ICWSM, 2012.
[22] C. Honey and S.C. Herring. Beyond microblogging:
Conversation and collaboration via twitter. In Proc. HICSS,
pages 1-10, 2009.
[23] Liangjie Hong and Brian D. Davison. Empirical study of topic
modeling in twitter. In Proc. SOMA, New York, NY, USA,
2010. ACM.
[24] Mengdie Hu, Shixia Liu, Furu Wei, Yingcai Wu, John Stasko,
and Kwan-Liu Ma. Breaking news on twitter. In Proc. CHI,
pages 2751-2754, 2012.
[25] Scott B. Huffman and Michael Hochster. How well does result
relevance predict session satisfaction? In Proc. SIGIR, pages
567-574, 2007.
[26] Rosie Jones and Kristina Lisa Klinkner. Beyond the session
timeout: automatic hierarchical segmentation of search topics
in query logs. In Proc. CIKM, 2008.
[27] In-Ho Kang and GilChang Kim. Query type classification for
web document retrieval. In Proc. SIGIR, New York, NY, USA,
2003.
[28] Anísio Lacerda, Marco Cristo, Marcos André Gonçalves,
Weiguo Fan, Nivio Ziviani, and Berthier Ribeiro-Neto.
Learning to advertise. In Proc. SIGIR '09, pages 549-556,
New York, NY, USA, 2006. ACM.
[29] David Laniado and Peter Mika. Making sense of twitter. In
Proceedings of the 9th international semantic web conference
on The semantic web - Volume Part I, ISWC '10, pages
470-485, Berlin, Heidelberg, 2010. Springer-Verlag.
[30] Yu-Ru Lin, Hari Sundaram, Munmun De Choudhury, and
Aisling Kelliher. Discovering multirelational structure in
social media streams. ACM Trans. Multimedia Comput.
Commun. Appl., 8(1):4:1-4:28, February 2012.
[31] Peter Mika. Making things findable: semantics for web search
and online media. In Proceedings of the International
Conference on Web Intelligence, Mining and Semantics,
WIMS '11, pages 3:1-3:2, New York, NY, USA, 2011. ACM.
[32] Yoshihiko Miyazaki. Social knowledge of food: How and why
people talk about foods, 2008.
[33] Robert J. Moore, Elizabeth F. Churchill, and Raj Gopal Prasad
Kantamneni. Three sequential positions of query repair in
interactions with internet search engines. In Proceedings of
the ACM 2011 conference on Computer supported
cooperative work, CSCW '11, pages 415-424, New York, NY,
USA, 2011. ACM.
[34] E. Morozov. Iran elections: A twitter revolution?, June 2009.
[35] Meredith Ringel Morris. A survey of collaborative web search
practices. In Proc. CHI, 2008.
[36] Meredith Ringel Morris, Jaime Teevan, and Katrina Panovich.
A comparison of information seeking using search engines
and social networks. In Proc. ICWSM, 2010.
[37] Meredith Ringel Morris, Jaime Teevan, and Katrina Panovich.
What do people ask their social networks, and why?: a survey
study of status message q&a behavior. In Proc. CHI, 2010.
[38] J. Muller. GM Says Facebook Ads Don't Work, Pulls $10
Million Account. http://onforb.es/LO5Hur. Accessed
September 19, 2012. Forbes, 2012.
[39] Mor Naaman. Social multimedia: highlighting opportunities
for search and mining of multimedia data in social media
applications. Multimedia Tools Appl., 56(1):9-34, January
2012.
[40] Mor Naaman, Jeffrey Boase, and Chih-Hui Lai. Is it really
about me?: message content in social awareness streams. In
Proc. CSCW, 2010.
[41] Brendan O'Connor, Ramnath Balasubramanyan, Bryan
Routledge, and Noah Smith. From tweets to polls: Linking
text sentiment to public opinion time series. In Proc. ICWSM,
2010.
[42] Sharoda A Paul, Lichan Hong, and Ed H Chi. Is twitter a good
place for asking questions? a characterization study. In
ICWSM, 2011.
[43] Swit Phuvipadawat and Tsuyoshi Murata. Breaking news
detection and tracking in twitter. In Proc., pages
120-123. IEEE Computer Society, 2010.
[44] K. Purcell. Search and email still top the list of most popular
online activities. In Pew Internet & American Life Project,
2011.
[45] Daniel Ramage, Susan T. Dumais, and Daniel J. Liebling.
Characterizing microblogs with topic models. In Proc.
ICWSM, 2010.
[46] Peter J. Rentfrow and Samuel D. Gosling. Message in a
Ballad: The Role of Music Preferences in Interpersonal
Perception. Psychological Science, 17(3), 2006.
[47] S. Sengupta. Preferred Style: Don't Flaunt It in Silicon Valley.
http://nyti.ms/N3sdkp. Accessed September 19, 2012. New
York Times, 2012.
[48] David A. Shamma, Lyndon Kennedy, and Elizabeth F.
Churchill. Peaks and persistence: modeling the shape of
microblog conversations. In Proc. CSCW, 2011.
[49] David A. Shamma, Ryan Shaw, Peter L. Shafton, and Yiming
Liu. Watch what i watch: using community activity to
understand content. In Proceedings of the international
workshop on Workshop on multimedia information retrieval,
MIR '07, pages 275-284, New York, NY, USA, 2007. ACM.
[50] A. Smith. Twitter Update 2011.
http://pewresearch.org/pubs/2007/twitter-users-cell-phone-
2011-demographics. Accessed September 19, 2012. Pew
Research, 2011.
[51] Amanda Spink, Deitmar Wolfram, Bernard Jansen, B. J.
Jansen, and Tefko Saracevic. Searching the web: The public
and their queries, 2001.
[52] Jaime Teevan, Daniel Ramage, and Meredith Ringel Morris.
#twittersearch: a comparison of microblog search and web
search. In Proc. WSDM, 2011.
[53] Sarah Vieweg, Amanda L. Hughes, Kate Starbird, and Leysia
Palen. Microblogging during two natural hazards events: what
twitter may contribute to situational awareness. In Proc. CHI,
pages 1079-1088, 2010.
[54] Shaomei Wu, Jake M Hofman, Winter A Mason, and
Duncan J Watts. Who says what to whom on twitter. In
Proceedings of the 20th international conference on World
wide web, pages 705-714. ACM, 2011.
[55] Jiang Yang, Meredith Ringel Morris, Jaime Teevan, Lada A.
Adamic, and Mark S. Ackerman. Culture matters: A survey
study of social q&a behavior. In Proc. ICWSM, 2011.