
Beyond Common Knowledge on the Internet

Yana Volkovich
Barcelona Media Innovation Center, Barcelona, Spain

Moscow, May 16th, 2013



Acknowledgment
Barcelona Media Innovation Center carries out applied research focused on the needs of the Media and Communications industry in Spain and Brazil

Social Media group:
Pablo Aragón, Karolin Kappler*, Andreas Kaltenbrunner, Jessica G. Neff*, David Laniado

Knowledge on the Internet



Knowledge is about objects & connections between these objects

Network of objects

Knowledge on the Internet


Knowledge is about objects & connections between these objects
Knowledge on the Internet is about Wikipedia

Beyond common knowledge on the Internet


Wikipedia as a global memory place
Wikipedia as the largest existing collaborative project

Wikipedia as a global memory place



Networks of individuals: Biographical Social Networks
Networks of social entities: Sister-city networks


Biographical Social networks
Is history made by great men, or are great men made by history?
Undoubtedly, social connections shape history


Biographical Social networks

Wikipedia as a global collective memory place makes it possible to:

extract from biographies how social links are recorded across cultures
generate networks of links between biographical articles

Questions:
1. Who are the most central characters in these networks?
2. Do culture related peculiarities exist?
3. Which cultures are more similar?
4. What is the shared knowledge about connections between persons across cultures?


Biographical Social Networks


data extraction
Selected the 15 largest language editions of Wikipedia
Starting point: 296 511 biographies from the English Wikipedia (from DBpedia)
Identified the corresponding articles (when existing) in the remaining 14 languages

Generated a directed biographical network for each language version:


nodes -> persons
edges -> links between the articles of the corresponding persons

Manage alternative titles of articles: track redirects
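
The network construction with redirect tracking can be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the study: the input format (a list of title pairs, a redirect dictionary, a set of biography titles) and the function names `resolve` and `build_network` are assumptions made for this example.

```python
def resolve(title, redirects):
    """Follow redirect chains until a canonical article title is reached."""
    seen = set()
    while title in redirects and title not in seen:
        seen.add(title)  # guards against redirect loops
        title = redirects[title]
    return title


def build_network(links, redirects, biographies):
    """Return the set of directed edges between biography articles.

    links: iterable of (source_title, target_title) pairs (hypothetical input)
    redirects: dict mapping an alternative title to the title it points to
    biographies: set of canonical titles known to be biographies
    """
    edges = set()
    for source, target in links:
        s = resolve(source, redirects)
        t = resolve(target, redirects)
        # Keep only links between two distinct persons.
        if s in biographies and t in biographies and s != t:
            edges.add((s, t))
    return edges


# Toy data: two alternative titles redirect (in two hops) to "Napoleon".
redirects = {"Napoleon I": "Bonaparte", "Bonaparte": "Napoleon"}
biographies = {"Napoleon", "Josephine de Beauharnais"}
links = [
    ("Josephine de Beauharnais", "Napoleon I"),  # link through a redirect
    ("Napoleon", "Josephine de Beauharnais"),
    ("Napoleon", "France"),                      # target is not a biography
]
edges = build_network(links, redirects, biographies)
```

Resolving redirects before filtering means a link that points at an alternative title still counts as an edge to the person's canonical article.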


Biographical Social Networks [redirects]



Biographical Social Networks [redirects statistics]


Biographical Social Networks [different language networks]


language    code   # nodes   # edges  clustering  % in giant  avg. path  reciprocity  diameter
English     en     198 190   928 339        0.03         95%       6.53         0.17        43
German      de      62 402   260 889        0.05         94%       6.83         0.14        33
French      fr      51 811   283 453        0.06         96%       6.11         0.15        36
Italian     it      35 756   190 867        0.06         95%       6.28         0.14        42
Spanish     es      34 828   169 302        0.06         97%       6.29         0.16        36
Japanese    ja      26 155   109 081        0.08         96%       6.47         0.20        26
Dutch       nl      24 496    76 651        0.08         94%       7.91         0.18        37
Portuguese  pt      23 705    85 295        0.07         94%       6.98         0.18        45
Swedish     sv      23 085    60 745        0.07         91%       8.27         0.20        46
Polish      pl      22 438    50 050        0.08         85%       8.94         0.16        43
Finnish     fi      18 594    44 941        0.07         87%       7.80         0.17        30
Norwegian   no      18 423    49 303        0.09         83%       8.31         0.22        48
Russian     ru      16 403    34 436        0.06         87%       9.10         0.10        35
Chinese     zh      11 715    44 739        0.17         91%       7.20         0.20        32
Catalan     ca      11 027    42 321        0.09         93%       7.14         0.17        32

clustering = average clustering coefficient; % in giant = % of nodes in the giant component; avg. path = average path length

low clustering (with the exception of Chinese, c = 0.17)
two persons are rarely mutually connected: parasocial interactions
  one-sided interpersonal relationships in which one party knows a great deal about the other, but the other does not
  e.g. a person is influenced by the works of somebody who died decades before
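
For readers who want to reproduce measures of this kind, here is a small sketch of reciprocity and average clustering on a directed edge list. The exact definitions used in the study are not given on the slide, so the choices below are assumptions: reciprocity is the fraction of directed links whose reverse link also exists, and clustering is computed on the undirected version of the graph, with nodes of fewer than two neighbours contributing zero.

```python
from itertools import combinations


def reciprocity(edges):
    """Fraction of directed edges (u, v) for which (v, u) is also present."""
    edges = set(edges)
    if not edges:
        return 0.0
    mutual = sum(1 for (u, v) in edges if (v, u) in edges)
    return mutual / len(edges)


def average_clustering(edges):
    """Mean local clustering coefficient, ignoring edge direction."""
    neighbours = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-links
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    if not neighbours:
        return 0.0
    total = 0.0
    for nbrs in neighbours.values():
        k = len(nbrs)
        if k < 2:
            continue  # contributes 0 by convention (assumption)
        closed = sum(1 for a, b in combinations(nbrs, 2) if b in neighbours[a])
        total += closed / (k * (k - 1) / 2)
    return total / len(neighbours)


# Toy network: only A and B link to each other in both directions.
toy_edges = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "A"), ("D", "A")]
toy_reciprocity = reciprocity(toy_edges)        # 2 of 5 links are reciprocated
toy_clustering = average_clustering(toy_edges)
```

A low reciprocity value corresponds to the one-sided links described above: an article about a later person links to an earlier one, but not the other way round.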

Biographical Social Networks

Who are the most central characters in these networks?


Biographical Social Networks Centralities


degree centrality is the number of links incident upon a node
betweenness centrality counts the number of shortest paths between other nodes passing through this node
Example for centrality measures: A has the highest degree and D has the highest betweenness centrality
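
The two definitions can be made concrete with a short sketch. The figure from the original slide is not reproduced here, so the toy graph below is hypothetical, chosen only so that A ends up with the highest degree and D with the highest betweenness. Betweenness is computed with Brandes' algorithm on an unweighted, undirected graph; the networks in the study are directed, so this is a simplification.

```python
from collections import deque


def adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Number of links incident upon each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted graphs (unnormalised)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


# Hypothetical toy graph: A is a local hub, D bridges two parts.
toy = adjacency([
    ("A", "B"), ("A", "C"), ("A", "E"), ("A", "D"), ("B", "C"), ("C", "E"),
    ("D", "F"), ("D", "G"), ("F", "H"), ("G", "I"),
])
deg = degree_centrality(toy)
btw = betweenness_centrality(toy)
top_degree = max(deg, key=deg.get)
top_betweenness = max(btw, key=btw.get)
```

In this toy graph A has four links while D has three, yet every shortest path between the two halves of the graph runs through D, which is why the two rankings disagree.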


Biographical Social Networks Most central persons in the English Wikipedia


The top 25 persons in the English Wikipedia ranked by in-degree. Ranks for out-degree, betweenness and PageRank are given in parentheses


Biographical Social Networks

Do culture related peculiarities exist?


Biographical Social Networks


Most central persons in different language Wikipedias
are known to be (or have been) highly influential: political leaders, revolutionaries, famous musicians, writers and actors
Hitler, Bush and Obama dominate almost all top rankings
Top-ranked persons in many languages reflect country specificities


Biographical Social Networks

Which cultures are more similar?


Biographical Social Networks


Jaccard coefficient: given the sets of links A and B of two networks,
$J = |A \cap B| / |A \cup B|$
the ratio between the number of links present in both networks (their intersection) and the number of links existing in their union
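
As a small illustration, the coefficient can be computed directly on two edge sets. The sketch assumes that persons have already been mapped to language-independent identifiers, so that the same link is represented by the same pair in both networks; the identifiers and the two toy networks below are made up.

```python
def jaccard(links_a, links_b):
    """Jaccard coefficient of two collections of (source, target) links."""
    a, b = set(links_a), set(links_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty networks
    return len(a & b) / len(union)


# Hypothetical link sets for two language editions.
net_one = {("p1", "p2"), ("p2", "p3"), ("p3", "p1")}
net_two = {("p1", "p2"), ("p2", "p3"), ("p1", "p4")}
similarity = jaccard(net_one, net_two)  # 2 shared links out of 4 distinct links
```

Because links are directed, ("p1", "p2") and ("p2", "p1") count as different links in this sketch.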


Biographical Social Networks

What is the shared knowledge about connections between persons across cultures?


Intersection of networks in different languages


Biographical Social Networks [to-gos]


Global social network measures are largely similar for all networks

Most central persons unveil interesting peculiarities about the language communities
Networks are more similar for geographically or linguistically closer communities


Network of Social Entities


Analysis of institutional (sister city) relations

via elsief1 @Flickr

Network of sister cities


Network of sister cities [Motivation]


Institutional partnership between two cities or towns with the aim of cultural and economic exchange
These relations had never been analyzed before to understand social, geographical, and economic mechanisms of city pairings


Network of sister cities


Interesting facts:
The earliest form of town twinning in Europe was between the German city of Paderborn and the French city of Le Mans in 836
Coventry twinned with Stalingrad (Volgograd) and with Dresden: all three cities were heavily bombed during World War II
Many German cities are still twinned with other German cities: Hanover and Leipzig, or Hamburg and Dresden
Tashkent was twinned with Seattle, Washington in 1973 and became the first Soviet city to be twinned with one in the US


Network of sister cities


Example of a Wikipedia article used for data extraction


Network of sister cities


Data extraction process: an automated parser followed by a manual cleaning process. Google Maps API used to geo-locate cities.

network                       city network   country network
number of nodes                     11 618               207
number of edges                     15 225             2 933
clustering coefficient                0.11              0.43
% nodes in giant component           61.35               100
average path length                   6.74              2.12

Disclaimer
No central register
User generated data (only 30% of reciprocal connections)
No guarantee that the dataset is complete
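
One plausible way to derive a country network from the city network is to count, for every pair of countries, the number of city partnerships between them. The slide does not say how the aggregation was done, so this is an assumption, as are the function names and the decision to skip partnerships within the same country. The example pairs are the twinnings mentioned earlier in the talk.

```python
from collections import Counter


def country_network(city_edges, country_of):
    """Weighted country-to-country edges: number of city partnerships per country pair."""
    weights = Counter()
    for a, b in city_edges:
        ca, cb = country_of[a], country_of[b]
        if ca == cb:
            continue  # assumption: domestic partnerships are left out
        weights[tuple(sorted((ca, cb)))] += 1
    return weights


def weighted_degree(weights):
    """Sum of partnership counts over all edges touching each country."""
    degree = Counter()
    for (ca, cb), w in weights.items():
        degree[ca] += w
        degree[cb] += w
    return degree


country_of = {
    "Coventry": "UK", "Volgograd": "Russia", "Dresden": "Germany",
    "Hanover": "Germany", "Leipzig": "Germany",
    "Tashkent": "Uzbekistan", "Seattle": "USA",
}
city_edges = [
    ("Coventry", "Volgograd"), ("Coventry", "Dresden"),
    ("Hanover", "Leipzig"), ("Tashkent", "Seattle"),
]
weights = country_network(city_edges, country_of)
degrees = weighted_degree(weights)
```

With user-generated data, a partnership may be recorded on only one of the two city pages, so in practice the city edge list would first need to be made symmetric and de-duplicated.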

Network of sister cities


Network of sister cities [Top 20 cities ranked by degree]


city              degree  betw. centrality (rank)
Saint Petersburg      78    1
Shanghai              75    4
Istanbul              69   12
Kiev                  63    5
Caracas               59   23
Buenos Aires          58   36
Beijing               57  124
São Paulo             55   24
Suzhou                54    6
Taipei                53   20
Izmir                 52    3
Bethlehem             50    2
Moscow                49   16
Odessa                46    8
Malchow               46   17
Guadalajara           44    9
Vilnius               44   14
Rio de Janeiro        44   29
Madrid                40  203
Barcelona             39   60


Network of sister cities


Network of sister cities [Top 20 countries ranked by degree]

country          degree  betw. centrality (rank)
USA                4520    1
France             3313    3
Germany            2778    6
UK                 2318    2
Russia             1487    9
Poland             1144   33
Japan              1131   20
Italy              1126    7
China              1076    4
Ukraine             946   27
Sweden              684   14
Norway              608   22
Spain               587   11
Finland             584   35
Brazil              523   13
Mexico              492   21
Canada              476   28
Romania             472   32
Belgium             464   23
the Netherlands     461   16


Network of sister cities [Assortativity]


Measure of diversity in a network: do nodes having many connections preferentially interact with one another or with poorly connected nodes?
Degree assortativity by city: cities with many connections tend to be connected with cities with many connections, and vice versa
Relations are assortative by country: Gross Domestic Product per capita; Human Development Index; Political Stability Index
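
Degree assortativity is commonly quantified as the Pearson correlation between the degrees found at the two ends of each link. The sketch below follows that common definition for an undirected network without duplicate edges; whether the study used exactly this estimator is an assumption. Positive values mean that well-connected nodes tend to link to each other, negative values that hubs mostly link to poorly connected nodes.

```python
def degree_assortativity(edges):
    """Pearson correlation of the degrees at both ends of each undirected edge."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    xs, ys = [], []
    for u, v in edges:
        # Count every edge in both directions so the measure is symmetric.
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys contain the same values, so one mean suffices
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var if var else float("nan")


# A star is maximally disassortative: the hub only links to leaves.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
r_star = degree_assortativity(star)

# A path of four nodes is mildly disassortative.
path = [("a", "b"), ("b", "c"), ("c", "d")]
r_path = degree_assortativity(path)
```

The same correlation can be computed on any node attribute instead of degree, for example a country-level index attached to each node, which is the idea behind the country-level observation above.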


Network of sister cities


Comparison of distances between pairs of
connected sister cities
random (not necessarily connected) cities

Evidence of the "Death of Distance" (F. Cairncross, The Death of Distance: How the Communications Revolution Is Changing our Lives)
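
A comparison of this kind needs a distance between geo-located cities and a baseline of random pairs. The sketch below uses the haversine great-circle formula with a mean Earth radius of 6371 km; the helper names, the sampling scheme and the rounded coordinates are assumptions for illustration, not the procedure of the study.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def mean_distance(pairs, coords):
    """Average distance over a list of (city, city) pairs."""
    return sum(haversine_km(*coords[a], *coords[b]) for a, b in pairs) / len(pairs)


def random_pairs(cities, k, seed=0):
    """Draw k random pairs of distinct cities, connected or not."""
    rng = random.Random(seed)
    return [tuple(rng.sample(cities, 2)) for _ in range(k)]


# Approximate (latitude, longitude) values, rounded for illustration.
coords = {
    "Barcelona": (41.39, 2.17),
    "Moscow": (55.76, 37.62),
    "Seattle": (47.61, -122.33),
    "Tashkent": (41.30, 69.24),
}
sister_pairs = [("Tashkent", "Seattle")]
baseline_pairs = random_pairs(sorted(coords), k=100)

mean_connected = mean_distance(sister_pairs, coords)
mean_random = mean_distance(baseline_pairs, coords)
```

If the two distance distributions look alike on the full dataset, distance plays little role in who pairs with whom, which is the sense of the "Death of Distance" remark.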

Network of sister cities [to-gos]


Assortative mixing with respect to degree, economic and political country indexes
Sister-city relationships reflect country preferences
Geographic distance between sister cities does not influence city pairing


Wikipedia as the largest existing collaborative project


Wikipedia visible side


Wikipedia article talk pages


The hidden side of Wikipedia


since 2007 growth of Wikipedia has notably slowed down (B. Suh et al.; The singularity is not near: slowing growth of Wikipedia; 2009)

The hidden side of Wikipedia is gaining importance


article talk pages: explicit coordination and discussion
user talk pages: personal communications (sort of public inbox)
Article Barack Obama: discussion split into 72 pages, 22 000 comments in the article talk pages (17 500 edits done to the article)


The hidden side of Wikipedia [motivation]


Unlike in other online discussion spaces, Wikipedia users discuss in order to reach consensus and to coordinate their activity with each other

Detect patterns of interaction in the communications between the Wikipedians


Discussion tree for article Presidency of Barack Obama


red: root (the article)
blue: structural nodes
green: anonymous comments
grey: registered comments


Discussion trees

[Figure: discussion trees, panels: Slashdot, Wikipedia]

Discussion trees
Number of users involved
Number of chains of length >= 3, i.e. consecutive replies between two users
example chain of length 3: A, B, A (A comments, B replies, A replies again)
a good indicator of conflictive discussions
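
Counting such chains in a discussion tree can be sketched as follows. The representation of a thread (a dictionary from comment id to parent id and author) and the rule for what counts as one chain are assumptions made for this example: a chain is a run of consecutive replies that alternates between the same two users, and only runs that are not continued by a further reply are counted.

```python
def chain_length(cid, comments):
    """Length of the alternating two-user reply chain that ends at comment cid."""
    parent, author = comments[cid]
    if parent is None or comments[parent][1] == author:
        return 1  # top-level comment, or a user replying to themselves
    other = comments[parent][1]
    length = 2
    expected = author  # author expected one step further up the tree
    node = comments[parent][0]
    while node is not None and comments[node][1] == expected:
        length += 1
        expected = other if expected == author else author
        node = comments[node][0]
    return length


def count_chains(comments, min_length=3):
    """Number of maximal alternating chains with at least min_length comments."""
    lengths = {cid: chain_length(cid, comments) for cid in comments}
    extended = set()
    for cid, (parent, _) in comments.items():
        if parent is not None and lengths[cid] == lengths[parent] + 1:
            extended.add(parent)  # some reply continues the parent's chain
    return sum(
        1 for cid, n in lengths.items() if n >= min_length and cid not in extended
    )


# comment id -> (parent id or None for a reply to the article, author)
thread = {
    1: (None, "A"),
    2: (1, "B"),
    3: (2, "A"),
    4: (3, "B"),  # chain A, B, A, B of length 4
    5: (1, "C"),
    6: (5, "A"),  # chain A, C, A of length 3
}
n_chains = count_chains(thread)
```

Under this counting rule, a thread that branches can contribute several chains that share their first comments, as with comments 1 to 4 and 1, 5, 6 in the toy thread.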

85% of articles have fewer than 10 comments
15 000 articles have more than 100 comments



Most discussed Wikipedia articles [Top 20 ordered by number of chains]


Article discussions categorisation [structural differences among discussions from different macro-categories]


Wikipedians networks [to-gos]


the number of chains of direct replies between pairs of users seems to be a good indicator of contentious discussion topics
significant differences in discussions from different semantic areas


Wikipedia brings political opponents together


2004 U.S. presidential campaign: political blogs served as a prominent information source regarding the campaign and candidates

conservative and liberal political blogs primarily link to other blogs with their same political orientation (Adamic L, Glance N; The political blogosphere and the 2004 U.S. election: Divided they blog, 2005)
people tend to read blogs that reinforce, rather than challenge, their political beliefs (Lawrence E, Sides J, Farrell H; Self-segregation or deliberation? Blog readership, participation, and polarization in American politics, 2010)
strong evidence of political polarization on Twitter (e.g. Aragón P, Kappler K, Kaltenbrunner A, Laniado D, Volkovich Y; Communication Dynamics in Twitter during Political Campaigns: the Case of the 2011 Spanish National Election, 2013)

Wikipedia brings political opponents together


What are the identity and representation practices of users who claim their affiliation to a party within the Wikipedia community?
Do we see a division in patterns of participation along party lines?
Do users exhibit a preference for interacting with members of their same political party?


Wikipedia brings political opponents together


What are the identity and representation practices of users who claim their affiliation to a party within the Wikipedia community?


Wikipedia brings political opponents together


1,390 members of the Wikipedia community who explicitly proclaimed their political affiliation as either a Republican or Democrat

conservative ideology: "This user is pro-life", "This user supports LEGAL immigration", and "This user thinks the global warming issue has been immensely exaggerated"
liberal ideology: "This user supports the legalization of same-sex marriage", "This user is pro-choice", and "This user supports immigration and the right to travel freely upon the planet we share"

Wikipedia brings political opponents together


Do we see a division in patterns of participation along party lines?


Most edited articles


Political (relating to a political issue or a politician, e.g. United States Presidential Election, 2008; George Bush)
Conservative (related to a conservative politician, commentator, or issue, e.g. Rush Limbaugh)
Liberal (related to a liberal politician, commentator, or issue, e.g. Al Gore)
Neutral (political in nature, but not partisan, e.g. European Union, September 11 attacks)
Not Political (e.g. Britney Spears, 2008 Summer Olympics)


Most edited articles


Among their 100 most edited articles, Democrats and Republicans had 44 articles in common.
Democrats: 38 articles with political topics (15 liberal, 15 conservative, and 8 neutral)
Republicans: 35 articles with political topics (7 liberal, 17 conservative, and 11 neutral)
All users: 22 articles with political topics (5 liberal, 3 conservative, and 14 neutral)


Wikipedia brings political opponents together


Do users exhibit a preference for interacting with members of their same political party?


Cross-party interactions
Editors appear to be equally likely to engage in conversations with users from the other party as with users from the same party

Levels of conflict are high both within and across parties when the discussion threads deal with political or other potentially controversial topics


Wikipedia brings political opponents together [to-gos]


the lack of preference to interact with same-party members in the context of article discussions does not indicate the same polarization that has been observed in other contexts

Wikipedian identity seems to predominate over party identity


References
J.G. Neff, D. Laniado, K. Kappler, Y. Volkovich, P. Aragón, and A. Kaltenbrunner; Jointly they edit: examining the impact of community identification on political interaction in Wikipedia; in PLOS ONE, 2013
A. Kaltenbrunner, P. Aragón, D. Laniado, and Y. Volkovich; Not all paths lead to Rome: Analysing the network of sister cities; in IWSOS2013
P. Aragón, A. Kaltenbrunner, D. Laniado, and Y. Volkovich; Biographical Social Networks on Wikipedia: A cross-cultural study of links that made history; in 8th International Symposium on Wikis and Open Collaboration (WikiSym2012)
D. Laniado, R. Tasso, Y. Volkovich, and A. Kaltenbrunner; When the Wikipedians Talk: Network and Tree Structure of Wikipedia Discussion Pages; in Proceedings of the 5th International AAAI Conference on Weblogs and Social Media (ICWSM2011), 2011


Questions?


Fundació Barcelona Media


Av. Diagonal, 177 | 08018 Barcelona Tel (+34) 93 238 14 00 | Fax (+34) 93 309 31 88

www.barcelonamedia.org
