Abstract: In recent years, social computing has become a very popular application on the Internet, and a large amount of social (communication) data has therefore been collected in different social computing applications. This paper introduces a methodology to collect and analyze multi-source social data and thereby extract social networks from the data. A system architecture is also presented to show how the data can be collected, pre-processed and analyzed. Furthermore, the system allows users to use the data as a resource for personal decision support.

Keywords: Social Networking; Instant Messenger; E-mail; Data Mining; Social Network Analysis

I. INTRODUCTION

With the rapid growth of the Internet and communication technologies, many of people's communication and social activities have moved to Internet-based platforms, e.g. e-mail, instant messaging software and social networking websites (such as blogs and web albums) [8]. As a result, a large amount of personal communication and social data has been aggregated and stored in different locations [15]. However, these valuable data have not been well organized, processed or used. How to use current information techniques, such as artificial intelligence, data mining or visualization, to process and analyze these data is therefore an interesting research issue [18].

Social network analysis and construction originated in the field of sociology. In recent years, many research issues in information science and social networking have drawn attention owing to the development of information techniques and the growing demand for data processing ability [13]. Social network analysis and construction take relationship data as their target, and are therefore suitable for processing and analyzing the communication and social data discussed above [7].

Communication data, such as e-mails and instant messenger logs, are very common in our daily life. However, little work has focused on how to organize, process and analyze these data [31]. In this paper, we therefore propose a method to analyze common social data and discuss how to extract social networks from multi-source social data dynamically. Furthermore, this paper proposes a system architecture that uses three techniques (social network analysis, social network construction and visualization) to process and analyze those valuable data. The system allows users to input tasks for dynamic social network analysis and construction, and the final results are presented through visualization and an interface for decision support.

This paper is organized as follows. Section 1 gives the background and introduction. Section 2 reviews related literature on social network extraction, social networks and data mining, and social network analysis. Section 3 proposes a system architecture for extracting dynamic social networks from multi-source data and introduces the components of the system. Section 4 focuses on how to extract social networks from social data and how to use data mining and AI techniques for decision support. Section 5 concludes the paper with suggestions for future research.

II. LITERATURE REVIEW

This section reviews related literature, including social network analysis, social network extraction and social networking for decision support.

A. Social Networks Analysis

Social network analysis was developed to understand the relationships between actors, where an actor can be a person, an organization, an event or an object [4]. In a social network, each actor is represented as a node, and each pair of nodes can be connected by a line to show their relationship. The social network structure graph is the graph formed by those lines and nodes, and social network analysis is therefore a methodology used to understand the graph, the relationships and the actors in the social network [11][34][27].
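
The node-and-line representation described in the Social Networks Analysis subsection can be illustrated with a minimal sketch. It is not the system proposed in this paper: the record format (sender, recipient), the actor names, the function name build_network, and the choice to treat lines as undirected and to weight them by message count are all assumptions made only for illustration.

```python
from collections import defaultdict

# Hypothetical communication records (sender, recipient); the format and
# the names are illustrative assumptions, not data from the paper.
messages = [
    ("alice", "bob"),
    ("bob", "alice"),
    ("alice", "carol"),
    ("carol", "dave"),
]


def build_network(records):
    """Return (nodes, ties) for a list of (sender, recipient) records.

    nodes: set of actors.
    ties:  dict mapping an unordered pair of actors to the number of
           messages exchanged between them (assumed undirected lines).
    """
    nodes = set()
    ties = defaultdict(int)
    for sender, recipient in records:
        nodes.update((sender, recipient))
        if sender != recipient:  # ignore messages sent to oneself
            ties[frozenset((sender, recipient))] += 1
    return nodes, dict(ties)


nodes, ties = build_network(messages)
```

Here each actor becomes a node and each pair of actors that exchanged at least one message is joined by a line; len(nodes) and len(ties) give the number of nodes and lines in the resulting graph.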
A social network contains three important elements: actors, ties and relationships. Actors are the essential elements of the social network and represent people, events or objects. Ties connect actors, establishing a relationship between them either directly or indirectly through a path. Ties can also be divided into strong ties and weak ties according to the strength of the relationship; this distinction is also useful for discovering subgroups in the social network. Relationships describe the interactions between two actors. Furthermore, different relationships may cause the network to reflect different characteristics [32][33].

The most important measurements in SNA include network size, diameter, density, centrality and structural holes [5]. Size measures the number of nodes or links in a network, and diameter measures the longest of the shortest paths between any two nodes in the network. Density describes how closely connected a network is, that is, the proportion of possible ties that are actually present [23][28]. These measurements are commonly used in social network research and will be used in this paper as well.

Traditionally, SNA research has focused mainly on small groups of actors, and the data have been processed manually in most cases [6]. However, with the rapid growth of the Internet and web techniques, more and more data have been collected, and processing these data by manual means alone has become a hard task [9]. Scholars in information technology and computer science have therefore started to address these research issues [12][26]. Currently, computer science research in SNA can be divided into four main topics: social network construction, social network extraction, social network analysis and visualization [24].

B. Social Networks Extraction

Within information technology and computer science research on social networking, social network extraction is a subfield that focuses on extracting social networks from large amounts of communication data. With the rapid growth of the Internet and the WWW, various kinds of data have been generated for communication purposes. Commonly used communication data include e-mail communication data, web usage logs, event logs, instant messenger logs and telecommunication logs [22][29].

Some research has focused on extracting social networks from these social data. For example, Bird et al. proposed a method to extract social networks from e-mail communications [3]. Agrawal et al. used web mining techniques to understand the behavior of users in newsgroups [2]. The web is considered the biggest database in the world, so various social networks can be extracted from this resource. For instance, Furukawa et al. tried to identify social networks from blogspace [14][19]. Jin et al. and Matsuo et al. developed systems to extract social networks from the web [17][21]. Adamic and Adar developed a method to discover the relationships of friends and neighbours on the web [1].

Most of the research discussed above focuses on a single source for social network extraction. However, how to extract social networks from different sources has not been discussed well in the related literature, and integrating multi-source data for social network extraction is a hard task. In addition to the problem of multi-source data, instant messengers have recently become very popular software for sending messages and communicating, yet recent research has not addressed how to extract social networks from instant messenger data. These research issues are discussed in this paper.

C. Web Mining Techniques for Social Networking

According to the analysis targets and resources, web mining techniques can be divided into three types: Web Content Mining, Web Structure Mining and Web Usage Mining [30].

Web content mining is a web mining technique for analyzing the contents of the web, such as texts, graphs and graphics [2]. Most recent web content mining research focuses on text data processing, and little focuses on other multimedia data. Natural language processing is therefore the main technology used in this area. The concepts and techniques of the Semantic Web and ontologies also have to be studied [16][20].

Web structure mining is a technique that can be used to analyze the links and structure of websites [10]. Graph theory is usually the main theory used in web structure mining to analyze and explain the structure of websites. In addition, extracting the structure of websites is always essential in this research area [12]. A common concern is therefore how to design and implement a crawler (also called a spider or bot) to extract and construct the structure of websites, as in research on the Deep Web.

Web usage mining is a web mining technique that can be used to analyze how websites have been used, such as the navigation behavior of users. Server-side clickstream data (log files) are the main source used for web usage mining. Client-side data (such as client-side log files and cookies) are sometimes used as well, for example in order to record more complete user behavior. Web usage mining analyses include basic statistical analysis of the navigation behavior of users on a website, such as how many times the website has been browsed and where the users come from. Furthermore, more advanced analyses can also be provided, such as more complex analysis for understanding the navigation history of users within a website, or cross-website analysis [25].
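
The basic statistical analysis mentioned for web usage mining (how many times pages have been browsed and where users come from) can be sketched as follows. This is only an illustration under stated assumptions: the clickstream entries are assumed to be already parsed into dictionaries, and the field names (user, page, referrer), the sample values and the function name usage_statistics are hypothetical and do not come from this paper.

```python
from collections import Counter

# Hypothetical, already-parsed server-side clickstream entries; the field
# names and values are assumptions for illustration only.
clickstream = [
    {"user": "u1", "page": "/home", "referrer": "search.example"},
    {"user": "u1", "page": "/album", "referrer": "/home"},
    {"user": "u2", "page": "/home", "referrer": "blog.example"},
    {"user": "u3", "page": "/home", "referrer": "search.example"},
]


def usage_statistics(entries):
    """Return (page_views, referrers, visitors) for parsed log entries.

    page_views: Counter of how many times each page was browsed.
    referrers:  Counter of where the requests came from.
    visitors:   number of distinct users seen in the log.
    """
    page_views = Counter(entry["page"] for entry in entries)
    referrers = Counter(entry["referrer"] for entry in entries)
    visitors = len({entry["user"] for entry in entries})
    return page_views, referrers, visitors


page_views, referrers, visitors = usage_statistics(clickstream)
```

More advanced analyses, such as reconstructing a user's navigation history, would additionally need timestamps and session boundaries, which this sketch deliberately leaves out.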
Figure 1. The architecture of the social network extraction system