
Personal Experience Acquisition Support from Blogs using Event-Depicting Pictures
Yoko Nishihara, Wataru Sunayama
International Journal of Information Processing and Management, Volume 2, Number 3, July 2011

Personal Experience Acquisition Support from Blogs using Event-Depicting Pictures


Yoko Nishihara, Wataru Sunayama the University of Tokyo, Hiroshima City University nishihara@sys.t.u-tokyo.ac.jp, sunayama@sys.info.hiroshima-cu.ac.jp
doi: 10.4156/ijipm.vol2.issue3.2

Abstract
Internet users often write blogs about their personal experiences, daily news, and so on. We can find blogs that include personal experiences by using Web search engines. However, search engines return a variety of blogs, covering not only personal experiences but also other topics. It therefore takes too much time to obtain personal experiences, because we have to read all of the returned blogs. This paper proposes a support system for obtaining blogs that include personal experiences. The system extracts three kinds of keywords, denoting place, object, and action, from a blog. The three keywords describe an event written in a blog together with a personal experience. The system expresses an event with three kinds of pictures, one for each extracted keyword. The pictures help users judge whether a personal experience is written in a blog or not. According to the experimental results, the system could support users in obtaining personal experiences.

Keywords: Personal Experience, Event Extraction from Blogs, Images Depicting Events

1. Introduction
Since many people write blogs, we can obtain personal experiences and reviews of commercial items from blogs. For example, a person who plans to travel to Paris can obtain information about local sightseeing areas from blogs. To obtain such information, he/she would typically enter a query, such as Paris AND sightseeing, into a search engine and read the Web pages it returns. The search results, however, also contain blogs about other topics, such as plans for sightseeing in Paris or sightseeing in areas near Paris. He/she therefore has to spend much time reading all of the blogs, and information acquisition becomes inefficient.

This paper proposes a support system for efficiently obtaining information about personal experiences from blogs. We consider that people gain personal experiences through events. Our system therefore extracts keywords denoting an event and visualizes the event with pictures expressing the extracted keywords. Users can judge by looking at the pictures whether personal experiences are written in a blog or not. Visualizing events with pictures is expected to reduce the time needed to obtain personal experiences.

The probability that a blog contains both events and personal experiences is usually high; it was 93% in an experiment for the proposed system. Therefore, we can support users in obtaining personal experiences by extracting events. Pictures reduce the time needed to understand text contents [1,2], so users should be able to judge in a short time whether events are written in a blog or not. We have therefore decided to use pictures to describe events.

We define an event as being expressed by three keywords: an action keyword, an object keyword, and a place keyword. Other keywords, such as a time keyword, a subject keyword, and a reason keyword, are also generally used to express events. For time keywords, although some methods have been proposed [3], we do not extract them because the method needs a large amount of training data for machine learning. For subject keywords, the subject of a personal experience in a blog is always the blog writer, so we do not extract them from blogs. We also do not extract reason keywords because they are difficult to extract.


2. Related works
2.1. Information extraction from Web
There are many studies on information extraction from the Web [4]. For extracting information that interests the public, there is a method that extracts human names, keywords, and sentences attracting public attention from blogs [5]. Some methods have been proposed for extracting persons and events that attract attention [6,7,8]. Other methods have been proposed for extracting keywords that will attract attention in the future [9,10]; these methods use time series analysis of keyword frequencies. Although the proposed system also extracts information from the Web, it extracts information about personal experiences described by one person, rather than information that attracts public attention.

2.2. Review extraction


A personal experience can be regarded as a kind of review of a commercial item. Some methods have been proposed to extract reviews of commercial items and movie reviews from the Web. Some of these methods learn the linguistic features of reviews by machine learning [11,12]. Others extract and show reviews of a commercial item using learned features [13,14,15]. The proposed system extracts events written in blogs together with personal experiences. Events are different from movie reviews: each person feels the same event differently. The number of keywords used to write personal experiences is considered to be higher than the number used to write events [16]. Many blogs contain both events and personal experiences. Therefore, the proposed system extracts events and uses the extracted events to support users in obtaining personal experiences.

3. Proposed system
The proposed system requires a query as input. A query represents the theme on which a blog is written. For example, a user who wants to obtain personal experiences about sightseeing in Hawaii should enter Hawaii AND sightseeing into a search engine. The system downloads blogs that include the query. It then narrows the downloaded blogs down to those that include sentences about events. Because sentences expressing events are usually written in the past tense, the system keeps only blogs that contain past-tense sentences. Next, the system separates each blog text at every keyword representing a place, and extracts keywords representing an event from each separated text. Then, the system sets out pictures representing the extracted keywords. Finally, the system outputs the set of pictures together with the original blog texts.
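
The past-tense narrowing step could be sketched roughly as follows. This is only an illustration under strong assumptions: it works on English text, treats any word ending in "ed" or found in a small hand-made list of irregular forms as past tense, and uses hypothetical names (`IRREGULAR_PAST`, `has_past_tense`, `filter_event_blogs`). The paper does not state how tense is detected, and a real system would more likely rely on a morphological analyzer or part-of-speech tagger.

```python
import re

# Assumption: a tiny, hand-made list of irregular past forms for illustration only.
IRREGULAR_PAST = {"went", "ate", "bought", "saw", "took", "was", "were", "had", "did"}


def has_past_tense(sentence):
    """Rough heuristic: True if any word looks like a past-tense verb."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return any(
        word in IRREGULAR_PAST or (len(word) > 3 and word.endswith("ed"))
        for word in words
    )


def filter_event_blogs(blogs):
    """Keep only blogs that contain at least one past-tense-looking sentence."""
    kept = []
    for blog in blogs:
        sentences = re.split(r"(?<=[.!?])\s+", blog)
        if any(has_past_tense(sentence) for sentence in sentences):
            kept.append(blog)
    return kept


example_blogs = [
    "I visited Hawaii last week. The beach is lovely.",
    "I plan to go to Hawaii next year.",
]
event_blogs = filter_event_blogs(example_blogs)
```

The suffix rule produces false positives (for example, "need"), which is why it should be read as a placeholder for a proper tense detector rather than as the system's method.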

3.1. Blog text separation


This section describes how a blog text is separated at every keyword that represents a place. Since a blog may describe several different places, the system separates a blog text into blocks by using place keywords. To extract place keywords, the system extracts nouns that follow prepositions frequently used with places, such as at, on, and in. The system treats these nouns as candidate place keywords, and selects one noun as the place keyword for each block of a blog text.
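
A minimal sketch of this separation step is given below. It assumes English text, uses only the three prepositions named above, and approximates "noun after a preposition" with a regular expression instead of a part-of-speech tagger. The rule for choosing one place keyword per block (here: the first candidate in a sentence, with a new block opened whenever the place changes) is an assumption, since the text does not specify it; all function names are hypothetical.

```python
import re

# Prepositions frequently used with places, as listed in the text.
PLACE_PREPOSITIONS = ("at", "on", "in")
_PLACE_RE = re.compile(
    r"\b(?:%s)\s+(?:(?:the|a|an)\s+)?([A-Za-z]+)" % "|".join(PLACE_PREPOSITIONS),
    re.IGNORECASE,
)


def place_candidates(sentence):
    """Return the words that directly follow a place preposition."""
    return [match.group(1).lower() for match in _PLACE_RE.finditer(sentence)]


def split_into_blocks(text):
    """Split a text into (place, sentences) blocks, one block per place change."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    blocks = []
    for sentence in sentences:
        candidates = place_candidates(sentence)
        place = candidates[0] if candidates else None
        if place is not None and (not blocks or blocks[-1][0] != place):
            blocks.append((place, [sentence]))   # the place changed: open a new block
        elif blocks:
            blocks[-1][1].append(sentence)       # same place, or no place mentioned
        else:
            blocks.append((None, [sentence]))    # text starts without any place
    return blocks


sample = "We arrived at the beach. The water was warm. Then we ate noodles in a restaurant."
sample_blocks = split_into_blocks(sample)
```

For the sample text this yields one block for "beach" with two sentences and one block for "restaurant" with one sentence.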

3.2. Extraction of keywords corresponding to event


The system extracts three kinds of keywords from each block: an action keyword, an object keyword, and a place keyword. The following sections describe how object keywords and action keywords are extracted.


3.2.1. Extraction of object keyword
This section describes how object keywords are extracted. Object keywords are nouns. If a block contains several nouns, the relationship value between each noun and the extracted place keyword is calculated by Eq. (1), where hit(x) denotes the number of Web pages including a keyword x, p denotes a place keyword, and o denotes an object keyword.

$$\mathrm{validity}(o) = \frac{\mathrm{hit}(p \wedge o)}{\mathrm{hit}(p)} \tag{1}$$

Eq. (1) calculates the ratio of the number of Web pages including both the place keyword and an object keyword to the number of Web pages including the place keyword. If the value of Eq. (1) is large, the two words p and o are considered to be highly related to each other. The system selects the noun with the highest value of Eq. (1) as the object keyword.

3.2.2. Extraction of action keyword
This section describes how action keywords are extracted. Action keywords are verbs that appear in the sentences including the extracted object keyword. This is because an action keyword and an object keyword usually appear in the same sentence in order to describe an event.
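
The object-keyword selection of Eq. (1) and the action-keyword rule can be illustrated with the following sketch. The hit counts come from a made-up dictionary (`FAKE_HITS`) standing in for a Web search engine, the query string format "p AND o" is an assumption, and verbs are recognized from a supplied set rather than by a tagger. All names are hypothetical and none of the numbers come from the paper.

```python
def validity(obj, place, hit):
    """Eq. (1): hit(place AND obj) / hit(place), where `hit` returns a page count."""
    base = hit(place)
    return hit(f"{place} AND {obj}") / base if base else 0.0


def choose_object_keyword(nouns, place, hit):
    """Pick the noun in the block with the highest validity for the place keyword."""
    return max(nouns, key=lambda noun: validity(noun, place, hit))


def action_keywords(sentences, obj, verbs):
    """Collect known verbs from the sentences that contain the object keyword."""
    found = []
    for sentence in sentences:
        words = sentence.lower().rstrip(".!?").split()
        if obj not in words:
            continue
        for word in words:
            if word in verbs and word not in found:
                found.append(word)
    return found


# Made-up hit counts standing in for search-engine results.
FAKE_HITS = {"beach": 1000, "beach AND shell": 300, "beach AND camera": 50}


def fake_hit(query):
    return FAKE_HITS.get(query, 0)


object_keyword = choose_object_keyword(["shell", "camera"], "beach", fake_hit)
actions = action_keywords(
    ["We collected shell after shell.", "I charged my camera."],
    object_keyword,
    {"collected", "charged"},
)
```

With these invented counts, "shell" scores 0.3 against 0.05 for "camera", so it becomes the object keyword, and "collected" is the only verb sharing a sentence with it.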

3.3. Setting out of pictures visualizing events


The system sets out pictures representing the three kinds of extracted keywords. It displays three pictures, a place picture, an object picture, and an action picture, in a line from left to right. If a blog has several blocks, only the pictures of the first block are displayed. A picture database was prepared in advance. The authors created the database using an image search engine [17]. We entered keywords that had been extracted from blogs as queries, and chose one picture from the top 20 pictures of the search results. Choosing a picture took from two to 30 seconds. The database currently has about 1000 pictures for place keywords, about 700 pictures for object keywords, and about 200 pictures for action keywords. If the database does not have a picture representing an extracted keyword, the system outputs a blank instead of the picture. The system does not consider the dependency between pictures when setting them out. Although it is desirable to choose pictures considering the context and coherence of the event (for example, I ran on a field in sport festival), it is difficult to search for pictures that reflect the context of all the queries using a current image search engine. Therefore, the system outputs pictures that represent each keyword clearly. Users are expected to be able to imagine the context from the three kinds of pictures, if the pictures express the keywords clearly.
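
The lookup behavior described above (place, object, and action pictures from left to right, a blank when no picture exists, and only the first block shown) might look like the sketch below. The database layout, the file names, and the use of `None` as the blank are assumptions made for illustration; the paper does not describe how the database is stored.

```python
BLANK = None  # assumption: a missing picture is represented by None

PICTURE_KINDS = ("place", "object", "action")  # left-to-right display order


def pictures_for_event(event, database):
    """Return the (place, object, action) pictures for one event, with blanks."""
    return tuple(
        database.get(kind, {}).get(event.get(kind), BLANK)
        for kind in PICTURE_KINDS
    )


def pictures_for_blog(events, database):
    """Only the first block of a blog is visualized."""
    if not events:
        return (BLANK, BLANK, BLANK)
    return pictures_for_event(events[0], database)


# Hypothetical database contents and extracted events.
database = {
    "place": {"beach": "beach.jpg"},
    "object": {"shell": "shell.jpg"},
    "action": {},
}
events = [
    {"place": "beach", "object": "shell", "action": "collect"},
    {"place": "restaurant", "object": "noodles", "action": "eat"},
]
row = pictures_for_blog(events, database)
```

Here the action picture is missing from the hypothetical database, so the third slot stays blank.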

3.4. Output: Three kinds of pictures visualizing event


The system outputs sets of pictures visualizing events (shown in Figure 1). If blog titles and their summaries have been downloaded, the system also outputs them with the sets of pictures.

4. Experiment for three kinds of keywords extraction


Experiments were conducted to confirm the accuracy of blog text separation (Section 4.1) and the accuracy of keyword extraction (Section 4.2).

4.1. Accuracy of separation of blog text


Accuracy of separation of blog text was verified by comparing blocks created by the proposed system with ones separated by participants.


Figure 1. Output of proposed system. Set of blogs with pictures.

4.1.1. Experimental procedures
We collected blogs with Google blog search [18] using the queries bought, ate, and visited. We chose these queries because we needed blogs in which events were written. We selected 30 blogs from the collected blogs, considering the number of sentences in each blog. We asked participants to separate the blog texts into blocks. That is, the participants read the blog texts and separated them at the points where they thought the place had changed. The participants were 20 undergraduate and graduate students majoring in information sciences. We assigned 10 participants to each blog text. We defined a block as correct if more than five participants separated the text at the same position. We compared the blocks produced by the proposed system with those produced by the participants. We calculated precision and recall using Eq. (2) and Eq. (3), where n(x) denotes the number of blocks in a set x, system denotes the set of blocks produced by the proposed system, and participant denotes the set of blocks produced by the participants.

$$\mathrm{precision} = \frac{n(\mathrm{system} \cap \mathrm{participant})}{n(\mathrm{system})} \tag{2}$$

$$\mathrm{recall} = \frac{n(\mathrm{system} \cap \mathrm{participant})}{n(\mathrm{participant})} \tag{3}$$

4.1.2. Result

The precision was 0.75 and the recall was 0.49. The recall was low because the participants judged that the place had changed when the writer of a blog recalled his/her past in the blog. Since events are considered not to be written in reminiscences, we removed the reminiscences from the blog texts and recalculated the precision and the recall. The precision was 0.69 and the recall was 0.77. These values were sufficient for separating blog texts. Therefore, we confirmed that the proposed system could separate blog texts appropriately.
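
The evaluation procedure, that is, the majority rule ("more than five participants") followed by Eq. (2) and Eq. (3), can be sketched as below. The sketch identifies each block by the position at which the text is split, which is an assumption about how blocks are matched. The annotation data and the system output are invented for illustration and are not the experimental data.

```python
from collections import Counter


def majority_boundaries(annotations, threshold=5):
    """Keep the split positions marked by more than `threshold` participants."""
    votes = Counter(pos for marked in annotations for pos in set(marked))
    return {pos for pos, count in votes.items() if count > threshold}


def precision_recall(system, participant):
    """Eq. (2) and Eq. (3) computed over two sets of split positions."""
    overlap = len(system & participant)
    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(participant) if participant else 0.0
    return precision, recall


# Invented annotations from 10 participants: position 3 gets 8 votes,
# position 7 gets 6 votes, and position 9 gets only 5 votes.
annotations = [{3, 7, 9}] * 3 + [{3, 7}] * 3 + [{3, 9}] * 2 + [set()] * 2
gold = majority_boundaries(annotations)
scores = precision_recall({3, 9}, gold)
```

Position 9 is excluded from the correct set because exactly five votes is not "more than five", so a system that splits at positions 3 and 9 scores a precision and recall of 0.5 each in this toy case.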

4.2. Accuracy of keyword extraction


Accuracy of keyword extraction was verified by comparing the keywords extracted by the proposed system with those extracted by participants.

4.2.1. Experimental procedure
We used the same set of blog texts as in Section 4.1. We asked participants to extract keywords from the blog texts as follows. First, the participants read the blog texts and extracted place keywords. Second, the participants extracted action keywords and object keywords that were related to the place keywords. The participants were the same people as in Section 4.1. We assigned 10 participants to each blog text, and treated the three kinds of keywords that were chosen by more than five participants as the correct answers. We compared the three kinds of keywords extracted by the proposed system with those extracted by the participants. We calculated the precision and the recall using Eq. (2) and Eq. (3).

4.2.2. Result
Table 1 shows the precisions and the recalls for extraction of all three kinds of keywords and for each kind of keyword from a block. The precision was 0.71 and the recall was 0.48 for extraction of all three kinds of keywords. The recall was low because the proposed system failed to extract many object keywords. The blogs contained more than two descriptions of different places. It is considered that we do not need to extract all events from blogs in order to support users. Since the recall was 0.77 for extraction of place keywords, it is possible to extract at least one event from a blog. From these results, we confirmed that the proposed system could extract the three kinds of keywords.

Table 1. Precisions and recalls for keyword extraction
               Precision   Recall
All keywords   0.71        0.48
Action         0.81        0.54
Object         0.85        0.58
Place          0.85        0.77

Figure 2. Baseline system for experiment

5. Experiment for proposed system


5.1. Experimental procedures
We asked participants to extract texts about personal experiences from blog texts using the proposed system. We collected blogs from a blog site, Yahoo! Blog [19], and used the top 100 blogs of the search results for this experiment. The queries for blog collection were {Okinawa, Tokyo, Hiroshima, Niigata, or Hokkaido} AND sightseeing, and School Festival AND shop for eating. Five queries were about Japanese sightseeing places, and the other one was about a school festival. We chose these queries because, when obtaining personal experiences, most people are considered to search for events that they themselves will experience in the near future.

We prepared a comparative system (baseline system) that shows the same blog titles and blog summaries as the proposed system (shown in Figure 2). The same blog texts shown in the proposed system could be read in the baseline system. The participants read blogs with a Web browser. The window size of the Web browser was 1,200 x 1,920 pixels. The 100 blogs were divided into four sets of 25, and each set could be seen in a single window. We instructed the participants as follows:
(1) Users of the proposed system: look at the set of pictures for each blog. Users of the baseline system: read the summary for each blog.
(2) Judge whether the blogs are written about events related to the queries or not.
(3) If you find blogs in which events are written, copy the blog texts about personal experiences and paste them into the text editor.

The number of participants was 36. The participants were undergraduate and graduate students majoring in information sciences. We assigned 18 participants to each query and each system. One session, including the above three steps, was limited to five minutes, because a search in daily life takes about five minutes. We compared the average numbers of personal experiences extracted with the proposed system with those extracted with the baseline system.

Table 2. Averages of blogs read by participants
            Proposed   Baseline
Okinawa     5.8        4.9
Tokyo       4.9        4.7
Hiroshima   5.1        4.3
Niigata     4.7        4.6
Hokkaido    5.9        4.7
school      5.4        4.7

Table 3. Averages of blogs from which personal experiences are extracted by participants.
            Proposed   Baseline
Okinawa     2.2        1.4
Tokyo       3.1        2.6
Hiroshima   2.9        2.6
Niigata     1.8        1.3
Hokkaido    3.2        2.5
school      3.4        2.1

5.2. Results
Table 2 shows the average numbers of blogs read by the participants. For all of the queries, the averages of the proposed system were higher than those of the baseline system (P<.05). This is because understanding the contents of a blog text by looking at pictures took less time than understanding them by reading the summaries. This result indicates that the proposed system helped users read more blog texts than users of the baseline system. Table 3 shows the average numbers of blogs from which personal experiences were extracted. All of the extracted texts were indeed personal experiences. The averages of the proposed system were higher than those of the baseline system (P<.05). This is because the pictures of the proposed system visualized events, and most of the blogs chosen by the participants using the proposed system yielded texts including personal experiences. This also appears in the rates of blogs from which personal experiences were extracted to blogs read, shown in Table 4. Except for Hiroshima, the rates of the proposed system were higher than those of the baseline system (P<.05). In the case of Hiroshima, since many place pictures were not displayed by the proposed system, it was difficult for the participants to judge whether events were written in each blog or not. For the other queries, however, the rates of the proposed system were higher than those of the baseline system. Therefore, we confirmed that the proposed system was more effective for obtaining personal experiences than the baseline system.
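
As a worked check, the rates in Table 4 follow from dividing each Table 3 average by the corresponding Table 2 average and rounding to two decimals. The short sketch below does that arithmetic; the variable and function names are hypothetical, and the numbers are copied from Tables 2 and 3.

```python
# (proposed, baseline) averages per query, copied from Table 2 and Table 3.
READ = {
    "Okinawa": (5.8, 4.9), "Tokyo": (4.9, 4.7), "Hiroshima": (5.1, 4.3),
    "Niigata": (4.7, 4.6), "Hokkaido": (5.9, 4.7), "school": (5.4, 4.7),
}
EXTRACTED = {
    "Okinawa": (2.2, 1.4), "Tokyo": (3.1, 2.6), "Hiroshima": (2.9, 2.6),
    "Niigata": (1.8, 1.3), "Hokkaido": (3.2, 2.5), "school": (3.4, 2.1),
}


def extraction_rates(read, extracted):
    """Rate = blogs with extracted personal experiences / blogs read."""
    return {
        query: tuple(round(e / r, 2) for e, r in zip(extracted[query], read[query]))
        for query in read
    }


RATES = extraction_rates(READ, EXTRACTED)
for query, (proposed, baseline) in RATES.items():
    print(f"{query:10s} proposed={proposed:.2f} baseline={baseline:.2f}")
```

For Hiroshima the computation gives 0.57 for the proposed system against 0.60 for the baseline, which is the one query where the baseline rate is higher.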


Table 4. Rate of blogs from which personal experiences were extracted to blogs read by the participants
           Okinawa   Tokyo   Hiroshima   Niigata   Hokkaido   school
Proposed   0.38      0.63    0.57        0.38      0.54       0.63
Baseline   0.29      0.55    0.60        0.28      0.53       0.45

Table 5. Number of blogs divided by two features. One feature is whether events are written in summaries or not. The other feature is whether personal experiences are written in blogs or not. Y means Yes. N means No.
Pattern                       (1)    (2)    (3)    (4)
Event : Personal experience   Y:Y    N:N    N:Y    Y:N
Okinawa                       31     30     20     19
Tokyo                         15     60     20     5
Hiroshima                     29     38     21     22
Niigata                       11     61     8      20
Hokkaido                      20     26     28     26
school                        35     20     33     12
Average                       23.5   37.5   21.6   17.3
Rate for all of the blogs     61% ((1) and (2))    39% ((3) and (4))

5.3. Efficiency of pictures for choosing blogs


Some of the blog summaries used in the experiment had descriptions about events (for example, I went to a park.) and others did not. Likewise, some of the blog texts used in the experiment had descriptions about personal experiences and others did not. We therefore divided the blogs into four patterns using these two features. The results are shown in Table 5. Although 61% of the blogs (sum of (1) and (2)) could be judged with the baseline system, 39% of the blogs (sum of (3) and (4)) could not. In the case of pattern (3), users of the baseline system did not read blogs that included personal experiences. In the case of pattern (4), the users read blogs that did not include personal experiences. With the proposed system, however, pictures such as those in Figure 3 were output for pattern (3), so the users could judge that personal experiences were written in the blog even though no event was written in the summary. For pattern (4), the proposed system output pictures such as those in Figure 4, so the users could judge which blogs contained personal experiences. Table 6 shows the number of blogs divided by two features: whether pictures visualize events or not, and whether personal experiences are included in blogs or not.

Figure 3. Example output of the proposed system. Event texts were not written in the summary, but personal experiences were written in the blog. The queries were school festival AND shop

Figure 4. Example output of the proposed system. Event texts were written in the summary, but personal experiences were not written in the blog. The queries were Hokkaido AND sightseeing


The blogs classified as pattern (3) and pattern (4) in Table 5 correspond to the blogs classified as pattern (7) and pattern (8) in Table 6. The sum of pattern (7) and pattern (8) was 22.2%, which was lower than the sum of pattern (3) and pattern (4) (39%). It is considered that the participants using the proposed system obtained more texts about personal experiences by looking at the pictures. Even when the number of blogs written about personal experiences was low, the participants using the proposed system extracted texts including personal experiences efficiently. Table 7 shows the number of blogs written about personal experiences. In the case of Niigata, the number of blogs including personal experiences was only 19. Therefore, the rate of extracted blogs to read blogs was low in the baseline system (0.28, as shown in Table 4). With the proposed system, however, the rate was 0.38, which was higher than that of the baseline system (P<.05). This is because the pictures of the proposed system helped users judge whether events were written in blogs or not. These results indicate that the proposed system can support users in extracting blog texts including personal experiences even if the number of personal experiences is small.

Table 6. Number of blogs divided by two features. One feature is whether the pictures represent events or not. The other feature is whether personal experiences are written in the blog or not.
Pattern                               (5)    (6)    (7)   (8)
Event picture : Personal experience   Y:Y    N:N    N:Y   Y:N
Okinawa                               40     43     5     13
Tokyo                                 28     52     7     13
Hiroshima                             52     25     7     16
Niigata                               9      70     7     14
Hokkaido                              39     41     5     15
school                                41     28     4     27
Average                               34.8   43.0   5.8   16.3
Rate for all of the blogs             77.8% ((5) and (6))   22.2% ((7) and (8))

Table 7. Number of the blogs written about personal experiences
School   Okinawa   Hiroshima   Hokkaido   Tokyo   Niigata   Average
68       51        50          48         35      19        45.1

5.4. Efficiency of overview by event visualizing pictures


The participants using the proposed system viewed the outputs differently from those using the baseline system. According to the questionnaire results of 36 participants with the proposed system, 22 participants answered that they had looked over the whole output in the Web browser, four answered that they had looked at the outputs sequentially, and 10 answered that they had done both. This is because the participants using the proposed system understood the contents of the blogs at a glance. On the other hand, all participants using the baseline system answered that they had looked at the outputs sequentially. This is because the participants using the baseline system could not understand the blog contents at a glance. These results indicate that the participants using the proposed system tended to look over the whole output.

Table 8. Number of participants who used each combination of pictures
Place   Object   Action   Number of Participants
Y       N        N        19
Y       N        Y        6
Y       Y        N        4
Y       Y        Y        4
N       N        Y        2
N       Y        Y        1
N       Y        N        0


The proposed system did not output pictures for all of the extracted keywords. In fact, 17 pictures for place keywords, 12 pictures for object keywords, and 9 pictures for action keywords were not output. The number of blogs without pictures was 81. One might argue that if there are blanks in the output, the system should output the extracted keywords alone. However, this would decrease the efficiency of obtaining personal experiences, for two reasons. First, if the extracted keywords are shown instead of pictures, users do not look over the whole output and cannot judge quickly whether texts about personal experiences are written or not. Second, understanding the contents of the extracted keywords takes longer than understanding the contents of pictures [1,2]. Showing the extracted keywords instead of pictures would therefore make it take longer to understand the contents. Even when pictures are used as outputs, it takes much time if the contents of the pictures are not simple. Although some of the participants answered that it took much time to judge the contents of the pictures, they made such comments about only some of the pictures, not all of them. This result indicates that the pictures shown by the proposed system help users obtain texts about personal experiences.

5.5. Efficiency of Pictures depicting Place Keyword


We asked the participants using the proposed system, "Which combination of the pictures did you use for extracting texts including personal experiences?" Table 8 shows the results. The number of participants who used pictures corresponding to place keywords was the highest. It is considered that pictures of place keywords (for example, mountain, sea, and river) were more useful than those of action keywords (for example, walk, run, and watch) and those of object keywords (for example, book, bicycle, and dish). Therefore, most of the participants used pictures of place keywords for extracting texts including personal experiences. The result indicates that pictures of place keywords are the most useful for obtaining personal experiences. In Table 8, although the number of participants who used only pictures of place keywords was the highest (19 participants), 17 participants used other combinations of pictures. It is considered that pictures of object keywords and pictures of action keywords were also useful for obtaining personal experiences. Therefore, it is necessary to show all three kinds of pictures to obtain personal experiences.
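
The counts discussed here can be tallied directly from Table 8. The sketch below is only a worked check: the per-kind totals it computes (how many participants used place, object, or action pictures in any combination) are derived from the table rows and are not figures reported in the text; the names are hypothetical.

```python
# Rows of Table 8: (used place, used object, used action) -> participants.
TABLE_8 = {
    (True, False, False): 19,
    (True, False, True): 6,
    (True, True, False): 4,
    (True, True, True): 4,
    (False, False, True): 2,
    (False, True, True): 1,
    (False, True, False): 0,
}


def participants_using(kind_index):
    """Count participants whose combination includes the given picture kind."""
    return sum(count for combo, count in TABLE_8.items() if combo[kind_index])


total = sum(TABLE_8.values())
place_only = TABLE_8[(True, False, False)]
other_combinations = total - place_only
usage = {
    "place": participants_using(0),
    "object": participants_using(1),
    "action": participants_using(2),
}
```

The rows sum to 36 participants, of whom 19 relied on place pictures alone and 17 on some other combination; counted per kind, place pictures appear in the combinations of 33 participants, action pictures in 13, and object pictures in 9.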

5.6. Application domain of proposed system in business


The proposed system extracts events from blogs and visualizes them with pictures. It is considered that the proposed system helps users grasp a large amount of textual information and understand the contents easily. In the future, the proposed system could be applied as follows:
(1) Analysis of questionnaires in which people write freely in sentences about a commercial item.
(2) Analysis of reviews about a commercial item written in blogs.
(3) Analysis of claims about a commercial item.
In all of the above cases, business people generally need to look through a huge amount of text data, which takes too much time. If the category of a commercial item is determined, people can collect keywords and pictures about the names of the commercial item, the actions that can be performed with it, and the accidents that may happen. The proposed system can then visualize events that may happen with a commercial item. By looking at the visualization results, people can examine the information on the following points:
1. What kinds of information exist? - People see what kinds of pictures are used many times. They can recognize the overall tendency of the information.
2. Is there any important information about a commercial item? - People can find pictures describing special keywords related to an essential part of the commercial item. They can obtain important information as soon as possible.
3. Are there any original comments? - People can recognize the original comments by finding pictures in a few minutes.


After reviewing the above points, people can estimate what information is written in blogs by looking at the pictures. This reduces the time needed to grasp the information.

6. Conclusion
This paper proposed a support system for obtaining personal experiences from blogs by visualizing an event as three kinds of pictures: action, object, and place. Users can judge whether a blog text contains personal experiences or not by looking at the output pictures. Experimental results showed that the proposed system could support users in obtaining personal experiences. Some of the participants said that it was difficult to judge the contents of a picture when several objects were drawn in it. We will improve the method of acquiring pictures, and will replace such pictures with new ones that represent the extracted keywords more simply and intuitively.

7. References
[1] S. Hulbert, J. Beers, and P. Fowler: Motorists' Understanding of Traffic Control Devices, AAA Foundation for Traffic Safety, 1979.
[2] M. Pietrucha and R. Knoblauch: Motorists' Comprehension of Regulatory, Warning and Symbol Signs, vol.2, Technical Report Contract DTFH61-83-C-00136, FHWA, U.S. Department of Transportation, 1985.
[3] T. Noro, T. Inui, H. Takamura, and M. Okumura: Time Period Identification of Events in Text, in Proc. of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL'06), pp. 1153--1160, 2006.
[4] C. H. Chang, M. Kayed, M. R. Girgis, and K. Shaalan: A Survey of Web Information Extraction Systems, IEEE Transactions on Knowledge and Data Engineering, vol.18, no.10, pp. 1411--1428, 2006.
[5] N. S. Glance, M. Hurst, and T. Tomokiyo: BlogPulse: Automated Trend Discovery for Weblogs, in Proc. of the WWW2004 Workshop on the Weblogging Ecosystem, 2004.
[6] Web page, (URL) http://search.biglobe.ne.jp/ranking/
[7] Web page, (URL) http://blog360.jp/
[8] Web page, (URL) http://kizasi.jp/
[9] T. Fujiki, T. Nanno, Y. Suzuki, and M. Okumura: Identification of Bursts in a Document Stream, in Workshop on Knowledge Discovery in Data Streams, 2004.
[10] J. Kleinberg: Bursty and Hierarchical Structure in Streams, in Proc. of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1--25, 2002.
[11] K. Dave, S. Lawrence, and D. M. Pennock: Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews, in Proc. of the 12th International World Wide Web Conference, pp. 519--528, 2003.
[12] P. D. Turney: Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews, in Proc. of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 417--424, 2002.
[13] Web page, (URL) http://blogsphere.biz/
[14] Web page, (URL) http://opinion.labs.goo.ne.jp/cgi-bin/index.cgi
[15] Web page, (URL) http://shopping.nifty.com/
[16] Web page, (URL) http://shooti.jp
[17] Image search engine, (URL) http://search.yahoo.co.jp/images
[18] Blog search engine, (URL) http://blogsearch.google.co.jp/
[19] Blog search engine, (URL) http://blogs.yahoo.co.jp/
