Sentiment Analysis for Vietnamese

Binh Thanh Kieu, Son Bao Pham


University of Engineering and Technology
Vietnam National University Hanoi
Email: sonpb@vnu.edu.vn

Abstract

Sentiment analysis is one of the most important tasks in Natural Language Processing. Research in sentiment analysis for Vietnamese is relatively new, and most current work focuses only on the document level. In this paper, we address the problem at the sentence level and build a rule-based system using the GATE framework. Experimental results on a corpus of computer product reviews are very promising. To the best of our knowledge, this is the first work that analyzes sentiment at the sentence level for Vietnamese.

Keywords: Sentiment Analysis, Opinion Mining, Text Mining.

1. Introduction

In recent years, along with the rapid growth of the Internet, textual information on the web has become larger and larger. Textual information is generally classified into two main types: facts and opinions. Most current information processing techniques (e.g. search engines) work with facts, which can be expressed with topic keywords; search engines do not search for opinions. An example of this kind of information is product reviews, which can be collected from manufacturers or users. Manufacturers use opinions to build business strategy. A sentiment analysis system for product quality is therefore expected to meet the needs of both users and manufacturers.

Technically, a sentiment analysis system can often be divided into two parts: identifying words and phrases that hold opinions, and classifying sentences or documents according to those opinions. Unlike classification by type or subject, classification by sentiment requires understanding the emotional trend of the article. Challenging aspects of sentiment analysis include the identification of opinion terms, the intensity of sentiment, the complexity of sentences, words whose polarity changes in different contexts, sentiment classification for complex articles, etc.

In this paper, we propose a rule-based method for automatically evaluating users' opinions at the sentence level. Our system is built on GATE [2], a framework for developing natural language processing components, and focuses on the domain of computer products (laptops and desktops).

We present related work on sentiment analysis in section 2 and describe our system in section 3. Section 4 shows experimental results and error analysis. Finally, section 5 gives concluding remarks and pointers to future work.

2. Related Work

Over the last decade, sentiment mining has become a hot subject among natural language processing (NLP) and information retrieval (IR) researchers [1]. Though works on sentiment mining have different focuses, emphases and objectives, they generally consist of the following three steps: sentiment word or phrase identification, sentiment orientation identification, and sentiment classification of sentences or documents.

Sentiment word or phrase identification focuses on content words (nouns, verbs, adjectives and adverbs); most work uses part-of-speech (POS) tags to extract them [9][15]. Other natural language processing techniques such as stop word removal, stemming and fuzzy matching are also used in the preprocessing stage to extract sentiment words and phrases.

For sentiment orientation identification, many approaches have been proposed. Hu and Liu [9] applied POS tagging and other natural language processing techniques to extract adjectives as sentiment words. Their opinion sentence extraction achieved a precision of 64.2% and a recall of 69.3%. Fellbaum [7] uses WordNet to determine whether an extracted adjective
has a positive or negative polarity. Pointwise mutual information (PMI) is used by Church and Hanks [3] and Turney [14] to measure the strength of semantic association between two words. Nasukawa and Yi [11] also consider verbs as sentiment expressions for their sentiment analysis. They use an HMM-based POS tagger [10] and rule-based shallow parsing [12] for preprocessing, then analyze the syntactic dependencies among phrases and look for phrases with a sentiment term that modifies, or is modified by, a subject term.

The task of sentence or document sentiment classification is to classify a sentence or document by its polarity into sentiment categories – positive or negative, sometimes with a neutral category added. Hu and Liu [9] predict the orientation of opinion sentences in their study of customer reviews. Turney [15] used a simple unsupervised algorithm to classify reviews in different domains as recommended or not recommended, extracting sentiment words (phrases) based on Hatzivassiloglou and McKeown's [8] approach and identifying orientation based on Turney's [14] approach. The averaged classification accuracy across domains is 74.39%. Pang [13] used supervised machine learning to classify movie reviews: without classifying individual sentiment words or phrases, they extract features from each review and use Naive Bayes, Maximum Entropy and Support Vector Machine classifiers, achieving accuracies between 78.7% and 82.9%.

3. Our System for Analyzing Users' Opinions

Most approaches to sentiment analysis are language and domain dependent. Our approach analyzes the sentiment towards a product's features and classifies it into two categories: positive or negative. During data collection we observed that almost all sites discuss only one product per thread, so we assume that only one product is the target of review in a document. However, a document may discuss many different features of that product.

3.1 Data and annotation

This is the first step in building our rule-based system. One constraint is that most Vietnamese product reviews available online are about electronic devices. In addition, product feedback and reviews are often written by teenagers, who use special language including new terms, abbreviations, and mixed-in foreign terms. Our data is mainly taken from an online product-advertising page [16], computer category (laptops and desktops). In the future we will extend the data to other products such as mobile phones and automobiles. After collecting the data, we preprocess it, for example by standardizing shorthand words ("wa", "ko").

Our corpus contains about 3971 sentences in 20 documents corresponding to the 20 products we collected.

With the collected corpus, we use the Callisto annotation tool [5][17] to mark up the following annotations for our sentential sentiment analysis. We use this process both to obtain an annotated corpus and to incrementally create the rules. At the word level, we have two annotations: PosWord (positive word) and NegWord (negative word). At the sentence level, we use PosSen (positive sentence), NegSen (negative sentence) and MixSen (mixed sentence) annotations to distinguish sentences with positive, negative, and both positive and negative sentiment respectively. To handle sentences that carry implicit sentiment by comparing different products, we use CompWord (comparison word) and CompSen (comparison sentence) annotations.

3.2 System Overview

Our system is built from three main components: sentiment word or phrase identification, sentiment orientation identification, and sentential sentiment classification. These are executed in the following order:
1. Preprocessing.
2. Identify words, phrases and sentiment words and phrases.
3. Classify sentential sentiment.
4. Evaluate product features based on the classified sentences.

Consider the following input sentence:
"HP dv 4 có thiết kế bắt mắt, ưa nhìn tuy nhiên giá quá cao." (The HP dv 4 has an eye-catching, good-looking design; however, the price is too high.)

In the preprocessing step, we apply word segmentation and POS tagging:
"<X>HP dv 4</X> <Vts>có</Vts> <Vt>thiết kế</Vt> <V>bắt mắt</V>, <A>ưa nhìn</A> <Cc>tuy nhiên</Cc> <Na>giá</Na> <Jd>quá</Jd> <An>cao</An>."

After preprocessing, we identify sentiment words and phrases:
"HP dv 4 có <kieudang>thiết kế</kieudang> <PosWord>bắt mắt</PosWord>, <PosWord>ưa nhìn</PosWord> tuy nhiên <gia>giá</gia> quá <NegWord>cao</NegWord>."

We then divide the sentence into simple sentences (or clauses) and classify each one's sentiment:
"<PosSen>HP dv 4 có thiết kế bắt mắt, ưa nhìn</PosSen> tuy nhiên <NegSen>giá quá cao.</NegSen>"

Finally, we summarize the overall sentiment for the product's features:
Kiểu dáng (appearance): 1/0 (#positive/#negative)
Giá (price): 0/1 (#positive/#negative)
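The pipeline just illustrated can be sketched in Python roughly as follows. This is a simplified illustration, not the actual GATE/JAPE implementation: the dictionaries are tiny samples rather than the paper's resources, clause splitting is reduced to a scan for connective words, and word segmentation is assumed to have already been applied.

```python
# Minimal sketch of the sentence-level pipeline: split a segmented sentence
# into clauses at connective words, classify each clause from sentiment-word
# dictionaries, then tally positive/negative counts per product feature.
# The dictionaries below are illustrative samples only.

POS_WORDS = {"bắt mắt", "ưa nhìn", "tốt"}            # PosWord samples
NEG_WORDS = {"cao", "xấu", "đắt"}                     # NegWord samples
FEATURES = {"thiết kế": "kiểu dáng", "giá": "giá"}    # feature-word samples
CONNECTIVES = {"tuy nhiên", "nhưng"}                  # clause-splitting connectives

def split_clauses(words):
    """Split a word-segmented sentence into simple clauses at connectives."""
    clauses, current = [], []
    for w in words:
        if w in CONNECTIVES:
            clauses.append(current)
            current = []
        else:
            current.append(w)
    clauses.append(current)
    return clauses

def classify_clause(clause):
    """PosSen if the clause has only positive sentiment words, NegSen if only
    negative ones, MixSen if both, None if it has no sentiment words."""
    has_pos = any(w in POS_WORDS for w in clause)
    has_neg = any(w in NEG_WORDS for w in clause)
    if has_pos and has_neg:
        return "MixSen"
    if has_pos:
        return "PosSen"
    if has_neg:
        return "NegSen"
    return None

def summarize(words):
    """Return {feature: [#positive, #negative]} for one segmented sentence."""
    summary = {}
    for clause in split_clauses(words):
        label = classify_clause(clause)
        if label not in ("PosSen", "NegSen"):
            continue
        for w in clause:
            if w in FEATURES:
                counts = summary.setdefault(FEATURES[w], [0, 0])
                counts[0 if label == "PosSen" else 1] += 1
    return summary
```

On the segmented example sentence, this sketch reproduces the summary above: kiểu dáng 1/0 and giá 0/1.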

Figure 1 – System overview

The effectiveness of the GATE framework for NLP tasks has been proven in much research, so we decided to build our Vietnamese sentiment analysis system as plugins on GATE. The architecture of the system is shown in Figure 1, with the following three components:
1. Preprocessing: word segmentation and POS tagging.
2. Dictionaries.
3. Rules: word identification, sentence classification, and feature evaluation.

3.3 Preprocessing

A distinctive feature of the Vietnamese language is word segmentation. An English word is delimited by space characters, but Vietnamese words are different: a word can consist of more than one monosyllable. For example, the sentence:
"Học sinh học sinh học."
may be word-segmented as:
"Học_sinh học sinh_học." (Students study biology.) or
"Học sinh_học sinh_học."
In our system, we reuse the existing Coltech.NLP.tokenizer plugin [4] for word segmentation and POS tagging.

3.4 Dictionaries

During the process of annotating the corpus with Callisto, we created a number of dictionaries, which can be divided into two groups:
1. Dictionaries containing names related to feature recognition:
   a. Dictionary of words related to configuration features of computer products, such as: cấu hình (configuration), hệ thống (system), vi xử lý (CPU), etc.
   b. Dictionary of words related to the "kiểu dáng" (appearance) feature: kiểu dáng (appearance), thiết kế (design), thân hình (body), kích thước (size), màu sắc (color), etc.
2. Dictionaries containing words used in the rules that identify feature sentiment:
   a. Positive word dictionary: tốt (good), tuyệt vời (excellent), hoàn hảo (perfect), hài lòng (satisfying), etc.
   b. Negative word dictionary: xấu (ugly), đắt (expensive), thô (rough), phàn nàn (complain), thất vọng (disappointing), etc.
   c. Reverse opinion word dictionary: không thể (cannot), không quá (not too), etc.

3.5 Rules

There are four types of rules:
1. Dictionary lookup correction.
2. Sentiment word recognition.
3. Sentential sentiment classification.
4. Feature evaluation.

We use GATE's JAPE grammar to specify our rules. A JAPE grammar allows one to specify regular expression patterns over semantic annotations. Below is an example of a JAPE rule:

Rule: rulePositive1
Priority: 1
(
  (StrongWord)
  ({Word.category == "O"})?
  ({Lookup.majorType == "positive"})
):name
-->
:name.PosWordFirst = {kind = "StrongWord + <O>? + <PosWord>",
                      type = "Positive", rule = "Positive recognition"}
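A rough Python analogue of the pattern this JAPE rule expresses — a StrongWord token, an optional token of category "O", then a word from the positive dictionary — might look like the sketch below. It operates on plain token/category lists rather than GATE annotations, and the word lists are illustrative samples, not the system's actual dictionaries.

```python
# Sketch of the rulePositive1 pattern: StrongWord, optionally followed by
# one token of category "O", followed by a positive-dictionary word.
# Word lists are illustrative samples only.

STRONG_WORDS = {"rất", "siêu", "khá", "cực"}          # adverbs preceding PosWords
POSITIVE = {"tốt", "tuyệt vời", "hoàn hảo", "hài lòng"}

def match_pos_word_first(tokens, categories):
    """Return (start, end) spans matching: StrongWord + optional O + positive word.
    `categories` gives a POS-style category per token; "O" marks filler tokens."""
    spans = []
    i = 0
    while i < len(tokens):
        if tokens[i] in STRONG_WORDS:
            j = i + 1
            if j < len(tokens) and categories[j] == "O":  # the optional O token
                j += 1
            if j < len(tokens) and tokens[j] in POSITIVE:
                spans.append((i, j + 1))  # span covering the whole pattern
                i = j + 1
                continue
        i += 1
    return spans
```

For instance, on the token list ["máy", "rất", "tốt"] the matcher returns the span covering "rất tốt", which would receive the PosWordFirst annotation in the JAPE version.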
In the first step, we remove monosyllables that appear in the dictionaries but are not words and do not carry the intended meaning in context. For example:
"Macbook Pro MB471ZPA có giá quá cao. Tuy nhiên chiếc Laptop này vẫn được đánh giá cao." (The Macbook Pro MB471ZPA's price is too high. However, this laptop is still rated highly.)
Because our dictionaries define the word "giá" to refer to the feature "giá" (price), it would be incorrect to identify the "giá" inside the word "đánh giá" (to rate) as the feature "giá". This is simply fixed by overwriting the dictionary lookup results with the word segmentation results.

In the sentiment word recognition step, sentiment words are determined from the dictionaries, but there are many cases where matching against dictionaries without considering context gives wrong results. For example, "thời trang" (fashionable) is a sentiment word in the sentence "Phong cách rất thời trang" (The style is very fashionable) but not in "Thiết kế của máy có nét thời trang giống với chiếc xe ô tô" (The machine's design has a fashionable look like that of a car). There are also cases where a word can carry either positive or negative sentiment depending on context: the word "cao" (high) is positive when it refers to computer configuration but negative when it refers to price.

In context, it is easy to notice that sentiment words usually appear after certain adverbs. For example, positive sentiment words (PosWord) go with "rất" (very), "siêu", "khá", "cực", "đáp ứng", while negative sentiment words (NegWord) go with "dễ", "hơi", "gây", "bị". We use the following pattern to recognize sentiment words:
<StrongWord> + <Adv> + <word in sentiment dictionaries> -> opinion word

When a user uses multiple sentiment words to describe a feature, as in:
"Laptop cho doanh nhân Acer Aspire 3935 sử dụng thiết kế phá cách, hiện đại." (The Acer Aspire 3935 business laptop uses an unconventional, modern design.)
we use the pattern:
<Opinion word> (<conjunction: , và hay …> <Opinion word>)*

Another important scenario is when users use words that reverse the sentiment of the following statement. We simply use the following rule to handle this case:
<Reverse Opinion> <positive (negative) word> -> <negative (positive) word>

In addition, we also create other rules based on POS tags, using unit testing to ensure consistency between new rules and the data already correctly identified by existing rules.

Figure 2 – Sentiment word recognition in GATE

The sentential sentiment classification step consists of two main subtasks:
• Simple sentence (or clause) splitting.
• Sentiment sentence classification: PosSen (positive sentence), NegSen (negative sentence), MixSen (mixed sentence) and CompSen (comparison sentence).

Compound sentences may contain more than one clause discussing several features of a product. The simple sentence splitting step identifies compound sentences and splits them into separate simple sentences, using rules over connective words. After this step, all sentences are considered simple and talk about only one feature per sentence.

For sentence classification, positive sentences (PosSen) are assumed to include only positive words (PosWord), and negative sentences (NegSen) only negative words (NegWord). Mixed sentences (MixSen) contain both positive and negative sentiment words. Among sentences not containing any sentiment words, we identify those containing comparison expressions and label them CompSen. Because comparison sentences usually compare one product with another, we assume the target product of the document is always mentioned first and that the nature of the comparison corresponds to the sentiment: a better or worse comparison yields positive or negative sentiment respectively. In effect, CompSen sentences are converted to PosSen or NegSen where appropriate.

Overall feature evaluation is based on the result of simple sentence classification. For positive and negative sentences, it is quite straightforward: we only have to identify the feature mentioned in the sentence and deem the sentence's sentiment to be the sentiment of the feature. For mixed sentences, we use
an assumption that they normally have the format <Feature> <Opinion> <Feature> <Opinion>. Therefore we associate each sentiment with the nearest preceding feature.

Feature evaluation simply counts how many positive and negative sentences contain each feature and outputs the ratio between the numbers of positive and negative sentences. This ratio captures how users think about the feature.

4. Experiments

We collected a corpus of computer product reviews and feedback and manually annotated all the data using the annotations described in section 3.1. The corpus consists of 3971 sentences in 20 documents corresponding to 20 products. We divided the corpus into two parts: a training set and a test set. The training set contains 16 documents (3182 sentences) and is used to create the dictionaries and rules for identifying all the annotations. The test set contains 4 documents and is used to test the performance of our rule-based system.

We run the experiments at three levels: word, sentence and feature. For word- and sentence-level evaluation, we simply compare the annotations produced by the system at the corresponding level with the manually created annotations in the test data.

4.1 Experiment for sentiment word recognition

At the word level, we evaluate how well the system can identify PosWord and NegWord in the test data using the standard Precision, Recall and F-measure. Table 1 and Table 2 show the results of the system on training data and test data respectively. The rule-based system appears to generalize quite well for the sentiment word recognition task, as the F-measure on the test data is comparable to that on the training data.

Table 1 – Result of sentiment word recognition on training data

          #Annotation  #System Annotation  #True annotation  Precision  Recall   F-measure
PosWord   441          376                 334               88.83%     75.74%   82.28%
NegWord   153          122                 93                76.23%     60.78%   68.51%
All       598          502                 431               85.86%     72.07%   78.97%

Table 2 – Result of sentiment word recognition on test data

          #Annotation  #System Annotation  #True annotation  Precision  Recall   F-measure
PosWord   300          237                 214               90.30%     71.33%   79.70%
NegWord   60           62                  42                67.74%     70.00%   68.85%
All       362          301                 258               85.71%     71.27%   77.83%

4.2 Experiment for sentential sentiment classification

At the sentence level, we evaluate the system on the task of labeling PosSen, NegSen and MixSen annotations. Table 3 and Table 4 show the F-measures of the system for recognizing these three annotations on training and test data respectively.

Table 3 – Result of sentential sentiment classification on training data

         #Annotation  #System Annotation  #True annotation  Precision  Recall   F-measure
PosSen   231          218                 154               70.64%     66.67%   68.60%
NegSen   97           96                  67                69.79%     69.07%   69.43%
MixSen   9            26                  7                 26.92%     77.78%   40.00%
All      340          343                 231               67.35%     67.94%   67.64%

Table 4 – Result of sentential sentiment classification on test data

         #Annotation  #System Annotation  #True annotation  Precision  Recall   F-measure
PosSen   157          157                 99                63.06%     63.06%   63.06%
NegSen   49           45                  34                75.56%     69.39%   72.34%
MixSen   5            21                  3                 14.29%     60.00%   23.08%
All      212          224                 137               61.16%     64.62%   62.84%

It can be seen that the performance for identifying sentential sentiment is not very high compared to sentiment words. This is partly due to the simple heuristic we use to identify sentential sentiment based solely on sentiment words. MixSen also proves much more difficult to recognize than PosSen and NegSen.

4.3 Features Evaluation

For every product, we evaluate the performance of the system on each feature of the product. In this experiment, we evaluate five features: "vận hành" (operation), "cấu hình" (configuration), "màn hình" (monitor), "giá" (price), and "kiểu dáng" (appearance). The output of the system for each feature is the ratio a/b, where a and b are the numbers of positive and negative sentences mentioning the feature respectively. For example, 15/10 means 15 positive sentences and 10 negative sentences discuss the feature.

We define the following measures for a feature:

Degree of positive sentiment = (number of PosSen) / (number of PosSen + number of NegSen)
Deviation = |system's degree of positive sentiment – correct degree of positive sentiment|
Correctness = (1 – Deviation) × 100%

The correctness for a product is the average of the correctness measures over the product's features.

Table 5 and Table 6 show the correctness of the system when analyzing sentiment for some products on training data and test data respectively.

Table 5 – Result of features evaluation on training data

Product                       Correctness
Acer Aspire 3935              92.83%
Apple Macbook Air MB543ZPA    84.26%
Acer Aspire AS4736            96.11%
All                           91.07%

Table 6 – Result of features evaluation on test data

Product                 Correctness
Dell Inspiron 1210      84.32%
Compaq Presario CQ40    89.99%
HP Pavilion dv3         92.11%
All                     88.81%

Even though the system's performance at the sentence level is not very high, looking at the product as a whole the result is quite reasonable, with an average correctness of nearly 90%.

5. Conclusion

We have built a rule-based sentence-level sentiment analysis system for Vietnamese computer product reviews. Our system looks at the features of a product and outputs the ratio of the numbers of positive and negative sentiments towards every feature. To the best of our knowledge, this is the pioneering work for Vietnamese.

Even though the system achieves F-measures of only around 77% and 63% at the word and sentence levels respectively, the overall result for a product is about 89% correctness. While the measure used for evaluating performance at the product level is subjective, it is indicative of the effectiveness and potential of our system.

In the future, we plan to collect a larger data set with more diverse domains and to combine our system with machine learning approaches.

References

[1] Anne Kao and Stephen R. Poteet. 2006. Natural Language Processing and Text Mining. April 2006, Chapter 2.
[2] H. Cunningham, D. Maynard, K. Bontcheva, V. Tablan. 2002. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL'02). Philadelphia, July 2002.
[3] Kenneth Ward Church, Patrick Hanks. 1989. Word association norms, mutual information and lexicography. Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics. 1989, Vancouver, B.C., Canada, pp. 76–83.
[4] Dang Duc Pham, Giang Binh Tran, Son Bao Pham. 2009. A Hybrid Approach to Vietnamese Word Segmentation using Part of Speech tags. International Conference on Knowledge and Systems Engineering.
[5] David Day, Chad McHenry, Robyn Kozierok, Laurel Riek. 2004. Callisto: A Configurable Annotation Workbench. Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004). ELRA. May 2004.
[6] Xiaowen Ding, Bing Liu, Lei Zhang. 2009. Entity Discovery and Assignment for Opinion Mining Applications. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
[7] Christiane Fellbaum. 1998. WordNet: An Electronic Lexical Database. MIT Press.
[8] Vasileios Hatzivassiloglou and Kathleen R. McKeown. 1997. Predicting the Semantic Orientation of Adjectives. Proceedings of the 8th Conference of the European Chapter of the Association for Computational Linguistics. 1997, Madrid, Spain.
[9] Minqing Hu and Bing Liu. 2004. Mining and summarizing customer reviews. Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Aug. 22–25, 2004, Seattle, WA, USA.
[10] Chris Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA.
[11] Tetsuya Nasukawa, Jeonghee Yi. 2003. Sentiment Analysis: Capturing Favorability Using Natural Language Processing. Proceedings of the 2nd International Conference on Knowledge Capture.
[12] Mary S. Neff, Roy J. Byrd, and Branimir K. Boguraev. 2003. The Talent System: TEXTRACT Architecture and Data Model. Proceedings of the HLT-NAACL 2003 Workshop on Software Engineering and Architecture of Language.
[13] Bo Pang, Lillian Lee and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment classification using machine learning techniques. Proceedings of the 7th Conference on Empirical Methods in Natural Language Processing (EMNLP-02).
[14] Peter Turney. 2001. Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. Proceedings of the 12th European Conference on Machine Learning. Berlin: Springer-Verlag, pp. 491–502.
[15] Peter Turney. 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL-02). Jun. 2002, Philadelphia, PA, USA, pp. 417–424.
[16] http://tinvadung.vn
[17] http://callisto.mitre.org/download.html
