See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/262728970
Vietnamese Sentiment Analysis
All content following this page was uploaded by Son Bao Pham on 01 June 2014.
Sentiment Analysis for Vietnamese
We collected a corpus of computer product reviews and feedback and manually annotated all the data using the annotations described in section 3.1. The corpus consists of 3971 sentences in 20 documents corresponding to 20 products. We divided the corpus into two parts: a training set and a test set. The training set contains 16 documents (3182 sentences) and is used to create the dictionaries and rules for identifying all the annotations. The test set contains 4 documents and is used to test the performance of our rule-based system.

We ran the experiments at three levels: word, sentence and feature. For the word-level and sentence-level evaluations, we simply compare the annotations produced by the system at the corresponding level with the manually created annotations in the test data.

4.1 Experiment for sentiment word recognition

At the word level, we evaluate how well the system can identify PosWord and NegWord in the test data using the standard Precision, Recall and F-measure. Table 1 and Table 2 show the results of the system on the training data and test data respectively. The rule-based system appears to generalize quite well on the sentiment word recognition task, as the F-measure on the test data is comparable to that on the training data.

Table 1 – Result of sentiment word recognition on training data

            #Annotation   #System Annotation   #True annotation   Precision   Recall    F-measure
  PosWord   441           376                  334                88.83%      75.74%    82.28%
  NegWord   153           122                  93                 76.23%      60.78%    68.51%

Table 2 – Result of sentiment word recognition on test data

            #Annotation   #System Annotation   #True annotation   Precision   Recall    F-measure
  PosWord   300           237                  214                90.30%      71.33%    79.70%
  NegWord   60            62                   42                 67.74%      70.00%    68.85%
  All       362           301                  258                85.71%      71.27%    77.83%

4.2 Experiment for sentential sentiment classification

At the sentence level, we evaluate the system on the task of labeling the PosSen, NegSen and MixSen annotations. Table 3 and Table 4 show the F-measures of the system for recognizing these three annotations on the training and test data respectively.

Table 3 – Result of sentential sentiment classification on training data

           #Annotation   #System Annotation   #True annotation   Precision   Recall    F-measure
  PosSen   231           218                  154                70.64%      66.67%    68.60%
  NegSen   97            96                   67                 69.79%      69.07%    69.43%
  MixSen   9             26                   7                  26.92%      77.78%    40.00%
  All      340           343                  231                67.35%      67.94%    67.64%

Table 4 – Result of sentential sentiment classification on test data

           #Annotation   #System Annotation   #True annotation   Precision   Recall    F-measure
  PosSen   157           157                  99                 63.06%      63.06%    63.06%
  NegSen   49            45                   34                 75.56%      69.39%    72.34%
  MixSen   5             21                   3                  14.29%      60.00%    23.08%
  All      212           224                  137                61.16%      64.62%    62.84%

It can be seen that the performance on sentential sentiment is not very high compared to sentiment words. This is partly due to the simple heuristic we use, which identifies sentential sentiment based solely on sentiment words. MixSen also proves to be much more difficult to recognize than PosSen and NegSen.

4.3 Features Evaluation

For every product, we evaluate the performance of the system on each feature of the product. In this experiment we evaluate five features: “vận hành” (operation), “cấu hình” (configuration), “màn hình” (monitor), “giá” (price), and “kiểu dáng” (appearance). The output of the system for each feature is the ratio a/b, where a and b are the numbers of positive and negative sentences mentioning the feature respectively. For example, 15/10 means that 15 positive sentences and 10 negative sentences discuss the feature.

We define the following measures for a feature:

Degree of positive sentiment = (number of PosSen) / (number of PosSen + number of NegSen)
Deviation = | system’s degree of positive sentiment - correct degree of positive sentiment |
Correctness = (1 - Deviation) * 100%

The correctness for a product is the average of the correctness measures of the product’s features.

Table 5 and Table 6 show the correctness of the system when analyzing sentiments for some products on the training data and test data respectively.

Table 5 – Result of features evaluation on training data

  Product                      Correctness
  Acer Aspire 3935             92.83%
  Apple Macbook Air MB543ZPA   84.26%
  Acer Aspire AS4736           96.11%
  All                          91.07%

Table 6 – Result of features evaluation on test data

  Product                Correctness
  Dell Inspiron 1210     84.32%
  Compaq Presario CQ40   89.99%
  HP Pavilion dv3        92.11%
  All                    88.81%

Even though the system’s performance at the sentence level is not very high, the results for a product as a whole are quite reasonable, with an average correctness of nearly 90%.

5. Conclusion

We have built a rule-based sentiment analysis system that analyzes Vietnamese computer product reviews at the sentence level. Our system looks at the features of a product and outputs the ratio of the number of positive to negative sentiments towards each feature. To the best of our knowledge, this is the pioneering work for Vietnamese.

Even though the system achieves F-measures of only around 77% and 63% at the word and sentence levels respectively, the overall result for a product is about 89% correctness. While the measure used for evaluating the performance of the system at the product level is subjective, it is indicative of the effectiveness and potential of our system.

In the future, we plan to collect a larger data set with more diverse domains and to combine our system with machine learning approaches.

References

[1] Anne Kao and Stephen R. Poteet. April 2006. Natural Language Processing and Text Mining. Chapter 2.
[2] H. Cunningham, D. Maynard, K. Bontcheva, V. Tablan. 2002. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL'02). Philadelphia, July 2002.
[3] Kenneth Ward Church, Patrick Hanks. 1989. Word association norms, mutual information and lexicography. Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics. Vancouver, B.C., Canada, pp. 76–83.
[4] Dang Duc Pham, Giang Binh Tran, Son Bao Pham. 2009. A Hybrid Approach to Vietnamese Word Segmentation using Part of Speech tags. International Conference on Knowledge and Systems Engineering.
[5] David Day, Chad McHenry, Robyn Kozierok, Laurel Riek. 2004. Callisto: A Configurable Annotation Workbench. Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004). ELRA, May 2004.
[6] Xiaowen Ding, Bing Liu, Lei Zhang. 2009. Entity Discovery and Assignment for Opinion Mining Applications. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
[7] Christiane Fellbaum. 1998. WordNet: An Electronic Lexical Database. MIT Press.
[8] Vasileios Hatzivassiloglou and Kathleen R. McKeown. 1997. Predicting the Semantic Orientation of Adjectives. Proceedings of the 8th Conference of the European Chapter of the Association for Computational Linguistics. Madrid, Spain.
[9] Minqing Hu and Bing Liu. 2004. Mining and summarizing customer reviews. Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Aug. 22–25, 2004, Seattle, WA, USA.
[10] Chris Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA.
[11] Tetsuya Nasukawa, Jeonghee Yi. 2003. Sentiment Analysis: Capturing Favorability Using Natural Language Processing. Proceedings of the 2nd International Conference on Knowledge Capture.
[12] Mary S. Neff, Roy J. Byrd, and Branimir K. Boguraev. 2003. The Talent System: TEXTRACT Architecture and Data Model. Proceedings of the HLT-NAACL 2003 Workshop on Software Engineering and Architecture of Language.
[13] Bo Pang, Lillian Lee and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment classification using machine learning techniques. Proceedings of the 7th Conference on Empirical Methods in Natural Language Processing (EMNLP-02).
[14] Peter Turney. 2001. Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. Proceedings of the 12th European Conference on Machine Learning. Berlin: Springer-Verlag, pp. 491–502.
[15] Peter Turney. 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL-02). Jun. 2002, Philadelphia, PA, USA, pp. 417–424.
[16] http://tinvadung.vn
[17] http://callisto.mitre.org/download.html
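The Precision, Recall and F-measure figures reported in the tables follow directly from the three counts given per row (#Annotation, #System Annotation, #True annotation). A minimal sketch of the computation (function and variable names are ours, not from the paper), checked against the "All" row of Table 4:

```python
def prf(num_annotation: int, num_system: int, num_true: int):
    """Precision, recall and F-measure from annotation counts.

    num_annotation: manually annotated (gold) instances
    num_system:     instances proposed by the system
    num_true:       system instances matching a gold annotation
    """
    precision = num_true / num_system
    recall = num_true / num_annotation
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# "All" row of Table 4: 212 gold, 224 system, 137 correct
p, r, f = prf(212, 224, 137)
print(f"{p:.2%} {r:.2%} {f:.2%}")  # 61.16% 64.62% 62.84%
```

The same call reproduces every row of Tables 1 to 4 from its first three columns.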
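The paper does not spell out the sentence-level heuristic beyond saying it relies solely on sentiment words. As an illustration only, one plausible reading is a polarity check over the PosWord/NegWord annotations found in the sentence (the rule below, including treating mixed polarity as MixSen, is our assumption):

```python
def classify_sentence(num_pos_words: int, num_neg_words: int):
    """Hypothetical heuristic: label a sentence from its sentiment-word counts.

    Assumed rule (not stated in the paper): both polarities present -> MixSen;
    otherwise the sentence takes the polarity of the words it contains.
    """
    if num_pos_words and num_neg_words:
        return "MixSen"
    if num_pos_words:
        return "PosSen"
    if num_neg_words:
        return "NegSen"
    return None  # no sentiment words: the sentence gets no annotation
```

A heuristic of this shape would explain why MixSen is hardest to recognize: it fires only when both word classes are detected correctly in the same sentence.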
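The measures above translate directly into code. A minimal sketch, representing each a/b ratio as a (positive, negative) pair of sentence counts (the function names are ours):

```python
def degree_of_positive(num_pos_sen: int, num_neg_sen: int) -> float:
    """Degree of positive sentiment = PosSen / (PosSen + NegSen)."""
    return num_pos_sen / (num_pos_sen + num_neg_sen)

def feature_correctness(system_ratio, gold_ratio) -> float:
    """Correctness = (1 - |system degree - correct degree|) * 100%."""
    deviation = abs(degree_of_positive(*system_ratio)
                    - degree_of_positive(*gold_ratio))
    return (1 - deviation) * 100

def product_correctness(features) -> float:
    """Average of the per-feature correctness values for one product."""
    scores = [feature_correctness(sys, gold) for sys, gold in features]
    return sum(scores) / len(scores)

# Example: the system reports 15/10 for a feature that is really 20/5
print(round(feature_correctness((15, 10), (20, 5)), 2))  # 80.0
```

Note that the measure compares only the positive/negative proportions, not the absolute sentence counts, which is why it is described as somewhat subjective.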