
International Journal of Scientific Research Engineering & Technology (IJSRET), ISSN 2278 0882

Volume 4, Issue 2, February 2015

Improving Semantic Knowledge Base for Transfer Learning in Sentiment Analysis
R. Gayathri¹, K. Krishna Kumari²
¹P.G Student, ²Associate Professor


Department of Computer Science and Engineering, A.V.C College of Engineering, Mayiladuthurai, Tamil Nadu

Abstract
Sentiment analysis, which deals with the computational
treatment of opinion, sentiment, and subjectivity in text,
has attracted a great deal of attention. It has been widely
used across a range of domains in recent years, such as
information retrieval, question answering systems and
social networks. This paper presents a new method for
improving the semantic knowledge base for sentiment
classification in social web applications. It comprises
three steps: identifying sentiment terms, extracting context
information from the training corpus, and grounding this
information to lexical resources such as WordNet. This
work applies the knowledge base to a transfer learning task
called cross-domain sentiment classification. In sentiment
analysis, transfer learning can be applied to transfer
sentiment classification from one domain to another or to
build a bridge between two domains. This is achieved by
learning the semantic knowledge base across the different
domains. A model called AS-LDA is used for the sentiment
classification. The proposed system improves the accuracy
of the sentiment classifier to a significant extent.
Key terms: WordNet, Sentiment Analysis, Cross-Domain
Sentiment Classification, Transfer Learning.

1. INTRODUCTION
Sentiment analysis is a technique for classifying people's
opinions in product reviews, blogs or social networks.
Large datasets are available online today; they can be
numerical or text files, and they can be structured,
semi-structured or unstructured. Approaches and techniques
for extracting useful information from these data have
been a major focus of many researchers and practitioners
lately. Many different information retrieval techniques
and tools have been proposed for different data types. In
addition to data and text mining, there has been a growing
interest in non-topical text analysis in recent years, and
sentiment analysis is one example. Sentiment analysis, also
known as opinion mining, identifies and extracts subjective
information in source materials, which can be positive,
neutral, or negative.

Sentiment analysis allows businesses to track flame
detection, new product perception, brand perception and
reputation management. It allows individuals to get
opinions on something on a global scale. Sentiment is
expressed differently in different domains, and it is costly
to annotate data for each new domain in which we would
like to apply a sentiment classifier. Here, we propose a
cross-domain classification method that overcomes these
challenges.
1.1 Problem Description
Many examples in knowledge engineering can be found
where transfer learning can truly be beneficial. Consider
the problem of sentiment classification, where our task is
to automatically classify the reviews on a product, such as
a brand of camera, into positive and negative views. For
this classification task, we need to first collect many
reviews of the product and annotate them. We would then
train a classifier on the reviews with their corresponding
labels. Since the distribution of review data among
different types of products can be very different, to
maintain good classification performance, we need to
collect a large amount of labeled data in order to train the
review-classification models for each product. However,
this data-labeling process can be very expensive to do. To
reduce the effort for annotating reviews for various
products, we may want to adapt a classification model that
is trained on some products to help learn classification
models for some other products. In this paper, a new
Knowledge Based Transfer Learning (KBTL) model is
proposed.
1.2 Sentiment Classification
Sentiment classification is an opinion mining activity
concerned with determining the overall sentiment
orientation, if any, of the opinions contained within a
given document. It is generally assumed that the document
being inspected contains subjective information, as in
product reviews and feedback forms. Opinion orientation
can be classified as belonging to one of two opposing
polarities: positive or negative feedback about a product,
or favorable or unfavorable opinions on a topic.

www.ijsret.org

1.3 Transfer Learning Method
Transfer learning extracts knowledge from an auxiliary
domain to improve the learning process in a target domain.
Transfer learning is considered a new cross-domain
learning technique, as it addresses the various aspects of
domain differences. In sentiment analysis, transfer
learning can be applied to transfer sentiment classification
from one domain to another or to build a bridge between
two domains. The proposed system accurately transfers
sentiment classification across different domains using
the enriched knowledge base.
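
The cross-domain setting described above can be illustrated with a minimal sketch: a classifier is trained only on labeled source-domain reviews and then applied to a different target domain. The multinomial Naive Bayes model used here is a stand-in chosen for brevity, not the KBTL or AS-LDA model of this paper, and the toy reviews are invented for illustration.

```python
import math
from collections import Counter


def train_naive_bayes(docs):
    """Train a multinomial Naive Bayes model on (text, label) pairs."""
    word_counts = {}          # label -> Counter of word frequencies
    label_counts = Counter()  # label -> number of training documents
    for text, label in docs:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def classify(model, text):
    """Return the most probable label, using add-one smoothing."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in the source domains are skipped
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy reviews (assumption): two source domains and one target domain.
source = [
    ("great room and friendly staff", "positive"),
    ("dirty room and terrible service", "negative"),
    ("great album with excellent songs", "positive"),
    ("boring album and terrible sound", "negative"),
]
target = [
    ("excellent laptop with great screen", "positive"),
    ("terrible battery and boring design", "negative"),
]

model = train_naive_bayes(source)
predictions = [classify(model, text) for text, _ in target]
print(predictions)
```

Target-domain words that never occur in the source domains (here "laptop", "battery") contribute nothing to the decision. This vocabulary gap is the kind of problem an enriched knowledge base is meant to reduce.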

2. RELATED WORK
Common sentiment analysis tasks: according to [5], the
basic tasks of opinion mining are polarity classification
and agreement detection. Polarity classification occurs
when a piece of text stating an opinion on a single issue is
classified as one of two opposing sentiments. Polarity
classification also identifies pro and con expressions in
online reviews. Agreement detection determines whether a
pair of text documents should receive the same or different
sentiment-related labels.
WordNet relations: the work in [2] proposed
WordNet-Affect, which applies WordNet relations to
generate synsets that still represent affective concepts. If
the resulting synsets are already members of
WordNet-Affect, the answer is trivially affirmative. For
other relations, such as hyperonymy, entailment, causes and
verb-group, the affective meaning is only assumed, and the
synsets must be filtered manually in order to select the
affective concepts.
NLP curves: the review in [6] covers automated analysis
techniques for extracting and manipulating text meaning. An
NLP system must have access to a significant amount of
knowledge about the world and the domain of discourse.

Text categorization [17] addresses the use of the Support
Vector Machine (SVM). Based on the principle of structural
risk minimization, it analyzes the particular properties of
learning with text data and finds good parameters
automatically.
The domain adaptation machine [4] learns a robust
classifier for the target domain by leveraging many base
classifiers, which can be learned from labeled samples in
the source domains or the target domain.

Transfer component analysis [15] aims to extract a shared
feature subspace in which the difference in distributions
across domains is reduced by minimizing predefined
distance measures.
Structural correspondence learning [9] links the source and
target domains by selecting pivot features, and corrects
misalignment using labeled instances.

3. PROPOSED SYSTEM
The proposed knowledge base can be used to improve
existing context-aware approaches that use vector spaces to
address the problem of contextual polarity change. It aims
to increase the lexicon's coverage and derive information
for subsequent sentiment analysis. We use WordNet terms
and their polarity values to generate a baseline sentiment
lexicon, identify sentiment terms, extract context
information from the training corpus, and ground this
information to lexical resources such as WordNet. This
knowledge base is used for domain adaptation in
cross-domain sentiment classification. It provides a
two-stage framework for cross-domain sentiment
classification. In the first stage, a bridge is built between
the source domain and the target domain to obtain the most
confidently labeled documents in the target domain. In the
second stage, the intrinsic structure revealed by these
labeled documents is exploited to label the target-domain
data.
WordNet is a lexical database for the English language that
groups English words into sets of synonyms called synsets.
WordNet distinguishes between nouns, verbs and adjectives
as major categories. At the concept level, WordNet, shown
in Figure 1, is used as a knowledge base for deriving the
semantic and lexical relations.

Fig.1 WordNet
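
The lexicon-building steps of the proposed system can be sketched as follows: start from a baseline lexicon of prior polarities, collect the words that co-occur with each sentiment term in positively and negatively labeled training documents, and use those context terms to adjust the polarity of an ambiguous term. This is one possible reading of the approach, not the authors' implementation; the lexicon values, stop-word list, training sentences and scoring rule are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical baseline lexicon: term -> prior polarity in [-1, 1].
# In the paper these values are derived from WordNet terms; these are invented.
BASELINE_LEXICON = {"good": 0.8, "bad": -0.8, "long": 0.0}

STOP_WORDS = {"the", "a", "is", "and", "but", "it", "this", "very"}


def content_words(text):
    """Lowercase, split on whitespace and drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def build_contexts(labeled_docs, lexicon):
    """For each lexicon term, count the words co-occurring with it in
    documents labeled 'pos' and in documents labeled 'neg'."""
    contexts = defaultdict(lambda: {"pos": Counter(), "neg": Counter()})
    for text, label in labeled_docs:
        tokens = content_words(text)
        for term in set(tokens) & set(lexicon):
            contexts[term][label].update(t for t in tokens if t != term)
    return contexts


def contextual_polarity(term, text, lexicon, contexts):
    """Shift the prior polarity of a term by the balance of positive and
    negative context terms observed around it (an assumed scoring rule)."""
    tokens = content_words(text)
    pos = sum(contexts[term]["pos"][t] for t in tokens)
    neg = sum(contexts[term]["neg"][t] for t in tokens)
    if pos + neg == 0:
        return lexicon[term]  # no context evidence: keep the prior
    return lexicon[term] + (pos - neg) / (pos + neg)


# Invented training corpus (assumption), labeled at document level.
TRAINING = [
    ("the battery life is long", "pos"),
    ("long battery life and good screen", "pos"),
    ("long waiting time", "neg"),
    ("the queue is long and the service is bad", "neg"),
]

contexts = build_contexts(TRAINING, BASELINE_LEXICON)
print(contextual_polarity("long", "the battery life is long", BASELINE_LEXICON, contexts))
print(contextual_polarity("long", "the queue is long", BASELINE_LEXICON, contexts))
```

In this toy corpus the neutral term "long" becomes positive next to "battery" and negative next to "queue", which is the contextual polarity change the knowledge base is meant to capture.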

This section summarizes the sequence of steps used to
identify the sentiment terms and ground them to
common-sense and common knowledge, i.e., WordNet senses.
The gained knowledge is used to enrich concepts with
context information from WordNet definitions and
statements. Figure 2 illustrates the overall architecture of
the KBTL system.

Fig.2 Overview of the Proposed KBTL Model (box labels: Training Data, Domain Specific Keywords, Test Data, Preprocessor, Word Splitter, Removing Stop Words, POS Tagger, Feature Extraction, Cross Domain Sentiment Classification, Accuracy Detection)

3.1 Training Data Set
Features are extracted from the training data, and the
resulting model is trained to make decisions on test data
from the same domain. Domain-specific keywords are
collected and used as input for both training and testing.
The most important indicators of sentiment are sentiment
words, also called opinion words. These are words that are
commonly used to express positive or negative sentiments.
Although sentiment words and phrases are important for
sentiment analysis, using them alone is far from sufficient.
The problem is much more complex. In other words, a
sentiment lexicon is necessary but not sufficient for
sentiment analysis.
3.2 Preprocessing the Test Data
The test data are collected and preprocessed for
classification. Figure 3 shows the stages involved in
preprocessing, which are explained below.
a. Lexical Analyzer: Converts the plain input text into an
output token stream. This module is produced with the
JavaCC parser generator. Additionally, this module spots
the possible affective containers (content words) and
valence shifters such as negation words and intensifiers,
and filters out stop words such as function words.
b. Sentence Splitter: Delimits sentences using a binary
decision tree. In general, periods, uppercase letters,
exclamation points and question marks are good indicators
of sentence boundaries.
c. POS Tagger: Determines the function of nouns, verbs
and adjectives (classes of words with a possible affective
content) within the sentence. A statistical approach is
implemented using the Stanford log-linear POS tagger.

Fig.3 Preprocessing System (box labels: Test Data, Preprocessing, Lexical Analyzer, Splitter, POS Tagger, Bag of Words)

3.3 Feature Extraction
The classifier predicts the most appropriate sentiment
label according to the features extracted from the terms
observed in the text, which is usually treated as a bag of
words. Word-Sense Disambiguator: resolves the meaning
of affective words (i.e., nouns, adjectives and verbs)
according to their context. It uses a semantic similarity
measure to score the senses of an affective word against
the context words using the WordNet ontology.
Additionally, the module retrieves the set of synonyms for
the resulting sense in order to expand the feature space.
Grounding sentiment terms to WordNet concepts involves
a series of processing steps, including the extraction of
positive and negative context terms from the
contextualized sentiment lexicon.
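
The preprocessing and feature-extraction steps can be sketched together in a few lines. This is a simplified illustration under stated assumptions: regular expressions replace the JavaCC lexer and the decision-tree sentence splitter, POS tagging is omitted, and a gloss-overlap score (a simplified Lesk measure) is swapped in for the WordNet-based semantic similarity measure. The word lists and the `SENSES` inventory are hypothetical stand-ins for WordNet data.

```python
import re
from collections import Counter

# Small illustrative word lists (assumptions); real systems use fuller resources.
STOP_WORDS = {"the", "a", "an", "is", "was", "it", "of", "and", "to", "in", "on", "at"}
NEGATIONS = {"not", "no", "never"}
INTENSIFIERS = {"very", "extremely", "really"}

# Hypothetical sense inventory standing in for WordNet glosses and synonyms.
SENSES = {
    "cold": [
        {"id": "cold.temperature", "gloss": "having a low temperature",
         "synonyms": ["chilly", "cool"]},
        {"id": "cold.unfriendly",
         "gloss": "lacking warmth of feeling toward staff or people",
         "synonyms": ["unfriendly", "distant"]},
    ],
}


def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace and an uppercase letter."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+(?=[A-Z])", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def analyze(tokens):
    """Separate valence shifters from content words and drop stop words."""
    shifters = [t for t in tokens if t in NEGATIONS or t in INTENSIFIERS]
    content = [t for t in tokens
               if t not in STOP_WORDS and t not in NEGATIONS and t not in INTENSIFIERS]
    return content, shifters


def disambiguate(word, context):
    """Pick the sense whose gloss shares the most words with the context."""
    def overlap(sense):
        return len(set(tokenize(sense["gloss"])) & set(context))
    return max(SENSES[word], key=overlap)


def features(text):
    """Bag of words over content words, expanded with synonyms of the chosen sense."""
    bag = Counter()
    for sentence in split_sentences(text):
        content, _ = analyze(tokenize(sentence))
        bag.update(content)
        for word in content:
            if word in SENSES:
                bag.update(disambiguate(word, content)["synonyms"])
    return bag


review = "The room was not very clean. The staff was cold toward people."
bag = features(review)
print(bag)
```

The valence shifters returned by `analyze` are not used further in this sketch; a fuller system would use them to flip or scale the polarity of nearby content words.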
3.4 Cross Domain Sentiment Classification
Supervised machine learning techniques are used to
classify documents or sentences into a finite set of classes,
i.e., positive, negative and neutral. A training data set is
available for all classes. In an optimal scenario, the
algorithm correctly determines the class labels for unseen
instances. We use the sentiment model called
Auxiliary-Sentiment Latent Dirichlet Allocation (AS-LDA)
for sentiment classification. It identifies the polarity of a
subjective document using sentiment element words and
auxiliary words, which are sampled from sentiment topics
and auxiliary topics respectively. Sentiment element words
include targets of the opinions, polarity words and
modifiers of polarity words.

Figure 4 compares the results of the proposed KBTL
sentiment classification model with existing cross-domain
sentiment classification methods, namely the Structural
Correspondence Learning (SCL) algorithm and the
Sentiment Sensitive Thesaurus (SST) algorithm.

3.5 Accuracy Detection
The accuracy of the classifier is measured using the
confusion matrix given in Table 1. A confusion matrix is a
tool for analyzing how well a classifier recognizes tuples
of different classes. It contains information about the
actual and predicted classifications made by a
classification system. Accuracy is the proportion of
predictions that are correct, computed with the formula
(TP + TN) / (TP + TN + FP + FN).
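
As a worked illustration of the accuracy formula, the sketch below counts TP, TN, FP and FN from paired actual and predicted labels and applies the formula. The label lists are hypothetical and are not the experimental data of this paper.

```python
def confusion_counts(actual, predicted, positive="YES"):
    """Count TP, TN, FP and FN for a binary task from paired label lists."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1  # actual YES, predicted YES
        elif a != positive and p != positive:
            tn += 1  # actual NO, predicted NO
        elif a != positive and p == positive:
            fp += 1  # actual NO, predicted YES
        else:
            fn += 1  # actual YES, predicted NO
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Proportion of correct predictions: (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


# Hypothetical labels, for illustration only.
actual = ["YES", "YES", "NO", "NO", "YES", "NO"]
predicted = ["YES", "NO", "NO", "YES", "YES", "NO"]

tp, tn, fp, fn = confusion_counts(actual, predicted)
print(tp, tn, fp, fn, accuracy(tp, tn, fp, fn))
```

Here four of the six predictions are correct (TP = 2, TN = 2, FP = 1, FN = 1), so the accuracy is 4/6.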

Table 1 Confusion Matrix

                  Predicted Class
Actual Class      YES       NO
YES               TP        FN
NO                FP        TN

4. RESULTS AND DISCUSSION
We build the classifier by training on two different
domains, the Hotel and Music datasets, and test it on the
Laptop domain. Sentiment is expressed differently in
different domains, and it is costly to annotate data for each
new domain in which we would like to apply a sentiment
classifier. This cost can be avoided by training a classifier
on one or more source domains and applying the trained
classifier to a different target domain. The sentiment
classification steps are explained in the proposed method.
When classifying across domains, the proposed method
achieves higher accuracy than single-domain
classification. The cross-domain classifier achieves an
accuracy of 86.15%. The results are represented
graphically in Figure 4.

Fig.4 Accuracy of the KBTL Classifier (bar chart with bars labeled KBTL, SST and SCL; accuracy axis from 0.8 to 0.88)

5. CONCLUSION
Sentiment analysis is an interdisciplinary field that
crosses natural language processing, artificial intelligence
and text mining. This paper presents a new method to
increase lexicon coverage and effectively derive concept
information for sentiment classification. It improves
knowledge bases with information on (i) sentiment terms,
(ii) positive and negative context terms, and (iii) the
grounding of this information to common-sense and
common knowledge bases such as WordNet. This knowledge
base is used for domain adaptation in cross-domain
sentiment classification, which transfers sentiment
classification from one domain to another or builds a
bridge between two domains. This is achieved by learning
the semantic knowledge base across the different domains.
The AS-LDA classification algorithm is used to obtain
effective results in sentiment classification. The results
are evaluated through supervised learning algorithms. In
future, it is planned to generalize the proposed method to
solve other types of domain adaptation tasks.

REFERENCES
[1] A. Das, B. Gambaeck, "Sentimantics: conceptual
spaces for lexical sentiment polarity representation with
contextuality", Proceedings of 3rd Workshop on
Computational Approaches to Subjectivity and Sentiment
Analysis at ACL (WASSA), Jungmun, South Korea, pp.
38-46, 2012.
[2] C. Strapparava, A. Valitutti, "WordNet-Affect: an
affective extension of WordNet", Proceedings of 4th
International Conference on Language Resources and
Evaluation (LREC), Lisbon, Portugal, pp. 1083-1086,
2004.
[3] D. Lenat and R. Guha, "Building Large
Knowledge-Based Systems: Representation and Inference
in the Cyc Project", Artificial Intelligence, vol. 61, pp.
95-104, Elsevier, 1993.
[4] L. Duan, D. Xu, I. W. Tsang, "Domain Adaptation from
Multiple Sources: A Domain-Dependent Regularization
Approach", IEEE Transactions on Neural Networks and
Learning Systems, vol. 23 (3), pp. 504-518, 2012.
[5] E. Cambria, B. Schuller, Y. Xia, C. Havasi, "New
Avenues in Opinion Mining and Sentiment Analysis",
IEEE Intelligent Systems, vol. 28 (2), pp. 15-21, 2013.
[6] E. Cambria, B. White, "Jumping NLP Curves: A
Review of Natural Language Processing Research", IEEE
Computational Intelligence Magazine, vol. 9 (2), pp.
48-57, 2014.
[7] H. Kanayama and T. Nasukawa, "Fully Automatic
Lexicon Expansion for Domain-Oriented Sentiment
Analysis", Proceedings of Conference on Empirical
Methods in Natural Language Processing (EMNLP '06),
pp. 355-363, 2006.
[8] H. Takamura, T. Inui, and M. Okumura, "Extracting
Semantic Orientation of Words Using Spin Model",
Proceedings of Annual Meeting of the Association of
Computational Linguistics (ACL '05), pp. 133-140, 2005.
[9] J. Blitzer, M. Dredze, and F. Pereira, "Biographies,
Bollywood, Boom-Boxes and Blenders: Domain
Adaptation for Sentiment Classification", Proceedings of
45th Annual Meeting of the Association of Computational
Linguistics (ACL '07), pp. 440-447, 2007.
[10] K. Yoshida, Y. Tsuruoka, Y. Miyao, and J. Tsujii,
"Ambiguous Part-of-Speech Tagging for Improving
Accuracy and Domain Portability of Syntactic Parsers",
Proceedings of 20th International Joint Conference on
Artificial Intelligence (IJCAI '07), pp. 1783-1788, 2007.
[11] S.-M. Kim and E. Hovy, "Determining the
Sentiment of Opinions", Proceedings of Conference on
Computational Linguistics (COLING-04), Geneva,
Switzerland, pp. 1367-1373, 2004.
[12] G. A. Miller, R. Beckwith, C. Fellbaum, D. Gross,
K. J. Miller, "Introduction to WordNet: An Online Lexical
Database", International Journal of Lexicography, vol. 3,
no. 4, pp. 235-244, 1990.
[13] N. Kaji and M. Kitsuregawa, "Building Lexicon for
Sentiment Analysis from Massive Collection of HTML
Documents", Proceedings of Joint Conference on
Empirical Methods in Natural Language Processing and
Computational Natural Language Learning (EMNLP '07),
pp. 1075-1083, 2007.
[14] P. D. Turney, "Thumbs Up or Thumbs Down?
Semantic Orientation Applied to Unsupervised
Classification of Reviews", Proceedings of 40th Annual
Meeting on Assoc. for Computational Linguistics (ACL
'02), pp. 417-424, 2002.
[15] S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang,
"Domain adaptation via transfer component analysis",
IEEE Transactions on Neural Networks, vol. 22, no. 2, pp.
199-210, 2011.
[16] S. Poria, A. Gelbukh, A. Hussain, N. Howard, D. Das,
S. Bandyopadhyay, "Enhanced SenticNet with Affective
Labels for Concept-based Opinion Mining", IEEE
Intelligent Systems, vol. 28 (2), pp. 31-38, 2013.
[17] T. Joachims, "Text Categorization with Support
Vector Machines: Learning with Many Relevant Features",
Proceedings of 10th European Conference on Machine
Learning (ECML '98), pp. 137-142, 1998.