
Extraction and Ranking of Product Aspects

Based on Word Dependency Relations


Lisette García-Moya, Rafael Berlanga-Llavori, and
María José Aramburu-Cabo
Universitat Jaume I, Spain
{lisette.garcia,berlanga,aramburu}@uji.es

Abstract. In this paper we present a new method for extracting product aspects from customer reviews. The identification of aspects is mainly
based on both the dependency analysis of customer reviews, and the use
of opinion words to select aspects on which customers have expressed
their opinions. Our method achieves competitive performance, obtaining the highest precision values among the compared techniques. The proposal also relies on a language modelling framework that combines a kernel-based model of opinion words with a stochastic translation model between words to assign a relevance score to each identified product aspect. The aspects are then ranked according to this score; our ranking results outperform those obtained from two proposed baselines.
Keywords: Product Aspect Modelling, Opinion Mining, Sentiment Analysis

1 Introduction

Nowadays, the Web has become an excellent way of expressing opinions about
almost everything. Thus, the number of Web sites containing such opinions is
huge and constantly growing. Consumer opinion web sites are an unpaid form of promotion in which satisfied customers tell other people how much they like a business, product, service, or event. They have become one of the most credible forms of advertising because people who do not stand to gain personally by promoting something put their reputations on the line every time they make a recommendation.
Therefore, in recent years the computational treatment of sentiment and opinions has been viewed as a challenging research area that can serve different purposes.
Aspect-based summarization is usually composed of three main tasks: aspect identification, sentiment classification, and aspect rating. Aspect identification focuses on extracting from the reviews the set of aspects concerning the product. The word aspect is used to represent both components and attributes. For example, given the sentence "The battery life of this camera is too short", the review is about the battery life aspect and the opinion is negative. The sentiment classification task consists of determining the opinions about the product

aspects and/or their polarities, whereas aspect rating leverages the relevance of
aspects to properly present them to the users.
The task of generating aspect-based summaries is clearly different from traditional text summarization [9] because it does not summarise the reviews by
selecting or rewriting a subset of original sentences from the reviews. The goal
here is to obtain structured summaries formed by all the aspects of the products
that customers have opinions about, and also whether the opinions are mainly
positive or negative.
This paper focuses on the aspect extraction task. Specifically, given a set
of user reviews about a specific product we address the problem of identifying
aspects on which customers have expressed their opinions. In order to help users
and analysts to better summarise opinions about products, we propose an aspect
relevance ranking model.

2 Related Work

Existing product aspect extraction techniques can be broadly classified into two
major approaches: supervised and unsupervised ones.
Supervised techniques require a set of pre-annotated review sentences as
training examples. A supervised learning method is then applied to construct
an extraction model, which is capable of identifying product aspects from new
customer reviews. Different approaches such as Hidden Markov Models and Conditional Random Fields [13, 14], Maximum Entropy [10], Class Association Rules
and Naive Bayes Classifier [15] and other ML approaches have been employed
for this task.
Although the supervised techniques can achieve reasonable effectiveness, preparing training examples is time-consuming. In addition, the effectiveness of the
supervised techniques greatly depends on the representativeness of the training
examples. In contrast, unsupervised approaches automatically extract product
aspects from customer reviews without involving training examples. Moreover,
the unsupervised approaches seem to be more flexible than the supervised ones
for environments in which various and frequently expanding products get discussed in customer reviews.
Hu and Liu's works [7, 6] (the PFE technique) use association rule mining based on the Apriori algorithm [1] to extract frequent itemsets as explicit product aspects. In association rule mining, the algorithm does not consider the position of
the words in the sentence. In order to remove wrong frequent aspects, two types
of pruning criteria were used: compactness and redundancy pruning. The technique is efficient and does not require the use of training examples or predefined
sets of domain-independent extraction patterns. However, it suffers from three
main shortcomings. First, frequent aspects discovered by the mining algorithm
might not necessarily be product aspects. The compactness and redundancy
pruning rules are unable to eliminate these false aspects. Second, even if a frequent aspect is a product aspect, customers may not be expressing any subjective
opinion about it in their reviews. These frequent yet opinion-irrelevant product

aspects should not be extracted. Third, the technique treats nearby adjectives
of frequent aspects as opinion words, even though many adjectives do not have
subjective implications. If an adjective without any subjective judgement appears adjacent to frequent aspects in some review sentences, this technique will mistakenly consider it an opinion word and use it to discover infrequent product aspects in other review sentences [12].
To address these limitations, Wei et al. [12] proposed a semantic-based product aspect extraction technique (SPE) that exploits a list of positive and negative adjectives defined in the General Inquirer [11] in order to recognise opinion
words, and subsequently to extract product aspects expressed in customer reviews. Even though the SPE technique attains better results than previous works [7, 6], both rely on mining frequent itemsets. As previously mentioned, this algorithm is not appropriate because it disregards the sequence of words [8].
According to our review of existing product aspect extraction techniques, the
unsupervised approaches seem to be more flexible than the supervised ones for
scenarios in which various and frequently expanding products get discussed in
customer reviews.

3 Proposed Method

Figure 1 gives an overview of our aspect extraction method. The input is a set of
customer reviews about a particular product and the output is a ranked list of
product aspects. The general idea is to identify those text segments that can be syntactically considered as product aspects. From these segments (regarded as candidate product aspects), we select as product aspects those that are modified by some opinion word. Finally, the selected product aspects are ranked according to their relevance.

[Figure: pipeline from Customer Reviews through Preprocessing (POS Tagging, Dependency Parsing), Potential Product Aspects Extraction, and Aspect Filtering (guided by Opinion words), to Aspect Ranking and the final Ranked Product Aspects.]

Fig. 1. Overview of the proposed aspect extraction method.

3.1 Potential Product Aspects Extraction

Our method aims to find what customers like and dislike about a given product.
However, due to the difficulty of natural language understanding, some types of sentences are hard to deal with. The following sentences were taken from the reviews of a mobile phone. The first two can be considered easy sentences and the last one is hard to handle:

"It has a nice color screen."
"T-mobile was a pretty good server."
"When you put this phone in your pocket you forget it is there; it is unbelievably small and oh, so light."
In the first two sentences, it is easy to note that the user is talking about
color screen and T-mobile server respectively because these aspects are explicitly
mentioned. However, some aspects are implicit and hard to find, like in the
third sentence, where the customer is talking about size and weight. Semantic understanding is needed to find these implicit aspects, but this is beyond the scope of this paper.
This work focuses on finding explicit aspects. In general, most aspect-indicating words are nouns or noun phrases. Therefore, after parsing each sentence, the next step is to identify noun phrases as potential product aspects. To this end, we apply the linguistic filtering patterns shown in Table 1. Each pattern is defined as an extended regular expression over the POS-tagging labels: JJ (adjective), NN (common noun), NNP (proper noun), VBG (gerund verb), VBN (past participle verb), and DT (general determiner). These definitions allow the extraction of both simple and compound noun phrases as potential aspects.
Name  Pattern                     Examples
NP1   (JJ|NN|NNP)+                battery life; lcd screen
NP2   NP1 (VBG|VBN) NP1           battery charging system
PF1   (NP1|NP2)
PF2   PF1 (of|from|in) (DT)? PF1  quality of photos

Table 1. Extraction patterns for identifying potential aspects.
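As an illustrative sketch (not the authors' implementation), the NP1 pattern can be matched over a POS-tagged sentence by collecting maximal runs of JJ/NN/NNP tags; the function name and input format below are our assumptions:

```python
# Illustrative sketch of the NP1 pattern (JJ|NN|NNP)+ from Table 1:
# collect maximal runs of adjective/noun tags as candidate noun phrases.
def extract_np1(tagged):
    """tagged: list of (word, POS-tag) pairs; returns maximal NP1 spans."""
    candidates, run = [], []
    for word, tag in tagged:
        if tag in ("JJ", "NN", "NNP"):
            run.append(word)
        elif run:
            candidates.append(" ".join(run))
            run = []
    if run:
        candidates.append(" ".join(run))
    return candidates

# "It has a nice color screen." -> one candidate: "nice color screen"
tagged = [("It", "PRP"), ("has", "VBZ"), ("a", "DT"),
          ("nice", "JJ"), ("color", "NN"), ("screen", "NN")]
```

The compound patterns NP2, PF1 and PF2 would be layered on top of this basic matcher in the same spirit.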

3.2 Aspect Filtering

Once the potential aspects have been selected, the next step is to identify those on which customers have expressed their opinions. To this end, from the set of potential aspects we select those modified by opinion words according to certain dependency relations. We call this method Product aspect extraction based on word dependency relations (PAE-DR).

We consider that a potential aspect has been modified by an opinion word if at least one of the following levels of relation occurs:
1. A word of the candidate aspect is directly related to an opinion word (1st level of relation).
2. A word of the candidate aspect is related to a word that is related to an opinion word (2nd level of relation).
3. A word of the candidate aspect is related to a word that is related to another word that is related to an opinion word (3rd level of relation).
The method considers only the following set of relations {nn, acomp,
advmod, amod, det, dobj, infmod, iobj, measure, nsubj, nsubjpass, partmod, prep, rcmod, xcomp, xsubj} which were obtained
by using the Stanford Lexicalized Parser v1.6.9 [3]. We have used the list of positive and negative opinion and sentiment words for English (around 6800 words)
compiled by [6].
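The three levels of relation amount to a bounded search in the sentence's dependency graph. The following sketch (our illustration, not the original code; function and variable names are hypothetical) treats the allowed relations as undirected edges and checks whether any word of a candidate aspect reaches an opinion word within three hops:

```python
from collections import deque

# Relations considered by the method (from the text above).
ALLOWED = {"nn", "acomp", "advmod", "amod", "det", "dobj", "infmod", "iobj",
           "measure", "nsubj", "nsubjpass", "partmod", "prep", "rcmod",
           "xcomp", "xsubj"}

def is_opinion_target(aspect_words, deps, opinion_words, max_level=3):
    """deps: list of (relation, governor, dependent) triples for one sentence.
    Returns True if any aspect word reaches an opinion word within
    max_level dependency relations (the three levels described above)."""
    # Build an undirected graph restricted to the allowed relations.
    graph = {}
    for rel, gov, dep in deps:
        if rel in ALLOWED:
            graph.setdefault(gov, set()).add(dep)
            graph.setdefault(dep, set()).add(gov)
    frontier = deque((w, 0) for w in aspect_words)
    seen = set(aspect_words)
    while frontier:
        word, level = frontier.popleft()
        if level > 0 and word in opinion_words:
            return True
        if level < max_level:
            for nxt in graph.get(word, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, level + 1))
    return False
```

With the dependencies of "T-mobile was a pretty good server", the candidate server reaches good in one hop (1st level) and T-mobile in two (2nd level), matching the worked example below.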
Figure 2 shows the dependency analysis of two sentences. The only candidate aspect in Figure 2a is color screen. It can easily be identified as a product aspect because one of its words (screen) is involved in a 1st level relation (amod(screen-6, nice-4)) with an opinion word (nice). In Figure 2b, there are two candidate aspects: T-mobile and server; server has a 1st level relation with the opinion word good (amod(server-6, good-5)), and T-mobile is involved in a 2nd level relation with good (nsubj(server-6, T-mobile-1) and amod(server-6, good-5)). Both candidate aspects are identified as product aspects.

a) It has a nice color screen.
   nsubj(has-2, It-1)  det(screen-6, a-3)  amod(screen-6, nice-4)
   nn(screen-6, color-5)  dobj(has-2, screen-6)

b) T-mobile was a pretty good server.
   nsubj(server-6, T-mobile-1)  cop(server-6, was-2)  det(server-6, a-3)
   advmod(good-5, pretty-4)  amod(server-6, good-5)

Fig. 2. Dependency analysis obtained by applying the Stanford Lexicalized Parser.

3.3 Product Aspects Ranking

Once the set of product aspects is identified, we propose to order them according to their relevance. To this end, we apply a methodology for modelling product aspects from a collection of free-text customer reviews. The proposal relies on a domain-independent language modelling framework that combines a kernel-based model of opinion words with a stochastic translation model between words to approximate the aspect model of products. Finally, we use this methodology for ranking our set of product aspects.

Modelling Product Aspects. Given a collection of customer reviews about a specific product and a free-text document d, which can be either a subcollection of reviews or an individual review, our goal is to obtain a probabilistic model for retrieving the product aspects from d.
Specifically, we consider modelling the set of aspects discussed in d as a statistical language model that assigns higher probability values to words defining
aspects [4].
In the context of customer reviews, opinion words (e.g., good, bad) usually express sentiments about the different aspects of a product. This causes the review texts to reflect some entailment relationship from opinion words to aspect words. Accordingly, we use a (stochastic) entailment-based self-translation model between the words in d to reveal the probability distribution of words that approaches the language model of aspects expressed in d, starting from a general probabilistic model of opinion words [4].
Thus, in this work we consider successive applications of an entailment-based self-translation model between words to an opinion word model in order to obtain the unigram language model of product aspects as follows:

(P(w_1), ..., P(w_n))^T = T^k (Q(w_1), ..., Q(w_n))^T    (1)

where V = {w_1, ..., w_n} represents the vocabulary of d, Q = (Q(w_1), ..., Q(w_n))^T is a vector-shaped model of opinion words, T = {p(w_i|w_j)}_{1≤i,j≤n} is an n-by-n (column-wise) stochastic matrix representing an entailment-based self-translation language model of the words in d, and k > 0 is the number of times that translation T is applied to the opinion model Q. The idea is that, by successively applying the entailment model T to Q, we can capture some underlying entailment from opinion words to aspect words.
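The repeated application of T to Q in Eq. (1) can be sketched numerically; the matrix T and vector Q below are toy values over a three-word vocabulary, not estimates from real reviews:

```python
import numpy as np

# Sketch of Eq. (1): the aspect model P is obtained by applying the
# column-stochastic translation matrix T to the opinion model Q, k times.
def aspect_model(T, Q, k):
    P = Q
    for _ in range(k):
        P = T @ P
    return P

# Toy vocabulary {good, screen, battery}: all opinion mass starts on
# "good", and the translation matrix moves it toward the aspect words.
T = np.array([[0.2, 0.1, 0.1],
              [0.5, 0.8, 0.1],
              [0.3, 0.1, 0.8]])   # each column sums to 1
Q = np.array([1.0, 0.0, 0.0])
P = aspect_model(T, Q, k=2)       # P remains a probability distribution
```

Because T is column-stochastic, every application preserves total probability mass while shifting it from opinion words to the words they entail.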
This unigram language model of aspects P = {P(w_i)}_{1≤i≤n} is used to measure the relevance of each previously identified product aspect as follows:

P(s) = \prod_{t=1}^{r} P(w_{i_t})^{1/r}    (2)

where s = w_{i_1} ... w_{i_r} is an identified product aspect.
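Eq. (2) is simply the geometric mean of the word probabilities of the aspect; a minimal sketch with toy probabilities (the values in P are illustrative only):

```python
import math

# Sketch of Eq. (2): an aspect's score is the geometric mean of the
# unigram probabilities of its words under the aspect model P.
def aspect_score(aspect_words, P):
    r = len(aspect_words)
    return math.prod(P[w] ** (1.0 / r) for w in aspect_words)

P = {"battery": 0.04, "life": 0.02, "case": 0.001}  # toy probabilities
score = aspect_score(["battery", "life"], P)        # sqrt(0.04 * 0.02)
```

The geometric mean keeps scores of aspects of different lengths comparable, so single-word and compound aspects can be ranked together.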


Component Estimation. For all i, j ∈ {1, ..., n}, we define p(w_i|w_j) to be proportional to the number of times word w_i occurs in those local contexts of d in which w_j occurs. Specifically, we use word N-grams of length 5 to define the local contexts. Thus,

p(w_i|w_j) = p(w_i, w_j) / p(w_j)    (3)

where:

p(w_i, w_j) ∝ \sum_{ℓ∈L} p(w_i|ℓ) p(w_j|ℓ) p(ℓ),    (4)

p(w_j) = \sum_{w_i∈V} p(w_i, w_j),    (5)

L is the set of all word 5-grams from d, p(w_i|ℓ) = |ℓ|_{w_i}/|ℓ| and p(ℓ) = |L|^{-1} (|ℓ|_{w_i} is the number of times w_i occurs in ℓ, and |ℓ| is the length of ℓ).
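Equations (3)-(5) can be estimated with a single pass over the 5-gram windows; the following sketch is our illustration (the proportionality constant in Eq. (4) cancels in the ratio of Eq. (3)):

```python
from collections import Counter, defaultdict

# Sketch of Eqs. (3)-(5): estimate p(wi|wj) from co-occurrence of words
# inside sliding 5-gram windows of the review text.
def translation_model(tokens, n=5):
    grams = [tokens[i:i + n] for i in range(len(tokens) - n + 1)]
    joint = defaultdict(float)                  # unnormalised p(wi, wj)
    for g in grams:
        counts = Counter(g)
        for wi, ci in counts.items():
            for wj, cj in counts.items():
                # p(wi|l) * p(wj|l) * p(l), with p(l) = 1/|L|
                joint[(wi, wj)] += (ci / n) * (cj / n) / len(grams)
    marg = defaultdict(float)                   # p(wj) = sum_wi p(wi, wj)
    for (wi, wj), p in joint.items():
        marg[wj] += p
    # Eq. (3): normalise each column so that sum_wi p(wi|wj) = 1.
    return {(wi, wj): p / marg[wj] for (wi, wj), p in joint.items()}
```

The resulting table is column-stochastic, as required for the matrix T of Eq. (1).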
We rely on a kernel-based density estimation approach to define Q from a predefined set of (general-domain) opinion words {u_1, ..., u_m}. Thus, we define:

Q(w) = (1/m) \sum_{i=1}^{m} K(w, u_i)    (6)

where K(w, u_i) is the Gaussian kernel:

K(w, u_i) = exp(−0.5 h(g(w), g(u_i))^2 / σ^2)    (7)

such that h represents the geodesic distance between distributions, g(v) is the posterior distribution of words {p(w_i|v)}_{1≤i≤n}, and σ is a predetermined distribution width. In our experiments we use σ = 0.3.
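Equations (6)-(7) can be sketched as follows; the dist argument stands in for the geodesic distance h between the posterior distributions g(w) and g(u_i), which we leave abstract here, and the function name is ours:

```python
import math

# Sketch of Eqs. (6)-(7): Q(w) as a Gaussian kernel density over a
# predefined set of opinion words, with distribution width sigma.
def opinion_model(w, opinion_words, dist, sigma=0.3):
    m = len(opinion_words)
    return sum(math.exp(-0.5 * dist(w, u) ** 2 / sigma ** 2)
               for u in opinion_words) / m
```

Words whose contexts look like those of many opinion words receive high Q mass, which Eq. (1) then propagates toward aspect words.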
The aspect language model P is refined to avoid assigning high probability values to meaningless words. The refined unigram language model P' is obtained by performing an EM process aimed at maximising the cross entropy:

\sum_{i=1}^{n} P(w_i) log(λ P'(w_i) + (1 − λ) P_bg(w_i))    (8)

where λ is an interpolation weight and P_bg is a background language model of the source language of the reviews (e.g., English). Currently, we estimate P_bg from the COCA corpus [2].

4 Experiments

In order to measure the quality of the aspect extraction method proposed here, we compare our results with those presented in [12]. Additionally, the relevance
ranking for aspects is evaluated by calculating the precision at different cut-off
ranks. In the following subsections, we detail the design of our experiments,
including the data collection and the evaluation criteria.
4.1 Data Collection

We have conducted our experiments using the customer reviews of five products: Apex AD2600 Progressive-scan DVD player, Canon G3, Creative Labs Nomad Jukebox Zen Xtra 40GB, Nikon coolpix 4300 and Nokia 6610. The reviews
were collected from Amazon.com and CNET.com. This customer review data set
is available at http://www.cs.uic.edu/~liub/FBS/CustomerReviewData.zip.
Table 2 shows the number of manually tagged aspects and the number of review
sentences for each product in the data set.

Table 2. Summary of customer review data set

                                            Apex  Canon  Creative  Nikon  Nokia
Number of review sentences                   738    600      1705    350    548
Number of manually tagged product aspects    114    103       184     73    110

Table 3. Comparative evaluation results

            Apex   Canon  Creative  Nikon  Nokia  Macro avg.  Micro avg.
Precision
  PFE      0.510  0.511    0.370   0.510  0.495     0.479       0.461
  SPE      0.524  0.487    0.440   0.474  0.565     0.498       0.486
  PAE-DR   0.560  0.550    0.448   0.568  0.592     0.544       0.522
Recall
  PFE      0.600  0.630    0.561   0.676  0.578     0.609       0.599
  SPE      0.700  0.750    0.650   0.757  0.725     0.716       0.705
  PAE-DR   0.491  0.699    0.630   0.630  0.645     0.619       0.618
F1
  PFE      0.551  0.564    0.446   0.581  0.533     0.535       0.521
  SPE      0.599  0.591    0.525   0.583  0.635     0.587       0.575
  PAE-DR   0.523  0.616    0.524   0.597  0.617     0.575       0.566

4.2 Evaluation of the Proposed Method and Comparison with Existing Techniques

The goal of this evaluation is to measure the effectiveness of our product aspect extraction technique. The complete set of product aspects identified by the
method is compared with the set of manually tagged product aspects for each
product in the customer review data set.
We use precision, recall and F1 to measure the effectiveness of the product aspect extraction method. When dealing with multiple products, we adopt the macro-average and micro-average [16] to assess the overall performance across all products. We follow the same definitions of the measures proposed by Wei et al. [12].
Table 3 shows the comparative evaluation results. The results of the PFE and SPE techniques were taken from [12]. Notice that our proposed method (PAE-DR) obtains the best precision results, but achieves lower recall values than those obtained by SPE. However, the weighted harmonic mean of precision and recall, F1, does not show a notable difference between these two methods.
Even though our results are not the best for every measure, some important remarks follow. We consider that the data set has not been homogeneously tagged, and this significantly affects the evaluation results. Table 4 shows a group of examples that illustrate some of the problems we face when matching the system-generated output against the gold standard. The first column indicates the product review from which the examples were taken. The second lists aspects manually labelled as different in the collection when they really refer to a unique aspect. Finally, the third column shows the aspects extracted by our method. This is a common problem in the evaluation of aspect extraction systems in the literature.

Table 4. Examples of aspects tagged as different in the data set when they are really referring to a unique aspect

Product   Gold Standard               System Output
Apex      tech support,               technical support
          technical support
Creative  bookmakr, bookmark          bookmark
Nikon     lens cap, lense cap         lens cap
Nokia     use, ease of use            use
Nokia     menu option, menu options   menu option

These examples demonstrate the necessity of establishing standardised alignment methods in order to obtain a more reliable comparison between different approaches in the literature.
4.3 Evaluation of the Product Aspects Ranking

The final step of our method assigns a rank to the set of identified product aspects. The ranking process must be able to promote the most relevant aspects to the top of the list. The relevance ranking is evaluated by computing the 11 Point Interpolated Average Precision.
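The 11-point interpolated average precision used here can be computed as follows; this is the standard formulation, not code from the paper, and the function name is ours:

```python
# 11-point interpolated average precision: interpolated precision at
# recall levels 0.0, 0.1, ..., 1.0, averaged.  `ranking` is the system's
# ordered aspect list and `relevant` the gold-standard set.
def eleven_point_iap(ranking, relevant):
    hits, points = 0, []                  # (recall, precision) pairs
    for i, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / i))
    levels = [l / 10 for l in range(11)]
    # Interpolated precision at level l: best precision at any recall >= l.
    interp = [max((p for r, p in points if r >= l), default=0.0)
              for l in levels]
    return sum(interp) / 11
```

Because the interpolation takes the maximum precision at or beyond each recall level, a concave precision-recall curve (as reported below) translates directly into high interpolated values.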
We propose two baseline methods. The first one (baseline1) determines the relevance of an aspect f by its frequency in the opinion sentences (os) of the reviews (sentences where opinions about any aspects are expressed):

rel(f) = TF_os(f)    (9)

The second baseline (baseline2) tries to give more relevance to those aspects which receive more opinions and occur less often in non-opinion sentences:

rel(f) = TF_all(f) · IDF_all(f) / IDF_+(f)    (10)

TF_all(f) is the frequency of the aspect f in all the review sentences. The IDF values are calculated as follows:

IDF_all(f) = log(N / s_f)    (11)

IDF_+(f) = log(ON / os_f)    (12)

where N is the total number of review sentences, s_f is the number of sentences where f occurs, ON is the total number of opinion sentences, and os_f is the number of opinion sentences where f occurs. The ratio between IDF_all and IDF_+ expresses the degree to which f is used as a true aspect.
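The two baselines (Eqs. 9-12) reduce to a few lines; the argument names below are our assumptions, following the definitions in the text:

```python
import math

# Sketch of the two baselines.  tf_os / tf_all are the aspect's
# frequencies in opinion / all sentences; N, sf, ON, osf follow the
# definitions in the text.
def baseline1(tf_os):
    return tf_os                          # Eq. (9): rel(f) = TF_os(f)

def baseline2(tf_all, N, sf, ON, osf):
    idf_all = math.log(N / sf)            # Eq. (11)
    idf_pos = math.log(ON / osf)          # Eq. (12)
    return tf_all * idf_all / idf_pos     # Eq. (10)
```

An aspect occurring in 10 of 100 sentences overall but in all 10 of its sentences among 50 opinion sentences gets a boost from the IDF ratio, since it is concentrated where opinions are expressed.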

Figure 3 shows the interpolated average precision curves obtained for each of the five products. It can be seen that our ranking, based on the unigram language model of aspects, clearly outperforms the two proposed frequency-based baselines. It can also be appreciated that the curves obtained by our method are overall concave. This is due to the relatively small decrease in the precision of our ranking as recall increases, which is a desirable property. Indeed, the precision values for the five products remain high (i.e., over 0.8) as recall increases up to almost 60% of the relevant aspects retrieved.

5 Conclusion and Future Work

In this paper, a new method for identifying product aspects from customer reviews has been presented. First, the candidate product aspects are identified taking into consideration their grammatical structure. From this set, only
those on which customers have expressed their opinions are selected. The proposed aspect filtering considers the dependency relations between aspects and
opinion words at three different levels of relation. Finally, the identified product
aspects are ranked according to their relevance.
Using existing performance benchmarks, the empirical evaluation results
show that even though our method does not achieve the best results for all the
measures, it does obtain the best precision results, and the F1 results are close
to those achieved by the best performance benchmark used in our comparison.
Results obtained for the ranking of aspects are also encouraging.
As mentioned before, it is necessary to propose better matching methods and new evaluation measures capable of dealing with the inconsistencies that can appear at the evaluation step, in order to obtain more reliable results. Our future work will first be oriented in this direction.
We will also attempt to provide users with the opinion polarity of each identified product aspect, and to group aspects according to the strength of their opinions and/or their granularity level. Currently, we can provide a first approximation to the aspect polarity, as each identified aspect has been modified by an opinion word with context-independent polarity. However, in order to provide aspects with their real polarity, it is necessary to take into consideration other issues, such as the presence of valence shifters (e.g. negation words such as no, not, never), which change the polarity of an opinion word. Context-dependent polarity must also be taken into account.
We also aim to produce multidimensional structured summaries for a given
product with all the compiled opinion information. A preliminary approach is
presented in [5], where external knowledge sources are exploited to ensure the
quality of the extracted aspects.
[Figure: five panels of 11-point interpolated precision-recall curves, one per product.]

Fig. 3. 11 Point Interpolated Average Precision (where unigram LM: our proposal, Frequency: baseline1 and IDFs: baseline2). Panels: (a) Apex AD2600, (b) Canon G3, (c) Creative Labs, (d) Nikon coolpix 4300, (e) Nokia 6610.

Acknowledgments. This work has been partially funded by the Ministerio de Economía y Competitividad with contract number TIN2011-24147 and by the Fundació Caixa Castelló project P1-1B2010-49. Lisette García-Moya has been supported by the PhD Fellowship Program of the Universitat Jaume I (PREDOC/2009/12).

References
1. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proc. of the International Conference on Very Large Data Bases. pp. 487-499 (1994)
2. Davies, M.: Word frequency data from the Corpus of Contemporary American English (COCA) (2011), downloaded from http://www.wordfrequency.info on June 01, 2011
3. De Marneffe, M., MacCartney, B., Manning, C.: Generating typed dependency parses from phrase structure parses. In: Proceedings of LREC. vol. 6, pp. 449-454 (2006)
4. García-Moya, L., Anaya-Sánchez, H., Berlanga-Llavori, R.: Combining probabilistic language models for aspect-based sentiment retrieval. In: Proc. of the 34th European Conference on Information Retrieval. Lecture Notes in Computer Science, vol. 7224, pp. 561-564. Springer-Verlag (2012)
5. García-Moya, L., Kudama, S., Aramburu-Cabo, M.J., Berlanga-Llavori, R.: Integrating web feed opinions into a corporate data warehouse. In: Proc. of the 2nd International Workshop on Business intelligencE and the WEB. pp. 20-27 (2011)
6. Hu, M., Liu, B.: Mining and summarizing customer reviews. In: Proc. of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 168-177. ACM Press, New York, NY (2004)
7. Hu, M., Liu, B.: Mining opinion features in customer reviews. In: Proc. of the American Association for Artificial Intelligence Conference. pp. 755-760 (2004)
8. Hu, M., Liu, B.: Opinion feature extraction using class sequential rules. In: Proc. of the AAAI 2006 Spring Symposium on Computational Approaches to Analyzing Weblogs (2006)
9. Mani, I., Maybury, M.T. (eds.): Advances in Automatic Text Summarization. MIT Press (1999)
10. Somprasertsri, G., Lalitrojwong, P.: Automatic product feature extraction from online product reviews using maximum entropy with lexical and syntactic features. In: Proc. of the IEEE International Conference on Information Reuse and Integration. pp. 250-255. IEEE Systems, Man, and Cybernetics Society (2008)
11. Stone, P.J., Dunphy, D.C., Smith, M.S., Ogilvie, D.M.: The General Inquirer: A Computer Approach to Content Analysis. MIT Press (1966)
12. Wei, C.P., Chen, Y.M., Yang, C.S., Yang, C.C.: Understanding what concerns consumers: a semantic approach to product feature extraction from consumer reviews. Information Systems and E-Business Management, pp. 149-167 (2009)
13. Wong, T.L., Lam, W.: Hot item mining and summarization from multiple auction web sites. In: Proc. of the Fifth IEEE International Conference on Data Mining. pp. 797-800. IEEE Computer Society, Washington, DC, USA (2005)
14. Wong, T.L., Lam, W.: Learning to extract and summarize hot item features from multiple auction web sites. Knowl. Inf. Syst. 14(2), 143-160 (2008)
15. Yang, C.C., Wong, Y.C., Wei, C.P.: Classifying web review opinions for consumer product analysis. In: Proc. of the 11th International Conference on Electronic Commerce. pp. 57-63. ACM, New York, NY, USA (2009)
16. Yang, Y.: An evaluation of statistical approaches to text categorization. Journal of Information Retrieval 1(1-2), 69-90 (1999)
