ISSN 2278-6856
Abstract
With the increasing use of online shopping, e-commerce
websites have become targets of opinion spam. People can
now post their views on Internet forums, discussion groups,
product reviews and blogs, which are collectively called
user-generated content. Manually scanning large amounts of
user-generated content and checking its truthfulness is
time-consuming and sometimes impossible. This study explores
the various types of linguistic features of shill reviews and
the respective tools developed to extract product features
from the text of product reviews. Sentiment analysis, natural
language processing, machine learning techniques and
linguistic features can be employed to detect shill reviews.
The main objective of review spammers is to mislead users and
change their perception of a particular product or service.
Various linguistic characteristics, including styles of
spamming, can help to detect review spam. Different methods
based on linguistic features have been adopted and
implemented effectively. Review manipulation has been found
on reputable e-commerce websites, so linguistic-feature based
methods have gained popularity.
1. INTRODUCTION
Many people rate, review and research products online.
Consumers use product reviews as a medium to gather
information about the service, quality and performance of
products. The product information given in reviews
usually comes from actual users of the product. These
reviews can help consumers decide whether to buy that
product. This gives review spammers an opportunity to
manipulate the review system to mislead consumers and
change their perception of a specific product. Fake
reviews are also referred to as shill reviews or opinion
spam. Opinion spam can be deceptive or disruptive [1].
Deceptive reviews mislead readers by giving
undeserving positive reviews to some target objects in
order to promote them, or by giving unfair
negative reviews to some target objects in order to
damage their reputation.
Disruptive reviews are non-reviews, which mainly
include advertisements and other irrelevant reviews
containing no opinion.
The following are some facts about consumer behaviour and
attitudes towards online shopping.
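$$\mathrm{Fog} = 0.4 \times \left( \frac{N(\text{words})}{N(\text{sentences})} + 100 \times \frac{N(\text{complex words})}{N(\text{words})} \right) \qquad (1)$$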
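As a quick worked illustration of the Fog index, assume the usual definition in which complex words are those with three or more syllables, and take a hypothetical review of 100 words in 5 sentences containing 10 complex words (all three counts are invented for this example):

$$\mathrm{Fog} = 0.4 \times \left( \frac{100}{5} + 100 \times \frac{10}{100} \right) = 0.4 \times (20 + 10) = 12$$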
4.2.2. The Flesch-Kincaid or Flesch Reading Ease Index
This index ranges from 0 to 100, with smaller scores
indicating less readable text.
$$\mathrm{FK} = 206.835 - 1.015 \times \frac{N(\text{words})}{N(\text{sentences})} - 84.6 \times \frac{N(\text{syllables})}{N(\text{words})} \qquad (2)$$
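The Flesch Reading Ease score can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the syllable counter is a rough vowel-group heuristic, the sentence splitter simply breaks on `.`, `!` and `?`, the constants are the standard published ones (206.835, 1.015, 84.6), and the function names are hypothetical rather than taken from any tool discussed in this paper.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count groups of consecutive vowels,
    drop one for a trailing silent 'e', and never return less than 1."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease from word, sentence and syllable counts."""
    # Naive sentence split on terminal punctuation (an assumption).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * len(words) / len(sentences)
            - 84.6 * syllables / len(words))
```

With this heuristic, `flesch_reading_ease("The cat sat on the mat.")` evaluates to about 116.1, which shows that very simple text can fall outside the nominal 0 to 100 range.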
4.2.3. The Automated Readability Index (ARI)
The value of this index ranges from 1 to 12; the number
indicates the grade-level education needed to understand the
text. For example, ARI = 5 requires the reader to have a
fifth-grade education to understand the text. ARI can be
calculated as follows:
$$\mathrm{ARI} = 4.71 \times \frac{N(\text{characters})}{N(\text{words})} + 0.5 \times \frac{N(\text{words})}{N(\text{sentences})} - 21.43 \qquad (3)$$
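Because ARI needs only character, word and sentence counts, it is straightforward to compute. The following is a minimal sketch, assuming that characters are the letters and digits inside words and that sentences end at `.`, `!` or `?`; the function name is hypothetical.

```python
import re


def automated_readability_index(text: str) -> float:
    """ARI = 4.71 * chars/words + 0.5 * words/sentences - 21.43."""
    # Naive sentence split on terminal punctuation (an assumption).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters or digits; punctuation is not counted.
    words = re.findall(r"[A-Za-z0-9]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    characters = sum(len(w) for w in words)
    return (4.71 * characters / len(words)
            + 0.5 * len(words) / len(sentences)
            - 21.43)
```

Very short or very simple inputs can produce values below 1, so in practice the score is usually computed over a whole review rather than a single sentence.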
4.2.4. The Coleman-Liau Index (CLI)
The CLI ranges from 1 to 16, indicating the grade-level
education needed to understand the text.
$$\mathrm{CLI} = 0.0588L - 0.296S - 15.8 \qquad (4)$$
L: number of characters per 100 words.
S: number of sentences per 100 words.
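As a worked illustration of the CLI formula, assume a hypothetical passage with L = 450 characters and S = 5 sentences per 100 words (both values are invented for this example):

$$\mathrm{CLI} = 0.0588 \times 450 - 0.296 \times 5 - 15.8 = 26.46 - 1.48 - 15.8 = 9.18$$

That is, the passage would need roughly a ninth-grade education to understand.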
4.2.5. Simple Measure of Gobbledygook (SMOG)
A SMOG result also ranges from 1 to 12. SMOG is
calculated as follows:
$$\mathrm{SMOG} = 1.043 \times \sqrt{30 \times \frac{\text{Quantity of polysyllables}}{\text{Quantity of sentences}}} + 3.1291 \qquad (5)$$
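As a worked illustration of the SMOG formula, assume the usual definition of a polysyllable as a word with three or more syllables, and take a hypothetical sample of 30 sentences containing 36 polysyllables (both counts are invented for this example):

$$\mathrm{SMOG} = 1.043 \times \sqrt{30 \times \frac{36}{30}} + 3.1291 = 1.043 \times 6 + 3.1291 = 9.3871$$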
4.3. Shingling
The main tasks of shingling are:
4.3.1. Review Pre-processing
It takes the extracted reviews stored in the raw review
database as input and removes stop words, special
characters, punctuation and delimiters occurring in
them.
4.3.2. Feature Extraction
The shingling technique extracts features from the reviews
stored in the raw review database and stores them in the
Shingle_Feature_Database Sfd.
4.3.3. Create shingles
In the shingling technique, each review is viewed as a
sequence of tokens, which could be words or lines.
4.3.4. Spam_Detect_shingle
The shingles computed from the review documents are the input
to this component, where spam and non-spam reviews are
detected using the w-shingling algorithm.
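The four tasks above can be sketched end to end in Python. This is a minimal illustration, not the surveyed system: the stop-word list, the shingle width `w = 2`, the Jaccard resemblance measure, the 0.6 threshold and all function names are assumptions, and near-duplicate review pairs are simply flagged as candidate spam.

```python
from __future__ import annotations

import re

# Tiny illustrative stop-word list (an assumption; real systems use larger ones).
STOP_WORDS = frozenset({"a", "an", "and", "the", "is", "it", "of", "to", "this", "in"})


def preprocess(review: str) -> list[str]:
    """Lower-case, strip punctuation and special characters, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", review.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def shingles(tokens: list[str], w: int = 2) -> set[tuple[str, ...]]:
    """Return the set of contiguous w-token sequences (w-shingles)."""
    if len(tokens) < w:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def resemblance(a: set, b: set) -> float:
    """Jaccard resemblance of two shingle sets: |A & B| / |A | B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_near_duplicates(reviews: list[str], w: int = 2,
                         threshold: float = 0.6) -> list[tuple[int, int, float]]:
    """Flag review pairs whose resemblance reaches the threshold."""
    sets = [shingles(preprocess(r), w) for r in reviews]
    flagged = []
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            score = resemblance(sets[i], sets[j])
            if score >= threshold:
                flagged.append((i, j, score))
    return flagged
```

The pairwise comparison is quadratic in the number of reviews, which is acceptable for a sketch; larger collections would typically need hashing or sampling of shingles to stay tractable.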
5. EXISTING SYSTEMS
Work on shill review detection has been carried out in
various fields by applying efficient techniques. Some of
the resulting systems are explained below.
5.1. OPINE
This is an information extraction system which mines
reviews in order to construct a model of important product
features, their evaluation by reviewers, and their relative
quality across products. This framework helps to identify
product features with improved precision compared with
Feature | Comments
Informativeness, Subjectivity, Readability | Informativeness reflects knowledge about product feature; Subjectivity reflects ...
Readability, Genre, Writing style | Readability and writing style of reviews more significant
Writing style | Identification of fraud authors with high accuracy
Behavioural footprints | Effective and outperforms strong competitors
Rating behaviours | More significant impact on ratings compared with the unhelpful reviewers
7. CONCLUSION
Shill reviews are an emerging problem for online review
systems. A successful shill attack might trick consumers
into buying low-quality products or damage sales of
competing products. In addition, shill attacks that are
detected by consumers might result in a loss of trust in
the reputation systems and a movement of consumers
away from electronic marketplaces. Therefore, a powerful
shill review detection method is essential for online
marketplaces moving forward. The occurrence of review
manipulation has the potential to undermine the
effectiveness of reputation systems. This paper surveys the
approaches used in the detection of shill reviews. The
methods for particular approaches are then studied, including
DFEM and readability index measures. Some linguistic-feature
based systems are then explained briefly, followed by a
comparative analysis that throws light on specific linguistic
features. Finally, linguistic-feature based methods are
analysed at the informativeness, readability, stylometry and
rating behaviour levels.
References
[1] Yafeng Ren, Donghong Ji, Hongbin Zhang, "Positive
Unlabeled Learning for Deceptive Reviews Detection,"
Proceedings of the 2014 Conference on Empirical Methods
in Natural Language Processing (EMNLP), 2014.
[2] Nitin Jindal, Ee-Peng Lim, Viet-An Nguyen, "Detecting
Product Review Spammers using Rating Behaviors," in
CIKM '10, Toronto, Ontario, Canada, October 26 to 30,
2010.
[3] Ziqiong Zhang, Rob Law, Qiang Ye, Sentiment
classification of online reviews to travel destinations