CHAPTER 1
PREAMBLE
1.1 Introduction
Many approaches have been successfully developed to detect online spam. Fangtao Li
and colleagues [1] initially analyzed several attributes related to spam behavior, such as content,
sentiment, product, and metadata features, and exploited a two-view semi-supervised method to
identify spam reviews. Song Feng and colleagues [2] defined three types of reviewers (any-time,
multi-time, and single-time reviewers) and statistically characterized the distributional footprints
of deceptive reviews by using natural language processing (NLP) techniques. Geli Fei and
colleagues [3] proposed a model to detect spammed products or product groups by comparing the
differences in rating behaviors between suspicious and normal users.
All these models rely on content features, which spammers can easily evade by inserting special
characters; other features, such as temporal and network information, have been employed as
well. Qian Xu and colleagues [4] collected large-scale real-world datasets from telecommunication
service providers and combined temporal and user network information to classify Short
Message Service (SMS) spammers. Sihong Xie and colleagues [5] proposed a model that uses
only temporal features, with no semantic or rating behavior analysis, to detect abnormal bursts
as the number of reviews increases. Finally, Tyler Moore and colleagues [6] studied the problem of
temporal correlations between spam and phishing websites.
Intuitively, these works can also be used to uncover sophisticated spam strategies.
Amazon has sued more than 1,000 product review sellers who sell fake promotions on
Fiverr.com (one of the most famous being Spam Reviewer Cloud;
http://money.cnn.com/2015/10/18/technology/amazon-lawsuit-fake-reviews). On such user
cloud platforms, business owners can purchase anonymous comments generated by real users by
paying for them. This makes spam detection very challenging, as the advent of a massive number
of apparently genuine fake reviewers (which we refer to as “genuine fakes” in this article) makes
the fraud pattern much more nebulous to track.
To date, many third-party platforms have created various fake review markets for online
product sellers and fake review providers. In real-world business processes, massive numbers of
random but genuine fake review providers conduct real transactions and write positive
comments to claim a bonus (many e-commerce websites think they can reduce spam reviews by
allowing only real buyers to write them). Existing research ignores the latent connections in
product networks, which are difficult to discover, especially when these spam activities have
become a hyping and advertising investment that has gained increased popularity among
homogeneous competitors online. Thus, antispam rules can be easily avoided, which also
impairs the efficiency and effectiveness of detection. In this work, we coin a new
solution, collaborative marketing hyping detection, that aims to detect groups of online stores
that simultaneously adopt marketing hyping. [1]
A new solution aims to identify spam comments and detect products that adopt an evolving
spam strategy for promotion. Specifically, an unsupervised learning model combines
heterogeneous product review networks to discover collective hyping activities.
Traditional features such as semantic clues or user relations might no longer be suitable
for discovering fraud due to rapidly evolving spam strategies. Hence, we need to choose
dedicated features according to our specific scenario.
Disadvantages:
- Detection accuracy decreases when only the user name information is used.
- Stores usually purchase fake reviews periodically.
Advantages:
- Detects spam that has gained increased popularity among homogeneous competitors online.
- Improves the efficiency and effectiveness of detection.
- Aims to detect groups of online stores that simultaneously adopt marketing hyping.
CHAPTER 2
LITERATURE SURVEY
Learning to Identify Review Spam by Fangtao Li, Minlie Huang, Yi Yang and Xiaoyan
Zhu [1]
In this paper, we study the review spam identification task in our product review mining
system. We manually build a review spam collection based on our crawled reviews. We first
employ supervised learning methods and analyze the effect of different features on review spam
identification. We also observe that spammers consistently write spam. This provides us with
another view to identify review spam: we can check whether the author of a review is a spammer.
Based on this observation, we propose a two-view semi-supervised method to exploit the large
amount of unlabeled data. The experimental results show that the two-view co-training algorithm
achieves better results than the single-view algorithm. Our designed machine learning
methods achieve significant improvements as compared with the heuristic baselines.
Exploiting Burstiness in Reviews for Review Spammer Detection by Geli Fei, Arjun
Mukherjee & Bing Liu [3]
In this paper, we proposed to exploit bursts in detecting opinion spammers due to the
similar nature of reviewers in a burst. A graph propagation method for identifying spammers
was presented. A novel evaluation method based on supervised learning was also described to
deal with the difficult problem of evaluation without ground truth data, which classifies reviews
based on a different set of features from those used in identifying spammers. Our
experimental results using Amazon.com reviews from the software domain showed that the
proposed method is effective; its effectiveness was demonstrated both objectively, based
on supervised learning (classification), and subjectively, based on human expert
evaluation. The fact that the supervised learning/classification results are consistent with human
judgment also indicates that the proposed supervised-learning-based evaluation technique is
justified.
SMS Spam Detection Using Non-Content Features by Qian Xu, Evan Wei Xiang
and Qiang Yang [4]
In this paper, we have examined mobile-phone SMS message features from static,
network, and temporal views, and proposed an effective way to identify important features that
can be used to construct an anti-spam algorithm. We exploited temporal analysis to design
features that can detect SMS spammers with high performance, and incorporated these
features into an SVM classification algorithm. Our evaluation on a real SMS dataset showed that
the temporal features and network features can be effectively incorporated to build an SVM
classifier, with a gain of around 8% improvement in AUC as compared with classifiers
based only on conventional static features.
Temporal Correlations between Spam and Phishing Websites by Tyler Moore,
Richard Clayton & Henry Stern [6]
Empirical study of malicious online activity is hard. Attackers remain elusive,
compromises happen fast, and strategies change frequently; none of these factors
can be changed. In this paper, we have combined phishing website lifetimes with detailed
spam data, and consequently we have provided several new insights. First, we have
demonstrated the gravity of the threat posed by attackers using fast-flux techniques: they send
out 68% of spam while hosting only 3% of all phishing websites. They also transmit spam
effectively: the bulk is sent out early, it stops once the site is removed, and it continues
whenever websites are overlooked by the take-down companies. In this respect, we also
conclude that long-lived phishing websites continue to cause harm and should be taken down.
A Shapelet Transform for Time Series Classification by Jason Lines, Luke M. Davis &
Anthony Bagnall [8]
In this paper, we have proposed a shapelet transform for TSC that extracts the k best
shapelets from a dataset in a single pass. We implement this using a novel caching algorithm to
store shapelets, and apply a simple, parameter-free cross-validation approach for extracting the
most significant shapelets. We transform a total of 26 data sets with our filter and demonstrate
that a C4.5 decision tree classifier trained with transformed data is competitive with an
implementation of the original shapelet decision tree. We show that our filtered data can be
applied to further, non-tree based classifiers to achieve improved classification performance,
whilst still maintaining the interpretability of shapelets. We provide two implementations of the
filter using different quality measures for discriminating between shapelets; we use information
gain, as originally proposed, in the first, and introduce the application of the F-statistic as an
evaluation method for shapelets in the second. We show that classifiers trained using features
derived from an F-statistic filter are competitive with classifiers trained with the information
gain approach, whilst being easier to apply to multi-class classification problems. Finally, we
provide exploratory data analysis of the shapelets extracted by our filter on the Gun/NoGun
problem and compare them with the output of [20]. We show that the shapelets we find are
consistent with the discriminatory shapelet in the original work, and show that our approach can
lead to further insight into the problem by looking at a number of the top shapelets.
CHAPTER 3
SYSTEM DESIGN
The purpose of design is to plan a solution to the problem specified by the
requirements document. This phase is the first step in moving from the problem domain to the
solution domain. In other words, starting with what is required, design takes us toward how to
fulfill those needs. The design of the system is perhaps the most critical factor affecting the
quality of the software and has a major impact on the later phases, especially testing and
maintenance.
System design describes all the major data structures, file formats, outputs, and the major
modules in the system, and decides their specifications.
[Figure: Store owners pay fake reviewers a purchasing cost ($) through the user cloud.]
Use case diagram shows the various interactions of actors with a system. Use case is a
coherent piece of functionality that a system can provide by interacting with actors. Actors are
the external end users of the system.
[Use case diagram: the user registers, logs in, browses products, buys products, and gives
reviews; the system detects spam and displays spammers.]
[Data flow diagram: the user sends a search request to the website, which returns a list of
products and fetches product details; the user buys products and reviews them. The system
gets the total reviews, extracts a feature set from the reviews using NLP, performs spam
detection, and blocks spam users. The overall flow covers browsing products, buying products,
giving feedback, and processing reviews.]
CHAPTER 4
SYSTEM REQUIREMENT SPECIFICATION
A System Requirement Specification (SRS) is a central document that forms the foundation of
the software development process. It lists the requirements of a system and also contains a
description of its major features. An SRS is essentially an organization's understanding (in
writing) of a customer or potential client's system requirements and dependencies at a particular
point in time, usually before any actual design or development work. It is a two-way
insurance policy that assures that both the client and the organization understand
each other's requirements from that perspective at a given point in time.
Writing a software requirement specification reduces development effort, as careful
review of the document can reveal omissions, misunderstandings, and inconsistencies early
in the development cycle, when these problems are easier to correct. The SRS discusses
the product but not the project that develops it; consequently, the SRS serves as a basis
for later enhancement of the finished product.
The SRS may need to be changed, but it does provide a foundation for continued
product evaluation. In simple words, the software requirement specification is the
starting point of the software development activity. The SRS translates the ideas in
the minds of the clients (the input) into a formal document (the output of the requirements
phase). Hence, the output of this phase is a set of formally specified requirements, which
ideally are complete and consistent, while the input has none of these properties.
The most common set of requirements defined by any operating system or software
application is the physical computer resources, also known as hardware. A hardware
requirements list is often accompanied by a hardware compatibility list (HCL), especially
in the case of operating systems. The requirements are as follows:
Memory : 4GB
Database : MySQL
Architecturally, JSP may be viewed as a high-level abstraction of Java servlets. JSPs are
translated into servlets at runtime; each JSP's servlet is cached and re-used until the original JSP
is modified.
JSP allows Java code and certain pre-defined actions to be interleaved with static web
markup content, with the resulting page being compiled and executed on the server to deliver a
document. The compiled pages, as well as any dependent Java libraries, use Java bytecode rather
than a native software format. Like any other Java program, they must be executed within a Java
virtual machine (JVM) that integrates with the server's host operating system to provide an
abstract platform-neutral environment.
JSPs are usually used to deliver HTML and XML documents, but through the use of
an OutputStream, they can deliver other types of data as well. The web container creates JSP
implicit objects such as pageContext, application (the ServletContext), session, request, and response.
[Figure: JSP model architecture: the web browser sends requests to the server, which
instantiates JavaBeans (the model) that access data sources/databases.]
A JavaServer Pages compiler is a program that parses JSPs, and transforms them into
executable Java Servlets. A program of this type is usually embedded into the application
server and run automatically the first time a JSP is accessed, but pages may also be recompiled
for better performance, or compiled as a part of the build process to test for errors. Some JSP
containers support configuring how often the container checks JSP file timestamps to see
whether the page has changed. Typically, this timestamp would be set to a short interval
(perhaps seconds) during software development, and a longer interval (perhaps minutes, or even
never) for a deployed Web application.
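The timestamp check described above can be sketched in plain Java. This is an illustrative stand-in with hypothetical names, not the implementation any real container uses:

```java
// Sketch of a container's "should this JSP be retranslated?" decision, driven by
// the JSP file's modification timestamp and a configurable check interval.
public class JspRecompileCheck {
    private final long checkIntervalMillis; // how often to re-check timestamps
    private long lastCheckMillis = 0;
    private long lastSeenJspModified = 0;

    public JspRecompileCheck(long checkIntervalMillis) {
        this.checkIntervalMillis = checkIntervalMillis;
    }

    /** Returns true when the JSP source is newer than what was last compiled. */
    public boolean needsRecompile(long nowMillis, long jspModifiedMillis) {
        if (nowMillis - lastCheckMillis < checkIntervalMillis) {
            return false; // still within the interval: skip the filesystem check
        }
        lastCheckMillis = nowMillis;
        boolean changed = jspModifiedMillis > lastSeenJspModified;
        if (changed) {
            lastSeenJspModified = jspModifiedMillis;
        }
        return changed;
    }
}
```

With a short interval (development) the check runs on nearly every request; with a long interval (deployment) most requests skip it entirely, which is exactly the trade-off described above.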
Java Servlet
The servlet is a Java programming language class used to extend the capabilities of
a server. Although servlets can respond to any types of requests, they are commonly used to
extend the applications hosted by web servers, so they can be thought of as Java applets that run
on servers instead of in web browsers. These kinds of servlets are the Java counterpart to other
dynamic Web content technologies such as PHP and ASP.NET.
[Figure: a JSP page (.jsp) is translated by the JSP translator (Tomcat) into servlet source
code (Java) held in an in-memory text buffer, compiled by the embedded Java compiler into a
class (.class), and executed by the JRE inside the JSP container during the execution phase;
requests and responses flow between the system and the server.]
(a) Translation occurs at this point, if JSP has been changed or is new.
(b) If not, translation is skipped.
Fig. 4.2 Life of a JSP File
Department of CSE, TOCE 16
Collective Hyping Detection System To Identify Online Spam Activities Using AI 2018
Three methods are central to the life cycle of a servlet. These are init(), service(),
and destroy(). They are implemented by every servlet and are invoked at specific times by the
server.
During the initialization stage of the servlet life cycle, the web container initializes the
servlet instance by calling the init() method, passing an object implementing the
javax.servlet.ServletConfig interface. This configuration object allows the servlet to
access name-value initialization parameters from the web application.
After initialization, the servlet instance can service client requests. Each request is serviced
in its own separate thread. The web container calls the service() method of the servlet for
every request. The service() method determines the kind of request being made and
dispatches it to an appropriate method to handle the request. The developer of the servlet
must provide an implementation for these methods. If a request is made for a method that is
not implemented by the servlet, the method of the parent class is called, typically resulting in
an error being returned to the requester.
Finally, the web container calls the destroy() method that takes the servlet out of service.
The destroy() method, like init(), is called only once in the lifecycle of a servlet.
The servlet may also formulate an HTTP response for the client.
5. The servlet remains in the container's address space and is available to process any other
HTTP requests received from clients.
The service() method is called for each HTTP request.
6. The container may, at some point, decide to unload the servlet from its memory.
The algorithms by which this decision is made are specific to each container.
7. The container calls the servlet's destroy() method to relinquish any resources such as file
handles that are allocated for the servlet; important data may be saved to a persistent store.
8. The memory allocated for the servlet and its objects can then be garbage collected.
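The lifecycle above can be illustrated with a toy stand-in in plain Java. The class and method names below mimic, but are not, the real javax.servlet API; a real container manages threading, configuration objects, and unloading as described above:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of the servlet lifecycle: the "container" calls
// init() exactly once, service() once per request, and destroy() exactly once.
class ToyServlet {
    private final List<String> log = new ArrayList<>();

    public void init()              { log.add("init"); }
    public void service(String req) { log.add("service:" + req); }
    public void destroy()           { log.add("destroy"); }
    public List<String> getLog()    { return log; }
}

class ToyContainer {
    /** Drives one full lifecycle for a batch of requests. */
    public static List<String> run(ToyServlet servlet, List<String> requests) {
        servlet.init();                 // initialization stage: called once
        for (String req : requests) {
            servlet.service(req);       // one call per request (a real container
        }                               // services each request on its own thread)
        servlet.destroy();              // take the servlet out of service: called once
        return servlet.getLog();
    }
}
```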
MySQL
SQL was one of the first commercial languages for Edgar F. Codd's relational model, as
described in his influential 1970 paper, "A Relational Model of Data for Large Shared Data
Banks." Despite not entirely adhering to the relational model as described by Codd, it became
the most widely used database language.
NetBeans IDE
NetBeans IDE is the official IDE for Java 8. With its editors, code analyzers, and
converters, you can quickly and smoothly upgrade your applications to use new Java 8 language
constructs, such as lambdas, functional operations, and method references. Batch analyzers and
converters are provided to search through multiple applications at the same time, matching
patterns for conversion to new Java 8 language constructs. With its constantly improving Java
Editor, many rich features and an extensive range of tools, templates and samples, NetBeans
IDE sets the standard for developing with cutting edge technologies out of the box. An IDE is
much more than a text editor. The NetBeans editor indents lines, matches words and brackets,
and highlights source code syntactically and semantically. It also provides code templates, coding
tips, and refactoring tools. The editor supports many languages, from Java, C/C++, XML and
HTML, to PHP, Groovy, Javadoc, JavaScript and JSP. Because the editor is extensible, you can
plug in support for many other languages. Keeping a clear overview of large applications, with
thousands of folders and files, and millions of lines of code, is a daunting task. NetBeans IDE
provides different views of your data, from multiple project windows to helpful tools for setting
up your applications and managing them efficiently, letting you drill down into your data
quickly and easily, while giving you versioning tools via Subversion, Mercurial, and Git
integration out of the box. When new developers join your project, they can understand the
structure of your application because your code is well-organized.
Design GUIs for Java SE, HTML5, Java EE, PHP, C/C++, and Java ME applications
quickly and smoothly by using editors and drag-and-drop tools in the IDE. For Java SE
applications, the NetBeans GUI Builder automatically takes care of correct spacing and
alignment, while supporting in-place editing, as well. The GUI builder is so easy to use and
intuitive that it has been used to prototype GUIs live at customer presentations. The cost of
buggy code increases the longer it remains unfixed. NetBeans provides static analysis tools,
especially integration with the widely used FindBugs tool, for identifying and fixing common
problems in Java code. In addition, the NetBeans Debugger lets you place breakpoints in your
source code, add field watches, step through your code, and step into methods.
The NetBeans Profiler provides expert assistance for optimizing your application's speed and
memory usage, and makes it easier to build reliable and scalable Java SE, JavaFX and Java EE
applications. NetBeans IDE includes a visual debugger for Java SE applications, letting you
debug user interfaces without looking into source code. Take GUI snapshots of your applications
and click on user interface elements to jump back into the related source code.
Apache
The Apache HTTP Server is web server software notable for playing a key role in the
initial growth of the World Wide Web. In 2009 it became the first web server software to
surpass the 100 million web site milestone. Apache is developed and maintained by an open
community of developers under the auspices of the Apache Software Foundation. Since April
1996 Apache has been the most popular HTTP server software in use. As of November 2010
Apache served over 59.36% of all websites and over 66.56% of the first one million busiest
websites.
Navicat Premium
Navicat Premium is a multi-connection database administration tool that allows you to
connect to MySQL, MariaDB, SQL Server, SQLite, Oracle, and PostgreSQL databases
simultaneously within a single application, making administration of multiple kinds of
databases easy.
Navicat Premium combines the functions of other Navicat members and supports most of
the features in MySQL, MariaDB, SQL Server, SQLite, Oracle and PostgreSQL including
Stored Procedure, Event, Trigger, Function, View, etc.
Navicat Premium enables you to easily and quickly transfer data across various database
systems, or to a plain text file with the designated SQL format and encoding. Batch jobs for
different kinds of databases can also be scheduled and run at a specific time. Other features
include an Import/Export Wizard, Query Builder, Report Builder, Data Synchronization, Backup,
Job Scheduler, and more. Features in Navicat are sophisticated enough to provide professional
developers with everything they need, yet easy to learn for users who are new to database servers.
Establish a secure SSH session through SSH tunnelling in Navicat. You can enjoy
strong authentication and secure encrypted communications between two hosts. The
authentication method can use a password or a public/private key pair. Navicat also comes with
HTTP tunnelling for when your ISP does not allow direct connections to its database servers but
does allow HTTP connections. HTTP tunnelling is a method for connecting to a server
that uses the same protocol (http://) and the same port (port 80) as a web server does.
CHAPTER 5
IMPLEMENTATION
Java is a small, simple, safe, object-oriented, interpreted (or dynamically optimized),
byte-coded, architecture-neutral, garbage-collected, multithreaded programming language with
strongly typed exception handling for writing distributed and dynamically extensible programs.
With most programming languages, you either compile or interpret a program so you can
run it on your computer. The Java programming language is unusual in that a program is both
compiled and interpreted. The platform-independent byte codes are interpreted by the interpreter
on the Java platform. The interpreter parses and runs each Java byte code instruction on the
computer. Compilation happens just once; interpretation occurs each time the program is
executed. The following figure illustrates how this works. You can think of Java byte codes as
the machine code instructions for the Java Virtual Machine (Java VM). Every Java interpreter,
whether it's a development tool or a web browser that can run applets, is an implementation of
the Java VM. We've already been introduced to the Java VM. It's the base for the Java platform
and is ported onto various hardware-based platforms.
The Java API is a large collection of ready-made software components that provide many useful
capabilities, such as graphical user interface (GUI) widgets. The Java API is grouped into
libraries of related classes and interfaces; these libraries are known as packages.
The figure depicts a program that’s running on the Java platform. As the figure shows, the Java
API and the virtual machine insulate the program from the hardware.
[Figure: a program (myProgram.java) runs on top of the Java API, which runs on top of the
Java Virtual Machine; together the API and the VM form the Java platform.]
Shapelet Learning Model: Shapelets are discriminative subsequences of time series that
best predict the target variable, while shapelet learning models are usually designed with
a classification purpose that aims to identify the similarity between two items [7].
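As an illustration of the shapelet primitive (a sketch, not the project's actual model), the distance between a shapelet and a series is commonly taken as the minimum Euclidean distance over all sliding windows; a shapelet whose distances are small for one class and large for the other is discriminative:

```java
// Core shapelet primitive: slide the shapelet along the series and return the
// smallest Euclidean distance over all windows. The learning step (choosing
// which shapelets to keep) is omitted here.
public class ShapeletDistance {
    public static double minDistance(double[] series, double[] shapelet) {
        double best = Double.POSITIVE_INFINITY;
        for (int start = 0; start + shapelet.length <= series.length; start++) {
            double sum = 0;
            for (int i = 0; i < shapelet.length; i++) {
                double d = series[start + i] - shapelet[i];
                sum += d * d; // squared Euclidean distance for this window
            }
            best = Math.min(best, Math.sqrt(sum));
        }
        return best;
    }
}
```

A series that contains the shapelet exactly yields distance zero; classifiers are then trained on these distances rather than on the raw series.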
CHAPTER 6
SYSTEM TESTING
Types of Testing:
Unit Testing
Individual components are tested to ensure that they operate correctly. Each
component is tested independently, without other system components. This system
was tested with a set of proper test data for each module, and the results were
checked against the expected output. Unit testing focuses verification effort on the
smallest unit of the software design, the module. This is also known as MODULE
TESTING. This testing is carried out in phases, and each module is found to be
working satisfactorily with regard to the expected output from the module.
Integration Testing
Integration testing is another aspect of testing that is generally done in order to
uncover errors associated with the flow of data across interfaces. The unit-tested
modules are grouped together and tested in small segments, which makes it easier to
isolate and correct errors. This approach is continued until all modules are integrated
to form the system as a whole.
System Testing
System testing is actually a series of different tests whose primary purpose is to fully
exercise the computer-based system. System testing ensures that the entire integrated
software system meets requirements. It tests a configuration to ensure known and
predictable results. An example of system testing is configuration-oriented
system integration testing. System testing is based on process descriptions and flows,
emphasizing pre-driven process links and integration points.
Performance Testing
Performance testing ensures that the output is produced within the time limits,
measuring the time taken for system compiling, responding to users, and serving
requests sent to the system in order to retrieve results.
Validation Testing
Validation testing can be defined in many ways, but a simple definition is that
validation succeeds when the software functions in a manner that can be reasonably
expected by the end user.
Acceptance Testing
This is the final stage of the testing process before the system is accepted for operational
use. The system is tested with data supplied by the system procurer rather
than simulated data.
CHAPTER 7
CONCLUSION
As previously discussed, the MSSD model identifies spam stores or products one by one by
detecting abnormal singleton reviewers appearing in an assigned time window. However, this
method misses the latent information that underlies evolving hyping activities. We pick two
representative cases, which were tagged as “spam” by the MSSD model but that our
model placed in a “clean” class. In both, there is a remarkable purchasing burst,
with 80 percent of buyers in the time window being singleton reviewers. In our
experiment, we define customers who have made fewer than five transactions online since their
registration as singleton reviewers. Because of the different customer level–segmentation
strategies and privacy policies in Taobao, this provides the best match with the definition of
singleton reviewers in the MSSD model.
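The singleton-reviewer rule used in the experiment can be sketched as follows. The names are hypothetical; the thresholds (fewer than five transactions defines a singleton reviewer, and roughly an 80 percent share marks a burst) come from the text above:

```java
// Sketch of the singleton-reviewer burst check described in the conclusion.
public class SingletonBurstCheck {
    /** "Singleton reviewers": buyers with fewer than five transactions since registration. */
    public static boolean isSingleton(int transactionCount) {
        return transactionCount < 5;
    }

    /**
     * Flags a time window as a suspicious purchasing burst when the share of
     * singleton reviewers among its buyers reaches the threshold (e.g. 0.8).
     */
    public static boolean isSuspiciousWindow(int[] buyerTransactionCounts, double threshold) {
        if (buyerTransactionCounts.length == 0) return false;
        int singletons = 0;
        for (int count : buyerTransactionCounts) {
            if (isSingleton(count)) singletons++;
        }
        return (double) singletons / buyerTransactionCounts.length >= threshold;
    }
}
```

As the two cases above show, this rule alone can misclassify legitimate bursts, which is precisely the latent information our model exploits.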
REFERENCES
[1] F. Li, M. Huang, Y. Yang, and X. Zhu, “Learning to Identify Review Spam,” Proc. Int’l
Joint Conf. Artificial Intelligence, 2011, pp. 2488–2493.
[2] S. Feng et al., “Distributional Footprints of Deceptive Product Reviews,” Proc. Int’l Conf.
Web and Social Media, 2012, pp. 98–105.
[3] G. Fei et al., “Exploiting Burstiness in Reviews for Review Spammer Detection,” Proc. Int’l
Conf. Web and Social Media, 2013, pp. 175–184.
[4] Q. Xu et al., “SMS Spam Detection Using Noncontent Features,” IEEE Intelligent Systems,
vol. 27, no. 6, 2012, pp. 44–51.
[5] S. Xie et al., “Review Spam Detection via Temporal Pattern Discovery,” Proc. ACM Int’l
Conf. Knowledge Discovery and Data Mining, 2012, pp. 823–831.
[6] T. Moore, R. Clayton, and H. Stern, “Temporal Correlations between Spam and Phishing
Websites,” Proc. 2nd Usenix Conf. Large-scale Exploits and Emergent Threats, 2009, p. 5.
[7] J. Grabocka et al., “Learning Time-Series Shapelets,” Proc. ACM Int’l Conf. Knowledge
Discovery and Data Mining, 2014, pp. 392–401.
[8] J. Lines et al., “A Shapelet Transform for Time Series Classification,” Proc. ACM Int’l Conf.
Knowledge Discovery and Data Mining, 2012, pp. 289–297.
[9] Q. Zhang et al., “Exploring Heterogeneous Product Networks for Discovering Collective
Marketing Hyping Behavior,” Proc. Pacific-Asia Conf. Knowledge Discovery and Data Mining,
2016, pp. 40–51.
Appendix A
Snapshots
Fig. A.4 IP Blocking
Appendix B
Conference Details
Presented and published a paper entitled “Collective Hyping Detection System To Identify
Online Spam Activities Using AI” in the proceedings of the National Conference on Science,
Engineering and Management (NCSEM-2018), held on 24-25 May 2018 at
The Oxford College of Engineering, Bengaluru.