
Project Synopsis
on

DEVELOPING A CHATBOT USING SEQUENCE MODELLING

Submitted as partial fulfillment for the award of

BACHELOR OF TECHNOLOGY
DEGREE
Session 2018-19
in
Information Technology
By
DEEPANSHU SHARMA
(1503213041)
AYAAN SAMAD
(1503213036)
KASHISH SINGHAL
(1503213054)

Under the guidance of


Ms. Deepali Dev & Dr. Kanika Gupta

ABES ENGINEERING COLLEGE, GHAZIABAD

AFFILIATED TO
U.P. TECHNICAL UNIVERSITY, LUCKNOW

DECLARATION

We hereby declare that the work presented in this report, entitled
“Developing a Chatbot using Sequence Modelling”, is an authentic record of
our own work carried out under the supervision of Ms. Deepali Dev and
Dr. Kanika Gupta.
The matter embodied in this report has not been submitted by us for the award of
any other degree.

Dated : Signature:
Name:
Department:

Dated : Signature:
Name:
Department:

Dated : Signature:
Name:
Department:

This is to certify that the above statement made by the candidate(s) is correct to the
best of my knowledge.

Signature of HOD            Signature of Supervisor

Prof. Amit Sinha            Ms. Deepali Dev
IT Department               Assistant Professor
                            IT Department

Signature of Supervisor
Dr. Kanika Gupta
Associate Professor
IT Department

Date............................

ACKNOWLEDGEMENT
We wish to extend our sincere gratitude to our project mentors, Ms. Deepali Dev
and Dr. Kanika Gupta, Department of Information Technology, for their valuable
guidance and encouragement, which have been extremely helpful.
We are indebted to Dr. Amit Sinha (HOD, Department of Information
Technology) for his valuable support.

Deepanshu Sharma
1503213041

Kashish Singhal
1503213054

Ayaan Samad
1503213036

ABSTRACT

Our project, Developing a Chatbot using Sequence Modelling, focuses on
developing a model that automatically generates responses to the questions it
is asked, using a number of different machine learning techniques.
This document gives an introduction to the basic aspects of the proposed
model. The proposed model may be used to take orders in restaurants or to
deal with commonly faced problems in call centres. We are pursuing our
objective in three phases: design, analysis and implementation. For the
analysis phase, we have taken a large segment of data from social networking
sites to generate responses to commonly discussed problems. The parameters
are prioritized based on the interpretation of this data. We also plan to
include algorithms that classify each question according to whether we can
generate an answer from our own data or need to use the internet.
The UI is also considered from an analysis point of view.

TABLE OF CONTENTS

Sr. No.   Content

1         Introduction
2         Objective
3         Steps for model development
4         Methodology
          Tools used
5         Work breakdown structure
5.1       Data collection
5.2       Processing and putting it together
5.3       Classification algorithms
6         Result and discussion
7         Conclusion
8         References

LIST OF FIGURES

Sr. No.   Name

1.1       Process flow diagram
1.2       Workflow diagram
1.3       Snapshot of collected data
1.4       Snapshot of code to collect data from Facebook
1.5       Data entered by users with structure
1.6       K-nearest neighbors algorithm structure
1.7       Support vector machine structure
1.8       Test case 1 screenshot
1.9       Test case 2 screenshot

1) INTRODUCTION

A chatbot (also known as a talkbot, chatterbot, bot, IM bot, interactive
agent, or artificial conversational entity) is a computer program or an artificial
intelligence that conducts a conversation via auditory or textual methods.

Such programs are often designed to convincingly simulate how a human
would behave as a conversational partner, with the aim of passing the Turing test.
Chatbots are typically used in dialog systems for various practical purposes,
including customer service and information acquisition. Some chatterbots use
sophisticated natural language processing systems, but many simpler systems
scan for keywords within the input and then pull the reply with the most matching
keywords, or the most similar wording pattern, from a database.
The term "ChatterBot" was originally coined by Michael Mauldin in 1994 to
describe these conversational programs. Today, most chatbots are either
accessed via virtual assistants such as Google Assistant and Amazon Alexa,
via messaging apps such as Facebook Messenger or WeChat, or via individual
organizations' apps and websites.
Chatbots can be classified into usage categories such as conversational
commerce (e-commerce via chat), analytics, communication, customer
support, design, developer tools, education, entertainment, finance, food,
games, health, HR, marketing, news, personal, productivity, shopping, social,
sports, travel and utilities.
Chatbots can be added to a buddy list or provide a single game player with an
entity to interact with while awaiting other "live" players. If the bot is
sophisticated enough to pass the Turing test, the person may not even know
they are interacting with a computer program.
As consumers continue to move away from traditional forms of
communication, the use of chat-based communication is expected to rise.
Chatbot-based virtual assistants are increasingly used to handle simple tasks,
freeing human agents to focus on higher-profile service or sales cases. This
reduces costs, since human agents are more expensive, and it also allows
companies to provide a level of customer service during hours when live
agents aren't available.

2) OBJECTIVE
Our objective is to construct a chatterbot that responds on a person's behalf, so
that people do not need to attend to routine problems themselves. It will answer
based on the data provided by the user, so that the user is not disturbed for
matters the bot can handle.

PROCESS FLOW DIAGRAM

Fig 1.1: Process flow diagram for chatbot


An untrained instance of ChatterBot starts off with no knowledge of how to
communicate. Each time a user enters a statement, the library saves the text
that they entered and the text that the statement was in response to. As
ChatterBot receives more input, the number of responses that it can give and
the accuracy of each response in relation to the input statement increase.

The program selects a response by searching for the known statement that most
closely matches the input; it then chooses a response from the known
responses to that statement.
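
The selection step described above can be sketched in a few lines. This is only an illustration using Python's standard difflib module, not ChatterBot's own implementation; the KNOWN dictionary and the function names are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical knowledge base: known statement -> known responses to it.
KNOWN = {
    "How are you?": ["I am good."],
    "Thank you": ["You are welcome."],
    "What is your name?": ["I am a chatbot."],
}


def similarity(a, b):
    """Return a ratio in [0, 1]; 1.0 means identical, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def closest_statement(text, known=KNOWN):
    """Return (best matching known statement, its similarity to text)."""
    best = max(known, key=lambda s: similarity(text, s))
    return best, similarity(text, best)


def respond(text, known=KNOWN):
    """Reply with the first stored response to the closest known statement."""
    statement, _ = closest_statement(text, known)
    return known[statement][0]


print(respond("how are you"))
```

Here the input "how are you" differs from the stored statement only in case and punctuation, so the stored reply to "How are you?" is selected.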

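We can teach the chatbot by training it with examples of existing
conversations.

Example:

    from chatterbot import ChatBot
    from chatterbot.trainers import ListTrainer

    bot = ChatBot('Example Bot')
    # ChatterBot 0.x API; in 1.0+ use ListTrainer(bot).train([...]) instead.
    bot.set_trainer(ListTrainer)
    # Each statement is learned as a response to the one before it.
    bot.train([
        'How are you?',
        'I am good.',
        'That is good to hear.',
        'Thank you',
        'You are welcome.',
    ])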

3) STEPS FOR MODEL DEVELOPMENT
The process of creating a chatbot follows a pattern similar to the development of a web
page or a mobile app. It can be divided into Design, Building, Analytics and Maintenance.

Design
The chatbot design is the process that defines the interaction between the user and the
chatbot. The chatbot designer will define the chatbot personality, the questions that will be
asked to the users, and the overall interaction. It can be viewed as a subset of the
conversational design. In order to speed up this process, designers can use dedicated
chatbot design tools that allow for immediate preview, team collaboration and video
export. An important part of the chatbot design is also centered around user testing. User
testing can be performed following the same principles that guide the user testing of
graphical interfaces.

Building
The process of building a chatbot can be divided into two main tasks: understanding the
user's intent and producing the correct answer. The first task involves understanding the
user input. In order to properly understand a user input in a free text form, a Natural
Language Processing Engine can be used. The second task may involve different approaches
depending on the type of the response that the chatbot will generate.

Analytics
The usage of the chatbot can be monitored in order to spot potential flaws or problems. It
can also provide useful insights that can improve the final user experience.

Maintenance
To keep chatbots up to speed with changing company products and services, traditional
chatbot development platforms require ongoing maintenance. This can either be in the
form of an ongoing service provider or for larger enterprises in the form of an in-house
chatbot training team. To eliminate these costs, some startups are experimenting
with Artificial Intelligence to develop self-learning chatbots, particularly in Customer
Service applications.

APIs
Many APIs are available for building your own chatbot, such as the Wikipedia API, which
lets us retrieve data from Wikipedia.
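
As a rough sketch of what such a lookup could look like, the code below builds a request to the MediaWiki query API for the plain-text introduction of an article and extracts the text from the JSON response. The function names and the User-Agent string are assumptions made for this illustration, and the network call itself is defined but not executed here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_extract_url(title):
    """Build a query URL for the plain-text intro section of one article."""
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": 1,
        "explaintext": 1,
        "format": "json",
        "titles": title,
    }
    return API_URL + "?" + urlencode(params)


def parse_extract(payload):
    """Return the first extract in a decoded API response, or None if absent."""
    pages = payload.get("query", {}).get("pages", {})
    for page in pages.values():
        if "extract" in page:
            return page["extract"]
    return None


def fetch_summary(title):
    """Perform the network request (not called in this sketch)."""
    # The User-Agent value is a placeholder chosen for this example.
    request = Request(build_extract_url(title),
                      headers={"User-Agent": "chatbot-synopsis-example/0.1"})
    with urlopen(request) as response:
        return parse_extract(json.load(response))


print(build_extract_url("Chatbot"))
```

Keeping URL construction and response parsing separate from the network call makes both easy to check without internet access.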
4) METHODOLOGY

Start
↓
Obtaining and preparing data to train
↓
Analyzing data for intents
↓
Analyzing data to build answer system
↓
Designing user interface
↓
Analyzing relevant entities in questions and answers
↓
Test the model
↓
Stop

Fig 1.2: Workflow diagram



TOOLS USED
• Python: for developing the algorithms and the chatbot backend
• Wikipedia API: for searching data on the internet and loading it into our
chatbot
• ChatterBot: a Python library that makes it easy to generate
automated responses to a user’s input
• PHP: for developing a social media website from which to collect data
• SQL: for the database
• jQuery: used for the social networking site
• HTML: for frontend development
5) WORK BREAKDOWN STRUCTURE

5.1) Data Collection

For analysis, we needed a large data set with information on different topics. Such a data
set can either be created by hand or taken from social networking sites.

Let’s say we want to scrape the New York Times’ Facebook page. We would send a request
to https://graph.facebook.com/v2.4/nytimes?access_token=XXXXX and we would get:

Fig 1.3: Snapshot of collected data


In this way, we have collected data from different Facebook pages to build a good store of
data on which to run our chatbot.

Fig 1.4: Snapshot of code to collect data from Facebook

5.2) PROCESSING AND PUTTING IT TOGETHER

We just have to process each post. If you’re an avid Facebook user, you know
that not all of these attributes are guaranteed to exist: status updates may
not have text or links. Since we’re making a spreadsheet with an enforced
schema, we need to validate that a field exists before attempting to process it.
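
A minimal sketch of this validation step is shown below. The field names (id, message, link, created_time) and the function name are assumptions chosen for illustration and may differ from the attributes actually collected.

```python
# Column names are assumed for this illustration.
SCHEMA = ["status_id", "message", "link", "created_time"]


def process_status(status):
    """Turn one raw status dict into a fixed-schema row.

    Missing or empty fields become empty strings, so every row has the
    same columns regardless of what the post actually contained.
    """
    message = status.get("message") or ""
    return {
        "status_id": status.get("id") or "",
        "message": " ".join(message.split()),  # collapse newlines and extra spaces
        "link": status.get("link") or "",
        "created_time": status.get("created_time") or "",
    }


# A status with no link and a multi-line message.
print(process_status({"id": "42", "message": "Breaking\nnews  today"}))
```

Because every row is built from the same set of keys, a post without text or a link still produces a complete row for the spreadsheet.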

Now we have a full plan for scraping: we query each page of Facebook Page
statuses (100 statuses maximum per page), process all statuses on that page
and write the output to a CSV file, navigate to the next page, and repeat
until no statuses are left.
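
The loop described above might be sketched as follows. To keep the example runnable offline, the network call is passed in as a fetch function and replaced by a small in-memory stand-in; the response shape (a "data" list plus a "paging" object with a "next" URL), the column names, and the function names are assumptions for this illustration. In real use, the first URL would be something like the page's feed endpoint with a limit of 100 and an access token.

```python
import csv
import io

FIELDS = ["id", "message", "created_time"]


def scrape_statuses(first_url, fetch, out_file):
    """Walk every page of results, writing one CSV row per status.

    fetch(url) must return a decoded JSON dict shaped like
    {"data": [...], "paging": {"next": "<url>"}}.
    Returns the number of statuses written.
    """
    writer = csv.DictWriter(out_file, fieldnames=FIELDS)
    writer.writeheader()
    url, count = first_url, 0
    while url:
        page = fetch(url)
        for status in page.get("data", []):
            # Missing fields are written as empty cells.
            writer.writerow({f: status.get(f, "") for f in FIELDS})
            count += 1
        # No "next" link means this was the last page.
        url = page.get("paging", {}).get("next")
    return count


# Offline stand-in for the network: two fake pages keyed by URL.
FAKE_PAGES = {
    "page1": {"data": [{"id": "1", "message": "hello"}],
              "paging": {"next": "page2"}},
    "page2": {"data": [{"id": "2", "created_time": "2018-01-01"}]},
}

buffer = io.StringIO()
total = scrape_statuses("page1", FAKE_PAGES.__getitem__, buffer)
print(total)
print(buffer.getvalue())
```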

DATA FORMAT

Fig 1.5: Data entered by users with structure

5.3) Apply classification techniques

Algorithms used to retrieve data on the basis of the inputs:

K-Nearest Neighbors (KNN) Algorithm

KNN is a type of supervised machine learning algorithm.

• KNN is extremely easy to implement in its most basic form, and yet it can
perform quite complex classification tasks.
• It is a lazy learning algorithm, since it doesn't have a specialized training
phase. Rather, it stores all of the training data and uses it when classifying
a new data point or instance.
• KNN is a non-parametric learning algorithm, which means that it doesn't
assume anything about the underlying data. This is an extremely useful
feature, since most real-world data doesn't follow any theoretical
assumption such as linear separability or a uniform distribution.

Fig 1.6: K-nearest neighbors algorithm
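
To make the idea concrete, here is a minimal pure-Python sketch of KNN classification (it needs Python 3.8 or later for math.dist). The toy data points, their labels, and the function name are made up for illustration; in the project, the feature vectors would be derived from the collected text.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training points.

    train is a list of (feature_vector, label) pairs. There is no training
    step: all the work happens at prediction time (lazy learning).
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Toy data (made up): two numeric features per point, two classes.
TRAIN = [
    ((1.0, 1.0), "greeting"),
    ((1.2, 0.8), "greeting"),
    ((0.9, 1.1), "greeting"),
    ((5.0, 5.0), "order"),
    ((5.2, 4.8), "order"),
    ((4.9, 5.1), "order"),
]

print(knn_predict(TRAIN, (1.1, 1.0)))
print(knn_predict(TRAIN, (5.0, 4.9)))
```

The value of k is a design choice: a small k follows the nearest examples closely, while a larger k smooths the decision over more neighbours.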

Support vector machines (SVMs)

SVMs are a set of supervised learning methods used for classification,
regression and outlier detection.

The advantages of support vector machines are:

• Effective in high-dimensional spaces.
• Still effective in cases where the number of dimensions is greater than the
number of samples.
• Use a subset of training points in the decision function (called support
vectors), so they are also memory efficient.
• Versatile: different kernel functions can be specified for the decision
function. Common kernels are provided, but it is also possible to specify
custom kernels.

Fig 1.7: Support vector machine representation
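
The sketch below illustrates two of the points above: the decision function depends only on the support vectors, and the kernel can be swapped. It evaluates an already-fitted decision function and does not perform training. The support vectors, coefficients, and bias are hand-picked: they are the hard-margin solution for a two-point toy set in which (1, 1) has label -1 and (3, 3) has label +1. All names are hypothetical.

```python
import math


def linear_kernel(a, b):
    """Plain dot product."""
    return sum(x * y for x, y in zip(a, b))


def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * squared distance)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def decision_function(x, support_vectors, dual_coefs, bias, kernel):
    """f(x) = sum_i coef_i * K(sv_i, x) + bias, where coef_i = alpha_i * y_i."""
    total = sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs))
    return total + bias


def predict(x, support_vectors, dual_coefs, bias, kernel):
    """Return +1 or -1 depending on the sign of the decision function."""
    score = decision_function(x, support_vectors, dual_coefs, bias, kernel)
    return 1 if score >= 0 else -1


# Hand-picked values for illustration, not the output of a training run.
SUPPORT_VECTORS = [(1.0, 1.0), (3.0, 3.0)]
DUAL_COEFS = [-0.25, 0.25]  # alpha_i * y_i
BIAS = -2.0

print(predict((0.0, 0.0), SUPPORT_VECTORS, DUAL_COEFS, BIAS, linear_kernel))
print(predict((4.0, 4.0), SUPPORT_VECTORS, DUAL_COEFS, BIAS, linear_kernel))
```

Passing rbf_kernel instead of linear_kernel changes the shape of the decision boundary without touching the rest of the code; the coefficients and bias would then need to come from a model fitted with that kernel.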

6) RESULT AND DISCUSSION

When the bot is unable to find a matching statement in its data set, it returns the first line
of data. This problem can be seen in the two tests below. Even with a large amount of data,
the bot can still run out of matching data; in that case, we can go to the internet or use the
Wikipedia API to return part of a Wikipedia article as the answer to the question asked.

Fig 1.8: test case 1 screenshot
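
One possible way to avoid returning an unrelated first line is to accept a local match only above a similarity threshold and otherwise hand the question to a fallback such as a Wikipedia lookup. The sketch below is an assumption-laden illustration: the threshold of 0.6, the sample knowledge base, and the stub fallback are all hypothetical, and the real lookup is replaced by a placeholder function.

```python
from difflib import SequenceMatcher


def best_match(text, known_statements):
    """Return (statement, score) for the most similar known statement."""
    scored = [(s, SequenceMatcher(None, text.lower(), s.lower()).ratio())
              for s in known_statements]
    return max(scored, key=lambda pair: pair[1])


def answer(text, knowledge, fallback, threshold=0.6):
    """Answer from local data when the match is good enough, else use fallback."""
    statement, score = best_match(text, knowledge)
    if score >= threshold:
        return knowledge[statement]
    return fallback(text)


# Hypothetical local data set: known statement -> stored answer.
KNOWLEDGE = {
    "hello": "Hi there!",
    "what are your opening hours?": "We open at 9 am.",
}


def wikipedia_stub(text):
    """Placeholder for a real internet or Wikipedia lookup."""
    return "Let me look that up: " + text


print(answer("Hello", KNOWLEDGE, wikipedia_stub))
print(answer("Who invented the telephone?", KNOWLEDGE, wikipedia_stub))
```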

When there are many matches for the same word, we can use a regression algorithm to generate
the output that has had the maximum number of hits in the past. We can also create a list of
responses for a particular question and rotate among the common correct answers to keep the
chat interesting.
Example: when the user says hello, we can reply with hi, hello, good morning, etc., and we can also
initiate further responses such as "How are you?" or "What can I help you with?".

Fig 1.9: test case 2 screenshot
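
Both ideas can be sketched briefly. Note that the second function below uses a plain frequency count of past responses as a simple stand-in for the regression approach mentioned above; it is not a regression model. The response lists and function names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical response lists for illustration.
RESPONSES = {
    "hello": ["Hi", "Hello", "Good morning"],
}
FOLLOW_UPS = ["How are you?", "What can I help you with?"]


def reply(message, rng=random):
    """Pick a varied reply for a known message and append a follow-up question."""
    options = RESPONSES.get(message.strip().lower())
    if options is None:
        return None
    return rng.choice(options) + ". " + rng.choice(FOLLOW_UPS)


def most_hit_response(history):
    """Return the response that was chosen most often in the past."""
    return Counter(history).most_common(1)[0][0]


print(reply("Hello"))
print(most_hit_response(["Hi", "Hello", "Hi"]))
```

The rng parameter accepts a seeded random.Random instance, which makes the otherwise random choice reproducible during testing.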

7) CONCLUSION

We have started collecting data on different fields and have built a basic
sequence-modelling chatbot on Linux. We have also started designing a small
social networking site with limited features, so as to show real-time
modification of data and improved accuracy in generating results.
We have a basic model developed and are applying algorithms to the model to
find the best one. We have also started planning to create our own
social media site in PHP, so that we can read data directly from it and modify
the data held previously.
We plan to add speech recognition to our project, using the Google API for
speech-to-text conversion.
We are also trying to use more APIs, such as Wolfram Alpha, to make our search
results come faster and to reduce complexity.
We are also trying to add a data classifier to our project so that, when we
read a question, we can solve it directly if it is related to maths instead of
wasting time searching the whole database.
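
As one possible shape for such a classifier, the sketch below treats a question as "maths" only if it parses as a plain arithmetic expression, and evaluates it safely with Python's ast module instead of eval. This is an assumption about how the planned feature could work, not the project's implementation; the function names are hypothetical.

```python
import ast
import operator

# Only these binary operators are allowed.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(node):
    """Recursively evaluate a parsed arithmetic expression; reject anything else."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body)
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -evaluate(node.operand)
    raise ValueError("not a plain arithmetic expression")


def solve_if_maths(question):
    """Return the numeric answer if question is pure arithmetic, else None."""
    try:
        return evaluate(ast.parse(question.strip(), mode="eval"))
    except (SyntaxError, ValueError, ZeroDivisionError):
        return None


print(solve_if_maths("2 + 3 * 4"))
print(solve_if_maths("What is a chatbot?"))
```

A result of None signals that the question is not arithmetic and should be passed on to the normal database search.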

8) REFERENCES

[1] Anik Das, Build Better Chatbots: A Complete Guide to Getting Started with Chatbots
[2] Jack Cahn, CHATBOT: Architecture, Design, & Development
[3] https://en.wikipedia.org/wiki/Chatbot
[4] https://www.skillshare.com
