A Review of Conversational System Framework

DANI WAFAUL FALAH
GRADUATE TECHNICAL WRITING
LITERATURE REVIEW - DRAFT 2
SAT, APRIL 15, 2017
Research question: How to develop the artificial brain for dialogue systems using Neural
Networks and Generative Models?
I. Introduction
Recently, chatbots, also known as virtual digital assistants, have become a trend,
and many large IT providers have started to develop intelligent chatbots. With
embedded artificial intelligence, these virtual digital assistants could in the near
future replace humans in some roles, such as customer service agent, ticket
reservation agent, or banking assistant.
There are several popular virtual digital assistants, such as Siri from Apple,
Cortana from Microsoft, Alexa from Amazon, and Google Now from Google. These
virtual digital assistants can be used directly on devices provided by their
companies. Rather than creating a device with an embedded virtual digital assistant,
several companies, such as Facebook and IBM, prefer to provide a platform or a
development kit for building virtual digital assistants: Facebook offers the Wit.AI
development kit and IBM offers the Watson development kit.
A virtual digital assistant is built on conversational system technology, which
can be described as a computer system intended to converse with a human in a
coherent structure using text, speech, graphics, or gestures as communication
methods for both input and output.
A conversational system also draws on several branches of Artificial Intelligence,
such as Natural Language Processing (NLP), Neural Networks, and Deep Learning. In a
conversational system, NLP is used to recognize natural language input and to
generate a natural language response as output. Meanwhile, Neural Networks and
Deep Learning are mostly used as a dialogue state tracker and/or a dialogue policy
manager, which acts as the brain, or core process, that mimics a natural human
response in a conversation.
In most conversational systems, the framework is determined by the task
orientation of the dialogue system, that is, whether the conversation is goal-driven
or non-goal-driven. In a goal-driven conversation, the dialogue system has a
measurable task and responds to specific requests, as in a restaurant reservation
system, a ticket booking agent, or a movie recommender. In a non-goal-driven
conversation, by contrast, the dialogue system acts as a dialogue companion without
a specific target, so it could be used, for example, as a language learning companion.
Once the framework has been determined by the purpose, it is implemented using
several techniques and algorithms to achieve a natural response closer to a
human-human conversation. In this literature review, research from the past ten
years has been selected to identify the main purpose of a conversational system and
to analyze the framework used for its core components. Finally, this literature
review aims to give a brief illustration of how to develop the artificial brain for
dialogue systems using Neural Networks and Generative Models.
The next section describes the structure of this literature review in detail,
including the methodology used to select related research.
II. Literature Review: Methodology
This literature review is intended to give a brief explanation of the framework
used to build a conversational system, including the history of conversational
systems, the core components, and several techniques required to produce a system
closer to a human-human conversation.
Dialogue systems were reviewed based on their framework and measurable task
(goal-driven or non-goal-driven). The common core components of dialogue systems
are Language Understanding (LU), Dialogue State Tracker (DST), Policy Manager (PM),
and Language Generator (LG).
From the articles selected from the last decade, the algorithms used for each
component are explained. These algorithms are compared with each other in terms of
similarities and differences to find the best approach. This literature review also
aims to identify gaps for future research and several possibilities for extending
the abilities of conversational systems.
III. Dialogue Systems Framework
a. Dialogue System history

ELIZA, widely regarded as the first dialogue system in AI history, was created by
Weizenbaum at the MIT Artificial Intelligence Lab from 1964 to 1966. ELIZA is an
early natural language program created to demonstrate the superficiality of
communication between a human and a machine. Weizenbaum’s research led to further
research aimed at enhancing Artificial Intelligence capabilities in Natural
Language Processing and conversational systems.
The idea behind dialogue systems is to perform two-way communication between
human and machine using natural (human) language. This requires a representation
of natural language that allows machines to understand the input and respond in
the same language as the human. Natural Language Processing (NLP) is a technology
derived from Artificial Intelligence that focuses on understanding natural
language and responding in natural language.
In recent history, several studies have been conducted to add an artificial brain
to dialogue systems so that the machine can perform tasks related to human input.
These studies use several techniques, such as rule matching, probabilistic theory,
markup languages, and neural networks. With neural networks, the artificial brain
of a dialogue system could improve how the machine responds to a specified task or
input and could simulate a conversation closer to a human-human conversation.
Moreover, to enhance the abilities of the artificial brain, machine learning
techniques, of which neural networks are one complex implementation, are used so
that the artificial brain evolves and learns over time.
b. Task orientation
Dialogue systems are commonly divided into two types by task orientation:
goal-driven systems and non-goal-driven systems.
i. Goal-Driven
Goal-driven conversational systems are built to support question answering
between human and machine. In this task, the conversational system responds
to each question with a related answer. Typical applications are a travel
ticket agent, restaurant reservation, or movie recommendation.
Goal-driven dialogue systems make use of the complete dialogue system
framework and use a policy manager as the task selector. In a goal-driven
policy manager, advanced techniques in machine learning and probabilistic
theory are used to predict related responses based on a pre-trained
knowledge base or on information retrieved from the Internet.
ii. Non-Goal-Driven
Non-goal-driven conversational systems are intended to provide chit-chat
conversation between a human and a machine without a dedicated task
response at the end. Many systems have been built for this purpose, such
as language learning systems. These non-goal-driven systems use an
encoder-decoder as language understanding and language generator, and a
policy manager that embeds words from a pre-trained knowledge base or
datasets.
Most non-goal-driven systems use open-domain datasets from several
conversation sources, such as Twitter conversations, movie subtitle
databases, and movie recommendation databases. These datasets contain
human-human conversations that are mapped into parameters and word
vectors. The word vectors are used as embeddings in the decoder module of
the language generator to generate an answer related to the user input.
c. Dialogue System Components
The most common dialogue system components are Language Understanding,
Dialogue State Tracker, Policy Manager, and Language Generator. Another key
element of dialogue systems is the knowledge base. Both goal-driven and
non-goal-driven systems use a knowledge base as a source of conversation models
and task options.
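As a rough illustration of how these four components and the knowledge base could
fit together, the following minimal sketch chains them for a toy restaurant task.
Everything in it is an assumption made for illustration: the function names, the
slot names, the keyword-spotting rules, and the two-entry knowledge base do not
come from the reviewed articles, which use neural models in place of these rules.

```python
from dataclasses import dataclass, field

# Hypothetical toy knowledge base: (cuisine, area) -> restaurant name.
KNOWLEDGE_BASE = {
    ("italian", "north"): "Luigi's",
    ("thai", "south"): "Bangkok House",
}


@dataclass
class DialogueState:
    """Slots collected so far in the conversation."""
    slots: dict = field(default_factory=dict)


def understand(utterance: str) -> dict:
    """Language Understanding: toy keyword spotting instead of a neural model."""
    tokens = utterance.lower().replace(",", " ").split()
    frame = {}
    for cuisine in ("italian", "thai"):
        if cuisine in tokens:
            frame["cuisine"] = cuisine
    for area in ("north", "south"):
        if area in tokens:
            frame["area"] = area
    return frame


def track(state: DialogueState, frame: dict) -> DialogueState:
    """Dialogue State Tracker: merge the new frame into the running state."""
    state.slots.update(frame)
    return state


def decide(state: DialogueState) -> tuple:
    """Policy Manager: handcrafted rule that asks for missing slots first."""
    for slot in ("cuisine", "area"):
        if slot not in state.slots:
            return ("request", slot)
    name = KNOWLEDGE_BASE.get((state.slots["cuisine"], state.slots["area"]))
    return ("inform", name) if name else ("no_match", None)


def generate(action: tuple) -> str:
    """Language Generator: template-based realisation of the chosen action."""
    act, value = action
    if act == "request":
        return f"Which {value} would you like?"
    if act == "inform":
        return f"How about {value}?"
    return "Sorry, I could not find a match."


def respond(state: DialogueState, utterance: str) -> str:
    """Run one turn through LU -> DST -> PM -> LG."""
    track(state, understand(utterance))
    return generate(decide(state))


state = DialogueState()
print(respond(state, "I want Italian food"))      # Which area would you like?
print(respond(state, "Somewhere north, please"))  # How about Luigi's?
```

The handcrafted rule in decide() is the kind of task selector that the learned
policy managers discussed in this review are meant to replace.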
i. Language Understanding
Language Understanding (LU) is a sub-topic of Natural Language Processing,
a major branch of Artificial Intelligence. It is described as computer
programs that read, understand, and analyze natural language input,
whether informal, colloquial, or even (slightly) ungrammatical. Most of
the related articles use neural network models as the language
understanding module, such as the Sequence to Sequence (Seq2Seq) Recurrent
Neural Network model (Li, et al. 2016a), the Hierarchical Recurrent
Encoder-Decoder (HRED) neural network model (Serban, et al. 2016), and
Memory Networks (Dodge, et al. 2015).
The Seq2Seq model was introduced by Sutskever as an end-to-end approach
to sequence learning (Sutskever, et al. 2014). The Seq2Seq model uses a
Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) as an
encoder to read the input sequence. To generate an output from the
encoded input, another RNN with LSTM is used as a decoder to predict a
compatible output sequence.
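The data flow of an encoder-decoder model can be sketched as follows. This is a
minimal illustration under several assumptions, not the model from the cited
paper: a plain tanh RNN cell stands in for the LSTM, the weights are random and
untrained (so the generated tokens are meaningless), and the vocabulary and all
function names are hypothetical.

```python
import math
import random

random.seed(0)

# Hypothetical toy vocabulary; "<eos>" doubles as start and end symbol.
VOCAB = ["<eos>", "hello", "how", "are", "you", "fine"]
HIDDEN = 4


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def rnn_step(w_in, w_hid, x, h):
    """One step of a plain tanh RNN cell (a stand-in for the LSTM cell)."""
    return [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_hid, h))]


# Separate, untrained parameters for the encoder and the decoder.
ENC_IN, ENC_HID = rand_matrix(HIDDEN, len(VOCAB)), rand_matrix(HIDDEN, HIDDEN)
DEC_IN, DEC_HID = rand_matrix(HIDDEN, len(VOCAB)), rand_matrix(HIDDEN, HIDDEN)
DEC_OUT = rand_matrix(len(VOCAB), HIDDEN)


def encode(tokens):
    """Encoder: fold the whole input sequence into one context vector.

    Tokens must be in VOCAB; unknown words raise ValueError in this sketch.
    """
    h = [0.0] * HIDDEN
    for token in tokens:
        h = rnn_step(ENC_IN, ENC_HID, one_hot(VOCAB.index(token), len(VOCAB)), h)
    return h


def decode(context, max_len=5):
    """Decoder: start from the context vector and emit tokens greedily."""
    h = context
    prev = VOCAB.index("<eos>")  # start symbol
    output = []
    for _ in range(max_len):
        h = rnn_step(DEC_IN, DEC_HID, one_hot(prev, len(VOCAB)), h)
        scores = matvec(DEC_OUT, h)
        prev = max(range(len(VOCAB)), key=scores.__getitem__)
        if VOCAB[prev] == "<eos>":
            break
        output.append(VOCAB[prev])
    return output


context = encode(["how", "are", "you"])
print(decode(context))
```

In a trained system the two sets of weights would be learned jointly from
question-response pairs, so that the decoder output becomes a sensible reply.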
In non-goal-driven dialogue systems, end-to-end approaches using
Seq2Seq with LSTM or the HRED neural network are preferred for
Language Understanding, because non-goal-driven dialogue systems do
not need to track long-term dialogue state.
ii. Dialogue State Tracker
The next component in a conversational system is the Dialogue State
Tracker. A Dialogue State Tracker tracks the position in the conversation
and passes related parameters, derived from the current question and
previous questions, to the dialogue policy.
A Recurrent Neural Network (RNN) model is used to memorize dialogue
history and serve as a memory unit. Several variants of RNN have been
used as a memory unit, such as Long Short-Term Memory and the Gated
Recurrent Unit. Moreover, a more complex neural network model, or deep
neural network, can also be used as the dialogue state tracker
(Henderson, et al. 2013). The Deep Neural Network approach uses a neural
network algorithm combined with Long Short-Term Memory to memorize every
task or question asked by a human, so that it can be used to build
related responses.
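To make the idea of a recurrent memory unit concrete, the sketch below implements
a single Gated Recurrent Unit step in plain Python. It is an illustration only:
bias terms are omitted, the parameter names (Wz, Uz, and so on) are assumptions,
and the update uses the convention h_new = (1 - z) * h + z * candidate, while
some formulations swap the roles of z and 1 - z.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def gru_step(params, x, h):
    """One GRU step: the gates decide how much of the old state h is kept.

    params is a dict of weight matrices (hypothetical names, no biases):
      Wz, Uz: update gate   Wr, Ur: reset gate   Wh, Uh: candidate state
    """
    z = [sigmoid(a + b)
         for a, b in zip(matvec(params["Wz"], x), matvec(params["Uz"], h))]
    r = [sigmoid(a + b)
         for a, b in zip(matvec(params["Wr"], x), matvec(params["Ur"], h))]
    # The reset gate scales the old state before it enters the candidate.
    gated_h = [ri * hi for ri, hi in zip(r, h)]
    candidate = [math.tanh(a + b)
                 for a, b in zip(matvec(params["Wh"], x),
                                 matvec(params["Uh"], gated_h))]
    # Update gate z interpolates between the old state and the candidate.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, candidate)]


def zero_params(hidden, inputs):
    """All-zero weights, useful only for checking the arithmetic by hand."""
    def zeros(rows, cols):
        return [[0.0] * cols for _ in range(rows)]
    return {
        "Wz": zeros(hidden, inputs), "Uz": zeros(hidden, hidden),
        "Wr": zeros(hidden, inputs), "Ur": zeros(hidden, hidden),
        "Wh": zeros(hidden, inputs), "Uh": zeros(hidden, hidden),
    }


# With zero weights both gates equal 0.5 and the candidate is 0,
# so each step halves the previous state.
state = gru_step(zero_params(2, 3), [1.0, 0.0, 0.0], [1.0, -2.0])
print(state)
```

In a dialogue state tracker, x would be the encoding of the current turn and h
the summary of the dialogue so far, carried from one turn to the next.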
iii. Policy Manager
After a conversation is observed, information is extracted from the
conversation as parameters for generating related responses. The policy
manager module in a dialogue system acts as a task or response selector
based on the pre-trained knowledge base. The policy manager uses
certain algorithms to fetch or match related answers from the knowledge
base. In relation to neural network techniques, several algorithms are
used for the policy manager, such as Reinforcement Learning
(Policy Gradient), Supervised Learning (Stochastic Gradient Descent), or
Generative models (n-grams, Hidden Markov Model).
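As a small example of the simplest generative model named above, the following
sketch estimates a bigram (2-gram) model from a toy corpus and generates a
response greedily. The corpus, the boundary markers, and the function names are
assumptions made for illustration and are not taken from the reviewed articles.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word follows another, with sentence boundaries."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Maximum-likelihood estimate of P(next word | previous word)."""
    total = sum(counts[prev].values())
    return {word: c / total for word, c in counts[prev].items()}


def generate(counts, max_len=10):
    """Greedy generation: always follow the most frequent continuation."""
    word, output = "<s>", []
    for _ in range(max_len):
        if not counts[word]:
            break
        word = counts[word].most_common(1)[0][0]
        if word == "</s>":
            break
        output.append(word)
    return " ".join(output)


# Hypothetical toy corpus of human responses.
corpus = ["i am fine", "i am fine", "i am good"]
model = train_bigrams(corpus)
print(next_word_probs(model, "am"))
print(generate(model))  # i am fine
```

A real system would sample from the distribution instead of always taking the
most frequent word, and would condition on the user input as well.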
In goal-driven dialogue systems, reinforcement learning is used to
replace handcrafted task selectors. Reinforcement learning allows the
policy manager to explore possible tasks from the knowledge base
(Dhingra, et al. 2016). In non-goal-driven dialogue systems, by
contrast, Generative models are used to predict related responses. In
combination with supervised learning, Generative models can be trained
to generate responses close to a human response (Serban, et al. 2016;
Li, et al. 2016a).
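The way a policy gradient method can replace a handcrafted task selector can be
sketched with a very small example. This is a simplified illustration and not the
method of any cited article: the policy is a softmax over three hypothetical
tasks with no dialogue state, the reward comes from a simulated user that wants
one fixed task, and no baseline is used.

```python
import math
import random

# Hypothetical task list for a goal-driven assistant.
TASKS = ["book_table", "recommend_movie", "book_ticket"]


def softmax(logits):
    top = max(logits)
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def simulated_user_reward(action):
    """Assumed simulated user: reward 1 only when the ticket task is chosen."""
    return 1.0 if TASKS[action] == "book_ticket" else 0.0


def reinforce(reward_fn, n_actions, steps=500, lr=0.5, seed=0):
    """REINFORCE for a stateless softmax policy over n_actions tasks."""
    rng = random.Random(seed)
    logits = [0.0] * n_actions
    for _ in range(steps):
        probs = softmax(logits)
        # Explore by sampling a task from the current policy.
        action = rng.choices(range(n_actions), weights=probs)[0]
        reward = reward_fn(action)
        # Gradient of log pi(action) w.r.t. logit a is 1[a == action] - p_a.
        for a in range(n_actions):
            grad = (1.0 if a == action else 0.0) - probs[a]
            logits[a] += lr * reward * grad
    return softmax(logits)


policy = reinforce(simulated_user_reward, len(TASKS))
for task, prob in zip(TASKS, policy):
    print(task, round(prob, 3))
```

After training, most of the probability mass should sit on the rewarded task,
which is the behaviour a handcrafted selector would otherwise have to encode.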
iv. Language Generator
Language generation is the process by which thought is rendered into
language. The main role of the language generator module in a dialogue
system is to construct a natural language response from an action
generated by the policy manager. This module takes the generated
action values and adds word embeddings from the trained dataset to
produce a natural response. In most end-to-end approaches to dialogue
systems, the same method used for language understanding is used again
as a decoder module to generate related language output, such as the
Seq2Seq neural network model, the Hierarchical Recurrent Encoder-Decoder
(Serban, et al. 2016), and Memory Networks (Dodge, et al. 2015).
The Hierarchical Recurrent Encoder-Decoder (HRED) neural network uses a
hierarchical RNN to map each sentence to a time step and track dialogue
state (Serban, et al. 2016). This Recurrent Neural Network uses the Gated
Recurrent Unit (GRU) memory cell in both the encoder and the decoder module.
On the other hand, Memory Networks offer recurrent models that can
achieve better results and memorize long-term data using multiple
computational hops (Sukhbaatar, et al. 2015). Because of this long-term
memory capability, Memory Networks are well suited to goal-driven
dialogue systems.
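The notion of multiple computational hops over a memory can be illustrated with
the following sketch. It is a strong simplification of the cited model and rests
on stated assumptions: the memory entries are given directly as small vectors
with no learned embeddings, the same vectors serve as keys and values in the
example, and all names are hypothetical.

```python
import math


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def memory_hops(query, keys, values, hops=2):
    """Read from memory several times, refining the query after each read.

    Each hop (hops must be at least 1):
      1. scores every memory key against the current query,
      2. turns the scores into attention weights with a softmax,
      3. reads the weighted sum of the memory values,
      4. adds that read-out to the query for the next hop.
    Returns the final query vector and the weights of the last hop.
    """
    u = list(query)
    weights = []
    for _ in range(hops):
        weights = softmax([dot(u, key) for key in keys])
        read = [sum(w * value[d] for w, value in zip(weights, values))
                for d in range(len(u))]
        u = [ui + ri for ui, ri in zip(u, read)]
    return u, weights


# Two stored dialogue facts as toy vectors; the query matches the first one.
memory = [[1.0, 0.0], [0.0, 1.0]]
final_query, attention = memory_hops([5.0, 0.0], memory, memory, hops=1)
print(attention)
print(final_query)
```

Because every stored entry stays addressable at each hop, the memory is not
limited to what a single recurrent state vector can hold, which is the property
that makes this family of models attractive for long dialogues.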
IV. Results
End-to-end non-goal-driven dialogue systems can use a Recurrent Neural Network
as encoder and decoder for language understanding and language generation.
Non-goal-driven dialogue systems do not need to maintain a long dialogue history,
so a short-term memory cell can be used to store the input and predict a response
using Generative models. In goal-driven dialogue systems, on the other hand, a
long-term memory cell is needed to maintain a longer dialogue history as
parameters for the task selector. Memory Networks, which offer long-term memory,
can be used as the encoder and decoder modules for language understanding and
language generation. Reinforcement Learning, Supervised Learning, and Generative
models can be used to predict the next response and to select the task for a
given input.
V. Discussion
End-to-end dialogue systems for goal-driven tasks could combine a Recurrent
Neural Network encoder-decoder with reinforcement learning, which could improve
the task selector and eliminate the handcrafted policy manager. In the future, the
artificial brain for dialogue systems could be composed of an advanced neural
network. Deep learning, a machine learning technique that uses very complex
neural networks, could be implemented to simulate the artificial brain for
dialogue systems.
VI. Conclusion
This literature review gives a better understanding of how conversational
systems work and explains several models, and their improvements, that make a
conversational agent respond naturally to a given question. Some gaps and future
work are also identified from several articles, suggesting various methods to
enhance conversational agent abilities.
In this review, dialogue systems are the technology behind virtual assistants. The
framework of a dialogue system is determined by its purpose. By purpose, dialogue
systems are divided into goal-driven systems, such as a banking assistant or
ticket reservation agent, and non-goal-driven systems, such as language learning
tools.
Goal-driven systems need a complicated design that includes an encoder-decoder,
a dialogue state tracker, and a policy manager. A Recurrent Neural Network
encoder-decoder with long-term memory, such as Memory Networks, and a Policy
Manager with a Reinforcement Learning algorithm or Generative models could be
used as a complex task selector that can closely mimic a human response.
On the other hand, non-goal-driven systems do not need a complicated design.
Non-goal-driven systems can be built using a Recurrent Neural Network as the
encoder-decoder and the policy manager for predicting the best responses.

VII. References
Dhingra, B., et al. (2016). End-to-End Reinforcement Learning of Dialogue Agents for
Information Access. arXiv:1609.00777 [Cs].
Dodge, J., et al. (2015). Evaluating Prerequisite Qualities for Learning End-to-End
Dialog Systems.
Henderson, M., Thomson, B., & Young, S. (2013). Deep Neural Network Approach for
the Dialog State Tracking Challenge.
Li, J., et al. (2016a). A Persona-Based Neural Conversation Model.
Li, J., et al. (2016b). Deep Reinforcement Learning for Dialogue Generation.
arXiv:1606.01541 [Cs].
Serban, I. V., et al. (2016). Building End-to-end Dialogue Systems Using Generative
Hierarchical Neural Network Models. In Proceedings of the Thirtieth AAAI
Conference on Artificial Intelligence (pp. 3776–3783). Phoenix, Arizona: AAAI
Press.
Sukhbaatar, S., et al. (2015). End-To-End Memory Networks. arXiv:1503.08895 [Cs].
Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to Sequence Learning with
Neural Networks. arXiv:1409.3215 [Cs].
