
DANI WAFAUL FALAH

GRADUATE TECHNICAL WRITING

LITERATURE REVIEW - DRAFT 2

SAT, APRIL 15, 2017

A Review of Conversational System Framework

Research question: How to develop the artificial brain for dialogue systems using Neural

Networks and Generative Models?

I. Introduction

Recently, chatbots, also known as virtual digital assistants, have become a trend, and many large IT providers have started to develop intelligent chatbots. With embedded artificial intelligence, these virtual digital assistants could, in the near future, replace human roles in areas such as customer service, ticket reservation, and banking assistance.

There are several popular virtual digital assistants, such as Siri from Apple, Cortana from Microsoft, Alexa from Amazon, and Google Now from Google. These virtual digital assistants can be used directly on devices provided by their companies. Rather than creating a device with an embedded virtual digital assistant, several companies, such as Facebook and IBM, prefer to provide a platform or development kit for building one: Facebook offers the Wit.AI development kit and IBM offers the Watson development kit.

A virtual digital assistant is built on conversational system technology, which can be described as a computer system intended to converse with a human in a coherent structure, using text, speech, graphics, or gestures as communication methods for both input and output.

A conversational system also draws on several branches of Artificial Intelligence, such as Natural Language Processing (NLP), Neural Networks, and Deep Learning. In a conversational system, NLP is used to recognize natural language as input and to generate a natural language response as output. Meanwhile, Neural Networks and Deep Learning are mostly used as a dialogue state tracker and/or a dialogue policy manager, which act as the brain, or core process, that mimics a natural human response in a conversation.

In most conversational systems, the framework is determined by the task orientation of the dialogue system, that is, whether the conversation is goal-driven or non-goal-driven. In a goal-driven conversation, the dialogue system has a measurable task and responds to specific questions, as in restaurant reservation, ticket booking, or movie recommendation. In contrast, in a non-goal-driven conversation, the dialogue system acts as a dialogue companion without a specific target, so it could be used, for example, as a language learning companion.

Once the framework has been determined by the purpose, it is implemented using several techniques and algorithms to achieve a natural response closer to human-human conversation. In this literature review, research from the past ten years has been selected to identify the main purpose of a conversational system and to analyze the framework used as its core components. Finally, this literature review focuses on giving a brief illustration of how to develop the artificial brain for dialogue systems using Neural Networks and Generative Models.

The next section describes the structure of this literature review in detail, including the methodology used to select related research.

II. Literature Review: Methodology

This literature review is intended to give a brief explanation of the framework used to build a conversational system, including the history of conversational systems, the core components, and several techniques required to produce a system closer to human-human conversation.

Dialogue systems were reviewed based on the framework and measurable task

(goal-driven or non-goal-driven). There are several common core components of

dialogue systems such as Language Understanding (LU), Dialogue State Tracker

(DST), Policy Manager (PM), and Language Generator (LG).
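
The way these four components hand data to one another can be illustrated with a toy pipeline. The following is only a minimal sketch: the keyword matching, the slot names (`cuisine`, `party_size`), the action names, and the response templates are all hypothetical stand-ins for the neural modules discussed later in this review.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueState:
    """Slots collected over the turns seen so far (hypothetical structure)."""
    slots: dict = field(default_factory=dict)


def understand(utterance: str) -> dict:
    """LU: toy keyword spotting standing in for a neural encoder."""
    words = utterance.lower().replace(",", " ").split()
    parsed = {}
    for cuisine in ("italian", "thai", "indian"):
        if cuisine in words:
            parsed["cuisine"] = cuisine
    for word in words:
        if word.isdigit():
            parsed["party_size"] = int(word)
    return parsed


def track(state: DialogueState, parsed: dict) -> DialogueState:
    """DST: merge the information from the new turn into the running state."""
    state.slots.update(parsed)
    return state


def decide(state: DialogueState) -> str:
    """PM: request the first missing slot, or confirm when all are filled."""
    for slot in ("cuisine", "party_size"):
        if slot not in state.slots:
            return f"request_{slot}"
    return "confirm_booking"


def generate(action: str, state: DialogueState) -> str:
    """LG: render the chosen action as text through fixed templates."""
    templates = {
        "request_cuisine": "What kind of food would you like?",
        "request_party_size": "How many people are coming?",
        "confirm_booking": "Booking a table for {party_size} ({cuisine}).",
    }
    return templates[action].format(**state.slots)


def respond(state: DialogueState, utterance: str) -> str:
    """Run one user turn through LU -> DST -> PM -> LG."""
    state = track(state, understand(utterance))
    return generate(decide(state), state)
```

Calling `respond` repeatedly with the same `DialogueState` object carries the slots from one turn to the next, which is the role the Dialogue State Tracker plays in the framework.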

For each component, the algorithms used in the selected articles from the last decade are explained. These algorithms are compared with each other based on their similarities and differences, to find the best approach and to identify gaps for future research and several possibilities for extending the abilities of conversational systems.

III. Dialogue Systems Framework

a. Dialogue System history


The first dialogue system in AI history is ELIZA, which was created by Weizenbaum at the MIT Artificial Intelligence Lab from 1964 to 1966. ELIZA is an early natural language program created to demonstrate the superficiality of communication between a human and a machine. Weizenbaum's research has led to further research aimed at enhancing Artificial Intelligence capabilities in natural language processing and conversational systems.

The idea behind dialogue systems is to perform two-way communication between human and machine using natural (human) language. This requires a representation of natural language that allows machines to understand the input and respond in the same language as the human. Natural Language Processing (NLP) is the technology, derived from Artificial Intelligence, that focuses on understanding natural language and responding in natural language.

In recent history, several studies have been conducted to add an artificial brain to dialogue systems, allowing the machine to perform tasks related to human input. These studies use several techniques, such as rule matching, probabilistic theory, markup languages, and neural networks. With neural networks, the artificial brain of a dialogue system can improve how the machine responds to a specified task or input and can try to simulate a conversation closer to human-human conversation. Moreover, to enhance the ability of the artificial brain, machine learning techniques, including complex implementations of neural networks, are used so that the artificial brain evolves and learns over time.
b. Task orientation

Dialogue systems are commonly divided by task orientation into two kinds: goal-driven systems and non-goal-driven systems.

i. Goal-Driven

Goal-driven conversational systems are built to support question answering between human and machine. In this task, the conversational system responds to each question with a related answer. Typical examples are travel ticket agents, restaurant reservation, and movie recommendation.

Goal-driven dialogue systems make use of the complete dialogue system framework and use the policy manager as a task selector. In a goal-driven policy manager, advanced techniques from machine learning and probabilistic theory are used to predict related responses based on a pre-trained knowledge base or on information retrieved from the Internet.

ii. Non-Goal Driven

Non-goal-driven conversational systems are intended to provide chit-chat conversation between a human and a machine, without a dedicated task response at the end. Many systems have been built for this purpose, such as language learning systems. These non-goal-driven systems use an encoder-decoder for language understanding and language generation, and a policy manager that embeds words from a pre-trained knowledge base or dataset.

Most non-goal-driven systems use open-domain datasets drawn from several conversation sources, such as Twitter conversations, movie subtitle databases, and movie recommendation databases. These datasets contain human-human conversations that are mapped into parameters and word vectors. The word vectors are used as embeddings in the decoder module of the language generator to generate an answer related to the user's input.

c. Dialogue Systems Component

The most common dialogue system components are Language Understanding, the Dialogue State Tracker, the Policy Manager, and the Language Generator. Another key element of dialogue systems is the knowledge base. Both goal-driven and non-goal-driven systems use the knowledge base as the source of conversation models and task options.

i. Language Understanding

Language Understanding (LU) is a sub-topic of Natural Language Processing and a major branch of the Artificial Intelligence field. It can be described as computer programs that read, understand, and analyze natural language input, whether informal, colloquial, or even (slightly) ungrammatical. Most of the related articles use neural network models as the language understanding module, such as the Sequence to Sequence (Seq2Seq) Recurrent Neural Network model (Li, et al. 2016a), the Hierarchical Recurrent Encoder-Decoder (HRED) neural network model (Serban, et al. 2016), and Memory Networks (Dodge, et al. 2015).

The Seq2Seq model was introduced by Sutskever as an end-to-end approach for language learning (Sutskever, et al. 2014). The Seq2Seq model uses a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) as an encoder that reads the input sequence. To generate an output from the encoder's representation, a second RNN with LSTM is used as a decoder that predicts a compatible output sequence.
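
The encoder-decoder data flow can be sketched as a forward pass with untrained, random weights. In the usual formulation, the encoder reads the input tokens into a hidden state and the decoder generates output tokens from that state. This sketch makes several assumptions for brevity: a plain tanh RNN cell replaces the LSTM, the six-word vocabulary and the hidden size are arbitrary, and `<eos>` doubles as the start token. Because nothing is trained, the generated reply is not meaningful; only the structure is.

```python
import math
import random

random.seed(0)
VOCAB = ["<eos>", "hello", "how", "are", "you", "fine"]  # toy vocabulary
HIDDEN = 8                                               # arbitrary state size


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_step(w_in, w_rec, x, h):
    """One plain RNN update: h' = tanh(W_in x + W_rec h)."""
    a, b = matvec(w_in, x), matvec(w_rec, h)
    return [math.tanh(p + q) for p, q in zip(a, b)]


EMBED = rand_matrix(len(VOCAB), HIDDEN)   # one vector per token
ENC_IN, ENC_REC = rand_matrix(HIDDEN, HIDDEN), rand_matrix(HIDDEN, HIDDEN)
DEC_IN, DEC_REC = rand_matrix(HIDDEN, HIDDEN), rand_matrix(HIDDEN, HIDDEN)
OUT = rand_matrix(len(VOCAB), HIDDEN)     # hidden state -> vocabulary scores


def encode(tokens):
    """Encoder: read the input sequence and return the final hidden state."""
    h = [0.0] * HIDDEN
    for tok in tokens:
        h = rnn_step(ENC_IN, ENC_REC, EMBED[VOCAB.index(tok)], h)
    return h


def decode(h, max_len=5):
    """Decoder: generate greedily from the encoder state until <eos> or max_len."""
    out, prev = [], VOCAB.index("<eos>")  # <eos> is reused as the start token
    for _ in range(max_len):
        h = rnn_step(DEC_IN, DEC_REC, EMBED[prev], h)
        scores = matvec(OUT, h)
        prev = scores.index(max(scores))  # greedy choice of the next token
        if VOCAB[prev] == "<eos>":
            break
        out.append(VOCAB[prev])
    return out


reply = decode(encode(["how", "are", "you"]))
```

The only link between the two networks is the hidden state returned by `encode`, which is why the model is described as end-to-end: no hand-built intermediate representation sits between input and output.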

In non-goal-driven dialogue systems, an end-to-end approach using Seq2Seq with LSTM or the HRED neural network is preferred for Language Understanding, because non-goal-driven dialogue systems do not need to track long-term dialogue state.

ii. Dialogue State Tracker

The next component in a conversational system is the Dialogue State Tracker. A Dialogue State Tracker tracks the position in the conversation and passes related parameters, drawn from the current question and previous questions, to the dialogue policy.

A Recurrent Neural Network (RNN) model is used to memorize the dialogue history and act as a memory unit. Several variants of RNN have been used as a memory unit, such as Long Short-Term Memory and the Gated Recurrent Unit. Moreover, a more complex neural network model, or deep neural network, can also be used as a dialogue state tracker (Henderson, et al. 2013). The deep neural network approach uses a neural network algorithm combined with Long Short-Term Memory to memorize every task or question asked by a human, so that it can be used to build related responses.
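
How a recurrent memory unit folds the dialogue history into one state vector can be sketched with a single Gated Recurrent Unit cell. This is an illustration under assumptions, not the tracker of any cited article: the weights are random rather than learned, the state size of 4 is arbitrary, and each turn is represented by a made-up feature vector.

```python
import math
import random

random.seed(1)
SIZE = 4  # arbitrary size for both turn features and state


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


# Input weights (W) and recurrent weights (U) for the update gate "z",
# the reset gate "r", and the candidate state "h". Random here; a real
# tracker would learn them from annotated dialogues.
W = {k: rand_matrix(SIZE, SIZE) for k in ("z", "r", "h")}
U = {k: rand_matrix(SIZE, SIZE) for k in ("z", "r", "h")}


def gru_step(x, h):
    """One GRU update of the dialogue state h given turn features x."""
    z = [sigmoid(a + b) for a, b in zip(matvec(W["z"], x), matvec(U["z"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(W["r"], x), matvec(U["r"], h))]
    gated = [ri * hi for ri, hi in zip(r, h)]  # reset gate filters the old state
    cand = [math.tanh(a + b)
            for a, b in zip(matvec(W["h"], x), matvec(U["h"], gated))]
    # The update gate decides how much of the old state the candidate replaces.
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def track_dialogue(turns):
    """Fold a list of turn feature vectors into one state vector."""
    h = [0.0] * SIZE
    for x in turns:
        h = gru_step(x, h)
    return h


state = track_dialogue([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])
```

The resulting `state` summarizes all three turns in a fixed-size vector, which is what a policy manager would receive as its parameters.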

iii. Policy Manager

After a conversation is observed, information is extracted from the conversation as parameters for generating related responses. The policy manager module in a dialogue system acts as a task or response selector based on the pre-trained knowledge base. Policy managers use certain algorithms to fetch or match related answers from the knowledge base. With neural network techniques, several algorithms are used for the policy manager, such as Reinforcement Learning (Policy Gradient), Supervised Learning (Stochastic Gradient Descent), or Generative models (n-grams, Hidden Markov Model).
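
As one concrete instance of the generative models named above, a bigram (2-gram) model estimates the probability of the next word from counts of adjacent word pairs and then samples a response word by word. The sketch below assumes a three-sentence toy corpus and hypothetical `<s>` and `</s>` boundary markers; it is far simpler than the neural generative models in the reviewed articles.

```python
import random
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for every word, which words follow it in the corpus."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows


def next_word_probs(follows, prev):
    """Maximum-likelihood estimate of P(next word | previous word)."""
    total = sum(follows[prev].values())
    return {word: count / total for word, count in follows[prev].items()}


def sample_sentence(follows, rng, max_len=10):
    """Sample one word at a time until </s> is drawn or max_len is reached."""
    out, prev = [], "<s>"
    for _ in range(max_len):
        probs = next_word_probs(follows, prev)
        prev = rng.choices(list(probs), weights=list(probs.values()))[0]
        if prev == "</s>":
            break
        out.append(prev)
    return " ".join(out)


corpus = ["how are you", "how are things", "are you there"]  # toy data
model = train_bigrams(corpus)
probs = next_word_probs(model, "are")  # "you" twice, "things" once
```

In this corpus the word "are" is followed by "you" in two sentences and by "things" in one, so the model assigns them probabilities of 2/3 and 1/3.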

In goal-driven dialogue systems, reinforcement learning is used to replace handcrafted task selectors. Reinforcement learning allows the policy manager to explore possible tasks from the knowledge base (Dhingra, et al. 2016). In contrast, in non-goal-driven dialogue systems, generative models are used to predict related responses. In combination with supervised learning, generative models can be trained to generate responses as close as possible to a human response (Serban, et al. 2016; Li, et al. 2016a).
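
The idea of learning a task selector by reinforcement, instead of handcrafting it, can be sketched with the REINFORCE policy gradient update on a one-step problem. Everything specific here is an assumption made for illustration: the three task names, the reward of 1 for choosing the user's goal, the learning rate, and the absence of any dialogue state. The cited systems learn over multi-turn dialogues with neural policies.

```python
import math
import random

TASKS = ["book_table", "book_ticket", "recommend_movie"]  # hypothetical tasks


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(goal, episodes=500, lr=0.1, seed=0):
    """REINFORCE on a one-step task: reward 1 when the sampled task is the goal."""
    rng = random.Random(seed)
    prefs = [0.0] * len(TASKS)  # policy parameters, one preference per task
    for _ in range(episodes):
        probs = softmax(prefs)
        action = rng.choices(range(len(TASKS)), weights=probs)[0]  # explore
        reward = 1.0 if TASKS[action] == goal else 0.0
        # For a softmax policy, d log pi(action) / d prefs[k]
        # equals 1[k == action] - probs[k].
        for k in range(len(TASKS)):
            grad = (1.0 if k == action else 0.0) - probs[k]
            prefs[k] += lr * reward * grad
    return softmax(prefs)


policy = train_policy("book_ticket")
best = TASKS[policy.index(max(policy))]
```

No rule in the code states which task is correct; the preference for the rewarded task grows only because sampled actions that earn a reward are made more probable.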

iv. Language Generator

Language generation is the process by which thought is rendered into language. The main role of the language generator module in a dialogue system is to construct a natural language response from an action generated by the policy manager. This module maintains the generated action values and adds word embeddings from the trained dataset to produce a natural response. In most end-to-end approaches to dialogue systems, the same method used for language understanding is used again as a decoder module to generate the language output, such as the Seq2Seq neural network model, the Hierarchical Recurrent Encoder-Decoder (Serban, et al. 2016), and Memory Networks (Dodge, et al. 2015).

The Hierarchical Recurrent Encoder-Decoder (HRED) neural network uses a hierarchical RNN to map sentences into time slots and track the dialogue state (Serban, et al. 2016). This Recurrent Neural Network uses Gated Recurrent Unit (GRU) memory cells in both the encoder and decoder modules.

On the other hand, Memory Networks offer RNN models that can achieve better results and memorize long-term data by using multiple computational hops (Sukhbaatar, et al. 2015). Because of this long-term memory capability, Memory Networks fit goal-driven dialogue systems well.
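
A computational hop can be sketched as one round of attention over stored memories followed by an update of the query. The sketch below is a simplification under stated assumptions: it reuses the same vectors for addressing and for read-out, whereas the end-to-end model of Sukhbaatar, et al. (2015) learns separate input and output embeddings, and the 3-dimensional memory vectors are made up.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def memory_hop(query, memories):
    """One hop: attend over the memories, then add the read-out to the query."""
    weights = softmax([dot(query, m) for m in memories])
    readout = [sum(w * m[i] for w, m in zip(weights, memories))
               for i in range(len(query))]
    return [q + o for q, o in zip(query, readout)], weights


def read_memory(query, memories, hops=3):
    """Apply several hops so that later reads build on earlier ones."""
    weights = []
    for _ in range(hops):
        query, weights = memory_hop(query, memories)
    return query, weights


# Hypothetical embeddings of three earlier dialogue turns.
memories = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [0.0, 2.0, 0.0]  # most similar to the second memory
final_query, attention = read_memory(query, memories)
```

Because every stored turn stays available at every hop, the length of the dialogue history does not have to be compressed into a single recurrent state, which is the long-term memory property referred to above.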

IV. Results

End-to-end non-goal-driven dialogue systems can use a Recurrent Neural Network as the encoder and decoder for language understanding and language generation. Non-goal-driven dialogue systems do not need to maintain a long dialogue history, so a short-term memory cell can be used to store the input and predict a response using generative models. On the other hand, goal-driven dialogue systems need a long-term memory cell to maintain a longer dialogue history as parameters for the task selector. Memory Networks, which offer long-term memory, can be used as the encoder and decoder modules for language understanding and language generation.

Reinforcement Learning, Supervised Learning, and Generative models can be used to predict the next response and to act as the task selector for a given input.

V. Discussion

End-to-end goal-driven dialogue systems could combine a Recurrent Neural Network encoder-decoder with reinforcement learning, which could improve the task selector and eliminate the handcrafted policy manager. In the future, the artificial brain for dialogue systems could be composed of an advanced neural network. Deep learning, a machine learning technique that uses very complex neural networks, could be implemented to simulate the artificial brain for dialogue systems.

VI. Conclusion

This literature review gives a better understanding of how conversational systems work and explains several models, with their improvements, that make a conversational agent respond naturally to a given question. Some gaps and future work are also identified from several articles, pointing to various methods for enhancing the abilities of conversational agents.

As this review describes, dialogue systems are the technology behind virtual assistants. The framework of a dialogue system is determined by its purpose. By purpose, dialogue systems are divided into goal-driven systems, such as a banking assistant or ticket reservation agent, and non-goal-driven systems, such as language learning tools.

Goal-driven systems need a complicated design that includes an encoder-decoder, a dialogue state tracker, and a policy manager. A Recurrent Neural Network encoder-decoder with long-term memory, such as Memory Networks, and a Policy Manager with a Reinforcement Learning algorithm or Generative models could be used as a complicated task selector that can closely mimic human responses.

On the other hand, non-goal-driven systems do not need a complicated design. Non-goal-driven systems can be built using a Recurrent Neural Network as the encoder-decoder and as the policy manager for predicting the best responses.


VII. References

Dhingra, B., et al. (2016). End-to-End Reinforcement Learning of Dialogue Agents for

Information Access. arXiv:1609.00777 [Cs].

Dodge, J., et al. (2015). Evaluating Prerequisite Qualities for Learning End-to-End

Dialog Systems.

Henderson, M., Thomson, B., & Young, S. (2013). Deep Neural Network Approach for

the Dialog State Tracking Challenge.

Li, J., et al. (2016a). A Persona-Based Neural Conversation Model.

Li, J., et al. (2016b). Deep Reinforcement Learning for Dialogue Generation.

arXiv:1606.01541 [Cs].

Serban, I. V., et al. (2016). Building End-to-end Dialogue Systems Using Generative

Hierarchical Neural Network Models. In Proceedings of the Thirtieth AAAI

Conference on Artificial Intelligence (pp. 3776–3783). Phoenix, Arizona: AAAI

Press.

Sukhbaatar, S., et al. (2015). End-To-End Memory Networks. arXiv:1503.08895 [Cs].

Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to Sequence Learning with

Neural Networks. arXiv:1409.3215 [Cs].
