
CIS 496 / EAS 499 Senior Project

Project Proposal Specification


Instructors: Norman I. Badler and Joseph T. Kider, Jr.

A Cognitive Model for an Embodied Conversational Agent


Ian Perera

Advisor: Mitchell P. Marcus

University of Pennsylvania

ABSTRACT

Intelligent Virtual Agent cognitive models often use a series of abstractions to split different tasks into manageable and solvable problems. For example, language is translated from a sentence to a parse tree, and then to a semantic representation. The semantic representation is then used with a knowledge base to transform the semantics into a temporal logic, and the logic is then transformed into statements which can be evaluated. However, such a pipeline has limitations: each of the constituent parts could aid the others in handling pronoun reference, disambiguation, prepositions, and pragmatics, yet they are kept separate in a pipeline model.

I propose a cognitive model that is a cross between a semantic spreading activation network and a finite state machine, embodied in a virtual world by means of callback functions expressed as nodes in the network. Each node in this network represents a concept that is mapped to other nodes with a relationship. This system allows conceptual relationships found in a semantic network to coexist with, and fill in the information needed for, the functional callback nodes associated with particular actions. Gates are used to control shortest-path and spreading activation calculations when nodes are queried. Learning can take place through the addition of connections, either from language input or through automatic learning (such as Long-Term Potentiation: adding connections between nodes that activate together). The FSM aspect is used to model sequences of actions while maintaining conceptual information at each step of the process.

Project Blog: http://virtuallyembodied.blogspot.com

1. INTRODUCTION

Intelligent Virtual Agent cognitive models often use a series of abstractions to split different tasks into manageable and solvable problems. For example, language is translated from a sentence to a parse tree, and then to a semantic representation. The semantic representation is then used with a knowledge base to transform the semantics into a logic, and then the logic is transformed into statements which can be evaluated. However, this design poses a few problems. First, it is often the case that each of these parts can only be maintained and developed by experts in the respective field: computational linguists are required to change syntactic and semantic parsing, logicians are needed to change and interpret logic representations, and programmers are required to change the controller behavior to add functionality or adapt to a new domain. Second, although splitting these pieces into separate parts makes the task easier, it also neglects information that could be gleaned from outside processes. For example, the knowledge base could be used to reject parses that are unintelligible, or multiple parses could be considered to determine the most favorable action for the agent. Responses from the controller could also be used to update the knowledge base or provide a context based on information from the environment. Finally, such a system is not attuned to learning actions: since each process is a fairly mechanical transformation from input to output, there are no structures in place that can be used for induction.

I propose a system that addresses these weaknesses by unifying the semantic, conceptual, and experiential aspects of virtual agents. The core of this system can be seen as a combination of a spreading activation network (for learning and action determination), a semantic network (for defining concepts hierarchically), and a finite state machine (to store and execute sequences of actions). The system is a network of nodes, where each node represents a concept that is mapped to other nodes with a relationship. Nodes can also consist of callbacks to written code, which would be executed depending on the desired action of the node. This system allows conceptual relationships found in a semantic network to coexist with, and fill in the information needed for, the functional nodes associated with particular actions.

1.1 Design Goals

The goal of this project is to create a cognitive model that could be used for an embodied conversational agent, but could also be useful in other artificial intelligence tasks that can utilize both symbolic and instance-based data, such as computer vision and speech recognition. I plan to create an interface for easily adding symbolic knowledge to the system, as well as a test interface consisting of a small virtual environment and a terminal for giving text commands to the agent using this model.

1.2 Project Features and Functionality

This project will be able to be implemented in a virtual environment through callback functions. It will be



controlled through natural language input and react to the environment. Furthermore, it will learn both through natural language and by inference over observations in the virtual world.

2. RELATED WORK

Various artificial intelligence applications contain pieces applicable to this design. The concept of conversing with an agent in a virtual environment stems from a motivation towards embodied AI, as well as from practical applications for an embodied conversational agent, such as training simulations and natural language interfaces for robots. Spreading Activation Networks are a natural way of achieving this goal, combining a model of cognition with task planning in a virtual or robotic environment.

2.1 Embodied AI

Although the motivation behind this system is primarily to design agents that can provide a human-like interface in virtual worlds, it is also motivated by a desire to experiment with the theory of "embodied" artificial intelligence. [Chr03] explains the embodied AI argument as follows: the assumption that there are global semantic truths about the world that could be stored in a knowledge representation is flawed; instead, meaning is stored in a way that is unique to our experiences. Therefore, rather than trying to create conversational agents that are disembodied programs, we must give them an environment and sensory facilities to form their perceptions about the world. Work in embodied AI is typically done either in robotics or in virtual environments. I will focus on the latter in evaluating past efforts.

2.2 Virtual Embodied Agents

[Win71] details the embodied conversational agent SHRDLU, which was one of the first examples of a natural language interface for a virtual agent. The agent was placed in a limited environment of variously shaped blocks, and could manipulate and describe them according to the user's input. My model should provide similar commands to the agent and generalize to different environments.

[Bur09] demonstrates current work in evaluating the advantages of embodiment in a virtual world such as Second Life, where users can interact with the agent at any time. This provides a way for users to experience human-like interactions without the need for actual human users. [ISB11] assert that agents that demonstrate environmental awareness are judged more convincing in conversation than those without such awareness. They implement a conversation system that integrates environmental knowledge in a limited domain, but more importantly, show through user studies that environmental awareness does improve a user's evaluation of the "humanness" of a virtual agent. This study gives credence to the embodied AI theory and highlights relevant applications of embodied conversational agents.

2.3 Spreading Activation Networks

[CL75] explain and expand upon the Spreading Activation Theory of Semantic Processing first proposed by Collins. Spreading Activation Theory suggests how one could implement cognition in a computer program, while also accounting for certain psychological phenomena such as priming. The base structure for a Spreading Activation Network is typically a semantic network, a hierarchical knowledge representation that stores concepts in terms of their relationships to each other. Semantic networks were first introduced in [Col69] as a cognitive model to explain semantic retrieval times from psychological studies. However, their hierarchical representation is useful for generalizing and comparing concepts based on their similarity, and they are therefore used as knowledge representations in many applications, such as ConceptNet [LS04]. ConceptNet also generates semantic information in its network based on language input from users.

Given a base network, spreading activation starts with one or more "source" nodes. Nodes connected to a source node receive activation with some decay rate. This spreading continues until the activation for the next set of nodes falls below a certain threshold. Spreading activation in task planning is done by performing two activations simultaneously: one for the current state of the environment, which activates possible actions, and one for the desired state of the environment, which activates the actions that will satisfy the desired goal. The path with the greatest activation from both sources describes the sequence of actions required to reach the goal.

[BBK00] demonstrate the use of a spreading activation network for an agent architecture that carries out tasks even under changing environmental conditions. They describe a case study with a robotic feeding arm, but the principles are general enough to apply to any embodied agent.

[Bac03] describes a virtual agent architecture based on a spreading activation network, with additional components to handle virtual inputs and behavior. However, this architecture does not include a natural language interface. While it shares many similarities with my model, my model treats language as the core of the conceptual model, allowing both learning and user querying of the agent's plans and beliefs.

[Mae90] describes an architecture that uses an activation/inhibition network for goal-directed behavior. However, it also does not include a language component, and I plan to extend the concepts of activation and inhibition by implementing a simulated Long-Term Potentiation model to facilitate structural learning in the semantic spreading activation network.

3. PROJECT PROPOSAL

There are multiple components to this project, some of which are integrated less tightly than others. The semantic spreading activation network forms the core of the system and is therefore the first priority. The language interface is second, and the embodiment in the virtual environment is third (it can be substituted with a text environment in the case of time constraints). Once these essential parts are implemented, additions and improvements can be made in any order.
© SIG Center for Computer Graphics 2010.



3.1 Anticipated Design

3.1.1 Knowledge Representation (KR)

Knowledge in this agent system will be represented in terms of a semantic network. Each node in the semantic network represents a concept, a gate, or a function callback. Concepts are actions, ideas, and objects that comprise the agent's beliefs and environment, such as tree, run, and color. Gates modify the spreading activation or shortest-path calculations used in reasoning and goal planning; examples include a logical OR, a logical AND, a delay, and an accumulator. Finally, function callbacks take function arguments from the spreading activation and trigger if the activation is high enough.

Edges between nodes represent both explicit and implicit relationships. Explicit relationships are specified by the KR designer or generated based on textual input. Implicit relationships are caused by concurrent activation of stimuli in the environment, and simulate Long-Term Potentiation, the Hebbian principle that "neurons that fire together, wire together".

3.1.2 Memory Model

The memory model for this architecture consists of three parts, all of which use the same semantic spreading activation network model. Long-term knowledge contains general relationships between concepts and actions that the agent is familiar with. Working memory contains information about the current environment, and is used for fulfilling goals in the environment and forming new relationships based on environmental stimuli. Abstract memory contains possible theoretical scenarios that could be evaluated to optimize the path towards a goal.

3.1.3 Parsing

First, the input string is determined to be a command, a query, or a fact. Commands are either direct calls to actions or goals that the agent should try to achieve. Queries are questions that the agent will respond to by traversing its KR. Facts are additions to be made to the KR.

A syntactic parser will then take the input string and convert it into a parse tree, labeling the syntactic structure. From this tree, I will split clauses and resolve them separately. Coordinating (e.g. and, for) and subordinating (e.g. while, before, after) conjunctions are used to form relationships between the clauses, such as preconditions or concurrency requirements.

3.1.4 Traversal Process

3.1.4.1 Commands

First, find the node corresponding to the requested action. Then, determine whether any further information is needed from the user. If the action has preconditions, initiate activation at the action node and at the environment root node (the agent itself) to determine the shortest-path sequence of actions needed to complete the action.

3.1.4.2 Queries and Facts

Determine the object or concept in question, then say or modify the attribute that was requested.

3.1.5 Grounding

Nodes will be grounded with information from the environment in various ways. Code grounding links an action with a callback function to execute prewritten code. Mathematical grounding would use gate nodes to compare values of attributes within the network. Verbal grounding links a particular attribute with a phrase that could be output, or a phrase frame that could be output with various attributes filled in.

3.1.6 Code Grounding

Code grounding provides developers with the ability to use existing code for interacting with a virtual environment or any other program. By mapping concepts and attributes to functions and their parameters, one can embed this model into any program to provide a natural language interface. Code will be written in C++ and will utilize the same return types and parameters as a regular C++ program.

Different concepts can be linked to the same function but with different parameters. For example, the concepts "walk" and "run" might be linked to the same move(vec2) function, but with different magnitudes. Functions embedded in the network can also return objects to be used by other functions in the network. For example, one function may return a path between two points, and another function would take that path as an input to move the agent along it.

3.2 Anticipated Approach

The first step in this project will be establishing a method of storing the knowledge representation. I may use a third-party tool for initial implementations, but tools such as Protégé would not provide all of the functionality I would like in designing my network, such as viewing implicit connections and connections created during real-time interactions. For my initial work I can hard-code test KRs, and migrate to a more robust solution given enough time.

Next, I will implement the spreading activation algorithm in the network. The spreading activation algorithm will allow me to specify commands, queries, and facts as if they had already been parsed into node references, in order to test the system. From here I will develop a basic parsing system that performs a simple analysis on input strings, holding the user to certain constraints. Robust language processing will be deferred until later in the project.

Code grounding will be the final essential part of the architecture. The "callback" system will simply be calls to predefined methods for demonstration purposes.

Once the above essential parts are complete, I will work on integrating the agent into a simple virtual test environment, along with various tests to evaluate the functionality of the agent. Once the agent is working in the virtual environment, I will implement more robust learning, parsing, and knowledge, in that order.
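To illustrate the code grounding of Section 3.1.6, the sketch below binds the concepts "walk" and "run" to a single shared move() function with different speed magnitudes. All names and values here (vec2, Agent, the speeds) are hypothetical stand-ins for illustration, not actual project code.

```cpp
#include <functional>
#include <map>
#include <string>

struct vec2 { double x = 0, y = 0; };
struct Agent { vec2 position; };

// Shared grounded function: both "walk" and "run" resolve to this.
void move(Agent& agent, vec2 direction, double speed) {
    agent.position.x += direction.x * speed;
    agent.position.y += direction.y * speed;
}

// Grounding table: each concept binds the shared function with its own
// magnitude, so the network only needs to supply a direction.
std::map<std::string, std::function<void(Agent&, vec2)>> makeGroundings() {
    return {
        {"walk", [](Agent& a, vec2 d) { move(a, d, 1.0); }},
        {"run",  [](Agent& a, vec2 d) { move(a, d, 3.0); }},
    };
}
```

In the full system, the callback node would receive its direction argument from the spreading activation rather than from a direct call, but the binding pattern is the same.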



3.3 Target Platforms

This system will run on any machine capable of executing C++ code that has the graphics libraries required for the virtual environment.

3.4 Evaluation Criteria

To evaluate the success of the project, I will devise various scenarios for the agent to complete in the virtual environment, ranging from simple tests to more complex experiments. A simple test may include a few simple objects with certain actions defined for them. This environment would also allow me to test actions and plans defined through the language input. One example would be: "Pick up the green box." This requires the agent to choose the green box among other objects, find a path to it, move along that path, and finally execute the code associated with picking an object up.

Variations on this scenario could illustrate other aspects of the system. For example, perhaps the agent does not know that it has to be touching the object to pick it up. The agent would then say that it was unable to complete the request, and the user could specify, "You must be touching an object to pick it up." This input fact would add a precondition to the "pick up" action and would then allow the agent to formulate a new plan that includes touching the object before picking it up.

More complicated scenarios could test problem-solving and observation skills. For example, perhaps the green box is behind a closed blue door, and there are differently colored switches within reach of the agent. Given the same command, "Pick up the green box", the agent would initially be unable to form a path to the box. The agent would determine that the closed door is an obstacle to the path, and would then either ask the user how to open it or try various actions to see which would clear the path.

4. RESEARCH TIMELINE

Project Milestone Report (Alpha Version)

• Completed all background reading
• Knowledge representation functions with spreading activation
• Simple example of language input triggering a function tied to the KR
• Virtual environment example (time permitting)

Project Milestone Report (Beta Version)

• More robust natural language input
• Additions to the KR through natural language input
• Natural language responses
• Virtual environment example

Project Final Deliverables

• Additional improvements to natural language
• More complicated actions specified and tested
• Implicit learning through observation (time permitting)

Project Future Tasks

• Emotion modeling through gates and emotional spreading activation
• Improved natural language input
• Feature vector learning for sensory input

5. METHOD

6. RESULTS

7. CONCLUSIONS and FUTURE WORK

References

[Bac03] Bach, J. The MicroPsi Agent Architecture. 2003.
[BBK00] Bagchi, S.; Biswas, G. & Kawamura, K. Task planning under uncertainty using a spreading activation network. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 2000, 30, 639-650.
[Bur09] Burden, D. J. Deploying embodied AI into virtual worlds. Knowledge-Based Systems, 2009, 22, 540-544.
[Chr03] Chrisley, R. Embodied artificial intelligence. Artificial Intelligence, Elsevier Science Publishers Ltd., 2003, 149, 131-150.
[CL75] Collins, A. M. & Loftus, E. F. A spreading activation theory of semantic processing. Psychological Review, 1975, 82, 407-428.
[Col69] Collins, A. M. & Quillian, M. R. Retrieval time from semantic memory. Journal of Verbal Learning and Verbal Behavior, 1969, 8, 240-247.
[ISB11] Ijaz, K.; Simoff, S. & Bogdanovych, A. Enhancing the Believability of Embodied Conversational Agents through Environment-, Self- and Interaction-Awareness. Proceedings of the Thirty-Fourth Australasian Computer Science Conference (ACSC 2011), 2011.
[LS04] Liu, H. & Singh, P. ConceptNet: A Practical Commonsense Reasoning Tool-Kit. BT Technology Journal, Springer Netherlands, 2004, 22, 211-226.
[Mae90] Maes, P. Situated agents can have goals. Robotics and Autonomous Systems, North-Holland Publishing Co., 1990, 6, 49-70.
[Win71] Winograd, T. Procedures as a Representation for Data in a Computer Program for Understanding Natural Language. MIT AI Technical Report 235, 1971.

CIS 496/EAS499 Senior Design Project


User input: "Jump."

Parse: (S (NP (-NONE- *) <no subject specified, assumed to be the agent>)
          (VP (VBP Jump)))

Find the verb "jump" in the database and retrieve its node:

[Node diagram: the Jump node is linked Is-A to Movement; it carries a Direction attribute (e.g. Up) and a Clearance precondition formed by an AND gate over "zero objects nearby" in the direction of movement; parameter substitution fills the Execute callback:
bool jump(agent, direction, speed) { agent.velocity += direction * speed; }]

Figure 1: Example KR chunk representing the "jump" action and its relationship to the "movement" concept
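The Execute callback in Figure 1 could be realized roughly as follows. This is a sketch: the Agent and vec2 types and the clearance check are assumptions extrapolated from the figure, not actual project code.

```cpp
struct vec2 {
    double x = 0, y = 0;
    vec2 operator*(double s) const { return {x * s, y * s}; }
    vec2 operator+(const vec2& o) const { return {x + o.x, y + o.y}; }
};

struct Agent {
    vec2 velocity;
    bool clear_above = true;  // stands in for the "clearance" precondition
};

// Grounded callback for the "jump" node: applies the impulse only when
// the precondition (clearance in the jump direction) holds, and reports
// success back to the network.
bool jump(Agent& agent, vec2 direction, double speed) {
    if (!agent.clear_above) return false;  // precondition gate failed
    agent.velocity = agent.velocity + direction * speed;
    return true;
}
```

The boolean return value lets the network distinguish a completed action from one whose precondition failed, which is what drives the "unable to complete the request" responses described in Section 3.4.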




[Gantt chart with weekly columns from 24 Jan to 2 May and task rows: KR Design; Spreading Activation Algorithm; Natural Language Input; Code Grounding; Embody Agent in Virtual Environment; Design/Implement Test Scenarios; Improvements]

Figure 2: Proposed schedule

