Abstract— We examined whether entertainment robots should use emotions. In an experiment, we presented jokes to participants to find out whether different emotions have different effects on their pleasure. We found that emotions do have an impact on users' perceptions when using entertainment robots.

I. INTRODUCTION

Robots today have a growing importance in many aspects of our lives. They perform tasks of great diversity, being employed as lawn-mowing robots, toy robots, or recently even as maintenance and monitoring robots. Robots interact with their users during many of these activities. Interaction and communication should be as effortless as possible; this helps to increase acceptability and to speed up the diffusion process. Interaction should be human-like, because every user already knows how to interact in this way, which makes the robot simple to use. Human interaction is often based on speech. Still, the question remains: will we ever be able to communicate with a computer on the same level as we communicate with other humans? This goal remains of great importance, and advancing speech production in robotic systems remains an important research area. As it is not yet possible to produce convincingly natural-sounding language for an unrestricted speech domain, there are only a few approaches that study the influence of different emotions on speech production.

During a student project on the integration of robots into everyday life, a web-service-based system for the entertainment of inhabitants was developed. The robot, a Sony AIBO [1], was used to deliver a broad variety of entertainment services. Many of these services use speech to communicate with the user.

The use of synthesized speech, and the influence of synthesized emotions on the listener, has already been analyzed in several studies. In [2], the authors conclude that the influence of emotions is less pronounced with synthetic speech than with recorded speech. Nevertheless, [2] points out that a synthetic voice with happy emotions leads to a positive perception of the spoken text, whereas sad emotions make the text less interesting. Appropriate emotions engender a feeling of attraction to the text; emotions that are in opposition to the text support its credibility. Another aspect is examined in [3], which studies the influence of consistency in a speech-controlled human-computer interface. The authors conclude that consistency is an important factor: synthetic speech and real speech should not be mixed. Though mixing was pleasant for the participants, their efficiency on the given task decreased.

Another study [4] examines the influence of the consistency of speech (recorded plus synthetic vs. purely synthetic). The influence on jokes is considered in [3] and [5]. Like [3], the authors conclude that consistency of the produced speech is preferred: the tested participants favored a joke system with a purely synthetic voice over a system with changing acoustic output. A study about the influence of emotions in computer-generated speech is [6]. It describes an experiment in which reviews were read with computer-generated speech whose personality either matched or differed from that of the test person. The result is that a matching personality has a positive impact on the reception of a review.

This paper presents the results of our experiment on the influence of emotions in computer-generated speech. In the experiment, we used the AIBO robot to communicate with users. The AIBO reads emails and tells stories or jokes on demand. These stories and jokes are retrieved from the WWW via a free Web service. Speech production is accomplished by the open-source text-to-speech (TTS) system MARY [7]; the general possibilities of TTS systems are described and discussed in [8]. The MARY TTS system offers the possibility to configure parameters controlling the emotion of the produced speech. Therefore, our interest in this project was to control these emotion parameters. A summary of parameters usable for the emotional enhancement of synthetic speech is given in [9].

In the following section, we describe our experiment and procedures in detail. This is followed by a presentation of our collected data, which we then discuss; we conclude the paper with an outlook on future developments.

*The authors are listed in alphabetical order. All authors are members of the Chair of Agent Technologie (AOT), Technische Universität Berlin, Germany. olaf.kroll-peters@dai-labor.de, floorice@cs.tu-berlin.de, ugurs@cs.tu-berlin.de, stein-uni@web.de, mathias.wilhelm@dai-labor.de.

II. EXPERIMENT

We wanted to study the influence of emotions on the perception of synthetic speech. Therefore, we let AIBO robots communicate with diverse users. No participant had had any real contact with the AIBO robot system before.

The experiment was executed with each participant individually. Each participant sat at a table on which the AIBO was placed. Additionally, two supervisors were present who guided the experiment and answered questions. The participants were told that the experiment was done in the context of a student project examining the interaction between humans and robots.

Prior to each experiment, the demographic data of each participant (age, gender, native speaker) were recorded.
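The emotion parameters mentioned above can be illustrated with a minimal sketch. A common approach, surveyed in [9], describes an emotion by dimensions such as activation and evaluation and maps them onto prosody settings like pitch and speech rate. The mapping below is a simplified illustration under assumed dimension values and prosody rules; it is not MARY's actual configuration, and the parameter names are hypothetical.

```python
# Illustrative sketch: map a target emotion to prosody parameters of the
# kind an emotional TTS system exposes. Dimension values and prosody
# rules are simplified assumptions, not MARY's actual rule set.

# Emotion dimensions (activation, evaluation), roughly following the
# dimensional descriptions surveyed in [9].
EMOTION_DIMENSIONS = {
    "happy":   (0.8,  0.8),
    "sad":     (-0.6, -0.6),
    "neutral": (0.0,  0.0),
}

def prosody_for(emotion: str) -> dict:
    """Translate an emotion label into hypothetical prosody settings.

    High activation raises pitch and speech rate; positive evaluation
    slightly raises volume.
    """
    activation, evaluation = EMOTION_DIMENSIONS[emotion]
    return {
        "pitch_shift_percent": round(20 * activation),    # happy speech: higher pitch
        "rate_factor": round(1.0 + 0.3 * activation, 2),  # faster when aroused
        "volume_factor": round(1.0 + 0.1 * evaluation, 2),
    }

if __name__ == "__main__":
    for emo in ("happy", "sad", "neutral"):
        print(emo, prosody_for(emo))
```

In the real system, such settings would be handed to the TTS engine's configuration before synthesis; MARY's actual emotion interface differs in detail.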
TABLE II
WAS CONTENT UNDERSTANDABLE?

TABLE III
WAS CONTENT FUNNY?
Fig. 5. Perceived emotions

IV. QUESTIONNAIRE

Every participant filled out a questionnaire in addition to the experiment described above. This was intended to record the thoughts of the participants in relation to the experiment and their judgment about the use of entertainment robots. Overall, one could notice that the participants had fun during the experiment. At the beginning, their facial expressions were rather sceptical, but this diminished as the experiment proceeded; everybody smiled at some point during the experiment. The question regarding the idea of robots with emotionalized speech was answered with "good" by most of the participants; "interesting" was also given multiple times. Just two of the twenty-four participants rated the idea as irrelevant. The reaction of the participants to the movements of the robot prior to the narration of a joke was overall positive. Just three of the participants rated the movements as irritating and distracting; on the other hand, there were multiple praises for the movements and suggestions for additional moves.

V. CONCLUSION AND FUTURE WORK

Our experiment was set up to answer the question: How important is emotional speech in the context of entertainment robots? Our participants stated that emotional speech is preferred for communication with a robot. A change in the parameters that control the emotions led to partly better comprehension. The answer to our question is therefore: very important. At the same time, our tests revealed that many improvements are still needed to reproduce emotions with synthetic speech in a correct and sufficient manner. Because of this, we will put additional effort into this issue. More experiments will be set up to improve users' perception of the configured emotions. In the next experiments, we will evaluate emotional speech in an environment with children, to find out whether there is a difference between children and adults in perceiving emotional speech.

VI. ACKNOWLEDGMENTS

The authors gratefully acknowledge the infrastructure, assistance and comments of the DAI-Labor at the Technische Universität Berlin.

REFERENCES

[1] Sony AIBO. http://support.sony-europe.com/aibo (21.02.2008).
[2] Nass, C., Foehr, U., Brave, S. and Somoza, M.: The Effects of Emotion of Voice in Synthesized and Recorded Speech. Proc. AAAI Symposium Emotional and Intelligent II: The Tangled Knot of Social Cognition, 2001.
[3] Gong, L. and Lai, J.: Shall We Mix Synthetic Speech and Human Speech? The Impact on Users' Task Performance and Attitude. Proc. Human Factors in Computing Systems (ACM CHI), pp. 158-165, 2001.
[4] Nass, C., Simard, C. and Takhteyev, Y.: Should Recorded and Synthesized Speech be Mixed? www.stanford.edu/nass/comm369/pdf/MixingTTSandRecordedSpeech.pdf (21.02.2008).
[5] Mihalcea, R. and Strapparava, C.: Making Computers Laugh: Investigations in Automatic Humor Recognition. Proc. Human Language Technology and Empirical Methods in Natural Language Processing, pp. 531-538, 2005.
[6] Nass, C. and Lee, K. M.: Does Computer-Generated Speech Manifest Personality? An Experimental Test of Similarity-Attraction. Conference on Human Factors in Computing Systems, The Hague, pp. 329-336, 2000.
[7] The MARY Text-to-Speech System. http://mary.dfki.de (21.02.2008).
[8] Mohasi, E. and Mashao, D.: Text-to-Speech Technology in Human Computer Interaction. 5th Conference on Human Computer Interaction in Southern Africa (CHISA), pp. 79-84, 2006.
[9] Schröder, M.: Emotional Speech Synthesis: A Review. Proc. Eurospeech, vol. 1, pp. 561-564, 2001.