IN MPEG-4 SEQUENCES
Amaryllis Raouzaiou, Kostas Karpouzis and Stefanos Kollias
Image, Video and Multimedia Systems Lab - Dept of Electrical and Computer Engineering
National Technical University of Athens
Heroon Polytechniou 9, 157 73 Zographou, GREECE
Tel.: (301) 7722491 Fax: (301) 7722492
email: araouz@softlab.ece.ntua.gr
Abstract
While the previous MPEG standards focus primarily on video coding and transmission issues, MPEG-4
concentrates on the hybrid coding of natural and synthetic data streams. In this framework, possible applications
include teleconferencing and entertainment, where an adaptable synthetic agent substitutes for the
actual user. Such agents can interact with each other, receive input from multi-sensor data, and utilize high-level
information, such as detected emotions and expressions. This greatly enhances human-computer
interaction, by replacing single-media representations with dynamic renderings, while providing feedback on
the user's emotional status and reactions. Educational environments, virtual collaboration environments and
online shopping and entertainment applications are expected to profit from this concept. Facial expression
synthesis and animation, in particular, receive much attention within the MPEG-4 framework, where higher-level,
explicit Facial Animation Parameters (FAPs) have been dedicated to this purpose. In this work, we
employ general-purpose FAPs to simplify the definition of facial expressions for synthesis purposes, by
estimating the actual expression as a combination of the universal ones. In addition, we provide explicit features,
as well as possible values for the FAP implementation, while forming a relation between FAPs and the
activation parameter proposed in classic psychological studies.
1. INTRODUCTION
The establishment of the MPEG-4 standard facilitates an alternative way of analyzing and modeling facial
expressions and related emotions. Facial Animation Parameters (FAPs) are utilized in the framework of
MPEG-4 for facial animation purposes, so as to enable efficient hybrid coding of synthetic objects with
natural video. This enables animators to focus on local or global actions on the face by scripting
an animation sequence. For example, the animator can instruct the synthetic model of a human face to
"open mouth" or "lower eyebrow" (see Figure 1); in essence, this instruction is passed to the MPEG-4
decoder, which in turn deforms the model by translating the vertices that correspond to the area in question
(see Figure 2). While the standard does cater for the abstract definition of expressions and emotions as
collections of FAPs and their subsequent interpolation into intermediate expressions, this does not necessarily
mean that all possible expressions and emotions can be modeled this way [1]. In general, facial expression
analysis has mainly concentrated on six expressions, termed universal, meaning that
humans across different cultures can easily recognize expressions such as joy or disgust [2]. One can
combine different universal expressions to provide intermediate ones, such as fake joy or upset (see Figure
3), or a number of emotional states, such as pain.
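To make this blending concrete, the following sketch represents each universal expression as a profile of FAP amplitudes and forms an intermediate expression as their weighted combination; it also shows the kind of vertex translation a decoder performs for a single FAP. The FAP numbers follow the usual MPEG-4 numbering (3 = open_jaw, 31/32 = raise inner eyebrows), but the amplitudes, weights, and influence values are invented for illustration and are not values from this paper or the standard.

# A minimal sketch of expression blending with FAP profiles (illustrative values only).

def blend_profiles(profiles, weights):
    """Weighted combination of FAP profiles (dicts mapping FAP id -> amplitude)."""
    blended = {}
    for profile, w in zip(profiles, weights):
        for fap, amplitude in profile.items():
            blended[fap] = blended.get(fap, 0.0) + w * amplitude
    return blended

def apply_fap(vertices, displacement, influence):
    """Translate each vertex by its influence weight times the FAP displacement vector,
    mimicking how a decoder deforms the region a FAP controls."""
    return [(x + w * displacement[0], y + w * displacement[1], z + w * displacement[2])
            for (x, y, z), w in zip(vertices, influence)]

# Hypothetical profiles: FAP 3 = open_jaw, FAPs 31/32 = raise left/right inner eyebrow.
JOY = {3: 400, 31: 120, 32: 120}
SADNESS = {3: 0, 31: -180, 32: -180}

# "Fake joy" as a joy-dominated mixture with a trace of sadness.
fake_joy = blend_profiles([JOY, SADNESS], [0.75, 0.25])
print(fake_joy)  # {3: 300.0, 31: 45.0, 32: 45.0}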
The reverse problem, that is, the identification of the universal expressions that must be combined to produce
a given intermediate expression, is not always clear-cut. In the quest to form a low-dimensional space in
which distance measures can be defined, notions such as the Feeltrace plane (see Figure 4), defined by the
activation and evaluation axes, may be used to guide the process of synthesizing an intermediate
expression. These notions originate from psychological studies [3] and can be exploited to move from
features comprehensible by humans to quantitative measurements, such as FAPs. This can be
accomplished by reversing the description of the six universal emotions with MPEG-4 FAPs and using a
priori knowledge embedded within a fuzzy rule system. Because FAPs do not correspond to specific
models or polygonal topologies, this scheme can be extended to other models or characters, different from
the one that was analyzed.
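As a rough illustration of this inverse mapping, the sketch below places the universal expressions at hypothetical angles on the activation-evaluation plane and blends the two nearest ones, with the point's distance from the origin acting as overall intensity. The angular placements and the linear interpolation rule are assumptions made for illustration; the approach described above relies on a fuzzy rule system rather than this simple geometry.

import math

# Hypothetical angular positions (degrees) of the universal expressions on the
# activation-evaluation plane; these placements are illustrative only.
UNIVERSAL = [("joy", 40.0), ("surprise", 90.0), ("fear", 130.0),
             ("anger", 160.0), ("disgust", 200.0), ("sadness", 250.0)]

def blend_weights(activation, evaluation):
    """Map a Feeltrace-style point to weights over the two angularly nearest expressions."""
    angle = math.degrees(math.atan2(activation, evaluation)) % 360.0
    intensity = min(1.0, math.hypot(activation, evaluation))
    ordered = sorted(UNIVERSAL, key=lambda e: e[1])
    # Walk the circle of expressions and find the pair whose angles bracket the input.
    for (name_a, a), (name_b, b) in zip(ordered, ordered[1:] + ordered[:1]):
        span = (b - a) % 360.0
        offset = (angle - a) % 360.0
        if offset <= span:
            t = offset / span
            return {name_a: intensity * (1.0 - t), name_b: intensity * t}

# Example: a mildly positive, fairly active point lands between joy and surprise.
print(blend_weights(0.6, 0.5))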
[Figure: labels from a diagram relating sadness-related emotion terms (Relaxed, Depressed, Almost crying, Cry, Sadness) to eyebrow FAP qualifiers (Lowered/Raised; Inner arch; Inner part; more/less).]