
EMOTION RECOGNITION USING FACIAL EXPRESSIONS WITH ACTIVE APPEARANCE MODELS
Matthew S. Ratliff
Department of Computer Science
University of North Carolina Wilmington
601 South College Road
Wilmington, NC, USA
msr3520@uncw.edu

Eric Patterson
Department of Computer Science
University of North Carolina Wilmington
601 South College Road
Wilmington, NC, USA
pattersone@uncw.edu
ABSTRACT
Recognizing emotion using facial expressions is a key element in human communication. In this paper we discuss a
framework for the classification of emotional states, based
on still images of the face. The technique we present involves the creation of an active appearance model (AAM)
trained on face images from a publicly available database
to represent shape and texture variation key to expression
recognition. Parameters from the AAM are used as features for a classification scheme that is able to successfully
identify faces related to the six universal emotions. The results of our study demonstrate the effectiveness of AAMs
in capturing the important facial structure for expression
identification and also help suggest a framework for future
development.
KEY WORDS
Emotion, Facial Expression, Expression Recognition, Active Appearance Model

1 Introduction

Facial expressions provide a key mechanism for understanding and conveying emotion. Even the term interface
suggests the primary role of the face in communication between two entities. Studies have shown that interpreting
facial expressions can significantly alter the interpretation
of what is spoken as well as control the flow of a conversation [31]. Mehrabian has suggested that the ability for
humans to interpret emotions is very important to effective
communication, accounting for up to 93% of communication in a normal conversation [23]. For ideal human-computer interfaces (HCI), we would desire that machines
have this capability as well. Computer applications could
better communicate by changing responses according to
the emotional state of human users in various interactions.
In order to work toward these capabilities, efforts
have recently been devoted to integrating affect recognition into human-computer applications [20]. Applications
exist in both emotion recognition and agent-based emotion
generation [17]. The work presented in this paper explores
the recognition of expressions, although the same research
can be useful for synthesising facial expressions to convey
emotion [17] [18]. By creating machines that can understand emotion, we enhance the communication that exists
between humans and computers. This would open a variety
of possibilities in robotics and human-computer interfaces
such as devices that warn a drowsy driver, attempt to placate an angry customer, or better meet user needs in general. The field of psychology has played an important role
in understanding human emotion and in developing concepts that may aid these HCI technologies. Ekman and
Friesen have been pioneers in this area, helping to identify six basic emotions (anger, fear, disgust, joy, surprise,
sadness) that appear to be universal across humanity [12].
In addition, they developed a scoring system used to systemically categorize the physical expression of emotions,
known as the Facial Action Coding System (FACS) [13].
FACS has been used in a variety of studies and applications, and has found its way into many face-based computer technologies. The study of the facial muscle movements classified by FACS in creating certain expressions
was used to inform the choice of landmarks for active appearance model (AAM) shape parameters in our work. Our
work thus far has been focused on developing a framework
for emotion recognition based on facial expressions. Facial
images representing the six universal emotions mentioned
previously as well as a neutral expression were labeled in
a manner to capture expressions. An AAM was built using
training data and tested on a separate dataset. Test face images were then classified as one of the six emotion-based
expressions or a neutral expression using the AAM parameters as classification features. The technique achieved a
high level of performance in classifying these different facial expressions based on still images. This paper presents
a summary of current contributions to this area of research,
discusses our approach to the problem, and details techniques we plan to pursue for this work.

2 Previous Work

Facial expressions provide the building blocks with which
to understand emotion. In order to use facial expressions effectively, it is necessary to understand how to interpret
them, and it is also important to study what others
have done in the past. Fasel and Luettin performed an in-depth
study in an attempt to understand the sources that
drive expressions [15]. Their results indicate that using
FACS may incorporate other sources of emotional stimulus
including non-emotional mental and physiological aspects
used in generating emotional expressions.
Much of expression research to date has focused on
understanding how underlying muscles move to create expressions [15] [18]. For example, studies have shown that
movement of the nasolabial furrow, in addition to movement of the eyes and eyebrows, is a primary indicator of
disgust [3]. FACS as well as EMFACS go into much detail as to the poses of the face and have served as important
references and a system of study for other work [13] [16].
Though it was originally used for the analysis of facial
movement by human observers, FACS has been adopted by
the animation and recognition communities. (EMFACS is
a further simplification to focus only on facial action units
that contribute emotional information). Cohn et al. report
that facial expressions can differ somewhat from culture to
culture for a particular emotion, but the similarities in expression for an emotion are usually strong enough to overcome these cultural differences [9].
Much of previous work has used FACS as a framework for classification. In addition to this, previous studies
have traditionally taken two approaches to emotion classification, according to Fasel and Luettin [15]: a judgment-based
approach and a sign-based approach. The judgment-based approach defines the categories of emotion in advance, such as the traditional six universal emotions. The
sign-based approach uses a FACS system, encoding action
units in order to categorize an expression based on its constituents. This approach assumes no categories, but rather
assigns an emotional value to a face using a combination of
the key action units that create the expression.
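To make the sign-based idea concrete, the following sketch (Python) maps a set of detected action units to a candidate emotion. The AU combinations and the overlap scoring rule are simplified assumptions for illustration only, not the coding used in FACS or in any system cited above.

```python
# Illustrative sign-based classification: an emotion is inferred from
# detected facial action units (AUs) rather than judged directly.
# The AU combinations below are approximate, simplified examples.
EMOTION_AUS = {
    "joy":      {6, 12},        # cheek raiser + lip corner puller
    "surprise": {1, 2, 5, 26},  # brow raisers + upper lid raiser + jaw drop
    "sadness":  {1, 4, 15},     # inner brow raiser + brow lowerer + lip corner depressor
    "disgust":  {9, 15},        # nose wrinkler + lip corner depressor
}

def infer_emotion(detected_aus):
    """Return the emotion whose AU set best overlaps the detected AUs."""
    detected = set(detected_aus)
    scores = {emotion: len(aus & detected) / len(aus)
              for emotion, aus in EMOTION_AUS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "neutral"

print(infer_emotion([6, 12, 25]))  # -> "joy"
```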
2.1 Data Collection

One challenge of research in expression and emotion
recognition is the collection of suitable data for training and
testing systems. Several databases have been developed for
facial expressions. Some are small and have limited accessibility, and most have used actors to portray emotions in
video recordings [27] [1] [24] [18]. The use of professional
actors may fail to capture expression that is completely and
accurately representative of underlying emotional content. There is likely a difference between artificially posed
expressions and those based on true underlying emotion.
One feature that suggests this is the lack of contraction
of the orbicularis oculi during artificial smiles [12]. Fasel
and Luettin noted that posed facial expressions tend to be exaggerated and easier to recognize than spontaneous expressions [15]. Walhoff, however, has developed and released a database constructed in an effort to elicit genuine responses [30]. This
is the database that we have used so far, and it is discussed
in further detail in the next section of this paper. Actual
comparisons of results on specific databases have also
been somewhat limited, and it would be useful to encourage comparisons of techniques on the same data sets.
2.2 Feature Extraction Methods

In order to recognize expressions of the face, a useful feature scheme and extraction method must be chosen. One
of the most famous techniques used in face recognition and
related areas is that of eigenfaces developed by Turk and
Pentland [29]. An average face and a set of basis functions
for face-space are constructed using principal components
analysis. Although a successful method for simple face
recognition, this technique would lack the specificity needed to capture
the underlying muscle movements relevant to facial expressions.
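For reference, the eigenfaces idea can be sketched in a few lines. This is a generic illustration of PCA on flattened face images (computed here via SVD), not code from any of the systems cited; the array shapes are assumptions.

```python
import numpy as np

def eigenfaces(images, num_components):
    """Compute the average face and a PCA basis ("eigenfaces") from a stack
    of equally sized grayscale face images of shape (n_images, height, width)."""
    n = images.shape[0]
    data = images.reshape(n, -1).astype(float)  # flatten each face to a row vector
    mean_face = data.mean(axis=0)
    centered = data - mean_face
    # Rows of vt are the principal directions of face-space
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_face, vt[:num_components]

def project(image, mean_face, components):
    """Project a face into the low-dimensional face-space."""
    return components @ (image.reshape(-1).astype(float) - mean_face)
```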
Other feature extraction methods have been explored
including image-processing techniques such as Gabor filters and wavelets [21]. Bartlett used a similar approach for
feature extraction, employing a cascade of classifiers
to locate the best filters for feature extraction [1]. Michel
and Kaliouby use a method similar to our approach for
extracting features [24]. Their method employs a feature
point tracking system similar to active shape models. They cite Cohen, who reports that feature point tracking shows on average 92% agreement with
manual FACS coding by professionals [7]. Shape information of some kind is likely one of the most important types
of data to include in any feature method.
Image-based methods have been applied in many areas of facial computing. One of the most successful recent
techniques, though, incorporates both shape and texture information from facial images. The AAM, developed initially by Cootes and Taylor [11], has shown strong potential in a variety of facial recognition technologies, but to our
knowledge has yet to be used in recognizing emotions. It
has the ability to aid in initial face-search algorithms and in
extracting important information from both the shape and
texture (wrinkles, nasolabial lines, etc.) of the face that
may be useful for communicating emotion.
2.3 Classification Schemes

Several classification schemes have been used thus far, including support vector machines, fuzzy-logic systems, and
neural networks. For instance, Eckschlager et al. used an
ANN to identify a user's emotional state based on certain
pre-defined criteria [10]. NEmESys attempts to predict the
emotional state of a user by obtaining certain knowledge
about things that commonly cause changes in behavior. By
giving the computer prior information such as eating habits,
stress levels, sleep habits, etc., the ANN predicts the emotional state of the user and can change its responses accordingly. (One weakness of this approach is that it requires
the user to fill out a questionnaire providing the system with
the information in advance). While this system is unique,
it does not incorporate any interpretation of facial emotion,

which has been identified as one of the key sources of emotional content [12] [18] [9]. Another approach used a fuzzy,
rule-based system to match facial expressions that returned
a probable emotion based on rules of the system [25].
Several have used support vector machines (SVM) as
a classification mechanism [21] [1] [24]. In most cases
SVMs yield good separation of the clusters by projecting
the data into a higher dimension. Michel and Kaliouby reported 93.3% correct classification when aided by
AdaBoost for optimal filter selection [24]. Sebe, Lew, Cohen, Garg, and Huang [27] offer a naive Bayes approach
in emotion recognition based on a probability model of facial features given a corresponding emotional class. The
work presented in this paper does not focus on classification schemes and uses a simple Euclidean-distance classification.

3 Techniques in Our Work

In this work our main goal was to study the effectiveness
of using AAMs to build a robust framework for recognizing expressions indicative of emotion in still images of the
human face. We present a method for feature extraction
and classification that yields successful results and builds
a framework for future development.
3.1 Background

In order to develop an emotion classification system using
still images, several issues must be resolved. One of the
first and most important challenges is acquiring appropriate data for both training and testing. As discussed earlier,
we chose to use the facial expression database developed
by Walhoff [30], known as FEEDTUM. This database
contains still images and video sequences of eighteen test
subjects, both male and female, of varying age. Rather
than hiring actors to artificially create or mimic emotional
responses, this database was developed with the attempt
to actually elicit the required emotions. Using a camera
mounted on a computer screen, subjects were shown various movie clips intended to trigger an emotional response. No prior information about what was to be shown
was given in an attempt to elicit genuine responses. The
database is organized by category using the six basic emotions [12]. In our experiment we create classification states
for each of these basic emotions and also for neutral facial
expressions.
In addition to acquiring training data, a method for
feature extraction from the training data is also needed.
AAMs are well suited for the task of handling various poses
and expressions and are thus chosen for this work.
Building an appearance model entails choosing images to be used as training data and then properly labeling
those images using a pre-defined format based on the nature
of the experiment. The following subsection discusses the
selection of data, landmark labeling, and AAM creation.
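As an illustration of the shape half of that process, the sketch below builds a PCA model from labeled landmark coordinates. It is a simplified stand-in for a full AAM: Procrustes alignment and the texture model are omitted, and the landmark count and array layout are assumptions.

```python
import numpy as np

def build_shape_model(landmark_sets, variance_to_keep=0.95):
    """Build a PCA shape model from labeled training images.

    landmark_sets: array of shape (n_images, n_landmarks, 2), e.g. the
    (x, y) points labeled on each training face.
    Returns the mean shape plus the retained eigenvectors and eigenvalues."""
    n = landmark_sets.shape[0]
    shapes = landmark_sets.reshape(n, -1).astype(float)  # (x1, y1, x2, y2, ...)
    mean_shape = shapes.mean(axis=0)
    centered = shapes - mean_shape
    cov = centered.T @ centered / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)               # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cumulative, variance_to_keep)) + 1
    return mean_shape, eigvecs[:, :k], eigvals[:k]
```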

Subject   Sincerity   Clarity   Movement   Score
1         3 of 10     3 of 10   No         7.0
2         4 of 10     7 of 10   No         7.5
3         8 of 10     4 of 10   Yes        6.5
4         7 of 10     9 of 10   No         6.0
5         9 of 10     5 of 10   No         3.5
...       ...         ...       ...        ...
500       9 of 10     9 of 10   No         9.5

Table 1. Test subject scoring.

3.2 The Experiment

Upon evaluation of the facial expression database [30], several subjects were removed from the data set used for this
work. Occlusions such as eyeglasses and hair, as well as inconsistencies in expression, were the main factors that contributed to the removal of these subjects. Facial images and
their representative expression in the database were categorized based on emotion clarity, sincerity, and head movement. Emotion clarity ranks the image based on the clarity
of the emotional content. Sincerity was also chosen as a
measure to help determine how well the subject conveys
the intended emotion. Head movement is not included in
the experiment and those images exhibiting certain levels
of head movement are excluded from the training and test
sets. Specifically, subjects 7, 14, and 17 were removed from the
data set due to facial occlusions such as eyeglasses and hair
as well as other inconsistencies. Overall, the database stills
were evaluated and each image given an overall score as
shown in Table 1. A benchmark was set which marked the
minimum score required for inclusion in this initial experiment.
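A minimal sketch of this screening step is shown below; the record fields, the benchmark value, and the exclusion list are assumptions used only to illustrate the filtering described above.

```python
# Hypothetical screening of database stills before training and testing.
EXCLUDED_SUBJECTS = {7, 14, 17}  # removed for occlusions and inconsistencies
BENCHMARK = 6.0                  # hypothetical minimum overall score

def keep_image(record):
    """Return True if a still image passes the screening criteria."""
    return (record["subject"] not in EXCLUDED_SUBJECTS
            and not record["head_movement"]
            and record["score"] >= BENCHMARK)

def screen(records):
    return [r for r in records if keep_image(r)]
```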

Figure 1. Points used in training the AAM.

Once the scoring process was complete, subjects with
very low scores were omitted from the training and testing
set for this initial experiment. The facial landmarks
in this work are shown in Figure 1. They were chosen in
an attempt to capture pose information of the underlying
muscles of the face that create expressions. FACS helps
provide insight into which parts of the face correspond to
certain emotional expressions [12]. This guided the landmark selection process with a total of 113 landmarks. Key
areas were chosen to capture the movement of the brow,
eyes, mouth, and nasolabial region as formed by the underlying muscles involved in facial expression. Once
an initial AAM was trained on several subjects, the search
function helped automate the labeling process. Using this
technique we labeled over 500 images (4 images x 18 subjects x 7 expressions, the six emotions plus neutral).

                        Percentage Correct
Subject 1               80.0%
Subject 2               74.0%
Subject 3               90.5%
Subject 4               90.9%
Subject 5               96.3%
Subject 6               79.2%
Subject 8               83.3%
Subject 9               100.0%
Subject 10              60.0%
Subject 11              100.0%
Subject 12              75.0%
Subject 13              100.0%
Subject 15              83.3%
Subject 16              89.7%
Subject 18              100.0%
Total Average Correct   91.7%

Table 2. Classification results by subject.

Figure 2. Sample of faces used in training.

In this experiment we used a leave-one-out approach
to improve testing methods with relatively few subjects.
Stills from each of the fifteen subjects were used for testing
data after an AAM and class parameter-vector means were
found using the other subject stills as training data. This
work uses a simple Euclidean distance from face parameters to the mean parameter vector for each emotion as the
classification scheme. Vectors from both the training and
test data were extracted from the appearance model and
loaded into MATLAB code to create the mean parameter
vectors and compute the distances for classification.
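The classification step can be sketched as follows. This is a Python stand-in for the MATLAB code described above, with assumed variable names; the per-fold retraining of the AAM itself is not shown, only the nearest-mean decision on its parameter vectors.

```python
import numpy as np

def class_means(features, labels):
    """Mean AAM parameter vector for each expression class.
    features: (n_images, n_params) array; labels: array of expression names."""
    labels = np.asarray(labels)
    return {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}

def classify(vector, means):
    """Assign the class whose mean parameter vector is closest in Euclidean distance."""
    return min(means, key=lambda c: np.linalg.norm(vector - means[c]))

def leave_one_subject_out(features, labels, subjects, held_out):
    """Hold out one subject's stills for testing; use the rest to form the means."""
    subjects = np.asarray(subjects)
    train, test = subjects != held_out, subjects == held_out
    means = class_means(features[train], np.asarray(labels)[train])
    return [classify(v, means) for v in features[test]]
```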

Figure 3. Viewing AAM generation using training data.

One mode of the AAM based on these parameter vectors is shown in Figure 3. The two faces on either side
represent variation from the mean within the model. The
center face represents the average face created using the
combinations of all images for all emotions. The number of modes retained
in the AAM is chosen so that a high percentage of the variance in the data
is represented by the model, allowing it to capture subtle
changes in facial features.
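The variation shown in Figure 3 corresponds to moving along a single model mode. Assuming a PCA model such as the sketch given earlier, the faces on either side of the mean can be generated roughly as follows; the range of three standard deviations is an assumption.

```python
import numpy as np

def along_mode(mean_vec, eigvecs, eigvals, mode=0, n_std=3.0):
    """Return model instances at -n_std, 0, and +n_std standard deviations
    along one mode, i.e. the left, center, and right faces of Figure 3."""
    step = n_std * np.sqrt(eigvals[mode]) * eigvecs[:, mode]
    return mean_vec - step, mean_vec, mean_vec + step
```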

4 Experimental Results

An analysis of the results in Table 2 shows that the system
correctly classified between 60% and 100% of expressions for
each individual using only still images. Most subjects were
in the 80% to 90% range, but a few subjects showed poor
recognition performance. It is difficult with these early results to say whether that is due to subject expression, the feature method, or the classification method. It may be that more
sophisticated classifiers could achieve better separation and
thus results for these individuals. Table 3 shows anger and
fear generated the largest margin of error with only 63.9%
average correct for both. Possibilities for low performance
on these also include limited training data and poor separation due to the simple Euclidean classifier. Based on
the scoring scheme mentioned earlier, an evaluation of the
database also suggests that the subjects had difficulty expressing negative emotions. The overall success in this
first classification approach leaves room for future development in several areas. Even so, AAM
parameters achieved significant success using only a Euclidean distance measure and produced results that compare well with other techniques.
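The per-subject and per-emotion percentages in Tables 2 and 3 amount to simple grouped accuracies; a brief sketch of that bookkeeping (names are assumptions) follows.

```python
from collections import defaultdict

def grouped_accuracy(true_labels, predictions, groups):
    """Percentage of correct predictions per group (e.g. per subject or
    per emotion, as in Tables 2 and 3)."""
    correct, total = defaultdict(int), defaultdict(int)
    for truth, pred, group in zip(true_labels, predictions, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: 100.0 * correct[g] / total[g] for g in total}
```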

Emotion     Percentage Correct
Fear        90.0%
Joy         93.3%
Surprise    79.7%
Anger       63.9%
Disgust     93.3%
Sadness     63.9%
Neutral     93.3%

Table 3. Classification results by emotion.

5 Conclusions

Using the AAM as a feature method has proven successful even with a simple Euclidean-distance classification
scheme. The capability of AAMs to model both the shape
and texture of faces makes them a strong tool to derive feature sets for emotion-based expression classification. It is
likely that more sophisticated classifiers such as
SVMs will provide better results on this data set. Overall,
though, this initial work has shown potential for AAMs as
a feature set for expression classification.
5.1 Future Work

We are currently expanding this initial data set and exploring other classification schemes used in conjunction with
AAM parameter features, beginning with Bayesian classifiers
and SVMs. We also plan to explore dynamic expression recognition, as recognition can be improved with temporal information. This also has the potential to strengthen methods based on FACS and scoring
schemes that are generated automatically.
A current weakness in this area of facial study,
though, is still the lack of comparable databases. We plan to
consider others in our future work [2] but would also like to
encourage the creation and use of common data sets in this
area as a means to strengthen comparison and fine-tuning
of techniques.

Acknowledgements

Special thanks to Dr. Eric Patterson for guidance and direction with the project, as well as Dr. Curry Guinn for
assistance with project scope and data collection methods.
Also, gratitude is extended to Frank Walhoff for providing
a freely accessible database.

References
[1] Marian Stewart Bartlett, Gwen Littlewort, Ian Fasel, and Javier R. Movellan. Real time face detection and facial expression recognition: Development and applications to human computer interaction. In Proceedings of the 2003 Conference on Computer Vision and Pattern Recognition Workshop, 2003.
[2] Alberto Battocchi, Fabio Pianesi, and Dina Goren-Bar. A first evaluation study of a database of kinetic facial expressions (DaFEx). In ICMI '05: Proceedings of the 7th International Conference on Multimodal Interfaces, pages 214-221, New York, NY, USA, 2005. ACM.
[3] Ying-li Tian, Takeo Kanade, and Jeffrey F. Cohn. Recognizing action units for facial expression analysis. In 2000 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '00), Volume 1, June 2000.
[4] George Caridakis, Lori Malatesta, Loic Kessous, Noam Amir, Amaryllis Raouzaiou, and Kostas Karpouzis. Modeling naturalistic affective states via facial and vocal expressions recognition. In ICMI '06: Proceedings of the 8th International Conference on Multimodal Interfaces, pages 146-154, New York, NY, USA, 2006. ACM.
[5] C. Izard. The Maximally Discriminative Facial Movement Coding System (MAX). Available from Instructional Resource Center, 1979.
[6] C. Izard, L. M. Dougherty, and E. A. Hembree. A System for Identifying Affect Expressions by Holistic Judgments. University of Delaware, 1983.
[7] Ira Cohen, Nicu Sebe, Fabio G. Cozman, and Thomas S. Huang. Semi-supervised learning for facial expression recognition. In Proceedings of the 5th ACM SIGMM, 2003.
[8] Jeffrey F. Cohn. Foundations of human computing: facial expression and emotion. In ICMI '06: Proceedings of the 8th International Conference on Multimodal Interfaces, pages 233-238, New York, NY, USA, 2006. ACM.
[9] Jeffrey F. Cohn, Karen Schmidt, Ralph Gross, and Paul Ekman. Individual differences in facial expression: Stability over time, relation to self-reported emotion, and ability to inform person identification. In IEEE International Conference on Multimodal Interfaces (ICMI 2002), 2002.
[10] Manfred Eckschlager, Regina Bernhaupt, and Manfred Tscheligi. NEmESys: neural emotion eliciting system. In CHI '05 Extended Abstracts on Human Factors in Computing Systems, 2005.
[11] G. J. Edwards, T. F. Cootes, and C. J. Taylor. Face recognition using active appearance models. In Proceedings of the European Conference on Computer Vision, 1998.
[12] P. Ekman. Emotions Revealed: Recognizing Faces and Feelings to Improve Communication and Emotional Life. Holt, 2003.
[13] P. Ekman and W. Friesen. Facial Action Coding System: A Technique for the Measurement of Facial Movement. Consulting Psychologists Press, Palo Alto, 1978.
[14] Paul Ekman, Wallace Friesen, and Joseph Hager. Emotional Facial Action Coding System Manual. 2002.
[15] B. Fasel and J. Luettin. Automatic facial expression analysis: A survey. Pattern Recognition, 2003.
[16] W. V. Friesen and P. Ekman. EMFACS-7: Emotional facial action coding system. Unpublished manuscript, University of California at San Francisco, 1983. http://citeseer.comp.nus.edu.sg/context/1063041/0.
[17] Lisa Gralewski, Neill Campbell, Barry Thomas, Colin Dalton, and David Gibson. Statistical synthesis of facial expressions for the portrayal of emotion. In Proceedings of the 2nd International Conference on Computer Graphics and Interactive Techniques in Australasia and South East Asia, 2004.
[18] Rita T. Griesser, Douglas W. Cunningham, Christian Wallraven, and Heinrich H. Bulthoff. Psychophysical investigation of facial expressions using computer animated faces. In Proceedings of the 4th Symposium on Applied Perception in Graphics and Visualization (APGV '07), July 2007.
[19] Soumya Hamlaoui and Franck Davoine. Facial action tracking using particle filters and active appearance models. In Proceedings of the 2005 Joint Conference on Smart Objects and Ambient Intelligence, 2005.
[20] Diane J. Litman and Kate Forbes-Riley. Predicting student emotions in computer-human tutoring dialogues. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL 2004), 2004.
[21] Gwen Littlewort, Marian Stewart Bartlett, Ian Fasel, Joshua Susskind, and Javier Movellan. Dynamics of facial expression extracted automatically from video. In IEEE Conference on Computer Vision and Pattern Recognition: Workshop on Face Processing in Video, 2004.
[22] Juwei Lu, Konstantinos N. Plataniotis, and Anastasios N. Venetsanopoulos. Face recognition using kernel direct discriminant analysis algorithms. IEEE Transactions on Neural Networks, 2003.
[23] A. Mehrabian. Communication without words. Psychology Today, 1968.
[24] Philipp Michel and Rana El Kaliouby. Real time facial expression recognition in video using support vector machines, 2003.
[25] Muid Mufti and Assia Khanam. Fuzzy rule based facial expression recognition. In International Conference on Computational Intelligence for Modeling, Control and Automation, and International Conference on Intelligent Agents, IEEE, 2006.
[26] Maja Pantic, Nicu Sebe, Jeffrey F. Cohn, and Thomas Huang. Affective multimodal human-computer interaction. In Proceedings of the 13th Annual ACM International Conference on Multimedia (MULTIMEDIA 2005), 2005.
[27] N. Sebe, I. Cohen, A. Garg, M. Lew, and T. Huang. Emotion recognition using a Cauchy naive Bayes classifier, 2002.
[28] Takeo Kanade, Jeffrey F. Cohn, and Yingli Tian. Comprehensive database for facial expression analysis. In Proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition, March 2000.
[29] Matthew A. Turk and Alex P. Pentland. Face recognition using eigenfaces. Pattern Recognition, 1991.
[30] Frank Walhoff. Facial expression and emotion database from Technical University of Munich.
[31] Christian Wallraven, Heinrich H. Bulthoff, Douglas W. Cunningham, Jan Fischer, and Dirk Bartz. Evaluation of real-world and computer-generated stylized facial expressions. In ACM Transactions on Applied Perception, volume 4, page 16, New York, NY, USA, 2007. ACM.
