
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 39, NO. 4, JULY 2009

Emotion Recognition From Facial Expressions and Its Control Using Fuzzy Logic
Aruna Chakraborty, Amit Konar, Member, IEEE, Uday Kumar Chakraborty, and Amita Chatterjee

Abstract—This paper presents a fuzzy relational approach to human emotion recognition from facial expressions and its control. The proposed scheme uses external stimulus to excite specific emotions in human subjects whose facial expressions are analyzed by segmenting and localizing the individual frames into regions of interest. Selected facial features such as eye opening, mouth opening, and the length of eyebrow constriction are extracted from the localized regions, fuzzified, and mapped onto an emotion space by employing Mamdani-type relational models. A scheme for the validation of the system parameters is also presented. This paper also provides a fuzzy scheme for controlling the transition of emotion dynamics toward a desired state. Experimental results and computer simulations indicate that the proposed scheme for emotion recognition and control is simple and robust, with good accuracy.

Index Terms—Emotion control, emotion modeling, emotion recognition, fuzzy logic.

Manuscript received December 18, 2006; revised March 16, 2008. First published April 17, 2009; current version published June 19, 2009. This work was supported in part by the UGC (UPE) program, Jadavpur University, Calcutta. This paper was recommended by Associate Editor S. Narayanan.

A. Chakraborty is with the Department of Computer Science and Engineering, St. Thomas' College of Engineering and Technology, Calcutta 700 023, India, and also with Jadavpur University, Calcutta 700 032, India (e-mail: aru_2005@rediffmail.com).

A. Konar is with the Department of Electronics and Tele-Communication Engineering, Jadavpur University, Calcutta 700 032, India (e-mail: konaramit@yahoo.co.in).

U. K. Chakraborty is with the Department of Mathematics and Computer Science, University of Missouri, St. Louis, MO 63121 USA (e-mail: chakrabortyu@umsl.edu).

A. Chatterjee is with the Centre for Cognitive Science, Department of Philosophy, Jadavpur University, Calcutta 700 032, India (e-mail: amita_ju@yahoo.com).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSMCA.2009.2014645

I. INTRODUCTION

HUMANS often use nonverbal cues such as hand gestures, facial expressions, and tone of the voice to express feelings in interpersonal communications. Unfortunately, currently available human–computer interfaces do not take complete advantage of these valuable communicative media and thus are unable to provide the full benefits of natural interaction to the users. Human–computer interactions could significantly be improved if computers could recognize the emotion of the users from their facial expressions and hand gestures, and react in a friendly manner according to the users' needs and preferences [4].

The phrase affective computing [30] is currently gaining popularity in the literature of human–computer interfaces [35], [46]. The primary role of affective computing is to monitor the affective states of people engaged in critical/accident-prone environments to provide assistance in terms of appropriate alerts to prevent accidents. Li and Ji [25] proposed a probabilistic framework to dynamically model and recognize the users' affective states so as to provide them with corrective assistance in a timely and efficient manner. Picard et al. [31] stressed the significance of human emotions on their affective psychological states. Rani et al. [32] presented a novel scheme for the fusion of multiple psychological indices for real-time detection of a specific affective state (anxiety) of people using fuzzy logic and regression trees, and compared the relative merits of the two schemes. Among the other interesting applications in affective computing, the works of Scheirer et al. [35], Conati [7], Kramer et al. [22], and Rani et al. [32], [33] deserve special mention.

Apart from human–computer interfaces, emotion recognition by computers has interesting applications in computerized psychological counseling and therapy, and in the detection of criminal and antisocial motives. The identification of human emotions from facial expressions by a machine is a complex problem for the following reasons. First, identification of the exact facial expression from a blurred facial image is not an easy task. Second, segmentation of a facial image into regions of interest is difficult, particularly when the regions do not have significant differences in their imaging attributes. Third, unlike humans, machines usually do not have visual perception to map facial expressions into emotions.

Very few works on human emotion detection have so far been reported in the current literature on machine intelligence. Ekman and Friesen [9] proposed a scheme for the recognition of facial expressions from the movements of cheek, chin, and wrinkles. They have reported that there exist many basic movements of human eyes, eyebrows, and mouth that have direct correlation with facial expressions. Kobayashi and Hara [18]–[20] designed a scheme for the recognition of human facial expressions using the well-known backpropagation neural algorithms [16], [39]–[41]. Their scheme is capable of recognizing six common facial expressions depicting happiness, sadness, fear, anger, surprise, and disgust. Among the well-known methods of determining human emotions, Fourier descriptor [40], template matching [2], neural network models [11], [34], [40], and fuzzy integral [15] techniques deserve special mention. Yamada [45] proposed a new method of recognizing emotions through the classification of visual information. Fernandez-Dols et al. proposed a scheme for decoding emotions from facial expression and context [12]. Kawakami et al. [16] analyzed in detail the scope of emotion modeling from facial expressions.




Busso and Narayanan compared the scope of facial expressions, speech, and multimodal information in emotion recognition [4]. Cohen et al. [5], [6] considered temporal variations in facial expressions, which are displayed in live video, to recognize emotions. They proposed a new architecture of hidden Markov models to automatically segment and recognize facial expressions. Gao et al. [13] presented a methodology for facial expression recognition from a single facial image using line-based caricatures. Lanitis et al. [23] proposed a novel technique for the automatic interpretation and coding of face images using flexible models. Examples of other important works on the recognition of facial expression for conveying emotions include [3], [8], [10], [24], [26], [34], [37], [38], and [44].

This paper provides an alternative scheme for human emotion recognition from facial images, and its control, using fuzzy logic. Audiovisual stimulus is used to excite the emotions of subjects, and their facial expressions are recorded as video movie clips. The individual video frames are analyzed to segment the facial images into regions of interest. Fuzzy C-means (FCM) clustering [1] is used for the segmentation of the facial images into three important regions containing the mouth, eyes, and eyebrows. Next, a fuzzy reasoning algorithm is invoked to map fuzzified attributes of the facial expressions into fuzzy emotions. The exact emotion is extracted from the fuzzified emotions by a denormalization procedure similar to defuzzification (fuzzy decoding). The proposed scheme is both robust and insensitive to noise because of the nonlinear mapping of image attributes to emotions in the fuzzy domain. Experimental results show that the detection accuracies of emotions for adult males, adult females, and children of 8–12 years are as high as 88%, 92%, and 96%, respectively, outperforming the percentage accuracies of the existing techniques [26], [43]. This paper also proposes a scheme for controlling emotion [36] by judiciously selecting an appropriate audiovisual stimulus for presentation before the subject. The selection of the audiovisual stimulus is undertaken using fuzzy logic. Experimental results show that the proposed control scheme has good experimental accuracy and repeatability.

This paper is organized into eight sections. Section II provides new techniques for the segmentation and localization of important components in a human facial image. In Section III, a set of image attributes, including eye opening (EO), mouth opening (MO), and the length of eyebrow constriction (EBC), is determined online from the segmented images. In Section IV, we fuzzify the measurements of the imaging attributes into three distinct fuzzy sets: HIGH, MEDIUM, and LOW; the principles of the fuzzy relational scheme for emotion recognition are also discussed in this section. Experimental issues pertaining to emotion recognition are presented in Section V. Validation of the proposed scheme is undertaken in Section VI, where measures are taken to tune the membership distributions for improving the performance of the overall system. A scheme for emotion control, along with experimental issues, is covered in Section VII. Conclusions are drawn in Section VIII.

II. FILTERING, SEGMENTATION, AND LOCALIZATION OF FACIAL COMPONENTS

The identification of facial expressions by pixel-wise analysis of images is both tedious and time consuming. This paper attempts to extract significant components of facial expressions through segmentation of the image. Because of the differences in the regional profiles on an image, simple segmentation algorithms, such as histogram-based thresholding techniques, do not always yield good results. After conducting several experiments, we concluded that for the segmentation of the mouth region, a color-sensitive segmentation algorithm is most appropriate. Further, because of the apparent nonuniformity in the lip color profile, a fuzzy segmentation algorithm is preferred. A color-sensitive FCM clustering algorithm [1] has, therefore, been selected for the segmentation of the mouth region.

Segmentation of the eye regions, however, has successfully been performed in most images by the traditional thresholding method. The hair region in a human face can also easily be segmented by the thresholding technique. Segmentation of the mouth and eye regions is required for the subsequent determination of MO and EO, respectively. Segmentation of the eyebrow region is equally useful in determining the length of EBC. The details of the segmentation techniques for the different regions are presented below.

A. Segmentation of the Mouth Region

Before segmenting the mouth region, we first represent the image in the L*a*b space instead of its conventional red–green–blue (RGB) space. The L*a*b system has the additional benefit of representing a perceptually uniform color space. It defines a uniform metric space representation of color so that a perceptual color difference is represented by the Euclidean distance. The color information alone, however, is not adequate to identify the lip region. The position information of pixels together with their color is a good feature to segment the lip region from the face. The FCM clustering algorithm that we employ to detect the lip region is therefore supplied with both the color and the pixel-position information of the image. The FCM clustering algorithm is a well-known technique for unsupervised pattern recognition. However, its use in image segmentation in general and lip region segmentation in particular is a novel area of research. A description of the FCM clustering algorithm can be found in books on fuzzy pattern recognition (see, e.g., [1], [17], and [45]). In this paper, we just demonstrate how to use FCM clustering in the present application.

A pixel in this paper is described by five attributes: three attributes of color information (L*a*b) and two attributes of position information (x, y). The objective of the clustering algorithm is to classify the set of 5-D data points into two classes/partitions—the lip region and the nonlip region. Initial membership values are assigned to each 5-D pixel such that the sum of the memberships in the two regions is equal to one. That is, for the kth pixel x_k, we have

L(x_k) + NL(x_k) = 1    (1)

where L(x_k) and NL(x_k) denote the memberships of x_k to fall in the lip and nonlip regions, respectively.

Fig. 1. Original face image.

Fig. 2. Median-filtered image.

Fig. 3. Image after applying FCM clustering.

Fig. 4. Measurement of MO from the dips in the average intensity plot.

Fig. 5. Synthetic eye template.

Given the initial membership values of L(x_k) and NL(x_k) for k = 1 to n^2 (assuming that the image is of size n × n), we use the FCM algorithm to determine the cluster centers V_L and V_NL of the lip and the nonlip regions as

V_L = \sum_{k=1}^{n^2} [L(x_k)]^m x_k \Big/ \sum_{k=1}^{n^2} [L(x_k)]^m    (2)

V_NL = \sum_{k=1}^{n^2} [NL(x_k)]^m x_k \Big/ \sum_{k=1}^{n^2} [NL(x_k)]^m.    (3)

Expressions (2) and (3) provide centroidal measures of the lip and nonlip clusters, evaluated over all data points x_k for k = 1 to n^2. The parameter m (>1) is any real number that affects the membership grade. The membership values of pixel x_k in the image for the lip and nonlip regions are obtained from the following formulas:

L(x_k) = \left( \sum_{j=1}^{2} \left( \|x_k - v_L\|^2 / \|x_k - v_j\|^2 \right)^{1/(m-1)} \right)^{-1}    (4)

NL(x_k) = \left( \sum_{j=1}^{2} \left( \|x_k - v_{NL}\|^2 / \|x_k - v_j\|^2 \right)^{1/(m-1)} \right)^{-1}    (5)

where v_j denotes the jth cluster center for j ∈ {L, NL}.

Determination of the cluster centers [by (2) and (3)] and membership evaluation [by (4) and (5)] are repeated several times following the FCM algorithm until the positions of the cluster centers do not change further. (A small illustrative sketch of this clustering loop is given at the end of this section.)

Fig. 1 presents a section of a facial image with a large MO. This image is passed through a median filter, and the resulting image is shown in Fig. 2. Application of the FCM algorithm to the image in Fig. 2 yields the image in Fig. 3. Fig. 4 demonstrates the computation of MO (Section III).

B. Segmentation of the Eye Region

The eye region in a monochrome image has a sharp contrast to the rest of the face. Consequently, the thresholding method can be employed to segment the eye region from the image. Images grabbed at poor illumination conditions have a very low average intensity value. Segmentation of the eye region in these cases is difficult because of the presence of dark eyebrows in the neighborhood of the eye region. To overcome this problem, we consider images grabbed under good illuminating conditions. After segmentation of the image, we need to localize the left and right eyes on the image. In this paper, we use a template-matching scheme to localize the eyes. The eye template we used is similar to the template shown in Fig. 5. The template-matching scheme, taken from our previous works [2], [21], attempts to minimize the Euclidean distance between a fuzzy descriptor of the template and the fuzzy descriptor of the part of the image where the template is located. Even when the template is not a part of the image, the nearest matched location of the template in the image can be traced.

C. Localization of the EBC Region

In a facial image, the eyebrows are the second darkest region after the hair region. The hair region is easily segmented by setting a very low threshold in the histogram-based thresholding algorithm. The eye regions are also segmented by thresholding. A search for a dark narrow template can easily localize the eyebrows. Note that the localization of the eyebrow is essential for determining its length. This will be undertaken in the next section.
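For concreteness, the clustering step of Section II-A can be sketched as follows. This is a minimal illustrative implementation of the two-cluster FCM iteration of (1)–(5) over 5-D pixel features, assuming numpy; the function name, the random initialization, and the stopping tolerance are our own choices and are not part of the original system (which additionally converts the image to L*a*b and median-filters it first).

```python
import numpy as np

def fcm_lip_segmentation(features, m=2.0, max_iter=100, tol=1e-5):
    """Two-cluster fuzzy C-means over 5-D pixel features (L*, a*, b*, x, y).

    features : (N, 5) array, one row per pixel.
    Returns the lip/nonlip memberships of every pixel and the two centers.
    """
    n = features.shape[0]
    # Initial memberships: random, with L(x_k) + NL(x_k) = 1 as in (1).
    u = np.random.rand(n, 2)
    u /= u.sum(axis=1, keepdims=True)

    for _ in range(max_iter):
        # Cluster centers, cf. (2) and (3): weighted centroids with weights u^m.
        w = u ** m
        centers = (w.T @ features) / w.sum(axis=0)[:, None]      # shape (2, 5)

        # Membership update, cf. (4) and (5).
        d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)                                # avoid division by zero
        ratio = (d2[:, :, None] / d2[:, None, :]) ** (1.0 / (m - 1.0))
        u_new = 1.0 / ratio.sum(axis=2)

        # Stop when memberships (equivalently, the centers) stop changing.
        if np.abs(u_new - u).max() < tol:
            u = u_new
            break
        u = u_new
    return u, centers
```

A thresholded version of the returned lip memberships then gives the binary mouth mask used for the MO measurement of Section III.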

III. DETERMINATION OF FACIAL ATTRIBUTES


In this section, we present a scheme for the measurements of
facial extracts such as MO, EO, and the length of EBC.

A. Determination of MO
After segmentation of the mouth region, we plot the aver-
age intensity profile against the MO. The dark region in the
segmented image represents the lip profile, whereas the white
regions embedded in the dark region indicate the teeth. Noisy
images, however, may include false white patches. Fig. 4, for
instance, includes a white patch on the lip region.
Determination of MO in a black and white image is easier
because of the presence of the white teeth. A plot of the average
intensity profile against the MO reveals that the curve has
several minima, out of which the first and third correspond
to the inner region of the top lip and the inner region of the
bottom lip, respectively. The difference between the preceding
two measurements along the Y-axis gives a measure of the MO.
An experimental instance of MO is shown in Fig. 4, where the
pixel count between the thick horizontal lines gives a measure
of MO. When no white band is detected in the mouth region,
MO is set to zero. When only two minima are observed in the
plot of average intensity, the gap between the two minima is the
measure of MO.
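A rough sketch of this rule, under the assumption that the segmented mouth region is available as a grayscale array whose rows index the y-position, might look as follows; the white threshold and the simple local-minimum test are illustrative choices, not taken from the paper.

```python
import numpy as np

def mouth_opening(mouth_patch, white_thresh=225):
    """Estimate MO (in pixels) from a grayscale, already segmented mouth patch.

    Follows Section III-A: compute the average intensity of each row, locate
    the local minima of the profile, and read the opening as the row distance
    between the relevant minima.
    """
    profile = mouth_patch.mean(axis=1)                 # average intensity per row

    # No white band (no visible teeth) anywhere in the region -> MO = 0.
    if mouth_patch.max() < white_thresh:
        return 0

    # Indices of strict local minima of the profile.
    minima = [i for i in range(1, len(profile) - 1)
              if profile[i] < profile[i - 1] and profile[i] < profile[i + 1]]

    if len(minima) >= 3:
        # First and third minima: inner edges of the top and bottom lips.
        return minima[2] - minima[0]
    if len(minima) == 2:
        # Only two minima observed: their gap is taken as the MO.
        return minima[1] - minima[0]
    return 0
```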

B. Determination of EO
After the localization of the eyes, the count of dark pixels
(intensity < 30) plus the count of white pixels (intensity > 225)
is plotted against the x-position. If the peak of this plot occurs
at x = a, then the ordinate at x = a provides a measure of the EO (Fig. 6).

Fig. 6. Determination of the EO.
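The EO measurement can be sketched in the same spirit. The intensity thresholds 30 and 225 are the ones quoted above; the function name and the assumption that the localized eye region is a grayscale array with columns indexing the x-position are ours.

```python
import numpy as np

def eye_opening(eye_patch, dark_thresh=30, white_thresh=225):
    """EO as the peak of the per-column count of dark plus white pixels
    (Section III-B); columns of eye_patch index the x-position."""
    counts = ((eye_patch < dark_thresh) | (eye_patch > white_thresh)).sum(axis=0)
    return counts.max()   # ordinate of the plot at its peak x = a
```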

C. Determination of the Length of EBC


Constriction in the forehead region can be explained as a
collection of white and dark patches called hilly and valley
regions, respectively. The valley regions are usually darker than
the hilly regions. Usually, the width of the patches is around
10–15 pixels for a given facial image of 512 × 512 pixels.
Let Iav be the average intensity in a selected rectangular
profile on the forehead, and let Iij be the intensity of pixel
(i, j). To determine the length of EBC on the forehead region,
we scan for variation in intensity along the x-axis of the se-
lected rectangular region. The maximum x-width that includes
variation in intensity is defined as the length of EBC. The length
of the EBC has been measured in Fig. 7 by using the preceding principle. An algorithm for EBC is presented as follows (a code-level sketch is given after the enumerated steps).

Fig. 7. Determination of EBC in the selected rectangular patch, identified by image segmentation.
1) Take a narrow strip over the eyebrow region with thickness two-thirds of the width of the forehead, which is determined by the maximum count of pixels along the length of projections from the hairline edge to the top edges of the eyebrows.
2) The length l of the strip is determined by identifying its intersection with the hair regions at both ends. Determine the center of the strip, and select a window of x-length 2l/3 symmetric with respect to the center.
3) For x-positions central to the window-right-end, do the following.
   a) Select nine vertical lines in the window and compute the average intensity on each line.
   b) Calculate the variance of the nine average intensity values.
   c) If the variance is below a threshold, stop. Else, shift one pixel right.

4) Determine the total right shift.


5) Following a procedure similar to step 3, determine the
total left shift.
6) Compute length of EBC = total left shift +
total right shift.
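A possible code-level reading of steps 1)–6) is given below. The algorithm statement leaves some details open (e.g., exactly how the nine vertical lines are placed within the sliding window and what variance threshold is used), so this sketch fixes those details arbitrarily and should be read as one plausible interpretation rather than the authors' implementation.

```python
import numpy as np

def ebc_length(strip, var_thresh=50.0):
    """Length of EBC from a forehead strip (2-D grayscale array whose columns
    span the strip selected in step 1).  var_thresh is illustrative only."""
    l = strip.shape[1]
    win = 2 * l // 3            # window of x-length 2l/3 (step 2)
    centre = l // 2

    def scan(step):
        """Shift the window one pixel at a time in the given direction until
        the variance of nine vertical-line averages drops below threshold."""
        shift = 0
        while True:
            start = centre - win // 2 + step * shift
            stop = start + win
            if start < 0 or stop > l:
                break
            cols = np.linspace(start, stop - 1, 9).astype(int)   # nine vertical lines
            line_means = strip[:, cols].mean(axis=0)             # average intensity per line
            if line_means.var() < var_thresh:                    # step 3c
                break
            shift += 1
        return shift

    total_right = scan(+1)        # steps 3-4
    total_left = scan(-1)         # step 5
    return total_left + total_right   # step 6
```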

IV. FUZZY RELATIONAL MODEL FOR EMOTION DETECTION

In this section, the encoding of facial attributes and their mapping to the emotion space are described using Mamdani-type implication relations.

A. Fuzzification (Encoding) of Facial Attributes

The measurements we obtain on MO, EO, and EBC are encoded into three distinct fuzzy sets: HIGH, LOW, and MODERATE. The typical membership functions [29] that we have used in our simulation are presented below. For any continuous feature x, we have

HIGH(x) = 1 − exp(−ax),  a > 0
LOW(x) = exp(−bx),  b > 0
MODERATE(x) = exp(−(x − x_mean)^2 / (2σ^2))

where x_mean and σ^2 are the mean and variance of the parameter x, respectively.

For the best performance, we need to determine the optimal values of a, b, and σ. Details of these are discussed in Section VI.

B. Fuzzy Relational Model for Emotion Detection

Examination of a large facial database reveals that the degree of a specific human emotion, such as happiness or anger, greatly depends on the degree of MO, EO, and the length of EBC. The following two sample rules describe the problem of mapping from the fuzzified measurement space of facial extracts to the fuzzified emotion space.

Rule 1: IF (eye-opening is MODERATE) & (mouth-opening is SMALL) & (eyebrow-constriction is LARGE) THEN emotion is VERY-MUCH-DISGUSTED.
Rule 2: IF (eye-opening is LARGE) & (mouth-opening is SMALL/MODERATE) & (eyebrow-constriction is SMALL) THEN emotion is VERY-MUCH-HAPPY.

Since each rule contains antecedent clauses of three fuzzy variables, their conjunctive effect is taken into account to determine the fuzzy relational matrix. The general formulation of a production rule with an antecedent clause of three linguistic variables and one consequent clause of a single linguistic variable is discussed below. Consider, for instance, the following fuzzy rule.

If x is A and y is B and z is C Then w is D.

Fig. 8. Structure of a fuzzy relational matrix.

Let A(x), B(y), C(z), and D(w) be the membership distributions of linguistic variables x, y, z, and w belonging to A, B, C, and D, respectively. Then, the membership distribution of the clause "x is A and y is B and z is C" is given by t[A(x), B(y), C(z)], where t denotes the fuzzy t-norm operator [29]. Using the Mamdani-type implication operator, the relation between the antecedent and consequent clauses for the given rule is described by

R(x, y, z; w) = Min [t (A(x), B(y), C(z)), D(w)].    (6)

Taking Min as the t-norm, we can rewrite the preceding expression as

R(x, y, z; w) = Min [Min (A(x), B(y), C(z)), D(w)] = Min [A(x), B(y), C(z), D(w)].    (7)

Now, given an unknown distribution of (A′(x), B′(y), C′(z)), where A′ ≈ A, B′ ≈ B, and C′ ≈ C, we can evaluate D′(w) by the following fuzzy relational equation:

D′(w) = Min (A′(x), B′(y), C′(z)) o R(x, y, z; w).    (8)

For discrete systems, the relation R(x, y, z; w) is represented by a matrix (Fig. 8), where x_i, y_i, z_i, and w_i denote specific arguments (corresponding to the variables x, y, z, and w, respectively).

In our proposed application, the row index of the relational matrix is represented by conjunctive sets of values of MO, EO, and EBC. The column index of the relational matrix denotes the possible values of six emotions: anxiety, disgust, fear, happiness, sadness, and surprise.

For determining the emotion of a person, we define two vectors: the fuzzy descriptor vector F and the emotion vector M. The structural forms of these two vectors are given as

F = [S(eo) M(eo) L(eo) S(mo) M(mo) L(mo) S(ebc) M(ebc) L(ebc)]    (9)

where S, M, and L stand for SMALL, MEDIUM, and LARGE, respectively. We also have

M = [VA(emotion) MA(emotion) N-SoA(emotion) VD(emotion) MD(emotion) N-SoD(emotion) VAf(emotion) MAf(emotion) N-SoAf(emotion) VH(emotion) MH(emotion) N-SoH(emotion) VS(emotion) MS(emotion) N-SoS(emotion) VSr(emotion) MSr(emotion) N-SoSr(emotion)]    (10)

where V, M, and N-So denote VERY, MODERATELY, and NOT-SO, and A, D, Af, H, S, and Sr denote ANXIOUS, DISGUSTED, AFRAID, HAPPY, SAD, and SURPRISED, respectively.

The relational equation used for the proposed system is given by

M = F′ o R_FM    (11)

where R_FM is the fuzzy relational matrix with the row and column indices as described above, and the ith component of the F′ vector is given by Min{A′(x_i), B′(y_i), C′(z_i)}, where the variables x_i, y_i, z_i ∈ {eo, mo, ebc}, and the fuzzy sets A′, B′, C′ ∈ {S, M, L} are determined by the premise of the ith fuzzy rule.

Given an F′ vector and the relational matrix R_FM, we can easily compute the encoded emotion vector M using the preceding relational equation. Finally, to determine the membership of the emotions from their fuzzy memberships, we need to employ a decoding scheme; for example, for happiness,

HAPPY(emotion) = [VH(emotion)·w1 + MH(emotion)·w2 + N-SoH(emotion)·w3] / (w1 + w2 + w3)    (12)

where w1, w2, and w3 denote the weights of the respective graded memberships, which in the present context have (arbitrarily) been set to 0.33 each.

V. EXPERIMENTS AND RESULTS

The experiment is conducted in a laboratory environment, where illumination, sound, and temperature are controlled to maintain uniformity in the experimental conditions. Most of the subjects of the experiments are students, young faculty members, and family members of the faculty. The experiment includes two sessions: a presentation session followed by a face-monitoring session. In the presentation session, audiovisual clips from commercial films are projected on a screen in front of individual subjects as a stimulus to excite their brain for the arousal of emotion. A computer-controlled pan-tilt-type high-resolution camera is used for the online monitoring of the facial expressions of the subjects in the next phase. The grabbed images of facial expressions are stored in a computer for feature analysis in the subsequent phase.

Experiments were undertaken over a period of two years to identify the appropriate audiovisual movie clips that cause arousal of six different emotions: anxiety, disgust, fear, happiness, sadness, and surprise. A questionnaire was prepared to determine the consensus of the observers about the arousal of the first five emotions using a given set of audiovisual clips. It includes questions to a given observer on the percentage level of excitation of different emotions by a set of 60 audiovisual movie clips. The independent responses of 50 observers were collected, and the results are summarized in Table I. Clearly, the percentages in each row total 100.

TABLE I. ASSESSMENT OF THE AROUSAL POTENTIAL OF SELECTED AUDIOVISUAL MOVIE CLIPS IN EXCITING DIFFERENT EMOTIONS

The arousal of surprise, however, requires a subject to possess prior background information about an object or a scene, and arousal starts when the object/scene significantly differs from the general expectation. The main difficulty in getting someone surprised with a movie clip is that the clip should be long enough to prepare the background knowledge before a "strange" scene is presented. To eliminate possible errors in the selection of the stimulus due to background differences among the subjects, it is reasonable to employ alternative means, rather than audiovisual movie clips, to cause arousal of surprise. Our experiments showed that an attempt at recognizing lost friends (usually schoolmates) from their current photographs causes arousal of surprise.

To identify the right movie clips capable of exciting specific emotions, we need to define a few parameters that would help indicate a consensus of the observers about the arousal of an emotion. We have the following.

O_ji,k  Percentage level of excitation of emotion j by an observer k using audiovisual clip i.
E_ji  Average percentage score of excitation assigned to emotion j by n observers using clip i.
σ_ji  Standard deviation of the percentage scores assigned to emotion j by all the subjects using clip i.
n  Total number of observers.

E_ji and σ_ji are evaluated using the following expressions:

E_ji = \sum_{k=1}^{n} O_{ji,k} / n    (13)

σ_ji = \sqrt{ \sum_{k=1}^{n} (O_{ji,k} − E_{ji})^2 / n }.    (14)

The emotion w for which E_wi = max_j {E_ji} is the most likely aroused emotion due to excitation by audiovisual clip i. The E_wi values are obtained for i = 1 to 60.

Fig. 9. Movie clips containing four frames (row-wise) used to excite anxiety, disgust, fear, happiness, and sadness.

TABLE II. THEME OF THE MOVIE CLIPS CAUSING STIMULATION OF SPECIFIC EMOTIONS

We next select six audiovisual clips from the pool of 60 movie samples such that the selected movies best excite six specific emotions. The selection was made by using the average-to-standard-deviation ratio for competitive audiovisual clips employed to excite the same emotion. The audiovisual clip for which the average-to-standard-deviation ratio is the largest is considered to be the most significant sample to excite a desired emotion.
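Assuming the observer responses are held in a clips × observers × emotions array, the statistics of (13) and (14) and the average-to-standard-deviation selection rule can be sketched as follows; the array layout and the function name are illustrative only.

```python
import numpy as np

def select_best_clip(scores, emotion):
    """Pick, for one target emotion, the clip whose mean-to-standard-deviation
    ratio of observer scores is largest (Section V).

    scores : array of shape (clips, observers, emotions) holding O_{ji,k}, the
             percentage excitation of emotion j reported by observer k for clip i.
    """
    E = scores.mean(axis=1)        # E_{ji}, cf. (13): clips x emotions
    sigma = scores.std(axis=1)     # sigma_{ji}, cf. (14), population std (divide by n)

    # Clips whose most likely aroused emotion is the target one.
    candidates = np.where(E.argmax(axis=1) == emotion)[0]

    # Among those, maximize the average-to-standard-deviation ratio.
    ratio = E[candidates, emotion] / sigma[candidates, emotion]
    return candidates[ratio.argmax()]
```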
Fig. 9 presents the five most significant audiovisual movie
clips selected from the pool of 60 movies, where each clip
was the most successful to excite one of the five emotions.
Table II summarizes the theme of the selected movies.

Fig. 10. Arousal of anxiety, disgust, fear, happiness, and sadness using the stimulator in Fig. 9 (female subject).

The selected clips are presented before 300 people, and their facial expressions are recorded. Figs. 10 and 11 show two representative examples of one female and one male in the age group 22–25 years.

Image segmentation is used to segment the mouth region, eye region, and eyebrow region of the individual frames for each clip. MO, EO, and the length of EBC are then determined for the individual frames of each clip. The averages of EO, MO, and EBC over all the frames under a recorded emotion clip are then evaluated. The memberships of EO, MO, and EBC in the three fuzzy sets (Low, Medium, and High) are then evaluated using the membership functions given in Section IV. The results of membership evaluation for the emotion clips given in Fig. 10 are presented in Table III.

After evaluation of the memberships, we determine the emotion vector M by using (11) and then employ the decoding rule [see (12)] to determine the membership of the different emotions for the five clips. The emotion that comes up with the highest value is regarded as the emotion of the individual clip. The preceding analysis is repeated for 300 people, including 100 children, 100 adult males, and 100 adult females, and the results of emotion classification are presented in Tables IV–VI.
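A minimal sketch of this recognition back end, i.e., the max–min composition of (11) followed by the graded decoding of (12), is given below. The toy numbers and the 3 × 3 slice of R_FM are invented purely for illustration; the actual relational matrix is built from the fuzzy rules of Section IV-B.

```python
import numpy as np

def max_min_compose(f, R):
    """Fuzzy max-min composition M = F' o R_FM, cf. (11).
    f : (r,) rule-premise strengths; R : (r, c) relational matrix."""
    return np.max(np.minimum(f[:, None], R), axis=0)

def decode_emotion(very, moderately, not_so, w=(0.33, 0.33, 0.33)):
    """Graded-membership decoding of one emotion, cf. (12)."""
    w1, w2, w3 = w
    return (very * w1 + moderately * w2 + not_so * w3) / (w1 + w2 + w3)

# Toy run with made-up numbers (illustration only):
# f = premise strengths of three rules, R = a 3-rule x 3-grade slice for "happy".
f = np.array([0.7, 0.2, 0.5])
R = np.array([[0.8, 0.4, 0.1],
              [0.3, 0.6, 0.2],
              [0.5, 0.5, 0.3]])
VH, MH, NSoH = max_min_compose(f, R)
print(decode_emotion(VH, MH, NSoH))   # scalar membership of "happy"
```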

Fig. 11. Arousal of anxiety, disgust, fear, happiness, and sadness using the stimulator in Fig. 9 (male subject).

TABLE III
SUMMARY OF RESULTS OF MEMBERSHIP EVALUATION FOR THE EMOTIONS IN FIG. 10

TABLE IV
RESULTS OF PERCENTAGE CLASSIFICATION OF AROUSED EMOTIONS IN SIX CLASSES FOR ADULT MALES

TABLE V
RESULTS OF PERCENTAGE CLASSIFICATION OF AROUSED EMOTIONS IN SIX CLASSES FOR ADULT FEMALES

TABLE VI
RESULTS OF PERCENTAGE CLASSIFICATION OF AROUSED EMOTIONS IN SIX CLASSES FOR CHILDREN IN AGE GROUP 8–12 YEARS

Each of these tables shows six "aroused" emotions, whereas the "desired" emotion refers to the emotion tag of the most significant audiovisual sample.

The experimental results obtained from Tables IV–VI reveal that the accuracies in the classification of emotion for adult males, adult females, and children (8–12 years) are 88.2%, 92.2%, and 96%, respectively. The classification accuracies obtained in this paper are better than the existing results on accuracy reported elsewhere [19], [20], [26], [43].

VI. VALIDATION OF THE SYSTEM PERFORMANCE

After a prototype design of an intelligent system is complete, we need to validate its performance. The term validation here refers to building the right system, i.e., one that truly resembles the system intended to be built. In other words, validation refers to the relative performance of the system that has been designed, and suggests reformulation of the problem characteristics and concepts based on the deviation of its performance from that of the desired (ideal) system [18]. It has experimentally been observed that the performance of the proposed system greatly depends on the parameters of the fuzzy encoders [28]. To determine optimal settings of the parameters, a scheme for the validation of the system's performance is proposed in Fig. 12.

In Fig. 12, we tune the parameters a, b, x_mean (or m), and σ of the fuzzy encoders by a supervised learning algorithm, so as to generate the desired emotion from given measurements of the facial extract. The backpropagation algorithm has been employed to experimentally determine the parameters a, b, m, and σ. The feedforward neural network used for the realization of the backpropagation algorithm has three layers, with 26 neurons in the hidden layer. The numbers of neurons in the input and output layers are determined by the dimensions of the F′ vector and the M vector, respectively. The root mean square error accuracy of the algorithm was set to 0.001.
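The following is a generic sketch of the feedforward network trained by backpropagation that the validation scheme relies on, assuming the nine-component descriptor of (9) as input and the 18-component emotion vector of (10) as output. The full procedure of Fig. 12, in which the encoder parameters a, b, m, and σ are themselves adjusted through this network, is not reproduced here; all names and initialization choices below are ours.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_backprop(X, T, hidden=26, lr=0.1, target_rmse=0.001, max_epochs=50000):
    """Three-layer feedforward network trained by plain backpropagation.

    X : (N, 9) fuzzy descriptor vectors F'; T : (N, 18) desired emotion vectors M.
    Training stops at the 0.001 root-mean-square-error accuracy quoted above.
    """
    rng = np.random.default_rng(0)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    W2 = rng.normal(0.0, 0.5, (hidden, T.shape[1]))

    for _ in range(max_epochs):
        H = sigmoid(X @ W1)            # hidden-layer activations
        Y = sigmoid(H @ W2)            # network output
        err = Y - T
        if np.sqrt((err ** 2).mean()) <= target_rmse:
            break
        # Backpropagated gradients for the sigmoid/MSE network.
        dY = err * Y * (1 - Y)
        dH = (dY @ W2.T) * H * (1 - H)
        W2 -= lr * H.T @ dY / len(X)
        W1 -= lr * X.T @ dH / len(X)
    return W1, W2
```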

For a sample space of approximately 100 known emotions of different persons, the experiment is conducted, and the following values of the parameters are found to yield the best results: a = 2.2, b = 1.9, x_mean = 2.0, and σ = 0.17.

Fig. 12. Validation of the proposed system by tuning the parameters a, b, m, and σ of the encoders.

VII. EMOTION TRANSITION AND ITS CONTROL

The emotion of a person at a given time is determined by the current state of his/her mind. The current state of one's mind is mainly controlled by the positive and negative influences of input sensory data, including voice, video clips, ultrasonic signals, and music. We propose a model of emotion transition dynamics where the strength of an emotion at time t + 1 depends on the strength of all emotions of a subject at time t, and the positive/negative influences that have been applied as stimulation at time t.

A. Model

We have the following.

m_i(t)  Positive unnormalized singleton membership of the ith emotion at time t.
[w_ij]  Weight matrix of dimension n × n, where w_ij denotes a cognitive (memory-based) degree of transition from the ith to the jth emotional state, and is a signed finite real number.
POS-IN(strength_k, t)  Fuzzy membership distribution of an input stimulus with strength k to act as a positive influence at time t.
NEG-IN(strength_l, t)  Fuzzy membership distribution of an input stimulus with strength l to act as a negative influence at time t.
b_ik  Weight representing the influence of an input with strength k on the ith emotional state.
c_il  Weight representing the influence of an input with strength l on the ith emotional state.

The unnormalized membership value of an emotional state i at time t + 1 can be expressed as a function of the unnormalized membership values of all possible emotional states j, and the membership distributions of the input positive and negative influences at time t, i.e.,

m_i(t + 1) = \sum_{\forall j} w_{i,j} · m_j(t) + \sum_{\forall k} b_{i,k} · POS-IN(strength_k, t) − \sum_{\forall l} c_{i,l} · NEG-IN(strength_l, t).    (15)

The first term on the right-hand side of (15) accounts for the cognitive transition of emotional states, which concerns human memory, whereas the second and third terms indicate the effect of external influence on the membership of the ith emotional state. The weights w_ij of the cognitive memory are considered time invariant. Since the w_ij's are time invariant, controlling the transition of emotional states can be accomplished by POS-IN(strength_k, t) and NEG-IN(strength_l, t).

The positive terms on the right side of (15) indicate that with a growth in m_j and POS-IN(strength_k, t), m_i(t + 1) also increases. The negative sign of the third term signifies that with a growth in NEG-IN(strength_l, t), m_i(t + 1) decreases.

A person's emotional state changes from happy to anxious on a negative input, e.g., when he/she runs the risk of losing something. An anxious person becomes sad when he suffers a loss. A sad person becomes disgusted when he realizes that he is not responsible for his loss/failure. In other words, with increasing negative influence (neg), the human emotion undergoes a transition in the following order:

disgusted ←(neg) sad ←(neg) anxious ←(neg) happy.

Alternatively, with increasing positive influence (pos), the human emotion has a gradual transition from the disgusted state to the happy state, as follows:

disgusted (pos)→ sad (pos)→ anxious (pos)→ happy.

Combining the preceding two state transition schemes, we can represent emotion transitions by a graph (see Fig. 13).
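One update of this dynamics can be sketched directly from (15), written in the vector form that is formalized as (16a) and (16b) below; the function signature and the assumption that the maximum membership stays positive are ours.

```python
import numpy as np

def emotion_step(m, W, B, C, pos_in, neg_in):
    """One update of the unnormalized emotion memberships, cf. (15).

    m       : (n,) memberships m_i(t)
    W       : (n, n) cognitive transition weights w_ij
    B, C    : (n, k) and (n, l) influence weights b_ik, c_il
    pos_in  : (k,) POS-IN membership distribution at time t
    neg_in  : (l,) NEG-IN membership distribution at time t
    """
    m_next = W @ m + B @ pos_in - C @ neg_in
    # Scaling of (16b): keep the memberships normalized to [0, 1]
    # (assumes the maximum component is positive).
    return m_next / m_next.max()
```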

Fig. 13. Proposed emotion transition graph.

We have the following.

M = [m_i]  Unnormalized membership vector of dimension n × 1, whose ith element denotes the unnormalized singleton membership of emotion i at time t.
µ = [POS-IN(strength_k, t)]  Positive influence membership vector of dimension m × 1, whose kth component denotes the fuzzy membership of strength k of the input stimulus.
µ′ = [NEG-IN(strength_l, t)]  Negative influence membership vector of dimension m × 1, whose lth component denotes the fuzzy membership of strength l of the input stimulus.
B = [b_ij]  n × m companion matrix to the µ vector.
C = [c_ij]  n × m companion matrix to the µ′ vector.

Considering the emotion transition dynamics [see (15)] for i = 1 to n, we can represent the complete system of emotion transition in vector–matrix form as

M(t + 1) = W·M(t) + Bµ − Cµ′.    (16a)

The weight matrix W in (16a) is the n × n matrix [w_ij] of cognitive transition weights introduced above.

To keep the membership vector M normalized, we use the following scaling operation:

M_S(t + 1) = M(t + 1) / \max_{i=1}^{n} {m_i(t + 1)}.    (16b)

Normalization of the memberships in [0, 1] is needed for convenience of interpretation, but is not directly related to the emotion control problem undertaken here.

A control-theoretic representation of the emotion transition dynamics [(16a) and (16b)] is given in Fig. 14, where the system has an inherent delayed positive feedback along with provision for control with the external input vectors µ and µ′.

B. Properties of the Model

In an autonomous system, the system states change without application of any control inputs. The emotion transition dynamics can be compared with an autonomous system with a setting of µ = µ′ = 0. The limit cyclic behavior of the autonomous emotion transition dynamics is given in the following theorem.

Theorem 1: The vector M(t + 1) in the autonomous emotion transition dynamics with µ = µ′ = 0 exhibits limit cyclic behavior after every k iterations if W^k = I.

Proof: Since µ = µ′ = 0, we can rewrite (16a) as

M(t + 1) = W·M(t).

Iterating from t = 0 to (k − 1), we have

M(1) = W·M(0)
M(2) = W·M(1) = W^2·M(0)
...
M(k) = W^k·M(0).

Since the membership vector exhibits limit cyclic behavior after every k iterations, we have

M(k) = M(0)

which in turn requires W^k = I.

Theorem 1 indicates that without external perturbation, the cognitive memory helps in maintaining a recurrent relation in emotional states under a restrictive selection of memory weights satisfying W^k = I.

For steering the system states in Fig. 13 toward the happy state, we need to provide positive influences in the state diagram at any state. Similarly, for controlling the state transitions toward the disgusted state from any state s, we submit a negative influence at state s. This, however, demands a prerequisite of controllability of the membership state vector M. The controllability of a given state vector M to a desired state is examined by Theorem 2.

Theorem 2: The necessary and sufficient condition for the state transition system to be controllable is that the controllability matrices

P = [B  WB  W^2·B  ...  W^{n−1}·B]
Q = [C  WC  W^2·C  ...  W^{n−1}·C]

should have rank equal to n.

Proof: The proof follows from the test criterion of controllability of linear systems [27].

C. Emotion Control by Mamdani's Model

In a closed-loop process control system [21], error is defined as the difference between the set point (reference input) and the process response, and the task of a controller is to gradually reduce the error toward zero. When the emotional state transition is regarded as a process, we define error as the qualitative difference between the desired and the current emotional states. To quantitatively represent error, we attach an index to the individual emotional states in Fig. 13, such that when the error is positive (negative), we can guide the error toward zero by applying a positive (negative) influence. One possible indexing scheme that satisfies the above principle is given in Table VII.

Fig. 14. Emotion transition dynamics.


TABLE VII. INDEXING OF EMOTIONAL STATES

We can now quantify error as the difference between the desired emotional index (DEI) and the current emotional index (CEI). For example, if the desired emotion is happiness and the current emotion is sadness, then the error is e = DEI − CEI = 4 − 2 = 2. Similarly, when the desired emotion is disgust and the current emotion is happiness, the error is e = DEI − CEI = 1 − 4 = −3.

Naturally, for the generation of control signals, we need to consider both the sign and the magnitude of the error. To eliminate the effect of noise in controlling emotion, instead of directly using the signed errors, we fuzzify the magnitude of the error into four fuzzy sets (i.e., SMALL, MODERATE, LARGE, and VERY LARGE) and the sign of the error into two fuzzy sets (i.e., POSITIVE and NEGATIVE) using nonlinear (Gaussian-type) membership functions. The nonlinearity of the membership functions eliminates the small Gaussian noise (with zero mean and small variance) over the actual measurement of error. Further, to generate the fuzzy control signals µ and µ′, we need to represent the strength of positive and negative influences in four fuzzy sets (i.e., SMALL, MODERATE, LARGE, and VERY LARGE) as well. Tables VIII and IX provide a list of the membership functions used for fuzzy control.

In this paper, parameter selection of the membership functions has been achieved by trial and error. To attain the best performance of the controller, we considered 50 single-step control instances and found that the settings in Table X gave the best performance for all the 50 instances.

Let AMPLITUDE and SIGN be two fuzzy universes of error, and let STRENGTH be a fuzzy universe of positive/negative influences. Here, SMALL, MODERATE, LARGE, and VERY LARGE are fuzzy subsets of AMPLITUDE and STRENGTH, whereas POSITIVE and NEGATIVE are fuzzy subsets of the universe SIGN. Let x, y, and z be fuzzy linguistic variables, with x and y denoting error and z denoting positive/negative influence. Let A_i and C_i be any singleton fuzzy subsets of {SMALL, MODERATE, LARGE, VERY-LARGE}, and let B_i be a singleton subset of {POSITIVE, NEGATIVE}. The general form of the ith fuzzy control rule is as follows.

Rule R_i: If x is A_i and y is B_i Then z is C_i.

Typical examples of such rules are given below.

Rule 1: If error is SMALL and error is POSITIVE Then apply positive influence of SMALL strength.
Rule 2: If error is MODERATE and error is NEGATIVE Then apply negative influence of MODERATE strength.

The Mamdani-type implication relation for Rule_i is now given by

R_i(x, y; z) = Min [Min (A_i(x), B_i(y)), C_i(z)].    (17)

Now, for a given distribution of A′(x) and B′(y), where A′ ≈ A_i and B′ ≈ B_i, we can evaluate

C′_i(z) = Min (A′(x), B′(y)) o R_i(x, y; z).    (18)

To take the aggregation of all the n rules, we determine

C′(z) = \max_{i=1}^{n} C′_i(z).    (19)

Example 1: In this example, we illustrate the construction of R_i(x, y; z) and the evaluation of C′_i(z). Given

POSITIVE(error) = {1/0.2, 2/0.5, 3/0.6}
SMALL(error) = {0/0.9, 1/0.1, 2/0.01, 3/0.005}
SMALL(pos-in) = {2/0.9, 10/0.6, 20/0.21, 30/0.01}.

TABLE VIII
MEMBERSHIP FUNCTIONS OF MAGNITUDE OF ERROR AND UNSIGNED STRENGTH OF POSITIVE/NEGATIVE INFLUENCE

TABLE IX
MEMBERSHIP FUNCTIONS OF SIGN OF ERROR

TABLE X
SELECTED PARAMETERS OF THE MEMBERSHIP FUNCTIONS

The relational matrix R(error, error; pos-in) can now be evaluated by Mamdani's implication function as follows:

Min [POSITIVE(error), SMALL(error)]
= {(1, 0)/0.2, (1, 1)/0.1, (1, 2)/0.01, (1, 3)/0.005;
   (2, 0)/0.5, (2, 1)/0.1, (2, 2)/0.01, (2, 3)/0.005;
   (3, 0)/0.6, (3, 1)/0.1, (3, 2)/0.01, (3, 3)/0.005}.

Using (17), the R(error, error; pos-in) matrix is now obtained as

(e, e)    pos-influence:  02      10      20      30
(1, 0)                    0.2     0.2     0.2     0.01
(1, 1)                    0.1     0.1     0.1     0.01
(1, 2)                    0.01    0.01    0.01    0.01
(1, 3)                    0.005   0.005   0.005   0.005
(2, 0)                    0.5     0.5     0.2     0.01
(2, 1)                    0.1     0.1     0.1     0.01
(2, 2)                    0.01    0.01    0.01    0.01
(2, 3)                    0.005   0.005   0.005   0.005
(3, 0)                    0.6     0.6     0.2     0.01
(3, 1)                    0.1     0.1     0.1     0.01
(3, 2)                    0.01    0.01    0.01    0.01
(3, 3)                    0.005   0.005   0.005   0.005

Let us now consider the observed membership distribution of error to be positive and small, as follows:

POSITIVE′(error) = {1/0.1, 2/0.5, 3/0.7}
SMALL′(error) = {0/0.2, 1/0.1, 2/0.4, 3/0.5}.

We can then evaluate

SMALL′(pos-in) = Min [POSITIVE′(error), SMALL′(error)] o R(error, error; pos-in)
= [0.1 0.1 0.1 0.1 0.2 0.1 0.4 0.5 0.2 0.1 0.4 0.5] o R(error, error; pos-in)
= [02/0.2 10/0.2 20/0.2 30/0.01].

To determine the strength of the audiovisual movies to be selected for presentation to the subject, we need to defuzzify (decode) the control signal C_i(pos-in) or C_i(neg-in). Let

C(pos-in) = {x_1/C(x_1), x_2/C(x_2), ..., x_n/C(x_n)}.

Then, centroidal-type defuzzification (decoding) [21] yields the value of the control signal

x_defuzz = \sum_{i=1}^{n} C(x_i)·x_i \Big/ \sum_{i=1}^{n} C(x_i).

Example 2: Let

C(pos-in) = [1/0.2, 3/0.2, 6/0.2, 9/0.1].

The defuzzification of the control signal by the center of gravity method is obtained as

x_defuzz = (1 × 0.2 + 3 × 0.2 + 6 × 0.2 + 9 × 0.1) / (0.2 + 0.2 + 0.2 + 0.1) = 4.14.
SMALL/ (error) = {0/0.2, l/0.1, 2/0.4, 3/0.5}. = 4.14.
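The two examples can be checked numerically. The following sketch (assuming numpy) rebuilds the relation of (17) from the given distributions, performs the max–min composition of (18), and applies the centroidal defuzzification, reproducing [0.2 0.2 0.2 0.01] and 4.14; the variable names are ours.

```python
import numpy as np

# Membership values from Example 1 (supports: error sign 1..3, error
# magnitude 0..3, positive-influence strength 2, 10, 20, 30).
positive_err = np.array([0.2, 0.5, 0.6])           # POSITIVE(error) at 1, 2, 3
small_err    = np.array([0.9, 0.1, 0.01, 0.005])   # SMALL(error) at 0, 1, 2, 3
small_pos    = np.array([0.9, 0.6, 0.21, 0.01])    # SMALL(pos-in) at 2, 10, 20, 30

# Mamdani relation of (17): R[(i, j), s] = min(POSITIVE(i), SMALL(j), SMALL(s)).
premise = np.minimum(positive_err[:, None], small_err[None, :]).ravel()   # 12 (i, j) pairs
R = np.minimum(premise[:, None], small_pos[None, :])                      # 12 x 4

# Observed distributions and the max-min composition of (18).
positive_obs = np.array([0.1, 0.5, 0.7])
small_obs    = np.array([0.2, 0.1, 0.4, 0.5])
f = np.minimum(positive_obs[:, None], small_obs[None, :]).ravel()
small_pos_out = np.max(np.minimum(f[:, None], R), axis=0)
print(small_pos_out)                 # -> [0.2  0.2  0.2  0.01], as in the text

# Centroidal defuzzification of Example 2.
x = np.array([1, 3, 6, 9])
mu = np.array([0.2, 0.2, 0.2, 0.1])
print((mu * x).sum() / mu.sum())     # -> 4.14...
```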

Fig. 15. Complete scheme of Mamdani-type emotion control.

D. Architecture of the Proposed Emotion Control Scheme

A complete scheme of the Mamdani-type emotion control is presented in Fig. 15. The scheme includes the emotion transition dynamics with provision for the control inputs µ and µ′, and a fuzzy controller to generate the necessary control signals. The fuzzy controller compares DEI and CEI, and their difference is passed through the AMPLITUDE and SIGN fuzzy encoders. The fuzzy relational approach to automatic reasoning introduced earlier in this paper is then employed to determine the membership distributions of C_i(pos-in) and C′_i(neg-in) using the ith fired rule. The MAX units determine the maximum of C_i(pos-in) and C′_i(neg-in) over all the fired rules, resulting in µ and µ′, which are supplied as control inputs to the emotion transition dynamics. The generation of the control signals µ and µ′ is continued until the next emotional index (NEI) is equal to the DEI. When NEI is equal to DEI, the control switches S and S′, which were closed since startup, are opened, as further transition of emotion is no longer required.

The decoding units are required to defuzzify the control signals µ and µ′. The decoding process yields the absolute strength of the audiovisual movies in the range [−a, +b], where the integers a and b are determined by system resources, as discussed in Section V. The decoding process thus selects the audiovisual movie of appropriate strength for presentation to the subjects. Note that there are two fuzzy decoders in Fig. 15. When the error is positive, only fuzzy decoder 1 is used. On the other hand, when the error is negative, fuzzy decoder 2 is used.

E. Experiments and Results

The proposed architecture (Fig. 15) was studied on 300 individuals at Jadavpur University. The experiment began with 100 audiovisual movies, each labeled with a positive/negative integer in [−a, +b], which represents the strength of the positive/negative external stimulus. The labeling was done by a group of 50 volunteers who assigned a score in [−a, +b]. The average of the scores assigned to an audiovisual movie stimulus is rounded off to the nearest integer and used as its label.

When the variance of the assigned scores for a movie is above a selected small threshold (≈ 1.8), we drop it from the list. In this manner, we selected only 60 movies, dropping 40 movie samples. This signifies that the scores obtained from the 50 volunteers are close enough for the selected 60 audiovisual movies, which indicates good accuracy of the stimulus.

The current emotion of a subject is detected by the emotion recognition scheme presented in the previous sections. The desired emotion is randomly selected for a given subject. The control scheme outlined in Section VII-C is then invoked. When fuzzy decoder 1 or 2 (Fig. 15) generates a signed score, the nearest available average-scored audiovisual movie is selected for presentation to the subject. Note that when the error e is equal to ±m (m ≤ 3), we need to select a sequence of at least m audiovisual movies until NEI becomes equal to DEI. An experimental instance of emotion control is shown in Fig. 16, where the current emotion of the subject at time t = 0 is disgust, and the desired emotion is happiness. This requires three state transitions in emotions, which can be undertaken by presenting three audiovisual movies of strength +28 units, +16 units, and +08 units, in succession, to the subject. Here, DEI = 4 and CEI = 1; therefore, the error e = DEI − CEI = 3 > 0. Since the error is positive, we apply positive instances of suitable strength as decided by the fuzzy controller. If the error were negative, then the fuzzy controller would have selected audiovisual stimuli of negative strength.

Two interesting points of the experiment include 1) good experimental accuracy and 2) repeatability. Experimental accuracy ensures that we could always control the error to zero. Repeatability ensures that for the same subject and the same pair of current and desired emotional states, the selected set of audiovisual movies is unique. Robustness of the control algorithm is thus established. In Fig. 16, the widths of the control pulses are 3, 2, and 2 s. At time t = 0, the error is large and positive. Therefore, the control signal generated has a positive strength of long duration. Then, with a gradual decrease in error, the strength of the control signal and its duration decrease.

Fig. 16. Submission of audiovisual stimuli of strength 28, 16, and 8 for controlling the emotion of a subject from the disgusted state (leftmost) to a final happy state through the sad and anxious states (in that order).

VIII. CONCLUSION

The merits of the proposed scheme for emotion detection lie in the segmentation of the mouth region by FCM clustering and the determination of the MO from the minima in the average-pixel-intensity plot of the mouth region. The proposed EO determination also adds value to the emotion detection system. The fuzzy relational approach to emotion detection from the facial feature space to the emotion space has a high classification accuracy of around 90%, which is better than other reported results. Because of its good classification accuracy, the proposed emotion detection scheme is expected to have applications in next-generation human–machine interactive systems.

The ranking of audiovisual stimuli considered in this paper also provides a new approach to determining the best movies to excite specific emotions.

The existing emotion detection methods hardly consider the effect of near-past stimulation on the current arousal of emotion. To overcome this problem, we submitted an audiovisual stimulus of relaxation before submission of any other movie clip to excite emotions. A state transition in emotion by an audiovisual movie thus always occurs at a relaxed state of mind, giving full effect of the current stimulus on the excitatory subsystem of the brain and causing arousal of the desired emotion with its full manifestation on the facial expression. Feature extraction from the face becomes easy when the manifestation of the facial expression truly resembles the aroused emotion.

An important aspect of this paper is the design of an emotion control scheme. The accuracy of the control scheme ensures convergence of the control algorithm with a zero error, and its repeatability ensures the right selection of the audiovisual stimulus.

The proposed scheme of emotion recognition and control can be applied in system design for two different problem domains. First, it can serve as an intelligent layer in the next-generation human–machine interactive system. Such a system would have extensive applications in the frontier technology of pervasive and ubiquitous computing [42]. Second, the emotion monitoring and control scheme would be useful for psychological counseling and therapeutic applications. The pioneering works on the "structure of emotion" by Gordon [14] and the "emotional control of cognition" by Simon [36] would find a new direction with the proposed automation for emotion recognition and control.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their comments.

REFERENCES

[1] J. C. Bezdek, "Fuzzy mathematics in pattern classification," Ph.D. dissertation, Appl. Math. Center, Cornell Univ., Ithaca, NY, 1973.
[2] B. Biswas, A. K. Mukherjee, and A. Konar, "Matching of digital images using fuzzy logic," AMSE Publication, vol. 35, no. 2, pp. 7–11, 1995.
[3] M. T. Black and Y. Yacoob, "Recognizing facial expressions in image sequences using local parameterized models of image motion," Int. J. Comput. Vis., vol. 25, no. 1, pp. 23–48, Oct. 1997.
[4] C. Busso and S. Narayanan, "Interaction between speech and facial gestures in emotional utterances: A single subject study," IEEE Trans. Audio, Speech Language Process., vol. 15, no. 8, pp. 2331–2347, Nov. 2007.

[5] I. Cohen, "Facial expression recognition from video sequences," M.S. thesis, Dept. Elect. Eng., Univ. Illinois Urbana-Champaign, Urbana, IL, 2000.
[6] I. Cohen, N. Sebe, A. Garg, L. S. Chen, and T. S. Huang, "Facial expression recognition from video sequences: Temporal and static modeling," Comput. Vis. Image Underst., vol. 91, no. 1/2, pp. 160–187, Jul. 2003.
[7] C. Conati, "Probabilistic assessment of user's emotions in educational games," J. Appl. Artif. Intell., Special Issue Merging Cognition Affect HCT, vol. 16, no. 7/8, pp. 555–575, Aug. 2002.
[8] G. Donato, M. S. Bartlett, J. C. Hager, P. Ekman, and T. J. Sejnowski, "Classifying facial actions," IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 10, pp. 974–989, Oct. 1999.
[9] P. Ekman and W. V. Friesen, Unmasking the Face: A Guide to Recognizing Emotions From Facial Clues. Englewood Cliffs, NJ: Prentice-Hall, 1975.
[10] I. A. Essa and A. P. Pentland, "Coding, analysis, interpretation and recognition of facial expressions," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 757–763, Jul. 1997.
[11] W. A. Fellenz, J. G. Taylor, R. Cowie, E. Douglas-Cowie, F. Piat, S. Kollias, C. Orovas, and B. Apolloni, "On emotion recognition of faces and of speech using neural networks, fuzzy logic and the ASSESS systems," in Proc. IEEE-INNS-ENNS Int. Joint Conf. Neural Netw., 2000, pp. 93–98.
[12] J. M. Fernandez-Dols, H. Wallbott, and F. Sanchez, "Emotion category accessibility and the decoding of emotion from facial expression and context," J. Nonverbal Behav., vol. 15, no. 2, pp. 107–123, Jun. 1991.
[13] Y. Gao, M. K. H. Leung, S. C. Hui, and M. W. Tananda, "Facial expression recognition from line-based caricatures," IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 33, no. 3, pp. 407–412, May 2003.
[14] R. N. Gordon, The Structure of Emotions: Investigations in Cognitive Philosophy, ser. Cambridge Studies in Philosophy. Cambridge, U.K.: Cambridge Univ. Press, 1990.
[15] K. Izumitani, T. Mikami, and K. Inoue, "A model of expression grade for face graphs using fuzzy integral," Syst. Control, vol. 28, no. 10, pp. 590–596, 1984.
[16] F. Kawakami, S. Morishima, H. Yamada, and H. Harashima, "Construction of 3-D emotion space using neural network," in Proc. 3rd Int. Conf. Fuzzy Logic, Neural Nets Soft Comput., Iizuka, Japan, 1994, pp. 309–310.
[17] G. J. Klir and B. Yuan, Fuzzy Sets and Fuzzy Logic: Theory and Applications. Englewood Cliffs, NJ: Prentice-Hall, 1995.
[18] H. Kobayashi and F. Hara, "The recognition of basic facial expressions by neural network," Trans. Soc. Instrum. Contr. Eng., vol. 29, no. 1, pp. 112–118, 1993.
[19] H. Kobayashi and F. Hara, "Measurement of the strength of six basic facial expressions by neural network," Trans. Jpn. Soc. Mech. Eng. (C), vol. 59, no. 567, pp. 177–183, 1993.
[20] H. Kobayashi and F. Hara, "Recognition of mixed facial expressions by neural network," Trans. Jpn. Soc. Mech. Eng. (C), vol. 59, no. 567, pp. 184–189, 1993.
[21] A. Konar, Computational Intelligence: Principles, Techniques and Applications. Heidelberg, Germany: Springer-Verlag, 2005.
[22] A. F. Kramer, E. J. Sirevaag, and R. Braune, "A psycho-physiological assessment of operator workload during simulated flight missions," Hum. Factors, vol. 29, no. 2, pp. 145–160, Apr. 1987.
[31] R. W. Picard, E. Vyzas, and J. Healey, "Toward machine emotional intelligence: Analysis of affective psychological states," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 10, pp. 1175–1191, Oct. 2001.
[32] P. Rani, N. Sarkar, and J. Adams, "Anxiety-based affective communication for implicit human–machine interaction," Adv. Eng. Inf., vol. 21, no. 3, pp. 323–334, Jul. 2007.
[33] P. Rani, N. Sarkar, C. Smith, and L. Kirby, "Anxiety detecting robotic systems—Towards implicit human–robot collaboration," Robotica, vol. 22, no. 1, pp. 83–93, 2004.
[34] M. Rosenblum, Y. Yacoob, and L. Davis, "Human expression recognition from motion using a radial basis function network architecture," IEEE Trans. Neural Netw., vol. 7, no. 5, pp. 1121–1138, Sep. 1996.
[35] J. Scheirer, R. Fernadez, J. Klein, and R. Picard, "Frustrating the user on purpose: A step toward building an affective computer," Interact. Comput., vol. 14, no. 2, pp. 93–118, Feb. 2002.
[36] H. Simon, Motivational and Emotional Control of Cognition, Models of Thought. New Haven, CT: Yale Univ. Press, 1979, pp. 29–38.
[37] D. Terzopoulos and K. Waters, "Analysis and synthesis of facial image sequences using physical and anatomical models," IEEE Trans. Pattern Anal. Mach. Intell., vol. 15, no. 6, pp. 569–579, Jun. 1993.
[38] Y. Tian, T. Kanade, and J. Cohn, "Recognizing action units for facial expression analysis," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 2, pp. 97–115, Feb. 2001.
[39] N. Ueki, S. Morishima, and H. Harashima, "Expression analysis/synthesis system based on emotion space constructed by multilayered neural network," Syst. Comput. Jpn., vol. 25, no. 13, pp. 95–103, 1994.
[40] O. A. Uwechue and S. A. Pandya, Human Face Recognition Using Third-Order Synthetic Neural Networks. Boston, MA: Kluwer, 1997.
[41] P. Vanger, R. Honlinger, and H. Haykin, "Applications of synergetic in decoding facial expressions of emotions," in Proc. Int. Workshop Autom. Face Gesture Recog., Zurich, Switzerland, 1995, pp. 24–29.
[42] A. Vasilakos and W. Pedrycz, Ambient Intelligence, Wireless Networking and Ubiquitous Computing. Norwood, MA: Artech House, Jun. 2006.
[43] Y. Yacoob and L. Davis, "Computing spatio-temporal representations of human faces," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recog., Jun. 1994, pp. 70–75.
[44] Y. Yacoob and L. Davis, "Recognizing human facial expression from long image sequences using optical flow," IEEE Trans. Pattern Anal. Mach. Intell., vol. 18, no. 6, pp. 636–642, Jun. 1996.
[45] H. Yamada, "Visual information for categorizing facial expression of emotion," Appl. Cogn. Psychol., vol. 7, no. 3, pp. 257–270, 1993.
[46] Z. Zeng, Y. Fu, G. I. Roisman, Z. Wen, Y. Hu, and T. S. Huang, "Spontaneous emotional facial expression detection," J. Multimedia, vol. 1, no. 5, pp. 1–8, Aug. 2006.
[23] A. Lanitis, C. J. Taylor, and T. F. Cootes, “Automatic interpretation and
coding of face images using flexible models,” IEEE Trans. Pattern Anal.
Mach. Intell., vol. 19, no. 7, pp. 743–756, Jul. 1997. Aruna Chakraborty received the M.A. degree in
[24] H. Li, P. Roivainen, and R. Forchheimer, “3D motion estimation in model- cognitive science and the Ph.D. degree on emotional
based facial image coding,” IEEE Trans. Pattern Anal. Mach. Intell., intelligence and human–computer interactions from
vol. 15, no. 6, pp. 545–555, Jun. 1993. Jadavpur University, Calcutta, India, in 2000 and
[25] X. Li and Q. Ji, “Active affective state detection and user assistance with 2005, respectively.
dynamic Bayesian networks,” IEEE Trans. Syst., Man, Cybern. A, Syst., She is currently an Assistant Professor with the
Humans, vol. 35, no. 1, pp. 93–105, Jan. 2005. Department of Computer Science and Engineering,
[26] K. Mase, “Recognition of facial expression from optical flow,” Proc. St. Thomas’ College of Engineering and Technology,
IEICE Trans., Special Issue Comput. Vis. Appl., vol. 74, no. 10, pp. 3474– Calcutta. She is also a Visiting Faculty with Jadavpur
3483, 1991. University, where she offers graduate-level courses
[27] K. Ogata, Modern Control Engineering. Englewood Cliffs, NJ: Prentice- on intelligent automation and robotics, and cognitive
Hall, 1990. science. She is writing a book with her teacher A. Konar on Emotional
[28] W. Pedrycz and J. Valente de Oliveira, “A development of fuzzy encoding Intelligence: A Cybernetic Approach, which is shortly to appear from Springer,
and decoding through fuzzy clustering,” IEEE Trans. Instrum. Meas., Heidelberg, 2009. She serves as an Editor to the International Journal of
vol. 57, no. 4, pp. 829–837, Apr. 2008. Artificial Intelligence and Soft Computing, Inderscience, U.K. Her current
[29] W. Pedrycz and F. Gomide, An Introduction to Fuzzy Sets: Analysis and research interest includes artificial intelligence, emotion modeling, and their
Design. Cambridge, MA: MIT Press, 1998. applications in next-generation human–machine interactive systems. She is a
[30] R. Picard, Affective Computing. Cambridge, MA: MIT Press, 1997. nature lover, and loves music and painting.
Amit Konar (M'97) received the B.E. degree from Bengal Engineering and Science University (B.E. College), Howrah, India, in 1983 and the M.E. Tel. E., M. Phil., and Ph.D. (Engineering) degrees from Jadavpur University, Calcutta, India, in 1985, 1988, and 1994, respectively.
In 2006, he was a Visiting Professor with the University of Missouri, St. Louis. He is currently a Professor with the Department of Electronics and Tele-communication Engineering (ETCE), Jadavpur University, where he is the Founding Coordinator of the M.Tech. program on intelligent automation and robotics. He has supervised ten Ph.D. theses. He has around 200 publications in international journals and conference proceedings. He is the author of six books, including the two popular texts Artificial Intelligence and Soft Computing (CRC Press, 2000) and Computational Intelligence: Principles, Techniques and Applications (Springer, 2005). He serves as the Editor-in-Chief of the International Journal of Artificial Intelligence and Soft Computing. His research areas include the study of computational intelligence algorithms and their applications to the entire domain of electrical engineering and computer science. Specifically, he has worked on fuzzy sets and logic, neurocomputing, evolutionary algorithms, Dempster–Shafer theory, and Kalman filtering, and has applied the principles of computational intelligence in image understanding, VLSI design, mobile robotics, and pattern recognition.
Dr. Konar is a member of the editorial boards of five other international journals. He was the recipient of the All India Council for Technical Education (AICTE)-accredited 1997–2000 Career Award for Young Teachers for his significant contribution to teaching and research.

Amita Chatterjee received the Ph.D. degree on "The Problems of Counterfactual Conditionals" from the University of Calcutta, West Bengal, India.
She is currently a Professor of philosophy and the Coordinator of the Center for Cognitive Science, Jadavpur University, Calcutta, India. She has been continuing her personal research and supervising Ph.D. and M. Phil. dissertations for the past twenty-six years. Books authored and edited by her include Understanding Vagueness (1994), Perspectives on Consciousness (2003), and Philosophical Concepts Relevant to Sciences, vol. 1 (2006) and vol. 2 (2008). She has contributed articles to national and international refereed journals and anthologies of repute. She is on the editorial boards of the Indian Philosophical Quarterly and the International Journal of Artificial Intelligence and Soft Computing. Her areas of interest are logic, analytical philosophy, philosophy of mind, and cognitive science. She is currently engaged in research on inconsistency-tolerant logics, human reasoning ability, consciousness studies, and modeling of perception and emotion.

Uday Kumar Chakraborty received the Ph.D. degree from Jadavpur University, India, for his work on stochastic models of genetic algorithms.
He held positions with the CAD Center, Calcutta, India; CMC Limited (Calcutta and London); Jadavpur University, Calcutta, India; and the German National Research Center for Computer Science (GMD), Bonn, Germany. He is currently an Associate Professor of computer science with the University of Missouri, St. Louis. His research interests include evolutionary computation, soft computing, scheduling, and computer graphics. He is (co)author/editor of three books and 90 articles in journals and conference proceedings. He is an Area Editor of New Mathematics & Natural Computation and an Editor of the Journal of Computing and Information Technology. He serves on the editorial boards of three other journals. He has guest edited special issues on evolutionary computation for many computer science journals and has served as track chair or program committee member of numerous international conferences.
