and the top and left image border was (24 + r) px. The image histogram was equalized to account for changes in lighting conditions.

Formulating the facial feature descriptor of [6] in terms of R_i yielded the DCT descriptor. The frequency components in each region R_i were computed using the DCT and ordered by applying a zig-zag scan. The lower and higher frequency components were discarded by dropping the first k1 coefficients and collecting the next k2 coefficients into the block features f_i = (f_{i,k1+1}, ..., f_{i,k1+k2}). To balance each region's impact on the overall classification result, the block vectors were normalized to ||f_i||_2 = 1. The feature vector F_DCT was formed by concatenating the block features f_i. Throughout the experiments we used k1 = 1 and k2 = 10.

Following [12], the second feature descriptor employs uniform LBP. Input images were processed by the LBP^{u2} operator with R = 1 and P = 8. The feature vector F_LBP is the concatenation of the histograms of labels in the regions R_i. Note that [12] uses an LBP^{u2} operator with P = 8 and R = 2, which focuses on larger-scale structures. However, since the facial images in [12] measure 110 x 150 px, this constitutes only a minor deviation. The use of variable-sized, non-square regions in [12] establishes a more substantial difference.

To obtain the Gabor descriptor, input images were processed by a filter bank of 8 orientations and 5 spatial frequencies. For each of the complex response images G^{mn}, the regions' energy content was computed as E_i^{mn} = sum_{(x,y) in R_i} ||G^{mn}(x, y)|| and then collected into block feature vectors f_i. As with the DCT descriptor, the feature vector F_Gabor was obtained by concatenating the block features after normalisation. This descriptor is fundamentally different from the one used in [9]: instead of considering the energy content of pixel regions, their descriptor is built by selecting individual filter responses using AdaBoost. Given a p_i, their resulting feature vector may contain a single filter response at that point, whereas our descriptor always includes all responses at that location.

3.2 Key Point Selection

We employed several strategies to select the regions R_i. A trivial solution was to place the p_i on a regular grid covering the whole facial area such that the regions do not overlap. Doing so resulted in the well known block based approaches used as baselines in [12] and [9]. A second approach utilized AdaBoost to select a list of discriminative regions. Initially, every pixel of the input image was considered to be a region center for feature extraction. The descriptor F consisted of W * H sub-descriptors f_i. With h_j denoting linear SVM classifiers, we associated classifiers h_{j,i}(f) = h_j(f_i) to the f_i. AdaBoost was used to select the n most discriminative h_{j,i}, which were mapped back to pixel locations to obtain a list of key points.

We trained AdaBoost in two separate ways: In per-class selection, positive samples were chosen from the target expression, while negative samples were randomly selected from every other class. This emphasised features that discriminate one expression from all the others. Expressive selection drew negative samples from the Neutral class and selected positives from all remaining classes. Doing so emphasised features that are useful for discriminating expressive from non-expressive faces. We expected per-class selection to perform better than expressive selection.

3.3 Classification

We used a 7-way forced choice to classify samples. For each expression, a third-degree polynomial kernel SVM discriminated that expression from every other category. The class of a sample was determined by choosing the classifier that produced the largest confidence. SVM parameters were found by performing a grid search and choosing the C and gamma with the highest accuracy in a 5-fold cross validation on the training set. Each class and feature selection method was allowed to produce different parameters.

4 Experimental Results

We evaluated different key point selection methods using R_i with r = 4 up to r = 12. The number of regions varied between n = 5 and n = 144 for expressive and per-class selection and was defined by r when using the grid based approach. Other feature descriptor parameters were not varied. Table 2 lists the best performing configuration for each feature extraction method.

Table 2. Comparison of best performing configurations by feature descriptor.

  Feature   Strategy    n     r   Accuracy
  DCT       per-class   96    6   0.844
  LBP       per-class   96    6   0.859
  Gabor     grid        144   4   0.862

It can be seen that Gabor features show the best recognition performance, while the DCT descriptor performs worst, with a 1.8% difference in mean accuracy. However, the DCT descriptor has a much lower dimensionality and is faster to compute than both the Gabor and LBP descriptors.

As expected, per-class key point selection is superior to expressive selection. Consistent with [12], boosted LBP features outperform the grid based approach,
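As a concrete illustration, the DCT block-feature construction (zig-zag scan, drop the first k1 coefficients, keep the next k2, L2-normalise per region) can be sketched as follows. This is a minimal sketch, not the authors' code: the function names and the square 2r x 2r region shape around each key point are our own assumptions.

```python
import numpy as np
from scipy.fft import dctn


def zigzag_indices(n):
    """JPEG-style zig-zag ordering of an n x n block."""
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else -p[0]))


def dct_block_feature(block, k1=1, k2=10):
    """2D DCT coefficients in zig-zag order; drop the first k1 (lowest
    frequencies, dominated by illumination), keep the next k2, L2-normalise."""
    coeffs = dctn(block.astype(float), norm='ortho')
    order = zigzag_indices(block.shape[0])
    scanned = np.array([coeffs[i, j] for i, j in order])
    f = scanned[k1:k1 + k2]
    norm = np.linalg.norm(f)
    return f / norm if norm > 0 else f


def dct_descriptor(image, centers, r=6, k1=1, k2=10):
    """F_DCT: concatenation of block features over 2r x 2r regions R_i
    around the key points p_i (given as (row, col) centers)."""
    feats = [dct_block_feature(image[y - r:y + r, x - r:x + r], k1, k2)
             for (y, x) in centers]
    return np.concatenate(feats)
```

With the paper's k1 = 1 and k2 = 10, each region contributes a 10-dimensional unit-norm block.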
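A per-region uniform-LBP histogram can be sketched in plain NumPy as below. This is our own simplified implementation: it uses the axis-aligned 3x3 neighbourhood as a common stand-in for the circular (P = 8, R = 1) sampling, and the helper names are not from the paper.

```python
import numpy as np


def _uniform_label_map(P=8):
    """Map each P-bit pattern to a label: patterns with at most 2 circular
    0/1 transitions ("uniform") get their own bin; all others share one bin."""
    labels, nxt = {}, 0
    for code in range(1 << P):
        bits = [(code >> i) & 1 for i in range(P)]
        transitions = sum(bits[i] != bits[(i + 1) % P] for i in range(P))
        if transitions <= 2:
            labels[code] = nxt
            nxt += 1
        else:
            labels[code] = -1  # assigned to the shared bin below
    for code, lab in labels.items():
        if lab == -1:
            labels[code] = nxt  # single shared non-uniform bin
    return labels, nxt + 1


def lbp_u2_histogram(region):
    """Normalised histogram of LBP^{u2}(8,1) labels over a 2D region."""
    labels, n_bins = _uniform_label_map(8)
    c = region[1:-1, 1:-1]
    # 8 neighbours in circular order around the center pixel
    shifts = [(-1, 0), (-1, 1), (0, 1), (1, 1),
              (1, 0), (1, -1), (0, -1), (-1, -1)]
    code = np.zeros_like(c, dtype=int)
    for bit, (dy, dx) in enumerate(shifts):
        nb = region[1 + dy: region.shape[0] - 1 + dy,
                    1 + dx: region.shape[1] - 1 + dx]
        code |= (nb >= c).astype(int) << bit
    lut = np.array([labels[i] for i in range(256)])
    hist = np.bincount(lut[code].ravel(), minlength=n_bins)
    return hist / hist.sum()
```

For P = 8 this yields 58 uniform bins plus one shared bin (59 total); F_LBP concatenates these histograms over all regions R_i.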
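The Gabor energy features E_i^{mn} can be sketched as below. The 8 orientations and 5 frequencies follow the text, but the kernel parameterisation (Gaussian envelope width, kernel size, and the specific frequency values) is our own assumption, since the paper does not list those settings.

```python
import numpy as np
from scipy.signal import fftconvolve


def gabor_kernel(frequency, theta, sigma=3.0, size=15):
    """Complex Gabor kernel: a plane wave at the given frequency and
    orientation under an isotropic Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return envelope * np.exp(2j * np.pi * frequency * xr)


def gabor_descriptor(image, centers, r=4,
                     n_orient=8, frequencies=(0.05, 0.1, 0.2, 0.3, 0.4)):
    """For each complex response image G^{mn}, sum |G^{mn}(x, y)| over each
    region R_i (the energy E_i^{mn}); collect per-region block vectors,
    L2-normalise them, and concatenate into F_Gabor."""
    feats = [[] for _ in centers]
    for f in frequencies:
        for k in range(n_orient):
            g = fftconvolve(image, gabor_kernel(f, np.pi * k / n_orient),
                            mode='same')
            mag = np.abs(g)
            for i, (y, x) in enumerate(centers):
                feats[i].append(mag[y - r:y + r, x - r:x + r].sum())
    blocks = [np.array(v) / np.linalg.norm(v) for v in feats]
    return np.concatenate(blocks)
```

Each region thus contributes a 40-dimensional block (8 orientations x 5 frequencies), in contrast to the single selected responses of [9].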
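The AdaBoost selection of the n most discriminative region classifiers can be sketched as a discrete AdaBoost loop over precomputed weak-classifier outputs. This is a minimal sketch: training the per-region linear SVMs h_j is omitted, and their +/-1 predictions on the training set are assumed to be given.

```python
import numpy as np


def select_regions(weak_preds, y, n):
    """Discrete AdaBoost over fixed weak classifiers: weak_preds[j] holds
    h_{j,i}'s +/-1 predictions for candidate region j, y the +/-1 labels.
    Each round picks the classifier with the lowest weighted error and
    reweights the samples; returns the indices of the n selected regions,
    which map back to pixel locations (the key points)."""
    m = len(y)
    w = np.full(m, 1.0 / m)
    chosen = []
    available = set(range(len(weak_preds)))
    for _ in range(n):
        errs = {j: w[weak_preds[j] != y].sum() for j in available}
        j = min(errs, key=errs.get)
        eps = min(max(errs[j], 1e-10), 1 - 1e-10)  # clip for stability
        alpha = 0.5 * np.log((1 - eps) / eps)
        w *= np.exp(-alpha * y * weak_preds[j])
        w /= w.sum()
        chosen.append(j)
        available.discard(j)
    return chosen
```

Per-class versus expressive selection then differs only in how the +/-1 labels y are built (target expression vs. rest, or expressive vs. Neutral).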
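The classification stage (one third-degree polynomial-kernel SVM per expression, grid search over C and gamma with 5-fold cross validation, winner by largest confidence) maps naturally onto scikit-learn. The grid values below are our own placeholders, as the paper does not list the candidates it searched.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


def train_one_vs_rest(X, y, classes):
    """One SVM per expression: third-degree polynomial kernel, C and gamma
    chosen by grid search under 5-fold cross validation; each class may end
    up with different parameters."""
    grid = {'C': [0.1, 1, 10], 'gamma': ['scale', 0.01, 0.1]}
    models = {}
    for c in classes:
        svm = GridSearchCV(SVC(kernel='poly', degree=3), grid, cv=5)
        svm.fit(X, (y == c).astype(int))
        models[c] = svm
    return models


def predict(models, X):
    """7-way forced choice: pick the class whose classifier reports the
    largest confidence (decision-function value)."""
    classes = list(models)
    scores = np.stack([models[c].decision_function(X) for c in classes])
    return [classes[i] for i in np.argmax(scores, axis=0)]
```
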
though not by such a large margin as observed in their study. This might be attributed to the differences in the feature extraction methods. The Gabor descriptor outperforms both the DCT and the LBP descriptor, though surprisingly the best result is not achieved using boosted features but by the grid based approach. Because we observed that the Gabor descriptor yielded higher accuracy when using smaller regions, we extended the experiments to include regions with r = 1 and r = 0 (i.e. only the filter responses). Still, grid based key point selection yielded the highest mean accuracy. This directly contradicts the findings in [9], where AdaSVMs performed considerably better than a traditional Gabor descriptor. This result may be attributed to the different approach to feature selection: in [9] individual filter responses were selected, while we considered all responses in a given region R_i.

Table 3 shows the confusion matrix for the Gabor feature descriptor.

Table 3. Confusion matrix of the Gabor feature descriptor.

         Anger  Disgust  Fear   Joy    Neutral  Sadness  Surprise
  Ang.   0.44   0.16     0.06   0.03   0.05     0.08     0.06
  Dis.   0.19   0.41     0.08   0.06   0.06     0.16     0.05
  Fear   0.05   0.08     0.43   0.03   0.07     0.10     0.19
  Joy    0.02   0.06     0.03   0.72   0.03     0.04     0.02
  Neu.   0.12   0.07     0.06   0.05   0.64     0.13     0.07
  Sad.   0.11   0.18     0.13   0.08   0.09     0.42     0.05
  Sur.   0.06   0.04     0.21   0.03   0.07     0.06     0.56

The worst recognition performance is observed for disgust, sadness, fear, and anger. Discriminating anger from disgust, disgust from sadness, and fear from surprise produces the most mistakes. Joy and neutral, on the other hand, are recognized with high confidence. Figure 1 shows commonly misclassified images.

Figure 1. Commonly misclassified images.

From left to right, the images are tagged anger, fear, anger, and neutral and were classified as neutral, sadness, surprise, and sadness, respectively. Some of these images, e.g. the second from left, are difficult to classify even for humans. Others show features shared between expressions, such as wide-opened eyes and raised inner eyebrows.

5 Conclusion

We presented a novel database compiled from web images containing 4761 labeled faces of male and female subjects of different ethnicities and age groups. Variations in expression intensity, head pose, and lighting conditions pose a new challenge for facial expression recognition systems.

We furthermore developed a modular system for facial expression recognition. Feature descriptors based on the DCT, LBP, and Gabor filters were collectively formulated in terms of regions around key points. Several strategies to find an optimal selection of key points have been explored. The DCT and LBP based descriptors have been shown to benefit from key point selection, while the Gabor feature performed best when regions were uniformly distributed.

References

[1] T. Bänziger and K. R. Scherer. Introducing the Geneva Multimodal Emotion Portrayal (GEMEP) Corpus. Blueprint for Affective Computing: A Sourcebook, pages 271-294, 2010.
[2] J. Cockburn, M. Bartlett, J. Tanaka, J. Movellan, M. Pierce, and R. Schultz. SmileMaze: A tutoring system in real-time facial expression perception and production in children with autism spectrum disorder. In FGR08, pages 678-986, 2008.
[3] R. Cowie, E. Douglas-Cowie, N. Tsapatsoulis, G. Votsis, S. Kollias, W. Fellenz, and J. G. Taylor. Emotion recognition in human-computer interaction. Signal Processing Magazine, 18(1):32-80, January 2001.
[4] C. Darwin. The Expression of the Emotions in Man and Animals. Harper Perennial, anniversary edition, 1872/2009.
[5] A. Dhall, R. Goecke, S. Lucey, and T. Gedeon. Static facial expression analysis in tough conditions: Data, evaluation protocol and benchmark. In ICCV Workshops, pages 2106-2112, Nov. 2011.
[6] H. K. Ekenel. A Robust Face Recognition Algorithm for Real-World Applications. PhD thesis, Universität Karlsruhe (TH), Fakultät für Informatik, February 2009.
[7] P. Ekman. Basic emotions. In Handbook of Cognition and Emotion, volume 98, chapter 3, pages 45-60. John Wiley & Sons, 1999.
[8] T. Kanade, J. Cohn, and Y. Tian. Comprehensive database for facial expression analysis. In FGR00, pages 46-53, 2000.
[9] G. Littlewort, M. S. Bartlett, I. R. Fasel, J. Chenu, T. Kanda, H. Ishiguro, and J. R. Movellan. Towards social robots: Automatic evaluation of human-robot interaction by face detection and expression classification. In NIPS, 2003.
[10] G. C. Littlewort, M. S. Bartlett, L. P. Salamanca, and J. Reilly. Automated measurement of children's facial expressions during problem solving tasks. In FGR11, pages 30-35, 2011.
[11] R. W. Picard. Measuring affect in the wild. In ACII11, pages 3-3, Berlin, Heidelberg, 2011. Springer-Verlag.
[12] C. Shan, S. Gong, and P. W. McOwan. Facial expression recognition based on Local Binary Patterns: A comprehensive study. Image and Vision Computing, 27(6):803-816, 2009.