
Facial Expression Recognition in Static Images

Ting-Yen Wang and Ting-You Wang


Advised by Jiun-Hung Chen
Abstract
With the advent of the Viola-Jones face detection algorithm, the computer vision community has a reliable method for detecting faces. However, there is currently very little published research on detecting facial expressions in still images, as most publications focus on detecting facial expressions in video. This paper describes our three-week research project on facial expression detection in still images using different combinations of image processing methods and machine learners. We processed images in two ways, using the raw pixels and using eigenfaces; two different machine learners, K-Nearest Neighbors and Support Vector Machines, classified the processed images. Our results indicate that detecting facial expressions in still images is feasible, as using raw images with Support Vector Machines produced some very promising results that should lead to further research and development.
Introduction
There have been many advances in face detection; however, the area of expression detection is still in its early stages. A good deal of work has been done in this area, including applications of it. For example, Sony cameras offer a "Smile Detection" mode that is supposed to detect when a person in the image is smiling (http://www.gadgetbb.com/2008/02/27/sony-dsc-t300-first-camera-with-smile-detection/). Others who have done work in this field of research include CMU (http://www.pitt.edu/~emotion/research.html) and BMW ("Bimodal Fusion of Emotional Data in an Automotive Environment," S. Hoch, F. Althoff, G. McGlaun, G. Rigoll). Such research has focused on detecting when a face becomes a particular expression. That is, it takes a video sequence of a face and measures changes in the image from a "neutral" state to determine whether the face has moved to another state. Note that there are generally seven categories of expression: Anger, Disgust, Fear, Happy, Neutral, Sadness, and Surprise.
We do not always have the luxury of a sequence of images starting from a person's neutral state; in this paper we discuss our research on the feasibility of detecting an expression from a single still image. We combine various techniques for finding and describing a face, such as Viola-Jones face detection, machine learners, and eigenfaces. In the following sections, we discuss the process of creating our classifiers, the testing of those classifiers along with results and conclusions, and we end with some future work.
Process
The first step was to detect the faces within an image that we hope to classify. To do this we leverage the Viola-Jones face detection algorithm in OpenCV. The OpenCV repository contains a Haar cascade classifier that finds frontal faces. Each of these faces is then saved to a file to be processed by a classifier.
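The speed of Viola-Jones comes from the integral image, which lets the detector sum the pixels inside any rectangle in constant time when evaluating Haar-like features. The following is a minimal numpy sketch of that idea, not the OpenCV implementation:

```python
import numpy as np

def integral_image(img):
    # ii[y, x] holds the sum of all pixels above and to the left of (y, x),
    # i.e. img[:y, :x].sum(); the extra zero row/column simplifies lookups
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, y, x, h, w):
    # sum over img[y:y+h, x:x+w] with four lookups, independent of rectangle size
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]
```

A two-rectangle Haar-like feature is then just the difference of two `rect_sum` calls, which is what makes scanning thousands of candidate windows affordable.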
The ne"t step !as to create some classifiers for the various e"pressions !e are going to classify
We first began !ith a simple ,(mile. and ,'o smile. binary classification To begin to see if it
!as even remotely possible to classify images, !e started !ith the small class images from C(;
B5C pro$ect 6 This image ,database. contains D5 smiling images and D5 non-smiling images
The ne"t step !as to move to a much larger database, provided by C)8, !ith over 4333
different images !ith 5 different classifications0 Happy, (ad, Anger, +ear, -isgust, (urprise, and
'eutral
The classifiers we chose to use were K-Nearest Neighbors and Support Vector Machines. For each classifier, we used two types of features to train on. The first feature vector was simply the raw image, grayscaled and resized to 25 by 25 (resulting in a feature vector of length 625). This provided a baseline for all other classification methods. The second feature vector was based on eigenfaces, using the coefficients needed to project a face onto the eigenfaces as the feature vector. For this method, we used about 70 images (10 images from each expression) to create the eigenfaces and saved the top 30.
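The eigenface features described above can be sketched with numpy's SVD. This is an illustrative reconstruction, not the project's actual code, which used OpenCV:

```python
import numpy as np

def build_eigenfaces(faces, k=30):
    """faces: (n, d) array, one flattened grayscale face per row.
    Returns the mean face and the top-k eigenfaces as rows of a (k, d) basis."""
    mean = faces.mean(axis=0)
    # rows of vt are orthonormal principal directions of the centered data
    _, _, vt = np.linalg.svd(faces - mean, full_matrices=False)
    return mean, vt[:k]

def project(face, mean, basis):
    # the k projection coefficients serve as the feature vector
    return basis @ (face - mean)
```

With 25-by-25 faces this compresses the 625 raw pixels down to k coefficients; a face can be approximately reconstructed as `mean + basis.T @ coefficients`.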
To keep the images consistent for classification, we used the Viola-Jones face finder on the training images to crop out each face. These faces would then generally have the same bounding box around the face as the faces found in test images. One issue we ran into with this method is that Viola-Jones finds many false positives, so we had to manually delete every false positive it identified.
After creating all the training images, the next step was to actually train the classifiers. To test the classifiers, we also wrote a script that, for every 10 images, removes 1 from the classification set and saves it for cross-validation testing. The classifiers were created using libraries from OpenCV. The KNN library that is provided is fairly straightforward: we simply input each feature vector with a classification number. The SVM library was more complicated and had many configuration parameters that had to be set before it could be run. For the purposes of our project, we used basic defaults. This myriad of configuration options leaves a lot of room to fine-tune the SVM approach and improve the results (for example, we had to change the weights of the different classes to account for classes that are underrepresented).
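The one-in-ten holdout scheme described above is simple to express in code; the following is a small illustrative sketch (function name and structure are our own, not the original script):

```python
def reserve_for_validation(samples, every=10):
    """Hold out one of every `every` samples for cross-validation testing;
    the rest remain in the training set."""
    train, held_out = [], []
    for i, sample in enumerate(samples):
        if i % every == every - 1:
            held_out.append(sample)
        else:
            train.append(sample)
    return train, held_out
```

On a set of 130 non-smiling images this reserves 13 for validation and leaves 117 for training.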
After setting up the classifiers, the final step was to run tests on some test images. Our first test used the images we reserved for cross-validation, and our second test used images completely unrelated to any image in the training set. We go into further detail about the results in the Results section, but we noticed that certain classifications were much more likely than others; in particular, the neutral face tended to dominate. We therefore had to adjust certain weights to avoid misclassifying the non-neutral classes.
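One common way to choose such weights is inverse class frequency, so that an overrepresented class like Neutral is penalized less generously. This is a sketch of that scheme, not necessarily the exact weights used in the project:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by the inverse of its share of the training set,
    so overrepresented classes (e.g. Neutral) do not dominate the classifier."""
    counts = Counter(labels)
    total = len(labels)
    return {c: total / (len(counts) * n) for c, n in counts.items()}
```

A balanced training set yields a weight of 1.0 for every class; rarer classes get proportionally larger weights.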
The general workflow is:

  input image -> Viola-Jones face detector -> classifier (trained with ~1500 images) -> output: "Happy"
We eventually found that, with seven classes, using raw images with SVM produced the fewest misclassifications, while eigenfaces were less accurate. Since even our best method still produced some misclassifications, we then focused on eliminating them by improving the "Smile"/"No Smile" classification in the time remaining. One problem, we believe, was that neutral faces were overrepresented in the CMU database by an order of magnitude relative to every other class. We therefore reduced the number of neutral training faces and combined them with the angry, fear, disgust, sad, and surprise faces to form the non-smiling class. This continued to exhibit the same problem as before, classifying nearly everything as neutral. The problem was that certain classes of faces are too similar to happy, particularly in showing an open mouth with teeth. This led us to remove anger and fear from the non-smiling class, which produced better results.
Finally, we looked into the use of contours, since these would allow us to focus on the curve of the mouth. However, we determined that contours might not be very suitable: in order to find the mouth, we needed to lower the gradient threshold, but the lowered threshold allowed a lot of undesired edges to show up in the image, including teeth, creases around the face, shadows, and hair, making classification fairly difficult. At the other extreme, a fairly high threshold caused us to miss most of the mouth while still picking up a lot of noise from the image (creases around the mouth and shadows).

This is not to say that the method cannot work, but it would require careful tuning and possibly more image processing to get good results.
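The threshold trade-off described above is easy to demonstrate: every pixel admitted by a high gradient threshold is also admitted by a lower one, so lowering the threshold can only add edges. A minimal sketch (using a plain gradient-magnitude test, not the project's contour code):

```python
import numpy as np

def edge_mask(img, threshold):
    """Mark pixels whose gradient magnitude exceeds a threshold.
    Lowering the threshold admits more, and often unwanted, edges."""
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy) > threshold
```

Comparing `edge_mask(img, 5.0)` with `edge_mask(img, 50.0)` on the same face shows the low-threshold mask strictly containing the high-threshold one, which is why teeth and skin creases appear as soon as the mouth outline does.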
Results
Two classes
Initial results showed that this method of facial expression recognition is promising. For "Smile" versus "No smile" (no smile being all images that were not 'Happy'), using KNN on plain images as feature vectors classified 11/18 of the smiles and 122/130 of the non-smiling images correctly when feeding the cross-validated (remove one of every 10) data back in. SVM was even better, with only one incorrect classification out of 148 tests. These results show that the faces in the CMU database can be classified fairly well.
Though the accuracy was very good, these results were somewhat biased. After we examined the training data from CMU, we discovered that there were actually a lot of "repeated" faces, so that for any given face there was a very similar face of the correct classification in the database, making it easier for SVM and KNN to classify the images. External images were almost always classified as non-smiling (more specifically, neutral), which seemed to be the classifier's "average face" for faces it does not recognize. This made some sense: for any person, the average face is the neutral face, and this expression, or lack thereof, was overrepresented in the database.
To avoid the "repeated" faces, we created a new test set of faces that had no hint of similarity with the images in our database. We then bumped up the weight for smiling ('Happy') to compensate for the overrepresentation of non-smiling faces. We also removed some faces that looked too much like smiling; specifically, we removed the 'Fear' and 'Angry' faces from the non-smiling class. We then retried SVM on our new test data with the modified weights, and the results were promising: it was still able to classify 96/101 correctly. Nearly all of the errors were on the fear images, which should have been labeled non-smiling; since fear faces were no longer in the database, many moved to smiling. We also retried KNN on our new test data (there was no weight adjustment, as KNN does not use weights), and it was able to classify 92/101 correctly. KNN's errors were spread throughout the test cases, because we could not weight certain classifications more heavily than others to avoid misclassifying particular expressions.
Using eigenface coefficients as feature vectors did not produce better results; rather, they were worse in general. However, the method does run much more quickly, since there are only 30 feature points per image. Probably due to this same fact, 30 feature points were not enough to capture an expression, and most expressions were absorbed into 'neutral'. To quickly summarize: using the cross-validation technique, we classified 17/18 smiling and 126/130 non-smiling images correctly. So if a face is in our database, the eigenface method is very good at finding that person's expression again. However, on people not represented in the database, nearly all results went to non-smiling; only 2/20 images were classified as smiling correctly on our new data set. This showed that eigenfaces do not capture a general expression very well.
Three classes
To make the system slightly more complex, we added surprise as a third class for plain images. We did not do this for eigenfaces, since its results were not very good for just two classes. Using raw images, SVM continued to do a fairly good job of classifying:
Raw Images with SVM using 3 Expressions (smile vs. surprise vs. non-smile)
Students do not exist in database; classes weighted; no Fear/Anger; using half of the neutral images
(each entry: percentage of that expression's test images, with the image count)

  Smile:      95.0% (19),  5.0% (1)
  Surprise:   94.7% (18),  5.3% (1)
  Non-Smile:   4.8% (3),  95.2% (59)
KNN, on the other hand, began to have problems differentiating neutral from surprise:
Raw Images with KNN using 3 Expressions (smile vs. surprise vs. non-smile)
Students do not exist in database; no Fear/Anger; using half of the neutral images
(each entry: percentage of that expression's test images, with the image count)

  Smile:      75.0% (15), 25.0% (5)
  Surprise:   42.1% (8),  57.9% (11)
  Non-Smile:   8.1% (5),  91.9% (57)
Seven classes
Finally, we moved to classifying all seven categories, and we experienced a very similar situation. When feeding the CMU database back into the system, the KNN search was not very good, hovering around 50% for each category and classifying most expressions as neutral:
Raw Images with KNN using 7 Expressions
Students already exist in database
(each entry: percentage of that expression's test images, with the image count)

  Anger:     42.9% (3),  57.1% (4)
  Disgust:   63.6% (7),  36.4% (4)
  Fear:      40.0% (4),  10.0% (1),  50.0% (5)
  Happy:     61.1% (11), 38.9% (7)
  Neutral:  100.0% (71)
  Sadness:   60.0% (9),  40.0% (6)
  Surprise:  62.5% (10), 37.5% (6)
The problem was that neutral was polluting all of the other categories. Perhaps tuning K (currently 10) would produce better results.
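Experimenting with K is straightforward. The project used OpenCV's KNN, but the decision rule itself is small enough to sketch from scratch, with K exposed as a parameter (an illustrative implementation, not the original code):

```python
import numpy as np
from collections import Counter

def knn_classify(train_X, train_y, x, k=10):
    """Majority vote among the k training faces nearest to x
    (Euclidean distance over the flattened raw-pixel features)."""
    distances = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

Because the vote is a simple majority, a class that floods the training set, like neutral here, tends to win the vote for borderline faces; shrinking K or rebalancing the training set are the two obvious levers.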
For SVM, the neutral faces at first dominated everything, classifying nearly everything as neutral. With a few modifications to the weights of the non-neutral classes, we were able to get nearly 100% accuracy:
Raw Images with SVM using 7 Expressions
Students already exist in database; classes weighted
(each entry: percentage of that expression's test images, with the image count)

  Anger:    100.0% (7)
  Disgust:  100.0% (11)
  Fear:      90.0% (9),  10.0% (1)
  Happy:    100.0% (18)
  Neutral:  100.0% (71)
  Sadness:  100.0% (15)
  Surprise: 100.0% (16)
Keeping the same weight modifications, we then tested the SVM approach on the external faces (those that had no relation to our training images). Surprisingly, the results were very promising: we were able to classify the expression correctly about 90.1% of the time, with an average accuracy across the expressions of 88.6%!
Raw Images with SVM using 7 Expressions
Students do not exist in database; classes weighted
(each entry: percentage of that expression's test images, with the image count)

  Anger:    100.0% (3)
  Disgust:   71.4% (5),  28.6% (2)
  Fear:      90.9% (10),  9.1% (1)
  Happy:      5.0% (1),  90.0% (18),  5.0% (1)
  Neutral:    3.45% (1), 93.1% (27),  3.45% (1)
  Sadness:   16.67% (2),  8.33% (1), 75.0% (9)
  Surprise: 100.0% (19)
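Reading the largest entry in each row of the table above as the correctly classified share, both headline figures can be checked from the counts; the two metrics differ because overall accuracy pools every test image, while average accuracy weights each expression equally regardless of class size:

```python
# correctly classified count and total count per expression, from the table above
correct = {"Anger": 3, "Disgust": 5, "Fear": 10, "Happy": 18,
           "Neutral": 27, "Sadness": 9, "Surprise": 19}
totals  = {"Anger": 3, "Disgust": 7, "Fear": 11, "Happy": 20,
           "Neutral": 29, "Sadness": 12, "Surprise": 19}

# overall accuracy: 91 correct out of 101 test images
overall = sum(correct.values()) / sum(totals.values())
# average accuracy: mean of the seven per-expression accuracies
average = sum(correct[c] / totals[c] for c in totals) / len(totals)
print(round(overall, 3), round(average, 3))  # prints 0.901 0.886
```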
On the new test data, KNN was unable to make good classifications and produced many errors, with only 55.4% accuracy overall and an average accuracy of only 35%:
Raw Images with KNN using 7 Expressions
Students do not exist in database
(each entry: percentage of that expression's test images, with the image count)

  Anger:    100.0% (3)
  Disgust:   14.3% (1),  14.3% (1),  71.4% (5)
  Fear:       9.1% (1),  27.3% (3),  63.6% (7)
  Happy:     65.0% (13), 35.0% (7)
  Neutral:   96.55% (28), 3.45% (1)
  Sadness:    8.33% (1), 75.0% (9),  16.67% (2)
  Surprise:  21.1% (4),  26.3% (5),  52.6% (10)
We also tried eigenfaces here, and saw the same results as with two classes. For test subjects with a similar expression of their own in the database, the method performed fairly well, but for new test subjects, nearly everything became neutral:
Eigen!a"es Images with SVM using Expressions
Students do not exist in database; Classes Weighted
Anger Disgust Fer !pp" Neutrl S#ness Surprise
Anger 1%%% (3)
Disgust 14.3% (1) 85.7% ($)
Fer 1%%% (11)
!pp" 15% (3) 85% (17)
Neutrl 100% (29)
S#ness 1%%% (12)
Surprise 84.2% (1$) 5.3% (3)
Though the results were poor, it is possible that with some modifications to the weights, changes to other SVM parameters, and a different number of eigenfaces, we could achieve an unbiased classifier using eigenfaces.
Experience
Although we were eventually able to obtain promising results with our facial expression classification algorithms, reaching this point was not easy. As students completely new to the field of computer vision, it was difficult to come up with a research idea that was grounded in computer vision and could feasibly be done in three weeks. Being able to discuss our ideas with the professor and the teaching assistant was very helpful during this portion of the project. That said, the opportunity to embark upon whatever we wanted had an enjoyable aspect to it; we took this opportunity to let our imaginations mix what we had learned in class with what we desired to create.
The ne"t portion of the pro$ect ? coming up !ith an implementation plan ? magnified our
Computer Vision ine"perience, !hich also made our TA@s /Jiun-Hung Chen7 help and advice
invaluableK We ended up desiring to !or# on a pro$ect !ith very little published research, so !e
found ourselves coming up !ith different solutions from scratch -ue to our time limitations and
lac# of e"perience, the opportunity to go over our ideas and their feasibility !ith Jiun-Hung !as
e"tremely helpful Jiun-Hung helped us to focus our three !ee#s on a couple of promising paths
instead of !asting time !or#ing on a series of inept algorithms
With some ideas for how to approach our project, we were equipped to face the challenges of implementation, most significantly incorporating Viola-Jones and the machine learners and constructing the databases. The hardest part of implementing Viola-Jones and the machine learners was learning how to use OpenCV effectively. For example, for the OpenCV machine learners, we had to become competent with the many complicated functions needed for KNN and SVM despite the lack of good examples and documentation; in fact, there were no examples of SVM that we could use. This competency with the OpenCV functions became especially important during the testing phase of our project, when we had to tune our expression recognition system, and especially the machine learners. We also had to discover and work around the 'quirks' of OpenCV; for example, a number of the machine learning functions declared in the .h header files were not actually implemented. Despite these challenges, we really appreciated having OpenCV as a tool for implementing complex algorithms like Viola-Jones and SVM, as it allowed us to focus our project on actually trying to solve a problem (classifying facial expressions) instead of just trying to get an algorithm like Viola-Jones to work.
Constructing the databases was one of the most time-consuming portions of this project. Perhaps the most time-consuming part was sorting our sample images into the proper expression classifications (i.e., Anger, Fear, Disgust, etc.), in our case around 1500 images for training and a few hundred more for testing. We then needed to format the images according to what our facial recognition system expected: the raw image system used JPG images, whereas the eigenfaces method used TGA images.
After completing the implementation of our system and creating our image databases, the testing portion of the project turned out to be both the most depressing and the most exhilarating part. Since we were embarking on a novel research project, we experienced a very large number of failed attempts at classifying facial expressions before finally arriving at a point where we were classifying expressions successfully! After connecting all of our project components (databases, Viola-Jones, machine learners, and image processors like eigenfaces), we ran a number of image test sets, and each time we discovered new ways our system could fail as well as new ways to tune it. At times, it seemed as if we would never get our system to work correctly on some of our more complicated test sets. However, after many long hours of head scratching and intense labor, finally seeing a working system transformed our former grief into amazing exhilaration!
Overall, this project provided a valuable experience in the field of computer vision. We were challenged at every step, from idea formulation to implementation to testing. However, with the advice of our TA, Jiun-Hung Chen, and many hours spent working through these challenges, we were able to grow in many aspects of computer vision. In the end, finally developing a working artifact has really encouraged us to continue looking into the field of computer vision, perhaps even as a research area!
Future Work
Our work showed that it is possible to take a still image and determine the expression on the face; however, there is still much work to be done. In the limited time we had, we provided a couple of baseline approaches that could be extended fairly easily given more time. Some ideas we considered were contours, weighting regions of a face, and possibly a mixture of the two.
Contours, as described above, were briefly examined, but we did not pursue them at this time. Naively taking contours produces a very noisy result (for example, contour lines appear for individual teeth and creases on the face), and it is very difficult to ascertain the expression being made by the face. Even if we decided to look only at the mouth, it is actually quite hard to tell where the mouth is even located! However, contours do hold promise if we can identify the contour that represents the mouth area.
Another option that could provide better results is to weight certain parts of the image as more informative than others. For example, much of the expression in these images is driven by what the mouth is doing. We also know that the mouth is in the bottom half of the face (at least for our images). With these two pieces of information, we could take each face and "enhance" the bottom half of the face before passing it to the classifier, or, in the extreme case, simply cut off the top half of the image. This would focus the classifier on just the differences in the mouth. This would be our next step if we had the time.
Lastly, we looked at only two classification methods, KNN and SVM, a few variations of the database, and some tuning parameters (such as the number of eigenfaces, weights, K, etc.). There may be other classifiers that perform this kind of classification much better than either KNN or SVM. On the other hand, SVM has many tunable parts that might enhance its ability to classify the images more accurately. Furthermore, our face database might not have been best suited for identifying each expression clearly. By adjusting the images used to train our classifiers and tuning some weights, we were able to get much better results; this indicates that with the right choice of images and parameters, the system could be much more robust to wide variations in expression.
