Advised by Jiun-Hung Chen

Abstract

With the advent of the Viola-Jones face detection algorithm, the computer vision community has a good method for detecting faces. However, there is currently very little published research on detecting facial expressions in still images, as most publications focus on detecting facial expressions in video. This paper describes our three-week research project on facial expression detection in still images using different combinations of image processing methods and machine learners. We processed images in two ways, using the raw pixels and using eigenfaces; two different machine learners, K-Nearest Neighbors and Support Vector Machines, classified the processed images. Our results indicate that detecting facial expressions in still images appears to be possible: using raw images with Support Vector Machines produced some very promising results that should lead to further research and development.

Introduction

There have been many advances in face detection; however, the area of expression detection is still in its early stages. There has been a great deal of work done in this area, and even applications of it. For example, Sony cameras have a "Smile Detection" feature that is supposed to detect when a person in the image is smiling (http://www.gadgetbb.com/2008/02/27/sony-dsc-t300-first-camera-with-smile-detection/). Others who have done work in this field include CMU (http://www.pitt.edu/~emotion/research.html) and BMW ("Bimodal Fusion of Emotional Data in an Automotive Environment," S. Hoch, F. Althoff, G. McGlaun, G. Rigoll). Such research has focused on detecting when a face becomes a particular expression: it takes a video sequence of a face and calculates changes in the image from a "neutral" state to determine whether the face has entered another state. Note that there are generally seven categories of expression: Anger, Disgust, Fear, Happy, Neutral, Sadness, and Surprise. We do not always have the luxury of a sequence of images starting from a person's neutral state; in this paper we discuss our research concerning the feasibility of detecting an expression from a single still image. We combine various techniques for finding and describing a face, such as Viola-Jones, machine learners, and eigenfaces. In the following sections, we discuss the process of creating our classifiers and the testing of the classifiers along with results and conclusions, and we end with some future work.

Process

The first step was to detect the faces within an image that we hope to classify. To do this we leverage the Viola-Jones face detection algorithm in OpenCV; the OpenCV repository contains a Haar cascade classifier that will find frontal faces. Each of these faces is then saved to a file to be processed by a classifier. The next step was to create classifiers for the various expressions. We first began with a simple "Smile" vs. "No smile" binary classification. To see whether it was even remotely possible to classify such images, we started with the small class images from CSE 576 project 3. This image "database" contains 17 smiling images and 17 non-smiling images. The next step was to move to a much larger database, provided by CMU, with over 8000 different images and 7 different classifications: Happy, Sad, Anger, Fear, Disgust, Surprise, and Neutral. The classifiers we chose to use were K-Nearest Neighbor and Support Vector Machine. For each classifier, we used two types of features for training. The first feature vector was simply the raw
image, grayscaled and resized to 25 by 25 (resulting in a feature vector of length 625). This would provide a baseline for all future classification methods. The second feature vector is based on eigenfaces, using the coefficients with which a face is projected onto the eigenfaces as the feature vector. For this method, we used about 70 images (10 from each expression) to create the eigenfaces and saved the top 30. In order to keep images consistent for classification, we used the Viola-Jones face finder on the training images to crop out the face; these faces would generally have the same bounding box around the face as faces found in test images. One issue with this method is that Viola-Jones finds many false positives, so we had to manually delete every false positive it identified. After we created all the training images, the next step was to actually train the classifiers. In order to test the classifiers, we also wrote a script that, for every 10 images, removes 1 from the classification set and saves it for cross-validation testing. The classifiers were created using libraries from OpenCV. The KNN library that is provided is fairly straightforward: we simply input each feature vector with a classification number. The SVM library was more complicated and had many configuration options that had to be set before it could be run; for the purposes of our project, we used some basic defaults. This myriad of configuration options provides a lot of room to fine-tune the SVM approach and improve the results (for example, we had to change the weights of the different classes to account for classes that are underrepresented). After setting up the classifiers, the final step was to run tests on some test images: our first test used the images we had reserved for cross-validation, and our second test used images completely unrelated to any images in the training set. We go into further detail about the results in the Results section, but we did notice that certain classifications were much more likely than others; for example, the neutral face seemed to dominate the others. Therefore, we had to change certain weights to avoid misclassifying the non-neutral classes. The general workflow is:
input image → Viola-Jones face detector → classifier (trained with ~1500 images) → output, e.g. "Happy"

We eventually found that with seven classes, using raw images with SVM produced the fewest misclassifications, while using eigenfaces was less accurate. Since there were still some misclassifications in our best method, we then focused on eliminating misclassifications by improving the "Smile"/"No smile" classification in the time provided. One problem, we believe, was that neutral faces were overrepresented in the CMU database by an order of magnitude relative to every other class. Thus, we reduced the training size of the neutral faces and combined them with the angry, fear, disgust, sad, and surprise faces to form the non-smiling class. This continued to exhibit the same problem as before, classifying nearly everything as neutral. The problem was that certain classes of faces are too similar to happy, particularly in the aspect of an open mouth with teeth showing. This led us to remove anger and fear from the non-smiling class, which gave better results. Finally, we looked into the use of contours, since this would allow us to really focus on the curve of the mouth. However, we determined that contours might not be very suitable: in order to find the mouth, we needed to lower the gradient threshold, but lowering the threshold allowed a lot of undesired edges to show up in the image, including teeth, creases around the face, shadows, and hair, making classification fairly difficult. At the other end, using a fairly high threshold caused us to miss most of the mouth while we still got a lot of noise from the image (creases around the mouth and shadows).
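The threshold trade-off we ran into with contours can be illustrated with a small sketch. This is a toy stand-in, not our project code: it uses a hand-made 8x8 patch and a plain central-difference gradient (the thresholding idea behind an edge detector such as Canny), with made-up intensity values, to show that a threshold low enough to catch faint detail also admits clutter edges.

```python
import numpy as np

def gradient_magnitude(img):
    """Per-pixel gradient magnitude via central differences: a toy
    stand-in for the gradient step of an edge detector like Canny."""
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)

# Synthetic 8x8 "face patch": one high-contrast mouth row plus
# faint clutter standing in for creases, shadows, and hair.
patch = np.full((8, 8), 100.0)
patch[5, :] = 220.0        # mouth: intensity jump of 120
patch[1, 0:4] = 112.0      # faint clutter: intensity jump of only 12

mag = gradient_magnitude(patch)

low, high = 4.0, 50.0
noisy_edges = int((mag > low).sum())   # low threshold: mouth AND clutter fire
clean_edges = int((mag > high).sum())  # high threshold: only the mouth row survives

print(noisy_edges, clean_edges)
```

Raising the threshold suppresses the clutter in this toy patch, but on real faces a mouth edge is often no stronger than creases and shadows, which is why neither setting worked well for us.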
This is not to say that the method will not work, but it would require careful tuning and possibly more image processing to get good results.

Results

Two-classes

Initial results showed that this method of facial expression recognition is promising. For "Smile" versus "No smile" (no smile being all images that were not 'Happy'), feeding the cross-validated (remove one of every 10) data back in, KNN on plain images as feature vectors got 11/18 of the smiling images correct and 122/130 of the non-smiling images correct. SVM was even better, with only one incorrect classification out of 148 tests. These results show that the faces in the CMU database can be classified fairly well. Though the accuracy was very good, the problem with these results was that they were somewhat biased. After we examined the training data we got from CMU, we discovered that there were actually a lot of "repeated" faces: for any face, there was a very similar face with the right classification in the database, making it easier for SVM and KNN to classify the images. External images were almost always classified as non-smiling (more specifically, neutral), which seemed to be the classifier's "average face" for faces it did not recognize. This made some sense, in that for any person, the average face is the neutral face, and this expression, or lack thereof, was overrepresented in the database. To avoid using the "repeated" faces, we created a new test set of faces that did not have any hint of similarity with the images in our database. We then bumped the weights for smiling ('Happy') up to compensate for the overrepresentation of non-smiling faces. We also removed some faces that looked too much like smiling; specifically, we removed the 'Fear' and 'Angry' faces from the non-smiling class. We then retried SVM on our new test data with the modified weights, and the results were promising: it was still able to classify 96/101 correctly. Nearly all of the errors were on the fear images, which should have been labeled as non-smiling; since fear faces were no longer in the database, many moved to smiling. We also retried KNN on our new test data (with no weight adjustment, as KNN does not use weights), and it was able to classify 92/101 correctly. KNN had its errors spread throughout the test cases, because we could not weight a certain classification more heavily than the others to avoid misclassifications of a particular expression.

Using eigenface coefficients as feature vectors did not produce better results; rather, they were worse in general. However, this method does run much more quickly, since we only have 30 feature points for each image. But probably due to this same fact, 30 feature points were not enough to capture an expression, and most expressions were absorbed into 'neutral'. To quickly summarize: using the cross-validation technique, we classified 17/18 smiling correct and 126/130 non-smiling correct. So, if a face is in our database, the eigenface method is very good at finding that person's expression again. However, if we move to people not represented in the database, nearly all results go to non-smiling; only 2/20 images were classified as smiling correctly on our new data set. This showed that eigenfaces do not capture a general expression very well.

Three-classes

To make the system slightly more complex, we added surprise as a third class for plain images. We did not do this for eigenfaces, since the results were not very good for just two classes. Using raw images, SVM continued to do a fairly good job at classifying:

Raw Images with SVM using 3 Expressions (smile vs. surprise vs. non-smile)
(students do not exist in database; classes weighted; no Fear/Anger; using half of the neutral images)
  Smile:     95% correct (19); 5% misclassified (1)
  Surprise:  94.7% correct (18); 5.3% misclassified (1)
  Non-Smile: 95.2% correct (59); 4.8% misclassified (3)

KNN, on the other hand, began to have problems differentiating neutral from surprise:

Raw Images with KNN using 3 Expressions (smile vs. surprise vs. non-smile)
(students do not exist in database; no Fear/Anger; using half of the neutral images)
  Smile:     75% correct (15); 25% misclassified (5)
  Surprise:  42.1% correct (8); 57.9% as Non-Smile (11)
  Non-Smile: 91.9% correct (57); 8.1% misclassified (5)

Seven-classes

Finally, we moved to classifying all seven categories, and we experienced a very similar situation. When feeding the CMU database back into the system, KNN was not very good, hovering around 50% for most categories and classifying many expressions as neutral:

Raw Images with KNN using 7 Expressions (students already exist in database)
  Anger:    42.9% correct (3); 57.1% as Neutral (4)
  Disgust:  63.6% correct (7); 36.4% as Neutral (4)
  Fear:     40% correct (4); 10% misclassified (1); 50% as Neutral (5)
  Happy:    61.1% correct (11); 38.9% as Neutral (7)
  Neutral:  100% correct (71)
  Sadness:  40% correct (6); 60% as Neutral (9)
  Surprise: 37.5% correct (6); 62.5% as Neutral (10)

The problem was that neutral was polluting all the other categories. Perhaps tuning K (currently 10) would produce better results. For SVM, at first, the neutral faces dominated everything, classifying nearly everything as neutral. With a few
modifications to the weights of the non-neutral classes, we were able to get nearly 100% accuracy:

Raw Images with SVM using 7 Expressions (students already exist in database; classes weighted)
  Anger:    100% correct (7)
  Disgust:  100% correct (11)
  Fear:     90% correct (9); 10% misclassified (1)
  Happy:    100% correct (18)
  Neutral:  100% correct (71)
  Sadness:  100% correct (15)
  Surprise: 100% correct (16)

Keeping the same weight modifications, we then tested the SVM approach with the external faces (those that had no relation to our training images). Surprisingly, the results were very promising: we were able to successfully classify the expression about 90.1% of the time, with an average accuracy across the expressions of 88.6%!

Raw Images with SVM using 7 Expressions (students do not exist in database; classes weighted)
  Anger:    100% correct (3)
  Disgust:  71.4% correct (5); 28.6% misclassified (2)
  Fear:     90.9% correct (10); 9.1% misclassified (1)
  Happy:    90% correct (18); 5% (1) and 5% (1) misclassified
  Neutral:  93.1% correct (27); 3.45% (1) and 3.45% (1) misclassified
  Sadness:  75% correct (9); 16.67% (2) and 8.33% (1) misclassified
  Surprise: 100% correct (19)

When using the new test data, KNN was unable to make good classifications and produced many errors, with only 55.4% overall accuracy and an average accuracy across the expressions of only 35%:

Raw Images with KNN using 7 Expressions (students do not exist in database)
  Anger:    100% as Neutral (3)
  Disgust:  14.3% correct (1); 14.3% misclassified (1); 71.4% as Neutral (5)
  Fear:     9.1% correct (1); 27.3% misclassified (3); 63.6% as Neutral (7)
  Happy:    65% correct (13); 35% as Neutral (7)
  Neutral:  96.55% correct (28); 3.45% misclassified (1)
  Sadness:  16.67% correct (2); 8.33% misclassified (1); 75% as Neutral (9)
  Surprise: 52.6% correct (10); 21.1% misclassified (4); 26.3% as Neutral (5)

We also tried eigenfaces here and saw the same results as with two classes: for test subjects with a similar expression of their own in the database, the method performed pretty well, but on new test subjects nearly everything became neutral:

Eigenfaces Images with SVM using 7 Expressions (students do not exist in database; classes weighted)
  Anger:    100% as Neutral (3)
  Disgust:  14.3% correct (1); 85.7% as Neutral (6)
  Fear:     100% as Neutral (11)
  Happy:    15% correct (3); 85% as Neutral (17)
  Neutral:  100% correct (29)
  Sadness:  100% as Neutral (12)
  Surprise: 15.8% correct (3); 84.2% as Neutral (16)

Though these results were poor, it is possible that with some modifications to the weights, alterations to other SVM parameters, and changes to the number of eigenfaces, we may be able to achieve an unbiased classifier using eigenfaces.

Experience

Although we were eventually able to obtain promising results with our facial expression classification algorithms, reaching this point was not easy. As students completely new to the field of Computer Vision, we found it difficult to come up with a research idea that was grounded in Computer Vision and could feasibly be done in three weeks. Being able to discuss our ideas with the Professor and the Teaching Assistant was very helpful during this portion of the project. That said, the opportunity to embark upon whatever we wanted had an enjoyable aspect to it; we took the opportunity to let our imaginations mix what we had learned in class with what we desired to create. The next portion of the project, coming up with an implementation plan,
magnified our Computer Vision inexperience, which made our TA's (Jiun-Hung Chen) help and advice invaluable! We ended up wanting to work on a project with very little published research, so we found ourselves coming up with solutions from scratch. Given our time limitations and lack of experience, the opportunity to go over our ideas and their feasibility with Jiun-Hung was extremely helpful; he helped us focus our three weeks on a couple of promising paths instead of wasting time on a series of unpromising algorithms. With some ideas for how to approach our project, we were equipped to face the challenges of implementation, most significantly implementing Viola-Jones and the machine learners and constructing the databases. The hardest part of implementing Viola-Jones and the machine learners was learning how to incorporate and use OpenCV effectively. For example, for the OpenCV machine learners, we had to become competent with the many complicated functions needed for KNN and SVM despite the lack of good examples and documentation; in fact, there were no examples of SVM that we could use. This competency with the OpenCV functions became especially important during the testing phase of our project, where we had to tune our expression recognition system, especially the machine learners. We also had to discover and work around the 'quirks' of OpenCV; for example, a number of the machine learning functions declared in the .h header files were not actually implemented in OpenCV. Despite these challenges, we really appreciated having OpenCV as a tool for implementing complex algorithms like Viola-Jones and SVM, as it allowed us to focus our project on actually trying to solve a problem (classifying facial expressions) instead of just trying to get an algorithm such as Viola-Jones to work. Constructing the databases was one of the most time-consuming portions of this project. Perhaps the single most time-consuming part was sorting our sample images into the proper expression classifications (i.e. Anger, Fear, Disgust, etc.), in our case around 1500 images for training and a few hundred more for testing. We then needed to format the images according to what our facial recognition system expected; the raw
image system used JPG images, whereas the eigenfaces method used TGA images. After completing the implementation of our system and creating our image databases, the testing portion of the project turned out to be both the most depressing and the most exhilarating part. Since we were embarking on a novel research project, we experienced a very large number of failed attempts at classifying facial expressions before finally arriving at a point where we were classifying them successfully! After connecting all of our project components (databases, Viola-Jones, machine learners, and image processors like eigenfaces), we ran a number of image test sets, and each time we discovered different ways both to break our system and to tune it. At times it seemed as if we would never get the system to work correctly on some of our more complicated test sets. However, after many long hours of head-scratching and intense labor, finally seeing a working system transformed our former grief into amazing exhilaration! Overall, this project provided a valuable experience in the field of Computer Vision. We were challenged at every step, from idea formulation to implementation to testing. With the advice of our TA, Jiun-Hung Chen, and many hours spent working through these challenges, we were able to grow in many aspects of Computer Vision. In the end, finally developing a working artifact has really encouraged us to continue looking into the field of Computer Vision, perhaps even as a research area!

Future Work

The work we did showed promise that it is possible to take a still image and determine the expression on the face; however, there is still much work to be done. In the limited time we had, we provided a couple of baseline approaches that can be extended fairly simply, given some time. Some ideas we considered were contours, weighting parts of the face, and possibly a mixture of the two. Contours, as described above, were looked at briefly, and we did not pursue them at this time. Taking contours naively, the result is very noisy (for example, we get contour lines for individual teeth and creases on the face), and it is very difficult to ascertain the expression being made by the face. Even if we decided to look at just the mouth, it is actually pretty hard to determine where the mouth is located! However, contours do have promise if we are able to identify the contour that represents the mouth area. Another option that could provide better results is to weight certain parts of the image as more informative than others. For example, much of the expression in these images is carried by what the mouth is doing. Also, we know that the mouth is in the bottom half of the face (at least for our images). With these two pieces of information, we can take each face and "enhance" the bottom half before using it as input to the classifier, or in the extreme case, simply cut off the top half of the image. This would then focus on just the differences in the mouth, and it would be our next step if we had the time. Lastly, we only looked at two classification methods, KNN and SVM, and a few
variations of a database and tuning parameters (such as the number of eigenfaces, weights, K, etc.). There are possibly other classifiers that can perform this kind of classification much better than either KNN or SVM. At the same time, SVM has many tunable parts that might enhance its ability to classify the images more accurately. Furthermore, our facial database might not have been best suited to identifying each expression clearly. By adjusting the images we used to train our classifiers and tuning some weights, we were able to get much better results; this indicates that with the right choice of images and parameters, the system would be much more robust to wide variations of expressions.
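The bottom-half "enhancement" described above can be sketched in a few lines. This is a hypothetical illustration rather than code from the project: the boost factor and the halfway split are made-up choices, and the input stands in for one of our 25 by 25 grayscale face crops.

```python
import numpy as np

def mouth_weighted_vector(face, boost=2.0):
    """Scale the bottom half of a face crop (where the mouth sits in
    our images) before flattening it into a classifier feature vector.
    `boost` is an illustrative, untuned parameter."""
    weighted = face.astype(float).copy()
    h = weighted.shape[0]
    weighted[h // 2:, :] *= boost      # emphasize the mouth region
    return weighted.ravel()            # 25x25 -> length-625 vector, as before

face = np.ones((25, 25))               # stand-in for a grayscale face crop
vec = mouth_weighted_vector(face)
print(vec.shape)                       # (625,)
print(vec[0], vec[-1])                 # 1.0 2.0  (top row unweighted, bottom row boosted)
```

In the extreme case mentioned above, one would instead return only `weighted[h // 2:, :].ravel()`, discarding the top half of the face entirely.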