
International conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud)

(I-SMAC 2017)

Study of Video based Facial Expression and Emotions Recognition Methods

Husam Salih
Information Technology, MIT Pune
University of Pune
husam01salih@gmail.com
Baghdad, Iraq

Lalit Kulkarni
Information Technology, MIT Pune
University of Pune
lalit.kulkarni@mitcoe.edu.in
Pune, India

Abstract— In real-life scenarios, facial expressions and emotions are responses to the external and internal events experienced by a human being. In human-computer interaction, recognition of the end user's expressions and emotions from video streams plays a very important role. Such systems must track the dynamic changes in facial movements quickly in order to deliver the required response. One real-time application is physical fatigue detection based on face detection and expression analysis, such as driver fatigue detection to prevent road accidents. Expression-based fatigue analysis itself is out of the scope of this paper; instead, this paper presents a study of methods proposed recently for facial expression and/or emotion recognition from video. The paper describes the methodologies in terms of the feature extraction and classification techniques used, together with a comparative study based on accuracy, implementation tools, advantages, and disadvantages. The outcome of this paper is a statement of the current research gaps and the challenges that remain open for video-based facial detection and recognition systems, presented with future research work in mind.

Keywords— Facial expressions, Frames, Emotions, Expressions, Fatigue, Feature Extraction, Classification.

I. INTRODUCTION

Nowadays, many real-time human-computer interaction systems must immediately and accurately track human activities from video. One such area is recognizing and tracking human facial expressions and emotions from a video stream for purposes such as physical fatigue detection. Before discussing this further, we first introduce the need for facial emotion and expression recognition. In human-to-human conversation, mental, emotional, and even physical state is conveyed alongside the spoken message: facial expressions reveal, in their simplest form, whether a person's thoughts are happy or angry [1], and they carry the speaker's expectations of the listener, sympathy, and reactions to what the speaker is saying. No comparable signal is yet available to computing systems. For pervasive computing and ambient intelligence to keep the human user at the forefront, future interfaces will need to identify the intentions, social signals, and emotional signals that are naturally expressed in multimodal human-human communication, and to sense nonverbal actions and expressions [2]. Research on automatic facial expression recognition has therefore attracted notice in the computer vision, pattern recognition, and human-computer interaction communities: automatic recognition of facial expressions underpins affective computing technologies, including intelligent tutoring systems, next-generation computing devices, patient monitoring systems, and personal wellness profiling. At the same time, the human face differs across age groups, genders, and other physical characteristics of the individual, which complicates the task [3].

Emotions are basic to human beings in day-to-day interactions and are used throughout everyday life. Emotion recognition has become an important and interesting field of study in Human-Computer Interaction (HCI), Human-Robot Interaction (HRI), and related areas. The six basic emotions are disgust, happiness, fear, anger, sadness, and surprise. Diverse applications include computer graphics, automatic driver fatigue detection, 3D/4D avatar animation in the entertainment industries, psychology, video and text chat, and gaming. Recognition of emotions from facial expressions in video consists of preprocessing, feature extraction, and classification. The importance of facial expression analysis is widely recognized in social interaction and social intelligence, and it has been an active research topic since the 19th century; Suwa et al. introduced facial expression recognition in 1978. The main steps in building a facial expression recognition system are face detection and alignment, image standardization, feature extraction, and classification [4].

A number of techniques have been proposed to identify facial expressions efficiently.
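The preprocessing, feature extraction, and classification stages named above can be sketched end to end. The following toy pipeline is only illustrative and is not taken from any surveyed method: the block-mean features, the nearest-centroid classifier standing in for an SVM/NN, and the 'happy'/'sad' centroids are all assumptions.

```python
import numpy as np

def preprocess(frame):
    """Grayscale conversion and simple normalization (image standardization)."""
    gray = frame.mean(axis=2)                    # average the RGB channels
    return (gray - gray.mean()) / (gray.std() + 1e-8)

def extract_features(face, grid=4):
    """Block-based mean-intensity features over a grid x grid partition."""
    h, w = face.shape
    bh, bw = h // grid, w // grid
    return np.array([face[i*bh:(i+1)*bh, j*bw:(j+1)*bw].mean()
                     for i in range(grid) for j in range(grid)])

def classify(features, centroids):
    """Nearest-centroid emotion label (a stand-in for an SVM/NN classifier)."""
    dists = {label: np.linalg.norm(features - c) for label, c in centroids.items()}
    return min(dists, key=dists.get)

# Toy usage: a constant frame normalizes to zeros, matching the 'happy' centroid.
frame = np.full((32, 32, 3), 200, dtype=float)
feats = extract_features(preprocess(frame))
centroids = {"happy": np.zeros(16), "sad": np.ones(16) * 5.0}
print(classify(feats, centroids))                # prints "happy"
```

Real systems replace each stage with the components discussed in Section II (e.g., Viola-Jones detection, STTM or LBP features, SVM classification), but the data flow is the same.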

978-1-5090-3243-3/17/$31.00 ©2017 IEEE 692



One efficient family of algorithms detects facial motion using optical flow: the flow computed over intervals of time reflects changes in the image, the algorithm works on segmented image frames, and the flow vectors with the strongest degree of agreement determine the facial emotion. Another line of work recognizes expressions from databases coded with Action Units (AU); using such a database, matching facial expressions can be identified. Broadly, four types of expression recognition approach can be distinguished [5]: the first uses motion speed, the second applies optical flow to individual image frames, the third uses active shape models, and the fourth uses neural networks [3]. The face is a complex, multidimensional object, and developing a model for face recognition is hard work. For face detection, several databases with different conditions (expression, lighting, etc.) and different faces are available [6]. Facial expression recognition faces many difficulties, such as non-monotonic illumination variation, random noise, and changes with age, and each existing method is limited in the conditions under which it can identify expressions. Although some methods, such as Gradient Face, have high discriminative power under illumination variation, their capability under expression and age variation is still limited [7].

Over the last decade, a number of methods have been presented for video-based or image-based recognition of human facial expressions and emotions. These methods differ in the methodologies and facial expression datasets used. What matters most for any method are detection accuracy and processing time: these two performance metrics define the quality and efficiency of the proposed method. In this paper, our goal is to study such recent facial expression recognition methods that operate on video input, along with their advantages and disadvantages. In the remainder of the paper, Section II presents the survey of recent video-based facial expression and emotion recognition methods. Section III presents a comparative analysis of the discussed methods in tabular form, with a graph comparing accuracy. Section IV presents the current limitations and research gaps. Finally, Section V discusses the conclusion and future work.

II. METHODS STUDY

This section studies nine recent methods for facial emotion/expression recognition from video. The methods studied were published in 2014, 2015, and 2016.

The authors of [5] introduced a novel facial motion detection and expression identification system based on video data. Using a 3D deformable facial model, an online statistical model (OSM) and a cylinder head model (CHM) were combined to track 3D facial motion in a particle filtering framework. For facial expression identification, two algorithms that sequentially retrieve the facial animation were developed: a fast, efficient one and a robust, precise one. In the first, once the facial animation is obtained, the expression is identified from static facial expression knowledge learned from anatomical analysis. In the second, facial animation and facial expression are accessed simultaneously to improve reliability and robustness under noisy input data; the expression is then recognized by fusing static and learned dynamic facial appearance knowledge, training a multi-class expressional Markov process on a video database [5].

The author of [6] presented a novel face detection and face tracking strategy for video streams. It requires neither prior localization of a face in the video frame nor any assumption about the pose. A rectangular window is drawn by calculating the top-left, top-right, bottom-left, and bottom-right points of the face outline in a video frame. Some mathematical pre-processing of the video frame is required to remove errors, and edge-based production of contour boundary images is required. After this, the scalar and vector distances between corresponding corner points of two consecutive frames are found in order to track the face location: a displacement of the corner points means the face position has changed in the next frame. Figure 1 shows the approach used for video-based face detection and tracking [6].

A user-oriented contextual advertisement system for online video was introduced in [7]. The approach combines a network streaming structure, a metadata structure for video data storage, and video-based face identification using machine learning models on camera input within a multimedia communication setting. Suitable object classes are determined from the captured images, which are analysed against a predefined set of conditions; based on the detected object class, the system selects appropriate content from a multimedia advertising database and plays it automatically. The work additionally analysed existing approaches to face identification in video streams and age estimation from face images [7].

The authors of [8] presented a different approach, based on facial video tracking, for the detection of physical fatigue. They introduced an efficient non-contact system for detecting non-localized physical fatigue from maximal muscle activity, using facial videos acquired in a practical environment with natural lighting, where subjects were allowed to voluntarily turn their heads, change their facial expressions, and vary their poses. The method uses a facial feature point detection scheme that combines 'good features to track' with a 'supervised descent method' to address the challenges arising from this practical scenario. A face quality assessment module was also incorporated to reduce erroneous results by eliminating poor-quality faces that occur in a video sequence due to realistic lighting, head motion, and pose variation [8].

The authors of [9] proposed a lightweight online method for neutral vs. emotion classification that serves as a pre-processor to traditional supervised emotion recognition (ER) methods and overcomes their shortcomings in accuracy and speed. A personalized model is constructed by learning the neutral appearance of the user online from a set of reference neutral frames, thereby overcoming problems caused by facial biases, lighting conditions, etc., since both learning and testing happen on the same user. An emotion-based reference model built from emotion frames would be difficult to generate and might not generalize, as it would require many different kinds of emotion frames from each user, increasing computational complexity.
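Several of the surveyed trackers ([6], [8]) reduce face tracking to following a small set of corner points between consecutive frames. The step can be sketched as follows; the four rectangle corners, their coordinates, and the 2-pixel threshold are illustrative assumptions, and a real tracker would first extract the corners from contour boundary images of each frame.

```python
import numpy as np

def corner_displacements(corners_prev, corners_curr):
    """Euclidean displacement of each tracked corner point between two
    consecutive frames (points given as (x, y) rows)."""
    return np.linalg.norm(corners_curr - corners_prev, axis=1)

def face_moved(corners_prev, corners_curr, threshold=2.0):
    """Declare that the face position changed when the mean corner
    displacement exceeds a pixel threshold."""
    return corner_displacements(corners_prev, corners_curr).mean() > threshold

# Toy frames: the four bounding-rectangle corners shift 3 px to the right,
# so the tracker reports a new face location in the next frame.
prev_pts = np.array([[10, 10], [50, 10], [10, 60], [50, 60]], dtype=float)
curr_pts = prev_pts + np.array([3.0, 0.0])
print(face_moved(prev_pts, curr_pts))            # prints True
```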




Moreover, such a system would require a user interface to guide the user through giving all kinds of emotions before the reference model could be generated from those frames. Hence, neutral frames are the preferred choice for reference model building. An experiment was carried out with 30 users to evaluate the possibility of a user starting with a non-neutral expression when a smartphone application is launched; the users were explicitly not asked to start with a neutral expression [9].

The authors of [10] introduced a method to identify emotions from video streams of the BU-4DFE database, using videos of 101 subjects and 6 emotions. The method dynamically finds the apex frame of a video sequence. The Euclidean distance between feature points in the apex and neutral frames is found, and the differences between corresponding points in the neutral and apex frames form the feature vector that is given to a classifier for recognizing emotions. The technique uses only two frames and 39 feature points, which minimizes the size of the feature vector. Classification was done using an SVM (Support Vector Machine) and an NN (Neural Network) with different kernels. The authors measured the classification time for the SVM, where the Gaussian RBF, Gaussian RBF (soft margin), and sigmoid kernels perform comparatively better at low computation time. The experimental results show that the method's accuracy is better than that of previous methods, but it has not yet been evaluated on real-time datasets. Figure 2 shows the approach adopted in this method [10].

Given an m-by-n data matrix X, treated as m (1-by-n) row vectors x1, x2, ..., xm, the squared Euclidean distance between the vectors xs and xt is defined as:

    d_st^2 = (xs − xt)(xs − xt)'    (1)
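Applied per landmark (and taking the square root of the squared distance in Eq. (1)), this yields the feature vector described above. A minimal sketch follows, in which the landmark coordinates are invented for illustration and 3 points stand in for the 39 used in [10]:

```python
import numpy as np

def emotion_feature_vector(neutral_pts, apex_pts):
    """Per-landmark Euclidean distances between corresponding feature points
    of the neutral frame and the apex frame, following Eq. (1). With 39
    landmarks this yields a 39-dimensional feature vector for the classifier."""
    diff = apex_pts - neutral_pts                # (n_points, 2) displacements
    return np.sqrt((diff * diff).sum(axis=1))

# Toy example: the second landmark moves 3 px right and 4 px up between
# the neutral and apex frames, so its distance component is 5.
neutral = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 5.0]])
apex    = np.array([[0.0, 0.0], [13.0, 14.0], [20.0, 5.0]])
print(emotion_feature_vector(neutral, apex))     # prints [0. 5. 0.]
```

In the surveyed method, this vector is what gets passed to the SVM/NN classifiers compared above.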

Figure 1: System Architecture of Method


Figure 2: Emotion Recognition System Using Video Sequences




The authors of [11] introduced a spatiotemporal feature extraction scheme for facial expression recognition in video sequences. The proposed spatiotemporal texture map (STTM) can capture subtle spatial and temporal variations of facial expressions with low computational complexity. First, the face is detected using the Viola-Jones face detector and the frames are cropped to remove unwanted background. Facial features are then modelled with the proposed STTM, which uses spatiotemporal information derived from the three-dimensional Harris corner function. A block-based method is adopted to extract the dynamic features and represent them as histograms. The features are then classified into emotion and expression classes by a support vector machine classifier. The experimental results demonstrate that the approach outperforms state-of-the-art approaches, with average recognition rates of 95.37%, 98.56%, and 84.52% on datasets containing posed expressions, unforced micro-expressions, and close-to-real-world expressions, respectively [11].

The authors of [12] proposed an efficient method for recognizing facial expressions and emotions from video frames. To robustly recognize facial emotions in real-world natural situations, they introduced a novel technique called Extreme Sparse Learning (ESL), which jointly learns a dictionary (set of basis vectors) and a non-linear classification model. The approach combines the discriminative power of the Extreme Learning Machine (ELM) with the reconstruction property of sparse representation to enable accurate classification of the noisy signals and imperfect data recorded in natural settings. Additionally, the work presents a new local spatio-temporal descriptor that is distinctive and pose-invariant. The framework achieves state-of-the-art recognition accuracy on both acted and spontaneous facial emotion databases [12].

The authors of [13] introduced another video-based facial expression recognition method. Inspired by the success of VLBP, they first combined LBP-TOP features (which could also be called simplified 3D-LBP) with a Gabor feature representation to describe dynamic facial expression sequences; an SVM is then adopted for classification. Experiments on the extended Cohn-Kanade database (CK+) show the promising performance of this method [13].

III. COMPARATIVE STUDY

In this section we present a tabular analysis of the methods studied in the previous section, with their advantages and disadvantages. After the tabular analysis (Table 1), a graph of accuracy is presented.

TABLE 1: COMPARATIVE STUDY OF EMOTION/EXPRESSION RECOGNITION TECHNIQUES

Paper Title: Spatiotemporal Feature Extraction for Facial Expression Recognition
Methodology: STTM (spatiotemporal texture map), SVM, block processing
Advantages: outperforms many state-of-the-art appearance-based feature extraction techniques
Disadvantages: the classification framework is basic and was not evaluated under complex head movements

Paper Title: Video-Based Facial Recognition Using Histogram Sequence of Local Gabor Binary Patterns from Three Orthogonal Planes
Methodology: LGBP-TOP, SVM, Gabor filters
Advantages: robust and less complex method
Disadvantages: accuracy performance needs improvement

Paper Title: Dynamic Facial Emotion Recognition from 4D Video Sequences
Methodology: Euclidean distance, apex frames, Neural Network, SVM
Advantages: accuracy is improved compared with more complex methods
Disadvantages: not yet evaluated under real-time conditions

Paper Title: Neutral Face Classification Using Personalized Appearance Models for Fast and Robust Emotion Detection
Methodology: CLM, patch processing, LBP, KE points
Advantages: computational advantage when used as a pre-processing unit
Disadvantages: CLM fitting may not be accurate under sudden pose variations
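The LBP-family descriptors recurring in Table 1 are built from histograms of local binary codes. The following is a minimal single-frame sketch in plain NumPy; it is not the surveyed LBP-TOP/LGBP-TOP implementations, which additionally sample the XT and YT planes of the video volume and, for LGBP-TOP, apply Gabor filtering first.

```python
import numpy as np

def lbp_histogram(image):
    """Basic 8-neighbour Local Binary Pattern codes and their normalized
    256-bin histogram for a grayscale image (border pixels are skipped)."""
    # Offsets of the 8 neighbours, ordered clockwise from the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = image.shape
    centre = image[1:h - 1, 1:w - 1]
    codes = np.zeros((h - 2, w - 2), dtype=int)
    for bit, (dy, dx) in enumerate(offsets):
        # Set this bit wherever the neighbour is >= the centre pixel.
        neighbour = image[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes += (neighbour >= centre).astype(int) * (1 << bit)
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / hist.sum()                     # normalized histogram feature

# A flat image yields code 255 everywhere (all 8 neighbours >= centre),
# so the entire histogram mass falls in the last bin.
flat = np.full((8, 8), 7, dtype=np.uint8)
print(lbp_histogram(flat)[255])                  # prints 1.0
```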




Paper Title: Robust Representation and Recognition of Facial Emotions Using Extreme Sparse Learning
Methodology: ESL, ELM
Advantages: better performance under challenging scenarios
Disadvantages: higher computational cost for both feature extraction and classification

Figure 3: Accuracy Comparison of Different Methods. (Bar chart of "Accuracy Analysis [%]" vs. "Methods [Reference Number]": the reported accuracies of the methods of [9], [10], [11], [12], [13], [5], and [6] range from 80.22% to 95.33%.)

IV. RESEARCH GAP

In the sections above, we studied the most recent techniques for video-based facial expression recognition with the goal of finding their limitations and the scope for further improvement. The most important performance metric for any method is recognition accuracy. From the accuracy comparison in Figure 3, we make the following observations:

- The methods that are robust have poor recognition accuracy.
- The methods with better accuracy (in the 90s) suffer in efficiency and robustness.
- Most methods are not evaluated for processing time, which is also important for any facial expression and emotion recognition method.
- Some video-based methods are evaluated using image-based datasets.
- Overall accuracy is at best approximately 95%, which still needs to be enhanced further.

V. CONCLUSION AND FUTURE WORK

The goal of this paper was to present a comparative study of different techniques for video-based facial expression and emotion recognition. Over the last decade, automatic facial expression and emotion recognition has come to play a significant role in daily communication and in computer science, for example in human-computer interaction systems, biometrics, and security. In recent years, much deep and productive research in this area has been carried out. The methods studied here are from 2014 to 2016, with their advantages, disadvantages, and accuracy performance, and the open research problems of the current methods have been identified. For future work, new methods should be designed to improve the robustness, efficiency, and accuracy of recognition.

REFERENCES

[1] R. Samad and H. Sawada, "Edge based Facial Feature Extraction Using Gabor Wavelet and Convolution Filters," in MVA, pp. 430-433, 2011.
[2] L. H. Thai, N. D. T. Nguyen, and T. S. Hai, "A facial expression classification system integrating canny, principal component analysis and artificial neural network," arXiv preprint arXiv:1111.4052, 2011.
[3] P. Sisodia, A. Verma, and S. Kansal, "Human Facial Expression Recognition using Gabor Filter Bank with Minimum Number of Feature Vectors," International Journal of Applied Information Systems, vol. 5, no. 9, pp. 9-13, July 2013.
[4] S. S. Meher and P. Maben, "Face recognition and facial expression identification using PCA," in 2014 IEEE International Advance Computing Conference, pp. 1093-1098, IEEE, 2014.
[5] J. Yu and Z. Wang, "A Video-Based Facial Motion Tracking and Expression Recognition System," Springer Science+Business Media New York, 2016.
[6] A. Dey, "Contour based Procedure for Face Detection and Tracking from Video," in 3rd Int'l Conf. on Recent Advances in Information Technology (RAIT-2016), 2016.
[7] L. N. Bao, D.-N. Le, L. V. Chung, and G. N. Nguyen, "Performance Evaluation of Video-Based Face Recognition Approaches for Online Video Contextual Advertisement User-Oriented System," Springer India, 2016.
[8] M. A. Haque, R. Irani, K. Nasrollahi, and T. B. Moeslund, "Facial video-based detection of physical fatigue for maximal muscle activity," IET Computer Vision, 2016.
[9] P. Chiranjeevi et al., "Neutral face classification using personalized appearance models for fast and robust emotion detection," IEEE Transactions on Image Processing, 2015.
[10] Suja P. et al., "Dynamic Facial Emotion Recognition from 4D Video Sequences," IEEE, 2015.
[11] S. K. A. Kamarol et al., "Spatiotemporal feature extraction for facial expression recognition," IET Image Processing, pp. 1-8, 2016.
[12] S. Shojaeilangari et al., "Robust Representation and Recognition of Facial Emotions Using Extreme Sparse Learning," IEEE Transactions on Image Processing, 2015.
[13] L. Xie et al., "Video-based Facial Expression Recognition Using Histogram Sequence of Local Gabor Binary Patterns from Three Orthogonal Planes," in Proceedings of the 33rd Chinese Control Conference, Nanjing, China, July 28-30, 2014.
