
Face Detection Using Stable Mass Spring

Model as Dynamic Deformable Template

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of


Bachelor of Science in Computer Science and Engineering

By:
Md. Abdul Kaiyum Neon (14101027), Md. Mahesinul Awal Mannan Joy
(14101036), Aparajita Saha (14101043), Tasnim Islam (14101048)

Supervised By:
Dr. Bilkis Jamal Ferdosi
Associate Professor
Department Of CSE, UAP

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


UNIVERSITY OF ASIA PACIFIC
March 2018
DECLARATION

We hereby declare that the work presented in this thesis is the outcome of the investigation
performed by us under the supervision of Dr. Bilkis Jamal Ferdosi, Associate Professor,
Department of Computer Science & Engineering, University of Asia Pacific. We also declare
that no part of this thesis has been or is being submitted elsewhere for the award of any
degree or diploma.

Signature

Supervisor Supervisee

…………………………….. ...…..……………………………
Dr. Bilkis Jamal Ferdosi Md. Abdul Kaiyum Neon
Associate Professor, ID: 14101027
Department of CSE
University of Asia Pacific
…...…….………………………
Md. Mahesinul Awal Mannan Joy
ID: 14101036

..……...…………………………
Aparajita Saha
ID: 14101043

…………………………………
Tasnim Islam
ID: 14101048

ABSTRACT
Face detection has progressed tremendously in recent years and is now a vast and mature
research field; faces can be detected quite reliably using advanced methods and algorithms. In
this thesis, we introduce a new method to detect a human face using a Stable Mass-Spring Model
(SMSM) as a Dynamic Deformable Template (DDT). DDTs were previously used to detect objects and
digits [10]; inspired by that work, we extend the method to detect human faces. We use a 10x11
rectangular grid model to detect and classify human faces, together with three different kinds
of sensors. First, we use an intensity sensor to capture the varying intensities at different
positions of a face. Second, we use a Sobel operator to detect the edges of the face. Third, we
use Harris corner detection to detect facial corners. We use the AT&T database to train and test
our proposed method. Through training, we acquire the total external energy from the image
force, which we use to calculate a Confidence Interval (CI). In testing, we compare the CI with
the energy of the test data; based on this comparison, our method can detect and classify human
and non-human faces. Through this thesis we found that i) a DDT using SMSM can detect a human
face, ii) through training we can learn proper parameters to improve performance, iii) the
rectangular grid model needs less training than other methods, and iv) the more sensors we use,
the more accurate the result.

ACKNOWLEDGEMENTS

First of all, we would like to thank almighty Allah. We completed our work with such ease
because He gave us the ability, the opportunity, and a cooperative supervisor.

We would like to take the opportunity to express our gratitude to Dr. Bilkis Jamal Ferdosi, our
respected supervisor. Although she was always loaded with several other activities, she gave us
more than enough time for this work. She not only gave us time but also proper guidance and
valuable advice whenever we faced difficulties. Her comments and guidance helped us in
preparing our thesis report.

We want to thank all the teachers and staff of our department for their help during the working
period. Finally, we would like to thank our parents and family members for their support.

Dedication

Dedicated to
all image processing
and Computer vision researchers.

“Knowledge from which no benefit is derived is like a treasure out of which nothing is spent
in the cause of Allah."
The Prophet Muhammad (SM) (peace be upon him)

Table of Contents
DECLARATION .................................................................................................. I
ABSTRACT ......................................................................................................... II
ACKNOWLEDGEMENT ................................................................................. III
CHAPTER 1: INTRODUCTION ....................................................................... 1
1.1 MOTIVATION .................................................................................................. 1
1.2 PROPOSED WORK ...........................................................................................3
1.3 APPROACH OVERVIEW ...................................................................................4
CHAPTER 2: STATE OF THE ART ..................................................................5
2.1 FEATURE-BASED METHODS ..........................................................................5
2.2 STATISTICAL-BASED METHODS ................................................................... 11
2.3 MODEL-BASED METHODS........................................................................... 15
2.3.1 Description-Based DMs ....................................................................... 15
2.3.2 Prototype-Based DMs .......................................................................... 21
2.4 DEFORMABLE TEMPLATE MATCHING ........................................................ 22
2.4.1 Geometrical Model ............................................................................... 23
2.4.2 Imaging Model ..................................................................................... 23
2.4.3 Matching Algorithm ............................................................................. 24
2.5 DISCUSSION ................................................................................................. 24
CHAPTER 3: BACKGROUND ......................................................................... 26
3.1 FACE RECOGNITION SYSTEM....................................................................... 26
3.1.1 Face Detection ...................................................................................... 27
3.1.2 Feature Extraction ................................................................................ 28
3.1.3 Face Recognition/ Classification ........................................................ 29
3.1.4 Face Recognition Application ............................................................. 30
3.2 DEFORMABLE MODEL ................................................................................. 31
3.3 THE MASS-SPRING MODEL ......................................................................... 31
3.4 DYNAMICS OF THE MODEL.......................................................................... 33
3.4.1 Internal Forces...................................................................................... 33
3.4.2 External Force ...................................................................................... 35
3.5 DISCRETE EQUATIONS OF MOTION ............................................................ 36
CHAPTER 4: PROPOSED METHOD ............................................................. 37
4.1 INTRODUCTION ........................................................................................... 37
4.1.1 Model design ........................................................................................ 37
4.1.2 Activating the DDTs ............................................................................ 38

4.2 SIMULATION OF SMSM ............................................................................... 38
4.2.1 Sensors .................................................................................................. 39
4.2.2 Gaussian smoothing............................................................................. 41
4.3 TRAINING .................................................................................................... 42
4.4 CLASSIFICATION .......................................................................................... 44
CHAPTER 5: EXPERIMENTS & RESULTS .................................................. 45
5.1 TEMPLATE CREATION, SENSORS AND INITIAL PARAMETER SETTINGS ....... 45
5.2 TRAINING AND TEST DATASET ................................................................... 46
5.3 MEASURE PERFORMANCE OF OUR PROPOSED METHOD .............................. 46
5.4 DISCUSSION ................................................................................................. 51
CHAPTER 6: CONCLUSION ........................................................................... 52
6.1 CONCLUSION ............................................................................................... 52
6.2 FUTURE WORK ............................................................................................. 52
BIBLIOGRAPHY ............................................................................................... 54

LIST OF FIGURES
1.1 Rectangular Grid DDT 04
2.1 An example of accurate and detailed face detection in each of the frames over a
video sequence of ASL sentences. The top row shows the automatic detection
obtained with the algorithm of [5]. The bottom row provides a
manual detection for comparison 06
2.2 The red star in the figure corresponds to the center of the eye window, used to
generate the training data for eyes. The blue dots represent the window centers for
the background samples. The distance between the eye center and each background
window is set to 24 pixels 06
2.3 Two examples of eyebrow detection. (a) Binary description of the brow region. (b)
Final contour detection 07
2.4 (a) mouth corner detection, (b) Gradient of the mouth window, its x and y
projection, and the final result 08
2.5 Recognition accuracy varies considerably with mixture complexity. Both “soft”
and “hard” VQ versions of mixture distance are presented 10
2.6 (a) Input Training Images, (b) Edge 12
2.7 Normalized Training Images 13
2.8 Middle: An image containing frontal faces subject to in-plane rotations. Left
and right: In-plane rotated by ±30° 14
2.9 Schematic illustration of merging from different channels. From left to right: outputs
of the frontal, left, and right view channels, and the final result after the merge 14
2.10 A deformable template for “archetypal” human eyes 16
2.11 Final results of the eye templates. The right column shows the final state of the
template acting on the eyes in the left column. Note that a small error occurs in the
alignment of the bottom template, due to a strong intensity peak on the eyelid and
some shadow in the eye 17
2.12 The mouth closed template. The mouth is centered on a point xm and has an
orientation 𝜃. The left and right boundaries are at distances b1 and b2 from xm. The
intersection of the upper two parabolas occurs directly above xm at a height of h.
The lower two parabolas have maximum distances from the central line (shown
dotted) of a and a+ c 17
2.13 A dynamic field for a closed mouth: (a) the original mouth, (b) the valley field, (c)
the peak field, and (d) the edge field. The figure is organized left to right and top
to bottom. The strengths of the fields are shown in greyscale; white is strong and
black is weak 18
2.14 Final results for the mouth-closed templates. The right column shows the final state
of the template acting on the mouth in the left column 18
2.15 Estimation of facial feature positions 19

2.16 (a) Face shape overlaid on a real image. (b) Topological shape with local texture
attributes 20
2.17 Feature extraction examples (without performing the initial position estimation
step) 20
2.18 Deformable template matching (a) a prototype template (b) deformable template
with transformation interpolation of order 3 (c) localization of a saxophone using
manually chosen initial template position 22
2.19 An example of template matching. The example fish can be matched to only 3 fishes
(solid lines) with rigid template matching using translation, scaling, and rotation. It
can be matched to all the fishes using deformable template matching (dashed lines) 23
3.1 The basic flowchart of a face recognition system 26
3.2 Spring-mass model 32
3.3 The collapsing problem. (a) shows a simple mass-spring model. In (b) this system is
contorted, although no spring has changed its length. In (c) the same system is
collapsed with no single spring length changed. (d) is the same system fully
interconnected: it avoids such shape instabilities, but loses flexibility at the
same time 33
3.4 Torsion force: rest directions of the springs are shown in dotted lines; springs
deviate from their rest directions under the influence of the external force (shown
in solid lines). 34
4.1 Block diagram of a model design 37
4.2 Block diagram of SMSM simulation 38
4.3 RGB and Gray Scale Image 39
4.4 Image seen as 3-D surface 40
4.5 Corner detection 40
4.6 Edge detection using the Sobel operator 41
4.7 Block diagram of a training 42
4.8 Block diagram of a classification 44
5.1 (a) is the image when the data is first entered into our method, (b) is the image
when acquisition of the image force is complete. 47
5.2 (a) is the main input image for the test, (b) is the feature image after using Harris
corner detection, (c) is the feature image after using Sobel edge detection, (d) is
the simulation starting point image, (e) is the simulation ending point image. 50
5.3 The success rate of our method (SMSM as DDT). 51

LIST OF TABLES
2.1 The f parameter which selects first vs. second-order models has a potent
effect on recognizer accuracy. 10
3.1 Face Recognition Applications 29

5.1 These are the results of our training data, which we acquire using Eq. 3.5 47

Chapter 1
Introduction

1.1 Motivation
Face recognition has received huge emphasis over the last few years due to its various
applications in law enforcement and security. Expectedly, this long-standing research area is
experiencing a recent boom, with different techniques fine-tuning the face recognition task to
handle the different obstacles of real-life scenarios.

Face recognition is a method that mathematically extracts features from an individual and
stores the data as a face print, which is then used to recognize the person by comparing the
face print with a live capture or digital image. The process of recognizing a face involves
detecting the face by segmenting it from the natural scene, extracting features from the
segmented face, and recognizing the face by comparing it with the faces in the database.

The approach presented by Yuille et al. detects and describes features of the face, such as
eyes and mouth, using parameterized deformable templates [2]. However, it was unable to extract
internal regions such as the forehead or the cheeks, or to find external features such as ears
and hair, and convergence was not guaranteed, which made it ineligible for real-time
surveillance applications. The earlier research of F. Zuo and P.H.N. de [3] was an effective
feature localization approach for real-time surveillance applications using low-quality images.
By using multiple initialization points, this technique overcomes the problem of local
convergence experienced by Yuille et al., but it only exploits partial descriptions such as the
iris center. Cootes described two approaches for feature extraction, the Active Shape Model
(ASM) and the Active Appearance Model (AAM) [4]. The ASM uses a local deformation process to
fit the shape model to the target image; during model deformation, ASM uses only 1-D profile
information. The AAM is able to synthesize a very close approximation to any image during model
deformation, but only samples the current position of the image. ASM is faster and achieves
more accurate feature point locations than AAM. However, both the ASM and AAM approaches break down


with wrong convergence when the real face position is not found. F. Zuo and P.H.N. de combined
them into one single model, solving the above problem by introducing a Haar-wavelet algorithm
for modeling [1].

Ding and Martinez introduced a combined approach to provide accurate and detailed detection of
the major facial features [5]. They used part-by-part facial feature extraction for eyebrows,
eyes, nose, mouth, and chin for accurate detection. Previously, many applications needed very
accurate and detailed detection of the shape of the major facial features, but achieved only
limited success on this task; Ding and Martinez therefore use their algorithm to detect faces
automatically. The task of detecting eyes becomes extremely challenging when an eye opens or
closes and the iris may be looking in different directions, so their experimental results on
the detection of these facial features vary substantially with expression, pose, and
illumination.

Xiang Yu et al. tried to solve the problems of facial landmark localization and tracking from a
single camera [6]. The pose variation problem is partially addressed by multi-view face shape
models [7,8,9]. S. Zhang et al. faced three problems: handling non-Gaussian errors, modeling
complex shape variations, and capturing local detail information of the input shape [7]. They
tried to solve these problems and achieved 70% success. Y. Zhou et al. tried to solve the
problem of multi-view face alignment [8]. There were two major problems: i) the multi-modality
caused by diverse shape variations, and ii) the varying number of feature points caused by
self-occlusion. They solved these problems by proposing a new nonlinear model for multimodal
shape registration, in which an efficient EM (expectation-maximization) algorithm for shape
regularization is presented; their proposed model can work over a large range. T. Cootes and C.
Taylor faced the problem of obtaining reliable parameters for a mixture model rather than for
an individual Gaussian [9]. Their experiments showed that a mixture model can be used to
represent non-linear shape variation and to find the class in a new image; however, it could
not handle the unlimited possibilities of view changes. Xiang Yu et al. tried to solve these
problems using a two-step cascaded deformable shape model [6].


Model-based approaches, especially mass-spring models, are popular for their capability of
segmenting the region of interest and classifying it by collecting features. They are robust
against occlusion, shadow, and missing parts, and require very few training samples to achieve
successful recognition. Real-life scenarios are always prone to such conditions, so applying
such a model may provide certain benefits compared to traditional approaches. For example, a
statistical approach depends on a number of factors and statistical calculations to determine
the result, whereas an SMSM (stable mass-spring model) based approach analyzes the data (image)
and calculates the internal and external forces to determine the result.

B. J. Ferdosi and K. D. Tönnies combined a physically-based model and a deformable template to
create a dynamic deformable template (DDT) to detect and classify digits [10]. They used two
types of models: the first was handcrafted DDTs, different for each digit; the second was a
rectangular grid DDT able to detect and classify any digit. The method uses two types of
forces, an external force and an internal force.

We use the same method as [10] and create a rectangular grid DDT to detect a human face.

1.2 Proposed Work


In this thesis, we propose a model using a stable mass-spring model (SMSM) as a dynamic
deformable template (DDT) to detect faces. This technique is a combination of template-based
and physically-based strategies. Shape information is introduced as model forces. Because of
their flexibility, the templates are able to change their size and other parameter values to
match themselves to the data; they deform under the external force (image force) and internal
force (mass force). We propose to use a rectangular grid model driven by these forces. We chose
the AT&T face database as a standard because of its variation of faces: it contains 10
different images of each of 40 different people, which is very helpful for testing our method.
The size of each image is 92x112 pixels, with 256 grey levels per pixel. We create a
rectangular grid DDT that is able to detect and classify between human and non-human faces.
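To make the rectangular grid template concrete, the following is a minimal sketch (our own
illustration, not the thesis implementation) that lays a 10x11 lattice of mass points over a
92x112 AT&T-sized image and connects 4-neighbours with springs; the `margin` placement
parameter is an assumption.

```python
import numpy as np

def make_grid_template(rows=10, cols=11, img_h=112, img_w=92, margin=10):
    """Place a rows x cols lattice of mass points over an image.

    Returns an array of (x, y) node positions and a list of
    horizontal/vertical spring connections between 4-neighbours.
    """
    ys = np.linspace(margin, img_h - margin, rows)
    xs = np.linspace(margin, img_w - margin, cols)
    nodes = np.array([(x, y) for y in ys for x in xs])  # rows*cols points

    springs = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:          # horizontal spring to right neighbour
                springs.append((i, i + 1))
            if r + 1 < rows:          # vertical spring to lower neighbour
                springs.append((i, i + cols))
    return nodes, springs

nodes, springs = make_grid_template()
print(len(nodes), len(springs))  # 110 nodes, 199 springs
```

During simulation, each node would additionally carry the sensor responses (intensity, edge,
corner) sampled at its position.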


Fig 1.1: Rectangular Grid DDT [10].

1.3 Approach Overview


The presentation of the thesis follows this structure.
In Chapter 1, we discuss our motivation and proposed work, and introduce face recognition,
mass-spring models, and deformable templates. The rest of the chapters are organized as
follows.
In Chapter 2, the state of the art in face recognition techniques is presented.
In Chapter 3, the theoretical background of the stable mass-spring model (SMSM) is presented,
which is the basis of our proposed dynamic deformable template (DDT).
In Chapter 4, a detailed description of the proposed DDT method is presented.
In Chapter 5, the training and testing of our method on the dataset and the calculation of
results are delineated.
In Chapter 6, conclusions and directions for future work are given.

Chapter 2
State of the Art
In this thesis, our goal is to create a dynamic deformable model to detect a human face, as
illustrated in the previous chapter. In this chapter, we discuss related work on different
methods; our main goal is to learn how the methods work and how effective they are. We also
give a short review of work related to face recognition and deformable templates.
There are mainly three different categories of face recognition methods:

2.1 Feature-Based Methods


In these methods, local features such as eyes, nose, and mouth are first extracted, and their
positions and local statistics (geometric or appearance) are compiled into a structural
classifier. A big challenge for feature extraction methods is feature "restoration": when the
system attempts to recover features that are invisible due to large variations, e.g., in head
pose when matching a frontal image with a profile image [11].

Ding and Martinez introduced a combined approach to provide accurate and detailed detection of
the major facial features [5]. They used part-by-part facial feature extraction for eyebrows,
eyes, nose, mouth, and chin. To detect facial features, they employed an algorithm which learns
to split the data into adequate subclasses, each subclass representing a distinct image
feature; this process can detect facial features in difficult conditions. They learned to
discriminate the surroundings of each facial feature from the feature itself, which prevents
detections in areas around the actual feature and improves accuracy. Once a large number of
such classifiers provide a pre-detection of the facial features, they employed edge detection
and color to find the connecting path between them. The resulting final detections are shown in
Fig. 2.1. They provided experimental results on the detection of these facial features over a
video sequence of ASL sentences (Fig. 2.1).


Fig 2.1: An example of accurate and detailed face detection in each of the frames over a video sequence of ASL
sentences. The top row shows the automatic detection obtained with the algorithm of [5]. The
bottom row provides a manual detection for comparison [5].

For eye detection, the first class is the eyes, defined by 941 images of cropped eyes, while
the second class corresponds to cropped images of the same size (24x30 pixels) residing in the
areas neighboring the eyes. Fig 2.2 shows how the eye window is centered while the eight
accompanying background windows are located off-center; this yields a total of 7,528 background
images. In order to increase robustness to scale and rotation, they extended the training set
by including the images obtained when re-scaling the main training images from 0.9 to 1.1 at
intervals of 0.1, and by adding randomly rotated versions of them within the range of -15° to
15°.

Fig 2.2: The red star in the figure corresponds to the center of the eye window, used to generate the training data
for eyes. The blue dots represent the window centers for the background samples. The distance between the eye
center and each background window is set to 24 pixels [5].

To detect the eyes in a new image, they do a full search in a prespecified region within the
face detection box. They first search within the smaller green region (since this includes
~90% of the eye centers in their training set). If the eye centers are not successfully located
in this region, they extend their search to the larger blue region.


For the other detections, Ding and Martinez used different models. Since the eyebrows are
either darker or lighter than the skin, they can easily be detected by searching for non-skin
colors in the region above the eyes. To model skin color, they use a Gaussian model defined in
the HSV color space, N(μc, Σc), with μc = (μH, μS, μV)^T and Σc = diag(σH, σS, σV). Over three
million skin color sample points were used to train this model. The pixels above the eye region
that do not correspond to the color model N(μc, Σc) are discarded first. A Laplacian operator
is then applied to the grayscale image above the eyes. From the remaining pixels, the ones with
the highest gradient in each column are kept as strong descriptors of the eyebrows.

Fig 2.3: Two examples of eyebrow detection. (a) Binary description of the brow region. (b) Final contour
detection [5].
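The skin test implied by this diagonal-covariance Gaussian can be sketched as follows. This is
our illustration only: the HSV statistics `mu_c` and `sigma_c` below are made-up placeholders,
since the learned values of N(μc, Σc) are not reported here, and the acceptance rule (a
per-channel z-score test, possible because Σc is diagonal) is an assumption.

```python
import numpy as np

# Hypothetical HSV skin-colour statistics standing in for N(mu_c, Sigma_c).
mu_c = np.array([0.08, 0.40, 0.70])     # assumed (H, S, V) mean
sigma_c = np.array([0.05, 0.15, 0.20])  # assumed diagonal std devs

def is_skin(hsv_pixel, k=2.5):
    """Accept a pixel if every HSV channel lies within k standard
    deviations of the skin mean (diagonal covariance lets us test
    each channel independently)."""
    z = np.abs(np.asarray(hsv_pixel, dtype=float) - mu_c) / sigma_c
    return bool(np.all(z < k))

print(is_skin([0.09, 0.45, 0.65]))  # near the assumed mean -> True
print(is_skin([0.60, 0.90, 0.10]))  # far from the assumed mean -> False
```

Pixels failing this test in the region above the eyes would be kept as eyebrow candidates,
after which the Laplacian step selects the strongest per-column responses.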

They used their methodology to execute a fast, accurate, and detailed detection without the
need to resort to complicated or time-consuming procedures. To this end, a mouth corner
detector is defined using the subclass-based classifier. They only train a left mouth corner
classifier; to detect the right corners, they create mirror images of the test windows
(Fig. 2.4(a)). The bounding box of the mouth is obtained following the same procedure described
above for the nose (Fig. 2.4(b)).

Mouths are rich in color, which makes this final way of delineating them possible. Here, they
used an approach resembling the one employed before (skin color and Laplacian edge detection).
In particular, they extract three features, given by saturation, hue, and Laplacian edges. Each
of these masks is thresholded at values Ts, Th, and Tg before being combined into a single mask
response.


 Ts is set as the average saturation value in their color model.
 Th is set as the mean hue minus one-third of the standard deviation.
 Tg is the mean of the gradient.

An example result is shown in Fig. 2.4(b).

Fig 2.4: (a) Mouth corner detection. (b) Gradient of the mouth window, its x and y projections, and the
final result [5].
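The three-mask combination can be sketched as below. Note that the direction of each threshold
test and the AND combination are our assumptions: the source only states that the saturation,
hue, and Laplacian-edge masks are thresholded at Ts, Th, and Tg and then combined.

```python
import numpy as np

def mouth_mask(hue, sat, lap, t_h, t_s, t_g):
    """Combine three thresholded feature masks of a mouth window.

    hue, sat, lap: 2-D float arrays (hue, saturation, Laplacian edges).
    The inequality directions below are illustrative assumptions.
    """
    m_hue = hue < t_h           # assumed: mouth hue below threshold
    m_sat = sat > t_s           # assumed: strong saturation
    m_lap = np.abs(lap) > t_g   # assumed: strong edge response
    return m_hue & m_sat & m_lap  # single combined boolean mask

rng = np.random.default_rng(0)
h, s, g = rng.random((3, 8, 8))                   # toy 8x8 feature maps
mask = mouth_mask(h, s, g, t_h=0.5, t_s=0.5, t_g=0.5)
print(mask.shape, mask.dtype)
```

The resulting boolean mask would delimit the mouth region within its bounding box.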

Cox et al. introduced the mixture distance technique, which achieves a recognition rate of 95%
on a database of 685 people in which each face is represented by 30 measured distances [12].
They formed a new distance function based on local second-order statistics, computed by
modeling the training data as a mixture of normal densities, and report results for mixtures of
various sizes. Their main goal was to understand the performance limitations of feature-based
systems. The structure of such methods varies widely, but three major elements may be
identified:
 The definition of a feature set.
 The extraction of these features from an image.
 The recognition algorithm.

They assume that an n-element normal mixture model M is created to model the database elements
{yi}; they refer to this as the experimental distribution. Each mixture component Mk is a
normal density N(μk, Σk), and M̄k denotes the corresponding zero-mean density N(0, Σk), so that
Pr(x|Mk) = Pr(x − μk|M̄k). The system's query to be classified is denoted q. Using the mixing
probabilities Pr(Mk) obtained from EM, they may then compute a posteriori component
probabilities Pr(Mk|x). These may be thought of as a stochastic measure of x's membership in
each of the mixture's components. They attribute to each yi a density that is a mixture of the
M̄k weighted by these stochastic membership values. This is best explained by the derivation
which follows.

Pr(q|y, M) = (1 / Pr(y|M)) · Pr(q, y|M)
           = (1 / Pr(y|M)) · Σ_{k=1..n} Pr(q, y|Mk) · Pr(Mk)
           = (1 / Pr(y|M)) · Σ_{k=1..n} Pr(q|y, Mk) · Pr(y|Mk) · Pr(Mk)
           = (1 / Pr(y|M)) · Σ_{k=1..n} Pr(q − y|M̄k) · Pr(y|Mk) · Pr(Mk)
           = Σ_{k=1..n} Pr(q − y|M̄k) · Pr(Mk|y)

where

Pr(q|y, Mk) = Pr(q − y|M̄k), and

Pr(Mk|y) = Pr(y|Mk) · Pr(Mk) / Σ_{i=1..n} Pr(y|Mi) · Pr(Mi).

It is this formulation they use for all of their experiments; more complex expressions are
given elsewhere under weaker or different assumptions. Finally, they observe that in the case
of a single mixture component (n = 1), mixture distance reduces to the Mahalanobis distance
from the query q to the average face μ. Applying a simple Euclidean distance metric to the
database, they obtained an 84% recognition rate.
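The single-component special case can be sketched directly. This is generic Mahalanobis-distance
code with assumed toy values for the mean face and covariance, not the authors' implementation.

```python
import numpy as np

def mahalanobis(q, mu, cov):
    """Mahalanobis distance from query vector q to mean face mu:
    sqrt((q - mu)^T Cov^{-1} (q - mu))."""
    d = np.asarray(q, dtype=float) - np.asarray(mu, dtype=float)
    # Solve Cov x = d instead of explicitly inverting Cov.
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

# Toy 2-D example: variances 4 and 1 along the two feature axes.
mu = np.array([0.0, 0.0])
cov = np.array([[4.0, 0.0],
                [0.0, 1.0]])
print(mahalanobis([2.0, 0.0], mu, cov))  # 1.0 (2 units / std dev 2)
print(mahalanobis([0.0, 2.0], mu, cov))  # 2.0 (2 units / std dev 1)
```

With the identity covariance this reduces to the plain Euclidean distance used in their 84%
baseline.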


For simplicity, they adopt a flat selection, f = 1/2, to decide between first- and second-order
models, i.e., an equal weighting of the diagonal-variance and full-covariance models.

Fig 2.5: Recognition accuracy varies considerably with mixture complexity. Both “soft” and “hard” VQ versions
of mixture distance are presented [12].

Table 2.1 illustrates how f can significantly affect recognition accuracy.

Mixture elements         | Recog. rate (f = 0) | Recog. rate (f = 0.5) | Recog. rate (f ≈ 1)
1                        | 93%                 | 91%                   | 84%
2                        | 86%                 | 94%                   | 84%
5                        | 88%                 | 94%                   | 81%
Mixture of 1-10 mixtures | NA                  | 95%                   | NA

Table 2.1: The f parameter which selects first vs. second-order models has a potent effect on recognizer accuracy
[12].
When the mixture consists of a single Gaussian, the first-order variance model (f = 0) is best
and the full second-order covariance model (f ≈ 1) is notably worse. However, for mixture sizes
2 and 5, an equal weighting, f = 1/2, is best, while the full second-order model with f ≈ 1 is
still a distant third. The point of Table 2.1 is not that the flat selection model can increase
the recognition rate from 93% to 94%, but rather that the recognition rate is consistently good
using a flat selection.


2.2 Statistical-Based Methods


The simplest statistical-based approach represents the image as a 2D array of intensity values,
and recognition is done by direct correlation comparisons between the input face and all other
faces in the database. Examples include Principal Component Analysis (PCA), Eigenfaces,
Fisherfaces, and LDA [13].

The Eigenface method is one of the most widely used algorithms for face recognition. The
eigenface technique is based on Principal Component Analysis (PCA), also known as the
Karhunen-Loeve transform. This method is successfully used to perform dimensionality
reduction, and Principal Component Analysis is used for both face recognition and detection [14].

M. A. Rabbani and C. Chellappan aimed to effectively identify frontal human faces with a high
recognition rate using an appearance-based statistical method for face recognition [15]. Their
approach produced better recognition rates when more complex images were used. As computational
models of face recognition, their approaches are interesting because they can contribute not
only theoretical insights but also practical applications. This face recognition method could
be applied to a variety of problems, including criminal identification, security systems, image
and film processing, and human-computer interaction. Bledsoe [1966] was the first to attempt
semi-automated face recognition, with a hybrid human-computer system that classified faces on
the basis of fiducial marks entered on photographs by hand. Later work at Bell Labs (Goldstein,
Harmon, & Lesk, 1971) developed a vector of up to 21 features and recognized faces using
standard pattern classification techniques.

Face Recognition using Median


Let a face image I(x, y) be a two-dimensional N by N array of (8-bit) intensity values. On the
other hand, an image may also be considered as a vector of dimension N²; these vectors define
the subspace of face images, which they call “face space”. A typical image of size 256 by 256
becomes a vector of dimension 65,536, or, equivalently, a point in 65,536-dimensional space.


Let the training set of face images be Γ1, Γ2, Γ3, ….., ΓM, and let the median face of the set be
Ψ. Each face then differs from the median by
Фi = Γi – Ψ ………………………………………….........................................................(2.1)

(a) (b)
Fig 2.6: (a) Input Training Images, (b) Edge [15].

An example training set is shown in Fig 2.6 (a) with the median face Ψ shown in Fig 2.6 (b).
This set of very large vectors is then subject to principal component analysis, which seeks a
set of M orthonormal vectors uk that best describes the distribution of the data. The kth
vector uk is chosen such that
λk = (1 ⁄ M) ∑n (ukT Фn)², n = 1, 2, ….., M ……………………………………………………(2.2)
subject to the orthonormality constraint ulT uk = δlk, which is 1 if l = k and 0 otherwise.
The vectors uk and scalars λk are the eigenvectors and eigenvalues, respectively, of the
covariance matrix
C = (1 ⁄ M) ∑n Фn ФnT = A AT, n = 1, 2, ….., M …………………………………………………….(2.3)
where the matrix A = [Ф1, Ф2, Ф3, …, ФM]. Since C is of size N² × N², its eigenvectors are
instead obtained from the much smaller matrix ATA: consider the eigenvectors vi of ATA such
that ATA vi = μi vi. Pre-multiplying both sides by A gives A AT (A vi) = μi (A vi), so the A vi are
the eigenvectors of C = A AT. The M training images then form the eigenfaces
ul = ∑k vlk Фk, k = 1, 2, …., M
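The construction above can be sketched in a few lines (an illustrative NumPy sketch with synthetic data; the image size, training-set size, and variable names are our own, not from [15]):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 8, 5                             # tiny N x N "images", M training faces
faces = rng.random((M, N * N))          # each row is one face vector Γi

psi = np.median(faces, axis=0)          # median face Ψ, as used in [15]
Phi = faces - psi                       # Φi = Γi − Ψ  (Eq. 2.1)
A = Phi.T                               # columns are the Φi

# eigen-decompose the small M x M matrix AᵀA instead of the N² x N² matrix AAᵀ
mu, V = np.linalg.eigh(A.T @ A)         # eigenvalues μi, eigenvectors vi
U = A @ V                               # eigenfaces: ul = Σk vlk Φk
U /= np.linalg.norm(U, axis=0)          # normalize each eigenface

# check: A·vi is an eigenvector of AAᵀ with the same eigenvalue μi
u = U[:, -1]                            # eigenface with the largest eigenvalue
assert np.allclose((A @ A.T) @ u, mu[-1] * u)
```

The point of the small-matrix trick is purely computational: both decompositions carry the same non-zero eigenvalues, but ATA is M × M rather than N² × N².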


Using Eigenvectors to Classify a Face Image

A new face image Γ is transformed into its eigenface components (projected into “face
space”) by the simple operation
ωk = ukT(Γ – Ψ)
The input face image is then classified by finding the face class k that minimizes the Euclidean
distance εk = ║Ω – Ωk║², where Ωk is a vector describing the kth face class.
They used two face databases: the Yale face database and the AT&T face database.
In this experiment, they tested their approach on several frontal face images with varying
facial expressions and lighting conditions.

Fig 2.7: Normalized Training Images [15].

Their experimental results show that the technique is a simple and effective method for face
recognition using the median with various distance measures; they obtained the best accuracy
using the City Block distance.
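The projection-and-classification step can be sketched as follows (a minimal sketch; the function names and the toy face-space setup are hypothetical, not from [15]):

```python
import numpy as np

def project(face, U, psi):
    """Project a face vector into face space: ωk = ukᵀ(Γ − Ψ)."""
    return U.T @ (face - psi)

def classify(face, U, psi, class_vectors):
    """Return the index of the class k minimizing εk = ‖Ω − Ωk‖²."""
    omega = project(face, U, psi)
    dists = [float(np.sum((omega - omega_k) ** 2)) for omega_k in class_vectors]
    return int(np.argmin(dists))

# toy example: a 3-D face space with identity "eigenfaces" and two known classes
U, psi = np.eye(3), np.zeros(3)
classes = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
print(classify(np.array([0.9, 0.1, 0.0]), U, psi, classes))  # → 0
```

Other distance measures (e.g. City Block, as in [15]) would simply replace the squared-Euclidean expression in `classify`.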

Li et al. discuss the two essential issues in pattern recognition problems: (i) feature
selection, and (ii) classifier design based on the selected features. Boosting methods
attempt to boost the accuracy of an ensemble of weak classifiers into a strong one [16].
AdaBoost is an algorithm that solved many of the practical difficulties of earlier boosting
algorithms. Floating Search selects features sequentially with backtracking. The two
straight sequential selection methods, sequential forward search (SFS) and sequential
backward search (SBS), add or delete one feature at a time. The Sequential Floating Search
methods instead allow the number of backtracking steps to be controlled rather than fixed in
advance. The learning procedure consists of:
1) Learning incrementally crucial features from a large feature set.
2) Constructing weak classifiers each of which is based on one of the selected features.
3) Boosting the weak classifiers into a stronger classifier using a linear combination
derived during the learning process.

They propose a new boosting algorithm called FloatBoost for effective statistical learning [16].
AdaBoost is a sequential forward search procedure that uses a greedy selection strategy.
FloatBoost incorporates the idea of Floating Search to solve the non-monotonicity problem
encountered in AdaBoost's sequential algorithm. They present an application of FloatBoost in
a learning-based system for real-time multi-view face detection [16]; multi-view face
detection is needed to detect non-frontal faces.

Fig 2.8: Middle: An image containing frontal faces subject to in-plane rotations. Left and right: In-plane rotated
by ±30o [16].

Fig 2.9: Schematic illustration of merge from different channels. From left to right: Outputs of fontal, left, right
view channels, and the final result after the merge [16].


2.3 Model-Based Methods


In this method, a model is hypothesized for each feature of the face, and the best fit to the
given model is found. The method locates each feature by clustering it separately, which
reflects the spatial distribution of the data points. Our main interest here is how the models
are defined, how they work, and for what purposes they have been utilized. After that, we give
a short review of papers related to face detection and deformable template matching measures.
In this thesis, different deformable models (DMs) will be discussed in two broad categories,
following the taxonomy and definitions of [10].

I. Description-Based DMs
Description-based DMs model shape explicitly (they are shape-descriptive), with deformations
modeled as perturbations in the shape parameter space [10].
II. Prototype-Based DMs
In prototype-based DMs, the parameters are not shape-descriptive but parameterize the
shape deformations directly [10].

2.3.1 Description-Based DMs

2.3.1.1 Local Description-Based DMs


Local description-based DMs are also called free-form models, physics-based models, or
distributed parameter models. Polygonal representation, a linear interpolation of an ordered
set of points, is one of the simplest ways to represent this type of DM. Each model point can
move freely without involving the movement of the others, and the model is locally
parameterized. This kind of representation is used by Kass et al. in their active contour model
[15]. Yuille et al. detected and described features of the face such as eyes and mouth by using
parameterized deformable templates [2]. They propose a new method to detect such features
using deformable templates. These templates are specified by a set of parameters which
enables a priori knowledge about the expected shape of the features to guide the detection
process. The templates are flexible enough to change their size and other parameter values,
so as to match themselves to the data.

The template is illustrated in Fig 2.10. It has a total of eleven parameters, represented by
g = (xc, xe, p1, p2, r, a, b, c, 𝜃), where xc and xe are 2D points. All of these are allowed to
vary during the matching.

Fig 2.10: A deformable template for “archetypal” human eyes [2].

To give an explicit representation of the boundary, they first defined the vectors
e1 = (cos 𝜃, sin 𝜃) and e2 = (-sin 𝜃, cos 𝜃),
which change as the orientation of the eye changes. A point x in the shape can be represented
by (x1, x2), where x = x1e1 + x2e2. Using these coordinates, the top half of the boundary can
be represented by a section of a parabola with x1 𝜖 [-b, b]:

x2 = a - (a ⁄ b²) x1²


Note that the maximum height x2 of the parabola is a, and the height is zero at x1 = ± b.
Similarly, the lower half of the boundary is given by

x2 = -c + (c ⁄ b²) x1²

where x1 𝜖 [-b, b].
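The two parabolic boundaries can be evaluated directly (a small sketch; the function name, the `center` argument, and the sample parameter values are our own illustration of [2], not taken from it):

```python
import math

def eye_boundary(x1, a, b, c, theta, center=(0.0, 0.0)):
    """Upper and lower boundary points of the eye template at offset x1 ∈ [-b, b],
    mapped to image coordinates via e1 = (cos θ, sin θ), e2 = (-sin θ, cos θ)."""
    assert -b <= x1 <= b
    x2_top = a - (a / b ** 2) * x1 ** 2   # upper parabola: height a at x1 = 0
    x2_bot = -c + (c / b ** 2) * x1 ** 2  # lower parabola: depth c at x1 = 0
    e1 = (math.cos(theta), math.sin(theta))
    e2 = (-math.sin(theta), math.cos(theta))

    def to_image(x2):
        return (center[0] + x1 * e1[0] + x2 * e2[0],
                center[1] + x1 * e1[1] + x2 * e2[1])

    return to_image(x2_top), to_image(x2_bot)

top, bot = eye_boundary(0.0, a=2.0, b=3.0, c=1.0, theta=0.0)
print(top, bot)   # → (0.0, 2.0) (0.0, -1.0) for an unrotated template
```

During matching, points sampled along these curves would be compared against the edge and valley fields of the image.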

Fig 2.11: Final results of the eye templates. The right column shows the final state of the template acting on the
eyes in the eyes in the left column [2].

They define the mouth closed template in terms of a coordinate system (x,y) centered on a
point x (the center of the mouth) and inclined at an angle 𝜃 (the orientation of the mouth). The
positive y direction points downward for consistency with the coordinate system used on the
computer screen. The edge at the top of the upper lip is represented by two parabolas pu which
intersect above the center of the mouth.

Fig 2.12: The mouth closed template. The mouth is centered on a point xm and has an orientation 𝜃. The left and
right boundaries are at distances b1 and b2 from xm. The intersection of the upper two parabolas occurs directly
above xm at a height of h. The lower two parabolas have maximum distances from the central line (shown dotted)
of a and a+ c [2].


Fig 2.13 demonstrates the method, showing the time history of a simulation, and Fig 2.14
shows the final positions of the template on several images. For the mouth-open template,
there were again two epochs: (1) the coefficients are high for the valley forces and the teeth
forces and zero for the edge forces on the boundary; the teeth forces pull the template to the
mouth, scale it, and orient it. (2) The edge coefficients are increased, and the edge forces help
adjust the position of the edge boundaries. Stage (2) has not yet been implemented for this case.

Fig 2.13: A dynamic field for a closed mouth: (a) the original mouth, (b) the valley field, (c) the peak field, and
(d) the edge field. The figure is organized left to right and top to bottom. The strengths of the fields are shown in
greyscale: white is strong and black is weak [2].

Fig 2.14: Final results for the mouth-closed templates. The right column shows the final state of the template
acting on the mouth in the left column [2].
Cohen et al. introduced a new active contour model called balloons, where the model curve
behaves like an inflated balloon and ignores spurious or weak edges with the help of an
inflation force [16]. In that model, a correct choice of parameters is guided by numerical
analysis considerations. The main limitation of most active contour models is that they do
not incorporate any global shape information of the face and they require a predefined
parameterization that does not change during the deformation process.

2.3.1.2 Global Description-Based DMs


Parametric models, geometric models, and lumped parameter models are other names for
global description-based DMs. This representation is used when prior knowledge about the
shapes is available: with the help of a specific parameter adaptation scheme, the model
generates a family of shapes [10]. F. Zuo and P. H. N. de With tried to address the problems
faced by earlier researchers [1]. The key to their solution is based on three improvements.
Firstly, they use a gradient map for fast determination of the approximate facial feature
positions. Secondly, they employ 2-D texture attributes for improving the convergence and
robustness of a deformable face model. Thirdly, they adopt Haar wavelets for modeling local
texture attributes, which offers high processing speed and robustness for low-quality images.

Fig 2.15: Estimation of facial feature positions using T-shaped model [1].

As Fig 2.15 suggests, existing methods like ASM lack a good initialization procedure. In this
case, a fast estimate of the face position is used to improve the overall algorithm's accuracy
and convergence. B. Froba and C. Kulbeck employed a gradient map of the image to determine
the best match between a candidate portion and an average face template once the face
location and scale are estimated [17].


They define a face model (FM) as an ordered set of face points; Fig 2.16 illustrates their face
model. The shape description P portrays the global topology of the facial features, while the
texture parameters T = {Ti | 1 ≤ i ≤ NFP} describe the local patterns characterizing each feature
point. The local texture attributes guide the deformation of the face model to a specific shape
instance, while the global feature topology constrains the deformation to maintain a face-like
shape.

Fig 2.16: (a) Face shape overlaid on a real image. (b) Topological shape with local texture attributes [1].

Their deformable model fitting takes 30–70 ms to process one face (Pentium-IV, 1.7 GHz),
which is comparable to ASM. Fig 2.17 shows two examples. It can be seen that their proposed
feature extraction is able to achieve correct convergence, even when the model is poorly
initialized.

Fig 2.17: Feature extraction examples (without performing the initial position estimation step) [1].

2.3.1.3 Analytical Models


Analytical models are represented by parametric curves whose shapes can be controlled by a
compact set of parameters; they are global shape models. The advantages of this representation
are that parametric curves reduce the search space for the subsequent matching step, and the
extracted model parameters provide meaningful shape information for subsequent image
analysis, although the curves have restricted modeling capability. A good example of this type
of representation is given by Yuille et al. [2], who used parametric templates to detect features
like the eyes or lips of the human face.

2.3.1.4 Decomposition-based Models


The shape decomposition approach represents shapes of similar types, with global support, via
a combination of a set of basis functions, where the basis can be either pre-defined a priori or
obtained via training. Compared with analytical models, decomposition-based models are in
general less restricted with respect to their representational power.

Fourier decomposition can be used for model representation to obtain the basis functions. A
parametric model based on the elliptic Fourier decomposition of the contour is presented by
Staib et al. [25]; a set of sinusoidal basis functions was used to represent a parametric contour.
Eigen-decomposition is another technique commonly used to represent a shape family derived
from a basis. By retaining only the significant eigenvectors as the basis, dimension reduction
can be effectively achieved. A good example of this type of representation was proposed by
Cootes et al. in their Active Shape Model (ASM) [25].

2.3.2 Prototype-Based DMs


The major characteristic of description-based DMs is the use of explicit shape abstractions for
representing objects. Prototype-based DMs are fundamentally different: the building blocks of
the models are simply images or shape prototypes, and it is the underlying deformation process
that is parameterized instead. The model construction step is unnecessary in the prototype-
based paradigm. This, however, implies that the global shape information is not explicitly
represented. The object localization and retrieval scheme proposed by Jain et al. uses this
approach [20].


Fig 2.18: Deformable template matching: (a) a prototype template, (b) deformable template with transformation
interpolation of order 3, (c) localization of a saxophone using a manually chosen initial template position [20].

2.4 Deformable Template Matching


Deformable template matching is more versatile and flexible in dealing with the insufficiency
of rigid shape matching. By allowing deformation of the template geometry and providing
relative invariance to lighting conditions, deformable templates offer a solution to the
limitations of classical template matching. Because of its capability to deal with shape
deformations and variations, deformable template matching is the more powerful technique.
The deformable template concept was introduced to computer vision by Widrow with the
‘rubber masks’ [24] and by Fischler et al. with the spring-loaded templates [23].


Fig 2.19: An example of template matching. The example fish can be matched to only 3 fishes
(solid lines) with rigid template matching using translation, scaling, and rotation. It can be
matched to all the fishes using deformable template matching (dashed lines).

A deformable template consists of three basic elements [2]:

2.4.1 Geometrical Model


The parametric geometrical model defines the geometry of the template and introduces
restrictions on the deformation of that geometry. The model includes prior probabilities for
the parameters, which correspond to a geometric measure of fitness. Considering an eye
template, for instance, the prior probabilities should yield higher values for a normal eye,
lower values for an abnormal eye, and even lower values for objects that are not eyes.

2.4.2 Imaging Model


The imaging model specifies how a deformable template of specific geometry is related to
specific intensity values (or colors) in a given image. This corresponds to an imaging measure
of fitness. Considering an eye template again, for instance, the imaging model should state that
the iris would be much darker than the whites of the eye, and it should also state that the
boundaries of the iris and eye correspond to high edge values in the image.


2.4.3 Matching Algorithm


The matching algorithm uses the geometrical and imaging measures of fitness to match the
template to the image. The algorithm may define how the template is initialized, which
optimization algorithm is used for energy minimization (gradient descent, downhill simplex,
genetic algorithms, etc.), which states are involved during the minimization, and so on.

2.5 Discussion

Through this literature review, we found a wide range of face recognition methods. The
problems these works faced and the solutions they used help us understand more about facial
recognition systems and how to improve our own method.

Ding and Martinez introduced a combined approach to provide accurate and detailed detection
of the major facial features [5]. They used part-by-part facial feature extraction for eyebrows,
eyes, nose, mouth, and chin for accurate detection. To detect facial features, they employ an
algorithm which learns to split the data into adequate subclasses.

Cox et al. introduced the mixture-distance technique, which achieved a recognition rate of 95%
on a database of 685 people in which each face is represented by 30 measured distances [12].
They formed a new distance function based on local second-order statistics, as calculated by
modeling the training data as a mixture of Gaussian densities, and report results for mixtures
of various sizes. Their main goal was to understand the performance limitations of feature-
based systems. The structure of such methods varies extensively, but three major elements
may be identified.

M. A. Rabbani and C. Chellappan aimed to effectively identify a frontal human face with a
good recognition rate using an appearance-based statistical method for face recognition [15].
Their approach produced a better recognition rate for more complex images. As computational
models of face recognition, their approaches are interesting because they can contribute not
only to theoretical insights but also to practical applications.


Yuille et al. detected and described features of the face such as eyes and mouth by using
parameterized deformable templates [2]. They propose a new method to detect such features
using deformable templates. These templates are specified by a set of parameters which
enables a priori knowledge about the expected shape of the features to guide the detection
process. The templates are flexible enough to change their size and other parameter values,
so as to match themselves to the data.

Chapter 3
Background
The basis of good research is an understanding of the background terms and definitions. In
this chapter, as background knowledge, we discuss the face recognition system, the deformable
model, deformable template matching, and the stable mass-spring model (SMSM).

3.1 Face Recognition System


In general, face recognition systems comprise three steps; the basic flowchart is given in
Fig 3.1. Face detection is the first step of a face recognition system; it may include edge
detection, corner detection, segmentation, and localization. Feature extraction is the process
of finding face features like the eyes, nose, and mouth in the input image, which represent the
authentic attributes of the image. Face recognition then classifies the extracted image features
according to a particular standard [29].

Input image → Face Detection → Feature Extraction → Face Recognition → Output

Fig 3.1: The basic flowchart of a face recognition system.

Chap. 3: Background

3.1.1 Face Detection


As the name suggests, this phase detects faces in the image. There are four classes of methods
to detect a face in an image:

Knowledge-based Method
The rule-based method uses human knowledge of what constitutes a typical face. Usually, the
rules capture the relationships between facial features to determine the locations of the
features in the face [29].

Template Matching Method


In this method, several standard patterns of a face are stored in the database or the system to
describe the face as a whole or the facial features separately. The correlation between an input
image and the stored patterns is evaluated for detection. These methods have been used for
both face localization and detection [29].

Appearance-based Method
In contrast to template matching, the models are learned from a set of training images which
should capture the representative variability of facial appearance. These learned models are
then used for detection and are mainly designed for face detection [29].

Feature Invariant Approach


This approach aims to find structural features that exist even when the pose, viewpoint, or
lighting conditions vary [29].


3.1.2 Feature Extraction


It is the extraction of features like eyes, nose, and lips from the face which can be used further
to differentiate people from each other [29]. The approaches for face extraction are:

DCT (Discrete Cosine Transform)


The Discrete Cosine Transform expresses a sequence of data points in terms of a sum of cosine
functions oscillating at different frequencies. It can therefore be used to transform images,
compact the variations, and allow an effective dimensionality reduction. DCT has been widely
used for data compression [29].
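A minimal 1-D DCT-II can be written down directly from its definition (an unnormalized pure-Python sketch for illustration; real systems use an optimized library routine):

```python
import math

def dct2(signal):
    """Unnormalized 1-D DCT-II: X[k] = Σn x[n]·cos(π(2n+1)k / 2N)."""
    N = len(signal)
    return [sum(x * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for n, x in enumerate(signal))
            for k in range(N)]

# a constant (maximally smooth) signal: all the energy lands in X[0],
# which is why truncating high-frequency DCT coefficients compacts the data
coeffs = dct2([1.0, 1.0, 1.0, 1.0])
print(coeffs[0])            # → 4.0; the remaining coefficients are ~0
```

For images, the same transform is applied separably along rows and columns.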

JPEG (DCT Zigzag)


This is a scanning technique which moves in zigzag form from the low-frequency components
to the high-frequency components, because most of the energy is stored in the low-frequency
components [29].

PCA (Principal Component Analysis)


Principal Component Analysis is a mathematical procedure that performs dimensionality
reduction by extracting the principal components of multi-dimensional data. It is based on
eigenvectors and a linear map [29].

LBP (Local Binary Pattern)


This algorithm divides the image of the face into various small local regions that are
represented as binary digits. The histogram of these binary codes is then used to represent the
face image [29].
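The per-pixel LBP code can be sketched as follows (a minimal sketch of the basic 3×3 operator; the neighbour ordering and function name are our own choices):

```python
def lbp_code(patch):
    """Basic 3x3 LBP: threshold the 8 neighbours against the centre pixel and
    read the results as one 8-bit binary code (clockwise from the top-left)."""
    center = patch[1][1]
    # clockwise neighbour positions around the centre pixel
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(order):
        if patch[r][c] >= center:
            code |= 1 << (7 - bit)      # most significant bit first
    return code

# top row brighter than the centre → first three bits set: 0b11100000 = 224
print(lbp_code([[9, 9, 9],
                [1, 5, 1],
                [1, 1, 1]]))   # → 224
```

A histogram of these codes over each local region then forms the face descriptor.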


3.1.3 Face Recognition/ Classification


It is the matching of the face with the existing face saved in the database of the system [29].
To match these faces we again have some methods:

Support Vector Machine (SVM)


SVM is a powerful technique that classifies unseen test patterns using a hyper-plane. The
hyper-plane is defined by a weighted combination of a small subset of the training vectors,
called support vectors [29].

Hidden Markov Model (HMM)


An HMM is a model with a finite number of hidden states. One can train an HMM to learn the
transition probabilities between states from examples, where each example is represented as
a sequence of observations [29].

Neural Networks (NN)


Neural Network is an interconnected cluster of artificial neurons that uses a computational
model for processing any information based on a connectionist approach to computation. In
most of the cases, an artificial neural network is an adaptive system that changes its structure
based on external or internal information that flows through the network [29].

Self-Organizing Map (SOM)


SOM belongs to the competitive learning networks. It is a kind of a neural network which is
trained by using unsupervised learning to produce a two-dimensional representation of the
input space of the training samples [29].

Face recognition performance is measured using two basic methods. The first is identification
performance, where the measure is the percentage of probes in the primary statistic correctly
identified by the algorithm. The second is verification performance, where the measure is the
equal error rate between the probability of a false alarm and that of a correct verification.


3.1.4 Face Recognition Application

Face recognition has received remarkable attention due to its manifold applications, such as in
information security, smart cards, entertainment, law enforcement, and surveillance [29]. Table
3.1 lists some of the most important face recognition applications.

Area | Specific Application
Information Security | Flight boarding systems, email authentication, office access, application security, database security, file encryption, internet access, medical records.
Smart Cards | Faceprints stored in a smart card, bar code, or magnetic strip and authenticated by matching the live image with the stored template; stored-value security; user authentication.
Biometrics | Driver's licenses, immigration, national ID, passports, voter registration.
Law Enforcement and Surveillance | Monitoring and searching for drug offenders, CCTV control, power grid surveillance, portal control, post-event analysis, and investigation.
Video Indexing | Labeling faces in video.
Civilian Applications | E-booking and e-commerce.
Access Control | Facility access, vehicular access.

Table 3.1: Face Recognition Applications [29].


In earlier research, Brunelli et al. recognized frontal views of the face using two traditional
classes of techniques applied to digital images [12]. The first technique was based on the
computation of a set of geometrical features and was the first step towards automated
recognition of the face. This approach gives satisfactory results for recognition from frontal
views, but the vectors of geometrical features extracted by the system had low stability. The
second technique was based on whole-image gray-level template matching. This approach had
superior recognition performance but is not as consistent with the properties of human vision
in gender classification.

In this thesis, our approach is to detect the human face using a stable mass-spring model
(SMSM) as a dynamic deformable template (DDT). In the rest of this chapter, we discuss the
deformable model, deformable template matching, and the stable mass-spring model (SMSM).

3.2 Deformable Model


The term deformable model refers to a group of computer algorithms broadly used in computer
vision. Deformable models provide a discrete model of an object class and can be active,
adapting themselves to fit the given data. Because of its flexibility, and its ability both to
impose geometrical constraints on the shape and to integrate local image evidence, a
deformable model is a useful shape model. Deformable models are used in image analysis
tasks such as face recognition, image segmentation, and classification; the analysis is
accomplished by fitting the deformable model to an input image. By describing the shape of
objects as a flexible 2D curve or a 3D surface, a deformable model can be deformed to match
a particular instance of that object class.

3.3 The Mass-Spring Model


Mass-spring models (MSM) are frequently used to model deformable objects in computer
graphics applications because of their simplicity and computational efficiency. An MSM
receives its shape from a geometrical entity, generally represented by a graph whose nodes are
mass points and whose edges are springs connecting pairs of points. If the mass points are
fixed, the spring lengths are fixed as well; so, using prior knowledge of the object shape, the
mass positions and spring rest lengths can be estimated. The internal forces are defined by the
model graph, and the external forces are calculated from the image data. Without any external
force, the model takes a shape in which the internal force vectors sum to zero; the model
deforms only under external force. The external force acts as an attraction or gravitational
force: decreasing the distance to the attraction point increases the strength of the attraction
force. The internal and external forces are changeable, so the model deforms until the internal
and external forces are in balance.

Fig 3.2: spring-mass model [27].

Previously, such a model suffered from instability, since the structural integrity of the model
is maintained only by the springs in the original mass-spring model. Overcoming this problem
requires dense cross-linking of the mass points, creating a very complex model. This kind of
solution prevents collapse at the cost of high computation time and also makes the model too
inflexible.

Thus, S. Bergner et al. suggested the use of an angular force called the torque force for
stability, but in higher dimensions it faces the same problem as dense cross-linking [26].
L. Dornheim et al. solved the problem of instability by introducing a torsion force into the
system that provides stability in arbitrarily high dimensions without increasing model
complexity [27]. Following their work, we use the torsion force for model stability.


Fig 3.3: Collapsing problem. (a) shows a simple mass-spring model. In (b) this system is
contorted, although no spring has changed its length. In (c) the same system is collapsed with
no single spring length changed. (d) is the same system but fully interconnected [27].

3.4 Dynamics of the Model


Consider a mass-spring model consisting of mass points with masses mi and initial position
vectors pi. Mass points are connected by springs with spring constants kij; the rest length Rlij
of the spring associated with two mass points i and j can be calculated using Eq. (3.1):

Rlij = ‖pi - pj‖, where pi and pj are the initial positions of masses i and j ………..………………….(3.1)

The dynamics of such a system can be described by Newtonian mechanics. The movement of
the model is influenced by the forces acting on the mass points.

3.4.1 Internal Forces

3.4.1.1 Spring Force


The elastically deformed springs exert spring forces Fsi on the mass points as they deviate
from their rest lengths, i.e. as the mass points move to new position vectors pnew,i. The spring
force Fsi acting on mass point i can be calculated using Eq. (3.2):

Fsi = -∑j kij · (‖pnew,i - pnew,j‖ - Rlij) · (pnew,i - pnew,j) ⁄ ‖pnew,i - pnew,j‖ …….………………………(3.2)
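For one mass point, the spring force can be sketched as follows (a 2-D pure-Python sketch; the data layout of positions, neighbour lists, and spring tables is our own, not from the model definition):

```python
import math

def spring_force(i, positions, neighbors, k, rest_len):
    """Spring force on mass point i: each connected spring pulls the point
    back toward its rest length along the current spring direction."""
    fx, fy = 0.0, 0.0
    xi, yi = positions[i]
    for j in neighbors[i]:
        xj, yj = positions[j]
        dx, dy = xi - xj, yi - yj
        dist = math.hypot(dx, dy)
        if dist == 0.0:
            continue                    # coincident points: direction undefined
        mag = k[(i, j)] * (dist - rest_len[(i, j)])
        fx -= mag * dx / dist           # restoring: pulls toward j if stretched
        fy -= mag * dy / dist
    return fx, fy

# one spring of rest length 1 stretched to length 2: unit pull toward the other mass
f = spring_force(0, [(0.0, 0.0), (2.0, 0.0)], {0: [1]},
                 {(0, 1): 1.0}, {(0, 1): 1.0})
print(f)   # → (1.0, 0.0)
```

A compressed spring produces the opposite sign, pushing the points apart.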

3.4.1.2 Torsion Force


For model stability, another internal force called the torsion force is included in the model
dynamics. To introduce the torsion force into the model, the rest directions rdij of the springs
from each mass point i to all adjacent mass points j, along the lines of the springs at rest, are
calculated.

Fig 3.4: Torsion force: rest directions of the springs are shown as dotted lines; springs deviate
from their rest directions under the influence of the external force (shown as solid lines).

Thus, the torsion force is exerted on the nodes in the direction of their rest direction. Springs
deviations from their rest direction exert a torsion force to the mass points in the direction of
their rest directions to restore the original position.

The torsion force exerted on mass point i can be calculated using Eq. (3.3):

FTi = Σj tj · (⟨pji, rdji⟩ / ‖pji‖) · nji, where pji = pnewi - pnewj ………(3.3)


Where,
tj = torsion constant of the mass point j
The working direction nji of the torsion force is calculated using Eq. (3.4):

nji = rdji - (⟨pji, rdji⟩ / ‖pji‖²) · pji ………(3.4)

3.4.2 External Force


3.4.2.1 Image-Force
The mass-spring model is activated by making it sensitive to the image. The model is
constrained to lie on the image, allowing it to deform according to the image topography under
the action of some gravitational force. Each mass point in the model can be made sensitive to
different sensors. Sensors are defined application-specifically and may be sensitive, e.g., to
corners, edges, or specific intensities. The image feature collected by each mass point acts as a
force on that mass, and the image force Fli exerted on mass point i can be calculated using Eq. (3.5):

Fli = image(loc(i)) ………(3.5)

Where loc(i) gives the coordinates of the current location of the sensor i in the image and image
gives the image information at that position.

3.4.2.2 Damping Force


In physics, the restraining of vibratory motion through the dissipation of energy is known as
damping. A damping force occurs only while the model changes its position and acts alongside
the restoring force in a spring-mass system. When the model cannot reach a stable position
because of oscillatory behavior, damping effectively reduces this problem. The damping
force can be defined as a function of velocity as in [28], or the speed of motion of the model
can be damped by a factor d as in [27]. In our work, we used the second option for simplicity.


3.5 Discrete Equations of Motion


Given the forces discussed above, the motion of the model can be simulated in discrete time
steps of size ∆t by applying Newton's second law of motion. According to Newton's second
law, the acceleration ai of mass point i can be calculated using Eq. (3.6):

ai = Fi / mi ………(3.6)

Where Fi is the total force acting on mass point i and can be calculated using Eq. (3.7):

Fi = wsi · Fsi + wTi · FTi + wli · Fli ………(3.7)

Where,
wsi = weight of the spring force acting on mass point i.
wTi = weight of the torsion force acting on mass point i.
wli = weight of the image force acting on mass point i.

Following the calculation of the mass point forces, we can compute the new velocity, vi and
position, pi of each mass point given the old velocity, vi old and old position values, pi old as
follows:

vi = (vi,old + ai · ∆t) · (1 - d) ………(3.8)

pi = pi,old + vi · ∆t ………(3.9)

The model adaptation to the image continues until equilibrium is reached, when the external
forces cancel out the internal forces.
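The discrete update above can be sketched as a single explicit-Euler time step. This is a minimal sketch assuming the simple damping factor d described earlier; all names are illustrative.

```python
import numpy as np

def step(pos, vel, forces, masses, dt=0.1, d=0.1):
    """Advance the model one explicit-Euler time step (Eqs. 3.6, 3.8, 3.9)."""
    acc = forces / masses[:, None]        # a_i = F_i / m_i          (Eq. 3.6)
    vel = (vel + acc * dt) * (1.0 - d)    # damped velocity update   (Eq. 3.8)
    pos = pos + vel * dt                  # position update          (Eq. 3.9)
    return pos, vel

# One mass of 1 unit pushed by a constant unit force along x:
pos, vel = step(np.zeros((1, 2)), np.zeros((1, 2)),
                np.array([[1.0, 0.0]]), np.array([1.0]))
```

Repeating this step until the velocities vanish corresponds to the equilibrium condition stated above.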

Chapter 4
Proposed Method

4.1 Introduction
Image recognition is a process that usually consists of taking a picture, processing the image,
and then presenting the results. Face recognition has many applications in the image processing
field and is therefore an important issue in image processing. In this thesis, we mainly
work on detecting a human face using a stable mass-spring model (SMSM) as a dynamic
deformable template (DDT). The whole work is divided into four parts as follows:

4.1.1 Model design


Start → Calculate local coordinates of mass and spring points → Calculate internal forces
(spring and torsion forces) → Calculate external forces (image force) → Calculate the total
force acting on each mass → Assign global coordinates of mass and spring points

Fig 4.1: Block diagram of model design

The proposed method is named stable mass-spring model (SMSM) as dynamic deformable
template (DDT). Before detecting the face, we need some preprocessing of the model.


4.1.2 Activating the DDTs

DDTs are activated by making their mass points sensitive to the image data I(x, y). The DDTs are
constrained to lie in an image, subject to some gravitational force. The surface S(x, y) (in this
thesis we will use the term 'sensor' to indicate this surface) may correspond to image intensities
(i.e., S = ±I(x, y)) or to image gradient values (e.g., S = ∇I(x, y)). Once S is determined, the
DDTs can be placed on the surface, allowing them to deform according to the surface topography
under the influence of gravity. Depending on the nature of the surface considered, the DDTs can
be made attracted to any image force.

4.2 Simulation of SMSM

Start → Input RGB images from the data set → Convert into grayscale image → Place the
model on the image → Feature calculation: apply the intensity detection sensor (intensity),
the Harris corner detection technique (corner), and the Sobel edge detection technique (edge),
each followed by Gaussian smoothing, to finally obtain the intensity, corner, and edge features.

Fig 4.2: Block diagram of SMSM simulation.


For the simulation of SMSM, we first need to do some preprocessing of our database. We convert
the RGB images into grayscale images. When the model comes to the simulation, it is placed on
the image. In the simulation stage, each node gets a sensor for feature detection.

Fig 4.3: RGB and Gray Scale Image.

4.2.1 Sensors

It has already been stated that we are interested in detecting the face and facial expression. The
DDTs should not be attracted to artifacts of an individual image. We do not use the image
gradient as a sensor, since stroke width does not reflect proper feature extraction in the image.
For that reason, an intensity sensor can be used so that the DDT is attracted to the medial axis of
the face. We also used corner and edge sensors to detect corners and edges.

4.2.1.1 Intensity Sensor

An intensity sensor is the most basic sensor; it passes image information directly. During
model simulation, the intensity sensor makes nodes attracted to the bright regions of the image.


Fig 4.4: Image seen as 3-D surface

4.2.1.2 Corner Sensor


The corner is one of the important image features. Corners have a wide range of applications in
computer vision, such as motion detection, image registration, video tracking, image mosaicking,
panorama stitching, 3D modeling, and object recognition. A corner can be defined as the
intersection of two edges. The Harris corner detector is a corner detection operator that uses a
multi-stage algorithm to detect a wide range of corners in images.

Fig 4.5: Corner detection.


4.2.1.3 Edges Sensor

Edge detection includes a variety of mathematical methods that aim at identifying points in a
digital image at which the image brightness changes sharply or more formally, has
discontinuities. The points at which image brightness changes sharply are typically organized
into a set of curved line segments termed edges. The same problem of finding discontinuities
in one-dimensional signals is known as step detection and the problem of finding signal
discontinuities over time is known as change detection. Edge detection is a fundamental tool
in image processing, machine vision, and computer vision, particularly in the areas of feature
detection and feature extraction. In our approach, we used the Sobel operator as the edge sensor
to detect edges.

Fig 4.6: Edge detection using the Sobel operator.
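As an illustration of what the edge sensor computes, the following plain-NumPy sketch evaluates the Sobel gradient magnitude with a naive correlation loop; it is not an optimized implementation, and the function name is our own.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude from the 3x3 Sobel kernels (zero-padded borders)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    padded = np.pad(img.astype(float), 1)
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            window = padded[y:y + 3, x:x + 3]
            gx[y, x] = np.sum(window * kx)   # horizontal gradient
            gy[y, x] = np.sum(window * ky)   # vertical gradient
    return np.hypot(gx, gy)

# A vertical step edge gives a strong response along the boundary:
img = np.zeros((5, 5))
img[:, 3:] = 255
mag = sobel_magnitude(img)
```

Pixels on the intensity step receive large magnitudes, while flat regions stay near zero, which is exactly the behaviour the edge sensor exploits.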

4.2.2 Gaussian smoothing


In image processing, Gaussian smoothing is the result of blurring an image by a Gaussian
function. It is widely used in graphics software, typically to reduce image noise and reduce
detail. Gaussian smoothing is also used as a pre-processing stage in computer vision algorithms
in order to enhance image structures at different scales.


Mathematically, applying a Gaussian blur to an image is the same as convolving the image
with a Gaussian function. This is also known as a two-dimensional Weierstrass transform.

The equation of a Gaussian function in one dimension is:


G(x) = (1 / √(2πσ²)) · e^(-x² / (2σ²)) ………(4.1)

in two dimensions, it is the product of two such Gaussians, one in each dimension:

G(x, y) = (1 / (2πσ²)) · e^(-(x² + y²) / (2σ²)) ………(4.2)
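Sampling Eq. (4.2) on a small grid gives the familiar smoothing kernel; the following NumPy sketch is illustrative, and the kernel size and σ are arbitrary choices, not values from the thesis.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Sampled 2-D Gaussian of Eq. (4.2), normalised to sum to 1."""
    ax = np.arange(size) - size // 2           # e.g. [-2, -1, 0, 1, 2]
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()                         # normalise so brightness is preserved

kernel = gaussian_kernel(5, 1.0)               # symmetric, peaked at the centre
```

Convolving the image with this kernel performs the Gaussian smoothing used before each sensor.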

4.3 Training
Start → Add images to the database → Apply the SMSM method to each image → Calculate
and save the energy → Calculate the mean and standard deviation from the energies →
Calculate the confidence interval (CI)

Fig 4.7: Block diagram of training.


For training, we added some images from the AT&T database to our training set. Then we applied
the stable mass-spring model (SMSM) as dynamic deformable template (DDT) method to each image.
From each database image, we calculated the energy value. Using these energy values, we
calculated the mean and standard deviation.

In our approach, for calculating the mean we use this formula:

x̄ = (1/n) · Σ_{i=1..n} xi ………(4.3)

Where xi is an individual energy value and n is the total number of values.

For calculating the standard deviation, we use this formula:

σ = √( (1/n) · Σ_{i=1..n} (xi - μ)² ) ………(4.4)
Where,
xi is an individual energy value
μ is the mean value
n is the total number of values
Then we calculate the confidence interval (CI) from the mean and standard deviation. The
confidence level is the frequency with which the computed confidence intervals contain the true
value of the corresponding parameter.
We use this formula for calculating the confidence interval:

X̄ ± Z · s / √n ………(4.5)

Where,
X̄ is the mean value
Z is the Z-value for the chosen confidence level
s is the standard deviation
n is the total number of values
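The training statistics of Eqs. (4.3)-(4.5) can be sketched in a few lines of Python. The z-value of 1.96 (a 95% level) is an assumption for illustration, since the thesis does not state which confidence level was used.

```python
import math

def confidence_interval(energies, z=1.96):
    """Mean (Eq. 4.3), population std. dev. (Eq. 4.4) and CI bounds (Eq. 4.5)."""
    n = len(energies)
    mean = sum(energies) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in energies) / n)
    margin = z * std / math.sqrt(n)
    return mean - margin, mean + margin

# A few of the training energies from Table 5.1:
low, high = confidence_interval([12.3243, 10.0025, 10.1999, 11.6397, 9.18047])
```

The resulting interval is centred on the mean energy and shrinks as more training images are added.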


4.4 Classification

Start → Add some non-human face images to the database (testing set) → Apply the SMSM
method to each image → Calculate the energy → Compare the energy value with the confidence
interval (CI): within the CI → human; out of the CI → non-human → Calculate the error rate

Fig 4.8: Block diagram of classification.

In the classification stage, we added some non-human face images to the database as the testing
set. Then we applied our method to each image of the test set and calculated its energy value.
By comparing the energy value of each image with the confidence interval (CI) of the training
set, we can distinguish human from non-human face images: if the energy value of an image falls
within the CI range, the image contains a human face; otherwise, it contains a non-human face.
Finally, we calculate the error rate.
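The classification rule itself reduces to a range check; in the sketch below, the CI bounds are hypothetical values chosen for illustration, not the bounds obtained from our training.

```python
def classify(energy, ci_low, ci_high):
    """Label an image by whether its total external force lies in the CI."""
    return "human" if ci_low <= energy <= ci_high else "non-human"

# Hypothetical CI bounds of [9.5, 11.5] (for illustration only):
labels = [classify(e, 9.5, 11.5) for e in (10.2, 13.9)]
```

An energy inside the interval is labelled human, one outside is labelled non-human, and the error rate follows from counting mislabelled test images.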

Chapter 5
Experiments & Results

In this chapter, we will discuss the experiments that we have carried out with our proposed
DDT. We will also present a detailed discussion of the results and findings of our experiments
and the problems of our proposed method. We experimented with a (10x11 grid) rectangular
DDT.

Our line of investigation in this thesis can be described as follows:


a) Template creation, Sensors and Initial Parameter Settings.

b) Training and Test Dataset

c) Measure performance of our proposed method.

d) Discuss the problem of our proposed method.

In the next section, we discuss the model and its required environment settings.

5.1 Template Creation, Sensors and Initial Parameter Settings
We created our template based on the stable mass-spring model. The template is created by
specifying its size and the resolution of the grid. First, we started with a 6x6 grid, but it showed
very poor results in detection and classification; the reason may be the lack of sensors required
to acquire data. Thus, we experimented with increasing resolutions of 6x7, 7x7, 7x8, 8x8, 8x9,
9x9, 9x10, 10x10, 10x11, and 11x11. With the 11x11 resolution, we faced excessive memory
consumption: the total number of mass points was too large to handle in memory.

So the 10x11 grid is the best resolution to cover the image (120x120 pixels) and acquire data.
The template has a total of 110 mass points and 199 springs. The boundary mass points are fixed;
they do not change their position under any circumstances. But the inner mass points are
capable of moving under the influence of the image force. The internal mass points will move

until they find their suitable places. The mass points use 3 different kinds of sensors, which
were already discussed in the previous chapter.

It is essential to select the necessary sensors to acquire the correct data. We have already stated
that we are interested in creating a new way to detect and classify faces using SMSM. So our
primary goal is to detect a human face and classify images into two groups: human and non-human
face. Thus, we used the sensors already explained in the previous chapter.

Another important aspect is the parameters and initial data of the template. We initialized the
weights of the springs and mass points by an educated guess.

5.2 Training and Test Dataset


In our experiment with the rectangular grid DDT, 20 sample images were chosen from the AT&T
dataset. Every human face has different features, so we tried to train our method with some
unique face images. As the database we used was noise-free, we did not need to remove noise.

For the test dataset, we used another 20 human faces from the AT&T database and 10 non-human
face images. The non-human images were RGB images, so we had to convert them to grayscale
in order to use them in our method as testing data.

5.3 Measure performance of our proposed method


We tested our proposed method by comparing the CI gained from training with the total external
force (Eq. 3.5) of the test images. The method to acquire the CI is described in the previous
chapter (Sec. 4.3, Eq. 4.5). All of the test data were run through our method in order to test its
performance.


Fig 5.1: (a) is the image when the data is first entered in our method, (b) is the image when
the acquiring image force is complete.

Data no   Total External Force
01        12.3243
02        10.0025
03        10.1999
04        11.6397
05        9.18047
06        9.05511
07        12.271
08        9.28504
09        10.0144
10        7.80402
11        12.1499
12        9.9578
13        9.95863
14        13.0854
15        8.26651
16        10.088
17        10.8386
18        10.5279
19        10.1892
20        10.5765

Table 5.1: The results of our training data, acquired using Eq. 3.5.


Table 5.1 shows different energies for different images; that is why we had to use multiple
different images as training data to make our method more robust and accurate.

After using the training data to calculate the CI, we use that CI for comparison with the testing
data. If the total external force (Eq. 3.5) falls within the CI range, our method recognizes and
classifies the image as a human face; otherwise, as a non-human face.



Fig 5.2: (a) is the main input image for the test, (b) is the feature image after using Harris corner detection, (c) is
the feature image after using Sobel edge detection, (d) is the simulation starting point image, (e) is the simulation
ending point image.

Fig 5.2 shows a sample of our test data and its simulation. We used 20 human faces from the
AT&T database and 10 non-human faces as testing data.

Our method was able to detect and classify 24 images correctly. Among the correct detections
and classifications, 14 were human face images and 10 were non-human face images. Our method
was not able to detect and classify 6 human face images correctly. Thus, the success rate of our
method is 80% (Fig. 5.3).
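The reported success rate follows directly from the counts above:

```python
# Counts reported above: 14 human + 10 non-human images classified
# correctly, out of 20 human + 10 non-human test images.
correct = 14 + 10
total = 20 + 10
success_rate = 100.0 * correct / total
error_rate = 100.0 - success_rate
```

This yields the 80% success and 20% error split shown in Fig. 5.3.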


Fig. 5.3: The success rate of our method (SMSM as DDT): 80% success, 20% error.

5.4 Discussion

Our method has some problems in detecting and classifying human and non-human faces.
There are some primary reasons for these problems:
 As the parameters we set were an educated guess, they were not accurate. We need more
training to improve the performance.
 The rectangular DDT is fixed at some points. That is why the sensors were not able to
reach the desired locations on the face to acquire accurate data.
 We used only 3 sensors in our method. To get more accurate results we need to
introduce more sensors into our method.
 In some data, Harris corner detection was not able to detect all the corners of the face.
To overcome this problem, we need to segment the data into specified parts to detect the
corners more accurately.
 We used grayscale images as our data. Grayscale images miss some important facial
features and margins, so we need to improve our model to work with full-color RGB
images and train it to get accurate data.

There is much scope to extend and improve our method in the future. We will improve our
method and make it more robust by solving the problems we found in this thesis.

Chapter 6
Conclusion

This chapter summarizes the work presented in this thesis. It also discusses the scope of
future work related to our proposed method.

6.1 Conclusion
The main goal of this thesis was to investigate the possibility of using stable mass-spring
models as deformable templates for face detection. Thus, we proposed a dynamic deformable
template (DDT) that is capable of distinguishing face and non-face images. Previous work with
dynamic deformable templates (DDT) was able to detect objects and digits. We investigated a
(10x11 grid) rectangular DDT to detect a human face. The rectangular grid DDT is
shape-independent, but shape information about the image is gathered from the sensors. We
investigated training of the DDT to find out the results and possible improvements of our
method. The mass points are driven by 3 different sensors. We input RGB images from the
dataset and convert them into grayscale images for face detection. We also use Gaussian
smoothing for smoothing the image and apply the Harris and Sobel techniques for corner and
edge detection. We succeeded in developing a new method using the stable mass-spring model
(SMSM) as a dynamic deformable template (DDT), which is able to classify human and
non-human faces.

6.2 Future work


We will improve our method so that it works faster and can handle difficult situations. In the
previous section, we indicated that the investigation done in this thesis is just a beginning.
 We will conduct more training to learn parameters that will improve the performance
of our method. The focus of future investigation will be on strengthening the mass
points and making the sensors more active and accurate.


 In this thesis, we mainly worked on detecting a frontal face, facial expression, and
features using the stable mass-spring model (SMSM) as a dynamic deformable template
(DDT). We will develop a model using SMSM for multi-view face detection in the future.

 We will improve the model to detect the facial expressions, i.e. happy face, sad face,
angry face etc.

 An important piece of future work is to improve the model so that it can create a full
facial image from a partial face image using 3D reconstruction. We hope this will make
face detection more robust.

Bibliography

[1] F. Zuo and P. H. N. de With, "Fast facial feature extraction using a deformable shape model with Haar-wavelet based local texture attributes," in Proc. IEEE International Conference on Image Processing (ICIP), 2004, vol. 3, pp. 1425-1428.

[2] A. L. Yuille, D. S. Cohen, and P. W. Hallinan, "Feature extraction from faces using deformable templates," in Proc. CVPR, 1989, pp. 104-109.

[3] F. Zuo and P. H. N. de With, "Towards fast feature adaptation and localization for real-time face recognition systems," in Proc. SPIE, 2003, vol. 5150, pp. 1857-1865.

[4] T. Cootes, "Statistical models of appearance for computer vision," Tech. Rep., Univ. Manchester, 2001.

[5] L. Ding and A. M. Martinez, "Precise detailed detection of faces and facial features," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008, DOI: 10.1109/CVPR.2008.4587812.

[6] X. Yu, J. Huang, and S. Zhang, "Pose-free facial landmark fitting via optimized part mixtures and cascaded deformable shape model," in Proc. IEEE International Conference on Computer Vision (ICCV), 2013, DOI: 10.1109/ICCV.2013.244.

[7] T. Cootes and C. Taylor, "A mixture model for representing shape variation," in Proc. BMVC, 1997.

[8] Y. Zhou, W. Zhang, X. Tang, and H. Shum, "A Bayesian mixture model for multi-view face alignment," in Proc. CVPR, 2005.

[9] S. Zhang, Y. Zhan, M. Dewan, J. Huang, D. Metaxas, and X. Zhou, "Sparse shape composition: A new framework for shape prior modeling," in Proc. CVPR, 2011.

[10] B. J. Ferdosi and K. D. Tönnies, "Using a stable mass-spring model as trainable dynamic deformable template," Otto-von-Guericke-Universität, Magdeburg, 2006.

[11] D. N. Parmar and B. B. Mehta, "Face recognition methods & applications," International Journal of Computer Technology & Applications, vol. 4, no. 1, pp. 84-86, Jan-Feb 2013.

[12] I. J. Cox, J. Ghosn, and P. N. Yianilos, "Feature-based face recognition using mixture-distance," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1996, DOI: 10.1109/CVPR.1996.517076.

[13] B. A. Draper, K. Baek, M. S. Bartlett, and J. R. Beveridge, "Recognizing faces with PCA and ICA," Computer Vision and Image Understanding, vol. 91, pp. 115-137, 2003.

[14] V. Vijayakumari, "Face recognition techniques: A survey," World Journal of Computer Application and Technology, vol. 1, 2013.

[15] M. A. Rabbani and C. Chellappan, "A different approach to appearance-based statistical method for face recognition using median," IJCSNS International Journal of Computer Science and Network Security, vol. 7, no. 4, April 2007.

[16] S. Z. Li, L. Zhu, Z. Zhang, A. Blake, H. Zhang, and H. Shum, "Statistical learning of multi-view face detection," in Proc. European Conference on Computer Vision (ECCV), 2002, pp. 67-81.

[17] M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: Active contour models," International Journal of Computer Vision, vol. 1, no. 4, pp. 321-331, 1987.

[18] L. D. Cohen, "On active contour models and balloons," CVGIP: Image Understanding, vol. 53, no. 2, 1991.

[19] B. Fröba and C. Kulbeck, "Real-time face detection using edge-orientation matching," in Proc. AVBPA 2001, pp. 78-83.

[20] A. K. Jain, Y. Zhong, and M.-P. Dubuisson-Jolly, "Deformable template models: A review," Signal Processing, vol. 71, pp. 109-129, 1998.

[21] K. W. Cheung, D. Y. Yeung, and R. T. Chin, "On deformable models for visual pattern recognition," Pattern Recognition, vol. 35, pp. 1507-1526, 2002.

[22] R. Brunelli and T. Poggio, "Face recognition: Features versus templates," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, no. 10, Oct 1993.

[23] M. Fischler and R. Elschlager, "The representation and matching of pictorial structures," IEEE Transactions on Computers, vol. 22, no. 1, pp. 67-92, 1973.

[24] B. Widrow, "The rubber mask technique, Parts I and II," Pattern Recognition, vol. 5, pp. 175-211, 1973.

[25] T. F. Cootes and C. J. Taylor, "Active shape models - smart snakes," in Proc. Third British Machine Vision Conference, Leeds, UK, September 1992, pp. 266-275.

[26] S. Bergner, S. Al-Zubi, and K. D. Tönnies, "Deformable structural models," in Proc. IEEE International Conference on Image Processing (ICIP), Singapore, Oct. 2004.

[27] L. Dornheim, K. D. Tönnies, and J. Dornheim, "Stable dynamic 3D shape models," in Proc. IEEE International Conference on Image Processing (ICIP), 2005.

[28] G. Hamarneh and T. McInerney, "Physics-based shape deformations for medical image analysis," SPIE-IS&T Electronic Imaging: Image Processing: Algorithms and Systems, vol. 5014, 2003.

[29] N. H. Barnouti, S. S. Mahmood, and W. E. Matti, "Face recognition: A literature review," International Journal of Applied Information Systems (IJAIS), vol. 11, no. 4, September 2016.
