By:
Md. Abdul Kaiyum Neon (14101027), Md. Mahesinul Awal Mannan Joy
(14101036), Aparajita Saha (14101043), Tasnim Islam (14101048)
Supervised By:
Dr. Bilkis Jamal Ferdosi
Associate Professor
Department Of CSE, UAP
Declaration
We hereby declare that the work presented in this thesis is the outcome of the investigation
performed by us under the supervision of Dr. Bilkis Jamal Ferdosi, Associate Professor,
Department of Computer Science & Engineering, University of Asia Pacific. We also declare
that no part of this thesis has been or is being submitted elsewhere for the award
of any degree or diploma.
Signature
Supervisor Supervisee
…………………………….. ...…..……………………………
Dr. Bilkis Jamal Ferdosi Md. Abdul Kaiyum Neon
Associate Professor, ID: 14101027
Department of CSE
University of Asia Pacific
…...…….………………………
Md. Mahesinul Awal Mannan Joy
ID: 14101036
..……...…………………………
Aparajita Saha
ID: 14101043
…………………………………
Tasnim Islam
ID: 14101048
ABSTRACT
The progression of face detection has increased tremendously in recent years; it is a vast and
mature research field, and nowadays faces can be detected quite reliably using advanced methods
and algorithms. In this thesis, we introduce a new method to detect a human face using a Stable
Mass-Spring Model (SMSM) as a Dynamic Deformable Template (DDT). Previous work used
DDTs to detect objects and digits [10]. Inspired by that work, we extend the method to detect a
human face. We use a 10x11 rectangular grid model to detect and classify human faces, together
with three different kinds of sensors. First, we use an intensity sensor to detect different
intensities at different positions of a face. Second, we use a Sobel sensor to detect the edges of
the face. Third and finally, we use Harris corner detection to detect facial corners. We use the
AT&T database to train and test our proposed method. Through training, we acquire the total
external energy from the image force, which we use to calculate a Confidence Interval (CI). In
testing, we compare the CI with the energy of the test data; based on this comparison our method
can detect and classify human and non-human faces. Through this thesis we found that: i) a DDT
using SMSM can detect a human face; ii) through training we can learn proper parameters to
improve performance; iii) the rectangular grid model needs less training than other methods; and
iv) the more sensors we use, the more accurate the results we acquire.
ACKNOWLEDGEMENTS
First of all, we would like to thank the almighty Allah. Today we are successful in completing our
work with such ease because He gave us the ability, the opportunity, and a cooperative supervisor.
We would like to take this opportunity to express our gratitude to Dr. Bilkis Jamal Ferdosi, our
respected supervisor. Although she was always loaded with several other activities, she gave us
more than enough time for this work. She gave us not only time but also proper guidance and
valuable advice whenever we faced difficulties. Her comments and guidance helped us
in preparing our thesis report.
We want to thank all the teachers and staff of our department for their help during this period.
Finally, we would like to thank our parents and family members for their support.
Dedication
Dedicated to
all image processing
and Computer vision researchers.
“Knowledge from which no benefit is derived is like a treasure out of which nothing is spent
in the cause of Allah."
The Prophet Muhammad (SM) (peace be upon him)
Table of Contents
DECLARATION .................................................................................................. I
ABSTRACT ......................................................................................................... II
ACKNOWLEDGEMENT ................................................................................. III
CHAPTER 1: INTRODUCTION ....................................................................... 1
1.1 MOTIVATION .................................................................................................. 1
1.2 PROPOSED WORK ...........................................................................................3
1.3 APPROACH OVERVIEW ...................................................................................4
CHAPTER 2: STATE OF THE ART ..................................................................5
2.1 FEATURE-BASED METHODS ..........................................................................5
2.2 STATISTICAL-BASED METHODS ................................................................... 11
2.3 MODEL-BASED METHODS........................................................................... 15
2.3.1 Description-Based DMs ....................................................................... 15
2.3.2 Prototype-Based DMs .......................................................................... 21
2.4 DEFORMABLE TEMPLATE MATCHING ........................................................ 22
2.4.1 Geometrical Model ............................................................................... 23
2.4.2 Imaging Model ..................................................................................... 23
2.4.3 Matching Algorithm ............................................................................. 24
2.5 DISCUSSION ................................................................................................. 24
CHAPTER 3: BACKGROUND ......................................................................... 26
3.1 FACE RECOGNITION SYSTEM....................................................................... 26
3.1.1 Face Detection ...................................................................................... 27
3.1.2 Feature Extraction ................................................................................ 28
3.1.3 Face Recognition/ Classification ........................................................ 29
3.1.4 Face Recognition Application ............................................................. 30
3.2 DEFORMABLE MODEL ................................................................................. 31
3.3 THE MASS-SPRING MODEL ......................................................................... 31
3.4 DYNAMICS OF THE MODEL.......................................................................... 33
3.4.1 Internal Forces...................................................................................... 33
3.4.2 External Force ...................................................................................... 35
3.5 DISCRETE EQUATIONS OF MOTION ............................................................ 36
CHAPTER 4: PROPOSED METHOD ............................................................. 37
4.1 INTRODUCTION ........................................................................................... 37
4.1.1 Model design ........................................................................................ 37
4.1.2 Activating the DDTs ............................................................................ 38
4.2 SIMULATION OF SMSM ............................................................................... 38
4.2.1 Sensors .................................................................................................. 39
4.2.2 Gaussian smoothing............................................................................. 41
4.3 TRAINING .................................................................................................... 42
4.4 CLASSIFICATION .......................................................................................... 44
CHAPTER 5: EXPERIMENTS & RESULTS .................................................. 45
5.1 TEMPLATE CREATION, SENSORS AND INITIAL PARAMETER SETTINGS ....... 45
5.2 TRAINING AND TEST DATASET ................................................................... 46
5.3 MEASURE PERFORMANCE OF OUR PROPOSED METHOD .............................. 46
5.4 DISCUSSION ................................................................................................. 51
CHAPTER 6: CONCLUSION ........................................................................... 52
6.1 CONCLUSION ............................................................................................... 52
6.2 FUTURE WORK ............................................................................................. 52
BIBLIOGRAPHY ............................................................................................... 54
LIST OF FIGURES
1.1 Rectangular Grid DDT 04
2.1 An example of accurate and detailed face detection in each of the frames over a
video sequence of ASL sentences. The top row shows the automatic detection
obtained with the algorithm defined in this thesis. The bottom row provides a
manual detection for comparison 06
2.2 The red star in the figure corresponds to the center of the eye window, used to
generate the training data for eyes. The blue dots represent the window centers for
the background samples. The distance from the eye center and the background
window is set to 24 pixels 06
2.3 Two examples of eyebrow detection. (a) Binary description of the brow region. (b)
Final contour detection 07
2.4 (a) mouth corner detection, (b) Gradient of the mouth window, its x and y
projection, and the final result 08
2.5 Recognition accuracy varies considerably with mixture complexity. Both “soft”
and “hard” VQ versions of mixture distance are presented 10
2.6 (a) Input Training Images, (b) Edge 12
2.7 Normalized Training Images 13
2.8 Middle: An image containing frontal faces subject to in-plane rotations. Left
and right: In-plane rotated by ±30° 14
2.9 Schematic illustration of the merge from different channels. From left to right: Outputs
of the frontal, left, and right view channels, and the final result after the merge 14
2.10 A deformable template for “archetypal” human eyes 16
2.11 Final results of the eye templates. The right column shows the final state of the
template acting on the eyes in the left column. Note that a small error occurs in the
alignment of the bottom template. This is due to a strong intensity peak on the eyelid
and some shadow in the eye 17
2.12 The mouth closed template. The mouth is centered on a point xm and has an
orientation 𝜃. The left and right boundaries are at distances b1 and b2 from xm. The
intersection of the upper two parabolas occurs directly above xm at a height of h.
The lower two parabolas have maximum distances from the central line (shown
dotted) of a and a+ c 17
2.13 A dynamic field for a closed mouth: (a) the original mouth, (b) the valley field, (c)
the peak field, and (d) the edge field. The figure is organized left to right and top to
bottom. The strengths of the fields are shown in greyscale; white is strong and black
is weak 18
2.14 Final results for the mouth-closed templates. The right column shows the final state
of the template acting on the mouth in the left column 18
2.15 Estimation of facial feature positions 19
2.16 (a) Face shape overlaid on a real image. (b) Topological shape with local texture
attributes 20
2.17 Feature extraction examples (without performing the initial position estimation
step) 20
2.18 Deformable template matching (a) a prototype template (b) deformable template
with transformation interpolation of order 3 (c) localization of a saxophone using
manually chosen initial template position 22
2.19 An example of template matching. The example fish can be matched to only 3 fishes
(solid lines) with rigid template matching using translation, scaling, and rotation. It
can be matched to all the fishes using deformable template matching (dashed lines) 23
3.1 The basic flowchart of a face recognition system 26
3.2 spring-mass model 32
3.3 Collapsing problem: (a) shows a simple mass-spring model. In (b) this system is
contorted, although no spring has changed its length. In (c) the same system is
collapsed with no single spring length changed. (d) is the same system but fully
interconnected; it avoids such shape instabilities but loses flexibility at the same
time 33
3.4 Torsion force: rest directions of the springs are shown in dotted lines; springs
deviate from their rest directions under the influence of the external force (shown in
solid lines). 34
4.1 Block diagram of a model design 37
4.2 Block diagram of SMSM simulation 38
4.3 RGB and Gray Scale Image 39
4.4 Image seen as 3-D surface 40
4.5 Corner detection 40
4.6 Edge detection using the Sobel detector 41
4.7 Block diagram of a training 42
4.8 Block diagram of a classification 44
5.1 (a) is the image when the data is first entered in our method, (b) is the image when
the acquiring image force is complete. 47
5.2 (a) is the main input image for the test, (b) is the feature image after using Harris 50
corner detection, (c) is the feature image after using Sobel edge detection, (d) is the
simulation starting point image, (e) is the simulation ending point image.
5.3 The success rate of our method (SMSM as DDT). 51
LIST OF TABLES
2.1 The f parameter which selects first vs. second-order models has a potent
effect on recognizer accuracy. 10
3.1 Face Recognition Applications 29
5.1 These are the results of our training data, which we acquire using Eq. 3.5 47
Chapter 1
Introduction
1.1 Motivation
Face recognition has received huge emphasis over the last few years due to its various applications
in law enforcement and security. Expectedly, this years-old research area has experienced a recent
boom, with different techniques fine-tuning the face recognition task under the different obstacles
of real-life scenarios.
Face recognition is a method that mathematically extracts features from an individual and stores
the data as a face print, which is then used to recognize the person by comparing the face print
with a live capture or digital image. The process of recognizing a face involves detecting the face
by segmenting it from the natural scene, extracting features from the segmented face, and
recognizing the face by comparison with the faces in the database.
The approach presented by Yuille et al. detected and described features of the face such as the
eyes and mouth using parameterized deformable templates [2]. However, it was unable to extract
internal regions such as the forehead or the cheeks, or to find external features such as the ears
and hair, and convergence was not guaranteed, making it ineligible for real-time surveillance
applications. Earlier research by F. Zuo and P.H.N. de With [3] was an efficacious feature
localization approach for real-time surveillance applications using low-quality images. By using
multiple initialization points this technique overcomes the problem of local convergence
experienced by Yuille et al., but it only exploits partial descriptions such as the iris center. Cootes
described two approaches for feature extraction, the Active Shape Model (ASM) and the Active
Appearance Model (AAM) [4]. The ASM uses a local deformation process to manipulate the
shape model toward the target image; during model deformation the ASM uses only 1-D profile
information. The AAM is able to synthesize a very close approximation to any image during
model deformation, but it only samples the current position of the image. The ASM is faster and
acquires more approximate feature point locations than the AAM. However, both the ASM and
AAM approaches break down with wrong convergence when the real face position is not found.
F. Zuo and P.H.N. de With combined the two into one single model, in which they were able to
solve the above problem by introducing a Haar-wavelet algorithm for modeling [1].
Ding and Martinez introduced a combined approach to provide accurate and detailed detection of
the major facial features [5]. They used part-by-part facial feature extraction for the eyebrows,
eyes, nose, mouth, and chin for accurate detection. Previously, many applications had only
limited success at very accurate and detailed detection of the shape of the major facial features,
so they designed their algorithm to detect faces automatically. The task of detecting the eyes
becomes extremely challenging when an eye opens or closes and the iris may be looking in
different directions, so their experimental results on the detection of these facial features vary
substantially with expression, pose, and illumination.
Xiang Yu et al. tried to solve the problems of facial landmark localization and tracking from a
single camera [6]. The pose variation problem is partially addressed by multi-view face shape
models [7,8,9]. S. Zhang et al. faced three problems: handling non-Gaussian errors, modeling
complex shape variations, and capturing local detail information of the input shape [7]. They
tried to solve these problems and achieved 70% success. Y. Zhou et al. tried to solve the
problem of multi-view face alignment [8]. There were two major problems: i) multi-modality
caused by diverse shape variation, and ii) the varying number of feature points caused by
self-occlusion. They solved these problems by proposing a new nonlinear model for multimodal
shape registration, in which an efficient EM (expectation-maximization) algorithm for shape
regularization is presented. Their proposed model can work over a large range. T. Cootes and C.
Taylor faced the problem of obtaining reliable parameters for a mixture model rather than for an
individual Gaussian [9]. Their results showed that a mixture model can be used to represent
non-linear shape variation and to find the class in a new image; however, they could not handle
the unlimited possibilities of view changes. Xiang Yu et al. tried to solve these problems using a
two-step cascaded deformable shape model.
Model-based approaches, especially mass-spring models, are popular for their capability of
segmenting the region of interest and classifying it by collecting features. They are robust against
occlusion, shadow, and missing parts, and they require very few training samples to achieve
successful recognition. Real-life scenarios are always prone to such conditions; therefore,
applying such a model may provide certain benefits compared to traditional approaches. For
example, a statistical approach depends on a number of factors and statistical calculations to
determine the result, whereas an SMSM (Stable Mass-Spring Model)-based approach analyzes
the data (image) and calculates the internal and external forces to determine the result.
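As a concrete illustration of the internal-force idea, the following sketch computes the Hooke's-law force that one spring exerts on a mass point; the function name and parameters are hypothetical and not taken from [10].

```python
import numpy as np

def spring_force(p_i, p_j, rest_length, stiffness):
    """Hooke's-law force on mass i from the spring connecting it to mass j,
    proportional to the spring's deviation from its rest length."""
    d = p_j - p_i
    length = np.linalg.norm(d)
    # Unit vector from i toward j; the force acts along the spring.
    direction = d / length
    return stiffness * (length - rest_length) * direction

# A spring of rest length 1.0 stretched to length 2.0 pulls mass i toward j:
f = spring_force(np.array([0.0, 0.0]), np.array([2.0, 0.0]), 1.0, 0.5)
```

Summing such forces over all springs attached to a mass, together with the external image force, gives the net force that drives the model's dynamics.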
B. J. Ferdosi and K. D. Tönnies combined a physics-based model and a deformable template to
create a dynamic deformable template (DDT) to classify and detect digits [10]. They used two
types of models to detect digits. The first was handcrafted DDTs, one for each digit. The second
was a rectangular grid DDT which was able to detect and classify any digit. The method uses
two types of forces, external and internal.
We adopt the method of [10] and create a rectangular grid DDT to detect a human face.
Chapter 2
State of the Art
In this thesis, our goal is to create a dynamic deformable model to detect a human face, as
described in the previous chapter. In this chapter, we discuss related work on different
methods. Our main goal is to learn how the methods work and how effective they are. We also
give a short review of work related to face recognition and deformable templates.
There are mainly three different kinds of face recognition methods: feature-based,
statistical-based, and model-based.
Ding and Martinez introduced a combined approach to provide accurate and detailed detection
of the major facial features [5]. They used part-by-part facial feature extraction for the eyebrows,
eyes, nose, mouth, and chin for accurate detection. To detect facial features, they employed an
algorithm that learns to divide the data into adequate subclasses, each representing a unique
image feature; this process can detect facial features in difficult conditions. The algorithm learns
to discriminate the surroundings of each facial feature from the feature itself, which prevents
detections in areas around the actual feature and improves accuracy. Once a large number of
such classifiers provide a pre-detection of the facial features, edge detection and color are
employed to find the connecting path between them. The final detection results are shown in
Fig 2.1. They provided experimental results on the detection of these facial features over a
video sequence of ASL sentences (Fig 2.1).
Fig 2.1: An example of accurate and detailed face detection in each of the frames over a video sequence of ASL
sentences. The top row shows the automatic detection obtained with the algorithm defined in this thesis. The
bottom row provides a manual detection for comparison [5].
For eye detection, the first class is the eyes, defined by 941 images of cropped eyes, while the
second class corresponds to cropped images of the same size (24x30 pixels) residing in the
areas neighboring the eyes. Fig 2.2 shows how the eye window is centered while the eight
accompanying background windows are located off-center; this yields a total of 7,528
background images. In order to increase robustness to scale and rotation, they extended the
training set by including the images obtained when re-scaling the original training images from
0.9 to 1.1 at regular intervals and by adding randomly rotated versions of them within
the range of -15° to 15°.
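A minimal sketch of this augmentation step, assuming `scipy.ndimage` for the rescaling and rotation; the helper name and the rotation count are illustrative, not from [5].

```python
import numpy as np
from scipy import ndimage

def augment(image, scales=(0.9, 1.0, 1.1), max_angle=15.0, n_rotations=2, seed=0):
    """Return rescaled copies plus randomly rotated copies (within +/- max_angle
    degrees) of a training image, mirroring the augmentation described above."""
    rng = np.random.default_rng(seed)
    out = [ndimage.zoom(image, s) for s in scales]          # re-scaled versions
    for _ in range(n_rotations):
        angle = rng.uniform(-max_angle, max_angle)          # random in-plane rotation
        out.append(ndimage.rotate(image, angle, reshape=False, mode="nearest"))
    return out

patches = augment(np.ones((24, 30)))   # a 24x30 eye window, as in [5]
```

Each call returns the rescaled versions followed by the rotated ones, ready to be appended to the training set.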
Fig 2.2: The red star in the figure corresponds to the center of the eye window, used to generate the training data
for eyes. The blue dots represent the window centers for the background samples. The distance from the eye
center and the background window is set to 24 pixels [5].
To detect the eyes in a new image, they do an exhaustive search in a prespecified region within
the face detection box. They first search within the smaller green region (since this includes
∼90% of the eye centers in their training set). If the eye centers are not successfully located in
this region, they move the search to the larger blue region.
For the other detections, Ding and Martinez used different models. Since the eyebrows are either
darker or lighter than the skin, it is easy to detect them by searching for non-skin color in the
region above the eyes. To model skin color, they use a Gaussian model defined in the HSV color
space, N(μc, Σc), with μc = (μH, μS, μV)T and Σc = diag(σH, σS, σV). Over three
million skin color sample points were used to train this model. The pixels above the eyes that
correspond to the skin color model N(μc, Σc) are discarded first. A Laplacian operator
is then applied to the grayscale image above the eyes. Of the remaining pixels, the ones
with the highest gradient in each column are kept as strong descriptors of the eyebrows.
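The skin-rejection step can be sketched as follows. The per-channel z-score test is our simplification of the diagonal Gaussian N(μc, Σc), and the function name is hypothetical.

```python
import numpy as np
from scipy import ndimage

def eyebrow_descriptor(hsv_patch, gray_patch, mu, sigma, z_thresh=3.0):
    """Suppress pixels matching the diagonal skin-color Gaussian N(mu, diag(sigma^2)),
    then keep the row with the strongest Laplacian response in each column."""
    # Per-channel deviation from the skin model; large deviation => non-skin.
    z = np.abs((hsv_patch - mu) / sigma)
    non_skin = (z > z_thresh).any(axis=-1)
    lap = np.abs(ndimage.laplace(gray_patch.astype(float)))
    lap[~non_skin] = 0.0          # discard skin-colored pixels
    return lap.argmax(axis=0)     # row of the strongest descriptor per column

# Synthetic 4x5 patch: row 1 has a non-skin hue and a strong intensity step.
mu = np.array([0.1, 0.5, 0.8]); sigma = np.array([0.05, 0.1, 0.1])
hsv = np.tile(mu, (4, 5, 1)); hsv[1, :, 0] = 0.5
gray = np.zeros((4, 5)); gray[1, :] = 10.0
rows = eyebrow_descriptor(hsv, gray, mu, sigma)
```

On the synthetic patch, the descriptor correctly locates row 1 in every column.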
Fig 2.3: Two examples of eyebrow detection. (a) Binary description of the brow region. (b) Final contour
detection [5].
They used their methodology to execute a fast, accurate and detailed detection without resorting
to complicated or time-consuming procedures. To this end, a mouth corner detector is defined
using the subclass-based classifier. They only train a left mouth-corner classifier; to detect the
right corners, they create mirror images of the test windows (Fig. 2.4(a)).
The bounding box of the mouth is obtained following the same procedure described above for
the nose (Fig. 2.4(b)).
Mouths are rich in color, which makes the final way of delineating them possible. Here, they
used an approach similar to the one employed before (skin color and Laplacian edges). In
particular, they extract three features, given by saturation, hue, and Laplacian edges. Each of
these masks is thresholded at values Ts, Th and Tg before being combined into a single mask.
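A sketch of the mask combination. How the thresholded maps are fused is not specified above, so the conjunction below is an assumption, as are the parameter names.

```python
import numpy as np
from scipy import ndimage

def mouth_mask(hue, sat, gray, t_h, t_s, t_g):
    """Threshold hue, saturation, and Laplacian-edge maps, then combine them
    (here by conjunction) into a single binary mouth mask."""
    edges = np.abs(ndimage.laplace(gray.astype(float)))
    return (hue > t_h) & (sat > t_s) & (edges > t_g)

# Synthetic 5x4 window with a horizontal intensity step at row 2.
gray = np.zeros((5, 4)); gray[2, :] = 10.0
mask = mouth_mask(np.full((5, 4), 0.9), np.full((5, 4), 0.8), gray, 0.5, 0.5, 1.0)
```

The mask is True only where all three cues agree, which is what restricts the detection to strongly colored, high-gradient mouth pixels.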
Fig 2.4: 2.4(a) mouth corner detection. 2.4(b) Gradient of the mouth window, its x and y projection, and the
final result [5].
Cox et al. introduced the mixture distance technique, which attains a recognition rate of 95% on
a database of 685 people in which each face is represented by 30 measured distances [12].
They formed a new distance function based on local second-order statistics, as estimated by
modeling the training data as a mixture of normal densities, and they report results for mixtures
of various sizes. Their main goal was to understand the performance limitations of feature-based
systems. The structure of such methods varies extensively, but three major
elements may be identified:
the definition of a feature set;
the extraction of these features from an image; and
the recognition algorithm.
They assume that an n-component normal mixture model M was created to model the database
elements {yi}, and they refer to this as the experimental distribution. Each mixture
component Mk is a normal density N(μk, Σk), and they denote by M̄k the zero-mean density
N(0, Σk), so Pr(x|M̄k) = Pr(x − μk|Mk). The system's query to be classified is denoted q. Using
the mixing probabilities Pr(Mk) obtained from EM, they may then calculate a posteriori
component probabilities Pr(Mk|x). These may be visualized as a stochastic expression of x's
membership in each of the mixture's components. They attribute to each yi a density Oi which is
a mixture of the M̄k determined by these stochastic membership values. This is explained best
by the derivation which follows.
Pr(q|y, M) = (1 / Pr(y|M)) · Pr(q, y|M)
           = (1 / Pr(y|M)) · Σk=1..n Pr(q, y|Mk) · Pr(Mk)
           = (1 / Pr(y|M)) · Σk=1..n Pr(q|y, Mk) · Pr(y|Mk) · Pr(Mk)
           = (1 / Pr(y|M)) · Σk=1..n Pr(q − y|M̄k) · Pr(y|Mk) · Pr(Mk)
It is this formulation they use for all of their experiments. In [12], more complex expressions
are given under weaker or different assumptions. Finally, they observe that in the case of a
single mixture component (n = 1), mixture-distance reduces to the Mahalanobis distance from
the query q to the average face μ.
As a baseline, applying a simple Euclidean distance metric to the database attains an 84%
recognition level.
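The derivation above can be computed directly. This sketch uses `scipy.stats.multivariate_normal` and assumes the mixture parameters (weights, means, covariances) have already been fitted with EM; the function name is ours.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_distance(q, y, weights, means, covs):
    """Pr(q | y, M): each component's zero-mean density M_bar_k is evaluated
    at the difference q - y, weighted by Pr(y | M_k) Pr(M_k)."""
    p_y = sum(w * multivariate_normal.pdf(y, m, c)
              for w, m, c in zip(weights, means, covs))
    zero = np.zeros(len(q))
    num = sum(w * multivariate_normal.pdf(q - y, zero, c)   # Pr(q - y | M_bar_k)
                * multivariate_normal.pdf(y, m, c)          # Pr(y | M_k)
              for w, m, c in zip(weights, means, covs))
    return num / p_y

# With a single component (n = 1) this reduces to a Mahalanobis-type density:
p = mixture_distance(np.array([1.0, 0.0]), np.zeros(2),
                     [1.0], [np.zeros(2)], [np.eye(2)])
```

With one identity-covariance component, p is just the standard normal density evaluated at q − y, consistent with the Mahalanobis reduction noted above.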
For simplicity, they adopt a flat selection weight f = 1/2 to decide between first- and
second-order models, i.e., between just the diagonal variances and the full covariance matrix.
Fig 2.5: Recognition accuracy varies considerably with mixture complexity. Both “soft” and “hard” VQ versions
of mixture distance are presented [12].
Table 2.1 illustrates how f can significantly affect recognition accuracy.
Table 2.1: The f parameter which selects first vs. second-order models has a potent effect on recognizer accuracy
[12].
When the mixture consists of a single Gaussian, a first-order variance model (f = 0) is best and
the full second-order covariance model (f ≈ 1) is notably worse. However, for mixture sizes 2
and 5, an off-diagonal weighting of f = 1/2 is best, while the full second-order model with
f ≈ 1 is still a distant third. The point of Table 2.1 is not that the flat selection model can increase
the recognition rate from 93% to 94%, but rather that the recognition rate is consistently good
using a flat selection.
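One way to realize an intermediate weighting such as f = 1/2 is to interpolate linearly between the diagonal (first-order) and full (second-order) covariance. This blend is our illustration of the idea, not necessarily the exact scheme of [12].

```python
import numpy as np

def blended_covariance(cov, f):
    """f = 0 keeps only the diagonal variances (first-order model);
    f = 1 keeps the full covariance (second-order model)."""
    diagonal = np.diag(np.diag(cov))
    return (1.0 - f) * diagonal + f * cov

cov = np.array([[2.0, 0.6],
                [0.6, 1.0]])
half = blended_covariance(cov, 0.5)   # off-diagonal terms halved
```

At f = 1/2 the variances are untouched while every off-diagonal covariance term is halved, softening the second-order structure rather than switching it on or off.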
The eigenface method is one of the most widely used algorithms for face recognition. The
eigenface technique is based on Principal Component Analysis (PCA), also known as the
Karhunen-Loeve transform, which is successfully used to perform dimensionality
reduction. Principal Component Analysis is used for face recognition and detection [14].
M. A. Rabbani and C. Chellappan aimed to effectively identify a frontal human face with a good
recognition rate using an appearance-based statistical method for face recognition [15]. Their
approach produced a better recognition rate when more complex images were used. As
computational models of face recognition, such approaches are interesting because they can
contribute not only theoretical insights but also practical applications. This face
recognition method can be applied to a variety of problems, including criminal identification,
security systems, image and film processing, and human-computer interaction. Bledsoe [1966]
was the first to attempt semi-automated face recognition, with a hybrid human-computer system
that classified faces on the basis of fiducial marks entered on photographs by hand. Later
work at Bell Labs (Goldstein, Harmon, & Lesk, 1971) developed a vector of up to 21 features
and recognized faces using standard pattern classification techniques.
Let the training set of face images be Γ1, Γ2, Γ3, ..., ΓM, and let the median face of the set be
denoted ψ. Each face differs from the median by the vector
Фi = Γi – ψ .........................................................................................................(2.1)
(a) (b)
Fig 2.6: (a) Input Training Images, (b) Edge [15].
An example training set is shown in Fig 2.6(a), with the median face ψ shown in Fig 2.6(b).
This set of very large vectors is then subjected to principal component analysis, which seeks a
set of M orthonormal vectors un that best describe the distribution of the data. The kth
vector uk is chosen such that
λk = (1/M) Σn=1..M (ukT Фn)² ..............................................................................(2.2)
subject to ulT uk = δlk = 1 if l = k, and 0 otherwise.
The vectors uk and scalars λk are the eigenvectors and eigenvalues, respectively, of the
covariance matrix
C = (1/M) Σn=1..M Фn ФnT = AAT .....................................................................(2.3)
where the matrix A = [Ф1, Ф2, Ф3, ..., ФM]. Since AAT is very large, consider instead the
eigenvectors vi of ATA, such that ATA vi = μi vi.
Pre-multiplying both sides by A gives AAT (A vi) = μi (A vi), so the A vi are the eigenvectors
of C = AAT. The M training images thus form the eigenfaces
ul = Σk=1..M vlk Фk, l = 1, 2, ..., M.
A new face image Γ is transformed into its eigenface components (projected into “face
space”) by the simple operation
ωk = ukT(Γ – ψ)
Classifying an input face image then amounts to finding the face class k that minimizes the
Euclidean distance
εk = ║ Ω - Ωk ║², where Ωk is the vector describing the kth face class.
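Eqs. 2.1–2.3 and the classification rule can be sketched end-to-end. The helper names are ours, and following [15] the median face is used in place of the mean; the tiny random example at the end is only illustrative.

```python
import numpy as np

def eigenfaces(gammas, n_components):
    """Compute eigenfaces via the small M x M matrix A^T A (Eqs. 2.1-2.3)."""
    psi = np.median(gammas, axis=0)            # median face psi, per [15]
    A = (gammas - psi).T                       # columns are Phi_i = Gamma_i - psi
    vals, vecs = np.linalg.eigh(A.T @ A)       # M x M problem instead of N x N
    order = np.argsort(vals)[::-1][:n_components]
    U = A @ vecs[:, order]                     # u_l = A v_l (pre-multiply by A)
    U /= np.linalg.norm(U, axis=0)             # normalize each eigenface
    return psi, U

def classify(gamma, psi, U, class_weights):
    """Project a new face into face space and return the class k that
    minimizes the Euclidean distance ||Omega - Omega_k||^2."""
    omega = U.T @ (gamma - psi)
    return int(np.argmin([np.sum((omega - wk) ** 2) for wk in class_weights]))

# Tiny example: six random "faces", one class per training image.
rng = np.random.default_rng(1)
faces = rng.normal(size=(6, 20))
psi, U = eigenfaces(faces, 5)
weights = [U.T @ (f - psi) for f in faces]
k = classify(faces[2], psi, U, weights)
```

Solving the M x M eigenproblem of ATA instead of the full N x N problem of AAT is exactly the computational shortcut described above.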
They used two face databases: the Yale face database and the AT&T face database.
In their experiments, they tested their approach on several frontal face images with varying
facial expressions and lighting conditions.
The experimental results show that their technique is a simple and effective method for face
recognition using the median face with various distance measures, and they obtained the best
accuracy using the City Block distance.
Li et al. discuss the two essential issues of pattern recognition problems: (i) feature selection,
and (ii) classifier design based on the selected features. Boosting methods attempt to boost the
accuracy of an ensemble of weak classifiers into a strong one [16]. AdaBoost is an algorithm
which solved many of the practical difficulties of earlier boosting algorithms. Floating Search
sequentially selects features with backtracking: the two straightforward sequential selection
methods, sequential forward search (SFS) and sequential backward search (SBS), add or delete
one feature at a time, while the Sequential Floating Search methods allow the number of
backtracking steps to be controlled instead of being fixed in advance.
Their learning procedure consists of three steps:
1) learning incrementally crucial features from a large feature set;
2) constructing weak classifiers, each based on one of the selected features; and
3) boosting the weak classifiers into a stronger classifier using a linear combination
derived during the learning process.
They propose a new boosting algorithm called FloatBoost for effective statistical learning [16].
AdaBoost is a sequential forward search procedure using a greedy selection strategy; FloatBoost
incorporates the idea of Floating Search to solve the non-monotonicity problem encountered in
the sequential algorithm of AdaBoost. They presented an application of FloatBoost in a
learning-based system for real-time multi-view face detection [16], since a detector must be
able to handle non-frontal as well as frontal faces.
Fig 2.8: Middle: An image containing frontal faces subject to in-plane rotations. Left and right: In-plane rotated
by ±30° [16].
Fig 2.9: Schematic illustration of merging from different channels. From left to right: outputs of the frontal, left,
and right view channels, and the final result after the merge [16].
I. Description-Based DMs
Description-based DMs model shape explicitly (shape-descriptive), with deformations
modeled as perturbations in the shape parameter space [10].
II. Prototype-Based DMs
In prototype-based DMs, the parameters are not shape-descriptive but parameterize
shape deformations directly [10].
deformable templates. These templates are specified by a set of parameters, which enables a
priori knowledge about the expected shape of the features to guide the detection process. The
templates are flexible enough to change their size and other parameter values so as
to match themselves to the data.
The template is illustrated in Fig 2.10. It has a total of eleven parameters, represented by
g = (xc, xe, p1, p2, r, a, b, c, θ), all of which are allowed to vary during the matching.
To give an explicit representation of the boundary, they first define the unit vectors

e1 = (cos θ, sin θ) and e2 = (−sin θ, cos θ),

which change as the orientation of the eye changes. A point x in the shape can be represented by
(x1, x2), where x = x1e1 + x2e2. Using these coordinates, the top half of the boundary can be
represented by a section of a parabola with x1 ∈ [−b, b]:

x2 = a − (a/b²)·x1²
Note that the maximum height, x2, of the parabola is a, and the height is zero at x1 = ±b.
Similarly, the lower half of the boundary is given by

x2 = −c + (c/b²)·x1², where x1 ∈ [−b, b].
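The two parabolas can be sampled directly from the formulas above. This is a minimal sketch: `eye_boundary` and its toy parameter values are hypothetical, and the mapping back to image coordinates simply applies x = x1e1 + x2e2 as defined above.

```python
import numpy as np

def eye_boundary(a, c, b, theta, center, n=51):
    # Unit vectors e1, e2 rotate with the eye orientation theta.
    e1 = np.array([np.cos(theta), np.sin(theta)])
    e2 = np.array([-np.sin(theta), np.cos(theta)])
    x1 = np.linspace(-b, b, n)
    upper = a - (a / b**2) * x1**2   # x2 = a - (a/b^2) x1^2
    lower = -c + (c / b**2) * x1**2  # x2 = -c + (c/b^2) x1^2
    # Map template coordinates (x1, x2) back to image coordinates.
    def to_image(x2):
        return center + np.outer(x1, e1) + np.outer(x2, e2)
    return to_image(upper), to_image(lower)

up, low = eye_boundary(a=2.0, c=1.5, b=4.0, theta=0.0, center=np.array([0.0, 0.0]))
# The upper boundary peaks at height a over the center and reaches zero at x1 = +/-b.
```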
Fig 2.11: Final results of the eye templates. The right column shows the final state of the template acting on the
eyes in the left column [2].
They define the mouth-closed template in terms of a coordinate system (x, y) centered on a
point xm (the center of the mouth) and inclined at an angle θ (the orientation of the mouth). The
positive y direction points downward, for consistency with the coordinate system used on the
computer screen. The edge at the top of the upper lip is represented by two parabolas, pu, which
intersect above the center of the mouth.
Fig 2.12: The mouth closed template. The mouth is centered on a point xm and has an orientation 𝜃. The left and
right boundaries are at distances b1 and b2 from xm. The intersection of the upper two parabolas occurs directly
above xm at a height of h. The lower two parabolas have maximum distances from the central line (shown dotted)
of a and a+ c [2].
Fig 2.13 demonstrates the method and shows the time history of a simulation, and Fig 2.14
shows the final positions of the template on several images. For the mouth-open template,
there were again two epochs: (1) The coefficients are high for the valley forces and the teeth
forces, and zero for the edge forces on the boundary; the teeth forces pull the template to the
mouth, scale it, and orient it. (2) The edge coefficients are increased; the edge forces help adjust
the position of the edge boundaries. Stage (2) had not yet been implemented for this case.
Fig 2.13: A dynamic field for a closed mouth: (a) the original mouth, (b) the valley field, (c) the peak field, and
(d) the edge field. The figure is organized left to right and top to bottom. The strengths of the fields are shown in
greyscale: white is strong and black is weak [2].
Fig 2.14: Final results for the mouth-closed templates. The right column shows the final state of the template
acting on the mouth in the left column [2].
Cohen et al. introduced a new active contour model, called balloons, in which the model curve
behaves like an inflated balloon and ignores spurious or weak edges with the help of an
inflation force [18]. In that model, a correct choice of parameters is guided by numerical-analysis
considerations. The main drawback of most active contour models is that they do
not incorporate any global shape information of the face, and they require a predefined
parameterization that does not change during the deformation process.
Fig 2.15: Estimation of facial feature positions using T-shaped model [1].
As shown in Fig 2.15, existing methods such as ASM do not have a good initialization stage. In this
case, they use a fast estimate of the face position in order to improve the overall algorithm's
accuracy and convergence. B. Froba and C. Kulbeck employed a gradient map of the image
to determine the best match between a candidate portion and an average face template, once
the face location and scale are estimated [17].
They define a face model (FM) as an ordered set of face points; Fig 2.16 illustrates their face
model. The shape description P portrays the global topology of the facial features, while the
texture parameters T = {Ti | 1 ≤ i ≤ NFP} describe the local patterns characterizing each feature
point. The local texture attributes guide the deformation of the face model to a specific shape
instance, while the global feature topology constrains the deformation to maintain a face-like
shape.
Fig 2.16: (a) Face shape overlaid on a real image. (b) Topological shape with local texture attributes [1].
Their deformable model fitting takes 30–70 ms to process one face (Pentium-IV, 1.7 GHz),
which is comparable to ASM. Fig 2.17 shows two examples. It can be seen that their proposed
feature extraction achieves correct convergence even when the model is poorly
initialized.
Fig 2.17: Feature extraction examples (without performing the initial position estimation step) [1].
model parameters. These are the advantages of this representation. A good example of this
type of representation is given by Yuille et al. [2], who used parametric templates to
detect features such as the eyes or lips of the human face.
Fourier decomposition can be used for model representation in order to obtain basis functions. A
parametric model based on the elliptical Fourier decomposition of the contour is presented by
Staib et al. [25]. A set of sinusoidal basis functions was used to represent a parametric contour.
Eigen decomposition is another technique commonly used to represent a shape family
derived from a basis. By retaining only the significant eigenvectors as the basis, dimensionality
reduction can be effectively achieved. A good example of this type of representation was
proposed by Cootes et al. in their Active Shape Model (ASM) [25].
Fig 2.18: Deformable template matching: (a) a prototype template, (b) a deformable template with transformation
interpolation of order 3, (c) localization of a saxophone using a manually chosen initial template position [20].
Fig 2.19: An example of template matching. The example fish can be matched to only 3 fishes
(solid lines) with rigid template matching using translation, scaling, and rotation. It can be
matched to all the fishes using deformable template matching (dashed lines).
2.5 Discussion
Through this literature review, we found a wide range of face recognition methods. The
problems these works faced, and the solutions they used to solve them, help us understand more
about facial recognition systems and how to improve our own method.
Ding and Martinez introduced a combined approach to provide accurate and detailed detection
of the major facial features [5]. They used part-by-part facial feature extraction for the eyebrows,
eyes, nose, mouth, and chin for accurate detection. To detect facial features, they employ an
algorithm which learns to divide the data into adequate subclasses.
Cox et al. introduced the mixture-distance technique, which achieved a recognition rate of 95%
on a database of 685 people in which each face is represented by 30 measured distances [12].
They formed a new distance function based on local second-order statistics, as estimated by
modeling the training data as a mixture of normal densities, and they report results for
mixtures of various sizes. Their main goal was to understand the performance limitations of
feature-based systems. The structure of such methods varies extensively, but three major
elements may be identified.
M. A. Rabbani and C. Chellappan aim to effectively identify a frontal human face with a high
recognition rate using an appearance-based statistical method for face recognition [15]. Their
approach produced a better recognition rate for more complex images. As computational
models of face recognition, their approaches are interesting because they can contribute not
only theoretical insights but also practical applications.
Yuille et al. detect and describe features of the face, such as the eyes and mouth, using
parameterized deformable templates [2]. They propose a new method to detect such features
by using deformable templates. These templates are specified by a set of parameters, which
enables a priori knowledge about the expected shape of the features to guide the detection
process. The templates are flexible enough to change their size and other parameter
values so as to match themselves to the data.
Chapter 3
Background
The basis of good research is an understanding of the background terms and definitions. In
this chapter, as background knowledge, we discuss the face recognition system, the
deformable model, deformable template matching, and the stable mass-spring model (SMSM).
Knowledge-based Method
The rule-based method uses human knowledge about what constitutes a typical face.
Usually, the rules capture the relationships between facial features in order to determine the
locations of the features in the face [29].
Appearance-based Method
In contrast to template matching, the models are learned from a set of training images which
should capture the representative variability of facial appearance. These learned models are
then used for detection; they are mainly designed for face detection [29].
Face recognition systems are measured using two basic methods. The first is identification
performance: the percentage of probes in the primary statistic that are correctly
identified by the algorithm. The second is verification performance, where the
measure is the equal error rate between the probability of a false alarm and the probability of
correct verification.
Face recognition has received remarkable attention due to its manifold applications, such as in
information security, smart cards, entertainment, law enforcement, and surveillance [29]. Table
3.1 lists some of the most important face recognition applications.
In earlier research, Brunelli et al. recognized frontal views of the face using two
traditional classes of techniques applied to digital images [12]. The first technique was
based on the computation of a set of geometrical features and was an early step
towards automated recognition of the face. This approach gives satisfactory results for
recognition from frontal views, but the vectors of geometrical features extracted by this system
had low stability. The second technique was based on whole-image gray-level template
matching. This approach was superior in recognition performance, but it is not as consistent
with the properties of human vision in gender classification.
In this thesis, our approach is to detect the human face using a stable mass-spring model (SMSM)
as a dynamic deformable template (DDT). In the rest of this chapter, we discuss the deformable
model, deformable template matching, and the stable mass-spring model (SMSM).
A mass-spring model (MSM) is a graph where the nodes are known as mass points and the
springs connecting two points are called edges. In this MSM graph, fixing the mass points
also fixes the spring lengths. So, using prior knowledge of the object shape, the mass positions
and spring rest lengths are estimated. The internal force is defined by the model graph, and
the external force is calculated from the image data. Without any external force, the model
takes a shape in which the internal force vectors sum to zero; the model deforms only under
external force. The external force acts as an attraction or gravitational force: decreasing the
distance to the attraction point increases the strength of the attraction. The internal and
external forces interact, so the model deforms until the internal and external forces are in balance.
Previously, such a model suffered from instability, since the structural integrity of the model was
controlled only by the springs fixed in the original mass-spring model. Overcoming this
problem requires dense cross-linking of the mass points, which creates a very complex model.
This kind of solution prevents collapse, but at the cost of high computation time, and it also
makes the model too inflexible.
Thus, S. Bergner et al. suggested the use of an angular force, called torque force, for stability,
but in higher dimensions it faces the same problem as dense cross-linking [26]. L. Dornheim
et al. solved the problem of instability by introducing a torsion force into the system, which
provides stability in arbitrarily high dimensions without increasing the model complexity [27].
Following their work, we use the torsion force for model stability.
Fig 3.3: Collapsing problem. (a) shows a simple mass-spring model. In (b) this system is
contorted, although no spring has changed its length. In (c) the same system is collapsed,
again with no single spring length changed. (d) is the same system, but fully interconnected [27].
The dynamics of such a system can be described by Newtonian mechanics. The movement of
the model is influenced by the forces acting on the mass points. Under these forces the springs
deviate from their rest lengths, i.e., the mass points get new position vectors pinew, and each
deviation gives rise to a spring force Fsi that pulls the mass points back toward the spring
rest lengths.
Fig 3.4: Torsion force: rest directions of the springs are shown with dotted lines; springs deviate
from their rest directions under the influence of the external force (shown with solid lines).
Thus, a torsion force is exerted on the nodes in the direction of their rest directions: springs
deviating from their rest directions exert a torsion force on the mass points, towards their
rest directions, to restore the original position.
The torsion force exerted on mass point i can be calculated using Eq. (3.3):

FTi = Σj tj · (⟨pji, rdji⟩ / (‖pji‖ · ‖nji‖)) · nji ……………………………………….(3.3)
Where,
tj = torsion constant of the mass point j.
The working direction of the torsion force, nji, is calculated using Eq. (3.4):

nji = rdji − (⟨pji, rdji⟩ / ‖pji‖²) · pji ………………………………………….…………...(3.4)
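Read together, Eqs. (3.3) and (3.4) can be sketched for a single neighbour j as follows. Since the printed equations are partly garbled, this reconstruction is an assumption: `torsion_force` is a hypothetical helper, and the exact normalization in the original may differ.

```python
import numpy as np

def torsion_force(p_ji, rd_ji, t_j):
    # Working direction n_ji (Eq. 3.4): the component of the rest direction
    # rd_ji perpendicular to the current spring vector p_ji.
    n_ji = rd_ji - (np.dot(p_ji, rd_ji) / np.dot(p_ji, p_ji)) * p_ji
    norm_n = np.linalg.norm(n_ji)
    if norm_n < 1e-12:  # spring already points along its rest direction
        return np.zeros_like(p_ji)
    # Torsion force (Eq. 3.3): scaled by the torsion constant t_j and
    # applied along the working direction n_ji.
    return t_j * (np.dot(p_ji, rd_ji) / (np.linalg.norm(p_ji) * norm_n)) * n_ji

print(torsion_force(np.array([1.0, 1.0]), np.array([1.0, 0.0]), 1.0))
```

The force vanishes when a spring is aligned with its rest direction, and it pulls a deviated spring back toward that direction.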
The image (sensor) force is given by Eq. (3.5):

Fli = image(loc(i)) ………………………………………………………….…..(3.5)

where loc(i) gives the coordinates of the current location of sensor i in the image, and image
gives the image information at that position.
The acceleration ai of mass point i follows from Newton's second law, Eq. (3.6):

ai = Fi / mi ………………………………………………………………………(3.6)

where Fi is the total force acting on mass point i and can be calculated using Eq. (3.7):

Fi = wsi · Fsi + wti · FTi + wli · Fli ……………………………………………….(3.7)

Where,
wsi = weight for the spring force acting on mass point i,
wti = weight for the torsion force acting on mass point i,
wli = weight for the image force acting on mass point i.
Following the calculation of the mass point forces, we can compute the new velocity vi and
position pi of each mass point, given the old velocity viold and the old position piold, as
follows:

vi = viold + ai·Δt·(1 − d) …..………………………………………………………...(3.8)
pi = piold + vi·Δt …………...…………………………………………………….(3.9)

where d is a damping factor.
The model's adaptation to the image continues until equilibrium, when the external forces cancel
out the internal forces.
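The update equations (3.6)–(3.9) amount to one damped explicit Euler step per mass point. A minimal sketch under assumed values for Δt and the damping factor d; `step` is a hypothetical helper, not the thesis code.

```python
import numpy as np

def step(p, v, forces, mass, dt=0.05, damping=0.1):
    # Eq. (3.6): acceleration from the total force on each mass point.
    a = forces / mass[:, None]
    # Eq. (3.8): damped velocity update.
    v_new = v + a * dt * (1.0 - damping)
    # Eq. (3.9): position update with the new velocity.
    p_new = p + v_new * dt
    return p_new, v_new

# One mass point of mass 2 at the origin, pulled along +x with unit force.
p = np.zeros((1, 2)); v = np.zeros((1, 2))
p, v = step(p, v, np.array([[1.0, 0.0]]), np.array([2.0]))
```

Iterating this step moves the model until the spring, torsion, and image forces balance, which is the equilibrium condition described above.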
Chapter 4
Proposed Method
4.1 Introduction
Image recognition is a process that usually consists of capturing a picture, processing the image,
and then presenting the results. Face recognition has many applications in the image processing
field; therefore, face recognition is an important issue in image processing. In this thesis, we mainly
work on detecting the human face using a stable mass-spring model (SMSM) as a dynamic
deformable template (DDT). The whole work is divided into four parts, as follows:
DDTs are activated by making their mass points sensitive to the image data I(x, y). The DDTs are
constrained to lie on a surface S(x, y), subject to some gravitational force (in this thesis we will
use the term 'sensor' to indicate this surface); S may correspond to image intensities (i.e., S =
±I(x, y)) or to image gradient values (e.g., S = ∇I(x, y)). Once S is determined, the DDT can
be placed on the surface, allowing it to deform according to the surface topography under the
influence of gravity. Depending on the nature of the surface considered, the DDT can be made
attracted to any image force.
For simulation of the SMSM, we first need to do some preprocessing of our database: we convert
the RGB images into grayscale images. When the model comes to the simulation, the image is
placed in the model. In the simulation stage, each node gets a sensor for feature detection.
4.2.1 Sensors
It has already been stated that we are interested in detecting the face and facial expressions. The
DDT should not be attracted to artifacts of an individual image. We do not use the image
gradient as a sensor, since the stroke width does not reflect proper feature extraction in the image.
For that reason, an intensity sensor can be used, so that the DDT is attracted to the medial axis of
the face. We also used corner and edge sensors to detect corners and edges.
An intensity sensor is the most basic sensor: it passes the image information through directly.
During model simulation, the intensity sensor helps attract nodes to the bright regions of the image.
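An intensity sensor can be sketched as a lookup of Eq. (3.5) at the node's (possibly fractional) position. The rounding-and-clamping policy is our assumption, and `intensity_sensor` is a hypothetical name.

```python
import numpy as np

def intensity_sensor(image, loc):
    # Eq. (3.5): return the image information at the sensor's current
    # location; round to the nearest pixel and clamp to the image bounds.
    h, w = image.shape
    x = int(np.clip(round(loc[0]), 0, w - 1))
    y = int(np.clip(round(loc[1]), 0, h - 1))
    return image[y, x]

img = np.array([[0, 50], [100, 255]], dtype=np.uint8)
print(intensity_sensor(img, (1.2, 0.9)))  # -> 255 (nearest pixel is (1, 1))
```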
Edge detection includes a variety of mathematical methods that aim at identifying points in a
digital image at which the image brightness changes sharply or more formally, has
discontinuities. The points at which image brightness changes sharply are typically organized
into a set of curved line segments termed edges. The same problem of finding discontinuities
in one-dimensional signals is known as step detection and the problem of finding signal
discontinuities over time is known as change detection. Edge detection is a fundamental tool
in image processing, machine vision, and computer vision, particularly in the areas of feature
detection and feature extraction. In our approach, we used the Sobel operator as a sensor to
detect edges.
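A Sobel edge sensor can be sketched as a direct 3x3 convolution. This is a minimal illustrative implementation, not the thesis code; border pixels are simply left at zero.

```python
import numpy as np

def sobel_magnitude(image):
    # Standard 3x3 Sobel kernels for the horizontal and vertical gradients.
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = image.shape
    gx = np.zeros((h, w)); gy = np.zeros((h, w))
    img = image.astype(float)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = img[y - 1:y + 2, x - 1:x + 2]
            gx[y, x] = np.sum(kx * patch)
            gy[y, x] = np.sum(ky * patch)
    # Gradient magnitude: strong where the brightness changes sharply.
    return np.hypot(gx, gy)

# A vertical step edge gives strong responses along the transition columns.
edge_img = np.zeros((5, 5)); edge_img[:, 2:] = 1.0
mag = sobel_magnitude(edge_img)
```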
Mathematically, applying a Gaussian blur to an image is the same as convolving the image
with a Gaussian function; this is also known as a two-dimensional Weierstrass transform.
In two dimensions, the kernel is the product of two such Gaussians, one in each dimension:

G(x, y) = (1 / 2πσ²) e^(−(x² + y²) / 2σ²) …………………………………(4.2)
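Eq. (4.2) can be sampled on a small grid to build a discrete blur kernel. A sketch; `gaussian_kernel` is a hypothetical helper, and normalizing the weights to unit sum is a common practical addition rather than part of Eq. (4.2).

```python
import numpy as np

def gaussian_kernel(size, sigma):
    # Sample G(x, y) = 1/(2*pi*sigma^2) * exp(-(x^2 + y^2) / (2*sigma^2))
    # on a size x size grid centered at the origin.
    half = size // 2
    ax = np.arange(-half, half + 1)
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    return g / g.sum()  # normalize so the blur preserves overall brightness

k = gaussian_kernel(5, 1.0)
# Convolving the image with k applies the Gaussian blur of Eq. (4.2).
```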
4.3 Training
For training, we selected some images from the AT&T database. We then apply the stable
mass-spring model (SMSM) as a dynamic deformable template (DDT) to each image. From
each database image, we calculate the energy value. Using these energy values, we calculate the
mean and standard deviation.
x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ …………………………………………..(4.3)
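From the training energies, the mean of Eq. (4.3) and the standard deviation give the Confidence Interval. A sketch under an assumed interval width: the factor z = 1.96 (a 95% interval for normally distributed energies) and the name `confidence_interval` are our assumptions, since the interval width is not stated here.

```python
import numpy as np

def confidence_interval(energies, z=1.96):
    e = np.asarray(energies, dtype=float)
    mean = e.sum() / len(e)                  # Eq. (4.3)
    std = np.sqrt(((e - mean) ** 2).mean())  # population standard deviation
    return mean - z * std, mean + z * std

# Hypothetical training energies from five images.
lo_ci, hi_ci = confidence_interval([10.0, 12.0, 11.0, 13.0, 14.0])
```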
4.4 Classification
In the classification stage, we added some non-human face images to the database as the testing
set. We then applied our method to each image in the test set and calculated its energy value.
By comparing the energy value of each image with the Confidence Interval (CI) of the
training set, we can distinguish human from non-human face images: if the energy value of an
image is within the Confidence Interval (CI) range, then the image contains a human face;
otherwise, it contains a non-human face. Finally, we calculate the error rate.
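The decision rule described above reduces to a range check. A sketch; the CI endpoints shown are made-up illustrative values, not results from our experiments.

```python
def classify(energy, ci_low, ci_high):
    # An image whose total external energy falls inside the training CI
    # is labelled a human face; anything outside is labelled non-human.
    return "human" if ci_low <= energy <= ci_high else "non-human"

# With a hypothetical training CI of (9.2, 14.8):
print(classify(11.5, 9.2, 14.8))  # -> human
print(classify(20.0, 9.2, 14.8))  # -> non-human
```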
Chapter 5
Experiments & Results
In this chapter, we will discuss the experiments that we have carried out with our proposed
DDT. We will also present a detailed discussion of the result and findings of our experiments
and the problems of our proposed method. We experimented with a 10×11-grid rectangular
DDT.
In the next section, we discuss the model and its required environment settings.
So, the 10×11 grid is the best resolution to cover the image (120×120 pixels) and acquire data.
The template has a total of 110 mass points and 199 springs. The boundary mass points are fixed:
they do not change their position under any circumstances. But the inner mass points are
capable of moving under the influence of the image force. The internal mass points will move
until they find their suitable places. The mass points use 3 different kinds of sensors, which were
already discussed in the previous chapter.
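The mass point and spring counts can be verified with a two-line calculation, assuming springs connect only horizontally and vertically adjacent nodes (`grid_counts` is a hypothetical helper).

```python
def grid_counts(cols, rows):
    # Nodes: one mass point per grid position.
    nodes = cols * rows
    # Springs: horizontal neighbours plus vertical neighbours.
    springs = (cols - 1) * rows + cols * (rows - 1)
    return nodes, springs

print(grid_counts(10, 11))  # -> (110, 199)
```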
It is essential to select the appropriate sensors to acquire the correct data. We have already stated
that we are interested in creating a new way to detect and classify faces using SMSM. Our primary
goal is to detect a human face and classify images into two groups: human and non-human face.
Thus, we used the sensors already explained in the previous chapter.
Other important considerations are the parameters of the template and its initial data.
We initialized the weights of the springs and mass points by an educated guess.
For the test dataset, we used another 20 human faces from the AT&T database, and we took 10
non-human face images. The non-human images were RGB images, so we had to convert them to
grayscale in order to use them as testing data in our method.
Fig 5.1: (a) the image when the data first enters our method; (b) the image after acquiring the
image force is complete.
Table 5.1: These are the results of our training data, which we acquired using Eq. 3.5.
Table 5.1 shows different energies for different images; that is why we had to use
multiple different images as training data, to make our method robust and more accurate.
After using the training data to calculate the CI, we use that CI for comparison with the testing data.
If the total external force (Eq. 3.5) falls within the CI range, then our method recognizes and
classifies the image as a human face; otherwise, as a non-human face.
Fig 5.2: (a) is the main input image for the test, (b) is the feature image after using Harris corner detection, (c) is
the feature image after using Sobel edge detection, (d) is the simulation starting point image, (e) is the simulation
ending point image.
Fig 5.2 shows a sample of our test data and its simulation. We used 20 human faces from the
AT&T database and 10 non-human faces as testing data.
Our method was able to detect and classify 24 images correctly: among the correct detections,
14 were human face images and 10 were non-human face images.
Our method failed to detect and classify 6 human face images correctly. Thus, the
success rate of our method is 80% (Fig. 5.3).
Fig 5.3: Success and error rates of the proposed method: 80% success, 20% error.
5.4 Discussion
Our method has some problems in detecting and classifying between human and non-human faces.
There are some primary reasons for these problems:
As the parameters we set are an educated guess, they were not accurate. We need
more training to improve the performance.
The rectangular DDT is fixed at some points. That is why the sensors were not able to
reach the desired locations on the face to acquire accurate data.
We used only 3 sensors in our method. To get more accurate results, we need to
introduce more sensors into our method.
In some data, Harris corner detection was not able to detect all the corners of the face.
In order to overcome this problem, we need to segment the data
into specified parts to detect the corners more accurately.
We use grayscale images as our data. Grayscale images miss some important facial
features and margins, so we need to improve our model to work with full-color RGB
images and train it to acquire accurate data.
There is a lot of scope to work on and improve our method in the future. We will improve our
method and make it more robust by solving the problems we found in this thesis.
Chapter 6
Conclusion
This chapter summarizes the work presented in this thesis. It also discusses the scope
of future work related to our proposed method.
6.1 Conclusion
The main goal of this thesis was to investigate the possibility of using Stable Mass-Spring
Models as deformable templates for face detection. Thus, we proposed a Dynamic
Deformable Template (DDT) that is capable of distinguishing face from non-face images. Previous
work with Dynamic Deformable Templates (DDT) was able to detect objects and digits. We
investigated a 10×11-grid rectangular DDT to detect a human face. The rectangular grid
DDT is shape-independent; the shape information of the image is generated from the sensors. We
investigated training of the DDT to evaluate the results and improvements of our method. The
mass points are driven by 3 different sensors. We take RGB images from the dataset and
convert them into grayscale images for face detection. We also use Gaussian smoothing to
smooth the image, and apply the Sobel and Harris techniques for edge and corner detection. We
succeeded in developing a new method using the stable mass-spring model (SMSM) as a dynamic
deformable template (DDT), which is able to classify human and non-human faces.
In this thesis, we mainly worked on detecting a frontal face, facial expressions, and features
using a stable mass-spring model (SMSM) as a dynamic deformable template (DDT).
We will develop a model using SMSM for multi-view face detection in the future.
We will also improve the model to detect facial expressions, i.e. happy face, sad face,
angry face, etc.
Bibliography
• [1] F. Zuo and P. H. N. de With, “Fast facial feature extraction using a deformable
pp. 1425-1428.
• [3] F. Zuo and P. H. N. de With, “Towards fast feature adaptation and localization
for real-time face recognition systems,” in Proc. SPIE, 2003, vol. 5150, pp. 1857–
1865.
• [4] T. Cootes, “Statistical models of appearance for computer vision,” Tech. Rep.,
• [5] Ding L. and Martinez A. M., “Precise Detailed Detection of Faces and Facial
10.1109/CVPR.2008.4587812 (2008).
• [6] Xiang Yu, Junzhou Huang, and Shaoting Zhang, “Pose-free Facial Landmark
Fitting via Optimized Part Mixtures and Cascaded Deformable Shape Model,”
• [8] Y. Zhou, W. Zhang, X. Tang, and H. Shum. “A Bayesian mixture model for
shape composition: A new framework for shape prior modeling.” In CVPR, 2011.
Magdeburg ©2006.
• [11] Divyarajsinh N. Parmar and Brijesh B. Mehta, “Face Recognition Methods &
• [12] Ingemar J. Cox, Joumana Ghosn, and Peter N. Yianilos, “Feature-Based Face
1996. Proceedings CVPR '96, 1996 IEEE Computer Society Conference on, DOI:
10.1109/CVPR.1996.517076.
with PCA and ICA," Computer vision and image understanding, vol. 91, pp. 115-
137, 2003.
April 2007.
• [16] Stan Z. Li, Long Zhu, ZhenQiu Zhang, Andrew Blake, HongJiang Zhang,
• [17] M. Kass, A. Witkin, and D. Terzopoulos, “Snakes: active contour models,” Int.
• [18] Laurent D. Cohen, “On active contour models and balloons,” CVGIP: Image
Understanding, 53(2), 1991.
• [21] K. W. Cheung, D. Y. Yeung, and R. T. Chin, “On Deformable Models for Visual
• [24] B. Widrow, “The rubber mask technique, Parts I and II,” Pattern Recognition,
5 (1973), pp. 175-211.
• [25] T. F. Cootes and C. J. Taylor, “Active shape models—Smart snakes,” in:
pp. 24-27.