
Real Time Face Recognition

Carlos Leung
School of Information Technology and Electrical Engineering
University of Queensland

Submitted for the degree of


Bachelor of Engineering

October 2001
Carlos Leung
11 Reagan Place
Stretton QLD 4116

October 19, 2001

Head of School
School of Information Technology and Electrical Engineering
University of Queensland
St Lucia QLD 4101

Dear Professor Kaplan,

In accordance with the requirements of the degree of Bachelor of Engineering in the


division of Electrical Engineering, I present the following thesis entitled “Real-
Time Face Recognition”. This work was performed under the supervision of Dr.
Brian Lovell.

I declare that the work submitted in this thesis is my own, except as acknowledged
in the text and footnotes, and has not been previously submitted for a degree at the
University of Queensland or any other institution.

Yours sincerely,

Carlos Leung
Acknowledgement

The success of this thesis would not have been possible without the constant
encouragement, advice and support from a vast number of people.

Acknowledgement is extended to my supervisor, Dr. Brian Lovell, for his assistance


and guidance throughout the whole thesis. A big thank you to Shane Goodwin for
looking after the lab and being patient with my continual equipment demands, and
to Shao Kang for battling through the challenge of face recognition with me.

A high five thank you to the entire team of “SIP lab Gurus” – Laser Ben, Apple
Ben, Obi Wan and Petair. It has been fun working with all of you. The many
difficult times and stressful late nights have been made easier by your
companionship and your encouraging support. A special thanks must be extended
to Ben Appleton for his willingness to share his genius and his continual patience in
answering my many questions.

Thanks also to all my friends and family for their support throughout the year, and
to Cheryl for slaving over all my proofreading demands. To my personal printing
specialist, Uncle Colin, thank you for the endless nights spent in front of the printer
and your many wacky ideas. A super special thank you must also be extended to
Yan for her continual spiritual support and always being there to listen to my
whining.

And last but not least, thanks also to all the people who have so willingly donated
their faces for my analysis. This thesis would undoubtedly be impossible without
your generosity.

Abstract

As continual research is being conducted in the area of computer vision, one of the
most practical applications under vigorous development is in the construction of a
robust real-time face recognition system. While the problem of recognising faces
under gross variations remains largely unsolved, a demonstration system has been
developed as proof of concept that such systems are now becoming practical. A
system capable of reliable recognition, with reduced constraints on the
position and orientation of the face and the illumination and background of the
image, has been implemented with 96% recognition accuracy.

The design of the face recognition system is based upon “eigenfaces” and has been
separated into three major modules – face detection, face normalisation and face
recognition. Face location and normalisation are performed in real-time, and
consistent accuracy in face detection is recorded with video input. Under dynamic
mode, where recognition is accomplished from real-time video input, the system has
been demonstrated to perform at a maximum speed of two frames per second under
Matlab. While a completely robust real-time face recognition system is still under
heavy investigation and development, the implemented system serves as an
extendable foundation for future research.

Contents

Acknowledgement................................................................................................... iii
Abstract.....................................................................................................................iv
1 Introduction.......................................................................................................1
1.1 Task Definition ...........................................................................................2
1.2 Thesis Structure ..........................................................................................3

2 Literature Review .............................................................................................4


3 Theory ................................................................................................................7
3.1 Standardised Face Database........................................................................7
3.2 Principal Component Analysis ...................................................................8
3.3 Optimised Eigenface Algorithm ...............................................................11
3.4 Normalised Cross Correlation ..................................................................14
3.5 Colour Segmentation ................................................................................15
3.6 Morphology ..............................................................................................15

4 System Design..................................................................................................17
4.1 Design Architecture ..................................................................................17
4.2 Face Recognition ......................................................................................19
4.2.1 The Eigenface Approach ..................................................................19
4.3 Face Detection ..........................................................................................22
4.3.1 Skin Detection...................................................................................23
4.3.2 Refined Face Search .........................................................................26
4.4 Face Normalisation ...................................................................................30
4.4.1 Lighting Normalisation.....................................................................30
4.4.2 Scaling Normalisation.......................................................................32
4.4.3 Rotation Normalisation.....................................................................33
4.4.4 Background Subtraction ...................................................................33

5 System Performance Evaluation ...................................................................35

5.1 Face Detection ..........................................................................................35
5.1.1 Skin Detection...................................................................................35
5.1.2 NCC Search ......................................................................................39
5.1.3 Face Space Search ............................................................................41
5.2 Face Normalisation ...................................................................................42
5.2.1 Scaling Normalisation.......................................................................43
5.2.2 Rotation Normalisation.....................................................................44
5.2.3 Lighting Normalisation.....................................................................45
5.3 Face Recognition ......................................................................................47
5.3.1 Training Stage...................................................................................47
5.3.2 Recognition Stage .............................................................................48
5.4 Integrated System Performance ................................................................50

6 Design Review .................................................................................................53


6.1 The Integrated System ..............................................................................53
6.2 Face Detection ..........................................................................................54
6.3 Face Normalisation ...................................................................................55
6.4 Face Recognition ......................................................................................56
6.5 Future Work..............................................................................................57
6.5.1 Tracking ............................................................................................57
6.5.2 Intensity Invariant Colour Model .....................................................58
6.5.3 Multiple Recognition System ...........................................................59
6.5.4 C++ Programming ............................................................................59
6.5.5 3D Modelling....................................................................................60

7 Conclusion .......................................................................................................61
References................................................................................................................62
Appendix A Program Listings ...........................................................................65

List Of Figures

Figure 3-1 Faces are Linear Combinations of eigenfaces..........................................9


Figure 3-2 Example of morphological dilation and erosion ....................................16
Figure 4-1 System Architecture ..............................................................................18
Figure 4-2 A set of perfect eigenfaces - ideal database [20]. ..................................20
Figure 4-3 Standard Average Face [22]...................................................................27
Figure 4-4 Revised alternative template for NCC. ..................................................27
Figure 4-5 Illustration of the determination of the rotational angle ........................33
Figure 5-1 Decomposition of image into V plane ...................................................36
Figure 5-2 Skin Colour Distribution on UV plane ..................................................36
Figure 5-3 Examples of skin colour used to generate the lookup table...................37
Figure 5-4 Binary Mapping of skinned regions of the faces in Figure 5.3..............37
Figure 5-5 Intensity effects on Skin Detection.. ......................................................38
Figure 5-6 Skin Detection results. ...........................................................................38
Figure 5-7 Standard Average Face [22]...................................................................39
Figure 5-8 NCC search results.................................................................................39
Figure 5-9 Revised alternative template for NCC ...................................................40
Figure 5-10 NCC search results...............................................................................40
Figure 5-11 Face Space Search................................................................................41
Figure 5-12 Distance map of face space search.......................................................42
Figure 5-13 Scaled normalisation ............................................................................43
Figure 5-14 Face image subject to both scaling and rotational normalisation ........45

Figure 5-15 Lighting normalisation.........................................................................46
Figure 5-16 Computed eigenfaces ...........................................................................47
Figure 5-17 Face database .......................................................................................49
Figure 5-18 Sample results of recognition stage .....................................................49
Figure 5-19 Example of complete face recognition system ....................................51

1 Introduction

As continual research is being conducted in the area of computer vision, one of the
most practical applications under vigorous development is in the construction of a
robust real-time face recognition system. With the recent major terrorist attacks in
the United States, there has been increasing interest in the
development of intelligent surveillance cameras that can automatically detect and
recognise known criminals as well as suspicious characters. Due to such uncertain
times, humans are beginning to seek support from computer systems to aid in the
process of identification and location of faces in everyday scenes. Smart buildings
can be implemented whereby the presence of unknown dubious individuals can be
brought to the attention of building security for appropriate action, and smart
computers can be used to load personal preferences and needs. Entertainment
companies are particularly interested in systems where specific actors can be
searched for and located in a video sequence, so that their movements can be
tracked throughout the entire movie.

While solutions to the task of face recognition have been presented, recognition
performances of many systems are heavily dependent upon a strictly constrained
environment. The problem of recognising faces under gross variations remains
largely unsolved. This thesis addresses the issue of developing a real-time face
recognition system under reduced constraints. The design of the face recognition
system is based upon “eigenfaces” and has been separated into three major modules
– face detection, face normalisation and face recognition.

Face detection is accomplished by first performing a skin search of the input image
based on colour segmentation. Although skin colours differ from person to person,
it was found that the colour remained distributed over a very small region on the
chrominance plane. Normalised cross correlation and face space decomposition are
then applied in order to locate the exact position of the face. Since the application
of eigenfaces to the task of face recognition requires a perfectly standardised and
aligned database of faces, face normalisation modules are inserted between the
detection stages to account for possible scaling, planar rotational and illumination
differences.

The outcome is a sophisticated system capable of reliable recognition with reduced


constraints in the position and orientation of the face, as well as the illumination and
background of the image. Face location and normalisation are performed in real-
time and consistent accuracy in face detection is recorded with video input. Under
static mode, where recognition is performed on single-scale images without
rotation, a recognition accuracy of 96% has been achieved. Demonstrations,
however, will be performed under dynamic mode where recognition is applied to
real-time video input. In dynamic mode, while recognition accuracy diminishes due
to variations in the scaling and rotation of faces, the design of a robust recognition
system invariant to scale and rotation is still under development and will be the
focus for future work.

1.1 Task Definition


The goal of this thesis is to implement a system capable of detecting and
recognising faces in real-time. Successfully constructing a real-time face
recognition system not only implies a system capable of analysing video streams,
but also naturally leads to solving the problems of extremely constrained
testing environments. Analysing a video sequence is the current challenge since the
faces are constantly in dynamic motion, presenting many different possible
rotational and illumination conditions.

Due to the time constraints and complexity of implementing the system in C++, the
aim is to design a prototype under Matlab that is optimised for recognition
performance. A system that can accept varying forms of inputs at different sizes
and image resolutions should be implemented, constructing a well-coded and
documented system for easier future development.

Alongside the design and implementation of the face recognition system, a


mathematical derivation of the eigenface approach, based on the work of Turk and
Pentland [1], has been explored. The result is a proposed proof of the validity of the
alternative optimised eigenface algorithm, which we believe has not been presented
in any literature thus far.

1.2 Thesis Structure
A documentation of the design, evaluation and discussion of the integrated real-time
face recognition system is presented in this thesis. Following this introduction, a
literature review of available past techniques in face detection and recognition is
examined in Chapter 2. Chapter 3 is dedicated to the analysis of the underlying
mathematical theories behind the many methods used throughout the system,
including a detailed exploration of the eigenface method. In turn, Chapter 4
explains the methodology and implementation of each individual module and an
evaluation of the designed face recognition system is provided in Chapter 5. Based
upon the performance of the system, the entire design is reviewed in Chapter 6, and
methods in which the design may be improved or extended are highlighted. Chapter
7 concludes the thesis with an overview of the successes accomplished by this
project. A listing of the Matlab code for the recognition system is included in
Appendix A.

2 Literature Review

Many have tried to solve the problem of face recognition, something that even little
babies can do comfortably, effortlessly and naturally, from the time they are
born. Not only have engineers and scientists been involved with developing face
recognition systems. Psychologists have also been involved with investigating how
humans perceive the task of face recognition and how faces are stored in the human
memory [2]. Research has been conducted into the distinctiveness of faces
and the contributing factors that aid humans in recognising faces [3]. From
Klatzky’s analysis on a human’s ability to recognise cartoons and caricatures [4],
although results suggest that recognition of faces is based on distinctive features,
strong evidence shows that faces tend to be recognised as an integral stimulus.
Although research into face recognition through cognitive psychology provides
interesting alternative view of the problem, most of the work performed has
produced results that remain in the experimental stage rather than designing a real
working system. The few proposed face recognition systems that are found in
cognitive psychology literature such as Facenet [5], all employ a neural network
approach and a multilayer network solution.

The use of neural networks has also been a popular approach in designing a face
recognition system and is still undergoing intensive development and research. This
technique uses Self Organizing Maps and Convolutional Neural Networks to
describe a database of faces [6], whereupon a three dimensional neural network is
constructed that has the ability to connect the obtained data. Kohonen [7] extended
these approaches by using an associative network with a simple learning algorithm
to classify face images and recalling face images from incomplete networks.
Fleming and Cottrell [8] further improved upon these ideas to use non-linear units,
training the system by back propagation. The major difficulty with using a
connectionist approach is a neural network’s inability to extend to a large dataset,
since even for a small image of 128 by 128 pixels, a neural network with more than
16,000 inputs is required. The large amount of redundancy makes this approach inefficient
and impractical until further advances and improvements have been discovered and
developed.

Techniques for face recognition began as early as 1888 by Francis Galton [9]. He
proposed a recognition system based on the detection and comparison of the relative
distances between key facial features, such as eye corners and mouths. Research
into how humans perceive and recognise faces by Carey and Diamond [10], however,
has shown that adults do not identify human faces based on the immediate
relationship between individual facial features. Recognition through feature
distances moreover is quite fragile and thus extremely sensitive to changes.
Nonetheless, recognition techniques based on detecting individual features are still
popular and used in conjunction with other techniques to improve performances.

Template matching is also a common technique. By using correlation, input images


can be compared to a set of known faces, with the largest correlation resembling the
closest match. Local feature template matching and a global measure of facial
features are proposed. Improvements were also made by Yuille and Cohen [11],
proposing the use of deformable templates, which are models of faces determined
by analysing the input images. Correlation based techniques have proven successful
but only under extremely constrained conditions. The technique has a tendency to
break down due to its high sensitivity to scaling, lighting and noise.

Most of the techniques presented require an arbitrary assumption of which facial


features are important when performing the task of face recognition. These
techniques either focus on a specific set of key facial features, such as the eyes or
the mouth, or do not focus on any features at all, such as the method of correlation.
Correlation is essentially a task of pattern matching, placing no particular emphasis
on parts of the image and treating all the pixels in the face image as equally
important.

A different technique that scales and normalises the facial features based on their
relative importance has been proposed by Turk and Pentland [1]. This method
analyses each facial image into a set of eigenvector components, essentially
capturing the variations in a collection of face images independent of any judgement
of particular facial features. With each component representing a certain dimension
and description of the face, the whole set of eigenvectors characterises the variations
between different faces. Every pixel in the image contributes to the formation of the
eigenvectors; thus each eigenvector is essentially an image of the face with a certain
deviation from the average face depending on the local and global facial features.
Each of these eigenvectors of the faces is called an eigenface, hence this technique is
termed the “eigenface approach”.

The eigenface approach uses a technique developed by Sirovich and Kirby [12]
called PCA, principal component analysis. PCA is a technique that effectively and
efficiently represents pictures of faces in terms of their eigenface components. With a given
set of weights for each face image and a set of standard pictures, they argue that any
face image can be approximately reconstructed by combining all the standard faces
according to their relative weights. This idea of linear combination is the backbone
of the eigenface technique, and has been proven extremely successful.

When building a face recognition system, it is also important to consider the


extraction and the location of the face itself. Many techniques have been proposed
in the area of face detection, with the most common technique being colour
analysis. A common approach is to locate the skin regions in an image and perform
further processing and calculations to limit the region of interest. Accurate
detection of the face is a particularly important process for analysing video
sequences.

Before face recognition can be performed in video streams, detection of the faces is
an important task. Investigations into designing an automated face detection system
in MPEG video were accomplished by Wang and Chang [13]. Although many of
the current advanced multimedia applications and technology had not been
introduced at that time, face detection in video sequences was successfully
implemented. From then on, many research projects took place and successful
results were obtained with Lorente and Torres [14] presenting various solutions in
image segmentation and face location. Although extremely controlled
environments were used as the test sequence, modifications to cope with different
possible face conditions in a video were considered.

Despite the many successes and papers presented in the area of face recognition,
creating a real-time system is still a gruelling challenge. The eigenface approach
has proven extremely successful in constrained environments; the challenge is to
extend this to cope with the many possible facial orientations in a video sequence.
After locating the face image in a video stream, different illumination conditions,
different size scales, face occlusions and facial expressions are amongst the many
obstacles still to be solved.

3 Theory

A real-time face recognition system can be separated into three modules:

1. Face Location

2. Face Normalisation

3. Face Recognition

While Chapter 4 will focus on the details and design of each individual module, the
theory and mathematics behind the techniques used in these stages are presented in
this chapter.

3.1 Standardised Face Database


The input into the system and consequently, the input into the first face location
module will be either a still image or a frame from a video sequence. Analysis of
the complex scene will then be performed to output an estimated location of the
face.

The face normalisation module will then aim to transform the image into a
standardised format, where differing scaling factors and rotational angles of faces,
differing lighting conditions and background environment, as well as differing
facial expressions of faces will be considered. The normalised faces can then be
entered into the face recognition module, either to add a new face into the database
or to recognise a face from the existing database.

As discussed in Chapter 2, much research and focus has been placed upon the face
recognition stage, and most developmental work has been performed using photo
databases and still images that were created under a constant predefined
environment. Since a controlled environment removes the need for extensive
normalisation adjustments, reliable techniques have been developed to recognise
faces with reasonable accuracy provided the database contains perfectly aligned and
normalised face images.

Thus the challenge for face recognition systems lies not in the recognition of the
faces, but in normalising all input face images to a standardised format that is
compliant with the strict requirements of the face recognition module. Although
there are models developed to describe faces, such as texture mapping with the
Candide model [15], there are currently no developed definitions or mathematical
models defining what the important and necessary components are in describing a
face.

Using still images taken under constrained conditions as inputs, it is reasonable to omit
the face normalisation stage. However, when locating and segmenting a face in
complex scenes under unconstrained environments, such as in a video scene, it is
necessary to define and design a standardised face database. As a result, numerous
configurations and schemes have been proposed and reviewed throughout the
design of this face recognition system in order to create such a database.
Techniques such as alignment of faces relative to the face centroid, centre of mass
of skinned surfaces, and distances between the eyes and the nose, have all been
considered.

3.2 Principal Component Analysis


Until Kirby and Sirovich [12] applied the Karhunen-Loeve Transform to faces, face
recognition systems utilised either feature-based techniques, template matching or
neural networks to perform the recognition. The groundbreaking work of Kirby and
Sirovich not only resulted in a technique that efficiently represents pictures of faces
using Principal Component Analysis (PCA), but also laid the foundation for the
development of the “eigenface” technique of Turk and Pentland [1], which has now
become a de facto standard and a common performance benchmark in face
recognition [16].

Starting with a collection of original face images, PCA aims to determine a set of
orthogonal vectors that optimally represent the distribution of the data. Any face
image can then theoretically be reconstructed by projection onto the new
coordinate system. In search of a technique that extracts the most relevant
information in a face image to form the basis vectors, Turk and Pentland proposed
the eigenface approach, which effectively captures the variations within an
ensemble of face images.
Mathematically, the eigenface approach uses PCA to calculate the principal
components and vectors that best account for the distribution of a set of faces within
the entire image space. Considering an image as being a point in a very high
dimensional space, these principal components are essentially the eigenvectors of
the covariance matrix of this set of face images, which Turk and Pentland termed the
eigenfaces. Each individual face can then be represented exactly by a linear
combination of eigenfaces, or approximately, by a subset of the “best” eigenfaces –
those that account for the most variance within the face database, as characterised
by their eigenvalues – as depicted in Figure 3.1.

Figure 3-1 Faces are Linear Combinations of eigenfaces.

Consider an N-by-N face image I(x, y) as a vector of dimension N², so that the
image can be thought of as a point in N²-dimensional space. A database of M
images can therefore map to a collection of points in this high dimensional “face
space” as Γ1, Γ2, Γ3, …, ΓM. With the average face of the image set defined as

$$\Psi = \frac{1}{M} \sum_{n=1}^{M} \Gamma_n \tag{3.1}$$

each face can be mean normalised and represented as a deviation from the average
face by Φi = Γi − Ψ. The covariance matrix, defined as the expected value of ΦΦᵀ,
can be calculated by the equation

$$C = \frac{1}{M} \sum_{n=1}^{M} \Phi_n \Phi_n^T \tag{3.2}$$

It is reasonable to assume that each face image is independent, that is, each image is
captured separately. If we further assume that all the input images into the face
recognition module have been perfectly normalised, then the variations of each Φi
correspond pixel for pixel, resulting in a set of aligned images. The significance of
these assumptions is that, together with the fact that each Φi is zero mean by
construction, we can conclude that the Φi are independent and that the cross
products between different Φi have an expected value of zero.

The above conclusion allows us to express the covariance matrix in an alternative
form. Letting matrix A = [Φ1, Φ2 … ΦM], based on our conclusions, Eq. (3.2) can
be rewritten as

$$C = \frac{1}{M} (A A^T) \tag{3.3}$$

Since the factor 1/M will only affect the scaling of the output during eigenvector
analysis, we can omit this scaling factor in our calculation, resulting in

$$C \cong A A^T \tag{3.4}$$

Given the covariance matrix C, we can now proceed with determining the
eigenvectors u and eigenvalues λ of C in order to obtain the optimal set of principal
components, a set of eigenfaces that characterise the variations between face
images. Consider an eigenvector ui of C satisfying the equation

$$C u_i = \lambda_i u_i \tag{3.5}$$

Pre-multiplying both sides by $u_i^T$ gives

$$u_i^T C u_i = u_i^T \lambda_i u_i = \lambda_i u_i^T u_i \tag{3.6}$$

The eigenvectors are orthogonal and normalised, hence

$$u_i^T u_j = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases} \tag{3.7}$$

Combining Eq. (3.2) and (3.7), Eq. (3.6) thus becomes [17]

$$\begin{aligned}
\lambda_i &= u_i^T C u_i \\
&= u_i^T \left( \frac{1}{M} \sum_{n=1}^{M} \Phi_n \Phi_n^T \right) u_i \\
&= \frac{1}{M} \sum_{n=1}^{M} u_i^T \Phi_n \Phi_n^T u_i \\
&= \frac{1}{M} \sum_{n=1}^{M} (u_i^T \Phi_n)^T (u_i^T \Phi_n) \\
&= \frac{1}{M} \sum_{n=1}^{M} (u_i^T \Phi_n)^2 \\
&= \frac{1}{M} \sum_{n=1}^{M} \left( u_i^T \Gamma_n - \mathrm{mean}(u_i^T \Gamma_n) \right)^2 \\
&= \mathrm{var}(u_i^T \Gamma_n)
\end{aligned} \tag{3.8}$$

Eq. (3.8) shows that the eigenvalue corresponding to the ith eigenvector represents
the variance of the face images projected onto that eigenvector. Turk and Pentland
thus suggest [1] that by selecting the eigenvectors with the largest corresponding
eigenvalues as the basis vectors, the set of dominant vectors that express the greatest
variance are being selected.

Recall, however, that an N-by-N face image treated as a vector of dimension N² is
under consideration. Therefore, if we use the approximated equation derived in Eq.
(3.4), the resultant covariance matrix C will be of dimensions N² by N². A typical
image of size 256 by 256 would consequently yield a vector of dimension 65,536,
which makes the task of determining N² eigenvectors and eigenvalues intractable
and computationally infeasible.

3.3 Optimised Eigenface Algorithm


An alternative to computing N² eigenvectors was proposed by Turk and Pentland
[1]. Although the algorithm and methodology have been widely accepted and
applied since their introduction, we believe that a complete proof of the mathematics
behind the more efficient eigenface algorithm has not been published in any
literature.

Similar to the calculations of the eigenvectors, ui, of the covariance matrix, C =
AAᵀ, in Eq. (3.5), consider the eigenvectors, vi, of AᵀA such that

$$A^T A v_i = \mu_i v_i \tag{3.9}$$

Pre-multiplying both sides by A and using Eq. (3.4), we obtain

$$A A^T A v_i = \mu_i A v_i, \qquad C A v_i = \mu_i A v_i \tag{3.10}$$

Comparing with the standard eigenvector definition in Eq. (3.5), Eq. (3.10) implies
that the products Avi are eigenvectors of C, with µi being the corresponding
eigenvalues.

Thus rather than calculating the N² eigenvectors of AAᵀ, we can instead compute
the eigenvectors of AᵀA, and multiply the results by A in order to obtain the
eigenvectors of the covariance matrix, C = AAᵀ. Recalling that A = [Φ1, Φ2 …
ΦM], the matrix multiplication AᵀA results in an M-by-M matrix. Since M is the
number of faces in the database, the eigenvector analysis is reduced from the order
of the number of pixels in the images (N²) to the order of the number of images in
the training set (M). In practice, the training set is relatively small (M << N²) [1],
making the computations mathematically manageable.

The simplified method calculates only M eigenvectors, while previously it was
shown that there are mathematically N² possible eigenvectors. As demonstrated in
Eq. (3.8), only the eigenvectors with the largest corresponding eigenvalues from the
N² set are selected as the principal components. Thus, the eigenvectors calculated
by the alternative algorithm will only be valid if the resulting eigenvectors
correspond to the dominant eigenvectors selected from the N² set.

Consider the singular value decomposition (SVD) of matrix A, which results in a
diagonal matrix S of the same dimensions as A with nonnegative diagonal elements
in decreasing order, and unitary matrices U and V such that A = U S Vᵀ and
Aᵀ = V S Uᵀ. The covariance matrix, C = AAᵀ, would therefore be equal to

$$A A^T = U S V^T V S U^T = U S^2 U^T \tag{3.11}$$

We can split this into a combination of rank one matrices such that

$$A A^T = \sum_{n=1}^{N^2} s_n^2 u_n u_n^T = s_1^2 u_1 u_1^T + s_2^2 u_2 u_2^T + s_3^2 u_3 u_3^T + \cdots \tag{3.12}$$

If the total N² eigenvectors of the covariance matrix were found, it would follow
from Eq. (3.5) that a spectral decomposition (decomposing a matrix into orthogonal
sub-bases) of the covariance matrix would give

$$C = A A^T = \sum_{n=1}^{N^2} \lambda_n e_n e_n^T \tag{3.13}$$

where λn represents the eigenvalues and en the basis vectors. As expected, Eq. (3.12)
and (3.13) show that the s² values are the eigenvalues, while the un vectors
correspond to the eigenvectors of C. By convention, it is acceptable to assume that
an SVD analysis decomposes the basis vectors in descending order such that

$$\lambda_1 > \lambda_2 > \lambda_3 > \cdots \tag{3.13}$$

With each eigenvector corresponding to a specified eigenvalue, it follows that the
eigenvectors are also ordered according to their dominance, with u1 being the most
dominant eigenvector since λ1 is the largest eigenvalue. This conclusion is
significant when considering the singular value decomposition of AᵀA; from Eq.
(3.11),

$$A^T A = V S U^T U S V^T = V S^2 V^T = \sum_{n=1}^{M} s_n^2 v_n v_n^T \tag{3.14}$$

$$= s_1^2 v_1 v_1^T + s_2^2 v_2 v_2^T + s_3^2 v_3 v_3^T + \cdots \tag{3.15}$$

Notice from Eq. (3.14) that the SVD expansion of AᵀA results in only M terms.
This is expected since the multiplication AᵀA gives an M-by-M matrix.

Using the same assumption derived in Eq. (3.13) and comparing the outcome of Eq.
(3.15) to Eq. (3.12), we observe that the ordered s² values are the same in both
equations. This gives strong evidence that each derived eigenvector vn corresponds
to a respective eigenvector un. Since we have concluded that the eigenvectors un are
ordered according to their dominance, we have demonstrated that the M derived
eigenvectors vn correspond to the first M (most dominant) eigenvectors un, such that
the set Av1 to AvM forms our desired dominant basis vectors.

This conclusion follows not only mathematically but also logically if we consider the
number of meaningful data points. If the number of data points in face space is less
than the dimension of the space itself, which in our case is true since M << N², it
follows logically that there will only be M – 1, rather than N², meaningful
eigenvectors. The remaining eigenvectors will therefore have associated
eigenvalues of zero [1].

This coincides with the outcome of the SVD analysis. If the first M eigenvectors of
Eq. (3.12) are essentially the vectors derived in Eq. (3.15), the remaining
eigenvalues corresponding to the non-dominant eigenvectors of Eq. (3.12) must
therefore be zero in order for the equations to be mathematically correct.
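The relationship can also be checked numerically. The following Matlab fragment is an illustrative sketch (with tiny, arbitrary dimensions, and not part of the thesis code in Appendix A) showing that the squared singular values of A, the nonzero eigenvalues of AAᵀ and the eigenvalues of AᵀA coincide:

```matlab
% Illustrative numerical check: the squared singular values of A coincide with
% the nonzero eigenvalues of A*A' and with the eigenvalues of A'*A.
% Dimensions are deliberately tiny (N^2 = 6 pixels, M = 3 faces).
A  = randn(6, 3);                      % stand-in for the mean-subtracted face matrix
s  = svd(A);                           % singular values s_n of A
e1 = sort(eig(A * A'), 'descend');     % six eigenvalues; the last three are ~0
e2 = sort(eig(A' * A), 'descend');     % three eigenvalues
disp([s.^2, e1(1:3), e2])              % the three columns agree to rounding error
```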

3.4 Normalised Cross Correlation


The detection and location of faces in a complex scene can be achieved by using the
many well-established pattern-matching techniques available, such as area based,
transform based or feature based matching methods. Each of these techniques is
designed to compare how well two image regions match, and is commonly used in
image matching, where relationships between pixels are used to calculate depth
information from a pair of stereo images.

Area-based methods are based on the analysis of the grey level pattern around the
point of interest and on the search for the most similar pattern in the successive
image [18]. Having defined a window W(x) around the point x, a similar window is
considered from the search image, and a similarity measure is used to identify how
closely matched the two windows are. There are two main types of area-based
matching metrics – distance measures and correlation-based methods.

Using a distance measure comparison, the minimised distance would symbolise the
closest match, since the distance function aims to capture the difference between the
intensity values of the two windows. Examples of the most common distance
measures include SAD (Sum of Absolute Differences), SSD (Sum of Squared
Differences) and ZSAD (Zero Mean Sum of Absolute Differences).

The closest match using the correlation-based technique is when the correlation
measure is maximised. Considering WL(x, y) as the window to be matched in
Image L, of dimension N-by-N and centred on the point (x, y), and WR(x+u, y+v) as
the window in Image R that is displaced from WL(x, y) by (u, v), the cross
correlation of the two windows is defined as

$$CC = \sum_{i,j=-N/2}^{N/2} W_L(x+i,\, y+j)\, W_R(x+i+u,\, y+j+v) \tag{3.16}$$

To improve on the plain cross correlation method, which can be over-sensitive to
local characteristics, a normalised cross correlation (NCC) method is used, which
divides the correlation by the standard deviations of the signals in the window.
NCC is a commonly used matching technique, for it provides a measure of how
closely related a window of pixels is to another window based on the values of
each pixel inside the window, such that the closest matched window will score
the highest correlation value. From this, we can determine the disparity of the
image. NCC has values ranging from –1 to 1, and a perfect match has an NCC
value of 1.

Defining the standard deviations of the windows as σL and σR, the equation for NCC
is:

$$NCC = \frac{\displaystyle\sum_{i,j=-N/2}^{N/2} W_L(x+i,\, y+j)\, W_R(x+i+u,\, y+j+v)}{\sigma_L \, \sigma_R} \tag{3.17}$$
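As an aside, a minimal Matlab sketch of a correlation-based template search is shown below. It relies on the Image Processing Toolbox function normxcorr2, which computes a zero-mean variant of Eq. (3.17) that also subtracts the window means; the file names and variable names are illustrative placeholders rather than part of the thesis implementation.

```matlab
% Locate the best match of a template (e.g. an average-face image) in a scene
% using normalised cross correlation.  Peak values close to 1 indicate a match.
scene    = im2double(imread('scene.png'));      % greyscale search image (placeholder file)
template = im2double(imread('avgface.png'));    % greyscale template (placeholder file)

ncc = normxcorr2(template, scene);              % correlation map, values in [-1, 1]
[peakVal, idx] = max(ncc(:));                   % best correlation score
[yPeak, xPeak] = ind2sub(size(ncc), idx);

% The map is larger than the scene by size(template) - 1, so offset the peak
% to recover the top-left corner of the best-matching window in the scene.
yTop = yPeak - size(template, 1) + 1;
xTop = xPeak - size(template, 2) + 1;
```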

3.5 Colour Segmentation


While coloured input images are generally inputted as Red, Green and Blue (RGB)
values, it is also possible to represent colour using other colour coordinate systems
such as HSV or YUV. The HSV system is a polar coordinate system with H
denoting hue values, S the saturation and V the intensity values. YUV is similar to
HSV with Y denoting the intensity, while the UV components, also known as the
chrominance components, specify the hue and saturation in Cartesian coordinates.

The equation that converts RGB values to YUV coordinates is [13]:

$$\begin{bmatrix} Y \\ U \\ V \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.169 & -0.331 & 0.500 \\ 0.500 & -0.419 & -0.081 \end{bmatrix} \cdot \begin{bmatrix} R \\ G \\ B \end{bmatrix} \tag{3.18}$$

3.6 Morphology
Morphology is known to many as a branch of biology that deals with form and
structure of animals and plants. Morphology in mathematics is similarly a study of
regions and shapes in images. Morphological techniques are well developed for
binary images, but many methods can be successfully extended to grayscale. For a
binary image, white pixels are normally taken to represent foreground regions,
while black pixels denote background.

Virtually all mathematical morphology operators can be defined in terms of


combinations of erosion and dilation along with set operators such as intersection
and union. The operators are particularly useful for the analysis of binary images
and common usages include edge detection, noise removal, image enhancement and
image segmentation. The basic effect of dilation in a binary image is to gradually
enlarge the boundaries of regions of foreground pixels (typically white pixels).
Thus, areas of foreground pixels grow in size while holes within those regions
become smaller, as illustrated in Figure 3.2. Erosion, on the other hand, removes
the boundaries of regions of foreground pixels. Thus, areas of foreground pixels
shrink in size, and holes within those areas become larger.

The technique of ultimate erosion is a particular morphological technique used to


locate faces in our current system. Rather than attempting to template match a face
and determine the most “face-like” region, ultimate erosion makes full use of the
binary mapping derived from the colour segmentation stage to locate the centre of
the face.

One can visualise the erosion process as islands being eroded away as the sea level
rises, where islands are represented by the ones in the binary image. As erosion is
applied, the sea level will rise a little bit so that all the bordering pixels surrounding
each island will disappear. Logically, this will mean that the erosion will begin by
first eliminating the smallest islands, while the larger islands are being shrunk. As
subsequent erosion operations are applied, the largest island will be the last
surviving land-piece, and the innermost point, or centre, of the island will be the
last point to be eroded.
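A minimal Matlab sketch of this idea is shown below, assuming the Image Processing Toolbox is available; skinMask is an illustrative name for the binary map produced by the colour segmentation stage, and the structuring element size is an assumption.

```matlab
% Repeatedly erode the binary skin map until the next erosion would remove
% everything; the surviving pixels approximate the centre of the largest
% skin region (the last "island" to sink).
se      = strel('disk', 1);            % small structuring element: one "sea level" rise
current = skinMask;                    % binary image: 1 = skin, 0 = background
while true
    eroded = imerode(current, se);
    if ~any(eroded(:))                 % the next step would sink the last island
        break
    end
    current = eroded;
end
[rows, cols] = find(current);          % remaining pixels mark the island's core
faceCentre = [mean(rows), mean(cols)]; % estimated centre of the face region
```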


Figure 3-2 Example of morphological dilation and erosion [19]

4 System Design

Due to the complexity of the face recognition problem, a modular approach was
taken whereby the system was separated into smaller individual stages. Focus was
then placed upon making each stage a reliable section before integrating the
modules into a complete system. The face recognition system includes three major
tasks – face detection, face normalisation and face recognition – and each of these
tasks is further broken down into separate stages. This chapter details the design
of each of these stages, providing a detailed description of how the system was
constructed.

4.1 Design Architecture


While many previous face recognition systems have quoted superior performance
obtained in extremely optimised and controlled environments, our system has been
developed to match such performance with a number of conditions left
unconstrained. With the implementation of various normalisation stages, the face
recognition system has been designed to perform recognition on images where the
faces are subject to different scaling, rotation and illumination. Images
containing more than one face can also be processed, but only one face will be
identified per input image.

Figure 4.1 presents a block diagram of the integrated face recognition system. The
face detection module has been separated into three main stages - skin detection,
NCC search and face space search. NCC and face space search are not used
simultaneously; the choice between them depends on the status of the database. The face
normalisation module includes a scaling, rotation and two lighting normalisation
stages, and the recognition module involves a training stage and a recognition stage.
Notice that the face detection and normalisation modules are interconnected,
meaning that the success of the face detection module is dependent upon the

accuracies of the normalisation modules and vice versa. The joint effort of the
detection and normalisation modules is subsequently to produce a standardised
database for the recognition stages.

Figure 4-1 System Architecture of the face detection (yellow), face normalisation (aqua), and face
recognition module (magenta).

Due to the difficulty in producing a robust system that can operate under any
environment and face orientations, two modes of operations have been devised for
this system – static mode and dynamic mode. Under static mode, recognition is
performed on still images captured under a constrained environment. It is assumed
that faces are properly scaled and without rotation, such that the unreliable scaling
and rotational normalisation modules can be omitted during static mode operations.
Dynamic mode on the other hand is designed to operate on video input so that real-
time face recognition can be performed. From a design perspective, the main
difference between the two modes is that in static mode, scaling and rotation are not a
concern, and thus focus can be placed upon making the other stages reliable before
attempting to relax the constraints on the input faces.

Notice that both modes of operation have been designed to operate without a
restriction on the illumination and the background of the image. There are also no
restrictions placed on the location of the face in the scene, and different expressions
of the face are, to a certain extent, allowed. Under such relaxed conditions, the face
recognition system has been demonstrated to perform efficiently at a high accuracy
rate. The results and evaluations of the system are presented in Chapter 5.

Since we are trying to design a working robust model for a face recognition system,
Matlab has been selected as the design platform, for it is a programming language
optimal for prototyping and image processing tasks. Translation into C++ remains
for future work, when higher speed, portability and aesthetics are of concern.

4.2 Face Recognition
When designing a complex system, it is important to begin with strong foundations
and reliable modules before optimising the design to account for variations.
Provided a perfectly aligned standardised database is available, the face recognition
module is the most reliable stage in the system. As discussed in Chapter 3, the
biggest challenge in face recognition still lies in the normalisation and
preprocessing of the face images so that they are suitable as input into the
recognition module. Hence, the face recognition module was designed and
implemented first.

Extensive research has already been conducted and completed in face


recognition by many research groups, producing many papers and available
literature that have addressed the issue of recognising faces based on a pre-
processed face database. Similar to other research groups, the task of implementing
a face recognition module is therefore accomplished first by considering a set of
predefined face inputs rather than using variable images. It is important to begin
with as little variable parameters as possible, and a pre-processed face database
omits any possible uncertainty from the detection and normalisation modules.

Given a perfect set of faces such that the scale, rotation, background and
illuminance are controlled, the recognition module can be designed to work with
optimal ideal inputs, since it is crucial that the performance of this foundation
module be as optimised as possible. Its ability to recognise an ideal database will
determine the best possible performance attainable by the overall complete system.
Any subsequent development and implementation of the face detection and
normalisation modules will therefore be aimed at providing this ideal database.

4.2.1 The Eigenface Approach


Based on the work of Cendrillon [17] and Turk and Pentland [1], a face recognition
module based on the eigenface approach, the mathematical derivation of which is
presented in Chapter 3, was implemented. There are two separate phases included
in the recognition module depending on the task to be performed – the training stage
and the testing stage. The training stage takes in a database of normalised faces and
returns the weights of each face after projecting each onto the selected dominant
basis vectors. After a set of weights have been determined and the database
initialised, the testing stage can accept a new face image, subject it to eigenface

decomposition, and compare the resultant weights with the closest matched weights
in the database to identify the identity of the input.

Without discussing the mathematics of the algorithm, the eigenface training stage
can be broken down into the following steps:

1. Collect and construct a matrix T, of M fixed sized face images of known


individuals. It is assumed that the input images into the recognition module
have been pre-processed by the face detection and normalisation modules.
Each face should also be organised as a one-dimensional vector, and the
corresponding pixels of each face should be aligned in the matrix.

2. Compute the average face of the database by summing the intensity values of
the M corresponding pixels and dividing by the total number of faces, M.

3. Subtract the average face from T to obtain matrix A, a zero mean matrix
where each element represents the deviation of the pixel’s intensity value from
the average face.

4. Matrix C′ should be constructed by considering the optimised eigenface
algorithm presented in Chapter 3, such that C′ is an M-by-M matrix
computed as the product of Aᵀ and A.

5. The eigenvectors of C′ are calculated and sorted in descending order
according to their associated eigenvalues.

6. Obtain the eigenfaces – the eigenvectors of the covariance matrix C – by


multiplying the eigenvectors of C′ calculated in step 5 by matrix A.

Figure 4-2 A set of perfect eigenfaces - ideal database [20].

7. Obtain the dominant basis vectors that describe and characterise the face
database by normalising the eigenfaces in step 6, by dividing each by its
vector norm.

8. Determine the weights of the M input faces of the database by projecting


each face image into face space, transforming each face into its eigenface
components. This can be accomplished by multiplying each individual zero
mean face from A by the eigenfaces obtained in step 7.

9. Each face is now represented by a set of weights, which can uniquely


describe the M input faces, and be used to reconstruct any face in the
database given the eigenfaces components. For the testing stage, it is
necessary to save the computed average face, eigenfaces, known weights
and the name of the faces.

The nine steps described above will transform a database of face images into a set of
projections into the constructed face space. If a large database is present for
training, for example if M is large, such that a representative set of eigenfaces has
been obtained, it is possible to use fewer than M eigenfaces to describe the database
[1], since eigenfaces that have small corresponding eigenvalues tend to overfit and
begin to describe peculiarities of individual faces. Under this circumstance, when a
new face is presented for addition into the database, rather than recalculating all the
eigenfaces, only the weights need to be determined by projecting the new face into
the existing face space.
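For illustration, the nine training steps can be condensed into a short Matlab sketch such as the one below. The variable names are illustrative and the fragment is a simplified stand-in for, not a copy of, the code listed in Appendix A.

```matlab
% T is an N^2-by-M matrix whose columns are pre-processed face images
% reshaped into vectors (steps 1-3 assume this layout).
avgFace = mean(T, 2);                              % step 2: average face
A = T - repmat(avgFace, 1, size(T, 2));            % step 3: zero-mean face matrix
Cprime = A' * A;                                   % step 4: M-by-M matrix A'*A
[V, D] = eig(Cprime);                              % step 5: eigenvectors of C'
[evals, order] = sort(diag(D), 'descend');
V = V(:, order);                                   %         sorted by eigenvalue
eigFaces = A * V;                                  % step 6: eigenfaces of C = A*A'
eigFaces = eigFaces ./ repmat(sqrt(sum(eigFaces.^2, 1)), ...
                              size(eigFaces, 1), 1);    % step 7: unit-norm eigenfaces
weights = eigFaces' * A;                           % step 8: one weight column per face
save('facedb.mat', 'avgFace', 'eigFaces', 'weights');   % step 9: store the database
```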

Presented in Figure 4.2 previously is an ideal set of perfect eigenfaces. Each


eigenface is dedicated to describing a set of facial features that captures the greatest
variances between different faces. Along with the average face, due to the perfect
alignment of the database, linear combinations of these eigenfaces can reconstruct a
large ensemble of faces. Thus after the above initialisation stage is completed, any
subsequent faces can also be projected into face space described by the set of
eigenfaces, and be tested to see if the weights match any of the M faces of the
database, performing the recognition stage of the module.

The recognition stage can be summarised by the following steps:

1. A test image T that is pre-processed by the same face detection and


normalisation modules as the face database is inputted into the recognition
stage of the face recognition module as a one-dimensional vector.

2. The saved average face, eigenfaces, known weights and the name associated
with each face from the initialisation are loaded into memory.

3. The input image is translated to zero mean by subtracting the average face
from it, resulting in vector N.

4. Since the face space is represented by the dominant eigenface vectors, the
zero mean input image can also be projected onto the space by multiplying
the eigenfaces with the normalised input image, N, in order to determine its
weights.

5. The calculated weights can be compared with the set of known weights to
find the minimum distance between the calculated weights and each face’s
set of weights. The minimum distance symbolises the closest match
between the input test image and the faces in the database.

6. If the minimum distance found is less than a certain sensitivity value, the
input test image can be identified as the identity of the matched face.
Whereas, if the minimum distance found is larger than the sensitivity value,
the input test image can be claimed to be an unknown identity and can
prompt the system to add this new face into the database by repeating the
training stage.
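Under the same assumptions (illustrative names, a simplified stand-in for the Appendix A code), the recognition steps reduce to a sketch such as the following; the threshold is the sensitivity value of step 6 and is an assumed parameter.

```matlab
% testFace is a pre-processed face as an N^2-by-1 vector; threshold is the
% sensitivity value from step 6 (illustrative parameter).
load('facedb.mat', 'avgFace', 'eigFaces', 'weights');          % step 2
phi = testFace - avgFace;                                      % step 3: zero mean
w   = eigFaces' * phi;                                         % step 4: project into face space
dists = sqrt(sum((weights - repmat(w, 1, size(weights, 2))).^2, 1));   % step 5
[minDist, bestMatch] = min(dists);
if minDist < threshold                                         % step 6
    fprintf('Recognised as face %d in the database\n', bestMatch);
else
    fprintf('Unknown face - could be added to the database\n');
end
```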

If the input into the face recognition module is normalised and adjusted for scale,
rotation and lighting, the training and the testing stage of the module described
above will function flawlessly. However, although the eigenface approach works
theoretically, the requirement for perfectly aligned faces makes a perfect face
recognition system difficult to accomplish. As noted throughout the literature on this topic, the
biggest challenge remains in designing normalisation modules that can provide such
ideal databases of faces for recognition.

4.3 Face Detection


The first step in trying to build an ideal database for the recognition modules is to
locate the exact position of the face in the image. Whether an input image is to be
added into the system or be tested for identity, before any processing can be applied
at the recognition stage, the face needs to be detected, extracted and normalised. As
illustrated in Figure 4.1, the process of face detection and face normalisation is
intertwined and supported by one another. The two modules do not work
completely as a stand-alone system and require the intermediate outputs of each

module to continue with any further processing. In the case of face detection, two
separate stages have been designed – the initial coarse detection phase, and the
refined face search stage.

The coarse detection phase involves a quick scan over the complete image analysing
the colour content of the input. The purpose of this stage is to reduce the search
space by identifying the skinned regions of the image, so that the second face
detection stage, which is performed after normalisation of the illuminance, can
apply refined search techniques in order to locate the exact position of the face in
the image.

The second stage has been designed to use two different algorithms depending on
the status of the database. When the database is empty and no faces have been
processed by eigenfaces decomposition, normalised cross correlation is used to
determine the centre location of the face. When a set of eigenfaces has been
determined by the training stage of the recognition module, subsequent refined face
searches can be accomplished by using a face space projection search.

4.3.1 Skin Detection


All images or complex scenes from a video sequence enter the face recognition
system through the face detection module, confronting the first stage – skin
detection. The purpose of this first stage is to perform a fast coarse search of the
scene in order to locate probable skinned regions in the image, so that non-skinned
backgrounds can be eliminated with the knowledge that the face will not be located
in those regions of the image. A smaller image can then be extracted from the
scene, such that subsequent searches for the exact location of the face can be
performed on a reduced search space rather than on the entire image. This not only
helps with increasing the speed of processing, but also the accuracy of the location
of the face by eliminating the probability of error and reducing the possibility of
erroneous false matches.

It has been demonstrated that human skin tones form a special category of colours,
distinctive from the colours of most other natural objects [21]. Although skin
colours vary between different people and different races, it was found that, in
the YUV colour representation, human skin colours remain distributed over a very
small region of the chrominance plane [13]. Experiments have been conducted to verify
this conclusion and the results are presented in Chapter 5.

Colour Segmentation

To perform the skin region analysis, each pixel is first classified as being either
skin or non-skin. To increase the speed of this module, and because it only acts as
a fast coarse search of the scene, a downsampled version of the image is used. For
an input image of size 240 by 320, a downsampling factor of four is adequate, such
that the skin detection module only needs to operate on a 60 by 80 image.
Downsampling by a factor higher than four has been observed to degrade the accuracy
of the search dramatically.

A lookup table is employed to classify the “skinness” of each pixel, where the
chrominance of each pixel is tested to see whether it lies in the skin-colour range
and is assigned a binary value of one if it does and zero otherwise. After the
colour image has been mapped into a binary image of ones and zeros representing
skin and non-skin regions, a bounding box is needed to determine the range and
location of the ones. Recall that the purpose of the colour segmentation is to
reduce the search space of the subsequent modules; it is therefore important to
determine as tight a box as possible without cutting off the face.

It is common for the colour segmentation to return values that are close to skin
colour but are not skin, or other skin-like coloured regions that are not part of
the face or the body. These noisy erroneous values are generally isolated pixels or
groups of pixels that are dramatically smaller than the face region, which is
represented by a large connected region in the binary image. Inclusion of these
noisy pixels would result in a box that is much larger than intended and would
defeat the purpose of the segmentation.

Morphological refinements are applied to the binary output in order to reduce some
of the effects of these noisy pixels. Since these spurious errors are generally much
smaller than the face region itself, morphological techniques such as erosion and
dilations are good tools to use to eliminate these pixels. For an improved bounded
box, morphological opening (erosion followed by dilation) is therefore used in the
system after the colour segmentation to clean up the binary mapping prior to
extracting the skinned region.
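
As a minimal sketch of this coarse stage, assuming the Image Processing Toolbox and
using a simple chrominance range test in place of the full lookup table, the
processing could be written as follows; frame denotes an input video frame, and
uMin, uMax, vMin and vMax are hypothetical placeholders for the skin ranges
determined experimentally in Chapter 5.

    % Coarse skin detection sketch (range thresholds are hypothetical placeholders
    % standing in for the lookup-table ranges of Figure 5.2).
    rgb = im2double(imresize(frame, 0.25));       % downsample 240x320 -> 60x80
    R = rgb(:,:,1);  G = rgb(:,:,2);  B = rgb(:,:,3);

    % YUV chrominance components (the luminance Y is discarded for the skin test).
    Y = 0.299*R + 0.587*G + 0.114*B;
    U = 0.492*(B - Y);
    V = 0.877*(R - Y);

    skin = (U > uMin) & (U < uMax) & (V > vMin) & (V < vMax);  % binary skin map
    skin = bwmorph(skin, 'open');                 % erosion followed by dilation

    % Tight bounding box around the remaining skin pixels.
    cols = find(any(skin, 1));
    rows = find(any(skin, 2));
    box  = [min(cols), min(rows), max(cols) - min(cols), max(rows) - min(rows)];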

Ultimate Erosion

Due to the distinctness of human skin tone, considerations were also made during
the design of the system to employ the colour segmentation stage as an accurate
face search module rather than a coarse estimator for the location of the face. This
involves applying the binary mapping to its full advantage by using further
morphological techniques to analyse the skin detection results. Using the method of
ultimate erosion, successful results were obtained in locating the centre of the face
of many images using skin detection alone, removing the need to employ refined
search techniques. However, since this method suffers from many restrictions, it is
left out of the design of the system and treated as a possible optimisation module.

Ultimate erosion is essentially morphological erosion repeated continuously. When
a binary image is continually subjected to erosion operations (which clear isolated
pixels as well as pixels lying on edges), the pixels that remain before the last
erosion operation will most likely be at the centre of the largest block in the
image. The philosophy is thus that if ultimate erosion is applied to a binary image
of skinned regions, and assuming that the person is wearing clothes, the face
should represent the largest skin block and hence its centre is locatable by
morphology.

The input into the ultimate erosion module is the binary image output of the colour
segmentation stage. Since the skin colour detection stage contains noise and
possible erroneous values, the binary image needs to be pre-processed before
ultimate erosion can be applied. The task of this pre-processing stage is to ensure
that the face is represented by the largest continuous block in the binary image.
Pre-processing is accomplished by first performing an erosion step to clean up the
small noise. Morphological dilation is then applied until the binary image contains
only large continuous blocks, so that all separated regions inside the face are
connected. The Euler number of a binary image, a scalar representing the number of
objects in the image minus the total number of holes in those objects, is used to
indicate the connectivity of the separate regions.

Once the image has been successfully pre-processed, ultimate erosion is performed
and the remaining pixels at the last erosion step will indicate the centre of the face
in the image. In the presence of multiple faces, ultimate erosion will locate the
centre of the largest face in the image.
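
Assuming the pre-processed binary map described above, the ultimate erosion step
itself can be sketched as a simple loop that erodes until the map is about to
vanish; preprocessedSkinMap is a hypothetical placeholder for the pre-processed
output.

    % Ultimate erosion sketch: repeatedly erode until nothing remains, and keep
    % the pixels that survived the second-to-last erosion.
    bw = preprocessedSkinMap;        % assumed output of the pre-processing stage
    survivors = bw;
    while any(bw(:))
        survivors = bw;              % pixels remaining before the next erosion
        bw = bwmorph(bw, 'erode');   % 3x3 erosion (Image Processing Toolbox)
    end
    [r, c] = find(survivors);
    faceCentre = [mean(r), mean(c)]; % approximate centre of the largest block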

As will be evident from the results of Chapter 5, it was however found that the
distinctive skin colour region tends to fluctuate under varying illumination
conditions and, on occasion, becomes nearly indistinguishable from certain
backgrounds. Since ultimate erosion depends extremely heavily upon perfect colour
segmentation, the accuracy of the face location would subsequently rely upon the
results of the unreliable skin detection module. Being able to reliably and
consistently segment and extract faces in a variety of complex scenes and
backgrounds is essential to the success of the system. As a result, skin detection
is used only as an initial module that determines a region of search for the
refined search stages.

4.3.2 Refined Face Search


As depicted in Figure 4.1, the input into the second refined face detection stage is an
illuminance normalised skinned region of the image. This stage is the final stage of
the face detection module, involving a refined search for the location of the face.
The precision of the output from the refined search modules will determine how
perfectly aligned the images are. Subsequently, as the recognition module requires
ideal inputs that satisfy a list of restrictive criteria, the accuracy of these
stages is a measure of how successful the recognition and the overall system will
be. A high accuracy rate in this stage is therefore of utmost concern.

Depending on the status of the face database, two methods have been designed for
this important stage of the detection module – Normalised Cross Correlation (NCC)
and Face Space Projection. NCC involves finding the best match between a
template and a sequence of windows, while face space search involves projecting
the sequence of windows into face space and measuring how “face-like” each
window is. Therefore, one major difference between the two searches is the input
data required, and the choice of technique is dependent on what information is
available. The NCC search requires a template; therefore, a typical face or average
face needs to be available in order for NCC to work. Face Space Projection requires
a set of eigenfaces so that each window can be projected into face space. Hence,
unless the set of dominant basis vectors has been calculated and is available for
use, the projection technique cannot be used.

Normalised Cross Correlation

Since the set of eigenfaces is not available until a collection of face images have
been processed by the training stage of the recognition module, during the initial set
up of the face database, template matching with normalised cross correlation (NCC)
is used to perform the refined face search.

The basic principle behind NCC is to compare two windows of the same size and
measure their correlation, that is, how closely related each pixel in one window is
to its corresponding pixel in the other window. A maximised correlation value
therefore means that, of all the windows under testing, the two windows under
examination have their corresponding pixels matching most closely. It is important
to find a template that reflects the differing intensities of a face accurately,
since it is the relative differences of intensity values within the picture that
are significant during the matching.

Hence, the best candidate for the template is the average face. It is a standard
template representing the most basic features of faces and contains as little bias
towards any individual face image as possible. It is therefore the most suitable
choice for a template that captures the relative differences in intensities between
the features of a face.

However, until the training stage has been run, neither the eigenfaces nor an
average face will have been calculated, and thus during this initial detection
phase an average face representative of the current ensemble of face images will
not be available. An average face obtained from the facial database of Boston
University [22] was therefore used as the template for this module and is presented
in Figure 4.3. This average face, however, did not provide accurate face detections
and alternative average faces had to be considered. After analysing different
variations of the average face, the template that was selected for use in the final
system is displayed in Figure 4.4. Notice how the mouth and the chin region of the
original average face in Figure 4.3 have been removed in order to obtain the
improved alternative template. Refer to Chapter 5 for a presentation of the effects
and the differences of the two proposed templates.

Figure 4-3 Standard Average Face [22].

Figure 4-4 Revised alternative template for NCC.

With the template selected, the process of face detection with NCC remains merely
a matter of applying the equation of normalised cross correlation as discussed in
Chapter 3. The input into the refined search module, the normalised skinned region
of the image, will become the test window; the modified average face on the other
hand will form the template window. Sections of the test window, of the same
dimensions as the template, will then be continuously extracted and correlated with
the template. The extracted region with the highest correlation value will be the
region of the test window that corresponds the closest to the average face template.
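As a sketch of this step, assuming the Image Processing Toolbox function
normxcorr2, the search could be written as follows, with template denoting the
modified average face and testWindow the illuminance normalised skinned region.

    % NCC refined search sketch using normxcorr2 (Image Processing Toolbox).
    % template:   modified average face (grey-level matrix)
    % testWindow: illuminance-normalised skinned region (larger than template)
    c = normxcorr2(template, testWindow);

    % Peak of the correlation surface corresponds to the best-matching region.
    [maxCorr, peakIdx] = max(c(:));
    [peakRow, peakCol] = ind2sub(size(c), peakIdx);

    % normxcorr2 pads its output, so subtract the template size to recover the
    % top-left corner of the best-matching window inside testWindow.
    topLeftRow = peakRow - size(template, 1) + 1;
    topLeftCol = peakCol - size(template, 2) + 1;
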

Notice carefully that the technique of NCC matches intensity values that
correspond closely between two selected windows. It measures the correlation of a
particular pixel in one image with the corresponding pixel of another image. No
particular focus is placed upon the relationship between neighbouring pixels, or
the correlation between regions of pixels. Hence, in selected areas where the
intensity values happen to vary similarly to the template of Figure 4.4, there is a
possibility of false matches.

Since this is a module used to initialise a database, rather than trying to perfect
the NCC correlation technique and implement automatic correction modules, the
system allows the user to manually select those results that successfully located
the faces. This also allows the user to visually predict the future accuracy of the
system, since the more precisely the faces are aligned, the more reliable the
succeeding face recognition module will be.

Face Space Projection

A more accurate alternative method of performing the refined face search is face
space projection. Once NCC has been used to initialise a database, the training
stage of the recognition module can determine a set of eigenface templates. Given
these templates, any subsequent image that passes through the second phase of the
face detection module can hence utilise eigen-decomposition rather than normalised
cross correlation.

The face space projection technique involves projecting each selected window into
face space. Similar to the NCC method, the input into this refined search stage – the
normalised skinned region of the original image – is treated as the test window.
Throughout the search, sections of this test window the size of an eigenface will
then be continually extracted for projection into face space.

The first step is therefore to load the average face and eigenfaces that were saved
from the training stage of the recognition module. Notice that the average face used
here is the actual average calculated from the set of input faces that were added into
the system, unlike the average face used in NCC, which is foreign to the current
database.

Once the average face and eigenfaces are available, the projection is accomplished
by first subtracting the average face from the selected section of the test window.
This is again a necessary pre-processing step to transform the data into a zero
mean image, since all eigenfaces are calculated from the covariance matrix, which
is itself derived from zero mean images. The zero mean test region is then
projected into face space by multiplying it with the loaded set of eigenfaces in
order to produce a set of weights.

From the weights, it is then possible to determine how “face-like” the region is by
comparing the energy of the original window to the energy of the transformed
window. If the region is “face-like”, the projection onto face space will be
maximised and large weight values will be recorded, since each of the basis vectors
captures the dominant variances in a face. A “non-face-like” region will result in
smaller projection values, since the eigenvectors do not describe images that lack
a face structure as well.

The maximum value that the weights can have is theoretically the total energy of
the original window itself prior to transformation, that is, when every bit of
energy is conserved and transformed into face space. Mathematically, energy is
defined as the sum of the squares of the intensity values, also referred to as the
norm of the intensity values [23]. Therefore, if the region were exactly centred on
the face, then theoretically the difference between the sum of the squares of the
transformed window – the weights – and the norm of the original window should be
zero. Each selected region of the test window thus has an associated distance from
face space recorded, such that the minimum distance out of all the regions tested
symbolises the closest match, the most “face-like” area, in the test window.
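
Following the energy definition of Equation 4.1 and the description above, the
per-window distance-from-face-space measure can be sketched as follows; window,
meanFace and eigenFaces are assumed to be a candidate region (reshaped to a column
vector) and the saved training quantities respectively.

    % Distance-from-face-space sketch for a single candidate window, following
    % the thesis's description (energy of the window minus energy of the weights).
    % window:     candidate region reshaped into a column vector (N x 1)
    % meanFace:   average face from training (N x 1)
    % eigenFaces: dominant eigenfaces from training (N x M)

    zeroMean = window - meanFace;                % pre-processing to zero mean
    w        = eigenFaces' * zeroMean;           % projection weights

    windowEnergy    = sum(window.^2);            % energy (norm) of the window
    projectedEnergy = sum(w.^2);                 % energy captured in face space

    % A "face-like" window retains most of its energy after projection, so the
    % distance approaches zero when the window is centred on the face.
    dist = windowEnergy - projectedEnergy;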

Face detection is accomplished in real-time, continually processing images while
operating in dynamic mode. Thus, not only is the accuracy of this refined search
module important, but its speed is also of concern. Unlike the NCC module, which is
applied only at initialisation, the face space search runs continuously, so the
faster it can be made, the more impressive the performance of the system. A
multi-resolution search is therefore employed here to achieve such speed.

Prior to inputting the normalised skinned region into the refined face space
search, the image is first downsampled so that a quick coarse search can be
applied. Then, to improve the accuracy of the search, neighbouring pixels around
this first estimate are tested in order to locate the exact centre of the face. The
coarse-to-fine search technique used here has been demonstrated to provide
significant speed improvements, with the amount of speedup limited by the accuracy
of the downsampled window. Since the input into the refined stage is already a
smaller subset of the original window, it has been found that a downsampling factor
of two is a reasonable reduction in search space while maintaining the accuracy of
the search.
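
A sketch of this coarse-to-fine strategy follows; faceSpaceDistanceMap and
faceSpaceDistanceAt are hypothetical helpers implementing the distance measure of
the previous section over a whole image and at a single position respectively, and
halfMeanFace and halfEigenFaces denote templates downsampled by the same factor of
two as the image for the coarse pass.

    % Coarse-to-fine face space search sketch (boundary checks omitted).
    coarse  = imresize(region, 0.5);             % downsample by a factor of two
    dCoarse = faceSpaceDistanceMap(coarse, halfMeanFace, halfEigenFaces);
    [dMin, idx] = min(dCoarse(:));
    [rC, cC]    = ind2sub(size(dCoarse), idx);

    % Refine at full resolution around the scaled-up coarse estimate.
    best = Inf;  centre = [2*rC, 2*cC];
    for dr = -2:2
        for dc = -2:2
            d = faceSpaceDistanceAt(region, [2*rC + dr, 2*cC + dc], ...
                                    meanFace, eigenFaces);
            if d < best
                best = d;  centre = [2*rC + dr, 2*cC + dc];
            end
        end
    end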

4.4 Face Normalisation


The process of face detection and face normalisation is intertwined and the success
of the system is dependent on how well the two modules work together. Face
detection will only be successful if the input faces are normalised so that the
orientation and lighting of the images are similar to the stored templates of the
system. In other words, the input images have to be adjusted in such a way that it
would seem as if all input images are captured under the exact same conditions and
environment. This remains the biggest challenge yet to be solved by the many
research groups in this area, and is the major reason why a complete solution to the
problem of face recognition has not been presented.

The face normalisation module can be divided into three main categories – scaling
normalisation, rotation normalisation and lighting normalisation.

4.4.1 Lighting Normalisation


Lighting normalisation involves adjusting the illuminance of the image so that all
images can be regarded as taken under the same lighting conditions. Since the
refined face search modules rely on the intensity values of each pixel, careful
attention needs to be paid in conserving the same amount of illuminance on all
images such that no particular bias is placed upon lighting differences. The
recognition should be based upon the relationship and the variances between faces
and not on the background lighting conditions or the time of day that the image was
captured.

The total amount of lighting can be normalised by observing the energy embedded
in the images. For unbiased comparisons between images, the relative ratios between
the pixels within an image should not be altered. Therefore, all intensity values
within the image should be multiplied by a common constant such that the resultant
total energy is a standardised value. This constant is determined from the ratio of
energies.

For our system, since we have selected to use the average face as the standardised
image, all input images should normalise their total energy so that it matches the
total energy of the average face. Notice that in the case of using NCC for the
refined search stage, the standard template obtained from Boston University [22]
will be used as the average face, while with face space projection, the average face
from our database will determine the total energy. The refined search technique
therefore not only influences the operations of the face detection module but also
the face normalisation modules, further emphasising the dependency between the
different modules.

The lighting normalisation module is applied in two places: once before the refined
detection module and once prior to entering the face images into the recognition
module.

The first normalisation is applied to enhance the accuracy of locating the face, by
ensuring that all selected regions contain the same total energy as the average face.
This is to avoid extreme biasing towards a particular lighting condition in the image
when the distance functions are calculated. There was a particular problem with
NCC search when light was incident from one end of the image to the other, and
biasing was placed upon the brighter side since it would naturally record a higher
correlation value regardless of the template. The addition of a lighting
normalisation stage relieved the problem.

The second lighting normalisation stage was applied so that all images input into
the recognition module have the same total energy, ensuring that no bias is placed
upon a particular face during recognition. Since all images at this stage are the
same size, normalising the energy acts as if all the head images were taken under
the same lighting conditions. The total energy of an image is the sum of the
squares of the intensity values [23].

\text{Energy} = \sum_{\text{image}} (\text{Intensity})^2    (4.1)

This makes the scaling of intensities a square-root relationship between two
images. That is, if the ratio of total energy between image A and image B is
computed to be e, the total energy of image B must be multiplied by e so that
images A and B have the same total energy. Each intensity value of B, however, is
only multiplied by the square root of e, √e, so that the sum of the squares of
(B·√e) equals the norm of A.

In our recognition system, all images are stored as intensity values. Therefore,
inside the illuminance normalisation module, the energy of the selected window will

first be calculated and then compared to the energy of the average window. The
ratio of the energy is then computed, and the intensity of each pixel in the window
will be multiplied by the square root of that ratio. The resultant image will be such
that the total energy of the image is normalised.
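
The energy adjustment described above can be sketched as follows; targetEnergy is
assumed to be the total energy of the average face (or of the eigenfaces, depending
on the refined search in use).

    % Lighting (energy) normalisation sketch, following Equation 4.1.
    % window:       extracted grey-level region (matrix of intensity values)
    % targetEnergy: total energy of the average face used as the standard

    windowEnergy = sum(window(:).^2);        % energy = sum of squared intensities
    e = targetEnergy / windowEnergy;         % ratio of energies
    normalised = window * sqrt(e);           % scale each intensity by sqrt(e)

    % Check: sum(normalised(:).^2) now equals targetEnergy (up to rounding).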

4.4.2 Scaling Normalisation


A standardised scale is essential for the correct operation of the two refined search
stages and the recognition module. In order for accurate face detection, it is a
requirement that the face in the input image matches the size of the faces that are in
the stored database; and for recognition, comparison between faces is not possible
unless the faces are matched in scale. Scaling problems are a result of faces being
photographed at different distances from the camera. Hence, whether the head
appears bigger due to the face being extremely close to the camera, or smaller
because the face is far away, all the faces need to be normalised to a standard size
that conforms to the database.

Many proposed methods for solving the scaling problem, such as eye detection,
suffer from scaling problems themselves. Thus, an alternative method that does not
depend on correct scaling in the first place is proposed for this system. This method
involves making full use of the skinned region information obtained from the colour
segmentation stage. Based on the binary mapping of the face region in the image,
the size of the extracted box is compared to the dimensions of the template. Again,
in the case of NCC face search, the template will be the obtained average face,
while face space projection and face recognition will use the eigenfaces as the
template.

Two scaling factors are then determined, one being a ratio of widths and the other
a ratio of heights. The larger of the two ratios is selected when scaling down, and
the smaller when scaling up. The same ratio is used for both dimensions so that the
face is not morphed to a different set of proportions, keeping the relative
variances between the pixels constant. Once the ratio has been determined,
normalisation can be
achieved by either adjusting the dimensions of the extracted box, or a new box can
be extracted from a scaled version of the image, such that the face in the image is
now in the same scale as the templates. Refined search techniques such as face
space projection can then be applied to accurately locate the exact centre of the face.
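
The scale correction described above can be sketched as follows, with boxH and boxW
denoting the dimensions of the extracted bounding box and faceH and faceW those of
the template (the average face or the eigenfaces); all names are hypothetical
placeholders.

    % Scaling normalisation sketch, using one common factor for both dimensions.
    rH = boxH / faceH;                  % height ratio, extracted box to template
    rW = boxW / faceW;                  % width ratio, extracted box to template

    if rH > 1 && rW > 1                 % face larger than template: scaling down
        r = max(rH, rW);                % use the larger ratio
    else                                % face smaller than template: scaling up
        r = min(rH, rW);                % use the smaller ratio
    end

    scaledBox = imresize(extractedBox, 1/r);   % same factor in both dimensions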

Figure 4-5 Illustration of the determination of the rotational angle

4.4.3 Rotation Normalisation


In a video sequence, the faces in the scenes will typically be subjected to many
different rotational orientations. In order to match the recognition ability of human
beings, which is currently under vigorous research, a face recognition system would
need to be able to identify faces without placing restrictions upon the allowable
rotational orientations of the faces. Due to the three dimensional rotary motion of
heads, forward tilting, planar rotations (sideway tilting of heads) and turning
rotations (twisting of heads) are all major problems encountered by face
normalisation modules. However, due to the complexity, only planar rotation –
sideways tilting of the head – has been investigated in this system.

A method involving the use of the skin detection binary image to account for
rotation has been proposed for this system. With the aid of Matlab’s specialised
functions [24], the aim is to compute the angle of rotation of the face block in
the binary image. This is achieved by fitting an ellipse around the face region
such that the second-order moments of the ellipse are the same as the second-order
moments of the binary mapping. The angle formed between the major axis of the
calculated ellipse and the horizontal x-axis consequently represents the face’s
angle of tilt. Rotation normalisation can be achieved by rotating the image by the
negative of that angle. Figure 4.5 illustrates an example of how the rotational
angle is computed.
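
A sketch of this step using Matlab’s Image Processing Toolbox is given below;
regionprops returns the orientation of the ellipse with the same second-order
moments as the labelled face block, and imrotate then removes the tilt. skinMap and
extractedFace are assumed inputs from the preceding stages.

    % Rotation normalisation sketch using the ellipse fitted to the skin map.
    % Assumes the face is the only (or first) labelled region after pre-processing.
    stats = regionprops(bwlabel(skinMap), 'Orientation');
    tiltAngle = stats(1).Orientation;   % degrees between major axis and x-axis

    % Rotate the extracted face by the negative of the tilt angle to upright it.
    upright = imrotate(extractedFace, -tiltAngle, 'bilinear', 'crop');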

4.4.4 Background Subtraction


Under the current design of the face detection and normalisation modules, the
extracted face input into the recognition stage is a rectangular box around the face.
Since a face is elliptical in shape, no matter how tight a bounded region is
determined, with a rectangular box, there are bound to be sections of the
background included inside the extracted image. Background subtraction thus aims
to improve on the face inputs by eliminating the background completely so that only
the face is present in the extracted image. This can be accomplished simply by
using the colour segmentation results of the face detection stage, using the binary
image as a mask over the face region.
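
A sketch of this masking step is given below, although, as discussed next, the
stage was ultimately not integrated into the system; skinMap and extractedFace are
assumed inputs from the preceding stages.

    % Background subtraction sketch: use the binary skin map as a mask so that
    % only the face pixels survive and the background is zeroed out.
    mask   = imresize(double(skinMap), size(extractedFace)) > 0.5;  % resample mask
    masked = double(extractedFace) .* mask;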

Considerations were made for the inclusion of a background subtraction stage into
the design of the recognition system, and experiments and tests were conducted to
evaluate the effectiveness of this addition. From the results, it was found that
the recognition performance decreased dramatically with the inclusion of a
background subtraction stage, due to the excessive amount of energy being placed
into mapping the boundaries and edge effects of the cut. Rather than recognising
the variances of the face, the system ends up measuring the shape of the background
segmentation. As a result, although this method theoretically enhances the
recognition, it is practically infeasible and has not been integrated into the
system.

5 System Performance Evaluation

The face recognition system has been tested with face images captured under a
variety of conditions. The performance of each module and the overall system is
detailed in this chapter. Examples of accurate recognition and cases that highlight
limitations to the system are both presented, allowing an insight into the strengths
and weaknesses of the designed system. Such insight into the limitations of each
module is an indication of the direction and focus for future work.

5.1 Face Detection


The aim of the face detection module is to locate as accurately as possible the centre
of the face regardless of the background, scale, rotation or lighting in the image.
The performance of each module is dependent on the outputs of the other modules’
stages. For example, the success of the NCC and face space projection search is
heavily dependent on the accuracy of the scaling normalisation module.

While the skin detection stage is invariant to scaling and rotation, the refined face
search stages are sensitive to changes. Thus in order to gain a better judgement on
the performance of the refined face search modules, these stages are evaluated under
static mode so that faces of a single scale are used to test the system. The ability to
handle multi-scaled rotated face images is evaluated by the normalisation modules.

5.1.1 Skin Detection


Colour segmentation and skin detection is the first module confronted by any input
image. The colour content of each pixel is analysed and determined to be either
skin or non-skin. As stated by Wang and Chang [13], regardless of the differences
in skin colour of different people and different races, the chrominance of skin is
relatively consistent. The major difference between skin tones is intensity.

Figure 5-1 Decomposition of image into V plane

Investigations were made to verify that human skin tones do in fact form a special
category of colour in the UV-plane. Figure 5.1 is a three dimensional visualisation
of an image in the V plane. Although the decomposition was applied to a
downsampled version of the 240x320 image on the left, the difference in colour
remains evident and the skin regions are shown to have higher values in V. The
shape of the face is also noticeable and the face is clearly distinguished from the
background. Thresholding thus can be applied to the coloured input to produce a
binary mapping of skinned regions in the image.


Figure 5-2 Skin Colour Distribution on UV plane

It is however possible that the plot in Figure 5.1 shows a particular difference in
skin colour because of the background. In order to demonstrate and substantiate
Wang’s claims, a range of skin colour was plotted in the UV-plane. This plot is also
useful as a colour lookup table. Before colour segmentation can be applied, the
range of U and V values that are specific to human skin tone needs to be
determined. The plot is presented in Figure 5.2 and it is clear that skin-tone colours
are distributed over a very small area in UV space.

To analyse the “skinness” of a pixel, skin-tone colour statistics were obtained.
Representative skin samples from a variety of countries and skin colours were used,
and a distribution of skin colour in the UV plane was generated. Over twenty sample
faces were used, and Figure 5.3 presents some of the skin colours that were used to
generate the distribution.

Figure 5.4 presents the binary images of the three faces in Figure 5.3 after the
colour segmentation operations. From these binary mappings, noise and spurious
errors are cleaned up by morphological opening (erosion followed by dilation) and
the approximate locations of the faces are determined. These outputs allow smaller
images containing the faces to be extracted so that further processing for
normalisation and alignment of the faces can be applied.

Figure 5-3 Examples of skin colour used to generate the lookup table

Figure 5-4 Binary Mapping of skinned regions of the faces in Figure 5.3

Figure 5-5 Intensity effects on Skin Detection. The middle picture is obtained using the same lookup
table as the pictures on Figure 5.4, while the picture on the right used a lookup table with the V values
shifted higher.

Using face images taken under the same conditions as in Figure 5.4, of the 50
images tested, all 50 returned a binary map that closely bounded the face region.
The use of the UV model, however, has also presented some problems and
limitations. Although Wang and Chang [13] claimed that the chrominance model is
intensity invariant, experiments have shown that intensity affects the range of the
skin colour region. Under different illumination conditions, although skin-tone
colour remains conglomerated in a specific region of the UV-plane, it was found
that the actual range of values shifts.

Figure 5.5 presents two results of colour segmentation of a photo taken under a
different lighting condition from the photos that created the lookup table. The
first result is the binary mapping of the photo using the same range as the photos
that were used to create the table, as depicted in Figure 5.2, while the second
binary image uses a shifted version of the lookup table. From these results, it is
evident that intensity affects the ability to properly segment the colour, and that
an intensity invariant model needs to be derived. Several models were attempted,
such as plotting U/Y and V/Y, but with no success.

Figure 5-6 The black box around the image on the right is the output of the face detection module
given the left input image, encapsulating the pixels that have been identified as skin.

As a result, the skin colour region is expanded in the UV-plane in order to
accommodate the varying illumination conditions. The trade-off is that the
detection is not as accurate, and it is this unreliability and lack of precision in
the skin detection method that restricts it to a quick coarse search approximating
the location of the face in the face recognition system.

Figure 5.6 is an example of the output of the coarse skin detection module. Notice
that although it has located the skinned region of the face in a complex background,
due to the larger acceptable range for skin, a larger region than the face is extracted.

5.1.2 NCC Search


The second face detection stage, which occurs after a lighting normalisation
module, comprises the refined search stages. The method of NCC is used to locate
the centre of the faces when the database is first initialised and no eigenfaces
have yet been determined. NCC is a correlation-based matching measure, and it was
decided during the design that a standard average face be used as the template. The
average face used is obtained from the face database of Boston University [22], and
is shown in Figure 5.7.

Figure 5-7 Standard Average Face [22].

Depicted in Figure 5.8 is one of the failed matches that occurred for many
different images when correlating with the standard average face of Figure 5.7. The
yellow cross indicates the region that recorded the highest correlation measure.
Notice that the correlation is performed not only on the extracted skinned region
(illustrated with the white bounding box), but also on an illumination normalised
version of the extracted image. Inaccurate detection was recorded regardless of
whether illumination normalisation was applied.

Figure 5-8 The white box in the left image, from skin detection, is the input into NCC refined search.
The yellow cross in the right image is the outcome of the NCC search using the template of Figure 5.7.

The poor results obtained with the standard face were analysed, and different
variations of the standard average face were investigated in order to obtain the
most suitable template for face detection. Through experimentation, it was found
that a template with the chin and mouth region omitted produced the most consistent
and reliable results. The alternative template used, depicted in Figure 5.9, is
essentially the standard average face of Figure 5.7 with its mouth and chin region
removed.

Figure 5-9 Revised alternative template for NCC.

Using this alternative template, out of the 20 faces that were processed through
NCC face search, 13 were successful in locating the centre of the face. One of
these successful outcomes is presented in Figure 5.10 along with a plot of the
correlation
measures obtained. The dark red region symbolises the highest score, which
corresponds to a point between the eyes. This result is expected since the template
has changed and the centre of the revised template has shifted from the centre of the
face to a point closer to the middle of the eyes.

Although a 65% success rate is not accurate enough for a robust recognition system,
errors are however tolerable in the NCC search. Since this method is only used for
initialising a database, incorrect outputs can be omitted, storing only successful
results, so that only the aligned images are passed onto the eigenfaces training stage.

Figure 5-10 Successful location of face centre using revised template, along with plot of correlation
measures of the image

5.1.3 Face Space Search
As the last face detection stage before the recognition module, the face space search
is extremely important. The accuracy of the face alignment and subsequently the
accuracy of the recognition depend heavily on the success of this module. While
errors in the preceding stages, such as skin detection, are tolerable since their
purpose is to provide an approximate region of where the face is located, any errors in the
face space search will directly degrade the performance of the system, making
accuracy the utmost concern.

On the other hand, the success of the face space search depends heavily on the
outcomes of the normalisation modules. In static mode, where the scaling and
rotation need not be adjusted, with the correct lighting normalisation implemented,
as many as 92% of the input images were aligned perfectly with the centre of the
faces located. In dynamic mode, due to the allowable variations in scale and
rotation, eigenfaces that might not exactly match the input are used for detection.
Since face space search measures how well the windowed input decomposes into
face space described by the eigenface templates, it is not unexpected that the
performance would decrease dramatically when scaling for example has altered the
size of the face input.

Under correct scaling and rotational adjustments however, successful results have
been recorded. With the illumination normalisation stage implemented, such that
the energy of the windowed images matches the total energy of the eigenfaces, out
of the 96 face images treated by face space search, 88 recorded accurate detection of
the centre of the face. The remaining 8 recorded the correct location as the
second-closest match.

Illustrated in Figure 5.11 is a typical output from the face space search module.
Remember that the face space search is accomplished by projecting each selected
window into face space and observing how “face-like” each window is. Thus, the most
face-like region should logically be when the centre of the selected window
coincides with the centre of the face such that the window is completely over the
face. As depicted in Figure 5.11, the centre of the face is located at some point
through the centre of the nose and between the two eyes.

Figure 5-11 Face Space Search

Figure 5-12 Left: Distance map of face space search of face in Figure 5.11. Right: 3D visualisation of
inverse distance surface with the centre of the face represented by the peak.

As the window is scanned across the image, a distance measure is computed to
evaluate how “face-like” each region is. The distance measure is calculated as the
difference between the norm of the window and the sum of the squares of the
weights. At the centre of the face, the amount of energy converted through the
decomposition should be maximised, and hence a minimum distance is recorded.

The distance maps that describe the image in Figure 5.11 are presented in Figure
5.12. On the left is a plot of the recorded distances with the dark blue region
representing the minimum value, while on the right is a 3D visualisation of the
inverse distance surface so that the minimum value on the distance map is
represented by peaks. From the 3D plot, it is easy to observe how face space
projection records the centre of the face distinctively. Comparing this result to the
NCC distance map in Figure 5.10, the centre of the face is more distinct, accurately
found, and not spread out over a range of possible locations.

5.2 Face Normalisation


Successful face detection is dependent upon successful normalisation. Unless all
input images conform to a common scale, rotation and illumination, it is extremely
difficult to accurately locate the face and recognise it. The performance of each
of the normalisation modules is evaluated and presented in the following section.
Apart from the accuracy of determining the correction factors, it is also important
to observe the consistency of the results. The inclusion of a static mode for the
operation of the face recognition system is precisely due to the unreliability of
some of the normalisation stages.

Figure 5-13 Face input extracted by the binary map and scale normalised.

5.2.1 Scaling Normalisation


Under dynamic mode, where real-time recognition is performed on video input,
faces of all scales will be present. The task of scaling normalisation involves
ensuring that the scaling of all the faces in each input frame captured conforms to a
standardised size determined by the eigenfaces. This is accomplished by computing
the scaling correction factor from the bounded region of the binary image produced
from the skin detection stage.

As shown in Figure 5.13, a binary map representing the skinned region is used to
extract the face in the image. The face presented is larger than the standard size,
so a scaling correction factor is computed and applied to the extracted region. The
scaling factor is determined by selecting the larger of the two ratios between the
dimensions of the extracted region and the dimensions of the eigenfaces when
scaling down, and the smaller ratio when scaling up. This ensures the face is not
morphed to a different set of proportions, which would alter the relative variances
between pixels.

Using this technique to produce scaled versions of the extracted box, the
subsequent face space search stage has been shown to successfully locate the centre
of the faces and produce aligned images. It has been found that as long as the skin
detection stage outputs a binary map that bounds the face correctly within an error
of ten pixels, the scaling normalisation module will output a scaled image that is
acceptable to the refined search stages. A bounding box larger than the face by
more than 20 pixels will result in unreliable face location from both the NCC and
face space projection searches.

When only one face is present in the image and an appropriate colour model has
been chosen, a suitable bounding box can be assured, since morphology can be used
to clean up spurious noise. However, in the presence of other skin-like or skinned
regions, such as shoulders and arms, the colour segmentation module will not be
able to output a tight box around the face, preventing an accurate scaling factor
from being computed. Until further research and improvements have been conducted on
the normalisation modules, operating the scaling stage in dynamic mode will produce
unreliable outputs.

5.2.2 Rotation Normalisation


In a video sequence, the faces appearing in the scenes will typically be subjected to
many different rotational orientations. Before a face can be recognised, the
orientation of the face needs to correspond to the database, which in our case is a
frontal view of the face. While faces can be rotated in three dimensions, only planar
rotation normalisation – sideways tilting of the head – has been investigated in this
system.

The rotational normalisation module designed in this system works similarly to the
scale normalisation module. As depicted in Figure 5.14, rotational analysis is also
based on the binary mapping produced from the skin detection stage. The rotational
angle determined from the binary map is used to adjust the scaled face so that the
extracted box can be normalised in terms of both scale and rotation. Many pictures
of different faces subjected to planar rotations ranging from 10 to 45 degrees have
been tested, and it was observed that all the images were correctly normalised to
within five degrees.

Such successful results, however, suffer from the same limitation as the scaling
stage in that the accuracy of the correction angle is heavily dependent upon the
results from colour segmentation. Whilst outputs from the rotational module can be
fed into the refined search to precisely align the faces, the rotational results
are inaccurate unless a tight bounding box of the face was extracted by the skin
detection module.

Figure 5-14 Face image subject to both scaling and rotational normalisation.

5.2.3 Lighting Normalisation


Accurate refined face search was not achievable until the lighting normalisation
stages were inserted into the system. Adjusting the total energy of the extracted
window prior to the face space search improved the consistency of face location
dramatically. Prior to the standardisation of the lighting, unexplained minima were
recorded in many of the images. After lighting adjustments were applied to the
extracted boxes, the distance maps converged to a minimum at the centre of the
face.

Mathematically, it would seem more logical to normalise the energy of each of the
selected windows during the face search scan rather than the whole extracted
region. Experimentally, however, it was found that detection accuracy decreased
considerably when lighting normalisation was applied to each individual window.

Figure 5-15 Lighting normalised output for two images of different background illumination.
Notice how the detection is performed on extracted regions subjected to lighting adjustments.

Thus the system currently applies a lighting normalisation stage prior to the
beginning of the face search. With this addition implemented, as many as 92% of the
images have had the centre of the face correctly located under static mode, where
scaling and rotation do not affect the outcome.

The second illumination normalisation stage occurs after the faces have been
successfully aligned, to ensure that all the faces entering the recognition module
have the same total energy. Shown in Figure 5.15 is an example of how images taken
under different lighting have been normalised to a standard. The inclusion of this
stage has enabled the recognition accuracy to improve. Rather than matching weights
that reflect common intensity and background illumination distributions, the
recognition stage, with all faces normalised in terms of lighting, focuses on the
relative variances between the faces. The current recognition system under static
mode has been demonstrated to perform at 96% accuracy, as detailed in the following
section. Prior to the insertion of the second illumination stage, many faces were
matched wrongly and only 50% of the faces were correctly identified.

Apart from extreme cases where the lighting of the images is extremely dark or
excessively bright, the combination of the two lighting normalisation stages can
compensate for almost any illumination variations. Reliable face detection and
recognition has been achieved with the addition of these two stages, making them
the only dependable components within the normalisation module.

Figure 5-16 Comparison between eigenfaces constructed from a standardised (top row) and a
misaligned (bottom row) database. The leftmost picture in both rows is the associated average face.

5.3 Face Recognition


The importance of aligning and standardising all face inputs prior to entering the
recognition module has been continually emphasised throughout this thesis. Without
an aligned set of faces, the computed eigenfaces will focus on describing the
misalignments and variations in the position of the faces rather than the desired
variances between facial features. Even though many face recognition systems
presented in the past have shown superior recognition performance, careful analysis
of the eigenfaces used shows that many of these systems were performing
identification based on the orientation of the faces.

The differences and consequences of using a standardised database compared to a
misaligned database are described and analysed in the following section,
demonstrating the significance of the normalisation modules’ task in aligning the
faces. A robust, reliable and extendable face recognition system cannot be
constructed unless an ideal database is available.

5.3.1 Training Stage


The eigenfaces constructed from the training stage determine the success of the
face recognition system in correctly identifying faces. Unless equipped with a set of
eigenfaces that can be linearly combined to describe any face, the system will not be
able to identify faces from large databases. Such a set of eigenfaces can be
constructed given a set of perfectly standardised face inputs.

Presented in Figure 5.16 are two extreme ensembles of eigenfaces along with their
associated average faces. The set of eigenfaces on the top row is constructed from
a collection of faces that have been treated by the face detection and normalisation
modules, with the leftmost picture in the row being the average face of that set of
eigenfaces. The bottom set of eigenfaces, on the other hand, is computed from a
database that did not restrict the position and scaling of the faces, with the leftmost
picture of the bottom row being the average face.

Observing the top row shows that an aligned database produces eigenfaces that tend
to describe variations in faces, while in the bottom set, the eigenfaces tend to
highlight the misalignments. Details of individual faces can also be seen from the
misaligned database since the misalignment has caused that particular face to be
described by its own eigenface. Misaligned databases thus defeat the purpose of
eigenface decomposition since each face is supposed to be represented by a linear
combination of eigenfaces, each representing a dominant basis that captures the
variances between faces.

The current face detection and normalisation modules have been producing faces
that are aligned within a tolerable error in static mode, such that a “good” set of
eigenfaces can be determined. Misalignment to the order of less than five pixels is
acceptable. However, with the introduction of scaling and rotation variations, the
standardisation ability of the normalisation module tends to decrease causing
misaligned faces to enter the recognition module. Therefore, when the recognition
system is operating in training mode to obtain the eigenfaces, static mode is applied
so that images that require extensive adjustments are avoided.

5.3.2 Recognition Stage


To demonstrate the capability and the accuracy of the recognition stage, a selected
database of ten faces and several recognition results are presented. The database
presented in Figure 5.17 is a small database that was used during the development
of the design to test the recognition ability of the system. The faces presented are
the inputs into the training stage where a representative set of eigenfaces were
determined. After training, new images are face detected, normalised and entered
into the recognition stage for identification.

Identification is performed by comparing the weights of the test face with the
known weights of the database. Mathematically, a score is found by calculating the
norm of the differences between the test and known set of weights, such that a
minimum difference between any pair would symbolise the closest match.

Figure 5-17 Small face database used for testing – all faces have been standardised using the face
detection and normalisation modules.

(Confidence measures shown in Figure 5.18: 58%, 86%, 97%, 95%, 85%, 93%, 9%, 64%.)

Figure 5-18 Sample results of recognition stage with confidence measure included. The left image of
each pair is the normalised test input, while the right image is the closest match found.

A confidence measure has been devised during testing to describe the accuracy of
the recognition. Its aim is to describe the certainty of the identification by
finding the minimum score and observing how unique that score is compared to the
other matched scores. Mathematically, it is computed as the difference between the
square of the score of the second closest match and the square of the score of the
best match, divided by the square of the second closest score. Since the scores are
originally calculated as norms of vector differences, it is not mathematically
unsound to use the squares of these values as a form of measure.

\text{Confidence \%} = \frac{(\text{Second Best Score})^2 - (\text{Best Matched Score})^2}{(\text{Second Best Score})^2} \times 100\%    (5.1)
A best match score of zero, recorded when the test weights of an image correspond
exactly to a set of known weights, would result in a confidence measure of 100%,
implying that the face is identified with complete certainty.
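
The confidence calculation of Equation 5.1 can be sketched as follows, assuming
scores holds the norms of the weight differences computed for each known face.

    % Confidence measure sketch, following Equation 5.1.
    % scores: vector of norms of weight differences, one per known face.
    sorted      = sort(scores, 'ascend');
    bestScore   = sorted(1);                 % best (closest) match
    secondScore = sorted(2);                 % second closest match

    confidence = 100 * (secondScore^2 - bestScore^2) / secondScore^2;   % percent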

Under static mode, depending on the availability of the images, between 3 and 6
different photos of each of the ten faces in the database of Figure 5.17 have been
tested for recognition. Some of the results are presented in Figure 5.18,
displaying the standardised test face samples, their closest matches and the
corresponding confidence measures. Of the 50 pictures tested, 48 recorded correct
identification, claiming an accuracy of 96%. Although these include confidence
measures ranging from 9% to 97%, many of the recognition uncertainties can be
explained by slight differences in orientation and scaling of the test faces
compared to the database.
Notice also that the recognition system is, to some extent, invariant to facial
expressions. In one of the examples in Figure 5.18, the recognition system was
even able to correctly identify an individual with and without glasses.

One of the failed examples, identified with a small 9% confidence, is also presented.
Comparing with the database shows that it is due to the face being improperly
scaled, such that a complete picture of the face was not provided to the recognition
stage. Many errors in recognition can similarly be attributed to poor normalisation,
further emphasising the importance of strictly standardised databases.

5.4 Integrated System Performance


While the previous sections were dedicated to presenting the results of each separate
individual stage, this section aims to combine all these results and provide an
overview of the performance of the overall integrated system under static and
dynamic mode operations.
(Figure 5-19 block diagram stages: skin detection, box extraction, lighting
normalisation, face space distance map for face detection, face extraction,
lighting normalisation, recognition at 67% confidence.)

Figure 5-19 Example of integrated stages of a complete face recognition system under static mode.
Successful face detection and normalisation has been achieved with correct recognition at 67%
confidence.

The results of the recognition stage discussed previously are good indications and
reflections on the accuracy of the face recognition system under static mode. Prior
to arriving at the standardised inputs seen in Figure 5.18, each test face that is
shown has been treated with the face detection and normalisation modules. Thus, a
successful recognition implies that all the individual stages discussed were also
successful in providing accurate results. An incorrect output from any stage, such
as inaccurate face detection, would result in possible erroneous recognition such as
the example in Figure 5.18. Figure 5.19 shows an example of the integrated system,
where all the stages combine to perform recognition on a test image.

The successful example in Figure 5.19 is an excellent illustration of how all the
stages are integrated, in particular, how the face detection and normalisation
modules interlink in order to accurately locate the face. This particular example,
which was a problematic image in the earlier stages of the design of the system, has
been successful in extracting the face regardless of the facial expression and the
presence of a hat. Notice how the adjustment of the lighting normalised the total
energy of the image so that the face space search could accurately locate the face,
as demonstrated by the peak in the inverse distance surface.

Apart from achieving an accuracy rate as high as 96% in static mode, the speed of
processing the modules is another useful measure of performance for the face
recognition system. Using a Pentium III 833MHz processor, 256MB RAM, and a
Logitech web cam, the recognition system has been demonstrated to perform at a
maximum speed of two frames a second under Matlab. This includes the total
processing time for all three modules – face detection, normalisation and
recognition – for each input image. This speed is expected to increase substantially
when the system is implemented in C and when prior knowledge of the location of
the face is integrated using tracking.

For demonstrations, the system operates under dynamic mode, performing real-time
face recognition on video input. Since faces will be encountered at all angles and
orientations, consistently high recognition accuracy on every frame is not viable.
However, because a continuous stream of face images is available, the current
system is designed to acknowledge and output an identification only when the
confidence level is high and the best match score is low. Since only properly
aligned faces can produce low scores at a high confidence level, recognition
accuracy as high as 70% has been achieved under dynamic mode with a database of
ten faces. That is, of the ten individuals tested under the real-time recognition
system, seven were correctly identified.
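
As an illustration of this acceptance rule, the following sketch (not taken from the thesis code) accepts an identification only when the best eigenface distance is low and the margin over the second-best match is high; the distances and threshold values are assumptions for illustration only.

% Sketch of the dynamic-mode acceptance rule (illustrative values only)
dist       = [14.2 35.7 41.0 55.3];          % sorted distances to the known weight vectors
face_names = {'PersonA', 'PersonB', 'PersonC', 'PersonD'};
max_dist   = 20;                             % assumed threshold on the best match score
min_conf   = 0.5;                            % assumed threshold on the confidence level
confidence = 1 - dist(1)/dist(2);            % one possible confidence measure (margin over second best)
if dist(1) < max_dist & confidence > min_conf
    disp(['Frame accepted: identified as ', face_names{1}]);
else
    disp('Frame rejected: no identification output for this frame');
end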

Successful results were also recorded in cases where there was more than one face
in the image, as is often the case when operating with real-time video input. As the
face space search scans across the image, the face that is most "face-like" and best
matches the scaling of the eigenfaces is extracted and passed on to the recognition
stage for identification.

6 Design Review

The previous chapters have provided a detailed description of the design and a
comprehensive evaluation of the performance of the face recognition system.
Presented in this chapter will be a discussion of the results, analysing the strengths
and weaknesses of the system and proposing future work based on the limitations of
the current system.

6.1 The Integrated System


An operational system has been built for recognising faces under static mode. With
limitations applied to the scaling and rotation of the faces, the face detection and
normalisation modules have reliably produced a standardised database for the
recognition stage. With a reasonably well aligned face database, recognition
accuracy as high as 96% has been recorded. This figure includes images taken under
different lighting conditions on different days, with different facial expressions
and, for certain individuals, with and without hats or glasses. These successful
results are, however, very sensitive to scaling, rotational and illumination changes.
Once the preprocessing steps fail to produce a strictly standardised database, the
system's performance degrades considerably. Consequently, when the system operates
in dynamic mode, where the orientations of the faces are not limited, identification
accuracy is unpredictable.

Although under dynamic mode the system can be adjusted to output recognition
results only when the face is identified with high certainty, thereby appearing to
perform real-time face recognition, efforts have been made to produce a truly
robust face recognition system that can identify faces without a list of constraints.
The current difficulty is that each individual module within the system has its own
limitations and weaknesses, such as an intensity dependent colour model or a colour
segmentation dependent scaling normalisation. All these uncertainties and
inaccuracies add up when the complete system is integrated, producing results that
are less than optimal.

The current face recognition system combines the colour information, template
matching results and eigenvector decomposition outcomes of an image to deduce an
educated guess about the identity of the face in the picture. There are, however,
many additional cues that could be combined to derive the final solution, such as
edge detection, morphological analysis and even motion detection. Just as human
face recognition is not restricted to one mode of information but draws on a
combination of inputs from the body's senses, we believe that a truly successful
recognition system will not be dependent on a few techniques alone, but on the
integration of a range of different input data. Such a system would be able to take
advantage of the strengths of each method and use them to compensate for the
weaknesses of the other techniques, and would be of great value to research and a
breakthrough in the area of image processing.

6.2 Face Detection


Consistently reliable face detection has been achieved under static mode. With the
implementation of the lighting normalisation module, the centre of the face can be
confidently located regardless of the position or expression of the face. Successful
detection has also been achieved in a variety of backgrounds and in the presence of
additional features, such as hats and glasses.

One of the drawbacks of the current face detection module is the inconsistency of
the skin detection. Wang and Chang stated that skin colours remain distributed over
a very small region of the chrominance plane, with intensity changes accounting for
the major difference between skin tones [13]. Although their claim was demonstrated
to hold true for a set of images from our testing database, it was also found that
this region, while still concentrated in a small area, tends to shift when the
illumination changes. Shifting here means that the range of either the U or the V
component is increased or decreased by a constant value; for example, a U range of
80-100 might move to 120-140 due to illumination variations.
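
To make the effect of such a shift concrete, the sketch below (an illustration, not part of the system) applies the fixed Cb window used in Appendix A, and the same window shifted upwards, to one test image; the image file name is a placeholder.

% Illustration of a fixed chrominance window versus a shifted skin distribution
rgb   = imread('test_face.jpg');                 % placeholder test image
ycbcr = rgb2ycbcr(rgb);
Cb = double(ycbcr(:, :, 2));                     % U (Cb) component
Cr = double(ycbcr(:, :, 3));                     % V (Cr) component
fixed_mask   = (Cb >= 120 & Cb <= 165) & (Cr >= 128 & Cr <= 170);   % window used in Appendix A
shifted_mask = (Cb >= 140 & Cb <= 185) & (Cr >= 128 & Cr <= 170);   % same width, Cb range shifted
disp(['Fixed window marks ',   num2str(100*sum(fixed_mask(:))/prod(size(fixed_mask))),   '% of pixels as skin']);
disp(['Shifted window marks ', num2str(100*sum(shifted_mask(:))/prod(size(shifted_mask))), '% of pixels as skin']);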

Illumination variation is the major reason for the unreliability of the colour
segmentation results. Further research needs to be conducted in this area in order to
produce a more robust intensity invariant skin colour model. Such a model would
be extremely beneficial to the solution of the face recognition problem, for many
inconsistent errors in the normalisation modules are due to the inaccuracies of the
skin detection search.

The accuracy of face detection is also variable when the system operates under
dynamic mode. Where the face in the image matches the scaling of the faces in the
database, it can be successfully located even under minor rotations. Our current
refined search technique of NCC template matching and face space projection,
however, is so sensitive to scaling that even slight differences between the
dimensions of the template and of the test face cause inaccuracies in locating the
centre of the face. Thus, although a pyramid of template sizes could be considered
as an alternative route to reliable face detection, research has shown that this is
not a practical solution either.

Because template matching and the face space projection search are highly sensitive
to differences in facial orientation, scale and rotation invariant techniques need
to be explored in order to achieve reliable face detection. The ultimate erosion
stage, for example, although omitted from the design due to its dependency on colour
segmentation, can provide invaluable information about the location of faces.
Therefore, as mentioned previously, future research should focus on integrating all
the available methods to locate the face unfailingly, rather than on the design and
improvement of specific methods alone. Since each method has its strengths and
weaknesses, systems should be designed so that the weaknesses of one method are
compensated for by the benefits of the others.

6.3 Face Normalisation


Throughout the evaluation and discussion of the face recognition system, the
importance of face normalisation has been emphasised repeatedly. The effects of
differences in scaling, rotation and illumination have been demonstrated to have
significant consequences for the recognition ability of the system. Normalisation
is without doubt the most variable component, and the most pivotal to the success
of recognition by eigenfaces.

Under the current design, the normalisation of scaling and rotation suffers from its
dependency on the outcomes of the colour segmentation stage. In complex dynamic
scenes, where the amount of skin in the picture changes frequently under varying
lighting conditions, reliable skin detection is very hard to guarantee. This
motivates the use of other techniques to solve the problem of scale and rotation.

For example, other researchers have presented solutions based on the location of the
eyes. It is claimed that by ensuring that all input faces have aligned eyes, the
scale and rotation of the face will also be aligned as a consequence. While this is
a theoretically sound idea, successful location of the eyes is itself scale
dependent. Many of the proposed techniques are similarly not invariant to scale, and
thus suffer from the scaling problem before they can solve it. Intensive research is
still being conducted in these areas to improve existing ideas or develop new ones
in order to produce robust normalisation modules.

While the addition of the current lighting normalisation stages has enabled dramatic
improvements in the accuracy of both the detection and recognition of faces, there
are still many opportunities for improvement. One problem with the current energy
summation and ratio technique is that it assumes a direct proportionality between
the illumination and the pixel intensity values. Research has demonstrated that this
is not the case, and more complex relationships are being investigated. Nonetheless,
the face recognition system has produced sufficient results and has, in turn, formed
a strong foundation for future work to develop upon.
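
For reference, a minimal sketch of the energy summation and ratio technique (following the code in Appendix A) is given below; the image file names are placeholders, and the final scaling step embodies the linear-proportionality assumption discussed above.

% Minimal sketch of lighting normalisation by total energy ratio
avg = double(imread('average_face.bmp'));        % average face template (placeholder file)
box = double(imread('face_box.bmp'));            % extracted face region (placeholder file)
avg_energy = sum(sum(avg.^2));                   % total energy of the template
box_energy = sum(sum(box.^2));                   % total energy of the extracted face
ratio = sqrt(avg_energy / box_energy);
box = box .* ratio;                              % assumes pixel intensity scales linearly with illumination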

6.4 Face Recognition


Face recognition using eigenfaces has one major requirement – an ideal database.
This single requirement has led to the design of numerous similar eigen-
decomposition techniques and to research into improved face detection and
normalisation methods. With a perfectly aligned and standardised database,
unfailingly accurate face identification can be assured; such a database, however,
is generally not obtainable in practice. As mentioned throughout, much of the
limitation of eigenface recognition lies in the inability of the system to normalise
a database of faces reliably. Transforming an arbitrary image of a face into a
standardised extraction of that face remains one of the biggest challenges
confronting engineers.

Since the eigenface technique requires such strict alignment of faces, any
inaccuracy in aligning a face will degrade the recognition and generalisation
ability of the database. While many research groups have published successful face
recognition results using eigenfaces, very few in fact achieved their results with a
generalised, ideal set of eigenfaces. Without such an ensemble of eigenfaces, it is
highly probable that the recognition is performed on the basis of erroneous
information, such as the orientation, the scaling or the illumination of the faces.

Since the eigenface technique carries so many complicated disadvantages, it is
logical to consider alternatives for performing face recognition. Eigenfaces is the
most popular method and is, in essence, a statistical analysis of a two-dimensional
image. Careful observation reveals that most of the research conducted on eigenfaces
is in fact based on grey scale images; the colour content of the face images is
generally omitted. This poses the question of why colour eigenfaces have not been
investigated. Furthermore, since human faces are three-dimensional objects, it is
worth considering why a two-dimensional method has been utilised.

A simple answer is that this area of research is still very new, and the development
of new techniques generally takes time and investment by many research groups.
However, future research into 3D modelling and recognition incorporating colour will
be revolutionary in the image processing field and will pave the way towards a new
paradigm of image analysis. Exciting explorations are yet to be completed in this
dynamic area of research.

6.5 Future Work


In a research area undergoing vigorous investment and major development,
opportunities for future work are abundant. Improvements in the design and
implementation of the face detection and normalisation modules, or any extensions
that could aid the speed and robustness of the face recognition system, are
continually in demand. Presented below are several enhancements and alternatives
that could be applied to the current system.

6.5.1 Tracking
A possible solution and enhancement to the problem of normalisation is to combine
the recognition system with a tracking module. In dynamic video, faces are in
continual motion, and applying recognition to every frame is highly infeasible.
Thus, rather than attempting to recognise faces under arbitrary conditions with no
constraints, a different view of the problem is to consider only those circumstances
in which the face can be reliably recognised.

Tracking would not begin until the face is at a suitable scale and rotation,
whereupon the system can perform a reliable identification based on that particular
extraction. Then, as the face moves from frame to frame and is subjected to
different orientations, tracking keeps track of the position of the individual,
removing the need to perform normalisation and recognition on unaligned facial
orientations. Instead of puzzling over how to overcome the varying scale and
rotation of a face, the face recognition system can simply focus on determining when
the face can be reliably normalised, and capture that particular image for
recognition. Correct integration of a tracking module could expand the
applicability, robustness and accuracy of the system. Since the area of search is
also greatly reduced on subsequent face detections, improvements in the speed of the
system are also possible.
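
A conceptual sketch of this capture-when-aligned strategy is given below. No tracking module exists in the current system; detect_face, face_matches_template_scale, recognise_face and track_region are hypothetical helper functions named purely for illustration, and video and num_frames are an assumed frame store.

% Conceptual sketch only: recognise once from a well-aligned frame, then track
identity = '';
for frame = 1:num_frames
    current = video(:, :, :, frame);                        % assumed frame store
    if isempty(identity)
        [face_box, found] = detect_face(current);           % hypothetical detector
        if found & face_matches_template_scale(face_box)    % hypothetical scale check
            identity = recognise_face(face_box);            % identify from this aligned extraction
            region   = face_box;
        end
    else
        region = track_region(current, region);             % hypothetical tracker follows the face
    end
end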

6.5.2 Intensity Invariant Colour Model


Throughout the evaluations and discussions of the skin detection and lighting
normalisation stages, the undesired side effects of intensity changes have been
presented. Contrary to the conclusions of Wang and Chang [13], our results showed
that although skin colours remain distributed over a very small region on the
chrominance plane, differences in lighting cause this range to change unpredictably.
Thus, in order to obtain reliably consistent skin detection results and to better
understand the effects of illumination differences, an intensity invariant colour
model needs to be investigated.

Psychological experiments and analysis have been conducted in search of a
relationship between the magnitude of a stimulus and the magnitude of the human
sensory experience, such as the perception of brightness when the intensity of
illumination changes. Gustav Fechner derived a general law stating that the
magnitude of the sensory experience of a stimulus is directly proportional to the
logarithm of the physical magnitude of the stimulus [25]. Fechner's conclusions were
later tested, and a power law was proposed, stating that the intensity of a
sensation is directly proportional to the intensity of the physical stimulus raised
to a constant power. Thus, rather than the sensory magnitude being directly
proportional to the logarithm of the physical magnitude, it was found that the
logarithm of the sensory magnitude is proportional to the logarithm of the physical
magnitude. These psychological conclusions are of great value to the field of image
processing, for they provide a different perspective on the relationship between
illumination and perceived colour.
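
Written compactly, with I the physical stimulus intensity, S the sensory magnitude, and k, c and n constants, the two relationships above can be expressed as

    S = k \log I                                        (Fechner's logarithmic law)
    S = c I^{n},  i.e.  \log S = \log c + n \log I      (power law)

so that the power law is linear on log-log axes, whereas Fechner's law is linear on semi-log axes.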

6.5.3 Multiple Recognition System
The biggest restrictions on recognising multiple people in a scene at any one time
are scaling and rotational problems. Although the current face detection module is
theoretically capable of locating more than one face, by accounting for all the
strong matches in the face space projection stage, there is no reliable technique
for handling the variety of possible scales and rotational orientations. Scaling
normalisation requires further research, for when multiple faces are in a scene they
are very rarely all aligned and at the same distance from the camera. Consideration
of all modes of rotation is also necessary when attempting to recognise multiple
people, for faces are again rarely all oriented in the same direction. Without a
suitable solution to the problem of normalisation, multiple-face recognition will
remain a major obstacle.
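
One simple way of extending the face space search in this direction is sketched below: repeatedly take the lowest point of the distance map produced in Appendix A and suppress its neighbourhood before searching again. The stand-in distance map, the number of faces sought and the suppression radius are illustrative assumptions only.

% Sketch: multiple face candidates from the face space distance map
dist = rand(60, 80);                         % stand-in for the distance map of Appendix A
num_faces_sought = 2;
radius = 15;                                 % suppression radius in distance-map pixels
for k = 1:num_faces_sought
    [val, idx] = min(dist(:));
    [row, col] = ind2sub(size(dist), idx);
    disp(['Candidate face ', num2str(k), ' at row ', num2str(row), ', column ', num2str(col)]);
    r1 = max(1, row - radius);  r2 = min(size(dist, 1), row + radius);
    c1 = max(1, col - radius);  c2 = min(size(dist, 2), col + radius);
    dist(r1:r2, c1:c2) = Inf;                % suppress this neighbourhood before the next search
end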

6.5.4 C++ Programming


Recent advances in computer hardware and software have led to the possibility of
implementing low-cost real-time computer vision systems on conventional PC
hardware based upon the Intel processor supporting the Multimedia Extended
instruction set (MMX). Specific libraries such as the Image Processing Libraries
(IPL) and Computer Vision Libraries (CV) have been introduced recently to aid in
the design of image processing applications on PC.

Due to the time constraints of the project, a C version of the face recognition
system was not produced. Time was instead devoted to prototyping a robust working
system prior to optimising its speed by translating it into C. The original decision
to work in Matlab rather than C was due to the complexity of performing eigenvector
and eigenvalue calculations in C, and because Matlab's efficient matrix algorithms
make it a desirable platform for development and design.

A C version of the system is nonetheless manageable. While the training stage of the
recognition module requires eigenvector decomposition functions, the testing stage
does not. Hence, if future developers of the face recognition system wish to avoid
the complexity of eigenvector calculations in C, the training stage can be performed
in Matlab while the testing stage is programmed in C; the sketch after this section
shows one way the trained data could be exported for such a C program. The current
recognition stages have been optimised in Matlab, such that the combination of the
face detection and recognition modules operates at two frames a second.
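
A minimal sketch of this split is given below: the eigen-decomposition results produced during training in Matlab are written to a flat binary file, which a C test program would then read and use with nothing more than matrix-vector products. The output file name and record layout are assumptions for illustration; the variables follow the eigen_info data used in Appendix A.

% Export trained eigenface data for a hypothetical C test program
load eigen_info                                  % eigenfaces, avgface, known_weights, num_faces (Appendix A)
fid = fopen('eigen_info.bin', 'w');
fwrite(fid, size(eigenfaces), 'int32');          % header: [number of eigenfaces, pixels per face]
fwrite(fid, avgface(:), 'double');               % average face, column-major
fwrite(fid, eigenfaces', 'double');              % eigenfaces, one face per record
fwrite(fid, known_weights, 'double');            % weight vectors of the known individuals
fclose(fid);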

6.5.5 3D Modelling
Using stereo vision techniques to model faces as three-dimensional objects from
video input is a novel development. Due to the complexity of the mathematics
involved, very few groups have conducted research in this area. Despite the three
dimensionality of faces, many face recognition systems still operate on two-
dimensional images. Such reduction in dimensionality is essentially an omission of
meaningful data, reducing the amount of information being processed. Thus, it is
not unexpected that recognition accuracy comparable to human beings has not yet
been achieved. Although successful results have been obtained using two-
dimensional techniques such as eigenfaces, we believe that the use of 3D modelling
is the key to the solution of real-time face recognition.

The construction of a 3D model of a face from a sequence of video frames is,
however, extremely difficult, let alone performing face recognition using such
models. Intensive study of the mathematics of three-dimensional surfaces and of
max-flow min-cut theory should be the focus of future research. Such advancements in
3D modelling would not only be applicable to face recognition systems, but would
also be of tremendous benefit to the entire discipline of image analysis.

7 Conclusion

An overview of the design and development of a real-time face recognition system
has been presented in this thesis. Although some aspects of the system are still
under experimental development, the project has been an overall success, achieving
reliable recognition in a constrained environment. Under static mode, where
recognition is performed on single scaled images without rotation, a recognition
accuracy of 96% has been achieved. Face location and normalisation were performed in
real time, and consistent face detection accuracy was recorded with video input. A
demonstration system that operates under dynamic mode, performing recognition from
real-time video input, has also been implemented, and has been demonstrated to run
at a maximum speed of two frames per second under Matlab with 70% accuracy.

The design of the face recognition system is based upon eigenfaces and has been
separated into three major modules – face detection, face normalisation and face
recognition. Face detection was accomplished by first performing a skin detection
search of the input image based on colour segmentation. Although skin colours
differ from person to person, and race to race, it was found that the colour remains
distributed over a very small region in the chrominance plane. Normalised cross
correlation and face space decomposition were then applied in order to locate the
exact position of the face. Since the application of eigenfaces to the task of face
recognition requires a perfectly standardised and aligned database of faces, face
normalisation modules were inserted between the detection stages to account for
possible scaling, planar rotational and illumination differences.

While the problem of recognising faces under gross variations remains largely
unsolved, a thorough analysis of the strengths and weaknesses of face recognition
using eigenfaces has been presented and discussed. With the implemented system
serving as an extendable foundation for future research, extensions to the current
system have been proposed.
References

[1] M. Turk and A. Pentland, "Eigenfaces for Recognition," Journal of Cognitive Neuroscience, Vol. 3, No. 1, Mar. 1991, pp. 71-86.

[2] A.W. Young and V. Bruce, "Perceptual Categories and the Computation of 'Grandmother'," European Journal of Cognitive Psychology, Vol. 3, No. 1, 1991, pp. 5-49.

[3] J.W. Shepherd et al., "The Effects of Distinctiveness, Presentation Time and Delay on Face Recognition," European Journal of Cognitive Psychology, Vol. 3, No. 1, 1991, pp. 137-145.

[4] R.L. Klatzky and F.H. Forrest, "Recognising Familiar and Unfamiliar Faces," Memory and Cognition, Vol. 12, 1984, pp. 60-70.

[5] A.C. Schreiber, S. Rousset, and G. Tiberghien, "Facenet: A Connectionist Model of Face Identification in Context," European Journal of Cognitive Psychology, Vol. 3, No. 1, 1991, pp. 177-198.

[6] S. Lawrence et al., "Face Recognition: A Convolutional Neural Network Approach," IEEE Transactions on Neural Networks, Vol. 8, No. 1, pp. 98-113.

[7] T. Kohonen, Self-Organization and Associative Memory, Springer-Verlag, Berlin, 1989.

[8] G. Cottrell and M. Fleming, "Face Recognition using Unsupervised Feature Extraction," Proc. Int'l Neural Network Conf., Vol. 1, Paris, France, 1990, pp. 322-325.

[9] F. Galton, "Personal Identification and Description," Nature, June 1888, pp. 173-177.

[10] S. Carey and R. Diamond, "From Piecemeal to Configurational Representation of Faces," Science, Vol. 195, 1977, pp. 312-313.

[11] A.L. Yuille, D.S. Cohen, and P.W. Hallinan, "Feature Extraction from Faces using Deformable Templates," Proc. of CVPR, San Diego, Calif., 1989.

[12] M. Kirby and L. Sirovich, "Application of the Karhunen-Loeve Procedure for the Characterisation of Human Faces," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 12, No. 1, 1990, pp. 103-108.

[13] H. Wang and S.F. Chang, "A Highly Efficient System for Automatic Face Region Detection in MPEG Video," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 7, No. 4, Aug. 1997, pp. 615-628.

[14] L. Lorente and L. Torres, "Face Recognition of Video Sequences in a MPEG-7 Context using a Global Eigen Approach," Int'l Conf. Image Processing, Vol. 4, Asahi Kosoku Printing, Japan, 1999, pp. 187-191.

[15] D. Rowland et al., "Transforming Facial Images in 2 and 3-D," Proc. Imagina 97 Conferences, 1997, pp. 159-175.

[16] B. Moghaddam, T. Jebara, and A. Pentland, "Bayesian Face Recognition," Pattern Recognition, Vol. 33, No. 11, Nov. 2000, pp. 1771-1782.

[17] R. Cendrillon, Real Time Face Recognition using Eigenfaces, undergraduate thesis, Univ. of Queensland, Dept. of Computer Science and Electrical Engineering, 1999.

[18] A. Giachetti, "Matching Techniques to Compute Image Motion," Image and Vision Computing, Vol. 18, Jun. 2000, pp. 247-260.

[19] R. Fisher, "Image Processing Teaching Materials," Univ. of Edinburgh, http://www.dai.ed.ac.uk/HIPR2 (current Oct. 16, 2001).

[20] www.angelsite.de/identalink/how_face_recognition_works.htm (current Oct. 16, 2001).

[21] T. Rzeszewski, "A Novel Automatic Hue Control System," IEEE Trans. Consumer Electron., Vol. CE-21, May 1975, pp. 155-162.

[22] "Average Face," Boston University Computer Help Desk, http://caslab.bu.edu/course/cs585/P2/artdodge/average_face_clipped.jpg (current Oct. 16, 2001).

[23] E. Kreyszig, Advanced Engineering Mathematics, Ed. 7, John Wiley and Sons, Singapore, 1993.

[24] Matlab, "Matlab Help Desk," http://www.mathworks.com (current Oct. 16, 2001).

[25] P. Gray, Psychology, Worth Publishers, New York, 1991.
Appendix A
Program Listings

A selection of code for the three modules – face detection, face normalisation and
face recognition – is included below.

% Recogniser -
%
% The following code performs the recognition stage of the system under static
% mode. Recall that the stages of the face detection and normalisation modules are
% interconnected.

close all, on = 1; off = 0; t = cputime; ncc = off; save = off;   % 'save' here is a flag variable that shadows the built-in save function

% Load test Image


dir_loc = 'C:\My Picture\Standard\';
name = 'Carlos3';

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Face Detection and Normalisation – Convert face to a standardised face
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Read file in
color_image = imread([dir_loc name '.jpg'], 'jpg');
figure, imagesc(color_image);
grey_image = rgb2gray(color_image);
figure, imshow(grey_image), axis on
grey_image = double(grey_image);

%--------------------------------------------------
% Downsample image
[rows, cols, rgb] = size(color_image);
dsamp = 1;
if dsamp,
drate = 4;
imtmp = color_image(:, :, 1);
im(:, :, 1) = imtmp(1:drate:rows, 1:drate:cols);
imtmp = color_image(:, :, 2);
im(:, :, 2) = imtmp(1:drate:rows, 1:drate:cols);
imtmp = color_image(:, :, 3);
im(:, :, 3) = imtmp(1:drate:rows, 1:drate:cols);
end
color_image = im;
%--------------------------------------------------

% Perform skin detection to find the bounding box

image = rgb2ycbcr(color_image);
[rows, cols, rgb] = size(image);

Mask = zeros(256, 256);


for u = 120:165,
for v = 128:170,
Mask(u, v) = 1;
end
end

Cb = image(:, :, 2);
Cr = image(:, :, 3);
[rows, cols, rgb] = size(image);
skin_image = zeros(rows,cols);
numpixels = rows*cols;
for n=1:numpixels
skin_image(n) = Mask(Cb(n),Cr(n));
end
clear Mask u v Cb Cr rows cols rgb numpixels n

Skin_mask_time = cputime-t

% YUV Colour Model


figure, imshow(skin_image), zoom on
figure, imagesc(image(:, :, 3)), zoom on, colorbar, title('V')
figure, imagesc(image(:, :, 2)), zoom on, colorbar, title('U')
figure, imagesc(image(:, :, 1)), zoom on, colorbar, title('Y')

%--------------------------------------------------

% Clean up noise
mimage = bwmorph(skin_image, 'erode');
mimage = bwmorph(mimage, 'dilate');
clear skin_image
figure, imshow(mimage), zoom on, pause

%--------------------------------------------------

% Find Bounding Box & Extract Face


info = imfeature(mimage, 'BoundingBox');
% Draw Box - Round up/down to obtain index
col = floor(info.BoundingBox(1))+1;
row = floor(info.BoundingBox(2))+1;
width = ceil(info.BoundingBox(3));
height = ceil(info.BoundingBox(4));

if dsamp,
col = col*drate; row = row*drate;
width = width*drate; height = height*drate;
end

%--------------------------------------------------

% Extract face from greyscale


box = grey_image(row:row+height-1, col:col+width-1);
h = height-1+10; w = width-1+10;
if row + h > size(grey_image, 1)

h = size(grey_image, 1) - row;
end
if col + w > size(grey_image, 2)
w = size(grey_image, 2) - col;
end
box = grey_image(row-10:row+h-1, col-10:col+w-1);

grey_image(row, col:col+width) = 255;


grey_image(row:row+height, col) = 255;
grey_image(row+height, col:col+width) = 255;
grey_image(row:row+height, col+width) = 255;
figure, imagesc(grey_image), colormap(gray), axis on, title('Skin Region')
f = 4; figure(f), imshow(uint8(box)), axis on, zoom on, title('Box Image')

Determine_box_time = cputime-t

%--------------------------------------------------

% Loading Average Face


if ncc
avgname = 'average_eyes2';
avg = imread([dir_loc avgname '.bmp'], 'bmp');
avg = double(avg);
else
load eigen_template
end
[heights, widths] = size(avg);
rowres = 1; colres = 1;
[no_rows, no_columns] = size(box);

%--------------------------------------------------

% Lighting Normalisation 1
if on
avg_energy = sum(sum(avg.^2));
box_energy = sum(sum(box.^2));
ratio = sqrt(avg_energy/box_energy)

box = box.*ratio;
figure, imshow(uint8(box)), colormap(gray), axis on, title('Light Normalised Box')
end

%--------------------------------------------------

% Downsample by drate
boxsaved = box;
avgsaved = avg;
drate = 2;
if on,
box = box(1:drate:no_rows, 1:drate:no_columns);
avg = avg(1:drate:heights, 1:drate:widths);
no_rows = ceil(no_rows/drate); no_columns = ceil(no_columns/drate);
heights = ceil(heights/drate); widths = ceil(widths/drate);
for person = 1:size(eigentemplate, 1)
% Preset in box define module later
% height_face_box = 90; width_face_box = 80;
x = reshape(eigentemplate(person, :), 90, 80);
x = x(1:drate:90, 1:drate:80);
eigentemplate_new(person, :) = x(:)';
end
figure(f+2), imshow(uint8(box)), axis on, zoom on, title('DownSampled Box')
eigentemplate = eigentemplate_new;
clear x eigentemplate_new
end

%--------------------------------------------------

% FaceSpace search using avgface


% NCC on average face - Match Filtering
dist_rows = no_rows -heights;
dist_cols = no_columns - widths;
distance = zeros(dist_rows, dist_cols);
for counter_1 = 1:(dist_rows) / (rowres);
row = counter_1 * rowres;
disp(['row ', num2str(row), '/', num2str(dist_rows)]);

for counter_2 = 1:(dist_cols) / (colres);
column = counter_2 * colres;
test_window = box(row:(row + heights - 1), column:(column + widths - 1));

%--------------------------------------------------
% Additional Lighting normalisation
if off
avg_energy = sum(sum(avg.^2));
box_energy = sum(sum(test_window.^2));
ratio = sqrt(avg_energy/box_energy);
test_window = test_window.*ratio;
else
ratio = 1;
end
%--------------------------------------------------

if ncc
distance(row, column) = ncc_cal(test_window, avg);
else
norm_window = test_window(:) - avg(:);
weights = eigentemplate * norm_window;
sum_weights = (sum(weights.^2)).*ratio;
% distance(row, column) = (norm(norm_window(:))^2) - sum(weights.^2);
distance(row, column) = 1 - ( sum_weights / (norm(norm_window(:))^2) );
end
end
end

% Minimise effect of edges for NCC


if ncc,
distance(1:3, :) = distance(1:3, :)-0.1;
distance(dist_rows-20:dist_rows, :) = distance(dist_rows-20:dist_rows, :)-0.1;
%distance(dist_rows-40:dist_rows, :) = distance(dist_rows-40:dist_rows, :)-0.1;
distance(:, 1:5) = distance(:, 1:5)-0.1;
distance(:, dist_cols-5:dist_cols) = distance(:, dist_cols-5:dist_cols)-0.1;
end

if ncc,
[void, index] = max(distance(:));
else
[void, index] = min(distance(:));
end
column = fix( (index(1)-1) / (no_rows - heights) ) + 1;
row = rem(index(1), (no_rows - heights));

if (row == 0)
row = no_rows-heights;
end
disp(['face identified at row ', num2str(row), ' column ', num2str(column)]);

figure(f+2)
set(line([column + (widths / 2); column + (widths / 2)], [row; row + heights]), 'Color', [1 1 0], 'Linewidth', 1);
set(line([column; column + widths], [row + (heights / 2); row + (heights / 2)]), 'Color', [1 1 0], 'Linewidth', 1);

figure(f)
row = row*drate; column = column*drate;
heights = heights*drate; widths = widths*drate;
%set(line([column + (widths / 2); column + (widths / 2)], [row; row + heights]), 'Color', [1 1 0], 'Linewidth', 1);
%set(line([column; column + widths], [row + (heights / 2); row + (heights / 2)]), 'Color', [1 1 0], 'Linewidth', 1);
figure, imagesc(distance), colorbar
dist=distance;

%--------------------------------------------------

% Redefine bounding box after successfully finding the centre


if ncc,
height_up = 30; height_face_box = height_up + 60;
else
% FaceSpace search locates centre rather than finding eyeline (ncc)
height_up = 45; height_face_box = height_up + 45;
end
width_left = 40; width_face_box = width_left + 40;
% row and column is the top left corner of the box that contains most probable face
row = row + floor(heights/2) - height_up;
col = column + floor(widths/2) - width_left;
face_box = boxsaved(row:row+height_face_box-1, col:col+width_face_box-1);
figure, imshow(uint8(face_box)), title('New Face Box')

%--------------------------------------------------

% ReLight Normalise the Box


if on
avg_energy = sum(sum(avgsaved.^2));
facebox_energy = sum(sum(face_box.^2));
ratio = sqrt(avg_energy/facebox_energy)
face_box = face_box.*ratio;
figure, imshow(uint8(face_box)), colormap(gray), axis on, title('ReLight Box')
end

%--------------------------------------------------

% Save image
if save
dir = 'C:\My Documents\';
imwrite(uint8(face_box), [dir name '.bmp'], 'bmp');
end

%--------------------------------------------------

% Place on canvas
% input - row & column position from facespace search
width = 170; height = 128;
% The aim is to have the centre of face = centre canvas
canvas = zeros(height, width);
row_start = floor((height/2) - (height_face_box/2));
col_start = floor((width/2) - (width_face_box/2));
canvas(row_start:row_start+height_face_box-1, col_start:col_start+width_face_box-1) = face_box;
canvas = uint8(canvas);
figure, imshow(canvas), axis on
%--------------------------------------------------

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Face Recognition - Now face image is standard we can find weights and find identity
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Recognise by finding trial_weights comparing with known_weights


load eigen_info
canvas = double(canvas);
norm_window = canvas(:) - avgface(:);
trial_weights = eigenfaces * norm_window;
distance = zeros(1, num_faces);
for person = 1:num_faces
distance(person) = norm(trial_weights - known_weights(:, person));
end
[distance, index] = sort(distance);
person = index(1);
disp(['Person identified as person number ', num2str(person)]);
disp(['Person identified first as ', face_names{person}, ' : distance = ', num2str(distance(1))]);
disp(['Person identified second as ', face_names{index(2)}, ' : distance = ', num2str(distance(2))]);
disp(['Person identified third as ', face_names{index(3)}, ' : distance = ', num2str(distance(3))]);
disp(['Person identified fourth as ', face_names{index(4)}, ' : distance = ', num2str(distance(4))]);
disp(['Person identified fifth as ', face_names{index(5)}, ' : distance = ', num2str(distance(5))]);
disp(['Person identified sixth as ', face_names{index(6)}, ' : distance = ', num2str(distance(6))]);
disp(['Person identified seventh as ', face_names{index(7)}, ' : distance = ', num2str(distance(7))]);
disp(['Person identified eighth as ', face_names{index(8)}, ' : distance = ', num2str(distance(8))]);
%disp(['Person identified ninth as ', face_names{index(9)}, ' : distance = ', num2str(distance(9))]);
%disp(['Person identified tenth as ', face_names{index(10)}, ' : distance = ', num2str(distance(10))]);
clear num_faces norm_window distance index

% output - person, trial weights
dir_orig = 'C:\My Documents\Original Face\';
imshow(imread([dir_orig face_names{person} '.jpg'], 'jpg'));
Recognised = cputime-t

if on
speak(face_names{person})
end

%--------------------------------------------------

clear figure_n self_define dir_loc name


Total_time = cputime-t
