Carlos Leung
School of Information Technology and Electrical Engineering
University of Queensland
October 2001
Carlos Leung
11 Reagan Place
Stretton QLD 4116
Head of School
School of Information Technology and Electrical Engineering
University of Queensland
St Lucia QLD 4072
I declare that the work submitted in this thesis is my own, except as acknowledged
in the text and footnotes, and has not been previously submitted for a degree at the
University of Queensland or any other institution.
Yours sincerely,
Carlos Leung
Acknowledgement
The success of this thesis would not have been possible without the constant
encouragement, advice and support from a vast number of people.
A high five thank you to the entire team of “SIP lab Gurus” – Laser Ben, Apple
Ben, Obi Wan and Petair. It has been fun working with all of you. The many
difficult times and stressful late nights have been made easier by your
companionship and your encouraging support. A special thanks must be extended
to Ben Appleton for his willingness to share his genius and his continual patience in
answering my many questions.
Thanks also to all my friends and family for their support throughout the year, and
to Cheryl for slaving over all my proofreading demands. To my personal printing
specialist, Uncle Colin, thank you for the endless nights spent in front of the printer
and your many wacky ideas. A super special thank you must also be extended to
Yan for her continual spiritual support and always being there to listen to my
whining.
And last but not least, thanks also to all the people who have so willingly donated
their faces for my analysis. This thesis would undoubtedly be impossible without
your generosity.
Abstract
As continual research is being conducted in the area of computer vision, one of the
most practical applications under vigorous development is in the construction of a
robust real-time face recognition system. While the problem of recognising faces
under gross variations remains largely unsolved, a demonstration system has been
developed as proof of concept that such systems are now becoming practical. A
system capable of reliable recognition, with reduced constraints on the position
and orientation of the face and on the illumination and background of the image,
has been implemented with 96% recognition accuracy.
The design of the face recognition system is based upon “eigenfaces” and has been
separated into three major modules – face detection, face normalisation and face
recognition. Face location and normalisation are performed in real-time, and
consistent accuracy in face detection is recorded with video input. Under dynamic
mode, where recognition is accomplished from real-time video input, the system has
been demonstrated to perform at a maximum speed of two frames per second under
Matlab. While a completely robust real-time face recognition system is still under
heavy investigation and development, the implemented system serves as an
extendable foundation for future research.
Contents
Acknowledgement
Abstract
1 Introduction
1.1 Task Definition
1.2 Thesis Structure
4 System Design
4.1 Design Architecture
4.2 Face Recognition
4.2.1 The Eigenface Approach
4.3 Face Detection
4.3.1 Skin Detection
4.3.2 Refined Face Search
4.4 Face Normalisation
4.4.1 Lighting Normalisation
4.4.2 Scaling Normalisation
4.4.3 Rotation Normalisation
4.4.4 Background Subtraction
5.1 Face Detection
5.1.1 Skin Detection
5.1.2 NCC Search
5.1.3 Face Space Search
5.2 Face Normalisation
5.2.1 Scaling Normalisation
5.2.2 Rotation Normalisation
5.2.3 Lighting Normalisation
5.3 Face Recognition
5.3.1 Training Stage
5.3.2 Recognition Stage
5.4 Integrated System Performance
7 Conclusion
References
Appendix A Program Listings
List of Figures
Figure 5-15 Lighting normalisation
Figure 5-16 Computed eigenfaces
Figure 5-17 Face database
Figure 5-18 Sample results of recognition stage
Figure 5-19 Example of complete face recognition system
1 Introduction
As continual research is being conducted in the area of computer vision, one of the
most practical applications under vigorous development is in the construction of a
robust real-time face recognition system. With the recent major terrorist attacks in
the United States, there has been increasingly substantial interest in the
development of intelligent surveillance cameras that can automatically detect and
recognise known criminals as well as suspicious characters. Due to such uncertain
times, humans are beginning to seek support from computer systems to aid in the
process of identification and location of faces in everyday scenes. Smart buildings
can be implemented whereby the presence of unknown dubious individuals can be
brought to the attention of building security for appropriate action, and smart
computers can be used to load personal preferences and needs. Entertainment
companies are particularly interested in systems where specific actors can be
searched for and located in a video sequence, so that their movements can be
tracked throughout the entire movie.
While solutions to the task of face recognition have been presented, recognition
performances of many systems are heavily dependent upon a strictly constrained
environment. The problem of recognising faces under gross variations remains
largely unsolved. This thesis addresses the problem of developing a real-time face
recognition system under reduced constraints. The design of the face recognition
system is based upon “eigenfaces” and has been separated into three major modules
– face detection, face normalisation and face recognition.
Face detection is accomplished by first performing a skin search of the input image
based on colour segmentation. Although skin colours differ from person to person,
it was found that the colour remained distributed over a very small region on the
chrominance plane. Normalised cross correlation and face space decomposition are
then applied in order to locate the exact position of the face. Since the application
of eigenfaces to the task of face recognition requires a perfectly standardised and
aligned database of faces, face normalisation modules are inserted between the
detection stages to account for possible scaling, planar rotational and illumination
differences.
1.1 Task Definition
Due to the time constraint and the complexity of implementing the system in C++,
the aim is to design a prototype under Matlab that is optimised for recognition
performance. The system should accept varying forms of input at different sizes
and image resolutions, and should be well coded and documented for easier future
development.
1.2 Thesis Structure
A documentation of the design, evaluation and discussion of the integrated real-time
face recognition system is presented in this thesis. Following this introduction, a
literature review of past techniques in face detection and recognition is presented
in Chapter 2. Chapter 3 is dedicated to the analysis of the underlying
mathematical theories behind the many methods used throughout the system,
including a detailed exploration of the eigenface method. In turn, Chapter 4
explains the methodology and implementation of each individual module and an
evaluation of the designed face recognition system is provided in Chapter 5. Based
upon the performance of the system, the entire design is reviewed in Chapter 6, and
methods in which the design may be improved or extended are highlighted. Chapter
7 concludes the thesis with an overview of the successes accomplished by this
project. A listing of the Matlab code for the recognition system is included in
Appendix A.
2 Literature Review
Many have tried to solve the problem of face recognition, a task that even babies
perform comfortably, effortlessly and naturally from the time they are born. Not
only have engineers and scientists been involved in developing face recognition
systems; psychologists have also investigated how humans perceive the task of
face recognition and how faces are stored in the human
memory [2]. Research has been conducted in search of the distinctiveness of faces
and the contributing factors that aid humans in recognising faces [3]. From
Klatzky’s analysis on a human’s ability to recognise cartoons and caricatures [4],
although results suggest that recognition of faces is based on distinctive features,
strong evidence shows that faces tend to be recognised as an integral stimulus.
Although research into face recognition through cognitive psychology provides an
interesting alternative view of the problem, most of the work performed has
produced results that remain at the experimental stage rather than yielding a real
working system. The few face recognition systems proposed in the cognitive
psychology literature, such as Facenet [5], all employ a neural network
approach with a multilayer network solution.
The use of neural networks has also been a popular approach in designing a face
recognition system and is still undergoing intensive development and research. This
technique uses Self Organizing Maps and Convolutional Neural Networks to
describe a database of faces [6], whereupon a three dimensional neural network is
constructed that has the ability to connect the obtained data. Kohonen [7] extended
these approaches by using an associative network with a simple learning algorithm
to classify face images and recalling face images from incomplete networks.
Fleming and Cottrell [8] further improved upon these ideas to use non-linear units,
training the system by back propagation. The major difficulty with using a
connectionist approach is a neural network’s inability to extend to a large dataset,
since even for a small image of 128 × 128 pixels, a neural network with more than
16,000 inputs is required. The large amount of redundancy makes this approach inefficient
and impractical until further advances and improvements have been discovered and
developed.
Techniques for face recognition date back as early as 1888, when Francis Galton [9]
proposed a recognition system based on the detection and comparison of the relative
distances between key facial features, such as eye corners and mouths. Research
into how humans perceive and recognise faces by Carey and Diamond [10], however,
has shown that adults do not identify human faces based on the immediate
relationship between individual facial features. Recognition through feature
distances is moreover quite fragile and thus extremely sensitive to changes.
Nonetheless, recognition techniques based on detecting individual features are still
popular and used in conjunction with other techniques to improve performances.
A different technique that scales and normalises the facial features based on their
relative importance has been proposed by Turk and Pentland [1]. This method
decomposes each facial image into a set of eigenvector components, essentially
capturing the variations in a collection of face images independent of any judgement
of particular facial features. With each component representing a certain dimension
and description of the face, the whole set of eigenvectors characterises the variations
between different faces. Every pixel in the image contributes to the formation of
the eigenvectors; thus each eigenvector is essentially an image of the face with a
certain deviation from the average face, depending on the local and global facial
features. Each of these eigenvectors of the faces is called an eigenface, hence this
technique is termed the “eigenface approach”.
The eigenface approach uses a technique developed by Sirovich and Kirby [12]
called principal component analysis (PCA). PCA effectively and efficiently
represents pictures of faces in terms of their eigenface components. With a given
set of weights for each face image and a set of standard pictures, they argue that any
face image can be approximately reconstructed by combining all the standard faces
according to their relative weights. This idea of linear combination is the backbone
of the eigenface technique, and has been proven extremely successful.
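The linear-combination idea can be sketched numerically. The following Python fragment is illustrative only (the thesis code is in Matlab); the four-pixel "faces" and the single eigenface are toy values invented for the example.

```python
# Illustrative sketch: a face as a linear combination of eigenfaces.
# The 4-pixel "images" and the single eigenface are toy values.

def reconstruct(mean_face, eigenfaces, weights):
    """Rebuild a face: the mean face plus a weighted sum of eigenfaces."""
    face = list(mean_face)
    for w, ef in zip(weights, eigenfaces):
        for i in range(len(face)):
            face[i] += w * ef[i]
    return face

mean_face = [100.0, 120.0, 110.0, 90.0]   # average face (toy values)
eigenfaces = [[0.5, -0.5, 0.5, -0.5]]     # one unit-norm eigenface
weights = [20.0]                          # projection weight for this face

print(reconstruct(mean_face, eigenfaces, weights))
# [110.0, 110.0, 120.0, 80.0]
```

In the full system the weights are obtained by projecting an input face onto each eigenface, and it is these few weights, not the raw pixels, that are compared during recognition.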
Before face recognition can be performed in video streams, detection of the faces is
an important task. Investigations into designing an automated face detection system
in MPEG video were accomplished by Wang and Chang [13]. Although many of
the current advanced multimedia applications and technology had not been
introduced at that time, face detection in video sequences was successfully
implemented. From then on, many research projects took place and successful
results were obtained with Lorente and Torres [14] presenting various solutions in
image segmentation and face location. Although extremely controlled
environments were used as the test sequence, modifications to cope with different
possible face conditions in a video were considered.
Despite the many successes and papers presented in the area of face recognition,
creating a real-time system is still a gruelling challenge. The eigenface approach
has proven extremely successful in constrained environments; the challenge is to
extend this to cope with the many possible facial orientations in a video sequence.
After locating the face image in a video stream, different illumination conditions,
different size scales, face occlusions and facial expressions are amongst the many
obstacles still to be solved.
3 Theory
The face recognition system has been separated into three major stages:
1. Face Location
2. Face Normalisation
3. Face Recognition
While Chapter 4 will focus on the details and design of each individual module, a
presentation of the theory and mathematics behind the techniques used in these
stages will be discussed in this chapter.
The face normalisation module will then aim to transform the image into a
standardised format, where differing scaling factors and rotational angles of faces,
differing lighting conditions and background environment, as well as differing
facial expressions of faces will be considered. The normalised faces can then be
entered into the face recognition module, either to add a new face into the database
or to recognise a face from the existing database.
As discussed in Chapter 2, much research and focus has been placed upon the face
recognition stage, and most developmental work has been performed using photo
databases and still images that were created under a constant predefined
environment. Since a controlled environment removes the need for extensive
normalisation adjustments, reliable techniques have been developed to recognise
faces with reasonable accuracy provided the database contains perfectly aligned and
normalised face images.
Thus the challenge for face recognition systems lies not in the recognition of the
faces, but in normalising all input face images to a standardised format that is
compliant with the strict requirements of the face recognition module. Although
there are models developed to describe faces, such as texture mapping with the
Candide model [15], there are currently no developed definitions or mathematical
models defining what the important and necessary components are in describing a
face.
Using still images taken under constrained conditions as inputs, it is reasonable to
omit the face normalisation stage. However, when locating and segmenting a face in
complex scenes under unconstrained environments, such as in a video scene, it is
necessary to define and design a standardised face database. As a result, numerous
configurations and schemes have been proposed and reviewed throughout the
design of this face recognition system in order to create such a database.
Techniques such as alignment of faces relative to the face centroid, centre of mass
of skinned surfaces, and distances between the eyes and the nose, have all been
considered.
Starting with a collection of original face images, PCA aims to determine a set of
orthogonal vectors that optimally represent the distribution of the data. Any face
images can then be theoretically reconstructed by projections onto the new
coordinate system. In search of a technique that extracts the most relevant
information in a face image to form the basis vectors, Turk and Pentland proposed
the eigenface approach, which effectively captures the variations within an
ensemble of face images.
Mathematically, the eigenface approach uses PCA to calculate the principal
components and vectors that best account for the distribution of a set of faces
within the entire image space. Considering an image as being a point in a very
high dimensional space, these principal components are essentially the
eigenvectors of the covariance matrix of this set of face images, which Turk and
Pentland termed eigenfaces. Each individual face can then be represented exactly
by a linear combination of eigenfaces, or approximately, by a subset of “best”
eigenfaces – those that account for the most variance.

Figure 3-1 Faces are linear combinations of eigenfaces.
Consider an N-by-N face image I(x, y) as a vector of dimension N^2, so that the
image can be thought of as a point in N^2-dimensional space. A database of M
images can therefore be mapped to a collection of points in this high dimensional
“face space” as Γ_1, Γ_2, Γ_3, …, Γ_M. With the average face of the image set defined as
Ψ = (1/M) ∑_{n=1}^{M} Γ_n    (3.1)
each face can be mean normalised and be represented as deviations from the
average face by Φi = Γi - Ψ. The covariance matrix, defined as the expected value
of ΦΦT can be calculated by the equation
C = (1/M) ∑_{n=1}^{M} Φ_n Φ_n^T    (3.2)
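Equations (3.1) and (3.2) can be sketched directly. This Python fragment is illustrative (the thesis prototype is in Matlab) and uses toy two-pixel "faces" so the arithmetic can be followed by hand.

```python
# Sketch of Eq. (3.1)-(3.2) on toy data: M = 3 "faces" of 2 pixels each.

def average_face(faces):
    """Psi = (1/M) * sum of the face vectors Gamma_n."""
    M, n_pix = len(faces), len(faces[0])
    return [sum(f[i] for f in faces) / M for i in range(n_pix)]

def covariance(faces):
    """C = (1/M) * sum of Phi_n Phi_n^T, with Phi_n = Gamma_n - Psi."""
    psi = average_face(faces)
    M, n = len(faces), len(faces[0])
    phis = [[f[i] - psi[i] for i in range(n)] for f in faces]
    return [[sum(p[i] * p[j] for p in phis) / M for j in range(n)]
            for i in range(n)]

faces = [[1.0, 2.0], [3.0, 2.0], [2.0, 5.0]]
print(average_face(faces))   # [2.0, 3.0]
print(covariance(faces))
```

Note that for real images each face vector has N^2 entries, so C is N^2-by-N^2; the reformulation derived below avoids ever forming this matrix explicitly.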
It is reasonable to assume that each face image is independent, that is, each image
is taken separately. If we further assume that all images input into the face
recognition module have been perfectly normalised, the variances of each Φ_i
correspond, resulting in a set of aligned images. These assumptions are significant
because, together with the fact that each Φ_i is zero-mean, they imply that the
cross products between different Φ_i have an expected value of zero.
The above conclusion allows us to express the covariance matrix in an alternative
form. Letting matrix A = [Φ1, Φ2 … ΦM], based on our conclusions, Eq. (3.2) can
be rewritten as
C = (1/M)(A A^T)    (3.3)

Since the factor 1/M will only affect the scaling of the output during eigenvector
analysis, we can omit this scaling factor in our calculation, resulting in

C ≅ A A^T    (3.4)
Given the covariance matrix C, we can now proceed with determining the
eigenvectors u and eigenvalues λ of C in order to obtain the optimal set of principal
components, a set of eigenfaces that characterise the variations between face
images. Consider an eigenvector ui of C satisfying the equation
C u_i = λ_i u_i    (3.5)

Premultiplying both sides by u_i^T gives

u_i^T C u_i = u_i^T λ_i u_i = λ_i u_i^T u_i    (3.6)

and, since the eigenvectors are orthonormal,

u_i^T u_j = 1 if i = j, 0 if i ≠ j    (3.7)
Combining Eq. (3.2) and (3.7), Eq. (3.6) thus becomes [17]

λ_i = u_i^T C u_i
    = u_i^T [ (1/M) ∑_{n=1}^{M} Φ_n Φ_n^T ] u_i
    = (1/M) ∑_{n=1}^{M} u_i^T Φ_n Φ_n^T u_i
    = (1/M) ∑_{n=1}^{M} (u_i Φ_n^T)^T (u_i Φ_n^T)
    = (1/M) ∑_{n=1}^{M} (u_i Φ_n^T)^2
    = (1/M) ∑_{n=1}^{M} (u_i Γ_n^T − mean(u_i Γ_n^T))^2
    = (1/M) ∑_{n=1}^{M} var(u_i Γ_n^T)    (3.8)
Eq. (3.8) shows that the eigenvalue corresponding to the ith eigenvector represents
the variance of the representative face image. Turk and Pentland thus suggest [1]
that by selecting the eigenvectors with the largest corresponding eigenvalues as the
basis vectors, the set of dominant vectors that expresses the greatest variance is
selected.
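Selecting the dominant basis is then a matter of sorting the eigenpairs by eigenvalue. The fragment below is an illustrative Python sketch (not the thesis's Matlab code) with invented toy eigenvalues and eigenvectors.

```python
# Selecting the dominant eigenfaces: sort eigenpairs by eigenvalue
# (largest variance first) and keep the top k. Toy values throughout.

def top_eigenfaces(eigvals, eigvecs, k):
    pairs = sorted(zip(eigvals, eigvecs), key=lambda p: p[0], reverse=True)
    return [vec for _, vec in pairs[:k]]

eigvals = [0.5, 4.0, 1.2]
eigvecs = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(top_eigenfaces(eigvals, eigvecs, 2))   # [[0, 1, 0], [0, 0, 1]]
```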
Consider now the eigenvectors v_i of the smaller matrix A^T A, such that

A^T A v_i = µ_i v_i    (3.9)

Premultiplying both sides by A gives A A^T A v_i = µ_i A v_i, that is,

C A v_i = µ_i A v_i    (3.10)

Comparing with the standard eigenvector definition in Eq. (3.5), Eq. (3.10) implies
that the products A v_i are eigenvectors of C, with µ_i being the corresponding
eigenvalues.
Thus rather than calculating the N^2 eigenvectors of AA^T, we can instead compute
the eigenvectors of A^T A, and multiply the results by A in order to obtain the
eigenvectors of the covariance matrix, C = AA^T. Recalling that A = [Φ1, Φ2 …
ΦM], the matrix multiplication A^T A results in an M-by-M matrix. Since M is the
number of faces in the database, the eigenvector analysis is reduced from the order
of the number of pixels in the images (N^2) to the order of the number of images in
the training set (M). In practice, the training set is relatively small (M << N^2) [1],
making the computations mathematically manageable.
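The A^T A trick can be sketched end to end on toy data. This Python fragment is illustrative only (the thesis uses Matlab's eigen-solvers); here a simple power iteration stands in for a full eigendecomposition and finds just the dominant eigenvector.

```python
# Sketch of the A^T A trick with power iteration (toy data, no linear
# algebra library). A is N^2-by-M; we find the dominant eigenvector of
# the small M-by-M matrix A^T A, then map it back through A.

def matvec(mat, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in mat]

def power_iteration(mat, steps=200):
    """Dominant eigenvector of a small symmetric matrix."""
    v = [1.0] * len(mat)
    for _ in range(steps):
        w = matvec(mat, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# A = [Phi_1 ... Phi_M]: 4 "pixels", M = 2 mean-normalised faces.
A = [[1.0, 0.0], [2.0, 1.0], [0.0, 1.0], [1.0, 2.0]]
M = len(A[0])
AtA = [[sum(A[k][i] * A[k][j] for k in range(len(A))) for j in range(M)]
       for i in range(M)]          # small M-by-M matrix
v = power_iteration(AtA)           # eigenvector of A^T A
u = matvec(A, v)                   # A v is an eigenvector of C = A A^T
print(u)
```

Only the M-by-M matrix is ever decomposed; the N^2-dimensional eigenface u is recovered at the end by the single product A v.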
Consider the singular value decomposition (SVD) of matrix A, which results in a
diagonal matrix S of the same dimension as A with nonnegative diagonal elements
in decreasing order, and unitary matrices U and V such that A = U S V^T and
A^T = V S U^T. The covariance matrix, C = A A^T, would therefore be equal to

A A^T = U S V^T V S U^T = U S^2 U^T    (3.11)

We can split this into a combination of rank-one matrices such that

A A^T = ∑_{n=1}^{N^2} s_n^2 u_n u_n^T    (3.12)

If the total N^2 eigenvectors of the covariance matrix were found, it would follow
from Eq. (3.5) that a spectral decomposition (decomposing a matrix into orthogonal
sub-bases) of the covariance matrix would give

C = A A^T = ∑_{n=1}^{N^2} λ_n e_n e_n^T    (3.13)
where λ_n represents the eigenvalues and e_n the basis vectors. As expected, Eq.
(3.12) and (3.13) show that the s_n^2 values are the eigenvalues, while the u_n
values correspond to the eigenvectors of C. By convention, an SVD analysis orders
the basis vectors in descending order, such that

A^T A = V S U^T U S V^T = V S^2 V^T = ∑_{n=1}^{M} s_n^2 v_n v_n^T    (3.14)
Notice from Eq. (3.14) that the SVD expansion of ATA results in only M terms.
This is expected since the multiplication ATA gives an M-by-M matrix.
Using the same assumption as in Eq. (3.13) and comparing the outcome of Eq.
(3.14) to Eq. (3.12), we observe that the ordered s_n^2 values are the same in both
equations. This gives strong evidence that each derived eigenvector v_n corresponds
to a respective eigenvector u_n. Since we have concluded that the eigenvectors u_n are
ordered according to their dominance, we have demonstrated that the M derived
eigenvectors v_n correspond to the first M eigenvectors u_n, such that the set v_1 to
v_M forms our desired dominant basis vectors.
This conclusion follows not only mathematically but also logically if we consider the
number of meaningful data points. If the number of data points in face space is less
than the dimension of the space itself, which in our case is true since M << N^2, it
follows logically that there will only be M – 1, rather than N^2, meaningful
eigenvectors. The remaining eigenvectors will therefore have associated
eigenvalues of zero [1].
This coincides with the outcome of the SVD analysis. If the first M eigenvectors of
Eq. (3.12) are essentially the vectors derived in Eq. (3.14), the remaining
eigenvalues corresponding to the non-dominant eigenvectors of Eq. (3.12) must
therefore be zero in order for the equations to be mathematically correct.
3.5 Area-Based Matching
Area-based methods are based on the analysis of the grey level pattern around the
point of interest and on the search for the most similar pattern in the successive
image [18]. Having defined a window W(x) around the point x, a similar window is
considered from the search image, and a similarity measure is used to identify how
closely matched the two windows are. There are two main types of area-based
matching metrics – a distance measure and a correlation based method.
Using a distance measure comparison, the minimised distance symbolises the
closest match, since the distance function aims to capture the difference between the
intensity values of the two windows. Examples of the most common distance
measures include SAD (Sum of Absolute Differences), SSD (Sum of Squared
Differences) and ZSAD (Zero Mean Sum of Absolute Differences).
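The three distance measures named above can be sketched as follows. This is an illustrative Python fragment (the thesis prototype is in Matlab), with windows given as flat lists of grey-level values.

```python
# The distance measures named above, sketched for two equal-sized
# windows given as flat lists of grey-level values.

def sad(w1, w2):
    """Sum of Absolute Differences."""
    return sum(abs(a - b) for a, b in zip(w1, w2))

def ssd(w1, w2):
    """Sum of Squared Differences."""
    return sum((a - b) ** 2 for a, b in zip(w1, w2))

def zsad(w1, w2):
    """Zero-mean SAD: subtract each window's mean first."""
    m1, m2 = sum(w1) / len(w1), sum(w2) / len(w2)
    return sum(abs((a - m1) - (b - m2)) for a, b in zip(w1, w2))

a = [10, 20, 30, 40]
b = [12, 18, 33, 41]
print(sad(a, b), ssd(a, b))   # 8 18
```

The zero-mean variant is useful when the two windows differ by a constant brightness offset, which SAD and SSD would otherwise penalise.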
The closest match using the correlation-based technique is when the correlation
measure is maximised. Considering WL(x, y) as the window to be matched in
Image L of dimension N-by-N centred on the point (x, y), and WR(x+u, y+v) as the
window in Image R that is displaced from WL(x, y) by (u, v), the cross correlation
of the two windows is defined as

CC = ∑_{i,j=−N/2}^{N/2} W_L(x+i, y+j) W_R(x+i+u, y+j+v)    (3.16)
To improve on the plain cross correlation method, which can be over sensitive to
local characteristics, a normalised cross correlation method (NCC) is used, which
divides the correlation by the standard deviations of the signals in the window.
NCC is a commonly used matching technique, for it provides a measure of how
closely related one window of pixels is to another based on the values of each
pixel inside the window, such that the closest matched window will score
the highest correlation value. From this, we can determine the disparity of the
image. NCC has values ranging from –1 to 1, and a perfect match has an NCC
value of 1.
Defining the standard deviations of the windows as σ_L and σ_R, the equation for
NCC is:

NCC = [ ∑_{i,j=−N/2}^{N/2} W_L(x+i, y+j) W_R(x+i+u, y+j+v) ] / (σ_L σ_R)    (3.17)
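A sketch of the NCC computation follows. This Python fragment is illustrative (the thesis code is in Matlab) and uses the zero-mean form: Eq. (3.17) divides by the window standard deviations, and subtracting the window means as well is what bounds the score to the [−1, 1] range described above.

```python
def ncc(wl, wr):
    """Normalised cross correlation of two equal-sized windows
    (zero-mean form, so a perfect match scores 1.0)."""
    n = len(wl)
    ml, mr = sum(wl) / n, sum(wr) / n
    dl = [a - ml for a in wl]
    dr = [b - mr for b in wr]
    num = sum(a * b for a, b in zip(dl, dr))
    sl = sum(a * a for a in dl) ** 0.5
    sr = sum(b * b for b in dr) ** 0.5
    return num / (sl * sr)

w = [10, 20, 30, 40]
print(ncc(w, w))                       # 1.0 for a perfect match
print(ncc(w, [2 * x + 5 for x in w]))  # still 1.0: invariant to gain/offset
```

The gain-and-offset invariance shown in the last line is exactly why NCC is preferred over plain cross correlation when lighting varies between frames.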
3.6 Morphology
Morphology is known to many as a branch of biology that deals with form and
structure of animals and plants. Morphology in mathematics is similarly a study of
regions and shapes in images. Morphological techniques are well developed for
binary images, but many methods can be successfully extended to greyscale. For a
binary image, white pixels are normally taken to represent foreground regions,
while black pixels denote background.
One can visualise the erosion process as islands being eroded away as the sea level
rises, where islands are represented by the ones in the binary image. As erosion is
applied, the sea level will rise a little bit so that all the bordering pixels surrounding
each island will disappear. Logically, this will mean that the erosion will begin by
first eliminating the smallest islands, while the larger islands are being shrunk. As
subsequent erosion operations are applied, the largest island will be the last
surviving land-piece, and the inner most point or the centre of the island will be the
last point to be eroded.
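The island analogy can be sketched directly. This illustrative Python fragment (the thesis prototype is in Matlab) erodes a binary image with a 3 × 3 structuring element; the toy "island" is invented for the example.

```python
# Sketch of binary erosion with a 3x3 structuring element: a foreground
# pixel (1) survives only if all of its neighbours are also foreground.

def erode(img):
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if all(img[y + dy][x + dx]
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                out[y][x] = 1
    return out

island = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(erode(island))  # only the centre pixel survives
```

One application of erode corresponds to the sea level rising one step: the 3 × 3 island shrinks to its single centre pixel, and a further application would remove it entirely.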
Figure: examples of dilation and erosion.
4 System Design
Due to the complexity of the face recognition problem, a modular approach was
taken whereby the system was separated into smaller individual stages. Focus was
then placed upon making each stage a reliable section before integrating the
modules into a complete system. The face recognition system includes three major
tasks – face detection, face normalisation and face recognition – and each of these
tasks is further broken down into separate stages. This chapter details the design
of each of these stages, providing a detailed description of how the system was
constructed.
Figure 4.1 presents a block diagram of the integrated face recognition system. The
face detection module has been separated into three main stages - skin detection,
NCC search and face space search. NCC and face space search are not used
simultaneously; the choice between them is based on the status of the database. The face
normalisation module includes a scaling, rotation and two lighting normalisation
stages, and the recognition module involves a training stage and a recognition stage.
Notice that the face detection and normalisation modules are interconnected,
meaning that the success of the face detection module is dependent upon the
accuracy of the normalisation modules and vice versa. The joint effort of the
detection and normalisation modules is to produce a standardised
database for the recognition stages.
Figure 4-1 System Architecture of the face detection (yellow), face normalisation (aqua), and face
recognition module (magenta).
Due to the difficulty in producing a robust system that can operate under any
environment and face orientations, two modes of operations have been devised for
this system – static mode and dynamic mode. Under static mode, recognition is
performed on still images captured under a constrained environment. It is assumed
that faces are properly scaled and without rotation, such that the unreliable scaling
and rotational normalisation modules can be omitted during static mode operations.
Dynamic mode on the other hand is designed to operate on video input so that real-
time face recognition can be performed. From a design perspective, the main
difference between the two modes is that in static mode, scaling and rotation is not a
concern, and thus focus can be placed upon making the other stages reliable before
attempting to relax the constraints on the input faces.
Notice that both modes of operation have been designed to operate without a
restriction on the illumination and the background of the image. There is also no
restriction on the location of the face in the scene, and different facial expressions
are, to a certain extent, allowed. Under such relaxed conditions, the face
recognition system has been demonstrated to perform efficiently at a high accuracy
rate. The results and evaluations of the system are presented in Chapter 5.
Since we are trying to design a working robust model for a face recognition system,
Matlab has been selected as the design platform, for it is a programming language
optimal for prototyping and image processing tasks. Translation into C++ remains
for future work, when higher speed, portability and aesthetics are of concern.
4.2 Face Recognition
When designing a complex system, it is important to begin with strong foundations
and reliable modules before optimising the design to account for variations.
Provided a perfectly aligned standardised database is available, the face recognition
module is the most reliable stage in the system. As discussed in Chapter 3, the
biggest challenge in face recognition still lies in the normalisation and
preprocessing of the face images so that they are suitable as input into the
recognition module. Hence, the face recognition module was designed and
implemented first.
Given a perfect set of faces, in which scale, rotation, background and
illuminance are controlled, the recognition module can be designed to work with
ideal inputs; it is crucial that the performance of this foundation module be as
optimised as possible. Its ability to recognise an ideal database will determine
the best possible performance attainable by the complete system. Any subsequent
development and implementation of the face detection and normalisation modules
will therefore be aimed at providing this ideal database.
decomposition, and compare the resultant weights with the closest matching
weights in the database to determine the identity of the input.
Without discussing the mathematics of the algorithm, the eigenface training stage
can be broken down into the following steps:
2. Compute the average face of the database by summing the intensity values
of the corresponding pixels across all faces and dividing by the total number
of faces, M.
3. Subtract the average face from T to obtain matrix A, a zero-mean matrix in
which each element represents the deviation of a pixel’s intensity from the mean.
7. Obtain the dominant basis vectors that describe and characterise the face
database by normalising the eigenfaces from step 6, dividing each by its
vector norm.
The nine steps described above transform a database of face images into a set of
projections onto the constructed face space. If a large database is available for
training, that is, if M is large enough that a representative set of eigenfaces
has been obtained, it is possible to use fewer than M eigenfaces to describe the
database [1], since eigenfaces with small corresponding eigenvalues tend to
overfit and begin to describe peculiarities of individual faces. In this
circumstance, when a new face is presented for addition to the database, rather
than recalculating all the eigenfaces, only the weights need to be determined, by
projecting the new face onto the existing face space.
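The training stage described above can be sketched as follows. The thesis prototype was written in Matlab; this NumPy translation is an illustrative assumption, not the thesis code, and it uses the standard small-matrix trick (eigenvectors of A·Aᵀ rather than the full covariance matrix) that the eigenface literature describes.

```python
import numpy as np

def train_eigenfaces(faces, k):
    """Sketch of the eigenface training stage.

    faces: (M, P) array, one vectorised face image per row.
    k:     number of dominant eigenfaces to keep (k <= M).
    Returns the average face, k unit-norm eigenfaces (as rows),
    and the weights of each training face in face space.
    """
    M, P = faces.shape
    mean_face = faces.mean(axis=0)          # step 2: average face
    A = faces - mean_face                   # step 3: zero-mean matrix
    # Eigen-decompose the small M x M matrix A A^T instead of the
    # P x P covariance matrix (the standard eigenface trick).
    L = A @ A.T
    eigvals, eigvecs = np.linalg.eigh(L)    # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]   # keep the k largest
    eigenfaces = eigvecs[:, order].T @ A    # map back to image space
    # Step 7: normalise each eigenface to unit norm
    eigenfaces /= np.linalg.norm(eigenfaces, axis=1, keepdims=True)
    weights = A @ eigenfaces.T              # project training faces
    return mean_face, eigenfaces, weights
```

Choosing k < M here reflects the observation above that eigenfaces with small eigenvalues mostly encode peculiarities of individual faces.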
2. The saved average face, eigenfaces, known weights and the name associated
with each face from the initialisation are loaded into memory.
3. The input image is translated to zero mean by subtracting the average face
from it, resulting in vector N.
4. Since the face space is spanned by the dominant eigenface vectors, the
zero-mean input image can be projected onto the space by multiplying the
eigenfaces with the zero-mean input, N, in order to determine its
weights.
5. The calculated weights can be compared with the set of known weights to
find the minimum distance between the calculated weights and each face’s
set of weights. The minimum distance symbolises the closest match
between the input test image and the faces in the database.
6. If the minimum distance found is less than a certain sensitivity value, the
input test image is identified as the matched face. If, on the other hand,
the minimum distance is larger than the sensitivity value, the input test
image is declared an unknown identity, which can prompt the system to add
this new face to the database by repeating the training stage.
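Steps 3 to 6 of the recognition stage can be sketched as below. This is a hedged NumPy illustration (the thesis implementation is in Matlab); the function and argument names are mine, and the eigenfaces are assumed to be stored as orthonormal rows.

```python
import numpy as np

def recognise(face, mean_face, eigenfaces, known_weights, names, sensitivity):
    """Project a normalised input face into face space and find the
    closest stored identity; return None for an unknown face."""
    N = face - mean_face                    # step 3: zero-mean input
    w = eigenfaces @ N                      # step 4: weights via projection
    # Step 5: distance to each known face's weight vector
    dists = np.linalg.norm(known_weights - w, axis=1)
    best = np.argmin(dists)
    # Step 6: accept only if within the sensitivity threshold
    if dists[best] < sensitivity:
        return names[best]
    return None                             # unknown identity
```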
If the input into the face recognition module is normalised and adjusted for scale,
rotation and lighting, the training and the testing stage of the module described
above will function flawlessly. However, although the eigenface approach works
in theory, the requirement for perfectly aligned faces makes a perfect face
recognition system difficult to accomplish. As noted throughout the literature on this topic, the
biggest challenge remains in designing normalisation modules that can provide such
ideal databases of faces for recognition.
module to continue with any further processing. In the case of face detection, two
separate stages have been designed – the initial coarse detection phase, and the
refined face search stage.
The coarse detection phase involves a quick scan over the complete image analysing
the colour content of the input. The purpose of this stage is to reduce the search
space by identifying the skinned regions of the image, so that the second face
detection stage, which is performed after normalisation of the illuminance, can
apply refined search techniques in order to locate the exact position of the face in
the image.
The second stage has been designed to use two different algorithms depending on
the status of the database. When the database is empty and no faces have been
processed by eigenfaces decomposition, normalised cross correlation is used to
determine the centre location of the face. When a set of eigenfaces has been
determined by the training stage of the recognition module, subsequent refined face
searches can be accomplished by using a face space projection search.
It has been demonstrated that human skin tones form a special category of colours,
distinctive from the colours of most other natural objects [21]. Although skin
colours vary between different people and different races, it was found that, in
the YUV colour representation, human skin colours remain distributed over a very
small region of the chrominance plane [13]. Experiments have been conducted to verify
this conclusion and the results are presented in Chapter 5.
Colour Segmentation
To perform the skin region analysis, each pixel is first classified as either
skin or non-skin. To increase the speed of this module, and since it acts only as
a fast coarse search of the scene, a downsampled version of the image is used.
For an input image of size 240 by 320, a downsampling factor of four is adequate,
so that the skin detection module only needs to operate on a 60 by 80 image.
Downsampling by a factor higher than four has been observed to degrade the
accuracy of the search dramatically.
A lookup table is employed to classify the “skinness” of each pixel: each pixel’s
chrominance is tested to see whether it lies in the range of skin colour, and is
assigned a binary value of one if it does and zero otherwise. After the colour
image has been mapped into a binary image of ones and zeros representing skin and
non-skin regions, a bounding box is needed to determine the range and location of
the ones. Recall that the purpose of the colour segmentation is to reduce the
search space of the subsequent modules; thus, it is important to determine as
tight a box as possible without cutting off the face.
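The classification and bounding-box step can be sketched as follows. This is an illustrative NumPy version (the thesis used Matlab), and the U/V ranges shown are placeholder assumptions, not the thesis lookup-table values, which were measured experimentally (Chapter 5).

```python
import numpy as np

def skin_bounding_box(u, v, u_range=(100, 130), v_range=(135, 170)):
    """Sketch of the coarse colour segmentation stage.

    u, v: 2-D chrominance planes of the (downsampled) image.
    Returns the binary skin map and the bounding box (r0, r1, c0, c1),
    or None for the box if no skin pixels were found.
    """
    # Threshold each pixel against the skin-colour range (lookup step)
    skin = ((u >= u_range[0]) & (u <= u_range[1]) &
            (v >= v_range[0]) & (v <= v_range[1]))
    rows = np.any(skin, axis=1)
    cols = np.any(skin, axis=0)
    if not rows.any():
        return skin, None
    # Tightest box containing all skin pixels
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    return skin, (r0, r1, c0, c1)
```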
It is common for the colour segmentation to return values that are close to skin
colour but are not skin, or other skin-like coloured regions that are not part of
the face or the body. These noisy erroneous values are generally isolated pixels
or groups of pixels that are dramatically smaller than the total face region,
which is represented by a large connected region in the binary image. Inclusion
of these noisy pixels would result in a box that is much larger than intended and
would defeat the purpose of the segmentation.
Morphological refinements are applied to the binary output in order to reduce the
effects of these noisy pixels. Since these spurious errors are generally much
smaller than the face region itself, morphological techniques such as erosion and
dilation are good tools for eliminating them. For an improved bounding box,
morphological opening (erosion followed by dilation) is therefore applied after
the colour segmentation to clean up the binary mapping prior to extracting the
skinned region.
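The opening operation can be sketched with plain NumPy, assuming a 3×3 structuring element (the thesis does not specify the element size, so this is an assumption; Matlab's Image Processing Toolbox provides the equivalent built in).

```python
import numpy as np

def _shifted_stack(img):
    """Stack the 3x3 neighbourhood of every pixel (zero-padded)."""
    p = np.pad(img, 1)
    h, w = img.shape
    return np.stack([p[i:i + h, j:j + w]
                     for i in range(3) for j in range(3)])

def opening(binary):
    """Morphological opening (erosion then dilation) with a 3x3
    structuring element: small isolated noise pixels are removed
    while large skin regions survive largely intact."""
    eroded = _shifted_stack(binary).all(axis=0)   # erosion
    return _shifted_stack(eroded).any(axis=0)     # dilation
```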
Ultimate Erosion
Due to the distinctness of human skin tone, considerations were also made during
the design of the system to employ the colour segmentation stage as an accurate
face search module rather than a coarse estimator for the location of the face. This
involves applying the binary mapping to its full advantage by using further
morphological techniques to analyse the skin detection results. Using the method of
ultimate erosion, successful results were obtained in locating the centre of the face
of many images using skin detection alone, omitting the necessity to employ refined
search techniques. However, since this method suffers from many restrictions, it is
left out of the design of the system and treated as a possible optimisation module.
The input into the ultimate erosion module is the binary image output of the colour
segmentation stage. Since the skin colour detection stage contains noise and
possible erroneous values, the binary image needs to be pre-processed before
ultimate erosion can be applied. The task of this pre-processing stage is to ensure
that the face is represented by the largest continuous block in the binary image.
Pre-processing is accomplished by first performing an erosion step to remove
small noise. Morphological dilation is then applied until the binary image
contains only large continuous blocks, so that all separated regions inside the
face become connected. The Euler number of the binary image, a scalar
representing the number of objects in the image minus the total number of holes
in those objects, is used to indicate the connectivity of the separate regions.
Once the image has been successfully pre-processed, ultimate erosion is performed
and the remaining pixels at the last erosion step will indicate the centre of the face
in the image. In the presence of multiple faces, ultimate erosion will locate the
centre of the largest face in the image.
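Ultimate erosion itself can be sketched as repeated erosion until nothing remains, keeping the survivors of the last non-empty step. This NumPy version is an illustrative assumption (3×3 structuring element, centroid of the survivors as the face-centre estimate), not the thesis code.

```python
import numpy as np

def ultimate_erosion(binary):
    """Repeatedly erode the pre-processed binary map and return the
    centroid of the pixels surviving the last non-empty step; this
    approximates the centre of the largest (face) region."""
    current = binary.astype(bool)
    while True:
        p = np.pad(current, 1)
        h, w = current.shape
        eroded = np.ones_like(current)
        for i in range(3):            # 3x3 erosion
            for j in range(3):
                eroded &= p[i:i + h, j:j + w]
        if not eroded.any():          # next erosion would be empty
            rows, cols = np.nonzero(current)
            return rows.mean(), cols.mean()
        current = eroded
```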
As will be evident from the results of Chapter 5, it was found that the
distinctive skin colour region tends to fluctuate under varying illumination
conditions and, on occasion, becomes nearly indistinguishable from certain
backgrounds. Since ultimate erosion depends heavily upon perfect colour
segmentation, the accuracy of the face location would rely on the results of the
unreliable skin detection module. Being able to reliably and consistently segment
and extract faces in a variety of complex scenes and backgrounds is essential to
the success of the system. As a result, skin detection was retained only as an
initial module that determines a region of search for the refined search stages.
Depending on the status of the face database, two methods have been designed for
this important stage of the detection module – Normalised Cross Correlation (NCC)
and Face Space Projection. NCC involves finding the best match between a
template and a sequence of windows, while face space search involves projecting
the sequence of windows into face space and measuring how “face-like” each
window is. Therefore, one major difference between the two searches is the input
data required, and the choice of technique depends on what information is
available. The NCC search requires a template; therefore, a typical face or
average face needs to be available in order for NCC to work. Face Space
Projection requires a set of eigenfaces so that each window can be projected into
face space. Hence, unless the set of dominant basis vectors has been calculated
and is available for use, the projection technique cannot be used.
Since the set of eigenfaces is not available until a collection of face images
has been processed by the training stage of the recognition module, template
matching with normalised cross correlation (NCC) is used to perform the refined
face search during the initial set-up of the face database.
The basic principle behind NCC is to compare two windows of the same size and
measure their correlation: how closely each pixel in one window is related to its
corresponding pixel in the other. A maximal correlation value therefore means
that, of all the windows under testing, the two windows under examination have
the most closely matching corresponding pixels. It is important to find a
template that accurately reflects the differing intensities of a face, since it
is the relative differences of intensity values within the picture that are
significant during the matching.
Hence, the best candidate for the template is the average face. It is a standard
template representing the most basic features of faces and contains less
individual bias than any single face image. It is therefore the most suitable
choice for a template that captures the relative differences in intensities
between the features of a face.
With the template selected, the process of face detection with NCC is merely a
matter of applying the equation of normalised cross correlation discussed in
Chapter 3. The input into the refined search module, the normalised skinned
region of the image, becomes the test window; the modified average face forms the
template window. Sections of the test window, of the same dimensions as the
template, are then continuously extracted and correlated with the template. The
extracted region with the highest correlation value is the region of the test
window that corresponds most closely to the average face template.
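The sliding-window NCC search just described can be sketched as below. This is an illustrative NumPy translation of the standard zero-mean NCC formula (the thesis gives the equation in Chapter 3 and implements it in Matlab); the brute-force loop is for clarity, not speed.

```python
import numpy as np

def ncc(window, template):
    """Zero-mean normalised cross correlation between two
    equal-sized windows, in [-1, 1]."""
    w = window - window.mean()
    t = template - template.mean()
    denom = np.linalg.norm(w) * np.linalg.norm(t)
    return 0.0 if denom == 0 else float((w * t).sum() / denom)

def ncc_search(image, template):
    """Slide the template over the image and return the top-left
    corner of the window with the highest correlation score."""
    th, tw = template.shape
    ih, iw = image.shape
    best, best_pos = -2.0, (0, 0)
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            score = ncc(image[r:r + th, c:c + tw], template)
            if score > best:
                best, best_pos = score, (r, c)
    return best_pos
```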
Notice carefully that the technique of NCC matches intensity values that
correspond closely between two selected windows. It measures the correlation
between a particular pixel in one image and the corresponding pixel of another
image. No particular focus is placed upon the relationship between neighbouring
pixels, or the correlation within a region of pixels. Hence, in selected areas
where the intensity values vary similarly to the template of Figure 4.4, there is
a possibility of false matches.
Since this is a module to initialise a database, rather than trying to perfect
the NCC correlation technique and implement auto-correction modules, the system
allows the user to manually select those results that successfully located the
faces. This also allows the user to visually predict the future accuracy of the
system, since the more accurately the faces are aligned, the more reliable the
succeeding face recognition module will be.
A more accurate, alternative method of performing the refined face search is to
use face space projection. Once NCC has been used to initialise a database, the
training stage of the recognition module can determine a set of eigenface templates. Given
these templates, any subsequent image that passes through the second phase of the
face detection module can hence utilise eigen-decomposition rather than normalised
cross correlation.
The face space projection technique involves projecting each selected window into
face space. Similar to the NCC method, the input into this refined search stage – the
normalised skinned region of the original image – is treated as the test window.
Throughout the search, sections of this test window the size of an eigenface will
then be continually extracted for projection into face space.
The first step is therefore to load the average face and eigenfaces that were saved
from the training stage of the recognition module. Notice that the average face used
here is the actual average calculated from the set of input faces that were added into
the system, unlike the average face used in NCC, which is foreign to the current
database.
Once the average face and eigenfaces are available, the projection is
accomplished by first subtracting the average face from the selected section of
the test window. This is a necessary pre-processing step to transform the data
into a zero-mean image, since all eigenfaces are calculated from the covariance
matrix, which originates from zero-mean images. The zero-mean test region is then
projected into face space by multiplying it with the loaded set of eigenfaces in
order to produce a set of weights.
From the weights, it is then possible to determine how “face-like” the region is
by comparing the energy of the window to the energy of the transformed window. If
the region is “face-like”, the projection onto face space retains most of the
window’s energy, and large weight values will be recorded, since each of the
basis vectors captures a dominant mode of variance in a face. A “non-face-like”
region will result in smaller projection values, since the eigenvectors do not
describe well images that do not reflect a face structure.
The theoretical maximum of the energy of the weights is the total energy of the
original window itself prior to transformation, that is, every bit of energy is
conserved and transformed into face space. Mathematically, energy is defined as
the sum of the squares of the intensity values, also referred to as the squared
norm of the intensity values [23]. Therefore, if the region were exactly centred
on the face, the difference between the sum of squares of the transformed window
(the weights) and the energy of the original window would theoretically be zero.
Each selected region of the test window thus has an associated distance from face
space recorded, and the minimum distance over all the regions tested identifies
the closest match, the most “face-like” area, in the test window.
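The “distance from face space” measure above can be sketched as a few lines of NumPy. This is an illustrative assumption of the computation (zero-mean window energy minus projected-weight energy, with the eigenfaces assumed orthonormal), not the thesis Matlab code.

```python
import numpy as np

def distance_from_face_space(window, mean_face, eigenfaces):
    """Gap between the energy of the zero-mean window and the energy
    captured by its projection onto the (orthonormal) eigenfaces.
    A value near zero means the window is very face-like."""
    phi = window - mean_face           # zero-mean pre-processing
    weights = eigenfaces @ phi         # projection into face space
    # Energy of the window minus energy retained by the weights
    return float(phi @ phi - weights @ weights)
```

The refined search then simply records this distance for every candidate window and keeps the minimum.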
Prior to inputting the normalised skinned region into the refined face space
search, the image is first downsampled so that a quick coarse search can be
applied. Then, to improve the accuracy of the search, neighbouring pixels around
the first estimate are tested in order to locate the exact centre of the face.
The coarse-to-fine search technique used here has been demonstrated to provide
significant speed improvements, with the amount of speedup limited by the
accuracy of the downsampled window. Since the input into the refined stage is
already a smaller subset of the original window, it has been found that a
downsampling factor of two is a reasonable reduction in search space while
maintaining the accuracy of the search.
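The coarse-to-fine strategy can be sketched generically as below. The score function, names, and neighbourhood size are illustrative assumptions; in the system the score would be the (negated) distance from face space at each candidate position.

```python
def coarse_to_fine(score, h, w, factor=2):
    """Evaluate score(r, c) on a grid subsampled by `factor`, then
    refine by testing the neighbourhood of the coarse estimate.
    Returns the best (r, c) found."""
    # Coarse pass: every `factor`-th position only
    coarse = [(score(r, c), (r, c))
              for r in range(0, h, factor) for c in range(0, w, factor)]
    _, (br, bc) = max(coarse)
    # Fine pass: full-resolution neighbourhood of the coarse estimate
    fine = [(score(r, c), (r, c))
            for r in range(max(0, br - factor), min(h, br + factor + 1))
            for c in range(max(0, bc - factor), min(w, bc + factor + 1))]
    return max(fine)[1]
```

With factor 2, the coarse pass evaluates roughly a quarter of the positions, which is the source of the speedup noted above.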
The face normalisation module can be divided into three main categories – scaling
normalisation, rotation normalisation and lighting normalisation.
The total amount of lighting can be normalised by observing the energy embedded
in the images. For comparisons between images to be unbiased, the relative ratios
between the pixels within an image should not be altered. Therefore, all
intensity values within the image should be multiplied by a constant ratio such
that the resultant total energy is a standardised value. This ratio is determined
by the ratio of energies.
For our system, since we have selected the average face as the standardised
image, all input images should have their total energy normalised to match the
total energy of the average face. Notice that when NCC is used for the refined
search stage, the standard template obtained from Boston University [22] is used
as the average face, while with face space projection, the average face from our
database determines the total energy. The refined search technique therefore
influences not only the operation of the face detection module but also the face
normalisation modules, further emphasising the dependency between the different
modules.
The lighting normalisation module is applied in two places: once before the refined
detection module and once prior to entering the face images into the recognition
module.
The first normalisation is applied to enhance the accuracy of locating the face, by
ensuring that all selected regions contain the same total energy as the average face.
This is to avoid extreme biasing towards a particular lighting condition in the image
when the distance functions are calculated. A particular problem arose with the
NCC search when light was incident from one side of the image: the brighter side
was favoured, since it naturally records a higher correlation value regardless of
the template. The addition of a lighting normalisation stage relieved this
problem.
The second lighting normalisation stage is applied so that all images input into
the recognition module have the same total energy, ensuring that no bias is
placed upon a particular face during recognition. Since all images at this stage
are the same size, equalising the energy acts as if all the head images were
taken under the same lighting conditions. The total energy of an image is the sum
of the squares of the intensity values [23]:
Energy = ∑_image (Intensity)²    (4.1)
This makes the ratio between intensities a square-root relationship between two
images. That is, if we compute the ratio of energy between image A and image B to
be e, we would multiply the total energy of image B by e so that images A and B
have the same total energy. However, we multiply each intensity value only by the
square root of e, √e, so that the sum of the squares of (B·√e) equals the energy
of A.
In our recognition system, all images are stored as intensity values. Therefore,
inside the illuminance normalisation module, the energy of the selected window is
first calculated and then compared to the energy of the average window. The ratio
of the energies is then computed, and the intensity of each pixel in the window
is multiplied by the square root of that ratio. The resultant image has its total
energy normalised.
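The energy-matching step of Equation 4.1 can be sketched as follows; this NumPy version is an illustrative assumption of the module's computation, not the thesis Matlab code.

```python
import numpy as np

def normalise_energy(image, reference):
    """Scale every intensity by the square root of the energy ratio
    so that the image's total energy (sum of squared intensities,
    Eq. 4.1) matches that of the reference image."""
    energy = (image ** 2).sum()
    ref_energy = (reference ** 2).sum()
    # Multiplying intensities by sqrt(e) multiplies the energy by e
    return image * np.sqrt(ref_energy / energy)
```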
Many proposed methods for solving the scaling problem, such as eye detection,
themselves suffer from scaling problems. Thus, an alternative method that does
not depend on correct scaling in the first place is proposed for this system. This method
involves making full use of the skinned region information obtained from the colour
segmentation stage. Based on the binary mapping of the face region in the image,
the size of the extracted box is compared to the dimensions of the template. Again,
in the case of NCC face search, the template will be the obtained average face,
while face space projection and face recognition will use the eigenfaces as the
template.
Two scaling factors are then determined, one a ratio of widths and the other a
ratio of heights. Selection is achieved by choosing the larger of the two ratios
when scaling down, and the smaller when scaling up. Using the same ratio for both
dimensions keeps the relative proportions between the pixels constant, so the
face is not morphed to a different set of dimensions. Once the ratio has been
determined, normalisation can be achieved either by adjusting the dimensions of
the extracted box, or by extracting a new box from a scaled version of the image,
such that the face in the image is at the same scale as the templates. Refined
search techniques such as face space projection can then be applied to accurately
locate the exact centre of the face.
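The ratio-selection rule can be sketched as below. The direction of the ratios (template over box, i.e. the factor to apply to the extracted box) and the handling of the mixed case are my assumptions; the thesis states only the larger-when-shrinking, smaller-when-enlarging rule.

```python
def scale_ratio(box_h, box_w, template_h, template_w):
    """Choose a single scale factor for both dimensions so the
    face's proportions are preserved: the larger of the two ratios
    when scaling down, the smaller when scaling up."""
    rh = template_h / box_h   # height ratio
    rw = template_w / box_w   # width ratio
    if rh < 1 and rw < 1:     # both ratios shrink: scaling down
        return max(rh, rw)
    return min(rh, rw)        # scaling up (mixed case: assumption)
```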
Figure 4-5 Illustration of the determination of the rotational angle
A method that uses the skin detection binary image to account for rotation has
been proposed for this system. With the aid of Matlab’s specialised functions
[24], the aim is to compute the angle of rotation of the face block in the binary
image. This is achieved by fitting an ellipse around the face region such that
the second-order moments of the ellipse match the second-order moments of the
binary mapping. The angle formed between the major axis of the calculated ellipse
and the horizontal x-axis then represents the face’s angle of tilt. Rotation
normalisation can be achieved by rotating the image by the negative of that
angle. Figure 4.5 illustrates an example of how the rotational angle is computed.
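The moment-matched ellipse orientation (what Matlab's regionprops computes) can be sketched directly from the second-order central moments of the binary region. This NumPy version is an illustrative assumption; the sign convention follows image coordinates (rows increase downwards), so it may differ from Matlab's by a sign.

```python
import numpy as np

def tilt_angle(binary):
    """Orientation, in degrees from the horizontal, of the major
    axis of the ellipse with the same second-order moments as the
    binary face region."""
    rows, cols = np.nonzero(binary)
    x = cols - cols.mean()
    y = rows - rows.mean()
    mu20 = (x ** 2).mean()     # second-order central moments
    mu02 = (y ** 2).mean()
    mu11 = (x * y).mean()
    # Standard orientation formula for the moment-matched ellipse
    return float(np.degrees(0.5 * np.arctan2(2 * mu11, mu20 - mu02)))
```

Rotating the image by the negative of this angle then uprights the face.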
Considerations were made for the inclusion of a background subtraction stage into
the design of the recognition system, and experiments and tests were conducted to
evaluate the effectiveness of this addition. From the results, it was found that the
recognition performance decreased dramatically with the inclusion of a background
subtraction stage due to the excessive amount of energy being placed into mapping
the boundaries and edge effects of the cutting. Rather than recognising the
variances of the face, the system ends up measuring the shape of the background
segmentation. As a result, although this method theoretically enhances
recognition, it is practically infeasible and has not been integrated into the
system.
5 System Performance Evaluation
The face recognition system has been tested with face images captured under a
variety of conditions. The performance of each module and the overall system is
detailed in this chapter. Examples of accurate recognition and cases that highlight
limitations to the system are both presented, allowing an insight into the strengths
and weaknesses of the designed system. Such insight into the limitations of each
module is an indication of the direction and focus for future work.
While the skin detection stage is invariant to scaling and rotation, the refined
face search stages are sensitive to such changes. Thus, in order to gain a better
judgement of the performance of the refined face search modules, these stages are
evaluated under static mode, so that faces of a single scale are used to test the
system. The ability to handle multi-scaled, rotated face images is evaluated with
the normalisation modules.
Figure 5-1 Three-dimensional visualisation of the V values of a face image
Investigations were made to verify that human skin tones do in fact form a special
category of colour in the UV-plane. Figure 5.1 is a three dimensional visualisation
of an image in the V plane. Although the decomposition was applied to a
downsampled version of the 240x320 image on the left, the difference in colour
remains evident and the skin regions are shown to have higher values in V. The
shape of the face is also noticeable and the face is clearly distinguished from the
background. Thresholding can thus be applied to the coloured input to produce a
binary mapping of skinned regions in the image.
Figure 5-2 Distribution of skin-tone colours in the UV-plane
It is however possible that the plot in Figure 5.1 shows a particular difference in
skin colour because of the background. In order to demonstrate and substantiate
Wang’s claims, a range of skin colour was plotted in the UV-plane. This plot is also
useful as a colour lookup table. Before colour segmentation can be applied, the
range of U and V values that are specific to human skin tone needs to be
determined. The plot is presented in Figure 5.2 and it is clear that skin-tone colours
are distributed over a very small area in UV space.
Figure 5.4 presents the binary images of the three faces in Figure 5.3 after the
colour segmentation operations. In these binary mappings, noise and spurious
errors are cleaned up by morphological opening (erosion followed by dilation) and
the approximate locations of the faces determined. These outputs allow smaller
images containing the faces to be extracted so that further processing for
normalisation and alignment of the faces can be applied.
Figure 5-3 Examples of skin colour used to generate the lookup table
Figure 5-4 Binary Mapping of skinned regions of the faces in Figure 5.3
Figure 5-5 Intensity effects on skin detection. The middle picture is obtained using the same lookup
table as the pictures in Figure 5.4, while the picture on the right used a lookup table with the V values
shifted higher.
Using face images taken under the same conditions as in Figure 5.4, of the 50
images tested, all 50 returned a binary map that closely bounded the face region.
The use of the UV model however has also presented some problems and
limitations. Although Wang and Chang [13] claimed that the chrominance model is
intensity invariant, experiments have shown that intensity affects the range of
the skin colour region. Under different illumination conditions, although
skin-tone colour remains conglomerated in a specific region of the UV-plane, it
was found that the actual range changed.
Figure 5.5 presents two results of colour segmentation of a photo taken under a
different lighting condition from the photos that created the lookup table. The first
result is the binary mapping of the photo using the same range as the photos that
were used to create the table as depicted in Figure 5.2, while the second binary
image uses a shifted version of the lookup table. These results show that
intensity affects the ability to properly segment the colour, and that an
intensity-invariant model needs to be derived. Several models were attempted,
such as plotting U/Y and V/Y, but without success.
Figure 5-6 The black box around the image on the right is the output of the face detection module
given the left input image, encapsulating the pixels that have been identified as skin.
As a result, the accepted skin colour region is expanded in the UV-plane in order
to accommodate the varying illumination conditions. The trade-off is that the
detection is less accurate, and it is this unreliability and lack of precision in
the skin detection method that restricts it to being applied only as a quick
coarse search to approximate the location of the face in the face recognition
system.
Figure 5.6 is an example of the output of the coarse skin detection module.
Notice that although it has located the skinned region of the face in a complex
background, due to the larger acceptable range for skin, a region larger than the
face is extracted.
Depicted in Figure 5.8 is one of the failed matches that occurred for many
different images when correlating with the standard average face of Figure 5.7.
The yellow cross indicates the region that recorded the highest correlation
measure. Notice that the correlation is performed not only on the extracted
skinned region (illustrated
Figure 5-8 The white box in the left image, from skin detection, is the input into NCC refined search.
The yellow cross in the right image is the outcome of the NCC search using the template of Figure 5.7.
with the white bounding box), but also in an illumination normalised version of the
extracted image. Inaccurate detection was recorded regardless of whether
illumination normalisation was applied.
Using this alternative template, out of the 20 faces that were processed through
NCC face search, 13 were successful in locating the centre of the face. One of
these successful outcomes is presented in Figure 5.10 along with a plot of the correlation
measures obtained. The dark red region symbolises the highest score, which
corresponds to a point between the eyes. This result is expected since the template
has changed and the centre of the revised template has shifted from the centre of the
face to a point closer to the middle of the eyes.
Although a 65% success rate is not accurate enough for a robust recognition system,
errors are however tolerable in the NCC search. Since this method is only used for
initialising a database, incorrect outputs can be omitted, storing only successful
results, so that only the aligned images are passed onto the eigenfaces training stage.
Figure 5-10 Successful location of face centre using revised template, along with plot of correlation
measures of the image
5.1.3 Face Space Search
As the last face detection stage before the recognition module, the face space search
is extremely important. The accuracy of the face alignment and subsequently the
accuracy of the recognition depend heavily on the success of this module. While
errors in the preceding stages, such as skin detection, are tolerable since their
purpose is to provide an approximate region of where the face is located, any errors in the
face space search will directly degrade the performance of the system, making
accuracy the utmost concern.
On the other hand, the success of the face space search depends heavily on the
outcomes of the normalisation modules. In static mode, where the scaling and
rotation need not be adjusted, with the correct lighting normalisation implemented,
as many as 92% of the input images were aligned perfectly with the centre of the
faces located. In dynamic mode, due to the allowable variations in scale and
rotation, eigenfaces that might not exactly match the input are used for detection.
Since face space search measures how well the windowed input decomposes into
face space described by the eigenface templates, it is not unexpected that the
performance would decrease dramatically when scaling for example has altered the
size of the face input.
Under correct scaling and rotational adjustments however, successful results have
been recorded. With the illumination normalisation stage implemented, such that
the energy of the windowed images matches the total energy of the eigenfaces, out
of the 96 face images treated by face space search, 88 recorded accurate detection of
the centre of the face. The remaining 8 recorded the correct location as the second
best closest match.
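The face space distance measure that drives this search can be sketched from the listing in Appendix A: project the mean-subtracted window onto the eigenfaces and measure the fraction of its energy that face space fails to capture. This is a Python/NumPy rendering of that logic, with a hypothetical function name.

```python
import numpy as np

def face_space_distance(window, avg_face, eigenfaces):
    """Distance-from-face-space measure for a candidate window.

    window, avg_face : 2-D arrays of equal shape
    eigenfaces       : (k, h*w) array whose rows are orthonormal eigenfaces

    Returns a value in [0, 1]: 0 means the window decomposes entirely into
    face space (very face-like), 1 means none of its energy is captured.
    """
    phi = (window - avg_face).ravel()
    weights = eigenfaces @ phi            # decomposition coefficients
    captured = (weights ** 2).sum()       # energy explained by face space
    total = (phi ** 2).sum()
    return 1.0 - captured / total
```

Scanning this measure across the image and taking the minimum yields the distance maps of Figure 5.12, where the dark blue region marks the most face-like window.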
Figure 5-12 Left: Distance map of face space search of face in Figure 5.11. Right: 3D visualisation of
inverse distance surface with the centre of the face represented by the peak.
The distance maps that describe the image in Figure 5.11 are presented in Figure
5.12. On the left is a plot of the recorded distances with the dark blue region
representing the minimum value, while on the right is a 3D visualisation of the
inverse distance surface so that the minimum value on the distance map is
represented by peaks. From the 3D plot, it is easy to observe how face space
projection records the centre of the face distinctively. Comparing this result to the
NCC distance map in Figure 5.10, the centre of the face is more distinct, accurately
found, and not spread out over a range of possible locations.
Figure 5-13 Face input extracted by binary map and scale normalised.
As shown in Figure 5.13, a binary map representing the skinned region is used to
extract the face in the image. Presented is a face that is larger than the standard size.
Thus, a scaling correction factor is computed and applied to the extracted region.
The scaling factor is determined by selecting the larger of the ratios between the
dimensions of the extracted region and the dimensions of the eigenfaces when
scaling down, and the smaller of the two ratios when scaling up. This avoids
morphing the face to a different set of proportions, which would alter the relative
variances between pixels.
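The selection rule above can be sketched as a single isotropic factor relating the extracted region to the eigenface dimensions. This is a hypothetical Python reading of the stated rule (the mixed case, where one ratio is above 1 and the other below, is treated here as scaling down); the thesis' Matlab code for this step is not in the appendix excerpt.

```python
def scale_factor(region_shape, eigenface_shape):
    """Isotropic scale factor between an extracted region and the eigenfaces.

    region_shape, eigenface_shape : (rows, cols) tuples.
    Per the stated rule: take the larger row/column ratio when scaling down
    (region bigger than the eigenfaces) and the smaller when scaling up, so
    the face is never stretched to different relative proportions.
    """
    r_ratio = region_shape[0] / eigenface_shape[0]
    c_ratio = region_shape[1] / eigenface_shape[1]
    scaling_down = max(r_ratio, c_ratio) > 1.0  # region exceeds eigenface size
    return max(r_ratio, c_ratio) if scaling_down else min(r_ratio, c_ratio)
```

The extracted region is then resized by this single factor before the refined search, keeping both axes in their original proportion.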
Using this technique and producing scaled versions of the extracted box, the
subsequent face space search stage has been tested to successfully locate the centre
of the faces and produce aligned images. It has been found that as long as the skin
detection stage outputs a binary map that bounds the face correctly within an error
of ten pixels, the scaling normalisation module will output a scaled image that is
acceptable to the refined search stages. A bounding box larger than the face by
more than 20 pixels will result in unreliable face location from both NCC and face
space projection search.
When only one face is present in the image and an appropriate colour model has
been chosen, a suitable bounding box can be assured since morphology can be used
to clean up spurious noise. However, in the presence of other skin-like or skinned
regions, such as shoulders and arms, the colour segmentation module will not be
able to output a tight box around the face, preventing an accurate scaling factor
from being computed. Until further research and improvements have been conducted in the
normalisation modules, operating the scaling stage in dynamic mode will produce
unreliable outputs.
The rotational normalisation module designed in this system works similarly to the
scale normalisation module. As depicted in Figure 5.14, rotational analysis is also
based on the binary mapping produced from the skin detection stage. The rotational
angle determined from the binary map is used to adjust the scaled face so that the
extracted box can be normalised in terms of both scale and rotation. Many pictures
of different faces subjected to planar rotations ranging from 10 to 45 degrees have
been tested, and it was observed that all the images were correctly normalised to
within five degrees.
Such successful results, however, suffer from the same limitation as the scaling
stage in that the accuracy of the correction angle is heavily dependent upon the
results from colour segmentation. Whilst outputs from the rotational module can be
fed into the refined search to precisely align the faces, the rotational results are
inaccurate unless a tight bounding box of the face was extracted from the skin
detection module.
Figure 5-14 Face image subject to both scaling and rotational normalisation.
Mathematically, it would seem more logical to normalise the energy of each selected
window during the face search scan rather than of the whole extracted region.
Experimentally however, it was found that detection accuracy decreased
considerably when lighting normalisation was applied to each individual window.
Figure 5-15 Lighting normalised output for two images of different background illumination.
Notice how the detection is performed on extracted regions subjected to lighting adjustments.
Thus the system currently involves a lighting normalisation stage prior to the
beginning of the face search. With this addition implemented, as many as 92% of
the images have had the centre of the face correctly located under static mode,
where scaling and rotation do not affect the outcome.
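The energy-matching step itself mirrors the `ratio = sqrt(avg_energy / box_energy)` computation in the Appendix A listing. A Python/NumPy sketch (the function name is hypothetical):

```python
import numpy as np

def normalise_lighting(region, reference):
    """Scale pixel intensities so the region's total energy (sum of squared
    pixel values) matches that of a reference image, e.g. the average face.
    Mirrors the ratio = sqrt(avg_energy / box_energy) step of the listing.
    """
    ref_energy = float((reference.astype(float) ** 2).sum())
    reg_energy = float((region.astype(float) ** 2).sum())
    if reg_energy == 0:
        return region.astype(float)   # avoid division by zero on blank input
    return region.astype(float) * np.sqrt(ref_energy / reg_energy)
```

Applying this once to the whole extracted region, rather than to every scan window, is what the experiments above found to give the more reliable detection.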
The second illumination normalisation stage occurs after the faces have been
successfully aligned, to ensure that all the faces entering the recognition module
have the same total energy. Shown in Figure 5.15 is an example of how images taken
under different lighting have been normalised to a standard. The inclusion of this
stage has improved the recognition accuracy. Rather than matching weights that
reflect common intensity and background illumination distributions, the recognition
stage, with all faces normalised in terms of lighting, focuses on the relative
variances between the faces. The current recognition system under static mode has
been demonstrated to perform at 96% accuracy, as detailed in the following
section. Prior to the insertion of the second illumination stage, many faces were
matched wrongly and only 50% of the faces were correctly identified.
Apart from extreme cases where the lighting of the images is extremely dark or
excessively bright, the combination of the two lighting normalisation stages can
compensate for almost any illumination variations. Reliable face detection and
recognition has been achieved with the addition of these two stages, making them
the only dependable components within the normalisation module.
Figure 5-16 Comparison between eigenfaces constructed from a standardised (top row) and a
misaligned (bottom row) database. The leftmost picture in each row is the associated average face.
Presented in Figure 5.16 are two extreme ensembles of eigenfaces along with their
associated average faces. The set of eigenfaces on the top row is constructed from
a collection of faces that have been treated by the face detection and normalisation
modules, with the leftmost picture in the row being the average face of that set of
eigenfaces. The bottom set of eigenfaces, on the other hand, is computed from a
database that did not restrict the position and scaling of the faces, with the leftmost
picture of the bottom row being the average face.
Observing the top row shows that an aligned database produces eigenfaces that tend
to describe variations in faces, while in the bottom set, the eigenfaces tend to
highlight the misalignments. Details of individual faces can also be seen from the
misaligned database since the misalignment has caused that particular face to be
described by its own eigenface. Misaligned databases thus defeat the purpose of
eigenface decomposition since each face is supposed to be represented by a linear
combination of eigenfaces, each representing a dominant basis that captures the
variances between faces.
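The decomposition just described can be sketched with standard PCA via the SVD; this is one conventional way to obtain an average face and eigenfaces from an aligned database, not the thesis' own training code, and the function names are hypothetical.

```python
import numpy as np

def eigenfaces_from_db(faces, k):
    """Average face and top-k eigenfaces from a database of aligned faces.

    faces : (n_faces, h*w) array, one flattened face per row.
    Returns (avg, eigenfaces) where eigenfaces is (k, h*w) with orthonormal
    rows, computed by SVD of the mean-centred data (standard PCA).
    """
    avg = faces.mean(axis=0)
    centred = faces - avg
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return avg, vt[:k]

def reconstruct(face, avg, eigenfaces):
    """Express a face as the average plus a linear combination of eigenfaces."""
    weights = eigenfaces @ (face - avg)
    return avg + eigenfaces.T @ weights, weights
```

With a well-aligned database the eigenfaces capture genuine facial variation, so a small `k` reconstructs faces well; with a misaligned database the leading components instead encode the misalignments, as the bottom row of Figure 5.16 shows.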
The current face detection and normalisation modules have been producing faces
that are aligned within a tolerable error in static mode, such that a “good” set of
eigenfaces can be determined. Misalignment to the order of less than five pixels is
acceptable. However, with the introduction of scaling and rotation variations, the
standardisation ability of the normalisation module tends to decrease causing
misaligned faces to enter the recognition module. Therefore, when the recognition
system is operating in training mode to obtain the eigenfaces, static mode is applied
so that images that require extensive adjustments are avoided.
Identification is performed by comparing the weights of the test face with the
known weights of the database. Mathematically, a score is found by calculating the
norm of the differences between the test and known set of weights, such that a
minimum difference between any pair would symbolise the closest match.
Figure 5-17 Small face database used for testing – all faces have been standardised using the face
detection and normalisation modules.
[Confidence measures for the eight sample pairs of Figure 5.18, left to right, top to
bottom: 58%, 86%, 97%, 95%, 85%, 93%, 9%, 64%]
Figure 5-18 Sample results of recognition stage with confidence measure included. The left image of
each pair is the normalised test input, while the right image is the closest match found.
A confidence measure has been devised during testing to describe the accuracy of
the recognition. Its aim is to describe the certainty of the identification by searching
for a minimum score and observing how unique the score is compared to the other
matched scores. Mathematically, it is computed as the difference between the
squares of the second closest and best match scores, divided by the square of the
second match score. Since the scores are originally calculated as norms of vector
differences, it is not mathematically unsound to use the squares of these values as a
measure.
Confidence % = [(Second Best Score)² − (Best Matched Score)²] / (Second Best Score)² × 100%   (5.1)
A best match score of zero, recorded when the test weights of an image correspond
exactly to a set of known weights, would result in a confidence measure of 100%,
implying that the face is identified with complete certainty.
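The matching and confidence steps above can be sketched together; this is a Python/NumPy illustration of the norm-of-weight-differences match and Eq. (5.1), with a hypothetical function name.

```python
import numpy as np

def identify(test_weights, known_weights):
    """Match a test face to the database and report Eq. (5.1) confidence.

    test_weights  : (k,) weight vector of the normalised test face.
    known_weights : (n_people, k) array of stored weight vectors.
    Returns (best_index, confidence_percent). The score for each candidate
    is the norm of the weight difference; the minimum is the closest match.
    """
    scores = np.linalg.norm(known_weights - test_weights, axis=1)
    order = np.argsort(scores)
    best, second = scores[order[0]], scores[order[1]]
    # Eq. (5.1): how distinct the best score is from the runner-up.
    confidence = 100.0 * (second ** 2 - best ** 2) / second ** 2
    return int(order[0]), confidence
```

A best score of zero gives 100% confidence, while a best score nearly equal to the runner-up gives a confidence near zero, flagging an unreliable identification.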
One of the failed examples, identified with a small 9% confidence, is also presented.
Comparing with the database shows that it is due to the face being improperly
scaled, such that a complete picture of the face was not provided to the recognition
stage. Many errors in recognition can similarly be attributed to poor normalisation,
further emphasising the importance of strictly standardised databases.
[Figure 5.19 block diagram labels: Face Detection, Lighting Normalisation, Face Space
Distance Map, Face Extraction, Lighting Normalisation, Recognition; output confidence 67%]
Figure 5-19 Example of integrated stages of a complete face recognition system under static mode.
Successful face detection and normalisation has been achieved with correct recognition at 67%
confidence.
The results of the recognition stage discussed previously are a good indication of
the accuracy of the face recognition system under static mode. Prior
to arriving at the standardised inputs seen in Figure 5.18, each test face that is
shown has been treated with the face detection and normalisation modules. Thus, a
successful recognition implies that all the individual stages discussed were also
successful in providing accurate results. An incorrect output from any stage, such
as inaccurate face detection, would result in possible erroneous recognition such as
the example in Figure 5.18. Figure 5.19 shows an example of the integrated system,
where all the stages combine to perform recognition on a test image.
The successful example in Figure 5.19 is an excellent illustration of how all the
stages are integrated, in particular, how the face detection and normalisation
modules interlink in order to accurately locate the face. For this particular
example, which was a problematic image in the earlier stages of the design, the
system successfully extracted the face regardless of the facial expression and the
presence of a hat. Notice how the lighting adjustment normalised the total energy
of the image so that the face space search could accurately locate the face, as
demonstrated by the peak in the inverse distance surface.
Apart from achieving an accuracy rate as high as 96% in static mode, the speed of
processing the modules is another useful measure of performance for the face
recognition system. Using a Pentium III 833MHz processor, 256MB RAM, and a
Logitech web cam, the recognition system has been demonstrated to perform at a
maximum speed of two frames a second under Matlab. This includes the total
processing time for all three modules – face detection, normalisation and
recognition – for each input image. This speed is expected to increase substantially
when the system is implemented in C and when prior knowledge of the location of
the face is integrated using tracking.
For demonstrations, the system operates under dynamic mode where real-time face
recognition on video input is performed. Since the system will be confronted with
faces at all angles and orientations, consistently high recognition accuracy in every frame is not
viable. However, knowing that continuous face images are available for
recognition, the current system is designed to only acknowledge and output
identification when the confidence level is high and the best match score is low.
Since only faces that are aligned properly can result in low scores at high
confidence level, under dynamic mode from a database of ten faces, accurate
recognition as high as 70% has been achieved by the system. That is, of the ten
individuals tested under the real-time recognition system, seven faces were correctly
identified.
Successful results were also recorded in cases where there was more than one face
in the image as is often the case when operating with real-time video input. As the
face space search is scanned across the image, the face that is most “face-like” and
matches the scaling of the eigenfaces will be extracted and consequently passed
onto the recognition stage for identification.
6 Design Review
The previous chapters have provided a detailed description of the design and a
comprehensive evaluation of the performance of the face recognition system.
Presented in this chapter will be a discussion of the results, analysing the strengths
and weaknesses of the system and proposing future work based on the limitations of
the current system.
Under dynamic mode, the system can be adjusted to output recognition results only
when a face is identified with high certainty, thereby appearing to achieve
real-time face recognition; nonetheless, efforts have been made to produce a truly
robust face recognition system that can identify faces without a list of
constraints. The current difficulty is that each individual module within the
system has its own limitations and weaknesses, such as an intensity-dependent
colour model or a colour-segmentation-dependent scaling normalisation. All these
uncertainties and inaccuracies add up when the complete system is integrated,
producing results that are less than optimal.
The current face recognition system integrates the colour information, template
matching results and eigenvector decomposition outcomes of an image to deduce
an educated guess at the identity of the face in the picture. There are, however,
many additional clues that could be combined to derive the final solution, such as
edge detection, morphological analysis and even motion detection.
Human face recognition is not restricted to one mode of information but draws on a
combination of inputs from the body's senses; we believe that a truly successful
recognition system will likewise not be solely dependent on a few techniques, but
will integrate a range of different input data. Such a
system will be able to take advantage of the strengths of each method, and use these
strengths to compensate for the weaknesses of the other techniques. Such a
revolutionary system will be of great value in research and a breakthrough in the
area of image processing.
One of the drawbacks with the current face detection module is the inconsistency in
the skin detection. Wang and Chang stated that skin colours remain distributed over
a very small region on the chrominance plane with intensity changes accounting for
the major difference between skin tones [13]. Although their claim had been
demonstrated to hold true for a set of images from our testing database, it was also
found that this specific region, while still concentrated in a small space, tends to
shift around upon illumination changes. Shifting in this sense refers to the range of
either the U or the V plane being increased or decreased by a constant value. For
example, a U range of 80-100 might be altered to 120-140 due to illumination
variations.
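A fixed rectangular chrominance box of the kind this limitation affects can be sketched as follows; the function and the sample ranges are hypothetical illustrations, not the trained mask used in the system.

```python
import numpy as np

def skin_mask(u, v, u_range, v_range):
    """Binary skin mask from a rectangular chrominance box.

    u, v     : per-pixel chrominance planes (same shape).
    *_range  : (lo, hi) inclusive bounds on each plane.
    """
    return ((u >= u_range[0]) & (u <= u_range[1]) &
            (v >= v_range[0]) & (v <= v_range[1]))
```

The weakness is immediate: a box tuned to a U range of 80-100 classifies none of the skin pixels once an illumination change shifts the same skin to 120-140, which is exactly the drift described above.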
Illumination variation is the major reason for the unreliability of the colour
segmentation results. Further research needs to be conducted in this area in order to
produce a more robust intensity invariant skin colour model. Such a model would
be extremely beneficial to the solution of the face recognition problem, for many
inconsistent errors in the normalisation modules are due to the inaccuracies of the
skin detection search.
The accuracy of face detection is also variable while operating the system under
dynamic mode. In cases where the face in the image matches the scaling of the
faces in the database, even under minor rotations, the face can be successfully
located. Our current refined search technique of NCC template matching and face
space projection however, is so sensitive to scaling, that even slight differences
between the dimensions of the template and the test face will cause inaccuracies in
the location of the centre of the face. Thus, although a pyramid of various
template sizes could be an alternative route to reliable face detection, research
has shown that this is not a practical solution either.
Under the current design, the normalisation of scaling and rotation suffers from its
dependency on the outcomes of the colour segmentation stage. In complex dynamic
scenes, where the amount of skin in the picture changes frequently under varying
lighting conditions, reliable skin detection is very hard to achieve or guarantee.
This calls for other techniques to solve the problem of scale and rotation.
For example, other researchers have presented solutions based on the location of the
eyes. It is claimed that by ensuring that all input faces have aligned eyes, the scale
and rotation of the face will consequently also be aligned. While this is a
theoretically sound idea, successful location of the eyes is itself scale dependent.
Many of the proposed techniques are similarly not invariant to scale, and thus suffer
from scaling problems in the first place, before being able to solve the actual
scaling problem. Substantial research is still being conducted in these areas to
improve on existing ideas or to develop new ones in order to produce robust
normalisation modules.
While the addition of the current lighting normalisation stages has enabled dramatic
improvements in the accuracy of both the detection and recognition of the faces,
there are still many opportunities for improvements. One of the problems of the
current energy summation and ratio technique is that it assumes a direct
proportionality between the illumination and the pixel intensity values. Research
has demonstrated that this is not the case, and more complex relationships are in
fact being investigated. Nonetheless, the face recognition system has produced
sufficient results and in turn, has formed a strong foundation and basis for future
work to develop upon.
Since the eigenface technique requires such strict alignment of faces, any
inaccuracies in aligning the face will degrade the recognition and generalisation
ability of the database. While many research groups have published successful
results on face recognition using eigenfaces, very few in fact achieved their results
based on a generalised, ideal set of eigenfaces. Without such an ensemble of
eigenfaces, it is highly probable that the recognition is performed based on
erroneous information, such as the orientation, the scaling or the illumination of the
faces.
A simple answer is that this area of research is still extremely new and the
development of new techniques generally takes time and investments by many
research groups. However, future research into the area of 3D modelling and
recognition, including colour, could be revolutionary in the image processing
field and pave the way towards a new paradigm of image analysis.
Amazing explorations are yet to be completed in this dynamic area of research.
6.5.1 Tracking
A possible solution and enhancement to the problem of normalisation is to consider
combining the recognition system with a tracking module. In dynamic video, faces
are in continual motion, and applying full recognition to every frame is infeasible.
Thus, rather than attempting to recognise faces under unconstrained, variable
conditions, a different view of the problem is to consider only circumstances where
the face can be reliably recognised.
Tracking will not begin until the face is under suitable scaling and rotation,
whereupon the system can perform reliable identification based on that particular
extraction. Then as the face moves from frame to frame, being subjected to
different orientations, tracking can be performed to keep track of the position of the
individual, replacing the need to perform normalisation and recognition under
unaligned facial orientations. Instead of puzzling over how to overcome the varying
scaling and rotations of a face, the face recognition system can simply focus on
determining when the face can be reliably normalised, and capture that particular
image for recognition. Correct integration of a tracking module can expand the
applicability, robustness and accuracy of the system. Since the area of search is also
greatly reduced upon subsequent face detections, improvements in the speed
performance of the system are also possible.
6.5.3 Multiple Recognition System
The biggest restrictions to recognising multiple people in a scene at any one time
are due to scaling and rotational problems. Although the current face detection
module is theoretically capable of locating more than one face, by accounting for all
the high matches in the face projection stage, there is no reliable technique to
account for the variety of scaling and rotational orientations possible. Scaling
normalisation requires future research, for when multiple faces are in the scene the
faces are very rarely all aligned and taken at the same distance from the camera.
Consideration of all modes of rotation is also necessary when attempting to
recognise multiple people, for faces are again rarely all oriented in the same
direction. Without a suitable solution to the problem of normalisation,
multiple-face recognition will remain a major obstacle.
Due to time constraints of the project, a C version of the face recognition system
was not produced. Time was instead devoted to prototyping a robust working
system prior to optimising its speed by translating into C. The original decision to
work in Matlab rather than C was due to the complexity of performing eigenvector
and eigenvalue calculations in C; Matlab's efficient matrix algorithms make it a
desirable platform for development and design.
A C version of the system is nevertheless manageable. While the training stage of
the recognition module requires the availability of eigenvector decomposition
functions, the testing stage does not. Hence, if future developers of the face
recognition system wish to avoid the complexity of eigenvector calculations in C,
the training stage can be performed in Matlab while the testing stage is
programmed in C. The current recognition stages have been programmed to be
optimised in Matlab, such that the combination of the face detection and recognition
modules operates at two frames a second.
6.5.5 3D Modelling
Using stereo vision techniques to model faces as three-dimensional objects from
video input is a novel development. Due to the complexity of the mathematics
involved, very few groups have conducted research in this area. Despite the
three-dimensionality of faces, many face recognition systems still operate on
two-dimensional images. Such a reduction in dimensionality is essentially an
omission of meaningful data, reducing the amount of information being processed.
Thus, it is not unexpected that recognition accuracy comparable to that of human
beings has not yet been achieved. Although successful results have been obtained
using two-dimensional techniques such as eigenfaces, we believe that the use of 3D
modelling is the key to the solution of real-time face recognition.
7 Conclusion
The design of the face recognition system is based upon eigenfaces and has been
separated into three major modules – face detection, face normalisation and face
recognition. Face detection was accomplished by first performing a skin detection
search of the input image based on colour segmentation. Although skin colours
differ from person to person, and race to race, it was found that the colour remains
distributed over a very small region in the chrominance plane. Normalised cross
correlation and face space decomposition were then applied in order to locate the
exact position of the face. Since the application of eigenfaces to the task of face
recognition requires a perfectly standardised and aligned database of faces, face
normalisation modules were inserted between the detection stages to account for
possible scaling, planar rotational and illumination differences.
While the problem of recognising faces under gross variations remains largely
unsolved, a thorough analysis of the strengths and weaknesses of face recognition
using eigenfaces has been presented and discussed. With the implemented system
serving as an extendable foundation for future research, extensions to the current
system have been proposed.
References
[2] A.W. Young and V. Bruce, “Perceptual Categories and the Computation of
‘Grandmother’,” European Journal of Cognitive Psychology, Vol. 3, No. 1, 1991,
pp. 5-49.
[3] J.W. Shepherd et al., “The Effects of Distinctiveness, Presentation Time and
Delay on Face Recognition,” European Journal of Cognitive Psychology, Vol. 3,
No. 1, 1991, pp. 137-145.
[4] R.L. Klatzky and F.H. Forrest, “Recognising Familiar and Unfamiliar Faces,”
Memory and Cognition, Vol. 12, 1984, pp. 60-70.
[9] F. Galton, “Personal Identification and Description,” Nature, June 1888, pp.
173-177.
[10] S. Carey and R. Diamond, “From Piecemeal to Configurational Representation
of Faces,” Science, Vol. 195, 1977, pp. 312-313.
[11] A.L. Yuille, D.S. Cohen, and P.W. Hallinan, “Feature Extraction from Faces
using Deformable Templates,” Proc. of CVPR, San Diego, Calif., 1989.
[13] H. Wang and S.F. Chang, “A Highly Efficient System for Automatic Face
Region Detection in MPEG Video,” IEEE Transactions on Circuits and Systems for
Video Technology, Vol. 7, No. 4, Aug. 1997, pp. 615-628.
[15] D. Rowland et al., “Transforming Facial Images in 2 and 3-D,” Proc. Imagina
97 Conferences, 1997, pp. 159-175.
[22] “Average Face,” Boston University Computer Help Desk,
http://caslab.bu.edu/course/cs585/P2/artdodge/average_face_clipped.jpg (current
Oct. 16, 2001)
Appendix A
Program Listings
A selection of code for the three modules – face detection, face normalisation and
face recognition – is included below.
% Recogniser -
%
% The following code performs the recognition stage of the system under static
% mode. Recall that the stages of the face detection and normalisation modules are
% interconnected.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Face Detection and Normalisation – Convert face to a standardised face
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Read file in
color_image = imread([dir_loc name '.jpg'], 'jpg');
figure, imagesc(color_image);
grey_image = rgb2gray(color_image);
figure, imshow(grey_image), axis on
grey_image = double(grey_image);
%--------------------------------------------------
% Downsample image
[rows, cols, rgb] = size(color_image);
dsamp = 1;
if dsamp,
    drate = 4;
    imtmp = color_image(:, :, 1);
    im(:, :, 1) = imtmp(1:drate:rows, 1:drate:cols);
    imtmp = color_image(:, :, 2);
    im(:, :, 2) = imtmp(1:drate:rows, 1:drate:cols);
    imtmp = color_image(:, :, 3);
    im(:, :, 3) = imtmp(1:drate:rows, 1:drate:cols);
end
color_image = im;
%--------------------------------------------------
% NB: the variable 'image' shadows Matlab's built-in image() function; it
% holds the colour-converted picture from an earlier (elided) step.
Cb = image(:, :, 2);
Cr = image(:, :, 3);
[rows, cols, rgb] = size(image);
skin_image = zeros(rows, cols);
numpixels = rows*cols;
for n = 1:numpixels
    skin_image(n) = Mask(Cb(n), Cr(n));
end
clear Mask u v Cb Cr rows cols rgb numpixels n
Skin_mask_time = cputime-t
%--------------------------------------------------
% Clean up noise
mimage = bwmorph(skin_image, 'erode');
mimage = bwmorph(mimage, 'dilate');
clear skin_image
figure, imshow(mimage), zoom on, pause
%--------------------------------------------------
if dsamp,
    col = col*drate; row = row*drate;
    width = width*drate; height = height*drate;
end
%--------------------------------------------------
% (The opening line of the first bound check was cut in this excerpt; it is
% restored here by symmetry with the column check that follows.)
if row + h > size(grey_image, 1)
    h = size(grey_image, 1) - row;
end
if col + w > size(grey_image, 2)
    w = size(grey_image, 2) - col;
end
box = grey_image(row-10:row+h-1, col-10:col+w-1);
Determine_box_time = cputime-t
%--------------------------------------------------
%--------------------------------------------------
% Lighting Normalisation 1
if on
    avg_energy = sum(sum(avg.^2));
    box_energy = sum(sum(box.^2));
    ratio = sqrt(avg_energy/box_energy)
    box = box.*ratio;
    figure, imshow(uint8(box)), colormap(gray), axis on, title('Light Normalised Box')
end
%--------------------------------------------------
% Downsample by drate
boxsaved = box;
avgsaved = avg;
drate = 2;
if on,
    box = box(1:drate:no_rows, 1:drate:no_columns);
    avg = avg(1:drate:heights, 1:drate:widths);
    no_rows = ceil(no_rows/drate); no_columns = ceil(no_columns/drate);
    heights = ceil(heights/drate); widths = ceil(widths/drate);
    for person = 1:size(eigentemplate, 1)
        % Preset in box define module later
        % height_face_box = 90; width_face_box = 80;
        x = reshape(eigentemplate(person, :), 90, 80);
        x = x(1:drate:90, 1:drate:80);
        eigentemplate_new(person, :) = x(:)';
    end
    figure(f+2), imshow(uint8(box)), axis on, zoom on, title('DownSampled Box')
    eigentemplate = eigentemplate_new;
    clear x eigentemplate_new
end
%--------------------------------------------------
for counter_2 = 1:(dist_cols) / (colres);
column = counter_2 * colres;
test_window = box(row:(row + heights - 1), column:(column + widths - 1));
%--------------------------------------------------
% Additional Lighting normalisation
if off
avg_energy = sum(sum(avg.^2));
box_energy = sum(sum(test_window.^2));
ratio = sqrt(avg_energy/box_energy);
test_window = test_window.*ratio;
else
ratio = 1;
end
%--------------------------------------------------
if ncc
distance(row, column) = ncc_cal(test_window, avg);
else
norm_window = test_window(:) - avg(:);
weights = eigentemplate * norm_window;
sum_weights = (sum(weights.^2)).*ratio;
% distance(row, column) = (norm(norm_window(:))^2) - sum(weights.^2);
distance(row, column) = 1 - ( sum_weights / (norm(norm_window(:))^2) );
end
end
end
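The function `ncc_cal` called above is not reproduced here; a minimal sketch, assuming it computes a zero-mean normalised cross-correlation score between the candidate window and the average face template:

```matlab
function d = ncc_cal(window, template)
% Zero-mean normalised cross-correlation between a candidate
% window and the average face template (sketch only).
w = window(:) - mean(window(:));
t = template(:) - mean(template(:));
d = (w' * t) / (norm(w) * norm(t));
```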
if ncc,
[void, index] = max(distance(:));
else
[void, index] = min(distance(:));
end
column = fix( (index(1)-1) / (no_rows - heights) ) + 1;
row = rem(index(1), (no_rows - heights));
if (row == 0)
row = no_rows-heights;
end
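The manual index-to-subscript conversion above assumes the distance map has exactly `no_rows - heights` rows; an equivalent using MATLAB's built-in, valid under that same assumption:

```matlab
% Equivalent subscript recovery using ind2sub (sketch), valid when
% size(distance, 1) equals no_rows - heights as the manual code assumes:
[row, column] = ind2sub(size(distance), index(1));
```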
disp(['face identified at row ', num2str(row), ' column ', num2str(column)]);
figure(f+2)
set(line([column + (widths / 2); column + (widths / 2)], [row; row + heights]), 'Color', [1 1 0], 'Linewidth', 1);
set(line([column; column + widths], [row + (heights / 2); row + (heights / 2)]), 'Color', [1 1 0], 'Linewidth', 1);
figure(f)
row = row*drate; column = column*drate;
heights = heights*drate; widths = widths*drate;
%set(line([column + (widths / 2); column + (widths / 2)], [row; row + heights]), 'Color', [1 1 0], 'Linewidth', 1);
%set(line([column; column + widths], [row + (heights / 2); row + (heights / 2)]), 'Color', [1 1 0], 'Linewidth', 1);
figure, imagesc(distance), colorbar
dist=distance;
%--------------------------------------------------
%--------------------------------------------------
%--------------------------------------------------
% Save image
if save
dir = 'C:\My Documents\';
imwrite(uint8(face_box), [dir name '.bmp'], 'bmp');
end
%--------------------------------------------------
% Place on canvas
% input - row & column position from facespace search
width = 170; height = 128;
% The aim is to have the centre of face = centre canvas
canvas = zeros(height, width);
row_start = floor((height/2) - (height_face_box/2));
col_start = floor((width/2) - (width_face_box/2));
canvas(row_start:row_start+height_face_box-1, col_start:col_start+width_face_box-1) = face_box;
canvas = uint8(canvas);
figure, imshow(canvas), axis on
%--------------------------------------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Face Recognition - Now face image is standard we can find weights and find identity
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% output - person, trial weights
dir_orig = 'C:\My Documents\Original Face\';
imshow(imread([dir_orig face_names{person} '.jpg'], 'jpg'));
Recognised = cputime-t
if on
speak(face_names{person})
end
%--------------------------------------------------