
Pattern Recognition 41 (2008) 396–405, doi:10.1016/j.patcog.2007.06.008

www.elsevier.com/locate/pr

Elastic shape-texture matching for human face recognition


Xudong Xie, Kin-Man Lam ∗
Department of Electronic and Information Engineering, Centre for Multimedia Signal Processing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
∗ Corresponding author. Tel.: +852 2362 8439; fax: +852 2766 6207. E-mail address: enkmlam@polyu.edu.hk (K.-M. Lam).

Received 26 August 2006; received in revised form 7 March 2007; accepted 12 June 2007

Abstract
In this paper, a novel elastic shape-texture matching method, namely ESTM, is proposed for human face recognition. In our approach, both
the shape and the texture information are used to compare two faces without establishing any precise pixel-wise correspondence. The edge map
is used to represent the shape of an image, while the texture information is characterized by both the Gabor representations and the gradient
direction of each pixel. Combining these features, a shape-texture Hausdorff distance is devised to compute the similarity of two face images.
The elastic matching is robust to small, local distortions of the feature points such as those caused by facial expression variations. In addition,
the use of the edge map, Gabor representations and the direction of the image gradient can all alleviate the effect of illumination to a certain
extent.
With different databases, experimental results show that our algorithm can always achieve a better performance than other face recognition
algorithms under different conditions, except when an image is under poor and uneven illumination. Experiments based on the Yale database,
AR database, ORL database and YaleB database show that our proposed method can achieve recognition rates of 88.7%, 97.7%, 78.3% and
89.5%, respectively.
© 2007 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.

Keywords: Face recognition; Hausdorff distance; Gabor wavelets; Elastic shape-texture matching

1. Introduction

The morphable face model [1,2] has achieved great success in encoding and representing human face images. This approach separates a given image into its shape and texture information. The shape encodes the feature geometry of the face, which is represented by a set of facial feature points and can be used to construct a pixel-wise correspondence on a reference image. The texture, which is shape-free, can be obtained after mapping the original image onto the reference image. Therefore, the shape-free texture information can be constructed only after the shape information about a face has been obtained. Although many different methods have been proposed to locate facial features [3,4] and to detect face contours [5,6], it is still a challenge to accomplish this automatically. Furthermore, Ref. [2] reported that the morphable face approach cannot achieve a robust performance for images under various conditions.

Psychological studies have indicated that line drawings of objects can be recognized as quickly and almost as accurately as photographs [7], which means that the edge-like retinal images of faces can be used for face recognition at the level of early vision. Therefore, the edges of a face image can be considered the aggregate of important feature points that are useful for face recognition. The Hausdorff distance [8] is such an approach, whereby the distance between two edge maps or point sets can be calculated without the explicit pairing of the points. The smaller the Hausdorff distance, the smaller the difference or deformation between the two corresponding edge maps, and the more similar the two corresponding face images. Takács [9] introduced a "doubly" modified Hausdorff distance (M2HD), which provides a more reliable and robust distance measure between two point sets than the original one. A spatially weighted modified Hausdorff distance [10] has also been proposed, which considers the importance of facial features and allocates different weights to the points according to the importance of the facial regions. Ref. [11] incorporates the a priori structure of a human face, namely an eigen-mask, to emphasize the importance of facial regions, and achieves a better performance level. All these methods are based on edge maps, without considering any texture information about the input images.

The Gabor wavelets (GW), whose kernels are similar to the response of the 2-D receptive field profiles of the mammalian simple cortical cell [12], exhibit the desirable characteristics of capturing salient visual properties such as spatial localization, orientation selectivity, and spatial frequency [13]. The GW can effectively abstract local and discriminating features, which are useful for texture detection and face recognition [14,15]. In Ref. [14], the GW were applied to face recognition via the dynamic link architecture (DLA) framework. The DLA first computes the Gabor jets of the face images, and then elastic graph matching (EGM) is used to compare the resulting image structures. Ref. [16] introduced an automatic weighting for the nodes of the elastic graph according to their significance, and also explored the significance of the elastic deformation for the application of face-based person authentication. The morphological DLA proposed in Ref. [17] adopts discriminatory power coefficients to weigh the matching error at each grid node. Although these methods can preserve some texture features and local geometry information [18], the graph structure cannot sufficiently and effectively represent the distribution of all the feature points of human faces.

In this paper, we propose a novel, elastic shape-texture matching (ESTM) method for face recognition. Our method considers the edge map, which represents the shape information about a face image, and the GW, which characterize the corresponding texture information. The angles (gradient directions) of the edge points [19], which provide additional information about the shape, are also incorporated in our algorithm. Based on the shape and the texture information, an elastic matching is proposed for face recognition. Unlike the morphable face model, our method does not need to find the pixel-wise correspondence between images, which is a very difficult task in practical applications. The elastic matching is robust to small, local distortions of the feature points, such as those caused by facial expression variations; and the edge map, the Gabor representations and the direction of the image gradient can all alleviate the effect of illumination to a certain extent. Therefore, our algorithm, ESTM, can reduce the effects of expression, lighting and perspective, and can achieve a good recognition performance under these different variations. Experimental results based on different databases show that ESTM outperforms other methods that employ only the shape (edge map) or only the texture (GW) information under various image conditions.

This paper is organized as follows. Our new ESTM method is presented in Section 2. Experimental results, which compare the performance of our proposed algorithm to that of other face recognition algorithms based on the Yale database [20], the AR database [21], the ORL database [22] and the YaleB database [23], are given in Section 3. Finally, conclusions are drawn in Section 4.

2. Elastic shape-texture matching

It has been shown that the combined shape and texture feature carries the most discriminating information for human face recognition [1]. In fact, these two features are complementary to each other and contain the complete information about face images. We therefore propose an efficient algorithm, called ESTM, which combines these two types of information for face recognition. In our approach, the edge map is used to represent the shape information about a face image, instead of some specific feature points that are very difficult to locate accurately in practice. As the magnitudes of the Gabor representations can provide a measure of the local properties of an image [14], GW are used to extract the texture information in our method. Furthermore, we consider the gradient direction [19] of each edge point as a supplement to representing the shape. The gradient direction, in this paper also called the angle information, is defined as follows:

θ_G(x, y) = arctan( G_y(x, y) / G_x(x, y) ),   (1)

where G_x(x, y) = f(x, y) ∗ K_x(x, y), G_y(x, y) = f(x, y) ∗ K_y(x, y), f(x, y) represents the gray-level intensity of an image at coordinates (x, y), ∗ denotes the 2-D convolution operator, and K_x(x, y) and K_y(x, y) are the Sobel horizontal and vertical gradient kernels, respectively.
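To make Eq. (1) concrete, the following is a minimal sketch in Python with NumPy/SciPy (our choice of language, not the authors'); the 3 × 3 Sobel kernel values and the symmetric boundary handling are assumptions, as the paper does not specify them.

```python
# Sketch of the angle (gradient-direction) map of Eq. (1).
import numpy as np
from scipy.signal import convolve2d

# 3x3 Sobel kernels Kx (horizontal) and Ky (vertical); assumed form.
KX = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]], dtype=float)
KY = KX.T

def angle_map(f):
    """Gradient-direction map theta_G of a gray-level image f."""
    gx = convolve2d(f, KX, mode='same', boundary='symm')  # Gx = f * Kx
    gy = convolve2d(f, KY, mode='same', boundary='symm')  # Gy = f * Ky
    # arctan(Gy/Gx) lies in (-pi/2, pi/2), matching the range stated in
    # Section 2.1; the small epsilon avoids division by zero.
    return np.arctan(gy / (gx + 1e-12))
```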
2.1. The edge maps, Gabor maps and angle maps

In order to obtain the edge map of a face image, morphological operations [19] are first applied. In this paper, the output of an image after edge detection is called an edge image, while the binary image produced after a thresholding procedure is called an edge map of the image. When determining the threshold, we consider not only the edge image E_G(x, y), but also the intensity values of the original image f(x, y). This is because the important facial features, such as the eyes, mouth, etc., usually have lower gray-level intensities than other parts of a face. We define

n(x, y) = E_G(x, y) / f(x, y).   (2)

Therefore, a pixel with a larger value of n(x, y) can be considered more likely to be an edge point of the facial features. The values of n(x, y) are sorted in descending order, and the threshold is set so that the 12% of points with the largest magnitudes of n(x, y) are selected, where the threshold value of 12% was obtained by experiment on the Yale database. This threshold achieves the best recognition result with this database, and is then employed for the other databases in our experiments. The binary edge map obtained is denoted as E(x, y). Fig. 1(b) shows the edge images E_G obtained using morphological edge detection, and Fig. 1(c) displays the corresponding edge maps obtained using this adaptive thresholding scheme.

Fig. 1. (a) The original facial images. (b) The edge images obtained by morphological operations. (c) The edge maps obtained by the adaptive thresholding method.
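A sketch of this adaptive thresholding, under the same assumptions as above; the morphological edge image E_G is assumed to have been computed already, and the 12% edge fraction follows the value tuned on the Yale database.

```python
import numpy as np

def edge_map(edge_image, f, edge_fraction=0.12):
    """Binary edge map keeping the fraction of pixels with the largest
    n(x, y) = EG(x, y) / f(x, y) of Eq. (2)."""
    n = edge_image / (f.astype(float) + 1e-12)   # Eq. (2); guard against f = 0
    k = int(edge_fraction * n.size)              # number of edge points kept
    threshold = np.partition(n.ravel(), -k)[-k]  # k-th largest value of n
    return n >= threshold
```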

The Gabor map of an image, denoted as G̃(x, y), is obtained by concatenating the GW representations at different center frequencies and orientations. The dimension of G̃(x, y) is therefore determined by the numbers of center frequencies and orientations used. To reduce the dimension of this representation, only one center frequency and eight orientations are considered in our algorithm. The center frequency is chosen to be π/2, and the orientation varies from 0 to 7π/8 in steps of π/8. The gradient direction of a pixel varies from −π/2 to π/2. This angle information is also useful for describing the shape, and the angle map of an image is denoted as A(x, y).
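A sketch of such a Gabor map follows. The kernel below takes the common Gabor form used in the DLA literature [14]; the window size, the σ value and the normalization are assumptions, as the paper does not give them.

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(k, theta, sigma=np.pi, size=17):
    """Complex Gabor kernel with center frequency k and orientation theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    envelope = (k ** 2 / sigma ** 2) * np.exp(-k ** 2 * (x ** 2 + y ** 2)
                                              / (2 * sigma ** 2))
    carrier = np.exp(1j * k * (x * np.cos(theta) + y * np.sin(theta)))
    return envelope * carrier

def gabor_map(f, k=np.pi / 2, n_orient=8):
    """Magnitudes of the Gabor responses, stacked to shape (n_orient, H, W):
    one center frequency (pi/2) and eight orientations, as in ESTM."""
    return np.stack([np.abs(convolve2d(f, gabor_kernel(k, o * np.pi / n_orient),
                                       mode='same', boundary='symm'))
                     for o in range(n_orient)])
```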
2.2. Shape-texture Hausdorff distance

For the edge map E(x, y), Gabor map G̃(x, y), and angle map A(x, y), our shape-texture Hausdorff distance is defined as follows. Given two human face images A and B, two finite point sets A_P = {a_1, ..., a_{N_A}} and B_P = {b_1, ..., b_{N_B}} can be obtained, where the elements in A_P and B_P correspond to the points in the edge maps E_A and E_B of the original images, and N_A and N_B are the corresponding numbers of points in the sets A_P and B_P, respectively. Then, the shape-texture Hausdorff distance is

H(A, B) = max( h_st(A, B), h_st(B, A) ),   (3)

where h_st(A, B) is called the directed shape-texture Hausdorff distance and is defined as

h_st(A, B) = (1/N_A) Σ_{a∈A_P} max( I · min_{b∈N^a_{B_P}} d(a, b), (1 − I) · P ),   (4)

where N^a_{B_P} is the neighborhood of the point a in the set B_P, P is an associated penalty, and I is an indicator, which is equal to 1 if there exists a point b ∈ N^a_{B_P}, and equal to 0 otherwise. Here, d(a, b) is a distance measure between the point pair (a, b), which consists of three terms:

d(a, b) = α · d_e(a, b) + β · d_g(a, b) + γ · d_a(a, b),   (5)

where d_e(a, b), d_g(a, b) and d_a(a, b) are the edge distance, the Gabor distance and the angle distance, respectively, from the pixel a ∈ A_P to a pixel b within the neighborhood of a in B_P, and α, β and γ are coefficients used to adjust the weights of the three distance measures. The three measures are independent of each other and are defined as follows:

d_e(a, b) = ‖a − b‖,   (6)

d_g(a, b) = ‖G̃_A(a) − G̃_B(b)‖   (7)

and

d_a(a, b) = ‖A_A(a) − A_B(b)‖,   (8)

where ‖·‖ is an underlying norm (the L2 norm is used in our method), and G̃_A, G̃_B, A_A and A_B are the Gabor maps and angle maps of the two images, respectively.

In fact, the penalty P in Eq. (4) can also be considered as a combination of three parts, similar to Eq. (5), i.e.

P = α · P_e + β · P_g + γ · P_a,   (9)

where P_e, P_g and P_a are the corresponding penalties for the three distance measures, respectively, and α, β and γ have the same values as in Eq. (5). An advantage of using Eq. (9) instead of a fixed P is that it allows us to adopt different penalties for the different distance measures. In our method, we define

P_g(a) = ‖G̃_A(a) − G̃_B(a)‖.   (10)

Therefore, if a point of the point set B_P cannot be found in N^a_{B_P} for the point a ∈ A_P, the corresponding Gabor representation of image B at position a is used when computing the penalty for the Gabor distance. Because the magnitudes of the GW representation are less sensitive to the lighting conditions [24], they are useful for alleviating the effect of being unable to detect the edges under poor lighting. For simplicity, the penalties P_e(a) and P_a(a) are set as fixed values in our algorithm. From Eq. (10), we can see that the value of the penalty is adaptive to the point under consideration. Therefore, we use P(a) instead of a fixed value P in Eq. (4), i.e.

h_st(A, B) = (1/N_A) Σ_{a∈A_P} max( I · min_{b∈N^a_{B_P}} d(a, b), (1 − I) · P(a) ).   (11)
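Putting Eqs. (4)-(11) together, a compact, unoptimized sketch of the directed distance is given below. It is an illustration of the definitions, not the authors' implementation: edge points are (x, y) pairs, the maps index as [y, x], and the default penalties follow Table 1.

```python
import numpy as np

def directed_st_hausdorff(ptsA, gmapA, amapA, ptsB, gmapB, amapB,
                          alpha, beta, gamma,
                          pe=4.8, pa=np.pi / 30, d=4):
    """Directed shape-texture Hausdorff distance h_st(A, B), Eq. (11).
    d = 4 gives the 9 x 9 search neighborhood used in the experiments."""
    total = 0.0
    for (x, y) in ptsA:
        # candidate edge points of B inside the neighborhood N^a_{B_P}
        cands = [(u, v) for (u, v) in ptsB
                 if abs(u - x) <= d and abs(v - y) <= d]
        if cands:   # indicator I = 1: best-matching candidate under Eq. (5)
            total += min(
                alpha * np.hypot(u - x, v - y)                            # Eq. (6)
                + beta * np.linalg.norm(gmapA[:, y, x] - gmapB[:, v, u])  # Eq. (7)
                + gamma * abs(amapA[y, x] - amapB[v, u])                  # Eq. (8)
                for (u, v) in cands)
        else:       # I = 0: weighted penalty of Eq. (9), adaptive P_g of Eq. (10)
            pg = np.linalg.norm(gmapA[:, y, x] - gmapB[:, y, x])
            total += alpha * pe + beta * pg + gamma * pa
    return total / len(ptsA)
```

The symmetric distance of Eq. (3) is then simply the maximum of the two directed distances, A against B and B against A.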
Fig. 2. The architecture of our face recognition system: a pre-processing stage converts the input face image into a gray edge image and then an edge map, and the elastic shape-texture matching stage computes the shape-texture Hausdorff distance from the edge map, Gabor map and angle map against the database.

2.3. ESTM for face recognition

Using the shape-texture Hausdorff distance, an ESTM method for face recognition is proposed. Fig. 2 shows the whole process of our proposed face recognition system.

A practical face recognition technique needs to be robust to the image variations caused by illumination conditions, facial expressions, poses and perspectives, and other factors such as aging, hair styles and glasses. However, firstly, the variations caused by different conditions disturb an image in different ways. Lighting or perspective variations affect the global components of the image [25], i.e. its low-frequency spectrum is influenced, while facial expression variations affect only the high-frequency spectrum, which is called the high-frequency phenomenon [26]. In fact, as the compensation for one variation may have an adverse effect on another, most existing methods only consider one or two variations. Secondly, most of the existing methods need more than one image per person for training. When only one image per person is available for training, as is often the case in real applications, most of them cannot achieve a satisfactory performance.

In our method, we aim to perform face recognition with variations due to illumination, expression and perspective at the same time, while only one frontal face image of each person, under even illumination and with a normal expression, is available for training. For the features extracted, the edges are relatively insensitive to illumination, and the GW and the image gradient can also reduce the effect of varying lighting [24,27], so our method can maintain its recognition performance under uneven lighting conditions. For feature matching, the searching is non-rigid, i.e. elastic, which can tolerate small and local distortions of a human face and reduce the shape variations caused by expression or perspective. Because only edge points are considered when computing the distance, both the computational complexity and the memory requirement are greatly reduced. In fact, this ESTM approach can be considered as a combination of template matching and geometrical feature matching [4], which not only possesses the advantages of feature-based approaches, such as invariance to illumination and a low memory requirement, but also has the advantage of the high recognition performance of template matching.

As shown in Eq. (5), the values of {α, β, γ} are the weights of the three distance measures, which affect the recognition results. If α ≠ 0, β = 0 and γ = 0, ESTM is equivalent to M2HD. Table 1 shows some combinations of {α, β, γ}, which will be tested in Section 3. For each of the combinations, the corresponding best set of parameters is also tabulated, where the Yale database is used as the training data. Here we should note that α, β and γ are normalized such that α + β + γ = 1.
Table 1
The best sets of parameters for different conditions of {α, β, γ}, where α + β + γ = 1

Abbreviation   Conditions               Parameter set
M2HD           α ≠ 0, β = 0, γ = 0      α = 1, P_e = 4.8
ESTMa          α = 0, β = 0, γ ≠ 0      γ = 1, P_a = π/20
ESTMg          α = 0, β ≠ 0, γ = 0      β = 1
ESTMea         α ≠ 0, β = 0, γ ≠ 0      α = 0.04, γ = 0.96, P_e = 4.8, P_a = π/20
ESTMeg         α ≠ 0, β ≠ 0, γ = 0      α = 0.32, β = 0.68, P_e = 4.8
ESTM           α ≠ 0, β ≠ 0, γ ≠ 0      α = 0.02, β = 0.05, γ = 0.93, P_e = 4.8, P_a = π/30
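As an illustration of the overall pipeline of Fig. 2, the sketch below ties the pieces above together into a nearest-neighbor classifier. The helper names (angle_map, edge_map, gabor_map, directed_st_hausdorff) are the hypothetical functions sketched earlier, and the assignment of the weight values to {α, β, γ} is our reading of the full-ESTM row of Table 1 as reconstructed from the damaged extraction.

```python
import numpy as np

ALPHA, BETA, GAMMA = 0.02, 0.05, 0.93   # full-ESTM row of Table 1 (reconstructed)

def features(f, edge_image):
    """Edge points, Gabor map and angle map of a normalized face image f;
    the morphological edge image is assumed to be computed beforehand."""
    pts = [(x, y) for y, x in zip(*np.nonzero(edge_map(edge_image, f)))]
    return pts, gabor_map(f), angle_map(f)

def recognize(query_feat, gallery):
    """gallery: list of (label, features) pairs; returns the nearest label
    under the shape-texture Hausdorff distance H of Eq. (3)."""
    qp, qg, qa = query_feat
    def H(feat):
        p, g, a = feat
        return max(directed_st_hausdorff(qp, qg, qa, p, g, a,
                                         ALPHA, BETA, GAMMA),
                   directed_st_hausdorff(p, g, a, qp, qg, qa,
                                         ALPHA, BETA, GAMMA))
    return min(gallery, key=lambda item: H(item[1]))[0]
```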

3. Experimental results

In this section, we evaluate the performance of the ESTM algorithm with different conditions of the parameter set {α, β, γ} for face recognition based on different face databases. The databases used include the Yale database, the AR database, the ORL database and the YaleB database. The number of distinct subjects and the number of testing images in the respective databases are tabulated in Table 2.

Table 2
The test databases used in the experiments

                        Yale   AR    ORL   YaleB
Number of subjects      15     121   40    10
Number of test images   150    605   360   640

The face images in the different databases are captured under different conditions, such as varied lighting conditions, facial expressions, etc. Fig. 3 shows some examples of the images. In order to investigate the effect of the different conditions on the face recognition algorithms, the face images in the databases are divided manually into several sub-classes according to their different conditions, and the corresponding numbers are tabulated in Table 3. A normal image means that the face image is of frontal view, under even illumination and with a neutral expression. In our experiments, a face is under even illumination if the azimuth angle and the elevation angle of the lighting are both less than 20°. In Table 3, we have also combined the respective sub-classes of the same conditions to form the combined databases. For each of the combined databases, the training set consists of images from the corresponding sub-classes; e.g. the training and testing images of the combined database under normal conditions come from the Yale database, ORL database and YaleB database only.

Table 3
The sub-classes of the test databases used in the experiments

           Normal   Facial expression variation   Lighting variation   Perspective variation
Yale       45       75                            30                   –
AR         –        242                           363                  –
ORL        189      63                            –                    108
YaleB      160      –                             480                  –
Combined   394      380                           873                  108

In each database, one image of each subject under normal conditions was selected as a training sample, and the others formed the testing set. All images are cropped to a size of 64 × 64 based on the eye locations, and all color images are converted to gray-scale images. In order to measure the recognition performance of our proposed algorithm, the positions of the two eyes are located manually. The eyes can also be detected automatically, but the recognition rates would then not reflect the true performances of the respective methods, as the resulting error in detecting the eyes would degrade their performances. To enhance the global contrast of the images and to reduce the effect of uneven illumination, histogram equalization is applied to all the images. The neighborhood size is set at 9 × 9, which is suitable for small, non-rigid local distortions in human face recognition.
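A sketch of this pre-processing, assuming OpenCV; the exact crop geometry around the manually located eye positions is not given in the paper, so the margins below are illustrative only.

```python
import cv2

def preprocess(img_bgr, left_eye, right_eye, size=64):
    """Gray-scale conversion, eye-based 64 x 64 crop, histogram equalization."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    cx = (left_eye[0] + right_eye[0]) // 2        # midpoint between the eyes
    cy = (left_eye[1] + right_eye[1]) // 2
    half = abs(right_eye[0] - left_eye[0])        # assumed crop half-width
    crop = gray[max(cy - half, 0):cy + half, max(cx - half, 0):cx + half]
    face = cv2.resize(crop, (size, size))
    return cv2.equalizeHist(face)                 # global contrast enhancement
```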
The performances of ESTM and its several simplified versions, as listed in Table 1, are evaluated and compared with PCA [28], M2HD [9], GW, and EGM [14]. PCA can preserve the global structure of the image space, while the GW, EGM, and M2HD adopt the local information about the image. The M2HD considers only the shape information, the GW uses only the texture information in the matching, and the EGM can preserve some of the texture and local geometry information. For PCA, all the eigenfaces available for each database are used, i.e. at most M − 1, where M is the total number of training samples. For example, for the Yale database, AR database, ORL database and YaleB database, the number of eigenfaces used is 14, 120, 39 and 9, respectively. The GW employs three center frequencies, i.e. π/2, √2π/4 and π/4, and eight orientations, from 0 to 7π/8 in steps of π/8. The GW representations are concatenated to form a high-dimensional vector, which is used directly to compute the distance between two images pixel by pixel. The Euclidean distance is computed and the nearest-neighbor rule is adopted for classification. Similarly, the numbers of center frequencies and orientations used in the EGM are five and eight, respectively, while the dimension of the elastic graph is 6 × 8.

3.1. Face recognition under normal conditions

The respective recognition rates based on the different sub-databases with normal faces are shown in Table 4. From the results, we can observe that under normal conditions, most of the algorithms can achieve a high recognition rate. The performances of the algorithms are the worst with the ORL database, because the faces in the ORL database have some small facial expression and perspective variations.

The GW always outperforms the M2HD. This is consistent with the results in Ref. [1], i.e. the texture carries more discriminating information than the shape. Although ESTMg uses only 12% of the pixels in an image as edge points and one center frequency for the Gabor filters, this method can still achieve recognition rates similar to those of the GW. This observation shows that the edge points can be considered as the aggregate of important feature points that carry the most discriminating information for face recognition. The recognition rates using ESTMa are similar to the results using M2HD. Furthermore, ESTMea, which adopts not only the edge information but also the angle information, performs better than both ESTMa and M2HD in most cases. Our proposed ESTM method always outperforms the other methods, which means that the combined features carry the most discriminating information, rather than any one or two of them alone.

Fig. 3. Some cropped faces used in our experiments: (a) images from the Yale database; (b) images from the AR database; (c) images from the ORL database;
and (d) images from the YaleB database.

3.2. Face recognition under varying lighting conditions

The experimental results based on the images under varying lighting are shown in Table 5. In the Yale database, the lighting is either from the left or the right of the face images. In the AR database, besides the lighting from the left and the right, lighting from both sides of the face is also adopted. The YaleB database is often used to investigate the effect of lighting on face recognition; in this part of the experiments, we select as testing images only those images with azimuth angles or elevation angles of lighting larger than 20°.

The performance of PCA degrades significantly compared to the results based on normal faces: the recognition rate based on the combined database falls from 80.2% to 45.1%. As discussed in Section 2.3, lighting variations affect the global components of the image. Because PCA represents faces with their principal components, this method cannot work properly when a face is under severe lighting variations.

The edge map can serve as a representation robust to illumination changes if the objects concerned have only sharp edges. However, when the objects have smooth surfaces, as human faces do, some of the edges may not be detected in a consistent manner [29]. Moreover, when the lighting is not from the front of a face, the shadows produced will also affect the edge map generated. In the case of large illumination variation, such as with the YaleB database, the performances of those algorithms that rely on edge information for recognition will be greatly affected. Therefore, the GW outperforms the other methods when the faces are under poor lighting conditions. Although our ESTM is not as good as the GW for the YaleB database, it outperforms the GW when the Yale database and the AR database are used. The images in these two databases have some illumination variations, but they are not as large as those in the YaleB database. Furthermore, compared to the results of the M2HD, which is also based on edge maps, the ESTM achieves recognition rates higher by 13.2–26.8%.

3.3. Face recognition with different facial expressions and perspective variations

Experiments based on the face images with different facial expressions are performed, and the recognition results are summarized in Table 6. The performance of the GW degrades compared to the results in Section 3.1. Furthermore, its recognition rate is lower than that of M2HD in some cases. This is because facial expressions often cause some local distortions of the feature points, which then affect the corresponding local texture and shape properties; the GW considers the texture information about the neighborhood of each pixel, which is disturbed by the local distortions caused by changes in facial expression.

Table 4
Face recognition results under normal conditions

Recognition rate (%) PCA GW EGM M2HD ESTMa ESTMg ESTMea ESTMeg ESTM

Yale 82.2 91.1 73.3 80.0 91.1 86.7 93.3 88.9 93.3
ORL 64.0 81.0 72.5 79.4 77.8 84.1 79.9 84.7 84.7
YaleB 100.0 100.0 98.1 99.4 98.1 99.4 99.4 99.4 100.0
Combined 80.2 89.8 81.0 86.8 87.3 90.6 88.8 91.1 91.4

Table 5
Face recognition results under varying lighting conditions

Recognition rate (%) PCA GW EGM M2HD ESTMa ESTMg ESTMea ESTMeg ESTM

Yale 46.7 73.3 83.3 76.7 90.0 76.7 90.0 83.3 90.0
AR 80.4 96.7 71.3 84.0 94.2 93.7 96.4 94.8 97.2
YaleB 60.8 92.1 50.0 59.2 57.1 77.7 68.0 81.5 86.0
Combined 45.1 69.4 42.3 49.8 55.6 59.7 62.1 63.0 65.5

Table 6
Face recognition results under different facial expressions

Recognition rate (%) PCA GW EGM M2HD ESTMa ESTMg ESTMea ESTMeg ESTM

Yale 66.7 74.7 57.3 66.7 77.3 78.7 78.7 84.0 85.3
AR 84.3 94.6 92.1 89.7 97.5 98.8 97.1 97.5 98.3
ORL 71.4 73.0 69.8 84.1 77.8 76.2 79.4 88.9 90.5
Combined 76.3 84.2 74.2 78.2 86.6 86.8 86.8 89.2 90.0

Compared to the other methods, our proposed ESTM achieves the best performance. The recognition rate of ESTMg is slightly higher than that of the ESTM when using the AR database, and both methods have a recognition rate higher than 98%.

The relative performances of the different algorithms were also evaluated for faces with perspective variations. All the testing images were selected from the ORL database, with the faces either rotated out of the image plane, e.g. looking to the right, left, up or down, or rotated in the image plane, clockwise or anti-clockwise. The experimental results are tabulated in Table 7, and show that none of these face recognition methods can achieve a satisfactory performance under perspective variations. Nevertheless, the ESTM still outperforms the other methods.

3.4. Face recognition with different databases

We have evaluated and discussed the effect of different conditions on the different face recognition methods. In this section, we also show the performances of the respective face recognition methods based on the different databases without dividing them into sub-databases, as well as the performance based on the total combined database, in Table 8.

From Table 8, we can see that the ESTM outperforms all the other methods based on the different databases, except for the YaleB database, in which case the GW achieves the best performance. In addition, the simplified versions of ESTM, i.e. ESTMa, ESTMg, ESTMea and ESTMeg, also outperform the traditional methods in most cases. With these four databases, the recognition rate for the ORL database is always lower than for the others, no matter which method is adopted. This is due to the effect of perspective variations, which was discussed in Section 3.3.

3.5. Storage requirements and computational complexity

For our approach, the data stored in a database for a face image include its edge map, Gabor map, and angle map. Suppose that the size of the normalized face is N × N, and that a fraction η of the points are selected as edge points in the edge map. The average number of feature points for an edge map is η · N², where a feature point consists of x- and y-coordinates and can be represented by two bytes. The dimensions of the Gabor map and angle map are n_f n_a N² and N², respectively, where n_f and n_a are the numbers of center frequencies and orientations used for the Gabor filters. Each element in the Gabor map and the angle map is represented by a 16-bit floating-point number. The numbers of bits needed to represent an edge map, Gabor map, and angle map are tabulated in the second row of Table 9, and therefore the total number of bits used to represent a face image in the database is 16(η + n_f n_a + 1)N².

Table 7
Face recognition results under various perspectives

Recognition rate (%) PCA GW EGM M2HD ESTMa ESTMg ESTMea ESTMeg ESTM

ORL 39.8 56.5 42.6 43.5 50.9 56.5 48.1 57.4 60.0

Table 8
Face recognition results based on different databases

Recognition rate (%) PCA GW EGM M2HD ESTMa ESTMg ESTMea ESTMeg ESTM

Yale 67.3 79.3 67.3 72.7 84.0 80.7 85.3 85.3 88.7
AR 82.0 95.9 79.7 86.3 95.5 95.7 96.7 95.9 97.7
ORL 58.1 72.2 63.1 69.4 69.7 74.4 70.3 77.2 78.3
YaleB 70.6 94.1 62.0 69.2 67.3 83.1 75.9 85.9 89.5
Combined 57.9 75.3 60.7 61.4 68.9 72.7 73.8 75.6 77.0

Table 9
Storage requirements and computational complexity

                              Edge map          Gabor map           Angle map
Storage requirements (bits)   16ηN²             16n_f n_a N²        16N²
Computational complexity:
  Feature extraction          O(N²)             O(N² log₂(N²))      O(N²)
  Matching                    O(2N²D²Mt_e)      O(2N²D²Mt_g)        O(2N²D²Mt_a)

For a query image, the computational time for face recognition includes two parts: feature extraction and matching. The runtime required for feature extraction is the time spent computing the corresponding edge map, Gabor map, and angle map. As all these maps of the training images have already been computed and stored in the face database, we only need to consider the time required to compute the maps of the query image. The computational complexities of computing an edge map, a Gabor map and an angle map are of the order of O(N²), O(N² log₂(N²)) and O(N²), respectively. For searching in a large database, the runtime for matching is the most significant part of the whole process. Suppose that the size of the neighborhood considered when searching for a matching pair is D × D. This means that the possible number of pixels to be compared when matching each point pair is D². In this matching, the edge distance d_e(a, b), the Gabor distance d_g(a, b) and the angle distance d_a(a, b) between a pixel a ∈ A and a pixel b ∈ B are to be computed. Suppose that the average runtimes required to compute these three distances for one point pair (a, b) are t_e, t_g and t_a, respectively, and that the total runtime is t_all = t_e + t_g + t_a. Then the computational complexity of ESTM is of the order of O(2N²D²Mt_all), where the factor of 2 arises because both h_st(A, B) and h_st(B, A) in Eq. (3) are to be computed, and M is the number of images stored in the database. The respective computational complexities of M2HD, ESTMg and ESTMa are shown in the last row of Table 9, where only the edge map, the Gabor map or the angle map is considered. Experiments were conducted on a computer system with a Pentium IV 2.4 GHz CPU and 512 MB RAM. The average runtime using ESTM for face recognition on the ORL database (40 face subjects) is 0.6 s.
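For illustration, with N = 64, D = 9 (the 9 × 9 neighborhood) and the M = 40 subjects of the ORL database, this bound is

2N²D²M = 2 × 64² × 9² × 40 ≈ 2.7 × 10⁷

point-pair comparisons per query. Since only about ηN² ≈ 491 pixels per image are edge points, the number of comparisons actually performed is closer to 2 × 491 × 81 × 40 ≈ 3.2 × 10⁶, which is consistent with the sub-second runtime reported above.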
4. Conclusions

In this paper, we have proposed a novel elastic shape-texture matching algorithm, namely ESTM, for human face recognition. In our approach, the edge map is used to represent the shape information about an input image, and the GW and the gradient direction are employed to characterize the corresponding texture information. For a query image, its edge map, Gabor map and angle map are first computed, and then a shape-texture Hausdorff distance is proposed to compute the difference between the query input and the faces in a database. This method does not need to construct a precise pixel-wise correspondence between the two images being compared, and the matching is performed within a neighborhood. This makes the approach robust to small and local distortions of the facial feature points, and thus suitable for face recognition.

The paper also addresses the performances of different face recognition algorithms with respect to changes in facial expression, uneven illumination, and perspective. Experiments were conducted on different databases, and show that our algorithm can always achieve the best performance compared to other algorithms, such as PCA, GW, EGM and M2HD, under different conditions. The only exception is when the face images are under very poor lighting conditions, in which case the GW performs the best while the ESTM achieves the second highest recognition rate. Furthermore, only one image per person is used for training in our experiments, which makes the method very useful for practical face recognition applications.

As ESTM uses the edge map, the performance of this method relies on the precision of edge detection. In fact, as discussed in Section 3.2, when an image is under seriously uneven illumination, whereby the edge information is distorted by shadows, the performance of ESTM will degrade. When the image has some variations caused by occlusions, such as wearing glasses and/or a scarf, the performance will also be degraded: in this case, the algorithm cannot judge which edges come from the face and which from the accoutrements. A proper way to solve this problem is to use an additional preprocessing method to improve the quality of the edge map. For example, Refs. [30,31] provide methods of reconstructing a visually natural human face from an image under uneven illumination, and Ref. [32] describes a method of removing glasses from facial images. Another possible way around this problem is to assign different weights to different edge points according to their importance; this weighting function could then be incorporated into the shape-texture Hausdorff distance, as in Ref. [11].

Acknowledgment

The work described in this paper was supported by The Hong Kong Polytechnic University, Hong Kong, China.

References

[1] C. Liu, H. Wechsler, A shape- and texture-based enhanced Fisher classifier for face recognition, IEEE Trans. Image Process. 10 (4) (2001) 598–608.
[2] A. Lanitis, C.J. Taylor, T.F. Cootes, Automatic interpretation and coding of face images using flexible models, IEEE Trans. Pattern Anal. Mach. Intell. 19 (7) (1997) 743–765.
[3] K.W. Wong, K.M. Lam, W.C. Siu, An efficient algorithm for human face detection and facial feature extraction under different conditions, Pattern Recognition 34 (10) (2001) 1993–2004.
[4] R. Brunelli, T. Poggio, Face recognition: features versus templates, IEEE Trans. Pattern Anal. Mach. Intell. 15 (10) (1993) 1042–1052.
[5] I. Craw, D. Tock, A. Bennett, Finding face features, in: G. Sandini (Ed.), Proceedings of the European Conference on Computer Vision, Springer, Berlin, 1992, pp. 92–96.
[6] W.P. Choi, K.M. Lam, W.C. Siu, An adaptive active contour model for highly irregular boundaries, Pattern Recognition 34 (2) (2001) 323–331.
[7] I. Biederman, Recognition-by-components: a theory of human image understanding, Psychol. Rev. 94 (1987) 115–147.
[8] D.P. Huttenlocher, G.A. Klanderman, W.J. Rucklidge, Comparing images using the Hausdorff distance, IEEE Trans. Pattern Anal. Mach. Intell. 15 (9) (1993) 850–863.
[9] B. Takács, Comparing face images using the modified Hausdorff distance, Pattern Recognition 31 (12) (1998) 1873–1880.
[10] B. Guo, K.M. Lam, W.C. Siu, S. Yang, Human face recognition based on spatially weighted Hausdorff distance, Pattern Recognition Lett. 24 (1–3) (2003) 499–507.
[11] K.H. Lin, K.M. Lam, W.C. Siu, Spatially eigen-weighted Hausdorff distances for human face recognition, Pattern Recognition 36 (8) (2003) 1827–1834.
[12] C.K. Chui, An Introduction to Wavelets, Academic Press, Boston, 1992.
[13] C. Liu, H. Wechsler, Independent component analysis of Gabor features for face recognition, IEEE Trans. Neural Networks 14 (4) (2003) 919–928.
[14] M. Lades, J.C. Vorbruggen, J. Buhmann, J. Lange, C. von der Malsburg, R.P. Wurtz, W. Konen, Distortion invariant object recognition in the dynamic link architecture, IEEE Trans. Comput. 42 (3) (1993) 300–311.
[15] D. Liu, K.M. Lam, L.S. Shen, Optimal sampling of Gabor features for face recognition, Pattern Recognition Lett. 25 (2) (2004) 267–276.
[16] B. Duc, S. Fischer, J. Bigun, Face authentication with Gabor information on deformable graphs, IEEE Trans. Image Process. 8 (4) (1999) 504–516.
[17] C.L. Kotropoulos, A. Tefas, I. Pitas, Frontal face authentication using morphological elastic graph matching, IEEE Trans. Image Process. 9 (4) (2000) 555–560.
[18] J. Zhang, Y. Yan, M. Lades, Face recognition: eigenface, elastic matching, and neural nets, Proc. IEEE 85 (9) (1997) 1423–1435.
[19] R.C. Gonzalez, R.E. Woods, Digital Image Processing, Addison-Wesley, Reading, MA, 1993.
[20] Yale University, The Yale face database, http://cvc.yale.edu/projects/yalefaces/yalefaces.html.
[21] A.M. Martinez, R. Benavente, The AR face database, CVC Technical Report #24, June 1998.
[22] The Olivetti Research Laboratory, Cambridge, UK, The ORL face database, http://www.uk.research.att.com/pub/data/att_faces.zip.
[23] Yale University, The Yale face database B, http://cvc.yale.edu/projects/yalefacesB/yalefacesB.html.
[24] L. Shams, C. von der Malsburg, The role of complex cells in object recognition, Vision Res. 42 (22) (2002) 2547–2554.
[25] Y. Adini, Y. Moses, S. Ullman, Face recognition: the problem of compensating for changes in illumination direction, IEEE Trans. Pattern Anal. Mach. Intell. 19 (7) (1997) 721–732.
[26] C. Nastar, N. Ayache, Frequency-based nonrigid motion analysis, IEEE Trans. Pattern Anal. Mach. Intell. 18 (11) (1996) 1067–1079.
[27] H.F. Chen, P.N. Belhumeur, D.W. Jacobs, In search of illumination invariants, in: Proceedings of the IEEE Conference on CVPR, Hilton Head, 2000, pp. 254–261.
[28] M. Turk, A. Pentland, Eigenfaces for recognition, J. Cognitive Neurosci. 3 (1) (1991) 71–86.
[29] Y. Moses, Face recognition: generalization to novel images, Ph.D. Thesis, Weizmann Institute of Science, 1993.
[30] X. Xie, K.M. Lam, Face recognition under varying illumination based on a 2D face shape model, Pattern Recognition 38 (2) (2005) 221–230.
[31] D.H. Liu, K.M. Lam, L.S. Shen, Illumination invariant face recognition, Pattern Recognition 38 (10) (2005) 1705–1716.
[32] J.S. Park, Y.H. Oh, S.C. Ahn, S.W. Lee, Glasses removal from facial image using recursive error compensation, IEEE Trans. Pattern Anal. Mach. Intell. 27 (5) (2005) 805–811.

About the Author—XUDONG XIE received his B.Eng. degree in Electronic Engineering and M.Sc. degree in Signal and Information Processing from the
Department of Electrical Engineering, Tsinghua University, China, in 1999 and 2002, respectively. He received his Ph.D. degree from the Department of
Electronic and Information Engineering, The Hong Kong Polytechnic University in 2006. His research interests include image processing, pattern recognition,
and human face analysis.

About the Author—DR. KIN-MAN LAM received his Associateship in Electronic Engineering with distinction from The Hong Kong Polytechnic University
(formerly called Hong Kong Polytechnic) in 1986. He won the S.L. Poa Scholarship for overseas studies and was awarded an M.Sc. degree in communication
engineering from the Department of Electrical Engineering, Imperial College of Science, Technology and Medicine, England, in 1987. In August 1993, he
undertook a Ph.D. degree program in the Department of Electrical Engineering at the University of Sydney, Australia, and won an Australian Postgraduate
Award for his studies. He completed his Ph.D. studies in August 1996, and was awarded the IBM Australia Research Student Project Prize.
From 1990 to 1993, he was a lecturer at the Department of Electronic Engineering of The Hong Kong Polytechnic University. He joined the Department of
Electronic and Information Engineering, The Hong Kong Polytechnic University again as an Assistant Professor in October 1996, and has been an Associate
Professor since February 1999. Dr. Lam has also been a member of the organizing committee and program committee of many international conferences. In
particular, he was the Secretary of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’03), the Technical Chair
of the 2004 International Symposium on Intelligent Multimedia, Video and Speech Processing (ISIMP 2004), and a Technical Co-Chair of the 2005 International
Symposium on Intelligent Signal Processing and Communication Systems (ISPACS 2005). In addition, Dr. Lam was a Guest Editor for the Special Issue on
Biometric Signal Processing, EURASIP Journal on Applied Signal Processing.
Currently, Dr. Lam is the Chairman of the IEEE Hong Kong Chapter of Signal Processing and an Associate Editor of the EURASIP Journal on Image and Video
Processing. His current research interests include human face recognition, image and video processing, and computer vision.
