Face Recognition
Yun Xue1, Chong Sze Tong2, and Weipeng Zhang3
1 Introduction
In the past three decades, face recognition has received increasing attention,
and the Principal Component Analysis (PCA) algorithm has been proven to be
a practical face-based approach for this task [9].
However, the traditional PCA method has some limitations. First, though it
gives an accurate representation of face images, it does not have good discriminatory
ability. Secondly, since this method involves both additive and subtractive combinations,
its basis images lack an intuitive visual meaning. Finally,
because this approach finds only global features in face images, it cannot
achieve good performance when handling cases with occlusions.
Recently, a new method called non-negative matrix factorization (NMF) has been
proposed for obtaining a linear representation of data. Under the non-negativity
constraints, this method approximately factorizes the initial data matrix into two
non-negative matrix factors. Since it allows only additive, not subtractive, combinations of basis images, a part-based representation of images is consequently
produced.
For face recognition, we generally project all the face images into this NMF
space and extract all the relevant feature vectors. Then the comparison between
faces is performed by calculating the distance between all these vectors. Usually,
the Euclidean distance, the L1 distance and the Mahalanobis distance will be
used at this stage.
Though the selection of distance measure is important for the performance of
the face recognition system, there is only limited published research [4] which
evaluates the different distance measures for NMF-based face recognition.
In this article, we compare the performance of 17 distance measures for NMF-based face recognition. Based on the experimental results, we find that a new
non-negative vector similarity coefficient-based (NVSC) distance, which we are
advocating for use in NMF-based recognition, is always among the best distance
measures across different image databases and at different settings.
This paper is organized as follows. Section 2 reviews the background theory of
NMF. The detailed definitions of the distance measures used in this paper are described
in Sect.3. In Sect.4, we give some description of the image databases used in the
paper. Some experimental results of a face recognition system based on the NMF
algorithm are discussed in Sect.5. Finally, we present our conclusions and discuss
some future work in Sect.6.
2 Review of NMF
This section reviews the background theory of NMF for face recognition. NMF is an
unsupervised learning method that obtains a linear representation of data under
non-negativity constraints. These constraints lead to a part-based representation
because they allow only additive, not subtractive, combinations of the original
data [6]. The basic idea is as follows.
First, represent an image database as an n × m matrix V, where each column
corresponds to an initial face image and contains n non-negative elements
characterizing its pixel values, and m is the number of training images.
Then we can find two new non-negative matrices (W and H) that approximate
the original matrix:
V_{ij} \approx (WH)_{ij} = \sum_{a=1}^{r} W_{ia} H_{aj}    (1)
Here W is an n × r matrix whose r columns are the non-negative basis images, and
H is an r × m matrix whose columns give the encoding coefficients of the corresponding
training images. The non-negativity constraints are compatible with the intuitive idea
of combining parts to form a whole face.
The update rules for NMF are derived as follows.
First, construct an objective function that characterizes the difference between V
and W H:

F = \sum_{i=1}^{n} \sum_{j=1}^{m} \left[ V_{ij} \log \frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij} \right]    (2)

Then an iterative algorithm converging to a local minimum of this objective
function is derived [6]:

W_{ia} \leftarrow W_{ia} \sum_{j} \frac{V_{ij}}{(WH)_{ij}} H_{aj}    (3)

W_{ia} \leftarrow \frac{W_{ia}}{\sum_{j} W_{ja}}    (4)

H_{aj} \leftarrow H_{aj} \sum_{i} W_{ia} \frac{V_{ij}}{(WH)_{ij}}    (5)
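As a concrete illustration of the update rules (3)-(5), here is a minimal Python/NumPy sketch of the multiplicative updates; the matrix sizes, iteration count, and random initialization are illustrative assumptions, not settings taken from the paper.

```python
import numpy as np

def nmf_divergence(V, r, n_iter=200, eps=1e-9, seed=0):
    """Factorize a non-negative n x m matrix V into W (n x r) and H (r x m)
    with the divergence-based multiplicative updates of Eqs. (3)-(5)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        W *= (V / WH) @ H.T                   # Eq. (3)
        W /= W.sum(axis=0, keepdims=True)     # Eq. (4): each basis column sums to 1
        WH = W @ H + eps
        H *= W.T @ (V / WH)                   # Eq. (5)
    return W, H

# Toy usage: 20 "images" of 64 pixels each, factorized with r = 5 basis images.
V = np.random.default_rng(1).random((64, 20))
W, H = nmf_divergence(V, r=5)
print(np.abs(V - W @ H).mean())   # reconstruction error decreases with n_iter
```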
3 Distance Measures
Let X, Y be feature vectors of length n obtained by the NMF method, where X
represents the weight vector of a probe image and Y the weight vector of a training image.
Further, \Sigma is the auto-covariance matrix for the training images, and {s_i, i = 1, ..., n}
denotes the square roots of the diagonal elements of \Sigma, i.e. the standard deviations for
the training images. Then we can calculate distances between these feature vectors. All the
definitions of the distance measures used in this paper are given below
[7,11,8,10,1,3].
(1) Manhattan distance (L1 metric, city block distance)

d(X, Y) = \sum_{i=1}^{n} |x_i - y_i|    (6)

(2) Euclidean distance

d(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}    (7)
(4) Mahalanobis distance

d(X, Y) = \sqrt{(X - Y)^{T} \Sigma^{-1} (X - Y)}    (9)
(5) Canberra distance

d(X, Y) = \sum_{i=1}^{n} \frac{|x_i - y_i|}{|x_i| + |y_i|}    (10)
(6)

d(X, Y) = \sum_{i=1}^{n} \frac{|x_i - y_i|}{s_i}    (11)
(7) Divergence

d(X, Y) = \sum_{i=1}^{n} \left( x_i \ln\frac{x_i}{y_i} - x_i + y_i \right)    (12)
Like the Euclidean distance, it is also lower bounded by zero, and vanishes if and
only if X = Y . But it cannot be called a distance, because it is not symmetric
in X and Y , so we will refer to it as the divergence of X from Y .
(8) Kullback-Leibler distance (relative entropy)

d(X, Y) = \sum_{i=1}^{n} \tilde{x}_i \log_2 \frac{\tilde{x}_i}{\tilde{y}_i},  where  \tilde{x}_i = \frac{|x_i|}{\sum_{i=1}^{n}|x_i|},  \tilde{y}_i = \frac{|y_i|}{\sum_{i=1}^{n}|y_i|}    (13)
(9)

d(X, Y) = \frac{1}{2}\sum_{i=1}^{n} x_i \ln\frac{2x_i}{x_i + y_i} + \frac{1}{2}\sum_{i=1}^{n} y_i \ln\frac{2y_i}{y_i + x_i}    (14)
(10)

d(X, Y) = \sum_{i=1}^{n} \tilde{x}_i \log_2 \frac{\tilde{x}_i}{\tilde{y}_i} + \sum_{i=1}^{n} \tilde{y}_i \log_2 \frac{\tilde{y}_i}{\tilde{x}_i},  where  \tilde{x}_i = \frac{|x_i|}{\sum_{i=1}^{n}|x_i|},  \tilde{y}_i = \frac{|y_i|}{\sum_{i=1}^{n}|y_i|}    (15)
(11)

d(X, Y) = 1 - \frac{X^{T} \Sigma^{-1} Y}{\sqrt{(X^{T} \Sigma^{-1} X)(Y^{T} \Sigma^{-1} Y)}}    (16)
(12)

d(X, Y) = \sum_{i=1}^{n} \frac{(x_i - y_i)^2}{x_i + y_i}    (17)
(13)

d(X, Y) = 1 - \rho(X, Y),  \rho(X, Y) = \frac{1}{n}\sum_{i=1}^{n} \exp\left(-\frac{3(x_i - y_i)^2}{4 s_i^2}\right)    (18)
(14)

d(X, Y) = 1 - \rho(X, Y),  \rho(X, Y) = \frac{n_{+} - n_{-}}{n_{+} + n_{-}}    (19)
(15) Cosine distance

d(X, Y) = 1 - \cos(X, Y) = 1 - \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\left(\sum_{i=1}^{n} x_i^2\right)\left(\sum_{i=1}^{n} y_i^2\right)}}    (20)

(16)

d(X, Y) = 1 - \rho(X, Y)    (21)

here \rho(X, Y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left[\sum_{i=1}^{n}(x_i - \bar{x})^2\right]\left[\sum_{i=1}^{n}(y_i - \bar{y})^2\right]}}.
The preceding four distance measures are all similarity coefficient-based distances.
We now suggest considering a distance measure that seems not to have been used in face
recognition, but which originated from the theory of multivariate clustering analysis [11].
We think it may be a suitable distance measure for NMF applications because it is derived
from a similarity coefficient specifically defined for non-negative vectors:
(17) Non-negative vector similarity coefficient-based (NVSC) distance

d(X, Y) = 1 - \rho(X, Y),  \rho(X, Y) = \frac{\sum_{i=1}^{n} \min(x_i, y_i)}{\sum_{i=1}^{n} \max(x_i, y_i)}    (22)
Among all the above distance functions, the Manhattan distance, Euclidean
distance, and the Mahalanobis distance are the most widely-used in pattern
recognition.
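For illustration, here is a minimal Python sketch of a few of the above measures, assuming the min/max form of the NVSC coefficient in Eq. (22); the function names are ours, not the paper's.

```python
import numpy as np

def manhattan(x, y):              # Eq. (6)
    return np.sum(np.abs(x - y))

def euclidean(x, y):              # Eq. (7)
    return np.sqrt(np.sum((x - y) ** 2))

def cosine_distance(x, y):        # Eq. (20)
    return 1.0 - np.dot(x, y) / np.sqrt(np.sum(x ** 2) * np.sum(y ** 2))

def nvsc_distance(x, y):          # Eq. (22), for non-negative x and y
    return 1.0 - np.sum(np.minimum(x, y)) / np.sum(np.maximum(x, y))

# Toy usage on two short non-negative feature vectors.
x = np.array([0.2, 0.5, 0.1])
y = np.array([0.3, 0.4, 0.2])
print(manhattan(x, y), euclidean(x, y), cosine_distance(x, y), nvsc_distance(x, y))
```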
4 Image Databases

4.1 CMU AMP Face Expression Database
In this database, there are 13 subjects, each with 75 images showing different expressions.
These face images are collected under the same lighting conditions using a CCD camera,
and all of them have been well registered by eye locations.
4.3 YaleB Database
The Yale Face Database B (YaleB) contains 5850 source images of 10 subjects,
each captured under 585 viewing conditions (9 poses × 65 illumination conditions).
In the preprocessing stage, all frontal-pose images have been aligned by the
centers of the eyes and mouth, and the other images are aligned by the center points
of the faces. Then all images are normalized to the same resolution of 92 × 112.
In contrast with the other two databases, this one includes more complicated
image variations and background noise, so the corresponding recognition
results are expected to be much poorer.
To reduce the computational complexity, we use MATLAB to resize all the images
in the above databases to 1/16 of the original size, and then apply the NMF algorithm
to the downsampled image sets.
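As a rough preprocessing sketch (in Python rather than MATLAB), one can read "1/16 of the original size" as 1/16 of the pixel count, i.e. a decimation factor of 4 per dimension; both of these readings are assumptions, and the paper's exact resizing routine may differ.

```python
import numpy as np

def downsample_by_16(img):
    """Keep every 4th pixel in each dimension, i.e. 1/16 of the pixels.
    This is a simple stand-in for the MATLAB resizing used in the paper."""
    return img[::4, ::4]

# A 112 x 92 face image becomes 28 x 23 = 644 pixels, one column of V.
img = np.random.default_rng(0).random((112, 92))
col = downsample_by_16(img).reshape(-1)   # flatten to a non-negative feature vector
print(col.shape)
```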
5 Experiment

In this section, we build a face recognition system to provide a performance evaluation
of the 17 different distance measures using images from the databases described
in Sect.4. The system adopts the traditional NMF algorithm and consists of two
stages, namely a training stage and a recognition stage. The detailed procedure is as
follows.
5.1 Training Stage

The training images form a non-negative matrix V_1, which is factorized with the update
rules (3)-(5) so that

(V_1)_{ij} \approx (W_1 H_1)_{ij} = \sum_{a=1}^{r} (W_1)_{ia} (H_1)_{aj}

5.2 Recognition Stage
5.3 Experimental Results
A set of experiments is conducted on the above system to evaluate the
performance of all the distance measures for NMF-based face recognition. In all
the experiments, we select tr images per person from the database to form a
training set and use the remainder as the test set.
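As a sketch of how the two-stage pipeline outlined in Sect. 1 can be realized in Python, the code below encodes probe images by iterating the H update of Eq. (5) with the basis W_1 held fixed (a common choice, not necessarily the authors' exact projection step) and classifies each probe by its nearest training feature vector under a chosen distance; all names and sizes are illustrative, and in practice W_1 and the training encodings H_1 come from the NMF factorization of the training matrix.

```python
import numpy as np

def encode(W, V, n_iter=100, eps=1e-9, seed=0):
    """Encode the columns of V in the fixed basis W by iterating Eq. (5)."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= W.T @ (V / (W @ H + eps))
    return H

def recognize(W, H_train, train_labels, V_probe, dist):
    """Nearest-neighbour classification of probe images in the NMF feature space."""
    H_probe = encode(W, V_probe)
    preds = []
    for j in range(H_probe.shape[1]):
        d = [dist(H_probe[:, j], H_train[:, k]) for k in range(H_train.shape[1])]
        preds.append(train_labels[int(np.argmin(d))])
    return np.array(preds)

# Toy usage with random data and the Manhattan distance (real use would take
# W and H_train from the factorization of the training images).
rng = np.random.default_rng(2)
W = rng.random((644, 40)); W /= W.sum(axis=0, keepdims=True)
H_train = rng.random((40, 26))
labels = np.repeat(np.arange(13), 2)            # 13 subjects, tr = 2 images each
V_probe = W @ rng.random((40, 5))
print(recognize(W, H_train, labels, V_probe, lambda a, b: np.sum(np.abs(a - b))))
```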
Recognition rates for the three different databases with different experimental
settings (tr = 2, 10, and 20, and dimensionality of feature vectors at 40, 60, and
80) are summarized in Table 1. To facilitate comparison, we use bold fonts for
the best 3 measures in each experimental setting.
From Table 1, we can see that:
The commonly used Manhattan distance (distance 1), Euclidean distance (distance 2),
and Mahalanobis distance (distance 4) were not particularly effective.
The Manhattan distance performed best among these three popular distance
measures and was ranked in the top 3 in 3 cases. Among all the conventional
distance measures (distances 1 to 16), the cosine distance (distance 15) achieved
the best results and was ranked as one of the best 3 measures in 5 cases.
For the distance measures designed for non-negative vectors, the divergence
(distance 7) and Kullback-Leibler distance (distance 8) were not particularly effective.
Table 1. Recognition rates for the three databases (p is the dimensionality of the feature vectors)

Distance     |       CBCL (tr=10)          |       CMU AMP (tr=2)        |       YaleB (tr=20)
measure      | p=40     p=60     p=80      | p=40     p=60     p=80      | p=40     p=60     p=80
distance 1   | 0.93949  0.93408  0.8879    | 0.99684  ...      0.99579   | 0.26513  0.26549  0.28531
distance 2   | 0.89682  0.93057  0.89204   | 0.99579  0.99473  0.98736   | 0.23823  0.26513  0.28319
distance 3   | 0.67803  0.6914   0.62134   | 0.93888  0.9568   0.9157    | 0.2131   0.26319  0.25274
distance 4   | 0.85924  0.89459  0.84745   | 0.99473  0.99579  0.99473   | 0.34956  0.36566  0.35611
distance 5   | 0.77898  0.79904  0.75446   | 0.95258  0.95785  0.97366   | 0.23912  0.23115  0.2554
distance 6   | 0.38089  0.38917  0.43949   | 0.74394  0.73656  0.65753   | 0.16814  0.18053  0.16973
distance 7   | 0.87834  0.93089  0.88471   | 0.98103  0.97998  0.98419   | 0.34832  0.32991  0.36177
distance 8   | 0.87357  0.91783  0.86911   | 0.97787  0.97893  0.98419   | 0.34903  0.33912  0.33204
distance 9   | 0.91688  0.92643  0.88726   | 0.99789  0.99157  0.99895   | 0.29434  0.27097  0.30389
distance 10  | 0.92611  0.92771  0.91274   | 0.99684  ...      0.99473   | 0.35646  0.35522  0.35009
distance 11  | 0.88758  0.92707  0.90382   | 0.99684  0.99368  0.98946   | 0.21788  0.25876  0.27611
distance 12  | 0.94427  0.95191  0.91401   | 0.99262  ...      0.99895   | 0.29575  0.2931   0.32389
distance 13  | 0.60955  0.67134  0.64777   | 0.90095  0.89357  0.87144   | 0.18159  0.18142  0.17097
distance 14  | 0.1      0.1      0.1       | 0.076923 0.076923 0.076923  | 0.1      0.1      0.1
distance 15  | 0.93057  0.95987  0.93917   | 0.99579  0.99368  0.99052   | 0.35805  0.38655  0.38991
distance 16  | 0.92994  0.95064  0.92261   | 0.99368  0.99368  0.99262   | 0.36708  0.38     0.3869
distance 17  | 0.95924  0.96369  0.94363   | 0.99579  ...      0.99789   | 0.36106  0.37982  0.37221
[Fig. 1: recognition rate vs. the dimensionality of the feature vectors for the Manhattan, Euclidean, Mahalanobis, NVSC, and cosine distances on the CBCL database (tr = 10).]
[Two-panel figure: recognition rate vs. tr for the Manhattan, Euclidean, Mahalanobis, NVSC, and cosine distances; the second panel is titled "CMU AMP Face Expression Database: p=10".]
Fig. 2. Recognition rate of different distance measures when fixing the dimensionality p
The NVSC distance (distance 17) was ranked among the best 3 measures in all but
one case [for the CMU AMP database, with dimensionality set at 80 and 2 training images].
And even then, it was in fact ranked 4th with a recognition rate of 0.99789! In addition to
being a consistently good performer, the NVSC distance was in fact ranked the top
(or shared top) performer in 5 cases out of the 9 sets of experiments.
For a further comprehensive comparison, we shall now concentrate on the Manhattan
distance, Euclidean distance, Mahalanobis distance, cosine distance, and
our NVSC distance. In Fig.1, we plot the respective recognition rates vs. the
dimensionality of the feature vectors for the CBCL database (tr = 10).
From Fig.1, we see that although the cosine distance outperforms the NVSC
distance at a dimensionality of 50, its recognition rate curve fluctuates quite substantially,
and the NVSC curve is clearly the most consistently good performer
across a wide range of dimensionalities.
Finally, we fix the dimensionality of the feature vectors and plot the recognition rates
vs. the value of tr for the CBCL and CMU AMP databases in Fig.2,
where p represents the dimensionality of the feature space.
Again, the NVSC emerges as the best distance measure.
6 Conclusions

In this paper, we compared 17 distance measures for NMF-based face recognition.
Recognition experiments were performed using 3 different databases. The
experiments show that our NVSC distance measure is consistently among the
best measures under different experimental conditions and always performs better
than the Manhattan distance, Euclidean distance, and the Mahalanobis distance,
which are often used in pattern recognition systems. We believe that the
effectiveness of the NVSC measure stems from the fact that it is specifically designed
for non-negative vectors and thus is the most appropriate for NMF-based
applications. The entropy-based measures (distances 7-10) can also handle non-negative
vectors, but they are primarily designed for probability distributions
and are not effective in handling vectors with many zero coefficients.
References
1. Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, New York (1991)
2. Feng, T., Li, S.Z., Shum, H.-Y., Zhang, H.: Local non-negative matrix factorization as a visual representation. In: ICDL '02: Proceedings of the 2nd International Conference on Development and Learning, p. 178. IEEE Computer Society, Washington, DC, USA (2002)
3. Fraser, A., Hengartner, N., Vixie, K., Wohlberg, B.: Incorporating invariants in Mahalanobis distance based classifiers: Application to face recognition. In: International Joint Conference on Neural Networks (IJCNN), Portland, OR, USA (2003)
4. Guillamet, D., Vitrià, J.: Evaluation of distance metrics for recognition based on non-negative matrix factorization. Pattern Recogn. Lett. 24(9-10), 1599-1605 (2003)