
Survey of Distance Measures for NMF-Based Face Recognition

Yun Xue¹, Chong Sze Tong², and Weipeng Zhang³

¹ School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou, Guangdong 510631, China, and Department of Mathematics, Hong Kong Baptist University, Hong Kong, China. yxue@math.hkbu.edu.hk
² Department of Mathematics, Hong Kong Baptist University, Hong Kong, China. cstong@math.hkbu.edu.hk
³ Department of Computer Science, Hong Kong Baptist University, Hong Kong, China. zwp@comp.hkbu.edu.hk

Abstract. Non-negative matrix factorization (NMF) is an unsupervised learning algorithm that can extract parts from visual data. The goal of this technique is to find intuitive bases such that training examples can be faithfully reconstructed as linear combinations of basis images that are restricted to non-negative values. Thus NMF basis images can be understood as localized features that correspond better with intuitive notions of parts of images. However, there has not been any systematic study to identify a suitable distance measure for using NMF basis images for face recognition.
In this article we evaluate the performance of 17 distance measures between feature vectors based on the result of the NMF algorithm for face recognition. Recognition experiments are performed using the MIT-CBCL database, the CMU AMP Face Expression database and the YaleB database.

1 Introduction

In the past three decades, face recognition has received increasing attention, and the Principal Component Analysis (PCA) algorithm has been proven to be a practical face-based approach for this task [9].
However, the traditional PCA method has some limitations. First, though it gives an accurate representation of face images, it does not have good discriminatory ability. Secondly, since there are both additive and subtractive combinations in this method, its basis images may not have an intuitive visual meaning. Finally, because this approach finds global features in face images, it cannot achieve good performance when handling cases with occlusions.
Recently, a new method called non-negative matrix factorization (NMF) was proposed for obtaining a linear representation of data. Under the non-negativity

constraints, this method approximately factorizes the initial data matrix into two non-negative matrix factors. Since it allows only additive, not subtractive, combinations of basis images, a parts-based representation of images is consequently produced.
For face recognition, we generally project all the face images into this NMF space and extract the relevant feature vectors. The comparison between faces is then performed by calculating the distance between these vectors. Usually, the Euclidean distance, the L1 distance or the Mahalanobis distance is used at this stage.
Though the selection of distance measure is important for the performance of a face recognition system, there is only limited published research [4] evaluating different distance measures for NMF-based face recognition.
In this article, we compare the performance of 17 distance measures for NMF-based face recognition. Based on the experimental results, we find that a new non-negative vector similarity coefficient-based (NVSC) distance, which we are advocating for use in NMF-based recognition, is always among the best distance measures across different image databases and different settings.
This paper is organized as follows. Section 2 reviews the background theory of NMF. The detailed definitions of the distance measures used in this paper are given in Sect. 3. In Sect. 4, we describe the image databases used in the paper. Experimental results of a face recognition system based on the NMF algorithm are discussed in Sect. 5. Finally, we present our conclusions and discuss some future work in Sect. 6.

2 Review of NMF

This section provides the background theory of NMF for face recognition. NMF is an unsupervised learning method that obtains a linear representation of data under non-negativity constraints. These constraints lead to a parts-based representation because they allow only additive, not subtractive, combinations of the original data [6]. The basic idea is as below.
First, represent an image database as an n × m matrix V, where each column, corresponding to an initial face image, contains n non-negative elements characterizing the pixel values, and m is the number of training images. Then we can find two new non-negative matrices (W and H) to approximate the original matrix:
$$V_{ij} \approx (WH)_{ij} = \sum_{a=1}^{r} W_{ia} H_{aj}, \quad W \in \mathbb{R}^{n \times r},\ H \in \mathbb{R}^{r \times m} \tag{1}$$

where the matrix W consists of r non-negative basis vectors, with r usually chosen as small as possible for dimension reduction, while the column vectors of H give the weights used to approximate the corresponding columns of V with the bases in W.
From the original definition, we know that, in contrast to the PCA approach, no subtractions can occur in the above NMF procedure, so the non-negativity


constraints are compatible with the intuitive idea of combining parts to form a
whole face.
The update rules for NMF are derived as below.
First construct an objective function to characterize the similarity between V and WH:

$$F = \sum_{i=1}^{n} \sum_{j=1}^{m} \left[ V_{ij} \log \frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij} \right] \tag{2}$$

Then an iterative algorithm converging to a local minimum of this objective function is derived [6]:
$$W_{ia} \leftarrow W_{ia} \sum_{j} \frac{V_{ij}}{(WH)_{ij}} H_{aj} \tag{3}$$

$$W_{ia} \leftarrow \frac{W_{ia}}{\sum_{j} W_{ja}} \tag{4}$$

$$H_{aj} \leftarrow H_{aj} \sum_{i} W_{ia} \frac{V_{ij}}{(WH)_{ij}} \tag{5}$$
The convergence is proved in [7].
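To make the procedure concrete, the update rules (3)-(5) can be sketched in a few lines of NumPy. This is our own minimal illustration, not the original authors' code; the random initialization, iteration count, and the small eps guard against division by zero are assumptions we add for numerical safety.

```python
import numpy as np

def nmf(V, r, n_iter=500, eps=1e-9):
    """Divergence-based NMF via the multiplicative updates (3)-(5).

    V : (n, m) non-negative data matrix, one image per column.
    r : number of basis vectors, chosen small for dimension reduction.
    Returns the basis matrix W (n, r) and the weight matrix H (r, m).
    """
    n, m = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((n, r)) + eps          # random non-negative initialization
    H = rng.random((r, m)) + eps

    for _ in range(n_iter):
        # Rule (3): W_ia <- W_ia * sum_j [V_ij / (WH)_ij] H_aj
        WH = W @ H + eps
        W *= (V / WH) @ H.T
        # Rule (4): normalize each column of W to sum to one
        W /= W.sum(axis=0, keepdims=True)
        # Rule (5): H_aj <- H_aj * sum_i W_ia V_ij / (WH)_ij
        WH = W @ H + eps
        H *= W.T @ (V / WH)
    return W, H
```

On a face database, each column of V would hold the pixel values of one (downsampled) training image.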

3 Distance Measures

Let X, Y be feature vectors of length n obtained by the NMF method, where X represents the weights of a probe image and Y the weights of a training image. Let Σ be the auto-covariance matrix for the training images, and let {s_i, i = 1, …, n} be the square roots of the diagonal elements of Σ, i.e. the standard deviations for the training images. Then we can calculate distances between these feature vectors. The definitions of all the distance measures used in this paper are as below [7,11,8,10,1,3].
(1) Manhattan distance (L1 metric, city block distance)

$$d(X, Y) = \sum_{i=1}^{n} |x_i - y_i| \tag{6}$$

(2) Euclidean distance (L2 metric)

$$d(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{7}$$

(3) Chebychev distance (L∞ norm)

$$d(X, Y) = \max_{1 \le i \le n} |x_i - y_i| \tag{8}$$


(4) Mahalanobis distance

$$d(X, Y) = \sqrt{(X - Y)^T \Sigma^{-1} (X - Y)} \tag{9}$$

(5) Lance distance

$$d(X, Y) = \sum_{i=1}^{n} \frac{|x_i - y_i|}{|x_i| + |y_i|} \tag{10}$$

(6) Statistical distance

$$d(X, Y) = \sum_{i=1}^{n} \left| \frac{x_i - y_i}{s_i} \right| \tag{11}$$

(7) Divergence

$$d(X, Y) = \sum_{i=1}^{n} \left( x_i \ln \frac{x_i}{y_i} - x_i + y_i \right) \tag{12}$$

Like the Euclidean distance, it is also lower bounded by zero, and vanishes if and
only if X = Y . But it cannot be called a distance, because it is not symmetric
in X and Y , so we will refer to it as the divergence of X from Y .
(8) Kullback-Leibler distance (Relative Entropy)

$$d(X, Y) = \sum_{i=1}^{n} \hat{x}_i \log_2 \frac{\hat{x}_i}{\hat{y}_i}, \quad \hat{x}_i = \frac{|x_i|}{\sum_{i=1}^{n} |x_i|}, \quad \hat{y}_i = \frac{|y_i|}{\sum_{i=1}^{n} |y_i|} \tag{13}$$

Like divergence, it also cannot be called a distance, because it is not symmetric in X and Y.
(9) Symmetrized divergence

$$d(X, Y) = \frac{1}{2} \sum_{i=1}^{n} \left( x_i \ln \frac{x_i}{y_i} - x_i + y_i \right) + \frac{1}{2} \sum_{i=1}^{n} \left( y_i \ln \frac{y_i}{x_i} - y_i + x_i \right) \tag{14}$$

(10) Symmetrized Kullback-Leibler distance

$$d(X, Y) = \frac{1}{2} \left( \sum_{i=1}^{n} \hat{x}_i \log_2 \frac{\hat{x}_i}{\hat{y}_i} + \sum_{i=1}^{n} \hat{y}_i \log_2 \frac{\hat{y}_i}{\hat{x}_i} \right) \tag{15}$$

here $\hat{x}_i = \frac{|x_i|}{\sum_{i=1}^{n} |x_i|}$, $\hat{y}_i = \frac{|y_i|}{\sum_{i=1}^{n} |y_i|}$.

(11) Mahalanobis angle distance

$$d(X, Y) = 1 - \frac{X^T \Sigma^{-1} Y}{\sqrt{X^T \Sigma^{-1} X}\,\sqrt{Y^T \Sigma^{-1} Y}} \tag{16}$$


(12) Chi square distance

$$d(X, Y) = \sum_{i=1}^{n} \frac{(x_i - y_i)^2}{x_i + y_i} \tag{17}$$

(13) Exponential similarity coefficient-based distance

$$d(X, Y) = 1 - \rho(X, Y), \quad \rho(X, Y) = \frac{1}{n} \sum_{i=1}^{n} \exp\left( -\frac{3}{4} \frac{(x_i - y_i)^2}{s_i^2} \right) \tag{18}$$

(14) Non-parametric similarity coefficient-based distance

$$d(X, Y) = 1 - \rho^2(X, Y), \quad \rho(X, Y) = \frac{n_+ - n_-}{n_+ + n_-} \tag{19}$$

here $x'_i = x_i - \bar{x}$, $y'_i = y_i - \bar{y}$, $n_+$ is the frequency of $\{x'_i y'_i \ge 0,\ i = 1, \ldots, n\}$, and $n_-$ is the frequency of $\{x'_i y'_i < 0,\ i = 1, \ldots, n\}$.
(15) Cosine distance

$$d(X, Y) = 1 - \cos(X, Y) = 1 - \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\left( \sum_{i=1}^{n} x_i^2 \right) \left( \sum_{i=1}^{n} y_i^2 \right)}} \tag{20}$$

(16) Correlation coefficient-based distance

$$d(X, Y) = 1 - \rho(X, Y) \tag{21}$$

here $\rho(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left[ \sum_{i=1}^{n} (x_i - \bar{x})^2 \right] \left[ \sum_{i=1}^{n} (y_i - \bar{y})^2 \right]}}$.

The preceding four distance measures are all similarity coefficient-based distances. We now suggest considering a distance measure that seems not to have been used in face recognition, but which originated from the theory of multivariate clustering analysis [11]. We think it may be a suitable distance measure for NMF applications because it is derived from a similarity coefficient specifically defined for non-negative vectors:
(17) Non-negative vector similarity coefficient-based (NVSC) distance

$$d(X, Y) = 1 - \rho^2(X, Y), \quad \rho(X, Y) = \frac{\sum_{i=1}^{n} \min(x_i, y_i)}{\sum_{i=1}^{n} \max(x_i, y_i)} \tag{22}$$

Among all the above distance functions, the Manhattan distance, Euclidean
distance, and the Mahalanobis distance are the most widely-used in pattern
recognition.
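As an illustration, several of the above measures translate directly into NumPy. The sketch below is our own and not reference code; the small eps guard in the Kullback-Leibler measure is an addition of ours to avoid taking the logarithm of zero.

```python
import numpy as np

def nvsc_distance(x, y):
    # NVSC distance (22): d = 1 - rho^2 with rho = sum(min)/sum(max);
    # assumes x and y are non-negative NMF weight vectors.
    rho = np.minimum(x, y).sum() / np.maximum(x, y).sum()
    return 1.0 - rho ** 2

def cosine_distance(x, y):
    # Cosine distance (20).
    return 1.0 - (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

def kl_distance(x, y, eps=1e-12):
    # Kullback-Leibler distance (13) on L1-normalized absolute values.
    p = np.abs(x) / np.abs(x).sum()
    q = np.abs(y) / np.abs(y).sum()
    return float(np.sum(p * np.log2((p + eps) / (q + eps))))
```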

4 Testing Databases Used in This Paper

4.1 CBCL Database

The MIT-CBCL face recognition database contains face images of 10 subjects and is divided into two sets: high resolution pictures, and synthetic images (324 per subject) rendered from 3D head models of the 10 subjects. In this paper, we used the second set, which contains images that vary in illumination and pose.
4.2 CMU AMP Face Expression Database

In this database, there are 13 subjects and each one has 75 images showing different expressions. These face images were collected under the same lighting conditions using a CCD camera, and all of them have been well registered by eye location.
4.3 YaleB Database

The Yale Face Database B (YaleB) contains 5850 source images of 10 subjects, each captured under 585 viewing conditions (9 poses × 65 illumination conditions). In the preprocessing stage, all frontal-pose images were aligned by the centers of the eyes and mouth, and the other images were aligned by the center points of the faces. Then all images were normalized to the same resolution, 92 × 112.
In contrast with the other two databases, this one includes more complicated image variations and background noise, so the corresponding recognition results would be expected to be much poorer.
To reduce the computational complexity, we use Matlab to resize all the images in the above databases to 1/16 of the original size, then apply the NMF algorithm to the downsampled image sets.

5 Experiment

In this section, we build a face recognition system to provide a performance evaluation of the 17 different distance measures using images from the databases described in Sect. 4. The system adopts the traditional NMF algorithm and consists of two stages, namely, a training stage and a recognition stage. The detailed procedure is as below.
5.1 Training Stage

This stage includes 3 major steps.
First, we use an n × m matrix V1 to represent all the training images in one database.
Secondly, the NMF algorithm is applied to V1 and we obtain two new matrices (W1 and H1) as in Sect. 2, s.t.

$$(V_1)_{ij} \approx (W_1 H_1)_{ij} = \sum_{a=1}^{r} (W_1)_{ia} (H_1)_{aj}$$

where W1 is the base matrix, and H1 is the weight matrix.


Finally, we build different libraries to save the training image representations and their corresponding representational bases for all the databases.
5.2 Recognition Stage

Face recognition in the NMF linear subspace is performed as follows.

Feature Extraction. There are two ways to obtain the feature vectors of training images and test images [2,5].
1. Let $W^{+} = (W^{T}W)^{-1}W^{T}$; then each training face image $V_i$ is projected into the linear space as a feature vector $H_i = W^{+}V_i$, which is then used as a prototype feature point. A test face image $V_t$ to be classified is represented as $H_t = W^{+}V_t$.
2. Using the bases W1 obtained from the training stage, we can directly use the iterative technique of the original NMF algorithm while keeping W1 intact [i.e. not using the iterative update rule (3) for W1]. Then, we get the weight matrix H2 using a fixed set of bases (W1), and use the matrices H1 and H2 as the feature vectors of the training and test images, respectively.
In this paper, we shall adopt the second approach for feature extraction, as sketched below.
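The following is a minimal sketch of this second approach, assuming W1 comes from an NMF routine like the one in Sect. 2; the iteration count and eps smoothing are our own assumptions.

```python
import numpy as np

def extract_features(W1, V, n_iter=200, eps=1e-9):
    # Keep the trained bases W1 fixed and iterate only update rule (5),
    # so each column of H becomes the feature vector of one image in V.
    H = np.random.default_rng(0).random((W1.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        WH = W1 @ H + eps
        H *= W1.T @ (V / WH)      # rule (5) only; rules (3)-(4) are skipped
    return H

# The first (pseudo-inverse) approach would instead be:
#   W_plus = np.linalg.inv(W1.T @ W1) @ W1.T
#   H = W_plus @ V
```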
Classification. In this step, we first calculate the mean feature vector Hm of each class in the training set; then the distance measures (defined in Sect. 3) between the feature vector of the test image and each mean vector, dist(Ht, Hm), are calculated; finally, the test image is classified into the class to which the closest mean vector belongs, as in the sketch below.
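A sketch of this nearest-mean rule, reusing the hypothetical nvsc_distance function from our Sect. 3 sketch (the variable names are ours):

```python
import numpy as np

def classify(H_test, H_train, train_labels, dist=nvsc_distance):
    # Mean feature vector H_m of each training class.
    classes = np.unique(train_labels)
    means = {c: H_train[:, train_labels == c].mean(axis=1) for c in classes}
    labels = []
    for h_t in H_test.T:                    # one feature vector per test image
        d = {c: dist(h_t, m) for c, m in means.items()}
        labels.append(min(d, key=d.get))    # class of the closest mean
    return np.array(labels)
```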
5.3 Experimental Results

A set of experiments is conducted on the above system to evaluate the performance of all the distance measures for NMF-based face recognition. In all the experiments, we select tr images per person from the database to form a training set and use the remainder as the test set.
Recognition rates for the three different databases with different experimental settings (tr = 2, 10, and 20, and dimensionality of the feature vectors at 40, 60, and 80) are summarized in Table 1.
From Table 1, we can see that:
The commonly used Manhattan distance (distance 1), Euclidean distance (distance 2), and Mahalanobis distance (distance 4) were not particularly effective. The Manhattan distance performed best among these three popular distance measures and was ranked in the top 3 in 3 cases. Among all the conventional distance measures (distances 1 to 16), the cosine distance (distance 15) achieved the best result and was ranked as one of the best 3 measures in 5 cases.
For the distance measures designed for non-negative vectors, the divergence (distance 7) and the Kullback-Leibler distance (distance 8) were not particularly effective.



Table 1. Recognition rate of all the distance measures

                    CBCL (tr=10)                  CMU AMP (tr=2)                YaleB (tr=20)
Distance measure    p=40     p=60     p=80        p=40     p=60     p=80        p=40     p=60     p=80
distance 1          0.93949  0.93408  0.8879      0.99684  0.99579  —           0.26513  0.26549  0.28531
distance 2          0.89682  0.93057  0.89204     0.99579  0.99473  0.98736     0.23823  0.26513  0.28319
distance 3          0.67803  0.6914   0.62134     0.93888  0.9568   0.9157      0.2131   0.26319  0.25274
distance 4          0.85924  0.89459  0.84745     0.99473  0.99579  0.99473     0.34956  0.36566  0.35611
distance 5          0.77898  0.79904  0.75446     0.95258  0.95785  0.97366     0.23912  0.23115  0.2554
distance 6          0.38089  0.38917  0.43949     0.74394  0.73656  0.65753     0.16814  0.18053  0.16973
distance 7          0.87834  0.93089  0.88471     0.98103  0.97998  0.98419     0.34832  0.32991  0.36177
distance 8          0.87357  0.91783  0.86911     0.97787  0.97893  0.98419     0.34903  0.33912  0.33204
distance 9          0.91688  0.92643  0.88726     0.99789  0.99157  0.99895     0.29434  0.27097  0.30389
distance 10         0.92611  0.92771  0.91274     0.99684  0.99473  —           0.35646  0.35522  0.35009
distance 11         0.88758  0.92707  0.90382     0.99684  0.99368  0.98946     0.21788  0.25876  0.27611
distance 12         0.94427  0.95191  0.91401     0.99262  0.99895  —           0.29575  0.2931   0.32389
distance 13         0.60955  0.67134  0.64777     0.90095  0.89357  0.87144     0.18159  0.18142  0.17097
distance 14         0.1      0.1      0.1         0.076923 0.076923 0.076923    0.1      0.1      0.1
distance 15         0.93057  0.95987  0.93917     0.99579  0.99368  0.99052     0.35805  0.38655  0.38991
distance 16         0.92994  0.95064  0.92261     0.99368  0.99368  0.99262     0.36708  0.38     0.3869
distance 17         0.95924  0.96369  0.94363     0.99579  —        0.99789     0.36106  0.37982  0.37221

Fig. 1. Recognition rate of different distance measures (CBCL database, tr=10): recognition rate vs. dimensionality of the weight vectors for the Manhattan, Euclidean, Mahalanobis, NVSC, and cosine distances.


Fig. 2. Recognition rate of different distance measures when fixing the dimensionality p (top: CBCL database, p=80; bottom: CMU AMP Face Expression Database, p=10): recognition rate vs. tr for the Manhattan, Euclidean, Mahalanobis, NVSC, and cosine distances.

The symmetrized versions (distances 9 and 10) performed better, but by far the best result was obtained by our NVSC distance (distance 17). The NVSC distance was ranked as one of the best 3 measures in all but one case [CMU AMP database, with dimensionality set at 80 and 2 training images]. And even then, it was in fact ranked 4th with a recognition rate of 0.99789! In addition to being a consistently good performer, the NVSC distance was in fact ranked the top (or shared top) performer in 5 cases out of the 9 sets of experiments.
For a further comprehensive comparison, we now concentrate on the Manhattan distance, Euclidean distance, Mahalanobis distance, cosine distance and our NVSC distance. In Fig. 1, we plot the respective recognition rates vs. the dimensionality of the feature vectors for the CBCL database (tr = 10).
From Fig. 1, we see that although the cosine distance outperforms the NVSC distance at a dimensionality of 50, its recognition rate curve fluctuates quite substantially, and the NVSC curve is clearly the most consistent best performer across a wide range of dimensionality.
Finally, we fix the dimensionality of the feature vectors and plot the recognition rates vs. the value of tr for the CBCL and CMU AMP databases in Fig. 2, where p represents the dimensionality of the feature space.
Again, the NVSC emerges as the best distance measure.

6 Conclusions and Future Work

In this paper, we compared 17 distance measures for NMF-based face recognition. Recognition experiments were performed using 3 different databases. The experiments show that our NVSC distance measure is consistently among the best measures under different experimental conditions and always performs better than the Manhattan distance, Euclidean distance, and the Mahalanobis distance, which are often used in pattern recognition systems. We believe that the effectiveness of the NVSC measure stems from the fact that it is specifically designed for non-negative vectors and thus is the most appropriate for NMF-based applications. The entropy-based measures (distances 7-10) can also handle non-negative vectors, but they are primarily designed for probability distributions and are not effective in handling vectors with many zero coefficients.

References
1. Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, New York (1991)
2. Feng, T., Li, S.Z., Shum, H.-Y., Zhang, H.: Local non-negative matrix factorization as a visual representation. In: ICDL '02: Proceedings of the 2nd International Conference on Development and Learning, p. 178. IEEE Computer Society, Washington, DC, USA (2002)
3. Fraser, A., Hengartner, N., Vixie, K., Wohlberg, B.: Incorporating invariants in Mahalanobis distance based classifiers: Application to face recognition. In: International Joint Conference on Neural Networks (IJCNN), Portland, OR, USA (2003)
4. Guillamet, D., Vitrià, J.: Evaluation of distance metrics for recognition based on non-negative matrix factorization. Pattern Recogn. Lett. 24(9-10), 1599-1605 (2003)
5. Guillamet, D., Vitrià, J.: Non-negative matrix factorization for face recognition. In: Escrig, M.T., Toledo, F.J., Golobardes, E. (eds.) Topics in Artificial Intelligence. LNCS (LNAI), vol. 2504, pp. 336-344. Springer, Heidelberg (2002)
6. Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix factorization. Nature 401, 788-791 (1999)
7. Lee, D.D., Seung, H.S.: Algorithms for non-negative matrix factorization. Adv. Neural Info. Proc. Syst. 13, 556-562 (2001)
8. Perlibakas, V.: Distance measures for PCA-based face recognition. Pattern Recogn. Lett. 25(6), 711-724 (2004)
9. Turk, M., Pentland, A.: Eigenfaces for recognition. J. Cognitive Neuroscience 3, 71-86 (1991)
10. Yu, J.: Clustering methods, applications of multivariate statistical analysis. Technical report, School of Electronics Engineering and Computer Science, Peking University, Beijing 100871
11. Zhang, Y., Fang, K.: An Introduction to Multivariate Analysis. Science Press, Beijing (1982)
