

IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 18, NO. 1, JANUARY 2007

A Method of Face Recognition Based on Fuzzy


c-Means Clustering and Associated Sub-NNs
Jianming Lu, Xue Yuan, and Takashi Yahagi, Senior Member, IEEE
Abstract: The face is a complex multidimensional visual model
and developing a computational model for face recognition is
difficult. In this paper, we present a method for face recognition based on parallel neural networks. Neural networks (NNs)
have been widely used in various fields. However, the computing
efficiency decreases rapidly if the scale of the NN increases. In
this paper, a new method of face recognition based on fuzzy
clustering and parallel NNs is proposed. The face patterns are
divided into several small-scale neural networks based on fuzzy
clustering and they are combined to obtain the recognition result.
In particular, the proposed method achieved a 98.75% recognition
accuracy for 240 patterns of 20 registrants and a 99.58% rejection
rate for 240 patterns of 20 nonregistrants. Experimental results
show that the performance of our new face-recognition method
is better than those of the backpropagation NN (BPNN) system,
the hard c-means (HCM) and parallel NNs system, and the pattern-matching system.
Index Terms: Face recognition, fuzzy clustering, parallel neural
networks (NNs).

I. INTRODUCTION

Face recognition plays an important role in many applications such as building/store access control, suspect identification, and surveillance [1], [2], [4]-[7], [16]-[23]. Over the
past 30 years, many different face-recognition techniques have
been proposed, motivated by the increased number of real-world
applications requiring the recognition of human faces. There are
several problems that make automatic face recognition a very
difficult task. The face image of a person input to a face-recognition system is usually acquired under different conditions from
those of the face image of the same person in the database.
Therefore, it is important that the automatic face-recognition
system be able to cope with numerous variations of images of
the same face. The image variations are mostly due to changes
in the following parameters: pose, illumination, expression, age,
disguise, facial hair, glasses, and background [18]-[23].
In many pattern-recognition systems, the statistical approach
is frequently used [18]-[23]. Although this paradigm has been
successfully applied to various problems in pattern classification, it is difficult to express structural information unless an
appropriate choice of features is possible. Furthermore, this
approach requires much heuristic information to design a classifier. Neural-network (NN)-based paradigms, as new means of
implementing various classifiers based on statistical and structural approaches, have been proven to possess many advantages
Manuscript received January 28, 2005; revised September 22, 2005.
The authors are with the Graduate School of Science and Technology, Chiba
University, Chiba 263-8522, Japan (e-mail: yuanxue@graduate.chiba-u.jp).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNN.2006.884678

for classification because of their learning ability and good


generalization [5]-[9]. Generally speaking, multilayered networks (MLNs), usually coupled with the backpropagation (BP)
algorithm, are most widely used for face recognition [24]. The
BP algorithm is a gradient-based method, hence, some inherent
problems (or difficulties) are frequently encountered in the use
of this algorithm, e.g., very slow convergence speed in training,
and difficulty in escaping from a local minimum. Therefore,
some techniques are introduced to resolve these drawbacks;
however, to date, all of them are still far from satisfactory.
A structurally adaptive intelligent neural tree (SAINT) was
proposed by Lin et al. [7]. The basic idea is to hierarchically
partition the input pattern space using a tree-structured NN
composed of subnetworks with topology-preserving mapping
ability. The self-growing NN CombNet-II was proposed by
Nugroho et al. [28]. The stem network divides the input space
by a vector quantizing network into several subspaces. Each
output neuron of the stem network is associated with a branch
network, which is a feedforward three-layered network that
performs a refined classification of the input vector in a specific
subspace. The radial basis function NN (RBFNN) is widely
used for function approximation and pattern-recognition systems [14], [16], [17]. In this paper, we propose a new method
of face recognition based on fuzzy clustering and parallel NNs.
As one drawback of the BP algorithm, when the scale of the
NN increases, the computing efficiency decreases rapidly for
various reasons, such as the appearance of a local minimum.
Therefore, we propose a method in which the individuals in the
training set are divided into several small-scale parallel NNs,
and they are combined to obtain the recognition result. The
HCM is the most well-known conventional (hard) clustering
method [12]. The HCM algorithm executes a sharp classification, in which each object is either assigned to a cluster or not.
Because the HCM restricts each point of a data set to exactly
one cluster and the individuals belonging to each cluster are
not overlapped, some similar individuals cannot be assigned to
the same cluster, and, hence, they are not learned or recognized
in the same NN. In this paper, fuzzy c-means (FCM) is used
[13]-[15]. In contrast to HCM, the application of fuzzy sets in a
classification function causes the class membership to become
a relative one and an object can belong to several clusters at the
same time but to different degrees. FCM introduces the idea of
uncertainty of belonging, described by a membership function,
and it enables an individual to belong to several networks. Then,
all similar patterns can be thoroughly learned and recognized in
one NN.
Details of this system are described in the remainder of
this paper. Section II covers preprocessing of the system. In
Section III, we present a method for face recognition based on



Fig. 1. Original face image.

fuzzy clustering and parallel NNs. In Section IV, experimental


results of evaluating the developed techniques are presented.
Discussion is presented in Section V. Finally, conclusions are
summarized in Section VI.

Fig. 2. Geometry of our head model.

II. PREPROCESSING
A. Facial-Image Acquisition
In our research, original images were obtained using a
charge-coupled device (CCD) camera with image dimensions
of 384 × 243 pixels encoded using 256 gray-scale levels.
In image acquisition, the subject sits 2.5 m away from a CCD
camera. On each side of the camera, two 200-W lamps are placed
at 30° angles to the camera horizontally. The original images are
shown in Fig. 1.
B. Lighting Compensation
We adjusted the locations of the lamps to change the lighting
conditions. The total energy of an image is the sum of the
squares of the intensity values. The average energy of all the
face images in the database is calculated. Then, each face image
is normalized to have energy equal to the average energy
Energy = Σ_x Σ_y [Intensity(x, y)]²    (1)
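For illustration, the normalization in (1) can be sketched as follows. This is a minimal NumPy sketch under assumed names (`normalize_energy`, an `images` list of gray-scale arrays); it is not the authors' code:

```python
import numpy as np

def normalize_energy(images):
    """Scale each image so its energy equals the database average (Eq. (1))."""
    # Energy of an image is the sum of the squares of its intensity values.
    energies = [float(np.sum(img.astype(np.float64) ** 2)) for img in images]
    avg_energy = sum(energies) / len(energies)
    # Multiplying intensities by sqrt(avg/e) rescales the energy from e to avg.
    return [img.astype(np.float64) * np.sqrt(avg_energy / e)
            for img, e in zip(images, energies)]
```

After this step, every face image contributes the same total energy, so brightness differences between acquisition sessions do not dominate the later feature comparison.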

C. Facial-Region Extraction
We adopt the face-detection method presented in [25]. The
method of detecting and extracting the facial features in a grayscale image is divided into two stages. First, the possible human
eye regions are detected by testing all the valley regions in an
image. A pair of eye candidates is selected by means of the genetic algorithm to form a possible face candidate. In our method,
a square block is used to represent the detected face region.
Fig. 2 shows an example of a selected face region based on the
location of an eye pair. The relationships between the eye pair
and the face size are defined as follows:

Fig. 3. Windows for facial feature extraction.

Then, the symmetrical measure of the face is calculated. The
nose centerline (the perpendicular bisector of the line linking
the two eyes) in each facial image is calculated. The difference
between the left half and the right half from the nose centerline
of a face region should be small due to its symmetry. If the value
of the symmetrical measure is less than a threshold value, the
face candidate will be selected for further verification.
After measuring the symmetry of a face candidate, the existence
of the different facial features is also verified. The positions of
the facial features are verified by analyzing the projection of the
face-candidate region. The facial-feature regions will exhibit a
low value on the projection. The projection is the average of
gray-level intensities along each row of pixels in a window. A
face region is divided into three parts, each of which contains the
respective facial features. In order to reduce the effect of the
background in a face region, only the white windows, as shown
in Fig. 3, are considered in computing the projections. The top
window should contain the eyebrows and the eyes, the middle
window should contain the nose, and the bottom window should
contain the mouth. When a face candidate satisfies the
aforementioned constraints, it will be extracted as a face region.
The extracted face image is shown in Fig. 4.

D. Principal Component Analysis (PCA)

Let a pattern be a two-dimensional (2-D) array of intensity
values. A pattern may also be considered as a vector x_i of
dimension mn, where m × n is the image size. Denote the
database of M patterns by {x_1, x_2, ..., x_M}. Define the
covariance matrix as follows [4]:

C = (1/M) Σ_{i=1}^{M} (x_i - x̄)(x_i - x̄)^T    (2)


Fig. 4. Extracted face image.

where x̄ = (1/M) Σ_{i=1}^{M} x_i is the mean pattern. Then, the
eigenvalues and eigenvectors of the covariance matrix are
calculated. Let e_1, e_2, ..., e_K be the eigenvectors
corresponding to the K largest eigenvalues. Thus, for a set of
patterns x_i, their corresponding eigenface-based features y_i
can be obtained by projecting into the eigenface space as follows:

y_i = [e_1, e_2, ..., e_K]^T (x_i - x̄)    (3)

For the PCA method, results are shown for the case of using
32 principal components. In other words, faces from a high-dimensional
image space are projected to a 32-dimensional feature vector.

The fuzzy clustering used in Section III minimizes the
following cost function:

J(U, V) = Σ_{i=1}^{c} Σ_{k=1}^{n} (u_ik)^m (d_ik)²    (7)

where m > 1 is a weighting exponent, called a fuzzifier, that is
chosen according to the case. When m → 1, the process converges
to a generalized classical c-means. When m → ∞, all clusters
tend towards the center of gravity of the whole data set. That is,
the partition becomes fuzzier with increasing m. V = (v_1, v_2,
..., v_c) is the vector of the cluster centers, and d_ik is the
distance between x_k and the ith cluster center. Bezdek [13]
proved that if m ≥ 1 and d_ik > 0 for all i and k, then U and V
minimize J(U, V) only if their entries are computed as

u_ik = 1 / Σ_{j=1}^{c} (d_ik / d_jk)^{2/(m-1)}    (8)
III. FUZZY CLUSTERING AND NEURAL NETWORKS
The clusters are functions that assign to each object a
number between zero and one, which is called the membership
of the object in the cluster. Objects which are similar to each
other are identified by having high membership degrees in the
same cluster. It is also assumed that the membership degrees
are chosen so that their sum for each object is one; therefore,
fuzzy clustering is also a partition of the set of objects. The most
widely used fuzzy clustering algorithm is the FCM algorithm
[13][15].
A. FCM
FCM is a data clustering algorithm in which each data point
is associated with a cluster through a membership degree. This
technique divides a collection of n data points into c fuzzy
groups and finds a cluster center in each group such that a cost
function of a dissimilarity measure is minimized. The algorithm
employs fuzzy partitioning such that a given data point can
belong to several groups with a degree specified by membership
grades between 0 and 1. A fuzzy c-partition of the input feature
vectors X = {x_1, x_2, ..., x_n} is represented by a c × n matrix
U = [u_ik], where X is an n-element set of 32-dimensional
feature vectors. The entries satisfy
the following constraints:
u_ik ∈ [0, 1],  1 ≤ i ≤ c,  1 ≤ k ≤ n    (4)

Σ_{i=1}^{c} u_ik = 1,  1 ≤ k ≤ n    (5)

0 < Σ_{k=1}^{n} u_ik < n,  1 ≤ i ≤ c    (6)
Here, x_k represents the feature coordinate of the kth data
point, and u_ik is the membership degree of x_k in cluster i. A
proper partition of X may be defined by the minimization of the
cost function J(U, V) in (7), whose minimizing cluster centers
are given by
v_i = Σ_{k=1}^{n} (u_ik)^m x_k / Σ_{k=1}^{n} (u_ik)^m    (9)
One of the major factors that influence the determination
of appropriate clusters is the dissimilarity measure chosen for
the problem. Indeed, the computation of the membership degree
u_ik depends on the definition of the distance measure d_ik,
which is based on an inner-product norm (quadratic norm). The
squared quadratic norm (distance) between a pattern vector x_k
and the center v_i of the ith cluster is defined as

(d_ik)² = ||x_k - v_i||²_A = (x_k - v_i)^T A (x_k - v_i)    (10)

where A is any positive-definite matrix. The identity matrix is
the simplest and most popular choice for A.
B. Distributing Algorithm of the Facial Images by FCM
The FCM algorithm consists of a series of iterations using (8)
and (9). This algorithm converges to a local minimum point of
J(U, V). We use the FCM as follows to determine the cluster
centers V and the membership matrix U.
Step 1) Initially, the membership matrix is constructed using
random values between 0 and 1 such that constraints
(4), (5), and (6) are satisfied.
Step 2) The cluster centers and the membership matrix are
updated as follows.
a) For each cluster i, the fuzzy cluster center v_i is
computed using (9).
b) All cluster centers which are too close to each
other are eliminated: for each pair of clusters i
and j, the distance ||v_i - v_j|| between their
centers is computed, and when ||v_i - v_j|| < d_avg,
the two clusters are merged, where d_avg is the
average distance between cluster centers.
c) For each cluster i, the distances d_ik are computed
using (10).
d) The cost function (7) is computed. Stop if its
improvement over the previous iteration is below
a threshold.
e) A new membership matrix U is computed using (8)
and Step 2) is repeated.
Step 3) The number of membership functions is decreased
based on defuzzification.
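The alternation of (8) and (9) in Step 2) can be sketched as follows. This is a minimal NumPy sketch that assumes the identity matrix for the norm in (10) and omits the center-merging and defuzzification steps; all names and defaults are illustrative:

```python
import numpy as np

def fcm(X, c, m=2.0, tol=1e-5, max_iter=100, seed=0):
    """Fuzzy c-means: alternate the center update (9) and membership update (8)."""
    X = np.asarray(X, dtype=np.float64)     # shape (n, d)
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    U = rng.random((c, n))
    U /= U.sum(axis=0)                      # columns sum to one (constraint (5))
    prev_cost = np.inf
    for _ in range(max_iter):
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)        # Eq. (9): cluster centers
        # Eq. (10) with A = I: squared Euclidean distances d_ik^2.
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        cost = float((W * d2).sum())                      # Eq. (7)
        if prev_cost - cost < tol:                        # Step 2) d): small improvement
            break
        prev_cost = cost
        # Eq. (8): u_ik = 1 / sum_j (d_ik/d_jk)^(2/(m-1)).
        inv = np.fmax(d2, 1e-12) ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=0)
    return U, V
```

Each column of the returned U gives one pattern's membership degrees in all c clusters, which is exactly what the distribution of individuals over subnets is read from.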


Fig. 5. Structure of proposed parallel NNs.

C. Parallel NNs
In this paper, the parallel NNs are composed of three-layer
BPNNs. A fully connected NN with 32 input neurons and six output
neurons has been simulated (six individuals are permitted to
belong to each subnet, as presented in Section IV). The
structure of the proposed parallel NNs is illustrated in Fig. 5.
The number of hidden units was selected by sixfold cross
validation from 6 to 300 units [29]. The algorithm added three
nodes to the growing network at a time. The number of hidden
units is selected based on the maximum recognition rate.
1) Learning Algorithm: A standard pattern (average pattern)
is obtained from 12 patterns per registrant. Based on the FCM
algorithm, 20 standard patterns are divided into several clusters.
Similar patterns in one cluster are entered into one subnet.
Then, 12 patterns of a registrant are entered into the input
layer of the NN to which the registrant belongs. On each subnet,
the weights are adapted according to the negative gradient of the
squared Euclidean distance between the desired and obtained
outputs.
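As a toy illustration of this update rule, gradient descent on the squared Euclidean output error for a single linear layer looks as follows; this is a deliberate simplification of a multilayer BP subnet, with all names hypothetical:

```python
import numpy as np

def train_step(W, x, target, lr=0.1):
    """One gradient step on E = ||target - y||^2 for a linear layer y = W x."""
    y = W @ x
    error = target - y
    # dE/dW = -2 * error * x^T, so moving against the gradient adds 2*lr*error*x^T.
    return W + lr * 2.0 * np.outer(error, x)
```

Repeating the step drives the obtained output toward the desired output, which is the behavior each subnet's BP training relies on.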
2) Recognition Algorithm: When a test pattern is input into
the parallel NNs, as illustrated in Fig. 5, based on the outputs
in each subnet and the similarity values, the final result can be
obtained as follows.
Step 1) Exclusion by the negation ability of NN. First, all the
registrants are regarded as candidates. Then, only
the candidate with the maximum output remains in
each subnet. If the maximum output values are less
than the threshold value, corresponding candidates
are deleted. The threshold value is set to 0.5, which
is determined based on the maximum output value
of the patterns of the nonregistrant. Since similar individuals are distributed into one subnet, based on
this step, the candidates similar to the desired individual are excluded.
Step 2) Exclusion by the negation ability of parallel NNs.
Among the candidates remaining after Step 1), the
candidate that has been excluded in one subnet will
be deleted from other subnets. If all the candidates

are excluded in this step, this test pattern is judged


as a nonregistrant. When a candidate similar to the
desired individual is assigned to several clusters at
the same time, it may become the maximum output
of the subnets to which the desired individual does
not belong and may be selected as the final answer
by mistake. By performing Step 2), this possibility
is avoided.
Step 3) Judgment by the similarity method. If some candidates remain after Step 2), then, the similarity measure is used for judgment.
The similarity value between the patterns of each remaining candidate and the test pattern is calculated.
The candidate having the greatest similarity value is
regarded as the final answer. If this value is less than
the threshold value of similarity, the test pattern is
regarded as a nonregistrant.
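The three-step recognition rule can be sketched as follows; the data structures (one output dictionary per subnet and a precomputed similarity table) are assumptions made for illustration:

```python
def recognize(subnet_outputs, similarity, out_thresh=0.5, sim_thresh=0.97):
    """Three-step decision rule sketch.

    subnet_outputs: list of dicts mapping candidate id -> subnet output value.
    similarity: dict mapping candidate id -> similarity with the test pattern.
    Returns a candidate id, or None for a nonregistrant.
    """
    # Step 1: keep only the maximum-output candidate of each subnet,
    # and only if that output clears the threshold.
    winners, losers = [], set()
    for outputs in subnet_outputs:
        best = max(outputs, key=outputs.get)
        if outputs[best] >= out_thresh:
            winners.append(best)
        losers.update(c for c in outputs if c != best or outputs[c] < out_thresh)
    # Step 2: a candidate excluded in any subnet is deleted everywhere.
    remaining = [c for c in winners if c not in losers]
    if not remaining:
        return None                      # judged a nonregistrant
    # Step 3: pick the most similar remaining candidate; reject below threshold.
    best = max(remaining, key=lambda c: similarity[c])
    return best if similarity[best] >= sim_thresh else None
```

Note how a candidate that wins one subnet but loses another is rejected in Step 2, which is what prevents look-alikes in foreign subnets from being selected by mistake.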
We illustrate the overall recognition rates for different
threshold values in Fig. 11. Lowering the threshold value
raises the recognition rate but lowers the rejection rate, causing
nonregistrants to be judged as registrants. In contrast, raising
the threshold lowers the recognition rate but raises the rejection rate, causing registrants to be judged as nonregistrants.
From Fig. 11, it can be seen that in our experiment, the best
performance is achieved when the threshold is set to 0.97. The
similarity is calculated as
similarity

(11)

where the parameters are the number of individuals and the
number of trained patterns for each individual.
The system architecture of our experiment is illustrated in
Fig. 6.
IV. EXPERIMENTS
Experiments have been carried out using patterns of 40 individuals at Chiba University, Chiba, Japan (20 individuals were


TABLE I
SUBNETS DETERMINED USING FCM

Fig. 7. Recognition rate as a function of max-cluster-number.

A. Computation by FCM

Fig. 6. System architecture of our experiment.

selected as registrants and 20 individuals as nonregistrants).


Each individual provided 24 frontal patterns which show different facial expressions: blank expressions and smiles. In our
system, for each individual, 12 patterns were selected as the
training set, and 12 patterns were used as the test patterns for
recognition. Patterns of 40 different individuals were obtained
over a two-month period. The 40 individuals consisted of 26
males and 14 females. The age of the subjects ranged from 12
to 40 years old.

Based on the FCM algorithm presented in Section III, if the


difference between two cluster centers is less than the average
difference value, the two clusters will be incorporated and one
cluster will be deleted. The n individuals divided into c clusters
are described by a c × n matrix U, where the (i, k)th entry u_ik
is the membership between 0 and 1, and the sum of the entries
in each column is one. The final number of clusters is ten. In
other words, a total of ten clusters were formed around
individuals 1-5, 7, 10, 11, 15, and 19.
B. Defuzzification
1) Defuzzification of Columns: The maximum number of
clusters to which an individual may belong (max-to-cluster)
was set to 5. In order to reduce the number of elements, the
threshold of defuzzification was set based on the number of
clusters c. If an element value is less than this threshold, the
value of this element is set to 0. If there are more than five
elements in one column, only the top five elements remain in
the column.


Fig. 8. Learning time as a function of max-cluster-number.

Fig. 10. Learning time as a function of max-to-cluster.

TABLE II
SUBNETS AFTER PARTIAL UNIFICATION

Fig. 9. Recognition rate as a function of max-to-cluster.

2) Defuzzification of Lines: The lines determine how many
elements one cluster may contain. The maximum number of
elements in a cluster (max-cluster-member) is set to six. The
top six elements that are not 0 are saved and the other elements
are excluded. In order to guarantee that an element belongs to
at least one cluster, an element that is not among the top six is
saved when the value of this element in all other clusters of the
same column is 0. The obtained subnets are presented in Table I.
In order to determine the values of max-to-cluster and
max-cluster-member, we performed various experiments. The
experimental results are illustrated in Figs. 7-10. The values
were determined when the system achieved the highest
recognition rate and took the shortest learning time.
3) Merging of Clusters: In order to reduce the amount of
calculation, clusters are integrated (merged). However, the
maximum number of elements per cluster is six. We integrate
the clusters automatically. The algorithm for this step is as
follows.

for i = 1 to c
    if net-count(i) ≥ max-cluster-member, continue to the next i
    for j = i + 1 to c
        if net-count(i) + net-count(j) > max-cluster-member, continue to the next j
        net-count(i) = net-count(i) + net-count(j)
        merge cluster j into cluster i and delete cluster j

Here, net-count(i) denotes the number of elements in the ith
cluster, and c denotes the number of clusters after integration.
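A runnable rendering of this merging loop, with clusters represented as Python sets so that shared elements collapse to one, might look as follows (an illustrative sketch, not the authors' implementation):

```python
def merge_clusters(clusters, max_cluster_member=6):
    """Greedily merge clusters while no cluster exceeds max_cluster_member
    elements; elements shared by two merged clusters are kept once."""
    merged = [set(c) for c in clusters]
    i = 0
    while i < len(merged):
        if len(merged[i]) >= max_cluster_member:
            i += 1
            continue
        j = i + 1
        while j < len(merged):
            union = merged[i] | merged[j]        # duplicates collapse to one
            if len(union) <= max_cluster_member:
                merged[i] = union
                del merged[j]                    # cluster j absorbed into i
            else:
                j += 1
        i += 1
    return merged
```

Fewer, fuller clusters mean fewer subnets to train, which is the efficiency gain reported in Table II.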


TABLE III
OUTPUT RESULTS OF SUBNET 1 AFTER LEARNING

Furthermore, when the same elements are integrated into one


cluster, only one element remains. Here, integrating the clusters can reduce the amount of calculation, as shown in Table II,
which enhances the efficiency.

TABLE IV
OUTPUTS OF EACH SUBNET FOR PATTERN 1116

C. Learning and Recognition by Parallel NNs


In the learning procedure, let us consider patterns 1, 8, 9, 12,
14, and 16 in subnet 1 of Table II. Table III gives some of the
actual output after learning. On the basis of Table III, learning
by the parallel NNs is judged to be correct.
For 20 registrants, 240 patterns are used for recognition. An
additional 240 patterns are prepared for 20 nonregistrants to determine whether the parallel NNs system can judge that the individuals are not registered. The recognition procedure is based
on the algorithm presented in Section III.
D. Experimental Results
1) Registrant: Here, the results of recognition of a pattern
(pattern 1116, registrant 11, image 16) are shown as an example. Table IV gives the outputs from each subnet for pattern
1116. The element of the greatest output is extracted from each

subnet as the first answer to each subnet. For subnet 6, no answer was obtained because all element values were lower than
the threshold of 0.5. Table V lists the results. For subnets 1-7,
patterns 6, 11, 12, and 18 were selected. Pattern 6 was excluded
from subnets 3, 5, and 6. Pattern 18 was excluded from subnets 5
and 6 as well as from the answers of all subnets. Patterns 11 and
12 remained after recognition based on the negation ability of
the parallel NNs system. The results are presented in Table VI.
Here, recognition by the similarity measure was applied. The


TABLE V
RESULTS OF EACH SUBNET FOR PATTERN 1116

TABLE X
RESULTS OF EACH SUBNET FOR PATTERN N0311

TABLE VI
RECOGNITION RESULTS OF PARALLEL NNS BASED
ON REJECTION RULES FOR PATTERN 1116

TABLE XI
OUTPUT OF EACH SUBNET FOR PATTERN N0307

TABLE VII
EXAMPLE OF SIMILARITY WITH PATTERN 1116

TABLE VIII
OUTPUTS OF EACH SUBNET FOR PATTERN N0101

TABLE IX
OUTPUTS OF EACH SUBNET FOR PATTERN N0311

input pattern had a similarity of 0.994677 with registrant 11,


which was better than that of 0.974347 with registrant 12. The
similarities are presented in Table VII. Therefore, the face pattern was judged to be that of the eleventh registrant.
2) Nonregistrants:
a) Exclusion based on the negation ability of the NN: The
results of recognition of a pattern (pattern N0101, nonregistrant
1, image 1) are used as an example. Table VIII gives the outputs

from each subnet for pattern N0101. An element of the greatest


output is extracted from each subnet as the first answer to each
subnet. For subnets 1-7, no answer was obtained because all
element values were lower than the threshold of 0.5. Therefore,
pattern N0101 was identified as a nonregistrant.
b) Exclusion based on the negation ability of the parallel
NNs: The results of recognition of a pattern (pattern N0311,
nonregistrant 3, image 11) are used as an example. Table IX
lists the outputs from each subnet for pattern N0311. An element
of the greatest output is extracted from each subnet as the first
answer to each subnet. Table X gives the results of each subnet.
For subnet 7, pattern 8 was selected. However, it was excluded
from subnet 1. No element remained after recognition based on
the negation ability of the parallel NNs system. Therefore, the
face pattern was judged to be a nonregistrant.
c) Exclusion based on similarity: Here, the results of
recognition of a pattern (pattern N0307, nonregistrant 3, pattern
7) are used as an example. Table XI gives the outputs from each
subnet for pattern N0307. An element of the greatest output is
extracted from each subnet as the first answer to each subnet.
Table XII gives the results. For subnets 1-7, patterns 6 and 18
were selected. Pattern 6 was excluded from subnets 3, 5, and
6. Pattern 18 was left after recognition based on the negation
ability of the parallel NNs system. However, the similarity with
pattern 18 was 0.9602, which was less than the threshold of
0.97. Therefore, the pattern was judged to be a nonregistrant.
We illustrate the overall recognition rates for different
threshold values in Fig. 11, where this method was applied
to frontal patterns. The horizontal axis indicates the threshold
value used, and the vertical axis represents the recognition
rate. When the threshold is set to 0.97, the recognition rate
is 98.75% (two errors and one rejection among 240 patterns)
for registrants and 99.58% (one error among 240 patterns) for
nonregistrants. The false rejection rate (FRR) is 0.42% and the
false acceptation rate (FAR) is 0.42%. Since the FRR and FAR


TABLE XII
RESULTS OF EACH SUBNET FOR PATTERN N0307

Fig. 11. Recognition results.

are equal, the crossover error rate (CER) is 0.42%. Recognition


rate and rejection rate are defined as follows.
1) Recognition rate: For patterns of registrants in a database,
the correct recognition accuracy rate at the testing stage.
2) Recognition rate = 100 × (number of correct patterns)/
(total number of patterns).
3) Rejection rate: For patterns of nonregistrants, the accurate
rejection rate, and for patterns of registrants, the false
rejection rate at the testing stage.
4) Rejection rate = 100 × (number of rejected patterns)/(total
number of patterns).
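In code, these two definitions amount to the following (trivial helper names are illustrative):

```python
def recognition_rate(num_correct, total):
    """Recognition rate in percent: 100 * correct / total."""
    return 100.0 * num_correct / total

def rejection_rate(num_rejected, total):
    """Rejection rate in percent: 100 * rejected / total."""
    return 100.0 * num_rejected / total
```

For example, 237 correctly recognized registrant patterns out of 240 yield the 98.75% figure reported above, and 239 rejected nonregistrant patterns out of 240 yield 99.58%.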
V. DISCUSSION
In this paper, an efficient approach for face recognition
was presented. In order to assess this system, we tested three
existing approaches for face recognition using the same database and compared the performances with our method. The
BPNN system, the HCM and parallel NNs system, and the
pattern matching system were used as the three approaches.
The three approaches were carried out using facial patterns of
40 individuals (20 individuals were selected as registrants and
20 individuals as nonregistrants). They are the same group as
that used in our research. The processes and experimental results
of the three experiments are as follows.
A BPNN with 32 input neurons and 20 output neurons (the
number of classes) was simulated in this experiment. The
training algorithm was the same as the algorithm in each subnet

TABLE XIII
ERROR RATES OF DIFFERENT APPROACHES

of our proposed system. The optimum number of hidden units


was selected by the cross validation presented in Section III. The
recognition rate was 93.75%. Furthermore, it required an approximately two times longer learning time than the proposed
system.
The experiment using parallel NNs and HCM [5] was performed for the same database as ours. First, 20 registrants were
divided into four clusters and the maximum number of individuals in each cluster was six. The training algorithm was performed in each subnet. When a test pattern was input into the
parallel NNs, the maximum output of each subnet was extracted.
Then, the similarity value between the candidates extracted and
the test pattern was calculated. The candidate having the greatest
similarity value is judged as the final answer. Since the individuals
belonging to each subnet do not overlap, some similar patterns
cannot be assigned to the same subnet, and the recognition
rate of registrants (95.83%) was lower than that of our system.
Lam and Yan [2] proposed an analytic-to-holistic approach
based on point matching, in which the feature points and the
eyes, nose and mouth windows are compared with those in the
database by correlation. The pattern-matching (PM) method is
based on distances in multidimensional characteristic space and
is widely used in various fields. However, the PM method is
vulnerable to image fluctuation. Comparisons with Lam and Yan's
method for the same database are shown in Table XIII.
This technique involves the similarity measure for selecting
the final answer from all candidates extracted by the parallel
NNs system. Since the parallel NNs system has already excluded candidates similar to the desired individual through
the processes of extracting the candidate having the maximum
output in each subnet at Step 1) (Section III) and excluding all
the candidates that have been excluded in a subnet at Step 2),
a simple method is sufficient for subsequent judgment. This
method should be used to compare the patterns dissimilar to
each other for final judgment. The similarity measure is a good
method of judging an answer from a small number of dissimilar patterns by an easy process. Furthermore, utilizing the
similarity measure takes much shorter learning and recognition
time than utilizing the NN method at the final step.
VI. CONCLUSION
In this paper, we proposed a fuzzy clustering and parallel
NNs method for face recognition. The patterns are divided into
several small-scale subnets based on FCM. Due to the negation ability of NN and parallel NNs, some candidates are excluded. The similarities between the remaining candidates and
the test patterns are calculated. We judged the candidate with
the greatest similarity to be the final answer when the similarity


value was above the threshold value. Otherwise, it was judged


to be nonregistered. The proposed method achieved a 98.75%
recognition accuracy for 240 patterns of 20 registrants and a
99.58% rejection rate for 240 patterns of 20 nonregistrants.


TABLE XIV
ERROR RATES OF RECENTLY PERFORMED EXPERIMENTS
ON THE ORL DATABASE

APPENDIX
COMPARISONS WITH OTHER APPROACHES
To compare our proposed recognition system against popular
face-recognition methods, we applied our proposed system to
the Olivetti Research Laboratory (ORL) database, Cambridge
University, Cambridge, U.K.1 All the 400 patterns from the ORL
database are used to evaluate the face-recognition performance
of our proposed method. The ORL face database is composed
of 400 patterns, ten different patterns for each of 40 distinct
individuals. The variations of the patterns are across pose, size,
time, and facial expression. All the images were taken against a
dark homogeneous background with the subjects in an upright,
frontal position, with tolerance for some tilting and rotation of
up to about 20°. There is some variation in scale of up to about
10% [27]. The spatial and gray-scale resolutions of the patterns
are 92 × 112 and 256, respectively.
The training set and test set are derived in the same way as
in [6], [16], [24], and [25]. A total of 200 patterns were randomly selected as the training set and the remaining 200 patterns as the testing set, so that each individual has five patterns in each set. Next, the training and testing patterns were exchanged and the experiment was repeated; such procedures were carried out several times. In the following experiments, the error rate reported is the average of the error rates obtained over several runs (three runs [6], four runs [26], six runs [16], and five runs [27]).
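The evaluation protocol above (random 5/5 split per subject, halves exchanged, error rates averaged over runs) can be sketched as follows. The `evaluate` callable is a hypothetical stand-in for training the recognizer on one half and testing it on the other; the dummy evaluator at the end only exercises the protocol:

```python
import random

def averaged_error_rate(patterns, evaluate, n_runs=3, seed=0):
    """patterns: dict mapping subject id -> list of 10 patterns.
    evaluate(train, test) -> error rate in [0, 1] (a stand-in for
    training the recognizer and testing it).  Each run randomly splits
    every subject's patterns 5/5, evaluates, exchanges the two halves,
    evaluates again, and all rates are averaged."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_runs):
        train, test = [], []
        for sid, imgs in patterns.items():
            imgs = list(imgs)
            rng.shuffle(imgs)
            train += [(sid, p) for p in imgs[:5]]
            test += [(sid, p) for p in imgs[5:]]
        rates.append(evaluate(train, test))
        rates.append(evaluate(test, train))  # exchanged halves
    return sum(rates) / len(rates)


# Dummy evaluator returning a constant rate, just to run the protocol
patterns = {sid: list(range(10)) for sid in range(40)}
rate = averaged_error_rate(patterns, lambda tr, te: 0.01)
print(rate)  # ~0.01 for this constant evaluator
```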
The face-recognition procedure consists of 1) a feature extraction step where the feature representation of each training or
test pattern is extracted by PCA + FDA (Fisher discriminant analysis) [16], and 2) a classification step in which each feature representation is input into the proposed fuzzy clustering
and parallel NN system. A 1% error rate was obtained when 25
features were used. This is better than the result (error rate of
1.92%) reported by Er et al. [16], where the feature extraction
step was the same as ours and the RBFNN was used in the classification step. It should be noted at this point that PCA + FDA is
used in Step 1) since the facial patterns from the ORL database
vary in pose and facial expression. As noted by Er et al. [16], PCA retains unwanted variations caused by lighting, facial expression, and other factors; the fisherface paradigm aims to overcome this drawback of the eigenface paradigm by incorporating FDA criteria. On the other hand, Lu et al. [27] note that fisherfaces may lose significant discriminant information owing to the intermediate PCA step. Therefore, PCA alone is used to extract features in our aforementioned experiment, since the variation of the patterns in our own database is slight.
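As a rough illustration of the PCA stage discussed above (the FDA step of [16] would operate on these reduced features), the following numpy sketch projects flattened face patterns onto the leading principal components. The random matrix stands in for real face data, and the dimensions merely echo the ORL image size and the 25-feature setting from the text:

```python
import numpy as np

def pca_features(X, n_components):
    """Center the row-vector patterns and project them onto the top
    principal components (the eigenface subspace)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal axes of the centered data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components].T  # projection matrix, shape (d, n_components)
    return Xc @ W, mean, W


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 92 * 112))  # 200 flattened 92x112 "faces" (random stand-ins)
F, mean, W = pca_features(X, 25)      # 25 features, as in the text
print(F.shape)  # (200, 25)
```

A test pattern x would be mapped into the same subspace as `(x - mean) @ W` before classification.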
Comparisons with the convolutional NN (CNN) [6], RBFNN [16], the nearest feature line (NFL) method [26], and direct fractional LDA (DF-LDA) [27] performed on the same ORL database are shown in Table XIV. It
1The ORL database is available from http://www.cam-orl.co.uk/face-database.html

can be seen that the overall performance of our proposed method is superior to those of the other known methods when using the ORL database.
REFERENCES
[1] A. Z. Kouzani, F. He, and K. Sammut, "Towards invariant face recognition," Inf. Sci., vol. 123, pp. 75–101, 2000.
[2] K. Lam and H. Yan, "An analytic-to-holistic approach for face recognition based on a single frontal view," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 7, pp. 673–686, Jul. 1998.
[3] P. J. Phillips, "Matching pursuit filters applied to face identification," IEEE Trans. Image Process., vol. 7, no. 8, pp. 1150–1164, Aug. 1998.
[4] M. A. Turk and A. P. Pentland, "Eigenfaces for recognition," J. Cognitive Neurosci., vol. 3, pp. 71–86, 1991.
[5] T. Yahagi and H. Takano, "Face recognition using neural networks with multiple combinations of categories," J. Inst. Electron. Inf. Commun. Eng., vol. J77-D-II, no. 11, pp. 2151–2159, 1994 (in Japanese).
[6] S. Lawrence, C. L. Giles, A. C. Tsoi, and A. D. Back, "Face recognition: A convolutional neural-network approach," IEEE Trans. Neural Netw., vol. 8, no. 1, pp. 98–113, Jan. 1997.
[7] S. H. Lin, S. Y. Kung, and L. J. Lin, "Face recognition/detection by probabilistic decision-based neural network," IEEE Trans. Neural Netw., vol. 8, no. 1, pp. 114–132, Jan. 1997.
[8] H. H. Song and S. W. Lee, "A self-organizing neural tree for large-set pattern classification," IEEE Trans. Neural Netw., vol. 9, no. 3, pp. 369–380, May 1998.
[9] C. M. Bishop, Neural Networks for Pattern Recognition. London, U.K.: Oxford Univ. Press, 1995.
[10] J. L. Yuan and T. L. Fine, "Neural-network design for small training sets of high dimension," IEEE Trans. Neural Netw., vol. 9, no. 1, pp. 266–280, Jan. 1998.
[11] X. Xie, R. Sudhakar, and H. Zhuang, "Corner detection by a cost minimization approach," Pattern Recognit., vol. 26, no. 12, pp. 1235–1243, 1993.
[12] S. K. Oh and W. Pedrycz, "Multi-FNN identification based on HCM clustering and evolutionary fuzzy granulation," Simulation Modelling Practice and Theory, vol. 11, no. 7–8, pp. 627–642, 2003.
[13] J. Bezdek, "A convergence theorem for the fuzzy ISODATA clustering algorithms," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-2, no. 1, pp. 1–8, Jan. 1981.
[14] W. Pedrycz, "Conditional fuzzy clustering in the design of radial basis function neural networks," IEEE Trans. Neural Netw., vol. 9, no. 4, pp. 601–612, Jul. 1998.
[15] X. Wu and M. J. Er, "Dynamic fuzzy neural networks: A novel approach to function approximation," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 30, no. 2, pp. 358–364, Apr. 2000.
[16] M. J. Er, S. Wu, J. Lu, and H. L. Toh, "Face recognition with radial basis function (RBF) neural networks," IEEE Trans. Neural Netw., vol. 13, no. 3, pp. 697–710, May 2002.
[17] F. Yang and M. Paindavoine, "Implementation of an RBF neural network on embedded systems: Real-time face tracking and identity verification," IEEE Trans. Neural Netw., vol. 14, no. 5, pp. 1162–1175, Sep. 2003.
[18] J. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, "Face recognition using kernel direct discriminant analysis algorithms," IEEE Trans. Neural Netw., vol. 14, no. 1, pp. 117–126, Jan. 2003.
[19] B. K. Gunturk, A. U. Batur, and Y. Altunbasak, "Eigenface-domain super-resolution for face recognition," IEEE Trans. Image Process., vol. 12, no. 5, pp. 597–606, May 2003.
[20] B. L. Zhang, H. Zhang, and S. S. Ge, "Face recognition by applying wavelet subband representation and kernel associative memory," IEEE Trans. Neural Netw., vol. 15, no. 1, pp. 166–177, Jan. 2004.
[21] Q. Liu, X. Tang, H. Lu, and S. Ma, "Face recognition using kernel scatter-difference-based discriminant analysis," IEEE Trans. Neural Netw., vol. 17, no. 4, pp. 1081–1085, Jul. 2006.
[22] W. Zheng, X. Zhou, C. Zou, and L. Zhao, "Facial expression recognition using kernel canonical correlation analysis (KCCA)," IEEE Trans. Neural Netw., vol. 17, no. 1, pp. 233–238, Jan. 2006.
[23] X. Tan, S. Chen, Z. H. Zhou, and F. Zhang, "Recognizing partially occluded, expression variant faces from single training image per person with SOM and soft k-NN ensemble," IEEE Trans. Neural Netw., vol. 16, no. 4, pp. 875–886, Jul. 2005.
[24] D. Valentin, H. Abdi, A. J. O'Toole, and G. W. Cottrell, "Connectionist models of face processing: A survey," Pattern Recognit., vol. 27, no. 9, pp. 1209–1230, 1994.
[25] K. W. Wong, K. M. Lam, and W. C. Siu, "An efficient algorithm for face detection and facial feature extraction under different conditions," Pattern Recognit., vol. 34, no. 10, pp. 1993–2004, 2001.
[26] S. Z. Li and J. Lu, "Face recognition using the nearest feature line method," IEEE Trans. Neural Netw., vol. 10, no. 2, pp. 439–443, Mar. 1999.
[27] J. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, "Face recognition using LDA-based algorithms," IEEE Trans. Neural Netw., vol. 14, no. 1, pp. 195–200, Jan. 2003.
[28] A. S. Nugroho, S. Kuroyanagi, and A. Iwata, "Efficient subspace learning using a large scale neural network CombNET-II," in Proc. 9th Int. Conf. Neural Inf. Process., Nov. 2002, vol. 1, pp. 447–451.
[29] R. Setiono, "Feedforward neural network construction using cross validation," Neural Comput., vol. 13, pp. 2865–2877, 2001.
Jianming Lu received the M.S. and Ph.D. degrees
from the Graduate School of Science and Technology, Chiba University, Chiba, Japan, in 1990 and
1993, respectively.
In 1993, he joined Chiba University as an Associate in the Department of Information and Computer
Sciences. Since 1994, he has been with the Graduate
School of Science and Technology, Chiba University,
where, in 1998, he became an Associate Professor.
His current research interests are in the theory and
applications of digital signal processing and control
theory.
Dr. Lu is a member of the Institute of Electronics, Information and Communication Engineers (IEICE, Japan), the Society of Instrument and Control Engineers (SICE, Japan), the Institute of Electrical Engineers of Japan (IEEJ), the Japan Society of Mechanical Engineers (JSME), and the Research Institute of Signal Processing, Japan.

Xue Yuan received the B.S. degree from the School of Information Science and Engineering, Northeastern University, Shenyang, China, in 1999 and the
M.S. degree from the Graduate School of Science
and Technology, Chiba University, Chiba, Japan, in
2004, where she is currently working towards the
Ph.D. degree.
Her current research interests include image analysis and pattern recognition.

Takashi Yahagi (M'78–SM'05) received the B.S., M.S., and Ph.D. degrees in electronics from the
Tokyo Institute of Technology, Tokyo, Japan, in
1966, 1968, and 1971, respectively.
In 1971, he joined Chiba University, Chiba, Japan,
as a Lecturer in the Department of Electronics. From
1974 to 1984, he was an Associate Professor, and
in 1984 he became a Professor in the Department
of Electrical Engineering. From 1989 to 1998,
he was with the Department of Information and
Computer Sciences. Since 1998, he has been with
the Department of Information Science of the Graduate School of Science and
Technology, Chiba University. He is the author of Theory of Digital Signal
Processing, volumes 1–3 (1985, 1985, and 1986), Digital Signal Processing and
Basic Theory (1996), and Digital Filters and Signal Processing (2001) and the
coauthor of Digital Signal Processing of Speech and Images (1996), VLSI and
Digital Signal Processing (1997), Multimedia and Digital Signal Processing
(1997), Neural Network and Fuzzy Signal Processing (1998), Communications
and Digital Signal Processing (1999), Fast Algorithms and Parallel Signal
Processing (2000), and Digital Filters and Signal Processing (Corona: Tokyo,
Japan, 2001). He is the Editor of the Library of Digital Signal Processing
(Corona: Tokyo, Japan). His current research interests are in the theory and
applications of digital signal processing and other related areas.
Dr. Yahagi is a Fellow of the Institute of Electronics, Information and Communication Engineers (IEICE, Japan) and of the Research Institute of Signal Processing, Japan. He has been the President of the Research Institute of Signal Processing, Japan, since 1997, and the Editor-in-Chief of the Journal of Signal Processing.