IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 7, NO. 3, MAY 1996

Abstract: Recent artificial neural network research has focused on simple models, but such models have not been very successful in describing complex systems (such as face recognition). This paper introduces the artificial neural network group-based adaptive tolerance (GAT) tree model for translation-invariant face recognition, suitable for use in an airport security system. GAT trees use a two-stage, divide-and-conquer, tree-type approach. The first stage determines general properties of the input, such as whether the facial image contains glasses or a beard. The second stage identifies the individual. Face perception classification, detection of front faces with glasses and/or beards, and face recognition results using GAT trees under laboratory conditions are presented. We conclude that the neural network group-based model offers significant improvement over conventional neural-network trees for this task.

Manuscript received June 25, 1994; revised January 30, 1995 and September 9, 1995. This work was supported by a research grant from SITA (Societe Internationale de Telecommunications Aeronautiques) at the Center for Information Technology Research, University of Wollongong, Australia. M. Zhang is with the Department of Computing and Information Systems, Faculty of Business and Technology, University of Western Sydney, NSW 2560, Australia. J. Fulcher is with the Department of Computer Science, University of Wollongong, Wollongong, NSW 2522, Australia. Publisher Item Identifier S 1045-9227(96)02877-9. 1045-9227/96$05.00 (c) 1996 IEEE.

I. INTRODUCTION

A. Automated Face Recognition

THE application of interest in the present study is the automatic recognition of human faces; it is within this context that we develop the artificial neural network group-based adaptive tolerance (GAT) tree model.

Hundreds of papers exist in the scientific literature involving human face recognition, but only a few deal with the automatic recognition of faces using computers. Early (conventional) approaches included distance measures [27], algebraic extraction or principal component analysis [21], [28], and isodensity lines [30], [26]. More recently, custom VLSI (very large scale integration) image correlator techniques have been applied [4].

Artificial neural network approaches include unsupervised networks [10], multilayer perceptrons (MLP's) with backpropagation [29], and self-organizing maps [5]. Specialized architectures such as WISARD [1] and dynamic link architectures [22] have also been applied to this problem.

We now consider briefly two of the more successful attempts at automatic face recognition in recent times.

Bouattour et al. [3] developed a human face recognition system using MLP's. Their database consisted of 650 grey scale images, with approximately 70 images per person. Bouattour's system, subsequently manufactured by the French MIMETICS company, is only able to recognize a few people, but it can do so with an accuracy of between 89% and 100%.

Flocchini et al. [11] described an image processing and neural network system capable of recognizing human faces from different perspectives. Their system is likewise limited in the number of faces it can recognize, but it does so with an accuracy of between 80% and 100%. A facial recognition system for law enforcement purposes has been developed for the West Midlands Police, U.K. [34]. A set of 96 police photographs was transferred to video tape. Using computer graphics and neural-network techniques, 5% noise was then added to the target images prior to training. The correct face was able to be picked out with a certainty of 62.5%.

To date, however, no automatic human face recognition system has been developed which is capable of operating in real time, under variable lighting conditions, and with large face databases.

In this paper, we first introduce the GAT tree, and then demonstrate how this model is able to solve complex pattern recognition involving noncontinuous, nonsmooth decision functions, a typical such problem being automatic face recognition. We subsequently demonstrate that the GAT tree provides superior classification performance compared with alternative automatic face recognition approaches, such as the ones mentioned above.

B. Neural-Network Trees

Hierarchical classification maps naturally onto binary tree structures: each leaf node corresponds to a separate category, and decisions are made in descending through each intermediate node as to whether or not the current input sample belongs to a specific subclass. Only a few levels, N, are usually required in practice to discriminate between a large number (2^N) of categories, which facilitates real-time operation. It is not surprising, then, that several researchers have attempted to combine neural-network classifiers with hierarchical trees to boost classification performance. Accordingly, we include here a brief survey of neural network trees.

The obvious starting point is to construct a hierarchical network in which each node of the tree corresponds to a single neuron. Fang et al. [9], for example, used unsupervised (competitive) learning on such a neuron. An interesting finding of this work was that some of their algorithms tended to produce neural trees in which the node weight vectors approximated the probability distribution of the sample patterns.

Armstrong et al. [2] presented an adaptation algorithm for binary tree networks, with each node of the tree performing
the Boolean OR or AND function. Their binary tree adaptation algorithm was used to recognize optical characters.

Sanger's [31] approach was to use (Fourier) basis functions at each node, and to dynamically grow new trees in tandem to better approximate continuous functions in high-dimensional input spaces. Least mean squared learning is used, together with an algorithm which grows the tree one dimension at a time, and in the process reduces the number of basis functions and coefficients which need to be computed. The ultimate tree size depends on both the input data distribution and the specific function which is being approximated.

Adaptive (dynamic) growth is also employed in Sankar and Mammone's [32] neural tree network (NTN). Note, however, that in this case each node is an entire neural network. The input feature space is recursively partitioned into subregions, with each "leaf" subregion being assigned a different class label (by contrast, conventional decision trees partition the feature space using hyperplanes constrained to be perpendicular to the feature axes). Apart from this recursive growth algorithm, an optimal pruning algorithm is also described, which leads to improved generalization ability (growing the smallest NTN to correctly classify a given training data set is an NP-complete problem). Sankar and Mammone report better classification performance with their NTN structure, compared with both neural networks and conventional decision trees. Note, however, that attempts to recognize complex patterns using the NTN invariably lead to problems with tolerance and accuracy. This is because nodes comprising single neural networks are incapable of recognizing complex patterns.

Jordan and Jacobs [19] developed a tree-structured architecture suitable for supervised learning, based on a hierarchical mixture of experts statistical model. Learning is treated as a maximum likelihood problem, to which they apply expectation-maximization (EM) to adjust network parameters.

Zhang et al. [38] developed the NAT (neural network adaptive tolerance) tree technique for face recognition, in which every node consists of a neural network in tolerance space (the NAT tree is also able to be adaptively connected and grown).

Despite these promising early beginnings, neural network-based tree research is in its infancy.

C. Motivations for Developing the GAT Tree Model

The decision functions for complex patterns are invariably noncontinuous and nonsmooth. Single neural networks are not capable of simulating noncontinuous, nonsmooth functions very well; the GAT tree model, by contrast, is capable. This is thus the first motivation for developing GAT trees.

Pattern recognition using single classifiers is difficult for problems involving large numbers of classes and/or noisy inputs. This has led some researchers to use multiple classifiers in an attempt to improve classification performance [20]. Choices arrived at by individual classifiers are typically ranked on the basis of confidence level. The problem then becomes one of how to combine these rankings. Earlier attempts to address this problem of classifier fusion included decision regions [15], voting methods [25], and prediction by top-choice combinations [16]. Now most of these methods only use top-choice information; as the confidence levels of the top choices drop, second-choice (near miss) information becomes increasingly more important. Ho et al. [16] proposed combining rankings using methods which either reduce the given set of classes (using intersection or union), or rerank them (using highest rank, Borda count, and logistic regression). They demonstrated the effectiveness of this approach in the recognition of degraded machine-printed characters.

Drucker et al. [8] also addressed this problem of classifier fusion (likewise in an OCR context, in this instance the recognition of ZIP codes). They proposed a boosting algorithm, based on "probably approximately correct" learning [35], [33]. Using this algorithm, they constructed an ensemble of neural networks which led to improved performance compared with that of a single network. Drucker followed a different approach than is usual with multiclassifier systems, in which each classifier is trained independently prior to combining them. With boosting, the parameters of each subsequent network depend on the previous networks.

Ho [16] also proved that a multiple classifier system is a powerful technique for solving difficult pattern recognition problems involving large, noisy class sets. The problem with face recognition, however, is determining the correct multiple classifier structure. This structure should not only be more accurate, but moreover be capable of being upgraded to recognize new tasks without the need for retraining. This is the second motivation for developing the GAT tree model: in other words, the ability to handle patterns which involve large class numbers and noisy inputs.

The function approximation capabilities of neural network architectures have recently been investigated by several authors [7], [13], [17]. Hornik [18] concluded:

    Standard multilayer feedforward networks with continuous, bounded and nonconstant activation function can approximate any continuous function with respect to uniform distance.

More recently, Leshno [24] proved the following general result:

    Multilayer feedforward networks with a nonpolynomial activation function can approximate any continuous function to any degree.

The question then arises: are multilayer feedforward networks with a nonpolynomial activation function able to approximate any piecewise continuous function to any degree? This paper presents two deductions, and proves that neural network groups can in fact do so. This is the third motivation for the development of the GAT tree model.

The fourth motivation for this paper is to present an artificial neural network model suitable for translation-invariant face recognition, as appropriate for inclusion in airport security systems. To be able to recognize small numbers of "people of interest" (target faces) from amongst thousands of passengers, (front view) passport photos are assumed. Real-time processing using a DOS personal computer is desirable in such an environment (since most airports currently run PC-based systems).
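As a concrete illustration of the reranking idea mentioned above, the following is a minimal sketch of Borda-count fusion of classifier rankings. The function name and scoring convention are ours, not taken from Ho et al. [16], whose paper should be consulted for the exact methods; this sketch only shows why near-miss (second-choice) information can change the fused decision.

```python
# Hypothetical sketch of Borda-count rank fusion (names and details assumed).
# Each classifier contributes a ranking of class labels, best first; a label
# earns points according to its position, and labels are reranked by total.

def borda_fuse(rankings):
    """rankings: list of lists, each ordering class labels from best to worst."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, label in enumerate(ranking):
            # The best-ranked label receives n - 1 points, the worst receives 0.
            scores[label] = scores.get(label, 0) + (n - 1 - position)
    # Rerank labels by descending total score.
    return sorted(scores, key=lambda label: -scores[label])

# No classifier's top choice agrees, but "B" is ranked highly by all three,
# so second-choice information decides the fused ranking.
fused = borda_fuse([["A", "B", "C"], ["B", "A", "C"], ["C", "B", "A"]])
print(fused)   # ['B', 'A', 'C']
```

Note that a bare top-choice vote over the same three rankings would produce a three-way tie; the Borda scores break it using the lower-ranked choices.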
ZHANG AND FULCHER: FACE RECOGNITION 557

D. Paper Summary

Section I of this paper has just presented an overview of, and motivations for, the work undertaken in this paper. The overall face recognition system is briefly described in Section II. In Section III, the GAT tree model is introduced. The GAT tree model is further developed in Section IV (with a more thorough mathematical description included in the Appendix). The results of using the GAT tree for real-time face perception classification (Section V), detection of (front) faces with glasses and/or beards (Section VI), as well as face recognition (Section VII) under laboratory conditions are also presented. A brief conclusion is included in Section VIII.

II. FACE RECOGNITION SYSTEM FOR AIRPORT SECURITY (SITA)

Machine facial recognition is rendered difficult when a face is either topologically deformed, translated in three dimensions, or when the background environment is complex.

Fig. 1 shows the overall face recognition system developed during the course of the present study. The main features of the system are:

1) At the low level, the camera captures images of people as they make their way to the check-in desk. Faces are located within these captured images using neural network techniques.

2) The GAT tree is used in the middle level for face recognition, face perception (using normalized facial images) being performed first. The face is classified either as a front face, tilted to the left, tilted right, rotated to the left, or rotated right. If this first classification attempt fails, the GAT tree requests the system to capture the face a second time. Following successful classification, translation-invariant faces are recognized by adaptively connecting nodes and/or growing the GAT tree in tolerance space. Simultaneously, faces with glasses and/or beards are classified using the same (GAT tree) technique.

3) The high level uses neural network training databases, face databases, fact databases, rule bases, knowledge bases, and reasoning networks to perform more intelligent (high level) recognition.

Fig. 1. Translation-invariant face recognition system. (Legend: outputs of the face recognition GAT tree; glasses face output; capture face again output; beard face output.)

This paper only concerns itself with the middle level of this face recognition system, namely the GAT tree model, together with its application to face recognition.

III. NEURAL NETWORK GROUP-BASED TREE

Fig. 2 shows the structure of a GAT tree node. The significant feature of this node is that it is neither an artificial neuron nor an artificial neural network, but rather a neural-network group (NN1, NN2, ..., NNk) in tolerance space [37], [6]. The basic function performed by the node is classification. Moreover, since each node consists of a neural-network group, it is able to function as a complex pattern classifier. The basic function of each node can be described by a set operator OPN = {OPN0, OPN1, OPG, OPN2} [see Appendix, Section D2)]. We include a brief discussion here of neural network group-based nodes.

A. Neural Network Group-Based Nodes

Inputs of Node If(Ni,j) and Id(Ni,j) (Fig. 2): The inputs to each node comprise a fire input, If(Ni,j), and a data input, Id(Ni,j). The fire input connects to the output of the parent node, and is a binary digit ("0" or "1"). The data input is the pattern data which is to be recognized (or trained). For recognition, the data input is a facial image (we used a 28 * 28 pixel matrix, each pixel represented by one of 256 grey scale levels). With testing, the data input Id(Ni,j) is the MI(i,j) pattern we are attempting to recognize. During training, the input data are the translated training data MIu(i,j) [see Appendix, Section D1)].

Node Operator 0 (OPN0): During testing, when the fire input becomes one, the node "fires" and the data input can be accepted into the node (otherwise the data input cannot be input into the node). During training, however, the data input can always be accepted into the node. We use OPN0 to describe this function (see Appendix). So in the case of either training or firing, the input data (for example, faces) is fed into the node and the neural-network group input I(Ni,j) becomes the input data Id(Ni,j).

Node Operator 1 (OPN1): All K neural networks (neural-network groups) are involved in training or testing using the neural-network group input I(Ni,j). After training, the best weights are found and fixed for each neural network (we use the MLP as the basic neural network). During training, each neural network is trained to cater for a special case. For example, neural network 1 (NN1) could be trained to recognize the central portion of facial image MI0(i,j), neural network 2 (NN2) the lower-left portion of facial image MI1(i,j), and so on. Thus after training, the K neural networks are able not only to recognize center faces, but also lower-left faces, and so on. Our model is thus able to solve the shift invariance problem very well. The trained K neural networks can then be used for testing. We use node operator 1 (OPN1) to describe this procedure. After OPN1, we obtain O(NN1), O(NN2), ..., O(NNk), the output from each of the K neural networks.

Group Operator (OPG): The K outputs of the neural networks, O(NN1), O(NN2), ..., O(NNk), are then combined using the "*" operator (where "*" corresponds to either AND or OR); we use the group operator OPG to represent this function. After applying OPG [Appendix, Section D2)], the neural-network group output O(Ni,j) is obtained. Now by using neural-network groups, as well as AND or OR group products, all the necessary conditions are satisfied from a group theory perspective [38].

Node Operator 2 (OPN2): The neural-network group output O(Ni,j) lies in the range 0...1. During testing, O(Ni,j) could be any real number between zero and one. We use Zeeman's [37] tolerance space definition [see Appendix, Sections C and D2)] to distinguish the test results, since tolerance space has a more general meaning than threshold [6]. We use OPN2 to perform this discrimination. Finally, it is possible to classify each pattern into several different classes. We use O0(Ni,j), O1(Ni,j), and/or O2(Ni,j) to represent recognition results. For example, for front face recognition, if the pattern belongs in tolerance ζ1, then O0(Ni,j) = 1, which means it is a front face. If the pattern belongs in tolerance ζ2, however, then O1(Ni,j) = 1, which means it is not a front face. The node output could be either binary, ternary, or higher, as appropriate. In the case of face recognition, we found a mixture of binary and ternary to be most suitable (real-world face recognition systems could well require higher order outputs).

Fig. 2. Neural-network group-based tree node (inputs Id(Ni,j) and If(Ni,j); outputs O0(Ni,j), O1(Ni,j), and O2(Ni,j)).

The features of "OR" and "AND" neural network groups are not only that they can approximate any continuous function (and to any degree of accuracy), but also that they are able to approximate any kind of multiple-peak piecewise continuous function with nonsmooth and noncontinuous point(s) (and again to any degree of accuracy). Furthermore, "OR" and "AND" neural-network groups are able to approximate any ... in Section I-B; Breiman's regression trees partition the input space using a sequence of binary splits into terminal (leaf) nodes, in which the predicted response is a constant value. Such regression trees exhibit performance comparable with standard linear regression.

B. "OR" and "AND" Neural-Network Group Features

The features of "OR" and "AND" neural-network groups can be derived in the following manner. Leshno [24] proved the following general result:

    A standard multilayer feedforward network with a locally bounded piecewise continuous activation function can approximate any continuous function to any degree of accuracy if and only if the network's activation function is not a polynomial.

Two deductions follow directly from this.

Deduction 1: Consider a neural network "OR" function group, in which each member is a standard multilayer feedforward neural network, with locally bounded, piecewise continuous (rather than polynomial) activation function and threshold. Each such group can approximate any kind of multiple-peak piecewise continuous function with nonsmooth and noncontinuous point(s), and to any degree of accuracy.

Deduction 2: Similarly, consider a neural network "AND" function group, in which each member is a standard multilayer feedforward neural network, with locally bounded, piecewise continuous (rather than polynomial) activation function and threshold. Each such group can approximate any kind of sole-peak piecewise continuous function with nonsmooth and noncontinuous point(s), and to any degree of accuracy.
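Putting the four node operators together, the behavior of a single group-based node can be sketched as follows. This is only an illustration under stated assumptions: the member "networks" are stand-in callables rather than trained MLP's, and realizing the "OR" group operator as a maximum over outputs in [0, 1] (and "AND" as a minimum) is our assumption, not a detail given in the paper.

```python
# A minimal sketch of one GAT tree node (all names, and the max/min
# realization of the "OR"/"AND" group operators, are assumptions).

def gat_node(fire, data, members, combine="OR", tolerance=(0.9, 1.0)):
    # OPN0: during testing, the fire input gates the data input.
    if not fire:
        return None
    # OPN1: every member network of the group sees the same group input.
    outputs = [member(data) for member in members]
    # OPG: combine member outputs; "OR" is taken here as max, "AND" as min.
    group_out = max(outputs) if combine == "OR" else min(outputs)
    # OPN2: a tolerance test (rather than a bare threshold) labels the result.
    low, high = tolerance
    return low <= group_out <= high

# Stand-ins for three member networks tuned to center, left-shifted, and
# right-shifted versions of the same pattern (peak response 1.0 at the shift).
members = [lambda x: 1.0 - abs(x),
           lambda x: 1.0 - abs(x + 0.2),
           lambda x: 1.0 - abs(x - 0.2)]

print(gat_node(True, 0.2, members, combine="OR"))   # True
```

The "OR" combination accepts the shifted input because the member tuned to that shift responds maximally, even though the other members do not; an "AND" combination over the same members would reject it. This mirrors the shift-invariance argument made for OPN1 above.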
Fig. 3 shows the features of neural network groups in one dimension. Fig. 3(a) shows an "OR" group of neural networks capable of approximating a multipeak function with nonsmooth point (x1) and noncontinuous point (x2). Fig. 3(b), by contrast, shows an "AND" group of neural networks for approximating a sole-peak function with nonsmooth point (x1) and noncontinuous point (x2). A single neural network is incapable of approximating either multiple-peak or sole-peak functions containing nonsmooth and noncontinuous points. This explains why neural-network groups exhibit more features than single neural networks.

Fig. 3. The features of "OR" and "AND" neural network groups: (a) a multiple-peak function with nonsmooth point x1 and noncontinuous point x2; (b) a sole-peak function comprising regions "part 1," "part 2," and "part 3."

C. Face Recognition Application

For shift-invariant front face recognition, we use the center front face as one training case, and left- and right-shifted (by two pixels) versions of this same face as the other two training cases. The center, left- and right-shifted front faces constitute the input data to the neural network; the neural network output is the recognition function. Following training, we observe three peaks, as indicated in Fig. 3(a): peak 1 corresponds to the recognition function for left-shifted front faces, peak 2 to center front faces, and peak 3 to right-shifted front faces. For our present purposes, Fig. 3 demonstrates that piecewise continuous functions with nonsmooth and noncontinuous point(s) exist in the real world. The point(s) of intersection between peaks were always found to be nonsmooth, and occasionally noncontinuous, in our experiments. No single neural network is capable of approximating such a function comprising three peaks and nonsmooth, noncontinuous points.

Now whenever we recognize a target face in the real world, it could be with or without a beard, and with or without glasses. The output of the neural network is the target face function, which we found in this study to be always sole-peak, but with nonsmooth and noncontinuous point(s). Let us now consider Fig. 3(b). "Part 1" represents the target function for faces without glasses or beards, "part 2" the function with glasses, whereas "part 3" corresponds to the function with beards. No single neural network is able to approximate the sole-peak function with nonsmooth and noncontinuous points of Fig. 3(b).

For face recognition, especially under real world conditions, we need to be able to approximate multiple-peak or sole-peak functions with nonsmooth and noncontinuous points. In our experiments on front face recognition, for example, the output from the neural network was a multiple-peak function, which on occasions included nonsmooth and/or noncontinuous point(s). Thus by using the "OR" neural-network group to approximate the front face function, a much better classification accuracy resulted compared with that obtained using a single neural network. The output from the neural network for target face recognition is always sole-peaked, and sometimes includes nonsmooth and/or noncontinuous point(s). Accordingly, if the "AND" neural-network group is used for approximation, better accuracy is once again obtained, compared with a single neural network.

In this study, both "OR" and "AND" neural-network groups were used as the nodes for the GAT tree (see Fig. 2), resulting in more accurate and efficient face recognition.

IV. GAT TREE MODEL

The basic GAT tree model of Fig. 4 comprises both binary and ternary trees. Adaptive connections and adaptively growing trees, the nodes of which are themselves neural networks, were developed during the present study for translation-invariant face recognition. We can describe adaptive connection and growth in terms of tolerance space theory [37]. Fig. 4 illustrates adaptive growth within a GAT tree. We use the adaptive operator (OPA) to represent such growth.

Because the output O(Nl,k) of node Nl,k is within tolerance ζ3(Nl,k), node Nm,n is added and fired. In such a manner the GAT tree "grows" a node. The adaptively growing tree is therefore very useful for adding new faces which need to be recognized. Also shown in Fig. 4 are adaptive connections within the GAT tree. Because output O(Ni,j) of node Ni,j is within tolerance ζ3(Ni,j), node Nu,v is added and connected to node Ni,j. One output of Nu,v is connected to an output of node Ns,t. Such an adaptively connected GAT tree is very efficient for recognizing topologically deformed faces.

A. Translation Invariant Face Recognition

When a face is shifted or rotated, face recognition becomes considerably more difficult. To solve this, a translation-invariant face recognition technique was developed. The basic idea is to include all shifted and rotated faces in two dimensions as training examples for the neural-network node (we use the operator OPT to represent this). Thus after training, the ...
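The shifted-copy training data described in Sections III-C and IV-A (a center face plus versions shifted two pixels left and right) can be generated as in the following sketch. The zero-padding of vacated edge columns is our assumption; the paper does not specify how those pixels are filled.

```python
# Sketch of shift augmentation for a 2-D grey-scale image stored as a list
# of rows (assumed representation; edge padding with 0 is also assumed).

def shift_columns(image, offset):
    """Shift an image horizontally by `offset` pixels (positive = right)."""
    width = len(image[0])
    shifted = []
    for row in image:
        if offset > 0:                      # shift right, pad on the left
            shifted.append([0] * offset + row[:width - offset])
        elif offset < 0:                    # shift left, pad on the right
            shifted.append(row[-offset:] + [0] * (-offset))
        else:
            shifted.append(list(row))
    return shifted

def training_cases(face, pixels=2):
    # Center face plus left- and right-shifted copies, as in Section III-C.
    return [shift_columns(face, -pixels), face, shift_columns(face, pixels)]

face = [[c for c in range(28)] for _ in range(28)]   # toy 28 * 28 "image"
left, center, right = training_cases(face)
```

Training one member network on each of the three cases is what produces the three-peak recognition function of Fig. 3(a).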
Fig. 7. Front glasses and beard face classification using GAT tree.

VI. RECOGNITION OF FRONT FACES WITH GLASSES AND/OR BEARDS USING GAT TREE

The four-level GAT tree structure shown in Fig. 7 was used to recognize front faces containing glasses and beards. The function of the GAT tree node at level 0 is to recognize front faces. If it is a front face, an adaptive GAT node is used to grow the tree at level 1. In level 2, two GAT nodes are used to recognize glasses and bearded faces, respectively. In level 3, nodes are used to label different faces.

To recognize front beard faces, six faces were chosen to train each GAT node (three front beard faces, three nonbeard faces), with 70 faces being reserved for testing. The output of the GAT tree node converged after 500 iterations. The network was then tested using faces not previously seen; these test results are summarized in Fig. 8. The outputs of the artificial neural network group are all greater than 0.92 for the four front beard face test cases. For the remaining 66 people (not front beard faces), the outputs of the GAT tree node N2,1 are all less than 0.92.

To recognize front faces with glasses, five faces were chosen for training GAT nodes; two were front glasses faces, three were not, and 66 faces were reserved for testing. The output of the GAT tree nodes converged after 300 iterations. Once trained, the network was tested with faces it had not previously met. The outputs of the artificial neural network group were all more than 0.85 for the three front glasses test faces. For all other 63 (nonfront glasses) faces, the outputs of the GAT tree node were all less than 0.3.

Comparative results using the histogram and variance data between the GAT tree and a general tree are presented in Fig. 8. Now since the smaller variance within the histogram could be interpreted as having better discrimination capability than what is actually the case, the Fisher criterion [12] is also incorporated here (the Fisher criterion is a measure of the separation between discriminant classes: the bigger the Fisher criterion, the better the separability).

Fig. 8(a) shows the comparative results for front beard face classification. The variances of the GAT tree are all smaller than the variances exhibited by the general tree. Thus the GAT tree is seen to have better discrimination capability than the general tree. Moreover, the GAT tree has a bigger Fisher criterion value, which means that it has better separability for front beard face classification.

Fig. 8(b) shows the comparative results for front glasses face classification. Once again, the variances of the GAT tree are all smaller than the variances of the general tree, indicating better discrimination ability. Likewise, the much bigger Fisher criterion value indicates that the GAT tree has better separability for front glasses face classification.

VII. FACE RECOGNITION USING GAT TREE

The GAT tree model used for front face recognition is shown in Fig. 9. This model is divided into four parts.

Face Perspective Recognition: One GAT tree node is used for face perspective recognition, especially for front faces (since this is the only information usually available for airport face recognition security systems). If the input face is a front face, it will revert to the basic GAT tree model. On the other hand, if it is not, the system will make a second attempt to capture a front face.
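The Fisher criterion values quoted for Fig. 8 can be reproduced directly from the tabulated class means and variances. We assume the standard two-class form (squared difference of the class means, divided by the sum of the class variances), since this form matches the published figures exactly:

```python
# Fisher criterion, assuming the standard two-class form; the inputs below
# are the Fig. 8(a) front-beard-face statistics from the paper.

def fisher_criterion(mean_a, mean_b, var_a, var_b):
    # Separation between two discriminant classes: bigger is better.
    return (mean_a - mean_b) ** 2 / (var_a + var_b)

gat_tree = fisher_criterion(0.9425, 0.400, 0.000169, 0.0361)
general_tree = fisher_criterion(0.9350, 0.455, 0.000225, 0.0545)
print(round(gat_tree, 2), round(general_tree, 2))   # 8.11 4.21
```

The same function applied to the front-glasses statistics of Fig. 8(b) yields 91.09 for the GAT tree and 6.55 for the general tree, again as tabulated.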
ZHANG AND FULCHER FACE RECOGNITION 563
Number of Face
TO
Front Face Rmxmtion Non Front Face Rewrnition
ai az aa a4 as ae a7 aa ae am 001 aw I
‘I1 112 0
1 oz f
Meanof Meanof Varianceof Varianceof
NonFront Fmnt NonFmnt Front fisher
BdFaces BeatI3Faces BeardFm BeardFaces Criterion
GATTree 0.400 0.9425 0.0361 0.000169 8.11
GeneralTree 0.455 0.9350 0.0545 0.000225 4.21
I I I I I I I
Number of Face
G6 0
0.1 02 oa OA 0.6 os 0,7 0.0 o s 0.9 0% 1
111 ‘I2 01 02 f
Mean of Mean of Variance of Variance of
NonFront Front NonFront Front Fisher
GlassesFaces GlassesFaces GlassesFaces GlassesFa~s Criterion
GATTree 10.173 0.903 I 10.00443 I0.00142 191.09
I
GenemITree 0.355 0.900 I 10.04370 I0.00167 I 6.55 Fig. 9. GAT tree for front face recognition.
@I
Fig. 8. Front beard face and front glasses face classification.

Basic GAT Tree Model: Levels zero through four of the basic GAT tree model are used to recognize front faces. Different faces are recognized at different label (leaf) nodes. To recognize faces, each artificial neural network node must be trained prior to testing. Each node of the GAT tree is a three-layer MLP neural network with the following configuration:

input layer: 28 * 28 neurons,
hidden layer: three neurons,
output layer: one neuron.

This training takes between several minutes and one hour; by comparison, recognition takes only around one second. Fig. 9 describes only the basic operation of the GAT tree. To recognize 1024 target faces, the GAT tree model requires only 12 levels! This means that if a 12-level basic GAT tree model is used, recognition of a specific person (1 of 1024 faces out of one million people) takes only about two seconds.

Adaptive Connections within GAT Tree: GAT trees have also proved useful in recognizing topologically deformed faces. For example, if the difference between a front smiling face and a front nonsmiling face is within tolerance, adaptive connections within the GAT tree are capable of recognizing both. In Fig. 9, adaptive nodes are fired if the artificial neural network output of a node is within tolerance, which results in that particular person being labeled. For example, node NNG4,6 has been identified as the adaptive connection node from which the tree is to be connected. Such a connection enables the system to recognize the same people (deformed within tolerance). In this manner, adaptive connections are catered for within the GAT tree.

Adaptive Growth of GAT Tree: Adaptive growth of a GAT tree is an efficient means of adding a new face which needs to be recognized. None of the weights of the basic GAT tree model need to be changed to recognize new people. All that needs to be done is to find the appropriate adaptive node and grow a small GAT tree from it. This is a significant advantage over conventional ANNs, which would need to be retrained using the new (expanded) training set! In Fig. 9, nodes NNG5,0 and NNG5,1 have been identified as the adaptive nodes from which the tree is to be grown. Such growth enables the system to recognize two new people. Thus adaptive growth is also possible within GAT trees.

Fig. 10 shows the results obtained using the GAT tree for face recognition. The total number of faces is 780 (10 different views of 78 people): 20 faces were chosen for training, 760 for testing, and eight designated as target faces. This simulation is similar to the real-world situation encountered at airports: many people pass through customs/airline check-in, but only a very small number of these are of interest to the relevant authorities. The GAT tree was found to make no mistakes under laboratory conditions in recognizing 80 facial images (10 different views of eight people) out of 780 different faces. The effect of random noise on recognition accuracy is also examined in Fig. 10. The results are as follows: when 10%
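The two-stage scheme described above can be sketched as a binary tree of small MLPs: each node is a 28*28 -> 3 -> 1 network, and thresholding its single output selects a child, so twelve levels suffice to separate over 1024 leaf labels. This is only an illustrative sketch with random (untrained) weights; the function names and the 0.5 threshold are my own assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_node(x, W1, b1, W2, b2):
    """One GAT tree node: a 28*28 -> 3 -> 1 MLP with sigmoid units."""
    h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))  # hidden layer: three neurons
    y = 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))  # output layer: one neuron
    return float(y[0])                         # value in (0, 1)

def route(x, nodes, depth=12):
    """Walk the tree: thresholding each node output at 0.5 picks the left
    or right child, so depth 12 separates up to 2**12 = 4096 leaves,
    comfortably covering 1024 target faces."""
    index = 0                                  # root of a heap-style array
    for _ in range(depth):
        y = mlp_node(x, *nodes[index])
        index = 2 * index + (1 if y >= 0.5 else 2)
    return index                               # leaf index -> face label

def random_node():
    return (rng.normal(size=(3, 784)) * 0.01, np.zeros(3),
            rng.normal(size=(1, 3)) * 0.01, np.zeros(1))

nodes = [random_node() for _ in range(2 ** 13)]  # enough slots for 12 levels
face = rng.random(784)                           # stand-in for a 28*28 image
leaf = route(face, nodes)
```

Routing visits one node per level, so recognition cost grows with tree depth (12 forward passes here) rather than with the number of stored faces.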
564 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 7, NO. 3, MAY 1996
[Fig. 10 plot: recognition accuracy (%) versus Gaussian noise, comparing the GAT tree and the general tree.]
Fig. 10. Noise versus face recognition accuracy. Image size: 28*28; #Training Faces: 20; 8 target faces in 780-face database.

random number noise is added, the accuracy of the GAT tree is 5% higher than for the general tree. The effect of Gaussian noise on recognition accuracy is likewise demonstrated in Fig. 10, from which we see that when the gamma value of the Gaussian noise exceeds 0.3, the accuracy of the GAT tree is around 7% higher than for a general tree. These results show that the GAT tree is more noise tolerant than general trees.

VIII. CONCLUSION

This paper has presented the artificial neural network group-based model, the GAT tree. We have shown how it can be applied to a complex real-world problem, namely translation-invariant face recognition (as would be encountered in an airport security system).

The results of the GAT tree for real-time face perception classification, distinguishing between front glasses faces and faces with beards under laboratory conditions, have been presented. Addition of new target faces does not require retraining of the network. Moreover, the GAT tree model is eminently suited to large face databases.

We have demonstrated that the GAT tree is one kind of neural network group-based model which not only offers a means whereby we can describe very complex systems, but also opens up an entirely new avenue for neural-network research.

APPENDIX

The following definitions are needed to describe the GAT tree model.

A. Image Definitions:

Let

Ni = 256 (i = 1, 2, 3)

and

GG = {G1, G2, G3};

then I(i, j) is the color image, which can be described as a color human face.

Let

i = 1 for Gi

and

GG = G1,

with digital image operator MI: Zr x Zc; then MI(i, j) is a black-and-white image which can be used to represent a face.

Label set: L = {1, 2, ..., M}, where each label corresponds to a different human face.

B. Neural-Network Definitions:

Ni,j: neural network group-based (NNGB) node, adaptive node, and label node, where i is the level (depth) of the GAT tree and j is the jth node in the ith level.

If(Ni,j): the fire input of node Ni,j; if If(Ni,j) = 1, the node has been fired.

Id(Ni,j): the data input of node Ni,j; if If(Ni,j) = 1, Id(Ni,j) can be input to node Ni,j.

I(Ni,j): the input to NNGB node Ni,j.
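Read literally, the fire-input/data-input pair defined above acts as a gate: a node receives its data input only once it has been fired. A minimal sketch of that reading (the function and variable names are mine):

```python
import numpy as np

def node_input(fire_input, data_input):
    """Testing-phase gating from the definitions above: the data input
    Id(Ni,j) reaches node Ni,j only when its fire input If(Ni,j) is 1;
    otherwise the node sees the empty input (phi) and stays inactive."""
    return data_input if fire_input == 1 else None  # None stands for phi

image = np.zeros((28, 28))  # stand-in for the black-and-white image MI(i, j)
```

This gating is what keeps recognition cheap: only the nodes along one root-to-leaf path ever receive data.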
ZHANG AND FULCHER FACE RECOGNITION 565
If x ∈ X, then (x, x) ∈ ξ.

If x ∈ X, y ∈ Y, and (x, y) ∈ ξ, then (y, x) ∈ ξ.

If x ∈ X, y ∈ Y, z ∈ Z and (x, y) ∈ ξ, (y, z) ∈ ξ, then (x, z) ∈ ξ or (x, z) ∉ ξ.

(That is, the tolerance relation ξ is reflexive and symmetric, but need not be transitive.)

D. GAT Tree Operator GAT: Considering the above definitions, the GAT tree model can be written as

GAT tree operator GAT: MI(i, j) → L.

This means that after MI(i, j) has been operated upon by the GAT operator, which incorporates an adaptive function and uses the translation-invariant face recognition technique, an object (human face) can be recognized by label set L.

The GAT tree operator GAT is the operator set

GAT = {OPT, OPN, OPP, OPL, OPA}.

Each operator or operator set belongs to one of the following four types.

1) Translating Operator OPT: The translating operator OPT uses the translation-invariant face recognition technique and is defined as

OPT(MI(i, j)): MI(i, j) → MIu(i, j), u = 0, 1, 2, ..., Nu

where

MI0(i, j): center face of facial image MI(i, j)
MI1(i, j): lower-left face of facial image MI(i, j)
MI2(i, j): lower face of facial image MI(i, j)

OPN0(If(Ni,j), Id(Ni,j)):

I(Ni,j) = Id(Ni,j) = MI(i, j),   if If(Ni,j) = 1, for testing
I(Ni,j) = φ,                     if If(Ni,j) = 0, for testing
I(Ni,j) = Id(Ni,j) = MIu(i, j),  u = 1, 2, 3, ..., Nu, for training

OPN1(Ni,j, Id(Ni,j)): I(Ni,j) → O(NNk), k = 1, 2, ..., K

where we have the matrix shown at the bottom of the page, and where OPN1 is one kind of neural-network operator (for example, an MLP). OPG is used for all neural networks in the group:

OPG: O(Ni,j) = O(NN1) * O(NN2) * O(NN3) * ... * O(NNk)

where * means AND or OR.

For the binary case, the output is

OPN2(O(Ni,j)): O(Ni,j) → {O0(Ni,j), O2(Ni,j)}.

For the ternary case, the output is

OPN2(O(Ni,j)): O(Ni,j) → {O0(Ni,j), O1(Ni,j), O2(Ni,j)}.
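A hedged sketch of how these operators might fit together: OPT extracts shifted windows of a larger frame, OPN2 quantizes a node's continuous output into the symbols O0/O1/O2 (here read as not fired / within tolerance, i.e., adaptive / fired), and OPG combines the group's outputs with AND or OR. The window offsets, thresholds, and the O0/O1/O2 reading are my assumptions, not taken from the paper:

```python
import numpy as np

def opt_translate(frame, size=28):
    """OPT sketch: center, lower-left, and lower windows (MI0, MI1, MI2)
    of a larger frame, giving translation-invariant candidate faces."""
    r0 = (frame.shape[0] - size) // 2
    c0 = (frame.shape[1] - size) // 2
    return [frame[r0:r0 + size, c0:c0 + size],  # MI0: center face
            frame[-size:, :size],                # MI1: lower-left face
            frame[-size:, c0:c0 + size]]         # MI2: lower face

def opn2_ternary(y, low=0.4, high=0.6):
    """Ternary OPN2 sketch: quantize the output O(Ni,j) into O0 (not
    fired), O1 (within tolerance: adaptive connection), or O2 (fired)."""
    if y < low:
        return "O0"
    if y <= high:
        return "O1"
    return "O2"

def opg(outputs, mode="AND"):
    """OPG sketch: O(Ni,j) = O(NN1) * O(NN2) * ... * O(NNk),
    where * is AND or OR over the group's fired decisions."""
    fired = [o == "O2" for o in outputs]
    return all(fired) if mode == "AND" else any(fired)
```

Under this reading, the middle band of opn2_ternary is what triggers an adaptive connection: two images of the same person that land in it are treated as within tolerance of each other.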
[31] T. D. Sanger, "A tree-structured adaptive network for function approximation in high-dimensional spaces," IEEE Trans. Neural Networks, vol. 2, pp. 285-293, 1991.
[32] A. Sankar and R. J. Mammone, "Growing and pruning neural tree networks," IEEE Trans. Comput., vol. 42, pp. 291-299, 1993.
[33] R. Schapire, "The strength of weak learnability," Machine Learning, vol. 5, no. 2, pp. 197-227, 1990.
[34] S. Starkey et al., "Facial recognition for police purposes using computer graphics and neural networks," in Proc. Electron. Division Colloquium on Electron. Images and Image Processing in Security and Forensic Sci., London, England, 1990.
[35] L. Valiant, "A theory of the learnable," Commun. ACM, vol. 27, no. 11, pp. 1134-1142, 1984.
[36] K.-D. Wernecke, "A coupling procedure for the discrimination of mixed data," Biometrics, vol. 48, pp. 497-506, 1992.
[37] E. C. Zeeman, "The topology of the brain and visual perception," in Topology of 3-Manifolds and Related Topics, M. K. Fort, Jr., Ed. Englewood Cliffs, NJ: Prentice-Hall, 1962, pp. 240-256.
[38] M. Zhang, J. Crowley, E. Dunstone, and J. Fulcher, "Face recognition," Australian Patent PM1828, Oct. 14, 1993.

John Fulcher (M'79) received the M.Sc. degree from LaTrobe University, Melbourne, Australia, in 1981.
He is currently a Senior Lecturer in Computer Science at the University of Wollongong, Australia. He has authored several articles on artificial neural networks, most recently three chapters of the Handbook of Neural Computing, to be published by the Institute of Physics/Oxford University Press. He also presented a paper, "Neural Network Alternatives," at the Sixth World Conference on Computers in Education, held in Birmingham, U.K. His current research interests include the application of ANN techniques to financial forecasting and the automatic classification of ionograms.
Mr. Fulcher also served as Guest Editor for the recent special issue of Computer Standards and Interfaces on artificial neural networks.