Face Detection in Images:
Neural Networks & Support Vector Machines
April 2002
ABSTRACT
Over the years, one of the many problems addressed by the
computer-vision community has been face detection and recognition in
images. Such a system has numerous applications, ranging from automated
security systems to census and intelligence applications. In this report, we
present our experience with two of the most successful techniques available
today ([rowley98], [cvpr97face]) and extensions of this work to other
interesting applications.
1 TABLE OF CONTENTS
2 Introduction
3 Generic Approach
3.1 The sliding window
3.2 Image pre-processing
3.3 Bootstrapping
3.4 Training Set description
4 The Neural Network Technique
4.1 Network Structure
4.2 Results
4.3 Other species
4.4 Different Network Architectures
4.4.1 Fully connected network
4.4.2 Two outputs
5 Support Vector Machines
5.1 Introduction to SVMs
5.2 SVM learning parameters
5.3 Results of training
6 Implementation Details
7 Neural Nets and SVMs – A comparison
8 Further Directions
9 References
10 Resources
Table of Figures
Figure 1 - Image pre-processing
Figure 2 - Constructing 20x20 training image from original
Figure 3 - Basic structure of neural network (taken from [rowley98])
Figure 4 - Results of neural network on pictures taken by us
Figure 5 - Results of neural network on "standard" pictures
Figure 6 - Results on a "fully-connected" network
Asim Shankar, Priyendra Singh Deshwal Face Detection in Images
IIT Kanpur April 2002
2 Introduction
3 Generic Approach
Thus, the system slides this 20x20 window across the image. For the
classifier to correctly detect a face, the face must fit into the window and
occupy all of it, i.e., it must be neither larger nor smaller than the window.
To expect that this will always happen is of course absurd, so to compensate
we repeatedly scale down the image by a constant factor and then slide the
20x20 window over the smaller image.
With this we are able to detect faces that may be larger than the window
in the original image.
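
The scan described above can be sketched as follows. This is a minimal illustration, not the code used for this report: the scale factor of 1.2, the 2-pixel step and the function names are assumptions, since the text only says that the image is scaled down "by a constant factor".

```python
def pyramid_windows(width, height, window=20, scale=1.2, step=2):
    """Yield (level, x, y, factor) for every window position at every scale.

    (x, y) is the top-left corner of the window in the scaled-down image;
    multiplying by `factor` maps it back to original-image coordinates.
    The values of `scale` and `step` are illustrative assumptions.
    """
    level, factor = 0, 1.0
    w, h = width, height
    # Keep shrinking until the image is smaller than the window itself.
    while w >= window and h >= window:
        for y in range(0, h - window + 1, step):
            for x in range(0, w - window + 1, step):
                yield level, x, y, factor
        level += 1
        factor *= scale
        w, h = int(width / factor), int(height / factor)


def to_original(x, y, factor, window=20):
    """Map a window found in a scaled image to (x, y, side) in the original."""
    return (round(x * factor), round(y * factor), round(window * factor))
```

A window detected at a scale-down factor of 2 therefore corresponds to a 40x40 face in the original image, which is how faces larger than the window are found.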
Face images vary a great deal: diversity in race, color, gender etc.
produces wide variation in face pictures. Add to that the differences
between images taken under different lighting conditions, with different
equipment etc., and the classifier can get completely confused, its
decision being influenced by such factors.
These steps are applied to each 20x20 window and not the image as a
whole.
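
The implementation section names histogram equalization as one of the image-processing operations used. A minimal pure-Python sketch of equalizing a single window is given below; it assumes 8-bit gray values stored as a flat list and is not the IPL routine used in our application.

```python
def equalize_histogram(pixels, levels=256):
    """Histogram-equalize a flat list of integer gray values in [0, levels).

    Assumes the list holds the pixels of one window (e.g. 400 values for
    a 20x20 window); the statistics are computed on that window alone.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution of gray values within this window.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        # Constant window: there is no contrast to stretch.
        return list(pixels)
    return [round((cdf[p] - cdf_min) * (levels - 1) / (total - cdf_min))
            for p in pixels]
```

Because the mapping is computed per window, a dark face and a bright face end up using the same range of gray values before they reach the classifier.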
3.3 Bootstrapping
In each image to be placed in the training set, the eyes, the nose, and the
left, right and center of the mouth were marked. With these markings, the
face was transformed into a 20x20 window with the marked features at
predetermined positions [ELABORATE].
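
One way to compute such a transformation is a least-squares fit of rotation, uniform scale and translation between the marked features and their target positions. The sketch below is an assumption about how this could be done, not a description of our application; the function names and any target coordinates passed in are hypothetical.

```python
def fit_similarity(src, dst):
    """Least-squares rotation + scale + translation mapping src onto dst.

    Points are (x, y) tuples. Each point is treated as a complex number,
    so the transform is w = a * z + b with complex a (rotation and scale)
    and complex b (translation).
    """
    z = [complex(*p) for p in src]
    w = [complex(*p) for p in dst]
    zm = sum(z) / len(z)
    wm = sum(w) / len(w)
    # Closed-form minimizer of sum |a * (z - zm) - (w - wm)|^2.
    num = sum((zi - zm).conjugate() * (wi - wm) for zi, wi in zip(z, w))
    den = sum(abs(zi - zm) ** 2 for zi in z)
    a = num / den
    b = wm - a * zm
    return a, b


def apply_similarity(a, b, point):
    """Apply the fitted transform to one (x, y) point."""
    w = a * complex(*point) + b
    return (w.real, w.imag)
```

Given the marked features in a photograph as `src` and the chosen positions inside the 20x20 window as `dst`, every pixel of the window can then be sampled from the photograph through the inverse of this transform.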
Initially, for negative samples, random images were created and added to
the training set. The training set was subsequently enhanced with
bootstrapping of scenery and false-detected images.
The last used training set (including bootstrapping) had 8982 input
vectors.
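
The bootstrapping loop can be sketched as below. The `train` and `classify` callables, the number of rounds and the function name are placeholders introduced for illustration; they stand for whichever training engine (neural network or SVM) is in use.

```python
def bootstrap_negatives(train, classify, positives, negatives,
                        face_free_windows, rounds=3):
    """Grow the negative set with windows the current model wrongly accepts.

    train(positives, negatives) -> model
    classify(model, window)     -> True if the model calls the window a face
    face_free_windows           -> windows cut from images with no faces
    """
    negatives = list(negatives)
    model = train(positives, negatives)
    for _ in range(rounds):
        # Every detection in a face-free image is by definition a false one.
        false_alarms = [w for w in face_free_windows
                        if w not in negatives and classify(model, w)]
        if not false_alarms:
            break
        negatives.extend(false_alarms)
        model = train(positives, negatives)
    return model, negatives
```

The point of the loop is that the negative samples are chosen by the classifier's own mistakes, which is far more economical than trying to enumerate every possible non-face image in advance.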
No hidden neuron is connected to all of the input neurons. The hidden
neuron connections are as follows:
• The input image is divided into a 2x2 grid. 4 of the hidden neurons
take input from only one cell of this grid each.
• The input image is divided into a 4x4 grid. 16 of the hidden neurons
take input from only one cell of this grid each. This division into grids
should help in detecting local features (eyes, nose) important for
face detection.
• The input image is divided into 6 horizontal stripes (each of height
5 pixels, thus there is some overlap between stripes). This should aid
in the detection of features such as a pair of eyes or the mouth.
The idea is that the hidden neurons taking square (grid) inputs would
detect individual features while the horizontal stripes would detect pairs of
eyes and the mouth.
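
The connection pattern above can be written down as a list of rectangular receptive fields over the 20x20 input. In this sketch the evenly spaced stripe offsets (rows 0, 3, 6, 9, 12, 15) and the choice of one hidden neuron per stripe are assumptions; the text only fixes the number of stripes and their height.

```python
SIZE = 20  # side of the input window in pixels


def receptive_fields(size=SIZE):
    """Return (row0, row1, col0, col1) boxes, one per hidden neuron.

    Row and column ranges are half-open, i.e. rows row0 .. row1 - 1.
    """
    fields = []
    # 2x2 grid (10x10 cells) followed by 4x4 grid (5x5 cells).
    for cells in (2, 4):
        side = size // cells
        for r in range(cells):
            for c in range(cells):
                fields.append((r * side, (r + 1) * side,
                               c * side, (c + 1) * side))
    # 6 full-width stripes of height 5; spacing is an assumption.
    stripe_h, n_stripes = 5, 6
    for i in range(n_stripes):
        top = round(i * (size - stripe_h) / (n_stripes - 1))
        fields.append((top, top + stripe_h, 0, size))
    return fields


def connection_count(fields):
    """Number of input-to-hidden weights implied by the receptive fields."""
    return sum((r1 - r0) * (c1 - c0) for r0, r1, c0, c1 in fields)
```

Under these assumptions the layout needs 1400 input-to-hidden weights, against 10400 if the same 26 hidden neurons were each connected to all 400 inputs.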
4.2 Results
Here we tried some images of animal faces etc. to see if the network
learnt to recognize faces in general (two eyes, a nose and a mouth) or
was able to detect something unique about human faces. Do note that
none of these animal faces were in the training set. We obtained some
interesting results:
The networks above, with only one output, gave a few false detections and
on rare occasions missed a face. A common strategy used in many
neural-network based classifiers is a two-output system. Some believe that
neural networks work better with sparse input/output schemes. We thus
tried a two-output system, where the first output gives a measure of
how likely the given image is to be a face, while the second output gives a
measure of how likely the given image is to not be a face.
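
A two-output scheme of this kind needs a target encoding for training and a rule for combining the two outputs at detection time. The encoding and the comparison rule below are one plausible choice, shown only as an assumption; the `margin` parameter is hypothetical.

```python
# Assumed training targets: (face output, non-face output).
FACE_TARGET = (1.0, 0.0)
NONFACE_TARGET = (0.0, 1.0)


def is_face(outputs, margin=0.0):
    """Accept a window when the face output beats the non-face output.

    `outputs` is the (face, non-face) pair produced by the network.
    A larger `margin` trades missed faces for fewer false detections.
    """
    face_score, nonface_score = outputs
    return face_score - nonface_score > margin
```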
The training set used was exactly the same as that used for the neural
network, i.e., 8982 input vectors.
Consider data points of the form {(xi, yi)}, i = 1..N, where each xi is a
point in an n-dimensional space and yi names one of two classes. We wish to
determine to which of the two classes any given point in that space
belongs. If the two classes are linearly separable, we need to determine
a hyper-plane that separates them in space. However, if the classes are
not cleanly separable, then our objective is to minimize the
generalization error. Intuitively, a good choice is the hyper-plane that
leaves the maximum margin between the two classes (margin being
defined as the sum of the distances of the hyper-plane from the closest
points of the two classes), and minimizes the misclassification errors.
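
For reference, the usual textbook way of writing this trade-off is the soft-margin problem below. The symbols w, b, C and the slack variables ξ_i are standard notation and are not taken from this report; labels are assumed to be y_i ∈ {−1, +1}.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{N}\xi_i \\
\text{subject to}\quad & y_i\,(w\cdot x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1,\dots,N
\end{aligned}
```

With this scaling the margin defined above equals 2/‖w‖, so minimizing ‖w‖ maximizes the margin, while the term weighted by C penalizes points that fall on the wrong side of it.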
Figure 7 - QP eqn. whose solutions are the support vectors (from [cvpr97face])
It turns out that only a small number of coefficients are different from
zero, and since every coefficient corresponds to a particular data point,
this means that the solution is determined by the data points associated
with the non-zero coefficients. These are the support vectors, the only
ones which are relevant to the solution of the problem, and thus all other
data points can be deleted from the data set without affecting the
solution. Intuitively, support vectors are the data points lying on the
border between the two classes.
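
The claim that the other data points can be deleted is easy to check on the decision function f(x) = Σ αi yi K(xi, x) + b, the standard form of an SVM classifier. The sketch below uses a linear kernel and hand-picked coefficients for a toy problem; the function names are illustrative and do not come from any SVM package.

```python
def linear_kernel(u, v):
    """Plain dot product; a stand-in for whichever kernel is chosen."""
    return sum(a * b for a, b in zip(u, v))


def decision_value(x, points, labels, alphas, bias, kernel=linear_kernel):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + bias; its sign is the class."""
    return sum(a * y * kernel(p, x)
               for p, y, a in zip(points, labels, alphas)) + bias


def keep_support_vectors(points, labels, alphas, tol=1e-12):
    """Drop every training point whose coefficient is (numerically) zero."""
    idx = [i for i, a in enumerate(alphas) if a > tol]
    return ([points[i] for i in idx],
            [labels[i] for i in idx],
            [alphas[i] for i in idx])
```

For the one-dimensional training set {(1, +1), (−1, −1), (3, +1)} the maximum-margin solution has coefficients (0.5, 0.5, 0) and zero bias: the point at 3 lies well inside its class, receives a zero coefficient, and can be dropped without changing f(x) anywhere.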
Figure 8 - Separating hyperplanes: (a) small margin, (b) larger margin, better classifier
(taken from [cvpr97face])
Interestingly, the learning time for the SVM algorithm was significantly
shorter than that for the neural network. On the ~9000 image training
set, the SVM algorithm produced the model in approximately 15 minutes,
while backpropagation training of the neural network took 1 hour.
6 Implementation Details
Intel’s Image Processing Library (IPL) was used for image processing and
manipulation (histogram equalization, window extraction, scaling etc.).
Input vectors were then created from the scaled, processed windows.
The application also assists in the creation of the training set by allowing
features (eyes, nose, mouth) to be labeled, transforming the face based
on the selected features to a 20x20 window, rotating the image randomly,
pre-processing the image and then writing to a training set file.
All training engines are both Linux and Windows compatible. The GUI is
currently written for Windows systems.
The code written is free for use, in the hope that it will save a
significant amount of time for anyone trying to build on this work. Please
feel free to contact the authors for these applications.
No (0.020) No (-2.9)
No (0.001) No (-3.9)
No (0.00001) No (-5.6)
No (0.00002) No (-4.5)
No (0.040) No (-2.3)
8 Further Directions
The face detection problem has many applications in the field of security
systems, automated census, intelligence systems etc. Of particular
interest to us, however, is the field of video summarization. The idea is
that, given a video sequence, we first identify the faces in the frames and
then use the identified faces for motion-tracking and face-recognition.
With this, we may be able to textually comment on the movement of
persons across a scene.
Testing out the feasibility and performance of such a system would be the
next logical step to take after the
Furthermore, the detection scheme described in this report deals only with
full-frontal facial images; profile views and occluded faces are not
handled. Profile views could be detected using the same technique,
possibly by using the eye, nose and ear positions to standardize
the training set and then applying the training schemes described above.
9 References
• [rowley98] – Neural network based Face Detection. Henry Rowley,
Shumeet Baluja, Takeo Kanade. CMU. IEEE Transactions on Pattern
Analysis and Machine Intelligence, volume 20, number 1, pages 23-38,
January 1998.
(http://www-2.cs.cmu.edu/afs/cs.cmu.edu/user/har/Web/faces.html)
10 Resources
• CMU image test set for face detection
http://vasc.ri.cmu.edu/IUS/eyes_usr17/har/har1/usr0/har/faces/test/