
8th Seminar on Neural Network Applications in Electrical Engineering, NEUREL-2006, Faculty of Electrical Engineering, University of Belgrade, Serbia, September 25-27, 2006. http://neurel.etf.bg.ac.yu, http://www.ewh.ieee.org/reg/8/conferences.html


FACE DETECTION APPROACH IN NEURAL NETWORK BASED METHOD FOR VIDEO SURVEILLANCE
Zoran Bojkovic, Senior Member IEEE, Andreja Samcovic
University of Belgrade, Faculty for Traffic and Transport Engineering, Serbia
e-mail: zsbojkovic@yahoo.com
Abstract: Neural networks are adaptive information processing systems that offer attractive solutions for video surveillance. This application aims at identifying particular patterns. Also, the MPEG-4 standard profiling strategy in facial animation guarantees that the standard can provide adequate solutions for video surveillance. The main goal of this presentation is to provide face detection for video surveillance using a neural network based method. After providing the corresponding architecture for face detection, the emphasis is on the detector, which is trained with multilayer back-propagation neural networks. Three different face representations are taken into account: pixel representation, partial profile representation and eigenface representation. Based on these, three independent sub-detectors are generated and their detection rates are measured. The neural network achieves its optimal performance at a detection rate of about 94%.
Keywords: Face detection, facial structure, video surveillance

I. INTRODUCTION

There are two possible approaches to communication of talking-head video. The pixel-based approach renders the facial images and transmits the resulting images as arrays of pixels, whereas the model-based approach transmits the facial animation parameters (FAPs) that describe the facial motions and renders the images at the receiver. The model-based approach divides the task into geometric and articulation modeling. These are described by the MPEG-4 Synthetic and Natural Hybrid Coding (SNHC) group as the facial definition parameters (FDPs) and FAPs, respectively. The geometric model defines the polygonal mesh of the face and the associated skin texture, from which visually realistic facial images from different view angles can be synthesized [1]. The articulation model deals with the definition of static geometric models to generate various dynamic effects for intelligible reproduction of facial expressions [2]. FAPs are responsible for describing the movements of the face, either at low level or at high level. Here, low level means displacement of a specific single point of the face, and high level represents reproduction of a facial expression. In other words, the FAPs constitute the proper animation parameter stream. The FDPs are responsible for defining the appearance of the face.

In the past years, there has been a continuous increase of interest in the field of human face processing [3], [4], [5]. The research areas are: (a) video surveillance, based on the fact that faces provide an important cue to people's identity; (b) facial expression analysis, which provides a natural and intelligent computer interaction interface; (c) human faces in semantics-based video compression and coding. The first part of this work deals with the state of the art in face detection algorithms. After that, the neural network based face detection approach, together with the corresponding architecture, is analyzed. The training process method and results conclude the presentation.

II. FACE DETECTION ALGORITHMS: STATE OF THE ART

In the past decade, considerable effort has been spent on designing face detection methods. These methods range from simple edge-based algorithms to complex

high-level approaches using pattern recognition techniques. Generally speaking, they can be classified as knowledge-based classifiers and statistical learning-based classifiers. Knowledge-based classifiers use low-level image features like skin color, face geometry and facial feature distribution. These detection methods use semantic knowledge of human faces and at the same time are relatively simple to implement. However, these algorithms are not robust to large face variations. A multilayered network that learns the face / non-face patterns from numerous training samples is reported in [6]. The presented detector suffers from the fact that the execution speed of the algorithm is too low for real-time surveillance applications. The system has no prior knowledge about the most probable locations of faces. It must scan windows at all pixel positions and use arbitrary window sizes extracted from the input image. The most important factor in the execution time of the system is the number of small windows that the neural network has to process. To improve the efficiency of the neural network based method, a face detection approach that uses successive face detectors to progressively restrict the possible candidate face regions to smaller areas can be used. The successive face detectors approach is presented in Figure 1. It can be seen that three detectors (color-based, structure-based, learning-based) are cascaded. In that way, the outputs of a previous detector (potential facial regions) act as the inputs of the subsequent detector. The initial pruning of large non-face areas by the first detectors significantly reduces the number of input windows for the final neural network based detector and increases the processing speed of the detection process.

1-4244-0433-9/06/$20.00 C2006 IEEE.

Fig. 1. Block-scheme for the successive face detectors approach in the whole chain
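The cascade principle described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the three stage functions and the region dictionaries are hypothetical placeholders standing in for the color-based, structure-based and learning-based detectors.

```python
# Sketch of the successive face-detector cascade of Figure 1: each stage
# prunes candidate regions, so the expensive neural network detector at
# the end sees far fewer windows. All stage logic here is illustrative.

def color_stage(regions):
    # keep regions whose pixels look skin-like (color-based detector)
    return [r for r in regions if r.get("skin_like", False)]

def structure_stage(regions):
    # keep regions whose profiles suggest facial features (structure-based)
    return [r for r in regions if r.get("has_features", False)]

def neural_stage(regions):
    # final and slowest check: a trained neural network score (learning-based)
    return [r for r in regions if r.get("nn_score", 0.0) > 0.5]

def detect_faces(candidate_regions):
    """Run the cascade: the output of one detector feeds the next."""
    for stage in (color_stage, structure_stage, neural_stage):
        candidate_regions = stage(candidate_regions)
    return candidate_regions

regions = [
    {"skin_like": True, "has_features": True, "nn_score": 0.9},   # a face
    {"skin_like": True, "has_features": False, "nn_score": 0.7},  # skin, no face
    {"skin_like": False, "has_features": True, "nn_score": 0.8},  # background
]
print(len(detect_faces(regions)))  # only the first region survives all stages
```

The design point is that cheap stages run first: a region rejected by the color test never reaches the neural network at all.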
A biologically-motivated face detection system developed in [7] is used to segment the face from the rest of the image. The implementation of the system emphasizes the criteria described in developing the face detector. They are as follows:
* The algorithm must be robust enough to cope with the intrinsic variability in images.
* It must perform well in an unstructured environment.
* It should be amenable to real-time implementation and produce few or no false alarms.
A successful implementation of the face detection was performed using a retinally connected neural network architecture and was refined later to make it suitable for real-time applications. The average performance of the system is above 95% on face images with up to 30-35° deviation from the frontal face image. Searching for better and more suitable data projection methods has always been an integral objective of pattern recognition. Such a method enables one to observe and detect the underlying data distribution, patterns and structures. In addition, the high dimensionality of the data poses various challenges for learning algorithms [7]. For example, the presence of irrelevant and noisy information can mislead the learning algorithm. In higher dimensions, data may be sparse, making it difficult for an algorithm to find any structure in the data. The two main reasons for reducing the dimensionality of data are: (a) to allow the distribution of the data to be visualized, and (b) to reduce the size of the input space and find the intrinsic dimension of a signal. Note that data projection methods attempt to take data from a high-dimensional space and map it into a low-dimensional space with minimal error. A great deal of effort has been devoted to this subject.
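The projection idea above can be made concrete with principal component analysis, the technique underlying the eigenface representation used later in this paper. The data below is random and serves only to show the shapes; the window size and number of components are illustrative.

```python
import numpy as np

# Toy sketch of mapping data from a high-dimensional space into a
# low-dimensional one with minimal error, via PCA. 100 samples of
# flattened 25x25 windows (625 dimensions) are projected onto the
# top k principal components.

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 625))           # 100 samples, 625 dimensions

mean = X.mean(axis=0)
Xc = X - mean                                  # center the data
# principal directions from the SVD of the centered data matrix
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 10                                         # keep the first k components
Z = Xc @ Vt[:k].T                              # low-dimensional projection
X_hat = Z @ Vt[:k] + mean                      # reconstruction from k components

print(Z.shape)   # each sample is now described by k coefficients
```

Because the projection is onto the directions of largest variance, the reconstruction error from k components is never worse than approximating every sample by the mean alone.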

III. NEURAL NETWORK BASED FACE DETECTION APPROACH

In order to improve the efficiency of face detection for video surveillance, a neural network approach can be recommended. The detector is trained with multilayered back-propagation neural networks which take different face representations as input. Different inputs take into account different information patterns, giving the trained neural networks a broader sensitivity to certain image patterns. The weighted sum of the results from the networks should give a reliable judgment on the existence of face patterns.

For roughly locating facial regions, color information is an effective image feature [8]. The chrominance components (Cb, Cr) in the YCbCr color space form a condensed cluster. The elliptical model shown in Figure 2 is generated by fitting an ellipse over the cluster. Depending on whether an input pixel falls inside or outside of the elliptical region, we can eliminate large background areas that contain no significant skin-like colors. The first detector in a cascade of face detectors emphasizes speed more than accuracy, while the overall detection performance can be reinforced by the subsequent detectors. The principal steps in the skin-color detector comprise pixel-by-pixel color verification, skin-color segmentation, and a binary filter as a post-processor to smooth the segmentation results.

Fig. 2. Elliptical model of the skin-color distribution in the (Cb, Cr) plane.
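The pixel-level skin test can be sketched as a point-in-ellipse check in the (Cb, Cr) plane. The ellipse centre, semi-axes and tilt below are illustrative values only, not the fitted parameters from this paper or from [8].

```python
import math

# Hedged sketch of the elliptical skin-color test (Figure 2): a pixel is
# kept as skin-like if its (Cb, Cr) pair falls inside a tilted ellipse.
# All ellipse parameters here are assumed, for illustration.

CENTER_CB, CENTER_CR = 110.0, 150.0   # assumed ellipse centre
AXIS_A, AXIS_B = 25.0, 15.0           # assumed semi-axes
THETA = math.radians(-30.0)           # assumed tilt of the major axis

def is_skin_like(cb, cr):
    """Return True if the (Cb, Cr) pair falls inside the ellipse."""
    # rotate the point into the ellipse-aligned coordinate frame
    x = (cb - CENTER_CB) * math.cos(THETA) + (cr - CENTER_CR) * math.sin(THETA)
    y = -(cb - CENTER_CB) * math.sin(THETA) + (cr - CENTER_CR) * math.cos(THETA)
    return (x / AXIS_A) ** 2 + (y / AXIS_B) ** 2 <= 1.0

print(is_skin_like(110, 150))  # centre of the ellipse -> True
print(is_skin_like(10, 240))   # far from the skin cluster -> False
```

Running this test once per pixel, followed by a smoothing filter on the resulting binary mask, corresponds to the three steps of the skin-color detector described above.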

The structure-based detector focuses on the feature structure of the face, because facial features have been proven valuable for classifying human faces. On the other hand, facial feature appearance and distribution vary considerably among different people and under different imaging conditions. In order to select face areas present in the skin blobs acquired by the first detector, we can enumerate all small windows (for example, 25 by 25 pixels) extracted from each skin blob at every position and scale. For each window, we use a probability-based face structure detector which ranks the input window as a positive or negative face candidate. The positive candidates are then supplied to a neural network based detector to further evaluate the resemblance to a human face. Profile analysis acts as a first-step prune in the structure detector and also as a feature candidate selector. The algorithm starts with a gray-scale input window. The facial profile in the vertical direction is then generated from the window, from which the local maxima of the second-order derivatives are located as reference lines. The reference lines correspond to some comparatively "flat" and "light" areas in the face. After the vertical profile position of each candidate feature is located, we can apply a probability-based evaluation process to further verify the existence of prominent facial features.
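One plausible reading of the profile-analysis step can be sketched as follows. This is a toy reconstruction, not the paper's algorithm: the window is only 8x8 (the text suggests roughly 25x25), and the exact use of the second-derivative maxima is an assumption, since the original wording is ambiguous.

```python
# Sketch of profile analysis: compute the vertical (row-wise) intensity
# profile of a gray-scale window, then take local maxima of its discrete
# second derivative as candidate reference lines. In this toy example the
# peaks land on the darker rows, which would correspond to feature rows
# such as eyes or mouth between lighter, "flat" face areas.

def vertical_profile(window):
    """Mean intensity of each row of the window."""
    return [sum(row) / len(row) for row in window]

def second_derivative(profile):
    """Discrete second difference, defined on the interior points."""
    return [profile[i - 1] - 2 * profile[i] + profile[i + 1]
            for i in range(1, len(profile) - 1)]

def reference_lines(profile):
    """Row indices where the second derivative has a local maximum."""
    d2 = second_derivative(profile)
    return [i + 1 for i in range(1, len(d2) - 1)
            if d2[i] > d2[i - 1] and d2[i] > d2[i + 1]]

# bright rows with two darker bands, mimicking feature rows in a face window
window = [[200] * 8, [190] * 8, [60] * 8, [180] * 8,
          [185] * 8, [70] * 8, [190] * 8, [200] * 8]
profile = vertical_profile(window)
print(reference_lines(profile))
```

In the toy window the two dark bands (rows 2 and 5) are flagged, which is the kind of candidate list the subsequent probability-based evaluation would then verify.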
IV. NETWORK ARCHITECTURE FOR FACE DETECTION

The neural network gradually learns face characteristics from samples. The goal is to determine which face representation method describes a face optimally, and to differentiate faces from other non-face objects. We consider three face representations: pixel representation, partial profile representation and eigenface representation.

The pixel representation is probably the most commonly used input format for neural network based object detectors. It contains the complete information about the input window and can be seen as a lossless representation method. It captures all the variability among the face samples, but it is computationally expensive.

The partial profile representation looks at the profiles of three blocks, each featuring a salient facial feature region of the face; the partial profile block representation is given in Figure 3. The profile information contains the integral information about the pixel distribution and retains a certain invariability among faces.

The eigenface representation is very effective in facial coding and compression [9]. It is based on principal component decomposition and provides a compact way of representing an arbitrary face using only a few parameters. We select coefficients (for example, the first 80) in the eigenface space for the representation of faces.

Three independent sub-detectors are generated based on the three face representations. The network architecture for face detection is shown in Figure 4. It is composed of three parts, one for each of the described face representations. The weighted sum of the sub-network outputs gives an indication between 0 and 1 about the input window. An output of 1 indicates that the input window contains a face, while an output of 0 indicates that the input window contains no recognizable face pattern. Each sub-network is a fully connected three-layer back-propagation network with a fixed number of hidden units in the middle layer.

Fig. 3. Partial profile block representation (x- and y-profiles of blocks A, B and C).

V. TRAINING PROCESS

As for the network training, we propose that non-face training samples be added to the training set dynamically. In that way the network is trained to explicitly handle non-face data, i.e. false acceptances, at a given time during the training period. In [6], the face training set contains 12000 face images collected from various face databases and web photo galleries. These samples also include scaled versions of the same faces with scaling factors between 0.8 and 1.12. The whole training process can be divided into parts, each containing a variable number of iterations. This number is determined by the size of the non-face training set at that time. During each iteration, the network is trained with, for example, 100 randomly selected face samples and 100 non-face samples from each training set. At the end of each part, the network undergoes a test stage, in which it is supplied with new random non-face samples clipped from the non-face repository. The samples are drawn from the repository in a random way so that the correlation effect within one image is maximally reduced. When up to 250 false detections are collected, the test process pauses. These images are added to the non-face training set and the next part is started. As an example, Figure 5 shows the training process for the face detection rate in the case of the eigenface representation. The test samples are selected in such a way that the correlation between samples is kept as low as possible. The detection rates are measured against a separate test set of 500 faces and 4000 non-faces. The network achieves its optimal performance at a detection rate of about 94%, indicated by the circle in Figure 5. Sometimes faces were not detected due to substantial in-depth or in-plane rotation. A too bright or too dark environment also often degrades the performance of the detector, leading to detection failure.
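The bootstrapped schedule above can be sketched as a loop. This is a hedged illustration of the schedule only: `train_step` and `classify` are stand-ins for a real back-propagation network (here `classify` has a fixed false-acceptance rate, whereas a real network's rate would shrink as training proceeds), and the repository size is assumed.

```python
import random

# Sketch of the dynamic non-face bootstrapping: train on 100 face and
# 100 non-face samples per iteration, then scan a non-face repository
# and feed up to 250 false detections back into the non-face set.

random.seed(0)

def train_step(network, faces, non_faces):
    network["steps"] += 1          # placeholder for one back-propagation pass

def classify(network, sample):
    return random.random() < 0.1   # placeholder: True = a false acceptance

face_set = [f"face_{i}" for i in range(12000)]
non_face_set = [f"nonface_{i}" for i in range(500)]
repository = [f"clip_{i}" for i in range(20000)]   # assumed repository size
network = {"steps": 0}

for part in range(3):                          # a few training parts
    # number of iterations grows with the current non-face set
    iterations = max(1, len(non_face_set) // 100)
    for _ in range(iterations):
        faces = random.sample(face_set, 100)
        non_faces = random.sample(non_face_set, 100)
        train_step(network, faces, non_faces)
    # test stage: collect up to 250 false detections from the repository
    false_detections = []
    for clip in random.sample(repository, len(repository)):
        if classify(network, clip):
            false_detections.append(clip)
            if len(false_detections) == 250:
                break
    non_face_set.extend(false_detections)      # grow the non-face set

print(len(non_face_set))   # the non-face training set has grown
```

The point of the loop structure is that the network spends its later parts on exactly the non-face patterns it previously got wrong.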


Fig. 4. Network architecture for face detection (input units, three sub-networks, and a weighted final decision: face = 1, non-face = 0).
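The final decision stage of this architecture reduces to a weighted sum. The sketch below illustrates it with assumed weights and threshold; the paper does not specify these values, and the sub-network scores are supplied by hand rather than by trained networks.

```python
# Sketch of the weighted final decision of Figure 4: three sub-detectors
# (pixel, partial-profile and eigenface representations) each emit a
# score between 0 and 1; their weighted sum gives the face / non-face
# indication. Weights and threshold are illustrative, not from the paper.

WEIGHTS = {"pixel": 0.4, "profile": 0.3, "eigenface": 0.3}  # assumed weights
THRESHOLD = 0.5                                             # assumed cut-off

def final_decision(scores):
    """Weighted sum of sub-network outputs; >= THRESHOLD means 'face'."""
    s = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return s, s >= THRESHOLD

score, is_face = final_decision({"pixel": 0.9, "profile": 0.8, "eigenface": 0.7})
print(round(score, 2), is_face)
```

Because each sub-network sees a different representation of the same window, the weighted combination is less sensitive to a single representation's blind spots than any sub-detector alone.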


Fig. 5. Training process of the neural network for the eigenface representation (detection rate versus false detection rate).

VI. CONCLUSION

We have analyzed a face detection approach in a neural network based method. It is demonstrated how neural network technology can provide a solution for fast human face detection using a cascade of detectors: color-based, structure-based and learning-based. A neural network based face detector is trained with three multilayered back-propagation neural networks which take three different face representations as input. The weighted sum of the results from the three networks should give a reliable judgment on the existence of face patterns. The MPEG-4 standard provides a consistent and complete architecture for the coded representation of the desired combination of streamed elementary audiovisual information. Within MPEG-4, the binary format for scene description framework offers a parametric methodology for scene structure representation, as well as adequate solutions for applications in face detection for video surveillance.

ACKNOWLEDGMENTS

This research has been supported by the Ministry of Science and Environmental Protection of the Republic of Serbia.

REFERENCES

[1] Z. S. Bojkovic, D. A. Milovanovic: "Audiovisual integration in multimedia communications based on MPEG-4 facial animation", Circuits, Systems and Signal Processing, Vol. 20, No. 3/4, pp. 311-339, May/August 2001.
[2] P. Doenges et al.: "MPEG-4: audio/video and synthetic graphics/audio for mixed media", Signal Processing: Image Communication, Vol. 9, pp. 433-464, May 1997.
[3] A. Samal, P. A. Iyengar: "Automatic recognition and analysis of human faces and facial expressions: a survey", Pattern Recognition, Vol. 25, No. 1, pp. 65-77, January 1992.
[4] R. Chellappa, C. L. Wilson, S. Sirohey: "Human and machine recognition of faces: a survey", Proceedings of the IEEE, Vol. 83, No. 5, pp. 705-740, May 1995.
[5] W. E. Vieux, K. Schwerdt, J. L. Crowley: "Face-tracking and coding for video compression", Proc. of the 1st Int. Conf. on Computer Vision Systems, LNCS 1542, pp. 151-161, 1999.
[6] H. Rowley, S. Baluja, T. Kanade: "Neural network-based face detection", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 20, No. 1, pp. 23-38, January 1998.
[7] M. Yeasin, B. Bullot, R. Sharma: "Recognition of facial expressions and measurement of levels of interest from video", IEEE Trans. on Multimedia, Vol. 8, No. 1, pp. 500-508, June 2006.
[8] R. L. Hsu, A. K. Jain: "Face detection in color images", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 24, No. 5, pp. 696-706, May 2002.
[9] M. Turk, A. Pentland: "Eigenfaces for recognition", Journal of Cognitive Neuroscience, Vol. 3, No. 1, pp. 71-86, January 1991.

