
In 2001, Viola and Jones proposed an algorithm for face detection, which became a
breakthrough in the field of face recognition. The method uses the sliding window
technique: a frame smaller than the original image moves across the image with a certain
step, and a cascade of weak classifiers determines whether there is a face in the current
window. The sliding window method is used effectively in various computer vision and
object recognition tasks.
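
The sliding window idea can be illustrated with a minimal sketch. The function name `sliding_windows` and the step of 8 pixels are assumptions made for the example and are not taken from the original method; the 24-pixel window matches the base window size mentioned later in the text.

```python
from typing import Iterator, Tuple


def sliding_windows(image_w: int, image_h: int,
                    win: int = 24, step: int = 4) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (x, y, w, h) for every window position that fits inside the image."""
    for y in range(0, image_h - win + 1, step):
        for x in range(0, image_w - win + 1, step):
            yield (x, y, win, win)


# Hypothetical 64x48 image scanned with a 24x24 window and a step of 8 pixels.
windows = list(sliding_windows(64, 48, win=24, step=8))
print(len(windows), windows[0], windows[-1])
```

In a real detector, each yielded window would be passed to the classifier cascade, which decides whether it contains a face.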

The method consists of two sub-algorithms: the learning algorithm and the recognition
algorithm. In practice, the speed of the learning algorithm is not critical, whereas the speed
of the recognition algorithm is extremely important. According to the previously introduced
classification, the method can be attributed to structural, statistical and neural methods.

The method has the following advantages:

it is possible to detect more than one face in the image;

the use of simple classifiers gives good speed and allows the method to be used on a video
stream.

However, the method is difficult to train: it requires a large amount of training data and a
long training time, which is measured in days.
Initially, the algorithm was proposed to recognize only faces, but it can be used to recognize
other objects. One of the contributions of Viola and Jones was the use of a summed-area table,
which they called an integral image; a detailed description is given below.

Recognition scheme
The generalized recognition scheme in the Viola-Jones algorithm is shown in the figure below.

The general scheme of the algorithm is as follows. Before recognition begins, the learning
algorithm uses the training images to build a database consisting of features, their parity and
threshold. Parity, features and thresholds are described in more detail in the following
sections. The recognition algorithm then searches for objects at different scales of the
image using this database. The output of the Viola-Jones algorithm is the full set of
unmerged detections found at the different scales. The next task is to decide which of the
found objects are actually present in the frame, and which are duplicates.

Attributes of the class


As features for the recognition algorithm, the authors proposed Haar-like features, based on
Haar wavelets. Haar wavelets were introduced by the Hungarian mathematician Alfréd Haar in 1909.

In the face recognition problem, the general observation is that in every face the eye region
is darker than the cheek region. Consider masks consisting of light and dark areas.

Each mask is characterized by the size of its light and dark areas, their proportions, and the
minimum size. Together with other observations, the following Haar features were proposed as
the feature space in the recognition problem for the class of faces.

A Haar feature gives a single value of the brightness difference along the X or Y axis.
The most common Haar feature for face recognition is therefore a pair of adjacent
rectangles, one lying over the eyes and the other over the cheeks. The value of the feature is
calculated by the formula:

$$ F = X - Y $$

where X is the sum of the brightness values of the pixels covered by the light part of the
feature, and Y is the sum of the brightness values of the pixels covered by the dark part.
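
As an illustration of the formula, the sketch below computes F for a two-rectangle feature by summing pixels directly. The function names and the tiny 4x4 test image are hypothetical; the layout (dark rectangle above, light rectangle below) is an assumption modelled on the eyes-over-cheeks example.

```python
def region_sum(img, x, y, w, h):
    """Sum of brightness values in the w-by-h rectangle with top-left pixel (x, y)."""
    return sum(img[r][c] for r in range(y, y + h) for c in range(x, x + w))


def two_rect_feature(img, x, y, w, h):
    """F = X - Y for a feature whose upper half is dark and lower half is light."""
    half = h // 2
    dark = region_sum(img, x, y, w, half)           # Y: dark part (eye region)
    light = region_sum(img, x, y + half, w, half)   # X: light part (cheek region)
    return light - dark


# Hypothetical image: two dark rows (eyes) above two bright rows (cheeks).
img = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [200, 200, 200, 200],
    [200, 200, 200, 200],
]
F = two_rect_feature(img, 0, 0, 4, 4)
print(F)  # 1600 - 80 = 1520
```

A large positive F indicates that the window matches the light/dark pattern of the mask. Summing pixels this way costs time proportional to the area of the feature, which is what the integral representation avoids.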

Computing the sums of intensities directly for each feature would require considerable
computational resources. Viola and Jones therefore proposed using the integral
representation of the image, which is described later. This representation has become a
convenient way of computing features and is also used in other computer vision
algorithms, for example SURF.

Training scheme
The generalized scheme of the learning algorithm is as follows. There is a training sample of
about 10,000 images. The figure shows an example of face images used for training. The
learning algorithm works with grayscale images.

With a training image size of 24 by 24 pixels, the number of configurations of a single feature
is about 40,000 (it depends on the minimum size of the mask). Modern implementations of the
algorithm use about 20 masks. For each mask and each configuration, a weak classifier is
trained, and the one that gives the smallest error on the entire training set is added to the
database. In this way the algorithm is trained, and its output is a database of T weak
classifiers. The general scheme of the learning algorithm is shown in the figure.
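
The order of magnitude quoted for the number of configurations can be checked with a short count. The sketch assumes a mask made of two side-by-side rectangles with a minimum size of 2x1 pixels that is stretched only by integer multiples of that size; the function name `count_configurations` is hypothetical.

```python
def count_configurations(base_w, base_h, window=24):
    """Count positions and sizes of a mask whose width and height are integer
    multiples of (base_w, base_h) and which fits inside a window x window image."""
    total = 0
    for w in range(base_w, window + 1, base_w):
        for h in range(base_h, window + 1, base_h):
            # Number of top-left positions at which a w-by-h mask fits.
            total += (window - w + 1) * (window - h + 1)
    return total


n_two_rect = count_configurations(2, 1)
print(n_two_rect)  # 43200
```

With a larger minimum mask size the count drops, for example `count_configurations(3, 1)` gives 27600, which is why the figure depends on the minimum size of the mask.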

Training the Viola-Jones algorithm is supervised learning, so it is susceptible to problems
such as overfitting. AdaBoost has been shown to be applicable to various problems,
including game theory and forecasting. In this work, the stopping condition is reaching a
predetermined number of weak classifiers in the database.

For the algorithm, a training sample must be prepared in advance from l images that contain
the desired object and n images that do not. The total number of training images is then

$$ N = l + n, \qquad X = \{x_1, \dots, x_N\}, \qquad Y = \{y_1, \dots, y_N\} $$

where X is the set of all training images; for each image it is known in advance whether the
object is present or not, and this is recorded in the set Y.

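Here

$$ y_i = \begin{cases} 1, & \text{if the object is present in } x_i, \\ 0, & \text{otherwise.} \end{cases} $$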

By feature j we mean a structure of the form

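The response of the feature is then f_j(x), which is calculated as the difference between the
pixel intensities in the light and dark areas. A weak classifier has the form:

$$ h_j(x) = \begin{cases} 1, & \text{if } p_j f_j(x) < p_j \theta_j, \\ 0, & \text{otherwise,} \end{cases} $$

where p_j is the parity and θ_j is the threshold.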

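The task of a weak classifier is to guess the presence of an object correctly in more than 50%
of cases. Using the AdaBoost training procedure, a strong classifier is built from T weak
classifiers; it has the form:

$$ H(x) = \begin{cases} 1, & \text{if } \sum_{t=1}^{T} a_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} a_t, \\ 0, & \text{otherwise.} \end{cases} $$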
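
The decision rules of the weak and strong classifiers can be sketched as follows. The sketch assumes the usual Viola-Jones form: a weak classifier compares the feature response with a threshold, with the parity choosing the direction of the inequality, and the strong classifier takes a weighted vote against half of the total weight. The function names and the numeric values are hypothetical.

```python
def weak_classify(response, parity, threshold):
    """Return 1 if parity * response < parity * threshold, otherwise 0."""
    return 1 if parity * response < parity * threshold else 0


def strong_classify(responses, stumps):
    """Weighted vote of T weak classifiers.

    stumps[t] is (parity, threshold, a_t); responses[t] is the response f_t(x)
    of the feature used by weak classifier t on the window x.
    """
    total = sum(a for _, _, a in stumps)
    score = sum(a * weak_classify(f, p, th)
                for f, (p, th, a) in zip(responses, stumps))
    return 1 if score >= 0.5 * total else 0


# Hypothetical database of T = 3 weak classifiers.
stumps = [(1, 50.0, 0.8), (-1, 10.0, 0.5), (1, 0.0, 0.3)]
print(strong_classify([30, 20, 5], stumps))  # first two classifiers fire
print(strong_classify([70, 5, 5], stumps))   # no classifier fires
```

A parity of -1 turns the test into "response greater than threshold", so the same rule covers features that are larger or smaller on the object.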

The objective function of training is as follows:

Integral representation of images
The integral representation is a matrix whose dimensions coincide with those of the original
image I, where each element is calculated as follows:

$$ II(x, y) = \sum_{r=0}^{x} \sum_{c=0}^{y} I(r, c) $$

where I (r, c) is the brightness of the pixel of the original image.

Each element II [x, y] of the matrix is the sum of the pixels in the rectangle from (0,0) to
(x, y). Computing such a matrix takes time linear in the number of pixels in the image.
Computing the sum over a rectangular area from the integral representation requires only
4 array accesses and 3 arithmetic operations. This makes it possible to compute Haar
features quickly during both learning and recognition.
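
A minimal sketch of building the integral representation in a single pass over the image is shown below. The function name `integral_image` is hypothetical, and the matrix is stored row-major as `ii[y][x]`, which is an implementation choice for the example.

```python
def integral_image(img):
    """Return ii where ii[y][x] is the sum of img over rows 0..y and columns 0..x."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0  # running sum of the current row up to column x
        for x in range(w):
            row_sum += img[y][x]
            above = ii[y - 1][x] if y > 0 else 0
            ii[y][x] = row_sum + above
    return ii


img = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(img)
print(ii)  # [[1, 3, 6], [5, 12, 21], [12, 27, 45]]
```

Each pixel is visited once, so the cost grows linearly with the number of pixels; the bottom-right element equals the sum of the whole image.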

For example, consider the rectangle ABCD.

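The sum inside the rectangle ABCD can be expressed in terms of sums and differences of
adjacent rectangles by the formula:

$$ S(ABCD) = II(A) + II(C) - II(B) - II(D) $$

where the vertices are assumed to be labelled in order, with A the top-left and C the
bottom-right corner.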
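
The four-lookup rectangle sum can be sketched as follows. To avoid special cases at the image border, the sketch pads the integral representation with an extra zero row and column, which is an implementation choice rather than part of the original description. The corner labels A, B, C, D in the comments assume that A is the top-left and C the bottom-right corner; the function names are hypothetical.

```python
def padded_integral(img):
    """Integral image with an extra zero row and column:
    ii[y][x] is the sum of img over rows 0..y-1 and columns 0..x-1."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x]
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left pixel (x, y):
    4 array accesses and 3 arithmetic operations."""
    a = ii[y][x]            # A: top-left corner
    b = ii[y][x + w]        # B: top-right corner
    c = ii[y + h][x + w]    # C: bottom-right corner
    d = ii[y + h][x]        # D: bottom-left corner
    return a + c - b - d


img = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = padded_integral(img)
print(rect_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
print(rect_sum(ii, 0, 0, 3, 3))  # whole image: 45
```

The cost of `rect_sum` does not depend on the size of the rectangle, so a Haar feature made of two or three rectangles is evaluated in constant time.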

Training
Before training begins, the weights w_{q,i} are initialized, where q is the iteration number
and i is the image number.

After the training procedure, T weak classifiers and T values of the coefficient a_t will be obtained.

At each iteration of the loop, the weights are normalized so that their sum is 1. Then, over
all possible features, the values of p, θ and j are selected such that the error e_j is minimal
at this iteration. The selected feature J (t) (at step t) is stored in the database of weak
classifiers, the weights are updated, and the coefficient a_t is calculated.
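
The training loop described above can be sketched as follows. The sketch assumes the standard Viola-Jones formulation of AdaBoost, which is not spelled out in the text: initial weights that give each class half of the total mass, beta = e / (1 - e), a_t = log(1 / beta), and a weight update that shrinks the weights of correctly classified images. Feature responses are assumed to be precomputed, and all names and data are hypothetical.

```python
import math


def weak_classify(f, p, theta):
    """Return 1 if p * f < p * theta, otherwise 0."""
    return 1 if p * f < p * theta else 0


def train_adaboost(responses, labels, rounds):
    """responses[j][i] is f_j(x_i); labels[i] is 0 or 1 (both classes must occur).
    Returns a list of (j, p, theta, a_t), one entry per round."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Assumed initialisation: each class receives half of the total weight.
    w = [1.0 / (2 * n_pos) if y == 1 else 1.0 / (2 * n_neg) for y in labels]
    chosen = []
    for _ in range(rounds):
        total = sum(w)
        w = [wi / total for wi in w]            # normalise: weights sum to 1
        best = None
        for j, fs in enumerate(responses):      # exhaustive search over j, theta, p
            for theta in sorted(set(fs)):
                for p in (1, -1):
                    err = sum(wi for wi, f, y in zip(w, fs, labels)
                              if weak_classify(f, p, theta) != y)
                    if best is None or err < best[0]:
                        best = (err, j, p, theta)
        err, j, p, theta = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # keep beta and the logarithm finite
        beta = err / (1.0 - err)
        a_t = math.log(1.0 / beta)
        # Correctly classified images are down-weighted; errors keep their weight.
        w = [wi * (beta if weak_classify(f, p, theta) == y else 1.0)
             for wi, f, y in zip(w, responses[j], labels)]
        chosen.append((j, p, theta, a_t))
    return chosen


# Two hypothetical features evaluated on four images (two faces, two non-faces).
responses = [[1, 2, 8, 9], [5, 1, 6, 2]]
labels = [1, 1, 0, 0]
chosen = train_adaboost(responses, labels, rounds=2)
print(chosen[0][:3])  # feature 0 with parity 1 and threshold 8 separates the classes
```

The exhaustive search over every threshold and parity is what makes training slow: its cost per round is proportional to the number of features times the number of candidate thresholds times the number of images.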

The original algorithm proposed in 2001 did not describe the procedure for obtaining the
optimal feature at each iteration. It assumes the use of the AdaBoost algorithm and an
exhaustive search over the possible values of the threshold and parity.

Recognition
After training on the training sample, there is a trained knowledge base of T weak classifiers.
For each classifier, the following are known: the Haar feature it uses, its position within a
24x24 pixel window, and the threshold value E.

The algorithm receives an image I (r, c) of size W x H, where I (r, c) is the luminance
component of the image. The result of the algorithm is a set of rectangles R (x, y, w, h) that
give the positions of faces in the original image I.

The algorithm scans the image I at several scales, starting from the base scale with a window
size of 24x24 pixels. Following the authors' recommendation, 11 scales are used, each
1.25 times larger than the previous one. The recognition algorithm is as follows:
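
A minimal sketch of this multi-scale scan is given below, using the base window of 24 pixels, the factor 1.25 and the 11 scales mentioned above. The step of one tenth of the window size, the rounding of window sizes, and the `classify` callback standing in for the trained classifier are assumptions made for the example.

```python
def window_sizes(base=24, factor=1.25, n_scales=11):
    """Window side length at each scale, rounded to whole pixels."""
    return [int(round(base * factor ** k)) for k in range(n_scales)]


def scan(image_w, image_h, classify, base=24, factor=1.25, n_scales=11):
    """Return every rectangle (x, y, w, h) accepted by classify(x, y, size)."""
    found = []
    for size in window_sizes(base, factor, n_scales):
        if size > image_w or size > image_h:
            break                           # larger windows no longer fit
        step = max(1, size // 10)           # assumed step: a tenth of the window
        for y in range(0, image_h - size + 1, step):
            for x in range(0, image_w - size + 1, step):
                if classify(x, y, size):
                    found.append((x, y, size, size))
    return found


sizes = window_sizes()
print(sizes[0], sizes[1], sizes[-1], len(sizes))

# Hypothetical classifier that accepts every window, used only to count positions.
all_windows = scan(30, 30, lambda x, y, size: True)
print(len(all_windows))
```

Because the same face is usually accepted at several neighbouring positions and scales, the list returned by `scan` typically contains many overlapping rectangles for one object.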

Disadvantages of the Viola-Jones algorithm
The basic algorithm of Viola-Jones (hereinafter the basic algorithm) has a number of drawbacks:

long training time: during the learning process, the algorithm needs to analyze a large
number of training images;
a large number of closely spaced detections, caused by the use of different scales and a
sliding window.
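
One common way to deal with closely spaced detections is to group overlapping rectangles and replace each group with its average. The sketch below is such a post-processing step and is not part of the basic algorithm described here; the overlap measure (intersection over union), the threshold of 0.5 and the function names are assumptions.

```python
def iou(a, b):
    """Intersection over union of two rectangles given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)


def merge_detections(rects, min_iou=0.5):
    """Greedily group rectangles that overlap the first member of a group,
    then return the integer average rectangle of each group."""
    groups = []
    for r in rects:
        for g in groups:
            if iou(r, g[0]) >= min_iou:
                g.append(r)
                break
        else:
            groups.append([r])
    return [tuple(sum(v) // len(g) for v in zip(*g)) for g in groups]


# Three hypothetical detections of one face and one detection elsewhere.
rects = [(10, 10, 24, 24), (12, 11, 24, 24), (11, 12, 30, 30), (100, 50, 24, 24)]
merged = merge_detections(rects)
print(merged)  # [(11, 11, 26, 26), (100, 50, 24, 24)]
```

The number of rectangles in a group can also serve as a confidence measure: isolated detections are more likely to be false positives than detections confirmed at several positions and scales.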
