$\Gamma(x)$. Let $I_m(x)$ be the mean intensity of all layers of the pixel at position $x$, and $\bar{I}_m(x)$ be the mean intensity of all layers over the neighborhood $N(x)$. The 12-bit code concatenates the comparison bits $m(x, y)$ for all $y \in N(x)$ with the three color bits $\{b_R(x), b_G(x), b_B(x)\}$, where

$$m(x, y) = \begin{cases} 0 & \text{if } \bar{I}_m(x) \ge I_m(y), \\ 1 & \text{if } \bar{I}_m(x) < I_m(y). \end{cases}$$
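The transform above can be sketched in a few lines of Python. This is a minimal illustration only: the rule used here for the three color bits $b_R$, $b_G$, $b_B$ (comparing each channel of the center pixel against the neighborhood mean of that channel) is an assumption, since their definition falls outside this excerpt.

```python
import numpy as np

def mct_12bit(img, x, y):
    """12-bit MCT code at pixel (x, y) of an H x W x 3 float image (sketch)."""
    patch = img[y - 1:y + 2, x - 1:x + 2, :]        # 3x3 color neighborhood N(x)
    gray = patch.mean(axis=2)                       # I_m(y): mean over layers, per pixel
    nbr_mean = gray.mean()                          # I-bar_m(x): mean over the neighborhood
    m_bits = (nbr_mean < gray).astype(int).ravel()  # m(x, y) = 1 iff I-bar_m(x) < I_m(y)
    # Assumed color bits: center channel value vs. neighborhood channel mean.
    color_bits = (patch.mean(axis=(0, 1)) < img[y, x, :]).astype(int)
    code = 0
    for b in np.concatenate([m_bits, color_bits]):  # pack 9 + 3 bits into one integer
        code = (code << 1) | int(b)
    return code                                     # value in [0, 2^12)
```

The 9 structure bits make the code invariant to monotonic illumination changes, while the assumed color bits add the chromatic information that distinguishes the 12-bit variant.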
where $\omega_{k,i} = \dfrac{\omega_{k,i}}{\sum_{j=1}^{n} \omega_{k,j}}$, i.e., the weights are normalized to sum to one.
2. Generate a weak classifier $c_k$ for a single feature $k$, with error
$$\varepsilon_k = \sum_i \omega_{k,i}\, |c_k(\Gamma_i) - l_i|.$$
3. Compute
$$\alpha_k = \frac{1}{2} \ln\!\left(\frac{1 - \varepsilon_k}{\varepsilon_k}\right),$$
4. Update the weights:
$$\omega_{k+1,i} = \omega_{k,i} \cdot \begin{cases} e^{-\alpha_k} & \text{if } c_k(\Gamma_i) = l_i, \\ 1 & \text{otherwise.} \end{cases}$$
The final strong classifier is:
$$C(\Gamma) = \begin{cases} 1 & \text{if } \sum_{k=1}^{K} \alpha_k\, c_k(\Gamma(k)) > \frac{1}{2} \sum_{k=1}^{K} \alpha_k, \\ 0 & \text{otherwise,} \end{cases}$$
where $K$ is the total number of features.
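Steps 1–4 and the final thresholding can be sketched as below. The `weak_learner` argument is a hypothetical callable standing in for the single-feature training of step 2 (it fits one weak classifier on the weighted sample and returns it with its weighted error), and the clamping of $\varepsilon_k$ is added only to keep the logarithm finite.

```python
import math

def boost(features, labels, weak_learner, K):
    """Discrete AdaBoost loop following steps 1-4 above (a sketch)."""
    n = len(labels)
    w = [1.0 / n] * n                                # uniform initial weights
    alphas, classifiers = [], []
    for _ in range(K):
        total = sum(w)
        w = [wi / total for wi in w]                 # normalize weights to sum to one
        c_k, eps = weak_learner(features, labels, w) # step 2: weak classifier + error
        eps = min(max(eps, 1e-10), 1 - 1e-10)        # clamp to keep log finite
        alpha = 0.5 * math.log((1 - eps) / eps)      # step 3
        # step 4: shrink weights of correctly classified examples, keep the rest
        w = [wi * (math.exp(-alpha) if c_k(xi) == li else 1.0)
             for wi, xi, li in zip(w, features, labels)]
        alphas.append(alpha)
        classifiers.append(c_k)

    def strong(x):
        # final strong classifier: weighted vote against half the total weight
        s = sum(a * c(x) for a, c in zip(alphas, classifiers))
        return 1 if s > 0.5 * sum(alphas) else 0
    return strong
```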
A few weak classifiers are combined to form a final strong classifier. The weak classifiers consist of histograms $g_k^p$ and $g_k^n$ for feature $k$. Each histogram holds a weight for each feature. To build a weak classifier, we first count the kernel index statistics at each position. The resulting histograms determine whether a single feature belongs to a face or a non-face. The single-feature weak classifier at position $k$ with the lowest boosting error $e_t$ is chosen in every boosting loop. The maximal number of features at each stage is limited with regard to the resolution of the analysis window. The definitions of the histograms are as follows:
$$g_k^p(r) = \sum_i I(\Gamma_i(k) = r)\, I(l_i = 1)$$
$$g_k^n(r) = \sum_i I(\Gamma_i(k) = r)\, I(l_i = 0), \qquad k = 1, \dots, K,\; r = 1, \dots, R,$$
where $I(\cdot)$ is the indicator function that takes 1 if the argument is true and 0 otherwise. The weak classifier for feature $k$ is:
$$c_k(\Gamma_i) = \begin{cases} 1 & \text{if } g_k^p(\Gamma_i(k)) > g_k^n(\Gamma_i(k)), \\ 0 & \text{otherwise,} \end{cases}$$
where $\Gamma_i(k)$ is the $k$th feature of the $i$th face image and $c_k$ is the weak classifier for the $k$th feature. The final stage classifier $C(\Gamma)$ thresholds the weighted sum of all weak classifiers for the chosen features:
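The histogram construction and the weak classifier above can be sketched as follows (function and variable names are ours; $R = 4096$ assumes 12-bit codes):

```python
import numpy as np

def build_weak_classifier(kernel_indices, labels, k, R=4096):
    """Histogram weak classifier c_k for feature position k (a sketch).

    `kernel_indices` is an n x K integer array whose entry Gamma_i(k) is the
    MCT kernel index of image i at position k; `labels` holds l_i in {0, 1}.
    """
    gamma_k = kernel_indices[:, k]
    # g^p_k(r): how many face images have code r at position k
    g_p = np.bincount(gamma_k[labels == 1], minlength=R)
    # g^n_k(r): the same count over non-face images
    g_n = np.bincount(gamma_k[labels == 0], minlength=R)
    # c_k(Gamma_i) = 1 iff the face histogram outweighs the non-face one
    return lambda gamma: int(g_p[gamma[k]] > g_n[gamma[k]])
```

In effect each weak classifier is a 4096-bin lookup table, so both training (counting) and evaluation (one table lookup) are very cheap.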
$$C(\Gamma) = \begin{cases} 1 & \text{if } \sum_{k=1}^{K} \alpha_k\, c_k(\Gamma(k)) > \frac{1}{2} \sum_{k=1}^{K} \alpha_k, \\ 0 & \text{otherwise,} \end{cases}$$
where $c_k$ is the weak classifier, $C(\Gamma)$ is the strong classifier, and $\alpha_k = \frac{1}{2}\ln\!\left(\frac{1-\varepsilon_k}{\varepsilon_k}\right)$.
Table 2. Pseudo-code of the bootstrapping algorithm

Set the minimum true positive rate, $T_{min}$, for each boosting iteration.
Set the maximum detection error on the negative dataset, $I_{neg}$, for each bootstrap iteration.
P = set of positive training examples.
N = set of negative training examples.
K = the total number of features.
While $I_{err} > I_{neg}$:
    - While $T_{tpr} < T_{min}$:
        For k = 1 to K:
            Use P and N to train a classifier for a single feature.
        Update the weights.
        Test the classifier with 10-fold cross validation to determine $T_{tpr}$.
    - Evaluate the classifier on the negative set to determine $I_{err}$ and put
      any false detections into the set N.
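The outer bootstrap loop of Table 2 can be sketched as below. Both callables are hypothetical stand-ins: `train_stage(P, N)` is assumed to run the inner boosting loop until the cross-validated true-positive rate reaches $T_{min}$ and return the trained stage, and `evaluate(stage, neg_pool)` is assumed to return the error rate $I_{err}$ on negatives together with the false detections.

```python
def bootstrap(P, N, neg_pool, train_stage, evaluate, I_neg=0.05):
    """Bootstrapping loop from Table 2 (a sketch with hypothetical callables)."""
    N = list(N)
    err, stage = 1.0, None
    while err > I_neg:                             # while I_err > I_neg
        stage = train_stage(P, N)                  # inner boosting loop
        err, false_pos = evaluate(stage, neg_pool)
        N.extend(false_pos)                        # recycle false detections as negatives
    return stage
```

The point of the recycling step is that each retraining round focuses the classifier on exactly the negatives it previously mistook for faces.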
4 Experimental results
The training data set consists of 6000 faces and 6000 randomly cropped non-faces. Both the faces and non-faces are down-sampled to 24 × 24, 16 × 16, 8 × 8, and 6 × 6 pixels for training detectors at the different resolutions.
To test the detector, we use the Georgia Tech face database, which contains images of 50 people. Each person in the database is represented by 15 color JPEG images with cluttered background, taken at a resolution of 640 × 480 pixels. The average size of the faces in these images is 150 × 150 pixels. The pictures show frontal and tilted faces with different facial expressions, lighting conditions, and scales.
We use a cascade of three strong classifiers trained with a variation of the AdaBoost algorithm used by Viola et al. The number of boosting iterations is not fixed in advance; boosting continues until the detection rate reaches the minimum detection rate. For detecting faces of 24 × 24 pixels, the analysis window is of size 22 × 22, and the maximal number of features for the three stages is 20, 300, and 484 respectively. For faces of 16 × 16 pixels, the analysis window is of size 14 × 14, with at most 20, 150, and 196 features per stage. For faces of 8 × 8 pixels, the analysis window is of size 6 × 6, with at most 20, 30, and 36 features per stage. For faces of 6 × 6 pixels, the analysis window is of size 4 × 4, with at most 5, 10, and 16 features per stage.
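The cascade evaluation described above can be sketched as follows: an analysis window passes through the stages in order and is rejected as soon as any stage outputs 0, which is why the cheap early stages (few features) filter out most non-face windows before the expensive later stages run.

```python
def cascade_detect(window, stages):
    """Run one analysis window through a cascade of strong classifiers.

    `stages` is a list of strong classifiers C(Gamma) -> {0, 1}, ordered from
    cheapest to most expensive; the window is a face only if all stages accept.
    """
    for C in stages:
        if C(window) == 0:
            return False      # rejected early; later stages never run
    return True
```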
The experimental results are as follows. Table 3 presents the results of using the 12-bit MCT at different resolutions: as the resolution of the faces in the test images decreases, the false alarms increase and the detection rate drops. Table 4 presents the results of using the 9-bit MCT at different resolutions and shows the same trend. Comparing Table 3 and Table 4, we conclude that in low-resolution color images the 12-bit MCT achieves a much better detection rate and far fewer false alarms than the 9-bit MCT.
In Figures 4 through 7, we show some representative results from the Georgia Tech face database. Figure 4 presents the detection results on 24 × 24 pixel face images, Figure 5 on 16 × 16 pixel face images, Figure 6 on 8 × 8 pixel face images, and Figure 7 on 6 × 6 pixel face images.
Table 3. Face detection using the 12-bit modified census transform (Georgia Tech face database, boosting)

Resolution    Detection rate    False alarms
24 × 24       99.5%             1
16 × 16       97.2%             10
8 × 8         95.0%             136
6 × 6         80.0%             149
Table 4. Face detection using the 9-bit modified census transform (Georgia Tech face database, boosting)

Resolution    Detection rate    False alarms
24 × 24       99.2%             296
16 × 16       98.4%             653
8 × 8         93.5%             697
6 × 6         68.8%             474
Fig. 4. Sample detection results on 24 × 24 pixel face images
5 Conclusion
In this paper, we presented a 12-bit MCT that works better than the original 9-bit MCT in low-resolution color images for object detection. According to the experiments, our method attains better results than the conventional 9-bit MCT when detecting in low-resolution color images.

Fig. 5. Sample detection results on 16 × 16 pixel face images

Fig. 6. Sample detection results on 8 × 8 pixel face images
For future work, we will extend our system to other object detection problems such as car detection, road detection, and hand gesture detection. In addition, we would like to perform experiments using other boosting algorithms, such as FloatBoost and Real AdaBoost, to further improve the performance of our system in low-resolution object detection.
References
1. Yang, M.H., Kriegman, D.J., Ahuja, N.: Detecting faces in images: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (2002) 34–58
2. Hayashi, S., Hasegawa, O.: Robust face detection for low-resolution images. Journal of Advanced Computational Intelligence and Intelligent Informatics 10 (2006) 93–101
3. Torralba, A., Sinha, P.: Detecting faces in impoverished images. Technical Report 028, MIT
AI Lab, Cambridge, MA (2001)
Fig. 7. Sample detection results on 6 × 6 pixel face images
4. Kruppa, H., Schiele, B.: Using local context to improve face detection. In: Proceedings of the British Machine Vision Conference, Norwich, England (2003) 3–12
5. Rowley, H.A., Baluja, S., Kanade, T.: Neural network-based face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (1998) 23–38
6. Rowley, H.A., Baluja, S., Kanade, T.: Rotation invariant neural network-based face detection. In: Proceedings of 1998 IEEE Conference on Computer Vision and Pattern Recognition, Santa Barbara, CA (1998) 38–44
7. LeCun, Y., Haffner, P., Bottou, L., Bengio, Y.: Object recognition with gradient-based learning. In Forsyth, D., ed.: Shape, Contour and Grouping in Computer Vision. Springer (1999)
8. LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R., Hubbard, W., Jackel, L.D.:
Object recognition with gradient-based learning. In Forsyth, D., ed.: Shape, Contour and
Grouping in Computer Vision, Springer (1999)
9. Lawrence, S., Giles, C.L., Tsoi, A.C., Back, A.D.: Face recognition: A convolutional neural-network approach. IEEE Transactions on Neural Networks 8 (1997) 98–113
10. Schneiderman, H., Kanade, T.: A statistical model for 3-d object detection applied to faces
and cars. In: IEEE Conference on Computer Vision and Pattern Recognition, IEEE (2000)
11. Jesorsky, O., Kirchberg, K., Frischholz, R.W.: Robust face detection using the Hausdorff distance. In: Third International Conference on Audio- and Video-based Biometric Person Authentication. Lecture Notes in Computer Science, Springer (2001) 90–95
12. Kirchberg, K.J., Jesorsky, O., Frischholz, R.W.: Genetic model optimization for Hausdorff distance-based face localization. In: International Workshop on Biometric Authentication, Springer (2002) 103–111
13. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: Proceedings of 2001 IEEE International Conference on Computer Vision and Pattern Recognition. (2001) 511–518
14. Sung, K.K.: Learning and Example Selection for Object and Pattern Detection. PhD thesis, Massachusetts Institute of Technology (1996)
15. Schneiderman, H., Kanade, T.: Probabilistic modeling of local appearance and spatial relationship for object recognition. In: International Conference on Computer Vision and Pattern Recognition, IEEE (1998)
16. Wu, B., Ai, H., Huang, C., Lao, S.: Fast rotation invariant multi-view face detection based on Real AdaBoost. In: Sixth IEEE International Conference on Automatic Face and Gesture Recognition. (2004)
17. Fröba, B., Ernst, A.: Face detection with the modified census transform. In: Sixth IEEE International Conference on Automatic Face and Gesture Recognition, Erlangen, Germany (2004) 91–96
18. Zabih, R., Woodfill, J.: A non-parametric approach to visual correspondence. IEEE Transactions on Pattern Analysis and Machine Intelligence (1996)
19. Dalal, N.: Finding People in Images and Videos. PhD thesis, Institut National Polytechnique
de Grenoble (2006)