by a set of pixels or estimated by computing the first and the second moments on the probability map. In this way it is possible to represent the color distribution with a small number of component Gaussians; however, building and updating MoGs via EM is time-consuming.

The idea of the second category is histogram matching. It usually needs a reference color model of the object and a similarity measure to evaluate the similarity between the reference and the candidate color models. The candidate whose mode is most similar to the reference one is selected as the tracking result in the current frame. The well-known mean shift algorithm [5] and the improved work that followed it [23,1,4,22] fall into this category: the color model is represented by a weighted histogram (a kernel-based probability distribution), and similarity is measured with the Bhattacharyya distance. The mean shift algorithm is derived from first-order gradient ascent on this similarity measure and converges to the locally best candidate.

The method proposed in this paper belongs to the second class and aims at solving two problems of these algorithms. First, conventional histogram methods [5,23,1,4,22] partition the whole color space of the object into a regular square tessellation, neglecting the fact that object color is usually very compact and occupies only a few small regions of the color space; this produces a large number of empty bins and wastes computational resources. Second, the color information within each bin is not modelled, so the distribution of the multi-channel gray levels inside the bin is discarded.

To address these two problems, a clustering-based color model is proposed and a fast algorithm based on Integral Images is developed for object tracking. In Section 2, K-means clustering is used to partition the color space adaptively, and the histogram bins of the object model are determined accordingly. Moreover, we model the multi-channel gray-level distribution in each bin with a Gaussian to capture a richer description of the target.
Then a similarity measure based on the Bhattacharyya distance, together with its simplified form, is introduced to evaluate the similarity between two color models. In Section 3, Integral Images for computing the histogram, mean and variance are proposed, with which the color model can be evaluated with fast array index operations. Thanks to the Integral Images, the brute-force search tracking algorithm can be implemented efficiently. In Section 4, diverse experiments demonstrate the validity and performance of the algorithm.

2. Clustering-based color model

It is commonly understood that adaptive-binning histograms can represent distributions more efficiently and more accurately with far fewer bins. Although adaptive partition of the color space has long been studied in image coding [6] and image segmentation [2], little related work has been reported in object tracking.

2.1. Adaptive partition of color space

In this paper K-means clustering [7] is employed to partition the color space of the object adaptively. According to the clustering result, the histogram bins are determined with the following simple method. For each cluster, the pixel farthest from the cluster center is used to determine the bin range, which is a non-uniform rectangle in two dimensions or a hyper-rectangle in higher dimensions. Adjacent rectangles (or hyper-rectangles) may have small overlapping regions. A pixel within such an overlapping region is assigned by computing its distance to the relevant cluster centers and selecting the cluster with minimum distance.

Fig. 1 presents an example of adaptive partition of color space. The left figure is a reference image of a human face. The middle figure shows the color distribution of the object in RG color space, from which we can see that the color is very compact and occupies only a few small regions of the whole RG color space.
The right figure shows the non-uniform histogram bins obtained by K-means clustering (d = 6), where pixels belonging to the same bin are labelled with the same color.

Determining the number of histogram bins is an important yet unresolved problem in color-based object tracking [5,23,1,3,21,22]. Too many bins fail to handle environment changes or noise, which leads to tracking failures, while too few bins do not allow good discrimination of the target color model, resulting in distraction by nearby regions of similar color. In our case, straightforward application of clustering algorithms that select the number of clusters automatically [8] cannot yet solve this problem. Thus, as in most color-based tracking algorithms, the bin number is set empirically (between d = 4 and 8 in our case), and selection of the bin number accounting for environment changes is left for future work.

L. Peihua / Signal Processing: Image Communication 21 (2006) 676–687

2.2. Color model and similarity measure

Based on the adaptive bins obtained above, given a reference image consisting of a set of pixels $I(x_i)$, $i = 1, \ldots, N$, the reference color model is represented by $p = \{p_u\}$, $u = 1, \ldots, d$, where $p_u$ is defined as

$$p_u\big(I(x); \beta_u, \mu_u, \Sigma_u\big) = \beta_u\, G(\mu_u, \Sigma_u). \tag{1}$$

In the above equation, $G(\mu_u, \Sigma_u)$ is a Gaussian distribution with mean vector $\mu_u$ and covariance matrix $\Sigma_u$, and $\beta_u$, $\mu_u$, $\Sigma_u$ have the following forms:

$$\beta_u = \frac{n_u}{N}, \qquad \mu_u = \frac{1}{n_u}\sum_{i=1}^{N} I(x_i)\,\delta_u(x_i), \qquad \Sigma_u = \frac{1}{n_u}\sum_{i=1}^{N}\big(I(x_i)-\mu_u\big)\big(I(x_i)-\mu_u\big)^{T}\delta_u(x_i), \tag{2}$$

where $n_u = \sum_{i=1}^{N}\delta_u(x_i)$ is the number of pixels within the $u$th bin, and $\delta_u(x_i)$ is the Kronecker delta function, which is 1 if $I(x_i)$ falls into the $u$th bin and 0 otherwise. Consider the color model $q = \{q_u\}$, $u = 1, \ldots, d$, of a candidate region comprising $N'$ pixels, in which the component distribution has the form

$$q_u\big(I(x); \beta'_u, \mu'_u, \Sigma'_u\big) = \beta'_u\, G(\mu'_u, \Sigma'_u), \tag{3}$$

where $\beta'_u$, $\mu'_u$ and $\Sigma'_u$ have forms similar to those in Eq. (2).
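The adaptive binning of Section 2.1 and the per-bin model of Eqs. (1)–(2) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's implementation: it uses Lloyd's K-means with a deterministic farthest-point initialisation, takes each bin's range to be the axis-aligned bounding box of its cluster members (one possible reading of "the pixel farthest from the cluster center determines the bin range"), and stores only per-channel variances, i.e. the diagonal covariance of Section 2.2.1. The helper names `kmeans`, `bin_boxes`, `bin_of` and `color_model` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Color = Tuple[float, ...]
Box = List[Tuple[float, float]]  # (low, high) range per channel


def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Sequence[float], centers: Sequence[Color]) -> int:
    return min(range(len(centers)), key=lambda k: sqdist(p, centers[k]))


def kmeans(pixels: Sequence[Color], d: int, iters: int = 50) -> List[Color]:
    """Lloyd's K-means with a deterministic farthest-point initialisation."""
    centers: List[Color] = [tuple(pixels[0])]
    while len(centers) < d:
        # next seed: the pixel farthest from every centre chosen so far
        centers.append(tuple(max(pixels, key=lambda p: min(sqdist(p, c) for c in centers))))
    for _ in range(iters):
        groups: List[List[Color]] = [[] for _ in range(d)]
        for p in pixels:
            groups[nearest(p, centers)].append(p)
        # recompute centres; an empty cluster keeps its old centre
        new = [tuple(sum(ch) / len(g) for ch in zip(*g)) if g else centers[k]
               for k, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return centers


def bin_boxes(pixels: Sequence[Color], centers: Sequence[Color]) -> List[Box]:
    """Assumed bin range: bounding box of the pixels assigned to each cluster."""
    members: List[List[Color]] = [[] for _ in centers]
    for p in pixels:
        members[nearest(p, centers)].append(p)
    return [[(min(ch), max(ch)) for ch in zip(*m)] if m else [(c, c) for c in ctr]
            for m, ctr in zip(members, centers)]


def bin_of(p: Color, boxes: Sequence[Box], centers: Sequence[Color]) -> Optional[int]:
    inside = [u for u, box in enumerate(boxes)
              if all(lo <= x <= hi for x, (lo, hi) in zip(p, box))]
    if not inside:
        return None  # the pixel falls into no bin
    # overlapping boxes: the nearest cluster centre wins
    return min(inside, key=lambda u: sqdist(p, centers[u]))


def color_model(pixels: Sequence[Color], boxes: Sequence[Box], centers: Sequence[Color]):
    """Per bin u: (beta_u = n_u / N, mean vector mu_u, per-channel variances)."""
    groups: List[List[Color]] = [[] for _ in boxes]
    for p in pixels:
        u = bin_of(p, boxes, centers)
        if u is not None:
            groups[u].append(p)
    n_ch = len(pixels[0])
    model = []
    for g in groups:
        if not g:
            model.append((0.0, (0.0,) * n_ch, (0.0,) * n_ch))
            continue
        mean = tuple(sum(ch) / len(g) for ch in zip(*g))
        var = tuple(sum((x - m) ** 2 for x in ch) / len(g) for ch, m in zip(zip(*g), mean))
        model.append((len(g) / len(pixels), mean, var))
    return model
```

With real data, `pixels` would hold the RGB values of the reference region and `d` would lie in the 4 to 8 range used in the paper; the same `bin_of` function would label candidate pixels before their statistics are gathered.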
Similarity between two component distributions $p_u\big(I(x); \beta_u, \mu_u, \Sigma_u\big)$ and $q_u\big(I(x); \beta'_u, \mu'_u, \Sigma'_u\big)$ is measured with the Bhattacharyya coefficient $\rho(p_u, q_u) = \int p_u^{1/2}\, q_u^{1/2}\, dI(x)$. Evaluating the integral gives

$$\rho(p_u, q_u) = c_u \exp\!\Big(-\frac{1}{4}(\mu_u - \mu'_u)^{T}(\Sigma_u + \Sigma'_u)^{-1}(\mu_u - \mu'_u)\Big), \tag{4}$$

where $c_u$ is given by

$$c_u = \big(2^{3}\beta_u\beta'_u\big)^{1/2}\left(\frac{|\Sigma_u|^{1/2}\,|\Sigma'_u|^{1/2}}{|\Sigma_u + \Sigma'_u|}\right)^{1/2}. \tag{5}$$

The factor $2^{3}$ comes from the three color channels; since it is common to all bins, it does not change which candidate maximizes the similarity. The similarity measure between the two distributions $p = \{p_u\}$ and $q = \{q_u\}$ is then defined as

$$\rho(p, q) = \sum_{u=1}^{d}\rho(p_u, q_u). \tag{6}$$

2.2.1. Simplification of the color model

Assuming that the gray-level distributions of the different channels in each bin are independent of each other, the covariance matrix becomes diagonal and the similarity measure can be simplified. Let $\mu_u = [m_{u,1}\; m_{u,2}\; m_{u,3}]^{T}$ and $\Sigma_u = \mathrm{diag}\{\sigma^2_{u,1}, \sigma^2_{u,2}, \sigma^2_{u,3}\}$; then the similarity between two component distributions, Eq. (4), simplifies to

$$\rho(p_u, q_u) = c_u \exp\!\left(-\frac{1}{4}\sum_{j=1}^{3}\frac{(m_{u,j} - m'_{u,j})^2}{\sigma^2_{u,j} + \sigma'^2_{u,j}}\right), \tag{7}$$

where $c_u$ takes the form

$$c_u = \big(2^{3}\beta_u\beta'_u\big)^{1/2}\prod_{j=1}^{3}\left(\frac{\sigma_{u,j}\,\sigma'_{u,j}}{\sigma^2_{u,j} + \sigma'^2_{u,j}}\right)^{1/2}. \tag{8}$$

The advantage of this assumption is that the means and variances can be evaluated with array index operations through the Integral Images described in Section 3.1.

Fig. 1. Adaptive partition of color space. From left to right: a reference image (size: 73 × 69) of a human face, the histogram of the reference model, and the non-uniform histogram bins in RG space.

2.2.2. Remarks

A histogram models the probability $p_u$ of a pixel $I(x)$ falling into the $u$th bin. It is instructive to compare the forms of $p_u$ for different definitions of the histogram:

$$p_u = \begin{cases}\beta_u & \text{for a traditional histogram},\\ \beta_u\, G(\mu_u, \Sigma_u) & \text{for the histogram proposed in this paper}.\end{cases}$$

A traditional histogram only counts the number of pixels belonging to each bin without modelling the color distribution within the bin; the underlying assumption is that the pixels within a bin are uniformly distributed. The histogram proposed here, in addition to counting the pixels, models the distribution within each bin as a Gaussian.

3. Fast algorithm based on Integral Images for object tracking

Exhaustive search for the maximal mode via histogram comparison is computationally prohibitive in real-time tracking applications. However, with the Integral Images proposed below, a brute-force search becomes feasible. Motivated by the work of Viola and Jones [19], we presented a straightforward method to compute histograms by introducing the concept of the Integral Histogram Image [20]. Porikli independently presented the concept of the Integral Histogram and analyzed its computational complexity at length [13]. With either method, the histogram of a rectangular region of any size can be obtained with fast array index operations. In this paper we use the method introduced in [20] to compute histograms. Furthermore, we extend the work of Viola and Jones by presenting Integral Images for computing the means and variances of the three channels in each bin.

3.1. Computation of color distribution through Integral Images

Given the original color image $D(x, y) = \big(D_j(x, y),\ j = 1, 2, 3\big)$, we present Integral Images $I^{b}_{u}(x, y)$, $I^{m}_{u,j}(x, y)$ and $I^{s}_{u,j}(x, y)$, where $u = 1, \ldots, d$ and $j = 1, 2, 3$, for computing the histogram and the mean and variance of the gray level of the three channels. Assuming the image $D(x, y)$ is of size $M \times N$ pixels, the corresponding Integral Image for the histogram is an array with $(M+1)\times(N+1)$ rows and $d$ columns. The Integral Image $I^{b}_{u}(x, y)$ at location $(x, y)$ is the number of pixels falling within the $u$th bin above and to the left of $(x, y)$ in the image:

$$I^{b}_{u}(x, y) = \sum_{x' \le x,\ y' \le y}\delta_u(x', y'), \tag{9}$$

where $\delta_u(x', y') = 1$ if the pixel at location $(x', y')$ belongs to the $u$th bin and $\delta_u(x', y') = 0$ otherwise. Using the following pair of recurrences,

$$i^{b}_{u}(x, y) = i^{b}_{u}(x-1, y) + \delta_u(x, y), \qquad I^{b}_{u}(x, y) = I^{b}_{u}(x, y-1) + i^{b}_{u}(x, y), \qquad u = 1, \ldots, d, \tag{10}$$

where $i^{b}_{u}(0, y) = 0$ and $I^{b}_{u}(x, 0) = 0$ for all $x$ and $y$, the Integral Image for the histogram can be computed in one pass over the original image.
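The one-pass recurrences of Eq. (10), together with their analogues for the per-bin channel sums and squared sums (Eqs. (12)–(14)), can be sketched as follows. This is an illustrative pure-Python version, not the author's C++ code: `labels[y][x]` is an assumed precomputed bin index (or `None` for a pixel in no bin), every plane is padded with a zero first row and column, and `rect_moments` applies the four-reference lookup of Eq. (11) and the mean/variance formulas of Eq. (15). The function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Plane = List[List[float]]


def integral_images(labels: Sequence[Sequence[Optional[int]]],
                    image: Sequence[Sequence[Tuple[float, ...]]], d: int) -> List[List[Plane]]:
    """planes[u][k] for bin u: k = 0 pixel count (Eq. 9), k = 1..C channel sums,
    k = C+1..2C squared channel sums (Eq. 12). Each plane is (H+1) x (W+1)."""
    H, W, C = len(labels), len(labels[0]), len(image[0][0])
    K = 1 + 2 * C
    planes = [[[[0.0] * (W + 1) for _ in range(H + 1)] for _ in range(K)] for _ in range(d)]
    for y in range(1, H + 1):
        row = [[0.0] * K for _ in range(d)]  # running row sums i(x, y)
        for x in range(1, W + 1):
            u = labels[y - 1][x - 1]
            if u is not None:
                px = image[y - 1][x - 1]
                row[u][0] += 1.0
                for j in range(C):
                    row[u][1 + j] += px[j]
                    row[u][1 + C + j] += px[j] * px[j]
            for b in range(d):  # I(x, y) = I(x, y - 1) + i(x, y)
                for k in range(K):
                    planes[b][k][y][x] = planes[b][k][y - 1][x] + row[b][k]
    return planes


def rect_sum(P: Plane, x: int, y: int, w: int, h: int) -> float:
    """Eq. (11): sum over the w x h window whose top-left pixel is (x, y), 0-based."""
    return P[y + h][x + w] - P[y][x + w] - P[y + h][x] + P[y][x]


def rect_moments(planes: List[List[Plane]], x: int, y: int, w: int, h: int):
    """Per bin: (n_u, channel means, channel variances) of the window, Eq. (15)."""
    out = []
    for P in planes:
        C = (len(P) - 1) // 2
        n = rect_sum(P[0], x, y, w, h)
        if n == 0:
            out.append((0.0, [0.0] * C, [0.0] * C))
            continue
        m = [rect_sum(P[1 + j], x, y, w, h) / n for j in range(C)]
        # clamp tiny negative values caused by floating-point cancellation
        v = [max(rect_sum(P[1 + C + j], x, y, w, h) / n - m[j] ** 2, 0.0) for j in range(C)]
        out.append((n, m, v))
    return out
```

After the single pass, any window costs a fixed number of array references per plane regardless of its size, which is what makes the exhaustive search of Section 3.2 affordable.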
Given any rectangle, its histogram $n_u$ ($u = 1, \ldots, d$) can be determined with $4d$ array references using the Integral Histogram Image (see Fig. 2 and Eq. (11)):

$$n_u = I^{b}_{u}(x+w, y+h) - I^{b}_{u}(x+w, y) - I^{b}_{u}(x, y+h) + I^{b}_{u}(x, y), \qquad u = 1, \ldots, d, \tag{11}$$

where $I^{b}_{u}(x, 0) = I^{b}_{u}(0, y) = 0$, and $w$ and $h$ are the width and height of the rectangle, respectively. The Integral Images for means and variances are defined as

$$I^{m}_{u,j}(x, y) = \sum_{x' \le x,\ y' \le y}\delta_u(x', y')\,D_j(x', y'), \qquad I^{s}_{u,j}(x, y) = \sum_{x' \le x,\ y' \le y}\delta_u(x', y')\,D_j(x', y')^{2}, \qquad u = 1, \ldots, d;\ j = 1, 2, 3. \tag{12}$$

Fig. 2. Construction of the Integral Image for the histogram. On the left is a rectangle with width w and height h; on the right, each plane corresponds to the Integral Image plane of one bin.

With the following two pairs of recurrences,

$$i^{m}_{u,j}(x, y) = i^{m}_{u,j}(x-1, y) + D_j(x, y)\,\delta_u(x, y), \qquad I^{m}_{u,j}(x, y) = I^{m}_{u,j}(x, y-1) + i^{m}_{u,j}(x, y), \tag{13}$$

$$i^{s}_{u,j}(x, y) = i^{s}_{u,j}(x-1, y) + D_j(x, y)^{2}\,\delta_u(x, y), \qquad I^{s}_{u,j}(x, y) = I^{s}_{u,j}(x, y-1) + i^{s}_{u,j}(x, y), \tag{14}$$

for $u = 1, \ldots, d$ and $j = 1, 2, 3$, the Integral Images for means and variances can be computed in one pass over the original image. Based on Eqs. (13) and (14), the mean and variance of the $j$th channel in the $u$th bin are obtained with fast array index operations:

$$m_{u,j} = \frac{1}{n_u}\Big(I^{m}_{u,j}(x+w, y+h) - I^{m}_{u,j}(x+w, y) - I^{m}_{u,j}(x, y+h) + I^{m}_{u,j}(x, y)\Big),$$

$$\sigma^{2}_{u,j} = \frac{1}{n_u}\Big(I^{s}_{u,j}(x+w, y+h) - I^{s}_{u,j}(x+w, y) - I^{s}_{u,j}(x, y+h) + I^{s}_{u,j}(x, y)\Big) - m_{u,j}^{2}, \qquad u = 1, \ldots, d;\ j = 1, 2, 3. \tag{15}$$

3.2. Object tracking algorithm

The object shape is represented by a rectangle that can move freely in the image plane and change its width and height with the same scale factor. Given the object location (position and size) in the previous frame, an exhaustive search for the maximal mode is made in the neighboring region, which is twice the object size.
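The search just described can be sketched by combining the simplified similarity of Eqs. (6)–(8) with a scan over candidate windows. In this hedged sketch, `window_model(x, y, w, h)` is a hypothetical caller-supplied function that returns the candidate model as a list of `(beta', means, variances)` per bin (in practice it would be backed by the Integral Images). The variance floor `var_floor`, the scale factors in `scales`, and the rounding of the 10% search step are our assumptions, not values stated in the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple

BinModel = Tuple[float, Sequence[float], Sequence[float]]  # (beta_u, means, variances)


def similarity(ref: Sequence[BinModel], cand: Sequence[BinModel],
               var_floor: float = 1e-3) -> float:
    """Eqs. (6)-(8): sum over bins of the Bhattacharyya coefficient between two
    weighted Gaussians with diagonal covariance."""
    total = 0.0
    for (b, m, v), (b2, m2, v2) in zip(ref, cand):
        if b <= 0.0 or b2 <= 0.0:
            continue  # an empty bin contributes nothing
        c = math.sqrt(b * b2)
        e = 0.0
        for mj, vj, mj2, vj2 in zip(m, v, m2, v2):
            vj, vj2 = max(vj, var_floor), max(vj2, var_floor)  # avoid division by zero
            # per-channel factor of c_u: sqrt(2 * sigma * sigma' / (sigma^2 + sigma'^2))
            c *= math.sqrt(2.0 * math.sqrt(vj * vj2) / (vj + vj2))
            e += (mj - mj2) ** 2 / (vj + vj2)
        total += c * math.exp(-0.25 * e)
    return total


def exhaustive_search(window_model: Callable[[int, int, int, int], List[BinModel]],
                      ref: Sequence[BinModel], box: Tuple[int, int, int, int],
                      img_w: int, img_h: int,
                      scales: Sequence[float] = (0.8, 1.0, 1.2)):
    """Scan all windows inside a region twice the previous object size, centred on it."""
    x0, y0, w0, h0 = box
    cx, cy = x0 + w0 / 2.0, y0 + h0 / 2.0
    best_score, best_box = -1.0, box
    for s in scales:
        w, h = max(1, round(w0 * s)), max(1, round(h0 * s))
        step_x, step_y = max(1, round(0.1 * w)), max(1, round(0.1 * h))  # 10% steps
        x_lo, x_hi = max(0, math.floor(cx - w0)), min(img_w - w, math.floor(cx + w0) - w)
        y_lo, y_hi = max(0, math.floor(cy - h0)), min(img_h - h, math.floor(cy + h0) - h)
        for y in range(y_lo, y_hi + 1, step_y):
            for x in range(x_lo, x_hi + 1, step_x):
                score = similarity(ref, window_model(x, y, w, h))
                if score > best_score:
                    best_score, best_box = score, (x, y, w, h)
    return best_score, best_box
```

Because every window in the region is scored, the returned box is the best candidate in the region rather than the nearest local maximum, which is the property contrasted with mean shift in the text.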
To adapt to scale variation, the object size is changed by ±0.2 in scale and the exhaustive search is repeated; the candidate with the maximum similarity is adopted. The search step in the x and y directions is 10% of the object width and height, respectively. Exhaustive search guarantees that the global maximum within the search region is found, which is superior to a gradient-based algorithm such as mean shift that can only reach a local maximum. Fig. 3 shows an example. In the left image the girl's face is tracked while it is occluded by the man's face nearby. The right figure shows the probability map, in which the left, global maximum corresponds to the object and the right, local maximum to the man. The convergence of gradient descent (ascent)-based algorithms such as mean shift depends on the initial condition, and they may be trapped in the local maximum. Thanks to the proposed Integral Images, the similarity measure can be evaluated at negligible computational cost. Note that for tracking applications only the Integral Images of the neighboring region surrounding the object need to be computed. This is very efficient, so despite the brute-force search in the neighborhood the algorithm runs very fast.

4. Experiments

The program is written in C++ and runs on a laptop with a 1.8 GHz Intel Pentium-M 745 (Dothan) CPU and 512 MB of memory. The cluster number d is 6 in the proposed algorithm, and the mean shift algorithm is implemented with 32 × 32 × 32 bins. Both algorithms use the RGB color space. Both are initialized by hand in the first frame, and the ground truth is labelled manually. Four measures are adopted to compare the two algorithms: the x and y coordinates and the size of the computed rectangle, and the area of the overlapping region between the true bounding rectangle (ground truth) and the computed one (tracking result).
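The evaluation measures can be made concrete with a small sketch. The centroid errors and the overlap area are standard rectangle arithmetic. The normalisation of the area error is our assumption (one minus the overlap divided by the area of the true rectangle); we chose it because it agrees with the observation in Section 4.1 that an oversized rectangle almost enclosing the true one yields a very small area error. `no_overlap_fraction` counts the frames in which the two rectangles do not overlap at all. All names are hypothetical.

```python
from typing import Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height), top-left corner


def centroid_errors(truth: Rect, est: Rect) -> Tuple[float, float]:
    """Absolute x and y distances between the rectangle centres, in pixels."""
    (xt, yt, wt, ht), (xe, ye, we, he) = truth, est
    return abs((xt + wt / 2) - (xe + we / 2)), abs((yt + ht / 2) - (ye + he / 2))


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two axis-aligned rectangles (0 if disjoint)."""
    ox = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    oy = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return max(ox, 0.0) * max(oy, 0.0)


def area_error(truth: Rect, est: Rect) -> float:
    """Assumed normalisation: 1 - overlap / area of the true rectangle."""
    return 1.0 - overlap_area(truth, est) / (truth[2] * truth[3])


def no_overlap_fraction(truths: Sequence[Rect], ests: Sequence[Rect]) -> float:
    """Fraction of frames in which the two rectangles do not overlap at all."""
    lost = sum(1 for t, e in zip(truths, ests) if overlap_area(t, e) == 0.0)
    return lost / len(truths)
```

Under this normalisation an estimate that fully encloses the ground truth scores an area error of zero however large it is, which is why the scale error is reported alongside it.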
In addition, to evaluate the amount of time during which the object is not effectively followed, we use the temporal fraction of frames in which there is no overlap between the true bounding rectangle and the computed one. In most of our experiments this temporal fraction is zero, which means effective tracking throughout the whole sequence, so in the following only the cases where it is not zero are explicitly indicated.

Fig. 3. Exhaustive search guarantees that the global maximum is found. In the left image the girl's face is tracked while it is occluded by the man's face nearby. The right figure shows the probability map, in which the left, global maximum corresponds to the object and the right, local maximum to the man.

4.1. Person tracking

The experiment is conducted on a video clip taken from the image sequence (size: 388 × 284) named "ThreePastShop2cor.mpg" (frames 480–915) [18]. Among the three subjects walking in the corridor, the one dressed in red on the left side is followed. Note that from frame 260 the illumination varies, and from frames 360 to 380 another person gradually occludes the subject of interest from the left. Despite these difficulties, both the proposed algorithm and the mean shift algorithm succeed in following the object throughout the complete sequence. The tracking errors versus frame index are plotted in Fig. 4, and some typical tracking results of the proposed algorithm are shown in Fig. 5. The average tracking errors and times of both algorithms are shown in Table 1. The tracking errors of the x and y coordinates and of scale are smaller with the proposed algorithm than with the mean shift algorithm. The variances of the y coordinate and of scale are also smaller with the proposed algorithm, while its x-coordinate variance is slightly larger than that of mean shift.
During the occlusion and the short period immediately following it (frames 360–420), the scale error of the mean shift algorithm becomes very large, as shown in the bottom-left panel of Fig. 4. In this period the bounding rectangle computed by mean shift tends to grow and almost encloses the true one, so its area error becomes very small.

Fig. 4. Comparison of errors for person tracking between the mean shift algorithm (blue, dotted) and the proposed algorithm (red, solid). From left to right, top to bottom: errors of the x coordinate and y coordinate of the object centroid (pixels), scale error, and error of the overlapping region area, each versus frame index.

Table 1 also shows the average tracking time per frame for the mean shift algorithm (16 ms) and the proposed algorithm (7 ms), of which most (5 ms) is spent computing the Integral Images.

4.2. Human face tracking

The face image sequence (size: 256 × 192) was recorded in a typical office environment [17]. Comparisons between the two algorithms are shown in Fig. 6, with average errors and variances in Table 2. Some typical tracking results of the proposed algorithm are presented in Fig. 7. Note that the tracking errors of both algorithms on this sequence are larger than on the person sequence.
This is not surprising, because the face sequence is more challenging owing to motion of both the camera and the subject, disappearance of the object, severe illumination changes and occlusion by a similar object. From frames 140 to 165 the subject gradually turns her back to the camera and the face becomes invisible, and in the following 100 consecutive frames the illumination changes considerably. The face becomes unseen again when the girl turns around from frames 270 to 360. When the face is invisible, both trackers deviate from the target and the errors become large, because the reference color model is built from the subject's frontal face. Since the reference color model contains some hair pixels, the deviation is limited and tracking recovers when the girl faces the camera again. From frames 630 to 710 a man's face gradually occludes and then uncovers the tracked face, and Fig. 8 shows the different behaviors of the two algorithms. When a very similar object appears nearby, two local maxima appear (see Fig. 3); the gradient-based mean shift is trapped in a local maximum and locks onto the man's face. Fig. 6 shows that the x, y and scale errors of mean shift become very large, whereas the proposed algorithm, which performs an exhaustive search, handles this situation successfully. As Table 2 shows, the average x, y and scale errors of the proposed algorithm are all smaller than those of the mean shift algorithm.

Fig. 5. Some typical tracking results using the proposed algorithm. From left to right, top to bottom: frames 20, 80, 148, 220, 322, 369, 381 and 430.
Table 1. Comparison of tracking errors (mean ± standard deviation) and time for person tracking

| Algorithm | X error (pixels) | Y error (pixels) | Scale error (%) | Area error (%) | Tracking time^a (ms) |
|---|---|---|---|---|---|
| Mean shift | 2.3 ± 1.8 | 4.7 ± 2.7 | 0.12 ± 0.26 | 0.14 ± 0.15 | 16 |
| Proposed | 2.0 ± 1.9 | 3.4 ± 2.3 | 0.02 ± 0.02 | 0.15 ± 0.11 | 7 (5) |

^a The value in parentheses is the average time to compute the Integral Images.

The average tracking time is 20 ms for mean shift and 15 ms for the proposed algorithm, of which 12 ms is spent computing the Integral Images. The time fraction of the proposed algorithm is significantly smaller (0.09 versus 0.22 in Table 2), which indicates that the object is successfully tracked in most frames.

Fig. 6. Comparison of errors for face tracking between the mean shift algorithm (blue, dotted) and the proposed algorithm (red, solid). From left to right, top to bottom: errors of the x coordinate and y coordinate of the object centroid (pixels), scale error, and error of the overlapping region area, each versus frame index.

Table 2. Comparison of tracking errors (mean ± standard deviation) and time for face tracking

| Algorithm | X error (pixels) | Y error (pixels) | Scale error (%) | Area error^a (%) | Tracking time^b (ms) |
|---|---|---|---|---|---|
| Mean shift | 12.0 ± 13.4 | 17.0 ± 26.0 | 0.24 ± 0.24 | 0.57 ± 0.41 (0.22) | 20 |
| Proposed | 10.2 ± 10.7 | 13.3 ± 14.1 | 0.14 ± 0.17 | 0.51 ± 0.32 (0.09) | 15 (12) |

^a The value in parentheses is the time fraction in which the object is not effectively tracked.
^b The value in parentheses is the average time to compute the Integral Images.

4.3. Performance evaluation of the proposed algorithm vs. cluster number and color space

The cluster number in the above experiments is 6, and it is interesting to see how performance varies with the cluster number; this is shown in Table 3 for person tracking and in Table 4 for face tracking.

As Table 3 shows, as the cluster number increases the scale error becomes larger whereas the area error becomes smaller, and the y error fluctuates. The x error increases gradually from d = 12 to 24 but is still smaller than at d = 6. For face tracking, the y, scale and area errors at d = 12, 18 and 24 are smaller than at d = 6, while the x error is smaller at d = 12 but larger at d = 18 and 24. No consistent increasing or decreasing tendency is evident, since each error almost always fluctuates as d increases. For both examples, tracking time increases significantly as the cluster number grows. Overall, the performance of the proposed algorithm improves slightly as the cluster number increases, but at the cost of much more CPU time. This shows that a small number of clusters is generally sufficient for the proposed algorithm to describe the color information of a target well.

For simplicity of the color model and computational efficiency, we assume that the gray-level distributions in the different RGB channels are independent. Although correlations exist between the channels of RGB space, the experiments in Sections 4.1, 4.2 and 4.4 show that the color model works well under this assumption. Taking the full covariance matrix into account may improve the performance of the algorithm, but at the cost of a huge increase in computational load, as no fast algorithm for it is currently available.

Fig. 8. Comparison of the two algorithms when a similar object occludes the subject. The top row shows results of the proposed algorithm and the bottom row those of the mean shift algorithm. From left to right: frames 630, 662, 670, 690 and 700.

Table 3. Performance vs. cluster number using the proposed algorithm for person tracking

| Cluster number d | X error (pixels) | Y error (pixels) | Scale error (%) | Area error (%) | Tracking time^a (ms) |
|---|---|---|---|---|---|
| 6 | 2.01 ± 1.94 | 3.37 ± 2.34 | 0.021 ± 0.021 | 0.153 ± 0.109 | 7 (5) |
| 12 | 1.75 ± 1.57 | 4.30 ± 2.41 | 0.039 ± 0.097 | 0.132 ± 0.100 | 15 (12) |
| 18 | 1.78 ± 1.48 | 2.78 ± 2.45 | 0.060 ± 0.093 | 0.057 ± 0.036 | 24 (19) |
| 24 | 1.83 ± 1.55 | 3.62 ± 2.75 | 0.074 ± 0.101 | 0.064 ± 0.039 | 32 (26) |

^a The value in parentheses is the average time to compute the Integral Images.

Fig. 7. Some typical tracking results using the proposed algorithm. From left to right, top to bottom: frames 1, 90, 160, 260, 320, 378, 460 and 700.

It is interesting to examine the performance of the proposed algorithm in other color spaces with greater channel separation, particularly YCbCr, CIELAB and HSV. For person tracking, as Table 5 shows, compared with RGB space the y error in YCbCr space increases while the other three errors decrease, and almost all errors increase in CIELAB and HSV. For the more challenging face sequence, as shown in Table 6, the tracker fails in both YCbCr and HSV: the object is lost from frame 370 in the former and from frame 150 in the latter and is never recovered. In CIELAB space, the x and y errors are larger than in RGB space, whereas the scale error, the area error and the time fraction are smaller.
From the experiments above we see that, among several factors including our independence assumption, illumination and appearance changes may play the dominant role in how the performance of a tracking algorithm varies across color spaces. To handle this problem, some researchers have investigated how to dynamically select the best of several color spaces [16], or the best color features formed by linear combinations of the channels of a color space [3].

4.4. More tracking results

Further experiments were made to test the performance of the algorithm on image sequences covering different scenarios: sequences 1 and 2 concern vehicle tracking, and sequences 3 and 4 pedestrian tracking. Tracking results are summarized in Table 7. In sequence 1 (frames 560 to 760, size: 768 × 576) [9], a car moving at increasing speed on the highway was captured from behind by a camera installed in another vehicle following it. In this scenario both the foreground and the background are moving, and the appearance changes are non-trivial as the tracked car moves farther and farther away. As seen in the first column of Table 7, all tracking errors of the proposed algorithm except the y error are smaller than those of mean shift. Mean shift takes about 20 ms per frame on average, compared with 16 ms for the proposed algorithm.

Table 4. Performance vs. cluster number using the proposed algorithm for face tracking

| Cluster number d | X error (pixels) | Y error (pixels) | Scale error (%) | Area error^a (%) | Tracking time^b (ms) |
|---|---|---|---|---|---|
| 6 | 10.15 ± 10.71 | 13.30 ± 14.07 | 0.143 ± 0.173 | 0.514 ± 0.316 (0.094) | 15 (12) |
| 12 | 8.85 ± 10.51 | 10.74 ± 11.16 | 0.098 ± 0.147 | 0.386 ± 0.315 (0.011) | 35 (32) |
| 18 | 10.51 ± 9.58 | 11.34 ± 13.15 | 0.097 ± 0.121 | 0.427 ± 0.319 (0.040) | 50 (44) |
| 24 | 10.98 ± 1.68 | 9.79 ± 10.34 | 0.115 ± 0.162 | 0.409 ± 0.329 (0.045) | 67 (61) |

^a The value in parentheses is the time fraction in which the object is not effectively tracked.
^b The value in parentheses is the average time to compute the Integral Images.

Table 5. Performance vs. color space using the proposed algorithm for person tracking (cluster number is 6)

| Color space | X error (pixels) | Y error (pixels) | Scale error (%) | Area error (%) |
|---|---|---|---|---|
| RGB | 2.01 ± 1.94 | 3.37 ± 2.34 | 0.021 ± 0.021 | 0.153 ± 0.109 |
| YCbCr | 1.52 ± 1.67 | 3.81 ± 2.54 | 0.019 ± 0.079 | 0.130 ± 0.108 |
| CIELAB | 2.76 ± 2.94 | 4.03 ± 2.18 | 0.008 ± 0.010 | 0.200 ± 0.146 |
| HSV | 2.58 ± 2.23 | 5.99 ± 3.77 | 0.036 ± 0.035 | 0.324 ± 0.217 |

Table 6. Performance vs. color space using the proposed algorithm for face tracking (cluster number is 6)

| Color space | X error (pixels) | Y error (pixels) | Scale error (%) | Area error^a (%) |
|---|---|---|---|---|
| RGB | 10.15 ± 10.71 | 13.30 ± 14.07 | 0.143 ± 0.173 | 0.514 ± 0.316 (0.224) |
| YCbCr | failed | failed | failed | failed |
| CIELAB | 12.25 ± 11.70 | 18.57 ± 21.77 | 0.109 ± 0.142 | 0.513 ± 0.319 (0.165) |
| HSV | failed | failed | failed | failed |

^a The value in parentheses is the time fraction in which the object is not effectively tracked.

In the second sequence (frames 990 to 1350, size: 768 × 576) [10], a hatchback enters the view from the left, moves forward along the road past a row of parked vehicles, and finally backs into a parking slot. In this situation the nearby parked cars, which are similar in appearance to the hatchback, are a threat to the trackers. The second column of Table 7 shows that the tracking results of the proposed algorithm are better than those of mean shift except for the y coordinate. Mean shift takes about 21 ms and the proposed algorithm 19 ms per frame. The scenario in sequence 3 (frames 208 to 430, size: 720 × 576) [11] is a train station hall. A person walks quickly toward the exit of the hall, away from the camera, and because the person walks fast the object's appearance suffers severe motion blur. As indicated by the third column of Table 7, both algorithms have almost the same scale error. While the y error of the proposed algorithm is smaller than that of mean shift, its x error is larger.
The main reason that the area error of mean shift is smaller is that, when the illumination changes from about frame 380, the bounding rectangle computed by mean shift tends to grow and almost encloses the true one. The average tracking time is 23 ms for mean shift and 10 ms for the proposed algorithm. In sequence 4 (frames 126 to 280, size: 720 × 576) [12], a lady walks straight from left to right. During frames 230 to 250 a man partially occludes the lady while walking past. In this scenario all tracking errors of the proposed algorithm are smaller than those of mean shift. The average time per frame is 25 ms for mean shift and 17 ms for the proposed algorithm.

5. Conclusions

In this paper a color model based on K-means clustering is proposed, in which the color space is partitioned adaptively and the histogram bins are determined accordingly. Moreover, the distribution of the multi-channel gray levels within each bin is modelled to capture more information on the object color. To measure the similarity between two color models, a similarity measure based on the Bhattacharyya distance is defined and its simplified form is derived. Thanks to the proposed Integral Images, the tracking algorithm can search exhaustively yet efficiently for the global maximal mode in the neighboring region. Comparisons with the well-known mean shift algorithm show that the proposed algorithm performs better while retaining the same or lower computational cost. Currently the bin number is set empirically, which works for all our experiments.
Nevertheless, it would be desirable to determine the number of bins automatically, so as to account for illumination changes or noise while retaining good discriminative power. Once the Integral Images are computed, the color model can be evaluated very quickly. The Integral Images together with the color model can therefore be applied to tasks that require a brute-force yet efficient search, such as object detection and sub-image retrieval, which are the subject of our future work.

Table 7
Comparisons of tracking results with different image sequences

                          Sequence 1      Sequence 2      Sequence 3      Sequence 4
X error      Mean shift   3.6     3.7     8.0     7.1     2.2     2.0     8.8     9.6
             Proposed     2.9     2.5     6.0     4.6     3.2     2.2     7.1     7.5
Y error      Mean shift   1.6     1.6     4.0     3.0     3.0     2.4     10.1    14.3
             Proposed     1.8     1.6     8.0     7.5     2.6     2.2     9.1     6.7
Scale error  Mean shift   0.0278  0.0314  0.041   0.026   0.021   0.022   0.118   0.087
             Proposed     0.0114  0.0120  0.026   0.018   0.021   0.022   0.035   0.050
Area error   Mean shift   0.0554  0.0393  0.319   0.124   0.280   0.118   0.301   0.370
             Proposed     0.187   0.0951  0.315   0.109   0.411   0.085   0.255   0.271
Time*        Mean shift   20              21              23              25
             Proposed     16 (14)         19 (16)         10 (8)          17 (15)

Units: X and Y error in pixels; scale and area error in %; tracking time in ms.
* The value in parentheses is the average time spent computing the Integral Images.

L. Peihua / Signal Processing: Image Communication 21 (2006) 676–687

Acknowledgments

The work was supported by the National Natural Science Foundation of China (NSFC) under Grant Number 60505006, the Natural Science Foundation of Hei Long Jiang Province (F200512), the Science and Technology Research Project of the Educational Bureau of Hei Long Jiang Province (1151G033), the Postdoctoral Fund for Scientific Research of Hei Long Jiang Province (LHK-04093) and the Science Fund of Hei Long Jiang University for Distinguished Young Scholars (JC200406).

References

[1] S.T. Birchfield, S. Rangarajan, Spatiograms versus histograms for region-based tracking, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, June 2005, pp. 1158–1163.
[2] H.-D. Cheng, X.-H. Jiang, Y. Sun, J. Wang, Color image segmentation: advances and prospects, Pattern Recognition 34 (12) (2001) 2259–2281.
[3] R. Collins, Y. Liu, On-line selection of discriminative tracking features, in: Proceedings of the IEEE International Conference on Computer Vision, Nice, France, 2003, pp. 346–352.
[4] R.T. Collins, Mean-shift blob tracking through scale space, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2003, pp. 234–241.
[5] D. Comaniciu, V. Ramesh, P. Meer, Real-time tracking of non-rigid objects using mean shift, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2000, pp. 142–149.
[6] A. Gersho, R. Gray, Vector Quantization and Signal Compression, Kluwer Publishers, Dordrecht, 1992.
[7] A.K. Jain, R.C. Dubes, Algorithms for Clustering Data, Prentice-Hall, Englewood Cliffs, NJ, 1988.
[8] A.K. Jain, M. Murty, P. Flynn, Data clustering: a review, ACM Comput. Surv. 31 (3) (1999) 264–323.
[9] PETS2001 datasets, The University of Reading, UK, found at URL: <http://peipa.essex.ac.uk/ipa/pix/pets/PETS2001/DATASET5/TESTING/CAMERA1_JPEGS/>.
[10] PETS2001 datasets, The University of Reading, UK, found at URL: <http://peipa.essex.ac.uk/ipa/pix/pets/PETS2001/DATASET5/TRAINING/CAMERA1_JPEGS/>.
[11] PETS 2006 dataset S7 camera 4, ISCAPS consortium, found at URL: <http://ftp.cs.rdg.ac.uk/PETS2006/S3-T7-A.zip>.
[12] PETS 2006 dataset S7 camera 3, ISCAPS consortium, found at URL: <http://ftp.cs.rdg.ac.uk/PETS2006/S3-T7-A.zip>.
[13] F. Porikli, Integral histogram: a fast way to extract histograms in Cartesian spaces, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 2005, pp. 829–863.
[14] Y. Raja, S.J. McKenna, S. Gong, Colour model selection and adaptation in dynamic scene, in: Proceedings of the European Conference on Computer Vision, 1998, pp. 460–474.
[15] C. Stauffer, W.E. Grimson, Learning patterns of activity using real-time tracking, IEEE Trans. Pattern Anal. Machine Intell. 22 (8) (2000) 747–757.
[16] H. Stern, B. Efros, Adaptive color space switching for tracking under varying illumination, Image Vision Comput. 23 (3) (2005) 353–364.
[17] Test image sequences for face tracking by Stan Birchfield, found at URL: <http://vision.stanford.edu/birch/headtracker/seq/>.
[18] The EC Funded CAVIAR project/IST 2001 37540, found at URL: <http://homepages.inf.ed.ac.uk/rbf/CAVIAR/>.
[19] P. Viola, M. Jones, Rapid object detection using a boosted cascade of simple features, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2001, pp. 511–518.
[20] H. Wang, P. Li, T. Zhang, Proposal of novel histogram features for face detection, in: International Conference on Advances in Pattern Recognition, Bath, UK, 2005, pp. 334–343.
[21] C. Wren, A. Azarbayejani, T. Darrell, A.P. Pentland, Pfinder: real-time tracking of the human body, IEEE Trans. Pattern Anal. Machine Intell. 19 (7) (1997) 780–785.
[22] C. Yang, R. Duraiswami, L. Davis, Efficient mean-shift tracking via a new similarity measure, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005, pp. 176–183.
[23] Q. Zhao, H. Tao, Object tracking using color correlogram, in: IEEE Workshop on VS-PETS, 2005.
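The tracking scheme summarized in Section 5 (K-means partitioning of the color space, histogram matching with a Bhattacharyya-based measure, and an exhaustive neighborhood search made cheap by Integral Images) can be illustrated with a minimal pure-Python sketch. It is a simplification under stated assumptions, not the paper's implementation: it keeps only per-bin pixel counts, omitting both the Gaussian model of the multi-channel gray level within each bin and the paper's simplified similarity form, and it scores candidates with the plain Bhattacharyya coefficient, the sum over bins of sqrt(p_u q_u). The helper names (`kmeans`, `integral_images`, `region_hist`, `search`, `make_frame`, `demo`), the farthest-point seeding, the search radius and the synthetic frames are all hypothetical. Keeping one integral image of counts per bin, in the spirit of [13,19], lets the histogram of any rectangle be read with four lookups per bin, independent of the rectangle's size.

```python
import math


def dist2(a, b):
    """Squared Euclidean distance between two color tuples."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest(p, centers):
    """Index of the cluster center closest to color p; this index is p's bin."""
    return min(range(len(centers)), key=lambda i: dist2(p, centers[i]))


def kmeans(points, k, iters=10):
    """Lloyd's K-means on color tuples, returning k cluster centers.

    Farthest-point seeding is an assumption chosen here for determinism.
    """
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Move each center to its group mean; keep it in place if the group is empty.
        centers = [tuple(sum(ch) / len(g) for ch in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers


def integral_images(labels, k):
    """One integral image (summed-area table) of pixel counts per bin.

    tables[b][y][x] counts the pixels of bin b in rows < y and columns < x.
    """
    h, w = len(labels), len(labels[0])
    tables = [[[0] * (w + 1) for _ in range(h + 1)] for _ in range(k)]
    for b, t in enumerate(tables):
        for y in range(h):
            row_sum = 0  # running count of bin-b pixels in row y, columns 0..x
            for x in range(w):
                row_sum += labels[y][x] == b
                t[y + 1][x + 1] = t[y][x + 1] + row_sum
    return tables


def region_hist(tables, x, y, w, h):
    """Normalized histogram of the w-by-h window at (x, y): four lookups per bin."""
    counts = [t[y + h][x + w] - t[y][x + w] - t[y + h][x] + t[y][x] for t in tables]
    total = sum(counts)
    return [c / total for c in counts]


def bhattacharyya(p, q):
    """Bhattacharyya coefficient; 1.0 means identical normalized histograms."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def search(tables, ref, cx, cy, w, h, radius):
    """Score every window position within `radius` of (cx, cy); return (score, x, y)."""
    img_h, img_w = len(tables[0]) - 1, len(tables[0][0]) - 1
    best = None
    for y in range(max(0, cy - radius), min(img_h - h, cy + radius) + 1):
        for x in range(max(0, cx - radius), min(img_w - w, cx + radius) + 1):
            score = bhattacharyya(ref, region_hist(tables, x, y, w, h))
            if best is None or score > best[0]:
                best = (score, x, y)
    return best


def make_frame(ox, oy, size=20):
    """Hypothetical test frame: gray background, 4x4 object (red top, blue bottom)."""
    img = [[(100, 100, 100)] * size for _ in range(size)]
    for dy in range(4):
        for dx in range(4):
            img[oy + dy][ox + dx] = (200, 30, 30) if dy < 2 else (30, 30, 200)
    return img


def demo():
    """Learn bins and a reference model on frame 1, then search in frame 2."""
    frame1, frame2 = make_frame(4, 4), make_frame(11, 9)
    centers = kmeans([p for row in frame1 for p in row], k=3)

    def label(img):
        return [[nearest(p, centers) for p in row] for row in img]

    ref = region_hist(integral_images(label(frame1), 3), 4, 4, 4, 4)
    return search(integral_images(label(frame2), 3), ref, cx=4, cy=4, w=4, h=4, radius=8)


if __name__ == "__main__":
    print(demo())  # prints (1.0, 11, 9): best score and window position
```

Running the script prints (1.0, 11, 9): the object that moved from (4, 4) to (11, 9) is found by scoring all 13 × 13 = 169 candidate positions in the neighborhood, each at a cost of four table lookups per bin. The price of this constant-time window evaluation is k tables of (W + 1) × (H + 1) entries, which is why a compact, adaptively chosen set of bins matters.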