IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 23, NO. 9, SEPTEMBER 2014

Abstract—The segmentation of categorized objects addresses the problem of joint segmentation of a single category of object across a collection of images, where categorized objects refer to objects in the same category. Most existing methods for segmentation of categorized objects make the assumption that all images in the given image collection contain the target object. In other words, the given image collection is noise free. Therefore, they may not work well when there are some noisy images, which are not in the same category, such as those image collections gathered by a text query from modern image search engines. To overcome this limitation, we propose a method for automatic segmentation and recognition of categorized objects from noisy Web image collections. This is achieved by cotraining an automatic object segmentation algorithm that operates directly on a collection of images, and an object category recognition algorithm that identifies which images contain the target object. The object segmentation algorithm is trained on a subset of images from the given image collection, which are recognized to contain the target object with high confidence, whereas training the object category recognition model is guided by the intermediate segmentation results obtained from the object segmentation algorithm. This way, our cotraining algorithm automatically identifies the set of true positives in the noisy Web image collection, and simultaneously extracts the target objects from all the identified images. Extensive experiments validated the efficacy of our proposed approach on four data sets: 1) the Weizmann horse data set; 2) the MSRC object category data set; 3) the iCoseg data set; and 4) a new 30-categories data set, including 15,634 Web images with both hand-annotated category labels and ground truth segmentation labels. It is shown that our method compares favorably with the state-of-the-art, and has the ability to deal with noisy image collections.

Index Terms—Segmentation of categorized objects, cosegmentation, object recognition, auto-context model.

I. INTRODUCTION

WHEN fed with a collection of images in the same object category, it is beneficial to leverage the high-level information across the whole image collection to simultaneously extract a foreground object from all images, instead of segmenting the images independently through modeling just one single image. Such a problem is referred to as the segmentation of categorized objects, and has been actively studied in recent papers [1]–[9].

Nevertheless, most methods for segmentation of categorized objects were built on the assumption that all images of the given collection contain the target object, which renders them unable to handle situations where the given collection contains noisy images that do not contain the target category of object. Such a noisy image collection may be gathered, e.g., by performing a text query using one of the mainstream image search engines such as Google and Bing image search. This motivated us to build an automated program to jointly cleanse and extract the categorized objects from noisy Web image collections.

On the other hand, most previous works on segmentation of categorized objects only utilized appearance [3]–[5], [8] or shape [1], [7] cues as consistency constraints on foreground objects across the image set. Some of these methods attempted to automatically learn a template for the object [2] from the image collection, while some others resorted to labeled pixels for foreground/background modeling [5], [9]. Most of these previous works neglected beneficial high-level information, such as spatial context, across the image set. Context comes in a variety of forms, e.g., different parts of an object can be context to each other, and it can be referred to as Gestalt laws in middle-level knowledge regarding intra-object configurations and inter-object relationships. Intuitively, such cues should provide valuable information for object segmentation and recognition [6], [10]–[15], especially when one tries to jointly extract the categorized objects from a set of images.

We extend the auto-context model originally proposed by Tu [12] to simultaneously extract the categorized objects from a set of images. Tu [12] learned the auto-context model from a large number of images with pixel-wise labels. Previously, Wang et al. [13] incorporated it in an energy minimization framework for automatic object-of-interest extraction from a single image, where the auto-context model was iteratively estimated from a single image. In contrast, in our new approach, the auto-context model is trained on all images of the image collection without using any pixel-wise labels. It is able to exploit a large amount of contextual information from the image collection, which facilitates more robust object/background segmentation.

Manuscript received November 20, 2013; revised May 1, 2014; accepted June 27, 2014. Date of publication July 14, 2014; date of current version August 11, 2014. This work was supported in part by the China 973 Program under Grant 2012CB316400, and in part by the Natural Science Foundation of China under Grant 61228303. The work of G. Hua was supported in part by the U.S. National Science Foundation under Grant IIS 1350763, in part by the Google Research Faculty Award, and in part by G. Hua's start-up funds from the Stevens Institute of Technology. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Olivier Bernard.
L. Wang, J. Xue, Z. Gao, and N. Zheng are with the Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049, China (e-mail: wangleabc@gmail.com; jrxue@mail.xjtu.edu.cn; zhanninggao@gmail.com; nnzheng@mail.xjtu.edu.cn).
G. Hua is with the Department of Computer Science, Stevens Institute of Technology, Hoboken, NJ 07030 USA (e-mail: ghua@stevens.edu).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIP.2014.2339196
1057-7149 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
WANG et al.: JOINT SEGMENTATION AND RECOGNITION OF CATEGORIZED OBJECTS 4071
A. Cleansing Web Images

There is considerable previous work on cleansing image sets from the raw output of image search engines, either in an interactive fashion [21], [22] or in an automatic way [16], [23]–[25]. Their main goal is to gather a large number of high-quality images of a specified object category from the Web for visual concept learning by removing irrelevant images from image search results. The interactive methods [21], [22] are capable of building large collections of images with ground truth labels, but they depend heavily on human effort. Most of the automatic methods [16], [23], [24] leverage an object category model trained on text and/or visual features to distinguish images with high confidence from outliers. In [25], a scheme named ARTEMIS was proposed to enhance automatic selection of training images from noisy user-tagged Web images using an instance-weighted mixture modeling framework.

Compared to the above methods, our approach is fully automatic without any user intervention, and its benefits are two-fold: 1) we first employ both text-based and visual-based image filtering to remove the illustration images, which have obvious differences from the images of the target object category in terms of text and visual features, and 2) we then remove the remaining difficult outliers with an object category model, which is trained on the categorized image collection and its segmentations. Moreover, the object category model is updated and strengthened with the expansion of the categorized image collection.

B. Segmentation of Categorized Objects

A number of approaches have recently focused on simultaneous segmentation of categorized objects from a set of images, through either supervised learning [5], [8], [9] or unsupervised learning [1]–[4], [6], [7]. Most of them model the appearance cues [3]–[5], [8], and/or the object shape [1], [7] or subspace structure [8] across the image set. The supervised methods [5], [8], [9] estimated appearance models for the foreground object through labeled pixels obtained from user interactions. In unsupervised methods, the aim is to automatically segment the different instances of an object from a set of images.

Among the unsupervised methods, the style of alternating between learning a categorized object model and jointly extracting the target categorized objects in all images is closest to our work [1]–[3], [7]. In [7], the categorized object model was trained on the appearance and shape of the target object category. Arora et al. [2] learned a consistent template based on location and appearance across all images, and the segmentation of the images was individually estimated. Winn and Jojic [1] used a generative probabilistic model incorporating shape, edge, and color cues, and made an assumption of object shape consistency. In addition to these visual cues leveraged in the previous methods, our proposed method also models the contextual cue to facilitate more robust segmentation. Besides, we explicitly address the issue of outlier images present in the categorized image collection, while previous works all assumed that the images from the categorized image collection all contain the target object.

Our object segmentation model is cast into an energy minimization framework with an embedded auto-context model [12]. Energy minimization on Markov Random Fields provides a standard framework to extract an object from a single image [13], [17], [26]–[32], and such methods may be further extended to extract the target objects from a collection of images one by one. However, because of the single-image modeling, they only model various visual cues such as appearance [17], [26], [27], shape [31], and context [13] confined to a single image. Some of these previous works also made strong assumptions on where the target objects are located [27]–[30], such as the center of the image. Important contextual information across the image collection is neglected. Therefore, in our formulation, we embed an auto-context model into an energy minimization framework, which is automatically trained on all images to effectively exploit the rich spatial contextual information presented across the image collection.

Our research is also related to object cosegmentation [33]–[40], where the appearance consistency of the foreground objects across the image collection is exploited to benefit object segmentation. The goal of cosegmentation is to simultaneously segment a specific object from two or more images, where it is assumed that all images contain that object. Among these, there are a number of recent works [35], [38]–[40] that consider interleaving cosegmentation and discriminative learning in an unsupervised fashion, while considering diverse object instances from the same category. There are also several cosegmentation methods [41], [42] that further conduct the cosegmentation of multiple objects of multiple categories, in which it is assumed that each image contains at least one object among the multiple categories. In contrast, we try to cosegment a collection of categorized images with different object instances of an unknown category, where some of the outlier images may not contain the categorized object at all.

C. Category Recognition

The goal of image category recognition is to predict whether an image belongs to a certain category. There are a number of recent works on category recognition using various models, such as part-based models [43], [44] and bag-of-words models [16], [20], [45]–[47]. It is out of the scope of our paper to discuss all of them.

Many methods based on the bag-of-words model have shown impressive results on image recognition in many settings [16], [47]–[49], and provide several advantages over traditional approaches of matching local features [50]. Such models are efficient due to the structure-free representation of images and objects with dense patches [45]. Hence, due to its simplicity and efficacy, in our case of joint segmentation and recognition of categorized objects from noisy Web image collections, the bag-of-words model is selected as our object category model to recognize categorized images from noisy Web images. It is trained using a histogram intersection kernel SVM due to its success in recognition [51], [52].

D. Joint Segmentation and Recognition

Joint segmentation and recognition of a categorized object from a single image or an image collection has been
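The alternation that distinguishes our approach, recognizing which images contain the target object and then jointly segmenting only those, while rejected images guide the recognizer, can be sketched in a few lines. This is an illustrative outline only: `segment` and `score` are hypothetical stand-ins for the auto-context segmentation model and the bag-of-words category model described later, not the paper's actual implementation.

```python
def cotrain(images, segment, score, threshold=0.5, rounds=3):
    """Alternate between jointly segmenting the currently accepted
    (confident) images and re-scoring every image with a recognizer
    informed by those segmentations."""
    accepted = list(images)   # start by trusting every image
    rejected = []
    masks = {}
    for _ in range(rounds):
        # 1) jointly segment the images currently believed to be positives
        masks = {im: segment(im, accepted) for im in accepted}
        # 2) re-score all images; confident ones are accepted, the rest
        #    form the rejected (noisy) set used as negatives
        accepted, rejected = [], []
        for im in images:
            (accepted if score(im, masks) >= threshold else rejected).append(im)
    return accepted, rejected, masks
```

In the paper's pipeline the two models strengthen each other across rounds; here the fixed `rounds` count and `threshold` are arbitrary placeholders.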
image $I_k$, thus it is image dependent, which is the same as the image-dependent one built in [13] by fusing the color and intensity cues. Moreover, $\sum_{(p,q)\in N_k} \psi_{pq}(L_{kp}, L_{kq})$ is the spatial prior term to encourage the labels (i.e., $L_{kp}$ and $L_{kq}$) of neighboring pixels (i.e., $p$ and $q$) to be consistent, where $N_k$ is the set of neighboring pixels in image $I_k$. $\psi_{pq}$ is computed based on the edge probability map from Martin et al. [56], and $\delta(L_{kp}, L_{kq})$ is a Dirac delta function. The subscript $(p,q) \in N_k$ denotes that the spatial prior term is computed from the target single image $I_k$, thus it is image dependent and the same as the one in [13]. The parameter

of image $I_k$, and $L^0_k = \{L^0_{kp} \mid p \in I_k\}$ is the initial segmentation map for image $I_k$ obtained by jointly using a visual saliency model [57] and an adaptive selection mechanism designed in [13]. $O^0_{kp} \in \mathbf{O}^0_k$ denotes the structured patches of the auto-context model centered at pixel $p$ of image $I_k$, and $\mathbf{O}^0_k = \{O^0_{kp} \mid p \in I_k\}$ are the structured patches of all sampled pixels for image $I_k$, which are sampled from the discriminative probability map $P^0_k$ of image $I_k$.

Here, we directly use the saliency map generated by a visual saliency model [57] as the initial probability map $P^0_k = \{p^0_{kp} \mid p \in I_k\}$ for each image $I_k$. The saliency values are
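The unary-plus-spatial-prior structure of this energy can be illustrated on a toy 4-connected grid. The following sketch is a deliberately simplified assumption: it uses negative log-probabilities for the data term and a scalar weight `lam`, and the `edge_weight` callback merely stands in for the pairwise term $\psi_{pq}$ derived from the edge probability map of Martin et al. [56]; it does not reproduce the paper's exact parameterization.

```python
import math

def segmentation_energy(labels, prob, edge_weight, lam=1.0):
    """Toy MRF-style energy for one image: a unary term from a per-pixel
    foreground probability map plus an edge-modulated Potts-like spatial
    prior over 4-connected neighbors."""
    h, w = len(labels), len(labels[0])
    eps = 1e-9
    energy = 0.0
    for y in range(h):
        for x in range(w):
            p = prob[y][x]
            # unary data term: -log likelihood of the chosen label
            if labels[y][x] == 1:
                energy += -math.log(p + eps)
            else:
                energy += -math.log(1.0 - p + eps)
            # pairwise spatial prior: penalize disagreeing neighbor labels,
            # with a per-edge weight (stand-in for psi_pq)
            for dy, dx in ((0, 1), (1, 0)):
                ny, nx = y + dy, x + dx
                if ny < h and nx < w and labels[y][x] != labels[ny][nx]:
                    energy += lam * edge_weight(y, x, ny, nx)
    return energy
```

A labeling that agrees with the probability map and is spatially smooth receives a lower energy than one that contradicts it, which is the property the minimization exploits.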
Fig. 3. From left to right: the categorized image collection, its segmentation maps, and discriminative probability maps. The sampling structure of the auto-context model is illustrated on the probability maps.

where $p^2_{kp}$ denotes the probability on the new discriminative probability map $P^2_k$ of image $I_k$.

This process iterates until convergence, where the discriminative probability maps are no longer changing. Actually, in our formulation, the auto-context model is seamlessly updated with the iterative energy minimization of Eq. (1). We outline the iterative process of the auto-context model in Fig. 4.

D. Categorized Image Recognition

We leverage a bag-of-words model [20], [45] as the object category model to recognize categorized images while rejecting the outlier/noisy images. Suppose we have collected a categorized image collection $\mathcal{I} = \{I_k\}_{k=1}^{K}$ including $K$ images, and have extracted the objects from them. A rejected image set $\mathcal{R} = \{R_i\}_{i=1}^{N_i}$ including $N_i$ noisy images is also collected.

The training process of the object category model is summarized in Fig. 5. We first compute the SIFT descriptors [50] on a regular grid [45] across each categorized image $I_k$ and each noisy image $R_i$. The SIFT descriptors computed on the image collection $\mathcal{I}$ and the rejected image set $\mathcal{R}$ are then clustered into visual words by using k-means, and the visual words form a visual word vocabulary. We then compute the histogram of visual words from each image $I_k$, each
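The vocabulary construction and histogram encoding just described, together with the histogram intersection kernel used for the category SVM, can be sketched as follows. For readability the sketch clusters toy 2-D points rather than 128-D SIFT descriptors, and this plain k-means is a minimal stand-in for whatever clustering configuration the paper actually used.

```python
import random
from math import dist  # Euclidean distance (Python 3.8+)

def kmeans(points, k, iters=10, seed=0):
    """Plain k-means to build the visual-word vocabulary."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist(p, centers[i]))].append(p)
        # recompute each center as its cluster mean; keep old center if empty
        centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers

def bow_histogram(descriptors, centers):
    """L1-normalized histogram of visual words: count of descriptors
    assigned to each nearest cluster center."""
    hist = [0.0] * len(centers)
    for d in descriptors:
        hist[min(range(len(centers)), key=lambda i: dist(d, centers[i]))] += 1
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def intersection_kernel(h1, h2):
    """Histogram intersection kernel, as used to train the category SVM."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

For L1-normalized histograms the intersection kernel lies in [0, 1] and equals 1 only for identical histograms, which is what makes it a convenient similarity for the SVM.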
TABLE IV
SEGMENTATION ACCURACIES OF OUR METHOD AND 4 OTHER COSEGMENTATION METHODS ON THE MSRC DATASET [10]

TABLE V
SEGMENTATION ACCURACIES OF OUR METHOD AND 5 OTHER COSEGMENTATION METHODS ON THE iCoseg DATASET [36]

Fig. 10. Comparison results of our method with [8] on the MSRC dataset [10]. The 1st, 3rd, and 5th columns are segmentation results from [8]; the 2nd, 4th, and 6th columns are our results.

TABLE VI
STATISTICS ON THE NOISY WEB IMAGE COLLECTION, THE CANDIDATE WEB IMAGE COLLECTION, AND THE CATEGORIZED IMAGE COLLECTION

TABLE VII
RECOGNITION PRECISION AND PRECISION OF THE CATEGORIZED IMAGE COLLECTION FOR EACH OF THE 30 CATEGORIES BY USING THE OBJECT CATEGORY MODEL TRAINED THROUGH TWO VERSIONS. V-1 DENOTES THAT THE OBJECT CATEGORY MODEL IS TRAINED JUST ON THE CATEGORIZED IMAGES AND THE NOISY IMAGES; V-2 DENOTES THAT THE OBJECT CATEGORY MODEL IS TRAINED ON THE CATEGORIZED IMAGES AND THEIR SEGMENTATIONS, AND THE NOISY IMAGES

Fig. 14. The numbers of categories for which the precision and recall are greater than 60%, 70%, 80%, and 90%, respectively.

TABLE VIII
AVERAGE F-MEASURES FOR 30 CATEGORIES BY USING TWO VERSIONS OF OUR SEGMENTATION METHOD
Fig. 15. Average F-measures of the first 100 images for 5 categories varying
with the numbers of images of the categorized image collection.
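The F-measure reported in Fig. 15 and Table VIII combines precision and recall. A minimal sketch of the standard formula follows; the balanced harmonic mean (beta = 1) is an assumption, since the exact weighting is not restated in this section.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """Standard F-measure from true-positive, false-positive, and
    false-negative counts. beta=1 gives the harmonic mean of
    precision and recall (an assumed default here)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```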
Fig. 16. The 1st and 3rd rows: the segmentation results obtained by our object segmentation method, varying with the augmentation of the categorized image collection, where the numbers above are the numbers of images of the corresponding categorized image collection. The 2nd and 4th rows: the probability maps output by the last iteration of the auto-context model, varying with the augmentation of the categorized image collection. The example images are from our 30-categories image dataset.

the augmented categorized image collection, which can further improve the segmentation on those images that we have processed before. We present the average F-measures of the first 100 images for 5 categories of images, varying with the number of images in the categorized image collection segmented at each time, in Fig. 15, and the segmentation results obtained by our object segmentation method and the probability maps output by the last iteration of the auto-context model for some example images, varying with the augmentation of the categorized image collection, in Fig. 16. As demonstrated by the average F-measures and the segmentation results, the object segmentation model indeed can be strengthened while learning the auto-context model from the augmented image collection, and thus results in better segmentations; and as shown in the probability maps, the auto-context model indeed can be enhanced while exploiting the contextual information from the augmented image collection, and thus results in better estimations of the probability maps.

Fig. 17 gives some examples of segmented categorized images of the 30 categories. The 1st and 2nd examples of each category are the segmentation results on the first 2 images of the categorized image collection; the 3rd example of each category is a sample segmentation result on an image with a small foreground object; and the 4th example of each category is a sample failure segmentation result of the categorized image collection. As the results in Table VIII and Fig. 17 show, our object segmentation method is capable of segmenting the objects from general Web images with moderate variations in color, size, pose, viewpoint, and shape on most of the categories, but encounters difficulties when the objects have very similar color to the backgrounds (e.g., the cruise and tree frog), or exhibit dramatic variations in shape (e.g., the eagle, elephant, and starfish) and size (e.g., the hummingbird), or have a very complex shape (e.g., the helicopter), or when the background is very cluttered (e.g., the clownfish and gecko).

3) Convergence Analysis: It may not be feasible to derive a strict theoretic guarantee of the convergence of our object segmentation method, which is cast into an energy minimization framework, but empirically it always converges. In our experiments, if the energy values of three consecutive iterations satisfy both

$$\frac{E_{T-2}(L) - E_{T-1}(L)}{E_{T-2}(L)} < 0.01$$

and

$$\frac{E_{T-1}(L) - E_{T}(L)}{E_{T-1}(L)} < 0.01,$$

the iteration will terminate at the $T$-th iteration. We present the trend of the energy function on the first 40 images on
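The two-ratio stopping rule above translates directly into code. The sketch below assumes a list of recorded energy values and a non-increasing energy sequence, as the paper reports, and uses the same 0.01 tolerance.

```python
def has_converged(energies, tol=0.01):
    """Stopping rule: the relative energy decrease over each of the last
    two iterations falls below `tol`. Assumes energies are non-increasing,
    as each energy-minimization step guarantees."""
    if len(energies) < 3:
        return False
    e2, e1, e0 = energies[-3], energies[-2], energies[-1]  # E_{T-2}, E_{T-1}, E_T
    return (e2 - e1) / e2 < tol and (e1 - e0) / e1 < tol
```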
6 categories of our 30-categories image dataset in Fig. 18. According to the experimental results, each step of the energy minimization ensures that the energy in Eq. (1) is non-increasing, and the energy function always converges within 10 iterations.

4) Initialization Impact: To illustrate the impact of the initialization on the performance of the auto-context model and also our object segmentation method, we present the intermediate probability maps output by the auto-context model and the intermediate segmentation results obtained by our object
segmentation method at each iteration for some example images from our 30-categories image dataset in Fig. 19. As the example results show, the auto-context model and also our fully automatic object segmentation method are robust to the initial salient region, as long as it is not totally off the target. This conclusion can also be drawn by observing all the results in our experiments.

Fig. 18. Energy values in the iterative process of energy minimization on the first 40 images for 6 categories of our 30-categories image dataset.

Fig. 19. The 1st, 3rd, and 5th rows: the intermediate segmentation results obtained by our object segmentation method at each iteration. The 2nd, 4th, and 6th rows: the intermediate probability maps output by the auto-context model at each iteration. The example images are from our 30-categories image dataset.

To summarize, as shown above, our approach has the capability of automatically collecting a large number of categorized images from noisy Web images, and cleanly segmenting the categorized objects from them.

VI. CONCLUSIONS

We propose a method for automatically extracting and recognizing categorized objects from noisy Web image collections. This is achieved by co-training of an object segmentation model with an embedded auto-context model learned from all categorized images, and an object category model learned

REFERENCES

[1] J. Winn and N. Jojic, "LOCUS: Learning object classes with unsupervised segmentation," in Proc. 10th IEEE ICCV, Oct. 2005, pp. 756–763.
[2] H. Arora, N. Loeff, D. A. Forsyth, and N. Ahuja, "Unsupervised segmentation of objects using efficient learning," in Proc. IEEE CVPR, Jun. 2007, pp. 1–7.
[3] E. Borenstein and S. Ullman, "Learning to segment," in Proc. 8th ECCV, May 2004, pp. 315–328.
[4] L. Cao and L. Fei-Fei, "Spatially coherent latent topic model for concurrent segmentation and classification of objects and scenes," in Proc. IEEE 11th ICCV, Oct. 2007, pp. 1–8.
[5] J. Cui et al., "Transductive object cutout," in Proc. IEEE CVPR, Jun. 2008, pp. 1–8.
[6] Y. J. Lee and K. Grauman, "Collect-cut: Segmentation with top-down cues discovered in multi-object images," in Proc. IEEE CVPR, Jun. 2010, pp. 3185–3192.
[7] B. Alexe, T. Deselaers, and V. Ferrari, "ClassCut for unsupervised class segmentation," in Proc. 11th ECCV, 2010, pp. 380–393.
[8] L. Mukherjee, V. Singh, J. Xu, and M. D. Collins, "Analyzing the subspace structure of related images: Concurrent segmentation of image sets," in Proc. 12th ECCV, 2012, pp. 128–142.
[9] Y. N. Law, H. K. Lee, M. K. Ng, and A. M. Yip, "A semisupervised segmentation model for collections of images," IEEE Trans. Image Process., vol. 21, no. 6, pp. 2955–2968, Jun. 2012.
[10] J. Shotton, J. Winn, C. Rother, and A. Criminisi, "TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation," in Proc. 9th ECCV, 2006, pp. 1–15.
[11] A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora, and S. Belongie, "Objects in context," in Proc. IEEE 11th ICCV, Oct. 2007, pp. 1–8.
[12] Z. Tu, "Auto-context and its application to high-level vision tasks," in Proc. IEEE CVPR, Jun. 2008, pp. 1–8.
[13] L. Wang, J. Xue, N. Zheng, and G. Hua, "Automatic salient object extraction with contextual cue," in Proc. IEEE ICCV, Nov. 2011, pp. 105–112.
[14] J. Xue, L. Wang, N. Zheng, and G. Hua, "Automatic salient object extraction with contextual cue and its applications to recognition and alpha matting," Pattern Recognit., vol. 46, no. 11, pp. 2874–2889, 2013.
[15] L. Wang, G. Hua, R. Sukthankar, J. Xue, and N. Zheng, "Video object discovery and co-segmentation with extremely weak supervision," in Proc. ECCV, 2014.
[16] L.-J. Li and L. Fei-Fei, "OPTIMOL: Automatic online picture collection via incremental model learning," Int. J. Comput. Vis., vol. 88, no. 2, pp. 147–168, 2010.
[17] Y. Y. Boykov and M.-P. Jolly, "Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images," in Proc. 8th IEEE ICCV, 2001, pp. 105–112.
[18] Y. Boykov and G. Funka-Lea, "Graph cuts and efficient N-D image segmentation," Int. J. Comput. Vis., vol. 70, no. 2, pp. 109–131, 2006.
[19] L. Wang, J. Xue, N. Zheng, and G. Hua, "Concurrent segmentation of categorized objects from an image collection," in Proc. 21st ICPR, Nov. 2012, pp. 3309–3312.
[20] J. Sivic, B. C. Russell, A. A. Efros, A. Zisserman, and W. T. Freeman, "Discovering object categories in image collections," in Proc. ICCV, 2005.
[21] B. C. Russell, A. Torralba, K. P. Murphy, and W. T. Freeman, "LabelMe: A database and web-based tool for image annotation," Int. J. Comput. Vis., vol. 77, nos. 1–3, pp. 157–173, 2008.
[22] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Proc. IEEE CVPR, Jun. 2009, pp. 248–255.
[23] R. Fergus, L. Fei-Fei, P. Perona, and A. Zisserman, "Learning object categories from Google's image search," in Proc. 10th IEEE ICCV, Oct. 2005, pp. 1816–1823.
[24] F. Schroff, A. Criminisi, and A. Zisserman, "Harvesting image databases from the web," in Proc. IEEE 11th ICCV, Oct. 2007, pp. 1–8.
[25] N. Sawant, J. Z. Wang, and J. Li, "Enhancing training collections for image annotation: An instance-weighted mixture modeling approach," IEEE Trans. Image Process., vol. 22, no. 9, pp. 3562–3577, Sep. 2013.
[26] X. Chen, J. K. Udupa, U. Bagci, Y. Zhuge, and J. Yao, "Medical image segmentation by combining graph cuts and oriented active appearance models," IEEE Trans. Image Process., vol. 21, no. 4, pp. 2035–2046, Apr. 2012.
[27] C. Rother, V. Kolmogorov, and A. Blake, "GrabCut: Interactive foreground extraction using iterated graph cuts," ACM Trans. Graph., vol. 23, no. 3, pp. 309–314, 2004.
[28] G. Hua, Z. Liu, Z. Zhang, and Y. Wu, "Iterative local-global energy minimization for automatic extraction of objects of interest," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 10, pp. 1701–1706, Oct. 2006.
[29] V. Lempitsky, P. Kohli, C. Rother, and T. Sharp, "Image segmentation with a bounding box prior," in Proc. IEEE 12th ICCV, Oct. 2009, pp. 277–284.
[30] W. Tao, "Iterative narrowband-based graph cuts optimization for geodesic active contours with region forces (GACWRF)," IEEE Trans. Image Process., vol. 21, no. 1, pp. 284–296, Jan. 2012.
[31] O. Veksler, "Star shape prior for graph-cut image segmentation," in Proc. 10th ECCV, Oct. 2008, pp. 454–467.
[32] C. Jung and C. Kim, "A unified spectral-domain approach for saliency detection and its application to automatic object segmentation," IEEE Trans. Image Process., vol. 21, no. 3, pp. 1272–1283, Mar. 2012.
[33] C. Rother, T. Minka, A. Blake, and V. Kolmogorov, "Cosegmentation of image pairs by histogram matching: Incorporating a global constraint into MRFs," in Proc. IEEE CVPR, Jun. 2006, pp. 993–1000.
[34] D. S. Hochbaum and V. Singh, "An efficient algorithm for co-segmentation," in Proc. IEEE 12th ICCV, Oct. 2009, pp. 269–276.
[35] S. Vicente, V. Kolmogorov, and C. Rother, "Cosegmentation revisited: Models and optimization," in Proc. 11th ECCV, 2010, pp. 465–479.
[36] D. Batra, A. Kowdle, D. Parikh, J. Luo, and T. Chen, "iCoseg: Interactive co-segmentation with intelligent scribble guidance," in Proc. IEEE CVPR, Jun. 2010, pp. 3169–3176.
[37] A. Joulin, F. Bach, and J. Ponce, "Discriminative clustering for image co-segmentation," in Proc. IEEE CVPR, Jun. 2010, pp. 1943–1950.
[38] S. Vicente, C. Rother, and V. Kolmogorov, "Object cosegmentation," in Proc. IEEE CVPR, Jun. 2011, pp. 2217–2224.
[39] Y. Chai, V. S. Lempitsky, and A. Zisserman, "BiCoS: A bi-level co-segmentation method for image classification," in Proc. IEEE ICCV, Nov. 2011, pp. 2579–2586.
[40] J. C. Rubio, J. Serrat, A. López, and N. Paragios, "Unsupervised co-segmentation through region matching," in Proc. IEEE CVPR, Jun. 2012, pp. 749–756.
[41] A. Joulin, F. Bach, and J. Ponce, "Multi-class cosegmentation," in Proc. IEEE CVPR, Jun. 2012, pp. 542–549.
[42] G. Kim and E. P. Xing, "On multiple foreground cosegmentation," in Proc. IEEE CVPR, Jun. 2012, pp. 837–844.
[43] R. Fergus, P. Perona, and A. Zisserman, "Weakly supervised scale-invariant learning of models for visual recognition," Int. J. Comput. Vis., vol. 71, no. 3, pp. 273–303, 2007.
[44] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan, "Object detection with discriminatively trained part-based models," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 9, pp. 1627–1645, Sep. 2010.
[45] L. Fei-Fei and P. Perona, "A Bayesian hierarchical model for learning natural scene categories," in Proc. IEEE CVPR, Jun. 2005, pp. 524–531.
[46] K. Kesorn and S. Poslad, "An enhanced bag-of-visual word vector space model to represent visual content in athletics images," IEEE Trans. Multimedia, vol. 14, no. 1, pp. 211–222, Feb. 2012.
[47] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories," in Proc. IEEE CVPR, 2006, pp. 2169–2178.
[48] J. Zhang, M. Marszałek, S. Lazebnik, and C. Schmid, "Local features and kernels for classification of texture and object categories: A comprehensive study," Int. J. Comput. Vis., vol. 73, no. 2, pp. 213–238, 2007.
[49] L. Fei-Fei, R. Fergus, and A. Torralba, "Recognizing and learning object categories," in Proc. ICCV, 2009.
[50] D. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, 2004.
[51] S. Maji, A. Berg, and J. Malik, "Classification using intersection kernel support vector machines is efficient," in Proc. IEEE CVPR, Jun. 2008, pp. 1–8.
[52] J. Wu, "Efficient HIK SVM learning for image classification," IEEE Trans. Image Process., vol. 21, no. 10, pp. 4442–4453, Oct. 2012.
[53] C. Pantofaru, C. Schmid, and M. Hebert, "Object recognition by integrating multiple image segmentations," in Proc. 10th ECCV, 2008, pp. 481–494.
[54] S. Yu, R. Gross, and J. Shi, "Concurrent object recognition and segmentation by graph partitioning," in Advances in Neural Information Processing Systems. Cambridge, MA, USA: MIT Press, 2002.
[55] M. F. Porter, "An algorithm for suffix stripping," Program, vol. 14, no. 3, pp. 130–137, 1980.
[56] D. Martin, C. Fowlkes, and J. Malik, "Learning to detect natural image boundaries using local brightness, color, and texture cues," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 5, pp. 530–549, May 2004.
[57] J. Harel, C. Koch, and P. Perona, "Graph-based visual saliency," in Advances in Neural Information Processing Systems. Cambridge, MA, USA: MIT Press, 2007.
[58] E. Borenstein, E. Sharon, and S. Ullman, "Combining top-down and bottom-up segmentation," in Proc. CVPR, Jun. 2004, p. 46.
[59] X. Ren, C. Fowlkes, and J. Malik, "Cue integration for figure/ground labeling," in Advances in Neural Information Processing Systems. Cambridge, MA, USA: MIT Press, 2005.
[60] G. Liu, Z. Lin, X. Tang, and Y. Yu, "A hybrid graph model for unsupervised object segmentation," in Proc. IEEE 11th ICCV, Oct. 2007, pp. 1–8.

Le Wang received the B.S. degree in automatic control engineering from Xi'an Jiaotong University, Xi'an, China, in 2008, where he is currently pursuing the Ph.D. degree with the Institute of Artificial Intelligence and Robotics. From 2013 to 2014, he was a Visiting Ph.D. Student with the Stevens Institute of Technology, Hoboken, NJ, USA. His research interests include computer vision, machine learning, and their application in object discovery and segmentation from images and videos.

Gang Hua (M'03–SM'11) received the B.S. degree in automatic control engineering and the M.S. degree in pattern recognition and intelligence system from Xi'an Jiaotong University (XJTU), Xi'an, China, in 1999 and 2002, respectively, and the Ph.D. degree in electrical and computer engineering from Northwestern University, Evanston, IL, USA, in 2006. He is an Associate Professor of Computer Science with the Stevens Institute of Technology, Hoboken, NJ, USA. He also holds an Academic Visiting Researcher position with the IBM T. J. Watson Research Center, Ossining, NY, USA. Prior to that, he was a Research Staff Member with the IBM Research T. J. Watson Center from 2010 to 2011, a Senior Researcher with the Nokia Research Center Hollywood, Santa Monica, CA, USA, from 2009 to 2010, and a Scientist with Microsoft Live Labs Research, Bellevue, WA, USA, from 2006 to 2009. He was enrolled in the Special Class for the Gifted Young of XJTU in 1994. He holds nine U.S. patents and has 13 more U.S. patents pending. Dr. Hua is a Life Member of the Association for Computing Machinery. He was a recipient of the Richter Fellowship and the Walter P. Murphy Fellowship from Northwestern University in 2005 and 2002, respectively.

Jianru Xue (M'06) received the master's and Ph.D. degrees from Xi'an Jiaotong University (XJTU), Xi'an, China, in 1999 and 2003, respectively. He was with Fuji Xerox, Tokyo, Japan, from 2002 to 2003, and visited the University of California at Los Angeles, Los Angeles, CA, USA, from 2008 to 2009. He is currently a Professor with the Institute of Artificial Intelligence and Robotics at XJTU. His research field includes computer vision, visual navigation, and video coding based on analysis. Prof. Xue served as a Co-Organization Chair of the 2009 Asian Conference on Computer Vision and the 2006 Virtual System and Multimedia conference. He also served as a PC Member of the 2012 Pattern Recognition conference, and the 2010 and 2012 Asian Conference on Computer Vision.

Zhanning Gao received the B.S. degree in automatic control engineering from Xi'an Jiaotong University, Xi'an, China, in 2012, where he is currently pursuing the Ph.D. degree with the Institute of Artificial Intelligence and Robotics. His research interests include image/text processing and image collection.

Nanning Zheng (SM'94–F'06) received the degree from the Department of Electrical Engineering, Xi'an Jiaotong University (XJTU), Xi'an, China, in 1975, the M.E. degree in information and control engineering from XJTU in 1981, and the Ph.D. degree in electrical engineering from Keio University, Tokyo, Japan, in 1985. He is currently a Professor and the Director of the Institute of Artificial Intelligence and Robotics at XJTU. His research interests include computer vision, pattern recognition, computational intelligence, image processing, and hardware implementation of intelligent systems. Dr. Zheng has been the Chinese Representative on the Governing Board of the International Association for Pattern Recognition since 2000. He currently serves as an Executive Editor of the Chinese Science Bulletin. He became a member of the Chinese Academy of Engineering in 1999.