
A Spectral Method for Image Co-segmentation

Tiantang Chen and Hongliang Li


University of Electronic Science and Technology of China, Chengdu, China

Abstract—In this paper, we propose a spectral method to address the problem of image co-segmentation. Our idea is motivated by the Normalized Cut algorithm and spectral clustering. By defining a superpixel graph over the given image pair and introducing the successful ideas of the Ncut formulation, we transform the co-segmentation problem into a Rayleigh quotient problem, which can be solved by eigen-decomposition. We then utilize spectral clustering to classify the pixels in each image and obtain the common objects of the image pair. Experimental results demonstrate the effectiveness of the proposed method.

Index Terms—Co-segmentation, eigenvector, Ncut, spectral clustering.

Fig. 1. An example of co-segmentation. (a)-(b) The input image pair I1 and I2. (c)-(d) The binary mask for each image. F1 and F2 denote the foreground; B1 and B2 denote the background.

I. INTRODUCTION

In recent years, co-segmentation has been an active topic of research. The problem refers to simultaneously extracting the common objects in a pair of images. Generally, the image pair has similar foreground regions (objects) but different background regions. While it is difficult to automatically extract an object from a single image, co-segmenting an image pair is much easier, since the relationship between the images is an implicit cue that can serve as prior information to guide the segmentation process.

The idea of co-segmentation was first introduced for image retrieval applications in [1]. The authors formulate co-segmentation as an energy minimization problem [2] and solve it with the well-studied graph cut technique [3]. Based on the same framework, Hochbaum and Singh [4] use the "carrot or stick" philosophy to adapt the energy function; they show that the minimization problem can be solved using only one maximum flow procedure. Recently, Joulin et al. [5] proposed a method based on the discriminative clustering framework, which combines Normalized Cut (Ncut) [6] with discriminative clustering.

In this paper, we present a spectral method for co-segmentation, which involves an Ncut-like formulation and spectral clustering [7]. To be specific, we first construct a superpixel graph over the image pair, then formulate co-segmentation as a cost function minimization problem. After that, we solve the minimization problem by eigen-decomposition, which has been successfully used in Ncut [6]. Finally, spectral clustering is utilized to extract the similar objects of the images. Since the proposed method treats the eigenvectors as an important component, we call it a
This work was supported in part by the NSFC (No. 60972109), in part by the Program for New Century Excellent Talents in University (NCET-080090), and in part by the Sichuan Province Science Foundation for Youths (No. 2010JQ0003).

spectral method. The biggest feature of the spectral method is that co-segmentation can be accomplished in an extremely convenient way. Our experiments demonstrate that the method achieves a better balance between quality and running time than existing approaches.

The rest of this paper is organized as follows. In Section II, we elaborate on how to derive the spectral method for co-segmentation. In Section III, we provide experimental results and compare the spectral method with other approaches. We draw conclusions in Section IV.

II. THE PROPOSED METHOD

An illustration of co-segmentation is shown in Fig. 1. Given two images I1 and I2, each consisting of n pixels, there is a pair of common objects (the toy) between the images, which can be interpreted as the foreground regions of their respective images. F1 and F2 denote the foreground regions, while B1 and B2 denote the background regions. Under this condition, the problem of co-segmentation amounts to simultaneously dividing each image into two classes, the foreground and the background.

We begin by defining two graphs G1 and G2 over the input image pair. In each graph, a node corresponds to a pixel, and adjacent nodes are connected by edges based on the 4-neighborhood relation. The edge weights represent the intra-image affinity. G1 and G2 are shown in Fig. 2(a): in G1, the green nodes denote F1 and the purple nodes denote B1; in G2, the green nodes denote F2 and the orange nodes denote B2. To express the inter-image affinity, we link each node of one graph to all the nodes of the other, as illustrated for node 3 in G1 and node 7 in G2 in Fig. 2(b). In this way, we obtain a new graph G. By setting the edge weights properly, G can efficiently represent the intra- and inter-image affinities of the input images. We discuss how to set the edge weights later.

Fig. 2. Illustration of the graph construction. (a) G1 (left) and G2 (right), corresponding to I1 and I2, respectively. (b) To build graph G, each node of one single graph is connected to all the nodes of the other graph.

A. Problem Formulation

We notice that the graph G is composed of two groups, i.e., the foreground group F = F1 ∪ F2 and the background group B = B1 ∪ B2. Therefore, we can instead consider how to divide G into two groups. From our observation, there are mainly three constraints to consider for the problem of co-segmentation. The first is that the optimal solution should make group F and group B as dissimilar as possible. We define the similarity function between the two groups as

    Sim(F, B) = \sum_{i \in F, j \in B} w_{ij},    (1)

where w_{ij} is the edge weight between node i in F and node j in B. The smaller this function, the better the result. The second is that there should be a great similarity between the objects we want to extract; this constraint is the natural goal of co-segmentation. To satisfy this demand, another similarity function is defined as

    Sim(F_1, F_2) = \sum_{i \in F_1, j \in F_2} w_{ij},    (2)

which is the total weight of the edges connecting F1 and F2. The last is that the similarity within the same object should be large, so that the object can be extracted as a whole. The similarity function for that can be expressed as

    Sim(F_k, F_k) = \sum_{i, j \in F_k} w_{ij}, \quad k = 1, 2,    (3)

which is the total weight of the edges within F_k. With these definitions, we now define a cost function for co-segmentation as

    Coseg(F_1, F_2, B_1, B_2) = \frac{Sim(F_1 \cup F_2, B_1 \cup B_2)}{Sim(F_1, F_2) + Sim(F_1, F_1) + Sim(F_2, F_2)}.    (4)

From Eq. (4), we can see that the three constraints (i.e., Eqs. (1)-(3)) are used to build the cost function. Thus, the co-segmentation problem can be converted into a minimization of the cost function: the smaller the Coseg value, the more similar the extracted objects, and when the Coseg value approaches its minimum, we obtain the optimal cut bipartitioning graph G and find the common objects of the image pair. However, there are 2^N (N = 2n, the total number of nodes in graph G) solutions to this combinatorial problem, so directly finding the optimum is NP-hard. Our idea is to simplify the minimization problem with the knowledge of matrix algebra, which has been successfully used in Ncut, and to utilize spectral clustering to perform the graph partition.

First of all, we construct the weighted adjacency matrix associated with graph G as

    W = \begin{pmatrix} W_1 & W_{12} \\ W_{21} & W_2 \end{pmatrix}, \quad W(i, j) = w_{ij},    (5)

where W_k is the weighted adjacency matrix associated with image I_k, while W_{12} and W_{21} are the weighted adjacency matrices encoding the similarity between the images; here W_{12} = (W_{21})^T. Then we define the degree matrix D, a diagonal matrix with the row sums of W on its diagonal. Since graph G has N nodes, both W and D are N × N. We then let x be an N × 1 indicator vector such that

    x_i = \begin{cases} 1, & \text{if node } i \in F_1 \cup F_2; \\ -1, & \text{if node } i \in B_1 \cup B_2. \end{cases}    (6)

The first half of x serves I1 and the latter half serves I2; the elements of each half with x_i = 1 indicate the pixels labeled as foreground in each image. We now rewrite our cost function of co-segmentation as

    Coseg(F_1, F_2, B_1, B_2) = \frac{Sim(F_1 \cup F_2, B_1 \cup B_2)}{Sim(F_1 \cup F_2, F_1 \cup F_2)} = \frac{\sum_{x_i > 0, x_j < 0} -w_{ij} x_i x_j}{\sum_{x_i > 0, x_j > 0} w_{ij} x_i x_j}.    (7)

To simplify the equation above, we introduce the new vector (1 + x)/2, where 1 is an N × 1 vector of all ones. Therefore,

    \left( \frac{1 + x}{2} \right)_i = \begin{cases} 1, & \text{if node } i \in F_1 \cup F_2; \\ 0, & \text{if node } i \in B_1 \cup B_2. \end{cases}    (8)

Then the cost function can be further rewritten as

    Coseg(x) = \frac{\left( \frac{1+x}{2} \right)^T (D - W) \left( \frac{1+x}{2} \right)}{\left( \frac{1+x}{2} \right)^T W \left( \frac{1+x}{2} \right)} = \frac{(1 + x)^T (D - W)(1 + x)}{(1 + x)^T W (1 + x)}.    (9)

Setting y = 1 + x, the former minimization problem is transformed into the form

    \min_x Coseg(x) = \min_y \frac{y^T (D - W) y}{y^T W y},    (10)

which is the so-called Rayleigh quotient in matrix algebra. Generally, its solution can be obtained by solving the following generalized eigensystem:

    (D - W) y = \lambda W y.    (11)

With the definition of the Laplacian matrix L = D - W, the above equation is equivalent to

    W^{-1} L y = \lambda y.    (12)
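The equivalence between Eqs. (11) and (12) is easy to verify numerically. Below is a minimal sketch on a small affinity matrix built from hypothetical feature points; we assume Gaussian affinities here, which make W symmetric positive definite so that scipy's generalized symmetric solver applies:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
X = rng.random((8, 3))                          # hypothetical feature points
d2 = ((X[:, None] - X[None]) ** 2).sum(-1)      # pairwise squared distances
W = np.exp(-d2)                                 # Gaussian affinities: symmetric positive definite
D = np.diag(W.sum(axis=1))                      # degree matrix (row sums of W)
L = D - W                                       # Laplacian L = D - W

# Eq. (11): solve the generalized eigensystem (D - W) y = lambda * W y
vals, vecs = eigh(L, W)                         # eigenvalues in ascending order

# Eq. (12): every pair (lambda, y) also satisfies W^{-1} L y = lambda * y
for lam, y in zip(vals, vecs.T):
    assert np.allclose(np.linalg.solve(W, L @ y), lam * y, atol=1e-6)
```

Since L is positive semidefinite and W is positive definite, all eigenvalues are nonnegative here.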

From this, we can see that the solution to the minimization problem in Eq. (10) is given by the eigenvectors of the normalized Laplacian matrix W^{-1}L with the smallest eigenvalues. As is indicated by

the knowledge of matrix algebra, the eigenvectors can be conveniently obtained by eigen-decomposition. However, the first smallest eigenvector of W^{-1}L is not always the optimal solution, since it turns out to be 1 with the corresponding eigenvalue 0. Such a solution leads to a meaningless result: F = G, B = ∅ or F = ∅, B = G. In Ncut [6], the second smallest eigenvector is chosen for single-image segmentation. In our method, we utilize the idea of spectral clustering. Specifically, we choose the first k smallest eigenvectors to construct an N × k matrix (in most of our simulations, k = 4). This new matrix can be considered a dimensionality reduction of the normalized Laplacian matrix W^{-1}L, and each of its rows can be seen as a data point in the k-dimensional space. We then break the new matrix into two blocks, each of size N/2 × k. From the structure of W, we can see that the top block is the k-dimensional data set associated with I1, while the bottom block is the k-dimensional data set associated with I2. Finally, we perform the K-means algorithm over each block to classify the image data into two groups. The foreground group of each image is exactly what we want to extract.

The formulation process above is indeed the basis of the proposed method and directly influences the final co-segmentation result, because it determines what kind of matrix spectral clustering should be performed on. As a comparison, the Ncut algorithm derives a different matrix to segment an image; details can be found in [6].

B. Preclustering

Note that W^{-1}L is an N × N matrix. For an input image of size 128 × 128, N = 32768. That is a heavy burden on storage and computational efficiency. To address this problem, we need to reduce the number of pixels, so we perform a superpixel oversegmentation [8] over the images. The superpixels obtained then serve as nodes in graph G. Hence, graph G is indeed a superpixel graph.
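The procedure above — an affinity matrix over the superpixel features, the generalized eigenproblem, the first k eigenvectors, and per-image K-means — can be sketched in a few lines. This is a minimal illustration under our own simplifying assumptions (a fully connected graph with Gaussian weights over mean-color features, rather than the 4-neighborhood intra-image graph, and the hypothetical helper name cosegment); it is not the authors' MATLAB implementation:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.cluster.vq import kmeans2

def cosegment(C1, C2, k=4, sigma=1.0):
    """Spectral co-segmentation sketch (hypothetical helper).

    C1, C2: (s, d) arrays of per-superpixel mean-color features for the
    two images. Every pair of nodes is connected here, a simplification
    of the 4-neighborhood intra-image graph described in the text.
    Returns one binary label per superpixel of each image.
    """
    C = np.vstack([C1, C2])                      # features of graph G's nodes
    d2 = ((C[:, None] - C[None]) ** 2).sum(-1)   # pairwise squared distances
    W = np.exp(-sigma * d2)                      # Gaussian edge weights
    D = np.diag(W.sum(axis=1))                   # degree matrix
    # generalized eigenproblem (D - W) y = lambda W y; a Gaussian kernel
    # matrix W is symmetric positive definite, so eigh applies directly
    vals, vecs = eigh(D - W, W)
    Y = vecs[:, :k]                              # k smallest eigenvectors, N x k
    s = len(C1)
    # split the embedding into the two per-image blocks, K-means on each
    _, lab1 = kmeans2(Y[:s], 2, minit='++', seed=0)
    _, lab2 = kmeans2(Y[s:], 2, minit='++', seed=0)
    return lab1, lab2
```

On real images, C1 and C2 would come from a superpixel oversegmentation followed by mean-color extraction; the foreground group of each image is then read off from the K-means labels.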
In our implementation, we generate s superpixels per image (s is between 50 and 250), using the superpixel code provided by Greg Mori, downloaded from http://www.cs.sfu.ca/~mori/research/superpixels. As a result, the size of W^{-1}L is reduced to 2s × 2s (2s is small relative to N). For each superpixel, we simply extract a 9-dimensional feature vector c_i composed of the mean color components in the RGB, Lab and YCbCr color spaces. Then we choose the Gaussian function to measure the edge weight between superpixel nodes i and j:

    w_{ij} = \exp(-\sigma \|c_i - c_j\|^2),    (13)

where \sigma is a free parameter (we use \sigma = 1 in the simulations).

Algorithm 1 The spectral method for co-segmentation
Input: An image pair I1, I2
Output: The common objects of the images
1: Perform superpixel segmentation over each image. Describe each superpixel with mean color features.
2: Construct a superpixel graph G consistent with the image pair and set the edge weights. Get the weighted adjacency matrix W and the degree matrix D.
3: Perform eigen-decomposition on the normalized Laplacian matrix W^{-1}L for the smallest eigenvectors.
4: Use the matrix composed of the first k smallest eigenvectors to bipartition I1 and I2, and extract the common objects.

The outline of the spectral method is given in Algorithm 1, which mainly includes four steps. For an image pair, we first oversegment each image into superpixel regions; then we use these subregions to construct a superpixel graph and obtain the associated affinity matrices. After that, we extract several representative eigenvectors of the normalized Laplacian matrix, and ultimately utilize them to segment the common objects of the input images.

III. EXPERIMENTS

In this section, we first evaluate our proposed algorithm on image co-segmentation. Comparisons with existing spectral methods and co-segmentation methods are then addressed. Finally, we report the running time cost.

To begin the experiment, we download some images from the Internet and create synthetic image pairs with common objects using Photoshop. We then perform our spectral method over the synthetic image pairs. For comparison with an existing spectral method, we use an implementation of Ncut from http://www.cis.upenn.edu/~jshi/software/files/NcutImage_7_AMD64.zip. We start with a 2-class segmentation on each image, and then incrementally increase the number of classes until the common objects can be extracted as a segment. The experimental results are shown in Fig. 3. Figs. 3(e)-(h) demonstrate that the spectral method properly segments the common objects of the image pairs: the mango and the bird are successfully distinguished from the backgrounds. From Figs. 3(i)-(l), we can see that in most cases of 2-class segmentation, Ncut is not able to exactly segment the objects of the images. In Fig. 3(m), the image is divided into 4 classes so that the mango can be extracted, and in Fig. 3(o) and Fig. 3(p) the number of classes is 3. The worst case is Fig. 3(n), where we need to break the image into 55 segments to find the mango. Determining this parameter is tedious. In contrast, the spectral method works well with a fixed number of classes, which always equals 2.

To compare with existing approaches to co-segmentation, we use the images available from www.cs.wisc.edu/~vsingh/pairimages.tar.gz. As shown in Fig. 4, the middle row shows the co-segmentation results obtained by the spectral method. We observe that the lower part of the stone is not successfully extracted in the stone images, because there is a strong edge across the stone. And for the bear images, the bear is not integrally segmented: some regions of dark color on its body are misclassified as the background. In contrast,

Fig. 4. Stone images and bear images: (top) the input images, (middle) co-segmentation results by the spectral method, (bottom) co-segmentation results by the discriminative clustering method [5].

TABLE I
THE AVERAGE RUNNING TIME COST (UNIT: SECOND)

            Our method    [5]
  Mango        1.22      82.72
  Bird         0.99      98.13
  Stone        1.27      87.36
  Bear         1.28      88.02

Fig. 3. Mango images and bird images. (a)-(d) The input images. (e)-(h) Co-segmentation results by the spectral method. (i)-(l) Single-image segmentation results by Ncut, each image divided into 2 classes. (m)-(p) Segmentation results by Ncut; the number of segments from (m) to (p) is 4, 55, 3, 3.

these problems rarely arise in the results obtained by the discriminative clustering method [5] (software available at http://www.di.ens.fr/%7Ejoulin/code/diffrac_coseg.tar.gz), shown in the bottom row of Fig. 4. The main reason is that the superpixel features we use do not have enough discriminative power. We note that [5] utilizes several feature descriptors that express color and texture information, such as SIFT descriptors, Gabor filters and color histograms. Finding better descriptors for our superpixels could significantly improve the performance of our method; we leave this as part of our future work.

Our code is implemented in MATLAB, and all the experiments are performed on a 2.6 GHz Pentium processor with 2 GB of RAM. Compared with existing approaches, our method takes much less time to co-segment an image pair. In Table I, we list the running times of our method and the discriminative clustering method [5]. Since the implementation of [5] requires a superpixel map as input, its runtime does not include the time spent generating the superpixels; for consistency, neither does ours. As shown in Table I, our method takes about 1 second to co-segment a pair of images, whereas [5] needs more than 80 seconds. This shows that a good balance between quality and running time cost can be achieved by our method.

IV. CONCLUSION

We have proposed a spectral method for the image co-segmentation problem. The formulation of the method is guided by Ncut and spectral clustering. With matrix algebra, we convert the cost function minimization problem into an eigen-decomposition problem, which significantly simplifies the co-segmentation problem. Experiments have shown that the spectral method achieves a better trade-off between quality and runtime than existing co-segmentation methods.

REFERENCES

[1] C. Rother, T. Minka, A. Blake, and V. Kolmogorov, "Cosegmentation of image pairs by histogram matching - incorporating a global constraint into MRFs," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2006.
[2] V. Kolmogorov and R. Zabih, "What energy functions can be minimized via graph cuts?" IEEE Trans. Pattern Analysis and Machine Intelligence, 26(2):147-159, 2004.
[3] Y. Boykov, O. Veksler, and R. Zabih, "Fast approximate energy minimization via graph cuts," IEEE Trans. Pattern Analysis and Machine Intelligence, 23(11):1222-1239, 2001.
[4] D. S. Hochbaum and V. Singh, "An efficient algorithm for co-segmentation," in Proc. IEEE Int. Conf. Computer Vision, 2009.
[5] A. Joulin, F. Bach, and J. Ponce, "Discriminative clustering for image co-segmentation," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010.
[6] J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Trans. Pattern Analysis and Machine Intelligence, 22(8):888-905, 2000.
[7] R. Kannan and S. Vempala, "Spectral algorithms," Foundations and Trends in Theoretical Computer Science, 4(3-4):157-288, 2009.
[8] X. Ren and J. Malik, "Learning a classification model for segmentation," in Proc. IEEE Int. Conf. Computer Vision, 2003.