
Image Segmentation via Visual Attention using Tensor Voting Based Gaussian Modeling

Huynh Trung Manh, Guee-Sang Lee
Department of Electronics and Computer Engineering, Chonnam National University, Korea
trungmanhhuynh@gmail.com, gslee@jnu.ac.kr

Abstract: Salient object detection is a challenging task in the field of object segmentation and recognition. It mainly uses the concept of visual attention to extract objects from scenes. However, the adaptation of visual attention to the image segmentation field is still limited and causes poor segmentation results in some cases. In this paper, we therefore propose a novel and simple saliency detection method that is efficient for our image segmentation system. The saliency detection is based on a Gaussian Mixture Model (GMM) whose parameters are generated by a tensor voting process. First, the color images are mapped into a 2-D space in which the tensor voting process is applied to find the extrema. Then, the GMM is estimated, and for each pixel the set of normalized likelihood measures with respect to the different Gaussian models is calculated. The color saliency measure and the spatial saliency measure of each Gaussian model are evaluated based on its color distinctiveness and its spatial distribution, respectively. Finally, the final saliency map is generated by fusing the color saliency map and the spatial saliency map. We compare our algorithm to several state-of-the-art methods on the well-known publicly available set of 1000 images. The experimental results show that our method significantly outperforms the previous algorithms in both precision and recall.

Keywords: Visual Saliency; Segmentation; Object Detection; Gaussian Mixture Model

1. Introduction

Visual saliency detection from images is an important component of a number of content-based applications, including salient object segmentation, region-of-interest (ROI) coding, content-aware image retargeting, and image quality assessment. Generally, visual saliency is the perceptual quality that makes an object, a person, or a group of pixels stand out relative to its neighbors and thus capture our attention. Recently, many research efforts have been dedicated to establishing computational models for saliency detection, which fall into bottom-up models and top-down models. Bottom-up models are driven by low-level features, whereas top-down models depend on a specific task. In this paper, we focus on developing our own saliency map that is efficient for our segmentation method.

Most traditional segmentation methods, such as GrabCut, graph-based methods, or active contours, employ the color or spatial information of pixels to segment out objects that have distinctive colors or spatial distributions in images. Recent methods are based on visual attention concepts, in which salient objects stand out from their surroundings. However, most of the previous saliency models generate spotlight saliency maps [1, 2], which usually can only highlight the center portion or the high-contrast boundaries of salient regions, and usually have a resolution lower than the original image. Spotlight saliency maps are useful for predicting eye-movement fixations and for roughly locating salient regions, but they are not sufficient for salient object segmentation. Such spotlight saliency maps preserve only the lowest spatial frequency content of the original image and thus generally cannot completely highlight the whole salient region with accurate boundaries. Compared with those spotlight saliency maps, salient objects/regions can be strongly highlighted in the saliency maps generated by the other saliency models mentioned above, but the boundaries between salient regions and background regions are usually not accurately preserved. Moreover, in [3-7] the contrasts between salient regions and background regions are usually not sufficiently significant in these saliency maps. This increases the difficulty and unreliability of using such saliency maps efficiently for image segmentation. Examples of some failures are shown in Figure 1.

Figure 1. Limitations of previous methods for saliency detection causing poor segmentation results: (a) original images, (b) saliency results, (c) poor segmentation results, (d) ground truth.

With the main motivation of providing a more applicable saliency map for image segmentation, we propose an efficient saliency model that overcomes the drawbacks of previous saliency models and thus creates a simple, fast, and efficient saliency map for image segmentation.


The rest of this paper is organized as follows. Section 2 describes the proposed saliency model in detail. Experimental results and comparisons with three previous saliency models are presented in Section 3, and conclusions are given in Section 4.

2. The Proposed Method

The proposed system consists of three stages. First, the color images containing salient objects are represented in color spaces, and the parameters of the Gaussian mixture, including the local maxima, means, and covariance matrices, are found using tensor voting. Second, we evaluate the color saliency and the spatial saliency of each Gaussian model based on its color distinctiveness and spatial distribution, respectively. Finally, we generate the pixel-wise saliency maps, which highlight the whole object regions with accurate boundaries.

2.1 Tensor Voting based Gaussian Modeling

Tensor voting is a framework for grouping perceptual structures by vote casting among the tokens of an image [8, 9]. In our work, tensor voting is applied to the color feature space, after which the image is clustered into regions. Because of the page limit of this paper, further details will be presented in our journal paper.

Since the majority of pixels in each segmented region R_k (k = 1,...,n) usually share similar colors, R_k can be modeled by a Gaussian model G_k = N(\mu_k, \Sigma_k) in terms of color features. The mean vector \mu_k and the covariance matrix \Sigma_k are defined as

\mu_k = \frac{\sum_{(x,y) \in R_k} c_{x,y}}{|R_k|}    (1)

\Sigma_k = \frac{\sum_{(x,y) \in R_k} (c_{x,y} - \mu_k)(c_{x,y} - \mu_k)^T}{|R_k|}    (2)

where c_{x,y} denotes the color feature of the pixel at (x,y) and |R_k| denotes the number of pixels in R_k.
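To make this step concrete, the following Python/NumPy sketch computes the per-region mean vectors and covariance matrices of (1) and (2), assuming the tensor-voting stage has produced a per-pixel region label map. The function and variable names are our own illustration, not the authors' implementation.

```python
import numpy as np

def estimate_region_gaussians(image, labels):
    """Estimate a Gaussian model G_k = N(mu_k, Sigma_k) for each
    clustered region R_k, following Eqs. (1) and (2).

    image  : (H, W, 3) float array of color features c_{x,y}
    labels : (H, W) int array assigning each pixel to a region k
    """
    n = labels.max() + 1
    means, covs = [], []
    for k in range(n):
        pixels = image[labels == k]            # colors of all pixels in R_k
        mu_k = pixels.mean(axis=0)             # Eq. (1): mean color of R_k
        diff = pixels - mu_k
        sigma_k = diff.T @ diff / len(pixels)  # Eq. (2): color covariance of R_k
        means.append(mu_k)
        covs.append(sigma_k)
    return np.stack(means), np.stack(covs)
```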

The original color image can thus be represented by the set of Gaussian models estimated for the segmented regions. Based on these Gaussian models, for each pixel at (x,y) we define its normalized color likelihood measure with respect to each Gaussian model G_k as

P_k(x,y) = \frac{\alpha_k \, N(c_{x,y} \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{n} \alpha_j \, N(c_{x,y} \mid \mu_j, \Sigma_j)}    (3)

where N(c_{x,y} | \mu_k, \Sigma_k) denotes the probability of the color c_{x,y} evaluated on the Gaussian model G_k, defined as

N(c_{x,y} \mid \mu_k, \Sigma_k) = \frac{1}{\sqrt{(2\pi)^3 |\Sigma_k|}} \, e^{-\frac{1}{2}(c_{x,y} - \mu_k)^T \Sigma_k^{-1} (c_{x,y} - \mu_k)}    (4)

The weight \alpha_k of each Gaussian model G_k is defined as the area ratio of the region R_k to the whole image:

\alpha_k = \frac{|R_k|}{\sum_{j=1}^{n} |R_j|}    (5)
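As an illustrative sketch of Eqs. (3)-(5), continuing the hypothetical helpers above, the snippet below evaluates the Gaussian density of every pixel under every model and normalizes the weighted likelihoods into P_k(x,y). The small regularizing constants are our own numerical-stability assumption, not part of the paper.

```python
def normalized_color_likelihoods(image, labels, means, covs):
    """Compute P_k(x, y) of Eq. (3) for every pixel and every model."""
    H, W, _ = image.shape
    n = len(means)
    # Eq. (5): alpha_k is the area ratio of region R_k to the whole image
    alphas = np.bincount(labels.ravel(), minlength=n) / labels.size

    flat = image.reshape(-1, 3)
    weighted = np.empty((n, H * W))
    for k in range(n):
        diff = flat - means[k]
        cov = covs[k] + 1e-6 * np.eye(3)   # small ridge for stability (our assumption)
        # Eq. (4): trivariate Gaussian density N(c_{x,y} | mu_k, Sigma_k)
        mahal = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
        weighted[k] = (alphas[k] * np.exp(-0.5 * mahal)
                       / np.sqrt((2 * np.pi) ** 3 * np.linalg.det(cov)))
    # Eq. (3): normalize the weighted likelihoods across all models
    P = weighted / (weighted.sum(axis=0, keepdims=True) + 1e-12)
    return P.reshape(n, H, W), alphas
```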

It can be seen from (4) and (5) that the color similarity between each pixel at (x,y) and each Gaussian model G_k is the dominant factor determining the normalized color likelihood measure, and the weighting factors \alpha_k (k = 1,...,n) account for the different contributions of similar-color regions globally distributed over the whole image.

2.2 Color and Spatial Saliency Estimation

The following details the calculation of the color saliency and the spatial saliency of the Gaussian models. First, the distance between each pixel at (x,y) and each Gaussian model G_k is evaluated on both color and position features. The color distance vector d_k^c(x,y) and the spatial distance vector d_k^s(x,y) are defined as

d_k^c(x,y) = c_{x,y} - \mu_k    (6)

d_k^s(x,y) = [x - \bar{x}_k, y - \bar{y}_k]^T    (7)

where (\bar{x}_k, \bar{y}_k) is the center position of each Gaussian model G_k in the spatial domain, defined as


\bar{x}_k = \frac{\sum_{(x,y)} x \, P_k(x,y)}{\sum_{(x,y)} P_k(x,y)}, \quad \bar{y}_k = \frac{\sum_{(x,y)} y \, P_k(x,y)}{\sum_{(x,y)} P_k(x,y)}    (8)
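A short sketch of the center computation in (8), reusing the likelihood maps P from the sketch above:

```python
def model_centers(P):
    """Likelihood-weighted center (x_bar_k, y_bar_k) of Eq. (8)
    for each Gaussian model; P has shape (n, H, W)."""
    n, H, W = P.shape
    ys, xs = np.mgrid[0:H, 0:W]
    total = P.reshape(n, -1).sum(axis=1)                  # denominator of Eq. (8)
    x_bar = (P * xs).reshape(n, -1).sum(axis=1) / total   # x coordinate of Eq. (8)
    y_bar = (P * ys).reshape(n, -1).sum(axis=1) / total   # y coordinate of Eq. (8)
    return x_bar, y_bar
```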

It is observed from a variety of images that the colors of salient regions are usually distinctive from the background colors, and thus distinctive colors deserve more attention from a human observer. In the color domain, if one Gaussian model is far away from the other Gaussian models, the colors near the mean of this Gaussian model are distinctive from the other colors. The color distance between any pair of Gaussian models G_i and G_j is defined as

D_c(i,j) = \frac{\sum_{(x,y)} P_i(x,y) \, \|d_j^c(x,y)\|}{\sum_{(x,y)} P_i(x,y)} + \frac{\sum_{(x,y)} P_j(x,y) \, \|d_i^c(x,y)\|}{\sum_{(x,y)} P_j(x,y)}    (9)

The color saliency of the Gaussian model G_i is then defined as the sum of the weighted color distances between G_i and all other Gaussian models:

GS_c(i) = \sum_{j=1}^{n} \alpha_j D_c(i,j) - \alpha_i D_c(i,i)    (10)

The color saliency measures of all n Gaussian models are normalized so that \sum_{i=1}^{n} GS_c(i) = 1.
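The following sketch computes the pairwise color distances of (9) and the color saliency of (10). We read the scalar distance in (9) as the Euclidean norm of the color distance vector d^c, which is our interpretation of the notation; the double loop is written for clarity rather than speed.

```python
def color_saliency(image, P, means, alphas):
    """Color saliency GS_c(i) of Eq. (10) from the pairwise
    color distances D_c(i, j) of Eq. (9)."""
    n = len(means)
    flat = image.reshape(-1, 3)
    Pf = P.reshape(n, -1)
    Psum = Pf.sum(axis=1)
    # d[k] holds ||d_k^c(x,y)|| = ||c_{x,y} - mu_k|| for every pixel (Eq. 6)
    d = np.stack([np.linalg.norm(flat - means[k], axis=1) for k in range(n)])
    D = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # Eq. (9): mutual likelihood-weighted color distance
            D[i, j] = Pf[i] @ d[j] / Psum[i] + Pf[j] @ d[i] / Psum[j]
    # Eq. (10): weighted distances to all models, minus the intra-distance
    GSc = D @ alphas - alphas * np.diag(D)
    return GSc / GSc.sum()    # normalized so the measures sum to 1
```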

In the spatial domain, salient regions are usually surrounded by background regions, and the colors of background regions usually have a wider distribution over the whole image than the colors of salient regions. Therefore, the spatial distribution of the Gaussian models estimated for the segmented regions can be exploited to identify the colors of salient regions. The spatial distance between any pair of Gaussian models G_i and G_j is defined as

D_s(i,j) = \frac{\sum_{(x,y)} P_i(x,y) \, \|d_j^s(x,y)\|}{\sum_{(x,y)} P_i(x,y)} + \frac{\sum_{(x,y)} P_j(x,y) \, \|d_i^s(x,y)\|}{\sum_{(x,y)} P_j(x,y)}    (11)

Based on this distance, Gaussian models that mainly cover salient regions usually have shorter spatial distances to the other Gaussian models. The spatial saliency of a Gaussian model G_i is therefore defined as the reciprocal of the sum of weighted spatial distances between G_i and all Gaussian models:

GS_s(i) = \frac{1}{\sum_{j=1}^{n} \alpha_j D_s(i,j)}    (12)

Comparing (12) with (10), note that (12) also includes the intra-distance D_s(i,i), whereas (10) includes only the inter-distances D_c(i,j), \forall j \neq i. The intra-distance D_s(i,i) represents the coverage of the Gaussian model G_i in the spatial domain, and thus it should be included in the spatial saliency measure. The intra-distance D_c(i,i), by contrast, represents the color homogeneity of the Gaussian model G_i; in terms of color, it is not reasonable to consider a non-homogeneous Gaussian model more salient than a homogeneous one, and therefore D_c(i,i) is excluded from the color saliency measure. Like the color saliency measures, the spatial saliency measures of all Gaussian models are normalized so that \sum_{i=1}^{n} GS_s(i) = 1.

2.3 Pixel-wise Saliency Map

Based on the normalized color likelihood measures of the pixels and the color/spatial saliency measures of the Gaussian models, the pixel-wise color saliency map S_c and spatial saliency map S_s are generated as follows:

S_c(x,y) = \sum_{i=1}^{n} P_i(x,y) \, GS_c(i)    (13)

S_s(x,y) = \sum_{i=1}^{n} P_i(x,y) \, GS_s(i)    (14)

It can be seen from (13) and (14) that the pixel saliency is the sum of the saliency measures of the Gaussian models weighted by the normalized color likelihood measures. In this sense, the global color information over the whole image is incorporated into the saliency calculation at each local pixel. Using (13) and (14), the pixel-wise color saliency map and spatial saliency map are generated and normalized into the range [0, 255]. They are then integrated into the final saliency map S as follows:

S(x,y) = S_c(x,y) \cdot S_s(x,y)    (15)

The final saliency map, shown in Figure 2, is also normalized into the range [0, 255]. By combining the color saliency and the spatial saliency, our model highlights the salient objects more significantly.

Figure 2. Saliency measures of the Gaussian models: (a) clustered image, (b) color saliency result, (c) spatial saliency result, (d) final saliency map.
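Analogously, here is a sketch of the spatial saliency (11)-(12) and of the pixel-wise maps and fusion (13)-(15). The min-max scaling used for the [0, 255] normalization is our assumption, since the paper does not specify the normalization method; GSc from the color-saliency sketch and GSs below plug directly into fuse_saliency to obtain the final map S.

```python
def spatial_saliency(P, x_bar, y_bar, alphas):
    """Spatial saliency GS_s(i) of Eq. (12) from the pairwise
    spatial distances D_s(i, j) of Eq. (11)."""
    n, H, W = P.shape
    ys, xs = np.mgrid[0:H, 0:W]
    Pf = P.reshape(n, -1)
    Psum = Pf.sum(axis=1)
    # d[k] holds ||d_k^s(x,y)||, the distance of each pixel to the
    # center of model G_k (Eq. 7)
    d = np.stack([np.hypot(xs - x_bar[k], ys - y_bar[k]).ravel()
                  for k in range(n)])
    D = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            D[i, j] = Pf[i] @ d[j] / Psum[i] + Pf[j] @ d[i] / Psum[j]  # Eq. (11)
    GSs = 1.0 / (D @ alphas)     # Eq. (12): reciprocal; intra-distance is kept
    return GSs / GSs.sum()       # normalized so the measures sum to 1


def fuse_saliency(P, GSc, GSs):
    """Pixel-wise maps of Eqs. (13)-(14) and the fused map of Eq. (15)."""
    scale = lambda m: 255.0 * (m - m.min()) / (m.max() - m.min() + 1e-12)
    Sc = np.tensordot(GSc, P, axes=1)    # Eq. (13): likelihood-weighted color saliency
    Ss = np.tensordot(GSs, P, axes=1)    # Eq. (14): likelihood-weighted spatial saliency
    return scale(scale(Sc) * scale(Ss))  # Eq. (15): multiplicative fusion, in [0, 255]
```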


3. Experimental Results

We perform experiments on the image dataset of [3], for which ground truths for salient objects are provided for 1000 images (available at http://ivrg.epfl.ch/supplementary_material/RK_CVPR09/GroundTruth/binarymasks.zip). We evaluate the saliency detection performance for salient object segmentation, in which saliency maps provide useful segmentation cues, and compare our segmentation method, implemented in Matlab, against three previous models: AC08 [3], FT09 [4], and MS10 [7]. Assuming that the binary object mask generated by thresholding a saliency map is denoted by B and the corresponding ground truth by G, precision and recall are defined as

precision = \frac{\sum_{(x,y)} B(x,y) \, G(x,y)}{\sum_{(x,y)} B(x,y)}    (16)

recall = \frac{\sum_{(x,y)} B(x,y) \, G(x,y)}{\sum_{(x,y)} G(x,y)}    (17)

We further use the Otsu threshold to segment the salient objects. Average values of precision, recall, and F-measure are obtained over the same ground-truth database used in the previous experiment, where the F-measure is defined as

F_\beta = \frac{(1 + \beta^2) \cdot precision \cdot recall}{\beta^2 \cdot precision + recall}    (18)

We use \beta^2 = 0.3 in our work to weight precision more than recall. Figure 3 shows that our method outperforms the previous state-of-the-art methods on all three criteria.

Figure 3. Precision, recall, and F_\beta bars for salient object segmentation of AC08, FT09, MS10, and our method. Our method shows high precision, recall, and F_\beta values on the 1000-image database.

We also show examples of our segmented images in Figure 4. For the previous methods, the poor quality of the saliency maps causes bad results when they are applied to image segmentation. Three problems are visible in their segmentation results (3rd, 4th, and 5th columns). First, they fail to segment the whole objects. Second, they include much noise caused by illumination, shade, or shadows. Lastly, they mistake parts of the background for salient objects. In contrast, our segmented images (last column) match the ground truth (1st column) well despite minor errors, and thus achieve very high precision, recall, and F-beta values (0.868, 0.7312, and 0.8312, respectively). More experimental results on our saliency model and segmentation will be presented in our journal paper.
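For the evaluation protocol of (16)-(18), here is a hedged sketch; we assume scikit-image's threshold_otsu for the Otsu binarization, since the paper names the method but not an implementation.

```python
from skimage.filters import threshold_otsu   # assumed library choice

def evaluate(saliency_map, ground_truth, beta2=0.3):
    """Precision, recall, and F-beta of Eqs. (16)-(18) for one image.

    saliency_map : (H, W) array in [0, 255]
    ground_truth : (H, W) binary mask G
    """
    B = saliency_map >= threshold_otsu(saliency_map)   # binary object mask
    G = ground_truth.astype(bool)
    tp = np.logical_and(B, G).sum()                    # sum of B(x,y) * G(x,y)
    precision = tp / max(B.sum(), 1)                   # Eq. (16)
    recall = tp / max(G.sum(), 1)                      # Eq. (17)
    f_beta = ((1 + beta2) * precision * recall
              / (beta2 * precision + recall + 1e-12))  # Eq. (18)
    return precision, recall, f_beta
```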

Figure 4. Examples of image segmentation based on saliency maps: (1st column) original images, (2nd column) ground truths, (3rd, 4th, 5th columns) results using the saliency maps of AC08, FT09, and MS10, (6th column) results using our saliency maps.

4. Conclusion and Future Works

We have presented an efficient saliency model based on tensor voting through Gaussian modeling. The color saliency measure and the spatial saliency measure of each Gaussian model are evaluated based on its color distinctiveness and spatial distribution, respectively. The pixel-wise color saliency map and spatial saliency map are generated by summing the color and spatial saliency measures of the Gaussian models weighted by the normalized color likelihood measures, and they are finally combined to obtain the final saliency map. Our saliency model can significantly highlight the complete salient regions with accurate boundaries and provides overall better saliency detection performance than previous saliency models. Experimental results demonstrate that our saliency model is very useful for salient object segmentation, and we believe it can be efficiently exploited by a wide range of content-based applications, including content-aware image retargeting and ROI-based image coding. In future work, we plan to use more high-level knowledge to construct saliency maps for detecting salient objects, which can serve further applications such as image segmentation, object classification, and recognition.

Acknowledgements: This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (2012-047759 and 2013-006535). This research was also supported by the MSIP (Ministry of Science, ICT & Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (NIPA-2013-H0301-13-3005) supervised by the NIPA (National IT Industry Promotion Agency).

Corresponding Author: Professor Guee-Sang Lee, Department of Electronics and Computer Engineering, Chonnam National University, 300 Yongbong-Dong, Buk-gu 500-757, Gwangju, Korea. E-mail: gslee@jnu.ac.kr

References
[1] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 11, pp. 1254-1259, November 1998.
[2] C. Koch and S. Ullman, "Shifts in selective visual attention: towards the underlying neural circuitry," Human Neurobiology, vol. 4, no. 4, pp. 219-227, 1985.
[3] R. Achanta, F. Estrada, P. Wils, and S. Susstrunk, "Salient region detection and segmentation," Proc. Int. Conf. on Computer Vision Systems, Santorini, Greece, May 2008, pp. 66-75.
[4] R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk, "Frequency-tuned salient region detection," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2009, pp. 1597-1604.
[5] J. Harel, C. Koch, and P. Perona, "Graph-based visual saliency," Advances in Neural Information Processing Systems, pp. 545-552, 2007.
[6] X. Hou and L. Zhang, "Saliency detection: a spectral residual approach," Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Minneapolis, MN, June 2007, pp. 1-8.
[7] R. Achanta and S. Susstrunk, "Saliency detection using maximum symmetric surround," IEEE 17th International Conference on Image Processing (ICIP 2010), 2010.
[8] G. Guy and G. Medioni, "Inference of surfaces, 3D curves, and junctions from sparse, noisy 3-D data," IEEE Trans. on PAMI, vol. 19, no. 11, pp. 1265-1277, 1997.
[9] G. Medioni, M.-S. Lee, and C.-K. Tang, A Computational Framework for Segmentation and Grouping, Elsevier, 2000.

