
2015 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery

Survey on Image Saliency Detection Methods

MA Runxin, YU Yang & YUE Xiaomin


School of Computer Science and Engineering
Hebei University of Technology
Tianjin, China
e-mail: 1282239082@qq.com

978-1-4673-9200-6/15 $31.00 © 2015 IEEE
DOI 10.1109/CyberC.2015.98

Abstract—Image saliency detection is one of the most active research topics in the field of computer vision. It focuses on how to detect significant objects against complex backgrounds while reducing the computational cost of obtaining high-resolution, clearly bounded, uniformly highlighted salient objects. First, this paper analyzes and presents the state-of-the-art research on image saliency detection in detail. The related methods are then classified into two types, space domain based and frequency domain based, and the prototypical methods are analyzed and tested on public image databases for image saliency detection. Finally, the development tendency of image saliency detection is predicted.

Keywords- image saliency detection; visual attention models; spatial domain; frequency domain

I. INTRODUCTION

With the development of science and technology, images have become one of the most important media for information transmission, and it is difficult to deal with large numbers of images quickly and accurately. Studies find that the most important content of an image is often concentrated in a few key parts, the so-called salient regions. If the salient parts are extracted accurately, fast processing of images can be realized and the calculation time can be significantly reduced. The task of image saliency detection is to find out which areas are more likely to become the focus of human visual attention [1]. A saliency map is used to describe the degree of image saliency: pixel values in the saliency map represent the saliency degree of the corresponding regions in the image. At present, image saliency detection is applied very successfully in content transmission [2], image compression [3-4], image segmentation [5-6], target recognition [7], image scaling [8], image retrieval [9], image editing [10-13], and other areas.

The Human Visual System (HVS) [14-15] has a very efficient capability of data filtering [16]. Therefore, research on saliency detection technology started with neuropsychological methods [17-18]. With the growth of computer processing capacity, more and more computer vision research institutions pay attention to research on saliency-based visual attention. Well-known institutes include: iLab of U.S.C., the Artificial Intelligence Laboratory (MIT AILab) of the Massachusetts Institute of Technology, the Visual Attention and Conception Laboratory (VAC Lab) of George Mason University, the Pattern Recognition Laboratory (PRLab) at the University of Neuchatel in Switzerland [19], the Computer Vision and Multimedia Laboratory (CVM Lab) of Pavia University in Italy, the Image Processing and Pattern Recognition Institute of Shanghai Jiaotong University, the Intelligent Information Processing Institute of Xi'an Electronic and Engineering University, and so on. At the same time, many academic journals and conferences on artificial intelligence and computational vision also provide communication platforms for related research achievements.

According to the cause of generating attention, neuropsychology divides saliency detection studies into two models [20]: the image-data-based bottom-up model and the task-based top-down model. The bottom-up model is a low-level cognitive process, and most research predicts human fixation in the free-viewing task based on this process [21-30]. Moreover, many saliency models construct the cognitive process by calculating the saliency degree of low-level features. In contrast, by adjusting the selection criteria, the top-down models focus attention on a particular target and then recognize it. The top-down models require prior knowledge. The selection process is related to human subjective consciousness, and different subjects in the same scene will attend to different details because of varied purposes. As a result, the top-down models are still largely limited to the theoretical stage [31-36].

In 1998, Itti et al. presented a classic visual attention model, referred to as the Itti Model [22]. The theory of this model is based on Koch's biological structure and the feature integration theory of Treisman [37-38], and is furthermore applied to the analysis of real scene images. The experimental results are close to human visual perception. In 2006, Jonathan Harel et al. proposed the GBVS (Graph-Based Visual Saliency) algorithm [27] based on graph theory. The algorithm structurally still adopts Itti's three stages of feature extraction, center-surround contrast and recombination. It uses a Markov chain [39] to calculate the center-surround mutual influence. In 2008, Achanta et al. proposed a full-resolution algorithm, named the AC algorithm [28]. Because it only takes into account the differences between the average color and brightness of a region and its surrounding regions, this algorithm is very fast. The advantage of frequency domain algorithms in computing speed also cannot be ignored. Hou et al. proposed the Spectral Residual approach (SR algorithm) [29] in 2007, and Achanta et al. proposed a Frequency-tuned algorithm (FT
algorithm) [30] in 2009. They are both frequency domain based algorithms.

An effective saliency detection method should be able to detect salient objects in cluttered backgrounds at little computational cost. However, it must be admitted that, to some extent, image saliency detection is difficult. On one hand, the visual attention mechanism [40] relies on the very complex theoretical background of psychology and neurology, which cannot be fully realized by simple algorithms and computer simulation. On the other hand, the combination of visual attention and image processing is a new research direction, and many problems remain unsolved, such as the selection and extraction of low-level visual features, and the selection and transfer of the attention focuses [41].

Currently, summary literature on typical image saliency detection algorithms is lacking. It is therefore necessary to collect and analyze image saliency research at the current stage, in the hope of better guiding future work. On the basis of summarizing the background and current status of image saliency research, this paper classifies and describes the state-of-the-art saliency detection methods. Moreover, through experiments on typical image databases, the results and run times of different methods are compared. Finally, a summary and prospect of image saliency detection are given.

II. THE PRINCIPAL METHODS OF SALIENCY DETECTION

Saliency detection methods can be classified in several ways. According to the saliency model, they can be divided into entirely biologically inspired algorithms, purely computational methods and hybrid methods. According to the visual attention mechanism, they can be divided into bottom-up methods and top-down methods. According to the processing domain, they can be divided into space domain methods and frequency domain methods. According to the algorithmic idea, they can be divided into spatial characteristics [42] based algorithms, statistical characteristics [43] based algorithms and atlas-based algorithms. In this paper, the relevant methods are classified according to the processing domain.

A. Saliency Detection Algorithm Based on Space Domain

At present, the Itti model is considered a classic among the models mentioned above. It is a visual computational model built on local feature contrasts. The model simulates the selective attention of the biological visual attention mechanism and promotes the visual selective attention model to the level of quantitative calculation. By extracting color, brightness, orientation and other low-level features of the original image, it calculates feature maps at different scales according to a center-surround operator on the features. The feature maps are then normalized and integrated, and the saliency map is obtained.

The Itti model extracts the brightness feature I=(r+g+b)/3, the red component R=r-(g+b)/2, the green component G=g-(r+b)/2, the blue component B=b-(r+g)/2, the yellow component Y=(r+g)/2-|r-g|/2-b and the orientation feature O. The orientation feature O consists of the four orientation components of a Gabor wavelet at θ={0°, 45°, 90°, 135°}. A Gabor filter, the product of a Gaussian and a cosine function, is used to extract these four orientation features:

 g(x, y, θ, σ, γ, λ) = exp(-(x'² + γ²y'²)/(2σ²)) · cos(2πx'/λ)  (1)

Here, θ is the orientation angle; σ is the bandwidth of the filter; γ is the aspect ratio; and λ represents the wavelength, with x' = x cosθ + y sinθ and y' = -x sinθ + y cosθ, where (x, y) are the pixel coordinates in the space domain. The orientation feature O is obtained by convolving the image brightness feature I with g in (1):

 O = I * g  (2)

The Itti Model is carried out at multiple scales, and the experimental results in [22] show that image scale is an important factor affecting the human visual attention mechanism. The model divides the image into eight scales, 0-7, from fine to coarse. It interpolates the feature map at a coarse scale s up to the resolution of the feature map at a fine scale c, and the point-by-point difference, denoted ⊖, is then computed. The feature map at the fine scale represents the central region, while the feature map at the coarse scale represents the surrounding area. In this way, six brightness feature attention maps, 12 color feature attention maps and 24 orientation feature attention maps are obtained.

The brightness feature attention maps are:

 I(c, s) = |I(c) ⊖ I(s)|  (3)

The color feature attention maps:

 RG(c, s) = |(R(c) - G(c)) ⊖ (G(s) - R(s))|
 BY(c, s) = |(B(c) - Y(c)) ⊖ (Y(s) - B(s))|  (4)

The orientation feature attention maps:

 O(c, s, θ) = |O(c, θ) ⊖ O(s, θ)|  (5)

where RG represents the color difference between R and G, and BY is the color difference between B and Y.

The Itti model adopts a normalization operator N(·) to enhance attention maps with fewer saliency peaks, and the enhanced maps are combined to generate the saliency map.

The most significant contribution of the Itti Model is that it inherits the feature integration theory and the inhibition of

return mechanism [46-47]. For the first time, it uses mathematical tools to realize a quantitative calculation of the above mechanisms, bringing the visual selective attention mechanism into the engineering field of quantitative calculation [48]. It achieves excellent results in particular scenes. However, noise interference is aggravated by the summation of all the feature maps in the model. Inspired by Itti's method, Frintrop et al. [26] used the visual attention mechanism to detect the salient region of an object. They proposed using a square filter to calculate the center-surround difference and used integral images to accelerate the computation.

The AC algorithm [28] generates a saliency map by pure mathematical calculation. Its saliency value is the local contrast between a region R1 of the input image and its neighborhood R2 at different scales. In Lab color space, the saliency value of pixel (x, y) at one scale is obtained from the difference between v1=[L1, a1, b1] and v2=[L2, a2, b2]: c_i,j = d(v1, v2), where v1 and v2 are the mean feature vectors of regions R1 and R2, and d is the Euclidean distance [49]. The sum of the saliency values m(i, j) = Σ_s c(i, j) over the scales s is the final saliency value. This algorithm differs from the Itti Model in that it does not change the input image size but only resizes R2, so the obtained saliency map has the same resolution as the original image.

Ming-Ming Cheng et al. proposed two saliency models based on global contrast [50], called Histogram-based Contrast (HC) and Region-based Contrast (RC). In the HC map, the saliency value is determined by the color feature; that is, the saliency value is obtained from the color contrast between one pixel and its neighbors. The HC method uses a histogram-based approach to obtain the saliency map, which has the same resolution as the original image. The RC saliency map improves on the HC saliency map, since the spatial relationship also plays an important role in human attention models: high contrast with adjacent areas attracts more visual attention than the same contrast with distant areas. The RC method therefore combines the spatial relationship with the HC method. Firstly, the input image is divided into several regions. Then the color contrasts [51] of each region are calculated, and the saliency value of a region is defined as the weighted sum of its contrasts with neighboring areas. The weight is determined by the spatial distance between regions, with farther regions given less weight.

Recently, Goferman et al. proposed a context-based image saliency detection algorithm called the CA (Context-Aware) algorithm [21]. It can detect the salient regions of a representative scene, not just the salient target. The method divides the input image into blocks. Firstly, the dissimilarity of any two blocks is calculated, and from it the local-global saliency at a single scale; then saliency is enhanced over multiple scales, and the display of the center-surround region becomes prominent. Finally, the saliency map is obtained. The algorithm is based on the four principles [38] referred to in the visual psychology literature [52]. It considers local low-level factors such as color and contrast [53-55], global factors, visual organization rules [56] and high-level factors. It is mainly suitable for situations where the surrounding environment and the salient object have the same importance.

In the space domain, saliency detection models based on graph theory usually divide the image into blocks and take the blocks as nodes. Graphs are then built with weighted edges among the pixel blocks according to color, brightness, orientation and other visual features [57]. The representative algorithm is the graph-based GBVS algorithm [27]. In feature extraction, this algorithm simulates the visual principle, similarly to the Itti algorithm. In generating the saliency map, however, it introduces a Markov chain and uses the graph model to calculate the center-surround difference; pure mathematical calculation is then used to obtain the saliency map.

The GBVS saliency map is obtained as follows. Firstly, suppose the input image feature map is M: [n]² → R. The dissimilarity of two point features is calculated by Equation (6):

 d((i, j) ‖ (p, q)) = |log(M(i, j)/M(p, q))|  (6)

This dissimilarity can also be replaced by:

 d((i, j) ‖ (p, q)) = |M(i, j) - M(p, q)|  (7)

in which M(i, j) and M(p, q) are the feature values of nodes (i, j) and (p, q), respectively. Each node in the feature map M is connected with the other N-1 nodes to obtain a fully connected directed graph GA, and the weight of the edge from node (i, j) to node (p, q) is defined as:

 w((i, j), (p, q)) = d((i, j) ‖ (p, q)) · F(i - p, j - q)  (8)

 F(a, b) = exp(-(a² + b²)/(2σ²))  (9)

Here, σ is a free parameter (usually one-tenth to one-fifteenth of the image width). The weights of the two opposite directions are the same. The weights of the edges between any two nodes are then normalized to [0, 1]. A Markov chain is defined on GA, and the saliency value is obtained from the comparison between each pair of nodes. The resulting activation map is also normalized, and finally the saliency map is obtained. The GBVS algorithm is relatively robust and can highlight the salient region in an obvious way [58].
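As a concrete illustration of the center-surround differences I(c, s) used by the Itti model, the sketch below builds a brightness Gaussian pyramid with NumPy and forms the six brightness feature attention maps. It is a minimal sketch, not the authors' implementation: the 5-tap binomial filter, the nearest-neighbour upsampling and the random stand-in image are all simplifying assumptions.

```python
import numpy as np

def gaussian_pyramid(img, levels=9):
    """Dyadic pyramid: 5-tap binomial smoothing, then 2x downsampling."""
    k = np.array([1., 4., 6., 4., 1.]) / 16.
    pyr = [img.astype(float)]
    for _ in range(levels - 1):
        s = pyr[-1]
        for ax in (0, 1):
            s = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), ax, s)
        pyr.append(s[::2, ::2])
    return pyr

def center_surround(pyr, c, s):
    """|I(c) (-) I(s)|: bring the coarse map s up to the size of the fine map c."""
    fine, coarse = pyr[c], pyr[s]
    ry = (np.arange(fine.shape[0]) * coarse.shape[0]) // fine.shape[0]
    rx = (np.arange(fine.shape[1]) * coarse.shape[1]) // fine.shape[1]
    return np.abs(fine - coarse[np.ix_(ry, rx)])   # nearest-neighbour upsampling

rgb = np.random.rand(256, 256, 3)   # stand-in for an input image
I = rgb.mean(axis=2)                # brightness feature
pyr = gaussian_pyramid(I)
# six brightness attention maps: fine scales c in {2,3,4}, surround s = c + {3,4}
maps = [center_surround(pyr, c, c + d) for c in (2, 3, 4) for d in (3, 4)]
```

The same machinery, applied to the RG/BY color opponents and the four Gabor orientation channels, yields the 12 color and 24 orientation attention maps.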

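The Markov-chain step of GBVS can likewise be sketched on a tiny feature map. This is an illustrative reading, not the reference implementation: it assumes the log-ratio dissimilarity and Gaussian falloff described above, and it normalizes each node's outbound edge weights into a transition matrix whose stationary distribution (found by power iteration) serves as the activation map.

```python
import numpy as np

def gbvs_activation(M, sigma):
    """Markov-chain activation in the spirit of GBVS: edge weight =
    feature dissimilarity d((i,j)||(p,q)) times spatial falloff F(a,b);
    the stationary distribution of the chain is the activation map."""
    h, w = M.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pos = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    f = M.ravel()
    d = np.abs(np.log(f[:, None] / f[None, :]))       # dissimilarity of features
    dist2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)
    W = d * np.exp(-dist2 / (2.0 * sigma ** 2))       # directed edge weights
    P = W / W.sum(axis=1, keepdims=True)              # outbound weights -> transition matrix
    v = np.full(h * w, 1.0 / (h * w))
    for _ in range(200):                              # power iteration to stationarity
        v = v @ P
    return v.reshape(h, w)

M = np.random.rand(8, 8) + 0.1    # small positive feature map as a stand-in
A = gbvs_activation(M, sigma=2.0)
```

In the full algorithm this step is repeated for every feature channel and scale, and the resulting activation maps are normalized and combined into the saliency map.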
B. Frequency Domain Based Saliency Detection Algorithm

Space domain saliency algorithms commonly have shortcomings such as sensitivity to parameter selection and high computational complexity, which restrict their application in real-time systems. Frequency domain information can alleviate this problem. Frequency-domain saliency detection converts the image from the space domain to the frequency domain [59], and the frequency spectrum is then used to detect the relationships among salient features. The representative models are the SR (Spectral Residual) algorithm [29], the PQFT (Phase spectrum of Quaternion Fourier Transform) algorithm [51], the FT (Frequency-tuned) algorithm [30], the MSS (Maximum Symmetric Surround) method [52], and so on.

Hou and Zhang proposed the Spectral Residual (SR) algorithm, the most representative frequency-domain saliency region detection method. The SR method considers that the information contained in an image can be divided into salient information and redundant information. For different image data, the log spectra have a similar distribution trend, and the curve satisfies a local linearity condition [60]. So just by focusing on the differential part, the salient region of the original image can be obtained. According to the spectral residual theory [61], the input image I(x) is first transformed into the two-dimensional discrete Fourier domain [62]. Then the amplitude is computed and its logarithm taken, giving the log spectrum L(f).

Amplitude spectrum:

 A(f) = |F[I(x)]|  (10)

Phase spectrum:

 P(f) = arg(F[I(x)])  (11)

in which F denotes the Fourier transform, A(f) is the amplitude spectrum [63], and P(f) is the phase spectrum of I(x).

Log spectrum:

 L(f) = log(A(f))  (12)

The spectral residual R(f) is the difference between the log spectrum and its mean-filtered version:

 R(f) = L(f) - h_n(f) * L(f)  (13)

The saliency map is then:

 S(x) = g(x) * |F⁻¹[exp(R(f) + iP(f))]|²  (14)

in which h_n(f) is an n×n mean filtering operator whose coefficients sum to 1, and g(x) is a two-dimensional Gaussian smoothing operator [64, 65]. The smoothing gives the saliency map a better visual effect.

The SR algorithm can suppress some simple backgrounds and obtains relatively satisfying results to some extent. What's more, its operating speed is greatly improved compared with space domain algorithms. The theory behind SR is the scale invariance of natural images, i.e., the 1/f rule: the mean Fourier amplitude spectrum A(f) of natural images follows a distribution in which E{A(f)} is proportional to 1/f. However, a single image does not have this scale-invariant property. Moreover, the method lacks physiological support, as the 1/f law of natural images may not exist in the visual system.

Taking a one-dimensional waveform as an example, Guo et al. [66] explained that when waveforms are recovered using only the phase spectrum information [67], periodic signals [68-70] are suppressed while regions without periodic change are not. Accordingly, Guo et al. put forward the PQFT method. It uses only the phase spectrum of the input image after the Fourier transform and discards the magnitude spectrum. Then, by the inverse Fourier transform, a saliency map similar to that of the SR method is obtained. At the same time, they extended PFT (Phase spectrum of Fourier Transform): besides the brightness information [71], following the color feature extraction of the Itti Model, they also extracted two color features, RG and BY. For dynamic images, the difference image M between two frames is extracted as a motion feature. So the image I(t) obtained at time t can be replaced by a quaternion image q(t):

 q(t) = M(t) + RG(t)μ₁ + BY(t)μ₂ + I(t)μ₃  (15)

where μ₁² = μ₂² = μ₃² = -1, μ₁ ⊥ μ₂, μ₂ ⊥ μ₃, μ₃ ⊥ μ₁, and μ₃ = μ₁μ₂.

q(t) is represented in symplectic form:

 q(t) = f₁(t) + f₂(t)μ₂  (16)

where f₁(t) = M(t) + RG(t)μ₁ and f₂(t) = BY(t) + I(t)μ₁.

The Fourier transform of the quaternion image q(x, y, t) can be defined as:

 Q[u, v] = F₁[u, v] + F₂[u, v]μ₂  (17)

 F_i[u, v] = (1/√(MN)) Σ_{m=0}^{M-1} Σ_{n=0}^{N-1} e^{-μ₁2π(mv/M + nu/N)} f_i(n, m)  (18)

From (17) and (18) we obtain the frequency domain representation Q(t) of q(t), which in polar form is Q(t) = ‖Q(t)‖ e^{μφ(t)}, where φ(t) is the phase spectrum. Setting the magnitude spectrum ‖Q(t)‖ = 1 and applying the inverse quaternion Fourier transform, the spatial saliency map is

obtained. PQFT is a hypercomplex-Fourier-transform phase spectrum based saliency detection method [72]. It is constituted from the hypercomplex Fourier phase spectrum of the different color-pair differences; it yields high saliency at edges, but it cannot describe a relatively complete salient target.

In 2009, Achanta et al. proposed a frequency domain analysis saliency algorithm called FT [30]. The basic idea is to filter the image from low frequency to high frequency using several band-pass filters and then combine all the outputs to get the final saliency map. The algorithm produces a saliency map at full resolution [73-74]. The whole process is realized by composing several Gaussian band-pass (difference-of-Gaussians) filters, described as:

 DoG(x, y) = (1/(2π)) [(1/σ₁²) e^{-(x²+y²)/(2σ₁²)} - (1/σ₂²) e^{-(x²+y²)/(2σ₂²)}]
       = G(x, y, σ₁) - G(x, y, σ₂)  (19)

Composing several such filters gives:

 Σ_{n=0}^{N-1} [G(x, y, ρ^{n+1}σ) - G(x, y, ρⁿσ)] = G(x, y, ρᴺσ) - G(x, y, σ)  (20)

Suppose the first Gaussian standard deviation ρᴺσ is infinite; after convolution, every pixel value of the image is then the mean value of all pixels in the original image. The second Gaussian standard deviation σ is relatively small, and a binomial filter can be used to approximate it and accelerate the calculation. The saliency value is S(x, y) = ‖I_μ - I_ωhc(x, y)‖, where I_μ is the average image feature vector and I_ωhc(x, y) is the corresponding pixel of the image after Gaussian filtering. In Lab color space, the saliency map is obtained by calculating the Euclidean distance between the average vector of the input image and the vectors after Gaussian filtering. The algorithm gives clear boundaries and uniformly highlights the salient region; its resolution is the same as that of the original image, and its calculation speed is relatively fast. Meanwhile, it stresses highlighting the biggest salient object. It can recover the overall outline of the salient target, but the brightness difference between the salient region and the rest is not obvious.

TABLE I. CHARACTERISTICS OF TYPICAL SALIENCY DETECTION METHODS

| Method Name | Color Space | Abstract | Multi-scale | Method Type | Resolution | Biology Basis | Range of Application |
|---|---|---|---|---|---|---|---|
| Itti | RGB | Local | T | Space domain | 1/16 × 1/16 | T | The resolution requirement of the saliency map is not high, and the noise interference of the original image is little |
| AC | LAB | Local | T | Space domain | 1:1 | F | High real-time requirement |
| CA | LAB | Global and local | T | Space domain | 1:1 | F | The saliency object and the surrounding environment are both important |
| GBVS | RGB | Global | F | Space domain | 1/8 × 1/8 | T | Original image resolution is relatively low |
| SR | RGB | Global | F | Frequency domain | 64p × 64p | F | Exact location and outline of saliency object are not required |
| PQFT | RGB | Global | F | Frequency domain | 1:1 | F | Color images and video frames with white noise |
| FT | LAB | Global | F | Frequency domain | 1:1 | F | Relatively small saliency object |
| MSS | LAB | Global | F | Frequency domain | 1:1 | F | Relatively integrated saliency object |

The FT algorithm uses filters with fixed parameters. When the salient object is too large, it does not work well: in the saliency formula S(x, y), the salient object itself affects the calculation of the Lab-space mean value, so the saliency value calculated from the Euclidean distance is slightly smaller relative to the background. In 2010, Achanta et al. improved the FT method and proposed the MSS method [52]. In accordance with the distance between a pixel and the image boundary, this method varies the center-surround bandwidth; it thus uses the mean value of the largest possible symmetric surrounding region in place of the global average feature vector of the FT algorithm. For an image of width w and height h, the symmetric-surround saliency value at pixel (x, y), S_SS(x, y), is:

 S_SS(x, y) = ‖I_μ(x, y) - I_f(x, y)‖  (21)

I_μ(x, y) is the average vector of the sub-region centered on point (x, y):

 I_μ(x, y) = (1/A) Σ_{i=x-x₀}^{x+x₀} Σ_{j=y-y₀}^{y+y₀} I(i, j)  (22)

The offsets x₀, y₀ and sub-region area A are defined as:

 x₀ = min(x, w - x), y₀ = min(y, h - y), A = (2x₀ + 1)(2y₀ + 1)  (23)

The MSS model reduces background saliency. However, if the salient object in the image is not integral, it will be processed as background.

C. Characteristics of Different Saliency Detection Methods

This section summarizes the characteristics of the typical saliency detection methods mentioned above. The color space, abstract level, method type, image resolution, whether there is a biological basis, and the range of application are listed in Table 1.

III. EXPERIMENT RESULTS AND ANALYSIS

Currently, the commonly used databases for saliency detection are MSRA [75] and ImgSal [76]. MSRA is the biggest existing public test set, with 25,000 images; in recent years, most saliency models have used this database to test their efficiency [77]. The images in the ImgSal database are 480×640 pixels, and there are six categories: 50 images with a big salient region, 80 images with a medium salient region, 60 images with a small salient region, 15 images with a complex salient background, 15 images with repetitive interference, and 15 images with two kinds of salient regions (large and small). The salient regions of all images are manually marked, and eye movement data are collected with an eye tracker.

Besides, McGill is also a widely used saliency database. Its image resolution is 1024×768 pixels, and it has 12 animal images, 12 street scene images, 16 building images, 20 plant images and 41 natural landscape images. Images in the Mouse tracking database are divided into three kinds: natural images, advertisements and website homepages. The images of PASCAL2 are classified into four categories: human, animal, vehicle and indoor item. This paper selects typical saliency detection methods for the experiments, judged by the following: highly cited methods (Itti [22], SR [29]) and recently published methods (CA [21], MSS [52], FT [30], GBVS [27]). Among them, Itti [22], CA [21] and GBVS [27] are space domain based algorithms, and SR [29], FT [30] and MSS [52] are frequency domain based algorithms. These algorithms can be easily implemented on a computer, their experimental results are relatively good, and they have high citation rates.

These methods are used to process 200 images in the ImgSal database. Fig. 1 and Fig. 2 show saliency detection results under different situations, and Table 2 shows the average time each method requires. All codes are the authors' Matlab source codes. The software environment is MATLAB R2010a; the system environment is Windows 7, with 2GB RAM and a 2.1GHz CPU.

TABLE II. AVERAGE RUN TIME FOR EACH METHOD TO CALCULATE 200 IMAGES IN IMGSAL DATABASE

| Method | Itti | SR | GBVS | FT | CA | MSS |
|---|---|---|---|---|---|---|
| Time (s) | 3.286 | 0.141 | 3.202 | 0.964 | 114.769 | 2.391 |

From the experiment results in Fig. 1 and Fig. 2, it can be seen that Itti and GBVS take local features into account, so the salient regions in their maps are relatively bright. However, the outlines of the salient regions are not clear, and some parts of the background are also highlighted. The FT and MSS algorithms consider global features more, and their salient regions keep a relatively complete outline; however, the brightness of the salient regions is low, and the locations of the most salient regions are not obvious compared with the backgrounds. The SR algorithm, based on global frequency domain analysis, does not keep enough high-frequency information, so the boundary of the salient region is not clear.

It can be seen from Table 2 that the times required by the frequency domain based algorithms are noticeably less than those of the space domain based algorithms. Moreover, the CA algorithm, which fully considers both local and global features, needs more time than the others. In the following, two methods, fixed threshold segmentation and self-adapting threshold segmentation, are used to perform quantitative experiments.

A. Fixed Threshold Segmentation

To extract salient objects from a saliency map, the easiest method is global threshold segmentation. Here the threshold is gradually increased from 0 to 255, and each threshold is used to segment the saliency maps. Finally, the ROC (Receiver Operating Characteristic) curves of the six saliency detection methods are calculated. The ROC curve [78] is a quantitative method for analyzing, judging and deciding performance; it can dynamically and objectively evaluate whether a ranking technique is useful. At a given low false positive rate, the method with the higher true positive rate is the best. The ROC curves plotted in Fig. 3 again use the images of the ImgSal database.

The ROC curves in Fig. 3 show that the CA and GBVS algorithms have higher true positive rates at low false positive rates, while the SR algorithm performs worst.
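The fixed-threshold sweep described above is straightforward to sketch; the snippet below is a toy illustration on synthetic data, not the evaluation code used in the experiments.

```python
import numpy as np

def roc_points(sal, gt):
    """Sweep a global threshold t = 0..255 over a uint8 saliency map `sal`
    against a binary ground-truth mask `gt`; return (FPR, TPR) pairs."""
    pos, neg = gt.sum(), (~gt).sum()
    pts = []
    for t in range(256):
        seg = sal >= t                    # fixed-threshold segmentation
        tp = (seg & gt).sum()             # salient pixels correctly detected
        fp = (seg & ~gt).sum()            # background pixels marked salient
        pts.append((fp / neg, tp / pos))
    return np.array(pts)

# toy data: a bright square on a dark background, with a matching mask
sal = np.zeros((64, 64), dtype=np.uint8)
sal[20:40, 20:40] = 200
gt = sal > 0
curve = roc_points(sal, gt)   # runs from (1.0, 1.0) at t=0 down to (0.0, 0.0)
```

Averaging such curves over a whole dataset, as done here over ImgSal, gives the per-method ROC curves of Fig. 3.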

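Returning to the frequency domain methods of Section II, the SR pipeline (amplitude, phase and log spectra, spectral residual, inverse transform) is compact enough to sketch directly. This is a simplified NumPy sketch, not Hou and Zhang's code: the 3×3 mean filter and the omission of the final Gaussian smoothing are assumptions made for brevity.

```python
import numpy as np

def spectral_residual(img, n=3):
    """SR saliency: log-amplitude spectrum minus its n x n local mean,
    recombined with the phase spectrum and inverse-transformed.
    (The final Gaussian smoothing g(x) is omitted here for brevity.)"""
    F = np.fft.fft2(img)
    A = np.abs(F)                        # amplitude spectrum
    P = np.angle(F)                      # phase spectrum
    L = np.log(A + 1e-8)                 # log spectrum
    pad = n // 2                         # n x n mean filter via shifted sums
    Lp = np.pad(L, pad, mode="edge")
    mean = sum(Lp[dy:dy + L.shape[0], dx:dx + L.shape[1]]
               for dy in range(n) for dx in range(n)) / (n * n)
    R = L - mean                         # spectral residual
    S = np.abs(np.fft.ifft2(np.exp(R + 1j * P))) ** 2
    return S / S.max()

img = np.random.rand(64, 64)             # stand-in for a 64x64 input image
sal = spectral_residual(img)
```

The small fixed working resolution (64×64 in the original paper) is a large part of why SR is the fastest method in Table 2.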
Figure 1. Saliency maps of various methods (rows: input images, Itti, GBVS, SR, FT, MSS, CA) under different situations: (a)-(b) large saliency object; (c)-(d) medium saliency object; (e)-(f) small saliency object.

Figure 2. Saliency maps of various methods (rows: input images, Itti, GBVS, SR, FT, MSS, CA) under different situations: (a)-(b) repetitive interferent in the background; (c)-(d) cluttered background; (e)-(f) large and small saliency regions.

Figure 3. ROC curve.

B. Self-Adapting Threshold Segmentation

Besides the simplest fixed-threshold segmentation, there are many more complex and more efficient methods for extracting the salient target from a saliency map. The self-adapting threshold for a saliency map is set as

T_2 = \frac{2}{W \cdot H} \sum_{x=0}^{W-1} \sum_{y=0}^{H-1} S(x, y)

in which W and H are the width and height of the image, and S(x, y) is the saliency value of the pixel (x, y). In the saliency map, if the grayscale of a pixel is greater than the threshold, T_x = 1; otherwise, T_x = 0. The binary image is thus obtained. We can then calculate the average precision P, recall rate R, and F-measure of each algorithm over the dataset:

P = \mathrm{precision} = \frac{\mathrm{sum}(S, A)}{\mathrm{sum}(A)}

R = \mathrm{recall} = \frac{\mathrm{sum}(S, A)}{\mathrm{sum}(S)}

F = \mathrm{F\text{-}measure} = \frac{2PR}{P + R}

in which sum(S, A) is the sum obtained after multiplying corresponding points of the saliency map S with the manually segmented image A, while sum(S) and sum(A) are, respectively, the sums of all pixel values in the saliency map S and in the manually segmented image A. P, the ratio of the effective salient regions in the saliency map to the salient regions in the manually segmented image, is the precision ratio. R, the ratio of the effective salient regions to all salient regions shown in the saliency map, is the recall ratio. Precision and recall trade off against each other, whereas F is high only when both are improved. The greater the P, R, and F values are, the higher the absolute efficiency with which the object is shown in the saliency map. This kind of statistical index therefore yields an absolute value for comparison. The comparison under self-adapting threshold segmentation is shown in Fig. 4.

Figure 4. The precision P, recall rate R and F-measure of self-adapting threshold segmentation.

The comparison shows that the saliency maps of GBVS and CA have the highest absolute efficiency in showing targets, the FT and MSS algorithms take second place, and the Itti and SR algorithms perform worst.

IV. SUMMARY AND PROSPECT

This paper has introduced the current research progress in image saliency detection. After classifying the saliency detection methods, we focused on the principal ones and, through analysis of the experimental results, summarized the advantages and disadvantages of each. As the research deepens, further work on image saliency detection includes:

• Reducing the time cost. The time complexity of existing algorithms is high, which makes them unsuitable for real-time processing.

• Developing top-down algorithms. This paper introduces some important bottom-up saliency algorithms of the visual attention mechanism; at present, there are only a few achievements in the study of top-down visual attention. In the future, more attention should be paid to top-down mechanisms for the detection and judgment of objects, which have important research significance and room for exploration.

• Increasing the application range of saliency detection algorithms. For instance, current algorithms achieve good results on small-scale images, but the results are not satisfying on large-scale images containing large scenes.

• Expanding saliency processing methods. The algorithms can be extended with signal-processing-based methods, computational-vision-based methods, and data-mining or machine-learning-based methods.
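The self-adapting threshold above is simply twice the mean saliency value, so the binarization step reduces to a few lines. Below is a minimal NumPy sketch; the function and variable names are ours, not from the paper:

```python
import numpy as np

def binarize_adaptive(sal_map):
    """Binarize a saliency map S with the self-adapting threshold
    T2 = 2 / (W * H) * sum over (x, y) of S(x, y), i.e. twice the mean."""
    sal = np.asarray(sal_map, dtype=np.float64)
    t2 = 2.0 * sal.mean()               # the threshold T2 from the text
    return (sal > t2).astype(np.uint8)  # 1 where the grayscale exceeds T2

# Toy saliency map: a bright 4x4 square on a dark 8x8 background.
sal = np.zeros((8, 8))
sal[2:6, 2:6] = 0.9
binary = binarize_adaptive(sal)         # keeps exactly the bright square
```

Note that the comparison S > 2·mean(S) is invariant to rescaling the saliency map by a positive constant, so it applies unchanged whether S is normalized to [0, 1] or to [0, 255].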
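The three evaluation measures can likewise be sketched directly from the formulas above, keeping the denominators exactly as the text defines them (sum(A) for P, sum(S) for R); the names are again illustrative:

```python
import numpy as np

def precision_recall_f(s_bin, a_gt):
    """P = sum(S, A) / sum(A), R = sum(S, A) / sum(S), F = 2PR / (P + R),
    where sum(S, A) multiplies corresponding points of the binarized
    saliency map S and the manually segmented image A."""
    s = np.asarray(s_bin, dtype=np.float64)
    a = np.asarray(a_gt, dtype=np.float64)
    overlap = (s * a).sum()                       # sum(S, A)
    p = overlap / a.sum()
    r = overlap / s.sum()
    f = 2.0 * p * r / (p + r) if (p + r) > 0 else 0.0
    return p, r, f

# Detection covers all 4 ground-truth pixels plus 2 spurious ones.
a = np.zeros((4, 4))
a[1:3, 1:3] = 1                                   # ground truth: 4 pixels
s = a.copy()
s[0, 0] = 1
s[3, 3] = 1                                       # detection: 6 pixels
p, r, f = precision_recall_f(s, a)                # p = 1.0, r ~ 0.667, f ~ 0.8
```

If a weighted variant such as F_beta = (1 + beta^2)PR / (beta^2 P + R) is preferred, only the last line of the function changes.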
REFERENCES

[1] N. Sang, Z. Li and T. Zhang, “Applications of human visual attention mechanisms in object detection,” Infrared and Laser Engineering, 2004, vol. 33, no. 1, pp. 38-42.
[2] Y. F. Ma and H. J. Zhang, “Contrast-based image attention analysis by using fuzzy growing,” Proceedings of the Eleventh ACM International Conference on Multimedia, ACM, 2003, pp. 374-381.
[3] C. Christopoulos, A. Skodras and T. Ebrahimi, “The JPEG2000 still image coding system: an overview,” IEEE Transactions on Consumer Electronics, 2000, vol. 46, no. 4, pp. 1103-1127.
[4] Y. Hu, X. Xie and W. Y. Ma, “Salient region detection using weighted feature maps based on the human visual attention model,” Advances in Multimedia Information Processing - PCM, Springer Berlin Heidelberg, 2005, pp. 993-1000.
[5] B. C. Ko and J. Y. Nam, “Object-of-interest image segmentation based on human attention and semantic region clustering,” Journal of the Optical Society of America A, 2006, vol. 23, pp. 2462-2470.
[6] C. Koch and S. Ullman, “Shifts in selective visual attention: towards the underlying neural circuitry,” Matters of Intelligence, Springer Netherlands, 1987, pp. 115-141.
[7] U. Rutishauser, D. Walther and C. Koch, “Is bottom-up attention useful for object recognition?” Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, 2004, vol. 2, pp. II-37-II-44.
[8] S. Avidan and A. Shamir, “Seam carving for content-aware image resizing,” ACM Transactions on Graphics, 2007, vol. 26, no. 3, p. 10.
[9] J. Mairal, M. Elad and G. Sapiro, “Sparse representation for color image restoration,” IEEE Transactions on Image Processing, 2008, vol. 17, no. 1, pp. 53-69.
[10] M. Ding and R. F. Tong, “Content-aware copying and pasting in images,” The Visual Computer, 2010, vol. 26, pp. 721-729.
[11] Y. Wang, “Optimized scale-and-stretch for image resizing,” ACM Transactions on Graphics, 2008, vol. 27, no. 5, pp. 1-8.
[12] H. S. Wu, “Resizing by symmetry-summarization,” ACM Transactions on Graphics, 2010, vol. 29, no. 6, pp. 1-9.
[13] G. X. Zhang, “A shape-preserving approach to image resizing,” Computer Graphics Forum, 2009, vol. 28, no. 7, pp. 1897-1906.
[14] M. Banks, J. Read and R. Allison, “Stereoscopy and the human visual system,” SMPTE Motion Imaging Journal, 2012, pp. 24-43.
[15] H. Wang, “Human Visual System Based Stereoscopic Image Quality Assessment,” Intelligent Computer and Applications, 2014, pp. 50-53.
[16] D. Wang and F. Ding, “Input-output data filtering based recursive least squares identification for CARARMA systems,” Digital Signal Processing, 2010, vol. 20, no. 4, pp. 991-999.
[17] A. W. Ellis and A. W. Young, “Human cognitive neuropsychology: A textbook with readings,” Psychology Press, 2013.
[18] A. S. Hervey, “Neuropsychology of adults with attention-deficit/hyperactivity disorder: a meta-analytic review,” Neuropsychology, 2004, vol. 18, no. 3, p. 485.
[19] N. Ouerhani, R. Von Wartburg and H. Hugli, “Empirical validation of the saliency-based model of visual attention,” Electronic Letters on Computer Vision and Image Analysis, 2003, vol. 3, no. 1, pp. 13-24.
[20] M. Tian, “Extracting Bottom-Up Attention Information Based on Local Complexity and Early Visual Features,” Journal of Computer Research and Development, 2008, pp. 1739-1746.
[21] S. Goferman, L. Zelnik-Manor and A. Tal, “Context-aware saliency detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, vol. 34, no. 10, pp. 1915-1926.
[22] L. Itti, C. Koch and E. Niebur, “A model of saliency-based visual attention for rapid scene analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, vol. 20, no. 11, pp. 1254-1259.
[23] L. Itti and C. Koch, “A saliency-based search mechanism for overt and covert shifts of visual attention,” Vision Research, 2000, vol. 40, no. 10, pp. 1489-1506.
[24] L. Itti and C. Koch, “Computational modelling of visual attention,” Nature Reviews Neuroscience, 2001, vol. 2, no. 3, pp. 194-203.
[25] L. Itti, C. Gold and C. Koch, “Visual attention and target detection in cluttered natural scenes,” Optical Engineering, 2001, vol. 40, no. 9, pp. 1784-1793.
[26] S. Frintrop, M. Klodt and E. Rome, “A real-time visual attention system using integral images,” International Conference on Computer Vision Systems, 2007, pp. 1-10.
[27] J. Harel, C. Koch and P. Perona, “Graph-based visual saliency,” Advances in Neural Information Processing Systems, 2006, pp. 545-552.
[28] R. Achanta, F. Estrada and P. Wils, “Salient region detection and segmentation,” Computer Vision Systems, Springer Berlin Heidelberg, 2008, pp. 66-75.
[29] X. Hou and L. Zhang, “Saliency detection: A spectral residual approach,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2007, pp. 1-8.
[30] R. Achanta, S. Hemami and F. Estrada, “Frequency-tuned salient region detection,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 1597-1604.
[31] T. Heskes and S. Gielen, “Task-dependent learning of attention,” Neural Networks, 1997, vol. 10, no. 6, pp. 981-992.
[32] I. A. Rybak, V. I. Gusakova and A. V. Golovan, “A model of attention-guided visual perception and recognition,” Vision Research, 1998, vol. 38, no. 5, pp. 2387-2400.
[33] N. Oliver, B. Rosario and A. Pentland, “A Bayesian computer vision system for modeling human interactions,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, vol. 22, no. 8, pp. 831-843.
[34] L. Itti, “Models of bottom-up and top-down visual attention,” California Institute of Technology, 2000, pp. 1-216.
[35] A. Salah, E. Alpaydin and L. Akarun, “A selective attention-based method for visual pattern recognition with application to handwritten digit recognition and face recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, vol. 24, no. 3, pp. 420-425.
[36] V. Navalpakkam and L. Itti, “An integrated model of top-down and bottom-up attention for optimizing detection speed,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006, pp. 2049-2056.
[37] S. Frintrop, E. Rome and H. I. Christensen, “Computational visual attention systems and their cognitive foundations: A survey,” ACM Transactions on Applied Perception, 2010, vol. 7, no. 1, p. 6.
[38] A. Treisman and G. Gelade, “A feature-integration theory of attention,” Cognitive Psychology, 1980, vol. 12, no. 1, pp. 97-136.
[39] J. G. Kemeny and J. L. Snell, “Finite Markov chains,” Van Nostrand, 1967, p. 210.
[40] L. Itti and C. Koch, “Computational modelling of visual attention,” Nature Reviews Neuroscience, 2001, vol. 2, no. 3, pp. 194-203.
[41] J. Müller, M. Philiastides and W. Newsome, “Microstimulation of the superior colliculus focuses attention without moving the eyes,” Proceedings of the National Academy of Sciences of the United States of America, 2005, vol. 102, no. 3, pp. 524-529.
[42] T. Wang, “Image Retrieval Based on Color-Spatial Feature,” Journal of Software, 2002, vol. 13, no. 10, pp. 2031-2036.
[43] L. Zhu, “A Research on the Histogram's Matching Based on the Histogram's Statistical Characterization,” Computing Technology and Automation, 2004, pp. 48-51.
[44] A. Singhal, C. Buckley and M. Mitra, “Pivoted document length normalization,” Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1996, pp. 21-29.
[45] M. S. Liu, “Establishment of normalization model for quantitatively evaluating impact power of university's academic journals,” Acta Editologica, 2004, pp. 405-406.
[46] R. Klein, “Inhibition of return,” Trends in Cognitive Sciences, 2000, vol. 4, no. 4, pp. 138-147.
[47] V. Navalpakkam and L. Itti, “A goal oriented attention guidance model,” Biologically Motivated Computer Vision, Springer Berlin Heidelberg, 2002, pp. 453-461.
[48] Y. Y. Qu, “Salient Building Detection Based on SVM,” Journal of Computer Research and Development, 2007, vol. 44, no. 1, pp. 141-147.
[49] E. Jayatunga and O. Dobre, “A robust higher-order cyclic cumulants feature-based vector for QAM classification,” 9th IEEE International Symposium on Communication Systems, Networks & Digital Signal Processing (CSNDSP), 2014, pp. 417-421.
[50] M. M. Cheng, “Global contrast based salient region detection,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, IEEE, 2011, pp. 409-416.
[51] D. Jameson and L. Hurvich, “Theory of brightness and color contrast in human vision,” Vision Research, 1964, vol. 4, no. 1, pp. 135-154.
[52] Z. Xu, “Data modeling: Visual psychology approach and L1/2 regularization theory,” International Congress of Mathematicians, 2010, pp. 3151-3184.
[53] T. Pu, “Contrast-Based Multiresolution Image Fusion,” Acta Electronica Sinica, 2000, vol. 28, no. 12, pp. 116-118.
[54] O. Russakovsky, Y. Lin and K. Yu, “Object-centric spatial pooling for image classification,” Computer Vision - ECCV 2012, Springer Berlin Heidelberg, 2012, pp. 1-15.
[55] M. M. Cheng, G. X. Zhang and N. J. Mitra, “Global contrast based salient region detection,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 409-416.
[56] S. Yen and L. Finkel, “Extraction of perceptually salient contours by striate cortical networks,” Vision Research, 1998, vol. 38, no. 5, pp. 719-741.
[57] W. Aiello, F. Chung and L. Lu, “A random graph model for massive graphs,” Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, 2000, pp. 171-180.
[58] H. Lin and W. Long, “Some Problems of System Robustness: Backgrounds, Status and Challenges,” Control Theory & Applications, 1991, vol. 1, p. 001.
[59] J. Li, M. D. Levine and X. An, “Visual saliency based on scale-space analysis in the frequency domain,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, vol. 35, pp. 996-1010.
[60] K. Yu and M. C. Jones, “Local linear quantile regression,” Journal of the American Statistical Association, 1998, vol. 93, pp. 228-237.
[61] X. Hou and L. Zhang, “Saliency detection: A spectral residual approach,” IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1-8.
[62] Y. Liu, J. Luo and Q. Huang, “FPGA Implementation of Integer DCT Transform and Quantization for H.265/HEVC,” Dianshi Jishu (Video Engineering), 2013, vol. 37, pp. 12-14.
[63] R. Bell, “Introductory Fourier transform spectroscopy,” Elsevier, 2012.
[64] C. Lopez-Molina and H. Bustince, “Multiscale edge detection based on Gaussian smoothing and edge tracking,” Knowledge-Based Systems, 2013, vol. 44, pp. 101-111.
[65] X. He, D. Cai and Y. Shao, “Laplacian regularized Gaussian mixture model for data clustering,” IEEE Transactions on Knowledge and Data Engineering, 2011, vol. 23, pp. 1406-1418.
[66] C. Guo, Q. Ma and L. Zhang, “Spatio-temporal saliency detection using phase spectrum of quaternion Fourier transform,” IEEE Conference on Computer Vision and Pattern Recognition, 2008, pp. 1-8.
[67] Z. Wang and X. Huang, “Digital Signal All Phase Spectral Analysis and Filtering Technology,” 2009, pp. 30-45.
[68] Y. Makihara, M. R. Aqmar and N. T. Trung, “Phase estimation of a single quasi-periodic signal,” IEEE Transactions on Signal Processing, 2014, vol. 62, pp. 2066-2079.
[69] M. Mojiri and A. R. Bakhshai, “An adaptive notch filter for frequency estimation of a periodic signal,” IEEE Transactions on Automatic Control, 2004, vol. 49, pp. 314-318.
[70] J. Xi and J. F. Chicharo, “A new algorithm for improving the accuracy of periodic signal analysis,” IEEE Transactions on Instrumentation and Measurement, 1996, vol. 45, pp. 827-831.
[71] J. F. Barraza and E. M. Colombo, “The effect of chromatic and luminance information on reaction times,” Visual Neuroscience, 2010, vol. 27, pp. 119-129.
[72] T. A. Ell and S. J. Sangwine, “Hypercomplex Fourier transforms of color images,” IEEE Transactions on Image Processing, 2007, vol. 16, pp. 22-35.
[73] A. Lumsdaine and T. Georgiev, “Full resolution lightfield rendering,” Indiana University and Adobe Systems, 2008.
[74] R. Achanta and S. Susstrunk, “Saliency detection using maximum symmetric surround,” 17th IEEE International Conference on Image Processing (ICIP), 2010, pp. 2653-2656.
[75] T. Liu, “MSRA Salient Object Database,” http://research.Microsoft.com/enus/um/people/jiansun/salientobject/salient_object.htm.
[76] X. J. An, “ImgSal: A benchmark for saliency detection V1.0,” http://www.cim.mcgill.ca/~lijian/database.htm.
[77] T. Liu, “Learning to detect a salient object,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, vol. 33, pp. 353-367.
[78] C. H. Yu, “The Basic Principle of ROC Analysis,” Chinese Journal of Epidemiology, 1998, pp. 413-415.

