
2016 13th International Conference on Computer Graphics, Imaging and Visualization

Grouping K-means adjacent regions for semantic image annotation using Bayesian networks

M. OUJAOURA*, R. EL AYACHI*, B. MINAOUI*, M. FAKIR*, and O. BENCHAREF**

* Faculty of Science and Technology, Computer Science Department, Laboratory of Information Processing and
Telecommunications, Sultan Moulay Slimane University, Beni Mellal, Morocco.
Emails: oujaouram@yahoo.fr, rachidieea@yahoo.fr, bra_min@yahoo.fr, and fakfad@yahoo.fr

** Higher School of Technology, Computer Science Department, Cadi Ayyad University, Essaouira, Morocco.
Email: bencharef98@gmail.com

Abstract: To perform a semantic search on a large dataset of images, we need to be able to transform the visual content of
images (colors, textures, shapes) into semantic information. This transformation, called image annotation, assigns a caption
or keywords to the visual content of a digital image. In this paper we attempt to partially resolve the region homogeneity
problem in image annotation by proposing an approach that annotates images based on grouping adjacent regions. We use
the k-means algorithm for segmentation, while texture and GIST descriptors are used as features to represent image
content. Bayesian networks are used as classifiers in order to find and assign the appropriate keywords to this content. The
experimental results were obtained on the ETH-80 image database.

Keywords: Color; image; annotation; segmentation; descriptor; classification.

1. Introduction

Recent technological advances in multimedia data acquisition have led to an exponential growth of available digital content. For users of these huge databases, information retrieval is very problematic because it assumes that the content has been indexed and annotated correctly. With the rapid growth in data volumes, manual annotation remains prohibitively expensive and impractical. Therefore, automatic image annotation becomes an important and unavoidable way to reduce the semantic gap between the high-level concepts used by humans to describe images and the low-level visual content used by machines to represent them [1-2].

In this paper, we focus on automated image annotation approaches that attempt to answer this persistent problem. We introduce a system based on k-means automatic image segmentation with texture and GIST descriptors, using a Bayesian network as classifier. Since most segmentation algorithms use low-level predicates to control the homogeneity of regions, the resulting regions are not always semantically compact. Therefore, we propose an approach that regroups the different regions in order to obtain compact objects that can be annotated with semantically appropriate keywords.

The rest of the paper is organised as follows. Section 2 discusses k-means image segmentation, while Section 3 presents the texture and GIST feature extraction methods. Section 4 is reserved for image annotation using a Bayesian network classifier. Section 5 presents and discusses the experimental results for image annotation. Finally, the last section gives the principal conclusions concerning the proposed approach, together with possible future work.

2. K-means Image Segmentation

Usually, a feature vector extracted from the entire image loses local information. Therefore, it is necessary to segment an image into regions or objects of interest and to use local characteristics. Image segmentation is a method that localizes and extracts an object from an image, or divides the image into several regions. It plays a very important role in many image processing applications and remains a challenge for scientists and researchers; efforts are still being made to improve automatic segmentation techniques. With the improvement of computer processing capacities, several segmentation techniques are available: k-means, thresholding, region growing, active contours, level sets, etc. [3].

The k-means method is primarily a traditional classification tool that divides a data set into k homogeneous classes. Since most digital images locally satisfy homogeneity properties, especially in terms of light intensity, the k-means algorithm offers a solution to image segmentation. K-means is a clustering algorithm that does not require the presence of a learning database. Therefore, this algorithm can

organize the pixels of the image in the form of one or
more classes according to optimization criteria and
homogeneity [4].
Given a set of image pixels $X = \{p_1, p_2, \ldots, p_n\} \subset \mathbb{R}^d$, where each pixel is a vector of dimension $d = 3$ in the case of a colour image ($d = 5$ if the pixel coordinates are introduced as spatial coherence or connectivity information), the k-means algorithm aims to classify and divide the $n$ pixels of the image into $k \le n$ sets or regions $S = \{R_1, R_2, \ldots, R_k\}$ in a manner that minimizes the intra-class variance, which amounts to minimizing the sum of squared Euclidean distances within the clusters:

$$E = \sum_{i=1}^{k} \sum_{p_j \in R_i} \left\| p_j - m_i \right\|^2 = \sum_{i=1}^{k} \mathrm{Card}(R_i)\,\mathrm{Var}(R_i)$$

Where:
$p_j$ is a pixel vector;
$\mathrm{Card}(R_i)$ is the number of pixels in the cluster or region $R_i$;
$m_i = \frac{\sum_{p_j \in R_i} p_j}{\mathrm{Card}(R_i)}$ is the center of the cluster or region $R_i$, also known as its kernel;
$\mathrm{Var}(R_i) = \frac{\sum_{p_j \in R_i} \left\| p_j - m_i \right\|^2}{\mathrm{Card}(R_i)}$ is the variance of the pixel cluster or region.

The k-means image segmentation algorithm finds the pixel groups that minimize the quantity $E$ defined above. This amounts, for each cluster or region, to minimizing the quantity:

$$\sum_{p_j \in R_i} \left\| p_j - m_i \right\|^2$$

The principle of the minimization algorithm can be stated in the following main steps [5]:
1. Choose the number of clusters (number of kernels);
2. Initialize the clusters and their kernels;
3. Update the clusters by optimizing the clustering error;
4. Compute and re-evaluate the new cluster kernels;
5. Repeat steps 3 and 4 until the clusters stabilize.

The number of clusters k can approximately match the number of dominant colors used to represent the image, so the determination of k can be performed using color histograms. Figure 1 shows the principle used for the selection of k: after transformation of the colour image into a single image formed by a reduced number of colors, the cluster number k can be taken as the number of peaks in the histogram of the transformed image.

Figure 1. Principle used for the selection of k.

An example of image segmentation using the k-means segmentation algorithm is presented in Figure 2.

Figure 2. Example of K-means image segmentation.

The adjacent regions are then regrouped in order to obtain a compact object. As illustrated in Figure 3, adjacent regions are grouped iteratively, and a region is annotated with the appropriate keyword when that keyword's probability P(k) is higher than 0.5; otherwise, regrouping continues while it remains possible.

Figure 3. Block diagram of the principle of grouping adjacent regions (input image → image segmentation → feature extraction → classification by computing the keyword probability P(k); if P(k) > 0.5 the keyword is added, otherwise adjacent regions are regrouped and classified again while regrouping is possible).
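To make these steps concrete, here is a minimal Python sketch (an illustration, not the authors' Matlab implementation) that segments a colour image with k-means, estimating k from the number of dominant bins of a coarsely quantized colour histogram in the spirit of Figure 1. The `estimate_k` heuristic, its `bins` and `min_share` parameters, and the use of scikit-learn are assumptions made for this example.

```python
import numpy as np
from sklearn.cluster import KMeans

def estimate_k(image, bins=8, min_share=0.02):
    """Estimate k as the number of dominant colours: quantize each channel
    to `bins` levels and count colours holding >= min_share of the pixels."""
    quantized = (image // (256 // bins)).reshape(-1, 3)
    _, counts = np.unique(quantized, axis=0, return_counts=True)
    return max(int(np.sum(counts >= min_share * counts.sum())), 2)

def kmeans_segment(image, k=None):
    """Cluster the pixels by colour (d = 3) and return a label map of regions."""
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3).astype(np.float64)
    if k is None:
        k = estimate_k(image)
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(pixels)
    return labels.reshape(h, w)
```

Appending the pixel coordinates to each colour vector would give the d = 5 variant with spatial coherence mentioned above.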

Figure 4 shows the k-means automatic segmentation of an image representing the object "car" into multiple regions or clusters. It also shows the possibility of grouping these clusters or regions to obtain a semantically compact object. We can see from this figure that the grouping of clusters 1 and 3 forms a compact object that can be annotated more correctly than the objects of the other, non-grouped clusters. Hence the major interest of regrouping and merging adjacent regions for semantic image annotation.

Figure 4. Example of segmentation and regrouping of clusters of a segmented image.

Therefore, grouping regions before annotation can help solve the problem of non-compact regions in a segmented image, reducing the semantic gap between high-level concepts and low-level descriptors.
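The grouping principle of Figure 3 can be sketched as follows. This is a simplified single merging pass, whereas the diagram iterates while regrouping remains possible; `extract_features` and `classify` are hypothetical placeholders for the descriptor extraction (Section 3) and Bayesian-network classification (Section 4) stages, with `classify` returning a keyword and its probability.

```python
import numpy as np

def adjacent_pairs(label_map):
    """Pairs of region labels that touch under 4-connectivity."""
    horizontal = np.stack([label_map[:, :-1], label_map[:, 1:]], axis=-1)
    vertical = np.stack([label_map[:-1, :], label_map[1:, :]], axis=-1)
    edges = np.concatenate([horizontal.reshape(-1, 2), vertical.reshape(-1, 2)])
    return {(min(a, b), max(a, b)) for a, b in edges if a != b}

def annotate_regions(label_map, extract_features, classify, threshold=0.5):
    """Annotate each region; merge adjacent, unconfident regions and retry,
    following the block diagram of Figure 3 (one merging pass shown here)."""
    keywords, confident = {}, set()
    for region in np.unique(label_map):
        word, prob = classify(extract_features(label_map == region))
        if prob > threshold:
            keywords[region] = word
            confident.add(region)
    for a, b in adjacent_pairs(label_map):
        if a in confident and b in confident:
            continue  # both regions already annotated with P(k) > threshold
        merged = (label_map == a) | (label_map == b)
        word, prob = classify(extract_features(merged))
        if prob > threshold:
            keywords[(a, b)] = word  # keyword for the merged, compact object
    return keywords
```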

3. Texture and GIST descriptors

After dividing the original image into several distinct regions that correspond to objects in a scene, the feature vector must be extracted carefully from each region, in order to reduce the rich content and large data volume of images while preserving the content representation of the entire image. The feature extraction task can therefore decrease the processing time. It enhances not only the retrieval and annotation accuracy but also the annotation speed, since a large image database can be organized according to the classification rule and searched accordingly [6].

All the feature descriptors are extracted from all the images in the reference database and stored with their keywords in a features database, in order to be used for annotation by classification.

3.1. Texture Descriptors

Many images contain textured patterns. Therefore, a texture descriptor is used as a feature extraction method on the segmented image. The texture descriptor is extracted using the co-occurrence matrix introduced by Haralick in 1973 [7]. For a color image $I$ of size $N \times N \times 3$ in a colour space $(C_1, C_2, C_3)$, for $(k, l) \in [1, \ldots, N]^2$ and $(a, b) \in [1, \ldots, G]^2$, the co-occurrence matrix $M_{C,C'}^{k,l}[I]$ of the two colour components $C, C' \in \{C_1, C_2, C_3\}$ of the image $I$ is defined by:

$$M_{C,C'}^{k,l}([I], a, b) = \frac{1}{(N-k)(N-l)} \sum_{i=1}^{N-k} \sum_{j=1}^{N-l} \delta\big(I(i,j,C) - a,\; I(i+k, j+l, C') - b\big)$$

Where $\delta$ is the unit pulse defined by:

$$\delta(x, y) = \begin{cases} 1 & \text{if } x = y = 0 \\ 0 & \text{otherwise} \end{cases}$$

Each color image $I$ in a colour space $(C_1, C_2, C_3)$ can be characterized by six color co-occurrence matrices: $M_{C_1,C_1}[I]$, $M_{C_2,C_2}[I]$, $M_{C_3,C_3}[I]$, $M_{C_1,C_2}[I]$, $M_{C_1,C_3}[I]$ and $M_{C_2,C_3}[I]$. The matrices $M_{C_2,C_1}[I]$, $M_{C_3,C_1}[I]$ and $M_{C_3,C_2}[I]$ are not taken into account because they can be deduced by diagonal symmetry from $M_{C_1,C_2}[I]$, $M_{C_1,C_3}[I]$ and $M_{C_2,C_3}[I]$, respectively. As these matrices measure local interactions between pixels, they are sensitive to significant differences in spatial resolution between images. To reduce this sensitivity, it is necessary to normalize them by the total number of co-occurrences:

$$\tilde{M}_{C,C'}^{k,l}([I], a, b) = \frac{M_{C,C'}^{k,l}([I], a, b)}{\sum_{i=0}^{T-1} \sum_{j=0}^{T-1} M_{C,C'}^{k,l}([I], i, j)}$$

Where $T$ is the number of quantization levels of the color components. To reduce the large amount of information in these matrices, the 14 Haralick indices [7] of each matrix are used, which yields 84 texture attributes for the six co-occurrence matrices ($14 \times 6$).
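A minimal sketch of the colour co-occurrence computation defined above, for one displacement (k, l) and one pair of colour components; the quantization level T = 16 and the default displacement are assumptions for illustration, and the Haralick indices themselves are omitted for brevity.

```python
import numpy as np

def cooccurrence(img, c1, c2, k=1, l=1, T=16):
    """Colour co-occurrence matrix M_{C,C'}^{k,l}[I] for components c1, c2
    of an H x W x 3 uint8 image, quantized to T levels per component."""
    a = (img[:, :, c1].astype(int) * T) // 256   # quantized component C
    b = (img[:, :, c2].astype(int) * T) // 256   # quantized component C'
    H, W = a.shape
    src = a[:H - k, :W - l]      # I(i, j, C)
    dst = b[k:, l:]              # I(i + k, j + l, C')
    M = np.zeros((T, T))
    np.add.at(M, (src.ravel(), dst.ravel()), 1)  # count co-occurring pairs
    # Dividing by the total count both realizes the 1/((N-k)(N-l)) factor
    # and normalizes the matrix as in the second equation above.
    return M / M.sum()
```

Computing the 14 Haralick indices on each of the six matrices then yields the 84-dimensional texture feature vector used in Section 5.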

3.2. GIST Descriptors

In computer vision, GIST descriptors are a low-dimensional representation of an image that contains enough information to identify the scene it depicts. Global GIST descriptors allow a very compact representation of an image. These descriptors were introduced by Oliva and Torralba in 2001 [8, 9]. They represent the dominant spatial structure of the scene through a set of perceptual dimensions and require no segmentation. The authors capture the gist of the image by analyzing its spatial frequencies and orientations. The global descriptor is constructed by combining the amplitudes obtained at the output of K Gabor filters [10] at E different scales and O orientations. To reduce the feature vector size, each filtered output image is scaled and divided into N × N blocks (N between 2 and 16), which gives a vector of dimension N × N × K × E × O. This dimension can be further reduced by principal component analysis (PCA), which also gives the weights applied to the different filters.

The computation and extraction of GIST descriptors proceed through several steps. After pre-processing the input image, the next step consists in transforming the image at different scales and orientations. Finally, the feature vectors are calculated for each scale, orientation and frequency, and combined to form a global feature descriptor, which is reduced by principal component analysis (PCA). Figure 5 shows an image and its GIST descriptor.

Figure 5. Example of image and its GIST descriptor.
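As a rough sketch of this pipeline (the frequency-domain filter shapes, the 4-scale/8-orientation bank, and the 4×4 grid are illustrative assumptions, not the exact parameters of [8]; a square greyscale input of fixed size is assumed), the descriptor can be computed as follows:

```python
import numpy as np

def gabor_bank(size=32, scales=4, orientations=8):
    """Build a bank of Gabor-like transfer functions in the frequency domain."""
    fy, fx = np.meshgrid(np.fft.fftfreq(size), np.fft.fftfreq(size), indexing="ij")
    radius, angle = np.hypot(fx, fy), np.arctan2(fy, fx)
    filters = []
    for s in range(scales):
        f0 = 0.25 / (2 ** s)                 # centre frequency of this scale
        for o in range(orientations):
            d_angle = np.angle(np.exp(1j * (angle - np.pi * o / orientations)))
            filters.append(np.exp(-10 * (radius / f0 - 1) ** 2 - 2 * d_angle ** 2))
    return filters

def gist(gray, grid=4):
    """Average filter energy over a grid x grid block layout of the image."""
    spectrum = np.fft.fft2(gray)
    features = []
    for g in gabor_bank(size=gray.shape[0]):
        energy = np.abs(np.fft.ifft2(spectrum * g))
        blocks = energy.reshape(grid, gray.shape[0] // grid,
                                grid, gray.shape[1] // grid)
        features.extend(blocks.mean(axis=(1, 3)).ravel())
    return np.asarray(features)   # raw descriptor, to be reduced by PCA
```

With these assumed parameters the raw vector has 4 × 8 × 16 = 512 dimensions; the 32-dimensional GIST vector used in Section 5 is presumably obtained after a reduction step such as PCA.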
4. Bayesian network classifier

Automatic image annotation can be tackled by a classifier that is generated and trained from training examples in order to reduce the gap between low-level feature vectors and high-level concepts. The trained classifier can directly match the low-level features to the high-level conceptual classes. Several types of classifiers have been used for classification, each found suitable for a particular type of feature vector according to its parameters.

Bayesian networks are based on a probabilistic approach governed by Bayes' rule. The Bayesian approach rests on the conditional probability that estimates the probability of occurrence of an event given that another event is verified. A Bayesian network is a graphical probabilistic model representing random variables as a directed acyclic graph. It is defined by [11]:

$G = (X, E)$, where $X$ is the set of nodes and $E$ is the set of edges; $G$ is a directed acyclic graph (DAG) whose vertices are associated with a set of random variables $X = \{X_1, X_2, \ldots, X_n\}$;
$\theta = \{P(X_i \mid Pa(X_i))\}$ is the set of conditional probabilities of each node $X_i$ given the state of its parents $Pa(X_i)$ in $G$.

The graphical part of the Bayesian network indicates the dependencies between variables and gives a visual representation of knowledge that is more easily understandable by users. Bayesian networks thus combine a qualitative part, the graph, with a quantitative part, the conditional probabilities associated with each node of the graph with respect to its parents. Pearl et al. [12] have also shown that Bayesian networks compactly represent the joint probability distribution over all the variables:

$$P(X) = P(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa(X_i))$$

Where $Pa(X_i)$ is the set of parents of node $X_i$ in the graph $G$ of the Bayesian network. By the chain rule, this joint probability can also be written as [13]:

$$P(X) = P(X_n \mid X_{n-1}, \ldots, X_1)\, P(X_{n-1} \mid X_{n-2}, \ldots, X_1) \cdots P(X_2 \mid X_1)\, P(X_1) = P(X_1) \prod_{i=2}^{n} P(X_i \mid X_{i-1}, \ldots, X_1)$$
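As a toy illustration of this factorization (the chain A → B → C and all probability values below are invented for the example), the joint distribution of a three-node network reduces to P(A) P(B|A) P(C|B):

```python
# Hypothetical 3-node Bayesian network A -> B -> C with binary variables.
P_A = {0: 0.6, 1: 0.4}
P_B_given_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}
P_C_given_B = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.25, 1: 0.75}}

def joint(a, b, c):
    """P(A=a, B=b, C=c) = P(a) * P(b | a) * P(c | b)."""
    return P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c]

# The factorized joint sums to 1 over all 8 configurations.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
assert abs(total - 1.0) < 1e-12
```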


The construction of a Bayesian network consists in finding a structure or graph and estimating its parameters by machine learning. In the case of classification, the Bayesian network has a class node $C_i$ and many attribute nodes $X_j$. The naive Bayes classifier is used in this paper due to its robustness and simplicity.

To estimate the Bayesian network parameters and probabilities, Gaussian distributions are generally used. The conditional distribution of a node given its parents is a Gaussian distribution whose mean is a linear combination of the parents' values and whose variance is independent of the parents' values [14]:

$$P(X_i \mid Pa(X_i)) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left( -\frac{1}{2\sigma_i^2} \left( x_i - \mu_i - \sum_{j=1}^{n_i} \frac{\sigma_{ij}}{\sigma_j^2} \left( x_j - \mu_j \right) \right)^2 \right)$$

Where:
$Pa(X_i)$ are the parents of $X_i$;
$\mu_i$, $\mu_j$, $\sigma_i$ and $\sigma_j$ are the means and standard deviations of the attributes $X_i$ and $X_j$ respectively, without considering their parents;
$n_i$ is the number of parents;
$\sigma_{ij}$ is the regression matrix of weights.

After the parameter and structure learning of a Bayesian network, Bayesian inference is used to calculate the probability of any variable in the probabilistic model from the observation of one or more other variables. The chosen class $C_i$ is then the one that maximizes these probabilities [15]:

$$P(C_i \mid X) = \begin{cases} P(C_i) \prod_{j=1}^{n} P\big(X_j \mid Pa(X_j), C_i\big) & \text{if } X_j \text{ has parents} \\ P(C_i) \prod_{j=1}^{n} P(X_j \mid C_i) & \text{otherwise} \end{cases}$$

For the naive Bayes classifier, the absence of parents and the variable-independence assumption are used to write the posterior probability of each class as [16]:

$$P(C_i \mid X) = P(C_i) \prod_{j=1}^{n} P(X_j \mid C_i)$$

Therefore, the decision rule $d$ for an attribute vector $X$ is given by:

$$d(X) = \arg\max_{C_i} P(C_i \mid X) = \arg\max_{C_i} P(C_i) \prod_{j=1}^{n} P(X_j \mid C_i)$$

The class with maximum probability yields the suitable keywords for the input image.
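A minimal sketch of the resulting classifier: under the naive assumption, each P(Xj | Ci) is a univariate Gaussian estimated from the training feature vectors, and the decision rule picks the class maximizing the (log) posterior. This illustrates the decision rule above rather than reproducing the authors' implementation; the variance floor of 1e-9 is an assumption for numerical stability.

```python
import numpy as np

class GaussianNaiveBayes:
    def fit(self, X, y):
        """Estimate per-class priors, means and variances of each attribute."""
        self.classes = np.unique(y)
        self.prior = {c: np.mean(y == c) for c in self.classes}
        self.mu = {c: X[y == c].mean(axis=0) for c in self.classes}
        self.var = {c: X[y == c].var(axis=0) + 1e-9 for c in self.classes}
        return self

    def predict(self, X):
        """d(X) = argmax_c [ log P(c) + sum_j log P(x_j | c) ]."""
        scores = []
        for c in self.classes:
            log_lik = -0.5 * np.sum(np.log(2 * np.pi * self.var[c])
                                    + (X - self.mu[c]) ** 2 / self.var[c], axis=1)
            scores.append(np.log(self.prior[c]) + log_lik)
        return self.classes[np.argmax(np.stack(scores), axis=0)]
```

In the grouping procedure of Figure 3, a keyword would be attached to a region only when the winning class's normalized posterior exceeds the 0.5 threshold.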
5. Experiments and Results

5.1. Experiments

In our experiments, for each region that represents an object from the query image, the number of input features extracted with the texture extraction method is 14 × 6 = 84, while the number of input features extracted with the GIST extraction method is 32. These inputs are fed to the classifier, the Bayesian network, in order to select the appropriate keywords from the reference database.

Figure 6 shows some examples of image objects from the ETH-80 image database used in our experiments. The experiments are based on eight classes of objects (Apple, Car, Cow, Cup, Dog, Horse, Pears, and Tomato).

Figure 6. Some object images from the ETH-80 database.

The accuracy of image annotation is evaluated by the precision rate, which is the number of correct results divided by the number of all returned results. All the experiments are conducted on the ETH-80 database, which contains a set of 8 different object images [17]. The proposed system has been implemented and tested on a Core 2 Duo personal computer using Matlab software.
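A small sketch of this evaluation (the keyword lists below are hypothetical):

```python
def precision_rate(predicted, ground_truth):
    """Number of correct results divided by the number of returned results."""
    correct = sum(p == t for p, t in zip(predicted, ground_truth))
    return 100.0 * correct / len(predicted)

# Hypothetical example: 8 annotated regions, 6 correct -> 75.0 %.
print(precision_rate(
    ["car", "cow", "cup", "dog", "apple", "pear", "horse", "tomato"],
    ["car", "dog", "cup", "dog", "apple", "pear", "cow", "tomato"]))
```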
5.2. Results and Discussion

When using the Bayesian network as classifier, the general annotation rates and error rates of the texture and GIST descriptors are given in Table 1. The experimental results show that the annotation rate of the Bayesian network classifier based on the GIST descriptor is higher than the annotation rate based on texture descriptors.

Table 1. General annotation and error rates, with global learning and execution times, of texture and GIST descriptors using the Bayesian network classifier.

Extraction Method | Learning Time (s) | Execution Time (s) | Annotation Rate (%) | Error Rate (%)
Texture           | 9.41              | 413.68             | 55.00               | 45.00
GIST              | 11.32             | 1796.17            | 62.50               | 37.50

The results are also affected by the accuracy of the image segmentation method. In most cases, it is very difficult to obtain an ideal automatic segmentation. The predicate used to control the image segmentation is low level, which leads to regions that are not semantically compact, and this problem decreases the annotation rates. Therefore, any annotation attempt must consider image segmentation as an important step, not only for the automatic image annotation system, but also for any other system that requires it. To reduce this problem, we developed a new method based on regrouping adjacent regions in order to obtain more compact regions that can represent objects in the image.

Table 2. General annotation and error rates, with global learning and execution times, of texture and GIST descriptors using the Bayesian network classifier based on regrouping regions.

Extraction Method | Learning Time (s) | Execution Time (s) | Annotation Rate (%) | Error Rate (%)
Texture           | 9.41              | 1829.51            | 60.00               | 40.00
GIST              | 11.32             | 1882.77            | 65.00               | 35.00

The general annotation rates and error rates of the texture and GIST descriptors based on the Bayesian network classifier and the regrouping approach are given in Table 2. The experimental results show that the annotation rate of the proposed approach based on regrouping the different regions is higher than the annotation rate obtained when using k-means segmentation directly.

Figure 7 gives the confusion matrix of the annotation system based on the texture descriptor and the Bayesian network classifier, in the case of using k-means segmentation directly and in the case of regrouping regions.

Figure 7. Confusion matrix of the annotation system based on the Texture descriptor and the Bayesian network classifier.
Figure 8 gives the confusion matrix of the annotation system based on the GIST descriptor and the Bayesian network classifier, in the case of using k-means segmentation directly and in the case of regrouping regions.

Figure 8. Confusion matrix of the annotation system based on the GIST descriptor and the Bayesian network classifier.

6. Conclusion

In this paper, we developed and presented an image annotation system using k-means as the image segmentation algorithm. For this system, we discussed the effect of regrouping the different regions in order to obtain compact objects. The texture and GIST descriptors are used with Bayesian networks to classify and annotate the input image with the suitable keywords selected from the reference image database. The performance of the proposed method was experimentally analysed: the approach increases the general annotation rates, and the experimental results showed that the proposed image annotation system gives good results for images that are well and properly segmented. However, image segmentation remains a challenge that needs more attention in order to increase the precision and accuracy of the image annotation system. In addition, reducing the gap between the low-level features and the semantic content of an image should be considered to further improve the accuracy of any image annotation system. Other segmentation methods and other feature extraction methods will be considered in future work.

References

[1] M. Oujaoura, B. Minaoui, M. Fakir, "A semantic approach for automatic image annotation," 8th International Conference on Intelligent Systems: Theories and Applications (SITA), pp. 1-8, 8-9 May 2013, doi: 10.1109/SITA.2013.6560800.
[2] M. Oujaoura, B. Minaoui, M. Fakir, "Combined descriptors and classifiers for automatic image annotation," International Conference on Multimedia Computing and Systems (ICMCS'14), pp. 270-276, 14-16 April 2014, Marrakesh, Morocco, doi: 10.1109/ICMCS.2014.6911218.
[3] F. Y. Shih, S. Cheng, "Automatic seeded region growing for color image segmentation," Image and Vision Computing, 23, pp. 877-886, 2005.
[4] A. Likas, N. A. Vlassis, J. J. Verbeek, "The global k-means clustering algorithm," Pattern Recognition, 36(2), pp. 451-461, 2003.
[5] L. Rokach, O. Maimon, Data Mining and Knowledge Discovery Handbook, Chapter 15: Clustering Methods, pp. 321-352, Springer, 2nd edition, New York, October 1, 2010.
[6] R. S. Choras, "Image Feature Extraction Techniques and Their Applications for CBIR and Biometrics Systems," International Journal of Biology and Biomedical Engineering, 1(1), pp. 6-16, 2007.
[7] R. Haralick, K. Shanmugam, I. Dinstein, "Textural features for image classification," IEEE Transactions on Systems, Man, and Cybernetics, 3(6), pp. 610-621, 1973.
[8] A. Oliva, A. Torralba, "Modeling the shape of the scene: A holistic representation of the spatial envelope," International Journal of Computer Vision, 42, pp. 145-175, 2001.
[9] A. Oliva, A. Torralba, "Building the gist of a scene: the role of global image features in recognition," Progress in Brain Research, 2006.
[10] H. G. Feichtinger, T. Strohmer, Gabor Analysis and Algorithms, Birkhäuser, 1998.
[11] A. Becker, P. Naïm, Les réseaux bayésiens : modèles graphiques de connaissance, Eyrolles, 1999.
[12] J. Pearl, "Bayesian Networks," UCLA Cognitive Systems Laboratory, Technical Report (R-216), Revision I; in M. Arbib (Ed.), Handbook of Brain Theory and Neural Networks, MIT Press, pp. 149-153, 1995.
[13] S. Barrat, Modèles graphiques probabilistes pour la reconnaissance de formes, PhD thesis, Université Nancy 2, Computer Science, December 2009.
[14] G. H. John, P. Langley, "Estimating continuous distributions in Bayesian classifiers," Eleventh Conference on Uncertainty in Artificial Intelligence, 1995.
[15] P. Naïm, P.-H. Wuillemin, P. Leray, O. Pourret, A. Becker, Réseaux bayésiens, Eyrolles, 3rd edition, Paris, 2008.
[16] T. Mitchell, "Generative and discriminative classifiers: Naive Bayes and logistic regression," Machine Learning, draft, 2010.
[17] ETH-80 image database. [Online]. Available: http://www.d2.mpi-inf.mpg.de/Datasets/ETH80
