
Mathematical Methods and Techniques in Engineering and Environmental Science

A survey of semantic multimedia retrieval systems


Carmelo Pino and Roberto Di Salvo
Department of Electrical, Electronics and Computer Engineering
University of Catania
V.le A. Doria, 6 - 95125, ITALY
E-mail: {carmelo.pino, roberto.disalvo}@dieei.unict.it

Abstract: - A growing number of research approaches combine multimedia retrieval processing with semantic and knowledge-based methods in order to achieve a higher-level understanding of multimedia content. This research direction, often called semantic multimedia, combines techniques such as low-level multimedia feature extraction with common semantic representation schemes for features and concepts. This makes it possible to manage queries based on semantics, which better supports end-user searches and result visualization in multimedia retrieval. Since low-level representations of media differ greatly from the higher-level concepts associated with them, understanding the semantics of a query requires further insight into multimedia retrieval to bridge the semantic gap. In this paper we review the state-of-the-art techniques in semantic multimedia retrieval by discussing how relevant multimedia retrieval systems incorporate a semantic layer to improve system performance. Some criticism is also expressed, since current systems lack clustering techniques to increase recall and to support the reuse of multimedia material for developing new artefacts; this points to novel research directions.
Key-Words: - Semantic multimedia retrieval, image processing, semantic web, experience reuse, clustering

1 Introduction

MultiMedia Retrieval (MMR) refers to a set of theories, algorithms, and systems that aim at extracting multimedia content related to pertinent descriptors or metadata, thus supporting advanced search functions. Initially, these systems consisted of algorithms that extracted the multimedia items matching the Low Level Features (LLF) indicated in the user query [1]. Currently, users need to access more and more multimedia content at the semantic level. Therefore, current research in multimedia information retrieval is mainly devoted to developing techniques that bridge the semantic gap by using algorithms that extract semantic-level descriptors from machine-level audio-visual feature descriptors. Such techniques give rise to a semantic layer that links the media features with semantic descriptors. Also, current media retrieval systems are powered by Query By Example (QBE) interfaces, and the search is refined by adding interactivity through Relevance Feedback (RFB) techniques.
The characteristics of the semantic layer depend strongly on the purpose of the retrieval and on the context. Several technologies are used for the design of the semantic layer of an MMR system, and different technologies are likewise used for resource description. The current de facto standard in multimedia content description is MPEG-7 [2], known as the Multimedia Content Description Interface. It supports multimedia content description from several points of view, including media information, creation information, structure, usage information, textual annotations, media semantics, and low-level visual and audio features.
At the semantic level, interoperability among different Multimedia (MM) archives is achieved through the integration of domain knowledge expressed in the form of domain ontologies defined by RDF triples or OWL expressions. Other interesting approaches for distributed retrieval are based on probabilistic inference mechanisms [3] and on innovative ideas of cross-modal indexing and retrieval [4].
The variety of techniques and systems available motivates the aim of this paper, which is to present a survey of semantic MMR. In particular, Section 2 points out the characteristics of a modern multimedia retrieval system provided with a semantic layer, taking into account the domain features. Section 3 illustrates in detail the main tools to improve the performance and reduce the semantic gap of MMR. Some criticism is expressed in order to envisage useful further improvements of the current MMR systems.

ISBN: 978-1-61804-046-6

353


Finally, concluding remarks address future work and research.

2 Characteristics of a Semantic MMR


As mentioned in [5], a first subdivision of MMR systems distinguishes: systems that integrate a QBE approach, systems that integrate an interactive retrieval method such as Relevance Feedback (RFB), systems that offer personalized and adaptive content delivery, and systems that are based on semantic indexing. In each of these approaches, the user interacts with the MMR system through an interface that makes it possible to formulate a powerful multimedia query.
An MMR system performs LLF extraction using the set of algorithms contained in its core. If the MMR system has no semantic functions, retrieval is done by comparing, by means of a low-level metric, the features of the stored media with those of the example issued by the user. In an MMR system that incorporates semantic functionalities, a specific layer supports semantic mapping. The semantic layer interacts with the LLF extraction portion of the MMR system by using a specific method (related to a specific approach) to create the bridge between the LLF of the stored items and the High Level Features (HLF), usually given by textual expressions, associated with the example issued by the user through the QBE interface.
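The non-semantic case described above, in which the features of the user's example are compared with those of the stored media under a low-level metric, can be sketched as follows. This is a minimal illustration only: the item names, the three-component feature vectors, and the choice of Euclidean distance are assumptions made for the example and are not taken from any of the surveyed systems.

```python
import math

def euclidean(a, b):
    """Low-level metric: Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def query_by_example(example_features, store, k=2):
    """Rank stored items by distance to the example and return the k closest names."""
    ranked = sorted(store, key=lambda name: euclidean(example_features, store[name]))
    return ranked[:k]

# Hypothetical store: item name -> low-level feature vector (e.g. a tiny colour histogram).
STORE = {
    "sunset.jpg": [0.9, 0.4, 0.1],
    "forest.jpg": [0.1, 0.8, 0.2],
    "beach.jpg": [0.8, 0.5, 0.3],
}

# The example supplied through the QBE interface, already reduced to its features.
result = query_by_example([0.9, 0.4, 0.2], STORE)
```

A semantic layer would replace or complement the distance function with a mapping from these vectors to textual concepts, which a purely metric comparison cannot provide.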
Fig. 1 shows the typical architecture of a semantic multimedia retrieval system. The core contains the algorithms that implement LLF extraction. The user interface allows the user to formulate a query by providing a suitable example. The semantic layer (dotted line) uses different approaches to map LLF to HLF and vice versa. The main mapping algorithms are based on ontological matching [6], cross-modal similarities [7], relevance feedback [8], and MPEG-7 based metrics [9].
Another module, often present at the semantic level, manages possible annotations associated with the media. This allows the user to supervise the mapping between HLF and LLF for each type of considered media, as illustrated in Fig. 2.
Therefore, MMR should be implemented by search engines that are able to manage manual or semi-automatic annotations and other semantic information extracted from the content surrounding the example provided by the user, for example HTML links in the case of Web content.


Fig. 1 Semantic multimedia framework.

Fig. 2 Image annotated using a semantic tool (labels shown in the figure: sky, mountain, cloud). Concepts describing the picture are mapped in an ontology.
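As a hedged illustration of how annotation concepts such as sky, mountain and cloud may be mapped to an ontology, the sketch below stores annotations as subject-predicate-object triples in the spirit of RDF and answers a concept query by following subclass links. The image identifiers, the predicate names `depicts` and `subClassOf`, and the `landscape` and `sea` concepts are hypothetical; no RDF library or MPEG-7 tool is used here.

```python
# Hypothetical annotation triples (subject, predicate, object) in the spirit of RDF.
TRIPLES = [
    ("img:001", "depicts", "sky"),
    ("img:001", "depicts", "mountain"),
    ("img:001", "depicts", "cloud"),
    ("img:002", "depicts", "sea"),
    ("mountain", "subClassOf", "landscape"),
    ("sea", "subClassOf", "landscape"),
]

def superclasses(concept, triples):
    """Return the concept together with every ancestor reachable via subClassOf."""
    found = {concept}
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found

def images_depicting(concept, triples):
    """Return images annotated with the concept or with any of its subclasses."""
    return sorted(
        s for s, p, o in triples
        if p == "depicts" and concept in superclasses(o, triples)
    )

landscape_images = images_depicting("landscape", TRIPLES)
```

The query for `landscape` returns both images although neither is annotated with that word, which is the kind of recall gain that ontological matching is meant to provide.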
Currently, the definition of the metadata used for ontological matching is mainly based on MPEG-7, which is the largest metadata framework created to date. However, although MPEG-7 supports multimedia content description from several points of view, as mentioned in the introduction, it is based on XML schemas that do not have formal semantics. Consequently, there are many attempts to move MPEG-7 to Semantic Web technologies, such as RDF and OWL, in order to develop metadata in a standard format, which is a suitable way of adapting MMR to specific contexts and of achieving interoperability [9]. Other aspects that may further reduce the semantic gap are the implementation of intrinsic [28] or extrinsic mechanisms of relevance feedback, e.g., by a machine learning approach, and the use of formal

schemas for advanced matching metrics using metadata.
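One classical way to realize an explicit relevance feedback step is the Rocchio update, which moves the query vector towards the items the user marked as relevant and away from those marked as non-relevant. It is used here only as an illustrative stand-in: the surveyed systems do not necessarily adopt it, and the weights `alpha`, `beta` and `gamma` as well as the two-dimensional vectors are assumptions.

```python
def mean_vector(vectors, dim):
    """Component-wise mean of a list of vectors (zero vector if the list is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: alpha*query + beta*mean(relevant) - gamma*mean(non_relevant)."""
    dim = len(query)
    rel = mean_vector(relevant, dim)
    non = mean_vector(non_relevant, dim)
    return [alpha * query[i] + beta * rel[i] - gamma * non[i] for i in range(dim)]

# Hypothetical feedback round: two items judged relevant, one judged non-relevant.
refined = rocchio(
    [1.0, 0.0],
    relevant=[[0.0, 1.0], [0.0, 3.0]],
    non_relevant=[[2.0, 0.0]],
)
```

The refined vector is then used to re-run the search, so that each feedback round narrows the gap between what the features express and what the user means.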
Also, in the design of an effective semantic MMR system, the choice of the steps of the retrieval process plays a key role. Indeed, knowing the context in which a query is formulated helps to identify the algorithms most appropriate for improving the retrieval process. Thus, how to take into account the domain of interest and how to design the related query interface are also essential for a successful MMR system. In fact, general-purpose MMR systems, based on a static flow of operations, may be widely used, but they often show low retrieval performance.
The next section illustrates in detail the main approaches to equipping MMR systems not only with general semantic functionalities (e.g., annotations, ontological descriptions, relevance feedback) but also with feature extraction algorithms suitable for the specific domain of interest (domain-specific algorithms) [10].

3 Multimedia retrieval systems with semantic approach
The MMR frameworks at the state of the art have different characteristics; thus, only by comparing these different approaches can one get a precise idea of how the semantics of modern retrieval systems may be used and improved. The tools considered for managing and improving MMR deal with the following aspects: a) annotation, relevance feedback and concept-based MMR, b) ontology-mediated MMR, c) an intrinsic semantic framework for recognizing image objects, and d) composition of media ontologies with domain-specific ontologies.

3.1 Annotation and concept based MMR

An overview of the annotation tools for video and image retrieval is given in [11], where the best-known manual annotation tools are illustrated, addressing both functionality aspects, such as coverage and granularity, and interoperability concerns with respect to the supported annotation vocabularies and representation languages. Due to the complexity of the manual annotation process, many approaches have been proposed for automatic annotation, e.g., [12]. Also, international projects have been carried out for the annotation and retrieval of multimedia material, such as RUSHES (www.rushes-project.eu), i.e., an MM engine for the retrieval of annotated multimedia semantic units that aims at favoring the reuse of experience to produce new MM materials. The RUSHES project allows us to point out the importance of creating hyperlinks between MM materials to increase the recall, i.e., to facilitate the retrieval of MM materials similar to those retrieved by the basic matching capabilities of the MMR system. This would allow us to retrieve media similar to the query even if they are not annotated or defined by the same words. This may be obtained by following the approach first proposed in [13] to let design concepts emerge from linked data. In particular, by storing the links between any MM material existing in either a centralized or a distributed archive and the influential ones (e.g., the ones identified by the relevance feedback issued by the user), it is possible to interrelate different design stories, e.g., the designs of new MM assets from previous raw MM materials, also known as "rushes".
In [14] a neural-network-based unsupervised clustering technique has been proposed to favor the extraction of similar semantic units. We expect that such a clustering technique may be used to support communities of designers, since in this case the concepts supported by the shared design memory fit well the common culture of the users [15]. A first example partially following this trend is the framework named CALIMERA (Conference Advanced Level Information Management and Retrieval), which aims at facilitating the management and retrieval of the multimedia information generated within scientific conferences [10]. In this framework the MM data store is structured in such a way as to facilitate the retrieval of relevant data from metadata formats such as MPEG-7, RDF and OWL, taking advantage of the annotations semi-automatically produced by the system. The framework also implements a P2P method to share the data and metadata with other systems. Let us note that the importance of specializing MMR systems for a community of users has been well understood by CALIMERA for improving semi-automatic annotation systems, whereas the mentioned advanced clustering techniques for facilitating the reuse of experience are present neither in this system nor in the other current MMR frameworks. Clustering techniques for similarity analysis have been proposed for video retrieval, e.g., [17], where a semantic-based approach for similarity computation is proposed to enhance the retrieval effectiveness in concept-based video retrieval. The proposed method is based on the integration of knowledge-based and corpus-based semantic word similarity measures in order to retrieve video shots for concepts whose annotations are not available for the system. The integrated similarity method is shown and evaluated in terms of Mean Average Precision (MAP). Although this system achieves relevant results, the supervised and unsupervised clustering techniques proposed in [17] and [18] should be tried to increase the effectiveness of the recall of MM materials. In particular, this would guarantee that MM data belonging to the same cluster or to close clusters are similar to one another. Massive MMR in a distributed environment could be carried out by the parallel implementation of the mentioned clustering technique over a GRID [19].
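Since the evaluation mentioned above is expressed in terms of Mean Average Precision (MAP), a short sketch of how this measure is computed may help. The ranked lists and relevance sets below are invented for illustration and do not reproduce any result reported in the cited works.

```python
def average_precision(ranked, relevant):
    """Mean of the precision values measured at each rank where a relevant item appears."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP: average of the per-query average precision over all queries."""
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)

# Hypothetical runs: (ranked result list, set of relevant items) for each query.
RUNS = [
    (["a", "b", "c", "d"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y", "z"], {"y"}),            # AP = (1/2) / 1
]
map_score = mean_average_precision(RUNS)
```

Because relevant items that are never retrieved still count in the denominator of each average precision, MAP penalizes low recall as well as poor ranking, which is why clustering techniques that raise recall can be expected to affect it.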

3.2 An Ontology Mediated MMR System

Another example of a framework that uses, even more convincingly, an ontology to create a bridge between LLF and HLF through query formulation is proposed in [20]. It is called DL-MEDIA and is an ontology-mediated retrieval system that combines logic-based retrieval with multimedia feature-based similarity retrieval. The DL-MEDIA architecture has two basic components: the DL-based ontology component and the (feature-based) multimedia retrieval component. The DL component supports both the definition of the ontology and query answering. The (feature-based) multimedia retrieval component supports the retrieval of text and images based on low-level feature indexing. DL-MEDIA uses an extension of a DLR-Lite(D)-like language as the query and ontology representation language.
This approach could be used to extract the MM materials to be managed in real-time control systems, by knowing how the system images develop in time and assuming that they are described by an ontology. For example, traffic control could be facilitated by an MMR system if it is able to automatically extract the current congestion situations or dangerous behaviors of people. In this case the MM store should be formed by real-time images taken from cameras, as shown in [21] and [22], whereas the ontological description of the traffic system given in [23] should be powered by a (feature-based) multimedia retrieval component to extract the images relevant to the query. This would open a novel and interesting scenario of MMR applications where a system manager may describe in advance the relevant situations that the Online MMR (OMMR) system should signal to prevent or limit dangerous consequences. The alert images extracted by the OMMR system should help the manager to better control the system. Let us note now that an ontology that contains visual information can also facilitate the annotation process in a broad domain. Conversely, visual information may identify requirements that a visual ontology has to meet. Based on these requirements, an interesting ontology-based annotation approach is proposed in [24], which investigates how annotations may be created automatically from two existing knowledge corpora (WordNet and MPEG-7) by creating links between visual and general concepts. In this example we can see how the mapping between LLF and HLF is made using MPEG-7 as a container of metadata and HL concepts. The problem of jointly modeling the text and image components of multimedia documents is also addressed in [25], where explicit modeling of the correlations between images and texts is investigated. In particular, cross-modal correlations using canonical correlation analysis (CCA) are proposed, and the mapping of the text and image from their respective natural spaces to a CCA space was done successfully using CCA representations.

3.3 An intrinsic semantic framework for recognizing image objects

Semantic information may also be derived in an intrinsic way, i.e., from the features of the media object (images in this case), e.g., [26], where an approach to finding semantic meaning in visual object classes is proposed according to the Gestalt law of proximity. The approach does not propose a framework with a semantic layer, but a method for computing semantic features. In order to follow the semantic grouping approach, the image structure is transformed into a line segment model. Micro-level semantic structures are formed by line segments (arcs are also approximated by line segments based on a pixel deviation threshold) which are in close proximity. These structures are hierarchically combined until a semantic label can be assigned. This algorithm has been tested successfully on a standard benchmark database. The approach of tagging images implicitly is typical of MM recognition systems, e.g., face recognition algorithms [27], which are able to derive either general or specific properties from images for either off-line retrieval or real-time control. Thus, we expect that other recognition techniques will be studied and applied for automatic image tagging.
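The grouping of line segments by proximity can be illustrated with a deliberately simplified sketch. It is not the algorithm of [26]: it performs a single grouping pass rather than a hierarchical combination, and the endpoint-based distance, the threshold value and the sample segments are assumptions chosen only to show the idea.

```python
import math

def endpoint_gap(seg_a, seg_b):
    """Smallest distance between any endpoint of seg_a and any endpoint of seg_b."""
    return min(math.dist(p, q) for p in seg_a for q in seg_b)

def group_by_proximity(segments, threshold):
    """Union-find grouping: segments whose endpoint gap is within threshold share a group."""
    parent = list(range(len(segments)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if endpoint_gap(segments[i], segments[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(segments)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical line segment model: each segment is a pair of (x, y) endpoints.
SEGMENTS = [
    ((0, 0), (1, 0)),
    ((1, 0), (1, 1)),
    ((10, 10), (11, 10)),
    ((11, 10.5), (12, 11)),
]
groups = group_by_proximity(SEGMENTS, threshold=1.0)
```

Each resulting group plays the role of a micro-level structure; a hierarchical method would then repeat the grouping on these structures until a semantic label can be assigned.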

3.4 Adaptive Architecture for Automatic Multimedia Retrieval Composition

The previous sections pointed out the importance of implicit media indexing algorithms and textual descriptions for extracting relevant MM materials. Also, textual descriptions become more effective if they take into account domain knowledge to improve MMR precision, whereas suitable clustering techniques were envisaged to increase MMR recall and to give rise to a new generation of MMR systems (i.e., the OMMR systems).
Thus, a general MMR architecture for composing the system suitable for the problem at hand could be useful, as proposed in [10], where a domain- and media-independent multimedia retrieval (MMR) system is implemented, whose functionalities are targeted both at the user, by providing customized and multiple domain-specific views of multimedia content, and at the developer, who may easily create a multimedia retrieval system for the domain and the media type of interest. The basic idea behind this work is to model the retrieval process of every domain by the integration of three ontologies: a domain ontology, a media ontology and a processing ontology (the algorithms for media processing). In detail, the integration between the domain and processing ontologies (called the domain-processing ontology) adapts the processing workflow for media content retrieval, taking into account the constraints defined by the users for the specific domain, whereas the media ontology and the domain-processing ontology are responsible for the definition of the system interfaces. Let us note that this approach also lacks clustering; thus future work is envisaged to include clustering techniques in the composition architecture as well.
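To make the idea of composing a retrieval workflow from three ontologies more concrete, the following toy sketch represents each ontology as a plain dictionary and selects the processing steps allowed by the domain constraints. Every name in it (the domains, the algorithms, the `cost` and `max_cost` attributes) is hypothetical; the cited architecture uses real ontologies and a richer notion of constraint.

```python
# Hypothetical miniature "ontologies" expressed as plain dictionaries.
MEDIA_ONTOLOGY = {
    "image": ["color_histogram", "edge_detection"],
    "audio": ["mfcc"],
}
PROCESSING_ONTOLOGY = {
    "color_histogram": {"cost": 1},
    "edge_detection": {"cost": 2},
    "mfcc": {"cost": 2},
}
DOMAIN_ONTOLOGY = {
    "traffic": {"media": "image", "max_cost": 1},
    "conference": {"media": "audio", "max_cost": 3},
}

def compose_workflow(domain):
    """Select the processing steps for the domain's media type that satisfy its constraints."""
    constraints = DOMAIN_ONTOLOGY[domain]
    candidates = MEDIA_ONTOLOGY[constraints["media"]]
    return [
        step for step in candidates
        if PROCESSING_ONTOLOGY[step]["cost"] <= constraints["max_cost"]
    ]

traffic_workflow = compose_workflow("traffic")
conference_workflow = compose_workflow("conference")
```

Changing only the domain entry yields a different workflow without touching the processing code, which is the kind of adaptability that the composition architecture aims for.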

4 Concluding remarks
Multimedia retrieval semantic techniques have been discussed by means of representative examples. In particular, frameworks that use ontologies to map low-level concepts through annotation tools based on manual or semi-automatic methods have been illustrated and criticized. Also, approaches that do not need an ontology to link images and textual descriptions have been pointed out, such as the cross-modal systems, or others that refer to a detailed description of the metadata using MPEG-7. Although MPEG-7 metadata are widely used in MMR, we expect that in the near future each resource may be identified by a specific RDF schema. Indeed, RDF and OWL enable conceptual schemas that can sometimes be a solution more adaptable to a specific context. More effective MMR composition architectures are envisaged for better integrating domain-specific and general-purpose indexing schemas. Advanced user interfaces to better support query formulation and result visualization are also expected, to provide users with MMR systems that possess an increasingly rich semantic level. From the techniques illustrated in this paper, we may conclude that an effective semantic-based MMR system should be provided with: 1) a method of describing metadata (through MPEG-7 or RDF schema), 2) an adequate interface to allow the user to formulate her/his query as precisely as possible, 3) a way to describe the repositories by using a standard schema that ensures interoperability between different systems, 4) a flexible implementation of the retrieval process to be adapted to the domain and the users' preferences, 5) a method to store the profile and preferences and to make the retrieval process adaptive, and 6) LLF algorithms based on intrinsic elements of semantics. Finally, we claimed that clustering methods should be included in current MMR systems to enhance their performance. How this could be accomplished has also been discussed, especially for carrying out complex tasks such as the reuse of MM materials and the real-time control of traffic systems.

References:
[1] Salembier P., Smith J.R., MPEG-7 multimedia
description schemes, Circuits and Systems for
Video Technology, IEEE Transactions on,
Vol.11, pp. 748-759.
[2] Möller R., Neumann B., Ontology-based
reasoning techniques for multimedia
interpretation and retrieval, Springer London,
2008.
[3] Tong S., Chang E., Support Vector Machine
Active Learning for Image Retrieval,
Proceedings of the ninth ACM international
conference on Multimedia, ACM, 2001.
[4] Naphade M.R., Huang T.S., A probabilistic
framework for semantic video indexing,
filtering and retrieval, Multimedia, IEEE
Transactions on, Vol.3, 2001, pp. 141-151.
[5] García R., Celma Ò., Semantic Integration and
Retrieval of Multimedia metadata, 5th Int.
Workshop on Knowledge Markup and Semantic
Annotation, 2005.
[6] Giordano D., Kavasidis I., Pino C., Spampinato
C., A Semantic-Based and Adaptive
Architecture for Automatic Multimedia
Retrieval Composition, 9th International
Workshop on Content-Based Multimedia
Indexing (CBMI), 2011, pp. 181-186.
[7] Zhang R., et al., A Probabilistic Semantic
Model for Image Annotation and Multi-Modal
Image Retrieval, Tenth IEEE International
Conference on Computer Vision, Vol.1, 2005,
pp. 846-851.
[8] Yang Y., Nie F., Xu D., Luo J., Zhuang Y., Pan
Y., A Multimedia Retrieval Framework Based
on Semi-Supervised Ranking and Relevance
Feedback, IEEE Trans. Pattern Anal. Mach.
Intell., 2011.
[9] Tous R., Delgado J., Semantic-Driven
Multimedia Retrieval with the MPEG Query
Format, Multimedia Tools and Applications,
Vol.49, 2010, pp. 213-233.
[10] Sokhn M., Mugellini E., Khaled O.A.,
Serhrouchni A., End-to-End Adaptive
Framework for Multimedia Information
Retrieval, Wired/Wireless Internet
Communications, Vol.6649, 2011, pp. 197-206.
[11] Dasiopoulou S., Giannakidou E., Litos G.,
Malasioti P., Kompatsiaris Y., A Survey of
Semantic Image and Video Annotation Tools,
Knowledge-Driven Multimedia Information
Extraction and Ontology Evolution, Vol.6050,
2011, pp. 196-239.
[12] Badii A., et al., Semi-automatic annotation and
retrieval of visual content using the topic map
technology, VIS 08 WSEAS Conferences,
Pittsburgh, 2008.
[13] Faro A., Giordano D., Concept Formation from
Design Cases: Why Reusing Experience and
Why Not. Knowledge Based Systems Journal,
Vol.11, 1998, pp. 437-448. Elsevier Science.
[14] Faro A., Giordano D., StoryNet: an Evolving
Network of Cases to Learn Information
Systems Design, IEE Proceedings
SOFTWARE, 1998, pp. 119-127.
[15] Giordano D., Evolution of interactive graphical
representations into a design language: a
distributed cognition account, International
Journal of Human-Computer Studies, Vol. 57,
Issue 4, October 2002, pp. 317-345.
[16] Memar S., Suriani Affendey L., Mustapha N.,
Shyamala C., Doraisamy M. E., An integrated
semantic-based approach in concept based
video retrieval, Multimedia Tools and
Applications, 2011, pp. 1-19.
[17] Faro A., Giordano D., Maiorana F.,
Discovering complex regularities from adaptive
self organizing classifications. Proceedings of
WASET, N.4, 2005, pp.27-30.
[18] Faro A., Giordano D., Maiorana F.,
Discovering complex regularities: from tree to
semi-lattice classifications. International
Journal of Computational Intelligence, Vol. 2,
N.1, 2005, pp. 34-39.
[19] Faro A., Giordano D., Maiorana F., Mining
massive datasets by an unsupervised parallel
clustering on a GRID: Novel algorithms and
case study. Future Generation Computer
Systems, Vol.27, Issue 6, 2011, pp. 711-724.
[20] Straccia U., Visco G., An Ontology Mediated
Multimedia Information Retrieval System, 40th
IEEE International Symposium on Multiple-Valued
Logic (ISMVL), 2010, pp. 319-324.
[21] Faro A., Giordano D., Spampinato C., Soft
computing agents processing webcam images
to optimize metropolitan traffic systems.
Computer Vision and Graphics, Computational
Imaging and Vision, 2006, Vol.32, pp.968-974,
Springer.
[22] Faro A., Giordano D., Spampinato C.,
Evaluation of the traffic parameters in a
metropolitan area by fusing visual perceptions
and CNN processing of webcam based images.
IEEE Transactions on Neural Networks, 2007.
[23] Faro A., Giordano D. and Musarra A.,
Ontology based intelligent mobility systems.
IEEE SMC03 Proc. Int. Conf. on Systems,
Man and Cybernetics, Washington, D.C. USA.,
2003, Vol.5, pp.4334-4339. IEEE Press.
[24] Hollink L., Worring M., Building a Visual
Ontology for Video Retrieval, Proceedings of
the 13th annual ACM international conference
on Multimedia, 2005.
[25] Rasiwasia N., Costa Pereira J., Coviello E.,
Doyle G., Lanckriet G., Levy R., Vasconcelos
N., A New Approach to Cross-Modal
Multimedia Retrieval, Proceedings of the
international conference on Multimedia, 2010.
[26] Ahmad N., Youngeun A., Park J., An intrinsic
semantic framework for recognizing image
objects, Multimedia Tools and Applications,
2011, pp. 1-16.
[27] Faro A., Giordano D., Spampinato C.,
An Automated Tool for Face Recognition
Using Visual Attention and Active Shape
Models Analysis, IEEE EMBC 2006
Engineering in Medicine and Biology
Conference, 2006, New York, IEEE.
[28] Faro A., Giordano D., Pino C., Spampinato C.,
Visual Attention for implicit Relevance
feedback in a content based image retrieval,
Proceedings of the 2010 Symposium on Eye-Tracking
Research & Applications, ETRA, 2010,
pp. 73-76.
