
Reinforced Similarity Integration in Image-Rich

Information Networks
Xin Jin, Jiebo Luo, Fellow, IEEE, Jie Yu, Member, IEEE, Gang Wang, Member, IEEE,
Dhiraj Joshi, Member, IEEE, and Jiawei Han, Fellow, IEEE
Abstract—Social multimedia sharing and hosting websites, such as Flickr and Facebook, contain billions of user-submitted images.
Popular Internet commerce websites such as Amazon.com are also furnished with tremendous amounts of product-related images. In
addition, images in such social networks are also accompanied by annotations, comments, and other information, thus forming
heterogeneous image-rich information networks. In this paper, we introduce the concept of (heterogeneous) image-rich information
network and the problem of how to perform information retrieval and recommendation in such networks. We propose a fast algorithm
heterogeneous minimum order k-SimRank (HMok-SimRank) to compute link-based similarity in weighted heterogeneous information
networks. Then, we propose an algorithm Integrated Weighted Similarity Learning (IWSL) to account for both link-based and content-
based similarities by considering the network structure and mutually reinforcing link similarity and feature weight learning. Both local
and global feature learning methods are designed. Experimental results on Flickr and Amazon data sets show that our approach is
significantly better than traditional methods in terms of both relevance and speed. A new product search and recommendation system
for e-commerce has been implemented based on our algorithm.
Index Terms—Information retrieval, image mining, information network, ranking

1 INTRODUCTION
Social multimedia (photo and video) sharing and hosting websites, such as Flickr, Facebook, YouTube, Picasa, ImageShack, and Photobucket, are popular around the world, with billions of photos uploaded by users.
Popular Internet commerce websites such as Amazon are
also furnished with tremendous amounts of product-related
images. In addition, many images in such social networks are
accompanied by information such as owner, consumer,
producer, annotations, and comments. They can be modeled
as heterogeneous image-rich information networks. Fig. 1
shows an example of the Flickr information network, where
images are tagged by the users and image owners contribute
images to topic groups. Fig. 2 shows an Amazon information
network of product images, categories, and consumer tags.
Conducting information retrieval in such large image-
rich information networks is a very useful but also very
challenging task, because there exist many kinds of information, such as text, image features, users, groups, and, most importantly, the network structure. In text-based retrieval,
estimating the similarity of words in context is useful for returning more relevant images: WordNet manually groups words into synonym sets, Google Distance [2] computes word similarity from co-occurrence in search results, and Flickr Distance [3] considers visual relationships. In image content-based retrieval, most methods (such as Google's VisualRank [4]) and systems [5], [6], [7], [8], [9] compute image similarity from image content features. Hybrid approaches combine text features and image content features [10], [11], [12], [13]. Most commercial image search
engines use textual similarity to return semantically relevant
images and then use visual similarity to search for visually
relevant images. Integration-based approaches [11], [12],
[13] use linear or nonlinear combination of the textual and
visual features. However, existing works cannot handle the
link structure. In this paper, we propose an image-rich information network model in which the similarities between nodes of the same type, and between nodes of different types, can be better estimated based on their mutual impact under the network structure.
Among algorithms that compute object similarity in information networks, SimRank [14] is one of the most popular, but it is very expensive to calculate and its similarity is based only on link information. When considering the images in the network, image similarity can also be judged by content features, such as an RGB histogram and SIFT. In this paper, we propose an efficient approach called Mok-SimRank to significantly improve the speed of SimRank, and introduce its extension HMok-SimRank to work on weighted heterogeneous information networks. Then, we propose the algorithm IWSL to provide a novel way of integrating both link and content information. IWSL performs content and link reinforcement-style learning with either global or local feature weight learning.
448 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 25, NO. 2, FEBRUARY 2013
. X. Jin and J. Han are with the Department of Computer Science,
University of Illinois at Urbana-Champaign, 2132 Siebel Center, 201 N.
Goodwin Ave., Urbana, IL 61801.
E-mail: {xinjin3, hanj}@illinois.edu, hanj@cs.uiuc.edu.
. J. Luo and D. Joshi are with the Intelligent Systems Research, Kodak
Research Labs, 1999 Lake Avenue Rochester, NY 14650-2103.
E-mail: {jiebo.luo, dhiraj.joshi}@kodak.com.
. J. Yu is with the Global Research Center, General Electric Co., Niskayuna,
NY 12309. E-mail: jerry.j.yu@gmail.com.
. G. Wang is with the School of EEE, Nanyang Technological University,
office S1-B1c-112, 50 Nanyang Avenue, Singapore 639798, and Advanced
Digital Science Center, Singapore. E-mail: wanggang@ntu.edu.sg.
Manuscript received 22 Oct. 2010; revised 13 July 2011; accepted 26 July
2011; published online 10 Nov. 2011.
Recommended for acceptance by X. Zhou.
For information on obtaining reprints of this article, please send e-mail to:
tkde@computer.org, and reference IEEECS Log Number TKDE-2010-10-0562.
Digital Object Identifier no. 10.1109/TKDE.2011.228.
1041-4347/13/$31.00 2013 IEEE Published by the IEEE Computer Society
Based on the proposed algorithm, a novel product
recommendation system has been implemented for e-commerce to find both visually and semantically relevant
products modeled in an image-rich information network.
Fig. 3 describes the system architecture. The bottom layer
contains the product data warehouse, which includes product
images and related product information. The second layer
performs metainformation extraction and image feature
extraction. The third layer builds a weighted heterogeneous
image-rich information network. The fourth layer performs
information network analysis based ranking to find relevant
results for a query. The top layer contains a user-friendly
interface, which interacts with users, responds to their
requests, and collects feedback.
The remainder of the paper is organized as follows:
Section 2 describes preliminaries and notations. Section 3
introduces algorithms to compute link-based similarity in
image-rich information networks. Section 4 presents a
method to compute weighted content-based similarity.
Section 5 presents the feature learning algorithms and the integration algorithm IWSL. Section 6 reports experimental results.
Section 7 concludes the paper.
2 PRELIMINARIES AND NOTATIONS
We model a weighted heterogeneous image-rich information network as a graph $G = (V, E, W)$ with vertices/nodes $V$, edges $E$, and edge weights $W$. $V = \bigcup_q V_q$, where $q = 1, \ldots, Q$, $Q$ is the number of types of heterogeneous nodes, and $|V_q|$ is the number of nodes of type $q$. Every image has a $D$-dimensional content feature $F \in R^D$, which is either a single type of feature or a combination of multiple types of features. An edge $l_{ij} \in E$ is added between nodes $i \in V$ and $j \in V$ when they are linked together; denote $w_{ij}$ as the weight of this link.
of this link.
Without loss of generality, we consider a heterogeneous network graph with three types of nodes ($Q = 3$): images $V_I$, groups $V_G$, and tags $V_T$. Taking the Flickr network as an example, there is an undirected link between an image node $e \in V_I$ and a tag node $t \in V_T$ if $e$ is annotated with $t$, and there is an undirected link between $e$ and a group node $g \in V_G$ if $e$ belongs to group $g$. There is no link between nodes of the same type.
Table 1 lists the major notations used throughout this
paper.
Fig. 2. Information network for Amazon, connected by products, user
tags, and categories.
Fig. 3. Product search and recommendation system architecture.
Fig. 1. Information network for Flickr, connected by images, user tags,
and groups.
TABLE 1
Major Notations Used in This Paper
3 LINK-BASED SIMILARITY
SimRank [14] is one of the most popular link-based algorithms for evaluating similarity between nodes in information networks. It computes node similarity based on the idea that two nodes are similar if they are linked to by similar nodes in the network. In the spirit of PageRank [15], SimRank computes the similarity between each pair of nodes in an iterative fashion with a theoretical guarantee of convergence. In a basic homogeneous network, the SimRank similarity score between two objects $a$ and $a'$ is defined as:

$$s(a, a') = \frac{C}{|N(a)|\,|N(a')|} \sum_{i \in N(a)} \sum_{j \in N(a')} s(i, j), \quad (1)$$

with $C \in [0, 1]$ as the damping factor, $N(a)$ as the set of in-link nodes of $a$, and $|N(a)|$ as the cardinality of $N(a)$. There are two special cases: 1) if $a = a'$, then $s(a, a') = 1$; and 2) if $N(a) = \emptyset$ or $N(a') = \emptyset$, then $s(a, a') = 0$.
For a network $G$ of $n$ nodes, the memory space needed by SimRank to store the similarity pairs is $O(n^2)$. Denote $T_e$ as the time spent calculating (1); the time complexity is then $O(n^2 T_e)$ in a single iteration. Because SimRank is computed iteratively, the total time complexity is $O(K n^2 T_e)$ for $K$ iterations. The original SimRank is too computationally expensive to be used in large-scale networks.
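To make the recursion in (1) concrete, a minimal in-memory sketch might look like the following (hedged: the toy graph, the damping value $C = 0.8$, and the iteration count are illustrative, not the paper's implementation):

```python
from itertools import product

def simrank(N, C=0.8, iters=10):
    """N[v]: set of in-link neighbors of node v; returns pairwise scores."""
    nodes = list(N)
    # Initialization encodes special case 1: s(a, a) = 1.
    s = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iters):
        new = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new[(a, b)] = 1.0
            elif not N[a] or not N[b]:
                new[(a, b)] = 0.0  # special case 2: empty neighborhood
            else:
                total = sum(s[(i, j)] for i in N[a] for j in N[b])
                new[(a, b)] = C * total / (len(N[a]) * len(N[b]))
        s = new
    return s

# Toy graph: nodes 1 and 2 share the single in-neighbor 0.
scores = simrank({0: set(), 1: {0}, 2: {0}})
```

The all-pairs dictionary makes the $O(n^2)$ space cost discussed above directly visible.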
3.1 Mok-SimRank for Fast Computation
Some algorithms [14], [16], [17], [18], [19] have been proposed for more efficient SimRank computation. We use the pruning idea proposed by Jeh and Widom [14]: initialize every object's top $k$ ($k \ll n$) similar objects as candidates and focus computation on the chosen candidates. This reduces the space complexity to $O(nk)$ and the time complexity to $O(K n k T_e)$; we denote this method $k$-SimRank. We choose this strategy because it fits the property of large-scale image retrieval wherein most images are dissimilar to the query image, and estimating their similarities many times is a waste. The time complexity of $T_e$ in $k$-SimRank is $O(|N(a)|\,|N(a')| \log(k))$, where $\log(k)$ is the complexity of deciding whether $N_j(a')$ is a candidate of object $N_i(a)$.
To make $k$-SimRank even more computationally efficient, we describe an approach called minimum order $k$-SimRank (Mok-SimRank). Denote $k(e)$ as $e$'s top $k$ similar candidates. Between $N(a)$ and $N(a')$, denote $N_{big}$ as the neighborhood with the larger cardinality and $N_{small}$ as the smaller one. The basic idea of Mok-SimRank is that, beginning with $N_{small}$, for every $e \in N_{small}$, compute the scores considering two cases:

- Case 1 ($k < |N_{big}|$): for each $d \in k(e)$, check whether $d \in N_{big}$. If true, return the score; if false, get zero;
- Case 2 ($k \geq |N_{big}|$): for each $d \in N_{big}$, check whether $d \in k(e)$. If true, return the score; if false, get zero.

The time complexity of $T_e$ is then reduced to be always the minimum combination $T_{min}$,

$$T_{min} = \begin{cases} O(|N_{small}|\, k \log(|N_{big}|)) & \text{if } k < |N_{big}| \\ O(|N_{small}|\,|N_{big}| \log(k)) & \text{if } k \geq |N_{big}| \end{cases} \quad (2)$$

This is the optimal combination that achieves the minimum cost by automatically choosing the minimum computation order.
Table 2 summarizes the time and space complexity of the link-based similarity algorithms in a homogeneous network of $n$ nodes.
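The minimum-order lookup at the heart of Mok-SimRank can be sketched as follows (hedged: the dictionary-based data layout and the toy numbers are illustrative; the paper's implementation may differ):

```python
def contribution(N_a, N_b, topk, s):
    """Sum s(e, d) over candidate pairs, always iterating the cheaper way.
    topk[e] is e's top-k candidate set; s holds known nonzero scores."""
    small, big = sorted((N_a, N_b), key=len)   # begin with N_small
    total = 0.0
    for e in small:
        cand = topk.get(e, set())
        if len(cand) < len(big):               # Case 1: k < |N_big|
            hits = (d for d in cand if d in big)
        else:                                  # Case 2: k >= |N_big|
            hits = (d for d in big if d in cand)
        for d in hits:
            total += s.get((e, d), s.get((d, e), 0.0))
    return total

# Toy example: N(a) = {1}, N(a') = {2, 3, 4}; node 1's candidates are {2, 3}.
topk = {1: {2, 3}}
scores = {(1, 2): 0.5, (1, 3): 0.25}
total = contribution({1}, {2, 3, 4}, topk, scores)
```

Each branch scans the smaller of the two collections, which is exactly how the two cases of (2) arise.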
3.2 HMok-SimRank for Weighted Heterogeneous
Networks
Mok-SimRank can be extended to work on a weighted heterogeneous information network, which contains multiple types of nodes. To explain, we take the image-rich information network from Flickr as an example. Similar images are likely to link to similar groups and tags, so we define the link-based semantic similarity between images $e \in V_I$ and $e' \in V_I$ as follows:

$$s_{m+1}(e, e') = \alpha_I\, s^G_m(e, e') + \beta_I\, s^T_m(e, e'), \quad (3)$$
with

$$s^G_m(e, e') = \frac{C^G_I}{\hat{W}(N_G(e))\, \hat{W}(N_G(e'))} \sum_{a \in N_G(e)} \sum_{b \in N_G(e')} \hat{w}^{ab}_{ee'}\, s_m(a, b), \quad (4)$$

$$s^T_m(e, e') = \frac{C^T_I}{\hat{W}(N_T(e))\, \hat{W}(N_T(e'))} \sum_{a \in N_T(e)} \sum_{b \in N_T(e')} \hat{w}^{ab}_{ee'}\, s_m(a, b), \quad (5)$$

where $N_G(e)$ is the set of groups image $e$ links to, and $N_T(e)$ is the set of tags image $e$ links to. $\alpha_I$ and $\beta_I$ are the weights of the link-based similarity for group and tag, respectively; we set both to 0.5 in the experiments to treat them as equally important. $C^G_I$ and $C^T_I$ are the damping factors.
$\hat{W}(N_G(e))$ is the sum of the weights of the links between image $e$ and the nodes in $N_G(e)$,

$$\hat{W}(N_G(e)) = \sum_{a \in N_G(e)} w_{ea}. \quad (6)$$

$\hat{w}^{ab}_{ee'}$ is the importance/contribution of $s(a, b)$ to $s(e, e')$ under the link weighting, and is defined as the multiplicative combination of the weights of the two links $l_{ea}$ and $l_{e'b}$,

$$\hat{w}^{ab}_{ee'} = w_{ea} \times w_{e'b}. \quad (7)$$
TABLE 2
Complexity of Algorithms in Homogeneous Network

The weight $w$ can be set manually or automatically. The simplest case is to set all weights to 1; the network then essentially becomes unweighted and all links are treated as equally important. However, in real applications the links can be of unequal importance. Taking Amazon as an example, the tag frequency represents the number of users who think the tag is relevant to the product, so we can use the tag frequency (or its log value) as the weight $w_{ea}$ of the link between product image $e$ and tag $a$.
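The weighted term of (4), with the normalizer of (6) and the link-weight product of (7), can be sketched as follows (hedged: the dictionary representation and the toy weights are illustrative, not the paper's code):

```python
def weighted_group_sim(e, ep, NG, w, s_m, C=0.8):
    # Normalizers of Eq. (6): sum of link weights around each image.
    norm_e = sum(w[(e, a)] for a in NG[e])
    norm_ep = sum(w[(ep, b)] for b in NG[ep])
    if norm_e == 0 or norm_ep == 0:
        return 0.0
    total = 0.0
    for a in NG[e]:
        for b in NG[ep]:
            w_hat = w[(e, a)] * w[(ep, b)]      # Eq. (7)
            total += w_hat * s_m.get((a, b), 1.0 if a == b else 0.0)
    return C * total / (norm_e * norm_ep)       # Eq. (4)

# Toy case: two product images sharing one group, with tag-frequency-style
# link weights 2 and 3; s_m defaults to 1 on the diagonal, 0 elsewhere.
NG = {"i1": ["g1"], "i2": ["g1"]}
w = {("i1", "g1"): 2.0, ("i2", "g1"): 3.0}
sim = weighted_group_sim("i1", "i2", NG, w, {})
```

With all link weights set to 1 this reduces to the unweighted SimRank-style term, matching the remark above about unweighted networks.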
Similarly, we can define and compute the link-based group and tag similarities.

The group similarity is computed via the similarity of the images and tags the groups link to. For each pair of groups $g \in V_G$ and $g' \in V_G$,

$$s_{m+1}(g, g') = \alpha_G\, s^I_m(g, g') + \beta_G\, s^T_m(g, g'), \quad (8)$$
with

$$s^I_m(g, g') = \frac{C^I_G}{\hat{W}(N_I(g))\, \hat{W}(N_I(g'))} \sum_{a \in N_I(g)} \sum_{b \in N_I(g')} \hat{w}^{ab}_{gg'}\, s_m(a, b), \quad (9)$$

$$s^T_m(g, g') = \frac{C^T_G}{\hat{W}(N_T(g))\, \hat{W}(N_T(g'))} \sum_{a \in N_T(g)} \sum_{b \in N_T(g')} \hat{w}^{ab}_{gg'}\, s_m(a, b), \quad (10)$$

where $N_I(g)$ is the set of images of group $g$, and $N_T(g)$ is the set of tags of group $g$. The meaning and setting of the parameters $\alpha_G$, $\beta_G$, $C^I_G$, and $C^T_G$ are similar to those in (3), (4), and (5).
The tag similarity is calculated via the similarity of the images and groups the tags link to. For each pair of tags $t$ and $t'$,

$$s_{m+1}(t, t') = \alpha_T\, s^I_m(t, t') + \beta_T\, s^G_m(t, t'), \quad (11)$$
with

$$s^I_m(t, t') = \frac{C^I_T}{\hat{W}(N_I(t))\, \hat{W}(N_I(t'))} \sum_{a \in N_I(t)} \sum_{b \in N_I(t')} \hat{w}^{ab}_{tt'}\, s_m(a, b), \quad (12)$$

$$s^G_m(t, t') = \frac{C^G_T}{\hat{W}(N_G(t))\, \hat{W}(N_G(t'))} \sum_{a \in N_G(t)} \sum_{b \in N_G(t')} \hat{w}^{ab}_{tt'}\, s_m(a, b), \quad (13)$$

where $N_I(t)$ is the set of images tagged by $t$, and $N_G(t)$ is the set of groups tagged by $t$. The meaning and setting of the parameters $\alpha_T$, $\beta_T$, $C^I_T$, and $C^G_T$ are similar to those in (3), (4), and (5).
The similarity score for any pair of nodes of the same type is initialized to 0 for different nodes and 1 for the same node. We do not consider similarity between nodes of different types.

The similarity scores in (4), (5), (9), (10), (12), and (13) can be efficiently computed using the idea introduced in Mok-SimRank, so we can still achieve high efficiency for heterogeneous networks. Note that the computations of the similarity scores in (3), (8), and (11) are mutually dependent.
Generally, basic SimRank, $k$-SimRank, and Mok-SimRank can be extended to compute link-based similarity in (weighted) heterogeneous networks; we call these extensions HSimRank, H$k$-SimRank, and HMok-SimRank to distinguish them from their counterparts in homogeneous networks.
Table 3 summarizes the time and space complexity of the link-based similarity algorithms in a (weighted) heterogeneous network of $p$ types of nodes, with $n_i$ ($i \in 1, \ldots, p$) nodes of type $i$. Note that the total number of nodes in the network is $n = \sum_{i=1}^{p} n_i$.
4 CONTENT-BASED IMAGE SIMILARITY
Image similarity can be estimated from image content
features [10], [20], [21], such as color histogram, edge
histogram [22], Color Correlogram [23], CEDD [24], GIST,
texture features [25], [26], Gabor features [27], [28], shape
[29], [30], and SIFT [31].
4.1 Content Similarity Metric
We represent an image as a point in a $D$-dimensional feature space, with either a single type of feature or a combination of multiple types of features. Tang et al. proposed strategies to integrate both local and global features [32]; if the integrated feature space has a fixed number of dimensions, our approach is also applicable.

Normalize the feature $F \in R^D$, where $D$ is the number of dimensions in the feature space, to unit length: for any $f_d$, the value of feature $F$ on dimension $d$ ($d = 1, \ldots, D$), divide it by the sum of the values over all dimensions,

$$f_d = f^{orig}_d \Big/ \sum_{d'=1}^{D} f^{orig}_{d'}. \quad (14)$$
The $\chi^2$ test statistic distance between two feature vectors $F_i$ and $F_j$ is defined as:

$$\chi_{ij} = \chi(F_i, F_j) = \sum_{d=1}^{D} \delta^d_{ij} \quad (15)$$
$$= \frac{1}{2} \sum_{d=1}^{D} \frac{\left(f^d_i - f^d_j\right)^2}{f^d_i + f^d_j}. \quad (16)$$

When feature vectors are normalized to unit length, the $\chi^2$ test statistic distance varies from 0 to 1, with 0 indicating the most similar and 1 indicating the most different.
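The normalization of (14) and the $\chi^2$ distance of (15)-(16) are straightforward to sketch (hedged: a plain-Python toy with illustrative feature values, not optimized code):

```python
def normalize(F):
    # Eq. (14): scale the feature vector to unit (L1) length.
    total = sum(F)
    return [f / total for f in F]

def chi2_distance(Fi, Fj):
    # Eqs. (15)-(16): for unit-length inputs the result lies in [0, 1];
    # 0 means most similar, 1 most different.
    return 0.5 * sum((a - b) ** 2 / (a + b)
                     for a, b in zip(Fi, Fj) if a + b > 0)

Fi = normalize([2.0, 2.0])   # -> [0.5, 0.5]
Fj = normalize([1.0, 3.0])   # -> [0.25, 0.75]
dist = chi2_distance(Fi, Fj)
```

The guard `a + b > 0` simply skips dimensions where both features are zero, where the term is taken as zero.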
4.2 Weighted Content Similarity
TABLE 3
Complexity of Algorithms in (Weighted) Heterogeneous Network

Existing studies on metric learning [33], [34], [35] have shown empirically and theoretically that, instead of treating each dimension of the feature vector equally, a learned weighted metric can significantly improve performance on tasks such as image retrieval [33], classification [36], and clustering [37]. This is because the feature dimensions are usually not equally important for measuring the similarity between images; we can obtain better results by putting more weight on the subset of features that is more relevant to the semantic meaning of the images.
Based on the $\chi^2$ test statistic distance and a $D$-dimensional feature weighting vector $W = (w_1, w_2, \ldots, w_D)$, we define the weighted content similarity $C^W_{ij}$ between images $i$ and $j$ as follows:

$$C^W_{ij} = 1 - \frac{1}{2} \sum_{d=1}^{D} \frac{\left(w_d f^d_i - w_d f^d_j\right)^2}{w_d f^d_i + w_d f^d_j} \quad (17)$$
$$= 1 - \sum_{d=1}^{D} w_d\, \delta^d_{ij}, \quad (18)$$
which is used to evaluate image similarity in this paper. There are two reasons why we choose $\chi^2$: First, in our experiments it has shown performance as good as, and sometimes better than, the cosine, L1, and L2 measures for image similarity by human judgment; it has been used in image retrieval [38], [39] and obtains the best accuracy as a kernel for SVM-based image classification [40], [41], [42]. Second, its sum-of-squares formula makes it convenient to perform the Gradient Descent-based optimization used in our algorithm, as described in the next section.
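The weighted similarity of (17)-(18) reuses the per-dimension $\chi^2$ terms, as this sketch shows (hedged: illustrative values only):

```python
def weighted_content_sim(Fi, Fj, W):
    # Eq. (18): C^W_ij = 1 - sum_d w_d * delta^d_ij, where delta^d_ij is
    # the per-dimension chi-square term from Eqs. (15)-(16).
    delta = [0.5 * (a - b) ** 2 / (a + b) if a + b > 0 else 0.0
             for a, b in zip(Fi, Fj)]
    return 1.0 - sum(wd * dd for wd, dd in zip(W, delta))

# With uniform weights this reduces to 1 minus the unweighted distance.
sim = weighted_content_sim([0.5, 0.5], [0.25, 0.75], [1.0, 1.0])
```

Note that the simplification from (17) to (18) holds because the weight $w_d$ cancels once between numerator and denominator.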
5 INTEGRATION OF LINK AND CONTENT
SIMILARITIES
Using content similarity only may lead to unsatisfactory results. Fig. 4 shows a pair of Flickr images that have high content similarity estimated from low-level features but different semantic meanings.

Direct use of link information based solely on human annotations may also lead to unsatisfactory results if the annotation is wrong, too general, or incomplete. In addition, if an image does not link to any object in the information network, link information alone cannot work. Fig. 5 shows several examples that are all linked to the tag flower but are not visually similar.

The traditional approach is to combine content and link information to achieve more robust performance. In this section, we describe a novel way to integrate these two types of information.
5.1 Learning the Feature Weights
To build a bridge between the content and the semantics, we learn a weighting vector $W \in R^D$ for the feature space to force the weighted content-based similarity $C^W_{ij}$ to be consistent with the semantic link-based similarity $s_{ij}$. We consider two types of approaches: global feature learning (GFL) and local feature learning (LFL).
5.1.1 Global Feature Learning
We minimize the following regularized objective function to find $W$,

$$\Psi(W, \mu, b) = \beta \|W\|^2 + \sum_{i=1}^{|V_I|} \sum_{j \in k(i)} \psi_{ij}, \quad (19)$$

where $|V_I|$ is the number of images and $k(i)$ is the set of top-$k$ neighboring candidate images of image $i$, obtained by combining the top $k/2$ most visually similar and the top $k/2$ most semantically similar images. The first component, $\beta \|W\|^2$, is an $L_2$ regularizer. The second component, $\psi_{ij}$, serves as the bridge between content-based similarity and link-based similarity,

$$\psi_{ij} = \left( C^W_{ij} - (\mu s_{ij} + b) \right)^2 T_{ij} = \left( 1 - \sum_{d=1}^{D} w_d\, \delta^d_{ij} - (\mu s_{ij} + b) \right)^2 T_{ij}, \quad (20)$$

where $C^W_{ij}$ (defined in (17)) and $s_{ij}$ (defined in (3)) may have different scales and shifts, so we introduce the parameters $\mu$ and $b$ to automatically estimate them.
If the tags (or groups) of an image are incomplete (zero or very few) and thus cannot fully describe its semantic meaning, the link-based similarity becomes less reliable. To account for this factor, we introduce $T_{ij}$ as the confidence (or importance) of $s_{ij}$, and define it as a function of the number of annotations (including both tags and groups) linked to images $i$ and $j$,

$$T_{ij} = 1 - e^{-\tau \eta_{ij}}, \quad (21)$$

where $\eta_{ij}$ is defined as the minimum number of links (including tags and groups) of images $i$ and $j$, i.e.,

$$\eta_{ij} = \min\big(|N_T(i) \cup N_G(i)|,\; |N_T(j) \cup N_G(j)|\big). \quad (22)$$
The parameter $\tau$ rescales the value of $\eta_{ij}$ to adjust the importance of the count. It can be either set manually or estimated automatically from the average of the $\eta_{ij}$ values.

The value of $T_{ij}$ varies from 0 to 1, with 1 indicating the highest confidence and 0 indicating the lowest. If $\eta_{ij} = 0$ (i.e., at least one of the images does not have any tag or group), then $T_{ij} = 0$, which means the link-based similarity is not reliable because the semantic information given by human annotation is missing.
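The confidence of (21)-(22) can be sketched as follows (hedged: the $\tau$ value and the tag/group sets are illustrative):

```python
import math

def confidence(NT_i, NG_i, NT_j, NG_j, tau=0.5):
    # Eq. (22): eta is the smaller annotation count of the two images.
    eta = min(len(NT_i | NG_i), len(NT_j | NG_j))
    # Eq. (21): maps the count into [0, 1); zero annotations -> confidence 0.
    return 1.0 - math.exp(-tau * eta)

# An untagged image yields zero confidence in the link similarity.
c_zero = confidence(set(), set(), {"flower"}, {"nature-group"})
```

Larger annotation counts push the confidence smoothly toward 1, at a rate controlled by $\tau$.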
Note that we can also use graph ranking algorithms, such as PageRank [15] and HITS [43], to estimate the importance of a node instead of simple counting. In addition, we may also consider the tf-idf scores of tags, as in document retrieval, to give higher weights to topic terms and lower weights to general terms.

Fig. 4. Images with high visual similarity, but low semantic similarity.

Fig. 5. Images annotated by the tag flower, but with low visual similarity.
The overall importance $T_o$ is defined as

$$T_o = \sum_{i=1}^{|V_I|} \sum_{j \in k(i)} T_{ij}. \quad (23)$$

By considering $T_o$, we obtain the following new global objective function:

$$\Psi(W, \mu, b) = \beta \|W\|^2 + \frac{1}{T_o} \sum_{i=1}^{|V_I|} \sum_{j \in k(i)} \psi_{ij}, \quad (24)$$

where $\beta$ is the weight of the regularizer.
To find the $(W^*, \mu^*, b^*)$ that minimizes the above objective function, we first compute the first-order partial derivatives,

$$\frac{\partial \Psi}{\partial w_d} = 2\beta w_d + \frac{1}{T_o} \sum_{i=1}^{|V_I|} \sum_{j \in k(i)} 2 T_{ij} \left( 1 - \sum_{d'=1}^{D} w_{d'}\, \delta^{d'}_{ij} - (\mu s_{ij} + b) \right) \left( -\delta^d_{ij} \right), \quad (25)$$

$$\frac{\partial \Psi}{\partial \mu} = \frac{1}{T_o} \sum_{i=1}^{|V_I|} \sum_{j \in k(i)} 2 T_{ij} \left( 1 - \sum_{d'=1}^{D} w_{d'}\, \delta^{d'}_{ij} - (\mu s_{ij} + b) \right) \left( -s_{ij} \right), \quad (26)$$

$$\frac{\partial \Psi}{\partial b} = \frac{1}{T_o} \sum_{i=1}^{|V_I|} \sum_{j \in k(i)} 2 T_{ij} \left( 1 - \sum_{d'=1}^{D} w_{d'}\, \delta^{d'}_{ij} - (\mu s_{ij} + b) \right) (-1). \quad (27)$$
The variables are estimated iteratively by Gradient Descent (or Stochastic Gradient Descent, which can be faster),

$$w^{n+1}_d = w^n_d - \eta_{w_d} \frac{\partial \Psi}{\partial w_d} \bigg|_{w_d = w^n_d} \quad \text{for all } d \in 1, \ldots, D, \quad (28)$$

$$\mu^{n+1} = \mu^n - \eta_{\mu} \frac{\partial \Psi}{\partial \mu} \bigg|_{\mu = \mu^n}, \quad (29)$$

$$b^{n+1} = b^n - \eta_{b} \frac{\partial \Psi}{\partial b} \bigg|_{b = b^n}, \quad (30)$$

where $n$ denotes the $n$th iteration.
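One Gradient Descent step over (25)-(30) might be sketched as follows (hedged: a toy dense implementation; `pairs`, which bundles the per-pair quantities (the $\delta$-vector, $s_{ij}$, $T_{ij}$), is an illustrative data layout, not the paper's):

```python
def gfl_step(W, mu, b, pairs, beta=0.1, eta=0.01):
    # pairs: list of (delta, s, T) with delta the per-dimension chi-square
    # terms delta^d_ij, s the link similarity, T the confidence weight.
    To = sum(T for _, _, T in pairs) or 1.0
    gW = [2.0 * beta * wd for wd in W]            # regularizer part of Eq. (25)
    gmu, gb = 0.0, 0.0
    for delta, s, T in pairs:
        resid = 1.0 - sum(wd * dd for wd, dd in zip(W, delta)) - (mu * s + b)
        common = 2.0 * T * resid / To
        for d, dd in enumerate(delta):
            gW[d] += common * (-dd)               # data part of Eq. (25)
        gmu += common * (-s)                      # Eq. (26)
        gb += common * (-1.0)                     # Eq. (27)
    W = [wd - eta * g for wd, g in zip(W, gW)]    # Eq. (28)
    return W, mu - eta * gmu, b - eta * gb        # Eqs. (29)-(30)

# Toy step: one feature dimension, one image pair, no regularization.
W1, mu1, b1 = gfl_step([1.0], 1.0, 0.0, [([0.1], 1.0, 1.0)], beta=0.0, eta=0.1)
```

Repeating such steps until the objective stops decreasing yields the estimates $(W^*, \mu^*, b^*)$.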
After global feature learning, based on the new feature weighting, we update the image similarity as a combination of the content-based and link-based similarities,

$$s(i, j) = (1 - \rho)\, C^{W^*}_{ij} + \rho\, (\mu^* s_{ij} + b^*), \quad (31)$$

where the parameter $\rho \in [0, 1]$ controls the weight of the link-based similarity in the combination. We could set the value manually based on user preference, or automatically use the value of $T_{ij}$, which estimates the confidence of the link-based similarity as defined in (21).
5.1.2 Local Feature Learning
The problem with global feature learning is that a single global feature weighting for all images may be too general. Different images may belong to different semantic topics and thus need different weightings to capture their specific important features. Therefore, we can perform LFL to find a specific feature weight $W^*_i$ for each image $i$. More specifically, for each image $i$, we learn a local weight $W_i$ that minimizes the following objective function $L_i$,

$$L_i(W, \mu, b) = \beta \|W\|^2 + \frac{1}{T^i_o} \sum_{j \in k(i)} \psi_{ij} = \beta \|W\|^2 + \sum_{j \in k(i)} \left( C^W_{ij} - (\mu_i s_{ij} + b_i) \right)^2 \frac{T_{ij}}{T^i_o}, \quad (32)$$
where $T^i_o$ is the normalization factor that sums the importance over all pairs for image $i$, and is defined as

$$T^i_o = \sum_{j \in k(i)} T_{ij}. \quad (33)$$
Similar to GFL, in order to find the parameters $(W^*_i, \mu^*_i, b^*_i)$ for image $i$ that minimize its objective function, we first compute the first-order partial derivatives,

$$\frac{\partial L_i}{\partial w_d} = 2\beta w_d + \frac{1}{T^i_o} \sum_{j \in k(i)} 2 T_{ij} \left( C^W_{ij} - (\mu s_{ij} + b) \right) \left( -\delta^d_{ij} \right), \quad (34)$$

$$\frac{\partial L_i}{\partial \mu} = \frac{1}{T^i_o} \sum_{j \in k(i)} 2 T_{ij} \left( C^W_{ij} - (\mu s_{ij} + b) \right) \left( -s_{ij} \right), \quad (35)$$

$$\frac{\partial L_i}{\partial b} = \frac{1}{T^i_o} \sum_{j \in k(i)} 2 T_{ij} \left( C^W_{ij} - (\mu s_{ij} + b) \right) (-1). \quad (36)$$
We use Gradient Descent to compute the parameters iteratively with the following formulas:

$$w^{n+1}_d = (1 - 2\eta_{w_d}\beta)\, w^n_d + \eta_{w_d} \frac{1}{T^i_o} \sum_{j \in k(i)} 2 T_{ij} \left( C^{W_n}_{ij} - (\mu_n s_{ij} + b_n) \right) \delta^d_{ij}, \quad (37)$$

$$\mu_{n+1} = \mu_n + \eta_{\mu} \frac{1}{T^i_o} \sum_{j \in k(i)} 2 T_{ij} \left( C^{W_n}_{ij} - (\mu_n s_{ij} + b_n) \right) s_{ij}, \quad (38)$$

$$b_{n+1} = b_n + \eta_{b} \frac{1}{T^i_o} \sum_{j \in k(i)} 2 T_{ij} \left( C^{W_n}_{ij} - (\mu_n s_{ij} + b_n) \right). \quad (39)$$
Because the weights have similar properties, we set all $\eta_{w_d}$ to the same value $\eta_w$; the number of step-size parameters is then reduced from $D + 2$ to 3. We initialize the variables as follows: $w^0_d = 1$ (for all $d = 1, \ldots, D$), $\mu^0 = 1$, and $b^0 = 0$.
After local metric learning, based on the learned local weights, the similarity between two images $i$ and $j$ can be computed in two ways: symmetric and asymmetric.
The asymmetric similarity $s_{\rightarrow}(i, j)$ is defined as

$$s_{\rightarrow}(i, j) = (1 - \rho)\, C^{W^*_i}_{ij} + \rho\, \big( \mu^*_i s_{ij} + b^*_i \big), \quad (40)$$

where $(W^*_i, \mu^*_i, b^*_i)$ are the learned parameters for image $i$.
The symmetric similarity $s_{\leftrightarrow}(i, j)$ is defined as

$$s_{\leftrightarrow}(i, j) = \frac{s_{\rightarrow}(i, j) + s_{\rightarrow}(j, i)}{2}. \quad (41)$$
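The two combination rules (40)-(41) are easy to sketch (hedged: the dictionary keys and parameter values below are illustrative):

```python
def sim_asym(i, j, C_w, s_link, params, rho=0.5):
    # Eq. (40): uses image i's locally learned scale/shift (mu_i, b_i);
    # C_w[(i, j)] is the content similarity under image i's learned weights.
    mu_i, b_i = params[i]
    return (1 - rho) * C_w[(i, j)] + rho * (mu_i * s_link[(i, j)] + b_i)

def sim_sym(i, j, C_w, s_link, params, rho=0.5):
    # Eq. (41): average the two asymmetric scores.
    return 0.5 * (sim_asym(i, j, C_w, s_link, params, rho)
                  + sim_asym(j, i, C_w, s_link, params, rho))

# Toy values; C_w is keyed by ordered pair because each direction uses a
# different locally learned weighting.
params = {"a": (1.0, 0.0), "b": (2.0, 0.1)}
C_w = {("a", "b"): 0.8, ("b", "a"): 0.6}
s_link = {("a", "b"): 0.4, ("b", "a"): 0.4}
sym = sim_sym("a", "b", C_w, s_link, params)
```

The asymmetry comes entirely from the per-image parameters; symmetrizing by averaging restores a proper pairwise score.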
5.2 Integration Algorithm
We present a novel algorithm to integrate link-based and content-based similarities. A basic approach is two-stage: first, perform HMok-SimRank to compute the link-based similarities; second, perform feature learning (either GFL or LFL) considering the link-based similarity to update the feature weights, and then update the node similarities based on the new content similarity. Algorithm 1 describes the procedure of the Two-Stage approach.
Algorithm 1. Two-Stage Approach
Input: G, the image-rich information network.
1. Find the top k similar candidates of each object;
2. Initialization;
3. Iterate
4.   Compute link similarity for all image pairs;
5.   Compute link similarity for all group pairs;
6.   Compute link similarity for all tag pairs;
7. until convergence or a stop criterion is satisfied;
8. Perform feature learning to update W = W*;
9. Update image similarities by Equation (31) (global) or Equations (40), (41) (local);
Output: S, pairwise node similarity scores.
In the Two-Stage approach, image content similarity is not used to help update the tag and group similarities. To solve this problem, we need an algorithm that provides deeper integration between content and link information. The idea is that, after feature learning, we update the image similarity by combining the link similarity with the weighted content similarity. (Before going to the next step, one option is to use the new similarity to update the set of similar candidates by introducing candidates that may have been missed in the initialization step.) Based on the new image similarity, we can update the group and tag similarities; with the new tag and group similarities, we can update the link-based image similarity and learn a new weighting. The image/tag/group similarities are mutually updated in this fashion until the process converges or a stop criterion is satisfied. We call this approach Integrated Weighted Similarity Learning (IWSL). The term weighted has two meanings: 1) we learn a weighted content feature metric, and 2) the algorithm works on a general weighted heterogeneous network. Algorithm 2 describes the procedure of IWSL.
Algorithm 2. Integrated Weighted Similarity Learning (IWSL)
Input: G, the image-rich information network.
1. Construct a kd-tree [44] (or an LSH [45] or similar index [46]) over the image features;
2. Find the top k (or range-based) similar candidates of each object;
3. Initialize similarity scores;
4. Iterate
5.   Calculate the link similarity for image pairs via HMok-SimRank;
6.   Perform feature learning to update W = W*, using either global or local feature learning;
7.   (Optional) Search for new top k similar image candidates based on the new similarity weighting;
8.   Update the new image similarities s(i, i') by Equation (31) (global) or Equations (40), (41) (local);
9.   Compute link-based similarity for all group and tag pairs via HMok-SimRank;
10. until convergence or a stop criterion is satisfied.
Output: S, pairwise node similarity scores.
Table 4 summarizes the time and space complexity of Two-Stage and IWSL in a (weighted) heterogeneous network of $p$ types of nodes, where there are $n_i$ ($i \in 1, \ldots, p$) nodes of type $i$ and the total number of nodes is $n = \sum_{i=1}^{p} n_i$. Let $J$ be the number of iterations of the Gradient Descent update; then $Q = O(J |V_I| k)$ is the time complexity of feature learning (the same for both GFL and LFL). Denote $z$ as the number of variables to be estimated: for GFL, $z = D + 2$, and for LFL, $z = |V_I|(D + 2)$.
6 EXPERIMENTS
6.1 Data Sets
We conduct experiments on two data sets: Flickr and Amazon. The Flickr data set was created by downloading images and related metadata, such as groups and tags, using the Flickr API. The Amazon data set was created by downloading product images and related metadata, such as category, tags, and title, via the Amazon API. The Amazon API only returns the top five tags for each product, so we use the words in the title as additional tags. Product category is treated as group. Table 5 shows the statistics for the two data sets.
For image feature extraction, we extract CEDD [24], a compact descriptor that considers both color and edge features. In the literature, it has shown good performance compared with many traditional features. Note that our model generalizes to other features or combinations of them.
Tag preprocessing. We convert all tags to lower case. Tags consisting only of number characters are removed. Stop-words, such as "the," "you," and "me," are also removed. The remaining tags are stemmed using the Porter stemming algorithm [47].
TABLE 4
Complexity of Two-Stage and IWSL in (Weighted)
Heterogeneous Network
TABLE 5
Statistics for the Data Sets
We remove very infrequent tags and retain only those that appear in more than a small threshold number of images (e.g., more than two images).
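The preprocessing pipeline described above can be sketched as follows (hedged: the stop-word list is a tiny illustrative subset, and the Porter stemming step [47] is omitted here since it requires a stemmer library):

```python
import re
from collections import Counter

STOPWORDS = {"the", "you", "me"}  # tiny illustrative subset

def preprocess_tags(image_tags, min_images=2):
    """Lowercase tags, drop all-number tags and stop-words, then keep only
    tags appearing in more than min_images images."""
    cleaned = []
    for tags in image_tags:
        kept = {t.lower() for t in tags}
        kept = {t for t in kept
                if not re.fullmatch(r"\d+", t) and t not in STOPWORDS}
        cleaned.append(kept)
    freq = Counter(t for tags in cleaned for t in tags)
    vocab = {t for t, c in freq.items() if c > min_images}
    return [tags & vocab for tags in cleaned]

out = preprocess_tags([{"Flower", "2008", "the"}, {"flower", "sky"},
                       {"FLOWER", "sky"}, {"flower"}])
```

Counting frequencies over the cleaned per-image sets (rather than raw tag occurrences) matches the "appears in more than a threshold number of images" criterion.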
Fig. 6 shows the tag frequency (the number of images annotated with the tag) for the Flickr data set. We can observe that many tags have low frequency and very few tags have high frequency. The Amazon data set has a similar curve shape. Table 6 shows the top 10 most frequent tags for the two data sets.
6.2 Experimental Setting
Experiments were conducted on a PC with Intel Pentium(R)
D 3.4 GHz CPU and 4 GB RAM, running Windows XP.
6.3 Speed Performance
Fig. 7 shows the speed-up ($T_{baseline}/T_{method}$) of HK-SimRank and HMok-SimRank over the baseline HSimRank. As the number of images increases, they become increasingly faster than HSimRank. Note that we only show results for at most 3,000 images due to the expensive time complexity of the baseline HSimRank, which takes too long for larger data. Fig. 8 shows the speed-up of HMok-SimRank over HK-SimRank; HMok-SimRank is much faster. Because IWSL is based on HMok-SimRank, it has similar time efficiency except for the time spent on feature weight learning.

For exact execution times, take the Amazon data set as an example: with 3,000 images, HSimRank takes 598 seconds, HK-SimRank takes 24 seconds, and HMok-SimRank takes 9 seconds; when the number of images increases to 20,000, HK-SimRank takes 928 seconds, while HMok-SimRank takes only 132 seconds.
6.4 Relevance Performance
6.4.1 General Result
Because of the large number of images, it is difficult to
check them one by one to obtain a complete set of relevant images
for each query image. To generate an approximate ground truth
for performance evaluation, we assume that if two images are
relevant, their visual similarity should be above a threshold
θ_v and the number of shared tags should also be above a
threshold θ_t.
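A minimal sketch of this approximate ground-truth rule follows; the threshold values shown are placeholders, as the actual values used in the experiments are not reproduced here:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def relevant(feat_a, feat_b, tags_a, tags_b, theta_v=0.8, theta_t=2):
    """Approximate ground-truth rule: two images are deemed relevant
    when their visual similarity exceeds theta_v AND they share more
    than theta_t tags. theta_v and theta_t are placeholder values."""
    return (cosine(feat_a, feat_b) > theta_v
            and len(set(tags_a) & set(tags_b)) > theta_t)
```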
We ignore images that contain fewer than five tags. Such
images make up 6.5 and 15.3 percent of the Flickr and
Amazon data sets, respectively. Since all considered images
have at least five tags, which are human annotations, if
two images don't share any tag, it is likely that the
users who made those annotations do not consider them relevant.
On the Internet, many images have no tags; to
simulate this real-world case, we randomly select 50 percent
of the images and remove all their tags. Our algorithm can still
find some of them as relevant because we are able to learn
feature weights based on those images that have tags or
other link-based information.
We compare VLWC (weighted combination of visual and
link similarity without feature weight learning), IWSL_L
(IWSL with local feature weight learning), and IWSL_G
(IWSL with global feature weight learning) against several
baselines: Visual (only the visual similarity), Text (only
the textual similarity, following a popular text retrieval
approach: the cosine measure on tf·idf-weighted
tag vectors), VTWC [11] (weighted combination of visual and
textual similarity, with equal weights), Link (HMok-
SimRank, which only uses the link similarity), MinFusion, and
MaxFusion [11].
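The Text baseline can be sketched as follows. This uses one common tf·idf variant (raw term frequency times log inverse document frequency); the exact weighting scheme is not spelled out in the text, so this is an assumption:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build tf-idf weighted tag vectors for a list of tag lists.
    A simple tf * log(N/df) scheme, one common choice among the
    many tf-idf variants."""
    n = len(docs)
    df = Counter()
    for tags in docs:
        df.update(set(tags))  # document frequency of each tag
    vecs = []
    for tags in docs:
        tf = Counter(tags)
        vecs.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vecs

def cosine_sparse(u, v):
    """Cosine similarity between two sparse tf-idf vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```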
JIN ET AL.: REINFORCED SIMILARITY INTEGRATION IN IMAGE-RICH INFORMATION NETWORKS 455
Fig. 6. Tag frequency for the Flickr data set. The y-axis denotes the
frequency; the x-axis denotes the tag id, ordered by frequency.
TABLE 6
Top 10 Most Frequent Tags
Fig. 7. Speed-up of HK-SimRank and HMok-SimRank over HSimRank.
The x-axis denotes the number of images; the y-axis denotes the speed-up
(in times ratio).
Fig. 8. Speed-up of HMok-SimRank over HK-SimRank. The x-axis denotes
the number of images; the y-axis denotes the speed-up (in times ratio).
Evaluation method. We use mean average precision (MAP)
to measure the retrieval performance of the algorithms. For
every image in the data set, we obtain a ranking list of
relevant images computed by each algorithm and compute
the average precision based on the approximate ground
truth built before removing tags. The final MAP score for each
algorithm is the mean of the average precision scores over all
images. Note that there is no training data, so all the
algorithms are unsupervised.
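The MAP evaluation described above can be sketched as:

```python
def average_precision(ranked, relevant):
    """Average precision of a ranked result list against a set of
    relevant image ids."""
    hits, score = 0, 0.0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            score += hits / i  # precision at each relevant hit
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, ground_truth):
    """MAP: mean of the per-query average precision. `rankings` maps
    each query image to its ranked result list; `ground_truth` maps
    it to the set of images deemed relevant by the approximate rule."""
    aps = [average_precision(rankings[q], ground_truth[q]) for q in rankings]
    return sum(aps) / len(aps) if aps else 0.0
```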
Figs. 9 and 10 show the results on the Flickr and Amazon data,
respectively. We can see that link-based similarity performs
better than text-based similarity; VLWC achieves better
performance than traditional algorithms by linearly combining
visual and link information. Algorithm IWSL further improves
performance by introducing a new way of integrating content
and link information via mutual reinforcement with feature
learning. IWSL_L achieves better results than IWSL_G because
IWSL_L performs local feature learning, which finds a specific,
better feature weighting for each image, whereas global feature
learning finds a general feature weighting for all images.
6.4.2 Case Study
As an example from the Flickr data set, Fig. 11 shows the
top 10 most similar images for a query image about
"moon," using link-based similarity (SimRank) (first row),
content-based similarity (second row), and IWSL (third row),
respectively. The top left image is the query image. Clearly,
IWSL obtains the most relevant matches in terms of both
semantics and visual appearance.
In another example, from the Amazon data set, Fig. 12
shows the top 10 most similar images for a query image
about "iPhone," using link-based similarity (SimRank) (first
row), content-based similarity (second row), and IWSL
(third row). Again, IWSL obtains the best results in terms
of relevance for both semantics and visual appearance.
Our experiments also show the good performance of our
algorithm in finding similar groups and relevant tags. For
example, groups similar to "night" are "Night Images,"
"No-Flash Night Shots," "After Dark - Night Photography,"
and "Night Lights." Tags relevant to "flower" are "floral,"
"flora," and "botani." We can use such tag similarity to help
find more relevant images for a keyword query.
6.5 Parameter Setting
The experiments are based on the following parameter
setting: t = 0.5 (21), j = T
i,
((31) and (40)), u = 0.5 ((24) and
(33)) and all parameters as 0.5 for Gradient Decent. The
damping factor parameters defined in Section 3.2 are set as
0.8 by following the standard of SimRank.
As an example of how we obtain the optimal parameter
setting, Fig. 13 shows the MAP of algorithm IWSL_L w.r.t.
parameter u for both data sets. When u is too small, which
means the regularizer is largely ignored, performance is poor,
which demonstrates the benefit of introducing the regularizer;
when u is too large, performance is also poor, because the
regularizer dominates the optimization.
6.6 Convergence
Figs. 14 and 15 show Δ = |S_{i+1} − S_i|, the absolute change
in the average sum of similarity scores (including images,
groups, and tags) from iteration i to i + 1, for algorithm
IWSL_L on the Flickr and Amazon data sets, respectively.
Fig. 9. MAP of the algorithms on the Flickr data. The x-axis denotes the
algorithms; the y-axis denotes the MAP (percent).
Fig. 10. MAP of the algorithms on the Amazon data. The x-axis denotes the
algorithms; the y-axis denotes the MAP (percent).
Fig. 11. Top 10 retrieval results by (1) Link (SimRank), (2) Content similarity, and (3) IWSL. The top left image is the query image from Flickr. It is
tagged with "moon," "lune," "sky" and belongs to the group "After Dark - Night Photography."
IWSL_G has similar results. We can see that IWSL converges
very fast, and only five to six iterations are enough for most
scenarios.
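The convergence criterion amounts to a generic fixed-point loop that stops once the change Δ drops below a tolerance. A sketch follows; this is an illustrative loop, not the paper's actual implementation:

```python
def run_until_converged(step, s0, eps=1e-3, max_iter=50):
    """Iterate a similarity-update `step` until the absolute change
    Delta = |S_{i+1} - S_i| in the average sum of similarity scores
    drops below eps. `step` takes the current average score and
    returns the next one; both are placeholders for the real update."""
    s = s0
    for i in range(max_iter):
        s_next = step(s)
        if abs(s_next - s) < eps:  # Delta below tolerance: converged
            return s_next, i + 1
        s = s_next
    return s, max_iter
```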
6.7 Product Search and Recommendation System
for E-Commerce
Fig. 16 shows our product search and recommendation
system for e-commerce using Amazon products as an
example. When users search and click on a product in a
web browser, we can recommend both visually and
semantically relevant products, with the relevance score
computed by our IWSL algorithm.
Fig. 17 shows a comparison of our recommendation with
the Amazon recommendation based on "Customers Who
Bought This Item Also Bought" (which we call cobought).
When a consumer wants to buy a bag, our approach
provides more relevant recommendations for the product.
Fig. 18 shows another comparison with the Amazon
recommendation based on "Customers Who Bought This
Item Also Viewed" (which we call coviewed). The results are
similar, because when an Amazon user wants to buy
jewelry, he or she is likely to browse through similar
products before making a final choice.
One problem with using coviewed or cobought information
for recommendation is that only the top-k ranking list is
available; the details (e.g., the frequency of such co-occurrence)
are commercial secrets and are not publicly accessible. Without
such proprietary information, it is difficult to combine sources
from multiple e-commerce websites, such as Amazon, BestBuy,
WalMart, and Target, to generate an overall recommendation.
We believe this is one of the reasons why general product search
engines, such as Google Product Search and Bing Shopping,
currently do not offer such recommendation functions.
Fig. 13. MAP w.r.t. parameter u. The x-axis denotes u; the y-axis denotes MAP.
Fig. 14. Convergence of IWSL on the Flickr data. The x-axis denotes the
number of iterations; the y-axis denotes Δ.
Fig. 12. Top 10 retrieval results by (1) Link (SimRank), (2) Content similarity, and (3) IWSL. The top left image is the query image from Amazon. It is
tagged with "iPhone," "invisible shield," "accessories" and belongs to the category WirelessAccessories.
Fig. 15. Convergence of IWSL on the Amazon data. The x-axis denotes the
number of iterations; the y-axis denotes Δ.
Fig. 16. Snapshot of the product search and recommendation system for
e-commerce.
However, for products (e.g., jewelry, watch, bag, glasses,
and clothes) that depend heavily on visual appearance to
attract consumers, we are able to generate both visually and
semantically relevant recommendations without the cov-
iewed or cobought information. This leads to a better
general product search service that enables users to
compare and make the best choice.
A similar strategy can be applied to Flickr for photo and
interest recommendation. When users browse Flickr photos,
we can recommend relevant Flickr photos or interest
groups for users to join [48]. In addition, by integrating the
Flickr and Amazon networks, we can recommend relevant
Amazon product photos for a Flickr photo as advertisement.
7 CONCLUSIONS AND FUTURE WORK
This paper presents a novel and efficient way of finding
similar objects (such as photos and products) by modeling
major social sharing and e-commerce websites as image-
rich information networks. Our major contributions are as
follows:
1. We propose HMok-SimRank to efficiently compute
weighted link-based similarity in weighted hetero-
geneous image-rich information networks. The
method is much faster than heterogeneous SimRank
and K-SimRank.
2. We propose both global and local feature learning
approaches for learning a weighting vector that captures
the more important feature subspace to narrow the
semantic gap.
3. We propose the algorithm IWSL, which integrates link
and content information in a mutually reinforcing style
with feature weight learning for similarity/relevance
computation in weighted heterogeneous image-rich
information networks.
4. We conduct experiments on Flickr and Amazon
networks. The results have shown that our algo-
rithm achieves better performance than traditional
approaches.
5. We have implemented a new product search and
recommendation system to find both visually similar
and semantically relevant products based on our
algorithm.
Future work. Under the concept of the heterogeneous image-rich
information network, many directions remain open. It will be
interesting to see how such network structures may benefit
various image mining and computer vision tasks, such as image
categorization, image segmentation, tag annotation, and
collaborative filtering. As for the proposed algorithm IWSL, we
plan to study how to obtain an optimal combination of local and
global learning to balance time and quality performance. In
order to use IWSL in a web-scale search engine, a distributed
computing extension will be investigated.
Considering dynamic environments is also important.
Fig. 17. Recommendation comparison: ours versus Amazon's "Customers Who Bought This Item Also Bought."
Fig. 18. Recommendation comparison: ours versus Amazon's "Customers Who Bought This Item Also Viewed."
One potential solution could be: first, perform network clustering
to partition the whole large network into small connected
components as subnetworks; second, run the proposed algorithm
on each subnetwork; and third, when a new image comes, update
the subnetwork to which the new image belongs.
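A sketch of the first and third steps of this potential solution, using a plain connected-components partition; the clustering method and update policy are left unspecified in the text, so these are illustrative stand-ins:

```python
from collections import defaultdict

def connected_components(edges):
    """Partition a network (undirected edge list) into connected
    components, each treated as a subnetwork."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for node in adj:
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:  # depth-first traversal of one component
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def subnetwork_for(new_links, comps):
    """Find which existing subnetwork a new image attaches to, so
    only that subnetwork needs to be re-run."""
    for comp in comps:
        if any(v in comp for _, v in new_links):
            return comp
    return None
```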
ACKNOWLEDGMENTS
Research was sponsored by Kodak and in part by NSF grant
IIS-09-05215, MURI award FA9550-08-1-0265 and NS-CTA
W911NF-09-2-0053. The views and conclusions contained in
this document are those of the authors and should not be
interpreted as representing official policies, either expressed
or implied, of the sponsors. The authors would like to thank
the owners of the Flickr and Amazon photos used in this
paper. This work is extended from our WWW 10 paper [1].
REFERENCES
[1] X. Jin, J. Luo, J. Yu, G. Wang, D. Joshi, and J. Han, iRIN: Image
Retrieval in Image-Rich Information Networks, Proc. 19th Intl
Conf. World Wide Web (WWW 10), pp. 1261-1264, 2010.
[2] R.L. Cilibrasi and P.M.B. Vitanyi, The Google Similarity
Distance, IEEE Trans. Knowledge and Data Eng., vol. 19, no. 3,
pp. 370-383, Mar. 2007.
[3] L. Wu, X.-S. Hua, N. Yu, W.-Y. Ma, and S. Li, Flickr Distance,
Proc. 16th ACM Intl Conf. Multimedia, pp. 31-40, 2008.
[4] Y. Jing and S. Baluja, VisualRank: Applying Pagerank to Large-
Scale Image Search, IEEE Trans. Pattern Analysis and Machine
Intelligence, vol. 30, no. 11, pp. 1877-1890, Nov. 2008.
[5] R.C. Veltkamp and M. Tanase, Content-Based Image Retrieval
Systems: A Survey, technical report, Dept. of Computing Science,
Utrecht Univ., 2002.
[6] H. Tamura and N. Yokoya, Image Database Systems: A Survey,
Pattern Recognition, vol. 17, no. 1, pp. 29-43, 1984.
[7] W.I. Grosky, Multimedia Information Systems, IEEE Multi-
Media, vol. 1, no. 1, pp. 12-24, Spring, 1994.
[8] V.N. Gudivada and V.V. Raghavan, Content-Based Image
Retrieval Systems, Computer, vol. 28, no. 9, pp. 18-22, Sept. 1995.
[9] Y. Rui, T.S. Huang, and S.-F. Chang, Image Retrieval: Current
Techniques, Promising Directions, and Open Issues, J. Visual
Comm. and Image Representation, vol. 10, no. 1, pp. 39-62, 1999.
[10] R. Datta, D. Joshi, J. Li, and J.Z. Wang, Image Retrieval: Ideas,
Influences, and Trends of the New Age, ACM Computing Surveys,
vol. 40, no. 2, pp. 1-60, Apr. 2008.
[11] T. Deselaers and H. Müller, Combining Textual- and Content-
Based Image Retrieval, Tutorial, Proc. 19th Intl Conf. Pattern
Recognition (ICPR 08), http://thomas.deselaers.de/teaching/
files/tutorial_icpr08/04-combination.pdf, 2008.
[12] S. Sclaroff, M. La Cascia, and S. Sethi, Unifying Textual and
Visual Cues for Content-Based Image Retrieval on the World
Wide Web, Computing Vision Image Understanding, vol. 75, pp. 86-
98, July 1999.
[13] Z. Ye, X. Huang, Q. Hu, and H. Lin, An Integrated Approach for
Medical Image Retrieval through Combining Textual and Visual
Features, Proc. 10th Intl Conf. Cross-Language Evaluation Forum:
Multimedia Experiments (CLEF 09), pp. 195-202, 2010.
[14] G. Jeh and J. Widom, SimRank: A Measure of Structural-Context
Similarity, Proc. Eighth Intl Conf. Knowledge Discovery and Data
Mining (KDD 02), 2002.
[15] L. Page, S. Brin, R. Motwani, and T. Winograd, The Pagerank
Citation Ranking: Bringing Order to the Web, technical report,
Stanford InfoLab, 1999.
[16] D. Lizorkin, P. Velikhov, M. Grinev, and D. Turdakov, Accuracy
Estimate and Optimization Techniques for Simrank Computa-
tion, VLDB Endowment, vol. 1, no. 1, pp. 422-433, 2008.
[17] D. Fogaras and B. Racz, Scaling Link-Based Similarity Search,
Proc. 14th Intl Conf. World Wide Web, pp. 641-650, 2005.
[18] J. Wang, H.-J. Zeng, Z. Chen, H. Lu, L. Tao, and W.-Y. Ma,
ReCoM: Reinforcement Clustering of Multi-Type Interrelated
Data Objects, Proc. 26th ACM Ann. SIGIR Conf. Research and
Development in Information Retrieval, pp. 274-281, 2003.
[19] X. Yin, J. Han, and P.S. Yu, LinkClus: Efficient Clustering via
Heterogeneous Semantic Links, Proc. 32nd Intl Conf. Very Large
Data Bases, pp. 427-438, 2006.
[20] T. Deselaers, D. Keysers, and H. Ney, Features for Image
Retrieval: An Experimental Comparison, Information Retrieval,
vol. 11, no. 2, pp. 77-107, 2008.
[21] Z. Yang and C.-C.J. Kuo, Survey on Image Content Analysis,
Indexing, and Retrieval Techniques and Status Report of Mpeg-7,
Tamkang J. Science and Eng., vol. 3, no. 2, pp. 101-118, 1999.
[22] R.C. Ltd., MINDSs Descriptors for Still Images - Spatial Edge
Distribution Descriptor,ISO/IEC/JTC1/SC29/WG11, Lancaster,
United Kingdom, p. 109, 1999.
[23] J. Huang, S.R. Kumar, M. Mitra, W.-J. Zhu, and R. Zabih, Image
Indexing Using Color Correlograms, Proc. IEEE CS Conf.
Computer Vision and Pattern Recognition (CVPR 97), pp. 762-768,
1997.
[24] S. Chatzichristofis and Y. Boutalis, CEDD: Color and Edge
Directivity Descriptor: A Compact Descriptor for Image Indexing
and Retrieval, Proc. Sixth Intl Conf. Computer Vision Systems,
pp. 312-322, 2008.
[25] S. Aksoy and R.M. Haralick, Textural Features for Image
Database Retrieval, Proc. IEEE Workshop Content-Based Access of Image
and Video Libraries (CBAIVL 98), p. 45, 1998.
[26] K. Müller and J.R. Ohm, Wavelet-Based Contour Descriptor,
ISO/IEC/JTC1/SC29/WG11, Lancaster, United Kingdom, p. 567,
1999.
[27] M. Park, J.S. Jin, and L.S. Wilson, Fast Content-Based Image
Retrieval Using Quasi-Gabor Filter and Reduction of Image
Feature Dimension, Proc. IEEE Fifth Southwest Symp. Image
Analysis and Interpretation (SSIAI 02), p. 178, 2002.
[28] D.M. Squire, W. Müller, H. Müller, and T. Pun, Content-Based
Query of Image Databases: Inspirations from Text Retrieval,
Pattern Recognition Letters - Selected Papers from 11th Scandinavian
Conf. Image, vol. 21, nos. 13/14, pp. 1193-1198, 2000.
[29] T., Inc. Normalized Contour as a Shape Descriptor for Visual
Objects,ISO/IEC/JTC1/SC29/WG11, Lancaster, United King-
dom, p. 579, 1999.
[30] W.Y. Kim and Y.S. Kim, A Rotation Invariant Geometric Shape
Descriptor Using Zernike Moment, ISO/IEC/JTC1/SC29/WG11,
Lancaster, United Kingdom, p. 687, 1999.
[31] D.G. Lowe, Object Recognition from Local Scale-Invariant
Features, Proc. IEEE Seventh Intl Conf. Computer Vision, vol. 2,
pp. 1150-1157, 1999.
[32] J. Tang, H. Li, G.-J. Qi, and T.-S. Chua, Image Annotation by
Graph-Based Inference with Integrated Multiple/Single Instance
Representations, IEEE Trans. Multimedia, vol. 12, no. 2, pp. 131-
141, Feb. 2010.
[33] L. Yang, R. Jin, L. Mummert, R. Sukthankar, A. Goode, B. Zheng,
S.C. Hoi, and M. Satyanarayanan, A Boosting Framework for
Visuality-Preserving Distance Metric Learning and its Application
to Medical Image Retrieval, IEEE Trans. Pattern Analysis and
Machine Intelligence, vol. 32, no. 1, pp. 30-44, Jan. 2010.
[34] E. Xing, A. Ng, M. Jordan, and S. Russell, Distance Metric
Learning with Applications to Clustering with Side Information,
Proc. 16th Conf. Advances in Neural Information Processing Systems,
2002.
[35] L. Wu, S.C. Hoi, R. Jin, J. Zhu, and N. Yu, Distance Metric
Learning from Uncertain Side Information with Application to
Automated Photo Tagging, Proc. 17th ACM Intl Conf. Multimedia,
pp. 135-144, 2009.
[36] B. Babenko, S. Branson, and S. Belongie, Similarity Functions for
Categorization: From Monolithic to Category Specific, Proc. IEEE
12th Intl Conf. Computer Vision (ICCV 09), 2009.
[37] L. Yang and R. Jin, Distance Metric Learning: A
Comprehensive Survey, technical report, Dept. of Computer
Science and Eng., Michigan State Univ., 2006.
[38] S. Belongie, J. Malik, and J. Puzicha, Shape Matching and
Object Recognition Using Shape Contexts, IEEE Trans. Pattern
Analysis and Machine Intelligence, vol. 24, no. 4, pp. 509-522, Apr.
2002.
[39] G. Wang and D. Forsyth, Object Image Retrieval by Exploiting
Online Knowledge Resources, Proc. IEEE CS Conf. Computer
Vision and Pattern Recognition (CVPR), pp. 1-8, 2008.
[40] J.R.R. Uijlings, A.W.M. Smeulders, and R.J.H. Scha, Real-Time
Bag of Words, Approximately, Proc. ACM Intl Conf. Image and
Video Retrieval (CIVR 09), pp. 6:1-6:8, 2009.
[41] Y.-G. Jiang, C.-W. Ngo, and J. Yang, Towards Optimal Bag-of-
Features for Object Categorization and Semantic Video Retrieval,
Proc. Sixth ACM Intl Conf. Image and Video Retrieval (CIVR 07),
pp. 494-501, 2007.
[42] J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid, Local
Features and Kernels for Classification of Texture and Object
Categories: A Comprehensive Study, Intl J. Computer Vision,
vol. 73, pp. 213-238, June 2007.
[43] S. Chakrabarti, B. Dom, P. Raghavan, and S. Rajagopalan,
Automatic Resource Compilation by Analyzing Hyperlink
Structure and Associated Text, Proc. Seventh World Wide Web
Conf. (WWW 97), pp. 65-74, 1997.
[44] J.L. Bentley, Multidimensional Divide-and-Conquer, Comm.
ACM, vol. 23, no. 4, pp. 214-229, Apr. 1980.
[45] M. Datar, N. Immorlica, P. Indyk, and V. Mirrokni, Locality
Sensitive Hashing Scheme Based on P-Stable Distribution, Proc.
20th Ann. Symp. Computational Geometry, 2004.
[46] A. Beygelzimer, S. Kakade, and J. Langford, Cover Trees for
Nearest Neighbor, Proc. 23rd Intl Conf. Machine Learning, pp. 97-
104, 2006.
[47] M.F. Porter, An Algorithm for Suffix Stripping, Program, vol. 14,
no. 3, pp. 130-137, 1980.
[48] J. Yu, X. Jin, J. Han, and J. Luo, Collection-Based Sparse Label
Propagation and Its Application on Social Group Suggestion from
Photos, ACM Trans. Intelligent Systems Technology, vol. 2, pp. 12:1-
12:21, Feb. 2011.
Xin Jin received the BS degree in physics and the MS degree in
computer science from Beijing Normal University, China, in 2005
and 2007, respectively. Since 2007, he has been working toward
the PhD degree in the Department of Computer Science at the
University of Illinois at Urbana-Champaign as a research
assistant. His current research interests include data mining,
machine learning, and information retrieval.
Jiebo Luo received the BS and MS degrees in
electrical engineering from the University of
Science and Technology of China in 1989 and
1992, respectively, and the PhD degree in
electrical engineering from the University of
Rochester in 1995. He is a senior principal
scientist with the Kodak Research Laboratories
in Rochester, New York. He has been actively
involved in numerous technical conferences,
including serving as program co-chair of the
2010 ACM Multimedia Conference, general chair of the 2008 ACM
International Conference on Content-based Image and Video Retrieval
(CIVR), program cochair of the 2012 IEEE International Conference on
Computer Vision and Pattern Recognition (CVPR), and program cochair
of the 2007 SPIE International Symposium on Visual Communication
and Image Processing (VCIP). He is the editor-in-chief for the Journal of
Multimedia, and has served on the editorial boards of the IEEE
Transactions on Pattern Analysis and Machine Intelligence (TPAMI),
the IEEE Transactions on Multimedia (TMM), Pattern Recognition (PR),
Machine Vision Applications (MVA), and the Journal of Electronic
Imaging (JEI). He is a fellow of the SPIE, IAPR, and IEEE.
Jie Yu received the BE degree in electrical
engineering from Dong Hua University, Shang-
hai, China, in 2000 and the PhD degree in
computer science from the University of Texas
at San Antonio (UTSA), San Antonio, in 2007.
He is a computer scientist at GE Global
Research. Before joining, he was a research
scientist in the Intelligence Systems Group at
Kodak Research Labs. His research interests
include multimedia information retrieval, ma-
chine learning, computer vision, and pattern recognition. He has
published more than 20 journal articles, conference papers, and book
chapters in these fields. He is the recipient of the Student Paper Contest
Winner Award of IEEE ICASSP 2006 and the Presidential Dissertation
Award of UTSA in 2006. He is a member of the IEEE and the ACM.
Gang Wang received the BS degree in electrical
engineering from Harbin Institute of Technology
in 2005 and the PhD degree from the Depart-
ment of Electrical and Computer Engineering at
the University of Illinois at Urbana Champaign in
2010. He is an assistant professor in the School
of EEE at Nanyang Technological University and
a research scientist at the Advanced Digital
Science Center. His research interests include
computer vision and machine learning. He
received the Harriett & Robert Perry Fellowship (2009-2010) and
the CS/AI award (2009) at Beckman Institute, UIUC. He is a member
of the IEEE.
Dhiraj Joshi is a research scientist in the
Intelligent Systems Group at the Eastman
Kodak Research Labs, Rochester, New York.
At Kodak, his primary focus is associating image
content, tags, and location metadata with image
semantics. He is also interested in building
intelligent systems which use semantics across
multiple modalities of media for enriching user
experience. Before joining Kodak, he was a
research intern at the I.B.M. T.J. Watson
Research Labs, and the Idiap Research Institute (Switzerland). In
2006, he was selected as an emerging leader in multimedia research to
present at the Watson Emerging Leaders in Multimedia Workshop. He
co-organized a special session on image aesthetics, moods, and
emotions at the IEEE International Conference on Image Processing,
2008. He has also participated in the Kodak Visiting Scientist
programme to promote science and mathematics education in
Rochester area schools. He is a member of the IEEE and currently
serves as a Rochester chapter vice-chair of the IEEE Signal
Processing Society.
Jiawei Han is a professor in the Department of
Computer Science at the University of Illinois. He
has been working on research into data mining,
information network analysis, data warehousing,
stream mining, spatiotemporal, and multimedia
data mining, text and web mining, and software
bug mining, with more than 400 conference and
journal publications. He has chaired or served in
more than 100 program committees of interna-
tional conferences and workshops and also
served or is serving on the editorial boards for Data Mining and
Knowledge Discovery, the IEEE Transactions on Knowledge and Data
Engineering, the Journal of Computer Science and Technology, and the
Journal of Intelligent Information Systems. He is currently the founding
editor-in-chief of the ACM Transactions on Knowledge Discovery from
Data (TKDD). He has received IBM Faculty Awards, an HP Innovation
Award, a W. Wallace McDowell Award (2009), the Outstanding
Contribution Award at the International Conference on Data Mining
(2002), the ACM Service Award (1999), the ACM SIGKDD Innovation
Award (2004), and the IEEE Computer Society Technical Achievement
Award (2005). He is an ACM and IEEE fellow. He is currently the director
of Information Network Academic Research Center (INARC) supported
by the Network Science-Collaborative Technology Alliance (NS-CTA)
program of US Army Research Lab. His book Data Mining: Concepts
and Techniques (Morgan Kaufmann) has been used worldwide as a
textbook. He is a fellow of the IEEE.