Yuliang Shi2, Jie Yu1, Tianhong Wang1, Jingzhe Wu1
1 School of Computer Engineering and Science, Shanghai University, China
2 School of Computer Science and Technology, Shandong University, China
ffliu@shu.edu.cn, liangyus@sdu.edu.cn, jieyu@shu.edu.cn, tianhonglouis@yahoo.com, jzwu@shu.edu.cn
I. INTRODUCTION
Web services have become an important paradigm for developing web applications. The growing number of web services raises the issue of efficiently locating desired services.
Many approaches have been proposed, differing in the way services are described. Semantic communities provide ontology languages for web services such as OWL-S [9] and WSMO [10], and services are organized in ontologies; locating desired services then depends on the semantic matching of services. This formal foundation helps the semantic matching approaches [1-3] obtain high accuracy. On the other hand, considerable effort is required from both users and developers for the semantic description of services, and the applicable range is also limited by the time-consuming building of and reasoning over ontologies.
The methods in [5,15] apply traditional IR techniques to the text descriptions of services. In practice, however, the text fields provided by developers often contain much information unrelated to the functionality of services, which lowers the accuracy of such metrics.
As most services have WSDL documents to describe their operations, many methods focus on information extracted from XML.

http://www.programmableweb.com

978-0-7695-4128-0/10 $26.00 © 2010 IEEE
DOI 10.1109/ICWS.2010.67
[Fig. 2. Weighted bipartite graphs between the term sets of services S1, S2, S3 and S4.]

Each service is represented by the set of terms extracted from its corresponding WSDL. Each edge is labeled $(\langle k_i, k_j \rangle, w_{i,j})$, where $w_{i,j}$ is the semantic similarity of the terms $k_i$ and $k_j$. The maximum weight matching value between the term sets $S_1$ and $S_2$ is:

$$max\_value_{1,2} = \max_{M} \sum_{(k_i, k_j) \in M,\ k_i \in S_1,\ k_j \in S_2} w_{i,j}, \quad \text{where } 0 \le w_{i,j} \le 1 \qquad (1)$$

where $M$ ranges over the matchings between $S_1$ and $S_2$.
[Fig. 3. Weighted bipartite graphs between the term sets of services S3 (k31, k32), S4 (k41, k42, k43) and S5 (k51, k52, k53).]
TABLE I
DECOMPOSING RULES FOR WSDL

Rule                        Name          Words
Case change                 filmAction    film, action
Case change                 ComedyFilm    comedy, film
Suffix number elimination   film1         film
Underscore separator        comedy_film   comedy, film
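The decomposing rules of Table I can be sketched as a small routine. This is our own illustration, not the authors' implementation; the function name and the regular expression are assumptions.

```python
import re

def decompose(name):
    """Split a WSDL name into lowercase terms using the rules of Table I."""
    terms = []
    # Underscore separator: comedy_film -> comedy, film
    for part in name.split("_"):
        # Case change: filmAction / ComedyFilm -> film, action / comedy, film
        for word in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part):
            # Suffix number elimination: film1 -> film
            word = re.sub(r"\d+$", "", word)
            if word:
                terms.append(word.lower())
    return terms

print(decompose("filmAction"))   # ['film', 'action']
print(decompose("ComedyFilm"))   # ['comedy', 'film']
print(decompose("film1"))        # ['film']
print(decompose("comedy_film"))  # ['comedy', 'film']
```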
Let $N(k_1)$ and $N(k_2)$ denote the numbers of web pages containing $k_1$ and $k_2$ returned from Google respectively, and let $N(k_1 k_2)$ denote the co-occurrence of $k_1$, $k_2$ in the web pages from Google. Then, according to [8], the semantic distance of $k_1$, $k_2$ is:

$$w_{i,j} = term\_distance_{1,2} = \frac{\log\left( \dfrac{N(k_1 k_2) \times N}{N(k_1) \times N(k_2)} \right)}{\log N} \qquad (3)$$

where $N$ is the total number of web pages, usually set to $10^{11}$. Google offers the values of $N(k_1)$ and $N(k_2)$, so the similarity degree of almost any two terms can be measured. The weights over the edges in Fig. 2 and Fig. 3 are calculated according to (3). For the semantic distance of two identical terms, to ensure precision we simply set the value to 1 rather than calculating it.
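Equation (3) can be sketched as a small function. The page counts used below are hypothetical stand-ins for illustration; the real values would come from Google hit counts for $k_1$, $k_2$ and "$k_1$ $k_2$".

```python
import math

N_TOTAL = 1e11  # total number of web pages, set to 10^11 as in the text

def term_distance(n1, n2, n12, n=N_TOTAL):
    # Equation (3): log((N(k1 k2) * N) / (N(k1) * N(k2))) / log N
    # For two identical terms the paper sets the value to 1 directly.
    return math.log((n12 * n) / (n1 * n2)) / math.log(n)

# Hypothetical counts: k1 on 3.0e8 pages, k2 on 1.2e8, co-occurring on 8.0e6.
score = term_distance(3.0e8, 1.2e8, 8.0e6)
print(round(score, 3))
```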
Measure 1 (M_sum).

$$Similarity_{1,2} = \frac{\sum_{k_i \in S_1,\ k_j \in S_2} w_{i,j}}{|S_1| \times |S_2|} \qquad (4)$$
Measure 2 (M_max).

$$Similarity_{1,2} = \frac{2 \times max\_value_{1,2} + \sum_{k_i \in S,\ k_i \notin U\_MV} \max_{k_j \in U\_MV} \{ w_{i,j} \}}{|S_1| + |S_2|} \qquad (5)$$

where $S = S_1 \cup S_2$ and $U\_MV$ is the set of terms covered by the maximum weight matching; each remaining term contributes the maximum of its weights to the matched terms.
Measure 3 (M_min).

$$Similarity_{1,2} = \frac{2 \times max\_value_{1,2} + \sum_{k_i \in S,\ k_i \notin U\_MV} \min_{k_j \in U\_MV} \{ w_{i,j} \}}{|S_1| + |S_2|} \qquad (6)$$
Measure 4 (M_avg).

$$Similarity_{1,2} = \frac{2 \times max\_value_{1,2} + \sum_{k_i \in S,\ k_i \notin U\_MV} \mathrm{average}_{k_j \in U\_MV}( w_{i,j} )}{|S_1| + |S_2|} \qquad (7)$$
Measure 1 (M_sum) is the average over all the weighted edges between two services; it serves as a baseline for comparison with the two-phase similarity measures, Measures 2, 3 and 4 (M_max, M_min and M_avg). When $|S_1| = |S_2|$, the two-phase measures reduce to the maximum weight matching, since no terms remain unmatched.
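The two-phase computation behind M_max can be sketched as follows. This is our own illustration rather than the authors' code: it assumes $|S_1| \le |S_2|$, takes the unmatched terms on the $S_2$ side, and uses brute-force search in place of a proper maximum weight matching algorithm (term sets are small). M_min and M_avg follow by replacing `max` with `min` or the average in phase 2.

```python
from itertools import permutations

def m_max(w):
    """Two-phase M_max of equation (5) for a weight matrix w[i][j]
    between term sets S1 (rows) and S2 (columns), assuming |S1| <= |S2|."""
    n1, n2 = len(w), len(w[0])
    # Phase 1: maximum weight matching, by brute force over assignments.
    best_value, best_cols = 0.0, ()
    for cols in permutations(range(n2), n1):
        value = sum(w[i][cols[i]] for i in range(n1))
        if value > best_value:
            best_value, best_cols = value, cols
    matched = set(best_cols)  # the S2 side of U_MV
    # Phase 2: every remaining S2 term adds its maximal weight to a matched term.
    extra = sum(max(w[i][j] for i in range(n1))
                for j in range(n2) if j not in matched)
    return (2 * best_value + extra) / (n1 + n2)

print(round(m_max([[0.9, 0.2], [0.1, 0.8]]), 3))  # equal sizes: reduces to MM
print(round(m_max([[0.9, 0.2, 0.3]]), 3))         # two unmatched S2 terms remain
```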
The classification performance is measured by precision, recall and F1:

$$precision = \frac{Num_{correctly\_classified}}{Num_{classified}} \qquad (8)$$

$$recall = \frac{Num_{correctly\_classified}}{Num_{original}} \qquad (9)$$

$$F1 = \frac{2 \times precision \times recall}{precision + recall} \qquad (10)$$
A. Tuning

At the tuning phase, we randomly select 240 services belonging to 7 different categories from the benchmark. Terms are first extracted according to the rules in Table I, and the similarity of terms between services is computed by equation (3).
Fig. 4 shows the precision, recall and F1 of the four measures. From the results, M_sum has the highest recall but the lowest precision and F1. M_max, M_min and M_avg have better precision and F1 than M_sum, because they are at least as good as the maximum matching measure. We conclude that M_max is the best one because of its highest precision and F1. M_max chooses the maximal weight for each remaining term, which is consistent with the maximum weight matching, as both try to find the maximal values among the weights over edges. M_max can thus relate the remaining terms to the already matched part, while M_min and M_avg are less suitable because they may not follow this basic trend and thus fail to reflect whether the unmatched part is related to the matched part.
We have also noticed that the precision, recall and F1 of M_max, M_min and M_avg do not exceed 70%, mainly for two reasons. Firstly, we only decompose terms from portType names and neglect terms appearing in other elements of the WSDL documents.
Besides the application of service classification, we also apply our similarity to service query. Following [4], we choose R-precision and AP (average precision) as the parameters for evaluating the similarity measure on service query. Let $P_n$ represent the ratio of relevant services among the first $n$ returned services for a given query. When $n$ equals the number of all the relevant services for the query in the corpus, $P_n$ is the R-precision. AP is the average of the precisions $P_n$ as $n$ varies from 1 to the number of all the relevant services for the query. More precisely:

$$P_n = \frac{Num^{n}_{relevant}}{n} \qquad (11)$$

where $Num^{n}_{relevant}$ represents the number of relevant services among the first $n$ returned services for the given query. Let $N$ denote the number of all the relevant services for the given query; then

$$R\text{-}precision = \frac{Num^{N}_{relevant}}{N} \qquad (12)$$

$$AP = \frac{1}{N} \sum_{n=1}^{N} \frac{Num^{n}_{relevant}}{n} \qquad (13)$$
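The query metrics (11)-(13) can be sketched directly. The ranked relevance list below is hypothetical, and note that this AP averages $P_n$ over all $n$ up to $N$ as the text defines it, not only at the ranks of relevant results as in some other IR formulations.

```python
def p_at_n(flags, n):
    # Equation (11): share of relevant services among the first n results.
    return sum(flags[:n]) / n

def r_precision(flags, num_relevant):
    # Equation (12): P_n evaluated at n = number of relevant services.
    return p_at_n(flags, num_relevant)

def average_precision(flags, num_relevant):
    # Equation (13): mean of P_n for n = 1 .. number of relevant services.
    return sum(p_at_n(flags, n)
               for n in range(1, num_relevant + 1)) / num_relevant

# Hypothetical ranked result list: 1 = relevant, 0 = irrelevant.
ranked = [1, 0, 1, 1, 0, 0]
print(round(r_precision(ranked, 3), 3))        # P_3 = 2/3
print(round(average_precision(ranked, 3), 3))  # (1 + 1/2 + 2/3) / 3
```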
http://semwebcentral.org/projects/owls-tc/
[Fig. 4. The precision, recall and F1 of M_sum, M_max, M_min and M_avg.]
Let the minimum weight over the matched edges be $w\_min$ (it is 0.157 for the example in Fig. 3(b)), and let the maximum weight connecting the remaining terms with the matched terms be $w\_max' = \max\{ w_{i,j} \mid k_i \in S_1, k_j \in U\_MV \}$ (it is 0.11 for the example in Fig. 3(b)). As (assuming $|S_1| \le |S_2|$)

$$MM_{1,2} - M\_max_{1,2} \ge \frac{w\_min \times |S_1| \times (|S_2| - |S_1|) - w\_max' \times (|S_2| - |S_1|) \times |S_1|}{|S_1| \times (|S_1| + |S_2|)}$$

and by the condition $w\_min \ge w\_max'$, we have $MM_{1,2} - M\_max_{1,2} \ge 0$.

For the example in Fig. 3(b), it is reasonable that when the remaining term has a weaker connection with the matched terms than the matched edges themselves (0.157 > 0.11), the similarity value should be lower. The property therefore shows that the metric M_max fully utilizes each term to decide the final similarity degree of services and reflects the practical case much better than the maximum matching.

[Fig. 5. The precision, recall and F1 of the maximum weight matching (MM) and M_max.]
[Figure: the R-precision and AP of QM, AM2 and M_max at Top5, Top10, Top15 and Top20.]
For service query, given $S_1$ as the query, the maximum weight matching measure is normalized by the size of the query:

$$Similarity_{1,2} = \frac{max\_value_{1,2}}{|S_1|} \qquad (14)$$

Apart from M_max and (14), we also provide a revised asymmetric version of M_max. Given $S_1$ as the query, we have

$$Similarity_{1,2} = \begin{cases} \dfrac{max\_value_{1,2} + \sum_{k_i \in S_2,\ k_i \notin U\_MV} \max_{k_j \in U\_MV}\{ w_{i,j} \}}{|S_1|}, & |S_1| \le |S_2| \\[2ex] \dfrac{max\_value_{1,2} + \sum_{k_i \in S_1,\ k_i \notin U\_MV} \max_{k_j \in U\_MV}\{ w_{i,j} \}}{|S_1|}, & |S_1| > |S_2| \end{cases} \qquad (15)$$
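Under one reading of (15) — the unmatched terms of the larger set contribute their best weights to the matched part, normalized by the query size — the asymmetric measure can be sketched as follows. The function name is ours, and brute-force search again stands in for a proper maximum weight matching algorithm.

```python
from itertools import permutations

def asym_similarity(w):
    """Asymmetric measure (15): w[i][j] weights query terms S1 (rows)
    against service terms S2 (columns); the result is normalized by |S1|."""
    n1 = len(w)
    # Transpose when the query is larger, so rows always index the smaller set.
    rows = w if n1 <= len(w[0]) else [list(col) for col in zip(*w)]
    r, c = len(rows), len(rows[0])
    best, best_cols = 0.0, ()
    for cols in permutations(range(c), r):  # brute-force maximum weight matching
        v = sum(rows[i][cols[i]] for i in range(r))
        if v > best:
            best, best_cols = v, cols
    matched = set(best_cols)
    # Unmatched terms of the larger set contribute their best weight to U_MV.
    extra = sum(max(rows[i][j] for i in range(r))
                for j in range(c) if j not in matched)
    return (best + extra) / n1

# Query with three terms against a service with two terms (|S1| > |S2|).
print(round(asym_similarity([[0.9, 0.2], [0.1, 0.8], [0.3, 0.1]]), 3))
```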