1 1 1 1 2
1. 1000382. 210094
Abstract
In a literature collection for a given field, when two research objects co-occur
frequently, one usually assumes that an underlying link between them is more probable.
This assumption explains the popularity of co-occurrence analysis methods such as
co-word analysis, co-citation analysis and co-authorship analysis. Traditional
co-occurrence analysis usually consists of three steps. However, the first two steps
are problematic and may lead to misleading co-occurrence clustering results. This paper
therefore introduces a new method for co-occurrence clustering analysis, maximal
frequent itemset mining, taken from the association rule mining domain.
_______________________
2011 1 23
2011 8 26
1979; E-mail: xush@istic.ac.cn
1965; E-mail: qiaox@istic.ac.cn
1973; E-mail: zhulj@istic.ac.cn
1979; E-mail: zhangyl@istic.ac.cn
1979; E-mail: xuechunxiang@gmail.com
1)
2011BAH10B04
YY-20112909TQC011
This approach compresses the three steps of traditional co-occurrence clustering into one,
which greatly simplifies the process. One of its most appealing characteristics is that
it makes full use of all available information, which overcomes the problem in
traditional co-occurrence analysis. Experimental results show that satisfactory
clustering results can generally be obtained by setting a proper minimal support threshold.
Keywords
[1]
//
/[2]
Salton
[3]
20 70 ,
1971
20
2.1
k ( 3)
k ( 3)
[4]
Zipf Donohue
[11]
[5]
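Let D be the document collection and I = {T1, T2, ..., TN} the set of N items.
C is an N × N matrix with Ci,j = Cj,i (i ≠ j), where Ci,j (i, j = 1, 2, ..., N) is the
co-occurrence frequency of the pair <Ti, Tj> and Ci,i (i = 1, 2, ..., N) is the frequency of Ti.
Cluster similarity can be computed by single-linkage, complete-linkage or average-linkage
from the pairwise similarities Sim(T1,i, T2,j) between the members of two clusters,
where the second cluster is {T2,1, T2,2, ..., T2,n}.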
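For two clusters S1 and S2 with m and n members respectively:

\[ \mathrm{Sim}_{\mathrm{single}}(S_1, S_2) = \max_{e_1 \in S_1,\ e_2 \in S_2} \mathrm{Sim}(e_1, e_2) \qquad (1) \]

\[ \mathrm{Sim}_{\mathrm{complete}}(S_1, S_2) = \min_{e_1 \in S_1,\ e_2 \in S_2} \mathrm{Sim}(e_1, e_2) \qquad (2) \]

\[ \mathrm{Sim}_{\mathrm{average}}(S_1, S_2) = \frac{1}{mn} \sum_{e_1 \in S_1} \sum_{e_2 \in S_2} \mathrm{Sim}(e_1, e_2) \qquad (3) \]

C inclusion
[7]
Pajek[8], NetDraw[9]
KNN
k-Means[10][11], Science
AP (affinity propagation)[12]
multidimensional scaling (MDS)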
ID   Item set              ID   Item set
1    {A, C, D}             7    {B, C}
2    {A, B, C, D, E}       8    {A, B, D, E}
3    {D, E}                9    {B, C, D}
4    {B, C}                10   {A, C, D, E}
5    {B, C, D}             11   {A, C, D}
6    {B, D}                12   {D, E}
1 12
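The first step of a traditional co-occurrence analysis, counting how often each pair of items appears together, can be sketched on the 12 records listed above. This is a minimal illustration rather than the authors' implementation; the names `TRANSACTIONS` and `cooccurrence_counts` are assumptions introduced here.

```python
from collections import Counter
from itertools import combinations

# The 12 records of the example table, in ID order (1..12).
TRANSACTIONS = [
    {"A", "C", "D"}, {"A", "B", "C", "D", "E"}, {"D", "E"}, {"B", "C"},
    {"B", "C", "D"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D", "E"},
    {"B", "C", "D"}, {"A", "C", "D", "E"}, {"A", "C", "D"}, {"D", "E"},
]


def cooccurrence_counts(transactions):
    """Count the records containing each item and each unordered item pair."""
    item_counts = Counter()
    pair_counts = Counter()
    for record in transactions:
        items = sorted(record)  # sorted so every pair key is in (low, high) order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    return item_counts, pair_counts


item_counts, pair_counts = cooccurrence_counts(TRANSACTIONS)
for pair, count in sorted(pair_counts.items()):
    print("".join(pair), count)
```

The item counts correspond to the diagonal of the co-occurrence matrix and the pair counts to its off-diagonal entries. On this data the pair C, D co-occurs most often (6 records).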
[13]
2.2
1 1
S1 S2
2 2 ab
S1S2
        A    B    C    D
   B    2
   C    4    5
   D    5    5    6
   E    3    2    2    5

        A          B          CD
   B    2
   CD   4.5 (4)    5 (3)
   E    3          2          3.5 (2)

        A          BCD
   BCD  3.67 (1)
   E    3          3 (1)

        ABCD
   E    3 (1)
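The successive merges in the tables above can be reproduced with a small agglomerative clustering sketch that uses average-linkage (equation (3)) on the pairwise co-occurrence counts of items A to E. The dictionary `PAIR` and the function names are assumptions made for illustration; ties between equally similar cluster pairs are not handled specially because none occur at the maximum for this data.

```python
from itertools import combinations

# Pairwise co-occurrence counts of the five example items.
PAIR = {
    ("A", "B"): 2, ("A", "C"): 4, ("A", "D"): 5, ("A", "E"): 3,
    ("B", "C"): 5, ("B", "D"): 5, ("B", "E"): 2,
    ("C", "D"): 6, ("C", "E"): 2, ("D", "E"): 5,
}


def sim(a, b):
    """Similarity of two single items (their co-occurrence count)."""
    return PAIR[tuple(sorted((a, b)))]


def average_linkage(s1, s2):
    """Mean pairwise similarity between the members of two clusters."""
    return sum(sim(a, b) for a in s1 for b in s2) / (len(s1) * len(s2))


def agglomerate(items):
    """Repeatedly merge the two most similar clusters; return the merge log."""
    clusters = [frozenset(item) for item in items]
    merges = []
    while len(clusters) > 1:
        s1, s2 = max(combinations(clusters, 2),
                     key=lambda pair: average_linkage(*pair))
        merges.append(("".join(sorted(s1 | s2)), average_linkage(s1, s2)))
        clusters = [c for c in clusters if c not in (s1, s2)] + [s1 | s2]
    return merges


merges = agglomerate("ABCDE")
for label, score in merges:
    print(label, round(score, 2))
```

The merge order is CD, then BCD, then ABCD, and finally ABCDE, with average-linkage similarities 6, 5, 3.67 and 3.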
S1, S2: Simtrue(S1, S2); Simsingle(S1, S2), Simcomplete(S1, S2), Simaverage(S1, S2)
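The gap between the linkage-based similarities and the true co-occurrence of two clusters can be checked directly on the 12 example records. The sketch below assumes that Simtrue(S1, S2) is the number of records containing every item of S1 and S2, which agrees with the values given in parentheses in the merge tables; the function names are hypothetical.

```python
# The 12 example records, in ID order (1..12).
TRANSACTIONS = [
    {"A", "C", "D"}, {"A", "B", "C", "D", "E"}, {"D", "E"}, {"B", "C"},
    {"B", "C", "D"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D", "E"},
    {"B", "C", "D"}, {"A", "C", "D", "E"}, {"A", "C", "D"}, {"D", "E"},
]


def cooccurrence(items):
    """Number of records that contain every item in `items`."""
    wanted = set(items)
    return sum(1 for record in TRANSACTIONS if wanted <= record)


def pairwise(s1, s2):
    """Co-occurrence counts of all item pairs drawn from s1 and s2."""
    return [cooccurrence(a + b) for a in s1 for b in s2]


def sim_single(s1, s2):
    return max(pairwise(s1, s2))


def sim_complete(s1, s2):
    return min(pairwise(s1, s2))


def sim_average(s1, s2):
    values = pairwise(s1, s2)
    return sum(values) / len(values)


def sim_true(s1, s2):
    # Assumption: "true" similarity = records containing the whole union.
    return cooccurrence(s1 + s2)


# Clusters are written as strings of item labels, e.g. "CD" for {C, D}.
for s1, s2 in [("A", "CD"), ("B", "CD"), ("E", "CD"),
               ("A", "BCD"), ("E", "BCD"), ("E", "ABCD")]:
    print(s1, s2, sim_single(s1, s2), sim_complete(s1, s2),
          round(sim_average(s1, s2), 2), sim_true(s1, s2))
```

For B and CD, for instance, all three linkage similarities equal 5 while only 3 records contain B, C and D together. In general the true value cannot exceed the complete-linkage value, since any record containing the whole union also contains every pair.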
Agrawal et al.
[14]
/
Bayardo 1998
[15]
3.1
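Let I be the set of all items. Any subset of I is called an itemset.
Definition 1. For a database D and an itemset X, the support of X in D is denoted Sup(X).
Definition 2. Given a database D and a minimal support threshold min_sup, an itemset X ⊆ I
with Sup(X) ≥ min_sup is called a frequent itemset in D. The set of all frequent itemsets
is denoted FI.
Definition 3. Given a database D and min_sup, an itemset X ⊆ I with Sup(X) ≥ min_sup such that
Sup(Y) < min_sup for every Y ⊃ X is called a maximal frequent itemset in D. The set of all
maximal frequent itemsets is denoted MFI.
Algorithms for mining MFI include MaxMiner[15], MAFIA[16], GenMax[17][18], Pincer-Search[19],
Diffset[20] and HBMFI[21].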
[17]
3{A, B, C, D, E}
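The definitions of frequent and maximal frequent itemsets can be made concrete with a brute-force sketch on the 12 example records, taking support as the number of records that contain an itemset. This is not one of the cited algorithms (MaxMiner, MAFIA, GenMax and the others prune the search space); it enumerates every subset and is only practical for a handful of items. All function names are assumptions.

```python
from itertools import combinations

# The 12 example records, in ID order (1..12).
TRANSACTIONS = [
    {"A", "C", "D"}, {"A", "B", "C", "D", "E"}, {"D", "E"}, {"B", "C"},
    {"B", "C", "D"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D", "E"},
    {"B", "C", "D"}, {"A", "C", "D", "E"}, {"A", "C", "D"}, {"D", "E"},
]


def support(itemset, transactions):
    """Number of records that contain every item of `itemset`."""
    return sum(1 for record in transactions if itemset <= record)


def maximal_frequent_itemsets(transactions, min_sup):
    """Frequent itemsets that have no frequent proper superset."""
    items = sorted(set().union(*transactions))
    frequent = [
        frozenset(candidate)
        for size in range(1, len(items) + 1)
        for candidate in combinations(items, size)
        if support(frozenset(candidate), transactions) >= min_sup
    ]
    # Because support never increases when items are added, "no frequent
    # proper superset" is equivalent to "every proper superset is infrequent".
    return {x for x in frequent if not any(x < y for y in frequent)}


def labels(itemsets):
    """Render itemsets as sorted strings such as 'BCD' for printing."""
    return sorted("".join(sorted(s)) for s in itemsets)


print(labels(maximal_frequent_itemsets(TRANSACTIONS, 2)))  # ['ABDE', 'ACDE', 'BCD']
print(labels(maximal_frequent_itemsets(TRANSACTIONS, 3)))  # ['ACD', 'ADE', 'BCD']
```

Each maximal frequent itemset can be read as one cluster, and a cluster may overlap with another: item D belongs to all three clusters at either threshold. Raising min_sup from 2 to 3 shrinks the two four-item clusters to three items each.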
3.2
2 1
min_sup = 2 3 {A, B,
C, D, E} 3
1
3
3
3 4
{A, B, C, D, E}
C, D}
[22]-[24]
1-
3 4 2
4.2 FAO-780
overlapping
case
countries
studies developing
 3    72  (9.23%)        10    61  (7.82%)
 4    79 (10.13%)        11    46  (5.90%)
 5    74  (9.49%)        12    43  (5.51%)
 6    77  (9.87%)        13    41  (5.26%)
 7    73  (9.36%)        14    30  (3.85%)
 8    87 (11.15%)        15    18  (2.31%)
 9    68  (8.72%)              11  (1.41%)
4.1
25
C C
5
FAO-780
FAO-780 MFI
[25]
780
320
FAO
179 MFI
1560
23 7.98
2
4.2
2.1
141
3 141
106 25 23
25
case studies, developing countries
25 C
min_sup
2 min_sup = 5 A
min_sup
106 23 =
83 23
[26]
525
3
ID
2
3
4
5
ID
8
9
10
30
22(1): 155-205.
[8]
Network
[OL].
[2010-10-12].
http://pajek.imfm.si/doku.php?id=pajek.
[9]
[2010-12-12]. http://www.analytictech.com/netdraw/netdraw.htm.
[1]
.
[J]. ,
[2]
[6]
972-976.
[13] . Proximity
30(11): 1163-1170.
[14] Agrawal R, Imielinski T, Swami A. Mining
, 2009.
207-216.
[15] Bayardo Jr. R J. Efficiently Mining Long Patterns
115-123.
Maximal
1973.
Washington:
433-442.
[7]
386-393.
[5]
[J]. , 2011,
[4]
5[R].
[3]
Frequent
IEEE
Itemset
Computer
Algorithm
Society,
for
2001:
1986: 103-123.
1-20.
IEEE
Computer
Society,
2001:
163-170.
[19] Lin D-I, Kedem Z M. Pincer-Search: An Efficient
Algorithm for Discovering the Maximum Frequent
Set [J]. IEEE Transactions on Knowledge and Data
Engineering, 2002, 14(3): 553-566.
[20] Zaki M J, Gouda K. Fast Vertical Mining using
Diffsets [C] // Proceedings of the 9th ACM SIGKDD
International Conference on Knowledge Discovery
and Data Mining (KDD). New York: ACM Press,
2003: 326-335.
[21] Zubair Rahman A M J M, Balasubramanie P. An
Efficient Algorithm for Mining Maximal Frequent
Item Sets [J]. Journal of Computer Science, 2008,
4(8): 638-645.
[22] Whittaker J. Creativity and Conformity in Science:
Titles, Keywords and Co-word Analysis [J]. Social
Studies of Science, 1989, 19(3): 473-496.
[23] He Q. Knowledge Discovery through Co-word
Analysis [J]. Library Trends, 1999, 48(1): 133-159.
[24] Law J, Whittaker J. Mapping Acidification Research:
A Test of the Co-word Method [J]. Scientometrics,
1992, 23(3): 417-461.
[25] Medelyan O, Witten I H, Frank E. FAO-780 [OL].
[2010-10-05]. http://code.google.com/p/maui-indexer/downloads/list.
[26] Perfetti C A. The Limits of Co-occurrence: Tools and
Theories in Language Research [J]. Discourse
Processes, 1998, 25(2&3): 363-377.