2012 2
Vol.2 No.1
Feb. 2012
471003
KNN
Abstract: Automatic summarization helps readers obtain the information they need, accurately and efficiently, from massive amounts of
information, and it has attracted increasing attention. This paper presents a new method for Chinese text summarization based on the
Affinity Propagation Cluster (APC) algorithm. APC requires neither the number of clusters nor the initial representative exemplars to be
set in advance, so it avoids the local optima and unstable clustering results caused by randomly selecting initial representative
exemplars. The algorithm is also computationally efficient. Experimental results show that automatic summarization of Chinese text
based on APC achieves higher accuracy than KNN clustering, which suggests that APC is a suitable method for automatic text summarization.
Key words: affinity propagation clustering; automatic text summarization; Chinese text
[1]
IBMLuhn[2]
Linclue words
[3]Julian K.[4]
Tadashidiversity of concepts
[5]Tadashi
[6][5][6]
[5][6]
VSM[6]LSA
[8][9]
[9]
1
1.1
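A text $T = \{S_1, S_2, \ldots, S_l\}$ is a set of sentences $\{S_n\}$. Representative sentences $S_k$ are selected from $T$, and the summary is the subset $A = \{S_{k1}, S_{k2}, \ldots, S_{km}\} \subseteq T$.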
[8,9]
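For sentences $S_i$ and $S_j$ of text $T$, let $w$ denote a word, with $S_i = \{w_{i1}, w_{i2}, \ldots, w_{ip}\}$ and $S_j = \{w_{j1}, w_{j2}, \ldots, w_{jq}\}$. Let $|S_i|$ and $|S_j|$ be the numbers of words in $S_i$ and $S_j$, and let $|W|$ be the number of words in the dictionary of $T$. The similarity of $S_i$ to $S_j$ is

$$\mathrm{sim}(S_i, S_j) = -\sum_{k=1}^{|S_i|} \begin{cases} \log_2 |S_j| & \text{if } w_{ik} \in S_j \\ \log_2 |W| & \text{if } w_{ik} \notin S_j \end{cases}$$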
stop-of-word
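As an illustration, the sentence similarity of Section 1.1 can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `sentence_similarity` and the parameter `vocab_size` (standing for $|W|$) are hypothetical, sentences are assumed to be already segmented into word lists with stop words removed, and the negative sign follows the coding-cost convention of [9].

```python
import math


def sentence_similarity(s_i, s_j, vocab_size):
    """Negative cost of encoding the words of s_i using s_j and the dictionary.

    s_i, s_j   -- lists of word tokens (assumed segmented, stop words removed)
    vocab_size -- |W|, number of words in the dictionary of the text (assumed given)
    """
    words_j = set(s_j)
    cost = 0.0
    for word in s_i:
        if word in words_j:
            # Word also occurs in s_j: pay log2 |S_j|.
            cost += math.log2(len(s_j))
        else:
            # Word is absent from s_j: pay log2 |W|.
            cost += math.log2(vocab_size)
    # Assumption: similarity is the negative total cost, as in [9].
    return -cost
```

Note that the measure is asymmetric: in general `sentence_similarity(a, b, n)` differs from `sentence_similarity(b, a, n)`, because the sum runs over the words of the first sentence only.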
1.2
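APC passes two kinds of messages between data points: the responsibility $r$ and the availability $a$. The responsibility $r(i, k)$, sent from point $i$ to candidate exemplar $k$, reflects the accumulated evidence for how well suited $k$ is to serve as the exemplar of $i$. The availability $a(i, k)$, sent from $k$ to $i$, reflects the accumulated evidence for how appropriate it would be for $i$ to choose $k$ as its exemplar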
[9]
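$$a(i, k) \leftarrow \min\Big\{0,\; r(k, k) + \sum_{i' \,\mathrm{s.t.}\, i' \notin \{i, k\}} \max\{0, r(i', k)\}\Big\} \qquad (2)$$

$$a(k, k) \leftarrow \sum_{i' \,\mathrm{s.t.}\, i' \neq k} \max\{0, r(i', k)\} \qquad (3)$$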
a availability0 r 2
r (i, k ) a(i, k )
preference P P
S (i, i )
P
S (i, i ) P
P
[9]
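The message passing of Section 1.2 can be sketched in plain Python as follows. This is a hedged illustration, not the authors' code: the availability updates follow equations (2) and (3), the responsibility update is the one given in [9], and the function name `affinity_propagation`, the damping factor of 0.5 and the fixed iteration count are assumptions made for this sketch. The preference $P$ is written onto the diagonal $S(i, i)$.

```python
def affinity_propagation(S, preference, damping=0.5, iterations=200):
    """Return, for each point i, the index of its exemplar.

    S          -- n x n similarity matrix as a list of lists (n >= 2)
    preference -- P, the value placed on the diagonal S(i, i)
    damping, iterations -- assumed settings, not taken from the paper
    """
    n = len(S)
    S = [row[:] for row in S]          # work on a copy
    for i in range(n):
        S[i][i] = preference           # S(i, i) = P
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k), initialised to 0

    for _ in range(iterations):
        # Responsibility update from [9]:
        # r(i,k) <- s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in range(n):
            for k in range(n):
                best = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best)

        # Availability updates, equations (2) and (3).
        for k in range(n):
            positive = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(positive) - positive[k]      # sum over i' != k
            for i in range(n):
                if i == k:
                    new = total                                          # eq. (3)
                else:
                    new = min(0.0, R[k][k] + total - positive[i])        # eq. (2)
                A[i][k] = damping * A[i][k] + (1 - damping) * new

    # Point i chooses the k that maximises r(i, k) + a(i, k).
    return [max(range(n), key=lambda k: R[i][k] + A[i][k]) for i in range(n)]
```

The number of clusters is never passed in; it emerges from the similarities and the preference. As described in [9], lowering the preference tends to produce fewer exemplars, which for summarization would mean fewer extracted sentences.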
2
JonesIntrinsic
Extrinsic[10]
Lin
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) [11]
ROUGE, n-gram: ROUGE-N, ROUGE-L, ROUGE-S, ROUGE-W; ROUGE-L,
ROUGE
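With the counts $n$, $s$ and $m$, the ratios $\frac{m}{s}$ and $\frac{m}{n}$ are computed, and the F-measure [12] combines them:

$$F = \frac{2RP}{R + P}$$

where $P$ is the precision, $R$ is the recall and $F$ is the F-measure.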
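The F-measure can be computed with a few lines of Python. This is a small sketch; the function name `f_measure` and the zero-denominator guard are assumptions added for illustration. Because $F$ is symmetric in $P$ and $R$, the result does not depend on which of the two values is passed first.

```python
def f_measure(precision, recall):
    """F = 2RP / (R + P); returns 0.0 when both inputs are 0 (assumed convention)."""
    if precision + recall == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)
```

For example, the pair of values 0.678 and 0.635 gives `round(f_measure(0.678, 0.635), 3) == 0.656`.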
3
3.1
Goldstein
[13] [14]
1
30
10
130160
3.2
neucsp[15]
P
K-MeansAPC1
P APC 1P sim( Si , Si ) 30 K-Means1 APC
1 APC 2 P sim( Si , Si ) 80 K-Means2 APC 2
3.3
1K-Means[6][16]
[6][16]K-Means
1K-Means
1
2
APC1        0.678    0.635    0.656
K-Means1    0.644    0.603    0.623
APC2        0.495    0.754    0.597
K-Means2    0.464    0.706    0.559
[1] . [M], , 2006.
[2] Luhn H P. The Automatic Creation of Literature Abstracts[J]. IBM Journal of Research and Development, 1958, 2(2): 159-165.
[3] Lin C Y, Hovy E. Identifying Topics by Position[C]//Proc. of the 5th Conference on Applied Natural Language Processing. [S. l.]: IEEE Press, 1997: 283-290.
[4] Kupiec J, Pedersen J O, Chen F. A Trainable Document Summarizer[C]//Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Seattle, WA: [s. n.], 1995: 68-73.
[5] Tadashi N, Matsumoto Y. A New Approach to Unsupervised Text Summarization[C]//Proc. of Annual ACM Conference on Research and Development in Information Retrieval. [S. l.]: IEEE Press, 2001.
[6]
[7] Filatova E, Hatzivassiloglou V. Event-based Extractive Summarization[C]//Proc. of ACL Workshop on Summarization. Barcelona, Spain: [s. n.], 2004.
[8] Cover T M, Thomas J A. Elements of Information Theory[M]. New York, NY: John Wiley & Sons, 1991.
[9] Frey B J, Dueck D. Clustering by Passing Messages Between Data Points[J]. Science, 2007, 315(5814): 972-976.
[10] Jones K S, Galliers J R. Evaluating Natural Language Processing Systems: An Analysis and Review[M]. Berlin: Springer, 1996.
[11] Lin C Y. ROUGE: A Package for Automatic Evaluation of Summaries[C]//Proceedings of the ACL 2004 Workshop on Text Summarization. Spain, 2004, 7: 428.
[12] Van Rijsbergen C J. Information Retrieval, 2nd edition[M]. Dept. of Computer Science, University of Glasgow, 1979.
[13] Goldstein J, et al. Creating and Evaluating Multi-document Sentence Extract Summaries[C]//Proceedings of the 9th International Conference on Information and Knowledge Management. Virginia, USA, 2000: 165-172.
[14] , . [J]. , 2005, (6).
[15] NEUCSP. http://www.nlplab.com/chinese/source.htm.
[16] , , . [J]. , 2007, (8).