
Computer Applications, Vol. 25 No. 1, Jan. 2005

Article ID: 1001-9081(2005)01-0017-03


(four_tiger@sina.com.cn)
CLC number: TP391    Document code: A

Study on auto proofreading method for POS tagging of Chinese corpus


ZHANG Hu, ZHENG Jia-heng, LIU Jiang
(College of Computer & Information Technology, Shanxi University, Taiyuan Shanxi 030006, China)

Abstract: The problem of automatic proofreading in large-scale corpora is analyzed, and a new method for checking the correctness of
POS tagging, together with an automatic proofreading method based on clustering and classification, is put forward. Using clustering and classification,
the method first groups the part-of-speech sequences of the examples and obtains a threshold value. Then, according to the
threshold value, it classifies each test sequence to judge whether its tagging is correct, and proposes a corrected POS for each wrong POS tag.
In this way it raises the accuracy of part-of-speech tagging on large-scale corpora.
Key words: clustering; POS tagging; auto proofreading

76% , 83% ,
0
70% , 2~ 3
,

, , 1
:
, ,
, 89% ;
96% [5] , ,
: , :
, , 1) , ,
,
:
, : kv
[ 6] , : / u / v / u / n / n / k / n
/ d / vu / v
, , : v
; : , v
; , , k
, 2) , ,
; , :
, ,
50 , :

Received: 2004-06-15; revised: 2004-11-27. Foundation item: 863 Program (2001AA4031)


: ( 1979- ) , , , , : ; ( 1948- ) , , , ,
: ; ( 1980- ) , , , , : .

: vaq
an :
: / p / t / a / v / m / wp
: {n v a d u p r m q c w I f s t b z e o l j h k g y}
: v :
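: , ( v)
200
, 200 11% ,
47% ,
$$Y = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 \end{pmatrix}$$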
: ,
: , 0
: / p / v / n / u / ns / n / a / 3:
v 18/ m / q / m / wp
: / p / r / n / v / v 1 000/ m / m / q : Vec= X Y
/ wp / v / a / n / u / a / n / d / v /
, n, / wp :
a, v
$Vec = (1/22,\ 1/11,\ 2/11,\ 4/11,\ 2/11,\ 1/11,\ 1/22) \times Y = (3/11,\ 1/22,\ 9/22,\ 1/11,\ 2/11,\ 0,\ 0)$
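The product of the position weights X and the one-hot tag matrix Y can be checked with a short Python sketch. It assumes the seven-tag window (a n u a n d v) of the example and the tag order of the tag set given above; the function name `context_vector` and the variable names are hypothetical and do not come from the paper.

```python
from fractions import Fraction as F

# Tag order as listed in the tag set (first entries: n v a d u p r ...).
TAGSET = "n v a d u p r m q c w I f s t b z e o l j h k g y".split()

# Position weights X for a 7-position window; the centre position weighs most.
X = [F(1, 22), F(1, 11), F(2, 11), F(4, 11), F(2, 11), F(1, 11), F(1, 22)]

def context_vector(window_tags, weights=X, tagset=TAGSET):
    """Return Vec = X * Y, where Y is the one-hot matrix of window_tags."""
    if len(window_tags) != len(weights):
        raise ValueError("window and weight vector must have equal length")
    vec = [F(0)] * len(tagset)
    for weight, tag in zip(weights, window_tags):
        # Each row of Y has a single 1 in the column of its tag,
        # so the product adds the position weight to that column.
        vec[tagset.index(tag)] += weight
    return vec

vec = context_vector("a n u a n d v".split())
print([str(v) for v in vec[:7]])
# -> ['3/11', '1/22', '9/22', '1/11', '2/11', '0', '0']
```

The first seven components agree with the vector printed in the example, and the components sum to 1 because the weights do.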
2 ,

, ,
,


,
:

$S_{i,j} = (x_i, y_i)\, V^{-1}\, (x_i, y_i)$ (1)
2. 1
, : x i y i
: $V = \frac{1}{m-1} \sum_{i=1}^{m} (x_i - \bar{x})(x_i - \bar{x})^{T}$
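A minimal Python sketch of the sample covariance matrix V and of a V^{-1}-weighted distance follows. It assumes that formula (1) is the usual Mahalanobis form applied to the difference of the two vectors, which the extracted text does not show unambiguously; the toy data and all function names are hypothetical.

```python
def covariance(vectors):
    """Sample covariance V = 1/(m-1) * sum (x_i - mean)(x_i - mean)^T."""
    m = len(vectors)
    mu = [sum(col) / m for col in zip(*vectors)]
    d = len(mu)
    V = [[0.0] * d for _ in range(d)]
    for x in vectors:
        diff = [a - b for a, b in zip(x, mu)]
        for r in range(d):
            for c in range(d):
                V[r][c] += diff[r] * diff[c] / (m - 1)
    return V

def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def mahalanobis_sq(x, y, V):
    """Assumed reading of formula (1): (x - y)^T V^{-1} (x - y)."""
    diff = [a - b for a, b in zip(x, y)]
    z = solve(V, diff)  # z = V^{-1} (x - y), without forming the inverse
    return sum(d * zi for d, zi in zip(diff, z))

# Toy two-dimensional sample, not taken from the paper.
samples = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
V = covariance(samples)
print(V)
print(mahalanobis_sq(samples[0], samples[1], V))
```

With real context vectors the covariance matrix can be singular, for example when a tag never occurs in the sample; the sketch raises an error in that case rather than guessing how the paper handles it.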
1
:
a) / v / a / n / u / a / n / d / v / n, / wp
1 2 3 4 5 6 7 b) / r / v / m / q / a / n / u / n, / wp
: ( ) ( )
:

a) ( 3/ 11, 1/ 22, 9/ 22, 1/ 11, 2/ 11, 0, 0 )
1:
b) ( 5/ 22, 1/ 11, 4/ 11, 0, 1/ 11, 0, 0, 1/ 11, 2/ 11, 0 )


( 1)
, ,
0. 236
X = {(1/22), (1/11), (2/11), (4/11), (2/11), (1/11), (1/22)}
2. 2

1/ 22: ( )
,
1/ 11: ( )

2/ 11: ( )
x i xj
4/ 11:
d ij :
2:
1

, d ij H ( 2)
k- 1
, H
7 m : : k , H

; H , :
Step1: , ,
: / v / a / n / u / a / n / d / v
/ n, / wp VA ,
: ( a n u a n d v ) , ( 1)

VA : :
Distance( Vi , VA ) , Vi 76% , 0. 20% ,

Step2: Step1 ,
: ,
Average( Distance( Vi , VA ) ) 24% 76%
H ,
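The two steps can be sketched in Python as follows. The sketch reads VA as the centre (mean) of the training vectors Vi and H as the average of Distance(Vi, VA); both readings are inferred from the surviving symbols. Euclidean distance stands in for the distance of formula (1), and the acceptance rule in `looks_correct` is an assumption. All names and the toy data are hypothetical.

```python
import math

def centroid(vectors):
    """Step 1 (assumed): the centre VA is the component-wise mean."""
    m = len(vectors)
    return [sum(col) / m for col in zip(*vectors)]

def euclidean(u, v):
    """Stand-in distance; the paper uses its own formula (1)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def threshold(vectors, distance=euclidean):
    """Step 2 (assumed): H = Average(Distance(Vi, VA))."""
    va = centroid(vectors)
    dists = [distance(v, va) for v in vectors]
    return va, sum(dists) / len(dists)

def looks_correct(vec, va, h, distance=euclidean):
    """Assumed rule: accept a test vector that lies within H of VA."""
    return distance(vec, va) <= h

# Toy training vectors, not taken from the paper.
train = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
va, h = threshold(train)
print(va, h)
print(looks_correct([1.0, 1.5], va, h), looks_correct([5.0, 5.0], va, h))
```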
83%
3 , :
, ,
, , 1
:
a) / v / v / p / n / nl / u /wp
v, v, :
vvpnn
1
b) / d / v / wp / nsh / n / wp / u
: 1 n ( 1 n) ; / n / n
n, n, :
1 , wunn
, ,
:
Step1:
, , , ,
Distance( V ( i) , 1 514 6 v, v : 0. 396% ,
VA ( i) ) , i
Step2: c) / m / v / iv, / wp / v / d / v / u

Min{ Distance( V( i) , VA ( i) ) } , / n; / wp / m / v / wp / n / wp / in
I v, v
1 , ( n ) ,
i - 1 , i - 1 , ,
4. 2
n - 1, ,
Distance( V ( i) , VA ( i) ) , 70%
Min{ Distance( V( i) , VA ( i) ) } I , :
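The selection by Min{Distance(V(i), VA(i))} can be sketched as a nearest-centre choice. The sketch assumes that V(i) is the vector built for candidate tag i and VA(i) the centre obtained for that tag, and it again uses Euclidean distance as a stand-in for formula (1). The function `propose_tag` and the toy numbers are hypothetical.

```python
import math

def euclidean(u, v):
    """Stand-in distance; the paper uses its own formula (1)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def propose_tag(candidate_vectors, class_centres, distance=euclidean):
    """Return the tag I whose Distance(V(I), VA(I)) is smallest.

    candidate_vectors[tag] is V(i); class_centres[tag] is VA(i).
    """
    return min(
        candidate_vectors,
        key=lambda tag: distance(candidate_vectors[tag], class_centres[tag]),
    )

# Toy vectors for two candidate tags, not taken from the paper.
candidates = {"v": [0.2, 0.8], "n": [0.6, 0.4]}
centres = {"v": [0.5, 0.5], "n": [0.6, 0.5]}
print(propose_tag(candidates, centres))  # -> n
```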
I , I a) / v / v / aq / u / wp / n / v / wp
: / v : / n
b) / r / v / vl / d / v / u / wp /
4 jn / wp
4. 1 : / v : / n
50 : c) / n / u / n / v / d / d
64 040 , 562, / aq
518, 91 , : / v : / n
75. 98% , 0. 14% , 367, :
70. 84% a) / v / n / n / c / vl / m / q /
: M , m, aq / u / n
N , n, b) / p / n / v / m / q / aq / n / u
c / v
= ( N - n) / m; c) / u / / n / r ?/ wp ( 24 )
= n / ( M - m) ;
= c/ N
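The three ratios can be recomputed from the counts reported in this section. The mapping M = 64040, m = 562, N = 518, n = 91, c = 367 is inferred because it reproduces the printed percentages; the names of the ratios did not survive extraction, so the dictionary keys below are hypothetical labels.

```python
def proofreading_metrics(M, m, N, n, c):
    """Compute the three ratios (N - n) / m, n / (M - m) and c / N."""
    return {
        "found_ratio": (N - n) / m,        # hypothetical label
        "false_alarm_ratio": n / (M - m),  # hypothetical label
        "correction_ratio": c / N,         # hypothetical label
    }

ratios = proofreading_metrics(M=64040, m=562, N=518, n=91, c=367)
for name, value in ratios.items():
    print(f"{name}: {value:.4%}")
```

The first two values round to the reported 75.98% and 0.14%. The third, 367/518, is about 70.849%, so the reported 70.84% appears to be truncated rather than rounded.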

2) ALOOS MSTE
5 3) ,
:
ykp = sin( 10( xk + 1) ) + w k ( 14) 6
{ xk } [ - 1 1] , { WK } 2 = , ALOOS
0. 002 tanh
,
5. 1
LM ,
k ( z ) 108 P z ( k ,
k ) [ 1/ N 1] , 1 N
, MSPE ,
ALOOS MSTE N
1

N      h_full   h_sel   MSTE       ALOOS      ALOOS/MSTE
50     3        3       1.94E-3    3.08E-3    1.64
200    4        4       1.86E-3    2.15E-3    1.15
500    4        4       1.94E-3    2.05E-3    1.06

5. 2
N 100 4, 74%
:
[1] ANDERS U, KORN O. Model Selection in Neural Networks[J]. Neural Networks, 1999, (12): 309-323.
[2] RIVALS I, PERSONNAZ L. Construction of Confidence Intervals for Neural Networks Based on Least Squares Estimation[J]. Neural Networks, 2000, (13): 80-90.
[3] REED R. Pruning Algorithms - a Survey[J]. Neural Networks, 1993, (4): 740-747.
[4] KWOK TY, YEUNG DY. Constructive Algorithms for Structure Learning in Feedforward Neural Networks for Regression Problems[J]. Neural Networks, 1997, (8): 630-645.
[5] KARNIN ED. A Simple Procedure for Pruning Back-propagation Trained Neural Networks[J]. Neural Networks, 1990, (2): 239-242.
, [ 6] , .

N 100, 4 , 1 [ J] . , 1998, ( 4) : 42- 46.


[ 7] , . [ J] .
97. 9%
, 2000, ( 2) : 27- 29.
5. 3
[ 8] . [ M ] . :
1) , k ( z ) ALOOS
, 1990.

( 19 ) 2) ,
:
3) ,


v, n v n v
,
v, n v n v
,
n, v n v n
: , , ,
n
, 4) ,
, ,
,
, ,
, ,
,
:
5 [ 1] , . [ M ] . : ,
2002.
[ 2] . [ M ] . : , 2000.
, 50
, 200
[3] , . Analyzing Popular Clustering Algorithms from Different Viewpoints[J]. , 2002, 13(8): 1382~1394.
[ 4] , . [ M ] . : , 2001.
, :
[ 5] , , . [ J] .
1) , , 1998, 9( 2) .
[ 6] , , .
[ J] . , 2000, 11( 4) .
,