
5 6 9 2007 3 235- 246 ,

Design principles and annotation methods of the COLSEC corpus*

COLSEC
, /
0 / 0 / 0,


1.
,
( ) , ,
,
, ,

,
,
1990
Granger (1998: 3-18) 20 ICLE
(International Corpus of Learner English) ,
: , ,
; ,
, , ,
, CLEC (Chinese Learners' English Corpus,
2003), JPU (Janus Pannonius University) , Lodz
Lancaster REPLICA , USE (Uppsala Student English)
, HKUST (Hong Kong University of Science and Technology)
, ELT CLC
(Cambridge Learner Corpus), Longman Essential Activator

* ( 01BYY 007) / 0,
, ,

http://www.ddyyx.com


LLC (Longman Learners' Corpus) (Pravec 2002: 81-114),
,
, ,
, ICLE MELD (Montclair Electronic Language Database, Pravec 2002: 82-3)
, ,
,
; , ,
;
,
, : ,
,
;

, CLEC,
COLSEC (College Learners Spoken English Corpus)
; COLSEC , ,
, ;
,
COLSEC,
, ,
2.
,
;
, (external criteria)
(internal criteria, Sinclair 2002: 312-3) ,
;
,

,
2.1 Task setting
, : ? ,
? ,
: ,
, ,
, (

, : CLC LLC USE ; T SLC ICLE
) COLSEC ,
; ,
, , ,
, ; ,
/ 0, ,
, ( )
COLSEC

,
? , ; ,
, , ,
COLSEC ,

2.2 Discourse genre
/ 0,
, / ) ) ) ) 0
, , / 0
, , ,
; / 0 / 0, ,
,
, , COLSEC
: ) ) ) ( 2004: 140)
,
: , ,
,

2.3 Topic
,
, ,
/ (type/token ratio) ; , / , , / 0
, COLSEC 39
, / 0,
,
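The type/token ratio mentioned in this subsection is the number of distinct word forms (types) divided by the number of running words (tokens). The following is a minimal sketch only: the function name and the regex-based, case-insensitive tokenization are assumptions made for illustration, not the procedure used for COLSEC.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return distinct word forms (types) divided by running words (tokens)."""
    # Assumption: a token is a run of letters or apostrophes, case-insensitive.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Invented sample in the style of the transcripts: 12 tokens, 9 types.
sample = "I think, I think that er I say you should stop smoking"
print(type_token_ratio(sample))  # 0.75
```

A narrower range of topics tends to lower this ratio because the same words recur, which is one reason topic variety matters when sampling.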

2.4 Time span and diachrony


,
, COLSEC

, , 2001 2004, 4
, (synchronic corpus), ,
, ,
, Sinclair Bank of English:
, , Sinclair
(monitor corpus) , (Sinclair 1991: 24-5) / 0
; / 0,

2.5 Learner background



, , ,
(documentation),
( , COLSEC ),
, , ,
,
3.

, COLSEC ( 1) , ,
, ,
, ;
; ; ( 2)
, ,
( 3) , ,
, ,
:

3.1 Head information mark-up


COLSEC XML ,
, ,
: ,
, :
<transcription id="0002" disno="00021201030402"> </transcription>

<participant interlocutor="1" speaker="3"> </participant>

<speaker sp1="female" sp2="female" sp3="male"> </speaker>

<interlocutor gender="1"> </interlocutor>

/ 0 ( 1, 2 ) / 0
,
, , ,
,
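Because the head information is stored as XML attributes, it can be read back with a standard XML parser. The sketch below is illustrative only: the header values follow the sample file printed in section 4.3, the `read_head` helper is hypothetical, and Python's `xml.etree.ElementTree` stands in for whatever tool the corpus builders actually used.

```python
import xml.etree.ElementTree as ET

# Header values taken from the sample file in section 4.3.
HEAD = """<HEAD>
<transcription id="0106" discno="0002112203"/>
<participant interlocutor="1" speaker="3"/>
<speaker sp1="female" sp2="female" sp3="male"/>
<interlocutor gender="2"/>
</HEAD>"""


def read_head(xml_text: str) -> dict:
    """Map each header element name to a dict of its attributes."""
    root = ET.fromstring(xml_text)
    return {child.tag: dict(child.attrib) for child in root}


info = read_head(HEAD)
print(info["transcription"]["id"])  # 0106
print(info["speaker"])
```

Keeping each kind of background information in its own element means a query such as "all recordings with a male third speaker" reduces to a simple attribute comparison.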

3.2 Discourse information tagging


,
COLSEC
( 1)
, , :
<sp1>, </sp1>

,
( 2) / 0,
, COLSEC
,
A. ,
(transition relevance places),
, ;
, , , / 0 (intrusive interruption) (Furo 2001: 48) , <interrupted>, </interrupted> ,
:
<sp2> But you should remember that there are thousands [W th-s] of, you know, aged people, oh,
old people. Yes, they have been smoking for maybe about of more than fifty years, or things like that.
It's a part of his life and you-you can't just deprive him of his </sp2>
<interrupted>
<sp1> These people also stop smoking. Ok, so... </sp1>
</interrupted>

B. , ,
,
, , <interrupted>, </interrupted>
, , [ ], :
<sp1> Ok, I think, I think that er I say you should st-stop smoking during a period of life. Maybe,
about er before forty years old, [you-you can stop smoking] </sp1>
<interrupted>
<sp2> [It it's nothing to do with age]. It's just er a habit. </sp2>
</interrupted>

C. , ; (verbal response) , (non-verbal response) , ,
, , [ ] , :
<sp2> He don't care the consequence about disease or [things like that.] </sp2>
<sp1> [Yeah. That's right]. </sp1>
<sp2> I-I think it's quite acceptable. </sp2>
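Since overlapping speech is enclosed in square brackets inside each speaker's turn, the overlapped stretches can be pulled out with two regular expressions. This is a rough sketch under stated assumptions: the `overlaps` helper and both patterns are hypothetical, and the sketch does not separate overlap brackets from mispronunciation tags, which also use square brackets.

```python
import re

# Example C from this section, with the transcription spacing normalized.
TRANSCRIPT = """<sp2> He don't care the consequence about disease or [things like that.] </sp2>
<sp1> [Yeah. That's right]. </sp1>
<sp2> I-I think it's quite acceptable. </sp2>"""

TURN = re.compile(r"<(sp\d)>(.*?)</\1>", re.S)  # one speaker turn
BRACKETED = re.compile(r"\[([^\]]+)\]")         # one bracketed stretch


def overlaps(transcript: str) -> list:
    """Return (speaker, overlapped text) pairs in transcript order."""
    found = []
    for speaker, text in TURN.findall(transcript):
        for segment in BRACKETED.findall(text):
            found.append((speaker, segment.strip()))
    return found


for speaker, segment in overlaps(TRANSCRIPT):
    print(speaker, "->", segment)
```

Adjacent pairs in the result belong to different speakers, which makes it easy to see who was talking over whom.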

( 3) ,
,
,
, "-" for-for-for-forward
,
( 4) , ,
COLSEC , ( )
, (
) , / 0
, / 0, 1
, 6, ,
: 2- 3, 4- 6
, COLSEC ,
; ,
, "..."; , "......", :
<sp3> But the s... the smoke er the cigarette is also light and will maybe a danger to h-i him and
make a ... make a ... </sp3>
<interrupted>
<interlocutor> Cause a fire. </interlocutor>
</interrupted>
<sp3> And if she if he didn't have mm good ...... er, if he have good ...... er </sp3>
<interrupted>
<interlocutor> Good living habits? </interlocutor>
</interrupted>
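The two pause markers, three dots and six dots, can be counted mechanically. The sketch below is only an illustration: the labels "short" and "long", the rule that any run of six or more dots counts as the longer pause, and the `count_pauses` helper are all assumptions, not part of the COLSEC scheme as printed.

```python
import re

PAUSE = re.compile(r"\.{3,}")  # a run of three or more dots


def count_pauses(turn: str) -> dict:
    """Count three-dot and six-dot pause markers in one turn."""
    counts = {"short": 0, "long": 0}
    for match in PAUSE.finditer(turn):
        # Assumption: six or more dots mark the longer pause.
        key = "long" if len(match.group()) >= 6 else "short"
        counts[key] += 1
    return counts


print(count_pauses("make a ... make a ..."))
print(count_pauses("if he didn't have mm good ...... er, if he have good ...... er"))
```

Note that a turn which simply trails off, as in "Ok, so...", is counted here as a short pause too, so real use would need a rule for that case.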

, ,

2005 1 13 , ,

,
( 5) ,
(reactive tokens) , , ,
, mm, mn, ,
, , mm, mn, erm, er, hm , :
<interlocutor> Now, ***, what's the most popular hobby of your classmates? </interlocutor>
<sp3> Mm, for our classmates, they like to [[pt-r] play football. </sp3>

( 6) ,
, , "?",
, "????",

3.3 Mispronunciation tagging


, COLSEC
, / 0 (Sinclair 1991: 21) ,
,
10,
, :
, / [ ] 0,
:
(1) W ,
W , ,
media [W e-ai] media e [ai] ;
, , banana [2a-ae] banana a
ae ([ ])
(2) P ,
, "-", class
[P a-r] class a [r]
, , success [P1c-er] success c
er ([ ])
(3) , M
, human [M n] human n
, , magazine [M 2a] magazine
a
(4) S, S
psychological [S2] psychological
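The bracketed tags follow a regular shape: a type letter (W, P, M or S), an optional digit, an optional target letter sequence and, for some types, a hyphen followed by the sound actually produced. A hedged sketch of a parser follows. The field names, the `parse_tags` helper and the reading of the digit as "which occurrence or syllable" are inferred from the examples above and should be treated as assumptions.

```python
import re
from typing import NamedTuple, Optional


class ErrorTag(NamedTuple):
    kind: str              # W, P, M or S, as in the four tag types above
    index: Optional[int]   # assumed: which occurrence or syllable is meant
    target: Optional[str]  # letters of the word the tag refers to
    actual: Optional[str]  # what the learner produced, if recorded


TAG = re.compile(r"\[([WPMS])\s*(\d)?([a-z]+)?(?:-([a-z]+))?\]")


def parse_tags(text: str) -> list:
    """Extract every mispronunciation tag found in a transcribed line."""
    tags = []
    for kind, index, target, actual in TAG.findall(text):
        tags.append(ErrorTag(kind,
                             int(index) if index else None,
                             target or None,
                             actual or None))
    return tags


line = "thousands [W th-s] of success [P1c-er] and psychological [S2]"
for tag in parse_tags(line):
    print(tag)
```

Tags without a type letter, such as the banana example as printed, are not matched by this pattern.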

COLSEC , ,
, ,
, 200 10, 29
4, ,
( 2005: 48- 58)
4. : XML
, ,
, , ,
, ,
, ; ,
, , ,
, HTML XML , ,
, ,

4. 1
CLAWS AGTS,
, ,
TOSCA/LOB QTAG WinBrill (
http://corpus.sjtu.edu.cn) , TOSCA/LOB
: 1) , ; 2) ; 3) SGML/TEI; 4) SGML (<p> </p>)

4. 2

(1) Brown "/" "_",
study/NN study_NN 1980
, WinBrill
(2) SGML SGML,
, :
<word attribute="NN1">study</word>

(3) XML XML , SGML


,
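The first two formats carry the same information, so a token in the Brown-style `word/TAG` or `word_TAG` form can be rewritten mechanically into the element form. The following is a minimal sketch: the `to_sgml` helper and the assumption that tags consist of capitals, digits and `$` are illustrative, while the element and attribute names follow the `<word attribute="...">` example above.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# word, then "/" or "_", then a tag made of capitals, digits or "$" (assumed)
PAIR = re.compile(r"^(?P<word>.+)[/_](?P<tag>[A-Z0-9$]+)$")


def to_sgml(tagged_token: str) -> str:
    """Rewrite 'study/NN' or 'study_NN' as <word attribute="NN">study</word>."""
    match = PAIR.match(tagged_token)
    if match is None:
        raise ValueError(f"not a word/TAG token: {tagged_token!r}")
    return f"<word attribute={quoteattr(match['tag'])}>{escape(match['word'])}</word>"


print(to_sgml("study/NN"))
print(to_sgml("study_NN1"))
```

The escaping step matters once tokens contain characters such as `&` or `<`, which the slash and underscore formats can carry unchanged but markup cannot.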

COLSEC XML , , HTML
, COLSEC HTML XML :

COLSEC

<!--s--><span class="JJ">Good</span> <span class="NN">morning</span><c SPER>.</c><!--/s-->

, :
(1) HTML , (class)
, (NN) / 0, - - ,
NN - -
NNS,
,
(2) HTML , HTML
Dreamweaver (split window) ,
, , ,
, , , ,

( 3)
( 4) , ,
,
, : ,
, ,

4. 3
, ,
TOSCA/LOB / 0 / 0, ,
; <s> </s> ; <w> </w> ;
TLB-C-, TLB-A- ( ,
, 2005) TOSCA/LOB SGML/TEI
, ,
,

, , :
COLSEC :
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="http://corpus.sjtu.edu.cn/corpus/XSD/COLSEC.xsd">
<HEAD>
<transcription id="0106" discno="0002112203"/>
<participant interlocutor="1" speaker="3"/>
<speaker sp1="female" sp2="female" sp3="male"/>
<interlocutor gender="2"/>
</HEAD>
<BODY>
<p>
<interlocutor><span class="ROLE">Interlocutor:</span>
<!--s--><span class="JJ">Good</span> <span class="NN">morning</span><c SCOM>,</c> <span class="PN">everybody</span><c SPER>.</c><!--/s-->
</interlocutor>
</p>
<p>
<spall><span class="ROLE">All speakers:</span>
<!--s--><span class="JJ">Good</span> <span class="NN">morning</span><c SPER>.</c><!--/s-->
</spall>
</p>
</BODY>
</html>
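Storing the word class in the `class` attribute of a `span` means a search for a tag, or for a family of tags sharing a prefix, can be run over the markup directly. The sketch below is one possible reading, not the COLSEC search tool: the `TagCollector` class, the `words_with_tag` helper and the prefix-matching rule are assumptions, and Python's lenient `html.parser` is used because `<c SPER>` as printed carries no quoted attribute value.

```python
from html.parser import HTMLParser

# One tagged sentence from the sample file above.
LINE = ('<!--s--><span class="JJ">Good</span> <span class="NN">morning</span>'
        '<c SCOM>,</c> <span class="PN">everybody</span><c SPER>.</c><!--/s-->')


class TagCollector(HTMLParser):
    """Collect (word, POS tag) pairs from <span class="TAG">word</span>."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._current = None  # POS tag of the span being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._current = (dict(attrs).get("class") or "").strip()

    def handle_data(self, data):
        if self._current is not None and data.strip():
            self.pairs.append((data.strip(), self._current))

    def handle_endtag(self, tag):
        if tag == "span":
            self._current = None


def words_with_tag(markup: str, prefix: str) -> list:
    """Return the words whose POS tag starts with the given prefix."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return [word for word, tag in collector.pairs if tag.startswith(prefix)]


print(words_with_tag(LINE, "NN"))  # ['morning']
```

With prefix matching, a query for `NN` would also return words tagged `NNS`, whereas an exact comparison would keep the two classes apart.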

, ,
, ,
( 1),
, ; ( 2)
; ( 3)
5.

, :

5. 1
,

, COLSEC , , 10,166,
, th, , [W th-z]: 1,152,
[W th-s]: 1,125, [W th-d]: 114,
: 4,375; 2,195; 878
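Frequency figures of this kind can be obtained by tallying the tags directly. The sketch below counts which sounds replace "th" in a piece of tagged text. The sample string is invented for illustration and is not corpus data, and the `th_substitutions` helper is hypothetical.

```python
import re
from collections import Counter

TH_TAG = re.compile(r"\[W\s*th-([a-z]+)\]")  # captures the substituted sound


def th_substitutions(text: str) -> Counter:
    """Count the sounds recorded as replacing 'th' in [W th-x] tags."""
    return Counter(TH_TAG.findall(text))


# Invented sample, not taken from COLSEC.
sample = "thousands [W th-s] think [W th-s] that [W th-z] this [W th-d]"
print(th_substitutions(sample).most_common())
```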

COLSEC , :
( 1) , ,
?
( 2) ? ,
?

( 3) ?
( 4)
?

5. 2
COLSEC XML DTD (document type definition)
, (turn sequences) (adjacency pairs) (Tsui 2000 [1994]: 11) , ,
, (question-response), (request-statement), (statement-agreement), (statement-question), (statement-statement) , ,
(act) (move) (exchange) , Sinclair Coulthard (1975) I-R-F
(initiation-response-follow up) ;
, COLSEC
, , , I-R-F
, 5: (1) I-R-F ; (2) I-R-I-R-I-R , F; (3) I-R-I-R-I-R ; (4) I-I1-R1-R-F ; (5) I-R-I-R-F-FF , (
2004: 143-4) COLSEC XML DTD
,
, COLSEC,
, :
( 1) , ?
( 2) , ?
( 3) ?

5. 3
,
,
, , COLSEC
, / 0

, ( 2004: 145- 7) :
( 1)
( 2)
( 3)
6.
COLSEC

, COLSEC : ;
; ,
, , ,
; ,

Furo, H. 2001. Turn-taking in English and Japanese. New York/London: Routledge.
Granger, S. 1998. The Computer Learner Corpus: A versatile new source of data for SLA research. In S. Granger, ed., Learner English on Computer. London: Longman. Pp. 3-18.
Pravec, N. A. 2002. Survey of learner corpora. ICAME Journal 26, 81-114.
Sinclair, J. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press.
Sinclair, J. 2002. Corpus Linguistics at the Millennium. In Yang Huizhong, ed., An Introduction to Corpus Linguistics. Shanghai: Shanghai Foreign Language Education Press. Pp. 310-30.
Sinclair, J. and M. Coulthard. 1975. Towards an Analysis of Discourse. London: Oxford University Press.
Tsui, A. B. M. 2000 [1994]. English Conversation. Shanghai: Shanghai Foreign Language Education Press.
, 2003, 5 6:
, 2004, 5 6 2, 140- 9
, 2005, COLSEC 5
6: 48- 58
, 2005, COLSEC 5
6: 11- 25

WEI Naixing, male, Ph.D., is a professor at Shanghai Jiaotong University. His research interests include methodology in corpus construction, corpus-driven linguistics and contrastive interlanguage analysis. His major publications are: "Corpus-based and corpus-driven studies of collocation" and "Approaches to the study of semantic prosodies". E-mail: nxwei@mail.sjtu.edu.cn

: 200030
453002
471003

WEI Naixing, LI Wenzhong and PU Jianzhong, Design principles and annotation methods of the COLSEC corpus
This article describes the design principles of the COLSEC corpus and the methods adopted in transcribing and annotating its data. It spells out such important register factors as task setting, discourse genre, topic variety and learner background information, in sampling data for the learner spoken English corpus. It then goes on to discuss the related issues and their solutions as regards head information mark-up, discourse information tagging and pronunciation error tagging, which ought to, in general, abide by the overall principles of truthfulness, accuracy and completeness. The paper also deals with the questions of POS tagger selection, word class tag assignment format and its adaptation for on-line search on the Internet.
Keywords: design principle, learner spoken English corpus, discourse information tagging, pronunciation error tagging

